Fastest method to write incoming serial data to an sd card - data logger
Linux users,
I've been trying to determine the fastest way to capture data arriving on an RS232 serial port of a BeagleBone Black at 115,200 baud and write it to a file on an SD card as quickly and seamlessly as possible. The standard method is to read the incoming data with fread() and then write it to a file on the SD card with fwrite(). I have tried that, but found that unless the incoming data pauses long enough for the fwrite() call to complete, some data is definitely lost.
Using a Linux pipe appears to be the fastest way to speed up the transfer between source and destination, because data moves through memory rather than at the slow speed of a serial port. The pipe() call creates two file descriptors, one of which I could attach to the input port and the other to the output file. In my case, however, the input and output are fixed by the hardware. I could try to overwrite the descriptors returned by pipe() as shown below, but it doesn't seem likely that the new descriptors would remain connected to the pipe that pipe() created.
int pfd[2], fileRd, fileWrt;
fileRd  = open(in_uart, O_RDONLY | O_NOCTTY);
fileWrt = open(logFileName, O_WRONLY);
pipe(pfd);        // creates two new descriptors: pfd[0] (read end), pfd[1] (write end)
pfd[0] = fileRd;  // this only overwrites the array entries; the pipe's own
pfd[1] = fileWrt; // descriptors are leaked, not redirected
I could route the incoming serial data to an RS485 port instead of an RS232 port and increase the input data rate further. But then the significant issue is that the faster incoming data must be copied to the SD card faster still.
I would appreciate any suggestions from others who have faced this issue.
Hi jwmueckl,
Well, you have a data source and a data destination with different capabilities when it comes to IOPS. It's best to separate data input and data output into two threads. You can use pipes, a mutex, or semaphores to synchronize the threads. Establish a memory region to exchange data between the input and output threads. This can be shared memory, but memory-mapped IO or a FIFO will do as well; a FIFO is probably the easiest method. Any kind of read/write or C streams will do for the IO itself. Check the descriptor status using basic calls like ioctl(), fcntl(), or select().
Bear in mind to configure termios/terminfo to suit your needs.
IO speed is not an issue at all. The drivers reside in kernel space and are really fast.
Many thanks, Martin. I would like to follow your recommendation. Suppose I decide to use a pipe connecting two processes, one reading data from the RS232 UART and the other writing to the SD card. Before the pipe is created, I open the input port for reading with open(), which gives me the file descriptor fileRd. Likewise, I open the output file on the SD card, which gives me the file descriptor fileWrt. Now, when pipe() is called, it creates two more file descriptors, pfd[0] and pfd[1]. A fork() follows the pipe() call, with the parent implementing a read routine using read() and the child implementing a write routine using write(). Both will run simultaneously.
My question is: which file descriptor should be used in the read routine?
bytesRead = read(fd[0], message, 100);
I would think that fd[0] must be fileRd, which refers to the hardware port. However, pipe() has created a new file descriptor, pfd[0], and all the examples I have seen use that newly created descriptor in the read routine:
bytesRead = read(pfd[0], message, 100);
The same question applies to write(), which is carried out in the write routine.
Also, would you please explain why you said that IO speed is not an issue at all? Which kernel drivers are you referring to? Suppose I replaced the RS232 port with an Ethernet port delivering data at 1 Mbps instead of RS232's 115,200 bps. Would IO speed be a significant issue then?
You can actually do it all on the command line. Probably with the cat command ... applied twice.
The first process reads from the serial-port (its STDIN) and writes the data to a pipe (its STDOUT).
The second process reads from the pipe (its STDIN) and writes the data to the SD card.
The key to the whole picture is "what stands between" ... the pipe. The two processes are now loosely coupled, as it were, "by a flexible hose and a storage-tank." The reader can collect data from the serial port as fast as it comes in, and the writer can dispose of the data as soon as it gets it. The pipe allows the two processes to operate independently.
And "a pipe" is easy to come by: the "|" operator on the command-line!
The only thing that you must be sure of is that the SD card can, in fact, absorb the data fast enough. (If you're not sure of this, stage the data to a disk-drive directory first. A third process monitors this directory, looking for new files to appear, and shovels these out to the SD card.) Pipes have a certain fairly-beefy amount of storage capacity, though, so I sincerely doubt that this would be a problem.
Last edited by sundialsvcs; 05-17-2016 at 09:42 AM.
While true, sometimes the small size of the pipe buffer can cause problems. On Linux you can increase the size of the pipe, or you can create your own "big pipe" program which uses independent threads to read from stdin and write to stdout, connected by a fifo that can be as large as you like. I've had to do this on several occasions when minor "hiccups" in the consumer process cause the producer process to fill up the 64kB pipe and then start missing data.
Absolutely agree.
Experiential data should immediately be gathered to prove the concept, whatever concept it might turn out to be, and using the actual hardware contemplated. The OP should bear in mind that "SD Cards" (and the like ...) can vary wildly in their performance characteristics. They're designed to be cheap and capacious, and "speed" becomes "the number-three" of "pick any two."
Unfortunately, I cannot live at a terminal typing commands for the rest of my life. We write programs so that we don't have to do that.
I have no idea what that comment is written in reference to, but it came across as very snarky, and will reduce the likelihood of getting useful advice from anybody.
Quote:
Originally Posted by jwmueckl
Can anyone answer any of my questions specifically?
You're making this too complicated IMO. Your data rate is very low (~11 kBps), and frankly I'm very surprised you're having any problems in the first place. I have certainly never had any issues reading from a 115200 bps serial port and writing it to an SD card in real time on an embedded system, maybe there's something wrong with your card?
Either way, just split your reading and writing into separate programs and join them with a pipe. Your reading process would read from the serial port and write to stdout, your writing process would read from stdin and write to disk, and your "main process" would simply be a script that runs "./read | ./write". That pipe will give you a 64 kB buffer, if you need more you can increase its size up to ~1 MB (see /proc/sys/fs/pipe-max-size for the maximum size) in your writing process with fcntl(ifd, F_SETPIPE_SZ, pipesize).
As I said in my earlier post, sometimes even this won't be big enough, at which point you can malloc your own memory array as big as you like and multi-thread your process, but I wouldn't expect you to need to go to those lengths until your data rate is at least 100x higher than present. As I said, I have had to do this before, but that system was reading from SPI at around 3 MBps and dumping to a process that doesn't just write to disk, but does extensive processing on the data first.
Last edited by suicidaleggroll; 05-18-2016 at 09:07 AM.
Thank you for your clarification. I couldn't understand why sundialsvcs was recommending that this all be done on the command line.
Using a script apparently hides the details of the pipe, so I don't have to specify any file descriptors. That is very simple! But if I need to increase the size of the pipe, I will need to specify ifd in the fcntl() call. How would I define ifd?
When you implemented your multi-threaded process to read your 3 MBps SPI, were you still able to use a script that runs "./read | ./write"? Or did you use the pipe() call and specify the file descriptors?
"ifd" in my example is the file descriptor on which you want to run fcntl(); if that's stdin, then you can simply replace it with 0, the file descriptor for stdin.
In my 3 MBps SPI application, I originally ran it in a script with:
Code:
./read | ./bigpipe | ./process
where bigpipe was a 2-threaded program that would malloc a large array, then read from stdin and dump into the array on thread 1, and read out of the array and dump to stdout on thread 2, essentially acting like a VERY big "|" (hence the name bigpipe). I used 128 MB for that application because the nature of the processing code meant it could stall for 10-20 seconds before resuming reading, and the buffer had to be big enough to work through it.
It worked well, but while investigating a different, completely unrelated problem, I ended up integrating all of the "bigpipe" logic directly into "read", so it would malloc and buffer the data stream internally, then replaced the script with a simpler:
Code:
./read | ./process
Once I finally found and fixed the bug, I could have gone back to the independent "bigpipe" approach, but decided to leave it as-is because it's a bit cleaner this way. At no point did I use the "pipe" command in C.
Last edited by suicidaleggroll; 05-18-2016 at 02:28 PM.