Intermittent mkfifo Pipe Failures
We have been experiencing odd failures related to FIFO pipes in our shell scripts.
The technique we commonly use is to background processes that write to a mkfifo pipe, which lets us do parallel processing on large amounts of data in our scripts. However, sometimes the command that receives the data from the pipe (via cat) behaves as though the input were a zero-byte file, yet when you subsequently cat the contents of the pipe, all of the data is returned.

To help us collect more data to find the cause of and solution for this bug, please run the script below and post the output along with the OS version and file system type of the directory where you run it, e.g.:

Code:
uname -a

Code:
#!/bin/ksh

This behavior may be caused by the kernel or at the file system level, but we do not believe it is normal. The following are results from some recent tests we have run on multiple file systems:

Code:
FAILSPER10K  FSTYPE  LOCALOS  REMOTEOS

Code:
$ uname -a

Please note that this is a relatively rare phenomenon, so the supplied script does some odd things in order to make it happen frequently enough to measure. There are ways to make it happen less often, but we still see it happen, with negative consequences for our data processing systems. |
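The failing pattern, as described, can be sketched roughly like this (the original script is not shown, so the file names and data here are hypothetical stand-ins):

```shell
#!/bin/sh
# Rough sketch of the technique described above: a backgrounded writer
# feeding a FIFO while a foreground reader drains it. File names and
# data are hypothetical stand-ins.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

mkfifo "$tmpdir/mypipe" || exit 1

# Background writer: its open() of the FIFO blocks until some process
# opens the other end for reading (see man 7 fifo).
printf 'line1\nline2\nline3\n' > "$tmpdir/mypipe" &
writer=$!

# Foreground reader: drains the FIFO into a regular file.
cat "$tmpdir/mypipe" > "$tmpdir/myfile"

# Reap the writer so its exit status is not lost.
wait "$writer"
```

In the real scripts the writer side is a long pipeline rather than a single printf, which is where the timing variation comes from.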
I can’t reproduce this behavior. But maybe it’s a race condition, since nowhere is the exit code of any operation checked. Can you insert a wait and try it again:
Code:
… |
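For illustration, a minimal sketch of the suggested wait-and-check approach might look like this (the script above is elided, so mypipe/myfile and the data are assumed names):

```shell
#!/bin/sh
# Hypothetical sketch of the wait-and-check suggestion; the original
# script is not shown, so mypipe/myfile and the data are assumed names.
rm -f mypipe myfile
mkfifo mypipe || exit 1

# Writer in the background, mirroring the original technique.
( printf 'a\nb\n' | awk '{print}' > mypipe ) &
writer=$!

# Reader in the foreground.
cat mypipe > myfile
reader_rc=$?

# wait reaps the backgrounded writer and returns its exit status, so a
# silent writer failure can no longer masquerade as empty input.
wait "$writer"
writer_rc=$?

if [ "$reader_rc" -ne 0 ] || [ "$writer_rc" -ne 0 ]; then
    echo "pipe transfer failed (reader=$reader_rc writer=$writer_rc)" >&2
    exit 1
fi
rm -f mypipe
```

Checking both exit codes at least tells you whether one side of the pipe is failing silently, which narrows down where the zero-byte reads come from.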
I think your problem is timing.
You need to ensure that the FIFO is open for reading before you start writing (see man 7 fifo). The way you have the command structure set up, this may or may not happen.

You are putting the writing process (the eval "$docat" | awk '{print}' > mypipe) in the background. If the first part (the eval "$docat" part) delays things long enough, then the following foreground process (the cat mypipe > myfile) has time to get started and open a read on the FIFO. If it happens the other way around (the first write occurs first), you should be getting a SIGPIPE error.

The timing becomes critical because of the way the eval "$docat" ... is handled. The first process started in the sequence is the last one: awk '{print}' > mypipe. Awk doesn't wait and will open the FIFO with no delay; after that, everything depends on how long the eval "$docat" ... takes to produce its first bit of data. If that delay is too short, a write to the pipe can occur before the foreground process starts the read. This can happen because of system loading factors outside the script.

If you happen to have two or more processors, this should happen less often (depending on load, of course). On an idle multiprocessor it should almost never happen.

You might try adding a simple sleep 2 before the awk (such as ... | { sleep 2; awk '{ ... ), grouped so that awk still reads from the pipe, as I think that would be the minimal change. You can also try using bash coprocesses... |
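A sketch of the reader-first arrangement described above, assuming hypothetical file names: starting the read on the FIFO before any writer runs guarantees the read side is open by the time the first write happens (this flips which side is backgrounded compared with the original script):

```shell
#!/bin/sh
# Sketch of the reader-first fix: open the read side of the FIFO before
# any writer runs. Names (demopipe/demofile) are hypothetical.
rm -f demopipe demofile
mkfifo demopipe || exit 1

# Start the reader first, in the background; it blocks in open() until
# a writer appears (man 7 fifo).
cat demopipe > demofile &
reader=$!

# Writer: its open() for writing completes only once the reader's
# open() has completed, so no write can precede the read side being open.
printf 'x\ny\nz\n' > demopipe

wait "$reader"
rm -f demopipe
```

With this ordering the blocking-open semantics of the FIFO do the synchronization for you, instead of relying on a sleep to win the race.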