Linux - Enterprise This forum is for all items relating to using Linux in the Enterprise.


10-24-2012, 03:15 PM   #1
LQ Newbie
Registered: Oct 2012
Distribution: RHEL5, RHEL6
Posts: 1

Intermittent mkfifo Pipe Failures

We have been experiencing odd failures related to FIFO pipes in our shell scripts.

The technique we commonly use is to background processes which write to a mkfifo pipe, which allows us to effectively do parallel processing in our scripts on large amounts of data. However, sometimes the command that receives the data from the pipe (via cat) behaves as though the input contains a zero-byte file, yet when you subsequently cat the contents of the pipe, all of the data is returned.

In order to help us collect more data to find the cause and solution for this bug, please run the script below and post the output along with the OS version and file system type of the directory where you run it e.g.:
uname -a
mount -l | grep $(df -h . | tail -1 | awk '{print $1}')
Here is the script that effectively reproduces the bug:


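#!/bin/ksh
# NOTE: nreps and failcount were not set in the script as posted;
# the default of 1000 repetitions is an assumption.
nreps=${1:-1000}
failcount=0

# Build a 1000-line input file.
awk 'BEGIN{for(i=0;i<1000;++i){print i}}' >datafile

# Chain 101 "cat datafile" commands so the writer produces plenty of output.
docat="cat datafile"
for i in {1..100} ; do
	docat="$docat && cat datafile"
done

for (( reps = 1; reps <= nreps; reps++ )) ; do
	mkfifo mypipe
	# Background writer: awk's output is redirected into the FIFO.
	eval "$docat" | awk '{print}' > mypipe &
	# Foreground reader: copy whatever arrives on the FIFO into myfile.
	cat mypipe >myfile
	# Count the run as a failure if myfile came back empty.
	[[ -z $(head myfile) ]] && failcount=$(( failcount + 1 ))
	echo "failrate ($failcount/$reps)" >status
	rm -f mypipe myfile
done

cat status
rm -f status datafile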

This behavior may be caused by the kernel, or at the file system level, but we do not believe it's normal. The following are results from some recent tests we have run on multiple file systems:
1           ext4   RHEL6   ()
1           tmpfs  RHEL5   ()
3           ext3   RHEL5   ()
326         nfs    RHEL5   RHEL5
373         nfs4   RHEL5   RHEL6
Version info from one of the hosts we used to run these tests:
$ uname -a
Linux 2.6.18-274.3.1.el5 #1 SMP Fri Aug 26 18:49:02 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

$ /bin/ksh --version
  version         sh (AT&T Research) 93t+ 2010-02-02

$ /bin/awk --version
GNU Awk 3.1.5

$ /usr/bin/mkfifo --version
mkfifo (GNU coreutils) 5.97

$ yum --version nfs
If you have any other insights or suggestions, please post a reply.

Please note that this is a relatively rare phenomenon, so the script supplied does some odd things in order to make it happen frequently enough to measure. There are ways to make it happen less often, but we still see it happen with negative consequences for our data processing systems.

Last edited by djl; 10-24-2012 at 03:51 PM. Reason: meant "post" not "send"
10-25-2012, 12:07 PM   #2
Senior Member
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 13.1
Posts: 1,327

I can't reproduce this behavior. But maybe it's a race condition, since the exit code of no operation is checked anywhere. Can you insert a wait and try again:
	echo "failrate ($failcount/$reps)" >status
	rm -f mypipe myfile
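
One possible reading of this suggestion, as a sketch only: the exact position of the wait is an assumption, and the temporary directory and the variable names (workdir, writer_pid, reader_status, writer_status, lines) are made up for illustration. It runs a single iteration with one cat instead of the long chain, waits for the background writer before cleaning up, and records both exit codes so a failed run leaves some evidence.

```shell
#!/bin/sh
# Sketch only: one iteration of the test loop with a wait and exit-code checks.
# The temporary directory and all variable names are illustrative assumptions.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

awk 'BEGIN{for(i=0;i<1000;++i){print i}}' >datafile
mkfifo mypipe || exit 1

# Background writer, as in the original script (a single cat for brevity).
cat datafile | awk '{print}' >mypipe &
writer_pid=$!

# Foreground reader; keep its exit status.
cat mypipe >myfile
reader_status=$?

# Wait for the writer before touching the FIFO, and keep its exit status.
wait "$writer_pid"
writer_status=$?

# Number of lines that actually made it through the FIFO.
lines=$(wc -l <myfile | tr -d ' ')
echo "reader=$reader_status writer=$writer_status lines=$lines"

rm -f mypipe myfile datafile
cd / && rmdir "$workdir"
```

If a run fails, a non-zero writer status or a short line count would help tell a lost write apart from a reader that simply saw end-of-file early.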
12-03-2012, 05:31 AM   #3
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,654

I think your problem is timing.

You need to ensure that the fifo is open for reading before you start writing. (reference man 7 fifo).
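
For reference, fifo(7) says that opening a FIFO normally blocks until the other end is opened as well. A minimal sketch for checking how a given host and file system behave, with the writer deliberately started first; the temporary directory, the file name demo.fifo and the variable names are hypothetical, and mktemp -d will usually pick a local file system rather than NFS:

```shell
#!/bin/sh
# Sketch: the writer starts first, the reader opens the FIFO one second later.
# Directory, file and variable names are illustrative assumptions.
workdir=$(mktemp -d) || exit 1
fifo="$workdir/demo.fifo"
mkfifo "$fifo" || exit 1

# Writer: the shell opens the FIFO for the redirection before running echo.
# Per fifo(7), that open normally blocks until a reader opens the other end.
echo "hello from writer" >"$fifo" &

sleep 1                      # deliberately let the writer get there first
received=$(cat "$fifo")      # the reader opens the FIFO only now
wait                         # collect the background writer

echo "received: $received"
rm -f "$fifo"
rmdir "$workdir"
```

To test the file system in question, the same steps can be run with workdir pointing at a directory on that mount.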

The way you have the command structure set up this may or may not happen.

You are putting the writing process (the: eval "$docat" | awk '{print}' > mypipe) in the background. If the first part (the eval "$docat" part) delays things long enough, then the following foreground process (the cat mypipe >myfile) has time to get started, and open a read on the fifo.

If it happens the other way (the first write occurs before the read is open), you should be getting a "SIGPIPE" error.

The timing becomes critical due to the way the 'eval "$docat"...' is handled.
The first process started in the sequence is the last one - awk '{print}' >mypipe. Awk doesn't wait, and will open the fifo... (no delay), then it depends on how long it takes the eval "$docat" part to produce the first bit of data...

If this delay is too short, then a write to the pipe can occur before the foreground process starts the read. This can happen due to other system loading factors outside the script. If you happen to have two or more processors, this should happen less often (depending on load, of course). If you are on an idle multiprocessor, it should almost never happen.

You might try adding a simple sleep 2 in front of the awk (such as ... | ( sleep 2; awk '{print}' > mypipe ) & ), as I think that would be the minimal change.
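
A sketch of how that delay might be wired in so that awk still reads from the pipeline. Grouping sleep and awk in a subshell is an assumption about the intent, the single cat stands in for the long eval chain, and the temporary directory and the variable delayed_lines are hypothetical:

```shell
#!/bin/sh
# Sketch: delay the writer side so the reader is already waiting on the FIFO.
# Directory and variable names are illustrative assumptions.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

awk 'BEGIN{for(i=0;i<1000;++i){print i}}' >datafile
mkfifo mypipe || exit 1

# Writer: the subshell sleeps first, so awk opens mypipe about two seconds later.
# sleep does not read stdin, so awk still receives everything cat wrote.
cat datafile | ( sleep 2; awk '{print}' >mypipe ) &

# Reader: starts immediately and waits on the FIFO until awk opens it.
cat mypipe >myfile
wait

delayed_lines=$(wc -l <myfile | tr -d ' ')
echo "lines received: $delayed_lines"

rm -f mypipe myfile datafile
cd / && rmdir "$workdir"
```

Putting the redirection inside the subshell matters here: with > mypipe attached to the whole group, the shell would open the FIFO before the sleep runs and the delay would not change the open order.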

You can also try using bash coprocesses...

Last edited by jpollard; 12-03-2012 at 05:47 AM. Reason: a bit more on why the timing is critical.


All times are GMT -5.
