LinuxQuestions.org
Linux - Enterprise This forum is for all items relating to using Linux in the Enterprise.

10-24-2012, 02:15 PM   #1
djl
LQ Newbie
 
Registered: Oct 2012
Distribution: RHEL5, RHEL6
Posts: 1

Rep: Reputation: Disabled
Intermittent mkfifo Pipe Failures


We have been experiencing odd failures related to FIFO pipes in our shell scripts.

The technique we commonly use is to background processes that write to a mkfifo pipe, which lets us process large amounts of data in parallel within our scripts. However, the command that receives the data from the pipe (via cat) sometimes behaves as though its input were empty (a zero-byte file), yet when you subsequently cat the contents of the pipe, all of the data is returned.

To help us collect more data and find the cause of and a solution for this bug, please run the script below and post the output, along with the OS version and the file system type of the directory where you run it, e.g.:
Code:
uname -a
mount -l | grep $(df -h . | tail -1 | awk '{print $1}')
Here is the script that effectively reproduces the bug:
Code:
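#!/bin/ksh
# Reproduces intermittent empty reads from a mkfifo pipe.

failcount=0
nreps=10000

# Create a 1000-line data file.
awk 'BEGIN{for(i=0;i<1000;++i){print i}}' >datafile

# Build a long "cat datafile && cat datafile && ..." command string.
# Note: no spaces inside the braces, or ksh will not expand the range.
docat="cat datafile"
for i in {1..100} ; do
	docat="$docat && cat datafile"
done

for reps in {1..$nreps} ; do
	mkfifo mypipe
	# Writer: runs in the background and feeds the FIFO.
	eval "$docat" | awk '{print}' > mypipe &
	# Reader: drains the FIFO into a regular file.
	cat mypipe >myfile
	# Count a failure when the reader saw no data at all.
	[[ -z $(head myfile) ]] && failcount=$(( $failcount + 1 ))
	echo "failrate ($failcount/$reps)" >status
	rm -f mypipe myfile
done

cat status
rm -f status datafile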

This behavior may be caused by the kernel, or at the file system level, but we do not believe it's normal. The following are results from some recent tests we have run on multiple file systems:
Code:
FAILSPER10K FSTYPE LOCALOS REMOTEOS
1           ext4   RHEL6   ()
1           tmpfs  RHEL5   ()
3           ext3   RHEL5   ()
326         nfs    RHEL5   RHEL5
373         nfs4   RHEL5   RHEL6
Version info from one of the hosts we used to run these tests:
Code:
$ uname -a
Linux myhost.mydomain.com 2.6.18-274.3.1.el5 #1 SMP Fri Aug 26 18:49:02 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

$ /bin/ksh --version
  version         sh (AT&T Research) 93t+ 2010-02-02

$ /bin/awk --version
GNU Awk 3.1.5

$ /usr/bin/mkfifo --version
mkfifo (GNU coreutils) 5.97

$ yum --version nfs
3.2.22
If you have any other insights or suggestions, please post a reply.

Please note that this is a relatively rare phenomenon, so the script supplied does some odd things in order to make it happen frequently enough to measure. There are ways to make it happen less often, but we still see it happen with negative consequences for our data processing systems.

Last edited by djl; 10-24-2012 at 02:51 PM. Reason: meant "post" not "send"
 
10-25-2012, 11:07 AM   #2
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 15.2
Posts: 1,339

Rep: Reputation: 260
I can’t reproduce this behavior, but it may be a race condition, since the exit code is not checked for any operation. Can you insert a wait and try again:
Code:
	…
	echo "failrate ($failcount/$reps)" >status
	wait
	rm -f mypipe myfile
	…
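
For anyone who wants to try this in isolation, here is a minimal sketch of the loop with the wait in place and the exit codes checked. It is not the original script: the temporary directory, the `seq` data source, the `badexit` counter, and the small iteration count are all assumptions made to keep it short, and it tests for an empty file with `[ -s ]` rather than `head`.

```shell
#!/bin/sh
# Sketch only: a reduced version of the loop with "wait" and exit-code checks.
# The temp dir, seq data source, badexit counter and nreps=20 are assumptions.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

nreps=20
failcount=0
badexit=0
reps=0

while [ "$reps" -lt "$nreps" ]; do
    reps=$((reps + 1))
    mkfifo mypipe || exit 1

    # Writer in the background, as in the original script.
    seq 1 1000 | awk '{print}' > mypipe &
    writer=$!

    # Reader in the foreground; record a non-zero exit status.
    cat mypipe > myfile || badexit=$((badexit + 1))

    # Reap the writer before removing the FIFO; record its exit status too.
    wait "$writer" || badexit=$((badexit + 1))

    # Count a failure when the reader produced an empty file.
    [ -s myfile ] || failcount=$((failcount + 1))
    rm -f mypipe myfile
done

echo "failrate ($failcount/$reps), nonzero exits: $badexit"
cd / && rm -rf "$dir"
```

If the empty reads continue with the wait in place, the `badexit` count at least shows whether either side of the pipe reported an error.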
 
12-03-2012, 04:31 AM   #3
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
I think your problem is timing.

You need to ensure that the fifo is open for reading before you start writing. (reference man 7 fifo).

The way you have the command structure set up this may or may not happen.

You are putting the writing process (the eval "$docat" | awk '{print}' > mypipe) in the background. If the first part (the eval "$docat" part) delays things long enough, then the following foreground process (the cat mypipe >myfile) has time to get started and open the fifo for reading.

If it happens the other way (the first write occurs) you should be getting a "SIGPIPE" error.
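
The ordering can be checked directly on a given system. fifo(7), referenced above, says that opening a FIFO normally blocks until the other end is opened as well, so the sketch below looks at whether a background writer gets past its open before any reader exists. It is only an illustration: the temporary directory, the `opened` marker file, and the one-second pause are assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch only: observe when a FIFO writer gets past its open().
# The temp dir, marker file name and 1-second pause are assumptions.
dir=$(mktemp -d) || exit 1
fifo=$dir/demo.fifo
mkfifo "$fifo" || exit 1

# Background writer: the redirection opens the FIFO for writing first;
# only after that open returns does the group run and create the marker.
{ : > "$dir/opened"; echo "hello"; } > "$fifo" &

sleep 1
if [ -e "$dir/opened" ]; then
    before="writer got past open before any reader existed"
else
    before="writer still blocked in open"
fi

# Opening the read end lets the writer's open complete.
data=$(cat "$fifo")
wait

echo "$before"
echo "received: $data"
rm -rf "$dir"
```

Running this on the local and NFS-mounted directories from the table above would show whether the writer behaves the same way on both.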

The timing becomes critical due to the way the 'eval "$docat"...' is handled.
The first process started in the sequence is the last one: awk '{print}' >mypipe. Awk doesn't wait, and will open the fifo (no delay); what happens next depends on how long it takes the eval "$docat" part to produce the first bit of data.

If this delay is too short, then a write to the pipe can occur before the foreground process starts the read. This can happen due to other system loading factors outside the script. If you happen to have two or more processors, this should happen less often (depending on load, of course). If you are on an idle multiprocessor, it should almost never happen.

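You might try adding a simple sleep 2 before the awk (such as ... | { sleep 2; awk '{print}'; } > mypipe), as I think that would be the minimal change. Note the grouping: a bare ...| sleep 2; awk '{... would end the pipeline at the sleep.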
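
A minimal sketch of the delay idea, assuming the intent is to hold back the first write while keeping awk attached to the upstream pipe. The temporary directory and the `seq` data source are stand-ins for the original datafile setup.

```shell
#!/bin/sh
# Sketch only: delay the FIFO writer without detaching awk from the pipeline.
# The temp dir and seq data source are assumptions for illustration.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkfifo mypipe || exit 1

# The braces keep awk's stdin connected to seq; sleep does not read stdin.
# The redirection to mypipe applies to the whole group.
seq 1 1000 | { sleep 2; awk '{print}'; } > mypipe &

# Foreground reader, as in the original script.
cat mypipe > myfile
wait

lines=$(wc -l < myfile)
echo "lines received: $lines"
cd / && rm -rf "$dir"
```

Because the redirection belongs to the group, the FIFO is still opened for writing before the sleep runs; only the first write is delayed.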

You can also try using bash coprocesses...

Last edited by jpollard; 12-03-2012 at 04:47 AM. Reason: a bit more on why the timing is critical.
 
  


All times are GMT -5.
