grep chain breaks due to pipe buffer

grob115 · 10-25-2011, 10:52 AM

Hi, it appears that if I pipe two greps together, I don't see any output on stdout. For example:

This works:
tail -f file.log | grep 'content 1'

This doesn't work:
tail -f file.log | grep 'content 1' | grep 'content 2'

I believe the issue is caused by buffering in the pipe between the two greps. My questions:
1) Why doesn't the same buffering issue appear between tail and the first grep?
2) How can I control how much buffering is applied?

tronayne · 10-25-2011, 11:01 AM

You're most likely overflowing the system buffer; you might want to try xargs to avoid that.

Hope this helps some.

rknichols · 10-25-2011, 11:09 AM

The 'tail' program with the "-f" option always line-buffers its output, unlike the usual default of line-buffering only when stdout is connected to a terminal and block-buffering otherwise.

For 'grep', you can use the "--line-buffered" option to make it buffer in the same manner as "tail -f".
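A minimal sketch of the difference, assuming GNU grep and GNU coreutils' timeout are installed. Because tail -f never exits, the sketch stands in for it with a made-up writer that prints one line and then keeps the pipe open, and uses timeout to kill the whole pipeline after 2 seconds. Anything still sitting in a stdio buffer at that point is lost, just as it would sit there indefinitely behind tail -f.

```shell
#!/bin/sh
# Stand-in for "tail -f file.log": print one matching line, then keep the
# pipe open (sample data, not from a real log).
line='content 1 and content 2'
writer="(echo '$line'; sleep 10)"

# Default buffering: the first grep writes to a pipe, so its match sits in a
# block buffer and is discarded when timeout kills the pipeline.
buffered=$(timeout 2 sh -c "$writer | grep 'content 1' | grep --line-buffered 'content 2'")

# --line-buffered on the first grep: the match reaches the second grep at once.
# The last grep also writes to a pipe here (the $(...) capture), so it needs
# the option too; writing to a terminal it would line-buffer by default.
linebuf=$(timeout 2 sh -c "$writer | grep --line-buffered 'content 1' | grep --line-buffered 'content 2'")

echo "default:       [$buffered]"
echo "line-buffered: [$linebuf]"
```

With GNU grep, the first result is typically empty and the second contains the matching line; the only difference between the two pipelines is the option on the first grep.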

NevemTeve · 10-25-2011, 11:09 AM

Quote:

Originally Posted by grob115

This doesn't work
tail -f file.log | grep 'content 1' | grep 'content 2'

Or perhaps there are no lines containing both 'content 1' and 'content 2'? (Or could you give a more concrete example?)

grob115 · 10-25-2011, 11:38 AM

Quote:

Originally Posted by rknichols

The 'tail' program with the "-f" option always line-buffers its output, unlike the usual default of line-buffering only when stdout is connected to a terminal and block-buffering otherwise.

For 'grep', you can use the "--line-buffered" option to make it buffer in the same manner as "tail -f".

You mean the following?
tail -f file.log | grep --line-buffered 'content 1' | grep 'content 2'

rknichols · 10-25-2011, 04:20 PM

Quote:

Originally Posted by grob115

You mean the following?
tail -f file.log | grep --line-buffered 'content 1' | grep 'content 2'

Yes. Didn't it work? Works fine for me. Any line that matches both grep expressions shows up immediately on the terminal.

grob115 · 10-26-2011, 10:00 AM

Hi, yes, it did work. It's not only grep that buffers; the same is true for awk and sed. Have you come across the unbuffer command? I don't have it on my OS, but I've seen people talking about it online. I wonder how it compares with the commands' own built-in switches for turning off buffering.

One thing, though: it appears that sed's unbuffered switch (i.e. sed -u) doesn't turn buffering off completely. I notice that the following:
tail -f logfile.log | egrep --line-buffered <filter> | awk '{ print $1; fflush() }' | sed -u -e 's/something//g'

is sometimes one line behind this:
tail -f logfile.log | egrep --line-buffered <filter>

I'm blaming it on sed because the man page says:

Quote:

-u, --unbuffered

load minimal amounts of data from the input files and flush the output buffers more often

Note that it says "more often", not "immediately", and it doesn't say that it performs line buffering.

I also found the following very helpful:
http://www.pixelbeat.org/programming/stdio_buffering/

rknichols · 10-26-2011, 11:12 AM

Quote:

Originally Posted by grob115

Hi, yes, it did work. It's not only grep that buffers; the same is true for awk and sed. Have you come across the unbuffer command?

Anything that uses stdio is likely to do similar buffering. The stdio default is line buffering for output to a file descriptor connected to a terminal and block buffering (typically 4K blocks) otherwise; the exception is stderr, which is never fully buffered (glibc leaves it unbuffered). It's easy enough for a program to override the default, for example with setvbuf(), but it takes specific action to do so.
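One consequence of this: a block buffer is still written out when the program exits normally, so the original two-grep chain only appears broken when the input never ends, as with tail -f. A minimal sketch with finite, made-up input shows the unmodified chain producing its output:

```shell
#!/bin/sh
# Finite input (made-up sample lines): each grep reaches EOF, exits normally,
# and flushes its stdio buffer on the way out, so no option is needed here.
result=$(printf '%s\n' 'content 1 only' 'content 1 and content 2' 'content 2 only' \
  | grep 'content 1' | grep 'content 2')
echo "$result"    # prints: content 1 and content 2
```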

Yes, I've seen the 'unbuffer' command. It's part of the 'expect' package, and works by connecting a program's output descriptor to a pseudo-terminal device and then passing the data along to the next stage of the pipeline. When I last tried it several years ago, it didn't work -- output was still block-buffered. I haven't had occasion to try it since.

The issue with 'sed' is that it has an additional layer of internal buffering independent of stdio. You can spend an hour or so reading the manpage about it, but whenever I look at that manpage I am reminded of an old comment deep in the source for the kernel's scheduler: "You are not expected to understand this."

grob115 · 10-28-2011, 10:17 PM

Thanks. Is there a way to switch to line buffering for all processes started under my current login, or under a specific parent PID? Or to change the buffer size from 4K to 0?

rknichols · 10-28-2011, 11:28 PM

Quote:

Originally Posted by grob115

Thanks. Is there a way to switch to line buffering for all processes started under my current login, or under a specific parent PID? Or to change the buffer size from 4K to 0?

Basically, no. You would have to modify and re-compile each program for which you wanted that change, or else make equivalent changes to the stdio library.

FWIW, some limited testing I just did with the 'unbuffer' command shows that it seems to be working properly these days. I don't recall exactly what the issue with it was in the past, though I do know I wasn't the only person having the problem.
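For the "change the buffer size from 4K to 0" part of the question, on a per-command rather than per-login basis, GNU coreutils (since version 7.5) ships stdbuf, which is also covered on the pixelbeat page linked earlier in the thread. It preloads a small library that adjusts the stdio buffering of the command it runs before that command does any I/O, so -oL gives line-buffered and -o0 unbuffered standard output. It relies on dynamic linking and has no effect on programs that set their own buffering or don't use stdio for output. A minimal sketch, assuming GNU coreutils and a dynamically linked grep, with a made-up stand-in for tail -f:

```shell
#!/bin/sh
# Stand-in for "tail -f file.log" (sample data): one line, then the pipe stays open.
line='content 1 and content 2'

# stdbuf -oL makes each grep's stdout line-buffered without using grep's own
# --line-buffered option; -o0 would make it unbuffered instead.
out=$(timeout 2 sh -c "(echo '$line'; sleep 10) | stdbuf -oL grep 'content 1' | stdbuf -oL grep 'content 2'")
echo "$out"
```

This is still a per-command setting; it doesn't change the default for every process under a login.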