LinuxQuestions.org


poorman_installer 10-21-2009 02:53 AM

Stripping lines versus stripping bytes in a bash subshell.
 
Can anybody explain this one?

Code:

bash-3.1# echo "1
2
3
4
5" | (head --lines=2 > /dev/null; cat)
bash-3.1#

So the stripping above doesn't work (no output), while the following does (the first "1" byte and its newline byte are indeed stripped):

Code:

bash-3.1# echo "1
2
3
4
5" | (head --bytes=2 > /dev/null; cat)
2
3
4
5
bash-3.1#


I'm puzzled. Thanks in advance.

Agrouf 10-21-2009 04:07 AM

Hello,
you are looking for the tail command.
Anyway, it looks like your implementation of head reads only the bytes it needs from stdin with the --bytes option, while it reads the whole file with the --lines option.

poorman_installer 10-21-2009 05:17 AM

Quote:

Originally Posted by Agrouf (Post 3727001)
Hello,
you are looking for the tail command.

Not really, see:
http://www.tomas-m.com/blog/994-Resume-your-build.html

Quote:

Anyway, it looks like your implementation of head reads only the bytes it needs from stdin with the --bytes option, while it reads the whole file with the --lines option.

I'm using Slackware (Slax) with GNU coreutils 6.12.
Your guess does not seem to agree with experiment:

Code:

bash-3.1# ( for I in $(seq 1 8) ;do echo $I;done )| (head -2 >/dev/null; cat )
3
4
5
6
7
8
bash-3.1# ( for I in $(seq 1 8) ;do echo $I;done ) > a; cat a | (head -2 >/dev/null ;cat)
bash-3.1# ( for I in $(seq 1 1859) ;do echo $I;done ) > a; cat a | (head -2 >/dev/null ;cat)
bash-3.1# ( for I in $(seq 1 1860) ;do echo $I;done ) > a; cat a | (head -2 >/dev/null ;cat)

bash-3.1# ( for I in $(seq 1 1861) ;do echo $I;done ) > a; cat a | (head -2 >/dev/null ;cat)

1861
bash-3.1#

So there is a difference between chopping the head off on the fly and passing the data through a real file in between.

The weird thing is that on substituting "head -2" above with "sed -e '2 q'" (which should be effectively the same), the magic number 1860 gets lower, while the first example works identically.
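
For comparison, here is a minimal sketch of the same experiment with the file itself as stdin instead of a pipe fed by cat. It assumes a head that, when its input is seekable, seeks back to just past the last line it used (GNU head tries to do this; other implementations may not). The scratch file created with mktemp is only for illustration.

```shell
# Scratch file for illustration only; mktemp chooses the name.
tmp=$(mktemp)
for i in 1 2 3 4 5; do echo "$i"; done > "$tmp"

# stdin is the regular file itself (seekable), not a pipe fed by cat.
# head and cat share one file offset, so cat starts where head left it.
from_file=$( { head -n 2 > /dev/null; cat; } < "$tmp" )
printf '%s\n' "$from_file"

rm -f "$tmp"
```

If that assumption holds, this prints 3, 4 and 5. A pipe gives head no way to hand back bytes it has already read, which would fit the results above.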

Agrouf 10-21-2009 05:40 AM

Well, I suppose the head command is buffering its input somehow. With the --bytes option, it knows exactly how many bytes it has to read, so it reads exactly that many. Without the --bytes option, it does not know how many bytes to read, so it reads a big chunk of data and then analyzes it. In the first case, input arrives slowly, line by line, so head has time to parse it and stop reading. When you use the cat command, input arrives fast, so head reads a big chunk of data before it parses it.

Anyway, is using the read command an option?
Code:

( for I in $(seq 1 8) ;do echo $I;done )|(read;read;cat)
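
A slightly more general sketch of the same read idea, for skipping an arbitrary number of lines. The helper name skip_lines is made up for this example. It relies on the shell's read builtin taking no more than one line from a pipe, which bash does by reading one byte at a time when stdin is not seekable.

```shell
# skip_lines N: discard the first N lines of stdin (hypothetical helper name).
skip_lines() {
    n=$1
    while [ "$n" -gt 0 ]; do
        # IFS= and -r stop read from trimming or unescaping; give up at EOF.
        IFS= read -r discard || break
        n=$((n - 1))
    done
}

# The rest of the pipe is left untouched for the next reader (cat here).
rest=$(printf '1\n2\n3\n4\n5\n' | { skip_lines 2; cat; })
printf '%s\n' "$rest"
```

This should be slow on large inputs compared with head, but the part that is skipped is usually small.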

catkin 10-21-2009 05:42 AM

I'm very puzzled; the behaviour is not consistent. I ran the following command repeatedly (by using up arrow at the command prompt to recall it)
Code:

c:~$ ( for I in $(seq 1 8) ;do echo $I;done )| (head --lines=2 >/dev/null; cat )
Sometimes it produced no output and sometimes (roughly half the time each) it produced
Code:

3
4
5
6
7
8

Here are relevant software versions
Code:

c:~$ cat /etc/slackware-version
Slackware 13.0.0.0.0
c:~$ head --version
head (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David MacKenzie and Jim Meyering.
c:~$ cat --version
cat (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund and Richard M. Stallman.
c:~$ bash --version
GNU bash, version 3.1.17(2)-release (i486-slackware-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.


poorman_installer 10-21-2009 06:24 AM

Thanks Agrouf and Catkin for your feedback.
I had actually already solved it by using the line command, similar to what Agrouf suggested, although in a less satisfactory manner than I would have liked if sed and head had behaved as expected.
I posted the issue because it seems relevant to on-the-fly implementations like the one discussed in the link I gave previously.
Also, I suppose the POSIX specifications should address the issue and dictate some rules, but I could not find anything on a first skim through them.

I also did a catkin-like trial :), by issuing the following command line and waiting a few seconds (I omit the first few dozen lines):
Code:

bash-3.1# while true;do ( for I in $(seq 1 8) ;do echo $I;done )| (head --lines=2 >/dev/null; cat )|wc;done
.
.
.
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      0      0      0
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12
      6      6      12

The failure rate is quite a bit lower than the 50% reported by catkin, as I had also noticed when testing manually.
But now for an even weirder punchline; try:

Code:

bash-3.1# ( while true;do ( for I in $(seq 1 8) ;do echo $I;done )| (head --lines=2 >/dev/null; cat )|wc;done ) |less
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
      0      0      0
lines 1-30

Not only is the output unexpected, but when I quit the less pager by pressing 'q', I'm not returned directly to the bash prompt. I have to add a Ctrl-C to get there, which I have never experienced before.
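
One likely reason for the extra Ctrl-C: after less quits, each wc fails on the closed pipe, but "while true" never looks at that failure, so the loop keeps spinning and bash keeps waiting for it. A minimal sketch, with "head -n 3" standing in for the pager (an assumption made only so that the example ends by itself) and "|| break" added to leave the loop as soon as wc fails:

```shell
# head -n 3 plays the pager: it exits after three lines, much as less does on 'q'.
counts=$(
    ( while true; do
          ( for I in $(seq 1 8); do echo "$I"; done ) |
              ( head -n 2 > /dev/null; cat ) |
              wc -l || break   # the pipeline's status is wc's; nonzero once the reader is gone
      done ) | head -n 3
)
printf '%s\n' "$counts"
```

This says nothing about why the counts are all zero under less; it only addresses the loop that outlives the pager.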

Agrouf 10-21-2009 07:05 AM

I believe the head command is not supposed to be used like that anyway. It can read the whole file, or whatever it wants, depending on the implementation. I've just tested on AIX, and there the head command reads everything either way, even with the -c argument (the same as --bytes for GNU).
You should not expect head to read any specific amount of data.

ghostdog74 10-21-2009 07:49 AM

seriously...all you ever need is awk for what you are doing in post #1

poorman_installer 10-21-2009 08:20 AM

Quote:

Originally Posted by ghostdog74 (Post 3727200)
seriously...all you ever need is awk for what you are doing in post #1

Such a flat and useless statement.

1) I'm not trying to do anything in my first post, just exposing a phenomenon.

2) Although it all actually started with a script I was setting up, you don't know what I needed to do originally, and no, awk wouldn't have solved it aptly.

3) I already said the original problem was solved, so this was clearly a thread for its own sake.

4) You didn't add anything to the core of the discussion, which is not about how to strip the two top lines (that's a no-brainer) but rather about what one expects from piping filters.

So, back from the distracting noise to the real matter: thanks to Agrouf for testing on another Unix. You're probably right about what to expect from head; still, I think it's a pity to break the piping metaphor (one could say this issue violates conservation of matter, water to be precise, to stay with the model), which is so simple and powerful.
It lets you do such things with few keystrokes; I consider it one of the main gems left from the original Unix concepts.

And anyway, even if that can be called merely unexpected behaviour, I strongly suspect that the subsequent malfunctions (see catkin's last post and mine) are due to some bug.

catkin 10-21-2009 08:36 AM

I guess it's not so much defective behaviour as evidence of an asynchronous phenomenon; if any of head, cat or wc finds its stdin empty it will exit, and the pipeline (and sub-shells) are demolished even if the data-generating component has not finished. I have to go now so cannot try it myself; what happens if a sleep 1 is introduced before the data-reading components start? I think that would produce consistent behaviour.
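
The sleep experiment can be sketched as follows, together with its mirror image where the writer pauses instead. This assumes a head that reads from a pipe in blocks larger than the 16 bytes produced here (GNU head does); the one-second delays are arbitrary.

```shell
# Readers start late: all 8 lines are already in the pipe when head first reads.
late_readers=$( ( for I in $(seq 1 8); do echo "$I"; done ) |
                ( sleep 1; head -n 2 > /dev/null; cat ) )

# Writer pauses after line 2: head has its two lines and exits before more data arrives.
slow_writer=$( ( echo 1; echo 2; sleep 1; echo 3; echo 4 ) |
               ( head -n 2 > /dev/null; cat ) )

printf 'late readers: [%s]\n' "$late_readers"
printf 'slow writer:  [%s]\n' "$slow_writer"
```

Under that assumption the first case should consistently leave nothing for cat, and the second should consistently leave lines 3 and 4, so the outcome depends on how much data is waiting in the pipe when head does its read.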


All times are GMT -5.