LinuxQuestions.org
Linux - Software: This forum is for software issues.
06-23-2006, 09:13 PM   #1
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Rep: Reputation: 15
Bash Scripting: Question About tr


Hello, all. I am writing a bash script to simplify a common operation I am performing on my machine. The script executes multiple programs using a given set of command-line parameters and produces a neat and tidy output. This is a fairly common exercise, I understand.

The subprocesses, however, dump large quantities of output to the standard output stream. In an effort to make the output more usable and readable in terms of a status report, I am reprocessing that output. All of the subprocesses use the technique of writing a carriage return instead of a newline and simply rewriting the last line. For example, the string "some garbage 0.00%\rsome more garbage 0.01%\rsome extra junk 0.02%" might be a common substring of the output of one of the subprocesses.

Since there is more than just the state of progress being written, I am piping the output through egrep -o to retrieve only the part I want (in the above example, the expression "[0-9.]+%" would be used to sift out the relevant portion). However, since the line breaks are carriage returns and not newlines, grepping the output doesn't work; grep keeps reading until it finds a newline, which doesn't appear until the subprocess has completed.

So, to address this problem, I tossed a tr between the subprocess and the grep to translate all carriage returns into newlines. This seems to work passably well but the granularity of the progress indicator is rougher than it was in the subprocess. I added --line-buffered to grep and that fixed some things, but it's still quite jerky.
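A minimal sketch of the pipeline described above, fed with the example string from the first paragraph. `fake_progress` is a hypothetical stand-in for the real subprocess, and `grep -E` is used in place of `egrep` (they are equivalent).

```shell
#!/usr/bin/env bash
# fake_progress: hypothetical stand-in for a subprocess that redraws a
# single status line by ending each update with a carriage return.
fake_progress() {
  printf 'some garbage 0.00%%\rsome more garbage 0.01%%\rsome extra junk 0.02%%'
}

# tr turns every CR into a newline so grep sees one record per update;
# grep -Eo (the same as egrep -o) then keeps only the percentage.
fake_progress | tr '\r' '\n' | grep -Eo '[0-9.]+%'
```

Because printf emits the whole string at once, this shows only the transformation (it prints 0.00%, 0.01% and 0.02% on separate lines), not the timing problem the thread goes on to chase.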

My assumption is that tr is buffering more than I'd like and I only get output when its buffer fills up; this would explain the jerkiness, since the output will be processed in bursts in that case. So, the question is: is there any way I can control or affect the size of the buffer that tr uses?

Thanks for reading! I'm pretty sure I'm not going to get this to work without either rewriting tr or processing it character by character in the interpreted script (inefficiency... ick), but I thought I'd ask.

Cheers!
 
06-25-2006, 02:52 PM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,372

Rep: Reputation: 1962
Whilst I've not been able to see anything concrete, tr clearly processes on a per-line basis by default. It wants to obtain an entire field to execute on. I just had a play and, as I'd have thought, it's the occurrence of whatever $IFS contains that defines when it processes its data. For example, if we have a named pipe called test (created with mkfifo test), which we listen to and pipe through to tr:
Code:
tail -f test | tr -d x
and then run a little doodad to enter data into it:
Code:
for i in $(seq 1 10); do echo -n $i > test; sleep 1; done
This shows nothing from tr at all, as no line feeds enter it. Echo a normal line to it, including the usual trailing newline, and all the contents dump out. If you then run it again, but with a different IFS value:
Code:
IFS=5; for i in $(seq 1 10); do echo -n $i > test; sleep 1; done
Then you see 1 to 4 appear at once, then 6 to 10 appear after you echo anything else to the pipe. So if you can find something preferable to use over a newline, you should have a clearer buffer, but I'm half thinking I'm going off in a direction that means nothing to what you're really asking...
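One way to check what IFS is actually changing in that experiment: bash consults IFS when it word-splits the unquoted $(seq 1 10) into loop words, while tr never reads it. The sketch below (with a hypothetical helper, `count_words`) just counts loop iterations under both settings. With IFS=5 the body runs only twice, and the first word, "1\n2\n3\n4\n", already contains newlines, which would account for 1 to 4 arriving together.

```shell
#!/usr/bin/env bash
# count_words: hypothetical helper that counts how many loop words the
# unquoted $(seq 1 10) expands to when IFS is set to $1.
count_words() {
  local IFS="$1"
  local n=0 w
  for w in $(seq 1 10); do
    n=$((n + 1))
  done
  echo "$n"
}

count_words $' \t\n'   # default IFS: 10 words, "1" through "10"
count_words 5          # IFS=5: 2 words, "1\n2\n3\n4\n" and "\n6\n7\n8\n9\n10"
```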
 
06-25-2006, 05:34 PM   #3
bigrigdriver
LQ Addict
 
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian Squeeze
Posts: 5,739

Rep: Reputation: 298
I don't know enough about shell scripting, and less about calls to C modules, but I wonder if it would be possible to make calls to C flush functions to flush the buffer after each newline or carriage return.

To find out which you have installed on your system, from a console, do 'apropos flush'. Then start reading. There might be something you can use to flush the buffer at each newline.

I may be way off base. If I am, I apologize for my illiteracy.
 
06-26-2006, 01:22 AM   #4
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,372

Rep: Reputation: 1962
Well, it looks like each newline will flush the buffer. I'd suggest trying your own experiments with real output to check this.
 
06-26-2006, 06:09 AM   #5
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Original Poster
Rep: Reputation: 15
First, thanks to both of you for your replies. bigrigdriver: I'm afraid that I do not understand what you mean at all. I suppose I should first find out what apropos is. Is this at all like KDE's dcop?

acid_kewpie: I tried the bash snippets you posted above and found them quite interesting. What precisely is IFS doing here? Do I take it that echo is using IFS to determine when to flush the buffer? Or is it something lower level than echo? Will I have to hope that the underlying processes will respect the contents of the IFS environment variable?

It seems possible that the problem will be resolved simply by adding $'\r' to the end of the IFS environment variable. I'm gonna go play with it and see if I can make it dance. :-D

Thanks again!
 
06-26-2006, 06:36 AM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
I tried a little experiment. First I wrote a one-liner that simulates a progress indicator that uses "\r" to reprint on the same line:
> for (( progress=0; progress<101; progress++ )); do sleep 2; echo -ne "progress: ${progress}\r"; done
progress: 3

Next, I changed the value of IFS so that the return character would be used to separate fields. This was piped to your "tr '\r' '\n'" filter. It would then print out on individual lines at each iteration of the loop.
> for (( progress=0; progress<101; progress++ )); do sleep 2; IFS=\005; echo -ne "progress: ${progress}\r" | tr '\r' '\n'; done
progress: 0
progress: 1
progress: 2
progress: 3
 
06-26-2006, 06:41 AM   #7
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Original Poster
Rep: Reputation: 15
I've corrected my analysis of the problem, actually. I ran the following snippet:
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 1; done | (tr $'\r' $'\n')
which behaved as I wanted; each entry was translated immediately and written to the standard output stream with a newline instead of the original carriage return. Then, I tried
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 1; done | (tr $'\r' $'\n' | egrep --line-buffered -o '[0-9]')
which dumped the entire ten lines all at once. I must conclude, therefore, that the problem is grep and not tr, as I had originally thought.

This confuses me. I am using --line-buffered, which I thought would fix any buffering issues created by grep. I'll keep digging.

jschiwal: I'm not sure I fully understand your test. What does it illustrate? What is character 5?

Thanks again for your help, all!
 
06-26-2006, 06:56 AM   #8
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Original Poster
Rep: Reputation: 15
I've performed a couple more tests:
Code:
n=0; while [ "$n" -lt "10000" ]; do echo -en "$n\r"; sleep 0.0001; n=$(($n+1)); done | (tr $'\r' $'\n' | egrep --line-buffered -o '[0-9]+')
n=0; while [ "$n" -lt "10000" ]; do echo -en "$n\r"; sleep 0.0001; n=$(($n+1)); done | (tr $'\r' $'\n' | egrep -o '[0-9]+')
The purpose of the above two one-liners is to attempt to determine the number of lines being buffered. As you can see, the only difference in the two commands is that the execution of egrep in one uses the "--line-buffered" flag whilst the other does not.

I executed each command several times. During each execution, I observed when the display changed and made a note of the bottom number. In both cases, the display updates came in bursts... the same bursts. Both commands had bulk output which ended at 1040, 1859, 2678, 3497, 4317, and so on. The bursts are about 820 lines apart (1040 to 1859 is 819) and are consistent and repeatable. The presence of the "--line-buffered" flag did not seem to have any effect on this behavior.

I hope I have described this test sufficiently. Did it make any sense? Does it seem like it's producing valuable data? It suggests to me that the line buffering flag on grep either does not behave the way I think it does or does not work at all.
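The burst boundaries reported above look like they line up with a fixed 4096-byte block; that is an assumption worth checking rather than a conclusion. tr's output for the test loop is just "0\n1\n2\n...", so this sketch regenerates the same stream with seq, cuts it at multiples of 4096 bytes, and prints the last complete line inside each block.

```shell
#!/usr/bin/env bash
# Assumption: the stream is "0\n1\n2\n...\n9999\n" (what tr emits for the
# test loop) and it reaches grep in fixed 4096-byte blocks.
block=4096
for k in 1 2 3 4 5; do
  # Count newline-terminated lines inside the first k blocks.
  lines=$(seq 0 9999 | head -c $((k * block)) | wc -l)
  # Numbering starts at 0, so the last complete line is lines - 1.
  echo "burst $k ends at $((lines - 1))"
done
```

It prints boundaries of 1040, 1859, 2678, 3497 and 4317, the same numbers observed above, which points at a 4 KiB block buffer on tr's output rather than at grep. grep's own output went straight to the terminal in both commands, where it is normally line-buffered already; that may be why --line-buffered made no visible difference.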

I'd appreciate any and all suggestions. At the moment, it would seem that I might have to use a different line parser. I imagine sed and awk could both approach this task as well...

Just in case anyone is wondering, the egrep -o is being used to separate a match for a regexp pattern from the rest of the line it is in. For example, for the line
Code:
Current progress:  0.57%   Estimated time remaining: way too long.  Blah blah.
would be filtered with the expression "[0-9.]+%" (as I mentioned in the OP) to produce "0.57%". I emphasize this since, in light of the above example, the presence of the call to egrep looks kind of useless.

Cheers!
 
06-26-2006, 08:32 AM   #9
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Change the \005 to \r.
for (( progress=0; progress<101; progress++ )); do sleep 2; IFS=$'\r'; echo -ne "progress: ${progress}\r" | tr '\r' '\n'; done

Changing IFS allows tr to translate from a CR to a NL without having to wait for a final NL.

Keep in mind that the console is displaying the output of stderr rather than stdout. Or the message is sent to /dev/tty. This allows messages to be displayed while operating on stdin and outputting to stdout.

There is a gotcha in changing IFS. You need to change it back before it causes problems elsewhere. For example, try the one-liner. Then type "ls". Surprise! Now try "/bin/ls". That worked. The reason is that the first version is aliased to something like: alias ls='/bin/ls $LS_OPTIONS'. The space no longer separates command-line arguments.
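A sketch of the two usual ways to keep an IFS change from leaking into the rest of a script. The variable names (`old_ifs`, `user`, `rest`) are made up for illustration.

```shell
#!/usr/bin/env bash
# Option 1: save IFS, change it for a block of code, then restore it.
old_ifs=$IFS
IFS=$'\r'
# ... commands that should split on carriage returns go here ...
IFS=$old_ifs

# Option 2: an assignment written in front of a command applies to that
# command only, so the shell's own IFS is never touched.
IFS=: read -r user rest <<< "root:x:0:0:root:/root:/bin/bash"
echo "$user"            # root

# Ordinary word splitting still uses the default space/tab/newline IFS.
set -- $(echo a b c)
echo "$#"               # 3
```

The prefix form is the safer habit, since an early exit or error between the save and restore in the first form would leave IFS changed.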

I'm not certain why you want to be doing this. Is it because you have several programs each sending progress indicators to stderr and you want to combine them into a loggable form? There are programs that have an option to use a loggable output. Others have a quiet option.

Be careful handling stderr. You don't want to do something that will insert it into the data stream. ( That sounds like something from TRON! )
 
06-26-2006, 08:48 AM   #10
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
I think the following loop should work as well:
Code:
(echo -ne "first line\r"; sleep 1; echo -ne "second line\r")| while read -d $'\r'; do echo "$REPLY"; done | grep -o "line"
 
06-26-2006, 09:54 AM   #11
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Correction.

I looked at my original test line again, substituting a longer message. tr worked without having to change IFS. I guess I didn't read through message #7 carefully.
Grep would be another story however. Also, substituting one pattern for another is more up sed's alley. But sed is also line oriented.

I wrote a short program to output the indicator. Then I used a one-liner to shorten the indicator. The output is almost funny. It runs in 50 to 100 spurts. So I think I have a better idea of what you are trying to do.
At first I thought you wanted to change it so that there was a line printed for each change.

jschiwal@hpamd64:~/Documents> cat progtest
#! /bin/bash
for (( progress=0; progress<10000; progress++ )); do
sleep 0.01
echo -ne "Current Progress ${progress} $(date -R)\r"
done
jschiwal@hpamd64:~/Documents> IFS=$'\r'; ./progtest | tr '\r' '\n' | sed -u 's/^\(Current Progress [0-9][0-9]*\).*/\1/' | tr '\n' '\r' ; echo
Current Progress 199

I think that the chunkiness is caused by buffering in the pipe.
echo "$(ulimit -p)*512" | bc
4096

Last edited by jschiwal; 06-26-2006 at 09:55 AM.
 
06-26-2006, 10:35 PM   #12
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Original Poster
Rep: Reputation: 15
jschiwal: Excellent deduction!

After you mentioned that and in light of the performance of
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 1; done | (tr $'\r' $'\n' | egrep --line-buffered -o '[0-9]')
(which was choppy) and
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (tr $'\r' $'\n')
(which was good), I executed
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (tr $'\r' $'\n' | cat)
which turned out to be just as choppy as the one with grep. It looks like the pipe is what's causing the trouble after all.

That leads me to a fascinating little question... how do I change the pipe? Do I have to create the pipe myself using mkfifo and specify some special parameters? Is there any way I can change how much the pipe is buffering?
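Since tr is smooth when it writes straight to the terminal but choppy once its output is piped into anything, even cat, a likely culprit is tr's C stdio output buffer rather than the kernel pipe: stdio typically block-buffers stdout when it is a pipe. Later releases of GNU coreutils (well after this thread) added stdbuf to override that for stdio-based programs. A sketch, assuming GNU coreutils with stdbuf installed; `progress` is a hypothetical stand-in for the real subprocess.

```shell
#!/usr/bin/env bash
# progress: hypothetical source, one CR-terminated update every 0.25 s,
# like the test loops in this thread.
progress() {
  for n in $(seq 0 9); do printf '%s\r' "$n"; sleep 0.25; done
}

# stdbuf -oL makes tr's stdout line-buffered, so every translated update is
# handed to grep immediately instead of waiting for a full buffer.
# --line-buffered only matters if grep's own output is piped onward.
progress | stdbuf -oL tr '\r' '\n' | grep --line-buffered -Eo '[0-9]+'
```

stdbuf only helps programs that use stdio's default buffering; a program that sets its own buffers, or bypasses stdio, is unaffected.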

To answer your question about my rationale: I have multiple programs all of which display their progress indication in a different way. All of them display quite a lot of header information when they are first executed (at least twenty lines) and none of them allow me to suppress that behavior without suppressing progress indication as well. Finally, one of them seems to be writing newlines to standard error as it writes its output, causing the stderr of the progress indicator to contain several thousand newlines by the time the program is finished running, spreading out its display quite a bit. All of this together is rather inconvenient. My intention is to gather the output into a form which is more reportable to a user viewing the execution of my script.

Of course, I'm being a nice scriptwriter and allowing a command line parameter to suppress the output processing behavior if that's necessary. I do this especially in light of the fact that I am reprocessing the subprocesses' standard error streams. I realize this is problematic if something goes wrong with a subprocess; eventually, I hope to both be able to provide intelligent reporting based upon the subprocesses' exit codes as well as direct all of this through tee to produce a copy of each subprocess's output and error streams (again as directed by the command line parameters). However, for most executions of the script, this should not be necessary.

Additionally, it's helping me develop some bash skills I don't usually have the need to expand.

Thanks muchly for your help. Cheers!
 
06-27-2006, 05:56 AM   #13
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 230
Would read w/ the "-d" option help?

In a console or Konsole, see:
Code:
help read
or search the bash man page for "[-t timeout]".
 
06-27-2006, 06:36 AM   #14
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
Quote:
Originally Posted by tvynr
It looks like the pipe is what's causing the trouble after all.
But it's not the pipe alone, have a look at
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (cat | cat)
As for that "read" command: I guess it would help, see my example above
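One reading of this result, offered as a hypothesis: GNU cat copies each chunk it reads straight to its output, so cat | cat stays smooth, whereas programs that write through C stdio, like tr, usually line-buffer stdout only when it is a terminal and block-buffer it when it is a pipe. The sketch below shows the check involved; `describe_stdout` is a made-up helper.

```shell
#!/usr/bin/env bash
# describe_stdout: hypothetical helper that reports what its stdout is.
# [ -t 1 ] is true only when file descriptor 1 is a terminal.
describe_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "pipe or file"
  fi
}

describe_stdout          # "terminal" when run directly in an interactive shell
describe_stdout | cat    # "pipe or file": stdout is now the pipe into cat
```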
 
06-27-2006, 11:50 AM   #15
tvynr
Member
 
Registered: Apr 2004
Distribution: Debian
Posts: 143

Original Poster
Rep: Reputation: 15
That's quite an interesting snippet... I followed up with
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (cat | egrep -o --line-buffered '[0-9]+')
and
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | egrep -o --line-buffered '[0-9]+'
which both perform in the jerky fashion.

Upon reading archtoad6's message and rereading spirit_receiver's earlier post containing the read example, I now understand its intention: replace tr with the while read loop, yes? So I tried the snippet
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (while read -d $'\r' line; do echo "$line"; done) | egrep -o --line-buffered '[0-9]+'
which worked quite nicely. Of course, when I try
Code:
for n in $(seq 0 9); do echo -en "$n\r"; sleep 0.25; done | (tr $'\r' $'\n') | egrep -o --line-buffered '[0-9]+'
I get the unpleasant behavior again.

I have inserted the replacement for tr into my script and everything runs most pleasantly. :-D

In summary, I guess the solution is to replace
Code:
tr "$a" "$b"
with
Code:
while read -d "$a" line; do echo -n "$line$b"; done
whenever this problem crops up (where $a is the character to replace and $b is the character with which to replace it). In my case, $b happens to be $'\n', so I can simplify the echo command.
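A slightly more defensive version of that loop, wrapped in a function (`translate_delim` is a made-up name). `IFS=` and `-r` stop read from trimming surrounding blanks and swallowing backslashes, `|| [ -n "$line" ]` keeps a final record that has no trailing delimiter, and printf sidesteps echo's option parsing. Like the original loop, it assumes $a is a single character, since `read -d` uses only the first character of its argument.

```shell
#!/usr/bin/env bash
# translate_delim: hypothetical helper that copies stdin to stdout,
# replacing each occurrence of the single character $1 with the string $2.
translate_delim() {
  local from="$1" to="$2" line
  # IFS= keeps leading/trailing blanks, -r keeps backslashes, and the
  # [ -n "$line" ] test handles a last record with no trailing delimiter.
  while IFS= read -r -d "$from" line || [ -n "$line" ]; do
    printf '%s%s' "$line" "$to"
  done
}

# Usage: the CR-to-newline case from this thread.
printf '  0.01%%\r  0.02%%\r  0.03%%' | translate_delim $'\r' $'\n'
```

bash reads a pipe one byte at a time here and each printf is written out as soon as it runs, so records come out as they arrive; the price is that it is far slower than tr on large inputs.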

Many thanks to all of you for your explanations and assistance in tracking this down. I'm still quite perplexed by the buffering that the pipe creates, especially considering that it only seems to happen some of the time. However, the display on my script is much smoother and I'm quite satisfied.

Again, thanks for all your help! Cheers!
 
All times are GMT -5.