LinuxQuestions.org
Old 04-01-2016, 11:18 AM   #1
slik
LQ Newbie
 
Registered: Jun 2014
Posts: 4

Rep: Reputation: Disabled
Disastrous performance of 'while read' script


Hi,

I have input looking like:
XP035649954
20160322
20160322
20160324
XP035649953
20160322
20160322
20160324

I want to output every 4 lines in one single line, like:
XP035649954 20160322 20160322 20160324
XP035649953 20160322 20160322 20160324

I wrote a script which does the job, but it has terrible performance. I use the "while read" construct quite often, but it's always slow. I would like to understand why it is performing so badly. Additionally, how can I speed up what I want? Can this be done in awk?


Script I use:
-------------
while read line
do
    out=$(echo $line)
    for i in {1..3}
    do
        read line
        out=$(echo $out $line)
    done
    echo $out >>outfile
done <infile



Thanks in advance
 
Old 04-01-2016, 01:27 PM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,804

Rep: Reputation: 3069
Firstly, please use [code][/code] tags around code / data

Yes this can be done in awk ... look into the NR variable

As for your script, you have a loop inside a loop, so it will naturally take longer than a single loop.
On top of that, you use command substitution around echo to reassign a variable when you could simply assign or append directly:
Code:
out=$(echo $line)
# is just
out=$line

out=$(echo $out $line)
# is just
out="$out $line"
As with the awk solution, you could keep a simple counter and append a newline each time it reaches the chosen number of lines read.

Your initial question talks about outputting the data, but your example writes it to a new file; in that case, build the line as above and simply echo it to the new file each time the counter reaches its target value.
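For what it's worth, the NR hint above can be sketched as a one-liner (a sketch, assuming the input is always a multiple of four lines; the sample input mimics the thread's data):

```shell
# Sample input in the thread's format (ID followed by three dates).
printf '%s\n' XP035649954 20160322 20160322 20160324 > infile

# Print each line followed by a space, except every 4th line,
# which is followed by a newline (NR is awk's current record number).
awk '{ printf "%s%s", $0, (NR % 4 ? " " : "\n") }' infile
```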
 
1 members found this post helpful.
Old 04-01-2016, 01:44 PM   #3
michaelk
Moderator
 
Registered: Aug 2002
Posts: 20,847

Rep: Reputation: 3756
You're using the echo command to remove the newline character and append, which is inefficient. You can use bash's parameter expansion instead:

Code:
while read line
do
    out=${line%$'\n'}
    for i in {1..3}
    do
        read line
        line=${line%$'\n'}
        out=$out" "$line
    done
    echo $out >> outfile
done < infile
On a 4000 line file the time it takes is:
Quote:
real 0m5.730s
user 0m0.680s
sys 0m1.280s

vs
real 0m0.320s
user 0m0.272s
sys 0m0.036s
Note: The first time I ran your script the times were 2x slower. Not sure what changed.

Last edited by michaelk; 04-01-2016 at 01:53 PM.
 
1 members found this post helpful.
Old 04-01-2016, 02:01 PM   #4
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,504

Rep: Reputation: 2062
Quote:
Originally Posted by michaelk View Post
You're using the echo command to remove the newline character, which is inefficient. You can use bash's parameter expansion instead

Code:
while read line
do
    out=${line%$'\n'}
    for i in {1..3}
    do
        read line
        line=${line%$'\n'}
        out=$out" "$line
    done
    echo $out >> outfile
done < infile
That script is inefficiently re-opening the output file for every line written. That's a fairly expensive operation. Try this:
Code:
while read out
do
    for i in {1..3}
    do
        read line
        out="$out $line"
    done
    echo $out
done < infile >outfile
I've also eliminated unnecessary variable manipulation. Time for a 4000 line input file is 32ms.

Last edited by rknichols; 04-01-2016 at 02:02 PM. Reason: typo
 
1 members found this post helpful.
Old 04-01-2016, 02:32 PM   #5
slik
LQ Newbie
 
Registered: Jun 2014
Posts: 4

Original Poster
Rep: Reputation: Disabled
Thanks all for your feedback.
Very helpful comments indeed.
 
Old 04-01-2016, 03:27 PM   #6
michaelk
Moderator
 
Registered: Aug 2002
Posts: 20,847

Rep: Reputation: 3756
Quote:
while read out
do
    for i in {1..3}
    do
        read line
        out="$out $line"
    done
    echo $out
done < infile >outfile
Much better than my script... Although the relative time difference from mine (I'm using a VM) is only 0.11 sec.
 
Old 04-01-2016, 04:06 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,804

Rep: Reputation: 3069
Couple of alternatives:
Code:
#!/usr/bin/env bash

cnt=1

while read line
do
	[[ "$out" ]] && out="$out $line" || out=$line

	if (( cnt++ % 4 == 0 ))
	then
		echo $out
		unset out
	fi
done< infile > outfile

real	0m0.081s
user	0m0.073s
sys	0m0.007s

awk 'ORS = (NR % 4) ? " " : RS' infile > outfile

real	0m0.008s
user	0m0.007s
sys	0m0.000s
Both of the above times are with 4000 lines
 
Old 04-03-2016, 11:05 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,562

Rep: Reputation: 708
Another bash/ksh approach:
Code:
while read a0; read a1; read a2; read a3
do
  echo "$a0 $a1 $a2 $a3"
done < infile
Interesting: with an array it becomes awfully slow!?
Code:
while read A[0]; read A[1]; read A[2]; read A[3]
do
  echo "${A[@]}"
done < infile
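A possible way around the slow per-element reads is bash 4's mapfile (a.k.a. readarray) builtin, which fills a whole array in one call. A minimal sketch (the sample input mimics the thread's data; mapfile itself succeeds even at EOF, so the loop tests the array length):

```shell
#!/usr/bin/env bash
# Sample input: two records of 4 lines each.
printf '%s\n' XP035649954 20160322 20160322 20160324 \
              XP035649953 20160322 20160322 20160324 > infile

# Read up to 4 lines at a time into array A (-t strips newlines);
# stop when a read returns an empty array.
while mapfile -t -n 4 A && (( ${#A[@]} )); do
    echo "${A[*]}"          # join the 4 elements with spaces
done < infile > outfile

cat outfile
```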

Last edited by MadeInGermany; 04-03-2016 at 11:08 AM.
 
Old 04-03-2016, 12:12 PM   #9
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: CentOS
Posts: 4,504

Rep: Reputation: 2062
There are a bazillion ways to do it.
Code:
sed '{N;N;N;s/\n/ /g}' infile >outfile
Just 2 milliseconds for that one with 4000 lines.
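Two more standard-tool variants in the same "bazillion ways" spirit (a sketch; paste is POSIX, and xargs behaves here only because the data contains no quotes or backslashes, which xargs would interpret):

```shell
# Sample input: two records of 4 lines each.
printf '%s\n' XP035649954 20160322 20160322 20160324 \
              XP035649953 20160322 20160322 20160324 > infile

# paste consumes one input line per '-' operand, so four '-' join 4 lines:
paste -d' ' - - - - < infile

# xargs -n4 re-emits its input 4 whitespace-separated tokens per line:
xargs -n4 < infile
```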

Last edited by rknichols; 04-03-2016 at 12:16 PM.
 