LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Old 09-16-2011, 01:08 AM   #31
amcohen
LQ Newbie
 
Registered: Jul 2011
Posts: 4

Rep: Reputation: Disabled

expr is so Bourne shell old school.
Here's a neater way:

#!/bin/ksh
integer count=0
while read line; do
    count=$((count + 1))
done < myfile
echo "The line count is: ${count}"

Note that you don't need any dollar signs within $(( ... ))

Also you could simply do:

echo "The line count is: $(cat myfile | wc -l)"
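For the record, if all you need is the count, you can skip the loop entirely; a minimal sketch (the sample file path is made up for illustration):

```shell
#!/bin/bash
# create a throwaway sample file (hypothetical path, for illustration only)
printf 'one\ntwo\nthree\n' > /tmp/lq_sample.txt

# redirecting the file into wc keeps the filename out of wc's output;
# wrapping it in arithmetic expansion trims the padding some wc versions emit
count=$(( $(wc -l < /tmp/lq_sample.txt) ))
echo "The line count is: ${count}"   # prints: The line count is: 3
```

Note that wc -l counts newlines, so a final line without a trailing newline is not counted.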
 
Old 09-16-2011, 02:23 AM   #32
cristalp
Member
 
Registered: Aug 2011
Distribution: Linux Mint
Posts: 103

Rep: Reputation: Disabled
I do not know exactly what you want to achieve later, but maybe awk's internal for loop may suit?

Code:
awk 'END {for (i = 1; i <= NR; i++) print i}' YOURFILE
which gives the same printout. I do not know if it fits your further goal.
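If only the final number matters, awk's END block gives it in one shot; a sketch, with a made-up sample file standing in for YOURFILE:

```shell
#!/bin/bash
printf 'a\nb\nc\n' > /tmp/lq_awk_sample.txt   # hypothetical sample input

# NR is awk's current record number; in the END block it holds the total
awk 'END { print "The line count is:", NR }' /tmp/lq_awk_sample.txt
# prints: The line count is: 3
```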

Last edited by cristalp; 11-03-2011 at 11:16 AM.
 
Old 09-16-2011, 02:25 AM   #33
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 548

Rep: Reputation: 74
yes, old thread; but it was fresh today; so I read it...

1: bash does indeed spawn subshells when a pipe is involved; ksh does not; so I often fall back to ksh for such things.
2: declaring variables is not really required; it can sometimes be handy though.

amcohen's script can therefore be altered to:
Code:
#!/bin/ksh

count=0
while read line
do
    ((count+=1))
done < filename

echo "The line count is: $count"
This should work for bash as well, as in this case a pipe is not involved.

In bash with a pipe you might be able to do it like this:

Code:
count=$(cat file | while read line; do ((count+=1)); echo $count; done | tail -n 1)
echo $count
stupid, I know...
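A less convoluted bash alternative, for what it's worth, is process substitution, which keeps the loop in the current shell so count survives (a sketch; the sample file is made up):

```shell
#!/bin/bash
printf 'x\ny\n' > /tmp/lq_psub_sample.txt   # hypothetical sample input

count=0
# < <(...) feeds the loop from a process without a pipeline,
# so the while body runs in the current shell and count is kept
while read -r line; do
    ((count+=1))
done < <(cat /tmp/lq_psub_sample.txt)

echo "The line count is: $count"   # prints: The line count is: 2
```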
 
Old 09-16-2011, 03:52 AM   #34
R.Hicks
LQ Newbie
 
Registered: Nov 2007
Distribution: RHEL3AS/ES & RHEL4AS/ES
Posts: 10

Rep: Reputation: 1
While read line is evil.

Evil.

Granted, what follows involves processing a large text file (~3 million lines), but given the difference between the two methods I will almost never use while read line.

While Read Line - Processing a single file containing ~3 million lines, took over 50 hours (ended up giving up after that point).
Awk - Processing the same file. 90 seconds including a bit extra sed processing tacked onto the end.

Granted, on tiny files you will notice almost no difference, but on larger text files the difference is huge.

Last edited by R.Hicks; 09-16-2011 at 03:55 AM.
 
1 members found this post helpful.
Old 09-17-2011, 10:28 PM   #35
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by R.Hicks View Post
While Read Line - Processing a single file containing ~3 million lines, took over 50 hours (ended up giving up after that point).
Awk - Processing the same file. 90 seconds including a bit extra sed processing tacked onto the end.
These two are functionally different. If you performed the same process in both cases you would have had to invoke awk 3 million times, unless the awk functionality you used was trivial or you emulated awk with shell code. There is also a very limited set of things you can do with text in awk, compared to what you have access to from the shell (e.g. awk and other things). while read line doesn't cause line reading to be 2000x slower, and it certainly isn't used to manipulate text. Maybe you accidentally read from the terminal and it took you 50h to Ctrl+C out of it.
Kevin Barry

PS I don't even think I was using Linux when this thread was started...

Last edited by ta0kira; 09-17-2011 at 10:30 PM.
 
Old 09-18-2011, 06:46 AM   #36
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
@Ramurd

AIUI, ksh does fork off subshells for piped commands if there's more than one. It just runs the last command in the chain in the current environment, so only environmental changes made in that last command will be available outside the chain.

Bash 4.2 has also finally implemented the same feature, by the way, when you enable the new lastpipe shell option.
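The lastpipe option can be checked with a quick sketch (bash 4.2+; it only takes effect when job control is off, i.e. in a non-interactive script):

```shell
#!/bin/bash
shopt -s lastpipe   # run the last command of a pipeline in the current shell

count=0
printf 'a\nb\nc\n' | while read -r line; do
    ((count+=1))
done

# without lastpipe this would print 0, because the loop would
# have run in a subshell
echo "count = $count"   # prints: count = 3
```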


@R.Hicks

It's not nice to go around calling something "evil" just because you didn't properly understand its strengths and weaknesses, and tried to use it to do something it wasn't really designed for.

Loops are simply a kind of flow control. They're used to execute a command or group of commands sequentially on a series of entries, or until a defined condition is reached. So of course using them to manipulate millions of lines in a file is going to take hours to process.

Where while+read is most useful is when processing lists of filenames, the output of other commands, and other similar situations where the number of iterations can be reasonably defined. Its use as a text manipulation tool is a secondary feature at best, and even then it's most efficient when the manipulations can be done entirely with built-in shell features like parameter substitutions.
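For the filename-list case, a robust while+read pattern is null-delimited input from find, which copes with spaces (and even newlines) in names; a sketch using a made-up demo directory:

```shell
#!/bin/bash
dir=/tmp/lq_demo_dir                 # hypothetical directory, for illustration
mkdir -p "$dir"
touch "$dir/a file" "$dir/b"

count=0
# find -print0 emits NUL-terminated names; read -d '' consumes them,
# so any legal filename passes through intact
while IFS= read -r -d '' f; do
    ((count+=1))
    printf 'found: %s\n' "$f"
done < <(find "$dir" -type f -print0)

echo "files seen: $count"
```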
 
1 members found this post helpful.
Old 09-19-2011, 04:50 AM   #37
R.Hicks
LQ Newbie
 
Registered: Nov 2007
Distribution: RHEL3AS/ES & RHEL4AS/ES
Posts: 10

Rep: Reputation: 1
Quote:
Originally Posted by ta0kira View Post
These two are functionally different. If you performed the same process in both cases you would have had to invoke awk 3 million times, unless the awk functionality you used was trivial or you emulated awk with shell code. There is also a very limited set of things you can do with text in awk, compared to what you have access to from the shell (e.g. awk and other things). while read line doesn't cause line reading to be 2000x slower, and it certainly isn't used to manipulate text. Maybe you accidentally read from the terminal and it took you 50h to Ctrl+C out of it.
Kevin Barry

PS I don't even think I was using Linux when this thread was started...
Nice @ the reading the terminal line. Nope, not the case.

The script I originally used worked flawlessly (taking ~60 seconds or so) for smaller files (20,000 lines or so). The 3 million lines came about because the server was restarted unexpectedly and this script was not included as part of a cronjob, so a backlog occurred.

Quote:
@R.Hicks

It's not nice to go around calling something "evil" just because you didn't properly understand its strengths and weaknesses, and tried to use it to do something it wasn't really designed for.
While Read Line is not meant for parsing large text files, which is the point I was illustrating above. You seem to agree with me on this.

Quote:
Loops are simply a kind of flow control. They're used to execute a command or group of commands sequentially on a series of entries, or until a defined condition is reached. So of course using them to manipulate millions of lines in a file is going to take hours to process.

Where while+read is most useful is when processing lists of filenames,
It's still a list. Depending on the length and processing you're doing, while read line is not useful at all.

Quote:
the output of other commands, and other similar situations where the number of iterations can be reasonably defined. Its use as a text manipulation tool is a secondary feature at best, and even then it's most efficient when the manipulations can be done entirely with built-in shell features like parameter substitutions.
The manipulations were pretty much all done with built-ins, using bash's own string manipulation such as ${var:3:9}. Granted, there were quite a few of them; however, replacing the entire script with AWK, as I say, reduced the processing time for large text files from hours to seconds.

I stand by my initial comments. Processing large text files with while read line is EVIL, and my experience means that whenever the size of a text file is not known beforehand, I prefer to do whatever string manipulation I need in bulk, using AWK.
 
Old 09-20-2011, 01:01 AM   #38
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028
It's got nothing to do with 'while read line'.
The point is that bash (or any shell) interprets the loop command by command; awk (& Perl) compile the script once and run the loop internally.
The first program I wrote in Perl at work was indeed to do some text manipulations on large text files.
The (ksh) solution worked, but was very slow.
Even an almost line-by-line translation to Perl dropped it from, iirc, several tens of minutes to a few seconds (this was a long time ago, ~10 yrs; my memory is a bit vague on the fine detail).

Note that to get the max speedup you'd want to do the entire thing in awk or Perl, not start invoking them just to do the fiddly bits.
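To make the "do the whole thing in awk" point concrete, here is a sketch where one awk process does a per-line manipulation that a shell loop would otherwise fork a process for millions of times (the field layout is invented for illustration):

```shell
#!/bin/bash
printf 'alpha,1\nbeta,2\n' > /tmp/lq_awk_job.txt   # hypothetical input

# a single awk process reads every line; no per-line fork/exec overhead
awk -F, '{ print toupper($1), $2 * 10 }' /tmp/lq_awk_job.txt
# prints:
# ALPHA 10
# BETA 20
```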
 
Old 09-20-2011, 01:11 AM   #39
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 548

Rep: Reputation: 74
Quote:
Originally Posted by David the H. View Post
@Ramurd

AIUI, ksh does fork off subshells for piped commands if there's more than one. It just runs the last command in the chain in the current environment, so only environmental changes made in that last command will be available outside the chain.

Bash 4.2 has also finally implemented the same feature, by the way, when you enable the new lastpipe shell option.
Nice to learn a few new things:
0) AIUI is a new abbreviation for me ;-)
1) I didn't know ksh still forks subshells for all but the last command in the chain; good to know!
2) Didn't know either that bash since 4.2 found a way around this issue. Is it also only the last command in the chain that runs in the current shell?

Nice constructive comments. While I would agree that processing a full 3-million-line text file is not really a shell task, and I would indeed resort to other solutions than a full loop in a shell, I guess I'd write a C program for it instead. (I'm not very fluent with awk and perl, I fear.)
 
Old 09-20-2011, 06:00 AM   #40
sag47
Senior Member
 
Registered: Sep 2009
Location: Philly, PA
Distribution: Kubuntu x64, RHEL, Fedora Core, FreeBSD, Windows x64
Posts: 1,422
Blog Entries: 33

Rep: Reputation: 356
I personally use bash, and/or Python, and/or R (statistics). If the 3M lines of text can be processed independently, then I would use the split command to split the file into several parts, possibly in numbered folders (depending on how many procs you have), then launch all the shells at once in parallel and combine the results at the end. A lot of the time (such as with scientific/genome data) each file requires a header, but that's easy enough to process. Then have a shell script which fires off all the processes.

By doing this I have cut 5 hrs of processing down to 10 minutes on a 64-processor machine. That's outside the scope of the OP, but in your case it's why I made this recommendation.
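The split-and-parallelize approach can be sketched like this (GNU split's -n l/4 line-based chunking is assumed, and wc -l stands in for the real per-chunk work):

```shell
#!/bin/bash
seq 1 100 > /tmp/lq_big.txt        # hypothetical big input, 100 lines

workdir=$(mktemp -d)
# split into 4 roughly equal line-based chunks: chunk_00 .. chunk_03
split -n l/4 -d /tmp/lq_big.txt "$workdir/chunk_"

# run the per-chunk processing in parallel background jobs
for chunk in "$workdir"/chunk_*; do
    wc -l < "$chunk" > "$chunk.out" &
done
wait   # block until every background job has finished

# combine the partial results
cat "$workdir"/chunk_*.out
```

On a real dataset, the wc call would be the heavy per-chunk script, and the combine step would merge its outputs.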
 
Old 09-20-2011, 08:38 PM   #41
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028
@Ramurd: If you know C, you'll pick up Perl easily enough. It's a lot like C, but easier.
I did about 8-9 yrs in C, then switched to Perl.
Runtime is a bit slower (~80-90% as fast as C), but programming is much faster and it's easier to make resilient, as Perl strings, arrays, buffers etc. are all handled for you, i.e. no 'writing off the end' by accident.
I also think its references are easier to follow than C's pointer notation.

C is a good lang, but you don't need to go down to that level of detail for a lot of programming.
 
1 members found this post helpful.
Old 10-09-2011, 06:12 AM   #42
kvmreddy
LQ Newbie
 
Registered: Aug 2009
Posts: 15

Rep: Reputation: 3

You can do it in more than one way: using loops, exec, awk, and redirection. The choice depends on the requirement.
Check the link below for the different methods, along with runtime statistics.


How to read a file line by line in a shell script
 
2 members found this post helpful.
Old 11-02-2011, 11:32 PM   #43
allanf
Member
 
Registered: Sep 2008
Location: MN
Distribution: Gentoo, Fedora, Suse, Slackware, Debian, CentOS
Posts: 97
Blog Entries: 1

Rep: Reputation: 19
Quote:
Originally Posted by Darren[UoW] View Post
I am using the following code to read line by line from a file; however, my problem is that $value=0 at the end of the loop, possibly because bash has created a subshell for the loop or something similar. How can I solve this?


value=0

while read line
do
    value=`expr $value + 1`
    echo $value
done < "myfile"

echo $value


Note: This example just counts the number of lines; I actually want to do more complex processing than this, so 'wc' is not an alternative, nor is perl, I'm afraid.


Thanks Darren.
BASH has math operations built in, so the `expr $value + 1` call is not needed; you can use $((value + 1)) or let instead.
Try this:
Code:
let counter=0
echo $counter          # 0
let counter+=5
echo $counter          # 5
let counter+=10
echo $counter          # 15
let counter=counter/5
echo $counter          # 3 (integer division)
let counter=counter*6+3
let delta=13
let counter=counter/7+delta
echo $counter          # 21/7 + 13 = 16
 
0 members found this post helpful.
Old 11-04-2011, 06:19 AM   #44
Ramurd
Member
 
Registered: Mar 2009
Location: Rotterdam, the Netherlands
Distribution: Slackwarelinux
Posts: 548

Rep: Reputation: 74
I don't want to be mean, but run this script and see the difference in speed:

Code:
#!/bin/bash

bylet()
{
        val=0
        iter=1000000
        for((i=0;i<$iter;i++))
        do
                let val+=${i}
        done

        echo ${val}
}

other()
{
        val=0
        iter=1000000
        for((i=0;i<$iter;i++))
        do
                ((val+=i))
        done
        echo ${val}
}
time bylet
time other
I got the significant difference of:
./speed.sh
499999500000

real 0m12.808s
user 0m12.540s
sys 0m0.251s
499999500000

real 0m8.609s
user 0m8.363s
sys 0m0.236s

I guess you can safely state that (( )) is way faster than using let

Last edited by Ramurd; 11-04-2011 at 06:20 AM. Reason: whoops; wrong tags for the code
 
2 members found this post helpful.
Old 11-05-2011, 12:56 AM   #45
allanf
Member
 
Registered: Sep 2008
Location: MN
Distribution: Gentoo, Fedora, Suse, Slackware, Debian, CentOS
Posts: 97
Blog Entries: 1

Rep: Reputation: 19
Quote:
Originally Posted by Ramurd View Post
I don't want to be mean, but run this script and see the difference in speed:

Code:
#!/bin/bash

bylet()
{
        val=0
        iter=1000000
        for((i=0;i<$iter;i++))
        do
                let val+=${i}
        done

        echo ${val}
}

other()
{
        val=0
        iter=1000000
        for((i=0;i<$iter;i++))
        do
                ((val+=i))
        done
        echo ${val}
}
time bylet
time other
I got the significant difference of:
./speed.sh
499999500000

real 0m12.808s
user 0m12.540s
sys 0m0.251s
499999500000

real 0m8.609s
user 0m8.363s
sys 0m0.236s

I guess you can safely state that (( )) is way faster than using let
I modified the code to use "let val+=i" rather than "let val+=${i}"; the extra de-reference from using "${i}" where plain "i" works accounted for a lot of the time difference. The times:

Code:
499999500000

real    0m7.804s
user    0m7.688s
sys     0m0.112s
499999500000

real    0m6.199s
user    0m6.082s
sys     0m0.112s
I then changed the for to be "for i in {0..999999}" in bylet and got times of:
Code:
499999500000

real    0m5.990s
user    0m5.897s
sys     0m0.090s
499999500000

real    0m6.385s
user    0m6.256s
sys     0m0.124s
With both for statements changed the times are:
Code:
499999500000

real    0m5.820s
user    0m5.733s
sys     0m0.083s
499999500000

real    0m4.269s
user    0m4.231s
sys     0m0.036s
From these I would say that "(( ))" has a slight edge, but the number of de-references (using ${i} where it is not needed) really affects the times. Also, the plain "for i in {0..999999}" form performs much faster than
Code:
        iter=1000000
        for((i=0;i<$iter;i++))
because the latter must do a compare, a de-reference (i.e. $iter), and an increment on each iteration.
The use of
Code:
        for i in {0..999999}
just walks the set.


As I said at the start, I agree that "(( ... ))" has a slight edge over the "let" statement. The difference is not as big as you showed, though, since you used "${i}" where only "i" was needed; fixing that narrows the gap greatly, so that the measured time reflects the cost of the math itself rather than the extra de-references.

Last edited by allanf; 11-05-2011 at 06:17 PM. Reason: Put in more verbage....
 
1 members found this post helpful.
  

