LinuxQuestions.org
Red Hat This forum is for the discussion of Red Hat Linux.

Old 09-11-2007, 06:27 PM   #1
quackerjack_98
LQ Newbie
 
Registered: Aug 2006
Posts: 12

Rep: Reputation: 0
fast line count for large files?


does anyone know of a way to get a file line count for large files that doesn't use "wc -l" or "grep -c" etc.?
 
Old 09-11-2007, 07:14 PM   #2
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 115
there are countless ways to do that.

Is this homework?
 
Old 09-12-2007, 08:37 AM   #3
quackerjack_98
LQ Newbie
 
Registered: Aug 2006
Posts: 12

Original Poster
Rep: Reputation: 0
I'm just looking for a better way to count the lines in a file that grows to several hundred MB over the course of a day. It needs to take seconds or less, can't require an editor, and has to be command line.
 
Old 09-12-2007, 08:58 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2390
Hi,

Here's one way that seems to be faster than wc -l.

sed -n '$=' infile

I don't know if this is also true for large (a few hundred MB or larger) files.

Here's one using perl, maybe not the fastest but it will be able to handle very large files:

perl -lne 'END { print $. }' < infile

Anyway, hope this helps.

Last edited by druuna; 09-12-2007 at 09:03 AM.
 
Old 09-12-2007, 10:01 AM   #5
jiml8
Senior Member
 
Registered: Sep 2003
Posts: 3,171

Rep: Reputation: 115
*shrug*. This one seems to run a bit faster than the sed one in the previous post (at least, on my system). More complicated, too.

nl infile | tac | sed -n 1p | awk '{print $1}'
 
Old 09-12-2007, 10:53 AM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2390
Hi,

Creative!

The execution speed depends on a lot of things and my result (or yours) may not be correct for others. Here are some anyway:

nl infile | tac | sed -n 1p | awk '{print $1}' vs sed -n '$=' infile vs wc -l infile
infile: 500000 lines (half a million), 18 MB flat text file.

5 runs each, only last 4 are counted.

0.024 sec (avg) - wc -l infile
0.121 sec (avg) - sed -n '$=' infile
0.396 sec (avg) - nl infile | tac | sed -n 1p | awk '{print $1}'

On my box wc -l wins hands down. But that was to be expected; it was written especially for counting lines/words/bytes.

I disagree about the 'more complicated' part. How can one small, simple command be more complicated than four different piped commands?
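For anyone who wants to reproduce a comparison like this, here is a self-contained sketch. The demo file and its size are made up; wrap each command in time to get numbers like the ones above (absolute timings will differ per machine):

```shell
# Build a demo file of half a million numbered lines, then count it three ways.
seq 1 500000 > infile

wc -l < infile                                   # prints 500000; purpose-built for this
sed -n '$=' infile                               # prints 500000; '=' prints the line number of '$' (the last line)
nl infile | tac | sed -n 1p | awk '{print $1}'   # prints 500000; number, reverse, take the top line's number
```

All three agree on the count; they only differ in how much work they do to get there.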

Last edited by druuna; 09-12-2007 at 10:55 AM.
 
1 member found this post helpful.
Old 09-12-2007, 01:36 PM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2390
Hi,

Just came to mind (although kinda obvious): awk '{ x++ } END { print x }' infile

It's a bit faster (0.099 sec) than the sed solution.
 
Old 09-12-2007, 05:29 PM   #8
quackerjack_98
LQ Newbie
 
Registered: Aug 2006
Posts: 12

Original Poster
Rep: Reputation: 0
thanks for all the great ideas.
 
Old 09-12-2007, 06:45 PM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 16,070

Rep: Reputation: 2236
The sed solution offered by druuna just has to be the choice. sed always wins these simple problems - easy to do, easy to remember.
I suspect all the tests above will be invalid - unless specific steps were taken between every run to purge the disk cache.
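On Linux, one way to purge the page cache between runs is the drop_caches knob (the demo file name is made up; writing the knob is root-only, so this sketch guards the attempt):

```shell
# Cold-cache measurement sketch: drop clean cached pages before timing a read.
seq 1 100000 > infile                     # demo file to be counted

sync                                      # flush dirty pages to disk first
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches     # 1=pagecache, 2=dentries+inodes, 3=both
fi
wc -l infile                              # wrap in time(1) for the actual measurement
```

Without a purge like this, the first run pulls the file into the page cache and every later run measures memory speed, not disk speed.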

Edit: still can't understand why the obvious (wc -l) isn't acceptable.

Last edited by syg00; 09-12-2007 at 06:56 PM.
 
Old 09-12-2007, 06:56 PM   #10
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 677
If this is a log file that grows over time, you could use dd with a block size of 1 and an offset equal to the size of the file the last time you ran your check. This will output the new contents of the file, which you could pipe through wc -l.
Record the size of the file each time you run the check.
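A minimal sketch of that incremental approach (the log and state file names are made up; with bs=1, dd's skip is a byte offset, exactly as described):

```shell
# Count only the lines added since the last check, keeping a byte offset between runs.
LOG=demo.log
STATE=demo.offset

printf 'one\ntwo\nthree\n' > "$LOG"        # stand-in for yesterday's log
echo 0 > "$STATE"                          # no bytes have been counted yet

count_new() {
    last=$(cat "$STATE")
    dd if="$LOG" bs=1 skip="$last" 2>/dev/null | wc -l
    wc -c < "$LOG" | tr -d ' ' > "$STATE"  # remember how far we have read
}

count_new                                  # prints 3: the whole file is new
printf 'four\nfive\n' >> "$LOG"            # the log grows during the day
count_new                                  # prints 2: only the appended lines
```

Keep a running total across checks and you never re-scan the part of the file you have already counted.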

For a file this large, maybe using the size of the file instead of the number of lines would be the information you should be using.

Last edited by jschiwal; 09-12-2007 at 06:57 PM.
 
Old 03-02-2010, 06:37 PM   #11
vouser
LQ Newbie
 
Registered: Mar 2010
Posts: 10

Rep: Reputation: 0
How to find number of lines in a file.

I looked for answers to this question and saw complicated versions 2 inches long. This one came to mind and I tried it on several files:

grep -c "" /home/usr1/my-file.txt

Will this always work?

Thanks to all.
 
Old 03-03-2010, 02:50 AM   #12
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2390
@vouser: Why not use wc -l <file>? wc was written for this task and is by far the fastest.

BTW: Your grep solution does work.
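One caveat worth sketching: the two commands disagree on a file whose last line has no trailing newline, because wc -l counts newline characters while grep -c '' counts lines read.

```shell
# Three lines of text, but deliberately no newline after the last one.
printf 'one\ntwo\nthree' > no-final-newline.txt

wc -l < no-final-newline.txt        # prints 2: only two newline characters exist
grep -c '' no-final-newline.txt     # prints 3: the final unterminated line still counts
```

For well-formed text files (every line newline-terminated) the two always agree.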
 
Old 05-19-2010, 02:58 PM   #13
kenosando
LQ Newbie
 
Registered: Oct 2009
Distribution: Ubuntu 9.11
Posts: 10

Rep: Reputation: 0
Code:
awk 'END {print NR}' file
 
Old 12-01-2010, 07:04 PM   #14
seemit
LQ Newbie
 
Registered: Dec 2010
Posts: 1

Rep: Reputation: 1
fast way to get line count

If you just need an estimate of the count and not the exact count, there is always something called "Mathematics".

1. Do a head -1 FILE > NEW_FILE
2. Get the size of NEW_FILE
3. Get the size of FILE
4. Divide the result of 3 by the result of 2 to get an estimated row count.

To get a better estimate, do a head -n with more lines and adjust the division accordingly.

I have an almost 46 GB file still growing in size; doing a wc -l or any other method would take quite a while. I do the above to estimate when my process will finish (an Oracle spool of 300 million records). Just a thought....
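The four steps can be sketched as below (the file name and contents are made up; with identical-width demo lines the estimate comes out exact, while a real log will only be approximate):

```shell
# Estimate line count from average line width: total bytes / (sample bytes / sample lines).
FILE=demo.txt
yes 'an average-looking log record' | head -n 100000 > "$FILE"

SAMPLE=1000                                       # bigger sample -> better average
sample_bytes=$(head -n "$SAMPLE" "$FILE" | wc -c)
total_bytes=$(wc -c < "$FILE")
echo $(( total_bytes * SAMPLE / sample_bytes ))   # prints 100000 for this demo
```

The whole thing reads only the first thousand lines plus one stat of the file size, so it costs next to nothing even on a 46 GB file.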
 
1 member found this post helpful.
Old 02-02-2011, 11:19 AM   #15
lcoker
LQ Newbie
 
Registered: Feb 2011
Posts: 1

Rep: Reputation: 0
wc -l counts the number of newlines and as such has to scan to the end of each line. To speed it up you can first use cut to reduce the lines down to a couple of characters and pipe that into wc. This can cut the time down to less than half of what it takes to just run wc.

e.g. cut -c -2 hugefile.txt | wc -l
 
  

