Old 02-24-2008, 07:55 AM   #1
Clutch2
LQ Newbie
 
Registered: Feb 2008
Location: Northern Michigan
Distribution: Debian on Raspberry Pi's
Posts: 14

Rep: Reputation: 0
Limiting how deep in file grep searches


I have a script that looks for a certain header entry that breaks the statistics script I am running. Unfortunately grep scans each of many files all the way through, even though the offending entry is either in the first 10 or so lines or not in the file at all.

I looked at the man page and didn't see any option to search only the first so many bytes or lines of a file.

Any ideas?

Thanks,

Clutch
 
Old 02-24-2008, 08:01 AM   #2
b0uncer
LQ Guru
 
Registered: Aug 2003
Distribution: CentOS, OS X
Posts: 5,131

Rep: Reputation: Disabled
If you want to search only the first N lines (or the last N), you can use head (or tail) to feed them to grep. For example, with N = 15:
Code:
head -15 myfile | grep yourpattern
tail -15 myfile | grep yourpattern
Haven't tested if it's faster, but could be.
 
Old 02-24-2008, 08:14 AM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Code:
sed -n '5q; /word/p' filename

Prints lines containing "word", quitting when it reaches line 5.
 
Old 02-24-2008, 08:21 AM   #4
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Code:
:time grep oo ~/.slrn/var/killfiled* >/dev/null

real    0m0.092s
user    0m0.084s
sys     0m0.008s

:time head -20 ~/.slrn/var/killfiled* | grep oo >/dev/null

real    0m0.008s
user    0m0.000s
sys     0m0.008s


:time sed -n '/oo/p;20q' ~/.slrn/var/killfiled* >/dev/null

real    0m0.004s
user    0m0.000s
sys     0m0.000s
Hardly solid benchmarking, but that might give an idea. (I redirect to /dev/null because otherwise I'd be timing the terminal drawing time of the avalanche of stuff grep spits out.)

-- Crap. Pixellany beat me to it while I was 'benchmarking'. Ah well - at least the timings might be interesting.

Last edited by slakmagik; 02-24-2008 at 08:22 AM.
 
Old 02-24-2008, 08:42 AM   #5
Clutch2
LQ Newbie
 
Registered: Feb 2008
Location: Northern Michigan
Distribution: Debian on Raspberry Pi's
Posts: 14

Original Poster
Rep: Reputation: 0
Sweet!

head -15 * | grep mypattern works! I'll have to do timings to see how much faster.

Since this is a bunch of Usenet messages that leafnode stored, is there a way to start the search from a given point in the incrementing numeric filenames?

Just to be honest, I'm running this on W2k using Cygwin, though I do have an FC7 box.

BTW, I know there is a command to time a job; what is it?

Clutch

Last edited by Clutch2; 02-24-2008 at 08:44 AM.
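One way to start from a given article number, assuming leafnode keeps each article as a plain numeric filename in the group's spool directory (the path and the 5000 cutoff below are made-up placeholders):
Code:
# search only articles numbered 5000 and above, first 15 lines each
for f in /var/spool/news/alt.example/*; do
    n=${f##*/}                                # strip the directory part
    case $n in ''|*[!0-9]*) continue ;; esac  # skip non-numeric names
    [ "$n" -ge 5000 ] || continue
    head -15 "$f" | grep -q 'mypattern' && echo "$f"
done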
 
Old 02-24-2008, 12:31 PM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Would you believe.........time!!

e.g.:
Code:
time find / -name rumplestiltskin

will tell you how long it takes your computer to learn that that name is nowhere in the system. What a waste, because you already knew that....
 
Old 02-24-2008, 07:33 PM   #7
JWPurple
Member
 
Registered: Feb 2008
Posts: 67

Rep: Reputation: 17
Remember that the first run is probably taking the biggest hit just to load the file data into cache. Subsequent runs of the same command are likely to take much less time because the data is already in cache. So ignore the first run.
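On Linux you can also empty the caches between runs instead of rebooting. A rough sketch (not from this thread; requires root and a kernel that provides /proc/sys/vm/drop_caches):
Code:
sync                               # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes
time grep mypattern myfile > /dev/null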
 
Old 02-25-2008, 08:22 AM   #8
Clutch2
LQ Newbie
 
Registered: Feb 2008
Location: Northern Michigan
Distribution: Debian on Raspberry Pi's
Posts: 14

Original Poster
Rep: Reputation: 0
Well, using head actually increased my times, since I was grepping files from a text-based newsgroup. If it had been a binary group, it would likely have rocked.

On the sed example, can the filename be a wildcard?

Clutch
 
Old 02-25-2008, 10:20 AM   #9
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.

(Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.)
 
Old 02-25-2008, 11:34 AM   #10
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.
What about a find with sed?
Code:
time find ~/.slrn/var -type f -name 'killfiled*' -exec sed -n '/oo/p;20q' {} \;
[edit]
Btw, if the directory structure had a set depth you COULD use sed with wildcards ...

With e.g. ~/news/altlinuxos/2001/, ~/news/slackwareos/2003/, ~/news/awk/2005/ and so on, you could simply do
Code:
sed -n '/oo/p;20q' ~/news/*/*/*
[/edit]


Cheers,
Tink

Last edited by Tinkster; 02-25-2008 at 11:42 AM. Reason: [edit]
 
Old 02-25-2008, 11:37 AM   #11
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Quote:
Originally Posted by digiot View Post
Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.
Well .. if it WAS binary data head would try to find n occurrences of a LF .... which MAY be few and far between in binaries. With some bad luck it may need to grep through the whole 1 MB.



Cheers,
Tink
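If binary files were the concern, capping by bytes instead of lines would sidestep the LF issue. GNU head can limit by bytes rather than lines; the 8 KB figure here is an arbitrary placeholder:
Code:
head -c 8192 myfile | grep mypattern   # scan at most the first 8 KB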
 
Old 02-25-2008, 11:51 AM   #12
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Quote:
Originally Posted by Tinkster View Post
What about a find with sed?
Wow. Time for me to take a break and get some rest.

Quote:
Originally Posted by Tinkster View Post
Well .. if it WAS binary data head would try to find n occurrences of a LF .... which MAY be few and far between in binaries. With some bad luck it may need to grep through the whole 1 MB.
Yeah, good point. So it does relate, but in the reverse sense. 'Course, grep and sed would be in the same boat, I think. Unless that's just the tired talking again.
 
Old 02-25-2008, 02:18 PM   #13
Clutch2
LQ Newbie
 
Registered: Feb 2008
Location: Northern Michigan
Distribution: Debian on Raspberry Pi's
Posts: 14

Original Poster
Rep: Reputation: 0
find -mtime 8 seems to find the files I'm interested in, but it only reports the file names.
That part takes 28 seconds.

How do I connect it to grep in order to get grep to scan each filename output by the find command?

Grepping every file takes about 11 minutes with grep alone, and 15 minutes using head | grep.


Thanks,
Clutch
 
Old 02-25-2008, 02:33 PM   #14
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743Reputation: 743
Look at the -exec option to find.

"man find" for more than you ever wanted to know...
 
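For instance, a sketch combining -exec with the head trick from earlier in the thread (untested; 'mypattern' and the 8-day age are placeholders):
Code:
# print names of files modified 8 days ago whose first 15 lines match
find . -mtime 8 -type f -exec sh -c \
    'head -15 "$1" | grep -q mypattern && echo "$1"' sh {} \;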
Old 02-25-2008, 03:27 PM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4123
If the "depth" of the match is indeterminate at the beginning of the run, maybe use perl.
I did some tests on scanning big files a while back, and perl ran faster than sed with quit (rebooting between every run to avoid cache effects).
If the record count isn't really high (potentially in the hundreds of thousands to millions), it's probably not worth the effort.
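Something along these lines, perhaps; a sketch only, with the pattern and the 20-line cutoff as placeholders (closing ARGV resets $. and skips to the next file):
Code:
perl -ne 'print "$ARGV: $_" if /mypattern/;
          close ARGV if $. >= 20 || eof;' *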
 
  

