LinuxQuestions.org


Clutch2 02-24-2008 07:55 AM

Limiting how deep in file grep searches
 
I have a script that looks for a certain header entry that breaks the statistics script I am running. Unfortunately grep scans each of the many files all the way through, even though the offending entry is either in the first 10 or so lines or not there at all.

I looked at the man page and didn't see any option to limit the search to only so many bytes or lines of a file.

Any ideas?

Thanks,

Clutch

b0uncer 02-24-2008 08:01 AM

If you want to search only the first (or last) N lines, you can use head (or tail) to feed them to grep. For example, with N = 15 (first 15 lines; use tail for the last 15):
Code:

head -15 myfile | grep yourpattern
tail -15 myfile | grep yourpattern

Haven't tested if it's faster, but could be.

pixellany 02-24-2008 08:14 AM

sed -n '5q; /word/p' filename

Prints lines containing "word", until it reaches line 5.

slakmagik 02-24-2008 08:21 AM

Code:

:time grep oo ~/.slrn/var/killfiled* >/dev/null

real    0m0.092s
user    0m0.084s
sys    0m0.008s

:time head -20 ~/.slrn/var/killfiled* | grep oo >/dev/null

real    0m0.008s
user    0m0.000s
sys    0m0.008s


:time sed -n '/oo/p;20q' ~/.slrn/var/killfiled* >/dev/null

real    0m0.004s
user    0m0.000s
sys    0m0.000s

Hardly solid benchmarking, but that might give an idea. (I redirect to /dev/null because otherwise I'd be timing the terminal drawing time of the avalanche of stuff grep spits out.) ;)

-- Crap. Pixellany beat me to it while I was 'benchmarking'. Ah well - at least the timings might be interesting.

Clutch2 02-24-2008 08:42 AM

Sweet!

head -15 * | grep mypattern works! I'll have to do timings to see how much faster.

Since this is a bunch of usenet messages that leafnode stored, is there a way to start from an incrementing numerical filename?

Just to be honest, I'm running this on W2k using Cygwin, though I do have an FC7 box.

BTW, I know there is a command to time a job, what is it?

Clutch

pixellany 02-24-2008 12:31 PM

Would you believe.........time!!

eg:
time find / -name rumplestiltskin

will tell you how long it takes your computer to learn that that name is nowhere in the system. What a waste, because you already knew that....;)

JWPurple 02-24-2008 07:33 PM

Remember that the first run is probably taking the biggest hit just to load the file data into cache. Subsequent runs of the same command are likely to take much less time because the data is already in cache. So ignore the first run.
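If you'd rather measure cold-cache times than warm ones, on a Linux box you can drop the page cache between runs instead of rebooting. A rough sketch (needs root and is Linux-specific, so it won't apply under Cygwin; path and pattern reused from the timings above):
Code:

# flush dirty pages, then drop the page cache so the next run reads from disk again
sync
echo 3 > /proc/sys/vm/drop_caches    # as root; 3 = page cache plus dentries/inodes
time grep oo ~/.slrn/var/killfiled* >/dev/null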

Clutch2 02-25-2008 08:22 AM

Well, using head actually increased my times, since I was grepping files from a text-based newsgroup. If it had been a binary group, it would likely have rocked.

On the sed example, can the filename be a wildcard?

Clutch

slakmagik 02-25-2008 10:20 AM

Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.

(Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.)
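For reference, a loop of that shape would look something like this sketch (pattern and path reused from the timings above; not timed here):
Code:

# run the early-quit sed once per file -- correct, but forks a new sed for every file
for f in ~/.slrn/var/killfiled*; do
    sed -n '/oo/p;20q' "$f"
done >/dev/null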

Tinkster 02-25-2008 11:34 AM

Quote:

Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.
What about a find with sed?
Code:

time find ~/.slrn/var -type f -name 'killfiled*' -exec sed -n '/oo/p;20q' {} \;
[edit]
Btw, if the directory structure had a set
depth you COULD use sed with wildcards ...

With e.g. ~/news/altlinuxos/2001/ ~/news/slackwareos/2003/
~/news/awk/2005/ and so on you could simply do
Code:

sed -n '/oo/p;20q' ~/news/*/*/*
[/edit]


Cheers,
Tink

Tinkster 02-25-2008 11:37 AM

Quote:

Originally Posted by digiot (Post 3069182)
Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.

Well .. if it WAS binary data head would try to find n occurrences
of a LF .... which MAY be few and far between in binaries. With
some bad luck it may need to grep through the whole 1 MB.



Cheers,
Tink

slakmagik 02-25-2008 11:51 AM

Quote:

Originally Posted by Tinkster (Post 3069274)
What about a find with sed?

Wow. Time for me to take a break and get some rest. :)

Quote:

Originally Posted by Tinkster (Post 3069276)
Well .. if it WAS binary data head would try to find n occurrences
of a LF .... which MAY be few and far between in binaries. With
some bad luck it may need to grep through the whole 1 MB.

Yeah, good point. So it does relate, but in the reverse sense. 'Course, grep and sed would be in the same boat, I think. Unless that's just the tired talking again.

Clutch2 02-25-2008 02:18 PM

find -mtime 8 seems to find the files I'm interested in, but it only reports the file names.
That part takes 28 seconds.

How do I connect it to grep in order to get grep to scan each filename output by the find command?

Grepping every file takes about 11 minutes with grep alone, and 15 minutes using head | grep.


Thanks,
Clutch

pixellany 02-25-2008 02:33 PM

Look at the exec option to find.

"man find" for more than you ever wanted to know...;)

syg00 02-25-2008 03:27 PM

If the "depth" of the match is indeterminate at the beginning of the run, maybe use perl.
I did some tests on scanning big files a while back, and perl ran faster than sed with quit (rebooting between every run to obviate cache effects).
If the record count isn't really high (in the hundreds of thousands to millions potentially), probably not worth the effort.
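A perl one-liner along those lines might look like this (just a sketch, echoing the pattern and path from the sed examples above; closing ARGV resets the line counter so each file is abandoned after its 20th line):
Code:

# print matching lines, but stop reading each file after 20 lines
perl -ne 'print if /oo/; close ARGV if $. >= 20' ~/.slrn/var/killfiled*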

