Limiting how deep into a file grep searches
I have a script that looks for a certain header entry that breaks the statistics script I am running. Unfortunately grep scans each of many files all the way through, even though the offending entry is either in the first 10 or so lines or not in the file at all.
I looked at the man page and didn't see any option to tell grep to search only so many bytes or lines of a file. Any ideas? Thanks, Clutch |
If you want to search only the first (or last) N lines, you can use head (or tail) to feed them to grep. For example, with N = 15 (the first 15 lines; use tail for the last 15):
Code:
head -15 myfile | grep yourpattern
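And the same idea with tail for the last 15 lines:
Code:
tail -15 myfile | grep yourpattern
|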
Code:
sed -n '/word/p; 5q' filename
Prints lines containing "word" up to line 5, then quits. |
Code:
time grep oo ~/.slrn/var/killfiled* >/dev/null
Crap. Pixellany beat me to it while I was 'benchmarking'. Ah well - at least the timings might be interesting. |
Sweet!
head -15 * | grep mypattern works! I'll have to do timings to see how much faster. Since this is a bunch of usenet messages that leafnode stored, is there a way to start from an incrementing numerical filename? Just to be honest, I'm running this on W2k using Cygwin though I do have a FC7 box. BTW, I know there is a command to time a job, what is it? Clutch |
Would you believe.........time!!
eg: time find / -name rumplestiltskin will tell you how long it takes your computer to learn that that name is nowhere in the system. What a waste, because you already knew that....;) |
Remember that the first run is probably taking the biggest hit just to load the file data into cache. Subsequent runs of the same command are likely to take much less time because the data is already in cache. So ignore the first run.
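If you want cold-cache numbers without rebooting, on a reasonably recent Linux kernel you can drop the page cache between runs (this is just a suggestion; it needs root and won't help on the Cygwin box):
Code:
sync; echo 3 > /proc/sys/vm/drop_caches   # run as root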
|
Well, using head actually increased my times since I was grepping files from a text-based newsgroup. If it had been a binary group, it would likely have rocked.
On the sed example, can the filename be a wildcard? Clutch |
Not so far as I know. You could do a loop (something like the sketch below), but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.
(Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.)
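The loop I had in mind is roughly this (just a sketch; the pattern and the 15-line cutoff are placeholders):
Code:
for f in *; do sed -n '/mypattern/p; 15q' "$f"; done
|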
Code:
time find ~/.slrn/var -type f -name 'killfiled*' -exec sed -n '/oo/p;20q' {} \;
Btw, if the directory structure had a set depth you COULD use sed with wildcards ... With e.g. ~/news/altlinuxos/2001/ ~/news/slackwareos/2003/ ~/news/awk/2005/ and so on you could simply do
Code:
sed -n '/oo/p;20q' ~/news/*/*/*
Cheers, Tink |
On the binary point: head has to keep reading until it has seen the requested number of LFs .... which MAY be few and far between in binaries. With some bad luck it may need to grep through the whole 1 MB. Cheers, Tink |
find -mtime 8 seems to find the files I'm interested in, but it only reports the file names.
That part takes 28 seconds. How do I connect it to grep so that grep scans each filename output by the find command? Grepping every file takes about 11 minutes for grep alone, 15 minutes using head | grep. Thanks, Clutch |
Look at the -exec option to find; something like the sketch below should do it.
"man find" for more than you ever wanted to know...;)
If the "depth" of the match is indeterminate at the beginning of the run, maybe use perl.
I did some tests on scanning big files a while back, and perl ran faster than sed with quit (reboot between very run to obviate cache effects). If the record count isn't really high (in the hundreds of thousands to millions potentially), probably not worth the effort. |
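Just a sketch; the pattern and the 15-line cutoff are placeholders:
Code:
perl -ne 'print if /yourpattern/; exit if $. == 15' myfile
|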