Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
I have a script that looks for a certain header entry that breaks the statistics script I am running. Unfortunately, grep scans each of many files all the way through, even though the offending entry is either in the first 10 or so lines or doesn't matter at all.
I looked at the man page and didn't see any option that limits the search to only so many bytes or lines of a file.
If you want to search only the first (or last) N lines, you can use head (or tail) to pipe them into grep. For example, with N = 20:
:time grep oo ~/.slrn/var/killfiled* >/dev/null
real 0m0.092s
user 0m0.084s
sys 0m0.008s
:time head -20 ~/.slrn/var/killfiled* | grep oo >/dev/null
real 0m0.008s
user 0m0.000s
sys 0m0.008s
:time sed -n '/oo/p;20q' ~/.slrn/var/killfiled* >/dev/null
real 0m0.004s
user 0m0.000s
sys 0m0.000s
Hardly solid benchmarking, but that might give an idea. (I redirect to /dev/null because otherwise I'd be timing the terminal drawing time of the avalanche of stuff grep spits out.)
-- Crap. Pixellany beat me to it while I was 'benchmarking'. Ah well - at least the timings might be interesting.
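For reference, stripped of the timing wrapper, the basic commands would look something like this (somefile and PATTERN are just placeholders, and N = 20 matches the runs above):
head -n 20 somefile | grep PATTERN    # search only the first 20 lines
tail -n 20 somefile | grep PATTERN    # search only the last 20 lines
sed -n '/PATTERN/p;20q' somefile      # print matches, quit after line 20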
Remember that the first run is probably taking the biggest hit just to load the file data into cache. Subsequent runs of the same command are likely to take much less time because the data is already in cache. So ignore the first run.
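If you want cold-cache numbers instead of just ignoring the first run, one common trick (needs root, and it's a blunt instrument) is to flush the page cache between runs:
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # as root: drop pagecache, dentries and inodes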
Well, using head actually increased my times, since I was grepping files from a text-based newsgroup. If it had been a binary group, well, it would have likely rocked.
On the sed example, can the filename be a wildcard?
Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.
(Based on more crappy benchmarks, a for loop with sed is way slower than a full grep, and a full grep is slower than a partial grep with head.)
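The loop would look roughly like this (pattern and path reused from the earlier example). The per-file loop is needed because, with several file arguments, sed counts lines across the whole input stream by default, so the 20q would fire after the 20th line overall rather than per file:
for f in ~/.slrn/var/killfiled*; do   # run sed separately so 20q applies to each file
    sed -n '/oo/p;20q' "$f"
done >/dev/null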
Quote:
Not so far as I know. You could do a loop but that would probably kill any time savings. I don't understand how head increased your times, though, or even why text vs. binary would relate to that.
Well .. if it WAS binary data head would try to find n occurrences of a LF .... which MAY be few and far between in binaries. With some bad luck it may need to grep through the whole 1 MB.
Wow. Time for me to take a break and get some rest.
Quote:
Originally Posted by Tinkster
Well .. if it WAS binary data head would try to find n occurrences of a LF .... which MAY be few and far between in binaries. With some bad luck it may need to grep through the whole 1 MB.
Yeah, good point. So it does relate, but in the reverse sense. 'Course, grep and sed would be in the same boat, I think. Unless that's just the tired talking again.
If the "depth" of the match is indeterminate at the beginning of the run, maybe use perl.
I did some tests on scanning big files a while back, and perl ran faster than sed with quit (reboot between every run to obviate cache effects).
If the record count isn't really high (in the hundreds of thousands to millions potentially), probably not worth the effort.
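A rough sketch of what I mean, on a single file (somefile and the pattern oo are just placeholders):
perl -ne 'print if /oo/; last if $. == 20' somefile   # print matching lines, stop reading after line 20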