LinuxQuestions.org
Old 06-05-2010, 08:40 AM   #1
allasso
Member
 
Registered: Oct 2003
Distribution: debian
Posts: 35

Rep: Reputation: 15
recursive grep speed


Hello,

When I grep recursively using:

grep -r '' big/hairy/file_hierarchy/ | grep 'expression'

it is about 10 times faster (no exaggeration) than using:

grep -r 'expression' big/hairy/file_hierarchy/

Anyone know why?
 
Old 06-05-2010, 09:08 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
I get the opposite result:
Code:
[root@Ath etc]# time grep -r '' .|grep comment
<<output snipped>>
real    0m0.161s
user    0m0.053s
sys     0m0.040s
[root@Ath etc]# time grep -r '' .|grep comment > /dev/null

real    0m0.108s
user    0m0.043s
sys     0m0.047s
[root@Ath etc]# time grep -r comment . > /dev/null

real    0m0.044s
user    0m0.013s
sys     0m0.020s
[root@Ath etc]#
This is what I would expect. (The difference between the first 2 cases is presumably the time to write to the terminal.)
 
Old 06-05-2010, 10:10 AM   #3
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Also note that the very first operation that walks a set of items, like 'find' or 'grep -r', also fills the kernel caches, so subsequent similar operations on the same set will be faster. That means you always have to run timing tests a few times. And most of the time, using 'cat|less', 'cat|grep', 'ps|grep' (use 'pgrep' instead) or 'ls|grep' (use 'grep' or 'find' instead) is just inefficient; only rarely is it done for compatibility reasons: one grep good, two greps baaa-aaahhhd!
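
The advice about repeating timing tests can be sketched as a small shell helper: run each variant several times and compare the later runs, since the first one also pays for filling the caches. The names `time_runs`, `direct`, `piped`, `TREE` and `PATTERN` are made up for illustration. The helper only measures whole seconds via `date +%s`, so it suits searches that take seconds rather than milliseconds; bash's `time` keyword gives finer resolution.

```shell
# Hypothetical helper: run a command several times and print the elapsed
# wall-clock time of each run in whole seconds (via `date +%s`).
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# The two variants from the thread, wrapped as functions so they can be timed.
# TREE and PATTERN are placeholders that the caller must set.
direct() { grep -r "$PATTERN" "$TREE"; }
piped()  { grep -r '' "$TREE" | grep "$PATTERN"; }

# Example usage:
#   TREE=/var/log PATTERN=lizard
#   time_runs 3 direct
#   time_runs 3 piped
```

Discarding the output inside the helper keeps terminal write time out of the measurement.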
 
Old 06-05-2010, 10:15 AM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
hmmmm---caching had occurred to me. How can you clear the cache to get a valid test?
 
Old 06-05-2010, 10:54 AM   #5
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
http://www.linuxquestions.org/questi...2/#post3084228 (/proc/sys/vm/drop_caches), but IIRC that sysctl only frees memory from clean caches; it does not write out dirty objects. I don't know what else there is to free cached objects, apart from 'sync'.
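
For a cold-cache test on Linux, the usual sequence is to run `sync` first (so dirty data is written out and becomes clean) and then write to `/proc/sys/vm/drop_caches`. A minimal sketch, assuming root privileges and a kernel that provides this file; the function name `drop_caches` and its optional path argument are made up for illustration:

```shell
# Hypothetical helper: flush dirty data, then ask the kernel to drop clean caches.
# Writing 3 drops both the page cache and the dentry/inode caches.
# The optional first argument overrides the control file (useful for dry runs).
# Returns non-zero and changes nothing if the file is not writable.
drop_caches() {
    ctl=${1:-/proc/sys/vm/drop_caches}
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not Linux, or not root?)" >&2
        return 1
    fi
    sync
    echo 3 > "$ctl"
}

# Example usage (as root), before each timing run:
#   drop_caches && time grep -r lizard .
```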
 
Old 06-05-2010, 11:30 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Code:
[root@Ath log]# time grep -r '' . | grep lizard

real    0m1.630s
user    0m0.040s
sys     0m0.057s


[root@Ath log]# time grep -r lizard .

real    0m0.118s
user    0m0.027s
sys     0m0.020s


[root@Ath log]# time grep -r lizard .

real    0m0.054s
user    0m0.030s
sys     0m0.013s


[root@Ath log]# time grep -r '' . | grep lizard

real    0m0.087s
user    0m0.033s
sys     0m0.043s


[root@Ath log]# time grep -r lizard .

real    0m0.054s
user    0m0.017s
sys     0m0.027s


[root@Ath log]# time grep -r '' . | grep lizard

real    0m0.086s
user    0m0.043s
sys     0m0.030s
[root@Ath log]#
Yes, caching is obviously at work. How to explain all the details? I'm not sure.
 
Old 06-06-2010, 08:55 AM   #7
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
If you run 'sync; slabtop -d1 -sa; sync' and perform a system-wide 'find', you should see the number of objects in the caches grow quickly.
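
A rough non-interactive variant of the same observation: read the `Slab:` total from `/proc/meminfo` before and after walking a tree. This assumes Linux; `meminfo_kb` is a hypothetical helper name, its optional file argument exists only for experimenting, and `/usr` is just an example tree.

```shell
# Hypothetical helper: print the value (in kB) of one /proc/meminfo field.
# Usage: meminfo_kb FIELD [FILE]   (FILE defaults to /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example usage: compare slab usage before and after a tree walk.
#   before=$(meminfo_kb Slab)
#   find /usr > /dev/null 2>&1
#   after=$(meminfo_kb Slab)
#   echo "slab grew by $((after - before)) kB"
```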
 
Old 06-08-2010, 12:14 PM   #8
allasso
Member
 
Registered: Oct 2003
Distribution: debian
Posts: 35

Original Poster
Rep: Reputation: 15
wow, you guys are talking pretty far over my head.

The results I get are grepping 5 GB hierarchies with about 15000 files. I am looking at times like 72s versus 7s (timing with my watch), rather than msec ranges.

I started observing this about 2 years ago and it has been absolutely consistent, so much so that I just got in the habit of doing it that way for big trees. Every once in a while I check to see if it is still true, and it is (on my computer anyway.)

I use a Mac, but it doesn't seem like that would make any difference.
 
Old 06-08-2010, 12:19 PM   #9
allasso
Member
 
Registered: Oct 2003
Distribution: debian
Posts: 35

Original Poster
Rep: Reputation: 15
--error--

Last edited by allasso; 06-08-2010 at 12:28 PM.
 
Old 06-09-2010, 04:43 PM   #10
Revery
LQ Newbie
 
Registered: Jun 2010
Posts: 4

Rep: Reputation: 1
First off, shame on you guys for trying to derive performance measurements out of tiny greps that don't even take a second. :-)
I'm not sure about the reason behind the time difference between those two syntaxes, but I can offer you the advice of always using the -I flag to ignore binary files. This drastically cuts down the time that my recursive greps take.
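
To see what `-I` does, here is a small sketch that builds a throwaway directory with one text file and one file containing a NUL byte, then searches it both ways. The directory and file names are made up for illustration. In GNU grep, `-I` is documented as equivalent to `--binary-files=without-match`.

```shell
# Build a throwaway tree: one text file and one "binary" file (contains a NUL byte).
demo=$(mktemp -d)
printf 'needle in text\n' > "$demo/notes.txt"
printf 'needle\000in binary\n' > "$demo/blob.bin"

# Without -I, the binary file is searched too; -l lists the matching file names.
grep -rl needle "$demo" | sort

# With -I, the binary file is treated as non-matching, so only notes.txt is listed.
grep -rIl needle "$demo"

# Clean up when done:
#   rm -rf "$demo"
```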
 
Old 06-09-2010, 07:22 PM   #11
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Excuse me?? something approaching 1/10th of a second is a VERY LONG TIME for a modern computer.

I wonder if bypassing the binary files will reduce the **difference** between the 2 methods.
 
  


All times are GMT -5. The time now is 10:40 PM.
