LinuxQuestions.org
Old 02-24-2013, 04:13 AM   #1
danone1
LQ Newbie
 
Registered: Feb 2013
Location: Australia, Sydney
Posts: 6

Rep: Reputation: Disabled
faster search inside files


When I use the command below, it takes about 2 seconds to find the right file (I have 30 files under this directory).
I want a faster search, using whatever command you recommend. So far I have used "find" and "grep", and it is not fast enough, because a PHP function uses grep to identify the word for the result page. Is there a much faster approach? I can install any command or software on my server, as long as it works fast. I also tried a ramdisk, but it made no difference. Any suggestions from Linux gurus are highly appreciated.

The match needs to be exact. -Rc works in 1 second but does not restrict results to exact matches; "--word-regexp" catches exact matches only, but it takes 2-3 seconds to return a result. I have been trying to solve this problem for 3 days.


grep -Rc --word-regexp 'departmento'

/language/latest/swe.txt:0
/language/latest/ltz.txt:0
/language/latest/liv.txt:0
/language/latest/bos.txt:0
/language/latest/spa.txt:1
/language/latest/azj.txt:0
/language/latest/afr.txt:0
/language/latest/als.txt:0
/language/latest/cym.txt:0
/language/latest/oci.txt:0



grep -Rc --word-regexp 'department'

/language/latest/swe.txt:0
/language/latest/ltz.txt:0
/language/latest/liv.txt:0
/language/latest/bos.txt:0
/language/latest/spa.txt:0
/language/latest/azj.txt:0
/language/latest/afr.txt:0
/language/latest/als.txt:0
/language/latest/cym.txt:0
/language/latest/oci.txt:0
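
For reference, the difference between a substring count and a whole-word count can be reproduced on throwaway data. This is only a sketch: the two sample files and their contents are made up for illustration, and -w is the short form of --word-regexp.

```shell
# Hypothetical sample data, not the poster's real files.
demo=$(mktemp -d)
printf 'departmento de ventas\n' > "$demo/spa.txt"
printf 'sales department\n' > "$demo/eng.txt"

# Substring match: 'department' is also found inside 'departmento'.
substring_hits=$(grep -rc 'department' "$demo" | sort)

# Whole-word match: only a standalone 'department' is counted.
word_hits=$(grep -rcw 'department' "$demo" | sort)

printf 'substring:\n%s\nword:\n%s\n' "$substring_hits" "$word_hits"
```

With this data the substring search counts a hit in both files, while the whole-word search reports 0 for spa.txt.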
 
Old 02-24-2013, 04:22 AM   #2
yowi
Member
 
Registered: Dec 2002
Location: Au
Distribution: Debian
Posts: 209

Rep: Reputation: 55
Sounds like you need some indexing. Have you had a look at Lucene or Glimpse?
 
Old 02-24-2013, 04:24 AM   #3
danone1
LQ Newbie
 
Registered: Feb 2013
Location: Australia, Sydney
Posts: 6

Original Poster
Rep: Reputation: Disabled
Hi, thank you for your answer. No, I haven't installed anything on my CentOS system yet. Can I use yum to install Lucene? I am using nginx, not Apache, by the way.

Quote:
Originally Posted by yowi
Sounds like you need some indexing. Have you had a look at Lucene or Glimpse?
 
Old 02-24-2013, 05:21 AM   #4
danone1
LQ Newbie
 
Registered: Feb 2013
Location: Australia, Sydney
Posts: 6

Original Poster
Rep: Reputation: Disabled
I found -P; it is a little faster now, about 2 seconds instead of 4.

Is there any way I can run 8 instances through a pipe, as in this example? Would it be any better if I did?


1. Find the necessary files, pipe them to xargs and tell it to execute 8 instances.

time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8

real 3m24.358s
user 1m27.654s
sys 9m40.316s



Can I use it this way?



time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -PRc --word-regexp "Aneurysmavuodon" /language/latest >> Strings_find8
 
Old 02-24-2013, 07:24 AM   #5
danone1
LQ Newbie
 
Registered: Feb 2013
Location: Australia, Sydney
Posts: 6

Original Poster
Rep: Reputation: Disabled
I found -P very fast, but it is still far from my expectation.

[root@server1 latest]# time find ./ -name "*.ext" -print0 | grep -PRc --word-regexp 'direzione' /language/latest
/language/latest/swe.txt:0
/language/latest/ltz.txt:0
/language/latest/lit.txt:0
/language/latest/bos.txt:0
/language/latest/lav.txt:0
/language/latest/som.txt:0
/language/latest/deu.txt:0
/language/latest/spa.txt:0
/language/latest/azj.txt:0
/language/latest/ita.txt:1

real 0m0.225s
user 0m0.180s
sys 0m0.044s



Some people suggested running 8 instances in parallel through xargs, but the results are not good so far.


[root@server1 latest]# time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -PRc --word-regexp 'direzione' /language/latest
/language/latest/swe.txt:0
/language/latest/ltz.txt:0
/language/latest/lit.txt:0
/language/latest/bos.txt:0
/language/latest/lav.txt:0
/language/latest/som.txt:0
/language/latest/deu.txt:0
/language/latest/ita.txt:1
real 0m0.227s
user 0m0.186s
sys 0m0.040s
 
Old 02-24-2013, 07:39 AM   #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Why use
Code:
]$ time (find language/latest/ -type f -name "*.txt" -print0|xargs -0 -iX grep -Hc "direzione" 'X')
language/latest/swe.txt:0
language/latest/ltz.txt:0
language/latest/lit.txt:0
language/latest/bos.txt:0
language/latest/lav.txt:0
language/latest/som.txt:0
language/latest/deu.txt:0
language/latest/spa.txt:0
language/latest/azj.txt:0
language/latest/ita.txt:1

real    0m0.023s
user    0m0.002s
sys     0m0.024s
when you can
Code:
]$ time grep -c "direzione" -r language/latest/
language/latest/swe.txt:0
language/latest/ltz.txt:0
language/latest/lit.txt:0
language/latest/bos.txt:0
language/latest/lav.txt:0
language/latest/som.txt:0
language/latest/deu.txt:0
language/latest/spa.txt:0
language/latest/azj.txt:0
language/latest/ita.txt:1

real    0m0.004s
user    0m0.000s
sys     0m0.003s
And I agree that indexing probably makes more sense if you query often and a lot. Either find something that works with Nginx or find something that offers an independent abstraction layer.
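
To make the indexing idea concrete, here is a very small sketch of it in plain shell. It is not Lucene or Glimpse and not a replacement for them; the sample files, the index file name words.idx, and the lookup function are all made up for illustration. The tokenizer uses tr, which works on bytes, so words with non-ASCII characters would need a different splitting step.

```shell
# Hypothetical sample data standing in for language/latest/*.txt.
corpus=$(mktemp -d)
printf 'la direzione generale\n' > "$corpus/ita.txt"
printf 'el departmento de ventas\n' > "$corpus/spa.txt"

index="$corpus/words.idx"   # assumed name for the index file

# Build step (run once, or whenever the files change):
# write one "word<TAB>file" line per distinct word in each file.
for f in "$corpus"/*.txt; do
  tr -cs '[:alnum:]' '\n' < "$f" | sort -u |
    awk -v file="$f" 'NF { print $0 "\t" file }'
done | LC_ALL=C sort > "$index"

# Query step: exact match on the word column, print the file column.
lookup() {
  awk -F '\t' -v word="$1" '$1 == word { print $2 }' "$index"
}

lookup direzione
```

The point is only that the files are scanned once at build time and each query reads the index instead. The lookup here is still a linear pass over the index; because the index is sorted, a binary search tool could replace the awk step, and a real indexer does far more than this.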
 
Old 02-24-2013, 03:31 PM   #7
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
I won't guarantee this will work in all cases, but none of the examples so far perform a parallel search - and serial file searching always takes longer:

Code:
find language/latest/ -type f -name "*.txt" -print |
(
   # start one background grep per file, then wait for all of them
   while IFS= read -r F; do
     grep -Hc "direzione" "$F" &
   done
   wait
) >>log.file
--- note, not syntax checked...

This should provide the same list, though the order of lines will be interleaved.

The goal here is to do all the searching in parallel (hence the grep ... &). This should work as long as you don't have many thousands of files to search (you will use up your process limit quota).

The system time should increase due to context switches, but the elapsed time should decrease even more.

Last edited by jpollard; 02-24-2013 at 03:33 PM.
 
Old 02-24-2013, 04:17 PM   #8
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Quote:
Originally Posted by danone1
I found -P very fast, but it is still far from my expectation.
You are looking for a plain string, so you can use -F (it only handles plain strings so it's probably the fastest mode; although it seems like IO could be the limiting factor here, in which case there won't be any measurable improvement).

Also, from your output it appears that you never have more than 1 match per file, maybe you could use -l (--files-with-matches) instead of -c (--count)?
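
A minimal sketch of those two suggestions combined, on made-up sample files (the directory and file contents below are assumptions, not the real language files). -w is kept so the match is still a whole word.

```shell
# Hypothetical sample data, not the poster's real files.
langdir=$(mktemp -d)
printf 'la direzione generale\n' > "$langdir/ita.txt"
printf 'die Richtung\n' > "$langdir/deu.txt"

# -F: treat the pattern as a fixed string, not a regex
# -w: still require a whole-word match
# -l: print only the names of matching files; grep stops reading
#     each file at its first match
matches=$(grep -rFwl 'direzione' "$langdir")
printf '%s\n' "$matches"
```

With -l the output is just the list of files that contain the word, so the caller does not have to filter out the ":0" lines.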

Quote:
Originally Posted by jpollard
I won't guarantee this will work in all cases, but none of the examples so far perform a parallel search
I think the xargs -n1 -P8 one is parallel. It might be better to try with a bit less parallelism, xargs -n5 -P2.
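
Something like the following, as a sketch on throwaway files (the directory and its contents are invented for the example). Note that grep gets no -R and no directory argument here, since xargs already supplies the file names.

```shell
# Hypothetical sample data: ten small files, one containing the word.
workdir=$(mktemp -d)
for name in swe ltz lit bos lav som deu spa azj; do
  printf 'nothing here\n' > "$workdir/$name.txt"
done
printf 'la direzione generale\n' > "$workdir/ita.txt"

# -print0 / -0: pass file names safely, even with spaces
# -n5: give each grep up to 5 files; -P2: run at most 2 greps at once
# xargs exits non-zero when any grep batch finds no match, so that
# status is ignored here.
counts=$(find "$workdir" -type f -name '*.txt' -print0 |
  { xargs -0 -n5 -P2 grep -FwHc 'direzione' || true; } | sort)
printf '%s\n' "$counts"
```

Whether this beats a single grep -r depends on the file sizes and on whether IO is the bottleneck, so it is worth timing both on the real data.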
 
  

