LinuxQuestions.org
11-25-2014, 04:28 PM   #1
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Rep: Reputation: Disabled
Find strings in files recursively that match a pattern


I have a bunch of directories with a bunch of text files (actually PHP, but they are text). I am trying to find all words that start with "BJ_" (case sensitive). In an ideal world, I would get each match truncated at the first character that is neither alphanumeric nor "_" (i.e. blablablaBJ_FOO_BAR'blablabla would result in BJ_FOO_BAR). Even more ideally, it would eliminate duplicates, but I am sure I can figure that out.

My initial stab was below, but it just hangs.

Code:
grep -rl BJ_.
Any thoughts?
 
11-25-2014, 04:44 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,834

Rep: Reputation: 1820
Open another terminal and keep an eye on it - probably in state "D", waiting on the disk(s).
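To elaborate on "state D": on Linux every process has a one-letter state (D = uninterruptible sleep, usually waiting on disk I/O; S = interruptible sleep, e.g. waiting for input; R = running). A minimal sketch of checking it from a second terminal, assuming a Linux /proc filesystem; the `sleep 30` is only a stand-in for the stuck command, and `ps` or `top` show the same letter in their STAT / S column.

```shell
# Stand-in for the command that appears to hang (assumption for the demo).
sleep 30 &
pid=$!
sleep 1   # give the process a moment to start

# Field 3 of /proc/PID/stat is the one-letter process state.
state=$(cut -d' ' -f3 "/proc/$pid/stat")
echo "PID $pid is in state $state"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A process that shows S rather than D is not waiting on the disk; it may simply be waiting for input.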
 
11-25-2014, 04:50 PM   #3
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00
Open another terminal and keep an eye on it - probably in state "D", waiting on the disk(s).
I am totally lost. Can you please elaborate? Thank you

EDIT. Got a little closer. This at least finds the lines, but includes content before the match.

Code:
grep -r 'BJ_' */*.php

Last edited by NotionCommotion; 11-25-2014 at 04:55 PM.
 
11-25-2014, 05:29 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Arch
Posts: 3,013

Rep: Reputation: 1225
Quote:
My initial stab was below, but it just hangs.
Code:
grep -rl BJ_.
You missed the space before the ".", so it's considered part of the pattern. Since you didn't pass any file names it waits for input on stdin.

Code:
grep -ro --include='*.php' 'BJ_[a-zA-Z0-9_]*' .
Quote:
--include=GLOB
Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
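A small sandbox run may make the flags above easier to see. This is only a sketch: the temporary directory, the file names and the file contents are made up for the demo, and it assumes a grep that supports --include (GNU grep does).

```shell
# Hypothetical sandbox: directory and file names are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/classes"
printf '%s\n' "<?php echo blablablaBJ_FOO_BAR'blablabla;" > "$demo/classes/a.php"
printf '%s\n' 'BJ_IGNORED in a non-PHP file' > "$demo/notes.txt"

# -r recurses, -o prints only the matched text, --include limits the search to *.php.
# Each output line has the form FILE:MATCH.
matches=$(cd "$demo" && grep -ro --include='*.php' 'BJ_[a-zA-Z0-9_]*' .)
echo "$matches"

rm -rf "$demo"
```

The match stops at the apostrophe because the bracket expression only allows letters, digits and underscores, and notes.txt is skipped because its name does not match the glob.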
 
11-25-2014, 05:57 PM   #5
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Original Poster
Rep: Reputation: Disabled
Thank you ntubski,

I want to remove duplicates, and the preceding filename messed me up. Fortunately, I consulted the handy man, and discovered the -h flag!

I then tried to pipe it to uniq, but no success. Suggestions?
 
11-25-2014, 07:33 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,834

Rep: Reputation: 1820
uniq requires sorted input - pipe it through sort first.
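The reason is that uniq only collapses duplicate lines that are adjacent. A quick sketch, using a made-up three-line sample in place of the real grep output:

```shell
# Made-up sample standing in for the grep -roh output.
sample=$(printf '%s\n' BJ_FOO BJ_BAR BJ_FOO)

# uniq alone: the two BJ_FOO lines are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$sample" | uniq)

# Sorting first makes duplicates adjacent; sort -u does both steps in one go.
sorted=$(printf '%s\n' "$sample" | sort | uniq)
short=$(printf '%s\n' "$sample" | sort -u)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```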
 
11-25-2014, 08:23 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,245

Rep: Reputation: 2684
You could try using awk (not tested):
Code:
awk 'match($0,/(BJ_[[:alnum:]_]*)/,f) && !_[f[1]]++{out[i++] = f[1]}END{for(a in f)print f[a]}' *
 
11-25-2014, 08:54 PM   #8
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Original Poster
Rep: Reputation: Disabled
Thank you syg00 and grail,

As far as awk goes, my head is spinning a bit. Maybe I will use Excel.
Quote:
Originally Posted by grail
Code:
awk 'match($0,/(BJ_[[:alnum:]_]*)/,f) && !_[f[1]]++{out[i++] = f[1]}END{for(a in f)print f[a]}' *
 
11-25-2014, 09:36 PM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,834

Rep: Reputation: 1820
grail revels in making the code as compact as possible ...
It could be written more expansively to make it more "self documenting" to the uninitiated.

"grep | sort | uniq" is quite valid.
 
11-26-2014, 02:49 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,245

Rep: Reputation: 2684
My bad ... I do forget sometimes if it is tricky to explain

Code:
awk '
# Gawk only: the three-argument match() stores what it found in the f array.
# The pattern is "BJ_" followed by zero or more alphanumeric characters or underscores.
# The second test is the uniq part: the counter for an item is 0 the first time it is
# seen, so the ! makes the test true; the ++ then bumps it and every repeat is false.
match($0, /(BJ_[[:alnum:]_]*)/, f) && !_[f[1]]++ {
    # Both tests were true (0 is false, any non-zero value is true), so assign the
    # item to array "out" and increase the index.
    out[i++] = f[1]
}
# END is executed after all files have been read. As a side note, if you have many
# 100s/1000s of files you may wish to use ENDFILE to run this after closing each file
# (you would then also need to clear the "out" array before reading the next file).
# The for loop assigns each index of "out" to "a" (order is not guaranteed, i.e. it
# could be 2,3,1 or any other order), then print returns the value from the array.
# Note: I had used the wrong array previously :(
END { for (a in out) print out[a] }
' *   # all files in current directory
If you should have any questions, please ask away and I will try to help

Everything can be found in the Gawk manual.

Note: ENDFILE is only available in gawk version 4 and later.
 
11-26-2014, 08:01 AM   #11
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail
My bad ... I do forget sometimes if it is tricky to explain
Very nicely documented! Thank you
 
11-26-2014, 09:07 AM   #12
NotionCommotion
Member
 
Registered: Aug 2012
Posts: 536

Original Poster
Rep: Reputation: Disabled
Ended up going with the following:

Code:
grep -roh --include='*.php' 'BJ_[a-zA-Z0-9_]*' . | sort -u
I also tried awk, but received an error.
Code:
[Michael@devserver application]$ awk 'match($0,/(BJ_[[:alnum:]_]*)/,f) && !_[f[1]]++ {out[i++] = f[1]} END{for(a in out)print out[a]}' *
awk: cmd. line:1: fatal: cannot open file `classes' for reading (Success)
[Michael@devserver application]$
Anyhow, all is well that ends well. Thank you all for your help!
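For anyone who hits the same awk error: the likely cause is that * expands to include the directory classes, and awk does not recurse into directories the way grep -r does. One possible workaround, sketched below under assumptions, is to let find pass only regular files to awk. The sandbox layout and file contents are invented for the demo, and the awk program is a portable variation (two-argument match() with RSTART and RLENGTH) rather than the gawk version posted above.

```shell
# Hypothetical sandbox mirroring the layout in the error: "classes" is a directory.
demo=$(mktemp -d)
mkdir -p "$demo/classes"
printf '%s\n' "define('BJ_FOO', 1); echo BJ_BAR . BJ_FOO;" > "$demo/classes/a.php"
printf '%s\n' 'echo BJ_BAR;' > "$demo/index.php"

# find hands awk regular files only, so awk never tries to open a directory.
# The while loop walks along each line so every match is seen, not just the first.
# sort -u removes duplicates even if find has to run awk more than once.
result=$(find "$demo" -type f -name '*.php' -exec awk '
{
    line = $0
    while (match(line, /BJ_[A-Za-z0-9_]*/)) {
        word = substr(line, RSTART, RLENGTH)
        if (!seen[word]++) print word
        line = substr(line, RSTART + RLENGTH)
    }
}' {} + | sort -u)
echo "$result"

rm -rf "$demo"
```

The grep | sort -u pipeline remains the shorter answer; this only shows how the awk approach could be made to cope with subdirectories.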
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration