LinuxQuestions.org
Old 09-12-2009, 11:45 AM   #1
zvar
LQ Newbie
 
Registered: Sep 2009
Posts: 2

Rep: Reputation: 0
logging recursive grep output


I have a file structure kinda like below:
Main
|-Folder1
|--*.htm
|
|-Folder2
|--*.htm
|
|-Folder3
|--*.htm

I want to grep the htm files and output the results to each folder like:
Main
|-Folder1
|--*.htm
|--results.txt
|
|-Folder2
|--*.htm
|--results.txt

I can do something like "grep -r $pattern * > results.txt", but that only writes Main/results.txt and doesn't break the output up per folder. Is there a way for each folder to get only the results for the htm files in that folder?
 
Old 09-12-2009, 02:29 PM   #2
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
for loop, find and grep.

Quote:
Originally Posted by zvar
I want to grep the htm files and output the results to each folder like:
Main
|-Folder1
|--*.htm
|--results.txt
|
|-Folder2
|--*.htm
|--results.txt
grep isn't powerful enough to do this in and of itself, but I think that a combination of the shell's for loop and find might do what you need:

Code:
for dir in $(find . -type d)
do
    grep "your pattern here" $dir/*.htm > $dir/results.txt
done
A little explanation is in order:

find is a program that traverses file system trees and tests each directory and/or file within the tree. It's a very powerful tool with a bit of a learning curve {and that's just a hair of understatement}.

The -type argument to find lets you specify whether you want to test for directories (-type d) or files (-type f).

So in this case, we're looking for all sub directories under the current directory.

The for loop then executes the commands between "do" and "done" for each directory.
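
To see what the loop iterates over, it can help to run the find on its own first. A minimal sketch on a throwaway tree; the scratch directory and the folder names are made up for illustration:

```shell
# Build a scratch tree shaped like the one in the question (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/Main/Folder1" "$demo/Main/Folder2" "$demo/Main/Folder3"
cd "$demo/Main" || exit 1

# List every directory at or below the current one, one per line.
# The list includes "." itself, so the loop also visits the top-level directory.
dirs=$(find . -type d | LC_ALL=C sort)
printf '%s\n' "$dirs"
```

Note that find descends all the way down, so nested sub-folders would show up in the list as well.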

There are probably a few other ways to do this: 'find' can be used to execute commands directly, without use of the for loop, or, if you're a perlish type of person, there's a module called "File::Find" which does everything that find does, and more.

If this works for you and you're interested, I would take a look at the man pages for bash (to learn about for loops), find, and xargs (which is often used in conjunction with find).
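
As a rough illustration of the find-plus-xargs pairing, the sketch below lists the .htm files that contain a pattern. The scratch tree and the pattern "needle" are made up. The -print0 and -0 options are available in GNU and BSD find/xargs, so treat their presence as an assumption about your system:

```shell
# Scratch tree with hypothetical names and contents.
demo=$(mktemp -d)
mkdir -p "$demo/Folder1" "$demo/Folder2"
printf 'a needle here\n' > "$demo/Folder1/a.htm"
printf 'nothing\n'       > "$demo/Folder2/b.htm"

# find emits NUL-terminated names; xargs -0 hands them to grep in batches.
# grep -l prints only the names of files that contain a match.
matches=$(find "$demo" -type f -name '*.htm' -print0 | xargs -0 grep -l -- 'needle')
printf '%s\n' "$matches"
```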
 
Old 09-12-2009, 04:29 PM   #3
zvar
LQ Newbie
 
Registered: Sep 2009
Posts: 2

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by bartonski
Code:
for dir in $(find . -type d)
do
    grep "your pattern here" $dir/*.htm > $dir/results.txt
done
This works great for some of the folders, but some have spaces in the name, and others even have single quotes in the name. Is there a way to escape all the special characters?

I didn't find anything in the help and man pages for find or for to do this.
All the experiments I did trying to add double quotes failed. Is there a way to get folders with special characters to be passed?

And thanks for the help already.
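
The failures with spaces come from word splitting: the shell breaks the unquoted $(find ...) output on whitespace before the for loop ever sees it, and quoting the whole substitution would instead produce one giant word. A small sketch with a made-up directory name shows the effect:

```shell
# Scratch directory with one sub-directory whose name contains a space.
demo=$(mktemp -d)
mkdir "$demo/My Folder"
cd "$demo" || exit 1

# find prints two names: "." and "./My Folder".
# Unquoted command substitution is split on spaces, tabs and newlines,
# so "./My Folder" reaches the loop as two separate words.
count=0
for dir in $(find . -type d)
do
    count=$((count + 1))
    printf 'loop saw: [%s]\n' "$dir"
done
printf 'iterations: %s\n' "$count"
```

Two directories go in, but the loop runs three times, and neither "./My" nor "Folder" is a real path.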
 
Old 09-12-2009, 10:12 PM   #4
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 101
Quote:
Originally Posted by zvar
This works great for some of the folders, but some have spaces in the name, and others even have single quotes in the name. Is there a way to escape all the special characters?

I didn't find anything in the help and man pages for find or for to do this.
All the experiments I did trying to add double quotes failed. Is there a way to get folders with special characters to be passed?

And thanks for the help already.
The method below will handle spaces in the paths, but apart from spaces, "special characters" is an open category -- you will need to determine what special characters are present and how they will affect the process.

Code:
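pattern="this"
path="/path/to/files"

# List everything under $path, keep names ending in .htm or .html,
# and read them one per line (quoting keeps names with spaces intact).
find "$path" | grep -E "\.html?$" | while IFS= read -r file
do
   target="$(dirname "$file")/search_log.txt"
   # grep -c reports the number of matching lines in the file
   count=$(grep -c -- "$pattern" "$file")
   echo "There were $count lines matching \"$pattern\" in $file"
   # echo "There were $count lines matching \"$pattern\" in $file" >> "$target"
done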
I commented out the line that makes entries into any number of "search_log.txt" files because I didn't want to pollute my system with them -- testing this feature is up to you. But remember this: the log files are appended to on each run, so before you run the program again you need to find and remove the log files left by the previous run.
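
Because the log lines are appended, one way to start clean is to delete the earlier logs before each run. A sketch, assuming the log name search_log.txt used in the script and a made-up scratch tree; try it on a copy first, since it deletes files:

```shell
# Hypothetical scratch tree containing two stale logs.
path=$(mktemp -d)
mkdir -p "$path/Folder 1" "$path/Folder2"
: > "$path/Folder 1/search_log.txt"
: > "$path/Folder2/search_log.txt"

# Remove every regular file named search_log.txt below $path.
# "{} +" passes many names to one rm call, each as a single argument.
find "$path" -type f -name 'search_log.txt' -exec rm -f -- {} +

# Count what is left (tr strips the padding some wc versions add).
remaining=$(find "$path" -type f -name 'search_log.txt' | wc -l | tr -d ' ')
printf 'logs left: %s\n' "$remaining"
```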

Be careful when making changes to this program, and use only backup copies of your data tree for testing.
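
For names containing spaces, quotes, or even newlines, a commonly used pattern is to have find emit NUL-terminated names and read them with read -d ''. This is a sketch rather than a drop-in solution: it assumes bash (read -d is not POSIX) and a find that supports -print0. The scratch tree and pattern are made up, and results.txt is the file name from the original question:

```shell
# Requires bash: "read -d" is not available in plain POSIX sh.
pattern="needle"
path=$(mktemp -d)
mkdir -p "$path/Folder 1" "$path/Bob's files"
printf 'needle one\n' > "$path/Folder 1/a.htm"
printf 'no match\n'   > "$path/Bob's files/b.htm"

# Visit each directory; NUL delimiters keep any name in one piece.
find "$path" -type d -print0 | while IFS= read -r -d '' dir
do
    # Skip directories without .htm files (the glob stays literal when nothing matches).
    set -- "$dir"/*.htm
    [ -e "$1" ] || continue
    # ">" truncates, so reruns replace results.txt instead of appending.
    # grep exits 1 when nothing matches; "|| :" treats that as normal.
    grep -- "$pattern" "$@" > "$dir/results.txt" || :
done
```

Writing with ">" rather than ">>" is a deliberate choice here: each run overwrites the previous results.txt, so no cleanup pass is needed between runs.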
 
Old 09-13-2009, 09:45 AM   #5
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
Quote:
Originally Posted by lutusp
apart from spaces, "special characters" is an open category -- you will need to determine what special characters are present and how they will affect the process.
Yeah. What lutusp said.

If it's not too much trouble, I would suggest renaming folders with special characters. Also, as I mentioned in my first post, find has an option which allows you to execute code directly from the find. This can be used as a way to circumvent some issues with special characters, sometimes (I used this once to do a mass rename of a bunch of files which had single quotes in the names, which is otherwise a fairly thorny problem)

The '-exec' option to find is used to execute code.

Your find would look something like this:

Code:
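path="path/to/your/stuff"
search_string="whatever you're trying to match"
# The glob and the redirection need a shell, so run one per directory;
# find passes the directory name as $2 without the outer shell re-parsing it.
find "$path" -type d -exec sh -c 'grep -- "$1" "$2"/*.htm > "$2/search_log.txt"' sh "$search_string" {} \;
Again, a little explanation is in order:
When you use the -exec option to find, '{}' is a placeholder for the file or directory that find is operating on, and the whole expression has to be terminated by an escaped semicolon. This works well for special characters because find substitutes '{}' itself and hands the name to the command as a single argument, well after the outer shell has tokenized the command line. The glob and the '>' redirection, though, are shell features, so they have to run inside the 'sh -c' script. Written directly on the find command line, they would be processed once by the outer shell before find even starts, and the output would land in a single file literally named '{}/search_log.txt'.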

I haven't tested this myself, it may take some fiddling with to get right, and you may very well spend a lot less time renaming some directories.
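
Since the command is untested, it may be easiest to try the -exec idea on a scratch tree first. The sketch below is one possible variation, not necessarily the exact command from this post: it wraps grep in sh -c so that the glob and the redirect are evaluated once per directory. The directory names, the pattern, and the scratch location are made up:

```shell
search_string="needle"
path=$(mktemp -d)
mkdir -p "$path/with space" "$path/it's quoted"
printf 'a needle\n' > "$path/with space/a.htm"
printf 'needle b\n' > "$path/it's quoted/b.htm"

# find substitutes {} itself, so each directory arrives in the inner shell as "$2"
# in one piece; the pattern is passed as "$1". Errors from directories without
# .htm files are discarded, and "|| :" ignores grep's "no match" exit status.
find "$path" -type d -exec sh -c \
    'grep -- "$1" "$2"/*.htm > "$2/search_log.txt" 2>/dev/null || :' \
    sh "$search_string" {} \;
```

One side effect worth knowing: directories that hold no .htm files still get an empty search_log.txt, because the redirect creates the file before grep runs.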
 
  



Tags
bash, find, grep


All times are GMT -5. The time now is 10:27 AM.
