LinuxQuestions.org

Linux - Newbie: logging recursive grep output (https://www.linuxquestions.org/questions/linux-newbie-8/logging-recursive-grep-output-754654/)

zvar 09-12-2009 10:45 AM

logging recursive grep output
 
I have a file structure kinda like below:
Main
|-Folder1
|--*.htm
|
|-Folder2
|--*.htm
|
|-Folder3
|--*.htm

I want to grep the htm files and output the results to each folder like:
Main
|-Folder1
|--*.htm
|--results.txt
|
|-Folder2
|--*.htm
|--results.txt

I can do something like "grep -r $pattern * > results.txt", but that only puts everything in Main/results.txt and doesn't break it up. Is there a way to have each folder contain only the results for the .htm files in that folder?

bartonski 09-12-2009 01:29 PM

for loop, find and grep.
 
Quote:

Originally Posted by zvar (Post 3680101)
I want to grep the htm files and output the results to each folder like:
Main
|-Folder1
|--*.htm
|--results.txt
|
|-Folder2
|--*.htm
|--results.txt

grep isn't powerful enough to do this in and of itself, but I think a combination of the shell's for loop and find might do what you need:

Code:

for dir in $(find . -type d)
do
    grep "your pattern here" $dir/*.htm > $dir/results.txt
done

A little explanation is in order:

find is a program that is used to traverse file system trees and test each directory and/or file within them. It's a very powerful tool with a little bit of a learning curve (and that's just a hair of an understatement).

The -type argument to find lets you specify whether you want to match directories (-type d) or regular files (-type f).

So in this case, we're looking for all directories under the current directory, including the current directory (.) itself.
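A quick way to see what the for loop will walk through is to build a copy of the example tree in a scratch directory and list it. This is only a sketch: the Folder1..Folder3 names come from the question, and mktemp -d is used just to keep the demo away from real data. Note that find . -type d prints "." as well, so the loop also runs once for the top-level folder.

```shell
#!/bin/sh
# Scratch copy of the tree from the question (demo names only).
demo=$(mktemp -d)
mkdir "$demo/Folder1" "$demo/Folder2" "$demo/Folder3"

# Every directory find reports, including the starting point ".".
# LC_ALL=C sort only makes the printed order predictable.
dirs=$(cd "$demo" && find . -type d | LC_ALL=C sort)
printf '%s\n' "$dirs"
rm -rf "$demo"
```

This prints ".", "./Folder1", "./Folder2" and "./Folder3", one per line.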

The for loop then executes the commands between "do" and "done" for each directory.

There are probably a few other ways to do this: 'find' can be used to execute commands directly, without use of the for loop, or, if you're a perlish type of person, there's a module called "File::Find" which does everything that find does, and more.
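As a rough sketch of that -exec route (not code from this thread), the loop can be moved inside a small sh -c script that find calls with a batch of directory names. The scratch tree, the file names and the pattern "foo" are made up for the demo; results.txt is the name from the question.

```shell
#!/bin/sh
# Scratch tree for the demo; note the space in "Folder 1".
root=$(mktemp -d)
mkdir "$root/Folder 1" "$root/Folder2"
printf 'foo here\nbar\n' > "$root/Folder 1/a.htm"
printf 'bar only\n' > "$root/Folder2/b.htm"
pattern=foo

# With {} + find appends the directory names to the sh command line, so each
# name arrives as its own argument and is never re-split by a shell.
find "$root" -type d -exec sh -c '
  pattern=$1; shift
  for dir do                      # loop over the directory arguments
    set -- "$dir"/*.htm           # safe: the for list was expanded already
    [ -e "$1" ] || continue       # glob matched nothing: no .htm files here
    grep -- "$pattern" "$@" > "$dir/results.txt"
  done
' sh "$pattern" {} +
```

Directories without any .htm files are skipped, while a folder whose .htm files contain no match still gets an empty results.txt.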

If this works for you and you're interested, I would take a look at the man pages for bash (to learn about for loops), find, and xargs (which is often used in conjunction with find).
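For the xargs route, here is a minimal sketch that lists which .htm files contain the pattern. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do); the tree and the pattern are demo values.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir "$root/Bob's Folder" "$root/Other"
printf 'foo\n' > "$root/Bob's Folder/a.htm"
printf 'bar\n' > "$root/Other/b.htm"
pattern=foo

# -print0 ends each name with a NUL byte and xargs -0 splits only on NUL,
# so spaces and quotes in names survive; grep -l prints matching file names.
matches=$(find "$root" -type f -name '*.htm' -print0 | xargs -0 grep -l -- "$pattern")
printf '%s\n' "$matches"
```

Without -print0/-0, plain xargs treats blanks and quotes in its input specially, which is exactly what trips up a name like "Bob's Folder".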

zvar 09-12-2009 03:29 PM

Quote:

Originally Posted by bartonski (Post 3680227)
Code:

for dir in $(find . -type d)
do
    grep "your pattern here" $dir/*.htm > $dir/results.txt
done


This works great for some of the folders, but some have spaces in the name, and others even have single quotes in the name. Is there a way to escape all the special characters?

I didn't find anything in the help and man pages for find or for to do this.
All the experiments I did trying to add double quotes failed. Is there a way to get folders with special characters to be passed?

And thanks for the help already.
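The loop breaks because of word splitting: the unquoted $(find ...) output is split on whitespace (and glob-expanded) before for ever sees it. A small demonstration with a made-up folder name shows one directory turning into two loop iterations. The single quote itself is harmless here, since quotes that come out of an expansion are not reinterpreted; it's the space that does the damage.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir "$root/Bob's Folder"       # one directory, name with a space and a quote

count=0
for dir in $(find "$root/Bob's Folder" -type d); do   # unquoted: gets split
  count=$((count + 1))
  printf 'loop item: %s\n' "$dir"
done
echo "1 directory, $count loop iterations"
```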

lutusp 09-12-2009 09:12 PM

Quote:

Originally Posted by zvar (Post 3680327)
This works great for some of the folders, but some have spaces in the name, and others even have single quotes in the name. Is there a way to escape all the special characters?

I didn't find anything in the help and man pages for find or for to do this.
All the experiments I did trying to add double quotes failed. Is there a way to get folders with special characters to be passed?

And thanks for the help already.

The method below will handle spaces in the paths, but apart from spaces, "special characters" is an open category -- you will need to determine what special characters are present and how they will affect the process.

Code:

pattern="this"
path="/path/to/files"

find "$path" -type f | grep -E '\.html?$' | while IFS= read -r file
do
  target="$(dirname "$file")/search_log.txt"
  count=$(grep -c -- "$pattern" "$file")
  echo "There were $count matching lines for \"$pattern\" in $file"
  # echo "There were $count matching lines for \"$pattern\" in $file" >> "$target"
done

I commented out the line that writes entries into the per-folder "search_log.txt" files because I didn't want to pollute my system with them; testing that part is up to you. But remember this: the log files are appended to on each run, so before running the program a second time you need to find and remove all of them.

Be careful when making changes to this program, and use only backup copies of your data tree for testing.
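The loop above reads one path per line, so a name containing a newline (legal, if unusual) would still be split. As a sketch, the same per-file count can be done with find's -exec ... {} +, which hands names over as arguments rather than as lines of text. The scratch tree and the "Found ..." wording are demo choices, and like the original it appends, so rerunning adds duplicate entries.

```shell
#!/bin/sh
root=$(mktemp -d)
weird="$root/it's a
dir"                                # demo name with a quote, a space and a newline
mkdir "$weird"
printf 'this line\nother line\n' > "$weird/page.html"
pattern="this"

# find passes the names as arguments, so even a newline in a name is safe.
find "$root" -type f \( -name '*.htm' -o -name '*.html' \) -exec sh -c '
  pattern=$1; shift
  for file do
    target="${file%/*}/search_log.txt"      # same folder as the .htm(l) file
    count=$(grep -c -- "$pattern" "$file")  # counts matching lines
    echo "Found $count matching line(s) for \"$pattern\" in $file" >> "$target"
  done
' sh "$pattern" {} +
```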

bartonski 09-13-2009 08:45 AM

Quote:

Originally Posted by lutusp (Post 3680568)
apart from spaces, "special characters" is an open category -- you will need to determine what special characters are present and how they will affect the process.

Yeah. What lutusp said.

If it's not too much trouble, I would suggest renaming folders that have special characters in their names. Also, as I mentioned in my first post, find has an option that lets you execute commands directly from find. This can sometimes be used to get around problems with special characters. (I once used it to mass-rename a bunch of files that had single quotes in their names, which is otherwise a fairly thorny problem.)

The '-exec' option to find is used to execute code.

Your find would look like this:

Code:

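path="path/to/your/stuff"
search_string="whatever you're trying to match"
# The redirection has to run inside a child shell, once per directory; written
# directly on the find command line, "> {}/search_log.txt" is handled by your own shell before find starts.
find "$path" -type d -exec sh -c \
  'grep -- "$1" "$2"/*.htm > "$2/search_log.txt"' sh "$search_string" {} \;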

Again, a little explanation is in order:
When you use the -exec option to find, '{}' is a placeholder for the file or directory that find is operating on, and the whole expression has to be terminated by an escaped semicolon (\;). The reason this works so well for special characters is that '{}' isn't filled in until find actually runs the command, which is well after the shell has tokenized the command line, so each name is passed along as a single argument and never gets re-split.
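One way to convince yourself that '{}' really arrives as a single argument is to have find run a tiny script that just reports its argument count ($#). This sketch also contrasts the \; terminator with {} +, which batches many names into one command; the directory names are made up for the demo.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir "$root/My Folder" "$root/Bob's Folder"   # awkward demo names

# With \; find runs the command once per match, and {} is always exactly
# one argument ($# is 1), whatever characters the name contains.
per_match=$(find "$root" -type d -exec sh -c 'echo "$#"' sh {} \;)

# With {} + find packs many matches into one command; here all three
# directories ($root and its two children) arrive together.
batched=$(find "$root" -type d -exec sh -c 'echo "$#"' sh {} +)

printf 'per match:\n%s\nbatched: %s\n' "$per_match" "$batched"
```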

I haven't tested this myself, so it may take some fiddling to get right, and you may very well spend a lot less time just renaming some directories.
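The thread doesn't show the command bartonski used for his mass rename, so the following is only a sketch of one way to do it, assuming that replacing each single quote with an underscore is acceptable. The quote character is passed in as an argument because it can't appear inside the single-quoted sh -c script; the demo names are made up.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir "$root/Bob's stuff"
: > "$root/Bob's stuff/it's.htm"   # demo names containing single quotes

# -depth lists a directory's contents before the directory itself, so a
# file is renamed before its parent folder's name changes under it.
find "$root" -depth -name "*'*" -exec sh -c '
  q=$1; shift                       # the character to replace: a single quote
  for path do
    dir=${path%/*} base=${path##*/}
    new=$(printf "%s\n" "$base" | tr "$q" _)
    mv -- "$path" "$dir/$new"
  done
' sh "'" {} +
```

After it runs, the demo file ends up as "Bob_s stuff/it_s.htm".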


All times are GMT -5.