LinuxQuestions.org (/questions/)
-   Programming (http://www.linuxquestions.org/questions/programming-9/)
-   -   Bash: recursion = bad? (http://www.linuxquestions.org/questions/programming-9/bash-recursion-%3D-bad-805765/)

bdoremus 05-04-2010 12:00 AM

Bash: recursion = bad?
 
Hello all,
new Bash programmer here. I've dealt with other languages in the past, but I honestly think that's not to my advantage right now.

GOAL:
Starting in a directory, search it and all sub directories for any files containing a certain text string, then copy those files to another directory. There's about 1GB worth of files (all email messages) I want to search through.

PROBLEM:
Since I don't know the depth of the folders, my instinct was to create a recursive script. Unfortunately, I have to run this on a server... and I'm guessing it'll completely kill everything.
I'm hoping there's a cleaner solution?

Code:

#!/bin/bash

#navigate to the backups
cd /mnt/home/user/.mailbox/

for f in *; do

  if [ -f "$f" ]; then # it was a file
      #here is where I'd check for the string
      #if it matched, I'd copy it to the directory
      : # placeholder; bash needs at least one command in this branch

  else # it was a folder
      cd "$f"
      /bin/emailSearch.sh # the script calls itself in the new subdirectory
      cd ../
  fi
done

Help?!

SuperJediWombat! 05-04-2010 12:23 AM

Code:

man find
Something like this will do. If you are expecting to match a large volume of files, you may want to look at piping find into xargs rather than using -exec.
Code:

find /mnt/home/user/.mailbox/ -iname "string" -type f -exec mv '{}' /mnt/home/user/thefolder/ \;
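For what it's worth, GNU find can also batch arguments itself with `-exec ... +`, which behaves much like piping into xargs. A small sketch, with throwaway temp directories standing in for the real paths:

```shell
#!/bin/bash
# Demo of find's own batching: '-exec ... +' appends as many file
# names as fit into one cp invocation, much like xargs would,
# instead of forking one cp per file as ';' does.
src=$(mktemp -d); dest=$(mktemp -d)   # stand-ins for the real paths
touch "$src/a.eml" "$src/b.eml"

find "$src" -type f -exec cp -t "$dest" {} +

n=$(ls "$dest" | wc -l)
echo "$n copied"
rm -rf "$src" "$dest"
```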

bdoremus 05-04-2010 12:51 AM

Doesn't find -iname just search the file name? I need to check text inside the file itself, or perhaps I'm reading the man page wrong?

The check is rather complicated, too. If any part of the "to" field matches three or more of a set of 10 users, it's considered a hit. I can use regular expressions to accomplish this, but again I don't think find can do this inside of a file.

Or am I missing something?
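One way to sketch that check (purely illustrative: the users.txt file, the function name, and the threshold of three are invented here, not from the thread) is to grep the To: header against a pattern file and count distinct matches:

```shell
#!/bin/bash
# Illustrative only: count how many of a set of users appear in the
# To: header of one message; three or more counts as a hit.
count_to_matches() {
  local msg=$1 patterns=$2
  # -o prints each match on its own line; -F treats patterns as fixed
  # strings; sort -u keeps distinct users only.
  grep -i '^To:' "$msg" | grep -oiFf "$patterns" | sort -u | wc -l
}

tmp=$(mktemp -d)
printf 'To: alice@x.com, bob@x.com, carol@x.com\nbody\n' > "$tmp/msg"
printf 'alice\nbob\ncarol\ndave\n' > "$tmp/users.txt"

hits=$(count_to_matches "$tmp/msg" "$tmp/users.txt")
if [ "$hits" -ge 3 ]; then echo "hit ($hits users)"; fi
rm -rf "$tmp"
```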

catkin 05-04-2010 01:25 AM

You could run the script via the nice command to reduce its impact on the server and if execution time is not important you could add a sleep command in the script too.

You could use the file command to restrict the search to appropriate file types.

You could use find to run the script in each directory but that gains little over recursion.
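A minimal illustration of the nice idea (the echoed command is just a stand-in for the real search script):

```shell
#!/bin/bash
# nice -n 19 runs a command at the lowest CPU priority; the echo here
# is a stand-in for the real search script.
out=$(nice -n 19 sh -c 'echo low-priority run')
echo "$out"
# Inside the per-file loop one could also throttle I/O with, e.g.:
#   sleep 0.1
```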

A.Thyssen 05-04-2010 01:40 AM

Have a look at grep -R option
This is a recursive grep.

If you combine it with the -l option you get a list of all files containing the search string.

SuperJediWombat! 05-04-2010 02:31 AM

My mistake, I misread your post and thought that you only needed to search the filenames.
Code:

find . -type f | xargs grep -s -l "test" | xargs -I {} cp {} ~/

grail 05-04-2010 02:44 AM

+1 to A. Thyssen - and if you use the -f option with the name of the 10 users in a file you can cover that part too :)

bdoremus 05-04-2010 11:14 PM

Awesome, thanks guys!

I was able to rewrite it using grep -rl; definitely cleaned things up a lot!

@grail: I wasn't able to find any documentation on the "-f" option; maybe I'm not looking in the right place (man grep)?

@Super: I'm going to come back to this; it's taking me a while to decipher. Thanks!

SuperJediWombat! 05-05-2010 12:05 AM

Try this :)
Quote:

grep -R -s -l "test" /mnt/home/user/.mailbox/ | xargs -I {} cp {} /mnt/home/user/movedir/
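If any of the mailbox file names contain spaces, a null-delimited variant of the same pipeline is safer; this sketch uses temporary directories in place of the real paths:

```shell
#!/bin/bash
# Same pipeline, but NUL-delimited: grep -Z ends each file name with a
# NUL byte and xargs -0 reads them, so spaces in names are safe.
src=$(mktemp -d); dest=$(mktemp -d)   # stand-ins for the real paths
printf 'has test inside\n' > "$src/with space.txt"
printf 'nothing here\n'    > "$src/plain.txt"

grep -RlsZ "test" "$src" | xargs -0 -I {} cp {} "$dest/"

copied=$(ls "$dest")
echo "$copied"
rm -rf "$src" "$dest"
```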

grail 05-05-2010 03:02 AM

Yes to 'man grep':

Code:

-f FILE, --file=FILE
              Obtain patterns from FILE, one per line.  The empty file
              contains zero patterns, and therefore matches nothing.
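A quick sketch of how -f fits the ten-user check (names and paths here are invented for the demo):

```shell
#!/bin/bash
# Demo of -f: the user names go in a file, one per line, and
# grep -Rlif lists every message whose text matches any of them.
tmp=$(mktemp -d)
mkdir -p "$tmp/mail/sub"
printf 'To: alice@example.com\nhello\n' > "$tmp/mail/a.eml"
printf 'To: zed@example.com\nhello\n'   > "$tmp/mail/sub/b.eml"
printf 'alice\nbob\n' > "$tmp/users.txt"

matches=$(grep -Rlif "$tmp/users.txt" "$tmp/mail")
echo "$matches"
rm -rf "$tmp"
```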


bdoremus 05-05-2010 09:01 PM

rightyo- I've restructured my program after learning about the tools you all mentioned here; thanks!

SuperJediWombat! 05-05-2010 09:33 PM

Can you post it?

bdoremus 05-05-2010 09:49 PM

It's still broken, but with other problems :p

In essence, what I did was:
1) run a find to get a list of all files in the applicable folders (current problem, see other post)
allFiles=$(find $src -print)
2) grep through the files to see if they had the string I wanted, and save the file paths of the matches to a variable
files=$(grep -il -E "^To:.*($bod|$admin)" $allFiles)
3) iterate through the matches to modify the files,
for f in $files; do
...

4) then copy them over
cp "$fSrc" "$dest$fName"

Thanks again! I've learned a ton so far because of this little project!

tuxdev 05-06-2010 09:47 AM

Code:

allFiles=$(find $src -print)
It's almost never correct to put the results of find into a variable like that...

Code:

files=$(grep -il -E "^To:.*($bod|$admin)" $allFiles)
..because with proper quoting $allFiles is one big long string, and without quoting you get word-splitting problems.

Code:

for f in $files; do
Again, not quoting means word-splitting, and that's almost always a Bad Thing.

You can also use this sort of loop:
Code:

while IFS="" read -r -d "" file ; do
  if grep -iq -E "^To:.*($bod|$admin)" "$file" ; then
      cp ...
  fi
done < <(find "$src" -print0)
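For completeness, here is one runnable version of that loop with the elided cp filled in; the sample messages, patterns, and destination directory are invented for the demo, and -type f is added so grep never sees a directory:

```shell
#!/bin/bash
# Runnable version of the while/read -d "" loop; sample data, patterns
# and the destination directory are invented for the demo.
src=$(mktemp -d); dest=$(mktemp -d)
mkdir -p "$src/deep dir"                      # space in name on purpose
printf 'To: bob@example.com\nhi\n' > "$src/deep dir/hit.eml"
printf 'To: eve@example.com\nhi\n' > "$src/deep dir/miss.eml"
bod='bob'; admin='admin'

while IFS="" read -r -d "" file ; do
  if grep -iq -E "^To:.*($bod|$admin)" "$file" ; then
      cp -- "$file" "$dest/"
  fi
done < <(find "$src" -type f -print0)   # -type f keeps dirs out of grep

copied=$(ls "$dest")
echo "$copied"
rm -rf "$src" "$dest"
```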


catkin 05-06-2010 11:47 AM

Quote:

Originally Posted by tuxdev (Post 3959365)
Code:

while IFS="" read -r -d "" file ; do
  if grep -iq -E "^To:.*($bod|$admin)" "$file" ; then
      cp ...
  fi
done < <(find "$src" -print0)


As someone pointed out in another thread (CTOP) there is no need for the IFS="" when each line is being read into a single variable, in this case $file.

