LinuxQuestions.org
06-18-2007, 06:40 AM   #1
Epitaph
LQ Newbie
 
Registered: Mar 2007
Posts: 2

Rep: Reputation: 0
Bash script help


Hi

I'm a scripting noob. Having spent the past few weeks reading the books, I'm now trying to put it all into practice.

Part of the bash script that I am working on requires me to find and list all of the .htm files in a volume, then search each of those files for specific words, then export any of those .htm files that contain any of my search terms to an output folder. I have a bit of a blind spot with the final part...I suspect the solution is stunningly simple, can anyone help?

The relevant part of the script is as follows:

find search_path -iname '*.htm' > htm.txt

cat htm.txt | while read i; do grep -if /path_to_search_terms "$i" ; done > hit_htm.txt

Obviously, from the above I get a large chunk of raw html containing my search hits. How do I copy each .htm file that contains one or more of my search terms to an output folder?

Thanks in advance
 
06-18-2007, 07:21 AM   #2
zaichik
Member
 
Registered: May 2004
Location: Iowa USA
Distribution: CentOS
Posts: 419

Rep: Reputation: 30
How about:
Code:
for i in `find search_path -iname *.htm*`
do
   [[ grep -if /path_to_regexes $i ]] && cp $i /path/to/output/dir
done
 
06-18-2007, 10:59 AM   #3
Epitaph
LQ Newbie
 
Registered: Mar 2007
Posts: 2

Original Poster
Rep: Reputation: 0
Thanks for the reply zaichik. I am getting the following error message:

bash: conditional binary operator expected
bash: syntax error near '-if'

The path to my regex file is correct. I have tried putting the $i variables inside " ", but I still get the same error message. I have noticed that some of the file paths and/or file names of the .htm* files have spaces in them. Could that be the problem?
 
06-18-2007, 02:11 PM   #4
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Code:
for i in $(find search_path -iname '*.htm*'); do
   if grep -qif /path_to_regexes $i; then
     cp $i /path/to/output/dir
   fi
done
might work - you need to quote the find pattern, probably don't want the grep output (thus the q flag) and need to check its exit status, not its output, anyway. I also modernized the command expansion ($(...) instead of `...`) and wrote out the full if..then..fi construct.
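
To illustrate the exit-status point: `[[ ... ]]` evaluates a conditional expression rather than running a command, which is why bash complained about `-if`, whereas a plain `if` runs the command and branches on whether it succeeded. The following is only a minimal sketch; the scratch directory and the file names (terms.txt, "page one.htm", other.htm) are made up for the demonstration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical demo data in a throwaway directory (not from the thread).
demo=$(mktemp -d)
printf 'penguin\n' > "$demo/terms.txt"
printf '<p>A Penguin page</p>\n' > "$demo/page one.htm"
printf '<p>Nothing here</p>\n' > "$demo/other.htm"

hits=0
for f in "$demo"/*.htm; do
    # grep -q prints nothing; "if" branches on its exit status (0 means a match).
    if grep -qif "$demo/terms.txt" "$f"; then
        hits=$((hits + 1))
        echo "match: ${f##*/}"
    fi
done
```

Because the loop iterates over a glob and quotes "$f", the file name containing a space is handled as a single word here.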

Last edited by slakmagik; 06-18-2007 at 02:12 PM. Reason: trivial style - also changed the for..do..done to how I write them
 
06-18-2007, 03:02 PM   #5
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Actually, I got to thinking about it again and it's probably best/simplest just to make minimal modifications to what you had to begin with. You may *need* the htm.txt in addition to actually copying/moving files. And spaces *can* be problematic.

Code:
#!/bin/sh

find search_path -iname '*.htm*' > htm.txt

# Read one path per line; IFS= and -r keep leading spaces and backslashes intact.
while IFS= read -r i; do
    if grep -qif /path_to_search_terms "$i"; then
        cp "$i" output_dir
    fi
done < htm.txt
You should probably throw in an 'if [ ! -d output_dir ]; then mkdir output_dir; fi' sort of thing before you try to copy to it (mkdir -p if you want it deeper in the hierarchy).
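
As a rough sketch of how the directory check and the copy loop could fit together, the example below builds a small scratch tree and runs the whole thing end to end. Everything under `$work` (the directory layout, terms.txt, the two .htm files, and the nested `output_dir/deeper`) is a made-up stand-in for search_path, the search-terms file, and the output folder.

```shell
#!/bin/sh
# Hypothetical layout: a scratch tree standing in for search_path and output_dir.
work=$(mktemp -d)
mkdir -p "$work/search_path/sub dir"
printf 'linux\n' > "$work/terms.txt"
printf '<p>Linux rocks</p>\n' > "$work/search_path/sub dir/my page.htm"
printf '<p>unrelated</p>\n' > "$work/search_path/skip.htm"

out="$work/output_dir/deeper"
# mkdir -p creates missing parents and does not fail if the directory exists,
# so no separate "if [ ! -d ... ]" test is needed.
mkdir -p "$out"

find "$work/search_path" -iname '*.htm*' > "$work/htm.txt"
while IFS= read -r i; do
    # Copy the file only when grep finds at least one search term in it.
    if grep -qif "$work/terms.txt" "$i"; then
        cp "$i" "$out"
    fi
done < "$work/htm.txt"
```

One thing to keep in mind with this approach: cp overwrites by default, so two matching files with the same name in different directories would end up as a single file in the output folder.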
 
06-18-2007, 03:16 PM   #6
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Quote:
Originally Posted by Epitaph
[...]
Part of the bash script that I am working on requires me to find and list all of the .htm files in a volume, then search each of those files for specific words, then export any of those .htm files that contain any of my search terms to an output folder.
[...]
Assuming Linux (GNU utilities and bash):

Code:
xargs cp -t your_out_dir < <(grep -rlf your_pattern_file --include='*.htm' *)
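
Since spaces in file names came up earlier in the thread, here is a possible variant of the same one-liner that passes NUL-terminated names from grep to xargs. It is a sketch that assumes GNU grep, GNU xargs, and GNU cp; the scratch directory, patterns.txt, and the sample pages are hypothetical, and `-i` is added to match the case-insensitive search used in the other posts.

```shell
#!/bin/bash
# Hypothetical scratch data; requires GNU grep, xargs, and cp.
work=$(mktemp -d)
mkdir -p "$work/site/a b" "$work/out"
printf 'kernel\n' > "$work/patterns.txt"
printf '<p>Kernel notes</p>\n' > "$work/site/a b/notes page.htm"
printf '<p>nothing</p>\n' > "$work/site/plain.htm"

cd "$work/site" || exit 1
# -r recurse, -l list matching files, -i ignore case, -Z end each name with NUL,
# -f read patterns from a file. xargs -0 splits on NUL; -r skips cp on empty input.
grep -rliZf "$work/patterns.txt" --include='*.htm' . |
    xargs -0 -r cp -t "$work/out"
```

Without `-Z` and `-0`, xargs splits its input on whitespace, so a path such as "a b/notes page.htm" would be handed to cp as several separate arguments.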
 
06-20-2007, 06:28 AM   #7
hoodedmanwithsythe
Member
 
Registered: Apr 2006
Location: West Midlands, UK
Distribution: mandriva, centos, debian
Posts: 91

Rep: Reputation: 15
That sounds pretty, but what's the pattern file? It may look dumb to ask, but I need something similar, only I have no idea how to make a pattern file either.

Last edited by hoodedmanwithsythe; 06-20-2007 at 06:32 AM.
 
06-20-2007, 06:30 AM   #8
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Quote:
Originally Posted by hoodedmanwithsythe
That sounds pretty, but what's the pattern file?
[...]
The one indicated as "/path_to_regexes" in the previous posts.
 
06-22-2007, 10:39 AM   #9
hoodedmanwithsythe
Member
 
Registered: Apr 2006
Location: West Midlands, UK
Distribution: mandriva, centos, debian
Posts: 91

Rep: Reputation: 15
Yes, I understood that, but I am not the OP, and I could do with knowing what this is and how I would make one, please.
 
06-22-2007, 11:58 AM   #10
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Simply a regular text file containing a list of regexes that the -f option tells grep to look for. Man grep for '-f' and for 'REGULAR EXPRESSIONS'.
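
For a concrete picture, a pattern file can be created with any text editor or straight from the shell, one regex per line. The sketch below is only an illustration: the file name patterns.txt, the three patterns, and the sample .htm files are invented for the example.

```shell
#!/bin/bash
# Hypothetical pattern file: one regex per line (basic regular expressions by default).
work=$(mktemp -d)
cat > "$work/patterns.txt" <<'EOF'
penguin
kern[ae]l
^<title>
EOF

printf '<title>Home</title>\n' > "$work/a.htm"
printf '<p>the kernal typo</p>\n' > "$work/b.htm"
printf '<p>no hits</p>\n' > "$work/c.htm"

# -l lists files where any line matches any pattern; -i ignores case.
matches=$(cd "$work" && grep -lif patterns.txt a.htm b.htm c.htm)
echo "$matches"
```

Here a.htm is listed because of the `^<title>` pattern and b.htm because of `kern[ae]l`. Beware of blank lines in the pattern file: an empty pattern matches every line, so every file would be reported.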
 
06-23-2007, 07:28 AM   #11
hoodedmanwithsythe
Member
 
Registered: Apr 2006
Location: West Midlands, UK
Distribution: mandriva, centos, debian
Posts: 91

Rep: Reputation: 15
oh ok thanks
 
  



Tags
bash, linux, script


All times are GMT -5.
