LinuxQuestions.org
Old 02-07-2011, 10:36 AM   #1
snaravane
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Rep: Reputation: 3
GREP help.. Urgent :(


Hi, I am a newbie and need your expert help.

I have a doc root (let's say /site/mysite/docs/) where I want to run a recursive grep over all the directories and save the list of matching files in file_list.txt.

The search should work like this:

1. Capture all files which have "<!--# ((Any Text Here)) -->"
2. Capture all files that have BOTH "<!--# ((Any Text Here)) -->" and "<!--#include virtual= ((Path To SSI/HTML)) -->"
3. Ignore all files that have ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->"



I was able to get the first two points done with the following:

find /site/mysite/docs/ -exec grep -ls '<!--#' {} \; > ssi_file_list.txt

However, my boss needs me to leave out the files which have ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->".

Can someone help?
 
Old 02-07-2011, 11:18 AM   #2
tirab
Member
 
Registered: Dec 2010
Location: Italy, Lucca
Distribution: Slackware
Posts: 38

Rep: Reputation: 0
I'm not sure I've understood, but to exclude a match from your grep output use -v, for example:

grep -something- | grep -v -e'<!--#include virtual= ((PATH TO SSI/HTML))-->'

hope this helps
 
Old 02-07-2011, 12:08 PM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
If you're using a recent version of GNU grep, you can use its recursive option (-r) and skip the find command. Either way, you have to apply grep more than once to cover all the requirements. For example, to find the files that have only the <!--#include virtual= pattern, you might do something like this:
Code:
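# Print files whose only SSI directives are "include virtual" lines
while IFS= read -r file
do
  # Drop the include lines; if no other "<!--#" line remains, report the file
  if ! grep -v '<!--#include virtual=' "$file" | grep -q '<!--#'
  then
    echo "$file"
  fi
done < <(grep -lr '<!--#include virtual=' *)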
However, I would try something easier. Suppose you have the following awk code in a file called test.awk:
Code:
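BEGIN {
  # Labels indexed by the sum of the two flags set below
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

# Any line that contains an SSI directive
/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4   # include virtual seen
  else
    comment = 2   # some other directive seen
}

END {
  print _[comment+virtual]
}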
then you can run it recursively on all the files one at a time (using the find command), check the result and act accordingly. Example:
Code:
#!/bin/bash
while IFS= read -r file
do
  result=$(awk -f test.awk "$file")
  case "$result" in
    "none") echo "$file has none" ;;
    "comment only") echo "$file has only comments" ;;
    "virtual only") echo "$file has only include virtual" ;;
    "both") echo "$file has both the patterns" ;;
  esac
done < <(find /site/mysite/docs -type f)
Just a side note: please don't use words like "urgent" in the thread title, since it's considered rude toward people who volunteer their time to answer questions and give help. Thank you.
 
Old 02-07-2011, 10:01 PM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,250

Rep: Reputation: 2684
You could probably combine like so with a small modification:
Code:
#!/usr/bin/awk -f

BEGIN {
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4
  else
    comment = 2
}

END {
  print FILENAME,"has",_[comment+virtual]
}
Then you could just call it in your find:
Code:
find /site/mysite/docs -type f -exec ./test.awk {} \;
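A variation on the same idea, offered only as a sketch: with `-exec ... {} +` find hands many files to one awk process, so the per-file flags have to be reset whenever a new file starts (`FNR == 1`). The names `classify_tree`, `report` and `label` are made up for this illustration, and one known gap is that empty files never trigger `FNR == 1`, so they are not reported at all.

```shell
#!/bin/bash
# classify_tree DIR: report the SSI category of every non-empty file under DIR,
# using one awk process per batch of files instead of one per file.
classify_tree() {
  find "$1" -type f -exec awk '
    # Print the verdict for the file that just ended, if there was one
    function report() {
      if (name != "") print name, "has", label[comment + virtual]
    }
    BEGIN {
      label[0] = "none";         label[2] = "comment only"
      label[4] = "virtual only"; label[6] = "both"
    }
    # First line of each file: flush the previous file and reset the flags
    FNR == 1 { report(); name = FILENAME; comment = 0; virtual = 0 }
    /<!--#/ {
      if ($0 ~ /<!--#include virtual=/) virtual = 4
      else comment = 2
    }
    # The last file in the batch has no following file to flush it
    END { report() }
  ' {} +
}
```

It could then be called as `classify_tree /site/mysite/docs`. As in the script above, a line is judged as a whole, so a line holding both an include virtual and another directive counts as virtual only.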
 
Old 02-08-2011, 12:58 AM   #5
snaravane
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 3
@ Everyone: Thanks a ton, guys. Let me try out the options and update you, gurus.


@ colucix: Hey, I am sorry for using a word like "Urgent". This is my first post ever to any forum, so I didn't know the etiquette. And thanks again for your help!

Saggu
 
Old 02-08-2011, 02:15 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Quote:
Originally Posted by snaravane View Post
@ colucix: Hey, I am sorry for using a word like "Urgent". This is my first post ever to any forum, so I didn't know the etiquette. And thanks again for your help!

Saggu
No problem!
 
Old 02-08-2011, 05:07 AM   #7
snaravane
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 3
Quote:
Originally Posted by grail View Post
You could probably combine like so with a small modification:
Code:
#!/usr/bin/awk -f

BEGIN {
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4
  else
    comment = 2
}

END {
  print FILENAME,"has",_[comment+virtual]
}
Then you could just call it in your find:
Code:
find /site/mysite/docs -type f -exec ./test.awk {} \;

Is there an option with find that lets me ignore all binary files like .gz, .tar, .mp3, .gif, etc.?
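A possible way to do this, as a sketch under assumptions: find tests names and file attributes rather than content, so one option is to exclude extensions by name with `! -name`. Another is grep's `-I` option, which treats binary files as if they contained no match. The function names `find_text_candidates` and `list_ssi_text_files` are made up for illustration, and the extension list only covers the ones named in the question.

```shell
#!/bin/bash
# find_text_candidates DIR: print regular files under DIR, skipping some
# binary formats purely by file name (the content is never inspected).
find_text_candidates() {
  find "$1" -type f \
    ! -name '*.gz' ! -name '*.tar' ! -name '*.mp3' ! -name '*.gif' \
    -print
}

# list_ssi_text_files DIR: list files under DIR containing "<!--#",
# letting grep itself ignore files it detects as binary (-I).
list_ssi_text_files() {
  grep -rlI '<!--#' "$1"
}
```

The first function depends on files being named honestly, while the second decides from the content, so a text file that happens to be called something.gz is skipped by one and kept by the other.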
 
Old 02-08-2011, 06:29 AM   #8
snaravane
LQ Newbie
 
Registered: Feb 2008
Posts: 4

Original Poster
Rep: Reputation: 3
Quote:
Originally Posted by tirab View Post
I'm not sure I've understood, but to exclude a match from your grep output use -v, for example:

grep -something- | grep -v -e'<!--#include virtual= ((PATH TO SSI/HTML))-->'

hope this helps
Hey, I used your code and created a small script. It looks something like this:

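#!/bin/bash
ssi_list="/home/wwwdocs/ssi_list.txt"
# Read one path per line so names containing spaces survive
while IFS= read -r line; do
  # Succeeds when the file has a "<!--#" line that is not an include virtual
  if grep '<!--#' "$line" | grep -qv '<!--#include virtual='; then
    echo "$line" >> ssi_list_refined.txt
  fi
done < "$ssi_list"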

ssi_list.txt holds the list of all the files under the /site/mysite/docs/ docroot.

It's working. Thank you for all your help!
 
  


All times are GMT -5.
