LinuxQuestions.org

snaravane 02-07-2011 09:36 AM

GREP help.. Urgent :(
 
Hi, I am a newbie and need your expert help.

I have a doc root (let's say /site/mysite/docs/) where I want to run a recursive grep over all the directories and save the list of matching files in file_list.txt.

The search should work like this:

1. Capture all files that have "<!--# ((Any Text Here)) -->"
2. Capture all files that have BOTH "<!--# ((Any Text Here)) -->" and "<!--#include virtual= ((Path To SSI/HTML)) -->"
3. Ignore all files that have ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->"



I was able to get the first two points done with the following:

find /site/mysite/docs/ -exec grep -ls '<!--#' {} \; > ssi_file_list.txt

However, my boss needs to cut out the files that have ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->".

Can someone help?

tirab 02-07-2011 10:18 AM

I'm not sure I've understood, but to exclude a match from your grep output, use -v. For example:

grep -something- | grep -v -e'<!--#include virtual= ((PATH TO SSI/HTML))-->'

hope this helps

colucix 02-07-2011 11:08 AM

If you're using a recent version of GNU grep, you can use its recursive option (-r) and avoid the find command. Either way, you have to apply grep several times to cover all the requirements. For example, to find the files that have only the <!--#include virtual= pattern, you might do something like this:
Code:

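# List files whose only "<!--#" lines are "<!--#include virtual=" lines
while IFS= read -r file
do
  # Drop the include-virtual lines; if no other "<!--#" line remains, print the file
  if ! grep -v '<!--#include virtual=' "$file" | grep -q '<!--#'
  then
    echo "$file"
  fi
done < <(grep -lr '<!--#include virtual=' *)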
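
As a rough sketch of the opposite selection, which is what the original question asks for (keep every file that has at least one "<!--#" line that is not an include virtual), the same two greps can be used with the test inverted. The function name wanted_ssi_files is made up for illustration, and the check is line-based, so a line carrying both an include virtual and another directive would be missed:

```shell
#!/bin/bash
# wanted_ssi_files DIR (hypothetical helper name)
# Print every file under DIR that has at least one "<!--#" line
# which is not an "<!--#include virtual=" line.
# Assumes a grep that supports -r, as GNU grep does.
wanted_ssi_files() {
  grep -rl -- '<!--#' "$1" | while IFS= read -r file
  do
    # Remove the include-virtual lines, then look for any remaining "<!--#"
    if grep -v -- '<!--#include virtual=' "$file" | grep -q -- '<!--#'
    then
      printf '%s\n' "$file"
    fi
  done
}

# Example call (not executed here):
#   wanted_ssi_files /site/mysite/docs > file_list.txt
```

Files that contain only include virtual lines, or no "<!--#" at all, are left out of the output.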

However, I would try something easier. Suppose you have the following awk code in a file called test.awk:
Code:

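# Map the sum of the two flags to a label
BEGIN {
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

# Any line with an SSI-style comment sets one of the flags
/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4
  else
    comment = 2
}

# Unset flags count as 0, so the sum is always 0, 2, 4 or 6
END {
  print _[comment+virtual]
}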

then you can run it recursively on all the files one at a time (using the find command), check the result and act accordingly. Example:
Code:

#!/bin/bash
while IFS= read -r file
do
  result=$(awk -f test.awk "$file")
  case "$result" in
    "none") echo "$file has none" ;;
    "comment only") echo "$file has only comments" ;;
    "virtual only") echo "$file has only include virtual" ;;
    "both") echo "$file has both the patterns" ;;
  esac
done < <(find /site/mysite/docs -type f)

Just an aside: please don't use words like "urgent" in the thread title, since it's considered rude toward people who voluntarily spend their time answering questions and giving help. Thank you.

grail 02-07-2011 09:01 PM

You could probably combine the two with a small modification:
Code:

#!/usr/bin/awk -f

BEGIN {
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4
  else
    comment = 2
}

END {
  print FILENAME,"has",_[comment+virtual]
}

Then, after making the script executable (chmod +x test.awk), you could just call it from your find:
Code:

find /site/mysite/docs -type f -exec ./test.awk {} \;
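
If spawning one awk per file turns out to be slow on a large docroot, a possible variation is to let find pass many files to a single awk with -exec ... {} + and reset the flags whenever a new file starts. This is only a sketch: the classify_ssi wrapper and the report helper are hypothetical names, and empty files are silently skipped because they never produce a first line:

```shell
#!/bin/bash
# classify_ssi DIR (hypothetical wrapper name)
# Print "<file> has <label>" for every non-empty regular file under DIR,
# running awk once per batch of files instead of once per file.
classify_ssi() {
  find "$1" -type f -exec awk '
    # Print the verdict for the file that has just been read, if any
    function report() {
      if (name != "") print name, "has", label[comment + virtual]
    }
    BEGIN {
      label[0] = "none";         label[2] = "comment only"
      label[4] = "virtual only"; label[6] = "both"
    }
    # First line of each file: report the previous file, then reset the flags
    FNR == 1 { report(); name = FILENAME; comment = virtual = 0 }
    /<!--#/ {
      if ($0 ~ /<!--#include virtual=/) virtual = 4
      else comment = 2
    }
    # Report the last file of the batch
    END { report() }
  ' {} +
}

# Example call (not executed here):
#   classify_ssi /site/mysite/docs
```

Calling classify_ssi /site/mysite/docs and keeping only the lines that end in "comment only" or "both" would give the list the original question asks for.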

snaravane 02-07-2011 11:58 PM

@ Everyone: Thanks a ton, guys. Let me try out the options and update you gurus :)


@ colucix: Hey, I am sorry for using a word like "Urgent". This is my first post ever to any forum, so I didn't know the etiquette :) And thanks again for your help!

Saggu

colucix 02-08-2011 01:15 AM

Quote:

Originally Posted by snaravane (Post 4251528)
@ colucix: Hey, I am sorry for using a word like "Urgent". This is my first post ever to any forum, so I didn't know the etiquette :) And thanks again for your help!

Saggu

No problem! :)

snaravane 02-08-2011 04:07 AM

Quote:

Originally Posted by grail (Post 4251421)
You could probably combine like so with a small modification:
Code:

#!/usr/bin/awk -f

BEGIN {
  _[0] = "none"
  _[2] = "comment only"
  _[4] = "virtual only"
  _[6] = "both"
}

/<!--#/ {
  if ( $0 ~ /<!--#include virtual=/ )
    virtual = 4
  else
    comment = 2
}

END {
  print FILENAME,"has",_[comment+virtual]
}

Then you could just call it in your find:
Code:

find /site/mysite/docs -type f -exec ./test.awk {} \;


Is there an option for find that lets me ignore binary files such as .gz, .tar, .mp3, .gif, etc.?

snaravane 02-08-2011 05:29 AM

Quote:

Originally Posted by tirab (Post 4250890)
I'm not sure I've understood, but to exclude a match from your grep output, use -v. For example:

grep -something- | grep -v -e'<!--#include virtual= ((PATH TO SSI/HTML))-->'

hope this helps

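Hey, I used your code and created a small script. It looks something like this:

#!/bin/bash
ssi_list="/home/wwwdocs/ssi_list.txt"
# Read one file name per line so that names with spaces survive
while IFS= read -r line; do
  # Keep the file if any "<!--#" line is not an include virtual
  # (-q: only the exit status is needed, not the matching lines)
  if grep '<!--#' "$line" | grep -qv '<!--#include virtual='; then
    echo "$line" >> ssi_list_refined.txt
  fi
done < "$ssi_list"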

ssi_list.txt has the list of all the files under the /site/mysite/docs/ docroot.

It's working. Thank you for all your help :)

