LinuxQuestions.org


kmkocot 04-21-2015 03:59 AM

grep: find files that do not have multiple different strings
 
Hi all,

I'm trying to identify files that do not have matches for certain strings. FYI, these are files of DNA sequences, and I'm trying to find those that contain NO sequences from any species in my group of interest (e.g., genes that are specific to that group of organisms).

I tried this code, but it's actually yielding a list of files that DO match my patterns.
Code:

for FILENAME in *.fas
do
grep -q -L ">PBAH" $FILENAME && grep -q -L ">SKOW" $FILENAME && grep -q -L ">CGRA" $FILENAME && echo $FILENAME
done

Basically, I want to go through the files and find those that contain none of ">PBAH", ">SKOW", or ">CGRA". Any assistance would be greatly appreciated!

Best,
Kevin

pan64 04-21-2015 04:10 AM

awk would be better I think (or perl, python ...)
Code:

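# GNU awk (gawk): BEGINFILE/ENDFILE run before/after each input file
gawk 'BEGINFILE { found = 0 }
      />PBAH/   { found = 1; nextfile }
      />SKOW/   { found = 1; nextfile }
      />CGRA/   { found = 1; nextfile }
      ENDFILE   { if (!found) print FILENAME }
     ' *.fas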

If I understand you correctly.
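For an awk without gawk's BEGINFILE/ENDFILE extensions (mawk, for instance), a rough portable sketch can notice each new file with FNR == 1 instead. It assumes the tags always sit at the start of a FASTA header line, hence the ^> anchor; files_without_tags is just a hypothetical name:

```shell
# Print each file with no header line starting with >PBAH, >SKOW or >CGRA.
# Portable sketch: FNR == 1 marks the first line of every new file, which
# is where the verdict for the previous file is printed.
# files_without_tags is a hypothetical helper name.
files_without_tags() {
    awk '
        FNR == 1 {                          # first line of a new file
            if (NR > 1 && !found) print prev
            prev = FILENAME
            found = 0
        }
        /^>(PBAH|SKOW|CGRA)/ { found = 1 }  # header tag seen in this file
        END { if (NR > 0 && !found) print prev }
    ' "$@"
}

# Usage:
#   files_without_tags *.fas
```

One caveat of this version: an empty file has no first line, so it is never reported.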

millgates 04-21-2015 04:21 AM

First, using -q and -L together seems a bit nonsensical: -L tells grep to print the names of files that don't match, but -q suppresses all output.
Also, you seem to be confusing the program's standard output with its exit status. The && operators depend on the latter, while -L affects the former.
So, if the file does contain the first pattern, grep returns 0 and the next grep is executed. If, however, the first pattern is not included, grep prints nothing because of the -q switch and returns 1, which means the tests for the other patterns are not executed. The net effect is that echo runs only for files that contain all three patterns, which is why your loop lists files that DO match.
Finally, you are running the greps in a loop, but the -L switch makes most sense when grep is given multiple filenames, so it can list the ones that don't match.
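A minimal sketch of the loop rewritten to rely on exit status alone, assuming a POSIX sh and grep (list_files_lacking is a hypothetical name). A single grep with several -e patterns tests all three tags at once:

```shell
# Print each named file that contains none of the tags >PBAH, >SKOW, >CGRA.
# Decides on grep's exit status, not its output: with -q, grep prints
# nothing and exits 0 if some line matched, 1 if none did, 2 on error.
# list_files_lacking is a hypothetical helper name.
list_files_lacking() {
    for f in "$@"; do
        status=0
        # "|| status=$?" captures the code without tripping "set -e"
        grep -q -e '>PBAH' -e '>SKOW' -e '>CGRA' -- "$f" || status=$?
        if [ "$status" -eq 1 ]; then    # 1 = no tag found in this file
            printf '%s\n' "$f"
        fi
    done
}

# Usage:
#   list_files_lacking *.fas
```

Testing for status 1 rather than using `! grep` keeps missing or unreadable files (status 2) out of the list.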

So, some possibilities:

Code:

grep -L "PBAH\|SKOW\|CGRA" *.fas

Or, if you have more patterns, you can use the -f switch. So, if your patterns are in a file named patterns, one per line:

Code:

grep -L -f patterns *.fas

That should work. There are plenty of variations, but I hope this gets you started.
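A sketch of the -f variant wrapped in a small function, assuming GNU or BSD grep (both accept -L, -F and -f); lacking_from_file and the file name patterns are only examples:

```shell
# List the files that contain none of the strings in a patterns file.
# -f reads one pattern per line, -F treats each as a fixed string (so the
# leading > needs no escaping), -L prints names of files with no match.
# lacking_from_file is a hypothetical helper name.
lacking_from_file() {
    patterns_file=$1
    shift
    grep -L -F -f "$patterns_file" -- "$@"
}

# Usage:
#   printf '%s\n' '>PBAH' '>SKOW' '>CGRA' > patterns
#   lacking_from_file patterns *.fas
```

Watch out for a blank line in the patterns file: an empty pattern matches every line, so grep -L would then list nothing.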

kmkocot 04-22-2015 07:23 AM

Thanks guys! Very helpful!

pan64 04-22-2015 12:55 PM

glad to help you
(if you really want to say thanks just click on yes)

