LinuxQuestions.org


ocrts 03-28-2017 09:12 AM

find/grep question
 
I'm using the following command to get the number of *.log files containing "yes" or "no" at the end of the file (the files are large and I'm looking for "yes" or "no" near the end of each file, and each file will contain at most one "yes" OR one "no", but never both):

find ./OUTPUT -type f | grep '\.log' | xargs -l tail -20 {} | grep 'yes\|no' | wc -l

This works fine. Now I need the filenames that DO NOT contain "yes" or "no". How do I modify the above command to do that?

Thanks!

pan64 03-28-2017 09:24 AM

Looks like homework.
First, I would suggest checking the grep man page for possible options.

hydrurga 03-28-2017 09:24 AM

grep -v does inverse matching (see man grep)

Oops, sorry pan64, I posted this at the same time as you. I wasn't wanting to overrule your advice.

@ocrts: Welcome to LQ. :) Is this a homework question or something you're trying to achieve in the real world? Just interested.

ocrts 03-28-2017 09:34 AM

Not homework. Professional. I'm a little old for homework ;). Just made the text generic. I have checked the man pages. The -v option will just list all the lines that don't contain "yes" or "no". I need the filenames. I tried the -l option on the second grep and sent the output to a file (instead of wc -l) to get the filenames of those that do contain "yes" or "no", but I get a broken pipe on the xargs.
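
For anyone reading along, the -l attempt fails because the second grep only sees the text that tail writes into the pipe, so it has no filename to report. The broken pipe is most likely a side effect: grep -l stops reading at the first match and exits while tail is still writing. A minimal sketch of the difference, using a throwaway temp file (the variable names are made up for the demo):

```shell
#!/bin/bash
# Throwaway file standing in for one of the logs.
demo=$(mktemp)
printf 'line 1\nyes\n' > "$demo"

# grep reading from a pipe: -l can only report the stdin placeholder.
piped=$(tail -n 20 "$demo" | LC_ALL=C grep -l -e yes -e no)

# grep reading the file itself: -l reports the real name,
# but it searches the whole file, not just the last 20 lines.
direct=$(grep -l -e yes -e no "$demo")

echo "piped:  $piped"
echo "direct: $direct"
rm -f "$demo"
```

So the test has to be run once per file, with the filename still in hand, rather than on one combined stream.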

JeremyBoden 03-28-2017 10:07 AM

How 'near' is "near the end of a file"?

ocrts 03-28-2017 10:10 AM

"Near" is within the last 20 lines. Since the log files are >10MB each and there are ~10k files, the tail significantly reduces the execution time of the command, as it isn't searching the entirety of each file.

wpeckham 03-28-2017 11:43 AM

Quote:

Originally Posted by ocrts (Post 5689237)
I'm using the following command to get the number of *.log files containing "yes" or "no" at the end of the file (the files are large and I'm looking for "yes" or "no" near the end of each file, and each file will contain at most one "yes" OR one "no", but never both):

find ./OUTPUT -type f | grep '\.log' | xargs -l tail -20 {} | grep 'yes\|no' | wc -l

This works fine. Now I need the filenames that DO NOT contain "yes" or "no". How do I modify the above command to do that?

Thanks!

It strikes me that a small script might be better than a one-liner for this purpose. You could easily have a single small script prepare a list of the log files, and split it into three lists: files with yes, files with no, files without either. From that point generating the counts would be trivial.
Something like
Code:

#!/bin/bash
# Sort the .log files in OUTPUT into three lists, based on their last 20 lines.
cd OUTPUT || exit 1
LIST0=$(ls *.log)
LISTY=""   # files containing "yes"
LISTN=""   # files containing "no"
LIST1=""   # files containing neither
for foo in ${LIST0} ; do
  if tail -20 ${foo} | grep yes
  then
      LISTY="${LISTY} ${foo}"
  elif tail -20 ${foo} | grep no
  then
      LISTN="${LISTN} ${foo}"
  else
      LIST1="${LIST1} ${foo}"
  fi
done
COUNTYES=$(echo "$LISTY"|wc -w)
COUNTNO=$(echo "$LISTN" |wc -w)
COUNT0=$(echo "$LIST1"  |wc -w)
# followed by whatever you want to do with these lists and numbers

This is just off the top of my head, possibly with syntax errors, and unlikely to be the most efficient way. Also, as written it will be "noisy" because I have not suppressed the output of those grep commands in the if statements (grep -q would do that). Still, it might be enough to give you ideas.

pan64 03-28-2017 11:57 AM

Quote:

Originally Posted by ocrts (Post 5689237)
find ./OUTPUT -type f | grep '\.log' | xargs -l tail -20 {} | grep 'yes\|no' | wc -l

That command can itself be simplified to:
Code:

find ./OUTPUT -type f -name '*.log' | xargs -l tail -20 {} | grep -c 'yes\|no'
which is more efficient, but otherwise should do the same thing (not tested).


I would use a more powerful language, but you can also list the files containing yes or no and remove them from the full list (which actually takes only one additional grep and a small script).
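
A rough sketch of that "subtract from the full list" idea, untested against the real data. It assumes bash, that filenames contain no newlines, and that comm is available; files_without is a made-up helper name, not anything from the original command:

```shell
#!/bin/bash
# files_without DIR: print the *.log files under DIR whose last 20 lines
# contain neither "yes" nor "no". Assumes no newlines in filenames.
files_without() {
  local all hits f
  all=$(mktemp)
  hits=$(mktemp)
  # Full, sorted list of candidate files.
  find "$1" -type f -name '*.log' | sort > "$all"
  # Files with a match; written in the same order, so this stays sorted.
  while IFS= read -r f; do
    if tail -n 20 "$f" | grep -q -e yes -e no; then
      printf '%s\n' "$f"
    fi
  done < "$all" > "$hits"
  # comm -23 keeps the lines that appear only in the first file.
  comm -23 "$all" "$hits"
  rm -f "$all" "$hits"
}

# Example (hypothetical path): files_without ./OUTPUT
```

Keeping the list of matching files around is handy if you also want the original count: it is just the number of lines in that list.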

MadeInGermany 03-28-2017 01:02 PM

And you can replace xargs -l with find -exec
Code:

find ./OUTPUT -type f -name '*.log' -exec tail -20 {} \; | grep -c -e 'yes' -e 'no'
find ./OUTPUT -type f -name '*.log' -exec tail -20 {} \; | grep -v -c -e 'yes' -e 'no'
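
Note that the second command counts the lines that match neither word, not the files, so it does not yet give the filenames asked for. One way to get them is to run the test per file inside -exec. This is only a sketch under assumptions: it assumes bash and a find that supports -exec ... {} +, and list_missing is a hypothetical helper name:

```shell
#!/bin/bash
# list_missing DIR: print each *.log file under DIR whose last 20 lines
# contain neither "yes" nor "no".
list_missing() {
  find "$1" -type f -name '*.log' -exec sh -c '
    for f do
      # grep -q prints nothing; only its exit status is used here
      tail -n 20 "$f" | grep -q -e yes -e no || printf "%s\n" "$f"
    done' sh {} +
}

# Example (path from the original post): list_missing ./OUTPUT
```

Piping the result through wc -l gives the count of such files. The {} + form hands many filenames to each sh, so it starts far fewer processes than running one tail per file with \;.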


