Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-tos, this is the place!
Really new to Unix, so I might not be able to explain my problem clearly, but here goes.
I have two variables, MONTH and YEAR, and need to look through a directory containing 15 or so files to find lines that contain both variables. Once this is done I need the output to show the file name and the number of times both variables appear in each file.
So far my grep command is:
grep $MONTH ~/webhits/* | grep -c -H $YEAR
When I run this command I get an output of "(standard input): [the number of times both my variables appear in the whole directory]", but I need it broken down into the number of times they appear in each file.
Thanks in advance for any help.
Last edited by UnixNewbie91; 04-27-2012 at 05:07 PM.
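The reason you see "(standard input)" is that the second grep only ever sees one combined stream piped from the first, so it cannot report per-file counts. Looping over the files keeps each file name attached to its own count. A minimal sketch (the demo directory, file names, and MONTH/YEAR values are illustrative, not from the original post):

```shell
#!/bin/sh
MONTH=May
YEAR=2007

# Illustrative sample data in a throwaway directory
mkdir -p /tmp/webhits_demo
printf '%s\n' '44.184.167.119 Mon May 07 08:11:50 GMT 2007' \
              '78.230.158.130 Thu May 10 01:59:33 GMT 2007' > /tmp/webhits_demo/site1.log
printf '%s\n' '10.0.0.1 Tue Jun 12 09:00:00 GMT 2007' > /tmp/webhits_demo/site2.log

# Loop so each grep -c counts within a single file, keeping the name attached
for f in /tmp/webhits_demo/*; do
    count=$(grep "$MONTH" "$f" | grep -c "$YEAR")
    printf '%s:%s\n' "$f" "$count"
done
```

With the sample data above this prints one line per file, e.g. site1.log with a count of 2 and site2.log with 0, which is the per-file breakdown the single pipeline cannot give you.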
Each file contains the number of web hits for a fake website from a number of different made-up IP addresses. Some of the IP addresses appear twice in the same file, so the hit from that IP address is counted multiple times. If I wanted to count the hit from each IP only once (to get the number of unique hits), how would I do this?
Here is an example of the file if it helps
44.184.167.119 Mon May 07 08:11:50 GMT 2007
78.230.158.130 Thu May 10 01:59:33 GMT 2007
78.230.158.130 Thu May 10 05:14:58 GMT 2007
So for these three hits I want to count the hits from IP address 78.230.158.130 as one unique hit.
I believe that's getting out of the realm of bash/grep. Something like awk would probably be powerful enough to do it. You could either leave your current grep intact and use awk to find the unique IPs from the matches, or use awk to do both steps. I'm not an awk expert, though, so I'll let somebody else chime in there.
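For what it's worth, counting each IP once doesn't strictly need awk: since the IP is the first whitespace-separated field in the log layout shown above, the grep filter can be followed by a field extraction, de-duplication, and count. A sketch (the demo file path and MONTH/YEAR values are assumptions for illustration):

```shell
#!/bin/sh
MONTH=May
YEAR=2007

# Illustrative sample file matching the layout from the post
cat > /tmp/hits_demo.log <<'EOF'
44.184.167.119 Mon May 07 08:11:50 GMT 2007
78.230.158.130 Thu May 10 01:59:33 GMT 2007
78.230.158.130 Thu May 10 05:14:58 GMT 2007
EOF

# Unique hits: filter on both variables, take the IP field, de-duplicate, count
grep "$MONTH" /tmp/hits_demo.log | grep "$YEAR" | cut -d' ' -f1 | sort -u | wc -l

# The same thing in one awk pass: print a line only the first time its IP is seen
awk -v m="$MONTH" -v y="$YEAR" '$0 ~ m && $0 ~ y && !seen[$1]++' /tmp/hits_demo.log | wc -l
```

Both pipelines report 2 for the three sample lines, because the duplicate 78.230.158.130 entry is counted once.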
Basically I have to write a script that looks at the files, which are all set up like the example I gave, and then puts into a table the name of the folder, the number of hits in a given time frame (sorted in descending order), and the number of unique hits (the number of different IP addresses).
If you used grep not to count but only to output the records that match your $YEAR, then piped that into the cut|sort combo above, you could count the matching lines.