grep command

castor0troy · 04-10-2013, 05:25 AM

I need some advice on a Unix command.
I have input.txt:
keyword1
keyword2
etc.
I want to run grep for each keyword in input.txt and save the results in output.txt, e.g.
grep -c "keyword1" file.txt

Can you please give me a command to do this?

Thanks,
Sam

Madhu Desai · 04-10-2013, 06:03 AM

Code:

$ cat input.txt
root
ftp
nobody
mail

Code:

#!/bin/bash 
# filename = sr_by_grep.sh 
# $1 = pattern file 
# $2 = output file 
# $3 = file to be searched for 
# syntax sr_by_grep.sh <$1> <$2> <$3> 
 
keyword_file=$1 
output_file=$2 
searched_file=$3 
 
while read LINE 
do 
    let count++ 
    echo $(cat "$searched_file" | grep "$LINE" ) >> "$output_file" 
done < $keyword_file

Code:

$ sh sr_by_grep.sh input.txt output.txt /etc/passwd
$ cat output.txt
root:x:0:0:root:/root:/bin/bash operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

millgates · 04-10-2013, 06:05 AM

What exactly do you want the output to look like? A list with the number of occurrences of each keyword in the file?

Maybe something like:

Code:

grep -of input.txt file.txt|sort|uniq -c >output.txt

Simple, but not very fast, so if the files are large, I'd try awk or something similar.
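
A small self-contained run of that pipeline, as a sketch only. The sample files and the temporary directory are made up for illustration. The -F flag (treat keywords as fixed strings, not regular expressions) and LC_ALL=C are additions that assume the keywords are plain text; they are not part of the command above.

```shell
#!/bin/sh
# Illustrative sample data in a throwaway directory (assumption, not from the thread).
tmp=$(mktemp -d)
printf 'aaa\nbbb\n' > "$tmp/input.txt"
printf 'aaa\naaa343343\nbbb\n' > "$tmp/file.txt"

# -o prints every match on its own line, -F treats keywords as fixed strings,
# -f reads the keywords from a file. sort | uniq -c then counts each match.
LC_ALL=C grep -oFf "$tmp/input.txt" "$tmp/file.txt" |
    LC_ALL=C sort | uniq -c > "$tmp/output.txt"

cat "$tmp/output.txt"
```

Note that keywords with no match at all do not appear in the output, and that uniq -c puts the count before the keyword.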

Madhu Desai · 04-10-2013, 06:14 AM

Quote:

Originally Posted by millgates

Code:

grep -of input.txt file.txt|sort|uniq -c >output.txt

Damn!!! you are so good.

My previous post can be altered like this:

Code:

$ cat /etc/passwd | grep -f input.txt | sort | uniq > output.txt

Difference between pro and novice...

millgates · 04-10-2013, 06:24 AM

Quote:

Originally Posted by mddesai

Code:

$ cat input.txt
root
ftp
nobody
mail

Code:

#!/bin/bash
# filename = sr_by_grep.sh
# $1 = pattern file
# $2 = output file
# $3 = file to be searched for
# syntax sr_by_grep.sh <$1> <$2> <$3>

keyword_file=$1
output_file=$2
searched_file=$3

while read LINE
do
    let count++
    echo $(cat "$searched_file" | grep "$LINE" ) >> "$output_file"
done < $keyword_file

Code:

$ sh sr_by_grep.sh input.txt output.txt /etc/passwd
$ cat output.txt
root:x:0:0:root:/root:/bin/bash operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

Nice, except for a few details:
1)

Code:

echo $(cat "$searched_file" | grep "$LINE" ) >> "$output_file"

A fascinating example demonstrating a useless usage of both cat and echo in a single line.
2) You should be more careful with quoting your variables. You quote $searched_file and $output_file, but it doesn't make much difference because you don't quote $2 and $3.
3) why do you count the lines? The count variable is not used anywhere.
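
A sketch of the script with those three points applied. The function name search_by_grep and the sample files are hypothetical, chosen only for this example. Strictly speaking, plain assignments such as output_file=$2 are not word-split, so the unquoted expansion most likely to break is the `< $keyword_file` redirection at the end of the loop; it is quoted here.

```shell
#!/bin/sh
# search_by_grep is a hypothetical name; the arguments mirror the original:
# $1 = keyword file, $2 = output file, $3 = file to search
search_by_grep() {
    keyword_file=$1
    output_file=$2
    searched_file=$3

    # IFS= and -r keep each keyword exactly as written in the file.
    while IFS= read -r line
    do
        # grep reads the file itself and prints the matching lines directly,
        # so neither cat nor echo is needed.
        grep -- "$line" "$searched_file"
    done < "$keyword_file" >> "$output_file"
}

# Illustrative sample data (assumption, not from the thread).
tmp=$(mktemp -d)
printf 'root\nftp\n' > "$tmp/keywords.txt"
printf 'root:x:0:0\nmail:x:8:12\nftp:x:14:50\n' > "$tmp/passwd.sample"

search_by_grep "$tmp/keywords.txt" "$tmp/output.txt" "$tmp/passwd.sample"
cat "$tmp/output.txt"
```

One behavioural difference: the original echo $(...) joined all matches for a keyword onto a single line, while this version writes each matching line on its own line.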

castor0troy · 04-10-2013, 06:31 AM

Sorry guys, let me explain my requirement better.

input.txt contains a million keywords:
keyword1
keyword2
.
.
.
.

I want to run grep -c "keyword1" file.txt for every keyword in input.txt and save the number of occurrences of each keyword in the file.

So output.txt will be:
keyword1 334
keyword 3342
keyword 6644

Madhu Desai · 04-10-2013, 06:35 AM

Quote:

Originally Posted by millgates

A fascinating example demonstrating a useless usage of both cat and echo in a single line.
2) You should be more careful with quoting your variables. You quote $searched_file and $output_file, but it doesn't make much difference because you don't quote $2 and $3.
3) why do you count the lines? The count variable is not used anywhere.

Great!! Thanks for correcting my mistakes. I have just started learning shell scripting (3-4 weeks in).

Still a long way to go...

castor0troy · 04-10-2013, 06:38 AM

Sorry guys.

I want to run grep -c "keyword1" file.txt for all keywords in input.txt and save the results in output.txt.

I am a newbie here.

castor0troy · 04-10-2013, 06:52 AM

Sample files are below.

input.txt is
aaa
bbb

file.txt is
aaa
aaa343343
bbb

I want to run grep -c for each keyword in input.txt and save the results in output.txt:
grep -c "aaa" file.txt
grep -c "bbb" file.txt

output.txt is
aaa 2 [meaning the number of lines in file.txt that match the keyword 'aaa']
bbb 1

I can do this manually, but the files have a million keywords each.

Sorry about the confusion, guys.
Hope you can help.

Madhu Desai · 04-10-2013, 07:02 AM

Based on the command given by millgates (#3):

Code:

$ cat input.txt
nobody
nologin
daemon

$ grep -of input.txt /etc/passwd | sort | uniq -c > output.txt

$ cat output.txt
      5 daemon
      2 nobody
     34 nologin
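
The counts above come out as "count keyword". If the "keyword count" layout from the question is wanted, one option is to swap the two columns with awk. This is a sketch with made-up sample data, and it assumes keywords contain no whitespace.

```shell
#!/bin/sh
# Illustrative sample data (assumption, not the real /etc/passwd).
tmp=$(mktemp -d)
printf 'nobody\nnologin\n' > "$tmp/input.txt"
printf 'nobody:x:99:99:Nobody:/:/sbin/nologin\nftp:x:14:50:FTP User:/var/ftp:/sbin/nologin\n' > "$tmp/passwd.sample"

# uniq -c prints "count keyword"; awk swaps the columns to "keyword count".
grep -of "$tmp/input.txt" "$tmp/passwd.sample" | sort | uniq -c |
    awk '{ print $2, $1 }' > "$tmp/output.txt"

cat "$tmp/output.txt"
```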

millgates · 04-10-2013, 07:02 AM

Quote:

Originally Posted by castor0troy

Sorry guys, let me explain my requirement better.

input.txt contains a million keywords:
keyword1
keyword2
.
.
.
.

I want to run grep -c "keyword1" file.txt for every keyword in input.txt and save the number of occurrences of each keyword in the file.

So output.txt will be:
keyword1 334
keyword 3342
keyword 6644

Well, that's what my first example does. I can imagine that with a million keywords the sort step takes a while, though. How large is the file you search in? How many occurrences do you expect in total? Millions? More? In my example, I use grep -f, which searches for all the keywords at the same time. That's much faster than running grep a million times, once per keyword. On the other hand, my solution then sorts all the matches it finds so that I can use uniq -c on the result. That might be quite slow if there are a lot of matches.

The solution provided by mddesai greps less efficiently, but it does not have to sort the output. Just modify it to:

Code:

while read line
do
    grep -c "$line" file.txt
done <input.txt >output.txt

If you want a more efficient solution, I would try awk, but it would be more complicated.
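
The loop above writes only the counts, one per line. A sketch that also prints the keyword next to its count, in the "aaa 2" layout asked for earlier. The sample files are the ones posted in the thread; the temporary directory, -F (keywords taken as literal strings) and -- (in case a keyword starts with a dash) are assumptions added here.

```shell
#!/bin/sh
# Sample data from the thread, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'aaa\nbbb\n' > "$tmp/input.txt"
printf 'aaa\naaa343343\nbbb\n' > "$tmp/file.txt"

while IFS= read -r line
do
    # grep -c counts matching lines, not individual occurrences.
    printf '%s %s\n' "$line" "$(grep -cF -- "$line" "$tmp/file.txt")"
done < "$tmp/input.txt" > "$tmp/output.txt"

cat "$tmp/output.txt"
```

This still starts one grep process per keyword, so it reads the searched file once for every line of input.txt.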

castor0troy · 04-10-2013, 07:25 AM

Thanks guys.
That worked, millgates.
The file is around 2 GB in size.
I am testing the grep command but I don't think it's working.

Any clue how I can replace grep with awk?

millgates · 04-10-2013, 07:53 AM

OK, my idea of this in awk would be something like this:

Code:

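awk '
    # First file (input.txt): register every keyword with a zero count.
    NR == FNR { kw[$0] = 0; next }
    # Second file (file.txt): count occurrences of each keyword on the line.
    {
        for (w in kw) {
            # split() returns the number of pieces; occurrences = pieces - 1.
            # w is used as a regular expression by split().
            n = split($0, a, w)
            # Guard n > 1 so empty lines (n == 0) do not subtract from the count.
            if (n > 1) kw[w] += n - 1
        }
    }
    END {
        for (w in kw) print w, kw[w]
    }
' input.txt file.txt >output.txt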

Also, please note that there's a difference between "number of lines containing a pattern" (which is what grep -c reports) and "number of occurrences of a pattern in a file". This does the latter. I haven't tested it, but I hope it will be at least slightly faster than the grep solution posted earlier.
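
For the other reading, counting the lines that contain each keyword (as grep -c does), a sketch along the same lines could use index() instead of split(). This is an untested-at-scale illustration with made-up sample data, not a tuned solution; the second line of the sample file contains "aaa" twice on purpose.

```shell
#!/bin/sh
# Illustrative sample data (assumption, not from the thread).
tmp=$(mktemp -d)
printf 'aaa\nbbb\n' > "$tmp/input.txt"
printf 'aaa\naaa343343 aaa\nbbb\n' > "$tmp/file.txt"

awk '
    # First file: register every keyword with a zero count.
    NR == FNR { kw[$0] = 0; next }
    # Second file: add one per line that contains the keyword at least once.
    {
        for (w in kw)
            # index() does a plain substring test, so w is not treated as a regex.
            if (index($0, w) > 0) kw[w]++
    }
    END { for (w in kw) print w, kw[w] }
' "$tmp/input.txt" "$tmp/file.txt" | sort > "$tmp/output.txt"

cat "$tmp/output.txt"
```

The final sort is there only because awk does not guarantee any order for "for (w in kw)". Like the version above, this loops over every keyword for every line, so the running time grows with lines times keywords.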

castor0troy · 04-10-2013, 08:11 AM

Thanks.
Say the keyword is 'goldcoast'
and the main file is
goldcoast
123goldcoast
goldcoast123

I want to see how many times the keyword 'goldcoast' occurs in the beginning of the main file.
In this case the output should be
goldcoast 2

How do I get this?

chrism01 · 04-10-2013, 08:17 AM

Define 'beginning'; you've got goldcoast 3 times, but you say you want to see '2'.
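
One possible reading, stated purely as an assumption: "beginning" means lines that start with the keyword. Under that reading, anchoring the pattern with ^ gives the expected count for the sample. The temporary directory and the variable names are made up for this sketch, and the keyword is treated as a regular expression here, so keywords containing regex metacharacters would need escaping.

```shell
#!/bin/sh
# Sample data from the post above, written to a throwaway directory.
tmp=$(mktemp -d)
printf 'goldcoast\n123goldcoast\ngoldcoast123\n' > "$tmp/file.txt"

keyword=goldcoast
# ^ anchors the match to the start of the line, so 123goldcoast is not counted.
count=$(grep -c "^$keyword" "$tmp/file.txt")
printf '%s %s\n' "$keyword" "$count"
```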