grep multiple words and linking with other file

zonah12 · 10-21-2011, 03:52 AM

Just as an example: I have two files. The first file (top.txt) contains 1000 random words that I want to use. The second file (All.txt) contains 10000 words and their meanings in two fields. What I want is to take the words from top.txt and get the meaning of each of those words from All.txt. If I use the command % grep -e "(foul|.....|zeal) top.txt I can grep all the words, but how do I then compare the result with the other file? Kindly let me know.

grail · 10-21-2011, 04:58 AM

I think you might need to demonstrate with some small examples as I do not follow what you require?

David the H. · 10-21-2011, 05:53 AM

As grail said. Vague questions tend to get you vague answers, and one concrete example is worth a thousand lines of explanation. Show us a sample of each file, at least, along with what you want the output to look like.

But in any case, it's possible to use a file as a collection of patterns to search for. As long as the first file has only a single search word per line, you can try this:

Code:

grep -f top.txt All.txt

You can also use other options like -F, to search for fixed strings only, and -i to make them case-insensitive. See the grep man and info pages for more options.
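As a rough illustration of how those options combine, here is a minimal sketch. The sample words and the scratch directory are made up for the demonstration, and -w (whole-word matching, supported by GNU and BSD grep) is an extra option not mentioned above.

```shell
# Hypothetical sample data, created in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' Foul jar > "$workdir/top.txt"
printf '%s\n' 'foul ns07' 'jargon ns99' 'jar ns13' 'tall ns21' > "$workdir/All.txt"

# -F: treat each line of top.txt as a fixed string, not a regex
# -i: ignore case, so "Foul" matches "foul"
# -w: match whole words only, so "jar" does not match "jargon"
matches=$(grep -F -i -w -f "$workdir/top.txt" "$workdir/All.txt")
printf '%s\n' "$matches"

rm -r "$workdir"
```

Without -w, the pattern "jar" would also pick up the "jargon" line, which is probably not wanted when the patterns are whole words.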

crts · 10-21-2011, 06:12 AM

My guess would be that the OP wants to grep for a certain pattern(s) in top.txt first and then use the results to grep them in All.txt. Some sort of cascaded (?) filtering process?

Code:

grep -E "(foul|.....|zeal)" top.txt > tmpfile
grep -f tmpfile All.txt

@OP: As stated before please provide an example with some representative sample data.

David the H. · 10-21-2011, 07:12 AM

Well if that's the case, then we could also use a process substitution as the "file" to search with.

http://mywiki.wooledge.org/ProcessSubstitution

Code:

grep -f <( grep -E -e "(foul|.....|zeal)" top.txt ) All.txt

Not that it really changes anything from the above other than bypassing the need for a tempfile. It still requires two grep processes. If we knew more about the actual requirements, perhaps we could even come up with a single-step solution.

Also be aware that process substitution is not part of POSIX sh. It is available in bash, ksh, and zsh, but not in a plain /bin/sh.

PS: You need grep -E/egrep for a complex regex like that.

crts · 10-21-2011, 08:03 AM

Quote:

Originally Posted by David the H.

PS: You need grep -E/egrep for a complex regex like that.

Thanks for the hint. I also noticed that I forgot to close the quote in my previous post. I simply copy+pasted that part from the OP's solution without further examining it. Corrected it now.

zonah12 · 10-26-2011, 01:47 AM

Thank you so much for the solutions, but it's not working. I will try to elaborate by giving more examples.
File TOP.txt:

Foul
Tall
blot
grail
House
System
Galaxy
jar
trophy
laptop

This file contains 10 words

Second file all.txt

system ns01
broad ns02
house ns03
laptop ns04
trophy ns05
ginger ns06
foul ns07
dustbin ns08
mugs ns09
blot ns10
pack ns11
butter ns12
jar ns13
knife ns14
kangroo ns15
galaxy ns16
kind ns17
heart ns18
grail ns19
short ns20
tall ns21
table ns22
chair ns23
blot ns24
onion ns25
foul ns26

This file contains 26 words with their codes. What I want to do is relate the words in the top file to the codes in all.txt, omitting the words which are not present in the top file. That is, I want the result to look like this:
Result
Foul ns26
Tall ns21
blot ns24
grail ns19
House ns03
System ns01
Galaxy ns16
jar ns13
trophy ns05
laptop ns04

Two things are important: first, the words are not arranged alphabetically, and second, the matching should not be case sensitive. That is, the same word might be in lowercase in the top file but capitalized in the all file.
I hope I have given a clear example now.

crts · 10-26-2011, 04:19 AM

Hi,

your example still needs a bit more explanation. What happens with multiple matches? Do you want to keep the first match or the last match? Or maybe something else? Your sample output suggests that you want to print only the last match.

Code:

grep -i -f top.txt all.txt | tac | awk '(a[$1]++ == 0) {print}'

However, the order is not the same as in your sample. If you wish to keep the first match:

Code:

grep -i -f top.txt all.txt | awk '(a[$1]++ == 0) {print}'

And to keep all matches:

Code:

grep -i -f top.txt all.txt

If none of the above works then you need to provide some more criteria for the filtering process.
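If the output should follow the order and spelling of top.txt, as in the sample result, one possible alternative is a single awk pass that builds a lookup table from all.txt. This is only a sketch under assumptions: the two-column layout is taken from the posted sample, the data below is a shortened made-up subset, and the last match is kept for duplicated words.

```shell
# Hypothetical, shortened sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' Foul Tall blot > "$workdir/top.txt"
printf '%s\n' 'tall ns21' 'blot ns10' 'foul ns07' 'blot ns24' 'onion ns25' > "$workdir/all.txt"

result=$(awk '
    # First file (all.txt): remember the code for each lowercased word.
    # A later line overwrites an earlier one, so the last match wins.
    NR == FNR { code[tolower($1)] = $2; next }
    # Second file (top.txt): print the word as spelled there, plus its code.
    (tolower($1) in code) { print $1, code[tolower($1)] }
' "$workdir/all.txt" "$workdir/top.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

To keep the first match instead, the assignment could be guarded so that it only runs when the lowercased word is not yet in the table, for example `!(tolower($1) in code) { code[tolower($1)] = $2 }` followed by `next` for all first-file lines.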

zonah12 · 11-01-2011, 11:28 PM

Hi,
It's still not working. No, I don't have multiple matches; each word in all.txt appears once with a single code, and there is no repetition in top.txt or all.txt. Since I am a new user, please also guide me on how to use the commands, if possible. Thank you so much for your answers.

chrism01 · 11-02-2011, 12:59 AM

Quote:

Its still not working.

... in what way? We need exact details/example

grail · 11-02-2011, 01:44 AM

So to confirm, the data you provided is wrong as it does have duplicates?

Code:

system ns01
broad ns02
house ns03
laptop ns04 
trophy ns05
ginger ns06
foul ns07
dustbin ns08
mugs ns09
blot ns10 
pack ns11
butter ns12
jar ns13 
knife ns14 
kangroo ns15
galaxy ns16 
kind ns17
heart ns18
grail ns19
short ns20 
tall ns21
table ns22
chair ns23 
blot ns24
onion ns25 
foul ns26

zonah12 · 11-17-2011, 10:29 PM

system ns01
broad ns02
house ns03
laptop ns04
trophy ns05
ginger ns06
foul ns07
dustbin ns08
mugs ns09
blot ns10
pack ns11
butter ns12
jar ns13
knife ns14
kangroo ns15
galaxy ns16
kind ns17
heart ns18
grail ns19
short ns20
tall ns21
table ns22
chair ns23
blot ns24
onion ns25
jacket ns26

I am sorry about that. I have changed the repeated word; if there is any more repetition, I wish to keep the first match. I tried the commands, and they ran without any errors, but in top.txt I didn't get any codes after my words. I checked all.txt as well, but it remained the same, and no new file was created either.
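For reference, grep and awk only read their input files and print the matches to the terminal, so neither top.txt nor all.txt is expected to change, and a new file only appears if the output is redirected. A minimal sketch of saving a first-match result follows; the file name result.txt, the scratch directory, and the shortened sample data are assumptions for illustration.

```shell
# Hypothetical, shortened sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' Foul blot > "$workdir/top.txt"
printf '%s\n' 'foul ns07' 'blot ns10' 'pack ns11' 'blot ns24' > "$workdir/all.txt"

# grep prints the matching lines of all.txt; awk keeps only the first
# line seen for each word (compared case-insensitively); ">" writes the
# final output to result.txt instead of the terminal.
grep -i -w -f "$workdir/top.txt" "$workdir/all.txt" |
    awk '!seen[tolower($1)]++' > "$workdir/result.txt"

saved=$(cat "$workdir/result.txt")
printf '%s\n' "$saved"

rm -r "$workdir"
```

The lines come out in the order of all.txt and with its spelling, not in the order of top.txt.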