LinuxQuestions.org

ali2011 12-15-2011 10:19 PM

Matching Between Two Files:
 
I have two files, a.txt and b.txt. The file a.txt (3 columns) contains the entries listed in b.txt (1 column), plus more. I want to search a.txt for each element of b.txt and print the corresponding line. Here is a partial cut of the two big files:

a.txt
Code:

planetlab4-dsl.cs.cornell.edu        42.4436    -76.4816
planetlab2.cs.purdue.edu        40.4274    -86.9167
planet2.scs.cs.nyu.edu        40.7292    -73.9937
planetlab2.cnis.nyit.edu        40.814    -73.6081
ricepl-1.cs.rice.edu        29.7225    -95.3924
pl1.eecs.utk.edu        35.9483    -83.9367
pl2.eecs.utk.edu        35.9483    -83.9367
planetlab2.itwm.fhg.de        49.26    7.46
planetlab06.cs.washington.edu        47.6531    -122.313
planetlab2.cs.ubc.ca        49.2637    -123.237
pub1-s.ane.cmc.osaka-u.ac.jp        34.81    135.52
pub2-s.ane.cmc.osaka-u.ac.jp        34.81    135.52
planetlab1.citadel.edu        32.7984    -79.9614
planetlab2.citadel.edu        32.7984    -79.9614
pl2.ernet.in        28.6    77.09
planetlab3.inf.ethz.ch        47.3794    8.54513
planetlab4.inf.ethz.ch        47.3794    8.54513

b.txt
Code:

planetlab2.cnis.nyit.edu
planetlab1.cs.purdue.edu
planetlab2.cs.purdue.edu
planet3.cs.ucsb.edu
planetlab1.cs.colorado.edu
planetlab02.sys.virginia.edu
planet1.cs.rochester.edu
planetlab2.byu.edu
planetlab06.cs.washington.edu
planetlab04.cs.washington.edu

out.txt
Code:

planetlab2.cnis.nyit.edu        40.814    -73.6081
planetlab2.cs.purdue.edu        40.4274    -86.9167
planetlab06.cs.washington.edu        47.6531    -122.313

So the script should look up each line of b.txt in the first column of a.txt and, if a match is found, write the corresponding three-column line from a.txt to out.txt. In this example only 3 matches were found.

jhwilliams 12-15-2011 10:43 PM

Here's one possible solution.

Hash the contents of file a, then look up the fields in file b. If there's a match, print it.

Code:

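#!/bin/bash
# Requires bash 4+ for associative arrays.

declare -A servers

# Store columns 2 and 3 of each line, keyed by hostname (column 1).
hash_contents() {
    local host lat lon
    while read -r host lat lon; do
        [ -n "$host" ] && servers[$host]="$lat"$'\t'"$lon"
    done < "$1"
}

# Print the stored coordinates for every hostname listed in the file.
lookup_fields() {
    local host
    while read -r host; do
        [ -n "$host" ] && [ -n "${servers[$host]}" ] &&
            printf '%s\t%s\n' "$host" "${servers[$host]}"
    done < "$1"
}

hash_contents a.txt
lookup_fields b.txt

exit 0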
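
For comparison, the same hash-then-look-up idea can probably be sketched with an awk associative array, which also works on systems without bash 4. This is only a sketch: the function name lookup_hosts is made up for illustration, and it assumes the hostname is the first whitespace-separated field in both files.

```shell
#!/bin/sh
# Sketch only: lookup_hosts is a hypothetical helper, not part of the thread.
# $1 = table file (hostname in column 1), $2 = file with one hostname per line.
lookup_hosts() {
    awk '
        NR == FNR { row[$1] = $0; next }   # first file: remember each line by hostname
        $1 in row { print row[$1] }        # second file: print the stored line on a match
    ' "$1" "$2"
}
```

Called as `lookup_hosts a.txt b.txt > out.txt`, it prints the original a.txt lines unchanged, in the order the hostnames appear in b.txt.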


ali2011 12-15-2011 11:53 PM

Worked perfectly, thanks a lot.

All the best

Reuti 12-16-2011 05:54 AM

There is also the command join:
Code:

$ join <(sort a.txt) <(sort b.txt)
planetlab06.cs.washington.edu 47.6531    -122.313
planetlab2.cnis.nyit.edu 40.814    -73.6081
planetlab2.cs.purdue.edu 40.4274    -86.9167
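
Two things to keep in mind with join: both inputs have to be sorted on the join field, and the `<( ... )` process substitution is a bash feature. Below is a rough sketch of the same idea for a plain sh, assuming mktemp is available; join_hosts is a made-up name, and the locale is pinned to C so that sort and join agree on ordering.

```shell
#!/bin/sh
# Sketch only: join_hosts is a hypothetical wrapper around sort and join.
# $1 = table file (hostname in column 1), $2 = file with one hostname per line.
join_hosts() {
    sorted_a=$(mktemp) || return 1
    # Sort both inputs on the first field, in the same locale that join will use.
    LC_ALL=C sort -k 1b,1 "$1" > "$sorted_a"
    # "-" tells join to read its second input from the pipe.
    LC_ALL=C sort -k 1b,1 "$2" | LC_ALL=C join "$sorted_a" -
    rm -f "$sorted_a"
}
```

Called as `join_hosts a.txt b.txt`, the result comes out in sorted order rather than in b.txt order, with the fields separated by single spaces.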


jhwilliams 12-16-2011 06:22 AM

Quote:

Originally Posted by Reuti (Post 4551501)
There is also the command join

Ah! Just totally pwned my solution with a one-liner.

Quote:

"Those who don't understand UNIX are condemned to reinvent it, poorly." – Henry Spencer
Anyhow, good solution.


All times are GMT -5.