LinuxQuestions.org

t.othoneos 03-15-2012 05:26 AM

Merging text files based on Pattern
 
Hi everybody,

I have 2 text files like so:

file1.txt

101.63.121.142 email1@domain.com
101.63.129.87 email2@domain.com
105.130.100.194 email2@domain.com
108.167.112.229 email3@domain.com
108.167.112.229 email5@domain.com
108.204.74.14 email4@domain.com
108.27.234.42 email7@domain.com
109.100.20.55 email10@domain.com
109.107.14.233 email135@domain.com
... (about 6000 lines)

and file2.txt

111 168.122.13.218 US
156 155.33.225.201 US
174 38.127.112.130 US
209 63.231.92.16 US
209 70.56.123.222 US
278 132.248.173.102 MX
278 132.248.44.85 MX
278 132.248.84.79 MX
286 92.68.196.78 NL
513 128.141.145.169 CH
513 128.141.145.203 CH
513 128.141.226.236 CH
513 128.141.86.21 CH
680 194.94.224.254 DE
702 195.124.9.41 DE
702 195.124.9.42 DE
766 161.111.163.120 EU
786 129.215.5.255 EU
... (about 3000 lines)

Just to clarify: file1.txt contains IP addresses and e-mail addresses, and file2.txt contains AS numbers, IP addresses and country codes.

What I need is to read the IP address column of file1.txt, look up each address in file2.txt, and get a result like this:


101.63.121.142 email1@domain.com 111 US
101.63.129.87 email2@domain.com 156 US
105.130.100.194 email2@domain.com 278 MX
108.167.112.229 email3@domain.com 1241 GR
108.167.112.229 email5@domain.com 1248 FI
108.204.74.14 email4@domain.com 1680 IL
108.27.234.42 email7@domain.com 2529 GB
109.100.20.55 email10@domain.com 3268 CY
109.107.14.233 email135@domain.com 197295 PL

* The values above are not real matches; they only illustrate the output format.

Any Help?
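The lookup asked for here is a join on the IP column. One way to do it, a sketch that was not posted in the thread, is a single awk pass that first loads file2.txt into associative arrays keyed on the IP and then annotates each line of file1.txt, instead of running grep once per line. The helper name `merge_by_ip` and the `-` placeholder for unmatched IPs are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: join file1.txt (IP e-mail) with file2.txt (ASN IP CC) on the IP.
# merge_by_ip is a hypothetical helper name, not something from the thread.
merge_by_ip() {
  local ips_emails=$1 asn_table=$2
  awk '
    # While reading the first file argument (the ASN table), remember the
    # ASN and country code for each IP; keep only the first row per IP.
    FILENAME == ARGV[1] {
      if (!($2 in asn)) { asn[$2] = $1; cc[$2] = $NF }
      next
    }
    # IP/e-mail file: append ASN and country code when the IP is known.
    ($1 in asn) { print $1, $2, asn[$1], cc[$1]; next }
    # Unknown IP: keep the line, with "-" as an assumed placeholder.
    { print $1, $2, "-", "-" }
  ' "$asn_table" "$ips_emails"
}

# Usage: merge_by_ip file1.txt file2.txt > final.txt
```

Like `grep -m1`, this keeps the first row when an IP appears more than once in file2.txt, and it reads each file only once.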

Markus Franke 03-15-2012 05:31 AM

find, grep and cut are your friends. ;-)

Hope this helps

t.othoneos 03-15-2012 05:34 AM

I know I can use these, maybe even awk. What I don't know is HOW to combine them to get the result I need.

Markus Franke 03-15-2012 05:39 AM

1. for each line in file1.txt (while loop)
2. get IP address from line (cut)
3. search this IP address in file2.txt (grep)
4. if found then get number and country code from this line (cut)
5. put all information together and write to a new output file
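The five steps above can be sketched in bash roughly as follows. This is an illustration rather than code from the thread: the function name `lookup_asn` is hypothetical, and fields are assumed to be separated by single spaces as in the samples. It adds `-F` and `-w` to grep so that the dots in an IP are matched literally and one IP cannot match inside a longer one.

```shell
#!/usr/bin/env bash
# Sketch of the five steps: while loop + cut + grep.
# lookup_asn is a hypothetical helper name; fields are assumed to be
# separated by single spaces, as in the sample data.
lookup_asn() {
  local ips_emails=$1 asn_table=$2 line ip mail hit asn cc
  # 1. for each line in the IP/e-mail file
  while IFS= read -r line; do
    # 2. get the IP address (field 1) and the e-mail (field 2)
    ip=$(printf '%s\n' "$line" | cut -d ' ' -f 1)
    mail=$(printf '%s\n' "$line" | cut -d ' ' -f 2)
    # 3. search for the IP in the ASN table:
    #    -F  fixed string, so the dots are not regex wildcards
    #    -w  whole word, so 1.2.3.4 does not match inside 11.2.3.45
    #    -m1 stop at the first matching line
    # 4. only if a line was found, take ASN (field 1) and country (field 3)
    if hit=$(grep -F -w -m1 -- "$ip" "$asn_table"); then
      asn=$(printf '%s\n' "$hit" | cut -d ' ' -f 1)
      cc=$(printf '%s\n' "$hit" | cut -d ' ' -f 3)
      # 5. put everything together on one output line
      printf '%s %s %s %s\n' "$ip" "$mail" "$asn" "$cc"
    fi
  done < "$ips_emails"
}

# Usage: lookup_asn file1.txt file2.txt > final.txt
```

Lines whose IP is not found are skipped, following step 4; an `else` branch could print placeholder values if those lines should be kept.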

Which step of the algorithm do you need code for?

t.othoneos 03-15-2012 06:18 AM

I think I've got it:

while read ip mail
do
    var=$(grep -o $ip file2.txt)
    asn=$(echo $var | awk '{print $1}')
    cod=$(echo $var | awk '{print $NF}')
    printf "%15s %s %s %s\n" $ip $mail $asn $cod
done < file1.txt > final.txt

Markus Franke 03-15-2012 06:25 AM

Looks good. I would have used cut instead of awk, as I am not very familiar with awk's syntax.
But if it works, OK.

grail 03-15-2012 07:06 AM

Why not ditch the awk lines altogether and just use a bash array:
Code:

var=($(grep -o "$ip" file2.txt))
printf "%15s %s %s %s\n" $ip $mail ${var[0]} ${var[2]}

If the second file has an unknown number of fields per line, use the length of the array to index the last value.
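A sketch of that array approach, assuming the matched line passed in is non-empty (the name `format_match` is hypothetical): the ASN is the first element and the country code is read from the last element, located via the array length, so any extra columns in between are ignored. It uses `read -r -a` rather than `var=($(...))`, which avoids glob expansion of the fields.

```shell
#!/usr/bin/env bash
# Sketch of the array approach when the number of fields per line varies.
# format_match is a hypothetical helper name; $3 is assumed to be a
# non-empty matching line from file2.txt.
format_match() {
  local ip=$1 mail=$2 hit=$3
  local -a var
  # Split the matched line on whitespace into the array var
  # (read -a does no glob expansion, unlike var=($(...))).
  read -r -a var <<< "$hit"
  # First field is the ASN; the last field is taken as the country code,
  # indexed as length - 1 (bash evaluates array subscripts arithmetically).
  printf '%15s %s %s %s\n' "$ip" "$mail" "${var[0]}" "${var[${#var[@]}-1]}"
}

# Usage: format_match 101.63.121.142 email1@domain.com "111 101.63.121.142 US"
```

Newer bash versions also accept `${var[-1]}` for the last element.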

t.othoneos 03-15-2012 07:19 AM

@grail: Works too

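My code had a mistake: it used grep -o instead of the correct grep -m1. (-o prints only the matched text, i.e. the IP itself, so the ASN and country code were lost; -m1 prints the whole first matching line.)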
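The difference between the two flags is easy to check on one sample line from file2.txt. This small demonstration is not from the thread; `-F` is added only so the dots in the IP are taken literally.

```shell
#!/usr/bin/env bash
# Demonstration on one sample line from file2.txt: grep -o versus grep -m1.
sample='111 168.122.13.218 US'
ip='168.122.13.218'

# -o prints only the part of the line that matched: the IP itself.
only_match=$(printf '%s\n' "$sample" | grep -F -o -- "$ip")

# -m1 prints the whole first matching line, so ASN and country code survive.
whole_line=$(printf '%s\n' "$sample" | grep -F -m1 -- "$ip")

printf 'grep -o  : %s\n' "$only_match"   # prints: grep -o  : 168.122.13.218
printf 'grep -m1 : %s\n' "$whole_line"   # prints: grep -m1 : 111 168.122.13.218 US
```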

SOLVED. Thanks guys


All times are GMT -5.