How to match element in a file using Bash
Hi,
I tried to simulate an example in shell, specifically finding an item from one file and matching it in another file. I have two files initially, file A and file B. A very simple example:

In file A, I have columns of fields such as:

aaa 107
bbb 108
ccc 109

In file B, I have columns of fields such as:

101 2 1
102 3 1
107 2 1
108 3 1
109 2 1

I would like to extract, say, the first element of file A and compare it with the elements of file B. If it is found, I would like to get the position of element 107 in file B; in this case it is on the 3rd line. From the elements found in file B, I would like to perform some computation on their fields. Next, I would read the second element of file A, which is "bbb 108", and open another file B which is quite similar to the first file B. For now, I assume I have only one file A and one file B to compare.

I tried the following:

#!/bin/bash
cat a.txt| while read LINE
do
#grab the second element of a.txt one line at a time
char=`echo "${LINE}"| awk '{print $2}'`
echo $char
#grab the line number of the element found in b.txt
i=`grep -n "^$char" b.txt|tr ":" " "|awk '{ print $1}'`
echo $i
#grab the particular line number fields in b.txt to do computation
cat b.txt|awk -v h=$i '{ count[$h]=$2+$3; } { printf("Count position %d is %d\n",h,count[$h]); }'
done

However, the weird thing is that the output I get is the sum of the last $2 and $3 for all the entries, which is:

Count position 2 is 6 # which is 2+4
Count position 1 is 6 # which is 2+4
Count position 3 is 6 # which is 2+4

My desired output for each comparison would be something like:

Count position 2 is 5 # which is 2+3

The same goes for the other matching patterns when I open another b.txt to compare with the second line of a.txt. Could anyone tell me what is wrong with the above data structure? Maybe I missed some important structures. Thanks.
-ahjiefreak
You can use the join command to do a line-by-line lookup of values between two files. You just have to specify which field is the join field using the -1 and -2 options:
For example, this command outputs a concatenation of fields where field number 2 in the first file is the same as field number 1 in the second file (using your example files):
Code:
join -1 2 -2 1 fileA fileB
Code:
107 aaa 2 1
Code:
join -1 2 -2 1 fileA fileB | while read a b c d; do
Code:
for join field 107 : 2 + 1 = 3
Code:
join -1 2 -2 1 fileA fileB |
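The join-then-loop idea can be sketched end to end as follows. This is a minimal illustration, not the original listing: the function name `sum_joined`, the scratch directory, and the variable names are assumptions, and the sample data is copied from the first post. Note that join expects both inputs to be sorted on the join field, which the example files already are.

```shell
#!/bin/bash
# Sketch: join fileA (key in column 2) with fileB (key in column 1),
# then add the two numeric fields of every matched row.

workdir=$(mktemp -d)   # hypothetical scratch directory for the sample data
printf '%s\n' 'aaa 107' 'bbb 108' 'ccc 109' > "$workdir/fileA"
printf '%s\n' '101 2 1' '102 3 1' '107 2 1' '108 3 1' '109 2 1' > "$workdir/fileB"

sum_joined() {
    # join prints: key, remaining field of fileA, remaining fields of fileB.
    join -1 2 -2 1 "$1" "$2" | while read -r key label x y; do
        echo "for join field $key : $x + $y = $((x + y))"
    done
}

sum_joined "$workdir/fileA" "$workdir/fileB"
```

With the sample files this prints one line per key found in both files (107, 108 and 109), and the first line matches the output shown above.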
Hi Matthew,
I agree with you. But the problem is that the first element of the first file (A.txt) is looked up and compared against a second file, namely B1.txt. Then the next (second) element in A.txt is looked up and compared against another file, namely B2.txt. If we use join, that would mean joining two whole files. Can it be done in this case, where I read A.txt line by line and join the first element to the first field of B1.txt, so that I can then perform an operation on it? I doubt we can, because join still matches every element of the first file against B1.txt. What I would like to do is take just one element from A.txt at a time and match it. Please advise. Thanks.
-ahjiefreak
Aha, I mis-read the OP a little.
Do you have files named B1.txt, B2.txt, B3.txt, etc., where the numerical component increments by 1 each time, and presumably as many B files as there are lines in A.txt? Well, you could do it with something like the approach you took in the OP. However, I think performance will be pretty bad if you have a lot of lines in A.txt, because you will have to invoke several new processes per line of A.txt. Personally I'd switch to Perl for something like this, although awk is also a good choice. Here's how I'd do it:
Code:
#!/usr/bin/perl
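For readers who want to stay in the shell, here is a minimal sketch of the per-line lookup in bash with awk rather than Perl. It assumes the naming scheme asked about above (line N of A.txt is looked up in BN.txt), which the thread never confirms; the function name `lookup_each`, the scratch directory, and the sample B files are hypothetical.

```shell
#!/bin/bash
# Sketch: line N of A.txt is looked up in BN.txt (assumed naming scheme).

workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' 'aaa 107' 'bbb 108' > "$workdir/A.txt"
printf '%s\n' '101 2 1' '107 2 1' > "$workdir/B1.txt"
printf '%s\n' '108 3 1' '109 2 1' > "$workdir/B2.txt"

lookup_each() {
    local dir=$1 n=0 label key
    while read -r label key; do
        n=$((n + 1))
        # Compare field 1 with the key; NR is the line number inside the B file.
        awk -v key="$key" -v src="B$n.txt" '
            $1 == key { printf "%s line %d: %d + %d = %d\n", src, NR, $2, $3, $2 + $3 }
        ' "$dir/B$n.txt"
    done < "$dir/A.txt"
}

lookup_each "$workdir"
```

The pattern `$1 == key` picks the matching line and `NR` gives its position. In the script from the first post, `$h` refers to field number h of the current line, not to line h, and the printf runs for every line of b.txt. This sketch still starts one awk process per line of A.txt, so the performance concern above applies.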
Hi Matthew.
Thanks for the reply. I tried a silly method for this kind of problem:

#!/bin/sh -x
cat a.txt|while read LINE
do
char=`echo "${LINE}"| awk '{print $2}'`
#echo $char
i=`grep -n "^$char" b.txt|awk '{print $2}'`
j=`grep -n "^$char" b.txt|awk '{print $3}'`
k=`grep -n "$char" c.txt|awk '{print $1}'`
q=`echo $j/\( $i +$j\) | bc`
echo $i
echo $j
echo $k
echo $q

But I still face a problem: in

k=`grep -n "$char" c.txt|awk '{print $1}'`

it does not grep only the exact number. For example, when I grep for the number 108:

++ awk '{print $1}'
+ k='2: 3:'

It gives me two values. Do you or anyone else know how we can use awk (instead of echo) to simplify the whole process? I have been confused and getting a headache thinking about this problem for a couple of days. Please advise. Thanks.
-Jason
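One likely cause of the double match is that grep treats the pattern as a substring, so 108 also matches a line containing 1080 or a line where 108 appears in a later field. The sketch below compares the first field exactly with awk and also computes the ratio in awk instead of bc (bc performs integer division unless scale is set). The contents of c.txt are not shown in the thread, so the sample file here is invented for illustration, and the function names `line_of_key` and `ratio_for_key` are hypothetical.

```shell
#!/bin/bash
# Sketch: exact first-field match and ratio computation in awk.

workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' '101 2 1' '107 2 1' '108 3 1' > "$workdir/b.txt"
# Hypothetical c.txt: 108 also appears inside 1080 and as a second field.
printf '%s\n' '107 5' '108 6' '1080 7' '200 108' > "$workdir/c.txt"

line_of_key() {
    # Print the number of every line whose first field equals the key.
    awk -v key="$1" '$1 == key { print NR }' "$2"
}

ratio_for_key() {
    # Compute $3 / ($2 + $3) for the matching line, in floating point.
    awk -v key="$1" '$1 == key { printf "%.3f\n", $3 / ($2 + $3) }' "$2"
}

grep -n "108" "$workdir/c.txt" | cut -d: -f1   # loose match: lines 2, 3 and 4
line_of_key 108 "$workdir/c.txt"               # exact match: line 2
ratio_for_key 108 "$workdir/b.txt"             # 1 / (3 + 1)
```

Because awk splits each line into fields itself, there is no need to strip the `N:` prefix that `grep -n` adds, and the same awk call can return the line number and the computed value together if desired.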
let's say that I roughly understood the requirement...
sample input:
Code:
# more file
Code:
awk 'BEGIN{ i=0 }
output:
Code:
# ./test.sh
Hi,
Thanks for the input. I haven't tried it on my Linux box yet, as I am currently using my friend's PC. However, I do not quite understand it at first glance. Do I have to open any file first, or do I just start with awk 'BEGIN...? (From my understanding, the second field of the first file is read and stored in the store array.) Second, I assume FILENAME should be my second file? How do you deal with the different numbers at the end of the names file1, file2, file3, etc. when opening and comparing them? One more thing: in the if ( line ~ store[i] ) test against element two of the first file, is the whole line automatically compared with store[i], one line at a time? Sorry, I am quite new to bash and it is hard for me to understand the details. If you don't mind, could you either clarify my doubts or comment on the code? Thanks a lot, I really appreciate it. I will let you know once I try it out. Thanks.
-ahjiefreak
Code:
awk 'BEGIN{ i=0 }
NR==FNR{
    store[++c] = $2
    next
}
{
    ++i
    while ( (getline line < FILENAME )> 0 ) {
        if ( line ~ store[i] ) {
            print "Now I can do something with this line: " line " from file: " FILENAME
        }
    }
    nextfile
}
' file file1 file2 file3
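As a rough, commented variant of the awk idea quoted above: the sketch below keeps the `NR==FNR` trick for the first file but replaces the getline loop with awk's normal record loop and an exact comparison on field 1. It is a sketch under stated assumptions, not the answerer's code: the sample contents of file1, file2 and file3, the function name `match_per_file`, and the output format are all hypothetical.

```shell
#!/bin/bash
# Sketch: line N of the first file supplies the key for the Nth data file.

workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' 'aaa 107' 'bbb 108' 'ccc 109' > "$workdir/file"
printf '%s\n' '101 2 1' '107 2 1' > "$workdir/file1"
printf '%s\n' '108 3 1' '102 3 1' > "$workdir/file2"
printf '%s\n' '101 2 1' '102 3 1' '109 2 1' > "$workdir/file3"

match_per_file() {
    awk '
        # First file only (NR equals FNR there): store column 2 of each line.
        NR == FNR { store[++c] = $2; next }
        # First line of each later file: move on to the next stored key.
        FNR == 1 { ++i }
        # Exact comparison of field 1 with the key that belongs to this file.
        $1 == store[i] {
            n = split(FILENAME, path, "/")
            printf "%s line %d: %d + %d = %d\n", path[n], FNR, $2, $3, $2 + $3
        }
    ' "$@"
}

match_per_file "$workdir/file" "$workdir/file1" "$workdir/file2" "$workdir/file3"
```

awk opens the files named on the command line by itself, in order, so no explicit open is needed. FILENAME holds the name of whichever file is currently being read, and FNR restarts at 1 for each file, which is what lets the counter `i` advance once per data file. An empty data file never triggers `FNR == 1`, so it would shift the pairing; the sketch assumes every data file has at least one line.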