LinuxQuestions.org
Old 12-12-2007, 07:20 AM   #1
ahjiefreak
LQ Newbie
 
Registered: Dec 2007
Posts: 13

Rep: Reputation: 0
How to match element in a file using Bash


Hi,

I am trying out a shell example: finding an item from one file and matching it in another file. Initially I have two files, file A and file B.

A very simple example:

File A has columns of fields like this:

aaa 107
bbb 108
ccc 109

File B has columns of fields like this:
101 2 1
102 3 1
107 2 1
108 3 1
109 2 1

I would like to extract, say, the first element of file A and compare it with the elements of file B.

If it is found, I would like to get the position of that element in file B; for 107 in this case, it is on the 3rd line. Then I would like to perform some computation on the fields of the matching line in file B.

Next, I would read the second element of file A, which is "bbb 108", and open another file B that is quite similar to the first file B.

For now, I assume I have only one file A and one file B to compare.

I tried the following:


#!/bin/bash

cat a.txt| while read LINE
do
#grab the second element of a.txt one line at a time
char=`echo "${LINE}"| awk '{print $2}'`

echo $char
#grab the line number of the element found in b.txt
i=`grep -n "^$char" b.txt|tr ":" " "|awk '{ print $1}'`
echo $i

#grab the particular line number fields in b.txt to do computation
cat b.txt|awk -v h=$i '{

count[$h]=$2+$3;

}
{
printf("Count position %d is %d\n",h,count[$h]);
}'
done

However, the weird thing is that the output I get is the sum of the last $2 and $3 for all the entries, which is:

Count position 2 is 6 # which is 2+4
Count position 1 is 6 # which is 2+4
Count position 3 is 6 # which is 2+4

My desired output for each comparison would be something like:

Count position 2 is 5 # which is 2+3

The same goes for the other matching patterns when I open another b.txt to compare with the second line of a.txt.

Could anyone tell me what is wrong with the above code?
Maybe I missed some important structure.

Thanks.


-ahjiefreak
 
Old 12-12-2007, 07:33 AM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
You can use the join command to do a line-by-line lookup of values between two files. You just have to specify which field is the join field using the -1 and -2 options.
For example, using your example files, this command outputs a concatenation of fields where field number 2 in the first file is the same as field number 1 in the second file:
Code:
join -1 2 -2 1 fileA fileB
The output is:
Code:
107 aaa 2 1
108 bbb 3 1
109 ccc 2 1
You can read this into a shell "while read" loop and perform whatever operations you like:
Code:
join -1 2 -2 1 fileA fileB | while read a b c d; do
    echo "for join field $a : $c + $d = $(($c + $d))"
done
And the output:
Code:
for join field 107 : 2 + 1 = 3
for join field 108 : 3 + 1 = 4
for join field 109 : 2 + 1 = 3
If there is a lot of input data you would be better off doing any line-by-line operations in Awk or Perl because the shell's read and arithmetic operators are not very efficient:
Code:
join -1 2 -2 1 fileA fileB |
  awk '{ print "for join field " $1 " : " $3 " + " $4 " = " $3 + $4; }'
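
One caveat with this approach: join expects both inputs to be sorted on the join field, which happens to be true for the example files. A minimal sketch for unsorted input, assuming sort, join and mktemp are available (the function name join_sorted is made up for illustration):

```shell
# join_sorted FILE_A FILE_B
# Hypothetical helper: sort both files on their join field, then join them.
join_sorted() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # LC_ALL=C makes sort and join agree on the collation order
    LC_ALL=C sort -k2,2 "$1" > "$tmp_a"   # file A: join field is column 2
    LC_ALL=C sort -k1,1 "$2" > "$tmp_b"   # file B: join field is column 1
    LC_ALL=C join -1 2 -2 1 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Calling join_sorted fileA fileB then gives the same kind of output as the join command above, ordered by the join field.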
 
Old 12-12-2007, 04:30 PM   #3
ahjiefreak
LQ Newbie
 
Registered: Dec 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Hi Matthew,

I agree with you. But the problem is that the first element in the first file (A.txt) is looked up and compared with a second file, namely B1.txt.

Then the next (second) element in A.txt is looked up and compared with another file, namely B2.txt.

If we use join, that means we need to join two whole files. Can it be done in this case, where I read A.txt line by line
and join the first element to the first field of B1.txt? Then I can perform operations on that.

I doubt we can do that, because join still matches every element of the first file against B1.txt. What I would like to do is take just one element from A.txt at a time and match it.

Please advise.
Thanks.

-ahjiefreak
 
Old 12-12-2007, 06:24 PM   #4
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Aha, I mis-read the OP a little.

Do you have files named B1.txt, B2.txt, B3.txt etc., where the numerical component increments by 1 each time, and presumably you have as many B files as there are lines in A.txt?

Well, you could do it with something like the approach you took in the OP. However, I think this will perform pretty badly if you have a lot of lines in A.txt, because you will have to invoke several new processes per line of A.txt. Personally I'd switch to Perl for something like this, although awk is also a good choice.
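
For reference, here is a minimal sketch of the OP's approach with the awk part repaired. In the original script, the awk blocks had no condition, so they ran for every line of b.txt, and $h means "field number h" rather than the variable h. The sketch assumes a single A file and a single B file as in the first example; the function name lookup_and_sum is made up for illustration:

```shell
# lookup_and_sum FILE_A FILE_B
# Hypothetical helper: for each key in column 2 of FILE_A, find the lines of
# FILE_B whose first field equals that key and print the line number and $2+$3.
lookup_and_sum() {
    while read -r _name key; do
        awk -v key="$key" '
            # appending "" forces a string comparison, so only exact keys match
            ($1 "") == (key "") {
                printf "Count position %d is %d\n", NR, $2 + $3
            }' "$2"
    done < "$1"
}
```

It still starts one awk process per line of the A file, so the performance concern above applies.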

Here's how I'd do it:
Code:
#!/usr/bin/perl

use strict;
use warnings;

my $n = 1;

open(A, "<A.txt") || die "cannot open A.txt : $!\n";
while(<A>) {
    chomp;
    my @a = split(/\s+/);
    my $bfile = "B$n.txt";
    open(B, "<$bfile") || die "cannot open $bfile : $!\n";
    while(<B>) {
        chomp;
        my @b = split(/\s+/);
        if ( $b[0] eq $a[1] ) {
            printf "found %s in %s at line %d. %d + %d = %d\n",
                $a[1], $bfile, $., $b[1], $b[2], $b[1]+$b[2];
        }
    }
    close(B);
    $n++;
}
close(A);
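
Since awk is also a good choice here, a rough awk equivalent of the same loop might look like the following. It is only a sketch: the function name match_per_file and its prefix argument are made up for illustration, and it assumes the B files are numbered from 1 in the same order as the lines of the A file. Unlike the Perl version, it silently skips a missing B file instead of dying.

```shell
# match_per_file FILE_A PREFIX
# Hypothetical helper: line N of FILE_A is looked up in the file PREFIX<N>.txt.
match_per_file() {
    awk -v prefix="$2" '{
        key = $2
        bfile = prefix NR ".txt"      # e.g. B1.txt for line 1, B2.txt for line 2
        line_no = 0
        while ((getline line < bfile) > 0) {
            line_no++
            split(line, f)            # f[1], f[2], f[3] are the fields of the B line
            if ((f[1] "") == (key ""))
                printf "found %s in %s at line %d. %d + %d = %d\n",
                       key, bfile, line_no, f[2], f[3], f[2] + f[3]
        }
        close(bfile)
    }' "$1"
}
```

With the file names used above, the call would be match_per_file A.txt B, run from the directory that holds the B files.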
 
Old 12-13-2007, 12:56 AM   #5
ahjiefreak
LQ Newbie
 
Registered: Dec 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Hi Matthew.

Thanks for the reply.

I tried a silly method for this kind of problem:


#!/bin/sh -x

cat a.txt|while read LINE
do

char=`echo "${LINE}"| awk '{print $2}'`

#echo $char
i=`grep -n "^$char" b.txt|awk '{print $2}'`
j=`grep -n "^$char" b.txt|awk '{print $3}'`
k=`grep -n "$char" c.txt|awk '{print $1}'`


q=`echo $j/\( $i +$j\) | bc`

echo $i
echo $j
echo $k
echo $q
done

But I still face a problem. In

k=`grep -n "$char" c.txt|awk '{print $1}'`

it does not grep only the exact number.

For example, when I try to grep the number 108:

++ awk '{print $1}'
+ k='2:
3:'

It gives me two values.


Do you or anyone else know how we can use awk (instead of echo) to simplify the whole process? I have been confused and getting a headache thinking about this problem for a couple of days.


Please advise. Thanks.

-Jason
 
Old 12-13-2007, 02:07 AM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241Reputation: 241Reputation: 241
Let's say that I roughly understood the requirement...
sample input:
Code:
# more file
aaa 107
bbb 108
ccc 109
# more file1
101 2 1
102 3 1
107 2 1
108 3 1
107 10 1
# more file2
101 2 1
102 3 1
107 5 1
108 3 1
109 2 1
# more file3
101 2 1
102 3 1
107 5 1
108 3 1
109 6 1
GNU awk
Code:
awk  'BEGIN{ i=0 }
NR==FNR{ 
       store[++c] = $2
       next
     }
{
    ++i
    while ( (getline line < FILENAME )> 0 ) {
       if  ( line ~ store[i] ) {
            print "Now I can do something with this line:  " line  " from file: " FILENAME
       }
    }
    nextfile
}
' file file1 file2 file3

output:
Code:
# ./test.sh
Now I can do something with this line:  107 2 1 from file: file1
Now I can do something with this line:  107 10 1 from file: file1
Now I can do something with this line:  108 3 1 from file: file2
Now I can do something with this line:  109 6 1 from file: file3
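
A note on the match: line ~ store[i] is a regular-expression match against the whole line, so 108 would also match 1080, or a 108 in the second or third column. If only an exact match on the first field is wanted, a sketch along these lines may help (exact_lookup is a made-up name; it prints the line number, a colon, and the line, similar to grep -n):

```shell
# exact_lookup KEY FILE
# Hypothetical helper: print "line_number:line" for every line of FILE whose
# first field is exactly KEY.
exact_lookup() {
    # appending "" forces a string comparison instead of a numeric one
    awk -v key="$1" '($1 "") == (key "") { print FNR ":" $0 }' "$2"
}
```

For example, exact_lookup 108 c.txt would report only lines whose first column is 108.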

Last edited by ghostdog74; 12-13-2007 at 02:08 AM.
 
Old 12-13-2007, 03:57 AM   #7
ahjiefreak
LQ Newbie
 
Registered: Dec 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Hi,

Thanks for the input. I haven't tried it on my Linux box yet, as I am currently using my friend's PC.

However, I do not quite understand it at first glance. Do I have to open any file first, or do I just start with awk 'BEGIN...? (From my understanding, the second field of the first file is read and stored in the store array.)

Second, I assume FILENAME should be my second file? How do you deal with the different numbers at the end of file1, file2, file3, etc. when opening and comparing them?

One more thing: in the if (line ~ store[i]) test, is each whole line automatically compared with store[i], the second element from the first file, one line at a time?

Sorry, I am quite new to the bash shell and it is complicated for me to understand the details. If you don't mind, could you either clarify my doubts or comment the code?

Thanks a lot. I really appreciate it. I will let you know once I try it out.



-ahjiefreak
 
  


All times are GMT -5. The time now is 09:41 PM.
