I am trying to write a shell script that takes an item from one file and finds its match in another file. I start with two files, file A and file B.
A very simple example:-
In file A, I have columns of fields such that:-
aaa 107
bbb 108
ccc 109
In file B, I have columns of fields such that:-
101 2 1
102 3 1
107 2 1
108 3 1
109 2 1
I would like to take, say, the first entry of file A and compare it against the entries of file B.
If it is found, I would like to get its position in file B; in this case, 107 is on the 3rd line. I would then like to perform some computation on the fields of the matching line.
Next, I would read the second entry of file A, which is "bbb 108", and open another file that is quite similar to file B.
For now, I assume I have only one file A and one file B to compare.
This is what I tried:-
#!/bin/bash
cat a.txt| while read LINE
do
#grab the second element of a.txt one line at a time
char=`echo "${LINE}"| awk '{print $2}'`
echo $char
#grab the line number of the element found in b.txt
i=`grep -n "^$char" b.txt|tr ":" " "|awk '{ print $1}'`
echo $i
#grab the particular line number fields in b.txt to do computation
cat b.txt|awk -v h=$i '{
count[$h]=$2+$3;
}
{
printf("Count position %d is %d\n",h,count[$h]);
}'
done
However, the weird thing is that the output I get is the sum of the last $2 and $3 for every entry, which is:
Count position 2 is 6 # which is 2+4
Count position 1 is 6 # which is 2+4
Count position 3 is 6 # which is 2+4
My desired output for each comparison would be something like:-
Count position 2 is 5 # which is 2+3
The same goes for the other matches when I open another b.txt to compare against the second line of a.txt.
Could anyone tell me what is wrong with the code above?
Maybe I have left out something important.
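A hedged reading of the script above, not a definitive diagnosis: inside awk, `$h` means "field number h" rather than the value of h, and a block with no pattern runs for every line of b.txt, so the script prints once per line instead of once per match. Below is a minimal sketch of one possible fix, assuming the a.txt and b.txt layouts shown in the question. The function name `sum_matches` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for each key in column 2 of file A, report the line
# number of the first matching row in file B and the sum of its fields 2 and 3.
sum_matches() {
    afile=$1
    bfile=$2
    # read splits each line of file A into its two columns directly,
    # so no separate "echo | awk" step is needed.
    while read -r name key; do
        # NR is the current line number inside awk. Comparing $1 with k
        # tests the whole first field, so key 10 cannot match 107.
        awk -v k="$key" '$1 == k {
            printf "Count position %d is %d\n", NR, $2 + $3
            exit    # stop after the first match
        }' "$bfile"
    done < "$afile"
}

# Demo with the sample data from the question.
tmp=$(mktemp -d)
printf 'aaa 107\nbbb 108\nccc 109\n' > "$tmp/a.txt"
printf '101 2 1\n102 3 1\n107 2 1\n108 3 1\n109 2 1\n' > "$tmp/b.txt"
sum_matches "$tmp/a.txt" "$tmp/b.txt"
rm -rf "$tmp"
```

With the sample files this prints "Count position 3 is 3", "Count position 4 is 4" and "Count position 5 is 3", one line per entry of a.txt.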
You can use the join command to do a line-by-line lookup of values between two files. You just have to specify which field is the join field in each file using the -1 and -2 options.
For example, using your example files, this command outputs a concatenation of the fields wherever field number 2 in the first file is the same as field number 1 in the second file:
Code:
join -1 2 -2 1 fileA fileB
The output is:
Code:
107 aaa 2 1
108 bbb 3 1
109 ccc 2 1
You can read this into a shell "while read" loop and perform whatever operations you like:
Code:
join -1 2 -2 1 fileA fileB | while read a b c d; do
echo "for join field $a : $c + $d = $(($c + $d))"
done
And the output:
Code:
for join field 107 : 2 + 1 = 3
for join field 108 : 3 + 1 = 4
for join field 109 : 2 + 1 = 3
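One caveat: join expects both inputs to be sorted on the join field. The sample files happen to be sorted already, but real data may not be. Here is a rough sketch that sorts first, assuming temporary files are acceptable. The function name `join_and_sum` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sort both inputs on their join fields, then join
# field 2 of file A with field 1 of file B and add the two value columns.
join_and_sum() {
    tmpdir=$(mktemp -d)
    sort -k2,2 "$1" > "$tmpdir/a.sorted"   # file A is keyed on column 2
    sort -k1,1 "$2" > "$tmpdir/b.sorted"   # file B is keyed on column 1
    join -1 2 -2 1 "$tmpdir/a.sorted" "$tmpdir/b.sorted" |
    while read -r key name x y; do
        echo "for join field $key : $x + $y = $((x + y))"
    done
    rm -rf "$tmpdir"
}

# Demo with deliberately unsorted input.
demo=$(mktemp -d)
printf 'ccc 109\naaa 107\nbbb 108\n' > "$demo/fileA"
printf '109 2 1\n101 2 1\n108 3 1\n107 2 1\n' > "$demo/fileB"
join_and_sum "$demo/fileA" "$demo/fileB"
rm -rf "$demo"
```

sort and join run under the same locale here, so they agree on the ordering of the keys.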
If there is a lot of input data, you would be better off doing any line-by-line operations in Awk or Perl, because the shell's read and arithmetic operators are not very efficient.
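As a sketch of what the awk route might look like, a single awk process can do both the lookup and the arithmetic. This assumes the same two-column file A and three-column file B as above. The function name `awk_lookup_sum` is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: one awk process, no join and no shell loop.
awk_lookup_sum() {
    awk '
        # While reading the first file (NR == FNR), remember each key
        # from column 2 together with its name from column 1.
        NR == FNR { name[$2] = $1; next }
        # While reading the second file, report rows whose first field
        # was seen as a key in the first file.
        $1 in name {
            printf "for join field %s : %s + %s = %d\n", $1, $2, $3, $2 + $3
        }
    ' "$1" "$2"
}

# Demo with the sample data from the question.
demo=$(mktemp -d)
printf 'aaa 107\nbbb 108\nccc 109\n' > "$demo/fileA"
printf '101 2 1\n102 3 1\n107 2 1\n108 3 1\n109 2 1\n' > "$demo/fileB"
awk_lookup_sum "$demo/fileA" "$demo/fileB"
rm -rf "$demo"
```

On the sample data this prints the same three lines as the join loop. The input does not need to be sorted, and the output follows the order of the second file. One limitation: the NR == FNR test assumes the first file is not empty.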
I agree with you. But the problem is that the first element of the first file (A.txt) is looked up and compared against a second file, namely B1.txt.
Then the next (second) element of A.txt is looked up and compared against another file, namely B2.txt.
If we use join, that means we need to join two whole files. Can it be done in this case, so that while I read A.txt line by line,
I join the first element to the first field of B1.txt? Then I can perform an operation on that.
I doubt we can, because join still matches every element of the first file against B1.txt. What I would like to do is take just one element from A.txt at a time and match it.
Do you have files named B1.txt, B2.txt, B3.txt etc., where the numerical component increments by 1 each time, and presumably as many B files as there are lines in A.txt?
Well, you could do it with something like the approach you took in the OP. However, I think this will perform pretty badly if you have a lot of lines in A.txt, because you will have to invoke several new processes per line of A.txt. Personally I'd switch to Perl for something like this, although awk is also a good choice.
Here's how I'd do it:
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $n = 1;
open(A, "<A.txt") || die "cannot open A.txt : $!\n";
while(<A>) {
    chomp;
    my @a = split(/\s+/);
    my $bfile = "B$n.txt";
    open(B, "<$bfile") || die "cannot open $bfile : $!\n";
    while(<B>) {
        chomp;
        my @b = split(/\s+/);
        if ( $b[0] eq $a[1] ) {
            printf "found %s in %s at line %d. %d + %d = %d\n",
                $a[1], $bfile, $., $b[1], $b[2], $b[1]+$b[2];
        }
    }
    close(B);
    $n++;
}
close(A);
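For comparison, the shell approach mentioned above (one lookup per line of A.txt, each against its own numbered B file) might be sketched as follows. It starts one awk process per line, which is the performance cost already noted. The function name `lookup_per_file` and its prefix argument are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: line n of file A is looked up in <prefix>n.txt.
lookup_per_file() {
    afile=$1
    prefix=$2          # e.g. "B" for B1.txt, B2.txt, ...
    n=1
    while read -r name key; do
        bfile="${prefix}${n}.txt"
        if [ -r "$bfile" ]; then
            # One awk process per line of file A.
            awk -v k="$key" '$1 == k {
                printf "found %s in %s at line %d. %d + %d = %d\n",
                       k, FILENAME, NR, $2, $3, $2 + $3
            }' "$bfile"
        else
            echo "cannot open $bfile" >&2
        fi
        n=$((n + 1))
    done < "$afile"
}

# Demo: two lines in A.txt, two B files.
demo=$(mktemp -d)
printf 'aaa 107\nbbb 108\n' > "$demo/A.txt"
printf '101 2 1\n107 2 1\n' > "$demo/B1.txt"
printf '108 3 1\n109 2 1\n' > "$demo/B2.txt"
lookup_per_file "$demo/A.txt" "$demo/B"
rm -rf "$demo"
```

Unlike the Perl script, this sketch reports a missing B file and carries on rather than stopping.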
Do you or does anyone know how we can use awk (instead of echo) to simplify the whole process? I am rather confused and have had a headache thinking about this problem for a couple of days.
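Code:
awk 'BEGIN{ i=0 }
# First file on the command line: remember field 2 of every line.
NR==FNR{
    store[++c] = $2
    next
}
# Runs on the first line of each later file; i selects the stored value for this file.
{
    ++i
    # Re-read the current file from the top, one line at a time.
    while ( (getline line < FILENAME )> 0 ) {
        # Regex match: true if store[i] occurs anywhere in the line.
        if ( line ~ store[i] ) {
            print "Now I can do something with this line: " line " from file: " FILENAME
        }
    }
    # Skip the rest of this file (nextfile is not available in every awk).
    nextfile
}
' file file1 file2 file3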
output:
Code:
# ./test.sh
Now I can do something with this line: 107 2 1 from file: file1
Now I can do something with this line: 107 10 1 from file: file1
Now I can do something with this line: 108 3 1 from file: file2
Now I can do something with this line: 109 6 1 from file: file3
Last edited by ghostdog74; 12-13-2007 at 02:08 AM.
Thanks for the input. I haven't tried it on my Linux box yet, as I am currently using my friend's PC.
However, I do not quite understand it at first glance. Do I have to open any file first, or just start with awk 'BEGIN...? (From my understanding, the second field of the first file is read and stored in the store array.)
Second, I assume FILENAME should be my second file? How do you deal with the different numbers at the end of file1, file2, file3, etc. when opening and comparing?
And one more thing: in the if(line~store[i]) test, is the whole line automatically compared, one line at a time, with store[i], the second field from the first file?
Sorry, I am quite new to the bash shell and it is complicated for me to understand the details. If you don't mind, could you either clarify my doubts or comment the code?
Thanks a lot. I really appreciate it. I will let you know once I try it out. Thanks.
-ahjiefreak
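On the questions above, a hedged reading of how awk behaves: awk itself opens each file named on the command line, in order, so nothing has to be opened beforehand. FILENAME is whichever of those files awk is currently reading. The `~` operator is a regular-expression match against the whole line, so it is true if the stored value appears anywhere in it. A getline-free sketch of the same idea follows, assuming the later files are listed in the same order as the lines of the first file. The function name `match_per_file` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: arguments are the key file followed by one data
# file per key, in the same order as the lines of the key file.
match_per_file() {
    awk '
        # First file: store field 2 of every line as store[1], store[2], ...
        NR == FNR { store[++c] = $2; next }
        # First line of each later file: move on to the next stored key.
        # (An empty data file would be skipped without advancing i.)
        FNR == 1 { ++i }
        # Exact comparison on field 1 instead of a regex on the whole line.
        $1 == store[i] {
            printf "line %d of %s matches %s: sum = %d\n",
                   FNR, FILENAME, store[i], $2 + $3
        }
    ' "$@"
}

# Demo: file holds the keys, file1 and file2 are searched in turn.
demo=$(mktemp -d)
printf 'aaa 107\nbbb 108\n' > "$demo/file"
printf '101 2 1\n107 2 1\n' > "$demo/file1"
printf '107 9 9\n108 3 1\n' > "$demo/file2"
match_per_file "$demo/file" "$demo/file1" "$demo/file2"
rm -rf "$demo"
```

In the demo, the 107 row in file2 is not reported, because file2 is only searched for the second key, 108.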