LinuxQuestions.org
Old 06-13-2012, 08:04 AM   #1
amar_lahiri
LQ Newbie
 
Registered: Jun 2012
Posts: 1

Rep: Reputation: Disabled
Join of 2 files on the basis of 2 columns using Perl


I want to write a Perl script for the logic below.

I have two files, file1 and file2, and I wish to extract the lines in file1 that have no match in file2. The script should compare the 1st and 2nd columns of the first file with the 1st and 2nd columns of the second file, then output the lines of file1 whose values are not present in file2.

FILE1-
6781 9547253220
55555 9777676804
3334 8016307705
8990 9777676804
121 9679234253
123 9938365782
6781 9679234253
55555 9938365782


FILE2-
55555 8018555766
55555 9938365782
6781 9679234253
6781 8018555766



So the output should be based on the first file: the lines of FILE1 whose values are not present in FILE2.

OUTPUT-
6781 9547253220
55555 9777676804
3334 8016307705
8990 9777676804
121 9679234253
123 9938365782
 
Old 06-14-2012, 05:13 PM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910
Not perl, but does what you asked for ...

Code:
 grep -v -f file2 file1|sort -u

121 9679234253
123 9938365782
3334 8016307705
55555 9777676804
6781 9547253220
8990 9777676804
 
Old 06-14-2012, 05:40 PM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1958
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.


I'm not experienced in perl, but assuming the input files aren't too large to keep in memory, here's how I'd do it in awk. The concept should be transferable to other languages.

Code:
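awk 'NR==FNR { arr[$1,$2]=1 ; next } !( ($1,$2) in arr ) { print }' File2.txt File1.txt
1) Read file2 into a hash (associative array), with the index built from field 1 and field 2. The two fields are joined with a comma (awk's SUBSEP) rather than plain concatenation, so that pairs such as "12 3" and "1 23" stay distinct. The value only needs to be something true (non-empty, non-zero), as all we need to do is keep track of which pairs exist.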

2) Iterate over file1, and if the first two fields fail to match an existing index, print the line. Otherwise ignore it.
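The two steps above can be tried on the sample data from the first post with a minimal shell sketch. The temporary directory and the names `workdir`, `seen` and `result` are only illustrative and do not come from the thread.

```shell
#!/bin/sh
# Sketch only: paths and variable names are illustrative, not from the thread.
workdir=$(mktemp -d)

# Sample data copied from post #1.
cat > "$workdir/file1" <<'EOF'
6781 9547253220
55555 9777676804
3334 8016307705
8990 9777676804
121 9679234253
123 9938365782
6781 9679234253
55555 9938365782
EOF

cat > "$workdir/file2" <<'EOF'
55555 8018555766
55555 9938365782
6781 9679234253
6781 8018555766
EOF

# Pass 1 (NR==FNR is true only while reading the first file named, file2):
#   remember each (field1, field2) pair.
# Pass 2 (file1): print lines whose pair was never seen, in file1 order.
result=$(awk 'NR==FNR { seen[$1,$2] = 1; next }
              !(($1,$2) in seen)' "$workdir/file2" "$workdir/file1")

printf '%s\n' "$result"
rm -rf "$workdir"
```

This prints the six lines listed under OUTPUT in the first post, in their original order. One caveat: if file2 is completely empty, NR==FNR stays true while file1 is read, so nothing is printed.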


Incidentally, if you aren't wedded to perl, and if the format is unique enough that a simple whole-line comparison can't result in false positives, then a simple grep could also do the job:

Code:
grep -v -f File2.txt File1.txt
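To illustrate the false-positive caveat: without further options, grep treats each line of the pattern file as a regular expression that may match anywhere in a line. Below is a small sketch with invented data (the files `pats` and `data` and their contents are not from the thread) comparing that behaviour with `-F` (fixed strings) plus `-x` (whole-line match).

```shell
#!/bin/sh
# Sketch only: the data below is invented to provoke a false positive.
workdir=$(mktemp -d)

printf '%s\n' '1 23' > "$workdir/pats"                     # plays the role of file2
printf '%s\n' '11 234' '1 23' '7 99' > "$workdir/data"     # plays the role of file1

# Default matching: "1 23" also occurs inside "11 234", so that line is dropped too.
loose=$(grep -v -f "$workdir/pats" "$workdir/data")

# -F: patterns are literal strings; -x: a pattern must match the entire line.
strict=$(grep -F -x -v -f "$workdir/pats" "$workdir/data")

echo "default matching:"; printf '%s\n' "$loose"
echo "with -F -x:";       printf '%s\n' "$strict"
rm -rf "$workdir"
```

With the default matching only "7 99" survives, while the `-F -x` form keeps both "11 234" and "7 99". Note that this still compares whole lines, so it is only equivalent to a two-column comparison when the files contain exactly those two columns with identical spacing.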
 
Old 06-15-2012, 09:51 AM   #4
Valdis Grinbergs
Member
 
Registered: Dec 2005
Distribution: Debian
Posts: 30

Rep: Reputation: 25
The other posters are right, tools other than perl are probably a better fit for this problem. A tool box should have more than one tool and a skilled worker uses the right tool for the right job. That being said, if you really need to use perl, here is one solution:
Code:
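use strict;
use warnings;

my $record;
my %set1;   # lines seen in the first file
my %set2;   # lines seen in the second file

# Expect two file names on the command line: FILE1 FILE2
my ($filename1, $filename2) = @ARGV;

# Store every line of FILE1 as a hash key (the whole line, newline included)
open my $f1, "<", $filename1 or die "Could not open '$filename1' - $!";
while ($record = <$f1>) {
    $set1{$record} = 1;
}
close $f1;

# Do the same for FILE2
open my $f2, "<", $filename2 or die "Could not open '$filename2' - $!";
while ($record = <$f2>) {
    $set2{$record} = 1;
}
close $f2;

# Print, in sorted order, each distinct FILE1 line that does not occur in FILE2
foreach my $item (sort keys %set1) {
    print $item if not exists $set2{$item};
}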
Use it with the command:
perl scriptname.pl FILE1 FILE2

Last edited by Tinkster; 06-16-2012 at 06:38 PM. Reason: fixed code tags
 
Old 06-16-2012, 06:39 PM   #5
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910
Very nice solution valdis!
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration