LinuxQuestions.org
Old 08-26-2013, 06:42 AM   #1
jv61
How to compare lines from two files and print the number of matches and mismatches


Hello everyone,

I want to compare lines from two files and print the number of matches and mismatches for each position in the line.

My first file has 800 lines like this:

Code:
BIN1: ABHHHABBHAB
BIN2: ABBHBBBBBBA

My second file has 10000 lines like this:

Code:
m1: AB--HHABBHA
m2: ABBHBB----A
m3: ABBHBBBBBBA

I want to compare each line of my second file to the lines of the first one and print the results with the percentage identity. My results should look like this:

Code:
m1 matches to BIN1 with 8 matches and 3 mismatches - percentage identity is 72.7
m2 matches to BIN2 with 7 matches and 4 mismatches - percentage identity is 63.6
m3 matches to BIN2 with 11 matches and no mismatches - percentage identity is 100
Could anyone give me some suggestions on how to do this?

Thank you,

Last edited by jv61; 08-26-2013 at 06:54 PM.
 
Old 08-26-2013, 05:06 PM   #2
jpollard
Have you looked into the diff utility?

Usually something like a percentage of difference is useless.
 
Old 08-26-2013, 07:11 PM   #3
jv61
Original Poster
Just an update: I had a go with the String::Approx module in Perl and it seems to be getting there. Here is the code:

Code:
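#!/usr/bin/perl
use strict;
use warnings;

use String::Approx 'amatch';

# Read both files fully; three-argument open with lexical filehandles.
open(my $dict_fh, '<', 'first.txt')  or die "Can't open first.txt: $!";
open(my $pat_fh,  '<', 'second.txt') or die "Can't open second.txt: $!";
chomp(my @in  = <$dict_fh>);
chomp(my @pat = <$pat_fh>);

foreach my $pat (@pat) {
    # Fields are assumed to be tab-separated: name, then sequence.
    my ($name, $seq) = split /\t/, $pat;
    next unless defined $seq;    # skip lines without a sequence field

    foreach my $line (@in) {
        # In list context amatch returns the inputs that match $seq
        # approximately, here allowing up to 20% edits.
        if (my @matched = amatch($seq, ['20%'], $line)) {
            print $name, ':', @matched, "\n";
        }
    }
}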
I am learning to program in Perl, so any suggestions for improvement are welcome.

Thanks for the reply, jpollard. I haven't tried using diff yet. I will see how it works.

Thanks again
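
For the position-by-position count described in the first post, a minimal Perl sketch could look like the one below. It is not the poster's script and it does not use String::Approx. The helper names `identity` and `best_match` are made up for illustration, and it assumes that both sequences have the same length, that a "-" simply counts as a mismatch, and that the "best" BIN is the one with the most matching positions. The sample data is hard-coded from the first post; real input would be read from the two files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count positions where two equal-length strings agree.
# String XOR yields "\0" wherever the two bytes are identical.
sub identity {
    my ($ref, $query) = @_;
    die "length mismatch\n" unless length($ref) == length($query);
    my $len        = length $ref;
    my $matches    = (($ref ^ $query) =~ tr/\0//);
    my $mismatches = $len - $matches;
    my $percent    = $len ? 100 * $matches / $len : 0;
    return ($matches, $mismatches, $percent);
}

# For one query, return the reference name with the most matching
# positions, followed by its (matches, mismatches, percent).
sub best_match {
    my ($query, $refs) = @_;    # $refs: hash ref of name => sequence
    my ($best_name, @best);
    for my $name (sort keys %$refs) {
        my @score = identity($refs->{$name}, $query);
        if (!defined $best_name || $score[0] > $best[0]) {
            ($best_name, @best) = ($name, @score);
        }
    }
    return ($best_name, @best);
}

# Sample data copied from the first post.
my %bins    = (BIN1 => 'ABHHHABBHAB', BIN2 => 'ABBHBBBBBBA');
my @queries = ([m2 => 'ABBHBB----A'], [m3 => 'ABBHBBBBBBA']);

for my $q (@queries) {
    my ($name, $seq) = @$q;
    my ($bin, $m, $mm, $pct) = best_match($seq, \%bins);
    printf "%s matches to %s with %d matches and %d mismatches - percentage identity is %.1f\n",
        $name, $bin, $m, $mm, $pct;
}
```

With this rule m2 and m3 both pair with BIN2, at 7 of 11 and 11 of 11 positions, which agrees with the desired output. The posted m1 line agrees with BIN1 in only 4 positions under a strict positional comparison, so the 8 matches shown for m1 may rely on a different rule (for example an alignment that lets gaps shift); that case is not covered here.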
 
  


All times are GMT -5.
