Old 08-26-2013, 06:42 AM   #1
LQ Newbie
Registered: May 2012
Posts: 24

Rep: Reputation: Disabled
How to compare lines from two files and print the number of matches and mismatches

Hello everyone,

I want to compare lines from two files and print the number of matches and mismatches for each position in the line.

My first file has 800 lines like this 


My second file has 10000 lines like this. 

m2: ABBHBB----A
I want to compare each line of my second file against the lines of the first one and report the results as percentage identity. My results should look like this:

m1 matches to BIN1 with 8 matches and 3 mismatches - percentage identity is 72.7
m2 matches to BIN2 with 7 matches and 4 mismatches - percentage identity is 63.6
m3 matches to BIN2 with 11 matches and no mismatches - percentage identity is 100
Could anyone give me some suggestions on how to do this?

Thank you,

Last edited by jv61; 08-26-2013 at 06:54 PM.
Old 08-26-2013, 05:06 PM   #2
Senior Member
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,702

Rep: Reputation: 1270
Have you looked into the diff utility?

Usually something like a percentage of difference is useless.
Old 08-26-2013, 07:11 PM   #3
LQ Newbie
Registered: May 2012
Posts: 24

Original Poster
Rep: Reputation: Disabled
Just an update: I had a go with the String::Approx module in Perl and it seems to be getting there. Here is the code:

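#!/usr/bin/perl
use strict;
use warnings;
use String::Approx 'amatch';

# Read both files into memory: the reference lines and the patterns to test.
open(my $dict_fh, '<', 'first.txt')  or die "Can't open first.txt: $!";
open(my $pat_fh,  '<', 'second.txt') or die "Can't open second.txt: $!";
chomp(my @in  = <$dict_fh>);
chomp(my @pat = <$pat_fh>);
close $dict_fh;
close $pat_fh;

foreach my $pat (@pat) {
    # Each pattern line is expected to be "name<TAB>sequence".
    my ($name, $seq) = split /\t/, $pat;
    next unless defined $seq;
    foreach my $line (@in) {
        # In list context amatch returns the inputs that match approximately
        # (here: within 20% of the pattern length in edits).
        if (my @matched = amatch($seq, ['20%'], $line)) {
            print $name, ":", @matched, "\n";
        }
    }
}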
I am learning to program in Perl; any suggestions for improvement are welcome.

Thanks for the reply jpollard. I haven't tried using diff yet. I will see how it works.

Thanks again
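
The per-position count asked for in the first post (matches, mismatches, and percentage identity) can also be computed directly, without approximate matching. The following is only a minimal sketch in Python, not the poster's method. It assumes both files use a "name: sequence" layout like the m2 line shown above and that compared lines have equal length; the function names and the BIN1/BIN2 sequences are made up for illustration.

```python
def parse_line(line):
    """Split a 'name: sequence' line (assumed layout) into (name, sequence)."""
    name, _, seq = line.partition(":")
    return name.strip(), seq.strip()


def percent_identity(a, b):
    """Compare two equal-length strings position by position.

    Returns (matches, mismatches, percentage identity).
    """
    if len(a) != len(b) or not a:
        raise ValueError("lines must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, len(a) - matches, 100.0 * matches / len(a)


def best_match(query, references):
    """Return (name, matches, mismatches, pct) for the closest reference.

    References whose length differs from the query are skipped.
    Returns None if nothing is comparable.
    """
    best = None
    for name, seq in references:
        if len(seq) != len(query) or not seq:
            continue
        m, mm, pct = percent_identity(query, seq)
        if best is None or m > best[1]:
            best = (name, m, mm, pct)
    return best


def report(qname, result):
    """Format one result line in the style requested in the first post."""
    name, m, mm, pct = result
    return (f"{qname} matches to {name} with {m} matches and "
            f"{mm} mismatches - percentage identity is {pct:.1f}")


if __name__ == "__main__":
    # Hypothetical sample data; real input would be read from the two files.
    first = ["BIN1: ABBHAB---AA", "BIN2: ABBHBB----A"]
    second = ["m2: ABBHBB----A"]
    refs = [parse_line(line) for line in first]
    for line in second:
        qname, qseq = parse_line(line)
        result = best_match(qseq, refs)
        if result is not None:
            print(report(qname, result))
```

Unlike amatch, which allows insertions and deletions, this only counts characters that agree at the same position, so it suits lines that are already aligned (the "-" characters are treated like any other symbol).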


All times are GMT -5.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration