LinuxQuestions.org
07-19-2012, 04:45 PM   #1
oliviaxinw
LQ Newbie
 
Registered: Jul 2012
Posts: 3

Rep: Reputation: Disabled
Help with looking for similarities in two files


Hello,
I'm using bash on Mac Terminal. I have two files with sequences that look like, for example:
File 1:
ABCDE11111112345
BERSD222222223453
ADSFAFG243234123

File 2:
ABCDE11111112345
ABCDE3453454
ADSFASDFF12345
ADSFASDGAS34123

I want to find similar sequences: not only those that are completely the same (like ABCDE11111112345), but also those that share the same first five or last five characters. For example, ABCDE3453454 would count as the same as ABCDE11111112345 because the first five characters of both are ABCDE. ADSFASDGAS34123 and ADSFAFG243234123 would count as the same because the last five characters of both are 34123. I want to find all such "same" sequences across the two files.

Is there a way to do such a search?

Thank you!
 
07-19-2012, 05:29 PM   #2
MCD555
Member
 
Registered: May 2009
Location: Milan, Italy
Distribution: Ubuntu, Debian, Fedora, Oracle Linux
Posts: 107

Rep: Reputation: 10
Yes, there should be a way!
Do you mean: "is there a way to do this with system commands, bash, or existing programs?"

I think a "scanner" like this would require you to write a program yourself, with a good knowledge of regular expressions.
I would suggest Perl as the programming language (and in that case I would be happy to help you!).

Hope this helps!
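
For a single lookup, a full program may not be needed. The sketch below shows how the rule from post #1 maps onto two anchored regular expressions passed to grep. The function name find_similar and its arguments are made up for illustration, and it assumes the sequences contain only letters and digits and are at least five characters long.

```shell
# Hypothetical helper: print the lines of a file that share the first five
# or the last five characters with a given sequence.
# Assumes plain alphanumeric sequences of at least five characters.
find_similar() {
    s=$1
    file=$2
    prefix=$(printf '%s\n' "$s" | cut -c1-5)   # first five characters
    suffix=$(printf '%s' "$s" | tail -c 5)     # last five characters (bytes)
    # ^ anchors the prefix at the start of a line,
    # $ anchors the suffix at the end of a line.
    grep -e "^$prefix" -e "$suffix\$" "$file"
}
```

Calling find_similar ABCDE11111112345 file2.txt (file2.txt being a placeholder name for File 2) would list every line of that file that starts with ABCDE or ends with 12345. An exact duplicate is covered automatically, since it shares both.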
 
07-19-2012, 09:25 PM   #3
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Debian, Mageia, and whatever VMs I happen to be playing with
Posts: 12,379
Blog Entries: 16

Rep: Reputation: 3155
You might want to look at awk and diff and regular expressions (aka regex).

I don't know enough to tell you how to do the search--still trying to learn those--but those tools sound appropriate to this issue, if they are available on your Mac.

See man awk and man diff for more. Wikipedia has an article on regex.

https://en.wikipedia.org/wiki/Regular_expression

More on awk: http://www.gnu.org/software/gawk/manual/gawk.html
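
As a rough illustration of the awk route, here is a minimal sketch that compares every line of the second file against every line of the first. It is one possible approach, not a tested solution from this thread. The function name similar_pairs and the output labels are made up, and it assumes one sequence per line with at least five characters each (shorter or blank lines are skipped). It uses only substr and length, so it should work with the awk shipped on a Mac as well as with gawk.

```shell
# Hypothetical helper: print every pair of lines (one from each file) that
# are identical or share their first five or last five characters.
# Usage: similar_pairs FILE1 FILE2
similar_pairs() {
    awk '
        # First file: remember every sequence of five or more characters.
        NR == FNR { if (length($0) >= 5) seqs[++n] = $0; next }
        # Second file: compare each line against the stored sequences.
        length($0) >= 5 {
            first = substr($0, 1, 5)
            last  = substr($0, length($0) - 4)
            for (i = 1; i <= n; i++) {
                s = seqs[i]
                if (s == $0)
                    print s, $0, "identical"
                else if (substr(s, 1, 5) == first)
                    print s, $0, "same first 5"
                else if (substr(s, length(s) - 4) == last)
                    print s, $0, "same last 5"
            }
        }
    ' "$1" "$2"
}
```

Each output line holds the sequence from the first file, the sequence from the second file, and the reason they matched. Note that the rule in post #1 also pairs ABCDE11111112345 with ADSFASDFF12345, because both end in 12345. Every line is compared with every other line, which is fine for small files but slow for very large ones.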
 
  

