LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

05-21-2012, 10:30 AM #1
Leath
LQ Newbie
 
Registered: May 2012
Posts: 11

Rep: Reputation: Disabled
Diff 2 files (not line by line)


Hi guys,

I have 2 text files with very similar contents, but in jumbled order. Does anyone know a good way to compare each line of one against all the lines in the other?

E.g.:
File1 contains:
cat
dog
bird

File2 contains:
dog
bird
cat

Because diff compares these 2 files line by line, it sees them as different. That is true, but I only want to see words that are missing from the other file altogether.

Thanks in advance!
 
05-21-2012, 10:41 AM #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 5,167

Rep: Reputation: 890
Can you sort them and then compare them?
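A minimal sketch of the sort-then-compare idea, assuming bash (for the <(...) process substitution) and one word per line: sort both files, then let comm pick out the lines unique to each. The helper name only_in_first is hypothetical, and the extra word "fish" in the demo is made up just so there is something to report.

```shell
#!/usr/bin/env bash
# Sketch: find lines present in one unordered list but not the other.
# only_in_first is a hypothetical helper name, not a standard command.

# only_in_first A B: print lines that appear in A but not in B.
only_in_first() {
  # comm needs sorted input, so sort both files on the fly.
  # -2 hides lines found only in B, -3 hides lines found in both.
  comm -23 <(sort "$1") <(sort "$2")
}

# Demo data: the lists from the question, plus a made-up extra word in file1.txt.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' cat dog bird fish > "$tmpdir/file1.txt"
printf '%s\n' dog bird cat > "$tmpdir/file2.txt"

echo "only in file1.txt:"
only_in_first "$tmpdir/file1.txt" "$tmpdir/file2.txt"   # prints: fish
echo "only in file2.txt:"
only_in_first "$tmpdir/file2.txt" "$tmpdir/file1.txt"   # prints nothing
```

comm -13 gives the other direction directly, and plain comm with no flags shows all three columns side by side. Run sort and comm under the same locale, otherwise comm may complain that its input is not sorted.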

Edit: otherwise grep -f might work, or something like:
Code:
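# print each word of file-1.lst that doesn't appear as a whole line in file-2.lst
for word in $(cat file-1.lst)
do
 grep -qxF -- "$word" file-2.lst || echo "$word"
done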

Last edited by schneidz; 05-21-2012 at 10:43 AM.
 
05-21-2012, 08:37 PM #3
Leath
LQ Newbie
 
Registered: May 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Thanks schneidz!

That finally got my brain off the diff command.
Looks pretty simple and sweet, I'll give it a shot.
 
05-22-2012, 12:13 PM #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,837

Rep: Reputation: 1981
Note that it's generally not a good idea to read lines of input from a file or command with a for loop. Use a while+read loop instead.

http://mywiki.wooledge.org/DontReadLinesWithFor
http://mywiki.wooledge.org/BashFAQ/001

In this particular case, though, the word splitting of the expansion produces the desired behavior, since each word ends up as a separate loop item. If the file were very large, however, it could use a lot of memory, because the shell expands the whole list before the for loop starts running.
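Following that advice, here is a sketch of a while+read loop for this job, assuming bash and the file-1.lst / file-2.lst names used earlier in the thread. It prints each line of the first file that does not appear as a whole line in the second. missing_words is a hypothetical helper name, and the demo data (including "fish") is made up.

```shell
#!/usr/bin/env bash
# Sketch: read the word list one line at a time with while+read,
# instead of expanding the whole file into a for loop's word list.
# missing_words is a hypothetical helper name.

# missing_words LIST REFERENCE: print each line of LIST not found as a whole line in REFERENCE.
missing_words() {
  local list=$1 reference=$2 word
  # IFS= keeps leading/trailing blanks, -r keeps backslashes literal,
  # and the || test still handles a last line with no trailing newline.
  while IFS= read -r word || [ -n "$word" ]; do
    # -q: no output, -x: whole-line match, -F: fixed string (no regex), --: end of options
    grep -qxF -- "$word" "$reference" || printf '%s\n' "$word"
  done < "$list"
}

# Demo data: the lists from the question, plus a made-up extra word.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' cat dog bird fish > "$tmpdir/file-1.lst"
printf '%s\n' dog bird cat > "$tmpdir/file-2.lst"

missing_words "$tmpdir/file-1.lst" "$tmpdir/file-2.lst"   # prints: fish
```

This starts one grep per line, so it is fine for small lists but will get slow on big ones.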


As mentioned, grep can also be used to test one file against another (on a per-line basis). This command uses each line of file1 as a pattern and prints every line in file2 that none of those patterns match:

Code:
grep -v -f file1.txt file2.txt
Just run the command again with the files reversed to get all the unique lines in file1.
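One caveat: with -f, grep treats every line of the pattern file as a regular expression and matches it anywhere in a line, so "cat" in file1.txt would also hide "concatenate" in file2.txt. Adding -x (whole-line matches only) and -F (fixed strings, no regex) turns it into a strict line comparison. A small bash sketch showing both directions; unique_lines is a hypothetical helper name and the sample data is made up to show the difference.

```shell
#!/usr/bin/env bash
# Sketch: line-by-line set difference with grep, in both directions.
# unique_lines is a hypothetical helper name.

# unique_lines A B: print lines of A that do not appear as a whole line in B.
unique_lines() {
  # -v: invert, -x: whole-line match, -F: fixed strings, -f: read patterns from B
  grep -vxF -f "$2" -- "$1"
}

# Made-up demo data: "concatenate" contains "cat" as a substring.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' cat dog bird > "$tmpdir/file1.txt"
printf '%s\n' dog bird cat concatenate > "$tmpdir/file2.txt"

echo "plain grep -v -f, for comparison:"
grep -v -f "$tmpdir/file1.txt" "$tmpdir/file2.txt"     # prints nothing: "cat" hides "concatenate"
echo "only in file1.txt:"
unique_lines "$tmpdir/file1.txt" "$tmpdir/file2.txt"   # prints nothing
echo "only in file2.txt:"
unique_lines "$tmpdir/file2.txt" "$tmpdir/file1.txt"   # prints: concatenate
```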

Last edited by David the H.; 05-22-2012 at 12:14 PM.
 
05-22-2012, 07:19 PM #5
Leath
LQ Newbie
 
Registered: May 2012
Posts: 11

Original Poster
Rep: Reputation: Disabled
Thanks David,

I ended up using that exact grep command yesterday, but the files I was using seemed to be too large for it to handle: an strace on the process showed no activity at all. When I split the files into smaller pieces it worked well and gave me exactly the type of output I'm after. It seems to be the best option for now, as the very large files were just a one-off; all the rest are smaller.

Thanks again.
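If the big one-off files come up again, one thing that might help (untested against the files described above, so treat it as a suggestion) is a hash-based approach in awk: it reads the reference file once into an associative array and then streams the other file past it, so there is no big pattern list to compile. Without -F, grep -f treats every line of the pattern file as a regex, which can get very slow or memory-hungry with large pattern files, so adding -F may also be worth a try. lines_not_in is a hypothetical helper name; the demo data is made up.

```shell
#!/usr/bin/env bash
# Sketch: hash-based set difference with awk, for larger files.
# lines_not_in is a hypothetical helper name.

# lines_not_in REFERENCE FILE: print lines of FILE that never occur in REFERENCE.
lines_not_in() {
  # While reading the first file (FILENAME == ARGV[1]), remember each line as an array key;
  # then print every line of the second file that is not a key.
  # Comparing FILENAME instead of the common NR == FNR idiom keeps an empty REFERENCE working.
  awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}

# Made-up demo data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '%s\n' cat dog bird > "$tmpdir/file1.txt"
printf '%s\n' dog bird cat fish > "$tmpdir/file2.txt"
: > "$tmpdir/empty.txt"

lines_not_in "$tmpdir/file1.txt" "$tmpdir/file2.txt"   # prints: fish
lines_not_in "$tmpdir/file2.txt" "$tmpdir/file1.txt"   # prints nothing
```

Memory use grows with the size of REFERENCE, and the output keeps the original order of FILE. As with the grep version, swap the arguments to get the other direction.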
 
  



All times are GMT -5.
