LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 03-31-2007, 01:29 PM   #1
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Rep: Reputation: 52
Comparing text files...


Greets all. I'm having a brain freeze and can't figure out this simple problem. Been in Windows too much I guess...

I have two files. File 1 is a complete list of what I need. File 2 is more or less the same as file 1 but has one or more lines missing. I need to compare file 1 and file 2 and print only the extra lines found in file 1. The files contain nothing but a single word on each line, if that matters. I've looked into sdiff, cmp, awk, uniq et al. but am still stuck for some reason. None of those seem to do what I want, except maybe some sort of awk array... But that still seems like overkill.

Thanks in advance for pointing out the obvious...
 
Old 03-31-2007, 01:36 PM   #2
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
'diff' should work fine... did you try that? Read 'man diff'.
 
Old 03-31-2007, 01:57 PM   #3
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
Yea, I looked at that. It outputs garbage along with what I need.

I tried doing something hackish like:

cat file1 >> file2
uniq -u file2 missing-text.txt

But it doesn't work... missing-text.txt is the same as file2. Makes no sense.
 
Old 03-31-2007, 02:08 PM   #4
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
Try something like:

Code:
diff rc.S rc.S-backorigdrax | tail -n4 | sed 's/< //'
where 'tail -n' omits the first line, and "sed 's/< //'" gets rid of the '< '.
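
A minimal sketch of the same diff-plus-sed idea that does not depend on counting lines with tail. It assumes diff's default output format, where lines found only in the first file are prefixed with '< '. The scratch directory, the sample words, and the function name only_in_first are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' cat boy dog bird > "$tmp/file1"
printf '%s\n' cat bird > "$tmp/file2"

# Print only the diff lines that start with "< " (present in the first
# file, absent from the second) with that marker stripped.
# sed -n suppresses everything else; the p flag prints substituted lines.
only_in_first() {
    diff "$1" "$2" | sed -n 's/^< //p'
}

only_in_first "$tmp/file1" "$tmp/file2"
```

Because sed -n drops the change headers, the '---' separators and the '> ' lines, nothing has to be trimmed by position.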

Last edited by H_TeXMeX_H; 03-31-2007 at 02:09 PM.
 
Old 03-31-2007, 02:38 PM   #5
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
tail -n outputs the last N lines; it doesn't omit the first line. That could short me a bunch of files...

Piping to sed isn't bad I guess (kludgy tho) but that doesn't work in all circumstances. Here is my output:

Code:
$ diff file1 file2      
14c14
< mkcfm
---
>
The only thing your command does is scoot the 'name' I want to the beginning of the line. I'd have to pipe multiple times to get it solo... Extremely kludgy. Also, I'm assuming the "---" is because there is a blank line in file2 which I don't need to account for. This just doesn't seem the way to go. Way too specific and sloppy to boot.

I'd really like to get sort or uniq working. I guess I just don't understand uniq and why -u isn't doing anything.

Last edited by jong357; 03-31-2007 at 02:43 PM.
 
Old 03-31-2007, 02:45 PM   #6
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
Not sure if a bash script is the best thing for this... maybe perl? I mean, you don't want 'kludgy', so...
 
Old 03-31-2007, 02:53 PM   #7
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
Sure, I could write a perl script. I'd just tick everything in bash... I suck at perl. Besides, this is going into an existing bash script anyway.

The term 'kludgy' is subjective I guess. I'd really like to keep it down to just 2 or 3 short lines. This is an extremely easy operation (should be anyway), it's just eluding me for some reason.

Thanks for your help thus far. I'm still open to suggestions. Especially clarification on correct usage of uniq (ditching all repeated lines in one file)...
 
Old 03-31-2007, 03:06 PM   #8
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,775

Rep: Reputation: 481
comm is what you want:
comm - compare two sorted files line by line
comm -3 suppress lines that appear in both files

Don't forget the -u option for sort, which may behave differently than sort itself.

Last edited by gnashley; 03-31-2007 at 03:08 PM.
 
Old 03-31-2007, 03:21 PM   #9
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
Cool. comm does seem to be what I want but check this out.

File1
Code:
cat
boy
dog
bird
File2
Code:
cat
bird
Code:
$ comm -3 file1 file2
        bird
boy
dog
bird
No good.
 
Old 03-31-2007, 03:30 PM   #10
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
try sorting before you run it

Code:
sort file1 > file1new
sort file2 > file2new
comm -3 file1new file2new
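
A possible variation on the above, sketched with made-up sample data: comm -23 suppresses both the lines unique to the second file and the lines common to both, so only the lines missing from the second file are printed, without the tab-indented second column that comm -3 produces. The function name missing_from_second and the scratch paths are hypothetical.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' cat boy dog bird > "$tmp/file1"
printf '%s\n' cat bird > "$tmp/file2"

# comm needs sorted input. Sort the second file into a scratch copy and
# feed the sorted first file on stdin ("-" tells comm to read stdin).
# -2 hides lines only in the second file, -3 hides lines found in both.
missing_from_second() {
    sort "$2" > "$tmp/sorted2"
    sort "$1" | comm -23 - "$tmp/sorted2"
}

missing_from_second "$tmp/file1" "$tmp/file2"
```

In bash specifically, process substitution can avoid the scratch file: comm -23 <(sort file1) <(sort file2).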
 
Old 03-31-2007, 03:37 PM   #11
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
Yea.. I thought of that after posting my last comment. It works... But... Now we are back into being kludgy again... Don't you just love that word?

I don't get why this has to be a multiple step process but whatever I guess... See, once I extrapolate the missing name, I have to fetch a version number to tag onto it, integrate it with the file that didn't have it and then sort them according to another 'order' file... I'm sorting twice, once with 'sort' and then thru a function I have to sort according to a static list.

It'll work tho so it's all good. Thanks guys!
 
Old 03-31-2007, 04:02 PM   #12
simcox1
Member
 
Registered: Mar 2005
Location: UK
Distribution: Slackware
Posts: 794
Blog Entries: 2

Rep: Reputation: 30
You have two files. File1 and file2. You want to output only the differences.

Does this work?

cat file1 file2 | sort -u

The reason that uniq -u isn't working is that the text isn't sorted first: uniq only compares adjacent lines. sort -u will show only unique lines with unsorted text.

You could also output it to a new file.

cat file1 file2 | sort -u > file3
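
To illustrate the point about sorting, here is a small sketch using made-up data. The combined file mimics what 'cat file1 >> file2' would leave behind for a two-word file2 and a four-word file1; the path and the words are assumptions for the example.

```shell
# Hypothetical data: file2's lines followed by file1's lines.
tmp=$(mktemp -d)
printf '%s\n' cat bird cat boy dog bird > "$tmp/combined"

# Unsorted: no two equal lines sit next to each other, so uniq -u
# treats every line as unique and prints all of them.
uniq -u "$tmp/combined"

# Sorted: equal lines become adjacent, so uniq -u can drop them and
# print only the lines that occur exactly once.
sort "$tmp/combined" | uniq -u
```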
 
Old 03-31-2007, 04:53 PM   #13
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
That doesn't work. That gives the same exact output that is in file1 to begin with. All I need is the missing bits.

Code:
$ cat file1 file2 | sort -u
bird
boy
cat
dog

$ cat file1
cat
boy
dog
bird
I don't NEED to sort anything. The only reason why I'm using 'sort' is because it seems to be necessary for comm to function. What I need is the missing words from file1...
 
Old 03-31-2007, 05:10 PM   #14
simcox1
Member
 
Registered: Mar 2005
Location: UK
Distribution: Slackware
Posts: 794
Blog Entries: 2

Rep: Reputation: 30
Yes. How about this.

cat file1 file2 | sort | uniq -u

cat file1 file2 | sort | uniq -d

The first one gives only unique lines. The second gives duplicates.
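
One caveat, sketched below with made-up data: sort | uniq -u prints every line that occurs exactly once in the combined input, so a line that exists only in file2 would show up as well. If file2 could ever contain something file1 lacks, a grep-based filter may be a closer fit, and it keeps file1's original order without sorting. The word "extra" and the scratch paths are hypothetical.

```shell
# Hypothetical sample data; "extra" exists only in file2.
tmp=$(mktemp -d)
printf '%s\n' cat boy dog bird > "$tmp/file1"
printf '%s\n' cat bird extra > "$tmp/file2"

# Prints boy, dog and extra: anything that occurs once overall.
cat "$tmp/file1" "$tmp/file2" | sort | uniq -u

# Prints boy and dog: lines of file1 that match no whole line of file2.
# -f reads patterns from file2, -F treats them as fixed strings,
# -x requires a whole-line match, -v inverts the match.
grep -vxFf "$tmp/file2" "$tmp/file1"
```

Both approaches assume each word appears at most once per file; a word repeated inside file1 would be hidden by uniq -u even if file2 lacks it.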
 
Old 03-31-2007, 05:29 PM   #15
jong357
Senior Member
 
Registered: May 2003
Location: Columbus, OH
Distribution: DIYSlackware
Posts: 1,914

Original Poster
Rep: Reputation: 52
And there it is... The elegant one-liner that has been eluding me...

Funny thing is, I tried that after your first post but gave sort the u switch instead of calling it vanilla. Seems my ignorance with these commands and patience with 'man' was the only problem here...

Thanks again everyone. Much appreciated!!
 
  



All times are GMT -5. The time now is 04:48 AM.
