LinuxQuestions.org
04-14-2008, 02:38 PM   #1
Stannjudy

more file comparison


I have searched and not found quite what will work for me. I am fairly new to this and need to compare two lists and delete the difference.

The problem I'm having is that the lines in the two lists differ in length, e.g.

One list gives only the labels:
ABC001L1
ABC002L1
ABD090L1

while the other gives more information:
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files

etc. What I need is to compare list1 against list2 and come up with the labels that are the same (or delete the ones that differ). I want to keep the ones with the label names only...

I've looked at diff -E, fgrep (which I don't fully understand - especially looking at the man page), and comm...none seem to do what I need.

Both lists are 4,000 to 6,000 entries.

Thanks,
Stan
 
04-14-2008, 03:31 PM   #2
colucix

Maybe join is what you're looking for:
Code:
$ cat file_one
ABC001L1
ABC002L1
ABD090L1

$ cat file_two
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files

$ join -o 1.1 file_one file_two
ABC001L1
See man join for details and more options.
 
04-14-2008, 03:42 PM   #3
bigrigdriver

This looks like a problem designed for awk.

If the two files have the label on the left, you can use awk with a single space as the field separator. Each line is then split into fields, and each field holds one word or path delimited by spaces.

Use a loop that looks for the second field ($2 in awk). If it isn't empty then delete that line.
Else if the second field is empty, then write the contents of field 1 ($1) to another file.

That will give you two files with only the labels in them.

If you want to compare the two files in such a way that only unique labels remain (labels which are not common to both files), then use diff to compare the files and pipe the output through uniq to select only the labels that are unique, and write them to a single file.

Update: I just saw the solution proposed by colucix. His solution joins the two files, but doesn't remove the labels with description, leaving only the labels.

So, use join to join the two files, then use awk to select only the lines without descriptions as I described above, selecting on the basis of the content of the second field ($2).
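
A minimal sketch of the label-extraction idea, assuming the label is always the first space-separated field. The file names and sample data are the ones used earlier in the thread; the `labels_one`/`labels_two` names and the scratch directory are made up for illustration. For labels that appear in only one of the two files, `comm -3` on the sorted label lists is probably a more direct tool than diff piped through uniq.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ABC001L1 ABC002L1 ABD090L1 > file_one
printf '%s\n' \
  'ABC001L1 /filename/date/initials 0021 files' \
  'ABC050L1 /filename/differnetdate/initials 0034 files' > file_two

# Keep only the first field (the label) of every line, sorted for comm.
awk '{ print $1 }' file_one | sort > labels_one
awk '{ print $1 }' file_two | sort > labels_two

# comm -3 suppresses labels common to both lists; tr removes the tab
# that comm uses to indent its second column.
unique_labels=$(comm -3 labels_one labels_two | tr -d '\t')
echo "$unique_labels"
```

Replacing `comm -3` with `comm -12` would print only the labels present in both files instead.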

Last edited by bigrigdriver; 04-14-2008 at 03:45 PM.
 
04-14-2008, 04:04 PM   #4
colucix

Quote:
Originally Posted by bigrigdriver
Update: I just saw the solution proposed by colucix. His solution joins the two files, but doesn't remove the labels with description, leaving only the labels.
With the -o option you can control the output format, selecting one or more fields from one or both files. Indeed, in my example it keeps the labels only.

I was just thinking about a problem: if the two files are not sorted, some labels may be missed. You can get around this by passing sorted copies of the files via process substitution:
Code:
join -o 1.1 <(sort file_one) <(sort file_two)
Anyway, I agree that some lines of awk code can give a finer control on the output format.
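
As a rough illustration of the awk route, the sketch below loads the labels from the first file into an array and then prints entries of the second file whose first field is one of those labels. It does not need sorted input. The sample data comes from post #2; the scratch directory and the array name `seen` are assumptions for the demo.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ABC001L1 ABC002L1 ABD090L1 > file_one
printf '%s\n' \
  'ABC001L1 /filename/date/initials 0021 files' \
  'ABC050L1 /filename/differnetdate/initials 0034 files' > file_two

# NR == FNR is true only while awk reads the first file: remember each label.
# For the second file, print the label ($1) if it was seen in the first.
common_labels=$(awk 'NR == FNR { seen[$1]; next } $1 in seen { print $1 }' file_one file_two)
echo "$common_labels"

# With no action after the condition, awk prints the whole matching line.
common_lines=$(awk 'NR == FNR { seen[$1]; next } $1 in seen' file_one file_two)
echo "$common_lines"
```

Because awk reads each file once, this should also scale comfortably to lists of a few thousand entries.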
 
04-14-2008, 06:45 PM   #5
beadyallen

From what I can tell (and I'm probably wrong), you want to output a line in file2 if there's a corresponding line in file1. Is that right? If so, how about the following:
Code:
for x in $(cat file1);
do
   grep "^${x} " file2;
done
If you want to just keep the common label names, stick an if statement in there, like:
Code:
for x in $(cat file1)
do
  result=$(grep "^${x} " file2)
  if [ "$result" ]
  then
     echo $x
  fi
done
Is that what you're wanting?
 
04-15-2008, 09:57 AM   #6
Stannjudy (Original Poster)

Thanks for all the great help and advice. Actually, beadyallen hit it on the head (I didn't really make myself clear enough). I do want the entire line from file2 if there is a corresponding label in file1. Maybe it would be clearer if I gave a wee bit of background. These are lists of tapes. File1 is from the dept. that wants certain tapes duplicated, file2 is from querying the db for all tapes within a given range (which covers what the dept wants and more). These need to be - and are - sorted by filefamily from the query when doing the duplication.

I tried running both pieces of code mentioned by beadyallen, but (and I'm only guessing here) maybe because I'm using bash as my shell, here are the results of the first:

-bash-3.00$ for x in $(cat migrate_now.txt); do grep "$(x) " new-mig-list >> mig_joined; done
-bash: x: command not found

and the second would not end with done:

-bash-3.00$ for x in $(cat migrate_now.txt); do result=$(grep "^${x} " new-mig-list
> if [ "$result" ]
> then
> echo $x
> fi
> done
>
-bash-3.00$ or x in $(cat migrate_now.txt)
-bash: or: command not found
-bash-3.00$ for x in $(cat migrate_now.txt)
> do result=$(grep "^${x} " new-mig-list
> if [ "$result" ]
> then
>
> echo $x
> fi
> done
>
>
Oh, and join gave me way too much info and too many duplicate entries. File1 is 5878 lines (from wc) to give you an idea of the scope.

thanks again...
Stanley
 
04-15-2008, 11:07 AM   #7
beadyallen

If you've cut and pasted the output you got, you've not typed it in properly. You've left out brackets and used the wrong kind of brackets ('()' instead of '{}'), etc. Based on what you've written, the following should work if you just cut and paste it:
Code:
for x in $(cat migrate_now.txt);
do
   grep "^${x} " new-mig-list;
done >> mig_joined
or

Code:
for x in $(cat migrate_now.txt)
do
  result=$(grep "^${x} " new-mig-list)
  if [ "$result" ]
  then
     echo $x
  fi
done >> mig_joined
Oh, and they're both bash scripts.
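
To illustrate the bracket mix-up: `${x}` expands the variable x, whereas `$(x)` is command substitution and tries to run a command called x, which is what produced "x: command not found". A small demonstration follows, with the label value picked from the sample data and the variable names `pattern` and `broken` invented for the demo. It assumes no command named x is installed.

```shell
x=ABC001L1

# ${x} is parameter expansion: the value of the variable x is inserted.
pattern="^${x} "
echo "pattern is: [$pattern]"

# $(x) is command substitution: the shell runs a command named x and
# inserts its output. No such command exists, so the result is empty.
# The error message is discarded here to keep the demo quiet.
broken=$(x 2>/dev/null) || true
echo "broken is: [$broken]"
```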
 
04-15-2008, 12:08 PM   #8
Stannjudy (Original Poster)

I realized that later....in looking closer. Thanks. I did:
for x in $( < migrate_now.txt); do grep $x new-mig-list ;done > mig_joined

and that gave me what I needed. Now I just need to re-sort it by file families and it will be great.
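
One caveat with the command above: an unquoted, unanchored `grep $x` also matches a label that appears anywhere in a line, for example inside a path. A hedged sketch of a stricter variant follows, using a `while read` loop and the anchored pattern from post #7. The second sample line is invented to show the difference; the file names are reused from the thread and the scratch directory is an assumption.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ABC001L1 > migrate_now.txt
# Invented sample: the second line only mentions ABC001L1 inside its path.
printf '%s\n' \
  'ABC001L1 /filename/date/initials 0021 files' \
  'ABC050L1 /copy_of_ABC001L1/date/initials 0034 files' > new-mig-list

# Unanchored search: any line containing the label matches.
loose=$(while IFS= read -r label; do
  grep "$label" new-mig-list
done < migrate_now.txt)

# Anchored search: the label must start the line and be followed by a space.
strict=$(while IFS= read -r label; do
  grep "^${label} " new-mig-list
done < migrate_now.txt)

echo "loose:"
echo "$loose"
echo "strict:"
echo "$strict"
```

With real data the two variants may well give the same result; the anchored form just rules out accidental matches.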

Thanks for all your help and for everybody who chimed in. I certainly appreciate it and hopefully have even learned something!!

Stanley
 
  

