Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
04-14-2008, 02:38 PM
|
#1
|
|
LQ Newbie
Registered: Apr 2008
Posts: 3
Rep:
|
more file comparison
I have searched and not found quite what will work for me. I am fairly new to this and need to compare two lists and delete the difference.
The problem I'm having is that the lines in the two lists have different lengths, i.e.
one list gives just the label:
ABC001L1
ABC002L1
ABD090L1
while the other gives more information:
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files
etc. What I need is to compare list1 against list2 and come up with the labels that appear on both lists (or delete the ones that differ). I want to keep the label names only...
I've looked at diff -E, fgrep (which I don't fully understand - especially looking at the man page), and comm...none seem to do what I need.
Both lists are 4,000 to 6,000 entries.
Thanks,
Stan
|
|
|
|
04-14-2008, 03:31 PM
|
#2
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,899
|
Maybe join is what you're looking for:
Code:
$ cat file_one
ABC001L1
ABC002L1
ABD090L1
$ cat file_two
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files
$ join -o 1.1 file_one file_two
ABC001L1
See man join for details and more options.
|
|
|
|
04-14-2008, 03:42 PM
|
#3
|
|
LQ Addict
Registered: Jul 2002
Location: East Centra Illinois, USA
Distribution: Debian Squeeze
Posts: 5,594
|
This looks like a problem designed for awk.
If the two files have the label on the left, you can use awk and specify a single space as the field separator. That results in each line being separated into fields. Each field holds one word or path, delimited by a space at each end.
Use a loop that looks for the second field ($2 in awk). If it isn't empty then delete that line.
Else if the second field is empty, then write the contents of field 1 ($1) to another file.
That will give you two files with only the labels in them.
If you want to compare the two files in such a way that only unique labels remain (labels which are not common to both files), then use diff to compare the files and pipe the output through uniq to select only the labels that are unique, and write them to a single file.
Update: I just saw the solution proposed by colucix. His solution joins the two files, but doesn't remove the labels with descriptions, leaving only the labels.
So, use join to join the two files, then use awk to select only the lines without descriptions as I described above, selecting on the basis of the content of the second field ($2).
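A minimal sketch of that field test, assuming a hypothetical input file labels_mixed.txt in which some lines carry only a label and others carry a description after it. The file names and sample data are made up for illustration; awk's default whitespace splitting is used, so no explicit separator is needed.

```shell
# Hypothetical sample input: labels with and without a description.
tmpdir=$(mktemp -d)
cat > "$tmpdir/labels_mixed.txt" <<'EOF'
ABC001L1 /filename/date/initials 0021 files
ABC002L1
ABD090L1
EOF

# Keep only the lines whose second field is empty and write field 1 to a new file.
awk '$2 == "" { print $1 }' "$tmpdir/labels_mixed.txt" > "$tmpdir/labels_only.txt"

cat "$tmpdir/labels_only.txt"
```

Lines that do have a second field are simply skipped, which has the same effect as "deleting" them from the output.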
Last edited by bigrigdriver; 04-14-2008 at 03:45 PM.
|
|
|
|
04-14-2008, 04:04 PM
|
#4
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,899
|
Quote:
Originally Posted by bigrigdriver
Update: I just saw the solution proposed by colucix. His solution joins the two files, but doesn't remove the labels with descriptions, leaving only the labels.
|
With the -o option you can control the output format, selecting one or more fields from one or both files. Indeed, in my example it keeps the labels only.
I was just thinking about a problem: if the two files are not sorted, some labels may be missed. You can circumvent this problem by passing the sorted files with process substitution:
Code:
join -o 1.1 <(sort file_one) <(sort file_two)
Anyway, I agree that a few lines of awk code can give finer control over the output format.
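As a rough illustration of how -o selects fields from both files, here is a sketch that avoids process substitution by sorting into temporary files first. The temporary file names are made up, and LC_ALL=C is an assumption added so that sort and join agree on the ordering.

```shell
# Sample data from the thread, deliberately left unsorted.
tmpdir=$(mktemp -d)
printf '%s\n' ABD090L1 ABC002L1 ABC001L1 > "$tmpdir/file_one"
cat > "$tmpdir/file_two" <<'EOF'
ABC050L1 /filename/differnetdate/initials 0034 files
ABC001L1 /filename/date/initials 0021 files
EOF

# join expects both inputs sorted on the join field, so sort them first.
LC_ALL=C sort "$tmpdir/file_one" > "$tmpdir/one.sorted"
LC_ALL=C sort "$tmpdir/file_two" > "$tmpdir/two.sorted"

# -o takes FILE.FIELD pairs: the label from file one, the path from file two.
joined=$(LC_ALL=C join -o 1.1,2.2 "$tmpdir/one.sorted" "$tmpdir/two.sorted")
printf '%s\n' "$joined"
```

Only ABC001L1 appears in both files, so it is the only label that produces an output line.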
|
|
|
|
04-14-2008, 06:45 PM
|
#5
|
|
Member
Registered: Mar 2008
Location: UK
Distribution: Fedora, Gentoo
Posts: 209
Rep:
|
From what I can tell (and I'm probably wrong), you want to output a line in file2 if there's a corresponding line in file1. Is that right? If so, how about the following:
Code:
for x in $(cat file1);
do
grep "^${x} " file2;
done
If you want to just keep the common label names, stick an if statement in there, like:
Code:
for x in $(cat file1)
do
# result is non-empty only if file2 has a line starting with this label
result=$(grep "^${x} " file2)
if [ -n "$result" ]
then
echo "$x"
fi
done
Is that what you're wanting?
|
|
|
|
04-15-2008, 09:57 AM
|
#6
|
|
LQ Newbie
Registered: Apr 2008
Posts: 3
Original Poster
Rep:
|
Thanks for all the great help and advice. Actually, beadyallen hit it on the head (I didn't really make myself clear enough). I do want the entire line from file2 if there is a corresponding label in file1. Maybe it would be clearer if I gave a wee bit of background. These are lists of tapes. File1 is from the dept. that wants certain tapes duplicated, file2 is from querying the db for all tapes within a given range (which covers what the dept wants and more). These need to be - and are - sorted by filefamily from the query when doing the duplication.
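For that requirement (the whole line from file2 whenever its label is listed in file1), a single awk pass is one possible alternative to a loop. This is only a sketch on the sample data from the first post; the temporary file names and the array name wanted are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' ABC001L1 ABC002L1 ABD090L1 > "$tmpdir/file1"
cat > "$tmpdir/file2" <<'EOF'
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files
EOF

# While reading the first file (NR == FNR), remember every label.
# While reading the second file, print the whole line if its first field was remembered.
matches=$(awk 'NR == FNR { wanted[$1]; next } $1 in wanted' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$matches"
```

The output follows the order of the second file, so a list that is already sorted by filefamily should stay in that order.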
I tried running both pieces of code mentioned by beadyallen, but (and I'm only guessing here) maybe because I'm using bash as my shell, here are the results of the first:
-bash-3.00$ for x in $(cat migrate_now.txt); do grep "$(x) " new-mig-list >> mig_joined; done
-bash: x: command not found
and the second would not end with done:
-bash-3.00$ for x in $(cat migrate_now.txt); do result=$(grep "^${x} " new-mig-list
> if [ "$result" ]
> then
> echo $x
> fi
> done
>
-bash-3.00$ or x in $(cat migrate_now.txt)
-bash: or: command not found
-bash-3.00$ for x in $(cat migrate_now.txt)
> do result=$(grep "^${x} " new-mig-list
> if [ "$result" ]
> then
>
> echo $x
> fi
> done
>
>
Oh, and join gave me way too much info and too many duplicate entries. File1 is 5878 lines (from wc) to give you an idea of the scope.
thanks again...
Stanley
|
|
|
|
04-15-2008, 11:07 AM
|
#7
|
|
Member
Registered: Mar 2008
Location: UK
Distribution: Fedora, Gentoo
Posts: 209
Rep:
|
If you've cut and pasted the output you got, then you didn't type it in properly: you've missed a closing bracket and used the wrong kind of brackets ('()' instead of '{}'). Based on what you've written, the following should work if you just cut and paste it:
Code:
for x in $(cat migrate_now.txt);
do
grep "^${x} " new-mig-list;
done >> mig_joined
or
Code:
for x in $(cat migrate_now.txt)
do
# result is non-empty only if new-mig-list has a line starting with this label
result=$(grep "^${x} " new-mig-list)
if [ -n "$result" ]
then
echo "$x"
fi
done >> mig_joined
Oh, and they're both bash scripts.
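To make the bracket difference concrete, here is a small sketch. The variable names pattern and label are made up for the illustration.

```shell
x=ABC001L1

# ${x} is parameter expansion: the value of the variable x is inserted.
pattern="^${x} "
printf '[%s]\n' "$pattern"

# $(...) is command substitution: the command inside is run and its output inserted.
# Writing $(x) therefore tries to run a command named x, hence "x: command not found".
label=$(printf '%s' "$x")
printf '[%s]\n' "$label"
```

The square brackets in the printf format are only there to make the trailing space in the pattern visible.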
|
|
|
|
04-15-2008, 12:08 PM
|
#8
|
|
LQ Newbie
Registered: Apr 2008
Posts: 3
Original Poster
Rep:
|
I realized that later, on looking closer. Thanks. I did:
for x in $( < migrate_now.txt); do grep $x new-mig-list ;done > mig_joined
and that gave me what I needed. Now I just need to re-sort it by file families and it will be great.
Thanks for all your help and for everybody who chimed in. I certainly appreciate it and hopefully have even learned something!!
Stanley
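For anyone finding this thread later: a loop like the one above starts grep once per label. A possible single-pass variant, shown here only as a sketch on made-up sample data, hands the whole label list to grep as fixed-string patterns. Like the loop, it matches a label anywhere in the line, not just at the start.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' ABC001L1 ABD090L1 > "$tmpdir/migrate_now.txt"
cat > "$tmpdir/new-mig-list" <<'EOF'
ABC001L1 /filename/date/initials 0021 files
ABC050L1 /filename/differnetdate/initials 0034 files
EOF

# -F treats each pattern as a fixed string; -f reads the patterns from a file.
grep -F -f "$tmpdir/migrate_now.txt" "$tmpdir/new-mig-list" > "$tmpdir/mig_joined"
cat "$tmpdir/mig_joined"
```

Two things to watch: an empty line in the label file acts as an empty pattern and matches every line, and the output comes out in the order of the second file rather than the order of the label list.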
|
|
|
|