LinuxQuestions.org, Linux - Newbie forum
07-03-2012, 10:42 AM, #1
Member
Registered: Oct 2011
Posts: 73
unmatching strings between two files
Dear all,
I have two files like the ones below:
file_1.txt
alpha 3 5 eu rt
beta 4 5 ew sd
gamma 4 56 er df
delta 23 13 rt rt
file_2.txt
alpha 3 5 eu rt
pluto 2 1 rf gf
gamma 4 56 er df
mouse 23 13 rt rt
I would like to compare them and get a third file containing only the lines whose column-1 string appears in file_2.txt but not in column 1 of file_1.txt:
file_output.txt
pluto 2 1 rf gf
mouse 23 13 rt rt
I'm using something like:
Code:
for i in `cat file_2.txt`;
do
echo $i|grep -v -f file_1.txt;
done > file_output.txt
This seems to work. The only problem is that file_output.txt does not contain whole lines, but one long column of single words:
pluto
2
1
rf
gf
mouse
23
13
rt
rt
Any reason for that?
Any help/suggestion is highly appreciated!
Best,
Udiubu
P.S. I'm using a Mac Terminal right now
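A likely explanation for the one-word-per-line output: the backquoted "cat file_2.txt" is unquoted, so the shell splits its output into words on whitespace (the characters in IFS: space, tab and newline), and the for loop receives one word per iteration rather than one line. The sketch below recreates file_2.txt from the sample data (assuming single spaces between fields) and counts the iterations of that loop and of a "while IFS= read -r line" loop, which reads one complete line at a time.

```shell
#!/bin/sh
# Recreate file_2.txt from the question (fields assumed single-space separated).
printf '%s\n' 'alpha 3 5 eu rt' 'pluto 2 1 rf gf' \
  'gamma 4 56 er df' 'mouse 23 13 rt rt' > file_2.txt

# for + command substitution: the text is split into words, one per iteration.
words=0
for word in $(cat file_2.txt); do
  words=$((words + 1))
done

# while + read: one complete line per iteration, inner spaces preserved.
lines=0
while IFS= read -r line; do
  lines=$((lines + 1))
done < file_2.txt

echo "for loop iterations: $words"     # prints 20 (4 lines x 5 fields)
echo "while loop iterations: $lines"   # prints 4
```

Only the while/read form keeps each line intact, which is what a line-based comparison needs.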
07-03-2012, 12:06 PM, #2
Senior Member
Registered: Mar 2012
Distribution: Red Hat
Posts: 1,604
Are you looking for it to display line numbers? If so, that is a function of the program used to view/edit the file and not an issue with your command.
Your command is doing exactly what it should be doing: a for loop runs its body once for each argument it is given. Here the arguments are the whitespace-separated words of file_2.txt, so the echo command runs once per word (20 times for the sample file), and whatever grep lets through is written to your file. If you are looking to reformat the output or add line numbers, you need to look at additional utilities like awk or modify your for loop.
Let us know if you have something more specific you want assistance with, but from what you've posted I don't see any problems.
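One way to "modify your for loop" as suggested above is to switch to a while/read loop that handles one whole line per iteration and compares only column 1. This is a minimal sketch, assuming single-space-separated fields as in the sample files; it re-reads file_1.txt for every line of file_2.txt, so it is only practical for small inputs.

```shell
#!/bin/sh
# Sample files from the question (fields assumed separated by single spaces).
printf '%s\n' 'alpha 3 5 eu rt' 'beta 4 5 ew sd' \
  'gamma 4 56 er df' 'delta 23 13 rt rt' > file_1.txt
printf '%s\n' 'alpha 3 5 eu rt' 'pluto 2 1 rf gf' \
  'gamma 4 56 er df' 'mouse 23 13 rt rt' > file_2.txt

# Read file_2.txt one whole line per iteration (no word splitting).
while IFS= read -r line; do
  key=${line%% *}   # column 1: everything before the first space
  # Is key a complete column-1 value of file_1.txt?
  # grep -x: whole-line match, -F: fixed string, -q: exit status only
  if ! cut -d ' ' -f 1 file_1.txt | grep -qxF -- "$key"; then
    printf '%s\n' "$line"
  fi
done < file_2.txt > file_output.txt

cat file_output.txt
```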
07-03-2012, 12:14 PM, #3
Member
Registered: Jun 2012
Location: Porto Alegre-Brazil
Distribution: Slackware- 14, Debian Wheezy, Ubuntu Studio, Tails
Posts: 88
Hello
1. Sort the files:
File 1:
Code:
sort file_1.txt > f1.txt
File 2:
Code:
sort file_2.txt > f2.txt
2. Compare them with comm; in your case, suppress output columns 1 and 3:
Code:
comm -13 f1.txt f2.txt > fileout.txt
Verify the result, or run all the steps in one line and then view the output with nano or your favorite editor:
Code:
sort file_1.txt > fa.txt ; sort file_2.txt > fb.txt ; comm -13 fa.txt fb.txt > out.txt ; nano out.txt
I hope that helps. Here is what I got:
mouse 23 13 rt rt
pluto 2 1 rf gf
Cheers
Alchemikos
Last edited by Alchemikos; 07-03-2012 at 12:33 PM.
07-03-2012, 12:50 PM, #4
Member
Registered: Oct 2011
Posts: 73
Original Poster
Hi Alchemikos,
Thanks for your reply.
I do not understand why you sort the files and, most importantly, why you say "specify the output in your case 1 and 3".
What do you mean by 1 and 3 exactly? If you mean "exclude lines 1 and 3", that is not good.
Note that the files to be compared are very long and of different lengths, so I cannot specify lines to be excluded.
I just want the script to look for all those strings in file_2 that are not present in file_1.
Thank you anyway!
Best,
Udiubu
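Since the comparison should use column 1 only and the files are long and of different lengths, an alternative not proposed in this thread is a single awk command: awk reads file_1.txt first and records each column-1 value as an array key, then prints each line of file_2.txt whose column 1 was not recorded. No sorting is needed and file_2.txt keeps its original order. A sketch, assuming whitespace-separated columns:

```shell
#!/bin/sh
# Sample files from the question.
printf '%s\n' 'alpha 3 5 eu rt' 'beta 4 5 ew sd' \
  'gamma 4 56 er df' 'delta 23 13 rt rt' > file_1.txt
printf '%s\n' 'alpha 3 5 eu rt' 'pluto 2 1 rf gf' \
  'gamma 4 56 er df' 'mouse 23 13 rt rt' > file_2.txt

# NR == FNR is true only while awk reads the first file (file_1.txt):
# store its column-1 value as an array key and skip to the next line.
# For file_2.txt, the pattern !($1 in seen) prints lines whose column 1
# was never stored.
awk 'NR == FNR { seen[$1]; next } !($1 in seen)' file_1.txt file_2.txt > file_output.txt

cat file_output.txt
# pluto 2 1 rf gf
# mouse 23 13 rt rt
```

One caveat: if file_1.txt is empty, NR == FNR stays true while awk reads file_2.txt, so nothing is printed.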
07-03-2012, 01:14 PM, #5
Member
Registered: Jun 2012
Location: Canada
Distribution: Ubuntu/Debian/CentOS
Posts: 45
|
I have done something similar: comparing file numbers, grabbing the lines whose numbers had no match in a second file, and putting them into a new file. This might not be the best way to do things, but it worked in my situation.
The whole script I used is here
files:
firstFile
date file number name description
06-06-2012 0224115548 John Doe He is one stand up guy
06-07-2012 0224125743 Jane Doe A people person
06-08-2012 0224196541 Bob Awesome His last name is Awesome!
secFile
date file number name description
06-06-2012 0224115548 John Doe He is one stand up guy
06-07-2012 0224125743 Jane Doe A people person
newFile
date file number name description
06-08-2012 0224196541 Bob Awesome His last name is Awesome!
Code:
# Requires bash (arrays, [[ ]], (( ))). $firstFile and $secFile are set
# earlier in the full script (not shown here).
i=0       # number of entries stored in myarray
count=0   # number of lines read from $secFile
#The while loop will go through each line in the file "$secFile"
while read -r line ; do
  #In my files, I am looking for any line that contains 2241 with anything after it (till the end of the word)
  output=$(echo "$line" | grep -o "2241\w*")
  #If the string is not null, it will store it in the array, and add one onto the counter
  if [[ -n "$output" ]]
  then
    myarray[$i]=$output
    #i is keeping track of the number of $output stored into the array
    i=$((i + 1))
  fi
  count=$((count + 1))
#Give the while loop the secFile
done < "$secFile"
#This while loop goes through the lines of $firstFile, and checks them against the newly created array.
while IFS= read -r line ; do
  #match is to see if the line and the array match. If they do, match is set to 1 and nothing is done with that line.
  #If it does not match, it will echo the line into a new file.
  match=0
  compare=$(echo "$line" | grep -o "02241\w*")
  #If $compare is not null, then it will continue into the if statement.
  if [ -n "$compare" ]
  then
    #Loop over myarray: ${#myarray[@]} is the number of entries, and the loop stops once that number is reached.
    for (( x=0 ; x < ${#myarray[@]} ; x++ )); do
      #If the entry in myarray matches $compare, or matches it with a 0 in front, it will change match to 1.
      if [ "${myarray[$x]}" = "$compare" ] || [ "0${myarray[$x]}" = "$compare" ]
      then
        match=1
      fi
    done
  fi
  if [ "$match" -eq 0 ]
  then
    #You can put the new file wherever you like; in this example the script writes to newfile.txt in the directory /missedLines.
    echo "$line" >> "/missedLines/newfile.txt"
  fi
done < "$firstFile"
I can go into more detail if you think this will work in your situation. Also, I am extremely tired this morning, so if this doesn't make sense, I'm sorry.
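The two loops in the script above can also be expressed as a single awk program (a swapped-in technique, not the poster's script): an awk array serves as the lookup table, so each line of firstFile needs one lookup instead of a scan over myarray. This sketch recreates the sample data shown in the post and writes to a hypothetical newFile in the current directory rather than /missedLines/newfile.txt. Like the bash version, it keeps lines such as the header that contain no 02241... number; unlike it, it uses only the first matching number on each line.

```shell
#!/bin/sh
# Sample data from the post (the file number is in column 2).
printf '%s\n' 'date file number name description' \
  '06-06-2012 0224115548 John Doe He is one stand up guy' \
  '06-07-2012 0224125743 Jane Doe A people person' \
  '06-08-2012 0224196541 Bob Awesome His last name is Awesome!' > firstFile
printf '%s\n' 'date file number name description' \
  '06-06-2012 0224115548 John Doe He is one stand up guy' \
  '06-07-2012 0224125743 Jane Doe A people person' > secFile

# newFile is a hypothetical output name used only for this sketch.
awk '
  # Pass 1 (secFile): remember the first 2241... number on each line.
  NR == FNR {
    if (match($0, /2241[0-9A-Za-z_]*/))
      seen[substr($0, RSTART, RLENGTH)] = 1
    next
  }
  # Pass 2 (firstFile): drop a line only if its 02241... number, minus
  # the leading 0, was seen in secFile; print every other line.
  {
    if (match($0, /02241[0-9A-Za-z_]*/) && (substr($0, RSTART + 1, RLENGTH - 1) in seen))
      next
    print
  }
' secFile firstFile > newFile

cat newFile
```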
07-03-2012, 01:28 PM, #6
Member
Registered: Jun 2012
Location: Porto Alegre-Brazil
Distribution: Slackware- 14, Debian Wheezy, Ubuntu Studio, Tails
Posts: 88
Quote:
Originally Posted by udiubu
HI Alchemikos,
Thanks for your reply.
I do not understand why you sort files and most important, why you say "(specify the output) in your case 1 and 3.
What do you mean by 1 and 3 exactly? If you mean "exclude 1 and 3" this is not good.
Note that the files to be compared are very long and of different length, so I cannot specify lines to be excluded.
I just want the script to look for all those strings in file_2 that are not present in file_1.
I thank you anyways!
Best,
Udiubu
Hello Udiubu
The 'comm' command produces three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.
-1   suppress column 1 (lines unique to FILE1)
-2   suppress column 2 (lines unique to FILE2)
-3   suppress column 3 (lines that appear in both files)
So comm -13 suppresses columns 1 and 3 and prints only column 2: the lines unique to FILE2, which is the difference you want.
I sorted the files because comm needs sorted input; with unsorted files it does not work correctly.
Did you run the commands?
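To see the three columns described above, the sketch below recreates the sample files, sorts them, and extracts each column separately with comm. Setting LC_ALL=C is an assumption added here so that sort and comm use the same byte ordering; the thread's commands run without it.

```shell
#!/bin/sh
# Sample files from the question.
printf '%s\n' 'alpha 3 5 eu rt' 'beta 4 5 ew sd' \
  'gamma 4 56 er df' 'delta 23 13 rt rt' > file_1.txt
printf '%s\n' 'alpha 3 5 eu rt' 'pluto 2 1 rf gf' \
  'gamma 4 56 er df' 'mouse 23 13 rt rt' > file_2.txt

# comm needs both inputs sorted in the same order.
LC_ALL=C sort file_1.txt > f1.txt
LC_ALL=C sort file_2.txt > f2.txt

only1=$(LC_ALL=C comm -23 f1.txt f2.txt)  # column 1: only in file_1.txt
only2=$(LC_ALL=C comm -13 f1.txt f2.txt)  # column 2: only in file_2.txt
both=$(LC_ALL=C comm -12 f1.txt f2.txt)   # column 3: in both files

printf 'only in file_1.txt:\n%s\n\n' "$only1"
printf 'only in file_2.txt:\n%s\n\n' "$only2"
printf 'in both files:\n%s\n' "$both"
```

Because comm compares whole lines, a line of file_2.txt that shares its column-1 string with file_1.txt but differs elsewhere would still be reported as unique to file_2.txt, and its counterpart as unique to file_1.txt.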
07-03-2012, 01:28 PM, #7
Member
Registered: Oct 2011
Posts: 73
Original Poster
The issue has been solved by using Alchemikos's suggestion.
I just forgot a step.
Thanks to everybody.
07-03-2012, 02:03 PM, #8
Member
Registered: Jun 2012
Location: Porto Alegre-Brazil
Distribution: Slackware- 14, Debian Wheezy, Ubuntu Studio, Tails
Posts: 88
Nice
Hey Udiubu, click the (yes) button at the bottom; I'm crazy for the light-green squares... Hehee
Alchemikos
All times are GMT -5.