Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Both files have one "keyword" per line, but some of the keywords can differ and some can appear in both files.
I need to compare both files and output which keywords are the same in both files, which keywords are unique to File 1 compared with File 2, and vice versa.
As I understand it, neither diff nor wdiff can do that: they can compare files only line by line, not by words in arbitrary positions...?
What you could do is first sort the two files (with -u, if required, to weed out duplicates), then use comm to produce the report you require.
Edit. :-) Pipped to the post.
Sorting does nothing for this task: because the number of keywords differs, however you sort, the same keywords will always end up on different line numbers.
The utility comm can do that. Look at the manual page for the different options.
Code:
comm -1 -2 <(sort -u file1) <(sort -u file2)
As I said, this task can't be done by comparing the files line by line, because the positions differ and the contents also partly differ.
However you sort, there will always be words common to both files that sit on different line numbers....
I need to compare not by position (line number), but by whether a word (code) exists anywhere in the whole file.
Say the sorted beginning of one file was:
AAC
AAT
ABI
ABL
ADO
and the other file was:
AAA
AAB
AAT
ABA
ABC
...
What do line numbers have to do with it? Did you try using comm, and/or Turbocapitalist's neater suggestion using it?
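To illustrate why the line numbers don't matter, here is a minimal sketch using only the two list beginnings posted above (the rest of each file is elided in the post, so it is left out here). The file names `a` and `b` and the temporary directory are made up for the example.

```shell
#!/bin/sh
# Only the first five keywords of each list, as posted; names "a" and "b" are illustrative.
export LC_ALL=C                      # same collation order for every tool
tmp=$(mktemp -d)
printf '%s\n' AAC AAT ABI ABL ADO > "$tmp/a"
printf '%s\n' AAA AAB AAT ABA ABC > "$tmp/b"

# Both lists are already sorted, which is all comm requires.
# AAT is on line 2 of one list and line 3 of the other, yet comm still pairs it up.
common=$(comm -12 "$tmp/a" "$tmp/b")
echo "$common"
rm -rf "$tmp"
```

comm walks the two sorted lists in parallel and matches on the text of each line, so the position of a keyword in either file plays no role.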
That's what comm does. The sort instances are there to generate the unique list for each file. Then comm can tell you which strings are in both files, or just in one or the other, depending on the options given.
So it looks for exact words, regardless of their position in the file?
example:
file1:
AAV
AAR
ABT
ATI
file2:
ATI
AWO
AYY
AZZ
It compares correctly and tells me that the word ATI is in both files?
Do I understand that right?
I am trying to understand the output of your example, but there is a lot of text and I can't quickly see how it works...
Yes. The example above with the options -1 and -2 finds only the words which are common to both files.
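For the other half of the original question (keywords unique to one file or the other), the same approach should work with different options. A minimal sketch with the sample file1/file2 from the example above; the temporary directory and the variable names `both`, `only1`, and `only2` are assumptions for illustration. The sorted copies are written to files so the snippet also runs in a plain POSIX shell, where `<(...)` is not available.

```shell
#!/bin/sh
# Sample data from the example above; paths and variable names are illustrative.
export LC_ALL=C                      # keep sort and comm on the same collation order
tmp=$(mktemp -d)
printf '%s\n' AAV AAR ABT ATI > "$tmp/file1"
printf '%s\n' ATI AWO AYY AZZ > "$tmp/file2"

# comm needs sorted input; -u also drops repeated keywords.
sort -u "$tmp/file1" > "$tmp/s1"
sort -u "$tmp/file2" > "$tmp/s2"

both=$(comm -12 "$tmp/s1" "$tmp/s2")    # hide columns 1 and 2: keywords in both files
only1=$(comm -23 "$tmp/s1" "$tmp/s2")   # hide columns 2 and 3: keywords only in file1
only2=$(comm -13 "$tmp/s1" "$tmp/s2")   # hide columns 1 and 3: keywords only in file2

echo "in both:       $both"
echo "only in file1:" $only1            # unquoted on purpose: joins the lines with spaces
echo "only in file2:" $only2
rm -rf "$tmp"
```

Each digit names a column to suppress, so `-23` leaves only column 1 (lines unique to the first file) and `-13` leaves only column 2 (lines unique to the second).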
Yes, I tested it. Great, thank you very much!
Is it also possible to get output where the first column holds the words from the first file and the second column the words from the second file, so that a word found in both files appears on the same output line in both columns, while a word found in only one file leaves the other column empty?
I hope my idea is understandable.
If that can be done, it would be super great.
bash-4.3$ comm -12 <(sort -u file1) <(sort -u file2)
ATI
bash-4.3$
bash-4.3$ comm -3 <(sort -u file1) <(sort -u file2)
AAR
AAV
ABT
        AWO
        AYY
        AZZ
bash-4.3$
It would be super if the output could be given like this:
AAR
AAV
ABT
ATI     ATI
        AWO
        AYY
        AZZ
Have you tried comm with no -1/2/3 parameters? It's not quite what you want but it's close enough. If you specifically want your desired level of customisation then you'll probably have to write your own Bash script or use a programming language to do the job.
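For reference, a small sketch of what comm prints with no column options, again using the sample file1/file2 from earlier in the thread (the temporary directory and the variable name `out` are illustrative). Column 1 has no indent, column 2 is indented by one tab, and column 3 by two tabs.

```shell
#!/bin/sh
# Sample data from earlier in the thread; paths are illustrative.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' AAV AAR ABT ATI | sort -u > "$tmp/s1"
printf '%s\n' ATI AWO AYY AZZ | sort -u > "$tmp/s2"

# No -1/-2/-3: all three columns are shown.
#   column 1 (no tab)   = only in file1
#   column 2 (one tab)  = only in file2
#   column 3 (two tabs) = in both files
out=$(comm "$tmp/s1" "$tmp/s2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

So ATI ends up in the third column rather than side by side in the first two, which is the "not quite, but close" part.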
Yes, the output without parameters is fine for what I hoped for, thank you too!
Thank you both, guys, and have a nice day! You're super!
I think that to get that you'd have to write a very short script to check the files separately and then merge and sort the results. It would probably need use of a temporary file. (You can use tempfile to safely generate one.)
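One possible alternative, offered only as a sketch: instead of checking the files separately and merging, the three-column output of comm could be reshaped with awk into the two-column layout asked for above. This assumes keywords contain no tabs and are shorter than 8 characters (the column width chosen here); the paths and the variable name `table` are illustrative, and `mktemp` is used only to hold the sample data.

```shell
#!/bin/sh
# Sample data from earlier in the thread; paths and the 8-character width are assumptions.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' AAV AAR ABT ATI | sort -u > "$tmp/s1"
printf '%s\n' ATI AWO AYY AZZ | sort -u > "$tmp/s2"

# comm marks its columns with leading tabs, so splitting on tabs gives
# 1 field for file1-only, 2 fields for file2-only, 3 fields for "in both".
table=$(comm "$tmp/s1" "$tmp/s2" | awk -F'\t' '
    NF == 1 { print $1 }                     # only in file1: left column
    NF == 2 { printf "%-8s%s\n", "", $2 }    # only in file2: right column
    NF == 3 { printf "%-8s%s\n", $3, $3 }    # in both: same word in both columns
')
printf '%s\n' "$table"
rm -rf "$tmp"
```

With real data, the pipeline can read the two sorted lists directly (for example via the `<(sort -u file1)` form used above in bash), so no temporary file is strictly needed for the merge itself.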