LinuxQuestions.org
Old 12-13-2016, 05:10 AM   #1
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Rep: Reputation: 73
diff 2 files by content keywords, not lines!


Hello!
I have to compare two text files.

They look like this:

WRLV70RPVE
WIBY40UAMN
WALT11EXHM
WALT12EXHM
WELT14ERHM
WTLT20ENHM

Both files have one "keyword" per line. Some keywords differ between the files, and some appear in both.

I need to compare the two files and output which keywords are in both files, which are unique to File 1 compared with File 2, and vice versa.

As I understand it, neither diff nor wdiff can do that: they compare files only line by line, not by words in arbitrary order...?
 
Old 12-13-2016, 05:19 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
The utility comm can do that. Look at the manual page for the different options.

Code:
comm -1 -2 <(sort -u file1) <(sort -u file2)
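
A minimal sketch of the three views comm can give, assuming two hypothetical keyword lists (the file names and contents below are made up for illustration). Each input is sorted and de-duplicated into a scratch file first, which is a portable stand-in for the <(sort -u ...) form above.

```shell
#!/bin/sh
# sort and comm must agree on collation, so pin the locale.
LC_ALL=C; export LC_ALL

# Hypothetical sample data; the keywords are invented for illustration.
workdir=$(mktemp -d)
printf '%s\n' WRLV70RPVE WALT11EXHM WIBY40UAMN WALT11EXHM > "$workdir/file1"
printf '%s\n' WTLT20ENHM WALT11EXHM WELT14ERHM > "$workdir/file2"

# comm expects sorted input; -u also drops duplicate keywords.
sort -u "$workdir/file1" > "$workdir/f1.sorted"
sort -u "$workdir/file2" > "$workdir/f2.sorted"

# -1, -2 and -3 each suppress one output column
# (1 = only in first file, 2 = only in second file, 3 = in both).
in_both=$(comm -12 "$workdir/f1.sorted" "$workdir/f2.sorted")
only_in_1=$(comm -23 "$workdir/f1.sorted" "$workdir/f2.sorted")
only_in_2=$(comm -13 "$workdir/f1.sorted" "$workdir/f2.sorted")

echo "In both files:"; echo "$in_both"
echo "Only in file1:"; echo "$only_in_1"
echo "Only in file2:"; echo "$only_in_2"

rm -r "$workdir"
```

Running the three comm calls separately gives one plain list per question, which is often easier to read than a combined report.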
 
Old 12-13-2016, 05:27 AM   #3
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
What you could do is first sort the two files (with -u, if required, to weed out duplicates), then use comm to produce the report you require.

Edit. :-) Pipped to the post.

Last edited by hydrurga; 12-13-2016 at 05:27 AM. Reason: :-)
 
Old 12-13-2016, 05:32 AM   #4
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by hydrurga
What you could do is first sort the two files (with -u, if required, to weed out duplicates), then use comm to produce the report you require.

Edit. :-) Pipped to the post.
Sorting does nothing for this task, because the number of keywords differs. As a result, however you sort, the same keyword will always end up on a different line number in each file.

Say the sorted beginning of one file was:

AAC
AAT
ABI
ABL
ADO

and the other file was:

AAA
AAB
AAT
ABA
ABC

...
 
Old 12-13-2016, 05:35 AM   #5
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by Turbocapitalist
The utility comm can do that. Look at the manual page for the different options.

Code:
comm -1 -2 <(sort -u file1) <(sort -u file2)

As I said, this task can't be done by comparing the files line by line, because the positions differ and the contents partly differ too.
However you sort, there will always be words present in both files that sit on different line numbers...
I need to compare not by position (line number), but by whether a word (code) exists anywhere in the whole file.
 
Old 12-13-2016, 05:36 AM   #6
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
Quote:
Originally Posted by WiseDraco
Sorting does nothing for this task, because the number of keywords differs. As a result, however you sort, the same keyword will always end up on a different line number in each file.

Say the sorted beginning of one file was:

AAC
AAT
ABI
ABL
ADO

and the other file was:

AAA
AAB
AAT
ABA
ABC

...
What do line numbers have to do with it? Did you try using comm, and/or Turbocapitalist's neater suggestion using it?
 
Old 12-13-2016, 05:38 AM   #7
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by hydrurga
What do line numbers have to do with it? Did you try using comm, and/or Turbocapitalist's neater suggestion using it?

NAME
comm - compare two sorted files line by line
 
Old 12-13-2016, 05:39 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by WiseDraco
I need to compare not by position (line number), but by whether a word (code) exists anywhere in the whole file.
That's what comm does. The sort instances are there to generate the unique list for each file. Then comm can tell you which strings are in both files, or just in one or the other, depending on the options given. It does read the files line by line, but because both inputs are sorted it matches lines by their content, not by their line number.
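
To illustrate why the original position of a keyword stops mattering, here is a small sketch with hypothetical data: the same keywords in a different order, one of them repeated. After sort -u both lists come out identical.

```shell
#!/bin/sh
# Pin the locale so the sort order is predictable.
LC_ALL=C; export LC_ALL

# Hypothetical data: same keywords, different order, "ATI" listed twice in the second list.
list_a=$(printf '%s\n' ATI AAV ABT AAR | sort -u)
list_b=$(printf '%s\n' AAR ATI ABT AAV ATI | sort -u)

# After sort -u both inputs collapse to the same ordered, duplicate-free list,
# so the line a keyword originally sat on no longer plays any role.
echo "$list_a"
if [ "$list_a" = "$list_b" ]; then
    echo "identical keyword sets"
fi
```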
 
Old 12-13-2016, 05:45 AM   #9
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by Turbocapitalist
That's what comm does. The sort instances are there to generate the unique list for each file. Then comm can tell you which strings are in both files, or just in one or the other, depending on the options given.

So it looks for exact words, regardless of their position in the file?
For example:

file1:

AAV
AAR
ABT
ATI

file2:


ATI
AWO
AYY
AZZ


Would it compare these correctly and tell me that the word ATI is in both files?

Do I understand that right?

I tried to make sense of the output of your example, but there is a lot of text, and I can't quickly see whether it works...
 
Old 12-13-2016, 05:50 AM   #10
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by WiseDraco
Would it compare these correctly and tell me that the word ATI is in both files?
Yes. The example above with the options -1 and -2 finds only the words which are common to both files.
 
Old 12-13-2016, 05:57 AM   #11
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by Turbocapitalist
Yes. The example above with the options -1 and -2 finds only the words which are common to both files.

Yes, I tested it. Great, thank you very much!

Is it also possible to get output where the first column holds the words from the first file and the second column the words from the second file, so that a word present in both files appears side by side on the same line, while a word present in only one file leaves the other column empty?


I hope my idea is understandable.
If that can be done, it would be super great.
 
Old 12-13-2016, 05:59 AM   #12
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Code:
bash-4.3$ comm -12 <(sort -u file1) <(sort -u file2)

ATI
bash-4.3$ 

bash-4.3$ comm -3 <(sort -u file1) <(sort -u file2)
	 
AAR
AAV
ABT
	AWO
	AYY
	AZZ
bash-4.3$ 

It would be super if the output could be given like this:


AAR
AAV
ABT
ATI	ATI
	AWO
	AYY
	AZZ
 
Old 12-13-2016, 06:06 AM   #13
hydrurga
LQ Guru
 
Registered: Nov 2008
Location: Pictland
Distribution: Linux Mint 21 MATE
Posts: 8,048
Blog Entries: 5

Rep: Reputation: 2925
Quote:
Originally Posted by WiseDraco
Code:
bash-4.3$ comm -12 <(sort -u file1) <(sort -u file2)

ATI
bash-4.3$ 

bash-4.3$ comm -3 <(sort -u file1) <(sort -u file2)
	 
AAR
AAV
ABT
	AWO
	AYY
	AZZ
bash-4.3$ 

It would be super if the output could be given like this:


AAR
AAV
ABT
ATI	ATI
	AWO
	AYY
	AZZ
Have you tried comm with no -1/2/3 parameters? It's not quite what you want but it's close enough. If you specifically want your desired level of customisation then you'll probably have to write your own Bash script or use a programming language to do the job.
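
For reference, a short sketch of what comm prints with no column options, using the two example files from earlier in the thread (without the blank lines). The scratch directory and the .sorted file names are assumptions made for this illustration.

```shell
#!/bin/sh
# sort and comm must agree on collation, so pin the locale.
LC_ALL=C; export LC_ALL

workdir=$(mktemp -d)
# The two example files from earlier in the thread.
printf '%s\n' AAV AAR ABT ATI > "$workdir/file1"
printf '%s\n' ATI AWO AYY AZZ > "$workdir/file2"
sort -u "$workdir/file1" > "$workdir/f1.sorted"
sort -u "$workdir/file2" > "$workdir/f2.sorted"

# No -1/-2/-3: all three columns are printed.
#   column 1 (no indent) = only in file1
#   column 2 (one tab)   = only in file2
#   column 3 (two tabs)  = in both files
report=$(comm "$workdir/f1.sorted" "$workdir/f2.sorted")
printf '%s\n' "$report"

rm -r "$workdir"
```

The shared keyword ATI lands in the third column, indented by two tabs, instead of being printed once per file on the same line.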
 
Old 12-13-2016, 06:08 AM   #14
WiseDraco
Member
 
Registered: Nov 2006
Location: Europe,Latvia,Riga
Distribution: slackware,slax, OS X, exMandriva
Posts: 591

Original Poster
Rep: Reputation: 73
Quote:
Originally Posted by hydrurga
Have you tried comm with no -1/2/3 parameters? It's not quite what you want but it's close enough. If you specifically want your desired level of customisation then you'll probably have to write your own Bash script or use a programming language to do the job.

Yes, the output without parameters is fine for my needs, thank you too!
Thank you both, guys, and have a nice day! You're super!
 
Old 12-13-2016, 06:13 AM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by WiseDraco
It would be super if the output could be given like this:
I think that to get that you'd have to write a very short script to check the files separately and then merge and sort the results. It would probably need use of a temporary file. (You can use mktemp to safely generate one; the older tempfile utility on Debian-based systems is deprecated.)
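
One possible sketch of such a script, not necessarily the approach described above: it lets comm do the comparison and uses awk to rearrange the three columns into the requested two. The function name side_by_side is hypothetical, and the sketch assumes one keyword per line with no tabs and no blank lines.

```shell
#!/bin/sh
# side_by_side FILE1 FILE2   (hypothetical helper name)
# Prints keywords unique to FILE1 in column 1, keywords unique to FILE2 in
# column 2, and shared keywords in both columns of the same line.
# Assumes one keyword per line, with no tabs and no blank lines.
side_by_side() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || return 1
    # Same locale for sort and comm so they agree on ordering.
    LC_ALL=C sort -u "$1" > "$tmp1"
    LC_ALL=C sort -u "$2" > "$tmp2"
    # comm indents column 2 with one tab and column 3 with two tabs,
    # so the number of tab-separated fields identifies the column.
    LC_ALL=C comm "$tmp1" "$tmp2" | awk -F '\t' '
        NF == 1 { print $1 }            # only in FILE1
        NF == 2 { print "\t" $2 }       # only in FILE2
        NF == 3 { print $3 "\t" $3 }    # in both files
    '
    rm -f "$tmp1" "$tmp2"
}

# Demo with the example data from earlier in the thread.
demo=$(mktemp -d)
printf '%s\n' AAV AAR ABT ATI > "$demo/file1"
printf '%s\n' ATI AWO AYY AZZ > "$demo/file2"
result=$(side_by_side "$demo/file1" "$demo/file2")
printf '%s\n' "$result"
rm -r "$demo"
```

The temporary files only hold the sorted copies, so the original files are never modified.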
 
  



Tags
compare, diff, words


