LinuxQuestions.org
Old 07-26-2010, 11:41 AM   #1
secondchanti
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Rep: Reputation: 15
Comparing two Linux files for differences and similarities.


Dear friends,

I have the following two Linux files.

File 1:

123
456
789
987
654
321

File 2:

123
258
236
456
458
658
987
321
568
963
458
758
854
569

Now I want the following outputs:

1. Numbers that appear in both file 1 and file 2 > output = file 3;
2. In file 1, but not in file 2 > output = file 4;
3. In file 2, but not in file 1 > output = file 5;

The sdiff command gives output with symbols such as > < |,
so its output file is not clean and ready to print.

I want to print the output files directly.

Please suggest suitable commands or awk commands.

Also, tell me where I have to write awk programs and how to run them.

Help me.

Rao
 
Old 07-26-2010, 11:54 AM   #2
GrapefruiTgirl
LQ Guru
 
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594

Rep: Reputation: 551
Sounds like all you need is the plain old `diff` command, perhaps with the --GTYPE-group-format=GFMT option.

Get lines that are in file 1 but not in file 2:
Code:
diff %< file file > output
Get lines that are in file 2 but not in file 1:
Code:
diff %> file file > output
Get lines common to both files:
Code:
diff %= file file > output
NOTE: I've never used this option, so it may not work exactly as I've written - try it and see.

There are other ways of getting only one file's different lines, still using diff. `diff` also has lots of options for formatting the output - read the man page for details, and experiment with it.

If you want to use `awk`, either write an awk script (basically a plain text file) with a shebang like #!/usr/bin/awk -f, or write a bash script (again, basically a text file) with a shebang like #!/bin/bash and, within it, send data to `awk` either via a pipe or by telling awk which file to read. Either kind of script can be run from your console terminal once it has been made executable with chmod +x.
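
As a rough sketch of the awk route (not something tested in this thread): the script name compare.awk and the mode variable are made up for illustration, and the file names file1 to file5 follow the original question. It assumes whole lines are compared exactly.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Recreate the sample files from the first post.
printf '%s\n' 123 456 789 987 654 321 > file1
printf '%s\n' 123 258 236 456 458 658 987 321 568 963 458 758 854 569 > file2

# An awk program is just a text file; compare.awk is a hypothetical name.
cat > compare.awk <<'EOF'
#!/usr/bin/awk -f
# While reading the first file on the command line, remember every line.
NR == FNR { seen[$0] = 1; next }
# While reading the second file, print lines according to "mode".
mode == "common" &&  ($0 in seen) { print }
mode == "only"   && !($0 in seen) { print }
EOF

# Run it by handing the script to awk with -f
# (or chmod +x compare.awk and call ./compare.awk directly).
awk -v mode=common -f compare.awk file1 file2 > file3   # in both files
awk -v mode=only   -f compare.awk file2 file1 > file4   # in file1, not in file2
awk -v mode=only   -f compare.awk file1 file2 > file5   # in file2, not in file1
```

Note the argument order: the first file named is the one that gets remembered, and the second is the one whose lines are printed, so file4 needs file2 first.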

P.S. - if `diff` alone is not producing precisely the output you want (like if it still has < or > symbols you don't want) then pipe the output through something like `sed` or `tr` to remove unwanted characters.
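
A minimal sketch of that clean-up step, assuming the default ("normal") diff output format, where lines only in the first file start with "< " and lines only in the second file start with "> ". The intermediate file name changes.txt is made up; a plain pipe into sed works as well, the file is only there so that diff's exit status can be handled separately.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Sample files from the first post.
printf '%s\n' 123 456 789 987 654 321 > file1
printf '%s\n' 123 258 236 456 458 658 987 321 568 963 458 758 854 569 > file2

# diff exits with status 1 when the files differ; that is not an error here.
diff file1 file2 > changes.txt || true

# Keep only the marked lines and strip the two-character marker.
sed -n 's/^< //p' changes.txt > file4   # lines only in file1
sed -n 's/^> //p' changes.txt > file5   # lines only in file2
```

This gives the two "only in" lists; the common lines are not part of normal diff output, so they need one of the other approaches.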

EDIT: Added info:
Code:
diff --left-column file1 file2     # show only file1 stuff
diff --left-column file2 file1     # show only file2 stuff

Last edited by GrapefruiTgirl; 07-26-2010 at 12:10 PM. Reason: added sed info
 
Old 07-26-2010, 12:43 PM   #3
secondchanti
LQ Newbie
 
Registered: Jul 2010
Posts: 23

Original Poster
Rep: Reputation: 15
It is not working. It tells me to run diff --help.

Please guide me.

Rao
 
Old 07-26-2010, 02:30 PM   #4
b0uncer
LQ Guru
 
Registered: Aug 2003
Distribution: CentOS, OS X
Posts: 5,131

Rep: Reputation: Disabled
I couldn't get diff to produce the wanted output quickly enough either, so I wrote a small and ugly Perl script to do the job. Here goes...

Code:
#!/usr/bin/perl

use strict;

# Hashes to store the lines (numbers) from the files.

my (%hash1,
    %hash2,
    %hash3);

# Now read the two files into hashes 1 and 2; hash 3 will be
# filled with values that exist in both hash 1 and 2.

open(FILE, 'file1') or die('Could not open file1');
while (<FILE>) {
	$hash1{$_} = 1;
}
close(FILE);

open(FILE, 'file2') or die('Could not open file2');
while (<FILE>) {
	$hash2{$_} = 1;
}
close(FILE);

# Go through keys in hash 1 (lines of 1st file).
# Write those to output1 that don't exist in hash 2 (2nd file).
# Those that do exist in hash 2 are added to hash 3.
# Do the same the other way around, writing to output2, after
# which hash 3 contains lines that exist in both of the files.
# Then just write the keys of hash 3 to output3 and close files.
# The single OUTPUT handle is reused for each of the three files.

open(OUTPUT, '>output1') or die ('Could not open output1');
for (keys %hash1) {
	if (!exists $hash2{$_}) {
		print OUTPUT $_;
	}
	else {
		$hash3{$_} = 1;
	}
}
close(OUTPUT);

open(OUTPUT, '>output2') or die ('Could not open output2');
for (keys %hash2) {
	if (!exists $hash1{$_}) {
		print OUTPUT $_;
	}
	else {
		$hash3{$_} = 1;
	}
}
close(OUTPUT);

open(OUTPUT, '>output3') or die ('Could not open output3');
for (keys %hash3) {
	print OUTPUT $_;
}
close(OUTPUT);
It doesn't produce sorted output (but you'll figure out how to do that yourself, won't you?), but it does produce three files (output1--output3): one contains the items unique to file1, one the items unique to file2, and one the items that exist in both files. I'm confident that the above code can be made a lot shorter, and that diff probably works faster in the hands of somebody who knows how to use it, but this shows one more way. At least it worked on my test files -- hope it helps a little, if nothing else.

Last edited by b0uncer; 07-26-2010 at 02:41 PM. Reason: removed unnecessary handles
 
Old 07-26-2010, 09:16 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,245

Rep: Reputation: 2684
So it turns out the diff stuff sasha put up does work although I had to do an extended version (Lines from post #2 did not work for me as is):
Code:
#diff %< file file > output
diff --changed-group-format='%<' --unchanged-group-format='' file1 file2 > output

#diff %> file file > output
diff --changed-group-format='%>' --unchanged-group-format='' file1 file2 > output

#diff %= file file > output
diff --changed-group-format='' --unchanged-group-format='%=' file1 file2 > output
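
Applied to the sample files from the first post, the three commands could be run like this. This is only a sketch assuming GNU diff; the output names file3 to file5 follow the original question. Keep in mind that diff compares the files as sequences, so a line present in both files but in a different relative position can still be reported as changed; for the sample data the common numbers happen to appear in the same order in both files.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Sample files from the first post.
printf '%s\n' 123 456 789 987 654 321 > file1
printf '%s\n' 123 258 236 456 458 658 987 321 568 963 458 758 854 569 > file2

# diff exits with status 1 when the files differ, so "|| true" keeps
# scripts that run under "set -e" from stopping here.

# Lines common to both files.
diff --changed-group-format='' --unchanged-group-format='%=' file1 file2 > file3 || true

# Lines in file1 but not in file2.
diff --changed-group-format='%<' --unchanged-group-format='' file1 file2 > file4 || true

# Lines in file2 but not in file1.
diff --changed-group-format='%>' --unchanged-group-format='' file1 file2 > file5 || true
```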
 
Old 07-27-2010, 01:37 AM   #6
b0uncer
LQ Guru
 
Registered: Aug 2003
Distribution: CentOS, OS X
Posts: 5,131

Rep: Reputation: Disabled
Right, so I was missing the 2nd option all along. Thanks to grail for adding another piece to my knowledge.

It appears this work workstation, which runs a different variant of Linux than my machine at home, has an older man page for an (also older) diff, and it is somewhat easier to understand on this point (or maybe it is just easier once you have it working). However, the man page does not define the format of these options at all, which is odd because running
Code:
diff --help
does; I'm glad this is an older system that people don't use much these days. And apparently the diff on the SunOS machine here doesn't work that way at all, so that's another dead end. Luckily they all have Perl.
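
For a system whose diff lacks the group-format options, one alternative that was not discussed in this thread is sort plus comm, both of which are standard POSIX utilities. The following is only a sketch under that assumption; the names s1 and s2 for the sorted copies are made up. comm needs sorted input, so the results come out in sorted order rather than in the original file order.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Sample files from the first post.
printf '%s\n' 123 456 789 987 654 321 > file1
printf '%s\n' 123 258 236 456 458 658 987 321 568 963 458 758 854 569 > file2

# comm expects both inputs to be sorted the same way.
sort file1 > s1
sort file2 > s2

# comm prints three columns: only in first, only in second, in both.
# The -1, -2 and -3 options suppress the corresponding column.
comm -12 s1 s2 > file3   # lines in both files
comm -23 s1 s2 > file4   # lines only in file1
comm -13 s1 s2 > file5   # lines only in file2
```

Unlike the diff approach, this does not depend on the order of lines in the original files.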
 
  

