LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

01-06-2006, 09:13 AM #1
lluciano
LQ Newbie

Registered: Feb 2004
Distribution: FC4
Posts: 19

Reputation: 0
Removing partially unique lines


Greetings.
We have been trying to use sort and uniq to remove lines from a text file whenever their first column duplicates that of another line.


Example:

Remove duplicate lines based on column 1 (only)

Node1 10.0.0.1
Node1 10.0.0.2
Node1 10.0.0.3
Node1 10.0.0.4
Node1 10.0.0.5
Node1 10.0.0.6
Node2 10.1.1.1
Node1 10.1.1.2

After the command runs, the file above should look like this:

Node1 10.0.0.1
Node2 10.1.1.1

It does not matter which second column value is left over, only that there are no remaining duplicate column 1 values.

We tried using awk, sort and the like, but have not been able to find a suitable command to accomplish this. Does anyone have any ideas?
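A minimal sketch of one common awk idiom for this, assuming POSIX awk and blank-separated columns. `seen` is an array keyed on the first field; `seen[$1]++` evaluates to 0 the first time a name appears, so `!seen[$1]++` is true exactly once per name. The first line for each node is kept, and the input order is preserved without sorting. The function name `dedupe_first_field` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first line seen for each column-1 value.
# Reads the files given as arguments, or standard input if there are none.
dedupe_first_field() {
  awk '!seen[$1]++' "$@"
}

# Sample data from the question.
printf '%s\n' \
  'Node1 10.0.0.1' 'Node1 10.0.0.2' 'Node1 10.0.0.3' 'Node1 10.0.0.4' \
  'Node1 10.0.0.5' 'Node1 10.0.0.6' 'Node2 10.1.1.1' 'Node1 10.1.1.2' |
  dedupe_first_field
# prints:
# Node1 10.0.0.1
# Node2 10.1.1.1
```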
 
01-06-2006, 09:54 AM #2
Hko
Senior Member

Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Reputation: 108
Just make 'uniq' only consider the first field (by default delimited by spaces or tabs).

Assuming the file is called "nodes.txt":
Code:
sort nodes.txt | uniq -W1
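For comparison, -W is not among the options POSIX specifies for uniq (those are -c, -d, -u, -f and -s). A sketch using only the standard -f option, assuming exactly two blank-separated columns per line: `uniq -f1` skips the first field and compares the rest of the line, so the name is moved to the end, compared there, and then swapped back.

```shell
#!/bin/sh
# Move the name to column 2, group lines by it (sort -k2,2), let uniq -f1
# skip the address so only the name is compared, then restore column order.
printf '%s\n' 'Node1 10.0.0.1' 'Node2 10.1.1.1' 'Node1 10.1.1.2' |
  awk '{ print $2, $1 }' |
  sort -k2,2 |
  uniq -f1 |
  awk '{ print $2, $1 }'
# prints:
# Node1 10.0.0.1
# Node2 10.1.1.1
```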
 
01-06-2006, 12:32 PM #3
homey
Senior Member

Registered: Oct 2003
Posts: 3,057

Reputation: 56
From uniq --help
-w, --check-chars=N compare no more than N characters in lines


You're only interested in the first 5 characters, so I would try this...

sort file.txt | uniq -w5
 
01-06-2006, 12:46 PM #4
Hko
Senior Member

Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Reputation: 108
IMHO the -W1 option will work better, because then the length of the node names does not matter.
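To illustrate the point about name lengths, a small sketch using GNU uniq's -w option (the one quoted from uniq --help above). The sample names are hypothetical. Because -w5 compares only the first five characters, Node1 and Node10 look identical and only one of the two lines survives. LC_ALL=C is set so the sort order, and therefore which line is printed, does not depend on the locale.

```shell
#!/bin/sh
# -w5 compares only the first 5 characters, which are "Node1" in both lines.
printf '%s\n' 'Node1 10.0.0.1' 'Node10 10.0.0.9' |
  LC_ALL=C sort | uniq -w5
# prints only:
# Node1 10.0.0.1
```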
 
01-06-2006, 12:56 PM #5
homey
Senior Member

Registered: Oct 2003
Posts: 3,057

Reputation: 56
sort file.txt | uniq -W1
uniq: invalid option -- W
Try `uniq --help' for more information.

uniq --version
uniq (coreutils) 5.2.1
Written by Richard Stallman and David MacKenzie.


We may have different versions or something.
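Since not every build of uniq accepts -W (as the error above shows), a sketch of a more portable route in which sort does the whole job. `-k1,1` restricts the sort key to the first field, and `-u` outputs only one line from each group of lines whose keys compare equal, so the length of the names does not matter. POSIX does not say which line of a group is kept, which fits the original requirement that any second-column value is acceptable.

```shell
#!/bin/sh
# Sort on field 1 only (-k1,1) and keep one line per distinct key (-u).
printf '%s\n' \
  'Node1 10.0.0.1' 'Node1 10.0.0.2' 'Node2 10.1.1.1' 'Node1 10.1.1.2' |
  sort -u -k1,1
# prints one Node1 line followed by the Node2 line
```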
 
01-06-2006, 01:02 PM #6
Hko
Senior Member

Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Reputation: 108
Quote:
Originally Posted by homey
sort file.txt | uniq -W1
uniq: invalid option -- W
Try `uniq --help' for more information.

uniq --version
uniq (coreutils) 5.2.1
Written by Richard Stallman and David MacKenzie.


We may have different versions or something.
That's certainly strange. I have the same version (from the Debian sarge package, though I suppose that shouldn't make a difference in this case).

Code:
heiko@hko3:~$ uniq --version
uniq (coreutils) 5.2.1
Written by Richard Stallman and David MacKenzie.
[..snip..]

heiko@hko3:~$ uniq --help | grep -- -W
  -W, --check-fields=N  compare no more than N fields in lines

Last edited by Hko; 01-06-2006 at 01:04 PM.
 
01-06-2006, 01:19 PM #7
homey
Senior Member

Registered: Oct 2003
Posts: 3,057

Reputation: 56
sort file.txt | uniq -w1
Node1 10.0.0.1

sort file.txt | uniq -w5
Node1 10.0.0.1
Node2 10.1.1.1

I wonder what output you get on your box using W1 and W5
 
01-06-2006, 05:01 PM #8
Hko
Senior Member

Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Reputation: 108
Quote:
Originally Posted by homey
I wonder what output you get on your box using W1 and W5
Code:
heiko@hko3:/tmp$ sort file.txt | uniq -W1
Node1 10.0.0.1
Node2 10.1.1.1

heiko@hko3:/tmp$ sort file.txt | uniq -W5
Node1 10.0.0.1
Node1 10.0.0.2
Node1 10.0.0.3
Node1 10.0.0.4
Node1 10.0.0.5
Node1 10.0.0.6
Node1 10.1.1.2
Node2 10.1.1.1
 
01-11-2006, 09:56 AM #9
lluciano
LQ Newbie

Registered: Feb 2004
Distribution: FC4
Posts: 19

Original Poster
Reputation: 0
Thanks for all of the replies.

In the end, here is what I used, since the node names all have different lengths. For example:

abcnode1 10.x.x.x
dnode 10.x.x.x
cdefgnode 10.x.x.x
othername 10.x.x.x


Script:

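sort dp.txt | awk -v NODE="INITIALIZED" '
{
    # Print a line only when its first field differs from the previous
    # line; sort has already put lines with equal names next to each other.
    if ($1 != NODE) {
        print $0
        NODE = $1
    }
}'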
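One detail of the script above: because of the sentinel NODE="INITIALIZED", a line whose name happened to be INITIALIZED would be dropped if it sorted first. A sketch of a variant without a sentinel, which prints the first line unconditionally (NR == 1) and otherwise compares each name with the previous one. It also sorts with -k1,1 so that grouping is decided by the first field alone. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of each group of lines that share
# a first field. NR == 1 always prints the first line, so no sentinel value is
# needed; prev is updated on every line.
first_of_each_group() {
  sort -k1,1 "$@" | awk 'NR == 1 || $1 != prev { print } { prev = $1 }'
}

# A name equal to the old sentinel is no longer lost.
printf '%s\n' 'INITIALIZED 10.0.0.1' 'Node1 10.0.0.2' 'Node1 10.0.0.3' |
  first_of_each_group
# prints:
# INITIALIZED 10.0.0.1
# Node1 10.0.0.2
```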
 
01-11-2006, 12:20 PM #10
schneidz
Senior Member

Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 4,197

Reputation: 642
here's my version. i tried to do it without a script, but this is easier:

Code:
schneidz@lq:/temp/lluciano> cat nodes.txt
Node1 10.0.0.1
Node1 10.0.0.2
Node1 10.0.0.3
Node1 10.0.0.4
Node1 10.0.0.5
Node1 10.0.0.6
Node1 10.1.1.2
Node2 10.1.1.1
schneidz@lq:/temp/lluciano> cat lluciano.ksh
#!/usr/bin/bash

sort nodes.txt | awk '{print $1}' | uniq > nodes.unq

for line in `cat nodes.unq`
do
 grep $line nodes.txt | head -n 1
done
schneidz@lq:/temp/lluciano> lluciano.ksh
Node1 10.0.0.1
Node2 10.1.1.1
schneidz@lq:/temp/lluciano>

Last edited by schneidz; 01-11-2006 at 12:21 PM.
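One caveat with the loop above: grep $line matches the name anywhere in a line, so Node1 also matches a Node10 line, and whichever comes first in the file is printed. A sketch of the same approach that compares the first field exactly using awk, assuming blank-separated fields. The function name and the Node10 sample line are hypothetical; with this sample file, the original grep loop would print the Node10 line for Node1.

```shell
#!/bin/bash
# Hypothetical helper: for each distinct first field in the file, print the
# first line whose first field equals it exactly (not as a substring).
first_line_per_name() {
  local file=$1
  awk '{ print $1 }' "$file" | sort -u |
    while IFS= read -r name; do
      # exit stops awk after the first exact match for this name
      awk -v n="$name" '$1 == n { print; exit }' "$file"
    done
}

# Usage with a throwaway sample file.
tmp=$(mktemp)
printf '%s\n' 'Node10 10.9.9.9' 'Node1 10.0.0.1' 'Node1 10.0.0.2' > "$tmp"
first_line_per_name "$tmp"
rm -f "$tmp"
# prints:
# Node1 10.0.0.1
# Node10 10.9.9.9
```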
 
  


All times are GMT -5.