LinuxQuestions.org
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

10-22-2012, 12:07 PM   #1
MikeyCarter
Member
 
Registered: Feb 2003
Location: Orangeville
Distribution: Fedora
Posts: 492

Rep: Reputation: 31
Delete duplicates without using sort -u?


I have a case where if I do sort -u on some foreign characters it removes both lines, instead of just one. (only happens for a few usernames which is odd)

I figure it's a bug with sort (GNU coreutils) 5.97 but I won't be able to get the sys-admins to patch the system.

So is there a way of removing duplicate lines from a file with another tool?
 
10-22-2012, 12:09 PM   #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
Does
Code:
sort | uniq
work?
 
10-22-2012, 12:19 PM   #3
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,057

Rep: Reputation: Disabled
Code:
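#!/bin/bash
# Sort so duplicates become adjacent, then print a line only when it
# differs from the previous one. NR == 1 keeps an empty first line, and
# appending "" forces a string comparison (so "1" and "1.0" stay distinct).
sort yourfile.txt | awk '
    NR == 1 || prev != ($0 "") { print }
    { prev = $0 }
' > withoutduplicates.txt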

Last edited by Didier Spaier; 10-22-2012 at 12:30 PM.
 
10-22-2012, 12:22 PM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Code:
awk '!_[$0]++' file
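
For anyone puzzled by the one-liner above: `_` is just an array name, `$0` is the whole input line, and `_[$0]++` yields the count seen so far (0 the first time) before incrementing it. The `!` turns that 0 into true, so awk's default action (print) fires only on the first occurrence of each line, and the input order is kept. A minimal sketch with the array renamed to `seen`; the function name `dedupe_keep_order` and the sample input are made up for illustration:

```shell
#!/bin/sh
# dedupe_keep_order: hypothetical helper name, not from the thread.
# Prints the first occurrence of each line and keeps the input order.
dedupe_keep_order() {
    # seen[$0]++ evaluates to the old count (0 on first sight), so
    # !seen[$0]++ is true exactly once per distinct line.
    awk '!seen[$0]++' "$@"
}

# Sample run: prints b, a, c (one per line).
printf 'b\na\nb\nc\na\n' | dedupe_keep_order
```

Unlike sort-based approaches, this never compares lines for ordering, but it does hold every distinct line in memory.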
 
10-22-2012, 01:10 PM   #5
MikeyCarter
Member
 
Registered: Feb 2003
Location: Orangeville
Distribution: Fedora
Posts: 492

Original Poster
Rep: Reputation: 31
Turns out I found the "bug" -- mostly in my head --

| LC_COLLATE=C sort -u

That did the trick.

I'm keeping the other solutions on hand in case something else comes up.

Thanks for all your help.
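
A note on why this probably helps: with `LC_COLLATE=C`, sort compares lines byte by byte instead of using the locale's collation rules, and under some locales two different strings can collate as equal, in which case `sort -u` keeps only one of them. One caveat I am sure of: if `LC_ALL` is set in the environment, it overrides `LC_COLLATE`. A minimal sketch that sets `LC_ALL=C` for the sort command only; the function name `unique_bytes` is made up for illustration:

```shell
#!/bin/sh
# unique_bytes: hypothetical helper name, not from the thread.
# LC_ALL takes precedence over LC_COLLATE when both are set, so LC_ALL=C
# is used here. The assignment applies to this one sort command only.
unique_bytes() {
    LC_ALL=C sort -u "$@"
}

# Sample run: prints a, b (one per line).
printf 'b\na\nb\n' | unique_bytes
```

The output is in byte order, which for non-ASCII text will generally differ from the order a UTF-8 locale would give.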
 
10-22-2012, 02:44 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4120
There are occasions where sort is inappropriate, so @colucix has a useful answer. Of course, not all awk behaves as expected - I have a SunOS awk doing mighty strange things at present.
 
10-22-2012, 03:24 PM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by syg00
There are occasions where sort is inappropriate, so @colucix has a useful answer. Of course, not all awk behaves as expected - I have a SunOS awk doing mighty strange things at present.
Yes, other users have reported that this simple syntax doesn't work on SunOS awk. I cannot explain why, since awk on SunOS has the ! and ++ operators, has the same concept of true and false, and referencing a non-existent array element creates that element and returns the null string (false). I don't see any other rule in action here that might be specific to GNU awk. Someone reported that awk on SunOS is buggy, but I cannot verify it. Anyway, just out of curiosity, what do you get by running the suggested code on SunOS?
 
10-23-2012, 01:46 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
According to this page (#43), the following variation is more efficient. I imagine it's more likely to work properly on SunOS too.

Code:
awk '!($0 in a) { a[$0]; print }'
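
How this variant differs, as far as I understand it: `$0 in a` only tests whether the key exists and does not create it, while the bare reference `a[$0]` in the action creates the key with an empty value. So nothing is counted; the array is used purely as a set. A minimal sketch reading from standard input; the function name `dedupe_in` and the sample input are made up for illustration:

```shell
#!/bin/sh
# dedupe_in: hypothetical helper name, not from the thread.
dedupe_in() {
    # "$0 in a" only tests for the key; the bare reference a[$0] then
    # creates it with an empty value, so no counter is stored.
    awk '!($0 in a) { a[$0]; print }' "$@"
}

# Sample run: prints b, a, c (one per line).
printf 'b\na\nb\nc\na\n' | dedupe_in
```

Whether this actually behaves better on SunOS awk is untested here.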
 
10-23-2012, 03:19 AM   #9
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4120
Quote:
Originally Posted by colucix
Anyway, just out of curiosity, what do you get by running the suggested code on SunOS?
Nothing.
As it happened I was in the mood to try a few things before I saw these responses. The ++ postfix operator works, but the not (!) doesn't; expanded to the full "if (_[$0]++ != 0) <blah> ..." form, it works as expected.
I also wanted to replicate the (single) data in each line - simple. "awk '{print $0,$0}' file"
I wish ...
Testing with strings of "..." and "\t" strategically placed seemed to indicate the first field was always dropped - unless it was the only field. This was true using $0 or $1 or $NF in the command.

And of course sed wasn't smart enough to allow me to do anything useful either.
I could get *really* attached to the GNU extensions ...

Last edited by syg00; 10-23-2012 at 05:14 AM. Reason: s/%NF/$NF/
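
For an awk where the bare `!` pattern misbehaves, the de-duplication can be spelled out with an explicit comparison inside an action. This is only a sketch: it assumes, as reported above, that the explicit form works where the shorthand does not, and it has not been checked on SunOS. Note that the test for "first time seen" is `== 0`; the `!= 0` form would select the repeats instead. The function name `dedupe_explicit` is made up for illustration:

```shell
#!/bin/sh
# dedupe_explicit: hypothetical helper name, not from the thread.
dedupe_explicit() {
    awk '{
        # The old count is 0 only the first time a line is seen.
        if (seen[$0]++ == 0)
            print
    }' "$@"
}

# Sample run: prints x, y (one per line).
printf 'x\ny\nx\n' | dedupe_explicit
```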
 
  


All times are GMT -5. The time now is 06:40 AM.