LinuxQuestions.org
Linux - Newbie: This Linux forum is for members who are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!

Old 07-01-2011, 10:05 AM   #1
crowzie
LQ Newbie
 
Registered: Jul 2011
Posts: 16

Duplicates in text file.


Hi guys

I have a text file which is a list of all my contacts.
So far I have only found software and commands that remove duplicates, but I would like to remove all duplicates AND their original entries too, so that only contacts with no duplicates are left.

Any help is welcome.
Thanks in advance.
 
Old 07-01-2011, 10:22 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,246

Rep: Reputation: 2684
Well, it might help if you showed us what you have tried and what the data looks like, before and after.
 
Old 07-01-2011, 10:27 AM   #3
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,774
Blog Entries: 1

Rep: Reputation: 1339
Are you asking to remove any contacts that have duplicates?

So,..

If you had:

Code:
Laquica
Devon
Devon
Oranjalo
Wing
Tim
You would want

Code:
Laquica
Oranjalo
Wing
Tim
to be left?
 
Old 07-01-2011, 10:34 AM   #4
crowzie
LQ Newbie
 
Registered: Jul 2011
Posts: 16

Original Poster
Rep: Reputation: Disabled
Sorry, it's a large list of my contacts' email addresses in a single column, like this:

john@msn.com
gill@hotmail.com
etc.
Lots of these contacts are in my list 2 or 3 times.
I would like to remove the duplicates and also each entry that has a duplicate.
Thanks again.
 
Old 07-01-2011, 10:41 AM   #5
crowzie
LQ Newbie
 
Registered: Jul 2011
Posts: 16

Original Poster
Rep: Reputation: Disabled
@szboardstretcher

Yes, that is exactly what I'm looking for.



God, I love Linux!
 
Old 07-01-2011, 10:44 AM   #6
Andrew Benton
Senior Member
 
Registered: Aug 2003
Location: Birkenhead/Britain
Distribution: Linux From Scratch
Posts: 2,073

Rep: Reputation: 64
Code:
sort file-name | uniq > new-file-name
Change file-name and new-file-name to fit your situation. To find out more about sort or uniq, use man sort or man uniq.

Last edited by Andrew Benton; 07-01-2011 at 10:46 AM.
 
Old 07-01-2011, 10:50 AM   #7
crowzie
LQ Newbie
 
Registered: Jul 2011
Posts: 16

Original Poster
Rep: Reputation: Disabled
@Andy

uniq only collapses duplicates down to a single copy.
I need to remove the duplicates and also each email in the list that had a duplicate.
 
Old 07-01-2011, 12:38 PM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,246

Rep: Reputation: 2684
Well, it's crude, but if you sort the file first, this could work:
Code:
awk 'x == $0{a=1;next}x && !a{print x}{x=$0;a=0}END{if(x && !a)print x}' file
It can probably be done better, but I'm too tired to think.
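To see the one-liner in action, here is a minimal sketch using a hypothetical contacts.txt (the file name and addresses are made up for illustration):

```shell
# Hypothetical sample list: two addresses appear twice, one appears once
printf '%s\n' john@msn.com gill@hotmail.com john@msn.com \
    bob@aol.com gill@hotmail.com > contacts.txt

# Sort so duplicates become adjacent, then apply the awk filter:
# x holds the previous line, a flags whether it was duplicated
sort contacts.txt | awk 'x == $0{a=1;next} x && !a{print x} {x=$0;a=0} END{if(x && !a)print x}'
# prints only bob@aol.com
```

The variable x carries the previous line and the flag a marks it as duplicated, so a line is printed only once the next distinct line arrives and the previous one was never repeated.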
 
Old 07-01-2011, 01:01 PM   #9
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,774
Blog Entries: 1

Rep: Reputation: 1339
Make a copy of your data file called NEW_WITHOUT_DUPES.txt:

Code:
NEW_WITHOUT_DUPES.txt
asdf
asdf
test
and
testes
asdf
Code:
sort NEW_WITHOUT_DUPES.txt | uniq -d > remove_lines

for LINE in $(cat remove_lines); do sed -i "\|^$LINE\$|d" NEW_WITHOUT_DUPES.txt; done
(uniq -d only reports adjacent duplicates, so sort its input first; and note it is sed -i, not sed -ie, which would leave a stray backup file.)
Code:
NEW_WITHOUT_DUPES.txt
test
and 
testes
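The same idea can also be sketched without the sed loop, by using grep to filter out every line that uniq -d reports (a sketch reusing the sample data above; -F treats patterns as fixed strings and -x requires whole-line matches):

```shell
# Hypothetical input with repeated lines (same data as the example above)
printf '%s\n' asdf asdf test and testes asdf > NEW_WITHOUT_DUPES.txt

# List every line that occurs more than once (sort so uniq sees them adjacent)
sort NEW_WITHOUT_DUPES.txt | uniq -d > remove_lines

# Keep only lines NOT in remove_lines:
#   -F fixed strings, -x whole-line match, -v invert, -f patterns from file
grep -F -x -v -f remove_lines NEW_WITHOUT_DUPES.txt
# prints: test, and, testes (original order preserved)
```

Unlike the in-place sed edit, this leaves the original file untouched and preserves the original line order in the output.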
Hope that helps.

Last edited by szboardstretcher; 07-01-2011 at 01:02 PM.
 
1 member found this post helpful.
Old 07-01-2011, 01:28 PM   #10
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,774
Blog Entries: 1

Rep: Reputation: 1339
You could also just do:
Code:
uniq -u filename
which will print only the lines that appear exactly once. Note that uniq compares adjacent lines, so sort the file first.
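As a small, self-contained sketch (data borrowed from the earlier example; the file name is made up):

```shell
# uniq only compares adjacent lines, so sort first; -u keeps lines
# that occur exactly once
printf '%s\n' Laquica Devon Devon Oranjalo Wing Tim > contacts.txt
sort contacts.txt | uniq -u
# prints Laquica, Oranjalo, Tim, Wing (Devon is gone; output is sorted)
```

This is the simplest route when sorted output is acceptable; use the grep approach above if the original order must be preserved.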
 
1 member found this post helpful.
Old 07-02-2011, 11:42 PM   #11
crowzie
LQ Newbie
 
Registered: Jul 2011
Posts: 16

Original Poster
Rep: Reputation: Disabled
Thanks guys
Virtual beers all round!
 
  

