Old 12-03-2013, 05:05 AM   #1
dunryc
LQ Newbie
 
Registered: Jul 2004
Posts: 10

Rep: Reputation: 0
removing duplicate entries from a text file


Hi guys, I have a file as below:

Quote:
oranges
apples
lemons
grapes
pears
grapes
grapes
lemons
ornages

I want to display only the lines that have not been duplicated; I don't want to just remove the duplicates:

Quote:
pears
apples
Any way to do this from the CLI?


Many thanks, Pete
 
Old 12-03-2013, 05:07 AM   #2
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,864
Blog Entries: 1

Rep: Reputation: 1869
Code:
sort <input | uniq -u >output
 
3 members found this post helpful.
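A note on what uniq -u does: it prints only the lines that occur exactly once in its input, and because uniq only compares adjacent lines, the input has to be sorted first. A minimal sketch, assuming the list above is saved as fruit.txt (an illustrative filename):
Code:
# sort so that duplicates become adjacent, then keep only lines that appear exactly once
sort fruit.txt | uniq -u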
Old 12-03-2013, 05:18 AM   #3
dunryc
LQ Newbie
 
Registered: Jul 2004
Posts: 10

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by NevemTeve
Code:
sort <input | uniq -u >output
This is going to help me out on a daily basis. Thanks a lot, and in future I'll be sure to read the man pages a little more!
 
Old 12-03-2013, 06:52 AM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3941
Quote:
Originally Posted by dunryc
... and in future I'll be sure to read the man pages a little more!
You definitely should. Unix/Linux is fairly stuffed with an odd assortment of very useful things ... not to mention a generous helping of true programming languages ... all of it free. It's quite entertaining to mosey around /usr/bin, say, and to think, "gee, wonder what that's for?"

For example, if you really need to "do the business with a text file," check out awk.
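As a small illustration of the sort of text-file work awk makes easy (a sketch only; fruit.txt is an illustrative filename):
Code:
# count how many times each line occurs in the file
awk '{count[$0]++} END {for (line in count) print count[line], line}' fruit.txt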

Last edited by sundialsvcs; 12-04-2013 at 08:19 AM.
 
Old 12-03-2013, 09:51 AM   #5
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Code:
sort <input | uniq -u >output
doesn't do what the OP asked anyway:
Quote:
bash-4.1$ sort <input | uniq -u
apples
oranges
ornages
pears
 
Old 12-03-2013, 10:49 AM   #6
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,864
Blog Entries: 1

Rep: Reputation: 1869
You mean it should have found out that 'oranges' = 'ornages'?
 
Old 12-03-2013, 06:04 PM   #7
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Another solution:
Code:
$ gawk '{++x[$0]} END{for(i in x){if (x[i]==1){print i}}}' test.dat
apples
ornages
oranges
pears
And, if you want to "fix" the orange problem, try
Code:
$ /usr/share/awk/soundex.awk test.dat | gawk 'NF==2{print $2}'
apples
pears
 
2 members found this post helpful.
Old 12-04-2013, 05:38 AM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Notice something nice about the method of PTrenholme: it is easily scalable.
Code:
echo "Identify input lines which appeared EXACTLY once"
awk '{a[$0]++} END{for (j in a) if (a[j]==1) print j}' $InFile >$OutFile

echo "Identify input lines which appeared EXACTLY twice"
awk '{a[$0]++} END{for (j in a) if (a[j]==2) print j}' $InFile >$OutFile

echo "Identify input lines which appeared EXACTLY thrice"
awk '{a[$0]++} END{for (j in a) if (a[j]==3) print j}' $InFile >$OutFile
Daniel B. Martin
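The three one-liners above differ only in the hard-coded count, so the same idea can be parameterised; a sketch along those lines, with the variable name n chosen for illustration and $InFile/$OutFile as in the post above:
Code:
# print the input lines that appear exactly n times; the count is passed in with -v
awk -v n=2 '{a[$0]++} END{for (j in a) if (a[j]==n) print j}' $InFile >$OutFile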
 
  

