LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-11-2011, 10:20 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,484

Rep: Reputation: 411
Keep duplicates based on first word only


I have a large file and want to keep only those lines which are duplicates, where the test for duplicates is applied only to the first blank-delimited word.

Sample input file:
ALBERT 54
BENJAMIN 37
BILL 24
BILL 25
BILL 77
CARL 40
CARL 44
CHESTER 59
DAVID 23
DAVID 23
DAVID 28
DAVID 61
EDGAR 33
EDWARD 54
EDWARD 59
EDWIN 30

Desired output file:
BILL 24
BILL 25
BILL 77
CARL 40
CARL 44
DAVID 23
DAVID 23
DAVID 28
DAVID 61
EDWARD 54
EDWARD 59

I'm a newbie and still learning the basics, so please:
- no awk
- no bash
- no Perl

Let's stick to commands such as uniq, sort, sed, grep, cut, paste, join, etc.
 
Old 03-11-2011, 10:37 AM   #2
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,836

Rep: Reputation: 1360
Sounds a lot like homework.

Is this homework?

Have you looked at the man page of 'sort'?
 
Old 03-11-2011, 10:45 AM   #3
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 670
This looks like homework. You can do it with 4 of the commands you listed. Show what you tried, and we can supply hints.
 
Old 03-11-2011, 10:49 AM   #4
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,836

Rep: Reputation: 1360
Double post

http://www.linuxquestions.org/questi...-theme-867810/
 
Old 03-12-2011, 11:08 AM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,484

Original Poster
Rep: Reputation: 411
Quote:
Originally Posted by szboardstretcher View Post
Sounds a lot like homework.
Is this homework?
I assure you, this is *not* homework! I am well into retirement (16 years now) and dabble in programming as a hobby, hoping to keep my brain from atrophying. Any LQ member who has lingering doubts is invited to contact me off-forum at danielbmartin ..aatt.. earthlink ..ddott.. net. I will respond with details about my employment history, details which should convince you that I am in compliance with LQ forum rules.

Quote:
Originally Posted by szboardstretcher View Post
Have you looked at the man page of 'sort'?
Yes. In fact, I have it as an icon on my Ubuntu desktop. The syntax is daunting. I wonder why there is a --unique option but not a complementary --notunique option. Technical intuition suggests that the logic which identifies uniques and keeps them could just as well identify duplicates and keep them.
 
Old 03-12-2011, 11:27 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
The uniq command can print all the duplicate lines and discard the rest. Moreover, it has an option to skip the first N fields. What would be useful is an option to compare only the first N fields; since that option doesn't exist, here is a workaround:
Code:
rev file | uniq -f1 -D | rev
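A quick way to check the pipeline above (a sketch; the filename sample.txt is illustrative, and the -D option requires GNU uniq):

```shell
# Recreate the sample input from the original post.
cat > sample.txt <<'EOF'
ALBERT 54
BENJAMIN 37
BILL 24
BILL 25
BILL 77
CARL 40
CARL 44
CHESTER 59
DAVID 23
DAVID 23
DAVID 28
DAVID 61
EDGAR 33
EDWARD 54
EDWARD 59
EDWIN 30
EOF

# rev reverses each line, so the number becomes the first field and the
# name becomes the last; uniq -f1 skips that first (reversed-number)
# field when comparing, and -D prints every member of each duplicate
# group; the final rev restores the lines to their original form.
rev sample.txt | uniq -f1 -D | rev
```

Run on the sample above, this prints exactly the desired output from the original post. Note that the trick relies on each line having exactly two fields; with extra trailing fields, skipping one reversed field would no longer isolate the name for comparison.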
As previously noticed, you have a double post with quite the same question. I'm going to close that one and keep this open for further discussion.
 
Old 03-12-2011, 11:48 AM   #7
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 670
Another thing to try is to use cut & uniq to return duplicated names, then grep for those names in the original list:
Code:
grep -f <(cut -f1 -d' ' file | uniq -d) file
FYI: the form <(command; command ...) presents the output of the commands in a form where a filename is expected.
I'll often use it when a command needs sorted input. Such as:
Code:
comm -23 <(sort list1) <(sort list2)
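One caveat about the grep -f approach above (a refinement, not part of the original suggestion): the names emitted by uniq -d are unanchored patterns, so a short name would also match any longer name that begins with it (BILL would match a hypothetical BILLY line). Anchoring each pattern guards against that; sample.txt and the sed step are illustrative additions:

```shell
# Recreate the sample input from the original post.
cat > sample.txt <<'EOF'
ALBERT 54
BENJAMIN 37
BILL 24
BILL 25
BILL 77
CARL 40
CARL 44
CHESTER 59
DAVID 23
DAVID 23
DAVID 28
DAVID 61
EDGAR 33
EDWARD 54
EDWARD 59
EDWIN 30
EOF

# cut -d' ' -f1   : extract the first blank-delimited word (the name)
# uniq -d         : print one copy of each name repeated on adjacent lines
# sed 's/.*/^& /' : turn each name into the anchored pattern "^NAME ",
#                   so it matches only a whole first word at line start
# grep -f         : keep every original line whose name is in that list
grep -f <(cut -d' ' -f1 sample.txt | uniq -d | sed 's/.*/^& /') sample.txt
```

As with the original command, the process substitution <(...) is a bash/zsh feature, so this needs to run under bash rather than plain sh.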
 
  

