LinuxQuestions.org
Old 04-08-2010, 05:16 AM   #1
himu3118
Member
 
Registered: Mar 2010
Posts: 35

Rep: Reputation: 15
remove duplicate lines from shell script


I have a file with semi-duplicate lines, like:
abc 12 32
agsi 82
sha 26
abc 1
iaij
agsi 3

Now I want to edit my file so that it becomes:
abc 12 32
agsi 82
sha 26
iaij
i.e. remove the second occurrence of a line when the first column is abc or agsi.
 
Old 04-08-2010, 05:19 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
This LQ thread might help.

EDIT: it is helpful if you mark your solved threads SOLVED using the Thread Tools menu.

Last edited by catkin; 04-08-2010 at 05:21 AM.
 
Old 04-08-2010, 05:20 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Code:
awk '!_[$1]++' file
 
4 members found this post helpful.
Old 04-08-2010, 05:22 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Servers: Debian Squeeze and Wheezy. Desktop: Slackware64 14.0. Netbook: Slackware 13.37
Posts: 8,563
Blog Entries: 29

Rep: Reputation: 1179
Quote:
Originally Posted by ghostdog74
Code:
awk '!_[$1]++' file
Wow ghostdog74, that's gloriously minimal!

Could you spell out for us how it works?

Last edited by catkin; 04-08-2010 at 05:23 AM.
 
1 member found this post helpful.
Old 04-08-2010, 05:39 AM   #5
himu3118
Member
 
Registered: Mar 2010
Posts: 35

Original Poster
Rep: Reputation: 15
Thanks a lot, it's working fine.

But I have one problem: there is one more line whose second occurrence I don't want to remove, like:
abc 12 32
agsi 82
sha 26
abc 1
iaij
agsi 3
iaij

I just want to remove abc 1 and agsi 3 but not iaij; it's removing all three.
 
Old 04-08-2010, 05:52 AM   #6
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by catkin
Wow ghostdog74, that's gloriously minimal!
+1 from me
Quote:
Could you spell out for us how it works?
May I?

Let's try: ! is the negation operator, _ is an array name (it could have been "array", "pippo" or anything else, but undoubtedly a simple underscore is minimal and effective).

The expression _[$1]++ increments the array element with index $1 by one. Since the ++ comes after the operand, the expression yields the value from before the increment, so at the first occurrence of $1 the value is still 0 (false). For any later occurrence of $1 the value is >0 (true).

The negation inverts the boolean logic, so the pattern is true only for the first occurrence of each $1. Genius!

Finally, no action is associated with the expression, so the default one (print $0) is applied.
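
The explanation above can be spelled out as a long-hand version of the one-liner. This is only a sketch for illustration: the array name seen and the function name dedupe_first_field are made up here and are not part of the original post.

```shell
# Long-hand equivalent of: awk '!_[$1]++' file
# "seen" and "dedupe_first_field" are illustrative names, not from the thread.
dedupe_first_field() {
    awk '{
        if (seen[$1] == 0) {   # value before the increment: 0 on first occurrence
            print $0           # the default action, written out explicitly
        }
        seen[$1]++             # count this occurrence of the first field
    }' "$@"
}

# Sample data from post #1; abc 1 and agsi 3 are dropped.
printf 'abc 12 32\nagsi 82\nsha 26\nabc 1\niaij\nagsi 3\n' | dedupe_first_field
```

With no file argument the function reads standard input; a file name can be passed instead, as in dedupe_first_field file.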
 
5 members found this post helpful.
Old 04-08-2010, 05:57 AM   #7
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by himu3118
I just want to remove abc 1 and agsi 3 but not iaij; it's removing all three.
This complicates things a bit. What is the logic for not removing duplicates of "iaij"? Is it simply an exclusion of some specific strings? Or does it depend on the absence of any other (numeric) field on the same line?
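
The second reading of the question, where a line is exempt from de-duplication when it has no other field, could be sketched as follows. It assumes that "no other field" means NF == 1, which the original poster has not confirmed at this point; dedupe_multi_field is an illustrative name.

```shell
# Sketch: de-duplicate on the first field, but always keep single-field lines.
# Assumption: a line "without any other field" is one where NF == 1.
dedupe_multi_field() {
    awk 'NF == 1 || !_[$1]++' "$@"
}

# Sample data from post #5; both iaij lines survive, abc 1 and agsi 3 do not.
printf 'abc 12 32\nagsi 82\nsha 26\nabc 1\niaij\nagsi 3\niaij\n' | dedupe_multi_field
```

Because || short-circuits, single-field lines are printed without ever being counted in the array.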
 
Old 04-08-2010, 06:04 AM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,698

Rep: Reputation: 1988
Legen ..... wait for it ...... dary

Last edited by grail; 04-08-2010 at 06:07 AM.
 
Old 04-08-2010, 06:04 AM   #9
himu3118
Member
 
Registered: Mar 2010
Posts: 35

Original Poster
Rep: Reputation: 15
Actually, I only have to remove specific strings. I can have any number of duplicate strings, but I only want to remove the second occurrence of some specific strings, here abc and agsi.
 
Old 04-08-2010, 06:20 AM   #10
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
If the list of words whose duplicates you want to remove is not too long to type, you can try something like:
Code:
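awk '
BEGIN{ nodup["abc"]     # referencing an element is enough to create the key
       nodup["agsi"]
}
# print first occurrences, plus any line whose first field is not in nodup
!_[$1]++ || !($1 in nodup)' file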
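
If typing the words into the BEGIN block becomes tedious, one possible variant passes them in as a space-separated string. This is a sketch under that assumption, not part of the original answer: the awk variable words and the function name dedupe_listed are hypothetical.

```shell
# Variant of the script above: the words to de-duplicate come from the first
# argument instead of being hard-coded. "words" and "dedupe_listed" are
# illustrative names, not from the thread.
dedupe_listed() {
    list=$1; shift
    awk -v words="$list" '
    BEGIN {
        n = split(words, w, " ")                    # split on whitespace
        for (i = 1; i <= n; i++) nodup[w[i]] = 1    # build the lookup set
    }
    !_[$1]++ || !($1 in nodup)' "$@"
}

# Sample data from post #5; only the repeats of abc and agsi are removed.
printf 'abc 12 32\nagsi 82\nsha 26\nabc 1\niaij\nagsi 3\niaij\n' | dedupe_listed "abc agsi"
```

Any remaining arguments are treated as input files, for example dedupe_listed "abc agsi" file.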
 
Old 04-08-2010, 06:36 AM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by colucix
+1 from me

May I?
that's about it
 
Old 04-08-2010, 06:42 AM   #12
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by himu3118
Thanks a lot, it's working fine.

But I have one problem: there is one more line whose second occurrence I don't want to remove, like:
abc 12 32
agsi 82
sha 26
abc 1
iaij
agsi 3
iaij

I just want to remove abc 1 and agsi 3 but not iaij; it's removing all three.
Please test this with more data (if I get what you want); otherwise, use the other solutions.
Code:
awk '!_[$1]++ || $1 == "iaij"' file
 
Old 04-08-2010, 07:11 AM   #13
himu3118
Member
 
Registered: Mar 2010
Posts: 35

Original Poster
Rep: Reputation: 15
Thanks a lot, all.
The list of strings is not that long; I can type it easily.
It's perfect.

It's working great.

Last edited by himu3118; 04-09-2010 at 06:42 AM.
 
Old 04-08-2010, 07:12 AM   #14
EricTRA
Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1291
Hi,

Thank you very much, both ghostdog74 and colucix. This is why I LOVE LQ, I'm always learning new stuff.

Kind regards,

Eric
 
Old 03-30-2012, 03:17 PM   #15
januka
LQ Newbie
 
Registered: Sep 2010
Posts: 16

Rep: Reputation: 0
ghostdog74

Wow ghostdog74! Gloriously minimal indeed !!! These are tears of joy !!
 
  


All times are GMT -5.
