LinuxQuestions.org
05-31-2012, 05:19 AM   #1
oreka18
LQ Newbie
 
Registered: May 2012
Posts: 7

remove records which have 2 same fields


How can I remove records that have the same two fields?
my file:
Code:
saeed 1 2 sa
vahid 2 3 45
reza 212 33 sa
amir 1 1 ui
reza 21 33 sa
I want to remove every record whose first and third fields are both the same as those of another record. Here, lines 3 and 5 must be removed.

Last edited by oreka18; 05-31-2012 at 05:20 AM.
 
05-31-2012, 08:13 AM   #2
CTM
Member
 
Registered: Apr 2004
Distribution: Slackware
Posts: 297

edit: code removed, as it doesn't do exactly what the OP asked (thanks danielbmartin for pointing that out). See grail's solution.

Last edited by CTM; 05-31-2012 at 10:09 AM.
 
05-31-2012, 08:52 AM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,700

Quote:
Originally Posted by CTM
Here's a Perl solution ...
With due respect, OP said lines 3 and 5 must be removed. Your solution removed only line 5.

Daniel B. Martin
 
05-31-2012, 10:02 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,643

Is the duplicate entry only likely to occur in pairs or is there no certainty behind the repeats?

The following works only if every repeated key occurs an even number of times:
Code:
awk '$1$3 in a{delete a[$1$3];next}{a[$1$3]=$0}END{for(i in a)print a[i]}' file
EDIT: It just came to me that a simple change should work for any number of repeats:
Code:
awk '$1$3 in a{a[$1$3]="";next}{a[$1$3]=$0}END{for(i in a)if(a[i])print a[i]}' file
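For anyone reading along, here is the second one-liner spread out with comments, as a rough sketch rather than a definitive version. The wrapper name dedupe_any and the temporary file are only illustrative, and the test a[k] != "" is a slightly stricter spelling of if(a[i]). Note that for (k in a) visits keys in an unspecified order, so the output is piped through sort to make it stable.

```shell
# Hypothetical helper wrapping the one-liner; the name dedupe_any is illustrative.
dedupe_any() {
  awk '
    # Key seen before: blank the stored record so it is never printed.
    ($1 $3) in a { a[$1 $3] = ""; next }
    # First time this key appears: remember the whole record.
    { a[$1 $3] = $0 }
    # Print only the records whose key was never repeated.
    END { for (k in a) if (a[k] != "") print a[k] }
  ' "$1"
}

# Sample data from the first post, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'saeed 1 2 sa' 'vahid 2 3 45' 'reza 212 33 sa' \
              'amir 1 1 ui' 'reza 21 33 sa' > "$infile"

# Array traversal order is unspecified, so sort for stable output.
dedupe_any "$infile" | sort
```

With the sample data this should leave only the saeed, vahid and amir records; both reza lines are dropped.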

Last edited by grail; 05-31-2012 at 10:06 AM.
 
05-31-2012, 11:16 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Another (similar) way:
Code:
awk 'FNR==NR {_[$1$3]++} FNR<NR && !(_[$1$3]-1)' file file
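The two-pass idea can be sketched in a longer form, assuming the file is small enough to read twice. The names two_pass and count are made up for illustration, and the sketch uses a comma subscript (which awk joins with SUBSEP) instead of the plain concatenation in the one-liner. Unlike the END-loop approach, this keeps the surviving records in their original order.

```shell
# Hypothetical helper; the same file is passed to awk twice.
two_pass() {
  awk '
    # Pass 1: FNR == NR holds only while the first file argument is read.
    # Count how often each (field 1, field 3) pair occurs.
    FNR == NR { count[$1, $3]++; next }
    # Pass 2: print the records whose pair occurred exactly once.
    count[$1, $3] == 1
  ' "$1" "$1"
}

# Sample data from the first post, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'saeed 1 2 sa' 'vahid 2 3 45' 'reza 212 33 sa' \
              'amir 1 1 ui' 'reza 21 33 sa' > "$infile"

two_pass "$infile"
```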
 
05-31-2012, 11:24 AM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,700

Quote:
Originally Posted by grail
Code:
awk '$1$3 in a{a[$1$3]="";next}{a[$1$3]=$0}END{for(i in a)if(a[i])print a[i]}' file
Love this solution... but I put on my "tester" hat and found a flaw. I constructed an input file which is an extension of that provided by OP.
Code:
saeed 1 2 sa
vahid 2 3 45
reza 212 33 sa (must toss)
amir 1 1 ui
reza 21 33 sa (must toss)
phil19 34 37 p3
jack22 41 47 jk
dave7 28 31 de
phil1 34 937 p4
dick7 42 55 c9
Note that there are two fellows named "phil", but they are distinct: one is phil19 (on line 6); the other is phil1 (on line 9). Lines 3 and 5 are the only lines that should be tossed. Your solution uses fields 1 and 3, concatenated, as an array subscript, so both phils produce the same subscript, "phil1937", and both are wrongly discarded. My test input file shows how that can lead to trouble.

Perhaps an array subscript that inserts "^" or some other special character between the two fields will avoid this minor hazard.
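A quick way to see the collision, as a small sketch: print both forms of the key for the two phil records. The function name show_keys is made up, and "^" is just the example separator mentioned above.

```shell
# Hypothetical helper: print the plain and the separated key for each record.
show_keys() {
  awk '{
    plain  = $1 $3        # plain concatenation, nothing between the fields
    joined = $1 "^" $3    # same fields with an explicit separator
    print plain, joined
  }'
}

printf '%s\n' 'phil19 34 37 p3' 'phil1 34 937 p4' | show_keys
# prints:
# phil1937 phil19^37
# phil1937 phil1^937
```

The first column is identical for both records, while the second column keeps them apart.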

Daniel B. Martin
 
05-31-2012, 12:12 PM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,700

Quote:
Originally Posted by colucix
Code:
awk 'FNR==NR {_[$1$3]++} FNR<NR && !(_[$1$3]-1)' file file
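A minor change, putting an explicit separator between the two fields, avoids the "two phils" pitfall cited previously. The separator must be a quoted string: writing $1 $3 with a bare space is still plain concatenation in awk.
Code:
awk 'FNR==NR {_[$1" "$3]++} FNR<NR && !(_[$1" "$3]-1)' "$InFile" "$InFile"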
Daniel B. Martin

Last edited by danielbmartin; 05-31-2012 at 12:15 PM. Reason: Simplify, simplify, simplify ...
 
05-31-2012, 02:13 PM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,643

I would probably use SUBSEP as the separator, since it is very unlikely to appear in the data and so won't be caught out by another such curve ball:
Code:
awk '$1SUBSEP$3 in a{a[$1,$3]="";next}{a[$1,$3]=$0}END{for(i in a)if(a[i])print a[i]}' file
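As a rough check of the SUBSEP version against the extended test file from post #6 (without the "(must toss)" notes), here is a commented sketch. The name dedupe_subsep is illustrative, and ($1, $3) in a is an equivalent spelling of $1SUBSEP$3 in a. Output order from the END loop is unspecified, hence the sort.

```shell
# Hypothetical helper; a[$1, $3] joins the two fields with SUBSEP.
dedupe_subsep() {
  awk '
    # Pair seen before: blank the stored record.
    ($1, $3) in a { a[$1, $3] = ""; next }
    # First sighting of this pair: remember the record.
    { a[$1, $3] = $0 }
    # Print only the records whose pair was never repeated.
    END { for (k in a) if (a[k] != "") print a[k] }
  ' "$1"
}

# Extended test data from post #6, written to a temporary file.
infile=$(mktemp)
printf '%s\n' 'saeed 1 2 sa' 'vahid 2 3 45' 'reza 212 33 sa' 'amir 1 1 ui' \
              'reza 21 33 sa' 'phil19 34 37 p3' 'jack22 41 47 jk' \
              'dave7 28 31 de' 'phil1 34 937 p4' 'dick7 42 55 c9' > "$infile"

dedupe_subsep "$infile" | sort
```

Both phil records should survive here, and only the two reza lines should be missing from the output.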

Last edited by grail; 05-31-2012 at 02:14 PM.
 
  


All times are GMT -5. The time now is 07:26 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration