Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
05-22-2012, 05:02 PM
|
#1
|
|
LQ Newbie
Registered: Jan 2010
Posts: 20
Rep:
|
test file processing question
Hi all,
I have a dilemma I hope you can help me solve.
I have a largish text file that I want to process.
The file has 4 columns separated by tabs, so it looks like this:
fred john dave pete
dave pete terry phil
john dave pete fred
I would like to remove all lines where there are more than one duplicate entry in column 4.
I am not looking to remove duplicates, I want to completely remove the lines that have more than one entry in column 4.
So if 2 or more entries in column 4 are the same, remove all of those rows. I want to be left with only the rows whose column 4 value appears exactly once.
Could you help with this? If so, thanks in advance.
seabro
PS. I am using CentOS, so I guess tools like grep and awk might do it; I just don't know how.
Last edited by seabro; 05-22-2012 at 05:06 PM.
|
|
|
|
|
05-22-2012, 05:30 PM
|
#2
|
|
Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 765
Rep: 
|
Quote:
Originally Posted by seabro
I would like to remove all lines where there are more than one duplicate entry in column 4.
|
As you describe the file and as you show by example, column 4 consists of one word. Consequently it is impossible to have a duplicate entry in column 4. Please elaborate or correct your problem statement.
It is always helpful to give two sample files, a Before and an After. You gave a Before but no After.
Daniel B. Martin
|
|
|
|
05-22-2012, 05:34 PM
|
#3
|
|
LQ Newbie
Registered: Jan 2010
Posts: 20
Original Poster
Rep:
|
hi Daniel,
thanks for your reply.
I guess I forgot to add that the duplication test should be applied vertically, that is, down column 4 across all lines.
Before:
bob jon fred pete
fred john phil dave
mike phil john pete
fred jack bob seabro
After:
fred john phil dave
fred jack bob seabro
I hope that makes more sense!
thanks again,
Seabro
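To make the "vertical" test concrete, here is a minimal sketch that lists which column 4 values occur more than once in the sample above. The temporary file and the helper name `repeated_col4` are assumptions introduced for illustration; they are not part of the original post.

```shell
# Recreate the "Before" sample as a tab-separated temporary file.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    bob jon fred pete \
    fred john phil dave \
    mike phil john pete \
    fred jack bob seabro > "$sample"

# cut -f4 extracts the fourth tab-separated column, sort groups equal
# values together, and uniq -d prints one copy of each repeated value.
repeated_col4() {
    cut -f4 "$1" | sort | uniq -d
}

repeated_col4 "$sample"
```

On this sample the only repeated value is `pete`, so both lines ending in `pete` are the ones to drop.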
|
|
|
|
05-22-2012, 06:23 PM
|
#4
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,897
|
Code:
awk 'NR == FNR { _[$4]++ } NR > FNR && _[$4] == 1' file file
The double argument is not an error: it makes awk read the file twice. On the first pass (NR == FNR) it counts how many times each name occurs in field 4; on the second pass (NR > FNR) it prints only the lines whose field 4 name occurred exactly once. Hope this helps.
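As a worked illustration of the two-pass approach, the sketch below runs an equivalent awk program on the sample data from post #3. The temporary file, the helper name `unique_col4`, the array name `count`, the `next` statement and the `-F'\t'` option are all assumptions added here. The original command relies on awk's default whitespace splitting, which gives the same result unless a field itself contains spaces.

```shell
# Recreate the "before" sample from post #3 as a tab-separated temporary file.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    bob jon fred pete \
    fred john phil dave \
    mike phil john pete \
    fred jack bob seabro > "$sample"

# Pass 1 (NR == FNR): count each column 4 value, then skip to the next line.
# Pass 2: print only the lines whose column 4 value was counted exactly once.
unique_col4() {
    awk -F'\t' 'NR == FNR { count[$4]++; next } count[$4] == 1' "$1" "$1"
}

unique_col4 "$sample"
```

On this sample it prints the `dave` and `seabro` lines, in their original order, and drops both `pete` lines.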
|
|
|
3 members found this post helpful.
|
05-22-2012, 07:43 PM
|
#5
|
|
Member
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 560
Rep: 
|
Code:
sort -k4 test.txt | uniq -f3 -u
Sort on the fourth field so that lines with the same column 4 value end up adjacent, then have uniq skip the first three fields (-f3) and print only the lines that are not repeated (-u).
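The same pipeline can be tried on the sample data from post #3. This is a sketch only: the temporary file and the helper name `unique_col4_sorted` are assumptions introduced for illustration.

```shell
# Recreate the "before" sample from post #3 as a tab-separated temporary file.
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
    bob jon fred pete \
    fred john phil dave \
    mike phil john pete \
    fred jack bob seabro > "$sample"

# sort -k4 orders lines by field 4 so equal column 4 values are adjacent.
# uniq -f3 ignores the first 3 fields when comparing adjacent lines,
# and -u prints only the lines that have no adjacent match.
unique_col4_sorted() {
    sort -k4 "$1" | uniq -f3 -u
}

unique_col4_sorted "$sample"
```

Two things are worth keeping in mind with this approach. The output comes back sorted by column 4 rather than in the original line order (on this sample the two orders happen to coincide). Also, uniq counts fields by runs of blanks, so it assumes that no field contains embedded spaces.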
|
|
|
3 members found this post helpful.
|
05-31-2012, 11:02 AM
|
#6
|
|
LQ Newbie
Registered: Jan 2010
Posts: 20
Original Poster
Rep:
|
Thanks Guys, that worked a treat!
|
|
|
|
All times are GMT -5.