Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
12-03-2008, 10:25 PM
#1
LQ Newbie
Registered: Dec 2008
Posts: 6
How to compare two lines and delete the duplicate line from a file?
Hi Friends,
I have a file with contents as given below.
1111|9999|||1|WHI1|Name1||0|0||
1111|9999|1111CS|9999|2|WHI1|Name1||0|0||
1111|55555|||1|MER|Name2||0|0||
1111|55555|22222|55555|2|MER|Name2||0|0||
I want to compare the first two fields (separated by the "|" symbol) of each line and delete the entire line if it duplicates an earlier one.
Example:
1111|9999|||1|WHI1|Name1||0|0||
1111|9999|1111CS|9999|2|WHI1|Name1||0|0||
In these two lines, the value "1111|9999" matches, and the second line (which is the duplicate) should be removed from the file.
The search should continue to the end of the file to remove all duplicates.
Can you please help me with this?
Thanks in advance.
Have a great day!!!
12-04-2008, 12:53 AM
#2
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
Hi,
And welcome to LQ!
You're positive that the second line with the same
matching field is always the duplicate that needs
to be removed?
Code:
sort -t\| -k 1,2 -u dupes.txt
1111|55555|||1|MER|Name2||0|0||
1111|9999|||1|WHI1|Name1||0|0||
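One caveat with the sort approach: it reorders the lines, and POSIX does not say which of several lines with equal keys -u keeps. If the original order matters, an awk one-liner is a common alternative. A minimal sketch, assuming the key is the first two fields and the first occurrence is the one to keep; the file names sample.txt and no_dupes.txt are made up for the illustration:

```shell
# Sample input copied from the first post (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
1111|9999|||1|WHI1|Name1||0|0||
1111|9999|1111CS|9999|2|WHI1|Name1||0|0||
1111|55555|||1|MER|Name2||0|0||
1111|55555|22222|55555|2|MER|Name2||0|0||
EOF

# Print a line only the first time its (field 1, field 2) pair is seen.
# seen[key]++ evaluates to 0 on first sight, so !seen[key]++ is true once per key.
awk -F'|' '!seen[$1 FS $2]++' sample.txt > no_dupes.txt

cat no_dupes.txt
```

This keeps the 9999 line ahead of the 55555 line, as in the input. To replace the original file, write to a temporary file first and then mv it over the original rather than redirecting onto the input.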
12-04-2008, 01:14 AM
#3
LQ Newbie
Registered: Dec 2008
Posts: 6
Original Poster
Yes, the second line will always be the duplicate if it matches.
Thanks a lot for your help.
Last edited by Shobhna; 12-04-2008 at 01:16 AM.
12-04-2008, 03:03 AM
#4
LQ Newbie
Registered: Dec 2008
Posts: 6
Original Poster
Hi,
From the above example file, is it possible to export the fields separated by "|" to an Excel file?
Can you please let me know.
Thanks..
12-04-2008, 03:09 AM
#5
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
Well, not strictly speaking an Excel file, but it's trivial
to convert it to CSV, which Excel will most happily open
directly:
Code:
sort -t\| -k 1,2 -u dupes.txt | sed -e 's/|/","/g' -e 's/^/"/' -e 's/$/"/' > no_dupes.csv
*if* this is what you mean by "exporting to an excel file".
If it's not - please explain in more detail what you're
trying to achieve.
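Note that the sed version wraps every field in quotes but does not touch quotes inside a field. CSV (RFC 4180) escapes an embedded double quote by doubling it, so if the data could ever contain one, something like the following may be safer. This is only a sketch: the second sample line with the embedded quotes is invented for the illustration, and the file names sample_in.txt and sample.csv are hypothetical.

```shell
# Two sample records; the second has an invented field containing double quotes.
printf '%s\n' \
    '1111|55555|||1|MER|Name2||0|0||' \
    '1111|9999|||1|WHI1|Say "hi"||0|0||' > sample_in.txt

awk -F'|' '{
    line = ""
    for (i = 1; i <= NF; i++) {
        field = $i
        gsub(/"/, "\"\"", field)                  # double any embedded quote
        line = line (i > 1 ? "," : "") "\"" field "\""
    }
    print line                                    # one quoted CSV record per input line
}' sample_in.txt > sample.csv

cat sample.csv
```

For input without embedded quotes this produces the same records as the sed command above.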
Last edited by Tinkster; 12-04-2008 at 03:10 AM.
12-04-2008, 03:23 AM
#6
LQ Newbie
Registered: Dec 2008
Posts: 6
Original Poster
Thanks a lot..
I'll try this.
12-04-2008, 04:07 AM
#7
LQ Newbie
Registered: Dec 2008
Posts: 6
In Excel, try File -> Import; there you can set the delimiter character to '|' or whatever you need.
12-04-2008, 07:40 AM
#8
LQ Newbie
Registered: Dec 2008
Posts: 6
Original Poster
Hi,
Can somebody help me with this:
How do I add new fields to the end of each line in a file based on a condition?
For Example,
If the 6th field is "MER" concatenate "NEW1" and "Mercury" to the end of the line. And if it's "WHI1", concatenate "NEW2" and "White" to the end of the line separated by "|" as shown below.
Original File:
1111|55555|||1|MER|Name2||0|0||
1111|9999|||1|WHI1|Name1||0|0||
New File:
1111|55555|||1|MER|Name2||0|0||NEW1|Mercury
1111|9999|||1|WHI1|Name1||0|0||NEW2|White
Thank you.
12-04-2008, 11:13 AM
#9
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
Code:
awk -F\| '$6 ~ /MER/ {print $0"|New1|Mercury"} $6 ~ /WHI1/{print $0"|New2|White"}' dupes.txt
1111|9999|||1|WHI1|Name1||0|0|||New2|White
1111|55555|||1|MER|Name2||0|0|||New1|Mercury
Cheers,
Tink
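The command above differs from the requested output in a few small ways: it appends an extra "|" before the new fields, writes "New1"/"New2" rather than "NEW1"/"NEW2", uses a regex match (so a 6th field such as "MERC" would also match), and drops lines that match neither value. A possible variant that reproduces the "New File" lines from the question is sketched below. It assumes an exact match on the 6th field is wanted and that other lines should pass through unchanged, which the question does not state; the file names are hypothetical.

```shell
# The two de-duplicated lines from the question (hypothetical file name).
cat > sample_fields.txt <<'EOF'
1111|55555|||1|MER|Name2||0|0||
1111|9999|||1|WHI1|Name1||0|0||
EOF

awk -F'|' '
    $6 == "MER"  { print $0 "NEW1|Mercury"; next }   # line already ends in "|", so no extra pipe
    $6 == "WHI1" { print $0 "NEW2|White";   next }
    { print }                                        # assumption: keep all other lines as they are
' sample_fields.txt > sample_fields_new.txt

cat sample_fields_new.txt
```

Because each input line here already ends with "||", appending "NEW1|Mercury" directly puts NEW1 in the previously empty last field, matching the example in the question.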
12-04-2008, 09:53 PM
#10
LQ Newbie
Registered: Dec 2008
Posts: 6
Original Poster
Thank you :-)
12-05-2008, 01:08 PM
#11
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
welcome ... ;}
All times are GMT -5.