#1, 04-01-2009, 08:14 AM
LQ Newbie, Registered: Apr 2009, Posts: 6
Perl question: delete line from text file with duplicate match at beginning of line
Hi all:
I was wondering if any Perl gurus could help me with a quick log file adjustment. I have a text file that looks like this (tabs and newlines are shown explicitly so you can see what separates the data):
1234 {tab} purchase {tab} sale {newline}
4567 {tab} broken {tab} sale {newline}
4588 {tab} theft {tab} misc {newline}
1234 {tab} purchase {tab} audit {newline}
There are maybe 100 lines of text in this file at any given time. I need to delete all duplicate lines, looking only at the text before the first tab. It doesn't matter which one gets deleted, as long as no two lines begin with the same text before the first tab. So in this example, either the first line "1234" or the last line "1234" would need to be deleted. I already have code in my script that opens the files - I just need the code to read the text into an array, find matches based on the above criteria, and make the deletions.
If it would be easier, I can even do a system call and use SED (v4.1.5) and/or AWK (3.1.5) instead.
With kind regards.

#2, 04-01-2009, 08:25 AM
Senior Member, Registered: May 2005, Posts: 4,397
Quote:
Originally Posted by mrealty
Hi all:
I was wondering if any Perl gurus could help me with a quick log file adjustment. I have a text file that looks like this (tabs and newlines are shown explicitly so you can see what separates the data):
1234 {tab} purchase {tab} sale {newline}
4567 {tab} broken {tab} sale {newline}
4588 {tab} theft {tab} misc {newline}
1234 {tab} purchase {tab} audit {newline}
There are maybe 100 lines of text in this file at any given time. I need to delete all duplicate lines, looking only at the text before the first tab. It doesn't matter which one gets deleted, as long as no two lines begin with the same text before the first tab. So in this example, either the first line "1234" or the last line "1234" would need to be deleted. I already have code in my script that opens the files - I just need the code to read the text into an array, find matches based on the above criteria, and make the deletions.
If it would be easier, I can even do a system call and use SED (v4.1.5) and/or AWK (3.1.5) instead.
With kind regards.
|
Think about your problem from a different angle. Consider the first field (e.g. "1234") as hash key, and the rest of the line as value.
If you build your hash this way, then since keys are unique, there will be exactly one line per unique key, and that line will be the key followed by the key's value.

#3, 04-01-2009, 09:00 AM
LQ Newbie, Original Poster, Registered: Apr 2009, Posts: 6
Thanks for the speedy reply.
I see where you're coming from, but I still don't see the whole picture. I'm not clear on the conditions. Read line, if hash value exists, then that section of the array gets that line, but what if the hash value does not exist? Then insert? Not sure how to read it in to begin with. Do I need two arrays?
With kind regards.

#4, 04-01-2009, 09:25 AM
Senior Member, Registered: May 2005, Posts: 4,397
Quote:
Originally Posted by mrealty
Thanks for the speedy reply.
I see where you're coming from, but I still don't see the whole picture. I'm not clear on the conditions. Read line, if hash value exists, then that section of the array gets that line, but what if the hash value does not exist? Then insert? Not sure how to read it in to begin with. Do I need two arrays?
With kind regards.
|
There is no array, and there is no "if" - just add the hash key => value unconditionally.
I.e.
- read the line;
- split it into the first field and the rest;
- unconditionally insert the first_field => the_rest into the hash.
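
The three steps above could be sketched in Perl roughly as follows. This is only an illustration, not code from the thread: the helper name dedupe_by_first_field and the in-memory sample data are made up for the example, and it assumes the fields are separated by a literal tab character.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read lines from a filehandle
# and build a hash keyed on the text before the first tab.
sub dedupe_by_first_field {
    my ($fh) = @_;
    my %rest_of_line;
    while (my $line = <$fh>) {
        chomp $line;
        # A limit of 2 splits at the first tab only, so the remainder
        # of the line (including its other tabs) stays in one piece.
        my ($first_field, $the_rest) = split /\t/, $line, 2;
        next unless defined $first_field && length $first_field;  # skip blank lines
        # Unconditional insert: a later duplicate key simply overwrites
        # the earlier entry, so no "if" is needed.
        $rest_of_line{$first_field} = defined $the_rest ? $the_rest : '';
    }
    return %rest_of_line;
}

# Demo on sample data held in memory (assumed layout, modelled on post #1).
my $sample = "1234\tpurchase\tsale\n"
           . "4567\tbroken\tsale\n"
           . "4588\ttheft\tmisc\n"
           . "1234\tpurchase\taudit\n";
open my $in, '<', \$sample or die "cannot open in-memory data: $!";
my %deduped = dedupe_by_first_field($in);
close $in;

# Sorted only so the demo output is stable.
for my $key (sort keys %deduped) {
    print "$key\t$deduped{$key}\n";
}
```

With the sample data, three lines are printed, and the surviving 1234 entry is the "audit" one, because the last assignment for a key wins.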

#5, 04-01-2009, 10:46 AM
LQ Newbie, Original Poster, Registered: Apr 2009, Posts: 6
Sorry, I'm old school and don't remember hash data structure from the language I learned (Turbo Pascal about 20 years ago). This is about 2 lines of code, correct?
I don't know how to "say" that pseudo code in perl.
With kind regards.

#6, 04-01-2009, 11:17 AM
Senior Member, Registered: May 2005, Posts: 4,397
Quote:
Originally Posted by mrealty
Sorry, I'm old school and don't remember hash data structure from the language I learned (Turbo Pascal about 20 years ago). This is about 2 lines of code, correct?
I don't know how to "say" that pseudo code in perl.
With kind regards.
|
Yes, it's about two lines in Perl.
Have you started learning Perl at all? I.e., have you written any Perl code with hashes?

#7, 04-01-2009, 12:01 PM
Member, Registered: May 2007, Distribution: Debian, Posts: 754
Here's a version with some commentary. It's more than two lines, but I'm not a fan of shortest possible code for its own sake.
Code:
#!/usr/bin/env perl
use strict;
use warnings;

my %file_hash;

while (<>) {
    next unless $_ =~ m/^\d/;    # line doesn't begin with a digit; skip

    # split line into the leading digits and everything else;
    # assign the digits to $key and everything else to $value
    my ($key, $value) = ($_ =~ m/^(\d+)(.*)/);

    # each line becomes one entry in the hash %file_hash;
    # since hash keys must be unique, any repeat overwrites
    # the previous duplicate (i.e., the second 1234 overwrites
    # the first 1234 and a third would overwrite the second)
    $file_hash{$key} = $value;
}

# go through the hash and print out what's left
# (keys returns the entries in no particular order)
for my $key (keys %file_hash) {
    print $key, $file_hash{$key}, "\n";
}
If you save this code as, say, file_fixer, you can run it by typing perl file_fixer file-name, substituting the name of your file for file-name. The output will print to your terminal. If the output looks sane, you can save it with shell redirection: perl file_fixer file-name > new-file
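
One thing to keep in mind is that keys returns hash entries in no particular order, so the surviving lines may come out in a different order than they went in. If the original order matters, a common alternative is a "seen" hash that keeps the first occurrence of each key instead of the last. The sketch below is a variation on the idea, not the script above; the name unique_by_first_field is invented for the example, and it assumes tab-separated fields.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the lines whose first
# tab-separated field has not been seen before, in their original order.
sub unique_by_first_field {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        my ($key) = split /\t/, $line, 2;   # text before the first tab
        next unless defined $key;           # skip completely empty strings
        # $seen{$key}++ is false (0) the first time a key appears and
        # true afterwards, so only the first occurrence is kept.
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Demo on sample data (assumed layout, modelled on post #1).
my @sample = (
    "1234\tpurchase\tsale\n",
    "4567\tbroken\tsale\n",
    "4588\ttheft\tmisc\n",
    "1234\tpurchase\taudit\n",
);
print unique_by_first_field(@sample);
```

Here the first 1234 line ("sale") is kept and the later one ("audit") is dropped, which still satisfies the original requirement that it doesn't matter which duplicate goes.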

#8, 04-01-2009, 06:46 PM
LQ Newbie, Original Poster, Registered: Apr 2009, Posts: 6
Quote:
Originally Posted by Telemachos
Here's a version with some commentary. It's more than two lines, but I'm not a fan of shortest possible code for its own sake.
|
Telemachos! Thank you! You are a valuable asset to this forum. It worked beautifully (with a slight modification, as I didn't mention there was a header line in that file, but all is well). Thank you so much for explaining it too.
With kind regards.
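
The header-line modification mentioned above is not shown in the thread. One possible way to handle it, purely as a guess, is to set the first line aside, pass it through untouched, and apply the hash approach to the remaining lines. In this sketch the name dedupe_keep_header and the header text "id / reason / type" are both invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): treat the first line as a
# header, then deduplicate the remaining lines on their leading digits.
sub dedupe_keep_header {
    my ($header, @rows) = @_;
    my %value_for;
    for my $row (@rows) {
        # Same idea as the script in post #7: leading digits are the key,
        # the rest of the line (without the newline) is the value.
        next unless $row =~ m/^(\d+)(.*)/;
        $value_for{$1} = $2;    # a later duplicate overwrites an earlier one
    }
    my @out = defined $header ? ($header) : ();
    # Numeric sort only to make the output order predictable.
    for my $key (sort { $a <=> $b } keys %value_for) {
        push @out, $key . $value_for{$key} . "\n";
    }
    return @out;
}

# Demo: the header line here is made up; the real file's header is unknown.
my @sample = (
    "id\treason\ttype\n",
    "1234\tpurchase\tsale\n",
    "4567\tbroken\tsale\n",
    "4588\ttheft\tmisc\n",
    "1234\tpurchase\taudit\n",
);
print dedupe_keep_header(@sample);
```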