[Perl] Failing to sort a file with 300,000 lines by multiple columns
Each line of the file I am sorting is in the following format:
<url> <month> <day>
For example: http://www.google.com 10 3
I wrote the following to sort it:
Code:
#!/usr/bin/perl
The output contains some out-of-order lines like these:
-----------------
url_one 10 1
url_two 10 1
url_three 10 3
url_four 10 1
----------------
Is that because my file is too big for Perl to handle? Any ideas are much appreciated.
-Thanks,
-Kun
|
Perl is designed to handle and process large amounts of data, so I imagine your 18MB file isn't "breaking" Perl. Actually, I'd bet money it isn't, because I've written Perl scripts to process large files (GBs in size).
Quote:
|
@OP, if you don't have to use Perl, you can try the standard GNU sort command.
|
I just found out the program also fails on new small files (for example, one with 100 lines containing real URLs). My earlier test files used simple strings in place of URLs, and the program worked on those files of 10-20 lines.
So is there a problem in the code itself?
Quote:
|
I tried 'sort +1 +2 [my_file]', but it reports 'sort: open failed: +1: No such file or directory'.
'sort -k2 -k3 [my_file]' does run, but the result is not correct either, even though the file does have three columns (separated by ' ').
Quote:
|
Show a few more input samples, and show your desired output when sorted.
|
Rather than slurp the file, I would read it in line by line, build up the @fields array a line at a time and then sort it. 18MB is not the end of the world, but as filesize grows, it gets less and less smart to slurp.
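A minimal sketch of that line-by-line approach, assuming each line is `<url> <month> <day>` separated by whitespace. The sub names `read_records` and `sort_records` and the sample URLs are made up for illustration, and an in-memory filehandle stands in for the real file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read records one line at a time instead of slurping the whole file.
# Assumed layout per line: <url> <month> <day>, separated by whitespace.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ( $url, $month, $day ) = split ' ', $line;
        push @records, [ $url, $month, $day ];
    }
    return @records;
}

# Order numerically by month, then by day.
sub sort_records {
    return sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @_;
}

# Demo: an in-memory filehandle stands in for the real file.
my $sample = "http://c.example/ 10 3\nhttp://a.example/ 9 12\nhttp://b.example/ 10 1\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
print join( ' ', @$_ ), "\n" for sort_records( read_records($fh) );
close $fh;
```

For a real file you would open it by name instead of using the in-memory string. The `<=>` operator keeps the comparison numeric, so day 12 sorts after day 2 rather than before it.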
|
A piece of the file is shown below (it has already been sorted once by the program).
The input and output files have the same content but in a different order. The last two fields are the month and the day, both numeric. Basically I want to sort by month, then by day. My program seems to succeed in sorting by month but not always by day.
Code:
http://www.amazon.com/Lawn-Garden-Tools-Hardware/b/ref=sa_menu_outequip11/191-6429805-3838363 9 5 |
You want a Schwartzian transform, I think. Here's how I might do it:
An input file (out of order):
Code:
http://www.amazon.com/b/ref=amb_link_7395972_64/191-6429805-3838363 9 6
Code:
#!/usr/bin/env perl
Code:
hektor ~ $ perl parser file.txt
|
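For readers who have not met the idiom, here is a rough sketch of a Schwartzian transform for this record layout. It is an illustration under assumptions, not necessarily the script posted above: the sub name `sort_by_month_day` and the example.com lines are invented, and each line is assumed to be `<url> <month> <day>`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Schwartzian transform: decorate each line with its sort keys,
# sort on the keys, then strip the keys off again.
# Read the chain bottom-up: decorate, sort, undecorate.
sub sort_by_month_day {
    my @lines = @_;
    return
        map  { $_->[0] }                                       # 3. keep only the original line
        sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }    # 2. month first, then day
        map  { my ( undef, $m, $d ) = split ' ', $_; [ $_, $m, $d ] }    # 1. [ line, month, day ]
        @lines;
}

# Made-up input, deliberately out of order.
my @demo = (
    "http://www.example.com/c 10 3",
    "http://www.example.com/a 9 12",
    "http://www.example.com/b 9 5",
);
print "$_\n" for sort_by_month_day(@demo);
```

The point of the decoration step is that each line is split only once, rather than on every comparison the sort makes.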
Thanks a lot, Telemachos! It works like a charm!
Could you possibly give me some explanation of it (or of what was wrong with my previous program)? The previous program was presented as a working example on many sites, although it did not work here... Those sites also mentioned that it uses a 'Schwartzian transform'.
|
I have to be honest, I came to this post a bit late and didn't even look at the first Perl version you had posted. (I knew it wasn't working, so I wrote something that I thought did work.) So I didn't see that there was in fact a Schwartzian transform there.
There were two basic problems with the script you started with: first, it was sorting by fields 2 and 1, even though field 1 was the URL; second, it was trying to sort by date before month. My guess is that the script you found was written for similar, but not identical records.
As for what my script does, here's the breakdown. I put the explanations into comments in the file. Hope this helps:
Code:
#!/usr/bin/env perl |
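To make the two pitfalls concrete, here is a small sketch using made-up records (it is not the script from the first post). It assumes records stored as `[ url, month, day ]` and shows what happens when a URL is compared with `<=>`, and how swapping the key order changes the result:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up records in the thread's layout: [ url, month, day ].
my @records = (
    [ 'http://www.example.com/a', 10, 1 ],
    [ 'http://www.example.com/b', 9,  6 ],
    [ 'http://www.example.com/c', 10, 3 ],
    [ 'http://www.example.com/d', 9,  5 ],
);

# Pitfall 1: <=> compares numbers. A URL has no numeric value, so every
# URL counts as 0 and all records tie on that key.
my $url_cmp;
{
    no warnings 'numeric';    # silence "Argument ... isn't numeric"
    $url_cmp = $records[0][0] <=> $records[1][0];
}
print "URLs compared with <=>: $url_cmp\n";

# Pitfall 2: key order matters. Day-before-month groups by day first.
my @day_first   = sort { $a->[2] <=> $b->[2] || $a->[1] <=> $b->[1] } @records;
my @month_first = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @records;

print "day first:   ", join( ', ', map { "$_->[1]/$_->[2]" } @day_first ),   "\n";
print "month first: ", join( ', ', map { "$_->[1]/$_->[2]" } @month_first ), "\n";
```

With `use warnings` on and the `no warnings 'numeric'` line removed, Perl flags the first pitfall itself with an "isn't numeric" warning, which is a quick way to catch a sort that is looking at the wrong field.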