sort large data in large file - commands
Hi there,
I have a document (700MB) that contains about 2-3 million lines of data. The data format is like this:
xxx.yyy.zzz.aaa (IP address) date/time url link some numbers more numbers TCP_xxx:yyy
The IP addresses are in the internal IP range, and each IP address may be repeated several hundred times. My aim is to sort the IP addresses in some order (maybe by running a sed command) so that identical IP addresses are grouped one below the other. I want to sort the report by IP address, not by name, and I don't want to delete duplicates.
I had a go with the sed command but got stuck. I ran the split command and broke the file down into 20MB pieces, but still no joy; I just multiplied my problem. Can someone advise on how to achieve the above, please?
Thank you in advance
Denis |
I think the sort command would be adequate.
|
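Yes, sort sounds like all you need. Use the '-n' option to make it use numeric, rather than alphabetic, order. Note that '-n' on its own only reads the leading number on each line, so for dotted IP addresses you will want one numeric key per octet, with '.' as the field separator. It could take a while.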
--- rod. |
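A per-octet numeric sort is one way to get true IP order while keeping every duplicate line. The sketch below is only an illustration: the sample lines and the file name access.log are made up, and it assumes the IP address is the first thing on each line.

```shell
# Hypothetical sample data; the real log layout is only known from the description in the question.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
10.0.0.20 2010-01-01/10:00 http://example.com/a 200 512 TCP_HIT:NONE
10.0.0.3 2010-01-01/10:01 http://example.com/b 200 128 TCP_MISS:DIRECT
192.168.1.5 2010-01-01/10:02 http://example.com/c 304 0 TCP_HIT:NONE
10.0.0.3 2010-01-01/10:03 http://example.com/d 200 64 TCP_MISS:DIRECT
EOF

# Split each line on dots and compare the four octets numerically, one key per octet.
# The fourth field also holds the rest of the line, but -n only reads its leading digits.
# There is no -u, so repeated addresses are kept and end up next to each other.
sort -t . -k1,1n -k2,2n -k3,3n -k4,4n "$workdir/access.log" > "$workdir/sorted.log"
cat "$workdir/sorted.log"
```

With a plain alphabetic sort, 10.0.0.20 would come before 10.0.0.3; the numeric keys avoid that. On the real file, replace the sample path with the path to the log.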
Well, the first thing is that there's no point in sorting IP addresses, as they're simply numbers (and not an index like 1, 2, 3...), so sorting them from lower to higher or higher to lower has no meaning. But in your case, I guess you want to sort (in fact, extract) only the unique IP addresses.
A simple uniq command will not help, but sort -u will. So, you can try either of these to keep one line per IP address:
Code:
awk '!_[$1]++' /path/to/file
Code:
sort -t" " -k1,1 -u /path/to/file
Or, to keep every line and just sort by the first field:
Code:
sort -t" " -k1,1 /path/to/file | more |
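To make the difference between these commands concrete, here is a small sketch on made-up sample lines (the file names and contents are hypothetical). It contrasts keeping one line per IP address with keeping every line grouped by IP address, which is what the original question asked for.

```shell
# Hypothetical sample data with repeated IP addresses in the first field.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
192.168.1.5 req1 TCP_HIT:NONE
10.0.0.3 req2 TCP_MISS:DIRECT
192.168.1.5 req3 TCP_HIT:NONE
10.0.0.3 req4 TCP_MISS:DIRECT
10.0.0.7 req5 TCP_HIT:NONE
EOF

# First line seen for each IP address, in the original file order.
awk '!seen[$1]++' "$workdir/access.log" > "$workdir/first_per_ip.txt"

# One line per IP address, sorted; -k1,1 limits the key to the first field only.
sort -t ' ' -k1,1 -u "$workdir/access.log" > "$workdir/unique_ips.txt"

# Every line kept, with identical IP addresses grouped together.
sort -t ' ' -k1,1 "$workdir/access.log" > "$workdir/grouped.txt"

# Line counts show which commands drop duplicates and which keep them.
wc -l "$workdir/first_per_ip.txt" "$workdir/unique_ips.txt" "$workdir/grouped.txt"
```

Only the last command preserves all the input lines, so it is the one to use when duplicates must not be deleted.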
To speed up the process, you can put your file in RAM using this tutorial:
http://www.thegeekstuff.com/2008/11/...mpfs-on-linux/ |
Quote:
I have been doing sort/uniq operations on very large text files (even more than 2GB), and there's no problem using the above-mentioned commands on large files. Suggesting such an incomplete link will just complicate things and confuse the OP. |
I didn't say the process is in that link. What I said was:
"To speed up the process, you can put your file in RAM using this tutorial."
He split his file into smaller files, so I thought he might have speed problems. Doing this saves me some time on big files. |
Hi all,
thank you for your suggestions. Once I successfully complete this task, I'll update this post with my solution. Cheers Denis |
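On the split pieces mentioned earlier in the thread: they do not have to be a dead end, since each piece can be sorted on its own and the sorted pieces merged with sort -m. The following is a rough sketch under assumptions: the file names, the piece size, and the sample lines are invented, and it uses plain ordering of the first field, which is enough to group identical addresses.

```shell
# Hypothetical sample data standing in for the large log file.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
192.168.1.5 req1 TCP_HIT:NONE
10.0.0.3 req2 TCP_MISS:DIRECT
10.0.0.7 req3 TCP_HIT:NONE
192.168.1.5 req4 TCP_HIT:NONE
10.0.0.3 req5 TCP_MISS:DIRECT
10.0.0.7 req6 TCP_HIT:NONE
EOF

# Split into pieces of 2 lines each (a real run would use far larger pieces).
split -l 2 "$workdir/access.log" "$workdir/piece."

# Sort each piece by its first field, writing piece.aa.sorted, piece.ab.sorted, and so on.
for piece in "$workdir"/piece.??; do
    sort -t ' ' -k1,1 "$piece" > "$piece.sorted"
done

# Merge the already-sorted pieces; -m merges its inputs without sorting each one again.
sort -m -t ' ' -k1,1 "$workdir"/piece.*.sorted > "$workdir/merged.log"
cat "$workdir/merged.log"
```

The merged file should match what a single sort of the whole file produces, so splitting is optional; sort on its own can usually handle a 700MB file.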