Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
I have a document (700 MB) that contains about 2-3 million lines of data.
The data format is like this:
xxx.yyy.zzz.aaa (IP address) date/time url link some numbers more numbers TCP_xxx:yyy
The IP addresses are in an internal IP range, and each IP address may be repeated several hundred times.
My aim is to sort the file by IP address (perhaps with a sed command?) so that identical IP addresses are grouped one below the other. I want to sort the report by IP address, not by name, and I don't want to delete duplicates.
I had a go using sed but got stuck. I ran split and broke the file down into 20 MB pieces, but still no joy; I just multiplied my problem.
Can someone advise on how to achieve the above, please?
Well, the first thing is that there's little point in sorting IP addresses as such: they're just numbers, not an index like 1, 2, 3..., so "lower to higher" has no special meaning for them. But in your case, I guess you want to sort (in fact, extract) only unique IP addresses.
A simple uniq command alone will not help, but sort -u will. So you can try:
Code:
awk '!_[$1]++' /path/to/file
Or,
Code:
sort -t" " -k1 -u /path/to/file
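For reference, the awk one-liner works like this: the array element for field 1 starts at 0, so the pattern !arr[$1]++ is true, and the line is printed, only the first time that field-1 value is seen (I've renamed the array to seen for readability). A quick demonstration:

```shell
# Prints only the first line seen for each value of the first field;
# later lines with a repeated first field are dropped.
printf '10.1.1.1 a\n10.1.1.2 b\n10.1.1.1 c\n' | awk '!seen[$1]++'
# prints:
# 10.1.1.1 a
# 10.1.1.2 b
```

Note that, unlike sort -u, this keeps the lines in their original order.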
On the other hand, if you just want to group lines with the same IP address together, then a simple sort command will do, like this (it will output the whole file, so pipe it through more to view the output page by page):
Code:
sort -t" " -k1 /path/to/file | more
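One caveat: a plain sort compares the IPs as text, so 10.0.0.10 sorts before 10.0.0.2. If you are on GNU coreutils (an assumption on my part), version sort compares the dotted octets numerically:

```shell
# -k1,1V: version-sort on the first field only (GNU sort extension),
# so each octet is compared as a number rather than character by character.
printf '10.0.0.10 x\n10.0.0.2 y\n10.0.0.2 z\n' | sort -k1,1V
# prints:
# 10.0.0.2 y
# 10.0.0.2 z
# 10.0.0.10 x
```

Duplicates are kept and grouped, which matches what the OP asked for.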
I have been doing sort/uniq operations on very large text files (even bigger than 2 GB), and the commands mentioned above have no problem with large files. Suggesting such an incomplete link will just complicate things and confuse the OP.
I didn't say there's a process in that link:
""""
for speeding up the process, you can put your file in RAM, using this tutorial
""""
And he split his file into smaller files, so I thought he might have speed problems. Doing this has saved me some time on big files.
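A related trick that avoids copying the whole file into a ramdisk (a sketch, assuming GNU sort on Linux; the file name below is hypothetical): on files this size, sort spills temporary merge runs to disk, and the -T flag lets you point those temporaries at a RAM-backed directory such as /dev/shm:

```shell
# Stand-in sample file; replace with the real 700 MB log.
printf '10.0.0.5 b\n10.0.0.1 a\n' > /tmp/bigfile.log
# -T names the directory for sort's temporary files; /dev/shm is
# tmpfs (RAM-backed) on most Linux systems. The sort itself is unchanged.
sort -T /dev/shm -k1,1 /tmp/bigfile.log
```

The output is the same as a plain sort; only the temporary I/O moves to RAM.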