LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   sort large data in large file - commands (https://www.linuxquestions.org/questions/linux-newbie-8/sort-large-data-in-large-file-commands-4175444819/)

smithy2010 01-09-2013 08:44 AM

sort large data in large file - commands
 
Hi there,

I have a document (700MB) that contains about 2-3 million lines of data.

Data format is like this:

xxx.yyy.zzz.aaa (IP address) date/time url link some numbers more numbers TCP_xxx:yyy

The IP addresses are in an internal IP range, and each IP address may be repeated several hundred times.
My aim is to sort the file by IP address (maybe with a sed command?) so that identical IP addresses are grouped one below the other. I want the report sorted by IP address, not by name, and I don't want to delete duplicates.
I had a go with sed but got stuck. I also ran split and broke the file down into 20MB chunks, but still no joy; I just multiplied my problem.

Can someone advise on how to achieve the above, please?

Thank you in advance

Denis

schneidz 01-09-2013 08:45 AM

I think the sort command would be adequate.

theNbomr 01-09-2013 09:25 AM

Yes, sort sounds like all you need. Use the '-n' option to make it use numeric, rather than alphabetic order. It could take a while.
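
One caveat: with dotted quads, '-n' compares only the leading octet numerically, which is enough to group identical addresses but not to order them fully. If full ordering matters, something like this should work (a sketch assuming GNU sort, with the IP as the first field and /path/to/file as a placeholder):

Code:

# version sort understands dotted quads (GNU coreutils)
sort -V /path/to/file > sorted.txt

# or treat each octet as its own numeric sort key
sort -t '.' -k1,1n -k2,2n -k3,3n -k4,4n /path/to/file > sorted.txt
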
--- rod.

shivaa 01-09-2013 12:37 PM

Well, the first thing is that there's not much point in ordering IP addresses as such, since they're simply numbers (not an index like 1, 2, 3...), so sorting them from low to high or high to low carries no special meaning. But in your case, I guess you want to sort (in fact, extract) only the unique IP addresses.

A simple uniq command alone will not help, but sort -u will. So you can try:

Code:

# print a line only the first time its first field (the IP) is seen
awk '!_[$1]++' /path/to/file
Or,
Code:

sort -t" " -k1 -u /path/to/file
On the other hand, if you just want to group the same IP addresses together, you can use a plain sort like this (it will output the whole file, so pipe it through a pager such as more to read it page by page):
Code:

sort -t" " -k1 /path/to/file | more

alieblice 01-09-2013 01:06 PM

To speed up the process, you can put your file in RAM using this tutorial:
http://www.thegeekstuff.com/2008/11/...mpfs-on-linux/
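
The idea boils down to mounting a tmpfs (RAM-backed) filesystem and working on a copy of the file there. A rough sketch, assuming about 1GB of free RAM and an arbitrary mount point:

Code:

# create a RAM-backed filesystem big enough for the 700MB file
mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
cp /path/to/file /mnt/ramdisk/
sort -t" " -k1,1 /mnt/ramdisk/file > /mnt/ramdisk/file.sorted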

shivaa 01-09-2013 01:15 PM

Quote:

Originally Posted by alieblice (Post 4866373)
To speed up the process, you can put your file in RAM using this tutorial:
http://www.thegeekstuff.com/2008/11/...mpfs-on-linux/

@alieblice: Where's the process in this link?

I have been doing sort/uniq operations on very large text files (even over 2GB), and the above-mentioned commands have no problem with large files. Suggesting such an incomplete link will just complicate things and confuse the OP.
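
For what it's worth, GNU sort already handles files bigger than RAM by doing an external merge sort with temporary files, and its buffer size and temp directory can be tuned if speed is the concern (a sketch using the standard -S/--buffer-size and -T/--temporary-directory options):

Code:

# give sort up to 1GB of buffer and a roomy temp directory
sort -S 1G -T /var/tmp -t" " -k1,1 /path/to/file > sorted.txt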

alieblice 01-09-2013 01:46 PM

I didn't say there's a process in that link. I said:

"To speed up the process, you can put your file in RAM using this tutorial."

He split his file into smaller files, so I thought he might be having speed problems. Doing this has saved me some time on big files.

smithy2010 02-03-2013 09:01 AM

Hi all,

Thank you for your suggestions. Once I successfully complete this task, I'll update this post with my solution.

Cheers

Denis

