Linux - Newbie
This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
I have a document (700 MB) that contains about 2-3 million lines of data.
The data format is like this:
xxx.yyy.zzz.aaa (IP address) date/time url link some numbers more numbers TCP_xxx:yyy
The IP addresses are in an internal IP range, and each IP address may be repeated several hundred times.
My aim is to sort the file by IP address (perhaps with a sed command?) so that identical IP addresses are grouped one below the other. I want to sort the report by IP address, not by name, and I don't want to delete duplicates.
I had a go using sed but got stuck. I ran split and broke the file down into 20 MB pieces, but still no joy; I just multiplied my problem.
Can someone advise on how to achieve the above, please?
Well, the first thing is that there's little point in sorting IP addresses as such: they're just numbers, not an index like 1, 2, 3..., so "lower to higher" has no special meaning for them. But in your case, I guess you want to sort (in fact, extract) only unique IP addresses.
A simple uniq command alone will not help, but sort -u will. So you can try:
Code:
awk '!_[$1]++' /path/to/file
Or,
Code:
sort -t" " -k1 -u /path/to/file
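For reference, the awk one-liner works like this: the array element for field 1 starts at 0, so the pattern !arr[$1]++ is true, and the line is printed, only the first time that field-1 value is seen (I've renamed the array to seen for readability). A quick demonstration:

```shell
# Prints only the first line seen for each value of the first field;
# later lines with a repeated first field are dropped.
printf '10.1.1.1 a\n10.1.1.2 b\n10.1.1.1 c\n' | awk '!seen[$1]++'
# prints:
# 10.1.1.1 a
# 10.1.1.2 b
```

Note that, unlike sort -u, this keeps the lines in their original order.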
On the other hand, if you just want to group lines with the same IP address together, then a simple sort command will do, like this (it will output the whole file, so pipe it through more to view the output page by page):
Code:
sort -t" " -k1 /path/to/file | more
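One caveat: a plain sort compares the IPs as text, so 10.0.0.10 sorts before 10.0.0.2. If you are on GNU coreutils (an assumption on my part), version sort compares the dotted octets numerically:

```shell
# -k1,1V: version-sort on the first field only (GNU sort extension),
# so each octet is compared as a number rather than character by character.
printf '10.0.0.10 x\n10.0.0.2 y\n10.0.0.2 z\n' | sort -k1,1V
# prints:
# 10.0.0.2 y
# 10.0.0.2 z
# 10.0.0.10 x
```

Duplicates are kept and grouped, which matches what the OP asked for.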
I have been doing sort/uniq operations on very large text files (even bigger than 2 GB), and the commands mentioned above have no problem with large files. Suggesting such an incomplete link will just complicate things and confuse the OP.
I didn't say there's a process in that link:
""""
for speeding up the process, you can put your file in RAM, using this tutorial
""""
And he split his file into smaller files, so I thought he might have speed problems. Doing this has saved me some time on big files.
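A related trick that avoids copying the whole file into a ramdisk (a sketch, assuming GNU sort on Linux; the file name below is hypothetical): on files this size, sort spills temporary merge runs to disk, and the -T flag lets you point those temporaries at a RAM-backed directory such as /dev/shm:

```shell
# Stand-in sample file; replace with the real 700 MB log.
printf '10.0.0.5 b\n10.0.0.1 a\n' > /tmp/bigfile.log
# -T names the directory for sort's temporary files; /dev/shm is
# tmpfs (RAM-backed) on most Linux systems. The sort itself is unchanged.
sort -T /dev/shm -k1,1 /tmp/bigfile.log
```

The output is the same as a plain sort; only the temporary I/O moves to RAM.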