LinuxQuestions.org
01-09-2013, 09:44 AM   #1
smithy2010
Member
 
Registered: May 2010
Location: UK
Distribution: OpenSuse 64 bit
Posts: 45

Rep: Reputation: 15
sort large data in large file - commands


Hi there,

I have a document (700MB) that contains about 2-3 million lines of data.

The data format is like this:

xxx.yyy.zzz.aaa (IP address) date/time url link some numbers more numbers TCP_xxx:yyy

The IP addresses are in the internal IP range, and each IP address may be repeated several hundred times.
My aim is to sort the IP addresses in some order (maybe with a sed command) so that lines with the same IP address are grouped one below the other. I want to sort the report by IP address, not by name, and I don't want to delete duplicates.
I had a go with the sed command but got stuck. I then ran the split command and broke the file down into 20MB pieces, but still no joy; I just multiplied my problem.

Can someone advise on how to achieve the above, please?

Thank you in advance

Denis
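For what it's worth, the split pieces mentioned above are not wasted effort: each piece can be sorted on its own and the sorted pieces merged with `sort -m`. The following is only a minimal sketch of that idea. The sample lines, the file names and the 2-line chunk size are made up for illustration, and a plain byte-wise sort is assumed to be good enough, since it already puts identical IP addresses next to each other.

```shell
#!/bin/sh
# Sketch: sort each split chunk separately, then merge the sorted chunks.
# Sample data and file names are hypothetical stand-ins for the real 700MB log.
export LC_ALL=C                 # same byte-wise ordering for every sort call
work=$(mktemp -d)
cat > "$work/access.log" <<'EOF'
192.168.1.20 09/Jan/2013:09:00 http://example.com/a 200 512 TCP_MISS:DIRECT
10.0.0.7 09/Jan/2013:09:01 http://example.com/b 200 128 TCP_HIT:NONE
192.168.1.20 09/Jan/2013:09:02 http://example.com/c 200 256 TCP_MISS:DIRECT
10.0.0.7 09/Jan/2013:09:03 http://example.com/d 304 0 TCP_HIT:NONE
10.0.0.31 09/Jan/2013:09:04 http://example.com/e 200 64 TCP_MISS:DIRECT
EOF

# Split into 2-line chunks (the thread used 20MB pieces; this keeps the demo small).
split -l 2 "$work/access.log" "$work/chunk."

# Sort every chunk on its own; the glob is expanded once, before the loop runs.
for f in "$work"/chunk.*; do
    sort "$f" > "$work/sorted.${f##*.}"
done

# Merge the already-sorted chunks into one sorted stream.
merged=$(sort -m "$work"/sorted.*)
printf '%s\n' "$merged"
```

The merge step only works if every chunk was sorted with the same options and locale, which is why `LC_ALL` is set once at the top.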
 
01-09-2013, 09:45 AM   #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 5,026

Rep: Reputation: 845
I think the sort command would be adequate.
 
01-09-2013, 10:25 AM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,396
Blog Entries: 2

Rep: Reputation: 908
Yes, sort sounds like all you need. Use the '-n' option to make it use numeric, rather than alphabetic order. It could take a while.
--- rod.
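One caveat on numeric order: a bare `-n` only reads the leading number on each line, so it does not compare all four octets of a dotted address. A common workaround, sketched below under the assumption that the IP address is the first thing on every line, is to split on the dot and sort each octet as its own numeric key. The sample lines are invented for the demo.

```shell
#!/bin/sh
# Sketch: sort by IP address octet by octet, numerically.
# Sample data is hypothetical; the real lines have more fields after the IP.
export LC_ALL=C
work=$(mktemp -d)
cat > "$work/access.log" <<'EOF'
192.168.1.20 t1 /a
10.0.0.100 t2 /b
10.0.0.9 t3 /c
10.0.0.100 t4 /d
EOF

# -t . makes the dot the field separator; each -kN,Nn key is one octet.
# The fourth field also holds the rest of the line, but -n only reads
# the number at its start, so the octet is still compared correctly.
sorted=$(sort -t . -k1,1n -k2,2n -k3,3n -k4,4n "$work/access.log")
printf '%s\n' "$sorted"
```

With this key layout 10.0.0.9 sorts ahead of 10.0.0.100, which a purely alphabetic sort would get the other way round. No lines are dropped, so repeated addresses stay in the output, grouped together.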
 
01-09-2013, 01:37 PM   #4
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Well, the first thing is that there's no point in sorting IP addresses, as they're simple numbers (and not an index like 1, 2, 3...), so sorting them from lower to higher or higher to lower has no meaning. But in your case, I guess you want to sort (in fact, extract) only the unique IP addresses.

A simple uniq command will not help, because it only collapses identical lines that are already adjacent, but sort -u will be helpful. Note that both commands below keep just one line per IP address. So, you can try:

Code:
awk '!_[$1]++' /path/to/file
Or,
Code:
sort -t" " -k1,1 -u /path/to/file
On the other hand, if you just want to group the lines with the same IP address together, then you can use a simple sort command like this (it will output the whole file, so pipe it through more to see the output page by page):
Code:
sort -t" " -k1,1 /path/to/file | more
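To make the difference between grouping and de-duplicating concrete, here is a small sketch run on invented sample lines. The variable names and the per-address count with `uniq -c` are my own additions for illustration, not something suggested in the thread.

```shell
#!/bin/sh
# Sketch: grouping keeps every line, de-duplicating keeps one line per IP.
# Sample data and variable names are hypothetical.
export LC_ALL=C
work=$(mktemp -d)
log="$work/access.log"
cat > "$log" <<'EOF'
10.0.0.7 t1 /a
192.168.1.20 t2 /b
10.0.0.7 t3 /c
10.0.0.31 t4 /d
192.168.1.20 t5 /e
EOF

# Group: sort on the first field only; all 5 lines survive.
grouped=$(sort -t ' ' -k1,1 "$log")

# De-duplicate: keep only the first line seen for each IP, in input order.
one_per_ip=$(awk '!seen[$1]++' "$log")

# Count how many lines each IP address has.
hits=$(cut -d ' ' -f 1 "$log" | sort | uniq -c)

printf '%s\n' "$grouped"
printf '%s\n' "$one_per_ip"
printf '%s\n' "$hits"
```

If the goal is to keep every line and only bring equal addresses together, the first command is the one to use; the awk line throws away the repeats.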

Last edited by shivaa; 01-09-2013 at 01:56 PM. Reason: Info. added
 
01-09-2013, 02:06 PM   #5
alieblice
Member
 
Registered: Jul 2011
Posts: 80

Rep: Reputation: Disabled
To speed up the process you can put your file in RAM, using this tutorial:
http://www.thegeekstuff.com/2008/11/...mpfs-on-linux/
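As a rough sketch of what working in RAM could look like without mounting anything yourself: many Linux systems already have a tmpfs mounted at /dev/shm. That path is an assumption here, so the snippet falls back to a normal temp directory if it is missing. The sample lines and file names are made up, and the heredoc stands in for copying the real file across.

```shell
#!/bin/sh
# Sketch: do the sort inside a RAM-backed directory.
# Assumption: /dev/shm is a tmpfs mount; otherwise fall back to a normal temp dir.
export LC_ALL=C
ramdir=/dev/shm
if [ ! -d "$ramdir" ] || [ ! -w "$ramdir" ]; then
    ramdir=${TMPDIR:-/tmp}
fi
work=$(mktemp -d "$ramdir/sortdemo.XXXXXX")

# Stand-in for: cp /path/to/file "$work/access.log"
cat > "$work/access.log" <<'EOF'
192.168.1.20 t2 /b
10.0.0.7 t1 /a
10.0.0.31 t4 /d
10.0.0.7 t3 /c
EOF

# -T puts sort's own temporary files in the same directory,
# -o writes the result to a file instead of the terminal.
sort -T "$work" -o "$work/sorted.log" "$work/access.log"
first=$(head -n 1 "$work/sorted.log")
printf '%s\n' "$first"
```

Keep in mind that the input, the output and any temporary files all count against free memory, so this only makes sense when there is comfortably more RAM than the file needs. Remove the directory with `rm -rf "$work"` when finished.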
 
01-09-2013, 02:15 PM   #6
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Quote:
Originally Posted by alieblice
To speed up the process you can put your file in RAM, using this tutorial:
http://www.thegeekstuff.com/2008/11/...mpfs-on-linux/
@alieblice: Where's the process in this link?

I have been doing sort/uniq operations on very large text files (even larger than 2GB), and the commands mentioned above have no problem with large files. Suggesting such an incomplete link will just complicate things and confuse the OP.

Last edited by shivaa; 01-09-2013 at 02:18 PM.
 
01-09-2013, 02:46 PM   #7
alieblice
Member
 
Registered: Jul 2011
Posts: 80

Rep: Reputation: Disabled
I didn't say there's a process in that link. I said:
"To speed up the process you can put your file in RAM, using this tutorial"
He split his file into smaller files, so I thought he might have speed problems.
Doing this saves me some time on big files.
 
02-03-2013, 10:01 AM   #8
smithy2010
Member
 
Registered: May 2010
Location: UK
Distribution: OpenSuse 64 bit
Posts: 45

Original Poster
Rep: Reputation: 15
Hi all,

thank you for your suggestions. Once I successfully complete this task, I'll update this post with my solution.

Cheers

Denis
 
  

