LinuxQuestions.org
08-10-2006, 08:03 PM   #1
gruffy
LQ Newbie
 
Registered: Oct 2004
Distribution: RHE4-WS
Posts: 7

Rep: Reputation: 0
Is there a line limit with the sort utility? Trying to sort 130 million lines of text


Running RHEL 4 WS; this is my first big data project. I have a 2 GB file with about 130 million lines of text.

I need a count of the duplicate lines, so I have to run sort first and then use uniq -cd on the sorted file.

I'm just using "sort myfile.txt -o myfile2.txt" with no luck.

The terminal just sits there, apparently doing nothing, and doesn't create the second file. I left it for about half an hour and nothing changed.


Are there any practical limitations of sort that I should be aware of?
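
For reference, the sort-then-count pipeline described above can be tried on a small sample first. This is only a sketch: the sample file and its contents are made up, and LC_ALL=C is an addition not mentioned in the post (it makes sort compare raw bytes, which is usually faster than locale-aware collation).

```shell
# Build a tiny sample file as a stand-in for myfile.txt (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$workdir/sample.txt"

# sort groups identical lines together so uniq can see them as neighbours.
# uniq -c prefixes each line with its count; -d keeps only lines that repeat.
LC_ALL=C sort "$workdir/sample.txt" | uniq -cd > "$workdir/dupes.txt"

cat "$workdir/dupes.txt"
```

Only the lines that occur more than once should appear in dupes.txt, each with its count in front.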
 
08-10-2006, 08:49 PM   #2
gilead
Senior Member
 
Registered: Dec 2005
Location: Brisbane, Australia
Distribution: Slackware64 14.0
Posts: 4,123

Rep: Reputation: 162
I couldn't find any info about limits for the sort command. If you have a fast enough box with enough disk space, can you just let it keep running? Put it into the background or use screen so that you can keep working on other tasks. Or better still, fire it off when you go home on a Friday night and check it on Monday morning.

That's not much help, but I'm curious as to whether it can do it.
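
One way to put a long sort into the background is sketched below. It uses nohup rather than screen (a substitution, chosen because it needs no interactive session), and the file names and sample data are placeholders.

```shell
# Small stand-in input; the real job would point at the 2 GB file instead.
workdir=$(mktemp -d)
seq 1000 -1 1 > "$workdir/input.txt"

# nohup keeps the job running after logout; the trailing & backgrounds it.
# Messages from sort go to a log file so they are not lost.
nohup sort -n -o "$workdir/sorted.txt" "$workdir/input.txt" > "$workdir/sort.log" 2>&1 &
sort_pid=$!

# The shell is free for other work here. Later, wait for the job and
# pick up its exit status (0 means sort finished without errors).
wait "$sort_pid"
sort_status=$?
echo "sort exited with status $sort_status"
```

With screen, the equivalent would be to start the sort inside a screen session and detach from it.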
 
08-10-2006, 09:04 PM   #3
gruffy
LQ Newbie
 
Registered: Oct 2004
Distribution: RHE4-WS
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by gilead
I couldn't find any info about limits for the sort command. If you have a fast enough box with enough disk space, can you just let it keep running? Put it into the background or use screen so that you can keep working on other tasks. Or better still, fire it off when you go home on a Friday night and check it on Monday morning.

That's not much help, but I'm curious as to whether it can do it.

The box should be fast enough: dual Pentium III 1 GHz with 1 GB RAM and an Ultra160 SCSI drive.

I actually tried again and am pretty sure it's doing something, as 10 MB slices prefixed with "sort" (e.g., sortYVgT.txt) keep appearing and disappearing in the /tmp directory. I can leave it running indefinitely, but hopefully this won't take more than 24 hours.
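
Those slices are consistent with an external merge sort: when the input does not fit in memory, sort writes sorted runs to temporary files and merges them afterwards. The sketch below imitates that by hand with split and sort -m on made-up data; it illustrates the idea and is not a description of sort's exact internals.

```shell
workdir=$(mktemp -d)
printf '%s\n' kiwi apple fig pear apple date kiwi cherry > "$workdir/input.txt"

# Step 1: cut the input into chunks small enough to sort in memory
# (3 lines each here; chunk.aa, chunk.ab, ... are created by split).
split -l 3 "$workdir/input.txt" "$workdir/chunk."

# Step 2: sort each chunk in place, like the temporary slices seen in /tmp.
for chunk in "$workdir"/chunk.*; do
    LC_ALL=C sort -o "$chunk" "$chunk"
done

# Step 3: merge the already-sorted chunks in a single pass.
# -m merges sorted inputs without re-sorting them.
LC_ALL=C sort -m -o "$workdir/merged.txt" "$workdir"/chunk.*

cat "$workdir/merged.txt"
```

The merged file should match what a single sort of the whole input produces.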
 
08-10-2006, 09:07 PM   #4
cs-cam
Senior Member
 
Registered: May 2004
Location: Australia
Distribution: Gentoo
Posts: 3,544
Blog Entries: 4

Rep: Reputation: 56
Memory will be the limiting factor. I don't know about 24 hours, but it'll certainly take more than 30 minutes.
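
If memory is the constraint, GNU sort lets you set the size of its in-memory buffer with -S. A minimal sketch follows, assuming GNU coreutils; the 64M value and the sample data are arbitrary, and a sensible setting for a 1 GB machine would have to be found by experiment.

```shell
workdir=$(mktemp -d)
seq 5000 -1 1 > "$workdir/input.txt"

# -S sets the main memory buffer size (suffixes such as K, M, G are accepted).
# A larger buffer means fewer, larger temporary runs to merge afterwards.
LC_ALL=C sort -n -S 64M -o "$workdir/sorted.txt" "$workdir/input.txt"

head -n 3 "$workdir/sorted.txt"
```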
 
08-10-2006, 09:40 PM   #5
gruffy
LQ Newbie
 
Registered: Oct 2004
Distribution: RHE4-WS
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by cs-cam
Memory will be the limiting factor. I don't know about 24 hours, but it'll certainly take more than 30 minutes.

Actually, the main limitation is probably disk space! The 10 MB slices have turned into 200 MB slices and I'm down to my last 1 GB of disk space.
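
When /tmp is short on space, sort can be told to write its temporary files elsewhere with -T. The sketch below uses a hypothetical scratch directory as a stand-in for a directory on a disk with more room; as a rough rule it seems prudent to plan for scratch space at least comparable to the size of the input, though that is an estimate rather than a documented figure.

```shell
workdir=$(mktemp -d)
scratch="$workdir/scratch"   # hypothetical stand-in for a directory on a roomier disk
mkdir -p "$scratch"
seq 5000 -1 1 > "$workdir/input.txt"

# Check free space on the filesystem that holds the scratch directory.
df -k "$scratch"

# -T makes sort put its temporary files in $scratch instead of the
# default location ($TMPDIR if set, otherwise /tmp).
LC_ALL=C sort -n -T "$scratch" -o "$workdir/sorted.txt" "$workdir/input.txt"
```

sort removes its temporary files when it finishes, so the scratch directory should be empty again afterwards.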
 
  


All times are GMT -5.
