LinuxQuestions.org


gruffy 08-10-2006 07:03 PM

Is there a line limit with the sort utility? Trying to sort 130 million lines of text
 
I'm running RHEL 4 WS, and this is my first big data project. I have a 2GB file with about 130 million lines of text.

I need to get a count of the duplicates, so I need to run sort first and then use uniq -cd on the sorted file.

I'm just using "sort myfile.txt -o myfile2.txt" with no luck.

The terminal just sits there doing nothing and doesn't create the second file or anything... I left it for about half an hour and it just sat there.


Any practical limitations to sort that I should be aware of?

gilead 08-10-2006 07:49 PM

I couldn't find any info about limits for the sort command. If you have a fast enough box with enough disk space, can you let it just keep running? Put it into the background or use screen so that you can keep working on other tasks. Or better still, fire it off when you go home on a Friday night and check it Monday morning.

That's not much help, but I'm curious as to whether it can do it.
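
One way to follow that suggestion is sketched below. It assumes GNU coreutils and a Bourne-style shell; the tiny sample file and the names sample.txt, sample.sorted and sort.log are made up so the commands can be tried safely before pointing them at the real file.

```shell
# Work in a scratch directory so nothing important is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the real 2GB file.
printf 'b\na\nb\nc\na\nb\n' > sample.txt

# nohup keeps the job alive after logout; & puts it in the background.
nohup sort sample.txt -o sample.sorted > sort.log 2>&1 &
sort_pid=$!
echo "sort started as PID $sort_pid"

# For the demo we wait; with the real file you would log out and check back later.
wait "$sort_pid"

# Count only the lines that occur more than once.
uniq -cd sample.sorted
```

With the real file, the wait line would be dropped and the uniq step run once sort.log shows no errors and the sort process has exited.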

gruffy 08-10-2006 08:04 PM

Quote:

Originally Posted by gilead
I couldn't find any info about limits for the sort command. If you have a fast enough box with enough disk space, can you let it just keep running? Put it into the background or use screen so that you can keep working on other tasks. Or better still, fire it off when you go home on a Friday night and check it Monday morning.

That's not much help, but I'm curious as to whether it can do it.


Box should be fast enough... dual p3 1ghz w/ 1GB RAM and a SCSI U160 drive.

I actually tried again and am pretty sure it's doing something, as there are 10MB slices prefixed with "sort" (e.g., sortYVgT.txt) appearing and disappearing in the /tmp dir. I can leave it on indefinitely... but this hopefully won't take more than 24 hours.

cs-cam 08-10-2006 08:07 PM

The memory will be the limitation. I don't know about 24hrs but it'll certainly take more than 30mins.
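
If memory does turn out to be the bottleneck, GNU sort lets you set the size of its in-memory buffer with -S, and forcing the C locale usually speeds up comparisons. A rough sketch, assuming GNU sort; the 256M figure and the sample file are only illustrative, and byte-order sorting is fine here because uniq only needs identical lines to be adjacent.

```shell
# Scratch directory and a hypothetical stand-in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\npear\nfig\napple\npear\n' > sample.txt

# -S sets the main-memory buffer size (assumption: 256M is reasonable on a 1GB box).
# LC_ALL=C compares raw bytes, which is typically faster than locale-aware collation.
LC_ALL=C sort -S 256M sample.txt -o sample.sorted

# Use the same locale for uniq so both tools agree on what "identical" means.
LC_ALL=C uniq -cd sample.sorted
```

A larger buffer means fewer and bigger temporary slices, so it trades RAM for less merge work on disk.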

gruffy 08-10-2006 08:40 PM

Quote:

Originally Posted by cs-cam
The memory will be the limitation. I don't know about 24hrs but it'll certainly take more than 30mins.


Actually the main limitation is probably space! The 10MB slices have turned into 200MB slices and I'm down to my last 1GB of disk space.
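
Two things can ease the disk pressure, sketched here under the assumption of GNU sort: -T points the temporary slices at a different directory (ideally on a filesystem with more free space), and piping straight into uniq -cd avoids writing a second full-size sorted copy. The directory bigtmp and the file names are hypothetical.

```shell
# Scratch directory and a hypothetical stand-in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'x\ny\nx\nz\nx\ny\n' > sample.txt

# Hypothetical temp directory; in practice, pick one on a roomier filesystem.
mkdir -p bigtmp

# -T tells sort where to put its temporary files instead of $TMPDIR or /tmp.
# Only the (much smaller) list of duplicate counts is written to disk.
sort -T bigtmp sample.txt | uniq -cd > dupes.txt

cat dupes.txt
```

Running df -h on the chosen directory beforehand is a quick way to confirm it actually has more room than /tmp.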

