Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux-related and doesn't seem to fit in any other forum, then this is the place.
Hi all, I am forever dealing with vast lists, typically exported to text files with one record per line (separated by carriage returns).
I use a Python script to create the files I need, but because of the sizes produced I have to split them into manageable files, or the text editor crashes, or I run out of memory (or both).
I have been reading about the awk, sort and uniq commands as a way to filter out any duplicate lines that may have been produced across the files, but I don't know how to implement this, since I want to work with the file set as if it were a single file, not several.
Can anyone assist me with my crunching problem with a few handy commands or a small script?
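For reference, awk treats every file named on its command line as one continuous stream, so a duplicate that appears in a different file is still caught. A minimal sketch (the `set1.txt`/`set2.txt` names are placeholders for your split files):

```shell
# Create two sample "split" files to demonstrate with.
printf 'alpha\nbeta\n' > set1.txt
printf 'beta\ngamma\n' > set2.txt

# seen[] records each line the first time it appears; the pattern
# is true only on that first appearance, so later copies are skipped.
# Both files are read as one stream, and first-seen order is preserved.
awk '!seen[$0]++' set1.txt set2.txt > deduped.txt

cat deduped.txt
# alpha
# beta
# gamma
```

Note that the `seen` array lives in memory, so this works only while the set of *unique* lines fits in RAM.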
How big are these files? If they're HUGE, you may be best off with a dedicated app to handle this (or, more likely, a real database engine). Why open records like this in a text editor?
The following would work, if you have a reasonable amount of memory (assumes your files are named set#.txt):
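The snippet itself is missing here; given the description ("reasonable amount of memory", files named `set#.txt`), a likely shape is a sort/uniq pipeline over the whole set:

```shell
# Sample split files (placeholder names matching the set#.txt pattern).
printf 'alpha\nbeta\n' > set1.txt
printf 'beta\ngamma\n' > set2.txt

# Sort all files together so duplicates become adjacent, then drop
# the adjacent repeats. (sort -u would do both steps in one command.)
sort set*.txt | uniq > combined.txt

cat combined.txt
# alpha
# beta
# gamma
```

The output is sorted rather than in original order, which is usually acceptable for list crunching.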
Thank you Matir, I shall try this and see if it works. I would like to use a database system and have toyed with Postgres, but given the number of records and the frequency of change, I'm not sure of the best way to handle them.
Records are typically ASCII or extended 8-bit, anywhere from 1 character upwards, but usually no more than 32. I am using plain text while I get a better understanding of databases, compression and speed of access, and while I look for good tutorials on Postgres, which seem to be few. Don't get me wrong, I like MySQL, but for this and future projects I lean heavily towards Postgres. I also have no idea how to load files of 1 MB to 2 TB on the fly (with relative ease), so you see my boggle!
I have looked at SQLite and the Firefox plugin for working with the data, but again, I can't find any good tutorials on using it!
Last edited by smudge|lala; 09-24-2008 at 06:00 PM.
If you're serious about 2 TB files, I'd use Python, since you already know it (from your OP), to do the file manipulation, or learn Perl if you're open to suggestions. It's good at that kind of work.
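Worth knowing alongside the scripting suggestion: GNU sort already does an external merge sort, spilling to temporary files on disk, so the sort/uniq approach can scale past RAM. A sketch assuming GNU coreutils (the buffer size and temp directory here are illustrative choices):

```shell
# Sample split files (placeholders for the real set#.txt exports).
printf 'alpha\nbeta\n' > set1.txt
printf 'beta\ngamma\n' > set2.txt

# -u      : emit each distinct line once
# -S 64M  : cap the in-memory buffer; larger input spills to disk
# -T /tmp : directory for the spill files (point at a big volume)
# LC_ALL=C compares raw bytes, which is much faster for ASCII data.
LC_ALL=C sort -u -S 64M -T /tmp set*.txt > deduped.txt

cat deduped.txt
# alpha
# beta
# gamma
```

For truly huge runs, make sure the `-T` directory has roughly as much free space as the input.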