Linux - Software
This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
A basic statistics package, such as "R," is often used to do this sort of thing ... even with data volumes this big.
Interestingly, if the number of commands and the number of users is not, itself, outrageously large, "a moderate [Perl?] script" can also be used to tackle this sort of problem, through the use of in-memory hashes. A hash-table keyed by "user-id" could contain an integer count. Likewise, a hash-table keyed by "command." Or, a so-called "hash of hashes" structure, where (say ...) each element in a hash keyed by "user" is itself a hash keyed by "command," containing an integer count.
In this approach, the file can be read sequentially, in situ, without being sorted at all. The only requirement is that enough RAM is available ... probably a very safe assumption these days.
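The hash-of-hashes idea above can be sketched quickly. The post suggests Perl; here is an equivalent Python sketch using nested dictionaries, with a made-up log layout (username in the first field, command in the second) standing in for whatever the real records look like:

```python
from collections import defaultdict

# Assumed record layout: "username command ..." -- adjust the split
# to match the actual log format.
counts = defaultdict(lambda: defaultdict(int))  # user -> command -> count

# Stand-in for reading the real file line by line, in situ, unsorted.
sample_log = [
    "alice ls",
    "alice grep",
    "alice ls",
    "bob vim",
]

for line in sample_log:
    user, command = line.split()[:2]
    counts[user][command] += 1

# Report per-user, per-command totals.
for user in sorted(counts):
    for command in sorted(counts[user]):
        print(user, command, counts[user][command])
# alice grep 1
# alice ls 2
# bob vim 1
```

The whole structure stays in RAM, so memory use grows with the number of distinct (user, command) pairs, not with the number of records, which is why a million-record day is no problem.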
The server I am working with logs maybe about 1 million records per day. I want to sort by username alongside the command used, and also count each command. Is there any way I can do it?
If the fields in your file have delimiters, I would use the cut command to create a file with only the columns you need. This presort pass may chop away many millions of bytes of unnecessary data. Then use the Linux sort utility to achieve your objective.
If possible, use/write a filter program to extract only the records you need before passing the result to the sort program.
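Putting the two suggestions above together, the filter-then-sort approach is a one-line pipeline. The field positions here are assumptions (username in field 1, command in field 2, space-delimited); adjust -d and -f for the real log:

```shell
# Build a tiny demo log (stands in for the real million-record file).
printf 'alice ls\nalice ls\nbob vim\n' > /tmp/demo.log

# cut keeps only the user and command columns, sort groups identical
# pairs together, and uniq -c counts each (user, command) combination.
# The final sort -rn puts the most-used commands first.
cut -d' ' -f1,2 /tmp/demo.log | sort | uniq -c | sort -rn
```

Unlike the in-memory hash approach, sort spills to temporary files when the input is large, so this works even when the data does not fit in RAM.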