Linux - Software
This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
A basic statistics package, such as "R," is often used to do this sort of thing ... even with data volumes this big.
Interestingly, if the number of commands and the number of users is not, itself, outrageously large, "a moderate [Perl?] script" can also be used to tackle this sort of problem, through the use of in-memory hashes. A hash-table keyed by "user-id" could contain an integer count. Likewise, a hash-table keyed by "command." Or, a so-called "hash of hashes" structure, where (say ...) each element in a hash keyed by "user" is itself a hash keyed by "command," containing an integer count.
In this approach, the file can be read sequentially, in situ, without being sorted at all. The only requirement is that enough RAM is available ... probably a very safe assumption these days.
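The hash-of-hashes idea above can be sketched quickly. The post suggests Perl; here is an equivalent Python sketch using nested dictionaries, with a made-up log layout (username in the first field, command in the second) standing in for whatever the real records look like:

```python
from collections import defaultdict

# Assumed record layout: "username command ..." -- adjust the split
# to match the actual log format.
counts = defaultdict(lambda: defaultdict(int))  # user -> command -> count

# Stand-in for reading the real file line by line, in situ, unsorted.
sample_log = [
    "alice ls",
    "alice grep",
    "alice ls",
    "bob vim",
]

for line in sample_log:
    user, command = line.split()[:2]
    counts[user][command] += 1

# Report per-user, per-command totals.
for user in sorted(counts):
    for command in sorted(counts[user]):
        print(user, command, counts[user][command])
# alice grep 1
# alice ls 2
# bob vim 1
```

The whole structure stays in RAM, so memory use grows with the number of distinct (user, command) pairs, not with the number of records, which is why a million-record day is no problem.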
The server I am working with logs maybe about 1 million records per day. I want to sort by username alongside the command used, and also count each command. Is there any way I can do it?
If the fields in your file have delimiters, I would use the cut command to create a file with only the columns you need. This presort pass may chop away many millions of bytes of unnecessary data. Then use the Linux sort utility to achieve your objective.
If possible, use/write a filter program to extract only the records you need before passing the result to the sort program.
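Putting the two suggestions above together, the filter-then-sort approach is a one-line pipeline. The field positions here are assumptions (username in field 1, command in field 2, space-delimited); adjust -d and -f for the real log:

```shell
# Build a tiny demo log (stands in for the real million-record file).
printf 'alice ls\nalice ls\nbob vim\n' > /tmp/demo.log

# cut keeps only the user and command columns, sort groups identical
# pairs together, and uniq -c counts each (user, command) combination.
# The final sort -rn puts the most-used commands first.
cut -d' ' -f1,2 /tmp/demo.log | sort | uniq -c | sort -rn
```

Unlike the in-memory hash approach, sort spills to temporary files when the input is large, so this works even when the data does not fit in RAM.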