more efficient {min,max} than "sort | {head,tail} -1"?
Linux - SoftwareThis forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.
Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.
Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
"at least go through every element and check it versus every other element",
No, you have not. You have only (as in ghostdog74's algorithm) compare each element with CURRENT minimum and current maximum, not each element with each other.
And this is surely less work then sorting.
bash-3.1$ time sort -n file | awk 'NR==1;END{print}'
0
32767
real 0m0.100s
user 0m0.097s
sys 0m0.004s
After thinking about it a bit, sorting is likely to be less efficient than just finding min and max, but if you have to use bash only then it won't matter. Best think to do is write a small C program.
There's a tool in the Generic Mapping Tools (GMT) package called minmax which does just this: it reads a file and spits out the minimum and maximum values for each column. I'm not sure exactly how it's done, but on a file generated using your "for i in $(seq 100000); do echo $RANDOM >> file; done" command, these are the time results:
Code:
pwc@mike ~/desktop $ time minmax file
file: N = 100000 <0/32767>
minmax file 0.12s user 0.06s system 51% cpu 0.359 total
pwc@mike ~/desktop $ time sort -n file | awk 'NR==1;END{print}'
0
32767
sort -n file 0.19s user 0.01s system 99% cpu 0.203 total
awk 'NR==1;END{print}' 0.08s user 0.03s system 52% cpu 0.203 total
pwc@mike ~/desktop $ time awk 'NR==1{
> tempmin=$0
> tempmax=$0
> }
> $0 >= tempmax{ tempmax=$0 }
> $0 <= tempmin { tempmin = $0 }
> END{
> print "min: "tempmin
> print "max: "tempmax
> }' file
min: 0
max: 32767
awk file 0.23s user 0.03s system 105% cpu 0.250 total
I've attached the source for minmax, if anyone's interested. The project's homepage is http://gmt.soest.hawaii.edu/.
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
You might want to ask for a multi-utility tool, because minmax is just not enough. If you want I'll write a short C program later today to do it. It's quite simple, so it shouldn't be hard.
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
We use GMT all the time here at work for our plots, so it wasn't much of a find!
I'm sure it'd be pretty straight forward to hack out the GMT specific parts and be left with a simple minmax utility. But as you say, you've a different approach in mind.
If I weren't as busy as I am at the moment, I'd be tempted to have a go myself. Maybe I will, when I get bored!
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.