Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum. |
Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
|
 |
|
06-30-2009, 01:11 PM
|
#1
|
LQ Newbie
Registered: Jun 2009
Posts: 4
Rep:
|
more efficient {min,max} than "sort | {head,tail} -1"?
Is there a more efficient way to get the min (max) than sorting the entire input and piping to head (tail) -1?
|
|
|
06-30-2009, 01:13 PM
|
#2
|
LQ Newbie
Registered: Jun 2009
Posts: 4
Original Poster
Rep:
|
In other words, does a command line "min" or "max" program exist in Unix?
|
|
|
06-30-2009, 02:03 PM
|
#3
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.
Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
|
|
|
06-30-2009, 11:30 PM
|
#4
|
ELF Statifier author
Registered: Oct 2007
Posts: 676
Rep: 
|
Quote:
Originally Posted by H_TeXMeX_H
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.
Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
|
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
|
|
|
06-30-2009, 11:46 PM
|
#5
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
Quote:
Originally Posted by magicbronson
Is there a more efficient way to get the min (max) than sorting the entire input and piping to head (tail) -1?
|
if you meant getting min/max value of a range of numbers (in a file), see here for an example.
|
|
|
07-01-2009, 02:36 AM
|
#6
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Quote:
Originally Posted by Valery Reznic
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
|
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
|
|
|
07-01-2009, 03:39 AM
|
#7
|
ELF Statifier author
Registered: Oct 2007
Posts: 676
Rep: 
|
Quote:
Originally Posted by H_TeXMeX_H
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
|
"at least go through every element and check it versus every other element",
No, you have not. You have only (as in ghostdog74's algorithm) compare each element with CURRENT minimum and current maximum, not each element with each other.
And this is surely less work then sorting.
|
|
|
07-01-2009, 03:58 AM
|
#8
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
Alright, you might be right, so I ran a benchmark:
Code:
bash-3.1$ for i in $(seq 100000); do echo $RANDOM >> file; done
bash-3.1$ time awk 'NR==1{
tempmin=$0
tempmax=$0
}
$0 >= tempmax{ tempmax=$0 }
$0 <= tempmin { tempmin = $0 }
END{
print "min: "tempmin
print "max: "tempmax
}' file
min: 0
max: 32767
real 0m0.078s
user 0m0.077s
sys 0m0.001s
bash-3.1$ time sort -n file > sorted && head -n1 sorted && tail -n1 sorted
real 0m0.082s
user 0m0.079s
sys 0m0.003s
0
9999
You'd be better off writing the minmax is C, it will likely be much faster.
Last edited by H_TeXMeX_H; 07-01-2009 at 04:00 AM.
|
|
|
07-01-2009, 04:35 AM
|
#9
|
Senior Member
Registered: Aug 2006
Posts: 2,697
|
Quote:
Originally Posted by H_TeXMeX_H
bash-3.1$ time sort -n file > sorted && head -n1 sorted && tail -n1 sorted
[/code]
|
one little things in this piece of code is that you are redirecting it to a file. Might take a bit of i/o time.
that said, another way to write
Code:
sort -n file | awk 'NR==1;END{print}'
one awk process to print first and last record, however i doubt it will be faster than head + tail combination.
Last edited by ghostdog74; 07-01-2009 at 04:40 AM.
|
|
|
07-01-2009, 04:50 AM
|
#10
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
yeah, it is slower:
Code:
bash-3.1$ time sort -n file | awk 'NR==1;END{print}'
0
32767
real 0m0.100s
user 0m0.097s
sys 0m0.004s
After thinking about it a bit, sorting is likely to be less efficient than just finding min and max, but if you have to use bash only then it won't matter. Best think to do is write a small C program.
|
|
|
07-01-2009, 05:24 AM
|
#11
|
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847
Rep: 
|
There's a tool in the Generic Mapping Tools (GMT) package called minmax which does just this: it reads a file and spits out the minimum and maximum values for each column. I'm not sure exactly how it's done, but on a file generated using your "for i in $(seq 100000); do echo $RANDOM >> file; done" command, these are the time results:
Code:
pwc@mike ~/desktop $ time minmax file
file: N = 100000 <0/32767>
minmax file 0.12s user 0.06s system 51% cpu 0.359 total
pwc@mike ~/desktop $ time sort -n file | awk 'NR==1;END{print}'
0
32767
sort -n file 0.19s user 0.01s system 99% cpu 0.203 total
awk 'NR==1;END{print}' 0.08s user 0.03s system 52% cpu 0.203 total
pwc@mike ~/desktop $ time awk 'NR==1{
> tempmin=$0
> tempmax=$0
> }
> $0 >= tempmax{ tempmax=$0 }
> $0 <= tempmin { tempmin = $0 }
> END{
> print "min: "tempmin
> print "max: "tempmax
> }' file
min: 0
max: 32767
awk file 0.23s user 0.03s system 105% cpu 0.250 total
I've attached the source for minmax, if anyone's interested. The project's homepage is http://gmt.soest.hawaii.edu/.
|
|
|
07-01-2009, 05:23 PM
|
#12
|
LQ Newbie
Registered: Jun 2009
Posts: 4
Original Poster
Rep:
|
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
|
|
|
07-01-2009, 11:12 PM
|
#13
|
LQ Newbie
Registered: Jun 2009
Posts: 4
Original Poster
Rep:
|
|
|
|
07-02-2009, 04:03 AM
|
#14
|
LQ Guru
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
|
You might want to ask for a multi-utility tool, because minmax is just not enough. If you want I'll write a short C program later today to do it. It's quite simple, so it shouldn't be hard.
|
|
|
07-02-2009, 04:07 AM
|
#15
|
Senior Member
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847
Rep: 
|
Quote:
Originally Posted by magicbronson
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
|
We use GMT all the time here at work for our plots, so it wasn't much of a find!
I'm sure it'd be pretty straight forward to hack out the GMT specific parts and be left with a simple minmax utility. But as you say, you've a different approach in mind.
If I weren't as busy as I am at the moment, I'd be tempted to have a go myself. Maybe I will, when I get bored! 
|
|
|
All times are GMT -5. The time now is 04:09 PM.
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|