LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software
User Name
Password
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

Notices


Reply
  Search this Thread
Old 06-30-2009, 01:11 PM   #1
magicbronson
LQ Newbie
 
Registered: Jun 2009
Posts: 4

Rep: Reputation: 0
more efficient {min,max} than "sort | {head,tail} -1"?


Is there a more efficient way to get the min (max) than sorting the entire input and piping to head (tail) -1?
 
Old 06-30-2009, 01:13 PM   #2
magicbronson
LQ Newbie
 
Registered: Jun 2009
Posts: 4

Original Poster
Rep: Reputation: 0
In other words, does a command line "min" or "max" program exist in Unix?
 
Old 06-30-2009, 02:03 PM   #3
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.

Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
 
Old 06-30-2009, 11:30 PM   #4
Valery Reznic
ELF Statifier author
 
Registered: Oct 2007
Posts: 676

Rep: Reputation: 137Reputation: 137
Quote:
Originally Posted by H_TeXMeX_H View Post
You know if you post in your own thread such that it has more than 1 (the original post) it removes your thread from the 0-reply list, and you're less likely to get help with an issue.

Either way, whatever way there is of finding max and min, it will ALWAYS involve sorting. And since sort is written in C, it's your best bet for efficiency. You could also do it with awk, but it would likely be less efficient.
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
 
Old 06-30-2009, 11:46 PM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244Reputation: 244Reputation: 244
Quote:
Originally Posted by magicbronson View Post
Is there a more efficient way to get the min (max) than sorting the entire input and piping to head (tail) -1?
if you meant getting min/max value of a range of numbers (in a file), see here for an example.
 
Old 07-01-2009, 02:36 AM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
Quote:
Originally Posted by Valery Reznic View Post
Why do you think that looking for min/max should ALWAYS involve sorting ?
A naive way to do it is sequence reading of lines and storing min/max value. Could be done with sh, awk, perl - (nearly) every available interpreter. Of course you can "abuse" sort too, but why it's mandatory ?
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
 
Old 07-01-2009, 03:39 AM   #7
Valery Reznic
ELF Statifier author
 
Registered: Oct 2007
Posts: 676

Rep: Reputation: 137Reputation: 137
Quote:
Originally Posted by H_TeXMeX_H View Post
Lets say you have an array or list of unsorted data and you want to find the min and max values. How would you do it ? You would have to sort the data or at least go through every element and check it versus every other element and see which is smallest and which is biggest. ghostdog74 posted the algorithm that would be applied. It effect even doing it this way you are sorting the data and doing at least as much work as effectively sorting the data.
"at least go through every element and check it versus every other element",
No, you have not. You have only (as in ghostdog74's algorithm) compare each element with CURRENT minimum and current maximum, not each element with each other.
And this is surely less work then sorting.
 
Old 07-01-2009, 03:58 AM   #8
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
Alright, you might be right, so I ran a benchmark:

Code:
bash-3.1$ for i in $(seq 100000); do echo $RANDOM >> file; done         
bash-3.1$ time awk 'NR==1{                                             
 tempmin=$0
 tempmax=$0
}
$0 >= tempmax{   tempmax=$0 }
$0 <= tempmin {  tempmin = $0 }
END{
  print "min: "tempmin
  print "max: "tempmax
}' file
min: 0
max: 32767

real	0m0.078s
user	0m0.077s
sys	0m0.001s
bash-3.1$ time sort -n file > sorted && head -n1 sorted && tail -n1 sorted

real	0m0.082s
user	0m0.079s
sys	0m0.003s
0
9999
You'd be better off writing the minmax is C, it will likely be much faster.

Last edited by H_TeXMeX_H; 07-01-2009 at 04:00 AM.
 
Old 07-01-2009, 04:35 AM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244Reputation: 244Reputation: 244
Quote:
Originally Posted by H_TeXMeX_H View Post
bash-3.1$ time sort -n file > sorted && head -n1 sorted && tail -n1 sorted
[/code]
one little things in this piece of code is that you are redirecting it to a file. Might take a bit of i/o time.

that said, another way to write
Code:
sort -n file | awk 'NR==1;END{print}'
one awk process to print first and last record, however i doubt it will be faster than head + tail combination.

Last edited by ghostdog74; 07-01-2009 at 04:40 AM.
 
Old 07-01-2009, 04:50 AM   #10
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
yeah, it is slower:

Code:
bash-3.1$ time sort -n file | awk 'NR==1;END{print}'
0
32767

real	0m0.100s
user	0m0.097s
sys	0m0.004s
After thinking about it a bit, sorting is likely to be less efficient than just finding min and max, but if you have to use bash only then it won't matter. Best think to do is write a small C program.
 
Old 07-01-2009, 05:24 AM   #11
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Rep: Reputation: 128Reputation: 128
There's a tool in the Generic Mapping Tools (GMT) package called minmax which does just this: it reads a file and spits out the minimum and maximum values for each column. I'm not sure exactly how it's done, but on a file generated using your "for i in $(seq 100000); do echo $RANDOM >> file; done" command, these are the time results:
Code:
pwc@mike ~/desktop $ time minmax file
file: N = 100000        <0/32767>
minmax file  0.12s user 0.06s system 51% cpu 0.359 total
pwc@mike ~/desktop $ time sort -n file | awk 'NR==1;END{print}'
0
32767
sort -n file  0.19s user 0.01s system 99% cpu 0.203 total
awk 'NR==1;END{print}'  0.08s user 0.03s system 52% cpu 0.203 total
pwc@mike ~/desktop $ time awk 'NR==1{                                           
>  tempmin=$0
>  tempmax=$0
> }
> $0 >= tempmax{   tempmax=$0 }
> $0 <= tempmin {  tempmin = $0 }
> END{
>   print "min: "tempmin
>   print "max: "tempmax
> }' file
min: 0
max: 32767
awk  file  0.23s user 0.03s system 105% cpu 0.250 total
I've attached the source for minmax, if anyone's interested. The project's homepage is http://gmt.soest.hawaii.edu/.
Attached Files
File Type: txt minmax.c.txt (19.1 KB, 13 views)
 
Old 07-01-2009, 05:23 PM   #12
magicbronson
LQ Newbie
 
Registered: Jun 2009
Posts: 4

Original Poster
Rep: Reputation: 0
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
 
Old 07-01-2009, 11:12 PM   #13
magicbronson
LQ Newbie
 
Registered: Jun 2009
Posts: 4

Original Poster
Rep: Reputation: 0
here is the message I posted to bug-coreutils: http://www.mail-archive.com/bug-core.../msg16920.html
 
Old 07-02-2009, 04:03 AM   #14
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
You might want to ask for a multi-utility tool, because minmax is just not enough. If you want I'll write a short C program later today to do it. It's quite simple, so it shouldn't be hard.
 
Old 07-02-2009, 04:07 AM   #15
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Rep: Reputation: 128Reputation: 128
Quote:
Originally Posted by magicbronson View Post
So a minmax utility does exist in some package (albeit one whose ~60 other utilities I have no need for)! Interesting find, pwc101. What I'm looking for though is something with a similar interface to coreutils' "sort", so I've just posted to the bug-coreutils mailing list asking if they might add it.
We use GMT all the time here at work for our plots, so it wasn't much of a find!

I'm sure it'd be pretty straight forward to hack out the GMT specific parts and be left with a simple minmax utility. But as you say, you've a different approach in mind.

If I weren't as busy as I am at the moment, I'd be tempted to have a go myself. Maybe I will, when I get bored!
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
LFS6.3 - Ch5.4.1 "/bin/sh sort not found" error at "make bootstrap" ubyt3m3 Linux From Scratch 2 06-23-2008 12:09 AM
How to make newer "tail" behave like older "tail" rylan76 Linux - Software 4 12-07-2007 04:27 AM
where can i find the paper "efficient dispersal of information for security.. " xzfxzf Programming 2 12-12-2005 03:59 AM
Howto end the "tail" command??? Schmurff Linux - Newbie 5 03-03-2004 05:32 AM
Error in Sendmail Tail : "apache set sender to <> using -f" cartfanatic39 Linux - Software 0 01-30-2004 10:04 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Software

All times are GMT -5. The time now is 04:09 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration