LinuxQuestions.org

shanenin 02-24-2005 10:50 AM

is optimized software really faster?
 
If you have a system like Gentoo that is optimized for your specific processor (an Athlon XP in my case), is it truly faster than a system built from binaries compiled to run on any i386 machine? I realize that in theory it is, but in practice can anyone really tell the difference? Are there any benchmarks that show the difference?

Scorpio 02-24-2005 10:56 AM

Not sure about benchmarks, but optimized builds would be faster than generic binaries.
The only bad thing about the optimisation of, say, Gentoo is the compile time.
Especially on older PCs, imo.

Hammett 02-24-2005 12:20 PM

I use Gentoo, and from the moment I select Linux Gentoo 2.6.9 in GRUB until Fluxbox is ready, the elapsed time is 1 minute.
Before that I was using FC3, and it took around 1:35 - 1:40 to start up.

I really think it's way faster but, as said above, compilation sometimes feels endless (it took me more than 1h to compile the Qt libraries).

foo_bar_foo 02-24-2005 12:25 PM

This is only for floating point math, so the results are not that significant overall:
http://home.comcast.net/~jcunningham...imization.html
His results with an Athlon XP show roughly a 45% increase in speed (about 31% less run time), going from
-O3 -march=i386 350 seconds
to
-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387 241 seconds

The difference would be even greater if he had used -O2 on the baseline.

On my own machine I ran tests once and came up with an overall increase of 11% in I/O throughput,
but that was just from compiling the test itself with different flags.

If you were to factor in how much slower it would actually be on a non-optimized kernel, non-optimized filesystem etc., I would imagine the difference would be at least 25%.
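
For anyone who wants to check the arithmetic on the two timings quoted above (350 seconds and 241 seconds), here is a minimal sketch. The function names are made up for illustration and do not come from the linked page.

```python
def speedup_percent(baseline_s, optimized_s):
    """Percentage increase in speed (work done per second) over the baseline."""
    return (baseline_s / optimized_s - 1.0) * 100.0


def time_saved_percent(baseline_s, optimized_s):
    """Percentage of the baseline run time that the optimized build saves."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


if __name__ == "__main__":
    baseline, optimized = 350.0, 241.0  # seconds, as quoted in the post
    print(f"speed increase: {speedup_percent(baseline, optimized):.1f}%")
    print(f"run time saved: {time_saved_percent(baseline, optimized):.1f}%")
```

The two figures answer different questions (how much faster versus how much less waiting), which is why the same benchmark can be quoted with quite different percentages.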

shanenin 02-24-2005 12:33 PM

thanks, that is some good information.

Hammett 02-24-2005 12:41 PM

Quote:

Originally posted by foo_bar_foo
This is only for floating point math, so the results are not that significant overall:
http://home.comcast.net/~jcunningham...imization.html
His results with an Athlon XP show roughly a 45% increase in speed (about 31% less run time), going from
-O3 -march=i386 350 seconds
to
-march=athlon-xp -O3 -ffast-math -malign-double -funroll-loops -pipe -fomit-frame-pointer -msse -mfpmath=sse,387 241 seconds

The difference would be even greater if he had used -O2 on the baseline.

On my own machine I ran tests once and came up with an overall increase of 11% in I/O throughput,
but that was just from compiling the test itself with different flags.

If you were to factor in how much slower it would actually be on a non-optimized kernel, non-optimized filesystem etc., I would imagine the difference would be at least 25%.

So, which are the desirable optimization flags? All of them? I mean, I'm not a guru in hardware/software and I didn't change the optimizations for my arch much (-O2 -march=pentium4 -fomit-frame-pointer), so I really don't know whether I should add more and more flags to the make.conf file to get a faster machine or whether, on the contrary, adding more flags will end up in a mess....

PMorph 02-24-2005 01:43 PM

I'm sure these optimizations can really make a difference in computationally heavy tasks like 3D rendering, audio/video encoding and other number crunching.

But I doubt they make any noticeable difference in everyday desktop usage or X performance.
Or do they? Anyone?

bulliver 02-24-2005 01:53 PM

I don't think it creates a noticeable difference at all. Don't get me wrong, I use and love gentoo, but not for any supposed speed increase.

I used to be one of those Gentoo 'ricers' who used uber-optimized gcc flags, but gave up because of broken builds and unstable software. Now I use a 'sober' -O2.

While a .002 second speed increase may be nice in the HPC domain, it is meaningless to your typical end user.

Real gentoo'ers know the reason we use it is because of:
1. choice
2. portage

not any supposed speed increase.

Tinkster 02-24-2005 01:57 PM

Quote:

Originally posted by Hammett
I use Gentoo, and from the moment I select Linux Gentoo 2.6.9 in GRUB until Fluxbox is ready, the elapsed time is 1 minute.
Before that I was using FC3, and it took around 1:35 - 1:40 to start up.

I really think it's way faster but, as said above, compilation sometimes feels endless (it took me more than 1h to compile the Qt libraries).

Heh ... Slackware 8.1 ("optimized" for i386) was booting in 40
seconds on a P4 1.6 ... Mandreck (i586) on the same machine
was booting in 110 ... does that mean that i386 code is faster? Or
do you think it might be due to different services that are being
started and the way the init-scripts work?

As others have pointed out ... it won't have much impact on
everyday work. I noticed a negligible difference in e.g.
video-stream encoding when I recompiled transcode (and
the libs it uses) for -march=pentium4 -msse2 ... encoding
the same video gained just under a minute, which doesn't
make a lot of a difference when you're looking at 7 hours ;)


Cheers,
Tink

rshaw 02-24-2005 02:02 PM

in case anyone missed the 'ricer' reference:
http://funroll-loops.org/ careful, naughty language therein.

__J 02-24-2005 03:02 PM

Quote:

Originally posted by Hammett
So, which are the desirable optimization flags? All of them? I mean, I'm not a guru in hardware/software and I didn't change the optimizations for my arch much (-O2 -march=pentium4 -fomit-frame-pointer), so I really don't know whether I should add more and more flags to the make.conf file to get a faster machine or whether, on the contrary, adding more flags will end up in a mess....

Too much is a bad thing and can have a negative impact on performance. If you put too many in, you will start to see erratic behavior/crashes from some of your applications.

Do a Google search, as there are a lot of references out there:

http://home.comcast.net/~jcunningham...ion_Flags.html
http://freshmeat.net/articles/view/730/
http://www.emerson.emory.edu/service...e_Options.html

dastrike 02-24-2005 03:14 PM

Overall: the actual performance increase is not really worth the time spent recompiling everything with every bells-and-whistles "optimization" setting.

In specific cases, though, particular programmes do show noticeable performance increases.

As with all performance optimization: identify your bottleneck first. Then resolve that bottleneck and move on to the next one. There are many bottlenecks to fix before one needs to go beyond a plain -O2 during compilation (and in many cases -O3 just makes the code slower and the executable far more bloated).
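
The bottleneck point can be made concrete with Amdahl's law: if only part of the wall-clock time is spent in CPU-bound code that compiler flags can speed up, the overall gain is capped by everything else. A minimal sketch follows. The workload names, the fractions and the 1.3x figure are made-up illustrations, not measurements.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: overall speedup when only cpu_fraction of the run time
    becomes cpu_speedup times faster and the rest is unchanged."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Hypothetical workloads: (label, share of time spent in CPU-bound code)
    workloads = [
        ("video encode", 0.95),
        ("desktop app start", 0.20),
        ("disk-bound copy", 0.05),
    ]
    for label, fraction in workloads:
        # Assumption: the flags make the CPU-bound part 1.3x faster.
        gain = overall_speedup(fraction, 1.3)
        print(f"{label:>18}: {gain:.2f}x overall")
```

With numbers like these, the encoder benefits visibly while the disk-bound job barely moves, which is the usual argument for measuring where the time goes before touching the flags.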

voyciz 02-24-2005 03:19 PM

In the March edition of Linux Journal, a test was conducted with a program that "performs a bubble search over 10,000 elements. The elements in the array have been reversed to force the worst-case scenario". Results:

$ gcc -o sort sort.c -O2
$ time ./sort

real 0m1.036s
user 0m2.030s
sys 0m0.000s

$ gcc -o sort sort.c -O2 -march=pentium2
$ time ./sort

real 0m0.799s
user 0m0.790s
sys 0m0.010s

That is 237ms (23%) less run time, or roughly a 30% speed increase. The tests were conducted on a 633MHz Celeron. Hope this helps.
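
The percentages can be derived straight from the two "real" lines above. A minimal sketch, assuming the minutes-and-seconds format that bash's time prints; the helper names are hypothetical.

```python
import re


def parse_time(value):
    """Convert a bash `time` field such as '0m1.036s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def compare(baseline, optimized):
    """Return (seconds saved, % of run time saved, % speed increase)."""
    base_s, opt_s = parse_time(baseline), parse_time(optimized)
    saved = base_s - opt_s
    return saved, saved / base_s * 100.0, (base_s / opt_s - 1.0) * 100.0


if __name__ == "__main__":
    # The two 'real' values from the runs quoted in the post.
    saved, time_pct, speed_pct = compare("0m1.036s", "0m0.799s")
    print(f"saved {saved * 1000:.0f} ms "
          f"({time_pct:.1f}% less time, {speed_pct:.1f}% faster)")
```

A single run per build is a thin basis for any percentage, so treat the result as indicative only.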

sh1ft 02-24-2005 03:33 PM

I'm sorry, but this Gentoo speed increase crap is a bunch of bs. It does nothing to improve interactivity or reduce latency, and is mostly useless on a desktop OS. I'd rather wait .0002 seconds longer for Firefox to start up than waste 10 hours compiling it. What a waste of time and CPU cycles. The only difference it makes is in programs that keep the CPU at 100% and do calculation-intensive tasks like number crunching and video or music encoding.

In fact, for some programs it INCREASES startup time by making the binary bigger.

In the words of Vanilla Ice:

rice rice baby... ;)

bulliver 02-24-2005 04:20 PM

Could have sworn that's what I said 6 posts ago, but a lot more politely.
Stop equating gentoo users with ricers. We're not all boneheads you know.

shengchieh 02-24-2005 05:30 PM

Benchmarking Links
http://www.tux.org/~mayer/linux/bmark.html (Linux/Unix nbench)
http://www.nobell.org/~gjm/linux/benchmark.html (Linux Benchmark Notes)

Sheng-Chieh

Hammett 02-24-2005 05:45 PM

Quote:

Originally posted by Tinkster
Heh ... Slackware 8.1 ("optimized" for i386) was booting in 40
seconds on a P4 1.6 ... Mandreck (i586) on the same machine
was booting in 110 ... does that mean that i386 code is faster? Or
do you think it might be due to different services that are being
started and the way the init-scripts work?

As others have pointed out ... it won't have much impact on
everyday work. I noticed a negligible difference in e.g.
video-stream encoding when I recompiled transcode (and
the libs it uses) for -march=pentium4 -msse2 ... encoding
the same video gained just under a minute, which doesn't
make a lot of a difference when you're looking at 7 hours ;)


Cheers,
Tink

Yeah, of course Gentoo has fewer services (not that many fewer), but I'm also completely sure that optimization did its work in getting me a faster machine. More proof is that OO now opens way faster than in FC3 (from 12-15 secs down to 7-8). That's quite a lot of improvement.
And of course, I won't recompile the whole system again just because I missed a couple of flags... I was just wondering what the best "optimization" was that I could get via flags for my P4 2.4GHz.

KimVette 02-24-2005 08:24 PM

Processor optimizations are well worth it. Think about it - your Linux system does not run just one small monolithic executable.

Note: I am using arbitrary numbers just to illustrate a point.

If, say, a particular code path through a menu based on a Qt widget takes 50ms to execute, you're not looking at JUST the 50ms delay from the lack of optimizations. You're looking at a 50ms performance degradation on top of another one: let's say file I/O to grab the icon resource. Let's say the lack of optimization for that action costs another 50ms. Now we're at 100ms, a tenth of a second. Not so bad, right?

Now let's say you're running XFree86 with the VESA driver because you have an ATI card and we all know ATI's support for Linux is piss-poor (those arrogant bastids!). Now let's say, purely for example, that you're running 1920x1440 at 24 bits, and you have KDE's transparent menus enabled. Now we'll assume, for no other reason than to illustrate the point, that you incur a 200ms delay across all of the various KDE and XFree86 libraries. This (hypothetical) menu, which would normally take under 100ms to open with optimized code, is now taking 300ms LONGER to open on a non-accelerated VESA driver. This will make the system feel very unresponsive and downright annoying to run. In isolation a few milliseconds here and there aren't very noticeable, but we're not talking about one single code path through one single widget here. We're talking about tens of millions of possible paths through thousands of widgets and libraries in your system, resulting in significant delays.
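
To put the made-up numbers above in one place (they are arbitrary, as noted, and only show how the layers add up), here is a small sketch. The layer labels are paraphrased from the example and the names are hypothetical.

```python
# Hypothetical per-layer penalties in milliseconds, taken from the example above.
extra_delay_ms = {
    "Qt widget code path": 50,
    "icon file I/O": 50,
    "KDE + XFree86 libraries on unaccelerated VESA": 200,
}


def total_delay_ms(delays):
    """Sum the per-layer penalties that stack up on a single user action."""
    return sum(delays.values())


if __name__ == "__main__":
    running = 0
    for layer, ms in extra_delay_ms.items():
        running += ms
        print(f"{layer:<48} +{ms:>3} ms  (running total {running} ms)")
    print(f"extra delay per menu open: {total_delay_ms(extra_delay_ms)} ms")
```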

Now you're going to say that optimizations cost disk space and time to load. This is true. The way the most beneficial and common optimizations work is by "flattening" (unrolling) iterative loops so that the CPU can run straight through the instructions sequentially rather than having to keep branching back. In most cases this improves performance at a small memory and storage footprint penalty, and the I/O to load the larger code segment costs less time than the looping would.

Eliminating these individually minute but cumulative delays in games results in an improvement of many frames per second in your favorite games.

Eliminating these individually minute but cumulative delays in NLE renderers results in saving many hours on small video production jobs, days on moderate-sized jobs, and WEEKS to MONTHS on large projects. (Use hardware encoders, you say? Guess what is running in those "hardware" encoders? That's right: highly optimized software. Only with the "hardware" encoders, you cannot set up a rendering farm.)

Eliminating these individually minute but cumulative delays in modeling/3D renderers results in saving many weeks to months on moderate to large 3D modeling/rendering jobs.

Eliminating these individually minute but cumulative delays in development packages such as kdevelop, anjuta, make, and g++ results in saving TONS of time in developing, compiling, debugging, recompiling, etc. your favorite Linux games and window managers.

Eliminating these individually minute but cumulative delays in OOo makes for an office suite which performs better than Microsoft Office. Oh wait, it doesn't: the OOo team does not consider resolving their SEVERE performance issues to be a priority, even when it takes 2 hours to open a darn spreadsheet that Microsoft Office 2000 under Wine can open in under a minute, or Office XP under Windows on the same hardware can open in under 10 seconds. No, optimization of your code isn't important. ;) (I had to flame the OOo team here)

Tinkster 02-24-2005 08:27 PM

Quote:

Originally posted by Hammett
Yeah, of course Gentoo has fewer services (not that many fewer), but I'm also completely sure that optimization did its work in getting me a faster machine.

Uh-huh...

Quote:

More proof is that OO now opens way faster than in FC3 (from 12-15 secs down to 7-8). That's quite a lot of improvement.

And you know for a fact that other than the CPU
optimizations all other (project-relevant) settings from
OOs perspective were identical in FC3 and Gentoo,
I take it? Like building with/without Python support and
the likes?


Cheers,
Tink

foo_bar_foo 02-24-2005 10:13 PM

Quote:

Originally posted by Hammett
-march=pentium4 so I really don't know if I should put more and more flags to the make.conf file for getting a faster machine or, on the contrary, putting more flags will end up in a mess....
I have done extensive benchmarking of my Pentium 4.
Overall, the fastest set of flags seems to be
-march=pentium4 -O3 -funroll-loops -ftracer -momit-leaf-frame-pointer -fprefetch-loop-arrays

There are some other, more bizarre sets that start from -O1 and pick and choose the rest, and they are faster for certain kinds of applications, but finding them would take testing effort way beyond the performance gains.

shanenin 02-24-2005 10:18 PM

what are ricers?

foo_bar_foo 02-24-2005 10:29 PM

Quote:

Originally posted by sh1ft
I'm sorry, but this Gentoo speed increase crap is a bunch of bs.
In the words of Vanilla Ice:

rice rice baby... ;)

It is fascinating that this topic provokes such visceral responses.

I assume these same people just spend money on newer machines instead of spending time on building their operating system, which seems fine (same performance increase),

but the angry response suggests they are neither happy inside nor comfortable with their choice.

I don't use Gentoo, nor have I ever investigated what it is all about.

"Ricers" of course implies Japanese cars as opposed to AMERICAN cowboy, wasteful, gas-guzzling, never-last-more-than-2-years cars, and I believe in this context it is meant to be RACIST, which is inappropriate on an international forum that is not about FASCIST HATE.

Hammett 02-25-2005 07:00 AM

Quote:

Originally posted by Tinkster
Uh-huh...


And you know for a fact that other than the CPU
optimizations all other (project-relevant) settings from
OOs perspective were identical in FC3 and Gentoo,
I take it? Like building with/without Python support and
the likes?


Cheers,
Tink

Maybe you're right. I really can't tell under which circumstances OO in FC3 was built. But the important thing for me is HOW FAST the system responds to me under the same demands. And under these circumstances, Gentoo kicks FC3's ass. I really don't care that much about the types of builds (I'm an economist, not a computer scientist), so the relevant thing for me is that I see the system run faster and more stably (far fewer errors) in Gentoo than in FC3.
Optimization of CPU flags HAS to have an effect on the performance of the whole system, and since I'm not a guru in informatics, that is my conclusion. Maybe I'm right, maybe not. But I clearly think the optimizations did their work for me, and I see it and feel it.

doralsoral 02-25-2005 07:23 AM

First of all, I'm a Gentoo user so I'm not knocking it, but all these people are comparing Gentoo's boot time with other distros like Fedora Core and Mandrake. Those distros install a bunch of applications that are started at boot time. Gentoo doesn't do this, so of course it's going to be faster. I know you can supposedly tell Fedora not to install these, but it still sneaks some stuff in there.

foo_bar_foo 02-25-2005 08:15 AM

OK, so we have objective measurements in the form of benchmarking.
KimVette has pointed out the layered or cumulative effect of those objective measures quite clearly,
and we have subjective observations about boot time which, if you have ever done this, is one of the most glaringly obvious things you see right off, and you know it's real.

Example: from the time I hit return on "startx" to the time I have a fully up and running KDE desktop
is 8 seconds! And each time I see it, it's obviously blazing.

Tinkster has pointed out that there could be differences in optional included functionality, and that makes sense, but it is really just another form of optimization done through compiling and cannot in any way account for the benchmarking results.

So, in the face of obvious evidence many still deny the reality. Why?

Which leads us back to psychological factors, so clearly demonstrated by the undifferentiated and simplistic categorization and demonization of people interested in efficiency as "ricers",
a term steeped in xenophobia and racism.

This leads one to believe that desktop computers must be performing some totem-like or mythological function in people's lives (a sad reality). Like, for instance, if I believed my computer magically caused other people to think I'm smart. If I came across people discussing something that confused or intimidated me, in response I might need to project that discomfort (my myth has clashed with reality) back outward onto a perceived group, "ricers", who are the embodiment of evil, so I can feel good about myself again and regain my mythological footing.

Hammett 02-25-2005 08:58 AM

Quote:

Originally posted by foo_bar_foo
This leads one to believe that desktop computers must be performing some totem-like or mythological function in people's lives (a sad reality). Like, for instance, if I believed my computer magically caused other people to think I'm smart. If I came across people discussing something that confused or intimidated me, in response I might need to project that discomfort (my myth has clashed with reality) back outward onto a perceived group, "ricers", who are the embodiment of evil, so I can feel good about myself again and regain my mythological footing.

LMAO! Great argument, or at least quite a funny one. But I think that the people who deny that optimization improves performance say so mainly because they don't see optimization as something worthwhile. For instance, some may say FC3 is faster (or slower, it doesn't matter) than Mdk 10... Well, I could agree or not, but from my point of view (totally subjective) Mdk will be slower than FC3. And if I truly believed that, I wouldn't listen to anybody saying the contrary. That's how I think people are behaving in this thread. Those who don't believe in optimization don't because their beliefs say so, but they don't make the effort of trying to change their minds or of questioning whether they're right about what they're saying.
If optimization weren't effective, no one would waste their time doing it. Human beings can be very dumb, but not that dumb. If some people do it, it's because it works. And if it isn't a common way to work with computers, that is because optimization knowledge (at this level) is very specialized, known only by those who work hard on computers or by real geeks. Unfortunately I'm neither, but I make the effort of grabbing a Gentoo, breaking my balls installing and configuring it, and trying as well as I can to optimize the resources of my Pentium 4. That's definitely worth it for me.

PMorph 02-25-2005 12:31 PM

What we need here is a "blind" ABX test ;)
Soo.. anyone with two identical systems and a lot of spare time...
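
A blind ABX run could be scored along these lines. This is a minimal sketch, assuming a helper boots the tester into system A (optimized) or system B (generic) without saying which; the function names and the 16-trial count are illustrative choices, not an established protocol.

```python
import random
from math import comb


def make_trials(n, seed=None):
    """For each of n trials, randomly pick whether the hidden system X is 'A' or 'B'."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n)]


def score(trials, guesses):
    """Count how many guesses match the hidden assignment."""
    return sum(t == g for t, g in zip(trials, guesses))


def p_value(correct, n):
    """Chance of getting at least `correct` of n right by coin-flip guessing."""
    return sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n


if __name__ == "__main__":
    trials = make_trials(16, seed=1)
    # Pretend the tester got the first 12 right and the last 4 wrong.
    guesses = trials[:12] + ["B" if t == "A" else "A" for t in trials[12:]]
    hits = score(trials, guesses)
    print(f"{hits}/16 correct, p = {p_value(hits, 16):.4f}")
```

A small p-value would suggest the tester really can tell the two systems apart. A result near chance would support the "no noticeable difference" camp.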

KimVette 02-25-2005 06:10 PM

Bah. I run Suse - a distribution considered by some to be bloated. On this box what I have configured to start automatically in runlevel 5 are:

MySQL
Apache2
KDM
webmin
sshd
samba
postfix
networking (xinetd, etc. of course)
hotplug
cups
alsa
XFree86
(along with the other usual runlevel 5 stuff)
KDE Kwin
KMix
Kopete
Various other utilities and applets that sit in the tray and dock

And yet, my system boots more quickly in Linux than it does in Windows XP. It's a lowly dual Pentium III at only 975MHz with 1GB RAM, and I have to run the VESA driver because ATI's crappy Catalyst driver won't run with my dual-head setup. I have all the KDE eye candy turned on with the thin Keramik style, and yet the system runs just fine. I don't know why all you folks are complaining about performance when most of you are probably running on 2.4GHz or faster Pentium 4s. :)

--Kim

voyciz 02-25-2005 06:34 PM

I'm actually getting some new hardware this weekend, and what I'm going to do is a normal Slack install just to have something running on it. I'll run some tests with that. In the meantime, I'll be putting together a custom install CD completely optimized for that new hardware. Then I'll run some tests when I have that installed and see just how much faster things really are. Of course, this will take a while to do, but I will post my results in this section when I'm finished. Will probably be at least a few weeks (have lots of compiling to do). What sorts of tests do you guys think I should run?
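
Whatever tests are chosen, one way to keep the comparison fair is to run each of them several times on both installs and compare medians rather than single runs. A minimal sketch of such a harness follows. The command shown is only a placeholder and the function names are hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (median, min, max); the median shrugs off a single slow run."""
    return statistics.median(durations), min(durations), max(durations)


if __name__ == "__main__":
    # Placeholder command: swap in the real benchmark built with each flag set.
    command = [sys.executable, "-c", "sum(range(1_000_000))"]
    median, fastest, slowest = summarize(time_command(command))
    print(f"median {median:.3f}s (min {fastest:.3f}s, max {slowest:.3f}s)")
```

Running the same harness on the stock install and on the optimized one, with the same services running, would give numbers that can actually be compared.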


All times are GMT -5.