LinuxQuestions.org - is optimized software really faster?

- - is optimized software really faster? (https://www.linuxquestions.org/questions/linux-software-2/is-optimized-software-really-faster-294353/)

Benchmarking Links
http://www.tux.org/~mayer/linux/bmark.html (Linux/Unix nbench)
http://www.nobell.org/~gjm/linux/benchmark.html (Linux Benchmark Notes)

Sheng-Chieh

Quote:

Originally posted by Tinkster
Heh ... Slackware 8.1 ("optimized" for i386) was booting in 40
seconds on a P4 1.6 ... Mandreck (i586) on the same machine
was booting in 110 ... does that mean that i386 code is faster? Or
do you think it might be due to different services that are being
started and the way the init-scripts work?

As others have pointed out ... it won't have much impact on
everyday work. I noticed a negligible difference in e.g.
video-stream encoding when I recompiled transcode (and
the libs it uses) for -march=pentium4 -msse2 ... encoding
the same video gained just under a minute, which doesn't
make a lot of a difference when you're looking at 7 hours ;)

Cheers,
Tink

Yeah, of course Gentoo has fewer services (not that many fewer), but I'm also completely sure that optimization did its work in getting me a faster machine. Another proof is that OO now opens way faster than in FC3 (from 12-15 secs down to 7-8). That's quite a lot of improvement.
And of course, I won't recompile the whole system again only because I missed a couple of flags... I was just wondering what the best "optimization" is that I could get via flags for my P4 2.4GHz.

Processor optimizations are well worth it. Think about it - your Linux system does not run just one small monolithic executable.

Note: I am using arbitrary numbers just to illustrate a point.

If, say, a particular code path through a menu based on a Qt widget takes 50ms to execute, you're not looking at JUST the 50ms delay from the lack of optimizations. You're looking at 50ms performance degradation on top of another one - let's say file I/O to grab the icon resource. Let's say the optimization, or lack thereof, for that action, costs another 50ms. Now we're at 100ms, a tenth of a second. Not so bad, right?

Now let's say you're running XFree86 with the VESA driver because you have an ATI card and we all know ATI's support for Linux is piss-poor (those arrogant bastids!). Now let's say, purely for example, that you're running 1920x1440x24bits, and you have KDE's transparent menus enabled. Now we'll assume, for no other reason than to illustrate the point, that you incur a 200ms delay between all of the various KDE and XFree86 libraries. This (hypothetical) menu, which would normally take under 100ms to open with optimized code, is now, on a non-accelerated VESA driver, taking 300ms LONGER to open. This will make the system feel very unresponsive and downright annoying to run. In isolation a few milliseconds here and there aren't very noticeable, but we're not talking about one single codepath through one single widget here - we're talking about tens of millions of possible paths through thousands of various widgets and libraries in your system, resulting in significant delays.
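To make the arithmetic in the example above explicit, here is a minimal sketch that just adds up the per-layer delays. The numbers are the same arbitrary figures used in the post (they are not measurements), and the layer labels and the function name are made up for illustration.

```python
# Hypothetical per-layer delays in milliseconds, copied from the
# arbitrary figures in the post above. These are NOT measurements.
delays_ms = {
    "Qt widget code path": 50,
    "file I/O for the icon resource": 50,
    "KDE and XFree86 libraries on unaccelerated VESA": 200,
}


def total_delay_ms(delays):
    """Add up the per-layer delays for a single menu open."""
    return sum(delays.values())


total = total_delay_ms(delays_ms)
for layer, ms in delays_ms.items():
    # Show how much each layer contributes to the overall delay.
    print(f"{layer}: {ms} ms ({ms / total:.0%} of the total)")
print(f"total extra delay per menu open: {total} ms")
```

The point of the sketch is only that the delays stack: no single layer looks bad on its own, but the sum is what the user feels.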

Now you're going to say that optimizations cost disk space, and time to load - this is true. The way that the most beneficial and common optimizations work is by "flattening" (unrolling) iterative loops so that the CPU can run straight through the instructions in a sequential fashion, rather than having to keep branching back and re-fetching. In most cases, it is going to improve performance, at a small memory and storage footprint penalty, and the I/O to load the larger code segment is going to cost less time than the looping would.
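For anyone who hasn't seen what "flattening" a loop looks like, here is a rough sketch of the transformation done by hand. A compiler does this on the generated machine code (GCC's -funroll-loops is the flag usually associated with it), so the Python below only illustrates the shape of the rewrite. It makes no claim about Python speed, and the function names are invented for the example.

```python
def sum_rolled(values):
    """Plain loop: one loop-condition check and one add per element."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same result, with the loop body repeated four times per iteration."""
    total = 0
    n = len(values)
    i = 0
    # Main unrolled body: four adds for every loop-condition check.
    while i + 4 <= n:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop for the last 0 to 3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))
```

The unrolled version is longer, which is the code-size penalty mentioned above, but it does fewer loop checks and jumps for the same amount of work.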

Eliminating these individually-minute but cumulative delays in games results in improvements of many frames per second in your favorite games.

Eliminating these individually-minute but cumulative delays in NLE renderers results in saving many hours on small video production jobs, days on moderate-sized jobs, and WEEKS to MONTHS on large projects. (Use hardware encoders, you say? Guess what is running in those "hardware" encoders? That's right - highly-optimized software - only with the "hardware" encoders, you cannot set up a rendering farm).

Eliminating these individually-minute but cumulative delays in modeling/3D renderers results in saving many weeks to months on moderate to large 3D modeling/rendering jobs.

Eliminating these individually-minute but cumulative delays in development packages such as kdevelop, anjuta, make, and g++ results in saving TONS of time in developing, compiling, debugging, recompiling, etc. your favorite Linux games and window managers.

Eliminating these individually-minute but cumulative delays in OOo makes for an office suite which performs better than Microsoft Office. Oh wait, it doesn't - the OOo team does not consider resolving their SEVERE performance issues to be a priority - even when it takes 2 hours to open a darn spreadsheet that Microsoft Office 2000 under wine can open in under a minute - or OfficeXP under Windows on the same hardware can open in under 10 seconds. No, optimization of your code isn't important. ;) (I had to flame the OOo team here)

Quote:

Originally posted by Hammett
Yeah, of course Gentoo has fewer services (not that many fewer), but I'm also completely sure that optimization did its work in getting me a faster machine.

Uh-huh...

Quote:

Another proof is that OO now opens way faster than in FC3 (from 12-15 secs down to 7-8). That's quite a lot of improvement.

And you know for a fact that other than the CPU
optimizations all other (project-relevant) settings from
OOs perspective were identical in FC3 and Gentoo,
I take it? Like building with/without Python support and
the likes?

Cheers,
Tink

Quote:

Originally posted by Hammett
-march=pentium4 so I really don't know if I should put more and more flags to the make.conf file for getting a faster machine or, on the contrary, putting more flags will end up in a mess....

i have done extensive benchmarking of my pentium4
overall the fastest set of flags seems to be
-march=pentium4 -O3 -funroll-loops -ftracer -momit-leaf-frame-pointer -fprefetch-loop-arrays

there are some other, more bizarre sets using -O1 plus a pick-and-choose selection of the remaining flags that are faster for certain kinds of applications, but finding them would take testing effort way beyond the performance gains

Quote:

Originally posted by sh1ft
I'm sorry but this Gentoo speed increase crap is a bunch of bs.
In the words of Vanilla Ice:

rice rice baby... ;)

It is fascinating that this topic provokes such visceral responses.

I assume these same people just spend money on newer machines instead of their time on building their operating system, which seems fine (same performance increase).

But the angry response suggests they are neither happy inside nor comfortable with their choice.

I don't use Gentoo, nor have I ever investigated what it is all about.

"Ricers" of course implies Japanese cars, as opposed to AMERICAN cowboy, wasteful, gas-guzzling, never-last-more-than-2-years cars, and I believe in this context it is meant to be RACIST, which is inappropriate on an international forum that is not about FASCIST HATE.

Quote:

Originally posted by Tinkster
Uh-huh...

And you know for a fact that other than the CPU
optimizations all other (project-relevant) settings from
OOs perspective were identical in FC3 and Gentoo,
I take it? Like building with/without Python support and
the likes?

Cheers,
Tink

Maybe you're right. I really can't tell under which circumstances OO in FC3 was built. But the important thing for me is HOW FAST the system responds to me under the same demands. And under these circumstances, Gentoo kicks FC3's ass. I really don't care that much about the types of the builds (I'm an economist, not a computer scientist), so the relevant thing for me is that I see the system run faster and more stably (far fewer errors) in Gentoo than in FC3.
Optimization of CPU flags HAS to have its effects on the performance of the whole system, and since I'm not a guru in informatics, that is my conclusion. Maybe I'm right, maybe not. But I clearly think optimization did its work for me, and I see it and feel it that way.

First of all, I'm a Gentoo user, so I'm not knocking it, but all these people are comparing Gentoo boot time with other distros, Fedora Core and Mandrake. These distros install a bunch of applications that are started at boot time. Gentoo doesn't do this, so of course it's going to be faster. I know you can supposedly tell Fedora not to install these, but it still sneaks some stuff in there.

ok -- so we have objective measurements in the form of benchmarking.
KimVette has pointed out the layered or cumulative effect of those objective measures quite clearly,
and we have subjective observations about boot time which, if you have ever done this, is one of the most glaringly obvious things you see right off, and you know it's real.

Example: from the time I hit return on "startx" to the time I have a fully up and running KDE desktop
is 8 seconds! And each time I see it, it's obviously blazing.

Tinkster has pointed out that there could be differences in optional included functionality, and that makes sense, but it is really just another form of optimization done through compiling and cannot in any way account for the benchmarking results.

So, in the face of obvious evidence many still deny the reality -- why?

Which leads us back to psychological factors, so clearly demonstrated by the undifferentiated and simplistic categorization and demonization of people interested in efficiency as "ricers",
a term steeped in xenophobia and racism.

This leads one to believe that desktop computers must be performing some totem-like or mythological function in people's lives (a sad reality). Like, for instance, if I believed my computer magically caused other people to think I'm smart. If I came across people discussing something about the computer that confused or intimidated me, I might need to project that discomfort (my myth has clashed with reality) back outward onto a perceived group, "ricers", who are the embodiment of evil, so I can feel good about myself again and regain my mythological footing.

Quote:

Originally posted by foo_bar_foo
This leads one to believe that desktop computers must be performing some totem-like or mythological function in people's lives (a sad reality). Like, for instance, if I believed my computer magically caused other people to think I'm smart. If I came across people discussing something about the computer that confused or intimidated me, I might need to project that discomfort (my myth has clashed with reality) back outward onto a perceived group, "ricers", who are the embodiment of evil, so I can feel good about myself again and regain my mythological footing.

LMAO! Great argument, or at least quite a funny one. But I think the people who deny that optimization speeds up performance say so mainly because they don't see optimization as something worthwhile. For instance, some may say FC3 is faster (or slower, doesn't matter) than Mdk 10... Well, I could agree or not, but from my point of view (totally subjective) Mdk will be slower than FC3. And if I truly believe that, I won't listen to anybody saying the contrary. That's how I think people are behaving in this thread. Those who don't believe in optimization don't because their beliefs say so, but they don't make the effort of trying to change their minds, questioning whether they're right or not about what they're saying.
If optimization wasn't effective, no one would waste their time doing it. Human beings can be very dumb, but not that dumb. If some people do it, it's because it works. And if it isn't a common way to work with computers, that is because optimization knowledge (at this level) is very restricted, only known by those who work hard on computers or real geeks. Unfortunately I'm neither, but I make the effort of grabbing a Gentoo, breaking my balls installing and configuring it, and trying as well as I can to optimize the resources of my Pentium 4. That's definitely worth it for me.

What we need here is a "blind" ABX test ;)
Soo.. anyone with two identical systems and a lot of spare time...
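As a rough idea of how such a blind test could be scored: on each trial someone else secretly sets the tester in front of system A or system B (the "X"), the tester guesses which one it is, and afterwards you check how likely that score would be from pure guessing. The sketch below is one way to do the bookkeeping. The function names, the 10-trial count, and the fixed seed are all assumptions for illustration, not a standard protocol.

```python
import random
from math import comb


def make_trials(n_trials, seed=None):
    """Randomly pick which system ('A' or 'B') is presented as X in each trial."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(trials, guesses):
    """Count how many guesses match the hidden assignment."""
    return sum(1 for truth, guess in zip(trials, guesses) if truth == guess)


def p_value(correct, n_trials):
    """Chance of at least `correct` hits out of n_trials by pure guessing (p = 0.5)."""
    ways = sum(comb(n_trials, k) for k in range(correct, n_trials + 1))
    return ways / 2 ** n_trials


trials = make_trials(10, seed=1)
guesses = list(trials)  # stand-in for a tester who can always tell the two apart
hits = score(trials, guesses)
print(f"{hits}/10 correct, p = {p_value(hits, 10):.4f}")
```

A score near half right with a large p-value would suggest the tester cannot actually feel the difference between the two builds.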

Bah. I run Suse - a distribution considered by some to be bloated. On this box what I have configured to start automatically in runlevel 5 are:

MySQL
Apache2
KDM
webmin
sshd
samba
postfix
networking (xinetd, etc. of course)
hotplug
cups
alsa
XFree86
(along with the other usual runlevel 5 stuff)
KDE Kwin
KMix
Kopete
Various other utilities and applets that sit in the tray and dock

And yet, my system boots more quickly in Linux than it does in Windows XP. It's a lowly dual Pentium III at only 975MHz, 1GB RAM, and I have to run the VESA driver because ATI's crappy Catalyst driver won't run with my dual-head setup. I have all the KDE eye candy turned on with the thin Keramik style, and yet the system runs just fine. I don't know why all you folks are complaining about performance when most of you are probably running on 2.4GHz or faster Pentium 4s. :)

--Kim

I'm actually getting some new hardware this weekend, and what I'm going to do is a normal Slack install just to have something running on it. I'll run some tests with that. In the meantime, I'll be putting together a custom install CD completely optimized for that new hardware. Then I'll run some tests when I have that installed and see just how much faster things really are. Of course, this will take a while to do, but I will post my results in this section when I'm finished. Will probably be at least a few weeks (have lots of compiling to do). What sorts of tests do you guys think I should run?
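Whatever tests get picked, one simple approach is to run the same command several times on each install and compare the median wall-clock time rather than a single run. Below is a minimal sketch of such a harness. The function names are made up, and the workload in the example is only a placeholder, so swap in the real job (an encode, a compile, an application start) that you care about.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run a command several times and return the wall-clock times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Reduce a list of timings to min, median and max."""
    return {"min": min(times),
            "median": statistics.median(times),
            "max": max(times)}


if __name__ == "__main__":
    # Placeholder workload (an assumption for this sketch): a short Python loop.
    argv = [sys.executable, "-c", "sum(range(1000000))"]
    result = summarize(time_command(argv, runs=3))
    print(f"min {result['min']:.3f}s  median {result['median']:.3f}s  "
          f"max {result['max']:.3f}s")
```

Running the identical command list on the stock install and on the optimized one, with the same services running, keeps the comparison about the compiler flags rather than about what else happens to be started.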