Slackware: This forum is for the discussion of Slackware Linux.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I'm still waiting for a good reason to go to the trouble of optimizing (i.e., tweak compiler flags) pretty much anything. Unless you're running on really sub-par hardware, I suspect more often than not you'd need to run tests to actually discern the difference.
E.g., I might optimize my browser, but to what end? So much else impacts the performance of a browser that has nothing to do with the flags given to gcc during the build.
Or, say I'm using some GUI firewall configuration tool. What is to be gained by optimizing that?
Better optimization would come via better software design, better algorithms, and better coding, not via an effort to apply cookie-cutter compiler flags to an entire distribution.
I must say I'm a bit perplexed by the knee-jerk hostility from nearly everyone in this thread.
Are we reading the same thread?
Knee-jerk hostility?
I have seen some rather extreme patience exhibited by most participants as the OP persists in ignoring their points, their suggestions, and their requests for supporting benchmark data, then moves the goalposts from "easy" to "possible" and claims his case is proven.
Knee-jerk hostility? Give a clear example please...
Quote:
I must say I'm a bit perplexed by the knee-jerk hostility from nearly everyone in this thread.
I'm not. From the first page, konsolebox ignored every point that he didn't make himself. Normally this could be excused as blind enthusiasm, but not ten pages into the thread. Knee-jerk hostility is what you eventually get when someone persists in behaving like this for the aforementioned ten pages.
So tell me if this would be a good description of your progress so far. You took the easy step of doing a global search and replace and noticed that this operation failed on many SlackBuilds. You then tried to build the packages. The build results were exactly what we told you to expect: that many packages failed to build.
Check again. Among the packages I attempted to compile, the ones that did not build are the ones that won't build even with the normal SlackBuilds; that simply means they would have to be patched anyway if they were ever updated or recompiled. Nor were any of the packages that failed to compile essential at runtime. Again, why do you insist that everything must compile? Those 20 packages out of 400+? They didn't fail just because of the new flag; they fail with the normal SlackBuilds as well, and the numbers match. So how does that affect my proof? Again, have you seen my latest report?
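For reference, the "global search and replace" step being discussed amounts to something like the sketch below. The directory layout and file contents are hypothetical stand-ins; real SlackBuilds set SLKCFLAGS per architecture and differ from script to script, which is exactly why a blind replace fails on some of them.

```shell
# Fabricated example tree with one fake SlackBuild to operate on.
mkdir -p /tmp/slackbuilds-demo/pkg
cat > /tmp/slackbuilds-demo/pkg/pkg.SlackBuild <<'EOF'
SLKCFLAGS="-O2 -march=i486 -mtune=i686"
EOF

# Replace the generic -march/-mtune pair with a CPU-specific target in
# place, keeping a .orig backup of every file that is touched.
find /tmp/slackbuilds-demo -name '*.SlackBuild' \
  -exec sed -i.orig 's/-march=i486 -mtune=i686/-march=core2/' {} +

grep SLKCFLAGS /tmp/slackbuilds-demo/pkg/pkg.SlackBuild
```

Scripts that spell their flags differently, or pass them to configure in a nonstandard way, simply won't match the pattern, so a pass like this always needs a follow-up audit.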
Quote:
You then declared, based on these highly negative results
Highly negative results? Are you just looking for excuses so you can deny the proof that's right in front of you?
Quote:
that you've "proven" that an "optimized Slackware" is "possible" "for the whole distribution and not just one package".
Well, the concept is proven, and again, as I've said, if everything has to be recompiled (though it isn't really necessary), that's for the team to decide.
Quote:
You have also refused, despite repeated prompting, to do benchmarks to test the assumption underlying your project: that these "optimizations" will have any real-world, human-noticeable benefits at all.
OK, let's just leave it as a philosophical matter, as I no longer want to prove anything about those speed benefits. If you think there's no significant benefit, then so be it.
Quote:
Would you say that that is an accurate summary of what you've posted so far?
I'm not sure that would be needed. I'm sure you would just keep criticizing the smallest details of it. How would that be helpful to you?
Quote:
(Of course, if the claim has indeed changed to "possible" from "easy enough for Pat to maintain", then it's no longer wrong. There are/were Slackware forks that rebuilt every package).
It would be easy enough if really wanted as how I see it. After everything is fine-tuned and the automation scripts are stable, it would be just like recompiling a whole Gentoo system.
And before, you guys were saying that it's not possible; now that you find it's "easy enough", what would be next?
Quote:
And before, you guys were saying that it's not possible; now that you find it's "easy enough", what would be next?
I hate to rise to further bait, but did you really need to ask?
"What would be next" would, obviously, be a requirement that you complete your fork. As in, finish building it. Let me speak for everyone except you when I say that that is the only thing we will accept as "proof" that what you're asking for is feasible. Things like demonstrating (not "proving") that you're able to maintain it in concordance with Slackware upstream, and benchmarking it for actual numbers, will need to be established separately.
I think I've said this before.
Quote:
Originally Posted by konsolebox
So how does that affect my proof.
Well the concept is proven
And before you guys were saying that it's not possible
Let me remind you of what you're trying to "prove": that it's feasible for the Slackware team to maintain and distribute -march-optimized versions of the distribution. You got as far as getting some, but not all, of Slackware's packages to compile. Then you stopped there, and you said: "In conclusion, you can do the rest if you really want to." Or, to use your exact words, "It would be easy enough if really wanted as how I see it." No, that's not proof.
Quote:
OK, let's just leave it as a philosophical matter, as I no longer want to prove anything about those speed benefits. If you think there's no significant benefit, then so be it.
In the absence of any evidence whatsoever to suggest that there is, in fact, any speed improvement, then we must continue to believe that there is limited benefit. A good benchmark on a clean Slackware 14.0 and a clean Slackware 14.0_optimized is required to justify this exercise at all. (And those benchmarks must take into account possible disk access/caching issues.) If you have given up deciding that it is too much work (either to actually build, install and benchmark the optimized build or to prove to others that it is worth it) then you have reached the conclusion predicted by many earlier in this thread -- that it is too much work. If you have decided that Slackware is not a good candidate because of its conservative nature, then you have also reached a prophesied result.
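On the disk-access/caching point: a fair comparison has to treat caches consistently and average several runs rather than trusting one measurement. A minimal sketch of the shape such a benchmark takes, with gzip of a scratch file standing in for whichever stock-vs-optimized binary is actually being compared:

```shell
# Generate a fixed workload so both builds process identical input.
dd if=/dev/urandom of=/tmp/bench-input bs=1M count=8 2>/dev/null

# Flush dirty pages. Dropping the page cache itself requires root:
#   sync && echo 3 > /proc/sys/vm/drop_caches
# Without that step, later runs measure the cache, not the binary.
sync

# Average several runs; a single timing is mostly noise.
for i in 1 2 3; do
  t0=$(date +%s%N)
  gzip -c /tmp/bench-input > /dev/null
  t1=$(date +%s%N)
  echo "run $i: $(( (t1 - t0) / 1000000 )) ms"
done
```

The same harness run against a clean 14.0 and a clean 14.0_optimized install is the kind of evidence the thread keeps asking for.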
Quote:
Are you just looking for excuses so you can deny the proof that's right in front of you?
The only thing that you've proven is that the Dunning–Kruger effect is alive and well.
Let's take a very simple example. CPUs that support ssse3 have added a new native optimized memcpy instruction, so glibc added some code to use this if the CPU supported it. One problem -- traditionally memcpy on UNIX like operating systems ran in one direction, while this optimized version ran in the other direction. POSIX never specified whether the copy should start from the top or the bottom of the memory range, so this new instruction was technically compliant. Guess what? Stuff broke. Audio applications suffered the worst, with skipping and clicking being common reported problems. Everything _compiled_ fine, though. We ended up having to patch this optimization out of glibc to fix the issues.
This sort of thing is more common than you seem to realize with CPU or compiler optimizations. This is why it is safer in nearly all cases to keep running an existing binary than it is to recompile it. It may recompile with no errors, but due to more aggressive optimizations by the compiler suffer from new runtime bugs. We recently saw this with Qt as well. Once something is recompiled with a new toolchain, testing of the binaries has to start all over again.
So then we have to wonder what the benefits are that justify these risks. In my experience, most software isn't going to use optimized instructions anyway (or very few of them). The places where you are likely to see benefits are mostly in areas such as multimedia or number crunching. If you're running a CPU intensive application (something that runs for hours and hours), you might see some benefit to compiling that application with CPU specific optimizations. But for a general purpose Linux distribution, all you're likely to do by compiling it with CPU specific optimizations is introduce corner case bugs in the binaries that will be very hard to debug due to the fact that any given binary is getting tested by a lot less users.
All that said, I support your experiment! When you have some hard results, I'll be curious to see them. Meanwhile, arguing with everyone about who has proved what doesn't seem like a fruitful activity.
Has anyone reading this thread worked on an all-packages-recompiled fork like SLAMD64 or ARMEDSLACK, and can that person give us some insight into what such a project is like for the maintainer?
@volkerdi Sir, thanks for the reply; I'm glad you chimed in. How should I address you?
Quote:
Originally Posted by volkerdi
Let's take a very simple example. CPUs that support ssse3 have added a new native optimized memcpy instruction, so glibc added some code to use this if the CPU supported it. One problem -- traditionally memcpy on UNIX like operating systems ran in one direction, while this optimized version ran in the other direction. POSIX never specified whether the copy should start from the top or the bottom of the memory range, so this new instruction was technically compliant. Guess what? Stuff broke. Audio applications suffered the worst, with skipping and clicking being common reported problems. Everything _compiled_ fine, though. We ended up having to patch this optimization out of glibc to fix the issues.
This sort of thing is more common than you seem to realize with CPU or compiler optimizations. This is why it is safer in nearly all cases to keep running an existing binary than it is to recompile it. It may recompile with no errors, but due to more aggressive optimizations by the compiler suffer from new runtime bugs.
Well, I'm actually surprised, and honestly I can't say much about that. The only thing I can say is that the glibc running here was built for a machine that has ssse3. Perhaps Gentoo disabled that optimization to make the compilation work, or perhaps it's already disabled in glibc itself; I'm not sure. Then again, if a package is known to have trouble with new instructions, it could simply be excluded. I admit it would be troublesome to detect such bugs, especially if they are still prominent these days, but I do hope that only applies to old or unstable/development software, or to an old gcc, since most software is already tested with the new flags thanks to distros like Gentoo, and ssse3 is on the low end; what's popular now is sse4 and above. Many users already build their local packages with -march=native, so gcc is probably well tested and stable on most architectures, at least with common flags like -O2 or -O3. I just can't see how something like -march=core2 or -march=corei7, within the range of x86_64, could be an aggressive optimization compared to other flags.
As for why I'm so enthusiastic about a distro like Slackware shipping optimized binaries by default, even though I see the risks (which I believe are minimal): the benefits I keep looking at are mainly in graphics and multimedia, which includes not only players and transcoders but the libraries as well, especially Mesa and the X11 drivers that commonly used software like Firefox runs on top of, not to mention JavaScript and developer tools like gcc and IDEs. One may be able to recompile the top-level applications, but I'm not sure it would be as straightforward for drivers and libraries. And Firefox has already started using WebGL. Besides that, there must be other effective optimizations for the whole system and the other packages as well, including the kernel.
Quote:
We recently saw this with Qt as well. Once something is recompiled with a new toolchain, testing of the binaries has to start all over again.
I did once have trouble related to Qt, more specifically PyQt4, but that was when I used flags for Graphite. I think I just lowered the optimization to -O2 to make it work, without changing -march, or perhaps I just removed one of the three Graphite flags, -fgraphite-identity, from my make.conf. That was the only compile-time trouble I had, and as for runtime, I haven't noticed anything so far. With my latest compilation under a newer gcc 4.6 (or was it 4.7? I can't remember; the previous one was 4.5), I no longer had the trouble, but perhaps something had already changed.
Quote:
But for a general purpose Linux distribution, all you're likely to do by compiling it with CPU specific optimizations is introduce corner case bugs in the binaries that will be very hard to debug due to the fact that any given binary is getting tested by a lot less users.
Well, I admit there's no way I can claim there is no possibility of a bug appearing because something is compiled for a new sub-architecture, or more precisely the same architecture using newer CPU capabilities or instructions. I can only hope you'll weigh it with a fair balance.
Quote:
All that said, I support your experiment! When you have some hard results, I'll be curious to see them. Meanwhile, arguing with everyone about who has proved what doesn't seem like a fruitful activity.
Thanks for being open to it. If the odds favor me, the one thing I'm certain of is that I can deliver a complete set of the product, probably including prototype scripts. As for the benchmarks and tests, I'm not sure I can provide them, as my system is already a bit fragile for that. Of course I wouldn't expect much from it, but I can't make promises either, as I can only give it my spare time.
Quote:
This sort of thing is more common than you seem to realize with CPU or compiler optimizations. This is why it is safer in nearly all cases to keep running an existing binary than it is to recompile it. It may recompile with no errors, but due to more aggressive optimizations by the compiler suffer from new runtime bugs.
I agree with this comment. I've had to investigate and solve a number of problems with software that turned out to be compiler bugs. In many cases the bugs didn't happen until a slight change was made in the software sources or compiler options. The majority of software in a Linux distro is already compiled with fairly aggressive optimizations. Enabling the few CPU-specific special purpose instructions introduces more risks than rewards of performance improvement. Sometimes making part of the software faster breaks something.
The kernel is the main exception to the risk/reward rule, because many of the added instructions provide improvements to operating system performance. Mostly they affect multi-core, multi-CPU or hyper-threaded systems. Because the kernel can be compiled with so many different options, it poses the largest challenge for optimization and testing. One of the strengths of Slackware is the use of nearly unmodified versions of the kernel, thus taking advantage of the optimizations and testing done by the Linux developers.
The same is true for the other software in Slackware. Other distros may re-compile software frequently and encourage users to do so. They also make modifications to software to distinguish them from other distros. The end result is often less stability and predictability from one system to another.
If one is considering recompiling applications with CPU specific options, then a sensible approach is to focus on the applications that are a problem rather than all the software. That has a much greater chance of providing performance improvements that can actually be measured, and defining a controlled set of software that can be tested for problems.
It's not a bad idea to improve performance by taking advantage of CPU features, but trying to apply that approach to all of the software is a bad idea. Here is where Slackware users can provide valuable insight. Users can test individual applications and identify those that show major performance increases from newer CPU features. Recommending specific applications whose performance is affected by CPU options is a more practical approach than trying to "boil the ocean" five different ways. Once the applications are identified, any difficulties users face in rebuilding them can be addressed.
A problem in search of a solution is much better than a solution in search of a problem.
When this thread was first posted I took it on myself to try adding -march=core2 to my mplayer build. All I managed to achieve was to double the amount of CPU it was using when playing back a video file. Conclusion: -march=native is not a magic button and it's just as likely to interfere with any optimisations the original developers have made as it is to make things better/faster/more efficient.
I must confess to having always liked the 'make world' rebuild it all concept, and back in the beginning I very nearly went with TAMU rather than Slackware because of it, but what Pat does has worked well for near on 20 years and you just can't argue with that.