Programming (LinuxQuestions.org): This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
After reading a lot about the optimization gains from GCC's -march=native option, I decided to give it a try.
I'm on Debian Stretch and decided to re-build some of the packages as a test bed. The packages built, installed and ran without a hitch.
However, after some further reading I came across what I think might be a conflict between the flags set by -march=native and the actual flags reported for my particular CPU.
From the GCC manual, section 3.18.55, x86 Options:
Quote:
-march=cpu-type
Generate instructions for the machine type cpu-type. In contrast to -mtune=cpu-type, which merely tunes the generated code for the specified cpu-type, -march=cpu-type allows GCC to generate code that may not run at all on processors other than the one indicated. Specifying -march=cpu-type implies -mtune=cpu-type.
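To see what -march=native actually resolves to on a given machine, GCC can print the target options it would use with `gcc -march=native -Q --help=target`. The sketch below runs that command and parses its two-column output. It assumes `gcc` is on `PATH` and that option lines look like `-msse3   [enabled]` or `-march=   haswell` (exact spacing varies between GCC versions); the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_target_help(text: str) -> dict[str, str]:
    """Hypothetical helper: map each option (e.g. '-msse3', '-march=') to its value."""
    options: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        # Option lines start with '-' and carry a value column such as
        # '[enabled]', '[disabled]' or a CPU name like 'haswell'.
        if len(parts) >= 2 and parts[0].startswith("-"):
            options[parts[0]] = " ".join(parts[1:])
    return options


def native_target_options(gcc: str = "gcc") -> dict[str, str]:
    """Hypothetical helper: ask GCC which options -march=native expands to here."""
    result = subprocess.run(
        [gcc, "-march=native", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_target_help(result.stdout)


if __name__ == "__main__":
    try:
        opts = native_target_options()
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query gcc: {err}")
    else:
        print("-march resolves to:", opts.get("-march=", "?"))
        print("-msse3 is:", opts.get("-msse3", "?"))
```

On a Haswell box the first line would be expected to name `haswell`, and `-msse3` to show `[enabled]`.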
There is no SSE3 flag for my processor, which I believe is Haswell-E. However, with -march=native, GCC sets the processor to Haswell and turns on the SSE3 flag.
My question is, should I also pass the '-mno-sse3' option to disable the SSE3 instruction set? Further down the same GCC manual page says:
Quote:
GCC depresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalence for all SSEx instructions when needed.
Not sure what that means.
So should I use '-march=native -mno-sse3' or just '-march=native'?
Any help in understanding this further is greatly appreciated.
When you specify -march=native, you are asking gcc to auto-detect the architecture of the build processor and to adapt its behavior to produce code that is optimized for the machine upon which the compiler is now running. I would therefore anticipate that it can see whether this-or-that feature exists, without further clues from you.
However, if you know that a particular feature isn't there, I would think that there is no harm in being more specific if you want to. If you know that SSE3 really isn't there, and fear that the compiler might think that it is (which would surprise me ... those guys are good at what they do), then you could certainly specify that, and see what happens.
Last edited by sundialsvcs; 12-12-2017 at 11:20 AM.
Haswell is about 4 years old. If there had been an error like that, it would already have been detected, reported and fixed.
I think using a non-existent instruction (set) will lead to SIGILL.
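Software that wants to use newer instructions safely usually checks the CPU's feature flags at run time and only takes a code path the CPU reports; in C, GCC exposes this as `__builtin_cpu_supports("avx2")`. The sketch below shows the same decision in Python. The path names are hypothetical labels, and the flag names are the ones Linux prints in `/proc/cpuinfo`, where SSE3 is listed as `pni`.

```python
from __future__ import annotations

# Hypothetical code paths, best first, each keyed by the /proc/cpuinfo flag
# it requires. Running a path whose instructions the CPU lacks would fault
# with SIGILL, so a path is only chosen when its flag is present.
PATHS = [
    ("avx2", "avx2_path"),
    ("avx", "avx_path"),
    ("pni", "sse3_path"),  # the kernel reports SSE3 as "pni"
]


def choose_path(cpu_flags: set[str]) -> str:
    """Return the best hypothetical path the CPU can run, else a generic one."""
    for flag, path in PATHS:
        if flag in cpu_flags:
            return path
    return "generic_path"


if __name__ == "__main__":
    print(choose_path({"fpu", "sse2", "pni", "ssse3"}))  # sse3_path
```

A binary built with -march=native skips this check entirely: it assumes the build host's features, which is why such a binary can die with SIGILL on an older CPU.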
Quote:
Haswell is about 4 years old. If there had been an error like that, it would already have been detected, reported and fixed.
I quite-frankly agree. Maybe the man-page is what is out of date on this very-small point. If you ask gcc to "adapt itself to the architecture of the host upon which it now finds itself," I'll betcha that it will do so correctly – no matter what the man-page says.
Also – "now that even 'run-of-the-mill' microprocessors are running billions(!) of ops-per-second," how much does any of this really matter anymore? Can anyone, today, actually "hear you scream?"
Last edited by sundialsvcs; 12-12-2017 at 11:25 AM.
Quote:
I quite-frankly agree. Maybe the man-page is what is out of date on this very-small point. If you ask gcc to "adapt itself to the architecture of the host upon which it now finds itself," I'll betcha that it will do so correctly – no matter what the man-page says.
Also – "now that even 'run-of-the-mill' microprocessors are running billions(!) of ops-per-second," how much does any of this really matter anymore? Can anyone, today, actually "hear you scream?"
Sorry, I disagree.
Gentoo ~amd64.
The most time-consuming package you can compile on Gentoo Linux is LibreOffice.
I changed a specific setting in my kernel config from generic amd64 to my Ivy Bridge architecture.
I am not quite sure, but I think I calculated, two years ago, something around a 3-5 percent speed improvement in building it, only from building the kernel for the Ivy Bridge instruction subset instead of the generic Intel one.
Using the Gentoo bash command splat, I compared several runs from before this change with at least three runs after this optimization.
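When comparing several timed runs before and after a change like this, averaging each set and expressing the difference as a percentage keeps the comparison straightforward. A minimal sketch; the function name is hypothetical and the timings in the usage line are made-up placeholders, not measurements from this thread.

```python
from __future__ import annotations

from statistics import mean


def percent_faster(before_s: list[float], after_s: list[float]) -> float:
    """Hypothetical helper: % reduction in mean wall-clock time, before -> after."""
    if not before_s or not after_s:
        raise ValueError("need at least one run on each side")
    b, a = mean(before_s), mean(after_s)
    return (b - a) / b * 100.0


if __name__ == "__main__":
    # Placeholder timings in seconds (hypothetical, for illustration only).
    print(f"{percent_faster([100.0, 102.0, 98.0], [96.0, 97.0, 95.0]):.1f}% faster")
```

With only three runs per side, a difference of a few percent can be within run-to-run noise, so looking at the spread of each set as well as the mean is worthwhile.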
--
The binary distros are the worst in some regards, because they are not optimized for your architecture.
I think building a package is a proper benchmark. Building the biggest package before and after, as usual, certainly is.
--
My improvement is just with generic, safe CFLAGS. No ricer flags, so GCC has as much freedom as possible. No loop unrolling or other stuff.
--
Well, does it matter? The computer runs at highest performance for less time and therefore consumes less power. So on a box with 1200 packages to compile, it saves a lot of money over time.
Quote:
Haswell is about 4 years old. If there had been an error like that, it would already have been detected, reported and fixed.
I think using a non-existent instruction (set) will lead to SIGILL.
Lol, nope.
Some bugs are discovered quite late. Look at the Intel Management Engine, for example, or Dirty COW and others.
I assume you were talking about software and hardware as one piece; one cannot exist without the other.
Also, do not forget those Intel NAS CPUs, which were recently found to be flawed.
Some Intel CPUs do not age very well.
Also, do not forget the SATA bug on Intel platforms. In my view, Intel does not test their CPUs very well; they are just overpriced for their low quality of service. Look at AMD Ryzen and how Intel was suddenly able to lower its prices.
Yes, your CPU can do SSE3; all instructions of SSE3 are included in SSSE3. So applications looking for SSE3 instructions will find them, which is why -march=native includes it.
Quote:
Yes, your CPU can do SSE3; all instructions of SSE3 are included in SSSE3. So applications looking for SSE3 instructions will find them, which is why -march=native includes it.
No, no and no.
What -march=native does is explained in great detail on forums.gentoo.org. There are at least a hundred topics covering what -march=native does and how you check what GCC does with different settings. In the past, even the Gentoo wiki had an article about it.
SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), ... bla bla bla
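In practice this means the SSE3 bit shows up in `/proc/cpuinfo` as `pni`, while GCC calls the same extension `-msse3`. A few other names differ the same way (`sse4_1`/`sse4_2` versus `-msse4.1`/`-msse4.2`, `bmi1` versus `-mbmi`). The sketch below reads the `flags` line and translates those kernel names into GCC's spelling; the helper names are hypothetical and the alias table covers only these cases, not every difference.

```python
from __future__ import annotations

# Kernel (/proc/cpuinfo) flag name -> GCC -m option name, where they differ.
# Only a few common cases; every other name is assumed to match already.
KERNEL_TO_GCC = {
    "pni": "sse3",
    "sse4_1": "sse4.1",
    "sse4_2": "sse4.2",
    "bmi1": "bmi",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Hypothetical helper: flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def as_gcc_names(flags: set[str]) -> set[str]:
    """Hypothetical helper: rename kernel flag names to GCC's -m spelling."""
    return {KERNEL_TO_GCC.get(flag, flag) for flag in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            names = as_gcc_names(cpu_flags(f.read()))
    except OSError as err:
        print(f"cannot read /proc/cpuinfo: {err}")
    else:
        print("sse3" in names)  # True on any CPU whose flags list "pni"
```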
From a Gentoo user's perspective:
-march=native usually does a better job than a user-chosen architecture; most of the time the user chose it wrongly. The Gentoo wiki had pages and pages on this. In the end, before -march=native was really introduced and used, the Gentoo wiki had a list of every common processor with the -march settings to use.
e.g. 3610 QM => ....
--
I can hardly remember any case on the common "amd64" platform (which these days of course also includes Intel processors) where -march=native was wrong.
As I said, check how those flags are reported and check the GCC manual. It's a bit confusing; they are named differently.
I'm not sure when Gentoo introduced the -march=native thing. Also, in the past, new GCC releases sometimes introduced something new to choose from.
The only thing I have explicitly set to Ivy Bridge is:
Quote:
ASUS-G75VW roman # zgrep IVY /proc/config.gz
CONFIG_MIVYBRIDGE=y
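The `zgrep` above works because the kernel exposes its build configuration as a gzip-compressed file at `/proc/config.gz` when it was built with `CONFIG_IKCONFIG_PROC`. A rough Python equivalent of that substring search is sketched below; `grep_config` is a hypothetical helper name, and it does plain substring matching like the `zgrep IVY` call rather than full regular expressions.

```python
from __future__ import annotations

import gzip


def grep_config(pattern: str, path: str = "/proc/config.gz") -> list[str]:
    """Hypothetical helper: lines of a gzipped kernel config containing pattern."""
    with gzip.open(path, "rt") as f:
        return [line.rstrip("\n") for line in f if pattern in line]


if __name__ == "__main__":
    try:
        for line in grep_config("IVY"):
            print(line)  # e.g. CONFIG_MIVYBRIDGE=y on the kernel shown above
    except OSError as err:
        print(f"cannot read kernel config: {err}")
```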
So you are saying OP's CPU cannot do SSE3, SSE3 instructions are not included in SSSE3, and -march=native does not include SSE3?
No comment.
I said:
Check the GCC manual.
The left side is always the corresponding flag (at least it was the last time I checked, two years ago).
Check alternative names.
A computer does not care whether an instruction set is a subset of another one plus additions. That is basically a technical rant.
A computer just checks: is the instruction there? Yes: use it. No: do not use it. => /proc/cpuinfo
You may be right about whether those instructions are a subset or not, but it does not matter.
What matters is what I have now said explicitly with that Wikipedia link. It seems I failed to explain it clearly enough.
I remember instantly knowing what to look for in the GCC manual and the Gentoo wiki regarding -march settings.
The GCC output is a bit different and not so obvious, in my personal opinion. It helps to look up how to read it and what it means, because in my view they chose the wording quite badly.
To rephrase: I expected that you would also say that PNI is the corresponding flag, instead of arguing with the next "not checked yet" subset instruction.
I think the question was (or I understood it that way): why is there no xxxx flag? That was obvious to me as a long-term Gentoo user: it is just named differently. Your technicalities are nice, but you miss the point: the flag is there, just named differently.
---
You may be right with your statement that this instruction set is a subset of another one, but that's just knowledge which is nice to have.
Basically, generally speaking:
check the GCC manual
check what else it may be called
check what the GCC output is, and check how to read that output.
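Those checks can be combined: take the `[enabled]` ISA options from GCC's `-Q --help=target` output, look each one up under its `/proc/cpuinfo` name, and report any that GCC enables but the kernel does not list. The sketch below assumes both inputs were already collected (for example from `gcc -march=native -Q --help=target` and the `flags` line of `/proc/cpuinfo`); the function name and the short checklist are illustrative assumptions, not a complete list.

```python
from __future__ import annotations

# GCC -m name -> /proc/cpuinfo name for the extensions checked here.
# Names not listed are spelled the same in both places.
GCC_TO_KERNEL = {"sse3": "pni", "sse4.1": "sse4_1", "sse4.2": "sse4_2", "bmi": "bmi1"}
CHECKLIST = ["sse3", "ssse3", "sse4.1", "sse4.2", "avx", "avx2", "fma", "bmi", "bmi2"]


def unsupported_but_enabled(gcc_options: dict[str, str], cpu_flags: set[str]) -> list[str]:
    """Hypothetical helper: checklist ISAs GCC enabled that the CPU flags lack."""
    problems = []
    for isa in CHECKLIST:
        enabled = gcc_options.get(f"-m{isa}") == "[enabled]"
        kernel_name = GCC_TO_KERNEL.get(isa, isa)
        if enabled and kernel_name not in cpu_flags:
            problems.append(isa)
    return problems


if __name__ == "__main__":
    # Hand-written example inputs: GCC enables SSE3, the kernel reports it as "pni".
    gcc_opts = {"-msse3": "[enabled]", "-mavx2": "[enabled]", "-mavx512f": "[disabled]"}
    flags = {"pni", "ssse3", "avx", "avx2"}
    print(unsupported_but_enabled(gcc_opts, flags))  # []
```

An empty list means every checked extension GCC turned on is also reported by the CPU, just possibly under a different name.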
Most of the time I look up Gentoo USE flags: what they really mean, what they do. Same with kernel settings; they are vaguely described.
--
When you use Gentoo, you see a lot of that "funny rolling text" (compiler + linker output).
At the beginning you always see:
has x
has y
has z
The computer does not care what is a subset of what. It checks for MMX: yes, it has it, so use it; no, it does not have it, so do something else.
Like in real life:
there are several words for the same thing. Two countries that use a similar language may have two different words for the same stuff: one calls it PNI, the other calls it SSE3.
I'm not a programmer, so the nitty-gritty of GCC is all Greek to me. But while I may not understand the inner workings, I can certainly appreciate what certain options such as -funroll-loops and something-inline might do for optimization.
Then it's possible (repeat, possible) that pni is just another name for SSE3. I will do some more reading.
And @roman keeps pointing to the Gentoo forums/wikis. The gcc command I posted was from the Gentoo forums.
Anyone care to explain the last part of my initial post:
Quote:
GCC depresses SSEx instructions when -mavx is used. Instead, it generates new AVX instructions or AVX equivalence for all SSEx instructions when needed.
To me it says that, since it is setting -mavx, it will replace all SSEx instructions with their AVX equivalents.
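One way to see what that manual sentence means in practice is to compile a small file that uses an SSE intrinsic such as `_mm_add_ps` with `gcc -O2 -S`, once without and once with `-mavx`, and compare the assembly: with `-mavx`, GCC emits the VEX-encoded AVX form of the same operation (for example `vaddps` instead of `addps`). The hypothetical helper below counts the two kinds of mnemonics in assembly text; its short mnemonic list is an illustrative assumption, not a full classifier.

```python
from __future__ import annotations

import re

# Legacy SSE packed-single mnemonics; their VEX (AVX) forms add a 'v' prefix.
SSE_OPS = ("addps", "mulps", "subps", "divps", "movaps", "movups")


def count_encodings(asm: str) -> dict[str, int]:
    """Hypothetical helper: count legacy-SSE vs VEX-encoded uses of SSE_OPS."""
    counts = {"legacy": 0, "vex": 0}
    for line in asm.splitlines():
        # Optional 'v' prefix, then the base mnemonic at the start of the line.
        match = re.match(r"\s*(v?)([a-z]+)\b", line)
        if match and match.group(2) in SSE_OPS:
            counts["vex" if match.group(1) else "legacy"] += 1
    return counts


if __name__ == "__main__":
    sample = "\taddps\t%xmm1, %xmm0\n\tvaddps\t%xmm1, %xmm0, %xmm0\n\tret\n"
    print(count_encodings(sample))  # {'legacy': 1, 'vex': 1}
```

Run on the `-mavx` build, the legacy count would be expected to drop to zero while the VEX count takes over, which is the "AVX equivalence" the manual describes.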
Quote:
The most time-consuming package you can compile on Gentoo Linux is LibreOffice.
I changed a specific setting in my kernel config from generic amd64 to my Ivy Bridge architecture.
I am not quite sure, but I think I calculated, two years ago, something around a 3-5 percent speed improvement in building it, only from building the kernel for the Ivy Bridge instruction subset instead of the generic Intel one.
Using the Gentoo bash command splat, I compared several runs from before this change with at least three runs after this optimization.
"Generic" is a least-common denominator setting which of course is commonly used by distros who don't want to find their software issuing any instructions that someone's chip does not have. It would be interesting to know (as if you want to re-compile LibreOffice again ... ...) whether "native" would have performed (nearly) as well as "ivybridge" in your case.
I've also used Gentoo for many years – including the old days – and I literally found that the mere fact that the software is being compiled-from-source at all(!) seemed to be the thing that made the most difference. I'd been running Red Hat on the box (for the "free" year which they allowed you, at that time, before they wanted you to start paying), and it was a really tiny box that had originally been sold with Windows 95. Simply by installing Gentoo and letting it do the compile-from-source thing, it was very easy to see that the software was smaller, and ran appreciably faster than before. (In fact, this little box, which I used for many years, was positively quick. "From power-up to ready-to-go in six seconds flat," for instance.)
"Standard Distros," for obvious reasons, purposely don't build for speed nor for small-size. They build for universality: they want to be sure that their binaries will run on everything.
Last edited by sundialsvcs; 12-15-2017 at 08:33 AM.