
LinuxQuestions.org (/questions/)
-   Linux - Kernel (https://www.linuxquestions.org/questions/linux-kernel-70/)
-   -   Compiling kernel with aggressive flags (https://www.linuxquestions.org/questions/linux-kernel-70/compiling-kernel-with-aggressive-flags-4175667648/)

fulalas 01-13-2020 02:12 PM

Compiling kernel with aggressive flags
 
After reading this Phoronix article I decided to give it a try and compile the kernel with the following flags:

Code:

export CXXFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,--copy-dt-needed-entries -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects -fno-signed-zeros -fno-trapping-math -fassociative-math -Wl,-sort-common -fvisibility-inlines-hidden"

export CFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wformat -Wformat-security -Wl,--copy-dt-needed-entries -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -fno-semantic-interposition -ffat-lto-objects -fno-signed-zeros -fno-trapping-math -fassociative-math -Wl,-sort-common"

export FFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=32 -Wl,--copy-dt-needed-entries -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z -Wl,now -Wl,-z -Wl,relro -malign-data=abi -fno-semantic-interposition"

export KBUILD_KCONFIG="--build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --enable-libmpx --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib --with-tune=generic --without-cuda-driver -v"

However, it's not clear whether I should set these flags in my script before make oldconfig or before make -j4.
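From what I can find in Documentation/kbuild/kbuild.rst, the kernel build ignores CFLAGS/CXXFLAGS from the environment and instead takes extra compiler options through KCFLAGS, which only matters at build time (make oldconfig doesn't compile anything). If that's right, it would look something like this (the flag subset shown is just an illustration):

```shell
# Configuration step -- compiler flags play no role here:
make oldconfig

# Build step -- KCFLAGS is appended to the flags kbuild chooses itself:
make -j4 KCFLAGS="-O3 -fno-semantic-interposition"
```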

Also, if I want to compile in 32 bit, should I just change all KBUILD_KCONFIG flags from x86_64 to x86? Or should I also change all -m64 to -m32 and remove --with-multilib-list?
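(As far as I can tell, a 32-bit x86 kernel is normally selected with ARCH=i386 rather than by editing -m64 to -m32 by hand; kbuild adds -m32 itself for a 32-bit config. A sketch:)

```shell
# Configure and build a 32-bit x86 kernel, even on a 64-bit host:
make ARCH=i386 oldconfig
make ARCH=i386 -j4
```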

Thanks!

business_kid 01-14-2020 04:03 AM

First thing I would say is that with all those flags, it's a wonder anything compiles at all.
Put them in the Makefile or export $CFLAGS, etc.

export CFLAGS=<whatever>
export FFLAGS=<whatever>
etc.

Then just make.

fulalas 01-14-2020 01:37 PM

Well, it's how Clear Linux compiles its kernel, and it was tested by Phoronix, so as aggressive as it sounds, it should work.

I tested and it compiles, but I'm not sure if it used all those flags. Is there a way to check?
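The closest thing I've found is kbuild's verbose mode: V=1 echoes every compiler invocation, so the flags that were really used can be pulled out of a log afterwards. A sketch, assuming a stock tree:

```shell
# Echo every command kbuild runs and keep a copy for inspection:
make -j4 V=1 2>&1 | tee build.log

# Spot-check which optimization level actually reached the compiler:
grep -m1 -o -e '-O[0-3s]' build.log
```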

butterflycollector 04-20-2020 07:08 AM

Kernel compile optimization flags, set, CFLAGS, CXXFLAGS, FFLAGS
 
Hello,


well, this topic is not resolved anywhere ... so I am reviving it. Once upon a time you'd only have to set CFLAGS, CXXFLAGS and occasionally FFLAGS in the environment; this is true, BUT I'm really not sure this still works. Certainly not when it comes to the linux kernel compile process. It's like a trivial task that is simply nowhere to be found on the internet.

I know for a fact my CFLAGS and CXXFLAGS say -mtune=skylake, yet the kernel only prints -march=westmere -mtune=haswell at the beginning of compilation and says practically nothing after that.
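Since nothing in my environment sets westmere or haswell, my working theory is that the patched tree hardcodes them in a Makefile somewhere. A sketch of how one could hunt for that (the paths are guesses, not confirmed locations):

```shell
# Search the build machinery for hardcoded CPU targeting:
grep -rn -e '-march=' -e '-mtune=' Makefile arch/x86/ scripts/ | head -n 20
```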

Also, make V=1 build does not make the make process verbose anymore, so I can't even see what's happening (what kind of commands are being sent to the compiler).

Basically I want to ask: how do you CURRENTLY set the OPTIMIZATION compiler flags SPECIFICALLY for the linux KERNEL?
And how do you make the process verbose?
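From the kbuild documentation (Documentation/kbuild/kbuild.rst), the supported knobs appear to be KCFLAGS for extra compiler options and V=1 for verbosity; treat this as a sketch rather than a confirmed answer, since a distro-patched tree may override it:

```shell
# Verbose build with extra optimization flags appended by kbuild:
make V=1 KCFLAGS="-march=skylake -mtune=skylake" -j"$(nproc)" bzImage
```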

Thanx in advance,
Butterfly Collector

business_kid 04-21-2020 11:09 AM

You're right, it's become difficult.

The day was, you could make recommendations for $CFLAGS & CXXFLAGS. Now there's such a variety of hardware out there, and probably instructions that one cpu will run but another won't. I'm told optimizations are moot, but I'm not sure I believe it.

Then it's slightly irritating as a dev if you distribute something which immediately has bugs filed against it because Celerons or some other cpu can't run it. IIRC, the last time this was big news was around the time the MMX stuff came out, and those who had it wanted to use it. I have an i3, for instance, and I'm fairly sure some features are disabled. I don't know if it will run every instruction in the top-end Intel or AMD instruction set.

Even back in the day, most computer manufacturers specified the Z80 over the inferior 8080, but most software was compiled with the 8080 instruction set, costing them several powerful instructions. I think it was some primitive block-copy assembler instructions that were the real power advantage of the Z80, although they rapidly added speed as an advantage. It meant you could copy 256 bytes(?) from ram to I/O with a small bit of prep and one assembler instruction, vs no prep and 256 read-then-write cycles on the 8080.

butterflycollector 04-21-2020 12:19 PM

I get that ... even at the time of the pentium 4 most kernels were built for i686, but I could always get a little bit of speed advantage with something tuned for my specific cpu. The option to optimize for a specific cpu at the end, instead of at the pre-distribution stage, still serves me. I am simply going to run something heavy on my cpu, which would normally require a cluster of xeons (at the least), so I would take any speedup. And it's a matter of using my hardware to the maximum, because it's good, I have bought it, and there is no reason not to use it the best I can. I am a speed freak when it comes to computers, and not having the speed I can have bugs me.

I know that the linux I use has the compiler flag -mtune=skylake, and yet no matter what I try my make build keeps using -march=westmere -mtune=haswell, which rattles me a lot. I google and google and ... I can't remember the last time there was something, even way more exotic than this, that I could not find a solution to. I know the solution cannot be complex, but instead I find nothing. So it kind of freaks me out.

At some point the make process says +export CFLAGS= ... and +export CXXFLAGS= ... with the wrong flags, while I have set mine, and yet even a grep -r 'export CFLAGS' * gives me nothing besides the build.log with the mentioned statements. There is not a single text file that matches. There is nothing in the Makefile ... I don't even understand where this output is coming from. What is the mechanism for selecting the optimization flags?

Even in the support forum of the linux I use (Clear Linux) no one can tell me what is going on. And I know it has been built with some pretty aggressive (the most aggressive, actually) compiler flags. The kernel and binaries are installed with -mtune=skylake (the highest currently running on my cpu), and yet if I am to compile my own kernel with their patches from github ... I cannot set the flags.

This is incredible ...

business_kid 04-22-2020 06:09 AM

AFAIK cpu optimizations are usually set at -O2, which doesn't stress any cpu. I repeat that optimizations are moot. Let me explain.

Take your cpu, running at 3.something GHz and bursting to 4.something GHz (let's call that 4GHz for round figures). Your cache can't keep up, much less your outside ram. Your on-chip video does best. Next, a technical paragraph.

When I tell you that light travels ~30cm in 1 nanosecond, you begin to see the issue from a pcb designer's perspective. If there was a 30cm track on a m/b being clocked at 1GHz, there would be a moment every nanosecond where 150mm had a high and 150mm had a low. At 4GHz, that's 37.5mm. To add to your difficulties, the GHz frequencies are up above what used to be UHF tv, which went to 900MHz, all good broadcasting territory, and lines in proximity have capacitance. So imagine three such lines: one (in the middle) is trying to go from 2.5V or 3.3V down under 0.8V, a drop of 1.7V or 2.5V. The other two, in close proximity on either side, are going from 0V up to 2.5V or 3.3V. They can change state at 1.0-1.2V. The middle one has to fight its way down against opposing capacitive and electro-magnetic forces. You might be able to see the potential for things getting out of phase. That limits peripheral speed. I did a design with 250MHz on it for my project in university, and none of the lecturers wanted to go near it, because it was too fast! (The pcb was fine, btw. No speed worries.)

It's the I/O that's the bottleneck now, except in very computational loads. If you look at a cpu like the Celeron on wikipedia, which goes back some distance, you'll see how cpu speeds have risen. The hardware breakthroughs that haven't come are memory and general peripheral speed increases, because a bit of cable kills all that.

butterflycollector 04-22-2020 01:05 PM

Well, I hope that the optimizations are moot, and I understand the example, but aren't those limitations real with both optimized and unoptimized kernels? Sometimes it comes down to 1 instruction instead of 10, instead of 100, or instead of even more. Sometimes it can be additional or lacking overhead. Tricks with previous instruction sets that are no longer needed, because there is a faster and easier-to-code way with new instructions ... and so on; I don't even know how many things can happen with kernel code for a newer architecture. There were, for example, some kind of copy instructions using the much wider bus on my specific architecture which could make certain things many times faster.

Changing the clock of the cache rarely does anything for me, but changing the memory clock and the core clock makes a big difference for the calculations I do, not just for benchmarks. My NVMe does deliver the 3.5GB/s read and 3.0GB/s write speeds (much higher than my SSD, limited to around 600/700 like every SATA SSD, due to its bus). Which also helps a lot if I am writing files of dozens of gigabytes for every major iteration in the algorithm, all of which data my cpu is to process from and back to the RAM.

Either way, I didn't so much want to argue about it as to find a solution for something which for me should have a direct and simple one, in no more than a few lines of commands or a txt file. At the end of the day, until I try it and see it I cannot be sure. Even 5% for me is something, and who knows, it might even be 10.

business_kid 04-22-2020 02:51 PM

Join the gcc dev mailing list and post there.

