What can I do to optimize compilation for an Intel Pentium Dual-Core T3400 (mobile)?
Hello!
Preface Host system Objective For some software I would like to make the maximum performance optimizations possible for my processor (computer). Note: I'm compiling mostly C and C++ code in the GNOME/GTK+ environment. My compiler sets: Thanks in advance! Cheers! |
Quote:
If you want better optimized code, start with full optimization but with debugging symbols, then run the program under a tool such as oprofile, which does low-level, non-intrusive sampling of where the time is spent. Then hand optimize the critical functions.
Note that if you want to improve the algorithm rather than the code, you may be better off with a more traditional profiler that measures more intrusively (so much less accurately) but effectively measures the whole call tree, so you see which functions call the frequently used functions, rather than just a one-dimensional view of where the time is used.
When "hand optimizing" key routines, an important part of the task is providing optimizer hints. The code might be unchanged, but the compiler optimizer is told more about the code so it can do a better job. Read about the "restrict" qualifier on pointers for the best example of that kind of info (put in the source code for the benefit of the optimizer).
For x86_64-specific hand optimizations (beyond the ordinary optimization that you would do for any architecture), one very important area is index variables. Consider very common code looking like
Code:
for ( datatype N=0; N<S; N++ ) {
In 32 bit x86, N could be int or unsigned int or long or std::size_t, interchangeable with zero impact on performance (as long as the value fits in 31 bits, there is also zero impact on correctness). Most programmers use int, because they do that by default in situations where any one of those four types would give equally correct results.
In x86_64, int is usually the worst choice for performance in that situation and unsigned int is usually the best. The difference is usually tiny, so profile first and think about this issue only for the most time-consuming loops.
The loop control itself is equally good with int or unsigned int, and only a little worse with long or std::size_t.
The X[N] expression depends on context and on the nature of X. But int (for N) is almost always a little worse than any of the other choices. Unsigned int is typically a little better than long, but on rare occasions may be a little worse. std::size_t has the same performance as long.
The X[N+1] expression, if you don't tweak it, is usually better with long or std::size_t than with int or unsigned int, because (X+1)[N] is usually more efficient than X[N+1]. In this situation, the programmer knows (X+1)[N] has the same meaning as X[N+1], but the compiler only knows that if N is 64 bit. So I often write X[N+1L], which lets the compiler optimize it to (X+1)[N], so that unsigned int N ends up more efficient than long or std::size_t. You can trust the compiler to know whether (X+1)[N] is better or worse than X[N+1L] and to switch between them as appropriate. But when N is 32 bit, the compiler can't know that (X+1)[N] is the same as X[N+1], so it can't switch.
The 1 is just an example. This logic applies whenever the added or subtracted value is a compile-time constant. But if that value is a run-time variable, it is usually best for that variable and N both to be unsigned int. |
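A minimal sketch of the index-type advice above, written as plain C. The function names and the adjacent-pair loop body are made up for illustration (the original loop body is not shown in the post), and the comment about 1L assumes an LP64 target such as x86_64 Linux, where long is 64 bit. Whether either variant is actually faster is something to measure with a profiler, not something this sketch proves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[n] = x[n] + x[n+1] for every adjacent pair.
   The index is a 32 bit unsigned int; the 1L constant widens the index
   arithmetic to long (64 bit on x86_64 Linux), so the compiler is free
   to treat x[n + 1L] as (x + 1)[n]. */
void pair_sums_uint(const double *x, double *out, unsigned int count)
{
    assert(count >= 1);                 /* need at least one element */
    for (unsigned int n = 0; n + 1 < count; n++)
        out[n] = x[n] + x[n + 1L];
}

/* Same loop with a 64 bit index type (on x86_64); no 1L is needed here
   because the index arithmetic is already done at full width. */
void pair_sums_size_t(const double *x, double *out, size_t count)
{
    assert(count >= 1);
    for (size_t n = 0; n + 1 < count; n++)
        out[n] = x[n] + x[n + 1];
}
```

Both functions compute the same result; build with something like -O2 -g and compare the hot loop in oprofile before deciding which spelling to keep.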
Use these CFLAGS:
Code:
-march=native -O2 -pipe -fPIC |
thanks!
Quote:
Does that mean that if I use the compiler from my 32-bit-only Debian lenny (GCC 4.3) instead of the one from Linux Mint, I could get CPU-specific performance boosts? Also, thanks for pointing out the oprofile tool, it is very useful. Quote:
Can I just create and mount a new JFS filesystem partition and compile there, or do I have to format the entire disk to get the benefits of the journaled filesystem (and install the system again)? I'm asking because I can make a whole-disk backup easily, so formatting won't be a big deal. Thanks for the posts, they were really helpful! Cheers! |
You would have to run the whole system on JFS, which means backing up and reinstalling with JFS.
I found out about JFS when I wanted something to run on my laptop, and I wanted performance but low CPU usage ... and this fit it. Also, never had any problems with it. |
Quote:
I assume the 32 bit compiler you mean can only build 32 bit code. 32 bit code often runs a tiny bit faster than the same source compiled for 64 bit, so maybe you want to compare a 32 bit compile against a 64 bit one. Both versions should be runnable on a 64-bit OS (you might need to install some extra 32-bit .so files).
If you want a 32 bit compile on a 64 bit system, you should use -m32 with a 64 bit compiler rather than using a 32 bit compiler. (A 32 bit compiler could work, but has extra issues for no extra benefit.)
When you use -m32 or a 32 bit compiler, you should specify the CPU model, because 32 bit x86 defaults to supporting a wide range of older models, and the constraint of supporting older models reduces performance on the current model. So specifying the model for 32 bit makes a bigger difference because it drops support for some older models, not because it has great insight into the specified model. Specifying the model within x86_64 does little because there is no old x86_64 model lame enough to be worth dropping.
In general, I think 32 bit x86 GCC is pretty lame. I would not focus on getting the best results from it. If your program happens to run a little faster in 32 bit than in 64 bit (as many do), I still would not push forward with 32 bit. I would just investigate the reasons (usually cache misses) that make 64 bit slower and fix things (maybe pool allocation of certain data structures) to take away the 64 bit disadvantage. Quote:
I'm assuming the program to be optimized uses a lot of user mode CPU time (otherwise the compiler oriented optimization question makes no sense). So the JFS etc. portion of the answer makes no sense. At best that might optimize the non user portion of CPU time (for which compiler oriented optimization questions wouldn't be asked). |
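To make the cache-miss point in the post above a bit more concrete, here is a small sketch that shows how a pointer-heavy structure grows when the same source is built for 64 bit instead of with -m32. The struct, the function names and the 64-byte cache line size in the usage note are assumptions for illustration only; with typical x86 ABIs the node is 12 bytes with -m32 and 24 bytes on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical doubly linked list node: two pointers plus a small payload.
   The pointers double in size when going from a 32 bit to a 64 bit build. */
struct node {
    struct node *next;
    struct node *prev;
    int value;
};

/* How many whole nodes fit into one cache line of the given size. */
size_t nodes_per_cache_line(size_t line_bytes)
{
    assert(line_bytes > 0);
    return line_bytes / sizeof(struct node);
}

/* Print the sizes that differ between a -m32 build and a 64 bit build. */
void report_data_model(void)
{
    printf("pointer: %zu bytes, long: %zu bytes, node: %zu bytes\n",
           sizeof(void *), sizeof(long), sizeof(struct node));
}
```

Calling report_data_model() and nodes_per_cache_line(64) from a program built once with -m32 and once without shows how many fewer nodes share a cache line in the 64 bit build, which is one plausible source of the extra cache misses mentioned above.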
It's better to use PIC for x86_64. I don't use -O3 because it sometimes produces unstable code (quite often, from what I've seen) and the result is not much faster.
For JFS I was responding to: Quote:
|
okay, thanks
Alright, thanks johnsfine and H_TeXMeX_H
Conclusion I will use the following settings then: Cheers! |
Quote:
Quote:
Here is a link to documentation of the syntax for restrict. You need to look elsewhere (maybe C99 documentation) to get a detailed understanding of the meaning: http://gcc.gnu.org/onlinedocs/gcc/Re...-Pointers.html
Roughly: a restricted pointer is a promise by the programmer that any object read or written through that pointer is not read or written in the same section of code any other way (directly or via another pointer).
If you wrote *p=*q; *p+=*r; the optimizer normally could not change that to the more efficient *p=*q+*r; because it must allow for the possibility that p and r point to the same object. Restricting either p or r ought to make the compiler able to optimize that code. (In my experience, you often need to restrict both p and r to get the compiler to see it.)
I used an example in which simply writing the code in the more obvious way in the first place would have made the optimization unnecessary, because only that kind of example is simple enough to highlight just the action of the restrict. There are plenty of more common, slightly more complicated cases in which restrict lets the compiler see a less obvious optimization. |
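The *p=*q; *p+=*r; example from the post above can be sketched as two C99 functions. The function names are made up for illustration, and the assert is only a debug-build check of the promise that restrict makes; it is not something the compiler requires. Note that older GCC releases default to a pre-C99 dialect, so the restrict spelling may need -std=c99 or newer.

```c
#include <assert.h>

/* Without restrict: p and r might point to the same int, so the compiler
   has to store *p and then read it back before adding *r. */
void add_plain(int *p, const int *q, const int *r)
{
    *p = *q;
    *p += *r;
}

/* With restrict: the caller promises that the int written through p is not
   accessed through q or r here, so the compiler may compute *q + *r once
   and store the sum. */
void add_restrict(int *restrict p, const int *restrict q, const int *restrict r)
{
    assert(p != q && p != r);   /* debug-only check of the promise */
    *p = *q;
    *p += *r;
}
```

With distinct objects both functions give the same answer. Calling add_plain with p and r pointing at the same int is legal and gives twice *q; passing the same arguments to add_restrict would break the promise and is undefined behaviour.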
Okay, thank you johnsfine and H_TeXMeX_H for the help.
I will try to start using the restrict keyword in C code from now on, thanks for the tip. From what I understood, it only works in C code, because restrict is a keyword of the C language (C99) only. I will also read the GCC manual fully, as it might give me some new ideas for implementing better code. (Sorry johnsfine, I probably made you look up the documentation when I could have done it myself.) Also, thanks for the explanation. From what I've understood, the "restrict" keyword says something like this: So you want to restrict these variables, then I'll save and lock them so that I can make use of them... Well, thanks again guys. Cheers! |
Quote:
G++ does not support the restrict keyword spelled restrict. But G++ does support it spelled __restrict__. The usual (I think best) way to deal with that is to use restrict in your C++ code and have some project-wide .hpp file (included by all .cpp files in your project) that tests which compiler is in use and #defines restrict as either __restrict__ or nothing. Then you get the benefits of the feature when compiling with appropriate compilers (such as g++), but your code loses only optimization, not correctness, when porting to some other compiler. Quote:
But you don't really need to understand that. Consider restrict only in terms of the promise it implies from the programmer to the compiler (the things accessed here through this pointer are accessed here only through this pointer). The only part of that which may be hard to understand well is the meaning of "here" in that sentence. Almost always, the things accessed here through that pointer will be accessed somewhere else by some other method. That doesn't matter to restrict. As with almost all hand optimization, it is only worthwhile after a tool such as oprofile identifies the hot spots. BTW, I also looked up the -pipe option to gcc. It may significantly reduce the time needed for gcc to compile your program. But it has no effect on the generated code. I assume you wanted to make your program execute faster and/or use less battery power. Compiling faster is a different topic. |
thanks for the patience
Thanks for your time and patience johnsfine.
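The project-wide header described in the post above might look roughly like this. The guard name, the scale_into function and the idea of keeping everything in one file are assumptions for illustration; in a real project the #if block would live in its own header included by every source file. As written, the same text compiles as C99 (where restrict is already a keyword and the macro is never defined) and as C++ under g++.

```c
#ifndef RESTRICT_COMPAT_H               /* hypothetical include guard name */
#define RESTRICT_COMPAT_H

#if defined(__cplusplus)
#  if defined(__GNUC__)
#    define restrict __restrict__       /* g++ spelling of the qualifier */
#  else
#    define restrict                    /* unknown C++ compiler: drop the hint */
#  endif
#endif                                  /* plain C99: use the real keyword */

#endif /* RESTRICT_COMPAT_H */

#include <assert.h>
#include <stddef.h>

/* Hypothetical user of the macro: dst and src are promised not to overlap,
   so the loop can be optimized without re-reading src after each store. */
void scale_into(double *restrict dst, const double *restrict src,
                double factor, size_t count)
{
    assert(dst != src);                 /* debug-only check of the promise */
    for (size_t n = 0; n < count; n++)
        dst[n] = src[n] * factor;
}
```

On a compiler that takes the empty definition, the code still compiles and still gives the same results; only the optimization hint is lost, which is the property the post is after.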
Quote:
like suggested on your tip (thanks a lot). Quote:
that is used by a tool (the compiler in this case), so that it can be interpreted and used by it. I must confess, I only gave the GCC manual a quick view, as I mentioned I do not understand very much of optimizations. Once I have time I will read it all. Quote:
That means I should take into account trying to learn the GCC API and i386 assembly. I'm probably still very much a newbie at programming too... it's all very recent (I'm retrying). But I want to give my best and learn as much as I can. When I was younger, I used to learn almost a whole language in 3 days or so... One week later, all the code I'd written just looked like it came from another galaxy. So now... nice and easy. I must say that your posts were indeed very helpful; I've already learned a lot just from reading them (I hope I didn't waste your time). Thanks very much again! Cheers! |