If you want aggressive optimization, you could try some of the following GCC options:
-s
linker option that strips all symbol table and relocation information from the binary; this shrinks the executable but doesn't change the generated code
-O3
sets the highest standard optimization level: everything in -O2 plus more aggressive options such as -finline-functions
-fomit-frame-pointer
tells the compiler not to keep the frame pointer in a register for functions that don't need one, which frees an extra register but makes debugging impossible on some machines
-march=i686
generates code for the i686 (Pentium Pro and later) instruction set; the resulting binary may not run on older CPUs
-malign-functions=4
aligns the start of each function to a 2^4 = 16-byte boundary (newer GCC versions use -falign-functions=16 instead)
-funroll-loops
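unrolls loops whose iteration count can be determined at compile time or on entry to the loop, trading larger code for fewer branches and counter updates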
-fexpensive-optimizations
performs a number of minor optimizations that are relatively expensive in compile time
-malign-double
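aligns double, long double and long long variables on a two-word boundary instead of a one-word boundary, which can run somewhat faster on a Pentium at the cost of more memory; structures containing these types are then laid out differently from the standard i386 ABI, so the code isn't binary-compatible with code built without this switch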
-fschedule-insns2
similar to -fschedule-insns, but requests an additional pass of instruction scheduling after register allocation has been done
-mwide-multiply
controls whether GCC uses the mul and imul instructions that produce 64-bit results in eax:edx from 32-bit operands to do long long multiplies and 32-bit division by constants
-ffast-math
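lets GCC break some IEEE/ISO floating-point rules in the interest of speed (for example, assuming that no values are NaN and that sqrt() never gets a negative argument), so programs that depend on exact IEEE behavior can produce wrong results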
You could also try -march=athlon instead of -march=i686 (the resulting binary then needs an Athlon-class CPU). If the software misbehaves, though, there's a good chance the optimizations are to blame.
I tried looking for the patches isajera mentioned but couldn't find them. I could probably find more by searching the kernel mailing list archives. I did find a Linux kernel patches website, though:
http://linux-patches.rock-projects.com