LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 08-29-2004, 01:54 PM   #1
C++Boar
Member
 
Registered: Mar 2004
Location: Denmark
Distribution: Gentoo
Posts: 68

Rep: Reputation: 16
+performance +gcc +elimination


I have triying to make some fast code, when I found out that gcc with O3 did not help med
I have an object r with a getDirInfo. It tells about a property the object has. It must be between 0 and 27. (s is another object)

Now I loop over it and tells gcc to unroll loops (c500() should not be unrolled, since they are class members).

Now I would except gcc to unroll the d-loop and generate 27 times the
code and then in different code eat up nogood ifs (since they are always false)

But either I am wrong or I am passing the wrong options to gcc. I
use gcc version 3.3.2 and have tried with fun-roll-loop
(and a lot of other things)

I have not seen the asm code for this. I only know this to be a problem since I have tried
experiments with d=26 and it takes somewhat longer than if just the code for d=26 was there.

Hope somebody can help.



for (int v=0;v<c500();v++)
{
for (int d=0;d<27;d++)
{
if (d==r.getDirInfo())
{
for (int x=0;x<c500();x++)
for (int y=0;y<c500();y++)
{
r.p.setX(x);
r.p.setY(y);
bool first = false;
bool last = false;
realnum hit1,hit2;

bool b;
if (d==0) { b= s.xmax()<r.p.x() || s.ymax()<r.p.y() || s.zmax()<r.p.z();}
if (d==1) { b= s.xmax()<r.p.x() || s.ymax()<r.p.y() || s.zmin()>r.p.z();}
if (d==2) { b= s.xmax()<r.p.x() || s.ymin()>r.p.y() || s.zmax()<r.p.z();}
if (d==3) { b= s.xmin()> r.p.x() || s.ymin()>r.p.y() || s.zmax()<r.p.z();}
//..... and lots of others up to 27.

if (b)
do something.

}
}
}
}
 
Old 08-30-2004, 10:56 AM   #2
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
If I understand what you want -

You do know that getting information about a directory requires disk I/O -
which is very slow. Unrolling loops will not help all that much. I can't see enough of your code to know if you're calling a function to get directory information or not.

Plus, using code tags would help us read your code.

If your program runs very slowly, try compiling for profiling ( -g -p) and then use a profiler like gprof to find where your problem is.
 
Old 08-30-2004, 12:15 PM   #3
C++Boar
Member
 
Registered: Mar 2004
Location: Denmark
Distribution: Gentoo
Posts: 68

Original Poster
Rep: Reputation: 16
You must have misunderstood. I am not doing any I/O. I do math.
I have a vector (a,b,c). This vector can have 27 forms
a is positive, a is negative and a is zero (same for b and c (= 3*3*3 = 27 possibilities))

Now I want to check something (s) with that vector called r in my code.
The code will do that a lot. So it must be real fast.

So I create my normal loops so that they can not be unrolled (that means they do not have a known endpoint at compiletime c500() =500 but unknown at compiletime). But the one with the constant 27 I want that one unrolled right away.

By that I mean that I want the compiler to write that code 27 times with integer-constants instead of d. Then I want the compiler to realize that 26 of the 27 ifs in the code will always be false and forget about them. There should then only be one true if-statement per unrolled value.
But I can see on execution time that that is not happening.

It should be so, but I I don't think it is possible (with gcc)

I can work around this in several ways but unless I use macros (with no chance to debug) or maintain 27 different 'almost the same' code (Or make some code to create code). And I don't like any of the above.

Maybe I should contact some of these gcc-guys.
 
Old 08-30-2004, 02:47 PM   #4
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Okay. I understand - you want to execute one comparison, if (d==?) then do one set of operations per loop, and NOT do the other 26 compares?

Correct?

use switch()
Code:
switch(d){
    case 0: { b= s.xmax()<r.p.x() || s.ymax()<r.p.y() || s.zmax()<r.p.z(); break;}
    case 1: { b= s.xmax()<r.p.x() || s.ymax()<r.p.y() || s.zmin()>r.p.z(); break;} 
    case 2: { b= s.xmax()<r.p.x() || s.ymin()>r.p.y() || s.zmax()<r.p.z(); break;} 
    case 3: { b= s.xmin()> r.p.x() || s.ymin()>r.p.y() || s.zmax()<r.p.z();break;}
    case ...
    case 27:{.........}
    default:
          break;
}
 
Old 08-30-2004, 02:55 PM   #5
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Another option - (assuming all 27 operations are unique) create 27 functions that handle your operations. Then create an array of function pointers:
Code:
typedef int (*pt2Function)(struct r, int, int);
pt2Function funcs[27];
...............
for(d=0;d<27){
...............
  func[d](arg1,arg2,arg3);
..........
}
 
Old 08-30-2004, 03:15 PM   #6
jim mcnamara
Member
 
Registered: May 2002
Posts: 964

Rep: Reputation: 36
Also: heuristics play part in this solution, not just brute force programming
Code:
b= s.xmin()> r.p.x() || s.ymin()>r.p.y() || s.zmax()<r.p.z();
The compiler evaluates logical or expressions from left to right.
It quits evaluation when it finds a logical true. If you have a priori knowledge that for 40% of the cases
Code:
s.zmax()<r.p.z()
is more likely to evaluate true, put it first. If you know that it is more unlikely, put it last.

Also,
Code:
s.zmax()<r.p.z()
is calling functions, you definitely want to get each of those f(x)'s inlined. Big time.

You REALLY should profile this before you spend time worrying about if() statements. if(d==3) takes like 3 opcodes. Each of those class functions takes at least 20 opcodes for housekeeping.
 
Old 09-02-2004, 11:48 PM   #7
C++Boar
Member
 
Registered: Mar 2004
Location: Denmark
Distribution: Gentoo
Posts: 68

Original Poster
Rep: Reputation: 16
Thanks.

The case: actually made it much faster. However not as fast as if the compiler had expanded the loop correctly and skiped all the other cases (or if-statements) except the one that was needed.

I have tried pointer to funktion and virtuel functions as well, but the call of a function is also expensive. And I really don't want 27 classes or functionpointers.

I know the other techniques you tell about. (s.xmax(), r.p.x() (and others) are inlined. In facf they only return a private member variable) Also I don't know anything about what will be more likely.

but gcc could improve here. This should be a common way to optimize you code without writting almost the same x (for my case 27) times.

C++Boar

Last edited by C++Boar; 09-02-2004 at 11:49 PM.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
gcc wont install, 'failed dependencies: glibc-devel is needed by gcc-3.3.3-41' TdlSnare SUSE / openSUSE 3 11-29-2004 02:13 PM
Kernel compiling: gcc-3.3 is 586, should be gcc-3.3 386 Erik Plaggenmar Linux - Software 0 10-01-2004 11:38 AM
Cups elimination beowulf405 Linux - Newbie 5 06-30-2004 09:16 PM
a doubt with host gcc and arm-linux-gcc renjithgopal Linux - General 1 09-11-2003 04:02 PM
export CC=/usr/bin/gcc-3.2 - switch gcc version? ferreter Linux - Software 1 08-20-2003 12:07 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 05:58 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration