I need your help with a benchmark, run it and report it! :)

mark_umr · 04-26-2003, 05:20 PM

I am trying to verify that Linux PCs are faster than Sun Blade 2000's or any other Suns that are anywhere under $10k, at least with a simple benchmark I wrote. I have run it on Blade 2000's, and in Linux and in Cygwin on my own Duron 950.

I am seeing results that seem to indicate that the Duron 950 is significantly faster for small array sizes. That suggests to me that the Duron is much faster with register or cache accesses, but that with its 200 MHz bus it is slower when the multiplications have to come from main memory.

I would really like some people with P4's on the 533 MHz bus to run this, but I'm open to all (anything faster than my Duron 950) and would definitely like to see some Athlon XP 3000 results. If you are overclocking, I STRONGLY prefer that you set your BIOS to the normal/correct clock before running it and reporting your results. I'm suggesting to my school to change to Linux PCs and I'm quite sure they are not going to overclock them, so overclocked results are useless for my purposes. Start a separate thread maybe if you want to report overclocked results too.

DEFINITELY feel free to critique the code (BUT PLEASE START A SEPARATE THREAD). It is my first benchmark ever and so I will not mind. However, I would like results more than critiques I guess.

I did this with an array because I am attempting to simulate floating point matrix multiplications within Matlab and Spice. I chose purposely not to emulate a matrix multiplication exactly, because I do not think that it matters too much how far elements are apart from each other, just that they are apart from each other, such that register to register, cache to cache, and main memory to main memory results can be checked. Correct me if I'm wrong here (PLEASE START A SEPARATE THREAD FOR THIS TOO). I do not have knowledge of IA32 architecture so I could be wrong here.

I'm also pondering starting a SourceForge project for a SPEC-like benchmark that is free and GNU. This MIGHT be a starting point, but in reality SPEC benchmarks use actual algorithms, so this is probably not a good starting point if the goal is to emulate a SPEC benchmark. Also, it does not involve multiple threads, etc., which a SPEC benchmark for matrix multiplication might.

To run it, copy the code to the named files, copy the script to any file you want, then chmod the script and run the script and then post your results with the following info:

CPU
RAM type and bus rate

I do not care about video cards, etc. because if you look at the code, it obviously does not use them (neither do Spice or Matlab DSP applications).

Mark

PS There is a float version (usually 32 bits) and a double version (usually 64 bits). I originally assumed the 64-bit Suns might outperform the 32-bit Intel architecture with the double version. That assumption may be wrong: the results between the two are very close, so I suspect that with a fast enough bus the IA32 will outperform the Sun on either.

----------- the script ----------------
#!/bin/sh

g++ -Wall -o newDoubleLoops newDoubleLoops.cc
g++ -Wall -o newFloatLoops newFloatLoops.cc

echo "-----------------------------------------"
./newFloatLoops 4 2000000
./newFloatLoops 4 2000000
./newFloatLoops 4 2000000
echo "-----------------------------------------"
./newFloatLoops 32 200000
./newFloatLoops 32 200000
./newFloatLoops 32 200000
echo "-----------------------------------------"
./newFloatLoops 3200 2
./newFloatLoops 3200 2
./newFloatLoops 3200 2
echo "-----------------------------------------"

echo "-----------------------------------------"
./newDoubleLoops 4 2000000
./newDoubleLoops 4 2000000
./newDoubleLoops 4 2000000
echo "-----------------------------------------"
./newDoubleLoops 32 200000
./newDoubleLoops 32 200000
./newDoubleLoops 32 200000
echo "-----------------------------------------"
./newDoubleLoops 3200 2
./newDoubleLoops 3200 2
./newDoubleLoops 3200 2
echo "-----------------------------------------"

------------- newDoubleLoops.cc -------------

#include <stdio.h>
#include <math.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

using namespace std;

int main( int argc, char **argv )
{
    // expects two arguments: the array size and the number of repetitions
    if ( argc != 3 )
    {
        cerr << "usage: newDoubleLoops <array size> <repetitions>" << endl;
        exit( 1 );
    }

    // time() counts whole seconds; the clock starts before allocation and setup
    time_t start_timestamp = time( (time_t *) 0 );

    srand( start_timestamp );

    long int limit = atol( argv[ 1 ] );
    long int limitTwo = atol( argv[ 2 ] );

    // limit x limit array, allocated row by row
    double ** numbers = new double* [ limit ];
    for( long int i = 0; i < limit; i++ )
    {
        numbers[ i ] = new double[ limit ];
    }

    // fill with random values of at least 1.0
    for( long int i = 0; i < limit; i++ )
    {
        for( long int j = 0; j < limit; j++ )
        {
            numbers[ i ][ j ] = double( rand() ) / 2.0 + 1.0;
        }
    }

    for( long int k = 0; k < limitTwo; k++ )
    {
        // row sweep: multiply each element by its mirror image in the same row
        for( long int i = 0; i < limit; i++ )
        {
            for( long int j = 0; j < limit; j++ )
            {
                numbers[ i ][ j ] = numbers[ i ][ j ] * numbers[ i ][ limit - 1 - j ];
            }
        }

        // column sweep: multiply each element by its mirror image in the same column
        for( long int j = 0; j < limit; j++ )
        {
            for( long int i = 0; i < limit; i++ )
            {
                numbers[ i ][ j ] = numbers[ i ][ j ] * numbers[ limit - 1 - i ][ j ];
            }
        }
    }

    time_t end_timestamp = time( (time_t *) 0 );
    cout << "double total time = " << ( end_timestamp - start_timestamp ) << " with ( " << limit << ", " << limitTwo << " )\n";

    for( long int i = 0; i < limit; i++ )
    {
        delete[] numbers[ i ];
    }
    delete[] numbers;

    return 0;
}

------------- newFloatLoops.cc -------------

#include <stdio.h>
#include <math.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

using namespace std;

int main( int argc, char **argv )
{
    // expects two arguments: the array size and the number of repetitions
    if ( argc != 3 )
    {
        cerr << "usage: newFloatLoops <array size> <repetitions>" << endl;
        exit( 1 );
    }

    // time() counts whole seconds; the clock starts before allocation and setup
    time_t start_timestamp = time( (time_t *) 0 );

    srand( start_timestamp );

    long int limit = atol( argv[ 1 ] );
    long int limitTwo = atol( argv[ 2 ] );

    // limit x limit array, allocated row by row
    float ** numbers = new float* [ limit ];
    for( long int i = 0; i < limit; i++ )
    {
        numbers[ i ] = new float[ limit ];
    }

    // fill with random values of at least 1.0
    for( long int i = 0; i < limit; i++ )
    {
        for( long int j = 0; j < limit; j++ )
        {
            numbers[ i ][ j ] = float( rand() ) / 2.0F + 1.0F;
        }
    }

    for( long int k = 0; k < limitTwo; k++ )
    {
        // row sweep: multiply each element by its mirror image in the same row
        for( long int i = 0; i < limit; i++ )
        {
            for( long int j = 0; j < limit; j++ )
            {
                numbers[ i ][ j ] = numbers[ i ][ j ] * numbers[ i ][ limit - 1 - j ];
            }
        }

        // column sweep: multiply each element by its mirror image in the same column
        for( long int j = 0; j < limit; j++ )
        {
            for( long int i = 0; i < limit; i++ )
            {
                numbers[ i ][ j ] = numbers[ i ][ j ] * numbers[ limit - 1 - i ][ j ];
            }
        }
    }

    time_t end_timestamp = time( (time_t *) 0 );
    cout << "float total time = " << ( end_timestamp - start_timestamp ) << " with ( " << limit << ", " << limitTwo << " )\n";

    for( long int i = 0; i < limit; i++ )
    {
        delete[] numbers[ i ];
    }
    delete[] numbers;

    return 0;
}

mark_umr · 04-27-2003, 11:58 PM

By the way, after you have created all 3 files (the sh script and the two cc files), it will take less than 5 minutes to run on anything faster than a Duron 950.

Mik · 04-28-2003, 04:10 AM

Well, here are some results for you. I don't really know what the results would mean. But hopefully they are somehow useful to you.
I did the test on two different machines:

The first one is an Athlon XP 2000 with 512 MB of DDR 333 MHz RAM.
The OS is an LFS compiled with optimizations. I'm still not sure why it reports the CPU speed as 1250 though. But I'm not really a hardware person and it seems to still perform well enough for me.

Code:

mik:~/bench$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) Processor
stepping        : 2
cpu MHz         : 1250.069
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 2490.36

mik:~/bench$ sh bench.sh
-----------------------------------------
float total time = 2 with ( 4, 2000000 )
float total time = 2 with ( 4, 2000000 )
float total time = 1 with ( 4, 2000000 )
-----------------------------------------
float total time = 11 with ( 32, 200000 )
float total time = 10 with ( 32, 200000 )
float total time = 11 with ( 32, 200000 )
-----------------------------------------
float total time = 2 with ( 3200, 2 )
float total time = 3 with ( 3200, 2 )
float total time = 3 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 2 with ( 4, 2000000 )
double total time = 1 with ( 4, 2000000 )
double total time = 2 with ( 4, 2000000 )
-----------------------------------------
double total time = 10 with ( 32, 200000 )
double total time = 11 with ( 32, 200000 )
double total time = 10 with ( 32, 200000 )
-----------------------------------------
double total time = 4 with ( 3200, 2 )
double total time = 4 with ( 3200, 2 )
double total time = 3 with ( 3200, 2 )
-----------------------------------------

The second machine is a Pentium 4 with 1 GB of memory. I have no idea what type it is though.
The OS is SuSE 7.3. It seems to take extremely long on this machine. It is a shared machine but nobody else was logged in at that moment. It was however running X and the KDE login manager (but it was idle).

Code:

mik:~/bench$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 0
model name      : Intel(R) Pentium(R) 4 CPU 1500MHz
stepping        : 10
cpu MHz         : 1483.119
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss tm
bogomips        : 2962.22

mik:~/bench$ sh bench.sh
-----------------------------------------
float total time = 51 with ( 4, 2000000 )
float total time = 50 with ( 4, 2000000 )
float total time = 51 with ( 4, 2000000 )
-----------------------------------------
float total time = 323 with ( 32, 200000 )
float total time = 322 with ( 32, 200000 )
float total time = 322 with ( 32, 200000 )
-----------------------------------------
float total time = 27 with ( 3200, 2 )
float total time = 27 with ( 3200, 2 )
float total time = 26 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 51 with ( 4, 2000000 )
double total time = 50 with ( 4, 2000000 )
double total time = 51 with ( 4, 2000000 )
-----------------------------------------
double total time = 323 with ( 32, 200000 )
double total time = 323 with ( 32, 200000 )
double total time = 323 with ( 32, 200000 )
-----------------------------------------
double total time = 11 with ( 3200, 2 )
double total time = 12 with ( 3200, 2 )
double total time = 11 with ( 3200, 2 )
-----------------------------------------

webtoe · 04-28-2003, 04:37 AM

The Athlon chip may be reported at that speed because they don't use the clock speed to name their chips anymore, so the 2000 in the name doesn't mean anything except that they think it will outperform a Pentium at that clock speed. You may know that already but hey, I was going to subscribe to this thread anyway.

Alex

Mik · 04-28-2003, 04:43 AM

Yes, I know that the number 2000 doesn't equal the CPU speed in MHz. But I thought that the Athlon XP 2000 would at least match up to 1600 MHz. Maybe I'm just misinformed.

webtoe · 04-28-2003, 04:53 AM

Well, I'm not sure, since my friend had the same thing as you. He was also slightly miffed to find his stupidly powerful new machine was having its CPU speed reported at about 1200. I just blamed it on Win XP but maybe all the chips are like that. He overclocked it in the end (he had some bastardly huge fans all over the place so it ran coooooooool).

Hope I didn't sound superior in my earlier post.

Alex

P.S. I think he did go attempting a bios update which was reported to fix something in relation to this. He didn't try it in the end coz he didn't want to mangle the machine (though he was happy to overclock)

Mik · 04-28-2003, 05:36 AM

No you didn't sound too superior. I'm not really a hardware person so my knowledge on those topics is very limited.
I don't have Windows on my PC so I can't blame it on that. But it should be pretty safe to try either overclocking or updating the BIOS. It seems there is a way to recover from a faulty BIOS flash so I'm pretty safe there. And the computer is supposed to shut down automatically when the processor gets too hot. So I don't think I can mess up too much if I try overclocking. I guess I'll just have to try out a few things. Although I'm kinda fine with the way it runs now. I don't mind waiting a few extra seconds for something to compile. And compiling code is about the only really resource intensive stuff I use it for right now.

Mik · 04-28-2003, 12:55 PM

OK, I figured out why it was misdetecting my CPU. Apparently the FSB was set to 100 in the BIOS. I set it to 266 and now I get proper results. I did the benchmark test again for the first machine and this is what I get now:

Code:

mik:~/bench$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) XP 2000+
stepping        : 2
cpu MHz         : 1730.138
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3447.19

mik:~/bench$ sh bench.sh
-----------------------------------------
float total time = 1 with ( 4, 2000000 )
float total time = 1 with ( 4, 2000000 )
float total time = 1 with ( 4, 2000000 )
-----------------------------------------
float total time = 8 with ( 32, 200000 )
float total time = 7 with ( 32, 200000 )
float total time = 8 with ( 32, 200000 )
-----------------------------------------
float total time = 2 with ( 3200, 2 )
float total time = 3 with ( 3200, 2 )
float total time = 3 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 1 with ( 4, 2000000 )
double total time = 1 with ( 4, 2000000 )
double total time = 1 with ( 4, 2000000 )
-----------------------------------------
double total time = 8 with ( 32, 200000 )
double total time = 7 with ( 32, 200000 )
double total time = 8 with ( 32, 200000 )
-----------------------------------------
double total time = 3 with ( 3200, 2 )
double total time = 4 with ( 3200, 2 )
double total time = 3 with ( 3200, 2 )
-----------------------------------------

green_dragon37 · 04-28-2003, 01:34 PM

Here are my results
I ran it on a P4 2.4 GHz w/ 533 FSB,
256 M of PC-800 Rambus RAM

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping : 7
cpu MHz : 2405.586
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4784.26
________________________________________________________

-----------------------------------------
float total time = 26 with ( 4, 2000000 )
float total time = 26 with ( 4, 2000000 )
float total time = 27 with ( 4, 2000000 )
-----------------------------------------
float total time = 169 with ( 32, 200000 )
float total time = 168 with ( 32, 200000 )
float total time = 165 with ( 32, 200000 )
-----------------------------------------
float total time = 13 with ( 3200, 2 )
float total time = 13 with ( 3200, 2 )
float total time = 13 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 26 with ( 4, 2000000 )
double total time = 26 with ( 4, 2000000 )
double total time = 27 with ( 4, 2000000 )
-----------------------------------------
double total time = 166 with ( 32, 200000 )
double total time = 169 with ( 32, 200000 )
double total time = 164 with ( 32, 200000 )
-----------------------------------------
double total time = 5 with ( 3200, 2 )
double total time = 5 with ( 3200, 2 )
double total time = 5 with ( 3200, 2 )
-----------------------------------------

Ian

nxny · 04-28-2003, 02:12 PM

P4 2.0G, 256MB SDRAM not sure about the FSB speed ( 400/533?! )

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 2
cpu MHz : 1992.653
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 3971.48

-----------------------------------------
float total time = 37 with ( 4, 2000000 )
float total time = 37 with ( 4, 2000000 )
float total time = 37 with ( 4, 2000000 )
-----------------------------------------
float total time = 235 with ( 32, 200000 )
float total time = 234 with ( 32, 200000 )
float total time = 235 with ( 32, 200000 )
-----------------------------------------
float total time = 21 with ( 3200, 2 )
float total time = 20 with ( 3200, 2 )
float total time = 21 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 37 with ( 4, 2000000 )
double total time = 37 with ( 4, 2000000 )
double total time = 37 with ( 4, 2000000 )
-----------------------------------------
double total time = 235 with ( 32, 200000 )
double total time = 234 with ( 32, 200000 )
double total time = 234 with ( 32, 200000 )
-----------------------------------------
double total time = 12 with ( 3200, 2 )
double total time = 11 with ( 3200, 2 )
double total time = 12 with ( 3200, 2 )
-----------------------------------------

Whoa. Mik's AMD XP 2000+ blew the lid off of the 'Intel Pentia' and at this moment, we're still searching for it. My results more or less fall in line with green_dragon's once the cheaper RAM I use is taken into account, but the 2000+ with DDR SDRAM is a whopping 40 times or so faster than my 2.0 GHz P4 with SDRAM; it is a bit unnerving to me :eek:

Mik · 05-05-2003, 06:23 AM

So have you got any results/conclusions from these test results? Or are you still waiting for more results?

mark_umr · 06-01-2003, 07:03 PM

Well, I really am curious why the P4's are so much slower. I found this to be the case at school and thought maybe it was a Win2k on cygwin issue since I have a much slower Duron at home on Win XP and it was so much faster. However, I guess it is a P4 issue.

I think this may show that this is not a good benchmark because surely the P4 is not that much slower than Athlons in real world performance.

Here are my results in Cygwin on Win XP right now. The results are the same in Linux.

-----------------------------------------
float total time = 2 with ( 4, 2000000 )
float total time = 3 with ( 4, 2000000 )
float total time = 4 with ( 4, 2000000 )
-----------------------------------------
float total time = 18 with ( 32, 200000 )
float total time = 18 with ( 32, 200000 )
float total time = 19 with ( 32, 200000 )
-----------------------------------------
float total time = 19 with ( 3200, 2 )
float total time = 18 with ( 3200, 2 )
float total time = 15 with ( 3200, 2 )
-----------------------------------------
-----------------------------------------
double total time = 4 with ( 4, 2000000 )
double total time = 3 with ( 4, 2000000 )
double total time = 2 with ( 4, 2000000 )
-----------------------------------------
double total time = 17 with ( 32, 200000 )
double total time = 18 with ( 32, 200000 )
double total time = 16 with ( 32, 200000 )
-----------------------------------------
double total time = 19 with ( 3200, 2 )
double total time = 18 with ( 3200, 2 )
double total time = 20 with ( 3200, 2 )
-----------------------------------------

This is on a Duron 950.

Mark

cli_man · 06-02-2003, 11:41 AM

In an earlier post on this thread someone mentioned the AMD showing the wrong CPU speed in Win XP. XP throttles down your CPU speed when it is not being used, to save power I guess. My laptop has an AMD, about 1 GHz, but it always shows up as 500 MHz. If I am running seti or something on it, or a good game, it will show up as 1000 MHz. It is hard to get XP to show the speed correctly; even with seti running it shows up as 500 MHz many times! Leave it to Windows :-)