Multiple cores/optimization in C

moisespedro · 11-02-2014, 11:22 AM

Hi, I've wrote this as an university exercise. It is a program to calculate and say if a number is perfect or not.

Code:

#include <stdio.h>

int numero_perfeito(unsigned long int x) {
    
    unsigned long int i, soma = 0;
    for(i = 1; i < x ; i++) {
        if (x % i == 0)
        soma = soma + i;
    }   

    if(soma == x)
        return 1;
    return 0;
}

main() {
    unsigned long int a;
    int retorno;
    
    printf("Digite um numero:\n");
    scanf("%lu", &a);

    retorno = numero_perfeito(a);
    if(retorno) 
        printf("Numero perfeito!\n");
    else
        printf("Meh, nao e' perfeito\n");
}

It takes a number typed in by the user, calls "numero_perfeito" function and returns 1 (perfect number) or 0 (not perfect number). While running it with large numbers, such as 2305843008139952128 it uses 100% of one CPU core and it takes quite some time to calculate (it is running for more than 9 minutes at the time of this post). How can I make this program faster? Can I set it to use all CPU cores?

NevemTeve · 11-02-2014, 11:45 AM

Use sqrt function to determine the square-root of the number:

Code:

#include <math.h>

int numero_perfeito(unsigned long int x) {
    
    unsigned long int i, soma = 0;
    unsigned long maxdiv= sqrt (x);

    for(i= 1; i <= maxdiv; i++) {
        if (x % i == 0) {
            soma = soma + i;
            if (i != 1 && i != x/i) {
                soma = soma + x/i;
            }
        }
    }   

    if(soma == x)
        return 1;
    return 0;
}

moisespedro · 11-02-2014, 11:54 AM

I don't get it. I tested here and it worked but I am not sure why.

Like, take 28 for example, it is a perfect number. maxdiv is sqrt of 28 which is equal to 5.29150262213 but 28 is also divisible by 7.

NevemTeve · 11-02-2014, 12:19 PM

when i=4, you program will execute this:

Code:

sum := sum + 4 + 28/4

moisespedro · 11-02-2014, 04:55 PM

I am still not sure I got it. Anyways, it still takes some time.

metaschima · 11-02-2014, 06:58 PM

Here's my multi-threaded version, there's a bit of a problem with number 6, because how can you even split that between say 4 threads ? If you can figure out an algorithm, good for you.

Code:

bash-4.2$ compile pn.c -fopenmp

# 1 thread    
bash-4.2$ time ./pn 8589869056
8589869056 is a perfect number

real	1m3.830s
user	1m3.849s
sys	0m0.000s

# 4 threads
bash-4.2$ time ./pn 8589869056
8589869056 is a perfect number

real	0m16.885s
user	1m7.395s
sys	0m0.004s

That's 3.78 times faster.

Code:

#include <stdio.h>
#include <stdint.h>

#include <omp.h>

#define NUM_THREADS 4

int main (int argc, char * argv[])
{
	uint64_t i;
	uint64_t num;
	uint64_t tnum[2][NUM_THREADS];
	unsigned int tid;

	uint64_t sum[NUM_THREADS] = {};
	uint64_t total = 0;

	// parse arg
	if (2 != argc)
	{
		fprintf (stderr, "Usage: %s number\n", argv[0]);
		return 1;
	}

	// get num
	sscanf (argv[1], "%lu", &num);

	if (6 == num)
	{
		printf ("6 is a perfect number\n");
		return 0;
	}

	for (i = 0; i < NUM_THREADS; i++)
	{
		tnum[1][i] = (i + 1) * num / NUM_THREADS;
		tnum[0][i] = tnum[1][i] - (num / NUM_THREADS);
		//printf ("%lu %lu\n", tnum[0][i], tnum[1][i]);
	}
	tnum[0][0] = 1;
	tnum[1][NUM_THREADS - 1] = num;
	//printf ("%lu\n", tnum[1][NUM_THREADS - 1]);

	// main loop
	#pragma omp parallel private(i, tid) shared(num, sum, tnum) num_threads(NUM_THREADS)
	{
		tid = omp_get_thread_num();
		//printf ("%d %lu %lu %lu\n", tid, tnum[0][tid], tnum[1][tid], num);
		for (i = tnum[0][tid]; i < tnum[1][tid]; i++)
		{
			//printf ("%d %lu\n", tid, i);
			if (0 == num % i)
			{
				sum[tid] += i;
				//printf ("adding %lu\n", i);
			}
		}
	}

	// calculate total
	for (i = 0; i < NUM_THREADS; i++)
	{
		total += sum[i];
		//printf ("%lu\n", sum[i]);
	}
	//printf ("%lu\n", total);

	// output result
	if (total == num)
	{
		printf ("%lu is a perfect number\n", num);
	}
	else
	{
		printf ("%lu is NOT a perfect number\n", num);
	}

	return 0;
}

Not fully bug tested, but don't have time for that.

moisespedro · 11-02-2014, 07:52 PM

Gonna give your version a shot. I am a still a C begginer and I am trying to understand the code.

Did you try a really large number like 2305843008139952128?

I left it running for hours and it didn't finish (I ended up turning off the PC without noticing it

).

My last attempt:

Code:

$ time ./numeroperfeito-1                    
Digite um numero:
2305843008139952128
^C
real	175m1.960s
user	174m44.540s
sys	0m2.831s

EDIT: And how rude from my part, many thanks for your code (you both)

metaschima · 11-02-2014, 08:27 PM

Remember that finding primes and perfect numbers is computationally very expensive. Don't expect the calculation of higher values to be doable on a desktop in a day. If you wanted even more optimization, you could try using some MMX or SSE, but I've found that it is very difficult, requires knowledge of assembly, is less portable, and only with an expert's touch can it truly outperform a compiler except in obvious cases (and the compiler is likely to figure these out anyway). If I have more time, I'll see how plausible it is to use MMX or SSE for this. The main problem is that loading and unloading from these registers is slower than for other registers, so you should keep data in the registers if you want a performance improvement.

a4z · 11-03-2014, 01:14 AM

the openmp sample is already nice and goes into the right direction,
additional
try help the compiler by passing
-funroll-loops -ftree-vectorize -march=corei7-avx (orwhat ever cool cpu you have)
and check if your loops are written in a way that the compiler can unroll and verctorize them, do not write sse code by hand, help the copilerr to generate it
additionally, no optimization runs out the best, or a better , algorithm,
so you can also check it the way you solve the problem is the best to do

nbritton · 11-03-2014, 11:14 AM

If I recall correctly, all, even, positive perfect numbers can be factored with the equation y = 2^(2x - 1) - 2^(x - 1) when x is a mersenne prime number.

Quote:

x = mersenne prime
x = x, y = 2^(2x - 1) - 2^(x - 1)
x = 2, y = 2^3 - 2^1 = 6
x = 3, y = 2^5 - 2^2 = 28
x = 5, y = 2^9 - 2^4 = 496
x = 7, y = 2^13 - 2^6 = 8128
x = 13, y = 2^25 - 2^12 = 33,550,336
x = 17, y = 2^33 - 2^16 = 8,589,869,056
x = 19, y = 2^37 - 2^18 = 137,438,691,328
x = 31, y = 2^61 - 2^30 = 2,305,843,008,139,952,128

johnsfine · 11-03-2014, 11:33 AM

Quote:

Originally Posted by nbritton

If I recall correctly, all positive perfect numbers can be factored with the equation y = 2^(2x - 1) - 2^(x - 1) when x is a prime number.

All even perfect numbers can be factored with the equation
y = 2^(x-1) * (2^x-1)
where x is prime and (2^x-1) is also prime.

Your equation is equivalent.

y is perfect if and only if (2^x-1) is prime.

So the search for even perfect numbers is equivalent to searching prime numbers x for those for which (2^x-1) is prime. That is much much faster than directly testing whether y is perfect.

There is no proof (so far as I know) that odd perfect numbers don't exist. But it is pretty easy to prove no odd perfect number is small enough to be found with a simplistic search like the programs discussed earlier in this thread (certainly no odd perfect number is smaller than 2^64, so unsigned long could not hold one). So for practical purposes, perfect numbers are even.

metaschima · 11-03-2014, 12:23 PM

Quote:

Originally Posted by johnsfine

But it is pretty easy to prove no odd perfect number is small enough to be found with a simplistic search like the programs discussed earlier in this thread (certainly no odd perfect number is smaller than 2^64, so unsigned long could not hold one). So for practical purposes, perfect numbers are even.

That's one thing I was wondering about, it is likely that even 64-bit numbers will overflow. Not sure how to fix this, you many need some library that can handle larger numbers.

If you really wanted to find more perfect numbers, why not try to construct pernicious numbers and test them ? Like the wiki says:
6 = 110
28 = 11100
496 = 111110000
8128 = 1111111000000
33550336 = 1111111111111000000000000

nbritton · 11-03-2014, 01:25 PM

http://primes.utm.edu/mersenne/

Another optimization you can make is that, even, perfect numbers end in ether 6 or 8, the pattern of the series can be described with the equation y = 8 - mod(x + x^2, 4).

For example:

Code:

$ for x in $(cat mersenne_powers); do ((echo "$x, `bc <<< "2^($x-1) * (2^$x-1)" | tail -c 2`")&); done
2, 6
3, 8
5, 6
7, 8
13, 6
17, 6
19, 8
31, 8
61, 6
89, 6
107, 8
127, 8
521, 6
607, 8
1279, 8
2203, 8
2281, 6
3217, 6
4253, 6
4423, 8
9689, 6
9941, 6
11213, 6
19937, 6
21701, 6
23209, 6
44497, 6
86243, 8
110503, 8
132049, 6
216091, 8
756839, 8
859433, 6
1257787, 8
1398269, 6
2976221, 6
3021377, 6
6972593, 6
13466917, 6
20996011, 8
24036583, 8
25964951, 8
30402457, 6
32582657, 6
37156667, 8
42643801, 6
43112609, 6
57885161, 6

nbritton · 11-03-2014, 08:34 PM

Quote:

Originally Posted by johnsfine

All even perfect numbers can be factored with the equation
y = 2^(x-1) * (2^x-1)
where x is prime and (2^x-1) is also prime.

Your equation is equivalent.

According to bc, your equation is faster to compute on x86_64...

Code:

$ time echo "2^(10000000-1) * (2^10000000-1)" | bc > /dev/null

real	10m51.011s
user	10m50.292s
sys	0m0.589s


$ time echo "2^(2*10000000-1) - 2^(10000000-1)" | bc > /dev/null

real	11m37.377s
user	11m36.306s
sys	0m0.861s

johnsfine · 11-04-2014, 07:44 AM

Quote:

Originally Posted by nbritton

According to bc, your equation is faster to compute on x86_64...

Computing that isn't a significant fraction of the time regardless of which way you do it.

The significant time is in testing whether (2^X-1) is prime.

Testing whether 2^(X-1)*(2^X-1) is perfect is mathematically equivalent to testing whether (2^X-1) is prime, but since 2^(X-1)*(2^X-1)is a far larger number, directly testing whether it is perfect would be far slower than directly testing whether (2^X-1) is prime.

After discovering that (2^X-1) is prime, computing 2^(X-1)*(2^X-1) is trivial, then converting it to decimal for display is bit harder, but still trivial compared to discovering (2^X-1) is prime.