LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Multiple cores/optimization in C (https://www.linuxquestions.org/questions/programming-9/multiple-cores-optimization-in-c-4175524105/)

moisespedro 11-02-2014 11:22 AM

Multiple cores/optimization in C
 
2 Attachment(s)
Hi, I've wrote this as an university exercise. It is a program to calculate and say if a number is perfect or not.

Code:

#include <stdio.h>

int numero_perfeito(unsigned long int x) {
   
    unsigned long int i, soma = 0;
    for(i = 1; i < x ; i++) {
        if (x % i == 0)
        soma = soma + i;
    } 

    if(soma == x)
        return 1;
    return 0;
}

main() {
    unsigned long int a;
    int retorno;
   
    printf("Digite um numero:\n");
    scanf("%lu", &a);

    retorno = numero_perfeito(a);
    if(retorno)
        printf("Numero perfeito!\n");
    else
        printf("Meh, nao e' perfeito\n");
}

It takes a number typed in by the user, calls "numero_perfeito" function and returns 1 (perfect number) or 0 (not perfect number). While running it with large numbers, such as 2305843008139952128 it uses 100% of one CPU core and it takes quite some time to calculate (it is running for more than 9 minutes at the time of this post). How can I make this program faster? Can I set it to use all CPU cores?

NevemTeve 11-02-2014 11:45 AM

Use sqrt function to determine the square-root of the number:

Code:

#include <math.h>

int numero_perfeito(unsigned long int x) {
   
    unsigned long int i, soma = 0;
    unsigned long maxdiv= sqrt (x);

    for(i= 1; i <= maxdiv; i++) {
        if (x % i == 0) {
            soma = soma + i;
            if (i != 1 && i != x/i) {
                soma = soma + x/i;
            }
        }
    } 

    if(soma == x)
        return 1;
    return 0;
}


moisespedro 11-02-2014 11:54 AM

I don't get it. I tested here and it worked but I am not sure why.

Like, take 28 for example, it is a perfect number. maxdiv is sqrt of 28 which is equal to 5.29150262213 but 28 is also divisible by 7.

NevemTeve 11-02-2014 12:19 PM

when i=4, you program will execute this:
Code:

sum := sum + 4 + 28/4

moisespedro 11-02-2014 04:55 PM

I am still not sure I got it. Anyways, it still takes some time.

metaschima 11-02-2014 06:58 PM

Here's my multi-threaded version, there's a bit of a problem with number 6, because how can you even split that between say 4 threads ? If you can figure out an algorithm, good for you.

Code:

bash-4.2$ compile pn.c -fopenmp

# 1 thread   
bash-4.2$ time ./pn 8589869056
8589869056 is a perfect number

real        1m3.830s
user        1m3.849s
sys        0m0.000s

# 4 threads
bash-4.2$ time ./pn 8589869056
8589869056 is a perfect number

real        0m16.885s
user        1m7.395s
sys        0m0.004s

That's 3.78 times faster.

Code:


#include <stdio.h>
#include <stdint.h>

#include <omp.h>

#define NUM_THREADS 4

int main (int argc, char * argv[])
{
        uint64_t i;
        uint64_t num;
        uint64_t tnum[2][NUM_THREADS];
        unsigned int tid;

        uint64_t sum[NUM_THREADS] = {};
        uint64_t total = 0;

        // parse arg
        if (2 != argc)
        {
                fprintf (stderr, "Usage: %s number\n", argv[0]);
                return 1;
        }

        // get num
        sscanf (argv[1], "%lu", &num);

        if (6 == num)
        {
                printf ("6 is a perfect number\n");
                return 0;
        }

        for (i = 0; i < NUM_THREADS; i++)
        {
                tnum[1][i] = (i + 1) * num / NUM_THREADS;
                tnum[0][i] = tnum[1][i] - (num / NUM_THREADS);
                //printf ("%lu %lu\n", tnum[0][i], tnum[1][i]);
        }
        tnum[0][0] = 1;
        tnum[1][NUM_THREADS - 1] = num;
        //printf ("%lu\n", tnum[1][NUM_THREADS - 1]);

        // main loop
        #pragma omp parallel private(i, tid) shared(num, sum, tnum) num_threads(NUM_THREADS)
        {
                tid = omp_get_thread_num();
                //printf ("%d %lu %lu %lu\n", tid, tnum[0][tid], tnum[1][tid], num);
                for (i = tnum[0][tid]; i < tnum[1][tid]; i++)
                {
                        //printf ("%d %lu\n", tid, i);
                        if (0 == num % i)
                        {
                                sum[tid] += i;
                                //printf ("adding %lu\n", i);
                        }
                }
        }

        // calculate total
        for (i = 0; i < NUM_THREADS; i++)
        {
                total += sum[i];
                //printf ("%lu\n", sum[i]);
        }
        //printf ("%lu\n", total);

        // output result
        if (total == num)
        {
                printf ("%lu is a perfect number\n", num);
        }
        else
        {
                printf ("%lu is NOT a perfect number\n", num);
        }

        return 0;
}

Not fully bug tested, but don't have time for that.

moisespedro 11-02-2014 07:52 PM

Gonna give your version a shot. I am a still a C begginer and I am trying to understand the code.

Did you try a really large number like 2305843008139952128?

I left it running for hours and it didn't finish (I ended up turning off the PC without noticing it :().

My last attempt:

Code:

$ time ./numeroperfeito-1                   
Digite um numero:
2305843008139952128
^C
real        175m1.960s
user        174m44.540s
sys        0m2.831s

EDIT: And how rude from my part, many thanks for your code (you both) :D

metaschima 11-02-2014 08:27 PM

Remember that finding primes and perfect numbers is computationally very expensive. Don't expect the calculation of higher values to be doable on a desktop in a day. If you wanted even more optimization, you could try using some MMX or SSE, but I've found that it is very difficult, requires knowledge of assembly, is less portable, and only with an expert's touch can it truly outperform a compiler except in obvious cases (and the compiler is likely to figure these out anyway). If I have more time, I'll see how plausible it is to use MMX or SSE for this. The main problem is that loading and unloading from these registers is slower than for other registers, so you should keep data in the registers if you want a performance improvement.

a4z 11-03-2014 01:14 AM

the openmp sample is already nice and goes into the right direction,
additional
try help the compiler by passing
-funroll-loops -ftree-vectorize -march=corei7-avx (orwhat ever cool cpu you have)
and check if your loops are written in a way that the compiler can unroll and verctorize them, do not write sse code by hand, help the copilerr to generate it
additionally, no optimization runs out the best, or a better , algorithm,
so you can also check it the way you solve the problem is the best to do

nbritton 11-03-2014 11:14 AM

If I recall correctly, all, even, positive perfect numbers can be factored with the equation y = 2^(2x - 1) - 2^(x - 1) when x is a mersenne prime number.

Quote:

x = mersenne prime
x = x, y = 2^(2x - 1) - 2^(x - 1)
x = 2, y = 2^3 - 2^1 = 6
x = 3, y = 2^5 - 2^2 = 28
x = 5, y = 2^9 - 2^4 = 496
x = 7, y = 2^13 - 2^6 = 8128
x = 13, y = 2^25 - 2^12 = 33,550,336
x = 17, y = 2^33 - 2^16 = 8,589,869,056
x = 19, y = 2^37 - 2^18 = 137,438,691,328
x = 31, y = 2^61 - 2^30 = 2,305,843,008,139,952,128

johnsfine 11-03-2014 11:33 AM

Quote:

Originally Posted by nbritton (Post 5263886)
If I recall correctly, all positive perfect numbers can be factored with the equation y = 2^(2x - 1) - 2^(x - 1) when x is a prime number.

All even perfect numbers can be factored with the equation
y = 2^(x-1) * (2^x-1)
where x is prime and (2^x-1) is also prime.

Your equation is equivalent.

y is perfect if and only if (2^x-1) is prime.

So the search for even perfect numbers is equivalent to searching prime numbers x for those for which (2^x-1) is prime. That is much much faster than directly testing whether y is perfect.

There is no proof (so far as I know) that odd perfect numbers don't exist. But it is pretty easy to prove no odd perfect number is small enough to be found with a simplistic search like the programs discussed earlier in this thread (certainly no odd perfect number is smaller than 2^64, so unsigned long could not hold one). So for practical purposes, perfect numbers are even.

metaschima 11-03-2014 12:23 PM

Quote:

Originally Posted by johnsfine (Post 5263895)
But it is pretty easy to prove no odd perfect number is small enough to be found with a simplistic search like the programs discussed earlier in this thread (certainly no odd perfect number is smaller than 2^64, so unsigned long could not hold one). So for practical purposes, perfect numbers are even.

That's one thing I was wondering about, it is likely that even 64-bit numbers will overflow. Not sure how to fix this, you many need some library that can handle larger numbers.

If you really wanted to find more perfect numbers, why not try to construct pernicious numbers and test them ? Like the wiki says:
6 = 110
28 = 11100
496 = 111110000
8128 = 1111111000000
33550336 = 1111111111111000000000000

nbritton 11-03-2014 01:25 PM

http://primes.utm.edu/mersenne/

Another optimization you can make is that, even, perfect numbers end in ether 6 or 8, the pattern of the series can be described with the equation y = 8 - mod(x + x^2, 4).

For example:
Code:

$ for x in $(cat mersenne_powers); do ((echo "$x, `bc <<< "2^($x-1) * (2^$x-1)" | tail -c 2`")&); done
2, 6
3, 8
5, 6
7, 8
13, 6
17, 6
19, 8
31, 8
61, 6
89, 6
107, 8
127, 8
521, 6
607, 8
1279, 8
2203, 8
2281, 6
3217, 6
4253, 6
4423, 8
9689, 6
9941, 6
11213, 6
19937, 6
21701, 6
23209, 6
44497, 6
86243, 8
110503, 8
132049, 6
216091, 8
756839, 8
859433, 6
1257787, 8
1398269, 6
2976221, 6
3021377, 6
6972593, 6
13466917, 6
20996011, 8
24036583, 8
25964951, 8
30402457, 6
32582657, 6
37156667, 8
42643801, 6
43112609, 6
57885161, 6


nbritton 11-03-2014 08:34 PM

Quote:

Originally Posted by johnsfine (Post 5263895)
All even perfect numbers can be factored with the equation
y = 2^(x-1) * (2^x-1)
where x is prime and (2^x-1) is also prime.

Your equation is equivalent.

According to bc, your equation is faster to compute on x86_64...
Code:

$ time echo "2^(10000000-1) * (2^10000000-1)" | bc > /dev/null

real        10m51.011s
user        10m50.292s
sys        0m0.589s


$ time echo "2^(2*10000000-1) - 2^(10000000-1)" | bc > /dev/null

real        11m37.377s
user        11m36.306s
sys        0m0.861s


johnsfine 11-04-2014 07:44 AM

Quote:

Originally Posted by nbritton (Post 5264131)
According to bc, your equation is faster to compute on x86_64...

Computing that isn't a significant fraction of the time regardless of which way you do it.

The significant time is in testing whether (2^X-1) is prime.

Testing whether 2^(X-1)*(2^X-1) is perfect is mathematically equivalent to testing whether (2^X-1) is prime, but since 2^(X-1)*(2^X-1)is a far larger number, directly testing whether it is perfect would be far slower than directly testing whether (2^X-1) is prime.

After discovering that (2^X-1) is prime, computing 2^(X-1)*(2^X-1) is trivial, then converting it to decimal for display is bit harder, but still trivial compared to discovering (2^X-1) is prime.


All times are GMT -5. The time now is 01:13 PM.