Code:
// oomptest.c
// compile as gcc oomptest.c -o oomp -lm -fopenmp
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int omp_get_max_threads(void);
int omp_get_thread_num(void);
void subroutine(float *subject, float *object, int index){
    object[index] -= subject[0];
}
int main(int argc, char* argv[]){
    int i, id;
    float v0[2] = {13579, 24680}, v1[5]={1, 3, 5, 7, 9};
    #pragma omp parallel num_threads(5) private (id)
    {
        id = omp_get_thread_num();
        printf("line 22 id=%d\n", id);
        subroutine(&v0[1], &v1[id], id);
    }
    for(i=0; i<5; i++) printf("v1[%d] = %f\n", i, v1[i]);
}
output:
Quote:
line 22 id=0
line 22 id=3
line 22 id=1
line 22 id=2
line 22 id=4
v1[0] = -24679.000000
v1[1] = 3.000000
v1[2] = -24675.000000
v1[3] = 7.000000
v1[4] = -24671.000000
expected output:
Quote:
line 22 id=0
line 22 id=3
line 22 id=1
line 22 id=2
line 22 id=4
v1[0] = -24679.000000
v1[1] = -24677.000000
v1[2] = -24675.000000
v1[3] = -24673.000000
v1[4] = -24671.000000
Commentary:
The original program performs Gaussian elimination on a very large, very sparse linear algebra matrix. The matrix is stored in two flat files: one for the left side of the equation, and one for the right side of the equation. Each element in those files is a structure with six degrees of freedom. Thus, what is actually passed into the omp parallel region is the leftmost structure in each equation, and also the rightmost structure in each equation.
As compiled, the program runs under valgrind without any reported faults and runs successfully if "number of threads" is reduced to one, but it fails for more than one thread.
So your solution to a buggy omp library is to avoid omp entirely?
I submit that a better solution is to change the number of threads passed to the omp pragma from 5 to 1. That will work, since the omp library apparently does work for a single thread. For my application, that means solving the problem in one-thread chunks instead of multiple-thread chunks. When the library eventually does get fixed, I can simply change the '1' back to "nthreads".
Quote:
So your solution to a buggy omp library is to avoid omp entirely?
No, that is not my solution. Sorry for being unclear, I included just the parts of your program that were relevant, to keep things short. That is, I left out the omp parts because you don't need to change them.
You have a bug in your program, the omp library is fine. It just happens that the bug is hidden when you are only using one thread. It is not even a threading bug, it's just about the arithmetic you are doing on the thread ID number.
My solution is to put 0 instead of id or index in one of the highlighted places (choose one of the places, not both).
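For anyone reading this without the highlighting: the two places are presumably the `&v1[id]` argument in the call and the `object[index]` subscript inside `subroutine`. Here is a minimal sketch of both variants, assuming that reading is right. The function names (`subtract_at_index`, `subtract_in_place`, `apply_fix_a`, `apply_fix_b`) are made up for illustration, and the plain loops stand in for the omp parallel region.

```c
/* Variant A: the subroutine applies the index, so the caller must pass
   the base of the array (offset 0), not &v1[id]. */
static void subtract_at_index(const float *subject, float *object, int index)
{
    object[index] -= subject[0];
}

/* Variant B: the caller already points at the element, so the
   subroutine must use offset 0 instead of index. */
static void subtract_in_place(const float *subject, float *object)
{
    object[0] -= subject[0];
}

/* Sequential stand-in for the parallel region, using variant A. */
static void apply_fix_a(const float *subject, float *v1, int n)
{
    for (int id = 0; id < n; id++)
        subtract_at_index(subject, v1, id);      /* v1, not &v1[id] */
}

/* Sequential stand-in for the parallel region, using variant B. */
static void apply_fix_b(const float *subject, float *v1, int n)
{
    for (int id = 0; id < n; id++)
        subtract_in_place(subject, &v1[id]);     /* offset applied once */
}
```

Either variant applies the offset exactly once, so each id touches only its own element of v1.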
That would work if v0 and v1 were not both pointers to a linked list (the left side of linear algebra).
I tried to keep the program snippet simple. In practice, I am also similarly passing two pointers to a different linked list (the right side of the equation).
Quote:
Originally Posted by ntubski
No, that is not my solution. Sorry for being unclear, I included just the parts of your program that were relevant, to keep things short. That is, I left out the omp parts because you don't need to change them.
You have a bug in your program, the omp library is fine. It just happens that the bug is hidden when you are only using one thread. It is not even a threading bug, it's just about the arithmetic you are doing on the thread ID number.
My solution is to put 0 instead of id or index in one of the highlighted places (choose one of the places, not both).
Please explain to me why I have a bug in my sample program.
My sample program works correctly for id = 1, 3, 5. For id == 1, 3, 5 it reads from both the v0 array and the v1 array correctly.
As I said in my initial posting, this code has worked correctly for the past ten years, having been recompiled at least once for each distribution upgrade.
The only thing that has changed is the version. In that program, I am passing the addresses of two structures in one linked list of structures, and the addresses of two structures in a parallel linked list of structures. My (poor old) CPU will handle up to 8 simultaneous threads.
Just to see what would happen, I changed line 23 to read:
Code:
subroutine(&v0[id%2], &v1[id], id);
(switching back and forth between v0[0] and v0[1])
I only got one correct answer.
Quote:
Originally Posted by NevemTeve
You have a bug in your minimal sample program, so it is plausible that you have bugs in the actual application as well.
Quote:
I tried to keep the program snippet simple. In practice, I am also similarly passing two pointers to a different linked list (the right side of the equation).
Then you have made it too simple. Please try again using linked lists.
Quote:
Originally Posted by piobair
Please explain to me why I have a bug in my sample program.
Okay, here is your program without omp but still with the same bug:
Code:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
void subroutine(float *subject, float *object, int index){
    object[index] -= subject[0];
}
int main(){
    /* purposely making v1 extra long, so that the program has deterministic behaviour */
    float v0[2] = {13579, 24680}, v1[10]={1, 3, 5, 7, 9};
    for (int id=0; id < 5; id++) { //#pragma omp parallel num_threads(5) private (id)
        printf("line xx id=%d\n", id);
        subroutine(&v0[1], &v1[id], id);
        for(int i=0; i<10; i++) printf("v1[%d] = %f\n", i, v1[i]);
    }
    return 0;
}
Quote:
My sample program works correctly for id = 1, 3, 5. For id == 1, 3, 5 it reads from both the v0 array and the v1 array correctly.
Your threads are numbered 0 to 4. You don't have any thread with id = 5. And you should be able to infer from the above that only id = 0 is working correctly. The only part I'm not sure about is why Valgrind didn't catch your writes past the end of the v1 array.
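To spell out the arithmetic: in C, `(&v1[id])[index]` is the same element as `v1[id + index]`, and since the call passes `id` for both, each thread ends up touching `v1[2 * id]`. The sketch below only does that index calculation; the helper names are hypothetical and not part of the posted program.

```c
#include <stddef.h>

/* Element of the underlying array reached by (&base[id])[index]. */
static size_t effective_index(size_t id, size_t index)
{
    return id + index;
}

/* Nonzero if the write made by thread `id` (which passes id twice)
   stays inside an array of `length` elements. */
static int write_in_bounds(size_t id, size_t length)
{
    return effective_index(id, id) < length;
}

/* Number of threads 0..nthreads-1 whose write lands inside the array. */
static int count_in_bounds_writes(size_t nthreads, size_t length)
{
    int count = 0;
    for (size_t id = 0; id < nthreads; id++)
        if (write_in_bounds(id, length))
            count++;
    return count;
}
```

With 5 threads and a 5-element v1, ids 0, 1 and 2 land on v1[0], v1[2] and v1[4], while ids 3 and 4 land on indices 6 and 8, past the end of the array. That matches the posted output, where only v1[0], v1[2] and v1[4] changed.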
Quote:
As I said in my initial posting, this code has worked correctly for the past ten years, having been recompiled at least once for each distribution upgrade.
Perhaps you are thinking of some other post; you didn't actually say this in your initial posting.
Just a piece of advice. I don't know if your original code is similar to the one you show here.
In the code you show, all threads modify v1 at the same time. This may be logically OK, but in hardware terms, if several cores write to the same memory block you may run into a "false sharing" performance issue. This is because each core holds a copy of the block in a cache line, and when one core modifies its copy, the other cores stall until the caches are synchronized.
How bad this is depends on the number of cache levels and the number of processors/cores you are using.
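One common way to sidestep false sharing is to pad each per-thread slot so that no two slots fall in the same cache line. Below is a rough sketch of that idea, not a drop-in fix for the original program. The 64-byte line size is an assumption (typical for current x86-64 CPUs, but check your hardware), and `padded_float` and `subtract_from_each` are names invented for this example.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. Adjust for the target CPU. */
#define CACHE_LINE_BYTES 64

/* One value per slot, padded so consecutive values sit 64 bytes apart
   and therefore never share a 64-byte cache line. */
typedef struct {
    float value;
    char pad[CACHE_LINE_BYTES - sizeof(float)];
} padded_float;

/* Subtract `amount` from every slot. With -fopenmp the iterations are
   split across threads; without it the pragma is ignored and the loop
   simply runs sequentially. */
static void subtract_from_each(padded_float *slots, int n, float amount)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        slots[i].value -= amount;
}
```

The trade-off is memory: every float now occupies a whole cache line, so this only makes sense for small per-thread data that is written frequently.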
$ ./paralell_test
3,15,3 4,11,0 5,16,0 6,18,4
3,15,1 4,11,6
3,15,1 5,16,2 7,12,0 8,17,4
3,15,1 6,18,2 8,17,0 9,14,4
5,16,1 7,12,6
5,16,1 6,18,0 8,17,2 10,13,4
6,18,1 9,14,6
8,17,1 10,13,6
line 263 node0=3; beginning[0]=3
line 263 node0=3; beginning[1]=3
line 263 node0=3; beginning[2]=3
line 284 value = 3345.225586, -410.425629
line 286 nt=2
threads available = 4; threads used = 2
line 286 node0=15>3, 3345.225586; line=[0] = 3, -410.425629
line 286 node0=15>3, 3345.225586; line=[1] = 3, -2900.000000
line 14
line 18
line 19 line1 = 3
line 20 line[id] = 3; -2900.000000
line 20a value = -2900.000000
line 14
line 18
line 19 line1 = 3
line 20 line[id] = 3; 0.000000
line 20a value = 0.000000
line 263 node0=3; beginning[0]=5
line 284 value = 3345.225586, -410.425629
line 285
line 294 exiting
For gaussian forward elimination, nodes head[0], head[1] and head[2] would be freed from the matrix
line1 = end_of_line1->next; loop until (line1 == NULL);