LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   pthread segmentation fault 64-bit only (https://www.linuxquestions.org/questions/programming-9/pthread-segmentation-fault-64-bit-only-720977/)

rushmanw 04-22-2009 02:15 PM

pthread segmentation fault 64-bit only
 
I have been converting a statistical analysis utility in C from single-threaded design to multi-thread using POSIX threads (pthread). It ran fine on a 32-bit Fedora (Intel dual core), and gave a 30% improvement over the single-thread version. When I moved to a new quad-core, 64-bit machine (64-bit Fedora), it fails with a segmentation fault.

Other forums pointed out issues with pthread on 64-bit machines, and recommended a fix of adding -ldl to the compile line, but this did not help. It is compiled with -Wall with no warnings. I believe the code to be thread safe, and it runs fine on a 32-bit machine.

Why not use fork? This utility spawns 480,000 threads over its execution, and all threads read (never write) the same shared 1 GB dataset (initialized via file before entering the multithreaded tasks). pthread has the advantage of not making a copy of the heap space, so each thread can access the data.

Lastly, the fault occurs before any pthreads have been created, and all output (stdout and stderr) is fflush'd to pinpoint the exact spot where the fault occurs. It fails at the first declaration of an integer array of around 250,000 elements. If I drop the array to 200,000 elements in a sample function, it does not fault. At 300,000 it always fails. In this snippet:
bool testFunc(int numberThreads, GlobalParams *globals)
{
    printf("In testFunc A\n"); fflush(stdout); fflush(stderr);
    int a[300000];
    return true;
}
The printf produces no output, and it faults. If I change the array size to 200000, printf works and there is no fault.

By the way, the original single-threaded utility ran 17 days for a single run. I am hoping this i7 920 can do the job in a few days with 8 threads, but I am stuck.

ta0kira 04-22-2009 02:49 PM

You shouldn't put that much data on the stack. Because you're on a 64-bit machine, int is 8 bytes; twice the size of one on a 32-bit machine. I assume you're getting a stack overflow. Try making it static or using malloc. You also need to run a backtrace (e.g. in gdb) to find out the exact line where it happens (it might be somewhere else).
Kevin Barry

paulsm4 04-22-2009 03:22 PM

Hi -

1. I totally agree with ta0kira: stack overflow could easily be the root cause.

2. For whatever it's worth, I *strongly* recommend "fprintf (stderr, ...)" over "printf (); fflush (stdout)". In my experience, the latter simply doesn't work most of the time... (or, more precisely, "doesn't work the way you expect it when you most need it to"!)

IMHO .. PSM

rushmanw 04-22-2009 04:15 PM

Thanks, but I almost put a note saying "please don't tell me not to do this... I have to."
I have gotten away from programming lately, so maybe I have forgotten what the stack is. ;)

I have to spawn 480,000 threads, and I did not make it clear that there is only one thread active per CPU. So there are never more than 8 additional threads, or 9 counting the supervisor. Of course, I would never spawn 480,000 threads at once, but I failed to state that. Does that change the stack recommendation, or am I missing the point? The idea is to work the long list 8 at a time until complete, one thread per CPU, roughly. I did not use CPU affinity, BTW.

Also, I just added #ifdefs and macros so I can switch off multithreading while using the same functions for the work, and it runs fine that way. I have pthread_create reference a wrapper for the function that does the work, so I can call it either via pthread_create or directly (switchable by macro). The structures are complex, so I wanted to validate all the pointers and malloc'd memory before adding threads.

I will change all the printf. It is really better code to do it that way, and I was being sloppy. It allows far better output control, too. Thanks for the fast response!

dmail 04-22-2009 04:16 PM

Quote:

Originally Posted by ta0kira (Post 3517569)
You shouldn't put that much data on the stack. Because you're on a 64-bit machine, int is 8 bytes; twice the size of one on a 32-bit machine.

Doesn't Fedora use the LP64 model? I would think it does which means int is 4 bytes.

What does ulimit -S report?

rushmanw 04-22-2009 04:21 PM

dmail, int shows as 4, long as 8, yes. Either way, the stack would be a concern if I spawned 480,000 at once. Looking to see if only 8 threads at a time is still a concern.

paulsm4 04-22-2009 04:28 PM

Quote:

dmail, int shows as 4, long as 8, yes. Either way, the stack would be a concern if I spawned 480,000 at once. Looking to see if only 8 threads at a time is still a concern.
Declaring large objects as local variables is *always* a concern. Even if you can get away with it today ... you're still leaving a potential landmine that somebody's going to have to debug tomorrow.

My advice is "Just don't do it".

IMHO .. PSM

rushmanw 04-22-2009 04:41 PM

This is pretty much one-time use to create a datafile needed for another program, but I can convert the large arrays to pointers. I don't declare anything large inside the threads, just for managing them. I can start changing to pure pointers and malloc, and post back with whether it fixes the problem. Still not sure why it worked so well on the 32-bit OS. I could just give up and let it run on the 32, but waiting 17 days per iteration really slows down our progress.

rushmanw 04-22-2009 05:03 PM

(sorry for the confusion; I refreshed and created a duplicate question)
Paulsm4: I moved the large array declarations to global per your suggestion, and that fixed it, at least for my limited test runs. Now I need to inspect the rest of the code for places that may cause a similar problem later. Because of the design, it was much easier to move the two large arrays to global than to change to pointers/malloc. Getting sloppy in my old age. ;)

THANK YOU!

Paulsm4's reply to the dup thread, copied here for reference:
> One of the differences between Linux and other *nix is that Linux processes *are* Linux threads:
>
> http://www.linuxjournal.com/article/3814
>
> But that's academic.
>
> The reason you're crashing has little/nothing to do with threads, and everything to do with the fact that you've allocated 250,000-element arrays as local variables (off the stack).
>
> As I recommended in my other post: I believe that's unwise. I would strongly urge you to allocate large objects from the heap, not the stack.
>
> IMHO .. PSM

ta0kira 04-22-2009 11:45 PM

Quote:

Originally Posted by rushmanw (Post 3517649)
dmail, int shows as 4, long as 8, yes. Either way, the stack would be a concern if I spawned 480,000 at once. Looking to see if only 8 threads at a time is still a concern.

Each thread should have its own stack, but they're dynamically allocated with the thread. It might have been better to say you're overrunning a stack rather than the stack.
Kevin Barry

ta0kira 04-22-2009 11:49 PM

Quote:

Originally Posted by dmail (Post 3517647)
Doesn't Fedora use the LP64 model? I would think it does which means int is 4 bytes.

What does ulimit -S report?

Who expands the stack?
Kevin Barry

dmail 04-23-2009 03:39 AM

Quote:

Originally Posted by ta0kira (Post 3517951)
Who expands the stack?
Kevin Barry

That should have been a small 's'; I was not suggesting increasing the stack size, only querying it.

ta0kira 04-23-2009 02:02 PM

Quote:

Originally Posted by dmail (Post 3518098)
That should have been a small 's', I was not indicating to increase the stack size only to query it.

Sorry, that's not what I meant. I wasn't quite sure where stack expansion would take place were it to near capacity. I guess I should read up on it; it isn't something I ever really contemplated before.
Kevin Barry

