LinuxQuestions.org
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

09-25-2009, 01:59 PM   #1
Member88
LQ Newbie
 
Registered: Sep 2009
Posts: 2

Rep: Reputation: 0
Stack trace shows a function called itself when there is no recursion. How?


Hi,

I'm hoping that kernel engineers can help me with a puzzling issue I am encountering.

I have a multi-threaded program that is in a "hung" state. While debugging it, I found that one thread shows the same function twice in its call stack, for example in frames 9 and 10. My program is quite simple and does not involve any recursion. I believe this is what causes my program to hang, but why is it happening? Is it possible for the call stack of a thread to get corrupted somehow by other threads? Or by heap corruption, maybe? What else can cause this?

Greatly appreciate any comments or help. Thanks!

NewMember
 
09-25-2009, 02:31 PM   #2
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SuSE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Don't guess. Don't speculate.

You've got a debugger - use it. Just single-step through the code, and observe what happens!
 
09-25-2009, 03:01 PM   #3
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,083

Rep: Reputation: 1110
Quote:
Originally Posted by Member88
I'm hoping that kernel engineers can help me with a puzzling issue I am encountering.
Doesn't sound like there is any kernel aspect to the problem.

Quote:
shows a function twice in the call stack.
Lots of possibilities, such as:
Debugger misunderstands the call stack because the debugger isn't perfect.
Debugger misunderstands the call stack because something corrupted the stack.
The code executed something like the recursion you see as a result of some corrupted memory.

In all cases, there is likely some corrupted memory involved (in the actual failure and probably in the strange stack display).

Quote:
Is it possible for the call stack of a thread to get corrupted somehow by other threads?
Yes, but typically it is more likely to be corrupted by the same thread (the one actually using that stack).
You may have other reasons (such as the pattern of non-reproducibility of the failure) for deciding the bug is more likely cross-thread.

If you know assembly language, you can generally look at the asm instructions around the point of failure and those at the start of the function, look at the register values at the point of failure, and examine the stack yourself; from all that you can usually make a good estimate of what exactly was corrupted. Then you usually need to debug again from the start to try to catch the memory corruption in the act.

I don't really know how you chase such bugs if you don't know assembler.

In theory, run-time tools for catching writes beyond the end of arrays, and similar bugs, should catch a fair fraction of the original bugs that lead to such memory clobbers. In practice, my co-workers who use such tools to find these bugs usually fail to find them, and need me to apply assembler expertise and debugging experience to a more manual approach to solving failures like this.

An example of a cross-thread bug fitting this symptom: a function creates a new thread and passes it a pointer to one of its local objects, then returns, so the object's stack memory is no longer valid. If the new thread later writes through that pointer, it overwrites whatever the original thread has since placed in that part of its stack, and the original thread might crash with a messed-up stack.

I give that only as one example, since you seem unsure what sort of thing might corrupt a stack. There are many possibilities, and the overly specific example above is not offered as a likely theory.
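A minimal C sketch of that lifetime bug and one common way to avoid it, assuming POSIX threads. The names `struct job`, `worker`, `start_job`, and `run_job` are hypothetical and not from anyone's actual code. The buggy pattern appears only in a comment; the working code allocates the shared object on the heap and joins the thread before reading and freeing it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work item handed from a creating function to a new thread. */
struct job {
    int input;
    int result;
};

/* Thread body: writes through the pointer it was given. */
static void *worker(void *arg)
{
    struct job *j = arg;
    j->result = j->input * 2;
    return NULL;
}

/*
 * BUGGY pattern (do not use):
 *
 *   void start_job_buggy(pthread_t *t)
 *   {
 *       struct job j = { 21, 0 };             // lives in this stack frame
 *       pthread_create(t, NULL, worker, &j);
 *   }                                         // frame gone; worker may still write to j
 */

/* Safer: the object lives on the heap, so its lifetime does not depend on
 * the creating function's stack frame. */
static struct job *start_job(pthread_t *t, int input)
{
    struct job *j = malloc(sizeof *j);
    if (j == NULL)
        return NULL;
    j->input = input;
    j->result = 0;
    if (pthread_create(t, NULL, worker, j) != 0) {
        free(j);
        return NULL;
    }
    return j;
}

/* Starts the job, waits for it, and returns its result (-1 on failure). */
int run_job(int input)
{
    pthread_t t;
    struct job *j = start_job(&t, input);
    int result;

    if (j == NULL)
        return -1;
    pthread_join(t, NULL);   /* after this, worker no longer touches j */
    result = j->result;
    free(j);
    return result;
}
```

For example, `run_job(21)` should return 42. Joining before `free()` matters: freeing the heap object while the worker might still write to it would just move the same bug from the stack to the heap.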

Last edited by johnsfine; 09-25-2009 at 03:06 PM.
 
09-26-2009, 02:14 AM   #4
Member88
LQ Newbie
 
Registered: Sep 2009
Posts: 2

Original Poster
Rep: Reputation: 0
Thanks for the replies; they are very helpful. Let me elaborate on the problem, and please let me know what you think.

The problem I am presenting here was actually seen by someone else. He just gave me the stack traces of a problem that is very hard to reproduce: he has seen it only a few times, and it only shows up after his program has run for a very long time. So I have not had the chance to trace the problem with a debugger.

He did give me the traces of all threads during the "hung" state. One of the threads shows two functions called twice, without any recursion involved. The main thread shows that a signal handler interrupted a malloc() call, and the signal handler calls sem_wait(), which the main thread is now sleeping in. The interesting thing is that the function that shows up twice in the other thread is related to free() and is waiting on a _ll_mutex_lock (the name might be misspelled, because I don't have the trace details right now).

I know it is bad practice to call sem_wait() in a signal handler (I have emphasized this to the author of the code), and I really think the interrupted malloc() may be causing this puzzling stack trace. In one of the replies, I see that memory corruption could be a cause. Given the details of this case, what do you think may really be happening here? Can the interrupted malloc() call result in memory corruption and then stack corruption?

Thanks so much for the insight!

Member88
 
09-26-2009, 06:41 AM   #5
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,083

Rep: Reputation: 1110
Now you're making it sound like a simple deadlock. So I think you are correct that the problem is doing things in a signal handler that are not safe to do there.
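One hedged illustration of keeping a signal handler safe, assuming POSIX semaphores: the handler only calls `sem_post()`, which POSIX lists as async-signal-safe, and the blocking `sem_wait()` happens in ordinary code rather than inside the handler. The names `wakeup`, `on_signal`, `install_handler`, and `wait_for_signal` are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>

/* Hypothetical semaphore: posted by the handler, waited on in normal code. */
static sem_t wakeup;

/* The handler does one thing, and sem_post() is on the POSIX list of
 * async-signal-safe functions. No malloc(), free(), printf() or sem_wait(). */
static void on_signal(int signo)
{
    int saved_errno = errno;   /* don't clobber errno of the interrupted code */
    (void)signo;
    sem_post(&wakeup);
    errno = saved_errno;
}

/* Initializes the semaphore and installs the handler; 0 on success. */
int install_handler(int signo)
{
    struct sigaction sa;

    if (sem_init(&wakeup, 0, 0) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from ordinary code, never from the handler: blocking here is fine
 * because this code is not interrupting anything. */
int wait_for_signal(void)
{
    while (sem_wait(&wakeup) != 0) {
        if (errno != EINTR)    /* retry only if a signal interrupted the wait */
            return -1;
    }
    return 0;
}
```

For example, after `install_handler(SIGUSR1)`, a `raise(SIGUSR1)` should let a later `wait_for_signal()` return 0 immediately. This only shows the handler side of the pattern; it does not by itself make the rest of a program's signal handling safe.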

No memory corruption is necessary for the deadlock. For deadlock, you just need two resources, such as:

thread A owns resource X
thread B owns resource Y and is waiting for resource X
thread A gets a signal and the signal handler waits for resource Y
System hung.
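The four-step cycle above can be reproduced without actually hanging by using `pthread_mutex_trylock()`, which returns `EBUSY` instead of blocking. This is a sketch under stated assumptions: two ordinary mutexes stand in for resources X and Y, thread A's signal handler is replaced by plain code running in thread A, and all names (`res_x`, `res_y`, `thread_b`, `detect_cycle`, and so on) are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-ins for resources X and Y in the list above. */
static pthread_mutex_t res_x = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_y = PTHREAD_MUTEX_INITIALIZER;
static sem_t b_holds_y, a_checked;
static int b_blocked;   /* 1 if thread B found X already owned */

/* Tries to take m without blocking; returns 1 if someone else holds it. */
static int would_block(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    if (rc == 0)
        pthread_mutex_unlock(m);   /* got it after all: no wait needed */
    return rc == EBUSY;
}

static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&res_y);        /* B owns Y */
    b_blocked = would_block(&res_x);   /* ...and wants X */
    sem_post(&b_holds_y);
    sem_wait(&a_checked);              /* keep holding Y until A has looked */
    pthread_mutex_unlock(&res_y);
    return NULL;
}

/* Returns 1 if A and B each want a resource the other holds (a cycle),
 * 0 if not, -1 if the thread could not be created. */
int detect_cycle(void)
{
    pthread_t b;
    int a_blocked;

    sem_init(&b_holds_y, 0, 0);
    sem_init(&a_checked, 0, 0);
    pthread_mutex_lock(&res_x);        /* A owns X */
    if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
        pthread_mutex_unlock(&res_x);
        sem_destroy(&b_holds_y);
        sem_destroy(&a_checked);
        return -1;
    }
    sem_wait(&b_holds_y);
    a_blocked = would_block(&res_y);   /* stand-in for A's handler wanting Y */
    sem_post(&a_checked);
    pthread_join(b, NULL);
    pthread_mutex_unlock(&res_x);
    sem_destroy(&b_holds_y);
    sem_destroy(&a_checked);
    return a_blocked && b_blocked;
}
```

`detect_cycle()` should return 1, meaning each thread found the resource it wanted held by the other. With plain `pthread_mutex_lock()` in place of the two `trylock` calls, both threads would wait forever, which is the hang described.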

I don't know the details of how malloc interacts with multi-threading and signals, so I'm not certain, but from your description I would expect resource X in the deadlock to be something used internally by malloc to control multi-threaded access to its memory-management data structures.
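If the code that currently runs in the handler really needs to call things like `malloc()` or `sem_wait()`, one common alternative (not something from the original program) is to block the signal in every thread and let a dedicated thread receive it with `sigwait()`. That thread is not interrupting anyone, so it can take malloc's internal locks like any other thread. A minimal sketch, assuming POSIX threads; `signal_thread`, `start_signal_thread`, and `notify_self` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Dedicated thread: it receives the signal synchronously via sigwait(), so it
 * is not interrupting anyone and may call malloc(), sem_wait(), lock mutexes. */
static void *signal_thread(void *arg)
{
    const sigset_t *set = arg;
    int sig;
    int *got;

    if (sigwait(set, &sig) != 0)
        return NULL;
    got = malloc(sizeof *got);   /* fine here; NOT fine inside a handler */
    if (got != NULL)
        *got = sig;
    return got;                  /* caller frees it after pthread_join() */
}

/* Blocks signo in the calling thread (new threads inherit the mask), then
 * starts the dedicated thread. Call early, before other threads exist. */
int start_signal_thread(pthread_t *t, sigset_t *set, int signo)
{
    sigemptyset(set);
    sigaddset(set, signo);
    if (pthread_sigmask(SIG_BLOCK, set, NULL) != 0)
        return -1;
    return pthread_create(t, NULL, signal_thread, set);
}

/* Sends signo to the whole process (process-directed), so it is accepted by
 * whichever thread is waiting for it in sigwait(). */
int notify_self(int signo)
{
    return kill(getpid(), signo);
}
```

Calling `start_signal_thread()` early in `main()`, before creating other threads, lets them inherit the blocked mask. `kill()` on the process itself is process-directed, so the pending signal is picked up by the `sigwait()` thread; `raise()` would target only the calling thread, where the signal is blocked. Note that `set` must outlive the dedicated thread, the same kind of lifetime concern johnsfine described for local objects.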
 
All times are GMT -5.