Kernel Call Trace Order - Is it top to bottom OR vice-versa
Hello All,
I am facing a strange problem with a program I wrote: it goes into the zombie state. When I run "echo t > /proc/sysrq-trigger", I get the following in the "/var/log/messages" file.
Can you also help me find the cause of the problem? For now, I think it is caused by some hardware issue, because the same application runs fine on another hardware set with the same OS loaded on it. What kind of hardware issue could it be?
Your hardware issue is likely in the driver that implements "close". The driver may need a different implementation for different hardware. Sometimes /var/log/messages shows the EIP (instruction pointer) of the instruction that is causing the problem. You can decode it with gdb.
For example:
kernel: EIP is at d_instantiate+0x2d/0x56
[11:05am] /usr/src/linux-2.6.6-1.435.2.3.lair6smp
44 > gdb vmlinux
(gdb) info line *d_instantiate+0x2d
Line 66 of "list.h" starts at address 0xc0167d87
and ends at 0xc0167d8a <d_instantiate+48>.
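The "d_instantiate+0x2d/0x56" notation in the example above reads as symbol, then byte offset into the function, then total function size. A minimal Python sketch of splitting such an entry apart follows; the regular expression and the helper name parse_entry are only illustrative, not part of any kernel tool.

```python
import re

# Matches entries of the form symbol+0xOFFSET/0xSIZE, as printed in oops output.
# The pattern is an assumption about the usual layout, not an official grammar.
ENTRY = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)

def parse_entry(text):
    """Return (symbol, offset, size) for the first entry in text, or None."""
    m = ENTRY.search(text)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16)

symbol, offset, size = parse_entry("kernel: EIP is at d_instantiate+0x2d/0x56")
# The faulting instruction is `offset` bytes into a function `size` bytes long.
print(f"{symbol}: offset {offset} of {size} bytes")
```

Here the EIP is 45 bytes into d_instantiate, which is 86 bytes long. That matches the "*d_instantiate+0x2d" expression passed to gdb above.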
Only limited output is available, and the stack dump does not give a very clear picture. It shows two close() calls in one stack, which is impossible. The best guess is that it crashed during do_page_fault().
Translate "do_page_fault+0x2fd/0x4b4" and "sys_close+0x0/0x61" to the corresponding line numbers, then check whether a divide by zero could occur around those lines. do_page_fault() itself does not do any "divide" operation, so the divide is likely done in a function called by do_page_fault(), for example handle_mm_fault() (just a guess). close() can be implemented by either a filesystem or a network driver.
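The translation step can be scripted. The sketch below only builds a gdb command line that runs "info line" for each entry; it assumes a vmlinux built with debug information sits in the current directory, and the helper name gdb_command is made up for this example.

```python
import shlex

def gdb_command(entries, image="vmlinux"):
    """Build a gdb argument list that runs 'info line' for each trace entry."""
    args = ["gdb", "-batch"]
    for entry in entries:
        # Drop the trailing "/size" part; gdb only needs symbol+offset.
        location = entry.split("/")[0]
        args += ["-ex", f"info line *{location}"]
    args.append(image)  # assumed: an unstripped vmlinux matching the running kernel
    return args

cmd = gdb_command(["do_page_fault+0x2fd/0x4b4", "sys_close+0x0/0x61"])
# Print a shell-quoted form that can be pasted into a terminal.
print(" ".join(shlex.quote(a) for a in cmd))
```

The printed command can be run from the kernel source directory, the same place where "gdb vmlinux" was started in the earlier example.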
You can also add assert() or panic() calls around the possible candidates to narrow down the problem. That should tell you the exact line number where the unexpected happens.
I tried my application on Red Hat ES. The issue seems to be resolved, since the application and the OS have been running fine for 10 days.
BTW, I could not debug the FC2 kernel in my case, because I could not locate the "vmlinux" file for it. I guess I need to compile the kernel [from the provided source] for the file (vmlinux) to be generated.
So it looks like a hardware compatibility issue with FC2, which got resolved with RH-ES4.
One more thing I want to ask: why do different distributions (like Fedora, Debian) modify the standard kernel? Also, why don't they clearly specify the changes they have made?