Kernel Call Trace Order - Is it top to bottom OR vice-versa
Can you also help me find the cause of the problem? As of now, I think it is caused by a hardware issue, because the same application runs fine on another hardware set with the same OS loaded. What kind of hardware issue could it be?
The problem is most likely in the driver that implements close(). A driver may need a different implementation for different hardware. Sometimes /var/log/messages shows the instruction pointer (EIP) at which the fault occurred, and you can map it to a source line with gdb:
kernel: EIP is at d_instantiate+0x2d/0x56
44 > gdb vmlinux
(gdb) info line *d_instantiate+0x2d
Line 66 of "list.h" starts at address 0xc0167d87
and ends at 0xc0167d8a <d_instantiate+48>.
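To make the notation in the EIP line explicit: an entry such as d_instantiate+0x2d/0x56 means the instruction is at offset 0x2d inside d_instantiate, and the function is 0x56 bytes long. Below is a minimal Python sketch that splits such an entry and prints the matching gdb command. The helper name parse_entry is hypothetical, and the regular expression assumes the log line looks like the one quoted above.

```python
import re

# Matches kernel log entries of the form symbol+0xOFFSET/0xSIZE.
ENTRY = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)

def parse_entry(text):
    """Return (symbol, offset, size) for the first entry in text, or None.

    parse_entry is a hypothetical helper written for this illustration.
    """
    match = ENTRY.search(text)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )

line = "kernel: EIP is at d_instantiate+0x2d/0x56"
symbol, offset, size = parse_entry(line)

# The offset should always fall inside the function body.
assert offset < size

# Command to type at the (gdb) prompt after running "gdb vmlinux".
print(f"info line *{symbol}+{offset:#x}")
```

The size part is not needed by gdb; it is only useful as a sanity check that the offset really lies inside the named function.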
Only limited output is available, and the stack dump does not give a very clear picture. It shows two close() calls in one stack, which is impossible. The best guess is that the crash occurred during do_page_fault().
Translate "do_page_fault+0x2fd/0x4b4" and "sys_close+0x0/0x61" to the corresponding line numbers, then check whether a divide by zero is possible near those lines. do_page_fault() itself does not perform any divide operation, so the divide is likely done in a function called by do_page_fault(), for example handle_mm_fault() (just a guess). close() can be implemented by either a filesystem or a network driver.
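The translation step can be scripted so that several trace entries are looked up in one gdb run. The following is only a sketch: the function name gdb_batch_command is hypothetical, the path "vmlinux" is assumed to point at an uncompressed kernel image built with debug information, and the script only prints the command rather than running it. The gdb options -batch and -ex are standard.

```python
import re
import shlex

# Full trace entry: symbol+0xOFFSET/0xSIZE (only symbol and offset are kept).
ENTRY = re.compile(r"([A-Za-z_][\w.]*)\+(0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+")

def gdb_batch_command(entries, vmlinux="vmlinux"):
    """Build one gdb command line that runs "info line" for every entry.

    gdb_batch_command is a hypothetical helper; the vmlinux path is an
    assumption and must be adjusted to the local kernel build.
    """
    args = ["gdb", "-batch"]
    for entry in entries:
        match = ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"not a symbol+offset/size entry: {entry!r}")
        symbol, offset = match.groups()
        args += ["-ex", f"info line *{symbol}+{offset}"]
    args.append(vmlinux)
    # shlex.join quotes each argument so the result can be pasted into a shell.
    return shlex.join(args)

command = gdb_batch_command(["do_page_fault+0x2fd/0x4b4", "sys_close+0x0/0x61"])
print(command)
```

Pasting the printed command into a shell in the directory that holds vmlinux should report the source file and line for each entry, in the same format as the d_instantiate output above.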
You can also add assert() or panic() calls around the likely candidates to narrow down the problem. That should tell you the exact line number where the unexpected happens.
I tried my application on Red Hat ES. The issue seems to be resolved: the application and the OS have been running fine for 10 days.
BTW, I could not debug the FC2 kernel in my case because I could not locate its "vmlinux" file. I guess I need to compile the kernel from the provided source for vmlinux to be generated.
So it looks like a hardware compatibility issue with FC2, which was resolved by moving to RH-ES4.
One more thing I want to ask: why do different distributions (like Fedora and Debian) modify the standard kernel? And why don't they clearly specify the changes they have made?