LinuxQuestions.org

maradnus 03-12-2009 01:57 AM

How to track the bug?
 
Dear friends,

I've deployed a couple of server programs using RPC on a Fedora 6 server.
These server programs are accessed by more than 40 simultaneous users. The server reboots now and then, and I do not know what the problem is. If the bug is in the server programs, I don't know how to track it down.

Kindly help in this regard.


Thanks in advance.


Sundaram M

i92guboj 03-12-2009 02:39 AM

You are going to have to provide more info.

By "rebooted", do you mean a hard reboot at a random time, without the OS even shutting down properly?

Random hard reboots are almost always a sign of faulty hardware.

m_w_sundaram 03-12-2009 04:41 AM

Thanks for responding to the query.

Yes, I think you are correct. It is a hard reboot at a random time, without the OS even shutting down properly. How can I identify the problem?

I often get messages from syslogd on the virtual consoles saying something like "avc=...permission denied ... pid=... "postmaster"...device=sda2..."


Could you suggest how to fix the problem?


Thanks

i92guboj 03-12-2009 04:54 AM

Well, frequent candidates for random reboots are bad memory sticks or an overheating CPU. A good place to start is to boot a live CD with memtest86, or to try replacing the RAM sticks if you have some known-good ones available.

You can also check your CPU temperature with any monitoring program. mbmon -A should report it on the command line, and there are plenty of graphical monitors as well. Monitor it *while* the machine is doing heavy work (try compiling the kernel or launching a modern game) and see whether the temperature rises too much.
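
If no monitoring tool is at hand, a small script can log the temperature to a file, so that the last line written before a reboot shows how hot the CPU was. This is only a rough sketch: it assumes the kernel exposes a sysfs-style sensor file containing an integer in millidegrees Celsius (for example /sys/class/thermal/thermal_zone0/temp on newer kernels; older systems may expose the reading somewhere else or not at all), and the function names are made up for illustration.

```python
import os
import time


def read_celsius(sensor_path):
    """Return the temperature in degrees Celsius.

    Assumption: the file holds a single integer in millidegrees,
    e.g. "48000" for 48.0 degrees.
    """
    with open(sensor_path) as f:
        return int(f.read().strip()) / 1000.0


def log_temperature(sensor_path, log_path, interval=5.0, samples=10):
    """Append `samples` timestamped readings to log_path, one per interval."""
    with open(log_path, "a") as log:
        for i in range(samples):
            temp = read_celsius(sensor_path)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s %.1f\n" % (stamp, temp))
            # Flush and fsync each line so the latest reading has the best
            # chance of surviving a sudden hard reboot.
            log.flush()
            os.fsync(log.fileno())
            if i + 1 < samples:
                time.sleep(interval)
```

Start it before the heavy workload, e.g. log_temperature("/sys/class/thermal/thermal_zone0/temp", "/var/tmp/cputemp.log", interval=5, samples=720) for an hour of readings, then look at the tail of the log after the next reboot.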

I or someone else here could take a look at your dmesg output just in case, so if you can, post it to pastebin.com and put a link here so we can review it.

T74marcell 03-12-2009 06:04 AM

The last time I ran into similar constant rebooting, it was caused by a new Kingston RAM module in a completely new PC. Kingston has a good reputation, but its modules can obviously still fail occasionally.

The reboots can also be caused by a faulty motherboard (which would be much more severe) or, as already mentioned, by overheating. Check the RAM modules first, because that is fairly easy to do and verify.

Arch Linux

sundialsvcs 03-12-2009 07:55 AM

I agree: hardware.

Start replacing things. Heck, replace the whole box. Check the power circuit, too.

Probably a good idea to hire a specialist to help you.

maradnus 03-13-2009 05:57 AM

Thanks for the replies.


Here is the dmesg output, posted on pastebin.com:

http://pastebin.com/m5907a50c

