LinuxQuestions.org

bendani 06-08-2007 04:10 PM

Fatal exception: panic in 5 seconds
 
I have CentOS 4 (I think it's 4.3, but I'm not sure).
My server crashes every now and then, and it isn't for the same reason every time.
I really don't know what to do anymore.

The machine has a Pentium D 3.0 GHz
with 2 GB of DDR RAM.


Here are some logs:

Jun 6 17:43:37 host kernel: Unable to handle kernel paging request at virtual address 04000000
Jun 6 17:43:37 host kernel: printing eip:
Jun 6 17:43:37 host kernel: c0140804
Jun 6 17:43:37 host kernel: *pde = 37252001
Jun 6 17:43:37 host kernel: Oops: 0000 [#1]
Jun 6 17:43:37 host kernel: SMP
Jun 6 17:43:37 host kernel: Modules linked in: ipt_state ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ip_conntr$
Jun 6 17:43:37 host kernel: CPU: 0
Jun 6 17:43:37 host kernel: EIP: 0060:[<c0140804>] Not tainted VLI
Jun 6 17:43:37 host kernel: EFLAGS: 00010006 (2.6.9-42.0.10.ELsmp)
Jun 6 17:43:37 host kernel: EIP is at find_get_page+0x20/0x3b
Jun 6 17:43:37 host kernel: eax: 04000000 ebx: 04000000 ecx: 00004d00 edx: 04000000
Jun 6 17:43:37 host kernel: esi: dc87b3d0 edi: 00000000 ebp: 00004d90 esp: f18c8e0c
Jun 6 17:43:37 host kernel: ds: 007b es: 007b ss: 0068
Jun 6 17:43:37 host kernel: Process httpd (pid: 20040, threadinfo=f18c8000 task=e15bc3b0)
Jun 6 17:43:37 host kernel: Stack: 00000000 dc87b320 c0140bde 00000246 00001000 00000000 0ffc5e52 00000000
Jun 6 17:43:37 host kernel: 00004d91 00004d92 0000ffc5 dc87b320 ccabee80 ccabeed0 dc87b3d0 00004d78
Jun 6 17:43:37 host kernel: 00000020 00000020 00004d90 00004d98 00000020 00000041 00000010 00000020
Jun 6 17:43:37 host kernel: Call Trace:
Jun 6 17:43:37 host kernel: [<c0140bde>] do_generic_mapping_read+0x149/0x445
Jun 6 17:43:37 host kernel: [<c0141142>] __generic_file_aio_read+0x19f/0x1bd
Jun 6 17:43:37 host kernel: [<c0140eda>] file_read_actor+0x0/0xc9
Jun 6 17:43:37 host kernel: [<c01411a0>] generic_file_aio_read+0x40/0x47
Jun 6 17:43:37 host kernel: [<c015ad71>] do_sync_read+0x97/0xc9
Jun 6 17:43:37 host kernel: [<c0107ab4>] do_IRQ+0x1a2/0x1ae
Jun 6 17:43:37 host kernel: [<c01204f5>] autoremove_wake_function+0x0/0x2d
Jun 6 17:43:37 host kernel: [<c02d2837>] schedule+0x8af/0x8db
Jun 6 17:43:37 host kernel: [<c015ae59>] vfs_read+0xb6/0xe2
Jun 6 17:43:37 host kernel: [<c015b06c>] sys_read+0x3c/0x62
Jun 6 17:43:37 host kernel: [<c02d4903>] syscall_call+0x7/0xb
Jun 6 17:43:37 host kernel: [<c02d007b>] packet_rcv+0x46/0x307
Jun 6 17:43:37 host kernel: Code: e8 c7 fc fd ff 83 c4 40 5b 5e c3 56 89 c6 8d 40 10 53 89 d3 e8 6d 2d 19 00 89 da 8d 46 04$
Jun 6 17:43:37 host kernel: <0>Fatal exception: panic in 5 seconds

rtspitz 06-08-2007 04:50 PM

You could run memtest to check the RAM.

bendani 06-08-2007 04:53 PM

Quote:

Originally Posted by rtspitz
You could run memtest to check the RAM.

How can I do this without physical access to the server?
I only have SSH access as root.

rtspitz 06-08-2007 06:53 PM

Quote:

Originally Posted by bendani
How can I do this without physical access to the server?
I only have SSH access as root.

that is a nasty one.

So I guess it's some sort of hosted server (maybe virtualized)?
In that case I'd recommend talking to the provider's tech support people.

I'm not an expert at all, but the log hints at memory issues ("Unable to handle kernel paging request..."). Maybe the Apache process (httpd) mentioned further down in the log causes the system to run out of memory.
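
For reference, the "Oops: 0000" line in the log can be decoded. On 32-bit x86 that number is the page-fault error code, and its low three bits say whether the page was missing or protected, whether the access was a read or a write, and whether the CPU was in kernel or user mode. The Python sketch below decodes only those three bits. The function names are made up for illustration. It also checks whether the faulting address 04000000 has exactly one bit set, which can be a hint of a flipped bit in a pointer but proves nothing on its own.

```python
import re


def decode_oops_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code."""
    code = int(code_hex, 16)
    return {
        "cause": "protection fault" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


def is_single_bit(value_hex):
    """True if the value has exactly one bit set (heuristic hint only)."""
    value = int(value_hex, 16)
    return value != 0 and value & (value - 1) == 0


# Line copied from the log in the first post.
line = "Jun 6 17:43:37 host kernel: Oops: 0000 [#1]"
match = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
print(decode_oops_code(match.group(1)))
# {'cause': 'page not present', 'access': 'read', 'mode': 'kernel'}

# Faulting address from the "Unable to handle kernel paging request" line.
print(is_single_bit("04000000"))  # True
```

For this crash that means a kernel-mode read of a page that isn't mapped. It describes the failed access, not why the pointer was bad, so by itself it doesn't tell bad RAM apart from a kernel bug.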

You could have a little bash script log the memory usage to a file and then look for correlations with the crash times.
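
A minimal sketch of such a memory logger, written in Python rather than bash and assuming Python 3 is available on the box. It reads /proc/meminfo, which is a real kernel interface. The function names, the choice of fields, and the log path in the comment are only illustrative assumptions.

```python
import os
import time

# Fields to record; all of them are standard /proc/meminfo entries.
FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapTotal", "SwapFree")


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: number (kB for memory fields)}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def format_sample(info, now=None):
    """Format one log line: local timestamp plus the selected fields (-1 if missing)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    values = " ".join("%s=%d" % (key, info.get(key, -1)) for key in FIELDS)
    return "%s %s" % (stamp, values)


def log_memory(logfile, interval=60, samples=None):
    """Append a sample every `interval` seconds; runs forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        with open("/proc/meminfo") as src:
            info = parse_meminfo(src.read())
        with open(logfile, "a") as out:
            out.write(format_sample(info) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk so it survives a crash
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)


# Example with a hypothetical log path, commented out because it loops forever:
# log_memory("/root/memwatch.log", interval=60)
```

The fsync call is there because the whole point is to still have the last samples after the machine goes down. Without it, the most recent lines could be sitting in the page cache when the kernel panics.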

bendani 06-08-2007 07:05 PM

It's not a VPS.
It's a dedicated server that I'm renting.
The server was fine until a power failure a few months ago.
The power failure ruined the server and it had to be reinstalled.
After it was reinstalled,
everything was fine,
and then the crashes began.
Sometimes the server runs for 30 days and then crashes,
and sometimes it crashes every day for a few days.

I have swap, so a full memory shouldn't be a problem.

But if you say that replacing the memory will fix it, I can do that.

rtspitz 06-08-2007 09:56 PM

Well, that was just a guess.

But if you plan to have them change the memory modules, I guess having them boot the server from a memtest CD is less effort for them and cheaper. Let it run for a day and have them report any errors that come up.

bendani 06-09-2007 05:27 AM

Quote:

Originally Posted by rtspitz
Well, that was just a guess.

But if you plan to have them change the memory modules, I guess having them boot the server from a memtest CD is less effort for them and cheaper. Let it run for a day and have them report any errors that come up.

memtest can take 20 hours.
Do you know what 20 hours of downtime can do to my company?
I could lose all my clients.

Please, guys, is there any proper solution?

rtspitz 06-09-2007 10:40 AM

Didn't know you were running a business on that machine!

In that case I would move your setup over to a completely new machine.
That will be more expensive, but you'll get a warranty on the hardware and won't have to guess what might be wrong, which is likely to drive you crazy.

BTW, what kind of power failure was that? Data centers usually have backup power, don't they?

bendani 06-09-2007 02:08 PM

Quote:

Originally Posted by rtspitz
Didn't know you were running a business on that machine!

In that case I would move your setup over to a completely new machine.
That will be more expensive, but you'll get a warranty on the hardware and won't have to guess what might be wrong, which is likely to drive you crazy.

BTW, what kind of power failure was that? Data centers usually have backup power, don't they?

Usually... in my case it didn't work.

Anyway, thanks.
Is there anything else I can check before following your suggestion?

rtspitz 06-09-2007 05:56 PM

I can't think of any specific thing (except memtest) that would tell you what exactly is wrong with that machine.

You could run a CPU or disk torture test (mprime / bonnie) to provoke a crash, but that would only tell us whether the machine can run under heavy load. Maybe it's bad memory, a bad power supply, or just a bad CPU fan... Speaking of which, you could try the "lm_sensors" package to read out CPU/mainboard/fan parameters such as temperatures and fan speeds (http://www.lm-sensors.org/).

On our department server I run a program called "munin" (http://munin.projects.linpro.no/). At least on openSUSE / SUSE Linux Enterprise Server 10 it's really easy to install. It's basically server monitoring software that tracks things like memory/CPU/disk/network usage and hard disk temperature, and creates a web page with nice graphs (requires rrdtool). That could give you some data if something odd happens close to a crash. Data is collected every 5 minutes.
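
To line up monitoring data with the crashes, the crash times first have to be pulled out of the kernel log. The Python sketch below does that for log lines in the format posted at the top of this thread. It starts a new event only on "Unable to handle kernel ..." lines, which is the one crash type shown here, so other crash messages would need extra patterns. The function name and the regular expressions are assumptions for illustration, not part of any existing tool.

```python
import re

# Matches "Jun 6 17:43:37 host kernel: message", as in the log from the first post.
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(.*)$")
EIP_RE = re.compile(r"EIP is at ([^\s+]+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(lines):
    """Return one dict per oops: time, fault message, function, process, pid."""
    events = []
    current = None
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip("\n"))
        if not m:
            continue
        stamp, msg = m.groups()
        if msg.startswith("Unable to handle kernel"):
            # Assumption: every crash of interest begins with this message.
            current = {"time": stamp, "fault": msg,
                       "function": None, "process": None, "pid": None}
            events.append(current)
        elif current is not None:
            eip = EIP_RE.match(msg)
            proc = PROC_RE.match(msg)
            if eip and current["function"] is None:
                current["function"] = eip.group(1)
            elif proc and current["process"] is None:
                current["process"] = proc.group(1)
                current["pid"] = int(proc.group(2))
    return events


# Three lines copied from the log in the first post.
sample = [
    "Jun 6 17:43:37 host kernel: Unable to handle kernel paging request at virtual address 04000000",
    "Jun 6 17:43:37 host kernel: EIP is at find_get_page+0x20/0x3b",
    "Jun 6 17:43:37 host kernel: Process httpd (pid: 20040, threadinfo=f18c8000 task=e15bc3b0)",
]
for event in summarize_oopses(sample):
    print(event["time"], event["function"], event["process"], event["pid"])
# Jun 6 17:43:37 find_get_page httpd 20040
```

Run over the full log file, this gives a short list of crash times, faulting functions and processes. That makes it easier to see whether the crashes really differ each time and which graph intervals to look at.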


All times are GMT -5.