ext3 related kernel panic (keeps occurring)

cygnus-x1 · 10-11-2007, 10:16 AM

We are running CentOS 4.4 (64 bit), kernel 2.6.9-34.ELsmp. There are 2 RAID arrays on the box: a 72 GB array holding the root partition ( / ) at RAID level 1, and a 1.8 terabyte partition at RAID level 5.

The SCSI and RAID controllers in the box are:

SCSI
03:01.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
Subsystem: Adaptec 29160 Ultra160 SCSI Controller

RAID
03:02.0 RAID bus controller: Adaptec AAC-RAID (Rocket) (rev 02)
Subsystem: Adaptec ASR-2820SA

Approximately every 2-3 weeks we get a kernel panic. Originally we were having drive issues, but we have since swapped in newer drives, and according to the RAID controller the arrays are now optimal. The kernel panic keeps occurring anyway.

The box does a lot of I/O as it is a production web backup machine.

I have not yet had the chance to set up a netdump server, so I don't have the whole console output, but I do have a screenshot of the final lines. It appears to be a syncing issue within ext3. BTW, LVM is in use, if that is important.

Here is a link to the console image.

Any help is GREATLY appreciated, even if it is just directing me to a different place to ask the question.

macemoneta · 10-11-2007, 04:52 PM

From what I can see, it looks like a paging error.

1- Are your swap areas properly initialized?
2- Did you run out of swap?

If you get a complete panic output, you should submit it to your distribution vendor's bugzilla.

cygnus-x1 · 10-11-2007, 09:21 PM

Quote:

Originally Posted by macemoneta

From what I can see, it looks like a paging error.

1- Are your swap areas properly initialized?

Not quite sure I know how to tell. I let CentOS handle that configuration when I installed it. It has 2 gigs of swap space. The swap space is part of LVM however. Not sure if that is a good idea or not.

Quote:

Originally Posted by macemoneta

2- Did you run out of swap?

How would I know that? Is there something I can use to monitor swap usage? When I run swapon -s I see that only 160 is in use, but the machine is running fine right now. The machine locks up hard when it panics and you have to power cycle it, so postmortem info is hard to come by. I am really not familiar with what to do in this case anyway.

Quote:

Originally Posted by macemoneta

If you get a complete panic output, you should submit it to your distribution vendor's bugzilla.

I will try to get a netdump server configured and a hole punched through the firewall so I can get the full console output when it crashes.

thanks

macemoneta · 10-11-2007, 09:50 PM

Quote:

Originally Posted by cygnus-x1

Not quite sure I know how to tell. I let CentOS handle that configuration when I installed it. It has 2 gigs of swap space. The swap space is part of LVM however. Not sure if that is a good idea or not.

If there was an interruption during the install and you restarted, the installer may think the swap is already initialized and not complete that step if you re-run the install. In that case, you need to manually re-initialize the swap space:

Code:

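# Replace swapdevice with the path of your swap device.
# mkswap overwrites whatever is on the device, so double-check the path.
swapoff -a
mkswap swapdevice
swapon -a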
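If you would rather check first whether the device already carries a swap signature, a read-only test along these lines should do. This is only a sketch: it assumes the usual "SWAPSPACE2" magic that mkswap writes in the last 10 bytes of the first page, and the function name and the device path in the example are made up for illustration.

```shell
#!/bin/sh
# has_swap_signature FILE_OR_DEVICE   (hypothetical helper name)
# Returns 0 if the last 10 bytes of the first page hold the "SWAPSPACE2"
# magic written by mkswap, non-zero otherwise. It only reads the device,
# but reading a real block device normally requires root.
has_swap_signature() {
    pagesize=$(getconf PAGESIZE)
    offset=$((pagesize - 10))
    # tr strips NUL bytes so an uninitialized (zeroed) page compares cleanly
    magic=$(dd if="$1" bs=1 skip="$offset" count=10 2>/dev/null | tr -d '\0')
    [ "$magic" = "SWAPSPACE2" ]
}

# Example (assumed LVM path; substitute your own swap device):
#   has_swap_signature /dev/VolGroup00/LogVol01 && echo "swap signature present"
```

If the signature is missing, the mkswap step above is what writes it.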

Quote:

How would I know that? Is there something I can use to monitor swap usage? When I run swapon -s I see that only 160 is in use, but the machine is running fine right now. The machine locks up hard when it panics and you have to power cycle it, so postmortem info is hard to come by. I am really not familiar with what to do in this case anyway.

There are many ways to monitor swap. The simplest is to just open a terminal and run:

Code:

while true; do echo "$(date) $(swapon -s | awk 'NR > 1 {print $4}')"; sleep 15; done
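Since the box locks up hard, you will probably want the readings in a file rather than on a terminal. A rough sketch of that, assuming /proc/swaps has its usual layout (header line, then Filename, Type, Size, Used, Priority, with sizes in KiB); the function names and the log path are only examples:

```shell
#!/bin/sh
# swap_used_kb [FILE]: sum the "Used" column (4th field) of /proc/swaps-style
# input, skipping the header line. Defaults to the live /proc/swaps.
swap_used_kb() {
    awk 'NR > 1 { used += $4 } END { print used + 0 }' "${1:-/proc/swaps}"
}

# log_swap LOGFILE [INTERVAL]: append one timestamped reading to LOGFILE
# every INTERVAL seconds (default 15). Runs until killed.
log_swap() {
    while true; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') swap_used_kb=$(swap_used_kb)" >> "$1"
        sleep "${2:-15}"
    done
}

# Example (log path is an assumption, pick your own):
#   log_swap /var/log/swap-usage.log 15 &
```

After the next crash, the last few lines of the log show how much swap was in use shortly before the machine went down.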

Quote:

I will try to get a netdump server configured and a hole punched through the firewall so I can get the full console output when it crashes.

You can also force the system to reboot cleanly. Some one-time preparation is required. Add to /etc/sysctl.conf:

Code:

kernel.sysrq = 1

and also run (as root):

Code:

echo "1" > /proc/sys/kernel/sysrq

This will enable the magic sysrq function. The next time the system panics, press and hold Alt and Sys Req (usually the same key as PrtSc). While holding those down, press the following keys in this sequence, with about a 5 second delay between keys: reisub

If the magic sysrq function was able to get control, at the end of that sequence the system will reboot cleanly. With any luck, you will have your panic in /var/log/messages, or at least some more data to go on.
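To confirm ahead of time that the setting took effect, you can read the value back. The helper below is just an illustrative sketch (the name is made up); it treats 0 as disabled and 1 as fully enabled, and assumes that any other value is a bitmask of allowed functions, which only applies on kernels that support that.

```shell
#!/bin/sh
# What each key in the sequence asks the kernel to do:
#   r  take the keyboard out of raw mode
#   e  send SIGTERM to all processes except init
#   i  send SIGKILL to all processes except init
#   s  sync all mounted filesystems
#   u  remount all filesystems read-only
#   b  reboot immediately

# sysrq_status [VALUE]: describe a kernel.sysrq value.
# With no argument it reads the live setting from /proc.
sysrq_status() {
    value=${1:-$(cat /proc/sys/kernel/sysrq)}
    case "$value" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $value (only some functions enabled)" ;;
    esac
}

# Example:
#   sysrq_status
```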

cygnus-x1 · 10-11-2007, 10:09 PM

Thank you very much. I had never heard of that magic sequence before; I will give it a try. I reinitialized swap. I will set up a monitoring script and record the info somewhere so that the next time it crashes I can see whether swap was full or not.

thanks again o wizard!

Doug

syg00 · 10-11-2007, 10:33 PM

Mmmm - if swap filling was the issue you'd probably hear from the oom_killer. I suspect you have other problems.
CentOS should have sysstat installed - see if you have that running; sar will give you all the info you'll need, and a good history. Running vmstat in the background is another option in addition to the commands above.
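As a rough illustration of both options (a sketch, not a tested recipe for your box): vmstat output can be timestamped and appended to a file so it is still there after a reboot, and sar can read back earlier days from its data files. The helper name and log path are made up, and the sar example assumes the default sysstat cron job is collecting data under /var/log/sa.

```shell
#!/bin/sh
# stamp_lines: prefix every line read from stdin with the current date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Keep a timestamped vmstat sample every 15 seconds (log path is an assumption;
# if vmstat buffers its output when piped, entries may show up late):
#   vmstat 15 | stamp_lines >> /var/log/vmstat.log &
#
# Look back at memory and swap utilisation recorded on the 11th of the month:
#   sar -r -f /var/log/sa/sa11
```

Whatever made it to disk before the lockup is what you get to look at afterwards, which is the point of logging rather than watching a terminal.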

salasi · 10-12-2007, 02:57 AM

Quote:

We are running CentOS 4.4 (64 bit) kernel 2.6.9-34.ELsmp.

Back then there was a bug in SATA support that only affected 64 bit versions. I don't know that this bug affected anything other than SATA (and have no idea whether it affected LVM and/or SCSI), but I'd do a bit of research on that and see if CentOS offers any newer/patched kernels and/or bug reports just in case it had wider ramifications.

From memory, the bug was fixed around 2.6.10, and it wouldn't let you install 64 bit/SATA at all, so may not be relevant in any way to your circumstance, but I'd still want to spend a few minutes checking into it.
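A quick way to see what you are actually running versus what is installed is something like the following. It is only a sketch: the package names kernel and kernel-smp are what I would expect on CentOS 4, so treat them as assumptions.

```shell
#!/bin/sh
# Kernel that is currently booted
running=$(uname -r)
echo "running kernel: $running"

# Installed kernel packages (guarded so the snippet is harmless on non-RPM systems;
# rpm -q exits non-zero when a package is not installed, hence the "|| true")
if command -v rpm >/dev/null 2>&1; then
    rpm -q kernel kernel-smp || true
fi

# Pending kernel updates can then be listed with yum, for example:
#   yum list updates 'kernel*'
```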

cygnus-x1 · 10-12-2007, 07:19 AM

Quote:

Originally Posted by salasi

Back then there was a bug in SATA support that only affected 64 bit versions. I don't know that this bug affected anything other than SATA (and have no idea whether it affected LVM and/or SCSI), but I'd do a bit of research on that and see if CentOS offers any newer/patched kernels and/or bug reports just in case it had wider ramifications.

From memory, the bug was fixed around 2.6.10, and it wouldn't let you install 64 bit/SATA at all, so may not be relevant in any way to your circumstance, but I'd still want to spend a few minutes checking into it.

Thank you for that tidbit. I will take the time to investigate. One of our options was to wipe and reinstall CentOS 5.0. I would rather just upgrade if that path is available to me.

cygnus-x1 · 10-12-2007, 01:07 PM

Quote:

Originally Posted by syg00

Mmmm - if swap filling was the issue you'd probably hear from the oom_killer. I suspect you have other problems.
CentOS should have sysstat installed - see if you have that running; sar will give you all the info you'll need, and a good history. Running vmstat in the background is another option in addition to the commands above.

I have never used any of these commands before. I don't seem to have systat, but I do have sar and vmstat. What would sar tell me? Can I look back in time with it? vmstat seems to give up-to-the-minute info; if the kernel panics, how would it help then?

thanks

Doug