LinuxQuestions.org - 2.4.17 crashing; how do you set up logging and debug?


I have an RH 7.1 install. I updated the kernel to the 2.4.17 source and compiled my own. I also updated a good number of the packages to either the updated 7.1 RPMs or the new rawhide RPMs. (The kernel came from rawhide.) I'm using gcc 2.96.98 to compile the kernel.

The box this is on is an AMD 5x86 (a 486-class board with PCI) running as a network server for my Win98 box. The machine runs xinetd for telnet, samba, and dhcpd. The only non-Red Hat thing on it is the NIC driver. The built-in driver for the 3Com 900B in the stock RH 7.1 kernel was crap, so I now use the driver provided by 3Com. When I upgraded the kernel, I stuck with 3Com's driver and recompiled it for the new kernel.

Under heavy load (I was running badblocks on a new drive through a telnet login, playing an mp3 over samba, and compiling a new kernel in a second telnet login), the box is unstable and crashes regularly with "cannot handle kernel virtual memory paging request" errors. I realize they just rebuilt the VM system.

I need help setting up good kernel logging and I need to know what to do with it to correct the crashing under heavy load. A unix box failing under heavy load is not tolerable to me.

Am I doing something wrong? Is 2.4.17's VM still unstable? Is this problem fixable, and what should I do to fix it?

thanks.

first - try using a different gcc. i don't know that the 7.x redhat compiler has any problems, but the 2.96xx gcc has been problematic. if you can, try recompiling the kernel with gcc 2.95. if that doesn't help, then the kernel hackers will definitely want to see that error if you can reproduce it. check out kernel.org for bug reporting.

I had mega problems with 2.4.17 using GCC 2.96. Once I upgraded to version 3.0.3, all of my problems went away. I was having kernel dump errors, booting problems, instability, you name it. You might try that first before going to something more dramatic.

As far as debugging is concerned, I would rewrite your syslog.conf file to log all kernel messages, and whatever else you want logged, to a different tty, like tty12. Then when something crashes hard you should be able to switch over to that tty and see what happened; and of course you can dump these to a file in addition. If you're not familiar with the syslog.conf file in detail, let me know and I'll give you some examples. :D
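A minimal sketch of the kind of syslog.conf entries described above, assuming the stock sysklogd syntax. The file name /var/log/kernel is only an illustrative choice, not something from this thread:

```
# Hypothetical additions to /etc/syslog.conf (sysklogd syntax)

# Show every kernel message on virtual console 12 (switch to it with Alt+F12)
kern.*          /dev/tty12

# Keep a copy on disk as well (the path is an assumption; pick your own)
kern.*          /var/log/kernel
```

syslogd rereads its configuration when it receives SIGHUP, so the change should take effect after something like "killall -HUP syslogd" or a restart of the syslog service.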

#This is the kernel specific part of my "syslog.conf"

kern.warning /var/log/kernelerr
kern.crit /dev/console

#This is from my syslog init script setting the options for the loggers

SYSLOGD_OPTIONS="-m 0"
KLOGD_OPTIONS="-c 6 -2"

I did get quite a bit of data logged to /var/log/kernelerr last crash..
I don't know how to do anything useful with it though.

This is the data I got from last crash, what in it is of any use??

Jan 27 01:45:54 Server kernel: Unable to handle kernel paging request at virtual address c7391e5f
Jan 27 01:45:54 Server kernel: printing eip:
Jan 27 01:45:54 Server kernel: c0130ee0
Jan 27 01:45:54 Server kernel: *pde = 00000000
Jan 27 01:45:54 Server kernel: Oops: 0000
Jan 27 01:45:54 Server kernel: CPU: 0
Jan 27 01:45:54 Server kernel: EIP: 0010:[try_to_free_buffers+32/256] Not tainted
Jan 27 01:45:54 Server kernel: EIP: 0010:[<c0130ee0>] Not tainted
Jan 27 01:45:54 Server kernel: EFLAGS: 00010206
Jan 27 01:45:54 Server kernel: EIP is at
Jan 27 01:45:54 Server kernel: eax: 00000000 ebx: c7391e47 ecx: 000001d0 edx: 00000000
Jan 27 01:45:54 Server kernel: esi: c10379c0 edi: c148fed0 ebp: c10379c0 esp: c10edf08
Jan 27 01:45:54 Server kernel: ds: 0018 es: 0018 ss: 0018
Jan 27 01:45:54 Server kernel: Process kswapd (pid: 4, stackpage=c10ed000)
Jan 27 01:45:54 Server kernel: Stack: 00000212 00000213 c1000000 c0205cf8 000001d0 c10379c0 00000018 000002cc
Jan 27 01:45:54 Server kernel: c0127a45 c10379c0 000001d0 00000000 c10ec000 00000048 000001d0 c0205d88
Jan 27 01:45:54 Server kernel: c1063e40 c0eda010 c10637b0 00000000 00000020 000001d0 00000006 00000020
Jan 27 01:45:55 Server kernel: Call Trace: [shrink_cache+501/832]
Jan 27 01:45:55 Server kernel: Call Trace: [<c0127a45>]
Jan 27 01:45:55 Server kernel: [shrink_caches+78/128]
Jan 27 01:45:55 Server kernel: [<c0127cce>]
Jan 27 01:45:55 Server kernel: [try_to_free_pages+60/96]
Jan 27 01:45:55 Server kernel: [<c0127d3c>]
Jan 27 01:45:55 Server kernel: [kswapd_balance_pgdat+81/176]
Jan 27 01:45:55 Server kernel: [<c0127de1>]
Jan 27 01:45:55 Server kernel: [kswapd_balance+38/64]
Jan 27 01:45:55 Server kernel: [<c0127e66>]
Jan 27 01:45:55 Server kernel: [kswapd+145/176]
Jan 27 01:45:55 Server kernel: [<c0127f91>]
Jan 27 01:45:55 Server kernel: [kswapd+0/176]
Jan 27 01:45:55 Server kernel: [<c0127f00>]
Jan 27 01:45:55 Server kernel: [stext+0/48]
Jan 27 01:45:55 Server kernel: [<c0105000>]
Jan 27 01:45:55 Server kernel: [kernel_thread+38/48]
Jan 27 01:45:55 Server kernel: [<c0105566>]
Jan 27 01:45:55 Server kernel: [kswapd+0/176]
Jan 27 01:45:55 Server kernel: [<c0127f00>]
Jan 27 01:45:55 Server kernel:
Jan 27 01:45:55 Server kernel:
Jan 27 01:45:55 Server kernel: Code: 8b 53 18 83 e2 06 8b 43 10 09 d0 0f 85 7f 00 00 00 8b 5b 28
Jan 27 01:48:18 Server kernel: <1>Unable to handle kernel paging request at virtual address df2cdd14
Jan 27 01:48:18 Server kernel: printing eip:
Jan 27 01:48:18 Server kernel: c0130ee0
Jan 27 01:48:18 Server kernel: *pde = 00000000
Jan 27 01:48:18 Server kernel: Oops: 0000
Jan 27 01:48:18 Server kernel: CPU: 0
Jan 27 01:48:18 Server kernel: EIP: 0010:[try_to_free_buffers+32/256] Not tainted
Jan 27 01:48:18 Server kernel: EIP: 0010:[<c0130ee0>] Not tainted
Jan 27 01:48:18 Server kernel: EFLAGS: 00010282
Jan 27 01:48:18 Server kernel: EIP is at
Jan 27 01:48:18 Server kernel: eax: 00000000 ebx: df2cdcfc ecx: 000001d0 edx: c0205ce0
Jan 27 01:48:18 Server kernel: esi: c10393c0 edi: df2cdcfc ebp: c10393c0 esp: c117fe54
Jan 27 01:48:18 Server kernel: ds: 0018 es: 0018 ss: 0018
Jan 27 01:48:18 Server kernel: Process badblocks (pid: 845, stackpage=c117f000)
Jan 27 01:48:18 Server kernel: Stack: 00000000 00000213 c1000000 c0205cf8 000001d0 c10393c0 0000001d 000002d8
Jan 27 01:48:18 Server kernel: c0127a45 c10393c0 000001d0 c117fef8 c117e000 00000046 000001d0 c0205d88
Jan 27 01:48:18 Server kernel: c013280e 00000341 c1063580 00000000 00000020 000001d0 00000006 00000020
Jan 27 01:48:18 Server kernel: Call Trace: [shrink_cache+501/832]
Jan 27 01:48:18 Server kernel: Call Trace: [<c0127a45>]
Jan 27 01:48:18 Server kernel: [blkdev_get_block+30/80]
Jan 27 01:48:18 Server kernel: [<c013280e>]
Jan 27 01:48:18 Server kernel: [shrink_caches+78/128]
Jan 27 01:48:18 Server kernel: [<c0127cce>]
Jan 27 01:48:18 Server kernel: [try_to_free_pages+60/96]
Jan 27 01:48:18 Server kernel: [<c0127d3c>]
Jan 27 01:48:18 Server kernel: [balance_classzone+81/432]
Jan 27 01:48:18 Server kernel: [<c01285d1>]
Jan 27 01:48:18 Server kernel: [__alloc_pages+298/400]
Jan 27 01:48:18 Server kernel: [<c012885a>]
Jan 27 01:48:18 Server kernel: [generic_file_write+1078/1728]
Jan 27 01:48:18 Server kernel: [<c0124236>]
Jan 27 01:48:18 Server kernel: [do_IRQ+156/176]
Jan 27 01:48:18 Server kernel: [<c010846c>]
Jan 27 01:48:18 Server kernel: [sys_write+150/208]
Jan 27 01:48:18 Server kernel: [<c012d4e6>]
Jan 27 01:48:18 Server kernel: [sys_llseek+192/208]
Jan 27 01:48:18 Server kernel: [<c012d370>]
Jan 27 01:48:18 Server kernel: [system_call+51/64]
Jan 27 01:48:18 Server kernel: [<c0106dc3>]
Jan 27 01:48:18 Server kernel:
Jan 27 01:48:18 Server kernel:
Jan 27 01:48:18 Server kernel: Code: 8b 53 18 83 e2 06 8b 43 10 09 d0 0f 85 7f 00 00 00 8b 5b 28
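
On the question of what in this output is of any use: the lines that usually matter most are the symbolic EIP line (which function faulted), the registers, and the Code: bytes. Both oopses above stop at try_to_free_buffers+32, and the first three Code: bytes, 8b 53 18, decode on x86 as a load from the address ebx + 0x18. In both dumps, ebx + 0x18 equals the faulting virtual address, so the kernel appears to be following a bad pointer held in ebx. That narrows down what went wrong, not why; a corrupt pointer could come from hardware, a driver, or the compiler. The sketch below pulls out the faulting function and checks that arithmetic. The function names oops_eip and fault_matches are made up for illustration, and it assumes a shell with 64-bit arithmetic:

```shell
#!/bin/sh
# Sketch: pick the useful parts out of a klogd-decoded oops log.
# oops_eip and fault_matches are hypothetical helper names.

# Print the faulting function from each symbolic "EIP:" line in a log file.
# The numeric form, EIP: 0010:[<c0130ee0>], is skipped because "<" is not
# a valid character in a symbol name.
oops_eip() {
    sed -n 's/.*EIP: *[0-9a-fA-F]*:\[\([A-Za-z_][A-Za-z0-9_]*\)+[0-9]*\/[0-9]*\].*/\1/p' "$1"
}

# Succeed if register value + displacement equals the faulting address.
# All three arguments are hex numbers without the 0x prefix.
fault_matches() {
    [ $(( 0x$1 + 0x$2 )) -eq $(( 0x$3 )) ]
}
```

For example, "oops_eip /var/log/kernelerr" lists one function name per logged oops, and "fault_matches c7391e47 18 c7391e5f" and "fault_matches df2cdcfc 18 df2cdd14" both succeed for the two dumps above.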

Please be more specific about what you were running. Were you running a RH 7.2 box with the 2.4.17 kernel? The gcc that 7.2 comes with is 2.96-98. As far as I can tell, all 2.96-xx releases are custom RH versions, not official gcc versions. The kernel notes for 2.4.17 maintain that, for stability, you should use 2.95.3 or 2.95.4. Neither of these is available as an RPM, so I'm hesitant to try to install them. They make a big deal out of 3.0 being unstable. I'll try it though, but I'd like to learn how to do rudimentary debugging while I have this problem.

Just a note - I had a TON of problems with 2.4.17 on this machine. The machine would lock solid after about 6 days. No logs, No error messages - just a hard lock. Reverted back to 2.4.13 and everything is ok. I will probably try 2.4.18 + rmap when it is available.

--jeremy

strange... i haven't had any problems at all. i've been using 2.4.17 for a while now - it's a patched mjc(pre3) kernel - it's easily the fastest kernel i've ever used for a desktop.