LinuxQuestions.org - pppd causing kernel crash

Hi all,
First, an intro: I have a computer running a custom CD distro, which connects to the internet through a USB ISDN modem (it uses the cdc-acm module) with pppd. It had been working fine until the well-known cdc-acm troubles in kernels since 2.6.8. I have now migrated it to 2.6.11.9, and the computer dials up fine, but if the connection drops, this appears when it tries to reconnect:

Code:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c02048fa
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: bsd_comp ppp_deflate ppp_async ppp_generic slhc cdc_acm cifs smbfs parport_pc parport usblp 8139too tvaudio tuner bttv video_buf firmware_class i2c_algo_bit btcx_risc tveeprom i2c_core lirc_serial lirc_dev sermouse psmouse atkbd libps2 serport i8042 serio mousedev evdev usbhid usbserial uhci_hcd ohci_hcd ehci_hcd usbcore snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_mpu401_uart snd_rawmidi snd_seq_device snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd soundcore pcspkr
CPU:    0
EIP:    0060:[<c02048fa>]    Tainted: GF    VLI
EFLAGS: 00010286  (2.6.11.9)
eax: 00000000  ebx: 00000001  ecx: ffffffff  edx: cee920b8
esi: c8db7b18  edi: 00000000  ebp: cee920b8  esp: c0041d58
ds: 007b  es: 007b  ss: 0068
Process pppd (pid: 2003, threadinfo=c0041000 task=c0ae9a80)
Stack: c8db7b00 cee92094 c0204966 cee920b8 000003ad c8db7b00 c8db7b18 cee92094
      000003ad c022ca26 cee920b8 000000d0 00000000 c8db790c 00000000 00000000
      c8db7b00 c478a037 c478a053 c478a000 c02052cd c03b9080 c8db7914 c8db7b18
Call Trace:
 [<c0204966>]
 [<c022ca26>]
 [<c02052cd>]
 [<c0204bff>]
 [<c022cd5b>]
 [<c022cd7d>]
 [<d0a1f3a4>]
 [<c021bbf2>]
 [<c01250f8>]
 [<c0146f5e>]
 [<c013f1b4>]
 [<c013fad9>]
 [<c021c1a2>]
 [<c013bb27>]
 [<c0134cc8>]
 [<c0134c02>]
 [<c0134e8e>]
 [<c0101e57>]
Code: 56 e8 93 ff ff ff 89 c3 58 85 db 74 07 56 e8 af 75 f5 ff 5e 89 d8 5b 5e c3 57 53 8b 54 24 0c bb 01 00 00 00 8b 3a 31 c0 83 c9 ff <f2> ae f7 d1 49 8b 52 24 8d 5c 0b 01 85 d2 75 e9 89 d8 5b 5f c3

This used to mention PREEMPT, so I disabled preemption, and it now shows the message you see above. Is this a pppd incompatibility with the new kernel? Or is something wrong in the kernel itself (given that it's the only thing I have changed)?

Here's a list of the modules loaded BEFORE the crash happens. I will update this with another one after it happens again.

Code:

Module                  Size  Used by
bsd_comp                4224  0
ppp_deflate            4096  0
ppp_async              7680  1
ppp_generic            15380  7 bsd_comp,ppp_deflate,ppp_async
slhc                    4864  1 ppp_generic
cdc_acm                8224  2
cifs                  150904  0
smbfs                  45304  0
parport_pc            26436  0
parport                21320  1 parport_pc
usblp                  8832  0
8139too                16128  0
tvaudio                14884  0
tuner                  14756  0
bttv                  117584  0
video_buf              11012  1 bttv
firmware_class          5760  1 bttv
i2c_algo_bit            6536  1 bttv
btcx_risc              2568  1 bttv
tveeprom                8600  1 bttv
i2c_core              11408  5 tvaudio,tuner,bttv,i2c_algo_bit,tveeprom
lirc_serial            9056  1
lirc_dev                9612  1 lirc_serial
sermouse                3584  0
psmouse                18312  0
atkbd                  10384  0
libps2                  2944  2 psmouse,atkbd
serport                2688  0
i8042                  8028  0
serio                  7304  7 sermouse,psmouse,atkbd,serport,i8042
mousedev                7448  1
evdev                  6528  0
usbhid                19712  0
usbserial              19944  0
uhci_hcd              22032  0
ohci_hcd              12168  0
ehci_hcd              20616  0
usbcore                69624  8 cdc_acm,usblp,usbhid,usbserial,uhci_hcd,ohci_hcd,ehci_hcd
snd_seq_oss            20864  0
snd_seq_midi_event      3456  1 snd_seq_oss
snd_seq                29616  4 snd_seq_oss,snd_seq_midi_event
snd_pcm_oss            36896  0
snd_mixer_oss          12800  1 snd_pcm_oss
snd_mpu401_uart        4096  0
snd_rawmidi            13856  1 snd_mpu401_uart
snd_seq_device          4360  2 snd_seq_oss,snd_rawmidi
snd_intel8x0          20032  0
snd_ac97_codec        46720  1 snd_intel8x0
snd_pcm                49416  3 snd_pcm_oss,snd_intel8x0,snd_ac97_codec
snd_timer              13700  2 snd_seq,snd_pcm
snd_page_alloc          5636  2 snd_intel8x0,snd_pcm
snd                    26852  11 snd_seq_oss,snd_seq,snd_pcm_oss,snd_mixer_oss,snd_mpu401_uart,snd_rawmidi,snd_seq_device,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
soundcore              4320  1 snd
pcspkr                  2660  0

I guess what I want to know is this: should I go to all the trouble of putting in a new kernel (about 12 hours' work) in the hope that it fixes this, or could something else have become outdated?

Quote:

EIP: 0060:[<c02048fa>] Tainted: GF VLI

This is the address of the instruction that was executing when the fault occurred.

Now, the problem is that I don't see this address in the stack or the call trace.

Is there further output after the Code: section that you did not show us?

As it is, at least with my limited understanding, I can't diagnose what caused the crash from this output.
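A note on the missing address: EIP is the faulting instruction itself, while the Call Trace lists return addresses found on the stack, so it is normal for EIP not to appear there. Because this trace prints bare addresses without symbol names, one way to make it readable is to look each address up in the System.map file generated when this exact kernel image was built. Below is a minimal sketch, assuming the usual three-column System.map format (address, type, name); the function names `load_system_map` and `resolve` are hypothetical, and the sample symbols are invented for illustration.

```python
import bisect


def load_system_map(text):
    """Parse System.map text into parallel lists (sorted addresses, names).

    Each useful line looks like "c0204900 T some_function".
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def resolve(addrs, names, address):
    """Return "symbol+0xoffset" for the nearest symbol at or below address.

    Note: there is no upper-bound check, so addresses past the end of the
    kernel image (e.g. inside loadable modules) will resolve misleadingly.
    """
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first symbol in the map
    return "%s+0x%x" % (names[i], address - addrs[i])


if __name__ == "__main__":
    # Hypothetical System.map excerpt: these symbol names are invented for
    # illustration and do NOT come from the poster's kernel.
    sample_map = (
        "c02048c0 T example_func_a\n"
        "c0204940 T example_func_b\n"
        "c0204980 T example_func_c\n"
    )
    addrs, names = load_system_map(sample_map)
    # EIP and the first call-trace entry from the 2.6.11.9 oops in this thread.
    for addr in (0xC02048FA, 0xC0204966):
        print("%08x -> %s" % (addr, resolve(addrs, names, addr)))
```

With a real kernel you would pass the contents of the System.map matching the running image (typically found in the kernel build tree or as /boot/System.map-<version>). System.map only covers the core kernel, so addresses belonging to loadable modules cannot be resolved this way.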

The output is copied straight from the logs; there isn't any other mention of anything going wrong. Samba still works, networking still works, and I can still play videos and so on. Only pppd won't run any more, and the system won't let me unload the modules.

I can tell you how it happens, though. Here's a timeline:

1. The machine is online with pppd and chat running, and the modem on ttyACM0.

2. Either the ISP sends us a hangup (every 4 hours), or the kernel detects or causes a USB hub timeout (through electrical interference or otherwise), which then re-initializes the modem as ttyACM1 (or as ttyACM0 if it was previously on ttyACM1; it alternates between the two).

3. Since pppd is set to persist, it should stay running and redial automatically if the modem hasn't changed ports.

*** This is where the problem happens. pppd used to keep retrying the old port forever, so if the port hadn't changed, it would reconnect. If the modem had changed ports, I could kill pppd, change the port, and re-run it, and everything would work again.

Now, though, if the connection is dropped from the other end, or USB has disconnected and reconnected the modem, the kernel error appears and pppd is killed (presumably because of the error rather than by its own choice). I can't unload the modules, and if I re-run pppd, it starts but does nothing, and I can't kill it at all. The only way out is to reset the machine.
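The manual "kill pppd, change the port, re-run it" step from the timeline above could be partly scripted. One hedged approach is to read the kernel's cdc_acm announcement (the "ttyACM1: USB ACM device" line that appears in the syslog excerpt later in this thread) and take the most recently announced port. The function name `latest_acm_port` and the default /dev directory are assumptions (the mgetty post uses /dev/usb/acm/ instead). This sketch only finds the port; it does not restart pppd, and it does not address the kernel crash itself.

```python
import os
import re

# Matches kernel messages such as
#   "... kernel: cdc_acm 2-2:1.0: ttyACM1: USB ACM device"
ACM_RE = re.compile(r"cdc_acm \S+: (ttyACM\d+): USB ACM device")


def latest_acm_port(log_text, dev_dir="/dev"):
    """Return the device path of the most recently announced ttyACM port, or None."""
    ports = ACM_RE.findall(log_text)
    if not ports:
        return None
    return os.path.join(dev_dir, ports[-1])


if __name__ == "__main__":
    # Two lines taken from the syslog excerpt in this thread.
    sample = (
        "Jun  2 15:25:27 (none) kernel: usb 2-2: new full speed USB device using uhci_hcd and address 3\n"
        "Jun  2 15:25:27 (none) kernel: cdc_acm 2-2:1.0: ttyACM1: USB ACM device\n"
    )
    print(latest_acm_port(sample))  # prints /dev/ttyACM1
```

In practice you would feed it the contents of /var/log/messages (or wherever your syslog writes kernel messages) and pass the result to pppd as its tty device.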

Could it be a bug in the 2.6.11.9 kernel or the cdc-acm module? Or maybe in mod-utils, or pppd? I have run RAM and processor tests on the machine, re-compiled in case it was a buggy image, and even tried different CDs and different CD drives, and everything checks out. The only software I have changed is the kernel.

--------------------------------

Edit: It just happened 15 minutes ago: pppd quit and USB disconnected the modem, but I was able to re-run pppd with no trouble. It might be slightly intermittent, although it has happened every time over the past couple of days.

I am going ahead with the 2.6.11.11 kernel to see if that helps. I'll know late tonight and will post if it's a success.

Well, the new kernel (2.6.11.11) is in place, and I just got the same problem. I'll post the logs below (messages and syslog); they overlap, by the way:

Code:

Jun  2 15:25:27 (none) kernel: hub 2-0:1.0: port 2 disabled by hub (EMI?), re-enabling...
Jun  2 15:25:28 (none) kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jun  2 15:25:28 (none) kernel:  printing eip:
Jun  2 15:25:28 (none) kernel: c020a8a2
Jun  2 15:25:28 (none) kernel: *pde = 00000000
Jun  2 15:25:28 (none) kernel: Oops: 0000 [#1]
Jun  2 15:25:28 (none) kernel: Modules linked in: bsd_comp ppp_deflate ppp_async ppp_generic slhc cdc_acm cifs smbfs parport_pc parport usblp 8139too tvaudio tuner bttv video_buf firmware_class i2c_algo_bit btcx_risc tveeprom i2c_core lirc_serial lirc_dev sermouse psmouse atkbd libps2 serport i8042 serio mousedev evdev usbhid usbserial uhci_hcd ohci_hcd ehci_hcd usbcore snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_mpu401_uart snd_rawmidi snd_seq_device snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd soundcore pcspkr
Jun  2 15:25:28 (none) kernel: CPU:    0
Jun  2 15:25:29 (none) kernel: EIP:    0060:[<c020a8a2>]    Tainted: GF    VLI
Jun  2 15:25:29 (none) kernel: EFLAGS: 00010286  (2.6.11.11)
Jun  2 15:25:29 (none) kernel: eax: 00000000  ebx: 00000001  ecx: ffffffff  edx: cb1420b8
Jun  2 15:25:29 (none) kernel: esi: ca8bca18  edi: 00000000  ebp: cb1420b8  esp: c57fcde0
Jun  2 15:25:29 (none) kernel: ds: 007b  es: 007b  ss: 0068
Jun  2 15:25:29 (none) kernel: Process pppd (pid: 2904, threadinfo=c57fc000 task=c32f5a00)
Jun  2 15:25:29 (none) kernel: Stack: ca8bca00 cb142094 c020a90e cb1420b8 000003ad ca8bca00 ca8bca18 cb142094
Jun  2 15:25:29 (none) kernel:        000003ad c02329c6 cb1420b8 000000d0 00000000 c952818c 00000000 00000000
Jun  2 15:25:29 (none) kernel:        ca8bca00 c976a837 c976a853 c976a800 c020b275 c03c0500 c9528194 ca8bca18
Jun  2 15:25:29 (none) kernel: Call Trace:
Jun  2 15:25:29 (none) kernel:  [<c020a90e>]
Jun  2 15:25:29 (none) kernel:  [<c02329c6>]
Jun  2 15:25:29 (none) kernel:  [<c020b275>]
Jun  2 15:25:29 (none) kernel:  [<c020aba7>]
Jun  2 15:25:29 (none) kernel:  [<c0232cfb>]
Jun  2 15:25:29 (none) kernel:  [<c0232d1d>]
Jun  2 15:25:29 (none) kernel:  [<d0a1f3a4>]
Jun  2 15:25:29 (none) kernel:  [<c0221b92>]
Jun  2 15:25:29 (none) kernel:  [<c010d8ff>]
Jun  2 15:25:29 (none) kernel:  [<c0110029>]
Jun  2 15:25:29 (none) kernel:  [<c011164b>]
Jun  2 15:25:29 (none) kernel:  [<c03464c8>]
Jun  2 15:25:29 (none) kernel:  [<c010c6ae>]
Jun  2 15:25:29 (none) kernel:  [<c0345e9d>]
Jun  2 15:25:29 (none) kernel:  [<c0142804>]
Jun  2 15:25:29 (none) kernel:  [<c0222392>]
Jun  2 15:25:29 (none) kernel:  [<c01360f9>]
Jun  2 15:25:29 (none) kernel:  [<c0134f63>]
Jun  2 15:25:29 (none) kernel:  [<c0134faf>]
Jun  2 15:25:29 (none) kernel:  [<c0101e57>]
Jun  2 15:25:29 (none) kernel: Code: 56 e8 93 ff ff ff 89 c3 58 85 db 74 07 56 e8 4f 57 f5 ff 5e 89 d8 5b 5e c3 57 53 8b 54 24 0c bb 01 00 00 00 8b 3a 31 c0 83 c9 ff <f2> ae f7 d1 49 8b 52 24 8d 5c 0b 01 85 d2 75 e9 89 d8 5b 5f c3

Code:

Jun  2 15:18:49 (none) -- MARK --
Jun  2 15:25:27 (none) kernel: usb 2-2: USB disconnect, address 2
Jun  2 15:25:27 (none) pppd[2904]: Hangup (SIGHUP)
Jun  2 15:25:27 (none) pppd[2904]: Modem hangup
Jun  2 15:25:27 (none) pppd[2904]: Connect time 84.4 minutes.
Jun  2 15:25:27 (none) pppd[2904]: Sent 13655914 bytes, received 46880023 bytes.
Jun  2 15:25:27 (none) pppd[2904]: Connection terminated.
Jun  2 15:25:27 (none) kernel: usb 2-2: new full speed USB device using uhci_hcd and address 3
Jun  2 15:25:27 (none) kernel: cdc_acm 2-2:1.0: ttyACM1: USB ACM device
Jun  2 15:26:12 (none) pppd[3342]: pppd 2.4.3 started by root, uid 0
Jun  2 15:26:12 (none) pppd[3342]: Removed stale lock on input_ttyACM0 (pid 2904)

The very first line (Jun 2 15:25:27 (none) kernel: hub 2-0:1.0: port 2 disabled by hub (EMI?), re-enabling...) seems to be caused either by the device timing out on its own, or sometimes by people flipping light or fan switches. This last occurrence was caused by a fan switch. I believe it's possible to disconnect the USB cable and reconnect it manually without any problems, but I will test that another time.

The problem is that when pppd tries to connect after this glitch, it runs and then freezes, with no way to kill it. The cdc_acm module will not unload either; the only way out is a reboot.
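Since rmmod refuses to remove a module that is still in use, it can help to check the use counts before trying. In the lsmod listing posted earlier (taken before the crash, while pppd was connected), cdc_acm shows a count of 2 with no dependent modules listed, which usually means the references are held by something other than a module, such as an open tty. Below is a minimal sketch for reading that from lsmod output, assuming the four-column "Module Size Used by" layout shown in this thread; the function names `parse_lsmod` and `unload_blockers` are hypothetical.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (use_count, [dependent modules])}."""
    modules = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, blank lines, and rows whose count is not a plain number.
        if len(parts) < 3 or parts[0] == "Module" or not parts[2].isdigit():
            continue
        deps = parts[3].split(",") if len(parts) > 3 else []
        modules[parts[0]] = (int(parts[2]), deps)
    return modules


def unload_blockers(modules, name):
    """Explain why `name` cannot be unloaded, or return None if its count is 0."""
    count, deps = modules[name]
    if count == 0:
        return None
    if deps:
        return "%s is used by modules: %s" % (name, ", ".join(deps))
    return ("%s has use count %d but no dependent modules listed "
            "(held by something else, e.g. an open device)" % (name, count))


if __name__ == "__main__":
    # Excerpt of the lsmod listing posted earlier in this thread.
    sample = """\
Module                  Size  Used by
ppp_generic            15380  7 bsd_comp,ppp_deflate,ppp_async
cdc_acm                8224  2
bsd_comp                4224  0
"""
    mods = parse_lsmod(sample)
    for name in ("cdc_acm", "ppp_generic", "bsd_comp"):
        print(name, "->", unload_blockers(mods, name))
```

Comparing the output taken before and after the crash (as the original poster planned) would show whether cdc_acm's count stays stuck above zero once pppd has been killed.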

One other thing I have changed: with 2.6.7 I had the USB OHCI/UHCI/etc. drivers and cdc_acm built into the kernel. Could building them as modules be what's causing this? Thanks for any help; I'll be glad when this is over!

I think I have the same problem as you, though I don't have a console to see what happens. I have a custom distribution based on Debian, with kernel 2.6.11 and pppd version 2.4.3. The system halts completely!

My system doesn't halt completely; in fact, everything works perfectly except that pppd freezes when trying to talk to the modem after this has happened.

What I have done is compile the UHCI/OHCI/etc. drivers into the kernel, along with cdc-acm. I have only tried it for about 24 hours so far, and while the modem still resets (due to lights, fans and other electrical interference), it comes back flawlessly, with nothing written to the logs. If pppd didn't quit before its due time, I wouldn't have any clue that this was happening. The modem still changes USB IDs, though. The length of the USB cable didn't seem to make any difference either.

So, if this keeps working, pppd freezing appears to happen only when cdc-acm and/or the UHCI/OHCI drivers are built as modules and the USB hub resets due to EMI. Hopefully this helps someone in the future, at least!
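To confirm how a given kernel was actually configured, you can check its .config for whether these drivers are built in (=y) or modular (=m). Below is a minimal sketch, assuming the option names CONFIG_USB_UHCI_HCD, CONFIG_USB_OHCI_HCD, CONFIG_USB_EHCI_HCD and CONFIG_USB_ACM (the 2.6-era names for these drivers; verify them against your own .config). The same function can be pointed at other tristate or boolean options mentioned in this thread, such as CONFIG_SMP or CONFIG_PREEMPT. The function name `config_states` is hypothetical.

```python
# Options assumed to control the drivers discussed in this thread (2.6-era names).
USB_OPTIONS = (
    "CONFIG_USB_UHCI_HCD",
    "CONFIG_USB_OHCI_HCD",
    "CONFIG_USB_EHCI_HCD",
    "CONFIG_USB_ACM",
)


def config_states(config_text, options=USB_OPTIONS):
    """Map each option to 'built-in', 'module', or 'not set' from .config text.

    Intended for y/m/n options; "# CONFIG_FOO is not set" lines are comments
    and therefore fall through to 'not set'.
    """
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    labels = {"y": "built-in", "m": "module"}
    return {opt: labels.get(values.get(opt), "not set") for opt in options}


if __name__ == "__main__":
    # Hypothetical .config excerpt for illustration only.
    sample = """\
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_ACM=y
# CONFIG_PREEMPT is not set
"""
    for opt, state in config_states(sample).items():
        print(opt, "->", state)
    print(config_states(sample, ("CONFIG_PREEMPT",)))
```

Reading the .config from the kernel source tree that produced the running image avoids comparing against the wrong build.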

I recompiled the kernel with USB built in, but unfortunately it didn't help. At some point the call is interrupted and the system halts, and I mean halts completely.

I have kernel version 2.6.11_10.

By the way, I noticed that the system also crashes if I unplug the modem's USB cable, but only while mgetty /dev/usb/acm/ttyACM0 is running.

The combination of mgetty watching ttyACM0 and a USB unplug causes the kernel to crash.

If mgetty is not running, I can plug and unplug the modem without any problem.

Anyone have any ideas?

I'm not sure why your whole machine would stop like that. Are you sure you can't get any logs at all? In that situation, I would run the usual RAM and processor tests (http://www.ultimatebootcd.com). Without logs, though, it's a little difficult to know where to look after that. Sorry.

Of course I get logs: the kernel dumps a stack trace on the console. The last part looks like this:

Code: 8b 93 ... bla bla bla

<1> Unable to handle kernel NULL pointer dereference at virtual address 00000000020

printing eip:
C0116588
*pde=0000000000
Oops: 0000 [#16]
SMP
Modules linked in : evdev ....... bla bla bla

CPU: 0
EIP: 0060:[<c0116588>] Not tainted VLI
EFLAGS: 00010282 (2.6.11.11.PENTIUM-M)
EIP is at m_release + 0x38/0xa0

Do you know whether it also dumps this to a file?

If it only appears on the console, then you're probably not running the syslog daemon, which would store it in /var/log/messages or /var/log/syslog.
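Once a syslog daemon is running, the oops text ends up in /var/log/messages or /var/log/syslog mixed in with everything else. Below is a small sketch for pulling out just the oops reports, assuming the syslog line format shown earlier in this thread ("... kernel: ..."): a report starts at "Unable to handle kernel" and ends at the following "Code:" line. The function name `extract_oopses` is hypothetical.

```python
def extract_oopses(log_lines):
    """Return a list of oops reports, each a list of kernel message strings.

    A report starts at a kernel line containing "Unable to handle kernel"
    and ends at the next kernel line starting with "Code:".
    """
    marker = " kernel: "
    reports = []
    current = None
    for line in log_lines:
        if marker not in line:
            continue  # only kernel messages are relevant
        message = line.split(marker, 1)[1].rstrip("\n")
        if "Unable to handle kernel" in message:
            current = [message]
        elif current is not None:
            current.append(message)
            if message.startswith("Code:"):
                reports.append(current)
                current = None
    return reports


if __name__ == "__main__":
    # Abbreviated lines from the syslog excerpt earlier in this thread.
    sample = [
        "Jun  2 15:25:27 (none) kernel: hub 2-0:1.0: port 2 disabled by hub (EMI?), re-enabling...",
        "Jun  2 15:25:28 (none) kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000",
        "Jun  2 15:25:29 (none) kernel: EIP:    0060:[<c020a8a2>]    Tainted: GF    VLI",
        "Jun  2 15:25:29 (none) kernel: Code: 56 e8 93 ff ff ff 89 c3 58 85 db",
    ]
    for report in extract_oopses(sample):
        print("\n".join(report))
        print("-" * 40)
```

For a real log, open the file (for example with `open("/var/log/messages", errors="replace")`) and pass the file object directly, since it iterates line by line.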

I noticed it mentioned SMP there, though. If compiling the kernel without any SMP or preempt support at all isn't too much trouble, try that and see if it helps. That's about all I can get from that info :(

Nothing, thanks anyway!
I'll start trying older versions.

It seems to be more stable with version 2.4.(37-something): I can now unplug without a kernel crash. But this version has a problem with ups-hid, so I can't connect to my MGE UPS now. What a mess!