Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place. |
08-08-2004, 07:34 PM
|
#1
|
Member
Registered: Aug 2003
Location: Little Rock, Arkansas
Distribution: RH, Fedora, Suse, AIX
Posts: 736
Rep:
|
kernel panic Aiee, killing interrupt handler
I'm having a problem doing rsync to another machine. Here's the setup...
I've got my first machine with all of my data. I've got a second machine with a 160gig drive mounted as a slave. I've got the 160gig mounted over NFS to the first machine. So what I'm doing is... every night via cron, I'm running rsync on the first machine and it's copying data to the 160gig drive that is mounted on the first via NFS.
When I run it normally during the week, there is not a lot of data being moved. Usually everything works fine. However, on the weekends I'm doing a bigger backup and it's throwing kernel errors on the machine that holds the 160gig drive. When the error happens, it completely freezes the backup machine and I have to reboot to get everything back to normal. Thankfully the cron job is just paused while the backup machine is frozen, and when it comes back online the backup continues to run. So the backups are doing fine, but I still hate rebooting and I'm sure the kernel errors need to be fixed.
When I boot up, I have been running fsck on the 160gig and the check seems to pass without any problems.
Here's the error...
Quote:
Aug 8 04:32:24 xwing kernel: Unable to handle kernel paging request at virtual address 6a63c0c4
Aug 8 04:32:24 xwing kernel: printing eip:
Aug 8 04:32:24 xwing kernel: c0145119
Aug 8 04:32:24 xwing kernel: *pde = 00000000
Aug 8 04:32:24 xwing kernel: Oops: 0002
Aug 8 04:32:24 xwing kernel: nfsd lockd sunrpc autofs via-rhine mii sg scsi_mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd
Aug 8 04:32:24 xwing kernel: CPU: 0
Aug 8 04:32:24 xwing kernel: EIP: 0060:[<c0145119>] Not tainted
Aug 8 04:32:24 xwing kernel: EFLAGS: 00010206
Aug 8 04:32:24 xwing kernel:
Aug 8 04:32:24 xwing kernel: EIP is at get_unused_buffer_head [kernel] 0x49 (2.4.22-1.2197.nptl)
Aug 8 04:32:24 xwing kernel: eax: 6a63c0c0 ebx: 00000000 ecx: c3522000 edx: cfe25300
Aug 8 04:32:24 xwing kernel: esi: 00000000 edi: 00001000 ebp: 00000001 esp: dc64dccc
Aug 8 04:32:24 xwing kernel: ds: 0068 es: 0068 ss: 0068
Aug 8 04:32:24 xwing kernel: Process nfsd (pid: 2491, stackpage=dc64d000)
Aug 8 04:32:24 xwing kernel: Stack: c15abeac 000000f0 c01451b8 00000001 d800dd40 c15b3af4 00000341 c14b30b0
Aug 8 04:32:24 xwing kernel: dde0e840 dde0e840 c0145435 c14b30b0 00001000 00000001 c14b30b0 c14b30b0
Aug 8 04:32:24 xwing kernel: c01459b5 c14b30b0 00000341 00001000 0000001c 00000000 d9103000 dc64dd38
Aug 8 04:32:24 xwing kernel: Call Trace: [<c01451b8>] create_buffers [kernel] 0x28 (0xdc64dcd4)
Aug 8 04:32:24 xwing kernel: [<c0145435>] create_empty_buffers [kernel] 0x25 (0xdc64dcf4)
Aug 8 04:32:24 xwing kernel: [<c01459b5>] __block_prepare_write [kernel] 0x2d5 (0xdc64dd0c)
Aug 8 04:32:24 xwing kernel: [<de80d36b>] new_handle [jbd] 0x2b (0xdc64dd34)
Aug 8 04:32:24 xwing kernel: [<c0146169>] block_prepare_write [kernel] 0x39 (0xdc64dd50)
Aug 8 04:32:24 xwing kernel: [<de81f540>] ext3_get_block [ext3] 0x0 (0xdc64dd64)
Aug 8 04:32:24 xwing kernel: [<de81faf3>] ext3_prepare_write [ext3] 0xa3 (0xdc64dd70)
Aug 8 04:32:24 xwing kernel: [<de81f540>] ext3_get_block [ext3] 0x0 (0xdc64dd80)
Aug 8 04:32:24 xwing kernel: [<c0132985>] add_to_page_cache_unique [kernel] 0x45 (0xdc64dd8c)
Aug 8 04:32:24 xwing kernel: [<c0135a13>] do_generic_file_write [kernel] 0x223 (0xdc64dda0)
Aug 8 04:32:24 xwing kernel: [<c0135fe6>] generic_file_write [kernel] 0x136 (0xdc64ddf0)
Aug 8 04:32:24 xwing kernel: [<de81cfe9>] ext3_file_write [ext3] 0x39 (0xdc64de1c)
Aug 8 04:32:24 xwing kernel: [<de92dccf>] nfsd_write [nfsd] 0x14f (0xdc64de3c)
Aug 8 04:32:24 xwing kernel: [<c01184e0>] recalc_task_prio [kernel] 0x90 (0xdc64de84)
Aug 8 04:32:24 xwing kernel: [<de82be80>] ext3_file_operations [ext3] 0x0 (0xdc64dea4)
Aug 8 04:32:24 xwing kernel: [<de90b67f>] svc_sock_enqueue [sunrpc] 0x1bf (0xdc64df00)
Aug 8 04:32:24 xwing kernel: [<de933a28>] nfsd3_proc_write [nfsd] 0xa8 (0xdc64df14)
Aug 8 04:32:24 xwing kernel: [<de93bb3c>] nfsd_procedures3 [nfsd] 0xfc (0xdc64df40)
Aug 8 04:32:24 xwing kernel: [<de9295ce>] nfsd_dispatch [nfsd] 0xce (0xdc64df4c)
Aug 8 04:32:24 xwing kernel: [<de93b378>] nfsd_version3 [nfsd] 0x0 (0xdc64df60)
Aug 8 04:32:24 xwing kernel: [<de929500>] nfsd_dispatch [nfsd] 0x0 (0xdc64df64)
Aug 8 04:32:24 xwing kernel: [<de90b37f>] svc_process_R2466cc14 [sunrpc] 0x44f (0xdc64df68)
Aug 8 04:32:24 xwing kernel: [<de93bb3c>] nfsd_procedures3 [nfsd] 0xfc (0xdc64df88)
Aug 8 04:32:24 xwing kernel: [<de93b398>] nfsd_program [nfsd] 0x0 (0xdc64df8c)
Aug 8 04:32:24 xwing kernel: [<de9293a2>] nfsd [nfsd] 0x182 (0xdc64dfa8)
Aug 8 04:32:24 xwing kernel: [<de929220>] nfsd [nfsd] 0x0 (0xdc64dfe0)
Aug 8 04:32:24 xwing kernel: [<c010719d>] kernel_thread_helper [kernel] 0x5 (0xdc64dff0)
Aug 8 04:32:24 xwing kernel:
Aug 8 04:32:24 xwing kernel:
Aug 8 04:32:24 xwing kernel: Code: c7 40 04 ff ff ff ff c7 40 28 00 00 00 00 eb cd 8b 44 24 0c
|
This is all that shows up in /var/log/messages. This message is also echo'd at the terminal and it ends with...
Quote:
<0> kernel panic aiee killing interrupt handler. In interrupt handler - not syncing.
|
Note that I'm paraphrasing the above error because it didn't actually show in the logs, but it shows on the terminal.
My kernel is 2.4.22-1.2197 nptl and my OS is FC1.
Any ideas where to begin? Should I join the kernel mailing list and post it there? I hesitate to do that because I have done similar stuff in the past and unless you research your problem very well before posting, usually the list members will chew you a new arse, so I think I need to get my ducks in a row before I go posting on some list.
Thanks in advance.
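For readers trying to decode the oops quoted above, here is a minimal sketch in Python. It assumes the standard i386 page-fault error-code bits, assumes the "Code:" bytes start at EIP (as 2.4-era oops dumps generally did), and assumes the default 3G/1G address split; the helper name is hypothetical. Under those assumptions the fault looks like a kernel-mode write through the pointer held in eax, and that pointer is not a kernel address.

```python
# Decode the fields of the oops quoted above (values copied from the log).

def decode_oops_error(code: int) -> dict:
    """Interpret the i386 page-fault error code printed after 'Oops:'."""
    return {
        "page_present": bool(code & 0x1),  # False = page not present
        "write": bool(code & 0x2),         # True = fault on a write
        "user_mode": bool(code & 0x4),     # False = fault in kernel mode
    }

oops = decode_oops_error(0x0002)

fault_addr = 0x6A63C0C4   # "virtual address 6a63c0c4"
eax = 0x6A63C0C0          # "eax: 6a63c0c0"
code_bytes = bytes.fromhex("c74004ffffffff")  # first instruction on the "Code:" line

# Opcode c7 /0 with ModRM 0x40 is "movl $imm32, disp8(%eax)".
opcode, modrm, disp8 = code_bytes[0], code_bytes[1], code_bytes[2]
imm32 = int.from_bytes(code_bytes[3:7], "little")

KERNEL_BASE = 0xC0000000  # assumption: default 3G/1G split

print(oops)
print(f"store of {imm32:#x} to eax+{disp8} = {eax + disp8:#x}")
print("eax is a kernel address:", eax >= KERNEL_BASE)
```

The computed target address, eax+4, matches the faulting address in the first line of the log, which is why the instruction at EIP is a plausible culprit.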
|
|
|
08-08-2004, 08:32 PM
|
#2
|
Senior Member
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,290
|
From the paging request error message it looks like the kernel might be having trouble finding enough memory to complete the copy operation. This is particularly interesting since create_buffers (called from create_empty_buffers) is the top entry on your call trace. How much RAM do you have on the machine and how much swap? Maybe you could try running a big rsync load and watch free and vmstat to see if you're taxing your memory beyond what it can take.
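If watching free and vmstat by hand is inconvenient during a long overnight run, the same numbers can be logged from /proc/meminfo. This is only a sketch: it assumes Python 3 is available on the backup machine, and the function names, interval, and count are made up for illustration. The parser keeps only the "Name: 12345 kB" lines and skips the older multi-column summary lines that 2.4 kernels also print.

```python
import time

def parse_meminfo(text: str) -> dict:
    """Return {field: kB} for the 'Name:  12345 kB' lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def watch(interval: float = 5.0, count: int = 12, path: str = "/proc/meminfo") -> None:
    """Print free RAM and used swap every `interval` seconds, `count` times."""
    for _ in range(count):
        with open(path) as f:
            m = parse_meminfo(f.read())
        swap_used = m.get("SwapTotal", 0) - m.get("SwapFree", 0)
        print(f"{time.strftime('%H:%M:%S')} MemFree={m.get('MemFree', 0)} kB "
              f"SwapUsed={swap_used} kB")
        time.sleep(interval)

# Example: call watch(interval=5, count=720) to cover an hour of the backup.
```

Redirecting the output to a file on a different disk than the backup target means the last lines before a freeze survive the reboot.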
|
|
|
08-08-2004, 08:58 PM
|
#3
|
Member
Registered: Aug 2003
Location: Little Rock, Arkansas
Distribution: RH, Fedora, Suse, AIX
Posts: 736
Original Poster
Rep:
|
Hey! Excellent ideas! I will certainly try that.
Here's what I've got...
I have a single stick of 512MB PC-133 (32x8) 64X64. The item lists as "Generic Low Density". When I bootup, I get this in dmesg:
479MB LOWMEM available.
Memory: 481904k/491456k available (1482k kernel code, 9164k reserved, 1110k data, 136k init, 0k highmem)
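A quick cross-check of those numbers against the full dmesg listing further down, as a sketch assuming 4 KiB pages on i386. The page counts and the top of the last "usable" BIOS-e820 range all agree on 491456k, which is 32 MB short of 512 MB. Where the missing 32 MB went is not stated anywhere in the output; shared memory for onboard video is one guess, nothing more.

```python
PAGE_KB = 4                      # assumption: 4 KiB pages on i386
total_pages = 122864             # "On node 0 totalpages: 122864"
zone_pages = [4096, 118768, 0]   # zone(0), zone(1), zone(2)
e820_top = 0x1DFF0000            # end of the last "usable" BIOS-e820 range

total_kb = total_pages * PAGE_KB
print(total_kb)                  # 491456, the total on the "Memory:" line
print(total_kb // 1024)          # 479, matching "479MB LOWMEM available"
print(e820_top // 1024)          # 491456 again, from the BIOS memory map
```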
Here's my "free" output...
Quote:
[root@xwing root]# free
             total       used       free     shared    buffers     cached
Mem:        482200     391104      91096          0      76456     220844
-/+ buffers/cache:      93804     388396
Swap:       979924          8     979916
|
As you can see, I've got a little under a gig of swap allocated and nearly none of it being used. I'm basically not doing much at all on the system. rsync is running on the sending machine and this machine is just doing the receiving, so it isn't working too hard. ;)
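For anyone unsure how to read the "-/+ buffers/cache" line, the arithmetic can be reproduced from the "Mem:" line. This is just a worked check in Python using the values from the free output above; the variable names are my own.

```python
# Values (in kB) copied from the "Mem:" line of the free output above.
total, used, free = 482200, 391104, 91096
buffers, cached = 76456, 220844

used_by_apps = used - buffers - cached        # "-/+ buffers/cache" used column
available_to_apps = free + buffers + cached   # "-/+ buffers/cache" free column

print(used_by_apps, available_to_apps)  # 93804 388396
```

So most of the "used" figure is buffers and page cache, and only about 92 MB is held by processes at the moment the snapshot was taken.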
Here's my dmesg. Doesn't seem to be anything that looks bad (out of the ordinary) but maybe you'll see something I don't.
Quote:
Linux version 2.4.22-1.2115.nptl (bhcompile@bugs.devel.redhat.com) (gcc version 3.2.3 20030422 (Red Hat Linux 3.2.3-6)) #1 Wed Oct 29 15:31:21 EST 2003
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000001dff0000 (usable)
BIOS-e820: 000000001dff0000 - 000000001dff8000 (ACPI data)
BIOS-e820: 000000001dff8000 - 000000001e000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
479MB LOWMEM available.
ACPI: have wakeup address 0xc0001000
On node 0 totalpages: 122864
zone(0): 4096 pages.
zone(1): 118768 pages.
zone(2): 0 pages.
ACPI: RSDP (v000 AMI ) @ 0x000fafe0
ACPI: RSDT (v001 AMIINT VIA_K7 0x00000010 MSFT 0x00000097) @ 0x1dff0000
ACPI: FADT (v001 AMIINT VIA_K7 0x00000011 MSFT 0x00000097) @ 0x1dff0030
ACPI: MADT (v001 AMIINT VIA_K7 0x00000009 MSFT 0x00000097) @ 0x1dff00c0
ACPI: DSDT (v001 VIA VIA_K7 0x00001000 MSFT 0x0100000d) @ 0x00000000
Kernel command line: ro root=LABEL=/
Initializing CPU#0
Detected 1470.042 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 2929.45 BogoMIPS
Memory: 481904k/491456k available (1482k kernel code, 9164k reserved, 1110k data, 136k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383fbff c1cbfbff 00000000 00000000
CPU: Common caps: 0383fbff c1cbfbff 00000000 00000000
CPU: AMD Athlon(tm) XP 1700+ stepping 02
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20031002
ACPI: Interpreter disabled.
PCI: PCI BIOS revision 2.10 entry at 0xfdb41, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router VIA [1106/0686] at 00:07.0
Applying VIA southbridge workaround.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
Real Time Clock Driver v1.10e
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:pio
hda: IC35L060AVV207-0, ATA DISK drive
hdb: WDC WD1600JB-00EVA0, ATA DISK drive
blk: queue c0408880, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c04089c0, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 120103200 sectors (61493 MB) w/1821KiB Cache, CHS=7476/255/63, UDMA(100)
hdb: attached ide-disk driver.
hdb: host protected area => 1
hdb: 312581808 sectors (160042 MB) w/8192KiB Cache, CHS=19457/255/63, UDMA(100)
Partition check:
hda: hda1 hda2 hda3 hda4 < hda5 >
hdb: hdb1
ide: late registration of driver.
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 158k freed
VFS: Mounted root (ext2 filesystem).
Journalled Block Device driver loaded
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 136k freed
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 15:37:48 Oct 29 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 10 for device 00:07.3
PCI: Sharing IRQ 10 with 00:07.2
usb-uhci.c: USB UHCI at I/O 0xec00, IRQ 10
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
PCI: Found IRQ 10 for device 00:07.2
PCI: Sharing IRQ 10 with 00:07.3
usb-uhci.c: USB UHCI at I/O 0xe800, IRQ 10
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 2
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
Adding Swap: 979924k swap-space (priority -1)
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,65), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
|
Thanks for the ideas, and I will be happy to post anything you think may be of help.
Last edited by Donboy; 08-08-2004 at 09:00 PM.
|
|
|
08-08-2004, 09:39 PM
|
#4
|
Senior Member
Registered: Jun 2004
Location: Australia
Distribution: Mandriva/Slack - KDE
Posts: 1,672
Rep:
|
Sheesh, that was confusing. I thought this was a listing from my own box for a minute and got totally confused... almost the same VIA chipset (I have the 686a) and my drives are:
hda: IC35L040AVVN07-0, ATA DISK drive
hdb: WDC WD800JB-00FMA0, ATA DISK drive
almost the same as well!!! Well, at a quick glance.
Very confusing for me. Sorry, nothing on the problem tho. Have you ever had any trouble with the IBM drive? I've never fully trusted mine; it makes some funny noises and seems to reset itself when it's working a lot.... Probably unrelated to this problem tho...
|
|
|
08-08-2004, 10:10 PM
|
#5
|
Member
Registered: Aug 2003
Location: Little Rock, Arkansas
Distribution: RH, Fedora, Suse, AIX
Posts: 736
Original Poster
Rep:
|
Well, I don't know. You see, originally this drive came as a USB device, but I got sick of running it like that, took the drive out of the case (rather easily too) and mounted it as a slave in one of my machines for a long time. During that time it was used as a backup drive and it's worked pretty well for that. However, now I have removed all the backups, formatted it to ext3 and loaded an OS, and now it's my master drive in another machine. So I really haven't had that much experience with this drive while running an OS on it, but I can tell you that mine has never made any noises, and in fact runs pretty quiet.
The 160gig I picked up at Sams for a decent price (I think, anyway) and it's doing fine also, but again, using it as a backup drive, so don't know how well it would perform with an OS on it.
My motherboard is a stanky old ASRock mobo that carried an Athlon chip. I say stanky because it doesn't have an AGP slot and has only 2 measly PCI slots, but thankfully there is a lot of stuff on the board itself like sound and video. For a backup server it's more than adequate and didn't cost me a lot. But anyway, I digress.
Amazing coincidence.
|
|
|
08-08-2004, 10:36 PM
|
#6
|
Senior Member
Registered: Jun 2004
Location: Australia
Distribution: Mandriva/Slack - KDE
Posts: 1,672
Rep:
|
I have the OS spread over both and use them heavily (tho the WD is new) and can't really complain. Just nerves maybe, as I got the IBM for free (sealed in a bag tho). It runs quiet too, except on rare occasions it goes CHWEEEAPWEEEAP-CLICK CHWEEEAPWEEEAP-CLICK ... then quiet again.
I hold my breath a moment, then continue on working.
Anyway, better get back to the real issue here...
|
|
|
08-15-2004, 02:50 PM
|
#7
|
Member
Registered: Aug 2003
Location: Little Rock, Arkansas
Distribution: RH, Fedora, Suse, AIX
Posts: 736
Original Poster
Rep:
|
Just wanted to give an update on my problem here.
I rearranged my backups from "push" type to "pull" type. So now everything seems to be OK, as far as I can tell.
Now I am doing it so that the source computers (containing the data that needs to be backed up) are being exported through /etc/exports and the backup machine is pulling the data from the remote machines to the local machine.
During the runs, the destination machine runs at about 80% CPU and never uses more than the RAM I have installed. It uses all the RAM I have available but never taps into the swap space... swap usage always remains zero. On the source machines, there are about 6 or 8 NFS daemons that are using zero memory, and each of them uses less than 1% CPU.
No more kernel errors, but I'm going to give it another week and see what happens.
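For anyone wanting to try the same "pull" arrangement, here is a rough sketch of what the job on the backup machine could look like, written in Python. The paths, the function names, and the choice of the -a (archive) flag are all assumptions; the actual command line used in this setup is not given above. The point of pulling is that rsync writes to the local disk directly, so the writes no longer go through the nfsd_write path that appears in the call trace from the first post.

```python
import subprocess

def build_pull_command(nfs_mount: str, dest: str) -> list:
    """rsync argv that copies from a mounted NFS export into a local directory."""
    # A trailing slash on the source copies its contents, not the directory itself.
    return ["rsync", "-a", nfs_mount.rstrip("/") + "/", dest]

def run_pull(nfs_mount: str, dest: str) -> int:
    """Run the copy and return rsync's exit status (0 means success)."""
    return subprocess.run(build_pull_command(nfs_mount, dest)).returncode

# Hypothetical paths; the thread does not give the real ones.
print(build_pull_command("/mnt/source1", "/backup/source1"))
```

A cron entry on the backup machine would then call run_pull once per exported source directory.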
|
|
|
All times are GMT -5.