Yesterday I had a Red Hat server crash on me. After the crash, the only way I can get it to boot is with a uniprocessor kernel.
With the SMP kernel (2.4.18-19.7) the server boots the first CPU just fine, but hangs on the second with:
Booting Processor 2/6 eip 2000
I did get the SMP kernel to load once, but the server was down again within 2 hours, with the same error as above on restart.
To add more confusion to the situation, after the RAID was done resyncing, I decided to run quotacheck. As soon as I ran it, the server booted me and proceeded to crash.
How would I check the hard drives? Maybe one is bad.
The setup is 2x Xeon, 2GB, with HyperThreading (now disabled, but it was on)
Red Hat 7.2
2x 80GB HD running software RAID-1
Any ideas on why the SMP kernel would suddenly stop working? Could be a bad CPU, but how would I test it?
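On the hard drive question, one check that can be scripted is whether both mirrors of each RAID-1 set are still active according to /proc/mdstat. The following is a minimal sketch, not a full disk test: it assumes the usual md status format in which each array's block line ends with a map such as [UU] (all mirrors up) or [U_] (one mirror missing). The function name `degraded_arrays` and the `SAMPLE` text are made up for illustration, and the degraded md0 in the sample is hypothetical, not the state of this server.

```python
import re

# Illustrative /proc/mdstat text (assumed format; md0 is shown degraded
# on purpose and does NOT reflect the real state of this server).
SAMPLE = """\
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hdd1[1] hda1[0]
      48064 blocks [2/2] [UU]
md0 : active raid1 hdd2[1] hda2[0]
      73730240 blocks [2/1] [U_]
unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status map contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # A line such as "md0 : active raid1 ..." starts a new array.
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status map is a bracketed run of only 'U' and '_' characters.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE))
```

On the real machine the text would come from reading /proc/mdstat instead of `SAMPLE`. This only shows whether md has dropped a mirror; it says nothing about the disk surface or the CPUs, so a clean result would not rule out bad hardware.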
dmesg:
=======
ST 2003
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000080000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
On node 0 totalpages: 229376
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 0 pages.
Kernel command line: auto BOOT_IMAGE=linux-2.4.18-up ro root=900 BOOT_FILE=/boot/vmlinuz-2.4.18-26.7.x
Initializing CPU#0
Detected 1999.820 MHz processor.
Speakup v-1.00 CVS: Tue Jun 11 14:22:53 EDT 2002 : initialized
Console: colour VGA+ 80x25
Calibrating delay loop... 3984.58 BogoMIPS
Memory: 900472k/917504k available (1160k kernel code, 14468k reserved, 989k data, 152k init, 0k highmem)
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount cache hash table entries: 16384 (order: 5, 131072 bytes)
ramfs: mounted with options: <defaults>
ramfs: max_pages=112831 max_file_pages=0 max_inodes=0 max_dentries=112831
Buffer cache hash table entries: 65536 (order: 6, 262144 bytes)
Page-cache hash table entries: 262144 (order: 8, 1048576 bytes)
CPU: Before vendor init, caps: bfebfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 0K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: After vendor init, caps: bfebfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: bfebfbff 00000000 00000000 00000000
CPU: Common caps: bfebfbff 00000000 00000000 00000000
CPU: Intel(R) Xeon(TM) CPU 2.00GHz stepping 07
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
Checking for popad bug... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=4
PCI: Using configuration type 1
PCI: Probing PCI hardware
Transparent bridge - Intel Corp. 82801BA/CA/DB PCI Bridge
PCI: Using IRQ router PIIX [8086/2480] at 00:1f.0
PCI: Found IRQ 11 for device 00:1f.1
PCI: Sharing IRQ 11 with 01:02.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
speakup: initialized device: /dev/synth, node (MAJOR 10, MINOR 25)
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS not found.
Starting kswapd
VFS: Diskquotas version dquot_6.5.0 initialized
Detected PS/2 Mouse Port.
pty: 512 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev f9
PCI: Enabling device 00:1f.1 (0005 -> 0007)
PCI: Found IRQ 11 for device 00:1f.1
PCI: Sharing IRQ 11 with 01:02.0
PIIX4: chipset revision 2
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
hda: ST380021A, ATA DISK drive
hdc: SR244W, ATAPI CD/DVD-ROM drive
hdd: ST380021A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
blk: queue c0377604, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c0377604, I/O limit 4095Mb (mask 0xffffffff)
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=9729/255/63, UDMA(100)
blk: queue c0377aa8, I/O limit 4095Mb (mask 0xffffffff)
blk: queue c0377aa8, I/O limit 4095Mb (mask 0xffffffff)
hdd: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=155061/16/63, UDMA(100)
ide-floppy driver 0.99.newide
Partition check:
hda: hda1 hda2 hda3
hdd: [PTBL] [9729/255/63] hdd1 hdd2
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 0000001d]
[events: 0000001d]
[events: 0000001d]
[events: 0000001d]
md: autorun ...
md: considering hdd2 ...
md: adding hdd2 ...
md: adding hda2 ...
md: created md0
md: bind<hda2,1>
md: bind<hdd2,2>
md: running: <hdd2><hda2>
md: hdd2's event counter: 0000001d
md: hda2's event counter: 0000001d
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdd2,1>
md: export_rdev(hdd2)
md: unbind<hda2,0>
md: export_rdev(hda2)
md: considering hdd1 ...
md: adding hdd1 ...
md: adding hda1 ...
md: created md1
md: bind<hda1,1>
md: bind<hdd1,2>
md: running: <hdd1><hda1>
md: hdd1's event counter: 0000001d
md: hda1's event counter: 0000001d
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hdd1,1>
md: export_rdev(hdd1)
md: unbind<hda1,0>
md: export_rdev(hda1)
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 123k freed
VFS: Mounted root (ext2 filesystem).
md: raid1 personality registered as nr 3
Journalled Block Device driver loaded
md: Autodetecting RAID arrays.
[events: 0000001d]
[events: 0000001d]
[events: 0000001d]
[events: 0000001d]
md: autorun ...
md: considering hda1 ...
md: adding hda1 ...
md: adding hdd1 ...
md: created md1
md: bind<hdd1,1>
md: bind<hda1,2>
md: running: <hda1><hdd1>
md: hda1's event counter: 0000001d
md: hdd1's event counter: 0000001d
md: md1: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md1: max total readahead window set to 508k
md1: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda1 operational as mirror 0
raid1: device hdd1 operational as mirror 1
raid1: raid set md1 not clean; reconstructing mirrors
raid1: raid set md1 active with 2 out of 2 mirrors
md: updating md1 RAID superblock on device
md: hda1 [events: 0000001e]<6>(write) hda1's sb offset: 48064
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 508k window, over a total of 48064 blocks.
md: hdd1 [events: 0000001e]<6>(write) hdd1's sb offset: 48064
md: considering hda2 ...
md: adding hda2 ...
md: adding hdd2 ...
md: created md0
md: bind<hdd2,1>
md: bind<hda2,2>
md: running: <hda2><hdd2>
md: hda2's event counter: 0000001d
md: hdd2's event counter: 0000001d
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 508k
md0: 1 data-disks, max readahead per data-disk: 508k
raid1: device hda2 operational as mirror 0
raid1: device hdd2 operational as mirror 1
raid1: raid set md0 not clean; reconstructing mirrors
raid1: raid set md0 active with 2 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hda2 [events: 0000001e]<6>(write) hda2's sb offset: 73730240
md: delaying resync of md0 until md1 has finished resync (they share one or more physical units)
md: hdd2 [events: 0000001e]<6>(write) hdd2's sb offset: 73730240
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
md: md1: sync done.
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 508k window, over a total of 73730240 blocks.
kjournald starting. Commit interval 5 seconds
EXT3-fs: md(9,0): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 7684894
ext3_orphan_cleanup: deleting unreferenced inode 7684568
EXT3-fs: md(9,0): 2 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 152k freed
Adding Swap: 1052248k swap-space (priority -1)
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 10:19:38 Feb 24 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 5 for device 00:1d.0
PCI: Sharing IRQ 5 with 04:01.0
PCI: Setting latency timer of device 00:1d.0 to 64
usb-uhci.c: USB UHCI at I/O 0xec00, IRQ 5
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,0), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on md(9,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
parport0: PC-style at 0x378 [PCSPP]
eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
PCI: Found IRQ 9 for device 01:01.0
PCI: Sharing IRQ 9 with 00:1f.3
divert: allocating divert_blk for eth0
eth0: Intel Corp. 82557/8/9 [Ethernet Pro 100], 00:E0:81:23:FD:4A, IRQ 9.
Board assembly 567812-052, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0xd0a6c714).