LinuxQuestions.org

JimBass 01-17-2006 10:34 PM

What hardware causes my kernel panic?
 
Ladies and Gentlemen,

I am having extremely frequent kernel panics (within 10 minutes of booting), and they are caused by some piece of misbehaving hardware; I am just not sure which. I know it is hardware because it happens with all 3 of my kernels (2.6.2 (stock Debian), 2.6.13-4 and 2.6.15) as well as when I run under Knoppix.

When the panic happens before I run startx, I usually get a printout of the panic. The printout is not consistent, but the most common reported reason is "Kernel panic - not syncing: CPU context corrupt". If the panic happens in X, I usually either freeze up hard (everything stuck, Num Lock stuck on, Ctrl-Alt-F2 has no effect, Ctrl-Alt-Del has no effect), or it freezes for about 20 seconds and then reboots.
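
Because the printout varies between crashes, it can help to count which reason comes up most often. The sketch below assumes the panic text has already been captured somewhere readable (for example on a second machine over a serial console, since a panicking kernel usually cannot write its log to disk). `tally_panic_reasons` is a hypothetical helper, and the sample crash text is made up for illustration.

```python
import re
from collections import Counter

# Matches the panic line printed by 2.6-era kernels, e.g.
# "Kernel panic - not syncing: CPU context corrupt"
PANIC_RE = re.compile(r"Kernel panic - not syncing: (.+)")


def tally_panic_reasons(log_text: str) -> Counter:
    """Count how often each panic reason appears in captured console text."""
    reasons: Counter = Counter()
    for line in log_text.splitlines():
        match = PANIC_RE.search(line)
        if match:
            reasons[match.group(1).strip()] += 1
    return reasons


if __name__ == "__main__":
    # Made-up console capture from three crashes, for illustration only.
    sample = (
        "Kernel panic - not syncing: CPU context corrupt\n"
        "Kernel panic - not syncing: Fatal exception in interrupt\n"
        "Kernel panic - not syncing: CPU context corrupt\n"
    )
    for reason, count in tally_panic_reasons(sample).most_common():
        print(f"{count}x {reason}")
```

Run on the sample, this prints "2x CPU context corrupt" followed by "1x Fatal exception in interrupt".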

Starting the machine is also a circus. Flipping the switch always gets both case fans and the processor fan spinning, and the DVD drives light up and can open and close, but the BIOS never loads. That happens about 10-20 times before something catches and it boots. I suspected the video card and swapped it, but that wasn't my problem. I also ran memtest86 on the RAM, and it passed.

I would like your help to figure out what is failing so that I can replace it. I can't afford to rebuild the entire box, both because of the cash and the time it would take. Here is the requested info from the sticky:

LSPCI
Code:

0000:00:00.0 Host bridge: Silicon Integrated Systems [SiS] 755 Host (rev 01)
0000:00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202
0000:00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS964 [MuTIOL Media IO] (rev 36)
0000:00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev 01)
0000:00:03.0 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.1 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.2 USB Controller: Silicon Integrated Systems [SiS] USB 1.0 Controller (rev 0f)
0000:00:03.3 USB Controller: Silicon Integrated Systems [SiS] USB 2.0 Controller
0000:00:04.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 91)
0000:00:05.0 RAID bus controller: Silicon Integrated Systems [SiS] RAID bus controller 180 SATA/PATA  [SiS] (rev 01)
0000:00:09.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400] (rev b2)
0000:00:0b.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
0000:00:0b.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
0000:00:0b.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control

LSUSB
Code:

Bus 004 Device 001: ID 0000:0000
Bus 003 Device 001: ID 0000:0000
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 003: ID 1241:1155 Belkin
Bus 001 Device 001: ID 0000:0000

DMESG
Code:

BogoMIPS (lpj=8802842)
Security Framework v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 078bfbff e1d3fbff 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 078bfbff e1d3fbff 00000000 00000000 00000000 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps: 078bfbff e1d3fbff 00000000 00000010 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
mtrr: v2.0 (20020519)
CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
 tbxface-0109 [02] load_tables          : ACPI Tables successfully acquired
Parsing all Control Methods:............................................................................................................................................
Table [DSDT](id 0005) - 475 Objects with 49 Devices 140 Methods 30 Regions
ACPI Namespace successfully loaded at root c038475c
evxfevnt-0091 [03] enable                : Transition to ACPI mode successful
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking if image is initramfs... it is
Freeing initrd memory: 970k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfb4f0, last bus=1
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
evgpeblk-0988 [06] ev_create_gpe_block  : GPE 00 to 0F [_GPE] 2 regs on int 0x9
evgpeblk-0996 [06] ev_create_gpe_block  : Found 10 Wake, Enabled 2 Runtime GPEs in this block
evgpeblk-0988 [06] ev_create_gpe_block  : GPE 10 to 1F [_GPE] 2 regs on int 0x9
evgpeblk-0996 [06] ev_create_gpe_block  : Found 1 Wake, Enabled 0 Runtime GPEs in this block
Completing Region/Field/Buffer/Package initialization:...................................................................................
Initialized 30/30 Regions 14/14 Fields 19/19 Buffers 20/20 Packages (484 nodes)
Executing all Device _STA and_INI methods:.....................................................
53 Devices found containing: 53 _STA, 2 _INI methods
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
Boot video device is 0000:00:09.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs *3 4 5 6 7 9 10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: ACPI device : hid PNP0C01
pnp: ACPI device : hid PNP0C02
pnp: ACPI device : hid PNP0200
pnp: ACPI device : hid PNP0B00
pnp: ACPI device : hid PNP0800
pnp: ACPI device : hid PNP0C04
pnp: ACPI device : hid PNP0700
pnp: ACPI device : hid PNP0501
pnp: ACPI device : hid PNP0401
pnp: ACPI device : hid PNP0303
pnp: PnP ACPI: found 10 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
pnp: the driver 'system' has been registered
pnp: match found with the PnP device '00:00' and the driver 'system'
pnp: match found with the PnP device '00:01' and the driver 'system'
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected AGP bridge 0
agpgart: AGP aperture is 64M @ 0xe8000000
[drm] Initialized drm 1.0.0 20040925
pnp: the driver 'i8042 kbd' has been registered
pnp: match found with the PnP device '00:09' and the driver 'i8042 kbd'
pnp: the driver 'i8042 aux' has been registered
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 controller doesn't have AUX irq; using default 12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
pnp: the driver 'serial' has been registered
pnp: match found with the PnP device '00:07' and the driver 'serial'
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
usbmon: debugfs is not available
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 7, 524288 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 15
Using IPI Shortcut mode
Freeing unused kernel memory: 136k freed
mice: PS/2 mouse device common for all mice
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SIS5513: IDE controller at PCI slot 0000:00:02.5
SIS5513: chipset revision 1
SIS5513: not 100% native mode: will probe irqs later
SIS5513: SiS 962/963 MuTIOL IDE UDMA133 controller
    ide0: BM-DMA at 0x4000-0x4007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0x4008-0x400f, BIOS settings: hdc:DMA, hdd:DMA
Probing IDE interface ide0...
input: AT Translated Set 2 keyboard as /class/input/input0
hda: WDC WD2500JB-00GVA0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: IDE DVD-ROM 16X, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 1024KiB
hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3
ReiserFS: hda3: found reiserfs format "3.6" with standard journal
ReiserFS: hda3: warning: CONFIG_REISERFS_CHECK is set ON
ReiserFS: hda3: warning: - it is slow mode for debugging.
ReiserFS: hda3: using ordered data mode
ReiserFS: hda3: journal params: device hda3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: hda3: checking transaction log (hda3)
ReiserFS: hda3: journal-1153: found in header: first_unflushed_offset 6985, last_flushed_trans_id 413919
ReiserFS: hda3: journal-1006: found valid transaction start offset 6985, len 1 id 413920
ReiserFS: hda3: journal-1206: Starting replay from offset 1777772863167305, trans_id 4035481600
ReiserFS: hda3: journal-1037: journal_read_transaction, offset 6985, len 1 mount_id 309
ReiserFS: hda3: journal-1095: setting journal start to offset 6988
ReiserFS: hda3: journal-1037: journal_read_transaction, offset 6988, len 7 mount_id 309
ReiserFS: hda3: journal-1095: setting journal start to offset 6988
ReiserFS: hda3: journal-1037: journal_read_transaction, offset 6988, len 7 mount_id 309
ReiserFS: hda3: journal-1095: setting journal start to offset 6997
ReiserFS: hda3: journal-1037: journal_read_transaction, offset 6997, len 1936655564 mount_id 7172724
ReiserFS: hda3: journal-1146: journal_read_trans skipping because 7172724 is != newest_mount_id 309
ReiserFS: hda3: journal-1299: Setting newest_mount_id to 310
ReiserFS: hda3: replayed 2 transactions in 0 seconds
ReiserFS: hda3: Using r5 hash to sort names
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 20 (level, low) -> IRQ 16
ohci_hcd 0000:00:03.0: OHCI Host Controller
ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 1
ohci_hcd 0000:00:03.0: irq 16, io mem 0xed025000
SCSI subsystem initialized
libata version 1.20 loaded.
hdc: ATAPI 48X DVD-ROM drive, 512kB Cache
Uniform CD-ROM driver Revision: 3.20
hdd: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2000kB Cache
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
sis900.c: v1.08.08 Jan. 22 2005
input: PC Speaker as /class/input/input1
Real Time Clock Driver v1.12
ACPI: PCI Interrupt 0000:00:03.1[B] -> GSI 21 (level, low) -> IRQ 17
ohci_hcd 0000:00:03.1: OHCI Host Controller
ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:03.1: irq 17, io mem 0xed026000
Floppy drive(s): fd0 is 1.44M
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
FDC 0 is a post-1991 82077
ACPI: PCI Interrupt 0000:00:03.2[C] -> GSI 22 (level, low) -> IRQ 18
ohci_hcd 0000:00:03.2: OHCI Host Controller
ohci_hcd 0000:00:03.2: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:03.2: irq 18, io mem 0xed027000
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
usb 1-1: new low speed USB device using ohci_hcd and address 2
gameport: EMU10K1 is pci0000:00:0b.1/gameport0, io 0xe800, speed 1242kHz
sata_sis 0000:00:05.0: version 0.5
acpi_bus-0201 [01] bus_set_power        : Device is not power manageable
ACPI: PCI Interrupt 0000:00:05.0[A] -> GSI 17 (level, low) -> IRQ 19
sata_sis 0000:00:05.0: Detected SiS 180/181 chipset in SATA mode
ata1: SATA max UDMA/133 cmd 0xE100 ctl 0xE202 bmdma 0xE500 irq 19
ata2: SATA max UDMA/133 cmd 0xE300 ctl 0xE402 bmdma 0xE508 irq 19
acpi_bus-0201 [01] bus_set_power        : Device is not power manageable
ACPI: PCI Interrupt 0000:00:04.0[A] -> GSI 19 (level, low) -> IRQ 20
0000:00:04.0: Realtek RTL8201 PHY transceiver found at address 1.
0000:00:04.0: Using transceiver found at address 1 as default
eth0: SiS 900 PCI Fast Ethernet at 0xe000, IRQ 20, 00:11:5b:5d:2a:eb.
ata1: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:003f
ata1: dev 0 ATA-6, max UDMA/100, 488397168 sectors: LBA48
ata1(0): applying bridge limits
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sis
ata2: dev 0 cfg 49:2f00 82:3469 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:003f
ata2: dev 0 ATA-6, max UDMA/100, 488281250 sectors: LBA48
ata2(0): applying bridge limits
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sis
  Vendor: ATA      Model: WDC WD2500JD-00G  Rev: 02.0
  Type:  Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA      Model: WDC WD2500JD-75F  Rev: 02.0
  Type:  Direct-Access                      ANSI SCSI revision: 05
acpi_bus-0201 [01] bus_set_power        : Device is not power manageable
ACPI: PCI Interrupt 0000:00:03.3[D] -> GSI 23 (level, low) -> IRQ 21
ehci_hcd 0000:00:03.3: EHCI Host Controller
PCI: cache line size of 64 is not supported by device 0000:00:03.3
ehci_hcd 0000:00:03.3: new USB bus registered, assigned bus number 4
ehci_hcd 0000:00:03.3: irq 21, io mem 0xed029000
ehci_hcd 0000:00:03.3: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 8 ports detected
ACPI: PCI Interrupt 0000:00:0b.0[A] -> GSI 17 (level, low) -> IRQ 19
Installing spdif_bug patch: Audigy 2 ZS [SB0350]
usb 1-1: USB disconnect, address 2
usb 1-1: new low speed USB device using ohci_hcd and address 3
usbcore: registered new driver hiddev
input: HID 1241:1155 as /class/input/input2
input: USB HID v1.00 Mouse [HID 1241:1155] on usb-0000:00:03.0-1
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 488281250 512-byte hdwr sectors (250000 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 488281250 512-byte hdwr sectors (250000 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
Adding 1951888k swap on /dev/hda2.  Priority:-1 extents:1 across:1951888k
it87: Found IT8705F chip at 0x290, revision 2
it87-isa 9191-0290: Detected broken BIOS defaults, disabling PWM interface
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
ReiserFS: hda1: found reiserfs format "3.6" with standard journal
ReiserFS: hda1: warning: CONFIG_REISERFS_CHECK is set ON
ReiserFS: hda1: warning: - it is slow mode for debugging.
ReiserFS: hda1: using ordered data mode
ReiserFS: hda1: journal params: device hda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: hda1: checking transaction log (hda1)
ReiserFS: hda1: journal-1153: found in header: first_unflushed_offset 1674, last_flushed_trans_id 3110
ReiserFS: hda1: journal-1006: found valid transaction start offset 1674, len 1 id 518
ReiserFS: hda1: journal-1206: Starting replay from offset 13361643259530, trans_id 4039401472
ReiserFS: hda1: journal-1037: journal_read_transaction, offset 1674, len 1 mount_id 31
ReiserFS: hda1: journal-1039: journal_read_trans skipping because 1674 is too old
ReiserFS: hda1: journal-1299: Setting newest_mount_id to 300
ReiserFS: hda1: Using r5 hash to sort names
NET: Registered protocol family 17
eth0: Media Link On 100mbps full-duplex
ReiserFS: dm-0: found reiserfs format "3.6" with standard journal
ReiserFS: dm-0: warning: CONFIG_REISERFS_CHECK is set ON
ReiserFS: dm-0: warning: - it is slow mode for debugging.
ReiserFS: dm-0: using ordered data mode
ReiserFS: dm-0: journal params: device dm-0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: dm-0: checking transaction log (dm-0)
ReiserFS: dm-0: journal-1153: found in header: first_unflushed_offset 3865, last_flushed_trans_id 308503
ReiserFS: dm-0: journal-1006: found valid transaction start offset 3865, len 8 id 308504
ReiserFS: dm-0: journal-1206: Starting replay from offset 1325014590689049, trans_id 4040105984
ReiserFS: dm-0: journal-1037: journal_read_transaction, offset 3865, len 8 mount_id 274
ReiserFS: dm-0: journal-1095: setting journal start to offset 3875
ReiserFS: dm-0: journal-1037: journal_read_transaction, offset 3875, len 14 mount_id 274
ReiserFS: dm-0: journal-1095: setting journal start to offset 3891
ReiserFS: dm-0: journal-1037: journal_read_transaction, offset 3891, len 4 mount_id 274
ReiserFS: dm-0: journal-1095: setting journal start to offset 3897
ReiserFS: dm-0: journal-1037: journal_read_transaction, offset 3897, len 11 mount_id 274
ReiserFS: dm-0: journal-1095: setting journal start to offset 3910
ReiserFS: dm-0: journal-1037: journal_read_transaction, offset 3910, len 2557232 mount_id 2643093
ReiserFS: dm-0: journal-1146: journal_read_trans skipping because 2643093 is != newest_mount_id 274
ReiserFS: dm-0: journal-1299: Setting newest_mount_id to 275
ReiserFS: dm-0: replayed 4 transactions in 0 seconds
ReiserFS: dm-0: Using r5 hash to sort names
nvidia: module license 'NVIDIA' taints kernel.
ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 17 (level, low) -> IRQ 19
NVRM: loading NVIDIA Linux x86 NVIDIA Kernel Module  1.0-8178  Wed Dec 14 16:22:51 PST 2005

Thanks for any help, and I can provide more info as needed.

Peace,
JimBass

stress_junkie 01-18-2006 06:35 AM

I would first disconnect any devices connected to the motherboard. Remove any cards. Disconnect all of the disk drives. Remove the memory. Just have a bare motherboard with your video controller and your keyboard and mouse. Then turn on the power and see what happens. If that doesn't start properly, then you know that your problem is in the keyboard, mouse, video card, or the motherboard itself. That at least narrows the possible source of the problem to those few components. You could then try swapping each of these components to see if that fixes the problem. If not, then you might check the CMOS battery by removing it and seeing if the motherboard works properly. If all of that fails, then you may have one of the problem capacitors (bulging or leaking) on your motherboard.

Naturally, if your stripped system works, then you start attaching devices. Start with the memory. If that works, then add the boot disk. If that works, then add the DVD/CD-ROM drive, and so on, until either the system starts to act up again or you have the system restored and working properly. If you reconnect everything and the machine works properly, then it is possible that a connector had corroded and removing and reseating it cleaned the contacts.
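
The reassembly procedure above is essentially a linear search: start from the smallest configuration that should work, add one part at a time, and stop at the first addition that breaks things. Below is a minimal Python sketch, assuming a hypothetical `is_stable` callback that stands in for "power on and watch it" (in practice that step is done by hand). Because this fault is intermittent, each stage only counts as good after several clean runs; the default of 3 is an arbitrary choice, not a rule.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical label for the starting point: bare board, video card, keyboard, mouse.
BASE = "base configuration"


def find_first_bad_addition(
    components: Sequence[str],
    is_stable: Callable[[List[str]], bool],
    trials: int = 3,
) -> Optional[str]:
    """Add parts one at a time; return the first one that makes the box unstable.

    Returns BASE if the stripped-down system already fails, or None if the
    fully reassembled system passes every test.
    """

    def passes(config: List[str]) -> bool:
        # An intermittent fault can survive one test, so require several clean runs.
        return all(is_stable(list(config)) for _ in range(trials))

    installed: List[str] = []
    if not passes(installed):
        return BASE
    for part in components:
        installed.append(part)
        if not passes(installed):
            return part
    return None


if __name__ == "__main__":
    order = ["memory", "boot disk", "DVD drives", "SATA /home disks", "sound card"]

    # Stand-in for "power on and watch it": pretend the SATA disks trigger the fault.
    def fake_test(parts: List[str]) -> bool:
        return "SATA /home disks" not in parts

    print(find_first_bad_addition(order, fake_test))  # SATA /home disks
```

The order of `components` matters: putting the parts needed to boot first (memory, boot disk) keeps every intermediate stage testable.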

Half_Elf 01-18-2006 07:11 AM

This might sound weird, but are you sure your PSU (power supply unit) is not broken? I had a similar problem, and it turned out that the PSU output was going up and down, causing certain hardware to turn off (to fail, from the kernel's point of view), and booting was very difficult (not enough power to get everything online). My first guess would be to try another PSU (just borrow one somewhere; don't buy anything unless you are sure you found the bug, or you are rich) with all the hardware plugged in from the start. If it boots properly several times in a row, then you have probably found the bug.

<edit>Posted before my first coffee --> typo</edit>
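
One way to look for the kind of fluctuation described above is to log the board's voltage readings over time and compare the extremes with the allowed range. The dmesg earlier in the thread shows an IT8705F sensor chip (driven by `it87`), which lm-sensors can read. The sketch assumes the readings have already been collected into a dict, and it uses the ±5% tolerance that, as I understand it, the ATX design guide specifies for these rails. Motherboard sensors are approximate and sample too slowly to see ripple, so a clean result is weak evidence; swapping the PSU remains the real test.

```python
from typing import Dict, List, Tuple

# Nominal ATX output rails, in volts.
NOMINAL_RAILS: Dict[str, float] = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
# Assumed +/-5% regulation limit for these rails (check the ATX design guide
# revision that matches your power supply).
TOLERANCE = 0.05


def out_of_spec(readings: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {rail: (lowest, highest)} for rails whose samples leave nominal +/-5%."""
    flagged: Dict[str, Tuple[float, float]] = {}
    for rail, samples in readings.items():
        nominal = NOMINAL_RAILS[rail]
        low_limit = nominal * (1 - TOLERANCE)
        high_limit = nominal * (1 + TOLERANCE)
        lowest, highest = min(samples), max(samples)
        if lowest < low_limit or highest > high_limit:
            flagged[rail] = (lowest, highest)
    return flagged


if __name__ == "__main__":
    # Hypothetical samples logged over a few minutes; not readings from this machine.
    samples = {
        "+3.3V": [3.31, 3.28, 3.30],
        "+5V": [5.02, 4.97, 4.99],
        "+12V": [12.10, 11.20, 11.90],
    }
    print(out_of_spec(samples))  # {'+12V': (11.2, 12.1)}
```

In the sample, only the +12 V rail is flagged, because its 11.2 V dip falls below the 11.4 V lower bound.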

Xerop 01-18-2006 09:26 AM

Half_Elf is right. I would suspect the power supply first, especially if it came with the case (i.e., it was built into the case when purchased, not sold separately).

khaleel5000 01-18-2006 12:31 PM

I also experienced a problem, though no kernel panics: my EIDE devices were not detected because of "earth" (that's what we call it). To check for that, just test with an electrical tester whether your motherboard's case carries current or not; if it does, then that might be the problem...

JimBass 01-18-2006 02:00 PM

Thanks for the suggestions so far, guys. I will try putting it together piece by piece when I get home from work. I had forgotten to mention the power situation. I have a Chiefmax 650 W ATX power supply, purchased separately from the case and installed by me. That isn't to say it couldn't be failing, but it is both relatively new and seriously overpowered for what I ask of it.

I will take a spare power supply home from work, as I don't think I have a spare at home with the right plugs for my MOBO. The boot disk is a 250 GB IDE drive, and the 2x250 GB SATA drives are managed by LVM and hold only the /home directory.

Also, I forgot to mention that I run the TechMon theme under SuperKaramba, and it keeps a good eye on my hardware. The processor usually stays between 42 and 55 degrees Celsius, and the board and case stay around 30. All 3 fans rotate at right around 3000 rpm, so I don't think overheating is part of the deal. We shall see, however. Thanks again for the replies.

Peace,
JimBass

JimBass 01-20-2006 09:19 AM

Motherboard or Processor
 
I tried switching power supplies. The first attempt with the replacement PSU booted, then failed because there were no drives in. Adding the boot drive allowed a boot, then adding the SATA drives for my /home caused it to lock up. I thought that might have just been something funky, but when I restarted I was stuck with the same old symptoms: the case has power, but the BIOS won't start and the monitor reports no signal. I switched back to the 650 W PSU, and it booted again. I went into the BIOS and it crashed there: I was scrolling through a list of PCI options, and suddenly I couldn't move the cursor.

Switching between power supplies, I had varying results. Sometimes it again wouldn't boot at all. Sometimes it would boot, get past the BIOS config option, then lock up at the message "verifying DMI pool data". In none of the crashes listed here do I get a kernel panic.

I also used 2 different known good video cards during the testing, and 2 keyboards.

I am guessing at this point that either the MOBO or the processor is failing. I am thinking it is the MOBO, because if the processor failed I would expect it to be an all-on or all-off kind of deal. By that I mean it couldn't fail and then come back. I have only seen processors burn out, not warble back and forth between working and not.
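
The reasoning above is a process of elimination: every part swapped for a known-good one while the fault persisted is probably not the culprit. Here is a small sketch of that bookkeeping, assuming exactly one faulty part (two failing parts, such as a weak PSU plus a bad board, would defeat it). The parts list is a simplified, hypothetical summary of this machine.

```python
from typing import Dict, Set

# Simplified, hypothetical parts list for this machine (drives and cards left out).
ALL_PARTS: Set[str] = {"power supply", "video card", "keyboard", "RAM", "motherboard", "CPU"}


def remaining_suspects(all_parts: Set[str], swap_results: Dict[str, bool]) -> Set[str]:
    """Narrow down the suspects, assuming exactly one faulty part.

    swap_results maps each part that was replaced by a known-good one to True
    if the fault persisted afterwards, or False if the fault went away.
    """
    # A swap that made the fault disappear points straight at the removed part.
    fixed_by = {part for part, persisted in swap_results.items() if not persisted}
    if fixed_by:
        return fixed_by
    # Otherwise rule out every part whose known-good replacement changed nothing.
    cleared = {part for part, persisted in swap_results.items() if persisted}
    return set(all_parts) - cleared


if __name__ == "__main__":
    # Swaps reported in this post: the fault survived each one.
    swaps = {"power supply": True, "video card": True, "keyboard": True}
    print(sorted(remaining_suspects(ALL_PARTS, swaps)))  # ['CPU', 'RAM', 'motherboard']
```

With only the PSU, video card and keyboard swapped, the RAM stays on the suspect list alongside the board and CPU, because one module was removed but never replaced with a known-good one.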

I don't have access to a spare Socket 754 board, so I can't remove the processor from the board for testing elsewhere. Does anyone have a suggestion about how to tell whether the board or the processor is failing?

Thanks again for all the help!

Peace,
JimBass

Half_Elf 01-20-2006 09:33 AM

The motherboard is really the hardest to test. Usually, if everything else seems fine, you have to conclude that the mobo is fried... I don't really see any other way to do it. I doubt the processor has anything to do with it; I have never heard of broken processors, but maybe, who knows?

By the way, you never mentioned the RAM; have you tried another RAM module? That kind of error might be caused by defective RAM. In my experience, memtest86 and other tests are often inaccurate: they may report that everything is fine when it's not.

JimBass 01-20-2006 09:52 AM

I have 2 RAM modules, a 512 MB and a 256 MB. I ripped out the 256 and left just the 512 in during most of this testing. I can certainly go the other way and try just the 256. Memtest86 gave them a pass, but it wouldn't hurt to test each one separately on my own.

Thanks for the suggestion!

Peace,
JimBass

khaleel5000 01-20-2006 10:23 AM

My brother, then I am nearly sure the problem lies in the earth thing. (Exactly that happened to me, without a BIOS crash or anything: either it ran Windows or it did not detect any HDs at all.) So make sure your case does not show any sign of power when you test it with a tester (I am using the terms of my region; they may be different where you are).
Just try to run the system with your motherboard resting on wood or any other insulator, and make sure the back of your mobo ;) is covered with foam or something like that, which manufacturers give with their boards (some don't, so get one from somewhere, or rest it on cloth or paper; the goal is to keep even the slightest current away from this path). And do try running with only 256 MB of RAM.

khaleel5000 01-20-2006 10:28 AM

Also, it won't hurt to put all the EIDE drives on a piece of paper too.
Believe me, here in Pakistan no one really knew what was happening, so I used to put foam under my mobo and put the HDs and CD-ROM on paper (separate sheets, of course), and the PSU on a separate sheet, as my PSU had an earth problem and my dad was too stubborn to buy me a new one. I was 13 at that time, and believe me, we don't have rights until we're 99+ here.

khaleel5000 01-20-2006 10:31 AM

By the way, has this started happening recently, or has it been happening since you bought your system?

*** Sorry if I'm a little irritating. ***

JimBass 01-20-2006 12:02 PM

This has been an escalating problem for the past 5-6 weeks. The machine sits on my desk, which is wood. It is literally a desktop on a desktop. Khaleel5000, I'm not quite sure what you mean by earth. Do you mean something like "dirt" that gets in the way of the connecting pins? Power-wise I think I'm cool; I'm behind a surge protector. A router, speakers and an alarm clock are all behind the same surge protector, and none of them have had a problem, just the PC.

Thanks Again.

Peace,
JimBass

Electro 01-20-2006 06:49 PM

Even if your power supply is crap, it will still output about 50%, which will be more than enough for the computer. The symptoms sound like a defect or a bad processor. Before buying a new processor, put the memory in another system and run memtest86, because replacing a processor is more costly than replacing memory. I have seen some cases where a malfunctioning USB device was the culprit.

Earth means ground. What khaleel5000 is saying is to check the electrical wiring of the room the computer is in with a special tester that checks whether it is wired correctly. Some contractors do not do it right, in old or new houses.

Just to note, motherboards based on SiS chipsets should never be considered when deciding what to buy. SiS chipsets are poorly supported, and their stability and reliability are very poor.

khaleel5000 01-20-2006 11:09 PM

Thanks, Electro, I meant ground, but just check it around your case. I don't know about surge protectors. If it's escalating, then I'm pretty sure it's the same problem: just take out your board, processor, RAM, HDs, CD-ROMs and power supply and put them on a table, but make sure you put newspapers (2-3 sheets) under each one [honestly, it's not a prank]. Then check. Otherwise it could be the BIOS, but inshallah your problem will be solved when you do the newspaper thing. Instead of newspaper you can use cloth, but don't put the parts directly on wood (bad omen). Please do that.


All times are GMT -5.