LinuxQuestions.org
Old 04-06-2010, 06:02 PM   #1
ccar_support
Member
 
Registered: Jan 2007
Posts: 40

Rep: Reputation: 1
RHEL5.5 memory/processing error


System specifications:
Gigabyte GA-EP45T-UD3LR
Crucial PC3-10600U memory - 4 x 2GB modules
Intel Core 2 quad processor
OS = RHEL 5.5 (32 bit) most recent kernel
Base install with IDL installed in /usr/local
16GB of swap space

Here's what is happening:
A user was successfully running an IDL code on an older system with much less memory and a far slower processor. The code ran, but he needed a system that would process it faster, so I built one with the above specs. On the new system the code runs fine for an hour or two and then the system freezes.

/var/log/messages reports:
Mar 14 18:09:42 saber kernel: Bad page state in process 'kswapd0'
Mar 14 18:09:42 saber kernel: page:c1fcb8e0 flags:0xc0010008 mapping:00000000 mapcount:-8388608 count:0 (Tainted: P B)
Mar 14 18:09:42 saber kernel: Trying to fix it up, but a reboot is needed
Mar 14 18:09:42 saber kernel: Backtrace:
Mar 14 18:09:42 saber kernel: [<c045a13a>] bad_page+0x52/0x79
Mar 14 18:09:42 saber kernel: [<c045a45e>] free_hot_cold_page+0x6a/0x140
Mar 14 18:09:42 saber kernel: [<c045a548>] __pagevec_free+0x14/0x1a
Mar 14 18:09:42 saber kernel: [<c045cc7a>] __pagevec_release_nonlru+0x61/0x6c
Mar 14 18:09:42 saber kernel: [<c045dbdf>] remove_mapping+0x65/0x88
Mar 14 18:09:42 saber kernel: [<c045e1e2>] shrink_inactive_list+0x5e0/0x7d2
Mar 14 18:09:42 saber kernel: [<c045da59>] shrink_active_list+0x33d/0x3e3
Mar 14 18:09:42 saber kernel: [<c045e49e>] shrink_zone+0xca/0x12f
Mar 14 18:09:42 saber kernel: [<c045e945>] kswapd+0x28a/0x3ab
Mar 14 18:09:42 saber kernel: [<c043603f>] autoremove_wake_function+0x0/0x2d
Mar 14 18:09:42 saber kernel: [<c045e6bb>] kswapd+0x0/0x3ab
Mar 14 18:09:42 saber kernel: [<c0435f7d>] kthread+0xc0/0xeb
Mar 14 18:09:42 saber kernel: [<c0435ebd>] kthread+0x0/0xeb
Mar 14 18:09:42 saber kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Mar 14 18:09:42 saber kernel: =======================
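
For anyone who wants to pull these reports out of a long /var/log/messages, here is a minimal Python sketch. It is not something used in this thread: the function name `find_bad_page_events` and both regular expressions are assumptions modelled only on the two log lines quoted above, and other kernel versions may format them differently.

```python
import re

# Both patterns are assumptions based on the RHEL 5 log lines quoted above.
BAD_PAGE_RE = re.compile(r"kernel: Bad page state in process '([^']+)'")
PAGE_RE = re.compile(
    r"kernel: page:(\w+) flags:(\w+) mapping:(\w+) "
    r"mapcount:(-?\d+) count:(-?\d+)"
)


def find_bad_page_events(log_text):
    """Return one dict per 'Bad page state' report found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = BAD_PAGE_RE.search(line)
        if m:
            # Start a new event; the page details follow on a later line.
            events.append({"process": m.group(1)})
            continue
        m = PAGE_RE.search(line)
        if m and events and "page" not in events[-1]:
            page, flags, mapping, mapcount, count = m.groups()
            events[-1].update(
                page=page,
                flags=flags,
                mapping=mapping,
                mapcount=int(mapcount),
                count=int(count),
            )
    return events


if __name__ == "__main__":
    sample = (
        "Mar 14 18:09:42 saber kernel: Bad page state in process 'kswapd0'\n"
        "Mar 14 18:09:42 saber kernel: page:c1fcb8e0 flags:0xc0010008 "
        "mapping:00000000 mapcount:-8388608 count:0 (Tainted: P B)\n"
    )
    for event in find_bad_page_events(sample):
        print(event)
```

Feeding it the whole log text would give one entry per report, which makes it easier to see how often the error occurs and in which process.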

Here's what I have done so far in terms of testing:

-I've replaced the motherboard with a brand new identical GA-EP45T-UD3LR motherboard and the same error is present.

-Memtest reports that the memory has no errors

-Seagate drive tester says all Seagate hard drives are error free

-The Core 2 Quad shows up normally in /proc/cpuinfo, so that implies the processor is doing OK.

-I've reloaded the OS so that it's a fully updated, standalone, base install with IDL installed in /usr/local

-Confirmed swap is on and being used.
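
The swap check in the last item can be scripted. The sketch below is only an illustration, not what was actually run here: it reads the `SwapTotal` and `SwapFree` fields from /proc/meminfo, and the helper names `swap_usage_kb` and `read_swap_usage` are made up for this example.

```python
def swap_usage_kb(meminfo_text):
    """Return (total_kb, used_kb) parsed from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("SwapTotal", "SwapFree"):
            # Each field looks like "SwapTotal:   16777208 kB".
            values[key] = int(rest.split()[0])
    total = values["SwapTotal"]
    return total, total - values["SwapFree"]


def read_swap_usage(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file on a Linux system."""
    with open(path) as f:
        return swap_usage_kb(f.read())
```

Calling `read_swap_usage()` on the affected box while the IDL job runs would show whether swap is enabled (total above zero) and actually in use (used above zero).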

I'm now at a loss in figuring out this problem. The new system should have more than enough memory and swap, since this same code works fine on a slower system with far less (~1GB of memory, 2GB swap). I was thinking it was a motherboard problem, but I swapped in a brand new board and the same issue happens.

I really could use some ideas on this one.

Thanks
 
Old 04-07-2010, 07:38 AM   #2
AlucardZero
Senior Member
 
Registered: May 2006
Location: USA
Distribution: Debian
Posts: 4,824

Rep: Reputation: 615
Post your "lspci -v". Do you have an nVidia NIC in there? Google says try disabling your onboard NIC.

Last edited by AlucardZero; 04-07-2010 at 07:41 AM.
 
Old 04-07-2010, 07:53 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,131

Rep: Reputation: 4121
PAE kernel?
I'd pull some (half) of the memory out as a test. How long did you run memtest for?
 
Old 04-07-2010, 11:01 AM   #4
ccar_support
Member
 
Registered: Jan 2007
Posts: 40

Original Poster
Rep: Reputation: 1
Here you go:

Ran memtest for about 1 hour.

Already turned off onboard NIC and put in a basic Intel 10/100 PCI network card. Problem still happened. Also disabled the onboard NIC and ran with no PCI card or network at all, and the problem still happened.

Video card = nVidia Corporation G98 [GeForce 8400 GS]
Kernel = 2.6.18-194.el5PAE
Nvidia accelerated driver = NVIDIA-Linux-x86-190.42
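
To answer the "PAE kernel?" question without reading the version string by eye, something like the following Python sketch could be used. It rests on two assumptions: that PAE kernels carry a `PAE` suffix in the release string (as the 2.6.18-194.el5PAE name above does), and that the CPU advertises the `pae` flag on the `flags` line of /proc/cpuinfo. The function names are invented for this example.

```python
import platform


def is_pae_release(release):
    """True if a kernel release string carries the RHEL 5 style 'PAE' suffix."""
    return release.endswith("PAE")


def cpu_has_pae(cpuinfo_text):
    """True if any 'flags' line in /proc/cpuinfo-style text lists 'pae'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "pae" in value.split():
            return True
    return False


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string.
    print(is_pae_release(platform.release()))
```

With the kernel listed above, `is_pae_release("2.6.18-194.el5PAE")` is true, which is what a 32-bit install needs in order to use all 8GB.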

Output from lspci -v:
----------------------------------------------

00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information

00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000a000-0000afff
Memory behind bridge: e4000000-e7ffffff
Prefetchable memory behind bridge: 00000000d0000000-00000000dff00000
Capabilities: [88] #0d [0000]
Capabilities: [80] Power Management version 3
Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
Capabilities: [a0] Express Root Port (Slot+) IRQ 0
Capabilities: [100] Virtual Channel
Capabilities: [140] Unknown (5)

00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 169
I/O ports at d100 [size=32]
Capabilities: [50] #13 [0306]

00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 50
I/O ports at d200 [size=32]
Capabilities: [50] #13 [0306]

00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 225
I/O ports at d000 [size=32]
Capabilities: [50] #13 [0306]

00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2 (prog-if 20 [EHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 225
Memory at ea104000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2

00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
Subsystem: Giga-byte Technology Unknown device a002
Flags: bus master, fast devsel, latency 0, IRQ 82
Memory at ea100000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
Capabilities: [70] Express Unknown type IRQ 0
Capabilities: [100] Virtual Channel
Capabilities: [130] Unknown (5)

00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1 (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Capabilities: [40] Express Root Port (Slot+) IRQ 0
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
Capabilities: [90] #0d [0000]
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel
Capabilities: [180] Unknown (5)

00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5 (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000b000-0000bfff
Memory behind bridge: e8000000-e8ffffff
Capabilities: [40] Express Root Port (Slot+) IRQ 0
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
Capabilities: [90] #0d [0000]
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel
Capabilities: [180] Unknown (5)

00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6 (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: e9000000-e9ffffff
Prefetchable memory behind bridge: 00000000ea000000-00000000ea000000
Capabilities: [40] Express Root Port (Slot+) IRQ 0
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 Enable+
Capabilities: [90] #0d [0000]
Capabilities: [a0] Power Management version 2
Capabilities: [100] Virtual Channel
Capabilities: [180] Unknown (5)

00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 233
I/O ports at d300 [size=32]
Capabilities: [50] #13 [0306]

00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 58
I/O ports at d400 [size=32]
Capabilities: [50] #13 [0306]

00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3 (prog-if 00 [UHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 225
I/O ports at d500 [size=32]
Capabilities: [50] #13 [0306]

00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1 (prog-if 20 [EHCI])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0, IRQ 233
Memory at ea105000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) (prog-if 01 [Subtractive decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=05, subordinate=05, sec-latency=32
Capabilities: [50] #0d [0000]

00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, medium devsel, latency 0
Capabilities: [e0] Vendor Specific Information

00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1 (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Giga-byte Technology Unknown device b002
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 58
I/O ports at d600 [size=8]
I/O ports at d700 [size=4]
I/O ports at d800 [size=8]
I/O ports at d900 [size=4]
I/O ports at da00 [size=16]
I/O ports at db00 [size=16]
Capabilities: [70] Power Management version 3
Capabilities: [b0] #13 [0306]

00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: medium devsel, IRQ 225
Memory at ea106000 (64-bit, non-prefetchable) [size=256]
I/O ports at 0500 [size=32]

00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2 (prog-if 85 [Master SecO PriO])
Subsystem: Giga-byte Technology Unknown device b002
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 58
I/O ports at dd00 [size=8]
I/O ports at de00 [size=4]
I/O ports at df00 [size=8]
I/O ports at e000 [size=4]
I/O ports at e100 [size=16]
I/O ports at e200 [size=16]
Capabilities: [70] Power Management version 3
Capabilities: [b0] #13 [0306]

01:00.0 VGA compatible controller: nVidia Corporation G98 [GeForce 8400 GS] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Micro-Star International Co., Ltd. Unknown device 2061
Flags: bus master, fast devsel, latency 0, IRQ 169
Memory at e6000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e4000000 (64-bit, non-prefetchable) [size=32M]
I/O ports at a000 [size=128]
[virtual] Expansion ROM at e7000000 [disabled] [size=128K]
Capabilities: [60] Power Management version 3
Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Capabilities: [78] Express Endpoint IRQ 0
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting
Capabilities: [600] Unknown (11)

03:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller (prog-if 85 [Master SecO PriO])
Subsystem: Giga-byte Technology Unknown device b000
Flags: bus master, fast devsel, latency 0, IRQ 169
I/O ports at b000 [size=8]
I/O ports at b100 [size=4]
I/O ports at b200 [size=8]
I/O ports at b300 [size=4]
I/O ports at b400 [size=16]
Capabilities: [68] Power Management version 2
Capabilities: [50] Express Legacy Endpoint IRQ 1

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 66
I/O ports at c000 [size=256]
Memory at ea010000 (64-bit, prefetchable) [size=4K]
Memory at ea000000 (64-bit, prefetchable) [size=64K]
[virtual] Expansion ROM at ea020000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable+
Capabilities: [70] Express Endpoint IRQ 1
Capabilities: [b0] MSI-X: Enable- Mask- TabSize=2
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 78-56-34-12-78-56-34-12
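
One thing a listing like this lets you check mechanically is which devices report the same IRQ on their `Flags:` line (here, for example, IRQ 169 appears for 00:1a.0, the GeForce at 01:00.0, and the JMicron controller at 03:00.0). Shared IRQs are common on PCI and nobody in this thread identified them as the cause, so treat the sketch below purely as a way to summarise the output. The function name `irq_sharing` and the regular expressions are assumptions based on the format shown above, and the parser expects each device header on a single unwrapped line.

```python
import re
from collections import defaultdict

# A device header such as "01:00.0 VGA compatible controller: ..."
SLOT_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
# Only the "Flags:" line carries the IRQ the device is actually using.
IRQ_RE = re.compile(r"^Flags:.*\bIRQ (\d+)")


def irq_sharing(lspci_text):
    """Map each IRQ reported by more than one device to its PCI slots."""
    by_irq = defaultdict(list)
    slot = None
    for raw in lspci_text.splitlines():
        line = raw.strip()  # real lspci output indents the detail lines
        m = SLOT_RE.match(line)
        if m:
            slot = m.group(1)
            continue
        m = IRQ_RE.match(line)
        if m and slot is not None:
            by_irq[int(m.group(1))].append(slot)
    return {irq: slots for irq, slots in by_irq.items() if len(slots) > 1}
```

Passing it the text of `lspci -v` returns a dictionary keyed by IRQ number, listing only the IRQs that more than one device reports.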

Last edited by ccar_support; 04-07-2010 at 11:05 AM.
 
Old 04-07-2010, 02:27 PM   #5
srbdivb
LQ Newbie
 
Registered: Apr 2010
Posts: 1

Rep: Reputation: 0
Try disabling the nvidia driver and just use one of the open source video drivers or upgrade to one of the newer nvidia drivers. 190.42 is a bit old.
 
Old 04-07-2010, 04:59 PM   #6
ccar_support
Member
 
Registered: Jan 2007
Posts: 40

Original Poster
Rep: Reputation: 1
I updated to the NVIDIA-Linux-x86-195.36.15 driver which appears to be the latest one.
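
For comparing driver releases by package name, a small Python sketch along these lines would do. It assumes the version is the trailing dotted number in names like the two mentioned in this thread (optionally followed by `.run`); the function name `driver_version` is made up, and other NVIDIA package naming schemes are not handled.

```python
import re

# Assumption: the version is the last dotted number in the package name.
VERSION_RE = re.compile(r"-(\d+(?:\.\d+)+)(?:\.run)?$")


def driver_version(package_name):
    """Return the version in an NVIDIA package name as a tuple of ints."""
    m = VERSION_RE.search(package_name)
    if m is None:
        raise ValueError("no version found in %r" % package_name)
    return tuple(int(part) for part in m.group(1).split("."))


if __name__ == "__main__":
    old = driver_version("NVIDIA-Linux-x86-190.42")
    new = driver_version("NVIDIA-Linux-x86-195.36.15")
    # Tuples compare element by element, so this orders the releases.
    print(new > old)
```

Comparing tuples rather than strings avoids mistakes such as treating "95.1" as newer than "190.42".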

I've been running my test scripts for a couple of hours now and I do still see the same "kernel: Bad page state in process 'kswapd0'" error in /var/log/messages, but the code is still running, it has not segfaulted like before, and the system has not locked up. I'll keep it running overnight and see if it survives.

Keep the ideas coming if anyone has any, please! I need to get this resolved for good.

cheers
 
Old 04-08-2010, 12:05 PM   #7
ccar_support
Member
 
Registered: Jan 2007
Posts: 40

Original Poster
Rep: Reputation: 1
All the test IDL scripts ran to completion overnight, so I was jazzed. I next logged in as the user himself, switched back to the onboard ethernet NIC, and re-ran the tests. No crashes, but the IDL code did segfault. I'm thinking this might be a combination of the bad NVIDIA driver and something wonky happening in the driver for the onboard ethernet NIC. I've disabled the onboard NIC again and put back the simple PCI Intel card that I had in there last night. I'm again running as the user himself and so far so good. I'll let it run and see what happens.

cheers
 
Old 04-14-2010, 12:03 PM   #8
ccar_support
Member
 
Registered: Jan 2007
Posts: 40

Original Poster
Rep: Reputation: 1
Looks like I've got the thing stabilized. I put on the latest NVIDIA driver and disabled the onboard NIC. Something about this board's onboard NIC acts wonky when RHEL5 really starts working hard. Long story short, it's chugging away and appears stable for now.

Thanks for the help everyone!
 
  

