LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

03-12-2012, 09:08 PM   #1
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,801

Is this an indication of a memory problem?


Has anyone seen log entries like this?

I have an older system that serves as a name server for a LAN and a print server. Lately it's been hanging for several minutes and the standard fix has been to reboot it. (Which we had been doing even before we saw the log entries.) Here's a sample of the /var/log/messages log entries from the most recent problem:
Code:
Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)
Mar 12 17:25:01 vger kernel: flags:0x40000824 mapping:cffe0a04 mapcount:0 count:2
Mar 12 17:25:01 vger kernel: Backtrace:
Mar 12 17:25:01 vger kernel:  [<c014017a>] bad_page+0x5a/0xa0
Mar 12 17:25:01 vger kernel:  [<c01405b8>] prep_new_page+0x18/0x60
Mar 12 17:25:01 vger kernel:  [<c0140af9>] buffered_rmqueue+0xb9/0x1f0
Mar 12 17:25:01 vger kernel:  [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 17:25:01 vger kernel:  [<c014569e>] __pagevec_lru_add_active+0x8e/0xa0
Mar 12 17:25:01 vger kernel:  [<c0149fff>] do_wp_page+0x9f/0x2e0
Mar 12 17:25:01 vger kernel:  [<c014af0b>] __handle_mm_fault+0x11b/0x130
Mar 12 17:25:01 vger kernel:  [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 17:25:01 vger kernel:  [<c0148c22>] free_pte_range+0x32/0x50
Mar 12 17:25:01 vger kernel:  [<c0148d4c>] free_pgd_range+0x10c/0x160
Mar 12 17:25:01 vger kernel:  [<c015aeb9>] invalidate_inode_buffers+0x9/0x40
Mar 12 17:25:01 vger kernel:  [<c016fb39>] clear_inode+0x9/0xf0
Mar 12 17:25:01 vger kernel:  [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:01 vger kernel:  [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:01 vger kernel:  [<c0118ffd>] schedule_tail+0x4d/0x70
Mar 12 17:25:02 vger kernel:  [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:02 vger kernel:  [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:02 vger kernel: Trying to fix it up, but a reboot is needed
Mar 12 17:25:02 vger kernel: ep_new_page+0x18/0x60
Mar 12 17:25:02 vger kernel:  [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 17:25:02 vger kernel:  [<c014ac80>] do_no_page+0x230/0x2e0
Mar 12 17:25:02 vger kernel:  [<c01ef0d4>] prio_tree_insert+0x84/0x1c0
Mar 12 17:25:02 vger kernel:  [<c014aeca>] __handle_mm_fault+0xda/0x130
Mar 12 17:25:02 vger kernel:  [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 17:25:02 vger kernel:  [<c014dca7>] change_pte_range+0x27/0x70
Mar 12 17:25:02 vger kernel:  [<c014dd68>] change_protection+0x78/0xd0
Mar 12 17:25:02 vger kernel:  [<c014de9a>] mprotect_fixup+0xda/0x190
Mar 12 17:25:02 vger kernel:  [<c014e0b1>] do_mprotect+0x161/0x230
Mar 12 17:25:02 vger kernel:  [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:02 vger kernel:  [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:02 vger kernel: Trying to fix it up, but a reboot is needed
Mar 12 17:25:02 vger kernel: Bad page state at prep_new_page (in process 'sh', page c109d1e0)
Mar 12 17:25:02 vger kernel: flags:0x40000824 mapping:cffe0684 mapcount:0 count:2
Mar 12 17:25:02 vger kernel: Backtrace:
Mar 12 17:25:02 vger kernel:  [<c014017a>] bad_page+0x5a/0xa0

[snip]

Mar 12 19:31:00 vger kernel: Bad page state at prep_new_page (in process 'find', page c10afac0)
Mar 12 19:31:00 vger kernel: flags:0x40000824 mapping:cffe0684 mapcount:0 count:2
Mar 12 19:31:00 vger kernel: Backtrace:
Mar 12 19:31:00 vger kernel:  [<c014017a>] bad_page+0x5a/0xa0
Mar 12 19:31:00 vger kernel:  [<c01405b8>] prep_new_page+0x18/0x60
Mar 12 19:31:00 vger kernel:  [<c0140af9>] buffered_rmqueue+0xb9/0x1f0
Mar 12 19:31:00 vger kernel:  [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 19:31:01 vger kernel:  [<c014c40c>] find_mergeable_anon_vma+0x3c/0xc0
Mar 12 19:31:01 vger kernel:  [<c014a993>] do_anonymous_page+0x63/0x120
Mar 12 19:31:01 vger kernel:  [<c014abcc>] do_no_page+0x17c/0x2e0
Mar 12 19:31:01 vger kernel:  [<c014aeca>] __handle_mm_fault+0xda/0x130
Mar 12 19:31:01 vger kernel:  [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 19:31:01 vger kernel:  [<c013dd15>] filemap_nopage+0x2c5/0x340
Mar 12 19:31:01 vger kernel:  [<c014ab9f>] do_no_page+0x14f/0x2e0
Mar 12 19:31:01 vger kernel:  [<c014d925>] do_brk+0x275/0x280
Mar 12 19:31:01 vger kernel:  [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 19:31:01 vger kernel:  [<c0103f0f>] error_code+0x4f/0x60
Mar 12 19:31:01 vger kernel:  [<c01f24c6>] __copy_to_user_ll+0x36/0x60
Mar 12 19:31:01 vger kernel:  [<c01759be>] seq_read+0x21e/0x2e0
Mar 12 19:31:01 vger kernel:  [<c01757a0>] seq_read+0x0/0x2e0
Mar 12 19:31:01 vger kernel:  [<c01590eb>] vfs_read+0x8b/0x170
Mar 12 19:31:01 vger kernel:  [<c015948c>] sys_read+0x3c/0x70
Mar 12 19:31:01 vger kernel:  [<c0102d1b>] sysenter_past_esp+0x54/0x79
Mar 12 19:31:01 vger kernel: Trying to fix it up, but a reboot is needed
When these messages begin showing up in the logs, they go on for 5-6 minutes and then seem to clear up for 8-10 minutes before re-appearing. I'm assuming that when we perceive the system to be hanging, what's actually happening is that the system is struggling to deal with whatever the problem is.
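One way to check that on/off pattern without eyeballing the log is to group the "Bad page state" lines into bursts. What follows is only a rough Python sketch under stated assumptions: the names `event_times` and `bursts`, the two-minute gap threshold, and the default year (syslog timestamps carry no year) are illustrative choices, not anything taken from the system above. It also assumes English month abbreviations in the log.

```python
import re
from datetime import datetime, timedelta

# Matches the first line of each report, e.g.
# "Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (...)"
EVENT = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: Bad page state")


def event_times(lines, year=2012):
    """Yield one datetime per 'Bad page state' line.

    The year is an assumption: classic syslog timestamps omit it.
    """
    for line in lines:
        m = EVENT.match(line)
        if m:
            yield datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def bursts(times, max_gap=timedelta(minutes=2)):
    """Group chronologically ordered timestamps into (start, end, count) tuples.

    A new burst starts whenever the gap since the previous event exceeds
    max_gap (an arbitrary threshold chosen for this sketch).
    """
    result = []
    for t in times:
        if result and t - result[-1][1] <= max_gap:
            start, _, count = result[-1]
            result[-1] = (start, t, count + 1)
        else:
            result.append((t, t, 1))
    return result
```

Feeding it the lines of /var/log/messages, for example `bursts(event_times(open("/var/log/messages")))`, should show whether the errors really arrive in runs of a few minutes separated by quiet periods.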

I haven't tried powering down and reseating the memory but will this weekend when I have some more time. (It's an older P-III system so I really don't want to invest in new memory for it... at least, I'm hoping it doesn't come to that.)

Any thoughts?

As usual... TIA

--
Rick
 
03-12-2012, 09:42 PM   #2
MS3FGX
LQ Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Possible. A few hours on memtest86 would be the best way to know for sure.
 
03-13-2012, 04:57 AM   #3
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Bear in mind that, while this does seem to be some kind of 'memory problem', the problem may not be so much a 'hardware problem with the memory' as a 'software bug leading to a program trying to access a memory location to which it should not have access'.

The memtest suggestion is simple and easy (although long-winded), so that is probably where you should start.
 
03-13-2012, 02:22 PM   #4
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,801

Original Poster
Quote:
Originally Posted by salasi
... may not be so much a 'hardware problem with the memory' as a 'software bug leading to a program trying to access a memory location to which it should not have access'.
I'm leaning toward it being a hardware problem since sprinkled in with all these kernel error messages are strings that say that "sh", or "cron", or "find" is the process running when the kernel problem occurs. It doesn't look like any single piece of software that is getting bit by this.
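To put numbers on that impression, the process name and the page value from each "Bad page state" line can be tallied. The snippet below is a minimal Python sketch, not a definitive tool: the function name `tally` is made up for illustration, and the regular expression is modelled only on the message format shown in the log excerpt above.

```python
import re
from collections import Counter

# Modelled on lines such as:
# "... kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)"
BAD_PAGE = re.compile(
    r"Bad page state at \w+ \(in process '([^']+)', page ([0-9a-f]+)\)"
)


def tally(lines):
    """Return (by_process, by_page) Counters for all 'Bad page state' lines."""
    by_process = Counter()
    by_page = Counter()
    for line in lines:
        m = BAD_PAGE.search(line)
        if m:
            process, page = m.groups()
            by_process[process] += 1
            by_page[page] += 1
    return by_process, by_page
```

If many unrelated processes show up, that is at least consistent with the problem not being tied to one program. If the same few page values keep recurring, that might be worth mentioning when asking for further help, though on its own it would not prove anything either way.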

Quote:
The memtest suggestion is simple and easy (although long-winded), so that is probably where you should start.
Yeah... I tried compiling memtest86 on the system having the problem, but it bombed with some odd errors and never finished compiling. That's possibly because there has never been a need to do development on that system, so many tools are probably missing. Compiling on another system (with a newer kernel, gcc, etc.) was successful, but when I tried adding it to the grub menu or copying it onto a boot floppy (hey, I said it was an older system), booting the memory test doesn't do more than hang the system. Looks like I have some more leisure reading to do. Either that or I need to track down the old memtest86 floppy that I have. Somewhere.

Thanks...

--
Rick
 
03-13-2012, 03:04 PM   #5
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS,Manjaro
Posts: 5,623

memtest

There are some liveCD and floppy diagnostic distributions of Linux (tomsrtbt? Certainly Puppy.) that have excellent memory test software. You could just boot one of those and leave it to burn-test the RAM for a few hours.

It would not have to be a particular application to be a software issue. Hardware is more likely, but it is possible that a kernel module is faulty. There have been a LOT of memory fixes per year for the last decade!
 
03-13-2012, 05:05 PM   #6
rnturn
Senior Member
 
Registered: Jan 2003
Location: Illinois (SW Chicago 'burbs)
Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,801

Original Poster
Quote:
Originally Posted by wpeckham
Hardware is more likely, but it is possible that a kernel module is faulty. There have been a LOT of memory fixes per year for the last decade!
I thought about that. I would expect, though, that if some software was at fault, the problem would be fairly frequent and continuous. I went back through the old /var/log/messages files and found another flurry of error messages like those in my original post back in July of last year. Then nothing until fairly recently. That makes me think that the underlying problem is hardware related. I can't imagine that software-based errors would come and go like that. (At least I've never seen that before.) I've been tailing the messages file on that system since 10:00AM this morning and none of those nasty kernel messages have appeared. In fact, none have shown up since the last error storm that ended at 19:31 last night. The power cycling that I did then probably cleared up the problem for a while and, unfortunately, will likely make it trickier to diagnose.
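For anyone wanting to repeat that trawl through old logs, counting "Bad page state" lines per day across the current and rotated files makes the flurries easy to spot. This is a hedged Python sketch only: the function names, the `/var/log/messages*` glob pattern, and the assumption that rotated files are either plain text or gzip-compressed are mine, and other compression schemes would need their own handling.

```python
import glob
import gzip
import re
from collections import Counter

# Captures the month and day from lines such as:
# "Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (...)"
DAY = re.compile(r"^(\w{3})\s+(\d+) \d\d:\d\d:\d\d \S+ kernel: Bad page state")


def open_log(path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, errors="replace")


def events_per_day(lines):
    """Count 'Bad page state' lines per day, keyed like 'Mar 12'."""
    days = Counter()
    for line in lines:
        m = DAY.match(line)
        if m:
            days[f"{m.group(1)} {int(m.group(2))}"] += 1
    return days


def scan_logs(pattern="/var/log/messages*"):
    """Print per-day counts for every file matching pattern (assumed layout)."""
    for path in sorted(glob.glob(pattern)):
        with open_log(path) as fh:
            for day, count in events_per_day(fh).items():
                print(path, day, count)
```

Calling `scan_logs()` as root (the log files are usually not world-readable) would list each file with the days on which the errors appeared. Since syslog lines carry no year, the file name is the only hint as to which year a given "Jul" belongs to.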

--
Rick
 
  


All times are GMT -5.
