Distribution: openSUSE, Raspbian, Slackware. Previous: MacOS, Red Hat, Coherent, Consensys SVR4.2, Tru64, Solaris
Posts: 2,801
Is this an indication of a memory problem?
Has anyone seen log entries like this?
I have an older system that serves as a name server for a LAN and a print server. Lately it's been hanging for several minutes and the standard fix has been to reboot it. (Which we had been doing even before we saw the log entries.) Here's a sample of the /var/log/messages log entries from the most recent problem:
Code:
Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)
Mar 12 17:25:01 vger kernel: flags:0x40000824 mapping:cffe0a04 mapcount:0 count:2
Mar 12 17:25:01 vger kernel: Backtrace:
Mar 12 17:25:01 vger kernel: [<c014017a>] bad_page+0x5a/0xa0
Mar 12 17:25:01 vger kernel: [<c01405b8>] prep_new_page+0x18/0x60
Mar 12 17:25:01 vger kernel: [<c0140af9>] buffered_rmqueue+0xb9/0x1f0
Mar 12 17:25:01 vger kernel: [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 17:25:01 vger kernel: [<c014569e>] __pagevec_lru_add_active+0x8e/0xa0
Mar 12 17:25:01 vger kernel: [<c0149fff>] do_wp_page+0x9f/0x2e0
Mar 12 17:25:01 vger kernel: [<c014af0b>] __handle_mm_fault+0x11b/0x130
Mar 12 17:25:01 vger kernel: [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 17:25:01 vger kernel: [<c0148c22>] free_pte_range+0x32/0x50
Mar 12 17:25:01 vger kernel: [<c0148d4c>] free_pgd_range+0x10c/0x160
Mar 12 17:25:01 vger kernel: [<c015aeb9>] invalidate_inode_buffers+0x9/0x40
Mar 12 17:25:01 vger kernel: [<c016fb39>] clear_inode+0x9/0xf0
Mar 12 17:25:01 vger kernel: [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:01 vger kernel: [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:01 vger kernel: [<c0118ffd>] schedule_tail+0x4d/0x70
Mar 12 17:25:02 vger kernel: [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:02 vger kernel: [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:02 vger kernel: Trying to fix it up, but a reboot is needed
Mar 12 17:25:02 vger kernel: ep_new_page+0x18/0x60
Mar 12 17:25:02 vger kernel: [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 17:25:02 vger kernel: [<c014ac80>] do_no_page+0x230/0x2e0
Mar 12 17:25:02 vger kernel: [<c01ef0d4>] prio_tree_insert+0x84/0x1c0
Mar 12 17:25:02 vger kernel: [<c014aeca>] __handle_mm_fault+0xda/0x130
Mar 12 17:25:02 vger kernel: [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 17:25:02 vger kernel: [<c014dca7>] change_pte_range+0x27/0x70
Mar 12 17:25:02 vger kernel: [<c014dd68>] change_protection+0x78/0xd0
Mar 12 17:25:02 vger kernel: [<c014de9a>] mprotect_fixup+0xda/0x190
Mar 12 17:25:02 vger kernel: [<c014e0b1>] do_mprotect+0x161/0x230
Mar 12 17:25:02 vger kernel: [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 17:25:02 vger kernel: [<c0103f0f>] error_code+0x4f/0x60
Mar 12 17:25:02 vger kernel: Trying to fix it up, but a reboot is needed
Mar 12 17:25:02 vger kernel: Bad page state at prep_new_page (in process 'sh', page c109d1e0)
Mar 12 17:25:02 vger kernel: flags:0x40000824 mapping:cffe0684 mapcount:0 count:2
Mar 12 17:25:02 vger kernel: Backtrace:
Mar 12 17:25:02 vger kernel: [<c014017a>] bad_page+0x5a/0xa0
[snip]
Mar 12 19:31:00 vger kernel: Bad page state at prep_new_page (in process 'find', page c10afac0)
Mar 12 19:31:00 vger kernel: flags:0x40000824 mapping:cffe0684 mapcount:0 count:2
Mar 12 19:31:00 vger kernel: Backtrace:
Mar 12 19:31:00 vger kernel: [<c014017a>] bad_page+0x5a/0xa0
Mar 12 19:31:00 vger kernel: [<c01405b8>] prep_new_page+0x18/0x60
Mar 12 19:31:00 vger kernel: [<c0140af9>] buffered_rmqueue+0xb9/0x1f0
Mar 12 19:31:00 vger kernel: [<c0140dbb>] __alloc_pages+0xeb/0x420
Mar 12 19:31:01 vger kernel: [<c014c40c>] find_mergeable_anon_vma+0x3c/0xc0
Mar 12 19:31:01 vger kernel: [<c014a993>] do_anonymous_page+0x63/0x120
Mar 12 19:31:01 vger kernel: [<c014abcc>] do_no_page+0x17c/0x2e0
Mar 12 19:31:01 vger kernel: [<c014aeca>] __handle_mm_fault+0xda/0x130
Mar 12 19:31:01 vger kernel: [<c0117497>] do_page_fault+0x127/0x5ef
Mar 12 19:31:01 vger kernel: [<c013dd15>] filemap_nopage+0x2c5/0x340
Mar 12 19:31:01 vger kernel: [<c014ab9f>] do_no_page+0x14f/0x2e0
Mar 12 19:31:01 vger kernel: [<c014d925>] do_brk+0x275/0x280
Mar 12 19:31:01 vger kernel: [<c0117370>] do_page_fault+0x0/0x5ef
Mar 12 19:31:01 vger kernel: [<c0103f0f>] error_code+0x4f/0x60
Mar 12 19:31:01 vger kernel: [<c01f24c6>] __copy_to_user_ll+0x36/0x60
Mar 12 19:31:01 vger kernel: [<c01759be>] seq_read+0x21e/0x2e0
Mar 12 19:31:01 vger kernel: [<c01757a0>] seq_read+0x0/0x2e0
Mar 12 19:31:01 vger kernel: [<c01590eb>] vfs_read+0x8b/0x170
Mar 12 19:31:01 vger kernel: [<c015948c>] sys_read+0x3c/0x70
Mar 12 19:31:01 vger kernel: [<c0102d1b>] sysenter_past_esp+0x54/0x79
Mar 12 19:31:01 vger kernel: Trying to fix it up, but a reboot is needed
When these messages begin showing up in the logs, they go on for 5-6 minutes and then seem to clear up for 8-10 minutes before re-appearing. I'm assuming that when we perceive the system to be hanging, what's actually happening is that the system is struggling to deal with whatever the problem is.
I haven't tried powering down and reseating the memory, but I will this weekend when I have some more time. (It's an older P-III system, so I really don't want to invest in new memory for it... at least, I'm hoping it doesn't come to that.)
Bear in mind that, while this does seem to be some kind of 'memory problem', the problem may not be so much a 'hardware problem with the memory' as a 'software bug leading to a program trying to access a memory location to which it should not have access'.
The memtest suggestion is simple and easy (although long-winded), so that is probably where you should start.
Original Poster
Quote:
Originally Posted by salasi
... may not be so much a 'hardware problem with the memory' as a 'software bug leading to a program trying to access a memory location to which it should not have access'.
I'm leaning toward it being a hardware problem, since sprinkled in with all these kernel error messages are strings that say that "sh", or "cron", or "find" was the process running when the kernel problem occurred. It doesn't look like any single piece of software is getting bitten by this.
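For anyone who wants to check the same thing on their own logs, here is a minimal sketch that tallies the "Bad page state" header lines by process name and by page address. The regular expression is based only on the three header lines shown in the log excerpt above; the function name `tally_bad_pages` and the `SAMPLE` list are made up for illustration, and other kernel versions may format the message differently.

```python
import re
from collections import Counter

# Matches the header line of each report, for example:
# "... kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)"
# The pattern is an assumption derived from the excerpt in this thread.
BAD_PAGE = re.compile(
    r"Bad page state at (?P<where>\S+) "
    r"\(in process '(?P<proc>[^']+)', page (?P<page>[0-9a-f]+)\)"
)


def tally_bad_pages(lines):
    """Count bad-page reports per process name and per page address."""
    by_proc = Counter()
    by_page = Counter()
    for line in lines:
        match = BAD_PAGE.search(line)
        if match:
            by_proc[match.group("proc")] += 1
            by_page[match.group("page")] += 1
    return by_proc, by_page


# Header lines copied from the log excerpt posted above.
SAMPLE = [
    "Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)",
    "Mar 12 17:25:02 vger kernel: Bad page state at prep_new_page (in process 'sh', page c109d1e0)",
    "Mar 12 19:31:00 vger kernel: Bad page state at prep_new_page (in process 'find', page c10afac0)",
]

if __name__ == "__main__":
    by_proc, by_page = tally_bad_pages(SAMPLE)
    print("reports per process:", dict(by_proc))
    print("reports per page:", dict(by_page))
```

To run it against a real log, pass an open file instead of `SAMPLE`, e.g. `tally_bad_pages(open("/var/log/messages"))`. Many different processes hitting many different pages would fit the "not one misbehaving program" reading, though it would not by itself distinguish bad RAM from a kernel bug.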
Quote:
The memtest suggestion is simple and easy (although long-winded), so that is probably where you should start.
Yeah... I tried compiling memtest86 on the system having the problem, and it bombed with some odd errors and never finished compiling, possibly because there has never been a need to do development on that system and many tools are probably missing. Compiling on another system (with a newer kernel, gcc, etc.) was successful, but when I tried adding it to the grub menu or copying it onto a boot floppy (hey... I said it was an older system), booting the memory test did nothing more than hang the system. Looks like I have some more leisure reading to do. Either that or I need to track down the old memtest86 floppy that I have. Somewhere.
There are some live CD and floppy diagnostic Linux distributions (tomsrtbt? Certainly Puppy.) that have excellent memory test software. You could just boot one of those and leave it to burn-test the RAM for a few hours.
It would not have to be a particular application to be a software issue. Hardware is more likely, but it is possible that a kernel module is faulty. There have been a LOT of memory fixes per year for the last decade!
Original Poster
Quote:
Originally Posted by wpeckham
Hardware is more likely, but it is possible that a kernel module is faulty. There have been a LOT of memory fixes per year for the last decade!
I thought about that. I would expect, though, that if some software were at fault, the problem would be fairly frequent and continuous. I went back through the old /var/log/messages files and found another flurry of error messages like those in my original post, back in July of last year. Then nothing until fairly recently. That makes me think that the underlying problem is hardware-related. I can't imagine that software-based errors would come and go like that. (At least I've never seen that before.) I've been tailing the messages file on that system since 10:00 AM this morning and none of those nasty kernel messages have appeared. In fact, none have shown up since the last error storm that ended at 19:31 last night. The power cycling that I did then probably cleared up the problem for a while and, unfortunately, will likely make it trickier to diagnose.
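Going back through old messages files by eye gets tedious, so here is a rough sketch of how the "flurries" could be picked out automatically: it groups "Bad page state" lines into bursts wherever the quiet time between two reports exceeds a chosen gap. The names `parse_stamp`, `find_bursts`, and `SAMPLE_LOG` are hypothetical. Syslog stamps carry no year, so the year is passed in as an assumption (2010 in the demo is an arbitrary placeholder), and the sketch assumes English month abbreviations and lines in chronological order.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_bursts(lines, year, gap=timedelta(minutes=5)):
    """Group 'Bad page state' reports into bursts.

    A new burst starts whenever more than `gap` has passed since the
    previous report. Returns a list of (first_seen, last_seen, count).
    """
    bursts = []
    for line in lines:
        if "Bad page state" not in line:
            continue
        when = parse_stamp(line, year)
        if bursts and when - bursts[-1][1] <= gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, when, count + 1)
        else:
            bursts.append((when, when, 1))
    return bursts


# Header lines copied from the log excerpt in the original post.
SAMPLE_LOG = [
    "Mar 12 17:25:01 vger kernel: Bad page state at prep_new_page (in process 'cron', page c109e2c0)",
    "Mar 12 17:25:02 vger kernel: Bad page state at prep_new_page (in process 'sh', page c109d1e0)",
    "Mar 12 19:31:00 vger kernel: Bad page state at prep_new_page (in process 'find', page c10afac0)",
]

if __name__ == "__main__":
    # The year is a placeholder; the log itself does not record it.
    for start, end, count in find_bursts(SAMPLE_LOG, year=2010):
        print(f"{start:%b %d %H:%M:%S} to {end:%H:%M:%S}: {count} report(s)")
```

The 5-minute default gap is a guess, not a measured value. Running this over the rotated messages files would show how often the storms occur and how long they last, which may be worth noting before and after reseating the memory.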