Linux - Software This forum is for Software issues.
02-01-2006, 01:15 PM
#1
Member
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154
random core dumps, please help troubleshoot
Hi everybody!
I think these might be related to MySQL. The server is Red Hat 8.0, kernel 2.4.18-14smp, MySQL version 3.23.52.
It runs 5 big databases with about 75 tables in total. About 150 clients do inserts, and scripts mine data from it. Total insert/update traffic is roughly 2.5 GB per 24 hours, and total online data is about 200 GB.
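As a rough sanity check on "load is within reason", the daily volume works out to a modest average rate. A minimal sketch, assuming decimal gigabytes and traffic spread evenly over the day (real traffic is surely burstier):

```python
# Back-of-the-envelope rates for the workload described above.
# Assumes decimal units (1 GB = 10**9 bytes) and an even spread over 24 h.
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 2.5e9   # ~2.5 GB of insert/update traffic per day
clients = 150         # concurrent client connections

avg_rate = daily_bytes / SECONDS_PER_DAY   # bytes per second, whole server
per_client_day = daily_bytes / clients     # bytes per client per day

print(f"average write rate: {avg_rate / 1e3:.1f} kB/s")
print(f"per client: {per_client_day / 1e6:.1f} MB/day")
```

This prints about 28.9 kB/s overall and 16.7 MB/day per client. Peak rates will be higher than this average.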
In the past two days, it has core dumped or locked hard 4 times. I can find no issues with the server: load average is within reason, disks aren't full, etc. This is driving me nuts!
Any ideas or suggestions on investigating this are greatly appreciated. Thank you all!
I happened to get a picture of the screen (unfortunately poor quality), which you can see HERE.
I copied the text here as well (it was hard to read, so there might be an error or two):
Code:
autofs 3c59x iptable_filter ip_tables ide-scsi ide-cd cdrom mousedev keybdev h
CPU: 0
EIP: 0010:[<c0140bb6>] Not tainted
EFLAGS: 00010282
EIP is at __free_pages_ok [kernel] 0x326 (2.4.18-14smp)
eax: 00000047 ebx: c1e24450 ecx: eded0000 edx: d7d07014
esi: 00000000 edi: f3ba19b4 ebp: 00000000 esp: 49613ecc
ds: 0018 es: 0018 ss: 0018
Process mysqld (pid: 29090, stackpage=d9613000)
Stack: c0296360 c1e24450 f3ba19b4 c0147868 c1e24450 00001000 c0137aa5 000042d8
00000000 000015b4 00001000 c1e24450 f3ba19b4 000042d9 c01374e6 d9613f6c
c1e24450 00000000 00001000 00001000 00000000 00000000 00000000 f3ba1900
Call Trace: [<c0147868>] kmap_high [kernel] 0x50 (0xd9613ed8))
[<c0137aa5>] file_read_actor [kernel] 0xd5 (0xd9613f30))
[<c01374e6>] do_generic_file_read [kernel] 0x266 (0xd9613f04))
[<c01379d0>] file_read_actor [kernel] 0x0 (0xd9613f30))
[<c0137b80>] generic_file_read [kernel] 0xb0 (0xd9613f50))
[<c01379d0>] file_read_actor [kernel] 0x0 (0xd9613f60))
[<c014a5fa>] sys_pread [kernel] 0xca (0xd9613f8c))
[<c0109447>] system_call [kernel] 0x33 (0xd9613fc0))
Code: 0f 0b 82 00 99 5a 27 c0 8b 53 08 e9 0b fd ff ff 89 d8 e8 b3
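When several oopses have to be compared, it helps to reduce each call trace to its sequence of function names. A minimal Python sketch, assuming the 2.4 trace format shown above; the regular expression and the function names `parse_trace` and `signature` are illustrative only (ksymoops is the real tool for decoding 2.4 oopses):

```python
import re

# Matches 2.4-style trace entries such as
#   [<c0147868>] kmap_high [kernel] 0x50 (0xd9613ed8)
# capturing the address, the symbol name and the offset into that symbol.
TRACE_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\s+\[\w+\]\s+0x([0-9a-f]+)")


def parse_trace(text):
    """Return (address, symbol, offset) tuples in the order they appear."""
    return [(int(addr, 16), sym, int(off, 16))
            for addr, sym, off in TRACE_RE.findall(text)]


def signature(text):
    """Reduce a trace to its symbol sequence so two crashes can be compared.

    Entries at offset 0x0 are dropped: they point at the very start of a
    function, which usually means a function pointer lying on the stack
    (an argument) rather than a return address.
    """
    return [sym for _, sym, off in parse_trace(text) if off != 0]


trace = """Call Trace: [<c0147868>] kmap_high [kernel] 0x50 (0xd9613ed8))
[<c0137aa5>] file_read_actor [kernel] 0xd5 (0xd9613f30))
[<c01374e6>] do_generic_file_read [kernel] 0x266 (0xd9613f04))
[<c01379d0>] file_read_actor [kernel] 0x0 (0xd9613f30))
[<c0137b80>] generic_file_read [kernel] 0xb0 (0xd9613f50))
[<c014a5fa>] sys_pread [kernel] 0xca (0xd9613f8c))
[<c0109447>] system_call [kernel] 0x33 (0xd9613fc0))"""

print(" <- ".join(signature(trace)))
```

For this trace it prints `kmap_high <- file_read_actor <- do_generic_file_read <- generic_file_read <- sys_pread <- system_call`, i.e. mysqld was inside a `pread()` system call when the kernel tripped.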

02-02-2006, 09:18 AM
#2
Moderator
Registered: May 2001
Posts: 29,415
Quote:
The past two days, it has core dumped or locked hard 4 times. / I think these might be related to mysql...
Get a copy of your syslog. Most oopses should be logged there. Diff all four, and if they're not identical, post at least two of them: it's better to have more, and accurate, info, because screenshots plus typing can't compensate for lines that scrolled off the screen. Besides that, does *any* daemon or application log show errors before the oops? Are these the only four oopses? In the past six months? Year? How about the database and its users? Was there an increase in usage? Recently? Were any applications added? Any other (recent?) changes to the box?
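Comparing the oopses can be partly automated. A minimal Python sketch, assuming the classic syslogd line layout (`Feb  1 10:57:40 durant kernel: ...`) and that every oops includes an `EIP is at` line, as the one in this thread does; the function names are made up for illustration:

```python
import re
from collections import Counter

# Assumes the classic syslog layout:  "Feb  1 10:57:40 durant kernel: <message>"
KERN_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s?(.*)$")


def kernel_messages(lines):
    """Yield (timestamp, message) for every kernel line in a syslog file."""
    for line in lines:
        m = KERN_RE.match(line.rstrip("\n"))
        if m:
            yield m.group(1), m.group(2)


def crash_sites(lines):
    """Count distinct faulting locations, taken from the 'EIP is at' lines.

    An identical (timestamp, location) pair is counted once, so a message
    that was written to the log twice does not look like two crashes.
    """
    seen = {(ts, msg) for ts, msg in kernel_messages(lines)
            if msg.startswith("EIP is at")}
    return Counter(msg for _, msg in seen)
```

Feeding it the lines of the system log (usually `/var/log/messages` on Red Hat) gives one entry per distinct crash site; if all crashes land in the same function, `crash_sites` reports a single location with a count above one.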
Quote:
I can find no issues with the server... load average is within reason, disks aren't full, etc.
Do you run continuous stats with something like sa, Atsar or Dstat? Especially when problems don't appear every 5 minutes, it comes in handy to be able to paint a larger picture of what is going on.
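sa, Atsar and Dstat are the proper tools for this. Purely to illustrate the idea of continuous sampling, here is a minimal Python stand-in that appends the load average and free memory from `/proc` to a file. The log file name, the interval and the function names are arbitrary assumptions, and it records far less than sar does:

```python
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Map each /proc/meminfo field to its first number.

    For the 'MemFree:'-style lines that number is in kB; header lines
    without a leading number are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def sample_once():
    """Take one sample: timestamp, load averages, free RAM and swap (kB)."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return stamp, load, mem.get("MemFree"), mem.get("SwapFree")


def run(interval=60, logfile="stats.log"):
    """Append one sample per interval, forever (stop with Ctrl-C)."""
    while True:
        stamp, load, memfree, swapfree = sample_once()
        with open(logfile, "a") as log:
            log.write(f"{stamp} load={load[0]:.2f},{load[1]:.2f},{load[2]:.2f} "
                      f"memfree_kB={memfree} swapfree_kB={swapfree}\n")
        time.sleep(interval)
```

The point is the history: after a crash, the last lines before the gap in the log show what the box looked like just before it went down.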
Quote:
server is RedHat 8.0, kernel 2.4.18-14smp, mysql version 3.23.52.
Was this box designed and configured for this task?
BTW, any compelling reason for running an EOL'ed release and a vulnerable kernel?

02-02-2006, 11:51 AM
#3
Member
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154
Original Poster
Thanks for the response. I know I'm a total hack, and I always appreciate knowledgeable users having patience with me...
Syslog: here (linked due to length)
Only one of the 4 crashes actually wrote to the log, and it looks like syslogd is double-logging them (I'll have to look into that too), but I left it as-is to avoid introducing any errors. The server has been rock solid since installation. The only issues were due to filling disks on a couple of occasions.
In the past 10 days or so there has been a modest increase in volume: of the 150 concurrent client connections I mentioned, about 10 are new. Their individual volume is not much different from the average for our clients, but it is 10 new ones.
I can find no other application logs in the crash timeframe; everything seemed normal.
SA is running from cron, but I have never dealt with it, nor do I know whether it is even running properly. I'll have to look into this; any suggestions welcome =)
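One quick way to tell whether sa is actually collecting data: sysstat's `sa1`, run from cron, writes a binary file per day of the month (`sa01` to `sa31`) under `/var/log/sa` by default on Red Hat, so the newest such file should have been modified within the last cron interval. A minimal Python sketch under that assumption; the 30-minute threshold is a guess meant to be generous for the typical 10-minute schedule, and the function names are hypothetical:

```python
import os
import re
import time

SA_DIR = "/var/log/sa"               # default sysstat data directory on Red Hat
SA_FILE = re.compile(r"^sa\d{2}$")   # binary daily files; sarNN are text reports


def newest_sa_file(sa_dir=SA_DIR):
    """Return (path, mtime) of the most recently written saDD file, or None."""
    candidates = []
    for name in os.listdir(sa_dir):
        if SA_FILE.match(name):
            path = os.path.join(sa_dir, name)
            candidates.append((os.path.getmtime(path), path))
    if not candidates:
        return None
    mtime, path = max(candidates)
    return path, mtime


def sa_looks_alive(sa_dir=SA_DIR, max_age=30 * 60, now=None):
    """True if some saDD file was updated within the last max_age seconds."""
    newest = newest_sa_file(sa_dir)
    if newest is None:
        return False
    now = time.time() if now is None else now
    return now - newest[1] <= max_age
```

If it looks alive, `sar -f /var/log/sa/saDD` (DD being the day of the month) prints that day's CPU history, and `sar -q -f ...` shows run-queue length and load averages around the time of a crash.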
This box was spec'd and built for this task and this task only, from factory-new parts, and has been in continuous operation since late 2003 (EDIT: late 2002, I can't subtract...). We're still running RH 8.0 because (1) it hasn't been broken until now and (2) of the downtime and prohibitive costs (parallel hardware, man-hours, et al.) associated with an upgrade.
Thanks again!
EDIT: typos
Last edited by whysyn; 02-02-2006 at 07:38 PM.

02-15-2006, 02:11 PM
#4
Moderator
Registered: May 2001
Posts: 29,415
Sorry. Way late.
Feb 1 10:57:40 durant kernel: Page has mapping still set. This is a serious situation. However if you
Feb 1 10:57:40 durant kernel: kernel BUG at page_alloc.c:130!
Feb 1 10:57:40 durant kernel: EIP is at __free_pages_ok [kernel] 0x326 (2.4.18-14smp)
IIGC this has something to do with trying to free a page while it is still in use, and I can't think of any advice other than moving to a later kernel; maybe updating through Fedora Legacy is an option.
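The "Code:" bytes in the first post are consistent with that syslog line. On i386 2.4 kernels built with verbose BUG() reporting, BUG() compiles to a `ud2` instruction (bytes `0f 0b`) followed by a 16-bit source line number and a 32-bit pointer to the file name string. A small Python sketch decoding them, assuming that layout:

```python
import struct

# "Code:" bytes from the oops in the first post, starting at the faulting EIP.
code = bytes.fromhex("0f 0b 82 00 99 5a 27 c0 8b 53 08 e9 0b fd ff ff 89 d8 e8 b3")


def decode_bug(code):
    """Decode an i386 2.4 BUG() site: ud2, a 16-bit line, a 32-bit file pointer.

    Assumes verbose BUG() reporting, which places __LINE__ and the address
    of the __FILE__ string right after the ud2 opcode (0f 0b).
    """
    if code[:2] != b"\x0f\x0b":
        raise ValueError("not a ud2 instruction")
    # Little-endian, no padding: unsigned short line, unsigned int pointer.
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr


line, file_ptr = decode_bug(code)
print(f"line {line}, file name string at 0x{file_ptr:08x}")
```

It prints `line 130, file name string at 0xc0275a99`. Line 130 matches "kernel BUG at page_alloc.c:130!", so the oops is a kernel sanity check firing inside `__free_pages_ok`, not a random invalid instruction.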

05-08-2006, 12:56 PM
#5
Member
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154
Original Poster
Quote:
Originally Posted by unSpawn
Sorry. Way late.
I'm even later =)
It turned out to be a heat issue. One of the CPU fans had died. After replacing it, the system has been rock solid.
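Since the culprit was a dead CPU fan, temperature monitoring would likely have pointed at it sooner. The 2.4 kernel in this thread predates it, but current kernels expose sensor readings through the hwmon sysfs interface, where each `temp*_input` file holds a temperature in millidegrees Celsius and an optional `temp*_label` names it. A minimal Python sketch, assuming `/sys/class/hwmon` is present; the 80 °C limit and the function names are arbitrary examples:

```python
import glob
import os


def read_temperatures(root="/sys/class/hwmon"):
    """Return {"hwmonN/label": degrees C} for every temp*_input under root."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        hw_dir = os.path.dirname(path)
        base = os.path.basename(path)[: -len("_input")]   # e.g. "temp1"
        label_file = os.path.join(hw_dir, base + "_label")
        if os.path.exists(label_file):
            with open(label_file) as f:
                label = f.read().strip()
        else:
            label = base
        name = os.path.basename(hw_dir) + "/" + label
        try:
            with open(path) as f:
                temps[name] = int(f.read().strip()) / 1000.0  # millidegrees -> C
        except (OSError, ValueError):
            continue  # sensor unreadable or not reporting a number
    return temps


def too_hot(temps, limit_c=80.0):
    """Names of sensors at or above limit_c degrees Celsius."""
    return [name for name, c in temps.items() if c >= limit_c]
```

Run from cron and combined with mail or syslog output, a check like `too_hot(read_temperatures())` turns a silent fan failure into an early warning instead of random crashes.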

All times are GMT -5.