LinuxQuestions.org
Old 02-01-2006, 02:15 PM   #1
whysyn
Member
 
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154

Rep: Reputation: 30
random core dumps, please help troubleshoot


Hi everybody!

I think these might be related to mysql... server is RedHat 8.0, kernel 2.4.18-14smp, mysql version 3.23.52.

It runs 5 big databases with about 75 tables total. There are about 150 clients doing inserts, and scripts running that mine data from it. Total insert/update traffic is roughly 2.5 GB per 24 hours, and total online data is about 200 GB.

The past two days, it has core dumped or locked hard 4 times. I can find no issues with the server... load average is within reason, disks aren't full, etc. This is driving me nuts!

Any ideas or suggestions on investigating this are greatly appreciated. Thank you all!

I happened to get a picture of the screen (unfortunately poor quality) which you can see HERE

I copied the text here as well (it was hard for me to read, so it might have an error or two):
Code:
autofs 3c59x iptable_filter ip_tables ide-scsi ide-cd cdrom mousedev keybdev h
CPU:    0
EIP:    0010:[<c0140bb6>]    Not tainted
EFLAGS: 00010282

EIP is at __free_pages_ok [kernel] 0x326 (2.4.18-14smp)
eax: 00000047   ebx: c1e24450   ecx: eded0000   edx: d7d07014
esi: 00000000   edi: f3ba19b4   ebp: 00000000   esp: 49613ecc
ds: 0018   es: 0018   ss: 0018
Process mysqld (pid: 29090, stackpage=d9613000)
Stack: c0296360 c1e24450 f3ba19b4 c0147868 c1e24450 00001000 c0137aa5 000042d8
       00000000 000015b4 00001000 c1e24450 f3ba19b4 000042d9 c01374e6 d9613f6c
       c1e24450 00000000 00001000 00001000 00000000 00000000 00000000 f3ba1900
Call Trace: [<c0147868>] kmap_high [kernel] 0x50 (0xd9613ed8))
[<c0137aa5>] file_read_actor [kernel] 0xd5 (0xd9613f30))
[<c01374e6>] do_generic_file_read [kernel] 0x266 (0xd9613f04))
[<c01379d0>] file_read_actor [kernel] 0x0 (0xd9613f30))
[<c0137b80>] generic_file_read [kernel] 0xb0 (0xd9613f50))
[<c01379d0>] file_read_actor [kernel] 0x0 (0xd9613f60))
[<c014a5fa>] sys_pread [kernel] 0xca (0xd9613f8c))
[<c0109447>] system_call [kernel] 0x33 (0xd9613fc0))


Code: 0f 0b 82 00 99 5a 27 c0 8b 53 08 e9 0b fd ff ff 89 d8 e8 b3
 
Old 02-02-2006, 10:18 AM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3599
Quote:
The past two days, it has core dumped or locked hard 4 times. / I think these might be related to mysql...
Get a copy of your syslog. Most oopses should be logged there. Diff all four, and if they're not identical post at least two of them: it's better to have more, and more accurate, info, because screenshots plus retyping can't compensate for lines that scrolled off the screen. Besides that, does *any* daemon or application log show errors before the oops? Are these the only four oopses in the past six months? The past year? How about the database and users: was there a recent increase in usage? Were there any applications added? Any other recent changes to the box?
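
To make the "diff all four" step less error-prone than eyeballing, here is a rough Python sketch that pulls oops reports out of a syslog file and compares them. The function names, the list of start markers, and the assumption that each report ends at its "Code:" line are all illustrative guesses, not something taken from this box's logs; adjust them to what your kernel actually prints.

```python
import difflib
import re

# Strips the "Feb  1 10:57:40 host kernel: " prefix from a syslog line.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+kernel:\s?")
# Lines assumed to open an oops report; extend for your kernel's wording.
START = ("kernel BUG at", "Unable to handle kernel", "Oops:")


def extract_oopses(lines):
    """Return a list of oops reports, each a list of prefix-free lines."""
    reports, current = [], None
    for raw in lines:
        if " kernel: " not in raw:
            continue  # ignore non-kernel syslog entries
        text = PREFIX.sub("", raw.rstrip("\n"))
        if current is None and any(s in text for s in START):
            current = []
        if current is not None:
            current.append(text)
            if text.startswith("Code:"):  # assumed last line of a report
                reports.append(current)
                current = None
    if current:  # report cut short, e.g. the box locked up mid-write
        reports.append(current)
    return reports


def diff_reports(a, b):
    """Unified diff of two reports; an empty list means they are identical."""
    return list(difflib.unified_diff(a, b, "oops-a", "oops-b", lineterm=""))
```

Typical use would be something like `reports = extract_oopses(open("/var/log/messages"))` followed by `diff_reports(reports[0], reports[1])`. Register and stack values will normally differ between crashes, so the interesting part of the diff is whether the EIP function and the call trace stay the same.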


Quote:
I can find no issues with the server... load average is within reason, disks aren't full, etc.
Do you run continuous stats with something like sa (from sysstat), atsar or dstat? Especially when problems don't appear every 5 minutes, it comes in handy to be able to paint a larger picture of what is going on.
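
If none of those tools is set up, a stopgap sampler is easy to improvise. The Python sketch below is only an illustration of the idea, not a replacement for sysstat: it assumes /proc/loadavg and /proc/meminfo are readable, and the function names, the CSV layout and the output file name are made up for the example.

```python
import time


def parse_loadavg(text):
    """Parse /proc/loadavg content into the 1, 5 and 15 minute averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Map 'Key: value kB' lines of /proc/meminfo to integers (in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip header-style lines whose first field is not a number.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def sample():
    """Take one sample as a CSV row: epoch, load1, load5, load15, MemFree."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return "%d,%.2f,%.2f,%.2f,%d" % (time.time(), *load, mem.get("MemFree", -1))


def run(path="stats.csv", interval=60):
    """Append one row every `interval` seconds until interrupted."""
    while True:
        with open(path, "a") as out:
            out.write(sample() + "\n")
        time.sleep(interval)
```

Calling `run()` from a screen session or an init script gives a per-minute history to look back on after the next crash. The file is reopened for every row so that a hard lockup loses at most the sample being written.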


Quote:
server is RedHat 8.0, kernel 2.4.18-14smp, mysql version 3.23.52.
Was this box designed and configured for this task?
BTW, any compelling reason for running an EOL'ed release and vulnerable kernel?
 
Old 02-02-2006, 12:51 PM   #3
whysyn
Member
 
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154

Original Poster
Rep: Reputation: 30
Thanks for the response. I know I'm a total hack and I always appreciate knowledgeable users having patience with me...

Syslogging: here (linked due to length)
Only one of the 4 crashes actually wrote to the log, and it looks like syslogd is double-logging the entries (I'll have to look into that also), but I left the log as-is to avoid introducing any errors. The server has been rock solid since installation. The only issues were due to filling disks on a couple of occasions.

There has in the past 10 days or so been a modest increase in volume... of the 150 concurrent client connections I mentioned, about 10 of them are new. Their individual volume is not much different from average for our clients, but it is 10 new ones.

I can find no other application logs in the crash timeframe, everything seemed to be normal.

SA is running in cron, but I have never dealt with it, nor do I know if it is even running properly. I'll have to look into this, any suggestions welcome =)

This box was spec'd and built for this task and this task only. It was built from factory-new parts and has been in continuous operation since late 2003 (EDIT: late 2002, I can't subtract...). We're still running RH8.0 (1) because it hasn't been broken until now and (2) because of the downtime and prohibitive costs (parallel hardware, man hours, et al) associated with an upgrade.

Thanks again!

EDIT: typos

Last edited by whysyn; 02-02-2006 at 08:38 PM.
 
Old 02-15-2006, 03:11 PM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3599
Sorry. Way late.

Feb 1 10:57:40 durant kernel: Page has mapping still set. This is a serious situation. However if you
Feb 1 10:57:40 durant kernel: kernel BUG at page_alloc.c:130!
Feb 1 10:57:40 durant kernel: EIP is at __free_pages_ok [kernel] 0x326 (2.4.18-14smp)

IIGC this has something to do with trying to free a page while it is still in use, and I can't think of any other advice than moving to a later kernel. Updating through Fedora Legacy may be an option.
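
As a cross-check on the hand-typed screen capture in post #1, the "Code:" line can be decoded. The Python sketch below assumes the 2.4-era i386 BUG() layout (a ud2 instruction, bytes 0f 0b, followed by a 16-bit line number and a 32-bit pointer to the source file name); treat that layout as an assumption if your kernel was built differently. The function name is made up for the example.

```python
import struct


def decode_bug(code_line):
    """Decode (line_number, file_name_pointer) from an oops 'Code:' line.

    Returns None if the bytes do not start with ud2 (0f 0b).
    """
    hex_bytes = code_line.split(":", 1)[1].split()
    # Some oops formats mark the faulting byte as <0f>; drop the brackets.
    raw = bytes(int(b.strip("<>"), 16) for b in hex_bytes)
    if raw[:2] != b"\x0f\x0b":
        return None
    # Little-endian: unsigned 16-bit line number, then unsigned 32-bit pointer.
    line, file_ptr = struct.unpack_from("<HI", raw, 2)
    return line, file_ptr
```

Fed the line from the first post, `decode_bug("Code: 0f 0b 82 00 99 5a 27 c0 8b 53 08 e9 0b fd ff ff 89 d8 e8 b3")` gives line 130 (0x82), which agrees with the "kernel BUG at page_alloc.c:130!" entry in the syslog, so at least that part of the transcription looks right.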
 
Old 05-08-2006, 01:56 PM   #5
whysyn
Member
 
Registered: Jun 2003
Location: Cleveburg, OH
Distribution: mostly Fedora
Posts: 154

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by unSpawn
Sorry. Way late.
I'm even later =)

It turned out to be a heat issue. One of the CPU fans had died. After replacing it, the system has been rock solid.
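
For anyone landing here with similar symptoms, a quick fan and temperature check is worth doing early. The Python sketch below reads the hwmon sysfs files found on current kernels (fan*_input in RPM, temp*_input in millidegrees Celsius). That interface is an assumption for this example and is not available on the 2.4.18 kernel discussed in this thread, where lm_sensors would be the place to look; the function names are made up.

```python
import glob
import os


def read_sensors(root="/sys/class/hwmon"):
    """Return {path: value} for every fan (RPM) and temperature (deg C) input."""
    readings = {}
    for pattern, scale in (("fan*_input", 1), ("temp*_input", 1000)):
        for path in glob.glob(os.path.join(root, "hwmon*", pattern)):
            try:
                with open(path) as f:
                    readings[path] = int(f.read().strip()) / scale
            except (OSError, ValueError):
                continue  # some inputs fail to read; skip them
    return readings


def stalled_fans(readings):
    """Fan inputs reporting 0 RPM: a stopped fan or an unconnected header."""
    return sorted(p for p, v in readings.items()
                  if os.path.basename(p).startswith("fan") and v == 0)
```

Running `stalled_fans(read_sensors())` from cron and mailing any non-empty result would be one way to catch a dead fan before it shows up as kernel oopses.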
 
  


All times are GMT -5.
