LinuxQuestions.org
Old 04-06-2011, 01:20 PM   #1
Willow315
LQ Newbie
 
Registered: Apr 2011
Posts: 5

Rep: Reputation: 0
CentOS server crashing repeatedly, sometimes Timed Out


I am running a Linux Server, with CentOS OS. Apache, MySQL, and PHP.
Running sites through Drupal 6. Recently the site started timing out periodically. Sometimes restarting services brings it back up. Sometimes I am locked out and cannot restart services or reboot remotely, and have to restart the whole server manually.

I am so new to Linux, I don't have the foggiest idea about where to begin to find out what the problem is. I can find the error logs, but making sense of them is another story. I suspect it's a problem with MySQL, but don't know how to confirm that, and if it is, don't know what to do about it. iiiiieeee. When I restart MySQL service, the site comes right back up. But what is taking it down to begin with?

I have tons of books, but they only tell you how things work when everything is running perfectly. I've had several experts optimize the Apache config file, and things were fine for a while. We've had this problem before. As a matter of fact, one of the last times, the site went down and I couldn't get it back up until I reverted to an older version of the db. Even then, it failed the first time until a huge bunch of Boost cache files were deleted. I've done all the clean-up I can think of, but the durn thing keeps crashing.

Any assistance at all would be helpful. Thanks, WCW
 
Old 04-06-2011, 01:32 PM   #2
Walter.Stroebel
LQ Newbie
 
Registered: Apr 2011
Location: Arnhem, The Netherlands
Distribution: CentOS
Posts: 19

Rep: Reputation: 6
Just to make sure, when you type dmesg in a terminal window, you don't get a lot of hardware errors, right?
If there is a problem with MySQL, there should be some pretty clear messages in /var/log/mysql.
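A full dmesg dump is mostly harmless boot chatter, so it can help to filter it down to the lines that usually signal trouble. The following is only a rough sketch: `kernel_trouble` is a made-up helper name and the pattern list is my own guess at what is worth flagging, not a complete catalogue of kernel errors.

```shell
#!/bin/sh
# kernel_trouble: hypothetical helper, not a standard tool.
# Reads kernel log text on stdin and prints only lines that commonly
# point at a problem: hung tasks, memory exhaustion, disk or I/O errors.
kernel_trouble() {
    grep -E -i 'blocked for more than [0-9]+ seconds|out of (socket )?memory|oom-killer|I/O error|medium error'
}

# Typical use on the server itself (left commented out on purpose):
#   dmesg | kernel_trouble
#   tail -n 50 /var/log/mysqld.log   # assumed log path, adjust to your install
```

If the filter prints nothing, the kernel log at least shows none of these particular symptoms. The MySQL log path in the comment is an assumption; check the `log-error` setting in /etc/my.cnf if the file is not there.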
 
Old 04-07-2011, 08:57 AM   #3
Willow315
LQ Newbie
 
Registered: Apr 2011
Posts: 5

Original Poster
Rep: Reputation: 0
Thanks Walter. I have just run dmesg. Down at the bottom I noticed some mysql errors, and at the very end something that says, "Out of socket memory." I don't know how to interpret that. Here are my results:

ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdcio, hddio
Probing IDE interface ide0...
hda: TEAC CD-ROM CD-224E, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
Probing IDE interface ide1...
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI: (supports S0 S4 S5)
Initalizing network drop monitor service
Freeing unused kernel memory: 228k freed
Write protecting the kernel read-only data: 409k
Time: tsc clocksource has been installed.
input: AT Translated Set 2 keyboard as /class/input/input0
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 177, io mem 0xfeb00000
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 169, io base 0x0000bce0
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 185, io base 0x0000bcc0
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
usb 1-3: new high speed USB device using ehci_hcd and address 2
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 193, io base 0x0000bca0
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
input: ImExPS/2 Logitech Wheel Mouse as /class/input/input1
usb 1-3: configuration #1 chosen from 1 choice
hub 1-3:1.0: USB hub found
hub 1-3:1.0: 2 ports detected
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
SCSI subsystem initialized
megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
megaraid: probe new device 0x1028:0x0013:0x1028:0x016d: bus 2:slot 14:func 0
ACPI: PCI Interrupt 0000:02:0e.0[A] -> GSI 46 (level, low) -> IRQ 201
megaraid: fw version:[513O] bios version:[H418]
scsi0 : LSI Logic MegaRAID driver
scsi[0]: scanning scsi channel 0 [Phy 0] for non-raid devices
Vendor: PE/PV Model: 1x6 SCSI BP Rev: 1.0
Type: Processor ANSI SCSI revision: 02
scsi[0]: scanning scsi channel 1 [Phy 1] for non-raid devices
scsi[0]: scanning scsi channel 2 [virtual] for logical drives
Vendor: MegaRAID Model: LD 0 RAID1 139G Rev: 513O
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 286515200 512-byte hdwr sectors (146696 MB)
sda: Write Protect is off
sda: Mode Sense: 00 00 00 00
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 286515200 512-byte hdwr sectors (146696 MB)
sda: Write Protect is off
sda: Mode Sense: 00 00 00 00
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2
sd 0:2:0:0: Attached scsi disk sda
Vendor: MegaRAID Model: LD 1 RAID1 140G Rev: 513O
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sdb: 287047680 512-byte hdwr sectors (146968 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 00 00 00
sdb: asking for cache data failed
sdb: assuming drive cache: write through
SCSI device sdb: 287047680 512-byte hdwr sectors (146968 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 00 00 00
sdb: asking for cache data failed
sdb: assuming drive cache: write through
sdb: sdb1
sd 0:2:1:0: Attached scsi disk sdb
libata version 3.00 loaded.
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel@redhat.com
device-mapper: dm-raid45: initialized v0.2594l
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
type=1404 audit(1302149482.468:2): selinux=0 auid=4294967295 ses=4294967295
Intel(R) PRO/1000 Network Driver - version 7.3.21-k4.1-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:06:07.0[A] -> GSI 64 (level, low) -> IRQ 209
EDAC MC: Ver: 2.0.1 Jan 5 2011
e1000: 0000:06:07.0: e1000_probe: (PCI:66MHz:32-bit) 00:11:43:d4:74:3b
input: PC Speaker as /class/input/input2
intel_rng: FWH not detected
Contact your BIOS vendor to see if the E752x error registers can be safely un-hidden
hda: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
scsi 0:0:6:0: Attached scsi generic sg0 type 3
sd 0:2:0:0: Attached scsi generic sg1 type 0
sd 0:2:1:0: Attached scsi generic sg2 type 0
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI Interrupt 0000:07:08.0[A] -> GSI 65 (level, low) -> IRQ 217
e1000: 0000:07:08.0: e1000_probe: (PCI:66MHz:32-bit) 00:11:43:d4:74:3c
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
lp: driver loaded but no devices found
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
mtrr: type mismatch for f0000000,1000000 old: write-back new: write-combining
ACPI: Power Button (FF) [PWRF]
ACPI: Mapper loaded
dell-wmi: No known WMI GUID found
ACPI Exception (video-1450): AE_NOT_FOUND, Evaluating _DOD [20060707]
input: Video Bus as /class/input/input3
ACPI: Video Device [EVGA] (multi-head: no rom: yes post: no)
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on dm-0, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 5668856k swap on /dev/VolGroup00/LogVol01. Priority:-1 extents:1 across:5668856k
IA-32 Microcode Update Driver: v1.14a <tigran@veritas.com>
ip6_tables: (C) 2000-2006 Netfilter Core Team
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
ip_conntrack version 2.4 (8192 buckets, 65536 max) - 228 bytes per conntrack
ADDRCONF(NETDEV_UP): eth0: link is not ready
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Bluetooth: Core ver 2.10
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.8
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
Bluetooth: HIDP (Human Interface Emulation) ver 1.1
eth0: no IPv6 routers present
mtrr: type mismatch for f0000000,1000000 old: write-back new: write-combining
INFO: task mysqld:2915 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 0000190D 1992 2915 2721 2917 (NOTLB)
f7679f70 00200082 d5e2da98 0000190d 6e41cbd8 f71d9000 c0434180 00000005
f74a8aa0 d5e43e4b 0000190d 000163b3 00000000 f74a8bac cd412d80 f76a3740
f74a8aa0 f7679fb4 cd413720 cd419bc4 00000020 00000000 bfd00e68 c05b6ff0
Call Trace:
[<c0434180>] attach_pid+0x6c/0x98
[<c05b6ff0>] sys_setsockopt+0x76/0x95
[<c05b7004>] sys_setsockopt+0x8a/0x95
[<c061d5e6>] rwsem_down_write_failed+0x126/0x141
[<c0438ed5>] .text.lock.rwsem+0x2b/0x3a
[<c0466fe9>] sys_mmap_pgoff+0x3c/0x81
[<c0404f17>] syscall_call+0x7/0xb
=======================
INFO: task mysqld:7752 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 0000190D 2152 7752 2721 7753 7751 (NOTLB)
e446bf2c 00200082 dbd6102d 0000190d 5c8c1e8a 0000000f de58b009 00000007
f5066550 dbd76819 0000190d 000157ec 00000000 f506665c cd412d80 f76a3740
de58b000 00000401 cd413720 cd419bc4 00000020 00000000 cd41323c de58b000
Call Trace:
[<c061d729>] rwsem_down_read_failed+0x128/0x143
[<c0438edf>] .text.lock.rwsem+0x35/0x3a
[<c061ead5>] do_page_fault+0x249/0x64f
[<c0449d96>] audit_syscall_entry+0x15a/0x18c
[<c061e88c>] do_page_fault+0x0/0x64f
[<c0405a89>] error_code+0x39/0x40
=======================
INFO: task mysqld:8197 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 00001911 2572 8197 2721 8198 7898 (NOTLB)
e152ef2c 00200082 25a3ee6e 00001911 00000000 00000000 73c4e076 00000007
f02ee550 25a46ae2 00001911 00007c74 00000002 f02ee65c cd420a08 f76a3740
00000064 e152eefc cd4213a8 cd42784c 00000020 00000000 e152ef44 e152ef44
Call Trace:
[<c061d729>] rwsem_down_read_failed+0x128/0x143
[<c0438edf>] .text.lock.rwsem+0x35/0x3a
[<c061ead5>] do_page_fault+0x249/0x64f
[<c0449d96>] audit_syscall_entry+0x15a/0x18c
[<c061e88c>] do_page_fault+0x0/0x64f
[<c0405a89>] error_code+0x39/0x40
=======================
Out of socket memory

______________________________________

Any assistance in interpreting this would be appreciated. The site keeps going down more and more often, so there is a problem that is continuing to get worse.... Thanks, WCW
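On the last line of that output: as far as I know, the kernel prints "Out of socket memory" when TCP buffer memory passes the hard limit (the third number in /proc/sys/net/ipv4/tcp_mem), or when there are too many orphaned sockets (see /proc/sys/net/ipv4/tcp_max_orphans). A minimal sketch for checking the first case is below. `check_tcp_mem` is a hypothetical helper name, and the two file arguments exist only so it can be tried against saved copies of the files.

```shell
#!/bin/sh
# check_tcp_mem [SOCKSTAT_FILE] [TCP_MEM_FILE]
# Hypothetical helper: compares the TCP "mem" figure from sockstat (in pages)
# with the third value of tcp_mem (the hard limit, also in pages).
# Returns 0 when usage is below the limit, 1 when at or above it,
# and 2 when the values could not be read.
check_tcp_mem() {
    sockstat=${1:-/proc/net/sockstat}
    limits=${2:-/proc/sys/net/ipv4/tcp_mem}

    # The TCP line looks like: TCP: inuse 40 orphan 3 tw 10 alloc 45 mem 9000
    used=$(awk '$1 == "TCP:" { for (i = 2; i < NF; i += 2) if ($i == "mem") print $(i + 1) }' "$sockstat")
    max=$(awk '{ print $3 }' "$limits")

    if [ -z "$used" ] || [ -z "$max" ]; then
        echo "could not read TCP memory figures" >&2
        return 2
    fi

    echo "TCP memory: $used pages in use, hard limit $max pages"
    [ "$used" -lt "$max" ]
}

# On the live server (left commented out on purpose):
#   check_tcp_mem || echo "TCP memory is at or over the limit"
```

The `orphan` figure on the same TCP line of /proc/net/sockstat can be compared with tcp_max_orphans in the same way. Neither check says why the sockets pile up; the hung mysqld tasks shown earlier in the output would be my first suspect, but that is a guess.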
 
Old 04-07-2011, 03:11 PM   #4
Willow315
LQ Newbie
 
Registered: Apr 2011
Posts: 5

Original Poster
Rep: Reputation: 0
Here are some entries from my /var/log/mysqld.log file.

I checked all the permissions in relation to these files. I found in another blog that this first sequence was caused by incorrect permissions... but mine are OK.

110404 10:56:20 [Note] Starting crash recovery...
110404 10:56:20 [Note] Crash recovery finished.
110404 10:56:20 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=mysqld-relay-bin' to avoid this problem.
110404 10:56:20 [ERROR] /usr/libexec/mysqld: File './mysqld-relay-bin.000001' not found (Errcode: 2)
110404 10:56:20 [ERROR] Failed to open log (file './mysqld-relay-bin.000001', errno 2)
110404 10:56:20 [ERROR] Failed to open the relay log './mysqld-relay-bin.000001' (relay_log_pos 4)
110404 10:56:20 [ERROR] Could not open log file
110404 10:56:20 [ERROR] Failed to initialize the master info structure
110404 10:56:20 [Note] Event Scheduler: Loaded 0 events
110404 10:56:20 [Note] /usr/libexec/mysqld: ready for connections.

We do have a second server, but it's not being replicated in the standard fashion. A custom script has been written.
However, we did have mysqld-relay-bin files, but not in this location, nor do we have that number. Ours start at 000013. That path leads to a folder where there is an executable file called mysqld, but no folder or files by that name. I don't even know if that's what should be there. All my mysqld-relay-bin files are in the /var/lib/mysql folder.
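A note on reading those errors: the `/usr/libexec/mysqld:` prefix is only the name of the program writing the message, and a relative path such as `./mysqld-relay-bin.000001` is resolved against the MySQL data directory (/var/lib/mysql here), not against /usr/libexec. So the server is looking for a relay log in the same folder where the 000013 files live. A sketch for finding which files the relay log index still points at but which no longer exist is below. `missing_relay_logs` is a hypothetical helper, and the index file name `mysqld-relay-bin.index` is an assumption based on the base name in the log.

```shell
#!/bin/sh
# missing_relay_logs [DATADIR] [INDEX_FILE]
# Hypothetical helper: reads the relay log index (one file name per line)
# and reports every listed relay log that is not present on disk.
# The default index name is an assumption, not confirmed by the thread.
missing_relay_logs() {
    datadir=${1:-/var/lib/mysql}
    index=${2:-$datadir/mysqld-relay-bin.index}

    if [ ! -r "$index" ]; then
        echo "no readable index file at $index" >&2
        return 2
    fi

    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        case $entry in
            /*) path=$entry ;;                   # absolute path, use as-is
            *)  path=$datadir/${entry#./} ;;     # relative to the data directory
        esac
        [ -e "$path" ] || echo "missing: $entry"
    done < "$index"
    return 0
}

# On the server, as a user allowed to read the data directory:
#   missing_relay_logs /var/lib/mysql
```

If this reports 000001 as missing while only 000013 and later exist, the index (or relay-log.info) is out of step with the files on disk, which would fit a database restored from a backup.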

Last edited by Willow315; 04-07-2011 at 03:12 PM.
 
Old 04-09-2011, 06:05 PM   #5
Walter.Stroebel
LQ Newbie
 
Registered: Apr 2011
Location: Arnhem, The Netherlands
Distribution: CentOS
Posts: 19

Rep: Reputation: 6
It would seem you do have replication set up (or had it set up).
Examine /etc/my.cnf for lines like log-bin=xxxx and server-id=xxxx which would indicate replication was set up at some point.
Also check in /var/lib/mysql for any files ending with .info

If I had to guess, I would say you restored a backup from a setup that did have replication between two mysql servers and you now have a *TON* of data waiting to be replicated to another server that does not exist (anymore).

If you do have /var/lib/mysql/*.info files but are sure there should be no replication going on, just rename them to something like (for instance) master.info.old
That will probably solve your problem.
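The checks above can be sketched as a couple of small shell functions. The names `replication_hints` and `park_info_files` are made up for illustration, and the option list is only the handful I would look for first (the binary-log option is spelled log-bin or log_bin in my.cnf). Stopping mysqld before renaming anything is my own assumption about the safe order of operations.

```shell
#!/bin/sh
# replication_hints [MY_CNF] [DATADIR]
# Hypothetical helper: prints replication-related lines from my.cnf and
# lists any *.info state files found in the data directory.
replication_hints() {
    cnf=${1:-/etc/my.cnf}
    datadir=${2:-/var/lib/mysql}

    grep -E -n '^[[:space:]]*(log[-_]bin|server[-_]id|relay[-_]log|master[-_])' "$cnf" \
        || echo "no replication options found in $cnf"

    for f in "$datadir"/*.info; do
        if [ -e "$f" ]; then
            echo "replication state file: $f"
        fi
    done
}

# park_info_files [DATADIR]
# Hypothetical helper: renames every *.info file to *.info.old so it can be
# put back later. Assumes mysqld has been stopped first.
park_info_files() {
    datadir=${1:-/var/lib/mysql}
    for f in "$datadir"/*.info; do
        if [ -e "$f" ]; then
            mv -- "$f" "$f.old"
        fi
    done
}

# On the server, as root (left commented out on purpose):
#   replication_hints
#   service mysqld stop && park_info_files && service mysqld start
```

Running `mysql -e 'SHOW SLAVE STATUS\G'` before and after is another way to see whether the server still considers itself a replication slave.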

Additional tips:
- Recent versions of phpmyadmin will show replication status and allow you to set it up fairly simply.
- Read up on mysql replication at www.mysql.com

Hope this helps,
Regards,
Walter.
 
Old 04-11-2011, 09:34 AM   #6
Willow315
LQ Newbie
 
Registered: Apr 2011
Posts: 5

Original Poster
Rep: Reputation: 0
Thank you Walter!!! I will look into all these. We actually DID do a db restore because several months ago the site crashed and I COULD NOT get it up, no matter what I did. A db restore was done.

There is STILL a server that custom scripts have been written for to transfer mySQL data to... I thought that because of these custom scripts, normal Replication was not used. But, as I mentioned, I am a newbie, and the problem is only getting worse.

I REALLY appreciate your response, and will check everything you have recommended. WCW
 
  

