LinuxQuestions.org
Help answer threads with 0 replies.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Server
User Name
Password
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Notices


Reply
  Search this Thread
Old 06-27-2012, 11:24 PM   #1
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Rep: Reputation: Disabled
Server overload apache/php/mysql


Hi all,
I'm posting this in the apache forum although the topic is broader I think.
I've been asked to look at a site generating frequent 500 internal errors. The site is an apache/joomla/mysql install run on a dedicated linux server.
Looking at the server, I see extremely heavy load averages: around 30 in normal hours and up to 90 during busy hours.
I figure there is some sort of configuration problem but haven't been able to find where (apache, mysql, php). Note that mod_suphp is enabled and php scripts are run as site owner. Php5 used.
I'd appreciate your inputs.

Here are some infos:

Apache:

Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 configured

Top output (2208 http connections at the time):

top - 10:30:55 up 28 days, 16:48, 1 user, load average: 44.47, 38.04, 36.38
Tasks: 285 total, 4 running, 275 sleeping, 0 stopped, 6 zombie
Cpu(s): 45.3%us, 9.9%sy, 0.0%ni, 43.6%id, 1.1%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 12462248k total, 5214072k used, 7248176k free, 409068k buffers
Swap: 2104504k total, 144k used, 2104360k free, 3732868k cached


PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4597 myuser 17 0 30568 13m 6840 R 95.9 0.1 0:00.49 php
4485 myuser 16 0 100m 21m 7296 R 84.1 0.2 0:02.05 php
23401 mysql 15 0 2432m 366m 4296 S 68.5 3.0 614:01.70 mysqld
4567 myuser 16 0 105m 24m 7560 R 54.8 0.2 0:01.90 php
4569 myuser 16 0 108m 28m 7672 S 50.9 0.2 0:01.97 php
4589 myuser 17 0 96252 13m 6932 R 47.0 0.1 0:00.33 php
4595 myuser 17 0 0 0 0 Z 37.2 0.0 0:00.20 php <defunct>


mysqladmin proc stat

Uptime: 71024 Threads: 22 Questions: 17147915 Slow queries: 7466 Opens: 3723 Flush tables: 1 Open tables: 1950 Queries per second avg: 241.438
 
Old 06-28-2012, 01:19 AM   #2
zer0signal
Member
 
Registered: Oct 2010
Location: Cleveland
Distribution: Slackware, Fedora, RHEL (4,5), LFS 6.7, CentOS
Posts: 258

Rep: Reputation: 29
what are the zombie processes i see top is showing...
 
Old 06-28-2012, 02:37 AM   #3
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
I think it's only finishing processes (php, httpd, exim). No process is staying in zombie state, although some php are staying in that state for a few seconds.

Last edited by jbenezech; 06-28-2012 at 02:43 AM.
 
Old 06-28-2012, 03:04 AM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600
500 return codes are the "bad" ones: bad configuration (internal error), bad code, bad time (server busy), etc, etc, so it does not suffice to post the server string and top information.
* BTW if you want to run 'top' then make the output more meaningful by temporarily renaming your ~/.toprc and saving this as ~/.toprc:
Code:
RCfile for "top with windows"
Id:a, Mode_altscr=0, Mode_irixps=1, Delay_time=1.000, Curwin=0
Def     fieldscur=AEHIOQTWKNMbcdfgjplrsuvyzX
        winflags=30201, sortindx=10, maxtasks=0
        summclr=1, msgsclr=1, headclr=3, taskclr=1
Job     fieldscur=ABcefgjlrstuvyzMKNHIWOPQDX
        winflags=62777, sortindx=0, maxtasks=0
        summclr=6, msgsclr=6, headclr=7, taskclr=6
Mem     fieldscur=ANOPQRSTUVbcdefgjlmyzWHIKX
        winflags=62777, sortindx=13, maxtasks=0
        summclr=5, msgsclr=5, headclr=4, taskclr=5
Usr     fieldscur=ABDECGfhijlopqrstuvyzMKNWX
        winflags=62777, sortindx=4, maxtasks=0
        summclr=3, msgsclr=3, headclr=2, taskclr=3
and then run this one shot invocation: 'top -u myuser -n 1 -H 2>&1 | tee /path/to/top.log'.
- Take stock of the exact application versions that make up the web stack (web server, database and interpreter) and get a grip on what runs on top of that so that means Joomla, its configuration and including all plug-ins and homebrewn scripts. If versions don't match with the latest stable the vendor supplies you know what to do.
- Look for clues in system and daemon log files and include recently rotated log files. Logwatch can generate a report for you, making it easier to look for leads.
- Watching performance, start with mysqltop(-like) process list watching like 'watch -d 'mysql -e"show processlist;"';' or mtop or mytop, and wtop/logrep or any apachetop(-like) tool. As for the ^.*top$ tools watch out as there's different implementations around, some seemingly more recent than others.
 
Old 06-28-2012, 03:52 AM   #5
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
Thanks for the tips. I know query optimization is definitely needed given joomla framework but there must be something else that brings down this server which should be able to handle the load (8 processors, 12GB mem). I've seen some problems in logs (premature end of script header for instance) but which only happen when the server is overloaded. The 500 errors also only happen when there is too much load.
I'll start with mtop and see what happens.
 
Old 06-28-2012, 04:46 AM   #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600Reputation: 3600
Quote:
Originally Posted by jbenezech View Post
I've seen some problems in logs
I choose dealing with cold, hard unambiguous data over accounts of things. Could it be that in your case more pairs of eyeballs could see more? As in attach any reporting and whatever tool output you have?
 
Old 06-28-2012, 05:45 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,141

Rep: Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123
In addition to all that, when you see problems, run these and post the output
Code:
top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: "count}' > topsave.txt
grep -i blocked /proc/stat
The first will list tasks in uninterruptible sleep, and the second the number blocked on I/O (no, they ain't necessarily the same).
 
Old 06-28-2012, 12:53 PM   #8
zer0signal
Member
 
Registered: Oct 2010
Location: Cleveland
Distribution: Slackware, Fedora, RHEL (4,5), LFS 6.7, CentOS
Posts: 258

Rep: Reputation: 29
I may be out of my league here, but do you have Sysstat running? You could use it too seem whats going on in a 24hr/10day range..

man sar
 
Old 07-04-2012, 05:00 AM   #9
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
Sorry I could not post more details earlier. I was trying some tools you suggested and tried to see what was relevant or not in the logs.

First of all, regarding config:
Apache:
Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.
8.31 OpenSSL/0.9.8e-fips-rhel5 configured -- resuming normal operations
[Wed Jul 4 05:32:07 2012] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)

Joomla: 1.5
PHP: 5
MYSQL: Server version: 5.0.91-community-log MySQL Community Edition

I had found some errors in the sys message log about full EXT3 directory. Cleaning the /tmp directory and adding a crontab to keep it clean solved that problem.

I changed some apache config, mainly put KeepAlive On again with a KeepAliveTimeout at 5 as I had noticed thousands of sockets in TIME_WAIT state.
MaxRequestsPerChild 250
MaxClients 1000
KeepAlive On
Timeout 300
KeepAliveTimeout 5
MaxKeepAliveRequests 250

Using mysqltop, I saw a number of queries being locked by constant table updates. These where caused by joomla updating the main table jos_content to increment page hit counter. I suppressed this counter.

After these changes, the server is still struggling with 30% load average but we did not notice 500 internals. We'll it's back today.

Here are some outputs at the time of the problem:

> netstat | grep http | wc -l
558

> netstat -alpn | grep :80 | awk '{print $5}' |awk -F: '{print $(NF-1)}' |sort | uniq -u | wc -l
69

> mysqladmin -i1 extended | grep Threads_connected
16

>ps -el | grep D
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 D 0 12436 12001 1 78 0 - 957 - pts/1 00:00:00 rm
5 D 504 22121 5073 0 86 10 - 5923 - ? 00:00:08 pure-ftpd


See also attached mysqlprocstat.txt, top.log as per recommandations previously, topsave.txt as per previous request

Finally, a strace of a processs showed a line with a very long execution time:
1114 read(3, "", 4096) = 0 <5.870374>

I could atach the trace file but it's quite big and I'm not sure which lines would be relevant. Please let me know.
Attached Files
File Type: txt mysqlprocstat.txt (1.8 KB, 19 views)
File Type: txt top.log.txt (3.0 KB, 30 views)
File Type: txt topsave.txt (908 Bytes, 13 views)
 
Old 07-04-2012, 05:22 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,141

Rep: Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123Reputation: 4123
Well I expected to see (lots) more status "D" tasks - blows my theory away.
Pity you didn't get that blocked count I asked for as well ...
 
Old 07-04-2012, 06:08 AM   #11
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,483

Rep: Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556Reputation: 1556
Your website must be generating some traffic if you're hitting 225 MySQL queries a second. My vBulletin site doing 1.5M pages a month only averages 85 Q/s

Have you considered that your site may need a dedicated database server instead of trying to run the web server and the database server on the same box?
 
Old 07-04-2012, 05:46 PM   #12
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
Blocked count was 2, sorry didn't post. Thought the list was enough.
Regarding the queries per second on mysql, I find this number very suspicious. Monitoring mysql with mtop with a refresh of 2s, I see max 6-7 queries running. I don't know how to reset this number to start fresh stats. I think the site might have been victim of DOS attack 2 months ago and maybe then thousands of queries were sent and the average doesn't reflect current reality. Another theory (I don't know how mysql calculates this number) is maybe that every sub-query is counted as a query. Default joomla queries are usually very poorly optimized and made of inner/outer joins, sub-queries...
 
Old 07-05-2012, 12:05 AM   #13
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
I managed to strace httpd process when getting a 500. The sudo php child process is being terminated by SIGKILL:

5685 poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000010>
5685 write(5, "\212\2\0\0\3SELECT * FROM jos_content I"..., 654) = 654 <0.053426>
5685 read(5, <unfinished ...>
5685 +++ killed by SIGKILL +++
32407 <... read resumed> "", 4096) = 0 <26.886731>
32407 --- SIGCHLD (Child exited) @ 0 (0) ---
32407 time(NULL) = 1341464085 <0.000005>
32407 write(15, "[Thu Jul 5 11:54:45 2012] [erro"..., 135) = 135 <0.000009>

Am I hitting some kind of server limit ?

ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 212992
max locked memory (kbytes, -l) 2048
max memory size (kbytes, -m) unlimited
open files (-n) 50000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 28670
cpu time (seconds, -t) unlimited
max user processes (-u) 14335
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
 
Old 07-05-2012, 07:22 PM   #14
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
Sigkill

I did some more strace and it seems that the 500 are due to this SIGKILL. This KILL doesn't seem to be linked to any particular action (mysql, file system...):

5685 read(5, <unfinished ...>
5685 +++ killed by SIGKILL +++
32407 <... read resumed> "", 4096) = 0 <26.886731>
32407 --- SIGCHLD (Child exited) @ 0 (0) ---

19454 open("/opt/xslt//lib/libnss_files.so.2", O_RDONLY <unfinished ...>
19454 +++ killed by SIGKILL +++
32030 <... read resumed> "", 4096) = 0 <0.558116>
32030 --- SIGCHLD (Child exited) @ 0 (0) ---

16359 mkdir("/tmp/eaccelerator/8/d", 0777) = -1 EEXIST (File exists) <0.000010>
16359 umask(022) = 0 <0.000007>
16359 +++ killed by SIGKILL +++
32030 <... read resumed> "", 4096) = 0 <0.241756>
32030 --- SIGCHLD (Child exited) @ 0 (0) ---
 
Old 07-10-2012, 03:41 AM   #15
jbenezech
LQ Newbie
 
Registered: Jun 2012
Posts: 12

Original Poster
Rep: Reputation: Disabled
Still no clue what's going on but I think I found what was killing the processes.
The host runs a perl script which checks every 5 seconds the load average and kills some processes, starting with the php processes if load average is too high. Processes are getting killed with SIGKILL which is why I guess the end user gets a 500 error.
Back to the root of the problem, this is happening because the server is over loaded on CPU but I still can't figure out why.

I don't think the problem comes from memory, there are always a few GB available.
Not sure it comes from IO either, iowait is always around 1%, await from iostat averages to 4.70 (but with picks to 300).

The top command shows processes using 100% CPU, mainly mysql and php.
I've run quite a few tools and the only thing I can see not being quite right is the sar output showing an average of 35 in run queue size.
Also the number of context switching seems high around 2000 /s.

So it looks like some processes get access to 100% CPU while the other ones wait. Next one gets 100% and so on. How come this is not better balanced ?
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
mysql using 50%-110% of cpu / Getting [statscheck] Stats/Server Overload on server!!! CrewXp Linux - Server 2 05-16-2007 05:28 AM
communication of PHP ,MySQL and apache server shweta Linux - Server 1 08-14-2006 10:37 AM
PHP/MySQL and Apache server setup nenyo Linux - Software 7 02-12-2005 01:48 PM
Apache+php on one server and MySQL on onother server? goe Linux - Newbie 3 08-19-2004 04:25 PM
Installing PHP and MySQL w/ Apache web server dcrealm47 Linux - Newbie 5 02-26-2003 11:21 AM

LinuxQuestions.org > Forums > Linux Forums > Linux - Server

All times are GMT -5. The time now is 04:08 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration