Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
Hi all,
I'm posting this in the Apache forum, although I think the topic is broader.
I've been asked to look at a site that generates frequent 500 Internal Server Errors. The site is an Apache/Joomla/MySQL install running on a dedicated Linux server.
Looking at the server, I see extremely heavy load averages: around 30 in normal hours and up to 90 during busy hours.
I figure there is some sort of configuration problem but haven't been able to find where (Apache, MySQL, PHP). Note that mod_suphp is enabled and PHP scripts run as the site owner. PHP 5 is used.
I'd appreciate your inputs.
I think these are only processes that are finishing (php, httpd, exim). No process stays in the zombie state for long, although some php processes remain in that state for a few seconds.
500 return codes are the "bad" ones: bad configuration (internal error), bad code, bad time (server busy), etc, etc, so it does not suffice to post the server string and top information.
* BTW if you want to run 'top' then make the output more meaningful by temporarily renaming your ~/.toprc and saving this as ~/.toprc:
and then run this one shot invocation: 'top -u myuser -n 1 -H 2>&1 | tee /path/to/top.log'.
- Take stock of the exact application versions that make up the web stack (web server, database and interpreter) and get a grip on what runs on top of that: Joomla, its configuration, and all plug-ins and homebrewed scripts. If the versions don't match the latest stable release the vendor supplies, you know what to do.
- Look for clues in system and daemon log files and include recently rotated log files. Logwatch can generate a report for you, making it easier to look for leads.
- To watch performance, start with mysqltop(-like) process list watching, using something like watch -d 'mysql -e "show processlist;"' or mtop or mytop, and wtop/logrep or any apachetop(-like) tool. As for the ^.*top$ tools, watch out: there are different implementations around, some seemingly more recent than others.
Thanks for the tips. I know query optimization is definitely needed given the Joomla framework, but there must be something else bringing down this server, which should be able to handle the load (8 processors, 12GB of memory). I've seen some problems in the logs ("Premature end of script headers", for instance), but they only happen when the server is overloaded. The 500 errors also only happen when there is too much load.
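To check whether those log errors really line up with the busy hours, one option is to bucket them by hour. This is only a minimal sketch: it assumes the Apache 1.3 error log timestamp format seen elsewhere in this thread, and the function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Apache 1.3 error-log prefix, e.g. "[Wed Jul  4 05:32:07 2012]"
STAMP = re.compile(r"^\[\w{3} (\w{3})\s+(\d{1,2}) (\d{2}):\d{2}:\d{2} (\d{4})\]")


def errors_per_hour(lines, needle="Premature end of script headers"):
    """Count log lines containing `needle`, grouped by date and hour."""
    counts = Counter()
    for line in lines:
        if needle.lower() not in line.lower():
            continue
        match = STAMP.match(line)
        if match:
            month, day, hour, year = match.groups()
            counts[f"{year} {month} {int(day):02d} {hour}:00"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; in practice, iterate over the real error_log file.
    sample = [
        "[Wed Jul  4 05:32:07 2012] [error] [client 10.0.0.1] "
        "Premature end of script headers: /home/site/index.php",
        "[Wed Jul  4 05:48:10 2012] [error] [client 10.0.0.2] "
        "Premature end of script headers: /home/site/index.php",
        "[Wed Jul  4 06:01:00 2012] [notice] some unrelated message",
    ]
    for bucket, count in sorted(errors_per_hour(sample).items()):
        print(bucket, count)
```

Comparing these hourly counts with the load averages from sar for the same hours would show whether the errors track the load.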
I'll start with mtop and see what happens.
I choose dealing with cold, hard unambiguous data over accounts of things. Could it be that in your case more pairs of eyeballs could see more? As in attach any reporting and whatever tool output you have?
Sorry I could not post more details earlier. I was trying some tools you suggested and tried to see what was relevant or not in the logs.
First of all, regarding config:
Apache:
Apache/1.3.42 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.8e-fips-rhel5 configured -- resuming normal operations
[Wed Jul 4 05:32:07 2012] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)
Joomla: 1.5
PHP: 5
MYSQL: Server version: 5.0.91-community-log MySQL Community Edition
I had found some errors in the sys message log about full EXT3 directory. Cleaning the /tmp directory and adding a crontab to keep it clean solved that problem.
I changed some apache config, mainly put KeepAlive On again with a KeepAliveTimeout at 5 as I had noticed thousands of sockets in TIME_WAIT state.
MaxRequestsPerChild 250
MaxClients 1000
KeepAlive On
Timeout 300
KeepAliveTimeout 5
MaxKeepAliveRequests 250
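As a sanity check on MaxClients 1000, a common rule of thumb (not something from this thread) is to cap the number of children at what fits in RAM. The sketch below assumes a 12GB box as described above; the reserved amount and the per-request memory figure are placeholder assumptions that would have to be measured on the real server. With mod_suphp each PHP request also runs its own php process, so the per-request figure should cover both the httpd child and the php process.

```python
def max_clients_for_memory(total_ram_mb, reserved_mb, per_child_mb):
    """Largest number of concurrent children that fits in RAM.

    total_ram_mb: physical memory of the box
    reserved_mb:  memory set aside for MySQL, the OS and caches (assumption)
    per_child_mb: measured memory per request, httpd child plus php (assumption)
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_child_mb


if __name__ == "__main__":
    # 12GB total; 4GB reserved and 40MB per request are illustrative guesses.
    print(max_clients_for_memory(12 * 1024, 4 * 1024, 40))
```

This only bounds memory. With 8 processors, a much lower limit may still be needed to keep the CPU run queue short.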
Using mysqltop, I saw a number of queries being locked by constant table updates. These were caused by Joomla updating the main table jos_content to increment the page hit counter. I disabled this counter.
After these changes, the server is still struggling with 30% load average, but we did not notice any 500 Internal Server Errors. Well, it's back today.
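For anyone who wants a number rather than eyeballing the list, the states in the process list can be tallied. A minimal sketch, assuming the tab-separated output of mysql -B -e "show processlist;" with its header row; the function name and the sample rows are invented for illustration.

```python
from collections import Counter


def count_states(processlist_tsv):
    """Tally the State column of tab-separated SHOW PROCESSLIST output."""
    rows = processlist_tsv.strip("\n").splitlines()
    header = rows[0].split("\t")
    state_idx = header.index("State")
    counts = Counter()
    for row in rows[1:]:
        cols = row.split("\t")
        # Rows with a missing or empty State are grouped under "(none)".
        state = cols[state_idx] if len(cols) > state_idx else ""
        counts[state or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample resembling the lock contention described above.
    sample = "\n".join([
        "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo",
        "11\tsite\tlocalhost\tjoomla\tQuery\t3\tLocked\tSELECT * FROM jos_content WHERE id = 42",
        "12\tsite\tlocalhost\tjoomla\tQuery\t4\tUpdating\tUPDATE jos_content SET hits = hits + 1 WHERE id = 42",
        "13\tsite\tlocalhost\tjoomla\tQuery\t2\tLocked\tSELECT * FROM jos_content WHERE id = 7",
        "14\troot\tlocalhost\tNULL\tQuery\t0\tNULL\tshow processlist",
    ])
    print(count_states(sample))
```

Sampling this every few seconds and logging the count of Locked rows would show whether the locking went away after a change.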
Your website must be generating some traffic if you're hitting 225 MySQL queries a second. My vBulletin site doing 1.5M pages a month only averages 85 Q/s
Have you considered that your site may need a dedicated database server instead of trying to run the web server and the database server on the same box?
Blocked count was 2; sorry, I didn't post it. I thought the list was enough.
Regarding the queries per second on MySQL, I find this number very suspicious. Monitoring MySQL with mtop at a 2s refresh, I see at most 6-7 queries running. I don't know how to reset this number to start with fresh stats. I think the site might have been the victim of a DoS attack 2 months ago; maybe thousands of queries were sent then, and the average doesn't reflect current reality. Another theory (I don't know how MySQL calculates this number) is that every sub-query is counted as a query. Default Joomla queries are usually very poorly optimized and made of inner/outer joins, sub-queries...
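On the suspicious average: the "queries per second avg" figure reported by mysqladmin status is the Questions counter divided by Uptime, so it averages over everything since the server started. A current rate can be had without resetting anything by sampling the counter twice and dividing the difference by the interval. A minimal sketch, assuming the tab-separated output of mysql -B -e "SHOW GLOBAL STATUS LIKE 'Questions'"; the function names and numbers are illustrative.

```python
def parse_questions(status_tsv):
    """Extract the Questions counter from tab-separated SHOW STATUS output."""
    for line in status_tsv.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0] == "Questions":
            return int(parts[1])
    raise ValueError("no Questions row found")


def queries_per_second(questions_before, questions_after, seconds):
    """Current query rate from two samples taken `seconds` apart."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (questions_after - questions_before) / seconds


if __name__ == "__main__":
    # Made-up samples taken 100 seconds apart.
    first = parse_questions("Variable_name\tValue\nQuestions\t1000000\n")
    second = parse_questions("Variable_name\tValue\nQuestions\t1000700\n")
    print(queries_per_second(first, second, 100))
```

If the rate measured this way is far below 225 during busy hours, the long-run average is being skewed by past traffic.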
I did some more strace and it seems that the 500 are due to this SIGKILL. This KILL doesn't seem to be linked to any particular action (mysql, file system...):
Still no clue what's going on but I think I found what was killing the processes.
The host runs a perl script which checks every 5 seconds the load average and kills some processes, starting with the php processes if load average is too high. Processes are getting killed with SIGKILL which is why I guess the end user gets a 500 error.
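The host's Perl script was not posted, so the following is only a hypothetical reconstruction of that kind of watchdog, reduced to its decision logic. The threshold, the kill order and the function names are assumptions. It shows why the symptom is a 500: a php process killed mid-request never sends its headers, which Apache logs as "Premature end of script headers".

```python
def load_1min(loadavg_text):
    """Return the 1-minute load average from /proc/loadavg contents."""
    return float(loadavg_text.split()[0])


def pick_victims(processes, load, threshold, kill_order=("php", "httpd")):
    """Return the PIDs such a watchdog would kill, or [] if load is acceptable.

    processes:  list of (pid, name) tuples
    kill_order: process names to target, most expendable first (assumption)
    """
    if load <= threshold:
        return []
    for name in kill_order:
        pids = [pid for pid, pname in processes if pname == name]
        if pids:
            return pids
    return []


if __name__ == "__main__":
    # Dry run on made-up data; a real watchdog would send SIGKILL to these PIDs.
    procs = [(101, "php"), (102, "httpd"), (103, "php"), (104, "mysqld")]
    load = load_1min("35.12 30.40 28.77 9/512 12345")
    print(pick_victims(procs, load, threshold=20.0))
```

If the real script works like this, raising its threshold only hides the errors. The load itself still needs explaining.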
Back to the root of the problem: this is happening because the server's CPU is overloaded, but I still can't figure out why.
I don't think the problem comes from memory, there are always a few GB available.
I'm not sure it comes from I/O either: iowait is always around 1%, and await from iostat averages 4.70 (but with peaks up to 300).
The top command shows processes using 100% CPU, mainly mysql and php.
I've run quite a few tools and the only thing I can see not being quite right is the sar output showing an average of 35 in run queue size.
Also, the number of context switches seems high, at around 2000/s.
So it looks like some processes get 100% CPU while the others wait, then the next one gets 100%, and so on. How come this is not better balanced?
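One way to read those numbers is to put the run queue next to the CPU count: 35 runnable tasks on 8 processors means roughly four to five tasks competing for each CPU at any moment, so most of them have to wait their turn. The sketch below does that arithmetic and derives a context switch rate from two readings of the ctxt line in /proc/stat. The function names and sample values are illustrative.

```python
def runnable_per_cpu(run_queue, cpus):
    """Average number of runnable tasks competing for each CPU."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return run_queue / cpus


def context_switches(proc_stat_text):
    """Return the cumulative context switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before, after, seconds):
    """Context switch rate from two readings taken `seconds` apart."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


if __name__ == "__main__":
    print(runnable_per_cpu(35, 8))  # figures reported in this thread
    # Made-up /proc/stat excerpts taken 5 seconds apart.
    first = context_switches("cpu  1 2 3 4\nctxt 1000\nbtime 5\n")
    second = context_switches("cpu  2 3 4 5\nctxt 11000\nbtime 5\n")
    print(switches_per_second(first, second, 5))
```

Taking the same readings during a quiet hour and a busy hour gives a baseline to judge whether 2000/s is actually unusual for this box.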