Slackware: This forum is for the discussion of Slackware Linux.
09-06-2013, 01:32 PM
#1
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Root server crash: hunting down the cause of the crash
Hi,
I have a public server running Slackware64 14.0 with the following services:
- DNS (Bind)
- LAMP (Apache / PHP / MySQL)
- IMAP mail (Postfix / Dovecot / Postgrey)
- Streaming audio (Icecast / MPD)
The server is hosting a few static sites, a few dynamic CMS sites, our local school's management platform, and a small webradio.
It's not very powerful: a single-core VIA Nano U2250 processor (1.6 GHz capable) and 2 GB of RAM.
I've carefully added services and users one by one, each time measuring resources with top, free and the like.
Everything sort of works fine, but nonetheless, every three or four days the server becomes unresponsive, and the automatic monitoring service ends up rebooting it after a while. So every three or four days I get something like 15 minutes of downtime, which is not good.
Now, I've set up a good dozen LAN servers for clients, running 24/7/365, without any major problems. None of these machines has ever given me a headache. But now I'm puzzled. I'd like to investigate the cause of these regular crashes of my public machine, but I don't quite know where to even begin.
Any suggestions?

09-06-2013, 01:43 PM
#2
Member
Registered: Aug 2012
Distribution: Ubuntu 10.04, CentOS 6.3, Windows 7
Posts: 262
First place to look would be the system logs. Check to see what's going on in there at the time of the crash.

09-06-2013, 02:10 PM
#3
Slackware Contributor
Registered: Sep 2005
Location: Eindhoven, The Netherlands
Distribution: Slackware
Posts: 8,559
My first guess would be a DDoS against your web server. Or just too many people interested in downloading your MLES.
Eric

09-06-2013, 02:19 PM
#4
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Original Poster
Quote:
Originally Posted by YankeePride13
First place to look would be the system logs. Check to see what's going on in there at the time of the crash.
I just spent some time going through everything in /var/log within +/- 10 minutes of the time of the crash, but there's nothing suspicious.
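Going through /var/log by hand is easy to get wrong when several files interleave. Below is a minimal Python sketch that pulls out only the lines near the crash. It assumes the classic syslog timestamp format written by Slackware's sysklogd ("Sep  6 13:25:01 host prog[pid]: message", with no year); the script name and arguments shown in the comment are hypothetical.

```python
"""List syslog lines within +/- N minutes of a crash time (sketch)."""
import sys
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def lines_near(lines: Iterable[str], crash: datetime,
               minutes: int = 10) -> Iterator[str]:
    """Yield each line whose timestamp lies within +/- minutes of crash."""
    window = timedelta(minutes=minutes)
    for line in lines:
        try:
            # "Mon dd HH:MM:SS" fills the first 15 characters; syslog
            # omits the year, so borrow it from the crash time.
            stamp = datetime.strptime(f"{crash.year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        if abs(stamp - crash) <= window:
            yield line


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation:
    #   python3 near_crash.py 2013-09-06T13:20 /var/log/messages /var/log/syslog
    crash_time = datetime.fromisoformat(sys.argv[1])
    for path in sys.argv[2:]:
        with open(path, errors="replace") as fh:
            for hit in lines_near(fh, crash_time):
                print(f"{path}: {hit}", end="")
```

If the machine locks up hard, the last minutes before the reboot may never reach the disk at all, which could also explain why nothing suspicious shows up.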

09-06-2013, 02:21 PM
#5
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Original Poster
Quote:
Originally Posted by Alien Bob
My first guess would be a DDoS against your web server. Or just too many people interested in downloading your MLES.
Eric
My MLES/MLED/MLWS is hosted on another server, so this is not the cause.
Is there any way to know if a DDoS has happened? And if that is the case, are there any countermeasures?

09-06-2013, 04:03 PM
#6
Member
Registered: Aug 2004
Location: MD
Distribution: Slackware
Posts: 114
Check the web logs and BIND logs for unusual activity. GoAccess might be handy for getting a quick overview of the web logs.
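Besides GoAccess, you can get a rough first pass by counting requests per client per minute in Apache's access log. The sketch below assumes the standard common/combined log format (client address first, timestamp in square brackets). The threshold of 300 requests per minute is an arbitrary assumption that should be tuned against normal traffic.

```python
"""Flag clients sending unusually many requests per minute (sketch)."""
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Start of a common/combined log line: client, ident, user, [timestamp]
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def busiest_clients(lines: Iterable[str],
                    threshold: int = 300) -> List[Tuple[str, str, int]]:
    """Return (client, minute, hits) for buckets above threshold, busiest first."""
    per_minute: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # malformed or non-access-log line
        client, stamp = m.groups()
        # "06/Sep/2013:13:25:01 +0200"[:17] gives "06/Sep/2013:13:25"
        per_minute[(client, stamp[:17])] += 1
    hot = [(c, minute, n) for (c, minute), n in per_minute.items()
           if n > threshold]
    return sorted(hot, key=lambda t: t[2], reverse=True)
```

Running it over the access log for the minutes before each crash should show whether a single client, or a handful of them, dominates the traffic.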
If you can configure your monitoring to restart the affected services rather than the whole machine, then, assuming that is enough to recover, you might be able to mitigate the downtime at least while you narrow things down.
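Here is a minimal sketch of such a service-level watchdog, meant to run from cron every minute or so. It assumes that a service that no longer accepts TCP connections on its port is hung, and that restarting it through its Slackware rc script is enough to recover. The port-to-script table is a hypothetical example to adapt.

```python
"""Restart individual services that stop accepting connections (sketch)."""
import socket
import subprocess
from typing import Callable, Dict, List

# Hypothetical table: TCP port to probe -> command that restarts the service.
SERVICES: Dict[int, List[str]] = {
    80: ["/etc/rc.d/rc.httpd", "restart"],
    3306: ["/etc/rc.d/rc.mysqld", "restart"],
}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(services: Dict[int, List[str]], host: str = "127.0.0.1",
                      run: Callable = subprocess.run) -> List[int]:
    """Probe each port and run the restart command for every one that fails."""
    restarted = []
    for port, command in services.items():
        if not port_open(host, port):
            run(command, check=False)
            restarted.append(port)
    return restarted
```

A TCP probe only shows that the socket still accepts connections. A wedged PHP application can pass it, so an HTTP request to a known page is a stricter check. Logging each restart would also give a timeline of which service fails first.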

09-06-2013, 04:27 PM
#7
Member
Registered: Apr 2011
Location: California, USA
Distribution: Slackware
Posts: 528
Disk Space
On a less sophisticated level, you may want to check whether it's running out of disk space. The somewhat regular failures suggest that a memory leak or a full disk (temporary files) is worth investigating.
Once a needed disk partition is full, all sorts of symptoms can appear. In a previous life managing lots of UNIX servers, this problem used to bite me about once a year. I eventually learned to check for resource exhaustion first.
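This check is worth automating. Note that "disk full" can also mean running out of inodes (for example, from piles of small session or queue files) while df -h still shows free space. The sketch below uses os.statvfs to report both measures for a list of mount points. The 90% limit is an arbitrary assumption.

```python
"""Report block and inode usage per mount point (sketch)."""
import os
from typing import Iterable, List, Tuple


def usage(mount: str) -> Tuple[float, float]:
    """Return (block %, inode %) used; the block figure approximates df's Use%."""
    st = os.statvfs(mount)
    used = st.f_blocks - st.f_bfree
    # df's Use% leaves out root-reserved blocks: used / (used + available)
    denom = used + st.f_bavail
    block_pct = 100.0 * used / denom if denom else 0.0
    inode_pct = (100.0 * (st.f_files - st.f_ffree) / st.f_files
                 if st.f_files else 0.0)  # some filesystems report 0 inodes
    return block_pct, inode_pct


def warnings(mounts: Iterable[str], limit: float = 90.0) -> List[str]:
    """Describe every mount whose block or inode usage reaches limit percent."""
    out = []
    for mount in mounts:
        blocks, inodes = usage(mount)
        if blocks >= limit or inodes >= limit:
            out.append(f"{mount}: blocks {blocks:.0f}%, inodes {inodes:.0f}%")
    return out
```

To catch the condition, this has to run regularly before a crash, for example from cron with warnings(["/", "/boot"]), rather than once the machine is back up.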

09-06-2013, 04:46 PM
#8
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Original Poster
Quote:
Originally Posted by Tracy Tiger
On a less sophisticated level, you may want to check whether it's running out of disk space. The somewhat regular failures suggest that a memory leak or a full disk (temporary files) is worth investigating.
Once a needed disk partition is full, all sorts of symptoms can appear. In a previous life managing lots of UNIX servers, this problem used to bite me about once a year. I eventually learned to check for resource exhaustion first.
No, that's not it.
Code:
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       145G  7.6G  130G   6% /
/dev/sda1        92M   34M   53M  40% /boot
tmpfs           986M     0  986M   0% /dev/shm

09-06-2013, 05:16 PM
#9
Member
Registered: Apr 2011
Location: California, USA
Distribution: Slackware
Posts: 528
Although it's just one of a million things that can cause your system to crash, checking for resource exhaustion requires knowing the state of the resource just before the crash, not while the system is running without problems.
In the past I set up cron jobs to log the suspect areas regularly and look for patterns. In some cases I took snapshots of resource usage every few seconds (keeping only the last few minutes' worth), because the system went from healthy to broken in less than a minute.
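Here is a sketch of that "last few minutes" idea in Python, assuming Linux /proc. Every few seconds it samples the load average, free memory and free swap, and keeps only a fixed window of samples in a deque. Each round it rewrites a small file (with fsync), so the window leading up to a hang has a chance to survive the reboot. The output path, interval and window size are hypothetical choices.

```python
"""Keep a rolling window of resource snapshots on disk (sketch)."""
import os
import time
from collections import deque
from typing import Callable, Dict, Optional, Tuple


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    info = {}
    with open(path) as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[key] = int(fields[0])
    return info


def sample() -> Tuple[str, str, int, int]:
    """One snapshot: time, 1-minute load average, MemFree kB, SwapFree kB."""
    with open("/proc/loadavg") as fh:
        load1 = fh.read().split()[0]
    mem = read_meminfo()
    return (time.strftime("%H:%M:%S"), load1,
            mem.get("MemFree", 0), mem.get("SwapFree", 0))


def watch(out_path: str, interval: float = 5.0, keep: int = 60,
          rounds: Optional[int] = None,
          sampler: Callable[[], tuple] = sample) -> list:
    """Sample every interval seconds, keeping only the newest keep rows on disk."""
    history: deque = deque(maxlen=keep)  # old rows fall off automatically
    n = 0
    while rounds is None or n < rounds:
        history.append(sampler())
        tmp = out_path + ".tmp"
        with open(tmp, "w") as fh:
            fh.writelines(" ".join(map(str, row)) + "\n" for row in history)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before a possible hang
        os.replace(tmp, out_path)  # swap in the new file in one step
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return list(history)
```

With the defaults (60 rows at 5-second intervals), the file always holds roughly the last five minutes. Started in the background as watch("/var/log/snapshots.log"), it should leave that window behind after the next crash.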
But of course don't waste time on this if it's not the likely problem.
Regarding a network-based problem:
You're not new to the game, so you probably already know that DoS/DDoS is a general term that can take many forms. A basic firewall using netfilter (iptables) can eliminate the basic ones by limiting packets in different ways and by preventing table exhaustion and incomplete sessions. iptables has helped me narrow down and find network-based attacks on a couple of occasions.
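For the rate-limiting part, the iptables "limit" match (options --limit and --limit-burst) is documented as a token bucket filter. The toy model below is not netfilter code. It is only a Python illustration of the token bucket idea, simplified to a single bucket and explicit timestamps: up to "burst" packets pass at once, and after that packets pass only as fast as tokens refill at "rate" per second.

```python
"""Toy token bucket, illustrating the idea behind the iptables limit match."""


class TokenBucket:
    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = burst       # bucket size = largest allowed burst
        self.tokens = float(burst)  # start full, as after a quiet period
        self.last = 0.0             # time of the previous packet (seconds)

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time now may pass."""
        # Refill for the elapsed time, but never beyond the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate=1.0 and burst=5, ten packets arriving at the same instant let the first five through. After two quiet seconds, two more pass. In iptables, a packet that fails the match simply continues to the next rule, which is typically where a DROP goes.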
I'm probably not mentioning anything you don't already know. My understanding is basic, so others can probably suggest newer and more efficient tools to protect your network and discover problems.
Many of us have spent days trying to solve a software problem that turns out to be an intermittent hardware failure. Don't forget that possibility.
Last edited by TracyTiger; 09-06-2013 at 05:24 PM.
Reason: Added last sentence

09-06-2013, 09:42 PM
#10
LQ 5k Club
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,493
What is the form factor of the server? That CPU is associated with notebook designs. Could this be an overheating problem?
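One way to answer that on the machine itself is to read the temperatures the kernel exposes under /sys/class. Thermal zones and hwmon drivers report them in millidegrees Celsius. Whether a VIA Nano board shows up there depends on the drivers loaded (via-cputemp is the hwmon driver for these CPUs), so an empty result proves nothing. On some kernels the files also sit one level deeper (hwmon*/device/), in which case the patterns need adjusting. The sketch below reads both locations and flags anything above a limit; the default of 50 degrees Celsius is an assumption.

```python
"""Read CPU/board temperatures exposed in sysfs (sketch)."""
import glob
import os
from typing import Dict, Sequence

# Both interfaces report integer millidegrees Celsius.
PATTERNS = ("thermal/thermal_zone*/temp", "hwmon/hwmon*/temp*_input")


def read_temps(base: str = "/sys/class",
               patterns: Sequence[str] = PATTERNS) -> Dict[str, float]:
    """Map each sensor file (relative to base) to degrees Celsius."""
    temps = {}
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(base, pattern))):
            try:
                with open(path) as fh:
                    temps[os.path.relpath(path, base)] = int(fh.read().strip()) / 1000
            except (OSError, ValueError):
                continue  # sensor unreadable or not a plain integer
    return temps


def too_hot(temps: Dict[str, float], limit: float = 50.0) -> Dict[str, float]:
    """Keep only readings above limit degrees Celsius."""
    return {name: deg for name, deg in temps.items() if deg > limit}
```

Logging read_temps() alongside the other resource snapshots would show whether temperature climbs before each hang.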

09-06-2013, 11:18 PM
#11
LQ Guru
Registered: Jul 2011
Location: California
Distribution: Slackware64-15.0 Multilib
Posts: 6,564
Crashes can have many causes, especially software-related ones, but if it happens every 3-4 days, then most likely it has something to do with a service you're running that is causing disk usage issues, or with a piece of hardware that is slowly failing.
A few questions I might ask:
1. How old is the hardware? (Each component's age would help.)
2. You said you run an IMAP mail service, correct? What does the mail system's disk usage look like, and how long does it take for the services to generate more than 20 GB of data, including log files?
3. What temperature does your system idle at? Less or more than 50 degrees Celsius?
4. Do you run a software SPI (Stateful Packet Inspection) firewall such as iptables, or a hardware firewall such as a Barracuda Networks appliance?
5. One last question: have you ever run a stress test in which a single service runs by itself for at least 5 days to check for instabilities, before adding other services?
My educated guess points towards the mail service using too much disk space and then bringing the server down by filling the disk with temporary files. If necessary, could you allocate a separate server just for mail services?
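The arithmetic behind that question is easy to check. With the 130G free that df reported for /, filling the disk in three to four days would take roughly 130 / 4 ≈ 33 GB to 130 / 3 ≈ 43 GB of new data per day. The sketch below measures a directory tree and turns a measured daily growth rate into days until full. Which directories to measure (for example /var/spool/mail, /var/log, or Maildir folders in home directories) is an assumption about this particular setup.

```python
"""Estimate how long until a filesystem fills up (sketch)."""
import os


def dir_size(path: str) -> int:
    """Sum of file sizes below path (lstat, so symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished mid-walk (queues and logs churn)
    return total


def days_until_full(free_bytes: float, bytes_per_day: float) -> float:
    """Days left at a constant growth rate; infinity if usage is not growing."""
    if bytes_per_day <= 0:
        return float("inf")
    return free_bytes / bytes_per_day
```

To use it, measure dir_size() twice, 24 hours apart; the difference is bytes_per_day. For example, days_until_full(130e9, 2e9) gives 65.0, so growth of that order could not explain a crash every three or four days.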

09-07-2013, 02:02 AM
#12
Member
Registered: Oct 2003
Location: West Midlands, UK
Distribution: Slackware 14 (Server), OpenSuse 13.2 (Laptop & Desktop), OpenSuse 13.2 on the wife's lappy
Posts: 781
Well,
It seems that you are being steered toward a software problem, when it seems pretty obvious (to me at least) that it's a hardware problem. I suspect that the electrolytic capacitors on the motherboard are in the early stages of failure and, at varying and unpredictable intervals that seem unrelated to anything the server is doing, cause a reboot at processor level.
I would expect the intervals between reboots to get shorter over the next few months until eventually it just won't restart. I see this all the time where I work: of the 4,000 or so servers we use, around 10% exhibit this problem at any one time, and it's always hardware.
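That prediction can be tested without opening the case. Record when the machine booted (for instance from the reboot entries that "last reboot" lists, or from the monitoring service's own log) and check whether the gaps are shrinking. The sketch below computes the intervals and a least-squares slope; a clearly negative slope means reboots are getting closer together. The boot times are supplied by hand, not parsed from any tool's output.

```python
"""Check whether the gaps between reboots are shrinking (sketch)."""
from datetime import datetime
from typing import List, Sequence


def intervals_hours(boots: Sequence[datetime]) -> List[float]:
    """Hours between consecutive boot times (input order does not matter)."""
    ordered = sorted(boots)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

With only a handful of reboots the slope is noisy, so treat it as a hint rather than proof.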

09-07-2013, 02:18 AM
#13
Member
Registered: Mar 2005
Location: Winnipeg, MB
Distribution: Raspbian, Debian, Slackware, OS X
Posts: 443
Template matching works 100% of the time, 80% of the time! Something like this could easily be either hardware or software. If you have the redundancy/capability to deal with downtime, it might be worth taking the machine apart and doing some component isolation if the logs don't prove helpful.
I don't see any mention of using fsck to check the health of your partitions, or something like memtest86+ to test your RAM. (If you have specific, licensed diagnostic tools for your hardware, that would be a plus.) I'm not great at troubleshooting *nix log files, which is why I'm talking hardware; still, if the problem is software and can be easily fixed, I'd definitely want to determine that first, especially since a software fix is typically less expensive (in my experience).
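fsck and memtest86+ need downtime, but you can search the logs for the kernel's own hardware and memory complaints first. The sketch below tallies a few well-known message patterns: OOM killer, block-device I/O errors, ext filesystem errors, machine checks and soft lockups. The pattern list is an assumption and far from complete, and a hard lock-up may leave no message at all.

```python
"""Tally kernel log lines that hint at hardware or memory trouble (sketch)."""
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed, incomplete set of well-known kernel message fragments.
SIGNATURES = {
    "oom": re.compile(r"out of memory|invoked oom-killer", re.I),
    "disk": re.compile(r"i/o error|ext[234]-fs error", re.I),
    "mce": re.compile(r"machine check|\[hardware error\]", re.I),
    "hang": re.compile(r"soft lockup|blocked for more than \d+ seconds", re.I),
}


def scan(lines: Iterable[str]) -> Tuple[Counter, Dict[str, str]]:
    """Count matches per category and keep the first matching line of each."""
    counts: Counter = Counter()
    examples: Dict[str, str] = {}
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
                examples.setdefault(label, line.rstrip("\n"))
    return counts, examples
```

On a 2 GB machine running MySQL, Apache and PHP, a non-zero "oom" count would point back toward memory exhaustion rather than failing hardware.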

09-07-2013, 03:10 AM
#14
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Original Poster
Thank you very much, everybody, for your valuable input.
The server itself is not a machine I bought; it's an el-cheapo root server rented from the French hosting provider Online (10 euros per month with unlimited bandwidth). It has a single-core processor, 2 GB of RAM and 160 GB of disk space (upgraded to 500 GB in recent offers). The machine comes with Debian, Ubuntu LTS or CentOS preinstalled, but I managed to install Slackware on it using the Live Rescue session.
I think the right thing to do here would be a simple upgrade to real server hardware. I cringe at the thought of migrating all my freshly installed mail accounts, CMS sites and everything, but I think this would be the least of all evils.

09-07-2013, 06:43 AM
#15
MLED Founder
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453
Original Poster
Quote:
Originally Posted by kikinovak
I think the right thing to do here would be a simple upgrade to real server hardware. I cringe at the thought of migrating all my freshly installed mail accounts, CMS sites and everything, but I think this would be the least of all evils.
OK, I just ordered a new server with a big fat hardware upgrade. It costs about three times as much, but that's the price of sound sleep. In the meantime, I'll mark this thread as SOLVED.

All times are GMT -5.