LinuxQuestions.org
Slackware: This forum is for the discussion of Slackware Linux.

09-06-2013, 01:32 PM   #1
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Root server crash: hunting down the cause of the crash


Hi,

I have a public server running Slackware64 14.0, with the following services:
  • DNS (Bind)
  • LAMP (Apache / PHP / MySQL)
  • IMAP mail (Postfix / Dovecot / Postgrey)
  • Streaming audio (Icecast / MPD)

The server is hosting a few static sites, a few dynamic CMS sites, our local school's management platform, and a small webradio.

It's not very powerful: a single-core processor (VIA Nano processor U2250, 1.6 GHz) and 2 GB of RAM.

I've carefully added services and users one by one, each time measuring resource usage with top, free and the like.

Everything sort of works fine. Nonetheless, every three or four days the server becomes unresponsive, and the automatic monitoring service ends up rebooting it after a while. So every three or four days I get something like 15 minutes of downtime, which is not good.

Now, I've set up a good dozen local LAN servers for clients, running 24/7/365 without any major problems. None of these machines has ever given me a headache. But now I'm puzzled. I'd like to investigate the cause of these regular crashes of my public machine, but I don't quite know where to even begin.

Any suggestions?
 
09-06-2013, 01:43 PM   #2
YankeePride13
Member
 
Registered: Aug 2012
Distribution: Ubuntu 10.04, CentOS 6.3, Windows 7
Posts: 262

First place to look would be the system logs. Check to see what's going on in there at the time of the crash.
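A minimal Python 3 sketch of that first step, assuming classic syslog-style timestamps such as "Sep  6 13:32:01" (the format written to files like /var/log/messages on a stock Slackware install): it keeps only the lines stamped within a few minutes of the crash. The function name `lines_near` and the 10-minute default window are illustrative choices, not an existing tool.

```python
from datetime import datetime, timedelta


def lines_near(lines, crash_time, minutes=10):
    """Yield syslog lines stamped within +/- `minutes` of `crash_time`.

    Assumes the classic "Mon DD HH:MM:SS" prefix (15 characters). That
    prefix has no year, so the year is borrowed from `crash_time`.
    """
    window = timedelta(minutes=minutes)
    for line in lines:
        stamp = f"{crash_time.year} {line[:15]}"
        try:
            when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog-stamped line (continuation, other format)
        if abs(when - crash_time) <= window:
            yield line
```

For example, `lines_near(open("/var/log/messages"), datetime(2013, 9, 6, 13, 32))` yields only what was logged between 13:22 and 13:42 that day. Keep in mind that the very last messages before a hard hang may never reach the disk at all.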
 
09-06-2013, 02:10 PM   #3
Alien Bob
Slackware Contributor
 
Registered: Sep 2005
Location: Eindhoven, The Netherlands
Distribution: Slackware
Posts: 8,559

My first guess would be a DDoS against your web server. Or just too many people interested in downloading your MLES.

Eric
 
09-06-2013, 02:19 PM   #4
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Original Poster
Quote:
Originally Posted by YankeePride13
First place to look would be the system logs. Check to see what's going on in there at the time of the crash.
I just spent some time leafing through everything in /var/log within +/- 10 minutes of the time of the crash, but there's nothing suspicious.
 
09-06-2013, 02:21 PM   #5
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Original Poster
Quote:
Originally Posted by Alien Bob
My first guess would be a DDoS against your web server. Or just too many people interested in downloading your MLES.

Eric
My MLES/MLED/MLWS is hosted on another server, so this is not the cause.

Is there any way to know if a DDoS has happened? And if that is the case, are there any countermeasures?
 
09-06-2013, 04:03 PM   #6
NeoMetal
Member
 
Registered: Aug 2004
Location: MD
Distribution: Slackware
Posts: 114

Check the web server logs and bind logs for unusual activity. GoAccess might be handy for getting a quick overview of the web logs.
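If GoAccess is not at hand, a rough first pass over the access log can be sketched in a few lines of Python 3. This assumes Apache's common or combined log format (client address as the first field, timestamp in square brackets); the log location, for example /var/log/httpd/access_log, is an assumption to check against your httpd.conf. One address or one minute dominating the counts is the kind of unusual activity worth a closer look, but it is a hint, not proof of a DDoS.

```python
import re
from collections import Counter

# Timestamp in the common/combined log format, e.g. [06/Sep/2013:13:32:01 +0200];
# the capture group keeps it only down to the minute.
MINUTE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def top_clients(lines, n=10):
    """Most frequent client addresses (first field of each log line)."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(n)


def busiest_minutes(lines, n=10):
    """Minutes with the most requests, to spot sudden bursts."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Running both over the log covering the hour before a crash, and comparing with a quiet day, gives a baseline to judge against.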

If you can configure your monitoring to restart the affected services rather than the whole machine, then, assuming that is enough to recover, you might be able to mitigate the downtime at least while you narrow things down.
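A minimal Python 3 sketch of that idea, assuming each service can be probed with a plain TCP connect and restarted through its Slackware rc script. The port table, the `rc.httpd restart` command and the function names are illustrative assumptions; note that a successful connect only proves the daemon accepts connections, not that it answers correctly.

```python
import socket
import subprocess


def port_is_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(services, probe=port_is_open, run=subprocess.run):
    """Restart only the services whose port stopped answering.

    `services` maps a name to (host, port, restart_command); returns the
    names that were restarted. `probe` and `run` can be swapped for testing.
    """
    restarted = []
    for name, (host, port, command) in services.items():
        if not probe(host, port):
            run(command, check=False)
            restarted.append(name)
    return restarted


# Hypothetical table: extend with the other daemons you actually expose.
SERVICES = {
    "httpd": ("127.0.0.1", 80, ["/etc/rc.d/rc.httpd", "restart"]),
}
```

Calling `check_and_restart(SERVICES)` from a cron job every minute would restart Apache on its own instead of waiting for the hoster's monitoring to reboot the whole machine; logging the returned names also tells you which service fails first.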
 
09-06-2013, 04:27 PM   #7
TracyTiger
Member
 
Registered: Apr 2011
Location: California, USA
Distribution: Slackware
Posts: 528

Disk Space

On a less sophisticated level, you may want to check whether it's running out of disk space. The somewhat regular failure suggests that a memory leak or a full disk (temporary files) is worth investigating.

Once a needed disk partition is full, all sorts of symptoms can appear. In a previous life managing lots of UNIX servers, this problem used to bite me about once a year. I eventually learned to check for resource exhaustion first.
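A small Python 3 sketch of such a check, run periodically rather than once. Besides space it looks at inodes, since a partition can also "fill up" by running out of inodes when a service leaves many small temporary files behind; `df -h` does not show that, `df -i` does. The 90% threshold and the function names are arbitrary illustrative choices. The block percentage counts reserved blocks as free, so it can read slightly lower than the Use% column of `df`.

```python
import os


def usage_percent(mount):
    """Return (blocks_used_%, inodes_used_%) for the filesystem holding `mount`."""
    st = os.statvfs(mount)
    blocks = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    inodes = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return blocks, inodes


def full_mounts(mounts, threshold=90.0):
    """List mounts whose block or inode usage is at or above `threshold` percent."""
    flagged = []
    for mount in mounts:
        blocks, inodes = usage_percent(mount)
        if blocks >= threshold or inodes >= threshold:
            flagged.append((mount, round(blocks, 1), round(inodes, 1)))
    return flagged
```

For the partitions in this thread that would be `full_mounts(["/", "/boot", "/dev/shm"])`.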
 
09-06-2013, 04:46 PM   #8
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Original Poster
Quote:
Originally Posted by Tracy Tiger
On a less sophisticated level, you may want to check whether it's running out of disk space. The somewhat regular failure suggests that a memory leak or a full disk (temporary files) is worth investigating.

Once a needed disk partition is full, all sorts of symptoms can appear. In a previous life managing lots of UNIX servers, this problem used to bite me about once a year. I eventually learned to check for resource exhaustion first.
No, that's not it.

Code:
# df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/sda3        145G  7.6G  130G   6% /
/dev/sda1         92M   34M   53M  40% /boot
tmpfs            986M     0  986M   0% /dev/shm
 
09-06-2013, 05:16 PM   #9
TracyTiger
Member
 
Registered: Apr 2011
Location: California, USA
Distribution: Slackware
Posts: 528

Although it's just one of a million things that can cause your system to crash, to check for resource exhaustion you need to know what the state of the resource is just before it crashes, not when the system is running without problems.

In the past, I set up cron jobs to regularly log the suspect areas and look for patterns. In some cases I took snapshots of resource usage every few seconds (keeping only the last few minutes' worth), as the system went from healthy to broken in less than a minute.
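Since cron cannot fire more often than once a minute, the "every few seconds" variant needs a small loop. A minimal Python 3 sketch, assuming a Linux /proc and a hypothetical output path such as /var/log/snapshots.log; it could be started at boot from /etc/rc.d/rc.local. It keeps only the last `keep` samples in the file, so after a crash and reboot the file holds the few minutes leading up to it. Load average, MemFree and SwapFree are just example metrics.

```python
import os
import time
from collections import deque


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def snapshot():
    """One sample: (unix time, 1-minute load, MemFree kB, SwapFree kB). Linux only."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return time.time(), load1, mem.get("MemFree"), mem.get("SwapFree")


def record(path, interval=5.0, keep=60, samples=None, sample=snapshot):
    """Keep the last `keep` samples in `path`, rewriting it after every sample.

    With the defaults that is about five minutes of history. The file is
    written to a temporary name and swapped in with os.replace, so a crash
    mid-write should leave the previous version intact.
    """
    history = deque(maxlen=keep)  # old samples fall off the front automatically
    taken = 0
    while samples is None or taken < samples:
        history.append(sample())
        taken += 1
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            for ts, load1, mem_free, swap_free in history:
                f.write(f"{ts:.0f} load={load1} MemFree={mem_free}kB SwapFree={swap_free}kB\n")
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the swap
        os.replace(tmp, path)
        if samples is None or taken < samples:
            time.sleep(interval)
```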

But of course don't waste time on this if it's not the likely problem.

Regarding a network-based problem...
You're not new to the game, so you probably already know that DoS/DDoS is a general term that can take many forms. A basic firewall using netfilter (iptables) can eliminate the basic ones by limiting packets in various ways, to prevent table exhaustion and incomplete sessions. Iptables has helped me narrow down and find network-based attacks on a couple of occasions.
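To illustrate what limiting packets means in practice: the iptables `limit` match is documented as a token bucket filter, controlled by `--limit` (the refill rate) and `--limit-burst` (the bucket size). The Python 3 class below is a simplified model of that idea, useful for reasoning about rate and burst values; it is not the kernel's actual implementation, and the class name is made up.

```python
class TokenBucket:
    """Simplified token bucket in the spirit of iptables' `limit` match.

    `rate` tokens are added per second, up to `burst`. Each accepted packet
    spends one token. The bucket starts full, so `burst` packets pass at once.
    """

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the previous packet, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: a rule would fall through to the next one
```

With `rate=1.0, burst=3`, five packets arriving at the same instant give three accepts and two rejects; one second later a single token has been refilled.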

I'm probably not mentioning anything that you don't already know. My understanding is basic, so others can probably suggest newer and more efficient tools to protect your network and discover problems.

Many of us have spent days trying to solve a software problem that turns out to be an intermittent hardware failure. Don't forget that possibility.

Last edited by TracyTiger; 09-06-2013 at 05:24 PM. Reason: Added last sentence
 
09-06-2013, 09:42 PM   #10
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,493

What is the form factor of the server? That CPU is associated with notebook designs. Could this be an overheating problem?
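On a rented box you cannot open the case, so the kernel's sensors are the only view of the temperature. A minimal Python 3 sketch, assuming the kernel exposes thermal zones under /sys/class/thermal (each zone's `temp` is in millidegrees Celsius). On a VIA Nano the CPU temperature may instead come through the hwmon interface (for example the via-cputemp driver), in which case this returns an empty dict and `sensors` from lm_sensors is the easier route.

```python
import glob
import os


def read_temperatures(base="/sys/class/thermal"):
    """Return {"zoneN:type": degrees_celsius} for each readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # zone without a readable sensor
        temps[f"{os.path.basename(zone)}:{kind}"] = millideg / 1000.0
    return temps
```

Logging this every few minutes would show whether the box runs hot just before it hangs.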
 
09-06-2013, 11:18 PM   #11
ReaperX7
LQ Guru
 
Registered: Jul 2011
Location: California
Distribution: Slackware64-15.0 Multilib
Posts: 6,564
Blog Entries: 15

Crashes can have many causes, especially software-related ones, but if it happens every 3-4 days, then most likely it has to do with a service you're running that is causing disk usage issues, or with a piece of hardware that is slowly failing.

A few questions I might ask:

1. How old is the hardware? (Each component's age would help.)

2. You said you run an IMAP mail service, correct? What is the disk usage of the mail system, and how long does it take for the services to generate more than 20 GB of disk usage, including log files?

3. What temperature does your system idle at? Less or more than 50 degrees Celsius?

4. Do you run a software SPI (Stateful Packet Inspection) firewall like iptables, or a hardware firewall like a Barracuda Networks firewall?

5. One last question: have you ever done a stress test in which a single service runs by itself for at least 5 days to check for instabilities, before adding other services?

My educated guess points towards the mail service allocating too much disk space for itself and then bringing the server down by filling the disk with temporary files. If necessary, could you dedicate a separate server to mail services alone?
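To answer question 2 with numbers, the growth of the suspect directories can be tracked over a few days. A minimal Python 3 sketch; the candidate paths (for example /var/spool/postfix, /var/log, /var/lib/mysql, /home) are assumptions about where this server keeps its data, and sizes are apparent file sizes, with hard-linked files counted once per link.

```python
import os


def tree_size(path):
    """Total apparent size in bytes of files under `path` (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished mid-scan, common in mail spools
    return total


def report(paths):
    """Return (path, size in MiB) pairs, largest first."""
    sizes = [(p, tree_size(p) / 2**20) for p in paths]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Running `report(...)` once a day from cron and keeping the output shows which directory grows fastest, and whether anything is heading towards the 20 GB mark.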
 
09-07-2013, 02:02 AM   #12
vdemuth
Member
 
Registered: Oct 2003
Location: West Midlands, UK
Distribution: Slackware 14 (Server), OpenSuse 13.2 (Laptop & Desktop), OpenSuse 13.2 on the wife's lappy
Posts: 781

Well,

It seems that you are being guided toward this being a software problem, when it seems pretty obvious (to me at least) that it's a hardware problem. I would suspect that the electrolytic capacitors on the motherboard are in the early stages of failure and, at varying and unpredictable intervals that seem unrelated to anything the server might be doing, cause a reboot at processor level.
I would expect the time between reboots to get shorter over the next few months until eventually it just won't restart. I see this all the time where I work: on average, out of the 4000+ servers we use, around 10% exhibit this problem at any one time, and it's always hardware.
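If that theory is right, the gaps between reboots should shrink over time, which is easy to track. A minimal Python 3 sketch, assuming you have collected the boot times (for instance from the reboot records that `last reboot` prints from wtmp) as datetime objects; comparing the last three gaps with the earlier ones is an arbitrary heuristic, not a diagnostic.

```python
from datetime import datetime


def reboot_gaps_hours(boot_times):
    """Hours between consecutive boots, oldest first."""
    ordered = sorted(boot_times)
    return [(b - a).total_seconds() / 3600.0 for a, b in zip(ordered, ordered[1:])]


def gaps_shrinking(boot_times, window=3):
    """True if the mean of the last `window` gaps is below the mean of the earlier ones."""
    gaps = reboot_gaps_hours(boot_times)
    if len(gaps) <= window:
        return False  # not enough history to compare
    recent = gaps[-window:]
    earlier = gaps[:-window]
    return sum(recent) / len(recent) < sum(earlier) / len(earlier)
```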
 
09-07-2013, 02:18 AM   #13
truthfatal
Member
 
Registered: Mar 2005
Location: Winnipeg, MB
Distribution: Raspbian, Debian, Slackware, OS X
Posts: 443
Blog Entries: 9

Template matching works 100% of the time 80% of the time! Something like this could easily be either hardware or software. If you have the redundancy/capability to deal with downtime, it might be worth taking the machine apart and doing some component isolation if the logs don't prove helpful.
I don't see any mention of using fsck to check the health of your partitions, or something like memtest86+ to check your RAM. (If you have specific, licensed diagnostic tools for your hardware, that would be a plus.) I'm not great at troubleshooting *nix log files, so that's why I'm talking hardware, but still, if something is software and can be easily fixed, I'd definitely want to determine that first, especially since a software fix is typically less expensive (in my experience).
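Before taking anything apart, the kernel log (/var/log/messages or `dmesg` output) can be scanned for typical hardware complaints. A minimal Python 3 sketch; the patterns are illustrative examples of common kernel wording for disk I/O, ext filesystem, libata and machine-check errors, the exact messages differ between kernel versions, and a clean result does not replace fsck or memtest86+.

```python
import re

# Illustrative patterns only; kernel wording varies between versions.
HARDWARE_PATTERNS = re.compile(
    r"I/O error"
    r"|EXT[234]-fs error"
    r"|ata\d+(\.\d+)?: (exception|failed command)"
    r"|Machine check"
    r"|Hardware Error",
    re.IGNORECASE,
)


def hardware_suspects(lines):
    """Return the log lines that look like hardware trouble."""
    return [line for line in lines if HARDWARE_PATTERNS.search(line)]
```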
 
09-07-2013, 03:10 AM   #14
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Original Poster
Thank you very much to everybody for your valuable input.

The server itself is not a machine I bought; it's an el-cheapo root server rental offer from the French hosting provider Online (10 euros per month with unlimited bandwidth). It has a single-core processor, 2 GB RAM and 160 GB of disk space (upgraded to 500 GB on recent offers). The machine comes with Debian, Ubuntu LTS or CentOS preinstalled, but I managed to install Slackware on it using the Live Rescue session.

I think the right thing to do here would be a simple upgrade to real server hardware. I cringe at the thought of migrating all my freshly installed mail accounts, CMS sites and everything, but I think this would be the least of all evils.
 
09-07-2013, 06:43 AM   #15
kikinovak
MLED Founder
 
Registered: Jun 2011
Location: Montpezat (South France)
Distribution: CentOS, OpenSUSE
Posts: 3,453

Original Poster
Quote:
Originally Posted by kikinovak
I think the right thing to do here would be a simple upgrade to real server hardware. I cringe at the thought of migrating all my freshly installed mail accounts, CMS sites and everything, but I think this would be the least of all evils.
OK, I just ordered a new server with a big fat hardware upgrade. It costs about three times as much, but that's the price of sound sleep. In the meantime, I'll mark this thread as SOLVED.
 
  


All times are GMT -5.
