LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

12-13-2017, 11:50 PM   #1
ethonbridges
LQ Newbie
 
Registered: Dec 2017
Posts: 7

Rep: Reputation: Disabled
Dovecot/postfix slows machine to a crawl, takes 30 minutes to reboot...


I have been running a postfix/dovecot mail server for about 5 years now, generally with no issues. The server is located in a data center about half an hour away from my office, so everything is managed remotely.

For the last couple of months, users have been calling about once every 2 or 3 days to report that the mail server is not responding. Logging into the machine via SSH is slow as molasses, and a shutdown -r now takes about half an hour to reboot the machine. Once it reboots, it's pretty speedy again. In top, the IMAP process is usually the one taking the bulk of the CPU, so it's more likely a dovecot problem than a postfix one.

I have only about 100 email accounts (including my own), but the maildirs are pretty large, about 100G across all the users.

I don't see any evidence of maildir corruption or loss of data, just the slowdown every so often. I'm trying to determine if I actually have something wrong with a server that has been relatively trouble free since I built it, or if it's simply a case of overloading the machine as it has grown over the years.

So I guess my questions are:

1. How can I determine if the slowdown is the result of something malicious?
2. Can the users' maildirs be checked for corruption?
3. How can I determine whether it's simply the usage load?
4. Is there some clean-up or maintenance process in dovecot or postfix that might be running that hogs the machine?

I'm not a total Linux noob, but in this particular area, I'm not sure where to begin to troubleshoot something like this.

Ethon
 
12-14-2017, 12:08 AM   #2
descendant_command
Senior Member
 
Registered: Mar 2012
Posts: 1,876

Rep: Reputation: 643
No 'usual suspect' springs to mind.

I'd start by closely inspecting the logs around the 'slowdown' events.

Maybe crank up the logging verbosity and run a cron job that periodically dumps the output of top, netstat, iotop, lsof, doveadm, etc. to try and catch whatever's going on.
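
A rough sketch of what such a cron job could look like. The SNAP_DIR path and the function names are made up for illustration, and the exact commands and flags (for example doveadm who) are only one reasonable choice; tools that are not installed are skipped.

```shell
#!/bin/sh
# Append one timestamped snapshot of system state to a daily log file.
# SNAP_DIR is a hypothetical location; point it anywhere with free space.
SNAP_DIR="${SNAP_DIR:-/var/log/slowdown}"

# Run a command only if it is installed, printing a header line before its output.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "### $*"
        "$@" 2>&1
    else
        echo "### $1: not installed, skipped"
    fi
}

# Collect load, memory, process, network, disk I/O and Dovecot session info.
snapshot() {
    mkdir -p "$SNAP_DIR" || return 1
    out="$SNAP_DIR/$(date +%Y%m%d).log"
    {
        echo "===== $(date '+%F %T') ====="
        run_if_present uptime
        run_if_present free -m
        run_if_present top -b -n 1
        run_if_present netstat -tn
        run_if_present iotop -b -n 1
        run_if_present doveadm who
    } >>"$out"
    echo "$out"
}
```

Saved as a script with a final snapshot call and run from cron every few minutes, this leaves a trail you can read back after the next slowdown. iotop generally needs root, so the job would have to run as root to capture disk I/O.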

Any funny dmesg output that might indicate kernel oopses, running out of resources, etc.?
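
One way to scan for that, as a minimal sketch. The function name is hypothetical and the pattern list is only a starting point (OOM killer activity, oopses, hung tasks, disk I/O errors), not an exhaustive set.

```shell
# Filter kernel messages for signs of memory exhaustion, crashes, or stalled I/O.
# Reads from stdin, so it works on live dmesg output or on a saved log file.
kernel_trouble() {
    grep -iE 'oom-killer|out of memory|oops|blocked for more than|I/O error'
}
```

Typical use would be dmesg | kernel_trouble, or kernel_trouble < /var/log/messages to look back past the last reboot (that log path is the usual CentOS default, assumed here).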

Do you have any time-based resource monitoring, like munin or similar, on it?

Last edited by descendant_command; 12-14-2017 at 12:11 AM.
 
12-14-2017, 12:23 AM   #3
ethonbridges
LQ Newbie
 
Registered: Dec 2017
Posts: 7

Original Poster
Rep: Reputation: Disabled
Right after I posted the message, it started doing it again. I started seeing these in the log:

Dec 14 00:19:22 mail dovecot: master: Error: service(auth-worker): Initial status notification not received in 30 seconds, killing the process
Dec 14 00:19:22 mail dovecot: master: Error: service(auth-worker): kill(32177, SIGKILL) failed: Permission denied
Dec 14 00:20:04 mail dovecot: imap: Error: Internal auth failure (client-pid=32175 client-id=1)
Dec 14 00:20:05 mail dovecot: master: Error: service(ssl-params): Initial status notification not received in 30 seconds, killing the process
Dec 14 00:20:05 mail dovecot: master: Error: service(ssl-params): child 32182 killed with signal 9
Dec 14 00:20:05 mail dovecot: master: Error: service(ssl-params): command startup failed, throttling
Dec 14 00:20:05 mail dovecot: imap-login: Fatal: Corrupted SSL ssl-parameters.dat in state_dir: Truncated file
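
To see which Dovecot services are timing out most often, lines like these can be tallied per service. This is a small sketch that assumes the syslog format shown in the excerpt; the function name is hypothetical.

```shell
# Count Dovecot master errors per service from syslog-style lines on stdin.
# Matches lines such as: "... dovecot: master: Error: service(auth-worker): ..."
dovecot_errors_by_service() {
    sed -n 's/.*dovecot: master: Error: service(\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Fed the excerpt above, it reports three ssl-params errors and two auth-worker errors. On a live system it could be run as dovecot_errors_by_service < /var/log/maillog, assuming mail logging goes to that file.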


Quote:
Do you have any time-based resource monitoring, like munin or similar, on it?
I don't know what that is or how to use it. I'll have to research it further.

Incidentally, restarting postfix/dovecot has no effect on the issue. Once it has started to crawl, it stays slow. That would seem to indicate a system problem to me...

Ethon

Last edited by ethonbridges; 12-14-2017 at 12:28 AM. Reason: Clarity
 
12-14-2017, 12:49 AM   #4
descendant_command
Senior Member
 
Registered: Mar 2012
Posts: 1,876

Rep: Reputation: 643
Any relevant updates recently?
What OS?
What hardware? (real or virtual?)
Disk space?

Last edited by descendant_command; 12-14-2017 at 12:50 AM.
 
12-14-2017, 01:06 AM   #5
ethonbridges
LQ Newbie
 
Registered: Dec 2017
Posts: 7

Original Poster
Rep: Reputation: Disabled
No updates that I am aware of.

CentOS 6.9

Real hardware. Dedicated, only running postfix and dovecot. Intel Celeron(R) CPU 420, 1.6GHz, 1 core.

500G drive with 380G free. 1G RAM.

Last edited by ethonbridges; 12-14-2017 at 01:09 AM.
 
12-14-2017, 01:44 AM   #6
descendant_command
Senior Member
 
Registered: Mar 2012
Posts: 1,876

Rep: Reputation: 643
RAM may be a little low if load is high (although dovecot is generally pretty good and not known as a memory hog). Do you have appropriate swap available?

Does 'free' show any clues?
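
If the raw free output is hard to read, the same numbers can be pulled from /proc/meminfo. A minimal sketch with a hypothetical function name; it treats free memory plus buffers and page cache as a rough estimate of what is still usable.

```shell
# Summarise memory and swap use from a /proc/meminfo-format file (values in kB).
# An optional first argument names the file to read; the default is /proc/meminfo.
mem_summary() {
    awk '
        $1 == "MemTotal:"  { mt = $2 }
        $1 == "MemFree:"   { mf = $2 }
        $1 == "Buffers:"   { bu = $2 }
        $1 == "Cached:"    { ca = $2 }
        $1 == "SwapTotal:" { st = $2 }
        $1 == "SwapFree:"  { sf = $2 }
        END {
            printf "RAM: %d MB total, %d MB free or reclaimable cache\n", mt/1024, (mf+bu+ca)/1024
            printf "Swap: %d MB total, %d MB used\n", st/1024, (st-sf)/1024
        }
    ' "${1:-/proc/meminfo}"
}
```

A second number near zero together with heavy swap use during a slowdown would point at memory pressure rather than at dovecot itself.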

Maybe bad hardware - failing disk or RAM?

If no config changes or software updates between working and acting up, then failing hardware is a prime suspect.

Does smartctl report any disk warnings?
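
Beyond the overall health verdict, a few SMART attributes are worth checking by hand. The sketch below assumes the usual smartctl -A table layout (attribute name in the second column, raw value in the tenth); the function name and the choice of three attributes are illustrative only.

```shell
# Flag SMART attributes that commonly precede disk failure when their raw value is non-zero.
# Expects `smartctl -A` output on stdin (attribute name in column 2, raw value in column 10).
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ && $10 + 0 > 0 {
        print $2 ": raw value " $10
    }'
}
```

Usage would be along the lines of smartctl -A /dev/sda | smart_warnings, where /dev/sda is an assumed device name. No output means none of the three counters has moved off zero.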

I'd maybe take it down for a bit to run an fsck and a memtest for starters.

Last edited by descendant_command; 12-14-2017 at 01:45 AM.
 
12-14-2017, 02:06 AM   #7
ethonbridges
LQ Newbie
 
Registered: Dec 2017
Posts: 7

Original Poster
Rep: Reputation: Disabled
I noticed that the RAM seems to be maxed out during the slow times, so it's probably swapping.

No smart warnings on the drive.

Since it's relatively cheap and I can always use RAM in other machines, I'm going to bump it up to 8GB (the motherboard's max) tomorrow. I have also ordered a Xeon processor to replace the Celeron and will swap it in when it arrives.

The new memory should be telling.

Ethon
 
12-14-2017, 02:17 AM   #8
descendant_command
Senior Member
 
Registered: Mar 2012
Posts: 1,876

Rep: Reputation: 643
Swapping will be slower but shouldn't cause the process timeouts under normal circumstances, unless your swap maxes out too - how much do you have assigned?

It's easy to add some swap, either to test or to buy yourself some time.
https://www.cyberciti.biz/faq/linux-...ap-file-howto/
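
For reference, the usual swap-file steps look roughly like this. It is a sketch, not a tested recipe for this box: the /swapfile path and the 2048 MB size are placeholder values, and the helper names are made up. By default it only prints the commands; set DRY_RUN=0 and run as root to actually execute them.

```shell
# Print (or, with DRY_RUN=0 and root, execute) the steps to add a swap file.
# SWAPFILE and SIZE_MB are illustrative defaults, not values from this thread.
SWAPFILE="${SWAPFILE:-/swapfile}"
SIZE_MB="${SIZE_MB:-2048}"
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise run it.
step() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Allocate the file, restrict its permissions, format it as swap, and enable it.
add_swapfile() {
    step dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB" &&
    step chmod 600 "$SWAPFILE" &&
    step mkswap "$SWAPFILE" &&
    step swapon "$SWAPFILE"
}
```

Swap enabled this way is gone after a reboot unless a matching line is added to /etc/fstab. swapon -s or free will show whether the new swap is active.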
 
12-29-2017, 12:41 PM   #9
ethonbridges
LQ Newbie
 
Registered: Dec 2017
Posts: 7

Original Poster
Rep: Reputation: Disabled
Upgraded the processor to a Xeon and maxed out the RAM at 8GB. No problems since, and I've thrown everything I can at it; it doesn't bog down any more.
 
  


All times are GMT -5.