Why did SquirrelMail, Dovecot IMAP and Outgoing Mail all break at once?
Help! I need some seasoned advice please.
We're running the OldStable version of Debian Etch from August 2008. Since then we've been using SquirrelMail, connecting through Dovecot's IMAP and POP3 servers, to provide either SSH or TLS/SSL connections to Postfix mail on our server. Although the SSL capability is installed, we're really not using it -- choosing to use SSH with strong passwords instead.
This configuration has given us NO problems since we started... until today. For unexplained reasons this morning the IMAP interface suddenly began refusing or failing connections to everyone trying to connect through SquirrelMail (or that's the way it looks from the outside). It also fails to send out any emails. I've tried rebooting the server but that made almost no difference.
The problem was first reported by a user. I then verified it. What I was seeing BEFORE the reboot when I tried to login was an error from SquirrelMail that said:
Error connecting to IMAP server: myserver.com.
but what I'm seeing since the server reboot is:
Error connecting to IMAP server: myserver.com.
The IMAP server connect problem seems to be isolated to SquirrelMail. I ran 2 tests and found I CAN connect to the IMAP server using both Microsoft Outlook 2003 and Outlook Express and can see the contents of all folders on the server. So the IMAP problem only shows up in SquirrelMail. But it DOES prevent ANY users from logging in through SquirrelMail.
However, the inability to send mail OUT from the server shows up everywhere. Mail sent internally between accounts on the server -- either within a single domain or between domains -- and even from remote users connected to the server through Outlook or Outlook Express gets delivered fine. But email addressed to anyone outside the server, at any domain -- whether Yahoo or MSN or Google or wherever -- is all bouncing back with a "relay request denied" error.
For instance, I sent an email from one of my server accounts to my yahoo inbox and it bounced back.
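For what it's worth, "local mail is delivered, outside mail bounces" is the pattern a relay check produces when the server no longer treats the sender as trusted. Here is a rough Python sketch of that decision as I understand it. It is a simplified model and not Postfix's actual code; the function name, the example domains and the example addresses are all made up for illustration.

```python
import ipaddress


def relay_decision(client_ip, recipient_domain, local_domains,
                   trusted_networks, authenticated=False):
    """Simplified model of an SMTP relay check (NOT Postfix's real logic)."""
    # Mail for a domain hosted on this server is accepted for local delivery,
    # no matter who the client is.
    hosted = {d.lower() for d in local_domains}
    if recipient_domain.lower() in hosted:
        return "accept (local delivery)"

    # Mail for an outside domain is relayed only for trusted clients:
    # either the client authenticated, or its IP is in a trusted network.
    addr = ipaddress.ip_address(client_ip)
    in_trusted_net = any(addr in ipaddress.ip_network(net)
                         for net in trusted_networks)
    if authenticated or in_trusted_net:
        return "accept (relay permitted)"

    return "reject (relay denied)"
```

If something like this is what is happening, the thing to look at would be whatever defines "trusted" on the server (for Postfix, I'd start with the `mynetworks` setting and the SMTP authentication setup), since the local-delivery branch is clearly still working.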
Other things I've tried are:
checked port status with
both port 143 and 993 are reported as open -- one with imap and the other with imaps.
I restarted inetd. It had no effect.
I restarted postfix. It had no effect.
I restarted the whole server. The IMAP login error code when attempting to login via SquirrelMail changed from 11074 to 11087. That's all. All other behaviors remain the same.
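An open port only shows that something is listening, not that the service behind it is answering. One more check that might help is to connect from the server itself and read the greeting line the service sends. This is a minimal Python sketch; `read_greeting` is a name I made up, and it only works for plain-text ports such as 143 (IMAP) and 25 (SMTP), not for 993, which expects TLS first.

```python
import socket


def read_greeting(host, port, timeout=5.0):
    """Connect to host:port and return the first line the service sends.

    Returns None if the connection fails, times out, or nothing is received.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(1024)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None
    if not data:
        return None
    # Servers send an ASCII greeting; keep only the first line of it.
    return data.decode("ascii", errors="replace").splitlines()[0]
```

Calling `read_greeting("localhost", 143)` on the server approximates the first step SquirrelMail has to get through. If that returns a greeting while SquirrelMail still fails, the problem is more likely in SquirrelMail's own configuration or in the web server/PHP side than in Dovecot.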
I also confirmed the SquirrelMail login failure problem occurs in IE 7, IE 8 and Firefox from 3 different machines in multiple geo-locations and networks AND both with and without the user's local firewall running. So the issue is definitely ON the server and seems to be isolated to squirrelmail even though no changes have been made to squirrelmail or any of its components in months.
When I checked the mail.err log, I found the following series of seemingly useless error messages:
It sounds like someone started blocking ports. Did you check iptables? Is there a firewall device between you and the places you're trying to go that maybe the network team made changes on? Did someone introduce some new tool like Websense? Did your ISP start blocking mail suddenly?
Your issue may be with port 25 SMTP rather than the rest since everything broke at the same time.
I'm the one and only server admin on this dedicated server. Indeed, I pay the annual server lease and have designed and administer all but one of the sites on the server. I've made no changes to iptables; but I'll double-check that to confirm that's NOT the issue. The server does not run Websense. It could be that the hosting center has started blocking outgoing mail; but you'd think that if they had done that and targeted my server they would have notified me of any problem first, and I've received no notices or warnings whatsoever. So for the moment I assume they are NOT blocking our outgoing mail. Once I've eliminated other potential on-server issues, I'll check that. There's no firewall device I know of in their dedicated server hosting center that could produce this result; but I'll ask about that too.
The way I see it, it seems more likely a single interruption or change somewhere in the server's mail loop has caused all these problems than that a series of coincidental events has. Therefore I'm convinced I'm looking for a single smoking gun somewhere.
So, the question is what single event could possibly cause all the behaviors I'm seeing here?
Thanks for the feedback!
The Smoking Gun?
It dawned on me a while ago that I may have known the cause of this problem all along but have been overlooking it because it was so obvious.
We had an issue involving email on the server on Monday the 26th in which users complained they were unable to log in to check mail through SquirrelMail. At the time I could not identify an obvious cause for the problem. Furthermore, except for SquirrelMail logins, the server responded normally and did not seem to be under stress. So, after several wasted hours trying to isolate a cause for the problem, I ruled out the possibility of a DDoS attack and decided to try a remote server restart.
So without considering the impact, I logged in as the admin and did a "shutdown -r now". I realized within seconds I should have done a more orderly shutdown; but by the time that dawned on me the server was already rebooting.
To my surprise and disappointment that reboot failed and the system did not come back up again as expected. After waiting an hour with no reply or recovery from the system, I contacted the hosting center and requested a manual reboot of the server. It came back up right away and from all the tests I ran at the time, it seemed to be fine. Email logins worked and everything else I tried seemed to work too.
That was, until the server's email went down again yesterday morning -- this time refusing to allow IMAP logins and throwing the strange error messages you see in my first post above into /var/logs/mailerr.log. I've been chasing the cause of that problem ever since.
But now I'm wondering if the issue can't be all traced back to that uncontrolled shutdown on Monday.
So, my question is: "If I take the server offline in a KVM mode, aren't there some disk integrity checking utilities I can run to make sure the mail queue or other postfix datafiles on the hard drive weren't damaged by the shutdown?"
Can anyone tell me what those utilities are or point me to a procedure somewhere that will help? I've looked. But so far, I'm not having much luck.
"shutdown -r now" is clean and orderly.
So, to eliminate the possibility of undetected file system damage as the cause of my email problem, what I'm proposing to do is restart the system from the secondary hard drive (that drive should be bootable) and run a manual fsck on the unmounted primary drive to confirm all files are intact and there is no undetected damage to the main file system as a result of that forced reboot.
Are you saying that's unnecessary because the journaling file system should have recovered from such glitches?
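Whichever way that goes, fsck should only be run against a file system that is not mounted, so it seems worth confirming that before starting. Below is a small Python sketch that looks for a device in `/proc/mounts`-style text. The function names and the `/dev/sda1` device are hypothetical, and the check is not exhaustive: a root file system can be listed under another name (for example `/dev/root` or a UUID path).

```python
def is_mounted(device, mounts_text):
    """Return True if `device` is listed as a mount source in the given text.

    `mounts_text` is expected to look like /proc/mounts: one mount per line,
    with the source device in the first whitespace-separated field.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as f:
        return f.read()
```

After booting from the secondary drive, something like `is_mounted("/dev/sda1", read_proc_mounts())` should come back `False` before fsck is pointed at the primary drive's partition.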
Time will tell whether I really solved the underlying problem or not. But at the moment it's running smoothly and mail IS going out again.