LinuxQuestions.org


niente0 09-18-2012 03:24 AM

Strange random problem with sendmail (damaged mailboxes)
 
Hello,
I have a strange problem on my Ubuntu mailserver. It happens randomly.
The problem is that sometimes one random mailbox file (/var/spool/mail/*) gets corrupted. If I open it in an editor, I see that a random number of null characters (00) has been inserted at the beginning of the file, partially overwriting the real content.

I'm not sure if there's a link with the fact that these e-mail boxes are checked both on computer and mobile.
Sometimes it happens when I restart the sendmail service.

To fix these mailboxes I delete all characters (nulls included) until I reach a line starting with "From " (the mbox separator that marks the beginning of a new message). After that, the mailbox works correctly again.
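The manual repair described above could be scripted roughly as follows. This is only a sketch: it assumes the line being searched for is the mbox separator (a line starting with `From ` followed by a space), and `repair_mbox` is a hypothetical helper name, not part of sendmail or Dovecot.

```shell
# Hypothetical helper: write a copy of mailbox SRC to DST that starts at the
# first mbox separator line ("From " followed by a space).
repair_mbox() {
  src=$1
  dst=$2
  # -a forces text mode despite the NUL bytes, -n prefixes the line number,
  # -m1 stops at the first match.
  first=$(LC_ALL=C grep -a -n -m1 '^From ' "$src" | cut -d: -f1)
  if [ -z "$first" ]; then
    echo "no mbox separator found in $src" >&2
    return 1
  fi
  # Copy everything from that line onward; SRC itself is left untouched.
  tail -n +"$first" "$src" > "$dst"
}
```

It writes to a separate file on purpose, so the damaged mailbox can be kept for later inspection. It should only be run while nothing else is delivering to or reading from the mailbox.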

I searched a lot on the web but it seems I'm the only one to have this particular problem.
I'm using sendmail+dovecot+squirrelmail.
Could you please help me? What could I check?

Thank you!!!

unSpawn 09-18-2012 07:37 AM

Quote:

Originally Posted by niente0 (Post 4783037)
What could I check?

- What distribution release are you using?
- Does this release provide the latest versions of Dovecot, Squirrel Mail and any dependencies and what are their exact versions?
- Is the size of these mail boxes large, huge, humongous or plain ludicrous? (What does 'ls -lh /var/spool/mail/*' return wrt size?)
- Does it only happen with just-delivered mail in spool files or also with mailboxes in ~/?
- When did this mailbox corruption first manifest itself and can you trace back any system (re-)configuration, mailbox or directory permission changes or SW upgrading?
- Do you run any Squirrel Mail plugins we should know about?
- Does it happen with every mobile device or only some, and with every application they use or only some?
- What do the system and daemon logs show?
- And what do system and daemon logs show when a user tries to access a corrupt mailbox?
- Is the system low on memory when mailbox corruption happens and do you collect and have SAR data?
*Please note these are fifteen questions that should be answered as verbosely as possible.

niente0 09-19-2012 03:48 AM

First of all thanks for your interest in my problem!

- What distribution release are you using?

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"

- Does this release provide the latest versions of Dovecot, Squirrel Mail and any dependencies and what are their exact versions?

I don't think these are the latest versions, anyway:
Dovecot v1.1.11
SquirrelMail v1.4.21
OpenSSL 0.9.8o 01 Jun 2010
SpamAssassin version 3.3.1 running on Perl version 5.10.0

- Is the size of these mail boxes large, huge, humongous or plain ludicrous? (What does 'ls -lh /var/spool/mail/*' return wrt size?)

There are about 30 mailboxes; the biggest ones are around 500 MB. The total is 7.8 GB.

- Does it only happen with just-delivered mail in spool files or also with mailboxes in ~/?

Sorry, I'm not sure I understood your question... Anyway, the problem seems to occur only in the /var/spool/mail folder.

- When did this mailbox corruption first manifest itself and can you trace back any system (re-)configuration, mailbox or directory permission changes or SW upgrading?

The first time it happened was a long time ago; the problem has probably been present since the server was first set up. Since then, I've only made some updates to SquirrelMail (plus I have auto-updates configured, but I think they don't work anymore because the repositories were taken offline).

- Do you run any Squirrel Mail plugins we should know about?

I use the standard plugins, plus "local_autorespond_forward" and "vlogin" (to manage multiple domains)

- Does it happen with every mobile device or only some, and with every application they use or only some?

The mobile devices in use are of various types. In my opinion the problem is related to BlackBerries (the mailbox is checked with their standard client). No problems so far with mailboxes checked from iPads or Galaxy Tabs.

- What do the system and daemon logs show?

They are quite large and I don't know what string to search for. I'm trying to trace back to the point at which the problem happened; when I find something I'll post it. Sorry that I have nothing to post now.

- And what do system and daemon logs show when a user tries to access a corrupt mailbox?

In mail.log and syslog I have only this:
Sep 18 08:44:14 mail dovecot: imap-login: Login: user=<p.xxxxxxxx>, method=PLAIN, rip=178.239.87.70, lip=xx.xx.xx.67
Sep 18 08:44:14 mail dovecot: IMAP(p.xxxxxxxx): Disconnected: Logged out bytes=74/357
...
Sep 18 09:45:00 mail dovecot: POP3(p.xxxxxxxx): Couldn't init INBOX: Mailbox isn't a valid mbox file
Sep 18 09:45:01 mail dovecot: POP3(p.xxxxxxxx): Mailbox init failed top=0/0, retr=0/0, del=0/0, size=0

In mail.warn:
Sep 18 09:45:00 mail dovecot: POP3(p.xxxxxxxx): Couldn't init INBOX: Mailbox isn't a valid mbox file

After repairing:
Sep 18 11:16:07 mail dovecot: pop3-login: Login: user=<p.xxxxxxxx>, method=PLAIN, rip=212.91.93.67, lip=xx.xx.xx.67
Sep 18 11:16:07 mail dovecot: POP3(p.xxxxxxxx): Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0

- Is the system low on memory when mailbox corruption happens and do you collect and have SAR data?

The server has 2 GB of memory. I cannot say whether memory is low when the problem happens, but I think it should be enough for 30 mailboxes.
I tried to run the "sar" command but a message appears saying that the package is not installed.

I'm sorry I cannot provide more info, I'm not an expert :-( so I think I'll have to live with this problem. My only hope would be to find someone else who has or had this same problem.

Thanks!

unSpawn 09-19-2012 10:02 AM

Quote:

Originally Posted by niente0 (Post 4783978)
- Does this release provide the latest versions of Dovecot, Squirrel Mail and any dependencies and what are their exact versions?

I don't think these are the latest versions, anyway:
Dovecot v1.1.11
SquirrelMail v1.4.21

Dovecot 1 series is at v1.2.17 and Squirrel Mail at 1.4.22 so you're not that far behind. By checking both Dovecot's and Squirrel Mail's changelogs and bug trackers you could research whether upgrading beyond what your distribution offers would be beneficial:
http://hg.dovecot.org/dovecot-1.2/log (http://hg.dovecot.org/dovecot-1.2/log?rev=corruption)
http://sourceforge.net/tracker/?group_id=311 (http://sourceforge.net/search/?group...rds=corruption)


Quote:

Originally Posted by niente0 (Post 4783978)
- Is the size of these mail boxes large, huge, humongous or plain ludicrous? (What does 'ls -lh /var/spool/mail/*' return wrt size?)

There are about 30 mailboxes; the biggest ones are around 500 MB. The total is 7.8 GB.

The reason I asked is that while searching the 'net for IMAP server-related mailbox corruption, specifically in relation to mobile clients, I found some clues (IIRC in Debian's bug tracker) that mailbox size could be an issue. I can't remember though whether that was related to a specific IMAP server or was a generic remark.


Quote:

Originally Posted by niente0 (Post 4783978)
- Does it only happen with just-delivered mail in spool files or also with mailboxes in ~/?

Sorry, I'm not sure I understood your question... Anyway, the problem seems to occur only in the /var/spool/mail folder.

Mail can be delivered and stored in several places: a user's home, a user directory in a separate partition for email; it all depends on your needs and how you configured mail delivery and storage. /var/spool/mail means either virtual mail users or mail delivered from external sources ('net -> MTA -> MDA) so that narrows things down a bit. Reading http://wiki1.dovecot.org/MboxProblems you'll notice Dovecot doesn't mind mailboxes being modified as long as the MUA adheres to the IMAP protocol. Still, I think the first step would be to get a better understanding of the problem, and that means enabling verbose / debug output. In /etc/dovecot.conf you can set "log_path" to a log file (ensure ample free space is available!), set "mail_debug = yes" and restart Dovecot.
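Before restarting Dovecot it can help to confirm that both settings are really active (uncommented) in the config file. The function below is a rough sketch for that; `check_debug_settings` is a made-up name, and the config path is simply whatever you pass in (/etc/dovecot.conf on this setup, possibly elsewhere on others).

```shell
# Hypothetical helper: report whether log_path and mail_debug are set
# (i.e. present and not commented out) in the given Dovecot config file.
check_debug_settings() {
  conf=$1
  for key in log_path mail_debug; do
    # Match only lines where the key starts the line (leading blanks allowed),
    # so "#mail_debug = yes" does not count as set.
    if line=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$conf"); then
      echo "set: $line"
    else
      echo "missing: $key"
    fi
  done
}
```

Calling `check_debug_settings /etc/dovecot.conf` prints one line per setting, starting with either "set:" or "missing:".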


Quote:

Originally Posted by niente0 (Post 4783978)
- Do you run any Squirrel Mail plugins we should know about?

I use the standard plugins, plus "local_autorespond_forward" and "vlogin" (to manage multiple domains)

I doubt autorespond / forward could cause mailbox writes, but you could check its documentation to see whether it supports debugging anyway. For Login Manager (latest release was in 2009 BTW) you can turn on $vlogin_debug in the configuration file.


Quote:

Originally Posted by niente0 (Post 4783978)
- Does it happen with every mobile device or only some, and with every application they use or only some?

The mobile devices in use are of various types. In my opinion the problem is related to BlackBerries (the mailbox is checked with their standard client). No problems so far with mailboxes checked from iPads or Galaxy Tabs.


- What do the system and daemon logs show?

They are quite large and I don't know what string to search for. I'm trying to trace back to the point at which the problem happened; when I find something I'll post it. Sorry that I have nothing to post now.

- And what do system and daemon logs show when a user tries to access a corrupt mailbox?

In mail.log and syslog I have only this:
Sep 18 08:44:14 mail dovecot: imap-login: Login: user=<p.xxxxxxxx>, method=PLAIN, rip=178.239.87.70, lip=xx.xx.xx.67
Sep 18 08:44:14 mail dovecot: IMAP(p.xxxxxxxx): Disconnected: Logged out bytes=74/357
...
Sep 18 09:45:00 mail dovecot: POP3(p.xxxxxxxx): Couldn't init INBOX: Mailbox isn't a valid mbox file
Sep 18 09:45:01 mail dovecot: POP3(p.xxxxxxxx): Mailbox init failed top=0/0, retr=0/0, del=0/0, size=0

In mail.warn:
Sep 18 09:45:00 mail dovecot: POP3(p.xxxxxxxx): Couldn't init INBOX: Mailbox isn't a valid mbox file

After repairing:
Sep 18 11:16:07 mail dovecot: pop3-login: Login: user=<p.xxxxxxxx>, method=PLAIN, rip=212.91.93.67, lip=xx.xx.xx.67
Sep 18 11:16:07 mail dovecot: POP3(p.xxxxxxxx): Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0

OK. Additionally you could try and see if an existing mail spool suddenly contains NULL bytes:
Code:

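# Watch the spool directory and report any modified mailbox that contains a NUL byte.
inotifywait -m /var/spool/mail/ -e modify --format "%w%f" 2>/dev/null | while read -r ITEM; do
 # -a treats the mailbox as text, -P enables the \x00 escape for a NUL byte
 LC_ALL=C grep -qaP -m1 '\x00' "${ITEM}" && echo "$(/bin/date +"%b %e %H:%M:%S") found NULL bytes in ${ITEM}"
done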

Then use that syslog-like time stamp to correlate with the debug log you set in /etc/dovecot.conf.
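To get the other half of that correlation, the mail log can be reduced to just the moments Dovecot rejected a mailbox. The sketch below searches for the exact error string from the log excerpt quoted above and assumes the standard syslog layout shown there (month, day, time, host, daemon, then the service and user); `corrupt_events` is a hypothetical name.

```shell
# Hypothetical helper: print time stamp and service(user) for every
# "not a valid mbox file" error found in the given log file.
corrupt_events() {
  # -F searches for the literal string, no regex interpretation.
  grep -F "Mailbox isn't a valid mbox file" "$1" |
    awk '{print $1, $2, $3, $6}'   # month, day, time, e.g. POP3(user):
}
```

Run against the mail log (for example `corrupt_events /var/log/mail.log`), it gives a short list of time stamps to line up with the inotifywait output.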


Quote:

Originally Posted by niente0 (Post 4783978)
- Is the system low on memory when mailbox corruption happens and do you collect and have SAR data?

The server has 2 GB of memory. I cannot say whether memory is low when the problem happens, but I think it should be enough for 30 mailboxes.
I tried to run the "sar" command but a message appears saying that the package is not installed.

Regardless of this issue it could be beneficial to have SAR data for servers, provisioning-wise, and it doesn't need to be as big as Nagios right away. Check out Atop, Atsar, Dstat, Collectl (hi Mark ;-p) or any equivalent your distribution's repos have on offer.
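Until one of those collectors is installed, a crude stopgap is to log free memory with the same syslog-like time stamp used above, so it can be lined up with the other logs. This is only a sketch and no substitute for real SAR data; `mem_snapshot` is a hypothetical name, and it reads the Linux /proc/meminfo file unless another path is given.

```shell
# Hypothetical helper: print one time-stamped line with the current MemFree value.
mem_snapshot() {
  meminfo=${1:-/proc/meminfo}   # optional argument: alternative meminfo file
  # The MemFree line looks like "MemFree:   12345 kB"; field 2 is the number.
  free_kb=$(awk '/^MemFree:/ {print $2}' "$meminfo")
  echo "$(date +"%b %e %H:%M:%S") MemFree=${free_kb}kB"
}
```

Called once a minute from cron and appended to a file, this is enough to tell whether memory was tight around the time a mailbox got damaged.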


Quote:

Originally Posted by niente0 (Post 4783978)
I'm sorry I cannot provide more info, I'm not an expert :-( so I think I'll have to live with this problem. My only hope would be to find someone else who has or had this same problem.

You don't need to apologize for anything and you don't need to be an expert to solve problems. What helps is having a basic understanding of how all things Linux work, being methodical about it and knowing how to do research.


All times are GMT -5.