How to locate and extract a mail from a *nix mailbox?
I would like to write a script that automatically locates a mail in a *nix mailbox, extracts it and sends it to someone.
The problem is as follows:
I use a procmail-based e-mail sanitizer that delivers sanitized mails to the users' mailboxes, while the original (unsanitized) mails are backed up. Sometimes an important user is not satisfied with the look of the sanitized (mostly HTML) mail. I would like to allow him to get the original (unchanged) mail on his own responsibility.
One way to do this would be to let him send the sanitized message to a specific address (an auto-responding part of the mail system); the mail system would then automatically identify the message in the backup mailbox and send the original (unchanged) message to the user.
The headers of the sanitized and original messages are almost identical, except that the sanitized one has some additional lines added by SpamAssassin and the Sanitizer.
Could the date of receipt, the sender and the subject be used to locate the message in the backup mailbox, or is there better data in the header for identification?
How do I find the boundaries of each message in the mailbox, and how do I extract a specific message (cat, grep, sed, formail?)?
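In the classic mbox format, each message starts with an envelope line beginning with "From " (with a space and no colon), and body lines that would begin the same way are escaped as ">From " by the delivering agent. A minimal Python sketch of splitting on those boundaries, assuming a well-formed mbox of that kind (Content-Length based variants are not handled):

```python
from typing import List


def split_mbox(text: str) -> List[str]:
    """Split raw mbox text into messages, each beginning with its 'From ' line.

    Assumes classic mbox: a message starts at a line beginning with 'From '
    that is either the first line or follows a blank line, and body lines
    starting with 'From ' were escaped (e.g. as '>From ') on delivery.
    """
    messages: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        # Boundary: a "From " line right after a blank line.
        if line.startswith("From ") and current and current[-1].strip() == "":
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

In practice Python's standard mailbox module (mailbox.mbox) already does this splitting, so a hand-rolled splitter is mainly useful for seeing where the boundaries lie.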
I would say that if you copy the backup file to /var/spool/mail/someuser and chown it, that user would own the mail. An email client could then filter the mail by the username in the To: header and put the messages into separate mail files, which could then be chowned and copied to the proper users.
Just an idea: if you have procmail filter incoming mail through formail to tack the Message-Id onto an extra header (like X-OrigMessage-Id), users can send it to a designated address, and procmail can then find the original by comparing X-OrigMessage-Id with Message-Id. I think that could work without much trouble for the user. The statistical chance of two Message-Ids being identical is much smaller than for a To/From/Subject combination, I'd say.
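Matching on the Message-ID can be done in a single pass over the backup mailbox. A sketch using Python's standard mailbox module; the header name X-Orig-Message-Id is a hypothetical choice (the posts in this thread spell it several ways), and the function falls back to the message's own Message-ID, which only helps if the sanitized copy is passed in with its original headers intact:

```python
import mailbox
from email.message import Message
from typing import Optional


def find_original(backup_path: str, request: Message,
                  ref_header: str = "X-Orig-Message-Id") -> Optional[mailbox.mboxMessage]:
    """Return the backup message whose Message-ID matches the request, or None.

    ref_header is a hypothetical header carrying the original Message-ID;
    if it is absent, the request's own Message-ID is tried instead.
    """
    wanted = request.get(ref_header) or request.get("Message-ID")
    if wanted is None:
        return None
    wanted = wanted.strip()
    box = mailbox.mbox(backup_path, create=False)
    try:
        # One pass over the backup; mailbox splits on the "From " lines for us.
        for msg in box:
            mid = msg.get("Message-ID")
            if mid is not None and mid.strip() == wanted:
                return msg
    finally:
        box.close()
    return None
```

Because this is one short-lived process reading the backup once, it avoids running procmail once per message.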
Thank you for the X-Message-Id tip, it is a good idea.
However, I am still stuck on the mail extraction problem:
Reading the procmail and formail manuals, I suspect that I could use formail to split the backup mailbox into messages, which could then be fed to procmail (called with a specific procmailrc file) to filter the right message into the user's mailbox using the X-Message-Id.
However, I would rather avoid calling procmail a second time to filter the message, since I regard it as too heavy a tool to call twice on the same message. The route of the message would be rather complicated:
user's mailbox > procmail > spam filter > sanitizer > procmail > formail (splitting) > procmail (filter out the right message) > user's mailbox
Maybe the spam filter and the sanitizer could be skipped, but it still seems rather heavy because of the two procmail calls:
user's mailbox > procmail > formail (splitting) > procmail (filter out the right message) > user's mailbox
I also suspect that there would be not two procmail calls but as many as there are messages in the backup mailbox (formail would call procmail once for each message).
I would rather replace the second procmail call with a call to a lightweight utility.
Or am I misunderstanding something?
You already have the sanitizer running through procmail; I hope the spam filter as well? Procmail does have a crude form of if-then-else: for instance, the presence of an X-Orig-Message-Id header could trigger a retrieval, and otherwise the sanitizer and spam filter are run. Also, what if you rotate backup_spool per week/month/user? How much strain would that put on the server?
If I had a choice, I'd run all processing well before mail hits the user's spool/dir.
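The if-then-else described above, plus one possible rotation scheme, sketched in Python rather than procmailrc syntax. The header name, the two route labels and the per-user, per-ISO-week file layout are all assumptions for illustration:

```python
import datetime
import os
from email.message import Message

# Hypothetical header name; the thread uses several spellings for the same idea.
REQUEST_HEADER = "X-Orig-Message-Id"


def route(msg: Message) -> str:
    """Mirror the procmail if/else: retrieval requests vs. normal incoming mail."""
    return "retrieve" if msg.get(REQUEST_HEADER) else "sanitize"


def rotated_backup_path(base: str, user: str, when: datetime.date) -> str:
    """One backup mbox per user per ISO week, e.g. base/alice/2007-W05."""
    year, week, _ = when.isocalendar()
    return os.path.join(base, user, f"{year}-W{week:02d}")
```

Smaller per-week files keep each lookup short, at the cost of having to know (or search) which week a message arrived in.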
If I store each incoming message in a separate backup file, then I can use grep to find the right message in the backup, and simply append that to the user's mailbox. No need to call procmail again!
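Appending to the user's mbox is safest under the same locks the mail system uses, so that a concurrent delivery cannot interleave with it. A sketch with Python's mailbox module, which takes an fcntl lock plus a dot-lock (a "<mailbox>.lock" file next to the mailbox); the paths are assumptions, and writing into /var/spool/mail may need the mail group's permissions:

```python
import mailbox


def append_to_mailbox(message_file: str, user_mbox: str) -> None:
    """Append one stored message (one message per file) to an mbox mailbox."""
    with open(message_file, "rb") as f:
        raw = f.read()
    # mbox() creates the mailbox file if it does not exist yet.
    box = mailbox.mbox(user_mbox)
    # fcntl lock plus a '<user_mbox>.lock' dot-lock, so a concurrent
    # delivery or mail client does not write at the same time.
    box.lock()
    try:
        # If raw has no leading 'From ' line, mailbox writes a
        # 'From MAILER-DAEMON <date>' envelope line for it.
        box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()
```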
If each message's backup file is named after its message ID, even grep can be omitted (assuming that message IDs cannot contain characters that are forbidden in filenames, which I hope is the case).
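One caution on the filename idea: RFC 5322 allows characters such as "/" in the part of a Message-ID before the "@", and the sender is free to put arbitrary text in that header, so using it verbatim as a filename could fail or create unexpected paths. A hedged alternative is to name each file after a hash of the Message-ID; the ".eml" suffix and the function names are hypothetical:

```python
import hashlib
import os


def backup_filename(message_id: str) -> str:
    """Map a Message-ID to a fixed-length hex filename safe on any POSIX filesystem."""
    normalized = message_id.strip()
    digest = hashlib.sha256(normalized.encode("utf-8", "surrogateescape")).hexdigest()
    return digest + ".eml"


def backup_path(backup_dir: str, message_id: str) -> str:
    """Full path of the backup file for a given Message-ID."""
    return os.path.join(backup_dir, backup_filename(message_id))
```

The same function is applied both when storing the original and when handling a request, so lookup is still a single file open with no grep.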
It seems very easy now.
Plus, it is even easier to clean up the backups of old messages. I used to do this rarely and by hand; now I can easily clean up old backup files by their date of creation.
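POSIX stat() does not report a creation time (st_ctime is the inode change time), so a cleanup job usually goes by modification time instead. A sketch, assuming one message per file in a flat backup directory; the function name and the age threshold are hypothetical:

```python
import os
import time
from typing import List, Optional


def remove_old_backups(backup_dir: str, max_age_days: float,
                       now: Optional[float] = None) -> List[str]:
    """Delete regular files in backup_dir not modified for max_age_days.

    Returns the sorted names of the removed files.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    with os.scandir(backup_dir) as entries:
        # Collect first, delete afterwards; skip subdirectories and symlinks.
        stale = [e for e in entries
                 if e.is_file(follow_symlinks=False)
                 and e.stat(follow_symlinks=False).st_mtime < cutoff]
    for entry in stale:
        os.remove(entry.path)
    return sorted(e.name for e in stale)
```

Run from cron, this replaces the occasional manual cleanup described above.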