Deleting duplicate messages

essdeeay · 11-18-2005, 08:42 PM

In the process of recovering from a mail server crash on a Windows OS, I have ended up with a few thousand duplicate messages. I'm thinking along the lines of reading all the messages and checking each one's "Message-ID" header, storing the ID if it's the first time I've seen it, then deleting all subsequent messages with the same Message-ID.

The messages may not be exactly the same because the duplicates have been through the mail router at least once more than the original message.
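A rough Python sketch of the first-seen idea described above, for illustration only. It assumes each message is stored as its own file in a single directory (the thread doesn't say how the server stores mail), and `find_duplicates` is a made-up name. Only the Message-ID header is compared, so extra headers added by the router on a second pass don't affect the result. It only reports what it would delete; nothing is removed.

```python
import email.parser
from pathlib import Path


def find_duplicates(mail_dir):
    """Split the files in mail_dir into (keep, duplicates) by Message-ID.

    Assumption: one message per file. Files are visited in sorted
    name order, so "first seen" means first by filename.
    """
    parser = email.parser.BytesParser()
    seen = set()
    keep, duplicates = [], []
    for path in sorted(Path(mail_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fp:
            # headersonly=True skips parsing the body, which we don't need
            msg = parser.parse(fp, headersonly=True)
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            # No Message-ID header: we can't tell, so play safe and keep it
            keep.append(path)
            continue
        msg_id = msg_id.strip()
        if msg_id in seen:
            duplicates.append(path)
        else:
            seen.add(msg_id)
            keep.append(path)
    return keep, duplicates
```

Calling `find_duplicates("recovered_mail")` (a hypothetical directory name) returns two lists of paths. Reviewing the second list before deleting anything is probably wise, since a message with a missing or reused Message-ID would not be handled correctly by this check.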

Is this a flawed method, and is there a better way?

As I'm fairly new to scripting (but understand it quite well), what kind of techniques should I be looking to use?

Many thanks,
Steve

bigrigdriver · 11-20-2005, 07:58 AM

I should think you could do a three-part operation on the mail.

Sort by Message-ID so that duplicates are adjacent in the sorted file.

Pipe the output through uniq, which only filters out duplicate lines when they are adjacent in its input.

Write the first occurrence of a Message-ID to a new file, and drop the duplicates into null space, or another file "just in case".
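The three steps above could look something like this in Python. It is a sketch, not a tested recovery tool. It assumes you have already built a list of (Message-ID, filename) pairs, one per message; `split_first_and_repeats` is a made-up name. Sorting puts equal IDs next to each other, `itertools.groupby` plays the role of uniq by collapsing adjacent equal keys, and the repeats are returned separately so they can be moved aside "just in case" instead of deleted.

```python
from itertools import groupby
from operator import itemgetter


def split_first_and_repeats(pairs):
    """pairs: iterable of (message_id, filename) tuples.

    Returns (firsts, repeats): the first filename for each Message-ID,
    and every later filename that shares an ID already seen.
    """
    # Step 1: sort by Message-ID so duplicates become adjacent.
    # Python's sort is stable, so equal IDs keep their input order.
    ordered = sorted(pairs, key=itemgetter(0))

    firsts, repeats = [], []
    # Step 2: group adjacent equal IDs, like uniq does with lines.
    for _msg_id, group in groupby(ordered, key=itemgetter(0)):
        names = [name for _, name in group]
        # Step 3: keep the first occurrence, set the rest aside.
        firsts.append(names[0])
        repeats.extend(names[1:])
    return firsts, repeats
```

The files named in `repeats` could then be moved into a holding directory with `shutil.move` and only removed once the mailboxes have been checked.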