LinuxQuestions.org
#1  andredude, 02-25-2004, 03:12 AM
convert html emails to plain text emails


How can I convert HTML emails (fetched from an Exchange server using fetchmail) to plain-text emails before they are processed by procmail? I need this to properly implement our Bugzilla bugmail system, where users send their problems to a normal email address.

Thanks!

Andre
 
#2  bruce1271, 02-25-2004, 11:00 PM
Write a Perl script.
 
#3  andredude (Original Poster), 03-01-2004, 09:29 AM
Right... erm, well, I was actually wondering whether this kind of functionality is already provided by something standard like fetchmail or procmail, since I'm sure I'm not the first person to want all emails converted to plain text.
 
#4  meldar, 03-01-2004, 09:42 AM
I can't help you find a standard feature, but maybe http://userpage.fu-berlin.de/~mbayer...html2text.html could be handy? In any case, it is easier than writing a !"#%perl script.
 
#5  andredude (Original Poster), 03-03-2004, 03:03 AM
Thanks! This looks like it could help, I'll try it out.
 
#6  andredude (Original Poster), 04-02-2004, 09:48 AM
OK... for anyone still interested in getting this right, this is how I finally got it to work. First, download html2text. Then create a script containing this:

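#!/bin/sh
# usage: clean-html.sh <input message> <output message>
# join each line ending in " =" with the line that follows it
awk '{ if (substr($0, length($0)-1, 2) == " =") printf "%s", substr($0, 1, length($0)-1); else print $0 }' "$1" > temp_clean_file.txt
# line numbers of the first opening and the first closing html line
x=`grep -Eni '^<!DOCTYPE|^<HTML' temp_clean_file.txt | awk -F: '{print $1}' | head -n 1`
y=`grep -ni '^</html>' temp_clean_file.txt | awk -F: '{print $1}' | head -n 1`
# headers untouched, html part through html2text, remainder untouched
head -n $((x-1)) temp_clean_file.txt > "$2"
tail -n +$x temp_clean_file.txt | head -n $((y-x+1)) | html2text -nobs >> "$2"
tail -n +$((y+1)) temp_clean_file.txt >> "$2"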


This will convert your message (first parameter) into a plain-text message (second parameter). The steps are basically: take every line ending with " =" and append the following line to it. Then get the line number of the first line with a <html> or <!DOCTYPE> tag, and the line number of the first line with a </html> tag; everything in between should be the HTML part of the message. You need these line numbers because the first part is header info, which you should not mess with, and the last part could be attachments or other messages (which I don't bother to convert here), which should also be left alone. Then run the HTML lines through html2text and put the result in place of the original lines. The joined intermediate copy is left behind in a file called temp_clean_file.txt; the converted message ends up in the file given as the second parameter.
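
For anyone who wants to try the steps described above without html2text installed, here is a minimal sketch on a tiny sample message. The function names join_soft_breaks, strip_tags and convert_html_part are made up for this example, and strip_tags is only a crude stand-in for html2text (it just deletes tags), so treat this as an illustration of the line-number splitting rather than a replacement for the real script.

```shell
#!/bin/sh
# Hypothetical helper names; none of them come from html2text or procmail.

# Step 1: a line ending in " =" is continued on the next line.
join_soft_breaks() {
    awk '{
        if (substr($0, length($0) - 1, 2) == " =")
            printf "%s", substr($0, 1, length($0) - 1)  # drop "=" and the newline
        else
            print
    }' "$1"
}

# Crude stand-in for html2text: deletes anything that looks like a tag.
strip_tags() { sed 's/<[^>]*>//g'; }

# Steps 2 and 3: find the html part by line number and convert only that part.
convert_html_part() {
    in=$1
    start=$(grep -Eni '^<!DOCTYPE|^<html' "$in" | head -n 1 | cut -d: -f1)
    end=$(grep -ni '^</html>' "$in" | head -n 1 | cut -d: -f1)
    if [ -z "$start" ] || [ -z "$end" ]; then
        cat "$in"        # no html part found: pass the message through unchanged
        return
    fi
    awk -v s="$start" 'NR < s' "$in"                                    # headers
    awk -v s="$start" -v e="$end" 'NR >= s && NR <= e' "$in" | strip_tags  # html part
    awk -v e="$end" 'NR > e' "$in"                                      # the rest
}

# Demonstration on a small made-up message
sample=$(mktemp)
joined=$(mktemp)
printf 'Subject: bug\n\n<html>\n<b>It crashes when I =\nclick Save</b>\n</html>\ntrailer\n' > "$sample"
join_soft_breaks "$sample" > "$joined"
convert_html_part "$joined"
rm -f "$sample" "$joined"
```

The sketch passes the message through untouched when no html lines are found, which is an assumption about what you would want in that case; the original script does not check for it.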

Then put this into your .procmailrc to do whatever you want to do with your plain-text email:
:0
RESULT=| cat > $MY_HOME/mfile && $MY_HOME/clean-html.sh $MY_HOME/mfile $MY_HOME/outfile && cat $MY_HOME/outfile | (cd $BUGZILLA_HOME && ./bugzilla_email_append.pl)

I piped it into the Bugzilla email gateway here, but you can change that to whatever you need. So I cat the message into a file called mfile, then run the script above (which I've put into $MY_HOME/clean-html.sh) with this file and the output file ($MY_HOME/outfile), and then cat $MY_HOME/outfile into the Bugzilla email gateway.
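
The same save, convert, hand-over chain could also live in a small wrapper script, so the procmail action only has to pipe the message into one command. This is just a sketch: process_message, WORKDIR, CONVERTER and GATEWAY are names made up here, and the defaults (cp and cat) are harmless stand-ins so it runs anywhere. In real use you would point CONVERTER at clean-html.sh and GATEWAY at the Bugzilla gateway.

```shell
#!/bin/sh
# Hypothetical wrapper: read a message on stdin, save it, convert it, pass it on.
WORKDIR=${WORKDIR:-${TMPDIR:-/tmp}}   # where mfile and outfile are kept
CONVERTER=${CONVERTER:-cp}            # stand-in; takes <input file> <output file>
GATEWAY=${GATEWAY:-cat}               # stand-in; reads the converted message on stdin

process_message() {
    # Each step runs only if the previous one succeeded, as with && in the recipe.
    cat > "$WORKDIR/mfile" &&
    $CONVERTER "$WORKDIR/mfile" "$WORKDIR/outfile" &&
    $GATEWAY < "$WORKDIR/outfile"
}

# Demonstration with the stand-in commands
printf 'Subject: test\n\nhello\n' | process_message
```

Like the recipe, this leaves mfile and outfile behind so you can look at them afterwards. The fixed file names mean two messages arriving at the same moment could overwrite each other's files.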

I know the script and the procmail recipe are dirty, but this way I have lots of scattered files lying around that I can look at to see what happened.

Last edited by andredude; 04-02-2004 at 09:51 AM.
 
#7  osueerower, 03-20-2005, 12:33 PM
Thanks for the insights provided here. My approach to converting HTML-only email to text also uses html2text. It saves a copy of the original email in an htmlOnly maildir folder (since the conversion is lossy), and uses only procmailrc filtering to convert the body and change the Content-Type header:

## Change html email to text
:0
* ^Content-Type: text/html;
{
:0c
$MAILDIR/htmlOnly/
:0fwb
| `which html2text`
:0fwh
| `which formail` -i "Content-Type: text/plain; charset=us-ascii"
LOG="HTML message found and converted...
"
}
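
To check by hand which saved messages the condition in the recipe above would pick up, something like the following might help. It is only a sketch: is_html_only is a made-up name, and it imitates the procmail condition by looking at the header block alone (everything up to the first blank line) and ignoring case. It assumes the message uses plain LF line endings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the header block has a text/html Content-Type.
is_html_only() {
    # sed prints up to the first empty line and quits, so the body is never searched
    sed '/^$/q' "$1" | grep -qi '^Content-Type: text/html;'
}

# Demonstration on a small made-up message
msg=$(mktemp)
printf 'From: a@example.com\nContent-Type: text/html; charset=us-ascii\n\n<html><p>hi</p></html>\n' > "$msg"
if is_html_only "$msg"; then
    echo "would be converted"
else
    echo "would be left alone"
fi
rm -f "$msg"
```

A multipart message whose HTML part sits further down in the body is left alone by this check, just as it is by the recipe.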
 
  

