Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.
02-25-2004, 03:12 AM
#1
Member
Registered: Dec 2003
Location: Johannesburg, South Africa
Distribution: Mandrake
Posts: 48
convert html emails to plain text emails
How can I convert html emails (fetched from an exchange server using fetchmail) to plain text emails before they are processed by procmail? I need to do this to properly implement our bugzilla bugmail system where users send their problems to a normal email address.
Thanks!
Andre
02-25-2004, 11:00 PM
#2
Member
Registered: Apr 2003
Location: Dallas, TX
Posts: 122
Write a Perl script.
03-01-2004, 09:29 AM
#3
Member
Registered: Dec 2003
Location: Johannesburg, South Africa
Distribution: Mandrake
Posts: 48
Original Poster
Right... erm, well, I was actually wondering whether this kind of functionality isn't already provided by something standard like fetchmail or procmail, since I'm sure I'm not the first person to want all emails converted to plain text.
03-01-2004, 09:42 AM
#4
Member
Registered: Feb 2004
Location: Denmark
Distribution: Gentoo
Posts: 136
I can't help you find a standard feature, but maybe http://userpage.fu-berlin.de/~mbayer...html2text.html could be handy? At least it is easier than writing a !"#¤%perl script.
03-03-2004, 03:03 AM
#5
Member
Registered: Dec 2003
Location: Johannesburg, South Africa
Distribution: Mandrake
Posts: 48
Original Poster
Thanks! This looks like it could help, I'll try it out.
04-02-2004, 09:48 AM
#6
Member
Registered: Dec 2003
Location: Johannesburg, South Africa
Distribution: Mandrake
Posts: 48
Original Poster
OK... for anyone still interested in getting this right, this is how I finally got it to work. First, download html2text. Then create a script containing this:
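#!/bin/sh
# Join each line ending in " =" with the line after it, dropping the trailing "=".
awk '{ if (substr($0, length($0)-1, 2) == " =") printf "%s", substr($0, 1, length($0)-1); else print $0 }' "$1" > temp_clean_file.txt
# x = first line of the HTML part, y = line holding the closing </html> tag.
x=$(grep -Eni '^<!DOCTYPE|^<HTML' temp_clean_file.txt | awk -F: '{print $1}' | head -n 1)
y=$(grep -ni '^</html>' temp_clean_file.txt | awk -F: '{print $1}' | head -n 1)
# Copy the headers (everything before line x), convert lines x..y, copy the rest.
head -n $((x - 1)) temp_clean_file.txt > "$2"
tail -n +"$x" temp_clean_file.txt | head -n $((y - x + 1)) | html2text -nobs >> "$2"
tail -n +$((y + 1)) temp_clean_file.txt >> "$2"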
This will convert your message (first parameter) into a plain text message (second parameter). The steps are basically: take every line ending with " =" and append the following line to it. Then get the line number of the first line starting with an <html> or <!doctype> tag, and the line number of the first line starting with a </html> tag; this should be the HTML part of the message. You need these line numbers because the first part is header info, which you should not mess with, and the last part could be attachments or other messages (which I don't bother to convert here), which should also be left alone. Then run the HTML lines through html2text and replace the original lines with the result. The converted message ends up in the file given as the second parameter; temp_clean_file.txt is only the intermediate working copy.
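The two fiddly steps described above (joining wrapped lines and finding the HTML part) can be tried on their own before the whole script is trusted with real mail. The following is only a sketch: `join_soft_breaks` and `html_span` are made-up helper names, and the join assumes quoted-printable encoding, where a soft line break is a bare "=" at the end of a line, which is slightly broader than the " =" check in the script.

```shell
#!/bin/sh
# Hypothetical helpers for experimenting; not part of the script above.

# Join every line that ends in "=" with the line after it, dropping the
# "=" itself (assumption: the message body is quoted-printable).
join_soft_breaks() {
    awk '{
        if (substr($0, length($0), 1) == "=")
            printf "%s", substr($0, 1, length($0) - 1)
        else
            print $0
    }' "$1"
}

# Print the line number of the first <!DOCTYPE or <html> line and of the
# first </html> line, separated by a space.
html_span() {
    start=$(grep -Eni '^<!DOCTYPE|^<html' "$1" | head -n 1 | cut -d: -f1)
    end=$(grep -ni '^</html>' "$1" | head -n 1 | cut -d: -f1)
    echo "$start $end"
}
```

Running `html_span` on a saved message shows which lines would be handed to html2text; if either number comes back empty, the message has no HTML part that the script can find, and it is probably safer to leave that message alone.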
then put this into your .procmailrc to do whatever you wanted to do with your plain text email:
:0
RESULT=| cat > $MY_HOME/mfile && $MY_HOME/clean-html.sh $MY_HOME/mfile $MY_HOME/outfile && cat $MY_HOME/outfile | (cd $BUGZILLA_HOME && ./bugzilla_email_append.pl)
I piped it into the Bugzilla email gateway here, but you can change that to whatever you need. So I cat the message into a file called mfile, then run the script above (which I've put into $MY_HOME/clean-html.sh) with this file and the output file (called $MY_HOME/outfile), and then I cat $MY_HOME/outfile into the Bugzilla email gateway.
I know the script and the procmail recipe are dirty, but this way I have lots of scattered files lying around that I can look at to see what happened.
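If the scattered files are not wanted, the same three steps (save the message, convert it, print the result) could be wrapped in one filter. This is a rough sketch only: `run_filter` is a made-up name, and it assumes the converter takes an input file and an output file as its two arguments, the way clean-html.sh does.

```shell
#!/bin/sh
# Hypothetical wrapper: read a message on stdin, run a converter that takes
# "input-file output-file" arguments, and write the result to stdout.
run_filter() {
    converter=$1
    workdir=$(mktemp -d) || return 1
    cat > "$workdir/in"
    if "$converter" "$workdir/in" "$workdir/out"; then
        cat "$workdir/out"
        status=0
    else
        # Conversion failed: pass the original message through untouched.
        cat "$workdir/in"
        status=1
    fi
    rm -rf "$workdir"
    return "$status"
}
```

It would be called as `run_filter "$MY_HOME/clean-html.sh" < message > converted`. The trade-off is the one mentioned above: the temporary directory is removed afterwards, so nothing is left behind to inspect when a conversion goes wrong.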
Last edited by andredude; 04-02-2004 at 09:51 AM.
03-20-2005, 12:33 PM
#7
LQ Newbie
Registered: Mar 2005
Posts: 1
Thanks for the insights provided here. My approach to converting HTML-only email to text also uses html2text. It saves a copy of the original email in an htmlOnly maildir folder (since the conversion is lossy) and uses only procmailrc filtering to convert the body and change the Content-Type header:
## Change html email to text
:0
* ^Content-Type: text/html;
{
    :0c
    $MAILDIR/htmlOnly/

    :0fwb
    | `which html2text`

    :0fwh
    | `which formail` -i "Content-Type: text/plain; charset=us-ascii"

    LOG="HTML message found and converted...
"
}
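To see ahead of time which saved messages a condition like the one above would catch, the header check can be approximated in plain shell. This is a sketch under assumptions: `is_html_only` is a made-up helper, it looks only at the header block (everything before the first empty line), it matches case-insensitively as procmail does by default, and unlike the recipe it does not insist on a semicolon after text/html.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the header block of the message in file $1
# declares Content-Type: text/html.
is_html_only() {
    # sed prints lines up to and including the first empty line, then quits,
    # so a Content-Type line inside the body is never seen.
    sed '/^$/q' "$1" | grep -qi '^Content-Type: text/html'
}
```

For example, `is_html_only some-message && echo "would be converted"` gives a quick check on a message file before the recipe is switched on.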
All times are GMT -5. The time now is 07:28 AM.