Distribution: FreeBSD, Fedora, RHEL, Ubuntu; OS X, Win; have used Slackware, Mandrake, SuSE, Xandros
Posts: 444
convert cron output to plain text
I have a cron job that runs a Perl script, which generates its output as raw HTML. Cron emails this output to me:
Code:
1 6,12,22 * * * apache /my_script.pl | mail my@email.com -s "My Cron Output"
However, when I receive the output via email, I see the output in raw HTML code (which is hard to read), instead of either a well-formed MIME message, or just plain text. I've posted an example of the output here: http://deesto.pastebin.com/fdea3fea
Is there an easy way (a one-liner in the cron, or that I can add to a bash script) to either strip the HTML tags from the output, or somehow turn this output into valid HTML, so it doesn't look like garbage on the screen?
Original Poster
Thanks, but there seem to be several incarnations of "html2txt": one is an online script on the W3C site[1], one appears to be a Windows GUI tool[2], one is a Python script that runs only on existing URLs[3], etc. I need something to work with cron/bash in Linux to clean up command output. I'd hoped someone might have a quick command or script to do something similar.
Depending on which of your distros you are working on, you may find it in your package manager (it's available for the RH derivatives, for example; see the rpmforge repo).
If you use www.google.com/linux, you will find much more targeted results than the general google.
Original Poster
Thanks billymayday. I downloaded and installed that as an RPM on my system (RHEL4), and it worked without error. However, the output was kind of strange when I sent it to a file:
It seems like it stripped out the HTML tags, but also took any words it found and split those multiple ways, and added funky characters of its own; thus the word 'Web' became:
Code:
W^HWe^Heb^Hb:^H
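For what it's worth, the ^H sequences are backspace characters: the converter is emphasising text by overstriking (printing a character, backing up, and printing it again). A pager such as less renders that as bold, but in a file or an email it looks like garbage. Below is a minimal sketch for filtering the backspaces out in a pipeline, assuming a POSIX shell and sed; the function name strip_overstrike is made up for illustration and is not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: remove "X<backspace>" pairs left by overstrike formatting.
strip_overstrike() {
    # printf '\b' yields a literal backspace; sed deletes every character
    # that is immediately followed by one, keeping the character printed after it.
    sed "s/.$(printf '\b')//g"
}

# Example: bold-by-overstrike "Web" becomes plain "Web".
printf 'W\bWe\beb\bb\n' | strip_overstrike
```

In a bash script wrapped around the cron job, this filter would sit between the converter and mail. The standard col utility with the -b option does much the same job, if it is installed.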
After some playing, it seems to work fine when you add a few knobs to the command: