
deesto 01-09-2009 12:28 PM

convert cron output to plain text
I have a cron job that runs a Perl binary, which generates its output as raw HTML. Cron emails this output to me:

1 6,12,22 * * * apache / | mail -s "My Cron Output"
However, when I receive the email, I see raw HTML code (which is hard to read) instead of either a well-formed MIME message or plain text. I've posted an example of the output here:

Is there an easy way (a one-liner in the cron, or that I can add to a bash script) to either strip the HTML tags from the output, or somehow turn this output into valid HTML, so it doesn't look like garbage on the screen?

Disillusionist 01-09-2009 12:47 PM

Google html2txt

deesto 01-09-2009 03:17 PM

Thanks, but there seem to be several incarnations of "html2txt": one is an online script on the W3C site[1], one appears to be a Windows GUI tool[2], one is a Python script that runs only on existing URLs[3], etc. I need something to work with cron/bash in Linux to clean up command output. I'd hoped someone might have a quick command or script to do something similar.


billymayday 01-09-2009 03:36 PM

Try this one

Depending on which of your distros you are working on, you may find it in your package manager (it's available for the RH derivatives, for example; see the rpmforge repo).

If you use, you will find much more targeted results than the general google.

deesto 01-09-2009 04:00 PM

Thanks billymayday. I downloaded and installed that as an RPM on my system (RHEL4), and it worked without error. However, the output was kind of strange when I sent it to a file:

-rw-r--r--  1 root    root      47006 Oct 18  2006 install.log
P^HPu^Hub^Hbl^Hli^His^Hsh^Hhe^Her^Hr:^H: TWikiAdminGroup
D^HDa^Hat^Hte^He:^H: 09 Jan 2009 - 12:01
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{D^HDi^Hir^Hr}^H}:^H: /var/www/twikihtml/
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{U^HUR^HRL^HL}^H}:^H: undef/publish/
W^HWe^Heb^Hb:^H: Admins
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr:^H: file
S^HSk^Hki^Hin^Hn:^H: anon
I^HIn^Hnc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: .*
E^HEx^Hxc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: WebSearch.*
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht F^HFi^Hil^Hlt^Hte^Her^Hr:^H:
G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr O^HOp^Hpt^Hti^Hio^Hon^Hns^Hs:^H:

It seems like it stripped out the HTML tags, but also took any words it found and split those multiple ways, and added funky characters of its own; thus the word 'Web' became 'W^HWe^Heb^Hb'.

After some playing, it seems to work fine when you add a few knobs to the command:

html2text -ascii -nobs -o [output.file] [input.file]
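For anyone hitting the same output: the ^H characters are backspaces. html2text renders bold and underlined text by overstriking (character, backspace, character), which a pager such as less displays as emphasis but which looks like noise in a file or an email; -nobs turns that rendering off. If output has already been saved with the overstrikes in it, they can be stripped afterwards. A minimal sketch, assuming a POSIX shell and sed; the function name strip_overstrike is made up for illustration:

```shell
#!/bin/sh
# strip_overstrike (hypothetical helper): remove "character + backspace"
# pairs from stdin, so "W^HWe^Heb^Hb" collapses to "Web".
strip_overstrike() {
    bs=$(printf '\b')      # a literal backspace character (^H)
    sed "s/.${bs}//g"      # drop every character that is followed by a backspace, plus the backspace
}

# Example (file names are placeholders):
#   strip_overstrike < output.with-backspaces > output.clean
```

This only cleans up after the fact; passing -nobs up front avoids the problem entirely.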
Thanks again.
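Since the original goal was to clean up the output inside the cron job itself, the same options should also work in a pipeline: html2text reads standard input when no input file is given, so -o and the file arguments can be dropped. A rough sketch, assuming html2text and mail are on cron's PATH; send_report, the script path, and the address are placeholders, not taken from the thread:

```shell
#!/bin/sh
# send_report (hypothetical wrapper): run a command that prints HTML,
# convert it to plain text without backspace overstrikes, and mail it.
#   $1 = command that writes HTML to stdout
#   $2 = recipient address
send_report() {
    report_cmd=$1
    recipient=$2
    "$report_cmd" | html2text -ascii -nobs | mail -s "My Cron Output" "$recipient"
}

# Example call (placeholder path and address):
#   send_report /path/to/report-script you@example.com
#
# Or directly as a one-liner in the crontab entry:
#   1 6,12,22 * * * apache /path/to/report-script | html2text -ascii -nobs | mail -s "My Cron Output" you@example.com
```

Keeping the pipeline in a small script rather than in the crontab line makes it easier to test by hand before cron runs it.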

billymayday 01-09-2009 04:10 PM

You're welcome

All times are GMT -5.