LinuxQuestions.org

deesto 01-09-2009 11:28 AM

convert cron output to plain text
 
I have a cron job that runs a Perl script, which generates raw HTML output. The cron job then emails this output to me:
Code:

1 6,12,22 * * * apache /my_script.pl | mail my@email.com -s "My Cron Output"
However, when I receive the email, the body is raw HTML source (which is hard to read) rather than a well-formed MIME message or plain text. I've posted an example of the output here:
http://deesto.pastebin.com/fdea3fea

Is there an easy way (a one-liner in the crontab entry, or something I can add to a bash script) to either strip the HTML tags from the output, or somehow turn this output into valid HTML, so it doesn't look like garbage on the screen?

Disillusionist 01-09-2009 11:47 AM

Google html2txt

deesto 01-09-2009 02:17 PM

Thanks, but there seem to be several incarnations of "html2txt": one is an online script on the W3C site[1], one appears to be a Windows GUI tool[2], one is a Python script that runs only on existing URLs[3], etc. I need something to work with cron/bash in Linux to clean up command output. I'd hoped someone might have a quick command or script to do something similar.

[1] http://cgi.w3.org/cgi-bin/html2txt
[2] http://www.bobsoft.com/html2txt/
[3] http://www.aaronsw.com/2002/html2text/

billymayday 01-09-2009 02:36 PM

Try this one http://www.mbayer.de/html2text/

Depending on which of your distros you are working on, you may find it in your package manager (it's available for the RH derivatives, for example; see the rpmforge repo).

If you use www.google.com/linux, you will find much more targeted results than with the general Google search.

deesto 01-09-2009 03:00 PM

Thanks billymayday. I downloaded and installed that as an RPM on my system (RHEL4), and it worked without error. However, the output was kind of strange when I sent it to a file:
Code:

-rw-r--r--  1 root    root      47006 Oct 18  2006 install.log
P^HPu^Hub^Hbl^Hli^His^Hsh^Hhe^Her^Hr:^H: TWikiAdminGroup
D^HDa^Hat^Hte^He:^H: 09 Jan 2009 - 12:01
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{D^HDi^Hir^Hr}^H}:^H: /var/www/twikihtml/
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{U^HUR^HRL^HL}^H}:^H: undef/publish/
W^HWe^Heb^Hb:^H: Admins
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr:^H: file
S^HSk^Hki^Hin^Hn:^H: anon
I^HIn^Hnc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: .*
E^HEx^Hxc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: WebSearch.*
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht F^HFi^Hil^Hlt^Hte^Her^Hr:^H:
G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr O^HOp^Hpt^Hti^Hio^Hon^Hns^Hs:^H:
...

It seems to have stripped out the HTML tags, but it also doubled each character in some words and inserted ^H control characters between the copies; thus the word 'Web' became:
Code:

W^HWe^Heb^Hb:^H
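
For anyone who already has a file full of these sequences: the ^H is a backspace character, and a letter, a backspace, and the same letter again is the old typewriter-style way of marking bold on a terminal (underline is an underscore, a backspace, then the letter). A minimal sketch for cleaning such a file after the fact, assuming a POSIX shell and sed; the function name strip_overstrike and the file names in the usage comment are made up for illustration:

```shell
# Remove backspace overstrikes (X^HX for bold, _^HX for underline) from stdin.
strip_overstrike() {
    bs=$(printf '\b')      # a literal backspace character (^H)
    sed "s/.${bs}//g"      # drop every character that is followed by a backspace
}

# Hypothetical usage:
#   strip_overstrike < output.file > clean.file
```

Where it is installed, col -b should do much the same job.
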
After some playing, it seems to work fine when you add a few knobs to the command:
Code:

html2text -ascii -nobs -o [output.file] [input.file]
Thanks again.
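
To come back to the original question about a cron one-liner, the converter can probably go straight into the pipeline, since html2text reads standard input when no input file is named. A rough sketch, assuming the html2text from http://www.mbayer.de/html2text/ and a mail command that accepts -s; the helper name mail_html_as_text is hypothetical and none of this has been run against the real job:

```shell
# Hypothetical helper: read HTML on stdin, convert it to plain ASCII text
# without backspace overstrikes, and mail the result.
#   $1 = subject, $2 = recipient
mail_html_as_text() {
    html2text -ascii -nobs | mail -s "$1" "$2"
}

# Usage inside a wrapper script that cron calls:
#   /my_script.pl | mail_html_as_text "My Cron Output" my@email.com
#
# Or directly as the crontab entry, without the helper:
#   1 6,12,22 * * * apache /my_script.pl | html2text -ascii -nobs | mail -s "My Cron Output" my@email.com
```

Putting the conversion in a small wrapper script keeps the crontab line short, but the direct pipeline should behave the same way.
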

billymayday 01-09-2009 03:10 PM

You're welcome


All times are GMT -5.