LinuxQuestions.org
01-09-2009, 11:28 AM   #1
deesto
Member
 
Registered: May 2002
Location: NY, USA
Distribution: FreeBSD, Fedora, RHEL, Ubuntu; OS X, Win; have used Slackware, Mandrake, SuSE, Xandros
Posts: 448

Rep: Reputation: 31
convert cron output to plain text


I have a cron job that runs a Perl script, which generates its output as raw HTML. Cron pipes this output to mail, which sends it to me:
Code:
1 6,12,22 * * * apache /my_script.pl | mail my@email.com -s "My Cron Output"
However, the email arrives as raw HTML source, which is hard to read, rather than as a well-formed MIME message or plain text. I've posted an example of the output here:
http://deesto.pastebin.com/fdea3fea

Is there an easy way (a one-liner in the crontab, or something I can add to a bash script) to either strip the HTML tags from the output or turn it into a valid HTML message, so it doesn't look like garbage on the screen?
 
01-09-2009, 11:47 AM   #2
Disillusionist
Senior Member
 
Registered: Aug 2004
Location: England
Distribution: Ubuntu
Posts: 1,013

Rep: Reputation: 83
Google html2txt
 
01-09-2009, 02:17 PM   #3
deesto
Original Poster
Thanks, but there seem to be several incarnations of "html2txt": one is an online script on the W3C site[1], one appears to be a Windows GUI tool[2], one is a Python script that runs only on existing URLs[3], etc. I need something to work with cron/bash in Linux to clean up command output. I'd hoped someone might have a quick command or script to do something similar.

[1] http://cgi.w3.org/cgi-bin/html2txt
[2] http://www.bobsoft.com/html2txt/
[3] http://www.aaronsw.com/2002/html2text/
 
01-09-2009, 02:36 PM   #4
billymayday
Guru
 
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678

Rep: Reputation: 122
Try this one http://www.mbayer.de/html2text/

Depending on which of your distros you are working on, you may find it in your package manager (for the RH derivatives, for example, it's in the rpmforge repo).

If you use www.google.com/linux, you will find much more targeted results than with a general Google search.
 
01-09-2009, 03:00 PM   #5
deesto
Original Poster
Thanks billymayday. I downloaded and installed that as an RPM on my system (RHEL4), and it worked without error. However, the output was kind of strange when I sent it to a file:
Code:
-rw-r--r--  1 root     root      47006 Oct 18  2006 install.log
P^HPu^Hub^Hbl^Hli^His^Hsh^Hhe^Her^Hr:^H: TWikiAdminGroup
D^HDa^Hat^Hte^He:^H: 09 Jan 2009 - 12:01
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{D^HDi^Hir^Hr}^H}:^H: /var/www/twikihtml/
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{U^HUR^HRL^HL}^H}:^H: undef/publish/
W^HWe^Heb^Hb:^H: Admins
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr:^H: file
S^HSk^Hki^Hin^Hn:^H: anon
I^HIn^Hnc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: .*
E^HEx^Hxc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: WebSearch.*
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht F^HFi^Hil^Hlt^Hte^Her^Hr:^H:
G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr O^HOp^Hpt^Hti^Hio^Hon^Hns^Hs:^H:
...
It stripped out the HTML tags, but it also doubled the letters of some words and put ^H control characters between them; thus the word 'Web' became:
Code:
W^HWe^Heb^Hb
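For anyone who hits the same thing: the ^H characters are backspaces, and the pattern "character, backspace, same character" is the overstrike convention that pagers such as less display as bold. Below is a minimal sketch of a post-filter that drops every "character + backspace" pair, assuming plain ASCII output. The name strip_overstrike is made up for this example and is not part of html2text; the standard col -b utility does a similar job.

```shell
# Hypothetical helper: remove backspace overstrikes from standard input.
# Each "X^H" pair (any character followed by a backspace) is deleted,
# so "W^HW" collapses to "W" and an underline pair "_^HW" also becomes "W".
strip_overstrike() {
  bs=$(printf '\b')        # a literal backspace character
  sed "s/.$bs//g"
}

# Example, using the garbled line from the output above:
printf 'W\bWe\beb\bb:\b: Admins\n' | strip_overstrike
# prints: Web: Admins
```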
After some experimenting, it works fine with a couple of extra options on the command line:
Code:
html2text -ascii -nobs -o [output.file] [input.file]
Thanks again.
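To tie this back to the original crontab entry, the same options can go into the pipeline in front of mail. This is a rough sketch, not something tested on RHEL4: it assumes html2text reads standard input when no input file is named, and that mail takes -s before the recipient. The name mail_as_text is made up for this example.

```shell
# Hypothetical wrapper: run a command, convert its HTML output to plain
# text, and mail the result.
# Usage: mail_as_text RECIPIENT SUBJECT COMMAND [ARGS...]
mail_as_text() {
  recipient=$1
  subject=$2
  shift 2
  # -ascii and -nobs are the two options that fixed the output above;
  # html2text is assumed to read stdin because no input file is given.
  "$@" | html2text -ascii -nobs | mail -s "$subject" "$recipient"
}

# Possible call from a script that cron runs:
#   mail_as_text my@email.com "My Cron Output" /my_script.pl
```

The pipeline could equally be written inline in the crontab, replacing the original command with /my_script.pl | html2text -ascii -nobs | mail -s "My Cron Output" my@email.com.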
 
01-09-2009, 03:10 PM   #6
billymayday
You're welcome
 
  



Tags
cron, html, output, perl

