LinuxQuestions.org
Linux - General
01-09-2009, 12:28 PM #1
deesto
Member
Registered: May 2002
Location: NY, USA
Distribution: FreeBSD, Fedora, RHEL, OS X, Win; have played with Slackware, Mandrake, SuSE, Ubuntu, Xandros
Posts: 370
Thanked: 0
Convert cron output to plain text


I have a cron job that runs a Perl script, which generates its output as raw HTML. The job emails this output to me:
Code:
1 6,12,22 * * * apache /my_script.pl | mail my@email.com -s "My Cron Output"
However, when I receive the email, it shows the raw HTML source (which is hard to read) instead of either a properly formed MIME message or plain text. I've posted an example of the output here:
http://deesto.pastebin.com/fdea3fea

Is there an easy way (a one-liner in the cron entry, or something I can add to a bash script) to either strip the HTML tags from the output or turn it into a valid HTML message, so it doesn't look like garbage on screen?
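For the quick one-liner asked about here, a crude tag stripper can be built from sed alone. The sketch below is a hypothetical shell function (`strip_tags` is not a standard command): it deletes anything between `<` and `>` on a single line and then decodes a few common entities. It is not an HTML parser; tags split across lines, the contents of `<script>`/`<style>` blocks, table layout, and most entities are not handled, which is why a dedicated converter usually gives more readable results.

```shell
#!/bin/sh
# strip_tags: hypothetical helper; reads HTML on stdin, writes rough plain text.
# Crude sketch only: removes <...> tags that open and close on the same line,
# then decodes a handful of common entities. Not a real HTML parser.
strip_tags() {
    # Tags are removed first, so decoded &lt; / &gt; are not mistaken for tags.
    # &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g'
}
```

Inside the cron entry itself, the core expression can be used inline, e.g. `/my_script.pl | sed -e 's/<[^>]*>//g' | mail my@email.com -s "My Cron Output"`.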
01-09-2009, 12:47 PM #2
Disillusionist
Member
Registered: Aug 2004
Location: England
Distribution: Ubuntu
Posts: 825
Thanked: 48
Google html2txt
01-09-2009, 03:17 PM #3
deesto
Member
Registered: May 2002
Location: NY, USA
Distribution: FreeBSD, Fedora, RHEL, OS X, Win; have played with Slackware, Mandrake, SuSE, Ubuntu, Xandros
Posts: 370
Thanked: 0

Original Poster
Thanks, but there seem to be several incarnations of "html2txt": one is an online script on the W3C site[1], one appears to be a Windows GUI tool[2], one is a Python script that runs only on existing URLs[3], etc. I need something that works with cron/bash on Linux to clean up command output. I'd hoped someone might have a quick command or script for this.

[1] http://cgi.w3.org/cgi-bin/html2txt
[2] http://www.bobsoft.com/html2txt/
[3] http://www.aaronsw.com/2002/html2text/
01-09-2009, 03:36 PM #4
billymayday
Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Thanked: 126
Try this one http://www.mbayer.de/html2text/

Depending on which of your distros you're working on, you may find it in your package manager (for Red Hat derivatives, for example, it's available from the RPMforge repository).

If you use www.google.com/linux, you will get much more targeted results than with general Google search.
01-09-2009, 04:00 PM #5
deesto
Member
Registered: May 2002
Location: NY, USA
Distribution: FreeBSD, Fedora, RHEL, OS X, Win; have played with Slackware, Mandrake, SuSE, Ubuntu, Xandros
Posts: 370
Thanked: 0

Original Poster
Thanks, billymayday. I downloaded and installed it as an RPM on my system (RHEL4), and it ran without errors. However, the output looked strange when I sent it to a file:
Code:
-rw-r--r--  1 root     root      47006 Oct 18  2006 install.log
P^HPu^Hub^Hbl^Hli^His^Hsh^Hhe^Her^Hr:^H: TWikiAdminGroup
D^HDa^Hat^Hte^He:^H: 09 Jan 2009 - 12:01
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{D^HDi^Hir^Hr}^H}:^H: /var/www/twikihtml/
{^H{P^HPu^Hub^Hbl^Hli^His^Hsh^HhC^HCo^Hon^Hnt^Htr^Hri^Hib^Hb}^H}{^H{U^HUR^HRL^HL}^H}:^H: undef/publish/
W^HWe^Heb^Hb:^H: Admins
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr:^H: file
S^HSk^Hki^Hin^Hn:^H: anon
I^HIn^Hnc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: .*
E^HEx^Hxc^Hcl^Hlu^Hus^Hsi^Hio^Hon^Hns^Hs:^H: WebSearch.*
C^HCo^Hon^Hnt^Hte^Hen^Hnt^Ht F^HFi^Hil^Hlt^Hte^Her^Hr:^H:
G^HGe^Hen^Hne^Her^Hra^Hat^Hto^Hor^Hr O^HOp^Hpt^Hti^Hio^Hon^Hns^Hs:^H:
...
It seems to have stripped out the HTML tags, but it also doubled the letters of many words and inserted odd ^H characters between them; for example, the word 'Web' became:
Code:
W^HWe^Heb^Hb:^H
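The ^H sequences are backspace characters (Ctrl-H). By default this html2text renders bold and underlined text the way nroff does: a character, a backspace, then the same character again for bold (or `_`, backspace, character for underline). Pagers such as `less` display that as formatting, but a mail client or text editor shows the raw bytes; html2text's `-nobs` option disables it. If you already have output containing these sequences, the hypothetical helper below (`strip_overstrike` is not a standard command) removes them by deleting every "character followed by backspace" pair, keeping the character printed over it. Where it is installed, the standard `col -b` filter does much the same job.

```shell
#!/bin/sh
# strip_overstrike: hypothetical helper; reads text on stdin and removes
# nroff-style overstriking (X<BS>X for bold, _<BS>X for underline).
strip_overstrike() {
    # Capture a literal backspace (0x08) so the regex works with both
    # GNU and BSD sed, which disagree on escapes such as \x08.
    bs=$(printf '\b')
    # Delete each "any character + backspace" pair; the character that
    # overstruck it is what remains.
    sed "s/.${bs}//g"
}
```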
After some experimenting, I found that it works fine with a couple of extra options:
Code:
html2text -ascii -nobs -o [output.file] [input.file]
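Putting this back into the original cron job, html2text can sit in the pipeline between the script and `mail`. A sketch of the crontab entry, assuming the same schedule, user, and addresses as the original, and relying on html2text reading standard input when no input file is given (as the mbayer.de version does). If html2text is not in cron's PATH, give its full path instead:

```
1 6,12,22 * * * apache /my_script.pl | html2text -ascii -nobs | mail my@email.com -s "My Cron Output"
```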
Thanks again.
01-09-2009, 04:10 PM #6
billymayday
Guru
Registered: Mar 2006
Location: Sydney, Australia
Distribution: Fedora, CentOS, OpenSuse, Slack, Gentoo, Debian, Arch, PCBSD
Posts: 6,678
Thanked: 126
You're welcome
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
plain old text editor | autophil | Linux - General | 9 | 08-12-2007 09:46 PM
CMS for plain text | rblampain | Linux - Software | 3 | 12-14-2005 11:40 PM
not a plain text file | wazza4610 | Linux - Newbie | 1 | 11-22-2005 05:20 AM
convert html emails to plain text emails | andredude | Linux - General | 6 | 03-20-2005 01:33 PM
Printing from lpr to plain text | DoubleLetter | Linux - General | 2 | 07-20-2002 12:25 AM


All times are GMT -5.
