LinuxQuestions.org
Old 11-08-2004, 02:35 AM   #1
hq4ever
Member
 
Registered: May 2004
Location: Israel
Distribution: Debian
Posts: 98

Rep: Reputation: 15
Bash script for correcting HTML tags


Hello to everyone,

I have an HTML file that looks like this:

Quote:
<HTML
><HEAD
><TITLE
>Linux Security HOWTO</TITLE
><META
NAME="GENERATOR"
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"></HEAD
><BODY
CLASS="article"
BGCOLOR="#FFFFFF"
TEXT="#000000"
LINK="#0000FF"
VLINK="#840084"
ALINK="#0000FF"
><DIV
CLASS="ARTICLE"
>
Clearly this is not an easily human-readable format, though the browser displays it fine.

The desired output is something like:
Quote:
<HTML>
<HEAD>
<TITLE>Linux Security HOWTO</TITLE>
<META NAME="GENERATOR" CONTENT="Modular DocBook HTML Stylesheet Version 1.7"></HEAD>
....
Since I have little experience with bash, I would like to know:
1. Is there any way I can put "if" logic into the script? For example: if a line starts with ">" or the tag starts with "</", keep it on the current line; otherwise start a new line.
2. Can bash work with strings and write them back to a file?
3. Looping: can bash detect EOF and stop when it gets there?

Thank you, maxim.
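
For reference, the three questions above can all be answered "yes". Below is a minimal sketch in plain bash, assuming the input always looks like the sample (each tag's closing ">" sits at the start of the following line). The function name join_tags is made up for illustration, and this is not a general HTML parser.

```shell
#!/usr/bin/env bash
# join_tags: hypothetical helper, reads broken HTML on stdin, writes fixed text to stdout.
join_tags() {
  local line rest out=""
  # Question 3: 'read' fails at EOF, which ends the loop.
  # The '|| [[ -n $line ]]' part keeps a last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Question 1: "if" logic on the start of each line.
    if [[ $line == ">"* ]]; then
      # A leading '>' closes the tag left open on the previous line.
      out+=">"
      rest=${line:1}
      # A new opening tag goes on its own line; text and '</...' stay put.
      if [[ $rest == "<"* && $rest != "</"* ]]; then
        out+=$'\n'
      fi
      out+=$rest
    elif [[ -z $out ]]; then
      # Very first line: nothing to join with yet.
      out=$line
    else
      # Attribute line: join it to the current tag with a single space.
      out+=" $line"
    fi
  done
  # Question 2: the result is an ordinary string; the caller redirects it to a file.
  printf '%s\n' "$out"
}
```

It could be used as `join_tags < your_file.html > fixed.html`. Write to a different file than the one being read, because `>` truncates its target before the loop starts.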
 
Old 11-08-2004, 03:17 AM   #2
ror
Member
 
Registered: May 2004
Distribution: Ubuntu
Posts: 583

Rep: Reputation: 33
Combine cat, sed, | and > and bash turns into an HTML-writing beast.
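
A rough sketch of what such a pipeline might look like for the sample in the first post. It assumes GNU sed (the -z option and "\n" in the replacement are GNU extensions), the name fix_tags_sed is made up for illustration, and it only handles input shaped like that sample.

```shell
#!/usr/bin/env bash
# fix_tags_sed: hypothetical helper; assumes GNU sed.
# -z makes sed treat the whole input as one record, so newlines can be matched.
fix_tags_sed() {
  # 1. drop the newline in front of a '>' that closes a tag
  # 2. turn the remaining newlines (between attributes) into spaces
  # 3. start a new line before every opening tag, but not before '</...'
  # 4. replace the trailing space left by step 2 with a final newline
  sed -z \
    -e 's/\n>/>/g' \
    -e 's/\n/ /g' \
    -e 's/><\([^/]\)/>\n<\1/g' \
    -e 's/ *$/\n/'
}
```

Following the suggestion literally, it would be run as `cat your_file.html | fix_tags_sed > fixed.html`.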
 
Old 11-08-2004, 03:38 AM   #3
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 61
Wouldn't this work?
Code:
xmllint --format --recover your_file.html
Yves.
 
Old 11-08-2004, 03:40 AM   #4
ror
Member
 
Registered: May 2004
Distribution: Ubuntu
Posts: 583

Rep: Reputation: 33
That's a nice little program.
 
Old 11-08-2004, 04:06 AM   #5
hq4ever
Member
 
Registered: May 2004
Location: Israel
Distribution: Debian
Posts: 98

Original Poster
Rep: Reputation: 15
Quote:
Originally posted by theYinYeti
Wouldn't this work?
Code:
xmllint --format --recover your_file.html
Yves.
It does, Thanks
 
  


All times are GMT -5.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration