LinuxQuestions.org
03-01-2014, 11:10 AM   #1
kosovafan
LQ Newbie
 
Registered: Mar 2014
Distribution: Gentoo
Posts: 1

Rep: Reputation: Disabled
sed on html pages


Hello,

I use a static site generator for my blog and then run tidy to clean up the HTML. Tidy gets some things wrong, though, so I am trying to fix them with sed. I found a way to delete empty lines and to delete div tags, but I cannot find a way to break the line after the title tag and insert whitespace.

Code:
Delete Lines:
sed -i '/^$/d' index.html
Code:
Delete div tags:
sed -i 's|<[/]\?div[^>]*>||g' index.html
Code:
This is the part I cannot work out. Before:
  <title>Silvio Siefke | Blog</title><!--[if lt IE 9]>
      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js" type="text/javascript"></script>
    <![endif]-->  
  <link href="/static/css/style.css" rel="stylesheet" type="text/css">

After (what I want):
  <title>Silvio Siefke | Blog</title>

    <!--[if lt IE 9]>
      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js" type="text/javascript"></script>
    <![endif]-->  

  <link href="/static/css/style.css" rel="stylesheet" type="text/css">
Can someone help me? Can I combine all the steps, so that one run first deletes empty lines, then deletes the div tags, then deletes id="ext" and id="vid", and finally does what I described above?

I hope this is understandable, because my English is not very good.


Thank you for your help, and have a nice day.
Silvio
 
03-03-2014, 11:16 AM   #2
TenTenths
Senior Member
 
Registered: Aug 2011
Location: Dublin
Distribution: Centos 5 / 6 / 7
Posts: 3,474

Rep: Reputation: 1553
Code:
sed -i 's/<\/title>/<\/title>\n\n    /g' index.html
Will replace the </title> tag with </title>, two newlines (\n\n) and four spaces. That will achieve what you ask for in your "before/after" example.
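
To check the substitution above without touching a real file, it can be run as a filter on a single sample line. This is only a sketch: it assumes GNU sed (which accepts \n in the replacement text), and break_after_title is a made-up helper name, not something from this thread. It uses | as the delimiter so the slash in </title> needs no escaping.

```shell
#!/bin/bash
# break_after_title is a hypothetical helper name used only for this demo.
# It applies the same substitution as above, but reads stdin and writes
# stdout instead of editing a file in place (no -i).
break_after_title() {
    sed 's|</title>|</title>\n\n    |g'
}

# Sample line taken from the "before" snippet in post #1.
printf '%s\n' '  <title>Silvio Siefke | Blog</title><!--[if lt IE 9]>' \
    | break_after_title
```

The output should be the title on its own line, an empty line, and then the conditional comment indented by four spaces.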

You could write a small Bash script that runs all your steps in turn on a file of your choice.

tidy.sh
Code:
#!/bin/bash
# Usage: tidy.sh FILE

# Delete div tags:
sed -i 's|<[/]\?div[^>]*>||g' "$1"

# Delete empty lines:
sed -i '/^$/d' "$1"

# "Fix" title: put an empty line and four spaces after </title>
sed -i 's/<\/title>/<\/title>\n\n    /g' "$1"
Then call it with /path/to/tidy.sh index.html
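
The steps in the script above can also be combined into a single sed call with several -e expressions, which is one possible answer to the "can I combine all steps" part of post #1. The sketch below assumes GNU sed. The clean_html name is made up for illustration, and the id="ext" / id="vid" expression is a guess at what was meant: it removes only the attribute itself and leaves the element in place.

```shell
#!/bin/bash
# clean_html is a hypothetical helper name. It reads the files given as
# arguments (or stdin if there are none) and writes the result to stdout.
clean_html() {
    # 1. delete opening and closing div tags
    # 2. delete the id="ext" and id="vid" attributes (assumed meaning)
    # 3. delete lines that are empty at this point
    # 4. put an empty line and four spaces after </title>
    sed -e 's|<[/]\?div[^>]*>||g' \
        -e 's/ id="\(ext\|vid\)"//g' \
        -e '/^$/d' \
        -e 's|</title>|</title>\n\n    |g' \
        "$@"
}

# Example: clean_html index.html > index.clean.html
```

The empty-line test runs before the title substitution, so the empty line inserted after </title> survives. A line that held nothing but a div tag is empty after step 1 and is removed in the same pass.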
 