Insert a comment in html file based on its contents
I have multiple HTML files in a folder. Each one contains an <h2> tag like this:
Code:
<h2>some text</h2>
I want to write a shell script/batch file that adds this comment to the <head> section of each file:
Code:
<!-- TITLE= "same text from h2 tag" -->
For example, the tag may span two lines:
Code:
<h2>first part of text
second line of text</h2>
The line break shouldn't appear in <!-- TITLE= "same text from h2 tag" -->. The script has to capture the tag content and skip line breaks. Can anybody help me?
I would go this way:
1. Write a sed script:
<code>
#!/bin/sed -f
:loop
N
$!b loop
s?<h2>?¢?
s?</h2>?£?
s?^\(.*\)\(</[hH][eE][aA][dD]>[^¢]*\)¢\([^£]*\)£?\1¢\3£\2<h2>\3</h2>?
s?\(¢\)\([^\n£]*\)\n?\1\2 ?
s?\(¢\)\([^\n£]*\)\n?\1\2 ?
s?\(¢\)\([^\n£]*\)\n?\1\2 ?
s?¢?<!-- TITLE= "?
s?£?" -->?
</code>
The first three commands (after the shebang line) read the whole input file into the pattern space, so that '\n' (the newline character) can be treated as an ordinary character. I use the 'cent' and 'British pound' characters as delimiters (they are unlikely to appear in an HTML file) to mark what's between the '<h2>' and '</h2>' tags. Then I place a copy of that content just before the </head> tag, surrounded by 'cent' and 'pound', and replace the original 'cent' and 'pound' (the ones that appear after </head>) with <h2> and </h2>. The next three lines each replace one newline inside the copied text with a space, so up to three line breaks are handled. The last two lines replace 'cent' and 'pound' with '<!-- TITLE= "' and '" -->', respectively.
All you need to do is save the sed script, e.g. as foo.sed, make it executable (chmod +x foo.sed), then run:
<code>
./foo.sed your_html_file > output_html_file
</code>
Hope this helps.
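Since the question mentions several HTML files in one folder, the sed script above could be wrapped in a small loop. This is only a minimal sketch: the function name add_title_comments is hypothetical, and it assumes the files end in .html, that mktemp is available, and that overwriting the originals is acceptable (back them up first).

```shell
#!/bin/sh
# Sketch: run a sed script over every .html file in a folder, in place.
# add_title_comments is a hypothetical helper name, not from the original post.
# Usage: add_title_comments DIR SED_SCRIPT
add_title_comments() {
    dir=$1
    script=$2
    for f in "$dir"/*.html; do
        # If no file matches, the glob stays literal; skip it.
        [ -f "$f" ] || continue
        tmp=$(mktemp) || return 1
        if sed -f "$script" "$f" > "$tmp"; then
            # Write the result back into the original file,
            # which keeps the file's existing permissions.
            cat "$tmp" > "$f"
        else
            echo "sed failed on $f; file left unchanged" >&2
        fi
        rm -f "$tmp"
    done
}
```

With foo.sed saved in the current directory, something like `add_title_comments . ./foo.sed` would process every .html file there. Going through a temporary file avoids reading and writing the same file at once, and does not rely on an in-place option that not every sed provides.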
This problem reduces to parsing HTML, which is a non-trivial exercise if it is to be done well. If there is much uncertainty about the formatting of your HTML, it is probably worthwhile to use something like Perl and one of the existing HTML parser modules.
--- rod.