grep, sed, awk or tr - searching words in a string
I'm making a number of changes to HTML web pages. I've used Quanta's "find in files" option, but would like to have something fully automatic.
First problem is I need to get just the title of the page. For example, from the string <title>Download Page</title> I need to parse out just "Download Page" (without quotes). I've used tr '</>' ' ', which gets rid of the <, > and / characters, but how do I get rid of the string "title" while still keeping the other characters in the string? Thanks in advance.
Using sed you can keep part of the pattern. Just embed it in escaped parentheses and refer to it as \1, like in the following example:
Code:
echo "<title>Download Page</title>" | sed 's/<title>\(.*\)<\/title>/\1/'
I'd suggest using an existing HTML parser. There are plenty of them available for free, written in different languages. Just google for them to get the idea! :)
Edit: just thought of a simpler sed command that only removes the unwanted part:
Code:
echo "<title>Download Page</title>" | sed 's/<\/*title>//g'
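To run the capture approach over a whole file instead of an echoed string, something like the following might do. This is only a sketch: the function name extract_title is made up for illustration, and it assumes the opening and closing title tags sit on the same line.

```shell
# extract_title (hypothetical helper name): print the text between
# <title> and </title> for every line of the given file that has both tags.
# Assumes both tags are on the same line.
extract_title() {
    # -n suppresses sed's default output; the trailing p prints only the
    # lines where the substitution succeeded. The .* on either side
    # discards anything outside the tags.
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$1"
}
```

It would be called as extract_title index.html, where index.html stands for whichever page is being processed. A title split across several lines would not be matched, which is one more reason to consider a proper HTML parser.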
I prefer the first offering: pick out the data you want to keep. It is easy to extend so that it copes with extra data on the line, and even with the unlikely case of multiple <title>..</title> pairs.
The "simple" latter offering won't deal with extra data at all. Where regex is concerned I favour being as explicit as possible; it's way too easy for things to slip "under the radar".
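To illustrate the difference, here is a small comparison on a line that carries extra markup around the title. The sample line is invented, and the capture command has .* added on both sides, which is one way (an assumption, not the only one) to make it drop the surrounding text.

```shell
# Invented sample line with extra markup around the title.
line='<head><title>Download Page</title></head>'

# Capture approach: keep only what sits between the title tags.
keep=$(printf '%s\n' "$line" | sed 's/.*<title>\(.*\)<\/title>.*/\1/')

# Delete approach: strip the title tags and leave everything else alone.
strip=$(printf '%s\n' "$line" | sed 's/<\/*title>//g')

printf '%s\n' "$keep"     # Download Page
printf '%s\n' "$strip"    # <head>Download Page</head>
```

Because .* is greedy, a line holding more than one <title>..</title> pair would give back only the last title with this pattern, so that case still needs its own handling.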