grep, sed, awk or tr - searching words in a string
I'm making a number of changes to HTML web pages. I've used Quanta's "find in files" option, but would like to have something fully automatic.
The first problem is that I need to get just the title of the page.
For example, from the string:-
<title>Download Page</title>
I need to parse the string so it just returns
"Download Page" (without quotes).
I've tried tr '</>' ' ', which replaces the <, > and / characters with spaces, but how do I get rid of the string "title" while still keeping the other characters in the string?
Thanks in advance
Using sed you can keep part of the matched pattern. Just enclose it in escaped parentheses and refer to it as \1 in the replacement.
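The command originally posted here was not preserved, so the following is only a sketch of the technique described, not the poster's exact command. It assumes the title sits on a single line and uses a hypothetical sample line; extract_title is a name made up for illustration.

```shell
# Hypothetical input line; the thread's original sample was not preserved.
line='<title>Download Page</title>'

# Keep only the text captured between the title tags.
# \( ... \) marks the part to keep and \1 refers back to it.
# -n together with the p flag prints only lines where the substitution matched.
extract_title() {
  sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p'
}

printf '%s\n' "$line" | extract_title
```

With the sample line above this prints Download Page. Lines that contain no title element print nothing, because of -n.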
I'd suggest using an existing HTML parser. There are plenty of them available for free, written in different languages. Just google for them to get the idea! :)
Edit: I just thought of a simpler sed command that only removes the unwanted part.
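The simpler command itself was also lost from the thread. A minimal sketch of the idea, assuming the goal is just to delete the opening and closing title tags and leave everything else alone (strip_title_tags is a hypothetical name):

```shell
# Hypothetical input line; the thread's original sample was not preserved.
line='<title>Download Page</title>'

# Delete <title> and </title> wherever they occur on the line.
# \/* matches zero or more slashes, so one pattern covers both tags.
strip_title_tags() {
  sed 's/<\/*title>//g'
}

printf '%s\n' "$line" | strip_title_tags
```

For the sample line this also prints Download Page, but unlike the capture approach it passes every other line through unchanged.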
I prefer the first offering: pick the data you want to keep. It's easy to make it handle the potential for extra data on the record, even the unlikely case of multiple <title>..</title> pairs.
The "simple" latter offering won't deal with extra data at all.
Where regex is concerned I favour being as explicit as possible - it's way too easy for things to slip "under the radar".
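To make the difference concrete, here is a small comparison of the two approaches on a line that carries extra markup. The input line and both sed commands are assumptions written for illustration, since the commands posted in the thread were not preserved.

```shell
# Hypothetical line with markup around the title element.
line='<head><title>Download Page</title></head>'

# Explicit capture: everything outside the title element is discarded.
kept=$(printf '%s\n' "$line" | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p')

# Tag removal: only the title tags go, the surrounding markup stays.
stripped=$(printf '%s\n' "$line" | sed 's/<\/*title>//g')

printf 'capture: %s\n' "$kept"
printf 'removal: %s\n' "$stripped"
```

The capture version yields Download Page, while the removal version leaves <head>Download Page</head> behind.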