03-06-2009, 05:20 PM #1
Registered: Mar 2001
Location: UK
Distribution: Mint, Arch, Debian7
Posts: 189

Rep: Reputation: 23
grep, sed, awk or tr - searching words in a string

I'm making a number of changes to HTML web pages. I've used Quanta's "Find in Files" option, but would like to have something fully automatic.

The first problem is that I need to get just the title of the page.
For example, from the string:
<title>Download Page</title>

I need to parse the string so it just returns
"Download Page" (without quotes).

I've used
tr '</>' ' ', which turns the <, / and > characters into spaces, but how do I get rid of the string "title" while keeping the other characters in the string?
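The reason tr gets stuck is that it works on individual characters, not on strings. A quick demonstration (plain POSIX tr, nothing else assumed; three spaces are given so each of the three characters has an explicit replacement):

```shell
#!/bin/sh
line='<title>Download Page</title>'

# tr maps each character of set 1 to the matching character of set 2,
# so '<', '/' and '>' all become spaces; the word "title" survives.
echo "$line" | tr '</>' '   '
# prints " title Download Page  title "

# tr -d deletes every listed character wherever it occurs, so asking it
# to delete "title" also strips t, i, l and e from the real title text.
echo "$line" | tr -d '</>title'
# prints "Downoad Pag"
```

Removing a whole word needs a tool that matches strings, such as sed.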

Thanks in advance
03-06-2009, 05:35 PM #2
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1978
Using sed you can keep part of the matched pattern: wrap it in escaped parentheses and refer back to it as \1 in the replacement, as in the following example:
echo "<title>Download Page</title>" | sed 's/<title>\(.*\)<\/title>/\1/'
You have to choose the regular expression carefully to get a unique result. For the title it should be easy, but what if you have multiple HTML tags on the same line?
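One way to make the capture-group approach cope with other tags on the same line is to let .* absorb everything around the title and use sed -n with the p flag so only lines that actually contain a title are printed. This is a sketch, assuming the title sits on a single line and contains no '<'; extract_title is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# extract_title: hypothetical helper, not a standard command.
# Prints the text between <title> and </title> for every line that has one.
# Assumptions: the title is on a single line and contains no '<' character.
extract_title() {
    # -n     : print nothing unless explicitly asked to
    # .*     : swallow any markup before and after the title on that line
    # [^<]*  : capture only up to the next '<', i.e. up to </title>
    # p flag : print the line only when the substitution succeeded
    sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$@"
}

# Reads standard input when no file names are given.
echo '<head><title>Download Page</title><meta charset="utf-8"></head>' | extract_title
# prints: Download Page

# Fully automatic run over every .html file in the current directory.
for f in *.html; do
    [ -e "$f" ] || continue   # no .html files: the glob stays unexpanded
    printf '%s: %s\n' "$f" "$(extract_title "$f")"
done
```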

I'd suggest using an existing HTML parser instead. There are plenty of them available for free, written in different languages. Just search for them to get an idea!
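As one example of the parser route from the shell, libxml2's xmllint tool can read a page in HTML mode and evaluate an XPath expression, so line breaks and messy markup around the title stop mattering. This sketch assumes xmllint is installed and supports the --xpath option; page_title is a hypothetical wrapper name.

```shell
#!/bin/sh
# page_title: hypothetical wrapper around xmllint (part of libxml2).
# Assumes an xmllint build that supports --xpath.
page_title() {
    # --html          : use libxml2's forgiving HTML parser
    # string(//title) : text content of the first <title> element
    # 2>/dev/null     : hide the parser's warnings about sloppy markup
    xmllint --html --xpath 'string(//title)' "$1" 2>/dev/null
}

demo=$(mktemp)
printf '<html><head>\n<title>Download Page</title>\n</head><body></body></html>\n' > "$demo"
page_title "$demo"
rm -f "$demo"
```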

Edit: I just thought of a simpler sed command that only removes the unwanted part:
echo "<title>Download Page</title>" | sed 's/<\/*title>//g'
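In this second command \/* means "zero or more slashes", so a single pattern matches both <title> and </title>, and the g flag is needed because there are two matches on the line. A quick check with plain sed (strip_title_tags is just a hypothetical name for the one-liner) also shows what happens when the line carries other markup:

```shell
#!/bin/sh
# strip_title_tags: hypothetical name for the one-liner above.
# '\/*' matches zero or more '/', so the pattern hits <title> and </title>;
# 'g' removes every match on the line, not just the first one.
strip_title_tags() {
    sed 's/<\/*title>//g'
}

echo '<title>Download Page</title>' | strip_title_tags
# prints: Download Page

# Only the title tags are removed; any other markup on the line is kept.
echo '<head><title>Download Page</title></head>' | strip_title_tags
# prints: <head>Download Page</head>

# Without 'g' only the first match (the opening tag) is removed.
echo '<title>Download Page</title>' | sed 's/<\/*title>//'
# prints: Download Page</title>
```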

Last edited by colucix; 03-06-2009 at 05:52 PM.
03-06-2009, 08:04 PM #3
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 17,106

Rep: Reputation: 2615
I prefer the first offering: pick out the data you want to keep. It's easy to extend it to handle extra data on the line, even the unlikely case of multiple <title>..</title> pairs.
The "simple" latter offering won't deal with extra data at all: any other markup on the same line ends up in the output.
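For the unlikely case of several <title>..</title> pairs on one line, a sketch is to let grep -o emit each complete pair on its own line and only then strip the tags with sed. It assumes a grep that supports -o (GNU and BSD grep do; it is not in POSIX) and titles that contain no '<'; all_titles is a hypothetical helper name.

```shell
#!/bin/sh
# all_titles: hypothetical helper. Prints the text of every
# <title>..</title> pair, one per line, even when pairs share a line.
# Assumptions: grep supports -o (GNU/BSD, not POSIX); titles contain no '<'.
all_titles() {
    # grep -o prints each non-overlapping match on its own line;
    # [^<]* stops a match from running on into the next pair.
    # With several files, grep prefixes each line with the file name.
    grep -o '<title>[^<]*</title>' "$@" | sed 's/<\/*title>//g'
}

echo 'x <title>First</title> y <title>Second</title> z' | all_titles
# prints:
# First
# Second
```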

Where regex is concerned I favour being as explicit as possible - it's way too easy for things to slip "under the radar".


Similar Threads
Thread Thread Starter Forum Replies Last Post
bash - awk, sed, grep, ... advice schneidz Programming 13 08-25-2008 09:30 AM
Sed, Awk, grep,Search,delete joyds219 Linux - Newbie 6 04-03-2008 06:15 AM
awk/sed to grep the text ahpin Linux - Software 3 10-17-2007 12:34 AM
Need to strip words from front of line. sed/awk/grep? joadoor Linux - Software 6 08-28-2006 04:39 AM
diffrence between grep, sed, awk and egrep Fond_of_Opensource Linux - Newbie 3 08-18-2006 08:15 AM

All times are GMT -5.