LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

03-01-2005, 07:22 PM   #1
thanhVic
LQ Newbie
 
Registered: Jan 2005
Location: Canada
Distribution: Fedora3
Posts: 26

Rep: Reputation: 15
a sed problem


Hi there,

I have a question like this,

I have a file named "text.html":

<BODY> This is a HTML page </BODY>

How can I use "sed" to print out:

<BODY>
This is a HTML page
</BODY>


I tried several methods, but they didn't work. Can you help me out? Thank you.

Last edited by thanhVic; 03-01-2005 at 07:24 PM.
 
03-01-2005, 07:46 PM   #2
ortho-orange#42
LQ Newbie
 
Registered: Feb 2005
Distribution: Slackware
Posts: 16

Rep: Reputation: 2
Code:
sed -e 's/\([<A-Za-z>]*\)\([A-Za-z ]*\)\([<\/A-Za-z>]*\)/\1\n\2\n\3/' text.html
This seems to work for me on this particular example. I'm a regexp newbie myself, so you may want to wait for more experienced responses. (Also, to the gurus: I'd be interested to hear about any problems with what I posted.)
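To check the command above, here is a minimal sketch that wraps it in a shell function (the name `split_body` is hypothetical) and feeds it the sample line with `printf` instead of reading text.html. It assumes GNU sed, which turns `\n` in the replacement into a newline, and it writes the `/` in the third bracket expression as `\/`, because GNU sed ends the regex at the first unescaped `/`, even inside brackets.

```shell
#!/bin/sh
# Sketch of ortho-orange#42's command; assumes GNU sed (\n in the replacement).
# split_body is a hypothetical helper name used only for this example.
split_body() {
    # Group 1: the opening tag, group 2: the text, group 3: the closing tag.
    # The '/' inside the third bracket expression is escaped as '\/'.
    sed -e 's/\([<A-Za-z>]*\)\([A-Za-z ]*\)\([<\/A-Za-z>]*\)/\1\n\2\n\3/'
}

# Same one-line input as text.html in the question.
printf '%s\n' '<BODY> This is a HTML page </BODY>' | split_body
```

Because `[A-Za-z ]*` also matches spaces, the middle line keeps the spaces that surrounded the text in the input: it comes out as ` This is a HTML page `, with one leading and one trailing space.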

HTH
 
03-01-2005, 08:03 PM   #3
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 367
Here's a command geared specifically to your example:
Code:
sed -e 's@<BODY> @<BODY>\n@' -e 's@ </BODY>@\n</BODY>@' text.html
To expand on what ortho-orange#42 gave:
Code:
sed 's@\(<.\+>\)\(.*\)\(</.\+>\)@\1\n\2\n\3@' text.html
Aside from being extraordinarily cryptic, it works reasonably well as long as you don't have nested tags. It requires at least one character inside the opening tag (between '<' and '>'), allows the text between the opening and closing tags to be empty, and requires at least one character inside the closing tag (between '</' and '>'). It is only slightly more flexible than ortho's: this one allows any character inside the tags (uppercase, lowercase, digits, punctuation, spaces, etc.). Whether that is a good thing depends entirely on the data you intend to use.
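Both of Dark_Helmet's commands can be compared side by side in a sketch like the following. The function names `split_targeted` and `split_generic` are hypothetical, the input comes from `printf` instead of text.html, and GNU sed is assumed because `\+` in a basic regex and `\n` in the replacement are GNU extensions.

```shell
#!/bin/sh
# Hypothetical wrappers around Dark_Helmet's two commands (GNU sed assumed).

# Only knows about <BODY>: moves the text after "<BODY> " and before " </BODY>"
# onto its own line, consuming the surrounding spaces.
split_targeted() {
    sed -e 's@<BODY> @<BODY>\n@' -e 's@ </BODY>@\n</BODY>@'
}

# Accepts any opening/closing tag pair on a single line (no nesting).
split_generic() {
    sed 's@\(<.\+>\)\(.*\)\(</.\+>\)@\1\n\2\n\3@'
}

printf '%s\n' '<BODY> This is a HTML page </BODY>' | split_targeted
printf '%s\n' '<TITLE>My page</TITLE>' | split_generic
```

The targeted version prints exactly the three lines asked for in the question but leaves any other tag untouched; the generic version splits `<TITLE>My page</TITLE>` as well, though on the question's sample it keeps the spaces around the text, just like ortho's command.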
 
03-01-2005, 08:51 PM   #4
ortho-orange#42
LQ Newbie
 
Registered: Feb 2005
Distribution: Slackware
Posts: 16

Rep: Reputation: 2
Thanks, Dark_Helmet, for clearing that up.
It does seem cryptic, now that you mention it.
 
03-01-2005, 10:00 PM   #5
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 367
Hehehe... oh no, I wasn't saying yours in particular was cryptic.

sed has a way of turning even the simplest ideas into gibberish. They're all nasty...
 
03-02-2005, 01:05 AM   #6
farmerjoe
Member
 
Registered: Oct 2004
Location: Texas
Distribution: Ubuntu - Home, RHEL4 - Server
Posts: 96

Rep: Reputation: 15
This sed command removes most HTML tags, including tags that span more than one line. Just run it on your file:
Code:
sed -e :a -e 's/<[^>]*>//g;/</N;//ba' text.html

Hope this helps,
-farmerjoe
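The one-liner is a loop: it deletes every complete tag, and while a `<` is still left in the pattern space it appends the next input line and tries again, which is how it copes with a tag split across lines. A minimal sketch with a hypothetical `strip_tags` wrapper, assuming GNU sed:

```shell
#!/bin/sh
# farmerjoe's tag-stripping one-liner, wrapped in a hypothetical function.
#   :a              defines a label named "a"
#   s/<[^>]*>//g    deletes every complete <...> tag in the pattern space
#   /</N            if a '<' is still present (an unfinished tag), append the next line
#   //ba            the empty regex reuses the last one used (<); if it matches, jump to :a
strip_tags() {
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
}

# The question's sample: the tags are removed rather than moved to new lines.
printf '%s\n' '<BODY> This is a HTML page </BODY>' | strip_tags

# A tag split across two lines is joined with N and then removed.
printf '%s\n' '<P' 'CLASS="x">Hello</P>' | strip_tags
```

Note that this strips the tags instead of putting them on their own lines, so on the question's sample it prints only ` This is a HTML page `.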
 
  


All times are GMT -5.
