LinuxQuestions.org
Old 06-04-2007, 06:38 AM   #1
Fond_of_Opensource
Member
 
Registered: May 2006
Posts: 55

Rep: Reputation: 15
sed command to extract contents within the body tag of HTML


Hi all,

What is the sed command syntax to extract the contents of the HTML body tag? I have an HTML file, file.html, and I want to extract only the text between the <body> and </body> tags.

Thanks.
 
Old 06-04-2007, 06:47 AM   #2
bathory
Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 10,895

Rep: Reputation: 1322
From the sed examples, try:
Code:
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
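
For reference, a minimal sketch of what this one-liner does. The sample input is made up for illustration; the second tag is deliberately split across two lines to show why the loop (`:a`, `N`, `ba`) is there. It strips every tag, which is not the same as extracting the body.

```shell
# Hypothetical sample input: one tag is split across two lines.
stripped=$(printf '%s\n' '<p>Hello <b>world</b></p>' '<a' 'href="x">link</a>' |
  sed -e :a -e 's/<[^>]*>//g;/</N;//ba')

# All tags are removed; only the text between them is left.
printf '%s\n' "$stripped"
# prints:
# Hello world
# link
```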
 
Old 06-04-2007, 07:00 AM   #3
Fond_of_Opensource
Member
 
Registered: May 2006
Posts: 55

Original Poster
Rep: Reputation: 15
Hi bathory,

I don't want all the tags eliminated. I want to print whatever is between the <body> and </body> tags.
 
Old 06-04-2007, 07:25 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Hi,

Something like this:

sed -n '/<body>/,/<\/body>/p' file.html

The only downside is that any text before <body> or after </body> on the same line is printed as well. This can be solved as follows:

sed -n '/<body>/,/<\/body>/p' file.html | sed -e '1s/.*<body>/<body>/' -e '$s/<\/body>.*/<\/body>/'

Or, if you also want to remove the body tags themselves:

sed -n '/<body>/,/<\/body>/p' file.html | sed -e '1s/.*<body>//' -e '$s/<\/body>.*//'
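
To make the difference between the three commands concrete, here is a small sketch run against a made-up sample file (the file contents and the variable names `whole`, `with_tags` and `inner` are only for illustration):

```shell
# Hypothetical sample file, created in a temporary location.
f=$(mktemp)
cat > "$f" <<'EOF'
<html><head><title>demo</title></head> junk <body>first
<p>middle</p>
last</body> trailing </html>
EOF

# 1) Range only: whole lines are printed, including "junk" and "trailing".
whole=$(sed -n '/<body>/,/<\/body>/p' "$f")

# 2) Trim what precedes <body> on the first line and what follows </body> on the last.
with_tags=$(sed -n '/<body>/,/<\/body>/p' "$f" |
  sed -e '1s/.*<body>/<body>/' -e '$s/<\/body>.*/<\/body>/')

# 3) Same, but drop the body tags as well.
inner=$(sed -n '/<body>/,/<\/body>/p' "$f" |
  sed -e '1s/.*<body>//' -e '$s/<\/body>.*//')

printf '%s\n' "$inner"
# prints:
# first
# <p>middle</p>
# last

rm -f "$f"
```

The second sed works on the output of the first, so its `1` and `$` addresses refer to the first and last lines of the extracted range, not of file.html.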

Hope this helps.
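
One caveat, in case it matters for your file: the pattern /<body>/ only matches a bare tag, so a line like <body class="main"> is not picked up. A possible variant, shown here as a sketch on a made-up sample file, allows attributes with `[^>]*`. This is an assumption about what your HTML looks like, not a general HTML parser.

```shell
# Hypothetical sample file whose body tag carries an attribute.
f=$(mktemp)
cat > "$f" <<'EOF'
<html><head><title>demo</title></head>
<body class="main">first
<p>middle</p>
last</body></html>
EOF

# The plain pattern finds nothing, because no line contains the literal "<body>".
plain=$(sed -n '/<body>/,/<\/body>/p' "$f")

# Assumed variant: [^>]* allows attributes between "<body" and ">".
inner=$(sed -n '/<body[^>]*>/,/<\/body>/p' "$f" |
  sed -e '1s/.*<body[^>]*>//' -e '$s/<\/body>.*//')

printf '%s\n' "$inner"
# prints:
# first
# <p>middle</p>
# last

rm -f "$f"
```

Also note that sed starts looking for the end of a range on the line after the one that opened it. If <body> and </body> sit on the same line, the range runs on to the next line containing </body>, or to the end of the file.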
 
Old 06-04-2007, 07:34 AM   #5
Fond_of_Opensource
Member
 
Registered: May 2006
Posts: 55

Original Poster
Rep: Reputation: 15
Thanks very much, druuna, it solved my problem.
 
Old 06-04-2007, 07:38 AM   #6
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
A more knowledgeable member has rendered my attempted help irrelevant...

Last edited by pixellany; 06-04-2007 at 07:40 AM.
 
Old 06-04-2007, 07:55 AM   #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
@Fond_of_Opensource: You're welcome

@pixellany: Sorry about that
 
  


All times are GMT -5.
