sed command to extract contents within the body tag of HTML
Hi all,
What is the sed command syntax for extracting the contents of the HTML body tag? I have an HTML file, file.html, and I want to extract only the text between the <body> and </body> tags. What sed command does this?
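A sed range address prints every line from the one containing <body> through the one containing </body>:
sed -n '/<body>/,/<\/body>/p' file.html
The only downside is that any words before <body> or after </body> on those same lines are printed too. This can be solved as follows: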
sed -n '/<body>/,/<\/body>/p' file.html | sed -e '1s/.*<body>/<body>/' -e '$s/<\/body>.*/<\/body>/'
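To see what the trimming stage does, here is a minimal sketch that wraps the pipeline above in a hypothetical shell function, extract_body, and runs it on a throwaway sample page. Like the original command, it assumes a bare <body> tag with no attributes.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two-stage pipeline from the answer.
# Assumes a bare <body> tag (no attributes), like the original command.
extract_body() {
  # Stage 1: print every line from the one matching <body> through the
  # next one matching </body>.
  # Stage 2: on the first printed line drop anything before <body>, and on
  # the last printed line drop anything after </body>.
  sed -n '/<body>/,/<\/body>/p' "$1" |
    sed -e '1s/.*<body>/<body>/' -e '$s/<\/body>.*/<\/body>/'
}

# Demo on a throwaway sample page with markup around the tags.
tmp=$(mktemp)
printf '%s\n' '<html><head><title>t</title></head><body>' \
  '<p>Hello</p>' '</body></html>' > "$tmp"
extract_body "$tmp"
rm -f "$tmp"
```

On this sample page it prints <body>, <p>Hello</p> and </body> on three lines. One caveat: sed only starts testing the end address on the line after the one that matched the start address, so a page with <body> and </body> on the same line would keep printing until the next </body> or the end of the file.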
Or, if you also want to remove the body tag:
sed -n '/<body>/,/<\/body>/p' file.html | sed -e '1s/.*<body>//' -e '$s/<\/body>.*//'
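The tag-stripping variant can be checked the same way. This sketch uses another hypothetical function, body_inner, and a sample file where text sits on the same lines as the tags, under the same bare-<body> assumption.

```shell
#!/bin/sh
# Hypothetical helper: prints only what lies between <body> and </body>,
# using the tag-stripping variant of the pipeline. Assumes a bare <body>
# tag (no attributes), as the original commands do.
body_inner() {
  # First printed line: delete everything up to and including <body>.
  # Last printed line: delete </body> and everything after it.
  sed -n '/<body>/,/<\/body>/p' "$1" |
    sed -e '1s/.*<body>//' -e '$s/<\/body>.*//'
}

# Demo on a throwaway file with text next to the tags.
tmp=$(mktemp)
printf '%s\n' '<html><body>intro' '<p>Hello</p>' 'outro</body></html>' > "$tmp"
body_inner "$tmp"
rm -f "$tmp"
```

This prints intro, <p>Hello</p> and outro. If <body> ends its line or </body> starts its line, the substitution leaves an empty line in its place rather than deleting the line.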