LinuxQuestions.org


tpubcom 10-11-2009 01:46 PM

Grep data inside <body>*</body> only
 
Is there a way to use grep or any other command to only search for text inside the body tags of an html file? I don't want the data inside the title or any of the header tags to show up.

Thanks in advance.

Tinkster 10-11-2009 01:54 PM

Hi,

welcome to LQ ...

Something like this *should work* (untested).
Code:

sed -n '/<body>/,/<\/body>/p' file | grep "search string"
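
A minimal sketch of how that pipeline behaves, assuming a small hypothetical sample file where `<body>` and `</body>` each sit on their own line. The file contents and the search string "needle" are made up for illustration. The sed range prints only the lines from `<body>` through `</body>`, so the match in the title never reaches grep.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html>
<head>
<title>needle in the title</title>
</head>
<body>
<p>needle in the body</p>
</body>
</html>
EOF

# Print only the lines from <body> through </body>, then search them.
result=$(sed -n '/<body>/,/<\/body>/p' "$tmp" | grep "needle")
echo "$result"

rm -f "$tmp"
```

With this input, only the `<p>` line from the body should be printed.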

catkin 10-11-2009 02:23 PM

A similar question was asked recently (sorry, I can't find it to give a link). We came up with a few ingenious ways of doing it, and then somebody sanely pointed out that HTML allows a lot of variation in formatting (for example, line ends are only token separators), so automated editing is much better done with specialist tools written to work with HTML syntax. That made a lot of sense.
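
To illustrate the formatting point, here is a rough sketch of two inputs where a line-based sed range goes wrong. Both inputs are hypothetical and exist only for this demonstration. In the first, the whole document is on one line, so the range selects that entire line, title included. In the second, the opening tag carries an attribute, so the pattern `<body>` never matches and nothing is printed.

```shell
tmp=$(mktemp)

# Case 1: the whole document on a single physical line (hypothetical input).
printf '%s\n' '<html><head><title>needle in the title</title></head><body><p>needle in the body</p></body></html>' > "$tmp"
# The range starts and ends on the same line, so the title text comes along too.
one_line=$(sed -n '/<body>/,/<\/body>/p' "$tmp" | grep "needle" || true)
echo "one-line document: $one_line"

# Case 2: the opening tag has an attribute (hypothetical input).
printf '%s\n' '<body class="main">' '<p>needle in the body</p>' '</body>' > "$tmp"
# The literal pattern <body> does not match <body class="main">, so no range opens.
with_attr=$(sed -n '/<body>/,/<\/body>/p' "$tmp" | grep "needle" || true)
echo "body tag with attribute: [$with_attr]"

rm -f "$tmp"
```

The `|| true` is only there so that a non-matching grep does not abort a script running under `set -e`.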

Quick netsearch got this page. Might be some use.

chrism01 10-11-2009 06:19 PM

As catkin said, a proper tool is recommended.
If you know Perl or don't mind learning it, this module may be the one you want: http://search.cpan.org/~gaas/HTML-Parser-3.62/Parser.pm


All times are GMT -5.