LinuxQuestions.org


bruno buys 07-17-2007 02:38 AM

Saving part of an HTML document
 
Hi all,
I am writing a few bash scripts that need to extract parts of HTML documents. Consider the excerpt below: I need to select the text beginning at <!--TITULO--> and ending at <!--TITULO-->, and also the part beginning at <!--TEXTO--> and ending at <!--TEXTO-->. The first one is easy because it is on one line, so grep handles it. The second is the problem because it spans several lines.
System is Debian Etch, bash is 3.1dfsg-8. Any help would be appreciated, thanks!




</tr>
<tr>
<td class="tit18b">
<!--TITULO-->Foguete VSB-30 deve ser lançado hoje em Alcantara<!--TITULO-->
</td>
</tr>
<tr>
<td class="texto11" height="20">
<!--TEXTO-->
<P> <P>Agencia JB<P> <P><P> <P>MARANHAO - O Veiculo de Sondagem Booster (SBV-30), no Centro de Lancamento de Alcantara, no Maranhao, deve ser lancado nesta segunda-feira. As condicoes meteorologicas sao favoraveis para o lancamento do foguete que deve ocorrer as 10h30, de acordo com a assessoria de imprensa da Agencia Espacial Brasileira (AEB).<P>O Veiculo de Sondagem Booster (VSB-30) levara nove experimentos cientificos, a maioria de universidades brasileiras. O voo terá duracao total de 20 minutos e o foguete chegara a cerca de 280 quilometros do solo.<P>
<!--TEXTO-->
</td>
</tr>
<tr>
<td>

ghostdog74 07-17-2007 02:57 AM

Code:

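awk '
    # Title: both markers are on the same line, so strip them and print what is left.
    /<!--TITULO-->/ {
        gsub("<!--TITULO-->", "")
        print
    }
    # Inside the body: stop at the closing marker, otherwise print the line.
    flag {
        if (/<!--TEXTO-->/) { flag = 0; next }
        print
    }
    # Opening marker: start printing from the next line.
    /<!--TEXTO-->/ {
        flag = 1
        next
    }' "file"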
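
For use inside a bash script, the two pieces can also be captured into separate variables. The following is only a sketch under a few assumptions: the sample file is a shortened, hypothetical stand-in for the excerpt in the question, the variable names (sample, title, body) are made up for illustration, the title markers are assumed to share one line, and the TEXTO markers are assumed to sit on lines of their own. It uses sed for the title and a shorter flag-toggling awk program for the body, as an alternative to the script above rather than a replacement for it.

```shell
#!/bin/bash
# Hypothetical sample input, shortened from the excerpt in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<td class="tit18b">
<!--TITULO-->Foguete VSB-30 deve ser lancado hoje em Alcantara<!--TITULO-->
</td>
<td class="texto11" height="20">
<!--TEXTO-->
<P>Agencia JB<P>
<P>MARANHAO - first paragraph<P>
<!--TEXTO-->
</td>
EOF

# Title: keep only the text between the two markers on the same line.
title=$(sed -n 's/.*<!--TITULO-->\(.*\)<!--TITULO-->.*/\1/p' "$sample")

# Body: each marker line flips the flag; lines are printed while it is set.
body=$(awk '/<!--TEXTO-->/ { inside = !inside; next } inside' "$sample")

printf 'title: %s\n' "$title"
printf 'body:\n%s\n' "$body"
rm -f "$sample"
```

Because the awk flag simply flips on every marker line, the same one-liner should cope with more than one TEXTO block in a file, provided the markers always come in pairs.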


bruno buys 07-17-2007 11:15 AM

Nice, it worked. I'll learn some awk...

Thanks, friend!


All times are GMT -5.