LinuxQuestions.org


therealbxp 11-20-2004 05:47 AM

newbie bash question (grep processing)
 
So first of all, hello everybody, this is my first post.
I am also new to Linux, so please be very clear if possible, and thanks in advance.

The problem:
I have written a bash script that gets an HTML file from a server (this is done with wget, no problems so far). Now I want to process this file: there is always a part (a script) that I want to delete.
The part I want to delete is a JavaScript block (placed between <script> and </script>).
How can I do this?
I know you can search the file and get the line numbers with grep (the problem here is that after each line number there is also the text of the line).

I think that if I can solve the above problem, so that I get only the numbers, I will be able to write the rest of the program.

Quick version:
How can I get only the first part (just the number 198) of this line?
198:<script>blblblablana</script>

thx in advance
bxp

Hko 11-20-2004 07:34 AM

Quote:

Quick version:
How can I get only the first part (just the number 198) of this line?
198:<script>blblblablana</script>

You can do that like this:
Code:

#!/bin/bash

LINE="198:<script>blblblablana</script>"
echo "${LINE%%:*}"
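
To get only the line numbers of every match in a file, a minimal sketch along the same lines could look like this. It assumes the matches come from 'grep -n'; the temporary sample file and the variable names are made up for illustration and are not from the original script.

```shell
#!/bin/bash

# Sample input; the file and its contents are invented for this sketch.
sample=$(mktemp)
printf '%s\n' '<html>' 'text' '<script>blblblablana</script>' '</html>' > "$sample"

# grep -n prefixes each matching line with "NUMBER:";
# cut keeps only the first colon-separated field, i.e. the number.
numbers=$(grep -n '<script>' "$sample" | cut -d: -f1)
echo "$numbers"

# Same result with the parameter expansion shown above, one match at a time.
grep -n '<script>' "$sample" | while IFS= read -r match; do
    echo "${match%%:*}"
done

rm -f "$sample"
```

Both variants print the number of the matching line (3 for this sample). The cut version is shorter; the loop is handy if more has to be done with each number.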

If you can rely on both the "<script>" and "</script>" tags being on the same line, and you also don't need to preserve the text before "<script>" and after "</script>", then it is easier to do it in one go with grep, which drops every matching line completely:
Code:

grep -vi '<script>.*</script>' dummy.html
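
A small sketch of what that grep call does, assuming a made-up sample file (the file contents and the variable names are only for illustration):

```shell
#!/bin/bash

# Sample input invented for this sketch; note the upper-case tags.
sample=$(mktemp)
printf '%s\n' '<html>' 'zzz<SCRIPT>blblblablana</SCRIPT>444' 'qweqw' '</html>' > "$sample"

# -v prints only the lines that do NOT match the pattern,
# -i makes the match case-insensitive.
kept=$(grep -vi '<script>.*</script>' "$sample")
echo "$kept"

rm -f "$sample"
```

On this sample only "<html>", "qweqw" and "</html>" are printed: the whole second line is gone, including "zzz" and "444".
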
If the "<script>" and "</script>" tags can possibly occur on different lines, use 'sed' instead of 'grep'. Then you can also preserve the text before "<script>" and after "</script>".

Say some HTML file ("dummy.html") looks like this:
Code:

<html>
blalbla
zzz<script>blblblablana</script>444
qweqw
<script>blblbla
asas
asas
blana</script>
123
</html>

Then this script:
Code:

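#!/bin/bash

# Lines holding a complete <script>...</script> pair: cut the pair out and
# keep the text before and after it, joined by a space.
# Then, on every line: strip any remaining lone <script> or </script> tag.
sed \
-e '/<script>.*<\/script>/{' \
-e 's/^\(.*\)<script>.*<\/script>\(.*\)$/\1 \2/' \
-e '}' \
-e 's/^\(.*\)<script>/\1/' \
-e 's/<\/script>\(.*\)$/\1/' \
dummy.html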

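...outputs the following (note that for the block spread over several lines only the tags themselves are removed; the lines in between are kept):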
Code:

<html>
blalbla
zzz 444
qweqw
blblbla
asas
asas
blana
123
</html>
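
In the output above, the body of the multi-line script block ("blblbla" up to "blana") is still there, because only the tags are stripped. If those lines should go as well, one possible sketch is a sed address range. The function name strip_scripts and the temporary file are made up for illustration. It assumes plain "<script>" tags without attributes and at most one script per line, and it throws away any text that shares a line with the opening or closing tag of a multi-line block.

```shell
#!/bin/bash

# strip_scripts is a hypothetical helper name, not part of the thread.
# It prints the file given as $1 without its <script> blocks.
strip_scripts() {
    # 1st expression: cut a complete one-line <script>...</script> out of a line.
    # 2nd expression: delete every line from one still containing <script>
    #                 up to and including the next line containing </script>.
    sed \
        -e 's/<script>.*<\/script>//' \
        -e '/<script>/,/<\/script>/d' \
        "$1"
}

# Recreate the sample file from above in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
blalbla
zzz<script>blblblablana</script>444
qweqw
<script>blblbla
asas
asas
blana</script>
123
</html>
EOF

cleaned=$(strip_scripts "$sample")
echo "$cleaned"

rm -f "$sample"
```

On the sample this leaves "<html>", "blalbla", "zzz444", "qweqw", "123" and "</html>".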


therealbxp 11-20-2004 07:40 AM

Thanks for the excellent answer. Sorry for the stupid question, but everybody has to learn.

greetings bxp


All times are GMT -5.