LinuxQuestions.org


guessity 02-05-2010 07:09 AM

Need help with SED command to remove text
 
Hi again,
I got stuck with sed again and need a little help.

I have the following file named y.xml:

Code:

<?xml version="1.0" encoding="iso-8859-1"?> <rss version="2.0"> <channel> <title>News</title> <lastBuildDate>Tue, 21 Jul 2009 16:32:47 Asia/Singapore</lastBuildDate><art>201002041656400001">
</art><aut>ANI</aut>
<item><title>Oh My God!!</title>
<cats>
Lifestyle
</cats>
<subcats>
Washington
</subcats>
<description><![CDATA[<p>
God</p>
]]></description></item></channel></rss>
<?xml version="1.0" encoding="iso-8859-1"?> <rss version="2.0"> <channel> <title>News</title> <lastBuildDate>Tue, 21 Jul 2009 16:32:47 Asia/Singapore</lastBuildDate><art>201002041656400018">
</art><aut>ANI</aut>
<item><title>twitter now</title>
<cats>
Tech
</cats>
<subcats>
Washington
</subcats>
<description><![CDATA[<p>
oh out here</p>
]]></description></item></channel></rss>

I am trying to fix the XML feed by removing the following lines:
Code:

</channel></rss>
<?xml version="1.0" encoding="iso-8859-1"?> <rss version="2.0"> <channel> <title>News</title> <lastBuildDate>Tue, 21 Jul 2009 16:32:47 Asia/Singapore</lastBuildDate><art>201002041656400018">
</art><aut>ANI</aut>

I ran sed with the following command:

Code:

sed -n '/</channel></rss>/,/</art><aut>ANI</aut>/p' y.xml
I get weird output. The XML can have any number of items. Any idea where I went wrong?

Can someone help me with sed, or any other way to remove those lines? I can't seem to figure it out. :(

pixellany 02-05-2010 07:25 AM

"sed -n" means do not print unless instructed, and "p" means print. Thus your expression does the opposite of what you intended.

You want something like this:
sed '/start/,/stop/ d'
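
To see the difference between the two forms, here is a minimal sketch on a throwaway file. The file contents and the "start"/"stop" markers are placeholders, not taken from y.xml.

```shell
# Hypothetical sample file; "start" and "stop" are placeholder patterns.
tmp=$(mktemp)
printf '%s\n' keep1 start middle stop keep2 > "$tmp"

# -n suppresses default output and p prints the range,
# so only the lines from "start" through "stop" come out.
printed=$(sed -n '/start/,/stop/p' "$tmp")

# Without -n, d deletes the range and every other line is printed as usual.
remaining=$(sed '/start/,/stop/d' "$tmp")

echo "$printed"
echo "$remaining"
rm -f "$tmp"
```

The first command shows exactly the lines you wanted gone; the second shows the file without them.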

The second problem is the "/" characters inside the two addresses. They need to be escaped so that sed does not mistake them for the delimiters of the addresses. For example:

/<\/channel><\/rss>/
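
Putting both fixes together, a rough sketch might look like this. The sample file is a trimmed-down stand-in for y.xml, made up for illustration. The second command uses sed's `\#...#` address syntax, which picks a different delimiter so the slashes need no escaping.

```shell
# Illustrative stand-in for the join between two feeds in y.xml.
feed=$(mktemp)
cat > "$feed" <<'EOF'
<item>first</item>
</channel></rss>
<?xml version="1.0"?> <rss> <channel>
</art><aut>ANI</aut>
<item>second</item>
EOF

# Form 1: escape each "/" inside the /.../ addresses.
escaped=$(sed '/<\/channel><\/rss>/,/<\/art><aut>ANI<\/aut>/d' "$feed")

# Form 2: start the address with \# so that # is the delimiter instead of /.
alt=$(sed '\#</channel></rss>#,\#</art><aut>ANI</aut>#d' "$feed")

echo "$escaped"
echo "$alt"
rm -f "$feed"
```

Keep in mind that d removes whole lines. In the real y.xml, `</channel></rss>` shares a line with `]]></description></item>`, so that text would be deleted too, and the final `</channel></rss>` line would start a range that runs to the end of the file. Some extra handling is probably needed for those cases.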


All times are GMT -5.