LinuxQuestions.org


noob_soni 12-11-2009 10:37 AM

read file and filter out specific tags in file
 
I am a newbie at shell scripting and would appreciate help with this question. Many thanks in advance, and apologies for the big input file.
I have an .xml file that is a concatenation of multiple RSS files. The requirement is to filter out all the extra content in the file and keep only the actual items.

eg:
<?xml version="1.0" encoding="iso-8859-1"?>
<rss>
...........
some text here
<channel>
..........
some more tags here
<item>
<title>Item Example 1</title>
<link>http://www.domain.com/link1.htm</link>
</item>
<item>
<title>Item Example 2</title>
<link>http://www.domain.com/link2.htm</link>
</item>
</channel>
</rss>
<rss>
.....
some other tags
......
<item>
<title>Item Example 3</title>
<link>http://www.domain.com/link3.htm</link>
</item>
.......
more tags
.......
<item>
<title>Item Example 4</title>
<link>http://www.domain.com/link4.htm</link>
</item>
<item>
<title>Item Example 5</title>
<link>http://www.domain.com/link5.htm</link>
</item>
</rss>

Note: an item can have more attributes than shown here.

output should be:
<item>
<title>Item Example 1</title>
<link>http://www.domain.com/link1.htm</link>
</item>
and other items

much thanks,
Soni

Web31337 12-11-2009 11:01 AM

grep -v rss ?
I think if the format of your RSS is static (one tag per line), it's quite simple to remove unwanted tags with grep. If the format changes to, say, a single line, you will need either complex regexes or an external programming language.
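To illustrate the single-line case: one possible approach, sketched here under assumptions, is to first put every tag on its own line and then filter line by line. The function name extract_items and the sample input are made up for illustration, and the split only handles tags that directly touch each other (`><`).

```shell
#!/bin/sh
# extract_items is a hypothetical helper name, not something from the thread.
# It reads a feed on stdin and prints only the <item> ... </item> blocks.
extract_items() {
    # Step 1: insert a newline between adjacent tags ("><" becomes ">", newline, "<").
    # Step 2: print the lines from <item> through </item>, inclusive.
    awk '{ gsub(/></, ">\n<"); print }' |
        awk '/<item>/{f=1} f; /<\/item>/{f=0}'
}

# Hypothetical sample input: a whole feed on a single line.
printf '%s\n' '<rss><channel><title>Feed</title><item><title>Item Example 1</title><link>http://www.domain.com/link1.htm</link></item></channel></rss>' |
    extract_items
```

This assumes an opening tag written exactly as `<item>`. For anything less regular than that, a real XML parser is probably the safer choice.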

ghostdog74 12-11-2009 07:25 PM

Code:

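# Set the flag on <item>, print while it is set, and clear it only after </item> has been printed.
$ awk '/<item>/{f=1} f; /<\/item>/{f=0}' file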
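For anyone trying this out, here is the same flag idea written in longer form and run against a small test file. It is only a sketch: it assumes one tag per line as in the sample above, and the file names feeds.xml and items.xml and the variable name inside are made up for the example. The opening pattern also accepts `<item` followed by a space, in case the tag carries attributes.

```shell
#!/bin/sh
# Hypothetical sample: two concatenated feeds, one tag per line.
cat > feeds.xml <<'EOF'
<rss>
<channel>
<title>Feed A</title>
<item>
<title>Item Example 1</title>
</item>
</channel>
</rss>
<rss>
<item>
<title>Item Example 3</title>
</item>
<description>more tags</description>
</rss>
EOF

# "inside" is 1 between <item> and </item>, and 0 everywhere else.
awk '
    /<item[ >]/ { inside = 1 }   # opening tag, with or without attributes
    inside      { print }        # print every line while inside an item
    /<\/item>/  { inside = 0 }   # clear the flag after the closing tag is printed
' feeds.xml > items.xml

cat items.xml
```

The order of the three rules matters. Clearing the flag before the print rule would drop the `</item>` line from the output.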

