Line-and-regex-based programs like grep do not handle formats that use free-form nested tagging, like xml, very well. You really should use a tool that has a dedicated xml parser, like perl.
If you supply an example of the xml data and what you want to extract from it, I can help you write an xmlstarlet rule to extract it.
As for the above: first of all, grep can only do simple matching and can't extract substrings from a match. sed can do it, but without a sample of the input to work with we can only guess at the expression.
sed -rn '/name/ s|.*name>([^>]+)</name.*|\1|p' infile.xml
The above assumes that there's a single tag in the file that is formatted like this:
<name>data I want</name>
...and you want the part between them.
".*" alone can't be used because "*" is greedy, and will continue matching everything to the end of the line. You have to use a negating pattern (here "[^>]+") to stop it where you want it to.
But again, this depends on the xml being regular enough for the full pattern to always exist on a single line. Again, it's much better to use a real xml parser.
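To make the single-line caveat concrete, here is a small sketch that runs the sed command above against two made-up inputs. The real file was never shown, so both samples, the variable names, and the extract_name helper are assumptions for illustration only. It relies on GNU sed's -r flag, as the command above does.

```shell
#!/bin/sh
# Both sample files are hypothetical stand-ins for the real input.
one_line=$(mktemp)
multi_line=$(mktemp)

# Case 1: opening tag, text, and closing tag all on one line.
printf '%s\n' '<root>' '  <name>data I want</name>' '</root>' > "$one_line"

# Case 2: the same element, but split across three lines.
printf '%s\n' '<root>' '  <name>' '    data I want' '  </name>' '</root>' > "$multi_line"

extract_name() {
    # The sed command from above, applied to the file given as $1.
    sed -rn '/name/ s|.*name>([^>]+)</name.*|\1|p' "$1"
}

one_line_result=$(extract_name "$one_line")
multi_line_result=$(extract_name "$multi_line")

printf 'single line: [%s]\n' "$one_line_result"
printf 'multi line:  [%s]\n' "$multi_line_result"

rm -f "$one_line" "$multi_line"
```

The first case prints the text between the tags. The second prints nothing, because no single line contains the whole pattern.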
Edit: To do the same with xmlstarlet, the following command should work:
xmlstarlet sel -T -t -v '//name' -n input.xml
It will extract the values of all "name" elements and print them one per line, just like sed. But unlike sed, the exact structure of the file is immaterial.
Do note, however, that it is very picky about the input being well-formed xml.
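As a rough illustration of both points, the sketch below runs the xmlstarlet command above on a made-up file whose "name" elements carry attributes and sit at different nesting depths, and then on a file with a missing closing tag. The sample data and variable names are assumptions, and the script skips the run if xmlstarlet is not installed.

```shell
#!/bin/sh
# Hypothetical sample data; the real input file was not shown.
good=$(mktemp)
bad=$(mktemp)

cat > "$good" <<'EOF'
<root>
  <item><name lang="en">alpha</name></item>
  <group>
    <item>
      <name>beta</name>
    </item>
  </group>
</root>
EOF

# Not well-formed: the closing </name> tag is missing.
printf '%s\n' '<root>' '  <name>alpha' '</root>' > "$bad"

if command -v xmlstarlet >/dev/null 2>&1; then
    # The command from above: one value per line, wherever <name> appears.
    names=$(xmlstarlet sel -T -t -v '//name' -n "$good")
    printf '%s\n' "$names"

    # Same command on the broken file; it is expected to fail with a parse error.
    bad_status=0
    xmlstarlet sel -T -t -v '//name' -n "$bad" 2>/dev/null || bad_status=$?
    printf 'exit status for malformed input: %s\n' "$bad_status"
else
    printf '%s\n' 'xmlstarlet not found; skipping' >&2
fi

rm -f "$good" "$bad"
```

Neither the attribute on the first element nor the extra nesting around the second needs any change to the expression, which is the practical advantage over the sed approach.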