LinuxQuestions.org

madvicious 09-11-2009 02:41 AM

help extracting a matching pattern and next lines of match
 
Hi there,

I'm having some problems writing an awk script (I've tried it this way, but other approaches are surely possible) for the following file:

file.txt

<register>
<createProfile>
<result>0</result>
<description><![CDATA[OK]]></description>
<msisdn>34661461174</msisdn>
<inputOmvID>1</inputOmvID>
<inputGroupID>-2</inputGroupID>
<ProfileOmvID>1</ProfileOmvID>
<contentID>3365</contentID>
<contentProfileID>3525</contentProfileID>
<chargingProfileTypeId>22</chargingProfileTypeId>
<operationID>201022</operationID>
...

I have to test whether <createProfile> is in the file. If it is, I have to extract the lines

<msisdn>34661461174</msisdn>

and <contentProfileID>3525</contentProfileID>

So I tried starting with something like this:

> awk '/^<createProfile>/{getline;print}' file.txt

but this only prints the line that follows the one matching <createProfile>.

With this script

> awk '/^<createProfile>/ {print NR,$0}' file.txt

I get the line where the regex matches, but I don't know how to go on to print the records for <msisdn>34661461174</msisdn> and <contentProfileID>3525</contentProfileID>

The file always has this structure; I mean all the tags are in the same position once the first pattern has matched.

Thank you for any help

colucix 09-11-2009 04:39 AM

You can try with a flag: every time the tag <createProfile> is encountered, you switch the flag on. While the flag is on, the <msisdn> and <contentProfileID> lines are printed (or processed). Switch the flag off at the last tag you need, in the order they appear. E.g. something like:
Code:

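# Switch the flag on when a createProfile record starts.
/^<createProfile>/ {
  isCreate = 1
}
# No action block: the default action prints the matching line.
isCreate && /^<msisdn>/
# Last tag needed: print it, then switch the flag off.
isCreate && /^<contentProfileID>/ {
  print
  isCreate = 0
}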


madvicious 09-11-2009 05:19 AM

Thanks colucix. In another forum I got this answer, which matches my needs:


#!/bin/bash

awk '
/createProfile/{f=1}
f && /createProfile/
f && /msisdn/
f && /contentProfileID/
' file.txt

and it can match more than one set of XML :)

Thank you very much for your kind answer :)

Best wishes ;)

colucix 09-11-2009 05:23 AM

Quote:

Originally Posted by madvicious (Post 3678499)
#!/bin/bash

awk '
/createProfile/{f=1}
f && /createProfile/
f && /msisdn/
f && /contentProfileID/
' file.txt

Indeed, it is much the same solution, except for the "switching off" part. :)
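
To see what the "switching off" part changes, here is a minimal sketch. The input is an assumption, not the poster's real file: it adds closing tags and a second record of a made-up type (deleteProfile) that also carries an <msisdn>.

```shell
# Hypothetical input: one createProfile record followed by a record of a
# different, made-up type (deleteProfile) that also contains an <msisdn>.
sample='<register>
<createProfile>
<msisdn>34661461174</msisdn>
<contentProfileID>3525</contentProfileID>
</createProfile>
<deleteProfile>
<msisdn>34600000000</msisdn>
</deleteProfile>
</register>'

# Flag is never cleared: the msisdn of the second record is printed as well.
no_reset=$(printf '%s\n' "$sample" | awk '
/^<createProfile>/ { f = 1 }
f && /^<msisdn>/
f && /^<contentProfileID>/
')

# Flag is cleared after the last wanted tag: only the first record is printed.
with_reset=$(printf '%s\n' "$sample" | awk '
/^<createProfile>/ { f = 1 }
f && /^<msisdn>/
f && /^<contentProfileID>/ { print; f = 0 }
')

printf 'without reset:\n%s\n\nwith reset:\n%s\n' "$no_reset" "$with_reset"
```

If every record in the file is a createProfile record, the two variants behave the same, which may be why the version without the reset was good enough here.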

jschiwal 09-11-2009 05:40 AM

If createProfile always precedes the items you want to extract, you can use sed easily as well:
Code:

sed -n '/createProfile/,${ /msisdn/p
                          /contentProfileID/p
                          }' file.txt
<msisdn>34661461174</msisdn>
<contentProfileID>3525</contentProfileID>

If createProfile can appear anywhere, you can save the lines in a variable or array and print them out after the file has been read. In sed you could append the msisdn lines to the hold space. In awk, you would probably store them in variables and print them in the END block.
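
The awk variant of that idea could look roughly like this. It is only a sketch: the input is a made-up fragment in which <createProfile> comes after the wanted lines, and the variable names (seen, msisdn, profile) are chosen for illustration.

```shell
# Hypothetical input where <createProfile> appears after the wanted lines.
sample='<register>
<msisdn>34661461174</msisdn>
<contentProfileID>3525</contentProfileID>
<createProfile>
<result>0</result>
</register>'

result=$(printf '%s\n' "$sample" | awk '
/<createProfile>/    { seen = 1 }       # remember that the tag occurred
/<msisdn>/           { msisdn = $0 }    # keep the line for later
/<contentProfileID>/ { profile = $0 }
END {
  # Print only if <createProfile> was found somewhere in the input.
  if (seen) { print msisdn; print profile }
}
')
printf '%s\n' "$result"
```

Note that this keeps only the last <msisdn> and <contentProfileID> line it sees, so a file with several records would need arrays instead of plain variables.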

madvicious 09-11-2009 05:58 AM

Thank you jschiwal, that's another great solution :)

I'll keep it in mind too :D

jschiwal 09-12-2009 07:11 PM

For extracting particular items from XML files, look at using xsltproc. That is what it is designed for.
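
A rough sketch of what that might look like. Everything about the input is an assumption: the fragment from the first post is closed off here so that it is well-formed XML, and the real file may be nested differently. The file names file.xml and extract.xsl are made up for the example.

```shell
# Assumption: the fragment from the first post, closed off so that it is
# well-formed XML. The real file may be nested differently.
tmp=$(mktemp -d)
cat > "$tmp/file.xml" <<'EOF'
<register>
<createProfile>
<result>0</result>
<msisdn>34661461174</msisdn>
<contentProfileID>3525</contentProfileID>
</createProfile>
</register>
EOF

# XSLT 1.0 stylesheet: if any <createProfile> element exists, emit the text
# of the first <msisdn> and the first <contentProfileID> in the document.
cat > "$tmp/extract.xsl" <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:if test="//createProfile">
      <xsl:value-of select="normalize-space(//msisdn)"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="normalize-space(//contentProfileID)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
EOF

# Run the transformation only if xsltproc is installed.
if command -v xsltproc >/dev/null 2>&1; then
  result=$(xsltproc "$tmp/extract.xsl" "$tmp/file.xml")
else
  result=""
  echo "xsltproc not found" >&2
fi
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because normalize-space() trims surrounding whitespace, the values come out the same whether or not an element's text sits on its own line.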

ghostdog74 09-12-2009 07:38 PM

You can also combine the three patterns into one:
Code:

...
f && /createProfile|msisdn|contentProfileID/
..


Sergei Steshenko 09-13-2009 01:01 AM

I do not think XML is line oriented. If I'm right, your approach is wrong, because nobody promises that items will stay on a single line forever, i.e., one day it may become


Code:

<msisdn>
  34661461174
</msisdn>

Again, if I'm correct, use a true XML parser.
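
A small sketch of the failure mode, on a made-up reformatted fragment. The second command is not an XML parser, only an awk workaround that splits records on "<" instead of on newlines; it is an illustration under these assumptions and would still break on attributes, comments, or CDATA containing "<".

```shell
# Hypothetical reformatted input: the msisdn value sits on its own line.
sample='<createProfile>
<msisdn>
  34661461174
</msisdn>
<contentProfileID>3525</contentProfileID>'

# Line-based matching prints the tag lines but never the value line.
line_based=$(printf '%s\n' "$sample" | awk '
/createProfile/ { f = 1 }
f && /msisdn|contentProfileID/
')

# Record-based workaround: one record per "<", tag name before ">".
tag_based=$(printf '%s\n' "$sample" | awk '
BEGIN { RS = "<"; FS = ">" }
$1 == "createProfile" { f = 1 }
f && ($1 == "msisdn" || $1 == "contentProfileID") {
  value = $2
  gsub(/^[ \t\n]+|[ \t\n]+$/, "", value)   # trim surrounding whitespace
  print $1 "=" value
}
')

printf 'line based:\n%s\n\ntag based:\n%s\n' "$line_based" "$tag_based"
```

The line-based output contains the <msisdn> and </msisdn> lines without the number, which is exactly the breakage described above.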
