LinuxQuestions.org

skydive814 09-25-2008 09:51 AM

extracting elements
 
I'm stuck between a rock and a hard place...hopefully you can help.

I have a text file filled with all sorts of junk....sprinkled throughout are "proprietary" markup tags. Here's an example:

Code:

COM blah
COM blah
{{foo}}
  {{bar}}1{{/bar}} {{baz}}<MNFRAME_VAR_HERE>{{/baz}}
  {{blah}}<...>{{/blah}}    {{tag5}}off{{/tag5}}
COM junk I don't care about
COM junk I don't care about
{{foo}}
  {{bar}}2{{/bar}} {{baz}}<MNFRAME_VAR_HERE>{{/baz}}
  {{blah}}<...>{{/blah}}    {{tag5}}on{{/tag5}}

  • none of the "elements" have attributes....they are simply markup for variables coming from the mainframe
  • there can be multiple "elements" on a line
  • the amount of whitespace between elements on the same line varies. You will see
    {{...}}{{....}} {{....}}
    on one line then see
    {{...}} {{....}} {{....}}
    on the next
  • there is no closing tag for {{foo}} in this file. That's in a different file

What I need is a script that will extract all the {{???}} elements. My plan (at this point) is to pipe the output through "sort -u" and create a DB table with a column for each unique element.

I'm still early in my research on the best way to accomplish my task, but in any case, a bash/sed/awk/perl script that would return my unique elements would be very helpful.

basically....a non-greedy 'grep -e "{{.*}}" filename'
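
POSIX basic and extended regular expressions have no non-greedy quantifier, but a negated character class gives the same effect here because the match stops at the first `}`. A minimal sketch, assuming tag names never contain `}` and that grep supports `-o` (print only the matched text, as GNU and BSD grep do). The function name `extract_tags` is only a placeholder for illustration:

```shell
#!/bin/sh
# Print every distinct {{...}} tag found in the given file, one per line.
# Assumptions: tag names never contain '}', and grep supports -o.
extract_tags() {
    # [^}]* cannot run past the first '}', so each tag is matched separately
    # even when several tags sit on the same line.
    grep -o '{{[^}]*}}' "$1" | sort -u
}
```

Called as `extract_tags filename`, this lists opening and closing tags separately, so both `{{bar}}` and `{{/bar}}` appear in the output.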

any help is appreciated....

thanks,

-sky

skydive814 09-25-2008 10:13 AM

maybe something like this:
 
Code:

grep "{{" DTAGSTHD | sed 's/[ \t]*{{/{{/g' | sed 's/}}{{/}}\n{{/g' | sed 's/[ \t]*$//'
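
Note that the pipeline above keeps each element's value, so it prints lines such as `{{bar}}1{{/bar}}` rather than bare tag names. If only the distinct names are needed for the table columns, something along these lines might do. It is a sketch under a few assumptions: closing tags start with `{{/`, names never contain `}`, and grep supports `-o`. `unique_tag_names` is a made-up helper name:

```shell
#!/bin/sh
# Print the unique tag names (without braces), one per line.
# Assumptions: opening tags look like {{name}}, closing tags like {{/name}},
# names never contain '}' or start with '/', and grep supports -o.
unique_tag_names() {
    grep -o '{{[^}/][^}]*}}' "$1" |   # keep opening tags only
        sed 's/^{{//; s/}}$//' |      # strip the surrounding braces
        sort -u                       # one line per distinct name
}
```

Run against the sample from the first post, it should print bar, baz, blah, foo and tag5, one per line.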

