[SOLVED] Script to print repeated values separated by line break

ntubski · 04-10-2014, 10:37 AM

Quote:

Originally Posted by Perseus

But is not printing anything if I put the full path:

Code:

Dir.glob("C:\XMLs\latest\*.xml").each do |file|  
	My script
end

I try in cygwin and ruby for windows.

I expect you need to escape the backslashes, as in C:\\XMLs\\latest\\*.xml. If you installed ruby via cygwin you may need to use cygwin paths: /cygdrive/c/XMLs/latest/*.xml

Perseus · 04-10-2014, 01:25 PM

Hello grail and ntubski,

Thanks for your answers. I tried escaping the backslashes, but that didn't work. The full Cygwin path
worked as you said, and for Ruby on Windows I had to change "C:\XMLs\latest\*.xml" to "C:/XMLs/latest/*.xml",
with forward slashes as on Linux; it works that way.
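
For reference, a minimal sketch of why the forward-slash form works: in a Dir.glob pattern the backslash is the escape character, so it cannot serve as a path separator, even on Windows. The helper name xml_files and the temporary directory are made up for the demo.

```ruby
require 'tmpdir'

# Hypothetical helper: list the .xml files in a directory.
# Backslashes are turned into forward slashes first, because Dir.glob
# treats "\" as an escape character rather than a path separator.
def xml_files(dir)
  pattern = File.join(dir.tr('\\', '/'), '*.xml')
  Dir.glob(pattern).map { |path| File.basename(path) }.sort
end

# Demo against a throwaway directory instead of C:/XMLs/latest.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.xml'), '<a/>')
  File.write(File.join(dir, 'notes.txt'), 'not xml')
  puts xml_files(dir).inspect
end
```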

My latest code is below. I would like to know how to shorten it where possible, for example the long puts command at the end (in red), or whether the three hashes in red (mr, pk, m3) can be kept in a single array.

The other thing I tried to shorten, without success, was replacing the path in a.elements.each(...) with a variable,
like this: a.elements.each(Var << "NA"), but I don't know why the output changes when I do that.
For grail the output is correct when doing that.

Thanks again for the help.

Code:

#!/usr/bin/ruby -w
require 'rexml/document'
include REXML

print "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB\n"

Dir["/cygdrive/c/XMLs/*.xml"].each do |file|
xmlfile = File.new(file)
xmldoc = Document.new(xmlfile)
xmldoc.elements.each("REPORT-01-NUUMAX16/SReport/") {
	|e|

	print e.elements["RepName"].text + "|" + e.elements["RepIn"].text + "|"
	e.elements.each("RepIni/Report"){
		|a|

		print	a.elements["ReportType"].text, "|" +
			a.elements["ReportData/MainSec/Date"].text + "|"
				
		mr = { "NA" => [], "NRB" => [], "SubRangeB" => [], "SubRangeE" => []}
		pk = { "NA" => [], "NRB" => [], "SubRangeB" => [], "SubRangeE" => []}
		m3 = { "MXA" => [], "MXB" => [] }

		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/MR_NRanges/MRValues/MR_ValRanges/NA"){|c| mr["NA"] << c.text }
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/MR_NRanges/MRValues/MR_ValRanges/NRB"){|c| mr["NRB"] << c.text}
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/MR_NRanges/MRValues/MR_ValRanges/SubRange/SubRangeB"){|c| mr["SubRangeB"] << c.text}
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/MR_NRanges/MRValues/MR_ValRanges/SubRange/SubRangeE"){|c| mr["SubRangeE"] << c.text}	
		
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/PK_NRanges/MRValues/MR_ValRanges/NA"){|c| pk["NA"] << c.text}
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/PK_NRanges/MRValues/MR_ValRanges/NRB"){|c| pk["NRB"] << c.text}
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/PK_NRanges/MRValues/MR_ValRanges/SubRange/SubRangeB"){|c| pk["SubRangeB"] << c.text}
		a.elements.each("ReportData/MainSec/Indicators_MAX-MR/PK_NRanges/MRValues/MR_ValRanges/SubRange/SubRangeE"){|c| pk["SubRangeE"] << c.text}	
		
		a.elements.each("ReportData/MainSec/MAX03_NRanges/MXA"){|c| m3["MXA"] << c.text}
		a.elements.each("ReportData/MainSec/MAX03_NRanges/MXB"){|c| m3["MXB"] << c.text}									

        puts mr.values.map{|z| z.size==1?z:('"' + z.join(",") + '"')}.join('|') + "|" +
             pk.values.map{|z| z.size==1?z:('"' + z.join(",") + '"')}.join('|') + "|" +
             m3.values.map{|z| z.size==1?z:('"' + z.join(",") + '"')}.join('|')
	}
}
end

grail · 04-11-2014, 07:13 AM

So I had a little play and this may or may not be what you need but it goes towards explaining what I meant when I said, "somehow use the value as a reference in the hash":

Code:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

# Format the hash data as required
# Currently there is no checking to see if a field may not exist
# that is in the header

def print_data(hsh)
    hsh.each do |key,value|
        if value.is_a?(Hash)
            value.each_value{ |v| print '"' + v.join(",") + '"|' }
        else
            print value + (key =~ /^MXB/?"":"|")
        end
    end 
    puts
end

# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash

def recurse(element, hsh = {}) 

    element.elements.each do |child| 

        if child.name == 'SReport' && ! hsh.empty?
            print_data(hsh)
            hsh.clear
        end

        pk_finished = child.name =~ /^MX/?true:false

        hsh[:mr] = {} if child.name == 'MR_NRanges'
        hsh[:pk] = {} if child.name == 'PK_NRanges'

        if child.has_text? && child.text =~ /^[^[:space:]]+$/
            if pk_finished || ! hsh[:mr].is_a?(Hash)
                hsh[child.name] = child.text
            else
                if hsh[:pk].is_a?(Hash)
                    hsh[:pk][child.name] ||= []
                    hsh[:pk][child.name] << child.text
                else
                    hsh[:mr][child.name] ||= []
                    hsh[:mr][child.name] << child.text
                end
            end

        end

        recurse(child,hsh) if child.has_elements?
    end 

    hsh

end

xmldoc = Document.new File.new("input_1.xml")

puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"

print_data(recurse(xmldoc))

I did not include the option for looping but I am sure you can add that in

Let me know if any of it needs some clarification?

If we are lucky, one of the gurus like kurumi may jump in and show better ways to do some of the above
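
On the a.elements.each(Var << "NA") attempt: that version of the script is not shown, so this is only a guess, but a likely reason the output changes is that << appends to Var in place, so every later path is built on top of the previous one. A small sketch with made-up variable names:

```ruby
# .dup gives an unfrozen copy, so the demo also runs where string literals are frozen.
base = 'ReportData/MainSec/'.dup

base << 'NA'                 # modifies base itself
after_first = base.dup       # snapshot of base at this point
base << 'NRB'                # appended to the already modified string
after_second = base.dup

puts after_first             # ReportData/MainSec/NA
puts after_second            # ReportData/MainSec/NANRB

# String#+ returns a new string and leaves the prefix untouched.
prefix = 'ReportData/MainSec/'
na_path  = prefix + 'NA'
nrb_path = prefix + 'NRB'
puts na_path
puts nrb_path
```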

Perseus · 04-12-2014, 02:03 AM

Hello grail,

Just great, it seems to work and I'm trying to understand the way you did it.

I've tried to modify your last code to avoid printing every hash value surrounded with double quotes,
since the quotes are only mandatory when a value has more than one element. I need the output as below.

From column 5 to 14, if a field has one element, don't surround it with double quotes; if it has more than one element,
surround the values with "".

For example,
- column 5 (NA), in line 1, has 3 values, so it should be surrounded in the output, like this ...|"763,358,852"|...
- column 5 (NA), in line 2, has 1 value, so it shouldn't be surrounded with "", like this ...|256|...

Code:

RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB
JEUOP|KUI|Regular|2014-03-15|"763,358,852"|"91,95,76"|"000,130,200"|"899,149,299"|"441,705"|"97,98"|"786,677"|"789,859"|999|87
KURMT|MUR|Regular|2014-03-19|256|12|100|999|113|12|466|899|398|02

Many thanks again for such help.

Best regards

grail · 04-12-2014, 03:04 AM

Just need to tweak the print_data to now look more like your previous print option:

Code:

# change
value.each_value{ |v| print '"' + v.join(",") + '"|' }

# to
value.each_value{|v| print (v.size == 1?v[0]:('"' + v.join(",") + '"')) + "|" }
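
The same rule can also be pulled into a small helper, which is one way to shorten the long puts from the earlier script and to treat several hashes as one list. This is only a sketch; format_field and format_row are made-up names:

```ruby
# Hypothetical helper: a single value is printed bare,
# several values are joined with commas and wrapped in double quotes.
def format_field(values)
  values.size == 1 ? values[0] : '"' + values.join(',') + '"'
end

# Hypothetical helper: format every field of every hash, in order,
# and join the results with pipes.
def format_row(*sections)
  sections.flat_map { |s| s.values.map { |v| format_field(v) } }.join('|')
end

mr = { 'NA' => %w[763 358 852], 'NRB' => %w[91 95 76] }
m3 = { 'MXA' => ['999'], 'MXB' => ['87'] }

puts format_row(mr, m3)
# "763,358,852"|"91,95,76"|999|87
```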

Perseus · 04-13-2014, 02:38 AM

Hello grail,

The double quotes now work as expected!

I'm trying to understand your code in order to add more nodes to the extraction of values. For example, could you
explain the lines in red a little?

Code:

def print_data(hsh)
    hsh.each do |key,value|
        if value.is_a?(Hash)
            value.each_value{ |v| print '"' + v.join(",") + '"|' }
        else
            print value + (key =~ /^MXB/?"":"|") Is this supposed to check the last node to print?
        end
    end 
    puts
end

# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash

def recurse(element, hsh = {}) 

    element.elements.each do |child| 

        if child.name == 'SReport' && ! hsh.empty?
            print_data(hsh)
            hsh.clear
        end

        pk_finished = child.name =~ /^MX/?true:false what is this line for? and instead of MXA and MXB I have MXA and MZV how would change? I've tried instead of /^MX/, I've tried /^MXA/||/^MZV/. Is this correct?

        hsh[:mr] = {} if child.name == 'MR_NRanges'
        hsh[:pk] = {} if child.name == 'PK_NRanges' Can I add more nodes here to extract more values?

        if child.has_text? && child.text =~ /^[^[:space:]]+$/ This regex matches the closure of Top node?
            if pk_finished || ! hsh[:mr].is_a?(Hash)
                hsh[child.name] = child.text
            else
                if hsh[:pk].is_a?(Hash)
                    hsh[:pk][child.name] ||= []
                    hsh[:pk][child.name] << child.text what do these 2 lines mean?
                else
                    hsh[:mr][child.name] ||= []
                    hsh[:mr][child.name] << child.text
                end
            end

        end

        recurse(child,hsh) if child.has_elements?
    end 

    hsh

end

Thanks in advance for the help.

grail · 04-13-2014, 03:23 AM

Code:

print value + (key =~ /^MXB/?"":"|") Is this supposed to check the last node to print?

Yes. This assumes an element name starting with MXB will be last and hence will not need a pipe (|) after it

Code:

pk_finished = child.name =~ /^MX/?true:false what is this line for? and instead of MXA and MXB I have MXA and MZV how would change? I've tried instead of /^MX/, I've tried /^MXA/||/^MZV/. Is this correct?

This identifies when we are out of PK_NRanges, so the following elements are neither part of that hash nor need arrays created for them.
As for a change: the flag is worked out afresh for every element, so each element name that follows the pk section has to match the pattern, otherwise its value is appended to the pk hash.
For MXA and MZV the change would be:

Code:

/^M(X|Z)/

You would probably need to check that these names do not appear elsewhere in the data, as that would cause issues.
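
As for the /^MXA/||/^MZV/ attempt: =~ binds tighter than ||, so the second pattern is never matched against the element name. A small sketch of the difference; section_finished? is a made-up name:

```ruby
name = 'NA'   # an element name that should not count as "finished"

# Parsed as (name =~ /^MXA/) || /^MZV/.
# The match fails and gives nil, so the result is the Regexp object itself,
# which is truthy whatever the name is.
loose = name =~ /^MXA/ || /^MZV/
puts loose.class

# Hypothetical helper: one pattern with alternation tests the name
# against both prefixes.
def section_finished?(name)
  !(name =~ /^(MXA|MZV)/).nil?
end

puts section_finished?('NA')
puts section_finished?('MXA')
puts section_finished?('MZV')
```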

Code:

hsh[:pk] = {} if child.name == 'PK_NRanges' Can I add more nodes here to extract more values?

I am not sure what you mean here? Will there be new sections where you need to append data, like the NA ranges?

Code:

if child.has_text? && child.text =~ /^[^[:space:]]+$/ This regex matches the closure of Top node?

This has nothing to do with nodes per se; it checks whether an element contains text and, if so, whether that text consists only of non-whitespace characters.

Code:

<NA>731</NA>   #valid as text is not whitespace
<SReport>
        <blah> #the whitespace prior to <blah> is returned as the text part of that element, so not what we wanted
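
The check can be tried on plain strings, without any XML. A small sketch; the helper name leaf_text? is made up here. Note that text with a space inside it, such as "some text", fails the pattern as well.

```ruby
# Same pattern as in the script: a line made up only of non-whitespace characters.
NON_BLANK = /^[^[:space:]]+$/

# Hypothetical helper: true when the text looks like a value worth storing.
def leaf_text?(text)
  !text.nil? && !(text =~ NON_BLANK).nil?
end

puts leaf_text?('731')            # text of <NA>731</NA>
puts leaf_text?("\n        ")     # whitespace between <SReport> and <blah>
puts leaf_text?('some text')      # contains a space, so it is rejected too
```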

Code:

hsh[:pk][child.name] ||= []
hsh[:pk][child.name] << child.text what do these 2 lines mean?

The first checks whether this value has already been initialised and, if not, sets it to an empty array.
The second then appends our data to the array. If you try to append before initialising, the value is nil, and NilClass does not
have an append (<<) method.
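
A minimal sketch of those two lines on a plain hash, outside the XML code:

```ruby
h = {}

# Appending before initialising fails: h['NA'] is nil, and nil has no << method.
begin
  h['NA'] << '763'
rescue NoMethodError => e
  puts "append failed: #{e.class}"
end

h['NA'] ||= []      # nil, so an empty array is assigned
h['NA'] << '763'
h['NA'] ||= []      # already an array, so nothing changes
h['NA'] << '358'

p h
```

An alternative is to create the hash with Hash.new { |hash, key| hash[key] = [] }, which gives every missing key an empty array on first access.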

Perseus · 04-14-2014, 03:20 AM

Hello grail,

Thanks for explanation.

I've been trying to modify your last code to handle a slightly different XML input, but it shows a different output.

I was wondering whether your last code could be forced to extract only the values of the desired nodes, since the current code prints different output
if the XML input has other nodes before <SReport> and after <MAX03_NRanges>. Those other nodes are not of interest, but it seems the code is printing the values of other nodes.

Besides that, if I have another node similar to MR_NRanges and PK_NRanges, namely YU_NRanges, which comes after PK_NRanges, how must the code be changed?

I've tried adding another line, as below in red, but it is not working.

Code:

        hsh[:mr] = {} if child.name == 'MR_NRanges'
        hsh[:pk] = {} if child.name == 'PK_NRanges'
        hsh[:pk] = {} if child.name == 'YU_NRanges'

Below is the slightly different input for reference. The nodes I want to print are the same, plus YU_NRanges.

Many thanks in advance.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<REPORT-01-NUUMAX16 >
  <NUUMAX16FHeader>
    <Date1>2013-12-17</Date1>
    <CodeXV>7.4</CodeXV>
    <NUUMAX16Ver>8.91</NUUMAX16Ver>
  </NUUMAX16FHeader>
  <SReport>
    <RepName>JEUOP</RepName>
    <RepIn>KUI</RepIn>
    <RepIni>
      <Report>
        <ReportType>Regular</ReportType>
        <ReportData>
          <MainSec>
            <Date>2014-03-15</Date>
            <Indicators_MAX-MR>
              <MR_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>763</NA>
                    <NRB>91</NRB>
                    <SubRange>
                      <SubRangeB>000</SubRangeB>
                      <SubRangeE>899</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>358</NA>
                    <NRB>95</NRB>
                    <SubRange>
                      <SubRangeB>130</SubRangeB>
                      <SubRangeE>149</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>852</NA>
                    <NRB>76</NRB>
                    <SubRange>
                      <SubRangeB>200</SubRangeB>
                      <SubRangeE>299</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>				
              </MR_NRanges>
              <PK_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>441</NA>
                    <NRB>97</NRB>
                    <SubRange>
                      <SubRangeB>786</SubRangeB>
                      <SubRangeE>789</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>705</NA>
                    <NRB>98</NRB>
                    <SubRange>
                      <SubRangeB>677</SubRangeB>
                      <SubRangeE>859</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </PK_NRanges>
              <YU_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>223</NA>
                    <NRB>11</NRB>
                    <SubRange>
                      <SubRangeB>345</SubRangeB>
                      <SubRangeE>457</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </YU_NRanges>	
            </Indicators_MAX-MR>
            <MAX03_NRanges>
              <MXA>999</MXA>
              <MXB>87</MXB>
            </MAX03_NRanges>
          </MainSec>
        </ReportData>
      </Report>
    </RepIni>
  </SReport>
  <SReport>
    <RepName>KURMT</RepName>
    <RepIn>MUR</RepIn>
    <RepIni>
      <Report>
        <ReportType>Regular</ReportType>
        <ReportData>
          <MainSec>
            <Date>2014-03-19</Date>
            <Indicators_MAX-MR>
              <MR_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>256</NA>
                    <NRB>12</NRB>
                    <SubRange>
                      <SubRangeB>100</SubRangeB>
                      <SubRangeE>999</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>			
              </MR_NRanges>
              <PK_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>113</NA>
                    <NRB>12</NRB>
                    <SubRange>
                      <SubRangeB>466</SubRangeB>
                      <SubRangeE>899</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </PK_NRanges>
              <YU_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>555</NA>
                    <NRB>34</NRB>
                    <SubRange>
                      <SubRangeB>840</SubRangeB>
                      <SubRangeE>879</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </YU_NRanges>			  
            </Indicators_MAX-MR>
            <MAX03_NRanges>
              <MXA>398</MXA>
              <MXB>02</MXB>
            </MAX03_NRanges>
          </MainSec>
        </ReportData>
      </Report>
    </RepIni>
  </SReport>
</REPORT-01-NUUMAX16>

grail · 04-14-2014, 03:53 AM

The code as written will extract all nodes / elements inside an SReport (so what comes before does not matter), but as for anything after MAX03 which is still inside the report, you simply need
to place something in the print_data to tell it to stop on the MX info.

As for adding another hash, you were correct except for the fact you did not assign a new name as a key:

Code:

        hsh[:mr] = {} if child.name == 'MR_NRanges'
        hsh[:pk] = {} if child.name == 'PK_NRanges'
        hsh[:yu] = {} if child.name == 'YU_NRanges'

You will then of course need to add the corresponding section where you set the internal arrays, and change from looking for when 'pk' finishes to looking for when the last node (yu in this case)
finishes
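
One way to keep this manageable as more sections are added is a lookup table from element name to hash key, plus a variable that remembers which section is currently being filled. This is only a sketch, not the script above: SECTIONS and collect are made-up names, and a list of [name, text] pairs stands in for the walk over the XML tree.

```ruby
# Hypothetical lookup table: section element name => key used in the hash.
SECTIONS = {
  'MR_NRanges' => :mr,
  'PK_NRanges' => :pk,
  'YU_NRanges' => :yu
}.freeze

# Stand-in for the tree walk: [element name, text] pairs in document order.
# Section elements carry no text of their own, hence nil.
def collect(events)
  h = {}
  current = nil                       # key of the section being filled, if any
  events.each do |name, text|
    if SECTIONS.key?(name)
      current = SECTIONS[name]        # a new section starts
      h[current] = {}
    elsif name =~ /^MX/
      current = nil                   # the MX values close the last section
      h[name] = text
    elsif current
      (h[current][name] ||= []) << text
    else
      h[name] = text                  # plain value outside any section
    end
  end
  h
end

events = [
  ['RepName', 'JEUOP'],
  ['MR_NRanges', nil], ['NA', '763'], ['NA', '358'],
  ['PK_NRanges', nil], ['NA', '441'],
  ['YU_NRanges', nil], ['NA', '223'],
  ['MXA', '999'], ['MXB', '87']
]

p collect(events)
```

Adding a further section then only needs one more entry in the table.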

Perseus · 04-15-2014, 01:05 AM

Hello grail,

Yes, I think I am changing the code correctly to get the values for YU_NRanges, but with the code I have so far, I don't know why, it
is printing a completely different output using the input I posted in post #24.

Thanks for the help so far.

Code:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

# Format the hash data as required
# Currently there is no checking to see if a field may not exist
# that is in the header

def print_data(hsh)
    hsh.each do |key,value|
        if value.is_a?(Hash)
            #value.each_value{ |v| print '"' + v.join(",") + '"|' }
			value.each_value{|v| print (v.size == 1?v[0]:('"' + v.join(",") + '"')) + "|" }
        else
            print value + (key =~ /^MXB/?(""):("|"))
        end
    end 
    puts
end

# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash

def recurse(element, hsh = {}) 

    element.elements.each do |child| 

        if child.name == 'SReport' && ! hsh.empty?
            print_data(hsh)
            hsh.clear
        end

		yu_finished = child.name =~ /^MX/?true:false

        hsh[:mr] = {} if child.name == 'MR_NRanges'
        hsh[:pk] = {} if child.name == 'PK_NRanges'
		hsh[:yu] = {} if child.name == 'YU_NRanges'

        if child.has_text? && child.text =~ /^[^[:space:]]+$/
            if yu_finished || ! hsh[:mr].is_a?(Hash) || ! hsh[:pk].is_a?(Hash)
                hsh[child.name] = child.text
            else
				if hsh[:yu].is_a?(Hash)
					hsh[:yu][child.name] ||= []
					hsh[:yu][child.name] << child.text				
				elsif hsh[:pk].is_a?(Hash)
					hsh[:pk][child.name] ||= []
					hsh[:pk][child.name] << child.text
				else
					hsh[:mr][child.name] ||= []
					hsh[:mr][child.name] << child.text
				end
            end
        end

        recurse(child,hsh) if child.has_elements?
    end 

    hsh

end

xmldoc = Document.new File.new("input_1.xml")

puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"

print_data(recurse(xmldoc))

grail · 04-15-2014, 11:54 AM

You were close

I also made a change to the if of when to print the data so your early junk does not appear:

Code:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

def print_data(hsh)
	hsh.each do |key,value|
		if value.is_a?(Hash)
			value.each_value{|v| print (v.size == 1?v[0]:('"' + v.join(",") + '"')) + "|" }
		else
			print value + (key =~ /^MXB/?"":"|")
		end
	end
	puts
end

def recurse(element, h = {})

	element.elements.each do |n| 

		if n.name == 'SReport' && h.has_key?("RepName")
			print_data(h)
			h.clear
		end

		yu_finished = n.name =~ /^MX/?true:false

		h[:mr] = {} if n.name == 'MR_NRanges'
		h[:pk] = {} if n.name == 'PK_NRanges'
		h[:yu] = {} if n.name == 'YU_NRanges'

		if n.has_text? && n.text =~ /^[^[:space:]]+$/
			if yu_finished || ! h[:mr].is_a?(Hash)
				h[n.name] = n.text
			else
				if h[:yu].is_a?(Hash)
					h[:yu][n.name] ||= []
					h[:yu][n.name] << n.text
				elsif h[:pk].is_a?(Hash)
					h[:pk][n.name] ||= []
					h[:pk][n.name] << n.text
				else
					h[:mr][n.name] ||= []
					h[:mr][n.name] << n.text
				end
			end

		end

		recurse(n,h) if n.has_elements?
	end

	h

end

xmldoc = Document.new File.new("f2.xml")

puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"

print_data(recurse(xmldoc))

My main changes are in red

Perseus · 04-16-2014, 02:44 AM

Hello grail,

Thank you, it works fine now. It no longer prints the nodes before SReport that I don't want.

The following line

Code:

if n.name == 'SReport' && h.has_key?("RepName")

Does it work like saying "when node "RepName" matches, begin printing node values"? In other words, does this line
say that "RepName" is the first node I want to print?

There are nodes within "SReport/RepIni/ReportData/MainSec" that appear after node "MAX03_NRanges" and that I don't
want to print. How can I say that the last node I want to print is "MAX03_NRanges"?

For example, I'm not interested in printing the nodes in red below, but they are currently being printed.

Code:

  <SReport>
    <RepName>JEUOP</RepName>
    <RepIn>KUI</RepIn>
    <RepIni>
      <Report>
        <ReportType>Regular</ReportType>
        <ReportData>
          <MainSec>
            <Date>2014-03-15</Date>
            <Indicators_MAX-MR>
		.
		.
		.
            </Indicators_MAX-MR>
            <MAX03_NRanges>
              <MXA>999</MXA>
              <MXB>87</MXB>
            </MAX03_NRanges>
            <NodeXYZ>some text</NodeXYZ>
            <NodeWYU>
              <MMNJ>999</MMNJ>
              <PTO>87</PTO>
            </NodeWYU>
          </MainSec>
          <SubSec>
            .
            .
            .
          </SubSec>
        </ReportData>
      </Report>
    </RepIni>
  </SReport>

Thanks again for all the help, as usual, grail.

grail · 04-16-2014, 11:19 AM

Quote:

it works like saying, when match node "RepName", begin to print nodes values? in other words, this line
says, this "RepName" is the first node I like to print?

Not exactly. What it says is that the hash must contain "RepName" as a key for the test to be true. I found that this did not remove the guff data, but
rather just printed it along with the rest when print_data was called. The fix was simply to move the check inside the if:

Code:

        if n.name == 'SReport'
            print_data(h) if h.has_key?("RepName")
            h.clear
        end
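
To make the difference between the two guard placements concrete, here is a minimal sketch that replays a walk over plain [name, text] pairs instead of real XML. The NODES list, the "Header" junk node, and the walk helper are made up for illustration; only the two if shapes come from the code above.

Code:

```ruby
# Simulated walk: [node name, text] pairs standing in for XML elements.
NODES = [
  ['Header', 'junk'],     # hypothetical node seen before the first SReport
  ['SReport', nil],
  ['RepName', 'JEUOP'],
  ['SReport', nil],
  ['RepName', 'KLMNO']
].freeze

# guard_inside = false: original test (print and clear only when RepName is set)
# guard_inside = true:  always clear on SReport, print only when RepName is set
def walk(nodes, guard_inside)
  h = {}
  rows = []
  nodes.each do |name, text|
    if name == 'SReport'
      if guard_inside
        rows << h.values.join('|') if h.key?('RepName')
        h.clear
      elsif h.key?('RepName')
        rows << h.values.join('|')
        h.clear
      end
    end
    h[name] = text if text
  end
  rows << h.values.join('|') # final flush, like print_data(recurse(xmldoc))
  rows
end

OLD_ROWS = walk(NODES, false)
NEW_ROWS = walk(NODES, true)
puts OLD_ROWS
puts NEW_ROWS
```

With the original test, the junk collected before the first SReport is never cleared, so it rides along in the first row. With the check moved inside the if, the hash is cleared at every SReport and the junk is dropped.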

The second issue has me a little more perplexed

Firstly, due to our code that checks if yu has finished, it is actually adding the data from the extra cells into the yu hash.
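
A small sketch of why that happens, using the same branching as recurse but fed one leaf at a time. The store helper and the starting hash are assumptions for illustration, and the node names are taken from the sample XML above.

Code:

```ruby
# Same branching as the thread's recurse, reduced to a single leaf node.
def store(h, name, text)
  yu_finished = name =~ /^MX/ ? true : false
  if yu_finished || !h[:mr].is_a?(Hash)
    h[name] = text
  elsif h[:yu].is_a?(Hash)
    (h[:yu][name] ||= []) << text
  elsif h[:pk].is_a?(Hash)
    (h[:pk][name] ||= []) << text
  else
    (h[:mr][name] ||= []) << text
  end
  h
end

# State after all three *_NRanges sections have been opened.
H = { mr: {}, pk: {}, yu: {} }
[%w[MXA 999], %w[MXB 87], %w[MMNJ 999], %w[PTO 87]].each do |name, text|
  store(H, name, text)
end
p H
```

yu_finished is only true for the node currently being visited, so once MXA and MXB have passed, any later leaf such as MMNJ or PTO falls back into the branch that still sees h[:yu] as a Hash.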

Secondly, because we now use a recursive function to get the data, I think our return needs to happen either at the end of the file or when it reaches MXB.
The problem is that we cannot tell it to return when it reaches MXB, as MXB would then not be stored for printing.

So I will have a think on this one and let you know. If you come up with a solution, I would be keen to see it.

I think it is simple, but I do not seem to be seeing the forest for the trees presently.

grail · 04-16-2014, 11:26 AM

Ahhh ... see, it is not until you say it out loud (or at least type it) that an idea comes to you:

Code:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

def print_data(hsh)
    hsh.each do |key,value|
        if value.is_a?(Hash)
            value.each_value{|v| print (v.size == 1?v[0]:('"' + v.join(",") + '"')) + "|" }
        else
            print value + (key =~ /^MXB/?"":"|")
        end
    end
    puts
end

def recurse(element, h = {})

    element.elements.each do |n|

        h.clear if n.name == 'SReport'

        yu_finished = n.name =~ /^MX/?true:false

        h[:mr] = {} if n.name == 'MR_NRanges'
        h[:pk] = {} if n.name == 'PK_NRanges'
        h[:yu] = {} if n.name == 'YU_NRanges'

        if n.has_text? && n.text =~ /^[^[:space:]]+$/
            if yu_finished || ! h[:mr].is_a?(Hash)
                h[n.name] = n.text
            else
                if h[:yu].is_a?(Hash)
                    h[:yu][n.name] ||= []
                    h[:yu][n.name] << n.text
                elsif h[:pk].is_a?(Hash)
                    h[:pk][n.name] ||= []
                    h[:pk][n.name] << n.text
                else
                    h[:mr][n.name] ||= []
                    h[:mr][n.name] << n.text
                end
            end

        end

        print_data(h) if n.name == 'MXB' && h.has_key?("RepName")

        recurse(n,h) if n.has_elements?
    end

end

xmldoc = Document.new File.new("f2.xml")

puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"

recurse(xmldoc)

Of course, if "MXB" is not the last node inside "MAX03_NRanges", this will then cause issues.
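
One possible way around that caveat, offered as a sketch rather than a tested replacement for the script above: emit the row when the walk leaves "MAX03_NRanges" (that is, after recursing into it), and stop collecting until the next "SReport". The sample XML, the collect_rows name, the state hash, and the simplified flat row are all assumptions for illustration; the mr/pk/yu grouping is left out to keep it short.

Code:

```ruby
require 'rexml/document'

# Hypothetical sample input; element names follow the thread's XML.
SAMPLE_XML = <<~XML
  <Root>
    <Junk>ignored</Junk>
    <SReport>
      <RepName>JEUOP</RepName>
      <MainSec>
        <Date>2014-03-15</Date>
        <MAX03_NRanges>
          <MXB>87</MXB>
          <MXA>999</MXA>
        </MAX03_NRanges>
        <NodeXYZ>some text</NodeXYZ>
      </MainSec>
    </SReport>
  </Root>
XML

LAST_WANTED = 'MAX03_NRanges'

# Collect leaf text per report. A row is emitted once the walk has finished
# LAST_WANTED, and collection stays off until the next SReport starts.
def collect_rows(element, state = { row: {}, active: false }, rows = [])
  element.elements.each do |n|
    if n.name == 'SReport'
      state[:row] = {}
      state[:active] = true
    end

    # Only leaf nodes with non-blank, space-free text are stored.
    if state[:active] && !n.has_elements? && n.text =~ /\A\S+\z/
      state[:row][n.name] = n.text
    end

    collect_rows(n, state, rows) if n.has_elements?

    # Reached after all children of LAST_WANTED were visited, in any order.
    if n.name == LAST_WANTED && state[:active]
      rows << state[:row].values.join('|')
      state[:active] = false
    end
  end
  rows
end

ROWS = collect_rows(REXML::Document.new(SAMPLE_XML))
puts ROWS
```

Because the row is emitted after the children of "MAX03_NRanges" have been visited, the order of MXA and MXB inside it no longer matters, and anything that follows in "MainSec" or "SubSec" is ignored.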