[SOLVED] Script to print repeated values separated by line break

grail · 04-18-2014, 12:03 PM

That is because you introduced text containing spaces. As usual, given enough time you will always find some form of XML that does not fit the pattern.
However, this one is a simple fix:

Code:

xmldoc.elements[x].text =~ /^[[:alnum:]]/
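
To see what this test accepts, here is a small sketch. The sample strings are made up for illustration and only stand in for what Element#text might return; they are not taken from the real file.

```ruby
# Hypothetical text values, as Element#text might return them.
samples = ["763", "SCLMns codes", "\n      ", " padded", nil]

# Keep only values whose text begins with a letter or digit.
# nil (no text node) and whitespace-only text both fail the match.
kept = samples.select { |text| text =~ /^[[:alnum:]]/ }

p kept
```

Note that in Ruby `^` anchors at the start of any line, not only at the start of the string, so `\A` would be the stricter choice if that distinction ever matters for your data.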

Perseus · 04-19-2014, 02:41 AM

Hello grail,

Your code works great. I've tried each step one by one, and I think I understand the logic better now.

I've modified your script to print headers taken from the node names, as below.

Code:

require 'rexml/document'
include REXML

xmldoc = Document.new File.new("input.xml")

array_A = []
array_B = []
array_Headers = []

xmldoc.elements.each("//"){ |z| array_A << z.xpath.gsub(/\[.\]/,'') }

array_A.uniq.each{ |x| 
	if xmldoc.elements[x].has_text? && xmldoc.elements[x].text =~ /^[[:alnum:]]/
		array_B << xmldoc.get_elements(x).map{ |a| a.text }.join(",")
		array_Headers << x.sub(/^\/(.+\/)*(\w*)(\[\d*\])?/ ,'\2') # to get only node name
	end
}
print array_Headers.join("|") + "\n" + array_B.map{ |n| n.include?(",") ? "\"#{n}\"" : n }.join("|")

Although it prints all the values I want, the output is not presented well enough for some nodes. Using the real file, I found an issue with similar nodes that belong to different categories. What I mean is:

In the input XML, after "<SecondSec>" comes a "<ThirdSection>...</ThirdSection>" with contact names for different kinds of issues (IssuesTypeA, IssuesTypeB, IssuesTypeC, etc.). The data for all contact persons is printed, but in different columns. I want to improve the output for this contact data by printing similar nodes in the same column (grouped with commas and surrounded by double quotes, like before). That is, print the headers "Prefix,GivenName,FamilyName,JobTitle,PhoneFix,PhoneMobile,Fax,Email" only once, and group all Prefix values with commas, all GivenName values with commas, and so on.

So, instead of what I currently get (the headers repeated 3 times, once for each of the 3 contacts):

Code:

Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email|Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email|Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email
Mr|John|Jones|Controller|3333333311|5553334435|12684807476|jt@email.com|Mrs|Mary|Jean|Manager|333333|444444|55555|mj@email.com|Mrs|Joan|Roads|Representative|43434|678767|32255111|jr@email.com

I would like to get this (the headers printed only once, the data of each contact grouped by commas, and the IssueTypeX added before the prefix):

Code:

Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email
"(IssueTypeA)-Mr,(IssueTypeB)-Mrs,(IssueTypeC)-Mrs"|"John,Mary,Joan"|"Jones,Jean,Roads"|"Controller,Manager,Representative"|"3333333311,333333,43434"|"5553334435,444444,678767"|"12684807476,55555,32255111"|"jt@email.com,mj@email.com,jr@email.com"

I've added this <ThirdSection> to the input XML, which now looks as below.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<REPORT-01-NUUMAX16 >
 <NUUMAX16FHeader>
    <Date1>2013-12-17</Date1>
    <CodeXV>7.4</CodeXV>
    <NUUMAX16Ver>8.91</NUUMAX16Ver>
  </NUUMAX16FHeader> 
  <SReport>
    <RepName>JEUOP</RepName>
    <RepIn>KUI</RepIn>
    <RepIni>
      <Report>
        <ReportType>Regular</ReportType>
        <ReportData>
          <MainSec>
            <Date>2014-03-15</Date>
            <Indicators_MAX-MR>
              <MR_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>763</NA>
                    <NRB>91</NRB>
                    <SubRange>
                      <SubRangeB>000</SubRangeB>
                      <SubRangeE>899</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>358</NA>
                    <NRB>95</NRB>
                    <SubRange>
                      <SubRangeB>130</SubRangeB>
                      <SubRangeE>149</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>852</NA>
                    <NRB>76</NRB>
                    <SubRange>
                      <SubRangeB>200</SubRangeB>
                      <SubRangeE>299</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>				
              </MR_NRanges>
              <PK_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>441</NA>
                    <NRB>97</NRB>
                    <SubRange>
                      <SubRangeB>786</SubRangeB>
                      <SubRangeE>789</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>705</NA>
                    <NRB>98</NRB>
                    <SubRange>
                      <SubRangeB>677</SubRangeB>
                      <SubRangeE>859</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </PK_NRanges>
			  <YU_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>223</NA>
                    <NRB>11</NRB>
                    <SubRange>
                      <SubRangeB>345</SubRangeB>
                      <SubRangeE>457</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </YU_NRanges>	
            </Indicators_MAX-MR>
            <MAX03_NRanges>
              <MXA>999</MXA>
              <MXB>87</MXB>
            </MAX03_NRanges>
            <MAX04_NRanges>
              <MGA_GG>435</MGA_GG>
              <MGB_JG>445</MGB_JG>
            </MAX04_NRanges>
            <DNPA>0</DNPA>
            <CH>
              <CHItem>
                <Date>2008-12-31</Date>
                <Description>CH date 1</Description>
              </CHItem>
              <CHItem>
                <Date>2010-08-01</Date>
                <Description>CH date 2</Description>
              </CHItem>
            </CH>			
          </MainSec>
		  <SecondSec>
            <EdC>2011-04-01</EdC>
            <SCLst>
              <SCLIt>
                <SCLNm>SCLMns codes</SCLNm>
                <DPCList>
                  <DPCItem>
                    <SCSig>SCSig val</SCSig>
                    <SCType>KKDDH</SCType>
                    <DPC>3.4.1</DPC>
                    <Comments>ALP-091</Comments>
                  </DPCItem>
                  <DPCItem>
                    <SCSig>HSHSHS-0</SCSig>
                    <SCType>LLSO</SCType>
                    <DPC>3.10.7</DPC>
                    <Comments>ALP-88</Comments>
                  </DPCItem>
                  <DPCItem>
                    <SCSig>SCSig Bne</SCSig>
                    <SCType>WERTT</SCType>
                    <DPC>1.44.30</DPC>
                    <Comments>URI-9918</Comments>
                  </DPCItem>
                </DPCList>
              </SCLIt>
            </SCLst>
          </SecondSec>
          <ThirdSection>
	    <IssuesTypeA>
              <ContactPerson>
                <Prefix>Mr</Prefix>
                <GivenName>John</GivenName>
                <FamilyName>Jones</FamilyName>
                <JobTitle>Controller</JobTitle>
                <PhoneFixList>
                  <PhoneFix>3333333311</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>5553334435</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>12684807476</Fax>
                </FaxList>
                <EmailList>
                  <Email>jt@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeA>
            <IssuesTypeB>
              <ContactPerson>
                <Prefix>Mrs</Prefix>
                <GivenName>Mary</GivenName>
                <FamilyName>Jean</FamilyName>
                <JobTitle>Manager</JobTitle>
                <PhoneFixList>
                  <PhoneFix>333333</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>444444</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>55555</Fax>
                </FaxList>
                <EmailList>
                  <Email>mj@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeB>
            <IssuesTypeC>
              <ContactPerson>
                <Prefix>Mrs</Prefix>
                <GivenName>Joan</GivenName>
                <FamilyName>Roads</FamilyName>
                <JobTitle>Representative</JobTitle>
                <PhoneFixList>
                  <PhoneFix>43434</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>678767</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>32255111</Fax>
                </FaxList>
                <EmailList>
                  <Email>jr@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeC>
          </ThirdSection>
        </ReportData>
      </Report>
    </RepIni>
  </SReport>
</REPORT-01-NUUMAX16>

Many thanks again for your help, time and teaching.

Regards

grail · 04-19-2014, 08:38 AM

Firstly, for printing the names as headers, just use .name:

Code:

array_Headers << x.name

As for your second requirement ... it is not possible. You are now requiring knowledge of the data, whereas the current solution says we do not care about the format of the data, but if it falls under the same heading then we group it all together.

The data you have created now has completely different node names, i.e. IssuesTypeA and IssuesTypeB, hence no path will ever equal both, so they are quite correctly split into separate values.
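
A minimal sketch of why that happens, using a made-up document far smaller than the real one (the `Root` and `Values` names are assumptions for illustration only). The repeated `Values` nodes collapse to one path once the index is stripped, while the `IssuesTypeA` and `IssuesTypeB` paths stay distinct:

```ruby
require 'rexml/document'

# Hypothetical, much smaller document than the one in the thread.
xml = <<~XML
  <Root>
    <Values><NA>763</NA></Values>
    <Values><NA>358</NA></Values>
    <IssuesTypeA><Prefix>Mr</Prefix></IssuesTypeA>
    <IssuesTypeB><Prefix>Mrs</Prefix></IssuesTypeB>
  </Root>
XML

doc = REXML::Document.new(xml)

# Collect the index-free path of every element that carries data.
paths = []
REXML::XPath.each(doc, "//*") do |e|
  paths << e.xpath.gsub(/\[\d+\]/, '') if e.text =~ /^[[:alnum:]]/
end

# Group the values that share a path.
groups = paths.uniq.to_h do |path|
  [path, doc.get_elements(path).map(&:text)]
end

groups.each { |path, values| puts "#{path} => #{values.join(',')}" }
```

The two `NA` values end up under the single key `/Root/Values/NA`, but each `Prefix` keeps its own key because the parent node name is part of the path.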

I think we have probably gone far enough off the reservation with this question, as it has transformed several times. Also, as I stated earlier, you have now found yet another example where the current solution does not work. This may continue ad infinitum, as once we solve a problem you provide a new issue.

Lastly, the new sections being added do not seem to match any of the initial data, so I am not sure if you are just tacking on new things to see how to change the solution, but I will leave you with your new hurdle. From the current different solutions you may be able to cobble together the 2 solutions (which would seem to be what you are now heading for) and see what you can come up with.

Good Luck

Perseus · 04-20-2014, 02:48 AM

Hello grail,

For some reason I get an error when trying "array_Headers << x.name", but it is a minor issue.

I know it looks like I'm changing things each time, but actually I only presented a representative sample of the original XML to make it easier to understand. Your last code works correctly for what I asked; it just happened that when I tested with the complete XML I saw that issue with the contact nodes. I'll use your previous examples to try to get that output.

Many thanks again for the great help, support, patience and time provided.

Regards

grail · 04-20-2014, 09:08 AM

Here is the header part. Sorry, I did not look at the exact code; I just knew the .name option was there in REXML:

Code:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

xmldoc = Document.new File.new("f.xml")

array_A = []
array_D = []
array_H = []

xmldoc.elements.each("//"){ |z| array_A << z.xpath.gsub(/\[.\]/,'') if z.has_text? && z.text =~ /^[[:alnum:]]/ }

array_A.uniq.each do |x| 
    array_D << xmldoc.get_elements(x).map{ |a| a.text }.join(",")
    array_H << xmldoc.elements[x].name
end

puts array_H.join("|")
puts array_D.map{ |n| n.include?(',') ? "\"#{n}\"" : n }.join("|")

As I said above, I think your biggest hurdle will be to assume you do not know the data and still have it fall in line when a node does not have the same name. One thought I did have is that wildcards can be used, so you may be able to replace the path so that you have:

Code:

/REPORT-01-NUUMAX16/SReport/RepIni/Report/ReportData/ThirdSection/*/ContactPerson/Prefix

Of course this supposes you know the "IssueTypeX" nodes are going to exist, that there are multiples of them, and that you will replace all of them with this line. I am also not sure how you would then go about getting the format you specified, with the "IssueTypeX" preceding the value.

Here is an example:

Code:

array_A.map!{ |p| p.sub(/IssuesType./, '*') }
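
A rough sketch of how that substitution could play out, run against a cut-down ThirdSection that I have made up for illustration (it is used as the root here, which is not the case in the real file). The substitution has to happen before the duplicates are removed, otherwise the same wildcard path is processed three times:

```ruby
require 'rexml/document'

# Hypothetical cut-down version of the ThirdSection from the thread.
xml = <<~XML
  <ThirdSection>
    <IssuesTypeA><ContactPerson><Prefix>Mr</Prefix></ContactPerson></IssuesTypeA>
    <IssuesTypeB><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeB>
    <IssuesTypeC><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeC>
  </ThirdSection>
XML

doc = REXML::Document.new(xml)

# The three concrete paths the collection step would have produced.
paths = %w[A B C].map { |t| "/ThirdSection/IssuesType#{t}/ContactPerson/Prefix" }

# Replace the varying node name with the XPath wildcard, then drop
# the duplicates so that a single path is left.
wild = paths.map { |path| path.sub(/IssuesType./, '*') }.uniq

# One lookup now returns the Prefix of every contact.
values = doc.get_elements(wild.first).map(&:text)

puts wild.first
puts "\"#{values.join(',')}\""
```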

Perseus · 04-21-2014, 02:36 AM

Hello grail,

Thank you for the fix to get element.name, it works fine!

Thanks for the suggestion to get IssuesTypeX in the same column. I'm trying to adapt your suggestion by combining it with this option:

Code:

xmldoc.elements[x].parent.parent.name

I think that by replacing the parent.parent.name part of the path with nothing, it is not necessary to know beforehand which IssuesType it is.
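
A minimal sketch of that idea, assuming the wildcard path suggested earlier in the thread and a made-up, cut-down ThirdSection used as the root. It labels each Prefix with the name of its grandparent node, so the issue type never has to be spelled out in the script:

```ruby
require 'rexml/document'

# Hypothetical cut-down version of the ThirdSection from the thread.
xml = <<~XML
  <ThirdSection>
    <IssuesTypeA><ContactPerson><Prefix>Mr</Prefix></ContactPerson></IssuesTypeA>
    <IssuesTypeB><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeB>
    <IssuesTypeC><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeC>
  </ThirdSection>
XML

doc = REXML::Document.new(xml)

# The wildcard stands in for whichever IssuesTypeX node is present.
prefixes = doc.get_elements("/ThirdSection/*/ContactPerson/Prefix")

# Prefix -> ContactPerson -> IssuesTypeX, so two steps up gives the label.
labelled = prefixes.map { |e| "(#{e.parent.parent.name})-#{e.text}" }

puts "\"#{labelled.join(',')}\""
```

This only holds for nodes that sit directly under ContactPerson. Nodes such as PhoneFix are one level deeper (inside PhoneFixList), so they would need one more `.parent` step to reach the issue type.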

I'll continue trying.

Thanks again