LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
04-18-2014, 12:03 PM   #46
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194

That would be due to you introducing text with spaces. As usual, given enough time you will probably always find some form of XML that does not fit the pattern.
However, this one is a simple fix:
Code:
xmldoc.elements[x].text =~ /^[[:alnum:]]/
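To see what the [[:alnum:]] test accepts, here is a small sketch run on hand-picked sample strings. The values come from the input XML; the whitespace-only strings stand in for the indentation text that REXML keeps between tags by default (an assumption about how the container nodes were slipping through). Only the first character is tested, so values with spaces such as "CH date 1" are kept, while whitespace-only text is dropped.

```ruby
# Sample text values: real data from the input XML plus whitespace-only
# strings standing in for the indentation between tags.
samples = ["JEUOP", "CH date 1", "\n      ", "  ", "2014-03-15", "SCSig Bne"]

# Keep only text whose first character is a letter or digit.
kept = samples.select { |t| t =~ /^[[:alnum:]]/ }
p kept   # => ["JEUOP", "CH date 1", "2014-03-15", "SCSig Bne"]

# In Ruby, ^ matches at the start of any line, \A only at the start of the
# string. They differ for text that begins with a newline:
p("\nabc" =~ /^[[:alnum:]]/)    # => 1
p("\nabc" =~ /\A[[:alnum:]]/)   # => nil
```

If text nodes in the real file can start with a newline followed by data, \A is the stricter choice.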
 
04-19-2014, 02:41 AM   #47
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hello grail,

Your code works great. I've tried each step one by one, and I think I understand the logic better.

I've modified your script to print headers from the node names, as below.
Code:
require 'rexml/document'
include REXML

xmldoc = Document.new File.new("input.xml")

array_A = []
array_B = []
array_Headers = []

xmldoc.elements.each("//"){ |z| array_A << z.xpath.gsub(/\[.\]/,'') }

array_A.uniq.each{ |x| 
	if xmldoc.elements[x].has_text? && xmldoc.elements[x].text =~ /^[[:alnum:]]/
		array_B << xmldoc.get_elements(x).map{ |a| a.text }.join(",")
		array_Headers << x.sub(/^\/(.+\/)*(\w*)(\[\d*\])?/ ,'\2') # to get only node name
	end
}
print array_Headers.join("|") + "\n" + array_B.map{ |n| (n.include?(","))?"\"#{n}\"":n }.join("|")
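The two regular expressions in this script can be checked in isolation. This sketch uses hand-typed path strings shaped like the output of Element#xpath (the index [12] is hypothetical; the sample has only 3 DPCItem nodes). It shows that /\[.\]/ only strips one-character indices, so a node repeated ten or more times would keep its [10], [11], ... suffix, whereas /\[\d+\]/ strips any index. It also confirms that the sub() call keeps only the last node name.

```ruby
# Hand-typed paths shaped like Element#xpath output (shortened).
path  = "/REPORT-01-NUUMAX16/SReport/RepIni/Report/ReportData/MainSec/CH/CHItem[2]/Date"
multi = "/SecondSec/SCLst/SCLIt/DPCList/DPCItem[12]/DPC"

# /\[.\]/ matches exactly one character between brackets, so [12] survives.
one_char = multi.gsub(/\[.\]/, '')
# /\[\d+\]/ matches indices of any length.
any_len  = multi.gsub(/\[\d+\]/, '')
p one_char   # => "/SecondSec/SCLst/SCLIt/DPCList/DPCItem[12]/DPC"
p any_len    # => "/SecondSec/SCLst/SCLIt/DPCList/DPCItem/DPC"

# Header extraction from the script: capture group 2 is the last node name.
header = path.sub(/^\/(.+\/)*(\w*)(\[\d*\])?/, '\2')
p header     # => "Date"
```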
But even though it prints all the values I want, the output is not presented well enough for some nodes. Using the real file, I detected an issue with similar nodes from different categories. What I mean is:

In the input XML, after <SecondSec> comes a <ThirdSection>...</ThirdSection> with contact names for the different kinds of issues (IssuesTypeA, IssuesTypeB, IssuesTypeC, etc.). The data of all contact persons is printed, but in different columns. I want to improve the output for this contact data by printing similar nodes in the same column (grouped with commas and surrounded with double quotes, like before). That is, print the headers "Prefix,GivenName,FamilyName,JobTitle,PhoneFix,PhoneMobile,Fax,Email" only once, group all Prefix values with commas, group all GivenName values with commas, etc.

So, instead of what I currently get (the headers repeated 3 times for the 3 contacts):
Code:
Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email|Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email|Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email
Mr|John|Jones|Controller|3333333311|5553334435|12684807476|jt@email.com|Mrs|Mary|Jean|Manager|333333|444444|55555|mj@email.com|Mrs|Joan|Roads|Representative|43434|678767|32255111|jr@email.com
I'd like to get this (headers printed only once, each field's values for all contacts grouped by commas, and the IssuesTypeX name added before each prefix):
Code:
Prefix|GivenName|FamilyName|JobTitle|PhoneFix|PhoneMobile|Fax|Email
"(IssueTypeA)-Mr,(IssueTypeB)-Mrs,(IssueTypeC)-Mrs"|"John,Mary,Joan"|"Jones,Jean,Roads"|"Controller,Manager,Representative"|"3333333311,333333,43434"|"5553334435,444444,678767"|"12684807476,55555,32255111"|"jt@email.com,mj@email.com,jr@email.com"
I've added this <ThirdSection> to the input XML, which now looks as below:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<REPORT-01-NUUMAX16 >
 <NUUMAX16FHeader>
    <Date1>2013-12-17</Date1>
    <CodeXV>7.4</CodeXV>
    <NUUMAX16Ver>8.91</NUUMAX16Ver>
  </NUUMAX16FHeader> 
  <SReport>
    <RepName>JEUOP</RepName>
    <RepIn>KUI</RepIn>
    <RepIni>
      <Report>
        <ReportType>Regular</ReportType>
        <ReportData>
          <MainSec>
            <Date>2014-03-15</Date>
            <Indicators_MAX-MR>
              <MR_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>763</NA>
                    <NRB>91</NRB>
                    <SubRange>
                      <SubRangeB>000</SubRangeB>
                      <SubRangeE>899</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>358</NA>
                    <NRB>95</NRB>
                    <SubRange>
                      <SubRangeB>130</SubRangeB>
                      <SubRangeE>149</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>852</NA>
                    <NRB>76</NRB>
                    <SubRange>
                      <SubRangeB>200</SubRangeB>
                      <SubRangeE>299</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>				
              </MR_NRanges>
              <PK_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>441</NA>
                    <NRB>97</NRB>
                    <SubRange>
                      <SubRangeB>786</SubRangeB>
                      <SubRangeE>789</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
                <MRValues>
                  <MR_ValRanges>
                    <NA>705</NA>
                    <NRB>98</NRB>
                    <SubRange>
                      <SubRangeB>677</SubRangeB>
                      <SubRangeE>859</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </PK_NRanges>
              <YU_NRanges>
                <MRValues>
                  <MR_ValRanges>
                    <NA>223</NA>
                    <NRB>11</NRB>
                    <SubRange>
                      <SubRangeB>345</SubRangeB>
                      <SubRangeE>457</SubRangeE>
                    </SubRange>
                  </MR_ValRanges>
                </MRValues>
              </YU_NRanges>	
            </Indicators_MAX-MR>
            <MAX03_NRanges>
              <MXA>999</MXA>
              <MXB>87</MXB>
            </MAX03_NRanges>
            <MAX04_NRanges>
              <MGA_GG>435</MGA_GG>
              <MGB_JG>445</MGB_JG>
            </MAX04_NRanges>
            <DNPA>0</DNPA>
            <CH>
              <CHItem>
                <Date>2008-12-31</Date>
                <Description>CH date 1</Description>
              </CHItem>
              <CHItem>
                <Date>2010-08-01</Date>
                <Description>CH date 2</Description>
              </CHItem>
            </CH>			
          </MainSec>
          <SecondSec>
            <EdC>2011-04-01</EdC>
            <SCLst>
              <SCLIt>
                <SCLNm>SCLMns codes</SCLNm>
                <DPCList>
                  <DPCItem>
                    <SCSig>SCSig val</SCSig>
                    <SCType>KKDDH</SCType>
                    <DPC>3.4.1</DPC>
                    <Comments>ALP-091</Comments>
                  </DPCItem>
                  <DPCItem>
                    <SCSig>HSHSHS-0</SCSig>
                    <SCType>LLSO</SCType>
                    <DPC>3.10.7</DPC>
                    <Comments>ALP-88</Comments>
                  </DPCItem>
                  <DPCItem>
                    <SCSig>SCSig Bne</SCSig>
                    <SCType>WERTT</SCType>
                    <DPC>1.44.30</DPC>
                    <Comments>URI-9918</Comments>
                  </DPCItem>
                </DPCList>
              </SCLIt>
            </SCLst>
          </SecondSec>
          <ThirdSection>
            <IssuesTypeA>
              <ContactPerson>
                <Prefix>Mr</Prefix>
                <GivenName>John</GivenName>
                <FamilyName>Jones</FamilyName>
                <JobTitle>Controller</JobTitle>
                <PhoneFixList>
                  <PhoneFix>3333333311</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>5553334435</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>12684807476</Fax>
                </FaxList>
                <EmailList>
                  <Email>jt@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeA>
            <IssuesTypeB>
              <ContactPerson>
                <Prefix>Mrs</Prefix>
                <GivenName>Mary</GivenName>
                <FamilyName>Jean</FamilyName>
                <JobTitle>Manager</JobTitle>
                <PhoneFixList>
                  <PhoneFix>333333</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>444444</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>55555</Fax>
                </FaxList>
                <EmailList>
                  <Email>mj@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeB>
            <IssuesTypeC>
              <ContactPerson>
                <Prefix>Mrs</Prefix>
                <GivenName>Joan</GivenName>
                <FamilyName>Roads</FamilyName>
                <JobTitle>Representative</JobTitle>
                <PhoneFixList>
                  <PhoneFix>43434</PhoneFix>
                </PhoneFixList>
                <PhoneMobileList>
                  <PhoneMobile>678767</PhoneMobile>
                </PhoneMobileList>
                <FaxList>
                  <Fax>32255111</Fax>
                </FaxList>
                <EmailList>
                  <Email>jr@email.com</Email>
                </EmailList>
              </ContactPerson>
            </IssuesTypeC>
          </ThirdSection>
        </ReportData>
      </Report>
    </RepIni>
  </SReport>
</REPORT-01-NUUMAX16>
Many thanks again for your help, time, and teaching.

Regards
 
04-19-2014, 08:38 AM   #48
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Firstly, for printing the names as headers, just use .name:
Code:
array_Headers << x.name
As for your second requirement... it is not possible. You are now requiring knowledge of the data, whereas the current solution assumes we do not care about the format of the data:
if values fall under the same heading (path), we group them all together.

The data you have created now has completely different node names, i.e. IssuesTypeA and IssuesTypeB, so no single path will ever match both, and they are quite correctly
separated into separate values.
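To see this concretely, here is a small sketch on a trimmed copy of the <ThirdSection> sample (assumption: only <Prefix> is kept). It collects the index-free path of every <Prefix> much as the script does (using "//Prefix" instead of "//", and /\[\d+\]/ for the index stripping). Each IssuesTypeX yields its own path, so uniq keeps three paths and the script prints three columns.

```ruby
require 'rexml/document'

# Trimmed sample of <ThirdSection>: one <Prefix> per contact.
xml = <<~XML
  <ThirdSection>
    <IssuesTypeA><ContactPerson><Prefix>Mr</Prefix></ContactPerson></IssuesTypeA>
    <IssuesTypeB><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeB>
    <IssuesTypeC><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeC>
  </ThirdSection>
XML
doc = REXML::Document.new(xml)

# Collect the index-free path of every <Prefix> element.
paths = []
doc.elements.each("//Prefix") { |e| paths << e.xpath.gsub(/\[\d+\]/, '') }

# One distinct path per IssuesTypeX, hence one column per contact.
paths.uniq.each { |p| puts p }
```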

I think we have probably gone far enough off the reservation with this question, as it has transformed several times. Also, as I stated earlier, you have now found yet another example
where the current solution does not work. This may continue ad infinitum, as once we solve one problem you provide a new issue.

Lastly, the new sections being added do not seem to match any of the initial data, so I am not sure if you are just taking on new things to see how to change the solution, but
I will leave you with your new hurdle. You may be able to cobble together the 2 different solutions so far (which seems to be where you are now heading)
and see what you can come up with.

Good Luck
 
04-20-2014, 02:48 AM   #49
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled

Hello grail,

For some reason I get an error when trying "array_Headers << x.name", but it is a minor issue.

I know it looks like I'm changing things each time, but actually I only presented a representative sample
of the original XML to make it easier to understand. Your last code works correctly for what I asked; it just
happened that when I tested it with the complete XML, I saw that issue with the contact nodes. I'll use your previous
examples to try to get that output.

Many thanks again for the great help, support, patience and time provided.

Regards
 
04-20-2014, 09:08 AM   #50
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
Here is the header part. Sorry, I did not look closely at the code; I just knew the .name option was there in REXML:
Code:
#!/usr/bin/env ruby

require 'rexml/document'
include REXML

xmldoc = Document.new File.new("f.xml")

array_A = []
array_D = []
array_H = []

xmldoc.elements.each("//"){ |z| array_A << z.xpath.gsub(/\[.\]/,'') if z.has_text? && z.text =~ /^[[:alnum:]]/ }

array_A.uniq.each do |x| 
    array_D << xmldoc.get_elements(x).map{ |a| a.text }.join(",")
    array_H << xmldoc.elements[x].name
end

puts array_H.join("|")
puts array_D.map{ |n| (n.include?',')?"\"#{n}\"":n }.join("|")
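The error reported with "array_Headers << x.name" comes from x being one of the path Strings collected in array_A, not an element: Ruby's String has no name method, so the call raises NoMethodError. The code above looks the element up first with xmldoc.elements[x] and then calls .name on it. A minimal sketch, using a hypothetical one-line document:

```ruby
require 'rexml/document'

# Hypothetical tiny document; x is a path String, like the entries of array_A.
doc = REXML::Document.new("<Report><ReportData><Date>2014-03-15</Date></ReportData></Report>")
x = "/Report/ReportData/Date"

# A String has no #name method, so this raises NoMethodError.
begin
  x.name
rescue NoMethodError => err
  puts "x.name failed: #{err.class}"
end

# Elements#[] with an XPath string returns the first matching REXML::Element.
header = doc.elements[x].name
puts header   # => Date
```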
As I said above, I think your biggest hurdle will be assuming you do not know the data and still having it line up when a node does not have the same name.
One thought I did have is that XPath wildcards can be used, so you may be able to rewrite the path so you could have:
Code:
/REPORT-01-NUUMAX16/SReport/RepIni/Report/ReportData/ThirdSection/*/ContactPerson/Prefix
Of course, this supposes you know that the "IssuesTypeX" nodes are going to exist, that there are multiples, and that you will replace them all with this one line.
I am also not sure how you would then go about getting the format you specified, with the "IssuesTypeX" name preceding the value.

Here is an example:
Code:
array_A.map!{ |p| p.sub(/IssuesType./, '*') }
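Here is a small sketch of how that substitution plays out on a trimmed copy of the <ThirdSection> sample. The paths are typed by hand here (an assumption, in place of collecting them with xpath). The three per-issue paths collapse into a single wildcard path, and get_elements then returns the Prefix of every contact in one call.

```ruby
require 'rexml/document'

xml = <<~XML
  <ThirdSection>
    <IssuesTypeA><ContactPerson><Prefix>Mr</Prefix></ContactPerson></IssuesTypeA>
    <IssuesTypeB><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeB>
    <IssuesTypeC><ContactPerson><Prefix>Mrs</Prefix></ContactPerson></IssuesTypeC>
  </ThirdSection>
XML
doc = REXML::Document.new(xml)

# Hand-typed paths, one per IssuesTypeX.
paths = %w[
  /ThirdSection/IssuesTypeA/ContactPerson/Prefix
  /ThirdSection/IssuesTypeB/ContactPerson/Prefix
  /ThirdSection/IssuesTypeC/ContactPerson/Prefix
]

# Replace the issue-type step with "*", then drop the now-identical copies.
paths.map! { |p| p.sub(/IssuesType./, '*') }
paths.uniq!
p paths      # => ["/ThirdSection/*/ContactPerson/Prefix"]

# The wildcard path matches the Prefix under every IssuesTypeX.
prefixes = doc.get_elements(paths.first).map(&:text)
p prefixes
```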

Last edited by grail; 04-20-2014 at 09:16 AM.
 
04-21-2014, 02:36 AM   #51
Perseus
Member
 
Registered: Oct 2011
Posts: 179

Original Poster
Rep: Reputation: Disabled
Hello grail,

Thank you for the fix to get element.name, it works fine!

Thanks for the suggestion to get IssuesTypeX in the same column. I'm trying to adapt your suggestion by combining it with this option:
Code:
xmldoc.elements[x].parent.parent.name
I think that by replacing the parent.parent.name part of the path, there is no need to know in advance which IssuesType it is.
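Combining grail's wildcard idea with the parent.parent.name lookup gives one possible sketch. It is not tested against the full file. Assumptions: the contact data always sits under ThirdSection/<IssuesTypeX>/ContactPerson, the hypothetical fields list is written by hand, and the sample XML is trimmed to three fields. The "//" before the field name also reaches nested values such as PhoneFixList/PhoneFix. Because the labels use the real node names, they come out as "(IssuesTypeA)-Mr" rather than "(IssueTypeA)-Mr".

```ruby
require 'rexml/document'

# Trimmed sample of <ThirdSection> with three fields per contact.
xml = <<~XML
  <ThirdSection>
    <IssuesTypeA><ContactPerson><Prefix>Mr</Prefix><GivenName>John</GivenName>
      <PhoneFixList><PhoneFix>3333333311</PhoneFix></PhoneFixList></ContactPerson></IssuesTypeA>
    <IssuesTypeB><ContactPerson><Prefix>Mrs</Prefix><GivenName>Mary</GivenName>
      <PhoneFixList><PhoneFix>333333</PhoneFix></PhoneFixList></ContactPerson></IssuesTypeB>
    <IssuesTypeC><ContactPerson><Prefix>Mrs</Prefix><GivenName>Joan</GivenName>
      <PhoneFixList><PhoneFix>43434</PhoneFix></PhoneFixList></ContactPerson></IssuesTypeC>
  </ThirdSection>
XML

doc = REXML::Document.new(xml)
fields = %w[Prefix GivenName PhoneFix]   # hypothetical hand-written field list

columns = fields.map do |field|
  # "*" matches any IssuesTypeX; "//" also reaches nested nodes like PhoneFix.
  elements = doc.get_elements("//ThirdSection/*/ContactPerson//#{field}")
  values = elements.map do |e|
    if field == "Prefix"
      # Prefix -> ContactPerson -> IssuesTypeX, so parent.parent is the issue type.
      "(#{e.parent.parent.name})-#{e.text}"
    else
      e.text
    end
  end
  joined = values.join(",")
  # Quote the column only when it holds several comma-separated values.
  joined.include?(",") ? "\"#{joined}\"" : joined
end

header_line = fields.join("|")
row_line = columns.join("|")
puts header_line   # Prefix|GivenName|PhoneFix
puts row_line
```

With the sample above, the second line should read "(IssuesTypeA)-Mr,(IssuesTypeB)-Mrs,(IssuesTypeC)-Mrs"|"John,Mary,Joan"|"3333333311,333333,43434".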

I'll continue trying.

Thanks again
 
  


All times are GMT -5.
