LinuxQuestions.org
10-05-2011, 04:10 AM   #1
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Rep: Reputation: 16
Script to process xml file to get new layout


Hi everybody,

I want to change the structure of an XML file.

Maybe someone could help me with this.

I have this infile:
Code:
<?xml version="1.0" encoding="utf-8"?>
<cases>
  <file name="Reference 19762">
    <case>
      <name>CaseX - exp</name>
      <number>No. 3 Div. 870</number>
      <citation>271 Bypl. 44; 122 So. 2d 119; 2005 Bypl. MOPPE 405</citation>
      <date>July 17, 2005</date>
      <judges>Peter, Ely, Mark.</judges>
    </case>
    <case>
      <name>Comp and Indfi</name>
      <number>No. 3 Div. 887</number>
      <citation>271 Bypl. 70; 122 So. 2d 360; 2005 Bypl. MOPPE 421</citation>
      <date>July 17, 2005</date>
      <judges>Mary, Peter, Ely, Perry, Mark.</judges>
    </case>
  </file>
</cases>
and the output should be:
Code:
<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="3" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
   <Column ss:Width="81.75"/>
   <Column ss:Width="75.75"/>
   <Column ss:Width="67.5"/>
   <Column ss:Width="238.5"/>
   <Column ss:Width="62.25"/>
   <Column ss:Width="141.75"/>
   <Row>
    <Cell><Data ss:Type="String">name</Data></Cell>
    <Cell><Data ss:Type="String">name2</Data></Cell>
    <Cell><Data ss:Type="String">number</Data></Cell>
    <Cell><Data ss:Type="String">citation</Data></Cell>
    <Cell><Data ss:Type="String">date</Data></Cell>
    <Cell><Data ss:Type="String">judges</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">Reference 19762</Data></Cell>
    <Cell><Data ss:Type="String">CaseX - exp</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">271 Bypl. 44; 122 So. 2d 119; 2005 Bypl. MOPPE 405</Data></Cell>
    <Cell><Data ss:Type="String">July 17, 2005</Data></Cell>
    <Cell><Data ss:Type="String">Peter, Ely, Mark.</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">Reference 19762</Data></Cell>
    <Cell><Data ss:Type="String">Comp and Indfi</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 887</Data></Cell>
    <Cell><Data ss:Type="String">271 Bypl. 70; 122 So. 2d 360; 2005 Bypl. MOPPE 421</Data></Cell>
    <Cell><Data ss:Type="String">July 17, 2005</Data></Cell>
    <Cell><Data ss:Type="String">Mary, Peter, Ely, Perry, Mark.</Data></Cell>
   </Row>
  </Table>
As you can see, the infile contains 2 "case" blocks (there could be more than 2, e.g. 5, 7, 8, etc.).

The blocks of the output should then be obtained as follows:

Block 1:
Code:
<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="3" x:FullColumns="1"
   x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">
The variable values here (highlighted in red in the original post) are ExpandedRowCount and ExpandedColumnCount:

ExpandedRowCount = number of "case" blocks + 1 = 2 + 1 = 3
ExpandedColumnCount="6" (is always equal to 6)
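
The row-count rule above can be checked with a few lines of Python. This is only a sketch: it assumes the standard-library xml.etree.ElementTree parser, and the `sample` string is an abbreviated copy of the infile made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the infile: one <file> holding two <case> blocks.
sample = """<cases>
  <file name="Reference 19762">
    <case><name>CaseX - exp</name></case>
    <case><name>Comp and Indfi</name></case>
  </file>
</cases>"""

root = ET.fromstring(sample)

# One header row plus one row per <case> element.
expanded_row_count = len(root.findall("./file/case")) + 1
print(expanded_row_count)
```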

Block 2 (the first "Row" block for headers):
Code:
   <Row>
    <Cell><Data ss:Type="String">name</Data></Cell>
    <Cell><Data ss:Type="String">name2</Data></Cell>
    <Cell><Data ss:Type="String">number</Data></Cell>
    <Cell><Data ss:Type="String">citation</Data></Cell>
    <Cell><Data ss:Type="String">date</Data></Cell>
    <Cell><Data ss:Type="String">judges</Data></Cell>
   </Row>
This block is always the same text.

Blocks 3,4,...,N ("Row" blocks that are not for headers):
Code:
   <Row>
    <Cell><Data ss:Type="String">Reference 19762</Data></Cell>
    <Cell><Data ss:Type="String">CaseX - exp</Data></Cell>
    <Cell><Data ss:Type="String">No. 3 Div. 870</Data></Cell>
    <Cell><Data ss:Type="String">271 Bypl. 44; 122 So. 2d 119; 2005 Bypl. MOPPE 405</Data></Cell>
    <Cell><Data ss:Type="String">July 17, 2005</Data></Cell>
    <Cell><Data ss:Type="String">Peter, Ely, Mark.</Data></Cell>
   </Row>
The first value (Reference 19762, in red in the original post) is taken from the line "<file name="Reference 19762">" in the infile, which appears only once.
The remaining values in each row (in green in the original post) are taken in the same order in which they appear in the corresponding "case" block.

I can do some basic sed or awk replacements individually, but I don't know how to write a script that does
everything described above.

I hope someone can help me.

Many thanks in advance for any help.

Regards.
 
10-05-2011, 09:49 PM   #2
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
In general, this is the kind of thing XSL was invented to do:

http://www.ibm.com/developerworks/xm..._TACT=105AGX59

I would suggest one of two things:

1. XSLT approach:
a) Install Xalan on your PC (it should be available from your distro's package manager)
b) Work through the tutorial on the above link
c) See if it looks like it might be a good match

2. Non-XSLT approach:
a) Pick a scripting language. Any language: Perl, Python, Java, etc.
b) Pick the level you want to work at (use XSLT to read and write; use a DOM to read but write directly; or read and write directly)
c) Post back if you have any questions
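
As an illustration of the non-XSLT route, here is a minimal Python sketch that reads the infile with the standard-library ElementTree parser and writes the <Table> text directly. It is a sketch under stated assumptions, not a definitive implementation: the names `row`, `cases_to_table`, `HEADERS` and `WIDTHS` are made up for this example, the column widths are copied from the desired output in post #1, and every <case> is assumed to hold its five child elements in header order.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Fixed header row and column widths, copied from the desired output.
HEADERS = ["name", "name2", "number", "citation", "date", "judges"]
WIDTHS = ["81.75", "75.75", "67.5", "238.5", "62.25", "141.75"]


def row(values):
    """Render one <Row> block; cell text is XML-escaped."""
    cells = "".join(
        f'    <Cell><Data ss:Type="String">{escape(v)}</Data></Cell>\n'
        for v in values
    )
    return f"   <Row>\n{cells}   </Row>\n"


def cases_to_table(xml_text):
    """Build the <Table> fragment from the text of a <cases> document."""
    root = ET.fromstring(xml_text)
    rows = [HEADERS]
    for file_el in root.findall("file"):
        for case in file_el.findall("case"):
            # First cell: the enclosing file's name attribute; then the
            # case's child elements in document order.
            rows.append([file_el.get("name", "")]
                        + [(child.text or "").strip() for child in case])
    head = (
        f'<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="{len(rows)}" '
        'x:FullColumns="1"\n'
        '   x:FullRows="1" ss:DefaultColumnWidth="60" ss:DefaultRowHeight="15">\n'
    )
    columns = "".join(f'   <Column ss:Width="{w}"/>\n' for w in WIDTHS)
    return head + columns + "".join(row(r) for r in rows) + "</Table>\n"


# Abbreviated infile with a single <case>, for demonstration only.
sample = """<cases>
  <file name="Reference 19762">
    <case>
      <name>CaseX - exp</name>
      <number>No. 3 Div. 870</number>
      <citation>271 Bypl. 44; 122 So. 2d 119; 2005 Bypl. MOPPE 405</citation>
      <date>July 17, 2005</date>
      <judges>Peter, Ely, Mark.</judges>
    </case>
  </file>
</cases>"""

table = cases_to_table(sample)
print(table)
```

Writing the output as plain text keeps the ss: and x: prefixes exactly as shown in the desired output without having to declare their namespaces, which the fragment in post #1 does not do either.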

PS:
Xalan and XSLT can easily be "scripted" from the Linux shell.

'Hope that helps!
 
10-06-2011, 04:32 PM   #3
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
Hi paulsm4,

Many thanks for the reply. I didn't know about that tool. I'll need to read up on how to use it, because I don't know a language such as Java or Perl.

But many thanks again for your answer and for sharing that option.

Regards.
 
10-06-2011, 04:44 PM   #4
Proud
Senior Member
 
Registered: Dec 2002
Location: England
Distribution: Used to use Mandrake/Mandriva
Posts: 2,794

Rep: Reputation: 116
Seconding that XML-to-XML transformation is the point of XSL(T), which is itself written in XML. It should be a no-brainer if you understand the XML you're using; no programming language is required as long as XSLT can handle the specifics.
 
10-06-2011, 07:55 PM   #5
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
Hi Proud,

Thanks for the reference.

Do you know whether XSLT can also process the content of the XML file, and not only change its structure?

In another file I have, I need not only to change the structure of the XML but also to delete repeated names, so that all the values associated with a name end up in the same block.

Is it possible to do batch processing?

Regards.
 
10-07-2011, 05:01 AM   #6
Proud
Senior Member
 
Registered: Dec 2002
Location: England
Distribution: Used to use Mandrake/Mandriva
Posts: 2,794

Rep: Reputation: 116
I last touched XSL in uni, but I believe the contents/value of an element can be searched and manipulated as easily as any attribute or tag name.

Your specific manipulation might be classed under a distinct/choose/coalesce operator/instruction. A quick Google search suggests that a common approach is what is called the Muenchian Method (for XSLT 1.0) or, with XSLT 2.0, the <xsl:for-each-group> instruction.
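
The idea shared by both techniques is to collect every record that has the same key and then emit each key once. The following Python sketch illustrates that idea outside XSLT; it is not an XSLT stylesheet. It assumes that "repeated names" means <case> elements with the same <name>, the helper `group_cases_by_name` is hypothetical, and the third case in the sample is made-up data.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the third <case> repeats the name of the first one.
sample = """<cases>
  <file name="Reference 19762">
    <case><name>CaseX - exp</name><number>No. 3 Div. 870</number></case>
    <case><name>Comp and Indfi</name><number>No. 3 Div. 887</number></case>
    <case><name>CaseX - exp</name><number>No. 3 Div. 901</number></case>
  </file>
</cases>"""


def group_cases_by_name(xml_text):
    """Map each distinct <name> to the list of its <number> values."""
    groups = {}  # dicts keep insertion order, so first-seen order is kept
    for case in ET.fromstring(xml_text).iter("case"):
        name = case.findtext("name", default="").strip()
        number = case.findtext("number", default="").strip()
        groups.setdefault(name, []).append(number)
    return groups


groups = group_cases_by_name(sample)
for name, numbers in groups.items():
    print(name, "->", "; ".join(numbers))
```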

There also seem to be quite a few results at places like stackoverflow.com for more specific problems when using XSL to transform XML into Excel XML spreadsheets, which I think is what you're trying to do.

As for batches of transformations, again as paulsm4 points out, an implementation such as Xalan can easily be scripted with some shell programming, such as Bash.
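
A batch loop of this kind can also be sketched in Python rather than Bash. The sketch below swaps in a plain Python callable for the Xalan call: `batch_transform` and its `transform` parameter are hypothetical names, and `transform` is assumed to take the input XML text and return the output text.

```python
from pathlib import Path


def batch_transform(in_dir, out_dir, transform):
    """Apply transform(text) -> text to every *.xml file in in_dir.

    Results are written to out_dir under the same file names; the list
    of written paths is returned.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.xml")):
        dst = out_path / src.name
        dst.write_text(transform(src.read_text(encoding="utf-8")),
                       encoding="utf-8")
        written.append(dst)
    return written
```

Any function that maps input text to output text can be passed as `transform`, so the per-file logic stays separate from the directory handling.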
 
10-07-2011, 09:30 PM   #7
cgcamal
Member
 
Registered: Nov 2008
Location: Tegucigalpa
Posts: 78

Original Poster
Rep: Reputation: 16
Thanks for the answer, Proud,

I've been looking at some examples. I'll read more to learn about it and apply it; it looks
like something similar to what I want could be done with those methods.

Thanks again to both.

Regards
 
10-08-2011, 06:06 PM   #8
aspire1
Member
 
Registered: Dec 2008
Distribution: Ubuntu
Posts: 62

Rep: Reputation: 23
...and just to point out: xsltproc, which takes a stylesheet and applies it to your original XML, is probably already installed on your system.
 
  

