LinuxQuestions.org
10-01-2009, 12:17 AM #1
Wim Sturkenboom
Senior Member
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786
java xml parsing


I'm trying to parse the XML document below using Java (as part of a dedicated XML editor).
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>

<ADI>
  <Metadata>
    <AMS Asset_Name="720-PG-R18" Provider="xx" Product="SVOD" Version_Major="1" Version_Minor="0" Description="Default Description (MCA)" Creation_Date="2009-09-15" Provider_ID="mydomain.com" Asset_ID="XPPK0001253018895421" Asset_Class="package" Verb="" />
    <App_Data App="SVOD" Name="Metadata_Spec_Version" Value="TTV1.0" />
  </Metadata>
  <Asset>
    <Metadata>
      <AMS Asset_Name="720-PG-R18_Title" Provider="xx" Product="SVOD" Version_Major="1" Version_Minor="0" Description="720-PG-R18_Title" Creation_Date="2009-09-15" Provider_ID="mydomain.com" Asset_ID="XPTL0001253018895421" Asset_Class="title" Verb="" />
      <App_Data App="SVOD" Name="Type" Value="title" />
      <App_Data App="SVOD" Name="Title" Value="720p R18 20 minutes(MCA)" />
      <App_Data App="SVOD" Name="Summary_Short" Value="title.summary short" />
      <App_Data App="SVOD" Name="Display_Run_Time" Value="00:05" />
      <App_Data App="SVOD" Name="Run_Time" Value="00:05:00" />
      <App_Data App="SVOD" Name="Rating" Value="R18" />
      <App_Data App="SVOD" Name="Billing_ID" Value="12345" />
      <App_Data App="SVOD" Name="Licensing_Window_Start" Value="2009-09-22" />
      <App_Data App="SVOD" Name="Licensing_Window_End" Value="2009-09-30" />
      <App_Data App="SVOD" Name="Title_Brief" Value="720p R18(MCA)" />
      <App_Data App="SVOD" Name="Validity Date" Value="2009-12-30" />
    </Metadata>
    <Asset>
      <Metadata>
	<AMS Asset_Name="720-PG-R18_movie" Provider="xx" Product="SVOD" Version_Major="1" Version_Minor="0" Description="default description" Creation_Date="2009-09-15" Provider_ID="mydomain.com" Asset_ID="XPMV0001253018895421" Asset_Class="movie" Verb="" />
	<App_Data App="SVOD" Name="Type" Value="movie" />
	<App_Data App="SVOD" Name="Audio_Type" Value="Dolby 5.1" />
	<App_Data App="SVOD" Name="HDContent" Value="N" />
	<App_Data App="SVOD" Name="Screen_Format" Value="Widescreen" />
	<App_Data App="SVOD" Name="Content_FileSize" Value="1845780480" />
	<App_Data App="SVOD" Name="Content_CheckSum" Value="aec57f1636c1257272abb92800e25d9b" />
      </Metadata>
      <Content Value="scr_720p-H264-1AUD-Full-Movie.mpg.tar.mpg" />
    </Asset>
    <Asset>
      <Metadata>
	<AMS Asset_Name="720-PG-R18_cover" Provider="xx" Product="SVOD" Version_Major="1" Version_Minor="0" Description="default desc" Creation_Date="2009-09-18" Provider_ID="mydomain.com" Asset_ID="XPBC0001253018895421" Asset_Class="box cover" Verb="" />
	<App_Data App="SVOD" Name="Type" Value="box cover" />
	<App_Data App="SVOD" Name="Content_CheckSum" Value="846d4e4aa0dcb7aca0e9e07662db7301" />
	<App_Data App="SVOD" Name="Content_FileSize" Value="19637" />
      </Metadata>
      <Content Value="Box_Cover_Test_1.JPG" />
    </Asset>
  </Asset>
</ADI>
I've managed to find the root element (ADI), but that's about it. I'm stuck because whatever I try does not give me any of the children. The "Number of children" line, for example, tells me that there are 5 children. I don't understand that, as there are only 2 children (Metadata and Asset).
The line that prints the node type of the first child returns type 3 (TEXT_NODE), which I also don't understand; this is most likely because I'm not familiar with all the terms in the DOM/XML (I'm currently digging through 200+ pages of the W3C DOM specification).

Code:
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setCoalescing(true);
        factory.setNamespaceAware(false);
        factory.setValidating(false);

        DocumentBuilder parser = factory.newDocumentBuilder();
        Document document = parser.parse(infile);

        // get the first node (root element)
        String firstnode = document.getDocumentElement().getNodeName();
        jTextArea1.append("First node: " + firstnode + "\n");

        // get the section
        NodeList sections = document.getElementsByTagName(firstnode);
        int numSections = sections.getLength();
        // display number of sections
        jTextArea1.append("Number of sections: " + Integer.toString(numSections) + "\n");
        for (int i = 0; i < numSections; i++)
        {
            Element section = (Element) sections.item(i);

            NodeList children = section.getChildNodes();
            int numChildren = children.getLength();
            jTextArea1.append("Number of children: " + Integer.toString(numChildren) + "\n");

            Node child = section.getFirstChild();
            if (child==null)
                jTextArea1.append(">>.. no children ..<<\n");
            else
            {
                jTextArea1.append(">>" + Integer.toString(child.getNodeType()) + "<<\n");
            }
        }
So my (first) question: can somebody tell me which method to use to get the children?

Last edited by Wim Sturkenboom; 10-01-2009 at 12:18 AM.
 
10-01-2009, 12:26 AM #2
smeezekitty
Senior Member
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,220
Africa? Anyway, my approach would be to build a table of the tags the parser is currently inside, then read the table and extract the values.
I don't have code: I made one in C, but it was unstable.
 
10-01-2009, 12:37 AM #3
paulsm4
Guru
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Hi -

I cut/pasted your XML file into a text file, and looked at it under Firefox (Firefox, IE and most other browsers will show you XML in a tree view ... and let you quickly verify whether the file "looks OK" or not). I thought there might be tags out of order (or something similar) ... but the file looks fine.

I also glanced at your code ... and didn't see any obvious problems.

XML parsing in Java is really easy. Reading the W3C specs ... and trying to make any programming sense out of them ... is very hard.

SUGGESTION:
Spend a few minutes with a good, simple Java/XML tutorial, and then take another look at your code.

Here are a few examples:

http://java.sun.com/j2ee/1.4/docs/tu...AXPIntro5.html
http://java.sun.com/j2ee/1.4/docs/tu...c/JAXPDOM.html

http://www.ibm.com/developerworks/ed...GX02&S_CMP=EDU

http://java.sun.com/javase/6/docs/te...xml/index.html

Last edited by paulsm4; 10-01-2009 at 12:43 AM.
 
10-01-2009, 12:54 AM #4
Wim Sturkenboom
Senior Member
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786
Original Poster
@smeezekitty
How? To know the elements/nodes you need to parse it.

@paulsm4
Thanks for the links. I'm searching the web heavily (that's how I found the base for my code), but the links I've found so far are code examples without much explanation.
 
10-01-2009, 01:35 AM #5
smeezekitty
Senior Member
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,220
Parsing is simple:
when you encounter a "<", you start recording the tag name;
when you finish reading the tag name, you load it into the array;
when you find a "</", you start recording the tag name again,
then search the array and delete that name from it.
The array will also record tag parameters such as names, filenames, etc.
Also, this may help: http://www.xml.com/pub/a/1999/11/cplus/index.html
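The approach above amounts to keeping a stack of the tags you are currently inside. A minimal Java sketch of that idea follows; the class and method names (`NaiveTagStack`, `elementPaths`) are hypothetical. It uses a stack and requires each closing tag to match the innermost open tag, which is slightly stricter than searching the array. It assumes well-formed input with no `>` inside attribute values, comments or CDATA sections, which is exactly the kind of special case a real parser has to handle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class NaiveTagStack {
    // Scan the text once, keeping a stack of the tags we are currently inside.
    // Returns the path of every start tag, e.g. "ADI/Asset/Metadata".
    // Deliberately naive: assumes well-formed input, skips <?...?> and <!...>
    // wholesale, and breaks if '>' appears in attribute values, comments or CDATA.
    static List<String> elementPaths(String xml) {
        Deque<String> open = new ArrayDeque<>();
        List<String> paths = new ArrayList<>();
        int pos = 0;
        while ((pos = xml.indexOf('<', pos)) >= 0) {
            int end = xml.indexOf('>', pos);
            if (end < 0) throw new IllegalArgumentException("unterminated tag at " + pos);
            String tag = xml.substring(pos + 1, end).trim();
            pos = end + 1;
            if (tag.startsWith("?") || tag.startsWith("!")) {
                continue; // XML declaration, DOCTYPE or comment: ignored
            }
            if (tag.startsWith("/")) {
                // Closing tag: it must match the innermost open tag.
                String name = tag.substring(1).trim();
                if (open.isEmpty() || !open.peek().equals(name)) {
                    throw new IllegalArgumentException("unexpected </" + name + ">");
                }
                open.pop();
                continue;
            }
            boolean selfClosing = tag.endsWith("/");
            // The tag name ends at the first whitespace or '/'.
            String name = tag.split("[\\s/]", 2)[0];
            open.push(name);
            paths.add(pathOf(open));
            if (selfClosing) {
                open.pop(); // <Tag/> opens and closes in one go
            }
        }
        if (!open.isEmpty()) throw new IllegalArgumentException("unclosed <" + open.peek() + ">");
        return paths;
    }

    // An ArrayDeque used as a stack iterates innermost-first, so reverse it.
    static String pathOf(Deque<String> open) {
        List<String> parts = new ArrayList<>(open);
        Collections.reverse(parts);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><ADI><Metadata><AMS Asset_Class=\"package\" /></Metadata>"
            + "<Asset><Content Value=\"x.mpg\"/></Asset></ADI>";
        elementPaths(xml).forEach(System.out::println);
    }
}
```

Even this small sketch has to special-case declarations, self-closing tags and mismatched closing tags, which is an argument for using an existing parser.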
 
10-01-2009, 02:00 AM #6
Wim Sturkenboom
Senior Member
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786
Original Poster
OK, that's the hard way. Nothing against it; I often use it when I'm not aware of existing functions or libraries for something. I did CSV and INI file parsing that way in the past in Tcl/Tk and C.

However, I don't consider any parsing easy. There are plenty of exceptions that one must take care of, so in my opinion it's very prone to errors (a possible reason why your C code was unstable).

And why reinvent the wheel if all the functionality already exists?
 
10-01-2009, 02:27 AM #7
smeezekitty
Senior Member
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,220
Better XML than C++.
 
10-01-2009, 04:52 AM #8
CroMagnon
Member
Registered: Sep 2004
Location: New Zealand
Distribution: Debian
Posts: 900
The problem is this:

Code:
<Tag>
  <Child1/>
  <Child2/>
</Tag>
The parser does not know that the whitespace is not significant, so it returns it as a child:

Child 1: text: <newline><space><space>
Child 2: Child1 tag
Child 3: text: <newline><space><space>
Child 4: Child2 tag
Child 5: text: <newline>
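To see those five children for yourself, here is a small self-contained sketch (the class name `WhitespaceChildren` and the helper `describeChildren` are made up for illustration). It parses the example with a default, non-validating `DocumentBuilderFactory` and labels each child of the root element, making the whitespace visible:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceChildren {
    // Parse an XML string and return one description per child node of the root element.
    static List<String> describeChildren(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList children = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Show the indentation explicitly instead of printing raw newlines/spaces.
                String visible = child.getNodeValue()
                    .replace("\n", "<newline>")
                    .replace(" ", "<space>");
                result.add("text: " + visible);
            } else {
                result.add(child.getNodeName() + " tag");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Tag>\n  <Child1/>\n  <Child2/>\n</Tag>";
        List<String> lines = describeChildren(xml);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println("Child " + (i + 1) + ": " + lines.get(i));
        }
    }
}
```

The text children are the node type 3 (TEXT_NODE) entries from the first post.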

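You can ignore whitespace with setIgnoringElementContentWhitespace, but this also requires validation to be turned on, and the document needs a DTD that declares the element content, so the parser knows which whitespace is insignificant. Other options include checking for whitespace text nodes and ignoring them yourself, or adjusting the XML: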

Code:
<Tag><Child1/><Child2/></Tag>
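A sketch of the setIgnoringElementContentWhitespace route, assuming you are free to add a DTD. The internal DTD below is invented for this example (the ADI file in the first post has no DOCTYPE), and `IgnoreWhitespace`/`countRootChildren` are hypothetical names. The same document is parsed twice: once non-validating, once validating with whitespace ignoring switched on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class IgnoreWhitespace {
    // The DTD declares Tag as element-only content, which is what lets a
    // validating parser classify the indentation as ignorable whitespace.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Tag [\n"
        + "  <!ELEMENT Tag (Child1, Child2)>\n"
        + "  <!ELEMENT Child1 EMPTY>\n"
        + "  <!ELEMENT Child2 EMPTY>\n"
        + "]>\n"
        + "<Tag>\n  <Child1/>\n  <Child2/>\n</Tag>\n";

    static int countRootChildren(String xml, boolean validateAndIgnore) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validateAndIgnore);
            factory.setIgnoringElementContentWhitespace(validateAndIgnore);
            DocumentBuilder parser = factory.newDocumentBuilder();
            // Fail loudly on validation errors instead of only printing them.
            parser.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = parser.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-validating: " + countRootChildren(XML, false)); // 5
        System.out.println("validating:     " + countRootChildren(XML, true));  // 2
    }
}
```

Without a DTD (as in the ADI file) validation would fail, so for that file skipping the text nodes in code is the simpler option.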
[Edit]
Just read your question at the end - you already used the right method to get the children - you have them in your NodeList called 'children'. It's just that some of them are text. Try this:
Code:
for (int k = 0; k < numChildren; k++) {
  Node node = children.item(k);
  // skip the whitespace text nodes; only element children are printed
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    print("Node: " + node.getNodeName());
  }
}
Substitute your preferred method of output in the print() line... I just used a console program, not a GUI.
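For reuse, the same filtering can be wrapped in a small helper that returns only the element children. This is a sketch; `ElementChildren`, `elementChildren` and `rootChildNames` are hypothetical names, and the XML in `main` is a cut-down version of the ADI document from the first post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementChildren {
    // Return only the element children of a node, skipping the whitespace
    // text nodes (and any other non-element nodes) that getChildNodes() includes.
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int k = 0; k < children.getLength(); k++) {
            Node node = children.item(k);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) node);
            }
        }
        return result;
    }

    // Parse a string and list the tag names of the root element's element children.
    static List<String> rootChildNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            for (Element e : elementChildren(doc.getDocumentElement())) {
                names.add(e.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ADI>\n  <Metadata>\n    <AMS Asset_Class=\"package\"/>\n  </Metadata>\n"
            + "  <Asset>\n    <Metadata/>\n  </Asset>\n</ADI>";
        System.out.println(rootChildNames(xml)); // [Metadata, Asset]
    }
}
```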

Last edited by CroMagnon; 10-01-2009 at 05:04 AM.
 
10-02-2009, 03:35 AM #9
Wim Sturkenboom
Senior Member
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Slackware 10.1/10.2/12, Ubuntu 12.04, Crunchbang Statler
Posts: 3,786
Original Poster
@CroMagnon
The spaces were indeed the problem. I should have thought about that but I'm quite new to XML documents.

Ran into the next problem, but will start another thread if I can't solve it.

The full code (as a reference for others):
Code:
    public void FileOpen() throws IOException, ParserConfigurationException, org.xml.sax.SAXException {
    // WimS

        // An array of names for DOM node types
        // (Array indexes = nodeType() values.)
        String[] typeName = {
            "none",
            "Element",
            "Attribute",
            "Text",
            "CDATA Section",
            "Entity Reference",
            "Entity",
            "Processing Instruction",
            "Comment",
            "Document",
            "Document Type",
            "Document Fragment",
            "Notation",
        };

        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(null) != JFileChooser.APPROVE_OPTION)
        {
            return;
        }
        File infile = chooser.getSelectedFile();
        /* clear the textarea */
        jTextArea1.setText(null);
        /* display filename */
        jTextArea1.append("\nFile: " + infile.getName() + "\n");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setCoalescing(true);
        factory.setNamespaceAware(false);
        factory.setValidating(false);

        DocumentBuilder parser = factory.newDocumentBuilder();
        Document document = parser.parse(infile);

        // get the first node (root element)
        String firstnode = document.getDocumentElement().getNodeName();
        jTextArea1.append("First node: " + firstnode + "\n");

        // get the section
        NodeList sections = document.getElementsByTagName(firstnode);
        int numSections = sections.getLength();
        // display number of sections
        jTextArea1.append("Number of sections: " + Integer.toString(numSections) + "\n");
        for (int i = 0; i < numSections; i++)
        {
            Element section = (Element) sections.item(i);

            NodeList children = section.getChildNodes();
            int numChildren = children.getLength();
            jTextArea1.append("Number of children: " + Integer.toString(numChildren) + "\n");

            for (int j = 0; j < numChildren; j++)
            {
                jTextArea1.append(Integer.toString(j) + ">>" + typeName[children.item(j).getNodeType()]);
                if (children.item(j).getNodeType() == Node.ELEMENT_NODE)
                {
                    Element child = (Element) children.item(j);
                    jTextArea1.append(" " + child.getNodeName() + "<<\n");
                    jTextArea1.append("Attribute Value: '" + child.getAttribute("Value") + "'\n");
                }
                else
                {
                    jTextArea1.append("<<\n");
                }
            }
        }
    }
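The loop above only visits the direct children of `ADI`, so the nested `Asset` elements (title, movie, box cover) are never reached. One way to reach them is to recurse over element children. The console sketch below does that; the class and method names are hypothetical, output goes to `System.out` rather than `jTextArea1`, and printing `App_Data` as `Name = Value` plus the `Asset_Class` in brackets is just one choice of display.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AdiTreeDump {
    static Document parse(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            factory.setCoalescing(true);
            return factory.newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Depth-first walk over element nodes only; text nodes (including the
    // indentation whitespace) are skipped. One output line per element.
    static void dump(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (element.getTagName().equals("App_Data")) {
            out.append(element.getAttribute("Name")).append(" = ").append(element.getAttribute("Value"));
        } else {
            out.append(element.getTagName());
            // getAttribute returns "" when the attribute is not present.
            String assetClass = element.getAttribute("Asset_Class");
            if (!assetClass.isEmpty()) {
                out.append(" (").append(assetClass).append(")");
            }
        }
        out.append('\n');
        NodeList children = element.getChildNodes();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                dump((Element) child, depth + 1, out);
            }
        }
    }

    static String dumpDocument(Document document) {
        StringBuilder out = new StringBuilder();
        dump(document.getDocumentElement(), 0, out);
        return out.toString();
    }

    static String dumpXml(String xml) {
        return dumpDocument(parse(new InputSource(new StringReader(xml))));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java AdiTreeDump <file.xml>");
            return;
        }
        System.out.print(dumpDocument(parse(new InputSource(new File(args[0]).toURI().toString()))));
    }
}
```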

Last edited by Wim Sturkenboom; 10-02-2009 at 03:37 AM. Reason: Added the code
All times are GMT -5.