I've managed to find the root element (ADI), but that's about it. I'm stuck because whatever I try does not give me any of the children. The green line (the one that reports the number of children), for example, tells me that there are 5 children. I don't understand that, as there are only 2 children (metadata and asset).
The red line (the getNodeType() call) returns type 3 (TEXT_NODE), which I also don't understand; this is more than likely because I'm not familiar with all the DOM/XML terms (I'm currently digging through 200+ pages of the W3C DOM specification).
Code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringComments(true);
factory.setCoalescing(true);
factory.setNamespaceAware(false);
factory.setValidating(false);
DocumentBuilder parser = factory.newDocumentBuilder();
Document document = parser.parse(infile);

// get the first node (root element)
String firstnode = document.getDocumentElement().getNodeName();
jTextArea1.append("First node: " + firstnode + "\n");

// get the section
NodeList sections = document.getElementsByTagName(firstnode);
int numSections = sections.getLength();

// display number of sections
jTextArea1.append("Number of sections: " + Integer.toString(numSections) + "\n");

for (int i = 0; i < numSections; i++)
{
    Element section = (Element) sections.item(i);
    NodeList children = section.getChildNodes();
    int numChildren = children.getLength();
    jTextArea1.append("Number of children: " + Integer.toString(numChildren) + "\n");

    Node child = section.getFirstChild();
    if (child == null)
        jTextArea1.append(">>.. no children ..<<\n");
    else
    {
        jTextArea1.append(">>" + Integer.toString(child.getNodeType()) + "<<\n");
    }
}
So the (first) question is: can somebody tell me which method to use to get the children?
Last edited by Wim Sturkenboom; 10-01-2009 at 12:18 AM.
Distribution: Damn Small Linux, KateOs, M$ Ickdows Vista, My own OS
Posts: 2,094
Rep:
Africa?
Anyway, my approach would be to build a table of the tags the parser is currently inside,
then read the table and extract the values.
I don't have code to share: I wrote one in C, but it was unstable.
I cut/pasted your XML file into a text file, and looked at it under Firefox (Firefox, IE and most other browsers will show you XML in a tree view ... and let you quickly verify whether the file "looks OK" or not). I thought there might be tags out of order (or something similar) ... but the file looks fine.
I also glanced at your code ... and didn't see any obvious problems.
XML parsing in Java is really easy. Reading the W3C specs ... and trying to make any programming sense out of them ... is very hard.
SUGGESTION:
Spend a few minutes with a good, simple Java/XML tutorial, and then take another look at your code.
@smeezekitty
How? To know the elements/nodes you need to parse it.
@paulsm4
Thanks for the links. I'm searching the web heavily (that's how I found the base for my code), but the links I've found so far are code examples without much explanation.
Parsing is simple.
When you encounter a "<", you start recording the tag name.
When you finish reading the tag name, you load it into the array.
When you find a "</", you start recording the tag name again,
then search the array and delete that tag from it.
The array will also record tag parameters such as names, filenames, etc.
Also, this may help: http://www.xml.com/pub/a/1999/11/cplus/index.html
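For illustration only, the tag-tracking idea described in this post could look roughly like the Java sketch below. It is not the poster's C code: the class name NaiveTagScanner and the method scan are made up for the example, and it assumes simple markup with no comments, CDATA sections, entities, or ">" characters inside attribute values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: keeps a stack ("table") of the tags that are currently open.
class NaiveTagScanner {

    // Returns, for every start tag seen, the path of open tags at that moment.
    static List<String> scan(String xml) {
        Deque<String> open = new ArrayDeque<>();
        List<String> paths = new ArrayList<>();
        int pos = 0;
        while ((pos = xml.indexOf('<', pos)) >= 0) {
            int end = xml.indexOf('>', pos);
            if (end < 0) {
                break; // unterminated tag: give up
            }
            String body = xml.substring(pos + 1, end).trim();
            pos = end + 1;
            if (body.startsWith("?") || body.startsWith("!")) {
                continue; // skip the XML declaration and DOCTYPE (naively)
            }
            if (body.startsWith("/")) {
                if (!open.isEmpty()) {
                    open.removeLast(); // closing tag: drop the innermost open tag
                }
                continue;
            }
            boolean selfClosing = body.endsWith("/");
            if (selfClosing) {
                body = body.substring(0, body.length() - 1).trim();
            }
            String name = body.split("\\s+")[0]; // tag name, attributes ignored
            open.addLast(name);
            paths.add(String.join("/", open));
            if (selfClosing) {
                open.removeLast(); // <Tag/> opens and closes at once
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // The element names here are a hypothetical sample, not the original file.
        for (String path : scan("<ADI><Metadata/><Asset><Metadata/></Asset></ADI>")) {
            System.out.println(path);
        }
    }
}
```

Even this small version has to special-case declarations and self-closing tags, and it still does not check that a closing tag matches the tag it closes.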
OK, that's the hard way. Nothing against it, I use it often if I'm not aware of the existence of functions or libraries for something. I did csv and ini file parsing that way in the past in Tcl/Tk and C.
However, I don't consider any parsing easy. There are plenty of exceptions that one must take care of, so in my opinion it's very prone to errors (a possible reason why your C code was unstable).
And why reinvent the wheel if all the functionality already exists?
The parser does not know that the whitespace is not significant, so it returns it as a child:
Child 1: text: <newline><space><space>
Child 2: Child1 tag
Child 3: text: <newline><space><space>
Child 4: Child2 tag
Child 5: text: <newline>
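The child list above can be reproduced with a small console test. This is a minimal sketch: the class name WhitespaceChildrenDemo, its helper methods, and the two sample strings are made up for the example, and the document is parsed from a string rather than from the original file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class WhitespaceChildrenDemo {

    // Parses an XML string with a default (non-validating) DocumentBuilder.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Number of child nodes of the root element, whatever their type.
    static int countChildren(String xml) {
        return parse(xml).getDocumentElement().getChildNodes().getLength();
    }

    // Number of root children that are text nodes containing only whitespace.
    static int countWhitespaceTextChildren(String xml) {
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        int count = 0;
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String indented = "<Tag>\n  <Child1/>\n  <Child2/>\n</Tag>";
        String compact = "<Tag><Child1/><Child2/></Tag>";
        System.out.println(countChildren(indented));               // prints 5
        System.out.println(countWhitespaceTextChildren(indented)); // prints 3
        System.out.println(countChildren(compact));                // prints 2
    }
}
```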
You can ignore whitespace with setIgnoringElementContentWhitespace, but this also requires validation be turned on. Other options include checking for whitespace text nodes and ignoring them yourself, or adjusting the XML:
Code:
<Tag><Child1/><Child2/></Tag>
[Edit]
Just read your question at the end - you already used the right method to get the children - you have them in your NodeList called 'children'. It's just that some of them are text. Try this:
Code:
for (int k = 0; k < numChildren; k++) {
    // ELEMENT_NODE is a constant defined on org.w3c.dom.Node
    if (children.item(k).getNodeType() == Node.ELEMENT_NODE) {
        print( "Node: " + children.item(k).getNodeName() );
    }
}
Substitute your preferred method of output in the print() line... I just used a console program, not a GUI.
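If you need the element children in more than one place, the same check can be wrapped in a small helper. The sketch below is only one way to do it: the class ElementChildren, its methods, and the sample document (an ADI root with Metadata and Asset children, spelled that way as a guess) are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ElementChildren {

    // Returns only the direct children of 'parent' that are elements,
    // skipping text nodes (including whitespace-only ones).
    static List<Element> elementChildren(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int k = 0; k < children.getLength(); k++) {
            Node child = children.item(k);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    // Convenience wrapper for the example: names of the root's element children.
    static List<String> childElementNames(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            for (Element child : elementChildren(document.getDocumentElement())) {
                names.add(child.getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real file's element names may differ.
        String xml = "<ADI>\n  <Metadata/>\n  <Asset>\n    <Metadata/>\n  </Asset>\n</ADI>";
        System.out.println(childElementNames(xml)); // prints [Metadata, Asset]
    }
}
```

Calling elementChildren again on each returned element walks the tree one level at a time, which is usually what you want when the nested elements reuse the same tag names.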