[Python] Logical problem working with lists

Caesar Tjalbo · 12-23-2006, 10:01 AM

Context: I'm trying to read an XML file and return the data as a dictionary where every XML element is a key and the element's content is the corresponding value. Whenever I find an element nested inside another, I want the value to be another dictionary, so I've made a function I can call recursively.

I use an xml.sax.ContentHandler to parse the document. Simplified, I have a list of element numbers and a list of the element numbers that have a value. In my thinking, every item in the list of elements that doesn't have a value must be an element with elements inside itself.

The parsing works, example lists:

Code:

    elements = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]
    values = [2,3,4,6,7,8,10,11,12,13,14,15,16,17]

element 0: root element of the document,
element 1: first element, contains 3 elements (2,3,4),
element 2: second element, has a value,
etc.

The code simplified:

Code:

def main():
    elements = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]
    values = [2,3,4,6,7,8,10,11,12,13,14,15,16,17]
    fillDict(elements, values)

def fillDict(elements, values):
    for indexer in elements:
        if indexer in values:
            print '-IN- indexer =', indexer, 'elements =', elements
        else:
            print '-NOT IN- indexer =', indexer, 'elements =', elements
            fillDict(elements[indexer + 1:], values)
            return
    return

main()

The output:

Code:

-NOT IN- indexer = 0 elements = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
-NOT IN- indexer = 1 elements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
-IN- indexer = 3 elements = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
-IN- indexer = 4 elements = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
-NOT IN- indexer = 5 elements = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
-NOT IN- indexer = 9 elements = [9, 10, 11, 12, 13, 14, 15, 16, 17]

It skips elements 2, 6, 7, etc. I really don't see why the 'indexer' jumps from 1 to 3 and from 5 to 9. Does anybody see what I'm missing here?

taylor_venable · 12-23-2006, 10:42 AM

Argh! It's the unholy alliance of iteration and recursion!

But really, the problem you're experiencing is that indexer holds the values in the list, not the indices. Hence, when the list suddenly jumps from 1 to 3, it's because the slice elements[indexer + 1:] moved by (1 [value from indexer] + 1) = 2 positions.
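A minimal sketch of the difference, written for Python 3 and not taken from the original code: the function name visit and the use_position flag are made up for illustration. It records which items get visited when the slice start comes from the item's value (as in fillDict) versus from its position via enumerate.

```python
def visit(elements, values, use_position):
    """Return the items visited, in order, mimicking fillDict's control flow."""
    visited = []
    for position, item in enumerate(elements):
        visited.append(item)
        if item not in values:
            # fillDict slices by the item's value; slicing by its position
            # in the current (already shortened) list skips nothing.
            start = (position if use_position else item) + 1
            visited.extend(visit(elements[start:], values, use_position))
            break
    return visited


elements = list(range(18))
values = [2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17]

by_value = visit(elements, values, use_position=False)    # same skips as the output above
by_position = visit(elements, values, use_position=True)  # every element visited once
```

With the example lists, by_value comes out as [0, 1, 3, 4, 5, 9], matching the posted output, while by_position covers 0 through 17.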

Judging by your code, I think what you want can be much more easily accomplished with a list comprehension:

Code:

[x for x in elements if x in values]

This returns a list of elements that are values. If that's not what you're looking for, please elaborate on your problem a bit more and I'll try to help.
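Applied to the example lists, this could look like the sketch below (Python 3). The names leaves and containers are only illustrative, and the set is an optional speed-up for the membership test, not something the original code needs.

```python
elements = list(range(18))
values = [2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17]

value_set = set(values)  # membership tests on a set avoid rescanning the list

# Elements that have a value
leaves = [x for x in elements if x in value_set]

# Elements without a value, i.e. the ones assumed to contain other elements
containers = [x for x in elements if x not in value_set]
```

Here containers ends up as [0, 1, 5, 9], which are exactly the elements reported as -NOT IN- above.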

sundialsvcs · 12-23-2006, 07:53 PM

My usual way of thinking about such things is heavily influenced by Lisp. Fortunately, so was Python's designer.

A Python list is much more than an array!

To my way of thinking, an XML data structure is most properly represented by a list containing one three-tuple: (element_name, attributes_list, elements_list). This tuple represents the root node of the XML structure.

Both the second and the third items are, themselves, lists. The elements_list is a list of zero or more three-tuples of the format previously described. The attributes_list is a list of zero or more two-tuples of the form (attribute_name, attribute_value), where attribute_value cannot be a list but must be a simple value.
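As a rough illustration of that layout (Python 3), here is a hand-written structure and a recursive walk over it. The element names, attribute values and the helper element_paths are all invented for the example.

```python
# Each node is (element_name, attributes_list, elements_list).
document = [
    ("library", [("city", "Utrecht")], [
        ("book", [("id", "1")], []),
        ("shelf", [], [
            ("book", [("id", "2")], []),
        ]),
    ]),
]


def element_paths(nodes, prefix=""):
    """Return a slash-separated path for every element, in document order."""
    paths = []
    for name, attributes, children in nodes:
        path = prefix + "/" + name
        paths.append(path)
        # The children list has the same shape as the top-level list,
        # so the same function handles every nesting level.
        paths.extend(element_paths(children, path))
    return paths


paths = element_paths(document)
```

Because every level is a list of three-tuples, one recursive function covers the whole tree without any index arithmetic.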

If your purpose is to build a DOM-like data structure, an important part of your processing might involve a "scaffolding list," which is a push-down stack which contains references to "the nested set of things that you are presently building." As SAX notifies you that you are entering and leaving the nested structures, the topmost scaffolding-list entry tells you where you are. The scaffolding is completely consumed by the time the processing ends.
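One possible way to sketch the scaffolding idea with xml.sax (Python 3). The class name TupleTreeHandler and the attribute names tree and scaffolding are assumptions made for this example, and character data is ignored because the three-tuple described above has no slot for it.

```python
import xml.sax


class TupleTreeHandler(xml.sax.ContentHandler):
    """Build nested (name, attributes, children) tuples from SAX events."""

    def __init__(self):
        super().__init__()
        self.tree = []                  # will hold the single root three-tuple
        self.scaffolding = [self.tree]  # stack of children lists being filled

    def startElement(self, name, attrs):
        children = []
        # The top of the stack is the children list of the enclosing element.
        self.scaffolding[-1].append((name, list(attrs.items()), children))
        self.scaffolding.append(children)

    def endElement(self, name):
        self.scaffolding.pop()

    def endDocument(self):
        # Drop the entry for the top-level list; the stack is now empty.
        self.scaffolding.pop()


handler = TupleTreeHandler()
xml.sax.parseString(b'<root a="1"><x/><y><z/></y></root>', handler)
tree = handler.tree
```

After parsing, tree holds one three-tuple for root with x and y as children, and handler.scaffolding is empty again.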

If you need to provide an index to the DOM structure, additional data structures can be built alongside the DOM to serve that purpose.

Caesar Tjalbo · 12-27-2006, 03:58 AM

Thank you for your answers.
@ taylor_venable: I'm sure it was "the unholy alliance of iteration and recursion" (LOL) that plagued me, but as a (in this case, private) programmer I like to live on the edge and haven't found that many risky things in Python yet...
@ sundialsvcs: your comment made me reassess what I wanted to do, but I determined that another representation of the data wasn't useful to me. I tried to code a generic class that accepted a dict and returned a dict, with nothing more advanced than the original data types as attributes. You were right in saying that I was in fact building a DOM structure within the dict, so I didn't need to stick to SAX as a processing mechanism.

Since I already spent more time on this than anticipated, I took the easy way: I searched for code on the net and found an example which uses ElementTree. That was educational for me, solved my problem and provided an elegant and extensible way of dealing with XML. With my 'generic' class finished and put in a module, I don't need to bother with XML again.
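The example found on the net isn't reproduced here. As a hypothetical sketch only, the nested-dictionary idea from the first post could be written with ElementTree roughly as follows (Python 3). The function name element_to_dict and the sample XML are invented, and repeated sibling tags would overwrite each other in this simple version.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Map each child tag to its text, or to a nested dict if it has children."""
    result = {}
    for child in element:
        if len(child):  # the child contains elements of its own
            result[child.tag] = element_to_dict(child)
        else:
            result[child.tag] = child.text
    return result


root = ET.fromstring(
    "<config><db><host>localhost</host><port>5432</port></db>"
    "<name>demo</name></config>"
)
data = {root.tag: element_to_dict(root)}
```

For the sample document, data is {"config": {"db": {"host": "localhost", "port": "5432"}, "name": "demo"}}; note that all values stay strings.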