python xml dom help please

  • deglog

    python xml dom help please

    Apologies if this post appears more than once.

    The file -

    ---------------
    <?xml version="1.0" encoding="utf-8"?>
    <Game><A/><B/><C/></Game>
    ---------------

    is processed by this program -

    ---------------
    #!/usr/bin/env python

    from xml.dom.ext.reader import PyExpat
    from xml.dom.ext import PrettyPrint

    import sys

    def deepen(nodeList):
        for node in nodeList:
            print(node.nodeName)
            if node.previousSibling != None:
                if node.previousSibling.nodeType == node.ELEMENT_NODE:
                    if node.previousSibling.hasChildNodes():
                        print("has children")
                        node.previousSibling.lastChild.appendChild(node)
                    else:
                        node.previousSibling.appendChild(node)
            deepen(node.childNodes)

    # get DOM object
    reader = PyExpat.Reader()
    doc = reader.fromUri(sys.argv[1])

    # call func
    deepen(doc.childNodes)

    # display altered document
    PrettyPrint(doc)
    ---------------

    which outputs the following -

    ---------------
    Game
    Game
    A
    B
    <?xml version='1.0' encoding='UTF-8'?>
    <Game>
    <A>
    <B/>
    </A>
    <C/>
    </Game>

    ---------------

    Can anybody explain why the line 'print(node.nodeName)' never prints 'C'?

    Also, why 'has children' is never printed?

    I am trying to output

    ---------------
    <?xml version='1.0' encoding='UTF-8'?>
    <Game>
    <A>
    <B>
    <C/>
    </B>
    </A>
    </Game>
    ---------------

    I know there are easier ways to do this, but i want to do it using dom.

    Thanks in advance.
  • Miklós

    #2
    Re: python xml dom help please

    Without having any thorough look at your (recursive) 'deepen' function, I can
    see there's no termination condition for the recursion....
    So that's one reason this won't work the way you want it to.

    Miklós


    deglog <spam.meplease@ntlworld.com> wrote in message
    news:f78fb98.0311230813.3ab7cfd4@posting.google.com...
    >
    > ---------------
    > #!/usr/bin/env python
    >
    > from xml.dom.ext.reader import PyExpat
    > from xml.dom.ext import PrettyPrint
    >
    > import sys
    >
    > def deepen(nodeList):
    >     for node in nodeList:
    >         print(node.nodeName)
    >         if node.previousSibling != None:
    >             if node.previousSibling.nodeType == node.ELEMENT_NODE:
    >                 if node.previousSibling.hasChildNodes():
    >                     print("has children")
    >                     node.previousSibling.lastChild.appendChild(node)
    >                 else:
    >                     node.previousSibling.appendChild(node)
    >         deepen(node.childNodes)
    >




    • Diez B. Roggisch

      #3
      Re: python xml dom help please

      Miklós wrote:
      > Without having any thorough look at your (recursive) 'deepen' function, I
      > can see there's no termination condition for the recursion....
      > So that's one reason this won't work the way you want it to.

      Nope - he has a termination condition. deepen is called for all childNodes,
      so he makes a traversal of all nodes.
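To illustrate the point about termination: a leaf node has an empty childNodes list, so the loop body never runs there and the recursion bottoms out on its own. A minimal sketch, assuming the standard library's xml.dom.minidom in place of the 4DOM reader from the original post; the helper name `visit` is made up for this example.

```python
from xml.dom import minidom

def visit(node_list, seen):
    """Depth-first walk that records node names."""
    for node in node_list:
        seen.append(node.nodeName)
        # For a leaf, childNodes is empty, so this call returns immediately.
        visit(node.childNodes, seen)

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
names = []
visit(doc.childNodes, names)
print(names)
```

As long as nothing is moved during the walk, every node is visited exactly once.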

      Regards,

      Diez


      • Diez B. Roggisch

        #4
        Re: python xml dom help please

        Hi,
        >
        > Also, why 'has children' is never printed?

        The code is somewhat complicated, however the reason for "has children" not
        being printed is simply that for the example no node matches the condition
        - nodes A,B,C are the only ones with siblings, and none of them has a child
        node....

        > I know there are easier ways to do this, but i want to do it using dom.

        I'm not sure what easier ways _you_ think of - but to me it looks like a
        classic field for XSLT, which is much more convenient to deal with. DOM is
        usually PIA, don't mess around with it if you're not forced to.

        Diez


        • Andrew Clover

          #5
          Re: python xml dom help please

          spam.meplease@ntlworld.com (deglog) wrote:

          > def deepen(nodeList):
          >     for node in nodeList:
          >         [...]
          >         node.previousSibling.appendChild(node)

          Bzzt: destructive iteration gotcha.

          DOM NodeLists are 'live': when you move a child Element out of the parent,
          it no longer exists in the childNodes list. So in the example:

          <a/>
          <b/>
          <c/>

          the first element (a) cannot be moved and is skipped; the second element (b)
          is moved into its previousSibling (a); the third element... wait, there is no
          third element any more because (c) is now the second element. So the loop
          stops.

          A solution would be to make a static copy of the list beforehand. There's no
          standard-DOM way of doing that and the Python copy() method is not guaranteed
          to work here, so use a list comprehension or map:

          identity = lambda x: x
          for node in map(identity, nodeList):
              ...
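The skipped element can be reproduced with the standard library. This is only a sketch and assumes xml.dom.minidom rather than 4DOM; minidom's childNodes is a list subclass that is modified in place when a child is moved, so iterating over it directly shows the same effect.

```python
from xml.dom import minidom

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
game = doc.documentElement

visited = []
for node in game.childNodes:           # iterating the live list itself
    visited.append(node.nodeName)
    prev = node.previousSibling
    if prev is not None and prev.nodeType == node.ELEMENT_NODE:
        prev.appendChild(node)         # also removes node from game.childNodes

after = game.toxml()
print(visited)
print(after)
```

Once B has moved into A, the list is down to two items (A and C), the iterator is already past index 1, and C is never reached.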

          --
          Andrew Clover
          mailto:and@doxdesk.com


          • John J. Lee

            #6
            Re: python xml dom help please

            and-google@doxdesk.com (Andrew Clover) writes:

            > spam.meplease@ntlworld.com (deglog) wrote:
            [...]
            > A solution would be to make a static copy of the list beforehand. There's no
            > standard-DOM way of doing that and the Python copy() method is not guaranteed
            > to work here, so use a list comprehension or map:
            >
            > identity = lambda x: x
            > for node in map(identity, nodeList):
            >     ...

            Why not just

            for node in list(nodeList):
                ...

            ?
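A sketch of the original deepen() with only that change applied, again assuming xml.dom.minidom instead of the 4DOM reader. One reason to prefer list() today: in Python 3, map() returns a lazy iterator rather than a list, so the map(identity, ...) variant would no longer take a snapshot there, while list() copies in both Python 2 and 3.

```python
from xml.dom import minidom

def deepen(node_list):
    for node in list(node_list):       # snapshot, unaffected by later moves
        prev = node.previousSibling
        if prev is not None and prev.nodeType == node.ELEMENT_NODE:
            if prev.hasChildNodes():
                prev.lastChild.appendChild(node)
            else:
                prev.appendChild(node)
        deepen(node.childNodes)

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
deepen(doc.childNodes)
result = doc.documentElement.toxml()
print(result)
```

With the snapshot, C is visited too; by then its previous sibling is A, which already contains B, so C ends up inside B.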


            John


            • deglog

              #7
              Re: python xml dom help please

              Thanks for the help - this works and i understand how, and why.

              jjl@pobox.com (John J. Lee) wrote in message news:<87isl89shv.fsf@pobox.com>...

              >
              > Why not just
              >
              > for node in list(nodeList):
              >     ...
              >
              > ?
              >
              >
              > John

              the following also works (as i intended):

              from xml.dom.NodeFilter import NodeFilter

              def appendToDescendant(node):
                  walker.previousSibling()
                  while 1:
                      if walker.currentNode.hasChildNodes():
                          next = walker.nextNode()
                      else: break
                  walker.currentNode.appendChild(node)

              walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                                            None, 0)
              while 1:
                  print walker.currentNode.nodeName
                  if walker.currentNode.previousSibling != None:
                      print "ps "+walker.currentNode.previousSibling.nodeName
                      if walker.currentNode.previousSibling.nodeName != "Game":
                          if walker.currentNode.previousSibling.hasChildNodes():
                              appendToDescendant(walker.currentNode)
                          else:
                              walker.currentNode.previousSibling.appendChild(walker.currentNode)
                  next = walker.nextNode()
                  if next is None: break

              Strangely, the line checking "Game" is needed, because this first node
              is its own previous sibling - how can this be right?

              For example, with the input file:
              ---
              <?xml version="1.0" encoding="utf-8"?>
              <Game/>
              ---
              the output is:
              ---
              Game
              ps Game
              <?xml version='1.0' encoding='UTF-8'?>
              <Game/>


              • Andrew Clover

                #8
                Re: python xml dom help please

                John J. Lee <jjl@pobox.com> wrote:

                > Why not just for node in list(nodeList)?

                You're right! I never trusted list() to make a copy if it was already a
                native list (as it is sometimes in eg. minidom) but, bothering to check the
                docs, it is guaranteed to after all. Hurrah.

                spam.meplease@ntlworld.com (deglog) wrote:

                > def appendToDescendant(node):
                >     walker.previousSibling()
                >     while 1:
                >         if walker.currentNode.hasChildNodes():
                >             next = walker.nextNode()
                >         else: break
                >     walker.currentNode.appendChild(node)

                Are you sure this is doing what you want? A TreeWalker's nextNode() method
                goes to a node's next matching sibling, not into its children. To go into
                the matching children you'd use TreeWalker.firstChild().

                The function as written above appends the argument node to the first sibling
                to have no child nodes, starting from the TreeWalker's current node or its
                previous sibling if there is one.

                I'm not wholly sure I understand the problem you're trying to solve. If you
                just want to nest sibling elements as first children, you could do it without
                Traversal or recursion, for example:

                def nestChildrenIntoFirstElements(parent):
                    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
                    if len(elements) >= 2:
                        insertionPoint = elements[0]
                        for element in elements[1:]:
                            insertionPoint.appendChild(element)
                            insertionPoint = element

                (Untested but no reason it shouldn't work.)
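For anyone who wants to try it, here is the same function run against the sample document from the first post. This is a sketch that assumes xml.dom.minidom in place of 4DOM; the function body is unchanged apart from comments.

```python
from xml.dom import minidom

def nestChildrenIntoFirstElements(parent):
    # The list comprehension takes a static copy, so moving nodes is safe.
    elements = [c for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]
    if len(elements) >= 2:
        insertionPoint = elements[0]
        for element in elements[1:]:
            # Each element becomes a child of the element before it.
            insertionPoint.appendChild(element)
            insertionPoint = element

doc = minidom.parseString("<Game><A/><B/><C/></Game>")
nestChildrenIntoFirstElements(doc.documentElement)
nested = doc.documentElement.toxml()
print(nested)
```

Because the element list is built before anything is moved, the live-list problem does not arise here.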
                > Strangely, the line checking "Game" is needed, because this firstnode
                > is its own previous sibling - how can this be right?

                4DOM is fooling you. It has inserted a <!DOCTYPE> declaration automatically
                for you. (It probably shouldn't do that.) So the previous sibling of the
                documentElement is the doctype; of course the doctype has the same nodeName
                as the documentElement, so the debugging output is misleading.
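Printing nodeType next to nodeName makes the two nodes distinguishable. A small sketch, assuming xml.dom.minidom; unlike the 4DOM behaviour described above, minidom only creates a doctype node when the source declares one, so the input here includes an explicit <!DOCTYPE Game>.

```python
from xml.dom import minidom, Node

doc = minidom.parseString("<!DOCTYPE Game><Game/>")
root = doc.documentElement
prev = root.previousSibling

# Same nodeName, different nodeType.
print(root.nodeName, root.nodeType == Node.ELEMENT_NODE)
print(prev.nodeName, prev.nodeType == Node.DOCUMENT_TYPE_NODE)
```

Testing nodeType against ELEMENT_NODE, as the first version of deepen() did, avoids having to compare against the name "Game".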

                --
                Andrew Clover
                mailto:and@doxdesk.com


                • deglog

                  #9
                  Re: python xml dom help please

                  and-google@doxdesk.com (Andrew Clover) wrote in message news:<2c60a528.0311261642.5478397d@posting.google.com>...

                  >
                  > > def appendToDescendant(node):
                  > >     walker.previousSibling()
                  > >     while 1:
                  > >         if walker.currentNode.hasChildNodes():
                  > >             next = walker.nextNode()
                  > >         else: break
                  > >     walker.currentNode.appendChild(node)
                  >
                  > Are you sure this is doing what you want? A TreeWalker's nextNode() method
                  > goes to a node's next matching sibling, not into its children. To go into
                  > the matching children you'd use TreeWalker.firstChild().

                  right
                  >
                  > I'm not wholly sure I understand the problem you're trying to solve.

                  actually i'm trying to change the relationship 'is next sibling of' to
                  'is child of' throughout a document

                  my latest idea is to go to the end of the document, then walk it
                  backwards (for christmas?:-) towards this end i wrote:
                  ---
                  walker = doc.createTreeWalker(doc.documentElement, NodeFilter.SHOW_ELEMENT,
                                                None, 0)
                  while 1:
                      print '1 '+walker.currentNode.nodeName
                      next = walker.nextNode()
                      if next is None: break
                  print '2 '+walker.currentNode.nodeName
                  ---
                  which, given
                  ---
                  <?xml version="1.0" encoding="utf-8"?>
                  <Game><A/></Game>

                  ---
                  outputs
                  ---
                  1 Game
                  1 A
                  2 Game
                  ---
                  foiled again. How come the current node is back at the start after
                  the loop has finished?


                  • Andrew Clover

                    #10
                    Re: python xml dom help please

                    spam.meplease@ntlworld.com (deglog) wrote:

                    > actually i'm trying to change the relationship 'is next sibling of' to
                    > 'is child of' throughout a document

                    Well, the snippet in the posting above should do that well enough. What
                    happens to any existing nested children is not defined.

                    > How come the current node is back at the start after the loop has finished?

                    Bug. I've just submitted a patch to the PyXML tracker to address this issue.

                    (Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
                    bugs, that can lead to infinite recursion.)

                    That said, I'm not sure how using a TreeWalker or walking backwards actually
                    helps you here! If you are just using it to filter out non-element children,
                    remember that moving the current node takes the position of the TreeWalker
                    with it. It's not like NodeIterator.

                    --
                    Andrew Clover
                    mailto:and@doxdesk.com


                    • deglog

                      #11
                      Re: python xml dom help please

                      and-google@doxdesk.com (Andrew Clover) wrote in message news:<2c60a528.0311291051.33b7d789@posting.google.com>...

                      >
                      > Bug. I've just submitted a patch to the PyXML tracker to address this issue.
                      >
                      > (Note: earlier versions of TreeWalker - certainly 0.8.0 - have more significant
                      > bugs, that can lead to infinite recursion.)
                      >

                      Thanks.

                      Does the function def __regress(self) from the same package need a similar fix?

                      (i am using PyXml 0.8.3)


                      • Andrew Clover

                        #12
                        Re: python xml dom help please

                        spam.meplease@ntlworld.com (deglog) wrote:

                        > Does the function def __regress(self) from the same package need a similar
                        > fix?

                        Nope, looks OK to me. There's no 'in between' state where the current node
                        ends up pointing somewhere it shouldn't in this one, because of the different
                        order of the next/previous-sibling step and the move-through-ancestor/descendant
                        step.

                        I haven't checked all of the rest of the code, though, so I can't guarantee
                        there aren't any other problems with 4DOM's Traversal/Range implementation.

                        --
                        Andrew Clover
                        mailto:and@doxdesk.com
