xml minidom redundant children??

  • bkamrani@gmail.com

    xml minidom redundant children??

    Great guys:

    As a newbie, I'm trying to simply parse a xml file using minidom, but
    I don't know why I get some extra children(?). I don't know what is
    wrong in xml file, but I've tried different xml files, still same
    problem.

    ******************************************************************************
    xml file (fileTest) looks like:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
    <afc:Bibliography>
    <File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
    <Revision version="2" />
    <Application version="02.00.00" />
    </afc:Bibliography>
    </afc>
    ******************************************************************************
    Python file looks like:
    from xml.dom import minidom
    doc = minidom.parse(fileTest)
    a = doc.documentElement.childNodes
    print(a)
    print('--------------')
    for item in a:
        print(item.nodeName)
    ******************************************************************************
    And output is:
    [<DOM Text node "\n">, <DOM Element: afc:Bibliography at 12082960>,
    <DOM Text node "\n">]
    --------------
    #text
    afc:Bibliography
    #text
    ******************************************************************************

    My question is why this <DOM Text node "\n"> or #text has been
    created and how to get rid of them by changing python code? (here I'm
    not interested to change xml file.)

    Have searched the forum without finding any solution :-(

    Thank you to all in advance!!
    /Ben

  • Marc 'BlackJack' Rintsch

    #2
    Re: xml minidom redundant children??

    In <1172774813.293437.108250@v33g2000cwv.googlegroups.com>, bkamrani
    wrote:
    > My question is why this <DOM Text node "\n"> or #text has been
    > created and how to get rid of them by changing python code? (here I'm
    > not interested to change xml file.)

    They have been created because the text is in the XML source. Line breaks
    are valid text.
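
To make the point above concrete, here is a minimal sketch. The two sample documents are made up for illustration (they are not the original poster's file); the only difference between them is the line breaks between the tags.

```python
from xml.dom import minidom

# Hypothetical sample documents: identical except for the line breaks.
compact = minidom.parseString("<root><child/></root>")
spaced = minidom.parseString("<root>\n<child/>\n</root>")


def child_names(doc):
    """Return the nodeName of each direct child of the document element."""
    return [node.nodeName for node in doc.documentElement.childNodes]


print(child_names(compact))  # ['child']
print(child_names(spaced))   # ['#text', 'child', '#text']
```

The parser keeps every character between the tags, so each line break shows up as its own text node.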

    Ciao,
    Marc 'BlackJack' Rintsch

    • Diez B. Roggisch

      #3
      Re: xml minidom redundant children??

      bkamrani@gmail.com wrote:
      > My question is why this <DOM Text node "\n"> or #text has been
      > created and how to get rid of them by changing python code? (here I'm
      > not interested to change xml file.)
      >
      > Have searched the forum without finding any solution :-(

      It can't get rid of them by itself - xml.dom.minidom can't possibly know
      whether whitespace is of any significance to you or not.

      There are several ways to deal with this. If you have to stay in
      minidom, just loop over the children and discard all whitespace-only
      text-nodes, before really processing the document.
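
The "discard whitespace-only text nodes" approach could look roughly like the sketch below. The helper name strip_whitespace_nodes and the shortened SAMPLE document are assumptions made for illustration; it is one possible way to do it, not a minidom built-in.

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)


# Shortened, hypothetical version of the file from the question.
SAMPLE = """<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" />
<Revision version="2" />
</afc:Bibliography>
</afc>"""

doc = minidom.parseString(SAMPLE)
strip_whitespace_nodes(doc.documentElement)
names = [node.nodeName for node in doc.documentElement.childNodes]
print(names)
```

Only do this if whitespace really carries no meaning in your documents; text nodes with actual content are left alone.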

      But the better alternative would be to use a better API for processing
      XML. Use one of the several ElementTree implementations, such as lxml:



      This will not rid you of the whitespace itself, but it represents text
      differently, so that you can focus on elements without interspersed
      text-nodes.
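
A small sketch of what "represents text differently" means in practice. To keep it self-contained it uses the standard-library xml.etree.ElementTree (available since Python 2.5) instead of lxml, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with line breaks between the tags.
root = ET.fromstring("<root>\n<child/>\n</root>")

children = list(root)
print([c.tag for c in children])   # ['child']
print(repr(root.text))             # '\n'
print(repr(children[0].tail))      # '\n'
```

The line breaks are still there, but they are stored in the .text and .tail attributes of the elements, so iterating over an element yields only its child elements.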

      Diez

      • MonkeeSage

        #4
        Re: xml minidom redundant children??

        On Mar 1, 12:46 pm, bkamr...@gmail.com wrote:
        > As a newbie, I'm trying to simply parse a xml file using minidom, but
        > I don't know why I get some extra children(?). I don't know what is
        > wrong in xml file, but I've tried different xml files, still same
        > problem.

        Most simply, if you need to stick with xml.dom.minidom, just check the
        nodeType and make sure it's not 3 (TEXT_NODE):

        from xml.dom import minidom
        doc = minidom.parse(fileTest)
        for item in doc.documentElement.childNodes:
            # Skip text nodes (nodeType 3), such as the "\n" between elements.
            if item.nodeType != item.TEXT_NODE:
                print(item.nodeName)

        Regards,
        Jordan

        • Gabriel Genellina

          #5
          Re: xml minidom redundant children??

          On Thu, 01 Mar 2007 15:46:53 -0300, <bkamrani@gmail.com> wrote:
          > As a newbie, I'm trying to simply parse a xml file using minidom, but
          > I don't know why I get some extra children(?). I don't know what is
          > wrong in xml file, but I've tried different xml files, still same
          > problem.

          If you don't have to use exactly xml.dom.minidom, try using ElementTree
          http://www.effbot.org/zone/element-index.htm (it's already included with
          Python 2.5; for earlier versions you have to download and install it).
          It's a lot easier and cleaner if you are mostly concerned about the infoset
          rather than its representation.
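
For comparison, here is a rough sketch of walking the file from the question with the standard-library ElementTree. The XML is embedded as a string, with the XML declaration left out, purely so the example is self-contained. Note that ElementTree reports namespaced tags in "{uri}name" form instead of the afc: prefix.

```python
import xml.etree.ElementTree as ET

# The document from the question, embedded as a string for illustration.
SAMPLE = """<afc xmlns="http://python.org/:aaa" xmlns:afc="http://python.org/:foo">
<afc:Bibliography>
<File version="2.0.0.0" publicationDate="2007-02-16 11:23:06+01:00" />
<Revision version="2" />
<Application version="02.00.00" />
</afc:Bibliography>
</afc>"""

root = ET.fromstring(SAMPLE)

# Iterating over an element yields child elements only, never whitespace.
top_level = [child.tag for child in root]
entries = [(entry.tag, entry.get("version")) for bib in root for entry in bib]

print(top_level)
for tag, version in entries:
    print(tag, version)
```

To read from a file instead, ET.parse(fileTest).getroot() gives the same kind of root element.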

          --
          Gabriel Genellina
