Problem with processing XML

  • John Carlyle-Clarke

    Problem with processing XML

    Hi.

    I'm new to Python and trying to use it to solve a specific problem. I
    have an XML file in which I need to locate a specific text node and
    replace the contents with some other text. The text in question is
    actually about 70k of base64 encoded data.
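
For illustration, the task described above (find one text node and swap its contents) could be sketched with xml.dom.minidom roughly as follows. This is only a sketch under assumptions: the sample document, the element names `data` and `value`, the `name` attribute, and the helper `replace_value_text` are all made up for the example and are not taken from the real input file. It is written for a current Python 3 rather than 2.5.

```python
import xml.dom.minidom

# Hypothetical sample document; the real file's structure may differ.
SAMPLE = (
    '<doc>'
    '<data name="PctShow.Image"><value>T0xE</value></data>'
    '<data name="Other"><value>keep</value></data>'
    '</doc>'
)

def replace_value_text(xml_text, data_name, new_text):
    """Replace the text inside <value> for the <data> element whose
    name attribute equals data_name; return the serialised document element."""
    doc = xml.dom.minidom.parseString(xml_text)
    for data in doc.getElementsByTagName("data"):
        if data.getAttribute("name") != data_name:
            continue
        for value in data.getElementsByTagName("value"):
            # Remove whatever child nodes are there (normally one text node).
            while value.firstChild is not None:
                value.removeChild(value.firstChild)
            # Add a single text node holding the new contents.
            value.appendChild(doc.createTextNode(new_text))
    return doc.documentElement.toxml()

print(replace_value_text(SAMPLE, "PctShow.Image", "TkVXREFUQQ=="))
```

Removing all children before appending the new text node avoids leaving fragments behind if the parser happened to split a long text run into several nodes.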

    I wrote some code that works on my Linux box using xml.dom.minidom, but
    it will not run on the Windows box that I really need it on. Python
    2.5.1 on both.

    On the Windows machine, it's a clean install of the Python .msi from
    python.org. The Linux box is Ubuntu 7.10, which has some Python XML
    packages installed which can't easily be removed (namely python-libxml2
    and python-xml).

    I have boiled the code down to its simplest form which shows the problem:-

    import xml.dom.minidom
    import sys

    input_file = sys.argv[1]
    output_file = sys.argv[2]

    doc = xml.dom.minidom.parse(input_file)
    file = open(output_file, "w")
    doc.writexml(file)

    The error is:-

    $ python test2.py input2.xml output.xml
    Traceback (most recent call last):
      File "test2.py", line 9, in <module>
        doc.writexml(file)
      File "c:\Python25\lib\xml\dom\minidom.py", line 1744, in writexml
        node.writexml(writer, indent, addindent, newl)
      File "c:\Python25\lib\xml\dom\minidom.py", line 814, in writexml
        node.writexml(writer,indent+addindent,addindent,newl)
      File "c:\Python25\lib\xml\dom\minidom.py", line 809, in writexml
        _write_data(writer, attrs[a_name].value)
      File "c:\Python25\lib\xml\dom\minidom.py", line 299, in _write_data
        data = data.replace("&", "&amp;").replace("<", "&lt;")
    AttributeError: 'NoneType' object has no attribute 'replace'

    As I said, this code runs fine on the Ubuntu box. If I could work out
    why the code runs on that box, that would help, because then I can set
    up the Windows box the same way.

    The input file contains an <xsd:schema> block which is what actually
    causes the problem. If you remove that node and subnodes, it works
    fine. For a while at least, you can view the input file at


    Someone suggested that I should try xml.etree.ElementTree. However,
    writing the same type of simple code to import and then write the file
    mangles the xsd:schema stuff because ElementTree does not understand
    namespaces.
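
To make the kind of mangling concrete, here is a small sketch of what a parse-and-write round trip with ElementTree tends to do to namespace prefixes. The namespace URI, the `demo` prefix, the sample document, and the helper `roundtrip` are hypothetical and chosen only for this example. It assumes a current Python 3; as far as I know, `register_namespace` was added after Python 2.5, so it would not help on the installation described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, used only for this sketch.
DEMO_NS = "http://example.com/ns/demo"
SAMPLE = (
    '<root xmlns:demo="http://example.com/ns/demo">'
    '<demo:item kind="demo:thing">text</demo:item>'
    '</root>'
)

def roundtrip(xml_text):
    """Parse a document and serialise it again without changing it."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")

# Without a registered prefix, ElementTree invents one for the namespace.
before = roundtrip(SAMPLE)

# Registering a preferred prefix is a process-wide setting.
ET.register_namespace("demo", DEMO_NS)
after = roundtrip(SAMPLE)

print(before)
print(after)
```

ElementTree stores element names as namespace-qualified names and regenerates prefixes on output, so the elements stay in the right namespace. Prefixes that appear inside attribute values, as in the `kind` attribute here, are plain text to it and are left alone, which is likely why schema documents that refer to types by prefix come out looking broken.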

    By the way, is pyxml a live project or not? Should it still be used?
    It's odd that if you go to http://www.python.org/ and click the link
    "Using python for..." XML, it leads you to
    http://pyxml.sourceforge.net/topics/ (the home page for XML processing
    with Python).

    If you then follow the download links to
    http://sourceforge.net/project/showf...?group_id=6473 you see that
    the latest file is 2004, and there are no versions for newer pythons.
    It also says "PyXML is no longer maintained". Shouldn't the link be
    removed from python.org?

    Thanks in advance!
  • Paul McGuire

    #2
    Re: Problem with processing XML

    On Jan 22, 8:11 am, John Carlyle-Clarke <j...@nowhere.org> wrote:
    > Hi.
    >
    > I'm new to Python and trying to use it to solve a specific problem.  I
    > have an XML file in which I need to locate a specific text node and
    > replace the contents with some other text.  The text in question is
    > actually about 70k of base64 encoded data.

    Here is a pyparsing hack for your problem. I normally advise against
    using literal strings like "<value>" to match XML or HTML tags in a
    parser, since this doesn't cover variations in case, embedded
    whitespace, or unforeseen attributes, but your example was too simple
    to haul in the extra machinery of an expression created by pyparsing's
    makeXMLTags.

    Also, I don't generally recommend pyparsing for working on XML, since
    there are so many better and faster XML-specific modules available.
    But if this does the trick for you for your specific base64-removal
    task, great.

    -- Paul

    # requires pyparsing 1.4.8 or later
    from pyparsing import makeXMLTags, withAttribute, keepOriginalText, SkipTo

    xml = """
    ... long XML string goes here ...
    """

    # define a filter that will key off of the <data> tag with the
    # attribute 'name="PctShow.Image"', and then use suppress to filter
    # the body of the following <value> tag
    dataTag = makeXMLTags("data")[0]
    dataTag.setParseAction(withAttribute(name="PctShow.Image"),
                           keepOriginalText)

    filter = dataTag + "<value>" + SkipTo("</value>").suppress() + "</value>"

    xmlWithoutBase64Block = filter.transformString(xml)
    print xmlWithoutBase64Block

    • Alnilam

      #3
      Re: Problem with processing XML

      On Jan 22, 9:11 am, John Carlyle-Clarke <j...@nowhere.org> wrote:
      > By the way, is pyxml a live project or not?  Should it still be used?
      > It's odd that if you go to http://www.python.org/ and click the link
      > "Using python for..." XML, it leads you to http://pyxml.sourceforge.net/topics/
      >
      > If you then follow the download links to http://sourceforge.net/project/showfiles.php?group_id=6473 you see that
      > the latest file is 2004, and there are no versions for newer pythons.
      > It also says "PyXML is no longer maintained".  Shouldn't the link be
      > removed from python.org?

      I was wondering that myself. Any answer yet?

      • John Carlyle-Clarke

        #4
        Re: Problem with processing XML

        Paul McGuire wrote:
        > Here is a pyparsing hack for your problem.

        Thanks Paul! This looks like an interesting approach, and once I get my
        head around the syntax, I'll give it a proper whirl.
