Re: XML -> Tab-delimited text file (using lxml)

This topic is closed.
  • Stefan Behnel

    Re: XML -> Tab-delimited text file (using lxml)

    Gibson wrote:
    > I'm attempting to do the following:
    > A) Read/scan/iterate/etc. through a semi-large XML file (about 135 MB)
    > B) Grab specific fields and output to a tab-delimited text file
    > [...]
    > out = open('output.txt','w')
    > cat = etree.parse('catalog.xml')

    Use iterparse() instead of parsing the file into memory completely.

    untested:

    from lxml import etree

    for _, item in etree.iterparse('catalog.xml', tag='Item'):
        # do some cleanup to save memory: drop the siblings already handled
        previous_item = item.getprevious()
        while previous_item is not None:
            previous_item.getparent().remove(previous_item)
            previous_item = item.getprevious()

        # now read the data
        item_id = item.get('ID')
        collect = {}
        for child in item:
            if child.tag != 'ItemVal':
                continue
            collect[child.get('ValueId')] = child.get('value')

        # one tab-separated line per Item
        print("%s\t%s\t%s\t%s" % ((item_id,) + tuple(
            collect[key] for key in ['name', 'description', 'image'])))

    Stefan
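
Since the goal was a tab-delimited file rather than printed lines, here is a minimal sketch of the same loop feeding csv.writer. It is only an illustration under assumptions: the sample document, the FIELDS list and the items_to_tsv() helper are hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs without extra packages (lxml.etree offers an iterparse() call as well, plus the tag= filter used above).

```python
import csv
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Hypothetical sample, shaped the way the loop above assumes the catalog looks.
SAMPLE = b"""<Catalog>
  <Item ID="1">
    <ItemVal ValueId="name" value="Widget"/>
    <ItemVal ValueId="description" value="A small widget"/>
    <ItemVal ValueId="image" value="widget.png"/>
  </Item>
  <Item ID="2">
    <ItemVal ValueId="name" value="Gadget"/>
    <ItemVal ValueId="image" value="gadget.png"/>
  </Item>
</Catalog>"""

FIELDS = ['name', 'description', 'image']


def items_to_tsv(source, out):
    """Stream <Item> elements from source and write one tab-separated row each."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    rows = 0
    for _, elem in ET.iterparse(source, events=('end',)):
        if elem.tag != 'Item':
            continue
        values = {}
        for child in elem:
            if child.tag == 'ItemVal':
                values[child.get('ValueId')] = child.get('value')
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow([elem.get('ID')] + [values.get(key, '') for key in FIELDS])
        rows += 1
        elem.clear()  # drop the children we no longer need
    return rows


buffer = io.StringIO()
row_count = items_to_tsv(io.BytesIO(SAMPLE), buffer)
tsv_text = buffer.getvalue()
```

For a real run, the in-memory buffers would be replaced by the catalog file and an output file opened for writing. Using values.get(key, '') is a design choice for this sketch: an Item without a description still produces a row with the right number of columns.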
  • Gibson

    #2
    Re: XML -> Tab-delimited text file (using lxml)

    On Nov 19, 11:03 am, Stefan Behnel <stefan...@behnel.de> wrote:
    > Use iterparse() instead of parsing the file into memory completely.
    >
    > *stuff*
    >
    > Stefan

    That worked wonders. Thanks a lot, Stefan.

    So, iterparse() uses an iterate-parse method instead of parse() and
    iter()'s parse-iterate method (if that makes any sense)?
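
Roughly, yes: parse() reads the whole document and builds the complete tree before any element can be touched, while iterparse() hands elements over as the parser reaches them. The sketch below tries to make that visible by counting how many bytes have been read when the first Item becomes available. It rests on assumptions: CountingReader and DOC are hypothetical helpers invented for the illustration, and the standard library's xml.etree.ElementTree stands in for lxml, which also provides parse(), iter() and iterparse().

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


class CountingReader:
    """Hypothetical file-like wrapper that serves small chunks and counts bytes."""

    def __init__(self, data, chunk=1024):
        self._stream = io.BytesIO(data)
        self._chunk = chunk
        self.bytes_read = 0

    def read(self, size=-1):
        # Never hand out more than one small chunk per call.
        n = self._chunk if size is None or size < 0 else min(size, self._chunk)
        data = self._stream.read(n)
        self.bytes_read += len(data)
        return data


# Hypothetical document with 2000 empty Item elements.
DOC = (b"<Catalog>"
       + b"".join(b'<Item ID="%d"/>' % i for i in range(2000))
       + b"</Catalog>")

# parse-then-iterate: the tree is complete before the first element is seen.
reader = CountingReader(DOC)
tree = ET.parse(reader)
read_before_first_parse = reader.bytes_read
first_from_parse = next(tree.iter('Item'))

# iterate-while-parsing: the first element arrives after only part of the input.
reader = CountingReader(DOC)
events = ET.iterparse(reader, events=('end',))
_, first_from_iterparse = next(events)
read_at_first_iterparse = reader.bytes_read
```

With parse() the counter already equals the size of the document when the first Item is fetched; with iterparse() it is still well below that. The memory saving only materialises if processed elements are also discarded, which is what the cleanup loop in the reply above is for.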
