Re: XML -> Tab-delimited text file (using lxml)

Stefan Behnel
#1

Nov 19 '08, 04:05 PM

Gibson wrote:

I'm attempting to do the following:
A) Read/scan/iterate/etc. through a semi-large XML file (about 135 mb)
B) Grab specific fields and output to a tab-delimited text file
[...]
out = open('output.txt','w')
cat = etree.parse('catalog.xml')

Use iterparse() instead of parsing the file into memory completely.

untested:

for _, item in etree.iterparse('catalog.xml', tag='Item'):
    # do some cleanup to save memory: drop siblings that were already handled
    previous_item = item.getprevious()
    while previous_item is not None:
        previous_item.getparent().remove(previous_item)
        previous_item = item.getprevious()

    # now read the data
    item_id = item.get('ID')
    collect = {}
    for child in item:
        if child.tag != 'ItemVal': continue
        collect[child.get('ValueId')] = child.get('value')

    print("%s\t%s\t%s\t%s" % ((item_id,) + tuple(
        collect[key] for key in ['name','description','image'])))
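
Since the goal was a tab-delimited file rather than printed lines, here is a rough sketch of the same loop writing through the csv module. The function name items_to_tsv and the FIELDS list are made up for illustration, the stdlib fallback import is an assumption so the sketch also runs without lxml, and elem.clear() is used instead of the getprevious() cleanup because it exists in both libraries. Unlike the loop above, it writes an empty column when an item lacks one of the fields instead of raising KeyError.

```python
import csv
import io

try:
    from lxml import etree                      # what this thread uses
except ImportError:                             # assumption: stdlib fallback
    import xml.etree.ElementTree as etree

# Hypothetical column list, taken from the keys used in the loop above.
FIELDS = ['name', 'description', 'image']


def items_to_tsv(xml_source, out):
    """Stream <Item> elements and write one tab-delimited row per item.

    xml_source is a filename or a binary file-like object; out is a text
    file-like object. Returns the number of rows written.
    """
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    count = 0
    for _, elem in etree.iterparse(xml_source, events=('end',)):
        if elem.tag != 'Item':
            continue
        # Map each ItemVal's ValueId attribute to its value attribute.
        values = {child.get('ValueId'): child.get('value')
                  for child in elem if child.tag == 'ItemVal'}
        writer.writerow([elem.get('ID')] + [values.get(key, '') for key in FIELDS])
        elem.clear()    # free the item's children and attributes
        count += 1
    return count
```

Usage would look something like this (file names as in the original question):

    with open('output.txt', 'w', newline='') as out:
        items_to_tsv('catalog.xml', out)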

Stefan
Gibson
#2

Nov 19 '08, 05:35 PM

On Nov 19, 11:03 am, Stefan Behnel <stefan...@behnel.de> wrote:

> Use iterparse() instead of parsing the file into memory completely.
>
> *stuff*
>
> Stefan

That worked wonders. Thanks a lot, Stefan.

So, iterparse() iterates while it parses, whereas parse() followed by
iter() parses the whole file first and only then iterates (if that makes any sense)?
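
To make the two orderings concrete, here is a small side-by-side sketch. Both function names are invented for illustration, and the stdlib fallback import is an assumption so it runs without lxml. The first function builds the complete tree before looking at any Item; the second sees each Item as soon as its closing tag has been read and can discard it right away, which is why memory stays low on a large file.

```python
import io

try:
    from lxml import etree                      # what this thread uses
except ImportError:                             # assumption: stdlib fallback
    import xml.etree.ElementTree as etree


def ids_parse_then_iterate(source):
    """parse() + iter(): the whole document is in memory before the loop starts."""
    tree = etree.parse(source)
    return [item.get('ID') for item in tree.iter('Item')]


def ids_iterate_while_parsing(source):
    """iterparse(): each Item is handed over as soon as its end tag is parsed."""
    ids = []
    for _, elem in etree.iterparse(source, events=('end',)):
        if elem.tag == 'Item':
            ids.append(elem.get('ID'))
            elem.clear()    # the item is done, so its content can be freed
    return ids
```

Both return the same list of IDs; the difference is only in how much of the file is held in memory at once. Note that elem.clear() leaves the emptied Item elements attached to the root, while the getprevious() loop in Stefan's code removes them from the tree entirely.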