Gibson wrote:
Use iterparse() instead of parsing the file into memory completely.
untested:
from lxml import etree

for _, item in etree.iterparse('catalog.xml', tag='Item'):
    # do some cleanup to save memory
    previous_item = item.getprevious()
    while previous_item is not None:
        previous_item.getparent().remove(previous_item)
        previous_item = item.getprevious()
    # now read the data
    id = item.get('ID')
    collect = {}
    for child in item:
        if child.tag != 'ItemVal': continue
        collect[child.get('ValueId')] = child.get('value')
    print("%s\t%s\t%s\t%s" % ((id,) + tuple(
        collect[key] for key in ['name', 'description', 'image'])))
Stefan
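For anyone without lxml installed, here is a rough sketch of the same streaming idea using only the standard library. It is a swapped-in variant, not the code from the post: xml.etree.ElementTree.iterparse has no tag= keyword and no getprevious(), so the tag is filtered by hand and each finished Item is cleared instead of removed. The function name items_to_tsv and the FIELDS list are made up for illustration, and the element and attribute names (Item, ItemVal, ID, ValueId, value) are assumed from the snippet above.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical field list: the ValueId keys used in the post above.
FIELDS = ["name", "description", "image"]


def items_to_tsv(xml_source, out_file):
    """Stream Item elements from xml_source, writing one tab-separated row each.

    xml_source may be a file name or a binary file object; out_file is any
    writable text file object. Returns the number of rows written.
    """
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    count = 0
    # Default events=("end",): each element is yielded once it is complete.
    for _, elem in ET.iterparse(xml_source):
        if elem.tag != "Item":
            continue
        # Collect the ItemVal children into a ValueId -> value mapping.
        values = {
            child.get("ValueId"): child.get("value")
            for child in elem
            if child.tag == "ItemVal"
        }
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow([elem.get("ID")] + [values.get(k, "") for k in FIELDS])
        count += 1
        # Drop the children and attributes of the finished Item to save memory.
        # An empty Item shell stays attached to its parent.
        elem.clear()
    return count
```

Something like items_to_tsv('catalog.xml', open('output.txt', 'w')) would then cover both A) and B) from the question. Using the csv module with a tab delimiter, rather than plain string formatting, is a design choice of this sketch: it quotes any value that itself contains a tab or newline.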
I'm attempting to do the following:
A) Read/scan/iterate/etc. through a semi-large XML file (about 135 MB)
B) Grab specific fields and output to a tab-delimited text file
[...]
out = open('output.txt','w')
cat = etree.parse('catalog.xml')