Progress when parsing a large file with SAX

  • marc.omorain@gmail.com

    Progress when parsing a large file with SAX

    Hi there,

    I have a 28mb XML file which I parse with SAX. I have some processing
    to do in the startElement / endElement callbacks, which slows the
    parsing down to about 60 seconds on my machine.

    My application is unresponsive for this time, so I would like to show
    a progress bar. I could show a spinner to show that the application is
    responsive, but I would prefer to show a percentage. Is there any way
    to query the parser to see how many bytes of the input file have been
    processed so far?

    Thanks,

    Marc

  • Diez B. Roggisch

    #2
    Re: Progress when parsing a large file with SAX

    marc.omorain@gmail.com wrote:
    > Hi there,
    >
    > I have a 28mb XML file which I parse with SAX. I have some processing
    > to do in the startElement / endElement callbacks, which slows the
    > parsing down to about 60 seconds on my machine.
    >
    > My application is unresponsive for this time, so I would like to show
    > a progress bar. I could show a spinner to show that the application is
    > responsive, but I would prefer to show a percentage. Is there any way
    > to query the parser to see how many bytes of the input file have been
    > processed so far?

    I'd create a file-like object that does this for you. It should wrap the
    original file, and count the number of bytes delivered. Something along
    these lines (untested!!!):

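    import os

    class PercentageFile(object):
        """Wraps a file and counts the bytes handed out by read()."""

        def __init__(self, filename):
            self.size = os.path.getsize(filename)  # total size in bytes
            self.delivered = 0
            self.f = open(filename, "rb")  # binary mode, so len(data) counts bytes

        def read(self, size=None):
            if size is None:
                # Unbounded read: everything that is left gets delivered at once.
                self.delivered = self.size
                return self.f.read()
            data = self.f.read(size)
            self.delivered += len(data)
            return data

        def close(self):
            # The SAX parser may call close() on the stream when it is done
            # (recent Python 3 versions do).
            self.f.close()

        @property
        def percentage(self):
            return float(self.delivered) / self.size * 100.0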

    Diez


    • Anastasios Hatzis

      #3
      Re: Progress when parsing a large file with SAX

      Diez B. Roggisch wrote:

      ....

      I have the same problem with large XML files as Marc.

      So you deserve my thanks for the example as well. :-)

      > class PercentageFile(object):
      >
      >     def __init__(self, filename):
      >         self.size = os.path.getsize(filename)
      >         self.delivered = 0
      >         self.f = open(filename, "rb")
      >
      >     def read(self, size=None):
      >         if size is None:
      >             self.delivered = self.size
      >             return self.f.read()
      >         data = self.f.read(size)
      >         self.delivered += len(data)
      >         return data

      I guess some client implementation then needs to call read() on the wrapped
      XML file until the whole file has been read.

      >     @property
      >     def percentage(self):
      >         return float(self.delivered) / self.size * 100.0

      @property?

      What is that supposed to do?

      Anastasios


      • Diez B. Roggisch

        #4
        Re: Progress when parsing a large file with SAX

        Anastasios Hatzis wrote:
        > Diez B. Roggisch wrote:
        >
        > ...
        >
        > I have the same problem with large XML files as Marc.
        >
        > So you deserve my thanks for the example as well. :-)
        >
        >> class PercentageFile(object):
        >>
        >>     def __init__(self, filename):
        >>         self.size = os.path.getsize(filename)
        >>         self.delivered = 0
        >>         self.f = open(filename, "rb")
        >>
        >>     def read(self, size=None):
        >>         if size is None:
        >>             self.delivered = self.size
        >>             return self.f.read()
        >>         data = self.f.read(size)
        >>         self.delivered += len(data)
        >>         return data
        >
        > I guess some client implementation then needs to call read() on the wrapped
        > XML file until the whole file has been read.

        You should feed the PercentageFile object to the XML parser, like this:

        import xml.sax

        parser = xml.sax.make_parser()
        pf = PercentageFile(filename)
        parser.parse(pf)
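
To make the wiring above concrete, here is a minimal end-to-end sketch. It repeats the PercentageFile class so that it runs on its own. ProgressHandler and parse_with_progress are hypothetical names made up for this illustration, not part of xml.sax, and the close() method is an assumption added because recent Python 3 parsers close the stream when they finish. The parser reads the file in chunks, so the percentage advances in steps rather than per element.

```python
import os
import xml.sax


class PercentageFile(object):
    """File-like wrapper that counts the bytes handed to the parser."""

    def __init__(self, filename):
        self.size = os.path.getsize(filename)
        self.delivered = 0
        self.f = open(filename, "rb")

    def read(self, size=None):
        if size is None:
            self.delivered = self.size
            return self.f.read()
        data = self.f.read(size)
        self.delivered += len(data)
        return data

    def close(self):
        # Assumption: the parser may call close() on the stream after parsing.
        self.f.close()

    @property
    def percentage(self):
        return float(self.delivered) / self.size * 100.0


class ProgressHandler(xml.sax.ContentHandler):
    """Hypothetical handler that samples the progress after each element."""

    def __init__(self, pf):
        xml.sax.ContentHandler.__init__(self)
        self.pf = pf
        self.samples = []

    def endElement(self, name):
        # A real application would update its progress bar here.
        self.samples.append(self.pf.percentage)


def parse_with_progress(filename):
    """Parse filename through a PercentageFile and return both objects."""
    pf = PercentageFile(filename)
    handler = ProgressHandler(pf)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(pf)
    return pf, handler
```

In a GUI application the endElement callback would be the place to push pf.percentage to the progress bar, ideally only when the value has changed noticeably so that the update itself does not slow the parse down.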

        >>     @property
        >>     def percentage(self):
        >>         return float(self.delivered) / self.size * 100.0
        >
        > @property?
        >
        > What is that supposed to do?

        It's making percentage a property, so that you can access it like this:

        pf.percentage

        instead of

        pf.percentage()

        Google "python property" for details, or run pydoc property.
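
A small self-contained sketch of the difference, for anyone who wants to try it. The Counter class and its attribute names are made up for this illustration. It computes the same value once as a property and once as an ordinary method.

```python
class Counter(object):
    """Hypothetical class used only to compare a property with a method."""

    def __init__(self, done, total):
        self.done = done
        self.total = total

    @property
    def percentage(self):
        # Computed on every access, but read like a plain attribute.
        return float(self.done) / self.total * 100.0

    def percentage_method(self):
        # The same computation as a normal method, called with parentheses.
        return float(self.done) / self.total * 100.0


c = Counter(1, 4)
print(c.percentage)           # attribute-style access, no parentheses
print(c.percentage_method())  # ordinary method call
```

Calling c.percentage() would fail, because c.percentage already evaluates to a float and a float is not callable.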

        Diez


        • marc.omorain@gmail.com

          #5
          Re: Progress when parsing a large file with SAX

          On Feb 12, 1:21 pm, "Diez B. Roggisch" <d...@nospam.web.de> wrote:
          > Anastasios Hatzis wrote:
          >> Diez B. Roggisch wrote:

          Thanks guys!

          I'll try it out later today and let you know how I get on.
