Stop and Resume parsing of large XML file

  • Brian Cryer

    Stop and Resume parsing of large XML file

    Currently I am using XmlReader (but I am open to other options) to parse an
    XML file, and I would like to be able to stop/break the current parse
    (simple enough) and then resume it later (say after a reboot). Is there any
    way to get the current location in the file that the XmlReader has reached
    so as to be able to restore that and start from that point later?

    TIA.

  • Martin Honnen

    #2
    Re: Stop and Resume parsing of large XML file

    Brian Cryer wrote:
    > Currently I am using XmlReader (but I am open to other options) to parse
    > an XML file, and I would like to be able to stop/break the current parse
    > (simple enough) and then resume it later (say after a reboot). Is there
    > any way to get the current location in the file that the XmlReader has
    > reached so as to be able to restore that and start from that point later?

    I don't think so. If you have an underlying stream you could store the
    stream position but I don't know of any way to store and restore the
    state of the XmlReader.


    --

    Martin Honnen --- MVP XML


    • Brian Cryer

      #3
      Re: Stop and Resume parsing of large XML file

      "Martin Honnen" <mahotrash@yahoo.de> wrote in message
      news:erugPY$vIHA.4912@TK2MSFTNGP03.phx.gbl...
      > Brian Cryer wrote:
      >> Currently I am using XmlReader (but I am open to other options) to parse
      >> an XML file, and I would like to be able to stop/break the current parse
      >> (simple enough) and then resume it later (say after a reboot). Is there
      >> any way to get the current location in the file that the XmlReader has
      >> reached so as to be able to restore that and start from that point later?
      >
      > I don't think so. If you have an underlying stream you could store the
      > stream position but I don't know of any way to store and restore the state
      > of the XmlReader.

      I'm not too worried about the "state" of the XmlReader (I might be when I
      get there but for now I'm assuming if there are any issues that I'll be able
      to work round them).

      I've looked at storing the stream position, but it's evident that the
      XmlReader reads in a buffer load because my stream position is at about the
      8KB mark when I get to the first tag in the XmlReader.
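
The read-ahead described here is not specific to XmlReader; any buffered reader pulls a block from the underlying stream before handing out the first byte. A minimal Python sketch of the effect, with Python's io module standing in for the .NET classes, an assumed 8192-byte buffer, and made-up XML content:

```python
import io

BUFFER_SIZE = 8192  # assumed buffer size, chosen to match the ~8KB observation

# Roughly 21,000 bytes of stand-in XML; the content is irrelevant to the effect.
raw = io.BytesIO(b"<items>" + b"<item id='1'/>" * 1500)
buffered = io.BufferedReader(raw, buffer_size=BUFFER_SIZE)

first_byte = buffered.read(1)   # the consumer has seen a single byte...
consumer_pos = buffered.tell()  # ...so its logical position is 1
raw_pos = raw.tell()            # but the underlying stream is a full buffer ahead

print(first_byte, consumer_pos, raw_pos)
```

The gap between the two positions is why a saved stream position overshoots the point the parser has actually reached.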

      Ahh ... Martin, you've been a great "sounding board". Knowing that the
      XmlReader doesn't provide any way of doing this is useful. But thinking
      about it, if the XmlReader reads in 8KB chunks (an assumption on my part,
      but one which I ought to be able to test) then as a way of "restoring" I may
      be able to get away with simply putting my read point 8192 bytes before the
      last known position in the underlying stream and then deal with any errors
      that get thrown up when XmlReader hits what it thinks is malformed XML. Bit
      yucky, but this might work for me (if XmlReader will play ball). At least it
      gives me an avenue to explore.

      TA.
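
The rewind idea can be sketched as follows. This is an illustration in Python rather than .NET, under loud assumptions: records are flat `<item id="...">` elements (a hypothetical layout), the read-ahead is 8192 bytes, and a regular expression stands in for a real parser. It rewinds one buffer from the saved stream position, then skips forward to the next record start instead of trying to parse the partial record it lands in.

```python
import re

BUFFER_SIZE = 8192          # assumed read-ahead; measure the real value first
RECORD_START = b"<item "    # hypothetical record tag for a shallow file

def resume_offset(data: bytes, saved_raw_pos: int) -> int:
    """Return the offset of the first record start at or after the rewound position."""
    rewound = max(0, saved_raw_pos - BUFFER_SIZE)
    found = data.find(RECORD_START, rewound)
    return found if found != -1 else len(data)

def records_from(data: bytes, offset: int):
    """Yield (id, text) for each complete record from offset onward."""
    pattern = re.compile(rb'<item id="(\d+)">(.*?)</item>', re.DOTALL)
    for match in pattern.finditer(data, offset):
        yield int(match.group(1)), match.group(2).decode("utf-8")

# Stand-in document with 1000 flat records.
data = b"<items>" + b"".join(
    b'<item id="%d">payload %d</item>' % (i, i) for i in range(1, 1001)
) + b"</items>"

saved_raw_pos = 16384  # stream position recorded before the shutdown
offset = resume_offset(data, saved_raw_pos)
ids = [record_id for record_id, _ in records_from(data, offset)]
print(offset, ids[0], ids[-1])
```

Records between the rewound offset and the point the parser had really reached are read a second time, so the caller has to tolerate duplicates. A byte-level search like this can also be fooled by the record tag appearing inside a comment or CDATA section.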


      • Family Tree Mike

        #4
        Re: Stop and Resume parsing of large XML file

        It seems like a lot of work to go through, and likely prone to errors due to
        machine dependencies. How are you persisting the part that was read before
        the reboot? Are you no longer interested in that portion of the XML after it
        has been processed?


        • Brian Cryer

          #5
          Re: Stop and Resume parsing of large XML file

          "Family Tree Mike" <FamilyTreeMike@discussions.microsoft.com> wrote in
          message news:60AE9B88-1244-4699-90B2-AF1211FE2941@microsoft.com...
          > It seems like a lot of work to go through, and likely prone to errors due to
          > machine dependencies. How are you persisting the part that was read before
          > the reboot? Are you no longer interested in that portion of the XML after it
          > has been processed?

          Fortunately, in this case the XML file, whilst rather long, is quite shallow. So
          I can forget about what went on before, and if I come across a duplicate
          section (which I will) then I can handle that (because each has a unique
          ID). So, in short, I don't need to worry too much about what went on before
          or the context. So, this isn't a generic solution by any means. (If I were
          processing something like an HTML file then it would get too messy to be
          viable.)

          However, all this is still theory at the moment, as other work has pulled me
          away from this. I am hoping to be able to prove whether or not this approach
          works for me either today or tomorrow.
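
Because each section carries a unique ID, reprocessing after a resume can be made harmless by remembering which IDs have already been handled. A rough Python sketch, assuming records arrive as (id, payload) pairs and using a hypothetical JSON state file; the names `process_records` and `handle` are invented for illustration:

```python
import json
import tempfile
from pathlib import Path

def process_records(records, state_path, handle):
    """Call handle(record_id, payload) at most once per ID across runs."""
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    for record_id, payload in records:
        if record_id in done:
            continue  # already handled before the restart; skip the duplicate
        handle(record_id, payload)
        done.add(record_id)
    # Persisted once at the end to keep the sketch short; a real run would
    # save periodically so that a crash loses little progress.
    state_path.write_text(json.dumps(sorted(done)))

handled = []
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "handled_ids.json"
    # First run stops after record 3.
    process_records([(1, "a"), (2, "b"), (3, "c")], state, lambda i, p: handled.append(i))
    # Resumed run overlaps records 2 and 3 before reaching new data.
    process_records([(2, "b"), (3, "c"), (4, "d")], state, lambda i, p: handled.append(i))

print(handled)
```

The overlap in the second run is skipped, so each ID reaches `handle` only once even though the input was read twice.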



          • Brian Cryer

            #6
            Re: Stop and Resume parsing of large XML file

            In case anyone is monitoring this or wants to do something similar one day
            ... I've decided to abandon this approach. It just started to get too messy.
            Since the XML is well structured, I'm going to implement a reader from scratch
            that does exactly what I need.
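
For a flat, well-structured file, a purpose-built reader can record an exact resume point after every record, which sidesteps the buffering problem. The following Python sketch is one possible shape for such a reader, not the one written for this project: the `<item ...>` record layout and the names `read_records` and `checkpoint` are assumptions, and it ignores comments, CDATA and nested `<item>` elements.

```python
import io

START, END = b"<item ", b"</item>"  # hypothetical tags for a flat record file

def read_records(stream, offset=0, chunk_size=8192):
    """Yield (next_offset, record_bytes); next_offset is a safe resume point."""
    stream.seek(offset)
    buffer = b""
    base = offset  # file offset of buffer[0]
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        while True:
            start = buffer.find(START)
            end = buffer.find(END, start + 1) if start != -1 else -1
            if end == -1:
                break  # need more data to complete the record
            stop = end + len(END)
            yield base + stop, buffer[start:stop]
            # Drop the consumed bytes and advance the file offset to match.
            buffer, base = buffer[stop:], base + stop
        if not chunk:
            return  # end of file; any trailing partial record is ignored

data = b"<items>" + b"".join(
    b'<item id="%d">payload</item>' % i for i in range(1, 6)
) + b"</items>"

# First session: handle two records, then stop and keep the last offset.
reader = read_records(io.BytesIO(data), chunk_size=16)
first_two = [next(reader), next(reader)]
checkpoint = first_two[-1][0]  # persist this number before stopping

# Later session: seek straight to the checkpoint and carry on.
resumed = [rec for _, rec in read_records(io.BytesIO(data), offset=checkpoint, chunk_size=16)]
print(checkpoint, resumed)
```

Because the reader does its own chunking, the checkpoint is the byte just past the last complete record, so nothing is read twice on resume.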
