How to remember last record read

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • francisds@hotmail.com

    How to remember last record read



    Hi,

    Can you guys see if there's a solution to this problem?

    I have a database from which I have to read each record and process
    that record. New records are being added all the time, so I need to go
    back and check for new records and process them.

    However:

    -- there is no 'identity' column in the database design (so I cannot
    keep track of the last record read by use of a numeric variable)

    -- I am not allowed to update the database (so I cannot flag the
    records I have read).

    My problem is: how can I know which records I have already read and
    which ones I haven't read yet? I don't want to process records twice
    and don't want to miss any records.

    Is there a known solution to this problem? Any ideas?

    Thanks.

  • Madhivanan

    #2
    Re: How to remember last record read


    Use a filed which has the datetime of type. Every time a record is
    inserted update this with latest datetime. When viewing the record get
    the latest of this field and check whether this different from the
    record by getting the latest datetime from the table again

    Madhivanan

    Comment

    • Hugo Kornelis

      #3
      Re: How to remember last record read

      On 2 Mar 2005 22:36:48 -0800, francisds@hotma il.com wrote:

      (snip)[color=blue]
      >My problem is: how can I know which records I have already read and
      >which ones I haven't read yet? I don't want to process records twice
      >and don't want to miss any records.[/color]

      Hi francisds,

      Madhivanan already pointed out the possibility to use a datetime column
      with the date a row is inserted in the table. If such a column is not
      available and you're not allowed to add one, here are a few other
      options.

      * You say that you're not allowed to update the database. Is that just
      this table? If so, you can create a new table to hold the ey values of
      all rows you already have processed.

      * If you're not allowed to add tables to the database either, you could
      consider keeping track of rows already processed in a table in another
      database. This is quite a hack, though. It can be hard to keep the
      databases synchronized, especially if you ever have to restore to a
      backup.

      * Yet another option would be to use a trigger on the table. Either do
      the complete processing in that trigger (if it's short and can't fail),
      or write rows with the primary keys of the inserted rows to another
      table that you can use to decide which rows to process. Note that even
      this trigger will slow down the insert process somewhat!

      Best, Hugo
      --

      (Remove _NO_ and _SPAM_ to get my e-mail address)

      Comment

      • Erland Sommarskog

        #4
        Re: How to remember last record read

        (francisds@hotm ail.com) writes:[color=blue]
        > Can you guys see if there's a solution to this problem?
        >
        > I have a database from which I have to read each record and process
        > that record. New records are being added all the time, so I need to go
        > back and check for new records and process them.
        >
        > However:
        >
        > -- there is no 'identity' column in the database design (so I cannot
        > keep track of the last record read by use of a numeric variable)
        >
        > -- I am not allowed to update the database (so I cannot flag the
        > records I have read).
        >
        > My problem is: how can I know which records I have already read and
        > which ones I haven't read yet? I don't want to process records twice
        > and don't want to miss any records.
        >
        > Is there a known solution to this problem? Any ideas?[/color]

        So where does your process live? Is it a stored procedure or a client
        program? If it's stored procedure maybe a temp table would do.

        More information about this process and the circumstances would be
        needed to give a good answer.


        --
        Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

        Books Online for SQL Server SP3 at
        SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

        Comment

        • francisds@hotmail.com

          #5
          Re: How to remember last record read

          Thank you for the suggestions. I am not sure they will work yet, but
          it does give me hope of finding a solution. I still need to digest
          them.

          As for the Erland's question. I am writing an application which runs
          on a separate PC from the SQL Server database which holds the table.

          The table contains records of a business event (for example, a
          payment). Each time an event occurs, then a record describing that
          event is appended in the table. My application needs to process all
          those records.

          Again, my problem is, I am not allowed to update the table (it's a
          stupid political issue, not a technical one). I checked and it has no
          identity field.

          Since the table has a column containing the datetime of the event that
          occured, I tried using the datetime to remember the last event I
          processed. Unfortunately, I found out that events can be generated
          from many points and there is no guarantee that the records of that
          event will be saved in the order they occured (IOW, an event that
          occured at 2:30pm could be saved before an event that occured at 2:15).

          -Francis



          Erland Sommarskog wrote:[color=blue]
          > So where does your process live? Is it a stored procedure or a client
          > program? If it's stored procedure maybe a temp table would do.
          >
          > More information about this process and the circumstances would be
          > needed to give a good answer.
          >
          >[/color]

          Comment

          • francisds@hotmail.com

            #6
            Re: How to remember last record read

            > * You say that you're not allowed to update the database. Is that
            just[color=blue]
            > this table? If so, you can create a new table to hold the ey values[/color]
            of[color=blue]
            > all rows you already have processed.[/color]

            Hi Hugo,

            Novice question: what do you mean by the "value" of a row? How do I
            get that?

            I discovered that there is a column of 'uniqueidentifi er' data type. I
            could use this probably, although since the database generates
            thousands of records a day, the list of processed keys will be fairly
            large.

            The trigger suggestion is intriguing. I might be able to use it for
            another project, but not for this since I cannot change the database
            (uhm, I haven't used triggers before, but I assume it requires
            modifying the database, at least to save a trigger?)

            Thanks.

            -Francis

            Comment

            • Erland Sommarskog

              #7
              Re: How to remember last record read

              (francisds@hotm ail.com) writes:[color=blue]
              > Thank you for the suggestions. I am not sure they will work yet, but
              > it does give me hope of finding a solution. I still need to digest
              > them.
              >
              > As for the Erland's question. I am writing an application which runs
              > on a separate PC from the SQL Server database which holds the table.
              >
              > The table contains records of a business event (for example, a
              > payment). Each time an event occurs, then a record describing that
              > event is appended in the table. My application needs to process all
              > those records.[/color]

              OK, so in theory you could just keep track in memory of which records
              you have processed. But if you application crashes, you will lose
              the knowledge of what you have processed. Then again, that depends on
              what your processing consists of. If you have to redo the processing in
              case of a crash that's the right thing.

              Else your alternative appears to be to write to a file on the machine
              you application runs on. When your application starts, you read the
              file to see what you already have processed.

              The challenge is two have a two-phase commit between the file and
              the processing. If you write to the file too soon, you mail fail to
              process a record in case of a restart. If you write too late, you may
              reprocess a record on a restart.

              Then again, if your processing involves some external mechanism, you
              have the same problem, if you were to update the table.
              [color=blue]
              > Again, my problem is, I am not allowed to update the table (it's a
              > stupid political issue, not a technical one). I checked and it has no
              > identity field.
              >
              > Since the table has a column containing the datetime of the event that
              > occured, I tried using the datetime to remember the last event I
              > processed. Unfortunately, I found out that events can be generated
              > from many points and there is no guarantee that the records of that
              > event will be saved in the order they occured (IOW, an event that
              > occured at 2:30pm could be saved before an event that occured at 2:15).[/color]

              You mentioned in another post that there is a uniqueidentifie r column.
              Thus the table has a primary key. An identity column had been a little
              easier to work with since it monotonic, and a uniqueidentifie r is not.
              But you could use a combination of datetime and uniqueidentifer . The
              nice with the datetime column, is that it permits you prune the log
              of processed records easily.

              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

              Books Online for SQL Server SP3 at
              SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

              Comment

              • Hugo Kornelis

                #8
                Re: How to remember last record read

                On 4 Mar 2005 10:58:19 -0800, francisds@hotma il.com wrote:
                [color=blue][color=green]
                >> * You say that you're not allowed to update the database. Is that[/color]
                >just[color=green]
                >> this table? If so, you can create a new table to hold the ey values[/color]
                >of[color=green]
                >> all rows you already have processed.[/color]
                >
                >Hi Hugo,
                >
                >Novice question: what do you mean by the "value" of a row? How do I
                >get that?[/color]

                Hi Francis,

                My fault. I should have written "you can create a new table to hold the
                values of the primary key columns in the rows you already have
                processed".

                [color=blue]
                >I discovered that there is a column of 'uniqueidentifi er' data type. I
                >could use this probably, although since the database generates
                >thousands of records a day, the list of processed keys will be fairly
                >large.[/color]

                You could only use this if you were allowed to add a column to the
                table. And if you are allowed to do that, than there are better
                solutions than uniqueidentifie r.

                [color=blue]
                >The trigger suggestion is intriguing. I might be able to use it for
                >another project, but not for this since I cannot change the database
                >(uhm, I haven't used triggers before, but I assume it requires
                >modifying the database, at least to save a trigger?)[/color]

                Yes, the trigger is stored in the database. But I'm not sure exactly
                what you may and may not change, as in other messages you wrote that you
                are not allowed to update the table. The trigger is not part of the
                table, it's stored seperately in the same database.

                BTW, have you already considered how to handle updates to a row that you
                have already processed?

                Best, Hugo
                --

                (Remove _NO_ and _SPAM_ to get my e-mail address)

                Comment

                • francisds@hotmail.com

                  #9
                  Re: How to remember last record read

                  > You mentioned in another post that there is a uniqueidentifie r
                  column.[color=blue]
                  > Thus the table has a primary key. An identity column had been a[/color]
                  little[color=blue]
                  > easier to work with since it monotonic, and a uniqueidentifie r is[/color]
                  not.[color=blue]
                  > But you could use a combination of datetime and uniqueidentifer . The
                  > nice with the datetime column, is that it permits you prune the log
                  > of processed records easily.[/color]


                  As I wrote earlier, I am not guaranteed about the order of arrival of
                  the records. How do you figure I might use the datetime in combination
                  with the uniqueidentifie r?


                  -Francis

                  Comment

                  • francisds@hotmail.com

                    #10
                    Re: How to remember last record read

                    > >I discovered that there is a column of 'uniqueidentifi er' data type.
                    I[color=blue]
                    >
                    > You could only use this if you were allowed to add a column to the
                    > table. And if you are allowed to do that, than there are better
                    > solutions than uniqueidentifie r.[/color]

                    Oh, I meant to say that I discovered that the table I am supposed to
                    read has a column of uniqueidentifie rs, not that I can create one for
                    my use.
                    [color=blue]
                    > Yes, the trigger is stored in the database. But I'm not sure exactly
                    > what you may and may not change, as in other messages you wrote that[/color]
                    you[color=blue]
                    > are not allowed to update the table. The trigger is not part of the
                    > table, it's stored seperately in the same database.[/color]

                    I can't touch the database at all. :-(
                    [color=blue]
                    > BTW, have you already considered how to handle updates to a row that[/color]
                    you[color=blue]
                    > have already processed?[/color]

                    Good alert. It's not needed for this application (the events are
                    mechanically generated and no one will be modifying the data).

                    (If it makes things clearer, the database I am reading is a Cisco
                    CallManager CDR table).



                    -Francis

                    Comment

                    • Erland Sommarskog

                      #11
                      Re: How to remember last record read

                      (francisds@hotm ail.com) writes:[color=blue]
                      > As I wrote earlier, I am not guaranteed about the order of arrival of
                      > the records. How do you figure I might use the datetime in combination
                      > with the uniqueidentifie r?[/color]

                      Sorry, I missed that part about the arrivals being out of order.

                      However, I supposed that you can assume that records arriving are not,
                      say, 24 hours old?

                      So you would write all GUIDs you processed to a file, together with
                      their datetime value. When you determine whether you have processed
                      a row or not, you first look at the datevalue, and if it's older than
                      24 hours (or two hours or whatever), you discard the record. If it's
                      newer, you look up the GUID in the file. (Probably you should have a
                      copy in memory for quick access. Every once in a while you would rewrite
                      the file and drop entires that are too old.

                      The main reason for this is to keep the local file down in size.

                      But you could also have a local SQL Server database that you write to. In
                      such case there would be no need for expiring old data. If the volume is
                      considerable so that several entries per second are generated, a database
                      may be the only option. (Or more precisely, you need an indexed store.)

                      --
                      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                      Books Online for SQL Server SP3 at
                      SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

                      Comment

                      • Hugo Kornelis

                        #12
                        Re: How to remember last record read

                        On 4 Mar 2005 21:58:26 -0800, francisds@hotma il.com wrote:

                        (snip)[color=blue]
                        >I can't touch the database at all. :-([/color]

                        Hi Francis,

                        In that case, you'll have to make do with a kludge.

                        I think the best way to handle this is to keep a list of the primary key
                        values of all rows you have already processed. You could store this in
                        another database, either on the same SQL Server instance or on another
                        instance. Depending on the environment where your process runs, using
                        another method might be better. Erland already made some suggestions.


                        (snip)[color=blue]
                        >(If it makes things clearer, the database I am reading is a Cisco
                        >CallManager CDR table).
                        >
                        >http://www.cisco.com/en/US/products/...c.html#wp33440[/color]

                        Based on a very quikc scan of the description, I'd say that you could
                        use either the pkid (based on the description, I agree that this is
                        probably a uniqueidentifie r), or the combination of globalCallID_ca llID
                        and globalCallID_ca llManagerID. The latter might be the better choice,
                        as (if I understand the description correctly), the callID is assigned
                        sequentially on each server (and I assume that each server maps to a
                        callManagerID). You might find that you only need to store a last
                        processed callID for each callManagerID.

                        Best, Hugo
                        --

                        (Remove _NO_ and _SPAM_ to get my e-mail address)

                        Comment

                        • francisds@hotmail.com

                          #13
                          Re: How to remember last record read

                          Hi All,

                          We are in process of discussing these solutions. I may later post here
                          the final solution we adopt. If I don't, thought you might want to
                          know that I am considering the two options suggested:

                          - "You might find that you only need to store a last
                          processed callID for each callManagerID. "

                          - Remembering the last datetime of last record, querying for new
                          records with a later datetime (though including those that are 15
                          minutes earlier, just to make sure), and keeping a list of processed
                          pkids to prevent duplicate processing. This private list will be
                          purged around the time of each query, removing those that are old
                          enough.

                          Thanks much guys !

                          -Francis

                          Comment

                          Working...