Hourly batch process locking my database

  • Sam

    Hourly batch process locking my database

    Hello everyone,

    I have around 20 reports in an ASP web-application which connects to a
    SQL Server 2000 dB, executes stored procedures based on input
    parameters and returns the data in a nice tabular format.

    The data which is used in these reports actually originates from a 3rd
    party accounting application called Exchequer. I have written a VB
    application (I call it the extractor) which extracts data from
    Exchequer and dumps the same into the SQL Server dB every hour. The
    running time for the extractor averages 10 minutes. During these 10
    minutes, while the extractor seems to run happily, my ASP
    web-application, which queries the same dB that the extractor is
    updating, becomes dead slow.

    Is there any way I can get the extractor to be nice to SQL Server and
    not take up all its resources, so that the ASP web-application users
    do not have to contend with a very, very slow application during
    those times?

    I am using a DSN to connect to the dB from the server that runs the
    web-application as well as from the other server which runs the
    extractor.
    Connection pooling has been enabled on both (using the ODBC
    Administrator). The Detach Database dialog gives me a list of open
    connections to the dB. I have been monitoring the same and I have
    noted 10-15 open connections at most times, even during the execution
    of extractor.

    All connection objects in the ASP as well as VB applications are
    closed and then set to nothing.

    This system has been in use since 2002. My Data file has grown to
    450MB and my Transaction Log is close to 2GB. Can the Transaction Log
    be a problem? For some reason, the size of the Transaction Log does
    not go down even after a complete dB backup is done. Once a complete
    dB backup is done, doesn't the Transaction Log lose its significance,
    so that it can actually be deleted? Anyway, this is another post I'm
    making to the group today.

    In the extractor program,
    1) I create a temporary table
    2) I create an empty recordset out of the table
    3) I loop through the Exchequer records using Exchequer's APIs, adding
    records into the recordset of the temporary table as I go along.
    4) I do an UpdateBatch of the Recordset intermittently
    5) I open an SQL Transaction
    6) I delete all records from the main table
    7) I run an INSERT INTO main_table SELECT * FROM #temp_table
    8) I commit the transaction
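
    In SQL terms, steps 5 to 8 boil down to something like this (a
    simplified sketch with placeholder names, not my actual code):

        BEGIN TRANSACTION
            -- step 6: empty the reporting table
            DELETE FROM main_table

            -- step 7: reload it from the temporary table filled by the extractor
            INSERT INTO main_table
            SELECT * FROM #temp_table
        -- step 8: make the refreshed data visible to the reports
        COMMIT TRANSACTION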

    I hope that the information is sufficient.

    Thanks
    Sam
  • Erland Sommarskog

    #2
    Re: Hourly batch process locking my database

    Sam (simantahandique@indiatimes.com) writes:
    > Is there any way I can get the extractor to be nice to SQL Server and
    > not take up all its resources, so that the ASP web-application users
    > do not have to contend with a very, very slow application during
    > those times?

    There probably is, but it is not as simple as there being a
    low-priority setting. You simply will have to analyse in what way the
    extractor is slowing the rest down. It could be blocking, or it could
    be that the extractor consumes a lot of CPU. Indeed, during these
    steps:
    > 5) I open an SQL Transaction
    > 6) I delete all records from the main table
    > 7) I run an INSERT INTO main_table SELECT * FROM #temp_table
    > 8) I commit the transaction

    The other processes are likely to be blocked, as the main table is
    being reloaded.
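
    One quick way to see whether it is blocking is to look at
    sysprocesses while the extractor runs; something along these lines
    (a SQL Server 2000 system table, shown only as an illustration):

        -- Lists connections that are currently waiting on another spid.
        SELECT spid, blocked, waittime, lastwaittype, cmd, loginame
        FROM   master..sysprocesses
        WHERE  blocked <> 0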

    I don't think you said how much data is loaded each time, but it is
    not impossible that there are possibilities for improvement. In step
    6 you could use TRUNCATE TABLE to empty the table; this is faster
    than using the DELETE statement. TRUNCATE TABLE will not work if the
    table has foreign keys. Also, the extractor must be running as dbo to
    have permission.
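
    That is, step 6 would simply become (a sketch):

        -- Fails if other tables have foreign keys pointing at main_table,
        -- and in SQL Server 2000 it needs dbo-level permissions.
        TRUNCATE TABLE main_table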

    Then again, does data really change that wildly? Maybe you are better off
    loading deltas only?

    Also: you load the data through a recordset. You are likely to get better
    performance with bulk-load operations.
    > This system has been in use since 2002. My Data file has grown to
    > 450MB and my Transaction Log is close to 2GB. Can the Transaction Log
    > be a problem? For some reason, the size of the Transaction Log does
    > not go down even after a complete dB backup is done. Once a complete
    > dB backup is done, doesn't the Transaction Log lose its significance,
    > so that it can actually be deleted?

    Well, someone has to make that decision, and SQL Server is not doing
    it behind your back. You need to back up the transaction log as well
    to keep its size down. The transaction log is kept so that you can
    back it up and perform up-to-the-point recovery.

    Judging from the description of the database, it sounds like you
    might be interested in setting the database to simple recovery. This
    means that you cannot perform up-to-the-point recovery, only restore
    from the most recent backup. The flip side is that SQL Server will
    regularly truncate the committed transactions from the transaction
    log.
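
    In T-SQL the two alternatives look roughly like this (the database
    name and the backup path are placeholders):

        -- Stay in full recovery, but back up the log regularly so that
        -- it can be truncated.
        BACKUP LOG YourReportingDB TO DISK = 'D:\Backup\YourReportingDB_log.bak'

        -- Or switch to simple recovery and let SQL Server truncate the
        -- log itself at checkpoints.
        ALTER DATABASE YourReportingDB SET RECOVERY SIMPLE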


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se



    • Sam

      #3
      Re: Hourly batch process locking my database

      Thank you for your post. A couple of points.

      Erland Sommarskog <esquel@sommarskog.se> wrote in message news:<Xns952F720586F5AYazorman@127.0.0.1>...
      >
      > Then again, does data really change that wildly? Maybe you are better off
      > loading deltas only?
      >
      I run the extractor every hour to synchronize data from the past 7
      days from Exchequer. This is very much the requirement, so I do not
      have any choice there. This results in an average of 10,000 records
      being moved into the main_table from the #temp_table, which is when
      the deadlock happens. Does this information help you in any way to
      suggest some improvements?

      I was thinking of running two parallel dBs in some sort of a
      round-robin load-balanced setup. (I'm sure "round-robin
      load-balanced" is not the right term for this setup, but I am using
      it for lack of a better phrase to describe my plan.) In a typical
      scenario, the Extractor will run on dB1. The Intranet web-application
      will be connected to dB2. Once the extractor completes its execution,
      it can alter the connection string in the conf file for my Intranet
      web-application so that the Intranet connects to the updated dB1
      henceforth. The next time the extractor executes, it will update dB2
      instead of dB1 and switch the Intranet back to dB2 after its
      execution, and so on. The only problem is that the dB has around 20
      tables, out of which only 6 are updated by the Extractor. How do I
      synchronize the other 14 tables between dB1 and dB2? Does SQL Server
      provide any facility for that?
      >
      > Also: you load the data through a recordset. You are likely to get better
      > performance with bulk-load operations.
      >
      What are these bulk-load operations? Are they a group of SQL Commands?
      If you can point me to the right place in the SQL Server help file, I
      can read up from there.
      >
      > Judging from the description of the database, it sounds like you
      > might be interested in setting the database to simple recovery. This
      > means that you cannot perform up-to-the-point recovery, only restore
      > from the most recent backup. The flip side is that SQL Server will
      > regularly truncate the committed transactions from the transaction
      > log.
      >
      I just enquired with the administration team. They are using a
      third-party product called Backup Exec from Veritas for all backup
      activities. The backup software uses something it calls SQL Server
      agents to back up the dB. I'm not sure if these SQL Server agents
      would cause the transaction log to get truncated once they are done
      with the backups.
      If I use the simple recovery model, is the transaction log still
      used by the dB? Would SQL statements which have a BEGIN TRAN use the
      Transaction Log? Once the transaction is committed in this case, does
      it automatically truncate the Transaction Log and thus keep its file
      size in check?
      These questions may have been answered in the previous post. I'm
      sorry if I'm repeating the same question, but I could not understand
      the last few sentences completely. Especially when it was mentioned:
      "The flip side is that ...". Why do you say it is a "flip" side?

      Thanks and regards,
      Sam


      • Ryan

        #4
        Re: Hourly batch process locking my database

        Just a suggestion for you. It sounds similar to something we do
        here, so it may be worth a thought.

        We have to extract a large number of records from several tables.
        On this data we need to perform some complex calculations and play
        around with the formatting to get it into the required output,
        which is a single table. We need to do this every 10 minutes, with
        no longer than 3 minutes taken up by the processing. Blocking
        users is a major concern, as previous methods that worked directly
        on the 'live' tables caused us problems.

        What we found worked best for us was to have a database in which
        we could store the structure of the tables we were copying
        (specifically just the fields we needed). Using a DTS package and
        SPs, we extract the data from the master db into CSV files (this
        is very quick - for half a million records across all of the
        tables, it takes about 1 minute to export and import) and
        re-import the data into the copied structure in the new database.
        We then perform all of the calculations on these tables as opposed
        to the master 'live' ones.
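
        As a rough sketch of the "copied structure" part (all the names
        here are invented for illustration, not our actual ones):

            -- Creates an empty table with just the columns needed; the
            -- DTS package then re-imports the CSV files into it.
            SELECT CustomerID, InvoiceDate, Amount
            INTO   ReportingCopy.dbo.Invoices
            FROM   LiveDB.dbo.Invoices
            WHERE  1 = 0   -- copies the column definitions only, no rows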

        I know it probably seems daft exporting and importing, but it's the
        quickest way of getting the data without causing the users problems.
        SQL Server can easily handle the workload on the data, but we were
        causing too many problems with locking. By doing this, we have
        improved the speed dramatically.

        The advantages are that you can treat the data as read-only,
        reduce locking (although this will be minimal anyway), and the
        export and import are fast.

        A simple scheduled task runs this for us every 10 minutes.

        If you get a chance, try this and see how it performs against your
        normal method; it may prove quick enough for you to take further.

        Hope that helps, let me know if you need more info.

        Ryan


        • Erland Sommarskog

          #5
          Re: Hourly batch process locking my database

          Sam (simantahandique@indiatimes.com) writes:
          > Erland Sommarskog <esquel@sommarskog.se> wrote in message
          > news:<Xns952F720586F5AYazorman@127.0.0.1>...
          >>
          >> Then again, does data really change that wildly? Maybe you are better off
          >> loading deltas only?
          >>
          > I run the extractor every hour to synchronize data from the past 7
          > days from Exchequer. This is very much the requirement, so I do not
          > have any choice there. This results in an average of 10,000 records
          > being moved into the main_table from the #temp_table, which is when
          > the deadlock happens. Does this information help you in any way to
          > suggest some improvements?

          Not really. If I understood your previous post correctly, you
          are flushing the target database once an hour and reloading
          everything from Exchequer. This sounds very inefficient to me.
          Is there no way to find out what has changed, and load only the
          changes?
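
          If there is, the hourly job could become a plain upsert; a
          sketch along these lines (the column names are invented, and
          SQL Server 2000 has no MERGE):

              -- Update rows that already exist in the target...
              UPDATE m
              SET    InvoiceDate = s.InvoiceDate,
                     Amount      = s.Amount
              FROM   main_table m
              JOIN   #temp_table s ON s.DocumentKey = m.DocumentKey

              -- ...and insert the ones that do not.
              INSERT INTO main_table (DocumentKey, InvoiceDate, Amount)
              SELECT s.DocumentKey, s.InvoiceDate, s.Amount
              FROM   #temp_table s
              WHERE  NOT EXISTS (SELECT * FROM main_table m
                                 WHERE  m.DocumentKey = s.DocumentKey)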
          > I was thinking of running two parallel dBs in some sort of a
          > round-robin load-balanced setup.
          > ...
          > The only problem is that the dB has around 20 tables, out of which
          > only 6 are updated by the Extractor. How do I synchronize the other
          > 14 tables between dB1 and dB2? Does SQL Server provide any facility
          > for that?

          You could use triggers for this, although you need to take some
          care that you don't trigger back and forth.

          However, having two databases is a bit overkill. It might be possible
          to do this with two tables that you rename once an hour. Not that this
          is particularly exciting. (Loading only deltas still seems like the best
          bet.)
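
          For instance, the extractor could load a shadow table in the
          background and then swap the names; a sketch (the names are
          invented, and you would want to test this before relying on it):

              -- sp_rename needs db_owner/ddl_admin rights and prints a
              -- caution message about renaming objects.
              EXEC sp_rename 'main_table',        'main_table_old'
              EXEC sp_rename 'main_table_shadow', 'main_table'
              EXEC sp_rename 'main_table_old',    'main_table_shadow'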
          > What are these bulk-load operations? Are they a group of SQL Commands?
          > If you can point me to the right place in the SQL Server help file, I
          > can read up from there.

          There are several methods to bulk-load:

          o BCP, which is a command-line tool.
          o BULK INSERT, which is an SQL command.
          o There are bulk-copy APIs both in OLE DB and ODBC.
          o DTS.

          The easiest to use, in my opinion, is BCP. But these methods have
          many things in common, including how you specify what the data to
          import looks like. (With the reservation that I don't know much
          about DTS.)

          With bulk-load, you could get all data into a staging table, and then
          work from there. Preferably set-based, and not one-by-one processing
          like in ADO.
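
          For example, with BULK INSERT it could look something like this
          (the file name, terminators and staging table are examples only):

              -- Load the extracted file into a permanent staging table.
              BULK INSERT extractor_staging
              FROM 'D:\Extract\exchequer_data.txt'
              WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')

              -- From here the move into main_table is one set-based
              -- statement instead of row-by-row ADO calls.
              INSERT INTO main_table
              SELECT * FROM extractor_staging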
          > I just enquired with the administration team. They are using a
          > third-party product called Backup Exec from Veritas for all backup
          > activities. The backup software uses something it calls SQL Server
          > agents to back up the dB. I'm not sure if these SQL Server agents
          > would cause the transaction log to get truncated once they are done
          > with the backups.

          I don't know about Backup Exec; you will have to sort that out
          with your admin people. But I would guess that it is a
          configuration setting whether you also back up the log with
          Backup Exec.

          Then again... You said 10,000 rows on average each time. If I
          assume an average row size of 500 bytes, this means that you
          insert and delete 10,000 * 500 * 2 * 24 = 240,000,000 bytes a
          day. If your transaction log had never been truncated since
          2002, it would be hundreds of gigabytes by now. So I would guess
          that it is backed up after all.
          > If I use the simple recovery model, is the transaction log still
          > used by the dB? Would SQL statements which have a BEGIN TRAN use
          > the Transaction Log? Once the transaction is committed in this
          > case, does it automatically truncate the Transaction Log and thus
          > keep its file size in check?

          More or less, yes. That is, transaction atomicity is still
          maintained, so that a transaction can be rolled back (or
          forward). The log is never truncated past the oldest active open
          transaction. The actual truncation does not really happen at
          commit, but is done by the checkpoint process, which runs about
          once a minute or so. But that's really nothing you have to
          bother about.
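
          If you want to see it for yourself, you can watch the log usage
          with an informational command, and a checkpoint can also be
          issued by hand (shown only as an illustration):

              -- Shows log size and the percentage in use per database.
              DBCC SQLPERF(LOGSPACE)

              -- Forces a checkpoint in the current database.
              CHECKPOINT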

          --
          Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se


