DBCC and Failed Assertion Errors - HELP!

  • Morgan Leppink

    DBCC and Failed Assertion Errors - HELP!

    Hey all -

    We are running SQL 2000 with ALL available service packs, etc.
    applied. We just built a brand new database server, which has dual
    2 GHz Xeons, 2 GB of memory, and the following disk configuration:

    RAID 1 array  (2 disks)   Operating System (Windows Server 2003)
    RAID 1 array  (2 disks)   Database Logs
    RAID 10 array (4 disks)   Database Data

    Disks are SATA, with a 3Ware hardware RAID controller. The machine
    SCREAMS.

    We run 5 databases on this machine. 2 of these are fairly large (by
    our standards, anyway). The second largest database (and the busiest
    and most important) keeps generating consistency errors that bring
    many important queries down. These are almost ALWAYS in the form of
    index corruption on one single table. The corruption does not
    normally occur on other tables (although it DOES happen once in a
    while - rarely - on one of the other tables), nor does it EVER occur
    on any other databases on the server.

    The corruption seems to happen right in the neighborhood of midnight
    ALMOST every day, give or take a few minutes, but does not seem
    directly associated with any of our MANY scheduled database cleanup
    tasks (believe me, we've tried desperately to find an association
    using SQL profiler). At midnight, our database traffic is fairly low,
    so it does not seem associated with a high traffic level.

    We are using the FULL recovery model, with log backups every 15
    minutes, and full backups daily at 12:15am. However, the corruption
    happens consistently BEFORE 12:15, like between 11:50pm and 12:10am.
    The most frustrating thing is, the database can go WEEKS without any
    corruption at all, and then it'll go 4 or 5 days in a row with this
    strange corruption stuff.

    *************************************************************************
    Typical query errors when the corruption exists include:
    *************************************************************************

    SQL Server Assertion: File:
    <p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447
    Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.


    SQL Server Assertion: File: <recbase.cpp>, line=1378
    Failed Assertion = 'm_offBeginVar < m_SizeRec'.


    Server: Msg 3624, Level 20, State 1, Line 7
    Location: recbase.cpp:1374
    Expression: m_nVars > 0


    Connection Broken

    *************************************************************************

    Most of the responses to this type of issue (failed assertions) on the
    newsgroups appear to point to hardware failures. However, this is
    brand new hardware, AND, it seems to us that if this was a hardware
    issue, other databases, tables, and indexes would be affected
    randomly. Isn't that a valid assumption (that if it was hardware,
    particularly the RAID controller, the corruption would not be in such
    a predictable place)? What if we moved the physical database files to
    another location on the disk? Would/could that help?

    If anyone could offer some suggestions as to what may be causing this
    corruption, we would be eternally grateful. It is getting to be a
    real pain in the A*** to run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS
    every day or two (it always seems to solve the problem without data
    loss, but still...).

    Again, thanks in advance for your response.


    Sincerely,


    Morgan Leppink
    mleppink@hotmail.com
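
One way to check whether the failures keep hitting the same code path is to tally the assertion messages from the SQL Server error log. The following is a minimal Python sketch and not part of the original post. It assumes the messages have been copied out of the error log as plain text in the two-line form quoted in the post above; the names `ASSERTION_RE` and `tally_assertions` are hypothetical.

```python
import re
from collections import Counter

# Matches the two-line form quoted in the post (an assumption about the
# log layout, not an official format description):
#   SQL Server Assertion: File: <recbase.cpp>, line=1378
#   Failed Assertion = 'm_offBeginVar < m_SizeRec'.
# Whitespace (including line breaks) between the parts is tolerated.
ASSERTION_RE = re.compile(
    r"SQL Server Assertion: File:\s*<(?P<file>[^>]+)>\s*,\s*line=(?P<line>\d+)\s*"
    r"Failed Assertion = '(?P<expr>[^']*)'"
)


def tally_assertions(log_text):
    """Count how often each (file, line, expression) assertion appears."""
    counts = Counter()
    for match in ASSERTION_RE.finditer(log_text):
        key = (
            match.group("file").strip(),   # source file named in the message
            int(match.group("line")),      # source line of the assertion
            match.group("expr"),           # the failed expression itself
        )
        counts[key] += 1
    return counts
```

Calling `tally_assertions(text).most_common()` lists the assertions from most to least frequent. A tally concentrated on one or two assertions would fit the report that the corruption stays on a single table, but it does not by itself say whether the cause is hardware or software.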
  • Paul S Randal [MS]

    #2
    Re: DBCC and Failed Assertion Errors - HELP!

    Hi Morgan,

    Have you actually checked the event logs and run hardware diagnostics on
    your IO system to see if there are hardware problems?

    If so, and there are no clues there, you should call Product Support to help
    you diagnose the problem.

    Regards.

    --
    Paul Randal
    Dev Lead, Microsoft SQL Server Storage Engine

    This posting is provided "AS IS" with no warranties, and confers no rights.

    • Morgan Leppink

      #3
      Re: DBCC and Failed Assertion Errors - HELP!

      Paul -

      The only information in the event logs is the text of the failed
      assertion error itself. I have never seen any OS-reported problems
      with the hardware.

      I hate to seem stupid, but can you be more specific about what you
      mean when you say "hardware diagnostics?" Are you talking about the
      simple Windows CheckDisk utility or something more advanced? This is
      the first time I've used a hardware RAID controller - is Windows even
      capable of checking the hardware-controlled disk array, or do I need
      to use a utility provided by the RAID controller manufacturer?

      Or would you suggest some sort of third-party utility for "burning in"
      the hardware? Would you suspect disk drives, memory, or what? Could
      it be ANY of the hardware, or just specific things?

      One last question: What's the most effective method for contacting
      product support if I need to do so?

      Thanks,

      Morgan Leppink

      • druss

        #4
        Re: DBCC and Failed Assertion Errors - HELP!

        I am also running a 3ware SATA RAID card and have been getting random
        consistency errors too. I have to run repair_allow_data_loss to fix them. I
        wish I knew the cause. No drive errors. Microsoft cannot pinpoint it either.
        All they can tell me is that it is most likely hardware related and to move
        my database to another server.


        • Greg D. Moore (Strider)

          #5
          Re: DBCC and Failed Assertion Errors - HELP!

          I would suggest they're probably right in this case.
