Strange performance issue with UPDATE FROM

This topic is closed.
  • Richard

    Strange performance issue with UPDATE FROM

    Hello!

    I have this piece of SQL code:

    UPDATE a
    SET Field1 = c.Field1
    FROM a
    INNER JOIN b ON a.GUID1 = b.GUID1
    INNER JOIN c ON b.GUID2 = c.GUID2
    WHERE c.Type = 1
    AND @date BETWEEN b.DateFrom AND b.DateTo

    This query takes hours to complete.

    Now while trying to find out what's causing the poor performance (it
    surely looks simple enough!) I've rewritten it to use temp tables:

    SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
    FROM a
    INNER JOIN b ON a.GUID1 = b.GUID1
    INNER JOIN c ON b.GUID2 = c.GUID2
    WHERE c.Type = 1
    AND @date BETWEEN b.DateFrom AND b.DateTo

    UPDATE a SET Field1 = subsel.Field1
    FROM (SELECT * FROM #temptable) AS subsel
    WHERE subsel.GUID1 = a.GUID1

    Now it completes in 10 seconds.
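    The same two-step shape can be sketched in miniature with Python's bundled
    sqlite3 module (SQLite has no SELECT ... INTO, so CREATE TEMP TABLE ... AS
    SELECT stands in for it; all table names and contents below are invented,
    and the date filter is hardcoded):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (GUID1 TEXT PRIMARY KEY, Field1 TEXT);
CREATE TABLE b (GUID1 TEXT, GUID2 TEXT, DateFrom TEXT, DateTo TEXT);
CREATE TABLE c (GUID2 TEXT, Field1 TEXT, Type INTEGER);
INSERT INTO a VALUES ('g1', 'old');
INSERT INTO b VALUES ('g1', 'x1', '2007-01-01', '2007-12-31');
INSERT INTO c VALUES ('x1', 'new', 1);

-- Step 1: materialize the join (SELECT ... INTO #temptable in T-SQL)
CREATE TEMP TABLE temptable AS
SELECT a.GUID1, c.Field1 AS NewField1
FROM a
INNER JOIN b ON a.GUID1 = b.GUID1
INNER JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
  AND '2007-06-26' BETWEEN b.DateFrom AND b.DateTo;

-- Step 2: update the base table from the materialized rows
UPDATE a SET Field1 =
    (SELECT NewField1 FROM temptable t WHERE t.GUID1 = a.GUID1)
WHERE GUID1 IN (SELECT GUID1 FROM temptable);
""")
print(con.execute("SELECT GUID1, Field1 FROM a").fetchall())  # [('g1', 'new')]
```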

    My question is why? Am I wrong in saying that the two batches above
    produce the same results? Is there something I've missed about the UPDATE
    FROM syntax? Why would the first query perform THAT poorly?

    Table sizes:
    a: 24k rows
    b: 268k rows
    c: 260k rows

    GUIDs are of type uniqueidentifier.

    Any answers appreciated!

    Regards,
    // Richard

  • Erland Sommarskog

    #2
    Re: Strange performance issue with UPDATE FROM

    Richard (nassegris@gmail.com) writes:
    > I have this piece of SQL code:
    >
    > UPDATE a
    > SET Field1 = c.Field1
    > FROM a
    > INNER JOIN b ON a.GUID1 = b.GUID1
    > INNER JOIN c ON b.GUID2 = c.GUID2
    > WHERE c.Type = 1
    > AND @date BETWEEN b.DateFrom AND b.DateTo
    >
    > This query takes hours to complete.
    >
    > Now while trying to find out what's causing the poor performance (it
    > surely looks simple enough!) I've rewritten it to use temp tables:
    >
    > SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
    > FROM a
    > INNER JOIN b ON a.GUID1 = b.GUID1
    > INNER JOIN c ON b.GUID2 = c.GUID2
    > WHERE c.Type = 1
    > AND @date BETWEEN b.DateFrom AND b.DateTo
    >
    > UPDATE a SET Field1 = subsel.Field1
    > FROM (SELECT * FROM #temptable) AS subsel
    > WHERE subsel.GUID1 = a.GUID1
    >
    > Now it completes in 10 seconds.
    >
    > My question is why? Am I wrong in saying that the two batches above
    > produce same results? Is there something I've missed about the UPDATE
    > FROM syntax? Why would the first query perform THAT poorly?

    One problem with UPDATE FROM is that you can update the same row
    several times if your join conditions are not unique. What happens if
    you run:

    UPDATE a
    SET Field = (SELECT c.Field1
    FROM c
    JOIN b ON c.GUID2 = b.GUID2
    WHERE a.GUID1 = b.GUID1
    AND c.type = 1
    AND @date BETWEEN b.DateFrom AND b.DateTo)

    I'm most interested to know if the query succeeds at all, or if it fails
    with an error message. From the table sizes you have indicated, I would
    expect an error, but I don't know how your tables are related.

    As for the performance, investigating the query plan can give some ideas.
    Without seeing the query plans, the table definitions, the indexes etc,
    it's difficult to say anything useful.

    Which version of SQL Server are you using?

    Is @date a parameter to a stored procedure or a local variable?


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Books Online for SQL Server 2005 at

    Books Online for SQL Server 2000 at


    • Roy Harvey

      #3
      Re: Strange performance issue with UPDATE FROM

      On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
      <esquel@sommarskog.se> wrote:
      >One problem with UPDATE FROM is that you can update the same row
      >several times if your join conditions are not unique. What happens if
      >you run:
      >
      >UPDATE a
      >SET Field = (SELECT c.Field1
      >FROM c
      >JOIN b ON c.GUID2 = b.GUID2
      >WHERE a.GUID1 = b.GUID1
      >AND c.type = 1
      >AND @date BETWEEN b.DateFrom AND b.DateTo)

      BE CAREFUL WITH THIS!!

      One thing that could happen from this UPDATE is that it sets Field =
      NULL for rows that are untouched by the UPDATEs in the original post.
      That happens if there are rows in the table being UPDATEd that do not
      have matches in the subquery. The FROM/JOIN prevents that in the
      original versions. I don't know if any such non-matching rows exist,
      but it certainly seems possible with the date range and type tests
      dropping rows from the subquery.
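      This hazard is easy to reproduce in miniature. Below is a sketch using
      Python's sqlite3 module (SQLite, not SQL Server, and all tables and rows
      are invented), showing the unguarded scalar-subquery UPDATE nulling an
      unmatched row, and a WHERE EXISTS guard preventing it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (GUID1 TEXT PRIMARY KEY, Field1 TEXT);
CREATE TABLE b (GUID1 TEXT, GUID2 TEXT);
CREATE TABLE c (GUID2 TEXT, Field1 TEXT, Type INTEGER);
INSERT INTO a VALUES ('g1', 'old1'), ('g2', 'old2');
INSERT INTO b VALUES ('g1', 'x1');   -- g2 has no match at all
INSERT INTO c VALUES ('x1', 'new1', 1);
""")

# Unguarded scalar-subquery UPDATE: the subquery returns no row for g2,
# so g2.Field1 is overwritten with NULL.
con.execute("""
UPDATE a SET Field1 =
    (SELECT c.Field1 FROM c JOIN b ON c.GUID2 = b.GUID2
     WHERE a.GUID1 = b.GUID1 AND c.Type = 1)
""")
print(con.execute("SELECT GUID1, Field1 FROM a ORDER BY GUID1").fetchall())
# [('g1', 'new1'), ('g2', None)]

con.execute("UPDATE a SET Field1 = 'old2' WHERE GUID1 = 'g2'")  # reset g2

# Guarding with WHERE EXISTS leaves unmatched rows untouched.
con.execute("""
UPDATE a SET Field1 =
    (SELECT c.Field1 FROM c JOIN b ON c.GUID2 = b.GUID2
     WHERE a.GUID1 = b.GUID1 AND c.Type = 1)
WHERE EXISTS
    (SELECT 1 FROM c JOIN b ON c.GUID2 = b.GUID2
     WHERE a.GUID1 = b.GUID1 AND c.Type = 1)
""")
print(con.execute("SELECT GUID1, Field1 FROM a ORDER BY GUID1").fetchall())
# [('g1', 'new1'), ('g2', 'old2')]
```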

      I believe this query would show if the original UPDATEs using FROM
      result in the same row updated more than once, without possible impact
      on the data.

      SELECT A.PrimaryKey, COUNT(*)
      FROM A
      JOIN B
      ON A.GUID1 = B.GUID1
      JOIN C
      ON B.GUID2 = C.GUID2
      WHERE C.type = 1
      AND @date BETWEEN B.DateFrom AND B.DateTo
      GROUP BY A.PrimaryKey
      HAVING COUNT(*) > 1
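      The same check can be tried on toy data with Python's sqlite3 module
      (invented rows, with a.pk standing in for A.PrimaryKey); a row of a that
      joins to more than one b/c pair shows up in the HAVING output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (pk INTEGER PRIMARY KEY, GUID1 TEXT);
CREATE TABLE b (GUID1 TEXT, GUID2 TEXT);
CREATE TABLE c (GUID2 TEXT, Field1 TEXT, Type INTEGER);
INSERT INTO a VALUES (1, 'g1'), (2, 'g2');
INSERT INTO b VALUES ('g1', 'x1'), ('g1', 'x2'), ('g2', 'x3');  -- g1 joins twice
INSERT INTO c VALUES ('x1', 'v1', 1), ('x2', 'v2', 1), ('x3', 'v3', 1);
""")

# Rows of a that an UPDATE ... FROM over this join would hit more than once
dups = con.execute("""
SELECT a.pk, COUNT(*) FROM a
JOIN b ON a.GUID1 = b.GUID1
JOIN c ON b.GUID2 = c.GUID2
WHERE c.Type = 1
GROUP BY a.pk
HAVING COUNT(*) > 1
""").fetchall()
print(dups)  # [(1, 2)] -- pk 1 matched twice; which c.Field1 wins is arbitrary
```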

      Roy Harvey
      Beacon Falls, CT


      • --CELKO--

        #4
        Re: Strange performance issue with UPDATE FROM


        I hope you know better than to use GUIDs in an RDBMS except for
        replication, never to use reserved words like "date" for data element
        names or vague names like "type" -- the basic ISO-11179 rules, etc.
        You also seem to confuse fields and columns, but let's skip the signs
        of poor SQL practices for now.

        If you rewrite this statement into Standard SQL, you will see if
        the join returns multiple rows instead of a scalar value. That would
        let us know that the schema has serious design problems. The illegal
        syntax you used can do multiple updates on each row; talk to an old
        Sybase programmer about this problem.

        UPDATE A SET field1
        = (SELECT C.field1 FROM B, C
        WHERE A.guid1 = B.guid1
        AND B.guid2 = C.guid2 AND C.somekind_type = 1
        AND @my_date BETWEEN B.start_date AND B.end_date);


        • Erland Sommarskog

          #5
          Re: Strange performance issue with UPDATE FROM

          Roy Harvey (roy_harvey@snet.net) writes:
          > On Tue, 26 Jun 2007 21:46:15 +0000 (UTC), Erland Sommarskog
          > <esquel@sommarskog.se> wrote:
          >
          >>One problem with UPDATE FROM is that you can update the same row
          >>several times if your join conditions are not unique. What happens if
          >>you run:
          >>
          >> UPDATE a
          >> SET Field = (SELECT c.Field1
          >> FROM c
          >> JOIN b ON c.GUID2 = b.GUID2
          >> WHERE a.GUID1 = b.GUID1
          >> AND c.type = 1
          >> AND @date BETWEEN b.DateFrom AND b.DateTo)
          >
          > BE CAREFUL WITH THIS!!
          >
          > One thing that could happen from this UPDATE is that it sets Field =
          > NULL for rows that are untouched by the UPDATEs in the original post.

          Right, Roy. I didn't add a WHERE clause, because it was more intended as a
          test to see if the query would work at all. But I should have included the
          warning.




          --
          Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

          Books Online for SQL Server 2005 at

          Books Online for SQL Server 2000 at


          • Richard

            #6
            Re: Strange performance issue with UPDATE FROM

            On Jun 27, 2:27 am, --CELKO-- <jcelko...@earthlink.net> wrote:
            > I hope you know better than to use GUIDs in an RDBMS except for
            > replication, never to use reserved words like "date" for data element
            > names or vague names like "type" -- the basic ISO-11179 rules, etc.
            > You also seem to confuse fields and columns, but let's skip the signs
            > of poor SQL practices for now.

            Well, yes, I actually do know better. The columns, variables and
            tables in the query are renamed as I don't want to post production
            code on the Internet. Also excuse the mixup between fields and
            columns, I'm not a native English speaker.

            One big problem (as I see it, and I'm by no means a SQL expert) is
            that the db in question uses uniqueidentifier primary keys with
            clustered indexes on those almost EVERYWHERE, and there is nothing I
            can do to change that at the moment... Constructs like

            FROM z
            INNER JOIN a ON ..GUID = ..GUID
            INNER JOIN b ON ..GUID = ..GUID
            INNER JOIN c ON ..GUID = ..GUID
            INNER JOIN d ON ..GUID = ..GUID
            LEFT OUTER JOIN e ON ..GUID = ..GUID
            AND VERSION = (
            SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)

            make the queries run painfully slow.

            So the question is, is there ANYTHING I can do to optimize this type
            of query, or is a redesign the only thing that would help?

            @Erland:
            I use MS SQL Server 2000, and @date is a local variable =)


            • Erland Sommarskog

              #7
              Re: Strange performance issue with UPDATE FROM

              Richard (nassegris@gmail.com) writes:
              > One big problem (as i see it, and I'm by no means a SQL expert) is
              > that the db in question uses uniqueidentifier primary keys with
              > clustered indexes on those almost EVERYWHERE, and there is nothing I
              > can do to change that at the moment... Constructs like

              Clustered indexes on GUIDs require a lot of skill in monitoring
              fragmentation.

              With the standard setup with a high fill factor, clustering on GUIDs is bad,
              because you get page splits and fragmentation galore.

              SQL Server MVP Greg Linwood suggested to me that clustering on GUIDs may
              still be good for INSERT performance, if you create the indexes with a low
              fill factor, say 50%. Now when you add new rows, there is a good chance
              that there is a hole to plug them into. When the table starts to fill up, you
              need to reindex again. But this strategy requires careful planning, and is
              nothing for the armchair DBA.

              If you are stuck with these clustered indexes, make sure you set up
              a reindexing job that runs regularly, and you should probably aim at
              a lower fill factor. If not 50%, maybe 75-80%. It depends a bit on how
              high the INSERT frequency is. And use DBCC SHOWCONTIG to monitor
              fragmentation.
              > FROM z
              > INNER JOIN a ON ..GUID = ..GUID
              > INNER JOIN b ON ..GUID = ..GUID
              > INNER JOIN c ON ..GUID = ..GUID
              > INNER JOIN d ON ..GUID = ..GUID
              > LEFT OUTER JOIN e ON ..GUID = ..GUID
              > AND VERSION = (
              > SELECT MAX(VERSION) FROM f WHERE ..GUID = ..GUID)
              >
              > make the queries run painfuly slow.
              >
              > So the question is, is there ANYTHING I can do to optimize this type
              > of queries or is a redesign the only thing that would help?

              With the information you have posted, it's impossible to tell. But
              I would at least give defragmentation with DBCC DBREINDEX a chance
              first if DBCC SHOWCONTIG shows horrendous numbers.

              You could also consider adding covering non-clustered indexes on tables
              where only a few columns of many are involved in the query.


              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

              Books Online for SQL Server 2005 at

              Books Online for SQL Server 2000 at


              • --CELKO--

                #8
                Re: Strange performance issue with UPDATE FROM

                >Well, yes, I actually do know better. The columns, variables and tables in the query are renamed as I don't want to post production code on the Internet. <<

                Understood. But look at how many production DBs posted here have such
                flaws.
                >Also excuse the mix up between fields and columns, I'm not a native English speaker. <<
                And that means that your English is probably better than a native
                English speaker :) But my objection is not English; it is RDBMS
                versus File Systems. A big problem I see when I teach SQL is students
                using SQL as if it were a sequential file system -- no constraints,
                improper data types, no DRI actions, depending on applications to do
                what DDL should do, etc.
                >One big problem (as i see it, and I'm by no means a SQL expert) is that the db in question uses uniqueidentifier primary keys with clustered indexes on those almost EVERYWHERE,.. <<
                You are doing very well for an amateur :) Yes, this is a major
                problem and not just for performance. A uniqueidentifier cannot be a
                key in a properly designed RDBMS by definition -- it is an attribute
                of the hardware and not the data model. You cannot verify it with a
                trusted external source, so you have no data integrity. And it is a
                bitch to write them out without making an error.

                The programmers who do this are trying to mimic pointer chains and
                build a linked list in SQL. They missed the whole idea of RDBMS.
                >So the question is, is there ANYTHING I can do to optimize this type of queries or is a redesign the only thing that would help? <<
                Not much. Clustered indexes are not going to help with the random
                nature of a uniqueidentifier. Perhaps the best thing you can do is
                kill the guy that did this to you and prevent him from coding again.


                • Alex Kuznetsov

                  #9
                  Re: Strange performance issue with UPDATE FROM

                  On Jun 26, 10:31 am, Richard <nasseg...@gmail.com> wrote:
                  > Hello!
                  >
                  > I have this piece of SQL code:
                  >
                  > UPDATE a
                  > SET Field1 = c.Field1
                  > FROM a
                  > INNER JOIN b ON a.GUID1 = b.GUID1
                  > INNER JOIN c ON b.GUID2 = c.GUID2
                  > WHERE c.Type = 1
                  > AND @date BETWEEN b.DateFrom AND b.DateTo
                  >
                  > This query takes hours to complete.
                  >
                  > Now while trying to find out what's causing the poor performance (it
                  > surely looks simple enough!) I've rewritten it to use temp tables:
                  >
                  > SELECT a.GUID1, a.Field1, c.Type, b.DateFrom, b.DateTo INTO #temptable
                  > FROM a
                  > INNER JOIN b ON a.GUID1 = b.GUID1
                  > INNER JOIN c ON b.GUID2 = c.GUID2
                  > WHERE c.Type = 1
                  > AND @date BETWEEN b.DateFrom AND b.DateTo
                  >
                  > UPDATE a SET Field1 = subsel.Field1
                  > FROM (SELECT * FROM #temptable) AS subsel
                  > WHERE subsel.GUID1 = a.GUID1
                  >
                  > Now it completes in 10 seconds.
                  >
                  > My question is why? Am I wrong in saying that the two batches above
                  > produce same results? Is there something I've missed about the UPDATE
                  > FROM syntax? Why would the first query perform THAT poorly?
                  >
                  > Table sizes:
                  > a: 24k rows
                  > b: 268k rows
                  > c: 260k rows
                  >
                  > GUIDs are of type uniqueidentifier.
                  >
                  > Any answers appreciated!
                  >
                  > Regards,
                  > // Richard

                  Not arguing with the other parties on correctness/standards etc.: when you
                  create a temp table, you get statistics on it. So the optimizer has a
                  better estimate of the number of rows to modify and may choose a better
                  plan.


