UPDATEs with multiple aggregate functions

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Steven An

    UPDATEs with multiple aggregate functions

    Howdy,

    I need to write an update query with multiple aggregate functions.
    Here is an example:

    UPDATE t
    SET
    t.a = ( select avg(f.q) from dbo.foo f where f.p = t.y ),
    t.b = ( select sum(f.q) from dbo.foo f where f.p = t.y )
    FROM dbo.test t

    Basically I need to get some aggregate statistics about the rows of
    foo and store them in rows of t. The above statement works fine...but
    note how the two subSelect's have the exact same WHERE clause. This
    screams at me to combine them...but how? I would like to have
    something like this in my query:

    SELECT avg(f.q), sum(f.q) FROM dbo.foo f WHERE f.p = 2

    ...and somehow store the results in t.a and t.b. Is there any way to
    do this?

    Thanks before hand!
  • John Bell

    #2
    Re: UPDATEs with multiple aggregate functions

    Hi

    You could try:

    UPDATE t
    SET a = d.a
    b = d.b
    FROM dbo.test t JOIN
    ( SELECT p,avg(q),sum(q) FROM dbo.foo GROUP BY p ) f ON f.p = t.y

    John

    "Steven An" <stevenan@uclin k.berkeley.edu> wrote in message
    news:4953656d.0 406101610.270ac e29@posting.goo gle.com...[color=blue]
    > Howdy,
    >
    > I need to write an update query with multiple aggregate functions.
    > Here is an example:
    >
    > UPDATE t
    > SET
    > t.a = ( select avg(f.q) from dbo.foo f where f.p = t.y ),
    > t.b = ( select sum(f.q) from dbo.foo f where f.p = t.y )
    > FROM dbo.test t
    >
    > Basically I need to get some aggregate statistics about the rows of
    > foo and store them in rows of t. The above statement works fine...but
    > note how the two subSelect's have the exact same WHERE clause. This
    > screams at me to combine them...but how? I would like to have
    > something like this in my query:
    >
    > SELECT avg(f.q), sum(f.q) FROM dbo.foo f WHERE f.p = 2
    >
    > ..and somehow store the results in t.a and t.b. Is there any way to
    > do this?
    >
    > Thanks before hand![/color]


    Comment

    • --CELKO--

      #3
      Re: UPDATEs with multiple aggregate functions

      >> I need to write an update query with multiple aggregate functions.
      Here is an example: <<

      You might want to learn Standard SQL. There is no FROM clause in a
      Standard SQL UPDATE statement; it would make no sense. Other products
      (SQL Server, Sybase and Ingres) also use the UPDATE .. FROM syntax,
      but with different semantics. So it does not port, or even worse,
      when you do move it, it trashes your database. Other programmers
      cannot read it and maintaining it is harder. And when Microsoft
      decides to change it, you will have to do a re-write. Remember the
      deprecated "*=" versus "LEFT OUTER JOIN" conversions?

      The correct syntax for a searched update statement is

      <update statement> ::=
      UPDATE <table name>
      SET <set clause list>
      [WHERE <search condition>]

      <set clause list> ::=
      <set clause> [{ , <set clause> }...]

      <set clause> ::= <object column> = <update source>

      <update source> ::= <value expression> | NULL | DEFAULT

      <object column> ::= <column name>

      The UPDATE clause simply gives the name of the base table or updatable
      view to be changed.

      Notice that no correlation name is allowed in the UPDATE clause; this
      is to avoid some self-referencing problems that could occur. But it
      also follows the data model in Standard SQL. When you give a table
      expression a correlation name, it is to act as if a materialized table
      with that correlation name has been created in the database. That
      table then is dropped at the end of the statement. If you allowed
      correlation names in the UPDATE clause, you would be updating the
      materialized table, which would then disappear and leave the base
      table untouched.

      The SET clause is a list of columns to be changed or made; the WHERE
      clause tells the statement which rows to use. For this discussion, we
      will assume the user doing the update has applicable UPDATE privileges
      for each <object column>.

      * The WHERE Clause

      As mentioned, the most important thing to remember about the WHERE
      clause is that it is optional. If there is no WHERE clause, all rows
      in the table are changed. This is a common error; if you make it,
      immediately execute a ROLLBACK statement.

      All rows that test TRUE for the <search condition> are marked as a
      subset and not as individual rows. It is also possible that this
      subset will be empty. This subset is used to construct a new set of
      rows that will be inserted into the table when the subset is deleted
      from the table. Note that the empty subset is a valid update that
      will fire declarative referential actions and triggers.

      * The SET Clause

      Each assignment in the <set clause list> is executed in parallel and
      each SET clause changes all the qualified rows at once. Or at least
      that is the theoretical model. In practice, implementations will
      first mark all of the qualified rows in the table in one pass, using
      the WHERE clause. If there were no problems, then the SQL engine
      makes a copy of each marked row in working storage. Each SET clause
      is executed based on the old row image and the results are put in the
      new row image. Finally, the old rows are deleted and the new rows are
      inserted. If an error occurs during all of this, then system does a
      ROLLBACK, the table is left unchanged and the errors are reported.
      This parallelism is not like what you find in a traditional
      third-generation programming language, so it may be hard to learn.
      This feature lets you write a statement that will swap the values in
      two columns, thus:

      UPDATE MyTable
      SET a = b, b = a;

      This is not the same thing as

      BEGIN ATOMIC
      UPDATE MyTable
      SET a = b;
      UPDATE MyTable
      SET b = a;
      END;

      In the first UPDATE, columns a and b will swap values in each row. In
      the second pair of UPDATEs, column a will get all of the values of
      column b in each row. In the second UPDATE of the pair, a, which now
      has the same value as the original value of b, will be written back
      into column b -- no change at all. There are some limits as to what
      the value expression can be. The same column cannot appear more than
      once in a <set clause list> -- which makes sense, given the parallel
      nature of the statement. Since both go into effect at the same time,
      you would not know which SET clause to use.

      If a subquery expression is used in a <set clause>, and it returns a
      single value, the result set is cast to a scalar; if it returns an
      empty, the result set is cast to a NULL; if it returns multiple rows,
      a cardinality violation is raised.

      When we finally get T-SQL up to SQL-92 specs, you could use row
      constructors and have written:

      UPDATE Test
      SET ROW(a,b)
      = (SELECT AVG(q), SUM(q) FROM FooWHERE Foo.p = Test.y);

      Comment

      • Steven An

        #4
        Re: UPDATEs with multiple aggregate functions

        Just what I needed. This saves me a LOT of time :)

        Many thanks!

        "John Bell" <jbellnewsposts @hotmail.com> wrote in message news:<2ccyc.136 9$uW1.12016770@ news-text.cableinet. net>...[color=blue]
        > Hi
        >
        > You could try:
        >
        > UPDATE t
        > SET a = d.a
        > b = d.b
        > FROM dbo.test t JOIN
        > ( SELECT p,avg(q),sum(q) FROM dbo.foo GROUP BY p ) f ON f.p = t.y
        >
        > John
        >
        > "Steven An" <stevenan@uclin k.berkeley.edu> wrote in message
        > news:4953656d.0 406101610.270ac e29@posting.goo gle.com...[color=green]
        > > Howdy,
        > >
        > > I need to write an update query with multiple aggregate functions.
        > > Here is an example:
        > >
        > > UPDATE t
        > > SET
        > > t.a = ( select avg(f.q) from dbo.foo f where f.p = t.y ),
        > > t.b = ( select sum(f.q) from dbo.foo f where f.p = t.y )
        > > FROM dbo.test t
        > >
        > > Basically I need to get some aggregate statistics about the rows of
        > > foo and store them in rows of t. The above statement works fine...but
        > > note how the two subSelect's have the exact same WHERE clause. This
        > > screams at me to combine them...but how? I would like to have
        > > something like this in my query:
        > >
        > > SELECT avg(f.q), sum(f.q) FROM dbo.foo f WHERE f.p = 2
        > >
        > > ..and somehow store the results in t.a and t.b. Is there any way to
        > > do this?
        > >
        > > Thanks before hand![/color][/color]

        Comment

        • Steven An

          #5
          Re: UPDATEs with multiple aggregate functions

          Well, thank you for that very insightful answer. Yes, that ROW
          operator is a lot like what I had in mind. I was looking for
          something like SET (a,b) = (select ..). I will try that, and see
          which one is faster. I have a feeling your method will be MUCH
          faster, since unlike the JOIN, it does not require that I calculate
          avg's and sum's for ALL rows, only the ones I want. Thank you!

          jcelko212@earth link.net (--CELKO--) wrote in message news:<18c7b3c2. 0406121447.35ad ed55@posting.go ogle.com>...[color=blue][color=green][color=darkred]
          > >> I need to write an update query with multiple aggregate functions.[/color][/color]
          > Here is an example: <<
          >
          > You might want to learn Standard SQL. There is no FROM clause in a
          > Standard SQL UPDATE statement; it would make no sense. Other products
          > (SQL Server, Sybase and Ingres) also use the UPDATE .. FROM syntax,
          > but with different semantics. So it does not port, or even worse,
          > when you do move it, it trashes your database. Other programmers
          > cannot read it and maintaining it is harder. And when Microsoft
          > decides to change it, you will have to do a re-write. Remember the
          > deprecated "*=" versus "LEFT OUTER JOIN" conversions?
          >
          > The correct syntax for a searched update statement is
          >
          > <update statement> ::=
          > UPDATE <table name>
          > SET <set clause list>
          > [WHERE <search condition>]
          >
          > <set clause list> ::=
          > <set clause> [{ , <set clause> }...]
          >
          > <set clause> ::= <object column> = <update source>
          >
          > <update source> ::= <value expression> | NULL | DEFAULT
          >
          > <object column> ::= <column name>
          >
          > The UPDATE clause simply gives the name of the base table or updatable
          > view to be changed.
          >
          > Notice that no correlation name is allowed in the UPDATE clause; this
          > is to avoid some self-referencing problems that could occur. But it
          > also follows the data model in Standard SQL. When you give a table
          > expression a correlation name, it is to act as if a materialized table
          > with that correlation name has been created in the database. That
          > table then is dropped at the end of the statement. If you allowed
          > correlation names in the UPDATE clause, you would be updating the
          > materialized table, which would then disappear and leave the base
          > table untouched.
          >
          > The SET clause is a list of columns to be changed or made; the WHERE
          > clause tells the statement which rows to use. For this discussion, we
          > will assume the user doing the update has applicable UPDATE privileges
          > for each <object column>.
          >
          > * The WHERE Clause
          >
          > As mentioned, the most important thing to remember about the WHERE
          > clause is that it is optional. If there is no WHERE clause, all rows
          > in the table are changed. This is a common error; if you make it,
          > immediately execute a ROLLBACK statement.
          >
          > All rows that test TRUE for the <search condition> are marked as a
          > subset and not as individual rows. It is also possible that this
          > subset will be empty. This subset is used to construct a new set of
          > rows that will be inserted into the table when the subset is deleted
          > from the table. Note that the empty subset is a valid update that
          > will fire declarative referential actions and triggers.
          >
          > * The SET Clause
          >
          > Each assignment in the <set clause list> is executed in parallel and
          > each SET clause changes all the qualified rows at once. Or at least
          > that is the theoretical model. In practice, implementations will
          > first mark all of the qualified rows in the table in one pass, using
          > the WHERE clause. If there were no problems, then the SQL engine
          > makes a copy of each marked row in working storage. Each SET clause
          > is executed based on the old row image and the results are put in the
          > new row image. Finally, the old rows are deleted and the new rows are
          > inserted. If an error occurs during all of this, then system does a
          > ROLLBACK, the table is left unchanged and the errors are reported.
          > This parallelism is not like what you find in a traditional
          > third-generation programming language, so it may be hard to learn.
          > This feature lets you write a statement that will swap the values in
          > two columns, thus:
          >
          > UPDATE MyTable
          > SET a = b, b = a;
          >
          > This is not the same thing as
          >
          > BEGIN ATOMIC
          > UPDATE MyTable
          > SET a = b;
          > UPDATE MyTable
          > SET b = a;
          > END;
          >
          > In the first UPDATE, columns a and b will swap values in each row. In
          > the second pair of UPDATEs, column a will get all of the values of
          > column b in each row. In the second UPDATE of the pair, a, which now
          > has the same value as the original value of b, will be written back
          > into column b -- no change at all. There are some limits as to what
          > the value expression can be. The same column cannot appear more than
          > once in a <set clause list> -- which makes sense, given the parallel
          > nature of the statement. Since both go into effect at the same time,
          > you would not know which SET clause to use.
          >
          > If a subquery expression is used in a <set clause>, and it returns a
          > single value, the result set is cast to a scalar; if it returns an
          > empty, the result set is cast to a NULL; if it returns multiple rows,
          > a cardinality violation is raised.
          >
          > When we finally get T-SQL up to SQL-92 specs, you could use row
          > constructors and have written:
          >
          > UPDATE Test
          > SET ROW(a,b)
          > = (SELECT AVG(q), SUM(q) FROM FooWHERE Foo.p = Test.y);[/color]

          Comment

          • Steven An

            #6
            Re: UPDATEs with multiple aggregate functions

            OK I didn't read the "When we finally" part. So as of now, my best
            actual solution seems to be the JOIN'd UPDATE.. I guess I'll have to
            break standard. Oh well..

            jcelko212@earth link.net (--CELKO--) wrote in message news:<18c7b3c2. 0406121447.35ad ed55@posting.go ogle.com>...[color=blue][color=green][color=darkred]
            > >> I need to write an update query with multiple aggregate functions.[/color][/color]
            > Here is an example: <<
            >
            > You might want to learn Standard SQL. There is no FROM clause in a
            > Standard SQL UPDATE statement; it would make no sense. Other products
            > (SQL Server, Sybase and Ingres) also use the UPDATE .. FROM syntax,
            > but with different semantics. So it does not port, or even worse,
            > when you do move it, it trashes your database. Other programmers
            > cannot read it and maintaining it is harder. And when Microsoft
            > decides to change it, you will have to do a re-write. Remember the
            > deprecated "*=" versus "LEFT OUTER JOIN" conversions?
            >
            > The correct syntax for a searched update statement is
            >
            > <update statement> ::=
            > UPDATE <table name>
            > SET <set clause list>
            > [WHERE <search condition>]
            >
            > <set clause list> ::=
            > <set clause> [{ , <set clause> }...]
            >
            > <set clause> ::= <object column> = <update source>
            >
            > <update source> ::= <value expression> | NULL | DEFAULT
            >
            > <object column> ::= <column name>
            >
            > The UPDATE clause simply gives the name of the base table or updatable
            > view to be changed.
            >
            > Notice that no correlation name is allowed in the UPDATE clause; this
            > is to avoid some self-referencing problems that could occur. But it
            > also follows the data model in Standard SQL. When you give a table
            > expression a correlation name, it is to act as if a materialized table
            > with that correlation name has been created in the database. That
            > table then is dropped at the end of the statement. If you allowed
            > correlation names in the UPDATE clause, you would be updating the
            > materialized table, which would then disappear and leave the base
            > table untouched.
            >
            > The SET clause is a list of columns to be changed or made; the WHERE
            > clause tells the statement which rows to use. For this discussion, we
            > will assume the user doing the update has applicable UPDATE privileges
            > for each <object column>.
            >
            > * The WHERE Clause
            >
            > As mentioned, the most important thing to remember about the WHERE
            > clause is that it is optional. If there is no WHERE clause, all rows
            > in the table are changed. This is a common error; if you make it,
            > immediately execute a ROLLBACK statement.
            >
            > All rows that test TRUE for the <search condition> are marked as a
            > subset and not as individual rows. It is also possible that this
            > subset will be empty. This subset is used to construct a new set of
            > rows that will be inserted into the table when the subset is deleted
            > from the table. Note that the empty subset is a valid update that
            > will fire declarative referential actions and triggers.
            >
            > * The SET Clause
            >
            > Each assignment in the <set clause list> is executed in parallel and
            > each SET clause changes all the qualified rows at once. Or at least
            > that is the theoretical model. In practice, implementations will
            > first mark all of the qualified rows in the table in one pass, using
            > the WHERE clause. If there were no problems, then the SQL engine
            > makes a copy of each marked row in working storage. Each SET clause
            > is executed based on the old row image and the results are put in the
            > new row image. Finally, the old rows are deleted and the new rows are
            > inserted. If an error occurs during all of this, then system does a
            > ROLLBACK, the table is left unchanged and the errors are reported.
            > This parallelism is not like what you find in a traditional
            > third-generation programming language, so it may be hard to learn.
            > This feature lets you write a statement that will swap the values in
            > two columns, thus:
            >
            > UPDATE MyTable
            > SET a = b, b = a;
            >
            > This is not the same thing as
            >
            > BEGIN ATOMIC
            > UPDATE MyTable
            > SET a = b;
            > UPDATE MyTable
            > SET b = a;
            > END;
            >
            > In the first UPDATE, columns a and b will swap values in each row. In
            > the second pair of UPDATEs, column a will get all of the values of
            > column b in each row. In the second UPDATE of the pair, a, which now
            > has the same value as the original value of b, will be written back
            > into column b -- no change at all. There are some limits as to what
            > the value expression can be. The same column cannot appear more than
            > once in a <set clause list> -- which makes sense, given the parallel
            > nature of the statement. Since both go into effect at the same time,
            > you would not know which SET clause to use.
            >
            > If a subquery expression is used in a <set clause>, and it returns a
            > single value, the result set is cast to a scalar; if it returns an
            > empty, the result set is cast to a NULL; if it returns multiple rows,
            > a cardinality violation is raised.
            >
            > When we finally get T-SQL up to SQL-92 specs, you could use row
            > constructors and have written:
            >
            > UPDATE Test
            > SET ROW(a,b)
            > = (SELECT AVG(q), SUM(q) FROM FooWHERE Foo.p = Test.y);[/color]

            Comment

            • Erland Sommarskog

              #7
              Re: UPDATEs with multiple aggregate functions

              Steven An (stevenan@uclin k.berkeley.edu) writes:[color=blue]
              > Well, thank you for that very insightful answer. Yes, that ROW
              > operator is a lot like what I had in mind. I was looking for
              > something like SET (a,b) = (select ..). I will try that, and see
              > which one is faster. I have a feeling your method will be MUCH
              > faster, since unlike the JOIN, it does not require that I calculate
              > avg's and sum's for ALL rows, only the ones I want. Thank you![/color]

              Actually, my experience is that on SQL Server

              UPDATE tbl
              SET col = (SELECT SUM(x)
              FROM tbl2
              WHERE tbl2.keycol1 = tbl.keycol)

              usually performs less well than:

              UPDATE tbl
              SET col = x.sum2
              FROM tbl a
              JOIN (SELECT keycol1, sum2 = SUM(x) FROM tbl2 GROUP BY keycol1) b
              ON a.keycol = b.keycol1

              The derived table, is only logical and SQL Server does not have to
              compute all sums, even if the looks so. We had one case in our
              system where a colleague rewrote from the first syntax to second,
              and execution time fell from two minutes to a few seconds. But

              If you have more than one column to update, using a derived table is
              almost a guaranteed winner.

              Of course, this is all up to the optimizer, and if you need to know
              for the specific case, you have to benchmark.

              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

              Books Online for SQL Server SP3 at
              Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

              Comment

              Working...