Major query optimiser weirdness with UDFs and SPs on SQL 2000

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Andrew Mayo

    Major query optimiser weirdness with UDFs and SPs on SQL 2000

    There is something very strange going on here. Tested with ADO 2.7 and
    MSDE/2000. At first, things look quite sensible.

    You have a simple SQL query, let's say

    select * from mytab where col1 = 1234

    Now, let's write a simple VB program to do this query back to an
    MSDE/2000 database on our local machine. Effectively, we'll

    rs.open sSQL
    rs.close

    and do that 1,000 times. We wont bother fetching the result set, it
    isn't important in this example.

    No problem. On my machine this takes around 1.6 seconds and modifying
    the code so that the column value in the where clause changes each
    time (i.e col1 = nnnn), doesn't make a substantial difference to this
    time. Well, that all seems reasonable, so moving right along...

    Now we do it with a stored procedure

    create procedure proctest(@id int)
    as
    select * from mytab where col1 = @id

    and we now find that executing

    proctest nnnn

    1,000 times takes around 1.6 seconds whether or not the argument
    changes. So far so good. No obvious saving, but then we wouldn't
    expect any. The query is very simple, after all.

    Well, get to the point!

    Now create a table-returning UDF

    create function functest(@id int) returns table as

    return
    (
    select * from mytab where col1 = @id
    )

    try calling that 1,000 times as

    select * from functest(nnnn)

    and we get around 5.5 seconds on my machine if the argument changes,
    otherwise 1.6 seconds if it remains the same for each call.

    Hmm, looks like the query plan is discarded if the argument changes.
    Well, that's fair enough I guess. UDFs might well be more expensive...
    gotta be careful about using them. It's odd that discarding the query
    plan seems to be SO expensive, but hey, waddya expect?. (perhaps the
    UDF is completely rebuilt, who knows)

    last test, then. Create an SP that calls the UDF

    create procedure proctest1(@id int)
    as
    select * from functest(@id)


    Ok, here's the $64,000 question. How long will this take if @id
    changes each time. The raw UDF took 5.5 seconds, remember, so this
    should be slightly slower.

    But... IT IS NOT.. It takes 1.6 seconds whether or not @id changes.
    Somehow, the UDF becomes FOUR TIMES more efficient when wrapped in an
    SP.

    My theory, which I stress is not entirely scientific, goes something
    like this:-

    I deduce that SQL Server decides to reuse the query plan in this
    circumstance but does NOT when the UDF is called directly. This is
    counter-intuitive but it may be because SQL Server's query parser is
    tuned for conventional SQL i.e it can say

    well, I've got

    select * from mytab WHERE [something or other]

    and now I've got

    select * from mytab WHERE [something else]

    so I can probably re-use the query plan from last time. (I don't know
    if it is this clever, but it does seem to know when two
    textually-different queries have some degree of commonality)

    Whereas with

    select * from UDF(arg1)

    and

    select * from UDF(arg2)

    it goes... hmm, mebbe not.... I better not risk it.

    But with

    sp_something arg1

    and

    sp_something arg2

    it goes... yup, i'll just go call it... and because the SP was already
    compiled, the internal call to the UDF already has a query plan.

    Anyway, that's the theory. For more complex UDFs, by the way, the
    performance increase can be a lot more substantial. On a big complex
    UDF with a bunch of joins, I measured a tenfold increase in
    performance just by wrapping it in an SP, as above.

    Obviously, wrapping a UDF in an SP isn't generally a good thing; the
    idea of UDFs is to allow the column list and where clause to filter
    the rowset of the UDF, but if you are repeatedly calling the UDF with
    the same where clause and column list, this will make it a *lot*
    faster.
  • Erland Sommarskog

    #2
    Re: Major query optimiser weirdness with UDFs and SPs on SQL 2000

    Andrew Mayo (ajmayo@my-deja.com) writes:[color=blue]
    > try calling that 1,000 times as
    >
    > select * from functest(nnnn)
    >
    > and we get around 5.5 seconds on my machine if the argument changes,
    > otherwise 1.6 seconds if it remains the same for each call.
    >
    > Hmm, looks like the query plan is discarded if the argument changes.
    > Well, that's fair enough I guess. UDFs might well be more expensive...
    > gotta be careful about using them. It's odd that discarding the query
    > plan seems to be SO expensive, but hey, waddya expect?. (perhaps the
    > UDF is completely rebuilt, who knows)[/color]

    The reason why the call to the UDF takes longer time is simple,
    and you would have obtained the same result if you had called the
    bare SQL statement with different values for 1234.

    To wit, a table-valued UDF is not really a function: it's a macro, so
    when you call it on your own, this is the same as invoking the naked
    SQL statement.

    Here, this costs you performance, but normally this works to your
    advantage. You use the table-valued UDF in a query, and the optimizer
    can then work with the expanded, potentially creating a better plan
    than if had to first had to evaluate the UDF on its own.

    A good way to illustrate this, is to write three functions, one of
    each kind that all return the value of @@nestlevel.

    --
    Erland Sommarskog, SQL Server MVP, sommar@algonet. se

    Books Online for SQL Server SP3 at
    Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

    Comment

    • Andrew Mayo

      #3
      Re: Major query optimiser weirdness with UDFs and SPs on SQL 2000

      Erland Sommarskog <sommar@algonet .se> wrote in message news:<Xns940E3E 3B9C99Yazorman@ 127.0.0.1>...
      [snip][color=blue]
      > The reason why the call to the UDF takes longer time is simple,
      > and you would have obtained the same result if you had called the
      > bare SQL statement with different values for 1234.
      >[/color]
      Hmm. Interesting. But I did state in the original post that I had also
      tried the bare SQL statement with *different* values for the column
      value and that this does *not* affect the execution time. This would
      appear to contradict your expected observation.

      I have done quite a bit of testing with more complex queries and again
      the cost of the raw query does not seem to be affected significantly
      by changing the substituted argument (I have only done this for the
      one argument case)

      The most complex query I have tested involves several joins, has a
      CASE statement, and has the argument parametrically substituted in
      four places within the query. The execution time for this query is
      around 8 seconds per 100 iterations running on the local server. This
      figure is almost constant, regardless of the parameter value, which is
      numeric.

      If I wrap this into a UDF, it also takes 8 seconds for 100 iterations,
      UNLESS I keep the argument the same. If I do that, execution time
      drops to 0.4 seconds.

      If I wrap it into an SP, execution time drops to 0.4 seconds
      regardless of the passed-in argument. If I then call the UDF from the
      SP, the execution time remains at 0.4 seconds regardless of the
      argument.

      However, if a UDF is essentially a macro (and unfortunately lacking
      the source code for SQL Server I have no idea how they are
      implemented), then I can see that effectively each time the UDF is
      invoked, if the arguments change, then the UDF is effectively rebuilt
      - presumably, this is equivalent to recompiling a stored procedure -
      but if the UDF is already inside a precompiled SP, then this process
      does not need to occur. That would make sense.

      Comment

      • Erland Sommarskog

        #4
        Re: Major query optimiser weirdness with UDFs and SPs on SQL 2000

        Andrew Mayo (ajmayo@my-deja.com) writes:[color=blue]
        > Hmm. Interesting. But I did state in the original post that I had also
        > tried the bare SQL statement with *different* values for the column
        > value and that this does *not* affect the execution time. This would
        > appear to contradict your expected observation.[/color]

        It could be that the single SQL statement is autoparameteriz ed, but
        when it is packaged in a UDF it is not.
        [color=blue]
        > However, if a UDF is essentially a macro (and unfortunately lacking
        > the source code for SQL Server I have no idea how they are
        > implemented), then I can see that effectively each time the UDF is
        > invoked, if the arguments change, then the UDF is effectively rebuilt
        > - presumably, this is equivalent to recompiling a stored procedure -
        > but if the UDF is already inside a precompiled SP, then this process
        > does not need to occur. That would make sense.[/color]

        Not that this only apply to inlined table function. If you were to
        put the SELECT in a multi-statement UDF, I predict that the execution
        time is the same as for the stored procedure, or possibly somewhat
        higher, since there is some overhead for the table variable.



        --
        Erland Sommarskog, SQL Server MVP, sommar@algonet. se

        Books Online for SQL Server SP3 at
        Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

        Comment

        Working...