Speeding up LIKE with placeholders?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Dan Sugalski

    Speeding up LIKE with placeholders?

    Is there any good way to speed up SQL that uses like and has placeholders?

    Here's the scoop. I've got a system that uses a lot of pre-generated
    SQL with placeholders in it. At runtime these SQL statements are
    fired off (through the C PQexecParams function, if that matters) for
    execution. No prepares or anything, just bare statements with $1 and
    friends, with the values passed in as parameters. Straightforward ,
    and no big deal.

    Unfortunately, performance is horrible. And when I mean horrible,
    we're talking 6 orders of magnitude (101355.884 ms vs 0.267 ms) when
    checked out via EXPLAIN ANALYZE. The slow version has the SQL defined
    as a function with the parameters passed in, while the fast way has
    the parameters substituted in, and the query plan for the slow
    version notes that it's doing a sequential scan, while the fast
    version uses one of the indexes. (And the field being LIKEd has a
    b-tree index on it) The LIKE condition always has a constant prefix
    -- it's 'S%' or 'S42343%' -- so it fits the index.

    Now, I'd not be surprised for a generic function to do this, if the
    plan is created when the function is created, and I can deal with
    that. I'd figure, though, that since the parameters are being passed
    into PQexecParams basically to get them out of band so I don't have
    to deal with escaping, quoting, and suchlike things, that the
    optimizer would look at things *after* the substitution was done.

    Is there anything I can do to speed this up, short of doing the
    parameter substitution myself and skipping PQexecParams here? (Which
    I'd rather not, since it's a pain and somewhat error-prone (for me,
    at least))
    --
    Dan

    --------------------------------------it's like this-------------------
    Dan Sugalski even samurai
    dan@sidhe.org have teddy bears and even
    teddy bears get drunk

    ---------------------------(end of broadcast)---------------------------
    TIP 7: don't forget to increase your free space map settings

  • Tom Lane

    #2
    Re: Speeding up LIKE with placeholders?

    Dan Sugalski <dan@sidhe.or g> writes:[color=blue]
    > I'd figure, though, that since the parameters are being passed
    > into PQexecParams basically to get them out of band so I don't have
    > to deal with escaping, quoting, and suchlike things, that the
    > optimizer would look at things *after* the substitution was done.[/color]

    You'd figure wrong :-(. The present mechanism for the LIKE-to-index
    optimization requires the LIKE pattern to be a pure, unadulterated constant.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

    Comment

    • Dan Sugalski

      #3
      Re: Speeding up LIKE with placeholders?

      At 5:19 PM -0400 9/10/04, Tom Lane wrote:[color=blue]
      >Dan Sugalski <dan@sidhe.or g> writes:[color=green]
      >> I'd figure, though, that since the parameters are being passed
      >> into PQexecParams basically to get them out of band so I don't have
      >> to deal with escaping, quoting, and suchlike things, that the
      >> optimizer would look at things *after* the substitution was done.[/color]
      >
      >You'd figure wrong :-(. The present mechanism for the LIKE-to-index
      >optimization requires the LIKE pattern to be a pure, unadulterated constant.[/color]

      Well. Darn.

      Would I regret it if I asked where in the source this lies so I could
      go fix it?
      --
      Dan

      --------------------------------------it's like this-------------------
      Dan Sugalski even samurai
      dan@sidhe.org have teddy bears and even
      teddy bears get drunk

      ---------------------------(end of broadcast)---------------------------
      TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

      Comment

      • Tom Lane

        #4
        Re: Speeding up LIKE with placeholders?

        Dan Sugalski <dan@sidhe.or g> writes:[color=blue]
        > Would I regret it if I asked where in the source this lies so I could
        > go fix it?[/color]

        If it were easy to fix it would have been fixed before now ...

        I have toyed with the notion of converting "var LIKE pattern" to
        "var LIKE pattern AND var >= lowbound(patter n) AND var < highbound(patte rn)"
        where lowbound() and highbound() are actual functions that we leave in
        the generated plan, rather than insisting that the planner derive these
        bounds before making the plan at all. Then the pattern wouldn't have
        to be a true constant. However, it falls down on this problem: what
        shall those functions do if the supplied pattern isn't left-anchored at
        all? highbound in particular doesn't have a valid result it can give
        that's guaranteed larger than all possible values of var. Not to
        mention that a full-table index scan is the very last thing you want ---
        I think the planner would really be abdicating its responsibilitie s to
        generate a plan with that kind of downside risk.

        You could possibly sidestep this argument by envisioning a query like
        var LIKE ('^' || $1)
        but I doubt that anyone actually writes such things. In the end, LIKE
        is the sort of thing that you really have to run a planning cycle for
        in order to get a reasonable plan.

        regards, tom lane

        ---------------------------(end of broadcast)---------------------------
        TIP 7: don't forget to increase your free space map settings

        Comment

        • Dan Sugalski

          #5
          Re: Speeding up LIKE with placeholders?

          At 5:55 PM -0400 9/10/04, Tom Lane wrote:[color=blue]
          >Dan Sugalski <dan@sidhe.or g> writes:[color=green]
          >> Would I regret it if I asked where in the source this lies so I could
          >> go fix it?[/color]
          >
          >If it were easy to fix it would have been fixed before now ...[/color]

          Oh, I wasn't expecting it to be an *easy* fix... :) The question is
          whether it's less work to make the fix in Postgres or in my back-end
          library code. Worst case I fix it in my code and submit a doc patch,
          but I'm up for at least investigating the general fix.

          Since the only difference in this case is that the parameters are
          pulled out for transport rather than being in band (a
          properly-escaped string substitution could turn this case from a
          PQexecParams call into a PQexec call) I was thinking the thing to do
          would be to either teach the planner to look in the parameter list
          when it gets handed $xxx variables, or have the back-end do the
          substitution to the SQL before handing the code to the planner.

          I'm not sure I like either option (the pre-substitution's got some
          issues when handed large parameters, while teaching the planner to
          use a parameter list may require thumping a lot of back-end code) and
          it may amount to nothing, but I figured it was worth a look, if for
          no other reason than to find a big mass of code I can safely run
          screaming from. ;-)
          --
          Dan

          --------------------------------------it's like this-------------------
          Dan Sugalski even samurai
          dan@sidhe.org have teddy bears and even
          teddy bears get drunk

          ---------------------------(end of broadcast)---------------------------
          TIP 6: Have you searched our list archives?



          Comment

          • Tom Lane

            #6
            Re: Speeding up LIKE with placeholders?

            Dan Sugalski <dan@sidhe.or g> writes:[color=blue]
            > Since the only difference in this case is that the parameters are
            > pulled out for transport rather than being in band (a
            > properly-escaped string substitution could turn this case from a
            > PQexecParams call into a PQexec call) I was thinking the thing to do
            > would be to either teach the planner to look in the parameter list
            > when it gets handed $xxx variables, or have the back-end do the
            > substitution to the SQL before handing the code to the planner.[/color]

            This has already been considered and rejected. Oliver Jowett did the
            part that is safe, which is to use the parameter values for estimation
            purposes in other contexts, but pre-substituting a parameter value for
            LIKE calls the mere correctness of the plan into question.

            What it would take to make it workable is a change in the semantics of
            the v3 protocol messages, such that there is no re-use of a plan. That,
            no one is up for quite yet, when we just hacked the protocol last year ...

            regards, tom lane

            ---------------------------(end of broadcast)---------------------------
            TIP 8: explain analyze is your friend

            Comment

            • Dan Sugalski

              #7
              Re: Speeding up LIKE with placeholders?

              At 6:33 PM -0400 9/10/04, Tom Lane wrote:[color=blue]
              >Dan Sugalski <dan@sidhe.or g> writes:[color=green]
              >> Since the only difference in this case is that the parameters are
              >> pulled out for transport rather than being in band (a
              >> properly-escaped string substitution could turn this case from a
              >> PQexecParams call into a PQexec call) I was thinking the thing to do
              >> would be to either teach the planner to look in the parameter list
              >> when it gets handed $xxx variables, or have the back-end do the
              >> substitution to the SQL before handing the code to the planner.[/color]
              >
              >This has already been considered and rejected. Oliver Jowett did the
              >part that is safe, which is to use the parameter values for estimation
              >purposes in other contexts, but pre-substituting a parameter value for
              >LIKE calls the mere correctness of the plan into question.[/color]

              Ouch. Okay, fair 'nuff. (I figured the parameters could be factored
              in before the plan was made. Wrongly, I see, now that I poke around
              in the code a bit :) Plan B for me it is.
              [color=blue]
              >What it would take to make it workable is a change in the semantics of
              >the v3 protocol messages, such that there is no re-use of a plan. That,
              >no one is up for quite yet, when we just hacked the protocol last year ...[/color]

              It might be possible with a backwards-compatible protocol change, but
              that's more work than I'm up for, and this is the wrong list for it
              anyway.
              --
              Dan

              --------------------------------------it's like this-------------------
              Dan Sugalski even samurai
              dan@sidhe.org have teddy bears and even
              teddy bears get drunk

              ---------------------------(end of broadcast)---------------------------
              TIP 8: explain analyze is your friend

              Comment

              • Greg Stark

                #8
                Re: Speeding up LIKE with placeholders?


                Tom Lane <tgl@sss.pgh.pa .us> writes:
                [color=blue]
                > I think the planner would really be abdicating its responsibilitie s to
                > generate a plan with that kind of downside risk.[/color]

                Sure, but what about the risk of using a sequential scan the other 99% of the
                time? The downside risk of the index scan is a 5x slowdown or so. The downside
                risk of the sequential scan is unbounded.
                [color=blue]
                > You could possibly sidestep this argument by envisioning a query like
                > var LIKE ('^' || $1)
                > but I doubt that anyone actually writes such things. In the end, LIKE
                > is the sort of thing that you really have to run a planning cycle for
                > in order to get a reasonable plan.[/color]

                Actually ^ doesn't mean anything to LIKE. There's no way to anchor a LIKE
                pattern except by ensuring it doesn't start with % or _.

                I don't know. I wrote code that did "LIKE ?||'%'" on Oracle tons of times and
                it always used an index scan. I was really impressed when I first checked
                whether that worked and really happy when it did. And it always ran just fine.

                In retrospect I would have done something like "LIKE escape(?)||'%'" . Except
                there's no such function. And if I had to write it myself I would do it in the
                application. String manipulation in SQL always being such a pain. And in any
                case I would have to check for an empty argument and handle that with some
                friendly UI message, which can't be done with a simple function in the query.
                So the database would be none the wiser and I still would have been
                disappointed if it didn't use the index scan.

                In the end it's always possible to fool the planner into producing a bad plan.
                It's just got to pick the plan that's most likely to be the one the user
                intended and least dangerous. It's hard to picture someone intentionally doing
                ?||'%' without thinking it would use an index scan. If they didn't check for
                leading %s and _s or empty parameters then it was their oversight or they were
                expecting it to be slow.

                --
                greg


                ---------------------------(end of broadcast)---------------------------
                TIP 9: the planner will ignore your desire to choose an index scan if your
                joining column's datatypes do not match

                Comment

                • Martijn van Oosterhout

                  #9
                  Re: Speeding up LIKE with placeholders?

                  On Sat, Sep 11, 2004 at 02:30:35AM -0400, Greg Stark wrote:[color=blue]
                  > I don't know. I wrote code that did "LIKE ?||'%'" on Oracle tons of timesand
                  > it always used an index scan. I was really impressed when I first checked
                  > whether that worked and really happy when it did. And it always ran just fine.[/color]

                  Note, this would have worked fine in previous versions of postgresql,
                  because there was no such thing as prepared statements so the plan
                  needed to be recreated for each query, thus it could take advantage of
                  all knowledge available in the query itself.

                  Kinda ironic that a seemingly very nice feature (seperating query from
                  arguments) has such a side-effect (less information for planner).
                  --
                  Martijn van Oosterhout <kleptog@svana. org> http://svana.org/kleptog/[color=blue]
                  > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
                  > tool for doing 5% of the work and then sitting around waiting for someone
                  > else to do the other 95% so you can sue them.[/color]

                  -----BEGIN PGP SIGNATURE-----
                  Version: GnuPG v1.0.6 (GNU/Linux)
                  Comment: For info see http://www.gnupg.org

                  iD8DBQFBQsPvY5T wig3Ge+YRAvbhAJ 9GJS5szbEaiZEfz asA7ptMoSaoFgCg gkW2
                  AIEwYmv6OsF+XHB oaaUeYd4=
                  =NvJz
                  -----END PGP SIGNATURE-----

                  Comment

                  • Pierre-Frédéric Caillaud

                    #10
                    Re: Speeding up LIKE with placeholders?


                    If I understand correctly your problem is that the plan for your prepared
                    query is bad because the LIKE parameter is not known at prepare time...

                    Immediate solutions :

                    Is it possible to use EXECUTE in your SQL to sidestep this and replan the
                    query with the right values ?
                    Can you put your query in a function so that it's planned on first
                    execution with real parameter values ?

                    There is still the old :
                    SELECT ... WHERE field BETWEEN 'prefix' AND 'prefiy'; but it kinda sucks
                    ;)

                    ---------------------------
                    Maybe this just indicates a lack of user-friendliness in the API ?
                    Suggestions for improvement (in random order):
                    Don't flame me for suggesting impossible things... I'm just wishing.

                    * Maybe a new operator like <column STARTSWITH 'prefix'> could replace
                    <column LIKE 'prefix%'> and tell the planner with absolute certainty that,
                    whatever the value is, use of an index is possible ? It's possible to
                    create user defined operators in Postgres... so...

                    * In a future version, when planning prepared queries, maybe the planner
                    could flag a query as "replan according to real parameter values every
                    time it's executed" when it detects a hard to plan condition like "LIKE
                    $1" or other hard to predict cases ? In this case the cost of planning
                    would occur at each execution, but the query parsing time would still be
                    saved. Or maybe the planner could write itself a note saying : "I planned
                    the query for the generic LIKE, but if $1 does not start with a %, I
                    should replan it when it's executed"

                    * Maybe there should be a keyword like "REPLAN" to allow the user to
                    specify that the query should be replanned every time ?

                    * Maybe there should be a keyword like "DEFER PLAN" to allow the user to
                    specify that the query should be not be planned when PREPARE is called,
                    but rather on its first execution, with real parameters, and the plan then
                    stored and used for subsequent queries ?
                    This has the disadvantage of depending on the next query to specify
                    adequate parameters.

                    Maybe we could tell the planner which parameters to use for preparing the
                    plan :
                    PREPARE myquery( text, integer )
                    WITH PARAMETERS ('aa%', 5)
                    AS SELECT * FROM mytable WHERE ...

                    so the plan stored for this prepared query would be generated using the
                    specified parameter values instead of just placeholders (so the planner
                    may generate a better plan, in this case it would notice the prefix nature
                    of the LIKE 'aa%').

                    * Maybe we could give the planner hints about the placeholders in a
                    prepared query. Something like

                    PREPARE myquery( text, integer )
                    AS SELECT * FROM mytable WHERE ...

                    would become :

                    PREPARE myquery( text, integer )
                    WHERE expression
                    AS SELECT * FROM mytable WHERE ...

                    where expression would be any boolean expression giving hints to the
                    planner, like :
                    $2 NOT NULL
                    in this case the planner would know that it should take into account only
                    the stats for $2 not null,10 ; and ignore the non-indexed NULLs for
                    example. However this is redundant as the "$2 NOT NULL" can as well be put
                    in the WHERE part of the query.

                    This could be used in the 'LIKE $1' case by specifying that $1 cannot
                    start with a '%'. The planner would have to be pretty smart to take this
                    into account...
                    When the expression is false, the query would be re-planned instead of
                    using the stored plan.

                    * Maybe we could add a clause to this poster's SELECT :

                    PREPARE myquery AS SELECT * FROM mytable
                    WHERE myfield LIKE $1 AND NOT ($1 LIKE '%%%');

                    In this case the planner would have to recognize the "NOT ($1 LIKE
                    '%%%')" part and take appropriate measures... When executing the query
                    this would not add any cost as it is a constant. However if the user
                    messes up and sends $1='%something' , he'll get no results.

                    Anyway,
                    have a nice day !




                    ---------------------------(end of broadcast)---------------------------
                    TIP 8: explain analyze is your friend

                    Comment

                    • Dan Sugalski

                      #11
                      Re: Speeding up LIKE with placeholders?

                      At 12:30 PM +0200 9/11/04, Pierre-Frédéric Caillaud wrote:[color=blue]
                      > If I understand correctly your problem is
                      >that the plan for your prepared query is bad
                      >because the LIKE parameter is not known at
                      >prepare time...[/color]

                      Almost. My problem is that queries made with the
                      PQexecParams C call are automatically split into
                      a prepare and execute pair. So while as far as my
                      code's concerned everything goes over to the
                      server in one big wad, the same as if I'd sent it
                      over with a PQexec call, the server sees it in
                      several pieces.

                      The only way to fix this is to change the way
                      that PQexecParams is handled, which from what I
                      can see of the postgres code would require a
                      protocol change and corresponding extension of
                      the server back end, or a change in the way
                      prepared queries are executed by the server,
                      neither of which is a particularly simple thing
                      to do. (Which I didn't realize initially)
                      --
                      Dan

                      --------------------------------------it's like this-------------------
                      Dan Sugalski even samurai
                      dan@sidhe.org have teddy bears and even
                      teddy bears get drunk

                      ---------------------------(end of broadcast)---------------------------
                      TIP 5: Have you checked our extensive FAQ?



                      Comment

                      • Greg Stark

                        #12
                        Re: Speeding up LIKE with placeholders?


                        Martijn van Oosterhout <kleptog@svana. org> writes:
                        [color=blue]
                        > Kinda ironic that a seemingly very nice feature (seperating query from
                        > arguments) has such a side-effect (less information for planner).[/color]

                        It was an known and expected downside.

                        But there is an upside too. For me it's actually a benefit, because it means
                        I'm less likely to called in the middle of the night because the web site when
                        someone entered some unusual parameters and caused a bad plan.

                        Whatever plan it's using when I test it will be the same as the plan it uses
                        for others, which may not be ideal but it won't be a surprise and I can
                        anticipate whether it'll be a reasonable plan in the future.

                        I can't anticipate all the potential plans the database might come up with for
                        all the possible parameters.

                        --
                        greg


                        ---------------------------(end of broadcast)---------------------------
                        TIP 8: explain analyze is your friend

                        Comment

                        Working...