Strange count(*) implementation?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Henk Ernst Blok

    Strange count(*) implementation?

    Hi Posgres users/developers,

    Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
    table scan to compute a count(*) on a base table after a vacuum analyze
    has been done with no following updates that might have outdated any
    statistics. Strangly the explain command does give the correct number of
    tuples instantaniously from the catalog, as one would expect. Still the
    optimizer thinks it needs a full table scan to do count.

    See example below:

    ------8<---------------------------------------------------------------------------------------------

    TestDB=# \d test_tbl;
    Table "public.test_tb l"
    Column | Type | Modifiers
    --------+---------+-----------
    pre | integer | not null
    name | text | not null
    Indexes:
    "test_tbl_p key" primary key, btree (pre)
    "test_tbl_pre_i ndex" unique, btree (pre)
    "test_tbl_name_ index" btree (name)

    TestDB=# explain select count(*) from test_tbl;
    QUERY PLAN
    ----------------------------------------------------------------------------
    Aggregate (cost=34293.60. .34293.60 rows=1 width=0)
    -> Seq Scan on test_tbl (cost=0.00..342 93.60 rows=166558 width=0)
    (2 rows)

    Time: 25.188 ms
    TestDB=# select count(*) from test_tbl;
    count
    --------
    166558
    (1 row)

    Time: 1024.803 ms
    TestDB=#

    ------8<---------------------------------------------------------------------------------------------

    The consequence of this seemingly odd count implementation is a very
    very slow count.

    Regards,


    Henk Ernst Blok

    --
    address: DB group, Computer Science, EEMCS Dept., University of Twente,
    PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
    phone: ++31 (0)53 489 3754 (if no response: 3690)
    email: h.e.blok@utwent e.nl
    WWW: http://www.cs.utwente.nl/~blokh


    ---------------------------(end of broadcast)---------------------------
    TIP 4: Don't 'kill -9' the postmaster

  • Richard Huxton

    #2
    Re: Strange count(*) implementation?

    Henk Ernst Blok wrote:[color=blue]
    > Hi Posgres users/developers,
    >
    > Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
    > table scan to compute a count(*) on a base table after a vacuum analyze
    > has been done with no following updates that might have outdated any
    > statistics. Strangly the explain command does give the correct number of
    > tuples instantaniously from the catalog, as one would expect. Still the
    > optimizer thinks it needs a full table scan to do count.[/color]
    [color=blue]
    > The consequence of this seemingly odd count implementation is a very
    > very slow count.[/color]

    To put it simply, count() doesn't look at the statistics because most of
    the time they are out of date. In any case, they aren't useful for any
    query with a WHERE clause.

    The next most obvious choice is to use the primary-key index rather than
    scanning the table. However, MVCC means that we can have multiple views
    of the table, with some backends seeing a different number of rows than
    others. So - either we need to store multiple index-entry versions as
    well as multiple row versions or you need to check the actual row in
    these cases. PostgreSQL does the second, which results in the full scan
    which you see.

    There is plenty of discussion of this (and also max()/min() aggregate
    functions) in the mailing list archives.

    HTH
    --
    Richard Huxton
    Archonet Ltd

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postg resql.org so that your
    message can get through to the mailing list cleanly

    Comment

    • Tino Wildenhain

      #3
      Re: Strange count(*) implementation?

      hi,

      On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:[color=blue]
      > Hi Posgres users/developers,
      >
      > Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
      > table scan to compute a count(*) on a base table after a vacuum analyze
      > has been done with no following updates that might have outdated any
      > statistics. Strangly the explain command does give the correct number of
      > tuples instantaniously from the catalog, as one would expect. Still the
      > optimizer thinks it needs a full table scan to do count.
      >[/color]
      ....[color=blue]
      > The consequence of this seemingly odd count implementation is a very
      > very slow count.[/color]

      How should the query planner know the vacuum was recent enough and there
      were no modifications to the table since?

      If you are interested in rough numbers you could read the system tables
      for the last vacuum statistics. If you need fast count and can spend
      some cycles on inserts, just make a buffer table with count results
      after insert.

      Unqualified count e.g. without a WHERE clause should not need to
      be used a lot.

      Regards
      Tino


      ---------------------------(end of broadcast)---------------------------
      TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

      Comment

      • Henk Ernst Blok

        #4
        Re: Strange count(*) implementation?

        Hi,

        My question was more of a fundamental nature as this count by scan
        seemed to contradict the theory about how to optimize it.

        I assume(d) the more expensive statistics (e.g., value distribution
        info) are updated only when outdated too much or on request (manual
        vacuum). Usually, other/cheap statistics can easily be maintained
        incrementally and thus reflect actual table state after each update. Of
        course, the MVCC principle seems to make things a bit more complicated I
        understand now. But tracking whether statistics are dirty has to be in
        the system anyway. How does it otherwise decide when to do its own
        statistics updates? So if explain can get the most recent count, why
        not use it in the count as well if you know the statistics are still
        acurate?

        By the way, a count(*) without any where does occur very frequently if
        you are dealing with an OLAP load, which is the case in my setting. So,
        I indeed already 'fixed' the performance problem by precomputing all
        counts I can predict to be of any use.

        Anyway, I understood this issue has been subject to discusion before I
        was on the list (searching the archive/website was/is not very
        effective, so I didn't know until someone told me so, sorry). So, I
        leave it to the developers what to do with this topic.

        Regards,



        Henk Ernst


        Tino Wildenhain wrote:
        [color=blue]
        >hi,
        >
        >On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
        >
        >[color=green]
        >>Hi Posgres users/developers,
        >>
        >>Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
        >>table scan to compute a count(*) on a base table after a vacuum analyze
        >>has been done with no following updates that might have outdated any
        >>statistics. Strangly the explain command does give the correct number of
        >>tuples instantaniously from the catalog, as one would expect. Still the
        >>optimizer thinks it needs a full table scan to do count.
        >>
        >>
        >>[/color]
        >...
        >
        >[color=green]
        >>The consequence of this seemingly odd count implementation is a very
        >>very slow count.
        >>
        >>[/color]
        >
        >How should the query planner know the vacuum was recent enough and there
        >were no modifications to the table since?
        >
        >If you are interested in rough numbers you could read the system tables
        >for the last vacuum statistics. If you need fast count and can spend
        >some cycles on inserts, just make a buffer table with count results
        >after insert.
        >
        >Unqualified count e.g. without a WHERE clause should not need to
        >be used a lot.
        >
        >Regards
        >Tino
        >
        >[/color]

        --
        address: DB group, Computer Science, EEMCS Dept., University of Twente,
        PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
        phone: ++31 (0)53 489 3754 (if no response: 3690)
        email: h.e.blok@utwent e.nl
        WWW: http://www.cs.utwente.nl/~blokh


        Comment

        • Neil Conway

          #5
          Re: Strange count(*) implementation?

          Henk Ernst Blok wrote:[color=blue]
          > I assume(d) the more expensive statistics (e.g., value distribution
          > info) are updated only when outdated too much or on request (manual
          > vacuum).[/color]

          They are only updated on request -- i.e. when an ANALYZE is issued.
          [color=blue]
          > So if explain can get the most recent count, why
          > not use it in the count as well if you know the statistics are still
          > acurate?[/color]

          Aside from the issue of stale statistics, there is another problem:
          optimizer statistics are designed to be approximations. They are not
          necessarily precise, even if ANALYZE has just been run (for example,
          pg_class.reltup les is stored as a floating point number).

          A practical problem is that aggregates like count() are implemented via
          a general-purpose API; there is currently no provision for bypassing the
          API in certain special case scenarios. See here for more info:



          -Neil

          ---------------------------(end of broadcast)---------------------------
          TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

          Comment

          • Alvaro Herrera

            #6
            Re: Strange count(*) implementation?

            On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:

            Hi,
            [color=blue]
            > I assume(d) the more expensive statistics (e.g., value distribution
            > info) are updated only when outdated too much or on request (manual
            > vacuum). Usually, other/cheap statistics can easily be maintained
            > incrementally and thus reflect actual table state after each update. Of
            > course, the MVCC principle seems to make things a bit more complicated I
            > understand now.[/color]

            It's not only MVCC; it's also the fact that aggregates are extensible.
            So to the system they are just opaque functions and it doesn't know how
            to optimize them.

            Of course this can be done, e.g. by supplying an optimizing function with
            each aggregate that would try to convert the step-by-step aggregate
            execution plan into a completely different operation (say by looking at
            an index to obtain a max() value), but as you see this is not a trivial
            task, and more importantly, it hasn't been done. Which is to mean, if
            you want to try and submit a patch, it could be improved in the future.

            --
            Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
            "There is evil in the world. There are dark, awful things. Occasionally, we get
            a glimpse of them. But there are dark corners; horrors almost impossible to
            imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)


            ---------------------------(end of broadcast)---------------------------
            TIP 3: if posting/reading through Usenet, please send an appropriate
            subscribe-nomail command to majordomo@postg resql.org so that your
            message can get through to the mailing list cleanly

            Comment

            • Tino Wildenhain

              #7
              Re: Strange count(*) implementation?

              On Tue, 2004-10-26 at 13:56, Henk Ernst Blok wrote:[color=blue]
              > Hi,
              >
              > My question was more of a fundamental nature as this count by scan
              > seemed to contradict the theory about how to optimize it.[/color]

              It is hard or next to impossible to optimize count() or more generally
              aggregates in a MVCC environment. Note also all the cases where
              you have a qualified select to calculate the aggregate.
              [color=blue]
              > I assume(d) the more expensive statistics (e.g., value distribution
              > info) are updated only when outdated too much or on request (manual
              > vacuum). Usually, other/cheap statistics can easily be maintained
              > incrementally and thus reflect actual table state after each update.[/color]

              I remember some discussion about this. But also here the MVCC and
              the general call for performance leads to the current solution
              where statistics are only updated on vacuum. With PG8.0 you have
              vacuum strategies with better performance which can run more often
              as I understand - so while not giving you exact figures your
              count() could be estimated at least.
              [color=blue]
              > Of course, the MVCC principle seems to make things a bit more
              > complicated I understand now. But tracking whether statistics are
              > dirty has to be in the system anyway. How does it otherwise decide
              > when to do its own statistics updates? So if explain can get the most
              > recent count, why not use it in the count as well if you know the
              > statistics are still acurate?[/color]

              The point is: you dont know it. There is curently no: mark statistics
              dirty if table has new tuples or tuples removed.
              [color=blue]
              > By the way, a count(*) without any where does occur very frequently if
              > you are dealing with an OLAP load, which is the case in my setting.
              > So, I indeed already 'fixed' the performance problem by precomputing
              > all counts I can predict to be of any use.[/color]

              I'm not familar with OLAP specifics, so what is the meaning of the
              count() here? What is done with this information?
              [color=blue]
              > Anyway, I understood this issue has been subject to discusion before I
              > was on the list (searching the archive/website was/is not very
              > effective, so I didn't know until someone told me so, sorry). So, I
              > leave it to the developers what to do with this topic.[/color]

              Yes, very very often I can tell you :-)
              Really this is an FAQ :-)

              Regards
              Tino


              ---------------------------(end of broadcast)---------------------------
              TIP 3: if posting/reading through Usenet, please send an appropriate
              subscribe-nomail command to majordomo@postg resql.org so that your
              message can get through to the mailing list cleanly

              Comment

              • Henk Ernst Blok

                #8
                Re: Strange count(*) implementation?

                Hi Tino,

                Tino Wildenhain wrote:
                [color=blue][color=green]
                >>I assume(d) the more expensive statistics (e.g., value distribution
                >>info) are updated only when outdated too much or on request (manual
                >>vacuum). Usually, other/cheap statistics can easily be maintained
                >>incremental ly and thus reflect actual table state after each update.
                >>
                >>[/color]
                >I remember some discussion about this. But also here the MVCC and
                >the general call for performance leads to the current solution
                >where statistics are only updated on vacuum. With PG8.0 you have
                >vacuum strategies with better performance which can run more often
                >as I understand - so while not giving you exact figures your
                >count() could be estimated at least.
                >
                >[/color]
                I need exact figures at the moment due to the type of some scientific
                experiments I'm running. Approximations are in the pipeline but I need a
                base-line run without any tricks that affect the accuracy of the numbers
                involved.
                [color=blue][color=green]
                >> Of course, the MVCC principle seems to make things a bit more
                >>complicated I understand now. But tracking whether statistics are
                >>dirty has to be in the system anyway. How does it otherwise decide
                >>when to do its own statistics updates? So if explain can get the most
                >>recent count, why not use it in the count as well if you know the
                >>statistics are still acurate?
                >>
                >>[/color]
                >The point is: you dont know it. There is curently no: mark statistics
                >dirty if table has new tuples or tuples removed.
                >
                >[/color]
                OK. Didn't know that, but got the impression by now it worked that way
                in Postgres.
                [color=blue][color=green]
                >>By the way, a count(*) without any where does occur very frequently if
                >>you are dealing with an OLAP load, which is the case in my setting.
                >>So, I indeed already 'fixed' the performance problem by precomputing
                >>all counts I can predict to be of any use.
                >>
                >>[/color]
                >I'm not familar with OLAP specifics, so what is the meaning of the
                >count() here? What is done with this information?
                >
                >[/color]
                OLAP stands for OnLine Analystical Processing, typically meaning heavy
                queries with a lot of data (for typical examples, see: http://www.tpc.org/
                the TPC-H query set in particular). So decision support and datamining
                are in that area for instance. My topic of interest is IR (information
                retrieval) in a database context. My experiments behave like an OLAP
                load at the moment. My current experiments involve a lot of counting and
                expensive joins as I have to compute certain estimators in a
                mathematical model I'm working on, hence the importance of the count... ;)
                On MySQL each of the 30 queries I have to run took on average about 24
                h. As my queries are getting even complexer I'm now trying to find out
                whether Postgres can do a better job.

                Regards,


                Henk Ernst

                --
                address: DB group, Computer Science, EEMCS Dept., University of Twente,
                PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
                phone: ++31 (0)53 489 3754 (if no response: 3690)
                email: h.e.blok@utwent e.nl
                WWW: http://www.cs.utwente.nl/~blokh


                Comment

                • Tino Wildenhain

                  #9
                  Re: Strange count(*) implementation?

                  Hi,

                  On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
                  ....
                  [color=blue]
                  > the TPC-H query set in particular). So decision support and datamining
                  > are in that area for instance. My topic of interest is IR (information
                  > retrieval) in a database context. My experiments behave like an OLAP
                  > load at the moment. My current experiments involve a lot of counting
                  > and expensive joins as I have to compute certain estimators in a
                  > mathematical model I'm working on, hence the importance of the
                  > count... ;)
                  > On MySQL each of the 30 queries I have to run took on average about 24
                  > h. As my queries are getting even complexer I'm now trying to find out
                  > whether Postgres can do a better job.[/color]

                  In your specific application if you have not many inserts
                  or have a phase where you do the inserts and another distinct
                  phase where you do the analysis, you should be able to
                  count() just before your analysis. If not you can always
                  experiment with triggers to count() in an optimized way
                  using just another table to store the count value
                  for every table you need.

                  INSERT/DELETE via function, use a trigger and/or RULES.
                  This should do the trick.
                  Maybe you can precalculate a lot more - depending on
                  the algorithms you use.

                  Regards
                  Tino


                  ---------------------------(end of broadcast)---------------------------
                  TIP 3: if posting/reading through Usenet, please send an appropriate
                  subscribe-nomail command to majordomo@postg resql.org so that your
                  message can get through to the mailing list cleanly

                  Comment

                  • Henk Ernst Blok

                    #10
                    Re: Strange count(*) implementation?

                    Tino,

                    Thanks for the sugestion about exploiting the rules system. I hadn't
                    thought about that option yet. Currently I'm trying to pre-compute as
                    much as possible.

                    Regards,


                    Henk Ernst

                    Tino Wildenhain wrote:
                    [color=blue]
                    >Hi,
                    >
                    >On Tue, 2004-10-26 at 15:25, Henk Ernst Blok wrote:
                    >...
                    >
                    >
                    >[color=green]
                    >>the TPC-H query set in particular). So decision support and datamining
                    >>are in that area for instance. My topic of interest is IR (information
                    >>retrieval) in a database context. My experiments behave like an OLAP
                    >>load at the moment. My current experiments involve a lot of counting
                    >>and expensive joins as I have to compute certain estimators in a
                    >>mathematica l model I'm working on, hence the importance of the
                    >>count... ;)
                    >>On MySQL each of the 30 queries I have to run took on average about 24
                    >>h. As my queries are getting even complexer I'm now trying to find out
                    >>whether Postgres can do a better job.
                    >>
                    >>[/color]
                    >
                    >In your specific application if you have not many inserts
                    >or have a phase where you do the inserts and another distinct
                    >phase where you do the analysis, you should be able to
                    >count() just before your analysis. If not you can always
                    >experiment with triggers to count() in an optimized way
                    >using just another table to store the count value
                    >for every table you need.
                    >
                    >INSERT/DELETE via function, use a trigger and/or RULES.
                    >This should do the trick.
                    >Maybe you can precalculate a lot more - depending on
                    >the algorithms you use.
                    >
                    >Regards
                    >Tino
                    >
                    >
                    >---------------------------(end of broadcast)---------------------------
                    >TIP 3: if posting/reading through Usenet, please send an appropriate
                    > subscribe-nomail command to majordomo@postg resql.org so that your
                    > message can get through to the mailing list cleanly
                    >
                    >[/color]

                    --
                    address: DB group, Computer Science, EEMCS Dept., University of Twente,
                    PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
                    phone: ++31 (0)53 489 3754 (if no response: 3690)
                    email: h.e.blok@utwent e.nl
                    WWW: http://www.cs.utwente.nl/~blokh


                    Comment

                    • Henk Ernst Blok

                      #11
                      Re: Strange count(*) implementation?

                      Hi Alvaro,

                      I used to do some research in extensibility of query optimizers to match
                      the extensibility of the operators. However, it's not really in the
                      focus of my research anymore so I can't spend much time on it,
                      unfortunately. I'll keep it in mind in case a student of the group where
                      I'm working is looking for an project.

                      Regards,


                      Henk Ernst.

                      Alvaro Herrera wrote:
                      [color=blue]
                      >On Tue, Oct 26, 2004 at 01:56:41PM +0200, Henk Ernst Blok wrote:
                      >
                      >Hi,
                      >
                      >
                      >[color=green]
                      >>I assume(d) the more expensive statistics (e.g., value distribution
                      >>info) are updated only when outdated too much or on request (manual
                      >>vacuum). Usually, other/cheap statistics can easily be maintained
                      >>incremental ly and thus reflect actual table state after each update. Of
                      >>course, the MVCC principle seems to make things a bit more complicated I
                      >>understand now.
                      >>
                      >>[/color]
                      >
                      >It's not only MVCC; it's also the fact that aggregates are extensible.
                      >So to the system they are just opaque functions and it doesn't know how
                      >to optimize them.
                      >
                      >Of course this can be done, e.g. by supplying an optimizing function with
                      >each aggregate that would try to convert the step-by-step aggregate
                      >execution plan into a completely different operation (say by looking at
                      >an index to obtain a max() value), but as you see this is not a trivial
                      >task, and more importantly, it hasn't been done. Which is to mean, if
                      >you want to try and submit a patch, it could be improved in the future.
                      >
                      >
                      >[/color]

                      --
                      address: DB group, Computer Science, EEMCS Dept., University of Twente,
                      PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
                      phone: ++31 (0)53 489 3754 (if no response: 3690)
                      email: h.e.blok@utwent e.nl
                      WWW: http://www.cs.utwente.nl/~blokh


                      Comment

                      Working...