Perfomance difference between 7.2 and 7.3

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Paulo Jan

    Perfomance difference between 7.2 and 7.3

    Hi all:

    I have here a table with the following schema:

    Table "todocinetv "
    Column | Type | Modifiers
    -------------+-----------------------------+----------------------
    id | integer | not null default '0'
    datestamp | timestamp without time zone | not null
    thread | integer | not null default '0'
    parent | integer | not null default '0'
    author | character(37) | not null default ''
    subject | character(255) | not null default ''
    email | character(200) | not null default ''
    attachment | character(64) | default ''
    host | character(50) | not null default ''
    email_reply | character(1) | not null default 'N'
    approved | character(1) | not null default 'N'
    msgid | character(100) | not null default ''
    modifystamp | integer | not null default '0'
    userid | integer | not null default '0'
    Indexes: todocinetv_appr oved,
    todocinetv_auth or,
    todocinetv_date stamp,
    todocinetv_modi fystamp,
    todocinetv_msgi d,
    todocinetv_pare nt,
    todocinetv_subj ect,
    todocinetv_thre ad,
    todocinetv_user id,
    todocinetvpri_k ey


    (It's actually a table created by the discussion board application
    Phorum (version 3.3)).
    This table has about 28000 rows, and is running with Postgres 7.2.3
    under Red Hat 8.0, in a 2.4 Ghz. Pentiun 4 with 512 Mb. of RAM.
    The problem I'm having is that, when you access the main page of the
    discussion board, it takes forever to show you the list of posts. The
    query that Phorum uses for doing so is:


    phorum=# explain
    phorum-# SELECT thread, modifystamp, count(id) AS tcount,
    datetime(modify stamp) AS latest, max(id) as maxid FROM todocinetv WHERE
    approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc,
    thread desc limit 30;
    NOTICE: QUERY PLAN:

    Limit (cost=40354.79. .40354.79 rows=30 width=12)
    -> Sort (cost=40354.79. .40354.79 rows=2879 width=12)
    -> Aggregate (cost=39901.43. .40189.35 rows=2879 width=12)
    -> Group (cost=39901.43. .40045.39 rows=28792 width=12)
    -> Sort (cost=39901.43. .39901.43 rows=28792 width=12)
    -> Seq Scan on todocinetv
    (cost=0.00..377 68.90 rows=28792 width=12)


    This query takes up to 3 minutes to execute. I have tried to strip it
    down and leaving it in its most vanilla form (without "count(id)" and
    such), and it's still almost as slow:


    phorum=# explain
    phorum-# SELECT thread, modifystamp, datetime(modify stamp) AS latest
    from todocinetv WHERE approved='Y' ORDER BY modifystamp desc, thread
    desc limit 30;
    NOTICE: QUERY PLAN:

    Limit (cost=39901.43. .39901.43 rows=30 width=8)
    -> Sort (cost=39901.43. .39901.43 rows=28792 width=8)
    -> Seq Scan on todocinetv (cost=0.00..377 68.90 rows=28792
    width=8)


    But here is the weird thing: I dump the table, export it into another
    machine running Postgres 7.3.2 (Celeron 1.7 Ghz, 512 Mb. of memory), and
    the query takes only 2 or 3 seconds to execute, even though the query
    plan is almost the same:


    provphorum=# explain
    provphorum-# SELECT thread, modifystamp, count(id) AS tcount,
    modifystamp AS latest, max(id) as maxid FROM todocinetv WHERE
    approved='Y' GROUP BY thread, modifystamp ORDER BY modifystamp desc,
    thread desc limit 30 ;
    QUERY PLAN

    ------------------------------------------------------------------------------------------------
    Limit (cost=5765.92.. 5765.99 rows=30 width=12)
    -> Sort (cost=5765.92.. 5772.96 rows=2817 width=12)
    Sort Key: modifystamp, thread
    -> Aggregate (cost=5252.34.. 5604.49 rows=2817 width=12)
    -> Group (cost=5252.34.. 5463.63 rows=28172 width=12)
    -> Sort (cost=5252.34.. 5322.77 rows=28172 width=12)
    Sort Key: thread, modifystamp
    -> Seq Scan on todocinetv
    (cost=0.00..317 0.15 rows=28172 width=12)
    Filter: (approved = 'Y'::bpchar)
    (9 rows)


    (I took out the "datetime" function, since 7.3 didn't accept it and I
    didn't think it was relevant to the performance problem (am I wrong?))

    So my question is: what causes such a big difference? (3 min. vs. 3
    seconds) Does the version difference (7.2 vs. 7.3) account for all of
    it? Or should I start looking at other factors? As I said, both machines
    are almost equivalent hardware-wise, and as for the number of shared
    buffers, the faster machine actually has less of them (the 7.3 machine
    has "shared_buf fers = 768", while the 7.2 one has "shared_buf fers = 1024").



    Paulo Jan.
    DDnet.



    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain analyze is your friend

  • Tom Lane

    #2
    Re: Perfomance difference between 7.2 and 7.3

    Paulo Jan <admin@digital. ddnet.es> writes:[color=blue]
    > -> Seq Scan on todocinetv (cost=0.00..377 68.90 rows=28792
    > width=8)[/color]

    The estimated cost seems to be more than one disk page read per row
    returned. This suggests to me that you have a huge amount of dead space
    in that table --- try a VACUUM FULL on it. If that fixes the problem,
    then you need to improve your housekeeping procedures on the 7.2
    installation: run vacuums more often and ensure that your FSM settings
    are large enough.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postg resql.org so that your
    message can get through to the mailing list cleanly

    Comment

    • Bruno Wolff III

      #3
      Re: Perfomance difference between 7.2 and 7.3

      On Wed, Nov 12, 2003 at 16:30:41 +0100,
      Paulo Jan <admin@digital. ddnet.es> wrote:[color=blue]
      > This table has about 28000 rows, and is running with Postgres 7.2.3
      > under Red Hat 8.0, in a 2.4 Ghz. Pentiun 4 with 512 Mb. of RAM.[/color]

      You probably want to use 7.4 for this since a new way to do aggragates using
      hashes has been added. 7.4 is currently in release candidate status and
      maybe be released as early as next Monday.
      [color=blue]
      >
      > This query takes up to 3 minutes to execute. I have tried to strip
      > it down and leaving it in its most vanilla form (without "count(id)" and
      > But here is the weird thing: I dump the table, export it into
      > another machine running Postgres 7.3.2 (Celeron 1.7 Ghz, 512 Mb. of
      > memory), and the query takes only 2 or 3 seconds to execute, even though
      > the query plan is almost the same:[/color]

      This makes it sound like you haven't been properly vacuuming and/or
      analyzing the database. You might want to run a vacuum full on the
      production db and see if that speeds things up. Once you have done
      a vacuum full then regular vacuums should keep the number of dead tuples
      down (as long as FSM is set high enough).

      ---------------------------(end of broadcast)---------------------------
      TIP 5: Have you checked our extensive FAQ?



      Comment

      • Paulo Jan

        #4
        Re: Perfomance difference between 7.2 and 7.3

        Tom Lane wrote:[color=blue]
        > Paulo Jan <admin@digital. ddnet.es> writes:
        >[color=green]
        >> -> Seq Scan on todocinetv (cost=0.00..377 68.90 rows=28792
        >>width=8)[/color]
        >
        >
        > The estimated cost seems to be more than one disk page read per row
        > returned. This suggests to me that you have a huge amount of dead space
        > in that table --- try a VACUUM FULL on it. If that fixes the problem,[/color]



        Argh!!! The thing is, I *had* run VACUUM several times before posting
        to the list... but it was VACUUM ANALYZE, not VACUUM FULL. And here I
        was, wondering why VACUUMing so much didn't have any effect...



        Paulo Jan.
        DDnet.



        ---------------------------(end of broadcast)---------------------------
        TIP 2: you can get off all lists at once with the unregister command
        (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

        Comment

        • Rick Gigger

          #5
          Re: simple question

          Is this correct?

          vacuum by itself just cleans out the old extraneous tuples so that they
          aren't in the way anymore
          vacuum analyze rebuilds indexes. If you add an index to a table it won't be
          used until you vacuum analyze it
          vacuum full actually compresses the table on disk by reclaiming the space
          from the old tuples after they have been removed.


          ----- Original Message -----
          From: "Tom Lane" <tgl@sss.pgh.pa .us>
          To: "Paulo Jan" <admin@digital. ddnet.es>
          Cc: <pgsql-general@postgre sql.org>
          Sent: Wednesday, November 12, 2003 8:38 AM
          Subject: Re: [GENERAL] Perfomance difference between 7.2 and 7.3

          [color=blue]
          > Paulo Jan <admin@digital. ddnet.es> writes:[color=green]
          > > -> Seq Scan on todocinetv (cost=0.00..377 68.90 rows=28792
          > > width=8)[/color]
          >
          > The estimated cost seems to be more than one disk page read per row
          > returned. This suggests to me that you have a huge amount of dead space
          > in that table --- try a VACUUM FULL on it. If that fixes the problem,
          > then you need to improve your housekeeping procedures on the 7.2
          > installation: run vacuums more often and ensure that your FSM settings
          > are large enough.
          >
          > regards, tom lane
          >
          > ---------------------------(end of broadcast)---------------------------
          > TIP 3: if posting/reading through Usenet, please send an appropriate
          > subscribe-nomail command to majordomo@postg resql.org so that your
          > message can get through to the mailing list cleanly
          >[/color]


          ---------------------------(end of broadcast)---------------------------
          TIP 3: if posting/reading through Usenet, please send an appropriate
          subscribe-nomail command to majordomo@postg resql.org so that your
          message can get through to the mailing list cleanly

          Comment

          • Doug McNaught

            #6
            Re: simple question

            "Rick Gigger" <rick@alpinenet working.com> writes:
            [color=blue]
            > Is this correct?
            >
            > vacuum by itself just cleans out the old extraneous tuples so that they
            > aren't in the way anymore[/color]

            Actually it puts the free space in each page on a list (the free space
            map) so it can be reused for new tuples without having to allocate
            fresh pages. It finds free space by looking for tuples that can't be
            seen any more by any transaction.
            [color=blue]
            > vacuum analyze rebuilds indexes. If you add an index to a table it won't be
            > used until you vacuum analyze it[/color]

            It doesn't rebuild indexes--REINDEX does that. ANALYZE measures the
            size and statistics of the data in the table, so the planner can do a
            good job.
            [color=blue]
            > vacuum full actually compresses the table on disk by reclaiming the space
            > from the old tuples after they have been removed.[/color]

            It moves tuples around and frees up pages at the end of the table,
            thus compacting it.

            So you're mostly wrong on all three. :)

            -Doug

            ---------------------------(end of broadcast)---------------------------
            TIP 5: Have you checked our extensive FAQ?



            Comment

            • scott.marlowe

              #7
              Re: simple question

              On Wed, 12 Nov 2003, Rick Gigger wrote:
              [color=blue]
              > Is this correct?
              >
              > vacuum by itself just cleans out the old extraneous tuples so that they
              > aren't in the way anymore
              > vacuum analyze rebuilds indexes. If you add an index to a table it won't be
              > used until you vacuum analyze it
              > vacuum full actually compresses the table on disk by reclaiming the space
              > from the old tuples after they have been removed.[/color]

              You don't have to analyze AFTER index creation, just at some point in
              time. I.e.:

              create table test ...

              import into table test 1000000 rows

              analyze test;

              create index test_field1_dx on test (id);

              select * from test where id=4567; <- this will likely use the index.


              ---------------------------(end of broadcast)---------------------------
              TIP 6: Have you searched our list archives?



              Comment

              • Rick Gigger

                #8
                Re: simple question

                > > Is this correct?[color=blue][color=green]
                > >
                > > vacuum by itself just cleans out the old extraneous tuples so that they
                > > aren't in the way anymore[/color]
                >
                > Actually it puts the free space in each page on a list (the free space
                > map) so it can be reused for new tuples without having to allocate
                > fresh pages. It finds free space by looking for tuples that can't be
                > seen any more by any transaction.
                >[color=green]
                > > vacuum analyze rebuilds indexes. If you add an index to a table it[/color][/color]
                won't be[color=blue][color=green]
                > > used until you vacuum analyze it[/color]
                >
                > It doesn't rebuild indexes--REINDEX does that. ANALYZE measures the
                > size and statistics of the data in the table, so the planner can do a
                > good job.[/color]

                Is REINDEX something that needs to be done on a periodic basis?
                [color=blue][color=green]
                > > vacuum full actually compresses the table on disk by reclaiming the[/color][/color]
                space[color=blue][color=green]
                > > from the old tuples after they have been removed.[/color]
                >
                > It moves tuples around and frees up pages at the end of the table,
                > thus compacting it.
                >
                > So you're mostly wrong on all three. :)
                >
                > -Doug
                >[/color]

                Thanks!

                Rick


                ---------------------------(end of broadcast)---------------------------
                TIP 3: if posting/reading through Usenet, please send an appropriate
                subscribe-nomail command to majordomo@postg resql.org so that your
                message can get through to the mailing list cleanly

                Comment

                • Bruno Wolff III

                  #9
                  Re: simple question

                  On Thu, Nov 13, 2003 at 12:06:05 -0700,
                  Rick Gigger <rick@alpinenet working.com> wrote:[color=blue]
                  >
                  > Is REINDEX something that needs to be done on a periodic basis?[/color]

                  In version prior to 7.4 some patterns of use will require periodic
                  reindexing. The problem case is when the index column monoticly
                  increases (or decreases) and old values eventually get deleted.
                  In this case the index blocks for the deleted values don't get
                  reused and the size of the index will continually grow.

                  ---------------------------(end of broadcast)---------------------------
                  TIP 5: Have you checked our extensive FAQ?



                  Comment

                  Working...