Performance problem on RH7.1

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Együd Csaba

    Performance problem on RH7.1

    Hi All,
    I've a problem with the perfprmance of the production environment.
    I've two db servers. One on my laptop computer (2Ghz, 1GB, WinXP, Cygwin,
    Postgres 7.3.4) and one on a production server (2GHz, 1GB, Ultra SCSI,
    RH7.1, Postgres 7.3.2).

    I run the same dump and the same query on both of the computers. The
    difference is substantial.
    The query takes 5 times longer on the production server then on the laptop.

    What can be the reason? Could anybody suggest me something?
    Thakn you in advance.

    Best regards,
    -- Csaba Együd


    Kernel parameters on the linux server:
    --------------------------------------
    [root@db kernel]# pwd
    /proc/sys/kernel
    [root@db kernel]# cat shmall shmmax
    134217728
    134217728
    [root@db kernel]#


    The query:
    ----------
    explain analyze select
    id, artnum, oldartnum, name, munitid, getupid, vtsz, vat, description,
    getupquantity, minstock,
    (select count(*) from t_prices where t_prices.produc tid=t_products. id) as
    pcount,
    round(get_stock (id,1)::numeric ,2) as stockm,
    round(get_stock _getup(id,1)::n umeric,2) as stockg,
    (select abbrev from t_munits where id=munitid) as munit,
    (select get_order_getup (id)) as deliverygetup,
    (select (select deliverydate from t_orders where id=orderid) as
    deliverydate
    from t_orderdetails
    where productid=t_pro ducts.id and
    not (select delivered from t_orders where id=orderid) limit 1) as
    deliverydate,
    (select abbrev from t_getups where id=getupid) as getup
    from t_products
    order by artnum;

    QUERY PLAN on my laptop:
    ------------------------
    Sort (cost=70.17..72 .38 rows=885 width=184) (actual time=7264.00..7 264.00
    rows=885 loops=1)
    Sort Key: artnum
    -> Seq Scan on t_products (cost=0.00..26. 85 rows=885 width=184) (actual
    time=21.00..725 9.00 rows=885 loops=1)
    SubPlan
    -> Aggregate (cost=28.62..28 .62 rows=1 width=0) (actual
    time=0.12..0.12 rows=1 loops=885)
    -> Index Scan using t_prices_produc tid on t_prices
    (cost=0.00..28. 60 rows=8 width=0) (actual time=0.05..0.10 rows=2 loops=885)
    Index Cond: (productid = $0)
    -> Seq Scan on t_munits (cost=0.00..1.0 6 rows=1 width=32)
    (actual time=0.02..0.02 rows=1 loops=885)
    Filter: (id = $1)
    -> Result (cost=0.00..0.0 1 rows=1 width=0) (actual
    time=1.13..1.13 rows=1 loops=885)
    -> Limit (cost=0.00..149 .06 rows=1 width=4) (actual
    time=0.09..0.09 rows=0 loops=885)
    -> Seq Scan on t_orderdetails (cost=0.00..149 .06 rows=1
    width=4) (actual time=0.08..0.08 rows=0 loops=885)
    Filter: ((productid = $0) AND (NOT (subplan)))
    SubPlan
    -> Seq Scan on t_orders (cost=0.00..1.2 7 rows=1
    width=14) (actual time=0.00..0.00 rows=1 loops=107)
    Filter: (id = $2)
    -> Seq Scan on t_orders (cost=0.00..1.2 7 rows=1
    width=1) (actual time=0.02..0.03 rows=1 loops=107)
    Filter: (id = $2)
    -> Seq Scan on t_getups (cost=0.00..1.1 6 rows=1 width=32)
    (actual time=0.01..0.02 rows=1 loops=885)
    Filter: (id = $3)
    Total runtime: 7265.00 msec

    QUERY PLAN on the production server:
    ------------------------------------
    Sort (cost=70.17..72 .38 rows=885 width=121) (actual time=36729.92.. 36730.18
    rows=885 loops=1)
    Sort Key: artnum
    -> Seq Scan on t_products (cost=0.00..26. 85 rows=885 width=121) (actual
    time=45.16..367 24.73 rows=885 loops=1)
    SubPlan
    -> Aggregate (cost=9.06..9.0 6 rows=1 width=0) (actual
    time=0.15..0.15 rows=1 loops=885)
    -> Index Scan using t_prices_produc tid on t_prices
    (cost=0.00..9.0 5 rows=2 width=0) (actual time=0.12..0.14 rows=2 loops=885)
    Index Cond: (productid = $0)
    -> Seq Scan on t_munits (cost=0.00..1.0 6 rows=1 width=5) (actual
    time=0.04..0.04 rows=1 loops=885)
    Filter: (id = $1)
    -> Result (cost=0.00..0.0 1 rows=1 width=0) (actual
    time=0.80..0.80 rows=1 loops=885)
    -> Limit (cost=0.00..149 .06 rows=1 width=4) (actual
    time=0.08..0.08 rows=0 loops=885)
    -> Seq Scan on t_orderdetails (cost=0.00..149 .06 rows=1
    width=4) (actual time=0.07..0.08 rows=0 loops=885)
    Filter: ((productid = $0) AND (NOT (subplan)))
    SubPlan
    -> Seq Scan on t_orders (cost=0.00..1.2 7 rows=1
    width=14) (actual time=0.01..0.02 rows=1 loops=107)
    Filter: (id = $2)
    -> Seq Scan on t_orders (cost=0.00..1.2 7 rows=1
    width=1) (actual time=0.01..0.02 rows=1 loops=107)
    Filter: (id = $2)
    -> Seq Scan on t_getups (cost=0.00..1.1 6 rows=1 width=11)
    (actual time=0.03..0.04 rows=1 loops=885)
    Filter: (id = $3)
    Total runtime: 36730.67 msec


    ---
    Outgoing mail is certified Virus Free.
    Checked by AVG anti-virus system (http://www.grisoft.com).
    Version: 6.0.707 / Virus Database: 463 - Release Date: 2004. 06. 15.



    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain analyze is your friend

  • Tom Lane

    #2
    Re: Performance problem on RH7.1

    =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue]
    > I've a problem with the perfprmance of the production environment.
    > I've two db servers. One on my laptop computer (2Ghz, 1GB, WinXP, Cygwin,
    > Postgres 7.3.4) and one on a production server (2GHz, 1GB, Ultra SCSI,
    > RH7.1, Postgres 7.3.2).[/color]

    Are you using the same postgresql.conf settings on both?

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postg resql.org so that your
    message can get through to the mailing list cleanly

    Comment

    • Együd Csaba

      #3
      Re: Performance problem on RH7.1

      Hi Tom,
      Thank you for your reply.

      No, I do not. On the production server I have higher values for sort_mem
      (32768) and shared_buffers (2048).
      The other settings are the same.

      bye,
      -- Csaba Együd

      [color=blue]
      > -----Original Message-----
      > From: pgsql-general-owner@postgresq l.org
      > [mailto:pgsql-general-owner@postgresq l.org]On Behalf Of Tom Lane
      > Sent: 2004. június 26. 17:10
      > To: csegyud@vnet.hu
      > Cc: Pgsql-General@Postgre sql.Org (E-mail)
      > Subject: Re: [GENERAL] Performance problem on RH7.1
      >
      >
      > =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=green]
      > > I've a problem with the perfprmance of the production environment.
      > > I've two db servers. One on my laptop computer (2Ghz, 1GB,[/color]
      > WinXP, Cygwin,[color=green]
      > > Postgres 7.3.4) and one on a production server (2GHz, 1GB,[/color]
      > Ultra SCSI,[color=green]
      > > RH7.1, Postgres 7.3.2).[/color]
      >
      > Are you using the same postgresql.conf settings on both?
      >
      > regards, tom lane
      >
      > ---------------------------(end of
      > broadcast)---------------------------
      > TIP 3: if posting/reading through Usenet, please send an appropriate
      > subscribe-nomail command to majordomo@postg resql.org so
      > that your
      > message can get through to the mailing list cleanly
      >
      > ---
      > Incoming mail is certified Virus Free.
      > Checked by AVG anti-virus system (http://www.grisoft.com).
      > Version: 6.0.707 / Virus Database: 463 - Release Date: 2004. 06. 15.
      >[/color]

      ---
      Outgoing mail is certified Virus Free.
      Checked by AVG anti-virus system (http://www.grisoft.com).
      Version: 6.0.707 / Virus Database: 463 - Release Date: 2004. 06. 15.



      ---------------------------(end of broadcast)---------------------------
      TIP 5: Have you checked our extensive FAQ?



      Comment

      • Scott Marlowe

        #4
        Re: Performance problem on RH7.1

        On Sat, 2004-06-26 at 04:16, Együd Csaba wrote:[color=blue]
        > Hi All,
        > I've a problem with the perfprmance of the production environment.
        > I've two db servers. One on my laptop computer (2Ghz, 1GB, WinXP, Cygwin,
        > Postgres 7.3.4) and one on a production server (2GHz, 1GB, Ultra SCSI,
        > RH7.1, Postgres 7.3.2).
        >
        > I run the same dump and the same query on both of the computers. The
        > difference is substantial.
        > The query takes 5 times longer on the production server then on the laptop.
        >
        > What can be the reason? Could anybody suggest me something?
        > Thakn you in advance.[/color]
        [color=blue]
        >
        > QUERY PLAN on my laptop:
        > ------------------------
        > Sort (cost=70.17..72 .38 rows=885 width=184) (actual time=7264.00..7 264.00
        > rows=885 loops=1)
        > Sort Key: artnum
        > -> Seq Scan on t_products (cost=0.00..26. 85 rows=885 width=184) (actual
        > time=21.00..725 9.00 rows=885 loops=1)[/color]
        [color=blue]
        > QUERY PLAN on the production server:
        > ------------------------------------
        > Sort (cost=70.17..72 .38 rows=885 width=121) (actual time=36729.92.. 36730.18
        > rows=885 loops=1)
        > Sort Key: artnum
        > -> Seq Scan on t_products (cost=0.00..26. 85 rows=885 width=121) (actual
        > time=45.16..367 24.73 rows=885 loops=1)[/color]

        This is the only real difference between the two, the time it's taking
        to seq scan that table. Have you done a vacuum full on it lately? If
        the table is the same size on the disk, but is taking 5 times longer on
        the production server, then something on that machine is broken.


        ---------------------------(end of broadcast)---------------------------
        TIP 8: explain analyze is your friend

        Comment

        • Alvaro Herrera

          #5
          Re: Performance problem on RH7.1

          On Sat, Jun 26, 2004 at 12:16:17PM +0200, Együd Csaba wrote:
          [color=blue]
          > I've a problem with the perfprmance of the production environment.
          > I've two db servers. One on my laptop computer (2Ghz, 1GB, WinXP, Cygwin,
          > Postgres 7.3.4) and one on a production server (2GHz, 1GB, Ultra SCSI,
          > RH7.1, Postgres 7.3.2).
          >
          > I run the same dump and the same query on both of the computers. The
          > difference is substantial.
          > The query takes 5 times longer on the production server then on the laptop.[/color]

          Are both databases properly vacuumed? Did you try a VACUUM FULL? I'm
          wondering why it takes a lot of time seqscanning the t_products table in
          the production server compared to the laptop.

          --
          Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
          "Hoy es el primer día del resto de mi vida"


          ---------------------------(end of broadcast)---------------------------
          TIP 8: explain analyze is your friend

          Comment

          • Együd Csaba

            #6
            Re: Performance problem on RH7.1

            Hi,
            yes it is vacuumed regulary once a day. And vacuum full is done once a week.

            The reasons of the slow seq scan are those two stored procedures in the
            field list (get_stock and get_stock_getup ). These take 13-20 ms every time
            thay executed. Multiplying with the nr of rows we get 11-18 sec.

            It is strange that the laptop substantially faster then the server. The
            get_stock* functions are executed 2-3 times faster. This is a reason, but I
            think it isn't enough. There must be something more there. Next time I'll
            try to run a fsck on the data partition. May be it will show something
            wrong.

            Thank you all.
            Best regards,
            -- Csaba Együd
            [color=blue]
            > -----Original Message-----
            > From: Alvaro Herrera [mailto:alvherre @dcc.uchile.cl]
            > Sent: 2004. június 27. 3:38
            > To: Együd Csaba
            > Cc: Pgsql-General@Postgre sql.Org (E-mail)
            > Subject: Re: [GENERAL] Performance problem on RH7.1
            >
            >
            > On Sat, Jun 26, 2004 at 12:16:17PM +0200, Együd Csaba wrote:
            >[color=green]
            > > I've a problem with the perfprmance of the production environment.
            > > I've two db servers. One on my laptop computer (2Ghz, 1GB,[/color]
            > WinXP, Cygwin,[color=green]
            > > Postgres 7.3.4) and one on a production server (2GHz, 1GB,[/color]
            > Ultra SCSI,[color=green]
            > > RH7.1, Postgres 7.3.2).
            > >
            > > I run the same dump and the same query on both of the computers. The
            > > difference is substantial.
            > > The query takes 5 times longer on the production server[/color]
            > then on the laptop.
            >
            > Are both databases properly vacuumed? Did you try a VACUUM FULL? I'm
            > wondering why it takes a lot of time seqscanning the
            > t_products table in
            > the production server compared to the laptop.
            >
            > --
            > Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
            > "Hoy es el primer día del resto de mi vida"
            >
            > ---
            > Incoming mail is certified Virus Free.
            > Checked by AVG anti-virus system (http://www.grisoft.com).
            > Version: 6.0.707 / Virus Database: 463 - Release Date: 2004. 06. 15.
            >[/color]

            ---
            Outgoing mail is certified Virus Free.
            Checked by AVG anti-virus system (http://www.grisoft.com).
            Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.



            ---------------------------(end of broadcast)---------------------------
            TIP 2: you can get off all lists at once with the unregister command
            (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

            Comment

            • Tom Lane

              #7
              Re: Performance problem on RH7.1

              =?iso-8859-1?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue]
              > It is strange that the laptop substantially faster then the server. The
              > get_stock* functions are executed 2-3 times faster.[/color]

              So what do those stored procedures do exactly?

              What it smells like to me is a bad plan for a query executed in one of
              the stored procedures, but it's hard to theorize with no data.

              regards, tom lane

              ---------------------------(end of broadcast)---------------------------
              TIP 4: Don't 'kill -9' the postmaster

              Comment

              • Együd Csaba

                #8
                Re: Performance problem on RH7.1

                Hi,
                here is one of the stored procedures (the other is almost the same - queries
                quantity instead of getup). I explain analyzed the queries called from the
                stored procedures.
                Thans.

                bye,
                -- cs.

                *************** *************** *************** *************** *********
                alumiltmp=# explain analyze select
                round(get_stock _getup(234,1,'2 004.06.28')::nu meric,2);
                NOTICE: select date,time from t_stockchanges where stockid='1' and
                productid='234' and date<='2004.06. 28' and changeid=
                1 order by time desc limit 1;
                NOTICE: select dir, sum(getup) as getup from (select getup, (select dir
                from t_changes where id = changeid) as dir from
                t_stockchanges where productid='234' and stockid='1' and date>='2004.06. 01
                ' and date<='2004.06. 28' order by ti
                me) as foo group by dir
                QUERY PLAN
                ----------------------------------------------------------------------------
                --------
                Result (cost=0.00..0.0 1 rows=1 width=0) (actual time=13.97..13. 97 rows=1
                loops=1)
                Total runtime: 13.99 msec
                (2 rows)

                *************** *************** *************** *************** *********

                alumiltmp=# EXPLAIN ANALYZE select date,time from t_stockchanges where
                stockid='1' and productid='234' and date<='2004.
                06.28' and changeid=1 order by time desc limit 1;
                QUERY
                PLAN
                ----------------------------------------------------------------------------
                --------------------------------------------
                -----------------------------
                Limit (cost=28.84..28 .84 rows=1 width=46) (actual time=9.10..9.10 rows=1
                loops=1)
                -> Sort (cost=28.84..28 .86 rows=7 width=46) (actual time=9.10..9.10
                rows=2 loops=1)
                Sort Key: "time"
                -> Index Scan using t_stockchanges_ fullindex on t_stockchanges
                (cost=0.00..28. 74 rows=7 width=46)
                (actual time=0.14..9.03 rows=6 loops=1)
                Index Cond: ((date <= '2004.06.28'::b pchar) AND (stockid = 1)
                AND (productid = 234) AND (changeid = 1))
                Total runtime: 9.17 msec
                (6 rows)

                *************** *************** *************** *************** *********

                alumiltmp=# EXPLAIN ANALYZE select dir, sum(getup) as getup from (select
                getup, (select dir from t_changes where id = c
                hangeid) as dir from t_stockchanges where productid='234' and stockid='1'
                and date>='2004.06. 01 ' and date<='20
                04.06.28' order by time) as foo group by dir;

                QUERY PLAN

                ----------------------------------------------------------------------------
                --------------------------------------------
                ----------------------------------------------
                Aggregate (cost=6.92..6.9 3 rows=1 width=38) (actual time=1.63..1.65 rows=2
                loops=1)
                -> Group (cost=6.92..6.9 2 rows=1 width=38) (actual time=1.62..1.63
                rows=7 loops=1)
                -> Sort (cost=6.92..6.9 2 rows=1 width=38) (actual time=1.61..1.62
                rows=7 loops=1)
                Sort Key: dir
                -> Subquery Scan foo (cost=6.90..6.9 1 rows=1 width=38)
                (actual time=1.55..1.56 rows=7 loops=1)
                -> Sort (cost=6.90..6.9 1 rows=1 width=38) (actual
                time=1.55..1.55 rows=7 loops=1)
                Sort Key: "time"
                -> Index Scan using t_stockchanges_ fullindex on
                t_stockchanges (cost=0.00..6.8 9 rows=1
                width=38) (actual time=0.07..1.52 rows=7
                loops=1)
                Index Cond: ((date >= '2004.06.01
                '::bpchar) AND (date <= '2004.06.28'::b pchar)
                AND (stockid = 1) AND (productid = 234))
                SubPlan
                -> Seq Scan on t_changes
                (cost=0.00..1.1 6 rows=1 width=5) (actual time=0.01..0.01
                rows=1 loops=7)
                Filter: (id = $0)
                Total runtime: 1.78 msec
                (13 rows)


                *************** *************** *************** *************** *********

                create or replace function "get_stock_getu p" (int, int, text) returns
                numeric as'
                declare
                ProductID alias for $1;
                StockID alias for $2;
                ADate alias for $3;

                OpenTime text;
                q text;
                R record;
                retval numeric;
                begin
                OpenTime := '''';
                -- Megkeressük a termék utolsó nyitókészletét az adott raktárban. Ha
                nincs,
                -- akkor a raktár elejétõl kezdve dolgozzuk fel az adatokat.
                q := ''select date,time from t_stockchanges where '' ||
                ''stockid='' || quote_literal(S tockID) || '' and '' ||
                ''productid='' || quote_literal(P roductID) || '' and '' ||
                ''date<='' || quote_literal(A Date) || '' and '' ||
                ''changeid=1 order by time desc limit 1;'';
                -- raise notice ''%'',q;
                for R in execute q loop
                OpenTime := R.date;
                end loop;
                --raise notice ''%'', OpenTime;

                -- Ha OpenTime is null, azaz nem volt nyitó, akkor az összes rekordot
                visszakapjuk.
                retval := 0;
                q := ''select dir, sum(getup) as getup from (select getup, (select dir
                from t_changes where id = changeid) as dir '' ||
                ''from t_stockchanges where productid='' || quote_literal(P roductID)
                || '' and '' ||
                ''stockid='' || quote_literal(S tockID) || '' and '' ||
                ''date>='' || quote_literal(O penTime) || '' and date<='' ||
                quote_literal(A Date) ||
                '' order by time) as foo group by dir'';
                -- raise notice ''%'',q;

                for R in execute q loop
                if R.dir=''+'' then
                retval := retval + R.getup;
                end if;
                if R.dir=''-'' then
                retval := retval - R.getup;
                end if;
                end loop;

                return retval;
                end;
                'LANGUAGE 'plpgsql';

                [color=blue]
                > -----Original Message-----
                > From: pgsql-general-owner@postgresq l.org
                > [mailto:pgsql-general-owner@postgresq l.org]On Behalf Of Tom Lane
                > Sent: 2004. június 28. 0:15
                > To: csegyud@vnet.hu
                > Cc: 'Alvaro Herrera'; 'Pgsql-General@Postgre sql.Org (E-mail)'
                > Subject: Re: [GENERAL] Performance problem on RH7.1
                >
                >
                > =?iso-8859-1?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=green]
                > > It is strange that the laptop substantially faster then the[/color]
                > server. The[color=green]
                > > get_stock* functions are executed 2-3 times faster.[/color]
                >
                > So what do those stored procedures do exactly?
                >
                > What it smells like to me is a bad plan for a query executed in one of
                > the stored procedures, but it's hard to theorize with no data.
                >
                > regards, tom lane
                >
                > ---------------------------(end of
                > broadcast)---------------------------
                > TIP 4: Don't 'kill -9' the postmaster
                >
                > ---
                > Incoming mail is certified Virus Free.
                > Checked by AVG anti-virus system (http://www.grisoft.com).
                > Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.
                >[/color]

                ---
                Outgoing mail is certified Virus Free.
                Checked by AVG anti-virus system (http://www.grisoft.com).
                Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.



                ---------------------------(end of broadcast)---------------------------
                TIP 2: you can get off all lists at once with the unregister command
                (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

                Comment

                • Tom Lane

                  #9
                  Re: Performance problem on RH7.1

                  =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue]
                  > here is one of the stored procedures (the other is almost the same - queries
                  > quantity instead of getup). I explain analyzed the queries called from the
                  > stored procedures.[/color]

                  The major time sink is clearly here:
                  [color=blue]
                  > -> Index Scan using t_stockchanges_ fullindex on t_stockchanges
                  > (cost=0.00..28. 74 rows=7 width=46)
                  > (actual time=0.14..9.03 rows=6 loops=1)
                  > Index Cond: ((date <= '2004.06.28'::b pchar) AND (stockid = 1)
                  > AND (productid = 234) AND (changeid = 1))[/color]

                  and I think the problem is you've not chosen the index very well. Using
                  date as the most significant index column is simply the wrong way to do
                  this query --- the thing is going to start at the beginning of time and
                  scan forward over *all* index entries until it reaches a date greater
                  than the cutoff. What you want is date as the least significant index
                  column, so that you don't have to scan entries for irrelevant stocks at
                  all. Also you should probably put time into the index (in fact, why do
                  you have separate date and time fields at all?). What you really want
                  here is an index on (stockid, productid, changeid, date, time) and to
                  get a backwards indexscan with no sort step. It'd have to look like

                  where stockid='1' and productid='234' and changeid=1 and date<='2004.06. 28'
                  order by stockid desc, productid desc, changeid desc, date desc, time desc
                  limit 1

                  I'd also suggest dropping the EXECUTE approach, as this is costing you
                  a re-plan on every call without buying much of anything.

                  A larger issue is whether you shouldn't forego the stored procedures
                  entirely and convert the whole problem into a join. The way you are
                  doing things now is essentially a forced nested-loop join between the
                  table traversed by the outer query and the table examined by the stored
                  procedures. Nested-loop is often the least efficient way to do a join.
                  But that could get pretty messy notationally, and I'm not sure how much
                  win there would be.

                  regards, tom lane

                  ---------------------------(end of broadcast)---------------------------
                  TIP 7: don't forget to increase your free space map settings

                  Comment

                  • Együd Csaba

                    #10
                    Re: Performance problem on RH7.1

                    > The major time sink is clearly here:[color=blue]
                    >[color=green]
                    > > -> Index Scan using t_stockchanges_ fullindex on[/color]
                    > t_stockchanges[color=green]
                    > > (cost=0.00..28. 74 rows=7 width=46)
                    > > (actual time=0.14..9.03 rows=6 loops=1)
                    > > Index Cond: ((date <= '2004.06.28'::b pchar)[/color]
                    > AND (stockid = 1)[color=green]
                    > > AND (productid = 234) AND (changeid = 1))[/color][/color]
                    Yes, it must be there.
                    [color=blue]
                    >
                    > and I think the problem is you've not chosen the index very
                    > well. Using
                    > date as the most significant index column is simply the wrong
                    > way to do
                    > this query[/color]
                    You are right. I haven't thought about this yet, and to tell the truth this
                    index is a "left there" index from the early development times. I didn't
                    review that since I had made it.
                    [color=blue]
                    > --- the thing is going to start at the beginning
                    > of time and
                    > scan forward over *all* index entries until it reaches a date greater
                    > than the cutoff. What you want is date as the least significant index
                    > column, so that you don't have to scan entries for irrelevant
                    > stocks at
                    > all. Also you should probably put time into the index (in
                    > fact, why do
                    > you have separate date and time fields at all?). What you really want
                    > here is an index on (stockid, productid, changeid, date, time) and to
                    > get a backwards indexscan with no sort step. It'd have to look like
                    >
                    > where stockid='1' and productid='234' and changeid=1
                    > and date<='2004.06. 28'
                    > order by stockid desc, productid desc, changeid desc,
                    > date desc, time desc
                    > limit 1[/color]
                    It is a good idea and I will do it in this way.
                    [color=blue]
                    > I'd also suggest dropping the EXECUTE approach, as this is costing you
                    > a re-plan on every call without buying much of anything.[/color]
                    Do you mean I should use PERFORM instead? Or what else?
                    Do you mean the "for R in execute" statements? How can I run a dynamic query
                    in other way?
                    [color=blue]
                    >
                    > A larger issue is whether you shouldn't forego the stored procedures
                    > entirely and convert the whole problem into a join. The way you are
                    > doing things now is essentially a forced nested-loop join between the
                    > table traversed by the outer query and the table examined by
                    > the stored
                    > procedures. Nested-loop is often the least efficient way to
                    > do a join.
                    > But that could get pretty messy notationally, and I'm not
                    > sure how much
                    > win there would be.[/color]
                    I use stored procedures because it is clearer and simpler way then always
                    writing big complex queries with a lot of joins etc. I know that it has it's
                    price as well.
                    On the other hand you have lit up something in my mind so I will think about
                    it seriosly. I wish I have some time to do so...

                    Bye,
                    -- Csaba Együd

                    ---
                    Outgoing mail is certified Virus Free.
                    Checked by AVG anti-virus system (http://www.grisoft.com).
                    Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.



                    ---------------------------(end of broadcast)---------------------------
                    TIP 8: explain analyze is your friend

                    Comment

                    • Tom Lane

                      #11
                      Re: Performance problem on RH7.1

                      =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue][color=green]
                      >> I'd also suggest dropping the EXECUTE approach, as this is costing you
                      >> a re-plan on every call without buying much of anything.[/color][/color]
                      [color=blue]
                      > Do you mean I should use PERFORM instead? Or what else?
                      > Do you mean the "for R in execute" statements? How can I run a dynamic query
                      > in other way?[/color]

                      No, I mean the most straightforward way:

                      for R in select ... where stockid = $1 and ...

                      This lets plpgsql cache the plan for the SELECT.

                      regards, tom lane

                      ---------------------------(end of broadcast)---------------------------
                      TIP 8: explain analyze is your friend

                      Comment

                      • Együd Csaba

                        #12
                        Re: Performance problem on RH7.1

                        Hi Tom,
                        I did the modifications you suggested on the t_stockchanges_ fullindex and
                        the result tells everthing:

                        ---------
                        explain analyze select date,time from t_stockchanges where stockid='1' and
                        productid='234' and date<='2004.06. 29' and changeid=1 order by stockid,
                        productid, changeid, date, time desc limit 1;
                        ---------
                        QUERY PLAN
                        Limit (cost=30.28..30 .28 rows=1 width=58) (actual time=0.19..0.19 rows=1
                        loops=1)
                        -> Sort (cost=30.28..30 .30 rows=7 width=58) (actual time=0.18..0.18
                        rows=2 loops=1)
                        Sort Key: stockid, productid, changeid, date, "time"
                        -> Index Scan using t_stockchanges_ fullindex on t_stockchanges
                        (cost=0.00..30. 18 rows=7 width=58) (actual time=0.04..0.08 rows=6 loops=1)
                        Index Cond: ((stockid = 1) AND (productid = 234) AND (changeid
                        = 1) AND (date <= '2004.06.29'::b pchar))
                        Total runtime: 0.25 msec
                        ( Compared to 9.17 msec !!!! 37 times faster! )
                        ----------

                        Thank you wery much Tom! It was very kind of you!

                        Best regards,
                        -- Csaba Együd


                        [color=blue]
                        > -----Original Message-----
                        > From: Tom Lane [mailto:tgl@sss. pgh.pa.us]
                        > Sent: 2004. június 28. 20:10
                        > To: csegyud@vnet.hu
                        > Cc: 'Alvaro Herrera'; 'Pgsql-General@Postgre sql.Org (E-mail)'
                        > Subject: Re: [GENERAL] Performance problem on RH7.1
                        >
                        >
                        > =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=green][color=darkred]
                        > >> I'd also suggest dropping the EXECUTE approach, as this is[/color][/color]
                        > costing you[color=green][color=darkred]
                        > >> a re-plan on every call without buying much of anything.[/color][/color]
                        >[color=green]
                        > > Do you mean I should use PERFORM instead? Or what else?
                        > > Do you mean the "for R in execute" statements? How can I[/color]
                        > run a dynamic query[color=green]
                        > > in other way?[/color]
                        >
                        > No, I mean the most straightforward way:
                        >
                        > for R in select ... where stockid = $1 and ...
                        >
                        > This lets plpgsql cache the plan for the SELECT.
                        >
                        > regards, tom lane
                        > ---
                        > Incoming mail is certified Virus Free.
                        > Checked by AVG anti-virus system (http://www.grisoft.com).
                        > Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.
                        >[/color]

                        ---
                        Outgoing mail is certified Virus Free.
                        Checked by AVG anti-virus system (http://www.grisoft.com).
                        Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.



                        ---------------------------(end of broadcast)---------------------------
                        TIP 6: Have you searched our list archives?



                        Comment

                        • Tom Lane

                          #13
                          Re: Performance problem on RH7.1

                          =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue]
                          > Limit (cost=30.28..30 .28 rows=1 width=58) (actual time=0.19..0.19 rows=1
                          > loops=1)
                          > -> Sort (cost=30.28..30 .30 rows=7 width=58) (actual time=0.18..0.18
                          > rows=2 loops=1)
                          > Sort Key: stockid, productid, changeid, date, "time"
                          > -> Index Scan using t_stockchanges_ fullindex on t_stockchanges
                          > (cost=0.00..30. 18 rows=7 width=58) (actual time=0.04..0.08 rows=6 loops=1)
                          > Index Cond: ((stockid = 1) AND (productid = 234) AND (changeid
                          > = 1) AND (date <= '2004.06.29'::b pchar))
                          > Total runtime: 0.25 msec
                          > ( Compared to 9.17 msec !!!! 37 times faster! )[/color]

                          Good, but you're not there yet --- the Sort step shouldn't be there at
                          all. You've still got some inconsistency between the ORDER BY and the
                          index. Check my example again.

                          regards, tom lane

                          ---------------------------(end of broadcast)---------------------------
                          TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

                          Comment

                          • Együd Csaba

                            #14
                            Re: Performance problem on RH7.1

                            Hi Tom,
                            [color=blue]
                            > Good, but you're not there yet --- the Sort step shouldn't be there at
                            > all. You've still got some inconsistency between the ORDER BY and the
                            > index. Check my example again.[/color]
                            yes yes I missed that, sorry. Now don't mention the performance because I
                            couldn' see anything but the result. :)

                            In general I'd like to draw the consequences. What kind of theories should I
                            keep in mind when I want to choose an appropriate index key? I ask it
                            bacause I'm trying to optimize an other query of mine and I'm facing some
                            more problems.

                            I have this query: (note, that the planner uses t_stockchanges_ fullindex,
                            instead of t_stockchanges_ fullindex4 which is exactly what I would need)
                            =============== =========
                            explain analyze select getup, (select dir from t_changes where id=changeid)
                            as dir
                            from t_stockchanges where stockid='1' and productid='428' and
                            date>='2004.06. 01' and date<='2004.06. 29'
                            order by stockid, productid, date;
                            QUERY PLAN
                            Sort (cost=7.17..7.1 7 rows=1 width=46) (actual time=3.00..3.00 rows=5
                            loops=1)
                            Sort Key: stockid, productid, date
                            -> Index Scan using t_stockchanges_ fullindex on t_stockchanges
                            (cost=0.00..7.1 6 rows=1 width=46) (actual time=1.00..3.00 rows=5 loops=1)
                            Index Cond: ((date >= '2004.06.01'::b pchar) AND (date <=
                            '2004.06.29'::b pchar) AND (stockid = 1) AND (productid = 428))
                            SubPlan
                            -> Seq Scan on t_changes (cost=0.00..1.1 6 rows=1 width=5)
                            (actual time=0.00..0.00 rows=1 loops=5)
                            Filter: (id = $0)
                            Total runtime: 3.00 msec
                            =============== =========

                            And these indexes:

                            =============== =========
                            CREATE INDEX t_stockchanges_ fullindex4
                            ON t_stockchanges
                            USING btree
                            (stockid, productid, date);

                            DROP INDEX t_stockchanges_ fullindex3;
                            CREATE INDEX t_stockchanges_ fullindex3
                            ON t_stockchanges
                            USING btree
                            (stockid, productid, changeid, date, time);

                            DROP INDEX t_stockchanges_ fullindex;
                            CREATE INDEX t_stockchanges_ fullindex
                            ON t_stockchanges
                            USING btree
                            (date, stockid, productid, changeid);
                            =============== =========

                            If I delete the index t_stockchanges_ fullindex, I get the following (better)
                            result.

                            =============== =========
                            DROP INDEX t_stockchanges_ fullindex;

                            explain analyze select getup, (select dir from t_changes where id=changeid)
                            as dir
                            from t_stockchanges where stockid='1' and productid='428' and
                            date>='2004.06. 01' and date<='2004.06. 29'
                            order by stockid, productid, date;
                            QUERY PLAN
                            Index Scan using t_stockchanges_ fullindex4 on t_stockchanges
                            (cost=0.00..7.3 3 rows=1 width=46) (actual time=0.00..0.00 rows=5 loops=1)
                            Index Cond: ((stockid = 1) AND (productid = 428) AND (date >=
                            '2004.06.01'::b pchar) AND (date <= '2004.06.29'::b pchar))
                            SubPlan
                            -> Seq Scan on t_changes (cost=0.00..1.1 6 rows=1 width=5) (actual
                            time=0.00..0.00 rows=1 loops=5)
                            Filter: (id = $0)
                            Total runtime: 0.00 msec
                            =============== =========


                            Recreating the t_stockchanges_ fullindex I get the first result - so it is
                            not the case of the creation order of similar indexes or something similar.
                            Is there any explicit way to make the server to use an index of my choice? I
                            thought (from your examples) that it can be done by giving the "where" and
                            "order by" fields in the correct order. But now I seem making mistakes.

                            Thank you for your patience!

                            Best regards,
                            -- Csaba

                            ---
                            Outgoing mail is certified Virus Free.
                            Checked by AVG anti-virus system (http://www.grisoft.com).
                            Version: 6.0.712 / Virus Database: 468 - Release Date: 2004. 06. 27.



                            ---------------------------(end of broadcast)---------------------------
                            TIP 4: Don't 'kill -9' the postmaster

                            Comment

                            • Tom Lane

                              #15
                              Re: Performance problem on RH7.1

                              =?iso-8859-2?Q?Egy=FCd_Csa ba?= <csegyud@vnet.h u> writes:[color=blue]
                              > In general I'd like to draw the consequences. What kind of theories should I
                              > keep in mind when I want to choose an appropriate index key?[/color]

                              Generally you want '=' conditions on the leftmost index keys; any
                              inequality or range constraint should be on the rightmost keys. You can
                              see this by thinking about the range of index entries that the scan will
                              have to pass over.

                              Unfortunately I think the planner's cost model for indexscans is too
                              crude to recognize this fact (something else for the TODO list...).
                              It understands about index size and index selectivity, but given two
                              indexes on the same columns in different orders, I don't think it really
                              has the tools to make the right choice --- the cost estimates are going
                              to come out the same.
                              [color=blue]
                              > Is there any explicit way to make the server to use an index of my choice?[/color]

                              No, but in most cases choosing an ORDER BY clause that matches the index
                              order (ascending or descending depending on where you want the scan to
                              start) is a sufficiently heavy thumb on the scales. To meet the ORDER
                              BY when using the "wrong" index, the planner will have to add a Sort
                              step, and that is usually enough to push the estimated cost above the
                              cost of using the "right" index.

                              regards, tom lane

                              ---------------------------(end of broadcast)---------------------------
                              TIP 7: don't forget to increase your free space map settings

                              Comment

                              Working...