Insert speed question

  • Josué Maldonado

    Insert speed question

    Hello List,

    I'm importing some data from FoxPro to Postgres. There is a table which
    contains approx. 4.8 million rows and is about 830MB in size. I uploaded it
    to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
    Now I'm inserting some data from that table into a brand new table in
    PostgreSQL, and for that I'm doing insert into ... select from. The point is
    that inserting this data from one table to another table in PostgreSQL took
    about 35 minutes. Is that the expected behavior in Postgres?
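    The pattern in question is roughly the following; the table and column
    names here are hypothetical, since the real ones only appear later in the
    thread:

        -- Copy rows from the freshly imported table into a new table.
        -- The whole statement runs as a single transaction: every row is
        -- written to the new heap plus WAL, so for ~4.8 million rows the
        -- cost is dominated by I/O.
        INSERT INTO new_table (col_a, col_b, col_c)
        SELECT col_a, col_b, col_c
        FROM imported_table;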

    BTW, both tables have no indices or triggers. My Postgres version is 7.4,
    running on a dual Xeon 2.8 with 2GB RAM and about 11GB available on the
    partition where Postgres is installed.

    Settings in postgresql.conf are:

    effective_cache_size = 170000 # typically 8KB each
    sort_mem = 131072 # min 64, size in KB
    checkpoint_segments = 10
    shared_buffers = 63000 # min max_connections*2 or 16, 8KB each
    max_fsm_relations = 400 # min 10, fsm is free space map, ~40 bytes
    max_fsm_pages = 80000 # min 1000, fsm is free space map,
    max_locks_per_transaction = 64 # min 10
    tcpip_socket = true
    max_connections = 128


    Thanks in advance


    --
    Sincerely,
    Josué Maldonado.
    "I will stop loving you the day a painter paints on his canvas the sound
    of a tear."




  • Gaetano Mendola

    #2
    Re: Insert speed question

    Josué Maldonado wrote:
    > sort_mem = 131072 # min 64, size in KB

    128 MB for sort_mem is really a huge amount of memory, considering that it
    is not system-wide but per process (under certain operations a single
    process can use more than this quantity).
    Hackers: am I wrong?
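    As a rough worked example of why that setting is risky, using only the
    numbers already quoted in this thread:

        -- sort_mem (renamed work_mem in later releases) is allocated per
        -- sort/hash step, so one query with several sorts can use several
        -- multiples of it. Theoretical worst case with the posted settings:
        --   128 connections x 128MB sort_mem = 16GB, on a box with 2GB RAM.
        -- A safer pattern is a modest global value plus a per-session
        -- override for known-heavy maintenance work:
        SET sort_mem = 131072;   -- 128MB for this session only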


    Regards
    Gaetano Mendola



    • Shridhar Daithankar

      #3
      Re: Insert speed question

      On Tuesday 01 June 2004 01:35, Josué Maldonado wrote:
      > Hello List,
      >
      > I'm importing some data from FoxPro to Postgres. There is a table which
      > contains approx. 4.8 million rows and is about 830MB in size. I uploaded it
      > to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
      > Now I'm inserting some data from that table into a brand new table in
      > PostgreSQL, and for that I'm doing insert into ... select from. The point is
      > that inserting this data from one table to another table in PostgreSQL took
      > about 35 minutes. Is that the expected behavior in Postgres?

      Can you generate explain analyze for insert into.. select from? Most probably
      it is using a seq. scan because you haven't analysed after inserting 4.8M rows.

      Do a vacuum verbose analyze tablename and reattempt insert into.. select from.
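      A minimal sketch of the two commands being suggested, with hypothetical
      table names standing in for the real ones:

          -- Refresh planner statistics (and report free space map usage):
          VACUUM VERBOSE ANALYZE imported_table;

          -- Then time the copy itself. Note that EXPLAIN ANALYZE actually
          -- executes the INSERT, so wrap it in a transaction you are prepared
          -- to roll back (or aim it at a scratch table):
          BEGIN;
          EXPLAIN ANALYZE
          INSERT INTO new_table (col_a, col_b)
          SELECT col_a, col_b FROM imported_table;
          ROLLBACK;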

      You can also read the general tuning guide at

      http://www.varlena.com/varlena/Gener...bits/index.php

      HTH

      Shridhar



      • Richard Huxton

        #4
        Re: Insert speed question

        Gaetano Mendola wrote:
        > Josué Maldonado wrote:
        >
        >> sort_mem = 131072 # min 64, size in KB
        >
        > 128 MB for sort_mem is really a huge amount of memory, considering
        > that it is not system-wide but per process (under certain operations
        > a single process can use more than this quantity).
        > Hackers: am I wrong?

        Not a hacker, but you're right. It's the amount of memory *per sort*. Of
        course, Josué might have a terabyte of RAM, but it's unlikely.

        --
        Richard Huxton
        Archonet Ltd



        • Josué Maldonado

          #5
          Re: Insert speed question

          Thanks for your responses,

          I did the vacuum, but I cannot run the insert again at this moment. Even
          though that server is not in production, so all the resources should be
          dedicated to Postgres, I think I still have some performance issues.

          I made some changes to postgresql.conf according to the tuning guide:
          tcpip_socket = true
          max_connections = 28
          shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
          max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
          max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
          max_locks_per_transaction = 64 # min 10
          sort_mem = 16384 # min 64, size in KB
          vacuum_mem = 419430 # min 1024, size in KB
          checkpoint_segments = 10
          effective_cache_size = 819200 # typically 8KB each

          Shmmax is:
          $ cat /proc/sys/kernel/shmmax
          536870912
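          As a sanity check on those numbers (just arithmetic on the values
          quoted above, nothing newly measured):

              -- shared_buffers = 32768 buffers x 8KB = 256MB of shared memory,
              -- which fits under the kernel limit shown (536870912 bytes = 512MB).
              -- effective_cache_size = 819200 x 8KB = 6.25GB, which tells the
              -- planner the OS cache is far larger than the 2GB of RAM in this box.
              SHOW shared_buffers;        -- confirm what the running server uses
              SHOW effective_cache_size;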

          A simple query on the 4.8 million row table:

          dbmund=# explain analyze select * from pkardex where pkd_procode='8959';
                                                      QUERY PLAN
          ---------------------------------------------------------------------------------------------------------------------------------------
           Index Scan using ix_pkardex_procode on pkardex (cost=0.00..3865.52
          rows=991 width=287) (actual time=10.879..100.914 rows=18 loops=1)
             Index Cond: (pkd_procode = '8959'::bpchar)
           Total runtime: 101.057 ms
          (3 rows)


          A simple query on the 1.2 million row table:

          explain analyze select * from pmdoc where pdc_docto='744144';
                                                      QUERY PLAN
          ------------------------------------------------------------------------------------------------------------------------
           Index Scan using ix_pmdoc_docto on pmdoc (cost=0.00..5.20 rows=2
          width=206) (actual time=0.081..0.085 rows=1 loops=1)
             Index Cond: (pdc_docto = '744144'::bpchar)
           Total runtime: 0.140 ms
          (3 rows)


          I would appreciate any comment or suggestion: is a hardware upgrade
          needed, or does this seem "normal" for PostgreSQL performance?

          Thanks in advance




          On 01/06/2004 at 1:35 AM, Shridhar Daithankar wrote:
          > On Tuesday 01 June 2004 01:35, Josué Maldonado wrote:
          >
          >> Hello List,
          >>
          >> I'm importing some data from FoxPro to Postgres. There is a table which
          >> contains approx. 4.8 million rows and is about 830MB in size. I uploaded it
          >> to Postgres using dbf2pg and it worked fine; it took about 10-15 minutes.
          >> Now I'm inserting some data from that table into a brand new table in
          >> PostgreSQL, and for that I'm doing insert into ... select from. The point is
          >> that inserting this data from one table to another table in PostgreSQL took
          >> about 35 minutes. Is that the expected behavior in Postgres?
          >
          > Can you generate explain analyze for insert into.. select from? Most probably
          > it is using a seq. scan because you haven't analysed after inserting 4.8M rows.
          >
          > Do a vacuum verbose analyze tablename and reattempt insert into.. select from.
          >
          > You can also read the general tuning guide at
          >
          > http://www.varlena.com/varlena/Gener...bits/index.php
          >
          > HTH
          >
          > Shridhar


          --
          Sincerely,
          Josué Maldonado.
          "All other knowledge is harmful to him who does not possess the science
          of goodness." Michel Eyquem de Montaigne, French philosopher and writer.



          • Shridhar Daithankar

            #6
            Re: Insert speed question

            On Tuesday 01 June 2004 21:42, Josué Maldonado wrote:
            > Thanks for your responses,
            >
            > I did the vacuum, but I cannot run the insert again at this moment. Even
            > though that server is not in production, so all the resources should be
            > dedicated to Postgres, I think I still have some performance issues.

            I am not sure I understand. You could not insert? Why? Was there any problem
            with the database? Can you use typical Linux tools such as vmstat/top to locate
            the bottleneck?

            > I made some changes to postgresql.conf according to the tuning guide:
            > tcpip_socket = true
            > max_connections = 28
            > shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
            > max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
            > max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
            > max_locks_per_transaction = 64 # min 10
            > sort_mem = 16384 # min 64, size in KB
            > vacuum_mem = 419430 # min 1024, size in KB
            > checkpoint_segments = 10
            > effective_cache_size = 819200 # typically 8KB each

            OK, I would say the parameters are still slightly oversized but there is no
            perfect set of parameters. You still might have to tune it according to your
            usual workload.

            > A simple query on the 4.8 million row table:
            >
            > dbmund=# explain analyze select * from pkardex where pkd_procode='8959';
            >                                        QUERY PLAN
            > ---------------------------------------------------------------------------
            >  Index Scan using ix_pkardex_procode on pkardex (cost=0.00..3865.52
            > rows=991 width=287) (actual time=10.879..100.914 rows=18 loops=1)
            >    Index Cond: (pkd_procode = '8959'::bpchar)
            >  Total runtime: 101.057 ms
            > (3 rows)
            >
            >
            > A simple query on the 1.2 million row table:
            >
            > explain analyze select * from pmdoc where pdc_docto='744144';
            >                                        QUERY PLAN
            > ---------------------------------------------------------------------------
            >  Index Scan using ix_pmdoc_docto on pmdoc (cost=0.00..5.20 rows=2
            > width=206) (actual time=0.081..0.085 rows=1 loops=1)
            >    Index Cond: (pdc_docto = '744144'::bpchar)
            >  Total runtime: 0.140 ms
            > (3 rows)

            I wouldn't say these timings indicate performance issues. 100 ms is pretty
            fast, and so is 0.140 ms.

            Note that there is a latency involved. No matter how much you tune, it cannot
            drop below a certain level. On my last machine (P-III/1GHz with an IDE disk) I
            observed it to be 200ms no matter what you do. But it could do 70 concurrent
            connections with a worst-case latency of 210ms. (This was long back, so the
            number means little, but it is just an illustration.)

            This could be different on your setup, but the trend should be roughly the same.
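            One easy way to get a feel for that floor on a given setup is psql's
            timing toggle; a trivial statement approximates the fixed per-query
            overhead (client round-trip plus parse/plan), independent of any data:

                \timing
                SELECT 1;   -- the reported time is essentially the latency floor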

            > I would appreciate any comment or suggestion: is a hardware upgrade
            > needed, or does this seem "normal" for PostgreSQL performance?

            I would ask the question the other way round: what level of performance are
            you looking for with your current workload, and by how much is the current
            performance worse than your expectation?

            IMO it is essential to set a target for performance tuning, otherwise it
            becomes an endless loop with minimal returns.

            HTH

            Shridhar



            • Josué Maldonado

              #7
              Re: Insert speed question

              Hello Shridhar,

              On 02/06/2004 at 1:16 AM, Shridhar Daithankar wrote:
              > I am not sure I understand. You could not insert? Why? Was there any problem
              > with the database? Can you use typical Linux tools such as vmstat/top to locate
              > the bottleneck?

              I was unable to run the insert at that moment. After the changes to
              postgresql.conf the speed increased; here is the explain:

              dbmund=# explain analyze
              dbmund-# insert into pk2
              dbmund-# (pkd_stamp,pkd_fecha, doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
              dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
              dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
              dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
              dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
              dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
              dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
              dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
              dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
              dbmund(# pkd_cajanum,pkd_es)
              dbmund-# select fkardex,facfec,facnum,es,tqtysold,
              dbmund-# invqty,unidad,fprice,fcost,saldo,
              dbmund-# substr(prod_no,8,4),codclie,who_sold,
              dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
              dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
              dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
              dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
              dbmund-# fra_imp,imsove,clmayor,cajanum,es
              dbmund-# from hisventa
              dbmund-# ;
              ERROR: column "pkd_es" specified more than once
              dbmund=# explain analyze
              dbmund-# insert into pk2
              dbmund-# (pkd_stamp,pkd_fecha, doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
              dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
              dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
              dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
              dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
              dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
              dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
              dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
              dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
              dbmund(# pkd_cajanum)
              dbmund-# select fkardex,facfec,facnum,es,tqtysold,
              dbmund-# invqty,unidad,fprice,fcost,saldo,
              dbmund-# substr(prod_no,8,4),codclie,who_sold,
              dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
              dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
              dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
              dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
              dbmund-# fra_imp,imsove,clmayor,cajanum
              dbmund-# from hisventa;
                                                  QUERY PLAN
              ---------------------------------------------------------------------------------------------------------------------------
               Seq Scan on hisventa (cost=0.00..633607.24 rows=4882546 width=149)
              (actual time=26.647..363517.935 rows=4882546 loops=1)
               Total runtime: 1042927.167 ms
              (2 rows)

              dbmund=#

              >> I made some changes to postgresql.conf according to the tuning guide:
              >> tcpip_socket = true
              >> max_connections = 28
              >> shared_buffers = 32768 # min max_connections*2 or 16, 8KB each
              >> max_fsm_relations = 500 # min 10, fsm is free space map, ~40 bytes
              >> max_fsm_pages = 80000 # min 1000, fsm is free space map, ~6
              >> max_locks_per_transaction = 64 # min 10
              >> sort_mem = 16384 # min 64, size in KB
              >> vacuum_mem = 419430 # min 1024, size in KB
              >> checkpoint_segments = 10
              >> effective_cache_size = 819200 # typically 8KB each
              >
              > OK, I would say the parameters are still slightly oversized but there is no
              > perfect set of parameters. You still might have to tune it according to your
              > usual workload.

              As I said before, the server is not yet in production. The expected
              connections are 80-100 on a normal day, and the users' tasks in the system
              basically affect the following areas: inventory, sales, customers, banks, and
              accounting. I know there is no rule for tuning, but I'd appreciate your
              comments on the parameters for such a scenario.

              > I would ask the question the other way round: what level of performance are
              > you looking for with your current workload, and by how much is the current
              > performance worse than your expectation?

              Since I have not tested the server with the production workload yet, maybe my
              perception of performance is not rightly focused; basically, my expectation is
              that the database must be faster than the current old legacy FoxPro system.

              Thanks,


              --
              Sincerely,
              Josué Maldonado.
              "Monogamy is like being forced to eat french fries every day." -- Henry
              Miller (1891-1980), American writer.



              • Alvaro Herrera

                #8
                Re: Insert speed question

                On Wed, Jun 02, 2004 at 08:50:16AM -0600, Josué Maldonado wrote:

                > dbmund=# explain analyze
                > dbmund-# insert into pk2
                > dbmund-# (pkd_stamp,pkd_fecha, doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
                > dbmund(# pkd_docto,pkd_es,pkd_qtysold,pkd_qtyinv,
                                     ^^^^^^
                > dbmund(# pkd_unidad,pkd_price,pkd_costo,pkd_saldo,
                > dbmund(# pkd_procode,pkd_custid,pkd_vendedor,pkd_tipprice,
                > dbmund(# pkd_totprice,pkd_estanulo,pkd_estmes,pkd_porcomision,
                > dbmund(# pkd_rutafk,pkd_provcode,pkd_depto,pkd_pk,
                > dbmund(# pkd_totcost,pkd_doctipo2,pkd_doctipo,fk_autorizacion,
                > dbmund(# fk_devolucion,pkd_udindexfk,pkd_unidadfk,pkd_prodno,
                > dbmund(# pkd_gravada,pkd_fraimp,pkd_imsove,pkd_clmayor,
                > dbmund(# pkd_cajanum,pkd_es)
                                       ^^^^^^
                > dbmund-# select fkardex,facfec,facnum,es,tqtysold,
                > dbmund-# invqty,unidad,fprice,fcost,saldo,
                > dbmund-# substr(prod_no,8,4),codclie,who_sold,
                > dbmund-# pre_tipo,fprice*tqtysold,'U',dtos(fkardex),
                > dbmund-# por_comisi,'XXX',substr(prod_no,1,3),
                > dbmund-# substr(prod_no,5,2),OID,fcost*tqtysold,
                > dbmund-# doctipo,'YYY',-1,-1,-1,-1,prod_no,impues,
                > dbmund-# fra_imp,imsove,clmayor,cajanum,es
                > dbmund-# from hisventa
                > dbmund-# ;
                > ERROR: column "pkd_es" specified more than once
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

                So fix your query! Also, what do you expect to happen if you put
                constants in the column list? This certainly looks like a mistake to
                me. Anyway, you should really format your query better so you can
                understand it and see obvious mistakes.
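                A sketch of the kind of formatting being suggested, cut down to four
                of the columns quoted above; the pairing shown is only illustrative,
                not a claim about the real mapping between the two tables:

                    INSERT INTO pk2 (
                        pkd_stamp,    -- one target column per line...
                        pkd_fecha,
                        pkd_docto,
                        pkd_es
                    )
                    SELECT
                        fkardex,      -- ...facing the expression that feeds it
                        facfec,
                        facnum,
                        es
                    FROM hisventa;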

                > dbmund=# explain analyze
                > dbmund-# insert into pk2
                [...]
                >                                     QUERY PLAN
                > ---------------------------------------------------------------------------------------------------------------------------
                >  Seq Scan on hisventa (cost=0.00..633607.24 rows=4882546 width=149)
                > (actual time=26.647..363517.935 rows=4882546 loops=1)
                >  Total runtime: 1042927.167 ms
                > (2 rows)

                So you are inserting 4 million rows. This causes a lot of I/O, so no
                wonder it takes a long time. I'm not sure if the time is reasonable or
                not though; 4M rows/1M ms = 4 rows/ms. Not that bad.
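                Spelling that arithmetic out with the exact figures from the quoted
                plan (nothing new is measured here):

                    -- 4,882,546 rows inserted in 1,042,927 ms
                    --   4,882,546 / 1,042,927 ~ 4.7 rows per ms ~ 4,700 rows per second
                    --   1,042,927 ms / 60,000 ~ 17.4 minutes of total wall time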


                >> I would ask the question the other way round: what level of performance
                >> are you looking for with your current workload, and by how much is the
                >> current performance worse than your expectation?
                >
                > Since I have not tested the server with the production workload yet, maybe
                > my perception of performance is not rightly focused; basically, my
                > expectation is that the database must be faster than the current old
                > legacy FoxPro system.

                If you are going to have a big load, you should at least try to code a
                simulation with a big load, doing random queries (not just any query, but the
                actual queries you'll get from your system -- for example, if this is a
                web-based app you can try to use Siege or something along those lines).
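                For a database-level (rather than HTTP-level) variant of the same idea,
                one option is to put a few representative statements in a file and
                replay them concurrently with pgbench from contrib. This is only a
                sketch; the client/transaction counts are made up, and the pgbench
                flags shown are from later releases and may differ on 7.4:

                    -- bench.sql: one "typical" transaction, built from the queries
                    -- quoted earlier in this thread. Replay with something like:
                    --     pgbench -n -c 50 -t 200 -f bench.sql dbmund
                    -- (50 concurrent clients, 200 transactions each)
                    SELECT * FROM pkardex WHERE pkd_procode = '8959';
                    SELECT * FROM pmdoc WHERE pdc_docto = '744144';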

                --
                Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
                "No reniegues de lo que alguna vez creíste"




                • Josué Maldonado

                  #9
                  Re: Insert speed question

                  Hello Alvaro,

                  On 02/06/2004 at 9:58 AM, Alvaro Herrera wrote:
                  > So fix your query! Also, what do you expect to happen if you put
                  > constants in the column list? This certainly looks like a mistake to
                  > me. Anyway, you should really format your query better so you can
                  > understand it and see obvious mistakes.

                  I'm sorry, I copied a wrong piece of the clipboard :(

                  >>                                     QUERY PLAN
                  >> ---------------------------------------------------------------------------------------------------------------------------
                  >>  Seq Scan on hisventa (cost=0.00..633607.24 rows=4882546 width=149)
                  >> (actual time=26.647..363517.935 rows=4882546 loops=1)
                  >>  Total runtime: 1042927.167 ms
                  >> (2 rows)
                  >
                  > So you are inserting 4 million rows. This causes a lot of I/O, so no
                  > wonder it takes a long time. I'm not sure if the time is reasonable or
                  > not though; 4M rows/1M ms = 4 rows/ms. Not that bad.

                  Agreed, the insert time got better.
                  >>> I would ask the question the other way round: what level of performance
                  >>> are you looking for with your current workload, and by how much is the
                  >>> current performance worse than your expectation?
                  >>
                  >> Since I have not tested the server with the production workload yet, maybe
                  >> my perception of performance is not rightly focused; basically, my
                  >> expectation is that the database must be faster than the current old
                  >> legacy FoxPro system.
                  >
                  > If you are going to have a big load, you should at least try to code a
                  > simulation with a big load, doing random queries (not just any query, but the
                  > actual queries you'll get from your system -- for example, if this is a
                  > web-based app you can try to use Siege or something along those lines).

                  I'll take your word for it and will run such tests.



                  --
                  Sincerely,
                  Josué Maldonado.
                  "He who takes criticism badly has something to hide." Helmut Schmidt,
                  German politician.



