space taken by a row & compressed data

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Leonardo Francalanci

    space taken by a row & compressed data

    With mysql I know how much space a row will take, based on the datatype
    of it columns. I also (approximately) know the size of indexes.
    Is there a way to know that in postgresql?

    Is there a way to pack (compress) data, as with myisampack for mysql?



    Thank you

    ---------------------------(end of broadcast)---------------------------
    TIP 8: explain analyze is your friend

  • Bruce Momjian

    #2
    Re: space taken by a row & compressed data

    Leonardo Francalanci wrote:[color=blue]
    > With mysql I know how much space a row will take, based on the datatype
    > of it columns. I also (approximately) know the size of indexes.
    > Is there a way to know that in postgresql?[/color]

    We have an FAQ item about this.
    [color=blue]
    > Is there a way to pack (compress) data, as with myisampack for mysql?[/color]

    Long data values are automatically compressed.

    --
    Bruce Momjian | http://candle.pha.pa.us
    pgman@candle.ph a.pa.us | (610) 359-1001
    + If your life is a hard drive, | 13 Roberts Road
    + Christ can be your backup. | Newtown Square, Pennsylvania 19073

    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postg resql.org so that your
    message can get through to the mailing list cleanly

    Comment

    • Tino Wildenhain

      #3
      Re: space taken by a row & compressed data

      Hi Leonardo,

      Am Do, den 26.08.2004 schrieb Leonardo Francalanci um 15:51:[color=blue]
      > With mysql I know how much space a row will take, based on the datatype
      > of it columns. I also (approximately) know the size of indexes.
      > Is there a way to know that in postgresql?
      >
      > Is there a way to pack (compress) data, as with myisampack for mysql?[/color]

      in the contrib dir you will find something (dbsize or so)
      to check for sizes of objects (at least tables)

      Due to the MVCC you happen to have multiple versions
      of an updated row at the same time so the numbers arent
      very exact all the time.

      Large objects such as text columns are stored in compressed
      form already.

      HTH
      Tino Wildenhain


      ---------------------------(end of broadcast)---------------------------
      TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postg resql.org so that your
      message can get through to the mailing list cleanly

      Comment

      • Leonardo Francalanci

        #4
        R: space taken by a row & compressed data

        > We have an FAQ item about this.

        Damn! I didn't see that one! Sorry...
        [color=blue]
        > Long data values are automatically compressed.[/color]

        The reason I'm asking is:
        we have a system that stores 200,000,000 rows per month
        (other tables store 10,000,000 rows per month)
        Every row has 400 columns of integers + 2 columns (date+integer) as index.

        Our system compresses rows before writing them to a binary file on disk.
        Data don't usually need to be updated/removed.
        We usually access all columns of a row (hence compression on a per-row basis
        makes sense).

        Is there any way to compress data on a per-row basis? Maybe with
        a User-Defined type?


        ---------------------------(end of broadcast)---------------------------
        TIP 7: don't forget to increase your free space map settings

        Comment

        • Bruce Momjian

          #5
          Re: R: space taken by a row & compressed data

          Leonardo Francalanci wrote:[color=blue][color=green]
          > > We have an FAQ item about this.[/color]
          >
          > Damn! I didn't see that one! Sorry...
          >[color=green]
          > > Long data values are automatically compressed.[/color]
          >
          > The reason I'm asking is:
          > we have a system that stores 200,000,000 rows per month
          > (other tables store 10,000,000 rows per month)
          > Every row has 400 columns of integers + 2 columns (date+integer) as index.
          >
          > Our system compresses rows before writing them to a binary file on disk.
          > Data don't usually need to be updated/removed.
          > We usually access all columns of a row (hence compression on a per-row basis
          > makes sense).
          >
          > Is there any way to compress data on a per-row basis? Maybe with
          > a User-Defined type?[/color]

          Ah, we only compress long row values, which integers would not be. I
          don't see any way to compress an entire row even with a user-defined
          type unless you put multiple values into a single column and compress
          those as a single value. In fact, if you used an array or some special
          data type it would become a long value and would be automatically
          compressed.

          However, as integers, there would have to be a lot of duplicate values
          before compression would be a win.

          --
          Bruce Momjian | http://candle.pha.pa.us
          pgman@candle.ph a.pa.us | (610) 359-1001
          + If your life is a hard drive, | 13 Roberts Road
          + Christ can be your backup. | Newtown Square, Pennsylvania 19073

          ---------------------------(end of broadcast)---------------------------
          TIP 3: if posting/reading through Usenet, please send an appropriate
          subscribe-nomail command to majordomo@postg resql.org so that your
          message can get through to the mailing list cleanly

          Comment

          • Tom Lane

            #6
            Re: R: space taken by a row & compressed data

            "Leonardo Francalanci" <lfrancalanci@s imtel.ie> writes:[color=blue]
            > we have a system that stores 200,000,000 rows per month
            > (other tables store 10,000,000 rows per month)
            > Every row has 400 columns of integers + 2 columns (date+integer) as index.[/color]
            [color=blue]
            > Our system compresses rows before writing them to a binary file on disk.
            > Data don't usually need to be updated/removed.
            > We usually access all columns of a row (hence compression on a per-row basis
            > makes sense).[/color]
            [color=blue]
            > Is there any way to compress data on a per-row basis? Maybe with
            > a User-Defined type?[/color]

            If you just stuck all the integers into a single integer-array column,
            it would be 1600 bytes wide, which is ... hmm ... not quite wide enough
            to trigger the toast logic. Perhaps it would be worthwhile for you to
            run a custom build with TOAST_TUPLE_THR ESHOLD/TOAST_TUPLE_TAR GET set
            to half their standard values (see src/include/access/tuptoaster.h).
            You'd not need to write any specialized code that way.

            Note that if you sometimes search on the values of one of the non-index
            columns, this might be a bad idea.

            regards, tom lane

            ---------------------------(end of broadcast)---------------------------
            TIP 5: Have you checked our extensive FAQ?



            Comment

            • Tom Lane

              #7
              Re: R: space taken by a row &amp; compressed data

              Bruce Momjian <pgman@candle.p ha.pa.us> writes:[color=blue]
              > However, as integers, there would have to be a lot of duplicate values
              > before compression would be a win.[/color]

              Not necessarily. If for instance most of the values fit in int2, then
              the upper zero bytes would be fodder for compression. (If they *all*
              fit in int2 then of course he's missing a trick...) The fact that they
              are successfully using row compression on their old platform indicates
              that there's some win available there.

              regards, tom lane

              ---------------------------(end of broadcast)---------------------------
              TIP 2: you can get off all lists at once with the unregister command
              (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

              Comment

              • Leonardo Francalanci

                #8
                R: R: space taken by a row &amp; compressed data

                > If you just stuck all the integers into a single integer-array column,[color=blue]
                > it would be 1600 bytes wide, which is ... hmm ... not quite wide enough
                > to trigger the toast logic. Perhaps it would be worthwhile for you to
                > run a custom build with TOAST_TUPLE_THR ESHOLD/TOAST_TUPLE_TAR GET set
                > to half their standard values (see src/include/access/tuptoaster.h).[/color]

                Could you point me to some docs on this TOAST-mechanism?
                I only found http://postgresql.zadnik.org/projects/devel-toast.html
                but seems very old.
                For instance: what kind of method is used to compress data?
                [color=blue]
                > You'd not need to write any specialized code that way.[/color]
                Great!
                [color=blue]
                > Note that if you sometimes search on the values of one of the non-index
                > columns, this might be a bad idea.[/color]


                No, this never happens.

                ---------------------------(end of broadcast)---------------------------
                TIP 9: the planner will ignore your desire to choose an index scan if your
                joining column's datatypes do not match

                Comment

                • Tom Lane

                  #9
                  Re: R: R: space taken by a row &amp; compressed data

                  "Leonardo Francalanci" <lfrancalanci@s imtel.ie> writes:[color=blue]
                  > Could you point me to some docs on this TOAST-mechanism?
                  > For instance: what kind of method is used to compress data?[/color]

                  When in doubt, read the code ;-)
                  src/backend/utils/adt/pg_lzcompress.c
                  src/include/utils/pg_lzcompress.h

                  regards, tom lane

                  ---------------------------(end of broadcast)---------------------------
                  TIP 3: if posting/reading through Usenet, please send an appropriate
                  subscribe-nomail command to majordomo@postg resql.org so that your
                  message can get through to the mailing list cleanly

                  Comment

                  • Greg Stark

                    #10
                    Re: R: R: space taken by a row &amp; compressed data

                    Tom Lane <tgl@sss.pgh.pa .us> writes:
                    [color=blue]
                    > "Leonardo Francalanci" <lfrancalanci@s imtel.ie> writes:[color=green]
                    > > Could you point me to some docs on this TOAST-mechanism?
                    > > For instance: what kind of method is used to compress data?[/color]
                    >
                    > When in doubt, read the code ;-)
                    > src/backend/utils/adt/pg_lzcompress.c
                    > src/include/utils/pg_lzcompress.h[/color]

                    Are toasted values stored in the table itself or in a separate table?

                    My understanding was that it was the latter, which leads me to wonder whether
                    he'll actually gain anything by having all the records in his table be
                    toasted. It'll mean every record lookup has to traverse two indexes, and a
                    sequential scan loses the sequential read performance boost.

                    Or am I wrong and toasted values can be stored inline?

                    --
                    greg


                    ---------------------------(end of broadcast)---------------------------
                    TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

                    Comment

                    • Joe Conway

                      #11
                      Re: R: R: space taken by a row &amp; compressed data

                      Greg Stark wrote:[color=blue]
                      > Are toasted values stored in the table itself or in a separate table?[/color]

                      In a separate table if they exceed a threshold.
                      [color=blue]
                      > My understanding was that it was the latter, which leads me to wonder whether
                      > he'll actually gain anything by having all the records in his table be
                      > toasted. It'll mean every record lookup has to traverse two indexes, and a
                      > sequential scan loses the sequential read performance boost.
                      >
                      > Or am I wrong and toasted values can be stored inline?
                      >[/color]

                      They can be, but are not by default. See:
                      ALTER TABLE ALTER TABLE — change the definition of a table Synopsis ALTER TABLE [ IF EXISTS ] [ ONLY …


                      SET STORAGE

                      This form sets the storage mode for a column. This controls whether
                      this column is held inline or in a supplementary table, and whether the
                      data should be compressed or not. PLAIN must be used for fixed-length
                      values such as integer and is inline, uncompressed. MAIN is for inline,
                      compressible data. EXTERNAL is for external, uncompressed data, and
                      EXTENDED is for external, compressed data. EXTENDED is the default for
                      all data types that support it. The use of EXTERNAL will, for example,
                      make substring operations on a text column faster, at the penalty of
                      increased storage space.


                      Joe

                      ---------------------------(end of broadcast)---------------------------
                      TIP 6: Have you searched our list archives?



                      Comment

                      • Jan Wieck

                        #12
                        Re: R: R: space taken by a row &amp; compressed data

                        On 8/26/2004 4:13 PM, Greg Stark wrote:
                        [color=blue]
                        > Tom Lane <tgl@sss.pgh.pa .us> writes:
                        >[color=green]
                        >> "Leonardo Francalanci" <lfrancalanci@s imtel.ie> writes:[color=darkred]
                        >> > Could you point me to some docs on this TOAST-mechanism?
                        >> > For instance: what kind of method is used to compress data?[/color]
                        >>
                        >> When in doubt, read the code ;-)
                        >> src/backend/utils/adt/pg_lzcompress.c
                        >> src/include/utils/pg_lzcompress.h[/color]
                        >
                        > Are toasted values stored in the table itself or in a separate table?[/color]

                        Some here and some there. Toast tries by default to get a row under 2K
                        size. As long as that isn't the case, it compresses the largest varlen
                        attribute. If there are no more uncompressed varlen attributes, it takes
                        the largest value and stores it in slices in the toast table, again only
                        until the main row is under 2K.
                        [color=blue]
                        >
                        > My understanding was that it was the latter, which leads me to wonder whether
                        > he'll actually gain anything by having all the records in his table be
                        > toasted. It'll mean every record lookup has to traverse two indexes, and a
                        > sequential scan loses the sequential read performance boost.
                        >
                        > Or am I wrong and toasted values can be stored inline?
                        >[/color]

                        Depends on the type of query. Queries that really access a lot of
                        toasted values lose. Queries that shovel around a lot of rows but don't
                        touch most of the toasted values win.


                        Jan

                        --
                        #============== =============== =============== =============== ===========#
                        # It's easier to get forgiveness for being wrong than for being right. #
                        # Let's break this rule - forgive me. #
                        #============== =============== =============== ====== JanWieck@Yahoo. com #

                        ---------------------------(end of broadcast)---------------------------
                        TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

                        Comment

                        • Greg Stark

                          #13
                          Re: R: R: space taken by a row &amp; compressed data


                          Joe Conway <mail@joeconway .com> writes:
                          [color=blue]
                          > Greg Stark wrote:[color=green]
                          > > Are toasted values stored in the table itself or in a separate table?[/color]
                          >
                          > In a separate table if they exceed a threshold.[/color]

                          How do you check to see how many records, or ideally which records, are being
                          toasted and/or stored externally?


                          --
                          greg


                          ---------------------------(end of broadcast)---------------------------
                          TIP 9: the planner will ignore your desire to choose an index scan if your
                          joining column's datatypes do not match

                          Comment

                          • Greg Stark

                            #14
                            Re: R: R: space taken by a row &amp; compressed data


                            Joe Conway <mail@joeconway .com> writes:
                            [color=blue]
                            > I don't know of a builtin way to do that from SQL, but the attached seems to
                            > work for me.[/color]

                            Cool. Thank you.


                            --
                            greg


                            ---------------------------(end of broadcast)---------------------------
                            TIP 2: you can get off all lists at once with the unregister command
                            (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

                            Comment

                            Working...