Binary data in PostgreSQL

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Holger Marzen

    Binary data in PostgreSQL

    Hi all,

    AFAIK it is possible for columns to be very large, up to about 2 GB. Are
    there any hints or experiences about storing binary data (jpg-images,
    pdf-documents) in PostgrreSQL with or without the complicated lo-stuff?

    Of course it's in many cases a good approach to store those files simply
    in the file system but there's always a risk of running out of sync
    (filesystem and tables), e.g. by deleting files and not deleting the
    table rows with the filenames.

    Any ideas and comments welcome.

    ---------------------------(end of broadcast)---------------------------
    TIP 7: don't forget to increase your free space map settings

  • Uwe C. Schroeder

    #2
    Re: Binary data in PostgreSQL

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1


    Well, you have the option to use the bytea types. It's nice to have the stuff
    in the database. I wrote an application (quite a while back) where I had to
    store documents to the database. At that time I switched to mysql, not
    because it's better, but because it handles binary data quite easily. Since
    this is a couple of years ago it's quite likely that PostgreSQL byte handling
    improved a lot (back then PG was still at a pre 7 version).
    Meanwhile I'm developing an application that stores huge amounts of documents.
    Since this time I need the features of a real database, like triggers etc.
    I'm using PG now. The amount of data I'm storing now is huge, therefor I
    decided to store the files in a directory tree. I'm storing about 40000
    documents around 100k each a week, thats about 12GB a month. The file storage
    directory is handled by the server side of the application and is invisible
    to normal users. The main reasons for using the filesystem instead of the
    database were:

    1) a lot of documents
    2) due to 1) I'm able to spread the documents over several
    harddrives/filesystems - so this solves the storage space problem (unless you
    want to use LVM or similar)
    3) much easier handling on the application side. I can use rsync, ftp, scp or
    whatever else to transfer files. Also most programing languages are quite
    good at handling normal files.
    4) You never run into a file problem regarding file type or size, at least not
    up to the limit of the operating system you're on.
    5) You can use conventional backup methods without interrupting the database
    server. It's also much faster to restore the whole system on a different
    machine - dump/restore the database and copy your directory tree.

    You have some drawbacks though:
    1) your application has to make sure that the referencing records and the file
    storage is in sync. If there's a bug in the application you can totally
    scramble the storage.
    2) If your server has a filesystem problem you could lose files due to fsck,
    which again would "unsync" the data from the files.

    Basically it all depends on the structure of your application. If you can
    guarantee that only the application has access to the file storage you can
    control the synchronization problem. If not I'd probably go the extra mile to
    store the stuff as blobs.
    There is another option you might want to consider. If you're storing
    relatively small files, like the images for a website, you could even go with
    a normal "text" field and base64 encode the data before storing (as well as
    decoding it after retrieval).

    My $0.02



    On Saturday 21 February 2004 10:53 am, Holger Marzen wrote:[color=blue]
    > Hi all,
    >
    > AFAIK it is possible for columns to be very large, up to about 2 GB. Are
    > there any hints or experiences about storing binary data (jpg-images,
    > pdf-documents) in PostgrreSQL with or without the complicated lo-stuff?
    >
    > Of course it's in many cases a good approach to store those files simply
    > in the file system but there's always a risk of running out of sync
    > (filesystem and tables), e.g. by deleting files and not deleting the
    > table rows with the filenames.
    >
    > Any ideas and comments welcome.
    >
    > ---------------------------(end of broadcast)---------------------------
    > TIP 7: don't forget to increase your free space map settings[/color]

    - --
    UC

    - --
    Open Source Solutions 4U, LLC 2570 Fleetwood Drive
    Phone: +1 650 872 2425 San Bruno, CA 94066
    Cell: +1 650 302 2405 United States
    Fax: +1 650 872 2417
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.2.3 (GNU/Linux)

    iD8DBQFAN8g4jqG XBvRToM4RAreNAJ wMwT1Twtg9c35rp 1SgaghKU7XDiQCg 0bQM
    Z7cbgYmGYtjkCFn JfSL3tm8=
    =3yhL
    -----END PGP SIGNATURE-----


    ---------------------------(end of broadcast)---------------------------
    TIP 5: Have you checked our extensive FAQ?



    Comment

    • Gavin M. Roy

      #3
      Re: Binary data in PostgreSQL

      It's not the most effective use of space, and I'm sure not the best way
      to do it, but I store such data as base64 encoded text. Works very
      well for my needs, in that regard.

      Gavin

      Holger Marzen wrote:
      [color=blue]
      >Hi all,
      >
      >AFAIK it is possible for columns to be very large, up to about 2 GB. Are
      >there any hints or experiences about storing binary data (jpg-images,
      >pdf-documents) in PostgrreSQL with or without the complicated lo-stuff?
      >
      >Of course it's in many cases a good approach to store those files simply
      >in the file system but there's always a risk of running out of sync
      >(filesystem and tables), e.g. by deleting files and not deleting the
      >table rows with the filenames.
      >
      >Any ideas and comments welcome.
      >
      >---------------------------(end of broadcast)---------------------------
      >TIP 7: don't forget to increase your free space map settings
      >
      >[/color]


      ---------------------------(end of broadcast)---------------------------
      TIP 7: don't forget to increase your free space map settings

      Comment

      • Joshua Drake

        #4
        Re: Binary data in PostgreSQL

        Hello,

        I would use large objects. Easy to access, easy to use.

        J


        Gavin M. Roy wrote:[color=blue]
        > It's not the most effective use of space, and I'm sure not the best way
        > to do it, but I store such data as base64 encoded text. Works very
        > well for my needs, in that regard.
        >
        > Gavin
        >
        > Holger Marzen wrote:
        >[color=green]
        >> Hi all,
        >>
        >> AFAIK it is possible for columns to be very large, up to about 2 GB. Are
        >> there any hints or experiences about storing binary data (jpg-images,
        >> pdf-documents) in PostgrreSQL with or without the complicated lo-stuff?
        >>
        >> Of course it's in many cases a good approach to store those files simply
        >> in the file system but there's always a risk of running out of sync
        >> (filesystem and tables), e.g. by deleting files and not deleting the
        >> table rows with the filenames.
        >>
        >> Any ideas and comments welcome.
        >>
        >> ---------------------------(end of broadcast)---------------------------
        >> TIP 7: don't forget to increase your free space map settings
        >>
        >>[/color]
        >
        >
        > ---------------------------(end of broadcast)---------------------------
        > TIP 7: don't forget to increase your free space map settings[/color]



        ---------------------------(end of broadcast)---------------------------
        TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

        Comment

        • Jonathan Bartlett

          #5
          Re: Binary data in PostgreSQL

          bytea's suck for large data. I think my processes ate up 5x the size of
          the large objects just trying to select them from the database. Anything
          over 10 Meg it usually isn't useful for.

          Jon

          On Sat, 21 Feb 2004, Joshua Drake wrote:
          [color=blue]
          > Hello,
          >
          > I would use large objects. Easy to access, easy to use.
          >
          > J
          >
          >
          > Gavin M. Roy wrote:[color=green]
          > > It's not the most effective use of space, and I'm sure not the best way
          > > to do it, but I store such data as base64 encoded text. Works very
          > > well for my needs, in that regard.
          > >
          > > Gavin
          > >
          > > Holger Marzen wrote:
          > >[color=darkred]
          > >> Hi all,
          > >>
          > >> AFAIK it is possible for columns to be very large, up to about 2 GB. Are
          > >> there any hints or experiences about storing binary data (jpg-images,
          > >> pdf-documents) in PostgrreSQL with or without the complicated lo-stuff?
          > >>
          > >> Of course it's in many cases a good approach to store those files simply
          > >> in the file system but there's always a risk of running out of sync
          > >> (filesystem and tables), e.g. by deleting files and not deleting the
          > >> table rows with the filenames.
          > >>
          > >> Any ideas and comments welcome.
          > >>
          > >> ---------------------------(end of broadcast)---------------------------
          > >> TIP 7: don't forget to increase your free space map settings
          > >>
          > >>[/color]
          > >
          > >
          > > ---------------------------(end of broadcast)---------------------------
          > > TIP 7: don't forget to increase your free space map settings[/color]
          >
          >[/color]


          ---------------------------(end of broadcast)---------------------------
          TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

          Comment

          Working...