Recommended FS

  • Martijn van Oosterhout

    #46
    Re: Recomended FS

    On Sat, Oct 25, 2003 at 11:04:00PM -0700, James Moe wrote:
    > Other posts have noted that SCSI never fails under this condition. Apparently SCSI
    > drives sense an impending power loss and flush the cache before power completely
    > disappears. Speed *and* reliability. Hm.

    I understood it differently. PostgreSQL has WAL to deal with this situation.
    The issue is that WAL only works as long as the drive doesn't lie about which
    blocks have been written and which are merely in cache. Apparently IDE disks
    lie and SCSI disks don't. It may be a protocol thing.
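
A minimal sketch of the contract WAL relies on, written in Python rather than PostgreSQL's own C: append a record, call fsync, and only then treat the record as durable. The file name and record format are made up for illustration. Note that `os.fsync` only asks the operating system to push its buffers to the device; if the drive acknowledges writes that are still sitting in its own cache, this code has no way of telling.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> None:
    """Append one record and return only after fsync has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        # Ask the OS to flush its buffers to the device. A drive with a
        # write-back cache may still report success before the platters
        # are updated, which is the failure mode discussed in this thread.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "example.wal")  # hypothetical name
    append_durably(log_path, b"commit 1\n")
    append_durably(log_path, b"commit 2\n")
    with open(log_path, "rb") as f:
        print(f.read())
```

The point of the sketch is the ordering: the caller may report "committed" only after `append_durably` returns, which is exactly the promise a lying drive breaks.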

    The other alternative is battery-backed memory, i.e. keep the blocks in
    memory and hope that power returns to the drive before the battery fails.
    Some RAID cards do this.

    Another thing is that 3ware RAID controllers stick a SCSI interface in
    front of the IDE drives, so perhaps it has more scope to deal with this
    issue.

    Remember, when power fails the first thing that happens is the system
    cancels any DMA transfer in progress, as memory is the part most sensitive to
    power fluctuations.
    > Of course, anyone serious about a server would have it backed up with a UPS and
    > appropriate software to shut the system down during an extended power outage. This just
    > leaves people tripping over the power cords or maliciously pulling the plugs.

    If you start adding up the points of failure it's quite a lot. But you
    should be able to proof the system against even malicious tampering.
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    > "All that is needed for the forces of evil to triumph is for enough good
    > men to do nothing." - Edmond Burke
    > "The penalty good people pay for not being interested in politics is to be
    > governed by people worse than themselves." - Plato



    • Ben-Nes Michael

      #47
      Re: Recomended FS

      Don't forget that the power supply can fail too, so it's not all about
      the UPS and cords.

      --------------------------
      Canaan Surfing Ltd.
      Internet Service Providers
      Ben-Nes Michael - Manager
      Tel: 972-4-6991122
      Fax: 972-4-6990098

      --------------------------
      ----- Original Message -----
      From: "James Moe" <jimoe@sohnen-moe.com>
      To: "Postgresql General Mail List" <pgsql-general@postgresql.org>
      Sent: Sunday, October 26, 2003 8:04 AM
      Subject: Re: [GENERAL] Recomended FS

      > On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
      >
      > >I would conclude that it not *always* the case that power failure
      > >renders the database unuseable.
      > >
      > >I have just noticed a similar posting from Scott were he finds the cache
      > >enabled case has a dead database after power failure.
      >
      > Other posts have noted that SCSI never fails under this condition. Apparently SCSI
      > drives sense an impending power loss and flush the cache before power completely
      > disappears. Speed *and* reliability. Hm.
      > Of course, anyone serious about a server would have it backed up with a UPS and
      > appropriate software to shut the system down during an extended power outage. This just
      > leaves people tripping over the power cords or maliciously pulling the plugs.
      >
      > - --
      > jimoe at sohnen-moe dot com
      > pgp/gpg public key: http://www.keyserver.net/en/
      >
      > ---------------------------(end of broadcast)---------------------------
      > TIP 5: Have you checked our extensive FAQ?
      >
      > http://www.postgresql.org/docs/faqs/FAQ.html



      • Fernando Schapachnik

        #48
        Re: Recomended FS

        In an earlier message, Scott Chapman wrote:
        > I don't recall seeing anyone explain how to disable caching on a drive in this
        > thread. Did I miss that? 'Would be useful. I'm running a 3Ware mirror of 2
        > IDE drives.

        In FreeBSD, add "hw.ata.wc=0" to /boot/loader.conf.

        Regards.


        • scott.marlowe

          #49
          Re: Recomended FS

          On Sat, 25 Oct 2003, James Moe wrote:

          > On Sun, 26 Oct 2003 16:24:17 +1300, Mark Kirkwood wrote:
          >
          > >I would conclude that it not *always* the case that power failure
          > >renders the database unuseable.
          > >
          > >I have just noticed a similar posting from Scott were he finds the cache
          > >enabled case has a dead database after power failure.
          >
          > Other posts have noted that SCSI never fails under this condition. Apparently SCSI
          > drives sense an impending power loss and flush the cache before power completely
          > disappears. Speed *and* reliability. Hm.

          Actually, it would appear that the SCSI drives simply don't lie about
          fsync. I.e. when they tell the OS that they wrote the data, they wrote
          the data. Some of them may combine cache flushing on power loss with lying
          about fsync, but the performance looks more like plain honest fsyncing to me.
          It's all a guess without examining the microcode though... :-)
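
One rough way to probe this from user space, sketched in Python: time a run of small write-plus-fsync calls against a single file. A 7200 RPM disk turns 120 times a second, so a drive that really commits each rewrite of the same block before acknowledging it can hardly sustain much more than about that many fsyncs per second. A figure in the thousands suggests that something (drive cache, controller cache, or the OS) is acknowledging writes early. The function name, block size, and the 7200 RPM figure are illustrative assumptions, not measurements from this thread.

```python
import os
import tempfile
import time


def fsyncs_per_second(path: str, count: int = 200) -> float:
    """Overwrite one small block `count` times, fsyncing after each write."""
    block = b"x" * 512
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block every time
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    test_file = os.path.join(tempfile.mkdtemp(), "fsync_probe.dat")  # hypothetical name
    rate = fsyncs_per_second(test_file)
    rpm = 7200  # assumed spindle speed
    print(f"{rate:.0f} fsyncs/s; one rotation per commit would allow about {rpm / 60:.0f}/s")
```

Point it at a file on the disk under test; a RAM-backed temporary directory will report meaningless numbers.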
          > Of course, anyone serious about a server would have it backed up with a UPS and
          > appropriate software to shut the system down during an extended power outage. This just
          > leaves people tripping over the power cords or maliciously pulling the plugs.

          Or a CPU frying, or a power supply dying, or a motherboard failure, or a
          kernel panic, or any number of other possibilities. Admittedly, the first
          line of defense is always good backups, but it's nice knowing that if one
          of my CPUs fries, I can pull it, put in the terminator / replacement, and my
          whole machine will likely come back up.

          But anyone serious about a server will also likely be running on SCSI as
          well as on a UPS. We use a hosting center with 3 UPSes and a diesel
          generator, and we still managed to lose power about a year ago when one
          UPS went haywire, browned out the circuits of the other two, and the
          diesel generator's switch burnt out. Millions of dollars worth of UPS /
          high reliability equipment, and a $50 switch brought it all down.



          • scott.marlowe

            #50
            Re: Recomended FS

            On Sun, 26 Oct 2003, Mark Kirkwood wrote:

            > Got to going this today, after a small delay due to the arrival of new
            > disks,
            >
            > So the system is 2x700Mhz PIII, 512 Mb, Promise TX2000, 2x40G ATA-133
            > Maxtor Diamond+8 .
            > The relevent software is Freebsd 4.8 and Postgresql 7.4 Beta 2.
            >
            > Two runs of 'pgbench -c 50 -t 1000000 -s 10 bench' with a power cord
            > removal after about 2 minutes were performed, one with hw.ata.wc = 1
            > (write cache enabled) and other with hw.ata.wc = 0 (disabled).
            >
            > In *both* cases the Pg server survived - i.e it came up, performed
            > automatic recovery. Subsequent 'vacuum full' and further runs of pgbench
            > completed with no issues.

            Sweet. It may be that the Promise controller is turning off the cache, or
            that the new generation of IDE drives is finally reporting fsync correctly.
            Was there a performance difference between the runs with write cache on and off?

            > I would conclude that it not *always* the case that power failure
            > renders the database unuseable.

            But it usually is if write cache is enabled.
            > I have just noticed a similar posting from Scott were he finds the cache
            > enabled case has an dead database after power failure. It seems that
            > it's a question of how *likely* is it that the database will survive/not
            > survive a power failure...
            >
            > The other interesting possibility is that Freebsd with soft updates
            > helped things remain salvageable in the cache enabled case (as some
            > writes *must* be lost at power off in this case)....

            FreeBSD may be the reason here. If its soft updates are ordered in the
            right way, it may be that even with write caching on, the drives "do the
            right thing" under BSD. Time to get out my 5.0 disks and start playing
            with my test server. Thanks for the test!



            • scott.marlowe

              #51
              Re: Recomended FS

              On Fri, 24 Oct 2003, Scott Chapman wrote:

              > On Friday 24 October 2003 16:23, scott.marlowe wrote:
              > > Right, but NONE of the benchmarks I've seen have been with IDE drives with
              > > their cache disabled, which is the only way to make them reliable under
              > > postgresql should something bad happen. but thanks for the benchmarks,
              > > I'll look them over.
              >
              > I don't recall seeing anyone explain how to disable caching on a drive in this
              > thread. Did I miss that? 'Would be useful. I'm running a 3Ware mirror of 2
              > IDE drives.
              >
              > Scott

              Each OS has its own methods, and some IDE RAID cards don't give you
              direct access to the drives to enable / disable write cache.

              On Linux you can disable write cache like so:

              hdparm -W0 /dev/hda

              To turn it back on:

              hdparm -W1 /dev/hda



              • Greg Stark

                #52
                Re: Recomended FS

                "scott.marlowe" <scott.marlowe@ihs.com> writes:

                > Sweet. It may be that the promise is turning off the cache, or that the
                > new generation of IDE drives is finally reporting fsync correctly. Was
                > there a performance difference in the set with write cache on or off?

                Check out this thread. It seems the ATA standard does not include any way to
                make fsync work properly without destroying performance. On Linux, at least,
                even that much is impossible without disabling caching entirely, because the
                required operation isn't exposed to user space. There is some hope for the
                future though.


                > > The other interesting possibility is that Freebsd with soft updates
                > > helped things remain salvageable in the cache enabled case (as some
                > > writes *must* be lost at power off in this case)....
                >
                > Free BSD may be the reason here. If it's softupdates are ordered in the
                > right way, it may be that even with write caching on, the drives "do the
                > right thing" under BSD. Time to get out my 5.0 disks and start playing
                > with my test server. Thanks for the test!

                I thought soft updates applied only to directory metadata changes.

                --
                greg



                • Bruce Momjian

                  #53
                  Re: Recomended FS

                  Greg Stark wrote:
                  > "scott.marlowe" <scott.marlowe@ihs.com> writes:
                  >
                  > > Sweet. It may be that the promise is turning off the cache, or that the
                  > > new generation of IDE drives is finally reporting fsync correctly. Was
                  > > there a performance difference in the set with write cache on or off?
                  >
                  > Check out this thread. It seems the ATA standard does not include any way to
                  > make fsync work properly without destroying performance. At least on linux
                  > even that much is impossible without disabling caching entirely as the
                  > operation required isn't exposed to user-space. There is some hope for the
                  > future though.
                  >
                  > http://www.ussg.iu.edu/hypermail/lin...10.2/0163.html

                  I thought the operating system had to write the block and force it to
                  disk, and that this happened the same way with SCSI and IDE. I didn't
                  assume the drive would associate multiple blocks with the fsync.

                  --
                  Bruce Momjian | http://candle.pha.pa.us
                  pgman@candle.pha.pa.us | (610) 359-1001
                  + If your life is a hard drive, | 13 Roberts Road
                  + Christ can be your backup. | Newtown Square, Pennsylvania 19073


                  • Greg Stark

                    #54
                    Re: Recomended FS


                    "scott.marlowe" <scott.marlowe@ihs.com> writes:

                    > Or a CPU frying, or a power supply dying, or a motherboard failure, or a
                    > kernel panic, or any number of other possibilities. Admittedly, the first
                    > line of defense is always good backups, but it's nice knowing that if one
                    > of my CPUs fry, I can pull it, put in the terminator / replacement, and my
                    > whole machine will likely come back up.

                    Well, note that in all of those cases the disk drive would still have a chance
                    to sync its buffers to disk. Linux isn't lying about fsync as far as its
                    buffers getting flushed, only the drive itself.

                    In theory even in those cases there's no guarantee of exactly how long the
                    drive will hold the buffers without committing them, but in practice I think
                    any sane drive will commit pretty damn soon or else normal power-off wouldn't
                    work.

                    --
                    greg



                    • Mark Kirkwood

                      #55
                      Re: Recomended FS



                      scott.marlowe wrote:

                      >Was there a performance difference in the set with write cache on or off?

                      Yes. I am in the middle of a small study on this and will
                      post some preliminary results soon.

                      cheers

                      Mark



                      • Mark Kirkwood

                        #56
                        Re: Recomended FS

                        Maybe it is a little late to be posting on this thread - but I was doing
                        pgbench runs with a Raid 0 ATA system and thought the results might be
                        interesting.

                        So here they are: pgbench -c 5 -t 1000 -s 5, median of 3 runs on a
                        dual PIII 700, 512 MB, 2x 7200 RPM ATA-133, Promise TX2000
                        (same method / Pg configuration parameters as Scott's):

                        2 disk Raid0 W0
                        66 tps

                        2 disk Raid0 W1
                        220 tps

                        I was expecting a slightly better result for W0 (write caching off).
                        Mind you, the point could be made that you get about half the performance
                        of the SCSI system for about half the price.

                        And the W1 result is fast. When (or if) that little power-saving
                        capacitor arrives for these drives, we could see performance, reliability
                        *and* economy....
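
For a quick sense of scale, the W0 and W1 figures reported in this thread can be put side by side. The short Python sketch below only divides the cache-on result by the cache-off result for each configuration; the labels are chosen here for readability, and the numbers are copied from the posts (Scott's figures are the ones quoted at the end of this message).

```python
# tps figures copied from the posts in this thread; the labels are ours.
results = {
    "Mark, 2-disk RAID 0": {"W0": 66, "W1": 220},
    "Scott, MachineB Config1": {"W0": 60, "W1": 112},
    "Scott, MachineB Config2": {"W0": 44, "W1": 135},
}


def cache_speedup(tps: dict) -> float:
    """Ratio of write-cache-on throughput to write-cache-off throughput."""
    return tps["W1"] / tps["W0"]


for label, tps in results.items():
    print(f"{label}: W1 is {cache_speedup(tps):.2f}x W0")
```

These are single-machine pgbench medians, so the ratios are only a rough indication of what the write cache buys.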

                        regards

                        Mark

                        scott.marlowe wrote:

                        >MachineA Config1:
                        >141 tps
                        >
                        >MachineB Config1 W0:
                        >60 tps
                        >
                        >MachineB Config1 W1:
                        >112 tps
                        >
                        >MachineA Config2:
                        >101 tps
                        >
                        >MachineB Config2 W0:
                        >44 tps
                        >
                        >MachineB Config2 W1:
                        >135 tps


