SQL Server 2008 - Filestream

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Andreas Zimmermann

    SQL Server 2008 - Filestream

    Hi,

    we just set up a SQL Server 2008 on Windows Server 2008 (configured as file
    / webserver, 16 GB memory, 4 dual core processors, 6 internal disks + 15
    disc ISCSI array with overall almost 3 TB space) to use filestream and ran
    into the following problem:

    We extracted binaries from our 'original' database and imported it into the
    new 08 database using filestream - so far, so good.

    In the beginning everything looked pretty good, we extracted / imported over
    100.000 rows per hour but over time the performance degraded significantly
    (down to a roughly 4500 per hour). At this point, we have about 1 million
    files in our filesteam filegroup (which means in one directory on our binary
    array) but there are 500.000 more to go - and at this speed this will take
    days (not even thinking about what it might mean for performance
    afterwards).

    Searching for reasons, we realized that the inserts into the new table took
    most of the time (over 80%), so we first stopped and rebuilt the only index
    on that one (primary key) and, after realizing that this did not help at
    all, we dropped it completely (since uniqueness is assured through the
    original table anyways) - same result, no change, still way too slow.

    Did anybody run into a similar problem so far or is it in the end not even a
    SQL server but a Windows / File System problem (since through filestreaming
    millions of files go into a single folder) ?!

    Any ideas or tips would be appreciated ... thanks in advance !

    Kind regards

    Andy



  • Erland Sommarskog

    #2
    Re: SQL Server 2008 - Filestream

    Andreas Zimmermann (souldiver@nurf uerspam.de) writes:
    we just set up a SQL Server 2008 on Windows Server 2008 (configured as
    file / webserver, 16 GB memory, 4 dual core processors, 6 internal disks
    + 15 disc ISCSI array with overall almost 3 TB space) to use filestream
    and ran into the following problem:
    >
    We extracted binaries from our 'original' database and imported it into
    the new 08 database using filestream - so far, so good.
    >
    In the beginning everything looked pretty good, we extracted / imported
    over 100.000 rows per hour but over time the performance degraded
    significantly (down to a roughly 4500 per hour). At this point, we have
    about 1 million files in our filesteam filegroup (which means in one
    directory on our binary array) but there are 500.000 more to go - and at
    this speed this will take days (not even thinking about what it might
    mean for performance afterwards).
    >
    Searching for reasons, we realized that the inserts into the new table
    took most of the time (over 80%), so we first stopped and rebuilt the
    only index on that one (primary key) and, after realizing that this did
    not help at all, we dropped it completely (since uniqueness is assured
    through the original table anyways) - same result, no change, still way
    too slow.
    I don't have any answer for you, so I've forwarded your post to my MVP
    colleagues, to see if they have anything to add. But I have one question:
    how do you write the data? Through INSERT, or do you use the
    OpenSqlFilestre am to get a file handle for the files?


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

    Books Online for SQL Server 2005 at

    Books Online for SQL Server 2000 at

    Comment

    • Dan Guzman

      #3
      Re: SQL Server 2008 - Filestream

      Did anybody run into a similar problem so far or is it in the end not even
      a
      SQL server but a Windows / File System problem (since through
      filestreaming
      millions of files go into a single folder) ?!
      How many paths in your FILESTREAM filegroup? If you have only a single
      path, I'm curious if adding additional paths will improve performance.

      --
      Hope this helps.

      Dan Guzman
      SQL Server MVP


      "Andreas Zimmermann" <souldiver@nurf uerspam.dewrote in message
      news:g29p93$o8u $2@aioe.org...
      Hi,
      >
      we just set up a SQL Server 2008 on Windows Server 2008 (configured as
      file
      / webserver, 16 GB memory, 4 dual core processors, 6 internal disks + 15
      disc ISCSI array with overall almost 3 TB space) to use filestream and ran
      into the following problem:
      >
      We extracted binaries from our 'original' database and imported it into
      the
      new 08 database using filestream - so far, so good.
      >
      In the beginning everything looked pretty good, we extracted / imported
      over
      100.000 rows per hour but over time the performance degraded significantly
      (down to a roughly 4500 per hour). At this point, we have about 1 million
      files in our filesteam filegroup (which means in one directory on our
      binary
      array) but there are 500.000 more to go - and at this speed this will take
      days (not even thinking about what it might mean for performance
      afterwards).
      >
      Searching for reasons, we realized that the inserts into the new table
      took
      most of the time (over 80%), so we first stopped and rebuilt the only
      index
      on that one (primary key) and, after realizing that this did not help at
      all, we dropped it completely (since uniqueness is assured through the
      original table anyways) - same result, no change, still way too slow.
      >
      Did anybody run into a similar problem so far or is it in the end not even
      a
      SQL server but a Windows / File System problem (since through
      filestreaming
      millions of files go into a single folder) ?!
      >
      Any ideas or tips would be appreciated ... thanks in advance !
      >
      Kind regards
      >
      Andy
      >
      >
      >

      Comment

      • Andreas Zimmermann

        #4
        Re: SQL Server 2008 - Filestream

        Well, that was an idea we also had ... by the 'standard' setup there is
        only one 'file' (in this case path) in the filegroup - so we tried to
        add one and had to realize that this is not possible (so far ?!) ... you
        get a simple error message that filestream filegroups are not allowed to
        contain more than one path ...

        Dan Guzman wrote:
        >Did anybody run into a similar problem so far or is it in the end not
        >even a
        >SQL server but a Windows / File System problem (since through
        >filestreamin g
        >millions of files go into a single folder) ?!
        >
        How many paths in your FILESTREAM filegroup? If you have only a single
        path, I'm curious if adding additional paths will improve performance.
        >

        Comment

        • Andreas Zimmermann

          #5
          Re: SQL Server 2008 - Filestream

          At the moment we're using a 'straight forward' insert after extracting
          the data from our original table(s) ... is there an advantage in using
          OpenSqlFilestre am instead (if so we would certainly 'restrucure' the
          extract / import) ?!

          Erland Sommarskog wrote:
          Andreas Zimmermann (souldiver@nurf uerspam.de) writes:
          >we just set up a SQL Server 2008 on Windows Server 2008 (configured as
          >file / webserver, 16 GB memory, 4 dual core processors, 6 internal disks
          >+ 15 disc ISCSI array with overall almost 3 TB space) to use filestream
          >and ran into the following problem:
          >>
          >We extracted binaries from our 'original' database and imported it into
          >the new 08 database using filestream - so far, so good.
          >>
          >In the beginning everything looked pretty good, we extracted / imported
          >over 100.000 rows per hour but over time the performance degraded
          >significantl y (down to a roughly 4500 per hour). At this point, we have
          >about 1 million files in our filesteam filegroup (which means in one
          >directory on our binary array) but there are 500.000 more to go - and at
          >this speed this will take days (not even thinking about what it might
          >mean for performance afterwards).
          >>
          >Searching for reasons, we realized that the inserts into the new table
          >took most of the time (over 80%), so we first stopped and rebuilt the
          >only index on that one (primary key) and, after realizing that this did
          >not help at all, we dropped it completely (since uniqueness is assured
          >through the original table anyways) - same result, no change, still way
          >too slow.
          >
          I don't have any answer for you, so I've forwarded your post to my MVP
          colleagues, to see if they have anything to add. But I have one question:
          how do you write the data? Through INSERT, or do you use the
          OpenSqlFilestre am to get a file handle for the files?
          >
          >

          Comment

          • Erland Sommarskog

            #6
            Re: SQL Server 2008 - Filestream

            Andreas Zimmermann (souldiver@nurf uerspam.de) writes:
            At the moment we're using a 'straight forward' insert after extracting
            the data from our original table(s) ... is there an advantage in using
            OpenSqlFilestre am instead (if so we would certainly 'restrucure' the
            extract / import) ?!
            As I understand it, OpenSqlFilstrea m is more or less the whole point
            with FILESTREAM: you can write data through the Win32 file API, which
            should be faster than going through the needles eye of SQL Server.

            However, it's not likely to help you with the issue of number of files
            in the same directory getting larger than NTFS can cope with. I will have
            to say that this issue exists at all. It should be a no-brainer to spot
            this problem in the design.

            There seems to be a workaround, though. I don't know if Jimi2Cool is your alter ego, if he just a person with the same problem as you. In any case,
            he was able to diagnose the root of the problem and also able to address
            it. Whether his fix is supported remains to see:


            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

            Books Online for SQL Server 2005 at

            Books Online for SQL Server 2000 at

            Comment

            • Dan Guzman

              #7
              Re: SQL Server 2008 - Filestream

              Well, that was an idea we also had ... by the 'standard' setup there is
              only one 'file' (in this case path) in the filegroup - so we tried to add
              one and had to realize that this is not possible (so far ?!) ... you get a
              simple error message that filestream filegroups are not allowed to contain
              more than one path ...
              My bad - I didn't consider separate filegroups are required for separate
              FILESTREAM folders. Another workaround might to create separate FILESTREAM
              filegroups and parition the FILESTREAM portion of the table so that data is
              distributed evenly in those folders. This is a bit of a kluge, though. I
              think the workaround Erland found is a less kludgey kludge ;-)

              --
              Hope this helps.

              Dan Guzman
              SQL Server MVP


              "Andreas Zimmermann" <souldiver@nurf uerspam.dewrote in message
              news:g2bic1$2sj $1@aioe.org...
              Well, that was an idea we also had ... by the 'standard' setup there is
              only one 'file' (in this case path) in the filegroup - so we tried to add
              one and had to realize that this is not possible (so far ?!) ... you get a
              simple error message that filestream filegroups are not allowed to contain
              more than one path ...
              >
              Dan Guzman wrote:
              >>Did anybody run into a similar problem so far or is it in the end not
              >>even a
              >>SQL server but a Windows / File System problem (since through
              >>filestreami ng
              >>millions of files go into a single folder) ?!
              >>
              >How many paths in your FILESTREAM filegroup? If you have only a single
              >path, I'm curious if adding additional paths will improve performance.
              >>

              Comment

              • Andreas Zimmermann

                #8
                Re: SQL Server 2008 - Filestream

                In fact, he is a collegue of mine ... and we found about that particular
                workaround on friday (and it solved the problem for now it seems) - I
                just did not have the time to post here ... but thanks for your heads up !

                Erland Sommarskog wrote:
                Andreas Zimmermann (souldiver@nurf uerspam.de) writes:
                >At the moment we're using a 'straight forward' insert after extracting
                >the data from our original table(s) ... is there an advantage in using
                >OpenSqlFilestr eam instead (if so we would certainly 'restrucure' the
                >extract / import) ?!
                >
                As I understand it, OpenSqlFilstrea m is more or less the whole point
                with FILESTREAM: you can write data through the Win32 file API, which
                should be faster than going through the needles eye of SQL Server.
                >
                However, it's not likely to help you with the issue of number of files
                in the same directory getting larger than NTFS can cope with. I will have
                to say that this issue exists at all. It should be a no-brainer to spot
                this problem in the design.
                >
                There seems to be a workaround, though. I don't know if Jimi2Cool is your alter ego, if he just a person with the same problem as you. In any case,
                he was able to diagnose the root of the problem and also able to address
                it. Whether his fix is supported remains to see:

                >

                Comment

                • Andreas Zimmermann

                  #9
                  Re: SQL Server 2008 - Filestream

                  Thanks ... we might have a look at this - although you're right, it
                  sounds like a kludge :)

                  Dan Guzman wrote:
                  >Well, that was an idea we also had ... by the 'standard' setup there
                  >is only one 'file' (in this case path) in the filegroup - so we tried
                  >to add one and had to realize that this is not possible (so far ?!)
                  >... you get a simple error message that filestream filegroups are not
                  >allowed to contain more than one path ...
                  >
                  My bad - I didn't consider separate filegroups are required for separate
                  FILESTREAM folders. Another workaround might to create separate
                  FILESTREAM filegroups and parition the FILESTREAM portion of the table
                  so that data is distributed evenly in those folders. This is a bit of a
                  kluge, though. I think the workaround Erland found is a less kludgey
                  kludge ;-)
                  >

                  Comment

                  Working...