Horizontal Partitioning question

  • MissLivvy

    Horizontal Partitioning question

    I recently came across a database where the data are horizontally partitioned
    into 4 tables. I'm not sure if this was a poor design choice or if it was
    done for valid performance reasons. The schemas of the tables are essentially
    the same; the tables are just named differently, and the columns are named
    differently, to distinguish the data from a business usage perspective. The
    tables could easily be combined into one by adding a new column to the
    clustered index that would be used to differentiate the business usage. I am
    trying to evaluate whether combining the tables would improve performance or
    whether it would be better to leave them the way they are. Many queries that
    run against these tables do not request records from more than one of the
    tables, which is good. However, there are a number of processes that query
    all of the tables on the identical clustered index range. I am not
    sure exactly how many rows are in the tables, but I'm fairly certain the
    entire database is < 50 GB.
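The combination described in the question can be sketched roughly as follows. This is only an illustration: SQLite (via Python's `sqlite3`) stands in for SQL Server, a `WITHOUT ROWID` table plays the role of a clustered index, and every table and column name (`budget_rows`, `forecast_rows`, `all_rows`, `usage`) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical stand-ins for two of the four per-usage tables.
CREATE TABLE budget_rows   (item_id INTEGER PRIMARY KEY, budget_amount   REAL);
CREATE TABLE forecast_rows (item_id INTEGER PRIMARY KEY, forecast_amount REAL);
INSERT INTO budget_rows   VALUES (1, 10.0), (2, 20.0);
INSERT INTO forecast_rows VALUES (1, 11.0), (2, 19.5);

-- Combined table: the new usage column is part of the clustering key.
-- WITHOUT ROWID makes SQLite store rows in primary-key order, which is
-- the closest SQLite analogue to a SQL Server clustered index.
CREATE TABLE all_rows (
    item_id INTEGER NOT NULL,
    usage   TEXT    NOT NULL,
    amount  REAL,
    PRIMARY KEY (item_id, usage)
) WITHOUT ROWID;

INSERT INTO all_rows SELECT item_id, 'budget',   budget_amount   FROM budget_rows;
INSERT INTO all_rows SELECT item_id, 'forecast', forecast_amount FROM forecast_rows;
""")

# One range scan on the key now returns every usage for the range.
combined = conn.execute(
    "SELECT item_id, usage, amount FROM all_rows "
    "WHERE item_id BETWEEN 1 AND 2 ORDER BY item_id, usage"
).fetchall()
```

Whether the new column should lead or trail the key is a separate decision. Trailing, as here, favours queries that want all usages for a key range; leading favours queries that touch a single usage.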


  • John Bell

    #2
    Re: Horizontal Partitioning question

    Hi

    You don't say whether they have been set up as a partitioned view, but your
    comment about business usage would tend to imply they haven't. If they
    haven't, then this is the change I would look at first, especially if
    the growth rate of the system indicates that federation will be necessary.

    If only a small percentage of queries access all the tables, then this may
    also indicate there is a performance benefit from the split. If the tables
    are on different filegroups and on different disc subsystems, then
    performance may have been a valid reason to split them up.

    Without being there when the decision to partition them was made, you will
    not know the underlying stats or reasons for this design, and I would bet
    they have not been documented!

    If you are going to combine them, then create a benchmark test so that you
    can compare each configuration, and test the two alternatives in a
    controlled environment. If you can't do that, then unless there is a
    specific reason to change what is already working (and performing well!),
    I wouldn't.
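A benchmark of the kind suggested here might be structured like the sketch below. It is a toy harness under stated assumptions: SQLite in memory instead of SQL Server, synthetic data, and hypothetical names (`part0` to `part3`, `combined`). Timings from it say nothing about the real system; the point is the shape of the test, which loads both layouts with identical data, checks they return the same rows, and then times each.

```python
import sqlite3
import time

def build(conn, n_rows=2000):
    """Create both layouts with identical data (all names hypothetical)."""
    conn.execute(
        "CREATE TABLE combined (id INTEGER, usage INTEGER, amount REAL, "
        "PRIMARY KEY (id, usage)) WITHOUT ROWID"
    )
    for u in range(4):
        conn.execute(f"CREATE TABLE part{u} (id INTEGER PRIMARY KEY, amount REAL)")
        rows = [(i, float(i + u)) for i in range(n_rows)]
        conn.executemany(f"INSERT INTO part{u} VALUES (?, ?)", rows)
        conn.executemany(
            "INSERT INTO combined VALUES (?, ?, ?)", [(i, u, a) for i, a in rows]
        )
    conn.commit()

def query_split(conn, lo, hi):
    """Same key range fetched from each of the four tables in turn."""
    out = []
    for u in range(4):
        out += conn.execute(
            f"SELECT id, {u}, amount FROM part{u} WHERE id BETWEEN ? AND ?",
            (lo, hi),
        ).fetchall()
    return out

def query_combined(conn, lo, hi):
    """Same key range fetched from the single combined table."""
    return conn.execute(
        "SELECT id, usage, amount FROM combined WHERE id BETWEEN ? AND ?",
        (lo, hi),
    ).fetchall()

def time_it(fn, *args, repeats=20):
    """Average wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

conn = sqlite3.connect(":memory:")
build(conn)
split_rows = query_split(conn, 100, 199)
combined_rows = query_combined(conn, 100, 199)
timings = {
    "split": time_it(query_split, conn, 100, 199),
    "combined": time_it(query_combined, conn, 100, 199),
}
```

A real comparison would also need to replay the single-table queries and the insert/update workload, since those are where the combined layout could lose.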

    John

    • --CELKO--

      #3
      Re: Horizontal Partitioning question

      >> I recently came across a database where the data are horizontally
      partitioned into 4 tables. I'm not sure if this was a poor design
      choice, or if it was done for valid performance reasons. <<

      Without knowing any more than that, the smart money would bet on poor
      design ...

      >> The schema of the tables are essentially the same, it's just that
      they are named differently and the columns are named differently to
      differentiate the data from a business usage perspective. <<

      Here we MAY have a valid design reason. Is the data logically
      different in each case? Not just a status change (paid versus unpaid
      bills, etc.), but really different? If not, then this is a mess.

      >> The tables could easily be combined into one by adding a new column
      to the clustered index that would be used to differentiate the
      business usage. <<

      Bingo! No logical differences, no separate tables in the data model.

      >> I am trying to evaluate whether combining the tables would improve
      performance or if it would be better to leave them the way they are.
      <<

      Performance is a secondary issue. Correctness and removing redundant
      data element names is the first issue. Make it right, then make it
      fast.

      >> Many queries that run against these tables do not request records
      [sic] from more than one of the tables, which is good. However, there
      are a number of processes that query against all of the tables on the
      identical clustered index range. I am not sure exactly how many rows
      are in the tables but I'm fairly certain the entire database is < 50
      GB. <<

      Write some VIEWs on the data. Performance with a clustered index
      starting on the status column will be fine.
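One way to read this advice is sketched below: a single table whose key starts with the status/usage column, plus one view per business usage that keeps the old per-usage column names. SQLite stands in for SQL Server, and the names (`ledger`, `budget_items`, `forecast_items`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single table; the usage/status column leads the clustering key.
CREATE TABLE ledger (
    usage   TEXT    NOT NULL CHECK (usage IN ('budget', 'forecast')),
    item_id INTEGER NOT NULL,
    amount  REAL,
    PRIMARY KEY (usage, item_id)
) WITHOUT ROWID;

-- One view per business usage keeps the old per-usage column names.
CREATE VIEW budget_items AS
    SELECT item_id, amount AS budget_amount
    FROM ledger WHERE usage = 'budget';
CREATE VIEW forecast_items AS
    SELECT item_id, amount AS forecast_amount
    FROM ledger WHERE usage = 'forecast';

INSERT INTO ledger VALUES
    ('budget', 1, 10.0), ('forecast', 1, 11.0), ('budget', 2, 20.0);
""")

budget = conn.execute(
    "SELECT item_id, budget_amount FROM budget_items ORDER BY item_id"
).fetchall()
forecast = conn.execute(
    "SELECT item_id, forecast_amount FROM forecast_items"
).fetchall()
```

Existing queries that read one business usage could then be pointed at the matching view, while cross-usage processes read `ledger` directly.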


      • MissLivvy

        #4
        Re: Horizontal Partitioning question

        > You don't say if they have been set up as a partitioned view, but your
        > comment about business usage would tend to imply they haven't?

        Correct. There is no partitioned view. I don't think the current design
        lends itself to that, since there is currently no column that could be used
        for the check constraint. Data with the same primary key are spread across
        all of the tables, and data with the same PK are logically related from a
        business perspective. To create a check constraint, I think we'd have to add
        another column like the one I mentioned in my original post.
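The extra column and check constraint mentioned here could look something like the sketch below. SQLite is used only so the example runs; unlike a SQL Server partitioned view, a SQLite `UNION ALL` view does not use the constraints to skip member tables. The names `member_a`, `member_b`, `all_members` and `usage` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each member table gets the new column, pinned to a single value by a
-- CHECK constraint and included in the primary key.
CREATE TABLE member_a (
    usage  INTEGER NOT NULL DEFAULT 1 CHECK (usage = 1),
    id     INTEGER NOT NULL,
    amount REAL,
    PRIMARY KEY (usage, id)
);
CREATE TABLE member_b (
    usage  INTEGER NOT NULL DEFAULT 2 CHECK (usage = 2),
    id     INTEGER NOT NULL,
    amount REAL,
    PRIMARY KEY (usage, id)
);

-- The view presents the member tables as one rowset.
CREATE VIEW all_members AS
    SELECT usage, id, amount FROM member_a
    UNION ALL
    SELECT usage, id, amount FROM member_b;

-- The same id may exist in both tables, as described in the post.
INSERT INTO member_a (id, amount) VALUES (7, 1.5);
INSERT INTO member_b (id, amount) VALUES (7, 2.5);
""")

rows = conn.execute(
    "SELECT usage, id, amount FROM all_members WHERE id = 7 ORDER BY usage"
).fetchall()

# The CHECK constraint rejects a row filed under the wrong usage.
try:
    conn.execute("INSERT INTO member_a (usage, id, amount) VALUES (2, 8, 0.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```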
        > specific reason to change what is already working (and performing well!) then

        Performance is definitely a problem, though, for operations that need to query
        all of the tables at the same time. For example, one thing that
        users routinely need to do is copy a large range of rows from all of the
        tables and insert them back into the same tables (with a new PK, of course).
        I will try to find out whether different filegroups were used for the different
        tables, but I'm guessing this is not the case.
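The copy operation described here can be sketched for both layouts. Everything in the example is an assumption made for illustration: the key is modelled as a hypothetical `version_id` plus `line_no`, only two of the four tables are shown, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Split layout: two of the four tables (names are hypothetical).
CREATE TABLE plan_a (version_id INTEGER, line_no INTEGER, amount REAL,
                     PRIMARY KEY (version_id, line_no));
CREATE TABLE plan_b (version_id INTEGER, line_no INTEGER, amount REAL,
                     PRIMARY KEY (version_id, line_no));
-- Combined layout with the extra usage column in the key.
CREATE TABLE plan_all (version_id INTEGER, usage TEXT, line_no INTEGER,
                       amount REAL,
                       PRIMARY KEY (version_id, usage, line_no));

INSERT INTO plan_a VALUES (1, 1, 10.0), (1, 2, 20.0);
INSERT INTO plan_b VALUES (1, 1, 30.0);
INSERT INTO plan_all VALUES
    (1, 'a', 1, 10.0), (1, 'a', 2, 20.0), (1, 'b', 1, 30.0);
""")

def copy_split(conn, src, dst):
    """Copy one version in the split layout: one statement per table."""
    for table in ("plan_a", "plan_b"):
        conn.execute(
            f"INSERT INTO {table} (version_id, line_no, amount) "
            f"SELECT ?, line_no, amount FROM {table} WHERE version_id = ?",
            (dst, src),
        )

def copy_combined(conn, src, dst):
    """Copy one version in the combined layout: a single statement."""
    conn.execute(
        "INSERT INTO plan_all (version_id, usage, line_no, amount) "
        "SELECT ?, usage, line_no, amount FROM plan_all WHERE version_id = ?",
        (dst, src),
    )

copy_split(conn, 1, 2)
copy_combined(conn, 1, 2)

split_count = sum(
    conn.execute(f"SELECT COUNT(*) FROM {t} WHERE version_id = 2").fetchone()[0]
    for t in ("plan_a", "plan_b")
)
combined_count = conn.execute(
    "SELECT COUNT(*) FROM plan_all WHERE version_id = 2"
).fetchone()[0]
```

Both layouts end up with the same copied rows. The difference is that the split layout repeats the range scan and insert once per table.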

        In my case, since sometimes we need to access all of the tables at once and
        sometimes not, what I need to do is measure the tradeoff between the improved
        performance when only one of the tables needs to be accessed and the
        penalty paid when all of the tables need to be accessed. My gut feeling is that
        the extra time spent traversing the B-tree in the combined table should be
        less significant than the penalty paid for having the data split up when we
        need to access all tables at the same time. But again, I really need to
        measure this.

        Thanks.

        • Vincent Lascaux

          #5
          Re: Horizontal Partitioning question

          > Bingo! No logical differences, no separate tables in the data model.

          Hmm, may I describe a problem I had? I was in charge of redesigning a
          database. This database contained a table called Directories that held
          the absolute paths of some folders frequently used in other tables. There was
          a need to differentiate three kinds of folders: input, output and binary
          folders. The goal was to use nicknames of the folders in other tables. So I
          had this schema:

          Directories
          nick_name varchar(20)
          type byte //0: input, 1: output, 2:binary
          path varchar(1000)
          primary key(nick_name, type)

          Jobs
          input_folder
          output_folder
          binary_folder


          I was told that this was not a good design because I was not able to
          link the Jobs table to the Directories one (the join would require a
          constant: for example, input_folder is the nick_name and the type is 0).
          The way to solve the problem was to create 3 different tables,
          InputDirectories, OutputDirectories and BinaryDirectories, and to link the
          Jobs table to those 3 tables.
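To make the objection concrete, here is a runnable sketch of the single-table schema and the join that needs a constant per folder type. SQLite stands in for the original database, `byte` is mapped to `INTEGER`, and the `job_id` column and sample paths are hypothetical additions. Because Jobs carries only the nickname and no type column, a foreign key to the (nick_name, type) primary key cannot be declared, which is the "cannot link" complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Directories (
    nick_name VARCHAR(20)   NOT NULL,
    type      INTEGER       NOT NULL
              CHECK (type IN (0, 1, 2)),  -- 0: input, 1: output, 2: binary
    path      VARCHAR(1000) NOT NULL,
    PRIMARY KEY (nick_name, type)
);
CREATE TABLE Jobs (
    job_id        INTEGER PRIMARY KEY,   -- hypothetical key, not in the post
    input_folder  VARCHAR(20) NOT NULL,
    output_folder VARCHAR(20) NOT NULL,
    binary_folder VARCHAR(20) NOT NULL
);
-- The same nickname may be reused across types.
INSERT INTO Directories VALUES
    ('main', 0, '/in/main'), ('main', 1, '/out/main'), ('tools', 2, '/bin/tools');
INSERT INTO Jobs VALUES (1, 'main', 'main', 'tools');
""")

# Each join to Directories has to supply the type as a constant.
paths = conn.execute("""
    SELECT i.path, o.path, b.path
    FROM Jobs AS j
    JOIN Directories AS i ON i.nick_name = j.input_folder  AND i.type = 0
    JOIN Directories AS o ON o.nick_name = j.output_folder AND o.type = 1
    JOIN Directories AS b ON b.nick_name = j.binary_folder AND b.type = 2
    WHERE j.job_id = 1
""").fetchone()
```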

          What is the best design?

          --
          Vincent



          • Erland Sommarskog

            #6
            Re: Horizontal Partitioning question

            MissLivvy (XeveryidiwantistakenX@yahoo.com) writes:
            > In my case, since sometimes we need to access all of the tables at once,
            > and sometimes not, what I need to do is measure the tradeoff between
            > improved performance in situations where only 1 of the tables needs to be
            > accessed, vs the penalty paid when all tables need to be accessed. My gut
            > feeling is that increase in time spent traversing the B-tree in the
            > combined table should be less significant than the penalty paid for
            > having the data split up when we need to access all tables at the same
            > time. But again, I really need to measure this.

            One option would be to retain the tables and then build an indexed view
            that combines them. Of course, this will double the disk space and also
            come with a cost for updates. But if the main activity is querying, this
            could be the best of both worlds.

            Note: to be able to fully use indexed views, you need Enterprise Edition.

            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

            Books Online for SQL Server SP3 at
            http://www.microsoft.com/sql/techinf...2000/books.asp


            • MissLivvy

              #7
              Re: Horizontal Partitioning question

              Thanks, Erland.
              Yes, there is a lot of inserting and updating going on with these tables, so
              I think we'd be paying too high a price for the querying benefit of the
              indexed view.

              • MissLivvy

                #8
                Re: Horizontal Partitioning question

                What about:

                Directories
                nick_name varchar(20)
                type byte //0: input, 1: output, 2:binary
                path varchar(1000)
                primary key(nick_name, type)

                Job
                (JobID int primary key,
                JobName varchar(20)
                )

                Job_Directory
                (JobID int,
                nickname varchar(20),
                type (byte)
                )
                with PK on JobID + nickname + type
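The Job / Job_Directory proposal from post #8 can be written out as runnable DDL roughly like this. It is a sketch only: SQLite replaces the original database, `byte` becomes `INTEGER`, the junction column is spelled `nick_name` to match Directories, and the foreign keys and sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Directories (
    nick_name VARCHAR(20)   NOT NULL,
    type      INTEGER       NOT NULL
              CHECK (type IN (0, 1, 2)),  -- 0: input, 1: output, 2: binary
    path      VARCHAR(1000) NOT NULL,
    PRIMARY KEY (nick_name, type)
);
CREATE TABLE Job (
    JobID   INTEGER PRIMARY KEY,
    JobName VARCHAR(20)
);
-- Junction table: carrying both nick_name and type lets it reference
-- the full Directories key, with no constant needed in the link.
CREATE TABLE Job_Directory (
    JobID     INTEGER     NOT NULL REFERENCES Job (JobID),
    nick_name VARCHAR(20) NOT NULL,
    type      INTEGER     NOT NULL,
    PRIMARY KEY (JobID, nick_name, type),
    FOREIGN KEY (nick_name, type) REFERENCES Directories (nick_name, type)
);
INSERT INTO Directories VALUES
    ('main', 0, '/in/main'), ('main', 1, '/out/main'), ('tools', 2, '/bin/tools');
INSERT INTO Job VALUES (1, 'nightly');
INSERT INTO Job_Directory VALUES (1, 'main', 0), (1, 'main', 1), (1, 'tools', 2);
""")

# A link to a directory that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Job_Directory VALUES (1, 'missing', 0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

linked = conn.execute(
    "SELECT COUNT(*) FROM Job_Directory WHERE JobID = 1"
).fetchone()[0]
```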

                • Dan Gidman

                  #9
                  Re: Horizontal Partitioning question

                  > For example, one thing that users routinely need to do is copy a
                  > large range of rows from all of the tables and insert them back into
                  > the same tables (with a new PK, of course).

                  This seems to me like a lot of redundant data will get created
                  needlessly. It is probably why the db is +50 gig in size. It is also a good
                  indication of poor design. Is this data historic or frequently
                  updated? If it is historic and is not changed (like a POS sales
                  record), why copy the data around so much?


                  • Vincent Lascaux

                    #10
                    Re: Horizontal Partitioning question

                    > Directories
                    > nick_name varchar(20)
                    > type byte //0: input, 1: output, 2:binary
                    > path varchar(1000)
                    > primary key(nick_name, type)
                    >
                    > Job
                    > (JobID int primary key,
                    > JobName varchar(20)
                    > )
                    >
                    > Job_Directory
                    > (JobID int,
                    > nickname varchar(20),
                    > type (byte)
                    > )
                    > with PK on JobID + nickname + type

                    Considering that every job has exactly one path of each type, you have
                    a 1-3 relationship. I don't know if that is better than 1-1, which I have
                    heard is bad :)
                    And it makes the SQL queries more complex to write (for no added value).
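The extra query complexity mentioned here can be seen in a sketch of how one row per job would be read back from the junction design. The example uses SQLite and hypothetical sample data. The `UNIQUE (JobID, type)` constraint is an assumption that is not part of the proposed schema; it caps a job at one directory per type, but it cannot guarantee that all three are present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Directories (
    nick_name TEXT NOT NULL, type INTEGER NOT NULL, path TEXT NOT NULL,
    PRIMARY KEY (nick_name, type)
);
CREATE TABLE Job_Directory (
    JobID INTEGER NOT NULL, nick_name TEXT NOT NULL, type INTEGER NOT NULL,
    PRIMARY KEY (JobID, nick_name, type),
    UNIQUE (JobID, type),  -- assumption: at most one directory per type per job
    FOREIGN KEY (nick_name, type) REFERENCES Directories (nick_name, type)
);
INSERT INTO Directories VALUES
    ('main', 0, '/in/main'), ('main', 1, '/out/main'), ('tools', 2, '/bin/tools');
INSERT INTO Job_Directory VALUES (1, 'main', 0), (1, 'main', 1), (1, 'tools', 2);
""")

# Pivot the three junction rows back into one row per job.
row = conn.execute("""
    SELECT jd.JobID,
           MAX(CASE WHEN jd.type = 0 THEN d.path END) AS input_path,
           MAX(CASE WHEN jd.type = 1 THEN d.path END) AS output_path,
           MAX(CASE WHEN jd.type = 2 THEN d.path END) AS binary_path
    FROM Job_Directory AS jd
    JOIN Directories AS d
      ON d.nick_name = jd.nick_name AND d.type = jd.type
    GROUP BY jd.JobID
""").fetchone()
```

The pivot needs only one join, at the price of a `GROUP BY` and a `CASE` expression per directory type.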

                    --
                    Vincent



                    • MissLivvy

                      #11
                      Re: Horizontal Partitioning question

                      It's a financial forecasting application and the data are heavily
                      manipulated by the users after copying from another version of the forecast.
                      Copying is just an easier way for them to get started vs. starting over
                      completely from scratch. They also run variance reports to compare different
                      versions of the forecast. To reduce the size of the database, I think an
                      archiving strategy would be appropriate.

                      • MissLivvy

                        #12
                        Re: Horizontal Partitioning question

                        Maybe I misunderstood the problem. The way I understood it:

                        1] a directory can be one of 3 types: input, output, or binary.
                        2] A job has up to 3 directories: input, output and binary.
                        3] A directory can be shared by more than one job.

                        Is that correct?

                        • Vincent Lascaux

                          #13
                          Re: Horizontal Partitioning question

                          > 1] a directory can be one of 3 types: input, output, or binary.

                          True.
                          And the same nickname can be used for different types.

                          > 2] A job has up to 3 directories: input, output and binary.

                          Half true: a job has exactly 3 directories: one input, one output and one
                          binary directory.

                          > 3] A directory can be shared by more than one job.

                          True.

                          --
                          Vincent



                          • MissLivvy

                            #14
                            Re: Horizontal Partitioning question

                            Then I like my design better. You may find it a pain to have to join to an
                            extra table, but with your design you need 3 joins to get the 3
                            directories related to a job. Also, if you ever have a 4th directory related
                            to a job, you have to add a new column.

                            If you have no other attributes to add to the Job table, then you could get
                            rid of the Job table and just do:

                            Directory (
                            nick_name varchar(20)
                            type byte //0: input, 1: output, 2:binary
                            path varchar(1000)
                            )
                            (with primary key(nick_name, type))

                            Job_Directory (
                            JobName nvarchar(20),
                            nick_name varchar(20),
                            type byte
                            )
                            (with primary key(JobName, nick_name, type)
                            and fk to Directory on (nick_name, type))
