Best way to model hybrid single/multiple logical database

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Steve Jorgensen

    Best way to model hybrid single/multiple logical database

    I've run across this issue several times of late, and I've never come up with
    a satisfactory answer to the best way to handle this schema issue. You have a
    large section of schema in which a subset of records across all tables is
    often considered a separate logical system, but sometimes may be treaded ar
    part of the global system, and there is not simply a 1-m-m... tree among the
    records in a logical database.

    Here's an example. A system that was originally designed to handle only data
    for a single company is now handling data for multiple companies. So, the
    simplest concept to implement would be to simply add CompanyID to all the
    tables, and include that in all the relationships between the tables.
    Somehow, that's not very satisfying.

    One obvious problem with this is that among the tables with the largest number
    of rows, adding a column is a big cost in size. These are mostly subdetail
    records, so one would think it would work to simply rely on the parent
    record's CompanyID, but wait - The detail table references lookup tables that
    are also company-specific. If the CompanyID is omitted from the table, there
    is no way to have the relationship enforce that all the related records belong
    to the same company. One can rely on the application to do that, but this
    leaves some danger of integrity problems when manual table maintenance is
    performed.

    Has anyone come up with a better strategy for handling these kinds of designs?
  • Mike MacSween

    #2
    Re: Best way to model hybrid single/multiple logical database

    "Steve Jorgensen" <nospam@nospam. nospam> wrote in message
    news:1hqcvvoch5 ijidn0thstv0af1 kjstdh20b@4ax.c om...
    [color=blue]
    > Here's an example. A system that was originally designed to handle only[/color]
    data[color=blue]
    > for a single company is now handling data for multiple companies.[/color]

    How can you tell? What is it about a particular datum which allows you to
    think - 'ah, this is a fact about the Acme Widget Company'. If you can tell
    already, perhaps that's your identifier, if you can't then it might not be
    doing what it appears to be doing.

    Maybe some of the actual tables and sample data would help? We can't really
    tell what sort of application it is.
    [color=blue]
    > So, the
    > simplest concept to implement would be to simply add CompanyID to all the
    > tables, and include that in all the relationships between the tables.
    > Somehow, that's not very satisfying.[/color]

    And presumably some junction tables too? Might be your only way.
    [color=blue]
    > Has anyone come up with a better strategy for handling these kinds of[/color]
    designs?

    Er, start again? Seems to me an app that is designed to handle data for one
    organisation that then starts to handle data for >1 organisation - well,
    that's a fundamental change isn't it?

    Mike


    Comment

    • Steve Jorgensen

      #3
      Re: Best way to model hybrid single/multiple logical database

      On Sat, 3 Jan 2004 07:45:49 -0000, "Mike MacSween"
      <mike.macsween. nospam@btintern et.com> wrote:
      [color=blue]
      >"Steve Jorgensen" <nospam@nospam. nospam> wrote in message
      >news:1hqcvvoch 5ijidn0thstv0af 1kjstdh20b@4ax. com...
      >[color=green]
      >> Here's an example. A system that was originally designed to handle only[/color]
      >data[color=green]
      >> for a single company is now handling data for multiple companies.[/color]
      >
      >How can you tell? What is it about a particular datum which allows you to
      >think - 'ah, this is a fact about the Acme Widget Company'. If you can tell
      >already, perhaps that's your identifier, if you can't then it might not be
      >doing what it appears to be doing.
      >
      >Maybe some of the actual tables and sample data would help? We can't really
      >tell what sort of application it is.
      >[color=green]
      >> So, the
      >> simplest concept to implement would be to simply add CompanyID to all the
      >> tables, and include that in all the relationships between the tables.
      >> Somehow, that's not very satisfying.[/color]
      >
      >And presumably some junction tables too? Might be your only way.
      >[color=green]
      >> Has anyone come up with a better strategy for handling these kinds of[/color]
      >designs?
      >
      >Er, start again? Seems to me an app that is designed to handle data for one
      >organisation that then starts to handle data for >1 organisation - well,
      >that's a fundamental change isn't it?
      >
      >Mike
      >[/color]

      The same issue comes up whether it is a design change or an initial design
      requirement. It's just easier to describe the issue in terms of a design
      change. In either case, there is a large section of schema in which all the
      data is normally partitioned by some owning entity such as a Company.

      Comment

      • Mike MacSween

        #4
        Re: Best way to model hybrid single/multiple logical database

        "Steve Jorgensen" <nospam@nospam. nospam> wrote in message
        news:q06dvv48av r5fckv1rp59djsb hvrbj2rtl@4ax.c om...
        [color=blue]
        > The same issue comes up whether it is a design change or an initial design
        > requirement.[/color]

        Sure, what's the design then?
        [color=blue]
        > It's just easier to describe the issue in terms of a design
        > change.[/color]

        Describe it then.
        [color=blue]
        > In either case, there is a large section of schema in which all the
        > data is normally partitioned by some owning entity such as a Company.[/color]

        By 'a large section of schema' do you mean that certain tables are to do
        with a Company? Or that some rows from some tables are to do with a Company?

        Maybe you could tell us some more. Like some of the tables, what data they
        contain, and the relationships between them.

        Cheers, Mike MacSween


        Comment

        • Steve Jorgensen

          #5
          Re: Best way to model hybrid single/multiple logical database

          On Sat, 03 Jan 2004 07:11:27 GMT, Steve Jorgensen <nospam@nospam. nospam>
          wrote:
          [color=blue]
          >I've run across this issue several times of late, and I've never come up with
          >a satisfactory answer to the best way to handle this schema issue. You have a
          >large section of schema in which a subset of records across all tables is
          >often considered a separate logical system, but sometimes may be treaded ar
          >part of the global system, and there is not simply a 1-m-m... tree among the
          >records in a logical database.
          >
          >Here's an example. A system that was originally designed to handle only data
          >for a single company is now handling data for multiple companies. So, the
          >simplest concept to implement would be to simply add CompanyID to all the
          >tables, and include that in all the relationships between the tables.
          >Somehow, that's not very satisfying.
          >
          >One obvious problem with this is that among the tables with the largest number
          >of rows, adding a column is a big cost in size. These are mostly subdetail
          >records, so one would think it would work to simply rely on the parent
          >record's CompanyID, but wait - The detail table references lookup tables that
          >are also company-specific. If the CompanyID is omitted from the table, there
          >is no way to have the relationship enforce that all the related records belong
          >to the same company. One can rely on the application to do that, but this
          >leaves some danger of integrity problems when manual table maintenance is
          >performed.
          >
          >Has anyone come up with a better strategy for handling these kinds of designs?[/color]

          I see that perhaps, my explanation was not clear enough. In one of the
          applications this came up on, almost every table in the system needed to be
          partitioned by company, spanning just about every kind of table role you can
          imagine. Company-specific employees, certifications, exams, exam types,
          employee exam date/score data, test questions, etc. ad infinitum. In another
          case, relative to the whole system, the company-specific portion of the schema
          was a small fraction of the overall schema, but still large in terms of data
          and spans all types of tables including lookups, details, subdetails, M-M
          junciton tables, etc.

          Essentially, the only bullet-proof schema I can come up with for these cases
          is to go to 5N form for nearly -all- the impacted tables.

          Comment

          • Steve Jorgensen

            #6
            Re: Best way to model hybrid single/multiple logical database

            On Sat, 03 Jan 2004 17:40:07 GMT, Steve Jorgensen <nospam@nospam. nospam>
            wrote:

            ....[color=blue]
            >
            >Essentially, the only bullet-proof schema I can come up with for these cases
            >is to go to 5N form for nearly -all- the impacted tables.[/color]

            OK, in the shower I just realized that was just plain wrong. None of the
            tables go to 5N form. The right thing to say would be simply that pretty much
            all involved tables seem to need CompanyID added to allow for enforcing that
            all related records must belong to the same company.

            Comment

            • David W. Fenton

              #7
              Re: Best way to model hybrid single/multiple logical database

              nospam@nospam.n ospam (Steve Jorgensen) wrote in
              <q06dvv48avr5fc kv1rp59djsbhvrb j2rtl@4ax.com>:
              [color=blue]
              >The same issue comes up whether it is a design change or an
              >initial design requirement. It's just easier to describe the
              >issue in terms of a design change. In either case, there is a
              >large section of schema in which all the data is normally
              >partitioned by some owning entity such as a Company.[/color]

              I have a simpler example of the problem:

              Parking Garage Customer Management.

              Basic entities: garages, customers, cars

              The customers park in only one garage at a time (business rule or
              entity attribute?).

              The cars table lists the cars for each customer, but have
              attributes that are restricted by which garage the car is parked
              in.

              Is the Garage an attribute of the car or an attribute of the
              Customer?

              My actual data model (this app was built in 1997, back when I
              wasn't very good at this) puts the GarageID in the Customer record
              *and* in the Car record. However, the Car table's data fields are
              not using RI to restrict what can be entered (e.g., Cars are
              assigned numbers specific to a garage, no upper limit but you can't
              have two cars assigned the same number) -- it could be done with a
              unique index on GarageID and CarNumber, but I actually never
              implemented that! I'm not sure why, except that the data model was
              clearly not very carefully delineated in my mind, as I still don't
              know why GarageID is in both the Customer and Car tables.

              Well, yes, I *do* know why -- because that's how I was able to get
              it to do what I needed to do. But there's nothing except UI objects
              (a combo box with an outer join rowsource based on a table of
              possible car numbers) that enforces the business rule.

              Now, looking at this schema again (having reworked the app twice in
              the last 18 months for new customers), I think that the GarageID in
              the customer record doesn't belong there at all. There is no
              necessity that one customer would park a car in only one garage.
              While this particular organization has all their parking garages in
              the city of New York (3 different boroughs), which makes it
              unlikely that someone will rent spaces in more than one garage,
              it's not at all inconceivable. Indeed, someone might rent a space
              in a garage in Brooklyn for home parking and a space in Manhattan
              for work parking. The current application would require that the
              customer be entered twice, once with each GarageID. That's bad.

              So, in this example, Garage is really an attribute of the Car, and
              that's pretty clear.

              In the case of invoices generated for different companies, it's
              more complicated, and not as immediately obvious what is better. No
              invoice should have mix items from different companies, as
              companies don't (presumably) crossbill for each others' products.
              So, it would seem that CompanyID is an attribute of the invoice.
              But, as you ask, how to get RI enforcement for the specifics?

              I had thought that there would be a messy triangular way to model
              this, but I think there is.

              I think the flaw is in assuming that tblInvoice.Comp anyID and
              tblInvoiceDetai l.CompanyID actually represent the same data. They
              really don't -- even though they point to the same entity in the
              relationship model, the CompanyID that restricts which products can
              be attached to a particular InvoiceDetail record is not really
              serving the same function as the CompanyID in the invoice header.

              Think of it this way:

              CompanyID in the Invoice header is really BillingEntityID .

              CompanyID in the InvoiceDetail is really SupplierID.

              Now, in your model, the crucial rule is this:

              - No invoice may include products from any supplier but the one
              that's creating the invoice.

              But that's a rule of your particular application, and is a business
              rule that might very well change. Perhaps you *don't* want to build
              it into your schema at all because it's subject to change. Now that
              the companies have merged, maybe they'll start cross-stocking
              inventory from each other and shipping/billing each others'
              products. A schema that has your business rule built into it will
              need to be changed, while a schema that has no necessity of matchup
              between BillingEntityID in tblInvoice and SupplierID in
              tblInvoiceDetai l will not.

              The upshot of that observation (that the crucial problem is not
              really a part of the data relationship, but a business rule) is
              that it becomes less imperative to enforce this rule in the data
              schema itself.

              Assuming you're using your usual tools (Access + SQL Server) it
              sounds to me like you'd be best to use a normal data schema
              (BillingEntityI D in the invoice header, SupplierID in the products
              table) and have a stored procedure that returns only valid products
              for the particular invoice.

              In a sense, you've got a 3-tiered structure there, *if* you
              consider the data schema to be a base tier and views and stored
              procedures to be a middle tier between your schema and your
              application. While that's not at all the way the terminology is
              usually used, it does indicate the degree to which there's a huge
              difference between putting a business rule in the database schema
              and putting it in views and stored procedures. While the business
              rules are still locked up in your particular back end database
              engine (the cardinal violation of two-tier apps that three-tier
              apps are intended to address), they are not implemented in the same
              degree of fixity. Stored procedures and views are much more
              susceptible to alteration as business rules change, much moreso
              than the data schema itself.

              If through security you restrict all manipulation of the data
              tables to the supplied stored procedures and views, you've
              accomplished your task, to prohibit the creation of invalid data in
              your database.

              --
              David W. Fenton http://www.bway.net/~dfenton
              dfenton at bway dot net http://www.bway.net/~dfassoc

              Comment

              • David W. Fenton

                #8
                Re: Best way to model hybrid single/multiple logical database

                nospam@nospam.n ospam (Steve Jorgensen) wrote in
                <lu0evvg7pelsb6 qpdnvlcoobsgh7g aq5or@4ax.com>:
                [color=blue]
                >On Sat, 03 Jan 2004 17:40:07 GMT, Steve Jorgensen
                ><nospam@nospam .nospam> wrote:
                >
                >...[color=green]
                >>
                >>Essentially , the only bullet-proof schema I can come up with for
                >>these cases is to go to 5N form for nearly -all- the impacted
                >>tables.[/color]
                >
                >OK, in the shower I just realized that was just plain wrong. None
                >of the tables go to 5N form. The right thing to say would be
                >simply that pretty much all involved tables seem to need CompanyID
                >added to allow for enforcing that all related records must belong
                >to the same company.[/color]

                See my post where I suggest treating stored procedures and views as
                your "middle tier" for enforcing what seem to me to be business
                rules, not aspects of the schema. If you secure the base tables
                from manipulation through anything other than the SPs and views
                that encapsulate your business rules, you've fixed your problem
                without needing to alter the data schema.

                You've also allowed for the possibility that the distinction that
                is today crucial may one day vanish entirely.

                --
                David W. Fenton http://www.bway.net/~dfenton
                dfenton at bway dot net http://www.bway.net/~dfassoc

                Comment

                • Tony Toews

                  #9
                  Re: Best way to model hybrid single/multiple logical database

                  Steve Jorgensen <nospam@nospam. nospam> wrote:
                  [color=blue]
                  >OK, in the shower I just realized that was just plain wrong. None of the
                  >tables go to 5N form. The right thing to say would be simply that pretty much
                  >all involved tables seem to need CompanyID added to allow for enforcing that
                  >all related records must belong to the same company.[/color]

                  If you setup your queries to get enough tables to pickup your jobid from a parent you
                  don't need to worry about duplicating the companyid all over the place.

                  If you do decide to do this you have to add code to all kinds of forms on the
                  BeforeInsert event. Now this approach may very well work.

                  We just decided for no good reason to use queries for this and we kept on going.
                  But then the system kept on getting more and more complex as the client kept
                  requesting functionality too. <smile>

                  Tony
                  --
                  Tony Toews, Microsoft Access MVP
                  Please respond only in the newsgroups so that others can
                  read the entire thread of messages.
                  Microsoft Access Links, Hints, Tips & Accounting Systems at

                  Comment

                  • Tony Toews

                    #10
                    Re: Best way to model hybrid single/multiple logical database

                    Steve Jorgensen <nospam@nospam. nospam> wrote:
                    [color=blue]
                    >I see that perhaps, my explanation was not clear enough.[/color]

                    Your explanation made a lot of sense to me but then that's because I've been working
                    with such a system for the past few years. Except that this was by jobs. The shop
                    might have 2 or 50 jobs per year. The data with the exception of some basic master
                    tables, ie weld type, QC test type, is all completely seperate by job.
                    [color=blue]
                    >Essentially, the only bullet-proof schema I can come up with for these cases
                    >is to go to 5N form for nearly -all- the impacted tables.[/color]

                    No idea what 5N form means as I have no formal education in databases and even the
                    sentence or two describing 3N form doesn't make much sense. That said my systems
                    are definitely normalized. <smile>

                    The jobid (aka companyid in your case) is present in only the first table
                    "relational ly down" from the jobid table. So while the master item table has the
                    jobid the component table did not. My queries always did the relationship back to
                    the master item table to get the jobid for the form viewing and report selecting.
                    Sometimes I would have to join five or six tables to finally get to a table which had
                    the jobid.

                    Furthermore we had a form where you selected the job number. We also had active or
                    inactive combo box to make this easier for mousing folks. Once you selected the job
                    another large form, which was yellow and appeared very small, appeared in the upper
                    left hand corner. All the job specific fields were on this form so we used them
                    throughout the app. That is, where appropriate, combo box queries filtered the
                    records by jobid retrieved from that form. We put this right inside the combbox
                    recordsource.

                    I should also add that this is my biggest app in Access. I've occasionally
                    mentioned in the past. 140-150 tables, 1200 queries, 450 forms, 350 reports, 70K
                    lines of code last time I counted.

                    Tony
                    --
                    Tony Toews, Microsoft Access MVP
                    Please respond only in the newsgroups so that others can
                    read the entire thread of messages.
                    Microsoft Access Links, Hints, Tips & Accounting Systems at

                    Comment

                    • Mike MacSween

                      #11
                      Re: Best way to model hybrid single/multiple logical database

                      "David W. Fenton" <dXXXfenton@bwa y.net.invalid> wrote in message
                      [color=blue]
                      > Is the Garage an attribute of the car or an attribute of the
                      > Customer?[/color]

                      It's an entity.

                      Mike


                      Comment

                      • David W. Fenton

                        #12
                        Re: Best way to model hybrid single/multiple logical database

                        mike.macsween.n ospam@btinterne t.com (Mike MacSween) wrote in
                        <3ff73428$0$528 88$5a6aecb4@new s.aaisp.net.uk> :
                        [color=blue]
                        >"David W. Fenton" <dXXXfenton@bwa y.net.invalid> wrote in message
                        >[color=green]
                        >> Is the Garage an attribute of the car or an attribute of the
                        >> Customer?[/color]
                        >
                        >It's an entity.[/color]

                        That's not what I meant. Of course, it's an entity. But is its
                        relation to the car or to the customer?

                        --
                        David W. Fenton http://www.bway.net/~dfenton
                        dfenton at bway dot net http://www.bway.net/~dfassoc

                        Comment

                        • rkc

                          #13
                          Re: Best way to model hybrid single/multiple logical database


                          "David W. Fenton" <dXXXfenton@bwa y.net.invalid> wrote in message
                          news:9465CA42Fd fentonbwaynetin vali@24.168.128 .74...[color=blue]
                          > mike.macsween.n ospam@btinterne t.com (Mike MacSween) wrote in
                          > <3ff73428$0$528 88$5a6aecb4@new s.aaisp.net.uk> :
                          >[color=green]
                          > >"David W. Fenton" <dXXXfenton@bwa y.net.invalid> wrote in message
                          > >[color=darkred]
                          > >> Is the Garage an attribute of the car or an attribute of the
                          > >> Customer?[/color]
                          > >
                          > >It's an entity.[/color]
                          >
                          > That's not what I meant. Of course, it's an entity. But is its
                          > relation to the car or to the customer?[/color]

                          Would not Car, Customer and Garage be attributes of a Rental relation?
                          That is their indentifiers would be.



                          Comment

                          • Steve Jorgensen

                            #14
                            Re: Best way to model hybrid single/multiple logical database

                            On Sat, 03 Jan 2004 20:59:09 GMT, dXXXfenton@bway .net.invalid (David W.
                            Fenton) wrote:

                            ....[color=blue]
                            >
                            >In a sense, you've got a 3-tiered structure there, *if* you
                            >consider the data schema to be a base tier and views and stored
                            >procedures to be a middle tier between your schema and your
                            >application. While that's not at all the way the terminology is
                            >usually used, it does indicate the degree to which there's a huge
                            >difference between putting a business rule in the database schema
                            >and putting it in views and stored procedures. While the business
                            >rules are still locked up in your particular back end database
                            >engine (the cardinal violation of two-tier apps that three-tier
                            >apps are intended to address), they are not implemented in the same
                            >degree of fixity. Stored procedures and views are much more
                            >susceptible to alteration as business rules change, much moreso
                            >than the data schema itself.[/color]

                            What you're saying makes sense. It is also true, though, that any schema will
                            necessarily be a model with a finite flexibiltity and is, itself, really a
                            reflection of business rules, though they are only the most general rules
                            presumed to be fixed enough that the risk of change justifies the cost
                            involved in changing the schema.

                            In the case I've been dealing with here, there is less evidence that the
                            breaking of the partitioning is a likely thing to change which is why I didn't
                            consider thinking of the partitioning of related data sets as being a "mere"
                            business rule. I see, though, that it is somewhat reasonable if only because
                            there's a finite chance that it could change and because there are additional
                            performance and applciation complexity factors in favor of not enforcing the
                            rule via the schema. Still, not doing it in the schema might be considered a
                            denormalization .

                            Here's a 3rd recent case I thought of. I know you already get this, but I
                            like this example ... A hotel has rooms types, a hotel has rate periods, and a
                            hotel-room-type has rates that form a junction with room-types and rate
                            periods. The room rates table -can- get its hotel ID via either its room type
                            or its rate period, but if we decide the schema should enforce that a rate's
                            room and period must belong to the same hotel, then we need the hotel ID in
                            the rates table.

                            If I understand your point correctly, we should simply not assume that this
                            restriction is really immutable, and it could be that later, a hotel room type
                            or a rate period will always be an attribute of a single hotel. In this case,
                            it's counterproducti ve to enforce the rule in the relationships to the room
                            rates table. The counter-argument would be that the schema has to change
                            anyway at that point to allow room types or rate periods to be disentangled
                            from hotels, but it would still be less to change.
                            [color=blue]
                            >
                            >If through security you restrict all manipulation of the data
                            >tables to the supplied stored procedures and views, you've
                            >accomplished your task, to prohibit the creation of invalid data in
                            >your database.[/color]

                            Well, 1 of my 3 cases, the back-end is JET, so it's not so desirable to let
                            "stored procedures" handle all updates since that prohibits editing via
                            continuous forms. I guess another way around it is to write a data integrity
                            check process that can be run by the developer or administrator following any
                            manual table maintenance. Come to think of it, that's probably acceptable for
                            an Access app, so long as somthing is done to make that hard to forget to do
                            (still thionking what that would be).

                            Comment

                            • Mike MacSween

                              #15
                              Re: Best way to model hybrid single/multiple logical database

                              "David W. Fenton" <dXXXfenton@bwa y.net.invalid> wrote in message
                              [color=blue]
                              > Basic entities: garages, customers, cars
                              >
                              > The customers park in only one garage at a time (business rule or
                              > entity attribute?).[/color]

                              What do you mean, that a customer is 'allocated' to one garage at any
                              particular time?
                              [color=blue]
                              > The cars table lists the cars for each customer, but have
                              > attributes that are restricted by which garage the car is parked
                              > in.[/color]

                              So it's one to many, customers to cars? This 'attributes are restricted...'
                              thing I can't make sense of. How can the attributes of a car entity be
                              governed by where it is parked? Perhaps an example would help.
                              [color=blue]
                              > Is the Garage an attribute of the car or an attribute of the
                              > Customer?[/color]

                              Like Rick says, you probably need a 'Rental' entity.

                              Mike



                              Comment

                              Working...