SET vs Fully Normalised vs Both

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Christopher Key

    SET vs Fully Normalised vs Both

    Hi,

    I'm looking for some advice on database structure.

    Without unnecessary details, I have a database that deals, in part,
    with 'matches' each of which progresses through a series of statuses.
    I need to store when the match progressed to each of these statuses.

    At present, I have two tables, matches and matchstatuses:

    matches:
    matchid (PK)
    etc...

    matchstatus:
    matchstatusid (PK)
    matchid
    status (ENUM)
    datechanged
    etc...

    This works fine, except that it's somewhat messy to query, and I
    assume potentially slow, as I have to resort to using HAVING ... an
    awful lot.

    What I was considering is using a SET field, status, in the matches
    table, whose values coincided with the status ENUM. I would just
    update this at the same time as any inserts into matchstatuses. I'm
    using transactions, and the code is reasonably simple, so maintaining
    consistency isn't an issue. This would certainly simplify a lot of my
    queries, as most don't require the date information.

    My main concern is with style. This seems a somewhat messy solution
    to the problem, and I can well imagine such redundancy being frowned
    upon.

    I did also consider just adding a set of fields to the match table,
    each nullable, and corresponding to a given status. This however
    seemed wrong as:

    1) A lot of the matches will not go through a lot of the statuses,
    leading to wasted storage of 4 bytes per status per record.

    2) I'd rather update the ENUM / SET definitions than add additional
    fields if new statuses need to be added.

    3) ENUMs and SETs are sort of normalised, whereas having a field per
    status clearly isn't.



    I'd be very grateful for any thoughts on whether using an ENUM and SET
    in the above mentioned way is considered very bad form, as well as
    whether using a field per status is better or worse. I'd also be very
    grateful of any suggestions as to an better approach to this.

    Many thanks,

    Chris Key
  • Bill Karwin

    #2
    Re: SET vs Fully Normalised vs Both

    Christopher Key wrote:[color=blue]
    > My main concern is with style. This seems a somewhat messy solution
    > to the problem, and I can well imagine such redundancy being frowned
    > upon.[/color]

    I try to avoid redundancy because there's a risk that the data will
    become inconsistent. There's the academic issue of style or relational
    normalization, but the more practical risk that is more likely to bite
    you is data integrity.
    [color=blue]
    > I did also consider just adding a set of fields to the match table,
    > each nullable, and corresponding to a given status. This however
    > seemed wrong as:
    >
    > 1) A lot of the matches will not go through a lot of the statuses,
    > leading to wasted storage of 4 bytes per status per record.
    >
    > 2) I'd rather update the ENUM / SET definitions than add additional
    > fields if new statuses need to be added.
    >
    > 3) ENUMs and SETs are sort of normalised, whereas having a field per
    > status clearly isn't.[/color]

    Actually, I believe they're equivalent with respect to breaking
    normalization. But it's only a fine point; normalization often has to
    be broken to achieve some tradeoff with query simplicity or efficiency.

    Here's another possible solution: you could keep the two-table design,
    and create a "virtual" column-per-status-value, by doing an outer join
    to the matchstatus table for each status value:

    select distinct m.*, ms1.status, ms2.status, ms3.status
    from matches m
    left outer join matchstatus ms1 on m.matchid = ms1.matchid and
    ms1.status = 'STATUS1'
    left outer join matchstatus ms2 on m.matchid = ms2.matchid and
    ms2.status = 'STATUS2'
    left outer join matchstatus ms3 on m.matchid = ms3.matchid and
    ms3.status = 'STATUS3'

    That way you don't need to add values to the SET or add fields to the
    matches table if you add values. But you would need to add another
    joined instances of the matchstatus table to your query.

    Regards,
    Bill K.

    Comment

    • Christopher Key

      #3
      Re: SET vs Fully Normalised vs Both

      > > My main concern is with style. This seems a somewhat messy solution[color=blue][color=green]
      > > to the problem, and I can well imagine such redundancy being frowned
      > > upon.[/color]
      >
      > I try to avoid redundancy because there's a risk that the data will
      > become inconsistent. There's the academic issue of style or relational
      > normalization, but the more practical risk that is more likely to bite
      > you is data integrity.[/color]

      Thanks, will certainly keep this in mind. Fortunately, there are very
      few different sections of code that perform any updating, and I will
      be using transactions, but still something to avoid if possible.
      [color=blue][color=green]
      > > I did also consider just adding a set of fields to the match table,
      > > each nullable, and corresponding to a given status. This however
      > > seemed wrong as:
      > >
      > > 1) A lot of the matches will not go through a lot of the statuses,
      > > leading to wasted storage of 4 bytes per status per record.
      > >
      > > 2) I'd rather update the ENUM / SET definitions than add additional
      > > fields if new statuses need to be added.
      > >
      > > 3) ENUMs and SETs are sort of normalised, whereas having a field per
      > > status clearly isn't.[/color]
      >
      > Actually, I believe they're equivalent with respect to breaking
      > normalization. But it's only a fine point; normalization often has to
      > be broken to achieve some tradeoff with query simplicity or efficiency.
      >
      > Here's another possible solution: you could keep the two-table design,
      > and create a "virtual" column-per-status-value, by doing an outer join
      > to the matchstatus table for each status value:
      >
      > select distinct m.*, ms1.status, ms2.status, ms3.status
      > from matches m
      > left outer join matchstatus ms1 on m.matchid = ms1.matchid and
      > ms1.status = 'STATUS1'
      > left outer join matchstatus ms2 on m.matchid = ms2.matchid and
      > ms2.status = 'STATUS2'
      > left outer join matchstatus ms3 on m.matchid = ms3.matchid and
      > ms3.status = 'STATUS3'
      >
      > That way you don't need to add values to the SET or add fields to the
      > matches table if you add values. But you would need to add another
      > joined instances of the matchstatus table to your query.[/color]

      Yes, this seems like the best solution. I've actually used a similar
      joining technique elsewhere in the project, and am amazed that I
      didn't think to apply it here. This will certainly make for far more
      readable code than present:

      ....
      GROUP BY matchid
      HAVING NOT NOT (
      BIT_OR(status = 'STATUS1') AND NOT BIT_OR(status=' STATUS2')
      )

      I assume it'll also be potentially more efficient aswell, with MySQL
      not having to faff around with aggregate functions etc.


      Many thanks for the advice,

      Chris Key

      Comment

      Working...