SET vs Fully Normalised vs Both

**Bill Karwin** · Jul 20 '05, 12:33 AM

Re: SET vs Fully Normalised vs Both

Christopher Key wrote:[color=blue]
> My main concern is with style. This seems a somewhat messy solution
> to the problem, and I can well imagine such redundancy being frowned
> upon.[/color]

I try to avoid redundancy because there's a risk that the data will
become inconsistent. There's the academic issue of style or relational
normalization, but the more practical risk that is more likely to bite
you is data integrity.
[color=blue]
> I did also consider just adding a set of fields to the match table,
> each nullable, and corresponding to a given status. This however
> seemed wrong as:
>
> 1) A lot of the matches will not go through a lot of the statuses,
> leading to wasted storage of 4 bytes per status per record.
>
> 2) I'd rather update the ENUM / SET definitions than add additional
> fields if new statuses need to be added.
>
> 3) ENUMs and SETs are sort of normalised, whereas having a field per
> status clearly isn't.[/color]

Actually, I believe they're equivalent with respect to breaking
normalization. But it's only a fine point; normalization often has to
be broken to achieve some tradeoff with query simplicity or efficiency.

Here's another possible solution: you could keep the two-table design,
and create a "virtual" column-per-status-value, by doing an outer join
to the matchstatus table for each status value:

select distinct m.*, ms1.status, ms2.status, ms3.status
from matches m
left outer join matchstatus ms1 on m.matchid = ms1.matchid and
ms1.status = 'STATUS1'
left outer join matchstatus ms2 on m.matchid = ms2.matchid and
ms2.status = 'STATUS2'
left outer join matchstatus ms3 on m.matchid = ms3.matchid and
ms3.status = 'STATUS3'

That way you don't need to add values to the SET or add fields to the
matches table if you add values. But you would need to add another
joined instances of the matchstatus table to your query.

Regards,
Bill K.

**Christopher Key** · Jul 20 '05, 12:34 AM

Re: SET vs Fully Normalised vs Both

> > My main concern is with style. This seems a somewhat messy solution[color=blue][color=green]
> > to the problem, and I can well imagine such redundancy being frowned
> > upon.[/color]
>
> I try to avoid redundancy because there's a risk that the data will
> become inconsistent. There's the academic issue of style or relational
> normalization, but the more practical risk that is more likely to bite
> you is data integrity.[/color]

Thanks, will certainly keep this in mind. Fortunately, there are very
few different sections of code that perform any updating, and I will
be using transactions, but still something to avoid if possible.
[color=blue][color=green]
> > I did also consider just adding a set of fields to the match table,
> > each nullable, and corresponding to a given status. This however
> > seemed wrong as:
> >
> > 1) A lot of the matches will not go through a lot of the statuses,
> > leading to wasted storage of 4 bytes per status per record.
> >
> > 2) I'd rather update the ENUM / SET definitions than add additional
> > fields if new statuses need to be added.
> >
> > 3) ENUMs and SETs are sort of normalised, whereas having a field per
> > status clearly isn't.[/color]
>
> Actually, I believe they're equivalent with respect to breaking
> normalization. But it's only a fine point; normalization often has to
> be broken to achieve some tradeoff with query simplicity or efficiency.
>
> Here's another possible solution: you could keep the two-table design,
> and create a "virtual" column-per-status-value, by doing an outer join
> to the matchstatus table for each status value:
>
> select distinct m.*, ms1.status, ms2.status, ms3.status
> from matches m
> left outer join matchstatus ms1 on m.matchid = ms1.matchid and
> ms1.status = 'STATUS1'
> left outer join matchstatus ms2 on m.matchid = ms2.matchid and
> ms2.status = 'STATUS2'
> left outer join matchstatus ms3 on m.matchid = ms3.matchid and
> ms3.status = 'STATUS3'
>
> That way you don't need to add values to the SET or add fields to the
> matches table if you add values. But you would need to add another
> joined instances of the matchstatus table to your query.[/color]

Yes, this seems like the best solution. I've actually used a similar
joining technique elsewhere in the project, and am amazed that I
didn't think to apply it here. This will certainly make for far more
readable code than present:

....
GROUP BY matchid
HAVING NOT NOT (
BIT_OR(status = 'STATUS1') AND NOT BIT_OR(status=' STATUS2')
)

I assume it'll also be potentially more efficient aswell, with MySQL
not having to faff around with aggregate functions etc.

Many thanks for the advice,

Chris Key

SET vs Fully Normalised vs Both

SET vs Fully Normalised vs Both

Comment

Comment