Cartesian product bug?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ã…smund Kveim Lie

    Cartesian product bug?

    Hi,

    We have found a possible bug in 7.3.1. It seems that using CROSS JOIN and
    doing plain Cartesian product, listing to tables in the from clause, gives
    different results. According to the documentation this should be
    equivalent. The following example should explain the problem:

    CREATE TABLE a (a1 text, a2 text);
    CREATE TABLE b (b1 text, b2 text);
    CREATE TABLE c (a1 text, b1 text, c1 text);

    INSERT INTO a VALUES('a1', 'a2');
    INSERT INTO b VALUES('b1', 'b2');
    INSERT INTO c VALUES('a3', 'b1', 'c1');

    SELECT * FROM a,b NATURAL JOIN c;
    a1 | a2 | b1 | b2 | a1 | c1
    ----+----+----+----+----+----
    a1 | a2 | b1 | b2 | a3 | c1
    (1 row)

    SELECT * FROM a CROSS JOIN b NATURAL JOIN c;
    a1 | b1 | a2 | b2 | c1
    ----+----+----+----+----
    (0 rows)

    These two example queries should give the same result. In the first query,
    it seems like it’s doing the natural join between b and c, and then does
    the Cartesian product on that result with a. On the second query, it does
    as we assume it should, namely does the Cartesian product first.

    Is this the correct behavior?

    Regards

    Ã…smund


    --
    Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
  • Greg Stark

    #2
    Re: Cartesian product bug?

    Åsmund Kveim Lie <asmundkl@skipt his.ifi.uio.no> writes:
    [color=blue]
    > SELECT * FROM a,b NATURAL JOIN c;[/color]

    This parses as

    select * from a, (b natural join c)
    [color=blue]
    > SELECT * FROM a CROSS JOIN b NATURAL JOIN c;[/color]

    This parses as

    select * from (a cross join b) natural join c
    [color=blue]
    > These two example queries should give the same result. In the first query, it
    > seems like it’s doing the natural join between b and c, and then does
    > the Cartesian product on that result with a. On the second query, it does as
    > we assume it should, namely does the Cartesian product first.
    >
    > Is this the correct behavior?[/color]

    yes

    You can put parentheses to change the explicit joins like

    select * from a cross join (b natural join c);

    But the implicit join is harder to fix. I think you either need to use an
    explicit join like above or a subquery like

    select * from (select * from a,b) as ab natural join c

    I tend to find it's easier to stick to all explicit or all implicit joins and
    not mix them. Personally I like explicit joins for aesthetic reasons
    especially in 7.4 where they get optimized as well as implicit joins.


    --
    greg


    ---------------------------(end of broadcast)---------------------------
    TIP 3: if posting/reading through Usenet, please send an appropriate
    subscribe-nomail command to majordomo@postg resql.org so that your
    message can get through to the mailing list cleanly

    Comment

    • Tom Lane

      #3
      Re: Cartesian product bug?

      =?utf-8?Q?=C3=85smund _Kveim_Lie?= <asmundkl@skipt his.ifi.uio.no> writes:
      [color=blue]
      > SELECT * FROM a,b NATURAL JOIN c;[/color]
      [color=blue]
      > SELECT * FROM a CROSS JOIN b NATURAL JOIN c;[/color]
      [color=blue]
      > These two example queries should give the same result.[/color]

      No, they shouldn't, because JOIN binds more tightly than comma. The
      first is equivalent to

      SELECT * FROM a CROSS JOIN (b NATURAL JOIN c);

      while in the second case the JOINs associate left-to-right, giving

      SELECT * FROM (a CROSS JOIN b) NATURAL JOIN c;

      Because you have columns with the same names in A and C, the second
      NATURAL JOIN has a different implicit join clause than the first.

      (Personally I think NATURAL JOIN is an evil, bug-prone construct,
      precisely because coincidental matches of column names will mess up your
      results.)
      [color=blue]
      > In the first query, it seems like it’s doing the natural
      > join between b and c, and then does the Cartesian product on that
      > result with a. On the second query, it does as we assume it should,
      > namely does the Cartesian product first.[/color]

      I think your expectations have been set by MySQL, which last I heard
      interprets all joins as being done left-to-right. That's not compliant
      with the SQL standard, however.

      regards, tom lane

      ---------------------------(end of broadcast)---------------------------
      TIP 2: you can get off all lists at once with the unregister command
      (send "unregister YourEmailAddres sHere" to majordomo@postg resql.org)

      Comment

      • scott.marlowe

        #4
        Re: Cartesian product bug?

        On Fri, 31 Oct 2003, Tom Lane wrote:
        [color=blue]
        > (Personally I think NATURAL JOIN is an evil, bug-prone construct,
        > precisely because coincidental matches of column names will mess up your
        > results.)[/color]

        Me too. When I first saw it, I figured it would "naturally join" the two
        tables on their fk/pk relation if there was one. That seems natural.
        Joining on two fields that just happen to have the same name is unnatural
        to me.


        ---------------------------(end of broadcast)---------------------------
        TIP 9: the planner will ignore your desire to choose an index scan if your
        joining column's datatypes do not match

        Comment

        • Greg Stark

          #5
          Re: Cartesian product bug?


          "scott.marl owe" <scott.marlowe@ ihs.com> writes:
          [color=blue]
          > On Fri, 31 Oct 2003, Tom Lane wrote:
          >[color=green]
          > > (Personally I think NATURAL JOIN is an evil, bug-prone construct,
          > > precisely because coincidental matches of column names will mess up your
          > > results.)[/color]
          >
          > Me too. When I first saw it, I figured it would "naturally join" the two
          > tables on their fk/pk relation if there was one. That seems natural.
          > Joining on two fields that just happen to have the same name is unnatural
          > to me.[/color]

          Well 99% of the time I impose on myself a constraint to only use the same name
          iff they refer to the same attribute. So if they have the same name then they
          really ought to be a reasonable join clause.

          However the 1% are things like "date_creat ed, date_updated" or even flags like
          "active", "deleted" etc. Which are more than enough to make it utterly
          useless.

          Too bad really, it would be a handy thing for ad-hoc queries typed at psql. It
          would still seem too fragile for production queries though.

          --
          greg


          ---------------------------(end of broadcast)---------------------------
          TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

          Comment

          • Bruce Momjian

            #6
            Re: Cartesian product bug?

            Greg Stark wrote:[color=blue]
            >
            > "scott.marl owe" <scott.marlowe@ ihs.com> writes:
            >[color=green]
            > > On Fri, 31 Oct 2003, Tom Lane wrote:
            > >[color=darkred]
            > > > (Personally I think NATURAL JOIN is an evil, bug-prone construct,
            > > > precisely because coincidental matches of column names will mess up your
            > > > results.)[/color]
            > >
            > > Me too. When I first saw it, I figured it would "naturally join" the two
            > > tables on their fk/pk relation if there was one. That seems natural.
            > > Joining on two fields that just happen to have the same name is unnatural
            > > to me.[/color]
            >
            > Well 99% of the time I impose on myself a constraint to only use the same name
            > iff they refer to the same attribute. So if they have the same name then they
            > really ought to be a reasonable join clause.
            >
            > However the 1% are things like "date_creat ed, date_updated" or even flags like
            > "active", "deleted" etc. Which are more than enough to make it utterly
            > useless.
            >
            > Too bad really, it would be a handy thing for ad-hoc queries typed at psql. It
            > would still seem too fragile for production queries though.[/color]

            I think the reason they don't use pk/fk in natural joins is because you can
            join all sorts of results, like SELECT in FROM, that doesn't always have
            a meaningful pk/fk.

            --
            Bruce Momjian | http://candle.pha.pa.us
            pgman@candle.ph a.pa.us | (610) 359-1001
            + If your life is a hard drive, | 13 Roberts Road
            + Christ can be your backup. | Newtown Square, Pennsylvania 19073

            ---------------------------(end of broadcast)---------------------------
            TIP 7: don't forget to increase your free space map settings

            Comment

            Working...