Join works in 7.3.6, fails in 7.4.2

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Michael Fuhr

    Join works in 7.3.6, fails in 7.4.2

    I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
    off enable_hashjoin . I'm joining a table of network interfaces and
    a table of networks so I can find additional info about a particular
    interface's network. To speed up the join, I'm indexing the
    interface IP addresses using a function that converts the IP address
    to its network address; this way the join doesn't have to scan using
    the << or >> operator.

    Here's a reduced example of what I'm doing:

    CREATE FUNCTION inet2net (INET) RETURNS INET AS '
    SELECT NETWORK(SET_MAS KLEN($1, 24));
    ' LANGUAGE SQL IMMUTABLE;

    CREATE TABLE ipinterface (
    ifid INTEGER NOT NULL PRIMARY KEY,
    ifaddr INET NOT NULL
    );

    CREATE INDEX ipinterface_ifa ddr_idx ON ipinterface (ifaddr);
    CREATE INDEX ipinterface_ifa ddrnet_idx ON ipinterface (inet2net(ifadd r));

    CREATE TABLE ipnet (
    netid INTEGER NOT NULL PRIMARY KEY,
    netaddr INET NOT NULL,
    CONSTRAINT uniq_netaddr UNIQUE (netaddr)
    );

    CREATE INDEX ipnet_netaddr_i dx ON ipnet (netaddr);

    After populating the tables, I ran VACUUM ANALYZE on both of them,
    so the planner's statistics should be current.

    Here's a query that illustrates the problem:

    SELECT ifid, ifaddr, netid, netaddr
    FROM ipinterface AS i
    JOIN ipnet AS n ON (inet2net(i.ifa ddr) = n.netaddr)
    WHERE netid IN (10, 20);

    From my sample data set (available upon request), this query returns
    24 rows in 7.3.6, which is correct. Here's the 7.3.6 EXPLAIN ANALZYE:

    Nested Loop (cost=0.00..533 .78 rows=24 width=32) (actual time=0.20..0.37 rows=24 loops=1)
    -> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.0 3 rows=2 width=16) (actual time=0.11..0.12 rows=2 loops=1)
    Index Cond: ((netid = 10) OR (netid = 20))
    -> Index Scan using ipinterface_ifa ddrnet_idx on ipinterface i (cost=0.00..262 .58 rows=92 width=16) (actual time=0.06..0.10 rows=12 loops=2)
    Index Cond: (inet2net(i.ifa ddr) = "outer".netaddr )
    Total runtime: 0.52 msec
    (6 rows)

    The same query in 7.4.2 returns no results. Here's its plan:

    Hash Join (cost=6.04..483 .92 rows=24 width=30) (actual time=299.948..2 99.948 rows=0 loops=1)
    Hash Cond: (network(set_ma sklen("outer".i faddr, 24)) = "inner".netaddr )
    -> Seq Scan on ipinterface i (cost=0.00..293 .32 rows=18432 width=15) (actual time=0.039..130 .604 rows=18432 loops=1)
    -> Hash (cost=6.03..6.0 3 rows=2 width=15) (actual time=0.257..0.2 57 rows=0 loops=1)
    -> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.0 3 rows=2 width=15) (actual time=0.142..0.1 96 rows=2 loops=1)
    Index Cond: ((netid = 10) OR (netid = 20))
    Total runtime: 300.775 ms
    (7 rows)

    If I turn off enable_hashjoin in 7.4.2 I get 24 rows, as expected:

    Nested Loop (cost=0.00..534 .87 rows=24 width=30) (actual time=0.301..1.0 94 rows=24 loops=1)
    -> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.0 3 rows=2 width=15) (actual time=0.132..0.1 80 rows=2 loops=1)
    Index Cond: ((netid = 10) OR (netid = 20))
    -> Index Scan using ipinterface_ifa ddrnet_idx on ipinterface i (cost=0.00..262 .81 rows=92 width=15) (actual time=0.088..0.2 42 rows=12 loops=2)
    Index Cond: (network(set_ma sklen(i.ifaddr, 24)) = "outer".netaddr )
    Total runtime: 1.914 ms
    (6 rows)

    Am I doing something wrong, or should I report this to the bugs
    list?

    --
    Michael Fuhr


    ---------------------------(end of broadcast)---------------------------
    TIP 7: don't forget to increase your free space map settings

  • Tom Lane

    #2
    Re: Join works in 7.3.6, fails in 7.4.2

    Michael Fuhr <mike@fuhr.or g> writes:[color=blue]
    > I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
    > off enable_hashjoin . I'm joining a table of network interfaces and
    > a table of networks so I can find additional info about a particular
    > interface's network.[/color]

    Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
    I wonder if that is a mistake? I recall looking at the datatype and
    deciding there were no insignificant bits in it, but that could be
    wrong. Or it could be that the network() function is taking some
    shortcut it shouldn't.

    Is any of this data IPv6 addresses by any chance?
    [color=blue]
    > From my sample data set (available upon request),[/color]

    Could we see the specific values that join in 7.3 and fail to do so in
    7.4?

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to majordomo@postg resql.org

    Comment

    • Tom Lane

      #3
      Re: Join works in 7.3.6, fails in 7.4.2

      I wrote:[color=blue]
      > Michael Fuhr <mike@fuhr.or g> writes:[color=green]
      >> I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
      >> off enable_hashjoin . I'm joining a table of network interfaces and
      >> a table of networks so I can find additional info about a particular
      >> interface's network.[/color][/color]
      [color=blue]
      > Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
      > I wonder if that is a mistake?[/color]

      Digging further, I find that indeed this seems to be a mistake. CIDR
      and INET values that have the same address and masklen compare as equal
      according to network_eq(), but they will not hash the same because
      there's a flag identifying whether a given value is considered CIDR or
      INET. And what the network() function returns is marked as a CIDR.
      It's a bit surprising that your hash join produces any matches at all...

      I believe I got misled on this because there is a hash index operator
      class for inet; at one point during the 7.4 cycle I went around and
      cleaned up cases where the equality operator's canhash flag was
      inconsistent with the set of hash index opclasses. Arguably the hash
      opclass is broken, although in practice people probably don't notice the
      failure since a given column is likely to contain either all inet or all
      cidr values. (And of course it's entirely likely that there *aren't*
      any people using the inet hash opclass, period...)

      I can think of a number of possible fixes:

      1. Mark inet = as not hashjoinable. We'd probably want to remove the
      inet hash opclass too.

      2. Redefine inet = so that CIDR and INET values are never considered
      equal, thus eliminating the unused field. This could be back-patched
      into 7.4 but otherwise seems to have little to recommend it. It
      would certainly not help solve Michael's problem.

      3. Provide a specialized hash method for type inet that ignores the
      iptype field.

      #3 seems the most desirable going forward, but is probably impractical
      to back-patch into 7.4.*, so I'm not sure what to do about the problem
      in that branch. Given the relatively low incidence of the problem,
      maybe it's okay to just clear the oprcanhash flag in future 7.4.*
      releases. This would not fix the problem for existing installations
      (unless they initdb) but any complainers could be told how to adjust
      their catalogs manually.

      Can anyone think of any other approaches?

      regards, tom lane

      ---------------------------(end of broadcast)---------------------------
      TIP 7: don't forget to increase your free space map settings

      Comment

      • Michael Fuhr

        #4
        Re: Join works in 7.3.6, fails in 7.4.2

        On Tue, Apr 13, 2004 at 03:42:54PM -0400, Tom Lane wrote:[color=blue]
        > Michael Fuhr <mike@fuhr.or g> writes:[color=green]
        > > I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
        > > off enable_hashjoin . I'm joining a table of network interfaces and
        > > a table of networks so I can find additional info about a particular
        > > interface's network.[/color]
        >
        > Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
        > I wonder if that is a mistake? I recall looking at the datatype and
        > deciding there were no insignificant bits in it, but that could be
        > wrong. Or it could be that the network() function is taking some
        > shortcut it shouldn't.[/color]

        So would a workaround be to set oprcanhash to false for that
        operator? I did the following and it appeared to solve the
        problem:

        UPDATE pg_operator SET oprcanhash = FALSE WHERE oid = 1201;

        Or, without knowing that 1201 is the correct OID:

        UPDATE pg_operator SET oprcanhash = FALSE
        WHERE oprname = '='
        AND oprleft IN (SELECT oid FROM pg_type WHERE typname = 'inet');
        [color=blue]
        > Is any of this data IPv6 addresses by any chance?[/color]

        Nope -- all IPv4.
        [color=blue][color=green]
        > > From my sample data set (available upon request),[/color]
        >
        > Could we see the specific values that join in 7.3 and fail to do so in
        > 7.4?[/color]

        I can duplicate the problem with the following data:

        INSERT INTO ipinterface VALUES (1, '10.0.1.1');
        INSERT INTO ipinterface VALUES (2, '10.0.2.1');
        INSERT INTO ipnet VALUES (10, '10.0.1.0/24');
        INSERT INTO ipnet VALUES (20, '10.0.2.0/24');

        Thanks for looking into this.

        --
        Michael Fuhr


        ---------------------------(end of broadcast)---------------------------
        TIP 5: Have you checked our extensive FAQ?



        Comment

        • Tom Lane

          #5
          Re: Join works in 7.3.6, fails in 7.4.2

          Michael Fuhr <mike@fuhr.or g> writes:[color=blue]
          > So would a workaround be to set oprcanhash to false for that
          > operator?[/color]

          Right, see my followup note. This may in fact be the only workable
          solution for the 7.4.* series.

          regards, tom lane

          ---------------------------(end of broadcast)---------------------------
          TIP 3: if posting/reading through Usenet, please send an appropriate
          subscribe-nomail command to majordomo@postg resql.org so that your
          message can get through to the mailing list cleanly

          Comment

          Working...