Selecting from a view built using UDFs

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • eugenef@profitlogic.com

    Selecting from a view built using UDFs

    We are running into the following issue with DB2 V8.1.3 on AIX 5:

    We have a sample table

    create table t ( c1 int, c2 varchar(20))

    populated with some records:

    db2 => select * from t
    C1 C2
    ----------- --------------------
    1 AAA
    2 BBB
    3 CCC

    3 record(s) selected.


    Here's an external java UDF that simply returns the argument string
    and also writes it to a file on the database server:


    import java.io.*;
    public class FileLogWriter
    {
    public static String flw( String filename, String str ) throws
    IOException
    {
    BufferedWriter wr = new BufferedWriter( new
    FileWriter(file name,true));
    wr.write(str);
    wr.newLine();
    wr.close();
    return str;
    }
    }


    CREATE FUNCTION flw(filename varchar(512), str varchar(2000))
    RETURNS VARCHAR(2000)
    SPECIFIC flw
    LANGUAGE JAVA
    PARAMETER STYLE JAVA
    FENCED
    THREADSAFE
    DETERMINISTIC
    NO SQL
    EXTERNAL NAME 'FileLogWriter. flw'
    @


    Then we create the following view on the table t:

    db2 => create view v1(c1,c3) as select c1, flw('/tmp/flw.log',c2) c3
    from t
    DB20000I The SQL command completed successfully.

    and execute the following SQL using the view:

    db2 => select c1 from v1
    C1
    -----------
    1
    2
    3

    3 record(s) selected.


    No surprises here, but a big surprise is found in the flw.log file:

    [root@aix-00 tmp]# cat XXX.log
    AAA
    BBB
    CCC

    It turns DB2 needlessly executed the UDF, ie flw(c2), in order to get
    the record set on column c1 that the UDF does not refer at all!

    Why DB2 optimizer desides to materialize the result set on all the
    view's colums (c1 and c3 in this case) instead of just simply
    rewriting the SQL to select c1 from t?

    Well...it's not an issue in that basic case, but in production the
    table like "t" is going to be very large and the real UDF like "flw"
    implements a complex logic (writing to an external file will be taken
    off from the production version of the flw UDF) so it would cause SQL
    performance issues.

    Any idea?

    Thanks,
    -Eugene
  • nobody

    #2
    Re: Selecting from a view built using UDFs

    eugenef@profitl ogic.com wrote:[color=blue]
    > We are running into the following issue with DB2 V8.1.3 on AIX 5:
    >
    > We have a sample table
    >
    > create table t ( c1 int, c2 varchar(20))
    >
    > populated with some records:
    >
    > db2 => select * from t
    > C1 C2
    > ----------- --------------------
    > 1 AAA
    > 2 BBB
    > 3 CCC
    >
    > 3 record(s) selected.
    >
    >
    > Here's an external java UDF that simply returns the argument string
    > and also writes it to a file on the database server:
    >
    >
    > import java.io.*;
    > public class FileLogWriter
    > {
    > public static String flw( String filename, String str ) throws
    > IOException
    > {
    > BufferedWriter wr = new BufferedWriter( new
    > FileWriter(file name,true));
    > wr.write(str);
    > wr.newLine();
    > wr.close();
    > return str;
    > }
    > }
    >
    >
    > CREATE FUNCTION flw(filename varchar(512), str varchar(2000))
    > RETURNS VARCHAR(2000)
    > SPECIFIC flw
    > LANGUAGE JAVA
    > PARAMETER STYLE JAVA
    > FENCED
    > THREADSAFE
    > DETERMINISTIC
    > NO SQL
    > EXTERNAL NAME 'FileLogWriter. flw'
    > @
    >
    >
    > Then we create the following view on the table t:
    >
    > db2 => create view v1(c1,c3) as select c1, flw('/tmp/flw.log',c2) c3
    > from t
    > DB20000I The SQL command completed successfully.
    >
    > and execute the following SQL using the view:
    >
    > db2 => select c1 from v1
    > C1
    > -----------
    > 1
    > 2
    > 3
    >
    > 3 record(s) selected.
    >
    >
    > No surprises here, but a big surprise is found in the flw.log file:
    >
    > [root@aix-00 tmp]# cat XXX.log
    > AAA
    > BBB
    > CCC
    >
    > It turns DB2 needlessly executed the UDF, ie flw(c2), in order to get
    > the record set on column c1 that the UDF does not refer at all!
    >
    > Why DB2 optimizer desides to materialize the result set on all the
    > view's colums (c1 and c3 in this case) instead of just simply
    > rewriting the SQL to select c1 from t?
    >
    > Well...it's not an issue in that basic case, but in production the
    > table like "t" is going to be very large and the real UDF like "flw"
    > implements a complex logic (writing to an external file will be taken
    > off from the production version of the flw UDF) so it would cause SQL
    > performance issues.
    >
    > Any idea?
    >
    > Thanks,
    > -Eugene[/color]

    My guess is that the optimizer played it safe. Your udf does external
    actions which means it can't just be optimized off. DB2 is doing exactly
    what the query was suppose to do, get the view and return the first
    column back. If the second column is not used at all, the optimizer
    probably would just not select it in the first place. However, since the
    udf does have an external effects, it would be dangerous for the
    optmizer to second-guess what the app developer is trying to do. For
    example, the udf could be doing security logging to track whoever
    selected from the view. If your production system, if the udf has no
    external actions, it would most likely be optimized off.

    Comment

    • Serge Rielau

      #3
      Re: Selecting from a view built using UDFs

      Hmm, try NO EXTERNAL ACTION (I thought that was the default?).

      Cheers
      Serge


      Comment

      • Knut Stolze

        #4
        Re: Selecting from a view built using UDFs

        eugenef@profitl ogic.com <eugenef@profit logic.com> wrote:
        [color=blue]
        > We are running into the following issue with DB2 V8.1.3 on AIX 5:
        >
        > We have a sample table
        >
        > create table t ( c1 int, c2 varchar(20))
        >
        > populated with some records:
        >
        > db2 => select * from t
        > C1 C2
        > ----------- --------------------
        > 1 AAA
        > 2 BBB
        > 3 CCC
        >
        > 3 record(s) selected.
        >
        >
        > Here's an external java UDF that simply returns the argument string
        > and also writes it to a file on the database server:
        >
        >
        > import java.io.*;
        > public class FileLogWriter
        > {
        > public static String flw( String filename, String str ) throws
        > IOException
        > {
        > BufferedWriter wr = new BufferedWriter( new
        > FileWriter(file name,true));
        > wr.write(str);
        > wr.newLine();
        > wr.close();
        > return str;
        > }
        > }
        >
        >
        > CREATE FUNCTION flw(filename varchar(512), str varchar(2000))
        > RETURNS VARCHAR(2000)
        > SPECIFIC flw
        > LANGUAGE JAVA
        > PARAMETER STYLE JAVA
        > FENCED
        > THREADSAFE
        > DETERMINISTIC
        > NO SQL
        > EXTERNAL NAME 'FileLogWriter. flw'
        > @
        >
        >
        > Then we create the following view on the table t:
        >
        > db2 => create view v1(c1,c3) as select c1, flw('/tmp/flw.log',c2) c3
        > from t
        > DB20000I The SQL command completed successfully.
        >
        > and execute the following SQL using the view:
        >
        > db2 => select c1 from v1
        > C1
        > -----------
        > 1
        > 2
        > 3
        >
        > 3 record(s) selected.
        >
        >
        > No surprises here, but a big surprise is found in the flw.log file:
        >
        > [root@aix-00 tmp]# cat XXX.log
        > AAA
        > BBB
        > CCC
        >
        > It turns DB2 needlessly executed the UDF, ie flw(c2), in order to get
        > the record set on column c1 that the UDF does not refer at all!
        >
        > Why DB2 optimizer desides to materialize the result set on all the
        > view's colums (c1 and c3 in this case) instead of just simply
        > rewriting the SQL to select c1 from t?[/color]

        Because you told DB2 that it must do that. Your UDF is defined (implicitly)
        with EXTERNAL ACTION, so DB2 must ensure that the UDF is called exactly
        oncy for each row that qualifies in the WHERE clause of the view
        definition.

        That's the reason why I would always explicitly specify all options for
        functions or procedures. It doesn't hurt, but it makes things much
        clearer.

        --
        Knut Stolze
        Information Integration
        IBM Germany / University of Jena

        Comment

        • W Gemini

          #5
          Re: Selecting from a view built using UDFs

          Serge Rielau wrote:[color=blue]
          > Hmm, try NO EXTERNAL ACTION (I thought that was the default?).
          >
          > Cheers
          > Serge
          >
          >[/color]

          Serge, the default is EXTERNAL ACTION (again, DB2 chose to play it
          safe). I tried with no external action, and the udf does get optimized out.

          Comment

          • eugenef@profitlogic.com

            #6
            Re: Selecting from a view built using UDFs

            "Serge Rielau" <srielau@ca.eye beem.com> wrote in message news:<bnkpe8$hi i$1@hanover.tor olab.ibm.com>.. .[color=blue]
            > Hmm, try NO EXTERNAL ACTION (I thought that was the default?).
            >
            > Cheers
            > Serge[/color]

            Excellent Serge!

            Specifying NO EXTERNAL ACTION for UDF solves the issue: it causes the
            optimiser to eliminate the UDF calls on c2. So now from the "Command
            Center->Statement->Show Optimized SQL Text" the rewritten query is
            built as desired:

            SELECT Q1."C1" AS "C1"
            FROM OAK.T AS Q1

            vs.

            SELECT Q2.$C0 AS "C1"
            FROM
            (SELECT Q1."C1", "OAK "."FLW"('/tmp/flw.log', Q1."C2")
            FROM OAK.T AS Q1) AS Q2

            which was initially, and, of course, the log file is not written with
            the contents of t.c2.

            Unfortunately, DB2 SQL reference does not mention what is the default
            for EXTERNAL ACTION parameter:
            "This optional clause specifies whether or not the function takes some
            action that changes the state of an object not managed by the database
            manager. Optimizations that assume functions have no external impacts
            are prevented by specifying EXTERNAL ACTION. For example: sending a
            message, ringing a bell, or writing a record to a file"

            However, now it turns out the default is "with" EXTERNAL ACTION.

            Thank you guys for the quick solution!

            -Eugene

            Comment

            • Knut Stolze

              #7
              Re: Selecting from a view built using UDFs

              eugenef@profitl ogic.com <eugenef@profit logic.com> wrote:
              [color=blue]
              > Unfortunately, DB2 SQL reference does not mention what is the default
              > for EXTERNAL ACTION parameter:
              > "This optional clause specifies whether or not the function takes some
              > action that changes the state of an object not managed by the database
              > manager. Optimizations that assume functions have no external impacts
              > are prevented by specifying EXTERNAL ACTION. For example: sending a
              > message, ringing a bell, or writing a record to a file"
              >
              > However, now it turns out the default is "with" EXTERNAL ACTION.[/color]

              It does mention it: in the syntax diagram. There is somewhere a discription
              how those diagrams are to be read. In short, it is like this:

              .--arg1--.
              ----+--------+---
              '--arg2--'

              Everything above the line that goes through is used a default if nothing is
              specified for the clause. Everything below the line are other options. So
              in the case here, "arg1" is the default and "arg2" are other options that
              you could specify for the clause.

              p.s: Granted, it would be nice to have the same information available in the
              description sometimes.

              --
              Knut Stolze
              Information Integration
              IBM Germany / University of Jena

              Comment

              Working...