SQL parsing

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • =?windows-1252?Q?=22=C1lvaro_G=2E_Vicario=22?=

    SQL parsing

    I’ve written a simple results cache for an Oracle driven application.
    When I run a query I need to know which tables it uses so I can find out
    if they have been modified and thus the cached data must be discarded
    (as I said, it’s a very simple cache and the DB is basically static).

    I didn’t like the idea of passing a table list as argument for each
    query because they are dynamically built and, anyway, I thought it’d
    become unmaintainable sooner or later. So I wrote an even simpler SQL
    parser that looks like this:

    $tables = 'PRODUCTS|CATEG ORIES|CUSTOMERS |ORDERS';
    $regex = '/\W(FROM|JOIN)\s +(' . $tables . ')(?:\W|$)/i';
    preg_match_all( $regex, $sql, $matches);

    So far, it works fine enough for my purposes, but I think it’s not a
    long term solution. Is it possible to build a generic reliable SQL
    parser to fetch the table names of a query without diving in the regex
    hell? Do you know about an existing library?* Can you think of any other
    solution?

    Thank you in advance,
    [*] I’ve browsed some code out there that does what I need (typically
    via regex soup), but it always fails with the queries I test.

    --
    -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
    -- Mi sitio sobre programación web: http://bits.demogracia.com
    -- Mi web de humor al baño María: http://www.demogracia.com
    --
  • C. (http://symcbean.blogspot.com/)

    #2
    Re: SQL parsing

    On May 20, 12:43 pm, "Álvaro G. Vicario"
    <alvaroNOSPAMTH A...@demogracia .comwrote:
    I’ve written a simple results cache for an Oracle driven application.
    When I run a query I need to know which tables it uses so I can find out
    if they have been modified and thus the cached data must be discarded
    (as I said, it’s a very simple cache and the DB is basically static).
    >
    I didn’t like the idea of passing a table list as argument for each
    query because they are dynamically built and, anyway, I thought it’d
    become unmaintainable sooner or later. So I wrote an even simpler SQL
    parser that looks like this:
    >
    $tables = 'PRODUCTS|CATEG ORIES|CUSTOMERS |ORDERS';
    $regex = '/\W(FROM|JOIN)\s +(' . $tables . ')(?:\W|$)/i';
    preg_match_all( $regex, $sql, $matches);
    >
    So far, it works fine enough for my purposes, but I think it’s not a
    long term solution. Is it possible to build a generic reliable SQL
    parser to fetch the table names of a query without diving in the regex
    hell? Do you know about an existing library?* Can you think of any other
    solution?
    >
    Thank you in advance,
    >[*] I’ve browsed some code out there that does what I need (typically
    via regex soup), but it always fails with the queries I test.
    >
    Certainly its **possible** to write an SQL parser - lots of other
    people have managed - but stepping back a bit this seems like
    premature optimization - even for an Oracle DBMS. Maybe you should
    start by looking at the DBMS. Surprisingly it looks like Oracle didn't
    introduce result caching until 11g but prior to that it would still be
    possible to minimise the disk I/O with a large buffer cache.

    For large result sets you may get better performance by going back to
    the DBMS - this the sort of thing DBMS are designed to do well - more
    so than PHP - which (IME) can be a bit slow handling large arrays.

    For small result sets (except maybe aggregates) SQL tuning is the way
    to go. While the CBO can be very, very smart - it can also be very,
    very dumb at times.

    As you have discovered, your regex fails to deal with a lot of cases,
    but even a complete parser won't cope with stored procedures. Much
    though I dislike SQL stored procedures this might be the most
    convenient way to solve the problem - add a parameter for
    last_cache_time stamp and pass in the value from the previous query -
    in the PL/SQL, first check if any of the tables have been modified
    since then - if not return a custom error code, otherwise redo the
    query.

    But the simplest option might be to just apply a default timeout on
    the local query cache.

    Alternatively build your own abstraction in PHP whereby developers
    never enter SQL commands directly, but it makes it easier for you to
    extract the table names.

    C.

    Comment

    • Rik Wasmus

      #3
      Re: SQL parsing

      Álvaro G. Vicario wrote:
      I’ve written a simple results cache for an Oracle driven application.
      When I run a query I need to know which tables it uses so I can find out
      if they have been modified and thus the cached data must be discarded
      (as I said, it’s a very simple cache and the DB is basically static).
      >
      I didn’t like the idea of passing a table list as argument for each
      query because they are dynamically built and, anyway, I thought it’d
      become unmaintainable sooner or later. So I wrote an even simpler SQL
      parser that looks like this:
      >
      $tables = 'PRODUCTS|CATEG ORIES|CUSTOMERS |ORDERS';
      $regex = '/\W(FROM|JOIN)\s +(' . $tables . ')(?:\W|$)/i';
      preg_match_all( $regex, $sql, $matches);
      >
      So far, it works fine enough for my purposes, but I think it’s not a
      long term solution. Is it possible to build a generic reliable SQL
      parser to fetch the table names of a query without diving in the regex
      hell? Do you know about an existing library?* Can you think of any other
      solution?
      >
      Thank you in advance,
      >[*] I’ve browsed some code out there that does what I need (typically
      via regex soup), but it always fails with the queries I test.
      Of course it is possible to build a PARSER. This is generally done
      WITHOUT regexes. There are some examples available on the net, none that
      I tested though. Google for "PHP SQL parser". If you want to right your
      own, Matt Zandstra's book ("PHP5 objects and patterns" or something like
      that) has a fine example & starting point for you.
      --
      Rik Wasmus
      ....spamrun finished

      Comment

      • =?windows-1252?Q?=22=C1lvaro_G=2E_Vicario=22?=

        #4
        Re: SQL parsing

        C. (http://symcbean.blogspot.com/) escribió:
        >So far, it works fine enough for my purposes, but I think it’s not a
        >long term solution. Is it possible to build a generic reliable SQL
        >parser to fetch the table names of a query without diving in the regex
        >hell? Do you know about an existing library?* Can you think of any other
        >solution?
        Certainly its **possible** to write an SQL parser - lots of other
        people have managed - but stepping back a bit this seems like
        premature optimization - even for an Oracle DBMS. Maybe you should
        start by looking at the DBMS. Surprisingly it looks like Oracle didn't
        introduce result caching until 11g but prior to that it would still be
        possible to minimise the disk I/O with a large buffer cache.
        Can you speak of premature optimization when it was an application that
        had been in production for several months? :)

        DBMS-level solutions are not as promising as you may think for several
        reasons:

        - The app consists basically in a collection of SUM() and COUNT()
        statements over multi-million row tables spiced with on-the-fly
        user-defined filters that often imply joins or subqueries. There's no
        fast way to sum 300,000 values of non-consecutive non-indexed random
        rows. And there's no need to do it 100 times a day when data won't
        change until next month.

        - I only take care of the PHP part. The Oracle server is a separate host
        somewhere in the LAN; I don't have the least control over it.

        - The DB design is as-is. I mean, I'd really *love* to add primary keys
        and such fashionable stuff to all tables, but I just can't ;-)

        For large result sets you may get better performance by going back to
        the DBMS - this the sort of thing DBMS are designed to do well - more
        so than PHP - which (IME) can be a bit slow handling large arrays.
        Result sets are normally small. The problem is that they're calculated
        from too many rows.
        As you have discovered, your regex fails to deal with a lot of cases,
        but even a complete parser won't cope with stored procedures.
        Actually, my regex works without a flaw because I search for the
        specific table names I handle. It's third-party code (such as PEAR's
        SQL_Parser) which fails.
        But the simplest option might be to just apply a default timeout on
        the local query cache.
        That was version one. It was fine, but I wanted to make new data
        available in a reasonable time since it was dumped, while not needing to
        re-run several times a day all the slow queries on data that had not
        changed in a month.

        As I said, I have a system that works well, I'm just looking for
        improvements. Anyway, thanks for your comments.


        --
        -- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
        -- Mi sitio sobre programación web: http://bits.demogracia.com
        -- Mi web de humor al baño María: http://www.demogracia.com
        --

        Comment

        Working...