Multiple modules with database access + general app design?

  • Robin Haswell

    Multiple modules with database access + general app design?

    Hey people

    I'm an experienced PHP programmer who's been writing Python for a couple of
    weeks now. I'm writing quite a large application which I've decided to
    break down into lots of modules (replacement for PHP's include()
    statement).

    My problem is, in PHP if you open a database connection it's always in
    scope for the duration of the script. Even if you use an abstraction layer
    ($db = DB::connect(...)) you can `global $db` and bring it into scope,
    but in Python I'm having trouble keeping the database in scope. At the
    moment I'm having to "push" the database into the module, but I'd prefer
    the module to bring the database connection in ("pull") from its parent.

    Eg:
    import modules
    modules.foo.c = db.cursor()
    modules.foo.Bar()

    Can anyone recommend any "cleaner" solutions to all of this? As far as I
    can see it, Python doesn't have much support for breaking down large
    programs in to organisable files and referencing each other.

    Another problem is I keep having to import modules all over the place. A
    real example is, I have a module "webhosting", a module "users", and a
    module "common". These are all submodules of the module "modules" (bad
    naming I know). The database connection is instantiated on the "db"
    variable of my main module, which is "yellowfish" (a global module), so
    I get the situation where:

    (yellowfish.py)
    import modules
    modules.webhosting.c = db.cursor()
    modules.webhosting.Something()

    webhosting needs methods in common and users:

    from modules import common, users

    However users also needs common:

    from modules import common

    And they all need access to the database

    (users and common)
    from yellowfish import db
    c = db.cursor()

    Can anyone give me advice on making this all a bit more transparent? I
    guess I really would like a method to bring all these files in to the same
    scope to make everything seem to be all one application, even though
    everything is broken up in to different files.

    One added complication in this particular application:

    I used modules because I'm calling arbitrary methods defined in some XML
    format. Obviously I wanted to keep security in mind, so my application
    goes something like this:

    import modules
    module, method, args = getXmlAction()
    m = getattr(modules, module)
    m.c = db.cursor()
    f = getattr(m, method)
    f(args)

    In PHP this method is excellent, because I can include all the files I
    need, each containing a class, and I can use variable variables:

    <?php
    $class = new $module; // can't remember if this works, there are
                          // alternatives though
    $class->$method($args);
    ?>

    And $class->$method() just does "global $db; $db->query(...);" .

    Any advice would be greatly appreciated!

    Cheers

    -Robin Haswell
  • Paul McGuire

    #2
    Re: Multiple modules with database access + general app design?

    "Robin Haswell" <rob@digital-crocus.com> wrote in message
    news:pan.2006.01.19.10.28.37.668978@digital-crocus.com...
    > Hey people
    >
    > I'm an experienced PHP programmer who's been writing Python for a couple of
    > weeks now. I'm writing quite a large application which I've decided to
    > break down into lots of modules (replacement for PHP's include()
    > statement).
    >
    > My problem is, in PHP if you open a database connection it's always in
    > scope for the duration of the script. Even if you use an abstraction layer
    > ($db = DB::connect(...)) you can `global $db` and bring it into scope,
    > but in Python I'm having trouble keeping the database in scope. At the
    > moment I'm having to "push" the database into the module, but I'd prefer
    > the module to bring the database connection in ("pull") from its parent.
    >
    > Eg:
    > import modules
    > modules.foo.c = db.cursor()
    > modules.foo.Bar()
    >
    > Can anyone recommend any "cleaner" solutions to all of this?

    Um, I think your Python solution *is* moving in a cleaner direction than
    simple sharing of a global $db variable. Why make the Bar class have to
    know where to get a db cursor from? What do you do if your program extends
    to having multiple Bar() objects working with different cursors into the db?

    The unnatural part of this (and hopefully, the part that you feel is
    "unclean") is that you're trading one global for another. By just setting
    modules.foo.c to the db cursor, you force all Bar() instances to use that
    same cursor.

    Instead, make the database cursor part of Bar's constructor. Now you can
    externally create multiple db cursors, a Bar for each, and they all merrily
    do their own separate, isolated processing, in blissful ignorance of each
    other's db cursors (vs. colliding on the shared $db variable).
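
A minimal sketch of the constructor approach described above, written for current Python 3. The `users` table, the `count_users` method and the helper `make_demo_connection` are hypothetical, and the standard-library `sqlite3` module stands in for MySQLdb so that the example runs without a server.

```python
import sqlite3


class Bar:
    """Hypothetical worker that is handed its cursor instead of looking one up."""

    def __init__(self, cursor):
        self.cursor = cursor

    def count_users(self):
        self.cursor.execute("SELECT COUNT(*) FROM users")
        return self.cursor.fetchone()[0]


def make_demo_connection():
    # sqlite3 is only a stand-in for MySQLdb here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    return conn


conn = make_demo_connection()
first = Bar(conn.cursor())
second = Bar(conn.cursor())  # a separate cursor on the same connection
```

Neither `Bar` instance knows where its cursor came from, so the caller decides how many connections and cursors exist.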

    -- Paul



    • Robin Haswell

      #3
      Re: Multiple modules with database access + general app design?

      On Thu, 19 Jan 2006 12:23:12 +0000, Paul McGuire wrote:
      > "Robin Haswell" <rob@digital-crocus.com> wrote in message
      > news:pan.2006.01.19.10.28.37.668978@digital-crocus.com...
      >> Hey people
      >>
      >> I'm an experienced PHP programmer who's been writing Python for a couple of
      >> weeks now. I'm writing quite a large application which I've decided to
      >> break down into lots of modules (replacement for PHP's include()
      >> statement).
      >>
      >> My problem is, in PHP if you open a database connection it's always in
      >> scope for the duration of the script. Even if you use an abstraction layer
      >> ($db = DB::connect(...)) you can `global $db` and bring it into scope,
      >> but in Python I'm having trouble keeping the database in scope. At the
      >> moment I'm having to "push" the database into the module, but I'd prefer
      >> the module to bring the database connection in ("pull") from its parent.
      >>
      >> Eg:
      >> import modules
      >> modules.foo.c = db.cursor()
      >> modules.foo.Bar()
      >>
      >> Can anyone recommend any "cleaner" solutions to all of this?
      >
      > Um, I think your Python solution *is* moving in a cleaner direction than
      > simple sharing of a global $db variable. Why make the Bar class have to
      > know where to get a db cursor from? What do you do if your program extends
      > to having multiple Bar() objects working with different cursors into the db?
      >
      > The unnatural part of this (and hopefully, the part that you feel is
      > "unclean") is that you're trading one global for another. By just setting
      > modules.foo.c to the db cursor, you force all Bar() instances to use that
      > same cursor.
      >
      > Instead, make the database cursor part of Bar's constructor. Now you can
      > externally create multiple db cursors, a Bar for each, and they all merrily
      > do their own separate, isolated processing, in blissful ignorance of each
      > other's db cursors (vs. colliding on the shared $db variable).

      Hm, if truth be told, I'm not totally interested in keeping a separate
      cursor for every class instance. This application runs in a very simple
      threaded socket server - every time a new thread is created, we create a
      new db.cursor (m = getattr(modules, module); m.c = db.cursor() is the
      first part of the thread), and when the thread finishes all its actions
      (of which there are many, but all sequential), the thread exits. I don't
      see any situations where lots of methods will tread on another method's
      cursor. My main focus really is minimising the number of connections.
      Using MySQLdb, I'm not sure if every MySQLdb.connect or db.cursor is a
      separate connection, but I get the feeling that a lot of cursors = a lot
      of connections. I'd much prefer each method call within a thread to reuse
      that thread's connection, as creating a connection incurs significant
      overhead on the MySQL server and DNS server.

      -Rob

      >
      > -- Paul


      • Daniel Dittmar

        #4
        Re: Multiple modules with database access + general app design?

        Robin Haswell wrote:
        > cursor for every class instance. This application runs in a very simple
        > threaded socket server - every time a new thread is created, we create a
        > new db.cursor (m = getattr(modules, module); m.c = db.cursor() is the
        > first part of the thread), and when the thread finishes all its actions
        > (of which there are many, but all sequential), the thread exits. I don't

        If you use a threading server, you can't put the connection object into
        the module. Modules and hence module variables are shared across
        threads. You could use thread local storage, but I think it's better to
        pass the connection explicitly as a parameter.

        > separate connection, but I get the feeling that a lot of cursors = a lot
        > of connections. I'd much prefer each method call within a thread to reuse
        > that thread's connection, as creating a connection incurs significant
        > overhead on the MySQL server and DNS server.

        You can create several cursor objects from one connection. There should
        be no problems if you finish processing of one cursor before you open
        the next one. In earlier (current?) versions of MySQL, only one result
        set could be opened at a time, so using cursors in parallel presents
        some problems to the driver implementor.
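
As a rough illustration of "several cursors, one connection", the sketch below opens a cursor, finishes with it, closes it, and only then opens the next one. It is written for current Python 3 and uses the standard-library `sqlite3` module as a stand-in for MySQLdb; the `hosts` table and both function names are made up for the example.

```python
import sqlite3

# One connection for the whole unit of work (sqlite3 stands in for MySQLdb).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (name TEXT)")
conn.executemany("INSERT INTO hosts VALUES (?)", [("a.example",), ("b.example",)])


def fetch_names(connection):
    # Open a cursor, read everything, close it before anyone opens another.
    cur = connection.cursor()
    try:
        cur.execute("SELECT name FROM hosts ORDER BY name")
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


def count_rows(connection):
    cur = connection.cursor()
    try:
        cur.execute("SELECT COUNT(*) FROM hosts")
        return cur.fetchone()[0]
    finally:
        cur.close()


names = fetch_names(conn)  # first cursor is finished and closed here
total = count_rows(conn)   # second cursor reuses the same connection
```

Both functions share `conn`, so no extra connection is made no matter how many cursors are created.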

        Daniel


        • Robin Haswell

          #5
          Re: Multiple modules with database access + general app design?

          On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:
          > Robin Haswell wrote:
          >> cursor for every class instance. This application runs in a very simple
          >> threaded socket server - every time a new thread is created, we create a
          >> new db.cursor (m = getattr(modules, module); m.c = db.cursor() is the
          >> first part of the thread), and when the thread finishes all its actions
          >> (of which there are many, but all sequential), the thread exits. I don't
          >
          > If you use a threading server, you can't put the connection object into
          > the module. Modules and hence module variables are shared across
          > threads. You could use thread local storage, but I think it's better to
          > pass the connection explicitly as a parameter.

          Would you say it would be better if in every thread I did:

          m = getattr(modules, module)
          m.db = db

          ...

          def Foo():
              c = db.cursor()

          ?

          >> separate connection, but I get the feeling that a lot of cursors = a lot
          >> of connections. I'd much prefer each method call within a thread to reuse
          >> that thread's connection, as creating a connection incurs significant
          >> overhead on the MySQL server and DNS server.
          >
          > You can create several cursor objects from one connection. There should
          > be no problems if you finish processing of one cursor before you open
          > the next one. In earlier (current?) versions of MySQL, only one result
          > set could be opened at a time, so using cursors in parallel presents
          > some problems to the driver implementor.
          >
          > Daniel


          • Frank Millman

            #6
            Re: Multiple modules with database access + general app design?


            Robin Haswell wrote:
            > Hey people
            >
            > I'm an experienced PHP programmer who's been writing Python for a couple of
            > weeks now. I'm writing quite a large application which I've decided to
            > break down into lots of modules (replacement for PHP's include()
            > statement).
            >
            > My problem is, in PHP if you open a database connection it's always in
            > scope for the duration of the script. Even if you use an abstraction layer
            > ($db = DB::connect(...)) you can `global $db` and bring it into scope,
            > but in Python I'm having trouble keeping the database in scope. At the
            > moment I'm having to "push" the database into the module, but I'd prefer
            > the module to bring the database connection in ("pull") from its parent.

            This is what I do.

            Create a separate module to contain your global variables - mine is
            called 'common'.

            In common, create a class, with attributes, but with no methods. Each
            attribute becomes a global variable. My class is called 'c'.

            At the top of every other module, put 'from common import c'.

            Within each module, you can now refer to any global variable as
            c.whatever.

            You can create class attributes on the fly. You can therefore have
            something like -

            c.db = MySql.connect(...)

            All modules will be able to access c.db

            As Daniel has indicated, it may not be safe to share one connection
            across multiple threads, unless you can guarantee that one thread
            completes its processing before another one attempts to access the
            database. You can use threading locks to assist with this.
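
A small sketch of the "globals holder plus lock" idea from this post, written for current Python 3. In the post the class `c` lives in its own `common` module; here it is defined in the same file so the example is self-contained. The `db_lock` attribute, the `log` table and the `record` function are assumptions added for illustration, and `sqlite3` stands in for the real database driver.

```python
import sqlite3
import threading


class c:
    """Attribute-only holder; in the post this would live in common.py."""
    db = None
    db_lock = threading.Lock()  # hypothetical: guards every use of c.db


# Done once at start-up. check_same_thread=False lets sqlite3 accept calls
# from other threads; the lock then serialises access to the connection.
c.db = sqlite3.connect(":memory:", check_same_thread=False)
c.db.execute("CREATE TABLE log (msg TEXT)")


def record(msg):
    # Any module that did "from common import c" could do the same.
    with c.db_lock:
        c.db.execute("INSERT INTO log VALUES (?)", (msg,))
        c.db.commit()


threads = [threading.Thread(target=record, args=("msg %d" % i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with c.db_lock:
    row_count = c.db.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

The lock makes each insert-and-commit pair atomic with respect to the other threads, at the cost of letting only one thread touch the database at a time.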

            HTH

            Frank Millman


            • Daniel Dittmar

              #7
              Re: Multiple modules with database access + general app design?

              Robin Haswell wrote:
              > On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:
              >> If you use a threading server, you can't put the connection object into
              >> the module. Modules and hence module variables are shared across
              >> threads. You could use thread local storage, but I think it's better to
              >> pass the connection explicitly as a parameter.
              >
              >
              > Would you say it would be better if in every thread I did:
              >
              > m = getattr(modules, module)
              > m.db = db
              >
              > ...
              >
              > def Foo():
              >     c = db.cursor()
              >

              I was thinking (example from original post):

              import modules
              modules.foo.Bar(db.cursor())

              # file modules.foo
              def Bar(cursor):
                  cursor.execute(...)

              The same is true for other objects like the HTTP request: always pass
              them as parameters because module variables are shared between threads.

              If you have an HTTP request object, then you could attach the database
              connection to that object, that way you have to pass only one object.

              Or you create a new class that encompasses everything useful for this
              request: the HTTP request, the database connection, possibly an object
              containing authorization info, etc.
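
One way the "single object per request" idea might look, combined with the name-based dispatch from the original question. This is a sketch for current Python 3 under stated assumptions: `RequestContext`, `list_sites`, `ACTIONS` and `dispatch` are hypothetical names, the `sites` table is invented, and `sqlite3` stands in for MySQLdb.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Hypothetical bundle of everything one request needs."""
    db: sqlite3.Connection
    user: Optional[str] = None


def list_sites(ctx):
    # The handler receives the context explicitly; nothing is read from
    # module-level variables, so threads cannot see each other's state.
    cur = ctx.db.cursor()
    try:
        cur.execute("SELECT name FROM sites WHERE owner = ?", (ctx.user,))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()


# Only names registered here can be reached from an incoming action.
ACTIONS = {"list_sites": list_sites}


def dispatch(ctx, action, *args):
    try:
        handler = ACTIONS[action]
    except KeyError:
        raise ValueError("unknown action: %r" % action)
    return handler(ctx, *args)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (name TEXT, owner TEXT)")
conn.execute("INSERT INTO sites VALUES ('example.org', 'rob')")
ctx = RequestContext(db=conn, user="rob")
result = dispatch(ctx, "list_sites")
```

An explicit table of allowed actions is used here instead of `getattr` on a module, so a request cannot name an arbitrary attribute.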

              I assume that in PHP, global still means 'local to this request', as PHP
              probably runs in threads under Windows IIS (and Apache 2.0?). In Python,
              you have to be more explicit about the scope.

              Daniel


              • Robin Haswell

                #8
                Re: Multiple modules with database access + general app design?

                On Thu, 19 Jan 2006 15:43:58 +0100, Daniel Dittmar wrote:
                > Robin Haswell wrote:
                >> On Thu, 19 Jan 2006 14:37:34 +0100, Daniel Dittmar wrote:
                >>> If you use a threading server, you can't put the connection object into
                >>> the module. Modules and hence module variables are shared across
                >>> threads. You could use thread local storage, but I think it's better to
                >>> pass the connection explicitly as a parameter.
                >>
                >>
                >> Would you say it would be better if in every thread I did:
                >>
                >> m = getattr(modules, module)
                >> m.db = db
                >>
                >> ...
                >>
                >> def Foo():
                >>     c = db.cursor()
                >>
                >
                > I was thinking (example from original post):
                >
                > import modules
                > modules.foo.Bar(db.cursor())
                >
                > # file modules.foo
                > def Bar(cursor):
                >     cursor.execute(...)

                Ah I see.. sounds interesting. Is it possible to make any module variable
                local to a thread, if set within the current thread? Your method, although
                good, would mean revising all my functions in order to make it work?

                Thanks


                • Robin Haswell

                  #9
                  Re: Multiple modules with database access + general app design?

                  On Thu, 19 Jan 2006 06:38:39 -0800, Frank Millman wrote:
                  > Robin Haswell wrote:
                  >> Hey people
                  >>
                  >> I'm an experienced PHP programmer who's been writing Python for a couple of
                  >> weeks now. I'm writing quite a large application which I've decided to
                  >> break down into lots of modules (replacement for PHP's include()
                  >> statement).
                  >>
                  >> My problem is, in PHP if you open a database connection it's always in
                  >> scope for the duration of the script. Even if you use an abstraction layer
                  >> ($db = DB::connect(...)) you can `global $db` and bring it into scope,
                  >> but in Python I'm having trouble keeping the database in scope. At the
                  >> moment I'm having to "push" the database into the module, but I'd prefer
                  >> the module to bring the database connection in ("pull") from its parent.
                  >>
                  >
                  > This is what I do.
                  >
                  > Create a separate module to contain your global variables - mine is
                  > called 'common'.
                  >
                  > In common, create a class, with attributes, but with no methods. Each
                  > attribute becomes a global variable. My class is called 'c'.
                  >
                  > At the top of every other module, put 'from common import c'.
                  >
                  > Within each module, you can now refer to any global variable as
                  > c.whatever.
                  >
                  > You can create class attributes on the fly. You can therefore have
                  > something like -
                  >
                  > c.db = MySql.connect(...)
                  >
                  > All modules will be able to access c.db
                  >
                  > As Daniel has indicated, it may not be safe to share one connection
                  > across multiple threads, unless you can guarantee that one thread
                  > completes its processing before another one attempts to access the
                  > database. You can use threading locks to assist with this.
                  >
                  > HTH
                  >
                  > Frank Millman


                  Thanks, that sounds like an excellent idea. While I don't think it applies
                  to the database (threading seems to be becoming a bit of an issue at the
                  moment), I know I can use that in other areas :-)

                  Cheers

                  -Rob


                  • Magnus Lycka

                    #10
                    Re: Multiple modules with database access + general app design?

                    Robin Haswell wrote:
                    > Can anyone give me advice on making this all a bit more transparent? I
                    > guess I really would like a method to bring all these files into the same
                    > scope to make everything seem to be all one application, even though
                    > everything is broken up into different files.

                    This is very much a deliberate design decision in Python.
                    I haven't used PHP, but in e.g. C, the #include directive
                    means that you pollute your namespace with all sorts of
                    strange names from all the third party libraries you are
                    using, and this doesn't scale well. As your application
                    grows, you'll get mysterious bugs due to strange name clashes,
                    removing some module you no longer need means that your app
                    won't build since the include file you no longer include in
                    turn included another file that you should have included but
                    didn't etc. In Python, explicit is better than implicit (type
                    "import this" at the Python prompt) and while this causes some
                    extra typing it helps with code maintenance. You can always
                    see where a name in your current namespace comes from (unless
                    you use "from xxx import *"). No magic!


                    Concerning your database operations, it seems they are distributed
                    over a lot of different modules, and that might also cause problems,
                    whatever programming language we use. In typical database
                    applications, you need to keep track of transactions properly.

                    For each opened connection, you can perform a number of transactions
                    after each other. A transaction starts with the first database
                    operation after a connect, commit or rollback. A cursor should only
                    live within a transaction. In other words, you should close all
                    cursors before you perform a commit or rollback.

                    I find it very difficult to manage transactions properly if the
                    commits are spread out in the code. Usually I want one module to
                    contain some kind of transaction management logic, where I determine
                    the transaction boundaries. This logic will hand out cursor objects
                    to various pieces of code, and determine when to close the cursors
                    and commit the transaction.
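
A minimal sketch of such transaction management logic, assuming current Python 3 and using `sqlite3` as a stand-in for any DB-API 2.0 driver. The `transaction` helper and the `users` table are hypothetical names chosen for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hand out one cursor, then close it and commit (or roll back on error)."""
    cur = conn.cursor()
    try:
        yield cur
    except Exception:
        cur.close()      # close the cursor before ending the transaction
        conn.rollback()
        raise
    else:
        cur.close()
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# This block completes normally, so its insert is committed.
with transaction(conn) as cur:
    cur.execute("INSERT INTO users VALUES (?)", ("alice",))

# This block fails part-way, so its insert is rolled back.
try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO users VALUES (?)", ("bob",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM users")]
```

Code that does the actual work only ever sees the cursor; the commit and rollback calls live in one place.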

                    I haven't really written multithreaded applications, so I don't
                    have any experience of the problems that might cause. I know that
                    it's a fairly common pattern to have all database transactions in
                    one thread though, and to use Queue.Queue instances to pass data
                    to and from the thread that handles DB.
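
The single-database-thread pattern could be sketched roughly as follows. This is written for current Python 3, where the module is spelled `queue` rather than `Queue`; `db_worker`, `run_query`, the `hits` table and the `None` shutdown sentinel are all assumptions made for the example, and `sqlite3` stands in for the real driver.

```python
import queue
import sqlite3
import threading


def db_worker(requests):
    # The connection is created and used only inside this thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (page TEXT)")
    while True:
        item = requests.get()
        if item is None:  # sentinel: shut down
            break
        sql, params, reply = item
        cur = conn.cursor()
        try:
            cur.execute(sql, params)
            rows = cur.fetchall()
        except Exception as exc:
            cur.close()
            conn.rollback()
            reply.put(exc)  # hand the error back to the caller
        else:
            cur.close()
            conn.commit()
            reply.put(rows)
    conn.close()


def run_query(requests, sql, params=()):
    # Called from any thread: send the statement, wait for the answer.
    reply = queue.Queue()
    requests.put((sql, params, reply))
    result = reply.get()
    if isinstance(result, Exception):
        raise result
    return result


requests = queue.Queue()
worker = threading.Thread(target=db_worker, args=(requests,))
worker.start()

run_query(requests, "INSERT INTO hits VALUES (?)", ("/index",))
rows = run_query(requests, "SELECT page FROM hits")

requests.put(None)
worker.join()
```

Here every statement is its own transaction, which sidesteps one thread's commit affecting another's work but gives up multi-statement transactions.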

                    Anyway, you can only have one transaction going on at a time for
                    a connection, so if you share connections between threads (or use
                    a separate DB thread and queues) a rollback or commit in one thread
                    will affect the other threads as well...

                    Each DB-API 2.0 compliant library should be able to declare how it
                    can be used in a threaded application. See the DB-API 2.0 spec:
                    http://python.org/peps/pep-0249.html Look for "threadsafety".
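
For reference, the attribute can be read straight off the driver module. The sketch below uses the standard-library `sqlite3` module only because it is always available; with MySQLdb the same attribute would be read from that module instead. The dictionary paraphrases the four levels defined in the spec, and `describe_threadsafety` is a made-up helper name.

```python
import sqlite3

# Paraphrase of the levels defined by DB-API 2.0 (PEP 249).
THREADSAFETY_MEANING = {
    0: "threads may not share the module",
    1: "threads may share the module, but not connections",
    2: "threads may share the module and connections",
    3: "threads may share the module, connections and cursors",
}


def describe_threadsafety(dbapi_module):
    # Every DB-API 2.0 module exposes an integer "threadsafety" global.
    level = dbapi_module.threadsafety
    return level, THREADSAFETY_MEANING[level]


level, meaning = describe_threadsafety(sqlite3)
print(level, meaning)
```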


                    • Daniel Dittmar

                      #11
                      Re: Multiple modules with database access + general app design?

                      Robin Haswell wrote:
                      > Ah I see.. sounds interesting. Is it possible to make any module variable
                      > local to a thread, if set within the current thread?

                      Not directly. The following class tries to simulate it (requires
                      Python 2.4 or later):

                      import threading

                      class ThreadLocalObject(threading.local):
                          def setObject(self, object):
                              setattr(self, 'object', object)

                          def clearObject(self):
                              setattr(self, 'object', None)

                          def __getattr__(self, name):
                              # only called when normal lookup fails, so forward
                              # the access to this thread's wrapped object
                              object = threading.local.__getattribute__(self, 'object')
                              return getattr(object, name)

                      You use it as:

                      in some module x:

                      db = ThreadLocalObject()

                      in some module that creates the database connection:

                      import x

                      def createConnection():
                          localdb = ...connect(...)
                          x.db.setObject(localdb)

                      in some module that uses the database connection:

                      import x

                      def bar():
                          cursor = x.db.cursor()

                      The trick is:
                      - every attribute of a threading.local is thread local (see doc of
                      module threading)
                      - when accessing an attribute of object x.db, the method __getattr__
                      will first retrieve the thread local database connection and then access
                      the specific attribute of the database connection. Thus it looks as if
                      x.db is itself a database connection object.

                      That way, only the setting of the db variable would have to be changed.

                      I'm not exactly recommending this, as it seems very error prone to me.
                      It's easy to overwrite the variable holding the cursors with an actual
                      cursor object.
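
The first point of "the trick" can be checked in isolation with a plain `threading.local` instance. This is a sketch for current Python 3; the string values are placeholders for real connection objects, and `local_state`, `seen` and `worker` are names made up for the example.

```python
import threading

local_state = threading.local()
barrier = threading.Barrier(2)  # makes both threads overlap in time
seen = {}


def worker(name):
    # Each thread stores its own value under the same attribute name.
    local_state.db = "connection-for-" + name  # placeholder for a connection
    barrier.wait()  # by now the other thread has set its value too
    seen[name] = local_state.db  # still this thread's own value


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set the attribute, so it does not exist here.
main_has_db = hasattr(local_state, "db")
```

Both threads write to `local_state.db` before either reads it back, yet each one still sees only the value it stored.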

                      Daniel


                      • Frank Millman

                        #12
                        Re: Multiple modules with database access + general app design?


                        Daniel Dittmar wrote:
                        > Robin Haswell wrote:
                        > > Ah I see.. sounds interesting. Is it possible to make any module variable
                        > > local to a thread, if set within the current thread?
                        >
                        > Not directly. The following class tries to simulate it (only in Python 2.4):
                        >
                        > import threading
                        >
                        > class ThreadLocalObject(threading.local):

                        Daniel, perhaps you can help me here.

                        I have subclassed threading.Thread, and I store a number of attributes
                        within the subclass that are local to the thread. It seems to work
                        fine, but according to what you say (and according to the Python docs,
                        otherwise why would there be a 'Local' class) there must be some reason
                        why it is not a good idea. Please can you explain the problem with this
                        approach.

                        Briefly, this is what I am doing.

                        import select
                        import socket
                        import threading

                        class Link(threading.Thread):  # each link runs in its own thread
                            """Run a loop listening for messages from client."""

                            def __init__(self, args):
                                threading.Thread.__init__(self)
                                print 'link connected', self.getName()
                                self.ctrl, self.conn = args
                                self._db = {}  # to store db connections for this client connection
                                [create various other local attributes]

                            def run(self):
                                readable = [self.conn.fileno()]
                                error = []
                                self.sendData = []  # 'stack' of replies to be sent

                                self.running = True
                                while self.running:
                                    if self.sendData:
                                        writable = [self.conn.fileno()]
                                    else:
                                        writable = []
                                    r, w, e = select.select(readable, writable, error, 0.1)  # 0.1 timeout
                                    [continue to handle connection]

                        class Controller(object):
                            """Run a main loop listening for client connections."""

                            def __init__(self):
                                self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                                self.s.bind((HOST, PORT))
                                self.s.listen(5)
                                self.running = True

                            def mainloop(self):
                                while self.running:
                                    try:
                                        conn, addr = self.s.accept()
                                        Link(args=(self, conn)).start()  # create thread to handle connection
                                    except KeyboardInterrupt:
                                        self.shutdown()

                        Controller().mainloop()

                        TIA

                        Frank Millman


                        • Daniel Dittmar

                          #13
                          Re: Multiple modules with database access + general app design?

                          Frank Millman wrote:
                          > I have subclassed threading.Thread, and I store a number of attributes
                          > within the subclass that are local to the thread. It seems to work
                          > fine, but according to what you say (and according to the Python docs,
                          > otherwise why would there be a 'Local' class) there must be some reason
                          > why it is not a good idea. Please can you explain the problem with this
                          > approach.

                          Your design is just fine. If you follow the thread upwards, you'll
                          notice that I encouraged the OP to pass everything by parameter.

                          Using thread local storage in this case was meant to be a kludge so that
                          not every def and every call has to be changed. There are other cases
                          when you don't control how threads are created (say, a plugin for web
                          framework) where thread local storage is useful.

                          threading.local is new in Python 2.4, so it doesn't seem to be that
                          essential to Python thread programming.

                          Daniel


                          • Frank Millman

                            #14
                            Re: Multiple modules with database access + general app design?


                            Daniel Dittmar wrote:
                            > Frank Millman wrote:
                            > > I have subclassed threading.Thread, and I store a number of attributes
                            > > within the subclass that are local to the thread. It seems to work
                            > > fine, but according to what you say (and according to the Python docs,
                            > > otherwise why would there be a 'Local' class) there must be some reason
                            > > why it is not a good idea. Please can you explain the problem with this
                            > > approach.
                            >
                            > Your design is just fine. If you follow the thread upwards, you'll
                            > notice that I encouraged the OP to pass everything by parameter.
                            >

                            Many thanks, Daniel

                            Frank
