Parallel insert to postgresql with thread

This topic is closed.
  • Abandoned

    Parallel insert to postgresql with thread

Hi,
I use the threading module for speed, but I have some problems.
This is my code sample:
=================
conn = psycopg2.connect(user='postgres', password='postgres', database='postgres')
cursor = conn.cursor()

class paralel(Thread):
    def __init__(self, veriler, sayii):
        Thread.__init__(self)
    def run(self):
        save(a, b, c)

def save(a, b, c):
    cursor.execute("INSERT INTO keywords (keyword) VALUES ('%s')" % a)
    conn.commit()
    cursor.execute("SELECT CURRVAL('keywords_keyword_id_seq')")
    idd = cursor.fetchall()
    return idd[0][0]

def start(hiz):
    datas = [........]
    for a in datas:
        current = paralel(a, sayii)
        current.start()
=================
When I try parallel inserts it gives me different errors. My queries
work in normal (serial) operation, but in parallel they don't.
How can I insert data into PostgreSQL at the same moment?
errors:
no results to fetch
cursor already closed

  • Diez B. Roggisch

    #2
    Re: Parallel insert to postgresql with thread

Abandoned wrote:
> Hi,
> I use the threading module for speed, but I have some problems.
> [code sample quoted above: one psycopg2 connection and cursor
> shared by every thread, with a commit after each INSERT]
> How can I insert data into PostgreSQL at the same moment?
> errors:
> no results to fetch
> cursor already closed
DB modules aren't necessarily thread-safe. Most of the time, a connection
(and of course its cursors) can't be shared between threads.

    So open a connection for each thread.

    Diez
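To make Diez's advice concrete, here is a minimal sketch of the one-connection-per-thread pattern. It is not from the thread: sqlite3 from the standard library stands in for psycopg2 so the example runs without a database server (with psycopg2 you would call `psycopg2.connect(...)` inside `run()` the same way), and the `Paralel` class and `keywords` table names are carried over from the original post. It also uses a parameterized query instead of `%` string interpolation.

```python
import os
import sqlite3
import tempfile
import threading

DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.db")

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT)"
    )
    conn.commit()
    conn.close()

class Paralel(threading.Thread):
    def __init__(self, keyword):
        threading.Thread.__init__(self)
        self.keyword = keyword

    def run(self):
        # Each thread opens (and closes) its own connection -- nothing shared.
        conn = sqlite3.connect(DB_PATH, timeout=10)
        cursor = conn.cursor()
        # Parameterized query instead of % string interpolation.
        cursor.execute(
            "INSERT INTO keywords (keyword) VALUES (?)", (self.keyword,)
        )
        conn.commit()
        conn.close()

init_db()
threads = [Paralel(word) for word in ["alpha", "beta", "gamma"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

conn = sqlite3.connect(DB_PATH)
count = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
conn.close()
print(count)  # 3
```

Because every thread has its own connection and cursor, the "cursor already closed" and "no results to fetch" errors from interleaved use of one shared cursor cannot occur.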


    • Scott David Daniels

      #3
      Re: Parallel insert to postgresql with thread

      Diez B. Roggisch wrote:
      Abandoned wrote:
      >
      >Hi..
      >I use the threading module for the fast operation. But ....
      [in each thread]
      >def save(a,b,c):
> cursor.execute("INSERT INTO ...
> conn.commit()
> cursor.execute(...)
      >How can i insert data to postgresql the same moment ?...
      >
      DB modules aren't necessarily thread-safe. Most of the times, a connection
      (and of course their cursor) can't be shared between threads.
      >
      So open a connection for each thread.
Note that your DB server will have to "serialize" your inserts, so
unless there is some other reason for the threads, a single thread
through a single connection to the DB is the way to go. Of course
it may be clever enough to behave "as if" they are serialized, but
most of your work parallelizing at your end simply creates new
work at the DB server end.

      -Scott David Daniels
Scott.Daniels@Acm.Org
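Scott's single-thread alternative can be sketched as one connection issuing a single batched `executemany()` call instead of one commit per row. This is an illustration, not code from the thread; sqlite3 stands in for psycopg2 (whose cursors expose the same DBAPI2 `executemany` method), and the table name follows the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, keyword TEXT)"
)
datas = [("alpha",), ("beta",), ("gamma",)]
# One batch and one commit instead of a commit after every row.
conn.executemany("INSERT INTO keywords (keyword) VALUES (?)", datas)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
conn.close()
print(count)  # 3
```

Batching like this avoids the per-row commit overhead that makes row-at-a-time inserts slow in the first place, which is often the real bottleneck the threading was meant to solve.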


      • Erik Jones

        #4
        Re: Parallel insert to postgresql with thread


        On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
        Diez B. Roggisch wrote:
        >Abandoned wrote:
        >>
        >>Hi..
        >>I use the threading module for the fast operation. But ....
        [in each thread]
        >>def save(a,b,c):
>> cursor.execute("INSERT INTO ...
>> conn.commit()
>> cursor.execute(...)
        >>How can i insert data to postgresql the same moment ?...
        >>
        >DB modules aren't necessarily thread-safe. Most of the times, a
        >connection
        >(and of course their cursor) can't be shared between threads.
        >>
        >So open a connection for each thread.
        >
        Note that your DB server will have to "serialize" your inserts, so
        unless there is some other reason for the threads, a single thread
        through a single connection to the DB is the way to go. Of course
        it may be clever enough to behave "as if" they are serialized, but
        mostly of your work parallelizing at your end simply creates new
        work at the DB server end.
Fortunately, in his case, that's not necessarily true. If they do
all their work with the same connection then, yes, but there are
other problems with that, as mentioned, regarding thread safety and
psycopg2. If he goes the recommended route with a separate connection
for each thread, then Postgres will not serialize multiple inserts
coming from separate connections unless there is something like an
ALTER TABLE or REINDEX concurrently happening on the table. The whole
serialized-inserts thing is strictly something popularized by MySQL
and is by no means necessary or standard (as with a lot of MySQL).

        Erik Jones

        Software Developer | Emma®
        erik@myemma.com
        800.595.4401 or 615.292.5888
        615.292.0777 (fax)

        Emma helps organizations everywhere communicate & market in style.
        Visit us online at http://www.myemma.com



        • Scott David Daniels

          #5
          Re: Parallel insert to postgresql with thread

          Erik Jones wrote:
          >
          On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
          >Diez B. Roggisch wrote:
          >>Abandoned wrote:
          >>>Hi..
          >>>I use the threading module for the fast operation. But ....
          >[in each thread]
          >>>def save(a,b,c):
>>> cursor.execute("INSERT INTO ...
>>> conn.commit()
>>> cursor.execute(...)
          >>>How can i insert data to postgresql the same moment ?...
          >>>
          >>DB modules aren't necessarily thread-safe. Most of the times, a
          >>connection (and ... cursor) can't be shared between threads.
          >>So open a connection for each thread.
          >>
          >Note that your DB server will have to "serialize" your inserts, so
          >... a single thread through a single connection to the DB is the way
          >to go. Of course it (the DB server) may be clever enough to behave
          >"as if" they are serialized, but most of your work parallelizing at
          >your end simply creates new work at the DB server end.
          >
          Fortunately, in his case, that's not necessarily true.... If he
          goes the recommended route with a separate connection for each thread,
          then Postgres will not serialize multiple inserts coming from separate
          connections unless there is something like and ALTER TABLE or REINDEX
          concurrently happening on the table.
          The whole serialized inserts thing is strictly something popularized
          by MySQL and is by no means necessary or standard (as with a lot of
          MySQL).
But he commits after every insert, which _does_ force serialization (if
only to provide safe transaction boundaries). I understand you can get
clever at how to do it, _but_ preserving ACID properties is exactly what
I mean by "serialize," and while I like to bash MySQL as well as the
next person, I most certainly am not under the evil sway of the vile
MySQL cabal.

          The server will have to be able to abort each transaction
          _independently_ of the others, and so must serialize any index
          updates that share a page by, for example, landing in the same node
          of a B-Tree.

          -Scott David Daniels
Scott.Daniels@Acm.Org


          • Erik Jones

            #6
            OT Re: Parallel insert to postgresql with thread

If you're not Scott Daniels, beware that this conversation has gone
horribly off topic and, unless you have an interest in PostgreSQL, you
may not want to bother reading on...

            On Oct 25, 2007, at 9:46 PM, Scott David Daniels wrote:
            Erik Jones wrote:
            >>
            >On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
            >>Diez B. Roggisch wrote:
            >>>Abandoned wrote:
            >>>>Hi..
            >>>>I use the threading module for the fast operation. But ....
            >>[in each thread]
            >>>>def save(a,b,c):
>>>> cursor.execute("INSERT INTO ...
>>>> conn.commit()
>>>> cursor.execute(...)
            >>>>How can i insert data to postgresql the same moment ?...
            >>>>
            >>>DB modules aren't necessarily thread-safe. Most of the times, a
            >>>connection (and ... cursor) can't be shared between threads.
            >>>So open a connection for each thread.
            >>>
            >>Note that your DB server will have to "serialize" your inserts, so
            >>... a single thread through a single connection to the DB is the way
            >>to go. Of course it (the DB server) may be clever enough to behave
            >>"as if" they are serialized, but most of your work parallelizing at
            >>your end simply creates new work at the DB server end.
            >>
>Fortunately, in his case, that's not necessarily true.... If he
>goes the recommended route with a separate connection for each thread,
>then Postgres will not serialize multiple inserts coming from separate
>connections unless there is something like an ALTER TABLE or REINDEX
>concurrently happening on the table.
>The whole serialized inserts thing is strictly something popularized
>by MySQL and is by no means necessary or standard (as with a lot of
>MySQL).
>
But he commits after every insert, which _does_ force serialization (if
only to provide safe transaction boundaries). I understand you can get
clever at how to do it, _but_ preserving ACID properties is exactly what
I mean by "serialize,"
First, it's a bad idea to work with your own definition of a very
domain-specific and standardized term, especially when Postgres's
Multi-Version Concurrency Control mechanisms are designed specifically
to preserve ACID compliance without forcing serialized transactions on
the user.

Second, unless he specifically sets his transaction isolation level to
serializable, he will be working in read-committed mode. What this
specifically means is that two (or more) transactions writing to the
same table will not block each other. Let's say the user has two
concurrent inserts to run on the same table that, for whatever reason,
take a while to run (for example, they insert the results of some
horribly complex or inefficient select). If either is run in
serializable mode, then whichever one starts a fraction of a second
sooner will run until completion before the second is even allowed to
begin. In (the default) read-committed mode they will both begin
executing as soon as they are called and will write their data
regardless of conflicts. Commit time (which may be some time later
when transactions with multiple statements are used) is when conflicts
are resolved. So, if between the two example transactions there does
turn out to be a conflict between their results, whichever commits
second will roll back and, since the data written by the second
transaction will not be marked as committed, it will never be visible
to any other transactions, and the space will remain available for
future transactions.

            Here's the relevant portion of the Postgres docs on all of this:

            and while I like to bash MySQL as well as the
            next person, I most certainly am not under the evil sway of the vile
            MySQL cabal.
            Good to hear ;)
            >
            The server will have to be able to abort each transaction
            _independently_ of the others, and so must serialize any index
            updates that share a page by, for example, landing in the same node
            of a B-Tree.
There is nothing inherent in B-Trees that prevents identical datums
from being written to them. If there were, the only thing they'd be
good for would be unique indexes. Even if you do use a unique index,
as noted above, constraints and conflicts are only enforced at commit
time.

            Erik Jones

            Software Developer | Emma®
            erik@myemma.com
            800.595.4401 or 615.292.5888
            615.292.0777 (fax)

            Emma helps organizations everywhere communicate & market in style.
            Visit us online at http://www.myemma.com



            • Laurent Pointal

              #7
              Re: Parallel insert to postgresql with thread

On Thu, 25 Oct 2007 13:27:40 +0200, Diez B. Roggisch wrote:
              DB modules aren't necessarily thread-safe. Most of the times, a
              connection (and of course their cursor) can't be shared between threads.
              >
              So open a connection for each thread.
              >
              Diez
              DB modules following DBAPI2 must define the following attribute:

              """
              threadsafety

              Integer constant stating the level of thread safety the
              interface supports. Possible values are:

0   Threads may not share the module.
1   Threads may share the module, but not connections.
2   Threads may share the module and connections.
3   Threads may share the module, connections and cursors.
              """

              This API has been defined to encourage similarity between the Python modules that are used to access databases. By doing this, we hope to achieve a consistency leading to more easily understood modules, code that is generally more portable across datab...
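A quick sketch of how that attribute can be consulted before deciding to share anything between threads. This is an illustration, not from the thread: sqlite3 is used because it ships with Python; psycopg2 exposes the same DBAPI2 attribute.

```python
import sqlite3

# DBAPI2 threadsafety levels:
# 0: nothing shareable; 1: module only; 2: + connections; 3: + cursors
level = sqlite3.threadsafety
if level < 2:
    print("Give each thread its own connection")
else:
    print("Connections may be shared across threads")
```

Checking `threadsafety` at startup makes the per-thread-connection decision explicit instead of relying on trial and error.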




              --
Laurent POINTAL - laurent.pointal@laposte.net
