(in memory) database

  • mark

    (in memory) database

    Hi there,

    I need to extract data from text files (~4 GB). On this data some
    operations are performed, like avg, max, min, group etc. The result is
    formatted and written to some other text files (some KB).

    I currently think that database tools might be suitable for this. I
    would just write the import from the text files and ... the tool does
    the rest. The only problem I can imagine is that this would not be
    fast enough. But I would give it a shot.
    Unfortunately I have only some knowledge of SQLite, which is not an
    option here.

    Some additional requirements I can think of are:
    - Python (I want to hone my programming skills too)
    - Python-only (no C-lib) for simplicity (installation, portability).
    Therefore SQLite is not an option
    - must be fast
    - I like SQL (select a, b from ...) this would be nice (row[..] + ...
    is a little hard getting used to)

    So far I found PyDBLite, PyTables, Buzhug but they are difficult to
    compare for a beginner.

    Cheers,
    Mark
  • Fredrik Lundh

    #2
    Re: (in memory) database

    mark wrote:
    > I need to extract data from text files (~4 GB). On this data some
    > operations are performed, like avg, max, min, group etc. The result is
    > formatted and written to some other text files (some KB).

    you could probably do all that with data stream processing, but if you
    haven't worked with such algorithms, just stuffing it all in a database
    is probably less work for you (if not for your CPU).

    > Unfortunately I have only some knowledge of SQLite, which is not an
    > option here.

    why is sqlite not an option? it's bundled with Python these days,
    and should be available (or trivial to install) on all major deployment
    platforms.

    </F>
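
For readers who have not met the "data stream processing" idea before, here is a minimal sketch of what it could look like for the avg/max/min/group case. The input layout is an assumption (the thread never describes the files): one record per line, a group key and a numeric value separated by a tab. The function name `aggregate_stream` is made up for illustration. The point is only that a single pass with running totals never needs the 4 GB in memory.

```python
from collections import defaultdict


def aggregate_stream(lines, sep="\t"):
    """Single pass over 'key<sep>value' lines, keeping only running stats per key.

    The 'key<sep>value' layout is an assumed format, not taken from the thread.
    """
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "min": None, "max": None})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, raw = line.split(sep, 1)
        value = float(raw)
        s = stats[key]
        s["count"] += 1
        s["total"] += value
        s["min"] = value if s["min"] is None else min(s["min"], value)
        s["max"] = value if s["max"] is None else max(s["max"], value)
    # Turn the running totals into the final per-group summary.
    return {
        key: {
            "avg": s["total"] / s["count"],
            "min": s["min"],
            "max": s["max"],
            "count": s["count"],
        }
        for key, s in stats.items()
    }


sample = ["a\t1.0", "b\t10.0", "a\t3.0", "", "b\t20.0"]
result = aggregate_stream(sample)
```

Because an open file object is itself an iterator over lines, something like `aggregate_stream(open(path))` would process a large file without loading it whole; memory use grows with the number of groups, not the number of rows.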


    • Cameron Laird

      #3
      Re: (in memory) database

      In article <mailman.297.1220190030.3487.python-list@python.org>,
      Fredrik Lundh <fredrik@pythonware.com> wrote:
      >mark wrote:


      • Paul Boddie

        #4
        Re: (in memory) database

        On 31 Aug, 16:45, cla...@lairds.us (Cameron Laird) wrote:
        > Yes and no. My own experience with Debian packages is that with a
        > standard
        >   apt-get install python2.5
        > an attempt to
        >   import sqlite3
        > results in
        >   ImportError: No module named _sqlite3

        That's strange from the perspective of the Debian package information:

        http://packages.debian.org/etch/python2.5
        http://packages.debian.org/lenny/python2.5

        Both have libsqlite3-0 as a dependency. On my Ubuntu system, the same
        dependency applies.

        > that is, <URL:https://bugzilla.novell.com/show_bug.cgi?id=228733>.

        I'm not sure Novell can help with the matter, though. ;-)

        > I recognize the error was resolved nearly two years ago,
        > but I, for one, don't understand how to express the resolution in
        > terms of Debian packages. Is there a way to install Python and have
        > it manage SQLite3 correctly withOUT configuring recent sources "by
        > hand"?

        Which Debian version and which package repository? I imagine that
        there may have been backports of Python 2.5 to Debian 3.1 (Sarge) and
        earlier, but my own experience with sqlite prior to running Python 2.5
        on Ubuntu involved use of the pysqlite2 module with Python 2.4
        instead. Since Python 2.5 became the default on Ubuntu, I don't recall
        having any problems with sqlite.

        Paul


        • Cameron Laird

          #5
          Re: (in memory) database

          In article <b68940e4-78cc-4fbb-94cd-69478d45f96c@26g2000hsk.googlegroups.com>,
          Paul Boddie <paul@boddie.org.uk> wrote:
          >On 31 Aug, 16:45, cla...@lairds.us (Cameron Laird) wrote:
          >Yes and no. My own experience with Debian packages is that with a
          >standard
          > apt-get install python2.5
          >an attempt to
          > import sqlite3
          >results in
          > ImportError: No module named _sqlite3
          >
          >That's strange from the perspective of the Debian package information:
          >
          >http://packages.debian.org/etch/python2.5
          >http://packages.debian.org/lenny/python2.5
          >
          >Both have libsqlite3-0 as a dependency. On my Ubuntu system, the same
          >dependency applies.
          >
          >that is, <URL:https://bugzilla.novell.com/show_bug.cgi?id=228733>.
          >
          >I'm not sure Novell can help with the matter, though. ;-)
          >
          >I recognize the error was resolved nearly two years ago,
          >but I, for one, don't understand how to express the resolution in
          >terms of Debian packages. Is there a way to install Python and have
          >it manage SQLite3 correctly withOUT configuring recent sources "by
          >hand"?
          >
          >Which Debian version and which package repository? I imagine that
          >there may have been backports of Python 2.5 to Debian 3.1 (Sarge) and
          >earlier, but my own experience with sqlite prior to running Python 2.5
          >on Ubuntu involved use of the pysqlite2 module with Python 2.4
          >instead. Since Python 2.5 became the default on Ubuntu, I don't recall
          >having any problems with sqlite.
          >
          >Paul
          Thanks for pursuing this, Paul. You have me curious now.

          Let's take a definite example: I have a convenient
          Ubuntu 8.04.1
          The content of /etc/apt/sources.list is
          deb http://us.archive.ubuntu.com/ubuntu hardy main restricted
          deb http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted
          deb http://us.archive.ubuntu.com/ubuntu hardy universe multiverse
          deb http://security.ubuntu.com/ubuntu hardy-security main restricted
          I do
          apt-get update
          apt-get upgrade
          apt-get install python2.5
          then
          # python2.5
          Python 2.5 (r25:51908, Dec 11 2006, 21:09:56)
          [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
          Type "help", "copyright", "credits" or "license" for more information.
          >>> import sqlite3
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
            File "/usr/local/lib/python2.5/sqlite3/__init__.py", line 24, in <module>
              from dbapi2 import *
            File "/usr/local/lib/python2.5/sqlite3/dbapi2.py", line 27, in <module>
              from _sqlite3 import *
          ImportError: No module named _sqlite3

          How do you interpret this?


          • Paul Boddie

            #6
            Re: (in memory) database

            On 31 Aug, 20:05, cla...@lairds.us (Cameron Laird) wrote:
            >
            > Let's take a definite example: I have a convenient
            >   Ubuntu 8.04.1
            > The content of /etc/apt/sources.list is
            >   deb http://us.archive.ubuntu.com/ubuntu hardy main restricted
            >   deb http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted
            >   deb http://us.archive.ubuntu.com/ubuntu hardy universe multiverse
            >   deb http://security.ubuntu.com/ubuntu hardy-security main restricted
            > I do
            >   apt-get update
            >   apt-get upgrade
            >   apt-get install python2.5
            > then
            >   # python2.5
            >   Python 2.5 (r25:51908, Dec 11 2006, 21:09:56)
            >   [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
            >   Type "help", "copyright", "credits" or "license" for more information.
            >   >>> import sqlite3
            >   Traceback (most recent call last):
            >     File "<stdin>", line 1, in <module>
            >     File "/usr/local/lib/python2.5/sqlite3/__init__.py", line 24, in <module>
            >       from dbapi2 import *
            >     File "/usr/local/lib/python2.5/sqlite3/dbapi2.py", line 27, in <module>
            >       from _sqlite3 import *
            >   ImportError: No module named _sqlite3
            >
            > How do you interpret this?

            What do you get if you run this command...?

            dpkg -s python2.5

            For me, I get something which mentions the following:

            Package: python2.5

            [...]

            Depends: python2.5-minimal (= 2.5.1-0ubuntu1.2), mime-support,
            libbz2-1.0, libc6 (>= 2.5-0ubuntu1), libdb4.4,
            libncursesw5 (>= 5.4-5), libreadline5 (>= 5.2),
            libsqlite3-0 (>= 3.3.13), libssl0.9.8 (>= 0.9.8c-1)

            Note the presence of the libsqlite3-0 package. In addition, you should
            have the sqlite3 extension module somewhere:

            locate sqlite3.so

            This should tell you where the sqlite libraries are as well as where
            the extension module is. For me, I get something which includes the
            following:

            /usr/lib/python2.5/lib-dynload/_sqlite3.so
            /usr/lib/libsqlite3.so.0

            Passing one of these to "dpkg -S" should say which package provided
            it.

            The strange thing is that the Ubuntu package information for your
            version does mention the sqlite dependency and include the extension
            module in the list of files:



            You can run the following command to see whether your python2.5
            package really provides the extension module:

            dpkg --listfiles python2.5

            Even if the sqlite library is installed, if that package doesn't
            provide the extension module, something must be wrong with it because
            it should be there.

            Paul


            • Cameron Laird

              #7
              Re: (in memory) database

              In article <eed3b104-d08d-40e9-8608-a8bee65b3a68@z72g2000hsb.googlegroups.com>,
              Paul Boddie <paul@boddie.org.uk> wrote:
              >On 31 Aug, 20:05, cla...@lairds.us (Cameron Laird) wrote:
              >>
              >>Let's take a definite example: I have a convenient
              >> Ubuntu 8.04.1
              >>The content of /etc/apt/sources.list is
              >> deb http://us.archive.ubuntu.com/ubuntu hardy main restricted
              >> deb http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted
              >> deb http://us.archive.ubuntu.com/ubuntu hardy universe multiverse
              >> deb http://security.ubuntu.com/ubuntu hardy-security main restricted
              >>I do
              >> apt-get update
              >> apt-get upgrade
              >> apt-get install python2.5
              >>then
              >> # python2.5
              >> Python 2.5 (r25:51908, Dec 11 2006, 21:09:56)
              >> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
              >> Type "help", "copyright", "credits" or "license" for more information.
              >> >>> import sqlite3
              >> Traceback (most recent call last):
              >>   File "<stdin>", line 1, in <module>
              >>   File "/usr/local/lib/python2.5/sqlite3/__init__.py", line 24, in <module>
              >>     from dbapi2 import *
              >>   File "/usr/local/lib/python2.5/sqlite3/dbapi2.py", line 27, in <module>
              >>     from _sqlite3 import *
              >> ImportError: No module named _sqlite3
              >>
              >>How do you interpret this?
              >
              >What do you get if you run this command...?
              >
              > dpkg -s python2.5
              >
              >For me, I get something which mentions the following:
              >
              > Package: python2.5
              >
              > [...]
              >
              > Depends: python2.5-minimal (= 2.5.1-0ubuntu1.2), mime-support,
              > libbz2-1.0, libc6 (>= 2.5-0ubuntu1), libdb4.4,
              > libncursesw5 (>= 5.4-5), libreadline5 (>= 5.2),
              > libsqlite3-0 (>= 3.3.13), libssl0.9.8 (>= 0.9.8c-1)

              For me:
              Depends: libbz2-1.0, libc6 (>= 2.4), libdb4.6, libncursesw5 (>= 5.6+20071006-3), libreadline5 (>= 5.2), libsqlite3-0 (>= 3.4.2), libssl0.9.8 (>= 0.9.8f-1), mime-support, python2.5-minimal (= 2.5.2-2ubuntu4.1)
              >
              >Note the presence of the libsqlite3-0 package. In addition, you should
              >have the sqlite3 extension module somewhere:
              >
              > locate sqlite3.so
              >
              >This should tell you where the sqlite libraries are as well as where
              >the extension module is. For me, I get something which includes the
              >following:
              >
              > /usr/lib/python2.5/lib-dynload/_sqlite3.so
              > /usr/lib/libsqlite3.so.0

              /usr/lib/python2.5/lib-dynload/_sqlite3.so
              /usr/lib/libsqlite3.so.0.8.6
              /usr/lib/xulrunner-1.9.0.1/libsqlite3.so
              /usr/lib/xulrunner-1.9.0.1/libsqlite3.so.0
              /usr/lib/libsqlite3.so.0
              /usr/lib/libsqlite3.so
              >
              >Passing one of these to "dpkg -S" should say which package provided
              >it.
              libsqlite3-dev: /usr/lib/libsqlite3.so
              >
              >The strange thing is that the Ubuntu package information for your
              >version does mention the sqlite dependency and include the extension
              >module in the list of files:
              >

              >
              >You can run the following command to see whether your python2.5
              >package really provides the extension module:
              >
              > dpkg --listfiles python2.5

              # dpkg --listfiles python2.5 | grep sqli
              /usr/lib/python2.5/sqlite3
              /usr/lib/python2.5/sqlite3/test
              /usr/lib/python2.5/sqlite3/test/__init__.py
              /usr/lib/python2.5/sqlite3/test/dbapi.py
              /usr/lib/python2.5/sqlite3/test/factory.py
              /usr/lib/python2.5/sqlite3/test/hooks.py
              /usr/lib/python2.5/sqlite3/test/regression.py
              /usr/lib/python2.5/sqlite3/test/transactions.py
              /usr/lib/python2.5/sqlite3/test/types.py
              /usr/lib/python2.5/sqlite3/test/userfunctions.py
              /usr/lib/python2.5/sqlite3/__init__.py
              /usr/lib/python2.5/sqlite3/dbapi2.py
              /usr/lib/python2.5/lib-dynload/_sqlite3.so
              >
              >Even if the sqlite library is installed, if that package doesn't
              >provide the extension module, something must be wrong with it because
              >it should be there.
              >
              >Paul
              I'm certainly perplexed, and welcome suggestions.


              • Paul Boddie

                #8
                Re: (in memory) database

                On 31 Aug, 21:29, cla...@lairds.us (Cameron Laird) wrote:
                >
                [Lots of output suggesting correct package configuration]
                >
                > I'm certainly perplexed, and welcome suggestions.

                Maybe...

                which python

                I think Jean-Paul might be on to something with his response. Are we
                referring to the system-packaged Python? There's always "python -v"
                and/or "strace python" for full details of what might be happening
                otherwise.

                Paul
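
The same question ("which Python is this, and where would it load sqlite3 from?") can also be asked from inside the interpreter. The sketch below is an illustration only and assumes a modern Python 3, since `importlib.util.find_spec` does not exist in the Python 2.5 discussed in this thread; the helper names `describe_module` and `report` are made up for the example.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the location a top-level module would be loaded from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


def report():
    """Collect the facts that matter for a 'No module named _sqlite3' error."""
    return {
        "executable": sys.executable,            # which binary is actually running
        "prefix": sys.prefix,                    # e.g. /usr versus /usr/local
        "sqlite3": describe_module("sqlite3"),    # the pure-Python package
        "_sqlite3": describe_module("_sqlite3"),  # the C extension it needs
    }


if __name__ == "__main__":
    for key, value in report().items():
        print("%-10s %s" % (key, value))
```

If `sqlite3` resolves to a path but `_sqlite3` comes back as `None`, the interpreter was most likely built without the SQLite headers available, whatever the distribution packages say.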


                • Cousin Stanley

                  #9
                  Re: (in memory) database

                  > ....
                  > Yes and no. My own experience with Debian packages
                  > is that with a standard
                  >
                  >   apt-get install python2.5
                  >
                  > an attempt to
                  >
                  >   import sqlite3
                  >
                  > results in
                  >
                  >   ImportError: No module named _sqlite3
                  > ....

                  No problems here with Debian Lenny ....

                  All packages via .... apt-get install xxxx ....

                  $ uname -a
                  Linux em1 2.6.25-2-686 #1 SMP Fri Jul 18 17:46:56 UTC 2008 i686 GNU/Linux

                  $ dpkg -l | grep sqlite
                  ii libhk-classes-sqlite3 0.8.3-4 SQLite 3 driver plugin for hk_classes
                  ii libsqlite3-0 3.5.9-3 SQLite 3 shared library
                  ii python-pysqlite2 2.4.1-1 Python interface to SQLite 3
                  ii sqlite3 3.5.9-3 A command line interface for SQLite 3

                  $ py
                  Python 2.5.2 (r252:60911, Aug 8 2008, 09:22:44)
                  [GCC 4.3.1] on linux2
                  Type "help", "copyright", "credits" or "license" for more information.
                  >>>
                  >>> import sqlite3
                  >>>

                  --
                  Stanley C. Kitching
                  Human Being
                  Phoenix, Arizona


                  • Cousin Stanley

                    #10
                    Re: (in memory) database

                    > ....
                    > Yes and no. My own experience with Debian packages
                    > is that with a standard
                    >   apt-get install python2.5
                    > an attempt to
                    >   import sqlite3
                    > results in
                    >   ImportError: No module named _sqlite3
                    > ....

                    From Kubuntu 8.04 ....

                    $ uname -a
                    Linux em1 2.6.24-19-generic #1 SMP
                    Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

                    $ dpkg -l | grep sqlite
                    ii libsqlite0 2.8.17-4build1 SQLite shared library
                    ii libsqlite3-0 3.4.2-2 SQLite 3 shared library

                    $ py25
                    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
                    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
                    Type "help", "copyright", "credits" or "license" for more information.
                    >>>
                    >>> import sqlite3
                    >>>

                    It is now my estimation that the Force
                    is not currently with you .... :-)

                    --
                    Stanley C. Kitching
                    Human Being
                    Phoenix, Arizona


                    • Bruno Desthuilliers

                      #11
                      Re: (in memory) database

                      mark wrote:
                      > Hi there,
                      >
                      > I need to extract data from text files (~4 GB). On this data some
                      > operations are performed, like avg, max, min, group etc. The result is
                      > formatted and written to some other text files (some KB).
                      >
                      > I currently think that database tools might be suitable for this. I
                      > would just write the import from the text files and ... the tool does
                      > the rest. The only problem I can imagine is that this would not be
                      > fast enough.

                      Is this an a priori assumption, or did you actually benchmark it and
                      find that it would not fit your requirements?

                      > But I would give it a shot.
                      > Unfortunately I have only some knowledge of SQLite, which is not an
                      > option here.
                      >
                      > Some additional requirements I can think of are:
                      > - Python (I want to hone my programming skills too)
                      > - Python-only (no C-lib) for simplicity (installation, portability).
                      > Therefore SQLite is not an option
                      > - must be fast

                      These two requirements can conflict for some values of "fast".

                      > - I like SQL (select a, b from ...) this would be nice (row[..] + ...
                      > is a little hard getting used to)
                      >
                      > So far I found PyDBLite, PyTables, Buzhug but they are difficult to
                      > compare for a beginner.

                      Never used any of them (I have sqlite, mysql and pgsql installed on all
                      my machines), so I can't help here.
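
One way to replace the "a priori" with a measurement is a small timing run on synthetic data. The sketch below uses the standard library's `sqlite3` with an in-memory database purely as a stand-in (the original poster ruled SQLite out, so treat this as a baseline to compare other tools against). The table name, column names and row count are all assumptions made for the example.

```python
import sqlite3
import time


def load_and_query(rows):
    """Load (group, value) rows into an in-memory table and aggregate per group."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: one text key and one numeric value per record.
        conn.execute("CREATE TABLE samples (grp TEXT, value REAL)")
        conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT grp, AVG(value), MIN(value), MAX(value) "
            "FROM samples GROUP BY grp ORDER BY grp"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Synthetic data: 30000 rows spread over three groups.
rows = [("g%d" % (i % 3), float(i)) for i in range(30000)]

start = time.perf_counter()
summary = load_and_query(rows)
elapsed = time.perf_counter() - start

print("groups: %d, elapsed: %.3f s" % (len(summary), elapsed))
```

Scaling the row count up towards the real data volume, and timing the import and the query separately, would show whether "fast enough" is actually at risk.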


                      • M.-A. Lemburg

                        #12
                        Re: (in memory) database

                        On 2008-08-31 15:15, mark wrote:
                        > Hi there,
                        >
                        > I need to extract data from text files (~4 GB). On this data some
                        > operations are performed, like avg, max, min, group etc. The result is
                        > formatted and written to some other text files (some KB).
                        >
                        > I currently think that database tools might be suitable for this. I
                        > would just write the import from the text files and ... the tool does
                        > the rest. The only problem I can imagine is that this would not be
                        > fast enough. But I would give it a shot.
                        > Unfortunately I have only some knowledge of SQLite, which is not an
                        > option here.
                        >
                        > Some additional requirements I can think of are:
                        > - Python (I want to hone my programming skills too)
                        > - Python-only (no C-lib) for simplicity (installation, portability).
                        > Therefore SQLite is not an option
                        > - must be fast
                        > - I like SQL (select a, b from ...) this would be nice (row[..] + ...
                        > is a little hard getting used to)
                        >
                        > So far I found PyDBLite, PyTables, Buzhug but they are difficult to
                        > compare for a beginner.

                        You could use Gadfly for this since it is pure Python and provides
                        a standard Python DB-API interface:



                        (the C extensions are optional to speedup processing)

                        This is the SQL subset it supports:



                        Another option is SnakeSQL:



                        but I've never used that one, so can't judge its quality.

                        --
                        Marc-Andre Lemburg
                        eGenix.com

                        Professional Python Services directly from the Source (#1, Sep 01 2008)
                        >>> Python/Zope Consulting and Support ... http://www.egenix.com/
                        >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
                        >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
                        ________________________________________________________________________

                        :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


                        eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
                        D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
                        Registered at Amtsgericht Duesseldorf: HRB 46611


                        • Zentrader

                          #13
                          Re: (in memory) database

                          > I don't understand why Cameron has a different version of Python which
                          > doesn't seem to have sqlite support enabled.

                          Agreed, but won't the package manager tell him if python-sqlite is
                          installed? That would be the next step since it appears that SQLite
                          itself is already installed. Since Ubuntu uses precompiled binaries,
                          Python should be configured for SQLite, which again leaves a missing
                          python-sqlite as the only possibility (yeah right). BTW Python is easy to
                          install manually.


                          • Peter Otten

                            #14
                            Re: (in memory) database

                            Zentrader wrote:
                            >> I don't understand why Cameron has a different version of Python which
                            >> doesn't seem to have sqlite support enabled.
                            >
                            > Agreed, but won't the package manager tell him if python-sqlite is
                            > installed? That would be the next step since it appears that SQLite
                            > itself is already installed. Since Ubuntu uses precompiled binaries,
                            > Python should be configured for SQLite, which again leaves a missing
                            > python-sqlite as the only possibility (yeah right). BTW Python is easy to
                            > install manually.

                            When you install Python manually from source you need the header files for
                            sqlite3 to get sqlite3 support. These are in the libsqlite3-dev package.

                            I think you can distinguish a manually installed python from the packaged
                            one by the .../local/... in its path, e. g., on my machine

                            $ which python2.5 # in the distribution
                            /usr/bin/python2.5
                            $ which python2.6
                            /usr/local/bin/python2.6 # installed from source

                            I have installed libsqlite3-dev so I can't reproduce Cameron's error, but
                            here's a similar one for bsddb:

                            $ python2.5
                            Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:43)
                            [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
                            Type "help", "copyright", "credits" or "license" for more information.
                            >>> import bsddb
                            >>> bsddb.__file__
                            '/usr/lib/python2.5/bsddb/__init__.pyc'

                            $ python2.6
                            Python 2.6b2+ (trunk:65902, Aug 20 2008, 08:38:26)
                            [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
                            Type "help", "copyright", "credits" or "license" for more information.
                            >>> import bsddb
                            Traceback (most recent call last):
                              File "<stdin>", line 1, in <module>
                              File "/usr/local/lib/python2.6/bsddb/__init__.py", line 58, in <module>
                                import _bsddb
                            ImportError: No module named _bsddb

                            Peter

                            PS: Yes, I'm using 2.6, but I don't think that's relevant for the problem.
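
The ".../local/..." rule of thumb can be written down as a tiny check. This is only a heuristic sketch with a made-up function name, `looks_locally_built`; it assumes POSIX-style paths and will misjudge systems where packaged interpreters also live under a `local` directory.

```python
from pathlib import PurePosixPath


def looks_locally_built(executable):
    """Heuristic: a 'local' directory in the interpreter path hints at a source install."""
    directories = PurePosixPath(executable).parts[:-1]  # drop the file name itself
    return "local" in directories
```

Passing `sys.executable`, or the first path from a traceback, shows at a glance which side of the packaged-versus-manual divide an interpreter probably falls on.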


                            • Paul Boddie

                              #15
                              Re: (in memory) database

                              On 2 Sep, 17:38, Zentrader <zentrad...@gmail.com> wrote:
                              >> I don't understand why Cameron has a different version of Python which
                              >> doesn't seem to have sqlite support enabled.
                              >
                              > Agreed, but won't the package manager tell him if python-sqlite is
                              > installed?

                              It shouldn't need to be installed: the python2.5 package includes the
                              sqlite3 module and the _sqlite3 extension module. He's running a more
                              modern version of Ubuntu than I am, but I don't think that they've
                              reintroduced the python-sqlite package in any form.

                              > That would be the next step since it appears that SQLite
                              > itself is already installed. Since Ubuntu uses precompiled binaries,
                              > Python should be configured for SQLite, which again leaves a missing
                              > python-sqlite as the only possibility (yeah right). BTW Python is easy to
                              > install manually.

                              Indeed, which is why I think that there must be a manually installed
                              Python on his system, especially given that
                              /usr/local/lib/python2.5/sqlite3/__init__.py is one of the files
                              mentioned in the traceback.

                              Paul
