Python plain-text database or library that supports joins?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • felciano

    Python plain-text database or library that supports joins?

    Hello --

    Is there a convention, library or Pythonic idiom for performing
    lightweight relational operations on flatfiles? I frequently find
    myself writing code to do simple SQL-like operations between flat
    files, such as appending columns from one file to another, linked
    through a common id. For example, take a list of addresses and append
    a 'district' field by looking up a congressional district from a
    second file that maps zip codes to districts.

    Conceptually this is a simple database operation with a join on a
    common field (zip code in the above example). Other case use other
    relational operators (projection, cross-product, etc) so I'm really
    looking for something SQL-like in functionality. However, the data is
    in flat-files, the file structure changes frequently, the files are
    dynamically generated from a range of sources, are short-lived in
    nature, and otherwise not warrant the hassle of a database setup. So
    I've been looking around for a nice, Pythonic, zero-config (no
    parsers, no setup/teardown, etc) solution for simple queries that
    handles a database of csv-files-with-headers automatically. There are
    number of solutions that are close, but in the end come up short:

    - KirbyBase 1.9 (latest Python version) is the closest that I could
    find, as it lets you keep your data in flatfiles and perform
    operations using the field names from those text-based tables, but it
    doesn't support joins (the more recent Ruby version seems to).
    - Buzhug and Sqlite have their data structures w no automatic .tab
    or .csv parsing (unless sqlite includes a way to map flatfiles to
    sqlite virtual tables that I don't know about).
    - http://aspn.activestate.com/ASPN/Coo.../Recipe/159974 is
    heading in the right direction, as it shows how to perform relational
    operations on lists and are index based rather than field-name based.
    - http://aspn.activestate.com/ASPN/Coo.../Recipe/498130 and

    provide ways of automatically populating DBs but not the reverse
    (persist changes back out to the data files)

    The closest alternatives I've found are the GNU textutils that support
    join, cut, merge, etc but I need to add additional logic they don't
    support, nor do they allow field-level write operations from Python
    (UPDATE ... WHERE ...). Normally I'd jump right in and start coding
    but this seems like something so common that I would have expected
    someone else to have solved, so in the interest of not re-inventing
    the wheel I thought I'd see if anyone had any other suggestions. Any
    thoughts?

    Thanks!

    Ramon

  • askel

    #2
    Re: Python plain-text database or library that supports joins?

    On Jun 22, 1:18 pm, felciano <felci...@gmail .comwrote:
    Hello --
    >
    Is there a convention, library or Pythonic idiom for performing
    lightweight relational operations on flatfiles? I frequently find
    myself writing code to do simple SQL-like operations between flat
    files, such as appending columns from one file to another, linked
    through a common id. For example, take a list of addresses and append
    a 'district' field by looking up a congressional district from a
    second file that maps zip codes to districts.
    >
    Conceptually this is a simple database operation with a join on a
    common field (zip code in the above example). Other case use other
    relational operators (projection, cross-product, etc) so I'm really
    looking for something SQL-like in functionality. However, the data is
    in flat-files, the file structure changes frequently, the files are
    dynamically generated from a range of sources, are short-lived in
    nature, and otherwise not warrant the hassle of a database setup. So
    I've been looking around for a nice, Pythonic, zero-config (no
    parsers, no setup/teardown, etc) solution for simple queries that
    handles a database of csv-files-with-headers automatically. There are
    number of solutions that are close, but in the end come up short:
    >
    - KirbyBase 1.9 (latest Python version) is the closest that I could
    find, as it lets you keep your data in flatfiles and perform
    operations using the field names from those text-based tables, but it
    doesn't support joins (the more recent Ruby version seems to).
    - Buzhug and Sqlite have their data structures w no automatic .tab
    or .csv parsing (unless sqlite includes a way to map flatfiles to
    sqlite virtual tables that I don't know about).
    -http://aspn.activestat e.com/ASPN/Cookbook/Python/Recipe/159974is
    heading in the right direction, as it shows how to perform relational
    operations on lists and are index based rather than field-name based.
    -http://aspn.activestat e.com/ASPN/Cookbook/Python/Recipe/498130andhttp://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
    provide ways of automatically populating DBs but not the reverse
    (persist changes back out to the data files)
    >
    The closest alternatives I've found are the GNU textutils that support
    join, cut, merge, etc but I need to add additional logic they don't
    support, nor do they allow field-level write operations from Python
    (UPDATE ... WHERE ...). Normally I'd jump right in and start coding
    but this seems like something so common that I would have expected
    someone else to have solved, so in the interest of not re-inventing
    the wheel I thought I'd see if anyone had any other suggestions. Any
    thoughts?
    >
    Thanks!
    >
    Ramon
    ramon,

    i don't think that using flat text files as a database is common these
    days. if you need relational database features what stops you from
    using rdbms? if the only reason for that is some legacy system then
    i'd still use in-memory sqlite database for all relational operations.
    import, process, export back to text if you need to.

    Comment

    • felciano

      #3
      Re: Python plain-text database or library that supports joins?

      >
      i don't think that using flat text files as a database is common these
      days. if you need relational database features what stops you from
      using rdbms? if the only reason for that is some legacy system then
      i'd still use in-memory sqlite database for all relational operations.
      import, process, export back to text if you need to.
      >
      These are often one-off operations, so those import + export steps are
      non-trivial overhead. For example, most log files are structured, but
      it seems like we still use scripts or command line tools to find data
      in those files. I'm essentially doing the same thing, only with
      operations across multiple files (e.g. merge records these two files
      based on a common key, or append a column based on a look up value). I
      may end up having to go to DB, but that seems like a heavyweight jump
      for what are otherwise simple operations.

      Maybe this is the wrong forum for the question. I prefer programming
      in Python, but the use cases I'm looking is closer to shell scripting.
      I'd be perfectly happy with a more powerful version of GNU textutils
      that allowed for greater flexibility in text manipulation.

      HTH,

      Ramon

      Comment

      • Alan Isaac

        #4
        Re: Python plain-text database or library that supports joins?

        Not Python, but maybe relevant:


        Alan Isaac

        Comment

        • Michele Simionato

          #5
          Re: Python plain-text database or library that supports joins?

          On Jun 22, 7:18 pm, felciano <felci...@gmail .comwrote:
          Hello --
          >
          Is there a convention, library or Pythonic idiom for performing
          lightweight relational operations on flatfiles? I frequently find
          myself writing code to do simple SQL-like operations between flat
          files, such as appending columns from one file to another, linked
          through a common id. For example, take a list of addresses and append
          a 'district' field by looking up a congressional district from a
          second file that maps zip codes to districts.
          Have you looked at itools?



          HTH,

          Michele Simionato

          Comment

          • Joshua J. Kugler

            #6
            Re: Python plain-text database or library that supports joins?

            On Friday 22 June 2007 09:18, felciano wrote:
            Hello --
            >
            Is there a convention, library or Pythonic idiom for performing
            lightweight relational operations on flatfiles? I frequently find
            myself writing code to do simple SQL-like operations between flat
            files, such as appending columns from one file to another, linked
            through a common id. For example, take a list of addresses and append
            a 'district' field by looking up a congressional district from a
            second file that maps zip codes to districts.
            Two pointers, but maybe not a complete solution:


            Perl library that uses CSV files and supports simple joins. Maybe a port of
            this?


            Functional interface for CSV files inspired by Sprite, but does not support
            joins. Possibly could be extended?

            j

            --
            Joshua Kugler
            Lead System Admin -- Senior Programmer

            PGP Key: http://pgp.mit.edu/  ID 0xDB26D7CE

            --
            Posted via a free Usenet account from http://www.teranews.com

            Comment

            Working...