In need of a virtual filesystem / archive

  • Enigma Curry

    In need of a virtual filesystem / archive

    I need to store a large number of files in an archive. From Python, I
    need to be able to create an archive, put files into it, modify files
    that are already in it, and delete files already in it.

    The easy solution would be to use a zip file or a tar file. Python has
    good standard modules for accessing those types. However, I would tend
    to think that modifying or deleting files in the archive would require
    rewriting the entire archive.

    Is there any archive format that can allow Python to modify a file in
    the archive *in place*? That is to say if my archive is 2GB large and I
    have a small text file in the archive I want to be able to modify that
    small text file (or delete it) without having to rewrite the entire
    archive to disk.

    Does anything like this exist? If nothing exists for Python, is there
    something written in C maybe that I could wrap (preferably you won't
    suggest wrapping the ext2 filesystem driver.. ;) ?

  • bonono@gmail.com

    #2
    Re: In need of a virtual filesystem / archive

    Maybe store them in SQLite?

    On Linux, FUSE can also be an interesting option; gmailfs is written in
    Python.

    Enigma Curry wrote:
    > I need to store a large number of files in an archive. From Python, I
    > need to be able to create an archive, put files into it, modify files
    > that are already in it, and delete files already in it.
    >
    > The easy solution would be to use a zip file or a tar file. Python has
    > good standard modules for accessing those types. However, I would tend
    > to think that modifying or deleting files in the archive would require
    > rewriting the entire archive.
    >
    > Is there any archive format that can allow Python to modify a file in
    > the archive *in place*? That is to say if my archive is 2GB large and I
    > have a small text file in the archive I want to be able to modify that
    > small text file (or delete it) without having to rewrite the entire
    > archive to disk.
    >
    > Does anything like this exist? If nothing exists for Python, is there
    > something written in C maybe that I could wrap (preferably you won't
    > suggest wrapping the ext2 filesystem driver.. ;) ?


    • Paul Rubin

      #3
      Re: In need of a virtual filesystem / archive

      "Enigma Curry" <workbee@gmail.com> writes:
      > Is there any archive format that can allow Python to modify a file in
      > the archive *in place*? That is to say if my archive is 2GB large and I
      > have a small text file in the archive I want to be able to modify that
      > small text file (or delete it) without having to rewrite the entire
      > archive to disk.
      >
      > Does anything like this exist?

      Yes, what you want is called a database. Try the bsddb module or
      something with MySQL depending on your requirements.
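
For illustration, here is a minimal sketch of the key-value database idea. It is not the bsddb API itself: bsddb was removed from the standard library in Python 3, so this assumes the standard `dbm` module as a stand-in. The archive name and the keys are made up for the example, and some `dbm` backends keep their data in more than one file on disk.

```python
import dbm
import os
import tempfile


def demo(path):
    """Add, modify and delete entries, then return what is left."""
    # "c" opens the database read/write and creates it if it is missing.
    with dbm.open(path, "c") as db:
        db[b"notes/todo.txt"] = b"buy milk\n"            # add an entry
        db[b"notes/todo.txt"] = b"buy milk and eggs\n"   # overwrite it
        db[b"tmp/scratch.txt"] = b"temporary"
        del db[b"tmp/scratch.txt"]                       # delete an entry
    # Reopen read-only to show that the changes were persisted.
    with dbm.open(path, "r") as db:
        return {key: db[key] for key in db.keys()}


workdir = tempfile.mkdtemp()
result = demo(os.path.join(workdir, "archive"))
print(result)
```

Each assignment or `del` touches only the affected record, so the rest of the stored data is not rewritten.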


      • Steven D'Aprano

        #4
        Re: In need of a virtual filesystem / archive

        Enigma Curry wrote:

        > I need to store a large number of files in an archive. From Python, I
        > need to be able to create an archive, put files into it, modify files
        > that are already in it, and delete files already in it.
        >
        > The easy solution would be to use a zip file or a tar file. Python has
        > good standard modules for accessing those types. However, I would tend
        > to think that modifying or deleting files in the archive would require
        > rewriting the entire archive.
        >
        > Is there any archive format that can allow Python to modify a file in
        > the archive *in place*? That is to say if my archive is 2GB large and I
        > have a small text file in the archive I want to be able to modify that
        > small text file (or delete it) without having to rewrite the entire
        > archive to disk.

        Yes. I believe your common or garden variety file
        manager can handle this task, by storing files in an
        archive called "a directory". For example, many mail
        systems use the "maildir" archive for storing email
        while still being able to access it quickly and robustly.
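
As a rough sketch of the "directory as archive" approach: the helper below replaces a file by writing a temporary file and renaming it over the target, which is the same write-then-rename idea maildir relies on for robustness. The helper name `write_atomic` and the file names are hypothetical, chosen only for this example.

```python
import os
import tempfile


def write_atomic(root, name, data):
    """Write data to root/name so readers never see a half-written file."""
    target = os.path.join(root, name)
    folder = os.path.dirname(target)
    os.makedirs(folder, exist_ok=True)
    # Create the temporary file in the same directory as the target so the
    # final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, target)  # swaps the new content in as one step
    except BaseException:
        os.unlink(tmp)
        raise


root = tempfile.mkdtemp()
write_atomic(root, "docs/readme.txt", b"first version\n")
write_atomic(root, "docs/readme.txt", b"second version\n")   # modify
write_atomic(root, "docs/old.txt", b"obsolete\n")
os.remove(os.path.join(root, "docs", "old.txt"))             # delete

remaining = sorted(os.listdir(os.path.join(root, "docs")))
with open(os.path.join(root, "docs", "readme.txt"), "rb") as f:
    content = f.read()
```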

        Do you really need to store your files in a single
        meta-file? Do you need compression? How much overhead
        for the archive structure are you prepared to carry? Do
        you expect the archive to shrink when you delete a file
        from the middle?

        I suspect you can pick any two of the following three:

        1. single file
        2. space used for deleted files is reclaimed
        3. fast performance

        Using a proper database will give you 2 and 3, but at
        the cost of a lot of overhead, and typically a
        relational database is not a single file.



        --
        Steven.


        • bonono@gmail.com

          #5
          Re: In need of a virtual filesystem / archive


          Steven D'Aprano wrote:
          > I suspect you can pick any two of the following three:
          >
          > 1. single file
          > 2. space used for deleted files is reclaimed
          > 3. fast performance
          >
          > Using a proper database will give you 2 and 3, but at
          > the cost of a lot of overhead, and typically a
          > relational database is not a single file.
          SQLite can give you all three. It does have overhead, but whether it
          is worth it depends on individual judgement based on features, usage
          pattern, etc. I think monotone uses it.
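
To make the SQLite suggestion concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout (`files` with `name` and `data` columns) and the helper names are assumptions made for this example, not something the thread specifies. It stores each file as one BLOB row in a single database file, and runs `VACUUM` after a delete to hand the freed pages back to the filesystem.

```python
import os
import sqlite3
import tempfile


def open_archive(path):
    """Open (or create) a single-file archive backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put(conn, name, data):
    """Add a file, or replace it if the name already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (name, data),
        )


def get(conn, name):
    """Return the file's bytes, or None if it is not in the archive."""
    row = conn.execute(
        "SELECT data FROM files WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else bytes(row[0])


def delete(conn, name):
    with conn:
        conn.execute("DELETE FROM files WHERE name = ?", (name,))


path = os.path.join(tempfile.mkdtemp(), "archive.db")
conn = open_archive(path)
put(conn, "big.bin", b"\0" * 1_000_000)
put(conn, "small.txt", b"hello\n")
put(conn, "small.txt", b"hello again\n")   # modify without touching big.bin
delete(conn, "big.bin")

size_before = os.path.getsize(path)
conn.execute("VACUUM")                     # rebuild the file without free pages
size_after = os.path.getsize(path)
small = get(conn, "small.txt")
gone = get(conn, "big.bin")
conn.close()
```

Without the `VACUUM` step (or the `auto_vacuum` pragma), SQLite keeps the pages freed by a delete inside the file and reuses them for later inserts.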


          • Rene Pijlman

            #6
            Re: In need of a virtual filesystem / archive

            Enigma Curry:
            >I need to store a large number of files in an archive. From Python, I
            >need to be able to create an archive, put files into it, modify files
            >that are already in it, and delete files already in it.

            Use the file system. That's what it's for.

            --
            René Pijlman


            • Ivan Vilata i Balaguer

              #7
              Re: In need of a virtual filesystem / archive

              Enigma Curry wrote:

              > I need to store a large number of files in an archive. From Python, I
              > need to be able to create an archive, put files into it, modify files
              > that are already in it, and delete files already in it.
              >[...]
              > Is there any archive format that can allow Python to modify a file in
              > the archive *in place*? That is to say if my archive is 2GB large and I
              > have a small text file in the archive I want to be able to modify that
              > small text file (or delete it) without having to rewrite the entire
              > archive to disk.
              >[...]

              Although it is not its main usage, PyTables_ can be used to store
              ordinary files in a single HDF5_ file. HDF5 files have a hierarchical
              structure of nodes and groups which maps quite well to files and
              directories. You can create, read, modify, copy, move and remove nodes
              at will; freed space is reclaimed, and HDF5 is very efficient no matter
              how large the data is.

              For working with the files, PyTables includes a FileNode_ module which
              offers Python file semantics for nodes in an HDF5 file. You can also
              keep nodes transparently compressed, or you may repack the whole HDF5
              file to defragment it or (de)compress its nodes, which may make it a
              reasonable alternative to a compressed archive.

              I will be pleased to give more information. Hope that helps.

              .. _PyTables: http://www.pytables.org/
              .. _HDF5: http://hdf.ncsa.uiuc.edu/HDF5/
              .. _FileNode: http://pytables.sourceforge.net/html...ersguide6.html

              import disclaimer

              ::

              Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
              Cárabos Coop. V.         V  V   Enjoy Data
                                        ""




              • Enigma Curry

                #8
                Re: In need of a virtual filesystem / archive

                Thanks for all the suggestions!

                I realized a few minutes after I posted that a database would work.. I
                just wasn't in that "mode" of thinking when I posted.

                PyTables also looks very interesting, especially because apparently I
                can read a file in the archive like a normal Python file, i.e. one line
                at a time.

                Could I do the same using SQL? I'm assuming I would get the whole file
                back when I did my SELECT statement. I guess I could chunk the file out
                and store it in multiple rows, but that sounds complicated.
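
One way the chunking idea could look, as a hedged sketch with the standard `sqlite3` module: the `chunks` table, the chunk size, and the helper names are all assumptions for illustration. Each file is split across rows, and a generator reassembles lines while reading one chunk at a time, so the whole file is never held in memory at once.

```python
import sqlite3

CHUNK_SIZE = 4096  # arbitrary; tune for the workload


def store(conn, name, data, chunk_size=CHUNK_SIZE):
    """Replace the file's rows with fixed-size chunks of data."""
    with conn:
        conn.execute("DELETE FROM chunks WHERE name = ?", (name,))
        conn.executemany(
            "INSERT INTO chunks (name, seq, data) VALUES (?, ?, ?)",
            (
                (name, i // chunk_size, data[i:i + chunk_size])
                for i in range(0, len(data), chunk_size)
            ),
        )


def iter_lines(conn, name):
    """Yield the file line by line, fetching one chunk row at a time."""
    pending = b""
    cursor = conn.execute(
        "SELECT data FROM chunks WHERE name = ? ORDER BY seq", (name,)
    )
    for (chunk,) in cursor:
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if pending:  # final line without a trailing newline
        yield pending


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "name TEXT, seq INTEGER, data BLOB, PRIMARY KEY (name, seq))"
)
text = b"".join(b"line %d\n" % n for n in range(1000)) + b"no trailing newline"
store(conn, "log.txt", text, chunk_size=64)
lines = list(iter_lines(conn, "log.txt"))
```

On Python 3.11 and later, `sqlite3.Connection.blobopen` offers incremental reads and writes on a single BLOB, which may avoid the need for manual chunking.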
