Re: how to remove oldest files up to a limit efficiently

  • Terry Reedy


    Dan Stromberg wrote:
    > On Tue, 08 Jul 2008 15:18:23 -0700, linuxnow@gmail.com wrote:
    >
    >> I need to maintain a filesystem where I'll keep only the most recently
    >> used (MRU) files; least recently used ones (LRU) have to be removed to
    >> leave space for newer ones. The filesystem in question is a clustered fs
    >> (glusterfs) which is very slow on "find" operations. To add complexity
    >> there are more than 10^6 files in 2 levels: 16³ dirs with equally
    >> distributed number of files inside.
    >>
    >> Any suggestions of how to do it effectively?
    >
    > os.walk once.
    >
    > Build a list of all files in memory.
    >
    > Sort them by whatever time you prefer - you can get times from os.stat.

    Since you do not need all 10**6 files sorted, you might also try the
    heapq module. The entries into the heap would be (time, fileid).
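
A minimal sketch of the suggestion above, assuming the goal is to free a given number of bytes and that access time (st_atime) is the time of interest. The names oldest_files and bytes_to_free are hypothetical, and (atime, path, size) stands in for the (time, fileid) entries mentioned in the post.

```python
import heapq
import os


def oldest_files(root, bytes_to_free):
    """Return the least recently accessed files under root, oldest first,
    taking just enough of them to free bytes_to_free bytes."""
    heap = []  # entries are (atime, path, size); the size is an added assumption
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            heap.append((st.st_atime, path, st.st_size))
    heapq.heapify(heap)  # builds the heap in place without a full sort

    victims, freed = [], 0
    while heap and freed < bytes_to_free:
        _atime, path, size = heapq.heappop(heap)  # oldest remaining file
        victims.append(path)
        freed += size
    return victims
```

The returned paths could then be passed to os.remove. heapify is linear in the number of entries and each heappop is logarithmic, so only the files actually selected pay a sorting cost.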

  • linuxnow@gmail.com

    #2

    On Jul 9, 7:08 pm, Terry Reedy <tjre...@udel.edu> wrote:
    > Dan Stromberg wrote:
    >> On Tue, 08 Jul 2008 15:18:23 -0700, linux...@gmail.com wrote:
    >>
    >>> I need to mantain a filesystem where I'll keep only the most recently
    >>> used (MRU) files; least recently used ones (LRU) have to be removed to
    >>> leave space for newer ones. The filesystem in question is a clustered fs
    >>> (glusterfs) which is very slow on "find" operations. To add complexity
    >>> there are more than 10^6 files in 2 levels: 16³ dirs with equally
    >>> distributed number of files inside.
    >>>
    >>> Any suggestions of how to do it effectively?
    >>
    >> os.walk once.
    >>
    >> Build a list of all files in memory.
    >>
    >> Sort them by whatever time you prefer - you can get times from os.stat.
    >
    > Since you do not need all 10**6 files sorted, you might also try the
    > heapq module. The entries into the heap would be (time, fileid)

    I'll look into it: probably sorting dirs by atime and adding the files
    inside to the heapq until I can remove enough of them would work very
    efficiently.
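
The idea above could be sketched roughly as follows. This is only an illustration under assumptions: pick_victims and bytes_to_free are hypothetical names, and a directory's own atime is treated as a rough proxy for the age of the files inside it, which may not hold on every filesystem or mount configuration.

```python
import heapq
import os


def pick_victims(root, bytes_to_free):
    """Scan directories in order of their own atime, oldest first, and
    return the least recently accessed files found, taking just enough
    of them to free bytes_to_free bytes."""
    # Record each directory's atime before os.walk descends into it,
    # because listing a directory can itself update that atime.
    dirs = [(os.stat(root).st_atime, root)]
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            sub = os.path.join(dirpath, name)
            dirs.append((os.stat(sub).st_atime, sub))
    dirs.sort()  # least recently accessed directories first

    heap, collected = [], 0
    for _dir_atime, dirpath in dirs:
        if collected >= bytes_to_free:
            break  # enough candidates; newer directories are never scanned
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    heapq.heappush(heap, (st.st_atime, entry.path, st.st_size))
                    collected += st.st_size

    victims, freed = [], 0
    while heap and freed < bytes_to_free:
        _atime, path, size = heapq.heappop(heap)  # oldest collected file
        victims.append(path)
        freed += size
    return victims
```

The trade-off is that files are only stat'ed in the directories that get scanned, so an old file sitting in a recently accessed directory can be missed in exchange for far fewer stat calls.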

    Thanks
    Pau
