Python - Sort files based on timestamp encoded in the filename

  • helloR
    New Member
    • Jun 2015
    • 8

    I have a list of file names, and I want to sort it by the timestamp encoded in each file name.

    Note: in a name such as Hello_Hi_2015-02-20T084521_14245 43480.tar.gz, the part 2015-02-20T084521 is a timestamp in the form "year-month-dayTHHMMSS". This is the value I want to sort on.
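As a quick check of that format, the timestamp text can be parsed with `datetime.strptime` using the format string `"%Y-%m-%dT%H%M%S"`. This is a minimal sketch that assumes the timestamp is always the third underscore-separated field, as in the example names; the helper name `parse_ts` is hypothetical.

```python
from datetime import datetime

# Hypothetical helper: take the third "_"-separated field and parse it
# using the year-month-dayTHHMMSS layout described above.
def parse_ts(name):
    stamp = name.split("_")[2]  # e.g. "2015-02-20T084521"
    return datetime.strptime(stamp, "%Y-%m-%dT%H%M%S")

print(parse_ts("Hello_Hi_2015-02-20T084521_14245 43480.tar.gz"))
# 2015-02-20 08:45:21
```

Sorting with `key=parse_ts` would then order the files by real date and time.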

    Input list:

    file_list = ['Hello_Hi_2015-02-20T084521_14245 43480.tar.gz',
    'Hello_Hi_2015-02-20T095845_14245 43481.tar.gz',
    'Hello_Hi_2015-02-20T095926_14245 43481.tar.gz',
    'Hello_Hi_2015-02-20T100025_14245 43482.tar.gz',
    'Hello_Hi_2015-02-20T111631_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T111718_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T112502_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T112633_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113427_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113456_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113608_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113659_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113809_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113901_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113955_14245 43485.tar.gz',
    'Hello_Hi_2015-03-20T114122_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T114532_14245 43486.tar.gz',
    'Hello_Hi_2015-02-20T120045_14245 43487.tar.gz',
    'Hello_Hi_2015-02-20T120146_14245 43487.tar.gz',
    'Hello_WR_2015-02-20T084709_14245 43480.tar.gz',
    'Hello_WR_2015-02-20T113016_14245 43486.tar.gz']

    Output should be:

    file_list = ['Hello_Hi_2015-02-20T084521_14245 43480.tar.gz',
    'Hello_WR_2015-02-20T084709_14245 43480.tar.gz',
    'Hello_Hi_2015-02-20T095845_14245 43481.tar.gz',
    'Hello_Hi_2015-02-20T095926_14245 43481.tar.gz',
    'Hello_Hi_2015-02-20T100025_14245 43482.tar.gz',
    'Hello_Hi_2015-02-20T111631_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T111718_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T112502_14245 43483.tar.gz',
    'Hello_Hi_2015-02-20T112633_14245 43484.tar.gz',
    'Hello_WR_2015-02-20T113016_14245 43486.tar.gz',
    'Hello_Hi_2015-02-20T113427_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113456_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113608_14245 43484.tar.gz',
    'Hello_Hi_2015-02-20T113659_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113809_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113901_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T113955_14245 43485.tar.gz',
    'Hello_Hi_2015-02-20T114532_14245 43486.tar.gz',
    'Hello_Hi_2015-02-20T120045_14245 43487.tar.gz',
    'Hello_Hi_2015-02-20T120146_14245 43487.tar.gz',
    'Hello_Hi_2015-03-20T114122_14245 43485.tar.gz']
    Below is the code I have tried. It sorts by each file's modification time (os.path.getmtime) rather than by the timestamp in the name.
    Code:
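    import glob
    import os
    
    def sort(directory):
        os.chdir(directory)
        file_list = glob.glob('Hello_*')
        # getmtime sorts by the filesystem modification time,
        # not by the timestamp encoded in the file name
        file_list.sort(key=os.path.getmtime)
        print("\n".join(file_list))
        return 0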
    Thanks in advance!!
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Here is a regex solution:
    Code:
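    import re
    
    # Capture the text between the second and third underscores,
    # e.g. "2015-02-20T084521"
    pattern = re.compile(r"^\D+?_\D+?_(.+?)_")
    
    def sort_on_TS(name):
        # Sort key: the timestamp text captured by the pattern
        return pattern.match(name).group(1)
    
    # file_list is the list from the question
    for item in sorted(file_list, key=sort_on_TS):
        print(item)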
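Comparing the captured text as plain strings is enough here because every timestamp has the same fixed-width, zero-padded layout, ordered from year down to second, so string order matches chronological order. A small Python 3 check of that claim, using a few names from the question; `ts_string` and `ts_datetime` are hypothetical helper names, and `ts_string` uses `split` for brevity where the regex above captures the same text.

```python
from datetime import datetime

# A few names from the question, deliberately out of order
sample = [
    'Hello_Hi_2015-03-20T114122_14245 43485.tar.gz',
    'Hello_WR_2015-02-20T113016_14245 43486.tar.gz',
    'Hello_Hi_2015-02-20T084521_14245 43480.tar.gz',
    'Hello_WR_2015-02-20T084709_14245 43480.tar.gz',
]

def ts_string(name):
    # Timestamp as raw text, e.g. "2015-02-20T084521"
    return name.split("_")[2]

def ts_datetime(name):
    # The same timestamp parsed into a datetime object
    return datetime.strptime(ts_string(name), "%Y-%m-%dT%H%M%S")

by_string = sorted(sample, key=ts_string)
by_datetime = sorted(sample, key=ts_datetime)
print(by_string == by_datetime)  # True
```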
    This version uses the string method split instead:
    Code:
    def sort_on_TS(name):
        # The timestamp is the third underscore-separated field
        return name.split("_")[2]
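The split version assumes exactly two underscore-separated fields before the timestamp. If a prefix could contain a different number of underscores, one alternative (not from the original answer) is to search for the timestamp shape directly. This sketch assumes the year-month-dayTHHMMSS layout from the question; the name with an extra field below is hypothetical.

```python
import re

# Matches a timestamp such as "2015-02-20T084521" anywhere in the name
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{6}")

def sort_on_TS(name):
    match = TS_PATTERN.search(name)
    if match is None:
        # Fail loudly instead of sorting on a wrong field
        raise ValueError("no timestamp in %r" % name)
    return match.group(0)

names = [
    'Hello_Extra_Hi_2015-02-20T095845_14245 43481.tar.gz',  # hypothetical extra field
    'Hello_Hi_2015-02-20T084521_14245 43480.tar.gz',
]
for item in sorted(names, key=sort_on_TS):
    print(item)
```

With `split("_")[2]`, the first name would yield "Hi" instead of its timestamp; the search-based key is unaffected by the extra field.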
    It takes more lines, but you could also use the string method index:
    Code:
    def sort_on_TS(name):
        # Find the second and third underscores and slice out the text between them
        start = name.index("_", name.index("_") + 1) + 1
        end = name.index("_", start)
        return name[start:end]
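The same idea can be plugged into the function from the question by replacing the os.path.getmtime key. A minimal sketch, assuming the files sit directly in `directory` and every name matching `Hello_*` has its timestamp as the third underscore-separated field; the function name `sort_by_name_ts` is hypothetical. Globbing with `os.path.join` avoids the `os.chdir` call, and the key looks only at the base name so underscores in the directory path cannot shift the fields.

```python
import glob
import os

def sort_by_name_ts(directory):
    # Collect matching paths without changing the working directory
    paths = glob.glob(os.path.join(directory, 'Hello_*'))
    # Sort on the timestamp text in the file name, not on the file's mtime
    paths.sort(key=lambda p: os.path.basename(p).split("_")[2])
    return paths
```

For example, `print("\n".join(sort_by_name_ts(some_dir)))` prints the paths in timestamp order, where `some_dir` is whatever directory holds the archives.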


    • helloR
      New Member
      • Jun 2015
      • 8

      #3
      @bvdet: Excellent solution!!! Thank you very much!!!
