Matching Directory Names and Grouping Them

  • J

    Matching Directory Names and Grouping Them

    Hello Group-

    I have limited programming experience, but I'm looking for a generic
    way to search through a root directory for subdirectories with similar
    names, organize and group them by matching their subdirectory path, and
    then output their full paths into a text file. For example, the
    contents of the output text file may look like this:

    <root>\Input1\2001\01\
    <root>\Input2\2001\01\
    <root>\Input3\2001\01\

    <root>\Input1\2002\03\
    <root>\Input2\2002\03\
    <root>\Input3\2002\03\

    <root>\Input2\2005\05\
    <root>\Input3\2005\05\

    <root>\Input1\2005\12\
    <root>\Input3\2005\12\

    I tried working with python regular expressions, but so far haven't
    found code that can do the trick. Any help would be greatly
    appreciated. Thanks!


    J.

  • Steve Holden

    #2
    Re: Matching Directory Names and Grouping Them

    Define "similar".

    regards
    Steve
    --
    Steve Holden +44 150 684 7255 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden
    Blog of Note: http://holdenweb.blogspot.com


    • J

      #3
      Re: Matching Directory Names and Grouping Them

      Steve-

      Thanks for the reply. By "similar" I mean pattern matching:
      walking through a directory tree starting at a specified root
      folder and returning a list of all folders that match a pattern. In
      this case the pattern is a folder name containing a four-digit number
      representing the year, with a subdirectory name containing a two-digit
      number representing the month. The matches are grouped together and
      written into a text file. I hope this helps.

      Kind Regards,
      J


      • Virgil Dupras

        #4
        Re: Matching Directory Names and Grouping Them

        From your example, if you want to group every path that has the same
        last 9 characters, a simple solution could be something like:

            groups = {}
            for path in paths:
                # Key on the trailing "\YYYY\MM\" part of each path.
                group = groups.setdefault(path[-9:], [])
                group.append(path)

        I didn't actually test it; there might be syntax errors.
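
A runnable version of the same idea, as a minimal sketch. The function name `group_by_suffix` and the sample paths are only illustrative, and it assumes every path ends in a 9-character `\YYYY\MM\` tail as in the example output.

```python
def group_by_suffix(paths, length=9):
    """Group paths by their last `length` characters."""
    groups = {}
    for path in paths:
        # setdefault returns the existing list, or stores and returns a new one.
        groups.setdefault(path[-length:], []).append(path)
    return groups


# Sample input in the style of the original post (backslashes escaped).
paths = [
    "<root>\\Input1\\2001\\01\\",
    "<root>\\Input2\\2001\\01\\",
    "<root>\\Input2\\2005\\05\\",
    "<root>\\Input1\\2005\\12\\",
]

groups = group_by_suffix(paths)
for suffix in sorted(groups):
    print("\n".join(groups[suffix]))
    print()
```

Slicing a fixed number of characters breaks as soon as a path lacks the trailing separator or the folder names change length, so treat it as a quick first pass.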


        • Neil Cerutti

          #5
          Re: Matching Directory Names and Grouping Them

          On 2007-01-11, J <wilder.usenet@gmail.com> wrote:
          > Steve-
          >
          > Thanks for the reply. I think what I'm trying to say by similar
          > is pattern matching. Essentially, walking through a directory
          > tree starting at a specified root folder, and returning a list
          > of all folders that match a pattern, in this case, a folder
          > name containing a four digit number representing year and a
          > subdirectory name containing a two digit number representing a
          > month. The matches are grouped together and written into a text
          > file. I hope this helps.

          Here's a solution using itertools.groupby, just because this is
          the first programming problem I've seen that seemed to call for
          it. Hooray!

              from itertools import groupby

              def print_by_date(dirs):
                  r"""Group a directory list according to date codes.

                  >>> data = [
                  ...     "<root>/Input2/2002/03/",
                  ...     "<root>/Input1/2001/01/",
                  ...     "<root>/Input3/2005/05/",
                  ...     "<root>/Input3/2001/01/",
                  ...     "<root>/Input1/2002/03/",
                  ...     "<root>/Input3/2005/12/",
                  ...     "<root>/Input2/2001/01/",
                  ...     "<root>/Input3/2002/03/",
                  ...     "<root>/Input2/2005/05/",
                  ...     "<root>/Input1/2005/12/"]
                  >>> print_by_date(data)
                  <root>/Input1/2001/01/
                  <root>/Input2/2001/01/
                  <root>/Input3/2001/01/
                  <BLANKLINE>
                  <root>/Input1/2002/03/
                  <root>/Input2/2002/03/
                  <root>/Input3/2002/03/
                  <BLANKLINE>
                  <root>/Input2/2005/05/
                  <root>/Input3/2005/05/
                  <BLANKLINE>
                  <root>/Input1/2005/12/
                  <root>/Input3/2005/12/
                  <BLANKLINE>

                  """
                  def date_key(path):
                      # The trailing "YYYY/MM/" is 8 characters long.
                      return path[-8:]
                  # groupby only merges adjacent items, so sort by the same key first.
                  groups = [list(g) for _, g in groupby(sorted(dirs, key=date_key), date_key)]
                  for g in groups:
                      print('\n'.join(sorted(g)))
                      print()

              if __name__ == "__main__":
                  import doctest
                  doctest.testmod()

          I really wanted nested join calls for the output, to suppress
          that trailing blank line, but I kept getting confused and
          couldn't sort it out.
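
For what it's worth, the nested join could look roughly like this: the inner join puts one path per line within a group, and the outer join puts a blank line between groups but not after the last one. This is only a sketch; `format_by_date` is a made-up name, and the default key assumes every path ends in an 8-character `YYYY/MM/` tail.

```python
from itertools import groupby


def format_by_date(dirs, key=lambda path: path[-8:]):
    """Return the grouped listing as one string with no trailing blank line."""
    ordered = sorted(dirs, key=key)  # groupby needs equal keys to be adjacent
    groups = [sorted(g) for _, g in groupby(ordered, key)]
    # Inner join: paths within a group. Outer join: blank line between groups.
    return "\n\n".join("\n".join(g) for g in groups)


text = format_by_date(["a/2002/03/", "b/2001/01/", "a/2001/01/"])
print(text)
```

Returning the string instead of printing it also makes the function easier to test or to write to a file.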

          It would be better to use the os.path module, but I couldn't find
          the function in there that lets me pull out path tails.
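
As far as I can tell, os.path has no single call that returns the last two components; applying os.path.split twice would do it. One alternative, sketched here under the assumption that a newer standard library is acceptable, is pathlib: `PureWindowsPath` treats both `/` and `\` as separators and ignores a trailing one, so its `parts` tuple can be sliced directly. The helper name `date_key` is reused from the code above only for illustration.

```python
from pathlib import PureWindowsPath


def date_key(path):
    """Return the last two path components, e.g. ('2001', '01')."""
    # Pure paths never touch the file system; they only parse the string.
    return PureWindowsPath(path).parts[-2:]


print(date_key("<root>/Input1/2001/01/"))
print(date_key("<root>\\Input3\\2005\\12\\"))
```

Because the key is a tuple of strings, it still sorts by year and then month, so it can be passed to sorted and groupby in place of the slice.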

          I didn't filter out stuff that didn't match the date path
          convention you used.
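
A rough sketch of how the filtering might be added, together with the directory walk and the output file from the original question. Everything here is an assumption for illustration: the names `DATE_TAIL`, `group_date_dirs` and `write_groups`, and the reading of the convention as "a four-digit folder containing a two-digit folder". The pattern does not check that the month is between 01 and 12.

```python
import os
import re
from collections import defaultdict

# Assumed convention: the path ends in <4 digits><sep><2 digits>.
DATE_TAIL = re.compile(r"[\\/](\d{4})[\\/](\d{2})[\\/]?$")


def group_date_dirs(root):
    """Map (year, month) to the sorted directories under root that match."""
    groups = defaultdict(list)
    for dirpath, _dirnames, _filenames in os.walk(root):
        match = DATE_TAIL.search(dirpath)
        if match:  # directories that do not fit the convention are skipped
            groups[match.groups()].append(dirpath)
    return {key: sorted(paths) for key, paths in sorted(groups.items())}


def write_groups(groups, out_path):
    """Write one path per line, with a blank line between groups."""
    with open(out_path, "w") as out:
        out.write("\n\n".join("\n".join(paths) for paths in groups.values()))
        out.write("\n")
```

Usage would be something like `write_groups(group_date_dirs(r"C:\data"), "groups.txt")`, where both the root folder and the output file name are placeholders.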

          --
          Neil Cerutti
