os.walk help

  • hokieghal99

    os.walk help

    This script is not recursive... in order to make it recursive, I have to
    call it several times (my kludge... hey, it works). I thought os.walk's
    sole purpose was to recursively walk a directory structure, no? Also, it
    generates the error below during the os.renames section, but the odd
    thing is that it actually renames the files before saying it can't find
    them. Any ideas are welcome. If I'm doing something *really* wrong
    here, just let me know.

    #-------------- ERROR Message ----------------------#

      File "/home/rbt/fix-names-1.1.py", line 29, in ?
        clean_names(setpath)
      File "/home/rbt/fix-names-1.1.py", line 27, in clean_names
        os.renames(oldpath, newpath)
      File "/usr/local/lib/python2.3/os.py", line 196, in renames
        rename(old, new)
    OSError: [Errno 2] No such file or directory

    #------------- Code -------------------------#

    setpath = raw_input("Path to the Directory: ")
    bad = re.compile(r'[*?<>/\|\\]')
    for root, dirs, files in os.walk(setpath):
        for dname in dirs:
            badchars = bad.findall(dname)
            for badchar in badchars:
                newdname = dname.replace(badchar,'-')
                if newdname != dname:
                    newpath = os.path.join(root, newdname)
                    oldpath = os.path.join(root, dname)
                    os.renames(oldpath, newpath)

  • Joe Francia

    #2
    Re: os.walk help

    Your code is trying to recurse into the list of directories in 'dirs',
    but you are renaming these directories before it can get to them. For
    example, if dirs = ['baddir?*', 'gooddir', 'okdir'], you rename
    'baddir?*' to 'baddir--' and then os.walk tries to enter 'baddir?*' and
    cannot find it. You're better off building a list of paths to rename,
    and then renaming them outside of the os.walk scope, or doing something
    like...

        dirs.remove(dname)
        dirs.append(newdname)

    ...in your 'if' block.

    Peace,
    Joe
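
For the first option Joe mentions (build a list of paths, then rename outside the os.walk loop), here is a minimal sketch. It assumes Python 3 rather than the thread's Python 2.3. `collect_renames` and `apply_renames` are hypothetical helper names, and a single whole-name `sub` call stands in for the per-character loop from the original post. Because the path pairs are stored in a list, nothing that was only defined inside the loop is needed afterwards.

```python
import os
import re

# Pattern from the original post; pass a different one if needed.
BAD = re.compile(r'[*?<>/\|\\]')

def collect_renames(top, bad=BAD):
    """Walk the tree without changing it; return (oldpath, newpath) pairs."""
    pending = []
    for root, dirs, files in os.walk(top):
        for dname in dirs:
            newdname = bad.sub('-', dname)
            if newdname != dname:
                pending.append((os.path.join(root, dname),
                                os.path.join(root, newdname)))
    return pending

def apply_renames(pending):
    """Rename deepest paths first so parent paths are still valid."""
    by_depth = sorted(pending, key=lambda pair: pair[0].count(os.sep),
                      reverse=True)
    for oldpath, newpath in by_depth:
        os.rename(oldpath, newpath)
```

Usage would be something like `apply_renames(collect_renames(setpath))`. The deepest-first ordering matters: each new path is built from the parent's old name, so a child has to be renamed before its parent is.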

    • hokiegal99

      #3
      Re: os.walk help

      So, which is better... renaming inside the os.walk scope or not? The code
      below works sometimes; at other times it produces this error:

      ValueError: list.remove(x): x is not in list

      setpath = raw_input("Path to the Directory: ")
      def clean_names(setpath):
          bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
          for root, dirs, files in os.walk(setpath):
              for dname in dirs:
                  badchars = bad.findall(dname)
                  for badchar in badchars:
                      newdname = dname.replace(badchar,'-')
                      if newdname != dname:
                          dirs.remove(dname)
                          dirs.append(newdname)
                          newpath = os.path.join(root, newdname)
                          oldpath = os.path.join(root, dname)
                          os.renames(oldpath, newpath)

      • Robin Munn

        #4
        Re: os.walk help

        hokiegal99 <hokiegal99@hotmail.com> wrote:
        > Joe Francia wrote:
        >> Your code is trying to recurse into the list of directories in 'dirs',
        >> but you are renaming these directories before it can get to them. For
        >> example, if dirs = ['baddir?*', 'gooddir', 'okdir'], you rename
        >> 'baddir?*' to 'baddir--' and then os.walk tries to enter 'baddir?*' and
        >> cannot find it. You're better off building a list of paths to rename,
        >> and then renaming them outside of the os.walk scope, or doing something
        >> like...
        >>
        >>     dirs.remove(dname)
        >>     dirs.append(newdname)
        >>
        >> ...in your 'if' block.
        >>
        >> Peace,
        >> Joe
        >
        > So, which is better... rename in the os.walk scope or not? The below
        > code works sometimes at others it produces this error:
        >
        > ValueError: list.remove(x): x is not in list

        That's strange. It shouldn't be happening. Stick some print statements
        in there and see what's going on:

        > setpath = raw_input("Path to the Directory: ")
        > def clean_names(setpath):
        >     bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
        >     for root, dirs, files in os.walk(setpath):
        >         for dname in dirs:
        >             badchars = bad.findall(dname)
        >             for badchar in badchars:
        >                 newdname = dname.replace(badchar,'-')
        >                 if newdname != dname:
                              try:
        >                         dirs.remove(dname)
                              except ValueError:
                                  print "%s not in %s" % (dname, dirs)
                              else:
        >                         dirs.append(newdname)
        >                         newpath = os.path.join(root, newdname)
        >                         oldpath = os.path.join(root, dname)
        >                         os.renames(oldpath, newpath)

        Note that I'm assuming it's the dirs.remove(dname) call that's
        triggering the ValueError, since there aren't any invocations of
        list.remove() anywhere else in your sample code. But I could be wrong;
        you should look at the complete exception trace, which will include the
        line number at which the exception was thrown.

        --
        Robin Munn
        rmunn@pobox.com
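
One plausible cause of the ValueError, which the thread never confirms, is a directory name containing two different bad characters: the inner loop then calls `dirs.remove(dname)` twice for the same name. The sketch below replays only the list bookkeeping from the posted code, with no filesystem access. It assumes Python 3, and `simulate` is a hypothetical name.

```python
import re

bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')

def simulate(dirs):
    """Replay the remove/append steps from the post; return names whose remove() failed."""
    failed = []
    for dname in dirs:
        for badchar in bad.findall(dname):
            # Always starts again from the original name, as in the post
            newdname = dname.replace(badchar, '-')
            if newdname != dname:
                try:
                    dirs.remove(dname)
                except ValueError:
                    failed.append(dname)
                else:
                    dirs.append(newdname)
    return failed

dirs = ["foo*?"]
print(simulate(dirs))  # ['foo*?']  (the second remove() did not find it)
print(dirs)            # ['foo-?']
```

If this is what happened, printing `dirs` before the `try:` rather than inside the `except` branch should show the name present on the first pass and gone on the second.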

        • hokiegal99

          #5
          Re: os.walk help

          Thanks for the tip. That code shows all of the dirs that Python
          complains are not in the list... trouble is, they *are* in the list.
          Go figure. I'd like to try doing the rename outside the scope of
          os.walk, but I don't understand how to do this. When I break out of
          os.walk and try the rename at a parallel level, Python complains that
          variables such as "oldpath" and "newpath" are undefined.

          • afilip--usenet@freenet.de

            #6
            Re: os.walk help

            > This script is not recursive... in order to make it recursive, I have to
            > call it several times (my kludge... hey, it works). I thought os.walk's
            > sole purpose was to recursively walk a directory structure, no? Also, it
            > generates the below error during the os.renames section, but the odd
            > thing is that it actually renames the files before saying it can't find
            > them. Any ideas are welcomed. If I'm doing something *really* wrong
            > here, just let me know.

            Try iterating from bottom to top.

            See "help(os.walk)":

            walk(top, topdown=True, onerror=None)

            ...

            If optional arg 'topdown' is true or not specified, the triple for a
            directory is generated before the triples for any of its
            subdirectories (directories are generated top down). If topdown is
            false, the triple for a directory is generated after the triples for
            all of its subdirectories (directories are generated bottom up).

            ...
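
To see the ordering the help text describes, a small sketch like the following can be run against a throwaway tree. It assumes Python 3. `walk_order` is a hypothetical helper, and the directory names `a` and `b` are made up for the demonstration.

```python
import os
import tempfile

def walk_order(top, topdown):
    """Return directory paths (relative to top) in the order os.walk yields them."""
    return [os.path.relpath(root, top)
            for root, dirs, files in os.walk(top, topdown=topdown)]

# Throwaway tree: <tmp>/a/b
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "a", "b"))

print(walk_order(top, topdown=True))   # parents first: '.', then 'a', then 'a/b'
print(walk_order(top, topdown=False))  # children first: 'a/b', then 'a', then '.'
```

With bottom-up order, a directory has already been fully visited by the time its parent's triple arrives, so renaming it there cannot leave os.walk looking for a path that no longer exists.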

            • Robin Munn

              #7
              Re: os.walk help

              hokiegal99 <hokiegal99@hotmail.com> wrote:
              > Thanks for the tip. That code shows all of the dirs that Python is
              > complaining about not in the list... trouble is, they *are* in the list.
              > Go figure. I'd like to try doing the rename outside the scope of
              > os.walk, but I don't understand how to do this, when I break out of
              > os.walk and try the rename at a parallel level, Python complains that
              > variables such as "oldpath" and "newpath" are undefined.

              Wait, I just realized that you're changing the list *while* you're
              iterating over it. That's a bad idea. See the warning at the bottom of
              this page in the language reference:

              http://www.python.org/doc/current/ref/for.html
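
The effect of changing a list inside a `for` loop over that same list can be shown without any filesystem at all. This is a minimal sketch assuming Python 3. Both function names are hypothetical, and the first one is buggy on purpose.

```python
def visited_while_removing(names, unwanted):
    """Deliberately buggy: removes items from the list being looped over."""
    visited = []
    for name in names:
        visited.append(name)
        if name in unwanted:
            names.remove(name)
    return visited

def visited_using_copy(names, unwanted):
    """Loops over a copy, so removals from the original cannot skip anything."""
    visited = []
    for name in list(names):
        visited.append(name)
        if name in unwanted:
            names.remove(name)
    return visited

dirs = ['bad1', 'bad2', 'good']
print(visited_while_removing(dirs, {'bad1', 'bad2'}))  # ['bad1', 'good']
print(dirs)                                            # ['bad2', 'good']
```

After 'bad1' is removed, everything shifts left by one position, so the loop's internal index steps straight past 'bad2'. Looping over `list(names)` avoids the skip.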


              Instead of modifying the list while you're looping over it, use the
              topdown argument to os.walk to build the tree from the bottom up instead
              of from the top down. That way you won't have to futz with the dirnames
              list at all:

              def clean_names(rootpath):
                  bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
                  for root, dirs, files in os.walk(rootpath, topdown=False):
                      for dname in dirs:
                          newdname = re.sub(bad, '-', dname)
                          if newdname != dname:
                              newpath = os.path.join(root, newdname)
                              oldpath = os.path.join(root, dname)
                              os.renames(oldpath, newpath)

              Notice also the use of re.sub to do all the character substitutions at
              once. Your code as written would have failed on a filename like "foo*?",
              since it always renamed from the original filename: it would have first
              done os.renames("foo*?", "foo-?") followed by os.renames("foo*?",
              "foo*-") and the second would have raised an OSError.

              --
              Robin Munn
              rmunn@pobox.com
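
The difference between the per-character loop and a single substitution pass can be checked on the string alone. This sketch assumes Python 3 and touches no files. `per_character_targets` is a hypothetical name for the list of rename targets the original loop would have produced.

```python
import re

bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
dname = "foo*?"

# Original approach: one replace per match, each starting from the original name
per_character_targets = [dname.replace(ch, '-') for ch in bad.findall(dname)]
print(per_character_targets)  # ['foo-?', 'foo*-']

# Single pass: every match replaced at once
print(bad.sub('-', dname))    # foo--
```

Only the first target in the list can succeed as a rename, because after it the directory no longer exists under its original name.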

              • Peter Otten

                #8
                Re: os.walk help

                Robin Munn wrote:

                > hokiegal99 <hokiegal99@hotmail.com> wrote:
                >> Thanks for the tip. That code shows all of the dirs that Python is
                >> complaining about not in the list... trouble is, they *are* in the list.
                >> Go figure. I'd like to try doing the rename outside the scope of
                >> os.walk, but I don't understand how to do this, when I break out of
                >> os.walk and try the rename at a parallel level, Python complains that
                >> variables such as "oldpath" and "newpath" are undefined.
                >
                > Wait, I just realized that you're changing the list *while* you're
                > iterating over it. That's a bad idea. See the warning at the bottom of
                > this page in the language reference:

                Here's a way to modify the list while iterating over it. Too lazy to
                generate the sample directory tree, so I suggest that the OP test it :-)

                <untested>
                def clean_names(rootpath):
                    bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
                    for root, dirs, files in os.walk(rootpath):
                        for index, dname in enumerate(dirs):
                            newdname = bad.sub('-', dname)
                            if newdname != dname:
                                newpath = os.path.join(root, newdname)
                                oldpath = os.path.join(root, dname)
                                try:
                                    os.rename(oldpath, newpath)
                                except OSError:
                                    print >> sys.stderr, "cannot rename %r to %r" % (
                                        oldpath, newpath)
                                else:
                                    # inform os.walk() about the new name
                                    dirs[index] = newdname
                </untested>

                Peter
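
For anyone who wants to try the approach above on a current interpreter, here is a hedged Python 3 rendering. The only intended changes are the `print` call and the added imports; the pattern, names and logic are taken from the post. Assigning to `dirs[index]` keeps the list the same length, so the loop skips nothing and a top-down os.walk descends into the renamed directory.

```python
import os
import re
import sys

bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')

def clean_names(rootpath):
    for root, dirs, files in os.walk(rootpath):
        for index, dname in enumerate(dirs):
            newdname = bad.sub('-', dname)
            if newdname != dname:
                oldpath = os.path.join(root, dname)
                newpath = os.path.join(root, newdname)
                try:
                    os.rename(oldpath, newpath)
                except OSError:
                    print("cannot rename %r to %r" % (oldpath, newpath),
                          file=sys.stderr)
                else:
                    # Tell os.walk() about the new name so it can descend into it
                    dirs[index] = newdname
```

Note that this uses `os.rename` rather than `os.renames`, as in the post, so no intermediate directories are created or pruned.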

                • hokiegal99

                  #9
                  Re: os.walk help

                  This works great! No errors... and it gets dirs that are 8 levels deep
                  (that's as far down as I've tested). Thanks for the tip! The re.sub
                  seems to be much faster than the string find/replace approach as well...
                  I need to read up more on the documentation for os.walk and re in
                  general. Thanks again!!!


                  Robin Munn wrote:
                  > Wait, I just realized that you're changing the list *while* you're
                  > iterating over it. That's a bad idea. See the warning at the bottom of
                  > this page in the language reference:
                  >
                  > http://www.python.org/doc/current/ref/for.html
                  >
                  > Instead of modifying the list while you're looping over it, use the
                  > topdown argument to os.walk to build the tree from the bottom up instead
                  > of from the top down. That way you won't have to futz with the dirnames
                  > list at all:
                  >
                  > def clean_names(rootpath):
                  >     bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
                  >     for root, dirs, files in os.walk(rootpath, topdown=False):
                  >         for dname in dirs:
                  >             newdname = re.sub(bad, '-', dname)
                  >             if newdname != dname:
                  >                 newpath = os.path.join(root, newdname)
                  >                 oldpath = os.path.join(root, dname)
                  >                 os.renames(oldpath, newpath)
                  >
                  > Notice also the use of re.sub to do all the character substitutions at
                  > once. Your code as written would have failed on a filename like "foo*?",
                  > since it always renamed from the original filename: it would have first
                  > done os.renames("foo*?", "foo-?") followed by os.renames("foo*?",
                  > "foo*-") and the second would have raised an OSError.


                  • hokiegal99

                    #10
                    Re: os.walk help

                    Could we discuss the topdown feature in os.walk some more? My script is
                    working fine now; I have no trouble at all with it. I just want to
                    better understand os.walk in Python 2.3. This is how I understand it as
                    of today. Someone please correct me if I'm wrong:

                    topdown=False would build a list of filesystem (fs) objects from the
                    bottom up. The objects at the beginning of the list would be the end-most
                    objects (the leaf nodes) of the fs. When you make changes to that list,
                    the changes would be from leaf node to os.walk's root instead of root to
                    leaf node, correct? For example, if I had this dir structure:

                    dir_a
                        file_a
                        dir_b
                            file_b

                    My list would look like this:

                    file_b
                    dir_b
                    file_a
                    dir_a

                    And, if I made changes to the list and committed those changes to the fs,
                    there would be no problems because of the order in which the
                    changes are made. Is this a proper way to describe topdown=False in
                    os.walk? In other words, our list would be static (one change would
                    not impact another), whereas with topdown=True our list would be dynamic
                    (one change could impact another).

                    Thanks for the help!!!
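
One way to check this understanding is to build the example tree and print what os.walk actually produces. The sketch below assumes Python 3, and also assumes the structure is dir_a containing file_a and dir_b, with file_b inside dir_b. `triples` is a hypothetical helper. The output shows that os.walk yields one (root, dirs, files) triple per directory, not a flat list of objects.

```python
import os
import tempfile

# Assumed layout: dir_a/{file_a, dir_b/{file_b}} inside a throwaway directory
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "dir_a", "dir_b"))
open(os.path.join(top, "dir_a", "file_a"), "w").close()
open(os.path.join(top, "dir_a", "dir_b", "file_b"), "w").close()

def triples(top, topdown):
    """Collect os.walk's triples, with each root shown relative to top."""
    return [(os.path.relpath(root, top), dirs, files)
            for root, dirs, files in os.walk(top, topdown=topdown)]

for triple in triples(top, topdown=False):
    print(triple)
# Bottom-up order: dir_a/dir_b first, then dir_a, then '.' (the starting directory)
```

So the "static" intuition is roughly right for topdown=False: by the time a directory's name shows up in its parent's `dirs` list, that directory's own triple has already been yielded, and editing `dirs` has no effect on the rest of the walk. With topdown=True, os.walk reads `dirs` after each yield to decide where to go next, which is why edits there change the traversal.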



