more os.walk() issues... probably user error

  • rbt

    more os.walk() issues... probably user error

    This function is intended to remove unwanted files and dirs from
    os.walk(). It will return correctly *IF* I leave the 'for fs in
    fs_objects' statement out (basically leave out the entire purpose of the
    function).

    It's odd: when the program goes into that statement... even when only a
    'pass' and nothing else is present, nothing is returned. Why is that?
    I'm testing Python 2.4 on Linux x86 and WinXP. Results are the same on
    either platform.

    def build_clean_list(self, path):

        file_skip_list = ['search_results.txt']
        dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']

        fs_objects = os.walk(path, topdown=True)
    ##    for fs in fs_objects:
    ##
    ##        for f in fs[2]:
    ##            if f in file_skip_list:
    ##                print f
    ##                fs[2].remove(f)
    ##
    ##        for d in fs[1]:
    ##            if d in dir_skip_list:
    ##                print d
    ##                fs[1].remove(d)

        return fs_objects


  • rbt

    #2
    Re: more os.walk() issues... probably user error

    rbt wrote:
    > This function is intended to remove unwanted files and dirs from
    > os.walk(). It will return correctly *IF* I leave the 'for fs in
    > fs_objects' statement out (basically leave out the entire purpose of the
    > function).
    >
    > It's odd, when the program goes into that statement... even when only a
    > 'pass', and nothing else is present, nothing is returned. Why is that?
    > I'm testing Python 2.4 on Linux x86 and WinXP. Results are the same on
    > either platform.
    >
    > def build_clean_list(self, path):
    >
    >     file_skip_list = ['search_results.txt']
    >     dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']
    >
    >     fs_objects = os.walk(path, topdown=True)
    > ##    for fs in fs_objects:
    > ##
    > ##        for f in fs[2]:
    > ##            if f in file_skip_list:
    > ##                print f
    > ##                fs[2].remove(f)
    > ##
    > ##        for d in fs[1]:
    > ##            if d in dir_skip_list:
    > ##                print d
    > ##                fs[1].remove(d)
    >
    >     return fs_objects
    >
    >

    Just to clarify, it's wrong of me to say that 'nothing is returned'...
    in either case, this is what is returned:

    Here's what was returned and its type:
    ----------------------------------------
    <generator object at 0x407dbe4c>
    <type 'generator'>
    ----------------------------------------

    But, I can't iterate over the returned object when I descend into the
    for statement I mentioned above.


    • wittempj@hotmail.com

      #3
      Re: more os.walk() issues... probably user error

      That's an easy one: fs_objects is not modified by your code, so you get
      it back as created by os.walk.


      • Dan Perl

        #4
        Re: more os.walk() issues... probably user error


        "rbt" <rbt@athop1.ath.vt.edu> wrote in message
        news:cuvsmi$4bb$1@solaris.cc.vt.edu...
        > rbt wrote:
        >> This function is intended to remove unwanted files and dirs from
        >> os.walk(). It will return correctly *IF* I leave the 'for fs in
        >> fs_objects' statement out (basically leave out the entire purpose of the
        >> function).
        >>
        >> It's odd, when the program goes into that statement... even when only a
        >> 'pass', and nothing else is present, nothing is returned. Why is that?
        >> I'm testing Python 2.4 on Linux x86 and WinXP. Results are the same on
        >> either platform.
        >>
        >> def build_clean_list(self, path):
        >>
        >>     file_skip_list = ['search_results.txt']
        >>     dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']
        >>
        >>     fs_objects = os.walk(path, topdown=True)
        >> ##    for fs in fs_objects:
        >> ##
        >> ##        for f in fs[2]:
        >> ##            if f in file_skip_list:
        >> ##                print f
        >> ##                fs[2].remove(f)
        >> ##
        >> ##        for d in fs[1]:
        >> ##            if d in dir_skip_list:
        >> ##                print d
        >> ##                fs[1].remove(d)
        >>
        >>     return fs_objects
        >>
        >>
        >
        > Just to clarify, it's wrong of me to say that 'nothing is returned'... in
        > either case, this is what is returned:
        >
        > Here's what was returned and its type:
        > ----------------------------------------
        > <generator object at 0x407dbe4c>
        > <type 'generator'>
        > ----------------------------------------
        >
        > But, I can't iterate over the returned object when I descend into the for
        > statement I mentioned above.

        What do you mean by not being able to iterate over the returned object?
        What kind of error are you getting? Have you tried to debug the code?

        BTW, os.walk indeed returns a generator. You should familiarize yourself
        with generators and iterators if you haven't done so yet.
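A minimal sketch of what "generator" means here, written for Python 3 (the thread uses Python 2.4, where `gen.next()` plays the role of `next(gen)`). `count_up_to` is a hypothetical example function, not part of the original code. Each `next()` call resumes the function after its last `yield`, and once the values run out the generator stays exhausted:

```python
import os
import types

def count_up_to(limit):
    """Hypothetical generator function: yields 0, 1, ..., limit - 1."""
    n = 0
    while n < limit:
        yield n  # execution pauses here until the caller asks for another value
        n += 1

gen = count_up_to(3)
print(next(gen))   # 0
print(list(gen))   # [1, 2]  (list() drains whatever is left)
print(list(gen))   # []      (already exhausted; it does not restart)

# os.walk() hands back the same kind of object.
print(isinstance(os.walk("."), types.GeneratorType))  # True
```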



        • Dan Perl

          #5
          Re: more os.walk() issues... probably user error


          "rbt" <rbt@athop1.ath.vt.edu> wrote in message
          news:cuvr5b$2ei$1@solaris.cc.vt.edu...
          > def build_clean_list(self, path):
          >
          >     file_skip_list = ['search_results.txt']
          >     dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']
          >
          >     fs_objects = os.walk(path, topdown=True)
          > ##    for fs in fs_objects:
          > ##
          > ##        for f in fs[2]:
          > ##            if f in file_skip_list:
          > ##                print f
          > ##                fs[2].remove(f)
          > ##
          > ##        for d in fs[1]:
          > ##            if d in dir_skip_list:
          > ##                print d
          > ##                fs[1].remove(d)
          >
          >     return fs_objects

          Rather as an aside, the idiom for using os.walk is:

              for dirpath, dirnames, filenames in os.walk(path):
                  for f in filenames[:]:
                      if f in file_skip_list:
                          print(f)
                          filenames.remove(f)
                  for d in dirnames[:]:
                      if d in dir_skip_list:
                          print(d)
                          dirnames.remove(d)
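One more detail in loops like these: calling `remove()` on a list while a `for` loop is walking over that same list shifts the later items down, so the loop can skip an entry. The Python 3 sketch below reuses the thread's `dir_skip_list`; `prune_in_loop` and `prune_over_copy` are hypothetical helpers that show why iterating over a copy such as `dirnames[:]` is the safer pattern:

```python
dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']

def prune_in_loop(dirnames):
    # Buggy: removing 'dev' shifts 'proc' into the slot the loop just visited,
    # so the loop never looks at 'proc'.
    for d in dirnames:
        if d in dir_skip_list:
            dirnames.remove(d)
    return dirnames

def prune_over_copy(dirnames):
    # Iterating over a copy means removals cannot disturb the loop position.
    for d in dirnames[:]:
        if d in dir_skip_list:
            dirnames.remove(d)
    return dirnames

print(prune_in_loop(['bin', 'dev', 'proc', 'usr']))    # ['bin', 'proc', 'usr']
print(prune_over_copy(['bin', 'dev', 'proc', 'usr']))  # ['bin', 'usr']
```

Rebuilding the list with slice assignment, `dirnames[:] = [d for d in dirnames if d not in dir_skip_list]`, avoids the problem as well and still modifies the original list object in place.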

          More crucially for your code, returning the generator object after having
          iterated all the way through it will not do you any good. The generator has
          an internal state that puts it at "the end of the iteration" so you cannot
          use it to iterate again.
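Dan's point about exhaustion can be checked directly. This Python 3 sketch builds a throwaway directory tree with `tempfile` (an assumption made only so the example is self-contained), walks it twice with the same generator, and then shows that a list built once with `list(os.walk(...))` can be reused:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))

    walker = os.walk(root)
    first_pass = [dirpath for dirpath, dirnames, filenames in walker]
    second_pass = [dirpath for dirpath, dirnames, filenames in walker]
    print(len(first_pass))   # 2: root itself and root/sub
    print(len(second_pass))  # 0: the same generator is already at the end

    # Materializing the results once gives an object you can loop over repeatedly.
    snapshot = list(os.walk(root))
    pass_a = [dirpath for dirpath, _, _ in snapshot]
    pass_b = [dirpath for dirpath, _, _ in snapshot]
    print(pass_a == pass_b)  # True: a list can be iterated as often as needed
```

The trade-off is memory: the list holds every directory's tuple at once, which the generator avoids.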



          • Kent Johnson

            #6
            Re: more os.walk() issues... probably user error

            rbt wrote:
            > rbt wrote:
            >
            >> This function is intended to remove unwanted files and dirs from
            >> os.walk(). It will return correctly *IF* I leave the 'for fs in
            >> fs_objects' statement out (basically leave out the entire purpose of
            >> the function).
            >>
            >> It's odd, when the program goes into that statement... even when only a
            >> 'pass', and nothing else is present, nothing is returned. Why is that?
            >> I'm testing Python 2.4 on Linux x86 and WinXP. Results are the same on
            >> either platform.
            >>
            >> def build_clean_list(self, path):
            >>
            >>     file_skip_list = ['search_results.txt']
            >>     dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']
            >>
            >>     fs_objects = os.walk(path, topdown=True)

            fs_objects is a generator, not a list. This loop is exhausting fs_objects, so when you return
            fs_objects, it is already at the end of iteration and there is nothing left.
            >> ##    for fs in fs_objects:
            >> ##
            >> ##        for f in fs[2]:
            >> ##            if f in file_skip_list:
            >> ##                print f
            >> ##                fs[2].remove(f)
            >> ##
            >> ##        for d in fs[1]:
            >> ##            if d in dir_skip_list:
            >> ##                print d
            >> ##                fs[1].remove(d)

            Add this here:
            yield fs

            and take out the return. This turns build_clean_list() into a generator function and you will be
            able to iterate the result.
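Kent's fix, fleshed out as a minimal Python 3 sketch. It assumes a plain function instead of the original method (so there is no `self`), keeps the skip lists from the original post, and prunes with slice assignment rather than `remove()` so no entries are skipped. Because the body contains `yield`, each call returns a fresh generator:

```python
import os

# Skip lists copied from the original post.
FILE_SKIP_LIST = ['search_results.txt']
DIR_SKIP_LIST = ['dev', 'proc', 'Temporary Internet Files']

def build_clean_list(path):
    """Yield os.walk() tuples with unwanted files and directories filtered out.

    Nothing is walked until the caller starts iterating, and every call
    starts a new walk from the top of path.
    """
    for dirpath, dirnames, filenames in os.walk(path, topdown=True):
        # Prune in place so os.walk() does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if d not in DIR_SKIP_LIST]
        # Filter files in place too; the caller sees the filtered list.
        filenames[:] = [f for f in filenames if f not in FILE_SKIP_LIST]
        yield dirpath, dirnames, filenames
```

Usage: `for dirpath, dirnames, filenames in build_clean_list(path): ...`. Calling `build_clean_list(path)` again gives a new generator, so a second pass does not come back empty.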

            Kent
            >>
            >>     return fs_objects
            >>
            >>
            >
            > Just to clarify, it's wrong of me to say that 'nothing is returned'...
            > in either case, this is what is returned:
            >
            > Here's what was returned and its type:
            > ----------------------------------------
            > <generator object at 0x407dbe4c>
            > <type 'generator'>
            > ----------------------------------------
            >
            > But, I can't iterate over the returned object when I descend into the
            > for statement I mentioned above.
            >


            • rbt

              #7
              Re: more os.walk() issues... probably user error

              Kent Johnson wrote:
              > rbt wrote:
              >
              >> rbt wrote:
              >>
              >>> This function is intended to remove unwanted files and dirs from
              >>> os.walk(). It will return correctly *IF* I leave the 'for fs in
              >>> fs_objects' statement out (basically leave out the entire purpose of
              >>> the function).
              >>>
              >>> It's odd, when the program goes into that statement... even when only
              >>> a 'pass', and nothing else is present, nothing is returned. Why is
              >>> that? I'm testing Python 2.4 on Linux x86 and WinXP. Results are the
              >>> same on either platform.
              >>>
              >>> def build_clean_list(self, path):
              >>>
              >>>     file_skip_list = ['search_results.txt']
              >>>     dir_skip_list = ['dev', 'proc', 'Temporary Internet Files']
              >>>
              >>>     fs_objects = os.walk(path, topdown=True)
              >
              >
              > fs_objects is a generator, not a list. This loop is exhausting
              > fs_objects, so when you return fs_objects, it is already at the end of
              > iteration and there is nothing left.

              That makes sense. Thanks for the explanation. I've never used generators
              before.
              >
              >>> ##    for fs in fs_objects:
              >>> ##
              >>> ##        for f in fs[2]:
              >>> ##            if f in file_skip_list:
              >>> ##                print f
              >>> ##                fs[2].remove(f)
              >>> ##
              >>> ##        for d in fs[1]:
              >>> ##            if d in dir_skip_list:
              >>> ##                print d
              >>> ##                fs[1].remove(d)
              >
              >
              > Add this here:
              > yield fs
              >
              > and take out the return. This turns build_clean_list() into a generator
              > function and you will be able to iterate the result.

              I'll try this.

              Will the changes I made (file and dir removals from os.walk()) be
              reflected in the generator object? Is it safe to remove objects this way
              and pass the results in a generator on to another function? Sorry for
              all the questions, I just like to fully understand something before I
              start doing it with confidence.

              rbt

              >
              > Kent
              >
              >>>
              >>>     return fs_objects
              >>>
              >>>
              >>
              >> Just to clarify, it's wrong of me to say that 'nothing is returned'...
              >> in either case, this is what is returned:
              >>
              >> Here's what was returned and its type:
              >> ----------------------------------------
              >> <generator object at 0x407dbe4c>
              >> <type 'generator'>
              >> ----------------------------------------
              >>
              >> But, I can't iterate over the returned object when I descend into the
              >> for statement I mentioned above.
              >>


              • Kent Johnson

                #8
                Re: more os.walk() issues... probably user error

                rbt wrote:
                >>>> ##    for fs in fs_objects:
                >>>> ##
                >>>> ##        for f in fs[2]:
                >>>> ##            if f in file_skip_list:
                >>>> ##                print f
                >>>> ##                fs[2].remove(f)
                >>>> ##
                >>>> ##        for d in fs[1]:
                >>>> ##            if d in dir_skip_list:
                >>>> ##                print d
                >>>> ##                fs[1].remove(d)
                >
                > Will the changes I made (file and dir removals from os.walk()) be
                > reflected in the generator object? Is it safe to remove objects this way
                > and pass the results in a generator on to another function? Sorry for
                > all the questions, I just like to fully understand something before I
                > start doing it with confidence.

                Yes. The docs for os.walk() explicitly state, "When topdown is true, the caller can modify the
                dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into
                the subdirectories whose names remain in dirnames."

                So changes to the dir list affect the iteration; changes to the file list directly affect the value
                you return to the caller.
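To see the documented behaviour in action, this Python 3 sketch walks the same tree twice. The temporary tree and the `visited` helper are assumptions made for self-containment; the directory name `proc` is borrowed from the original skip list. Pruning `dirnames` in place stops the descent into `proc`; rebinding the name to a new list does not, because `os.walk()` only looks at the list object it handed out:

```python
import os
import tempfile

def visited(root, in_place):
    """Return the directories os.walk() visits, relative to root, sorted."""
    seen = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        seen.append(os.path.relpath(dirpath, root))
        if in_place:
            # Slice assignment mutates the list os.walk() will recurse from.
            dirnames[:] = [d for d in dirnames if d != "proc"]
        else:
            # Rebinding creates a new list; os.walk() never sees it.
            dirnames = [d for d in dirnames if d != "proc"]
    return sorted(seen)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "proc", "deep"))
    os.mkdir(os.path.join(root, "usr"))
    pruned = visited(root, in_place=True)
    unpruned = visited(root, in_place=False)

print(pruned)    # ['.', 'usr']
print(unpruned)  # ['.', 'proc', 'proc/deep', 'usr'] on POSIX systems
```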

                Kent
