combining the path and fileinput modules

  • wo_shi_big_stomach

    combining the path and fileinput modules

    Newbie to python writing a script to recurse a directory tree and delete
    the first line of a file if it contains a given string. I get the same
    error on a Mac running OS X 10.4.8 and FreeBSD 6.1.

    Here's the script:

    # start of program

    # p.pl - fix broken SMTP headers in email files
    #
    # recurses from dir and searches all subdirs
    # for each file, evaluates whether 1st line starts with "From "
    # for each match, program deletes line

    import fileinput
    import os
    import re
    import string
    import sys
    from path import path

    # recurse dirs
    dir = path('/home/wsbs/Maildir')
    for f in dir.walkfiles('*'):
        #
        # test:
        # print f
        #
        # open file, search, change if necessary, write backup
        for line in fileinput.input(f, inplace=1, backup='.bak'):
            # check first line only
            if fileinput.isfirstline():
                if not re.search('^From ', line):
                    print line.rstrip('\n')
            # just print all other lines
            if not fileinput.isfirstline():
                print line.rstrip('\n')
        fileinput.close()
    # end of program

    The script produces this error:

    Traceback (most recent call last):
      File "./p", line 22, in ?
        for line in fileinput.input(f, inplace=1, backup='.bak'):
      File "/sw/lib/python2.4/fileinput.py", line 231, in next
        line = self.readline()
      File "/sw/lib/python2.4/fileinput.py", line 300, in readline
        os.rename(self._filename, self._backupfilename)
    OSError: [Errno 21] Is a directory

    If I uncomment that test routine, and comment out the fileinput stuff,
    the program DOES print the full pathname/filename for the variable f.

    Many thanks for clues as to why fileinput.input doesn't like f.

  • Rob Wolfe

    #2
    Re: combining the path and fileinput modules


    wo_shi_big_stomach wrote:
    >Newbie to python writing a script to recurse a directory tree and delete
    >the first line of a file if it contains a given string. I get the same
    >error on a Mac running OS X 10.4.8 and FreeBSD 6.1.
    >
    >Here's the script:
    >
    ># start of program
    >
    ># p.pl - fix broken SMTP headers in email files
    >#
    ># recurses from dir and searches all subdirs
    ># for each file, evaluates whether 1st line starts with "From "
    ># for each match, program deletes line
    >
    >import fileinput
    >import os
    >import re
    >import string
    >import sys
    >from path import path
    >
    ># recurse dirs
    >dir = path('/home/wsbs/Maildir')
    >for f in dir.walkfiles('*'):
    >    #
    >    # test:
    >    # print f

    Are you absolutely sure that the f list contains only paths to
    files, and no path to a directory?
    Add this:

    f = filter(os.path.isfile, f)

    and try one more time.

    >    #
    >    # open file, search, change if necessary, write backup
    >    for line in fileinput.input(f, inplace=1, backup='.bak'):
    >        # check first line only
    >        if fileinput.isfirstline():
    >            if not re.search('^From ', line):
    >                print line.rstrip('\n')
    >        # just print all other lines
    >        if not fileinput.isfirstline():
    >            print line.rstrip('\n')
    >    fileinput.close()
    ># end of program
    --
    HTH,
    Rob


    • wo_shi_big_stomach

      #3
      Re: combining the path and fileinput modules

      On 11/23/06 6:15 AM, Rob Wolfe wrote:
      >wo_shi_big_stomach wrote:
      >>Newbie to python writing a script to recurse a directory tree and delete
      >>the first line of a file if it contains a given string. I get the same
      >>error on a Mac running OS X 10.4.8 and FreeBSD 6.1.
      >>
      >>Here's the script:
      >>
      >># start of program
      >>
      >># p.pl - fix broken SMTP headers in email files
      >>#
      >># recurses from dir and searches all subdirs
      >># for each file, evaluates whether 1st line starts with "From "
      >># for each match, program deletes line
      >>
      >>import fileinput
      >>import os
      >>import re
      >>import string
      >>import sys
      >>from path import path
      >>
      >># recurse dirs
      >>dir = path('/home/wsbs/Maildir')
      >>for f in dir.walkfiles('*'):
      >>    #
      >>    # test:
      >>    # print f
      >
      >Are you absolutely sure that f list doesn't contain
      >any path to directory, not file?
      >Add this:
      >
      >f = filter(os.path.isfile, f)
      >
      >and try one more time.

      Sorry, no joy. Printing f then produces:

      rppp
      rppppp
      rppppp
      rpppr
      rppppp
      rpppP
      rppppp
      rppppp

      which I assure you are not the filenames in this directory.
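
A likely explanation for the odd output: inside the loop, f is one path string rather than a list, so filter walks it character by character. In Python 2, filter applied to a string returns a string of the characters that passed the test, which would account for lines such as rppp. The sketch below uses Python 3 syntax (an assumption; the thread uses Python 2), and the sample path and the keep predicate are made up for illustration:

```python
# filter() iterates over whatever it is given; a string yields single characters.
sample = "/home/wsbs/Maildir/cur/msg1"  # hypothetical path, for illustration only


def keep(ch):
    # Stand-in predicate. In the thread the predicate was os.path.isfile,
    # which ended up being asked about one-character names such as "r".
    return ch in "rp"


chars = list(filter(keep, sample))
print(chars)

# Wrapping the path in a list makes filter() test the whole path once.
whole = list(filter(lambda p: p.endswith("msg1"), [sample]))
print(whole)
```

The second call is the shape that works: the predicate sees each complete path, not each character of a path.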

      I've tried this with f and f.name. The former prints the full pathname
      and filename; the latter prints just the filename. But neither works
      with the fileinput.input() call below.

      I get the same error with the filtered mod as before:

        File "./p", line 23, in ?
          for line in fileinput.input(f, inplace=1, backup='.bak'):

      Thanks again for info on what to feed fileinput.input()


      >>
      >>    #
      >>    # open file, search, change if necessary, write backup
      >>    for line in fileinput.input(f, inplace=1, backup='.bak'):
      >>        # check first line only
      >>        if fileinput.isfirstline():
      >>            if not re.search('^From ', line):
      >>                print line.rstrip('\n')
      >>        # just print all other lines
      >>        if not fileinput.isfirstline():
      >>            print line.rstrip('\n')
      >>    fileinput.close()
      >># end of program
      >


      • Gabriel Genellina

        #4
        Re: combining the path and fileinput modules

        At Thursday 23/11/2006 12:21, wo_shi_big_stomach wrote:
        >>>dir = path('/home/wsbs/Maildir')
        >>>for f in dir.walkfiles('*'):
        >>>    #
        >>>    # test:
        >>>    # print f
        >>Are you absolutely sure that f list doesn't contain
        >>any path to directory, not file?
        >>Add this:
        >>
        >>f = filter(os.path.isfile, f)
        >>
        >>and try one more time.
        >
        >Sorry, no joy. Printing f then produces:
        >
        >rppp
        >rppppp
        >rppppp

        The filter should be applied to walkfiles. Something like this:

        dir = path('/home/wsbs/Maildir')
        for f in filter(os.path.isfile, dir.walkfiles('*')):
            #
            # test:
            # print f
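
To illustrate the point about where the filter belongs, here is a minimal sketch in Python 3 syntax (an assumption; the thread uses Python 2). It builds a throwaway directory with hypothetical names and filters a list of paths, which plays the role of the walkfiles() result:

```python
import os
import tempfile

# Build a small temporary tree: one subdirectory containing one file.
root = tempfile.mkdtemp()
subdir = os.path.join(root, "cur")
os.mkdir(subdir)
msg = os.path.join(subdir, "msg1")
with open(msg, "w") as fh:
    fh.write("From nobody\n")

# filter() is applied to the whole sequence of paths, so each path is
# tested once and directories are dropped.
candidates = [subdir, msg]
only_files = list(filter(os.path.isfile, candidates))
print(only_files)
```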


        --
        Gabriel Genellina
        Softlab SRL

        _______________ _______________ _______________ _____
        Correo Yahoo!
        Espacio para todos tus mensajes, antivirus y antispam ¡gratis!
        ¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar


        • wo_shi_big_stomach

          #5
          Re: combining the path and fileinput modules

          Gabriel Genellina wrote:
          >The filter should be applied to walkfiles. Something like this:
          >
          >dir = path('/home/wsbs/Maildir')
          >for f in filter(os.path.isfile, dir.walkfiles('*')):
          >    #
          >    # test:
          >    # print f

          Thanks, this way f will print the full pathname/filename. But f already
          does that using Jason Orendorff's path module:

          dir = path('/home/wsbs/Maildir')
          for f in dir.walkfiles('*'):
              print f

          Printing the full path/filename isn't the problem. The problem instead
          is how to supply f to fileinput.input().

          Either the path or the os.path methods cause this line:

          for line in fileinput.input(f, inplace=1, backup='.bak'):

          to throw this error:

            File "./p2.py", line 23, in ?
              for line in fileinput.input(f, inplace=1, backup='.bak'):

          At this point I believe the error has to do with fileinput, not the path
          or os.path modules.

          If I give fileinput.input() a hardcoded path/filename in place of 'f'
          the program runs. However the program will not accept either f or 'f' as
          an argument to fileinput.input().

          Again, thanks for guidance on the care and feeding of fileinput.input()

          /wsbs

          import fileinput
          import os
          import re
          import string
          import sys
          from path import path

          # p.pl - fix broken SMTP headers in email files
          #
          # recurses from dir and searches all subdirs
          # for each file, evaluates whether 1st line starts with "From "
          # for each match, program deletes line

          # recurse dirs
          dir = path('/home/wsbs/Maildir')
          #for f in dir.walkfiles('*'):
          for f in filter(os.path.isfile, dir.walkfiles('*')):
              #
              # test: this will print the full path/filename of each file
              print f
              #
              # open file, search, change if necessary, write backup
              # for line in fileinput.input('f', inplace=1, backup='.bak'):
              #     # just print 2nd and subsequent lines
              #     if not fileinput.isfirstline():
              #         print line.rstrip('\n')
              #     # check first line only
              #     elif fileinput.isfirstline():
              #         if not re.search('^From ', line):
              #             print line.rstrip('\n')
              # fileinput.close()



          • Gabriel Genellina

            #6
            Re: combining the path and fileinput modules

            At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:
            >>The filter should be applied to walkfiles. Something like this:
            >>
            >>dir = path('/home/wsbs/Maildir')
            >>for f in filter(os.path.isfile, dir.walkfiles('*')):
            >>    #
            >>    # test:
            >>    # print f
            >
            >Thanks, this way f will print the full pathname/filename. But f already
            >does that using Jason Orendorff's path module:
            >
            >dir = path('/home/wsbs/Maildir')
            >for f in dir.walkfiles('*'):
            >    print f

            The filter is used to exclude directories. fileinput can't handle directories.

            >At this point I believe the error has to do with fileinput, not the path
            >or os.path modules.
            >
            >If I give fileinput.input() a hardcoded path/filename in place of 'f'
            >the program runs. However the program will not accept either f or 'f' as
            >an argument to fileinput.input().

            Have you tried it with (f,)?
            Notice that *this* error is not the same as your previous error.
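
For reference, the fileinput documentation says the files argument may be a single file name or a sequence of names. A minimal sketch in Python 3 syntax (an assumption; the thread uses Python 2, where a third-party path object may behave differently), using a hypothetical temporary file:

```python
import fileinput
import os
import tempfile

# One small file to read back (hypothetical name).
root = tempfile.mkdtemp()
name = os.path.join(root, "msg1")
with open(name, "w") as fh:
    fh.write("a\nb\n")

# A bare file name...
with fileinput.input(name) as stream:
    single = list(stream)

# ...and a one-element tuple are read the same way.
with fileinput.input((name,)) as stream:
    as_tuple = list(stream)

print(single == as_tuple)
```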


            --
            Gabriel Genellina
            Softlab SRL


            • wo_shi_big_stomach

              #7
              Re: combining the path and fileinput modules

              Gabriel Genellina wrote:
              >At Saturday 25/11/2006 00:14, wo_shi_big_stomach wrote:
              >
              >>>The filter should be applied to walkfiles. Something like this:
              >>>
              >>>dir = path('/home/wsbs/Maildir')
              >>>for f in filter(os.path.isfile, dir.walkfiles('*')):
              >>>    #
              >>>    # test:
              >>>    # print f
              >>
              >>Thanks, this way f will print the full pathname/filename. But f already
              >>does that using Jason Orendorff's path module:
              >>
              >>dir = path('/home/wsbs/Maildir')
              >>for f in dir.walkfiles('*'):
              >>    print f
              >
              >The filter is used to exclude directories. fileinput can't handle
              >directories.

              ???

              Both routines above produce identical output -- full path/filenames.
              Neither prints just a directory name.

              >>At this point I believe the error has to do with fileinput, not the path
              >>or os.path modules.
              >>
              >>If I give fileinput.input() a hardcoded path/filename in place of 'f'
              >>the program runs. However the program will not accept either f or 'f' as
              >>an argument to fileinput.input().
              >
              >Tried with (f,) ?
              >Notice that *this* error is not the same as your previous error.

                File "p2.py", line 23, in ?
                  for line in fileinput.input(f,):
                File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
                  line = self.readline()
                File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
                  self._file = open(self._filename, "r")

              This looks similar to before -- fileinput.input() still isn't operating
              on the input.

              Again, I'm looking to 1) walk through all files in a directory tree and
              2) using fileinput, evaluate and possibly edit the files.

              The current version of the program is below.

              thanks!

              /wsbs

              # start of program
              import fileinput
              import os
              import re
              import string
              import sys
              from path import path

              # p2.py - fix broken SMTP headers in email files
              #
              # recurses from dir and searches all subdirs
              # for each file, evaluates whether 1st line starts with "From "
              # for each match, program deletes line

              # recurse dirs
              dir = path('/home/wsbs/Maildir')
              #for f in dir.walkfiles('*'):
              for f in filter(os.path.isfile, dir.walkfiles('*')):
                  #
                  # test: this will print the full path/filename of each file
                  # print f
                  #
                  # open file, search, change if necessary, write backup
                  for line in fileinput.input(f,):
                      # just print 2nd and subsequent lines
                      if not fileinput.isfirstline():
                          print line.rstrip('\n')
                      # check first line only
                      elif fileinput.isfirstline():
                          if not re.search('^From ', line):
                              print line.rstrip('\n')
                  fileinput.close()

              # end of program


              • wo_shi_big_stomach

                #8
                Re: combining the path and fileinput modules SOLVED

                Dennis Lee Bieber wrote:
                >On Sat, 25 Nov 2006 07:58:26 -0800, wo_shi_big_stomach
                ><wo_shi_big_stomach@mac.com> declaimed the following in
                >comp.lang.python:
                >
                >>  File "p2.py", line 23, in ?
                >>    for line in fileinput.input(f,):
                >>  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 231, in next
                >>    line = self.readline()
                >>  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/fileinput.py", line 320, in readline
                >>    self._file = open(self._filename, "r")
                >>
                >>This looks similar to before -- fileinput.input() still isn't operating
                >>on the input.
                >
                >And where is the actual exception message line -- the one with the
                >error code/description?
                >
                >>dir = path('/home/wsbs/Maildir')
                >>#for f in dir.walkfiles('*'):
                >>for f in filter(os.path.isfile, dir.walkfiles('*')):
                >
                >If I understand the documentation of fileinput, you shouldn't even
                >need this outer loop; fileinput is designed to expect a list of files
                >(that it works with a single file seems an afterthought)

                Yes, thanks. This is the key point.

                Feeding fileinput.input() a list rather than a single file (or whatever
                it's called in Python) got my program working. Thanks!

                >>    for line in fileinput.input(f,):
                >
                >for line in fileinput.input(filter(os.path.isfile,
                >                                   dir.walkfiles("*")),
                >                            inplace=1):
                >
                >should handle all the files...

                Indeed it does -- too many times.

                Sorry, but this (and the program you provided) iterates over the entire
                list N times, where N is the number of files, rather than doing one
                iteration on each file.

                For instance, using your program with inplace editing and a ".bak" file
                extension for the originals, I ended up with filenames like
                name.bak.bak.bak.bak.bak in a directory with five files in it.

                >I don't have this third party path
                >module, so the directory tree walking isn't active, but...
                The path module:



                is a *lot* cleaner than os.path; see the examples at that URL.

                Thanks for the great tip about fileinput.input(), and thanks to all who
                answered my query. I've pasted the working code below.

                /wsbs

                import fileinput
                import os
                import re
                import string
                import sys
                from path import path

                # p2.py - fix broken SMTP headers in email files
                #
                # recurses from dir and searches all subdirs
                # for each file, evaluates whether 1st line starts with "From "
                # for each match, program deletes line

                # recurse dirs
                dir = path('/home/wsbs/Maildir')
                g = dir.walkfiles('*')
                for line in fileinput.input(g, inplace=1, backup='.bak'):
                    # just print 2nd and subsequent lines
                    if not fileinput.isfirstline():
                        print line.rstrip('\n')
                    # check first line only
                    elif fileinput.isfirstline():
                        if not re.search('^From ', line):
                            print line.rstrip('\n')
                fileinput.close()
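
For readers on Python 3 without the third-party path module, the same idea might look like the sketch below. It is not the code from this thread: os.walk stands in for walkfiles, the helper names iter_files and strip_from_lines are hypothetical, and the demo files are made up. The file list is collected before editing begins, which is one plausible reason the single fileinput.input() call behaves better than the nested loops.

```python
import fileinput
import os
import tempfile


def iter_files(root):
    """Yield every file path under root (os.walk stands in for path.walkfiles)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def strip_from_lines(root, backup=".bak"):
    """Delete the first line of each file under root if it starts with 'From '."""
    # Collect the names before editing starts, so the walk never sees the
    # backup files created along the way.
    files = list(iter_files(root))
    if not files:
        return  # an empty list would make fileinput read standard input instead
    with fileinput.input(files, inplace=True, backup=backup) as stream:
        for line in stream:
            if stream.isfirstline() and line.startswith("From "):
                continue  # drop the bogus header line
            print(line, end="")  # stdout is redirected into the file being edited


# Demo on a throwaway directory (hypothetical file names).
demo_root = tempfile.mkdtemp()
broken = os.path.join(demo_root, "msg1")
clean = os.path.join(demo_root, "msg2")
with open(broken, "w") as fh:
    fh.write("From nobody Thu Nov 23\nSubject: hi\n")
with open(clean, "w") as fh:
    fh.write("Subject: untouched\n")

strip_from_lines(demo_root)

with open(broken) as fh:
    broken_after = fh.read()
with open(clean) as fh:
    clean_after = fh.read()
```

Running it a second time on the same tree would also process the .bak files left by the first run, so a real script would probably want to skip names ending in the backup suffix.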


                • John Machin

                  #9
                  search versus match in re module

                  wo_shi_big_stomach wrote:
                  >Thanks for the great tip about fileinput.input(), and thanks to all who
                  >answered my query. I've pasted the working code below.
                  >
                  >[snip]
                  >    # check first line only
                  >    elif fileinput.isfirstline():
                  >        if not re.search('^From ', line):

                  This "works", and in this case you are doing it on only the first line
                  in each file, but for future reference:

                  1. Read the re docs section about when to use search and when to use
                  match; the "^" anchor in your pattern means that search and match give
                  the same result here.

                  However the time they take to do it can differ quite a bit :-0

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.match('^From ',text)"
                  100000 loops, best of 3: 4.39 usec per loop

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.match('^From ',text)"
                  100000 loops, best of 3: 4.41 usec per loop

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.match('^From ',text)"
                  100000 loops, best of 3: 4.4 usec per loop

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.search('^From ',text)"
                  100000 loops, best of 3: 6.54 usec per loop

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.search('^From ',text)"
                  10000 loops, best of 3: 26 usec per loop

                  C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.search('^From ',text)"
                  1000 loops, best of 3: 219 usec per loop

                  Aside: I noticed this years ago but assumed that the simple
                  optimisation of search was not done as a penalty on people who didn't
                  RTFM, and so didn't report it :-)
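
The same comparison can be run from a script with the timeit module. This is only a sketch in Python 3 syntax (an assumption), and the numbers will differ by interpreter version and machine; newer versions may not show the gap reported above.

```python
import re
import timeit

text = "x" * 10000  # same shape as the longest input in the timings above

# Both calls agree on the answer (no match here); only the cost may differ.
match_result = re.match("^From ", text)
search_result = re.search("^From ", text)

match_time = timeit.timeit(lambda: re.match("^From ", text), number=1000)
search_time = timeit.timeit(lambda: re.search("^From ", text), number=1000)
print("match:  %.6f s per 1000 calls" % match_time)
print("search: %.6f s per 1000 calls" % search_time)
```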

                  2. Then realise that your test is equivalent to

                  if not line.startswith('^From '):

                  which is much easier to understand without the benefit of comments, and
                  (bonus!) is also much faster than re.match:

                  C:\junk>\python25\python -mtimeit -s"text='x'*100" "text.startswith('^From ')"
                  1000000 loops, best of 3: 0.584 usec per loop

                  C:\junk>\python25\python -mtimeit -s"text='x'*1000" "text.startswith('^From ')"
                  1000000 loops, best of 3: 0.583 usec per loop

                  C:\junk>\python25\python -mtimeit -s"text='x'*10000" "text.startswith('^From ')"
                  1000000 loops, best of 3: 0.612 usec per loop

                  HTH,
                  John


                  • John Machin

                    #10
                    Re: search versus match in re module

                    John Machin wrote:
                    >[snip]
                    >2. Then realise that your test is equivalent to
                    >
                    >if not line.startswith('^From '):

                    Whoops!

                    That '^From ' (and all later ones) should have been 'From '

                    (the perils of over-hasty copy/paste)

                    The timings are, if anything, a tiny bit faster than before.
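
With the corrected literal, the equivalence can be checked directly. A small sketch in Python 3 syntax (an assumption); the two helper names and the sample lines are hypothetical:

```python
import re


def starts_with_from_re(line):
    # re.match only looks at the start of the string, so no "^" is needed.
    return re.match("From ", line) is not None


def starts_with_from_str(line):
    # Plain string test: note there is no "^" in the literal.
    return line.startswith("From ")


samples = [
    "From bob@example.com Thu Nov 23\n",
    "Subject: From here\n",
    "from lower case\n",
    "",
]
results = [(starts_with_from_re(s), starts_with_from_str(s)) for s in samples]
print(results)
```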

                    Cheers,
                    John


                    • Gabriel Genellina

                      #11
                      Re: combining the path and fileinput modules SOLVED

                      At Sunday 26/11/2006 01:29, wo_shi_big_stomach wrote:
                      >for line in fileinput.input(g, inplace=1, backup='.bak'):
                      >    # just print 2nd and subsequent lines
                      >    if not fileinput.isfirstline():
                      >        print line.rstrip('\n')
                      >    # check first line only
                      >    elif fileinput.isfirstline():
                      >        if not re.search('^From ', line):
                      >            print line.rstrip('\n')

                      Just a note: the elif is redundant, use a simple else clause.
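
A sketch of the same decision written with a plain else, in Python 3 syntax (an assumption). The helper keep_line is hypothetical and takes the first-line flag as an argument so that it can be tried without any files:

```python
def keep_line(is_first, line):
    """Return True if the line should be written back to the file."""
    if not is_first:
        # second and subsequent lines are always kept
        return True
    else:
        # first line: keep it unless it is the bogus "From " header
        return not line.startswith("From ")


decisions = [
    keep_line(False, "From here on\n"),
    keep_line(True, "From nobody Thu Nov 23\n"),
    keep_line(True, "Subject: hi\n"),
]
print(decisions)
```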


                      --
                      Gabriel Genellina
                      Softlab SRL
