recursive file editing

  • TaeKyon

    recursive file editing

    I'm a Python newbie; here are a few questions about a
    problem I'm trying to solve. I'm wondering if Python is the best
    instrument, or if awk or a mix of bash and sed would be better:

    1) How would I recursively descend
    through all files in all subdirectories of
    the one in which my script is called?

    2) each file examined should be edited; IF a string of this type is found

    foo.asp=dev?bar (where bar can be a digit or an empty space)

    it should always be substituted with this string

    foo-bar.html (if bar is an empty space the new string is foo-.html)

    3) the names of files read may themselves be of the sort foo.asp=dev?bar ;
    the edited output file should also be renamed according to the same rule
    as above ... or would this be better handled by a bash script ?

    Any hints appreciated

    --
    Michele Alzetta

  • Leif B. Kristensen

    #2
    Re: recursive file editing

    This is a Perl one-liner:

    perl -p -i -e 's/foo/bar/gi' `find ./`

    regards,
    --
    Leif Biberg Kristensen

    Validare necesse est


    • Josiah Carlson

      #3
      Re: recursive file editing

      > I'm a Python newbie; here are a few questions about a
      > problem I'm trying to solve. I'm wondering if Python is the best
      > instrument, or if awk or a mix of bash and sed would be better:
      >
      > 1) How would I recursively descend
      > through all files in all subdirectories of
      > the one in which my script is called?

      Check out os.path.walk.

      > 2) each file examined should be edited; IF a string of this type is found
      >
      > foo.asp=dev?bar (where bar can be a digit or an empty space)
      >
      > it should always be substituted with this string
      >
      > foo-bar.html (if bar is an empty space the new string is foo-.html)

      Check out the re module.

      > 3) the names of files read may themselves be of the sort foo.asp=dev?bar ;
      > the edited output file should also be renamed according to the same rule
      > as above ... or would this be better handled by a bash script ?

      Do it however you feel more comfortable; Python can do it.


      - Josiah


      • TaeKyon

        #4
        Re: recursive file editing

        On Sat, 03 Apr 2004 22:35:30 +0200, Leif B. Kristensen wrote:

        > This is a Perl one-liner:
        >
        > perl -p -i -e 's/foo/bar/gi' `find ./`

        Didn't work; however I realized I could just repeatedly run variations of

        sed -i s/foo/bar/g *

        and then rename the files with a bash script called renna.

        I'm sure python could do it but I got stuck -

        for file in os.walk('mydir'):
            print file[2]

        gives me the names of all files

        but how do I open each with r+ mode ?

        for thing in os.walk('mydir'):
            file(thing,mode =r+)

        is invalid syntax

        --
        Michele Alzetta


        • Josiah Carlson

          #5
          Re: recursive file editing

          > for thing in os.walk('mydir'):
          >     file(thing,mode =r+)

          for thing in os.walk('mydir'):
              filehandle = file(thing, 'r+')


          - Josiah


          • Peter Otten

            #6
            Re: recursive file editing

            TaeKyon wrote:
            > I'm a Python newbie; here are a few questions about a
            > problem I'm trying to solve. I'm wondering if Python is the best
            > instrument, or if awk or a mix of bash and sed would be better:
            >
            > 1) How would I recursively descend
            > through all files in all subdirectories of
            > the one in which my script is called?
            >
            > 2) each file examined should be edited; IF a string of this type is found
            >
            > foo.asp=dev?bar (where bar can be a digit or an empty space)
            >
            > it should always be substituted with this string
            >
            > foo-bar.html (if bar is an empty space the new string is foo-.html)
            >
            > 3) the names of files read may themselves be of the sort foo.asp=dev?bar ;
            > the edited output file should also be renamed according to the same rule
            > as above ... or would this be better handled by a bash script ?
            >
            > Any hints appreciated
            >

            The following code comes with no warranties. Be sure to back up valuable data
            before trying it. You may need to edit the regular expressions. Call the
            script with the directory you want to process.

            Peter

            import os, re, sys

            class Path(object):
                def __init__(self, folder, name):
                    self.folder = folder
                    self.name = name

                def _get_path(self):
                    return os.path.join(self.folder, self.name)
                path = property(_get_path)

                def rename(self, newname):
                    # only touch the file system if the name really changes
                    if self.name != newname:
                        os.rename(self.path, os.path.join(self.folder, newname))
                        self.name = newname

                def processContents(self, operation):
                    # rewrite the file only if operation() changed its contents
                    data = file(self.path).read()
                    newdata = operation(data)
                    if data != newdata:
                        file(self.path, "w").write(newdata)

                def __str__(self):
                    return self.path

            def files(rootfolder):
                for folder, folders, files in os.walk(rootfolder):
                    for name in files:
                        yield Path(folder, name)

            # file names: foo.asp=dev?bar -> foo-bar.html
            fileExpr = re.compile(r"^(.+?)\.asp\=dev\?(.*)$")
            filePattern = r"\1-\2.html"

            # file contents: the same rule, applied to quoted or slash-prefixed links
            textExpr = re.compile(r"([/\'\"])(.+?)\.asp\=dev\?(.*?)([\'\"])")
            textPattern = r"\1\2-\3.html\4"

            if __name__ == "__main__":
                for f in files(sys.argv[1]):
                    f.rename(fileExpr.sub(filePattern, f.name))
                    f.processContents(lambda s: textExpr.sub(textPattern, s))
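To see what the two expressions in the script above do before running it on real files, they can be tried on sample strings. This is only a sketch: the helper names new_name and new_text and the sample inputs are made up for illustration, the redundant backslash before "=" is dropped, and it is written for Python 3.

```python
import re

# Same expressions as in the script, minus the redundant "\=" escape.
file_expr = re.compile(r"^(.+?)\.asp=dev\?(.*)$")
text_expr = re.compile(r"([/'\"])(.+?)\.asp=dev\?(.*?)(['\"])")


def new_name(name):
    """Apply the file-name rule; names that do not match come back unchanged."""
    return file_expr.sub(r"\1-\2.html", name)


def new_text(text):
    """Apply the in-file rule to links that sit between quotes."""
    return text_expr.sub(r"\1\2-\3.html\4", text)


print(new_name("foo.asp=dev?3"))
print(new_text('<a href="foo.asp=dev?3">'))
```

Note that the text expression only fires when the link is preceded by a slash or quote and followed by a quote, which is why the original post says the expressions may need editing.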


            • TaeKyon

              #7
              Re: recursive file editing

              On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:

              > for thing in os.walk('mydir'):
              >     filehandle = file(thing, 'r+')

              I'm such a newbie I can't get it to work. Here is an example:

              in empty directory foo I touch a b c d;
              suppose I want to write "This works !" in each of these files.

              I run python
              >>> import os
              >>> for thing in os.walk('foo'):
              ...     thingopen = file(thing,'r+')
              ...     thingopen.write("This works !")
              ...     thingopen.close()
              ...
              Traceback (most recent call last):
                File "<stdin>", line 2, in ?
              TypeError: coercing to Unicode: need string or buffer, tuple found

              And in fact:
              >>> for thing in os.walk('foo'):
              ...     print thing
              ...
              ('foo', [], ['a', 'b', 'c', 'd'])

              which is a tuple, I suppose.

              Selecting thing[2] doesn't help, because it now complains of it being a
              list.

              In the end I get this to work:

              for filetuple in os.walk('foo'):
                  for filename in filetuple[2]:
                      fileopen = file(filename, 'r+')
                      fileopen.write("This works !")
                      fileopen.close()

              which seems a bit of a clumsy way to do it.
              And besides it only works if I run python from directory foo,
              otherwise it tells me "no such file or directory".

              --
              Michele Alzetta


              • Peter Otten

                #8
                Re: recursive file editing

                TaeKyon wrote:
                > On Sat, 03 Apr 2004 17:22:04 -0800, Josiah Carlson wrote:
                >
                >> for thing in os.walk('mydir'):
                >>     filehandle = file(thing, 'r+')
                >
                > I'm such a newbie I can't get it to work. Here is an example:
                >
                > in empty directory foo I touch a b c d;
                > suppose I want to write "This works !" in each of these files.
                >
                > I run python
                > >>> import os
                > >>> for thing in os.walk('foo'):
                > ...     thingopen = file(thing,'r+')
                > ...     thingopen.write("This works !")
                > ...     thingopen.close()
                > ...
                > Traceback (most recent call last):
                >   File "<stdin>", line 2, in ?
                > TypeError: coercing to Unicode: need string or buffer, tuple found
                >
                > And in fact:
                >
                > >>> for thing in os.walk('foo'):
                > ...     print thing
                > ...
                > ('foo', [], ['a', 'b', 'c', 'd'])
                >
                > which is a tuple, I suppose.
                >
                > Selecting thing[2] doesn't help, because it now complains of it being a
                > list.
                >
                > In the end I get this to work:
                >
                > for filetuple in os.walk('foo'):
                >     for filename in filetuple[2]:
                >         fileopen = file(filename, 'r+')
                >         fileopen.write("This works !")
                >         fileopen.close()
                >
                > which seems a bit of a clumsy way to do it.
                > And besides it only works if I run python from directory foo,
                > otherwise it tells me "no such file or directory".

                A minimal working example is:

                import os
                for path, folders, files in os.walk("/path/to/folder"):
                    for name in files:
                        filepath = os.path.join(path, name)
                        fileopen = file(filepath, 'r+')
                        fileopen.write("This works !")
                        fileopen.close()

                You need to compose the filepath, and, yes, it's a bit clumsy.
                I've written a little generator function to hide some of the clumsiness:

                def files(folder):
                    for path, folders, files in os.walk(folder):
                        for name in files:
                            yield os.path.join(path, name)

                With that the code is simplified to:

                for filepath in files("/path/to/folder"):
                    fileopen = file(filepath, 'r+')
                    fileopen.write("This works !")
                    fileopen.close()
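For readers on Python 3, where the file builtin no longer exists, the same idea might look roughly like the sketch below. The name edit_files, its transform parameter and the UTF-8 encoding are assumptions for illustration, not part of the thread. It also reads each file fully and rewrites it instead of using 'r+', because writing in 'r+' mode overwrites from the start of the file without truncating what follows.

```python
import os


def files(folder):
    """Yield the full path of every file below folder."""
    for path, _folders, names in os.walk(folder):
        for name in names:
            yield os.path.join(path, name)


def edit_files(folder, transform):
    """Apply transform (str -> str) to each file; rewrite only if it changed.

    Returns the list of paths that were rewritten. Assumes UTF-8 text files.
    """
    changed = []
    for filepath in files(folder):
        with open(filepath, "r", encoding="utf-8") as handle:
            data = handle.read()
        newdata = transform(data)
        if newdata != data:
            with open(filepath, "w", encoding="utf-8") as handle:
                handle.write(newdata)
            changed.append(filepath)
    return changed
```

A call such as edit_files("/path/to/folder", lambda s: s.replace("foo", "bar")) would then touch only the files that actually contain "foo".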

                HTH,
                Peter



                • Michael Geary

                  #9
                  Re: recursive file editing

                  > for filetuple in os.walk('foo'):
                  >     for filename in filetuple[2]:
                  >         fileopen = file(filename, 'r+')
                  >         fileopen.write("This works !")
                  >         fileopen.close()
                  >
                  > which seems a bit of a clumsy way to do it.

                  You have the right idea, although this would be a bit cleaner:

                  for root, dirs, files in os.walk('foo'):
                      for name in files:
                          etc...

                  You might want to take a look at the documentation for os.walk. It explains
                  all this and has a couple of good code samples.

                  > And besides it only works if I run python from directory foo,
                  > otherwise it tells me "no such file or directory".

                  You mean if you run Python from the *parent* directory of foo, right?

                  'foo' is a relative path, not an absolute one, so it gets appended to the
                  current directory.
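A small sketch of that point, written for Python 3: the same relative name resolves to different files depending on the working directory. The helper resolve and the directories in the usage note are hypothetical.

```python
import os


def resolve(relative, cwd):
    """Return the path that `relative` would refer to if `cwd` were current."""
    return os.path.normpath(os.path.join(cwd, relative))


# os.path.abspath does the same thing using the real current directory.
here = os.path.abspath("foo")
```

For instance, resolve("foo", "/home/user") and resolve("foo", "/tmp") give two different absolute paths, which is why bare file names returned by os.walk have to be joined with the folder they came from.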

                  -Mike



                  • TaeKyon

                    #10
                    Re: recursive file editing

                    On Mon, 05 Apr 2004 19:15:01 +0200, Peter Otten wrote:

                    > You need to compose the filepath, and, yes, it's a bit clumsy.
                    > I've written a little generator function to hide some of the clumsiness:
                    >
                    > def files(folder):
                    >     for path, folders, files in os.walk(folder):
                    >         for name in files:
                    >             yield os.path.join(path, name)
                    >
                    > With that the code is simplified to:
                    >
                    > for filepath in files("/path/to/folder"):
                    >     fileopen = file(filepath, 'r+')
                    >     fileopen.write("This works !")
                    >     fileopen.close()

                    Great !

                    --
                    Michele Alzetta


                    • Josiah Carlson

                      #11
                      Re: recursive file editing

                      >> for thing in os.walk('mydir'):
                      >>     filehandle = file(thing, 'r+')
                      >
                      > I'm such a newbie I can't get it to work. Here is an example:

                      Nah, it's my fault. I thought you were having issues with the file open
                      mode needing to be a string. I forgot the format of os.walk iteration.

                      Good to hear that you now have something that works the way you want it.

                      - Josiah


                      • TaeKyon

                        #12
                        Re: recursive file editing

                        On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:

                        > The following code comes with no warranties. Be sure to back up valuable data
                        > before trying it. You may need to edit the regular expressions. Call the
                        > script with the directory you want to process.

                        Seems to work all right!
                        I have a question:

                        > class Path(object):
                        # multiple function definitions follow, amongst which:

                        > def files(rootfolder):
                        >     for folder, folders, files in os.walk(rootfolder):
                        >         for name in files:
                        >             yield Path(folder, name)

                        So 'Path' is the name of a class and _contemporaneously_ the
                        result of one of the functions the class contains?
                        Or are there really two separate 'Path' things which don't interfere
                        because each has its own namespace?

                        I'm sorry for the repeated questions, maybe I should take this discussion
                        over to the tutor mailing list !

                        --
                        Michele Alzetta


                        • Peter Otten

                          #13
                          Re: recursive file editing

                          TaeKyon wrote:
                          > On Sun, 04 Apr 2004 12:11:25 +0200, Peter Otten wrote:
                          >
                          >> The following code comes with no warranties. Be sure to back up valuable
                          >> data before trying it. You may need to edit the regular expressions. Call
                          >> the script with the directory you want to process.
                          >
                          > Seems to work all right!
                          > I have a question:
                          >
                          >> class Path(object):
                          > # multiple function definitions follow, amongst which:
                          >
                          >> def files(rootfolder):
                          >>     for folder, folders, files in os.walk(rootfolder):
                          >>         for name in files:
                          >>             yield Path(folder, name)
                          >
                          > So 'Path' is the name of a class and _contemporaneously_ the
                          > result of one of the functions the class contains?

                          No, the functions up to __str__() are indented one level. This means they
                          belong to the Path class, i.e. they are methods.
                          In contrast, files() is a standalone function - or more precisely a
                          generator. As a rule of thumb you can tell functions from methods by
                          looking at the first parameter - if it's called "self" it's a method.

                          As a side note, though it's not the case here it is possible for a class to
                          have methods that return new instances of the same class (or even the same
                          instance which is what a considerable fraction of python users wishes for
                          list.sort()).
                          For example:

                          class Path(object):
                              # ... as above
                              def child(self, name):
                                  """ create a new Path instance denoting a
                                  child of the current path """
                                  return Path(self.path, name)
                              def __repr__(self):
                                  """ added for better commandline experience :-) """
                                  return "Path(%r)" % self.path

                          Now try it:

                          >>> from processtree import Path
                          >>> p = Path("/path/to", "folder")
                          >>> p.child("file")
                          Path('/path/to/folder/file')
                          >>> p.child("sub").child("subsub")
                          Path('/path/to/folder/sub/subsub')

                          > Or are there really two separate 'Path' things which don't interfere
                          > because each has its own namespace?

                          No, every Path(folder, name) creates a new Path instance as defined above.
                          When you see Class(arg1, arg2, ..., argN), under the hood Python creates a
                          new instance of Class and calls the special __init__(self, arg1, ..., argN)
                          method with the instance as the first (called self by convention) and
                          arg1,..., argN as the following arguments.
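The difference between a method, a standalone function and the implicit __init__ call can be seen in a toy example. Greeter and greet are invented names used only for this sketch; it runs unchanged on Python 2 and Python 3.

```python
class Greeter(object):
    def __init__(self, name):
        # Runs automatically when Greeter(name) is called;
        # self is the freshly created instance.
        self.name = name

    def greet(self):
        # A method: indented inside the class, first parameter is self.
        return "Hello, %s" % self.name


def greet(name):
    # A standalone function: not indented under the class, no self.
    # It creates a new Greeter instance each time it is called.
    return Greeter(name).greet()
```

Here greet("Ann") and Greeter("Ann").greet() give the same string; the module-level greet and the method Greeter.greet do not clash, because the method is only reachable through the class or one of its instances.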

                          > I'm sorry for the repeated questions, maybe I should take this discussion
                          > over to the tutor mailing list !

                          I suggest that you stick with the simpler approach in my later post
                          until you have a firm grip on classes. For the task at hand the Path class
                          seems like overkill, now that I reconsider it.

                          Peter


                          • TaeKyon

                            #14
                            Re: recursive file editing

                            On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:

                            > I suggest that you stick with the simpler approach in my later post
                            > until you have a firm grip on classes. For the task at hand the Path class
                            > seems like overkill, now that I reconsider it.

                            Here is a variation on the theme I came up with this afternoon:

                            #!/usr/bin/python
                            import os, sys, re, fileinput

                            try:
                                target_folder = (sys.argv[1])
                                original_pattern = (sys.argv[2])
                                result_pattern = (sys.argv[3])
                            except:
                                print "Substitutes a string with another in all files of a directory"
                                print " Use: ./MyScript.py directory string other_string"
                                sys.exit()
                            for folders, folder, filelist in os.walk(target_folder):
                                for filename in filelist:
                                    file = os.path.join(folders,filename)
                                    for line in fileinput.input(file,'inplace=1'):
                                        line = re.sub(original_pattern,result_pattern,line)
                                        print line
                            # Commented out because apparently useless, from the documentation I
                            # don't quite understand whether it ought to be here or not
                            # fileinput.close()

                            This works - almost.
                            1) It does substitute the pattern, however it seems to
                            add a newline for each newline present in the original file every time
                            it is run (so files get longer and longer), and I don't understand why.
                            2) The final fileinput.close() seems to be useless; the program works
                            without, and bug 1) isn't affected.

                            --
                            Michele Alzetta


                            • Peter Otten

                              #15
                              Re: recursive file editing

                              TaeKyon wrote:
                              > On Tue, 06 Apr 2004 15:08:37 +0200, Peter Otten wrote:
                              >
                              >> I suggest that you stick with the simpler approach in my later post
                              >> until you have a firm grip on classes. For the task at hand the Path
                              >> class seems like overkill, now that I reconsider it.
                              >
                              > Here is a variation on the theme I came up with this afternoon:
                              >
                              > #!/usr/bin/python
                              > import os, sys, re, fileinput
                              >
                              > try:
                              >     target_folder = (sys.argv[1])
                              >     original_pattern = (sys.argv[2])
                              >     result_pattern = (sys.argv[3])
                              > except:
                              >     print "Substitutes a string with another in all files of a directory"
                              >     print " Use: ./MyScript.py directory string other_string"
                              >     sys.exit()
                              > for folders, folder, filelist in os.walk(target_folder):
                              >     for filename in filelist:
                              >         file = os.path.join(folders,filename)

                              file as a variable name is not recommended, because it hides the builtin
                              file.

                              >         for line in fileinput.input(file,'inplace=1'):

                              That 'inplace=1' works as expected is sheer luck because any non-empty
                              string works as a True value - 'inplace=0' would have the same effect. Make
                              that

                                          for line in fileinput.input(file, inplace=1):
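The point about the string argument can be checked directly. This tiny sketch only uses bool() and runs on any Python version; the list contents are just the two strings discussed here plus an empty string for contrast.

```python
# Any non-empty string counts as true, whatever text it contains.
flags = ["inplace=1", "inplace=0", ""]
truth = [bool(flag) for flag in flags]
print(truth)  # [True, True, False]
```

So passing 'inplace=0' as a string would still switch in-place editing on; only a real keyword argument such as inplace=0 turns it off.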

                              >             line = re.sub(original_pattern,result_pattern,line)
                              >             print line

                              Add a trailing comma to the above line. Lines are always read including the
                              trailing newline, and the print statement adds another newline if it does
                              not end with a comma like so:
                              print line,
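In Python 3, print is a function and the trailing comma is replaced by the end argument. The sketch below reproduces the doubled newlines and the fix; emit is a hypothetical helper that captures the output in a string, and the comparison with the Python 2 statements is approximate.

```python
import io


def emit(lines, end):
    """Send each line through print() with the given end; return the text."""
    out = io.StringIO()
    for line in lines:
        print(line, end=end, file=out)
    return out.getvalue()


lines = ["first\n", "second\n"]      # lines as read from a file
doubled = emit(lines, end="\n")      # roughly Python 2 "print line"
intact = emit(lines, end="")         # roughly Python 2 "print line,"
```

With the default end, every line gains a second newline on each run, which matches the ever-growing files described in the question.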

                              > # Commented out because apparently useless, from the documentation I
                              > # don't quite understand whether it ought to be here or not
                              > # fileinput.close()

                              The file will eventually be closed anyway - if you omit the close() call
                              it's up to the python implementation to decide when that will happen.

                              >
                              > This works - almost.
                              > 1) It does substitute the pattern, however it seems to
                              > add a newline for each newline present in the original file every time
                              > it is run (so files get longer and longer), and I don't understand why.
                              > 2) The final fileinput.close() seems to be useless; the program works
                              > without, and bug 1) isn't affected.

                              I've never used fileinput, so I probably shouldn't comment on that, but the
                              first impression is that it does too much magic (like redirecting stdout,
                              and chaining multiple files) for my taste. If I read the documentation
                              correctly you could omit the intermediate loop like so (untested):

                              # using your variable names
                              for folders, folder, filelist in os.walk(target_ folder):
                              os.chdir(folder s)
                              for line in fileinput.input (filelist, inplace=1):
                              print re.sub(original _pattern,result _pattern,line),
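A Python 3 take on the same idea might look like the sketch below. It is not the code from this thread: substitute_in_tree is a made-up name, and two choices differ on purpose. Paths are joined instead of calling os.chdir, since the os.walk documentation warns against changing the working directory while walking a relative path, and an empty file list is skipped because fileinput falls back to the command line or standard input when it is given no files.

```python
import fileinput
import os
import re


def substitute_in_tree(target_folder, original_pattern, result_pattern):
    """Replace original_pattern by result_pattern in every file below target_folder."""
    paths = [
        os.path.join(folder, name)
        for folder, _subfolders, names in os.walk(target_folder)
        for name in names
    ]
    if not paths:
        # With no files, fileinput would read sys.argv[1:] or stdin instead.
        return
    # The with statement closes fileinput explicitly when the loop is done.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            # In in-place mode, print() writes into the file being read.
            print(re.sub(original_pattern, result_pattern, line), end="")
```

Calling substitute_in_tree("mydir", "foo", "bar") would then chain all files under mydir through a single fileinput loop, as suggested above.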

                              Peter
