Parsing and Editing Source

  • Paul Wilson

    Parsing and Editing Source

    Hi all,

    I'd like to be able to do the following to a Python source file
    programmatically:
    * Read in a source file
    * Add/Remove/Edit Classes, methods, functions
    * Add/Remove/Edit Decorators
    * List the Classes
    * List the imported modules
    * List the functions
    * List methods of classes

    And then save out the result back to the original file (or elsewhere).
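On Python 3, where the 'compiler' module no longer exists, the standard ast module covers the "list" items above. This is a minimal sketch. It assumes the file parses under the running interpreter and looks only at module-level statements. summarize() and its dictionary keys are hypothetical names chosen for illustration.

```python
import ast


def summarize(source):
    """Hypothetical helper: list imports, top-level functions, and classes with their methods."""
    tree = ast.parse(source)
    summary = {"imports": [], "functions": [], "classes": {}}
    # Only module-level statements are examined; nested definitions are skipped.
    for node in tree.body:
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x", so keep the leading dots.
            summary["imports"].append("." * node.level + (node.module or ""))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            # Methods are the function definitions directly inside the class body.
            summary["classes"][node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return summary


example = '''
import os
from collections import OrderedDict

def helper():
    pass

class Person(object):
    def __init__(self, name):
        self.name = name

    @property
    def upper(self):
        return self.name.upper()
'''

print(summarize(example))
```

Functions and classes nested inside other functions are ignored here. ast.walk() would reach them if they are needed.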

    I've begun by using the tokenize module to generate a list of token
    tuples and am building data structures around it that support the
    operations above. I'm finding that I'm getting a little caught up in the
    details, so I thought I'd step back and ask whether there's a more
    elegant way to approach this, or whether anyone knows of a library that
    could help.

    So far, I've got code that builds a dictionary mapping line numbers to
    lists of token tuples, and I'm working on a data structure that records
    where each class begins and ends, indexed by class name, so that the
    classes can be modified later.
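Here is a minimal Python 3 sketch of the line-number-to-tokens table described above, built with tokenize.generate_tokens(). For the "where each class begins and ends" part, it swaps in the ast module instead. There, ClassDef nodes carry lineno and (from Python 3.8) end_lineno, so no token bookkeeping is needed. tokens_by_line() and class_spans() are hypothetical helper names.

```python
import ast
import io
import tokenize
from collections import defaultdict


def tokens_by_line(source):
    """Hypothetical helper: map each line number to the tokens that start on it."""
    by_line = defaultdict(list)
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # tok.start is a (row, col) pair; rows are 1-based.
        by_line[tok.start[0]].append(tok)
    return dict(by_line)


def class_spans(source):
    """Hypothetical helper: map each class name to (first line, last line), decorators included."""
    spans = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            # end_lineno needs Python 3.8+; nested classes sharing a name overwrite each other here.
            spans[node.name] = (first, node.end_lineno)
    return spans


source = "@register\nclass Person:\n    name = 'default'\n\nx = 1\n"
for line_no, toks in sorted(tokens_by_line(source).items()):
    print(line_no, [tok.string for tok in toks])
print(class_spans(source))  # {'Person': (1, 3)}
```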

    Any thoughts?
    Thanks,
    Paul
  • eliben

    #2
    Re: Parsing and Editing Source

    On Aug 15, 4:21 pm, "Paul Wilson" <paulalexwil...@gmail.com> wrote:
    Consider using the 'compiler' module which will lend you more help
    than 'tokenize'.

    For example, the following demo lists all the method names in a file:

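    # Python 2 only: the compiler module was removed in Python 3.
    import compiler
    import sys

    class MethodFinder:
        """Print the names of all the methods.

        Each visit method takes two arguments, the node and its
        current scope. The scope is the name of the current class or None.
        """

        def visitClass(self, node, scope=None):
            # Walk the class body with the class name as the new scope.
            self.visit(node.code, node.name)

        def visitFunction(self, node, scope=None):
            # A function found directly inside a class body is a method.
            if scope is not None:
                print "%s.%s" % (scope, node.name)
            # Functions nested inside this one are not methods of the class.
            self.visit(node.code, None)

    def main(files):
        mf = MethodFinder()
        for filename in files:
            f = open(filename)
            buf = f.read()
            f.close()
            tree = compiler.parse(buf)
            # compiler.walk() sets mf.visit and dispatches each node to the
            # matching visitXxx method (or to its children if there is none).
            compiler.walk(tree, mf)

    if __name__ == "__main__":
        # sys.argv[0] is the script itself, so pass only the file arguments.
        main(sys.argv[1:])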
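The 'compiler' module is Python 2 only; it was removed in Python 3. Below is a sketch of the same method finder ported to the standard ast module. Here ast.NodeVisitor plays the role of compiler.walk(). The sketch collects the names instead of printing them so the result can be reused. find_methods() is a hypothetical helper, not part of any library.

```python
import ast
import sys


class MethodFinder(ast.NodeVisitor):
    """Collect "Class.method" names; self.scope is the enclosing class name or None."""

    def __init__(self):
        self.scope = None
        self.methods = []

    def visit_ClassDef(self, node):
        # Walk the class with its name as the scope, then restore the outer scope.
        outer, self.scope = self.scope, node.name
        self.generic_visit(node)
        self.scope = outer

    def _visit_function(self, node):
        # A function found directly inside a class body is a method.
        if self.scope is not None:
            self.methods.append("%s.%s" % (self.scope, node.name))
        # Functions nested inside this one are not methods of the class.
        outer, self.scope = self.scope, None
        self.generic_visit(node)
        self.scope = outer

    visit_FunctionDef = _visit_function
    visit_AsyncFunctionDef = _visit_function


def find_methods(source):
    """Hypothetical helper: return the method names defined in source."""
    finder = MethodFinder()
    finder.visit(ast.parse(source))
    return finder.methods


def main(paths):
    for path in paths:
        with open(path) as f:
            for name in find_methods(f.read()):
                print(name)


if __name__ == "__main__":
    main(sys.argv[1:])  # skip sys.argv[0], the script itself
```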


    • Wilson

      #3
      Re: Parsing and Editing Source

      On Aug 15, 3:45 pm, eliben <eli...@gmail.com> wrote:
      > Consider using the 'compiler' module which will lend you more help
      > than 'tokenize'.
      Thanks! Will I be able to make changes to the AST, such as "rename
      decorator" or "add decorator", and write them back out to a file as
      Python source?
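On Python 3.9 and later, ast.unparse() can turn an edited tree back into source. This sketch assumes bare-name decorators. DecoratorEditor, edit_source and the decorator names old_deco, new_deco and logged are all hypothetical. The sketch renames one decorator and adds another to every function and class. Note the trade-off: ast.unparse() regenerates the text from the tree, so comments and the original formatting are lost.

```python
import ast


class DecoratorEditor(ast.NodeTransformer):
    """Hypothetical transformer: rename one bare-name decorator, append another everywhere."""

    def __init__(self, old, new, extra):
        self.old, self.new, self.extra = old, new, extra

    def _edit(self, node):
        for deco in node.decorator_list:
            # Only plain "@name" decorators are matched in this sketch.
            if isinstance(deco, ast.Name) and deco.id == self.old:
                deco.id = self.new
        # Appended last, so it sits directly above the def/class line.
        node.decorator_list.append(ast.Name(id=self.extra, ctx=ast.Load()))
        self.generic_visit(node)  # also edit nested functions and classes
        return node

    visit_FunctionDef = _edit
    visit_AsyncFunctionDef = _edit
    visit_ClassDef = _edit


def edit_source(source, old, new, extra):
    """Hypothetical helper: apply DecoratorEditor and regenerate source text."""
    tree = DecoratorEditor(old, new, extra).visit(ast.parse(source))
    # Fill in line numbers for the new Name nodes, which compile() would require.
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


src = "@old_deco\ndef f():\n    return 1  # this comment will be lost\n"
print(edit_source(src, "old_deco", "new_deco", "logged"))
```

That loss of comments is why round-tripping tools tend to work on a concrete syntax tree or on tokens rather than on the AST.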

      Regards,
      Paul


      • Rafe

        #4
        Re: Parsing and Editing Source

        On Aug 15, 9:21 pm, "Paul Wilson" <paulalexwil...@gmail.com> wrote:

        I can't help much...yet, but I am also very interested in this, as I
        will soon be starting a project that requires writing code which
        manipulates code and writes it back to the same file or a new one.
        I had planned on using the inspect module's getsource(), getmodule()
        and getmembers() functions rather than doing any sort of file reading.
        Have you tried any of these yet? Have you found any insurmountable
        limitations?

        It looks like everything needed is there. Some quick thoughts
        regarding the results of inspect.getmembers(module)...
        * Module objects can be written out based on their attribute name and
        __name__ value. If they are the same, just write "import %s" %
        mod.__name__. If they are different, write "import %s as %s" %
        (mod.__name__, name), since the attribute name is the alias.

        * Skipping built-in stuff is easy, and everything else is either an
        attribute name/value pair or an object of type 'function' or 'class',
        both of which work with inspect.getsource(), I believe.

        * If the module used any from-import-* lines, it doesn't look like
        there is any difference between items defined in the module and those
        imported into the module's namespace. Writing this back directly
        would 'flatten' these imports into individual module imports and local
        module attributes. Maybe reading the file just to test for this would
        be the answer. You could then import the module and subtract the items
        that haven't changed. This is easy for attributes but harder for
        functions and classes...right?
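The first bullet, as a sketch, with the alias form in the right order: the module's real __name__ comes first and the attribute name second, as in "import json as j". import_lines() is a hypothetical helper. Dotted imports such as "import os.path" collapse to "import os", because only the top-level name is bound in the module.

```python
import inspect
import types


def import_lines(module):
    """Hypothetical helper: rebuild 'import' statements for module objects bound in module."""
    lines = []
    for name, mod in inspect.getmembers(module, inspect.ismodule):
        if name.startswith("__"):
            continue  # skip dunder entries
        if name == mod.__name__:
            lines.append("import %s" % mod.__name__)
        else:
            # The attribute name is the alias: "import real_name as alias".
            lines.append("import %s as %s" % (mod.__name__, name))
    return lines


# A throwaway module built at runtime to stand in for the one being inspected.
demo = types.ModuleType("demo")
exec("import json\nimport collections as coll\n", demo.__dict__)
print(import_lines(demo))  # ['import collections as coll', 'import json']
```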


        Beyond this initial bit of code, I'm hoping to be able to generate new
        code in which the new object only has the attributes that were
        changed. So if I have an instance of a Person object whose name has
        been changed from its default, I only want a new class that inherits
        from Person and has an attribute 'name' with the new value.
        Basically, I'd be using Python as a text-based storage format instead
        of something like XML. Thoughts on this would be great, if it doesn't
        hijack the thread ;) I know there are quite a few people who have done
        this already.
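Here is one way to read the Person idea, as a sketch under stated assumptions. Defaults are assumed to live as class attributes and changed values in the instance's __dict__. Every value is also assumed to have a repr() that is valid Python. subclass_source(), Person and RafePerson are hypothetical names.

```python
def subclass_source(instance, new_name):
    """Hypothetical helper: source for a subclass of type(instance) holding only changed attributes."""
    cls = type(instance)
    lines = ["class %s(%s):" % (new_name, cls.__name__)]
    for attr, value in sorted(vars(instance).items()):
        # Keep only attributes whose value differs from the class-level default.
        if getattr(cls, attr, object()) != value:
            lines.append("    %s = %r" % (attr, value))
    if len(lines) == 1:
        lines.append("    pass")  # nothing changed
    return "\n".join(lines) + "\n"


class Person(object):  # hypothetical example class
    name = "default"
    age = 0


p = Person()
p.name = "Rafe"
p.age = 0  # same as the default, so it is not written out
print(subclass_source(p, "RafePerson"))
```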


        Cheers,

        - Rafe





        • Wilson

          #5
          Re: Parsing and Editing Source

          On Aug 15, 4:16 pm, Rafe <rafesa...@gmail.com> wrote:
          > I can't help much...yet, but I am also very interested in this, as I
          > will soon be starting a project that requires writing code which
          > manipulates code and writes it back to the same file or a new one.
          > I had planned on using the inspect module's getsource(), getmodule()
          > and getmembers() functions rather than doing any sort of file reading.
          > Have you tried any of these yet? Have you found any insurmountable
          > limitations?
          The inspect module's getsource() returns the source code as it was
          originally defined in the file. It does not reflect any changes made
          at runtime. So if you attached a new class to a module, I don't
          believe getsource() would be any use for extracting that code again
          to save it. I have rejected this approach for that reason.
          getmembers() seems to be fine for this purpose; however, I don't see
          any way to get class decorators and method decorators out.
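A decorator generally leaves no trace of itself on the object that getmembers() returns. At most there is a __wrapped__ attribute, when the decorator uses functools.wraps. The decorator expressions are still in the source, though. This Python 3.9+ sketch reads them with the ast module instead, relying on ast.unparse() to turn each decorator back into text. decorators() is a hypothetical helper.

```python
import ast


def decorators(source):
    """Hypothetical helper: map 'func', 'Class' and 'Class.method' to their decorator text."""
    found = {}
    defs = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def record(node, prefix):
        qualname = prefix + node.name
        # ast.unparse() (Python 3.9+) turns each decorator expression back into text.
        found[qualname] = [ast.unparse(d) for d in node.decorator_list]
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, defs):
                    record(item, qualname + ".")

    for node in ast.parse(source).body:
        if isinstance(node, defs):
            record(node, "")
    return found


example = (
    "@register\n"
    "class Person:\n"
    "    @property\n"
    "    def name(self):\n"
    "        return 'x'\n"
    "\n"
    "    @staticmethod\n"
    "    def make():\n"
    "        pass\n"
    "\n"
    "def plain():\n"
    "    pass\n"
)
print(decorators(example))
```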
          > * Skipping built-in stuff is easy, and everything else is either an
          > attribute name/value pair or an object of type 'function' or 'class',
          > both of which work with inspect.getsource(), I believe.
          True, but if you add a function or class at runtime,
          inspect.getsource() will not pick it up. It reads the source from
          a file rather than doing some sort of AST-unparse magic, as I'd hoped.
          You'll also have to check whether getsource() returns an object's
          decorators too.
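Here is a quick Python 3 check of that getsource() limitation. The source is looked up through the file recorded on the code object. A function built at runtime from a string therefore has nothing to read, and inspect.getsource() raises OSError. try_getsource() is a hypothetical wrapper. json.dumps is used only as an example of a function defined in a .py file.

```python
import inspect
import json


def try_getsource(obj):
    """Hypothetical wrapper: return obj's source text, or None if inspect cannot find it."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):  # OSError: no source file; TypeError: e.g. built-ins
        return None


namespace = {}
# A function created at runtime from a string has no file behind it.
exec("def added_at_runtime():\n    return 42\n", namespace)

print(try_getsource(json.dumps) is not None)  # defined in a .py file, so source is found
print(try_getsource(namespace["added_at_runtime"]))  # None
```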
          > * If the module used any from-import-* lines, it doesn't look like
          > there is any difference between items defined in the module and those
          > imported into the module's namespace. Writing this back directly
          > would 'flatten' these imports into individual module imports and local
          > module attributes. Maybe reading the file just to test for this would
          > be the answer. You could then import the module and subtract the items
          > that haven't changed. This is easy for attributes but harder for
          > functions and classes...right?
          Does getmodule() not tell you where objects are defined?
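For functions and classes, roughly yes. inspect.getmodule() works from the object's __module__ attribute, which names the module where the object was defined. That means items pulled in by "from json import *" can be told apart from local definitions. This sketch compares __module__ directly, because getmodule() only finds modules registered in sys.modules. split_by_origin() and the demo module are hypothetical. Plain data attributes such as numbers and strings carry no __module__, so they still need another approach.

```python
import inspect
import types


def split_by_origin(module):
    """Hypothetical helper: split functions and classes in module into local and imported names."""
    local, imported = [], []
    for name, obj in inspect.getmembers(module):
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # __module__ names the module the object was defined in.
        if obj.__module__ == module.__name__:
            local.append(name)
        else:
            imported.append(name)
    return local, imported


# A throwaway module built at runtime to stand in for the file being edited.
demo = types.ModuleType("demo")
exec("from json import *\n\ndef helper():\n    pass\n", demo.__dict__)
print(split_by_origin(demo))
```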
          > Beyond this initial bit of code, I'm hoping to be able to generate new
          > code in which the new object only has the attributes that were
          > changed. So if I have an instance of a Person object whose name has
          > been changed from its default, I only want a new class that inherits
          > from Person and has an attribute 'name' with the new value.
          > Basically, I'd be using Python as a text-based storage format instead
          > of something like XML. Thoughts on this would be great, if it doesn't
          > hijack the thread ;) I know there are quite a few people who have done
          > this already.
          You want to be able to make class attribute changes and then have some
          automated way of generating overriding subclasses that reflect the
          change? Sounds difficult. Be sure to keep me posted on your journey!

          Regards,
          Paul


          • Benjamin

            #6
            Re: Parsing and Editing Source

            On Aug 15, 9:21 am, "Paul Wilson" <paulalexwil...@gmail.com> wrote:

            Look at the 2to3 tool, which is good at this sort of thing. It lets you
            define custom "fixers" that work on a fairly high-level representation
            of the parse tree, and it writes the source back out with everything
            you didn't change preserved exactly.
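lib2to3 (and the 2to3 tool) were deprecated in Python 3.11 and removed in 3.13. As a much smaller stand-in for the same idea, here is a sketch using only the standard tokenize module. It changes one token, then lets tokenize.untokenize() rebuild the text from the original token positions, so comments and spacing survive. rename_decorator() and _starts_decorator() are hypothetical helpers. They only touch a name directly after an '@' that begins its line. Unlike a real 2to3 fixer, they know nothing about the grammar beyond tokens.

```python
import io
import tokenize


def _starts_decorator(tok):
    """Hypothetical helper: True if tok is an '@' that begins its line (not matrix multiply)."""
    indent = len(tok.line) - len(tok.line.lstrip())
    return tok.type == tokenize.OP and tok.string == "@" and tok.start[1] == indent


def rename_decorator(source, old, new):
    """Hypothetical helper: rename @old to @new, leaving all other text untouched."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    out = []
    for i, tok in enumerate(tokens):
        if (i > 0 and _starts_decorator(tokens[i - 1])
                and tok.type == tokenize.NAME and tok.string == old):
            tok = tok._replace(string=new)
        out.append(tok)
    # Given full 5-tuples, untokenize() restores spacing from the original positions.
    return tokenize.untokenize(out)


src = (
    "class A:\n"
    "    @old_deco  # keep this comment\n"
    "    def f(self):\n"
    "        return self @ old_deco\n"
)
print(rename_decorator(src, "old_deco", "new_deco"))
```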


            • Wilson

              #7
              Re: Parsing and Editing Source

              On Aug 16, 3:51 am, Benjamin <musiccomposit...@gmail.com> wrote:
              > Look at the 2to3 tool, which is good at this sort of thing. It lets you
              > define custom "fixers" that work on a fairly high-level representation
              > of the parse tree, and it writes the source back out with everything
              > you didn't change preserved exactly.
              Thanks for the hint. I've looked at lib2to3 and there might be some
              useful stuff in there!

              Thank you,
              Paul
