Generic logic/conditional class or library for classification of data

  • Basilisk96

    Generic logic/conditional class or library for classification of data

    This topic is difficult to describe in one subject sentence...

    Has anyone come across the application of the simple statement "if
    (object1's attributes meet some conditions) then (set object2's
    attributes to certain outcomes)", where "object1" and "object2" are
    generic objects, and the "conditions" and "outcomes" are dynamic
    run-time inputs? Typically, logic code for any application out there is
    hard-coded. I have been working with Python for a year, and its
    flexibility is nothing short of amazing. Wouldn't it be possible to
    have a class or library that can do this sort of dynamic logic?

    The main application of such code would be for classification
    algorithms which, based on the attributes of a given object, can
    classify the object into a scheme. In general, conditions for
    classification can be complex, sometimes involving a collection of
    "and", "or", "not" clauses. The simplest outcome would involve simply
    setting a few attributes of the output object to given values if the
    input condition is met. So each such "if-then" clause can be viewed as
    a rule that is custom-defined at runtime.

    As a very basic example, consider a set of uncategorized objects that
    have text descriptions associated with them. The objects are some type
    of tangible product, e.g., books. So the input object has a
    Description attribute, and the output object (a categorized book)
    would have some attributes like Discipline, Target audience, etc.
    Let's say that one such rule is "if ('description' contains
    'algebra') then ('discipline' = 'math', 'target' = 'student')". Keep
    in mind that all these attribute names and their values are not known
    at design time.
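
To make the "and", "or", "not" idea concrete, here is a minimal sketch of rules built at runtime from small condition functions. It assumes the input objects are plain dicts, and the helper names (`contains`, `all_of`, `any_of`, `negate`, `apply_rules`) are hypothetical, not taken from any existing library.

```python
# Hypothetical helpers for illustration; not part of any existing library.

def contains(attr, text):
    """Condition: the named attribute contains the given text."""
    return lambda obj: text in str(obj.get(attr, ""))

def all_of(*conds):
    """Logical AND over conditions."""
    return lambda obj: all(c(obj) for c in conds)

def any_of(*conds):
    """Logical OR over conditions."""
    return lambda obj: any(c(obj) for c in conds)

def negate(cond):
    """Logical NOT of a condition."""
    return lambda obj: not cond(obj)

def apply_rules(rules, obj):
    """Merge the outcomes of every rule whose condition holds for obj."""
    result = {}
    for cond, outcome in rules:
        if cond(obj):
            result.update(outcome)
    return result

# Each rule is a (condition, outcome) pair that could be loaded at runtime.
rules = [
    (contains("description", "algebra"),
     {"discipline": "math", "target": "student"}),
    (all_of(contains("description", "python"),
            negate(contains("description", "snake"))),
     {"section": "programming"}),
]

book = {"description": "teaching algebra in haikus"}
classified = apply_rules(rules, book)
```

Because conditions are ordinary functions, they nest freely, so a rule such as "A and (B or not C)" is just `all_of(A, any_of(B, negate(C)))`.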

    Is there one obvious way to do this in Python?
    Perhaps this is more along the lines of data mining methods?
    Is there a library with this sort of functionality out there already?

    Any help will be appreciated.

  • Steven D'Aprano

    #2
    Re: Generic logic/conditional class or library for classification of data

    On Sat, 31 Mar 2007 21:54:46 -0700, Basilisk96 wrote:
    > As a very basic example, consider a set of uncategorized objects that
    > have text descriptions associated with them. The objects are some type
    > of tangible product, e.g., books. So the input object has a
    > Description attribute, and the output object (a categorized book)
    > would have some attributes like Discipline, Target audience, etc.
    > Let's say that one such rule is "if ('description' contains
    > 'algebra') then ('discipline' = 'math', 'target' = 'student')". Keep
    > in mind that all these attribute names and their values are not known at
    > design time.

    Easy-peasy.

    rules = {'algebra': {'discipline': 'math', 'target': 'student'},
             'python': {'section': 'programming', 'os': 'linux, windows'}}

    class Input_Book(object):
        def __init__(self, description):
            self.description = description

    class Output_Book(object):
        def __repr__(self):
            return "Book - %s" % self.__dict__

    def process_book(book):
        out = Output_Book()
        for desc in rules:
            if desc in book.description:
                attributes = rules[desc]
                for attr in attributes:
                    setattr(out, attr, attributes[attr])
        return out

    book1 = Input_Book('python for cheese-makers')
    book2 = Input_Book('teaching algebra in haikus')
    book3 = Input_Book('how to teach algebra to python programmers')

    >>> process_book(book1)
    Book - {'section': 'programming', 'os': 'linux, windows'}
    >>> process_book(book2)
    Book - {'discipline': 'math', 'target': 'student'}
    >>> process_book(book3)
    Book - {'discipline': 'math', 'section': 'programming',
    'os': 'linux, windows', 'target': 'student'}


    I've made some simplifying assumptions: the input object always has a
    description attribute. Also the behaviour when two or more rules set the
    same attribute is left undefined. If you want more complex rules you can
    follow the same technique, except you'll need a set of meta-rules to
    decide what rules to follow.
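
As one possible way to pin down the undefined case, here is a small sketch in which the rules are an ordered list, "the later rule wins" is an explicit policy, and every overridden value is recorded. The rule contents and the names `RULES` and `classify` are made up for illustration.

```python
# Hypothetical sketch: an ordered rule list with explicit conflict tracking.

RULES = [
    ("algebra", {"discipline": "math", "target": "student"}),
    ("python", {"discipline": "computing", "section": "programming"}),
]

def classify(description, rules=RULES):
    """Apply matching rules in order; report attributes that were overridden."""
    data = {}
    conflicts = []  # (attribute, old value, new value)
    for keyword, outcome in rules:
        if keyword not in description:
            continue
        for attr, value in outcome.items():
            if attr in data and data[attr] != value:
                conflicts.append((attr, data[attr], value))
            data[attr] = value  # policy: the later rule wins
    return data, conflicts

data, conflicts = classify("how to teach algebra to python programmers")
```

Here `conflicts` holds `("discipline", "math", "computing")`, so the caller can decide whether to accept the override, log it, or reject the result.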

    But having said that, I STRONGLY recommend that you don't follow that
    approach of creating variable instance attributes at runtime. The reason
    is, it's quite hard for you to know what to do with an Output_Book once
    you've got it. You'll probably end up filling your code with horrible
    stuff like this:

    if hasattr(book, 'target'):
        do_something_with(book.target)
    elif hasattr(book, 'discipline'):
        do_something_with(book.discipline)
    elif ...  # etc.


    Replacing the hasattr() checks with try...except blocks isn't any
    less icky.

    Creating instance attributes at runtime has its place; I just don't think
    this is it.

    Instead, I suggest you encapsulate the variable parts of the book
    attributes into a single attribute:

    class Output_Book(object):
        def __init__(self, name, data):
            self.name = name  # common attribute(s)
            self.data = data  # variable attributes


    Then, instead of setting each variable attribute individually with
    setattr(), simply collect all of them in a dict and save them in data:

    def process_book(book):
        # assumes the input book also carries a name attribute
        data = {}
        for desc in rules:
            if desc in book.description:
                data.update(rules[desc])
        return Output_Book(book.name, data)


    Now you can do this:

    outbook = process_book(book)
    # handle the common attributes that are always there
    print(outbook.name)
    # handle the variable attributes
    print("Stock = %s" % outbook.data.setdefault('status', 0))
    print("discipline = %s" % outbook.data.get('discipline', 'none'))
    # handle all the variable attributes
    for key, value in outbook.data.items():
        do_something_with(key, value)


    Any time you have to deal with variable attributes that may or may not be
    there, you have to use more complex code, but you can minimize the
    complexity by keeping the variable attributes separate from the common
    attributes.


    --
    Steven.


    • Michael Bentley

      #3
      Re: Generic logic/conditional class or library for classification of data


      On Mar 31, 2007, at 11:54 PM, Basilisk96 wrote:
      > This topic is difficult to describe in one subject sentence...
      >
      > Has anyone come across the application of the simple statement "if
      > (object1's attributes meet some conditions) then (set object2's
      > attributes to certain outcomes)", where "object1" and "object2" are
      > generic objects, and the "conditions" and "outcomes" are dynamic
      > run-time inputs? Typically, logic code for any application out there is
      > hard-coded. I have been working with Python for a year, and its
      > flexibility is nothing short of amazing. Wouldn't it be possible to
      > have a class or library that can do this sort of dynamic logic?
      >
      > The main application of such code would be for classification
      > algorithms which, based on the attributes of a given object, can
      > classify the object into a scheme. In general, conditions for
      > classification can be complex, sometimes involving a collection of
      > "and", "or", "not" clauses. The simplest outcome would involve simply
      > setting a few attributes of the output object to given values if the
      > input condition is met. So each such "if-then" clause can be viewed as
      > a rule that is custom-defined at runtime.
      >
      > As a very basic example, consider a set of uncategorized objects that
      > have text descriptions associated with them. The objects are some type
      > of tangible product, e.g., books. So the input object has a
      > Description attribute, and the output object (a categorized book)
      > would have some attributes like Discipline, Target audience, etc.
      > Let's say that one such rule is "if ('description' contains
      > 'algebra') then ('discipline' = 'math', 'target' = 'student')". Keep
      > in mind that all these attribute names and their values are not known
      > at design time.
      >
      > Is there one obvious way to do this in Python?
      > Perhaps this is more along the lines of data mining methods?
      > Is there a library with this sort of functionality out there already?
      >
      > Any help will be appreciated.

      You may be interested in http://divmod.org/trac/wiki/DivmodReverend
      -- it is a general-purpose Bayesian classifier written in Python.
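
For readers unfamiliar with the idea, here is a toy sketch of what a Bayesian text classifier does: it learns word frequencies per label and picks the most probable label for new text. This is a self-contained illustration, not Reverend's API; the class name `TinyBayes` and the training texts are made up.

```python
import math
from collections import Counter, defaultdict

class TinyBayes:
    """Toy multinomial naive Bayes with add-one smoothing (not Reverend's API)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def guess(self, text):
        """Return the label with the highest log-probability for text."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

clf = TinyBayes()
clf.train("math", "teaching algebra and geometry to students")
clf.train("math", "linear algebra for beginners")
clf.train("programming", "python scripting for beginners")
clf.train("programming", "learning python and linux tools")

label = clf.guess("algebra in haikus")
```

Unlike hand-written rules, the mapping from words to categories is learned from labelled examples, which may suit cases where the rules are hard to state explicitly.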

      hope this helps,
      Michael


      • Basilisk96

        #4
        Re: Generic logic/conditional class or library for classification of data

        Thanks for the help, guys.
        Dictionaries to the rescue!

        Steven, it's certainly true that runtime creation of attributes does
        not fit well here. At some point, an application needs to come out of
        generics and deal with logic that is specific to the problem. The
        example I gave was classification of books, which is relatively easy
        to understand. The particular app I'm working with deals with
        specialty piping valves, where the list of rules grows complicated
        fairly quickly.

        So, having said that "attributes are not known at design time", it
        seems that dictionaries are best for the generic core functionality:
        it's easy to iterate over arbitrary "key, value" pairs without
        hiccups. I can even reference a custom function by a key, and call it
        during the iteration to do what's necessary. The input/output
        dictionaries would dictate that behavior, so that would be the
        implementation-specific stuff. Easy enough, and the core functionality
        remains generic enough for re-use.
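
A minimal sketch of the "custom function referenced by a key" idea: outcome values are either constants or callables that compute the value from the input item. The valve data, the threshold, and the names `pressure_class`, `RULES`, and `classify` are invented for illustration only.

```python
# Hypothetical example: outcome values may be constants or callables.

def pressure_class(item):
    """Made-up helper: derive a rating label from the item's own data."""
    return "high" if item.get("pressure_psi", 0) > 1000 else "standard"

RULES = {
    "gate": {"family": "gate valve", "rating": pressure_class},
    "ball": {"family": "ball valve"},
}

def classify(item, rules=RULES):
    """Generic core: knows nothing about valves, only keys and values."""
    out = {}
    for keyword, outcome in rules.items():
        if keyword in item["description"]:
            for key, value in outcome.items():
                # call functions with the item, copy plain values as-is
                out[key] = value(item) if callable(value) else value
    return out

result = classify({"description": "2 inch gate valve", "pressure_psi": 1500})
```

The problem-specific logic lives entirely in the rule dictionary and its helper functions, so `classify` itself can be reused unchanged.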

        Michael, I looked at the sample snippets at that link, and I'll have
        to try it out. Thanks!


        • nawijn@gmail.com

          #5
          Re: Generic logic/conditional class or library for classification of data

          On Apr 3, 5:43 am, "Basilisk96" <basilis...@gmail.com> wrote:
          > Thanks for the help, guys.
          > Dictionaries to the rescue!
          >
          > Steven, it's certainly true that runtime creation of attributes does
          > not fit well here. At some point, an application needs to come out of
          > generics and deal with logic that is specific to the problem. The
          > example I gave was classification of books, which is relatively easy
          > to understand. The particular app I'm working with deals with
          > specialty piping valves, where the list of rules grows complicated
          > fairly quickly.
          >
          > So, having said that "attributes are not known at design time", it
          > seems that dictionaries are best for the generic core functionality:
          > it's easy to iterate over arbitrary "key, value" pairs without
          > hiccups. I can even reference a custom function by a key, and call it
          > during the iteration to do what's necessary. The input/output
          > dictionaries would dictate that behavior, so that would be the
          > implementation-specific stuff. Easy enough, and the core functionality
          > remains generic enough for re-use.
          >
          > Michael, I looked at the sample snippets at that link, and I'll have
          > to try it out. Thanks!

          Hello,

          If your rules become more complicated and grow significantly in
          number, it might be an idea to switch to a rule-based system. Take a
          look at CLIPS and the associated Python bindings, PyCLIPS (a Python
          module that interfaces with the CLIPS expert system shell library).
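
To give a rough feel for what a rule-based system adds over a flat rule dictionary, here is a toy forward-chaining loop in which one rule's conclusion can trigger another rule. It is a plain-Python sketch, not the CLIPS or PyCLIPS API, and the facts and rule names are made up.

```python
# Toy forward-chaining loop; NOT the CLIPS or PyCLIPS API.

RULES = [
    # (name, facts required, facts asserted)
    ("student-book", {"discipline math", "level introductory"}, {"target student"}),
    ("math-book", {"mentions algebra"}, {"discipline math"}),
]

def forward_chain(facts, rules=RULES):
    """Fire rules repeatedly until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for name, required, asserted in rules:
            # fire only if all required facts hold and something new results
            if required <= facts and not asserted <= facts:
                facts |= asserted
                changed = True
    return facts

derived = forward_chain({"mentions algebra", "level introductory"})
```

The "student-book" rule cannot fire on the first pass; it fires only after "math-book" has asserted "discipline math", which is the chaining behaviour a real expert system shell handles far more efficiently.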


          Kind regards,

          Marco
