Simple Bayesian classifier?

  • Pavel Kalinov

    Simple Bayesian classifier?

    Hi all,

    I am trying to build an application to classify texts from a number of
    sources. I am programming it in PHP and I go "by the book" - i.e.
    calculating probabilities according to the formula etc.
    It works, but it's very slow (due to slow PHP mathematical
    implementation, I guess).
    Is there some variation of the Naive Bayes classifier which is not so
    demanding in the way of computing power used?

    Best
    Pavel
  • shimmyshack

    #2
    Re: Simple Bayesian classifier?

    On Jun 8, 11:52 am, Pavel Kalinov <pavk...@gmail.com> wrote:
    > Hi all,
    >
    > I am trying to build an application to classify texts from a number of
    > sources. I am programming it in PHP and I go "by the book" - i.e.
    > calculating probabilities according to the formula etc.
    > It works, but it's very slow (due to slow PHP mathematical
    > implementation, I guess).
    > Is there some variation of the Naive Bayes classifier which is not so
    > demanding in the way of computing power used?
    >
    > Best
    > Pavel

    SpamAssassin's code is open source; have you checked that out?

    AFAIK PHP offloads its maths to C libraries, so the arithmetic itself is
    unlikely to be the bottleneck. The more likely problem is that working
    strictly by the book, with no code optimisation techniques (hash tables
    and so on), can be much more computationally intensive.
    (A mathematician C programmer I know got their code to run in 2 days
    rather than 2 weeks after some optimisation.)
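
As a rough illustration of the hash-table idea above, here is a minimal sketch of a multinomial Naive Bayes classifier that precomputes every log-probability once and then classifies with lookups and additions only. It is written in Python rather than PHP for brevity; the dictionaries correspond to PHP associative arrays. The names `NaiveBayes`, `tokenize`, `train` and `classify` are illustrative assumptions and do not come from any library mentioned in this thread.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing and cached log-probabilities."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()
        self._model = None                       # built lazily by _build()

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)
        self._model = None  # training invalidates the cached tables

    def _build(self):
        """Precompute all logarithms once, so classify() never calls log()."""
        total_docs = sum(self.doc_counts.values())
        v = max(len(self.vocab), 1)  # guard against an empty vocabulary
        model = {}
        for cat, counts in self.word_counts.items():
            denom = sum(counts.values()) + v
            log_prior = math.log(self.doc_counts[cat] / total_docs)
            log_word = {w: math.log((c + 1) / denom) for w, c in counts.items()}
            log_unseen = math.log(1 / denom)  # word known elsewhere, unseen in this category
            model[cat] = (log_prior, log_word, log_unseen)
        self._model = model

    def classify(self, text):
        if self._model is None:
            self._build()
        # Words never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_cat, best_score = None, -math.inf
        for cat, (log_prior, log_word, log_unseen) in self._model.items():
            # Summing logs replaces multiplying many tiny probabilities.
            score = log_prior + sum(log_word.get(t, log_unseen) for t in tokens)
            if score > best_score:
                best_cat, best_score = cat, score
        return best_cat


nb = NaiveBayes()
nb.train("sports", "the team won the match with a late goal")
nb.train("sports", "the coach praised the striker after the game")
nb.train("finance", "the bank raised interest rates again")
nb.train("finance", "shares fell as the market reacted to inflation")
print(nb.classify("the striker scored a goal"))     # sports
print(nb.classify("interest rates and inflation"))  # finance
```

With the tables cached, the cost of classifying one text is roughly (number of tokens) x (number of categories) dictionary lookups, which for 16 categories should be small. Whether that reaches the required speed in PHP depends on the real vocabulary size and on how the tables are stored between requests, which this sketch does not address.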


    • Schraalhans Keukenmeester

      #3
      Re: Simple Bayesian classifier?

      At Fri, 08 Jun 2007 20:52:39 +1000, Pavel Kalinov let h(is|er) monkeys
      type:
      > Hi all,
      >
      > I am trying to build an application to classify texts from a number of
      > sources. I am programming it in PHP and I go "by the book" - i.e.
      > calculating probabilities according to the formula etc.
      > It works, but it's very slow (due to slow PHP mathematical
      > implementation, I guess).
      > Is there some variation of the Naive Bayes classifier which is not so
      > demanding in the way of computing power used?
      >
      > Best
      > Pavel

      You may like http://xhtml.net/php/PHPNaiveBayesianFilter
      I am a bit surprised you get such a slow response; the typical algorithms
      don't seem to be extremely taxing.

      Naive Bayesian filtering proved quite useful as part of an author
      authenticity scoring app. For spam filtering, its use *by itself* proves
      rather limited: quite a few spam creators (scripts) are well equipped these
      days to lower scores substantially, allowing their messages to leak through.

      hth

      --
      Schraalhans Keukenmeester - schraalhans@the.Spamtrapexample.nl
      [Remove the lowercase part of Spamtrap to send me a message]

      "strcmp('apples','oranges') < 0"


      • Pavel Kalinov

        #4
        Re: Simple Bayesian classifier?

        Thanks, I didn't know this - will look into it.
        BTW, I am not trying to make a spam filter, but to sort news articles into
        a number of categories (16 at present, as a test). And I need
        milliseconds, not days :-(

        Best
        Pavel

        shimmyshack wrote:
        > On Jun 8, 11:52 am, Pavel Kalinov <pavk...@gmail.com> wrote:
        >> Hi all,
        >>
        >> I am trying to build an application to classify texts from a number of
        >> sources. I am programming it in PHP and I go "by the book" - i.e.
        >> calculating probabilities according to the formula etc.
        >> It works, but it's very slow (due to slow PHP mathematical
        >> implementation, I guess).
        >> Is there some variation of the Naive Bayes classifier which is not so
        >> demanding in the way of computing power used?
        >>
        >> Best
        >> Pavel
        >
        > SpamAssassin's code is open source; have you checked that out?
        >
        > AFAIK PHP offloads its maths to C libraries, so the arithmetic itself is
        > unlikely to be the bottleneck. The more likely problem is that working
        > strictly by the book, with no code optimisation techniques (hash tables
        > and so on), can be much more computationally intensive.
        > (A mathematician C programmer I know got their code to run in 2 days
        > rather than 2 weeks after some optimisation.)


        • Toby A Inkster

          #5
          Re: Simple Bayesian classifier?

          Pavel Kalinov wrote:
          > BTW, I am not trying to make a spam filter, but to sort news articles into
          > a number of categories (16 at present, as a test). And I need
          > milliseconds, not days :-(

          Still, SpamAssassin might be what you're looking for.

          Turn off all SA's non-Bayes scoring, and then feed SA a corpus of, say, 500
          sports articles, telling it that they're "spam"; then 500 non-sports
          articles, telling it that they're "ham". After this preparation, your SA
          configuration should be primed to detect sports articles.

          Repeat this with another 15 SA configurations, one per remaining category,
          and your setup should be complete.

          With SA, one user can have multiple configurations using the "--configpath"
          command-line option.
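
The one-filter-per-category scheme described above can be sketched without SpamAssassin itself. The Python below is only a stand-in: `BinaryFilter` is a hypothetical two-class Bayesian filter that plays the role of one SA configuration, and `train_one_vs_rest` and `categorize` are illustrative names, not SpamAssassin APIs. Each filter is taught that its own category is "spam" and everything else is "ham", and an article goes to the category whose filter is most confident.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BinaryFilter:
    """Two-class filter: 'spam' = in the category, 'ham' = everything else."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, label, text):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """Return the smoothed log-odds that the text is 'spam' rather than 'ham'."""
        spam, ham = self.counts["spam"], self.counts["ham"]
        vocab = len(set(spam) | set(ham))
        spam_total = sum(spam.values()) + vocab
        ham_total = sum(ham.values()) + vocab
        # Start from the (add-one smoothed) prior odds.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokenize(text):
            if tok not in spam and tok not in ham:
                continue  # never seen in training: no evidence either way
            p_spam = (spam[tok] + 1) / spam_total
            p_ham = (ham[tok] + 1) / ham_total
            score += math.log(p_spam / p_ham)
        return score


def train_one_vs_rest(corpus):
    """corpus maps category -> list of article texts; returns one filter per category."""
    filters = {}
    for category in corpus:
        f = BinaryFilter()
        for other, articles in corpus.items():
            label = "spam" if other == category else "ham"
            for article in articles:
                f.learn(label, article)
        filters[category] = f
    return filters


def categorize(filters, text):
    """Pick the category whose filter gives the highest log-odds."""
    return max(filters, key=lambda cat: filters[cat].log_odds(text))


corpus = {
    "sports": ["the team won the match with a late goal",
               "the coach praised the striker after the game"],
    "finance": ["the bank raised interest rates again",
                "shares fell as the market reacted to inflation"],
    "weather": ["heavy rain and strong wind are expected tomorrow",
                "a sunny morning followed by snow in the hills"],
}
filters = train_one_vs_rest(corpus)
print(categorize(filters, "the striker scored a late goal"))
print(categorize(filters, "snow and strong wind tomorrow"))
```

Note that this runs every filter on every article, so 16 categories means 16 scoring passes per article. Whether that meets a milliseconds budget would need to be measured on real data.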

          --
          Toby A Inkster BSc (Hons) ARCS
          [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
          [OS: Linux 2.6.12-12mdksmp, up 108 days, 16 min.]

          URLs in demiblog
