pop3 email header classifier?

  • Robin Becker

    pop3 email header classifier?

    Hi, I'm getting vast numbers of fake upgrade emails containing some kind
    of virus. My rather old client can be made to reject these based on some
    patterns in the subject line. They're nearly all based on the word
    'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.

    Is there a python tool that can be made to delete these from my POP3
    mail box rather than let my client reject? Quite a few seem to have
    semi-valid return addresses so I get postmaster rejects from
    xxx@microsoft.com etc.

    I know about spam-bayes etc, but these things are over 120k each and it
    seems pretty pointless to download them (as well as taking about an
    hour).
    --
    Robin Becker
  • Richie Hindle

    #2
    Re: pop3 email header classifier?


    [Robin]
    > Hi, I'm getting vast numbers of fake upgrade emails containing some kind
    > of virus. My rather old client can be made to reject these based on some
    > patterns in the subject line. They're nearly all based on the word
    > 'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.
    >
    > Is there a python tool that can be made to delete these from my POP3
    > mail box rather than let my client reject?

    I have a webmail application that can be made to delete messages based on
    regular expressions, at http://entrian.com/cgi-bin/pop3.py

    I wrote it in response to a similar problem, whereby a spammer used my
    address as his From address, and I received a couple of thousand bounce
    messages a day.

    You can set up regular expression filters on To, From and Subject, and set
    it to either mark messages for deletion (so you get to review them before
    deleting them) or delete them straight away (via the "I'm either brave or
    stupid" checkbox, TM 8-) You can save your filters for later use.

    Take EXTREME CARE with this, particularly if you check the "I'm either
    brave or stupid" box. 8-) There is no way to recover a deleted message.
    Don't sue me if it eats your hamster's emails.

    You probably need something like (untested):

    From: microsoft|ms\b
    Subject: patch|latest|microsoft|update|upgrade|pack
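
As a rough illustration of how patterns like the two above could be applied to a message's headers in Python. This is a minimal sketch, not the code behind pop3.py: the name `matches_filters`, the case-insensitive search, and the default rule that every listed header must match are all assumptions.

```python
import re
from email.parser import HeaderParser

# Patterns copied from the post; case-insensitive matching is an assumption.
FILTERS = {
    "From": re.compile(r"microsoft|ms\b", re.IGNORECASE),
    "Subject": re.compile(r"patch|latest|microsoft|update|upgrade|pack",
                          re.IGNORECASE),
}

def matches_filters(raw_headers, filters=FILTERS, combine=all):
    """Return True if the header block matches the filters.

    combine=all (hypothetical default) requires every filtered header to
    match; pass combine=any to flag a message on a single match.
    """
    msg = HeaderParser().parsestr(raw_headers)
    return combine(
        pattern.search(str(msg.get(name, ""))) is not None
        for name, pattern in filters.items()
    )
```

Note that `ms\b` also matches ordinary addresses such as forms@example.org, so flagging on any single match would be considerably riskier than requiring both headers to match.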

    There's no SSL version of this, so your POP3 account details will pass in
    plain text over the internet (in theory my provider has a scheme whereby
    you can access the site over SSL using their certificate, but it doesn't
    work for some reason - if there's any interest I'll see whether I can make
    it work).

    (And no, I'm not going to harvest your POP3 account details. They never
    even hit the hard drive.)

    --
    Richie Hindle
    richie@entrian.com


    • Robin Becker

      #3
      Re: pop3 email header classifier?

      In message <6cammvoibfes7scnan7kkorctoi55n4d57@4ax.com>, Richie Hindle
      <richie@entrian.com> writes

      Someone has posted a poplib command line thing on much the same lines in
      another thread.

      >[Robin]
      >> Hi, I'm getting vast numbers of fake upgrade emails containing some kind
      >> of virus. My rather old client can be made to reject these based on some
      >> patterns in the subject line. They're nearly all based on the word
      >> 'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.
      >>
      >> Is there a python tool that can be made to delete these from my POP3
      >> mail box rather than let my client reject?
      >
      >I have a webmail application that can be made to delete messages based on
      >regular expressions, at http://entrian.com/cgi-bin/pop3.py
      >
      >I wrote it in response to a similar problem, whereby a spammer used my
      >address as his From address, and I received a couple of thousand bounce
      >messages a day.
      >
      >You can set up regular expression filters on To, From and Subject, and set
      >it to either mark messages for deletion (so you get to review them before
      >deleting them) or delete them straight away (via the "I'm either brave or
      >stupid" checkbox, TM 8-) You can save your filters for later use.
      >
      >Take EXTREME CARE with this, particularly if you check the "I'm either
      >brave or stupid" box. 8-) There is no way to recover a deleted message.
      >Don't sue me if it eats your hamster's emails.
      >
      >You probably need something like (untested):
      >
      >From: microsoft|ms\b
      >Subject: patch|latest|microsoft|update|upgrade|pack
      >
      >There's no SSL version of this, so your POP3 account details will pass in
      >plain text over the internet (in theory my provider has a scheme whereby
      >you can access the site over SSL using their certificate, but it doesn't
      >work for some reason - if there's any interest I'll see whether I can make
      >it work).
      >
      >(And no, I'm not going to harvest your POP3 account details. They never
      >even hit the hard drive.)

      --
      Robin Becker

      • Tim Roberts

        #4
        Re: pop3 email header classifier?

        Robin Becker <robin@jessikat.fsnet.co.uk> wrote:
        >
        >Hi, I'm getting vast numbers of fake upgrade emails containing some kind
        >of virus. My rather old client can be made to reject these based on some
        >patterns in the subject line. They're nearly all based on the word
        >'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.
        >
        >Is there a python tool that can be made to delete these from my POP3
        >mail box rather than let my client reject? Quite a few seem to have
        >semi-valid return addresses so I get postmaster rejects from
        >xxx@microsoft.com etc.

        Is your e-mail client actually set up to send a RESPONSE when you receive a
        virus attachment? If so, can you please STOP IT AT ONCE?

        ALL viruses released in the last 3 years choose random names for both the
        sender AND recipient. It is not possible to automatically extract the
        infected individual's e-mail address from a virus message. You can find
        the address of their e-mail server, but that's all.

        By sending a polite "you sent me a virus" message, you are doing NOTHING to
        stop the viruses, you are ANNOYING an innocent person, and you are DOUBLING
        the e-mail volume damage caused by the virus script kiddies.

        I got close to 10,000 helpful and completely bogus "you sent me a virus"
        messages during the "SoBig" fiasco.
        --
        - Tim Roberts, timr@probo.com
        Providenza & Boekelheide, Inc.

        • Robin Becker

          #5
          Re: pop3 email header classifier?

          In article <r81tmvo2rph8109ohf357mq2fajkliqhoh@4ax.com>, Tim Roberts
          <timr@probo.com> writes
          >Robin Becker <robin@jessikat.fsnet.co.uk> wrote:
          >>
          >>Hi, I'm getting vast numbers of fake upgrade emails containing some kind
          >>of virus. My rather old client can be made to reject these based on some
          >>patterns in the subject line. They're nearly all based on the word
          >>'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.
          >>
          >>Is there a python tool that can be made to delete these from my POP3
          >>mail box rather than let my client reject? Quite a few seem to have
          >>semi-valid return addresses so I get postmaster rejects from
          >>xxx@microsoft.com etc.
          >
          >Is your e-mail client actually set up to send a RESPONSE when you receive a
          >virus attachment? If so, can you please STOP IT AT ONCE?

          I have no virus detection in the client and am deliberately not
          rejecting. That was the whole point of my question: I wanted to do
          better.

          As a point of fact with this SWEN worm, it does seem possible to kill by
          a combination of the subject, from address and attachment size. The
          spambayes approach would certainly work, but it wouldn't improve my
          download times. I estimate I had about 50MB of these things to download
          yesterday (i.e. 3-4 hours @ 56k). By employing a kill script I could keep
          up fairly easily.
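
A combined test of the kind described here could be sketched as below. This is only an illustration, not the kill script mentioned in the thread: the function name `looks_like_worm`, the 100,000-byte threshold (chosen because the messages are reported as "over 120k each"), and the exact patterns are assumptions.

```python
import re

# Subject keywords from the original question, matched as whole words.
SUBJECT_RE = re.compile(r"\b(new|latest|microsoft|patch|pack)\b", re.IGNORECASE)
# Assumption: the forged sender mentions Microsoft.
SENDER_RE = re.compile(r"microsoft", re.IGNORECASE)

def looks_like_worm(sender, subject, size, min_size=100_000):
    """Flag a message only if size, From and Subject all look suspicious."""
    return (
        size >= min_size
        and SENDER_RE.search(sender) is not None
        and SUBJECT_RE.search(subject) is not None
    )
```

With poplib, the size would come from `list()` and a flagged message would be marked with `dele()`; the server only removes marked messages when the session ends with `quit()`.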

          I'm certainly not sending any response or rejecting; I'm using DELE,
          which should be a sink.

          >ALL viruses released in the last 3 years choose random names for both the
          >sender AND recipient. It is not possible to automatically extract the
          >infected individual's e-mail address from a virus message. You can find
          >the address of their e-mail server, but that's all.
          >
          >By sending a polite "you sent me a virus" message, you are doing NOTHING to
          >stop the viruses, you are ANNOYING an innocent person, and you are DOUBLING
          >the e-mail volume damage caused by the virus script kiddies.
          >
          >I got close to 10,000 helpful and completely bogus "you sent me a virus"
          >messages during the "SoBig" fiasco.

          --
          Robin Becker

          • Alex Martelli

            #6
            Re: pop3 email header classifier?

            <posted & mailed>

            Robin Becker wrote:

            > Hi, I'm getting vast numbers of fake upgrade emails containing some kind
            > of virus. My rather old client can be made to reject these based on some
            > patterns in the subject line. They're nearly all based on the word
            > 'New', 'Latest', 'Microsoft', 'Patch', 'Pack', ... etc etc.
            >
            > Is there a python tool that can be made to delete these from my POP3
            > mail box rather than let my client reject? Quite a few seem to have
            > semi-valid return addresses so I get postmaster rejects from
            > xxx@microsoft.com etc.
            >
            > I know about spam-bayes etc, but these things are over 120k each and it
            > seems pretty pointless to download them (as well as taking about an
            > hour).

            I posted an "emergency script" to be used for the purpose -- it
            triggers SOLELY on mail size. I have now enhanced it with lots of
            options etc, but the basic idea remains that of size-only triggering --
            risky, but it IS an emergency. BTW, the "postmaster rejects" are
            likely not connected to what you do with the "fake upgrade emails",
            alas -- rather, virus senders are now faking "From:" &c addresses,
            so everybody's getting lots of bounce msgs for mails they never sent.
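
For readers who have not seen that script, size-only triggering might look roughly like the following. This is a hedged sketch and not the posted script: the helper name `oversized` and the 100,000-byte limit are assumptions; only the format of the POP3 LIST listing (message number, then size in octets) is taken from poplib.

```python
def oversized(listing, limit=100_000):
    """Pick out messages larger than `limit` bytes.

    `listing` is the second element of the tuple returned by
    poplib.POP3.list(): a list of byte strings such as b"3 120345".
    Returns a list of (message_number, size) pairs.
    """
    hits = []
    for entry in listing:
        num, size = entry.split()
        if int(size) > limit:
            hits.append((int(num), int(size)))
    return hits
```

In a real session one would pass `conn.list()[1]` from a logged-in `poplib.POP3` connection, review the result, and only then call `conn.dele(num)` for each hit followed by `conn.quit()`. A legitimate large attachment would be deleted just the same, which is the risk noted above.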


            Alex

            • David Mertz

              #7
              Re: pop3 email header classifier?

              Robin Becker <robin@jessikat.fsnet.co.uk> wrote previously:
              |Is there a python tool that can be made to delete these from my POP3
              |mail box rather than let my client reject?
              |I know about spam-bayes etc, but these things are over 120k each and it
              |seems pretty pointless to download them (as well as taking about an
              |hour).

              I do exactly this myself. For my article (about a year ago now) on Spam
              filtering, for IBM developerWorks, I developed my own little custom
              tool. I've refined it over time, but it remains kinda hackerish and
              un(der)documented. Still, I'd be happy to share with anyone
              interested... especially if anyone wants to make something nice out of
              it for distribution.

              What I do is a hodgepodge, but the general idea is that I
              use [poplib] to download ONLY the headers. Those messages that are
              convincingly spam based on that get deleted without me ever needing to
              download bodies.
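
The headers-only idea can be sketched with poplib's `top()` method, which issues the POP3 TOP command (an optional command, so this assumes the server supports it). This is not the 'spamfilter' tool itself: `fetch_headers`, `delete_matching`, and the `is_spam` callback are hypothetical names, and `conn` is assumed to be an already logged-in `poplib.POP3` object.

```python
from email.parser import BytesHeaderParser

def fetch_headers(conn):
    """Return {message_number: Message} without downloading any bodies."""
    headers = {}
    _, listing, _ = conn.list()           # entries look like b"1 120345"
    for entry in listing:
        num = int(entry.split()[0])
        _, lines, _ = conn.top(num, 0)    # headers plus 0 body lines
        headers[num] = BytesHeaderParser().parsebytes(b"\r\n".join(lines))
    return headers

def delete_matching(conn, is_spam, dry_run=True):
    """Mark for deletion every message whose headers satisfy is_spam().

    Returns the message numbers that were (or, in a dry run, would be)
    marked. Deletions only take effect when the caller ends the session
    with conn.quit().
    """
    doomed = [num for num, hdr in fetch_headers(conn).items() if is_spam(hdr)]
    if not dry_run:
        for num in doomed:
            conn.dele(num)
    return doomed
```

Keeping `dry_run=True` as the default means nothing is deleted until the results have been checked.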

              As a first line of defense, I have a collection of blacklist and
              whitelist patterns (I only use strings and globs, not regexen; though
              the latter would be easy to add). These look at specific header fields
              in which patterns might occur (or at the whole header, if I wish).
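
A glob-based whitelist/blacklist check along these lines could be written with the standard `fnmatch` module. The sketch below is an illustration only: the rule format, the example patterns, the use of `None` to mean "the whole header", and checking the whitelist before the blacklist are all assumptions rather than details of the tool described.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (header field, glob). A field of None means
# "match against the whole raw header text".
WHITELIST = [("From", "*@example.org*")]
BLACKLIST = [("Subject", "*security patch*"),
             (None, "*x-mailer: bulkblaster*")]

def verdict(headers, raw="", whitelist=WHITELIST, blacklist=BLACKLIST):
    """Return 'ham', 'spam' or 'unknown' for a dict of header fields."""
    def hit(rules):
        for field, pattern in rules:
            text = raw if field is None else headers.get(field, "")
            # Lowercase both sides so matching is case-insensitive.
            if fnmatchcase(text.lower(), pattern.lower()):
                return True
        return False

    if hit(whitelist):
        return "ham"
    if hit(blacklist):
        return "spam"
    return "unknown"
```

Messages that come back as 'unknown' would then be handed to the next stage.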

              But the next line of defense is the usual naive Bayesian style. The
              wrinkle here is that I do not use "words" in the headers for analysis,
              but rather trigrams (sequences of three characters). I believe that for
              headers-only, this is more accurate, although I have not rigorously
              tested this. Things like routing IPs and spam mail clients are hard to
              pick out by whole words, but trigrams do some magic.
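
To make the trigram idea concrete, here is a small naive Bayes scorer over character trigrams. It is a sketch under stated assumptions, not the classifier from the article: the class name `TrigramBayes`, the add-one smoothing, and the log-odds score (positive leans spam, negative leans ham) are choices made for this example.

```python
import math
from collections import Counter

def trigrams(text):
    """Split text into overlapping three-character sequences."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]

class TrigramBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, header_text, label):
        """Add one labelled header ('spam' or 'ham') to the model."""
        self.counts[label].update(trigrams(header_text))
        self.docs[label] += 1

    def spam_log_odds(self, header_text):
        """Log of P(spam | header) / P(ham | header), add-one smoothed."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tri in trigrams(header_text):
            p_spam = (self.counts["spam"][tri] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tri] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score
```

Because trigrams do not depend on word boundaries, fragments of IP addresses or mailer names contribute evidence even when they never appear as whole "words".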

              The other feature of my 'spamfilter' tool is that it knows nothing at
              all about specific mail clients. It just sits daemon-like, and
              periodically deletes stuff it doesn't like. I check mail from a lot of
              different clients, on a lot of different machines; so for me it would be
              inconvenient to have the filtering tied to one particular mail
              client/machine. My thing just runs and kills, even when I'm out of
              town, and checking from internet cafes.

              Yours, David...

              --
              mertz@ | The specter of free information is haunting the `Net! All the
              gnosis | powers of IP- and crypto-tyranny have entered into an unholy
              .cx    | alliance...ideas have nothing to lose but their chains. Unite
                     | against "intellectual property" and anti-privacy regimes!
              -------------------------------------------------------------------------

