filter valid email addresses

  • Hoang

    filter valid email addresses

    Does anyone know of an algorithm to separate real email addresses from
    computer-generated ones? I have been going through past email
    archives to find my friends' email addresses. Unfortunately, about 75% of
    them are junk or spammer addresses. It's quite obvious which ones to delete
    when you look at them, but I don't want to do it by hand.



  • Karlheinz klingbeil

    #2
    Re: filter valid email addresses

    Hoang wrote:
    > anyone know of an algorithm to filter out real email addresses as opposed
    > to computer generated email addresses? I have been going through past email
    > archives in order to find friends email address. Unfortunately about 75%
    > of them are junk addresses or spammer addresses. It's quite obvious when you
    > look at it and delete it... but you don't want to do it by hand.

    The only way to check whether an email address is valid is to send a mail to it
    and ask for a reply. If the syntax is right, you cannot tell which addresses
    exist and which don't.
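
    To illustrate the point about syntax, here is a minimal sketch of a
    shape-only check. The pattern and the name looks_like_address are
    assumptions made for this example, not part of any script mentioned in
    this thread, and the pattern is deliberately loose rather than a full
    RFC 5322 validator.

```python
import re

# Deliberately loose pattern (an assumption for this sketch): exactly one "@",
# no whitespace, and at least one dot in the domain part.
# It checks the shape of the string only, not whether the mailbox exists.
LOOSE_ADDRESS = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s.]+$")


def looks_like_address(text):
    """Return True if text has the rough shape of an email address."""
    return LOOSE_ADDRESS.match(text.strip()) is not None
```

    A machine-generated string such as x9q2z7@example.com passes this check
    just as a friend's address would, which is why syntax alone cannot
    separate the two.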

    I have made a POP3 filter that checks the emails in your inbox and deletes
    messages using multi-staged regular expressions. The Python script and
    documentation are available at http://www.lunqual.de/poppers.zip
    --
    Greetz.... lunqual


    • Andrew Dalke

      #3
      Re: filter valid email addresses

      Hoang:
      > anyone know of an algorithm to filter out real email addresses as opposed
      > to computer generated email addresses? I have been going through past email
      > archives in order to find friends email address. Unfortunately about 75%
      > of them are junk addresses or spammer addresses.

      Why just look at the email addresses? Since you have the emails
      themselves, try this. Get SpamBayes or any of the other systems you
      can use to recognize ham/spam. Find the emails where the addresses
      are used more than once. These are much more likely to be from
      your friends. Use these emails as ham. From the remaining addresses,
      identify some of the spam. Train SpamBayes on this and use it
      to classify the remaining emails. These can be sorted from most
      ham-like to most spam-like, making it easier to identify valid emails
      and hence valid email addresses.
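
      The first step (finding addresses that are used more than once) could
      be sketched roughly like this. It uses only the Python standard
      library; the function name repeated_senders and the min_count
      parameter are made up for this example, and it assumes you have
      already pulled the From headers out of your archive as plain strings.
      The SpamBayes training step is not shown.

```python
from collections import Counter
from email.utils import parseaddr


def repeated_senders(from_headers, min_count=2):
    """Return the set of addresses seen at least min_count times.

    from_headers is an iterable of raw From header strings, for example
    "Alice <alice@example.com>". Addresses are compared case-insensitively.
    """
    counts = Counter()
    for header in from_headers:
        _name, address = parseaddr(header)  # ("", "") if nothing parseable
        if address:
            counts[address.lower()] += 1
    return {addr for addr, n in counts.items() if n >= min_count}
```

      The messages from the addresses it returns would be the candidates to
      use as ham; everything else still needs to be classified.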

      Andrew
      dalke@dalkescientific.com

