Help on Email Parsing

  • dont bother

    Help on Email Parsing

    Hey,
    I have been trying to parse emails:
    But I could not find any examples or snippets of
    parsing emails in python from the documentation.
    Google did not help me much too.
    I am trying to understand the module 'email' and the
    functions described there to parse email but seems
    difficult.
    Can anyone help me in locating some pointers or
    snippets on this issue.
    Thanks a Ton
    Dont



  • Jeremy Sanders

    #2
    Re: Help on Email Parsing

    On Mon, 23 Feb 2004 00:47:17 -0800, dont bother wrote:
    >> I have been trying to parse emails:
    > But I could not find any examples or snippets of parsing emails in
    > python from the documentation.

    Here is a simple program (a bit of a hack) I wrote to count the number of
    messages in a mailbox for each day (used for counting spam). It may be of
    some use to you, although it doesn't parse the message body itself, only
    the headers.

    Jeremy

    # Released under the GPL (version 2 or greater)
    # Copyright (C) 2003 Jeremy Sanders

    import mailbox
    import string
    import email
    import email.Utils
    import time
    import sys

    # open passed mailbox filename
    # (yes - we need checking of this)
    fp = open(sys.argv[1], 'r')

    # open mailbox from file
    mbox = mailbox.PortableUnixMailbox(fp)

    secsinday = 86400
    counts = {}

    # get current time
    nowtime = time.time()

    # iterate over mail messages
    while 1:
        # get next message
        msg = mbox.next()
        # exit if we've looked at the last one
        if msg == None:
            break

        # get received header
        received = msg.get('received')
        # skip messages with no received header
        if received == None:
            continue

        # get unix time of email
        date_rfind = string.rfind(received, ';')
        date = received[date_rfind+1:]
        pd = email.Utils.parsedate( string.strip(date) )

        # skip messages we can't parse the date on
        if pd == None:
            continue

        # get time between now and received date in message
        unixtime = time.mktime(pd)
        day = int( (unixtime-nowtime) / secsinday)

        # increment counter for day
        # (using a dict allows us to parse the messages only once)
        if not day in counts:
            counts[day] = 0
        counts[day] += 1

    # sort days into numerical order
    daylist = counts.keys()
    daylist.sort()

    # print out counts
    for d in daylist:
        print d, counts[d]
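
For anyone reading this on Python 3: the listing above is Python 2 and relies on `mailbox.PortableUnixMailbox` and `email.Utils`, which are not available there. Below is a rough sketch of the same per-day count, assuming the mailbox is in mbox format. The function name `count_by_day` and its `now` parameter are made up for this illustration and are not part of the original program.

```python
import mailbox
import time
from collections import Counter
from email.utils import parsedate_to_datetime

SECS_IN_DAY = 86400


def count_by_day(mbox_path, now=None):
    """Count messages per day offset (0 = within the last day, -1 = the day before...).

    Hypothetical helper: only the first Received header of each message is used.
    """
    now = time.time() if now is None else now
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            received = msg.get('Received')
            # skip messages with no Received header
            if received is None:
                continue
            # the timestamp is whatever follows the last ';' in the header
            date_text = received.rpartition(';')[2].strip()
            try:
                when = parsedate_to_datetime(date_text)
            except (TypeError, ValueError):
                # skip messages whose date cannot be parsed
                continue
            counts[int((when.timestamp() - now) / SECS_IN_DAY)] += 1
    finally:
        box.close()
    return counts
```

Printing the result in day order would then be something like `for day, n in sorted(count_by_day(path).items()): print(day, n)`.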



    • deelan

      #3
      Re: Help on Email Parsing

      dont bother wrote:
      > Hey,
      > I have been trying to parse emails:
      > But I could not find any examples or snippets of
      > parsing emails in python from the documentation.
      > Google did not help me much too.
      > I am trying to understand the module 'email' and the
      > functions described there to parse email but seems
      > difficult.
      > Can anyone help me in locating some pointers or
      > snippets on this issue.

      this script will extract one or more images
      from an email message given as argument

      hope this helps.



      """Extracts all images from given rfc822-compliant email message.
      A quick hack by deelan

      python extract.py filename
      """

      # good MIME's
      mimes = 'image/gif', 'image/jpeg', 'image/png'

      import email

      def main(filename):
          f = file(filename, 'r')
          m = email.message_from_file(f)
          f.close()

          # loop thru message body and look for JPEG, GIF and PNG images
          images = [(part.get_filename(), part.get_payload(decode=True))
                    for part in m.get_payload() if part.get_type() in mimes]

          for name, data in images:
              print 'writing', name, '...'
              f = file(name, 'wb')
              f.write(data)
              f.close()

          print 'done %d image(s).' % len(images)

      if __name__ == '__main__':
          import sys
          if len(sys.argv) > 1:
              main(sys.argv[1])
          else:
              print __doc__
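
A note for Python 3 readers: the script above uses `file()`, the `print` statement, and `part.get_type()`, none of which exist there. A minimal sketch of the same extraction follows, assuming the message is available as raw bytes. `extract_images` and the fallback file name are invented for this example. Unlike the original, it uses `walk()`, so images nested inside inner multipart containers are found too.

```python
import email
from email import policy

# same MIME types the original script accepts
IMAGE_TYPES = {'image/gif', 'image/jpeg', 'image/png'}


def extract_images(raw_bytes):
    """Return (filename, data) pairs for every GIF, JPEG, or PNG part."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    images = []
    # walk() visits the message itself and every nested part
    for part in msg.walk():
        if part.get_content_type() not in IMAGE_TYPES:
            continue
        # assumption: unnamed parts get a generated placeholder name
        name = part.get_filename() or 'image%d' % len(images)
        # decode=True undoes the base64 / quoted-printable transfer encoding
        images.append((name, part.get_payload(decode=True)))
    return images
```

Each pair can then be written out with `open(name, 'wb')`. Since the file name comes from the message itself, it is probably wise to pass it through `os.path.basename` before writing.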



      --
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      <#me> a foaf:Person ; foaf:nick "deelan" ;
      foaf:weblog <http://www.deelan.com/> .


      • John Roth

        #4
        Re: Help on Email Parsing


        "dont bother" <dontbotherworld@yahoo.com> wrote in message
        news:mailman.190.1077526040.27104.python-list@python.org...
        > Hey,
        > I have been trying to parse emails:
        > But I could not find any examples or snippets of
        > parsing emails in python from the documentation.
        > Google did not help me much too.
        > I am trying to understand the module 'email' and the
        > functions described there to parse email but seems
        > difficult.
        > Can anyone help me in locating some pointers or
        > snippets on this issue.
        > Thanks a Ton
        > Dont

        You may want to study the MIME format a
        bit first. It's not a particularly simple format.

        The final example in the email documentation
        seems to be fairly straightforward. The line:

        msg = email.message_from_file(fp)

        does everything and leaves the result in
        memory as objects.
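
To make "leaves the result in memory as objects" a bit more concrete, here is a small Python 3 sketch. It uses `email.message_from_string`, the string counterpart of `message_from_file`, on a made-up message. The sample text, the addresses, and the `summarize` helper are all assumptions for illustration only.

```python
import email
from email import policy

# made-up sample message for illustration
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Hello

Just a short plain-text body.
"""


def summarize(raw_text):
    """Parse a message and pull out a few fields from the resulting object."""
    msg = email.message_from_string(raw_text, policy=policy.default)
    # headers are looked up like dictionary keys (case-insensitive)
    body_part = msg.get_body(preferencelist=('plain',))
    return {
        'from': str(msg['From']),
        'subject': str(msg['Subject']),
        'multipart': msg.is_multipart(),
        'body': body_part.get_content() if body_part is not None else '',
    }


info = summarize(RAW)
```

With a real file, `email.message_from_file(fp, policy=policy.default)` should give the same kind of object.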

        Of course, this is the *new* email package
        that is in 2.2.3 and later. I don't believe the
        old one was particularly easy to work with.

        John Roth


