Decode email subjects into unicode

  • Laszlo Nagy

    Decode email subjects into unicode

    Hi All,

    I'm having trouble decoding email subjects. Here are some examples:
    =?koi8-r?B?4tnT1NLP19nQz8zOyc3PIMkgzcHMz9rB1NLB1M7P?=
    [Fwd: re:Flags Of The World, Us States, And Military]
    =?ISO-8859-2?Q?=E9rdekes?=
    =?UTF-8?B?aGliw6Fr?=

    I know that "=?UTF-8?B" means UTF-8 + base64 encoding, but I wonder if
    there is a standard method in the "email" package to decode these
    subjects? I do not want to reinvent the wheel.

    Thanks,

    Laszlo
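
For illustration only, the structure of one such encoded word (`=?charset?encoding?payload?=`, where `B` means base64 and `Q` means a quoted-printable variant) can be sketched as below. The helper name `decode_encoded_word` is hypothetical, the code assumes Python 3, and it handles a single well-formed encoded word only; the standard library's `email` package covers the general case.

```python
import base64
import quopri


def decode_encoded_word(word):
    """Decode one encoded word of the form =?charset?encoding?payload?=."""
    # Drop the leading "=?" and trailing "?=", then split the three fields.
    charset, encoding, payload = word[2:-2].split("?", 2)
    if encoding.upper() == "B":
        # "B" encoding: the payload is base64.
        raw = base64.b64decode(payload)
    else:
        # "Q" encoding: quoted-printable, where "_" stands for a space.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset)


print(decode_encoded_word("=?UTF-8?B?aGliw6Fr?="))
print(decode_encoded_word("=?ISO-8859-2?Q?=E9rdekes?="))
```

This sketch makes no attempt to handle plain text mixed with encoded words, several encoded words in one header, or unknown charsets.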

  • Jeffrey Froman

    #2
    Re: Decode email subjects into unicode

    Laszlo Nagy wrote:
    I know that "=?UTF-8?B" means UTF-8 + base64 encoding, but I wonder if
    there is a standard method in the "email" package to decode these
    subjects?
    The standard library function email.Header.decode_header will parse these
    headers into an encoded bytestring paired with the appropriate encoding
    specification, if any. For example:
    >>> raw_headers = [
    ...     '=?koi8-r?B?4tnT1NLP19nQz8zOyc3PIMkgzcHMz9rB1NLB1M7P?=',
    ...     '[Fwd: re:Flags Of The World, Us States, And Military]',
    ...     '=?ISO-8859-2?Q?=E9rdekes?=',
    ...     '=?UTF-8?B?aGliw6Fr?=',
    ... ]
    >>> from email.Header import decode_header
    >>> for raw_header in raw_headers:
    ...     for header, encoding in decode_header(raw_header):
    ...         if encoding is None:
    ...             print header.decode()
    ...         else:
    ...             print header.decode(encoding)
    ...
    Быстровыполнимо и малозатратно
    [Fwd: re:Flags Of The World, Us States, And Military]
    érdekes
    hibák


    Jeffrey
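
The session above uses Python 2 names (`email.Header`, the `print` statement). A rough Python 3 equivalent might look like the sketch below. It assumes the module's Python 3 name, `email.header`, and the helper name `decode_subject` is hypothetical. In Python 3, `decode_header` can return either `str` or `bytes` chunks, so the sketch checks the type before decoding.

```python
from email.header import decode_header


def decode_subject(raw_subject):
    """Return the subject as a str, decoding any encoded words it contains."""
    parts = []
    for chunk, charset in decode_header(raw_subject):
        if isinstance(chunk, bytes):
            # Chunks that were not encoded come back with charset None;
            # this sketch assumes they are plain ASCII.
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


raw_headers = [
    "=?koi8-r?B?4tnT1NLP19nQz8zOyc3PIMkgzcHMz9rB1NLB1M7P?=",
    "[Fwd: re:Flags Of The World, Us States, And Military]",
    "=?ISO-8859-2?Q?=E9rdekes?=",
    "=?UTF-8?B?aGliw6Fr?=",
]
for raw_header in raw_headers:
    print(decode_subject(raw_header))
```

An unknown charset name would still raise `LookupError` here. Another option is `str(email.header.make_header(decode_header(raw_subject)))`, which lets the library join the chunks itself.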
