email header decoding fails

**Gabriel Genellina** · Apr 10 '08, 09:25 AM

Re: email header decoding fails

On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <ZeeGeek@gmail.com> wrote:

> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
>
>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com>
>> wrote:
>>
>>> It seems that the decode_header function in email.Header fails when
>>> the string is in the following form,
>>>
>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>>
>> An 'encoded-word' that appears within a
>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>> 'special' by 'linear-white-space'.
>
> Thank you very much, Gabriel.

The above only explains why decode_header refuses to decode it, and why
that is not a bug. But if you actually have to deal with such malformed
headers, some heuristics may help. For example, if you *know* your mails
typically specify the gb2312 or iso-8859-1 encoding, you can look for
things that look like the example above and "fix" them.
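
A minimal sketch of such a heuristic, assuming the only defect is a missing space right after the encoded-word. The whitelist `KNOWN_CHARSETS` and the helper name `separate_encoded_words` are made up for illustration; the pattern only touches encoded-words in the charsets you list.

```python
import re

# Hypothetical whitelist: the charsets you expect in your own mail.
KNOWN_CHARSETS = ("gb2312", "iso-8859-1")

_charsets = "|".join(re.escape(c) for c in KNOWN_CHARSETS)

# An encoded-word (=?charset?Q-or-B?text?=) in a known charset that is
# immediately followed by a non-whitespace character.
_GLUED = re.compile(
    r"(=\?(?:" + _charsets + r")\?[QB]\?[^?\s]*\?=)(?=\S)",
    re.IGNORECASE,
)

def separate_encoded_words(value):
    """Insert the missing space after a glued encoded-word."""
    return _GLUED.sub(r"\1 ", value)

fixed = separate_encoded_words("=?gb2312?Q?=D0=C7=C8=FC?=(revised)")
```

Headers that already have the space, or that use a charset outside the list, come back unchanged, so the fix stays away from text that merely looks similar. It does not handle an encoded-word glued to the text *before* it; that would need a second pattern.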

--
Gabriel Genellina

**ZeeGeek** · Apr 11 '08, 06:05 AM

Re: email header decoding fails

On Apr 10, 5:18 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

> On Thu, 10 Apr 2008 05:45:41 -0300, ZeeGeek <ZeeG...@gmail.com> wrote:
>
>> On Apr 10, 4:31 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
>> wrote:
>>
>>> On Wed, 09 Apr 2008 23:12:00 -0300, ZeeGeek <ZeeG...@gmail.com>
>>> wrote:
>>>
>>>> It seems that the decode_header function in email.Header fails when
>>>> the string is in the following form,
>>>>
>>>> '=?gb2312?Q?=D0=C7=C8=FC?=(revised)'
>>>
>>> An 'encoded-word' that appears within a
>>> 'phrase' MUST be separated from any adjacent 'word', 'text' or
>>> 'special' by 'linear-white-space'.
>>
>> Thank you very much, Gabriel.
>
> The above only explains why decode_header refuses to decode it, and why
> that is not a bug. But if you actually have to deal with such malformed
> headers, some heuristics may help. For example, if you *know* your mails
> typically specify the gb2312 or iso-8859-1 encoding, you can look for
> things that look like the example above and "fix" them.

Right now I'm using
re.sub(r'(=\?([^\?]*\?){3}=)', r' \1 ', orig_string)
to detect every encoded string and put an extra space before and after
it. The whole string then complies with the standard, and decode_header
can decode it properly.
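
Wrapped up as a small helper, the approach above might look like the following sketch. The function name `decode_lenient` and the `fallback` charset are assumptions for illustration. The import uses the lowercase module name `email.header` from current Python versions, where decode_header can return either bytes or str chunks, so both are handled.

```python
import re
from email.header import decode_header

# Same pattern as above: "=?", three "?"-terminated fields, then "=".
ENCODED_WORD = re.compile(r'(=\?([^\?]*\?){3}=)')

def decode_lenient(raw, fallback='ascii'):
    """Pad encoded-words with spaces, then decode the header to text."""
    padded = ENCODED_WORD.sub(r' \1 ', raw)
    pieces = []
    for chunk, charset in decode_header(padded):
        if isinstance(chunk, bytes):
            # No declared charset: use the fallback (an assumption,
            # adjust it to whatever your mail typically uses).
            chunk = chunk.decode(charset or fallback, errors='replace')
        pieces.append(chunk)
    # Collapse the extra whitespace that the padding introduced.
    return ' '.join(''.join(pieces).split())

subject = decode_lenient('=?gb2312?Q?=D0=C7=C8=FC?=(revised)')
```

For the example header, `subject` should hold the two decoded gb2312 characters followed by " (revised)". The whitespace cleanup at the end is a design choice: padding every encoded-word also pads the well-formed ones, so the result is normalized afterwards. A header naming a charset Python does not know would still raise an error here.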
