Problems with email.Generator.Generator

This topic is closed.
  • Chris Withers

    Problems with email.Generator.Generator

    Hi All,

    The following piece of code is giving me issues:

    from email.Charset import Charset,QP
    from email.MIMEText import MIMEText
    charset = Charset('utf-8')
    charset.body_encoding = QP
    msg = MIMEText(
        u'Some text with chars that need encoding: \xa3',
        'plain',
        )
    msg.set_charset(charset)
    print msg.as_string()

    Under Python 2.4.2, this produces the following output, as I'd expect:

    MIME-Version: 1.0
    Content-Transfer-Encoding: 8bit
    Content-Type: text/plain; charset="utf-8"

    Some text with chars that need encoding: =A3

    However, under Python 2.4.3, I now get:

    Traceback (most recent call last):
      File "test_encoding.py", line 14, in ?
        msg.as_string()
      File "c:\python24\lib\email\Message.py", line 129, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "c:\python24\lib\email\Generator.py", line 82, in flatten
        self._write(msg)
      File "c:\python24\lib\email\Generator.py", line 113, in _write
        self._dispatch(msg)
      File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
        meth(msg)
      File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
        self._fp.write(payload)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

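    This seems to be as a result of this change:

    http://svn.python.org/view/python/br...37910&r2=42272

    ...which is referred to as part of a fix for this bug:

    http://sourceforge.net/tracker/?func...70&atid=105470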

    Now, is this change to Generator.py in error or am I doing something wrong?

    If the latter, how can I change my code such that it works as I'd expect?

    cheers,

    Chris

    --
    Simplistix - Content Management, Zope & Python Consulting
    - http://www.simplistix.co.uk
  • Manlio Perillo

    #2
    Re: Problems with email.Generator.Generator

    Chris Withers wrote:
    > Hi All,
    >
    > The following piece of code is giving me issues:
    >
    > from email.Charset import Charset,QP
    > from email.MIMEText import MIMEText
    > charset = Charset('utf-8')
    > charset.body_encoding = QP
    > msg = MIMEText(
    >     u'Some text with chars that need encoding: \xa3',
    >     'plain',
    >     )
    > msg.set_charset(charset)
    > print msg.as_string()
    >
    > Under Python 2.4.2, this produces the following output, as I'd expect:
    >
    > [...]
    > However, under Python 2.4.3, I now get:
    >
    Try with:

    msg = MIMEText(
        u'Some text with chars that need encoding: \xa3',
        _charset='utf-8',
        )

    and you will obtain the error:

    Traceback (most recent call last):
      File "<pyshell#4>", line 3, in -toplevel-
        _charset='utf-8',
      File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
        self.set_payload(_text, _charset)
      File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
        self.set_charset(charset)
      File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
        self._payload = charset.body_encode(self._payload)
      File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
        return email.base64MIME.body_encode(s)
      File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
        enc = b2a_base64(s[i:i + max_unencoded])
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)




    Regards Manlio Perillo


    • Chris Withers

      #3
      Re: Problems with email.Generator.Generator

      Manlio Perillo wrote:
      > Try with:
      >
      > msg = MIMEText(
      >     u'Some text with chars that need encoding: \xa3',
      >     _charset='utf-8',
      >     )
      >
      > and you will obtain the error:
      >
      > Traceback (most recent call last):
      >   File "<pyshell#4>", line 3, in -toplevel-
      >     _charset='utf-8',
      >   File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
      >     self.set_payload(_text, _charset)
      >   File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
      >     self.set_charset(charset)
      >   File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
      >     self._payload = charset.body_encode(self._payload)
      >   File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
      >     return email.base64MIME.body_encode(s)
      >   File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
      >     enc = b2a_base64(s[i:i + max_unencoded])
      > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

      OK, but I fail to see how replacing one unicode error with another is
      any help... :-S

      Chris

      --
      Simplistix - Content Management, Zope & Python Consulting
      - http://www.simplistix.co.uk


      • Peter Otten

        #4
        Re: Problems with email.Generator.Generator

        Chris Withers wrote:
        > The following piece of code is giving me issues:
        >
        > from email.Charset import Charset,QP
        > from email.MIMEText import MIMEText
        > charset = Charset('utf-8')
        > charset.body_encoding = QP
        > msg = MIMEText(
        >     u'Some text with chars that need encoding: \xa3',
        >     'plain',
        >     )
        > msg.set_charset(charset)
        > print msg.as_string()
        >
        > Under Python 2.4.2, this produces the following output, as I'd expect:
        >
        > MIME-Version: 1.0
        > Content-Transfer-Encoding: 8bit
        > Content-Type: text/plain; charset="utf-8"
        >
        > Some text with chars that need encoding: =A3
        >
        > However, under Python 2.4.3, I now get:
        >
        > Traceback (most recent call last):
        >   File "test_encoding.py", line 14, in ?
        >     msg.as_string()
        >   File "c:\python24\lib\email\Message.py", line 129, in as_string
        >     g.flatten(self, unixfrom=unixfrom)
        >   File "c:\python24\lib\email\Generator.py", line 82, in flatten
        >     self._write(msg)
        >   File "c:\python24\lib\email\Generator.py", line 113, in _write
        >     self._dispatch(msg)
        >   File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
        >     meth(msg)
        >   File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
        >     self._fp.write(payload)
        > UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)
        >
        > This seems to be as a result of this change:
        >
        > http://svn.python.org/view/python/br...37910&r2=42272
        >
        > ...which is referred to as part of a fix for this bug:
        >
        > http://sourceforge.net/tracker/?func...70&atid=105470
        >
        > Now, is this change to Generator.py in error or am I doing something
        > wrong?

        I'm not familiar enough with the email package to answer that.

        > If the latter, how can I change my code such that it works as I'd expect?

        email.Generator and email.Message use cStringIO.StringIO internally, which
        can't cope with unicode. A quick fix might be to monkey-patch:

        from StringIO import StringIO
        from email import Generator, Message
        Generator.StringIO = Message.StringIO = StringIO
        # your code here

        Peter
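
The text-versus-bytes mismatch described above can be reproduced in miniature. The following is only a rough sketch, and it assumes a Python 3 interpreter rather than the 2.4.x used in this thread: `io.BytesIO` is a loosely analogous byte-oriented buffer standing in for `cStringIO.StringIO`, and the variable names are made up for the illustration.

```python
import io

# Sample text from the thread; \xa3 is the pound sign.
text = "Some text with chars that need encoding: \xa3"

buf = io.BytesIO()  # byte-oriented buffer (assumed stand-in for cStringIO)

# Writing text directly into a byte buffer is rejected...
try:
    buf.write(text)
    write_error = None
except TypeError as exc:
    write_error = exc

# ...while writing explicitly encoded bytes succeeds.
buf.write(text.encode("utf-8"))
data = buf.getvalue()
```

The point of the sketch is that the buffer never has to guess a codec once the caller encodes the text up front.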


        • Chris Withers

          #5
          Re: Problems with email.Generator.Generator

          Peter Otten wrote:
          > > http://sourceforge.net/tracker/?func...70&atid=105470
          > > Now, is this change to Generator.py in error or am I doing something
          > > wrong?
          >
          > I'm not familiar enough with the email package to answer that.

          I'm hoping someone around here is ;-)

          > > If the latter, how can I change my code such that it works as I'd expect?
          >
          > email.Generator and email.Message use cStringIO.StringIO internally, which
          > can't cope with unicode. A quick fix might be to monkey-patch:

          I'm not sure that's correct, but I'm happy to stand corrected.

          My understanding is that the StringIOs don't mind as long as the type
          is consistent - i.e. don't mix unicode and encoded strings, 'cos it
          forces python's default ascii codec to kick in and spew unicode errors.

          Now, I want to know what I'm supposed to do when I have unicode source
          and want it to end up as either a text/plain or text/html mime part.

          Is there a how-to for this anywhere? The email package's docs are short
          on examples involving charsets, unicode and the like :-(

          Chris

          --
          Simplistix - Content Management, Zope & Python Consulting
          - http://www.simplistix.co.uk


          • Steve Holden

            #6
            Re: Problems with email.Generator.Generator

            Chris Withers wrote:
            > Peter Otten wrote:
            >
            >>http://sourceforge.net/tracker/?func...70&atid=105470
            >>
            >>>Now, is this change to Generator.py in error or am I doing something
            >>>wrong?
            >>
            >>I'm not familiar enough with the email package to answer that.
            >
            >
            > I'm hoping someone around here is ;-)
            >
            >
            >>>If the latter, how can I change my code such that it works as I'd expect?
            >>
            >>email.Generator and email.Message use cStringIO.StringIO internally, which
            >>can't cope with unicode. A quick fix might be to monkey-patch:
            >
            >
            > I'm not sure that's correct, but I'm happy to stand corrected.
            >
            > My understanding is that the StringIOs don't mind as long as the type
            > is consistent - i.e. don't mix unicode and encoded strings, 'cos it
            > forces python's default ascii codec to kick in and spew unicode errors.
            >
            > Now, I want to know what I'm supposed to do when I have unicode source
            > and want it to end up as either a text/plain or text/html mime part.
            >
            > Is there a how-to for this anywhere? The email package's docs are short
            > on examples involving charsets, unicode and the like :-(
            >
            Well, it would seem like the easiest approach is to monkey-patch the use
            of cStringIO to StringIO as recommended and see if that fixes your
            problem. Wouldn't it?

            regards
            Steve
            --
            Steve Holden +44 150 684 7255 +1 800 494 3119
            Holden Web LLC/Ltd http://www.holdenweb.com
            Skype: holdenweb http://holdenweb.blogspot.com
            Recent Ramblings http://del.icio.us/steve.holden


            • Gerard Flanagan

              #7
              Re: Problems with email.Generator.Generator

              Chris Withers wrote:
              >
              > Now, I want to know what I'm supposed to do when I have unicode source
              > and want it to end up as either a text/plain or text/html mime part.
              >
              > Is there a how-to for this anywhere? The email package's docs are short
              > on examples involving charsets, unicode and the like :-(

              I'm no expert in this, but have you tried the codecs module?

              (with 'xmlcharrefreplace' for the html)?

              Gerard
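
As a small illustration of the 'xmlcharrefreplace' suggestion: the sketch below assumes a Python 3 interpreter (not the 2.4.x discussed in this thread), and the sample string and variable names are invented for the example. It uses `codecs.encode` with the standard `xmlcharrefreplace` error handler to turn non-ASCII characters into numeric character references suitable for an HTML part.

```python
import codecs

# Hypothetical sample text; \xa3 is the pound sign.
text = "Price: \xa3100"

# Characters that do not fit in ASCII are replaced by &#NNN; references.
html_bytes = codecs.encode(text, "ascii", "xmlcharrefreplace")
# html_bytes is b'Price: &#163;100'
```

This only helps for the text/html case; a text/plain part still needs a real charset and transfer encoding.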


              • Manlio Perillo

                #8
                Re: Problems with email.Generator.Generator

                Chris Withers wrote:
                [...]
                >
                > OK, but I fail to see how replacing one unicode error with another is
                > any help... :-S
                >

                The problem is simple: the email package does not support Unicode strings well.

                For now I'm using this:

                # imports inferred from the names used below
                from email import Header
                from email import MIMEText as _MIMEText
                from email import MIMEMultipart as _MIMEMultipart

                charset = "utf-8"  # the charset to be used for email


                class HeadersMixin(object):
                    """A custom mixin, for automatic internationalized headers
                    support.
                    """

                    def __setitem__(self, name, val, **_params):
                        if isinstance(val, str):
                            try:
                                # only 7 bit ascii
                                val.decode("us-ascii")
                            except UnicodeDecodeError:
                                raise ValueError("8 bit strings not accepted")

                            return self.add_header(name, val)
                        else:
                            try:
                                # to avoid unnecessary trash
                                val = val.encode('us-ascii')
                            except UnicodeError:
                                val = Header.Header(val, charset).encode()

                            return self.add_header(name, val)


                class MIMEText(HeadersMixin, _MIMEText.MIMEText):
                    """A MIME Text message that allows only Unicode strings, or plain
                    ascii (7 bit) ones.
                    """

                    def __init__(self, _text, _subtype="plain"):
                        _charset = charset

                        if isinstance(_text, str):
                            try:
                                # only 7 bit ascii
                                _text.decode("us-ascii")
                                _charset = "us-ascii"
                            except UnicodeDecodeError:
                                raise ValueError("8 bit strings not accepted")
                        else:
                            _text = _text.encode(charset)

                        return _MIMEText.MIMEText.__init__(self, _text, _subtype, _charset)


                class MIMEMultipart(HeadersMixin, _MIMEMultipart.MIMEMultipart):
                    def __init__(self):
                        _MIMEMultipart.MIMEMultipart.__init__(self)



                This only accepts Unicode strings or plain ascii strings.




                Regards Manlio Perillo


                • Chris Withers

                  #9
                  Re: Problems with email.Generator.Generator

                  Steve Holden wrote:
                  >> Is there a how-to for this anywhere? The email package's docs are short
                  >> on examples involving charsets, unicode and the like :-(
                  >
                  > Well, it would seem like the easiest approach is to monkey-patch the use
                  > of cStringIO to StringIO as recommended and see if that fixes your
                  > problem. Wouldn't it?

                  No, not really, since at best that's a nasty (and I mean really nasty)
                  hack. I'm using the email package as part of a library that I'm building
                  which is to be used with various frameworks. Monkey patching modules is
                  about as bad as it gets in that situation...

                  At worst, and most likely based on my past experience of (c)StringIO
                  being used to accumulate output, it won't make a jot of difference...

                  Chris

                  --
                  Simplistix - Content Management, Zope & Python Consulting
                  - http://www.simplistix.co.uk


                  • Chris Withers

                    #10
                    Re: Problems with email.Generator.Generator

                    Manlio Perillo wrote:
                    >
                    > The problem is simple: the email package does not support Unicode strings well.

                    Really? All the character set support seems to indicate a fair bit of
                    thought went into this aspect, although it does appear that no-one
                    bothered to document it :-(

                    Chris

                    --
                    Simplistix - Content Management, Zope & Python Consulting
                    - http://www.simplistix.co.uk


                    • Peter Otten

                      #11
                      Re: Problems with email.Generator.Generator

                      Chris Withers wrote:
                      > At worst, and most likely based on my past experience of (c)StringIO
                      > being used to accumulate output, it won't make a jot of difference...

                      What past experience?

                      >>> StringIO.StringIO().write(unichr(128))
                      >>> cStringIO.StringIO().write(unichr(128))
                      Traceback (most recent call last):
                        File "<stdin>", line 1, in ?
                      UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

                      Peter


                      • Steve Holden

                        #12
                        Re: Problems with email.Generator.Generator

                        Chris Withers wrote:
                        > Steve Holden wrote:
                        >
                        >>> Is there a how-to for this anywhere? The email package's docs are
                        >>> short on examples involving charsets, unicode and the like :-(
                        >>
                        >> Well, it would seem like the easiest approach is to monkey-patch the
                        >> use of cStringIO to StringIO as recommended and see if that fixes your
                        >> problem. Wouldn't it?
                        >
                        > No, not really, since at best that's a nasty (and I mean really nasty)
                        > hack. I'm using the email package as part of a library that I'm building
                        > which is to be used with various frameworks. Monkey patching modules is
                        > about as bad as it gets in that situation...
                        >
                        > At worst, and most likely based on my past experience of (c)StringIO
                        > being used to accumulate output, it won't make a jot of difference...

                        Under those circumstances you probably know best ...

                        regards
                        Steve
                        --
                        Steve Holden +44 150 684 7255 +1 800 494 3119
                        Holden Web LLC/Ltd http://www.holdenweb.com
                        Skype: holdenweb http://holdenweb.blogspot.com
                        Recent Ramblings http://del.icio.us/steve.holden


                        • Chris Withers

                          #13
                          Re: Problems with email.Generator.Generator

                          Peter Otten wrote:
                          > What past experience?
                          >
                          > >>> StringIO.StringIO().write(unichr(128))
                          > >>> cStringIO.StringIO().write(unichr(128))
                          > Traceback (most recent call last):
                          >   File "<stdin>", line 1, in ?
                          > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

                          OK, I stand corrected, although I suspect the bug is actually in
                          StringIO.StringIO in that it doesn't barf on unicodes.

                          (Python 3000 and all that)

                          Which again leads us back to the email package: it used to do the right
                          thing from what I can see, and now it doesn't, and ends up trying to
                          write a unicode to a cStringIO, which (rightly, I guess) barfs...

                          Barry, Barry, where are you? ;-)

                          Chris

                          --
                          Simplistix - Content Management, Zope & Python Consulting
                          - http://www.simplistix.co.uk


                          • Chris Withers

                            #14
                            Re: Problems with email.Generator.Generator

                            Peter Otten wrote:
                            > Chris Withers wrote:
                            >
                            >> At worst, and most likely based on my past experience of (c)StringIO
                            >> being used to accumulate output, it won't make a jot of difference...
                            >
                            > What past experience?
                            >
                            > >>> StringIO.StringIO().write(unichr(128))
                            > >>> cStringIO.StringIO().write(unichr(128))
                            > Traceback (most recent call last):
                            >   File "<stdin>", line 1, in ?
                            > UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

                            Okay, more out of desperation than anything else, let's try this:

                            from email.Charset import Charset,QP
                            from email.MIMEText import MIMEText
                            from StringIO import StringIO
                            from email import Generator,Message
                            Generator.StringIO = Message.StringIO = StringIO
                            charset = Charset('utf-8')
                            charset.body_encoding = QP
                            msg = MIMEText(u'Some text with chars that need encoding: \xa3','plain')
                            msg.set_charset(charset)
                            print repr(msg.as_string())

                            u'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/plain; charset="utf-8"\n\nSome text with chars that need encoding: \xa3'

                            Yay! No unicode error, but also no use:

                              File "c:\python24\lib\smtplib.py", line 692, in sendmail
                                (code,resp) = self.data(msg)
                              File "c:\python24\lib\smtplib.py", line 489, in data
                                self.send(q)
                              File "c:\python24\lib\smtplib.py", line 316, in send
                                self.sock.sendall(str)
                              File "<string>", line 1, in sendall
                            UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 297: ordinal not in range(128)

                            The other variant I've tried is:

                            from email.Charset import Charset,QP
                            from email.MIMEText import MIMEText
                            charset = Charset('utf-8')
                            charset.body_encoding = QP
                            msg = MIMEText('','plain',)
                            msg.set_charset(charset)
                            msg.set_payload(charset.body_encode(u'Some text with chars that need encoding: \xa3'))
                            print msg.as_string()

                            Which is sort of okay:

                            MIME-Version: 1.0
                            Content-Transfer-Encoding: 7bit
                            Content-Type: text/plain; charset="utf-8"

                            Some text with chars that need encoding: =A3

                            ...except it gets the transfer encoding wrong, which means Thunderbird
                            shows =A3 instead of the pound sign that it should :-(

                            ...this is down to a pretty lame bit of code in Encoders.py which
                            basically checks for a unicode error *sigh*

                            Chris

                            --
                            Simplistix - Content Management, Zope & Python Consulting
                            - http://www.simplistix.co.uk
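
To see what a quoted-printable body looks like independently of the email package, here is a minimal sketch. It assumes a Python 3 interpreter (not the 2.4.x under discussion) and uses the standard `quopri` module; the variable names are illustrative only. In UTF-8 the pound sign is the two bytes C2 A3, so the quoted-printable form of the UTF-8 text contains `=C2=A3`, and a mail client only turns that back into a pound sign when the Content-Transfer-Encoding header says quoted-printable.

```python
import quopri

# Sample text from the thread; \xa3 is the pound sign.
text = "Some text with chars that need encoding: \xa3"

# Step 1: text -> bytes in the declared charset.
utf8_bytes = text.encode("utf-8")

# Step 2: bytes -> 7-bit-safe quoted-printable representation.
qp_bytes = quopri.encodestring(utf8_bytes)

# A receiver reverses both steps, in the opposite order.
round_trip = quopri.decodestring(qp_bytes).decode("utf-8")
```

If the header claims 7bit instead, the receiver skips the first reversal and displays the escape sequence literally, which matches the symptom described above.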


                            • Chris Withers

                              #15
                              Re: Problems with email.Generator.Generator

                              Chris Withers wrote:
                              > ...except it gets the transfer encoding wrong, which means Thunderbird
                              > shows =A3 instead of the pound sign that it should :-(
                              >
                              > ...this is down to a pretty lame bit of code in Encoders.py which
                              > basically checks for a unicode error *sigh*

                              OK, slight progress... here's a new version that actually works:

                              from email.Charset import Charset,QP
                              from email.MIMEText import MIMEText
                              charset = Charset('utf-8')
                              charset.body_encoding = QP
                              msg = MIMEText('','plain',None)
                              msg.set_payload(u'Some text with chars that need encoding:\xa3', charset)
                              print msg.as_string()

                              MIME-Version: 1.0
                              Content-Type: text/plain; charset; charset="utf-8"
                              Content-Transfer-Encoding: quoted-printable

                              Some text with chars that need encoding:=A3

                              Okay, so this actually does the right thing... wahey!

                              ...but hold your horses, if Charset isn't set to quoted printable, then
                              you end up with problems:

                              charset = Charset('utf-8')
                              msg = MIMEText('','plain',None)
                              msg.set_payload(u'Some text with chars that need encoding:\xa3', charset)

                              Traceback (most recent call last):
                                File "C:\test_encoding.py", line 5, in ?
                                  msg.set_payload(u'Some text with chars that need encoding:\xa3', charset)
                                File "c:\python24\lib\email\Message.py", line 218, in set_payload
                                  self.set_charset(charset)
                                File "c:\python24\lib\email\Message.py", line 260, in set_charset
                                  self._payload = charset.body_encode(self._payload)
                                File "c:\python24\lib\email\Charset.py", line 366, in body_encode
                                  return email.base64MIME.body_encode(s)
                                File "c:\python24\lib\email\base64MIME.py", line 136, in encode
                                  enc = b2a_base64(s[i:i + max_unencoded])
                              UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 40: ordinal not in range(128)

                              Now what?

                              *sigh*

                              Chris

                              --
                              Simplistix - Content Management, Zope & Python Consulting
                              - http://www.simplistix.co.uk
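
For readers arriving here later, the following is a hedged sketch of how the same goal (unicode text in, a quoted-printable UTF-8 text/plain part out) might be expressed today. It is not from the thread and does not address Python 2.4.3: it assumes a recent Python 3, where the modules are named `email.charset` and `email.mime.text` and where `MIMEText` accepts a `Charset` instance as its third argument. The variable names are illustrative.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText

# Sample text from the thread; \xa3 is the pound sign.
text = "Some text with chars that need encoding: \xa3"

# Ask for quoted-printable bodies instead of the utf-8 default.
cs = Charset("utf-8")
cs.body_encoding = QP

# Passing the Charset object lets the package encode the payload
# and set the matching Content-Transfer-Encoding header itself.
msg = MIMEText(text, "plain", cs)

wire = msg.as_string()                                   # serialized message
decoded = msg.get_payload(decode=True).decode("utf-8")   # what a reader sees
```

Letting the constructor handle both the payload and the headers avoids the mismatch seen earlier in the thread, where the body was quoted-printable but the header said 7bit.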
