can compile function have a bug?

  • ygao

    can compile function have a bug?

    >>> compile('U"中"','c:/test','single')
    <code object ? at 00F06B60, file "c:/test", line 1>
    >>> d=compile('U"中"','c:/test','single')
    >>> d
    <code object ? at 00F06BA0, file "c:/test", line 1>
    >>> exec(d)
    u'\xd6\xd0'
    >>> U"中"
    u'\u4e2d'
    >>>
    Why is the result different?
    Is this a bug, or is there another reason?

  • Peter Otten

    #2
    Re: can compile function have a bug?

    ygao wrote:
    >>> compile('U"中"','c:/test','single')
    <code object ? at 00F06B60, file "c:/test", line 1>
    >>> d=compile('U"中"','c:/test','single')
    >>> d
    <code object ? at 00F06BA0, file "c:/test", line 1>
    >>> exec(d)
    u'\xd6\xd0'
    >>> U"中"
    u'\u4e2d'
    >>>
    why is the result different?
    a bug or another reason?

    How that particular output came to be I don't know, but you should be able
    to avoid the confusion by either passing a unicode string to compile() or
    specifying the encoding:
    >>> exec compile(u'u"中"','c:/test','single')
    u'\u4e2d'
    >>> exec compile('# -*- coding: utf8 -*-\nu"中"','c:/test','single')
    u'\u4e2d'
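
For readers on Python 3, where `exec` is a function and the `u"..."` session above will not run as typed, the coding-declaration approach can be sketched roughly as follows. This is only an illustration: the helper name `value_from_bytes_source` and the `<test>` filename are made up for the example, and it assumes the interpreter ships the `gb2312` codec, as standard CPython builds do.

```python
# Python 3 sketch (the thread itself uses Python 2 syntax).
CHAR = "\u4e2d"  # the character discussed in the thread


def value_from_bytes_source(encoding_name):
    """Build byte-string source with a coding line, compile it, return the literal.

    Hypothetical helper for illustration; not part of the original posts.
    """
    source = (
        b"# -*- coding: " + encoding_name.encode("ascii") + b" -*-\n"
        + b'value = "' + CHAR.encode(encoding_name) + b'"\n'
    )
    namespace = {}
    # compile() accepts bytes and uses the coding line to decode them.
    exec(compile(source, "<test>", "exec"), namespace)
    return namespace["value"]


print(ascii(value_from_bytes_source("utf-8")))
print(ascii(value_from_bytes_source("gb2312")))
```

In both cases the declared encoding matches the bytes actually passed in, so the literal comes back as the single character U+4E2D.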

    Peter

    PS: In an all-UTF-8 environment I would have /expected/ to see
    >>> your_encoding = "utf8"
    >>> identity = "latin1"
    >>> u'\u4e2d'.encode(your_encoding).decode(identity)
    u'\xe4\xb8\xad'

    and that's indeed what I get over here:
    >>> exec compile('u"中"','c:/test','single')
    u'\xe4\xb8\xad'



    • John Machin

      #3
      Re: can compile function have a bug?


      Peter Otten wrote:
      PS: In an all-UTF-8 environment I would have /expected/ to see
      >>> your_encoding = "utf8"
      >>> identity = "latin1"
      >>> u'\u4e2d'.encode(your_encoding).decode(identity)
      u'\xe4\xb8\xad'

      and that's indeed what I get over here:
      >>> exec compile('u"中"','c:/test','single')
      u'\xe4\xb8\xad'

      But it's not an all-UTF-8 environment; his_encoding = 'gb2312' or one
      of its heirs/successors :-)
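
A small sketch of why the two environments print different things, under the assumption (consistent with the outputs quoted in this thread) that the undeclared bytes inside the `u"..."` literal were read one byte per character, as Latin-1 would. It is written for Python 3, and `misread_as_latin1` is a hypothetical helper name, not something from the posts.

```python
# Python 3 sketch: the same character, encoded by two different consoles,
# then misread byte-by-byte as Latin-1.
CHAR = "\u4e2d"


def misread_as_latin1(text, console_encoding):
    """Encode with the console's encoding, then decode each byte as Latin-1."""
    return text.encode(console_encoding).decode("latin-1")


gb_result = misread_as_latin1(CHAR, "gb2312")
utf8_result = misread_as_latin1(CHAR, "utf-8")

print(ascii(gb_result))    # two characters, matching ygao's output
print(ascii(utf8_result))  # three characters, matching Peter's output

# The original character is recoverable by reversing the misreading.
print(ascii(gb_result.encode("latin-1").decode("gb2312")))
```

So neither result points to a bug in `compile()`: each is the console's own byte sequence for the character, taken one byte at a time.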

      Cheers,
      John


      • Peter Otten

        #4
        Re: can compile function have a bug?

        John Machin wrote:
        But it's not an all-UTF-8 environment; his_encoding = 'gb2312' or one
        of its heirs/successors :-)
        Ouch. Almost understanding a problem hurts more than not understanding it at
        all. I just had a refresher of the experience...

        Peter


        • ygao

          #5
          Re: can compile function have a bug?


          Peter Otten wrote:
          How that particular output came to be I don't know, but you should be able
          to avoid the confusion by either passing a unicode string to compile() or
          specifying the encoding:
          >>> exec compile(u'u"中"','c:/test','single')
          u'\u4e2d'
          >>> exec compile('# -*- coding: utf8 -*-\nu"中"','c:/test','single')
          u'\u4e2d'
          This is what I wanted!
          Many thanks!
