Unicode formatting for Strings

  • robson.cozendey.rj@gmail.com

    Unicode formatting for Strings

    Hi,

    I'm trying desperately to tell the interpreter to put an 'á' in my
    string, so here is the code snippet:

    # -*- coding: utf-8 -*-
    filename = u"Ataris Aquáticos #2.txt"
    f = open(filename, 'w')

    Then I save it with Windows Notepad, in the UTF-8 format. So:

    1) I put the "magic comment" at the start of the file
    2) I write u"" to specify my unicode string
    3) I save it in the UTF-8 format

    And even so, I get an error!

    File "Ataris Aqußticos #2.py", line 1
    SyntaxError: Non-ASCII character '\xff' in file Ataris Aqußticos #2.py
    on line 1, but no encoding declared; see
    http://www.python.org/peps/pep-0263.html for details

    I don't know how to tell Python that it should use UTF-8; it keeps
    saying "no encoding declared"!

    Robson

  • kyosohma@gmail.com

    #2
    Re: Unicode formatting for Strings

    I can't tell from your email if you get the message when you try to
    open or close the file. So, I recommend that you read the following
    article as it explains the whole unicode business quite well:



    • Kent Johnson

      #3
      Re: Unicode formatting for Strings

      It looks like you are saving the file in Unicode format (not utf-8) and
      Python is choking on the Byte Order Mark that Notepad puts at the
      beginning of the document.

      Try using an editor that will save utf-8 without a BOM, e.g. jedit or
      TextPad.

      Kent
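A quick way to test the BOM theory is to look at the first bytes of the file. The following is only a sketch, written for Python 3 rather than the Python 2 of this thread; the helper name `sniff_bom` is made up for illustration, while the `codecs.BOM_*` constants are part of the standard library.

```python
import codecs

# Known byte order marks, longest first so that a UTF-32 BOM
# is not mistaken for the UTF-16 BOM it begins with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the codec name implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical usage: read the first few bytes of a script in binary mode.
# with open("script.py", "rb") as f:
#     print(sniff_bom(f.read(4)))
```

A result of `utf-16-le` would mean the file was saved as Notepad's "Unicode" rather than as UTF-8.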


      • Chris Mellon

        #4
        Re: Unicode formatting for Strings

        On 2/5/07, Kent Johnson <kent@kentsjohnson.com> wrote:
        > It looks like you are saving the file in Unicode format (not utf-8) and
        > Python is choking on the Byte Order Mark that Notepad puts at the
        > beginning of the document.

        Notepad does support saving to UTF-8, and I was able to do this
        without the problem the OP was having. I also saved both with and
        without a BOM (in UTF-8) using SciTe, and Python worked correctly in
        both cases.

        > Try using an editor that will save utf-8 without a BOM, e.g. jedit or
        > TextPad.
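The observation that UTF-8 source works with or without a BOM can be checked without an editor. The sketch below assumes Python 3 (the thread itself is about Python 2) and uses the standard library's `tokenize.detect_encoding`; the helper name `source_encoding` and the sample bytes are illustrative only.

```python
import io
import tokenize


def source_encoding(raw):
    """Return the encoding Python's tokenizer would pick for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding


# The first two lines of the script from the thread, encoded as UTF-8.
body = u'# -*- coding: utf-8 -*-\r\nfilename = u"Ataris Aqu\u00e1ticos #2.txt"\r\n'.encode("utf-8")

with_bom = b"\xef\xbb\xbf" + body   # what Notepad writes for "UTF-8"
without_bom = body                  # what a BOM-less editor writes

print(source_encoding(with_bom))
print(source_encoding(without_bom))
```

Both variants are accepted; the BOM only changes the reported name from `utf-8` to `utf-8-sig`.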


        • robson.cozendey.rj@gmail.com

          #5
          Re: Unicode formatting for Strings

          I saved it in UTF-8 with Notepad. I was thinking here... Could it be a
          limitation of the file.open() method? Has anyone tested that?


          • John Machin

            #6
            Re: Unicode formatting for Strings

            On Feb 6, 8:05 am, robson.cozendey ...@gmail.com wrote:
            > I saved it in UTF-8 with Notepad.

            Please consider that you might possibly be mistaken.

            Here are dumps of 4 varieties of file:

            >>> for i in range(4):
            ...     print '\nFile %d:\n%r' % (i, open('robson' + str(i) + '.py', 'rb').read())
            ...

            File 0:
            '\xef\xbb\xbf# -*- coding: utf-8 -*-\r\nfilename = u"Ataris Aqu\xc3\xa1ticos #2.txt"\r\nf = open(filename, \'w\')'

            File 1:
            '# -*- coding: utf-8 -*-\r\nfilename = u"Ataris Aqu\xc3\xa1ticos #2.txt"\r\nf = open(filename, \'w\')'

            File 2:
            '# -*- coding: cp1252 -*-\r\nfilename = u"Ataris Aqu\xe1ticos #2.txt"\r\nf = open(filename, \'w\')'

            File 3:
            '\xff\xfe#\x00 \x00-\x00*\x00-\x00 \x00c\x00o\x00d\x00i\x00n\x00g\x00:\x00 \x00u\x00t\x00f\x00-\x008\x00 \x00-\x00*\x00-\x00\r\x00\n\x00f\x00i\x00l\x00e\x00n\x00a\x00m\x00e\x00 \x00=\x00 \x00u\x00"\x00A\x00t\x00a\x00r\x00i\x00s\x00
            [snip]

            File 0 was saved in UTF-8 with Notepad. Notepad puts a "UTF-8 BOM" at
            the front of the file. It works (that is, it creates a file with the a-
            acute character in its name). There is no \xff character in line 1 for
            Python to complain about.

            File 1 was saved in UTF-8 with another editor. No BOM, no problem.
            Works.

            File 2 (which specifies cp1252 encoding (my default, and probably
            yours too)) was saved normally (i.e. without the stuffing about
            necessary to get UTF-8). Works.

            File 3 was saved in "Unicode" (really utf_16_le) using Notepad. As you
            can see, it has a UTF-16-LE BOM (which contains \xff) at the start.
            Python is not amused, giving exactly the same error message as you
            reported.
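The leading bytes of the four dumps can be reproduced without any editor. This is a sketch under a few assumptions: it targets Python 3 (the session above is Python 2), it only builds the first two lines of the script, and `script_text` and the `file0` to `file3` names are invented for the illustration.

```python
import codecs


def script_text(cookie):
    """Return the first two lines of the script with the given coding cookie."""
    return u'# -*- coding: %s -*-\r\nfilename = u"Ataris Aqu\u00e1ticos #2.txt"\r\n' % cookie


# File 0: Notepad "UTF-8" = UTF-8 BOM followed by UTF-8 bytes.
file0 = codecs.BOM_UTF8 + script_text("utf-8").encode("utf-8")
# File 1: UTF-8 without a BOM.
file1 = script_text("utf-8").encode("utf-8")
# File 2: cp1252, where a-acute is the single byte \xe1.
file2 = script_text("cp1252").encode("cp1252")
# File 3: Notepad "Unicode" = UTF-16-LE BOM followed by UTF-16-LE bytes.
file3 = codecs.BOM_UTF16_LE + script_text("utf-8").encode("utf-16-le")

for i, raw in enumerate([file0, file1, file2, file3]):
    print("File %d starts with %r" % (i, raw[:4]))
```

Only the UTF-16 variant begins with `\xff`, which matches the byte named in the reported SyntaxError.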

            So:

            (1) If you still believe that you are getting a problem with a file
            saved as UTF-8, please present reproducible, credible evidence: for
            example, a copy/paste of what happens when you (a) dump the file,
            immediately followed by (b) running the file with Python.

            (2) Consider using your "native" encoding (e.g. cp1252) with your
            normal/usual editor/IDE.
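If the file did end up in Notepad's "Unicode" format, re-saving it from an editor is the simplest fix, but it can also be converted with a few lines of Python. The following is a sketch only, assuming Python 3 and a file that starts with a UTF-16 BOM; the function name `reencode_utf16_to_utf8` and the output file name are hypothetical.

```python
import os
import tempfile


def reencode_utf16_to_utf8(src_path, dst_path):
    """Rewrite a UTF-16 file (with BOM) as UTF-8 without a BOM."""
    with open(src_path, "rb") as src:
        raw = src.read()
    # The generic "utf-16" codec reads the BOM to pick the byte order and drops it.
    text = raw.decode("utf-16")
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))


# Demonstration on a throwaway file that mimics the start of File 3.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "robson3.py")
dst_path = os.path.join(workdir, "robson3_utf8.py")
with open(src_path, "wb") as f:
    f.write(b"\xff\xfe" + u"# -*- coding: utf-8 -*-\r\n".encode("utf-16-le"))

reencode_utf16_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
print(repr(converted))
```

Working in binary mode keeps the original `\r\n` line endings untouched, so only the encoding changes.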
            > I was thinking here... Could it be a
            > limitation of the file.open() method?

            No, it can't.

            > Has anyone tested that?

            Unlikely.

            HTH,
            John
