Writing UTF-8 string to UNICODE file

  • Michael Weir

    Writing UTF-8 string to UNICODE file

    I'm sure this is a very simple thing to do, once you know how to do it, but
    I am having no fun at all trying to write utf-8 strings to a unicode file.
    Does anyone have a couple of lines of code that
    - opens a file appropriately for output
    - writes to this file
    Thanks very much.
    Michael Weir


  • Peter Hansen

    #2
    Re: Writing UTF-8 string to UNICODE file

    Michael Weir wrote:
    >
    > I'm sure this is a very simple thing to do, once you know how to do it, but
    > I am having no fun at all trying to write utf-8 strings to a unicode file.
    > Does anyone have a couple of lines of code that
    > - opens a file appropriately for output
    > - writes to this file

    I can't give you an example, never having done this, but if you would post
    a few lines of your own code which you thought would work, someone can probably
    point out the error of your ways more easily than writing something from
    scratch. (Of course, we'll shortly see a complete working solution from
    someone anyway, but in general this is the better way to proceed with such
    a problem.)

    -Peter


    • Alan Kennedy

      #3
      Re: Writing UTF-8 string to UNICODE file

      Michael Weir wrote:
      > Does anyone have a couple of lines of code that
      > - opens a file appropriately for output
      > - writes to this file

      Simplest way (IMHO), with python 2.3

      #-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
      import codecs

      # codecs.open returns a file object that encodes unicode to UTF-8 on write
      f = codecs.open('myunicodefile.txt', 'wt', 'utf-8')
      for i in range(5):
          for j in range(32, 300):
              f.write(unichr(j))
          f.write(u'\n')
      f.close()
      #-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

      HTH,

      --
      alan kennedy
      -----------------------------------------------------
      check http headers here: http://xhaus.com/headers
      email alan: http://xhaus.com/mailto/alan


      • Francis Avila

        #4
        Re: Writing UTF-8 string to UNICODE file

        "Michael Weir" <mweir@transres .com> wrote in message
        news:4e9sb.154$s8.2312@news.on.tac.net...
        > I'm sure this is a very simple thing to do, once you know how to do it, but
        > I am having no fun at all trying to write utf-8 strings to a unicode file.
        > Does anyone have a couple of lines of code that
        > - opens a file appropriately for output
        > - writes to this file
        > Thanks very much.
        > Michael Weir

        I don't quite understand, since you seem to be talking about "unicode" as if
        it were a distinct encoding. Unicode is not an encoding, but a mapping of
        numbers to meaningful symbolic representations (letters, numbers, whatever).
        There's no such thing as a "unicode file", strictly speaking, because a file
        is a byte stream and unicode has nothing to do with bytes. Of course,
        loosely speaking, "unicode file" means "a file which uses one of those
        byte-stream encodings by which any arbitrary subset of unicode code points
        can be represented."
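
To make the distinction concrete, here is a minimal sketch (not from the original post) that serializes one unicode string under three encodings. The sample text and variable names are made up for illustration, and the code assumes a modern Python (2.6+ or 3.3+) rather than the 2.3 discussed in this thread.

```python
# One sequence of code points: c, a, f, e-acute (U+00E9).
text = u"caf\u00e9"

# Three different serializations of those same four code points.
as_utf8 = text.encode("utf-8")        # e-acute becomes two bytes
as_utf16 = text.encode("utf-16-le")   # every code point here becomes two bytes
as_latin1 = text.encode("latin-1")    # every code point here becomes one byte

# The byte streams differ in content and length.
lengths = (len(text), len(as_utf8), len(as_utf16), len(as_latin1))

# Decoding with the matching codec recovers the original code points.
assert as_utf8.decode("utf-8") == text
assert as_utf16.decode("utf-16-le") == text
assert as_latin1.decode("latin-1") == text
```

None of the three byte streams is "the unicode" version; each is just one way of putting the same code points on disk.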

        If you mean, "how do I encode a unicode string as utf-8", do like this:
        >>> u"I'm a unicode string in utf-8 encoding.".encode('utf-8')
        "I'm a unicode string in utf-8 encoding."

        This serializes an ordered collection of unicode code points into a byte
        stream, using the encoding method "utf-8". You want to write this byte
        stream to a file? Go right ahead.
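
As a rough sketch of that last step (again not from the original post), the encoded bytes can be written to a file opened in binary mode. The temporary path and sample text are hypothetical, and the `with` statement assumes Python 2.6+ or 3.x.

```python
import os
import tempfile

text = u"na\u00efve caf\u00e9"   # unicode string with two non-ASCII code points
data = text.encode("utf-8")      # serialize it to a UTF-8 byte stream

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "example-utf8.txt")

# Binary mode: the bytes reach the disk exactly as encoded.
with open(path, "wb") as f:
    f.write(data)

# Reading the raw bytes back and decoding them recovers the text.
with open(path, "rb") as f:
    raw = f.read()

assert raw == data
assert raw.decode("utf-8") == text
```

The explicit encode step is the whole trick here; the file itself knows nothing about unicode.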

        If you write a unicode string to something that wants a byte stream, I think
        Python's internal representation of the unicode string object will get
        serialized. (I'm not really sure what would happen, but it probably won't be
        utf-8.) I doubt this is what you want. You have to encode the unicode
        string first.

        To avoid having to do explicit conversions for every unicode string you want
        to write to a file, use codecs.open to open the file. This will wrap all
        reads/writes in an encoder/decoder, and all reads will give you a unicode
        string. However, I don't think you'll be able to write raw byte streams
        anymore--even normal strings will be reencoded. Also, be sure not to
        accidentally open the file using file() later--you'll be reading and writing
        raw byte streams, and will make a big mess of things.
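
A small round-trip sketch of the codecs.open behaviour described above, contrasted with a plain binary open of the same file. The path and sample text are hypothetical, and the code assumes Python 2.6+ or 3.x; on recent Python 3 releases the built-in open() with an encoding argument is the more usual way to get the same effect.

```python
import codecs
import os
import tempfile

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "example-codecs.txt")
text = u"gr\u00fc\u00df dich\n"   # two non-ASCII code points

# codecs.open encodes on write, so unicode strings go in directly.
with codecs.open(path, "w", "utf-8") as f:
    f.write(text)

# Reading through codecs.open decodes on the way in.
with codecs.open(path, "r", "utf-8") as f:
    decoded = f.read()

# A plain binary open bypasses the codec and returns raw UTF-8 bytes.
with open(path, "rb") as f:
    raw = f.read()

assert decoded == text
assert raw == text.encode("utf-8")
```

Mixing the two styles on one file is where the mess comes from: the second reader sees bytes, not characters, so lengths and offsets no longer agree.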

        Perhaps Python should have all "strings" be unicode strings, and make a
        distinct "byte stream" type? This might make the "codepoint v.
        representation" distinction cleaner and more explicit, and allow us to go
        raw if we really want (although, mixing text and binary in a single file
        isn't such a good idea). It'd also be incredibly messy to change things,
        and less efficient if all you do is ascii text all day. Oh well.
        --
        Francis Avila

