How to emit UTF-8 from console mode?

This topic is closed.
  • Siegfried Heintze

    How to emit UTF-8 from console mode?

    The following Perl program works when I run it from a urxvt-X console
    under Cygwin/X on Windows:

    LC_CTYPE=en_US.UTF-8 urxvt-X.exe&
    perl -wle "binmode STDOUT, q[:utf8]; print chr() for 0x410 .. 0x430;"

    This little one-liner prints the Russian alphabet in Cyrillic. With some
    slight modification it will also print a lot of other alphabets,
    including Hebrew, Chinese and Japanese.

    It does not work with cmd.exe because apparently cmd.exe cannot deal with
    UTF-8.

    Can someone help me translate it into Python? I would not expect it to work
    from cmd.exe with Python, but I am hopeful it will work with urxvt-X!

    Thanks,
    Siegfried


  • Martin v. Löwis

    #2
    Re: How to emit UTF-8 from console mode?

    >LC_CTYPE=en_US.UTF-8 urxvt-X.exe&
    >perl -wle "binmode STDOUT, q[:utf8]; print chr() for 0x410 .. 0x430;"
    >Can someone help me translate it into python?

    LC_CTYPE=en_US.UTF-8 urxvt-X.exe&
    python -c 'for i in range(0x410, 0x431):print unichr(i),'

    >I would not expect it to work
    >from cmd.exe with python

    It should work in cmd.exe, as long as the terminal's encoding supports
    these characters in the first place. Use chcp.exe to find out what the
    terminal's encoding is. The Python program is not completely equivalent,
    as it leaves the output encoding to Python, rather than assuming a fixed
    UTF-8 output encoding.

    Regards,
    Martin
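
For readers on Python 3 (an assumption; the thread itself uses Python 2, where `unichr` and the `print` statement exist), the distinction drawn above between letting Python pick the output encoding and forcing UTF-8 might be sketched like this. The helper names `cyrillic_run` and `utf8_line` are hypothetical.

```python
import sys

def cyrillic_run(start=0x410, stop=0x430):
    """Return the characters start..stop (inclusive) as one string."""
    return "".join(chr(cp) for cp in range(start, stop + 1))

def utf8_line(text):
    """Encode text plus a newline as UTF-8 bytes, regardless of locale."""
    return (text + "\n").encode("utf-8")

def emit_utf8(text):
    """Write UTF-8 bytes straight to stdout, bypassing its text encoding."""
    sys.stdout.buffer.write(utf8_line(text))
    sys.stdout.buffer.flush()

# Usage (not executed here):
#   print(cyrillic_run())      # encoding chosen by Python from the environment
#   emit_utf8(cyrillic_run())  # fixed UTF-8, like Perl's binmode STDOUT, ':utf8'
```

Whether the terminal then renders those bytes correctly still depends on the terminal's own encoding and font.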


    • Mark Tolonen

      #3
      Re: How to emit UTF-8 from console mode?


      "Martin v. Löwis" <martin@v.loewis.de> wrote in message
      news:48e31066$0$2548$9b622d9e@news.freenet.de...
      >>LC_CTYPE=en_US.UTF-8 urxvt-X.exe&
      >>perl -wle "binmode STDOUT, q[:utf8]; print chr() for 0x410 .. 0x430;"
      >>
      >>Can someone help me translate it into python?
      >
      >LC_CTYPE=en_US.UTF-8 urxvt-X.exe&
      >python -c 'for i in range(0x410, 0x431):print unichr(i),'
      >
      >>I would not expect it to work
      >>from cmd.exe with python
      >
      >It should work in cmd.exe, as long as the terminal's encoding supports
      >these characters in the first place. Use chcp.exe to find out what the
      >terminal's encoding is. The Python program is not completely equivalent,
      >as it leaves the output encoding to Python, rather than assuming a fixed
      >UTF-8 output encoding.
      >
      >Regards,
      >Martin

      Make sure you are using the Lucida Console font for the cmd.exe window and
      type the commands:

      chcp 1251
      python -c "print ''.join(unichr(i) for i in range(0x410,0x431))"

      Output:

      АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯа

      UTF-8 encoding (chcp 65001) doesn't work (Python doesn't recognize it:
      "LookupError: unknown encoding: cp65001"), and I couldn't get any Chinese
      code pages to work either. There is some trick I don't know, because
      Chinese versions of Windows can display Chinese. I have the East Asian
      languages installed and the Chinese IME enabled, but it doesn't help for
      console apps.

      --Mark
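
Two separate things can go wrong in the report above: Python may not know a codec by the name the console reports, and a codec that does exist may not cover the characters being printed. A minimal Python 3 sketch of both checks follows (Python 3 is an assumption; `codec_exists` and `can_encode` are hypothetical helper names).

```python
import codecs

def codec_exists(name):
    """True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

def can_encode(text, encoding):
    """True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Cyrillic fits in cp1251; Chinese does not, but it does fit in gbk.
print(can_encode("\u0410\u0411\u0412", "cp1251"))
print(can_encode("\u4e2d\u6587", "cp1251"))
print(can_encode("\u4e2d\u6587", "gbk"))
```

This only tests what Python can encode; whether the console displays the result is a separate question of code page and font.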


      • Siegfried Heintze

        #4
        Re: How to emit UTF-8 from console mode?

        >Make sure you are using the Lucida Console font for the cmd.exe window and
        >type the commands:
        >
        >chcp 1251
        >python -c "print ''.join(unichr(i) for i in range(0x410,0x431))"
        >
        >Output:
        >
        >?????????????? ??????????????? ????
        >
        Wow! I was not aware of that chcp command! Thanks! How could I do that
        "chcp 1251" programmatically?

        The code was a little confusing because those two apostrophes look like a
        double quote!

        But what are we doing here? Can you convince me that we are emitting UTF-8?
        I need UTF-8 because I need to experiment with some OS function calls that
        give me UTF-16 and I need to emit UTF-16 or UTF-8.

        I think part of the problem is that Lucida Console is not as capable as
        "Arial Unicode MS" or the fonts used by urxvt-X.

        Thanks,
        Siegfried



        • Lie Ryan

          #5
          Re: How to emit UTF-8 from console mode?

          On Wed, 01 Oct 2008 08:17:15 -0700, Siegfried Heintze wrote:

          (snip)
          >The code was a little confusing because those two apostrophes look like
          >a double quote!

          Tip: use a monospaced font; then there is no ambiguity.

          (snip)
          >I think part of the problem is that Lucida Console is not as capable as
          >"Arial Unicode MS" or the fonts used by urxvt-X.
          >
          >Thanks,
          >Siegfried

          Why don't you write it to a file? Then open that file in Notepad.


          • Martin v. Löwis

            #6
            Re: How to emit UTF-8 from console mode?

            >But what are we doing here? Can you convince me that we are emitting UTF-8?

            Most definitely not. We are emitting cp1251.

            >I need UTF-8 because I need to experiment with some OS function calls that
            >give me UTF-16 and I need to emit UTF-16 or UTF-8.

            Try setting the code page to 65001, and emit the UTF-8 explicitly.

            Regards,
            Martin
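
One way to see for yourself which encoding is being emitted is to look at the raw bytes. The Python 3 sketch below (Python 3 is an assumption; `byte_view` is a hypothetical helper) encodes the first letter of the range, U+0410, in the three encodings discussed in this thread.

```python
def byte_view(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(format(b, "02x") for b in text.encode(encoding))

letter = "\u0410"  # CYRILLIC CAPITAL LETTER A

for enc in ("cp1251", "utf-8", "utf-16-le"):
    print(enc, byte_view(letter, enc))
# cp1251 c0
# utf-8 d0 90
# utf-16-le 10 04
```

A single byte per letter means a legacy code page such as cp1251; two bytes starting with d0 or d1 for this range means UTF-8.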


            • Ross Ridge

              #7
              Re: How to emit UTF-8 from console mode?

              >>I need UTF-8 because I need to experiment with some OS function calls that
              >>give me UTF-16 and I need to emit UTF-16 or UTF-8.

              <martin@v.loewis.de> wrote:
              >Try setting the code page to 65001, and emit the UTF-8 explicitly.

              Hmm... apparently that's not allowed on Windows XP:

              C:\>chcp 65001
              Active code page: 65001

              C:\>python -c "for i in range(0x410, 0x430): print unichr(i).encode('utf-8')"
              Traceback (most recent call last):
                File "<string>", line 1, in <module>
              IOError: [Errno 13] Permission denied

              This works though:

              C:\>python -c "for i in range(0x410, 0x430): print unichr(i).encode('utf-8')" > x

              C:\>type x
              [a bunch of Cyrillic letters]

              Hmm... "more x" doesn't work, while "copy x con" works but gives an error.
              Looks like Windows XP's support for UTF-8 console output is a bit half-assed.

              Ross Ridge

              --
              l/ // Ross Ridge -- The Great HTMU
              [oo][oo] rridge@csclub.uwaterloo.ca
              -()-/()/ http://www.csclub.uwaterloo.ca/~rridge/
              db //


              • Mark Tolonen

                #8
                Re: How to emit UTF-8 from console mode?


                "Ross Ridge" <rridge@csclub.uwaterloo.ca> wrote in message
                news:gc10lj$jqe$1@rumours.uwaterloo.ca...
                >>>I need UTF-8 because I need to experiment with some OS function calls
                >>>that give me UTF-16 and I need to emit UTF-16 or UTF-8.
                >
                ><martin@v.loewis.de> wrote:
                >>Try setting the code page to 65001, and emit the UTF-8 explicitly.
                >
                >Hmm... apparently that's not allowed on Windows XP:
                >
                >C:\>chcp 65001
                >Active code page: 65001
                >
                >C:\>python -c "for i in range(0x410, 0x430): print unichr(i).encode('utf-8')"
                >Traceback (most recent call last):
                >  File "<string>", line 1, in <module>
                >IOError: [Errno 13] Permission denied
                >
                >This works though:
                >
                >C:\>python -c "for i in range(0x410, 0x430): print unichr(i).encode('utf-8')" > x
                >
                >C:\>type x
                >[a bunch of Cyrillic letters]
                >
                >Hmm... "more x" doesn't work, while "copy x con" works but gives an error.
                >Looks like Windows XP support UTF-8 console output is a bit half-assed.

                It is odd, though, that when the code page (and font) are correct,
                redirecting to a file and typing it works, but printing the result to the
                console does not.


                • Mark Tolonen

                  #9
                  Re: How to emit UTF-8 from console mode?


                  "Siegfried Heintze" <siegfried@heintze.com> wrote in message
                  news:vLCdnUSj27MaCX7VnZ2dnUVZ_uGdnZ2d@comcast.com...
                  >
                  >>Make sure you are using the Lucida Console font for the cmd.exe window and
                  >>type the commands:
                  >>
                  >>chcp 1251
                  >>python -c "print ''.join(unichr(i) for i in range(0x410,0x431))"
                  >>
                  >>Output:
                  >>
                  >>????????????? ??????????????? ?????
                  >
                  >Wowa! I was not aware of that chcp command! Thanks! How could I do that
                  >"chcp 1251" programmatically?
                  >
                  >The code was a little confusing because those two apostrophes look like a
                  >double quote!
                  >
                  >But what are we doing here? Can you convince me that we are emitting
                  >UTF-8? I need UTF-8 because I need to experiment with some OS function
                  >calls that give me UTF-16 and I need to emit UTF-16 or UTF-8.
                  >
                  >I think part of the problem is that Lucida Console is not as capable as
                  >"Arial Unicode MS" or the fonts used by urxvt-X.

                  In this case, it is not emitting UTF-8. It is emitting the windows-1251
                  encoding. As another poster mentioned, the Windows console gets an error
                  when attempting to write UTF-8 when the code page is 65001 (UTF-8). But you
                  can write output to a file explicitly in UTF-8 or UTF-16 and view the file
                  with Notepad. I've used this method for processing Chinese.

                  >>> import os,codecs
                  >>> data = u''.join(unichr(i) for i in range(0x410,0x431))
                  >>> codecs.open('out.txt','wt','utf-8').write(data)
                  >>> os.startfile('out.txt')

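
For anyone following along on Python 3 (an assumption; the session above is Python 2), the same write-to-a-file approach might look like the sketch below. `write_text_file` is a hypothetical helper, and the Windows-only `os.startfile` call is left out so the sketch runs anywhere.

```python
import os
import tempfile

def write_text_file(path, text, encoding="utf-8"):
    """Write text to path in the given encoding, replacing any old file."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)

# Same range as the session above: U+0410 through U+0430 inclusive.
data = "".join(chr(i) for i in range(0x410, 0x431))

# A temporary directory is used here only to keep the example self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_text_file(out_path, data)                 # UTF-8, no byte order mark
write_text_file(out_path + ".u16", data, encoding="utf-16")  # UTF-16 with BOM
```

Passing `encoding="utf-8-sig"` instead writes a byte order mark first, which some Windows editors use to detect UTF-8.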
                  P.S.

                  One way to set the code page programmatically is to use ctypes, but this
                  will only work in a Windows console:

                  >>> import ctypes
                  >>> k=ctypes.WinDLL('kernel32')
                  >>> k.SetConsoleOutputCP(1251)
                  1
                  >>> print u''.join(unichr(i) for i in range(0x410,0x430)).encode('windows-1251')
                  АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ

                  --Mark
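
A slightly more defensive version of the ctypes call above, written for Python 3 (an assumption) and guarded so it does nothing on other platforms, could look like this. `set_console_output_cp` is a hypothetical wrapper name; `SetConsoleOutputCP` is the kernel32 function used in the session above.

```python
import ctypes
import sys

def set_console_output_cp(codepage):
    """Try to set the console output code page.

    Returns True on success, False on failure or on non-Windows systems.
    """
    if sys.platform != "win32":
        return False  # SetConsoleOutputCP exists only in the Windows API
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # The API returns nonzero on success and zero on failure.
    return bool(kernel32.SetConsoleOutputCP(codepage))

if set_console_output_cp(1251):
    print("console output code page set to 1251")
```

The call changes only how the console interprets bytes; the program still has to encode its output to match.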
