Hebrew in IDLE and Eclipse (Windows)

  • iu2

    Hebrew in IDLE and Eclipse (Windows)

    Hi all,

    I'd really appreciate your help with this:

    I read data from a database containing Hebrew words.
    When the application is run from IDLE a word looks like this, for
    example:
    \xe8\xe9\xe5

    But when I run the same application from Eclipse or the Windows shell
    I get the 'e's replaced with '8's:
    \x88\x89\x85

    The IDLE way is the way I need, since using Hebrew words in the
    program text itself, as keys in a dict, for example, yields similar
    strings (with 'e's). When running from Eclipse I get a KeyError for
    this dict.

    What do I need to do to run my app like IDLE does?

    Thanks
    iu2
  • Martin v. Löwis

    #2
    Re: Hebrew in IDLE and Eclipse (Windows)

    > What do I need to do to run my app like IDLE does?

    Can you please show the fragment of your program that prints
    these strings?

    Regards,
    Martin


    • iu2

      #3
      Re: Hebrew in IDLE and Eclipse (Windows)

      On Jan 17, 6:59 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
      > > What do I need to do to run my app like IDLE does?
      >
      > Can you please show the fragment of your program that prints
      > these strings?
      >
      > Regards,
      > Martin

      Hi,
      I use pymssql to get the data from a database, just like this (this is
      from the pymssql doc):

      import pymssql

      con = pymssql.connect(host='192.168.13.122', user='sa', password='', database='tempdb')
      cur = con.cursor()
      cur.execute('select firstname, lastname from [users]')
      lines = cur.fetchall()

      print lines

      or

      print lines[0]

      'lines' is a list containing tuples of 2 values, for firstname and
      lastname. The names are Hebrew and their byte codes look different when
      I run the program from IDLE than when I run it from the Windows shell
      or Eclipse, as I described in my first post.


      Important: This doesn't happen when I read text from a file containing
      Hebrew text. In that case both IDLE and Eclipse give the same result
      (the Hebrew word itself is printed to the console).


      • Martin v. Löwis

        #4
        Re: Hebrew in IDLE and Eclipse (Windows)

        > import pymssql
        >
        > con = pymssql.connect(host='192.168.13.122', user='sa', password='', database='tempdb')
        > cur = con.cursor()
        > cur.execute('select firstname, lastname from [users]')
        > lines = cur.fetchall()
        >
        > print lines
        >
        > or
        >
        > print lines[0]
        >
        > 'lines' is a list containing tuples of 2 values, for firstname and
        > lastname. The names are Hebrew and their code looks different when I'm
        > running it from IDLE than when running it from Windows shell or
        > Eclipse, as I described in my first post.

        Ok. Please understand that there are different ways to represent
        characters as bytes; these different ways are called "encodings".

        Please also understand that you have to make a choice of encoding
        every time you represent characters as bytes: if you read it from a
        database, and if you print it to a file or to the terminal.

        Please further understand that interpreting bytes in an encoding
        different from the one they were meant for results in a phenomenon
        called "mojibake" (Japanese for "character transformation"). You get
        some text, but it makes no sense (or individual characters are incorrect).
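A minimal sketch of that effect, reusing the byte values from the first post. It is written for Python 3, where byte strings are `bytes` literals (the thread itself uses Python 2), and the choice of cp1252 as the "wrong" code page is only an illustrative assumption:

```python
# Python 3 sketch: the same three bytes read under two different code pages.
raw = b"\xe8\xe9\xe5"  # byte values taken from the first post

as_hebrew = raw.decode("cp1255")   # Windows Hebrew code page
as_western = raw.decode("cp1252")  # Windows Western European code page (wrong here)

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(as_hebrew))   # '\u05d8\u05d9\u05d5'  (three Hebrew letters)
print(ascii(as_western))  # '\xe8\xe9\xe5'        (accented Latin letters)
```

The bytes never change; only the table used to interpret them does, which is why the second result is readable text that means nothing.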

        So you need to find out
        a) what the encoding is that your data have in the database
        b) what the encoding is that is used when printing in IDLE
        c) what the encoding is that is used when printing into
        a terminal window.

        b) and c) are different on Windows; the b) encoding is called
        the "ANSI code page", and c) is called the "OEM code page".
        What the specific choice is depends on your specific Windows
        version and local system settings.
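One way to see which encodings are in play on a given machine is to ask Python and, on Windows, the Win32 API. This is a sketch for Python 3; `report_encodings` is a hypothetical helper name, and the Windows branch assumes `ctypes` can reach `kernel32` (its `GetACP` and `GetOEMCP` calls return the numeric ANSI and OEM code page IDs):

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python sees in the current process."""
    info = {
        # Encoding of the attached output stream (console, pipe, or IDLE shell).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale default; on Windows this normally reflects the ANSI code page.
        "preferred": locale.getpreferredencoding(False),
    }
    if sys.platform == "win32":
        import ctypes
        # Numeric code page identifiers, e.g. 1255 for Windows Hebrew.
        info["ansi_code_page"] = ctypes.windll.kernel32.GetACP()
        info["oem_code_page"] = ctypes.windll.kernel32.GetOEMCP()
    return info


for name, value in sorted(report_encodings().items()):
    print(name, value)
```

Running it once from IDLE and once from a console window should show whether the two environments disagree.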

        As for a): that's a choice somebody made when the database
        was created; I don't know how to figure out what encoding
        the database uses.

        In principle, rather than doing

        print lines[0]

        you should do

        print lines[0].decode("<a-encoding>").encode("<c-encoding>")

        when printing to the console. Fortunately, you can also write
        this as

        print lines[0].decode("<a-encoding>")

        as Python will figure out the console encoding by itself, but
        it can't figure out the database encoding (or at least doesn't,
        the way you use the database).
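The decode step can be sketched as follows, together with the dictionary lookup from the first post. This is Python 3, not the Python 2 of the thread; `DB_ENCODING`, `decode_row`, the sample row, and the `greetings` dict are all hypothetical, and the encoding value is an assumption that has to be confirmed against the real database:

```python
# Python 3 sketch. DB_ENCODING is an assumption to be checked per database.
DB_ENCODING = "cp1255"

# Hypothetical row shaped like the (firstname, lastname) tuples from fetchall().
row = (b"\xe8\xe9\xe5", b"\xe8\xe9\xe5")


def decode_row(row, encoding=DB_ENCODING):
    """Decode every byte-string field of a row into text; leave the rest alone."""
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Keys are written as Unicode text, so they do not depend on any code page.
greetings = {"\u05d8\u05d9\u05d5": "found"}

first, last = decode_row(row)
print(greetings[first])  # found
```

Decoding once, right after the fetch, means the rest of the program compares text with text and the KeyError cannot come from a code page mismatch.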

        Regards,
        Martin


        • iu2

          #5
          Re: Hebrew in IDLE and Eclipse (Windows)

          On Jan 17, 10:35 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
          > ...
          > print lines[0].decode("<a-encoding>").encode("<c-encoding>")
          > ...
          > Regards,
          > Martin

          Ok, I've got the solution, but I still have a question.

          Recall:
          When I read data using sql I got a sequence like this:
          \x88\x89\x85
          But when I entered Hebrew words directly in the print statement (or as
          a dictionary key)
          I got this:
          \xe8\xe9\xe5

          Now, scanning the encodings module I discovered that cp1255 maps
          '\u05d9' to \xe9
          while cp856 maps '\u05d9' to \x89,
          so transforming \x88\x89\x85 to \xe8\xe9\xe5 is done by

          s.decode('cp856').encode('cp1255')

          ending up with the pattern you suggested.
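The two mappings and the conversion can be checked directly. The sketch below is Python 3, so the byte strings are `bytes` literals rather than the Python 2 `str` values used in the thread:

```python
yod = "\u05d9"  # HEBREW LETTER YOD

print(yod.encode("cp1255"))  # b'\xe9'
print(yod.encode("cp856"))   # b'\x89'

# Bytes as they arrived from the database when run from the console.
from_db = b"\x88\x89\x85"
converted = from_db.decode("cp856").encode("cp1255")
print(converted)  # b'\xe8\xe9\xe5'
```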

          My question is, is there a way I can deduce cp856 and cp1255 from the
          string itself? Is there a function doing it? (making the
          transformation more robust)

          I don't know how IDLE guessed cp856, but it must have done it.
          (perhaps because it uses tcl, and maybe tcl guesses the encoding
          automatically?)
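Bytes carry no label saying which encoding produced them, so any answer can only be a guess. A rough heuristic, sketched here in Python 3, is to try each candidate and keep those under which every character decodes to a Hebrew letter. `CANDIDATES`, `is_hebrew_text`, and `plausible_encodings` are hypothetical names, and the candidate list is assumed to hold only the two code pages named in this thread:

```python
CANDIDATES = ("cp1255", "cp856")  # assumption: only these two are in play


def is_hebrew_text(text):
    """True if text is non-empty and holds only Hebrew letters or spaces."""
    return bool(text) and all(
        "\u05d0" <= ch <= "\u05ea" or ch == " " for ch in text
    )


def plausible_encodings(raw, candidates=CANDIDATES):
    """Return the candidate encodings under which raw decodes to Hebrew text."""
    matches = []
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # byte value undefined in this code page
        if is_hebrew_text(text):
            matches.append(name)
    return matches


print(plausible_encodings(b"\x88\x89\x85"))  # ['cp856']
print(plausible_encodings(b"\xe8\xe9\xe5"))  # ['cp1255']
```

The check breaks down for pure ASCII data, for mixed Hebrew and Latin text, and for code pages that place the Hebrew letters at the same byte values, so it suits a sanity check better than a replacement for knowing the database encoding.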

          thanks
          iu2




          • iu2

            #6
            Re: Hebrew in IDLE and Eclipse (Windows)

            On Jan 23, 11:17 am, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
            > If you are claiming that the program
            >
            > Apparently, they do the OEMtoANSI conversion when you run a console
            > application (i.e. python.exe), whereas they don't convert when running
            > a GUI application (pythonw.exe).
            >
            > I'm not quite sure how they find out whether the program is a console
            > application or not; the easiest thing to do might be to turn the
            > autoconversion off on the server.
            >
            > Regards,
            > Martin

            True! It's amazing. I've just written a little script that reads from
            the database and writes the data to a file.
            Then I ran it with both python.exe and pythonw.exe and got the
            two kinds of results - the IDLE one and the Eclipse one!
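A test of that kind might look like the sketch below (Python 3). `dump_raw`, the output file name, and the sample `rows` are hypothetical; in the real experiment `rows` would come from `cur.fetchall()`:

```python
import os
import tempfile


def dump_raw(rows, path):
    """Write the escaped form of each field, one row per line, as pure ASCII."""
    with open(path, "w", encoding="ascii") as out:
        for row in rows:
            # ascii() escapes every non-ASCII byte or character.
            out.write(" ".join(ascii(field) for field in row) + "\n")


# Hypothetical stand-in for the rows returned by the database driver.
rows = [(b"\x88\x89\x85", b"\x88\x89\x85")]
path = os.path.join(tempfile.gettempdir(), "rows_dump.txt")
dump_raw(rows, path)
```

Because the file holds only escaped byte values, the dumps from a python.exe run and a pythonw.exe run can be compared with any diff tool, with no console encoding involved.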
