charCodeAt()

  • Ron Lange

    charCodeAt()

    Hi, sometimes I get weird codes from String.charCodeAt() when I am converting
    binary data (ISO-8859-1 written by a Java app) into numeric values. For
    example, instead of 164 (the generic currency symbol '¤') I get 8364. I
    have already tested it with the url-statement javascript:("¤").charCodeAt();
    in my browser, and the result is 164. Some other values exceed the maximum
    of 255 for the first Latin-1 conform unicode set by only a bit, but I am
    surprised that they do so at all.
    Regards
    Ron
  • Martin Honnen

    #2
    Re: charCodeAt()



    Ron Lange wrote:

    > sometimes I get weird codes from String.charCodeAt() when I am
    > converting binary data (ISO-8859-1 written by a Java app) into numeric
    > values. For example, instead of 164 (the generic currency symbol '¤')
    > I get 8364. I have already tested it with the url-statement
    > javascript:("¤").charCodeAt(); in my browser, and the result is 164. Some
    > other values exceed the maximum of 255 for the first Latin-1 conform
    > unicode set by only a bit, but I am surprised that they do so
    > at all.

    String.charCodeAt() is a method on JavaScript strings, and JavaScript
    doesn't know binary data, so I am not sure what you are doing. I think
    you need to provide some more information on what the Java app is doing
    and how it interacts with JavaScript (in the browser? Or are you using
    some JavaScript implementation like Rhino in a Java app?).
    Strings in JavaScript have been Unicode encoded since Netscape 4.06 and IE4, thus
    "¤".charCodeAt()
    should indeed yield
    164
    while you should see
    8364
    as the result of
    "€".charCodeAt()

    So if you want help with a problem, you need to provide more
    information: if it is JavaScript in a browser, tell us the browser
    version you are using and how it interacts with that Java app.
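
    A minimal sketch of the two results described above. The characters are
    written as \u escapes so that the outcome does not depend on how the
    source file itself is encoded, and the explicit index 0 is an addition
    here (calling charCodeAt() without an argument reads index 0 as well).

    ```javascript
    // U+00A4 is the generic currency sign, U+20AC is the euro sign.
    const currencySign = "\u00A4";
    const euroSign = "\u20AC";

    // charCodeAt returns the UTF-16 code unit at the given index.
    const currencyCode = currencySign.charCodeAt(0);
    const euroCode = euroSign.charCodeAt(0);

    console.log(currencyCode, euroCode); // 164 8364
    ```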

    --

    Martin Honnen



    • Ron Lange

      #3
      Re: charCodeAt()

      Hi Martin,
      the application writes numeric values as ISO-8859-1 encoded characters
      (called binary data from here on ;-). These characters are concatenated into
      one string variable within a JavaScript in a simple HTML page, and the script
      then converts the string back into numeric values.

      And, to prevent further detail questions ;-), the quote, newline and
      backslash characters are of course written as octal escape sequences.

      In my little computer scientist's brain I assumed that an
      ISO-8859-1 encoded string would be treated as an ISO-8859-1 string, even if
      the charCodeAt() method is defined on Unicode or whatever, since the
      ISO-8859-1 characters should be the lowest 8-bit set. To be safe, I
      specified the document's charset in the HTML page, too.
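
      The round trip described here can be sketched as follows. The helper
      names toAsciiLiteral and decodeValues are hypothetical, and the sketch
      swaps the octal escapes mentioned above for \uXXXX escapes, on the
      assumption that an ASCII-only literal is acceptable: octal escapes are
      rejected in strict-mode code, and ASCII characters are the same in every
      ISO-8859 charset.

      ```javascript
      // Hypothetical helper: what the generating application could emit.
      // Every byte value becomes a \u00xx escape, so the literal contains
      // only ASCII characters.
      function toAsciiLiteral(values) {
        let body = "";
        for (const v of values) {
          if (!Number.isInteger(v) || v < 0 || v > 255) {
            throw new RangeError("expected a byte value, got " + v);
          }
          body += "\\u" + v.toString(16).padStart(4, "0");
        }
        return '"' + body + '"';
      }

      // Hypothetical helper: what the script in the page does with the string.
      function decodeValues(str) {
        const values = [];
        for (let i = 0; i < str.length; i++) {
          values.push(str.charCodeAt(i));
        }
        return values;
      }

      const original = [0, 10, 34, 92, 164, 255];
      const literal = toAsciiLiteral(original);
      // JSON.parse stands in for the browser parsing the string literal.
      const roundTripped = decodeValues(JSON.parse(literal));
      console.log(literal, roundTripped);
      ```

      In this sketch the values come back unchanged, including 164, because
      no byte above 127 ever appears in the page source.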

      Additional information:
      Browsers used: Opera 7.53, Opera 7.52 and Mozilla 1.4 on Solaris and Linux

      Regards
      Ron


      • Ron Lange

        #4
        Re: charCodeAt()

        I have put a Java-generated page on the net to illustrate my issue.



        • Martin Honnen

          #5
          Re: charCodeAt()



          Ron Lange wrote:

          > the application writes numeric values as ISO-8859-1 encoded
          > characters (called binary data from here on ;-). These characters are
          > concatenated into one string variable within a JavaScript in a simple
          > HTML page, and the script then converts the string back into numeric
          > values.

          So the Java application is some JSP or servlet or other thing answering
          HTTP requests of a browser and sending it an HTTP response, in this case
          an HTML page with some JavaScript section embedded, is that right?

          > And, to prevent further detail questions ;-), the quote, newline and
          > backslash characters are of course written as octal escape sequences.

          > In my little computer scientist's brain I assumed that an
          > ISO-8859-1 encoded string would be treated as an ISO-8859-1 string,
          > even if the charCodeAt() method is defined on Unicode or whatever,
          > since the ISO-8859-1 characters should be the lowest 8-bit set. To be
          > safe, I specified the document's charset in the HTML page, too.

          If it is an HTML page then the browser/user agent will decode the whole
          page using any encoding sent in the HTTP response (e.g. in the
          Content-Type: text/html; charset=ISO-8859-1
          header)
          or, if nothing is present there, by looking for a
          <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
          element in the <head> section of the HTML page.
          So whatever your Java app writes in the JavaScript section of the
          HTML page being sent should match the encoding of the whole page. Then
          I don't see any problems: the whole script should be passed on to the
          script engine in whichever encoding that prefers internally and should
          work with Unicode strings.
          What happens if you do a view-source of the page your Java app sends? If
          you look at the JavaScript string literal in the source, which
          characters show up there?
          What happens if you do a document.write of the JavaScript string
          literal? Which characters show up there?
          I suspect that you should already see something there that you do not
          expect. I have some doubts that the problem is the charCodeAt method if
          two different browsers (as stated below) give you the same result.
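
          One way to act on this suggestion is to list the positions whose
          code falls outside the Latin-1 range before doing any further
          conversion. This is only a sketch: findNonLatin1 is a hypothetical
          helper name, and the sample string is made up rather than taken
          from the page in question.

          ```javascript
          // Returns the index and code of every code unit above 255, i.e. every
          // character that lies outside the ISO-8859-1 range.
          function findNonLatin1(str) {
            const hits = [];
            for (let i = 0; i < str.length; i++) {
              const code = str.charCodeAt(i);
              if (code > 255) {
                hits.push({ index: i, code: code });
              }
            }
            return hits;
          }

          // Made-up sample: "A", currency sign, euro sign.
          const sample = "A\u00A4\u20AC";
          console.log(findNonLatin1(sample));
          ```

          For the sample, only the euro sign at index 2 (code 8364) is reported.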

          > Additional information:
          > Browsers used: Opera 7.53, Opera 7.52 and Mozilla 1.4 on Solaris and Linux

          If your app is online, post a URL and someone in the newsgroup can then
          examine the HTML/JavaScript sent by it to find out what is going wrong.

          --

          Martin Honnen



          • Ron Lange

            #6
            Re: charCodeAt()

            Hi Martin,
            no, it is not CGI. Please have a look at



            where you can find an illustration of my problem. I forgot to mention that
            different browsers obviously treat the encoding in different ways
            (although the page encoding should be preserved).

            Regards and thank you for your reply
            Ron


            • Thomas 'PointedEars' Lahn

              #7
              Re: charCodeAt()

              Ron Lange wrote:

              > Hi, sometimes I get weird codes from String.charCodeAt()

              You mean String.prototype.charCodeAt().

              > when I am converting binary data (ISO-8859-1 written by a Java app)

              You are confusing JavaScript and Java:

              <http://jibbering.com/faq/#FAQ2_2>

              > into numeric values. For example, instead of 164 (the generic
              > currency symbol '¤') I get 8364.

              I do not, with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
              Gecko/20040122 Debian/1.6-1, using a JavaScript 1.5 script
              engine and de_DE@euro as locale (where @euro is equal to
              @ISO-8859-15 AFAIK).

              The script engine you have tested with seems to try to
              work around the Euro issue, possibly due to system limitations.
              In ISO-8859-1, code point 0xA4 (164) is the currency sign. In
              ISO-8859-15, that code point is the Euro sign. In Unicode,
              however, the Euro sign is located at code point 0x20AC (8364).
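
              The relationship between the three encodings can be sketched
              with a small lookup table. ISO-8859-15 differs from ISO-8859-1
              in eight positions, and in ISO-8859-1 every byte value equals
              its Unicode code point. The name decodeByte is hypothetical, and
              whether a charset mismatch of this kind is what happened on the
              page in question is an assumption.

              ```javascript
              // The eight byte positions where ISO-8859-15 differs from ISO-8859-1,
              // mapped to the Unicode code point that ISO-8859-15 assigns to them.
              const LATIN9_OVERRIDES = {
                0xA4: 0x20AC, // euro sign (currency sign in ISO-8859-1)
                0xA6: 0x0160, // S with caron
                0xA8: 0x0161, // s with caron
                0xB4: 0x017D, // Z with caron
                0xB8: 0x017E, // z with caron
                0xBC: 0x0152, // OE ligature
                0xBD: 0x0153, // oe ligature
                0xBE: 0x0178, // Y with diaeresis
              };

              // Returns the Unicode code point for one byte under the given charset.
              // Any charset other than "ISO-8859-15" is treated as ISO-8859-1 here.
              function decodeByte(byte, charset) {
                if (charset === "ISO-8859-15" && byte in LATIN9_OVERRIDES) {
                  return LATIN9_OVERRIDES[byte];
                }
                return byte;
              }

              console.log(decodeByte(0xA4, "ISO-8859-1"), decodeByte(0xA4, "ISO-8859-15"));
              ```

              Byte 0xA4 gives 164 under ISO-8859-1 and 8364 under ISO-8859-15;
              the other seven overrides lie between 338 and 382.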

              > I have already tested it with the url-statement
              > javascript:("¤").charCodeAt();

              There is nothing like a "url-statement". What you have used is a
              proprietary "javascript:" URI designed to return a dynamic document
              generated from JavaScript expressions. By using a call operator
              within the expression, the respective method is called (and its
              return value, if it has any, creates a temporary document).

              <http://jibbering.com/faq/#FAQ4_24>

              > in my browser,

              Which is? On which OS? On which platform?

              <http://jibbering.com/faq/#FAQ2_3>

              > and the result is 164.

              Seems like your browser's script engine, too, is interpreting strings with
              charCodeAt() as ISO-8859-1 strings (where appropriate) and not as Unicode
              strings.

              But then your test case is not even Valid HTML:

              <http://validator.w3.org/check?uri=http://home.teleos-web.de/rlange1/chartest.htm&ss=1;verbose=1>

              > Some other values exceed the maximum of 255 for the first Latin-1
              > conform unicode set

              There is no "Latin-1 conform unicode set". The 256 code points of
              Latin-1 (ISO-8859-1) map one-to-one onto the first 256 Unicode code
              points, including 0xA4 (164). The correct designation for the
              corresponding Unicode blocks is "Basic Latin" and "Latin-1 Supplement".

              <http://www.unicode.org/charts/>
              <http://www.htmlhelp.com/reference/charset/iso160-191.html>

              > by only a bit, but I am surprised that they do so at all.

              See also <40F0A7CD.8010303@PointedEars.de>.


              PointedEars


              • Ron Lange

                #8
                Re: charCodeAt()

                On Sun, 25 Jul 2004 22:26:27 +0200, Thomas 'PointedEars' Lahn
                <PointedEars@web.de> wrote:

                >> when I am converting binary data (ISO-8859-1 written by a Java app)
                >
                > You are confusing JavaScript and Java:

                No, I am not confusing them, just follow the thread.

                > There is nothing like a "url-statement". What you have used is a
                > proprietary "javascript:" URI ---snip---

                Thank you very much. I wonder why someone spends so much energy
                on explaining such trivial things. But anyway. The problem couldn't be
                solved, since only the first 128 values of all ISO charsets are
                interpreted in the same way by charCodeAt(); I was a bit
                misinformed there. Finally, the capabilities of JavaScript for e.g.
                highly compressed content are quite unsatisfying.

                And: be nice and don't reply, this thread is terminated.
