Writing Unicode-16 to a text file

  • Konrad Den Ende

    Writing Unicode-16 to a text file

    I tried to write some Unicode-16 characters (that were displayed
    correctly, as expected, on the screen) to a file but it didn't work
    out very well. I have those in a char[] as well as a String. Both
    give me a number of "?".

    What am I missing?

    --

    Kindly
    Konrad
    ---------------------------------------------------
    May all spammers die an agonizing death; have no burial places;
    their souls be chased by demons in Gehenna from one room to
    another for all eternity and more.

    Sleep - thing used by ineffective people
    as a substitute for coffee

    Ambition - a poor excuse for not having
    enough sense to be lazy
    ---------------------------------------------------




  • chris

    #2
    Re: Writing Unicode-16 to a text file

    Konrad Den Ende wrote:

    > I tried to write some Unicode-16 characters (that were displayed
    > correctly, as expected, on the screen) to a file but it didn't work
    > out very well. I have those in a char[] as well as a String. Both
    > give me a number of "?".
    >
    > What am I missing?

    When you wrote the characters to a file (what method did you use?) they
    probably underwent a 16-bit to 8-bit conversion, using some encoding (what
    encoding did you specify? or what is your Java installation using as its
    default encoding?). When you looked at the file afterwards, the software
    you used to do that (what did you use?) probably wasn't set up to grok that
    encoding.

    What happens when you read the file back into Java?
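
    A minimal sketch of that 16-bit to 8-bit conversion, for anyone who wants
    to try it without touching a file. It is not from the original code: the
    class and method names are made up for illustration, the sample text is
    just three kanji written as Unicode escapes, and StandardCharsets assumes
    Java 7 or later. Characters that the chosen 8-bit encoding cannot
    represent are replaced with "?", and reading the bytes back cannot undo
    that.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class LossyRoundTrip {
        // Encode the text to bytes with the given charset, then decode it again.
        static String roundTrip(String text, Charset cs) {
            byte[] bytes = text.getBytes(cs); // unmappable chars become '?'
            return new String(bytes, cs);
        }

        public static void main(String[] args) {
            String japanese = "\u65e5\u672c\u8a9e"; // three kanji, as escapes

            // ISO-8859-1 has no Japanese characters, so all three are lost.
            System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // ???

            // UTF-8 can represent them, so the round trip is lossless.
            System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // true
        }
    }
    ```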

    Good luck,

    Chris

    --
    Chris Gray chris@kiffer.eu net.be
    /k/ Embedded Java Solutions


      • Konrad Den Ende

        #4
        Re: Writing Unicode-16 to a text file

        > When you wrote the characters to a file (what method did you use?) they
        > probably underwent a 16-bit to 8-bit conversion

        try {
            BufferedWriter writer = new BufferedWriter(new FileWriter("nihongo.txt"));
            writer.write(cc); // cc is a char[] that stores the characters
            writer.close();
        }
        catch (Exception e) { System.out.println(e.getMessage()); }

        > using some encoding (what encoding did you specify? or what is your Java
        > installation using as its default encoding?).

        I didn't specify any encoding, so I guess it's English. BUT I figured that
        since a char is nothing more than a number, my char[] variable is just an
        array of some kind of integers (2-byte, I guess, so it can hold all the 65k
        characters).
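
        A rough sketch of why the "just an array of numbers" picture does not
        carry over to the file. The class name, method name and sample
        characters are assumptions for illustration, not taken from the
        application; StandardCharsets assumes Java 7 or later. Each char is
        indeed a 16-bit number in memory, but a Writer never stores those
        numbers directly: it encodes them, and the byte count depends on the
        encoding.

        ```java
        import java.nio.charset.Charset;
        import java.nio.charset.StandardCharsets;

        public class CharsVersusBytes {
            // Number of bytes the char[] occupies once encoded with the given charset.
            static int encodedLength(char[] cc, Charset cs) {
                return new String(cc).getBytes(cs).length;
            }

            public static void main(String[] args) {
                char[] cc = {'\u65e5', '\u672c', '\u8a9e'}; // hypothetical content

                // In memory each char is a 16-bit number.
                for (char c : cc) {
                    System.out.println("U+" + Integer.toHexString(c).toUpperCase());
                }

                // On disk the size depends on the encoding the writer applies.
                System.out.println(encodedLength(cc, StandardCharsets.UTF_16BE));   // 6
                System.out.println(encodedLength(cc, StandardCharsets.UTF_8));      // 9
                System.out.println(encodedLength(cc, StandardCharsets.ISO_8859_1)); // 3, all '?'
            }
        }
        ```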

        > When you looked at the file afterwards, the software you used to do that
        > (what did you use?) probably wasn't set up to grok that encoding.

        I used MS Word and a text reader with Japanese enabled. Just to be sure, I
        took a file that I can read Japanese text from using my usual software
        and opened it in Notepad. I didn't see Japanese (oh, what a surprise),
        but I could see a number of strange characters.
        Yet the file that my application creates contains only "?"s.

        > What happens when you read the file back into Java?

        "?"'s only.

        Any hint?
        --

        Kindly
        Konrad





          • Soren Kuula

            #6
            Re: Writing Unicode-16 to a text file

            Konrad Den Ende wrote:

            >> When you wrote the characters to a file (what method did you use?) they
            >> probably underwent a 16-bit to 8-bit conversion

            > try {
            >     BufferedWriter writer = new BufferedWriter(new FileWriter("nihongo.txt"));
            >     writer.write(cc); // cc is a char[] that stores the characters
            >     writer.close();
            > }
            > catch (Exception e) { System.out.println(e.getMessage()); }

            >> using some encoding (what encoding did you specify? or what is your Java
            >> installation using as its default encoding?).

            > Any hint?

            Sure.

            You have been writing Japanese with an encoding that doesn't support
            it. I bet your default encoding, derived from your operating system
            locale (you can see it in System.getProperties() ...), is ISO-8859
            or something like that. It does not support Japanese.
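
            One way to check this on your own machine, as a sketch rather
            than anything definitive. The class and method names are
            invented for the example. It assumes Java 5 or later for
            Charset.defaultCharset(), and it reads the file.encoding system
            property, which is where JDKs of this era expose the default
            (recent JDKs default to UTF-8 regardless of locale).

            ```java
            import java.nio.charset.Charset;

            public class DefaultEncodingCheck {
                // True if the named charset can represent every character of the text.
                static boolean canEncode(String charsetName, String text) {
                    return Charset.forName(charsetName).newEncoder().canEncode(text);
                }

                public static void main(String[] args) {
                    String japanese = "\u65e5\u672c\u8a9e"; // three kanji, as escapes

                    // What FileWriter silently uses when no encoding is given.
                    System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
                    System.out.println("default charset = " + Charset.defaultCharset());

                    System.out.println(canEncode("ISO-8859-1", japanese)); // false
                    System.out.println(canEncode("UTF-8", japanese));      // true

                    // If this prints false, the default encoding cannot hold the text.
                    System.out.println(canEncode(Charset.defaultCharset().name(), japanese));
                }
            }
            ```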

            You should look at OutputStreamWriter, of which you can make an instance
            that uses an encoding that supports Japanese. You can get an idea of
            what encodings are supported by looking at the Charset class in Java
            1.4's java.nio.charset package. There is a static method there,
            availableCharsets(), that returns a map keyed by the names of the
            supported encodings.
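
            A sketch of how that lookup might go, assuming a reasonably
            modern JDK (generics and the diamond operator). The class and
            method names are made up for the example. It walks
            Charset.availableCharsets() and keeps only the encodings that
            can represent a given piece of text; the list you get depends
            on your JDK.

            ```java
            import java.nio.charset.Charset;
            import java.util.ArrayList;
            import java.util.List;

            public class ListJapaneseCharsets {
                // Names of all installed charsets that can encode every char of the text.
                static List<String> charsetsThatEncode(String text) {
                    List<String> names = new ArrayList<>();
                    for (Charset cs : Charset.availableCharsets().values()) {
                        // Some charsets are decode-only, so check before asking for an encoder.
                        if (cs.canEncode() && cs.newEncoder().canEncode(text)) {
                            names.add(cs.name());
                        }
                    }
                    return names;
                }

                public static void main(String[] args) {
                    String japanese = "\u65e5\u672c\u8a9e"; // three kanji, as escapes
                    for (String name : charsetsThatEncode(japanese)) {
                        System.out.println(name);
                    }
                }
            }
            ```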

            You may end up using ISO-2022-something, but I prefer Unicode's UTF-8,
            it's a lot nicer and cleaner, and it supports almost any language. You
            will need Unicode fonts though.
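
            Roughly what the writing code could look like with UTF-8 spelled
            out. This is a sketch under assumptions, not a drop-in
            replacement: the class and method names are invented, it uses
            try-with-resources and StandardCharsets (Java 7 or later), and
            main writes to a temporary file rather than nihongo.txt.

            ```java
            import java.io.BufferedReader;
            import java.io.BufferedWriter;
            import java.io.File;
            import java.io.FileInputStream;
            import java.io.FileOutputStream;
            import java.io.IOException;
            import java.io.InputStreamReader;
            import java.io.OutputStreamWriter;
            import java.io.Reader;
            import java.io.Writer;
            import java.nio.charset.StandardCharsets;

            public class WriteUtf8 {
                // Write the chars as UTF-8 instead of the platform default encoding.
                static void write(File file, char[] cc) throws IOException {
                    try (Writer writer = new BufferedWriter(
                            new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
                        writer.write(cc);
                    }
                }

                // Read the file back, decoding with the same encoding it was written in.
                static String read(File file) throws IOException {
                    StringBuilder text = new StringBuilder();
                    try (Reader reader = new BufferedReader(
                            new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
                        int c;
                        while ((c = reader.read()) != -1) {
                            text.append((char) c);
                        }
                    }
                    return text.toString();
                }

                public static void main(String[] args) throws IOException {
                    char[] cc = {'\u65e5', '\u672c', '\u8a9e'}; // stand-in for the real data
                    File file = File.createTempFile("nihongo", ".txt");
                    write(file, cc);
                    System.out.println(read(file).equals(new String(cc))); // true
                }
            }
            ```

            Whatever opens the file afterwards (Word, Notepad, another
            program) has to be told it is UTF-8 as well, or it will show
            strange characters instead.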

            An encoding is the mapping between bytes (sequences of 8 bits) and a
            higher level of abstraction, namely characters. Streams are
            byte-oriented, readers/writers are character-oriented, and
            encoding/decoding sits in between.
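
            To see the "in between" part, here is a small in-memory sketch.
            The names are hypothetical and it assumes Java 7 or later. The
            same single character goes into a character-oriented Writer,
            and what lands in the byte-oriented stream underneath depends
            entirely on the encoding chosen for the OutputStreamWriter.

            ```java
            import java.io.ByteArrayOutputStream;
            import java.io.IOException;
            import java.io.OutputStreamWriter;
            import java.io.Writer;
            import java.nio.charset.Charset;
            import java.nio.charset.StandardCharsets;

            public class EncodingInBetween {
                // Push text through a Writer and return the bytes the stream received, as hex.
                static String encodeToHex(String text, Charset cs) throws IOException {
                    ByteArrayOutputStream sink = new ByteArrayOutputStream(); // byte side
                    try (Writer writer = new OutputStreamWriter(sink, cs)) {  // char side
                        writer.write(text);
                    }
                    StringBuilder hex = new StringBuilder();
                    for (byte b : sink.toByteArray()) {
                        if (hex.length() > 0) {
                            hex.append(' ');
                        }
                        hex.append(String.format("%02x", b & 0xff));
                    }
                    return hex.toString();
                }

                public static void main(String[] args) throws IOException {
                    String kanji = "\u65e5"; // one character, U+65E5
                    System.out.println(encodeToHex(kanji, StandardCharsets.UTF_8));      // e6 97 a5
                    System.out.println(encodeToHex(kanji, StandardCharsets.UTF_16BE));   // 65 e5
                    System.out.println(encodeToHex(kanji, StandardCharsets.ISO_8859_1)); // 3f, i.e. '?'
                }
            }
            ```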

            Hope that helped.
            Soren
            --
            Remove the 4 letters in my mail address that were inserted to prevent s...
            Remove the 4 letter word meaning "junk mail" in my mail address.
