Reading Unicode escape sequences from File

  • John Ztwin

    Reading Unicode escape sequences from File

    Hello,

    I have a file that contains ordinary text and some special characters as
    Unicode escape sequences (\uxxxx).

    When I read the file using e.g. StreamReader, the Unicode escape sequences are
    not converted to their character representation. They are shown exactly as
    they appear in the file. Literals in C# source code are shown correctly.

    Can anyone tell me how to read Unicode escape sequences from a file so that
    they are presented like literals?

    Thanks,


  • Jon Skeet [C# MVP]

    #2
    Re: Reading Unicode escape sequences from File

    John Ztwin <John.z@mail.com> wrote:
    > I have a file that contains ordinary text and some special characters in
    > Unicode escape sequences (\uxxxx).
    >
    > When I read the file using e.g. StreamReader, Unicode escape sequences are
    > not converted to their character representation.

    No, I wouldn't expect them to be. That's done by the C# compiler - it
    would be a big mistake for it to be done by StreamReader.

    > They are shown exactly the same
    > way as in the file. Literals in C# code's variables are shown correctly.
    >
    > Can anyone tell how to read Unicode escape sequences from file so that they
    > are presented like literals?

    You basically need to parse the text you've read, just like the C#
    compiler does. You can search for \u fairly easily, then take the next
    four digits, complain if they're not all hex, convert the hex to a
    char, then replace the whole section with the character value.
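
The steps described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name EscapeDecoder and the method name DecodeUnicodeEscapes are hypothetical, and the sketch assumes that every \u in the input starts an escape (it does not treat a doubled backslash specially).

```csharp
using System;
using System.Text;

public static class EscapeDecoder
{
    // Hypothetical helper: replaces each \uXXXX in text with the UTF-16
    // code unit that the four hex digits name.
    public static string DecodeUnicodeEscapes(string text)
    {
        var result = new StringBuilder(text.Length);
        int pos = 0;
        while (pos < text.Length)
        {
            // Search for the next backslash-u pair.
            int start = text.IndexOf("\\u", pos, StringComparison.Ordinal);
            if (start < 0)
            {
                // No more escapes: copy the remainder and stop.
                result.Append(text, pos, text.Length - pos);
                break;
            }
            // Copy the ordinary text that precedes the escape.
            result.Append(text, pos, start - pos);

            // Take the next four characters and complain if they are not all hex.
            if (start + 6 > text.Length)
                throw new FormatException("Truncated \\u escape at index " + start);
            string hex = text.Substring(start + 2, 4);
            foreach (char c in hex)
            {
                if (!Uri.IsHexDigit(c))
                    throw new FormatException("Invalid \\u escape at index " + start);
            }

            // Convert the hex digits to a char and emit it in place of the escape.
            result.Append((char)Convert.ToInt32(hex, 16));
            pos = start + 6;
        }
        return result.ToString();
    }
}
```

The text read by StreamReader would be passed through this method once, after reading. Building the output in a StringBuilder keeps it to a single pass over the input.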

    --
    Jon Skeet - <skeet@pobox.com>
    Web site: http://www.pobox.com/~skeet
    Blog: http://www.msmvps.com/jon.skeet
    C# in Depth: http://csharpindepth.com


    • John Ztwin

      #3
      Re: Reading Unicode escape sequences from File

      A little more work than in Java, if I remember right.
      Thanks for the reply!


      • Jon Skeet [C# MVP]

        #4
        Re: Reading Unicode escape sequences from File

        John Ztwin <John.z@mail.com> wrote:
        > A little bit more work than in Java if I remember right,

        Well, not if you use the normal BufferedReader and InputStreamReader in
        Java.

        Java's Properties class will do the unescaping for properties files,
        but it isn't general purpose.



        • Arne Vajhøj

          #5
          Re: Reading Unicode escape sequences from File

          John Ztwin wrote:
          > I have a file that contains ordinary text and some special characters in
          > Unicode escape sequences (\uxxxx).
          >
          > When I read the file using e.g. StreamReader, Unicode escape sequences are
          > not converted to their character representation. They are shown exactly the same
          > way as in the file. Literals in C# code's variables are shown correctly.
          >
          > Can anyone tell how to read Unicode escape sequences from file so that they
          > are presented like literals?

          You will need to do a text replace.

          Example code:

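          using System.Globalization;
          using System.Text.RegularExpressions;

          // Replaces every \uXXXX escape in s with the character it encodes.
          public static string U2U(string s)
          {
              string res = s;
              // Find all escapes up front; accept upper- and lowercase hex digits.
              MatchCollection reg = Regex.Matches(res, @"\\u([0-9A-Fa-f]{4})");
              for (int i = 0; i < reg.Count; i++)
              {
                  // Parse the four hex digits and substitute the resulting char.
                  res = res.Replace(reg[i].Groups[0].Value, "" +
                      (char)int.Parse(reg[i].Groups[1].Value, NumberStyles.HexNumber));
              }
              return res;
          }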

          Arne


          • Michael Justin

            #6
            Re: Reading Unicode escape sequences from File

            John Ztwin wrote:
            > I have a file that contains ordinary text and some special characters in
            > Unicode escape sequences (\uxxxx).

            If the file always uses \u, then there is no risk in handling only that
            form. However, some standards (like the C# spec) also allow \U
            (uppercase) escape sequences, which take eight hex digits:

            unicode-escape-sequence:
                \u hex-digit hex-digit hex-digit hex-digit
                \U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit
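
A decoder that accepts both forms in the grammar above might look something like the sketch below. It is an assumption-laden illustration rather than anything posted in the thread: the class name EscapeRegexDecoder and the method name Decode are hypothetical. It uses Regex.Replace with a match evaluator so the input is processed in one pass, and char.ConvertFromUtf32 so that code points above U+FFFF come out as surrogate pairs.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeRegexDecoder
{
    // Group 1 captures the eight digits of a \U escape,
    // group 2 the four digits of a \u escape.
    private static readonly Regex Escape =
        new Regex(@"\\U([0-9A-Fa-f]{8})|\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: decodes both \uXXXX and \UXXXXXXXX escapes in text.
    public static string Decode(string text)
    {
        return Escape.Replace(text, m =>
        {
            if (m.Groups[1].Success)
            {
                int codePoint = int.Parse(m.Groups[1].Value,
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture);
                // Throws ArgumentOutOfRangeException for surrogate code points
                // and for values outside the range 0..0x10FFFF.
                return char.ConvertFromUtf32(codePoint);
            }
            // Four-digit form: a single UTF-16 code unit.
            int codeUnit = int.Parse(m.Groups[2].Value,
                NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            return ((char)codeUnit).ToString();
        });
    }
}
```

Because the replacement text is never rescanned, a decoded backslash cannot combine with following characters to form a new escape.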




            Best regards
            --
            Michael Justin
            SCJP, SCJA
            betasoft - Software for Delphi™ and for the Java™ platform
            http://www.mikejustin.com - http://www.betabeans.de
