ASCII vs Unicode

  • Jeff

    ASCII vs Unicode

    Hi -

    I'm setting up a streamreader in a VB.NET app to read a text file and
    display its contents in a multiline textbox.

    If I set it up with System.Text.Encoding.Unicode, it reads a Unicode file
    just fine. If I set it up as ASCII, it reads a non-Unicode text file. But
    I don't know the file format in advance.

    How can my app determine whether to use Unicode encoding before I read the
    file?
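One way to decide before reading any text is to peek at the first bytes of the file and look for a byte order mark (BOM). The sketch below assumes that a Unicode file starts with a BOM. The module and function names are hypothetical, and UTF-32 marks are deliberately not handled.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomSniffer
    ' Hypothetical helper: maps leading bytes to an encoding via the byte order mark.
    ' Returns Nothing when no BOM is found (UTF-32 BOMs are not handled here).
    Public Function DetectEncodingFromBom(ByVal head() As Byte) As Encoding
        If head.Length >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso head(2) = &HBF Then
            Return Encoding.UTF8                 ' EF BB BF
        End If
        If head.Length >= 2 AndAlso head(0) = &HFF AndAlso head(1) = &HFE Then
            Return Encoding.Unicode              ' FF FE: UTF-16 little-endian
        End If
        If head.Length >= 2 AndAlso head(0) = &HFE AndAlso head(1) = &HFF Then
            Return Encoding.BigEndianUnicode     ' FE FF: UTF-16 big-endian
        End If
        Return Nothing
    End Function

    ' Peeks at up to three bytes of the file without decoding any text.
    Public Function DetectFileEncoding(ByVal fileName As String) As Encoding
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Dim head(2) As Byte
            Dim n As Integer = fs.Read(head, 0, head.Length)
            Array.Resize(head, n)                ' shorter files yield fewer bytes
            Return DetectEncodingFromBom(head)
        End Using
    End Function
End Module
```

A result of Nothing only means the bytes carry no BOM; the caller still has to choose a fallback encoding in that case.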

    - Jeff


  • Peter Huang

    #2
    RE: ASCII vs Unicode

    Hi Jeff,

    Based on my test, we do not need to specify the encoding when we read a
    file into a string; the .NET Framework handles it, as long as the file
    starts with a byte order mark that StreamReader can detect.
    Imports System.IO

    Module Module1
        Private Const FILE_NAME As String = "c:\unicode.txt"
        Private Const FILE_NAME1 As String = "c:\ascii.txt"

        Public Sub Main()
            If Not File.Exists(FILE_NAME) Then
                Console.WriteLine("{0} does not exist.", FILE_NAME)
                Return
            End If

            ' File.OpenText decodes as UTF-8 unless the file starts with a BOM
            Dim sr As StreamReader = File.OpenText(FILE_NAME)
            Dim input As String
            input = sr.ReadToEnd()
            Console.WriteLine(input)
            sr.Close()

            sr = File.OpenText(FILE_NAME1)
            input = sr.ReadToEnd()
            Console.WriteLine(input)
            sr.Close()
        End Sub
    End Module
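Whether this works depends on the file itself: StreamReader only switches encodings when it finds a BOM, and otherwise decodes with the encoding it was given (UTF-8 for File.OpenText). The sketch below makes that visible by writing raw bytes to a temporary file and reading them back with BOM detection turned on. CurrentEncoding reports the encoding actually used and is only meaningful after the first read. The module and function names are illustrative.

```vbnet
Imports System.IO
Imports System.Text

Module BomDetectionDemo
    ' Writes raw bytes to a temporary file, then reads them back with BOM detection on.
    ' UTF-8 is only the fallback; `used` receives the encoding StreamReader settled on.
    Public Function ReadWithDetection(ByVal raw() As Byte, ByRef used As Encoding) As String
        Dim tempFile As String = Path.GetTempFileName()
        Try
            File.WriteAllBytes(tempFile, raw)
            Using sr As New StreamReader(tempFile, Encoding.UTF8, True)
                Dim text As String = sr.ReadToEnd()
                used = sr.CurrentEncoding   ' meaningful only after the first read
                Return text
            End Using
        Finally
            File.Delete(tempFile)
        End Try
    End Function
End Module
```

With the bytes FF FE 3D 00 the reader reports UTF-16 (code page 1200) and returns "="; with just 3D 00 it stays on UTF-8 and returns two characters, "=" and a NUL.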

    Best regards,

    Peter Huang
    Microsoft Online Partner Support

    Get Secure! - www.microsoft.com/security
    This posting is provided "AS IS" with no warranties, and confers no rights.


    • Jeff

      #3
      Re: ASCII vs Unicode

      Thanks for responding, Peter -

      I wish my results were the same as yours. I've attached a couple of files,
      ASCII.txt and Unicode.txt. My code is below, FYI.

      I've got a form with a textbox on it, and a couple of radio buttons (encode
      or not). To make it simple, I also included a couple of buttons, one for
      each file. I display the results of the function below in the textbox.

      With encoding, the Unicode file displays fine, and the ASCII file is a
      string of unreadable characters. With no encoding, the ASCII file displays
      fine, and the Unicode file only displays the first character (=).

      I've tried replacing the If-Else-End If block of the code with "rdr =
      File.OpenText(rFile)", but my results are the same as the no-encoding
      results just described.

      Note that the Unicode file that I'm trying to read is a log file created by
      Microsoft's SQL Server Desktop Engine Setup.exe.

      I'd appreciate additional help to solve this problem.

      - Jeff


      My code:

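      ' Needs Imports System.IO and Imports System.Windows.Forms at the top of the file
      Public Function ReadFile(ByVal rFile As String) As String
          Dim fi As FileInfo
          Dim rdr As StreamReader = Nothing

          ReadFile = ""

          Try
              fi = New FileInfo(rFile)
              If Not fi.Exists Then
                  MessageBox.Show("File Not Found." & ControlChars.CrLf & rFile)
                  Exit Function
              End If

              ' "Unicode" option: decode as UTF-16LE (a BOM, if any, still wins).
              ' Otherwise use the StreamReader default: UTF-8 unless a BOM says otherwise.
              If frmMain.optUnicode.Checked Then
                  rdr = New StreamReader(rFile, System.Text.Encoding.Unicode)
              Else
                  rdr = New StreamReader(rFile)
              End If

              ReadFile = rdr.ReadToEnd()

          Catch ex As Exception
              MessageBox.Show(ex.ToString())

          Finally
              ' Close the file even when ReadToEnd throws
              If Not rdr Is Nothing Then rdr.Close()
          End Try
      End Function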


      • Peter Huang

        #4
        Re: ASCII vs Unicode

        Hi Jeff,

        FF FE is the byte order mark (BOM) for Unicode (UTF-16); in that order it
        signals that the bytes are little-endian.
        If you write a string to a file with Unicode encoding, as below, you will
        find the FF FE leading bytes at the start of the file.
        ' False = overwrite, so the BOM is written at the start of a fresh file
        Dim wr As New StreamWriter("C:\testuni.txt", False, System.Text.Encoding.Unicode)
        wr.Write(input)
        wr.Close()

        If you save text as Unicode from the Save As dialog in notepad.exe, you
        will find the same FF FE at the start of the file.
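The marker bytes do not need to be memorized: every Encoding object reports the BOM it writes through GetPreamble(). The hypothetical helper below formats them as hex, so they can be compared with what a hex editor shows for a Notepad-saved file.

```vbnet
Imports System
Imports System.Text

Module PreambleDemo
    ' Hypothetical helper: formats the byte order mark an encoding writes, e.g. "FF FE".
    Public Function PreambleHex(ByVal enc As Encoding) As String
        ' GetPreamble returns an empty array for encodings without a BOM (e.g. ASCII)
        Return BitConverter.ToString(enc.GetPreamble()).Replace("-", " ")
    End Function
End Module
```

For example, PreambleHex(Encoding.Unicode) returns "FF FE", PreambleHex(Encoding.UTF8) returns "EF BB BF", and PreambleHex(Encoding.ASCII) returns an empty string, because ASCII has no BOM.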

        The difference between my Unicode file and yours is that the
        unicode.txt you provided does not start with the FF FE bytes, and that
        causes the problem. StreamReader needs that marker to decode the byte
        stream correctly; without it, it falls back to its default (UTF-8).

        Actually, if you read your unicode.txt into a string with StreamReader
        and print it with Console.WriteLine, the text is displayed correctly but
        with a space between every two characters. Each "space" is really the 00
        byte of a UTF-16 character, decoded as a separate NUL character.

        For now, I suggest adding FF FE at the very beginning of the Unicode
        file when you generate it. You can use a hex editor to inspect
        unicode.txt; UltraEdit, for example, is a good one.
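If a hex editor is not at hand, a few lines of code can show the same leading bytes. This is a rough sketch with a hypothetical HexHead function; it only dumps the first bytes and does not try to interpret them.

```vbnet
Imports System
Imports System.IO

Module HexPeek
    ' Hypothetical helper: returns up to `count` leading bytes of a file as hex,
    ' e.g. "FF FE 3D 00", as a rough stand-in for a hex editor.
    Public Function HexHead(ByVal fileName As String, ByVal count As Integer) As String
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Dim buffer(count - 1) As Byte
            Dim n As Integer = fs.Read(buffer, 0, count)
            ' BitConverter produces "FF-FE-3D-00"; use spaces like a hex editor
            Return BitConverter.ToString(buffer, 0, n).Replace("-", " ")
        End Using
    End Function
End Module
```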


        Best regards,

        Peter Huang
        Microsoft Online Partner Support



          • Jeff

            #6
            Re: ASCII vs Unicode

            Thanks, Peter -

            But, as I mentioned in my post, I am not creating the Unicode file.
            (Microsoft creates it: it is a log file written by the setup.exe for
            Microsoft SQL Server Desktop Engine.) And the Unicode file that MS
            creates, while it doesn't have FF FE at its start, is quite readable with
            StreamReader(rFile, System.Text.Encoding.Unicode).

            I'm not looking for a way to create a unicode file. I'm looking for the
            best way to display the contents of a text file in a multiline textbox, when
            I don't know in advance whether it's ASCII or unicode.

            Please help.

            - Jeff
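When the file may lack a BOM, as described here, a common heuristic is to look at where the zero bytes fall: mostly-Latin text stored as UTF-16LE has a 00 in nearly every odd-numbered byte position, while ASCII or ANSI text has almost none. The sketch below combines that guess with a BOM check. The names, the 1 KB sample size and the "more than half" threshold are arbitrary assumptions, and the heuristic will misjudge UTF-16 text that is mostly non-Latin (for example CJK), so treat it as a starting point rather than a reliable detector.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module TextFileLoader
    ' Hypothetical helper: guesses the encoding of text whose format is unknown.
    ' Order: BOM first, then a zero-byte heuristic for BOM-less UTF-16LE,
    ' otherwise the caller's fallback (for example ASCII or the ANSI code page).
    Public Function GuessEncoding(ByVal data() As Byte, ByVal fallback As Encoding) As Encoding
        If data.Length >= 2 AndAlso data(0) = &HFF AndAlso data(1) = &HFE Then Return Encoding.Unicode
        If data.Length >= 2 AndAlso data(0) = &HFE AndAlso data(1) = &HFF Then Return Encoding.BigEndianUnicode
        If data.Length >= 3 AndAlso data(0) = &HEF AndAlso data(1) = &HBB AndAlso data(2) = &HBF Then Return Encoding.UTF8

        ' No BOM: Latin text stored as UTF-16LE has 00 in nearly every odd byte.
        Dim sample As Integer = Math.Min(data.Length, 1024)   ' assumed sample size
        Dim oddBytes As Integer = 0
        Dim oddZeros As Integer = 0
        For i As Integer = 1 To sample - 1 Step 2
            oddBytes += 1
            If data(i) = 0 Then oddZeros += 1
        Next

        ' Assumed threshold: more than half of the odd bytes are zero
        If oddBytes > 0 AndAlso oddZeros * 2 > oddBytes Then Return Encoding.Unicode
        Return fallback
    End Function

    ' Decodes a byte array with the guessed encoding; StreamReader drops a BOM if present.
    Public Function DecodeText(ByVal data() As Byte, ByVal fallback As Encoding) As String
        Dim enc As Encoding = GuessEncoding(data, fallback)
        Using sr As New StreamReader(New MemoryStream(data), enc, True)
            Return sr.ReadToEnd()
        End Using
    End Function

    ' Convenience wrapper for a file on disk.
    Public Function ReadTextFile(ByVal fileName As String, ByVal fallback As Encoding) As String
        Return DecodeText(File.ReadAllBytes(fileName), fallback)
    End Function
End Module
```

In the ReadFile function from earlier in the thread, the If/Else block and the ReadToEnd call could then become something like ReadFile = ReadTextFile(rFile, System.Text.Encoding.Default), where Encoding.Default is the system ANSI code page on the .NET Framework.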



            • Peter Huang

              #7
              Re: ASCII vs Unicode

              Hi Jeff,

              Based on my test, the MSDE setup.exe tool generates the log file with
              the FF FE marker. I ran the command line below:
              setup /l*v C:\msde.log

              After that, I get the msde.log file, and if I open it in a hex editor
              I find the FF FE marker at the start.

              Without the marker, we cannot reliably identify the file's encoding.
              For example, the string
              =
              is stored as
              FF FE 3D 00

              From the FF FE, StreamReader knows the file is Unicode, and it decodes
              3D 00 as a single Unicode character.

              But if we store it as just
              3D 00
              then it can be decoded in two ways, as ASCII or as Unicode.
              As Unicode, 3D 00 is one character, "=".
              As ASCII, 3D 00 is two characters, 3D and 00, i.e. "=" and the
              character with ASCII code 00 (NUL).
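The ambiguity can be reproduced directly with the Encoding classes. The sketch below decodes the same two bytes, 3D 00, both ways; the module and function names are illustrative.

```vbnet
Imports System.Text

Module AmbiguousBytes
    ' "=" stored as UTF-16LE without a BOM: the bytes 3D 00.
    Private ReadOnly Raw As Byte() = New Byte() {&H3D, &H0}

    ' Read as Unicode (UTF-16LE): one character, "=".
    Public Function AsUnicode() As String
        Return Encoding.Unicode.GetString(Raw)
    End Function

    ' Read as ASCII: two characters, "=" and NUL (code 00).
    Public Function AsAscii() As String
        Return Encoding.ASCII.GetString(Raw)
    End Function
End Module
```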

              There may be a problem with the MSDE setup program itself. For that
              issue, I think the SQL Server newsgroups will be able to help better:
              microsoft.public.sqlserver.msde
              or
              microsoft.public.sqlserver.setup

              Best regards,

              Peter Huang
              Microsoft Online Partner Support
