Convert DOS Cyrillic text to Unicode

  • Nikolay Petrov

    Convert DOS Cyrillic text to Unicode

    How can I convert DOS Cyrillic text to Unicode?


  • Herfried K. Wagner [MVP]

    #2
    Re: Convert DOS Cyrillic text to Unicode

    * "Nikolay Petrov" <johntup2@mail.bg> scripsit:
    > How can I convert DOS cyrillic text to Unicode

    Take a look at the 'System.Text.Encoding' class.

    --
    Herfried K. Wagner [MVP]
    <URL:http://dotnet.mvps.org/>
    <URL:http://dotnet.mvps.org/dotnet/faqs/>

    • Nikolay Petrov

      #3
      Re: Convert DOS Cyrillic text to Unicode

      I am doing this all day ;-)
      still nothing ;-)

      • Cor Ligthert

        #4
        Re: Convert DOS Cyrillic text to Unicode

        LOL

        • Nikolay Petrov

          #5
          Re: Convert DOS Cyrillic text to Unicode

          did i mention that this is going to be my first app ;-)

          • Cor Ligthert

            #6
            Re: Convert DOS Cyrillic text to Unicode

            Hi Nikolay,

            Post some code first; with luck Jay will help you. You can also send
            this to the newsgroup

            microsoft.public.dotnet.general

            There you have a chance that Jon Skeet will help you.

            They are the two who handle the most encoding problems.

            Cor


            • Jay B. Harlow [MVP - Outlook]

              #7
              Re: Convert DOS Cyrillic text to Unicode

              Nikolay,
              In addition to the other comments

              What is the Code Page for DOS Cyrillic? Quickly checking MSDN I think it's
              866, but you need to double check!

              You would use Encoding.GetEncoding to get the DOS Cyrillic Encoding object.

              Imports System.Text

              Dim cyrillic As Encoding = Encoding.GetEncoding(866)

              Given an array of Bytes with DOS Cyrillic in it, you would use
              Encoding.GetString to convert to a Unicode String.

              Dim bytes() As Byte ' must be filled with the DOS Cyrillic data first
              Dim s As String = cyrillic.GetString(bytes)

              Given a Unicode String, you would use Encoding.GetBytes to get an array of
              Bytes with DOS Cyrillic.

              bytes = cyrillic.GetBytes(s)
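
The byte-array conversions described above can be tried end to end with a small console program. This is only a sketch under stated assumptions: the module name and the sample bytes (the word "Привет" in code page 866) are made up for illustration, and it assumes the .NET Framework, where code page 866 is available out of the box.

```vb
Imports System
Imports System.Text

Module DosCyrillicRoundTrip
    Sub Main()
        ' Assumption: .NET Framework, where code page 866 is built in.
        ' On .NET Core / .NET 5+ this call is needed first:
        '   Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Dim cyrillic As Encoding = Encoding.GetEncoding(866)

        ' The word "Привет" as code page 866 bytes (sample data for this sketch).
        Dim dosBytes() As Byte = New Byte() {&H8F, &HE0, &HA8, &HA2, &HA5, &HE2}

        ' Bytes -> Unicode String.
        Dim unicodeText As String = cyrillic.GetString(dosBytes)

        ' Unicode String -> code page 866 bytes again.
        Dim roundTrip() As Byte = cyrillic.GetBytes(unicodeText)

        ' Print code points and bytes rather than the text itself, because the
        ' console's own encoding may not be able to display Cyrillic.
        For Each ch As Char In unicodeText
            Console.Write("U+{0:X4} ", Convert.ToInt32(ch))
        Next
        Console.WriteLine()
        Console.WriteLine(BitConverter.ToString(roundTrip))
    End Sub
End Module
```

The first line of output should list Unicode code points in the Cyrillic block (U+0400 to U+04FF), and the second line should show the same six bytes that went in.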

              If your DOS Cyrillic is in a Text File you pass the Encoding object to your
              System.IO reader & writer classes

              Dim input As New StreamReader("myCyrillic.txt", cyrillic)

              Dim output As New StreamWriter("myCyrillic.txt", False, cyrillic)
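
Putting the reader and the writer together, a file conversion might look like the sketch below. The names ConvertToUtf8 and myCyrillic.utf8.txt are hypothetical, and the output is written as UTF-8 rather than back to code page 866 because the goal of the thread is a Unicode result. The sketch uses two different files, since a file cannot be read and overwritten at the same time, and it assumes a runtime that has the Using statement (VB 2005 or later).

```vb
Imports System
Imports System.IO
Imports System.Text

Module DosFileToUtf8
    ' Reads a code page 866 text file and writes its contents out as UTF-8.
    ' ConvertToUtf8 and both file names are hypothetical, chosen for this sketch.
    ' Assumes .NET Framework; newer runtimes need CodePagesEncodingProvider
    ' registered before GetEncoding(866) will succeed.
    Sub ConvertToUtf8(ByVal sourcePath As String, ByVal targetPath As String)
        Dim cyrillic As Encoding = Encoding.GetEncoding(866)

        Using reader As New StreamReader(sourcePath, cyrillic)
            Using writer As New StreamWriter(targetPath, False, Encoding.UTF8)
                ' The String in between is Unicode; only the files are encoded.
                writer.Write(reader.ReadToEnd())
            End Using
        End Using
    End Sub

    Sub Main()
        ' Create a small sample input file holding "Привет" in code page 866.
        File.WriteAllBytes("myCyrillic.txt", New Byte() {&H8F, &HE0, &HA8, &HA2, &HA5, &HE2})

        ConvertToUtf8("myCyrillic.txt", "myCyrillic.utf8.txt")

        ' Read the result back as UTF-8 and report how many characters it holds.
        Dim converted As String = File.ReadAllText("myCyrillic.utf8.txt", Encoding.UTF8)
        Console.WriteLine(converted.Length)
    End Sub
End Module
```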

              For information on Unicode, Encoding, and code pages (such as DOS Cyrillic)
              see:

              http://www.yoda.arachsys.com/csharp/unicode.html

              One last thing: Once you have a String it is Unicode! Only Byte arrays &
              Streams contain DOS Cyrillic and other character encodings.

              Hope this helps
              Jay

              • Nikolay Petrov

                #8
                Re: Convert DOS Cyrillic text to Unicode

                Thanks Cor

                • Nikolay Petrov

                  #9
                  Re: Convert DOS Cyrillic text to Unicode

                  That was very helpful.
                  But I have some problems. Let me first tell you exactly what I want to
                  achieve.
                  I've made a simple ASP.NET page with two text boxes and a button.
                  What I need is for a user to paste DOS Cyrillic text (taken from Notepad)
                  into the left text box and, when he clicks the button, for the text
                  converted to Unicode to appear in the right box.
                  So I get the DOS text as a String, not as bytes. How should I proceed in
                  this case?

                  • Jay B. Harlow [MVP - Outlook]

                    #10
                    Re: Convert DOS Cyrillic text to Unicode

                    Nikolay,
                    > What I need is, that a user paste DOS Cyrillic text (taken from Notepad) in
                    > left text box,
                    I would expect Notepad to have Windows Cyrillic or Unicode (or to think it
                    has), depending on the version of Windows & your regional settings in Control
                    Panel.

                    > So I get the DOS text as String, not as bytes. How should I proceed in this
                    > case?
                    No, you don't get DOS text as a String!

                    Strings in .NET are always Unicode! Period.

                    Notepad, the browser & ASP.NET have already converted your "DOS text" into
                    Unicode for you. As I stated, Notepad made an assumption about what kind of
                    text it is, then the browser used some encoding, such as UTF-8 or Windows
                    Cyrillic, to send the request to ASP.NET as a stream of bytes. ASP.NET then
                    converted this request stream of bytes into a Unicode String. Hence your
                    program now has a Unicode string!

                    I've only used the normal encoding for requests & responses in ASP.NET, so
                    I'm not certain how to use a specific encoding for them.

                    Unfortunately you will need to ask in one of the ASP.NET newsgroups, such as
                    microsoft.public.dotnet.framework.aspnet, for details on using specific
                    encodings for requests & responses...

                    Notice that in the above there is a whole lot of converting going on! Once
                    your user opened the file in Notepad it was converted: an assumption was
                    made about the type of text in the file (I strongly suspect the assumption
                    was not DOS Cyrillic). Then when you cut & pasted the text from Notepad to
                    your browser a conversion may have been made, more than likely using the
                    code page of your regional settings in Windows. Then when you submitted the
                    page to ASP.NET a conversion was made from the request/response encoding
                    into Unicode. So by the time ASP.NET gets your text it has already been
                    converted for you, so it is nowhere near DOS Cyrillic any more.

                    If you have files with DOS Cyrillic in them and you need or want to use
                    ASP.NET to convert them to Unicode, I would recommend that, rather than
                    using Notepad, a text box and cut & paste, you use the input type=file HTML
                    control to upload your DOS Cyrillic to the server as a stream of bytes
                    (preserving the DOS Cyrillic), then use the encoding object as I showed to
                    read this stream, validly converting it to Unicode.
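
The server-side half of that recommendation could be sketched roughly as follows. ReadDosCyrillic is a hypothetical helper name, and a MemoryStream stands in for the uploaded file so that the sketch runs on its own. In an ASP.NET Web Forms page the stream would presumably come from the posted file's InputStream property (HttpPostedFile.InputStream) instead.

```vb
Imports System
Imports System.IO
Imports System.Text

Module UploadDecoding
    ' Decodes a stream of code page 866 bytes into a Unicode String.
    ' ReadDosCyrillic is a hypothetical name used only in this sketch.
    ' Assumes .NET Framework; newer runtimes need CodePagesEncodingProvider
    ' registered before GetEncoding(866) will succeed.
    Function ReadDosCyrillic(ByVal source As Stream) As String
        Dim cyrillic As Encoding = Encoding.GetEncoding(866)
        Using reader As New StreamReader(source, cyrillic)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Stand-in for the uploaded file: "Привет" as raw code page 866 bytes.
        Dim uploaded As New MemoryStream(New Byte() {&H8F, &HE0, &HA8, &HA2, &HA5, &HE2})

        Dim decoded As String = ReadDosCyrillic(uploaded)

        ' decoded is an ordinary Unicode String and could now be assigned
        ' to the right-hand text box of the page.
        Console.WriteLine(decoded.Length)
    End Sub
End Module
```

The point of the design is that the bytes never pass through Notepad, the clipboard or a text box, so nothing has a chance to guess the wrong code page before the explicit code page 866 decoding happens.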

                    Hope this helps
                    Jay

                    • Nikolay Petrov

                      #11
                      Re: Convert DOS Cyrillic text to Unicode

                      Definitely, Jay. Thank you!
                      I've got it working already.
                      Thank you again.