Unicode Encoding

  • Andy

    Unicode Encoding

    Hello All:

    I have a Windows application in which I need to encode a string using Unicode.
    The example I have been given is a web version. Below is the web code.

    Response.ContentEncoding = System.Text.Encoding.Unicode;
    Response.ContentType = "application/postscript";
    Response.Buffer = true;
    Response.AppendHeader("Content-Disposition", "attachment; filename=\"" +
        sFilename + "\"");

    Byte[] ba = System.Text.Encoding.Unicode.GetBytes(sPSFile);
    Byte[] by = new System.Byte[1];
    for (int i = 0; i < ba.Length; i += 2)
    {
        by[0] = ba[i];
        Response.BinaryWrite(by);
    }
    Response.End();

    I am having a hard time converting this code into the Windows equivalent so
    that the files remain the same. Would anyone be able to help?

    I know what I am trying is not working. I thought I had to start down this
    path:
    Byte[] ba = System.Text.Encoding.Unicode.GetBytes(postScript);
    UnicodeEncoding encoding = new UnicodeEncoding();
    postScript = encoding.GetString(ba);

    But I know that encoding.GetString() simply converts it back to a string.
    How do I get the value of postScript to be the encoded value?

    Thanks
    Andy






  • ssamuel

    #2
    Re: Unicode Encoding

    Hi, Andy.

    I assume "windows version" for you means a C# WinForms application
    since you're posting on a C# board.

    All .NET strings are Unicode encoded. If you're getting your data into
    a string, you're getting it in Unicode.

    It looks like you're trying to read a PostScript file out of a MIME
    attachment. I don't have the rest of your "webcode," but chances are,
    your MIME attachment is Base64 encoded. If you rip the Base64-encoded
    content out of the BOF and EOF markers, you'll still need to decode it
    into binary bytes (as opposed to Base64 characters) before you can do
    anything useful with it.

    Try System.Convert.FromBase64String() on your data. That'll give you
    back a byte[], which you can then turn your UnicodeEncoding on.


    HTH!

    Stephan





    • Andy

      #3
      Re: Unicode Encoding

      Stephan:

      Thank you very much for your help. I appreciate it immensely. You are
      correct in your assumption about WinForms and C#.

      Please forgive me, as this is my first foray into any type of encoding. My
      Google searches confused me even more.

      Unfortunately, I am unsure exactly how the MIME attachment is encoded, as
      it comes from a component whose code I do not have access to.

      However, I took your suggestion and used this code:
      Byte[] ba = System.Convert.FromBase64String(postScript);
      UnicodeEncoding encoding = new UnicodeEncoding();
      postScript = encoding.GetString(ba);

      But now I am getting an error stating: "System.FormatException: Invalid
      character in a Base-64 string. at System.Convert.FromBase64String(String s)"


      Any help would be greatly appreciated.

      • ssamuel

        #4
        Re: Unicode Encoding

        Andy,

        What does the string (postScript) look like? Where are you getting it
        from?

        I'm still assuming your source component is emitting a MIME-encoded
        message and that's what the variable postScript contains. Is this
        correct? Do you know anything about what the source is *supposed* to be
        sending? If it is a MIME message, there are some headers that you have
        to strip. Have you done this?

        It looks like you're trying to feed more than just some Base64-encoded
        text into the decoder. A (likely) reason that this could happen is if
        you're not stripping the headers.

        It would be helpful if you could post the first couple hundred
        characters of the string you're trying to decode (postScript).


        Stephan





        • Christof Nordiek

          #5
          Re: Unicode Encoding

          Hi Andy,

          Why do you increment i by 2? Even though Unicode (or UTF-16, to be precise)
          is a 2-byte code, 1 byte is still 1 byte. You increment by two but copy only
          1 byte per iteration, thereby losing half of each character.

          Also, copying larger chunks may perform better.
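To illustrate Christof's point, here is a minimal sketch of what the loop in the original post does. The names LowByteDemo and KeepLowBytes are made up for this example; only Encoding.Unicode.GetBytes comes from the posted code. Because Encoding.Unicode is little-endian UTF-16, stepping through the array by 2 keeps the low byte of each 16-bit code unit and discards the high byte. That is harmless for ASCII but lossy for any character above U+00FF.

```csharp
using System;
using System.Text;

public static class LowByteDemo
{
    // Mirrors the loop in the original post: keep every other byte of the
    // UTF-16LE encoding, i.e. the low byte of each 16-bit code unit.
    public static byte[] KeepLowBytes(string text)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        byte[] result = new byte[utf16.Length / 2];
        for (int i = 0; i < utf16.Length; i += 2)
        {
            result[i / 2] = utf16[i];
        }
        return result;
    }

    public static void Main()
    {
        // 'A' is U+0041: the high byte is zero, so nothing is lost.
        Console.WriteLine(BitConverter.ToString(KeepLowBytes("A")));      // 41
        // The euro sign is U+20AC: the high byte 0x20 is silently dropped.
        Console.WriteLine(BitConverter.ToString(KeepLowBytes("\u20AC"))); // AC
    }
}
```

For text limited to U+0000 through U+00FF, the output matches what an ISO-8859-1 encoder would produce, which may be why the web version appeared to work for most files.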

          • Andy

            #6
            Re: Unicode Encoding

            Stephan:

            Unfortunately, I am not familiar with what the component is supposed to be
            sending, as it was written by someone long before I joined this company, so I
            don't have much knowledge of it.

            What is happening is this: we have item layouts that are XML files. This XML is
            passed into the component, which returns a PostScript file (or
            PostScript of some sort). Once we get this PostScript, it is converted
            to a .pdf file. I was asked to write an automated Windows version that follows the
            same logic as our web version (which was manual and time-consuming).
            Everything seemed to be working with my version until we discovered a PDF
            that didn't display correctly, and it seemed to be an encoding issue.

            Here are the first 200 characters of the "postScript" that is returned.

            %!PS-AdobeFont-1.0: TimesNewRomanPS-Italic 001.000
            %%CreationDate: 2/10/00 at 7:13 PM
            %%VMusage: 1024 28747
            % Generated by Fontographer 4.1
            % Copyright \(c\) 1988, 1990 Adobe Systems Incorporate

            I have not stripped out any messages, as it doesn't appear the web-version
            was doing this. Perhaps I need to do this?
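Given the sample above, a likely explanation for the FormatException reported earlier is that the string is plain PostScript text rather than Base64: it begins with '%' and '!', neither of which is in the Base64 alphabet. The sketch below checks this; Base64Check and IsBase64 are hypothetical names used only for illustration, and the second input is simply the Base64 form of the four ASCII characters "%!PS".

```csharp
using System;

public static class Base64Check
{
    // Returns true if the text decodes as Base64, false if Convert rejects it.
    public static bool IsBase64(string text)
    {
        try
        {
            Convert.FromBase64String(text);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // First line of the posted sample: '%', '!', '-', ':' and '.' are not Base64 characters.
        Console.WriteLine(IsBase64("%!PS-AdobeFont-1.0: TimesNewRomanPS-Italic 001.000")); // False
        // "JSFQUw==" is the Base64 encoding of the ASCII bytes for "%!PS".
        Console.WriteLine(IsBase64("JSFQUw==")); // True
    }
}
```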

            • Andy

              #7
              Re: Unicode Encoding

              Hello Christof:

              Unfortunately, I cannot say why they increment by 2, as this was
              written before I was with this company. I was just given this code
              as an example of how it works.

              Thanks
              Andy

              • ssamuel

                #8
                Re: Unicode Encoding

                Andy,

                You say it was working fine until a file didn't work. Does this mean
                that there's one file that doesn't work, or that you wrote a system and
                it hasn't worked yet?

                If it's just this one file, it's not an encoding issue. It's probably
                because your PS refers to a PS font rather than a PS page definition.
                The PS-to-PDF conversion won't work on your source document because it
                describes a different thing.


                Stephan




                • Andy

                  #9
                  Re: Unicode Encoding

                  Hello All:

                  I just wanted to let you know I found the issue. When I write the file out,
                  I had to set the StreamWriter to System.Text.Encoding.Default and all was
                  good.

                  The reason it worked previously was that this file used characters that
                  the other files did not.

                  Thanks to all for helping. I do greatly appreciate it.
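A minimal sketch of the fix Andy describes, assuming the PostScript is already held in a string and is written out with a StreamWriter. PostScriptWriter, Write, and the temporary file name are hypothetical; the thread does not show the actual code. Note that Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so the bytes written for non-ASCII characters may depend on the runtime and machine.

```csharp
using System;
using System.IO;
using System.Text;

public static class PostScriptWriter
{
    // Writes the PostScript text to disk using the supplied encoding.
    public static void Write(string path, string postScript, Encoding encoding)
    {
        // append: false overwrites any existing file at this path.
        using (StreamWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(postScript);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "sample.ps");
        // Encoding.Default is the setting reported to work in the thread.
        Write(path, "%!PS\n", Encoding.Default);
        Console.WriteLine(File.ReadAllText(path, Encoding.Default));
    }
}
```

If the goal is byte-for-byte parity with the web version's low-byte loop, an explicit encoding such as Encoding.GetEncoding("ISO-8859-1") may be a more predictable choice than Encoding.Default, though that is an assumption the thread does not confirm.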