Base64 question

  • Jim Brandley

    #16
    Re: Base64 question

    Thanks. That's similar to what I have written. I'll see if I can get mine to
    perform better. I was using a StringBuilder to accept the encoded
    characters. I'll see if it performs better using a character array, and save
    the string construction until it's complete.

    "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
    news:MPG.20fa9725cf46dc4e2bd@msnews.microsoft.com...
    > Arne Vajhøj <arne@vajhoej.dk> wrote:
    >> Jim Brandley wrote:
    >>> I need to append a short ciphertext string as a query variable encoded
    >>> so it's valid for a URL. After encryption, I convert the bytes to Base64.
    >>> However, the result includes characters that are invalid for a URL,
    >>> notably '+' symbols. So, I have to cycle the output string through
    >>> HttpUtility.UrlEncode(). That takes time. I wrote my own URL-safe Base64
    >>> converter in C# that's about as lean as I can make it. It is much slower
    >>> (about 6 times) than the one provided. However, it runs in about 70% of
    >>> the time required to use the standard Base64 converter followed by a trip
    >>> through UrlEncode().
    >>
    >> I believe that + is the only non URL valid character in base64 output.
    >
    > Depending on the exact context, it can be handy to get rid of / and =
    > too. In some cases it's just + that needs to be replaced though, yes.
    >
    >> Why not a simple String Replace ?
    >
    > Indeed... possibly with a check to see whether a replacement is needed
    > to start with.
    >
    >>> I am using .Net 2.0, and I have not found a way to coerce the built in
    >>> Base64 converter to use a character set that could avoid the trip
    >>> through UrlEncode. Am I missing anything? If not, is there any way to
    >>> add this capability to a future release?
    >>
    >> Base64 is a standard. It is not common to allow mucking with a standard.
    >
    > I think it's pretty common to adapt base64 to only include URL-safe
    > characters. Put it this way - it's common enough to have made it into
    > Wikipedia:
    >
    > http://en.wikipedia.org/wiki/Base64#URL_Applications
    >
    > --
    > Jon Skeet - <skeet@pobox.com>
    > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    > If replying to the group, please do not mail me too



    • Jon Skeet [C# MVP]

      #17
      Re: Base64 question

      Arne Vajhøj <arne@vajhoej.dk> wrote:
      > Jon Skeet [C# MVP] wrote:
      >>> Base64 is a standard. It is not common to allow mucking with a standard.
      >> I think it's pretty common to adapt base64 to only include URL-safe
      >> characters. Put it this way - it's common enough to have made it into
      >> Wikipedia:
      >>
      >> http://en.wikipedia.org/wiki/Base64#URL_Applications
      >
      > Hmm.
      >
      > People seem already to have forgotten the nightmare of
      > incompatible uuencode versions.

      This isn't usually for communicating between two applications though -
      it's to allow a stateless application to communicate effectively with
      itself. In other words, you're in complete control of both "ends" of
      the conversation, so can be compatible with yourself appropriately.
      Base64 happens to be a pretty simple format for representing arbitrary
      binary data, and it just needs a little tweak for the sake of URL
      encoding.
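
For what it's worth, the "little tweak" can be sketched on top of the framework converter alone, along the lines of the String Replace idea raised earlier in the thread. This is only a sketch under assumptions: the class name UrlSafeText is made up for illustration, and the choice of '-' and '_' as substitutes (with the '=' padding dropped) follows the usual URL-safe convention rather than anything stated in this thread.

```csharp
using System;

// Hypothetical helper, not part of the .NET Framework.
public static class UrlSafeText
{
    private static readonly char[] Unsafe = { '+', '/', '=' };

    // Standard Base64 from the framework, then '+' -> '-', '/' -> '_',
    // and the trailing '=' padding removed.
    public static string Encode(byte[] data)
    {
        string s = Convert.ToBase64String(data);
        // Skip the extra passes when no replacement is needed.
        if (s.IndexOfAny(Unsafe) < 0) return s;
        return s.TrimEnd('=').Replace('+', '-').Replace('/', '_');
    }

    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        // Restore the padding that Encode removed.
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

The output needs no trip through UrlEncode(), at the cost of up to three extra passes over a short string. Whether that is cheaper than a hand-written encoder is something to measure rather than assume.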

      --
      Jon Skeet - <skeet@pobox.com>
      http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
      If replying to the group, please do not mail me too


      • Peter Bromberg [C# MVP]

        #18
        Re: Base64 question

        Jim,
        As Jon Skeet pointed out, modifying the Framework System.Convert classes may
        be the way to go here. A quick decompilation of the System.Convert Base64
        methods reveals that:
        1) They use unsafe code, which probably accounts for the speed factor.
        2) There is a char[] Base64Table used.

        So, you could decompile this, create your own (say,
        Convert.ToBase64StringUrlSafe) method, and all you would need to do is change
        the values in the Base64Table char[] array.
        Peter

        --
        Site: http://www.eggheadcafe.com
        UnBlog: http://petesbloggerama.blogspot.com
        BlogMetaFinder(BETA): http://www.blogmetafinder.com



        "Jim Brandley" wrote:
        > Arne - That was faster - Thanks for the idea. However, Base64 is also
        > sending out the slash '/' character - that means a second pass with
        > string.Replace().
        >
        > BTW - I agree that altering something that complies with a standard is a bad
        > thing to do. I was on an ANSI committee years ago, and I know why they are
        > built the way they are. However, supplementing that method with an optimized
        > conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name
        > would convey the reason for the existence of the method, along with a pretty
        > good idea of what the output might be. Just a thought.
        >
        > Jim
        >
        > "Jim Brandley" <Jim.Brandley@IntercimNOSPAM.com> wrote in message
        > news:eFgDBdQwHHA.4332@TK2MSFTNGP06.phx.gbl...
        >> I'll try that and see what it costs. I was hoping to avoid another
        >> iteration through the characters in the string.
        >>
        >> "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
        >> news:46904a3e$0$90266$14726298@news.sunsite.dk...
        >>> Jim Brandley wrote:
        >>>> I need to append a short ciphertext string as a query variable encoded
        >>>> so it's valid for a URL. After encryption, I convert the bytes to
        >>>> Base64. However, the result includes characters that are invalid for a
        >>>> URL, notably '+' symbols. So, I have to cycle the output string through
        >>>> HttpUtility.UrlEncode(). That takes time. I wrote my own URL-safe Base64
        >>>> converter in C# that's about as lean as I can make it. It is much
        >>>> slower (about 6 times) than the one provided. However, it runs in
        >>>> about 70% of the time required to use the standard Base64 converter
        >>>> followed by a trip through UrlEncode().
        >>>
        >>> I believe that + is the only non URL valid character in base64 output.
        >>>
        >>> Why not a simple String Replace ?
        >>>
        >>>> I am using .Net 2.0, and I have not found a way to coerce the built in
        >>>> Base64 converter to use a character set that could avoid the trip
        >>>> through UrlEncode. Am I missing anything? If not, is there any way to
        >>>> add this capability to a future release?
        >>>
        >>> Base64 is a standard. It is not common to allow mucking with a standard.
        >>>
        >>> Arne


        • Arne Vajhøj

          #19
          Re: Base64 question

          Jim Brandley wrote:
          > BTW - I agree that altering something that complies with a standard is a bad
          > thing to do. I was on an ANSI committee years ago, and I know why they are
          > built the way they are. However, supplementing that method with an optimized
          > conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name
          > would convey the reason for the existence of the method, along with a pretty
          > good idea of what the output might be. Just a thought.

          If you insist on pursuing the idea, then there is some code
          attached below, which is the fastest code I can write without
          unsafe code.

          Arne

          ======================================================

          using System;

          public class Base64
          {
              private static char[] EncVals =
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();
              private static int[] DecVals;

              static Base64()
              {
                  // Reverse lookup table: character code -> 6-bit value.
                  DecVals = new int[128];
                  for(int i = 0; i < 64; i++)
                  {
                      DecVals[EncVals[i]] = i;
                  }
              }

              public string Encode(byte[] b)
              {
                  int len = (b.Length * 8 + 5) / 6;   // characters before padding
                  int extra = 3 - (len + 3) % 4;      // number of '=' padding characters
                  char[] res = new char[len + extra];
                  int p = b.Length - b.Length % 3;    // bytes in complete 3-byte groups
                  int ix = 0;
                  int tmp;
                  for(int i = 0; i < p; i += 3)
                  {
                      tmp = (b[i] << 16) | (b[i + 1] << 8) | b[i + 2];
                      res[ix + 3] = EncVals[tmp & 0x3F];
                      res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
                      res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
                      res[ix] = EncVals[tmp >> 18];
                      ix += 4;
                  }
                  if(extra == 1)
                  {
                      // Two bytes left over: three characters plus one '='.
                      tmp = (b[p] << 16) | (b[p + 1] << 8);
                      res[ix + 3] = '=';
                      res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
                      res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
                      res[ix] = EncVals[tmp >> 18];
                  }
                  else if(extra == 2)
                  {
                      // One byte left over: two characters plus "==".
                      tmp = b[p] << 16;
                      res[ix + 3] = '=';
                      res[ix + 2] = '=';
                      res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
                      res[ix] = EncVals[tmp >> 18];
                  }
                  return new String(res);
              }

              // Expects padded input whose length is a multiple of 4.
              public byte[] Decode(string s)
              {
                  int len = s.Length;
                  while(len > 0 && s[len - 1] == '=') len--;
                  // Three bytes per four characters, minus one byte per '=' stripped.
                  byte[] res = new byte[s.Length / 4 * 3 - (s.Length - len)];
                  int ix = 0;
                  int tmp;
                  for(int i = 0; i < s.Length; i += 4)
                  {
                      // '=' decodes as 0 because DecVals has no entry for it.
                      tmp = (DecVals[s[i]] << 18) | (DecVals[s[i + 1]] << 12) |
                            (DecVals[s[i + 2]] << 6) | DecVals[s[i + 3]];
                      res[ix++] = (byte)(tmp >> 16);
                      if(ix < res.Length) res[ix++] = (byte)((tmp >> 8) & 0xFF);
                      if(ix < res.Length) res[ix++] = (byte)(tmp & 0xFF);
                  }
                  return res;
              }
          }
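
Since the thread keeps coming back to a separately named UrlSafeBase64, here is a rough sketch of what such a variant might look like when built the same table-driven way. It is not Arne's code and not a framework API: the class name comes from Jim's suggestion, while the '-' and '_' replacements, the dropped padding, and the input validation are assumptions made for illustration.

```csharp
using System;

// Hypothetical URL-safe variant; only the last two table entries differ
// from the standard Base64 alphabet, and no '=' padding is emitted.
public static class UrlSafeBase64
{
    private static readonly char[] EncVals =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_".ToCharArray();
    private static readonly int[] DecVals = BuildDecodeTable();

    private static int[] BuildDecodeTable()
    {
        int[] table = new int[128];
        for (int i = 0; i < table.Length; i++) table[i] = -1;   // -1 marks invalid characters
        for (int i = 0; i < EncVals.Length; i++) table[EncVals[i]] = i;
        return table;
    }

    public static string Encode(byte[] b)
    {
        // Without padding the output is ceil(8n / 6) characters long.
        char[] res = new char[(b.Length * 8 + 5) / 6];
        int p = b.Length - b.Length % 3;
        int ix = 0;
        for (int i = 0; i < p; i += 3)
        {
            int tmp = (b[i] << 16) | (b[i + 1] << 8) | b[i + 2];
            res[ix++] = EncVals[tmp >> 18];
            res[ix++] = EncVals[(tmp >> 12) & 0x3F];
            res[ix++] = EncVals[(tmp >> 6) & 0x3F];
            res[ix++] = EncVals[tmp & 0x3F];
        }
        int rest = b.Length - p;   // 0, 1 or 2 trailing bytes
        if (rest > 0)
        {
            int tmp = b[p] << 16;
            if (rest == 2) tmp |= b[p + 1] << 8;
            res[ix++] = EncVals[tmp >> 18];
            res[ix++] = EncVals[(tmp >> 12) & 0x3F];
            if (rest == 2) res[ix++] = EncVals[(tmp >> 6) & 0x3F];
        }
        return new string(res);
    }

    public static byte[] Decode(string s)
    {
        if (s.Length % 4 == 1) throw new FormatException("Invalid length.");
        byte[] res = new byte[s.Length * 6 / 8];
        int ix = 0, bits = 0, acc = 0;
        foreach (char c in s)
        {
            int v = c < 128 ? DecVals[c] : -1;
            if (v < 0) throw new FormatException("Invalid character.");
            acc = (acc << 6) | v;   // append 6 bits to the accumulator
            bits += 6;
            if (bits >= 8)
            {
                bits -= 8;
                res[ix++] = (byte)((acc >> bits) & 0xFF);
                acc &= (1 << bits) - 1;   // keep only the bits not yet emitted
            }
        }
        return res;
    }
}
```

For example, UrlSafeBase64.Encode(new byte[] { 0xFB, 0xFF }) gives "-_8", where the standard encoder would give "+/8=". The distinct class name keeps it from being mistaken for standard Base64.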


          • Arne Vajhøj

            #20
            Re: Base64 question

            Jim Brandley wrote:
            > "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
            >> You can do about 1 million conversions to base64 of
            >> a small string in 1 second
            >>
            >> =>
            >>
            >> If your web server is CPU bound at about 1000 requests/second,
            >> then the base64 conversion is using 0.1% of your CPU and something
            >> else is chewing the other 99.9%.
            >
            > I did not mean to imply this was a bottleneck. I strive to prevent the
            > creation of bottlenecks - easier to do that than track them down later.
            > I'm working on a very large (to me anyway - approx 2M lines of C#, not
            > counting aspx and ascx pages) web app for intranets. Pages are generated
            > with maybe 2% static text and 98% dynamic, and can have 1500 to 1700
            > users at any given time. It is primarily presenting and recording
            > real-time information in large manufacturing environments.
            >
            > Responsiveness is a big deal for our customers. I spend all my time in
            > the business objects, data layer and writing SQL. I very seldom do
            > anything with screens, except present the information they need for
            > binding. Any time I write a bit of code that gets executed with any
            > frequency, I try to find the time to analyze it carefully and shave
            > whatever I can.

            I still don't think it is worth it.

            You should write 95%-98% of your code with priority of easy maintenance
            and then optimize the 2%-5% of your code that has been proven to impact
            performance for speed.

            Writing clever code that optimizes stuff that does not need to be
            optimized does not reduce hardware costs but will increase maintenance
            costs dramatically.

            Simple code is usually better than clever code when we talk business.

            I used to do a lot of that type of micro-optimization in the 1980s. But
            not any more.

            I think you should use the framework methods and just consider the
            optimized code an interesting academic exercise.
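
One way to check the "about 1 million conversions per second" figure on your own hardware is a quick timing loop around the framework method. The following is only a sketch: the class name Base64Timing, the 32-byte payload, and the fixed random seed are arbitrary choices for illustration, and a loop like this gives a rough figure rather than a careful benchmark.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for illustration only.
public static class Base64Timing
{
    // Times `iterations` calls of Convert.ToBase64String on a 32-byte payload.
    public static TimeSpan Measure(int iterations)
    {
        byte[] payload = new byte[32];
        new Random(12345).NextBytes(payload);

        Stopwatch sw = Stopwatch.StartNew();
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Use the result so the call is not trivially removable.
            total += Convert.ToBase64String(payload).Length;
        }
        sw.Stop();

        // 32 bytes always encode to 44 characters (4 * ceil(32 / 3)).
        if (total != 44L * iterations)
            throw new InvalidOperationException("Unexpected encoded length.");
        return sw.Elapsed;
    }
}
```

Calling Base64Timing.Measure(1000000) returns the elapsed time for a million conversions. If that comes out near one second, each conversion costs about a microsecond, so 1000 requests per second spend about a millisecond of every second on it, which is the 0.1% quoted above.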

            Arne


            • Arne Vajhøj

              #21
              Re: Base64 question

              Jon Skeet [C# MVP] wrote:
              > Arne Vajhøj <arne@vajhoej.dk> wrote:
              >> Jon Skeet [C# MVP] wrote:
              >>>> Base64 is a standard. It is not common to allow mucking with a standard.
              >>> I think it's pretty common to adapt base64 to only include URL-safe
              >>> characters. Put it this way - it's common enough to have made it into
              >>> Wikipedia:
              >>>
              >>> http://en.wikipedia.org/wiki/Base64#URL_Applications
              >>
              >> Hmm.
              >>
              >> People seem already to have forgotten the nightmare of
              >> incompatible uuencode versions.
              >
              > This isn't usually for communicating between two applications though -
              > it's to allow a stateless application to communicate effectively with
              > itself. In other words, you're in complete control of both "ends" of
              > the conversation, so can be compatible with yourself appropriately.
              > Base64 happens to be a pretty simple format for representing arbitrary
              > binary data, and it just needs a little tweak for the sake of URL
              > encoding.

              There is always some excuse to break the standards.

              It starts with being used for one page communicating with itself. Then
              it becomes used for communicating between pages. Then it starts getting
              used down in the lower layers. Then it gets exposed as a service to
              Java and Python apps. Etc. etc.

              Maybe.

              Arne


              • Jim Brandley

                #22
                Re: Base64 question

                Thanks Peter. I'll look into that.

                "Peter Bromberg [C# MVP]" <pbromberg@yahoo.yabbadabbadoo.com> wrote in
                message news:3AEA10ED-2086-4879-AA3D-11B17557E7A5@microsoft.com...
                > Jim,
                > As Jon Skeet pointed out, modifying the Framework System.Convert classes
                > may be the way to go here. A quick decompilation of the System.Convert
                > Base64 methods reveals that:
                > 1) They use unsafe code, which probably accounts for the speed factor.
                > 2) There is a char[] Base64Table used.
                >
                > So, you could decompile this, create your own (say,
                > Convert.ToBase64StringUrlSafe) method, and all you would need to do is
                > change the values in the Base64Table char[] array.
                > Peter
                >
                > --
                > Site: http://www.eggheadcafe.com
                > UnBlog: http://petesbloggerama.blogspot.com
                > BlogMetaFinder(BETA): http://www.blogmetafinder.com

                • Jim Brandley

                  #23
                  Re: Base64 question

                  That's exactly what I'm using it for. In a stateless environment, I need a
                  secure way to return context to myself to service http requests coming in
                  from our own pages. Since I generate a lot of these, it needs to be done
                  quickly.

                  "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
                  news:MPG.20fb49fa8031156d2bf@msnews.microsoft.com...
                  > Arne Vajhøj <arne@vajhoej.dk> wrote:
                  >> Jon Skeet [C# MVP] wrote:
                  >>>> Base64 is a standard. It is not common to allow mucking with a
                  >>>> standard.
                  >>> I think it's pretty common to adapt base64 to only include URL-safe
                  >>> characters. Put it this way - it's common enough to have made it into
                  >>> Wikipedia:
                  >>>
                  >>> http://en.wikipedia.org/wiki/Base64#URL_Applications
                  >>
                  >> Hmm.
                  >>
                  >> People seem already to have forgotten the nightmare of
                  >> incompatible uuencode versions.
                  >
                  > This isn't usually for communicating between two applications though -
                  > it's to allow a stateless application to communicate effectively with
                  > itself. In other words, you're in complete control of both "ends" of
                  > the conversation, so can be compatible with yourself appropriately.
                  > Base64 happens to be a pretty simple format for representing arbitrary
                  > binary data, and it just needs a little tweak for the sake of URL
                  > encoding.
                  >
                  > --
                  > Jon Skeet - <skeet@pobox.com>
                  > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                  > If replying to the group, please do not mail me too



                  • Jim Brandley

                    #24
                    Re: Base64 question

                    Thanks Arne - That's pretty much what I have written. I was using a
                    StringBuilder in Encode last night. I was able to cut the cost in half today
                    by using a char array as you have done. I was surprised at the difference.

                    "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
                    news:46918ab8$0$90267$14726298@news.sunsite.dk...
                    > Jim Brandley wrote:
                    >> BTW - I agree that altering something that complies with a standard is a
                    >> bad thing to do. I was on an ANSI committee years ago, and I know why
                    >> they are built the way they are. However, supplementing that method with
                    >> an optimized conversion is not a bad thing to do. Maybe call it
                    >> UrlSafeBase64. The name would convey the reason for the existence of the
                    >> method, along with a pretty good idea of what the output might be. Just a
                    >> thought.
                    >
                    > If you insist on pursuing the idea, then there is some code
                    > attached below, which is the fastest code I can write without
                    > unsafe code.
                    >
                    > Arne

                    • Jon Skeet [C# MVP]

                      #25
                      Re: Base64 question

                      Arne Vajhøj <arne@vajhoej.dk> wrote:
                      >> This isn't usually for communicating between two applications though -
                      >> it's to allow a stateless application to communicate effectively with
                      >> itself. In other words, you're in complete control of both "ends" of
                      >> the conversation, so can be compatible with yourself appropriately.
                      >> Base64 happens to be a pretty simple format for representing arbitrary
                      >> binary data, and it just needs a little tweak for the sake of URL
                      >> encoding.
                      >
                      > There is always some excuse to break the standards.
                      >
                      > It starts with being used for one page communicating with itself. Then
                      > it becomes used for communicating between pages. Then it starts getting
                      > used down in the lower layers. Then it gets exposed as a service to
                      > Java and Python apps. Etc. etc.

                      So you avoid doing that - keep it very tightly controlled, and there
                      are no problems. I really don't see anything wrong in this case.

                      --
                      Jon Skeet - <skeet@pobox.com>
                      http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                      If replying to the group, please do not mail me too


                      • Arne Vajhøj

                        #26
                        Re: Base64 question

                        Jon Skeet [C# MVP] wrote:
                        > Arne Vajhøj <arne@vajhoej.dk> wrote:
                        >>> This isn't usually for communicating between two applications though -
                        >>> it's to allow a stateless application to communicate effectively with
                        >>> itself. In other words, you're in complete control of both "ends" of
                        >>> the conversation, so can be compatible with yourself appropriately.
                        >>> Base64 happens to be a pretty simple format for representing arbitrary
                        >>> binary data, and it just needs a little tweak for the sake of URL
                        >>> encoding.
                        >>
                        >> There is always some excuse to break the standards.
                        >>
                        >> It starts with being used for one page communicating with itself. Then
                        >> it becomes used for communicating between pages. Then it starts getting
                        >> used down in the lower layers. Then it gets exposed as a service to
                        >> Java and Python apps. Etc. etc.
                        >
                        > So you avoid doing that - keep it very tightly controlled, and there
                        > are no problems. I really don't see anything wrong in this case.

                        How does one prevent code reuse?

                        Arne


                        • Jon Skeet [C# MVP]

                          #27
                          Re: Base64 question

                          Arne Vajhøj <arne@vajhoej.dk> wrote:
                          >> So you avoid doing that - keep it very tightly controlled, and there
                          >> are no problems. I really don't see anything wrong in this case.
                          >
                          > How does one prevent code reuse?

                          There's no problem reusing the code - within the appropriate layer.
                          There's no reason why multiple web applications shouldn't all use the
                          same code converting URL parameters into arbitrary binary data. You
                          just need to be careful not to use it inappropriately elsewhere.
                          Software engineering always requires discipline like that. Naming the
                          class UrlSafeBase64 or something like that would make it pretty obvious
                          though, IMO.

                          --
                          Jon Skeet - <skeet@pobox.com>
                          http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
                          If replying to the group, please do not mail me too
