About Base 34 or Base 36 string encoding

  • ashitpro
    Recognized Expert Contributor
    • Aug 2007
    • 542

    I have an RSA signature that is 512 bytes long.
    I am looking for a way to convert it to Base34 or Base36.

    I searched a lot, but everything I found converts a single number to a given base. Here, I need to encode a long byte string.

    So far, I have tried the following code:

    Code:
    char const c_base_encoding[] = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ";
    size_t const c_base_encoding_size = (sizeof(c_base_encoding)/sizeof(c_base_encoding[0])) - 1;
    
    std::string base_encode( unsigned __int8 * data, size_t size )
    {
    	std::string encoded;
    	unsigned long long value = 0;
    
    	size_t tmp  = sizeof(unsigned long long);
    
    	for(size_t i = 0; i <= size; i+=tmp)
    	{
    		memcpy(&value, data+i,tmp);
    		while( value > 0 )
    		{
    			encoded = c_base_encoding[ value % c_base_encoding_size ] + encoded;
    			value /= c_base_encoding_size;
    		}
    	}
    
    	return encoded;
    }
    Here, I am basically reading 8 bytes at a time from the buffer, treating them as an unsigned long long, and converting that value to Base34.
    The problem is that the final output should be 101 bytes, but I am getting more than 400 bytes. I am missing something here. Any ideas? Or would you suggest another method? Is reading a long value out of a byte array a valid way of encoding?
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    #2
    If you're converting 512 bytes encoded in a Base256 scheme to a Base36 scheme, you should expect an increase in bytes. I don't know why you would expect a decrease.
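
    To put a rough number on that increase, here is a small sketch (not from the original thread) that estimates how many digits a buffer needs when it is encoded as one big number. The helper name encoded_length is made up for illustration, and because it uses floating point it should be read as an estimate. Under this estimate, 512 bytes need about 806 Base34 digits, while 101 digits corresponds to 64 bytes (512 bits), which may be where the 101 figure in the question comes from.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: upper bound on the number of base-`base` digits
// needed to write `byte_count` bytes (8 * byte_count bits) as one number.
std::size_t encoded_length(std::size_t byte_count, unsigned base)
{
    assert(base >= 2);
    // Each digit in the target base carries log2(base) bits of information.
    double const bits_per_digit = std::log2(static_cast<double>(base));
    double const total_bits = 8.0 * static_cast<double>(byte_count);
    return static_cast<std::size_t>(std::ceil(total_bits / bits_per_digit));
}
```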

    It's been a while since I've done any C++, so I can't verify the code. However, your encoding algorithm looks off. You need to encode all 8 bytes you read in, and you're not doing that: you stop as soon as the value becomes 0.

    Take a look at the following example in Base2:
    Code:
    value = 2 (some byte variable)
    encoded = ""
    encoded = (value mod 2) + encoded
    encoded = (2 mod 2) + ""
    encoded = "0"
    value = value / 2
    value = 1
    encoded = (value mod 2) + encoded
    encoded = (1 mod 2) + "0"
    encoded = "10"
    value = value / 2
    value = 0
    That's where your algorithm would stop. Your encoded string would be "10" when it should be "00000010": the leading zeros of the byte are lost.

    Another problem is that you don't account for the fact that 36 isn't a power of two. This means that at the end of each 8 bytes you will have leftover data that needs to be kept and combined with the next bytes before encoding continues. Is there a reason you decided to use Base36 other than convenience? It's troublesome because it's not a power of two.
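
    One way to avoid the chunk-boundary problem is to treat the whole buffer as a single big-endian number and divide it by the base repeatedly, collecting the remainders as digits. The following is a minimal sketch of that approach, assuming the 34-character alphabet from the question. The name kAlphabet and the use of std::uint8_t in place of unsigned __int8 are my own choices, not part of the original code. Leading zero digits are dropped, so pad the result if you need a fixed width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 34-symbol alphabet from the question (no 'I' and no 'O').
std::string const kAlphabet = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ";

// Encodes the whole buffer as ONE big-endian number in base kAlphabet.size().
std::string base_encode(std::uint8_t const* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    unsigned const base = static_cast<unsigned>(kAlphabet.size());

    // Working copy of the number, most significant byte first.
    std::vector<std::uint8_t> number(data, data + size);
    std::string encoded;

    // Skip bytes that are already zero at the most significant end.
    std::size_t first = 0;
    while (first < number.size() && number[first] == 0) ++first;

    while (first < number.size()) {
        // Long division of the remaining number by `base`, byte by byte.
        unsigned remainder = 0;
        for (std::size_t i = first; i < number.size(); ++i) {
            unsigned const acc = remainder * 256u + number[i];
            number[i] = static_cast<std::uint8_t>(acc / base);
            remainder = acc % base;
        }
        // The remainder is the next least significant digit.
        encoded.push_back(kAlphabet[remainder]);
        while (first < number.size() && number[first] == 0) ++first;
    }

    // Digits were produced least significant first.
    std::reverse(encoded.begin(), encoded.end());
    if (encoded.empty()) encoded = "0"; // empty or all-zero input
    return encoded;
}
```

    This does quadratic work in the buffer length, which should be fine for a few hundred bytes. With a power-of-two base such as Base32, the same job can be done by slicing fixed groups of bits with no division, which is the convenience the reply above refers to.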
