Base64

**José de Paula** · Nov 14 '05, 05:24 AM

Re: Base64

On Thu, 01 Apr 2004 03:28:13 -0800, John wrote:

> Hi all,
> I've been going through google and yahoo looking for a certain base64
> decoder in C without success. What I'm after is something that you can
> pass a base64 encoded string into and get back a decoded String.
>
Take one of the free e-mail clients and look at its source code for ideas.
Mutt (found at http://www.mutt.org) is such a client, and it
certainly has the code you need.

As an aside, this question is off-topic here, since it deals with an
algorithm, not the C language itself. comp.programming would be a more
appropriate place to seek help.

--
Whatever is said in Latin sounds profound.

**Jeremy Yallop** · Nov 14 '05, 05:24 AM

Re: Base64

John wrote:
> I've been going through google and yahoo looking for a certain base64
> decoder in C without success. What I'm after is something that you can
> pass a base64 encoded string into and get back a decoded String.

Kevin Easton posted code to do this a while ago:

http://groups.google.com/groups?selm=ahr1ji%246v3%241%40tomato.pcug.org.au

Jeremy.

**Lew Pitcher** · Nov 14 '05, 05:24 AM

Re: Base64


John wrote:
| Hi all,
| I've been going through google and yahoo looking for a certain base64
| decoder in C without success. What I'm after is something that you can
| pass a base64 encoded string into and get back a decoded String.
|
| Any help is very much appreciated.
| Thanks
| Philip.

Here's one that I put together as a testbed for some mainframe-to-unix tools I
was working on. I used this C code as a model for a COBOL program that
manipulated base64 encodings.

```c
/*
** MIME Base64 coding examples
**
** encode() encodes an arbitrary data block into MIME Base64 format string
** decode() decodes a MIME Base64 format string into raw data
**
** Global table base64[] carries the MIME Base64 conversion characters
*/

/* Global data used by both binary-to-base64 and base64-to-binary conversions */
static char base64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       "abcdefghijklmnopqrstuvwxyz"
                       "0123456789"
                       "+/";

/*
** ENCODE RAW into BASE64
*/

/* Encode source from raw data into Base64 encoded string */
int encode(unsigned s_len, char *src, unsigned d_len, char *dst)
{
    unsigned triad;

    for (triad = 0; triad < s_len; triad += 3)
    {
        unsigned long int sr;
        unsigned byte;

        for (byte = 0; (byte<3)&&(triad+byte<s_len); ++byte)
        {
            sr <<= 8;
            sr |= (*(src+triad+byte) & 0xff);
        }

        sr <<= (6-((8*byte)%6))%6; /* leftshift to 6bit align */

        if (d_len < 4) return 1; /* error - dest too short */

        *(dst+0) = *(dst+1) = *(dst+2) = *(dst+3) = '=';
        switch(byte)
        {
            case 3:
                *(dst+3) = base64[sr&0x3f];
                sr >>= 6;
                /* fall through */
            case 2:
                *(dst+2) = base64[sr&0x3f];
                sr >>= 6;
                /* fall through */
            case 1:
                *(dst+1) = base64[sr&0x3f];
                sr >>= 6;
                *(dst+0) = base64[sr&0x3f];
        }
        dst += 4; d_len -= 4;
    }

    return 0;
}

/*
** DECODE BASE64 into RAW
*/

/* determine which sextet value a Base64 character represents */
int tlu(int byte)
{
    int index;

    for (index = 0; index < 64; ++index)
        if (base64[index] == byte)
            break;
    if (index > 63) index = -1;
    return index;
}

/* Decode source from Base64 encoded string into raw data */
int decode(unsigned s_len, char *src, unsigned d_len, char *dst)
{
    unsigned six, dix;

    dix = 0;

    for (six = 0; six < s_len; six += 4)
    {
        unsigned long sr;
        unsigned ix;

        sr = 0;
        for (ix = 0; ix < 4; ++ix)
        {
            int sextet;

            if (six+ix >= s_len)
                return 1;
            if ((sextet = tlu(*(src+six+ix))) < 0)
                break;
            sr <<= 6;
            sr |= (sextet & 0x3f);
        }

        switch (ix)
        {
            case 0: /* end of data, no padding */
                return 0;

            case 1: /* can't happen */
                return 2;

            case 2: /* 1 result byte */
                sr >>= 4;
                if (dix > d_len) return 3;
                *(dst+dix) = (sr & 0xff);
                ++dix;
                break;

            case 3: /* 2 result bytes */
                sr >>= 2;
                if (dix+1 > d_len) return 3;
                *(dst+dix+1) = (sr & 0xff);
                sr >>= 8;
                *(dst+dix) = (sr & 0xff);
                dix += 2;
                break;

            case 4: /* 3 result bytes */
                if (dix+2 > d_len) return 3;
                *(dst+dix+2) = (sr & 0xff);
                sr >>= 8;
                *(dst+dix+1) = (sr & 0xff);
                sr >>= 8;
                *(dst+dix) = (sr & 0xff);
                dix += 3;
                break;
        }
    }
    return 0;
}
```
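Neither encode() nor decode() above tells the caller how much output it produced, and encode() does not NUL-terminate dst, so the caller has to size the buffers and track lengths. A minimal sketch of that arithmetic, using hypothetical helper names that are not part of the code above: every group of up to 3 input bytes becomes 4 characters, and every 4 characters decode to at most 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: characters produced when n raw bytes are encoded
   with '=' padding (each group of up to 3 bytes becomes 4 characters).
   Add 1 if the caller also wants room for a terminating NUL. */
size_t b64_encoded_len(size_t n)
{
    assert(n <= SIZE_MAX / 4 * 3 - 2);   /* keep the result representable */
    return 4 * ((n + 2) / 3);
}

/* Hypothetical helper: upper bound on bytes produced when n Base64
   characters are decoded (each group of 4 characters yields at most 3
   bytes; a final group padded with '=' yields fewer). */
size_t b64_decoded_max(size_t n)
{
    return 3 * (n / 4);
}
```

For example, the 5-byte input "hello" needs b64_encoded_len(5) == 8 characters ("aGVsbG8="), plus one more byte if you NUL-terminate the result yourself.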

--
Lew Pitcher
IT Consultant, Enterprise Application Architecture,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')

**Paul Hsieh** · Nov 14 '05, 05:25 AM

Re: Base64

philip@donegal.net (John) wrote:
> I've been going through google and yahoo looking for a certain base64
> decoder in C without success. What I'm after is something that you can
> pass a base64 encoded string into and get back a decoded String.

The Better String Library contains an auxiliary function for doing the
inner loop of base64 encoding and decoding:

The Better String Library

http://bstring.sf.net/

You have to deal with the headers yourself; bstrlib leaves them out
because many uses of base64 do not include a header.

--
Paul Hsieh

http://www.pobox.com/~qed/
http://bstring.sf.net/

**Dave Thompson** · Nov 14 '05, 05:28 AM

Re: Base64

On Thu, 01 Apr 2004 08:10:32 -0500, Lew Pitcher <Lew.Pitcher@td.com>
wrote:

> Here's one that I put together as a testbed for some mainframe-to-unix tools I
> was working on. I used this C code as a model for a COBOL program that
> manipulated base64 encodings.

> int encode(unsigned s_len, char *src, unsigned d_len, char *dst)

Could make src const char *, and it would theoretically be better to use size_t.
> {
> unsigned triad;
>
> for (triad = 0; triad < s_len; triad += 3)
> {
> unsigned long int sr;
> unsigned byte;
>
> for (byte = 0; (byte<3)&&(triad+byte<s_len); ++byte)
> {
> sr <<= 8;
> sr |= (*(src+triad+byte) & 0xff);
> }
>
This uses sr uninitialized; in practice unsigned ints won't have trap
representations or even padding, but it's still unclean.

I assume/hope you do (most) array references as *(ptr+sub) instead of
ptr[sub] for alignment with the COBOL; it's still ugly.
> sr <<= (6-((8*byte)%6))%6 ; /* leftshift to 6bit align */
>
Yuck. Confusing *and* inefficient. Why not
    sr <<= (3-byte)*(8-6); /* leftshift for skipped bytes less skipped output chars */
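A quick way to convince yourself the two expressions agree is to evaluate both for the only values byte can hold when the gathering loop ends (1, 2 or 3). A sketch, with hypothetical function names:

```c
#include <assert.h>

/* Hypothetical helpers: the left shift that pads a group of `byte`
   input bytes (1, 2 or 3) out to a whole number of 6-bit sextets. */

/* Lew's original expression. */
unsigned shift_orig(unsigned byte)
{
    assert(byte >= 1 && byte <= 3);
    return (6 - ((8 * byte) % 6)) % 6;
}

/* Dave's suggestion: each of the (3 - byte) missing bytes removes 8 input
   bits but also one 6-bit output character, leaving 8 - 6 = 2 bits of
   zero padding apiece. */
unsigned shift_simple(unsigned byte)
{
    assert(byte >= 1 && byte <= 3);
    return (3 - byte) * (8 - 6);
}
```

Both give 4, 2 and 0 for byte = 1, 2 and 3. For the single byte "M" (0x4D, 01001101), a shift of 4 yields the 12 bits 010011 010000, i.e. sextets 19 and 16, which encode as "TQ" and are then padded to "TQ==".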
> /* determine which sextet value a Base64 character represents */
> int tlu(int byte)
> {
> int index;
>
> for (index = 0; index < 64; ++index)
> if (base64[index] == byte)
> break;
> if (index > 63) index = -1;
> return index;
> }
>
Much more natural in C to use strchr, or even memchr; or set up and
use a reverse translation table. COBOL again?
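Dave's reverse-translation-table idea could be sketched as follows. The names b64_init_table, b64_value and b64_value_strchr are hypothetical, and the standard MIME alphabet is repeated so the block stands alone. The strchr() variant needs an explicit check for '\0', because strchr() also finds the string's own terminating NUL.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Standard MIME Base64 alphabet, same order as base64[] above. */
static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* Reverse table: sextet value for every unsigned char value, or -1 for
   characters outside the alphabet (including '='). */
static signed char rev[UCHAR_MAX + 1];
static int table_ready = 0;

void b64_init_table(void)
{
    int i;

    memset(rev, -1, sizeof rev);          /* every byte becomes -1 */
    for (i = 0; i < 64; ++i)
        rev[(unsigned char)alphabet[i]] = (signed char)i;
    table_ready = 1;
}

/* Table lookup: one array index per character. */
int b64_value(int c)
{
    assert(table_ready);                  /* b64_init_table() must run first */
    return rev[(unsigned char)c];
}

/* strchr() alternative: no table, but a linear scan per character. */
int b64_value_strchr(int c)
{
    const char *p;

    if (c == 0)
        return -1;                        /* strchr would match the NUL */
    p = strchr(alphabet, c);
    return p ? (int)(p - alphabet) : -1;
}
```

Converting through unsigned char keeps negative plain-char values from indexing outside the table.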
> /* Decode source from Base64 encoded string into raw data */
> int decode(unsigned s_len, char *src, unsigned d_len, char *dst)

Similarly.
> {
> unsigned six, dix;
>
> dix = 0;
>
> for (six = 0; six < s_len; six += 4)
> {
> unsigned long sr;
> unsigned ix;
>
> sr = 0;

This time you do initialize sr.
> for (ix = 0; ix < 4; ++ix)
> {
> int sextet;
>
> if (six+ix >= s_len)
> return 1;
> if ((sextet = tlu(*(src+six+ix))) < 0)
> break;
> sr <<= 6;
> sr |= (sextet & 0x3f);

You don't need this &; a valid character never decodes to more than 6 bits.
> }
>
> switch (ix)
> {
> case 0: /* end of data, no padding */
> return 0;
>
Or a full group of four ='s as padding, which is allowed by at least one of
the standards(!) and which your decode does not distinguish from garbage,
if that matters. And of course you don't check the padding ='s at all; are
you requiring your caller(s) to do that? It's going to be hard(er) for
them, because you don't return any indication of how many chars were
validly decoded, or even into how many bytes.
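One way to address both points is to report the number of bytes written (or -1 on error) and to check the padding explicitly. The sketch below uses a hypothetical name, b64_decode, and assumes the input length is a multiple of 4 and that '=' may appear only as one or two trailing characters of the last group (the common "xx==" / "xxx=" form). It deliberately rejects the all-'=' group Dave mentions, and it does not check that the unused low bits of a padded group are zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sextet value of c, or -1 if c is not in the alphabet. */
static int sextet(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;

    return p ? (int)(p - alphabet) : -1;
}

/* Hypothetical decoder: returns the number of bytes written to dst,
   or -1 if src is malformed or dst is too small. */
long b64_decode(const char *src, size_t s_len, unsigned char *dst, size_t d_len)
{
    size_t six, dix = 0;

    assert(src != NULL && dst != NULL);
    if (s_len % 4 != 0)
        return -1;                      /* only whole groups are accepted */

    for (six = 0; six < s_len; six += 4) {
        const char *g = src + six;
        int last = (six + 4 == s_len);  /* padding is legal only here */
        size_t pad = 0, nbytes, i;
        unsigned long sr = 0;

        if (last && g[3] == '=')
            pad = (g[2] == '=') ? 2 : 1;

        for (i = 0; i < 4 - pad; ++i) {
            int v = sextet(g[i]);
            if (v < 0)
                return -1;              /* garbage, or '=' out of place */
            sr = (sr << 6) | (unsigned long)v;
        }
        sr <<= 6 * pad;                 /* padded positions count as zero */

        nbytes = 3 - pad;
        if (d_len - dix < nbytes)       /* dix never exceeds d_len */
            return -1;
        for (i = 0; i < nbytes; ++i)
            dst[dix + i] = (unsigned char)(sr >> (16 - 8 * i));
        dix += nbytes;
    }
    return (long)dix;
}
```

Writing the room check as d_len - dix < nbytes avoids the off-by-one in a test like dix > d_len, which still lets a write through when dix == d_len.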
> case 1: /* can't happen */
> return 2;
>
(Can't happen *legally*.)
> case 2: /* 1 result byte */
> sr >>= 4;
> if (dix > d_len) return 3;

dix >= d_len, or if you prefer, dix+1 > d_len, unless your d_len already
allows for at least one additional (perhaps terminator?) byte.
> *(dst+dix) = (sr & 0xff);
> ++dix;
> break;
>
Similarly for the 2-byte and 3-byte cases.

In encode you have an offset stepping through the data but adjust the
pointer and count for the output chars; in decode you use offsets for both.
I would prefer to be consistent; in C I think I would adjust the pointers in
all cases, and also use names consistent between the two directions.

In practice I would probably also loop over only full groups with
their more regular logic, and then handle the more complicated partial
leftovers once, but you don't need and might not even want that for a
reference version.
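Pulling these suggestions together (const input, size_t lengths, no uninitialized shift register, pointers advanced on both sides, full groups first and the partial group handled once), an encoder might look like the sketch below. The name b64_encode and its return convention (characters written, excluding the NUL it appends, or -1 if dst is too small) are hypothetical, not Lew's interface.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical encoder: writes NUL-terminated Base64 to dst and returns
   the number of characters written (excluding the NUL), or -1 if dst
   cannot hold the result. */
long b64_encode(const unsigned char *src, size_t s_len, char *dst, size_t d_len)
{
    char *start = dst;

    assert(src != NULL || s_len == 0);
    assert(dst != NULL);
    if (d_len < 4 * ((s_len + 2) / 3) + 1)
        return -1;

    /* Full 3-byte groups: the regular case, no padding needed. */
    for (; s_len >= 3; src += 3, s_len -= 3, dst += 4) {
        unsigned long sr = ((unsigned long)src[0] << 16)
                         | ((unsigned long)src[1] << 8) | src[2];
        dst[0] = b64_chars[(sr >> 18) & 0x3f];
        dst[1] = b64_chars[(sr >> 12) & 0x3f];
        dst[2] = b64_chars[(sr >> 6) & 0x3f];
        dst[3] = b64_chars[sr & 0x3f];
    }

    /* Leftover 1 or 2 bytes, handled once and padded with '='. */
    if (s_len != 0) {
        unsigned long sr = (unsigned long)src[0] << 16;
        if (s_len == 2)
            sr |= (unsigned long)src[1] << 8;
        dst[0] = b64_chars[(sr >> 18) & 0x3f];
        dst[1] = b64_chars[(sr >> 12) & 0x3f];
        dst[2] = (s_len == 2) ? b64_chars[(sr >> 6) & 0x3f] : '=';
        dst[3] = '=';
        dst += 4;
    }
    *dst = '\0';
    return (long)(dst - start);
}
```

With this convention, b64_encode((const unsigned char *)"hello", 5, buf, sizeof buf) writes "aGVsbG8=" and returns 8. The size check does not guard against overflow for enormous s_len; a production version would need to.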

- David.Thompson1 at worldnet.att.net
