Base64

  • John

    Base64

    Hi all,
    I've been going through google and yahoo looking for a certain base64
    decoder in C without success. What I'm after is something that you can
    pass a base64 encoded string into and get back a decoded String.

    Any help is very much appreciated.
    Thanks
    Philip.
  • José de Paula

    #2
    Re: Base64

    On Thu, 01 Apr 2004 03:28:13 -0800, John wrote:

    > Hi all,
    > I've been going through google and yahoo looking for a certain base64
    > decoder in C without success. What I'm after is something that you can
    > pass a base64 encoded string into and get back a decoded String.

    Take one of those free e-mail clients and look into its source code for an
    insight. Mutt (found at http://www.mutt.org) is such a client, and
    certainly has the code you need.

    As an aside, this question is off-topic here, since it deals with an
    algorithm, not the C language itself. comp.programming would be a more
    appropriate place to seek help.

    --
    Quidquid latine dictum sit altum viditur


    • Jeremy Yallop

      #3
      Re: Base64

      John wrote:
      > I've been going through google and yahoo looking for a certain base64
      > decoder in C without success. What I'm after is something that you can
      > pass a base64 encoded string into and get back a decoded String.

      Kevin Easton posted code to do this a while ago:



      Jeremy.


      • Lew Pitcher

        #4
        Re: Base64

        -----BEGIN PGP SIGNED MESSAGE-----
        Hash: SHA1

        John wrote:
        | Hi all,
        | I've been going through google and yahoo looking for a certain base64
        | decoder in C without success. What I'm after is something that you can
        | pass a base64 encoded string into and get back a decoded String.
        |
        | Any help is very much appreciated.
        | Thanks
        | Philip.


        Here's one that I put together as a testbed for some mainframe-to-unix tools I
        was working on. I used this C code as a model for a COBOL program that
        manipulated base64 encodings.

        /*
        ** MIME Base64 coding examples
        **
        ** encode() encodes an arbitrary data block into MIME Base64 format string
        ** decode() decodes a MIME Base64 format string into raw data
        **
        ** Global table base64[] carries the MIME Base64 conversion characters
        */


        /* Global data used by both binary-to-base64 and base64-to-binary conversions */
        static char base64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                               "abcdefghijklmnopqrstuvwxyz"
                               "0123456789"
                               "+/";

        /*
        ** ENCODE RAW into BASE64
        */

        /* Encode source from raw data into Base64 encoded string */
        int encode(unsigned s_len, char *src, unsigned d_len, char *dst)
        {
            unsigned triad;

            for (triad = 0; triad < s_len; triad += 3)
            {
                unsigned long int sr;
                unsigned byte;

                for (byte = 0; (byte<3)&&(triad+byte<s_len); ++byte)
                {
                    sr <<= 8;
                    sr |= (*(src+triad+byte) & 0xff);
                }

                sr <<= (6-((8*byte)%6))%6; /* leftshift to 6bit align */

                if (d_len < 4) return 1; /* error - dest too short */

                *(dst+0) = *(dst+1) = *(dst+2) = *(dst+3) = '=';
                switch(byte)
                {
                case 3:
                    *(dst+3) = base64[sr&0x3f];
                    sr >>= 6;
                case 2:
                    *(dst+2) = base64[sr&0x3f];
                    sr >>= 6;
                case 1:
                    *(dst+1) = base64[sr&0x3f];
                    sr >>= 6;
                    *(dst+0) = base64[sr&0x3f];
                }
                dst += 4; d_len -= 4;
            }

            return 0;
        }

        /*
        ** DECODE BASE64 into RAW
        */

        /* determine which sextet value a Base64 character represents */
        int tlu(int byte)
        {
            int index;

            for (index = 0; index < 64; ++index)
                if (base64[index] == byte)
                    break;
            if (index > 63) index = -1;
            return index;
        }

        /* Decode source from Base64 encoded string into raw data */
        int decode(unsigned s_len, char *src, unsigned d_len, char *dst)
        {
            unsigned six, dix;

            dix = 0;

            for (six = 0; six < s_len; six += 4)
            {
                unsigned long sr;
                unsigned ix;

                sr = 0;
                for (ix = 0; ix < 4; ++ix)
                {
                    int sextet;

                    if (six+ix >= s_len)
                        return 1;
                    if ((sextet = tlu(*(src+six+ix))) < 0)
                        break;
                    sr <<= 6;
                    sr |= (sextet & 0x3f);
                }

                switch (ix)
                {
                case 0: /* end of data, no padding */
                    return 0;

                case 1: /* can't happen */
                    return 2;

                case 2: /* 1 result byte */
                    sr >>= 4;
                    if (dix > d_len) return 3;
                    *(dst+dix) = (sr & 0xff);
                    ++dix;
                    break;

                case 3: /* 2 result bytes */
                    sr >>= 2;
                    if (dix+1 > d_len) return 3;
                    *(dst+dix+1) = (sr & 0xff);
                    sr >>= 8;
                    *(dst+dix) = (sr & 0xff);
                    dix += 2;
                    break;

                case 4: /* 3 result bytes */
                    if (dix+2 > d_len) return 3;
                    *(dst+dix+2) = (sr & 0xff);
                    sr >>= 8;
                    *(dst+dix+1) = (sr & 0xff);
                    sr >>= 8;
                    *(dst+dix) = (sr & 0xff);
                    dix += 3;
                    break;
                }
            }
            return 0;
        }



        - --
        Lew Pitcher
        IT Consultant, Enterprise Application Architecture,
        Enterprise Technology Solutions, TD Bank Financial Group

        (Opinions expressed are my own, not my employers')
        -----BEGIN PGP SIGNATURE-----
        Version: GnuPG v1.2.4 (MingW32)

        iD8DBQFAbBTFagVFX4UWr64RAl2AAKCxunT3bzDQ16w1sOWmh7Krs+WEpwCgsdL7
        wtz0zplSxc9B4fvpS/8b/Dc=
        =Hbsy
        -----END PGP SIGNATURE-----


        • Paul Hsieh

          #5
          Re: Base64

          philip@donegal.net (John) wrote:
          > I've been going through google and yahoo looking for a certain base64
          > decoder in C without success. What I'm after is something that you can
          > pass a base64 encoded string into and get back a decoded String.

          The Better String Library contains an auxiliary function for doing the
          inner loop of base64 encoding and decoding:



          You have to deal with the headers yourself. The reason bstrlib does
          this is because many uses of base64 do not include a header.

          --
          Paul Hsieh

          • Dave Thompson

            #6
            Re: Base64

            On Thu, 01 Apr 2004 08:10:32 -0500, Lew Pitcher <Lew.Pitcher@td.com>
            wrote:

            > Here's one that I put together as a testbed for some mainframe-to-unix tools I
            > was working on. I used this C code as a model for a COBOL program that
            > manipulated base64 encodings.

            > int encode(unsigned s_len, char *src, unsigned d_len, char *dst)

            Could make src const char*; and theoretically better to use size_t.

            > {
            >     unsigned triad;
            >
            >     for (triad = 0; triad < s_len; triad += 3)
            >     {
            >         unsigned long int sr;
            >         unsigned byte;
            >
            >         for (byte = 0; (byte<3)&&(triad+byte<s_len); ++byte)
            >         {
            >             sr <<= 8;
            >             sr |= (*(src+triad+byte) & 0xff);
            >         }

            This uses sr uninitialized; in practice unsigned ints won't have trap
            representations or even padding, but it's still unclean.

            I assume/hope you do (most) array references as *(ptr+sub) instead of
            ptr[sub] for alignment with the COBOL; it's still ugly.

            >         sr <<= (6-((8*byte)%6))%6; /* leftshift to 6bit align */

            Yuck. Confusing *and* inefficient. Why not
                sr <<= (3-byte)*(8-6); /* leftshift for skipped bytes less skipped output chars */
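
The two shift expressions can be compared directly. What follows is only a small sketch, and the helper names shift_original, shift_simplified and shifts_agree are made up for illustration; it assumes the group holds 1, 2 or 3 input bytes, as in the posted encode().

```c
#include <assert.h>

/* Shift used in the posted encode(): pad the collected bits to a multiple of 6. */
unsigned shift_original(unsigned nbytes)
{
    return (6 - ((8 * nbytes) % 6)) % 6;
}

/* Suggested form: 2 bits of shift for every input byte missing from the group. */
unsigned shift_simplified(unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 3);   /* a group never holds 0 or >3 bytes */
    return (3 - nbytes) * (8 - 6);
}

/* Returns 1 if both forms give the same shift for every legal group size. */
int shifts_agree(void)
{
    unsigned n;

    for (n = 1; n <= 3; ++n)
        if (shift_original(n) != shift_simplified(n))
            return 0;
    return 1;
}
```

For 1, 2 and 3 bytes both forms give shifts of 4, 2 and 0 bits, so the simpler expression can replace the original without changing the output.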

            > /* determine which sextet value a Base64 character represents */
            > int tlu(int byte)
            > {
            >     int index;
            >
            >     for (index = 0; index < 64; ++index)
            >         if (base64[index] == byte)
            >             break;
            >     if (index > 63) index = -1;
            >     return index;
            > }

            Much more natural in C to use strchr, or even memchr; or set up and
            use a reverse translation table. COBOL again?
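
As a rough illustration of both alternatives, here is a sketch of a strchr lookup and a reverse translation table. The names b64_alphabet, sextet_strchr and sextet_table are hypothetical and not from the posted code, and the table version assumes it is acceptable to fill the table lazily on first use.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/* strchr version: return the sextet for c, or -1 if c is not in the alphabet. */
int sextet_strchr(int c)
{
    const char *p;

    /* strchr also matches the terminating '\0', and converts its argument
       to char, so reject 0 and anything outside unsigned char range first. */
    if (c <= 0 || c > UCHAR_MAX)
        return -1;
    p = strchr(b64_alphabet, c);
    return p ? (int)(p - b64_alphabet) : -1;
}

/* Reverse table: one entry per unsigned char value, filled on first use. */
static signed char reverse[UCHAR_MAX + 1];
static int reverse_ready = 0;

int sextet_table(int c)
{
    if (!reverse_ready) {
        int i;

        assert(sizeof b64_alphabet == 65);      /* 64 characters plus '\0' */
        memset(reverse, -1, sizeof reverse);    /* mark every value invalid */
        for (i = 0; i < 64; ++i)
            reverse[(unsigned char)b64_alphabet[i]] = (signed char)i;
        reverse_ready = 1;
    }
    if (c < 0 || c > UCHAR_MAX)
        return -1;
    return reverse[c];
}
```

The strchr form is the shortest change from the posted tlu(); the table trades a few hundred bytes of static storage for a constant-time lookup per character.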

            > /* Decode source from Base64 encoded string into raw data */
            > int decode(unsigned s_len, char *src, unsigned d_len, char *dst)

            Similarly.

            > {
            >     unsigned six, dix;
            >
            >     dix = 0;
            >
            >     for (six = 0; six < s_len; six += 4)
            >     {
            >         unsigned long sr;
            >         unsigned ix;
            >
            >         sr = 0;

            This time you do initialize sr.

            >         for (ix = 0; ix < 4; ++ix)
            >         {
            >             int sextet;
            >
            >             if (six+ix >= s_len)
            >                 return 1;
            >             if ((sextet = tlu(*(src+six+ix))) < 0)
            >                 break;
            >             sr <<= 6;
            >             sr |= (sextet & 0x3f);

            Don't need this &, a valid char decode never exceeds 6 bits.

            >         }
            >
            >         switch (ix)
            >         {
            >         case 0: /* end of data, no padding */
            >             return 0;

            Or padding of a full group of 4 =, which is at least one of the
            standards(!) and your decode does not distinguish from garbage.
            If that matters. And of course you don't check padding ='s at all; are
            you requiring your caller(s) do that? It's going to be hard(er) for
            them, because you don't return any indication of how many chars were
            validly decoded, or even into how many bytes.
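
One way to address both points is sketched below: the decoder checks the '=' padding itself and reports how many bytes it produced. This is only an illustration under stated assumptions. The names b64_value and b64_decode, the out_len parameter and the return codes are hypothetical, and the sketch assumes the strict convention of whole 4-character groups with at most two '=' in the final group, so it rejects a group of four '='.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the sextet for c, or -1 if c is not a Base64 alphabet character. */
static int b64_value(int c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789+/";
    const char *p;

    if (c <= 0 || c > 127)
        return -1;
    p = strchr(alphabet, c);
    return p ? (int)(p - alphabet) : -1;
}

/*
 * Hypothetical interface: decode src_len characters, write at most dst_cap
 * bytes to dst, and store the number of bytes written in *out_len.
 * Returns 0 on success, 1 on malformed input, 3 if dst is too small.
 */
int b64_decode(const char *src, size_t src_len,
               unsigned char *dst, size_t dst_cap, size_t *out_len)
{
    size_t n = 0;

    assert(out_len != NULL);
    *out_len = 0;
    if (src_len % 4 != 0)
        return 1;                        /* strict: whole groups only */

    for (; src_len > 0; src += 4, src_len -= 4) {
        unsigned long sr = 0;
        size_t chars = 0, bytes, i;

        /* Count leading alphabet characters and collect their bits. */
        while (chars < 4) {
            int v = b64_value((unsigned char)src[chars]);
            if (v < 0)
                break;
            sr = (sr << 6) | (unsigned long)v;
            ++chars;
        }
        /* Whatever is left in the group must be '=' padding. */
        for (i = chars; i < 4; ++i)
            if (src[i] != '=')
                return 1;
        /* Padding is accepted only in the last group, one or two '='. */
        if (chars < 2 || (chars < 4 && src_len != 4))
            return 1;

        bytes = chars - 1;               /* 2, 3, 4 chars -> 1, 2, 3 bytes */
        if (bytes > dst_cap - n)
            return 3;
        sr <<= 6 * (4 - chars);          /* align to a full 24-bit group */
        for (i = 0; i < bytes; ++i)
            dst[n + i] = (unsigned char)((sr >> (16 - 8 * i)) & 0xff);
        n += bytes;
    }
    *out_len = n;
    return 0;
}
```

With this shape the caller gets the decoded length from *out_len and does not have to inspect the padding itself; a decoder that also accepts a full group of four '=' would need one more case in the padding check.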

            >         case 1: /* can't happen */
            >             return 2;

            (Can't happen *legally*.)

            >         case 2: /* 1 result byte */
            >             sr >>= 4;
            >             if (dix > d_len) return 3;

            dix >= d_len or if you prefer dix+1 > d_len. Unless your d_len already
            allows for at least one additional (perhaps terminator?) byte.

            >             *(dst+dix) = (sr & 0xff);
            >             ++dix;
            >             break;

            Similarly for the 2-byte and 3-byte cases.

            In encode you have an offset stepping through the data but adjust the
            pointer and count for output chars; in decode you use offsets on both.
            I would prefer to be consistent; in C I think I would adjust the pointer and
            count in all cases, and also use names consistent between the two directions.

            In practice I would probably also loop over only full groups with
            their more regular logic, and then handle the more complicated partial
            leftovers once, but you don't need and might not even want that for a
            reference version.
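
A minimal sketch of that structure for the encoding direction, under a few assumptions: the name b64_encode, the out_len parameter and the return codes are hypothetical, the pointers and remaining count are adjusted on both input and output, and no terminating '\0' is written.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

/*
 * Hypothetical interface: encode src_len bytes into dst (capacity dst_cap
 * characters, no terminator added) and store the count written in *out_len.
 * Returns 0 on success, 1 if dst is too small.
 */
int b64_encode(const unsigned char *src, size_t src_len,
               char *dst, size_t dst_cap, size_t *out_len)
{
    size_t groups = src_len / 3 + (src_len % 3 != 0);
    char *start = dst;

    assert(out_len != NULL);
    *out_len = 0;
    if (groups > dst_cap / 4)
        return 1;

    /* Full 3-byte groups: regular logic, no padding decisions. */
    while (src_len >= 3) {
        unsigned long sr = ((unsigned long)src[0] << 16) |
                           ((unsigned long)src[1] << 8) | src[2];

        dst[0] = b64_chars[(sr >> 18) & 0x3f];
        dst[1] = b64_chars[(sr >> 12) & 0x3f];
        dst[2] = b64_chars[(sr >> 6) & 0x3f];
        dst[3] = b64_chars[sr & 0x3f];
        src += 3; src_len -= 3; dst += 4;
    }

    /* Leftover of 1 or 2 bytes, handled once after the loop. */
    if (src_len > 0) {
        unsigned long sr = (unsigned long)src[0] << 16;

        if (src_len == 2)
            sr |= (unsigned long)src[1] << 8;
        dst[0] = b64_chars[(sr >> 18) & 0x3f];
        dst[1] = b64_chars[(sr >> 12) & 0x3f];
        dst[2] = (src_len == 2) ? b64_chars[(sr >> 6) & 0x3f] : '=';
        dst[3] = '=';
        dst += 4;
    }
    *out_len = (size_t)(dst - start);
    return 0;
}
```

Checking the capacity once up front keeps the main loop free of error paths, at the cost of refusing to produce partial output when the buffer is too small.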

            - David.Thompson1 at worldnet.att.net
