email processing script, need help trying to catch =0D and =5f encoded characters

  • chris_fieldhouse@hotmail.com


    Hi,

    I have a script for processing emails.
    The script finds email sent to a particular alias, grabs the body text
    of the email and stores it in a database.

    The problem is that certain characters, like '_', sometimes get stored as
    =5F, and some email clients seem to add in =0D encoded characters.

    I store the text in a MySQL database, and depending on the message
    type I also set a flag to indicate whether it's a plain text message or
    a MIME message with HTML coding.
    It's always the HTML messages that have the extra characters.

    Here is the code segment I modified from php.net to break down the
    email structure and grab just the body text (I'm not interested in storing
    any attachments).

    Any help would be appreciated.

    # some borrowed code to get the email contents (attachments are ignored)
    $MIME = FALSE;
    $debug = "";

    $struct = imap_fetchstructure($conn, $msg);
    # simple messages have no "parts" property at all
    $parts = isset($struct->parts) ? $struct->parts : array();
    $i = 0;

    # messages are either simple (text only) or complex (multipart, with attachments).
    if (!$parts) { /* Simple message, only 1 piece */
        $content = imap_body($conn, $msg);
    } else { /* Complicated message, multiple parts */

        # complex message: multi-part - dump attachments (do not forward to ANK).
        $endwhile = false;
        $stack = array();   /* Stack while parsing message */
        $content = "";      /* Content of message */

        while (!$endwhile) {
            if (!isset($parts[$i])) {
                if (count($stack) > 0) {
                    # end of this level: go back to the parent, then on to its next sibling
                    $parts = $stack[count($stack)-1]["p"];
                    $i = $stack[count($stack)-1]["i"] + 1;
                    array_pop($stack);
                    continue;   # the parent level may be finished too, so check again
                } else {
                    $endwhile = true;
                }
            }

            if (!$endwhile) {
                /* Create the part number first (example '1.2.3') */
                $partstring = "";
                foreach ($stack as $s) {
                    $partstring .= ($s["i"]+1) . ".";
                }
                $partstring .= ($i+1);
                $debug .= strtoupper($parts[$i]->subtype) . "\n";

                # only grab the text body (plain or HTML) - everything else will get dumped!
                # if both exist, the HTML part wins over the plain text part
                if (strtoupper($parts[$i]->subtype) == "PLAIN" && $MIME == FALSE) { /* Message */
                    $content = imap_fetchbody($conn, $msg, $partstring);
                }
                if (strtoupper($parts[$i]->subtype) == "HTML") { /* Message */
                    $content = imap_fetchbody($conn, $msg, $partstring);
                    $MIME = TRUE;
                }

                # descend into nested parts, otherwise move on to the next sibling
                if (isset($parts[$i]->parts)) {
                    $stack[] = array("p" => $parts, "i" => $i);
                    $parts = $parts[$i]->parts;
                    $i = 0;
                } else {
                    $i++;
                }
            }
        } /* while */
    } /* complicated message */
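The loop above walks the MIME tree with an explicit stack so it can build IMAP part numbers such as "1.2.3" for imap_fetchbody(). The same numbering can be sketched recursively. To keep it runnable without a mail server, this sketch assumes hand-built stdClass objects that mimic only the `subtype` and `parts` fields of imap_fetchstructure()'s result; `list_part_numbers` and `mock_part` are hypothetical helpers, not part of the IMAP extension.

```php
<?php
// Hypothetical helper: walk a MIME structure shaped like imap_fetchstructure()
// output and return a map of IMAP part number => uppercased subtype (leaf parts only).
function list_part_numbers(array $parts, string $prefix = ""): array
{
    $result = array();
    foreach ($parts as $index => $part) {
        // IMAP part numbers are 1-based and dotted by nesting level, e.g. "1.2".
        $number = $prefix . ($index + 1);
        if (!empty($part->parts)) {
            // Container part (e.g. multipart/alternative): number its children.
            $result += list_part_numbers($part->parts, $number . ".");
        } else {
            $result[$number] = strtoupper($part->subtype);
        }
    }
    return $result;
}

// Hypothetical mock builder: only sets the two fields the walker reads.
function mock_part(string $subtype, array $children = array()): stdClass
{
    $p = new stdClass();
    $p->subtype = $subtype;
    if ($children) {
        $p->parts = $children;
    }
    return $p;
}

// Made-up message: multipart/mixed containing multipart/alternative
// (plain + html) followed by a PDF attachment.
$structure = mock_part("MIXED", array(
    mock_part("ALTERNATIVE", array(mock_part("plain"), mock_part("html"))),
    mock_part("pdf"),
));

print_r(list_part_numbers($structure->parts));
```

For this mock, the text parts come out as "1.1" (PLAIN) and "1.2" (HTML) and the attachment as "2", which are the strings the loop would pass to imap_fetchbody().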

  • d

    #2

    <chris_fieldhouse@hotmail.com> wrote in message
    news:1139412284.395848.305720@o13g2000cwo.googlegroups.com...
    > Hi,
    >
    > I have a script for processing emails.
    > The script finds email sent to a particular alias, grabs the body text
    > of the email and stores it in a database.
    >
    > The problem is that certain characters, like '_', sometimes get stored as
    > =5F, and some email clients seem to add in =0D encoded characters.
    >

    [snippy snip snip]



    It's a type of MIME encoding called "quoted printable". Read up about it -
    it's really quite simple to encode/decode.

    dave
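In quoted-printable (defined in RFC 2045), a byte is written as "=" followed by two hex digits, so =5F is byte 0x5F ('_') and =0D is byte 0x0D (a carriage return); an "=" at the very end of a line is a soft line break that decoding simply removes. PHP's built-in quoted_printable_decode() undoes this, and imap_qprint() from the IMAP extension does the same job. A minimal sketch, using a made-up sample string:

```php
<?php
// Made-up quoted-printable text: "=5F" encodes '_', "=3D" encodes '=',
// "=0D" encodes a carriage return, and a trailing "=" is a soft line break.
$encoded = "my=5Ffile=5Fname is ready=\r\n, see a=3Db=0D";

// quoted_printable_decode() is one of PHP's standard string functions,
// so it works even where the IMAP extension is not installed.
$decoded = quoted_printable_decode($encoded);

var_dump($decoded);
```

The result is "my_file_name is ready, see a=b" followed by a single "\r" from the decoded =0D; if stray carriage returns are unwanted in the stored text, they can be normalised after decoding.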



    • chris_fieldhouse@hotmail.com

      #3

      Ah,

      So I should do this check:
      if (strtoupper($parts[$i]->encoding) == "QUOTED-PRINTABLE")
      and, if it's true, use the
      imap_qprint function to convert it to 8-bit.

      Does that seem like a reasonable approach?

      Thanks for your help.


      • Mike Scougall

        #4


        <chris_fieldhouse@hotmail.com> wrote in message
        news:1139412284.395848.305720@o13g2000cwo.googlegroups.com...
        > Hi,
        >
        > I have a script for processing emails.
        > The script finds email sent to a particular alias, grabs the body text
        > of the email and stores it in a database.
        >
        > The problem is that certain characters, like '_', sometimes get stored as
        > =5F, and some email clients seem to add in =0D encoded characters.
        >
        > I store the text in a MySQL database, and depending on the message
        > type I also set a flag to indicate whether it's a plain text message or
        > a MIME message with HTML coding.
        > It's always the HTML messages that have the extra characters.
        >
        > Here is the code segment I modified from php.net to break down the
        > email structure and grab just the body text (I'm not interested in storing
        > any attachments).
        >
        > Any help would be appreciated.
        >

        You could try using addslashes($content) just before writing the string to
        the DB. Use stripslashes() when you pull the string back out.

        There are also the strip_tags() and htmlspecialchars() functions, which
        respectively strip out HTML and PHP tags or convert special characters to
        HTML entities. You might find one of those useful.

        HTH

        ------------------------------------
        Mike S
        Copywriting for the IT Professional
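If the HTML part is cleaned up with strip_tags() or htmlspecialchars(), the order matters: those functions know nothing about =XX sequences, so the transfer encoding has to be undone first. A minimal sketch using a made-up HTML fragment, assuming it arrived quoted-printable encoded:

```php
<?php
// Made-up HTML body as it might come back from imap_fetchbody() for a
// quoted-printable part ("=3D" is '=', "=5F" is '_').
$raw = "<p class=3D\"note\">order=5Fid: 42</p>";

// 1. Undo the transfer encoding first.
$html = quoted_printable_decode($raw);

// 2. Then drop the tags (htmlspecialchars() would instead keep them visible).
$text = strip_tags($html);

echo $text, "\n";
```

Running strip_tags() on $raw directly would still remove the tag here, but it would leave "order=5Fid" untouched in the stored text.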




        • chris_fieldhouse@hotmail.com

          #5

          I already call addslashes before saving to the database.

          A simple echo of $content in the PHP script right after the
          imap_fetchbody call shows that the quoted-printable sequences are already
          present, so the problem now is recognising that they are in there and
          converting them accordingly.

          Chris.
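That matches how addslashes() works: it only escapes single quotes, double quotes, backslashes and NUL bytes, so =5F and =0D pass through it unchanged. A quick sketch with a made-up body string showing that decoding has to happen before the text is escaped for the database:

```php
<?php
// addslashes() escapes only ', ", \ and NUL; quoted-printable escapes pass through.
$body = "it's my=5Ffile";

$escaped = addslashes($body);
echo $escaped, "\n";

// Decoding first, then escaping, gives the intended text.
$fixed = addslashes(quoted_printable_decode($body));
echo $fixed, "\n";
```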


          • chris_fieldhouse@hotmail.com

            #6

            Dave,

            Just putting this code in does the trick:
            if (strtoupper($parts[$i]->encoding) == "4")
                $content = imap_qprint($content);

            All the Quoted Printable characters are converted.

            Thanks.

            Chris
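The comparison with 4 works because, in the object returned by imap_fetchstructure(), `encoding` is an integer code rather than a name: 0 = 7bit, 1 = 8bit, 2 = binary, 3 = base64, 4 = quoted-printable, 5 = other (the IMAP extension names these ENC7BIT through ENCOTHER, with ENCQUOTEDPRINTABLE = 4). A minimal sketch extending the same idea to base64 parts, which some clients use for HTML bodies; `decode_part_body` is a hypothetical helper, and it assumes quoted_printable_decode() and base64_decode() as stand-ins for imap_qprint() and imap_base64() so it runs without the IMAP extension.

```php
<?php
// Hypothetical helper: undo the Content-Transfer-Encoding of one MIME part.
// $encoding is the integer from imap_fetchstructure(): 3 = base64 (ENCBASE64),
// 4 = quoted-printable (ENCQUOTEDPRINTABLE). Literals are used so the sketch
// runs even when the IMAP extension (and its constants) is not loaded.
function decode_part_body(string $body, int $encoding): string
{
    switch ($encoding) {
        case 3: // base64
            $decoded = base64_decode($body);
            return $decoded === false ? $body : $decoded;
        case 4: // quoted-printable
            return quoted_printable_decode($body);
        default: // 7bit, 8bit, binary, other: returned unchanged
            return $body;
    }
}

// In the loop from the first post this could be called as, for example:
//   $content = decode_part_body(imap_fetchbody($conn, $msg, $partstring),
//                               $parts[$i]->encoding);

echo decode_part_body("order=5Fid", 4), "\n";              // quoted-printable
echo decode_part_body(base64_encode("order_id"), 3), "\n"; // base64
echo decode_part_body("order_id", 0), "\n";                // 7bit, unchanged
```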


            • d

              #7

              <chris_fieldhouse@hotmail.com> wrote in message
              news:1139430604.447530.308590@f14g2000cwb.googlegroups.com...
              > Dave,
              >
              > Just putting this code in does the trick:
              > if (strtoupper($parts[$i]->encoding) == "4")
              > $content = imap_qprint($content);
              >
              > All the Quoted Printable characters are converted.
              >
              > Thanks.
              >
              > Chris

              Nice!

              dave

