Parsing Raw email data

  • Chrisatnetronix
    New Member
    • Aug 2010
    • 2

    Currently I need an email piped to a PHP script, and then I need to separate it into variables:

    $from
    $subject
    $message


    I then need to take the above variables and insert them into a MySQL database.

    I wrote a PHP parser script that does OK, but it only seems to work with:

    Thunderbird email client
    Comcast webmail

    If I use Outlook, the message includes a bunch of what looks like encryption code,

    and Yahoo and Gmail make the message include a bunch of things like this:

    --000e0cd24e8a6d9440048d2e4f27
    Content-Type: text/plain; charset=ISO-8859-1

    messtest

    --000e0cd24e8a6d9440048d2e4f27
    Content-Type: text/html; charset=ISO-8859-1
    Content-Transfer-Encoding: quoted-printable


    If the $message is "messtest", that's all I need, not the other stuff around the message.

    The same goes for $from. I need just:

    test@gmail.com (example output)

    I instead get:

    tester <test@gmail.com >


    It seems the output for each variable varies between mail servers.

    I need a universal parser so I can get the variables I need no matter what mail server the sender uses.
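
    For the $from case, one possible approach is to take whatever sits inside the angle brackets and fall back to the first address-like token when there are none. This is only a minimal sketch: extract_address is a made-up helper name, and the patterns are a loose approximation rather than a full RFC 5322 address parser.

```php
<?php
// Hypothetical helper (not part of the original script): return only the
// bare address from a From header value such as "tester <test@gmail.com >".
function extract_address(string $from): string
{
    // Prefer the text inside angle brackets, tolerating stray spaces.
    if (preg_match('/<\s*([^<>\s]+@[^<>\s]+)\s*>/', $from, $m)) {
        return $m[1];
    }
    // No brackets: take the first token that looks like an address.
    if (preg_match('/[^\s<>"(),;:]+@[^\s<>"(),;:]+/', $from, $m)) {
        return $m[0];
    }
    // Nothing address-like was found.
    return '';
}
```

    In the script below, this could be applied right after the From: match, for example $from = extract_address($matches[1]);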

    Here is the code I have so far. To test it, I write the parsed message to a text file so I can view it.

    Code:
    #!/usr/bin/php -q
    <?php
    // read from stdin
    $fp = fopen("php://stdin", "r");
    $email = "";
    while (!feof($fp)) {
        $email .= fgets($fp, 1024);
    }
    fclose($fp);

    // handle email
    $lines = explode("\n", $email);

    // empty vars
    $from = "";
    $subject = "";
    $headers = "";
    $message = "";
    $splittingheaders = true;

    for ($i = 0; $i < count($lines); $i++) {
        if ($splittingheaders) {
            // this is a header
            $headers .= $lines[$i]."\n";
            // look out for special headers
            if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
                $subject = $matches[1];
            }
            if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
                $from = $matches[1];
            }
        } else {
            // not a header, but message
            $message .= $lines[$i]."\n";
        }
        if (trim($lines[$i]) == "") {
            // empty line, header section has ended
            $splittingheaders = false;
        }
    }

    // write mail to file
    // emails.txt is chmod 777
    $out = fopen("emails.txt", "a+");
    fwrite($out, $message);
    fclose($out);
    ?>
  • Chrisatnetronix
    New Member
    • Aug 2010
    • 2

    #2
    Here is a solution that fixes 99 percent of the parsing issues.

    It works and has been tested with AOL, Yahoo, MSN, Gmail, Outlook, and Thunderbird.

    The only one that still leaves some raw code is Gmail, and it only shows this in the message:

    --0016367658309708d3048d428b34
    Content-Type: text/plain; charset=ISO-8859-1

    No others show anything extra.

    Simply add this code after the parsing loop, but before the part that writes to the text file, in the code I supplied above.

    Code:
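    // look for a quoted MIME boundary in the headers
    preg_match("/boundary=\".*?\"/i", $headers, $boundary);
    $boundaryfulltext = isset($boundary[0]) ? $boundary[0] : "";

    if ($boundaryfulltext != "") {
        // strip the boundary=" prefix and the closing quote
        $find = array("/boundary=\"/i", "/\"/i");
        $boundarytext = preg_replace($find, "", $boundaryfulltext);

        // take the first MIME part, i.e. the text after the first boundary line
        $splitmessage = explode("--" . $boundarytext, $message);
        $fullmessage = ltrim(isset($splitmessage[1]) ? $splitmessage[1] : "");

        // skip the part's own headers, which end at the first blank line
        preg_match('/\n\n(.*)/is', $fullmessage, $splitmore);
        $part = isset($splitmore[0]) ? $splitmore[0] : "";

        if (substr(ltrim($part), 0, 2) == "--") {
            // a nested boundary follows: keep the leading newlines so the
            // "\n--" cleanup pattern below removes it
            $actualmessage = $part;
        } else {
            $actualmessage = ltrim($part);
        }
    } else {
        // no boundary found, so the body is a single part
        $actualmessage = ltrim($message);
    }

    // drop everything from the next boundary line onward, and everything
    // from a line ending in =3D onward (quoted-printable leftovers)
    $clean = array("/\n--.*/is", "/=3D\n.*/s");
    $cleanmessage = trim(preg_replace($clean, "", $actualmessage));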
    After that, you can add your MySQL insert code or whatever you like.
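
    As a rough sketch of that last step, the parsed values could be stored with a PDO prepared statement, which keeps quotes in the subject or body from breaking the query. Everything named here is an assumption for illustration: the save_email helper, the emails table with its sender, subject and message columns, and the connection details shown in the comments.

```php
<?php
// Hypothetical helper: store one parsed message using a prepared statement.
// Assumes a table along the lines of:
//   CREATE TABLE emails (sender VARCHAR(255), subject VARCHAR(255), message TEXT)
function save_email(PDO $pdo, string $from, string $subject, string $message): void
{
    // Placeholders let the driver handle quoting of the values.
    $stmt = $pdo->prepare(
        'INSERT INTO emails (sender, subject, message) VALUES (:sender, :subject, :message)'
    );
    $stmt->execute([
        ':sender'  => $from,
        ':subject' => $subject,
        ':message' => $message,
    ]);
}

// Possible usage (host, database name, user and password are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=mail;charset=utf8mb4', 'user', 'password');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// save_email($pdo, $from, $subject, $cleanmessage);
```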


    I must admit, parsing raw email universally ain't easy...
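
    For the leftover Gmail case, a sketch that reads each MIME part's own headers, instead of relying on cleanup patterns, may hold up better. The helper names extract_plain_text and decode_part are hypothetical. The sketch assumes the raw message is available as one string (such as $email in the first script), accepts a quoted or bare boundary, descends into nested multipart sections, and decodes quoted-printable and base64 bodies. It does not convert character sets or deal with attachments.

```php
<?php
// Hypothetical helper: undo the transfer encoding named in a part's headers.
function decode_part(string $headers, string $body): string
{
    if (preg_match('/^Content-Transfer-Encoding:\s*([\w-]+)/mi', $headers, $m)) {
        $encoding = strtolower($m[1]);
        if ($encoding === 'quoted-printable') {
            $body = quoted_printable_decode($body);
        } elseif ($encoding === 'base64') {
            $body = base64_decode($body);
        }
    }
    return trim($body);
}

// Hypothetical helper: return the decoded text/plain body of a raw message,
// or an empty string if a multipart message has no text/plain part.
function extract_plain_text(string $raw): string
{
    // Normalise line endings so "\n\n" reliably separates headers from body.
    $raw = str_replace("\r\n", "\n", $raw);
    [$headers, $body] = array_pad(explode("\n\n", $raw, 2), 2, '');

    // The boundary parameter may be quoted or bare.
    if (!preg_match('/boundary=(?:"([^"]+)"|([^;\s]+))/i', $headers, $m)) {
        // Not multipart: the whole body is the message.
        return decode_part($headers, $body);
    }
    $boundary = $m[2] ?? $m[1];

    foreach (explode('--' . $boundary, $body) as $part) {
        $part = ltrim($part, "\n");
        [$partHeaders, $partBody] = array_pad(explode("\n\n", $part, 2), 2, '');

        // A nested multipart section carries its own boundary: recurse into it.
        if (stripos($partHeaders, 'multipart/') !== false) {
            $text = extract_plain_text($part);
            if ($text !== '') {
                return $text;
            }
        } elseif (stripos($partHeaders, 'text/plain') !== false) {
            return decode_part($partHeaders, $partBody);
        }
    }
    return '';
}
```

    For a message shaped like the Gmail sample in the first post, extract_plain_text($email) should give back just messtest, without the boundary or Content-Type lines.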
