Thursday, May 19, 2016

Extract pictures from an mbs/mbox export

If your mail client can export its mail in mbs or mbox format, as Opera and Thunderbird can, and you want to parse those messages (for example, to extract all the pictures or run some other operation on them), you can follow this approach using formail and PHP.

First, split the mbox archive into one file per email. We'll use PHP to parse the messages, and parsing the whole archive in one go is likely to exceed PHP's memory and execution-time limits.

The command to do that is:

#cat Inbox.mbs | formail -ds sh -c 'cat > extra/msg.$FILENO'

This will generate a separate file for each email (msg.XXX) in a separate directory (extra). The extra directory must already exist, otherwise the shell redirection fails.
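
If formail is not available, the same split can be approximated in plain PHP. The sketch below is not what formail does internally; it is a simplified stand-in that assumes the classic mbox layout, where every message starts with a line beginning with "From ". The function name splitMbox and the msg.001, msg.002, ... numbering are my own choices for illustration. It reads the archive line by line, so memory use stays small even for a large mailbox.

```php
<?php
// Hypothetical helper (not part of formail or mime_parser):
// split an mbox file into one file per message.
// Assumption: each message starts with a line beginning with "From ".
function splitMbox(string $mboxPath, string $outDir): int
{
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory $outDir");
    }
    $in = fopen($mboxPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $mboxPath");
    }

    $count = 0;
    $out = null;
    // Read one line at a time instead of loading the whole archive.
    while (($line = fgets($in)) !== false) {
        if (strncmp($line, 'From ', 5) === 0) {
            // A new message begins: close the previous file, open the next.
            if ($out !== null) {
                fclose($out);
            }
            $count++;
            $out = fopen(sprintf('%s/msg.%03d', $outDir, $count), 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write into $outDir");
            }
        }
        // Lines before the first "From " separator are skipped.
        if ($out !== null) {
            fwrite($out, $line);
        }
    }
    if ($out !== null) {
        fclose($out);
    }
    fclose($in);

    return $count; // number of messages written
}
```

Calling splitMbox('Inbox.mbs', 'extra') would then fill the extra directory. A body line that starts with an unescaped "From " would be treated as a new message here, so formail remains the safer choice for untidy archives.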


Then you can use PHP and the mime_parser class (download it here: mime_parser.php, rfc822_addresses.php.html) to iterate over all these files and extract the pictures (or other attachments).
<?php
// First split the mailbox into one file per message with:
// cat Inbox.mbs | formail -ds sh -c 'cat > extra/msg.$FILENO'
require_once('mime_parser.php');
require_once('rfc822_addresses.php');

$mail_dirs = array("extra");

function dirToArray($dir)
{
 
   $result = array();

   $cdir = scandir($dir);
   foreach ($cdir as $key => $value)
   {
      if (!in_array($value,array(".","..",".DS_Store")))
      {
         if (is_dir($dir . DIRECTORY_SEPARATOR . $value))
         {
            $result[$value] = dirToArray($dir . DIRECTORY_SEPARATOR . $value);
         }
         else
         {
            $result[] = $value;
         }
      }
   }
 
   return $result;
}

$mime=new mime_parser_class;
$mime->ignore_syntax_errors = 1;

foreach($mail_dirs as $mail_dir)
{
    $files = dirToArray($mail_dir);
    //print_r($files);
    foreach($files as $file)
    {
        echo "doing:". $mail_dir.'/'.$file . "\n";
        // read the whole message file; skip it if it cannot be read
        $email = file_get_contents($mail_dir.'/'.$file);
        if ($email === false)
        {
            echo "cannot read:". $mail_dir.'/'.$file . "\n";
            continue;
        }

      
      
        $parameters=array('Data'=>$email);

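        // skip messages that fail to decode or that have no MIME parts
        if (!$mime->Decode($parameters, $decoded) || !isset($decoded[0]['Parts']))
        {
            echo "skipping:". $mail_dir.'/'.$file . "\n";
            continue;
        }
        //print_r($decoded);
        $file_name = "";
        foreach($decoded[0]['Parts'] as $part)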
        {  
            //EVENT TIME: 2015-12-15,08:18:40
            //print_r($part);
            if(preg_match('/text\/plain/', $part['Headers']['content-type:']))
            {
                echo "TEXT\n";
                // only build a file name when the EVENT TIME line is present
                if (preg_match('/EVENT TIME: (.+)\n/', $part['Body'], $matches))
                {
                    print_r($matches);
                    //$file_name = mb_ereg_replace("([^\w\s\d\-_~,;\[\]\(\).])", '', $matches[1]);
                    // trim() drops the trailing \r left by CRLF line endings
                    $file_name = str_replace(":", '-', trim($matches[1]));
                    $file_name = str_replace(",", '_', $file_name);
                    echo $file_name . "\n";
                }
            }
            if(preg_match('/image\/jpeg/', $part['Headers']['content-type:']))
            {
                echo "IMAGE\n";
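                // keep the character before ".jpg" to tell pictures from the same mail apart
                if (isset($part['FileName']) && preg_match('/(.)\.jpg/', $part['FileName'], $matches))
                {
                    print_r($matches);
                    // the "extracted" directory must already exist
                    file_put_contents('extracted/'.$file_name.'_'.$matches[1].'.jpg', $part['Body']);
                }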
            }  
        }
    }
}

echo "ALL END\n";

?>


In this case I'm using the message content to extract information about the pictures so that I can rename them, as they all have the same name, and save them in a separate folder (extracted).
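
The renaming step can be isolated into a small function, which makes it easier to try on a sample message body before running it over the whole mailbox. This is only a sketch: the function name eventTimeToFileName is hypothetical, and it assumes the body contains a line of the form "EVENT TIME: 2015-12-15,08:18:40" as in my mails.

```php
<?php
// Hypothetical helper: turn the "EVENT TIME: ..." line of a message body
// into a string that is safe to use in a file name.
// Returns null when the body has no such line.
function eventTimeToFileName(string $body): ?string
{
    // Capture everything after "EVENT TIME: " up to the end of the line.
    if (!preg_match('/EVENT TIME: ([^\r\n]+)/', $body, $m)) {
        return null;
    }
    // ":" becomes "-" and "," becomes "_", as in the script above.
    return str_replace([':', ','], ['-', '_'], trim($m[1]));
}

// Example with a made-up message body:
echo eventTimeToFileName("Camera 1\nEVENT TIME: 2015-12-15,08:18:40\n") . "\n";
// prints 2015-12-15_08-18-40
```

Returning null for a missing line lets the caller decide whether to skip the picture or fall back to another name.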
