PHPMailer bug: splits the subject line in the middle of a multi-byte character

PHPMailer has long been one of my favorite libraries for sending email. I thought it was a really mature library, and I still believe that; it keeps getting more mature. I was using PHPMailer v1.73.

Last week I saw a major bug in my application: it produced junk characters (mojibake) in the subject line. All my content is in UTF-8, and I had configured every setting to make the email completely UTF-8. I read through my project's code 2-3 times but found no problem there. So I turned to Google to search for a solution and found the following link.

#2957 ([PATCH] Long utf-8 encoded subject fails (phpmailer bug)) – symfony – Trac

Then I realized that the problem is in the PHPMailer library itself. Here is the actual problem: if a subject line contains non-US-ASCII characters and the encoding is set to UTF-8, PHPMailer occasionally splits the subject line in the middle of a multi-byte character, causing the encoded representation to appear in the email client. The problem shows up especially when the subject contains multi-byte characters and is longer than 20 characters; you can usually find the garbling around or after the 20th character.
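To make the failure concrete, here is a minimal sketch of the difference between byte-based and character-aware splitting. It is not PHPMailer's code or the patch itself: the helper names isValidUtf8() and encodeSubject(), and the 30-byte chunk budget, are assumptions made only for this illustration. The file must be saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration; this is not PHPMailer's actual code.

// True when $bytes is well-formed UTF-8 (the /u modifier rejects invalid input).
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Encode a UTF-8 subject as RFC 2047 base64 encoded-words, breaking only
// between whole characters. $maxBytes is an assumed per-chunk budget.
function encodeSubject(string $subject, int $maxBytes = 30): array
{
    $chars = preg_split('//u', $subject, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Subject is not valid UTF-8');
    }
    $chunks = [];
    $current = '';
    foreach ($chars as $char) {
        // strlen() counts bytes, so a 3-byte character costs 3 here.
        if ($current !== '' && strlen($current) + strlen($char) > $maxBytes) {
            $chunks[] = $current;
            $current = '';
        }
        $current .= $char;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return array_map(function (string $chunk): string {
        return '=?UTF-8?B?' . base64_encode($chunk) . '?=';
    }, $chunks);
}

$subject = str_repeat('日本語', 10);   // 30 characters, 90 bytes

// Byte-based splitting cuts a character in half: 20 is not a multiple of 3.
$naive = str_split($subject, 20);
var_dump(isValidUtf8($naive[0]));      // bool(false)

// Character-aware splitting keeps every chunk decodable on its own.
$words = encodeSubject($subject);
echo 'Subject: ' . implode("\r\n ", $words), PHP_EOL;
```

Each encoded-word produced this way decodes to valid UTF-8 by itself, which is what a mail client needs in order to display the subject without junk characters.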

#2957: encoding.patch – symfony – Trac

The link above also provides a solution to the problem. But before adopting the patch, I visited the PHPMailer site (http://phpmailer.codeworxtech.com) and found that PHPMailer 2.2.1 had already been released on July 20, 2008. I downloaded the latest version and checked its code; the same patch is included in that release. So I upgraded to PHPMailer 2.2.1.

8 Steps To Avoid Mojibake

Mojibake is a common problem, usually seen on non-English sites and programs. Non-English content needs special treatment while it is processed, saved, displayed, or exported. Many mistakes made during software development can create mojibake, and a few precautions help to avoid it.
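As a small illustration of how mojibake arises, the sketch below takes UTF-8 bytes and re-encodes them under the wrong assumption that they are ISO-8859-1. The function name latin1ToUtf8() is made up for this example, and the conversion is written by hand so that no extension is required.

```php
<?php
// Hypothetical helper: re-encode a byte string to UTF-8 as if every byte
// were a single ISO-8859-1 character.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                           // ASCII stays as-is
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));  // two-byte UTF-8 sequence
    }
    return $out;
}

$original = "caf\xC3\xA9";            // "café" stored as UTF-8 bytes
$garbled  = latin1ToUtf8($original);  // wrong assumption: the bytes are ISO-8859-1
echo $garbled, PHP_EOL;               // prints "cafÃ©"
```

The two bytes of "é" are each treated as a separate character, which is the typical shape of the junk seen when one layer of a system guesses the wrong encoding.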

Following are the steps to avoid mojibake:

  1. Check the encoding of the program file. Often a file is not saved in the required encoding, or is corrupted, so the file itself contains mojibake characters. To avoid this, use an editor that supports the required encoding, and check the encoding/charset setting when saving (many editors provide a “Character Code” or “Encoding” option in the Save/Save As dialog box).
  2. Verify the following properties in the database at every level (server, connection, database, table, field):
    - Charset, Collation
  3. Set a suitable charset or locale just after connecting to the DB. This alone solves some big problems.
    For example, if you are using PHP with MySQL:

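    /************** PHP Code starts *****************/
    // Connect to MySQL (the fourth argument forces a new link)
    mysql_connect($host, $uid, $pwd, true);
    // Select the database
    mysql_select_db($dbase);

    // Set the character set for the client and for results
    @mysql_query("SET CHARACTER SET $charset");

    // Set the client, connection and result character sets in one statement
    mysql_query("SET NAMES $charset");
    // Note: the mysql_* functions were removed in PHP 7.0;
    // mysqli_set_charset() is the counterpart in the mysqli extension.
    /************** PHP Code ends *******************/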

  4. Set a suitable character encoding in the web page using a meta tag, like the following:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5. Set a suitable charset for CSS and JavaScript files.
    Put something like the following on the first line of the CSS file:
    @charset "UTF-8";
    Define the charset attribute in the <script> tag.
  6. Set suitable values for the following properties when sending email from a program:
    CharSet, Encoding, ContentType
  7. When exporting data as CSV, set suitable values for the following headers:
    Content-Type, Content-Disposition, Content-Language, Content-Transfer-Encoding
  8. When importing, exporting, or converting content between different formats, always check the charset/encoding option in the program you use.
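As one way to picture step 7, the sketch below builds the four headers named there for a UTF-8 CSV download. The function name csvDownloadHeaders(), its parameters, and the default values are assumptions for illustration only; adjust the charset and language to match your data.

```php
<?php
// Hypothetical helper: returns the header lines from step 7 for a CSV download.
function csvDownloadHeaders(string $filename, string $charset = 'UTF-8', string $language = 'en'): array
{
    // Strip quotes and line breaks so the file name cannot break the header.
    $safeName = str_replace(['"', "\r", "\n"], '', $filename);

    return [
        "Content-Type: text/csv; charset={$charset}",
        "Content-Disposition: attachment; filename=\"{$safeName}\"",
        "Content-Language: {$language}",
        "Content-Transfer-Encoding: binary",
    ];
}

// Print the headers here; in a web request each line would be passed to header().
foreach (csvDownloadHeaders('report.csv') as $line) {
    echo $line, PHP_EOL;
}
```

The charset declared in Content-Type has to match the encoding of the bytes actually written to the CSV body; declaring UTF-8 while sending data in another encoding produces exactly the mojibake these steps are meant to prevent.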

Some terminology you need to know when dealing with i18n or l10n:

Mojibake:

http://en.wikipedia.org/wiki/Mojibake

Character encoding:

http://en.wikipedia.org/wiki/Character_encoding

Code page:

http://en.wikipedia.org/wiki/Code_page

Windows code page:

http://en.wikipedia.org/wiki/Windows_code_page

Unicode:

http://en.wikipedia.org/wiki/Unicode

UTF-8:

http://en.wikipedia.org/wiki/UTF-8

Shift JIS:

http://en.wikipedia.org/wiki/Shift_JIS

EUC:

http://en.wikipedia.org/wiki/Extended_Unix_Coding

EUC-JP:

http://en.wikipedia.org/wiki/Extended_Unix_Coding#EUC-JP

Collation:

http://en.wikipedia.org/wiki/Collation