8 Steps To Avoid Mojibake

Mojibake is a common problem on non-English sites and in programs that handle non-English text. Such content needs special care when it is processed, saved, displayed, or exported. Many mistakes made during software development produce mojibake, and a few precautions help avoid it.
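
To make the problem concrete, here is a minimal Python sketch of how mojibake arises: text is written in one encoding and read back in another. The helper name `misread`, the sample word, and the choice of Latin-1 as the wrong guess are illustrative assumptions, not taken from any particular program.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


original = "café"

# Saved as UTF-8 but displayed as if it were Latin-1.
garbled = misread(original, "utf-8", "latin-1")
print(garbled)  # cafÃ©

# When no bytes were lost, reversing the wrong step recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The repair only works here because Latin-1 maps every byte to a character. In general, once text has been saved after a wrong conversion, the original may not be recoverable, which is why the precautions matter.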

The following steps help avoid mojibake:

  1. Check the encoding of the program file. Often the file is not saved in the required encoding or is corrupted, so the file itself contains mojibake characters. To avoid this, use an editor that supports the required encoding, and check the encoding/charset property when saving the file (many editors provide a “Character Code” or “Encoding” property in the Save/Save As dialog box).
  2. Verify the following properties in the database at every level (server, connection, database, table, field):
    - Charset, collation
  3. Set a suitable charset or locale right after connecting to the database. This alone solves many mojibake problems.
    For example, if you are using PHP with MySQL:

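    /************** PHP Code starts *****************/
    // Connect to MySQL and select the database.
    // (The mysql_* functions were removed in PHP 7, so mysqli is used here.)
    $link = mysqli_connect($host, $uid, $pwd, $dbase);
    if (!$link) {
        die('Connection failed: ' . mysqli_connect_error());
    }

    // Set the connection charset, e.g. $charset = 'utf8mb4'.
    // This takes the place of the "SET CHARACTER SET" and "SET NAMES" queries.
    mysqli_set_charset($link, $charset);
    /************** PHP Code ends *******************/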

  4. Set a suitable character encoding in the web page using a meta tag, like the following:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5. Set a suitable charset for CSS and JavaScript files.
    Put something like the following on the first line of the CSS file:
    @charset "UTF-8";
    Define the charset attribute in the <script> tag.
  6. When sending email from a program, set suitable values for the following properties:
    CharSet, Encoding, ContentType
  7. When exporting data as CSV, set suitable values for the following headers:
    Content-Type, Content-Disposition, Content-Language, Content-Transfer-Encoding
  8. When importing, exporting, or converting content between formats, always check the charset/encoding setting in the program you use.
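
The check in step 1 can also be done programmatically. The following is a minimal Python sketch, assuming the required encoding is UTF-8. The function names `check_utf8` and `check_file` are hypothetical helpers introduced for illustration.

```python
import codecs


def check_utf8(data: bytes) -> dict:
    """Report whether data is valid UTF-8 and whether it starts with a BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"valid_utf8": valid, "has_bom": has_bom}


def check_file(path: str) -> dict:
    """Run the same check on a file, reading it as raw bytes."""
    with open(path, "rb") as f:
        return check_utf8(f.read())


# The same text saved in two encodings: only one passes the UTF-8 check.
print(check_utf8("日本語".encode("utf-8")))      # valid_utf8 is True
print(check_utf8("日本語".encode("shift_jis")))  # valid_utf8 is False
```

A file that passes is not guaranteed to be correct, since text that was already garbled and then re-saved as UTF-8 is still valid UTF-8. A file that fails is certainly not in the expected encoding.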

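Steps 6 and 7 can be sketched in Python with the standard library. This is only an illustration, assuming UTF-8 as the target encoding. `build_email`, `build_csv_download`, and the `CHARSET` constant are hypothetical names, and the returned header dictionary stands in for whatever your web framework uses to send response headers.

```python
import csv
import io
from email.message import EmailMessage

CHARSET = "utf-8"  # assumed target encoding for this sketch


def build_email(subject: str, body: str) -> EmailMessage:
    """Create a text email whose Content-Type header declares CHARSET."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body, charset=CHARSET)
    return msg


def build_csv_download(rows, filename: str = "export.csv"):
    """Return (headers, body) for a CSV download encoded in CHARSET."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    headers = {
        "Content-Type": f"text/csv; charset={CHARSET}",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    return headers, buffer.getvalue().encode(CHARSET)


msg = build_email("テスト", "本文です")
print(msg.get_content_charset())  # utf-8

headers, body = build_csv_download([["名前", "都市"], ["太郎", "東京"]])
print(headers["Content-Type"])  # text/csv; charset=utf-8
```

The point in both cases is the same: the declared charset and the encoding actually used for the bytes must agree.
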
Some terms worth knowing when dealing with i18n or l10n:

Mojibake:

http://en.wikipedia.org/wiki/Mojibake

Character encoding:

http://en.wikipedia.org/wiki/Character_encoding

Code page:

http://en.wikipedia.org/wiki/Code_page

Windows code page:

http://en.wikipedia.org/wiki/Windows_code_page

Unicode:

http://en.wikipedia.org/wiki/Unicode

UTF-8:

http://en.wikipedia.org/wiki/UTF-8

Shift JIS:

http://en.wikipedia.org/wiki/Shift_JIS

EUC:

http://en.wikipedia.org/wiki/Extended_Unix_Coding

EUC-JP:

http://en.wikipedia.org/wiki/Extended_Unix_Coding#EUC-JP

Collation:

http://en.wikipedia.org/wiki/Collation

4 thoughts on “8 Steps To Avoid Mojibake”

  1. It is a very good article for a person looking for a solution to such problems.

    I came to know from here only that such problems are known as mojibake.

    These steps can also be useful for other non-ASCII characters as well.

  2. Very useful information to avoid mojibake.
    This article includes almost all the steps you can check to avoid mojibake.

  3. Some additional tips:

    1. When creating pages in UTF-8 for the web always be sure to add a Byte Order Mark at the beginning of the document. In Dreamweaver you can find this under Modify -> Page Properties -> Title/Encoding as the option “Include Unicode Signature (BOM).”

    2. For Japanese email JIS (ISO-2022-JP) is still preferable to UTF-8 in terms of mail client support.

    More general info available here:
    http://www.denbushi.net/2003/02/28/avoiding-mojibake/
