Mojibake is a common problem on non-English websites and in non-English programs. Non-English content needs special treatment when it is processed, saved, displayed, or exported, and many mistakes made during software development cause mojibake. A few precautions help avoid it.
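To make the problem concrete, here is a minimal Python sketch that reproduces mojibake. `make_mojibake` and `repair` are hypothetical helper names. Text is encoded as UTF-8, but the bytes are then decoded as Windows-1252. Because no bytes are lost, reversing the wrong step recovers the original text.

```python
def make_mojibake(text: str, written_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Encode text with one codec, then decode the bytes with another.

    A mismatch between the writer's encoding and the reader's encoding
    is what produces mojibake.
    """
    raw = text.encode(written_as)
    # errors="replace" turns bytes that are undefined in read_as into U+FFFD
    # instead of raising an exception
    return raw.decode(read_as, errors="replace")


def repair(garbled: str, wrongly_read_as: str = "cp1252", really: str = "utf-8") -> str:
    """Undo one wrong decode; this only works if no bytes were replaced."""
    return garbled.encode(wrongly_read_as).decode(really)


if __name__ == "__main__":
    garbled = make_mojibake("café")
    print(garbled)          # cafÃ©
    print(repair(garbled))  # café
```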
The following steps help avoid mojibake:
- Check the encoding of each program file. A file is often not saved in the required encoding, or it is corrupted and already contains mojibake characters. To avoid this, use an editor that supports the required encoding, and check the encoding/charset property when saving (many editors provide a “Character Code” or “Encoding” option in the Save/Save As dialog box).
- Verify the following properties in the database at every level (server, connection, database, table, field):
  - Charset and collation. Also set a suitable charset or locale immediately after connecting to the database; this alone solves many problems.
For example, if you are using PHP with MySQL:
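/************** PHP Code starts *****************/
// Connect to MySQL and select the database in one call
// (the old mysql_* functions were removed in PHP 7, so mysqli is used here)
$link = mysqli_connect($host, $uid, $pwd, $dbase) or die(mysqli_connect_error());
// Set the connection charset right after connecting, e.g. $charset = 'utf8mb4'.
// This works like "SET NAMES $charset" and also tells the client library
// which charset mysqli_real_escape_string() must use.
mysqli_set_charset($link, $charset);
/************** PHP Code ends *******************/
- Set a suitable character encoding in the web page using a meta tag, like the following: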
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- Set a suitable charset for CSS and JavaScript files.
Put something like the following on the first line of the CSS file:
@charset "UTF-8";
For JavaScript, define the charset attribute in the <script> tag.
- When sending email from a program, set suitable values for the following properties:
CharSet, Encoding, ContentType
- When exporting data as CSV, set suitable values for the following header properties:
Content-Type, Content-Disposition, Content-Language, Content-Transfer-Encoding
- When importing, exporting, or converting content between formats, always check the charset/encoding setting in the program you are using.
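The first step above, checking a program file's encoding, can be automated. Here is a minimal Python sketch that assumes the files are supposed to be UTF-8; `check_encoding` and `check_file` are hypothetical names. It reports the first byte that is not valid in the expected encoding. It also flags U+FFFD replacement characters, which usually mean the text was already damaged by an earlier lossy conversion.

```python
import sys
from pathlib import Path


def check_encoding(data: bytes, expected: str = "utf-8") -> list[str]:
    """Return a list of problems found in data; an empty list means it looks fine."""
    try:
        text = data.decode(expected)  # strict decoding raises on invalid bytes
    except UnicodeDecodeError as exc:
        return [f"not valid {expected}: bad byte 0x{data[exc.start]:02x} at offset {exc.start}"]
    problems = []
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement characters")
    return problems


def check_file(path: str, expected: str = "utf-8") -> list[str]:
    """Read the raw bytes (without letting Python decode them) and check them."""
    return check_encoding(Path(path).read_bytes(), expected)


if __name__ == "__main__":
    # Usage: python check_encoding.py file1.php file2.css ...
    for name in sys.argv[1:]:
        for problem in check_file(name):
            print(f"{name}: {problem}")
```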
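The meta tag and `@charset` steps only help if the declared charset matches the bytes that were actually saved. Below is a minimal Python sketch of a consistency check. All names are hypothetical, and the regular expression is a simplification rather than a full HTML parser. The check looks for a meta charset within the first 1024 bytes, where the HTML standard requires the declaration to appear. It also reads a CSS file's leading `@charset` rule, which must be the very first bytes of the file. In both cases it then tries to decode the file with the declared charset.

```python
import re
from typing import Optional

# Simplified pattern: matches <meta charset="..."> and the
# http-equiv="Content-Type" form shown above; not a full HTML parser
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def html_charset(html: bytes) -> Optional[str]:
    """Charset declared by a meta tag in the first 1024 bytes, lowercased."""
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def css_charset(css: bytes) -> Optional[str]:
    """Charset from an @charset rule; it only counts at the very start of the file."""
    prefix = b'@charset "'
    if not css.startswith(prefix):
        return None
    end = css.find(b'";', len(prefix))
    return css[len(prefix):end].decode("ascii", errors="replace").lower() if end != -1 else None


def matches_declared(data: bytes, charset: Optional[str]) -> bool:
    """True if a charset is declared and the bytes really decode with it."""
    if charset is None:
        return False
    try:
        data.decode(charset)
    except (LookupError, UnicodeDecodeError):  # unknown codec name or bad bytes
        return False
    return True
```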
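For the email step, the exact property names (CharSet, Encoding, ContentType) depend on the mail library in use. As an illustration only, here is how the same three settings look with Python's standard `email` package; `build_message` is a hypothetical helper. `set_content` records the charset in the Content-Type header, and `cte` chooses the Content-Transfer-Encoding. Non-ASCII Subject text is encoded automatically when the message is serialized.

```python
from email.message import EmailMessage


def build_message(sender: str, to: str, subject: str, body: str,
                  charset: str = "utf-8", cte: str = "base64") -> EmailMessage:
    """Build a plain-text message with an explicit charset and transfer encoding."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    # Non-ASCII header text becomes RFC 2047 encoded words on output
    msg["Subject"] = subject
    # Sets Content-Type: text/plain; charset=... and Content-Transfer-Encoding
    msg.set_content(body, charset=charset, cte=cte)
    return msg


if __name__ == "__main__":
    message = build_message("from@example.com", "to@example.com", "テスト", "こんにちは")
    print(message.as_string())
```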
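For the CSV step, here is a minimal Python sketch; the function name and its defaults are assumptions. It builds the response body and the Content-Type and Content-Disposition headers together, so the declared charset always matches the bytes. It covers only the first two of the headers listed above. The default `utf-8-sig` codec prepends a BOM, which Excel uses to recognise UTF-8 CSV files. You could pass `encoding="shift_jis"` instead if the consumers expect Shift JIS. Note that the encoding name is reused as the charset label, which may need adjusting for other codecs.

```python
import csv
import io


def export_csv(rows, filename: str = "export.csv", encoding: str = "utf-8-sig"):
    """Return (headers, body) for a CSV download with a matching charset label."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)  # default line terminator is \r\n
    body = buffer.getvalue().encode(encoding)  # strict: fails on unencodable chars
    # The BOM lives in the body; the charset label itself is plain utf-8
    charset = "utf-8" if encoding == "utf-8-sig" else encoding
    headers = {
        "Content-Type": f"text/csv; charset={charset}",
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
    return headers, body
```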
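For the last step, converting between formats, the safest conversion fails loudly instead of silently replacing characters. Here is a minimal Python sketch that assumes you already know the source encoding; `convert_bytes` and `convert_file` are hypothetical names. It decodes with the source codec and re-encodes with the target codec, both in strict mode.

```python
from pathlib import Path


def convert_bytes(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode data from the source encoding to the target encoding.

    Both steps use the default errors="strict". A wrong source guess often
    raises UnicodeDecodeError (not always: latin-1 accepts any bytes), and a
    character the target cannot represent raises UnicodeEncodeError instead
    of quietly becoming '?' or U+FFFD.
    """
    text = data.decode(source)
    return text.encode(target)


def convert_file(src_path: str, dst_path: str, source: str, target: str = "utf-8") -> None:
    """Convert a whole file, e.g. a Shift JIS export into UTF-8."""
    Path(dst_path).write_bytes(convert_bytes(Path(src_path).read_bytes(), source, target))
```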
Some terminology you need to know when dealing with i18n or l10n:
Mojibake:
http://en.wikipedia.org/wiki/Mojibake
Character encoding:
http://en.wikipedia.org/wiki/Character_encoding
Code page:
http://en.wikipedia.org/wiki/Code_page
Windows code page:
http://en.wikipedia.org/wiki/Windows_code_page
Unicode:
http://en.wikipedia.org/wiki/Unicode
UTF-8:
http://en.wikipedia.org/wiki/UTF-8
Shift JIS:
http://en.wikipedia.org/wiki/Shift_JIS
EUC:
http://en.wikipedia.org/wiki/Extended_Unix_Coding
EUC-JP:
http://en.wikipedia.org/wiki/Extended_Unix_Coding#EUC-JP
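To see why these encodings are incompatible, here is a small Python sketch; `show_bytes` is a hypothetical helper. It prints the bytes that UTF-8, Shift JIS, and EUC-JP produce for the same three characters. Software that reads one of these byte sequences with the wrong table produces mojibake. Python's `cp932` codec is Windows code page 932, Microsoft's variant of Shift JIS, and for these three characters it gives the same bytes as Shift JIS.

```python
def show_bytes(text: str, encodings: list[str]) -> dict[str, str]:
    """Map each encoding name to the hex bytes it produces for text."""
    return {enc: text.encode(enc).hex(" ") for enc in encodings}


if __name__ == "__main__":
    for enc, hexbytes in show_bytes("日本語", ["utf-8", "shift_jis", "euc_jp", "cp932"]).items():
        print(f"{enc:>9}: {hexbytes}")
```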
Some additional tips:
1. When creating UTF-8 pages for the web, always be sure to add a Byte Order Mark (BOM) at the beginning of the document. In Dreamweaver you can find this under Modify -> Page Properties -> Title/Encoding as the option “Include Unicode Signature (BOM).”
2. For Japanese email JIS (ISO-2022-JP) is still preferable to UTF-8 in terms of mail client support.
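For the first tip, here is a minimal Python sketch that adds or removes the UTF-8 BOM (the bytes EF BB BF) at the byte level; `add_bom` and `strip_bom` are hypothetical names. Python's `utf-8-sig` codec does the same thing while encoding and decoding. `strip_bom` is useful for files where a BOM gets in the way. In PHP scripts, for example, a BOM before the opening `<?php` tag is sent to the browser as output, which can trigger "headers already sent" warnings.

```python
import codecs


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM unless the data already starts with it."""
    return data if data.startswith(codecs.BOM_UTF8) else codecs.BOM_UTF8 + data


def strip_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM, if present."""
    return data[len(codecs.BOM_UTF8):] if data.startswith(codecs.BOM_UTF8) else data


if __name__ == "__main__":
    page = "<p>日本語</p>".encode("utf-8")
    with_bom = add_bom(page)
    # utf-8-sig drops the BOM on decode; plain utf-8 would keep it as U+FEFF
    print(with_bom.decode("utf-8-sig"))
    print(strip_bom(with_bom) == page)  # True
```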
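For the second tip, Python's standard `email` package can produce an ISO-2022-JP message. Here is a minimal sketch using the legacy `MIMEText` and `Header` classes; `build_jis_message` is a hypothetical name. The body becomes 7-bit ISO-2022-JP, and the Subject is encoded as an ISO-2022-JP encoded word, so the whole message is plain ASCII on the wire. Characters that ISO-2022-JP cannot represent, such as emoji, raise UnicodeEncodeError.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_jis_message(sender: str, to: str, subject: str, body: str) -> MIMEText:
    """Plain-text message encoded as ISO-2022-JP for both the body and the Subject."""
    # The ISO-2022-JP body is pure 7-bit, so Content-Transfer-Encoding becomes 7bit
    msg = MIMEText(body, "plain", "iso-2022-jp")
    # Header() base64-encodes the subject as =?iso-2022-jp?b?...?=
    msg["Subject"] = Header(subject, "iso-2022-jp")
    msg["From"] = sender
    msg["To"] = to
    return msg
```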
More general information is available here:
http://www.denbushi.net/2003/02/28/avoiding-mojibake/