Old MySQL versions didn’t have good support for Arabic, so what I used to do is to configure MySQL tables with any character set e.g. latin-1 and configure PHP to read it and write in Windows-1256 and pass it to MySQL as is i.e. MySQL think it is “Latin-1” but in fact it is “Windows-1256”.
Needless to say cheating in the character set have many bad consensuses:
- Sorting will not work as expected
- Other applications will assume you are using the correct character set and you won’t be able to read or write your data. e.g. PHPMyAdmin could display something like this:
“ÈÓã Çááå ÇáÑÍãä ÇáÑÍíã”
How to fix it
I found this trick in MySQL documentation, what you need to do is convert any character field to “blob” then convert it back to the desired field type with the desired character set:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET cp1256;
You can change it and test it from phpMyAdmin.
Of course it would be wiser to automate the process using a php script.
If you want to store the data in UTF-8, you can convert it using the command:
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
NOTE: don’t convert from “blob” to “utf8” unless your data is actually stored using “utf8”.
Windows-1256 or UTF8
If you are application use only Arabic and English you can store the data in Windows-1256 (aka CP1256), but if you want to store more character set you need to use UTF8.
Even if you use Windows-1256 to store your data in MySQL you can still build applications with UTF8 interface, this done by using the SQL command:
SET NAMES utf8
This command means all data communication between your application (PHP page) and the server is UTF8, MySQL server could store in Windows-1256 or UTF-8 it doesn’t really matter. Keep in mind storing Arabic data in UTF8 means double sizing your MySQL data files.
If you see your data in question mark like this “????? ?????? ????” it means your “MySQL connection” character set is incompatible with “MySQL storage” character set, and when MySQL doesn’t know how to convert any character it replaces it with “?”.
A great way to migrate your application is to set the server default charset to “cp1256”, and set all new connections to use it:
init-connect='SET NAMES cp1256'