character set problem
I have a CentOS 5.x server on which I am running into character encoding problems. The server's $LANG variable is en_US.UTF-8, so I should be fine for Unicode there. But I am writing content from the web to a MySQL db using the PHP CLI, and when the data gets to the db, the non-English letters are garbled. The MySQL db uses the latin1 character set with the Swedish collation (latin1_swedish_ci), as this is the MySQL default. I understand that pumping UTF-8 characters into a latin1 db is going to cause problems, but I have also read that PHP itself has a default character set, and so does Apache. So before I go mucking around on the server too much, does anyone know what I have to do to resolve this? Here is the scenario:
Environment:
CentOS 5.x running en_US.UTF-8
MySQL db running latin1
PHP CLI character set: not defined in php.ini
PHP CGI character set: not defined in php.ini
Shell character set: should be en_US.UTF-8 (from echo $LANG)
Apache character set: not defined in httpd.conf

Task: the PHP CLI scrapes the web for text --> the PHP CLI writes the scraped text to the MySQL db.

Problem: non-English letters and characters like apostrophes come out garbled.

Side note: PHP functions such as preg_replace (to strip the apostrophes), addslashes and mysql_real_escape_string don't work on them, so PHP apparently is not able to see the apostrophe characters.
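A small shell sketch of what is most likely going on, assuming the scraped pages are UTF-8 and the bytes end up being read back as latin1. It shows that one UTF-8 letter turns into two latin1 characters, and that a typographic apostrophe (U+2019) is not the one-byte ASCII ' that addslashes() and a preg_replace on "'" look for. The helper name hexbytes is just for illustration.

```shell
#!/bin/sh
# Print the bytes of a string as lowercase hex with no separators.
hexbytes() {
  printf '%s' "$1" | od -A n -v -t x1 | tr -d ' \n'
}

# "é" in UTF-8 is two bytes: c3 a9.
e_utf8=$(printf '\303\251')
hexbytes "$e_utf8"; echo                   # c3a9

# Read those bytes as latin1 (one character per byte) and show them on a
# UTF-8 terminal: each byte becomes its own character.
printf '%s' "$e_utf8" | iconv -f ISO-8859-1 -t UTF-8; echo   # Ã©

# A typographic apostrophe (U+2019) is three bytes in UTF-8 ...
hexbytes "$(printf '\342\200\231')"; echo  # e28099
# ... whereas the ASCII apostrophe that addslashes() escapes is one byte.
hexbytes "'"; echo                         # 27
```

If the "apostrophes" in the scraped text are U+2019, PHP does see them; they simply are not the character the pattern or addslashes() is looking for.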
Hi,
Usually the best approach is to define one encoding for every part of the system and stick to it; if you are going to use many special characters, use UTF-8.

To configure MySQL to store data as UTF-8, add the following lines to the [mysqld] and [mysql.server] sections of your /etc/my.cnf:

Code:
# Enable UTF-8 for Server Storage
collation_server=utf8_unicode_ci
default-character-set=utf8
character_set_server=utf8

Note that these server defaults only apply to databases and tables created afterwards; existing latin1 tables keep their character set until you convert them.

Also take into account that the PHP script must set the connection to use utf8 as well. This is usually done with a 'SET NAMES utf8' query right after you open your database connection or, if you use mysqli, with the set_charset('utf8') method.

Finally, even if your script and your database are correctly configured, you may still see garbled data because of things like the charset/encoding used by your terminal, and if you are displaying the data via Apache/the web, the encoding of your page also needs to be set.

Best regards.
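In the spirit of "pick one encoding and stick to it", here is a shell sketch for checking that a scraped file really is UTF-8 before it is sent over a utf8 connection. The helper names is_utf8 and to_utf8 are hypothetical, and falling back to ISO-8859-1 for anything that is not valid UTF-8 is an assumption; a real page may declare some other charset in its Content-Type header or meta tag.

```shell
#!/bin/sh
# Succeed (exit status 0) if stdin is valid UTF-8. iconv rejects byte
# sequences that are illegal in the source encoding, so a UTF-8 -> UTF-8
# pass doubles as a validity check.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Write file "$1" to stdout as UTF-8. Valid UTF-8 passes through unchanged;
# anything else is ASSUMED to be latin1 (ISO-8859-1) and converted.
to_utf8() {
  if is_utf8 < "$1"; then
    cat "$1"
  else
    iconv -f ISO-8859-1 -t UTF-8 < "$1"
  fi
}
```

Usage would be something like to_utf8 page.html > page-utf8.html (hypothetical file names), after which the text can be inserted over a connection set up with SET NAMES utf8.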
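You can use mysqldump to make a backup of your database. Make another copy of it so you can experiment safely, and then use iconv to convert the backup.sql file from latin1 to UTF-8. (latin1 is essentially ISO-8859-1; strictly speaking, MySQL's latin1 is Windows-1252, which also assigns characters such as curly quotes to bytes 0x80 to 0x9F.)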
Code:
iconv -f iso-8859-1 -t utf8 backup.sql > backup-iconv.sql

You can find more details here

Regards
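Two details are easy to miss with the iconv approach. First, mysqldump uses utf8 for its output by default, so for the file to really be latin1 it has to be dumped with --default-character-set=latin1. Second, after conversion the dump still declares latin1 (the SET NAMES line and the DEFAULT CHARSET of each CREATE TABLE), so those declarations need updating too. A minimal sketch, where convert_dump, mydb and the file names are hypothetical:

```shell
#!/bin/sh
# Assumes the dump was taken with:
#   mysqldump --default-character-set=latin1 mydb > backup.sql
# so the file's bytes really are latin1.

convert_dump() {
  # 1. Re-encode the bytes from latin1 to UTF-8.
  # 2. Rewrite the charset declarations so MySQL reads the file as UTF-8
  #    and recreates the tables as utf8 instead of latin1.
  iconv -f ISO-8859-1 -t UTF-8 |
    sed -e 's/SET NAMES latin1/SET NAMES utf8/' \
        -e 's/CHARSET=latin1/CHARSET=utf8/g'
}

# Example use (not run here):
#   convert_dump < backup.sql > backup-utf8.sql
#   mysql mydb < backup-utf8.sql
```

One caution for the situation in the question: if PHP wrote UTF-8 bytes into the latin1 columns over a latin1 connection, the latin1 dump already contains UTF-8 bytes, and running iconv on it would encode them twice. In that case only the sed step is wanted. Check a few rows on the spare copy first.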