LinuxQuestions.org

vbsaltydog 10-01-2009 12:19 PM

character set problem
 
I have a CentOS 5.x server on which I am running into character encoding problems. $LANG on the server is en_US.UTF-8, so I should be good for Unicode there. But I am writing content scraped from the web to a MySQL db using the PHP CLI, and when the data gets to the db, the non-English letters are garbled. The MySQL db uses the latin1 character set with the Swedish collation, as this is the MySQL default. I understand that pumping utf8 characters into a latin1 db is going to cause problems, but I have also read that PHP itself has a default character set and so does Apache. So before I go mucking around on the server too much, does anyone know what I have to do to resolve this? Here is the scenario:

Environment:

CentOS 5.x running en_US.UTF-8
MySQL db running latin1
PHP CLI character set: not defined in php.ini
PHP CGI character set: not defined in php.ini
Shell character set: should be en_US.UTF-8 from echo $LANG
Apache character set: not defined in httpd.conf

Task:

php cli scrapes the web for text --> php cli writes the scraped text to mysql db.

Problem:

Non-English letters and characters like apostrophes come out garbled.

Side note: PHP functions such as preg_replace on apostrophes, addslashes and/or mysql_real_escape_string don't work, so PHP is apparently not able to see the apostrophe characters.
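
For anyone reading along, here is a rough sketch of what is probably going on at the byte level. It is not taken from the scraper itself: the hex helper is a made-up name, and the idea that the scraped apostrophes are typographic ones (U+2019) rather than ASCII ones is only an assumption, though it would explain why preg_replace and addslashes never match them.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes on stdin as one compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# e-acute encoded as UTF-8 is two bytes (octal escapes for portability).
utf8_e_acute=$(printf '\303\251' | hex)

# Treat those two bytes as latin1 and convert them to UTF-8 again:
# each byte turns into its own character, which is the typical garbling.
garbled=$(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8 | hex)

# A typographic apostrophe (U+2019) next to the plain ASCII one.
curly=$(printf '\342\200\231' | hex)
ascii=$(printf "'" | hex)

echo "e-acute as UTF-8:      $utf8_e_acute"
echo "same bytes via latin1: $garbled"
echo "curly apostrophe:      $curly"
echo "ASCII apostrophe:      $ascii"
```

If the curly apostrophe is what the pages contain, a pattern written with a plain ' (byte 27) has nothing to match against, because the character in the text is the three bytes e2 80 99.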

jhcaiced 10-01-2009 05:31 PM

Hi,

Usually the best approach is to define one encoding for all parts of the system
and stick to it. If you are going to use many special characters, then
use UTF-8.

To configure MySQL to store data as utf8, add the following lines to the
[mysqld] section of your /etc/my.cnf:

# Enable UTF-8 for Server Storage
collation_server=utf8_unicode_ci
character_set_server=utf8

and this one to the [client] section, so that the command-line tools also
default to utf8:

default-character-set=utf8

Also, take into account that the PHP script must set its connection to
utf8 as well. This is usually done with a 'set names utf8' query right after you
open your database connection or, if you use mysqli, with the set_charset('utf8')
method.
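
A quick way to see what 'set names utf8' changes is to run it from the shell and list the connection variables afterwards. This is only a sketch: check_charsets is a made-up helper name, and it assumes the mysql command-line client is installed and picks up its credentials from ~/.my.cnf.

```shell
#!/bin/sh
# Hypothetical helper: show which character sets a connection ends up with
# after SET NAMES. $1 is the database name; credentials are assumed to come
# from ~/.my.cnf.
check_charsets() {
    mysql "$1" -e "SET NAMES utf8; SHOW VARIABLES LIKE 'character_set_%';"
}
```

Calling check_charsets mydb (mydb being a placeholder name) should list character_set_client, character_set_connection and character_set_results as utf8, while character_set_database and character_set_server keep whatever the server and the database were created with.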

Finally, even if your script and your database are correctly
configured, the data may still look wrong when you view it, for example because of
the charset/encoding used by your terminal. If you are serving the data
via Apache/web, then the encoding of your page also needs to be set.

Best regards.

bathory 10-01-2009 05:33 PM

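You can use mysqldump to make a backup of your database. Make another copy of it so you can play safely, and then use iconv to transform the backup.sql file from latin1 (roughly iso-8859-1; MySQL's latin1 is really cp1252) to utf-8. Note that mysqldump writes utf8 by default, so pass --default-character-set=latin1 if you want the dump itself to be in latin1.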
Code:

iconv -f iso-8859-1 -t utf8 backup.sql > backup-iconv.sql
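Drop the database and restore it using the modified .sql. The CREATE TABLE statements in the dump will still say DEFAULT CHARSET=latin1, so change those to utf8 before restoring.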
You can find more details here
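
The conversion step could be wrapped up roughly like this. It is a sketch only: convert_dump is a made-up name, it assumes the dump really is latin1 (for example taken with mysqldump --default-character-set=latin1), and the sed patterns simply match the text mysqldump normally writes.

```shell
#!/bin/sh
# Hypothetical helper: convert a latin1 dump to UTF-8 and retarget its tables.
# $1 = input dump (assumed to really be latin1), $2 = output file.
# Caveat: sed rewrites every occurrence of these strings, including any that
# happen to appear inside the row data.
convert_dump() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" |
        sed -e 's/CHARSET=latin1/CHARSET=utf8/g' \
            -e 's/SET NAMES latin1/SET NAMES utf8/g' > "$2"
}
```

Run it as convert_dump backup.sql backup-iconv.sql and look through the result before dropping anything.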

Regards


All times are GMT -5.