LinuxQuestions.org
Old 10-01-2009, 12:19 PM   #1
vbsaltydog
Member
 
Registered: Nov 2005
Distribution: CentOS
Posts: 141

Rep: Reputation: 15
character set problem


I am running into character encoding problems on a CentOS 5.x server. $LANG is set to en_US.UTF-8, so the system locale should be fine for Unicode. However, I am writing content scraped from the web to a MySQL database using the PHP CLI, and by the time the data reaches the database the non-English letters are garbled. The database uses the latin1 character set with the Swedish collation (latin1_swedish_ci), since that is the MySQL default. I understand that pushing UTF-8 characters into a latin1 database will cause problems, but I have also read that PHP and Apache each have their own default character set. Before I start changing things on the server, does anyone know what I need to do to resolve this? Here is the scenario:

Environment:

CentOS 5.x running en_US.UTF-8
MySQL db running latin1
PHP CLI character set: not defined in php.ini
PHP CGI character set: not defined in php.ini
Shell character set: should be en_US.UTF-8 from echo $LANG
Apache character set: not defined in httpd.conf

Task:

PHP CLI scrapes the web for text --> PHP CLI writes the scraped text to the MySQL db.

Problem:

Non-English letters and characters such as apostrophes come out garbled.

Side note: PHP functions such as preg_replace (on apostrophes), addslashes, and mysql_real_escape_string have no effect on them, so PHP apparently does not see these as apostrophe characters.
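
For anyone reading along, here is a minimal Python sketch of what is probably going on. It assumes the scraped text is UTF-8 and that the bytes get read back as cp1252 somewhere along the way (MySQL documents its latin1 as cp1252 West European). It also assumes the "apostrophes" are typographic ones (U+2019), which would explain why replacing the plain ASCII apostrophe does nothing. The sample string and the function names are made up for illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse misread_as_cp1252 for text that was garbled exactly once."""
    return garbled.encode("cp1252").decode("utf-8")


# Made-up sample: "cafe" with an accented e, and "it's" with a
# typographic apostrophe (U+2019) instead of the ASCII one (U+0027).
sample = "caf\u00e9 it\u2019s"

garbled = misread_as_cp1252(sample)

# ascii() shows the code points without depending on the terminal encoding.
print(ascii(garbled))

# The round trip restores the original text.
print(undo_misread(garbled) == sample)

# Replacing the ASCII apostrophe leaves the typographic one untouched.
print(sample.replace("'", "") == sample)
```

If the text in the database matches the garbled form, the bytes themselves are probably intact and only the declared character set is wrong somewhere between PHP and MySQL.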
 
Old 10-01-2009, 05:31 PM   #2
jhcaiced
Member
 
Registered: Mar 2009
Distribution: CentOS - Ubuntu - Debian
Posts: 83

Rep: Reputation: 27
Hi,

Usually the best approach is to pick one encoding for every part of the system
and stick to it. If you are going to handle many special characters,
use UTF-8.

To configure MySQL to use utf8 for storage, edit /etc/my.cnf and
add the following lines to the [mysqld] and [mysql.server] sections:

# Enable UTF-8 for Server Storage
collation_server=utf8_unicode_ci
default-character-set=utf8
character_set_server=utf8

Also keep in mind that the PHP script must set its connection to
utf8 as well. This is usually done with a 'SET NAMES utf8' query right after you
open your database connection or, if you use mysqli, with the set_charset('utf8')
method.

Finally, even if your script and your database are correctly
configured, the data may still look wrong when you view it, for example because of
the charset/encoding used by your terminal. If you are serving the data
via Apache/web, the encoding of the page also needs to be set.
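
One way to take the terminal out of the picture is to look at the raw bytes instead of the rendered characters. The following is only a rough Python sketch of that idea; the helper name is made up, and how you obtain the bytes (a file, a query result) is up to you.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same character (e with acute accent) in two encodings.
utf8_e = "\u00e9".encode("utf-8")      # two bytes
latin1_e = "\u00e9".encode("latin-1")  # one byte

# Hex output does not depend on the terminal's charset.
print(utf8_e.hex(), is_valid_utf8(utf8_e))      # c3a9 True
print(latin1_e.hex(), is_valid_utf8(latin1_e))  # e9 False
```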

Best regards.
 
Old 10-01-2009, 05:33 PM   #3
bathory
Guru
 
Registered: Jun 2004
Location: Piraeus
Distribution: Slackware
Posts: 10,910

Rep: Reputation: 1326
You can use mysqldump to make a backup of your database. Make a second copy of it so you can experiment safely, then use iconv to convert the backup.sql file from latin1 (that is ISO-8859-1, I think) to UTF-8.
Code:
iconv -f iso-8859-1 -t utf8 backup.sql > backup-iconv.sql
Drop the database and restore it using the modified .sql.
You can find more details here
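
If iconv is not at hand, the same byte-level conversion can be sketched in Python. This assumes the dump file really is ISO-8859-1 encoded; the function name, the table name t, and the temporary files are made up for the demonstration. Like iconv, it only converts the bytes, so any charset declarations inside a real dump (for example DEFAULT CHARSET=latin1 in CREATE TABLE statements) stay as they are.

```python
import os
import tempfile


def transcode_file(src, dst, from_enc="iso-8859-1", to_enc="utf-8"):
    """Copy src to dst, converting the text from from_enc to to_enc."""
    # newline="" keeps the line endings exactly as they are in the dump.
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, newline="") as fout:
        for line in fin:
            fout.write(line)


# Tiny stand-in for backup.sql containing one latin1-encoded accented e.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "backup.sql")
dst = os.path.join(workdir, "backup-iconv.sql")
with open(src, "wb") as f:
    f.write(b"INSERT INTO t VALUES ('caf\xe9');\n")

transcode_file(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted)  # the single byte e9 is now the UTF-8 pair c3 a9
```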

Regards
 
  

