LinuxQuestions.org
02-10-2008, 10:17 AM   #1
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Rep: Reputation: 31
mysql utf8 traditional simplified chinese load data infile


I've got a few MySQL databases of Chinese-English dictionaries.

There's one file that I can't get to load correctly with "load data infile".

the file is here:
http://www.worldtradetown.com/15000-3.csv

It contains 15,000 traditional and simplified Chinese characters and their pinyin (romanized pronunciation).


This command:
mysql> load data infile '/mnt/data/study/chinese/15000-3.csv' into table pinyin fields terminated by '`';
produces a table whose contents look like this:

mysql> select * from pinyin limit 10;
+---------+---------------+
| chinese | pinyin        |
+---------+---------------+
| ?       | de5;;di2;;di4 |
| ?       | shi4          |
| ?       | bu4;;bu2      |
| ?       | wo3           |

Other tables in the same database return correct results, like this:
mysql> select * from chinese_frequency limit 10;
+--------+---------+--------+------+
| no     | chinese | pinyin | rank |
+--------+---------+--------+------+
| 214048 | 的      | di4    | 1    |
| 214048 | 的      | de5    | 2    |
| 70872  | 了      | liao3  | 3    |
| 70872  | 了      | le5    | 4    |
| 61364  | 我      | wo3    | 5    |

The database is utf8, and if you open that file in a web browser (Firefox) and set the encoding to UTF-8, both traditional and simplified characters show up. The same is true if you open it with OpenOffice and select UTF-8 as the character encoding.

So why doesn't "load data infile" bring the characters into MySQL correctly?
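
A browser or OpenOffice view depends on the encoding picked in the viewer, so a byte-level check is a stricter way to confirm the claim that the file is UTF-8. The following is a minimal Python sketch, assuming a local copy of the file; the helper name `check_utf8` is made up for illustration and is not part of MySQL or any library.

```python
from pathlib import Path


def check_utf8(path):
    """Return (is_valid_utf8, non_ascii_count) for the file at `path`.

    Hypothetical helper, for illustration only.
    """
    raw = Path(path).read_bytes()
    try:
        # Strict decoding: raises on the first invalid byte sequence.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False, 0
    # Count code points outside ASCII; every Chinese character is one of them.
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return True, non_ascii
```

Calling `check_utf8('/mnt/data/study/chinese/15000-3.csv')` on the file from this post should return `True` plus a non-zero count if the file really is UTF-8 with Chinese text in it. A `False` result would point at the file itself rather than at MySQL.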
 
02-10-2008, 10:52 AM   #2
secretlydead
even weirder

I did more extensive testing by reloading a file that I had already used for a different table. The old table is called ldc_ec and the new one ldc_ec2:


mysql> describe ldc_ec;
+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| english | varchar(1000) | YES  |     | NULL    |       |
| chinese | varchar(1000) | YES  |     | NULL    |       |
+---------+---------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> create table ldc_ec2 (english varchar(1000), chinese varchar(1000));
Query OK, 0 rows affected (0.01 sec)

mysql> load data infile '/mnt/data/study/chinese/dictionary/ldc-ec-dict.csv' into table ldc_ec2 fields terminated by '`';
Query OK, 110834 rows affected, 2 warnings (1.03 sec)
Records: 110834 Deleted: 0 Skipped: 0 Warnings: 2

mysql> select * from ldc_ec2 limit 10;
+----------+---------+
| english  | chinese |
+----------+---------+
| -er      | ?       |
| -est     | ?       |
| -ian     | ?       |
| -ism     | ??      |
| -ist     | ?       |
| -itis    | ?       |
| -ization | ?       |
| -less    | ?       |
| -ly      | ?;;?    |
| -ology   | ?       |
+----------+---------+
10 rows in set (0.01 sec)

mysql> select * from ldc_ec limit 10;
+----------+----------+
| english  | chinese  |
+----------+----------+
| -er      | 家       |
| -est     | 最       |
| -ian     | 家       |
| -ism     | 主义     |
| -ist     | 家       |
| -itis    | 炎       |
| -ization | 化       |
| -less    | 无       |
| -ly      | 地;;然   |
| -ology   | 学       |
+----------+----------+
10 rows in set (0.12 sec)



So it has nothing to do with the file: the same file that loaded correctly into ldc_ec now produces question marks in ldc_ec2. The system seems to have broken all of a sudden. Does anyone have any idea how this could happen?
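
One detail in the ldc_ec2 output may help narrow this down: each Chinese character became exactly one '?' ("主义" turned into "??"). The sketch below, in Python rather than MySQL, contrasts two ways text can get damaged. It only models the effect. Python's latin-1 codec stands in for "some character set that cannot hold Chinese", which is an assumption and not something known about this server, and both function names are hypothetical.

```python
# Sample value taken from the ldc_ec output above.
SAMPLE = "主义"


def lossy_conversion(text, target="latin-1"):
    """Correctly decoded text converted to a charset that lacks the characters.

    Python substitutes one '?' per character it cannot encode.
    """
    return text.encode(target, errors="replace").decode(target)


def misread_bytes(text, wrong="latin-1"):
    """UTF-8 bytes misinterpreted as a single-byte charset (mojibake)."""
    return text.encode("utf-8").decode(wrong)


# Two characters in, two question marks out.
print(repr(lossy_conversion(SAMPLE)))
# Each 3-byte UTF-8 character turns into 3 symbols, so the length is 6.
print(len(misread_bytes(SAMPLE)))
```

The one-'?'-per-character pattern matches the first case, which suggests the file may be read correctly and the loss may happen at the destination. Under that assumption, comparing `SHOW CREATE TABLE ldc_ec` with `SHOW CREATE TABLE ldc_ec2` and looking at `SHOW VARIABLES LIKE 'character_set%'` would be reasonable next steps. MySQL versions that support it also accept a `CHARACTER SET` clause in `LOAD DATA INFILE`.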
 
02-10-2008, 11:14 AM   #3
secretlydead
I just remembered that I installed drakconf, the Mandriva control panel, today. It must have something to do with that.
 
02-11-2008, 11:35 AM   #4
secretlydead
I also got this message:

(gnome-terminal:11496): Vte-WARNING **: Can not find appropiate font for character U+e19a.
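
The code point named in that warning can be looked up. The following minimal Python sketch (the helper `describe` is hypothetical) reports its Unicode general category and whether it falls in the Private Use Area of the Basic Multilingual Plane, U+E000 to U+F8FF.

```python
import unicodedata


def describe(code_point):
    """Return (general category, is in BMP Private Use Area) for a code point."""
    ch = chr(code_point)
    in_bmp_pua = 0xE000 <= code_point <= 0xF8FF
    return unicodedata.category(ch), in_bmp_pua


# Category "Co" means "private use".
print(describe(0xE19A))
```

Private-use code points have no standard glyphs, so a terminal finding no font for U+E19A is expected. That such a code point showed up at all may mean some bytes were decoded under a different character set than intended, though nothing in this thread confirms that.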
 
  

