LinuxQuestions.org


secretlydead 02-10-2008 09:17 AM

mysql utf8 traditional simplified chinese load data infile
 
I've got a few MySQL databases of Chinese-English dictionaries.

There is one file that I can't load with "load data infile".

the file is here:
http://www.worldtradetown.com/15000-3.csv

It contains 15,000 traditional and simplified Chinese characters and their pinyin (romanized pronunciation).


This command:
mysql> load data infile '/mnt/data/study/chinese/15000-3.csv' into table pinyin fields terminated by '`';
produces a table whose rows look like this:

mysql> select * from pinyin limit 10;
+---------+---------------+
| chinese | pinyin        |
+---------+---------------+
| ?       | de5;;di2;;di4 |
| ?       | shi4          |
| ?       | bu4;;bu2      |
| ?       | wo3           |

Other tables in the same database display correctly, like this:
mysql> select * from chinese_frequency limit 10;
+--------+---------+--------+------+
| no     | chinese | pinyin | rank |
+--------+---------+--------+------+
| 214048 | 的      | di4    | 1    |
| 214048 | 的      | de5    | 2    |
| 70872  | 了      | liao3  | 3    |
| 70872  | 了      | le5    | 4    |
| 61364  | 我      | wo3    | 5    |

The database is UTF-8. If you open the file in a web browser (Firefox) and set the encoding to UTF-8, both the traditional and the simplified characters show up. The same is true if you open it with OpenOffice and select UTF-8 as the character encoding.
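The browser and OpenOffice checks can also be repeated without a GUI. Below is a minimal Python sketch that decodes the whole file strictly as UTF-8 and splits each line on the backtick delimiter used in the load command. The function name `check_utf8_rows` is made up for illustration, and the check only says something about the file, not about how MySQL reads it.

```python
from pathlib import Path


def check_utf8_rows(path, delimiter="`"):
    """Decode the file strictly as UTF-8 and split each non-empty line into fields.

    Raises UnicodeDecodeError if any byte sequence is not valid UTF-8.
    """
    text = Path(path).read_bytes().decode("utf-8")  # strict error handling by default
    return [line.split(delimiter) for line in text.splitlines() if line]


# Example call (path taken from the post above):
#   rows = check_utf8_rows("/mnt/data/study/chinese/15000-3.csv")
#   print(len(rows), "rows read")
```

If this runs without a `UnicodeDecodeError`, the file itself is valid UTF-8 and the problem is more likely on the server or client side.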

So, why can't I load data infile into mysql?
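One thing that may be worth ruling out: LOAD DATA INFILE does not detect the file's encoding. The server interprets the bytes according to its `character_set_database` setting unless the statement names a character set itself. The Python sketch below only builds the statements to paste into the mysql client; it does not connect to anything. It assumes the server version accepts the `CHARACTER SET` clause of LOAD DATA, and the helper names `quote_sql_string` and `build_load_statement` are made up for illustration.

```python
def quote_sql_string(value):
    """Quote a Python string as a MySQL string literal (escapes backslash and single quote)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_load_statement(path, table, charset="utf8", delimiter="`"):
    """Return a LOAD DATA INFILE statement that names the file's character set explicitly."""
    if not charset.isalnum():
        raise ValueError("unexpected character set name: %r" % charset)
    table_sql = "`" + table.replace("`", "``") + "`"  # backtick-quoted identifier
    return (
        "LOAD DATA INFILE " + quote_sql_string(path)
        + " INTO TABLE " + table_sql
        + " CHARACTER SET " + charset
        + " FIELDS TERMINATED BY " + quote_sql_string(delimiter)
    )


# Statements to run in order: inspect the charset settings, inspect the table,
# then reload the file with the encoding stated explicitly.
statements = [
    "SHOW VARIABLES LIKE 'character_set%'",
    "SHOW CREATE TABLE `pinyin`",
    build_load_statement("/mnt/data/study/chinese/15000-3.csv", "pinyin"),
]
for stmt in statements:
    print(stmt + ";")
```

The first two statements show whether the server, the connection, or the table itself has ended up with a non-UTF-8 character set; the third is the original load command with the encoding spelled out.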

secretlydead 02-10-2008 09:52 AM

even weirder
 
I did more extensive testing by loading a file that I had already used for a different table. The old table is called ldc_ec and the new one ldc_ec2:


mysql> describe ldc_ec;
+---------+---------------+------+-----+---------+-------+
| Field   | Type          | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| english | varchar(1000) | YES  |     | NULL    |       |
| chinese | varchar(1000) | YES  |     | NULL    |       |
+---------+---------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> create table ldc_ec2 (english varchar(1000), chinese varchar(1000));
Query OK, 0 rows affected (0.01 sec)

mysql> load data infile '/mnt/data/study/chinese/dictionary/ldc-ec-dict.csv' into table ldc_ec2 fields terminated by '`';
Query OK, 110834 rows affected, 2 warnings (1.03 sec)
Records: 110834 Deleted: 0 Skipped: 0 Warnings: 2

mysql> select * from ldc_ec2 limit 10;
+----------+---------+
| english  | chinese |
+----------+---------+
| -er      | ?       |
| -est     | ?       |
| -ian     | ?       |
| -ism     | ??      |
| -ist     | ?       |
| -itis    | ?       |
| -ization | ?       |
| -less    | ?       |
| -ly      | ?;;?    |
| -ology   | ?       |
+----------+---------+
10 rows in set (0.01 sec)

mysql> select * from ldc_ec limit 10;
+----------+----------+
| english  | chinese  |
+----------+----------+
| -er      | 家       |
| -est     | 最       |
| -ian     | 家       |
| -ism     | 主义     |
| -ist     | 家       |
| -itis    | 炎       |
| -ization | 化       |
| -less    | 无       |
| -ly      | 地;;然   |
| -ology   | 学       |
+----------+----------+
10 rows in set (0.12 sec)



So it has nothing to do with the file itself: a file that loaded correctly into the old table now loads as question marks into the new one. The system just broke all of a sudden. Any idea how this could happen?
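One detail in the two result sets may be a clue: every Chinese character comes back as exactly one `?` (`主义` becomes `??`, `地;;然` becomes `?;;?`). That is the pattern a conversion into a character set with no Chinese characters, such as latin1, would typically produce, since each unrepresentable character is replaced by a single `?`. The Python sketch below imitates that substitution. It assumes latin1 as the lossy target, the helper name `to_charset_lossy` is made up for illustration, and it only mimics the effect rather than showing what the server actually did.

```python
def to_charset_lossy(text, charset="latin-1"):
    """Convert text to `charset`, replacing each unrepresentable character with '?'."""
    return text.encode(charset, errors="replace").decode(charset)


# Values taken from the ldc_ec table above.
print(to_charset_lossy("家"))      # ?
print(to_charset_lossy("主义"))    # ??
print(to_charset_lossy("地;;然"))  # ?;;?
```

If this is what is happening, the replacement may occur when the rows are stored (for example, a new table created with a non-UTF-8 default character set) or when they are sent to the client, so both the table definition and the connection character set are worth checking.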

secretlydead 02-10-2008 10:14 AM

Just remembered that I installed drakconf, the Mandriva control panel, today. It must have something to do with that.

secretlydead 02-11-2008 10:35 AM

I also got this message:
 
(gnome-terminal:11496): Vte-WARNING **: Can not find appropiate font for character U+e19a.
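For what it is worth, the code point in that warning can be looked up with Python's standard `unicodedata` module. U+E19A falls in the Unicode Private Use Area, which has no standard glyph, so ordinary fonts are unlikely to cover it. Whether it is related to the MySQL problem is only a guess.

```python
import unicodedata

ch = "\ue19a"  # the code point named in the gnome-terminal warning
category = unicodedata.category(ch)  # "Co" is the private-use category
in_bmp_private_use_area = 0xE000 <= ord(ch) <= 0xF8FF

print(category, in_bmp_private_use_area)  # Co True
```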


All times are GMT -5.