LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - General
User Name
Password
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Notices

Reply
 
LinkBack Search this Thread
Old 09-19-2007, 10:18 PM   #1
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Rep: Reputation: 31
convert sql file gb2312 -> utf8


I've got an sql file. Many parts are like this:

咱们的客户

which is sometimes Chinese (i mean this literally, not as an expression) to a web browser.


I wonder how I can convert those into Chinese.

I figured on opening the file with a text processor and then convert them somehow (not really sure how, though!) and then save the file as utf8.

Any ideas?
 
Old 09-19-2007, 10:24 PM   #2
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
p.s.

if the chinese characters up there are automatically converting in your browser, the sql file looks like this (but without the spaces)

& # 2 1 6 8 1 ; & # 2 0 2 0 4 ; & # 3 0 3 4 0 ; & # 2 3 4 5 8 ;
 
Old 09-20-2007, 10:09 AM   #3
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
so far...

so far, i've grabbed all the gb2312 code from the sql file with a grep statement like this:

grep -o "&#[[:digit:]]\{5\};" >> filename

and then put that in an mysql database, made an html file and ran it through the browser, made another file and imported both of those files into another mysql database, first database with primary auto_increment of num and numb.

now, i just wrote this php script:

<?php

//this converts from gb2312 to utf8. it just goes through a file and replaces all the strings like &#?????; to the utf8 one

$mysql = mysql_connect( 'localhost', '*', '*' );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

$query = "select gb2312, utf8 from piers.zh2312_1, piers.zh2312_2 where num=numb";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

echo $query;

echo 'test';
echo $num_result;

for ($i = 0; $i < $num_result; $i++)
{

$row = mysql_fetch_array($result);
$utf8 = $row['utf8'];
$gb2312 = $row['gb2312'];

$query2 = "perl -pi -e 's/".$gb2312."/".$utf8."/g' test.txt";
echo $query2;

//exec ("perl -pi -e 's/".$gb2312."/".$utf8."/g' test.txt");

}

?>

that's a test script, when executed and printed in UTF8 encoding, it shows the perl command over and over with two chinese characters next to each other; they should both look the same, verifying that you have put your database together correctly.

when you want to actually run the script, find the file name and replace it for test.txt and take the hide marks off of the exec command.
 
Old 09-20-2007, 11:04 AM   #4
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
ok, so that works fine. just run it with php from the command line.

problem, though, is making that damn database (which would have to be made every single time you'd want to convert a document!).

how can i make a complete database of all utf8 gb2312 characters?
 
Old 09-20-2007, 12:16 PM   #5
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,089

Rep: Reputation: 207Reputation: 207Reputation: 207
Hi

I don't know Chinese at all, but PHP has a function called html_entity_decode that converts html entity codes. Here's an example:
PHP Code:
<?php

$s 
"& # 2 1 6 8 1 ; & # 2 0 2 0 4 ; & # 3 0 3 4 0 ; & # 2 3 4 5 8 ;";
$s str_replace(' ','',$s);
$s html_entity_decode($sENT_QUOTES'utf-8');

echo 
$s;
?>
Running it, I got the result:
咱们的客

I have no idea if it's correct or not, but I hope u get my idea. Just read your file into a string, run html_entity_decode on it, and save.

To get a table of all chars, take a look at the get_html_translation_table function - I don't know what gb2312 is, but it might be there.
 
Old 09-20-2007, 07:55 PM   #6
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
yeah,

that's right.

now, would you have any idea how to write a script that grabs all data from a database, converts it and puts it back in?
 
Old 09-21-2007, 08:57 AM   #7
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
this will do a single column, taking the html code in a database and putting it back in as ut-8:

<?php

$username = 'whatever';
$password = 'whatever';

$mysql = mysql_connect( 'localhost', $username, $password );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

// put in your database name here:
$db = 'whatever';
$table = 'whatever';
$column = 'whatever';

$query = "select ".$column." from ".$db.".".$table."";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

for ($i = 0; $i < $num_result; $i++)
{
$thing1 = mysql_result($result,$i);

echo $thing1.'<br>';

$thing2 = html_entity_decode($thing1, ENT_QUOTES, 'utf-8');

echo $thing2;

$queryreplace = "update ".$db.".".$table." set ".$column." = '".$thing2."' where ".$column." = '".$thing1."'";
echo $queryreplace."<br>";
mysql_query($queryreplace);
}


?>

Last edited by secretlydead; 09-21-2007 at 09:23 AM.
 
Old 09-21-2007, 09:46 AM   #8
secretlydead
Member
 
Registered: Sep 2003
Location: Qingdao, China
Distribution: mandriva, slack, red flag
Posts: 248

Original Poster
Rep: Reputation: 31
this one does a whole table

this variant will convert an entire table (it should be easily modifiable to do an entire database, but my job only extends as far as tables):




<?php

$username = 'whatever';
$password = 'whatever';

$mysql = mysql_connect( 'localhost', $username, $password );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

// put in your database name here:
$db = 'whatever';
$table = 'whatever';


$fieldsquery = "select * from ".$db.".".$table."";
$fields_result = mysql_query($fieldsquery);
$num_fields = mysql_num_fields($fields_result);

for ($j = 0; $j < $num_fields ; $j++)
{

$column = mysql_field_name($fields_result, $j);

$query = "select ".$column." from ".$db.".".$table."";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

for ($i = 0; $i < $num_result; $i++)
{
$thing1 = mysql_result($result,$i);

echo $thing1.'<br>';

$thing2 = html_entity_decode($thing1, ENT_QUOTES, 'utf-8');

echo $thing2;

$queryreplace = "update ".$db.".".$table." set ".$column." = '".$thing2."' where ".$column." = '".$thing1."'";
echo $queryreplace."<br>";
mysql_query($queryreplace);
}

}

?>
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
convert file from UTF8 to ASCII encoding graemef Programming 8 12-15-2008 04:45 AM
zh_CN.gb2312 does not exist in locale -a damienduff Suse/Novell 1 02-23-2005 06:35 AM
in linux & c/c++: how do I convert an ascii string to utf8 & vice versa? davidh_uk Programming 2 02-06-2005 05:55 PM
kde with utf8 file names as root classic Slackware - Installation 2 01-08-2005 07:31 PM
is there any tool avilable to convert java/sql code into asp.net/sql2000 rddreamz Programming 1 11-08-2004 02:16 PM


All times are GMT -5. The time now is 02:04 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration