LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   convert sql file gb2312 -> utf8 (https://www.linuxquestions.org/questions/linux-general-1/convert-sql-file-gb2312-utf8-585966/)

secretlydead 09-19-2007 10:18 PM

convert sql file gb2312 -> utf8
 
I've got an sql file. Many parts are like this:

咱们的客户

which is sometimes Chinese (i mean this literally, not as an expression) to a web browser.


I wonder how I can convert those into Chinese.

I figured on opening the file with a text processor and then convert them somehow (not really sure how, though!) and then save the file as utf8.

Any ideas?

secretlydead 09-19-2007 10:24 PM

p.s.

if the chinese characters up there are automatically converting in your browser, the sql file looks like this (but without the spaces)

& # 2 1 6 8 1 ; & # 2 0 2 0 4 ; & # 3 0 3 4 0 ; & # 2 3 4 5 8 ;

secretlydead 09-20-2007 10:09 AM

so far...
 
so far, i've grabbed all the gb2312 code from the sql file with a grep statement like this:

grep -o "&#[[:digit:]]\{5\};" >> filename

and then put that in an mysql database, made an html file and ran it through the browser, made another file and imported both of those files into another mysql database, first database with primary auto_increment of num and numb.

now, i just wrote this php script:

<?php

//this converts from gb2312 to utf8. it just goes through a file and replaces all the strings like &#?????; to the utf8 one

$mysql = mysql_connect( 'localhost', '*', '*' );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

$query = "select gb2312, utf8 from piers.zh2312_1, piers.zh2312_2 where num=numb";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

echo $query;

echo 'test';
echo $num_result;

for ($i = 0; $i < $num_result; $i++)
{

$row = mysql_fetch_array($result);
$utf8 = $row['utf8'];
$gb2312 = $row['gb2312'];

$query2 = "perl -pi -e 's/".$gb2312."/".$utf8."/g' test.txt";
echo $query2;

//exec ("perl -pi -e 's/".$gb2312."/".$utf8."/g' test.txt");

}

?>

that's a test script, when executed and printed in UTF8 encoding, it shows the perl command over and over with two chinese characters next to each other; they should both look the same, verifying that you have put your database together correctly.

when you want to actually run the script, find the file name and replace it for test.txt and take the hide marks off of the exec command.

secretlydead 09-20-2007 11:04 AM

ok, so that works fine. just run it with php from the command line.

problem, though, is making that damn database (which would have to be made every single time you'd want to convert a document!).

how can i make a complete database of all utf8 gb2312 characters?

Guttorm 09-20-2007 12:16 PM

Hi

I don't know Chinese at all, but PHP has a function called html_entity_decode that converts html entity codes. Here's an example:
PHP Code:

<?php

$s 
"& # 2 1 6 8 1 ; & # 2 0 2 0 4 ; & # 3 0 3 4 0 ; & # 2 3 4 5 8 ;";
$s str_replace(' ','',$s);
$s html_entity_decode($sENT_QUOTES'utf-8');

echo 
$s;
?>

Running it, I got the result:
咱们的客

I have no idea if it's correct or not, but I hope u get my idea. Just read your file into a string, run html_entity_decode on it, and save.

To get a table of all chars, take a look at the get_html_translation_table function - I don't know what gb2312 is, but it might be there.

secretlydead 09-20-2007 07:55 PM

yeah,

that's right.

now, would you have any idea how to write a script that grabs all data from a database, converts it and puts it back in?

secretlydead 09-21-2007 08:57 AM

this will do a single column, taking the html code in a database and putting it back in as ut-8:

<?php

$username = 'whatever';
$password = 'whatever';

$mysql = mysql_connect( 'localhost', $username, $password );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

// put in your database name here:
$db = 'whatever';
$table = 'whatever';
$column = 'whatever';

$query = "select ".$column." from ".$db.".".$table."";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

for ($i = 0; $i < $num_result; $i++)
{
$thing1 = mysql_result($result,$i);

echo $thing1.'<br>';

$thing2 = html_entity_decode($thing1, ENT_QUOTES, 'utf-8');

echo $thing2;

$queryreplace = "update ".$db.".".$table." set ".$column." = '".$thing2."' where ".$column." = '".$thing1."'";
echo $queryreplace."<br>";
mysql_query($queryreplace);
}


?>

secretlydead 09-21-2007 09:46 AM

this one does a whole table
 
this variant will convert an entire table (it should be easily modifiable to do an entire database, but my job only extends as far as tables):




<?php

$username = 'whatever';
$password = 'whatever';

$mysql = mysql_connect( 'localhost', $username, $password );
if(!$mysql)
{
echo 'Cannot connect to database.';
exit;
}

// put in your database name here:
$db = 'whatever';
$table = 'whatever';


$fieldsquery = "select * from ".$db.".".$table."";
$fields_result = mysql_query($fieldsquery);
$num_fields = mysql_num_fields($fields_result);

for ($j = 0; $j < $num_fields ; $j++)
{

$column = mysql_field_name($fields_result, $j);

$query = "select ".$column." from ".$db.".".$table."";
$result = mysql_query($query);
$num_result = mysql_num_rows($result);

for ($i = 0; $i < $num_result; $i++)
{
$thing1 = mysql_result($result,$i);

echo $thing1.'<br>';

$thing2 = html_entity_decode($thing1, ENT_QUOTES, 'utf-8');

echo $thing2;

$queryreplace = "update ".$db.".".$table." set ".$column." = '".$thing2."' where ".$column." = '".$thing1."'";
echo $queryreplace."<br>";
mysql_query($queryreplace);
}

}

?>


All times are GMT -5. The time now is 07:45 PM.