LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
Linux - Software This forum is for Software issues.
Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.

04-17-2010, 08:54 PM   #1
tulane
LQ Newbie
 
Registered: Apr 2010
Posts: 5

Rep: Reputation: 0
MySQL encoding problems (HELP HELP HELP!)


Hi,

Sorry for the unprofessional subject, but I am literally at my wits' end. There's just something that I'm not getting, but I wish I knew what it was.

Here is the problem:

I recently migrated from one server running MySQL 3.23 (yes, it was a little outdated) to another running MySQL 5.0.77. Under MySQL 3.23 I had a database containing data in the cp1251 encoding. When I did a mysqldump of the data, it was converted to utf-8 for some reason (even though I specified --default-character-set=cp1251), and (AND THIS IS IMPORTANT) it came out as the iso-8859-1 subset of utf-8: instead of Cyrillic characters, all the data got exported as a bunch of vowels with various diacritics. I don't understand why, nor what I can do to prevent mysqldump from doing this.

It then, of course, imported as utf8 to the new server. While I can get the contents of the database to display as Cyrillic by setting the website's character encoding to "windows-1251", this is a backwards way of going about it. Furthermore, and importantly, sorting doesn't work properly.
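One likely explanation, offered as an assumption rather than something confirmed in this thread: MySQL 3.23 had no per-column character sets, so the cp1251 bytes were most likely stored under the server's default latin1 label. When that data was later converted to UTF-8 (whether during the dump or the import), each cp1251 byte was read as a latin1 character. MySQL's latin1 is in practice Windows-1252 (cp1252), so Cyrillic letters (cp1251 bytes 0xC0 to 0xFF) come out as accented Latin letters. This Python sketch reproduces the effect; the function name `mojibake` is hypothetical.

```python
# Sketch: reproduce the suspected double encoding seen in the dump.
# Assumption: the stored bytes were cp1251, but MySQL treated them as latin1,
# which MySQL implements as Windows-1252 (cp1252).


def mojibake(text: str) -> str:
    """Encode text as cp1251, then misread those bytes as cp1252."""
    raw = text.encode("cp1251")  # bytes that were actually in the old table
    # Note: Python's cp1252 codec raises on the five undefined bytes
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D), unlike MySQL's latin1.
    return raw.decode("cp1252")


garbled = mojibake("Привет")
print(garbled)                  # Ïðèâåò
print(garbled.encode("utf-8"))  # the UTF-8 bytes that end up in the dump
```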

So my question is:
1) Is there any way I can get mysqldump on the old server to recognize that the data is cp1251 and not iso-8859-1? That would solve my problem.
2) If that fails, is there any way I can convert the latin diacritic symbols currently stored as utf8 to iso-8859-1? (Converting from that to cp1251 should be fairly straightforward... ?... I guess? Maybe? Hopefully?)

I've already tried:
iconv -f utf8 -t iso-8859-1
and
iconv -f utf8 -t cp1251
on the dump files. Neither works; iconv tells me there is an illegal input sequence at position X. Googling that has given me no satisfactory answer.
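A plausible reason both iconv calls fail (an assumption; the thread never shows the offending bytes): `-t cp1251` cannot work because characters like Ï and ð do not exist in cp1251 at all. `-t iso-8859-1` breaks as soon as the text contains a cp1251 byte in the 0x80 to 0x9F range, such as the ellipsis (0x85), because MySQL's latin1 is really cp1252 and turned it into U+2026, which ISO-8859-1 lacks. This Python check mirrors the two attempts; `can_encode` is a hypothetical helper.

```python
# Sketch: which target encodings can hold the mis-decoded text?
# Mirrors the two iconv attempts (-t iso-8859-1 and -t cp1251).


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Cyrillic text with an ellipsis (cp1251 byte 0x85), garbled as in the dump.
sample = "Привет, мир…".encode("cp1251").decode("cp1252")

for enc in ("iso-8859-1", "cp1251", "cp1252"):
    print(enc, can_encode(sample, enc))
# iso-8859-1 False  (no U+2026 ellipsis in ISO-8859-1)
# cp1251 False      (no Ï, ð, ... in cp1251)
# cp1252 True

# Going through cp1252 and then reading the bytes as cp1251 recovers the text.
print(sample.encode("cp1252").decode("cp1251"))  # Привет, мир…
```

If that diagnosis is right, chaining two conversions, `iconv -f utf8 -t cp1252 dump.sql | iconv -f cp1251 -t utf8 > fixed.sql`, may do the same from the shell, though iconv's cp1252 table may still reject the five byte values cp1252 leaves undefined.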

I've already looked at every source I could in order to solve this dilemma. Please, if you have any ideas on this, HEEEEEELLLLLLP!
 
04-18-2010, 03:04 AM   #2
norbert74
Member
 
Registered: Apr 2006
Posts: 63

Rep: Reputation: 23
Can you post the complete statement you use for the dump?
I'm sure you have already checked this, but nevertheless I ask:
Do you use the --tab parameter for your mysqldump?
Are there tables that contain columns in several character sets?
 
04-18-2010, 05:01 PM   #3
tulane
LQ Newbie
 
Registered: Apr 2010
Posts: 5

Original Poster
Rep: Reputation: 0
I did try using the --tab option. I tried various mysqldump commands, but none would produce a file in anything other than utf-8, and none would produce a file that could be converted with iconv.

I DID finally solve the problem, though this has got to be the most convoluted and bass-ackwards way to do it. With any luck, some poor soul will find it useful.

I was using PuTTY to connect to the server, and PuTTY lets me set the character set it uses for display. If you set the display character set to utf-8, PuTTY is effectively performing an on-the-fly conversion from utf-8 to single-byte Latin-1 characters (i.e. the conversion that iconv can't seem to do). That got me thinking, so I turned on logging in PuTTY, then did:
cat mysqldump_file

I opened the log in WordPad on my Windows computer, where the default non-Unicode character set is cp1251. Sure enough, all the Cyrillic characters displayed correctly. From there, it was a simple step to save the file as Unicode, upload it to the server and load it into the database.

But what the heck? My faith has been shaken. Since when does Unix need to use Windows as a crutch?
 
04-18-2010, 08:07 PM   #4
tulane
LQ Newbie
 
Registered: Apr 2010
Posts: 5

Original Poster
Rep: Reputation: 0
No, I spoke too soon. That did not solve the problem, as the data that I saved in Windows is still Unicode.

I can't believe there's no way to resolve this. I mean, how hard can it be to do a simple unicode character to non-unicode character search-and-replace?
 
04-18-2010, 09:13 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
You could try posting in the MySQL forums (http://forums.mysql.com/), or the mysqldump manual (http://dev.mysql.com/doc/refman/5.0/en/mysqldump.html) may help.
 
04-18-2010, 09:56 PM   #6
tulane
LQ Newbie
 
Registered: Apr 2010
Posts: 5

Original Poster
Rep: Reputation: 0
Solved it.

Ran a simple
sed -i 's/À/А/g' mysqldump_file   # Latin À (U+00C0) -> Cyrillic А (U+0410)
for each letter.

It was only a little painful. If you were smarter than me, you might be able to do this in one regex. The Latin letters with diacritics are Unicode U+00C0 through U+00FF, while the corresponding Cyrillic letters are U+0410 through U+044F.
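A single-pass version of that per-letter substitution could look like the Python sketch below. It relies on the offset the post describes: U+00C0 to U+00FF shifted by 0x350 lands on U+0410 to U+044F. It also maps ¨ (U+00A8) and ¸ (U+00B8) to Ё and ё, since cp1251 stores those letters at 0xA8 and 0xB8; other cp1251 punctuation in the 0x80 to 0xBF range (for example №, which would show up as ¹) is deliberately not handled. The function name `latin_to_cyrillic` is hypothetical.

```python
import re

# Hypothetical one-pass replacement for the per-letter sed commands.
# Matches the accented Latin-1 letters plus ¨ and ¸ (where Ё/ё end up).
_PATTERN = re.compile("[\u00a8\u00b8\u00c0-\u00ff]")
# 0x0410 - 0x00C0 == 0x350: À (U+00C0) -> А (U+0410), ..., ÿ (U+00FF) -> я (U+044F)
_OFFSET = 0x0410 - 0x00C0
# Ё and ё sit outside the contiguous block in cp1251 (bytes 0xA8 and 0xB8).
_SPECIAL = {"\u00a8": "\u0401", "\u00b8": "\u0451"}


def latin_to_cyrillic(text: str) -> str:
    """Replace each mis-decoded Latin-1 letter with the Cyrillic letter
    that shares its cp1251 byte value."""

    def repl(match):
        ch = match.group(0)
        if ch in _SPECIAL:
            return _SPECIAL[ch]
        return chr(ord(ch) + _OFFSET)

    return _PATTERN.sub(repl, text)


print(latin_to_cyrillic("Ïðèâåò, ìèð"))  # Привет, мир
```

This rewrites every Latin-1 letter in the file, so any genuine French or German text in the dump would be turned into Cyrillic too. `str.translate` with a precomputed table would do the same job without a regex.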
 
04-18-2010, 09:57 PM   #7
tulane
LQ Newbie
 
Registered: Apr 2010
Posts: 5

Original Poster
Rep: Reputation: 0
And chris:
I searched extensively through all available documentation before posting here. I'm really not the question-asking type; I prefer to research stuff on my own. In this case, I just felt I had no choice...
 
04-19-2010, 01:18 AM   #8
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
Fair enough, I just thought the MySQL forums would know, if anyone did...
 
  



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problems with MySQL on SuSE: Can't Connect (/var/lib/mysql/mysql.sock) neocookie Linux - Software 8 02-07-2008 11:48 PM
Mplayer encoding problems brokenpromises Linux - Server 2 10-10-2007 01:49 AM
Character encoding problems eduac Linux - Software 5 08-02-2005 06:48 PM
Problems with ripperx encoding walterbyrd Linux - Software 1 03-09-2005 03:04 PM
mp3 encoding problems Stephanie Linux - General 9 11-16-2001 10:43 PM

All times are GMT -5.