Sorry for the unprofessional subject, but I am literally at my wits' end. There's just something that I'm not getting, but I wish I knew what it was.
Here is the problem:
I recently migrated from one server with MySQL 3.23 (yes, it was a little outdated) to another with MySQL 5.0.77. I had a database under MySQL 3.23 that contained data in the cp1251 encoding. When I did a mysqldump of the data, it was converted to utf-8 for some reason (I specified --default-character-set=cp1251) and (AND THIS IS IMPORTANT) to the iso-8859-1 subset of utf-8, i.e. instead of Cyrillic characters, all the data got exported as a bunch of vowels with various diacritics. I don't understand why, nor what I can do to prevent mysqldump from doing this.
It then, of course, imported as utf8 to the new server. While I can get the contents of the database to display as cyrillic by setting the character-encoding on the website as "windows-1251", this is a backwards way of going about it. Furthermore, and importantly, sorting doesn't work properly.
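For anyone trying to picture what happened to the data, here is a minimal Python sketch of the mix-up described above. It assumes the cp1251 bytes were treated as latin1 somewhere along the way and then written out as UTF-8; that is a guess at the mechanism, not something the thread confirms. The sample word and the function name are made up for illustration.

```python
# Hypothetical sample text; not taken from the real database.
SAMPLE = "Привет"


def simulate_dump(text: str) -> bytes:
    """Mimic the suspected mix-up: cp1251 bytes read as latin1, written as UTF-8."""
    stored = text.encode("cp1251")      # bytes as they sat in the old tables
    misread = stored.decode("latin-1")  # each byte taken as a Latin-1 character
    return misread.encode("utf-8")      # what ends up in the dump file


if __name__ == "__main__":
    dumped = simulate_dump(SAMPLE)
    # Prints accented Latin letters rather than Cyrillic.
    print(dumped.decode("utf-8"))
```

Plain ASCII (table names, SQL keywords) passes through unchanged, which would explain why the dump still looks like valid SQL.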
So my question is:
1) Is there any way I can get mysqldump on the old server to recognize that the data is cp1251 and not iso-8859-1? That would solve my problem.
2) If that fails, is there any way I can convert the latin diacritic symbols currently stored as utf8 to iso-8859-1? (Converting from that to cp1251 should be fairly straightforward... ?... I guess? Maybe? Hopefully?)
I've already tried:
iconv -f utf8 -t iso-8859-1
and
iconv -f utf8 -t cp1251
of the dump files. It doesn't work: iconv tells me there is an illegal input sequence at position X. Googling that has given me no satisfactory answer.
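A possible explanation for the "illegal input sequence" errors, offered as a guess rather than a diagnosis: converting straight to cp1251 fails because the accented Latin letters in the dump have no cp1251 equivalent, and converting to iso-8859-1 can fail if the dump contains characters such as the euro sign, which appear when bytes in the 0x80-0x9F range are read as Windows cp1252 (newer MySQL versions document their latin1 as cp1252). The Python sketch below reverses the mix-up in two steps: first recover the original bytes, then decode them as cp1251. The function names and file paths are hypothetical, and the whole thing assumes the dump really is cp1251 text that was misread as latin1.

```python
def to_original_bytes(text: str) -> bytes:
    """Map each misread character back to the single byte it came from."""
    raw = bytearray()
    for ch in text:
        try:
            # Assumption: the misreading used cp1252 (MySQL-style latin1).
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # cp1252 leaves a few bytes undefined; fall back to ISO-8859-1.
            # This still raises for characters outside both encodings.
            raw += ch.encode("latin-1")
    return bytes(raw)


def repair(text: str) -> str:
    """Reinterpret the recovered bytes as cp1251."""
    # errors="replace" because byte 0x98 is undefined in cp1251.
    return to_original_bytes(text).decode("cp1251", errors="replace")


def repair_file(src: str, dst: str) -> None:
    """Read a UTF-8 dump, repair it line by line, write UTF-8 back out."""
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(repair(line))


if __name__ == "__main__":
    print(repair("Ïðèâåò"))
```

Calling repair_file("dump.sql", "dump_fixed.sql") (both names are placeholders) would produce a dump with real Cyrillic in UTF-8. Any character set declarations inside the dump itself would still need checking by hand before loading it.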
I've already looked at every source I could in order to solve this dilemma. Please, if you have any ideas on this, HEEEEEELLLLLLP!
Can you post the complete statement you use for the dump?
I'm sure you have already checked this, but nevertheless I ask:
Do you use the parameter --tab for your mysqldump?
Are there tables which contain columns in several char sets?
I did try using the --tab option. I tried various mysqldump commands, but none would produce a file in anything other than utf-8, and none would produce a file that could be converted by iconv.
I DID finally solve the problem, though this has got to be the most convoluted and bass-ackwards way to do it. With any luck, some poor soul will find it useful.
I was using PuTTY to connect to the server, and in PuTTY I could set the character set that PuTTY displayed for me. If you set the display character set to utf-8, PuTTY is effectively performing an on-the-fly conversion from utf-8 to single-byte characters (i.e. the conversion that iconv can't seem to do). That got me thinking, so I turned on logging in PuTTY, then did:
cat mysqldump_file
I opened the log in WordPad on my Windows computer, where the default non-Unicode character set is cp1251. Sure enough, all the Cyrillic characters were displaying correctly. From there, it was a simple step to save the file as a Unicode file, upload it to the server and load it into the database.
But what the heck? My faith has been shaken. Since when does Unix need to use Windows as a crutch?
No, I spoke too soon. That did not solve the problem, as the data that I saved in Windows is still Unicode.
I can't believe there's no way to resolve this. I mean, how hard can it be to do a simple unicode character to non-unicode character search-and-replace?
Ran a simple
sed -r -i 's/\u{hex_code_of_latin_symbol}/\u{hex_code_of_cyrillic_symbol}/g' mysqldump_file
for each letter.
It was only a little painful. If you were smarter than me you might be able to do this in one regex function. The Latin letters with diacritics are Unicode hex codes 00c0 through 00ff, while the Cyrillic ones are 0410 through 044f.
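Since the two ranges are the same length and in the same order, the per-letter replacement can be collapsed into one lookup table. Here is a minimal Python sketch of that idea; it is not the command that was actually run, and the names are made up. The two extra entries for Ё and ё are an addition based on the cp1251 layout (bytes 0xA8 and 0xB8), since those letters sit outside the 0410 through 044f range.

```python
LATIN_START, LATIN_END = 0x00C0, 0x00FF  # accented Latin range from the post
CYRILLIC_START = 0x0410                  # start of the Cyrillic range from the post

# One entry per code point: U+00C0 -> U+0410, ..., U+00FF -> U+044F.
TABLE = {
    cp: cp - LATIN_START + CYRILLIC_START
    for cp in range(LATIN_START, LATIN_END + 1)
}

# Assumption beyond the post: cp1251 keeps Ё and ё at bytes 0xA8 and 0xB8.
TABLE[0x00A8] = 0x0401  # Ё
TABLE[0x00B8] = 0x0451  # ё


def latin_to_cyrillic(text: str) -> str:
    """Shift every character in the accented Latin range to its Cyrillic twin."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(latin_to_cyrillic("Ïðèâåò"))
```

Note that this rewrites every character in that range, so any text in the dump that is legitimately accented Latin would be turned into Cyrillic as well.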
And chris:
I extensively searched through all available documentation before posting here. I'm really not the question asking type, preferring to research stuff on my own. In this case, I was just feeling I had no choice...