LinuxQuestions.org
Old 09-04-2011, 09:31 AM   #1
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Rep: Reputation: 30
iconv errors - French


I have a file containing French characters, and it needs to be in ISO-8859 to display correctly on my webpage.
I keep trying to convert it from UTF-8, but iconv errors out at the first slash / apostrophe.

Any ideas why?

Code:
<?php
/*
$Id: $

osCommerce, Open Source E-Commerce Solutions
http://www.oscommerce.com

Copyright © 2007 osCommerce

Released under the GNU General Public License
*/

define('TEXT_MAIN', '
<hr>
<b style="color: rgb(7,75,138);">
Proxy vous offre l\

iconv: illegal input sequence at position 266
Code:
<?php
/*
  $Id: $

  osCommerce, Open Source E-Commerce Solutions
  http://www.oscommerce.com

  Copyright (c) 2007 osCommerce

  Released under the GNU General Public License
*/

define('TEXT_MAIN', '
<hr>
<b style="color: rgb(7,75,138);">
Proxyllllll vous offre l\’accès à des serveurs proxy et réseau privé virtuel (VPN) pour une grande variété d’utilisations telles que :<br />
 
Old 09-04-2011, 11:01 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Since ISO-8859 is a more limited encoding than Unicode, chances are you've simply hit a character that it can't handle.

Not to mention that there are about 15 variations of that encoding, each with a slightly different set of supported characters. Latin-1 text is ISO-8859-1 or the "updated" ISO-8859-15, while parts 2-14 and 16 are designed with various other, mostly European, languages in mind.

http://en.wikipedia.org/wiki/ISO/IEC_8859

Finally, the glibc-based (GNU) iconv, at least, has two options for handling characters that the target encoding doesn't support:

Code:
iconv -f UTF-8 -t ISO-8859-1//IGNORE file	#delete any unsupported characters

iconv -f UTF-8 -t ISO-8859-1//TRANSLIT file	#attempt to replace unsupported
						#characters with similar characters
						#from the target encoding
PS: A better, long-term solution would probably be to convert your web page to UTF-8. ISO-8859 is old tech. The future is Unicode.

Last edited by David the H.; 09-04-2011 at 11:04 AM.
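
To make the failure mode concrete, here is a minimal Python sketch of what the two options do. It assumes the offending character in post #1 is the curly apostrophe (U+2019), which ISO-8859-1 has no slot for. The sample string and the one-entry replacement are illustrative assumptions, not iconv's actual transliteration table.

```python
# Sample modelled on the file in post #1; \u2019 is the curly apostrophe.
text = "Proxy vous offre l\u2019acc\u00e8s \u00e0 des serveurs"


def first_failure(s, encoding="iso-8859-1"):
    """Return (index, code point) of the first unencodable character, or None."""
    try:
        s.encode(encoding)
        return None
    except UnicodeEncodeError as err:
        return err.start, f"U+{ord(s[err.start]):04X}"


# Strict conversion stops at the curly apostrophe, much as plain iconv does.
failure = first_failure(text)

# Rough analogue of //IGNORE: drop whatever cannot be encoded.
ignored = text.encode("iso-8859-1", errors="ignore")

# Rough analogue of //TRANSLIT for this one character only (an assumed,
# hand-written mapping; iconv's own transliteration tables are larger).
translit = text.replace("\u2019", "'").encode("iso-8859-1")

print(failure)
print(ignored)
print(translit)
```

Note that the accented letters (è, à) convert without trouble in both variants; only the typographic apostrophe is affected.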
 
Old 09-04-2011, 11:06 AM   #3
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
UTF-8 doesn't seem to recognise French characters.
 
Old 09-04-2011, 11:16 AM   #4
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
Is there a Linux GUI program that can show you which characters are incorrect?
I keep trying to save my file as ISO-8859 with French characters, and gedit gives an error.

It does not like these characters in ISO-8859:
Avec ses prix compétitifs et ses forfaits flexibles, Proxyxxxxxx vous offre l'accès à un serveur PROXY et/ou RÉSEAU PRIVÉ VIRTUEL (VPN) géolocalisé en France, qui vous permettra d’accéder à l’internet sans restrictions de partout dans le monde et ce, pour une grande variété d'utilisat

Last edited by qwertyjjj; 09-04-2011 at 11:24 AM.
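
Not a GUI, but a short script can answer the "which characters are incorrect" question. The sketch below is only one possible approach: it reports every character that the target encoding cannot represent, with its line and column. The function name `unencodable` and the sample string (two fragments of the text quoted above) are assumptions made for illustration.

```python
import unicodedata


def unencodable(text, encoding="iso-8859-1"):
    """Yield (line, column, code point, Unicode name) for each character
    that the target encoding cannot represent."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                yield lineno, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")


# Two fragments of the text quoted above; \u2019 is the curly apostrophe.
sample = "l'acc\u00e8s \u00e0 un serveur PROXY\nd\u2019acc\u00e9der \u00e0 l\u2019internet"

for hit in unencodable(sample):
    print(hit)
```

To check a real file, the text could be read with `open(path, encoding="utf-8").read()`, where `path` is whatever file is being converted. In this sample the accented letters pass and only the curly apostrophes are flagged; the straight apostrophe in the first fragment is plain ASCII.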
 
Old 09-04-2011, 11:59 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
UTF-8 can handle anything. As I said, it's ISO-8859 that's limited. Did you try the IGNORE or TRANSLIT options I mentioned?

Read the Wikipedia link I gave you, and follow on to the sub-pages as necessary. The link to http://en.wikipedia.org/wiki/ISO/IEC_8859-1 says this about French, for example (slightly abbreviated):

Code:
Languages commonly supported but with incomplete coverage

Language  Missing characters  Typical workaround      Supported by
French    Œ, œ, and Ÿ         digraphs OE, oe, and Y  ISO-8859-15, Windows-1252
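
The coverage in that table can be checked directly. The following sketch uses Python's built-in codecs (`cp1252` is Python's name for Windows-1252) to test which of three candidate encodings can hold a given character. The helper name `coverage` and the choice of test characters are assumptions for illustration.

```python
# Candidate target encodings; "cp1252" is Python's codec name for Windows-1252.
CANDIDATES = ["iso-8859-1", "iso-8859-15", "cp1252"]


def coverage(ch):
    """Return the candidate encodings that can represent the character ch."""
    supported = []
    for enc in CANDIDATES:
        try:
            ch.encode(enc)
            supported.append(enc)
        except UnicodeEncodeError:
            pass
    return supported


for label, ch in [
    ("e acute", "\u00e9"),
    ("oe ligature", "\u0153"),
    ("Y diaeresis", "\u0178"),
    ("curly apostrophe", "\u2019"),
]:
    print(label, coverage(ch))
```

One detail worth noticing: of these three encodings, only Windows-1252 can represent the curly apostrophe, so switching the target to ISO-8859-15 would cover œ and Ÿ but not that character.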
 
Old 09-04-2011, 01:20 PM   #6
qwertyjjj
Senior Member
 
Registered: Jul 2009
Location: UK
Distribution: Cent OS5 with Plesk
Posts: 1,013

Original Poster
Rep: Reputation: 30
When I save the file in UTF-8, it displays weird characters instead of the French accents.
I'm using gedit in Linux rather than the terminal...
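
One common cause of this symptom is that the file really is UTF-8, but whatever displays it still decodes the bytes as ISO-8859-1. That is an assumption here, since the thread does not show how the page declares its character set. The sketch below reproduces the effect and shows that it is reversible; the function names are hypothetical.

```python
def misread_as_latin1(s):
    """Encode as UTF-8, then decode the bytes as ISO-8859-1 (the wrong way)."""
    return s.encode("utf-8").decode("iso-8859-1")


def repair(garbled):
    """Undo the misreading above; only valid if that is really what happened."""
    return garbled.encode("iso-8859-1").decode("utf-8")


original = "acc\u00e8s \u00e0"  # the French words "accès à"
garbled = misread_as_latin1(original)

# ascii() keeps the output readable on any terminal encoding.
print(ascii(original))
print(ascii(garbled))
print(ascii(repair(garbled)))
```

Each accented letter turns into two characters starting with Ã, because its two UTF-8 bytes are shown as two separate Latin-1 characters. If that is what is happening, the file itself is fine and the fix lies in making the page declare UTF-8 rather than in converting the file.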
 
  

