LinuxQuestions.org
07-17-2013, 02:50 PM   #1
alsaf
Member
 
Registered: Mar 2009
Distribution: Lubuntu 13.10
Posts: 40

Rep: Reputation: 0
Compare unicode characters in Python program


I am learning phrases in the French language at the moment and have written a wee Python program that creates a web-page containing these phrases and their English equivalents. I then copy this web-page to my Kindle, where I use it as flash-cards to memorise them.

I have the program working, but I am stuck on the part where I have to convert certain characters to their HTML entity equivalents so that they display on the web-page. The work-around I have used is to check for Unicode characters in each phrase string by converting each character to its numeric code with the ord() function. If the first value is 195, I then compare the value of the second character against an array that contains these codes. The code for this function is as follows:

Code:
def conv_unicode_str(vocab_str, unicode_arr):
  length_str = len(vocab_str)
  new_str = ""
  x = 0
  while x < length_str:
    # If current character at pos x is a unicode char
    if ord(vocab_str[x]) == 195:
      for entry in unicode_arr:
        if ord(vocab_str[x +1]) == int(entry[0]):
          new_str = new_str + entry[1]
          break
      # This will skip next char as unicode character is made up of 2 chars
      # in string
      x = x +2
    # Else if normal character then add to string 
    else:
      new_str = new_str + vocab_str[x]
      x = x + 1
  # return newly created string
  return (new_str)
The contents of the unicode file, which is copied into an array and used in the above function, are as follows:

Code:
# à
160=&agrave;
# â 
162=&acirc;
# ç
167=&ccedil;
# è
168=&egrave;
# é
169=&eacute;
# ì
172=&igrave;
# í
173=&iacute;
# î
174=&icirc;
# ò
178=&ograve;
# ó
179=&oacute;
# ù
185=&ugrave;
# ú
186=&uacute;
I've been googling but I can't find an easier way of doing this. Is there a better way?
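For reference, a minimal sketch of why 195 shows up and of one way to produce the same entities without a hand-written table. It assumes Python 3 (the code above looks like Python 2, where iterating over a byte string yields the individual UTF-8 bytes), and the function names utf8_byte_values and to_html_entities are made up for illustration. The standard library's html.entities.codepoint2name maps code points to HTML 4 entity names; anything not in that table falls back to a numeric reference.

```python
from html.entities import codepoint2name


def utf8_byte_values(text):
    """Return the UTF-8 byte values of text as a list of ints."""
    return list(text.encode("utf-8"))


def to_html_entities(text):
    """Replace every non-ASCII character with an HTML entity.

    Named entities come from the standard HTML 4 table; characters
    without a name get a numeric reference such as &#337;.
    """
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name:
            parts.append("&%s;" % codepoint2name[code])
        else:
            parts.append("&#%d;" % code)
    return "".join(parts)


# "\u00e9" is é; its UTF-8 encoding is the two bytes 195 and 169,
# which matches the 169=&eacute; entry in the table above.
print(utf8_byte_values("\u00e9"))      # [195, 169]
print(to_html_entities("caf\u00e9"))   # caf&eacute;
```

This works on whole characters rather than on byte pairs, so it is not limited to characters whose first UTF-8 byte is 195.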
 
07-17-2013, 03:31 PM   #2
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by alsaf View Post
I have to convert certain characters to their HTML Unicode equivalent to allow them to be displayed on the web-page.
You should be able to put actual Unicode on the web-page; for instance, this page containing your post does that. You may need to indicate the encoding in the HTML with a meta tag:

Code:
<!-- assuming you are encoding in utf-8 -->

<!-- Defining the charset in HTML4 -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<!-- In HTML5 -->
<meta charset="utf-8">
Also of possible interest: the Python Unicode HOWTO, which is available for both 2.x and 3.x.
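On the Python side, this could look roughly like the sketch below. It assumes Python 3 and an HTML5 page; build_page, write_page, the page layout and the file name phrases.html are all hypothetical, not taken from the original program. The point is only that the text is written out as UTF-8 and the page declares that encoding, so no entity conversion is needed.

```python
from html import escape


def build_page(phrases):
    """Return an HTML5 page listing (french, english) phrase pairs."""
    rows = "\n".join(
        # escape() only protects &, <, > and quotes; accented letters stay as-is
        "<p>%s<br>%s</p>" % (escape(fr), escape(en))
        for fr, en in phrases
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>Phrases</title>\n"
        "</head>\n<body>\n"
        "%s\n"
        "</body>\n</html>\n" % rows
    )


def write_page(path, phrases):
    """Write the page to path, encoded as UTF-8 to match the meta tag."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_page(phrases))
```

A call such as write_page("phrases.html", [("ça va ?", "how are you?")]) would then produce a file that can be copied to the reader as it is.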
 
07-17-2013, 03:42 PM   #3
alsaf
Member
 
Registered: Mar 2009
Distribution: Lubuntu 13.10
Posts: 40

Original Poster
Rep: Reputation: 0
ntubski, thanks for that. It works like a dream. To think I had spent all that time getting the workaround going (and believe me, there were times when I was close to tearing out what hair I've got), and it is so easy to do!!
 
  

