[SOLVED] Compare unicode characters in Python program

alsaf · 07-17-2013, 02:50 PM

I am learning phrases in the French language at the moment and have written a wee Python program that creates a web-page containing these phrases and their English equivalents. I then copy this web-page to my Kindle, where I use it as flash-cards to memorise the phrases.

I have the program working, but I am stuck on one part of it: I have to convert certain characters to their HTML entity equivalents so that they display on the web-page. My work-around is to look for these characters in each phrase string by getting the numeric value of each byte with the ord() function. If the first byte is 195, I compare the second byte of the character against an array which contains these codes. The code for this function is as follows:

Code:

def conv_unicode_str(vocab_str, unicode_arr):
  # vocab_str is a UTF-8 byte string; unicode_arr holds [code, entity] pairs
  length_str = len(vocab_str)
  new_str = ""
  x = 0
  while x < length_str:
    # 195 (0xC3) is the first byte of a two-byte UTF-8 character
    if ord(vocab_str[x]) == 195:
      for entry in unicode_arr:
        if ord(vocab_str[x + 1]) == int(entry[0]):
          new_str = new_str + entry[1]
          break
      # Skip the second byte as well, since the character takes up 2 bytes
      # in the string
      x = x + 2
    # Otherwise it is a normal character, so add it to the string unchanged
    else:
      new_str = new_str + vocab_str[x]
      x = x + 1
  # Return the newly created string
  return new_str

The contents of the unicode file, which are read into an array and used by the above function, are as follows:

Code:

# à
160=&agrave;
# â 
162=&acirc;
# ç
167=&ccedil;
# è
168=&egrave;
# é
169=&eacute;
# ì
172=&igrave;
# í
173=&iacute;
# î
174=&icirc;
# ò
178=&ograve;
# ó
179=&oacute;
# ù
185=&ugrave;
# ú
186=&uacute;

I've been googling but I can't find an easier way of doing this. Is there a better way?

ntubski · 07-17-2013, 03:31 PM

Quote:

Originally Posted by alsaf

I have to convert certain characters to their HTML Unicode equivalent to allow them to be displayed on the web-page.

You should be able to put actual Unicode on the web-page; for instance, this page containing your post does that. You may need to indicate the encoding in the HTML with a meta tag:

Code:

<!-- assuming you are encoding in utf-8 -->

<!-- Defining the charset in HTML4 -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<!-- In HTML5 -->
<meta charset="utf-8">
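
On the Python side, the file itself also has to be written as UTF-8 so that the bytes on disk match the tag. A minimal sketch, assuming Python 3; build_page, write_page and the page layout are names made up for illustration, not taken from your program:

```python
from html import escape


def build_page(phrases):
    """Return an HTML5 page (as text) listing (french, english) pairs."""
    # escape() only touches &, <, > and quotes; accented letters stay as-is.
    rows = "\n".join(
        "<p>{0} = {1}</p>".format(escape(french), escape(english))
        for french, english in phrases
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>Phrases</title>\n"
        "</head>\n<body>\n" + rows + "\n</body>\n</html>\n"
    )


def write_page(path, phrases):
    # Encode explicitly as UTF-8 so the file agrees with the meta tag above.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_page(phrases))
```

Calling write_page("phrases.html", [("ça va très bien", "it's going very well")]) should then give a page where the accented letters show up without any entity conversion.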

Also of possible interest: the Python Unicode HOWTO, for 2.x and 3.x.
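
If a device ever does insist on ASCII-only HTML, the standard library can produce the references for you, so a hand-made lookup file shouldn't be needed. A rough sketch, assuming Python 3 and text that is already decoded to Unicode (in 2.x the same table lives in htmlentitydefs rather than html.entities); to_numeric_refs and to_named_entities are hypothetical helper names:

```python
from html.entities import codepoint2name


def to_numeric_refs(text):
    """Replace every non-ASCII character with a numeric reference."""
    # The "xmlcharrefreplace" error handler emits &#NNN; for anything
    # that cannot be encoded as ASCII.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_entities(text):
    """Replace non-ASCII characters with named entities where one exists."""
    pieces = []
    for char in text:
        code = ord(char)
        if code < 128:
            # Plain ASCII: keep unchanged.
            pieces.append(char)
        elif code in codepoint2name:
            # Known HTML entity name, e.g. 233 -> "eacute".
            pieces.append("&" + codepoint2name[code] + ";")
        else:
            # No named entity: fall back to a numeric reference.
            pieces.append("&#{0};".format(code))
    return "".join(pieces)


print(to_numeric_refs("ça va très bien"))    # &#231;a va tr&#232;s bien
print(to_named_entities("ça va très bien"))  # &ccedil;a va tr&egrave;s bien
```

Neither helper escapes &, < or >, so those would still need separate handling if they can occur in the phrases.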

alsaf · 07-17-2013, 03:42 PM

ntubski, thanks for that. It works like a dream. To think, I had spent all that time getting the workaround going and, believe me, there were times when I was close to tearing out what hair I've got, and it is so easy to do!!