I am learning French phrases at the moment and have written a wee Python program that creates a web page containing these phrases and their English equivalents. I then copy this web page to my Kindle, where I use it as a set of flash cards to memorise them.
I have the program working, but I am stuck on the part where I have to convert certain characters to their HTML equivalents so that they display correctly on the web page. My current work-around is to check each phrase string for accented characters by getting the numeric value of each character with the ord() function. Each accented character shows up in the string as two parts: if the first part is 195, I then compare the second part against an array which contains these codes. The code for this function is as follows:
Code:
def conv_unicode_str(vocab_str, unicode_arr):
    length_str = len(vocab_str)
    new_str = ""
    x = 0
    while x < length_str:
        # If current character at pos x is a unicode char
        if ord(vocab_str[x]) == 195:
            for entry in unicode_arr:
                if ord(vocab_str[x + 1]) == int(entry[0]):
                    new_str = new_str + entry[1]
                    break
            # This will skip next char as unicode character is made up of 2 chars
            # in string
            x = x + 2
        # Else if normal character then add to string
        else:
            new_str = new_str + vocab_str[x]
            x = x + 1
    # return newly created string
    return new_str
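To illustrate where the 195 comes from, here is a small sketch. It assumes Python 3 and UTF-8 encoded text, and the sample letters are only examples. Each accented letter in the table is stored as two bytes: the first is 195 and the second is the number used as the lookup key.

```python
# Sketch only: show the two UTF-8 bytes behind some accented letters.
e_acute = "\u00e9"   # é
a_grave = "\u00e0"   # à

e_bytes = list(e_acute.encode("utf-8"))
a_bytes = list(a_grave.encode("utf-8"))

print(e_bytes)  # [195, 169]
print(a_bytes)  # [195, 160]
```

The second value in each pair (169 for é, 160 for à) is what the function compares against the first field of each array entry.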
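The contents of the Unicode file, which is read into an array and used by conv_unicode_str, are as follows. Each entry maps the value of the second part of a character to the replacement text that is written into the page: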
Code:
# à
160=à
# â
162=â
# ç
167=ç
# è
168=è
# é
169=é
# ì
172=ì
# í
173=í
# î
174=î
# ò
178=ò
# ó
179=ó
# ù
185=ù
# ú
186=ú
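The code that reads this file into the array is not shown here, so the following is only a sketch of how it might work. The name load_unicode_arr is hypothetical, and the sample lines assume that the right-hand side of each entry is an HTML entity such as &eacute;.

```python
def load_unicode_arr(lines):
    """Hypothetical loader: turn 'code=replacement' lines into [code, replacement] pairs."""
    unicode_arr = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, in case the replacement contains one
        code, replacement = line.split("=", 1)
        unicode_arr.append([code, replacement])
    return unicode_arr


# Assumed sample data; the real file's replacement values may differ.
sample_lines = ["# e acute", "169=&eacute;", "# a grave", "160=&agrave;"]
unicode_arr = load_unicode_arr(sample_lines)
print(unicode_arr)
```

Each pair keeps the code as a string, which matches the int(entry[0]) conversion done inside conv_unicode_str.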
I've been googling but I can't find an easier way of doing this. Is there a better way?
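For comparison, here is a minimal sketch of one standard-library route. It assumes Python 3, where strings are already Unicode, and assumes that numeric character references such as &#233; are acceptable in the page in place of named entities. The helper name to_html_ascii is made up for illustration.

```python
def to_html_ascii(text):
    # The "xmlcharrefreplace" error handler replaces every character that
    # cannot be encoded as ASCII with a numeric character reference.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


phrase = "d\u00e9j\u00e0 vu"   # "déjà vu"
converted = to_html_ascii(phrase)
print(converted)  # d&#233;j&#224; vu
```

This needs no lookup file, but it does not escape &, < or >, so those would still have to be handled separately if they can appear in a phrase.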