Quote:
Originally Posted by paulsm4
Hi, Sammywammy -
You are 100% correct. The problem is discussed here:
Thanks for the reply. I may be a victim of this ISO-8859-1 / Windows-1252 confusion.
What's more, I see some websites claiming that the en dash is not a character in ISO8859-1 (one hyphen) but is in ISO-8859-1 (two hyphens), while other websites use those two names interchangeably. How is anyone new to character encoding supposed to get their head around this?!
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
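For what it's worth, the Wikipedia article above describes the two-hyphen ISO-8859-1 as the IANA name, which adds the C0 and C1 control codes to ISO/IEC 8859-1. Neither version assigns a code to the en dash. Python's standard codecs are one practical way to see how software treats the two spellings; this is a sketch of Python's behaviour only, not a ruling on the standards:

```python
import codecs

# Python resolves both spellings (and the alias "latin-1") to the same codec.
for label in ("ISO8859-1", "ISO-8859-1", "latin-1"):
    print(label, "->", codecs.lookup(label).name)

# That codec has no en dash (U+2013), so encoding one raises an error.
try:
    "\u2013".encode("ISO-8859-1")
except UnicodeEncodeError as exc:
    print("no en dash in ISO-8859-1:", exc.reason)
```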
Even if I assume that there is such a thing as an ISO-8859-1 that differs from ISO8859-1, it still wouldn't be the right character encoding for my original file, because the application is interpreting '96' as an en dash.
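A quick way to confirm this reasoning, assuming Python 3.8 or later is available: ISO-8859-1 maps every byte to the code point with the same value, so '96' becomes U+0096 (a C1 control character). Converted to UTF-8, that would come out as 'C2 96', not 'E2 80 93':

```python
# Decode the single byte 0x96 as ISO-8859-1 (Python codec name "latin-1").
raw = b"\x96"
char = raw.decode("latin-1")

print(f"U+{ord(char):04X}")                    # U+0096, a control character
print(char.encode("utf-8").hex(" ").upper())   # C2 96
```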
I'm using UltraEdit-32 to get a better view of the bytes in my data and how the app interprets them (assume the app is a black box; it's not mine, and I only see the original file and the resulting file). I can see that the app interpreted '96' as an en dash, because it was transformed to 'E2 80 93', which is the UTF-8 byte encoding of the en dash.
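The same check with Windows-1252 (Python codec name "cp1252") reproduces exactly what UltraEdit shows: '96' decodes to U+2013 EN DASH, whose UTF-8 encoding is 'E2 80 93'. A minimal sketch:

```python
import unicodedata

# Decode 0x96 as Windows-1252, then re-encode the character as UTF-8.
raw = b"\x96"
char = raw.decode("cp1252")

print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+2013 EN DASH
print(char.encode("utf-8").hex(" ").upper())         # E2 80 93
```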
I tried to check whether WINDOWS-1252 / CP-1252 had been used, but then came across '81', which is not a defined byte in WINDOWS-1252.
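Python's strict cp1252 codec agrees that '81' is undefined. The sketch below (the helper name decodes_in_cp1252 is hypothetical) scans the 0x80 to 0x9F range, where Windows-1252 differs from ISO-8859-1, and lists every byte the codec refuses. '81' is one of five such bytes:

```python
def decodes_in_cp1252(byte_value: int) -> bool:
    """Return True if Python's strict cp1252 codec defines this byte."""
    try:
        bytes([byte_value]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False

# Scan 0x80-0x9F, the range where Windows-1252 places its extra characters.
undefined = [b for b in range(0x80, 0xA0) if not decodes_in_cp1252(b)]
print([f"{b:02X}" for b in undefined])  # ['81', '8D', '8F', '90', '9D']
```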
It sounds like the app took this WINDOWS-1252 data but treated it as something else (or the other way round... I'm really not sure).
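One hypothesis worth testing, offered as an assumption rather than a diagnosis: some software decodes "Windows-1252" leniently. For example, the WHATWG Encoding Standard used by web browsers maps the five bytes Microsoft leaves undefined to the C1 control code points of the same value, and it also treats the label "ISO-8859-1" as windows-1252. The sketch below emulates that behaviour with a hypothetical helper, decode_lenient_cp1252, on made-up sample bytes. If the black-box app works this way, each '81' in the original file should appear as 'C2 81' in the output, which is easy to check in UltraEdit:

```python
def decode_lenient_cp1252(data: bytes) -> str:
    """Decode Windows-1252, but map its undefined bytes (0x81, 0x8D, 0x8F,
    0x90, 0x9D) to the C1 control code point of the same value instead of
    failing. Hypothetical helper emulating a lenient decoder."""
    out = []
    for b in data:
        single = bytes([b])
        try:
            out.append(single.decode("cp1252"))
        except UnicodeDecodeError:
            # Undefined in strict cp1252: fall back to ISO-8859-1 (U+0080..U+009F).
            out.append(single.decode("latin-1"))
    return "".join(out)

# Hypothetical sample: 'A', en dash (0x96), 'B', undefined byte 0x81, 'C'.
sample = b"A\x96B\x81C"
text = decode_lenient_cp1252(sample)
print(text.encode("utf-8").hex(" ").upper())  # 41 E2 80 93 42 C2 81 43
```

Decoding byte by byte keeps the fallback easy to follow; a strict decoder would instead stop at the first '81'.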