Quote:
Originally Posted by dbee
1) So ascii was basically a one byte, 256 character code page with the first 1-127 being the most popular printable characters right ?
|
No, ASCII is only 7-bit (0-127). 32-126 are the printable characters. ASCII forms the first 128 characters of Unicode.
Quote:
Originally Posted by dbee
Unicode was then an expansion of that to a 4 byte code page with virtually every character that was ever written and known about ?
|
No, Unicode is an abstract assignment of integers (code points) to characters and has nothing to do with how those characters are represented as bytes. That is determined by specific encodings (like UTF-8).
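To make the code point vs. encoding distinction concrete, here is a minimal Python 3 sketch. The sample character and the variable names are my own choices for illustration. The little-endian variants ("utf-16-le", "utf-32-le") are used only so that Python does not prepend a byte order mark.

```python
# A code point is just an integer; an encoding decides which bytes represent it.
ch = "é"

code_point = ord(ch)            # the abstract Unicode number, U+00E9
utf8 = ch.encode("utf-8")       # 2 bytes in UTF-8
utf16 = ch.encode("utf-16-le")  # 2 bytes in UTF-16 (little-endian, no BOM)
utf32 = ch.encode("utf-32-le")  # 4 bytes in UTF-32 (little-endian, no BOM)

print(hex(code_point), utf8, utf16, utf32)
```

Same character, same code point, three different byte sequences.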
Quote:
Originally Posted by dbee
2) UTF8 is pretty much the same as unicode right ?
3) But instead of the Koreans having to use a 4 byte character encoding for all of their text files, unicode allows them to switch the 'charsets' and just to use the first 256 characters like we do (or did) with ascii ?
4) why do my sheets then print a little bit of both ?
|
No, UTF-8 is a specific Unicode encoding. It is the most popular encoding because it is ASCII-compatible (i.e. each ASCII character is encoded as the same single byte as in ASCII, so any ASCII text is automatically also valid UTF-8). It uses 1, 2, 3, or 4 bytes depending on the character. So it is not "switching" between anything at all; it is natural for different characters to have different widths in UTF-8.
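If you want to see the variable widths for yourself, something like the following Python 3 sketch should do it. The four sample characters (a Latin letter, an accented letter, a Korean syllable, and an emoji) are just ones I picked to hit each width.

```python
# UTF-8 width depends on the character, not on any "charset switch".
samples = ["A", "é", "한", "😀"]
widths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in widths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")

# ASCII text produces identical bytes under both encodings.
ascii_text = "hello"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```

The Korean syllable takes 3 bytes here, which is why Korean text in UTF-8 is neither 1 byte nor 4 bytes per character.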
Quote:
Originally Posted by dbee
5) also ... if I print out a data file to my screen why do I get large amounts of the same weird character, instead a mix of letters, numbers and characters that I'd image I would get if I printed out random characters from 1-256 on an ascii codepage ?
|
I am not sure. What character is this?
There are other Unicode encodings as well, like UTF-16 (which uses 2 or 4 bytes per character, and is more compact than UTF-8 for most East Asian characters, which take 3 bytes in UTF-8), UTF-32 (which always uses 4 bytes per character), etc. If you try to view text with the wrong encoding it will show weird things, e.g. if you view UTF-16 text as if it were UTF-8, plain English text will typically show an extra garbage (null) character between every character.
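Here is a rough Python 3 sketch of both points: the size difference for Korean text, and what happens when UTF-16 bytes get read as UTF-8. The sample strings are made up for illustration, and I am assuming little-endian UTF-16 without a byte order mark. Your file may differ.

```python
# Korean text: 3 bytes per syllable in UTF-8, 2 bytes per syllable in UTF-16.
korean = "한글"
utf8_len = len(korean.encode("utf-8"))
utf16_len = len(korean.encode("utf-16-le"))
print(utf8_len, utf16_len)

# English text stored as UTF-16 but read as UTF-8:
# every letter is followed by a stray null character.
text = "Hi"
utf16_bytes = text.encode("utf-16-le")
misread = utf16_bytes.decode("utf-8")
print(repr(misread))
```

A terminal will usually draw those null characters as blanks or some placeholder glyph, so a wrongly decoded file can look like the same odd character repeated over and over.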