LinuxQuestions.org
12-17-2005, 12:53 AM   #1
dbee
Code explanation ... ?


So I'm working in Korea at the moment, and when I print out documents I sometimes get a mixture of English and Hangul (Korean) characters. I'm reasonably familiar with the Unicode specs, but I'd like to ask a couple of questions all the same, just for clarification.

1) So ASCII was basically a one-byte, 256-character code page, with the first 1-127 being the most popular printable characters, right? Unicode was then an expansion of that to a 4-byte code page with virtually every character that was ever written and known about?

2) UTF-8 is pretty much the same as Unicode, right?

3) But instead of the Koreans having to use a 4-byte character encoding for all of their text files, Unicode allows them to switch the 'charsets' and just use the first 256 characters like we do (or did) with ASCII?

4) Why do my sheets then print a little bit of both?

5) Also... if I print out a data file to my screen, why do I get large amounts of the same weird character, instead of the mix of letters, numbers and symbols that I'd imagine I would get if I printed out random characters from 1-256 on an ASCII code page?

Again, I'm eager to understand this a little better, so if anyone can point out where I've gone wrong here I'd be much obliged.

Thanks
 
12-17-2005, 02:06 AM   #2
spooon
Quote:
Originally Posted by dbee
1) So ASCII was basically a one-byte, 256-character code page, with the first 1-127 being the most popular printable characters, right?
No, ASCII is only 7-bit (0-127). 32-126 are the printable characters. ASCII forms the first 128 characters of Unicode.
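If you want to check the 7-bit range and the ASCII/UTF-8 overlap yourself, here is a small Python 3 sketch (Python is just a convenient choice for illustration, not something from the thread). It builds the printable range 32-126 and confirms that plain ASCII text encodes to identical bytes in ASCII and UTF-8.

```python
# ASCII defines 128 codes (0-127, i.e. 7 bits); 32 (space) through 126 (~) are printable.
printable = "".join(chr(code) for code in range(32, 127))
print(len(printable), repr(printable[0]), repr(printable[-1]))  # 95 ' ' '~'

# ASCII is the first 128 code points of Unicode, and UTF-8 stores them as the same single bytes.
text = "Hello from Korea"
print(text.encode("ascii") == text.encode("utf-8"))  # True

# A byte value of 128 or above is simply not ASCII.
try:
    bytes([0xB0]).decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print("0xB0 is ASCII:", is_ascii)  # 0xB0 is ASCII: False
```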

Quote:
Originally Posted by dbee
Unicode was then an expansion of that to a 4-byte code page with virtually every character that was ever written and known about?
No. Unicode is an abstract assignment of integers (code points) to characters; the character set by itself says nothing about how those integers are stored as bytes. That is specified by a particular encoding (like UTF-8).
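A quick Python 3.8+ illustration of the code point vs. encoding distinction: the Hangul syllable 한 is always code point U+D55C, but the bytes that store it depend on the encoding you pick. The character and the three encodings shown are just examples.

```python
ch = "\uD55C"  # 한 (HANGUL SYLLABLE HAN)

# The code point is a property of the character itself.
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+D55C

# The byte representation depends on the encoding.
# The -be variants are used so Python does not prepend a byte order mark.
encoded = {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}
for enc, data in encoded.items():
    print(f"{enc:10} {data.hex(' ')}")
# utf-8      ed 95 9c
# utf-16-be  d5 5c
# utf-32-be  00 00 d5 5c

# Decoding with the matching encoding always gives back the same character.
assert all(data.decode(enc) == ch for enc, data in encoded.items())
```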

Quote:
Originally Posted by dbee
2) UTF-8 is pretty much the same as Unicode, right?

3) But instead of the Koreans having to use a 4-byte character encoding for all of their text files, Unicode allows them to switch the 'charsets' and just use the first 256 characters like we do (or did) with ASCII?

4) Why do my sheets then print a little bit of both?
No, UTF-8 is a specific Unicode encoding. It is the most popular encoding because it is ASCII-compatible (i.e. each ASCII character is stored as the same single byte that ASCII uses, so ASCII text is automatically also valid UTF-8). It uses 1, 2, 3, or 4 bytes depending on the character. So it is not "switching" between anything at all; it is natural for different characters to have different widths in UTF-8.
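To show why no "switching" is involved, here is a minimal hand-written UTF-8 encoder in Python 3.8+. It is a teaching sketch (the function names are made up for this example), not a replacement for the built-in str.encode. The number of bytes is decided purely by how large the code point is, so English and Korean can sit side by side in one UTF-8 stream.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (teaching sketch)."""
    # Surrogates (U+D800-U+DFFF) and values above U+10FFFF are not valid characters.
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not encodable as UTF-8: {cp:#x}")
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode_str(text: str) -> bytes:
    """Encode a whole string one code point at a time."""
    return b"".join(utf8_encode(ord(ch)) for ch in text)


# Latin letter, accented letter, Hangul syllable, emoji: 1, 2, 3 and 4 bytes.
for ch in "A\u00E9\uD55C\U0001F600":
    data = utf8_encode(ord(ch))
    assert data == ch.encode("utf-8")  # agrees with Python's built-in encoder
    print(f"U+{ord(ch):04X}: {len(data)} byte(s) {data.hex(' ')}")

# Mixed English and Korean in one stream: no mode switch, just different widths.
mixed = "Hi \uD55C\uAD6D"  # "Hi 한국"
print(len(mixed), "characters,", len(utf8_encode_str(mixed)), "bytes")
```

The three ASCII characters take one byte each and the two Hangul syllables take three bytes each, all in the same byte stream.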

Quote:
Originally Posted by dbee
5) Also... if I print out a data file to my screen, why do I get large amounts of the same weird character, instead of the mix of letters, numbers and symbols that I'd imagine I would get if I printed out random characters from 1-256 on an ASCII code page?
I am not sure. What character is this?

There are many other Unicode encodings, like UTF-16 (which uses 2 or 4 bytes per character and is more compact than UTF-8 for most East Asian text, since characters such as Hangul take 2 bytes instead of 3), UTF-32 (which uses 4 bytes per character), etc. If you view text with the wrong encoding, it will show weird things. For example, if you view UTF-16 text as UTF-8, English text will often show an extra garbage character between every letter, because UTF-16 stores each ASCII character as two bytes, one of which is zero.
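Both the size difference and the "garbage between characters" effect are easy to reproduce in Python 3. The last part is only a guess at question 5, assuming the terminal decodes everything as UTF-8: bytes that are not valid UTF-8 are typically displayed as one and the same replacement character (U+FFFD, often drawn as a question mark in a box), so a binary file can come out as long runs of a single symbol. The sample strings and byte values are made up for illustration.

```python
english = "Korea"
korean = "\uD55C\uAD6D"  # 한국 ("Korea")

# Bytes needed per encoding; utf-16-le is used so no byte order mark is added.
for label, text in (("English", english), ("Korean", korean)):
    print(label, "utf-8:", len(text.encode("utf-8")),
          "utf-16:", len(text.encode("utf-16-le")))

# Reading UTF-16 as if it were UTF-8: every ASCII letter drags a zero byte along,
# which a viewer may show as an extra invisible or garbage character.
misread = english.encode("utf-16-le").decode("utf-8")
print(repr(misread))

# Hypothetical binary data (assumption, not the poster's actual file): in this
# byte sequence no byte starts a valid UTF-8 character, so errors="replace"
# turns every byte into the same replacement character U+FFFD.
binary_junk = bytes(range(0x80, 0x100))
shown = binary_junk.decode("utf-8", errors="replace")
print(shown[:8])
```

English text doubles in size in UTF-16, while the Korean text shrinks from 6 bytes to 4, which is the trade-off described above.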