LinuxQuestions.org
Old 09-01-2005, 04:01 AM   #1
rajesh_b
Member
 
Registered: Sep 2004
Location: Hyderabad.
Posts: 83

Rep: Reputation: 15
unicode file


Hi all,

My problem is related to 2-byte character representation. If a filename is Unicode I have to ignore the file; otherwise I have to process it, i.e. I have to process only ASCII files. I don't know how to check whether a filename is Unicode or not. Can you please tell me whether there is a C API or function that can be used for this, or give me some pointers? Thanks in advance.


Regards
Rajesh
 
Old 09-01-2005, 05:15 AM   #2
spooon
Senior Member
 
Registered: Aug 2005
Posts: 1,755

Rep: Reputation: 51
I think you misunderstand what Unicode is. Unicode is just an abstract mapping that assigns an integer to each character or modifier. How it is actually represented in data depends on the encoding.

The two most common encodings are UTF-8 and UTF-16. UTF-8 is ASCII-compatible (meaning anything in ASCII is also trivially valid UTF-8), is almost universally used on Unix-like systems, and takes 1-4 bytes per character. UTF-16 takes 2 or 4 bytes per character. Either of these can use "2 bytes" for a given character, but no encoding that covers all of Unicode uses exactly "2 bytes" for every character, so it's incorrect to associate Unicode with "2 bytes".

If your job is to distinguish ASCII from non-ASCII then that is easy: pure ASCII only uses characters 0-127; so if it contains any byte that has a value that is 128-255 it is not pure ASCII.
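
A minimal sketch of that byte check in C, assuming the name is a NUL-terminated string. The function name `is_pure_ascii` is made up for illustration; it is not a standard library call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the NUL-terminated string s is in 0..127.
 * The bytes are read as unsigned char because plain char may be signed,
 * in which case values 128..255 would show up as negative numbers. */
bool is_pure_ascii(const char *s)
{
    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 127)
            return false;   /* found a byte outside the ASCII range */
    }
    return true;
}
```

For a name like `report.txt` this returns true; for a name containing the UTF-8 bytes `0xC3 0xA9` (the letter é) it returns false.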
 
Old 09-01-2005, 11:28 PM   #3
rajesh_b
Member
 
Registered: Sep 2004
Location: Hyderabad.
Posts: 83

Original Poster
Rep: Reputation: 15
Hi spooon,
Thanks for your reply. Yes, I misunderstood. What I have to do is this: if the filename contains a character that occupies two or more bytes, I have to ignore the filename; otherwise I have to process it.

Rajesh
 
Old 09-02-2005, 01:07 AM   #4
jlliagre
Moderator
 
Registered: Feb 2004
Location: Outside Paris
Distribution: Solaris 11.4, Oracle Linux, Mint, Debian/WSL
Posts: 9,789

Rep: Reputation: 492
A filename, at least on Unix, is just an array of 8-bit characters that carries no information about which encoding is used, so there is no definitive way to figure out whether a filename is meant to be read in one encoding or another.
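
Since the name is just bytes, one way to approach the original question is to look at the raw bytes of each directory entry and skip any name with a byte above 127. This is only a sketch: it assumes the names are in UTF-8 or another ASCII-compatible encoding, where any character needing two or more bytes uses bytes in the 128-255 range. The names `name_is_ascii` and `process_ascii_names` are hypothetical, and "processing" here just means printing the name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the name contains only bytes in the range 0..127. */
static bool name_is_ascii(const char *name)
{
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p > 127)
            return false;
    }
    return true;
}

/* Writes every pure-ASCII entry name in dirpath to out, one per line,
 * skipping "." and "..". Returns the number of names written,
 * or -1 if the directory cannot be opened. */
int process_ascii_names(const char *dirpath, FILE *out)
{
    assert(dirpath != NULL && out != NULL);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int processed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;               /* not real files to process */
        if (!name_is_ascii(entry->d_name))
            continue;               /* ignore names with bytes >= 128 */
        fprintf(out, "%s\n", entry->d_name);
        processed++;
    }
    closedir(dir);
    return processed;
}
```

Calling `process_ascii_names(".", stdout)` would list the ASCII-only names in the current directory. Replace the `fprintf` call with whatever processing the files actually need.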
 
Old 09-02-2005, 04:38 AM   #5
addy86
Member
 
Registered: Nov 2004
Location: Germany
Distribution: Debian Testing
Posts: 332

Rep: Reputation: 31
Isn't the encoding saved somewhere in the description of the file system?
 
Old 09-02-2005, 06:29 AM   #6
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
1/ The filename encoding is set as a mount option for some filesystems in the /etc/fstab file.

2/ I can see good reasons for wanting to detect non-UTF-8 files (e.g. converting them to UTF-8), and no, detecting non-ASCII is not enough (e.g. ISO-8859-1 is not UTF-8) unless you really only use ASCII (in which case the UTF-8 files will be identical anyway).
I don't know of any 100%-reliable method for doing such detection. The best solution is probably to parse the file for UTF-8 conformity.
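
A rough sketch of such a conformity check in C, assuming the bytes to test (a filename or file contents) are already in memory. The function name `is_valid_utf8` is invented for this example. Passing the check does not prove the data was meant as UTF-8: pure ASCII passes too, and some ISO-8859-1 byte sequences happen to be valid UTF-8, so treat it as a heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8: correct lead and
 * continuation bytes, no overlong forms, no surrogates, nothing above
 * U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed at this length */

        if (b < 0x80) {                       /* single-byte ASCII */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {      /* 110xxxxx: 2-byte sequence */
            extra = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {      /* 1110xxxx: 3-byte sequence */
            extra = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {      /* 11110xxx: 4-byte sequence */
            extra = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return false;   /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;                 /* not a 10xxxxxx byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;                     /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;                     /* UTF-16 surrogate range */
        if (cp > 0x10FFFF)
            return false;                     /* beyond the Unicode range */

        i += extra + 1;
    }
    return true;
}
```

With this, the ISO-8859-1 bytes for "café" (ending in a lone `0xE9`) are rejected, while the UTF-8 form (ending in `0xC3 0xA9`) is accepted.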

Yves.
 
  

