LinuxQuestions.org
Forum: Linux - Software

#1  04-02-2006, 08:39 PM
chovy (Member; registered Dec 2004; Capitola, CA; Debian; 51 posts)
Determine the encoding of a file (e.g. UTF-8)


I've tried several methods, including "file -i file.html" and "stat file.html", but neither tells me the file's encoding.

I have <?xml version="1.0" encoding="UTF-8"?> at the top of my XHTML file, but how do I know the file really is UTF-8?
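The declaration is only a claim about the bytes that follow it. One direct way to test that claim is to decode the raw bytes with a strict UTF-8 decoder and see whether it fails. Below is a minimal Python sketch; the function name `first_invalid_utf8_offset` and the script name `check_utf8.py` are made up for illustration. Passing the check only shows the bytes are well-formed UTF-8, not that the author intended UTF-8 (a pure-ASCII file passes too).

```python
import sys
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first malformed UTF-8 byte, or None if data is valid UTF-8."""
    try:
        # errors="strict" is the default: any malformed sequence raises.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        return err.start  # index of the first offending byte
    return None


if __name__ == "__main__":
    # Hypothetical usage: python3 check_utf8.py file.html [more files...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # read raw bytes, no decoding
            offset = first_invalid_utf8_offset(f.read())
        if offset is None:
            print(f"{path}: valid UTF-8")
        else:
            print(f"{path}: not UTF-8 (first bad byte at offset {offset})")
```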
 
#2  04-03-2006, 12:46 AM
foo_bar_foo (Senior Member; registered Jun 2004; 2,553 posts)
This is really hard.
Remember that a file which is 100% ASCII at the byte level but declares itself UTF-8 is a valid UTF-8 file, because UTF-8 is a superset of ASCII (plain English text, for instance).
Files and strings that contain only 7-bit ASCII characters are encoded identically under ASCII and UTF-8; that is, such files are both ASCII files and UTF-8 files.
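A quick way to convince yourself of this overlap is to encode every ASCII character both ways and compare the bytes. This short Python sketch does just that; nothing in it is specific to any file tool.

```python
# All 128 ASCII code points, U+0000 through U+007F.
all_ascii = "".join(chr(cp) for cp in range(128))

ascii_bytes = all_ascii.encode("ascii")
utf8_bytes = all_ascii.encode("utf-8")

# Identical byte sequences: one byte per character, each below 0x80.
print(ascii_bytes == utf8_bytes)  # True
print(len(utf8_bytes))  # 128

# Consequently, pure-ASCII bytes also decode cleanly as UTF-8.
print(utf8_bytes.decode("utf-8") == all_ascii)  # True
```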
"file" is the Linux utility that reports (really, guesses) a file's encoding.
If I save an English-only file as UTF-8 and run
(gary) ~/test $ file utf8.txt
I get:
utf8.txt: ASCII text, with no line terminators
But if I save a Hebrew file as UTF-8, file says:
(gary) ~/test $ file utf8.txt
utf8.txt: UTF-8 Unicode text, with no line terminators
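The difference between those two outputs comes down to whether any byte has its high bit set. The hypothetical `classify` function below mimics that distinction in Python; it is an illustrative sketch, not how file(1) is actually implemented. If every byte is below 0x80 the data is ASCII (and therefore also UTF-8); otherwise it is UTF-8 only if a strict UTF-8 decode succeeds.

```python
def classify(data: bytes) -> str:
    """Rough file(1)-style label for raw bytes; NOT file's real algorithm."""
    if all(b < 0x80 for b in data):
        # Every byte is 7-bit: this is ASCII, and therefore also valid UTF-8.
        return "ASCII"
    try:
        data.decode("utf-8")  # strict decoding: raises on malformed sequences
    except UnicodeDecodeError:
        return "not UTF-8"
    return "UTF-8"


english = "shalom".encode("utf-8")
hebrew = "שלום".encode("utf-8")

print(classify(english))  # ASCII
print(classify(hebrew))  # UTF-8
# Each Hebrew letter takes two bytes with the high bit set; shin (U+05E9) is D7 A9.
print(hebrew[:2])  # b'\xd7\xa9'
print(classify("café".encode("latin-1")))  # not UTF-8
```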

Sometimes I see people talk about byte order marks (BOMs), or prefix bytes, for Unicode encodings. You can see these for UTF-16 files on Linux with a hex editor, but I have never seen one in a UTF-8 file.
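Unicode does define an optional UTF-8 signature (the bytes EF BB BF), but plain UTF-8 files usually omit it, which fits the experience above. Checking for a BOM is just a prefix comparison; this Python sketch uses the BOM constants from the standard `codecs` module, while the function name `sniff_bom` is hypothetical. A missing BOM proves nothing about the encoding.

```python
import codecs
from typing import Optional

# Test UTF-32LE (FF FE 00 00) before UTF-16LE (FF FE), since the former
# begins with the latter.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # UTF-16LE
print(sniff_bom("hi".encode("utf-8-sig")))  # UTF-8 (EF BB BF prefix)
print(sniff_bom(b"hi"))  # None: plain UTF-8 usually has no BOM
```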
A byte order mark is not necessary in an XML file at all, but an XML file starts with a less-than sign, so the bytes used to encode that '<' can give away the encoding.
But again, those bytes are the same for ASCII and UTF-8.
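The non-normative Appendix F of the XML 1.0 specification describes this trick: without a BOM, the first four bytes of "<?xm" (or of "<" for 32-bit encodings) identify the encoding family, and for ASCII-compatible families you then read the encoding="..." value from the declaration itself. A minimal Python sketch, assuming the input has no byte order mark; the names `_SIGNATURES`, `_ENCODING_DECL` and `sniff_xml_encoding` are hypothetical, and a real XML parser does considerably more checking.

```python
import re

# First four bytes without a BOM, per XML 1.0 Appendix F (non-normative).
_SIGNATURES = {
    b"\x3c\x3f\x78\x6d": "ASCII-compatible",  # "<?xm" in UTF-8, ASCII, ISO-8859-x, ...
    b"\x00\x3c\x00\x3f": "UTF-16BE",
    b"\x3c\x00\x3f\x00": "UTF-16LE",
    b"\x00\x00\x00\x3c": "UTF-32BE",
    b"\x3c\x00\x00\x00": "UTF-32LE",
    b"\x4c\x6f\xa7\x94": "EBCDIC",
}

# encoding="..." or encoding='...' inside an ASCII-compatible XML declaration.
_ENCODING_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its first bytes (no BOM assumed)."""
    family = _SIGNATURES.get(data[:4])
    if family is None:
        # No recognised signature: XML's default is UTF-8 (or the data is mislabeled).
        return "UTF-8"
    if family == "ASCII-compatible":
        # '<' is byte 0x3C in both ASCII and UTF-8, so the first bytes cannot
        # tell them apart; read the declaration itself instead.
        match = _ENCODING_DECL.match(data)
        # A declaration without encoding= also means UTF-8.
        return match.group(1).decode("ascii") if match else "UTF-8"
    # 16/32-bit or EBCDIC family; a full parser would still read the declaration.
    return family


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="UTF-8"?>\n<html/>'))  # UTF-8
print(sniff_xml_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))  # UTF-16LE
```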


(I was just playing with the encoding on my keyboard, so I hope this post is still readable English.)
 
  


All times are GMT -5.
