LinuxQuestions.org
Old 09-03-2010, 03:20 AM   #1
qdinar
LQ Newbie
 
Registered: Sep 2008
Posts: 7

Rep: Reputation: 0
How to search files for text in a one-byte encoding? (an encoding that's not Unicode)


How do I search files for text that is stored in a one-byte encoding? Places > Search for Files in GNOME on Ubuntu only finds UTF-8 text.

I know one way: install Wine and Total Commander, then search with that. Are there better ways?

I have also asked this at https://answers.launchpad.net/ubuntu/+question/123912, at http://ubuntuforums.org/showthread.php?t=1564911, and in Freenode channels.
 
Old 09-04-2010, 12:15 AM   #2
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,758

Rep: Reputation: 644
UTF-8 includes 1-byte encoded characters, doesn't it?

ref: http://en.wikipedia.org/wiki/UTF-8
 
Old 09-04-2010, 01:49 AM   #3
qdinar
LQ Newbie
 
Registered: Sep 2008
Posts: 7

Original Poster
Rep: Reputation: 0
UTF-8 encodes only the Latin letters and a few other marks, such as punctuation, as single bytes. There are about 128 of them. One-byte encodings use single bytes for most of those and for roughly 128 additional letters, which are non-Latin letters such as Cyrillic, or Latin letters with diacritics.
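To make the difference concrete, here is a minimal Python sketch. The sample word and the choice of cp1251 (a common one-byte Cyrillic encoding) are assumptions made only for illustration; any other one-byte encoding would behave the same way.

```python
# Compare how many bytes the same text takes in a one-byte encoding and in UTF-8.
word = "привет"  # sample Cyrillic word, chosen only for illustration

as_cp1251 = word.encode("cp1251")  # one-byte encoding: 1 byte per letter
as_utf8 = word.encode("utf-8")     # UTF-8: 2 bytes per Cyrillic letter

print(len(word), len(as_cp1251), len(as_utf8))  # prints: 6 6 12

# Pure ASCII text is byte-for-byte identical in both encodings.
ascii_same = "hello".encode("cp1251") == "hello".encode("utf-8")
print(ascii_same)  # prints: True
```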
 
Old 07-18-2011, 03:23 AM   #4
qdinar
LQ Newbie
 
Registered: Sep 2008
Posts: 7

Original Poster
Rep: Reputation: 0
Ubuntu's search tool cannot find one-byte encoded characters because it tries to read them as UTF-8 and fails. It can only read Latin letters and digits (ASCII?), which are encoded identically in one-byte encodings and in UTF-8. The other (additional) 128 characters of one-byte encoded text it reads as errors or, by accident, as a random Unicode character, which is often a Chinese character.
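The behaviour described above can be reproduced in a few lines of Python. This is only a sketch of the general effect, not of how the GNOME search tool is actually implemented; cp1251 and the sample text are assumptions.

```python
# Bytes of a Cyrillic word as stored in a one-byte encoding (cp1251 assumed).
data = "привет".encode("cp1251")

# Strict UTF-8 decoding rejects these bytes outright.
try:
    data.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # prints: False

# A query encoded as UTF-8 never matches the cp1251 bytes...
found_as_utf8 = "привет".encode("utf-8") in data
# ...but the same query encoded as cp1251 does.
found_as_cp1251 = "привет".encode("cp1251") in data
print(found_as_utf8, found_as_cp1251)  # prints: False True

# ASCII-only text is found either way, since its bytes are identical.
mixed = "readme привет".encode("cp1251")
found_ascii = b"readme" in mixed
print(found_ascii)  # prints: True
```

So one workaround is to search the raw bytes with the query encoded in the same encoding as the files, provided that encoding is known.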
 
Old 07-18-2011, 04:44 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
Run the file through iconv (to a new file) to change the encoding to UTF-8, then use that. There's a tool called chardet that can tell you the likely encoding of the file.
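For anyone who would rather script it, the same kind of conversion can be sketched in Python. This is an illustration, not iconv itself: the helper name convert_to_utf8, the file names, and the cp1251 source encoding are all assumptions, and the source encoding has to be known (or guessed) beforehand.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file to UTF-8, writing the result to a new file."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with a throwaway file; cp1251 is an assumed source encoding.
workdir = tempfile.mkdtemp()
old_file = os.path.join(workdir, "old.txt")
new_file = os.path.join(workdir, "new.txt")
with open(old_file, "wb") as f:
    f.write("привет, мир\n".encode("cp1251"))

convert_to_utf8(old_file, new_file, "cp1251")

with open(new_file, "rb") as f:
    converted = f.read()

# The new file is valid UTF-8 and can be searched as ordinary text.
print("мир" in converted.decode("utf-8"))  # prints: True
```

On the command line, something like `iconv -f CP1251 -t UTF-8 old.txt > new.txt` does roughly the same job, with the encoding name adjusted to whatever the file really uses.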

Many of the major text editors can also autodetect the encoding, and have the ability to save the text back in a different encoding.

UTF-8 uses the same encoding as ASCII for the first 128 characters, so an ASCII-encoded file is also valid UTF-8. Characters beyond ASCII take multiple bytes.
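That last point also gives a crude way to tell whether a file needs converting at all. A minimal Python sketch follows; the helper name is_valid_utf8 and the sample strings are made up for illustration, and this is not a substitute for a real detector like chardet. A file that fails the check is certainly not UTF-8, but the check cannot say which one-byte encoding it is.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_bytes = "plain text".encode("ascii")
cp1251_bytes = "текст".encode("cp1251")  # cp1251 assumed for illustration

print(is_valid_utf8(ascii_bytes))   # prints: True  (ASCII is valid UTF-8)
print(is_valid_utf8(cp1251_bytes))  # prints: False (needs converting first)

# A character beyond ASCII takes more than one byte in UTF-8.
e_acute_len = len("\u00e9".encode("utf-8"))
print(e_acute_len)  # prints: 2
```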
 
  



Tags
encoding, files, search, utf8

