How to search files for text in a one-byte encoding (an encoding that is not Unicode)?
In UTF-8, only the basic Latin letters, the digits, and a few other marks such as punctuation are encoded as a single byte; there are 128 of these. A one-byte encoding stores those same characters as single bytes and adds roughly 128 more single-byte characters, which are non-Latin letters such as Cyrillic, or Latin letters with diacritics.
Ubuntu's search tool cannot find text in a one-byte encoding, because it tries to read the file as UTF-8 and fails. It can only read Latin letters and digits (ASCII?), which are encoded identically in one-byte encodings and in UTF-8. The other (additional) 128 characters of one-byte encoded text are read as errors or, by accident, as a random Unicode character, which is often a Chinese character.
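One way around the problem described above is to skip decoding entirely and compare raw bytes: encode the search term in the file's encoding and look for that byte sequence. The following is only a minimal sketch under stated assumptions. The function name `find_in_single_byte_files` is hypothetical, and `cp1251` (Windows Cyrillic) is an assumed default; the actual encoding of the files has to be known or guessed.

```python
from pathlib import Path


def find_in_single_byte_files(root, needle, encoding="cp1251"):
    """Yield (path, line_number, line) for each line that contains needle.

    Assumption: every file under root uses the given one-byte encoding.
    The default, cp1251, is only an example.
    """
    # Encode the search term once, so the comparison happens on raw bytes
    # and nothing is ever interpreted as UTF-8.
    needle_bytes = needle.encode(encoding)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip it
        for line_number, raw_line in enumerate(data.splitlines(), start=1):
            if needle_bytes in raw_line:
                # Decode only for display, using the same one-byte encoding.
                yield path, line_number, raw_line.decode(encoding, errors="replace")
```

Calling `list(find_in_single_byte_files("some_dir", "мир"))` would return the matching lines as ordinary strings. Byte comparison is case-sensitive, so a case-insensitive search would need to decode each line first.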
Run the file through iconv (writing to a new file) to change the encoding to UTF-8, then use that. There's a tool called chardet that can guess the likely encoding of the file.
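For anyone who would rather script the conversion, the same idea can be sketched in Python: read the text with the source encoding and write it back out as UTF-8. This is a sketch, not a replacement for iconv. The function name `convert_to_utf8` is hypothetical and `cp1251` is only an assumed source encoding. On the command line the rough equivalent would be `iconv -f CP1251 -t UTF-8 old.txt > new.txt`.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1251"):
    """Copy src_path to dst_path, re-encoding the text as UTF-8.

    Assumption: the caller knows (or has guessed) the source encoding;
    cp1251 is just an example default.
    """
    # newline="" keeps the original line endings (LF or CRLF) untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copy in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: src.read(65536), ""):
            dst.write(chunk)
```

Writing to a new file, as suggested above, leaves the original untouched in case the guessed encoding turns out to be wrong.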
Many of the major text editors can also autodetect the encoding, and can save the text back in a different encoding.
UTF-8 encodes the first 128 characters exactly as ASCII does, so an ASCII-encoded file is also valid UTF-8. Characters beyond ASCII take multiple bytes.
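A quick way to see this is to encode a few characters and count the bytes. The snippet below is only an illustration: the helper name `utf8_byte_lengths` is made up for the example, and `cp1251` stands in for any one-byte encoding.

```python
def utf8_byte_lengths(text):
    """Return (character, number of UTF-8 bytes) for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# ASCII text produces identical bytes in ASCII, UTF-8, and a one-byte encoding.
word = "grep"
same_everywhere = (
    word.encode("ascii") == word.encode("utf-8") == word.encode("cp1251")
)
print(same_everywhere)  # True

# Characters beyond ASCII need two or more bytes in UTF-8.
print(utf8_byte_lengths("aé я€"))
# [('a', 1), ('é', 2), (' ', 1), ('я', 2), ('€', 3)]
```

In cp1251 the letter я is a single byte, which is why a tool expecting UTF-8 cannot make sense of it.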