[SOLVED] Searching a doc. for UTF-8 hex instances and converting
I'm trying to write a script that searches a document for UTF-8 hex and converts each instance found to its corresponding character. Does this exist already in Bash or, more broadly, in Linux?
If it doesn't, is there a library that I can use or a CSV file that I can use as a database to feed to sed?
I usually use sed and awk for text processing; there is also the tr command, which I haven't used much. Any of these can of course be called from a script. I believe you have to tell your environment that you are working in UTF-8. Sorry, you'll have to search for things like "using UTF-8 and sed" (or awk, or tr); I haven't done this type of conversion myself.
Yes, such tools exist in Linux and are called hex editors. I'm not sure which ones can handle UTF-8, but there are both command-line and GUI programs. There are lots of hex editors; some examples are:
Command line
hexedit
xxd
GUI
wxMEdit
ghex
bless
vim has the capability of displaying a file as hex code.
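For anyone who wants to see what such a hex view contains without installing an editor, here is a rough sketch in Python. The function name hex_dump is made up for illustration, and the layout only approximates what xxd prints; it is not guaranteed to match it byte for byte.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an xxd-like dump: offset, hex bytes in pairs, printable ASCII."""
    # Column width of the hex area: 2 digits per byte plus one space per pair gap.
    hex_width = width * 2 + (width // 2 - 1)
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        digits = chunk.hex()
        # Group the hex digits four at a time (two bytes per group).
        groups = " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
        # Non-printable and non-ASCII bytes are shown as dots.
        printable = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {groups:<{hex_width}}  {printable}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_dump("characters: ± « §.\n".encode("utf-8")))
```

Multi-byte UTF-8 characters show up as two or more dots in the right-hand column, because each of their bytes falls outside the printable ASCII range.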
If you look at the posted link, there is a table that shows the hex code for each character. Since UTF-8 is backwards compatible with standard ASCII, the letter A (code point U+0041) is encoded as the single byte 0x41.
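A quick way to check the relationship between a character, its code point, and its UTF-8 bytes is a few lines of Python. This is only an illustration; the helper name utf8_hex is made up here.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a string as lowercase hex digits."""
    return ch.encode("utf-8").hex()


if __name__ == "__main__":
    for ch in ("A", "±", "«", "§"):
        # ord() gives the Unicode code point; utf8_hex() gives the encoded bytes.
        print(f"{ch}  U+{ord(ch):04X}  UTF-8: {utf8_hex(ch)}")
```

ASCII characters such as A come out as a single byte equal to the code point (41), while characters outside ASCII such as ± (U+00B1) need two or more bytes (c2b1), so the code point and the UTF-8 hex are not the same thing in general.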
It would be helpful if the OP could provide an actual example of the file they wish to convert.
It is not clear to me whether the source file is Unicode, or perhaps ASCII with Unicode characters represented as hex digits, or a hex dump of a Unicode or ASCII file, formatted or raw. For example, I can imagine any of the following fitting the vague description given:
Code:
Some text including some unicode characters: ± « §.
Some text including some unicode character hex values: c2b1 c2ab c2a7.
00000000: 536f 6d65 2074 6578 7420 696e 636c 7564 Some text includ
00000010: 696e 6720 736f 6d65 2075 6e69 636f 6465 ing some unicode
00000020: 2063 6861 7261 6374 6572 733a 20c2 b120 characters: ..
00000030: c2ab 20c2 a72e 0a .. ....
536f6d65207465787420696e636c7564696e6720736f6d6520756e69636f
646520636861726163746572733a20c2b120c2ab20c2a72e0a
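If the input turns out to be one of the hex forms above, a conversion could be sketched in Python roughly as follows. This assumes the hex really is UTF-8; the function names and the token pattern are my own choices for illustration, not an existing tool. The second function deliberately leaves tokens alone when they do not decode to at least one non-ASCII character, so ordinary words and numbers that happen to consist of hex digits are not mangled, but false positives are still possible on real documents.

```python
import re

# Standalone runs of hex digit pairs, e.g. "c2b1" (hypothetical pattern).
HEX_TOKEN = re.compile(r"\b(?:[0-9a-fA-F]{2})+\b")


def decode_hex_dump(hex_text: str) -> str:
    """Decode a raw hex dump as UTF-8, ignoring any whitespace between digits."""
    return bytes.fromhex("".join(hex_text.split())).decode("utf-8")


def decode_hex_tokens(text: str) -> str:
    """Replace standalone hex tokens such as 'c2b1' with the text they encode."""
    def replace(match):
        token = match.group(0)
        try:
            decoded = bytes.fromhex(token).decode("utf-8")
        except UnicodeDecodeError:
            return token  # not valid UTF-8, keep the original token
        # Keep tokens that decode to plain ASCII (e.g. "2020") unchanged.
        return token if decoded.isascii() else decoded
    return HEX_TOKEN.sub(replace, text)


if __name__ == "__main__":
    print(decode_hex_tokens("Some hex values: c2b1 c2ab c2a7."))
    print(decode_hex_dump("c2b1 20 c2ab 20 c2a7"))
```

The formatted dump with offsets and an ASCII column would need its first and last columns stripped before either function applies.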
NevemTeve appears to be right: that is not UTF-8 or simple ASCII, it is URL-encoded.
PHP's urldecode() is probably the easiest single solution, but I found several interesting approaches here and here, although some have a few caveats on use, mostly related to '+' signs for spaces and text with embedded backslashes.
See if you can find one of those to meet your needs and let us know if you need more help!
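For comparison with PHP's urldecode(), here is a minimal sketch using Python's standard urllib.parse module. The wrapper name urldecode_text and its plus_as_space parameter are made up for this example; unquote and unquote_plus are the real library functions. It also shows the '+' caveat: unquote leaves '+' alone, while unquote_plus turns it into a space, so which one is right depends on how the text was encoded in the first place.

```python
from urllib.parse import unquote, unquote_plus


def urldecode_text(text: str, plus_as_space: bool = False) -> str:
    """Decode %XX escapes as UTF-8; optionally treat '+' as a space."""
    decoder = unquote_plus if plus_as_space else unquote
    # errors="replace" substitutes U+FFFD for byte sequences that are not valid UTF-8.
    return decoder(text, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    print(urldecode_text("%C2%B1 %C2%AB %C2%A7"))
    print(urldecode_text("caf%C3%A9+au+lait"))
    print(urldecode_text("caf%C3%A9+au+lait", plus_as_space=True))
```

To run it over a whole document, read the file as text, pass the contents through the function, and write the result back out.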