Is there a utility to convert UTF to HTML metacharacters?
I convert pages I download from UTF-8 to HTML (I prefer ISO-8859). I use utf8trans, but it relies on a character table, which is incomplete. I've been adding custom entries for years, but that's lame. Converting UTF code point 23f0 to &#9200; (for example) would be good enough for me; I could write a script that added all those entries to my character table, which would be inelegant and would make the table huge. Hmmmm... there's probably a C function to do that, which would make for an easy program.
There's a CPAN module for Perl that does that already. You just need to specify the range of characters to be encoded. Here's a one-liner to encode 0x0100 through 0x2fa1f:
Sorry. The mistake was on my end. map() is needed to build the replacement string, and the -C is needed to force I/O as UTF-8. Unicode is still a bit unfamiliar to me.
Thanks. I have one last question: how do I set an environment variable? The documentation for this module says,
Quote:
$HTML::HTML5::Entities::hex
This variable controls whether numeric entities will use hexadecimal or decimal notation. It is TRUE (hexadecimal) by default, but can be set to FALSE.
, which I'd like to switch to decimal, but every attempt I've made to set this variable to FALSE has been rejected.
While the Perl script worked okay for a short test file, it took minutes on a real-life 80K file (that had only 6 UTF characters to convert). So I wrote a program to emit all the entries for a complete character table for utf8trans, and translation now only takes a second longer than before. I had feared that a 65K-entry table would slow utf8trans down a lot, but I don't notice any difference.
No problem. Scripts are for a quick solution from the writing perspective, not necessarily run speed. Can you go into more detail about how you solved it? For some people, C is comfortable, and they can write something equivalent quickly.
utf8trans translates UTF-8 characters according to a table. Each entry in the table has two fields separated by a tab: the code point of the UTF character to replace, written as a hex string (e.g., 2d0a), and the string to translate it into. The Slackware package comes with a translation table that doesn't have all the characters I want to translate. I had been adding individual characters as I encountered them, and I got tired of this. So I wrote a C program that emitted a complete set of entries, and appended its output to my table. I didn't do this in the first place because I feared that so large a table would slow utf8trans down. That turned out not to be the case.