Run "file filename". It may report the encoding used.
Code:
cat >test
Code:
cat test |
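As a rough illustration of the suggestion above, here is a small sketch that builds two throwaway sample files and asks file about them. The names latin1.txt and utf8.txt are made up for this example, and the exact wording of the report depends on the installed version of file, so treat its answer as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Hypothetical sample files: the same word, "café", in two encodings.
printf 'caf\xe9\n'     > latin1.txt   # é as the single byte 0xE9 (ISO-8859-1)
printf 'caf\xc3\xa9\n' > utf8.txt     # é as the two bytes 0xC3 0xA9 (UTF-8)

# Default report first, then only the guessed charset name for each file.
file latin1.txt utf8.txt
file -b --mime-encoding latin1.txt utf8.txt
```

file only guesses from the bytes it sees, so a file with very few non-ASCII characters may be reported less precisely than a longer one.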
Code:
¡¢£¤¥¦§¨©ª«¬*®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
"file non-ascii.out" gives "non-ascii.out: ISO-8859 text", so this is not a UTF-8 file. How do I convert an ISO-8859 file to a UTF-8 file? Does anybody know? |
Here I will use iconv to convert your file from each of the ISO 8859 character sets that I found with "locate 8859". In a real example, the characters in the file should make up actual words with accents or foreign characters, so you should be able to tell by examination whether you used the right one. Posting a few sample lines of an actual file would have been more useful.
Code:
for code in $(seq 1 9) 13 14 15; do echo;echo -n "iso8859-$code :"; iconv -f iso_8859-$code -t utf-8 -o - non-ascii.out; done
I hope we don't confuse the LQ server with all of these strange characters! Just glancing at the results, you can see which one supports Cyrillic. The documentation for the code pages should tell you which locales they are for. |
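Once the loop above has shown which character set produces readable text, the actual conversion is a single iconv call. A minimal sketch, assuming the file turned out to be ISO-8859-1; the file names are placeholders, so substitute your own.

```shell
#!/usr/bin/env bash
# Placeholder input: "café" with é stored as the ISO-8859-1 byte 0xE9.
printf 'caf\xe9\n' > sample.latin1

# Convert to UTF-8, writing to a new file so the original is kept.
iconv -f ISO-8859-1 -t UTF-8 sample.latin1 > sample.utf8

# Inspect the bytes: 0xE9 should now be the UTF-8 pair 0xC3 0xA9.
od -An -tx1 sample.utf8
```

Writing to a separate output file rather than over the input makes it easy to retry with a different -f value if the first guess was wrong.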
Very interesting discussion.
2 coding (as in programming) comments, both involving the use of bash brace expansion:
Code:
for code in $(seq 1 9) 13 14 15
Code:
echo -e `echo \\\\x{a..f}{{0..9},{a..f}}` > non-ascii.out |
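For comparison, the seq call in that loop header can be replaced by brace-expansion ranges, which bash expands itself without starting another process. A small sketch; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# The list produced with seq, as in the loop header above.
with_seq=$(echo $(seq 1 9) 13 14 15)

# The same list built from brace-expansion ranges alone.
with_braces=$(echo {1..9} {13..15})

echo "$with_seq"
echo "$with_braces"
```

Both lines print the same twelve numbers, so "for code in {1..9} {13..15}" could stand in for the seq version in bash.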
Code:
for code in $(seq 1 9) 13 14 15; do echo;echo -n "iso8859-$code :"; iconv -f iso_8859-$code -t utf-8 -o - non-ascii.out; done
Code:
echo -e `echo \\\\x{a..f}{{0..9},{a..f}}` > non-ascii.out
But why are there spaces between the characters? And how are you calculating the number of backslashes? There are so many of them; what do they mean? |
Quote:
If you need a literal '\' to appear in a context like this, you escape it w/ itself: '\\'. Sometimes, like here, that isn't enough because a 2nd layer of escaping is necessary; then '\\\\' (which becomes '\\', which becomes '\') is used. I didn't bother to figure out why 4 is the right number of them to use; I just stopped when I knew I had the right answer. I knew to try this mainly from reading the gawk documentation. |
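Well, really it comes down to two rounds of backslash removal. First, the \\\\ sits inside backquotes, and bash's backquote substitution itself treats \\ as an escaped backslash, so the command that actually gets run contains \\x. Then that \\ stands without protection in the middle of a command, so the shell collapses it to a single \ during the same quote processing that decides that "a\ b" is one word. Now the inner echo invocation gets arguments starting with '\x'. Without -e, bash's builtin echo prints them unchanged (unless the xpg_echo option is set), so the command in `` outputs something beginning with '\x'. That output gets fed to the outer echo -e, which uses '\x' as a hex number starter. |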
Quote:
And what should be modified to get rid of them? btw echo -e \\x{a..f}{{0..9},{a..f}} > non-ascii.out works well too, so one does not need two echos |
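The number of backslashes needed depends on how many layers strip them before echo sees the argument. A small sketch for bash, assuming the xpg_echo option is off (bash's default); the variable names are only for illustration, and \x41 (the letter A) is used so the result is printable.

```shell
#!/usr/bin/env bash
# Backquotes: four backslashes are needed to deliver \x41 to the inner echo.
a=`echo \\\\x41`

# $(...) does not do the extra backquote pass, so two are enough.
b=$(echo \\x41)

# Single quotes protect the backslash completely, so one is enough.
c=$(echo '\x41')

printf '%s\n' "$a" "$b" "$c"   # three identical lines of text

# Only echo -e turns the text \x41 into the byte 0x41.
echo -e "$a"
```

This is also why the single-echo form with two backslashes works: with no backquotes involved, only one round of backslash removal happens.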
Quote (archtoad6):
Very interesting discussion.
2 coding (as in programming) comments, both involving the use of bash brace expansion:
Code:
for code in $(seq 1 9) 13 14 15
Thanks for that. I had forgotten about it. I routinely use the {a,b,c} form of brace expansion, but using a range hadn't sunk into my brain enough to remember it.
---
Wikipedia has some good articles about the ISO 8859 standard. Some of the \xA0-\xFF values are not used, so the sample file we used should be adjusted. |
jschiwal,
OTOH I never knew, or had completely forgotten, seq & its "-w" option. That can produce series like "08 09 10 11"; compare:
Code:
echo {0{1..9},{10..20}}
Code:
echo {0{0{0{1..9},{10..99}},{100..999}},{1000..1010}}
igor.R, I think the spaces are provided by the shell as word separators during the brace expansion. If you want to remove them, use sed 's, ,,g':
Code:
echo -e \\x{a..f}{{0..9},{a..f}} | sed 's, ,,g' |
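Another way to avoid the spaces, instead of deleting them afterwards, is to let printf do the writing: bash's printf reuses its format for every argument and puts nothing between them, and %b interprets backslash escapes such as \xHH. A sketch, assuming bash; the output file name is made up for this example.

```shell
#!/usr/bin/env bash
# Write the 96 bytes 0xA0-0xFF with no separators between them.
printf '%b' \\x{a..f}{{0..9},{a..f}} > non-ascii-nospace.out

# Count the bytes written.
wc -c < non-ascii-nospace.out
```

Unlike echo, this also writes no trailing newline, so the file holds exactly the generated bytes.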
Solution: removing accent marks from file names
I don't know how to 'fold' posts on this forum, or how to delete them.
Hopefully though, this will be more acceptable: Code:
$ export FILTER=$(/usr/bin/time -f '%e seconds' ../gen_filter.sh) |
SwaJime,
Please edit your posts to fold your extra long code blocks -- they are causing the worst horizontal scrolling in Konqueror 3.5.8 that I have ever seen. If you don't, the only way I can continue to participate in this thread is to put you on my ignore list. <original response> Thank you, SwaJime, for making this thread unreadable in Konqueror 3.5.8 w/ your extra long code/quote blocks. I can fix this problem in several ways:
</original response> |
Newbies Anonymous
Quote:
Thank you so much for your warm welcoming hospitality. I finally, completely accidentally, stumbled upon some information regarding this "folding" that you've so kindly suggested. I probably won't spend much time posting to any part of this forum in the future, given the gratefulness and appreciation that has been shown to me here so far for my contributions. I was pleased to note also that the horizontal scrolling "issue" that I am somehow responsible for seems to afflict other posts in this thread, and yet there was apparently some redeeming quality of those that kept you from giving them such helpful advice. For reference, the page I found that discusses the "folding" is here: http://www.apps.ietf.org/rfc/rfc822.html#sec-3.1.1 -- j |