non-ascii characters in bash script and unicode
Dear All,
I want to write a shell script that will replace accented characters in the names of files by standard ASCII characters according to some table, like "E -> E, ^U -> U, where "E is E with two dots and ^U is U with a hat. But I want that shell script to be a completely ASCII file. So, my question is: how can one encode non-ASCII characters in a shell script using only ASCII characters? I know that in HTML one uses combinations like &#x03B1;, where 03B1 is the hexadecimal number of the symbol in the Unicode table (in this specific case 03B1 corresponds to the Greek letter alpha). But how does one encode Unicode symbols in bash? If I put
Code:
echo "&#x03B1;"
in the shell script, it will output &#x03B1;, but not the symbol alpha. So, can one process Unicode symbols in file names and in file contents using simple shell commands? Again, I want the script itself to be completely, 100% ASCII. Thanks in advance for any ideas. |
Are the files you process Unicode or not? In any case, the symbols you want can be expressed in sed rules: '\xHH' represents the character in the range 0-255 whose number in hex is HH (\x20 is space (32, 0x20), for example). If you have an iso8859 file, each of your special symbols is represented by just one character from the top half (>127). Otherwise you may encounter multi-byte symbols. In any case, running hexdump on a file containing only the symbol is a very reliable way to learn its code (watch out for whitespace: a 0x0a at the end is usually not part of the symbol, and neither is 0x20).
|
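To make the hexdump-then-sed recipe above concrete, here is a minimal sketch. It assumes GNU sed (the \xHH escapes are a GNU extension) and uses od -An -tx1, which, unlike plain hexdump, prints bytes in file order:

```shell
#!/bin/bash
# u-umlaut encodes in UTF-8 as the two bytes 0xc3 0xbc.
# Step 1: discover the bytes of the mystery character.
printf '\303\274\n' | od -An -tx1
# (the trailing 0a is the newline, not part of the symbol)

# Step 2: use those bytes in a 100% ASCII substitution rule.
printf '\303\274ber\n' | LC_ALL=C sed -e 's/\xc3\xbc/u/g'   # prints "uber"
```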
Quote:
Yes, the files that I want to process are Unicode text files, and the names of those files also contain some non-ASCII characters (Cyrillic letters and accented letters). I still do not understand from your comment how to write UTF-8 characters using only ASCII characters in a bash script, in order to process them with the tr, sed or awk commands. Could you please give me a short example of how you use tr, sed or awk with Unicode? |
You are not obliged to tell sed that your file is Unicode. You can reinterpret it byte-for-byte as any encoding. Any fixed letter is encoded by a fixed sequence of bytes. So, since Cyrillic а is encoded in UTF-8 as the bytes 0xd0 0xb0,
Code:
LC_ALL=C sed -e 's/\xd0\xb0/a/' |
Thank you very much, it really works
OK, now I understand that, say,
Code:
echo '"u' | sed -e 's/"u/\xfc/g' >> output.tmp
will write u with two dots on top (in Latin-1) into the file output.tmp. As you said, '\xHH' represents a character in the range 0-255. But what about other characters? Say I want some Greek letters. The letter Omega has hexadecimal index 03A9. What should one do in this case? Because
Code:
echo 'Omega' | sed -e 's/Omega/\x03A9/g' >> output.tmp
does not work. |
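To sketch the answer to the Omega question: \x03A9 fails because \xHH only ever names a single byte. A code point above U+00FF has to be written as its UTF-8 byte sequence, one \xHH per byte; Omega (U+03A9) is 0xce 0xa9 in UTF-8:

```shell
#!/bin/bash
# Emit Omega from a pure-ASCII script, one byte at a time (GNU sed \xHH):
echo 'Omega' | sed -e 's/Omega/\xce\xa9/g'

# Recent bash (4.2 or later) can also expand a code point directly:
echo $'\u03a9'
```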
At first I thought you could use sed's "y/input set/output set/" command, or the "tr" command as a filter, but it seems that sed considers certain UTF-8 characters to be more than one character.
You could write a sed program (saved as a file) with lines like:
Code:
s/а/a/g
For a UTF-8 character set there is a one-to-one correspondence, so only one sed program would be needed. For the iso character sets, you would need one sed program for each character set. Suppose that you call this sed program translate.sed. You could use the pipe "| sed -f translate.sed" in a command to filter and translate the characters:
Code:
for file in *; do
You can also match the UTF-8 bytes directly:
Code:
echo 'фффф' | sed 's/\xd1\x84/f/g' |
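The for loop above was cut short; here is a minimal sketch of how the rename pass could look (translate.sed and its rules are assumptions; with GNU sed the rules themselves can stay 100% ASCII by spelling out the UTF-8 bytes as \xHH):

```shell
#!/bin/bash
# translate.sed might contain ASCII-only rules such as:
#   s/\xc3\xa9/e/g      (e-acute)
#   s/\xc3\xbc/ue/g     (u-umlaut)
for file in *; do
    newname=$(printf '%s\n' "$file" | LC_ALL=C sed -f translate.sed)
    if [ "$newname" != "$file" ]; then
        mv -- "$file" "$newname"
    fi
done
```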
Quote:
process non-ASCII files
Quote:
s/л/l/g
s/ц/ts/g
s/ь//g
s/г/g/g
are written in non-ASCII characters. There must be some simple 100% ASCII solution. |
Using the iconv program may be a better solution:
http://www.linuxproblem.org/art_21.html This may work for accented characters, but Cyrillic characters will be invalid going to Latin-1, for example. |
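A sketch of the iconv route (glibc iconv; the //TRANSLIT suffix asks iconv to approximate characters that have no ASCII counterpart, and the quality of the approximation depends on the current locale):

```shell
#!/bin/bash
# Transliterate accented characters toward plain ASCII.
# Characters with no reasonable ASCII form degrade to '?'
# instead of aborting the conversion.
echo 'café Zürich' | iconv -f utf-8 -t ascii//TRANSLIT
```

Run this under a UTF-8 locale; in the plain C locale glibc tends to fall back to '?' for everything.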
Well, if there is any character you want to replace, you just need to cut it out of some sample file, paste it into a new file as the only character, and hexdump that file. Then my solution is applicable.
|
OK, now I understand something.
Code:
echo 'F' | sed -e 's/F/\x8c\xc4/g' > output.tmp
writes the letter Ф to the file, but the command "hexdump output.tmp" gives
Code:
0000000 c48c 000a
0000003
How is this related to \x8c\xc4? The only common part here is 8c. By the way, I can see the letter Ф in the file only when I use an editor. If I type "cat output.tmp", I do not see any output. I have other files encoded in UTF-8, and those files show their content under the cat command. What is going wrong? Is what I get really a Unicode file? |
The question is whether your console is Unicode - that matters for cat. Try 'hexdump -C' to see the bytes in file order - that explains the "8c c4 -> c4 8c" discrepancy.
|
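The byte-order trap above in a self-contained sketch: plain hexdump groups the input into 16-bit little-endian words, while hexdump -C (or the POSIX od -An -tx1) shows bytes in file order:

```shell
#!/bin/bash
# Write the two raw bytes 0x8c 0xc4 and dump them both ways.
printf '\214\304' > bytes.tmp      # octal 214 304 = hex 8c c4
hexdump bytes.tmp        # word view: c48c (looks byte-swapped)
od -An -tx1 bytes.tmp    # file order: 8c c4
```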
Oh! Thanks now I know everything for my script.
Quote:
http://www.cl.cam.ac.uk/~mgk25/ucs/e...UTF-8-demo.txt and the cat command outputs it correctly on the screen. So I think this means that my console is Unicode. Am I wrong? |
Quote:
written, they show some abracadabra. |
Well, in GVim try opening the file and issuing ':e! ++enc=utf-8'. If your console does not recognize some Unicode characters, it may be missing some fonts.
|
no, it still does not work correctly
it was like this: ŒÄ, and after ':e! ++enc=utf-8' it shows two question marks: ?? The fonts are OK, since I can view the letter Ф in emacs in terminal mode, i.e. using emacs -nw. I suspect that the file is not Unicode, but something else. While emacs is smart enough to find the correct encoding automatically, other editors cannot do this. |
Run "file filename". It may report the encoding used.
Code:
cat >test Code:
cat test |
Code:
¡¢£¤¥¦§¨©ª«¬*®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Running "file non-ascii.out" gives
Code:
non-ascii.out: ISO-8859 text
so this is not a UTF-8 file. How does one convert an ISO-8859 file to a UTF-8 file? Does anybody know? |
Here I will use iconv to convert your file from each of the encodings I found with "locate 8859". For a real example, the characters in a file should make up actual words with accents, or foreign characters; you should be able to tell if you used the right encoding by examination. Posting a few sample lines of an actual file would have been more useful.
Code:
for code in $(seq 1 9) 13 14 15; do
    echo
    echo -n "iso8859-$code :"
    iconv -f iso_8859-$code -t utf-8 -o - non-ascii.out
done
I hope we don't confuse the LQ server with all of these strange characters! Just glancing at the results you can see which one supports Cyrillic. The documentation for the codepages should tell you what locales they are for. |
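Once the loop above identifies the codepage, the actual conversion is a one-liner; ISO-8859-1 and the output file name here are assumptions, so substitute whatever the test told you:

```shell
#!/bin/bash
# Convert a Latin-1 file to UTF-8; e.g. Latin-1 e-acute (0xe9) becomes 0xc3 0xa9.
iconv -f iso-8859-1 -t utf-8 non-ascii.out > non-ascii.utf8
```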
Very interesting discussion.
2 coding (as in programming) comments, both involving the use of bash brace expansion: Code:
for code in $(seq 1 9) 13 14 15 Code:
echo -e `echo \\\\x{a..f}{{0..9},{a..f}}` > non-ascii.out |
Code:
for code in $(seq 1 9) 13 14 15; do echo;echo -n "iso8859-$code :"; iconv -f iso_8859-$code -t utf-8 -o - non-ascii.out; done Code:
echo -e `echo \\\\x{a..f}{{0..9},{a..f}}` > non-ascii.out But why are there spaces between characters? And how are you calculating the number of backslashes? There are so many of them, what do they mean? |
Quote:
If you need a literal '\' to appear in a context like this, you escape it with itself: '\\'. Sometimes, as here, that isn't enough; there is a second layer of escaping necessary. Then '\\\\' (which becomes '\\', which becomes '\') is used. I didn't bother to figure out why 4 is the right number of them to use; I just stopped when I knew I had the right answer. I knew to try this mainly from reading the gawk documentation.
Well, really it just shows that echo interprets \ by default. First, \\\\ stands unprotected in the middle of a command, so the shell reduces it to '\\' while parsing the command line (the same pass in which "a\ b" would be treated as one word). The inner echo invocation therefore gets an argument starting with '\\x'. By default echo interprets \-sequences, so the command in `` outputs something beginning with '\x'. That output is then fed to the outer echo -e, which treats it as the start of a hex escape. |
Quote:
And what should be modified to get rid of them? BTW,
Code:
echo -e \\x{a..f}{{0..9},{a..f}} > non-ascii.out
works well too, so one does not need two echos. |
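The escaping layers are easier to see on a small printable range. A sketch using bash brace expansion and its builtin echo -e:

```shell
#!/bin/bash
# Brace expansion yields the three separate words \x41 \x42 \x43
# (quote removal halves each \\ into \), and echo -e decodes the hex escapes:
echo -e \\x4{1..3}
# prints: A B C
```

The spaces in the output are simply the word separators echo places between its arguments, which is also why the non-ascii.out one-liner produces space-separated characters.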
Quote:
Very interesting discussion.
2 coding (as in programming) comments, both involving the use of bash brace expansion:
Code:
for code in $(seq 1 9) 13 14 15
Thanks for that. I had forgotten about it. I'll routinely use the {a,b,c} form of brace expansion, but using a range hadn't sunk into my brain enough to remember it.
---
Wikipedia has some good articles about the iso8859 standard. Some of the \xA0-\xFF values are not used, so the sample file we used should be adjusted. |
jschiwal,
OTOH I never knew, or had completely forgotten, seq & its "-w" option. That can produce series like "08 09 10 11", compare: Code:
echo {0{1..9},{10..20}} Code:
echo {0{0{0{1..9},{10..99}},{100..999}},{1000..1010}} igor.R, I think the spaces are provided by the shell as word separators during the brace expansion. If you want to remove them use sed 's, ,,g': Code:
echo -e \\x{a..f}{{0..9},{a..f}} | sed 's, ,,g' |
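An alternative sketch that generates the same 0xA0-0xFF sample file without nested echos or stripping spaces afterwards, using a loop and bash's builtin printf (whose format string understands \xHH):

```shell
#!/bin/bash
# Emit every byte from 0xa0 through 0xff, back to back, no separators.
for i in {160..255}; do
    printf "\\x$(printf '%02x' "$i")"
done > non-ascii.out
# 255 - 160 + 1 = 96, so the file is exactly 96 bytes.
```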
deleted - manipulating unicode via bash
<deleted>
|
Solution: removing accent marks from file names
I don't know how to 'fold' posts on this forum, or how to delete them.
Hopefully though, this will be more acceptable: Code:
$ export FILTER=$(/usr/bin/time -f '%e seconds' ../gen_filter.sh) |
SwaJime,
Please edit your posts to fold your extra long code blocks -- they are causing the worst horizontal scrolling in Konqueror 3.5.8 that I have ever seen. If you don't, the only way I can continue to participate in this thread is to put you on my ignore list.
<original response>
Thank you, SwaJime, for making this thread unreadable in Konqueror 3.5.8 with your extra long code/quote blocks. I can fix this problem in several ways:
</original response> |
Newbies Anonymous
Quote:
Thank you so much for your warm welcoming hospitality. I finally, completely accidentally, stumbled upon some information regarding this "folding" that you've so kindly suggested. I probably won't spend much time posting to any part of this forum in the future, given the gratefulness and appreciation that has been shown to me here so far for my contributions. I was pleased to note also that the horizontal scrolling "issue" that I am somehow responsible for seems to afflict other posts in this thread, and yet there was apparently some redeeming quality of those that kept you from giving them such helpful advice. For reference, the page I found that discusses the "folding" is here: http://www.apps.ietf.org/rfc/rfc822.html#sec-3.1.1 -- j |
Removing accented chars from file
Hi folks
I am kinda new to the Linux world. I wish to achieve the same thing this thread does, but rather than filenames, I have a huge file which contains several of these accented characters that I need to remove. How can I use the above solution for that? A sample of the file is below. Any help is appreciated.
Code:
Landkreis Demmin|Adolf-Pompe-Straße Am Brüll 17| Zürich Heukenstraße 6|Mönchengladbach |
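For a data file like that sample, the byte-level sed trick from earlier in the thread applies directly. A minimal, 100% ASCII sketch with GNU sed (the file names and the chosen replacements are assumptions, and only the characters visible in the sample are covered):

```shell
#!/bin/bash
# UTF-8 byte sequences for the German characters in the sample:
#   ß = c3 9f    ü = c3 bc    ö = c3 b6
LC_ALL=C sed -e 's/\xc3\x9f/ss/g' \
             -e 's/\xc3\xbc/ue/g' \
             -e 's/\xc3\xb6/oe/g' huge-file.txt > huge-file.ascii.txt
```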
RE: Removing accented chars from file
I just posted a reply to another thread that asks the same question (as the last post, not the OP).
http://www.linuxquestions.org/questi...7/#post4858893 |