LinuxQuestions.org
Old 11-11-2004, 09:57 AM   #1
Hko
Senior Member
 
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: Debian
Posts: 2,536

Rep: Reputation: 111
Converting extended ascii (ë,ô) in bash script


Hi,

I need to convert extended characters such as ê and ö (I hope they display in your browser) to their unaccented ASCII equivalents in a bash script, so that 'ë' becomes 'e', and so on.

Can somebody tell me how I could do that, preferably using standard tools (sed, awk, tr or similar)?

Thanks in advance.
 
Old 11-11-2004, 11:39 AM   #2
Hko
Original Poster
OK, I've already found a blunt solution that works (at least with SuSE's default charset), which is enough for me for now. I discovered there is no general way to do this.

For people interested:
Code:
#!/bin/bash

sed \
-e 's/ä/a/g' \
-e 's/á/a/g' \
-e 's/à/a/g' \
-e 's/â/a/g' \
\
-e 's/ë/e/g' \
-e 's/é/e/g' \
-e 's/è/e/g' \
-e 's/ê/e/g' \
\
-e 's/ï/i/g' \
-e 's/í/i/g' \
-e 's/ì/i/g' \
-e 's/î/i/g' \
\
-e 's/ö/o/g' \
-e 's/ó/o/g' \
-e 's/ò/o/g' \
-e 's/ô/o/g' \
-e 's/ø/o/g' \
\
-e 's/ü/u/g' \
-e 's/ú/u/g' \
-e 's/ù/u/g' \
-e 's/û/u/g' \
\
-e 's/ÿ/y/g' \
-e 's/ý/y/g' \
\
-e 's/ñ/n/g' \
\
-e 's/Ä/A/g' \
-e 's/Á/A/g' \
-e 's/À/A/g' \
-e 's/Â/A/g' \
\
-e 's/Ë/E/g' \
-e 's/É/E/g' \
-e 's/È/E/g' \
-e 's/Ê/E/g' \
\
-e 's/Ï/I/g' \
-e 's/Í/I/g' \
-e 's/Ì/I/g' \
-e 's/Î/I/g' \
\
-e 's/Ö/O/g' \
-e 's/Ó/O/g' \
-e 's/Ò/O/g' \
-e 's/Ô/O/g' \
-e 's/Ø/O/g' \
\
-e 's/Ü/U/g' \
-e 's/Ú/U/g' \
-e 's/Ù/U/g' \
-e 's/Û/U/g' \
\
-e 's/Ý/Y/g' \
\
-e 's/Ñ/N/g' \
\
"$1"

# End Of Script
 
Old 11-11-2004, 12:06 PM   #3
Hko
Original Poster
Better yet:
Code:
#!/bin/bash

sed \
-e 's/[äáàâ]/a/g'  \
-e 's/[ëéèê]/e/g'  \
-e 's/[ïíìî]/i/g'  \
-e 's/[öóòôø]/o/g' \
-e 's/[üúùû]/u/g'  \
-e 's/[ÿý]/y/g'    \
-e 's/ñ/n/g'       \
\
-e 's/[ÄÁÀÂ]/A/g'  \
-e 's/[ËÉÈÊ]/E/g'  \
-e 's/[ÏÍÌÎ]/I/g'  \
-e 's/[ÖÓÒÔØ]/O/g' \
-e 's/[ÜÚÙÛ]/U/g'  \
-e 's/Ý/Y/g'       \
-e 's/Ñ/N/g'       \
\
"$1"

Last edited by Hko; 11-11-2004 at 12:10 PM.
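One caveat with the bracket version, offered as a hedged note rather than something from the posts above: as far as I know, GNU sed only treats [äáàâ] as a set of four characters when the locale is multibyte-aware (e.g. UTF-8). In the C locale a bracket expression is a set of bytes, so each byte of a two-byte UTF-8 character would be replaced on its own. A minimal sketch that sidesteps this by using alternation instead of brackets follows. It assumes the input and the script are both saved in the same encoding, that sed supports -E, and the function name strip_accents is made up for illustration.

```shell
#!/bin/bash
# strip_accents: hypothetical helper, not part of the original posts.
# Reads text on stdin and writes it to stdout with the accented letters
# from the script above replaced by their unaccented equivalents.
# Alternation (a|b|c) matches whole byte sequences, so it does not rely
# on sed understanding multibyte characters inside [...].
strip_accents() {
    sed -E \
        -e 's/(ä|á|à|â)/a/g' \
        -e 's/(ë|é|è|ê)/e/g' \
        -e 's/(ï|í|ì|î)/i/g' \
        -e 's/(ö|ó|ò|ô|ø)/o/g' \
        -e 's/(ü|ú|ù|û)/u/g' \
        -e 's/(ÿ|ý)/y/g' \
        -e 's/ñ/n/g' \
        -e 's/(Ä|Á|À|Â)/A/g' \
        -e 's/(Ë|É|È|Ê)/E/g' \
        -e 's/(Ï|Í|Ì|Î)/I/g' \
        -e 's/(Ö|Ó|Ò|Ô|Ø)/O/g' \
        -e 's/(Ü|Ú|Ù|Û)/U/g' \
        -e 's/Ý/Y/g' \
        -e 's/Ñ/N/g'
}

# Example: filter a line of text through the function.
printf '%s\n' 'Ëén crème brûlée, señor Ñandú' | strip_accents
```

Because it is a filter, it can be used as `strip_accents < input.txt > output.txt`.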
 
Old 06-01-2009, 09:03 AM   #4
SwaJime
LQ Newbie
 
Registered: May 2009
Distribution: Ubuntu, CentOS, Redhat, Maemo
Posts: 10

Rep: Reputation: 1

Quote:
Originally Posted by Hko
Hi,

I need to convert extended characters such as ê and ö (I hope they display in your browser) to their unaccented ASCII equivalents in a bash script, so that 'ë' becomes 'e', and so on.

Can somebody tell me how I could do that, preferably using standard tools (sed, awk, tr or similar)?

Thanks in advance.
A little late, as usual, but this question was also asked in another thread. I posted the solution in that thread -> http://www.linuxquestions.org/questi...ml#post3559031
 
Old 12-29-2012, 03:42 AM   #5
NateT
LQ Newbie
 
Registered: Dec 2012
Posts: 2

Rep: Reputation: Disabled
I would like to expand on the OP's question and then provide my answer.

I often need to convert a text file to strict ASCII while losing as little readability as possible. The file may be Unicode or (extremely likely, nowadays) Windows extended ASCII, and it contains chunks of text like:

Jörge says, “Look – ½ nuggets!”.

A person can easily convert this into ASCII: Jorge says, "Look - 1/2 nuggets!".
The conversions involved are:
1) accented ö converted to unaccented o
2) open and close double quotes each converted to "
3) long dash – converted to -
4) symbol ½ converted to the three-character sequence 1/2
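
As a rough illustration only, the four conversions listed above could be written as plain sed substitutions. This sketch assumes the text is already UTF-8 and that the script is saved in the same encoding. It handles nothing beyond the characters in the sample sentence, and the function name ascii_sketch is made up for the example.

```shell
#!/bin/bash
# ascii_sketch: hypothetical name; covers only the four conversions
# from the list above, one substitution per character.
ascii_sketch() {
    sed \
        -e 's/ö/o/g' \
        -e 's/“/"/g' \
        -e 's/”/"/g' \
        -e 's/–/-/g' \
        -e 's|½|1/2|g'   # '|' as delimiter so the '/' in 1/2 needs no escaping
}

# Run the sample sentence through the filter.
printf '%s\n' 'Jörge says, “Look – ½ nuggets!”.' | ascii_sketch
```

Every additional character needs its own rule, which is why a hand-written list like this tends to cover only a subset of what turns up in real files.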

After much googling, I have found that the problem is common, but most of the answers out there miss the mark. The script posted above is very similar to approaches I have used in the past, usually as a Perl one-liner and usually converting only a subset of the "bad" characters.

A more complete conversion tool is uni2ascii, but it only converts (translates) UTF-8 and (as the uni2ascii site freely admits) you may have to use iconv first to convert to UTF-8.

So, a technique that has worked well for me lately is the following one-liner:

Code:
iconv --from-code $(file -b --mime-encoding non_ASCII_file.txt | sed 's/unknown-8bit/WINDOWS-1258/') --to-code UTF-8 -c non_ASCII_file.txt | uni2ascii -qB
It could easily be turned into a bash script or shell function; I just haven't done it yet.
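
One way the one-liner might be wrapped as a shell function is sketched below. The function names normalize_encoding and to_ascii are invented for this example. The unknown-8bit to WINDOWS-1258 fallback is copied from the one-liner, with a case statement standing in for the sed call. The sketch assumes that file, iconv and uni2ascii are all installed.

```shell
#!/bin/bash
# normalize_encoding NAME
# Maps the encoding name reported by `file --mime-encoding` to one that
# iconv accepts. Only the unknown-8bit case from the one-liner is handled;
# everything else is passed through unchanged.
normalize_encoding() {
    case "$1" in
        unknown-8bit) printf '%s\n' 'WINDOWS-1258' ;;
        *)            printf '%s\n' "$1" ;;
    esac
}

# to_ascii FILE
# Hypothetical wrapper around the one-liner: detect the encoding, convert
# to UTF-8 with iconv (-c drops characters that cannot be converted), then
# let uni2ascii produce the ASCII version on stdout.
to_ascii() {
    local src=$1 enc
    enc=$(normalize_encoding "$(file -b --mime-encoding "$src")") || return
    iconv --from-code "$enc" --to-code UTF-8 -c "$src" | uni2ascii -qB
}
```

Usage would then be along the lines of `to_ascii non_ASCII_file.txt > ascii_file.txt`.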
 