LinuxQuestions.org
Old 12-09-2011, 11:12 AM   #1
sopier
Member
 
Registered: Dec 2011
Location: Jogja, Indonesia
Distribution: Ubuntu
Posts: 33

Rep: Reputation: Disabled
replace ascii characters using sed


I'm using GNOME Terminal. When I run this command:

Code:
echo Michael Bublé | sed 's/[éèêë]/e/g'
the result is fine => Michael Buble

But when I run this command:

Code:
cat artist.txt | sed 's/[éèêë]/e/g'
the result is Michael Bubl

Is there a solution for this?


Regards
 
Old 12-09-2011, 11:16 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Have you tried the more elegant and resource-friendly:
Code:
sed 's/[éèêë]/e/g' artist.txt
Hope this helps.
 
Old 12-09-2011, 11:17 AM   #3
corp769
LQ Guru
 
Registered: Apr 2005
Location: /dev/null
Posts: 5,818

Rep: Reputation: 1007
Works for me. Can you post the contents of that file?
 
Old 12-09-2011, 11:19 AM   #4
sopier
Member
 
Registered: Dec 2011
Location: Jogja, Indonesia
Distribution: Ubuntu
Posts: 33

Original Poster
Rep: Reputation: Disabled
Yes, I already tried that too; the result is still the same: Michael Bubl
 
Old 12-09-2011, 11:21 AM   #5
sopier
Member
 
Registered: Dec 2011
Location: Jogja, Indonesia
Distribution: Ubuntu
Posts: 33

Original Poster
Rep: Reputation: Disabled
The content is only one line:

Code:
Michael Bublé
 
Old 12-09-2011, 11:23 AM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

What's the output of the following command:
Code:
$ file artist.txt
The output of the above command should show whether this is a "normal" ASCII file or not.
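Besides file, one way to test whether a file decodes as UTF-8 is to let iconv convert it from UTF-8 to UTF-8 and look at the exit status. The sketch below is only an illustration: the helper name is_utf8 and the two sample files are made up for this example, and it assumes an iconv (such as the glibc one) that exits non-zero on an invalid input sequence.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Assumes iconv returns a non-zero status on an illegal input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Two sample files with the same name in different encodings.
# The accented letter is written as explicit octal byte escapes.
tmpdir=$(mktemp -d)
printf 'Michael Bubl\351\n'     > "$tmpdir/latin1.txt"   # single byte 0xE9 (ISO-8859-1)
printf 'Michael Bubl\303\251\n' > "$tmpdir/utf8.txt"     # bytes 0xC3 0xA9 (UTF-8)

for f in "$tmpdir/latin1.txt" "$tmpdir/utf8.txt"; do
    if is_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8"
    else
        echo "$(basename "$f"): not valid UTF-8"
    fi
done
```

A plain ASCII file also passes this check, since ASCII is a subset of UTF-8. Remove the temporary directory with rm -r "$tmpdir" when done.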
 
Old 12-09-2011, 11:25 AM   #7
sopier
Member
 
Registered: Dec 2011
Location: Jogja, Indonesia
Distribution: Ubuntu
Posts: 33

Original Poster
Rep: Reputation: Disabled
artist.txt: ISO-8859 text
 
Old 12-09-2011, 11:37 AM   #8
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

That might be the problem.

Mine says UTF-8 Unicode text and both sed 's/[éèêë]/e/g' artist.txt and cat artist.txt | sed 's/[éèêë]/e/g' work on my side.

If I change my locale to anything other than UTF-8, the output is incorrect.

Check your locale setting (locale shows the current one, locale -a lists the available ones), then try setting it to, for example, en_US.UTF-8 (export LANG=en_US.utf8) and try again.

Hope this helps.

BTW: The change above only applies to your current terminal session.
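To see why the encoding matters, it can help to look at the raw bytes. The sketch below builds the same name in both encodings and dumps it with od. The variable names are made up for this illustration, and the accented letter is written with octal escapes so the example does not depend on how the terminal encodes typed input.

```shell
# The same name in two encodings, built from explicit octal byte escapes.
latin1=$(printf 'Michael Bubl\351')       # e-acute as one byte, 0xE9 (ISO-8859-1)
utf8=$(printf 'Michael Bubl\303\251')     # e-acute as two bytes, 0xC3 0xA9 (UTF-8)

# Hex dump of each string: the first ends in "e9", the second in "c3 a9".
printf '%s' "$latin1" | od -An -tx1
printf '%s' "$utf8"   | od -An -tx1

# Byte counts differ by one because of the two-byte UTF-8 sequence.
latin1_bytes=$(printf '%s' "$latin1" | wc -c | tr -d ' ')
utf8_bytes=$(printf '%s' "$utf8" | wc -c | tr -d ' ')
echo "latin1: $latin1_bytes bytes, utf8: $utf8_bytes bytes"
```

A pattern typed in a UTF-8 terminal contains the two-byte form, so it never matches the single byte 0xE9 stored in an ISO-8859-1 file.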
 
Old 12-09-2011, 11:48 AM   #9
sopier
Member
 
Registered: Dec 2011
Location: Jogja, Indonesia
Distribution: Ubuntu
Posts: 33

Original Poster
Rep: Reputation: Disabled
Thank you. In the end I converted the file first using this command:

Code:
iconv -f ISO-8859-1 -t UTF-8 artist_old.txt > artist_new.txt
and the sed replacement works fine.
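The conversion and the replacement can also be chained in one pipeline, without an intermediate file. The following is a sketch, not the exact commands used in this thread: the function name strip_e_accents is made up, it assumes a sed that accepts -E (GNU and BSD sed both do), and it uses an alternation instead of the bracket expression so that the multi-byte sequences are matched as whole units even if sed happens to run in a non-UTF-8 locale. The script itself must be saved as UTF-8.

```shell
# Hypothetical helper: read ISO-8859-1 text on stdin, write UTF-8 with the
# accented e variants replaced by a plain "e".
strip_e_accents() {
    iconv -f ISO-8859-1 -t UTF-8 | sed -E 's/(é|è|ê|ë)/e/g'
}

# Sample input: the name with e-acute as the single ISO-8859-1 byte 0xE9.
result=$(printf 'Michael Bubl\351\n' | strip_e_accents)
echo "$result"
```

For a file, the same helper could be used as strip_e_accents < artist.txt.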

Regards
 
Old 12-09-2011, 11:50 AM   #10
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
You're welcome
 
Old 12-09-2011, 12:35 PM   #11
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Note that you can use
Code:
iconv -f ISO-8859-1 -t ASCII//TRANSLIT input-file > output-file
to convert all accented letters to their nearest ASCII equivalents, for example é è ê ë to e, æ to ae, and so on.
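The result of //TRANSLIT may vary with the iconv implementation and the active locale. When only a small, known set of ISO-8859-1 characters needs replacing, a byte-level mapping with tr is one locale-independent alternative. This is a sketch of a different technique, not the iconv method above; the function name latin1_e_to_ascii is made up, and tr maps one byte to one byte, so it cannot expand æ to ae.

```shell
# Hypothetical helper: map the ISO-8859-1 bytes for e-grave, e-acute,
# e-circumflex and e-diaeresis (octal 350 to 353) to a plain "e".
# LC_ALL=C makes tr treat its input strictly as single bytes.
latin1_e_to_ascii() {
    LC_ALL=C tr '\350\351\352\353' 'eeee'
}

# Sample input: the name with e-acute as the single byte 0xE9.
result=$(printf 'Michael Bubl\351\n' | latin1_e_to_ascii)
echo "$result"
```

This operates directly on the ISO-8859-1 file, so no prior conversion to UTF-8 is needed.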
 
  

