LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 11-21-2018, 01:19 PM   #1
Bartonsen
Member
 
Registered: Oct 2016
Posts: 41

Rep: Reputation: Disabled
Replace the Euro-sign (€) using sed


"file.csv" was converted from Excel and contains a lot of euro signs (€).
How can I remove this character from the file using a bash script?

sed -i 's/\€//g' file.csv
works OK when run from the command line, but it does not work when run from a script. Why?

Tried the following:
#!/usr/bin/sh and #!/bin/bash

sed -i "s/\&euro//g" file.csv
sed -i "s/\x80//g" file.csv
sed -i 's/\<80>//g' file.csv
sed -i 's/\€//g' file.csv
 
Old 11-21-2018, 01:29 PM   #2
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,963

Rep: Reputation: 271
Have you tried
Code:
 tr -d '€'
 
Old 11-21-2018, 01:44 PM   #3
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by Bartonsen
sed -i 's/\€//g' file.csv
works ok if run from commandline, but running from script, it's not working. Why?
I see no reason why it shouldn't work, unless your bash script file isn't UTF-8 encoded.
Are you using the exact same command?
Do you always make a copy of the original file.csv before attempting it?
What does the euro sign look like inside the .csv file?
 
Old 11-21-2018, 02:01 PM   #4
Bartonsen
Member
 
Registered: Oct 2016
Posts: 41

Original Poster
Rep: Reputation: Disabled
tr -d '€' works from the command line, but does nothing when run from a script.

UTF-8 could be the issue, but I don't know how to check that.

For the script, I enter the € sign, but when doing vi on the file, € looks like <80>
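One way to check, sketched under assumptions: search both the script and the CSV for the raw bytes of the euro sign. The helper name `euro_encoding` is made up for this illustration. The byte values are the UTF-8 sequence e2 82 ac and the single byte 0x80 that Windows-1252 uses, which is what vi renders as <80>.

```shell
#!/bin/sh
# Hypothetical helper: report which byte form of the euro sign a file contains.
euro_encoding() {
    utf8=$(printf '\342\202\254')   # U+20AC encoded as UTF-8: e2 82 ac
    cp1252=$(printf '\200')         # euro in Windows-1252: single byte 0x80
    # LC_ALL=C makes grep compare raw bytes instead of decoded characters.
    if LC_ALL=C grep -q "$utf8" "$1"; then
        echo "utf-8"
    elif LC_ALL=C grep -q "$cp1252" "$1"; then
        # Only a hint: 0x80 can also be part of other UTF-8 characters.
        echo "byte-0x80"
    else
        echo "none"
    fi
}

# Demo on a made-up sample file.
sample=$(mktemp)
printf 'price;\342\202\254 10\n' > "$sample"
euro_encoding "$sample"     # prints: utf-8
rm -f "$sample"
```

Running it on both the script and file.csv should show whether the two disagree; if they do, the € typed into the script can never match the € in the data.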
 
Old 11-21-2018, 02:03 PM   #5
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Quote:
Originally Posted by ondoho
what does the euro sign look like inside the .csv file?
Yes. cat or more the csv file, copy the character and paste it into the sed command.
or
Remove it in Excel before converting to csv.
 
Old 11-21-2018, 02:03 PM   #6
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242
maybe use the hex number thingy.
 
Old 11-21-2018, 02:24 PM   #7
Bartonsen
Member
 
Registered: Oct 2016
Posts: 41

Original Poster
Rep: Reputation: Disabled
Hmmm... doing cat or vi on the csv file, the symbol looks like €.

Opening the csv file in Excel, the euro symbol is displayed as â‚¬

I should add that this is a csv file converted from Excel using: libreoffice --headless --convert-to csv file.xlsx --infilter=CSV:59,34,UTF8

Maybe the infilter is wrong?
I chose 59 because I think that means semicolon as the separator, but I don't really know what I'm doing.
It works as expected on other Excel files, but I ran into problems now that the Excel sheet contains €...
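A hedged note on the conversion command: `--infilter` describes how to read the input, and the input here is an .xlsx file, so the CSV options given there may not be doing what was intended. LibreOffice accepts CSV export options appended to the output filter name instead. The sketch below assumes the documented token order (field separator, text delimiter, character set), where 59 is a semicolon, 34 a double quote, and 76 the code for UTF-8; `file.xlsx` is the name from the post, and the variable names are made up.

```shell
#!/bin/sh
# Export options: separator 59 (;), text delimiter 34 ("), charset 76 (UTF-8).
filter_opts='59,34,76'
target="csv:Text - txt - csv (StarCalc):${filter_opts}"

# Only attempt the conversion when LibreOffice and the input are present.
if command -v libreoffice >/dev/null 2>&1 && [ -f file.xlsx ]; then
    libreoffice --headless --convert-to "$target" file.xlsx
else
    echo "would run: libreoffice --headless --convert-to '$target' file.xlsx"
fi
```

With the character set stated on the export side, the resulting file.csv should be UTF-8, which matches what cat and vi already show.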

Last edited by Bartonsen; 11-21-2018 at 02:35 PM.
 
Old 11-21-2018, 03:09 PM   #8
Bartonsen
Member
 
Registered: Oct 2016
Posts: 41

Original Poster
Rep: Reputation: Disabled
I'm probably doing something wrong in the infilter for the libreoffice command, but I found this to work ok for my script:

Code:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
 
Old 11-21-2018, 03:11 PM   #9
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
If more displays it as you expect, then
Code:
sed -i 's/€//g' file.csv
should work.
Note that you shouldn't need to escape the €

As stated, back up file.csv before each attempt so it can be restored to try again. Or, for testing, skip the in-place edit: when
Code:
sed 's/€//g' file.csv | grep €
returns nothing, you'll have it.
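The backup-and-verify advice above can be sketched as a small script. This is only an illustration: the sample data and the temporary directory are made up, and the euro sign is built with printf so the script's own encoding does not matter.

```shell
#!/bin/sh
# Sketch: write to a new file, verify, and only then replace the original.
euro=$(printf '\342\202\254')            # UTF-8 bytes of the euro sign
workdir=$(mktemp -d)
printf 'item;price\ncoffee;%s 2,50\n' "$euro" > "$workdir/file.csv"

cp -p "$workdir/file.csv" "$workdir/file.csv.bak"      # keep the original
LC_ALL=C sed "s/$euro//g" "$workdir/file.csv" > "$workdir/file.csv.new"

# Replace the original only if no euro sign is left in the result.
if LC_ALL=C grep -q "$euro" "$workdir/file.csv.new"; then
    echo "euro sign still present, original left untouched" >&2
else
    mv "$workdir/file.csv.new" "$workdir/file.csv"
fi
```

Because the original is never edited in place, a failed attempt costs nothing and the .bak copy stays available for the next try.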
 
Old 11-21-2018, 06:07 PM   #10
mina86
Member
 
Registered: Aug 2008
Distribution: Debian
Posts: 517

Rep: Reputation: 229
‘tr -d €’ is not a correct solution. Observe how it corrupts characters other than the euro sign:
Code:
$ echo '<ż๓łw €>' | tr -d €
<ż๓ลw >
This is because tr does not understand Unicode and operates on bytes, and € is three bytes in UTF-8 (e2 82 ac).
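To make the byte-level effect visible regardless of terminal encoding, here is a small sketch that dumps the bytes in hex. The `hex` helper is a made-up name, and the characters are written as octal escapes so the example does not depend on how the script file is saved.

```shell
#!/bin/sh
# Hex-dump helper: print the bytes on stdin as one hex string.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

printf '\305\202' | hex          # ł -> c582
printf '\342\202\254' | hex      # € -> e282ac

# tr -d treats its argument as the byte set {e2, 82, ac}, so the 82 inside
# ł is deleted too, leaving a lone c5 that is not valid UTF-8.
printf '<\305\202w \342\202\254>' | LC_ALL=C tr -d '\342\202\254' | hex
# -> 3cc577203e
```

The ł loses its second byte because it happens to share 0x82 with the euro sign; any other character containing e2, 82 or ac would be damaged the same way.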

To delete all non-ASCII characters, you need ‘tr -d '\200-\377'’ as in:
Code:
$ echo '<żółw €>' | tr -d '\200-\377'
<w >
To delete just the euro sign use sed as has been mentioned by others. If your script doesn’t work, ‘hexdump -C’ can help you find out differences in encoding that may cause that. You can also explicitly encode the character:
Code:
$ echo '<żółw €>' | sed $'s/\342\202\254/EURO/'
<żółw EURO>
though I don’t think you should need to do that (and note that $'…' is a bash extension), since just using € should work if you save your files in the same encoding.

Lastly, you can play around with iconv to do various things to your input file, e.g.:
Code:
$ echo '<żółw €>' | iconv -f UTF-8 -t ASCII//TRANSLIT
<zolw EUR>
$ echo '<żółw €>' | iconv -f UTF-8 -t ASCII//IGNORE
<w >

Last edited by mina86; 11-21-2018 at 06:13 PM.
 
Old 11-21-2018, 10:13 PM   #11
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,864
Blog Entries: 1

Rep: Reputation: 1869
My guess is that your data file is in a Windows-125x encoding, while your script is in UTF-8.

tr(1) works on bytes (encoding-independent) so this should work:
Code:
tr -d $'\x80' <file.csv >file1.csv
Or, save your script as ISO-8859-1, and add this line at the beginning:
Code:
export LC_ALL=en_US.ISO-8859-1
After that, sed might work:
Code:
sed -i 's/\x80//g' file.csv
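If the data file really is in Windows-1252, another option is to convert it to UTF-8 first so that script and data agree. A minimal sketch, assuming iconv knows the encoding under the name WINDOWS-1252; the file names and sample content are hypothetical.

```shell
#!/bin/sh
# Sketch: convert a Windows-1252 file to UTF-8 so that a UTF-8 script can
# match the euro sign literally.
workdir=$(mktemp -d)
printf 'price;\200 10\n' > "$workdir/cp1252.csv"   # 0x80 is the euro in Windows-1252

iconv -f WINDOWS-1252 -t UTF-8 "$workdir/cp1252.csv" > "$workdir/utf8.csv"

# The single byte 80 should now appear as the three bytes e2 82 ac.
od -An -tx1 "$workdir/utf8.csv"
```

After the conversion, a plain sed 's/€//g' in a UTF-8 script should match, with no need to change the locale.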

Last edited by NevemTeve; 11-21-2018 at 10:22 PM.
 
Old 11-22-2018, 03:26 AM   #12
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
Maybe
Code:
file -i file.csv
file -i <your_script>
could help to know first which encoding is being used...
 
Old 12-11-2018, 12:18 PM   #13
tofino_surfer
Member
 
Registered: Aug 2007
Posts: 483

Rep: Reputation: 153
Quote:
Tried the following:
#!/usr/bin/sh and #!/bin/bash
On many Linux systems, such as RH/Fedora, these are the same: sh is just a link to bash, and a separate old-style /usr/bin/sh no longer exists.

Code:
$ type sh
sh is /usr/bin/sh
$ ll /usr/bin/sh
lrwxrwxrwx 1 root root 4 May 18  2018 /usr/bin/sh -> bash
Try this on your system.
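The same check can be scripted. The sketch below resolves what sh points to; the result varies by distribution (on Debian-based systems it is typically dash rather than bash). This matters for this thread because $'…' quoting is a bash feature, while printf with octal escapes works in any POSIX sh. The variable names are made up.

```shell
#!/bin/sh
# Resolve what "sh" actually runs on this system (output varies by distro).
sh_path=$(command -v sh)
sh_target=$(readlink -f "$sh_path")
echo "sh is $sh_path -> $sh_target"

# printf octal escapes are POSIX, so this yields the euro sign's UTF-8
# bytes no matter which shell ends up interpreting the script.
euro=$(printf '\342\202\254')
printf '%s' "$euro" | od -An -tx1
```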
 
Old 12-11-2018, 12:28 PM   #14
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,963

Rep: Reputation: 271
If you're trying to convert from UTF to ISO-8859 try utf8trans.
 
Old 12-11-2018, 02:02 PM   #15
tofino_surfer
Member
 
Registered: Aug 2007
Posts: 483

Rep: Reputation: 153
The Unicode code point for the euro symbol is U+20AC.

http://eurosymbol.eu/ascii-code

Code:
$ echo -e '\u20AC'
€
$ echo -ne '\u20AC' | hexdump -C
00000000  e2 82 ac                                          |...|
00000003
$ echo -e "\xe2\x82\xac"
€
$ echo -e '\xe2\x82\xac'
€
Try the following. I don't know whether single or double quotes are needed, or whether it matters at all which you use. I changed the delimiter to # for readability.

Code:
sed -i 's#\xe2\x82\xac##g' file.csv
sed -i "s#\xe2\x82\xac##g" file.csv
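On the quoting question, a quick sketch suggests it makes no difference here: a backslash before x is left alone inside double quotes too, so sed receives the same script either way. The \xHH escapes are then interpreted by sed itself, which is a GNU sed extension and is assumed below. LC_ALL=C is added to keep the match byte-oriented; the variable names are made up.

```shell
#!/bin/sh
# The shell passes the same string to sed with either quoting style.
single='s#\xe2\x82\xac##g'
double="s#\xe2\x82\xac##g"
[ "$single" = "$double" ] && echo "identical sed scripts"

# \xHH is interpreted by sed itself (GNU sed assumed).
# The input is "a", the UTF-8 euro sign, then "b".
printf 'a\342\202\254b\n' | LC_ALL=C sed "$double"
```

Since the pattern is plain ASCII in the script, this form also sidesteps the question of which encoding the script file was saved in.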

Last edited by tofino_surfer; 12-11-2018 at 02:57 PM.
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration