LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
09-08-2017, 05:49 AM   #1
Effman
LQ Newbie
 
Registered: Mar 2017
Location: Hamburg
Distribution: Debian
Posts: 17

Rep: Reputation: Disabled
xml_grep problem with utf-8 file with special chars


Hi everyone,
As the headline says, I have a problem getting values out of a UTF-8 encoded XML file when it contains special characters.

The parts that don't work look like this:
Code:
    <ORT>München</ORT>
    <FARBE>weiß</FARBE>
I wrote a script in which I execute xml_grep:

Code:
xml_grep 'FARBE' script.xml --text_only > FARBE.TXT
xml_grep 'ORT' script.xml --text_only > ORT.TXT
When the script reaches "München" and "weiß", my output file contains "München" and "WeiÃ<9f>". This is my first time using xml_grep, and I tried the --encoding option without success.

I hope it is possible to use xml_grep with special characters; otherwise I will have to rewrite these parts of the script with other commands that handle them.

Thanks!
Effman
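
Editor's note: the garbled "Ã<9f>" is consistent with the two UTF-8 bytes of ß (0xC3 0x9F) being read as ISO-8859-1 somewhere along the way. The post does not show whether that happens in the output file itself or only in the program used to view it, so the sketch below is an illustration of the mechanism, not a diagnosis. It uses only printf, od, tr and iconv; the sample strings are written as octal escapes so the result does not depend on how this snippet is saved.

```shell
# "weiß" as UTF-8: ß is the two bytes 0xC3 0x9F (octal 303 237).
utf8_hex=$(printf 'wei\303\237' | od -An -tx1 | tr -d ' \n')
echo "UTF-8:          $utf8_hex"        # 776569c39f

# Treat those bytes as ISO-8859-1 and convert to UTF-8 again.
# 0xC3 becomes the character Ã (c3 83) and 0x9F becomes the control
# character U+009F (c2 9f), which many tools show as an escape like <9f>.
double_hex=$(printf 'wei\303\237' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
echo "double-encoded: $double_hex"      # 776569c383c29f

# The reverse conversion undoes the damage, but only if the data really
# was encoded twice; on correctly encoded text it would corrupt it.
repaired_hex=$(printf 'wei\303\203\302\237' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')
echo "repaired:       $repaired_hex"    # 776569c39f
```

Running `od -An -tx1 FARBE.TXT | head` on the real output would show which of the two byte patterns the file actually contains.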
 
09-08-2017, 12:26 PM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,227

Rep: Reputation: 2539
For 'München' try 'M\ünchen'
For 'weiß' try 'wei\ß'

I'd try to get the grep working on the command line first, because xml_grep probably passes some variables or defaults to grep when calling it.
 
09-08-2017, 12:39 PM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 23,508

Rep: Reputation: 7777
You may try --encoding; you may also check the LC_* variables (locale settings).
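
Editor's note: a minimal sketch of how the locale check suggested above could be done, together with a quick test of whether a file decodes as UTF-8. The helper name `is_valid_utf8` and the throwaway file names are made up for this illustration; the check relies on iconv exiting with a non-zero status when it meets bytes that are invalid in the source encoding.

```shell
# Character set of the active locale (UTF-8 in a UTF-8 locale).
echo "locale charset: $(locale charmap 2>/dev/null)"

# is_valid_utf8 FILE: succeeds if FILE decodes cleanly as UTF-8.
# (Hypothetical helper name, not part of any existing tool.)
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo on two temporary files. Octal escapes: \303\237 is ß in UTF-8,
# \337 is ß in ISO-8859-1.
tmpdir=$(mktemp -d)
printf 'wei\303\237\n' > "$tmpdir/utf8.txt"
printf 'wei\337\n'     > "$tmpdir/latin1.txt"

is_valid_utf8 "$tmpdir/utf8.txt"   && echo "utf8.txt: valid UTF-8"
is_valid_utf8 "$tmpdir/latin1.txt" || echo "latin1.txt: not valid UTF-8"

rm -rf "$tmpdir"
```

Note that text which has been encoded twice is still valid UTF-8, so this check only separates UTF-8 from single-byte encodings; it does not detect double encoding.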
 
09-09-2017, 03:28 AM   #4
Effman
LQ Newbie
 
Registered: Mar 2017
Location: Hamburg
Distribution: Debian
Posts: 17

Original Poster
Rep: Reputation: Disabled
Thanks for your answers. Using \ is not an option for me. The XML file contains around 100,000 lines with all kinds of special characters, so I am not able to change all of them.

I tried it with --encoding and added UTF-8 to the LC variables, with the same result.
 
09-09-2017, 04:55 AM   #5
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 17,227

Rep: Reputation: 2539
Quote:
Originally Posted by Effman
Thanks for your answers. Using \ is not an option for me. The XML file contains around 100,000 lines with all kinds of special characters, so I am not able to change all of them.

I tried it with --encoding and added UTF-8 to the LC variables, with the same result.
I meant to use it in the grep, not in the XML file. I should have made that clear. Changing every instance of an expression in 100,000 lines of code is not daunting if you have mastered sed.
 
  


All times are GMT -5.