LinuxQuestions.org
Old 05-14-2021, 10:27 AM   #1
czezz
Member
 
Registered: Nov 2004
Distribution: Slackware/Solaris
Posts: 889

Rep: Reputation: 41
[Python] - python -m json.tool changes foreign language letters into unicode (topic updated)


How do I tell curl not to mangle German letters?
E.g. when I execute the following GET request, I get garbled output like this: gärtner > g\u00e4rtner

Code:
curl -k -s --location --request GET 'https://address:443/service/list' \
--header 'Content-Type: application/json' \
--header 'Content-Type: text/html; charset=windows-1252' \
--header 'Authorization: Basic ###secret###'
I tried adding charset=windows-1252, iso-8859-1 or utf-8, but it seems to be ignored.
Any ideas how to overcome this?


[UPDATE]
Quick and dirty workaround would be this:
Code:
sed 's/\\u00f3//g' | sed 's/\\u00fc//g' |  sed 's/\\u00f6//g' | sed 's/\\u00e4//g' | sed 's/\\u00e9//g'
... but there must be a better way?
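
A minimal sketch of why the sed workaround loses data, assuming a response body shaped like the hypothetical one below (the real service output is not shown in this thread). The \uXXXX sequences are ordinary JSON string escapes, so a JSON parser turns them back into the original letters, while deleting them removes the letters altogether:

```python
import json

# Hypothetical response body, written the way json.tool prints it by default.
escaped = '{"name": "g\\u00e4rtner"}'
# The same document with the letter written literally.
literal = '{"name": "gärtner"}'

# Both spellings parse to the identical Python object.
decoded_escaped = json.loads(escaped)
decoded_literal = json.loads(literal)
print(decoded_escaped == decoded_literal)

# What the sed workaround leaves behind: the letter is simply gone.
stripped = escaped.replace("\\u00e4", "")
print(json.loads(stripped)["name"])
```

So the escaped output is still correct JSON; it is only harder for a human to read.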

Last edited by czezz; 05-15-2021 at 06:55 AM.
 
Old 05-14-2021, 01:19 PM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 17,237
Blog Entries: 10

Rep: Reputation: 5160
It depends entirely on the encoding of the requested data.
We need more information.

FWIW, bash is perfectly capable of showing "German" characters, and so is curl:
Code:
$> 
bash: : command not found
$> echo  > file
$> curl "file:///file"
 
1 members found this post helpful.
Old 05-14-2021, 01:20 PM   #3
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 16,490

Rep: Reputation: 5532
You could use -o <filename> to save the response to a file and check whether the letters are intact there.
 
Old 05-15-2021, 12:06 AM   #4
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,170
Blog Entries: 1

Rep: Reputation: 1536
@OP
Hi,
I think the code-snippet you included to demonstrate the problem doesn't contain any accented letters, so it doesn't demonstrate the problem.
Nonetheless, I don't think `curl` would perform any transformation on the text it downloads.

Here is an example:
Code:
curl -o accent.html http://lzsiga.users.sourceforge.net/ekezet.html
Now accent.html has the same content and encoding (ISO-8859-2) as the original file; no conversion occurred.
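
To illustrate what "same content and encoding" means at the byte level, here is a small sketch in Python. The sample word is taken from the first post; the choice of encodings to compare is only an assumption for illustration. The same text is a different byte sequence in each encoding, and a tool that merely copies bytes leaves them untouched. Garbling appears only when the bytes are later decoded with the wrong charset:

```python
text = "gärtner"

# The same text encoded three ways.
utf8_bytes = text.encode("utf-8")            # ä becomes two bytes: c3 a4
latin2_bytes = text.encode("iso-8859-2")     # ä becomes one byte: e4
cp1252_bytes = text.encode("windows-1252")   # ä becomes one byte: e4

print(utf8_bytes.hex())
print(latin2_bytes.hex())

# Decoding UTF-8 bytes as windows-1252 garbles the text,
# although the bytes themselves were never changed.
misread = utf8_bytes.decode("windows-1252")
print(misread)
```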
 
Old 05-15-2021, 06:51 AM   #5
czezz
Member
 
Registered: Nov 2004
Distribution: Slackware/Solaris
Posts: 889

Original Poster
Rep: Reputation: 41
Hi all,
Thank you for looking into this topic.
Note: topic of this post has been updated.

What I didn't mention (and at the time of writing the first post I didn't know it mattered) is that I pipe the curl output through | python -m json.tool
Only now does it turn out that this is the troublemaker (without python -m json.tool the letters are kept as they are).
Apparently | python -m json.tool converts all non-ASCII letters to \uXXXX Unicode escapes (so, in my example, gärtner is converted to g\u00e4rtner).
Is there any way to tell Python not to do that?


For the record, the whole GET request looks like this:
Code:
curl -k -s --location --request GET 'https://address:443/service/list' \
--header 'Content-Type: application/json' \
--header 'Content-Type: text/html; charset=windows-1252' \
--header 'Authorization: Basic ###secret###' | python -m json.tool

Last edited by czezz; 05-15-2021 at 06:57 AM.
 
Old 05-15-2021, 07:15 AM   #6
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,170
Blog Entries: 1

Rep: Reputation: 1536
With PHP:
Code:
echo '{"tag": "value"}' |
php -r '$x= file_get_contents("php://stdin");
        $y= json_decode ($x, $assoc= TRUE);
        print json_encode($y,JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE)."\n";'
 
Old 05-15-2021, 10:06 AM   #7
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 2,321

Rep: Reputation: Disabled
Starting with Python 3.9, json.tool has the option --no-ensure-ascii.
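
On the command line that would be python3 -m json.tool --no-ensure-ascii at the end of the pipe. The option corresponds to the ensure_ascii parameter of json.dumps, and the difference can be sketched as follows (the sample data is hypothetical, not the real service output):

```python
import json

data = {"name": "gärtner", "city": "Köln"}  # hypothetical sample data

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data, indent=4)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(data, indent=4, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings parse back to the same data; only the spelling of the non-ASCII characters differs.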

Alternatively, use any other JSON pretty printer. NevemTeve has shown the PHP solution above. jq will also do. So will json_xs that comes with the Perl module JSON::XS or json_reformat that comes with YAJL.

json_pp will require specifying some options though:
Code:
echo '{"s":""}'|json_pp -json_opt pretty,utf8
BTW, inside a Python script you can always specify any JSON encoding options you want, even if they are not supported on the command line:
Code:
#!/usr/bin/python3
import argparse
import json
import sys

prog = 'jsontool'
description = 'Pretty print JSON'
parser = argparse.ArgumentParser(prog=prog, description=description)
parser.add_argument('infile', nargs='?', type=argparse.FileType(),
                    help='an input file to be converted')
parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                    help='write the output of infile to outfile')
options = parser.parse_args()

infile = options.infile or sys.stdin
outfile = options.outfile or sys.stdout
with infile:
    try:
        indata = json.load(infile)
    except ValueError as e:
        raise SystemExit(e)
with outfile:
    json.dump(indata,outfile,ensure_ascii=False,indent=1)
    outfile.write('\n')

Last edited by shruggy; 05-15-2021 at 10:42 AM.
 
1 members found this post helpful.
  

