LinuxQuestions.org
Old 01-26-2009, 06:32 PM   #1
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
python: how to handle unicode chars in ascii strings?


So, in Python, this assignment is legal, but the conversion that follows it fails:

Code:
In [44]: sys.getdefaultencoding()
Out[44]: 'ascii'

In [45]: a = 'André'

In [46]: a.encode('ascii','replace')
---------------------------------------------------------------------------
exceptions.UnicodeDecodeError                        Traceback (most recent call last)

/hosts/soho/v11/users/briank/<ipython console>

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
However, if I mark the string as unicode, it all works:

Code:
In [47]: a = u'André'

In [48]: a.encode('ascii','replace')
Out[48]: 'Andr??'
The problem I'm having is that I'm using os.walk to go through a bunch of files, and some of those files have paths containing unicode chars. I'm unclear on how to get the results of os.walk treated as unicode so that the encode function works correctly.
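For illustration, here is a minimal sketch of one way to get ASCII-only paths out of os.walk. It assumes the usual behaviour that a unicode starting directory makes os.walk return unicode names on Python 2 (on Python 3, str paths are already unicode). The function name ascii_paths and the fallback to 'utf-8' are hypothetical choices for this example, not anything from the original script.

```python
import os
import sys


def ascii_paths(top):
    """Yield every file path under `top` as an ASCII-only byte string.

    `ascii_paths` is a hypothetical helper name used only for this sketch.
    """
    # Assumption: if `top` arrives as bytes, it is encoded in the
    # filesystem encoding (falling back to UTF-8 when that is unknown).
    if isinstance(top, bytes):
        top = top.decode(sys.getfilesystemencoding() or 'utf-8')
    # A unicode `top` makes os.walk hand back unicode names.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Anything outside ASCII becomes '?'.
            yield full.encode('ascii', 'replace')
```

One caveat: on Python 2, names that cannot be decoded with the filesystem encoding are still returned as byte strings, so a real script would probably need to handle that case separately.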

At the end of the day, all these paths are going into a database. I just want everything that comes out of that database to be ASCII, and I'm having a hard time with that when unicode chars make their way into strings that think they are ASCII.

Hope that made sense.

So... how can you convert ascii strings with unicode chars into actual ascii strings? Pardon me if my terminology is incorrect; see my first example for what I mean by "unicode chars in ascii strings".
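For context on the first example: in Python 2, calling .encode() on a plain (byte) string first decodes it implicitly with the default codec ('ascii'), which is where the UnicodeDecodeError comes from. A small sketch of the explicit version follows. It assumes the bytes are UTF-8, which matches the 0xc3 byte in the traceback but is still an assumption, and to_ascii is a hypothetical helper name.

```python
def to_ascii(value, source_encoding='utf-8'):
    """Return an ASCII-only byte string, with '?' for everything else.

    `to_ascii` and the 'utf-8' default are assumptions made for this sketch.
    """
    if isinstance(value, bytes):
        # Byte string: decode explicitly instead of relying on the
        # implicit 'ascii' decode. Invalid bytes become U+FFFD.
        value = value.decode(source_encoding, 'replace')
    # Unicode string: characters outside ASCII are replaced by '?'.
    return value.encode('ascii', 'replace')


raw = b'Andr\xc3\xa9'   # the UTF-8 bytes behind 'André'
print(to_ascii(raw))
```

Decoding the two UTF-8 bytes first means the accented letter counts as one character, so it is replaced by a single '?'.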
 
Old 01-26-2009, 07:48 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,777
Blog Entries: 54

Rep: Reputation: 2976
Please post your thread once and in only one forum. Posting a single thread in the most relevant forum will make it easier for members to help you and will keep the discussion in one place. This thread should be closed because it is a duplicate of http://www.linuxquestions.org/questi...trings-700041/.
 
Old 01-26-2009, 08:52 PM   #3
niknah
Member
 
Registered: Dec 2002
Location: In front of a computer
Distribution: UPS, DHL, FedEx
Posts: 466

Rep: Reputation: 38
I've double posted before too when the site was down and I hit reload.

What you want is maybe encode('latin-1'... instead of 'ascii'.
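A quick sketch of how the two codecs differ (the sample strings are only examples): Latin-1 can hold é as the single byte 0xE9, so nothing is lost, but the result is no longer pure ASCII.

```python
# u'André', written with an escape so the source file itself stays ASCII.
name = u'Andr\xe9'

# Latin-1 keeps the accented letter as the single byte 0xE9.
latin1 = name.encode('latin-1')

# ASCII cannot represent it, so 'replace' substitutes '?'.
ascii_only = name.encode('ascii', 'replace')

# Characters outside Latin-1 (here the euro sign) still need an
# error handler, otherwise encode raises UnicodeEncodeError.
euro = u'\u20ac5'.encode('latin-1', 'replace')

print(repr(latin1), repr(ascii_only), repr(euro))
```

So 'latin-1' only helps if the database is allowed to hold bytes above 127.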
 
Old 01-26-2009, 09:43 PM   #4
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
To add to the mystery, I tried the following. It worked in one Python session and in the script, but I can't reproduce it in another example, so I suspect it's not totally correct...

Code:
codecs.unicode_internal_decode(a)[0].encode('ascii','replace')
I'm open to better solutions if anyone has one.
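One commonly used alternative, sketched here under the assumption that the text has already been decoded to unicode: decompose accented characters with unicodedata.normalize and drop whatever is not ASCII, so that 'André' becomes 'Andre' rather than 'Andr?'. The helper name strip_accents_to_ascii is hypothetical.

```python
import unicodedata


def strip_accents_to_ascii(text):
    """Return `text` (a unicode string) as ASCII bytes, minus accents.

    Hypothetical helper for this sketch; it expects unicode input.
    """
    # NFKD splits u'\xe9' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' drops the combining marks and anything else non-ASCII.
    return decomposed.encode('ascii', 'ignore')


print(strip_accents_to_ascii(u'Andr\xe9'))
```

Characters with no decomposition are dropped entirely, so whether this is preferable to 'replace' depends on the data.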
 
Old 01-26-2009, 09:45 PM   #5
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Original Poster
Rep: Reputation: 51
Quote:
Originally Posted by unSpawn View Post
Please post your thread once and in only one forum. Posting a single thread in the most relevant forum will make it easier for members to help you and will keep the discussion in one place. This thread should be closed because it is a duplicate of http://www.linuxquestions.org/questi...trings-700041/.
Yeah, it must have been a site error (or something on my end... not my finger). It took forever to post and I only clicked "submit" once. After a few minutes, it came back with a 503 (I think) gateway error.

Duly noted, however.

Last edited by BrianK; 01-26-2009 at 09:51 PM.
 
Old 01-27-2009, 05:20 PM   #6
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,777
Blog Entries: 54

Rep: Reputation: 2976
If it was due to a server timeout then NP. In any case, the other thread was closed, so no effort was wasted.
 
  


All times are GMT -5.
