LinuxQuestions.org
12-18-2008, 10:06 PM   #1
BrianK
Senior Member
 
Registered: Mar 2002
Location: Los Angeles, CA
Distribution: Debian, Ubuntu
Posts: 1,334

Rep: Reputation: 51
python: how do you replace unicode chars in large text files?


If I have a text file that's ~1M chars and it happens to contain 4 or 5 non-ASCII (unicode) chars, is there a quick way to find & replace them without trying to convert the whole thing?

I don't care what they are converted to... could be '_' for all I care. If there's a way to convert to a similar char, that would be great, but not necessary.

If it matters, I'm trying to get around this problem:

Code:
  File "/usr/lib/python2.5/smtplib.py", line 493, in data
    self.send(q)
  File "/usr/lib/python2.5/smtplib.py", line 320, in send
    self.sock.sendall(str)
  File "<string>", line 1, in sendall
UnicodeEncodeError: 'ascii' codec can't encode character u'\uf029' in position 42061: ordinal not in range(128)
 
12-19-2008, 12:54 AM   #2
atom
Member
 
Registered: Feb 2004
Location: Slovenia
Distribution: archlinux
Posts: 271

Rep: Reputation: 31
You can get the behaviour you want with iconv -c.

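For instance, you might want to do iconv -c -f utf-8 -t ascii largefile > new_file. The -c option makes iconv silently drop every character that cannot be represented in the target encoding (here, anything that is not ASCII). You can also do that from python.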
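A minimal sketch of doing the same thing from Python, assuming the text has already been decoded to a unicode string. It is written in Python 3 syntax; the traceback above comes from Python 2.5, where unicode objects accept the same 'ignore' error handler in encode(). The function name to_ascii and the sample string are made up for illustration.

```python
def to_ascii(text, replacement=None):
    """Return text with every non-ASCII character dropped or replaced.

    With replacement=None the characters are dropped, like iconv -c.
    Otherwise each non-ASCII character becomes `replacement`.
    """
    if replacement is None:
        # The 'ignore' error handler skips anything ASCII cannot encode.
        return text.encode("ascii", "ignore").decode("ascii")
    # Keep code points 0-127, substitute everything else.
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)


sample = "price: 10\uf029 EUR, caf\u00e9"
print(to_ascii(sample))       # non-ASCII characters dropped
print(to_ascii(sample, "_"))  # non-ASCII characters replaced by "_"
```

Either result is plain ASCII, so it should get past the 'ascii' codec error when handed to smtplib.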

There is also another solution: use a regular expression to replace every character whose code is above 127. utf-8 is identical to ascii for the first 128 codes (0-127), so anything above that is a non-ascii character.
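The regular expression approach could look roughly like this. Again a sketch in Python 3 syntax, operating on decoded text; the names NON_ASCII, replace_non_ascii and find_non_ascii are hypothetical.

```python
import re

# Matches any single character outside the 7-bit ASCII range (code > 127).
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def replace_non_ascii(text, replacement="_"):
    """Substitute `replacement` for each non-ASCII character."""
    return NON_ASCII.sub(replacement, text)


def find_non_ascii(text):
    """Return (position, character) pairs for each non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


sample = "abc\uf029def"
print(find_non_ascii(sample))     # where the offending characters are
print(replace_non_ascii(sample))  # the same text with them replaced
```

find_non_ascii is handy for locating the 4 or 5 offending characters before deciding what to do with them. Note that if the pattern is applied to raw utf-8 bytes instead of decoded text, each non-ascii character spans several bytes and would produce several replacements.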

Regards,


Gašper
 
  


All times are GMT -5.
