LinuxQuestions.org
Old 04-15-2012, 01:42 PM   #1
battler
LQ Newbie
 
Registered: Apr 2012
Posts: 25

Rep: Reputation: Disabled
Windows 7 txt file to Linux conversion problems


Hi,

I'm having a lot of trouble using a .txt file created by Microsoft Office Word in Linux.

I save the file in Word on Windows as a .txt file and select Unicode (UTF-8) as the encoding. That is also the end output that I need. I then have a conversion program in Ubuntu Linux that needs to run on this file. However, I run into difficulties because the text file shows characters like <C3><AF> when I view it with cat or Emacs.

I have tried almost everything: saving in different formats, converting with iconv and dos2unix, and checking the Ubuntu character encoding settings. But I always end up with the same problem: characters between <>. Can someone give me the winning combination?

example line: Ze werken op de computer waarop ze ge<95>nstalleerd zijn
How it should be: Ze werken op de computer waarop ze geïnstalleerd zijn
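
For reference, the <C3><AF> shown by cat and Emacs is the two-byte UTF-8 encoding of ï displayed as hex escapes instead of a glyph. A minimal Python sketch to check this (the helper name show_bytes is made up for illustration and is not part of any tool mentioned in this thread):

```python
def show_bytes(text: str, encoding: str = "utf-8") -> str:
    """Render the encoded bytes of `text` in the <XX> style seen in the terminal."""
    return "".join(f"<{b:02X}>" for b in text.encode(encoding))


# The two bytes quoted in the post, decoded as UTF-8, give one character:
# U+00EF (i with diaeresis).
decoded = bytes([0xC3, 0xAF]).decode("utf-8")

if __name__ == "__main__":
    # Prints the bytes of that character in the same <XX> notation.
    print(show_bytes(decoded))
```

If the bytes round-trip like this, the file content itself is probably fine and the display side (terminal or locale) is the more likely suspect.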
 
Old 04-15-2012, 02:31 PM   #2
headrift
Member
 
Registered: Sep 2005
Distribution: Gentoo, Sabayon, Puppy, Arch
Posts: 165

Rep: Reputation: 29
It looks like the <95> is extended ASCII, i.e. the hex code for the letter you want in the example case. I'd make sure Emacs (and your shell in general) is running in UTF-8.

Somewhere I have a script that strips diacritical marks off letters, but I'm guessing you want to keep them.
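
One way to test the "extended ASCII" guess is to decode the single byte 0x95 under a few legacy code pages and see which ones yield ï. This is only a sketch: the candidate list is my own assumption, not something stated in the thread, and a match does not prove how the file was actually saved.

```python
# Candidate encodings to try; this list is an assumption for illustration.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp850", "mac_roman"]


def encodings_matching(raw: bytes, expected: str, candidates=CANDIDATES):
    """Return the candidate encodings under which `raw` decodes to `expected`."""
    matches = []
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # For example, a lone 0x95 byte is not valid UTF-8.
            continue
    return matches


# Which candidates turn byte 0x95 into U+00EF (i with diaeresis)?
matches = encodings_matching(b"\x95", "\u00ef")

if __name__ == "__main__":
    print(matches)
```

A lone 0x95 is not valid UTF-8 at all, so a line that shows <95> where ï belongs did not come from a UTF-8 encoded ï.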
 
Old 04-15-2012, 02:55 PM   #3
battler
LQ Newbie
 
Registered: Apr 2012
Posts: 25

Original Poster
Rep: Reputation: Disabled
Thanks to this chart I discovered that it is UTF-8, only in hex format. I'm still searching for a way to convert this to normal UTF-8.

Last edited by battler; 04-15-2012 at 03:03 PM.
 
Old 04-15-2012, 03:48 PM   #4
battler
LQ Newbie
 
Registered: Apr 2012
Posts: 25

Original Poster
Rep: Reputation: Disabled
I solved it. It had nothing to do with program conversion: my locales were wrong. I changed the following file: /etc/default/locale

from:
LANG="nl_NL.UTF-8"
to:
LANG="nl_NL.UTF-8"
LC_ALL="nl_NL.UTF-8"
Reading material:
https://help.ubuntu.com/community/Locale
http://www.madboa.com/geek/utf8/
http://ayozone.org/2009/07/27/ubuntu...-iso-to-utf-8/
http://www.cl.cam.ac.uk/~mgk25/unicode.html
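
A possible explanation for why adding LC_ALL helped: under POSIX rules, LC_ALL overrides LC_CTYPE, which in turn overrides LANG, so a stray LC_CTYPE set elsewhere can defeat a correct LANG. The thread does not say what was actually overriding LANG, so treat the sketch below as an illustration of the precedence only; both function names are hypothetical.

```python
def effective_ctype_locale(env: dict) -> str:
    """Return the locale governing character handling, by POSIX precedence.

    LC_ALL overrides LC_CTYPE, which overrides LANG. Empty values count as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"  # POSIX default when none of the variables is set


def codeset(locale_name: str) -> str:
    """Extract the codeset from a name like 'nl_NL.UTF-8' ('' if absent)."""
    name = locale_name.split("@", 1)[0]  # drop any @modifier
    return name.split(".", 1)[1] if "." in name else ""


if __name__ == "__main__":
    # Hypothetical environment: LANG is right, but LC_CTYPE overrides it.
    before = {"LANG": "nl_NL.UTF-8", "LC_CTYPE": "C"}
    # Same environment after adding LC_ALL, as in the fix above.
    after = dict(before, LC_ALL="nl_NL.UTF-8")
    print(effective_ctype_locale(before), effective_ctype_locale(after))
```

On a real system, running the locale command in the shell shows the effective value of each of these variables.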
 
Old 04-15-2012, 03:49 PM   #5
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,624

Rep: Reputation: 2651
Have you looked at "dos2unix" and "unix2dos"?

However, the easiest thing is to
NOT use Microsoft Office on Windows to save a text-only file.

MS Office is known to cause Linux, IBM Unix, and Apple Mac users all kinds of problems,
and even causes problems for cross-platform programs ON WINDOWS.

On Windows, the best everyday text editor (just plain text) is SciTE.
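
As a side note on dos2unix: its core job is converting Windows CRLF line endings to Unix LF, which leaves the bytes of the characters themselves alone. That would be consistent with the original poster's report that it did not change the <> output. A rough Python equivalent of that one step (crlf_to_lf is a made-up name, and the real tool does more than this):

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF; other bytes are untouched."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # A UTF-8 encoded word with a Windows line ending.
    sample = "ge\u00efnstalleerd\r\n".encode("utf-8")
    converted = crlf_to_lf(sample)
    # The C3 AF pair is still there; only the trailing \r is gone.
    print(converted)
```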
 
  


All times are GMT -5.
