Old 04-15-2012, 01:42 PM   #1
LQ Newbie
Registered: Apr 2012
Posts: 25

Windows 7 txt file to Linux conversion problems


I'm having a lot of trouble using a .txt file created by Microsoft Office Word in Linux.

I save the file in Word on Windows as a .txt file and select Unicode (UTF-8) as the encoding, which is also the encoding I need for the end output. I then have a conversion program on Ubuntu Linux that needs to run on this file. However, I run into difficulties because the text file shows characters like <C3><AF> when I view it with cat or Emacs.

I've tried almost everything: saving in different formats, converting with iconv and dos2unix, and checking Ubuntu's character encoding settings. But I always end up with the same problem: characters between <>. Can someone give me the winning combination?

Example line: Ze werken op de computer waarop ze ge<95>nstalleerd zijn
How it should be: Ze werken op de computer waarop ze geïnstalleerd zijn
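
For what it's worth, the <C3><AF> pair mentioned above is exactly how UTF-8 encodes the letter ï, so those two bytes are consistent with a correctly saved UTF-8 file that is merely being displayed as raw hex. Below is a minimal Python sketch to check this. The sample bytes are built by hand from the example line, and the helper name looks_like_utf8 is made up for illustration.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The example word containing the two bytes shown as <C3><AF>.
sample = b"ge\xc3\xafnstalleerd"
decoded = sample.decode("utf-8")

# A lone 0x95 byte, as in the ge<95>nstalleerd line, is not valid UTF-8:
# it is a continuation byte with no lead byte in front of it.
lone = b"ge\x95nstalleerd"

print(looks_like_utf8(sample), looks_like_utf8(lone))
# The two-byte sequence collapses into one character after decoding.
print(len(sample), len(decoded))
```

If the first check passes for the real file, the file contents are probably fine and the display side (terminal, editor, locale) is the more likely culprit.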
Old 04-15-2012, 02:31 PM   #2
Registered: Sep 2005
Distribution: Gentoo, Sabayon, Puppy, Arch
Posts: 165

It looks like the <95> is extended ASCII: the hex code for the letter you want, in the example case. I'd make sure Emacs (and your shell in general) is running in UTF-8.

Somewhere I have a script that strips diacritical marks off letters, but I'm guessing you want to keep them.
Old 04-15-2012, 02:55 PM   #3
LQ Newbie
Registered: Apr 2012
Posts: 25

Original Poster
Thanks to this chart I discovered that it is UTF-8, only in hex format. I'm still searching for a way to convert this to normal UTF-8.

Last edited by battler; 04-15-2012 at 03:03 PM.
Old 04-15-2012, 03:48 PM   #4
LQ Newbie
Registered: Apr 2012
Posts: 25

Original Poster
I solved it; it had nothing to do with the conversion programs. My locales were wrong. I changed the following file: /etc/default/locale
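
The post does not say which values were changed, so the following is only a sketch of how one might check whether the active locale uses UTF-8. It assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG); the function names are hypothetical, and a value such as LANG="en_US.UTF-8" is just an illustrative example, not the poster's actual setting.

```python
import os


def effective_ctype(env: dict) -> str:
    """Pick the locale value that governs character handling.

    Follows the usual POSIX order: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset; the fallback is the "C" locale.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def is_utf8_locale(value: str) -> bool:
    """Check whether a locale string such as 'nl_NL.UTF-8' names UTF-8."""
    # The codeset sits after the dot, before any optional @modifier.
    codeset = value.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# Report on the environment of the current process.
current = effective_ctype(dict(os.environ))
print(current, is_utf8_locale(current))
```

If this reports a non-UTF-8 locale in the shell where cat or Emacs is run, that would match the symptom of multi-byte characters being shown as <C3><AF>.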

Reading material:
Old 04-15-2012, 03:49 PM   #5
John VV
LQ Muse
Registered: Aug 2005
Posts: 15,322

Have you looked at "dos2unix" and "unix2dos"?

However, the easiest thing is to
NOT use Microsoft Office on Windows to save a text-only file.

MS Office is known to cause Linux, IBM Unix, and Apple Mac users all kinds of problems,
and even causes problems for cross-platform programs ON WINDOWS.

On Windows, the best everyday text editor (just plain text) is SciTE.
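
As a side note on dos2unix: its main job is converting Windows CRLF line endings to Unix LF, so on its own it would not be expected to change how a character like ï is displayed. Here is a rough Python sketch of that line-ending step. It is not the tool's actual implementation, and the function name crlf_to_lf is made up for illustration.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF, leaving other bytes alone."""
    return data.replace(b"\r\n", b"\n")


# A two-line sample with Windows line endings and a UTF-8 encoded ï (C3 AF).
windows_text = b"ge\xc3\xafnstalleerd\r\nzijn\r\n"
unix_text = crlf_to_lf(windows_text)

# The line endings change, but the C3 AF bytes are untouched.
print(b"\r\n" in unix_text, b"\xc3\xaf" in unix_text)
```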


All times are GMT -5.
