Windows 7 txt file to Linux conversion problems
I'm having a lot of trouble using a .txt file in Linux that was created by Microsoft Office Word.
I save the file in Word on Windows as a txt file and select Unicode (UTF-8) as the encoding. That is also the final output format I need. I then have a conversion program on Ubuntu Linux that needs to run on this file. However, I run into difficulties because the text file shows characters like <C3><AF> when I view it with cat or Emacs.
I tried almost everything: saving in different formats, converting with iconv and dos2unix, checking the Ubuntu character set settings. But I always end up with the same problem, characters between <>. Can someone give me the winning combination?
example line: Ze werken op de computer waarop ze ge<95>nstalleerd zijn
How it should be: Ze werken op de computer waarop ze geïnstalleerd zijn
It looks like the <95> is an extended ASCII byte shown in hex, i.e. the hex code for the letter you want in the example. I'd make sure Emacs (and your shell in general) is running in UTF-8.
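To check what those bracketed bytes really are, a small Python sketch can help. It is not from the original posts; the function name describe_bytes is made up for illustration. It decodes a byte string as UTF-8 when that is possible and otherwise prints each byte in the same <XX> style that cat and Emacs showed.

```python
def describe_bytes(raw: bytes) -> str:
    """Return the decoded text if raw is valid UTF-8, else a <XX> hex dump."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: show every byte the way the terminal displayed it.
        return "".join(f"<{b:02X}>" for b in raw)
```

With this, describe_bytes(b"\xc3\xaf") gives the single letter ï, so a file containing <C3><AF> is already correct UTF-8 and only the display is wrong. A lone byte such as b"\x95" is not valid UTF-8 and comes back as <95>, which suggests the file (or that copy of it) was saved in some other single-byte encoding.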
Somewhere I have a script that strips diacritical marks off letters, but I'm guessing you want to keep them.
Thanks to this chart I discovered that it is UTF-8, only shown in hex format. I'm still searching for a way to convert this to normal UTF-8.
I solved it. It had nothing to do with conversion programs: my locales were wrong. I changed the following file: /etc/default/locale
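The post does not say which values were changed in /etc/default/locale, so the following is only a sketch of how one might check whether the active locale is a UTF-8 one. It assumes the usual POSIX precedence for the character-type locale (LC_ALL, then LC_CTYPE, then LANG); the function names are hypothetical.

```python
import os


def effective_ctype_locale(env=None) -> str:
    """Pick the locale that governs character handling, POSIX style."""
    env = os.environ if env is None else env
    # LC_ALL overrides LC_CTYPE, which overrides LANG; empty counts as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # nothing set: the plain "C" locale applies


def is_utf8_locale(value: str) -> bool:
    """True for names such as nl_NL.UTF-8 or en_US.utf8."""
    # Take the codeset between "." and an optional "@modifier".
    codeset = value.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"
```

Running is_utf8_locale(effective_ctype_locale()) in the shell where cat shows the <C3><AF> bytes should return False if the locale is the culprit, and True once a UTF-8 locale is in effect in a new login session.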
Have you looked at "dos2unix" and "unix2dos"?
However, the easiest thing is to NOT use Microsoft Office on Windows to save a text-only file.
MS Office is known to cause Linux, IBM Unix, and Apple Mac users all kinds of problems,
and even causes problems for cross-platform programs ON WINDOWS.
On Windows, the best everyday plain-text editor is SciTE.
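If the file has to come from Windows anyway, the line-ending part of the dos2unix suggestion can be approximated in a few lines of Python. This is a rough sketch, not a replacement for the real tool. The function name to_unix_text is made up, and stripping a leading UTF-8 byte order mark is an extra assumption about what the Linux program expects.

```python
import codecs


def to_unix_text(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and turn CRLF line endings into LF."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Only CRLF pairs are rewritten; all other bytes are left untouched,
    # so multi-byte UTF-8 characters such as C3 AF pass through unchanged.
    return raw.replace(b"\r\n", b"\n")
```

Read the file in binary mode, pass the bytes through to_unix_text, and write the result back in binary mode so that Python does not re-encode anything along the way.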