LinuxQuestions.org


kcynice 03-23-2011 01:55 AM

How to convert a Windows-style text file to *nix-style UTF-8 encoding automatically?
 
I want to convert many text files (copied from a Windows workstation) to UTF-8. Yes, iconv can do this. However, iconv needs the source encoding as a command-line parameter, and in most cases I am not sure what the source encoding is. I also want to use a script to convert many files recursively.

What should I do?

MensaWater 03-23-2011 08:08 AM

I suspect you're talking about the difference between DOS/Windows text files and UNIX/Linux text files: the former end every line with a carriage return followed by a line feed, whereas the latter use only a line feed. When you view such a file in *nix with vi/vim you see a ^M at the end of every line; that is the carriage return (ctrl-M).

You can convert files with dos2unix or unix2dos commands. Typing "man dos2unix" or "man unix2dos" will give you more details. They're fairly simple to use.
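For anyone curious what the line-ending conversion amounts to, here is a minimal Python sketch. It is an illustration only, not how dos2unix itself is implemented, and the function names crlf_to_lf and convert_file are made up for this example. It works on raw bytes, which is safe for ASCII-compatible encodings such as UTF-8 or GBK but not for UTF-16.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF.

    A lone CR that is not followed by LF is left untouched.
    """
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite one file in place with Unix line endings (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(crlf_to_lf(data))
```

Because convert_file overwrites the file, try it on a copy first.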

kcynice 03-23-2011 08:23 PM

Quote:

Originally Posted by MensaWater (Post 4300328)
I suspect you're talking about the difference between DOS/Windows text files and UNIX/Linux text files: the former end every line with a carriage return followed by a line feed, whereas the latter use only a line feed. When you view such a file in *nix with vi/vim you see a ^M at the end of every line; that is the carriage return (ctrl-M).

Not exactly. CR/LF is a problem, but not the main one. The main issue is that I am Chinese, and most of my files contain Chinese text. When such files are copied from Windows to Linux, some software can't work properly. Yes, I can convert them to UTF-8 by hand, but there are too many files to do one by one, and I can't batch them with a single source encoding because I can't guarantee that all the source files use the same encoding.
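One possible approach, sketched in Python under some loud assumptions: every file is either already UTF-8, UTF-16 with a byte order mark, or in a Chinese Windows code page that GB18030 can decode (GBK and GB2312 are subsets of it). The script tries each candidate in turn and keeps the first one that decodes without error. That is a heuristic, not real detection: text in another encoding such as Big5 may decode "successfully" into garbage. The names CANDIDATES, decode_guess and convert_tree, and the ".txt" filter, are made up for this example.

```python
import codecs
import os

# Assumption: adjust this list to the encodings you actually expect.
# Order matters: the strictest candidate (UTF-8) is tried first.
CANDIDATES = ("utf-8-sig", "gb18030")


def decode_guess(data):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    encodings = list(CANDIDATES)
    # A UTF-16 byte order mark is a strong hint, so try that first if present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encodings.insert(0, "utf-16")
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


def convert_tree(root, suffixes=(".txt",)):
    """Rewrite matching files under root as UTF-8 with LF line endings.

    Files are overwritten in place. Returns a list of (path, guessed_encoding).
    """
    converted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                text, enc = decode_guess(data)
            except ValueError:
                print("skipped (unknown encoding):", path)
                continue
            # Also normalise Windows line endings while we are here.
            text = text.replace("\r\n", "\n")
            # newline="" stops Python from translating line endings on write.
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(text)
            converted.append((path, enc))
    return converted
```

Calling convert_tree on a directory walks it recursively. Since the files are overwritten, run it on a backup copy first and spot-check a few results by eye.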


All times are GMT -5.