Quote:
Originally Posted by jgrizich
My reaction would be to think that plain text isn't platform dependant. Obviously that's not correct. Thank you for the lesson.
A "text" file is really just a series of bytes that the system is told to interpret as text characters according to some character encoding scheme, such as ASCII, UTF-8 (the most common Unicode encoding and the Linux standard; any pure-ASCII file is also valid UTF-8), or cp-1252 (the standard pre-Unicode Windows encoding for English).
If you try to read a file created in one encoding with a program set for a different encoding, you tend to get "mojibake": garbled characters, because the bytes get interpreted the wrong way.
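As a quick illustration, here is a minimal Python sketch of how mojibake comes about. The sample word is just something I picked; the encoding names are the standard Python codec names.

```python
# The same bytes, interpreted under two different encodings.
text = "café"
raw = text.encode("utf-8")      # the "é" becomes two bytes: 0xC3 0xA9

right = raw.decode("utf-8")     # correct interpretation
wrong = raw.decode("cp1252")    # each of the two bytes is read as its own character

print(repr(raw))    # b'caf\xc3\xa9'
print(right)        # café
print(wrong)        # cafÃ©

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
assert "plain text".encode("ascii") == "plain text".encode("utf-8")
```

The file itself is not damaged here; only the interpretation is wrong, which is why re-reading with the right encoding fixes it.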
But the problem here is different. Even when the correct encoding is used, DOS/Windows and Unix have traditionally used different characters to mark the end of a "line" of text, in human terms. Unix uses the ASCII LINE FEED character (LF, octal 012, usually written as the backslash escape '\n'). DOS uses the combination CARRIAGE RETURN + LINE FEED (CRLF, octal 015 + 012, '\r\n').
Therefore, when a Unix program reads a DOS file, it ends up with an extra invisible CR at the end of each line, and when a DOS program reads a Unix file, it sees no line endings at all, just one long line with non-printing LF characters interspersed in it.
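You can see the stray CR for yourself with a small Python sketch. The sample bytes are made up for the example; splitting on LF alone mimics what a Unix-style reader does.

```python
# A two-line file as DOS/Windows would write it.
dos_data = b"first\r\nsecond\r\n"

# Splitting on LF only leaves a CR stuck to the end of every line.
unix_view = dos_data.split(b"\n")
print(unix_view)    # [b'first\r', b'second\r', b'']

# Stripping the trailing CR recovers the intended lines.
cleaned = [line.rstrip(b"\r") for line in unix_view if line]
print(cleaned)      # [b'first', b'second']
```

That leftover '\r' is what tends to break things like string comparisons or shell scripts edited on Windows.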
Incidentally, pre-OS X Apple systems used CR alone for their line endings, meaning that both DOS and Unix would see such a file as a single continuous line with invisible CR characters sprinkled throughout it. With OS X, Apple switched to the Unix-style line ending.
So, bottom line: first make sure you're using the correct encoding, and if the file could have come from a different system, check that it has the line endings your tools expect.
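For checking line endings, something along these lines should work. It is only a sketch: count_line_endings and to_unix are names I made up, not standard functions, and it works on raw bytes so that the encoding question stays separate.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                       # DOS/Windows
        "LF": data.count(b"\n") - crlf,     # Unix (LFs not part of a CRLF)
        "CR": data.count(b"\r") - crlf,     # old Mac (CRs not part of a CRLF)
    }


def to_unix(data: bytes) -> bytes:
    """Convert CRLF and bare CR endings to LF. CRLF must be replaced first."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"dos line\r\nold mac line\runix line\n"
print(count_line_endings(sample))   # {'CRLF': 1, 'LF': 1, 'CR': 1}
print(to_unix(sample))              # b'dos line\nold mac line\nunix line\n'
```

To use it on a real file, read the file in binary mode (open(path, "rb")) so Python doesn't translate the line endings before you get to look at them.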