c++ - parsing windows textfile: how to strip extra characters?
i've run into a problem reading a windows-generated textfile onto my linux (mandriva 2007.1) system using c++. it took me a long time and lots of help from the good folks here, but i've finally figured out that the issue seems to be one of extra, hidden characters in the original text file.

i started out by processing one variable read from the textfile and had a lot of problems. i finally got around them by using substr to parse only the first three characters of the line read into my variable. that made things work.

my thinking is, however, that every line in the text file probably has this same issue. that would argue in favor of addressing the issue, not at the individual variable level, but at the file level: in other words, when the text file is first parsed into my script, or else by somehow processing the textfile before it is read.

so, there i have two ideas to pursue: preprocessing the text file, or processing it as it is read. i'm using vector to read the text file. how would i strip extra characters at that stage? alternately, how would i strip the extra characters before the text file comes into the script? the program is below.

thanks,
BabaG

Code:
#include <fstream> |
Your file probably has DOS line endings. dos2unix <filename> should fix it. If that's not installed, sed -i 's/\r$//' <filename> should work too.
From inside your program, you can remove '\r' characters from the string, using standard C++ string functions: Code:
while (getline(infile, line)) |
great! thanks, man. will try as soon as i get back in front of the box that has this stuff on it. this program is for processing a bunch of files which have been moved over from a windows box to a linux box. in that move i'll be also moving the ScriptVariables.txt file. should be simple enough to run dos2unix as a part of the bash script that moves all the files.

thanks again,
BabaG
CRLF shouldn't be the problem
since you are creating your ifstream object without "ifstream::binary", it should open the file in text mode, which will automatically translate CRLF sequences into the native format, which on Linux would be LF. you could certainly test this theory with a bit of debug output: try printing the value of each character and comparing.
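One way to run that test is sketched below; the function name charCodes is made up for illustration. It turns a line into its numeric character values so a hidden '\r' shows up as 13. Worth keeping in mind when reading the output: on Linux, text mode and binary mode generally behave the same for file streams, so a CRLF file would be expected to come through with the '\r' still attached.

```cpp
#include <sstream>
#include <string>

// Hypothetical debug helper: return the numeric value of every character
// in the line, separated by spaces, so invisible characters become visible.
std::string charCodes(const std::string& line) {
    std::ostringstream out;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        if (i > 0) {
            out << ' ';
        }
        // Go through unsigned char so bytes above 127 do not print as negatives.
        out << static_cast<int>(static_cast<unsigned char>(line[i]));
    }
    return out.str();
}
```

Printing charCodes(line) for each line inside the getline loop would show whether the lines end in 13 ('\r'). If they do, the stream did not translate the line endings.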