LinuxQuestions.org

Aquarius_Girl 06-01-2011 03:14 AM

Free open source CSV format parsing software written in C/C++
 
Google directed me to the ones written in .Net/C# etc.
Any ideas on the ones written in C/C++?

evo2 06-01-2011 03:39 AM

awk/gawk is written in C. Just use: "awk -F ','".

Evo2.

Aquarius_Girl 06-01-2011 04:14 AM

Quote:

Originally Posted by evo2 (Post 4372814)
awk/gawk is written in C. Just use: "awk -F ','".

Thanks, that shows the power of awk! I found this: http://www.joeldare.com/wiki/using_awk_on_csv_files

Now I have the option of calling awk from C through the system() call, but that'll be a slow process. On top of that, I want the output in a string rather than a file.

I downloaded the awk source code and peeked in at the mammoth code :(
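
For the "output in a string rather than a file" part, one option on POSIX systems is popen(), which runs a command through the shell and exposes its output as a stream. The following is only a sketch: run_command is a hypothetical helper name, it assumes a POSIX platform, and it does not address the process-startup cost mentioned above.

```cpp
#include <stdio.h>      // popen, pclose, fread (popen/pclose are POSIX, not ISO C++)
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs a shell command and returns everything it
// wrote to standard output as one string.
std::string run_command(const std::string& command)
{
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe)
        throw std::runtime_error("popen failed");

    std::string output;
    char buffer[4096];
    std::size_t n;
    // Read the child's output in chunks until end of stream.
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);

    pclose(pipe);
    return output;
}
```

A call such as run_command("awk -F ',' '{print $2}' data.csv") would then return the second column of a (hypothetical) data.csv as text, one value per line, provided awk is installed.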

colucix 06-01-2011 04:39 AM

I've found this: http://code.google.com/p/csv-parser-cplusplus/. It looks promising.

Aquarius_Girl 06-01-2011 04:46 AM

And what have you been eating for breakfast nowadays, BTW? It does indeed look promising; I've just untarred it.

colucix 06-01-2011 04:55 AM

Just a couple of coffee cups! ;)

Aquarius_Girl 06-01-2011 05:07 AM

And I thought you must be having something special for breakfast to be at your productive best! It is not easy to be on the dot, always.

evo2 06-01-2011 06:30 AM

Ahh, sorry, I didn't know you were looking for a library; I thought you were looking for an executable.

Evo2.

Aquarius_Girl 06-01-2011 06:32 AM

Well, actually I was looking for a "small" program of 15-20 lines which I could embed in my code. :D

Anyway, you were helpful and thanks for that.

catkin 06-01-2011 08:36 AM

Which CSV formats do you want to parse? There's a good description of CSV formats here. Having delimiters and quotes in the data is common; having line ends in the data adds a level of complexity. If you are interested in deriving an algorithm for C++ from awk please ask and I'll post awk code (which does support line ends in the data).

sundialsvcs 06-01-2011 09:16 AM

Don't send C++ to do a camel's business ...

gnashley 06-01-2011 12:07 PM

What about peeking at the 'awk' code in busybox? Or the code that handles IFS in some lightweight shell?

SigTerm 06-01-2011 05:28 PM

Quote:

Originally Posted by Anisha Kaul (Post 4372793)
Google directed me to the ones written in .Net/C# etc.
Any ideas on the ones written in C/C++?

Umm, judging from the Wikipedia article, it shouldn't be that hard to write a parser from scratch in C++ (convert any input file into std::list<std::string>, std::vector<std::string>, or std::vector<std::vector<std::string>>). Using the STL or Qt 4, the whole thing will probably take less than 100 (maybe even less than 50) lines. So, is there some kind of problem?
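
As a rough idea of what the std::vector<std::vector<std::string>> variant could look like with only the standard library, here is a minimal sketch. It assumes the simplest kind of CSV: no quoted fields and no delimiters or line ends inside the data. read_simple_csv is a hypothetical name.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads delimiter-separated text into rows of fields.
// Assumes no quoting and no delimiters or line ends inside fields.
std::vector<std::vector<std::string>>
read_simple_csv(std::istream& in, char delim = ',')
{
    std::vector<std::vector<std::string>> table;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                  // tolerate Windows line ends

        std::vector<std::string> row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, delim))
            row.push_back(field);

        // getline drops a trailing empty field ("a,b," yields a and b);
        // add it back so the column count stays right.
        if (!line.empty() && line.back() == delim)
            row.push_back("");

        table.push_back(row);
    }
    return table;
}
```

Passing a std::ifstream opened on the input file gives one inner vector per line; an empty line produces an empty row.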

Aquarius_Girl 06-02-2011 12:47 AM

Quote:

Originally Posted by catkin (Post 4373077)
Which CSV formats do you want to parse?

This one: http://www.robosuv.com/html/rddf_2004.html

Quote:

Originally Posted by catkin (Post 4373077)
There's a good description of CSV formats here.

Thanks, will go through that.

Quote:

Originally Posted by catkin (Post 4373077)
having line ends in the data adds a level of complexity. If you are interested in deriving an algorithm for C++ from awk please ask and I'll post awk code (which does support line ends in the data).

By "line ends" do you mean some special characters denoting the end of a line? Well, that file doesn't have anything other than a '\n' char at the end of each line.

And I am not sure that I have understood what you meant by "deriving an algorithm for C++ from awk". Did you mean that you'll write the code in awk and then I'll translate the awk code to C++?
Anyway, I am illiterate in awk/sed/bash/perl, so I won't be able to "read" anything you post in awk.

If you do bother to post the awk code [and also write "what" each line is doing in plain English], it might be helpful, now or in the future, to either me or someone else.

Thanks for your concern.

Quote:

Originally Posted by SigTerm (Post 4373545)
So, is there some kind of problem?

Not any more. Yesterday I went through some of the code of the software colucix posted. It appeared to use some libraries, so I thought that instead of spending time exploring, understanding and then extracting code from that software, it would be better to reinvent the wheel. So, yesterday itself, I read the first row of the file [as it happens, that was the only row in the file at the time :rolleyes:], put it in a std::string and extracted substrings from it based on the comma positions returned by the find function. I then converted each substring to double with the strtod function.
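
The find/substr/strtod approach described here can be sketched roughly as follows. This is not the poster's actual code; parse_number_row is a hypothetical name, and the sketch assumes one unquoted, comma-separated line where every field is meant to be a number.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: splits one unquoted, comma-separated line and
// converts each field to double with strtod.
// Note: strtod yields 0.0 for an empty or non-numeric field, so this
// sketch cannot tell "0" apart from bad input.
std::vector<double> parse_number_row(const std::string& line)
{
    std::vector<double> values;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type comma = line.find(',', start);
        // With npos as the count, substr takes everything up to the end.
        const std::string field = line.substr(
            start,
            comma == std::string::npos ? std::string::npos : comma - start);
        values.push_back(std::strtod(field.c_str(), nullptr));
        if (comma == std::string::npos)
            break;                  // that was the last field
        start = comma + 1;          // continue after the comma
    }
    return values;
}
```

For a made-up line such as "1,34.5,-118.25" this gives three values. To detect malformed fields, the second argument of strtod (the end pointer) can be checked instead of passing nullptr.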

I didn't think it would be so simple. Previously I had to parse an XML file in Qt, and the people on the Qt mailing list told me to use a new class instead of the regular one for parsing. It really took quite some time to write that code, handling all the conditions etc.

SigTerm 06-02-2011 06:06 AM

Quote:

Originally Posted by Anisha Kaul (Post 4373804)
I didn't think it would be so simple. Previously I had to parse an XML file in Qt, and the people on the Qt mailing list told me to use a new class instead of the regular one for parsing. It really took quite some time to write that code, handling all the conditions etc.

XML is much more complicated than CSV.

Quote:

Originally Posted by Anisha Kaul (Post 4373804)
put it in a std::string and extracted substrings from it based on the comma positions returned by the find function.

Sounds like you forgot to handle quotes ("). A comma can appear within quotes (","), and there can be quotes within quotes (""""). That's the only "difficulty", but it is still fairly trivial to implement. On the other hand, if your file doesn't have any quotes, you don't even need that.
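
The quote handling described above fits in a small state machine. The following is a minimal sketch, assuming the common convention that a doubled quote ("") inside a quoted field stands for one literal quote and that the record fits on one line; split_csv_line is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields, honouring
// double-quoted fields that may contain commas and doubled quotes.
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';       // doubled quote: one literal quote
                    ++i;                // skip the second quote
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;             // commas are plain data in here
            }
        } else if (c == '"') {
            in_quotes = true;           // opening quote
        } else if (c == ',') {
            fields.push_back(field);    // field separator
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);            // last field (may be empty)
    return fields;
}
```

With this, the line a,"b,c","say ""hi""" splits into three fields: a, then b,c, then say "hi". An unterminated quote is not reported as an error; the rest of the line simply ends up in the last field.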


All times are GMT -5.