LinuxQuestions.org


Racooncity86 11-07-2016 05:24 PM

Reading from text file and parse specific lines using C++
 
Dear all,

I am trying to read from a file and parse each block separately. Right now, the code sums up the last column, but it cannot treat each block separately or distinguish between the blocks "Energy 1" and "Energy 2".

The data I am trying to read in is given below:

------------------------------------------------------------------------

Calculation
Number of points: 200 # Atoms: 4

Point 1 : 0.00000000 0.00000000 0.00000000 Weighting = 0.00500000

Energy 1 # with weighting 1.00000000

Atom a b c d
1 0.476 0.000 0.000 0.100
2 0.476 0.000 0.000 0.100
1 0.000 -0.000 -0.000 0.200
2 -0.000 -0.000 0.000 0.200


Energy 2 # with weighting 1.00000000

Atom a b c d
1 0.476 0.000 0.000 0.300
2 0.476 0.000 0.000 0.300
1 0.000 -0.000 -0.000 0.400
2 -0.000 -0.000 0.000 0.400


Energy 2 # with weighting 1.00000000

Atom a b c d
1 0.476 0.000 0.000 0.500
2 0.476 0.000 0.000 0.500
1 0.000 -0.000 -0.000 0.600
2 -0.000 -0.000 0.000 0.600

....

....
------------------------------------------------------------------------

The code which I have so far is:

Code:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <vector>

using namespace std;



int main()
{
    int rows = 0;
    int columns = 0;
    string line;
    int firstNumber = 0;
    vector<vector<double> > values;
    vector<vector<double> > results;
    vector<double> rowstotal;
    ofstream File;
    ifstream in("datei.txt");
    File.open("Output.txt",ios::app);
    File.setf(ios::fixed);
    File.setf(ios::showpoint);
    File.precision(3);

    if(in.fail())
    {
        cerr << "File kann nicht geoffnet werden" << endl;
        return -1;
    }

    File << "\n" << endl;

    // Store every number
    while(in.good())
    {
        bool begin_tag = false;
        while (getline(in,line))
        {
            if(line.find("Energy  2 #") != std::string::npos ) {
                begin_tag = true;
                continue;
            }
            else if (line == "Energy  1 #")
            {
                begin_tag = false;
            }

            istringstream stream(line);
            vector<double> tmp;
            double x;

            while (stream >> x)
              tmp.push_back(x);

            if (tmp.size() > 0)
              values.push_back(tmp);
        }
    }
       
       
    columns = values[0].size();
    for (unsigned i = 1; i < values.size(); ++i)
    {
        if (values[i].size() != columns)
        {
            cerr << "Reihe mit unterschiedlicher columnsnummer" << endl;
            return -1;
        }
    }

    for (unsigned i = 0; i < values.size(); ++i)
        {
        // If number with 1.0 is encountered, add it to the row
        if (values[i][0] == 1.0)
          results.push_back(values[i]);

        // If number with 2.0 is encountered, add it also to the row
        if (values[i][0] == 2.0)
        {
            for (unsigned j = 0; j < values[i].size(); ++j)
              results.back()[j] += values[i][j];
        }
    }

       

    rows = results.size();

    File << "Number of rows # " << rows << endl;
    File << "Number of columns # " << columns << endl;
    File << " " << endl;
       
    for(int i=0; i < rows; i++)
    {
        for(int j=4; j < columns; j++)
        {
            File << results[i][j]  <<  "        " << "  " << endl;
        }
    }

                                                                               
    for(int i=0; i < rows; i++)
    { 
        rowstotal.push_back(0.0);
        for (int j=1; j < columns; j++)
        {
            rowstotal[i] += results[i][j];
        }
    }
       
        File.close();
    in.close();
    return 0;
}

The output is:

Number of rows # 6
Number of columns # 5

0.200
0.400
0.600
0.800
1.000
1.200

As stated above, what I would like to achieve is to sum over only the blocks "Energy 2 #" and ignore the block beginning with "Energy 1 #". So the code should give the values:

0.600
0.800
1.000
1.200

I tried to use a boolean flag to get it done, but somehow I am missing something. I would be really thankful if someone could give me a hint or tell me how to solve it.

Thanks in advance :) !

Best wishes,
Racooncity86

notKlaatu 11-07-2016 05:40 PM

I can think of two things off the top of my head: either impose more structure on your data so that you can parse it more easily, or use Boost to do a regex match on the heading of the block you want to parse.

Racooncity86 11-07-2016 06:18 PM

Yeah, I know I could use Boost to get it done, but I would like to solve it without it. Anyway, what exactly do you mean by imposing more structure? Could you be more specific?
Thanks in advance!

crazy-yiuf 11-13-2016 10:55 PM

C++11 also has a <regex> library, if you're just trying to dodge run time dependencies.
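
As a rough illustration of the <regex> route, here is a minimal sketch that recognizes a block header and extracts its number. The function name energy_block_number and the exact pattern are assumptions made for this example, not something from the original code; the pattern accepts one or more spaces between "Energy" and the number.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the block number from a header line such as
// "Energy 2 # with weighting 1.00000000", or -1 if the line is not a header.
int energy_block_number(const std::string& line)
{
    // \s+ tolerates one or more spaces between "Energy" and the number.
    static const std::regex header(R"(^\s*Energy\s+(\d+)\s*#.*$)");
    std::smatch m;
    if (std::regex_match(line, m, header))
        return std::stoi(m[1].str());
    return -1;
}
```

A read loop could call this on every line and switch between "collect" and "ignore" whenever it returns something other than -1.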

Either way, (and without reading your code), my approach would be something like:
Code:

char skip_counter = -1
while readline
    if skip_counter >= 0
        skip_counter--
    else if line.data()[0] == 'E' && line.data()[7] == '1'
        skip_counter = 8
    else if line.data()[0] is a number (look up the ascii range)
        resume processing
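
The pseudocode above could be turned into C++ roughly as follows. This is only a sketch under assumptions: the function name last_column_outside_energy1 is made up, the header is assumed to have the block number at index 7 (a single space after "Energy"), and the default of 8 skipped lines is taken from the sample layout (blank line, "Atom" line, four data rows, two blank lines). It collects the last column of every data row that is not skipped.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: skips a fixed number of lines after each "Energy 1"
// header and returns the last column of all remaining data rows.
std::vector<double> last_column_outside_energy1(std::istream& in, int skip_lines = 8)
{
    std::vector<double> out;
    std::string line;
    int skip = 0;                       // lines still to be ignored

    while (std::getline(in, line)) {
        if (skip > 0) {                 // inside an "Energy 1" block
            --skip;
            continue;
        }
        if (line.size() > 7 && line[0] == 'E' && line[7] == '1') {
            skip = skip_lines;          // ignore the lines that follow
            continue;
        }
        // Data rows start with the atom index, i.e. with a digit.
        if (!line.empty() && std::isdigit(static_cast<unsigned char>(line[0]))) {
            std::istringstream row(line);
            double x = 0.0;
            double last = 0.0;
            while (row >> x)
                last = x;
            out.push_back(last);
        }
    }
    return out;
}
```

Counting lines is fragile: if a block ever has a different number of rows, the counter goes out of step, so a state flag that is reset at every header is usually the more robust choice.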


Racooncity86 11-17-2016 06:07 PM

Thanks for the hint, crazy-yiuf! I understood the idea, but I am still trying to figure out how to work it into my code and somehow cannot manage it. Can you help me adapt your piece of code to mine, or provide a working example? Thanks for your effort!

sundialsvcs 11-17-2016 08:14 PM

Also consider using tools such as awk ... and/or the strategies that such tools embrace.

The essential structure of an awk program looks like this:
Code:

/regular expression that matches a line/
  {
    code that is executed when a line that matches this expression is encountered
  }

Regular expressions are a very powerful technology for "ripping apart a line of text to get useful pieces out of it." They are well implemented by standard libraries and are therefore supported in nearly every language (standard C being the notable exception, although POSIX systems provide <regex.h>).
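
The same pattern/action strategy can be imitated in C++ with std::regex. The sketch below is only an illustration: the Rule struct and the run_rules function are names invented for this example. Each rule pairs a regular expression with a callback, and every input line is offered to every rule in order, much like awk does.

```cpp
#include <functional>
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical awk-style rule: a pattern plus the action to run on a match.
struct Rule {
    std::regex pattern;
    std::function<void(const std::smatch&)> action;
};

// Reads the stream line by line and runs the action of every matching rule.
void run_rules(std::istream& in, const std::vector<Rule>& rules)
{
    std::string line;
    while (std::getline(in, line)) {
        for (const Rule& rule : rules) {
            std::smatch m;
            if (std::regex_search(line, m, rule.pattern))
                rule.action(m);   // m is only valid while line is alive
        }
    }
}
```

One rule could match the "Energy N #" headers and update a flag, while a second rule matches the numeric rows and adds them up only when that flag is set.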

a4z 11-18-2016 12:37 AM

Go there; people will know if you are asking us to do your homework:
https://www.c-plusplus.net/forum/
because if that is the case, they might even know your teacher (Lehrer) ;-)

PS: you even have the tag set for whether a block needs to be processed; all you are missing is an if here and there.
You should be able to do that.

Racooncity86 11-18-2016 01:45 AM

Thanks for all the comments. Since there are people like "a4z" who think that this is homework or part of something I need to do, I will not ask any more questions here. It is just for me to learn, and by the way, I already provided a working example... I guess people like a4z are on forums not to help people but to try to look cool by insulting others. Anyway, do you have any idea how old I am? (Can't stop laughing at "Lehrer"...) Nevertheless, thanks for everything, and the admin can close this thread.

P.S.: I know that it should take only a few if statements to read the data in. I was just making a mistake somewhere that resulted in a segmentation fault, but I will solve it on my own. Thanks to all who contributed in a positive way to my question!

astrogeek 11-18-2016 02:52 AM

Quote:

Originally Posted by a4z (Post 5631877)
Go there; people will know if you are asking us to do your homework:
https://www.c-plusplus.net/forum/
because if that is the case, they might even know your teacher (Lehrer) ;-)

PS: you even have the tag set for whether a block needs to be processed; all you are missing is an if here and there.
You should be able to do that.

Please keep your comments on topic and of a helpful nature, and always respectful of others.

If you see an error, then perhaps you could be the Lehrer and guide the student to the understanding they seek.

a4z 11-18-2016 05:03 AM

Quote:

Originally Posted by astrogeek (Post 5631917)
Please keep your comments on topic and of a helpful nature, and always respectful of others.

If you see an error, then perhaps you could be the Lehrer and guide the student to the understanding they seek.

Ahem, some facts:
1) I was helpful and on topic by giving a hint.
2) Since the code contained German comments, the link to the German C++ forum was also a help.
3) The question about homework was in accordance with the forum rules, which you as a mod should know.
http://www.linuxquestions.org/questi...gs-4175464257/
See no. 6.
Looking at the thread from this perspective, I think that, if anything, the last post from Racooncity86 deserves a mod reaction more than my post does, maybe with an explanation of no. 6.

crazy-yiuf 11-18-2016 11:20 AM

Without getting into the argument too much, I do think it would be inappropriate to find the error for the OP. "Teach a man to fish" and all that.

But if you're getting a segmentation fault that should make the error easy enough to find. Compile it with the -ggdb flag, then run "gdb ./a.out", then type run<enter>, let it crash, then type bt<enter>. This should tell you the line things go wrong at, which should give you enough information to solve the problem. If you're still stumped you can post the backtrace and I'll give another hint.

crazy-yiuf 11-18-2016 12:00 PM

Also, it looks like you haven't tried basic debug statements. For example if I change it to:
Code:

            if(line.find("Energy  2 #") != std::string::npos ) {
              begin_tag = true;
              cout << "begin true" << endl;
              continue;
            }
            else if (line == "Energy  1 #")
              {
                begin_tag = false;
                cout << "begin false" << endl;
              }

It doesn't print anything (presumably your input file got altered during copy paste? you could try pastebinit). I also don't see any point at which the begin tag is actually used. My recommendation is to attempt implementing what I said in my first post. If you can't get it to work and post the new code, I will give you hints on that.

astrogeek 11-19-2016 01:17 AM

Quote:

Originally Posted by Racooncity86 (Post 5631896)
Thanks for all the comments. Since there are people like "a4z" who think that this is homework or part of something I need to do, I will not ask any more questions here. It is just for me to learn, and by the way, I already provided a working example... I guess people like a4z are on forums not to help people but to try to look cool by insulting others. Anyway, do you have any idea how old I am? (Can't stop laughing at "Lehrer"...) Nevertheless, thanks for everything, and the admin can close this thread.

P.S.: I know that it should take only a few if statements to read the data in. I was just making a mistake somewhere that resulted in a segmentation fault, but I will solve it on my own. Thanks to all who contributed in a positive way to my question!


I am certain it was not a4z's intent to insult you; personal attacks and insults are not tolerated on LQ.

Your reason for asking for help (i.e. homework, job, study, etc.) is unimportant if your question is clear and you make effort to use the advice offered. It is important to show your own efforts, and to interact with those who have taken the time to respond.

notKlaatu's suggestion to structure your input data differently to allow for easier parsing is a very good idea.

sundialsvcs' suggestion of an awk script may provide an easy way to restructure the data, or to provide a complete solution.

a4z's reference to the "tags" and "ifs", although somewhat cryptic, does also point to a problem with your existing code. You set the state of begin_tag but never test it! A test of that state would solve your main difficulty.
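
To make the idea of testing the flag concrete, here is a minimal sketch, not the original poster's final solution. The function name sum_energy2_pairs and the flag in_wanted_block (which plays the role of begin_tag) are assumptions for this example. It keeps only rows inside "Energy 2" blocks and adds the last column of each atom 2 row to the preceding atom 1 row, which is how the expected values 0.600, 0.800, ... in the first post appear to arise.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: sums the last column of each (atom 1, atom 2) row
// pair, but only for rows that belong to an "Energy 2" block.
std::vector<double> sum_energy2_pairs(std::istream& in)
{
    std::vector<double> sums;
    std::string line;
    bool in_wanted_block = false;   // plays the role of begin_tag
    bool pair_open = false;         // an atom 1 row is waiting for its atom 2 row

    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string first;
        if (!(words >> first))
            continue;               // blank line

        if (first == "Energy") {    // block header: update the flag
            int block = 0;
            in_wanted_block = (words >> block) && block == 2;
            pair_open = false;
            continue;
        }
        if (!in_wanted_block)
            continue;               // this is the test of the flag

        std::istringstream row(line);
        std::vector<double> values;
        double x = 0.0;
        while (row >> x)
            values.push_back(x);
        if (values.size() < 2)
            continue;               // not a data row, e.g. "Atom a b c d"

        if (values[0] == 1.0) {
            sums.push_back(values.back());
            pair_open = true;
        } else if (values[0] == 2.0 && pair_open) {
            sums.back() += values.back();
            pair_open = false;
        }
    }
    return sums;
}
```

Because the header is split on whitespace, this works whether the file has one or two spaces between "Energy" and the block number, and the pair_open check avoids calling back() on an empty vector.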

crazy-yiuf points out the same error a bit more clearly, please look closely at their example.

You have received helpful suggestions from all who have replied, and are encouraged to continue your participation as a valued LQ member. This thread will remain open as further encouragement.

Racooncity86 12-02-2016 09:43 AM

Sorry, I had to do a lot of other stuff. In the meantime, I was able to find the solution to my problem. I followed the suggestion of crazy-yiuf and what astrogeek also pointed out. Thanks to you both, and also to the others who have been helpful!
I also realized that I was setting "begin_tag = false" but was never testing it.
Thanks again for any supportive hints!

P.S.: a4z, do you really think that I did not do any research before asking a question here? Sorry, your post was annoying rather than supportive. By the way, I never asked for a full solution to my problem; I was just asking for a "working example", which could have been some other code where something like this was already done. I wanted to understand it and transfer it to my problem!


All times are GMT -5.