LinuxQuestions.org - C program: memory leak/ segmentation fault/ memory limit exceeded

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - C program: memory leak/ segmentation fault/ memory limit exceeded (https://www.linuxquestions.org/questions/programming-9/c-program-memory-leak-segmentation-fault-memory-limit-exceeded-4175718711/)

C program: memory leak/ segmentation fault/ memory limit exceeded

Hi all,

thank you for all constructive feedback.

Sorry about "annoying" thing. Haven't remarked it was still there before posting. This was a joke from the university, as our professor told us we could write both input[n]=x or n[input]=x but the latter was annoying :).

I have replaced "realloc" with "malloc", so this piece works (I hope).

But now I see the other line where error appears and it looks like the real issue.
[CODE]long int * frequency = (long int*) malloc(2 * MAX * MAX * sizeof(long int));[CODE]

This is the part where I need to go through my array (where I've saved sums of all pairs on the previous step) and find which numbers/sums are repeating and how many times. I've done it easy but ineffectively and need to change.
As of now, I check an array item, and if it contains the number X, I add +1 to the item X in a frequency array, eg frequency[x]+=1.

I've read some articles on Internet and it looks like dictionary can be a better structure to use for this purpose. (I need to check every item in an array and create a list of unique numbers/sums that are present in this array; each number/sum to be associated with the frequency).

Do you think dictionary will be a good choice for that? I basically need to store pairs int->int, specifically sum(a number stored in an array item) -> frequency (how many times this sum was contained in different array items).

Thanks
Alina

------the task ----------------
The problem is to develop a program to analyze input sequence.

Assume a sequence of numbers. We may choose a contiguous range from the input sequence, for instance, we may choose the range between the 5-th number and the 10-th number. A range must contain at least two numbers. The sum of the numbers may be computed for such range. We choose all possible ranges from the input sequence and we compute the sum for each of them. We are interested how many different ranges exist in the input sequence such that these ranges have the same sum.

For instance, the input sequence may be:
1 5 2 4 2 2 2
there is a total of 21 contiguous ranges of length ≥ 2, in particular:
0..1: 1 5 sum=6
0..2: 1 5 2 sum=8
0..3: 1 5 2 4 sum=12
0..4: 1 5 2 4 2 sum=14
0..5: 1 5 2 4 2 2 sum=16
0..6: 1 5 2 4 2 2 2 sum=18
1..2: 5 2 sum=7
1..3: 5 2 4 sum=11
1..4: 5 2 4 2 sum=13
1..5: 5 2 4 2 2 sum=15
1..6: 5 2 4 2 2 2 sum=17
2..3: 2 4 sum=6
2..4: 2 4 2 sum=8
2..5: 2 4 2 2 sum=10
2..6: 2 4 2 2 2 sum=12
3..4: 4 2 sum=6
3..5: 4 2 2 sum=8
3..6: 4 2 2 2 sum=10
4..5: 2 2 sum=4
4..6: 2 2 2 sum=6
5..6: 2 2 sum=4
Of these 21 ranges, there are 12 pairs such that the ranges in the pair have the same sum:
0..1 ~ 2..3 sum=6
0..1 ~ 3..4 sum=6
0..1 ~ 4..6 sum=6
0..2 ~ 2..4 sum=8
0..2 ~ 3..5 sum=8
0..3 ~ 2..6 sum=12
2..3 ~ 3..4 sum=6
2..3 ~ 4..6 sum=6
2..4 ~ 3..5 sum=8
2..5 ~ 3..6 sum=10
3..4 ~ 4..6 sum=6
4..5 ~ 5..6 sum=4
The input of the program is a sequence of integers (positive, zero, and negative values). The reading of the input sequence is terminated when EOF is reached. The input sequence is at most 2000 numbers long.

The output of the program is the total number of pairs of ranges such that the sums of the ranges are the same.

---------whole code of program [updated] ---------

Code:

#include <stdio.h>

#include <stdlib.h>

#define MAX 2000



//Take in the list of numbers

//As the list is input, loop within the j-arrays

//Loop through the result & get data into frequency array 

//Loop through the frequency array to get the result





int choose(int n, int k){

    if (k == 0) return 1;

    return (n * choose(n - 1, k - 1)) / k;

}





int main(){

        



//***************************

//takes input into the array

int * input = (int *) malloc (MAX * sizeof(int));



int n = 0, x;

printf("Input sequence:\n");



while ( scanf ( "%d", & x ) == 1 ) {

                if (n > MAX){

                                free ( input );

                                printf ( "Invalid input.\n" );

                                return EXIT_FAILURE;

                        }



                input [n] = x;

                ++n;

        }

if ( ! feof ( stdin ) ) {

                printf ( "Invalid input.\n" );

                free ( input );

                return EXIT_FAILURE;

        }

        

if ( n == 0 ) {

                printf ( "Invalid input.\n" );

                free ( input );

                return EXIT_FAILURE;

        }

        

//****************************

//Finding sums

long int * pointer  = (long int*) malloc(((n-1) * (n-1)) * sizeof(long int));





for (int i = 0; i < n-1; i ++){

        //i is the column now

        //i + 2 is the N of elements in the sum

        for (int j = 0; j< n-1-i; j ++){ 

                //Add jth, ..., j+i+1 th element

                for (int k = j; k <= j+i+1; k++){ 

                *(pointer + (n-1)*i + j)  += input[k];

        }

        }

}



free(input);



//*******************************

//getting frequency

long int * frequency  = (long int*) malloc(2 * MAX * MAX  * sizeof(long int));





for (int i = 0; i < n-1; i ++){

        for (int j = 0; j < n-1-i; j ++){

                *(frequency + MAX + *(pointer + (n-1) * i + j)) += 1;

        }}





//*****************************

//Finding the max N of pairs

free(pointer);



int pairs = 0;



for (int i = 0; i < 2 * MAX * MAX; i++){

        if (i[frequency] == 2){

                pairs += 1;

        }

        else if (i[frequency] > 2){

                pairs += choose(i[frequency], 2);

} }











//******************************

//Printing whatever i need to print



free (frequency);



printf("Total pairs: %d\n", pairs);



return 0;}



//Maybe, it just wasn't meant to be





/*

 for (int i = 0; i < n-1; i ++){

        for (int j = 0; j < n-1-i; j ++){

        printf("%ld ", *(pointer + (n-1)*i + j));

        }

        printf("\n");}

 */

I've written blogs about how to debug core dumps and use the debugger.

Please see the link in my signature.

Quote:

Originally Posted by AlinaC (Post 6391863)

Debugging has shown that it relates to the following line of code:

Code:

input = (int * ) realloc ( input, sizeof ( * input) * max );

I'm going to shove a rubber duck in your face. Explain to that duck what that line is supposed to do.

you need to check the return value of malloc, realloc and in general any function. Just to know if they failed or not.
you can use valgrind to check memory related issues
you did not provide enough info, for example what did you exactly entered/tried when it failed.
you need also explain how did you compile/build this code (including the versions and also the OS).

Oh yes, what is this good for?

Code:

n [ input ] = x;

Quote:

Originally Posted by pan64 (Post 6391897)

Oh yes, what is this good for?

Code:

n [ input ] = x;

Yeah, I'm surprised that didn't throw a warning.

Quote:

Originally Posted by pan64 (Post 6391897)

Oh yes, what is this good for?

Code:

n [ input ] = x;

As clearly explained by the comment on the previous line, it's good for annoying people.

(see also https://c-faq.com/aryptr/joke.html)

Code:

if (n >= max) {

    max += max/2 + 10;

    input = (int * ) realloc (input, sizeof (*input) * max);

}

This wouldn't work in generic case (where n can be greater than max+(max/2)+10) but here n is incremented by one in every step, so 'n<=max' should be true before this part and 'n<max' after it.
Just to be sure, these checks could be added:

Code:

if (n > max) exit (60);

if (n == max) {

    max += max/2 + 10;

    input = (int * ) realloc (input, sizeof (*input) * max);

    if (input==NULL) exit (61);

}

Quote:

Originally Posted by ntubski (Post 6391972)

As clearly explained by the comment on the previous line, it's good for annoying people.

(see also https://c-faq.com/aryptr/joke.html)

That clearly will not annoy me. It has definitely different consequences.

Quote:

Originally Posted by rtmistler (Post 6391879)

I've written blogs about how to debug core dumps and use the debugger.

Please see the link in my signature.

Thank you for reference, will read it to learn more about debugging

Quote:

Originally Posted by dugan (Post 6391883)

I'm going to shove a rubber duck in your face. Explain to that duck what that line is supposed to do.

I didn't get what you intended to say with this phrase. Was it intentionally rude - or you just was asking to explain what this line of code was intended to do?

Quote:

Originally Posted by pan64 (Post 6391897)

Code:

n [ input ] = x;

Thank you for your questions and advice, this was the most constructive answer!
I've used valgrind, it doesn't find anything.

First I compiled the code with g++ v11.3.0 in Ubuntu, no error. When I tried uploading my code to the university environment, the error appeared. I've tested then this code in Vs Code and then Visual Studio in Windows and got error messages with more info about where the problem could be.

I've tried with 2 examples from below, both failed with the same error
Example program output:
Input sequence:
1 5 2 4
Total pairs: 1

Input sequence:
1 5 2 4 2 2 2
Total pairs: 12

Quote:

Originally Posted by NevemTeve (Post 6392004)

Code:

if (n >= max) {

    max += max/2 + 10;

    input = (int * ) realloc (input, sizeof (*input) * max);

}

Code:

if (n > max) exit (60);

if (n == max) {

    max += max/2 + 10;

    input = (int * ) realloc (input, sizeof (*input) * max);

    if (input==NULL) exit (61);

}

Thank you for advice. After reading more about malloc/realloc, I've decided malloc should work better for my case and changed the code. Now debugging shows an error on another line: long int * frequency = (long int*) malloc(2 * MAX * MAX * sizeof(long int));
I've written about it in the updated text of my question.

At that point, realloc is perfectly valid, as it keeps the already filled values. It's the later malloc-s where 'n' should be used instead of 'MAX'

Anyways, I tried the latest state of your program with valgrind; it showed that 'frequency' is uninitialized: 'malloc' allocates memory, but doesn't clear it; 'calloc' does that:

Code:

- long int * frequency  = (long int*) malloc(2 * MAX * MAX  * sizeof(long int));

+ long int * frequency  = (long int*) calloc(2 * n * n, sizeof(long int));

Quote:

Originally Posted by AlinaC (Post 6392034)

I didn't get what you intended to say with this phrase. Was it intentionally rude - or you just was asking to explain what this line of code was intended to do?

It's a reference to a common debugging technique - instead of asking a colleague for help, you first describe the issue to a rubber duck and imagine it responding with questions as that co-worker might, which you then answer.

Frequently, the action of describing a problem and considering common questions is enough to resolve an issue, but even when it's not it means you've done some basic diagnostic steps that allow you to rule certain things out before taking the issue to someone who isn't a duck.

A note: I'd start with ensuring that all elements are positive: if the minimum of all elements is not positive, then add 1-minimum to every elements.

After that we have n-1 series of nested intervals (only the sum of the elements is relevant; as the elements are positive, these sum-series are increasing).

Code:

input:    1 2 4  3  5  7  1 (n=7)

series#1:  3 7 10 15 22 23

series#2:    6  9 14 21 22

series#3:        7 12 19 20

series#4:          8 15 16

series#5:            12 13

series#6:                8

Now we only have to find the common elements in these list.

The points people are trying to drive home are:

Debug techniques are important to learn, there are multiple methods and they each benefit you. The old printf() can only take you so far.
Actually read the code you've written. Consider that you may possibly be making assumptions about how certain sections and library calls work. Review your code to determine if you fully understand every line you wrote and every return from library calls as well as have called those functions properly.
There are several sources to check and confirm your interim work: valgrind, clang, compiler warnings and errors and increased scrutiny by optional compilation flags, and the ability to run, freeze execution, and examine variables.

Quote:

Originally Posted by AlinaC (Post 6392034)

I didn't get what you intended to say with this phrase. Was it intentionally rude - or you just was asking to explain what this line of code was intended to do?

https://en.wikipedia.org/wiki/Rubber_duck_debugging

Off: now I think my idea to make elements positive by adding a constant value was wrong... So right now I think we have to calculate the n(n-1)/2 interval-sums, then perform a sort to find the duplications.

I find the exercise a little conceptually odd, and I don't fully get this concept of "pairs" rather than just reporting the number of duplicate results, but my first run using the input from the example gives the same results as described in OPs description, so I assume I've understood the goal correctly.

Code:

$ time -p (echo "1 5 2 4 2 2 2" | ./sum_range )

Input items: 7

Range:  0..1: sum = 6

Paired: 0..1 ~ 2..3: sum = 6

Paired: 0..1 ~ 3..4: sum = 6

Paired: 0..1 ~ 4..6: sum = 6

Range:  0..2: sum = 8

Paired: 0..2 ~ 2..4: sum = 8

Paired: 0..2 ~ 3..5: sum = 8

Range:  0..3: sum = 12

Paired: 0..3 ~ 2..6: sum = 12

Range:  0..4: sum = 14

Range:  0..5: sum = 16

Range:  0..6: sum = 18

Range:  1..2: sum = 7

Range:  1..3: sum = 11

Range:  1..4: sum = 13

Range:  1..5: sum = 15

Range:  1..6: sum = 17

Range:  2..3: sum = 6

Paired: 2..3 ~ 3..4: sum = 6

Paired: 2..3 ~ 4..6: sum = 6

Range:  2..4: sum = 8

Paired: 2..4 ~ 3..5: sum = 8

Range:  2..5: sum = 10

Paired: 2..5 ~ 3..6: sum = 10

Range:  2..6: sum = 12

Range:  3..4: sum = 6

Paired: 3..4 ~ 4..6: sum = 6

Range:  3..5: sum = 8

Range:  3..6: sum = 10

Range:  4..5: sum = 4

Paired: 4..5 ~ 5..6: sum = 4

Range:  4..6: sum = 6

Range:  5..6: sum = 4

Total Ranges: 21

Total Paired: 12

real 0.00

user 0.00

sys 0.00

However, my simple solution doesn't scale well:

Code:

$ time -p ( seq 1 100| ./sum_range | grep Total )

Total Ranges: 4950

Total Paired: 4342

real 0.02

user 0.02

sys 0.00

$ time -p ( seq 1 200| ./sum_range | grep Total )

Total Ranges: 19900

Total Paired: 21302

real 0.24

user 0.24

sys 0.00

$ time -p ( seq 1 400| ./sum_range | grep Total )

Total Ranges: 79800

Total Paired: 100903

real 3.50

user 3.49

sys 0.01

$

Quote:

Originally Posted by AlinaC (Post 6391863)

I've read some articles on Internet and it looks like dictionary can be a better structure to use for this purpose. (I need to check every item in an array and create a list of unique numbers/sums that are present in this array; each number/sum to be associated with the frequency).

Do you think dictionary will be a good choice for that? I basically need to store pairs int->int, specifically sum(a number stored in an array item) -> frequency (how many times this sum was contained in different array items).

Yes :) Although you'd probably call your implementation a hash table (means the same thing).

Quote:

Originally Posted by boughtonp (Post 6392046)

Thank you for explaining! However before posting a question, I've talked already to myself, my friends from the Uni, and even my mom ;) so this condition was fulfilled )

Yeah, but how many of those are ducks? :)

Anyway, not clear on the current status of the thread... do you still have an outstanding memory leak, or a different issue, or is it solved now?