LinuxQuestions.org - [SOLVED] C programming index number confusion

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - C programming index number confusion (https://www.linuxquestions.org/questions/programming-9/c-programming-index-number-confusion-4175532496/)

C programming index number confusion

C programming / selection sort / index number confusion

The below code uses functions to create a selection sort. It's source is http://www.sanfoundry.com/c-program-...ing-functions/

Code:

/*

  C program for SELECTION sort that uses the following functions to:

    a) Find maximum of elements

    b) Swap two elements

 */

#include<stdio.h>



int findmax(int b[],int size);

void swap(int b[],int size);



void main()

{

    int n;

    printf("Enter the value of n: ");

    scanf("%d",&n);



    int array[n];

    int i,j,temp;



    puts("Enter the array's numbers: ");

    for(i=0;i<n;i++)

    {

        scanf("%d",&array[i]);

    }

    printf("The unsorted array elements are: ");

    for(i=0;i<n;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

    //Selection Sort Begins

    swap(array,n);

    printf("  The sorted array elements are: ");

    for(i=0;i<n;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

}      //end main



//Function to find the maximum value

int findmax(int b[],int size)

{

    int max, j;

    max=0;



    for(j=1;j<=size;j++)

    {

        if(b[j]>b[max])

        {

            max=j;

        }

    }

    return(max);

}



void swap(int b[],int size)

{

    int temp, big, j;

    for(j=size-1;j>0;j--)      

    {

        big=findmax(b,j);

        temp=b[j];

        b[j]=b[big];

        b[big]=temp;

    }

    return;

}

Specifically, why does "int findmax(int b[],int size)" specify 'j' in "for(j=1;j<=size;j++)"?
Just think what 'j<=size' implies. if you have a 5 element array, e.g. int b[5], then int b[5] declares (as integer) the array b[] which is comprised of b[0], b[1], b[2], b[3] and finally b[4]. THERE IS NO b[size]!!!!! I even wrote another program where I enter in the elements of an array, then I enter in the element number. Enter in a '0' and you get back the 1st element(NOT the 0th element). Enter in a 2 and an array will give you back the 3rd element. For a array of size 5 (int array[5]), enter in a five, and then you're in trouble, the computer will produce an erroneous number. This is simple to see, but here the above program for some reason defies all c programming logic and states "for(j=1;j<=size...", which implies to retrieve the element n+1 in an array of size n.

Does anyone know why j<=size makes the program work yet when I edit the program to "j<size" the program doesn't sort correctly?

> Specifically, why does "int findmax(int b[],int size)" specify 'j' in "for(j=1;j<=size;j++)"?
> Just think what 'j<=size' implies.

It doesn't implies that, because 'size' isn't the size, it's the 'highest index'. You'd better use clearer variable names, eg:

Code:

//Function to find the maximum value

int findmax (const int b[], int highest_index)

{

    int maxpos, j;



    if (highest_index<1) exit (77);



    maxpos= 0;

    for(j=1;j<=highest_index;j++) {

        if(b[j]>b[maxpos]) {

            maxpos=j;

        }

    }

    return maxpos;

}

The reason it works is that it starts at size-1

Code:

for(j=size-1;j>0;j--)

It is always comparing the whole list to the last element of the list. Which also means this is not a good algorithm for very large lists.

I would not say this code works. In some cases probably it produces the required result, but in some cases that code will simply die.
size means the number of elements in the array, and the index is running from 0 to size-1. Therefore the for loop should be:

Code:

for(j=0;j<size;j++)

but j=0 can be skipped in your case. Therefore:

Code:

//Function to find the maximum value

int findmax(int b[],int size)

{

    int max, j;

    max=0;



    # just exit in this case 

    if (size < 1) exit (77);

    

    for(j=1;j<size;j++)

    {

        if(b[j]>b[max])

        {

            max=j;

        }

    }

    return(max);

}

but in some cases - like in the original example you linked - you pass the index of the last item (that is not equal to the size, but less), and therefore that code will work.

The key to the answer is that C doesn't care what a variable's name means. It only cares what a variable's usage means.

In swap() we have a variable named "size" and the way it is used, it actually is the size. But in findmax() we have another variable named "size" but based on the way it is used, it should have been named "last".

In each pass through the loop in swap() the number passed to findmax() is the index of the last unsorted position. It is not the size of the unsorted portion. It is one less than the size of the unsorted portion.

Bad variable names are very confusing to anyone who tries to read the code. So even though the code works, the author should be ashamed.

Quote:

Originally Posted by smallpond (Post 5309049)

It is always comparing the whole list to the last element of the list.

Not in any obvious meaning of "always" or "last". I understand you basic objection. But you aren't expressing it very well.

Quote:

Which also means this is not a good algorithm for very large lists.

I'm sure it was never intended as a good algorithm for very large lists. I expect it was intended as an example of implementing a simple algorithm in C (meant to teach C programming, not to teach sorting algorithms).

As an example used to teach C programming, it is terrible for several reasons. But not because it is a poor sorting algorithm. Selecting best algorithms probably should not be part of early teaching of C. Best algorithm is largely a distraction from learning the basics of competent coding.

Using meaningful, rather than misleading, variable names should be part of teaching C coding from the start.

Expressing ranges as first,limit in most cases and first,size in others and almost never as first,last is an important aspect of decent C style, that should be learned at first by example and rote, because using first,last for a range (or using last alone when first is implied) is a very bad practice whose flaws will not be explainable to beginners nor matter in trivial examples. But if you are in the habit of using last (rather than size or limit) to end a range, by the time you reach serious work where that habit creates a mess, it will be too late to change.

Using int rather than unsigned for inherently non negative quantities is another bad habit, best avoided from the start.

Quote:

Originally Posted by pan64 (Post 5309097)

but in some cases that code will simply die.

That is a best a strange point of view, rather than a true statement. In effect, you are saying the function is coded incorrectly and then used incorrectly in a way that makes it work. You are saying the function could be called in other ways that would make it crash. But that is true of most functions. C is not intended for strictly defensive programming.

I see that as a badly misnamed variable. "size" in that function is not a size. You see the variable name as a contract, which is then violated by both the function itself and its caller.

Thanks everyone, Core problem solved!

Thanks to all, I was indeed mislead by the example's use certain misleading words used as variables. More specifically the formal function definition for int findmax(int b, size), the "size" variable mixed me up. I thought the variable going into this function was "size". Actually the variable going into the function findmax was "j" from the swap function's "j" variable, which as all know so well from the previous line of code is "for(j=size-1;j>0;j--)" thus this variable only ranges from size-1 to 1, thus a span falling short of the array's size by 1 element! Naturally then Sanfoundry had written "j<=size" in findmax function's "for(j=1;j<=size;j++)" statement, and not "j<size".

I should have some more time tomorrow evening to look some more at johnsfine and pan64 at more detail to enable me to make the program even better for pedagogical purposes, but the core question has most certainly been solved.

Thanks everyone so much!:D

johnsfine,
Thanks so much for your pointers, and the final program is as below.

Code:

/*

  C program for SELECTION sort that uses the following functions to:

    a) Find maximum of elements

    b) Swap two elements

 */

#include<stdio.h>



unsigned findmax(int b[],unsigned si);

void swap(int b[],unsigned limit);



void main()

{

    unsigned lmt;      //limit of array b's size

    printf("Enter the value of the array: ");

    scanf("%d",&lmt);



    int array[lmt];

    unsigned i;



    puts("Enter the array's numbers: ");

    for(i=0;i<lmt;i++)

    {

        scanf("%d",&array[i]);

    }

    printf("The unsorted array elements are: ");

    for(i=0;i<lmt;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

    //Selection Sort Begins

    swap(array,lmt);

    printf("  The sorted array elements are: ");

    for(i=0;i<lmt;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

}      //end main



//Function to find the maximum value

unsigned findmax(int b[],unsigned si)

{

    unsigned maxpos, m;    //m= findmax's index

    maxpos=0;



    for(m=1;m<=si;m++)

    {

        if(b[m]>b[maxpos])

        {

            maxpos=m;

        }

    }

    return(maxpos);

}



//Function to swap variables

void swap(int b[],unsigned limit)

{

    int temp, big;

    unsigned si;            //swap's index]

    for(si=limit-1;si>0;si--)

        /*

        1)'limit-1' because an array a[n] only has n elements going from a[0],a[1],...,a[n-1]

        2)si>0  and not si>=0 because 5 elements at the most only need 4 rounds of swaping(just experiment with cut out numbers), hence n elements at most need n-1 times of swapping.

        */

    {

        big=findmax(b,si);

        temp=b[si];

        b[si]=b[big];

        b[big]=temp;

    }

    return;

}



/*

TESTING

I thought about making both findmax and swap both go forward rather than the current way where swap goes backwards, but that way then at the beginning of the loop then find max gets fed a really low swap index number which ends up blocking findmax from returning the absolute maximum early on.  If the abs. max isn't swapped early on then each future loop will get stuck sorting the abs. max not to the absolute end memory address but merely one more memory address upward each time, causing the remaining numbers to remain totally unsorted.

*/

Thanks soooo much!!

Quote:

Originally Posted by pan64 (Post 5309097)

I would not say this code works. In some cases probably it produces the required result, but in some cases that code will simply die.

How specifically doesn't the code work, i.e. could you give an example?

I don't think changing the name of "size" in swap() to "limit" was helpful. As the size of the unsorted portion, I think the name "size" was just as meaningful as "limit".

I'm less happy with changing the name of "size" in findmax() to "si". That does get rid of the agressively wrong name. But what does "si" mean? If you keep the bad design in which the variable represents a last index, you should make that clearer by naming it "last". Passing a "last" index in C is bad practice, but naming it unclearly makes that worse, not better. The better choice is to change the code to not pass a last index:

Code:

/*

  C program for SELECTION sort that uses the following functions to:

    a) Find maximum of elements

    b) Swap two elements

 */

#include<stdio.h>



unsigned findmax(int b[],unsigned size);  /* Takes location and size of unsorted portion */

void swap(int b[],unsigned size);  /* Takes location and size of unsorted portion */



void main()

{

    unsigned size;      //array b's size

    printf("Enter the value of the array: ");

    scanf("%d",&size);



    int array[size];

    unsigned i;



    puts("Enter the array's numbers: ");

    for(i=0;i<size;i++)

    {

        scanf("%d",&array[i]);

    }

    printf("The unsorted array elements are: ");

    for(i=0;i<size;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

    //Selection Sort Begins

    swap(array,size);

    printf("  The sorted array elements are: ");

    for(i=0;i<size;i++)

    {

        printf("%d ",array[i]);

    }

    printf("\n");

}      //end main



//Function to find the maximum value

unsigned findmax(int b[],unsigned size)

{

    unsigned maxpos, m;    //m= findmax's index

    maxpos=0;



    for(m=1;m<size;m++)

    {

        if(b[m]>b[maxpos])

        {

            maxpos=m;

        }

    }

    return(maxpos);

}



//Function to swap variables

void swap(int b[],unsigned size)

{

    int temp, big;

    unsigned si;            //swap's index]

    for(si=size;si>1;si--)

        /*

        si>1  and not si>0 because 5 elements at the most only need 4 rounds of swaping(just experiment with cut out numbers), hence n elements at most need n-1 times of swapping.

        */

    {

        big=findmax(b,si);

        temp=b[si-1];

        b[si-1]=b[big];

        b[big]=temp;

    }

    return;

}

Quote:

I thought about making both findmax and swap both go forward rather than the current way where swap goes backwards, but that way then at the beginning of the loop then find max gets fed a really low swap index number which ends up blocking findmax from returning the absolute maximum early on. If the abs. max isn't swapped early on then each future loop will get stuck sorting the abs. max not to the absolute end memory address but merely one more memory address upward each time, causing the remaining numbers to remain totally unsorted.

The fact that you are finding the max each time is tied to the fact that the loop in swap() run backwards. The loop in findmax (or if you changed it to findmin) can run in either direction. But the loop in swap() must run in the direction consistent with the choice to find max or min each time.

BTW, I think using unsigned instead of int is good practice when quantities are inherently unsigned. But it does require some extra care. One of your comments about >=0 implied you had missed that detail. This program did not need any >=0 checks on unsigned loops. But that need is quite common, and becomes and extra detail to worry about when using unsigned. I think it is worth it, but others likely disagree:

The following is a very common loop:

Code:

for (i=size-1; i>=0; i--)

The values of i are inherently non negative, but when you use unsigned instead on int, the loop doesn't work. The invalid (negative) value of i is needed in the test to terminate the loop.

I code that loop in a lot of serious functions, where the flaws of using int instead of unsigned matter, so I use a form that probably looks strange the first time:

Code:

for (i = size; i-- > 0; )

Within the body of the loop, i goes through the correct set of values (size-1 downto 0 inclusive), and has whatever advantages are gained by using an unsigned variable for the unsigned quantity. If you aren't used to seeing that form of loop, it will be confusing at first glance, which is a real disadvantage (other people need to be able to follow my code). But I think the advantages end up outweighing that disadvantage.

Quote:

Originally Posted by johnsfine (Post 5310075)

The following is a very common loop:

Code:

for (i=size-1; i>=0; i--)

...
I code that loop in a lot of serious functions, where the flaws of using int instead of unsigned matter, so I use a form that probably looks strange the first time:

Code:

for (i = size; i-- > 0; )

Within the body of the loop, i goes through the correct set of values (size-1 downto 0 inclusive),

I have two questions:
1) Why doesn't the code "i = size;" include 'size'?
2) The above code works well in the swap function, but when I adapt it for the findmax function and write 'for(m = 1; m-->s; )' I get the following result:

Code:

Enter the array's size: 5

Enter the array's numbers: 

5

4

3

2

1

The unsorted array elements are: 5 4 3 2 1 

The sorted array elements are: 4 3 2 1 5

Should the syntax 'for(s = size; s-- >0; )' only be used for unsigned index numbers approaching 0(as to avoid testing a negative value that can't inherently exist), and never anything else?

Each of your questions implies you are failing to see the basic operation i-- in that less usual context.

i-- is a basic C operation, saying to decrement i, but use the value before decrementing for the current expression.

So in: for (i = size; i-- > 0; ) { body }
in each iteration, the value of i before it is decremented is compared to zero, but the value of i after it is decremented is used within body.

Quote:

1) Why doesn't the code "i = size;" include 'size'?

The test expression of a for() is computed at the beginning of each iteration, including the first iteration. So the i-- in the test expression is executed before the first iteration and so i is changed from size to size-1 before the body of the loop is executed the first time.

Quote:

2) The above code works well in the swap function, but when I adapt it for the findmax function and write 'for(m = 1; m-->s; )' I get the following result:

The m-- part of the test expression decrements m, so obviously it means the loop is running backward.

Quote:

Should the syntax 'for(s = size; s-- >0; )' only be used for unsigned index numbers approaching 0(as to avoid testing a negative value that can't inherently exist), and never anything else?

Pretty much. It decrements the index, so it should be used only in cases where decrementing the index is what you want. It is a non obvious syntax whose benefit is avoiding the need to test against negative. So it could be used for signed values, but that would be less readable for no reason.

So the only good usage of that syntax is as you said: for unsigned index numbers approaching 0(as to avoid testing a negative value that can't inherently exist)

Quote:

Originally Posted by andrew.comly (Post 5310030)

How specifically doesn't the code work, i.e. could you give an example?

That is quite simple: b[size] is invalid, since the last element of the array is b[size-1]. Trying to access to a non-existent member may cause crash and/or other strange things (like overwriting another variable)