LinuxQuestions.org


vargadanis 04-15-2007 02:40 PM

C++ Unions to understand - question
 
Hi!

I began to learn C++. It goes quite well. I have understood many important parts of it and some concepts; however, there is a slightly more complicated thing (unions) that I cannot "digest".

Correct me if I am wrong:
With a union you can access the same memory allocation with different data types. So it is like a pointer. You can point to a certain memory location and read the data there as a byte or a character or whatever. Is that correct so far?

So if I read in a character from the console with cin and store it in a variable like char c, then I should be able to access it as a byte if I point or refer to it. Now the question would be, how can I use the unions. I read some examples but I couldn't find any practical examples, therefore I cannot use it, however it is a handy feature. I'd be glad if someone gave me a good example (a simple one please, that is not implemented inside a struct or similar).

Thanx

taylor_venable 04-15-2007 03:20 PM

Quote:

Originally Posted by vargadanis
Now the question would be, how can I use the unions. I read some examples but I couldn't find any practical examples, therefore I cannot use it, however it is a handy feature.

Unions are a hold-over from the days when memory was very expensive. Nowadays, we're practically swimming in it (unless you're running Vista) so there's not a lot of reason to use them. Although I've hardly ever used them in practice, unions can be handy for a certain kind of type polymorphism, where a variable name can be bound to data of different types. This is an unsafe feature in C and C++ because keeping track of what type of data is actually stored is up to the programmer. That's why you typically find them as members in structs; usually another member of the struct tells you what type of data the union value is holding.

A useful example of how to use one is when you want to store some type of data into a variable, say the current temperature, but want the user to be able to decide what kind of precision to use. Imagine the user selects to just store the temperature as an integer; you can set a flag in your program to say it's stored as an int, and then store it into the int field of the union. Then later on say the user changes their mind and wants to store it as a float; you set the flag to indicate storage as a float, and assign to the float value of the union. Note that you can still try to get the integer value of the union, but it won't be correct -- it will be the data of the float interpreted as an int, which is usually not what you want.
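The temperature idea above could be sketched roughly like this. It is only a minimal illustration: the names Temperature, Precision, set_int, set_float and as_float are made up for this sketch, and the flag is kept next to the union in a small struct so the two travel together.

```cpp
// Hypothetical names throughout; this is a sketch, not code from a real project.
enum Precision { AsInt, AsFloat };

struct Temperature {
    Precision precision;   // the flag: which union member is currently valid
    union {                // anonymous union: both members share the same storage
        int   whole;
        float fractional;
    };
};

// Store the temperature as an integer and record that choice in the flag.
void set_int(Temperature& t, int degrees) {
    t.precision = AsInt;
    t.whole = degrees;
}

// Store the temperature as a float and record that choice in the flag.
void set_float(Temperature& t, float degrees) {
    t.precision = AsFloat;
    t.fractional = degrees;
}

// Read whichever member the flag says is valid and return it as a float.
float as_float(const Temperature& t) {
    return t.precision == AsInt ? static_cast<float>(t.whole) : t.fractional;
}
```

Going through as_float (instead of touching whole or fractional directly) is what keeps the "wrong member" mistake from happening; the language itself does not enforce it.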

Unions in C and C++ have a storage size at least as large as the largest member of the union (the compiler may add padding for alignment).
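A quick way to check the sizing rule on your own compiler is a couple of compile-time assertions. The union Sample and the struct AllThree below are hypothetical, chosen only to make the comparison visible.

```cpp
#include <cstddef>

// Hypothetical union used only to illustrate sizing.
union Sample {
    char   c;
    int    i;
    double d;   // the largest member on typical platforms
};

// A struct with the same members stores them side by side instead.
struct AllThree {
    char   c;
    int    i;
    double d;
};

// The union must be big enough to hold its largest member...
static_assert(sizeof(Sample) >= sizeof(double), "union holds its largest member");
// ...but its members overlap, so it is smaller than the equivalent struct.
static_assert(sizeof(Sample) < sizeof(AllThree), "members overlap instead of adding up");

// Returns the size in bytes so it can be printed or inspected at runtime.
std::size_t sample_size() { return sizeof(Sample); }
```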

Other languages (like Ada and Standard ML) have, in contrast to C and C++, safe unions. This means that the variable that represents the union carries its type information with it at runtime, and doesn't allow itself to be coerced into a type it's not holding. This feature is often used along with type checking to take different actions depending on the type of the variable. A typical example is a binary tree. Each node on the tree is either (a) a leaf holding some value, or (b) a value with a left child node and a right child node. (a) and (b) are different types; the general node is a union of those two types. A search function can take a general node and examine its type. If it's a leaf, check the value and return whether its value is equal to the search key. If it's an interior node, check the value and compare it to the search key. If the values are the same, return true. If the values are not the same, recurse and pass either the left or right child node as the general node to the search function.
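For comparison, the tree search described above can be approximated in C++ with std::variant, which was added to the standard library in C++17 and tracks which alternative it holds. This is only a sketch under that assumption (a C++17 compiler); the names Leaf, Interior, Node, leaf, interior and contains are invented for the illustration.

```cpp
#include <memory>
#include <utility>
#include <variant>

struct Interior;                         // defined below; nodes refer to each other
struct Leaf { int value; };

// A general node is either a leaf or (a pointer to) an interior node.
using Node = std::variant<Leaf, std::unique_ptr<Interior>>;

struct Interior {
    int  value;
    Node left;
    Node right;
};

// Small helpers to build nodes.
Node leaf(int v) { return Node(Leaf{v}); }

Node interior(int v, Node l, Node r) {
    std::unique_ptr<Interior> p(new Interior{v, std::move(l), std::move(r)});
    return Node(std::move(p));
}

// Search: examine which type the node holds, then act accordingly.
bool contains(const Node& node, int key) {
    if (const Leaf* lf = std::get_if<Leaf>(&node))
        return lf->value == key;                       // leaf: compare and stop
    const Interior& in = *std::get<std::unique_ptr<Interior>>(node);
    if (in.value == key)
        return true;                                   // interior node matches
    return contains(key < in.value ? in.left : in.right, key);  // recurse into one child
}
```

Asking a variant for an alternative it does not hold fails loudly (std::get throws, std::get_if returns a null pointer) instead of silently reinterpreting the bytes.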

But unfortunately cool features like these can only be used in languages supporting safe unions. In C and C++, my advice is to stay away from them whenever possible.

dmail 04-15-2007 05:06 PM

Quote:

I read some examples but I couldn't find any practical examples, therefore I cannot use it, however it is a handy feature. I'd be glad if someone gave me a good example (a simple one please, that is not implemented inside a struct or similar).
Here is an example, a straight copy and paste from one of my projects, which is used to serialise types in a class so they can be transferred over a network. The serialisation takes a type and shifts it into a byte (unsigned char) buffer. Because shifting a signed type is not well defined, and to make it easy to write one function that can take a type of 8, 16, 32 or 64 bits, I use this structure.
Code:

template<typename T>
struct Type2bytes
{
        Type2bytes(T const &value):whole(value){}
        Type2bytes():whole(T()){}
        union
        {
                T whole;
                GNL::byte bytes[sizeof(T)];
        };
};
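One caveat with the union approach: writing `whole` and then reading `bytes` is supported by GCC as a documented extension, but ISO C++ does not define reading a union member other than the one last written. A portable way to get at the same bytes is std::memcpy. The sketch below is not taken from the project above; to_bytes and from_bytes are hypothetical helpers, and "byte" is assumed to mean unsigned char.

```cpp
#include <array>
#include <cstring>
#include <type_traits>

// Copy the object representation of a value into a byte array.
template <typename T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    std::array<unsigned char, sizeof(T)> out;
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Rebuild a value from bytes produced by to_bytes on the same platform.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& in) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T value;
    std::memcpy(&value, in.data(), sizeof(T));
    return value;
}
```

The bytes come out in the host's byte order either way, so anything sent over a network still needs an agreed order on both ends.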


ErV 04-15-2007 08:23 PM

Quote:

Originally Posted by vargadanis
Correct me if I am wrong:
With a union you can access the same memory allocation with different data types. So it is like a pointer. You can point to a certain memory location and read the data there as a byte or a character or whatever. Is that correct so far?

Mostly correct, but not in the details. A union is not a pointer; it's a datatype with a certain size (at least the size of its largest component). You do not need to call "malloc" or "new" to allocate storage for a union.


All examples will probably work correctly only if structures are 1-byte aligned.

Quote:

Originally Posted by vargadanis
So if I read in a character from the console with cin and store it in a variable like char c, then I should be able to access it as a byte if I point or refer to it.

Not exactly. (You can access c as a byte using casting, but that's another story) You must first define a union like this:
Code:

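#include <iostream>

typedef unsigned char byte;  // "byte" is not a built-in type, so define it first

int main(int argc, char** argv){
    union{
        char a;
        byte b;
    } v;
    std::cin >> v.a;   // store the character through the char member
    std::cout << v.b;  // read the same storage through the byte member
}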

So a union looks like a struct, but its fields are placed at the same memory location. In this example v.a and v.b are placed at the same address.
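If you want to convince yourself that the two members really sit at the same address, a small check like the following can help. CharOrByte and members_share_address are names made up for this sketch, and "byte" is again assumed to mean unsigned char.

```cpp
union CharOrByte {
    char          a;
    unsigned char b;   // assuming "byte" means unsigned char
};

// Returns true when both members start at the same address.
bool members_share_address() {
    CharOrByte v;
    v.a = 'A';
    // Compare as untyped pointers, because char* and unsigned char* are different types.
    return static_cast<void*>(&v.a) == static_cast<void*>(&v.b);
}
```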

Quote:

Now the question would be, how can I use the unions.
One example is to provide several names/access methods for the same data. An example:
Code:

#include <linux/types.h>
union dword_union{
    unsigned char bytes[4];
    __u16 words[2];
    __u32 dword;
};

using that union you can access dword (32bit integer) as a whole using "dword" field, as an array of two words (16bit integers) using "words" field, and as an array of 4 bytes, using "bytes" field. The more complex (and probably non-portable) example can be:
Code:

#include <linux/types.h>
union dword_union{
    unsigned char bytes[4];
    struct{
        __u16 hiword;
        __u16 loword;
    };
    __u16 words[2];
    __u32 dword;
};
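The reason the second version is non-portable is byte order: which half of dword ends up in the first 16-bit slot depends on the machine. The sketch below shows one way to observe that. It uses the <cstdint> types instead of __u16/__u32, the names DwordUnion, low_byte_first and first_word are hypothetical, and it relies on reading a union member other than the one last written, which GCC documents as supported but ISO C++ leaves undefined.

```cpp
#include <cstdint>

union DwordUnion {
    std::uint8_t  bytes[4];
    std::uint16_t words[2];
    std::uint32_t dword;
};

// True when the least significant byte is stored first (little-endian).
// Relies on compiler-supported union type punning (an assumption, see above).
bool low_byte_first() {
    DwordUnion u;
    u.dword = 0x11223344u;
    return u.bytes[0] == 0x44;
}

// The 16-bit value found in the first word slot of the same dword.
std::uint16_t first_word() {
    DwordUnion u;
    u.dword = 0x11223344u;
    return u.words[0];
}
```

On a little-endian machine such as x86 the first slot holds the low half, so a member called hiword placed first in the struct would actually see the low word there.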

The nameless unions inside structures or classes can be used this way,
Code:

struct Something{
    char a;
    union{
        __u16 words[2];
        unsigned char bytes[4];
    };
};

In this case "words" and "bytes" will be accessible as members of the structure, but they will be placed at the same memory location.

In practice you will need unions only if you are dealing with some binary data that is located in memory and is created by some external program that can't be (for some reason) currently modified. If you have a raw block of data you can use unions (in certain cases) to provide several ways to access its content.

Another example/way to use unions: imagine you are developing a script language where a variable can be anything (with certain restrictions, of course) 4 bytes long: pointer (32-bit system), float, dword. If you have to implement a structure/class for that variable (and your program will need several hundred thousand of those variables ;)) you'll probably use unions to save some memory.
Code:

enum VariableType {Dword, Float, Pointer, None};

struct ScriptVariable{
    VariableType variableType;
    union{
        __u32 dwordValue;
        float floatValue;
        void* pointerValue;
    };   
};
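To see the memory saving, the union-based structure can be compared with a version that keeps all three values as separate members. This is a sketch only: ScriptVariableNoUnion and make_float are hypothetical, and std::uint32_t stands in for __u32.

```cpp
#include <cstdint>

enum VariableType { Dword, Float, Pointer, None };

// Union-based layout: the three possible values share one slot.
struct ScriptVariable {
    VariableType variableType;
    union {
        std::uint32_t dwordValue;
        float         floatValue;
        void*         pointerValue;
    };
};

// Same information without a union: every value gets its own slot.
struct ScriptVariableNoUnion {
    VariableType  variableType;
    std::uint32_t dwordValue;
    float         floatValue;
    void*         pointerValue;
};

static_assert(sizeof(ScriptVariable) < sizeof(ScriptVariableNoUnion),
              "the union version is smaller");

// Build a float-typed variable; the tag records which member is valid.
ScriptVariable make_float(float f) {
    ScriptVariable v;
    v.variableType = Float;
    v.floatValue = f;
    return v;
}
```

The exact sizes depend on the platform, but multiplied by a few hundred thousand variables the difference adds up.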

I hope this wasn't too difficult and will be useful.

vargadanis 04-16-2007 10:35 AM

Ahm... Wow.. I didn't expect that you would actually take the time to explain that to me in such great detail. I am very impressed. Thanx guys...

I have a question in connection with these:

Quote:

One example is to provide several names/access methods for the same data. An example:
Code:

#include <linux/types.h>
union dword_union{
unsigned char bytes[4];
__u16 words[2];
__u32 dword;
};

using that union you can access dword (32bit integer) as a whole using "dword" field, as an array of two words (16bit integers) using "words" field, and as an array of 4 bytes, using "bytes" field.
What are those __u16 and __u32 things? I have never seen them before.

Thanx one more time

ErV 04-16-2007 11:44 PM

Quote:

Originally Posted by vargadanis
What are those __u16 and __u32 things? I have never seen them before.

Thanx one more time

__u16 is an unsigned integer 16bit long.
__u32 is an unsigned integer 32bit long.
Those are non-standard types (declared in <linux/types.h>); they should exist in other distributions, but I can't guarantee that. In fact I've used them only because I've just recently migrated from Windows and still do not know if there is some other "DWORD" type that is guaranteed to be 4 bytes long on any platform, and if there is some "WORD" type that is guaranteed to be 2 bytes long on any platform. I found __u16 and __u32 in the Linux include directory while browsing the svgalib sources. They looked like a good "DWORD" and "WORD" replacement, so I just took them. I don't know for sure if those types are standard or not, and if they are portable or not.

dmail 04-17-2007 04:40 AM

Have a look at stdint.h, it provides types which are more portable, int*_t and uint*_t.
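As a rough illustration, the earlier dword union could be written with those types like this. PortableDword and portable_dword_size are hypothetical names, and <cstdint> is the C++11 spelling of the header; with an older compiler, <stdint.h> and the unqualified names serve the same purpose.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

union PortableDword {
    std::uint8_t  bytes[4];
    std::uint16_t words[2];
    std::uint32_t dword;
};

// The fixed-width types have exactly the advertised number of bits.
static_assert(sizeof(std::uint16_t) * CHAR_BIT == 16, "uint16_t is 16 bits wide");
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits wide");
// All three views cover the same storage.
static_assert(sizeof(PortableDword) == sizeof(std::uint32_t), "views overlap exactly");

// Size of the union in bytes, for inspection at runtime.
std::size_t portable_dword_size() { return sizeof(PortableDword); }
```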

vargadanis 04-18-2007 08:43 PM

Once you have compiled the app with static libs, then no matter what's on the other computer, it should run, is that right? A bigger program, maybe slower, but more compatible.

ta0kira 04-19-2007 11:24 AM

Quote:

Originally Posted by taylor_venable
Unions are a hold-over from the days when memory was very expensive. Nowadays, we're practically swimming in it (unless you're running Vista) so there's not a lot of reason to use them. Although I've hardly ever used them in practice, unions can be handy for a certain kind of type polymorphism, where a variable name can be bound to data of different types. This is an unsafe feature in C and C++ because keeping track of what type of data is actually stored is up to the programmer.

I think the best use of unions is in option/flag logic with command line arguments. Take this example:
Code:

#include <iostream>
#include <string>

//'packed' attribute for use with gcc
#define PACKED_OPTION( option ) Option option __attribute__ ((packed))

typedef unsigned int Option;

union AllOptions {
        //all options___________________________________________________________
        Option all;

        //by group______________________________________________________________
        struct {

                //group one_____________________________________________________
                union { PACKED_OPTION(group_one:3);
                        struct {
                                PACKED_OPTION(group_one_A:1);
                                PACKED_OPTION(group_one_B:1);
                                PACKED_OPTION(group_one_C:1); }; };

                //group two_____________________________________________________
                union { PACKED_OPTION(group_two:3);
                        struct {
                                PACKED_OPTION(group_two_D:1);
                                PACKED_OPTION(group_two_E:1);
                                PACKED_OPTION(group_two_F:1); }; };
}; };


int main(int argc, const char *argv[])
{
        AllOptions options;
        options.all = 0;

        for (int I = 1; I < argc; I++)
        {
        if      (std::string("-A") == argv[ I ]) options.group_one_A = 1;
        else if (std::string("-B") == argv[ I ]) options.group_one_B = 1;
        else if (std::string("-C") == argv[ I ]) options.group_one_C = 1;
        else if (std::string("-D") == argv[ I ]) options.group_two_D = 1;
        else if (std::string("-E") == argv[ I ]) options.group_two_E = 1;
        else if (std::string("-F") == argv[ I ]) options.group_two_F = 1;
        }

        if (!options.all)      std::cout << "must use at least one option!\n";
        if (!options.group_one) std::cout << "must use A, B, or C!\n";
        if (!options.group_two) std::cout << "must use D, E, or F!\n";
       
        if (options.group_one && options.group_two) std::cout << "DONE!\n";

        return 0;
}

Using 'bool' instead of a union here would significantly increase the number of boolean operations, making the code more difficult to maintain and more prone to error. Of course, when creating complex embedded unions and structs you need to account for unions starting and stopping on byte boundaries.
ta0kira

vargadanis 04-19-2007 12:02 PM

Ahm... OK... I am quite into programming but this is too high for me. :confused:

Thanx though..

vargadanis 04-19-2007 12:12 PM

Quote:

Originally Posted by ErV
__u16 is an unsigned integer 16bit long.
__u32 is an unsigned integer 32bit long.
Those are non-standard types (declared in <linux/types.h>); they should exist in other distributions, but I can't guarantee that. In fact I've used them only because I've just recently migrated from Windows and still do not know if there is some other "DWORD" type that is guaranteed to be 4 bytes long on any platform, and if there is some "WORD" type that is guaranteed to be 2 bytes long on any platform. I found __u16 and __u32 in the Linux include directory while browsing the svgalib sources. They looked like a good "DWORD" and "WORD" replacement, so I just took them. I don't know for sure if those types are standard or not, and if they are portable or not.

Well it seems kind of standard for me because I found it on the port of GCC to windows too. It is sys/types.h but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.

ta0kira 04-19-2007 12:23 PM

Sorry, I was just demonstrating to taylor_venable that unions are quite useful despite having loads of memory. Basically what the union in my other post does is this:
Code:

union:
  all:     [all-------------------------------------------]
  groups:  [one----]               [two----]
  options: [a][b][c]               [d][e][f]

memory:
  byte:    [0                     ][1                     ]
  bit:     [0][1][2][3][4][5][6][7][0][1][2][3][4][5][6][7]

This allows abstraction of options on the all, groups, and options levels. That in turn allows checking a group for true rather than ||'ing all of the options in that group, etc.
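The same "check a whole group at once" idea can also be had without bit-fields or the gcc-specific packed attribute, by using plain bit masks. This is a different technique from the union above, shown only for comparison; every name in it (OptA through OptF, GroupOne, GroupTwo, parse_options) is made up for the sketch.

```cpp
#include <string>
#include <vector>

// One bit per option.
const unsigned OptA = 1u << 0;
const unsigned OptB = 1u << 1;
const unsigned OptC = 1u << 2;
const unsigned OptD = 1u << 3;
const unsigned OptE = 1u << 4;
const unsigned OptF = 1u << 5;

// A group is just the OR of its options.
const unsigned GroupOne = OptA | OptB | OptC;
const unsigned GroupTwo = OptD | OptE | OptF;

// Turn a list of arguments such as "-A" or "-E" into a single mask.
unsigned parse_options(const std::vector<std::string>& args) {
    unsigned all = 0;
    for (const std::string& arg : args) {
        if      (arg == "-A") all |= OptA;
        else if (arg == "-B") all |= OptB;
        else if (arg == "-C") all |= OptC;
        else if (arg == "-D") all |= OptD;
        else if (arg == "-E") all |= OptE;
        else if (arg == "-F") all |= OptF;
    }
    return all;
}
```

Testing `mask & GroupOne` then plays the role of `options.group_one`, and `mask == 0` the role of `!options.all`.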
ta0kira

dmail 04-19-2007 03:14 PM

Quote:

Originally Posted by vargadanis
Well it seems kind of standard for me

Standard to you?
Quote:

because I found it on the port of GCC to windows too. It is sys/types.h
Hmm so it's different then :)
Quote:

but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.
I don't.

http://www.xml.com/ldd/chapter/book/ch10.html
Quote:

It's important to remember that these types are Linux specific, and using them hinders porting software to other Unix flavors. Systems with recent compilers will support the C99-standard types, such as uint8_t and uint32_t; when possible, those types should be used in favor of the Linux-specific variety

vargadanis 04-19-2007 07:18 PM

Yeah, whatever...
Thanx for correcting me. It is always nice to know the accurate things.

ErV 04-19-2007 11:49 PM

Quote:

Originally Posted by vargadanis
Well it seems kind of standard for me because I found it on the port of GCC to windows too. It is sys/types.h but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.

Those types aren't placed in sys/types.h. I'm not sure about it, but they are defined somewhere deep in the Linux kernel headers. The "__" prefix clearly indicates that they are not standard. By the way, they are used to typedef the uint8_t types in linux/types.h. So I think it is better not to use __u8, __u16, etc. I just used them as an example (I hadn't found uint8_t at that time).

