LinuxQuestions.org
Old 04-15-2007, 03:40 PM   #1
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Rep: Reputation: 30
C++ Unions to understand - question


Hi!

I have begun learning C++, and it is going quite well. I have understood many important parts of it and some concepts, but there is one slightly more complicated thing (unions) that I cannot "digest".

Correct me if I am wrong:
With a union you can access the same memory location through different data types. So it is like a pointer: you can point to a certain memory location and read the data there as a byte or a character or whatever. Is that correct so far?

So if I read in a character from the console with cin and store it in a variable like char c, then I should be able to access it as a byte if I point or refer to it. Now the question is, how can I use unions? I have read some examples, but I couldn't find any practical ones, so I cannot use them, even though it seems to be a handy feature. I'd be glad if someone gave me a good example (a simple one please, not one embedded in a struct or similar).

Thanx
 
Old 04-15-2007, 04:20 PM   #2
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
Quote:
Originally Posted by vargadanis
Now the question would be, how can I use the unions. I read some examples but I couldn't find any practical examples, therefore I cannot use it, however it is a handy feature.
Unions are a hold-over from the days when memory was very expensive. Nowadays, we're practically swimming in it (unless you're running Vista) so there's not a lot of reason to use them. Although I've hardly ever used them in practice, unions can be handy for a certain kind of type polymorphism, where a variable name can be bound to data of different types. This is an unsafe feature in C and C++ because keeping track of what type of data is actually stored is up to the programmer. That's why you typically find them as members in structs; usually another member of the struct tells you what type of data the union value is holding.

A useful example of how to use one is when you want to store some type of data into a variable, say the current temperature, but want the user to be able to decide what kind of precision to use. Imagine the user selects to just store the temperature as an integer; you can set a flag in your program to say it's stored as an int, and then store it into the int field of the union. Then later on say the user changes their mind and wants to store it as a float; you set the flag to indicate storage as a float, and assign to the float value of the union. Note that you can still try to get the integer value of the union, but it won't be correct -- it will be the data of the float interpreted as an int, which is usually not what you want.
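
The temperature scenario above could be sketched roughly like this. The names `TempKind`, `Temperature`, `store_int`, `store_float` and `read_degrees` are made up for illustration; the point is only that the flag is set on every write and checked on every read.

```cpp
// Hypothetical names throughout; a sketch of a "flag plus union" pair.
enum class TempKind { AsInt, AsFloat };

struct Temperature {
    TempKind kind;   // the flag: records which union member was written last
    union {
        int   i;
        float f;
    } value;
};

// Store the temperature as a whole number and remember that we did so.
Temperature store_int(int degrees) {
    Temperature t;
    t.kind = TempKind::AsInt;
    t.value.i = degrees;
    return t;
}

// Store the temperature with fractional precision and update the flag.
Temperature store_float(float degrees) {
    Temperature t;
    t.kind = TempKind::AsFloat;
    t.value.f = degrees;
    return t;
}

// Consult the flag first, so only the member that was last written is read.
double read_degrees(const Temperature& t) {
    return t.kind == TempKind::AsInt ? static_cast<double>(t.value.i)
                                     : static_cast<double>(t.value.f);
}
```

Calling `read_degrees(store_float(21.5f))` goes through the float member, while `read_degrees(store_int(21))` goes through the int member; nothing ever reads the member that was not written.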

A union in C and C++ has a storage size at least as large as its largest member; the compiler may add padding to satisfy alignment.
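
A small sketch of the size rule, assuming a made-up union called `Padded`. On common platforms where `int` is 4 bytes the union below usually comes out at 8 bytes, not 5, because the size is rounded up to the alignment of `int`; the exact number is platform-dependent, so the checks only state what should hold everywhere.

```cpp
#include <cstddef>

// Hypothetical example union: a 5-byte array sharing storage with an int.
union Padded {
    char text[5];   // 5 bytes; the largest member where int is 4 bytes
    int  number;    // usually the member with the strictest alignment
};

// The union must be able to hold the 5-byte array...
static_assert(sizeof(Padded) >= sizeof(char[5]),
              "a union is at least as large as each of its members");
// ...and its size is always a multiple of its own alignment.
static_assert(sizeof(Padded) % alignof(Padded) == 0,
              "size is rounded up to the alignment");

// Helper so the size can be printed or inspected at run time.
std::size_t padded_size() { return sizeof(Padded); }
```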

Other languages (like Ada and Standard ML) have, in contrast to C and C++, safe unions. This means that the variable that represents the union carries its type information with it at runtime, and doesn't allow itself to be coerced into a type it's not holding. This feature is often used along with type checking to take different actions depending on the type of the variable. A typical example is a binary tree. Each node on the tree is either (a) a leaf holding some value, or (b) a value with a left child node and a right child node. (a) and (b) are different types; the general node is a union of those two types. A search function can take a general node and examine its type. If it's a leaf, check the value and return whether it is equal to the search key. If it's an interior node, compare its value to the search key. If the values are the same, return true. If the values are not the same, recurse and pass either the left or right child node as the general node to the search function.
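
For comparison, newer C++ (C++17, which is later than this thread) has `std::variant`, which behaves much like the safe unions described above. Below is a rough sketch of the tree search using it. The names `Leaf`, `Interior`, `Node`, `search`, `make_leaf` and `make_interior` are invented here, and the sketch assumes the tree is ordered so that smaller keys are on the left.

```cpp
#include <memory>
#include <utility>
#include <variant>

struct Node;  // forward declaration so Interior can own child nodes

struct Leaf {
    int value;
};

struct Interior {
    int value;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// The "general node": holds either a Leaf or an Interior and knows which.
struct Node {
    std::variant<Leaf, Interior> data;
};

bool search(const Node& node, int key) {
    // get_if returns nullptr unless the variant currently holds a Leaf.
    if (const Leaf* leaf = std::get_if<Leaf>(&node.data)) {
        return leaf->value == key;
    }
    const Interior& inner = std::get<Interior>(node.data);
    if (inner.value == key) return true;
    // Assumed ordering: smaller keys live in the left subtree.
    const Node* next = key < inner.value ? inner.left.get() : inner.right.get();
    return next != nullptr && search(*next, key);
}

std::unique_ptr<Node> make_leaf(int v) {
    return std::make_unique<Node>(Node{Leaf{v}});
}

std::unique_ptr<Node> make_interior(int v, std::unique_ptr<Node> l,
                                    std::unique_ptr<Node> r) {
    return std::make_unique<Node>(Node{Interior{v, std::move(l), std::move(r)}});
}
```

Asking for the wrong alternative with `std::get` throws `std::bad_variant_access` instead of silently reinterpreting the bytes, which is the safety property a plain union lacks.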

But unfortunately cool features like these can only be used in languages supporting safe unions. In C and C++, my advice is to stay away from them whenever possible.
 
Old 04-15-2007, 06:06 PM   #3
dmail
Member
 
Registered: Oct 2005
Posts: 970

Rep: Reputation: Disabled
Quote:
I read some examples but I couldn't find any practical examples, therefore I cannot use it, however it is a handy feature. I'd be glad if someone gave ma a good example (simple one pls that is not implemented into a struct or similar).
Here is an example, a straight copy and paste from one of my projects, which is used to serialise the members of a class so they can be transferred over a network. The serialisation takes a value and shifts it into a byte (unsigned char) buffer. As shifting a signed type is not well defined, and to make it easy to write one function which can take a type of 8, 16, 32 or 64 bits, I use this structure.
Code:
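// GNL::byte comes from the project this was copied from; the text above
// describes it as an unsigned char.
template<typename T>
struct Type2bytes
{
	Type2bytes(T const &value):whole(value){}
	Type2bytes():whole(T()){}
	union
	{
		T whole;                    // the value as its own type
		GNL::byte bytes[sizeof(T)]; // the same storage viewed byte by byte
	};
};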
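
One caveat with the structure above: ISO C++ only guarantees reading the union member that was written last, so reading `bytes` after writing `whole` relies on the compiler tolerating it (GCC documents this as supported). A sketch of a portable alternative using `std::memcpy` follows; `to_bytes` and `from_bytes` are hypothetical helper names and are not part of the project quoted above.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copy the object representation of a value into a byte array.
template <typename T>
std::array<unsigned char, sizeof(T)> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::array<unsigned char, sizeof(T)> out{};
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Rebuild a value from bytes produced by to_bytes on the same platform.
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& in) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, in.data(), sizeof(T));
    return value;
}
```

The byte order in the array is still whatever the machine uses, so data sent over a network would additionally need an agreed byte order; this sketch does not handle that.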
 
Old 04-15-2007, 09:23 PM   #4
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by vargadanis
Correct if I am wrong:
with a union you can access the same memory allocation with different data types. So it is like a pointer. You can point to a certain memory location and read the data there as byte or character or whatever. Is that correct so far?
Mostly correct, but not in the details. A union is not a pointer; it's a datatype with a certain size (the size of a union is the size of its largest component). You do not need to call "malloc" or "new" to allocate storage for a union.


All the examples will probably work correctly only if structures are 1-byte aligned.

Quote:
Originally Posted by vargadanis
So if I read in a character from console with cin and store it in a var like char c, that I should be able to access it as an byte if I point of reffer to it.
Not exactly. (You can access c as a byte using casting, but that's another story) You must first define a union like this:
Code:
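#include <iostream>

typedef unsigned char byte; // C++ has no built-in type named "byte", so define one

int main(int argc, char** argv){
    union{
        char a;
        byte b;
    } v;
    std::cin >> v.a;
    // cast to int to see the numeric value; streaming an unsigned char
    // directly would just print the character again
    std::cout << static_cast<int>(v.b) << '\n';
}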
So a union looks like a struct, but its fields are placed at the same memory location. In this example v.a and v.b are placed at the same address.
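
The "same address" claim can be checked directly. A minimal sketch, with `CharOrByte` and `members_share_address` as made-up names:

```cpp
// Hypothetical union mirroring the char/byte example.
union CharOrByte {
    char          a;
    unsigned char b;
};

// True when both members, and the union object itself, start at one address.
bool members_share_address(const CharOrByte& v) {
    const void* pa = &v.a;
    const void* pb = &v.b;
    return pa == pb && pa == static_cast<const void*>(&v);
}
```

Because the two members overlap completely, `sizeof(CharOrByte)` is 1 rather than 2.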

Quote:
Now the question would be, how can I use the unions.
One use is to provide several names (access methods) for the same data. An example:
Code:
#include <linux/types.h>
union dword_union{
    unsigned char bytes[4];
    __u16 words[2];
    __u32 dword;
};
Using that union you can access the dword (32-bit integer) as a whole using the "dword" field, as an array of two words (16-bit integers) using the "words" field, and as an array of 4 bytes using the "bytes" field. A more complex (and probably non-portable) example could be:
Code:
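#include <linux/types.h>
union dword_union{
    unsigned char bytes[4];
    struct{ // nameless struct: a compiler extension, not standard C++
        __u16 hiword; // which half really is the high word depends on byte order
        __u16 loword;
    };
    __u16 words[2];
    __u32 dword;
};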
Nameless unions inside structures or classes can be used this way:
Code:
struct Something{
    char a;
    union{
        __u16 words[2];
        unsigned char bytes[4];
    };
};
In this case "words" and "bytes" will be accessible as members of the structure, but they will be placed at the same memory location.
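
Here is a rough sketch of the dword/words/bytes idea using the standard fixed-width types instead of `__u16`/`__u32`. `DwordView`, `low_byte_first` and `rebuild_from_bytes` are invented names. Two assumptions are worth stating loudly: reading a member other than the one written last is not guaranteed by ISO C++ (GCC documents it as supported, and other mainstream compilers behave the same way in practice), and which byte comes first depends on the machine's byte order.

```cpp
#include <cstdint>

// Hypothetical union: one 32-bit value seen as a whole, as words, or as bytes.
union DwordView {
    std::uint32_t dword;
    std::uint16_t words[2];
    unsigned char bytes[4];
};

// Relies on union type punning (compiler-supported, not ISO-guaranteed).
// Returns true on machines that store the least significant byte first.
bool low_byte_first() {
    DwordView v;
    v.dword = 1u;
    return v.bytes[0] == 1;
}

// Reassemble the 32-bit value from its bytes, honouring the byte order
// detected above. Widen each byte first so the shifts are well defined.
std::uint32_t rebuild_from_bytes(const DwordView& v) {
    const std::uint32_t b0 = v.bytes[0], b1 = v.bytes[1],
                        b2 = v.bytes[2], b3 = v.bytes[3];
    return low_byte_first() ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
                            : (b3 | (b2 << 8) | (b1 << 16) | (b0 << 24));
}
```

If the byte layout matters (files, network packets), it is usually safer to build the value with shifts as in `rebuild_from_bytes` than to trust the in-memory layout.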

In practice you will need unions only if you are dealing with some binary data that is located in memory and was created by some external program that can't (for some reason) currently be modified. If you have a raw block of data you can use unions (in certain cases) to provide several ways to access its content.

Another way to use unions: imagine you are developing a scripting language where a variable can be anything (with certain restrictions, of course) that is 4 bytes long: a pointer (on a 32-bit system), a float, or a dword. If you have to implement a structure/class for that variable (and your program will need several hundred thousand of those variables), you'll probably use a union to save some memory.
Code:
enum VariableType {Dword, Float, Pointer, None};

struct ScriptVariable{
    VariableType variableType;
    union{
        __u32 dwordValue;
        float floatValue;
        void* pointerValue;
    };    
};
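
To get a feel for the memory saving, the structure can be compared with a version that stores all three fields side by side. This is only a sketch: `ScriptVariableNoUnion` and `bytes_saved_per_variable` are made-up names, `std::uint32_t` stands in for `__u32`, and the exact sizes depend on the platform (on a typical 64-bit system it is likely 16 bytes against 24).

```cpp
#include <cstddef>
#include <cstdint>

enum VariableType { Dword, Float, Pointer, None };

// The union version: the three values share one slot.
struct ScriptVariable {
    VariableType variableType;
    union {
        std::uint32_t dwordValue;
        float         floatValue;
        void*         pointerValue;
    };
};

// Hypothetical comparison type: every value gets its own slot.
struct ScriptVariableNoUnion {
    VariableType  variableType;
    std::uint32_t dwordValue;
    float         floatValue;
    void*         pointerValue;
};

// How many bytes each variable saves by sharing storage.
std::size_t bytes_saved_per_variable() {
    return sizeof(ScriptVariableNoUnion) - sizeof(ScriptVariable);
}
```

Multiplied by several hundred thousand variables, a saving of a few bytes each adds up to megabytes.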
I hope this wasn't too difficult and will be useful.

Last edited by ErV; 04-15-2007 at 09:45 PM.
 
Old 04-16-2007, 11:35 AM   #5
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Ahm... Wow.. I didn't expect that you would actually take the time to explain that to me in such great detail. I am very impressed. Thanx guys...

I have a question in connection with these:

Quote:
One use is to provide several names (access methods) for the same data. An example:
Code:
#include <linux/types.h> 
union dword_union{ 
unsigned char bytes[4];
__u16 words[2]; 
__u32 dword; 
};
using that union you can access dword (32bit integer) as a whole using "dword" field, as an array of two words (16bit integers) using "words" field, and as an array of 4 bytes, using "bytes" field.
What are those __u16 and __u32 things? I have never seen them before.

Thanx one more time
 
Old 04-17-2007, 12:44 AM   #6
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by vargadanis
What are those __u16 and __u32 things? I have never seen them before.

Thanx one more time
__u16 is an unsigned integer 16bit long.
__u32 is an unsigned integer 32bit long.
Those are non-standard types (declared in <linux/types.h>); they should exist in other distributions, but I can't guarantee that. In fact I've used them only because I've just recently migrated from Windows and still do not know if there is some other "DWORD" type that is guaranteed to be 4 bytes long on any platform, and if there is some "WORD" type that is guaranteed to be 2 bytes long on any platform. I found __u16 and __u32 in the Linux include directory while browsing the svgalib sources. They looked like a good "DWORD" and "WORD" replacement, so I just took them. I don't know for sure whether those types are standard or not, or whether they are portable or not.
 
Old 04-17-2007, 05:40 AM   #7
dmail
Member
 
Registered: Oct 2005
Posts: 970

Rep: Reputation: Disabled
Have a look at stdint.h, it provides types which are more portable, int*_t and uint*_t.
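
A short sketch of what that looks like in C++, where the same types are also available through `<cstdint>` (standard since C++11, on platforms that provide exact-width types). The aliases `Word` and `Dword` and the helper `make_dword` are hypothetical, chosen only to mirror the Windows names mentioned earlier.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical aliases standing in for the Windows WORD/DWORD names.
typedef std::uint16_t Word;
typedef std::uint32_t Dword;

// Exact-width types have exactly the advertised number of bits.
static_assert(sizeof(Word)  * CHAR_BIT == 16, "Word is 16 bits wide");
static_assert(sizeof(Dword) * CHAR_BIT == 32, "Dword is 32 bits wide");

// Combine two 16-bit halves into one 32-bit value without any union.
Dword make_dword(Word high, Word low) {
    return (static_cast<Dword>(high) << 16) | low;
}
```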
 
Old 04-18-2007, 09:43 PM   #8
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Once you have compiled the app with static libs, then no matter what's on the other computer, it should run, is that right? A bigger program, maybe slower, but more compatible.
 
Old 04-19-2007, 12:24 PM   #9
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by taylor_venable
Unions are a hold-over from the days when memory was very expensive. Nowadays, we're practically swimming in it (unless you're running Vista) so there's not a lot of reason to use them. Although I've hardly ever used them in practice, unions can be handy for a certain kind of type polymorphism, where a variable name can be bound to data of different types. This is an unsafe feature in C and C++ because keeping track of what type of data is actually stored is up to the programmer.
I think the best use of unions is in option/flag logic with command line arguments. Take this example:
Code:
#include <iostream>
#include <string>

//'packed' attribute for use with gcc
#define PACKED_OPTION( option ) Option option __attribute__ ((packed))

typedef unsigned int Option;

union AllOptions {
	//all options___________________________________________________________
	Option all;

	//by group______________________________________________________________
	struct {

		//group one_____________________________________________________
		union { PACKED_OPTION(group_one:3);
			struct {
				PACKED_OPTION(group_one_A:1);
				PACKED_OPTION(group_one_B:1);
				PACKED_OPTION(group_one_C:1); }; };

		//group two_____________________________________________________
		union { PACKED_OPTION(group_two:3);
			struct {
				PACKED_OPTION(group_two_D:1);
				PACKED_OPTION(group_two_E:1);
				PACKED_OPTION(group_two_F:1); }; };
}; }; 


int main(int argc, const char *argv[])
{
	AllOptions options;
	options.all = 0;

	for (int I = 1; I < argc; I++)
	{
	if      (std::string("-A") == argv[ I ]) options.group_one_A = 1;
	else if (std::string("-B") == argv[ I ]) options.group_one_B = 1;
	else if (std::string("-C") == argv[ I ]) options.group_one_C = 1;
	else if (std::string("-D") == argv[ I ]) options.group_two_D = 1;
	else if (std::string("-E") == argv[ I ]) options.group_two_E = 1;
	else if (std::string("-F") == argv[ I ]) options.group_two_F = 1;
	}

	if (!options.all)       std::cout << "must use at least one option!\n";
	if (!options.group_one) std::cout << "must use A, B, or C!\n";
	if (!options.group_two) std::cout << "must use D, E, or F!\n";
	
	if (options.group_one && options.group_two) std::cout << "DONE!\n";

	return 0;
}
Using 'bool' variables instead of a union here would significantly increase the number of boolean operations, making the code more difficult to maintain and more prone to error. Of course, when creating complex nested unions and structs you need to account for each union starting and ending on a byte boundary.
ta0kira

Last edited by ta0kira; 04-19-2007 at 12:31 PM.
 
Old 04-19-2007, 01:02 PM   #10
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Ahm... OK... I am quite into programming but this is over my head.

Thanx though..
 
Old 04-19-2007, 01:12 PM   #11
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Quote:
Originally Posted by ErV
__u16 is an unsigned integer 16bit long.
__u32 is an unsigned integer 32bit long.
Those are non-standard types (declared in <linux/types.h>); they should exist in other distributions, but I can't guarantee that. In fact I've used them only because I've just recently migrated from Windows and still do not know if there is some other "DWORD" type that is guaranteed to be 4 bytes long on any platform, and if there is some "WORD" type that is guaranteed to be 2 bytes long on any platform. I found __u16 and __u32 in the Linux include directory while browsing the svgalib sources. They looked like a good "DWORD" and "WORD" replacement, so I just took them. I don't know for sure whether those types are standard or not, or whether they are portable or not.
Well it seems kind of standard for me because I found it on the port of GCC to windows too. It is sys/types.h but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.
 
Old 04-19-2007, 01:23 PM   #12
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Sorry, I was just demonstrating to taylor_venable that unions are quite useful despite having loads of memory. Basically what the union in my other post does is this:
Code:
union:
  all:		[all-------------------------------------------]
  groups:	[one----]               [two----]
  options:	[a][b][c]               [d][e][f]

memory:
  byte:		[0                     ][1                     ]
  bit:		[0][1][2][3][4][5][6][7][0][1][2][3][4][5][6][7]
This allows abstraction of options on the all, groups, and options levels. That in turn allows checking a group for true rather than ||'ing all of the options in that group, etc.
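
The same three levels (all, groups, single options) can also be had with plain bit masks instead of a union of bit-fields. This is a different technique from the one in the post, shown only for comparison; every name below (`kOptA`, `kGroupOne`, `parse_options`, `group_used`, and so on) is made up, and the bits are packed next to each other rather than one group per byte.

```cpp
#include <string>

// One bit per option.
constexpr unsigned kOptA = 1u << 0, kOptB = 1u << 1, kOptC = 1u << 2;
constexpr unsigned kOptD = 1u << 3, kOptE = 1u << 4, kOptF = 1u << 5;

// A group is simply the OR of its options.
constexpr unsigned kGroupOne = kOptA | kOptB | kOptC;
constexpr unsigned kGroupTwo = kOptD | kOptE | kOptF;

// Collect the recognised flags from the command line into one word.
unsigned parse_options(int argc, const char* const argv[]) {
    unsigned all = 0;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if      (arg == "-A") all |= kOptA;
        else if (arg == "-B") all |= kOptB;
        else if (arg == "-C") all |= kOptC;
        else if (arg == "-D") all |= kOptD;
        else if (arg == "-E") all |= kOptE;
        else if (arg == "-F") all |= kOptF;
    }
    return all;
}

// True if at least one option from the given group was set.
bool group_used(unsigned all, unsigned group) {
    return (all & group) != 0;
}
```

Checking a whole group is then `group_used(all, kGroupOne)`, and "no options at all" is simply `all == 0`, which matches the checks in the union version.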
ta0kira
 
Old 04-19-2007, 04:14 PM   #13
dmail
Member
 
Registered: Oct 2005
Posts: 970

Rep: Reputation: Disabled
Quote:
Originally Posted by vargadanis
Well it seems kind of standard for me
Standard to you?
Quote:
because I found it on the port of GCC to windows too. It is sys/types.h
Hmm, so it's different then
Quote:
but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.
I don't.

http://www.xml.com/ldd/chapter/book/ch10.html
Quote:
It's important to remember that these types are Linux specific, and using them hinders porting software to other Unix flavors. Systems with recent compilers will support the C99-standard types, such as uint8_t and uint32_t; when possible, those types should be used in favor of the Linux-specific variety
 
Old 04-19-2007, 08:18 PM   #14
vargadanis
Member
 
Registered: Sep 2006
Posts: 248

Original Poster
Rep: Reputation: 30
Yeah, whatever...
Thanx for correcting me. It is always nice to know the accurate things.
 
Old 04-20-2007, 12:49 AM   #15
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by vargadanis
Well it seems kind of standard for me because I found it on the port of GCC to windows too. It is sys/types.h but has the same content. I use gcc 3.4.2 I think so it should be crossplatform.
Those types aren't placed in sys/types.h. I'm not sure about it, but they are defined somewhere deep in the Linux kernel headers. The "__" prefix clearly indicates that they are not standard. By the way, they are used to typedef the uint8_t types in linux/types.h, so I think it is better not to use __u8, __u16, etc. I only used them as an example (I hadn't found uint8_t at that time).
 
  


Tags
c++, explanation

All times are GMT -5.