LinuxQuestions.org
Old 04-14-2011, 12:58 PM   #1
Mansi_Jaiswal
LQ Newbie
 
Registered: May 2010
Posts: 2

Rep: Reputation: 0
Data type in C to hold TetraByte


Hi All,

Can you please help me in understanding c data type. I need a data type to hold TetraByte.
Even I want to know details of int64, long long, unsigned long long, and how they differ from each other.

Thankyou for your replies
 
Old 04-14-2011, 01:46 PM   #2
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 14,989

Rep: Reputation: 2673
Quote:
Originally Posted by Mansi_Jaiswal View Post
Hi All,
Can you please help me in understanding c data type. I need a data type to hold TetraByte.
Even I want to know details of int64, long long, unsigned long long, and how they differ from each other.

Thankyou for your replies
I assume you mean a terabyte. If you want to know the details, I'd suggest referencing your textbooks, or checking on Google. This:

http://en.wikipedia.org/wiki/Integer...ter_science%29

should get you started. Plenty more material out there you can easily find.

Last edited by TB0ne; 04-15-2011 at 09:48 AM.
 
1 members found this post helpful.
Old 04-14-2011, 01:55 PM   #3
johnsfine
Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,125

Rep: Reputation: 1119
I thought Tetra used that way meant 4.

Four bytes equals 32 bits. So the OP's question is too confused for me to answer.
 
1 members found this post helpful.
Old 04-14-2011, 02:40 PM   #4
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
Quote:
Originally Posted by Mansi_Jaiswal View Post
Hi All,

Can you please help me in understanding c data type. I need a data type to hold TetraByte.
Even I want to know details of int64, long long, unsigned long long, and how they differ from each other.
If you really want to store a terabyte of data in a C type, that's impossible. Even high-end computers only have a few gigabytes of RAM. You could write it to the hard drive, though.

Or am I somehow misunderstanding your question?

Last edited by MTK358; 04-14-2011 at 02:43 PM.
 
Old 04-14-2011, 02:49 PM   #5
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Hi -

C's "int" type will only map to the largest integer value your CPU will handle (for example, 4GB/unsigned for a 32-bit CPU).

What you're really interested in is "BigInt" - a C language library that can handle arbitrary precision.

Here's a link to the Gnu MP Bignum library, one of several good choices:

http://gmplib.org/
 
Old 04-14-2011, 03:04 PM   #6
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
Quote:
Originally Posted by paulsm4 View Post
C's "int" type <snip> 4GB
GB?

Another thing I wanted to add is that there are predefined integer types with a fixed number of bits, declared in <stdint.h>. For example "uint16_t" is an unsigned 16-bit integer, and "int32_t" is a signed 32-bit integer.

Last edited by MTK358; 04-14-2011 at 03:06 PM.
 
Old 04-14-2011, 03:05 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,000
Blog Entries: 11

Rep: Reputation: 893
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
 
Old 04-14-2011, 04:18 PM   #8
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
I apologize for this being almost off topic, but I just had to interject.

Quote:
Originally Posted by MTK358 View Post
If you really want to store a terabyte of data in a C type, that's impossible.
No, of course it's possible! There are actually off-the-shelf blades that can accommodate at least half a terabyte. Even standard servers can currently use up to 192GB of RAM.

Besides, you can always use a memory map instead of RAM to back the data structure, if you use a 64-bit platform. If you are running a 64-bit kernel, compile this (gcc -O3 -m64 -o tera filename.c) and run it (./tera) on a filesystem that supports large enough sparse files (like ext3, ext4, or XFS).
Code:
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>

#include <string.h>
#include <stdio.h>

int main(void)
{
    char const *const   filename = "tera.data";
    size_t const        size = 1024UL * 1024UL * 1024UL * 1024UL;
    int                 descriptor;

    unsigned char      *data;
    size_t              i;

    int                 result;

    /* Create the backing file. */
    do {
        descriptor = open(filename, O_RDWR | O_CREAT | O_EXCL, 0600);
    } while (descriptor == -1 && errno == EINTR);
    if (descriptor == -1) {
        char const *const   error = strerror(errno);
        fprintf(stderr, "Cannot create sparse mapping file %s: %s.\n", filename, error);
        return 1;
    }

    /* Make it large enough. */
    do {
        result = ftruncate(descriptor, (off_t)size);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        char const *const   error = strerror(errno);

        do {
            result = unlink(filename);
        } while (result == -1 && errno == EINTR);
        do {
            result = close(descriptor);
        } while (result == -1 && errno == EINTR);

        fprintf(stderr, "Cannot grow backing file %s large enough: %s.\n", filename, error);
        return 1;
    }

    /* Map it. */
    do {
        data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FILE | MAP_NORESERVE,
                    descriptor, (off_t)0);
    } while ((void *)data == MAP_FAILED && errno == EINTR);
    if ((void *)data == MAP_FAILED) {
        char const *const   error = strerror(errno);

        do {
            result = unlink(filename);
        } while (result == -1 && errno == EINTR);
        do {
            result = close(descriptor);
        } while (result == -1 && errno == EINTR);

        fprintf(stderr, "Memory mapping failed: %s.\n", error);
        return 1;
    }

    /* Notify of success. */
    fprintf(stderr, "Mapped %lu bytes at %p from file %s successfully.\n",
                    (unsigned long)size, (void *)data, filename);
    fflush(stderr);

    /* Touch a couple of bytes every 25 megabytes or so, across the whole map. */
    i = 0;
    while (i < size) {
        data[i] = 32 + (i & 63);

        i += 5; if (i >= size) break;

        data[i] = 64 + (i & 63);

        i += 25000000;
    }

    /* Tell it was successful. */
    fprintf(stderr, "The entire map is used (with holes of about 25 megabytes).\n");
    fflush(stderr);

    /* Tear down the mapping. */
    do {
        result = munmap(data, size);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        char const *const   error = strerror(errno);
        fprintf(stderr, "Unmapping error: %s.\n", error);
    }

    /* Remove the backing file. */
    do {
        result = unlink(filename);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        char const *const   error = strerror(errno);
        fprintf(stderr, "Error removing %s: %s.\n", filename, error);
    }

    /* Close the backing file descriptor. */
    do {
        result = close(descriptor);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        char const *const   error = strerror(errno);
        fprintf(stderr, "Error closing the backing file: %s.\n", error);
    }

    /* Done. */
    fprintf(stderr, "All done.\n");
    fflush(stderr);
    return 0;
}
For simplicity, I only used a terabyte-sized char array, instead of some fancier data structure, but that shouldn't matter. The example program even scribbles into the structure (every two dozen megabytes or so) just to show you it really is a terabyte-sized data structure, which is completely useable as if it was in RAM.

If you watch the directory you run it in, you'll notice a terabyte-sized file, tera.data, appears while the program is running. (You could actually unlink the file immediately after its creation, which is a good idea since the space taken up by the file is released when the program closes the descriptor, but I wanted you to be able to see the file while the program is running.) If you have a sensible filesystem, the file will be sparse: only the nonzero bytes (in I/O block sized chunks) are actually saved on disk. So you don't need a terabyte of free disk space to run the program; I think it uses < 50000 pages' worth, or about 200 MB with 4 KiB pages on x86_64, maximum.

The test program runs fine on my x86_64 with 4 GiB of RAM, but I'd expect it to run fine on any Linux machine running an x86_64 (or other 64-bit) kernel: the kernel will evict pages back to the file if/when needed. If there is a lot of free RAM, RAM will be used instead.

What is surprising is that most current scientific software that handles truly huge data files still does not take advantage of this method. (If anybody is interested, feel free to contact me.)

Last edited by Nominal Animal; 05-10-2011 at 08:56 PM. Reason: Fixed (result == -1) to ((void *)data == MAP_FAILED)
 
2 members found this post helpful.
Old 04-14-2011, 04:52 PM   #9
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by Mansi_Jaiswal View Post
Can you please help me in understanding c data type. I need a data type to hold TetraByte.
Even I want to know details of int64, long long, unsigned long long, and how they differ from each other.
Use #include <stdint.h> and either uint32_t or int32_t type, depending on whether you need unsigned or signed 32-bit integers.

You'll learn the details much better if you start by playing with the types first. Compile and run this via e.g. gcc -Wall -std=c99 -o test test.c && ./test:
Code:
#include <stdio.h>
#include <stdint.h>

typedef struct tetrabyte tetrabyte_t;
struct tetrabyte {
    union {
        uint32_t    u32;
        int32_t     i32;
        uint8_t     byte[4];
    } as;
};

int main(void)
{
    tetrabyte_t t;

    t.as.u32 = 0x12345678;

    printf("0x%08x = %u\n", (unsigned int)t.as.u32, (unsigned int)t.as.u32);
    printf("0x%02x 0x%02x 0x%02x 0x%02x\n",
           (int)t.as.byte[0], (int)t.as.byte[1], (int)t.as.byte[2], (int)t.as.byte[3]);

    return 0;
}
Running this on an Intel machine (or on any "little-endian" processor) you will get
Code:
0x12345678 = 305419896
0x78 0x56 0x34 0x12
but on "big-endian" processors like POWER processors, you will get
Code:
0x12345678 = 305419896
0x12 0x34 0x56 0x78
Not only does the byte order depend on the machine architecture, but the sizes of the int, long, and long long types differ, too. Only the size-specific types defined in stdint.h (intN_t, uintN_t, et cetera) have known fixed sizes.

Most Linux architectures are either ILP32 or LP64 type. For ILP32, ints, longs, and pointers are 32-bit. For LP64, ints are 32-bit, but longs and pointers are 64-bit.

Most architectures also provide two or more floating point types. float is usually the same as the IEEE 754-2008 Binary32 type, and double is usually the same as Binary64, with long double being a type whose format is not standardized across architectures: close to double, but usually providing more precision. These types are affected by compiler options, though.

(In practice even floating-point data types are standard enough to use in a machine-architecture agnostic way, if you account for the floating-point types possibly having a different endianness to integer data. I embed a prototype floating-point value in my binary data files, chosen so that each of its bytes is unique. That allows easy detection of byte order; comparing the floating-point value to the expected prototype value then tells whether or not the machine representations are compatible enough to use. A similar mechanism works well for integers, too. For 32-bit values, only byte orders 1234, 4321, 3412 and 2143 are known to have been used.)

There is of course a lot of information available on the web on the topic at hand. I'd recommend checking out at least

If you have specific questions you need help with, please feel free to ask. But please, do first at least glance at the above links.

Hope this helps.
 
Old 04-14-2011, 08:25 PM   #10
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
Hi, again
Quote:
Can you please help me in understanding c data type. I need a data type to hold TeraByte.
I interpret this to mean you want to hold the number "1024*1024*1024*1024" in a C variable.

You can do this in a "uint64_t" (aka "unsigned long long" on most platforms, aka "quadword").

But for arbitrarily large values, you'll need to use "BigNum". For example, the Gnu MP Bignum library I spoke of earlier.

Quote:
Even I want to know details of int64, long long, unsigned long long, and how they differ from each other.
I think you've already got a reasonable explanation. But please look at this link, too:

Integer (computer science)

'Hope that helps .. PSM

Last edited by paulsm4; 04-14-2011 at 08:27 PM.
 
Old 04-15-2011, 12:23 AM   #11
Mansi_Jaiswal
LQ Newbie
 
Registered: May 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Thank you all for your responses, it's a great help
 
Old 04-15-2011, 09:50 AM   #12
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 714
If your problem is solved, mark the thread as solved.
 
1 members found this post helpful.
Old 04-15-2011, 10:55 AM   #13
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,654

Rep: Reputation: 1965
@Nominal (and slightly OT) - my C is rusty and clearly not in the same league as yours (kudos) but may I ask a question that is bugging me, even though it is not related to whether the code demonstrates the issue? In your 'Map it' section you seem to test the value of result early on but I am not following where it was set that the 'if' would now be of value:
Code:
do {
        data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FILE | MAP_NORESERVE,
                descriptor, (off_t)0);
    } while ((void *)data == MAP_FAILED && errno == EINTR);
    if (result == -1) {
From what I understand, all previous assignments of result have already been tested, and as they all lead to an error message being displayed, they are effectively 'used' at that point. So my question is simply: is this a typo, and if not, what is the relevant setting of this variable?
 
Old 05-10-2011, 09:03 PM   #14
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Hi, grail, and sorry for not responding earlier.
Quote:
Originally Posted by grail View Post
In your 'Map it' section you seem to test the value of result early on but I am not following where it was set that the 'if' would now be of value:
Code:
    do {
        data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FILE | MAP_NORESERVE,
                descriptor, (off_t)0);
    } while ((void *)data == MAP_FAILED && errno == EINTR);
    if (result == -1) {
It's a bug. It should be
Code:
if ((void *)data == MAP_FAILED) {
as that is the error condition. I fixed the code in my original post, too. The mmap() may be interrupted, in which case the loop must retry, but if errno is something other than EINTR, then the mmap() failed.

You have very sharp eyes, grail!
 
Old 05-11-2011, 12:07 AM   #15
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,654

Rep: Reputation: 1965
Quote:
You have very sharp eyes, grail!
hehe .. well they do have glasses these days but I more just wanted to make sure I was understanding correctly
 
  




All times are GMT -5. The time now is 06:15 AM.
