Threads, CPU and Memory

ankit,garg · 01-19-2012, 05:40 AM

Hello,

I have a multithreaded application and I want to see the CPU usage and memory consumed by a single thread during program execution.

Is it possible to do this with a system command or by embedding code in my program?

Thanks
Ankit

johnsfine · 01-19-2012, 08:21 AM

What do you mean by "memory consumed by a single thread"?

All the threads in a process share the same address space. Memory allocated by any thread is usable by any other thread.

ankit,garg · 01-19-2012, 11:54 PM

Suppose my process is consuming 3176 KB and there are 10 threads running inside it. I want to know how much of that 3176 KB is consumed by a single thread.

Same for CPU usage.

Nominal Animal · 01-20-2012, 03:00 AM

Read /proc/self/task/tid/statm, where tid is the thread ID (use gettid(), since pthreads uses a different thread identifier). See man 5 proc for descriptions of the fields in statm (and of the other files, in case you decide you need further info). Most values are in pages, so to get bytes you'll need to multiply by sysconf(_SC_PAGE_SIZE). You'll get much more data if you parse /proc/self/task/tid/stat instead.
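
As a rough illustration of the above, here is a minimal sketch that reads the first two statm fields for the calling thread and converts them from pages to bytes. The helper name thread_statm_bytes() is made up for this example, and the tid is obtained with syscall(SYS_gettid) in case the C library has no gettid() wrapper. As discussed later in the thread, these numbers describe the shared address space, so expect them to be the same for every thread in the process.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

/* Hypothetical helper (not from the thread): reads the first two fields of
 * /proc/self/task/<tid>/statm for the calling thread and converts them
 * from pages to bytes. Returns 0 on success, -1 on failure. */
int thread_statm_bytes(long *size_bytes, long *resident_bytes)
{
    char  path[64];
    FILE *in;
    long  size_pages, resident_pages;
    long  page = sysconf(_SC_PAGE_SIZE);
    pid_t tid  = (pid_t)syscall(SYS_gettid);   /* Linux task ID, not pthread_t */

    if (page <= 0)
        return -1;

    snprintf(path, sizeof path, "/proc/self/task/%ld/statm", (long)tid);
    in = fopen(path, "r");
    if (!in)
        return -1;

    /* Field 1: total program size; field 2: resident set size (both in pages). */
    if (fscanf(in, "%ld %ld", &size_pages, &resident_pages) != 2) {
        fclose(in);
        return -1;
    }
    fclose(in);

    *size_bytes     = size_pages * page;
    *resident_bytes = resident_pages * page;
    return 0;
}
```

A caller would pass the addresses of two long variables and check for a zero return value before using them.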

Note, however, that the stack size is fixed for each thread you start, and the above will not tell you how much of the stack each thread is using. For that, you need to measure it yourself, perhaps using something like I described in my post in your previous thread.

ankit,garg · 01-20-2012, 03:32 AM

Thanks for the help. I will check it as you described and come back if I have anything else related to this.

sundialsvcs · 01-20-2012, 09:38 AM

The essential difference between a thread and a process is that ... a process is the thing that "owns" resources (such as files and memory), and multiple threads can run within the context of a single process.

Therefore, the notion "how much memory is used by a single thread" has no meaning at all.

"I can see from the electric meter that the occupants of that building consumed 21 kWh of electricity today ... but I have no way to know which one of you turned on which electrical appliance."

johnsfine · 01-20-2012, 10:52 AM

Quote:

Originally Posted by sundialsvcs

the notion "how much memory is used by a single thread" has no meaning at all.

I tried to explain that, but in post #3, it is clear that attempt merely bounced off the OP's preconceptions.

Quote:

"I can see from the electric meter that the occupants of that building consumed 21 kWh of electricity today ... but I have no way to know which one of you turned on which electrical appliance."

That is a bad analogy, because it describes a measurement difficulty, not a lack of definition of what you would like to measure.

Two people are watching a TV show while a third is sort of watching while doing something else in the same room. Who is using the electricity consumed by that TV? The person who last turned the TV on (but left the room when someone else changed the channel)? The person who last selected the channel? Some apportioned value among those in the room depending on whether they are really watching?

Attributing memory use within a process by thread is less well defined than who is using the electricity in my TV example. It is a definition question more than a measurement question.

The use of stack memory is kind of a special case. Threads might interact in such a way that data on one thread's stack is also used by other threads, but that is rare and doesn't necessarily invalidate the idea that each thread is solely responsible for memory use on its stack. But the stacks are typically a small part of the total memory use of a multithreaded process. So attributing the stack use still leaves most of the memory use not attributed to specific threads.

Nominal Animal · 01-20-2012, 01:40 PM

Quote:

Originally Posted by sundialsvcs

Therefore, the notion "how much memory is used by a single thread" has no meaning at all.

I disagree.

I agree that the notion "how much memory is allocated by a single thread" has no meaning. /proc/self/task/*/statm and the other per-task status files are essentially identical for all threads. From the kernel's point of view, all allocations are done by the process; it does not care a whit which thread does it.

However, it is possible to track "how much memory was first accessed by each thread". Perhaps not exactly, but approximately.

When a process (whichever thread, does not matter) allocates memory, the kernel usually sets up only the virtual memory, not the actual RAM. You can request the kernel to populate the pages, too, but it is counterproductive in most situations.

When the process first accesses a page, a page fault is generated. If the fault is on a page that the kernel has already set up, the kernel maps an actual page of RAM there, filled with zeroes, and lets the process continue. There are other types of mappings, such as file-backed mappings, in which case the kernel may, for example, load the file contents there. If there is no mapping at all, you'll get a segfault or bus error.

These page faults are counted for each thread separately on Linux. The tenth field in /proc/self/stat does describe the number of minor page faults for the entire process, but /proc/self/task/tid/stat fields describe them for each thread separately.

This means that you can estimate the amount of memory first accessed by each thread by checking how many minor page faults the thread has caused.

Obviously, this is very imprecise. Library functions use temporary allocations, so those affect the counts. Allocations are done in larger chunks, and initially there is almost always some space available in an already obtained page, so small allocations do not show up. The GNU C library at least does not release allocations back to the kernel immediately. Many times the released memory is used to satisfy a later allocation instead. Because those pages are already faulted in, they're not accounted for in the minor page faults. Also, if you are tight enough on memory that some of the pages are swapped out, the fault counts will probably get a bit haywire.

On the other hand, if you use mmap(NULL,size,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS,-1,(off_t)0)/munmap() instead of malloc()/free() to allocate memory, those will be accounted very reliably in the minor page fault counts. (Swapping will mess with those fault counts too, though. Avoid swapping.)
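
To make the mmap() point concrete, here is a small sketch (not part of the original test program) that maps anonymous memory, writes one byte to each page, and reports how many minor faults the calling thread picked up in between. The helper name faults_from_touching() is hypothetical, and RUSAGE_THREAD assumes Linux 2.6.26 or later with _GNU_SOURCE defined.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper: maps `pages` anonymous pages, writes one byte to each,
 * and returns how many minor page faults the calling thread incurred while
 * doing so, or -1 on failure. */
long faults_from_touching(size_t pages)
{
    struct rusage  before, after;
    long           page = sysconf(_SC_PAGE_SIZE);
    size_t         size, i;
    char          *data;

    if (page <= 0 || pages == 0)
        return -1;
    size = pages * (size_t)page;

    data = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, (off_t)0);
    if (data == MAP_FAILED)
        return -1;

    if (getrusage(RUSAGE_THREAD, &before) == -1) {
        munmap(data, size);
        return -1;
    }

    /* The first write to each page faults; the kernel then maps a zeroed page. */
    for (i = 0; i < pages; i++)
        ((volatile char *)data)[i * (size_t)page] = 1;

    if (getrusage(RUSAGE_THREAD, &after) == -1) {
        munmap(data, size);
        return -1;
    }

    munmap(data, size);
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical system the result should be close to the number of pages touched, but treat that as an expectation rather than a guarantee: transparent huge pages, swapping, or an unusual kernel configuration can all change the count.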

Assuming you have a Linux kernel 2.6.26 or later, this data can also be obtained using getrusage(RUSAGE_THREAD,ptr); for the current thread. Here is some example code I used to verify (at least on my machines) my opinions above:

Code:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#include <stdio.h>
#include <stdlib.h>

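/* Returns the number of bytes in pages first touched by the calling thread,
 * estimated as (minor page faults) * (page size). Returns 0 on failure.
 * errno is left unchanged in either case. */
size_t first_accessed(void)
{
    struct rusage   u;
    long            page;
    int             result, saved_errno;

    saved_errno = errno;

    do {
        result = getrusage(RUSAGE_THREAD, &u);
    } while (result == -1 && errno == EINTR);

    page = sysconf(_SC_PAGE_SIZE);

    errno = saved_errno;

    if (result == -1 || page <= 0)
        return (size_t)0;

    return (size_t)page * (size_t)u.ru_minflt;
}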

sem_t worker_semaphore;

void *worker(void *payload)
{
    const long   bytes = (long)payload;
    char        *data = NULL;
    int          result;

    if (bytes > 0) {
        data = malloc(bytes);
        if (data)
            memset(data, 0, bytes);
    }

    result = sem_wait(&worker_semaphore);
    if (result)
        return (void *)( (long)errno );

    return (void *)( (long)first_accessed() );
}


int main(int argc, char *argv[])
{
    pthread_t       *thread_id     = NULL;
    pthread_attr_t  *thread_attr   = NULL;
    long            *thread_arg    = NULL;
    int              threads       = 0;

    long             value;
    char             dummy;
    void            *retval;
    int              arg, result;

    if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "Usage: %s bytes [ bytes ... ]\n", argv[0]);
        return 0;
    }

    thread_id     = malloc((size_t)argc * sizeof (pthread_t));
    thread_arg    = malloc((size_t)argc * sizeof (long));
    thread_attr   = malloc((size_t)argc * sizeof (pthread_attr_t));
    if (!thread_id || !thread_arg || !thread_attr) {
        fprintf(stderr, "Not enough memory.\n");
        return 1;
    }

    result = sem_init(&worker_semaphore, 0, 0);
    if (result == -1) {
        fprintf(stderr, "Cannot initialize worker semaphore: %s.\n", strerror(errno));
        return 1;
    }

    for (arg = 1; arg < argc; arg++) {

        if (sscanf(argv[arg], "%ld %c", &value, &dummy) != 1) {
            fprintf(stderr, "%s: Invalid number of bytes.\n", argv[arg]);
            return 1;
        }
        if (value < 0L) {
            fprintf(stderr, "%s: Invalid number of bytes.\n", argv[arg]);
            return 1;
        }

        thread_arg[threads] = value;

        result = pthread_attr_init(&(thread_attr[threads]));
        if (result) {
            fprintf(stderr, "Cannot initialize thread attributes: %s.\n", strerror(result));
            return 1;
        }

        result = pthread_attr_setstacksize(&(thread_attr[threads]), (size_t)65536);
        if (result) {
            fprintf(stderr, "Cannot set thread stack size attribute: %s.\n", strerror(result));
            return 1;
        }

        result = pthread_create(&(thread_id[threads]), &(thread_attr[threads]), worker, (void *)thread_arg[threads]);
        if (result) {
            fprintf(stderr, "Cannot create thread: %s.\n", strerror(result));
            return 1;
        }

        threads++;
    }

    if (threads < 1) {
        fprintf(stderr, "Nothing to do.\n");
        return 1;
    }

    for (arg = 0; arg < threads; arg++)
        if (sem_post(&worker_semaphore) == -1) {
            fprintf(stderr, "Cannot post worker semaphore: %s.\n", strerror(errno));
            return 1;
        }

    fflush(stderr);

    for (arg = 0; arg < threads; arg++) {

        result = pthread_join(thread_id[arg], &retval);
        if (!result) {
            printf("Thread %d of %d: %ld bytes allocated, %ld bytes used (minor page faults).\n",
                   arg + 1, threads, thread_arg[arg], (long)retval);
            fflush(stdout);
        } else {
            fprintf(stderr, "Failed to join thread %d of %d: %s.\n", arg + 1, threads, strerror(result));
            fflush(stderr);
        }
    }

    if (sem_destroy(&worker_semaphore) == -1) {
        fprintf(stderr, "Cannot destroy worker semaphore: %s.\n", strerror(errno));
        return 1;
    }

    return 0;
}

If you save the above code as minorfaults.c you can compile and run a couple of tests using

Code:

gcc minorfaults.c -Wall -O3 -lpthread -o minorfaults

./minorfaults 1 1000 1000000
    Thread 1 of 3: 1 bytes allocated, 12288 bytes used (minor page faults).
    Thread 2 of 3: 1000 bytes allocated, 4096 bytes used (minor page faults).
    Thread 3 of 3: 1000000 bytes allocated, 1003520 bytes used (minor page faults).

./minorfaults 1000000 1000 1
    Thread 1 of 3: 1000000 bytes allocated, 1015808 bytes used (minor page faults).
    Thread 2 of 3: 1000 bytes allocated, 4096 bytes used (minor page faults).
    Thread 3 of 3: 1 bytes allocated, 0 bytes used (minor page faults).

./minorfaults 50000 200 40000 800000 40
    Thread 1 of 5: 50000 bytes allocated, 61440 bytes used (minor page faults).
    Thread 2 of 5: 200 bytes allocated, 8192 bytes used (minor page faults).
    Thread 3 of 5: 40000 bytes allocated, 40960 bytes used (minor page faults).
    Thread 4 of 5: 800000 bytes allocated, 802816 bytes used (minor page faults).
    Thread 5 of 5: 40 bytes allocated, 0 bytes used (minor page faults).

To get the corresponding information on any thread in any process you have access to, read the tenth field in /proc/pid/task/tid/stat and multiply by sysconf(_SC_PAGE_SIZE). Note that tid is the Linux task ID, not the POSIX threads ID; you need to use gettid() or e.g. ps -o tid ... to obtain it. I don't know of any way to derive the tid from a pthread_t variable.
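
Reading that field from another process might look roughly like the sketch below. The helper name task_stat() is invented for this example. Besides the tenth field (minflt), it also picks up fields 14 and 15 (utime and stime, in clock ticks, per man 5 proc), since those give the per-thread CPU time asked about at the start of the thread.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: reads minflt (field 10), utime (field 14) and
 * stime (field 15) from /proc/<pid>/task/<tid>/stat.
 * utime and stime are in clock ticks; divide by sysconf(_SC_CLK_TCK)
 * to get seconds. Returns 0 on success, -1 on failure. */
int task_stat(pid_t pid, pid_t tid,
              unsigned long *minflt, unsigned long *utime, unsigned long *stime)
{
    char   path[64], line[1024];
    char  *p;
    FILE  *in;

    snprintf(path, sizeof path, "/proc/%ld/task/%ld/stat", (long)pid, (long)tid);
    in = fopen(path, "r");
    if (!in)
        return -1;
    if (!fgets(line, sizeof line, in)) {
        fclose(in);
        return -1;
    }
    fclose(in);

    /* Field 2 (the command name) is parenthesised and may itself contain
     * spaces or parentheses, so continue parsing after the last ')'. */
    p = strrchr(line, ')');
    if (!p)
        return -1;

    /* Skip fields 3..9, read 10, skip 11..13, read 14 and 15. */
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*u %lu %*u %*u %*u %lu %lu",
               minflt, utime, stime) != 3)
        return -1;

    return 0;
}
```

Multiplying minflt by sysconf(_SC_PAGE_SIZE) gives the same byte estimate as the test program above, and (utime + stime) / (double)sysconf(_SC_CLK_TCK) gives the CPU seconds consumed by that one thread.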

If your library does not provide gettid(), use

Code:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

pid_t gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

I hope you found this stuff as interesting as I did.

@sundialsvcs and johnsfine: I did not know all the details above before this thread. I did have a fuzzy notion, but nothing specific. Because of this thread, I checked -- and I'm glad I did. If I ever need to check if my worker threads have more or less balanced memory use, I know how to do it now.