LinuxQuestions.org
Old 07-31-2011, 04:32 PM   #1
dillongeo
LQ Newbie
 
Registered: Jul 2011
Posts: 2

Rep: Reputation: Disabled
How to measure CPU time spent on a code block (with C or C++)?


Hi, I have a C program like this:

...
CODE_BLOCK;
...

I want to know how much CPU time is spent on CODE_BLOCK. Since the process executing CODE_BLOCK may be preempted during execution, this CPU time may not equal the wall-clock time elapsed from the beginning of CODE_BLOCK to its end.

Can anyone tell me how to do this?

Thanks.
 
Old 07-31-2011, 05:12 PM   #2
Proud
Senior Member
 
Registered: Dec 2002
Location: England
Distribution: Used to use Mandrake/Mandriva
Posts: 2,794

Rep: Reputation: 116
What exactly are you trying to achieve? To accurately capture this value programmatically, or to benchmark this code block?
Possibly consider renice, a different scheduler, or simply averaging multiple runs if it's the latter.
 
Old 07-31-2011, 05:42 PM   #3
dillongeo
LQ Newbie
 
Registered: Jul 2011
Posts: 2

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Proud View Post
What exactly are you trying to achieve? To accurately capture this value programmatically, or to benchmark this code block?
Possibly consider renice, a different scheduler, or simply averaging multiple runs if it's the latter.
Thanks. Sorry for the confusion; CODE_BLOCK here stands for a code segment. I am trying to measure how much user/system time is spent executing CODE_BLOCK. I just searched, and it seems getrusage() can do the job. I would prefer not to use third-party tools or libraries.
 
Old 08-01-2011, 12:02 AM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
If you are using x86, use the time stamp counter (rdtsc; there is an example function on Wikipedia). Read the counter just before and just after running the benchmarked code, and compute the difference.

To make sure you are actually benchmarking the code you want, I recommend compiling the benchmarked code into a separate object file; that way you can also use maximum optimization (and experiment with better optimization flags) without the compiler optimizing the benchmarked code away. If you benchmark an empty function the same way, you can quite reliably estimate the overhead involved.

I've used this method when writing optimized SSE3 routines. It is quite simple once you split the program into multiple separately compiled files.

Here is the framework I use with GCC on x86. It is split into eight files (including the Makefile), but that way it is much easier to control. The code is CC0-licensed, so you can use it in any way you like; feel free to use it as a basis for your own code-block benchmarking.

test.c - the actual function to benchmark, plus an empty function to compute overheads with. The test() here will benchmark one multiplication and one addition:
Code:
/* This is test.c */
#include "test.h"

/* test() may use global variables to keep state
 *        from one iteration to another.
*/
static unsigned long counter = 0L;
static unsigned long result = 0L;

/* This is the test function to be benchmarked.
*/
void test(void)
{

	/* Benchmark multiply-add */
	result *= ++counter;

}

/* This is an empty function, used to measure call overhead.
*/
void nothing(void)
{
	return;
}
Note that if your test function has to do some extra work, you can include a similar operation in the empty function. The result will of course be approximate, due to CPU scheduling and other effects -- even the compiler may produce slightly different code -- but it is much better than guessing.

utils.c - isolation for benchmarking a function:
Code:
/* This is utils.c */

/* The feature test macro must be defined before including any
 * standard header, or clock_gettime() may not be declared. */
#ifndef   _POSIX_C_SOURCE
#define   _POSIX_C_SOURCE 200112L
#endif

#include <stdint.h>
#include <errno.h>
#include <time.h>

#include "rdtsc.h"
#include "utils.h"

uint64_t benchmark(unsigned long iterations, void (*function)(void))
{
	uint64_t started, finished;

	/* Sanity check. */
	if (!iterations || !function) {
		errno = EINVAL;
		return (uint64_t)-1;
	}

	started = rdtsc();
	while (iterations--)
		function();
	finished = rdtsc();

	return finished - started;
}

double cpu_seconds(void)
{
	struct timespec t;

	if (!clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t))
		return (double)t.tv_sec
		     + (double)t.tv_nsec / 1000000000.0;
	else
		return (double)clock() / (double)CLOCKS_PER_SEC;
}
rdtsc.c - RDTSC function:
Code:
/* This is rdtsc.c */
#include "rdtsc.h"

/* Note: this implementation has no serializing barrier (such as cpuid),
 *       so the CPU may reorder other operations around it.
*/
uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t)hi << 32) | (uint64_t)lo;
}
These are the header files for above .c files:
Code:
#ifndef   TEST_H
#define   TEST_H

/* Prototype for the benchmarked function. */
void test(void);

/* Prototype for the empty function used to measure overhead. */
void nothing(void);

#endif /* TEST_H */
Code:
#ifndef   UTILS_H
#define   UTILS_H

#include <stdint.h>
#include "rdtsc.h"

uint64_t benchmark(unsigned long iterations, void (*function)(void));

double cpu_seconds(void);

#endif /* UTILS_H */
Code:
#ifndef   RDTSC_H
#define   RDTSC_H
#include <stdint.h>

uint64_t rdtsc(void);

#endif /* RDTSC_H */
main.c - the main program. It takes the number of iterations (function calls) as a parameter, and pretty-prints the results:
Code:
/* This is main.c */
#include <stdio.h>
#include <stdint.h>
#include "utils.h"
#include "test.h"

static int usage(char const *const argv0, int exitstatus)
{
	fprintf(stderr, "\n"
	                "Usage: %s [ -h | --help ]\n"
	                "       %s iterations\n"
	                "\n"
	              , argv0, argv0);
	return exitstatus;
}

int main(int argc, char *argv[])
{
	unsigned long	iterations;
	char		dummy;

	double		startsecs, endsecs, count;

	uint64_t	nothing_cycles, test_cycles, body_cycles;
	double		nothing_seconds, test_seconds;

	if (argc < 2)
		return usage(argv[0], 0);
	if (argc > 2)
		return usage(argv[0], 2);
	if (argv[1][0] == '-' && argv[1][1] == 'h' && !argv[1][2])
		return usage(argv[0], 0);
	if (argv[1][0] == '-' && argv[1][1] == '-' && argv[1][2] == 'h' &&
	    argv[1][3] == 'e' && argv[1][4] == 'l' && argv[1][5] == 'p' && !argv[1][6])
		return usage(argv[0], 0);

	if (sscanf(argv[1], "%lu %c", &iterations, &dummy) != 1) {
		fprintf(stderr, "%s: Invalid number of iterations.\n", argv[1]);
		return usage(argv[0], 1);
	}
	if (iterations < 1UL) {
		fprintf(stderr, "%s: Number of iterations is too small.\n", argv[1]);
		return usage(argv[0], 1);
	}
	count = (double)iterations;

	printf("Running %lu iterations.\n", iterations);
	printf("Overhead:\n");
	fflush(stdout);

	startsecs = cpu_seconds();
	nothing_cycles = benchmark(iterations, nothing);
	endsecs = cpu_seconds();
	nothing_seconds = endsecs - startsecs;

	printf("\t%15.0f CPU cycles\n", (double)nothing_cycles);
	printf("\t%15.0f CPU cycles per call\n", (double)nothing_cycles / count);
	if (nothing_seconds > 0.0) {
		printf("\t%15.0f CPU cycles per second\n", (double)nothing_cycles / nothing_seconds);
		printf("\t( %13.3f GHz)\n", (double)nothing_cycles / nothing_seconds / 1000000000.0);
		printf("\t%15.9f seconds\n", nothing_seconds);
		printf("\t%15.9f seconds per call\n", nothing_seconds / count);
	} else
		printf("\t                (Not enough iterations to measure time in seconds)\n");
	printf("\n");
	printf("Test function:\n");
	fflush(stdout);
	
	startsecs = cpu_seconds();
	test_cycles = benchmark(iterations, test);
	endsecs = cpu_seconds();
	test_seconds = endsecs - startsecs;

	printf("\t%15.0f CPU cycles\n", (double)test_cycles);
	printf("\t%15.0f CPU cycles per call\n", (double)test_cycles / count);
	if (test_seconds > 0.0) {
		printf("\t%15.0f CPU cycles per second, estimated\n", (double)test_cycles / test_seconds);
		printf("\t( %13.3f GHz)\n", (double)test_cycles / test_seconds / 1000000000.0);
		printf("\t%15.9f seconds\n", test_seconds);
		printf("\t%15.9f seconds per call\n", test_seconds / count);
	} else
		printf("\t                (Not enough iterations to measure time in seconds)\n");
	printf("\n");

	if (test_cycles > nothing_cycles) {
		body_cycles = test_cycles - nothing_cycles;

		printf("Test function, excluding overhead:\n");
		printf("\t%15.0f CPU cycles\n", (double)body_cycles);
		printf("\t%15.0f CPU cycles per call\n", (double)body_cycles / count);
		if (test_seconds > nothing_seconds) {
			printf("\t%15.9f seconds\n", test_seconds - nothing_seconds);
			printf("\t%15.9f seconds per call\n", (test_seconds - nothing_seconds) / count);
		} else
			printf("\t                (Not enough iterations to measure time in seconds)\n");
	} else
		printf("Function body execution time is lost in the measurement noise.\n");

	printf("\nDone.\n");

	return 0;
}
And finally, the Makefile, to make building easy (just run make clean all in the same directory):
Code:
CC	:= gcc
CFLAGS	:= -Wall -pedantic -O3 -ftree-vectorize -fomit-frame-pointer
LD	:= $(CC)
LDFLAGS	:= $(CFLAGS)
LDLIBS	:= -lrt

PROG	:= benchmark
OBJS	:= main.o test.o utils.o rdtsc.o

all: $(PROG)

clean:
	rm -f $(PROG) $(OBJS)

$(OBJS): %.o: %.c
	$(CC) $(CFLAGS) -c $<

$(PROG): %: $(OBJS)
	$(LD) $(LDFLAGS) -o $@ $^ $(LDLIBS)
The rt library and _POSIX_C_SOURCE are only needed for clock_gettime(). If you do not mind much less precise CPU time in seconds, just drop those bits and use clock() instead in the cpu_seconds() function. This does not affect the CPU cycle counts at all, only the estimated time in seconds (and the CPU frequency estimate).

When running benchmarks on a fast multi-core machine, check that the CPU frequency is close to what you expect. If the machine is otherwise lightly loaded, you may see transitions between different frequencies; that affects the benchmark and, in my opinion, makes the results unreliable.

I hope you find this useful,

Last edited by Nominal Animal; 08-01-2011 at 12:08 AM.
 
  

