I needed to find the time it takes to execute a for loop...
I decided to use the gettimeofday() function and wrote the following code:
#define _GNU_SOURCE   // needed for sched_getaffinity()/sched_setaffinity()
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sched.h>

void test(void);

int main(void) {
    unsigned long mask;
    unsigned int len = sizeof(mask);

    // Get the affinity of the current process
    if (sched_getaffinity(0, len, (cpu_set_t *)&mask) == 0) {
        printf("\nThe affinity mask for current process is: %08lx", mask);
        // Set the affinity of the current process to CPU 1 (mask = 2)
        mask = 2;
        if (sched_setaffinity(0, len, (cpu_set_t *)&mask) == 0) {
            printf("\nChecking that affinity mask has been set to 2 ...");
            // Check that the affinity of this process is now CPU 1
            sched_getaffinity(0, len, (cpu_set_t *)&mask);
            if (mask == 2) {
                printf(" yes (new affinity is: %08lx)\n", mask);
            } else {
                printf(" no (affinity is still: %08lx)\n", mask);
            }
            test();
        } else {
            perror("sched_setaffinity");
        }
    }
    return 0;
}
I ran it on my laptop with Fedora 14 (1 GB memory, Core 2 Duo T7250 @ 2.0 GHz, two cores).
Code:
The affinity mask for current process is: 00000003
Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002)
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 11 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 11 microseconds
Average time taken: 10.133 microseconds
I didn't reboot, didn't disconnect from the network, have about 8 applications running and 30 tabs open in Firefox. And my system is pretty unloaded as shown by the uptime command:
02:07:11 up 6 days, 10:12, 4 users, load average: 0.03, 0.11, 0.13
I wonder if you have something running that's swapping memory.
Does /proc/meminfo show SwapCached as 0KB and SwapTotal == SwapFree?
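To check this quickly, the relevant fields can be pulled straight out of /proc/meminfo:

```shell
grep -E 'SwapCached|SwapTotal|SwapFree' /proc/meminfo
```

If SwapCached is 0 kB and SwapTotal equals SwapFree, nothing has been swapped out.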
There are lots of things that can cause variation. Is your system "tick-less" (ie: CONFIG_NO_HZ=y)? Does your CPU have lots of SMIs? How are you getting your time information (where does gettimeofday get its info from)?
One quick fix I'd make to your program is to use clock_gettime() with CLOCK_MONOTONIC from the rt library (link with -lrt). With gettimeofday(), if some task (say ntpd) is updating your timing source, the information might be bad.
One other thing to check:
you seem to have a lot of rescheduling on core #1. Perhaps set the scheduling policy to SCHED_FIFO / SCHED_RR with a priority of 77 (I just picked that out of thin air - "most prioritized" is 99). That will ensure that your software isn't getting preempted by any kind of software interrupt handler.
Last edited by orgcandman; 12-30-2010 at 12:10 PM.
Sorry for the delay. Now to answer your questions...
@KenJackson
/proc/meminfo shows:
SwapCached: 0kB
and yes SwapTotal is equal to SwapFree
SwapTotal: 2698880 kB
SwapFree: 2698880 kB
@orgcandman
Seems like my system is 'tick-less' (i.e. CONFIG_NO_HZ=y).
grep HZ /boot/config-`uname -r` gives:
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m
I tried clock_gettime() as you suggested. This is the body of the new test() (I also account for tv_sec, in case the interval crosses a second boundary):
int i, j;
struct timespec begin_time, end_time;
long long elapsed;
double total = 0;

for (i = 0; i < 15; i++) {
    clock_gettime(CLOCK_MONOTONIC, &begin_time);
    for (j = 0; j < 1000; j++);
    clock_gettime(CLOCK_MONOTONIC, &end_time);
    elapsed = (end_time.tv_sec - begin_time.tv_sec) * 1000000000LL
            + (end_time.tv_nsec - begin_time.tv_nsec);
    printf("\nCounting to 1000 took: %lld nanoseconds", elapsed);
    total += elapsed;
}
printf("\nAverage time taken: %.3f nanoseconds \n", total / 15);
but again the output varies
Counting to 1000 took: 9219 nanoseconds
Counting to 1000 took: 9079 nanoseconds
Counting to 1000 took: 8940 nanoseconds
Counting to 1000 took: 19486 nanoseconds
Counting to 1000 took: 19486 nanoseconds
Counting to 1000 took: 19556 nanoseconds
Counting to 1000 took: 9010 nanoseconds
Counting to 1000 took: 9080 nanoseconds
Counting to 1000 took: 9079 nanoseconds
Counting to 1000 took: 19276 nanoseconds
Counting to 1000 took: 20115 nanoseconds
Counting to 1000 took: 19486 nanoseconds
Counting to 1000 took: 9010 nanoseconds
Counting to 1000 took: 9079 nanoseconds
Counting to 1000 took: 9009 nanoseconds
Average time taken: 13260.667 nanoseconds
I read somewhere that the CONFIG_PREEMPT option in the 2.6 kernel can help achieve soft real-time performance. Apparently, it is not set...
grep PREE /boot/config-`uname -r` gives:
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
Btw, uname -a gives:
Linux HAIL 2.6.26-2-686 #1 SMP Mon Aug 30 07:01:57 UTC 2010 i686 GNU/Linux
I am really running out of ideas... any suggestions?
I decided to run the code on a desktop with Fedora 13 and got much better output:
[sana@localhost Benchmarking-Example]$ ./application
The affinity mask for current process is: 00000003
Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002)
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 4 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 3 microseconds
Counting to 1000 took: 3 microseconds
Average time taken: 3.467 microseconds
[sana@localhost Benchmarking-Example]$
The biggest difference I can see is that CONFIG_HZ is set to 1000 on the desktop and 250 on the laptop. Could this be the cause of the variation?
[sana@localhost ~] $ grep HZ /boot/config-2.6.27.41-170.2.117.fc10.i686
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_MACHZ_WDT=m
[sana@localhost ~]$
Is it possible to change the value of CONFIG_HZ on the laptop to 1000?
Or better still... can I set CONFIG_NO_HZ to n? Would that reduce the variation even more? I am not sure how this is to be done... any ideas?
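For reference, rebuilding the laptop kernel with a 1000 Hz tick and dynamic ticks disabled looks roughly like this (paths and menu labels are from a typical 2.6 source tree; treat this as a sketch, not exact instructions for any particular distro):

```shell
cd /usr/src/linux                       # wherever the kernel source lives
cp /boot/config-$(uname -r) .config     # start from the running kernel's config
make menuconfig
#   Processor type and features --->
#     [ ] Tickless System (Dynamic Ticks)    <- turn off for CONFIG_NO_HZ=n
#     Timer frequency (1000 HZ) --->         <- selects CONFIG_HZ=1000
make && make modules_install && make install
```

CONFIG_HZ and CONFIG_NO_HZ are compile-time options, so there is no way to change them on a running kernel; a rebuild and reboot is required.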
Assuming you compiled with -O0 (no optimization; otherwise the loop may be optimized away), it's just scheduling noise and cache latencies.
Wait for a timer tick (by waiting until user time in getrusage() changes), before starting the measurement to minimize scheduling noise.
Try the following code:
Code:
#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>

static inline double toseconds(const struct timeval t)
{
    return (double)t.tv_sec + (double)t.tv_usec / 1000000.0;
}

static inline unsigned long long cycles(void)
{
    unsigned int lo, hi;
    asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    /* Use unsigned long long so the shift is valid on 32-bit (i686) too */
    return ((unsigned long long)hi << 32) | (unsigned long long)lo;
}

volatile int dummy = 0;

int main(int argc, char **argv)
{
    int arg;
    unsigned long long c;
    long i;
    double ssys, suser, swall;
    struct timeval begin, end;
    struct rusage before, after;

    for (arg = 1; arg < argc; arg++) {
        char ignore;
        if (sscanf(argv[arg], "%ld %c", &i, &ignore) != 1) {
            fprintf(stderr, "%s: Invalid number of loop cycles.\n", argv[arg]);
            continue;
        }
        if (i < 1L) {
            fprintf(stderr, "%s: Number of loop cycles is too small.\n", argv[arg]);
            continue;
        }
        printf("%ld loops:\n", i);
        fflush(stdout);

        /* Wait until change at user time in usecs (or secs) */
        getrusage(RUSAGE_SELF, &after);
        do {
            gettimeofday(&begin, NULL);
            getrusage(RUSAGE_SELF, &before);
        } while (before.ru_utime.tv_sec == after.ru_utime.tv_sec &&
                 before.ru_utime.tv_usec == after.ru_utime.tv_usec);
        gettimeofday(&begin, NULL);
        c = cycles();

        /* Work loop -- volatile assignment cannot be optimized away. */
        while (i-- > 0L)
            dummy = 0;

        c = cycles() - c;
        gettimeofday(&end, NULL);
        getrusage(RUSAGE_SELF, &after);

        swall = toseconds(end) - toseconds(begin);
        suser = toseconds(after.ru_utime) - toseconds(before.ru_utime);
        ssys = toseconds(after.ru_stime) - toseconds(before.ru_stime);

        printf("Total cycles: %llu\n", c);
        printf("User time: %.6f seconds\n", suser);
        printf("System time: %.6f seconds\n", ssys);
        printf("Total CPU time: %.6f seconds\n", suser + ssys);
        printf("Wall clock: %.6f seconds\n", swall);
        printf("\n");
        fflush(stdout);
    }
    return 0;
}
You can compile this with or without optimizations (-O0 to -O3); the loop repeatedly zeroes a volatile variable - an operation which the compiler cannot optimize away. Supply the number(s) of loop cycles you wish to test as command line parameters.
The number of clock cycles will vary a bit due to cache effects.
First run will always have a higher clock cycle count, because the code and data will not be in cache. Afterwards, caches are "hot", unless evicted by the kernel (process scheduler), perhaps because another process was run for a while on the same CPU.
Clocks normally change at tick intervals.
I believe Linux gettimeofday() on x86 uses the clock cycle count (rdtsc) internally to achieve better than tick precision.
When measuring user or system time via getrusage(), wait until the struct timeval changes before starting the measurement. Any measured interval is then correct to within one timer tick interval; the result may be one tick too short, but never too long. Measure and add half a tick for best possible resolution: +-0.5 timer tick interval.
If you start measuring elapsed time near the end of the time slice given to this process, you will not only include the scheduler but also other processes run on this core, in the wall clock time; also, the number of cycles taken will vary wildly because the caches may or may not be evicted in the middle of the testing run.
You can check the timer tick (resolution) by calling getrusage() repeatedly, and calculating the delta when the user time changes.
Wall clock includes kernel overhead and other processes run on the same CPU -- and you don't normally want to include those.
User time is the time spent running userspace code, including any libraries.
System time is the time spent in the kernel by this process, for example in reading/writing files or sockets.
CPU time (user + system time) is the total CPU time taken by this process. When you do computation and I/O, this is the most interesting measurement.
Use clock cycle count (rdtsc) to measure pure CPU and CPU+memory workloads; sort algorithms, math, et cetera. Repeat the measurement and check the statistics on the results. If using I/O, ignore or separate those clock cycles.
Hope this helps,
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 01:48 AM.
sana@HAIL:~/test$ grep HZ /boot/config-2.6.26
# CONFIG_NO_HZ is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_MACHZ_WDT=m
sana@HAIL:~/test$
I ran the program (the one in my 1st post in this thread) twice:
sana@HAIL:~/test$ ./application
The affinity mask for current process is: 00000001
Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002)
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 8 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Average time taken: 8.933 microseconds
sana@HAIL:~/test$ ./application
The affinity mask for current process is: 00000001
Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002)
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 10 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 8 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Counting to 1000 took: 9 microseconds
Average time taken: 9.000 microseconds
sana@HAIL:~/test$
There seems to be much less variation now...
Btw, I chose the Preemption Model as 'Preemptible kernel (Low-Latency Desktop)' during menuconfig. So, now:
sana@HAIL:~/test$ grep PREE /boot/config-2.6.26
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_DEBUG_PREEMPT=y
sana@HAIL:~/test$
Any idea whether choosing 'Voluntary Kernel Preemption' instead would improve results?
Your results won't improve much beyond what they are now. Tickless systems are awful for scheduling latency since they go to sleep (which also means they're great for battery life), as you've noticed. You'll also notice things like select() waits taking much longer than they should on a tickless system.