variation in value returned from gettimeofday()
Hello
I needed to find the time it takes to execute a for loop... I decided to use the gettimeofday() function and wrote the following code: #include <stdio.h> #include <stdlib.h> #include <time.h> #include <sched.h> // for use of sched_setaffinity #include "/usr/include/time.h" void test(); int main () { unsigned long mask; unsigned int len = sizeof(mask); // Get the affinity of the current process if ( sched_getaffinity(0, len, &mask) == 0 ) { printf("\nThe affinity mask for current process is: %08lx", mask); // Set the affinity of the current process mask = 2; if (sched_setaffinity(0, len, &mask) == 0) { printf("\nChecking that affinity mask has been set to 2 ..."); // Check that affinity of this process has been set to CPU 0 sched_getaffinity(0, len, &mask); if (mask == 2 ) { printf(" yes (new affinity is: %08lx)\n", mask); } else { printf(" no (affinity is still: %08lx)\n", mask); } test (); } else { perror("sched_setaffinity"); } } else { perror("sched_getaffinity"); return -1; } return 0; } void test () { int i, j; // counter variable struct timeval begin_time, end_time; // variables to record time from gettimeofday() double total; // ***** Use gettimeofday() to record time taken ***** total = 0; for (i = 0; i < 15; i++) { gettimeofday(&begin_time, 0); for (j = 0; j < 1000; j++); gettimeofday (&end_time, 0); printf("\nCounting to 1000 took: %ld microseconds ", end_time.tv_usec - begin_time.tv_usec); total += (end_time.tv_usec - begin_time.tv_usec); } printf("\nAverage time taken: %.3f microseconds \n", total / 15); } The output I get is: ==================== The affinity mask for current process is: 00000001 Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002) Counting to 1000 took: 20 microseconds Counting to 1000 took: 19 microseconds Counting to 1000 took: 20 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 19 microseconds Counting to 1000 took: 20 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 19 microseconds Counting to 1000 took: 19 microseconds Counting to 1000 took: 19 microseconds Counting to 1000 took: 9 microseconds Average time taken: 14.533 microseconds I think that something is causing the values to vary quite a bit (the result is either 9 or 19~20)... I am using a laptop with Debian 5.0.6 (1GB memory, 2 processors at T5500@1.66GHz each). Before I ran the above program, I did the following: 1. Set the default affinity of all process's on boot to cpu 0 by adding: /bin/taskset -p 1 1 /bin/taskset -p 1 $$ to the top of /etc/init.d/rcS 2. Rebooted 3. Used System monitor to check that utilization of CPU2 is 0% 4. Made sure the laptop was not connected to any network I am at a loss to find out what is causing this variation :( ... any suggestions?? Thanks btw, my /proc/interrupts file is: sana@HAIL:~$ cat /proc/interrupts CPU0 CPU1 0: 423461 0 IO-APIC-edge timer 1: 14926 0 IO-APIC-edge i8042 8: 5 0 IO-APIC-edge rtc0 9: 1 0 IO-APIC-fasteoi acpi 12: 123 0 IO-APIC-edge i8042 14: 35463 0 IO-APIC-edge ata_piix 15: 42935 0 IO-APIC-edge ata_piix 16: 0 0 IO-APIC-fasteoi i915@pci:0000:00:02.0 17: 40464 0 IO-APIC-fasteoi eth0 18: 12 0 IO-APIC-fasteoi mmc0 19: 2 0 IO-APIC-fasteoi ohci1394 20: 3 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5 21: 407800 0 IO-APIC-fasteoi uhci_hcd:usb2, HDA Intel 22: 45979 0 IO-APIC-fasteoi uhci_hcd:usb3 23: 0 0 IO-APIC-fasteoi uhci_hcd:usb4 NMI: 0 0 Non-maskable interrupts LOC: 177460 215150 Local timer interrupts RES: 5905 38820 Rescheduling interrupts CAL: 85 28373 function call interrupts TLB: 9 11 TLB shootdowns TRM: 0 0 Thermal event interrupts SPU: 0 0 Spurious interrupts ERR: 0 MIS: 0 sana@HAIL:~$ |
I ran it on my laptop with Fedora 14 (1GB memory, 2 processors at T7250 @2.0GHz each).
Code:
The affinity mask for current process is: 00000003 02:07:11 up 6 days, 10:12, 4 users, load average: 0.03, 0.11, 0.13 I wonder if you have something running that's swapping memory. Does /proc/meminfo show SwapCached as 0KB and SwapTotal == SwapFree? |
There are lots of things that can cause variation. Is your system "tick-less" (ie: CONFIG_NO_HZ=y)? Does your CPU have lots of SMIs? How are you getting your time information (where does gettimeofday get its info from)?
One quick fix I'd make to your script is to use clock_gettime() with CLOCK_MONOTONIC from the rt library (link with -lrt). With gettimeofday(), if some task (say ntpd) is updating your timing source, the information might be bad. One other thing to check: you seem to have a lot of rescheduling on core #1. Perhaps set the scheduling to sched_fifo / sched_rr and a prio of 77 (I just picked that out of thin air - "most prioritized" is 99). That will ensure that your software isn't getting preempted by any kind of software interrupt handler. |
Re
Sorri, for the delay. Now to answer your questions...
@KenJackson /proc/meminfo shows: SwapCached: 0kB and yes SwapTotal is equal to SwapFree SwapTotal: 2698880 kB SwapFree: 2698880 kB @orgcandman Seems like my system is ‘tick-less’ (i.e. CONFIG_NO_HZ=y). grep HZ /boot/config-`uname –r` gives: CONFIG_NO_HZ=y # CONFIG_NO_HZ_100 is not set CONFIG_HZ_250=y # CONFIG_HZ_300 is not set # CONFIG_HZ_1000 is not set CONFIG_HZ=250 CONFIG_MACHZ=m I tried clock_gettime() as you suggested. This is the body of the new test(): int i, j; struct timespec begin_time, end_time; double total = 0; for (i = 0; i < 15; i++) { clock_gettime(CLOCK_MONOTONIC, &begin_time); for (j = 0; j < 1000; j++ ); clock_gettime(CLOCK_MONOTONIC, &end_time); printf(“\nCounting to 1000 took: %ld nanoseconds”, end_time.tv_nsec – begin_time.tv_nsec); total += end_time.tv_nsec – begin_time.tv_nsec; } printf(“\nAverage time taken: %.3f nanoseconds \n”, total / 15); but again the output varies Counting to 1000 took: 9219 nanoseconds Counting to 1000 took: 9079 nanoseconds Counting to 1000 took: 8940 nanoseconds Counting to 1000 took: 19486 nanoseconds Counting to 1000 took: 19486 nanoseconds Counting to 1000 took: 19556 nanoseconds Counting to 1000 took: 9010 nanoseconds Counting to 1000 took: 9080 nanoseconds Counting to 1000 took: 9079 nanoseconds Counting to 1000 took: 19276 nanoseconds Counting to 1000 took: 20115 nanoseconds Counting to 1000 took: 19486 nanoseconds Counting to 1000 took: 9010 nanoseconds Counting to 1000 took: 9079 nanoseconds Counting to 1000 took: 9009 nanoseconds Average time taken: 13260.667 nanoseconds Btw, gettimeofday() is defined in sys/time.h and http://www.manpagez.com/man/2/gettimeofday/ I read somewhere that the configuration option CONFIG_PREEMPT in the 2.6 kernel can help to get soft real-time performance. Apparently, it is not set... grep PREE /boot/config-`uname –r` gives: CONFIG_PREEMPT_NOTIFIERS=y CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set:confused: Btw, uname –a gives: Linux HAIL 2.6.26-2-686 #1 SMP Mon Aug 30 07:01:57 UTC 2010 i686 GNU/Linux I am really running out of ideas... any suggestions? :confused: thanxs for all your help... |
More updates...
I decided to run the code on a desktop with Fedora 13 and get much better output: [sana@localhost Benchmarking-Example]$ ./application The affinity mask for current process is: 00000003 Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002) Counting to 1000 took: 4 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 4 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 3 microseconds Counting to 1000 took: 3 microseconds Average time taken: 3.467 microseconds [sana@localhost Benchmarking-Example]$ The biggest difference I can see is that CONFIG_HZ is set to 1000 on the desktop and 250 on the laptop. Could this be the cause of the variation? [sana@localhost ~] $ grep HZ /boot/config-2.6.27.41-170.2.117.fc10.i686 CONFIG_NO_HZ=y # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000 CONFIG_MACHZ_WDT=m [sana@localhost ~]$ Is is possible to change the value of CONFIG_HZ on the laptop to 1000? |
Or better still... can I set CONFIG_NO_HZ to n? Would that improve reduce the variation even more? I am not sure how this it is to be done... any ideas?
thanks |
I'd do:
CONFIG_NO_HZ=n CONFIG_HZ=1000 CONFIG_HZ_1000=y Should do much better for performance at a loss of battery life tradeoff. Additionally, if you can turn on voluntary kernel preemption, preempting the Big Kernel Lock, you might get even better performance. |
Assuming you used -O0 (no optimization, otherwise the loop may be optimized away) it's just scheduling noise and cache latencies.
Wait for a timer tick (by waiting until user time in getrusage() changes), before starting the measurement to minimize scheduling noise. Try the following code: Code:
#include <sys/time.h>
Hope this helps, Nominal Animal |
Hi,
I recompiled the kernel on my laptop, so now: sana@HAIL:~/test$ grep HZ /boot/config-2.6.26 # CONFIG_NO_HZ is not set # CONFIG_HZ_100 is not set # CONFIG_HZ_250 is not set # CONFIG_HZ_300 is not set CONFIG_HZ_1000=y CONFIG_HZ=1000 CONFIG_MACHZ_WDT=m sana@HAIL:~/test$ I ran the program (the one in my 1st post in this thread) twice: sana@HAIL:~/test$ ./application The affinity mask for current process is: 00000001 Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002) Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 8 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Average time taken: 8.933 microseconds sana@HAIL:~/test$ ./application The affinity mask for current process is: 00000001 Checking that affinity mask has been set to 2 ... yes (new affinity is: 00000002) Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 10 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 8 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Counting to 1000 took: 9 microseconds Average time taken: 9.000 microseconds sana@HAIL:~/test$ There seems to be much less variation now... Btw, I chose the Preemption Model as 'Preemptible kernel (Low-Latency Desktop)' during menuconfig. So, now: sana@HAIL:~/test$ grep PREE /boot/config-2.6.26 CONFIG_PREEMPT_NOTIFIERS=y # CONFIG_PREEMPT_NONE is not set # CONFIG_PREEMPT_VOLUNTARY is not set CONFIG_PREEMPT=y # CONFIG_PREEMPT_RCU is not set CONFIG_DEBUG_PREEMPT=y sana@HAIL:~/test$ Any ideas if chosing 'Voluntary Kernel Preemption' instead improve results? thank you all for your help :hattip: |
Your results won't improve much beyond what they are now. Tickless systems are awful for scheduling latency since they go to sleep (which means they're GREAT for battery life). As you've noticed. You'll also notice things like select waits taking MUCH longer than they should with a tickless system.
|
All times are GMT -5. The time now is 09:47 AM. |