[SOLVED] why: the same program runs multiple times but get very different results
I used wait4 to time a child process, but the results differ from each other dramatically. Why?
My core source code is listed below.
To the point: on a Linux 2.6 kernel, how do I time a process with high precision, and measure only its user time rather than the elapsed time?
Code:
static struct rusage ruse;
static pid_t u_pid;
int main(int argc, char *argv[])
{
int i;
for (i = 0; i < 10; i++) {
u_pid = fork();
and my result:
1.08807
1.06407
1.08807
1.05607
1.06807
1.07607
1.06807
1.07607
1.07207
1.08407
Why do you call the differences dramatic, i.e. which differences would you consider to be non-dramatic? Justify your calculation of non-dramatic differences and your expectation with respect to the results in general.
The differences are about 10 ms, and I need a precision of 0.5 ms. In my opinion, the time elapsed in user space can be obtained from rusage, but it does not work now. Why not? And how do I measure the time a program spends executing in system or user space with 0.5 ms precision?
Last edited by hillgreen; 12-09-2010 at 05:07 AM.
Reason: more info
Quote:
The differences are about 10 ms, and I need a precision of 0.5 ms. In my opinion, the time elapsed in user space can be obtained from rusage, but it does not work now. Why not? And how do I measure the time a program spends executing in system or user space with 0.5 ms precision?
Do you understand how your computer, including the CPU, works? Do you know which major parts the system consists of? Have you ever heard the words "cache" and "DRAM"?
Quote:
I used wait4 to time a child process, but the results differ from each other dramatically. Why?
They are all 1.0-something or another. I agree with Sergei that this is not dramatic. It may not meet your requirements, but that is a rather different issue.
Did you actually look at your data? All of your data points (from this run) end in ...07; in fact in 207, 407, 607 and 807. Those will be completely random numbers, of course.
Looking at that pattern, you could come to some conclusion about what can and cannot be captured as a result of these runs.
To the OP: you apparently have an expectation of constant execution time, and this expectation is wrong regardless of timer accuracy. That's why I wrote:
Quote:
Justify your calculation of non-dramatic differences and your expectation with respect to the results in general.
The execution time probably is constant, but what the OP measured in the program was the total wall-clock time to execute it, which of course includes context switches and the other things the system does during that time.
What the OP apparently wants is to determine the total CPU time taken by the specific child process.
Take a look at getrusage(). Also look at /proc/stat. The answers you want are in those areas.
No, it is not: by the construction of modern HW in the first place, by context switches, and by other HW processes.
Please try to pay attention, and read carefully and correctly.
The execution time probably is constant. Every time the program "anti" is run...whatever anti is...it probably runs in the same number of CPU cycles and bus cycles. This is almost certainly true. Period.
Now, if you had taken the time to pay attention and actually read the rest of the sentence I wrote - which you clearly did not because you only clipped the first portion of it, suggesting that your attention span is only that long - you would have seen that I stated that OP was actually getting the wall clock time which included things like context switches.
I've seen you do this too many times. And on this particular thread, you have done it again. And following your usual pattern you took a patronizing tone with OP while NOT presenting a solution. You will note that I did tell the OP where to look for a solution.
Remember one thing, Junior. No matter how much you think you know, there's bound to be someone around who knows more than you do. And in this case, on this topic, I'm that someone.
Sergei is absolutely correct - timings WON'T be accurate to the microsecond. Or probably not even to the millisecond.
@hillgreen -
Here are a couple of other quick/easy ways to get "timings". It might be interesting to compare the results. And - equally interesting - to compare the variance between successive runs.
1. Use "time" in a shell script
Code:
time anti
time anti
time anti
time anti
time anti
time anti
time anti
2. Remember -
a) Linux is NOT a "real-time operating system" (and, of course, neither is Windows)
b) The whole point of "real time" is NOT "instantaneous" or "as fast as possible"...
... rather, the point is "deterministic"
c) And (at the risk of duplicate redundancy) - Linux is NOT "deterministic"
Quote:
Sergei is absolutely correct - timings WON'T be accurate to the microsecond. Or probably not even to the millisecond.
CPU time will be. Wall clock time won't be. The time spent executing is CPU time. The time from start to finish is wall clock time.
Quote:
@hillgreen -
Here are a couple of other quick/easy ways to get "timings". It might be interesting to compare the results. And - equally interesting - to compare the variance between successive runs.
1. Use "time" in a shell script
Code:
time anti
time anti
time anti
time anti
time anti
time anti
time anti
You apparently haven't answered these questions. So, here are my answers.
Modern computers typically use DRAM. The "D" stands for "dynamic", meaning capacitive storage elements. Capacitive storage elements need refresh cycles. Refresh is implemented as a completely independent HW process, i.e. there is a piece of HW doing it. While a refresh cycle is in progress, the DRAM is inaccessible, i.e. a read or write operation takes more time if it hits a refresh cycle.
Modern CPUs have caches, and a CPU's job is to execute instructions and, while doing this, to load and store data. Instruction fetches and data load/store operations can go to/from cache, in which case they are fast, or to/from RAM, in which case they are (IIRC) ten(s) of times slower. Whenever the OS switches tasks, the cache is essentially repopulated, so task switching does cause variations in execution time.
Modern (and not only modern) computers use DMA extensively. DMA is yet another HW process. If a DMA transfer is in progress, RAM (at least the bank affected by the DMA) is inaccessible to the CPU, so to the CPU it looks like slower RAM.
As I have already written many times, taking all this into consideration, execution time is not constant. Because it simply can't be.
...
P.S. I took part in the development of the DMA block of a pretty complex communications chip; the chip also contained a CPU. It was my day-to-day job to perform HDL simulations of the chip, carefully watching bus activity. DMA typically has higher bus priority than the CPU - a network packet can't wait, that's why, for example.
Do you know what? That's all true. I won't argue with any of it. But I will put it into context.
If a process has to wait on a memory refresh (which could happen) the process will be marked "not ready", the processor will do a context switch, and therefore the process is not executing. So the time charged against it is wall clock time, not processor time.
As for processor caching, the devil is in the details. I will agree that if process B is waiting and process A winds up using all the onboard cache, then when process B gets the processor back it will have to wait for fetches from RAM. And this will cause some variance in the CPU time associated with the execution, which will be charged against the process since the wait states associated with waiting on RAM are charged against the process. However, even in worst case, the time variance associated with doing this will be a very very small fraction of the total time variance associated with measuring wall clock time, which includes context switches and waits for I/O and so forth.
Similarly, extensive DMA can affect RAM access times, but often enough that will cause a CPU context switch if the data or instructions are not already cached, and again the time is not charged against the process.
So. What you say is true. But it has a significant impact only at a scale that is ordinarily well below the one at which the OP is working, and it is at best a trivial contributor to the variance the OP was asking about, presuming the OP's computer is reasonably modern. How trivial? Very hard to say. But if I had to take a whack at it, I'd place it somewhere on the order of a thousandth of a percent or less of the OP's indicated variance on a modern PC-class computer. That's just a WAG; if you can present real numbers, please do so.
Quote:
If a process has to wait on a memory refresh (which could happen) the process will be marked "not ready", the processor will do a context switch, and therefore the process is not executing. So the time charged against it is wall clock time, not processor time.
...
Nonsense. The CPU has no notion of memory refresh. While in wait cycles, the CPU can't do anything externally: for example, it can't push the current task's registers onto the stack, and it can't process an interrupt.
To the same extent the CPU has no knowledge about DMA.
...
Spend about $100 or something like that (well, more - you'll need an oscilloscope too), buy a cheap development board for a simple controller, preferably one without a pipeline, and do something on bare metal from scratch, i.e. write your own BIOS first.
For me it was quite revealing in the late eighties to develop an in-circuit emulator and to use it to debug HW and SW.
Last edited by Sergei Steshenko; 12-09-2010 at 03:45 PM.
Quote:
As I have already written many times, taking all this into consideration, execution time is not constant. Because it simply can't be.
Quote:
Spend about $100 or something like that (well, more - you'll need an oscilloscope too), buy a cheap development board for a simple controller - preferably without pipeline, and do something on bare metal from scratch. I.e. write your own BIOS first.
Or, even better:
* Take your favorite (CPU- and memory- intensive) program, and time it any three or four or five ways you like.
* Run it ten or 100 or 1000 times for each timing method you choose.
* Compute the variation for each method.
I'll bet your timings might be a lot closer than hillgreen's (with his fork()s and wait()s, which introduce a HUGE amount of latency). But I seriously doubt they'll consistently line up to the nearest millisecond, either.
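A sketch of how that experiment might be scripted (assuming GNU time is installed at /usr/bin/time, since the -f format flag is GNU-specific; the OP would substitute his own program, such as ./anti, for /bin/true, which is used here only as a stand-in):

```shell
# Run a command N times and report the mean and variance of the
# user CPU time (%U) reported by GNU time.
measure() {
    cmd=$1; n=$2
    for i in $(seq "$n"); do
        /usr/bin/time -f '%U' "$cmd" 2>&1 >/dev/null
    done | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            mean = sum / n
            printf "runs=%d mean=%.4fs variance=%.6f\n",
                   n, mean, sumsq / n - mean * mean
        }'
}

# Stand-in workload; replace with your own program, e.g. "measure ./anti 100".
measure /bin/true 10
```

Swapping '%U' for '%e' in the format string would measure wall-clock time instead, which makes for an instructive comparison of the two variances.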