Linux - Kernel: This forum is for all discussion relating to the Linux kernel.
Hi
Using C, I need to get the idle time of a thread on a multi-socket system. Given that this is a multi-core setup, solutions based on the difference between wall time and the sum of user and kernel time will not suffice. I am wondering if there are any functions, similar to getrusage, or any /proc file, similar to /proc/pid/stat, that could be used?
Cheers,
Not yet.
There are facilities becoming available (in 4.9) that make it easier to trace entry to and exit from kernel functions (including the scheduler), from which this could be determined. Currently, tools have to be hand-written to place intercepts and do the math.
Probably not something you'd want to turn on in a production or high-load environment anyway.
Generally, when instrumenting a system for telemetry, I suggest that you build measures into your code (without resorting to any kernel trickery), and measure not "thread idle time," but rather something that is usefully related to what your system is designed to do, as seen by the clients it is designed to do it for.
And, instead of simply gathering numbers, gather binomial tests. For instance: "90% of the time, this process should service a request within 0.02 seconds." Or, "90% of the time, this process should achieve a throughput rate of 10,000 requests per second or more, as measured over a 5-second sampling interval." And so on.
The processes simply capture statistics in counters. A separate, low-priority thread periodically gathers up the values of those counters and records them, so that the act of gathering experimental data does not affect the experiment. The data streams are analyzed off-line using classic statistical-analysis techniques and tools.
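The counter-plus-sampler pattern described above might be sketched like this in C11. All names here (req_total, record_request, sampler) are illustrative, not from any particular library:

```c
/* Sketch of the counter-plus-sampler pattern: worker threads only
   increment atomic counters on the hot path; a separate low-priority
   thread periodically snapshots and resets them for off-line analysis. */
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_ulong req_total;    /* requests serviced this interval   */
static atomic_ulong req_on_time;  /* of those, serviced within deadline */

/* Called on the hot path: one relaxed increment, essentially free. */
void record_request(int met_deadline)
{
    atomic_fetch_add_explicit(&req_total, 1, memory_order_relaxed);
    if (met_deadline)
        atomic_fetch_add_explicit(&req_on_time, 1, memory_order_relaxed);
}

/* Low-priority sampler thread: swap the counters out and log them. */
void *sampler(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(5);  /* sampling interval */
        unsigned long total = atomic_exchange(&req_total, 0);
        unsigned long ok    = atomic_exchange(&req_on_time, 0);
        fprintf(stderr, "interval: %lu requests, %lu on time\n", total, ok);
    }
    return NULL;
}
```

The sampler would be launched once at startup with something like pthread_create(&tid, NULL, sampler, NULL), ideally at reduced priority so it never competes with the workers.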
Any procedure that seeks to analyze other statistics, such as thread idle time, cannot see the factors that might substantially affect those numbers, to the point of making them apples and oranges: other processes and threads.
Last edited by sundialsvcs; 11-30-2016 at 03:06 PM.
Very "experienced" observation. Thank you for sharing.
I am trying to establish, for our application (which has soft real-time requirements), whether we should move to the RT patch, pin tasks to specific NUMA sockets, bind to cores, etc. As part of that effort, I need to know the average and variance of:
- Number of cores used, per thread
- Thread preemption time
- Number of page faults
...
Thread sleep time was a way to get average CPU utilisation, by comparing wall time against user and kernel time ...
(There is a back-end ELK setup that digests all these among other data points, to spot anomalous conditions.)
Again, thank you for sharing.
Cheers
Better yet: "how many times did the thread achieve its soft-real-time requirements?" That's what you really want to know!
If the incoming request has a timestamp, generated by some other system whose clock is known to be aligned with yours, then you can compare it with the current time when you dequeue that request and determine the latency. Record these latencies, say, in a simple circular in-memory buffer.
Periodically, another thread takes (say ...) 20 random samples from that buffer and records them to an external file for analysis: (latency, request timestamp).
During off-line post-processing, any records with an identical timestamp and latency are presumed to be duplicates; these are eliminated, and the rest analyzed.
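The in-memory ring buffer plus random sampling described above might look like this. All names (record_latency, dump_random_samples, RING_SIZE) are illustrative:

```c
/* Sketch: fixed-size ring of (timestamp, latency) pairs.  The hot path
   overwrites the oldest slot with no locking; a torn read in the
   sampler just becomes one more "duplicate" discarded off-line. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RING_SIZE 4096

struct sample { uint64_t request_ts; double latency; };

static struct sample ring[RING_SIZE];
static unsigned long ring_head;

/* Hot path: store one measurement, overwriting the oldest. */
void record_latency(uint64_t request_ts, double latency)
{
    struct sample *s = &ring[ring_head++ % RING_SIZE];
    s->request_ts = request_ts;
    s->latency    = latency;
}

/* Periodic step: emit n randomly chosen slots for off-line analysis. */
void dump_random_samples(FILE *out, int n)
{
    for (int i = 0; i < n; i++) {
        struct sample s = ring[rand() % RING_SIZE];
        fprintf(out, "%llu %.6f\n",
                (unsigned long long)s.request_ts, s.latency);
    }
}
```

Random slots rather than "the newest 20" avoids biasing the sample toward whatever burst happened just before the sampler woke up.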
At any point in time, the clocks of the two systems could drift, and you need to be reasonably aware of this in calculating whether or not a particular request in your sample did, or did not, meet the soft-latency requirement.
It is also useful to install similar instrumentation on the client side, since they can evaluate the "round trip."
Consider adding four timestamp fields to the packet: the client's time when the request was posted, the server's time when dequeued, the server's time when replied, and the client's time when dequeued. Two of these are understood to be "client's clock," and two the "server's clock."
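The four-timestamp header suggested above could be laid out like this (field and function names are illustrative). The key property is that each useful difference stays within a single clock domain, so drift between the two clocks cancels out:

```c
/* Sketch of a four-timestamp packet header.  Two fields are in the
   client's clock domain and two in the server's; only same-domain
   differences are directly meaningful. */
#include <stdint.h>

struct timing_header {
    uint64_t client_send_ns;  /* client clock: request posted   */
    uint64_t server_recv_ns;  /* server clock: request dequeued */
    uint64_t server_send_ns;  /* server clock: reply sent       */
    uint64_t client_recv_ns;  /* client clock: reply dequeued   */
};

/* Server-side service time: both endpoints on the server's clock. */
uint64_t service_time_ns(const struct timing_header *h)
{
    return h->server_send_ns - h->server_recv_ns;
}

/* Round trip as seen by the client: both on the client's clock. */
uint64_t round_trip_ns(const struct timing_header *h)
{
    return h->client_recv_ns - h->client_send_ns;
}

/* Network plus queueing overhead: round trip minus service time.
   Clock drift between the domains cancels, because each difference
   above is taken within one clock domain. */
uint64_t overhead_ns(const struct timing_header *h)
{
    return round_trip_ns(h) - service_time_ns(h);
}
```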
I cordially suggest that the various things you propose to measure, although they are indeed measurable, will be found to have no direct, useful, consistent correlation to your objective. These factors are "beyond your control," so to speak. They're not directly correlated to "the activities of this process."
If you discover that the system is missing its deadlines, then you can add other instrumentation to it. For instance, once a minute it measures its own timing: "how much CPU time did I use?" "how long was I in such-and-such wait?" Since you know your own process-id, /proc/your_pid/... contains a cornucopia of information that you could have the process periodically capture about itself. You're particularly interested in involuntary wait statistics: where the process could have been runnable, but was blocked for some reason external to itself.
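That self-sampling can be sketched with getrusage(), whose rusage structure does report voluntary versus involuntary context-switch counts (involuntary_switches is an illustrative name):

```c
/* Sketch: a process sampling its own scheduling behaviour via
   getrusage().  ru_nivcsw counts involuntary context switches: times
   the process was runnable but preempted, which is exactly the
   "involuntary wait" signal described above. */
#include <sys/resource.h>

/* Returns the process's involuntary context-switch count so far,
   or -1 on error. */
long involuntary_switches(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_nivcsw;
}
```

On Linux, RUSAGE_THREAD (with _GNU_SOURCE defined) narrows the same counters to the calling thread, which matches the per-thread focus of this discussion.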
The process is, by various means, observing itself: recording, for off-line analysis, information that can determine whether it is fulfilling its mission and, if not, what might be affecting it. From these data you can compute means, and then particularly examine standard deviation and other measures of variance. Likewise, you might look for statistical correlations, although you must be very careful when comparing binomials ("did you, or did you not, meet the deadline this time?") with continuous data such as system dispatcher statistics.
Certain things you simply don't care about, such as "cores." At any moment, any free core will do nicely, but no process has control over that.
You really don't care how Linux chose to dispatch the work: you care if it succeeded in doing what you needed, and if not, why not.