Simultaneous multithreading support?
I was looking around on Wikipedia, and ran across the UltraSPARC T2 processor. This is a processor with 8 cores (8 physical processors) and 8 threads per core (64 logical processors!)
I'm aware of Intel's Hyper-Threading and other technologies, and I know collisions between the threads have the possibility of decreasing the processor's throughput.
From what I know, the O(1) scheduler (and surely the current CFS scheduler) is aware of Hyper-Threading and is careful with processor affinity for threads: it tries to give every physical processor one thread before giving any of them a second, and it tries to keep processes that must move between logical processors on the same physical processor.
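To make the "fill every physical processor first" policy concrete, here is a toy sketch in Python. It is only an illustration of the placement order I have in mind, not the kernel's actual scheduler logic; the function name `place_tasks` and its parameters are made up for this example.

```python
def place_tasks(num_tasks, cores, threads_per_core):
    """Assign tasks to (core, hardware_thread) slots, spreading across cores first.

    Hypothetical helper: every core receives one task before any core
    receives a second one. Tasks beyond the number of hardware threads
    are simply left unplaced in this sketch.
    """
    capacity = cores * threads_per_core
    slots = []
    for i in range(min(num_tasks, capacity)):
        core = i % cores          # walk across the cores first
        hw_thread = i // cores    # only then move to the next hardware thread
        slots.append((core, hw_thread))
    return slots
```

With 8 cores and 8 threads per core, the first 8 tasks land on hardware thread 0 of cores 0 through 7, and only the ninth task shares a core with another task.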
However, although that certainly helps somewhat, I believe the largest gain can come if the compiler is allowed to re-arrange instructions, so that more than two threads can be run at once without collisions in one physical processor. For this to happen, I believe we'd need to come up with a new executable format that can store multiple threads in one file, so the scheduler can schedule them as one block.
For example, with the current scheduler and executable formats, the UltraSPARC would appear as 64 logical processors. However, if we consider a system that runs only thread-aware programs, the scheduler might instead treat it as 8 logical processors and schedule eight processes at a time. Since only one process would exist on each physical processor, the compiler would be able to optimize and significantly reduce resource contention (which I mention below.)
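As a rough sketch of the "64 logical processors seen as 8 cores" view: Linux exposes, for each logical CPU, a `topology/thread_siblings_list` file under `/sys/devices/system/cpu/cpuN/` listing the logical CPUs that share its core. The Python below groups logical CPUs by that list. The helper names are my own, and the numbering in the usage note is an assumed layout, not something I have checked on a real T2.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-7' or '0,4' into a frozenset of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def group_by_core(sibling_lists):
    """Collapse {cpu_id: sibling_list_text} into one frozenset per physical core."""
    cores = {parse_cpu_list(text) for text in sibling_lists.values()}
    return sorted(cores, key=min)


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every logical CPU found under root."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        cpu_dir = path.split(os.sep)[-3]   # e.g. 'cpu12'
        with open(path) as f:
            result[int(cpu_dir[3:])] = f.read()
    return result
```

Calling `group_by_core(read_sibling_lists())` on a Linux machine returns one set per physical core. If a T2 numbered its hardware threads consecutively (an assumption), that would be 8 sets of 8 logical CPUs each.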
We can look at one physical processor with 8 threads of execution (or any number >2.) If you run 8 different processes, as Linux would currently do, there would likely be a lot of resource contention. My guess is that most of the time, several threads would be waiting for the unit they need to become available. I think this is supported by a figure I saw from Sun (I forget where): the correct optimization settings on their compiler can yield a 300% speedup on a practical workload.
Is there any infrastructure available for Linux that supports processors like the UltraSPARC T2, that have a high number of hardware threads per core? Also, I know this would be a large task to implement, and there would be hurdles (how do you mix multithreaded processes with single-threaded processes on one machine?), and I am wondering if others think there would be enough benefit to justify the effort.
Are there any further thoughts on this?