Originally Posted by iam_techno
I have just recently expanded my programming purview by learning, albeit slowly, assembly programming.
I have a Phenom x4 that has four cores.
I know that it supports SSE-SSE4a instructions.
Some of the operations (like SQRTPS) can operate on 4 items at a time.
Because it is 64-bit capable, it has 16 XMM registers.
So my question is this: Does each core of my Phenom have its own registers or are the registers shared? The reason I ask is that the other day I was thinking about this application I'd like to write that would benefit from being able to perform many simultaneous square root operations. So what I was thinking about doing was coding an example that used this algorithm:
1. Load numbers from source into a vector
2. Use OpenMP to create 4 threads (one for each core), each doing:
   a. Load up the 16 XMM registers with packed numbers read from the vector
   b. Perform SQRTPS on each XMM register
   c. Write the resultant numbers into another vector
3. Output the resultant vector
Each core should have its own full set of registers, unless AMD is running some kind of quiet scam on all of us. The only alternative would be for them to push register data in and out of memory constantly, and the performance hit from that would be dramatic enough that we would all have noticed by now.
I believe some of the Intel chips have "virtual cores" (Hyper-Threading), which means that some of the cores the OS sees don't actually exist as separate hardware, but I don't think that is the case for a Phenom.
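For what it's worth, here is a rough sketch in C of the algorithm from the quoted post, using SSE intrinsics instead of hand-written assembly. The function name sqrt_all is made up for illustration, and num_threads(4) simply assumes one thread per core on a quad-core chip. The compiler decides which XMM registers to use, so this does not pin all 16 of them by hand, but each thread still works with its own register state.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_sqrt_ps, ... */

/* Hypothetical helper: out[i] = sqrt(in[i]) for every i in [0, n). */
void sqrt_all(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);

    long blocks = (long)(n / 4);  /* number of full groups of 4 floats */

    /* Split the groups across 4 threads (assumption: one per core).
       Every thread has its own XMM register state, so nothing is shared
       except the input and output arrays, and each thread writes to a
       different part of the output. */
    #pragma omp parallel for num_threads(4)
    for (long b = 0; b < blocks; ++b) {
        __m128 v = _mm_loadu_ps(in + 4 * b);          /* load 4 packed floats */
        _mm_storeu_ps(out + 4 * b, _mm_sqrt_ps(v));   /* SQRTPS, then store   */
    }

    /* Handle the 0 to 3 leftover elements one at a time with scalar SQRTSS. */
    for (size_t i = (size_t)blocks * 4; i < n; ++i)
        out[i] = _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(in[i])));
}
```

With GCC this should build with something like gcc -O2 -fopenmp. Without -fopenmp the pragma is ignored and the loop just runs on one thread, which is a handy way to compare timings.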