LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   How that is possible? CPU&memory (https://www.linuxquestions.org/questions/linux-newbie-8/how-that-is-possible-cpu-and-memory-4175446910/)

yaximik 01-23-2013 11:15 AM

How is that possible? CPU & memory
 
:confused: with a simple question...
My RHEL 5.8 box has two dual core Xeon CPUs with a total of 16 processors, which is confirmed by
Code:

grep processor /proc/cpuinfo
The box also has 12 slots, each populated with 8GB, that is 96GB.

I thought that each processor has its own physical memory chunk, that is 96/16 = 6GB, which is less than 8GB.

In other words, how are the 12 memory banks divided between 16 cores?

johnsfine 01-23-2013 11:39 AM

Each CPU (package) has direct access to some of the memory slots. But each CPU can access the rest of the memory indirectly through the CPU that has direct access.

That indirect access is completely transparent to the program. Each byte of ram has a system-wide unique physical address and the software simply accesses the memory (through the usual virtual to physical translation of course) and the access reaches the correct memory directly or indirectly.

My newer (obviously not very new) system at work (a Dell T5500) has 9 memory slots for two physical CPUs. One CPU has direct access to 6 memory slots and one CPU has direct access to 3 memory slots. That CPU type is designed to access three memory slots in parallel. So one CPU has just the 3 it can access in parallel, while the other has two sets of 3.

Quote:

Originally Posted by yaximik (Post 4876216)
two dual core Xeon CPUs with a total of 16 processors

I'm missing at least a factor of two somewhere. Two physical CPUs. Two cores per CPU?? That seems like very old technology! But you have a modern amount of ram. Anyway, a total of four cores. If you had hyperthreading enabled, that could show up as eight processors. But you said sixteen.

Did you mean two quad core with hyperthreading? Or did you mean two eight core packages? Or what?

Quote:

Originally Posted by yaximik (Post 4876216)
I thought that each processor has its own physical memory chunk, that is 96/16 = 6GB,

That is your main confusion. In AMD designs and newer Intel designs, each CPU package has direct access to part of the memory as I described above. (In older Intel designs, both CPU packages had equal access to all of the ram).

The cores or hyperthread processors within a CPU package all have equal access to whatever ram that package can access.

I don't know whether your twelve sticks of ram are evenly divided or unevenly divided (or undivided) between the two CPU packages.

But all that has only secondary impact on how memory is used by each process in Linux. An OS that is aware of direct vs. indirect access to ram might try to assign a process to the CPU package with better access to most of that process's ram. It also (and more easily) might try to use ram better accessed from the package where the process is running to fill any new requests from that process. But to the extent such efforts fail, all the ram is still accessible from processes in either package. That access just might be a little slower.
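On Linux, the direct-vs-indirect (NUMA) layout described above can be inspected and, if desired, overridden from the shell; a sketch assuming the numactl package is installed (myprog is a placeholder, not a program from this thread):

```shell
# Sketch: inspecting and steering NUMA placement (assumes the numactl
# package is installed; "myprog" is a placeholder).
#
#   numactl --hardware                            # nodes, their CPUs, per-node memory
#   numastat                                      # per-node allocation hit/miss counters
#   numactl --cpunodebind=0 --membind=0 ./myprog  # pin a process and its memory to node 0
#
# The arithmetic described above: 12 sticks of 8GB split evenly between two
# packages means each node directly owns 48GB of the 96GB total.
sticks=12; gb_per_stick=8; nodes=2
per_node=$(( sticks * gb_per_stick / nodes ))
echo "each node: ${per_node}GB of $(( sticks * gb_per_stick ))GB total"
```

Binding is rarely needed; as noted above, the OS already tries to place a process's memory on the node it runs on.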

shivaa 01-23-2013 11:51 AM

Can you share the result of:
Code:

~$ grep 'model name' /proc/cpuinfo
OR
~$ grep 'model name' /proc/cpuinfo | wc -l
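Counting "model name" lines gives the logical-processor total but not how it decomposes into packages, cores, and hyperthreads. A reproducible sketch that parses an embedded two-entry sample (on a real box, substitute `cat /proc/cpuinfo` for the here-string):

```shell
# Sketch: decomposing the processor count into packages, cores, and threads.
# Parses an embedded two-entry sample so the output is fixed; feed it
# "cat /proc/cpuinfo" on a real machine.
sample='processor	: 0
physical id	: 0
core id	: 0
processor	: 1
physical id	: 0
core id	: 0'
echo "$sample" | grep -c '^processor'                   # logical CPUs -> 2
echo "$sample" | grep '^physical id' | sort -u | wc -l  # packages     -> 1
echo "$sample" | grep -E '^(physical id|core id)' | paste - - | sort -u | wc -l
                                                        # distinct cores -> 1
```

In the sample, 2 logical CPUs on 1 core of 1 package means the pair are hyperthread siblings.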


yaximik 01-24-2013 08:34 AM

Quote:

I'm missing at least a factor of two somewhere.
Quote:

~$ grep 'model name' /proc/cpuinfo
I was wrong indeed. From the command above I got 16 entries
Code:

Intel Xeon CPU E5420 @ 2.4 GHz
The Tech Guide for the Dell T610 says it is a 4-core processor, so as I have two of them the total is 8 cores - but how come the OS lists 16?

I guess the 12 banks are divided between the 8 physical cores: 6 banks per quad-core processor, or per each of the 2 CPU packages. Still a puzzling relationship.

The reason I got interested in this is that I am looking for ways to speed up computation using programs that split a task to run it in parallel on multiple cores. The question is how much memory each core will have - or should this not be my concern at all, as the OS will take care of memory distribution and 96/16 = 6GB is sufficient to know? Or is it in fact 96/8 = 12GB?

johnsfine 01-24-2013 11:24 AM

Quote:

Originally Posted by yaximik (Post 4876788)
I was wrong indeed. From the command above I got 16 entries

I checked a few sites, which all agree the E5420 does not have hyperthreading. So I don't understand why you see 16 processors.

Rather than have us guess what info to extract from /proc/cpuinfo, it wouldn't be excessive to just copy/paste the entire /proc/cpuinfo into a reply here.

Quote:

Originally Posted by yaximik (Post 4876788)
I guess 12 banks divided between 8 physical cores, 6 banks per quad core processor or per each of 2 CPU packages.

The Dell T610 owners manual (online) confirms you have 6 sticks of ram for each of the two CPU packages.

Quote:

Originally Posted by yaximik (Post 4876788)
I am looking for ways to speed up computation using programs that split a task to run it in parallel on multiple cores. The question is how much memory each core will have - or this should not be my concern at all as OS will take care of memory distribution

Normally you should let the OS worry about dividing ram between processes. If one process needed 95GB and all the rest only added up to 1GB, the OS can do that, even though each CPU package has direct access to only 48GB. The indirect access is only slightly slower and is totally transparent to your code within the process (your single core process using 95GB would have no need to care that half of it is accessed through the other physical CPU package).

sundialsvcs 01-24-2013 07:13 PM

In my experience, everything "depends." You might think you have lots-n-lots of "CPUs," especially with hyper-threading, and you might think that you've got lots-n-lots of accessible memory, but when you start to dig deeper into how your motherboard is laid out, you find out what the difference between cheap mobos and expensive ones actually is. :) Sometimes, having "all those cores" banging away actually runs slower than it otherwise would, because what's actually happening is that they're competing with one another.

johnsfine 01-25-2013 07:54 AM

Quote:

Originally Posted by sundialsvcs (Post 4877027)
you find out what the difference between cheap mobos and expensive ones actually is. :) Sometimes, having "all those cores" banging away actually runs slower than it otherwise would

I don't think that is very much affected by the quality or price of the motherboard.

It is absolutely true that many algorithms can be divided up among multiple threads with no significant extra work nor synchronization overhead (so you would expect linear speedup with the number of cores), but when you actually try it you find two threads take much more than half as long as one thread. Then as you increase the number of threads further, the elapsed time goes up rather than down. With enough threads (even though you have enough cores and ram for that many threads) the elapsed time may be worse than with just one thread.

This effect depends on the size and structure of the CPU caches and on the memory access patterns of the algorithm. (The motherboard quality may also have some impact, but typically that is small). Contention between the threads can eliminate the benefits of having more than one thread.

You might want to use oprofile or a similar tool to investigate key performance measures of your code before going to the trouble of multi-threading it. (Lots of effort has been wasted by some people I work with who would not follow my advice to do that step.)
Non-intrusive low-level profilers work on a different principle than the more common profiling tools, and it is information from that low-level profiling that matters here. If you see an unusually high number of CPU cycles per instruction completed and you see a high cache miss rate, it is clear that splitting the work across two cores that share cache would cause a net increase in elapsed time. Even splitting across cores that don't share cache is likely to have little benefit. But if you see a low number of CPU cycles per instruction and/or you see a high level of branch misprediction (sufficient to explain high CPU cycles per instruction), then multi-threading is likely to have near-linear benefits.

A high number of CPU cycles per instruction with low cache misses and low branch mispredictions could indicate a large concentration of divides and/or square roots in your algorithm, which could benefit a lot from multi-threading. But you really should know your algorithm before taking that view. Why do you have a large concentration of divides and/or square roots? If the reason is not fundamental to the job being done, then you might be misinterpreting the profile data and have a situation that would not benefit from multi-threading, and/or you might have a performance flaw in your implementation and better potential benefits from fixing the implementation than from multi-threading.
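As a concrete illustration of the signal described above: oprofile is the tool named in the post, and on newer Linux kernels `perf stat` reads the same hardware counters. The counter values below are invented purely to show the cycles-per-instruction (CPI) arithmetic:

```shell
# Sketch: computing CPI from hardware counters. The perf command below uses
# standard event names; the numeric values are invented for illustration.
#
#   perf stat -e cycles,instructions,cache-misses,branch-misses ./myprog
#
cycles=12000000; instructions=4000000       # illustrative counter readings
awk -v c="$cycles" -v i="$instructions" \
    'BEGIN { printf "CPI = %.1f\n", c / i }'
# A high CPI (here 3.0) plus a high cache-miss rate suggests threads sharing
# cache will fight each other; a low CPI suggests near-linear thread scaling.
```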

yaximik 01-26-2013 08:25 AM

For algorithms implemented by others it is often not possible to examine how it was done, as compiled code is all you get. For example, I got to run code that was said to be capable of multithreading, but people who tried it did not see sufficient speed-up with large datasets (dozens of GB). When I ran the code on my box, I saw that at some point the CPU got occupied at 100%, but in top's Irix mode that was only about 6% per core (100/16 ~ 6.3%). As I did not know how to load the CPU at a higher rate, I split the dataset into 5 equal-size chunks and launched 5 instances of the code, each with its own chunk. CPU load eventually got up to ~500%, or about 30% per core, and the job effectively was done 5 times faster. I guess this is one way to speed up processing, but not all datasets can be split like that. But if it is possible - how big can the chunks processed in parallel be? Would a rough estimate like TotalMemory/NumberOfCores make usable guidance? If the dataset cannot be split, is there another way to load the CPU at a higher rate with one instance, and would that really be helpful?
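The chunk-and-run-N-instances approach described above can be sketched generically in the shell. The dataset, chunk names, and per-chunk command here are all placeholders (`wc -l` stands in for the real processing, and GNU split is assumed):

```shell
# Sketch of splitting a line-oriented dataset into 5 chunks and processing
# them in parallel. "wc -l" is a placeholder for the real per-chunk tool.
set -e
workdir=$(mktemp -d)
cd "$workdir"
seq 1 100 > dataset.txt                 # stand-in dataset, 100 lines
split -n l/5 -d dataset.txt chunk.      # 5 roughly equal chunks (GNU split)
for c in chunk.0[0-4]; do
    wc -l < "$c" > "$c.out" &           # placeholder per-chunk processing
done
wait                                    # block until all 5 jobs finish
cat chunk.0[0-4].out                    # five counts summing to 100
```

This only gives the right answer when, as discussed below, the computation on one chunk does not depend on the others.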

johnsfine 01-26-2013 08:53 AM

I'm still curious why grep found lines from 16 processors in your /proc/cpuinfo when you have clearly described a system with 8 cores and without hyperthreading.

Please post the full /proc/cpuinfo so we can see what is really there.

Quote:

Originally Posted by yaximik (Post 4877930)
For algorithms implemented by others

You generally can't make it multi-thread if it doesn't do so on its own.

Quote:

Originally Posted by yaximik (Post 4877930)
I got to run code that was said to be capable of multithreading, ... When I ran the code on my box, I saw that at some point the CPU got occupied at 100%,

So the code did not multi-thread when you ran it. Maybe it needed some option you forgot to include. Maybe it doesn't really have useful multi-threading.

Quote:

Originally Posted by yaximik (Post 4877930)
I split the dataset into 5 equal-size chunks and launched 5 instances of the code, each with its own chunk.

Very few problems can be divided that way. If you have such a problem, I guess that gives you a reasonable solution to multi-threading that problem (if you don't figure out the simpler way by giving the correct option to the original program).

Quote:

Originally Posted by yaximik (Post 4877930)
But if possible - how big chunks can be processed in parallel

That depends on the kind of processing. If you can get the right answer by slicing the problem in chunks, then I would expect the original algorithm knows that and reads only small amounts at once, so there should be no limit to the size of a chunk for your parallel split.

Quote:

Originally Posted by yaximik (Post 4877930)
If the dataset cannot be split, is there another way to load the CPU at a higher rate with one instance, and would that really be helpful?

That is the typical status of big problems. It isn't coded to effectively use multiple threads. Without recoding it, you can't make any meaningful use of multiple threads. Even if you have good control (access and expertise) over the source code, changing it to multi-thread may be very hard. If you change it well, it still might not give decent multi-thread performance for the reasons mentioned earlier.

yaximik 01-27-2013 08:38 AM

Quote:

I'm still curious why grep found lines from 16 processors in your /proc/cpuinfo when you have clearly described a system with 8 cores and without hyperthreading.

Please post the full /proc/cpuinfo so we can see what is really there.
Here it is:
Code:

[yaximik@G5NNJN1 ~]$ cat /proc/cpuinfo
processor      : 0
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 0
cpu cores      : 4
apicid          : 32
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.16
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 1
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 0
cpu cores      : 4
apicid          : 0
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.19
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 2
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 1
cpu cores      : 4
apicid          : 34
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 3
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 1
cpu cores      : 4
apicid          : 2
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.00
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 4
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 9
cpu cores      : 4
apicid          : 50
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 5
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 9
cpu cores      : 4
apicid          : 18
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.04
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 6
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 10
cpu cores      : 4
apicid          : 52
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 7
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 10
cpu cores      : 4
apicid          : 20
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.04
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 8
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 0
cpu cores      : 4
apicid          : 33
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 9
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 0
cpu cores      : 4
apicid          : 1
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4777.30
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 10
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 1
cpu cores      : 4
apicid          : 35
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.04
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 11
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 1
cpu cores      : 4
apicid          : 3
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.05
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 12
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 9
cpu cores      : 4
apicid          : 51
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 13
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 9
cpu cores      : 4
apicid          : 19
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.05
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 14
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 10
cpu cores      : 4
apicid          : 53
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

processor      : 15
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 1596.000
cache size      : 12288 KB
physical id    : 0
siblings        : 8
core id        : 10
cpu cores      : 4
apicid          : 21
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4788.03
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]

[yaximik@G5NNJN1 ~]$


johnsfine 01-27-2013 09:37 AM

Quote:

Originally Posted by yaximik (Post 4876788)
From the command above I got 16 entries
Code:

Intel Xeon CPU E5420 @ 2.4 GHz

Now that you showed the whole /proc/cpuinfo I see it is 16 entries, but they don't say "E5420 @ 2.4 GHz".

Anyway, it is absolutely clear that you have hyperthreading enabled so that each of your 8 cores pretends to be two, for a total of 16.

All the online info I found says the E5420 does not have hyperthreading. Maybe that online info is wrong. Maybe you don't have E5420 CPU's. But my interpretation of the /proc/cpuinfo data you just posted is not wrong. You posted info listing 8 real cores doubled by hyperthreading into 16 apparent processors.
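Whatever the model turns out to be, the doubling can be read straight from the cpuinfo fields in the listing above: "siblings" is logical CPUs per package and "cpu cores" is physical cores per package. A small check using the posted values:

```shell
# Values taken from the /proc/cpuinfo posted above.
siblings=8; cpu_cores=4; packages=2
threads_per_core=$(( siblings / cpu_cores ))   # 2 => hyperthreading is on
echo "$(( packages * cpu_cores * threads_per_core )) logical processors"
```

siblings (8) being double cpu cores (4) is exactly the signature of hyperthreading.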

Depending on the mix of work you run on that computer, you would probably get slightly better performance if you rebooted into the BIOS menu and found the BIOS option for hyperthreading and turned it off. But for other workloads, turning hyperthreading off would reduce the total work the system can do.

I do a LOT of compiling very large projects with single threaded compilers launched by a build system that is very flexible about running multiple compilers in parallel. On a system with 8 real cores and a hyperthreading option and enough ram for 16 compiles at once, I have found
1) Running 16 compiles in parallel with hyperthreading disabled is slightly better throughput than running 8 compiles in parallel with hyperthreading disabled.
2) Running 16 compiles in parallel with hyperthreading enabled is slightly better throughput than running 16 compiles in parallel with hyperthreading disabled.

But I also do some sophisticated large simulations on the same hardware (performance dominated by cache misses) with configurable thread count.
With or without hyperthreading enabled, selecting 2 threads gives me better performance than selecting 1 or selecting more than 2. When selecting 2 threads, the performance is slightly better if hyperthreading was disabled than if it was enabled.
That pattern will not be true of large simulation activities in general. But it is a common pattern. It matches what many other people have seen with other simulation jobs.

That performance behavior of multiple single threaded compilers in building very large projects is more general. It is true across many different compilers, across many different projects, across different build systems and across Windows vs. Linux. If you have a compiler that internally makes good use of multi-threading, performance issues may be very different. But for heavy use of single threaded compilers, those performance effects are quite reproducible, including the benefits of hyperthreading.

knudfl 01-27-2013 09:51 AM

# 10

Intel® Xeon® Processor E5620
http://ark.intel.com/products/47925/...-GTs-Intel-QPI
>>> # of Threads : 8

johnsfine 01-27-2013 10:49 AM

Edit: Oops! I had a stupid post here because of browser issues that didn't let me see part of the /proc/cpuinfo posted above.

I saw
Code:

model name      : Intel(R) Xeon(R) CPU
where I should have seen
Code:

model name      : Intel(R) Xeon(R) CPU E5620  @ 2.40GHz
I think the last bit was spaced over further to the right than the horizontal scroll let me reach. Now it is just far to the right, not too far to the right.
I looked at some Dell T610 documentation when writing post #5. If I had looked at the E5420 documentation more closely I would have realized it is based on a totally different relationship between the CPU and ram than that in the T610, so I could be sure there was no E5420 in a T610 even without seeing the list of compatible processors that I saw later.

btmiller 01-27-2013 12:43 PM

The OP's /proc/cpuinfo shows the E5620 as the processor model, so I'd imagine that's what processor he really has.

johnsfine 01-27-2013 01:33 PM

In the above discussion, I left out the occasional situation in which it is most important to turn hyperthreading off:

Some basic libraries of support software for large numerical algorithms are internally multi-threaded and configure themselves automatically based on the apparent number of processors.

You could be using such software with a problem whose performance is dominated by cache misses. In that case use of 16 threads via hyperthreading would be horribly worse than using 8 threads.

If you are in control of the number of threads in such a situation, you could select 8 (or fewer) threads even though the system seems to have 16 processors, and that reduction to 8 threads from 16 would have almost as much benefit with hyperthreading left on as it would with hyperthreading off.
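For OpenMP-based programs that thread cap can be applied from the shell without touching the code; a sketch (OMP_NUM_THREADS is the standard OpenMP control, and myprog is a placeholder):

```shell
# Sketch: capping the worker count for libraries that auto-size to the
# apparent processor count. "myprog" is a placeholder program name.
#
#   OMP_NUM_THREADS=8 ./myprog     # one worker per real core
#   taskset -c 0-7 ./myprog        # alternatively, restrict to 8 logical CPUs
#
export OMP_NUM_THREADS=8
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```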

But with some programs you are stuck with the automatic configuration and so leaving hyperthreading on may devastate the total throughput.

If the OS is aware of hyperthreading it will use one thread of every real core before using the second thread of any core. The hardware is designed so that when one thread of a core is unused, the other thread runs almost as fast as the undivided core would have run when hyperthreading was disabled.

To a first approximation, using both threads of a hyperthreaded core makes each of them half as fast as the undivided core. So hyperthreading gives you twice as many processing units each half as fast.

But when your code stalls a lot on things like mispredicted branches (as many compilers tend to do), then each thread will be much better than 50% as fast as an undivided core, so total throughput is improved by hyperthreading.
The same is true if the two processes tend to have very different kinds of stalls from each other: such as one stalling on something like excess floating point divides and sqrts, while the other isn't using floating point at all.

But if both threads are stalling mainly on cache misses, then each thread will be much worse than 50% of an undivided core. There is hardly any limit on how much worse, because the raw cache miss rate increases in addition to the two threads contending for the same resources. Each thread could easily be slower than 10% of the speed of an undivided core.
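The 50%-per-thread approximation and its two extremes can be put into a toy calculation (the per-thread efficiency numbers are illustrative, not measured):

```shell
# Toy model of the paragraphs above: a hyperthreaded core delivers
# 2 threads x per-thread efficiency (fraction of undivided-core speed).
for eff in 0.65 0.50 0.10; do   # stall-heavy / break-even / cache-miss-bound
    awk -v e="$eff" \
        'BEGIN { printf "per-thread %.2f -> core throughput %.2fx\n", e, 2 * e }'
done
```

Anything above 0.50 per thread means hyperthreading gains throughput; the cache-miss-bound 0.10 case is the one where turning it off (or halving the thread count) pays.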

