LinuxQuestions.org
Old 05-26-2012, 03:29 PM   #1
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Rep: Reputation: 16
Four GTX-580 on a single consumer board


I wonder whether a consumer motherboard now exists that supports four GTX-580 cards. In consumer hardware, with six logical CPUs per unit, two CPU sockets would need to be available, because for my use each GTX-580 must be supported by two CPUs.

I already have an Antec case with a GA-890FXA-UD5 motherboard, an AMD Phenom II 1075T, two Zotac GTX-580, and an 850W Corsair power supply. I use this system for scientific number crunching, not for its graphical potential. Even at full load, the GTX-580s do not release much heat; in fact the Antec case need not be ventilated at its highest setting. I would like to replace this motherboard, doubling the AMD Phenom and the Zotac cards. I also have a modern 1000W Enermax power supply to replace the 850W one. It currently sits in an obsolete four double-AMD server that is no longer in use.

Each socket should have room for memory slots.

I know that such a system could be built on server material. However, the software for my use is compiled at single precision, so that I do not require server hardware (which is notably more expensive).

Thanks

chiendarret
 
Old 05-27-2012, 03:46 AM   #2
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
You cannot have 2 physical CPU sockets with 'desktop' level hardware. To have 2 CPUs, you must use server hardware.

There are a few server level motherboards with 2 x CPU sockets and 4 x PCIe x16 slots (eg Asus KGPE-D16). That board and a few others with 4 x PCIe x16 slots can't run 4 x GTX 580s, as the GTX 580 is a double slot card and there is not enough room for the cards in the slots.

You *might* be able to find 2 x GTX 590s (GTX 590 = 2 x GTX 580 GPUs, slightly downclocked, on a single PCIe x16 board). GTX 690 is the same thing, 2 slightly downclocked GTX 680 GPUs on a single board.

BTW, the GTX 670 is about the same cost as a GTX 580, uses less power, and outputs less heat. Unless your number crunching is dependent on the render output unit, the GTX 670 should be faster than the GTX 580.
 
Old 05-27-2012, 10:49 AM   #3
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Rep: Reputation: 897
Quote:
Originally Posted by chiendarret View Post
In consumer hardware, with six logical CPUs per unit, two CPU sockets would need to be available, because for my use each GTX-580 must be supported by two CPUs.
Not sure that I understand why you are saying that two cpu sockets should be available. Note that in servers, if you want to go multi-socket, the Opterons have to be the right Opterons, because some do not have the correct HT bus to allow communication between the cpus in the different sockets.

So, it is not even true that all server parts can do this, but it is true that no current consumer parts can. GTX-580s are not involved in the logic of availability of the correct CPUs.

Quote:
Originally Posted by chiendarret View Post
I know that such a system could be built on server material. However, the software for my use is compiled at single precision, so that I do not require server hardware (which is notably more expensive).
Single precision (or otherwise) isn't a factor. Server motherboards are more expensive (at least, more expensive than ordinary consumer motherboards, although some gamer motherboards do run it rather closer), that's true, but server cpus can be comparably priced at the low end. At the high end, everything is expensive.
 
Old 05-28-2012, 01:29 AM   #4
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Thanks for answering. Single precision (as for classical molecular dynamics) is faster and does not require server hardware. With molecular dynamics, the need for two CPUs per GTX-580 card is proven.

Based on the answers, I wonder whether there are consumer mainboards that can host three GTX-580 (the one I have now, the GA-890FXA-UD5, accepts only two of them). As for the particular card, I am sticking with the GTX-580 because I already have two of them. Also, the GTX-580 is well proven for molecular dynamics. Moreover, there are no safe investments in this area: CUDA may be soon abandoned, if developers find OpenGL usable and faster. Another reason for sticking to consumer hardware.

I am a biochemist and do not pretend to be an expert on hardware, which is largely outside my competence. However, I find it safe to take into account what has been established about hardware on our molecular dynamics forum.

chiendarret
 
Old 05-28-2012, 03:56 AM   #5
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Adding to my reply, I am now considering the RUNNER motherboard

MSI MB 990XA-GD80 AM3 SB950 4DDRIII PCI-E ESATA SATA III ATX

which is said to host three GTX-580 and accept AMD Phenom II 1075T.

Any comments on it?

Thanks a lot

chiendarret
 
Old 05-28-2012, 04:22 PM   #6
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
Quote:
Originally Posted by chiendarret View Post
Based on the answers, I wonder whether there are consumer mainboards that can host three GTX-580 (the one I have now, the GA-890FXA-UD5, accepts only two of them).
GA-890FXA-UD5 should run 3 x GTX580s. 1 in PCIEX16_1, 1 in PCIEX16_2, 1 in PCIE8X.

There are AM3/AM3+ (and Intel) boards that will run 4 x GTX580s. Only 'top of the line' 890FX/990FX boards will do it with AMD. Eg GA-890FXA-UD7 (AM3) and GA-990FXA-UD7 (AM3+).

There is even a pic of a GA-990FXA-UD7 running 4 x double slot cards on the gigabyte website.

http://www.gigabyte.com/products/pro...px?pid=3880#ov

Quote:
Originally Posted by chiendarret View Post
MSI MB 990XA-GD80 AM3 SB950 4DDRIII PCI-E ESATA SATA III ATX

which is said to host three GTX-580 and accept AMD Phenom II 1075T.
You've typoed, you mean 990FXA-GD80.

No point buying that, it won't run any more GTX580s than the GA-890FXA-UD5. IMO the GA-890FXA-UD5 is a better board as well.
 
Old 05-28-2012, 04:38 PM   #7
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by chiendarret View Post
Single precision (as for classical molecular dynamics) is faster and does not require server hardware.
Neither does double precision. Which precision you use does not in any way relate to the motherboard/CPU you choose.

Quote:
Moreover, there are no safe investments in this area: CUDA may be soon abandoned, if developers find OpenGL usable and faster. Another reason for sticking to consumer hardware.
I think you mean OpenCL. If CUDA is abandoned, just switch to OpenCL; it is well supported by Nvidia and will run just fine. It also has nothing to do with whether you get consumer hardware or professional equipment.
 
Old 05-29-2012, 01:08 AM   #8
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Hi Cascade9:
The PCI specifications for GA-890FXA-UD5 are the same as for the GA-890FXA-UD5 that I currently use. Only two at x16.

Yes, 990FXA-GD80, sorry for the typo.

What I would like to see in mainboard specifications, especially for GPU-CPU, is bandwidth in physically unambiguous terms, such as, for example, bytes/second. The 990FXA-GD80 is particularly suspicious: four x16 2.x slots with only six logical CPUs is something that simply cannot work. Number crunching requires two CPUs per GPU; otherwise the CPU memory bandwidth is soon exhausted. The advantage of the GPU shows only for computations on very large systems, above 200,000 atoms. Below that, it is better to stick to a decent 24-CPU system.

Hi TobiSGD:
Running ab initio code (which is at double precision) for an extended period on consumer mainboards will inevitably lead to errors. Try that and you will see. For such computations (Hartree-Fock, or higher) we use server-type mainboards with ECC mem.

For classical molecular mechanics, based on Newton's equation, double precision would only slow down the computation, without any other gain. We use consumer mainboards for such computations.

You see that the choice of the mainboard is also related to single/double precision. For our work, this is the first choice.

Cheers
chiendarret
 
Old 05-29-2012, 01:39 AM   #9
cascade9
Senior Member
 
Registered: Mar 2011
Location: Brisneyland
Distribution: Debian, aptosid
Posts: 3,753

Rep: Reputation: 935
Quote:
Originally Posted by chiendarret View Post
Hi Cascade9:
The PCI specifications for GA-890FXA-UD5 are the same as for the GA-890FXA-UD5 that I currently use. Only two at x16.

Yes, 990FXA-GD80, sorry for the typo.
GTX 580s will run at PCIe x8.

AFAIK you cannot get more than 2 PCIe x16 slots (running at PCIe x16) with any current AMD chipsets. You might find some nVidia-chipset AMD boards with nforce 200 chips that run 3 or 4 x PCIe x16, but the nforce 200 chip adds latency so it's not going to be any faster than PCIe x8.

Quote:
Originally Posted by chiendarret View Post
What I would like to see in mainboard specifications, especially for GPU-CPU, is bandwidth in physically unambiguous terms, such as, for example, bytes/second.
The bandwidth is listed as PCIe lanes. It's on every motherboard manufacturer's website. Some of them are a bit nasty with hiding the real capabilities, but as long as you know what the chipset can do it's hard to get caught by dodgy marketing. BTW, PCIe 1.0, 2.0 and 3.0 all have different bandwidths per PCIe lane (PCIe 1.X = 250MB/sec, PCIe 2.0 = 500MB/sec, PCIe 3.0 = 1GB/sec)

So PCIe 2.0 x16 = 8GB/sec. PCIe 2.0 x8 = 4GB/sec.
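
The per-lane figures above can be turned into a small calculation. This is only a minimal Python sketch, not a measurement: the table `PCIE_MB_PER_LANE` and the function `pcie_bandwidth_mb` are names made up here for illustration, and the values are the rounded theoretical rates quoted in this post, which apply per direction.

```python
# Rounded theoretical PCIe bandwidth per lane, per direction, in MB/s,
# as quoted in the post (illustrative table, not measured values).
PCIE_MB_PER_LANE = {"1.x": 250, "2.0": 500, "3.0": 1000}


def pcie_bandwidth_mb(generation: str, lanes: int) -> int:
    """Return the theoretical one-way bandwidth of a PCIe link in MB/s."""
    if lanes not in (1, 2, 4, 8, 16, 32):
        raise ValueError(f"unusual lane count: {lanes}")
    return PCIE_MB_PER_LANE[generation] * lanes


if __name__ == "__main__":
    # Print the link widths discussed in this thread for PCIe 2.0.
    for lanes in (16, 8, 4):
        mb = pcie_bandwidth_mb("2.0", lanes)
        print(f"PCIe 2.0 x{lanes}: {mb} MB/s ({mb / 1000:g} GB/s)")
```

With these rounded figures a PCIe 2.0 x16 link comes out at 8000 MB/s and an x8 link at 4000 MB/s, which matches the 8GB/sec and 4GB/sec given above. Real transfers will be somewhat lower because of protocol overhead.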

Quote:
Originally Posted by chiendarret View Post
The 990FXA-GD80 is particularly suspicious: four x16 2.x slots with only six logical CPUs is something that simply cannot work. Number crunching requires two CPUs per GPU; otherwise the CPU memory bandwidth is soon exhausted. The advantage of the GPU shows only for computations on very large systems, above 200,000 atoms. Below that, it is better to stick to a decent 24-CPU system.
4 x PCIe x16 _physical_ slots. They don't all run at x16: 2 at x16, 1 at x8, 1 at x4. It's on the MSI site. Though in typical MSI fashion, you only get that info if you look at the 'Basic Specification'; when you look at the 'Detail Specification' it just tells you 4 x PCIe x16.

I'd like to see what has led you to decide that you need 2 CPUs per GPU. CPU memory bandwidth varies a lot, and maybe someone tested back in the days of DDR2-533/667, not with current DDR3-1600/1866.
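
To put rough numbers on how much CPU memory bandwidth differs between those memory generations, here is a minimal Python sketch. It assumes the usual 64-bit (8-byte) DDR channel and a dual-channel setup, and it uses the nominal transfer rates from the module names; `peak_memory_bandwidth_mb` is a name invented for this example. The results are theoretical peaks, not what an application will sustain.

```python
def peak_memory_bandwidth_mb(mega_transfers: int, channels: int = 2) -> int:
    """Theoretical peak bandwidth in MB/s for 64-bit-wide DDR channels."""
    bytes_per_transfer = 8  # a 64-bit channel moves 8 bytes per transfer
    return mega_transfers * bytes_per_transfer * channels


if __name__ == "__main__":
    # Nominal transfer rates (MT/s) taken from the module names.
    modules = (("DDR2-533", 533), ("DDR2-667", 667),
               ("DDR3-1600", 1600), ("DDR3-1866", 1866))
    for name, rate in modules:
        print(f"{name} dual channel: {peak_memory_bandwidth_mb(rate)} MB/s")
```

On these assumptions dual-channel DDR3-1600 has roughly three times the peak bandwidth of dual-channel DDR2-533, so a CPU-per-GPU rule of thumb derived on the older memory may not carry over.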

Last edited by cascade9; 05-29-2012 at 01:42 AM.
 
1 members found this post helpful.
Old 05-29-2012, 06:16 AM   #10
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by chiendarret View Post
For such computations (Hartree-Fock, or higher) we use server-type mainboards with ECC mem.
ECC memory means nothing more than that minor errors are corrected when they occur. I never understood that: there shouldn't be errors in the first place. If there are, either you are running the RAM/memory controller out of specification, there are incompatibilities, or one of them is faulty. If that is the case you will have errors in single precision as well. Instead of relying on error correction, the issue should be fixed.

Quote:
What I would like to see in mainboard specifications, especially for GPU-CPU, is bandwidth in physically unambiguous terms, such as, for example, bytes/second. The 990FXA-GD80 is particularly suspicious: four x16 2.x slots with only six logical CPUs is something that simply cannot work.
The bandwidth available to the PCIe slots is a fixed number, dependent on the number of lanes and the generation of the specification, as cascade9 already pointed out. It will be the same regardless of whether you have a dual-core CPU or a 16-core CPU. If a six-core CPU slows down the computations when you use 4 video cards, then this is because of too little CPU power, not too little bandwidth. In that case changing the board will not help; you need a more powerful CPU.
 
1 members found this post helpful.
Old 05-29-2012, 09:57 AM   #11
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Well, I must thank all kind guys who spent time in shedding much light in this matter. It will also be useful for other users with limited understanding of hardware.

As a final question: sticking, as I must, to two PCIe 2.0 x16 slots, I could replace the two GTX-580 with two faster cards, for example GTX-680 or better. Is that a possibility with my GA-890FXA-UD5 motherboard and Phenom II 1075T? Which GPU cards would give maximum performance?

Actually, already in its present configuration, it is a remarkably good system for the energy it uses. With very large computations, it rivals 32 CPUs at the same clock as the present AMD, which would be so much more expensive to run. With small systems (< 200,000 atoms) it is not worth using. For ab initio computations, it is not only the lack of ECC: the available 8GB of RAM can't cope with the large matrices, and as soon as access to the HD is required, the computation can't be carried out. With molecular mechanics the need for RAM is very modest; what matters is the CPU clock in CPU machines and a perfect interconnection between a fast GPU and a fast CPU with large memory bandwidth.

Thanks a lot
chiendarret
 
Old 05-29-2012, 03:04 PM   #12
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
When it comes to pure computing performance in OpenCL, the GTX680 is only slightly faster than the GTX580 and, depending on the application, sometimes even slower. The Radeon HD7970 beats the GTX580 and GTX680 in performance, depending on the application sometimes even by a factor greater than 2: http://techreport.com/articles.x/22653/7
The GTX690 is slightly slower than two GTX680, but also slightly slower than a GTX590. But even one HD7970 is faster in this application, and two of them outclass the other cards: http://www.hardwareluxx.de/index.php...i.html?start=5 (sorry, German site)

Of course these are synthetic benchmarks, and real statements for your case can only be made by benchmarking with your software.
If memory bandwidth really makes such a difference, you would be better off buying an Intel socket LGA2011 board with a Core i7-3930K or i7-3960X, with 12 logical CPUs (6 physical) and four memory channels instead of two for AMD. This will also allow more RAM, since those boards simply have more slots for RAM.

Last edited by TobiSGD; 05-29-2012 at 03:08 PM.
 
1 members found this post helpful.
Old 05-29-2012, 04:10 PM   #13
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
Thanks a lot for so much useful information.

I forgot previously to answer why CPU/GPU = 2. In molecular dynamics the largest part of the computation (the non-bonded interactions) is carried out by the GPU, and this is why the GPU has proved so useful, provided the workload is large enough. Still, energy calculations are left to the CPU because it proved difficult to implement this part on the GPU. Very recently, developers have begun to overcome this obstacle.

chiendarret
 
Old 05-30-2012, 02:52 AM   #14
chiendarret
Member
 
Registered: Mar 2007
Posts: 307

Original Poster
Rep: Reputation: 16
I also forgot to mention that ATI/AMD Radeon is not compatible with CUDA, while my software (as all current MD software, I believe) uses CUDA. CUDA is currently faster than OpenCL for molecular dynamics. Thus, the benchmarks are also to be seen from this viewpoint.
chiendarret
 
Old 05-30-2012, 06:03 AM   #15
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,148
Blog Entries: 2

Rep: Reputation: 4886
Quote:
Originally Posted by chiendarret View Post
I also forgot to mention that ATI/AMD Radeon is not compatible with CUDA, while my software (as all current MD software, I believe) uses CUDA. CUDA is currently faster than OpenCL for molecular dynamics. Thus, the benchmarks are also to be seen from this viewpoint.
chiendarret
A statement like "CUDA is faster than OpenCL" when comparing different hardware platforms is almost pointless. If your CUDA program is, let's say, 20% faster than OpenCL on the same hardware, but the different hardware running the OpenCL code is able to outperform the CUDA hardware by a factor of 2, then CUDA is definitely not faster than OpenCL in this case.
To make such statements you need to specifically compare your software with the same software written for OpenCL on both platforms, AMD and Nvidia.
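
The arithmetic behind that example can be written out as a short Python sketch. The 1.2 and 2.0 factors are the hypothetical figures from the sentence above, not benchmark results, and the sketch assumes the factor of 2 is a pure hardware advantage measured with the same OpenCL code on both cards. The function `relative_throughput` and the labels "card A" and "card B" are made up for illustration.

```python
def relative_throughput(api_speedup: float, hardware_speedup: float) -> float:
    """Throughput relative to a baseline of OpenCL on card A (= 1.0)."""
    return api_speedup * hardware_speedup


if __name__ == "__main__":
    # Hypothetical figures from the example in the post, not measurements.
    cuda_on_a = relative_throughput(api_speedup=1.2, hardware_speedup=1.0)
    opencl_on_b = relative_throughput(api_speedup=1.0, hardware_speedup=2.0)
    print(f"CUDA on card A:   {cuda_on_a:.2f}")
    print(f"OpenCL on card B: {opencl_on_b:.2f}")
    print(f"B/A ratio:        {opencl_on_b / cuda_on_a:.2f}")
```

Under these assumptions the OpenCL card still comes out ahead by about two thirds, despite CUDA being the faster API on the first card. Only a benchmark of the actual MD code on both platforms can say what the real factors are.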
 
  

