LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

Old 05-29-2011, 09:02 PM   #16
cuizehan
LQ Newbie
 
Registered: May 2011
Posts: 2

Rep: Reputation: Disabled

Quote:
Originally Posted by Skaperen View Post
I thought I'd toss out possible interleaving schemes just so you'd have an idea what kinds of things I was thinking about. In these descriptions I will assume the access word size is 64 bits. That means the low-order 3 bits of the byte address are not used. The address bits above those are the ones used in the descriptions below.

Any interleave involves some kind of address transform from the address the CPU accesses, to end up with the addresses requested over the memory channels, and which memory channel is used for that request.

A basic 2-way interleave would use the lowest address bit to choose which channel to access. The address bits above that would be passed to the memory devices to select the word to access. This ensures that a sequential access of memory alternates between the channels in a two channel system. If both channels are actually fetched in parallel when either word is accessed (and when there is a cache miss), then both channels are loaded into cache. So the 2nd word of that sequential memory READ will see a cache hit. Writes would just do a write-through.

A 3-way is where it gets complicated.

The equivalent of the basic interleave is to take the address and divide by 3. The quotient is passed to the memory devices as the address, and the remainder chooses which channel. The trouble with this is that there is some delay through all those decision gates before the quotient and remainder are available. Unless someone can somehow come up with a zero-delay divide-by-three gate matrix, this is not a practical solution.
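As a rough sketch of the divide-by-3 mapping described above (in software only, so it says nothing about gate delay), the quotient and remainder can be computed like this. The names `WORD_SHIFT` and `map_mod3` are made up for illustration, and the 64-bit word size is the assumption stated earlier in the post.

```python
WORD_SHIFT = 3  # 64-bit access words: the low 3 bits of the byte address are unused


def map_mod3(byte_addr):
    """Return (channel, device_address) for a full divide-by-3 interleave."""
    word = byte_addr >> WORD_SHIFT           # word index seen by the interleave logic
    device_addr, channel = divmod(word, 3)   # quotient -> memory device, remainder -> channel
    return channel, device_addr


if __name__ == "__main__":
    # Six consecutive 64-bit words rotate through channels 0, 1, 2, 0, 1, 2.
    for byte_addr in range(0, 6 * 8, 8):
        print(byte_addr, map_mod3(byte_addr))
```

Every device address is used in every channel, so no memory is lost; the cost is the divider itself.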

An alternative is to take just a few bits (3 bits for an 8/9 interleave) and apply the divide by 3 to those. Since no power of 2 is exactly divisible by 3, the channels are used unevenly. That is, 8 sequential addresses would go through the three channels in the order 0, 1, 2, 0, 1, 2, 0, 1 ... and stop there (the next 8 addresses start over at 0 again, not at 2). That means 1/9th of memory is lost (every 3rd position in the 3rd channel's address space would be skipped over). This would be a poor solution. Doing this with a larger number of bits might be workable because less memory would be lost. A 32/33 interleave would lose 1/33rd of memory (every 11th position in the 3rd channel would be skipped over), which might be more acceptable. But there is one problem as we keep going up with this ... we get longer delays in doing that division by 3 with more bits. But maybe there is a good tradeoff point somewhere.
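To check the 1/9 and 1/33 figures, here is a small sketch of one possible layout for this partial-bit scheme. The layout is an assumption: each block of 2^bits words is given ceil(2^bits / 3) rows in every channel, and the rows left over in the last channel are simply never addressed. `map_partial` and `lost_fraction` are hypothetical names.

```python
from fractions import Fraction


def rows_per_block(bits):
    """Rows each channel reserves for one block of 2**bits words (ceiling of 2**bits / 3)."""
    return -(-(1 << bits) // 3)


def map_partial(word, bits):
    """Return (channel, device_address) when only the low `bits` bits are divided by 3."""
    block, low = divmod(word, 1 << bits)   # upper bits pick the block, low bits get divided
    row, channel = divmod(low, 3)
    return channel, block * rows_per_block(bits) + row


def lost_fraction(bits):
    """Fraction of installed memory that this layout never addresses."""
    slots = 3 * rows_per_block(bits)       # slots reserved per block across the 3 channels
    return Fraction(slots - (1 << bits), slots)


if __name__ == "__main__":
    print([map_partial(w, 3)[0] for w in range(10)])  # channel order for 10 sequential words
    print(lost_fraction(3), lost_fraction(5))         # 8/9 and 32/33 interleaves
```

With 3 bits the channel order restarts at 0 after the eighth word, and `lost_fraction` gives 1/9 for 3 bits and 1/33 for 5 bits, matching the numbers in the paragraph above.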

There are a couple other ideas I'm thinking of, but I haven't figured them out completely. I'm not even sure they'd work (I'd have to work out the design all the way, probably, to determine that). One of them involves scattering the accessed addresses around the three channels in a non-linear order (using an XOR matrix). One disadvantage with that is you cannot always parallel load the cache (although it might be possible to selectively do so where the address transforms favor it).

A 4-way interleave is basically as simple as a 2-way, but you have 2 bits selecting the channel instead of just 1 bit. Basically, whenever the interleave is a power of two, you can do the needed modular arithmetic by just routing address bit lines.
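For comparison, a power-of-two interleave needs no arithmetic at all, only bit selection. A minimal sketch, with `map_pow2` as an illustrative name and `word` meaning the address after the unused low 3 bits are dropped:

```python
def map_pow2(word, channel_bits):
    """Return (channel, device_address) for a 2**channel_bits-way interleave."""
    channel = word & ((1 << channel_bits) - 1)  # low bits select the channel
    device_addr = word >> channel_bits          # remaining bits go to the memory devices
    return channel, device_addr


if __name__ == "__main__":
    print([map_pow2(w, 1)[0] for w in range(8)])  # 2-way channel order
    print([map_pow2(w, 2)[0] for w in range(8)])  # 4-way channel order
```

In hardware the mask and shift are just wiring, which is why the 2-way and 4-way cases add no delay.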

Hey Intel and AMD ... just move on up to quadruple channel memory and simplify life.

I don't know whether you have figured out this question.

I can provide some information here.

The interleaving scheme in triple-channel mode is exactly the MOD3 operation. I measured this using our HMTT hardware. Besides, the Intel datasheet (i7-900 datasheet volume 2, section 2.9) has related information: it lists three schemes like the ones you described above, one using 3 bits, one using 3 bits and XOR, and one using MOD3, which is our situation.

Lately I have been confused about how interleaving is performed when the requirement for triple-channel operation is not met, i.e., when the capacities of the channels are not equal. If you have related information, please share it with me.
 
Old 06-01-2011, 08:07 AM   #17
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,684

Original Poster
Blog Entries: 31

Rep: Reputation: 176
The mod3 scheme would seem to require extra circuitry to do that calculation, with a lot of gates that would delay address propagation. Is it uniformly mod3 over the entire address space, or is it mod3 across some subset of the address bits? If the latter, and if done over the lower bits, that would leave the mapping out of balance, since no power of 2 is a multiple of 3.

Searching developer.intel.com turns up nothing about the details of memory interleaving. At least one document I found said it does interleave but gave no further details (besides where to plug in DIMMs to gain triple-channel speed on certain Intel boards). There's more than one way to interleave, so I make no assumptions about this from what they say.

Last edited by Skaperen; 06-01-2011 at 08:16 AM.
 
Old 06-01-2011, 08:22 AM   #18
cuizehan
LQ Newbie
 
Registered: May 2011
Posts: 2

Rep: Reputation: Disabled
Yeah, it is MOD3 over the entire address space. I think the MOD3 may not cost too much compared to the complexity of the memory controller and the long memory access latency.
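One reason the remainder part might be cheap: since 4 leaves remainder 1 when divided by 3, a number and the sum of its 2-bit digits have the same remainder mod 3, so the channel select can be reduced with adders instead of a general divider. This is only a sketch of that arithmetic identity, not a description of what Intel's controller actually does, and it covers the remainder only (the quotient for the device address is a separate problem). The function name is made up.

```python
def mod3_by_digit_sums(n):
    """Compute n % 3 for n >= 0 by repeatedly summing base-4 (2-bit) digits."""
    while n > 3:
        total = 0
        while n:
            total += n & 3   # add the lowest 2-bit digit
            n >>= 2          # move on to the next digit
        n = total            # same remainder mod 3, but a much smaller number
    return 0 if n == 3 else n


if __name__ == "__main__":
    print([mod3_by_digit_sums(w) for w in range(9)])
```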
 
Old 06-01-2011, 11:05 AM   #19
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,684

Original Poster
Blog Entries: 31

Rep: Reputation: 176
They could still get some of the speedup of triple-channel memory if they also supported having one of the slots populated with memory twice as large as the other two. In this case it would be mod4 interleaved like 0,1,0,2 (where channel 0 has a double-sized DIMM, or is connected to 2 DIMMs). Then you could have true powers of 2 like 4G, 8G, 16G, or even 32G, while still being somewhat faster than plain double-channel.
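A sketch of how that 0,1,0,2 pattern could map addresses, assuming channel 0 holds twice as many words as each of the other two. The exact device-address layout here is my own guess for illustration, and `PATTERN` and `map_uneven` are made-up names.

```python
PATTERN = (0, 1, 0, 2)  # channel used by each of the 4 slots in a block


def map_uneven(word):
    """Return (channel, device_address) for the 0,1,0,2 mod-4 interleave."""
    block, slot = divmod(word, 4)            # pure bit selection: low 2 bits are the slot
    channel = PATTERN[slot]
    if channel == 0:
        # Channel 0 takes two words per block (slots 0 and 2).
        return 0, block * 2 + (slot >> 1)
    return channel, block                    # channels 1 and 2 take one word per block


if __name__ == "__main__":
    for word in range(8):
        print(word, map_uneven(word))
```

Only bit routing is needed, as with the 2-way and 4-way cases, but two of every four sequential words land on channel 0, so the gain over dual-channel is partial.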

Maybe we'll eventually see quad-channel memory.
 
  


All times are GMT -5. The time now is 06:14 AM.
