[SOLVED] I'm confused as to what is meant by 12-bit offset in 32-bit intel processors...
Intel 32-bit processors use a 4096-byte page size and word size = 32 bits. Which means each instruction is 32 bits or 4 bytes.
Take this memory location in hex
Code:
0x47F0Fxxx
From what I understand, the last 3 digits represent the 12-bit offset from the base register. What does that mean exactly? In hexadecimal, according to this
Hex/Oct/whatever is just another way of writing a value without using all the 1s and 0s, to save space. Hex F = binary 1111.
Code:
01010101 01010101 01010101 01010101
----------------------++++ ++++++++ <- Last 12 bits
2^12 = 4096 = 4K.
If the 32-bit value addresses memory, you could define the start of each 4K page as being where the last 12 bits are all 0. Any other value stored in/interpreted from those bits would be a location within that page, while we're retaining the page number/reference in the upper 20 bits (bits 12-31).
Taking this original full number and doing a bitwise AND with 0x00000FFF gives you a 32-bit value which, interpreted as a whole, is a number somewhere between 0 and 4095, and it was found in the low 12 bits (counting from the least significant bit) of the full 32-bit value.
Quote:
If the 32-bit value addresses memory, you could define the start of each 4K page as being where the last 12 bits are all 0. Any other value stored in/interpreted from those bits would be a location within that page, while retaining the page number/reference in the upper 20 bits (bits 12-31).
Taking this original full number and doing a bitwise AND with 0x00000FFF gives you a 32-bit value which could be interpreted as a number somewhere between 0 and 4095, found in the low 12 bits (from the least significant bit) of the full 32-bit value.
Yes, that is definitely correct. Just remember that the offset is the number used to find your location within memory, so when programming and dealing with offsets, you don't want to be even a digit off.
Quote:
word size = 32 bits. Which means each instruction is 32 bits or 4 bytes.
No, it does not mean that.
Instructions are variable length. Some instructions are just one byte long. Many are far more than four bytes long (I forget the rules for the absolute maximum instruction length).
The "word size" in much older simpler CPU architectures was very significant and determined the sizes of many different things in the CPU design. In Intel x86, the "32-bit" has limited significance. Many registers, data paths, address sizes, etc. are larger than 32 bits in 32-bit x86.
Quote:
Take this memory location in hex
Code:
0x47F0Fxxx
From what I understand that last 3 digits represent the 12-bit offset from the base register. What does that mean exactly?
I don't think the thing you are talking about is called a "base register".
In the mode of 32-bit x86 used by Linux and Windows, virtual addresses are 32 bits and every virtual address is translated to a physical address by the CPU. In that translation, the bottom 12 bits are not changed (the bottom 12 bits of the physical address match the bottom 12 bits of the virtual address).
So you can view the virtual address as consisting of 20 bits that select which page is addressed and 12 bits which select which byte within the page.
Quote:
The last three digits can be anywhere from 0-255 bits or 32 bytes.
What do they mean by a 12-bit offset?
I'm not sure what you think the above means. Maybe Proud already cleared up that part of your confusion (see post #2 in this thread). Three hex digits encode 4096 (decimal) different values. As the low 12 bits of an address, 3 hex digits encode a byte position of 0 to 4095 within a page.
In ordinary addressing, bytes are addressed, not bits. So 256 bits is just 32 bytes, but that fact is not directly relevant to ordinary addressing.
32-bit x86 architecture also supports segmented addressing that does involve "base" registers combined with offsets. But those offsets are not 12-bit, and that type of addressing is not significantly used in Linux or 32-bit Windows. So you seem to be combining some terminology and concepts from segmented addressing into a question about the non-segmented addressing Linux uses.
Ok, I think I'm understanding it better. The first 5 hex values represent a given Virtual Page.
0x47F0F000 - 0x47F0FFFF is one whole page = 4095 bytes. So the last 3 FFFs are where in that Virtual Page we are.
So the last 3 at FFF represent 4095 in decimal?
The purpose of the 12-bit offset? To make sure everything lines up so you won't be off by one bit? 2^12 = 4096 does not imply 4096/12
Quote:
0x47F0F000 - 0x47F0FFFF is one whole page = 4095 bytes.
4096 bytes. You forgot one byte - either first or last one.
General rule - if zero-based array (starts at index 0) has N elements, then index of last element is N-1.
Quote:
The purpose of the 12-bit offset? To make sure everything lines up so you won't be off by one bit? 2^12 = 4096 does not imply 4096/12
What strange meaning are you giving to the phrase "12-bit offset"?
A reasonable meaning is that there is an offset (which in this case refers to the byte position within a page) and 12 bits of the address are used to encode that offset.
12 bits is the size of the encoding of the offset. You seem to be treating it as a fixed amount of "offset". Even so, I don't understand what you expect would be divisible by 12.
The word "offset" in computer programming has a very general meaning. To know what it means in a specific context, you need to understand the context. Similarly, a lot of the words around "offset" must be interpreted either through context or reasonableness.
Some data structure might include a "500 byte offset". In that case reasonableness tells us the direct magnitude of the "offset" is 500 bytes. No one would use 500 bytes to encode an offset.
But when you talk about a "12 bit offset" mere reasonableness can't tell you whether that describes a fixed tiny offset in some bit packed data structure vs. the size of the encoding of a variable offset. You need context.
So here, 0x80497bf: in this case, is the page number 0x8049, a 16-bit page number, with 7bf as the 12-bit offset? That would be a 28-bit address. Addresses are normally 32-bit, no? Usually the page number is 5 hex digits instead of 4. I guess pages in lower memory are 28-bit? Or is it because 0x8049 is really 0x08049, in which case it would be a 20-bit page number and 7bf a 12-bit offset, which would give us a 32-bit address. Each hex digit is 4 bits, correct?