Intel syntax assembly code using dereferenced pointers as offsets

chrstrbrts · Jan 27, 2017

Hello,

I've been reading Intel's manual and I reviewed an old chapter this morning.

I found the following:

For example, the following MOV instruction moves a value from register EAX into the segment
pointed to by the ES register. The offset into the segment is contained in the EBX register:
MOV ES:[EBX], EAX;

I'm curious here about the use of the brackets around EBX.

Brackets used this way, around a register or memory location, are supposed to denote a dereferenced pointer.

That is, brackets used that way are supposed to mean "treat what's between these brackets as a pointer and go to the memory address that this pointer points to and manipulate what you find there."

But, in the above quote, Intel doesn't dereference the pointer.

In the above example, apparently, using [EBX] is the same as just using EBX.

When I see something like ES:[EBX], I think: take EBX and treat it as a pointer that points somewhere in the active default data segment, DS.

Take what you find there and use that as the offset in the ES segment.

But, apparently that's not how it's done.

Why?

Is this just some quirk of the language syntax?

The only other place I've seen something like this is with the LEA, load effective address, machine instruction.

The LEA instruction looks like this, I think: LEA register1, [expression usually containing registers]

Here, the expression in the brackets is not used to dereference the pointer.

The value of the expression is simply calculated and moved into register1.

Does the segment : offset syntax work the same way?

exdeath · Jan 28, 2017

[ ] specifies a memory access. Without brackets you just do a register to register move. This type of syntax is merely to specify intended addressing mode so the assembler spits out the correct opcode.

If you specify an explicit segment prefix or segment override it's obvious you want a memory access not a register access so it assembles the memory access opcode as if you'd had brackets, since an immediate or reg mov is nonsensical with a segment specification.

Try them both and compare the machine code.

This is if you are asking about mov es:ebx, eax? I don't recall if that is valid or not but if it is, since a segment prefix is meaningless to a register, the assembler would spit out identical code as mov es:[ebx], eax.
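To make the comparison concrete, here is a minimal C sketch of how the bracket choice ends up in the ModRM byte. It assumes 32-bit code and the `MOV r/m32, r32` form (opcode 0x89). The helper names `modrm` and `encode_mov_r32` are made up for illustration and are not part of any assembler. ESP and EBP are deliberately left out because, as a memory base, they need extra SIB or displacement bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as they appear in the ModRM byte (ESP/EBP omitted). */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESI = 6, EDI = 7 };

/* ModRM layout: mod (2 bits) | reg (3 bits) | r/m (3 bits). */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

/*
 * Hypothetical helper: encode "mov dst, src" (dst_is_memory == 0)
 * or "mov [dst], src" (dst_is_memory != 0) for 32-bit code.
 * es_override != 0 emits the ES segment-override prefix (0x26).
 * Returns the number of bytes written to out (at most 3).
 */
static size_t encode_mov_r32(uint8_t *out, int es_override, int dst_is_memory,
                             enum reg32 dst, enum reg32 src)
{
    size_t n = 0;
    if (es_override)
        out[n++] = 0x26;                 /* ES: prefix */
    out[n++] = 0x89;                     /* MOV r/m32, r32 */
    /* mod = 00 selects memory at [dst]; mod = 11 selects the register dst. */
    out[n++] = modrm(dst_is_memory ? 0u : 3u, src, dst);
    return n;
}
```

Under these assumptions, mov ebx, eax comes out as 89 C3, mov [ebx], eax as 89 03, and mov es:[ebx], eax as 26 89 03. The brackets only flip the mod field, and the override only adds the prefix byte. In 16-bit code the same instructions would also carry operand-size and address-size prefixes.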

If you mean DS vs ES, if you use a segment override there is no use of default, hence override. DS is only assumed if no other segment prefix is specified in this instance.

Code:

mov ebx,  0x12345678
mov [ebx], eax    ; DS:12345678 = eax, hence "default" segment
mov es:[ebx], eax ; ES:12345678 = eax

Unless you have DS and ES loaded with different selectors whose descriptors have different bases, they both end up being the same physical location anyway. Instructions and registers have default segment registers they use (EBP and ESP pair with SS, as do PUSH/POP), and if overridden they use the override instead, not both. Only the more complex instructions, such as the string instructions MOVS and CMPS, implicitly use DS and ES simultaneously, for source (DS:ESI) and destination (ES:EDI); STOS uses only ES:EDI.
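The default-versus-override rule for an ordinary memory operand can be sketched in C as follows. The enum and function names are hypothetical, and the sketch ignores the string instructions, whose ES:EDI destination follows its own rule.

```c
/* Segment registers, plus SEG_NONE for "no override prefix present". */
enum seg { SEG_NONE = -1, SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/* Base register of the memory operand. */
enum base { BASE_EAX, BASE_EBX, BASE_ECX, BASE_EDX,
            BASE_ESI, BASE_EDI, BASE_EBP, BASE_ESP };

/*
 * Hypothetical model: which segment a plain memory operand uses.
 * An override replaces the default; it is never combined with it.
 */
static enum seg segment_for_access(enum base base, enum seg override)
{
    if (override != SEG_NONE)
        return override;                 /* e.g. es:[ebx] -> ES */
    if (base == BASE_EBP || base == BASE_ESP)
        return SEG_SS;                   /* stack-relative default */
    return SEG_DS;                       /* everything else defaults to DS */
}
```

So [ebx] resolves to DS, es:[ebx] to ES, and [ebp-8] to SS unless a prefix says otherwise.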

A real-mode example using 32-bit registers, which is easier to follow with literal segment addresses than with indirect descriptor table lookups:

Code:

mov bx, 0x1000
mov es, bx ; es = 0x1000
inc bx
mov ds, bx ; ds = 0x1001

; zero 20 bytes of memory at 1000:0000-1000:0013
cld          ; incrementing di
xor eax, eax ; eax = 0x00000000
mov cx, 5    ; 5 dwords = 20 bytes
xor edi, edi ; es:di = 0x1000:0000
rep stosd    ; store eax at es:di, cx times

; values to store, which will be reversed when written (little endian)
mov eax, 0xDDCCBBAA
mov ebx, 0x44332211

; still es = 0x1000 and ds = 0x1001
; seg:off numbers are in hex

mov es:[0], eax     ; 1000:0000 = 0x00010000 = eax
mov [0], ebx        ; 1001:0000 = 0x00010010 = ebx
mov es:[0x0010], al ; 1000:0010 = 0x00010010 = al
mov di, 0xA
mov es:[di], bh     ; 1000:000A = 0x0001000A = bh
mov es:[di+1], ah   ; 1000:000B = 0x0001000B = ah

; resulting memory contents, by linear address:

0x00010000 - AA BB CC DD
0x00010004 - 00 00 00 00
0x00010008 - 00 00 22 BB
0x0001000C - 00 00 00 00
0x00010010 - AA 22 33 44
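The dump above can be cross-checked with a small C model of real-mode addressing, where the linear address is segment * 16 + offset. This is only a sketch: the flat `mem` array and the helper names `linear`, `store32`, and `replay_example` are assumptions made for illustration, and each store from the listing is replayed by hand rather than decoded from machine code.

```c
#include <stdint.h>
#include <string.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Store a 32-bit value least significant byte first (little endian). */
static void store32(uint8_t *mem, uint32_t addr, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/*
 * Replays the stores from the listing and copies the 20 bytes starting
 * at linear address 0x10000 into out.
 */
static void replay_example(uint8_t out[20])
{
    static uint8_t mem[0x10020];     /* hypothetical flat model of low memory */
    const uint16_t es = 0x1000, ds = 0x1001;
    const uint32_t eax = 0xDDCCBBAA, ebx = 0x44332211;
    const uint16_t di = 0xA;

    memset(mem + linear(es, 0), 0, 20);                      /* rep stosd, cx = 5  */
    store32(mem, linear(es, 0), eax);                        /* mov es:[0], eax    */
    store32(mem, linear(ds, 0), ebx);                        /* mov [0], ebx       */
    mem[linear(es, 0x0010)] = (uint8_t)eax;                  /* mov es:[0x0010], al */
    mem[linear(es, di)] = (uint8_t)(ebx >> 8);               /* mov es:[di], bh    */
    mem[linear(es, (uint16_t)(di + 1))] = (uint8_t)(eax >> 8); /* mov es:[di+1], ah */

    memcpy(out, mem + 0x10000, 20);
}
```

Note that 1001:0000 and 1000:0010 both translate to 0x10010, which is why the AL store overwrites the low byte of the EBX store in the last row.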

To dereference a pointer you have to do two loads, first the pointer, then again using the pointer as an address. Let's see if I can get this right without checking in a compiler:

Code:

void test( void )
{
   unsigned long arr[3];
   unsigned long * ptr;
   ptr = &arr[1];
   *ptr = 0xDDCCBBAA;
}

; enter
push ebp
mov ebp, esp
sub esp, 0x10
; ebp-04 = address of arr[2]
; ebp-08 = address of arr[1]
; ebp-0c = address of arr[0]
; ebp-10 = address of ptr

; ptr = &arr[1];
lea ebx, [ebp-8] ; calculate address of arr[1]
mov [ebp-0x10], ebx ; store it to ptr, keep in ebx for next step

; *ptr = 0xDDCCBBAA;
mov eax, 0xDDCCBBAA
mov [ebx], eax ; *ptr = arr[1] = 0xDDCCBBAA

; leave
mov esp, ebp
pop ebp
ret

Say you lost ebx in longer code and wanted to dereference ptr again:

Code:

mov eax, 0xDDCCBBAA
mov ebx, [ebp-0x10] ; same as above but reloading ptr from ram
mov [ebx], eax
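The same sequence can be modeled in C to show where memory is actually touched. This is a sketch under stated assumptions: `stack_mem`, `put32`, `get32`, and `run_frame` are hypothetical names, the frame pointer value is arbitrary, and registers are plain variables. The point is that LEA only does arithmetic, while each bracketed MOV operand is exactly one memory access.

```c
#include <stdint.h>
#include <string.h>

static uint8_t stack_mem[64];        /* hypothetical model of a small stack region */

/* One 32-bit memory write, i.e. what "mov [addr], reg" does. */
static void put32(uint32_t addr, uint32_t v)
{
    memcpy(&stack_mem[addr], &v, 4);
}

/* One 32-bit memory read, i.e. what "mov reg, [addr]" does. */
static uint32_t get32(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &stack_mem[addr], 4);
    return v;
}

/* Returns the value found in arr[1] after the stores. */
static uint32_t run_frame(void)
{
    uint32_t ebp = 32;               /* arbitrary frame pointer inside stack_mem */
    uint32_t eax, ebx;

    ebx = ebp - 0x8;                 /* lea ebx, [ebp-8]    : address only, no access */
    put32(ebp - 0x10, ebx);          /* mov [ebp-0x10], ebx : ptr = &arr[1]           */

    eax = 0xDDCCBBAA;                /* mov eax, 0xDDCCBBAA                           */
    ebx = get32(ebp - 0x10);         /* mov ebx, [ebp-0x10] : first load, fetch ptr   */
    put32(ebx, eax);                 /* mov [ebx], eax      : access through ptr      */

    return get32(ebp - 0x8);         /* read back arr[1]                              */
}
```

Calling run_frame() returns 0xDDCCBBAA. Going through ptr takes two separate instructions, each with its own bracketed operand; a single [ebx] never performs a second, hidden lookup.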

Gryz · Jan 28, 2017

Thank god for gcc.

chrstrbrts · Jan 28, 2017

Gryz said:
Thank god for gcc.

I understand the power of higher level languages, but I don't like them.

They obfuscate what's happening at the machine level and hinder proper learning in my opinion.

But, that's just me.

exdeath said:
[ ] specifies a memory access. Without brackets you just do a register to register move.

Yes, I understand this already.

exdeath said:
This type of syntax is merely to specify intended addressing mode so the assembler spits out the correct opcode.

Yes, I gathered that as well.

exdeath said:
If you specify an explicit segment prefix or segment override it's obvious you want a memory access not a register access so it assembles the memory access opcode as if you'd had brackets, since an immediate or reg mov is nonsensical with a segment specification.

Well, yes, that makes sense.

That was my reasoning.

This raises the question of why the brackets are necessary at all.

exdeath said:
Try them both and compare the machine code.

Well, I have an assembler and a hex editor.

How should I proceed?

I can't tell what's going on in the resultant binary if the file also contains formatting.

So, I suppose I should have the assembler produce a flat binary, so that there are no headers, footers, or tables for OS segment loading.

exdeath said:
This is if you are asking about mov es:ebx, eax?

Yes, it is.

I'm asking about mov es:[ebx], eax versus mov es:ebx, eax.

I don't understand why Intel's standard, apparently, is to use the brackets.

That doesn't make sense to me and it's counter-intuitive.

Though, LEA seems to work that way, using pointers in brackets without performing the dereferencing.

exdeath said:
I don't recall if that is valid or not but if it is, since a segment prefix is meaningless to a register, the assembler would spit out identical code as mov es:[ebx], eax.

OK. So, here's my plan:

I'll open my assembler and code:

mov es:[ebx], eax
mov es:ebx, eax

Then, I'll tell the assembler to assemble in flat binary mode so that the only binary produced will be just the machine codes and no formatting whatsoever.

Then, I'll open the resultant binary in my hex editor and see what shows up.

Will that work?

exdeath said:
If you mean DS vs ES, if you use a segment override there is no use of default, hence override. DS is only assumed if no other segment prefix is specified in this instance.

My question wasn't directly about the prefixes, and I do understand the mechanism regarding overrides and defaults.

I was trying my best to reconcile Intel's use of brackets in the original example with the standard use of brackets in Intel syntax.

I conjectured that if DS had to be accessed to dereference the ebx pointer despite the presence of the ES override, then that access would happen behind the scenes at the micro-architecture level.

But, it's clear now that that's not what's happening.

I think that the use of brackets here in this original example is just syntactical like with the LEA instruction.
