Originally posted by: grant2
Vee, could you answer my questions without getting into extra registers & any other specific enhancements that happen to be in an A64 chip?
(sorry for the delay, I've been busy)
Yes, I can. But the extra registers are specific to the x86-64 architecture as such. Not just to the A64.
Going 64-bit isn't just about increasing the width of some registers and adding some instructions. It's about adding what is essentially an entirely new CPU architecture to the old one, just as the '386 once did. Keeping it *similar* helps with staying backwards compatible, while saving transistors. But the CPU will execute new 64-bit code in a new 64-bit mode. The old 32-bit architecture ('386, '486, Pentiums, K6, Athlon/XP) has three different user modes (or four, depending upon how you count), representing three (8086, '286, 32-bit) different CPU personalities in one CPU:
Real mode = original 8086.
[8086]
Protected mode = supporting both older 16-bit protected mode and 32-bit computing.
['286] & [32-bit]
Virtual real mode (virtual mode) = submode of 'protected 32-bit mode', emulates an original 8086 inside the protected mode.
[8086]
This is known as 'IA32', but is basically the '386. A consequence of the long '86 PC legacy.
The various extensions since (FPU, MMX, SSE, SSE2) may add registers and instructions, but none of them is as fundamental a change as going 64-bit (or going from 16-bit to 32-bit).
The x86-64 CPUs now have five (or seven, depending on how you count) modes, representing four (8086, '286, 32-bit, 64-bit) main CPU personalities in one CPU:
Legacy mode/real mode = original 8086.
[8086].
Legacy mode/protected mode = protected 16/32 bit code.
['286] & [32-bit].
Legacy mode/virtual mode = emulating 8086 inside protected addressing.
[8086].
Long mode/compatibility mode = emulating 16/32 bit protected modes inside a 64-bit space.
['286] & [32-bit].
Long mode/64-bit = Our brave new world! 64-bit computing
[64-bit]. A 64-bit virtual address space - this is the main thing! This last mode also includes double the number of registers, and 64-bit integer GP registers. These things are inherent in x86-64. Not just some enhancement that happens to be in the A64.
This is 'x86-64' (aka AMD'86-64, aka AMD64, aka CT (Intel), aka IA32e (Intel), aka EM64T (Intel))
As for the rest of my earlier elaborations on execution, I tried to show that what actually goes on, hardware-wise, may be slightly different from an intuitive understanding of the instructions and registers. (And those are things that "happen to be in" the A64. Other CPUs might do things slightly differently.)
So is it true that 64-bit integer math will be performed much faster on a 64bit cpu? (because it doesn't have to break it down into multiple 32-bit operations)?
- Yes. Operations wider than 64 bits, like those in encryption, will also be faster, because they can be broken into fewer, larger parts.
That was the short simple answer. Here's the elaboration:
Also, some stuff can be faster because persistent data can be kept close to the execution, in the extra registers (if you or the compiler use them). This holds for both 32-bit and 64-bit integer work, in 64-bit code of course.
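To make the "break it down into multiple 32-bit operations" part concrete, here is a rough C sketch of what 32-bit code has to do for wide integer adds: go piece by piece and carry between the pieces. The function names (add_limbs32, add64_native) are made up for illustration, and a real 32-bit compiler would emit an add/adc instruction pair rather than this loop, so take it as a picture of the work involved, not of actual generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two numbers stored as n 32-bit pieces,
   least significant piece first. Only 32-bit arithmetic is used.
   Returns the carry out of the top piece (0 or 1). */
uint32_t add_limbs32(const uint32_t *a, const uint32_t *b,
                     uint32_t *out, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;     /* may wrap around to 0 */
        uint32_t c1 = (s < carry);     /* 1 if that add wrapped */
        s += b[i];                     /* may wrap again */
        uint32_t c2 = (s < b[i]);      /* 1 if this add wrapped */
        out[i] = s;
        carry = c1 | c2;               /* at most one of them can be set */
    }
    return carry;
}

/* With 64-bit GP registers the same 64-bit add is one operation. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

With n = 2 this is a 64-bit add done in two 32-bit steps. A 128-bit value needs four such steps with 32-bit pieces, but only two if the pieces are 64 bits wide, which is the "fewer, larger parts" point for encryption-style math.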
Is it also true that 64-bit floating point math will not be performed any faster, because modern CPUs already have specialized 128-bit hardware to handle that?
- No. That is not quite right.
It is true that 64-bit GP registers and 64-bit flat addressing, the 64-bit issue as such, will not change things for 64-bit (double precision) FP math.
It is true that we already have, since the '387 FPU coprocessor and the '486DX (which integrated the FPU), FP registers that hold 64-bit (double precision) values, and 64-bit FP operations (computed with 80-bit internal precision).
It is true that we already have 128-bit vector registers, holding 4x32-bit FP data since the Pentium III (SSE) and 2x64-bit FP data since the Pentium 4 (SSE2). And that we have SIMD vector instructions that will perform the same 32-bit or 64-bit FP operation on 4 or 2 FP values with a single instruction.
That may have been the 'simple' answer you're looking for?
So what about the "- No. That is not quite right."? Again, here's the elaboration.
But:
We do not have hardware performing 128 bit wide operations. Not in x86-64, and not before. 64 bits is the longest datawidth (discounting 80-bit precision inside '87 math) to be operated on by hardware.
128-bit vector instructions are useful because they state explicitly parallel operations, which makes good use of recent CPUs' hardware parallelism, as well as OoO (Out of Order) timing shuffling.
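To picture what "explicitly parallel" means here: one packed instruction names 2 (or 4) independent FP operations at once, and no lane depends on another. Below is a plain C sketch of that idea. The types and functions (vec2d, vec4f, packed_add_2d, packed_add_4f) are hypothetical stand-ins I made up for a 128-bit register and a packed add, not the real SSE/SSE2 intrinsics, and how the hardware actually schedules the lanes is up to the CPU.

```c
/* Hypothetical stand-ins for one 128-bit vector register:
   either 2 double precision lanes or 4 single precision lanes. */
typedef struct { double lane[2]; } vec2d;
typedef struct { float  lane[4]; } vec4f;

/* One "packed add" on 2x64-bit data: two independent 64-bit FP adds. */
vec2d packed_add_2d(vec2d a, vec2d b)
{
    vec2d r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = a.lane[i] + b.lane[i];   /* lanes never interact */
    return r;
}

/* One "packed add" on 4x32-bit data: four independent 32-bit FP adds. */
vec4f packed_add_4f(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];   /* lanes never interact */
    return r;
}
```

Note that nothing here is a 128-bit-wide operation as such. Each lane is an ordinary 32-bit or 64-bit FP add; the instruction just tells the CPU up front that they can all go in parallel.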
There is FP and there is FP... '87 FP and vector (aka SIMD, aka 'packed') -FP (3DNow, SSE, SSE2, SSE3..).
For sporadic FP operations, '87 is probably still best to use. In this case there is no change in 64-bit mode.
More FP intensive (and time consuming) computing work, like media encoding, 3D rendering, game 3D-engines, matrix/tensor math (basically all advanced computer math, for physics and engineering) is normally better to handle with vector instructions.
In this case, we have twice the number of registers in 64-bit mode. And I'm suggesting this means some FP math may indeed perform better in 64-bit code. I also think there are some media encoding benches that support this. I also think various game developers have made some 64-bit performance claims that could possibly be partially because of this.
Final words: There will be no GENERAL performance increase from going 64-bit. Some things will be faster, by exploiting new features. But understanding modern PC 16/32/64 *bitness* in the game console paradigm, where 64-bit is simply "twice as much" as 32-bit, is wrong. The essential issue is the virtual addressing space that code and data must live in. Is it flat or segmented? Is it big enough? Does it have 'elbow space'? This is a much, much bigger and more important thing than some silly data width.
As for the *width* thing, we already have 128-bit wide buses (dual 64-bit channel). As for total execution width, the collective sum of the parallel execution widths of the A64, at one of the final stages, is a whopping 384 bits. Much the same is true of the PIII, P4 and Athlon/AthlonXP.
We don't go 64-bit just because wider is faster. Wider is faster! If you specifically need to operate on long (32+ bit) bitfields, the 64-bit GPRs will make a lot of difference. But we have already been pursuing that 'wider' path for a good while. Not just with FPU, bus widths and vector extensions, but also with multiple execution units. Even if our CPUs are just "32-bit". I'm also sure CPUs will continue to get gradually *wider*, while remaining "64-bit".
The primary purpose for 64-bit GP registers and integer instructions, is for handling 64-bit pointers.
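A small C sketch of why pointer width, not data width, is the real limit: the number of bytes a flat pointer can reach is 2 to the power of its width, so a 32-bit pointer tops out at 4 GiB no matter how wide the FPU or the buses are. The function names (address_space_bytes, fits_in_pointer) are hypothetical, and this only looks at the arithmetic of flat addresses. It says nothing about how many address bits a given CPU or OS actually implements.

```c
#include <stdint.h>

/* Hypothetical helper: bytes reachable by a flat pointer of the given
   width. Only meaningful for pointer_bits < 64, since 2^64 does not fit
   in a uint64_t; returns 0 as a "not representable" marker otherwise. */
uint64_t address_space_bytes(unsigned pointer_bits)
{
    if (pointer_bits >= 64)
        return 0;
    return (uint64_t)1 << pointer_bits;
}

/* Hypothetical helper: can this byte address be held in a pointer of the
   given width? Returns 1 if yes, 0 if no. */
int fits_in_pointer(uint64_t address, unsigned pointer_bits)
{
    if (pointer_bits >= 64)
        return 1;                      /* every uint64_t address fits */
    return address < address_space_bytes(pointer_bits);
}
```

For example, address_space_bytes(32) is 4294967296 (4 GiB), and an address one byte past that no longer fits in a 32-bit pointer, which is exactly where 64-bit GP registers and 64-bit integer instructions come in.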