Originally posted by: Artanis
Good point, end of story. Primarily, 64-bit is not the 'performance fever' some would believe, as if it were twice as speedy (because 64 = 32*2). That's the point...
cya!
I see from later posts that you've either completely misunderstood this!? (or...)
There will be a good performance boost from 64-bit. I tried to explain to you the technical reasons for that, which are not so self-evident, and have more to do with the ISA than with 64-bit data chunks.
You are practically guaranteed at least 30%, from what I've seen, even from a crude port. And that is a good boost indeed. But more 'mature' optimizing will probably bring that up to 40-55%. And in some extreme cases, where 64-bit integer ops, twice the number of (and more useful) registers, and mapping tricks all converge, you will see a 400%-500% increase.
- Still! Still, the primary point of 64-bit is the 64-bit pointers! That is a thousand times more important than a mere performance boost. The things that will be done on a 64-bit platform cannot be done at all on a 32-bit platform, at any speed. You might as well try to travel to the moon on your bicycle.
Originally posted by: Grant2
So is it true that 64-bit integer math will be performed much faster on a 64bit cpu? (because it doesn't have to break it down into multiple 32-bit operations)?
Yes. But it will also be faster still, because we have more visible registers.
Is it also true that 64-bit floating point math will not be performed any faster, because modern CPUs already have specialized 128-bit hardware to handle that?
No, FP will also be faster, because we have more visible registers.
And yes, there already is specialized 64/80-bit hardware (not 128-bit) to handle double-precision FP (64-bit) math.
'128-bit hardware' is actually just the 64-bit MMX and 128-bit SSE/SSE2 registers, plus instruction extensions for vector operations. I hope you understand what that is. There are still just logical hardware operations on 8-, 16-, 32-, and 64-bit data. And the execution paths in the A64 are generalized to 64-bit (so there are good reasons to call it a "64-bit" CPU).
A 128-bit instruction operates on vectors. That is, it performs four 32-bit operations at once, or two 64-bit ones.
The A64 has basically three execution paths. It also has storage/buffer areas for instructions in flight.
The instructions pour into the CPU, into three parallel decoders. Then they collect in a pool, where their sequential order is broken. Each instruction, at this point, has its own version of the contents of the 'visible' registers. Here a 2x64, 128-bit vector instruction is split into two 64-bit operations. A scheduler then dispatches the operations, as soon as they are ready (all relevant in-data collected or computed), into one of three parallel integer execution units, or one of three specialized FP units.
(If no instruction is ready for scheduling, data is guessed and an operation is dispatched into the execution unit anyway, rather than waiting. If the guess eventually proves wrong, the operation has to be redone; but if it was right, the result is already at hand.)
After execution, the results are collected in a re-ordering queue. There the sequential order is restored, before the results are committed. In this case, the two 64-bit results are written to the visible 128-bit register that is to accumulate the result.
As you see, there is a good deal of parallelism implemented in today's CPUs. Vector ops are one fairly explicit way to put it to good use. My point with this description is that the real hardware is not exactly the same as what the ISA implies to the assembler programmer. The processing power can be increased by adding more decoder pipes and execution units; the problem is making use of them. Vector instructions help a bit with that.