This was an ARM presentation on 14nm FinFET, showing an isolated NEON FPU from an A53 (or A57, I don't remember). Obviously leakage is proportional to transistor count, and the isolated NEON FPU is at most 1/6 of a Zen CPU. Even if 18mW is negligible in absolute terms, we are talking about going from 30% of power wasted in leakage (100mW out of 330mW) to 5% (18mW out of 330mW)... On a 95W CPU that is from about 30W wasted in leakage down to about 5W, with the 25W saved available to invest in more clock...
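For anyone who wants to check the arithmetic, here is a rough sketch of that scaling in Python. It assumes (and this is a big assumption) that the leakage share measured on the small NEON block carries over unchanged to a 95W part; the function and variable names are made up for illustration.

```python
def leakage_share(leak_mw: float, total_mw: float) -> float:
    """Fraction of total power lost to leakage."""
    return leak_mw / total_mw

# Figures quoted in the post: 100mW vs 18mW of leakage out of 330mW total.
share_before = leakage_share(100, 330)
share_after = leakage_share(18, 330)

# Assumption: the same shares apply to a 95W CPU.
tdp_w = 95
leak_before_w = tdp_w * share_before
leak_after_w = tdp_w * share_after
saved_w = leak_before_w - leak_after_w

print(f"leakage share: {share_before:.1%} -> {share_after:.1%}")
print(f"leakage at {tdp_w}W: {leak_before_w:.1f}W -> {leak_after_w:.1f}W")
print(f"freed for clock: {saved_w:.1f}W")
```

Without rounding the shares to 30% and 5% first, the freed budget comes out a little under the 25W quoted, at roughly 23.6W.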
That's what I'm saying (if you read the other points). Gains on mW-class, low-MHz, low-IPC archs never translate directly into gains for high-performance CPUs. Low-power CPUs have made HUGE gains at every generation in the past 6 years.
A53/A57 are ASICs with at least 30 FO4 per stage. This is the reason for the lower clock... Anyway, I found a graph that projected the consumption of this NEON FPU up to 4.3GHz, where it is about 1W. Even if a Zen core draws 10 times what this FPU does and is built with 30 FO4 per stage, it should draw 10W/core at 4.3GHz...
They are in-order, dual-issue, 8-stage ULTRA low power cores with max power around 0.8W though. They really are not comparable, and scaling doesn't work like that either (otherwise Arm could just scale up and beat Intel today lol).
Frequency scaling also differs between the two in the same way.
And even then bjt2... While you're on A53... Have you seen how power more than DOUBLES from 200MHz to 400MHz for the A53? [from anandtech]
From 500MHz to 1GHz it more than TRIPLES.
After 900MHz, a +100MHz increase (11%) costs a +135mW increase (33.7%)!
And when Arm increased perf 30-40% over the A7, they also increased power 2.22x!
Real world CPU constraints are very different to research papers.
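To put a number on how steep that is, here is a quick Python sketch that backs out the power-vs-frequency exponent implied by the 900MHz to 1GHz step quoted above, plus the perf/W change for the A7 comparison. It assumes a simple power law P ∝ f^k over that step, which is only a local approximation; the helper names are hypothetical.

```python
import math

def implied_exponent(freq_ratio: float, power_ratio: float) -> float:
    """Exponent k such that power_ratio == freq_ratio ** k (assumes P ~ f^k)."""
    return math.log(power_ratio) / math.log(freq_ratio)

# 900MHz -> 1000MHz costs +33.7% power, per the figures quoted above.
exp_900_1000 = implied_exponent(1000 / 900, 1.337)
print(f"implied exponent, 900MHz -> 1GHz: {exp_900_1000:.2f}")

def perf_per_watt_change(perf_ratio: float, power_ratio: float) -> float:
    """Relative perf/W of the new core versus the old one."""
    return perf_ratio / power_ratio

# +30-40% perf at 2.22x power, per the A7 comparison quoted above.
for perf in (1.3, 1.4):
    print(f"perf x{perf}: perf/W x{perf_per_watt_change(perf, 2.22):.2f}")
```

An exponent well above 2 at the top of the range is what you would expect once voltage has to rise along with clock, since dynamic power goes roughly with f·V².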
I was talking about medium IPC because I was supposing we start from a high-IPC design and break the stages into more pieces, with the goal of increasing clock, and so lose some IPC to the longer latencies and the longer branch misprediction penalties... But if the branch prediction is good and the latches add only 2.5 FO4 per stage, then 17.5 FO4 per stage leaves 15 FO4 for the logic, and maybe something useful can be done...
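A small Python sketch of that FO4 budget, for reference. The 17.5 and 2.5 FO4 figures are from the post; comparing against a 30 FO4 stage (the figure mentioned earlier in the thread for the A53/A57) and charging it the same 2.5 FO4 of overhead are my assumptions, and it ignores everything except stage delay.

```python
def logic_fo4(stage_fo4: float, overhead_fo4: float) -> float:
    """FO4 delays left for useful logic after per-stage latch overhead."""
    return stage_fo4 - overhead_fo4

OVERHEAD_FO4 = 2.5   # per-stage overhead, from the post
DEEP_STAGE = 17.5    # proposed stage length, from the post
SHALLOW_STAGE = 30.0 # assumed baseline stage length for comparison

deep_logic = logic_fo4(DEEP_STAGE, OVERHEAD_FO4)
shallow_logic = logic_fo4(SHALLOW_STAGE, OVERHEAD_FO4)

# Clock scales inversely with stage delay, on the same process.
clock_gain = SHALLOW_STAGE / DEEP_STAGE
# Stages needed to fit the same total logic depth.
depth_growth = shallow_logic / deep_logic

print(f"logic per stage: {deep_logic} FO4 ({deep_logic / DEEP_STAGE:.0%} of the cycle)")
print(f"clock gain vs {SHALLOW_STAGE} FO4: x{clock_gain:.2f}")
print(f"pipeline depth growth: x{depth_growth:.2f}")
```

Under these assumptions the pipeline gets deeper faster than the clock rises, which is where the longer latencies and misprediction penalties come from.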
That's the FO4 inverter delay for the brilliant Alpha 21264
It is possible, yes, but extremely difficult, and it requires a big budget and plenty of research time with good management.
(Opinions are own)