The discussion about the number of ALUs misses an important point: ALUs don't matter if you can't keep them fed, and for that you need a sufficient number of Load/Store Units. Zen will retain the two LSUs from the Bulldozer family. This will probably impose a pretty hard limit on the performance of various integer algorithms. Just for reference: Intel moved to three LSUs with Haswell in 2013. While RISC systems (e.g. POWER8, Apple A9) often have only half as many LSUs as their sustained instruction throughput, on x86 you want a higher ratio because of the fused load-execute ops.
I haven't found traces of current x64 software, but in x86 traces from the 90s, load/store ops often accounted for more than 50% of the dynamic instruction count. For SPEC2000 int-gzip it was nearly 77%; for int-gcc, 82%. These are the most extreme cases, but even the lowest percentage is about 40%.
If current x64 code still has similar ratios, we won't see a general 40% integer IPC uplift vs. Excavator (EXV), as the LSUs will be a bottleneck for some types of integer code. For FP it's a different story: FP stores seem to use only one FP pipeline without occupying an LSU, so when running FP code Zen will effectively behave as if it had three LSUs.
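To put rough numbers on the bottleneck argument, here's a back-of-the-envelope sketch. It assumes (my simplification, not anything AMD has stated) that every instruction with a memory operand needs exactly one LSU slot for one cycle, so sustained IPC can't exceed `n_lsu / mem_fraction`. It ignores caches, misses, store forwarding and everything else; `lsu_ipc_bound` is just a hypothetical helper name. The fractions are the trace percentages quoted above.

```python
def lsu_ipc_bound(n_lsu: int, mem_fraction: float) -> float:
    """Upper bound on IPC, assuming each load/store op occupies one LSU slot per cycle."""
    if not 0 < mem_fraction <= 1:
        raise ValueError("mem_fraction must be in (0, 1]")
    return n_lsu / mem_fraction


# Load/store share of the dynamic instruction count, taken from the traces cited above
workloads = {"lowest": 0.40, "typical": 0.50, "int-gzip": 0.77, "int-gcc": 0.82}

for name, f in workloads.items():
    two = lsu_ipc_bound(2, f)    # Zen / Bulldozer family: 2 LSUs
    three = lsu_ipc_bound(3, f)  # Haswell: 3 LSUs
    print(f"{name:9s} mem={f:.0%}  2 LSUs: IPC<={two:.2f}  3 LSUs: IPC<={three:.2f}")
```

With two LSUs and an 82% memory-op share, this crude bound already caps integer IPC at roughly 2.4, whereas a third LSU would lift it to about 3.7.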
While the design doesn't seem competitive for a 2017 high-performance x86 product, the design choice makes sense if the execution backend was primarily intended for K12, as it would have the 1:2 ratio of LSUs to issue width which seems normal for RISC architectures.
BTW, any news on K12? I haven't seen any news on it for a long time. Is it still mentioned by AMD, or does it appear to be canceled?