Why? How can you say that? I have already said this before. The core design gives Zen no chance of matching Skylake clock for clock in single-threaded performance. Haswell/Broadwell, however, is a completely different story.
Zen can decode 4 instructions per cycle or deliver 6 uops from the uop cache. Skylake delivers 4 fused uops (so up to 8). For ST workloads this is not a bottleneck.
Zen can execute 4 int + 2 mem + 4 FP per cycle. Skylake has 4 ports shared between int and FP plus 3-4 mem, but for ST this is more than enough.
Zen can retire 8 instructions per cycle, and so can SKL. That is more than enough.
SKL throughput is higher only for 256-bit FMAC and on par for 256-bit FMUL or FADD. On 128/80-bit tasks it is lower.
There is no bottleneck for single-threaded tasks. In heavy FP tasks Intel has to share ports with integer instructions, but in ST tasks this is not an issue, since the IPC of real applications is below 3. So no problem for either.
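To make the "no bottleneck" argument concrete, here is a rough sketch that compares the per-cycle widths quoted in this post against the assumed ceiling of 3 IPC for real ST applications. The numbers are only the ones given above, instructions and uops are loosely treated as the same unit, and the names (`PIPELINE_WIDTHS`, `narrowest_stage`, `has_headroom`) are made up for illustration.

```python
# Per-cycle widths as quoted in this post (not independently measured).
# Instructions and uops are treated as one unit here, which is a simplification.
PIPELINE_WIDTHS = {
    "Zen": {"decode": 4, "uop_cache": 6, "retire": 8},
    "Skylake": {"fused_uops": 4, "retire": 8},
}

# Assumption from the post: real single-threaded applications stay below 3 IPC.
TYPICAL_ST_IPC = 3


def narrowest_stage(widths):
    """Return (stage_name, width) for the narrowest stage of a core."""
    return min(widths.items(), key=lambda item: item[1])


def has_headroom(widths, ipc=TYPICAL_ST_IPC):
    """True if even the narrowest stage can sustain the given IPC."""
    return narrowest_stage(widths)[1] >= ipc


for core, widths in PIPELINE_WIDTHS.items():
    stage, width = narrowest_stage(widths)
    print(f"{core}: narrowest stage is {stage} at {width}/cycle, "
          f"headroom over {TYPICAL_ST_IPC} IPC: {has_headroom(widths)}")
```

On these figures both cores have at least 4 per cycle at their narrowest listed stage, which is why width alone does not separate them in ST.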
The only problem for Zen could be 256-bit tasks, since Intel has more memory BW and resources. But if the 256-bit task needs a lot of data, the L2, L3 and memory BW should be similar.
There is a clue about L3 BW, though. I always thought that Intel's L3 cache ran at core clock, but on an overclocking forum I discovered that the L3 clock is lower, about 2.4 GHz. Is that true?
In the Zen Hot Chips presentation, the L3 BW was stated to be 5x that of Bulldozer. I think 4x comes from the bus and 25% from the clock. On BD the L3 runs at 2.4 GHz, and 2.4 GHz + 25% = 3 GHz. So it seems that the L3 runs at core clock on Zen.
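The arithmetic behind that guess can be checked in a few lines. This is only a sketch of the reasoning in this post: the 4x bus factor and the 25% clock factor are my own split of the 5x figure, not something the presentation states, and the function names are hypothetical.

```python
# Assumed split of the 5x L3 bandwidth claim (a guess from this post).
BUS_WIDTH_FACTOR = 4.0    # wider bus: 4x
CLOCK_FACTOR = 1.25       # higher L3 clock: +25%
BD_L3_CLOCK_GHZ = 2.4     # Bulldozer L3 clock quoted in this post


def combined_bw_factor(bus_factor, clock_factor):
    """Bandwidth scales with bus width times clock."""
    return bus_factor * clock_factor


def implied_l3_clock_ghz(base_clock_ghz, clock_factor):
    """L3 clock implied by scaling the base clock by the clock factor."""
    return base_clock_ghz * clock_factor


print(f"Combined BW factor: {combined_bw_factor(BUS_WIDTH_FACTOR, CLOCK_FACTOR):.2f}x")
print(f"Implied Zen L3 clock: {implied_l3_clock_ghz(BD_L3_CLOCK_GHZ, CLOCK_FACTOR):.2f} GHz")
```

If the split is right, the two factors multiply to the declared 5x and the implied L3 clock lands at about 3 GHz.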
So the only unknowns are cache and branch prediction efficiency. We have nothing yet to tell us which one is better.