Let me ask a couple of questions about these supposed "256-bit" AVX deficiencies of Zen. AFAIK, Zen has 4x128-bit FP SIMD units, while the latest Intel architecture has 2x256-bit. But Zen can also execute 256-bit FP instructions, by splitting the "larger" instructions across two clock cycles. So, with double the units, Zen should in theory reach the same peak FP256 rate as Broadwell (BW), albeit probably at the cost of higher latencies. When executing FP128, Zen should have double the peak rate of Broadwell per clock (latencies apart). Also, I've read on several sites (sorry, too lazy to go find the occurrences; some can be found directly on Intel's site: http://www.intel.com/content/dam/ww...on-e5-v3-advanced-vector-extensions-paper.pdf) that when executing AVX2 code the maximum frequencies are reduced in order to keep power consumption down.
So, it seems to me that if we look only at the peak rate per clock, Zen has the upper hand in FP128 and parity in FP256.
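To make the per-clock arithmetic above explicit, here is a minimal Python sketch of the simplified model this post assumes: every SIMD unit is treated as identical, fully pipelined, and able to start one operation per clock, and an op wider than a unit occupies several units. The function `peak_ops_per_clock` and the unit counts are illustrative assumptions taken from the post, not vendor specifications; the model ignores pipe specialization, latencies and memory effects.

```python
# Simplified per-clock peak model (assumption): all SIMD units are identical,
# fully pipelined, and each can start one vector op per clock.
# Unit counts/widths below are the poster's assumptions, not vendor specs.

def peak_ops_per_clock(num_units: int, unit_width_bits: int, op_width_bits: int) -> float:
    """Vector ops of width op_width_bits that can start per clock.

    An op wider than a unit is split across ceil(op / unit) units;
    an op narrower than a unit still occupies one whole unit.
    """
    units_per_op = max(1, -(-op_width_bits // unit_width_bits))  # ceiling division
    return num_units / units_per_op

zen = dict(num_units=4, unit_width_bits=128)        # assumed 4x128-bit
broadwell = dict(num_units=2, unit_width_bits=256)  # assumed 2x256-bit

for width in (128, 256):
    z = peak_ops_per_clock(op_width_bits=width, **zen)
    b = peak_ops_per_clock(op_width_bits=width, **broadwell)
    print(f"FP{width}: Zen {z:g} ops/clk, Broadwell {b:g} ops/clk")
```

Under these assumptions it reports 4 vs 2 ops/clk for FP128 and 2 vs 2 for FP256, i.e. the "upper hand in FP128, parity in FP256" conclusion; it says nothing about real pipelines.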
Of course, to gauge real FP128/FP256 performance we would also need to know:
- Real execution latencies (influenced by memory/cache, schedulers, etc.)
- Actual clock speeds when executing these instructions (which may be lower than maximum turbo on Zen too)
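Since sustained peak throughput is just the per-clock rate multiplied by the clock actually held while running that code, any AVX-related clock reduction scales the result down proportionally. A minimal sketch; `peak_gops` and the 3.0/2.5 GHz figures are hypothetical placeholders, not measured Zen or Broadwell clocks.

```python
# Sustained peak = (vector ops per clock) x (actual clock while running that code).
# The frequencies used below are hypothetical placeholders, not measurements.

def peak_gops(ops_per_clock: float, clock_ghz: float) -> float:
    """Billions of vector ops per second at a given sustained clock."""
    return ops_per_clock * clock_ghz

# Hypothetical: two cores with the same per-clock FP256 rate,
# one holding a higher sustained clock than the other.
same_rate = 2.0
print(peak_gops(same_rate, clock_ghz=3.0))  # 6.0
print(peak_gops(same_rate, clock_ghz=2.5))  # 5.0
```

So per-clock parity only becomes per-second parity if both chips sustain the same clock under that workload.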
Am I correct, or is there something else?