The problem with good 256-bit AVX/AVX2/FMA performance is that you need full 256-bit data paths, and those are a TDP killer. Without them, my 6700K could maybe have been sold as a 55W TDP CPU.
That's the same reason AVX-512 is server-only.
If Zen gets these 256-bit paths, then it will be clocked at 3 GHz or below for 8-core parts.
Even Sandy Bridge shared some of the AVX hardware path with SSE. From AnandTech's architecture review:
Sandy Bridge allows 256-bit AVX instructions to borrow 128-bits of the integer SIMD datapath. This minimizes the impact of AVX on the execution die area while enabling twice the FP throughput, you get two 256-bit AVX operations per clock (+ one 256-bit AVX load).
Haswell changed this to enable higher peak FLOPS, but that did indeed cause power consumption issues, to the point where Intel had to specify a separate, lower "AVX base clock" in addition to the standard base clock rate.
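To put rough numbers on "twice the FP throughput" and "higher peak FLOPs", here is a back-of-the-envelope sketch. The function names are made up for illustration, and the unit counts are assumptions based on my reading of the published architecture descriptions: two 256-bit FP units per core, one add and one multiply on Sandy Bridge, two FMA units on Haswell and later. These are theoretical peaks only, not measured results.

```python
def peak_flops_per_cycle(vector_bits, element_bits, units, fma):
    """Theoretical peak FLOPs per cycle for one core (hypothetical helper)."""
    lanes = vector_bits // element_bits   # elements processed per vector op
    flops_per_lane = 2 if fma else 1      # an FMA counts as multiply + add
    return lanes * units * flops_per_lane


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak GFLOPS for the whole chip at a fixed clock."""
    return cores * ghz * flops_per_cycle


# Single precision (32-bit elements); unit counts are assumptions, see above.
sse_128 = peak_flops_per_cycle(128, 32, units=2, fma=False)      # 128-bit add + mul
sandy_bridge = peak_flops_per_cycle(256, 32, units=2, fma=False)  # 256-bit add + mul
haswell = peak_flops_per_cycle(256, 32, units=2, fma=True)        # two 256-bit FMAs

print(sse_128, sandy_bridge, haswell)

# Assumed example: a 4-core part held at 4.0 GHz with Haswell-style FMA units.
print(peak_gflops(cores=4, ghz=4.0, flops_per_cycle=haswell))
```

Each doubling of per-cycle throughput means twice as many bits switching in the FP units every clock, which is where the extra power goes. If the chip has to drop to a lower AVX clock to stay within TDP, the ghz term shrinks and the real-world peak falls with it.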