A lot, actually. AVX-512 takes up a lot of space. On Skylake-X it was estimated to be 20-25% of the core size. No matter who you are, AVX-512 doesn't have enough consumer potential to be worth having on a consumer die, let alone to justify as much die space as it takes. Those transistors would be better spent on just about anything else.
My impression is that Intel expected the node shrink to 10nm to "take care of the problem", and in a worst-case scenario we might see some changes to cache architecture that might be unfavorable for overall CPU operation compared to the cache architecture from the Skylake family (consider Skylake vs Skylake-X). So I don't think the trade-offs were anything that would explain why the i3-8121U was so much slower than the i3-8130U in some of the tests.
I see what you're saying, and I agree that implementing 512-bit SIMD takes up a lot of die space. I'm just not sure that AVX-512 made the situation "worse" than, say, a Skylake-derived chip. It certainly did prevent them from improving some other aspects of the uarch.
Deep dive, but some basic questions are left unanswered. The sore spot that can hurt performance a lot is memory latency. Just putting in some random DIMMs does not make them run at their SPD speed of 2400 CL17. Things are much more likely to run in JEDEC safe mode at 2133 with some hilarious primary timings. In a Core i3-8121U test from October, it was running DDR4L at 2400 CL24, for an AIDA64 latency of 100 ns.
Given AT's results, they must have had something even worse, and in the end that horrible memory latency hurt CNL core performance in quite a few tests.
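For a rough sense of what the primary timings alone contribute, here is a back-of-the-envelope sketch in Python. It assumes the usual convention that CAS latency in nanoseconds is CL cycles times the memory clock period, with the clock running at half the DDR data rate. The function name `cas_latency_ns` is made up for illustration, and this covers only the CAS component, not the full load-to-use latency that AIDA64 reports.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    Assumption: DDR memory transfers twice per clock, so the memory
    clock in MHz is half the data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2.0
    cycle_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return cas_cycles * cycle_ns


# The two configurations mentioned above (hypothetical helper, real numbers).
spd_profile = cas_latency_ns(2400, 17)   # DIMM's SPD rating: 2400 CL17
as_tested = cas_latency_ns(2400, 24)     # October test: 2400 CL24

print(f"2400 CL17: {spd_profile:.2f} ns")
print(f"2400 CL24: {as_tested:.2f} ns")
print(f"extra CAS delay: {as_tested - spd_profile:.2f} ns")
```

So CL24 instead of CL17 at the same speed costs about 6 ns on CAS alone (roughly 14.2 ns vs 20 ns). That is a small slice of a 100 ns total, which suggests the rest of the timings and the memory controller path account for most of it.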
I have to call bs on this:
Strong words, esp. given this statement on the previous page. They had no way of knowing freq/timings?
I was a bit disappointed with that aspect of the review. There certainly were some ways for them to know frequency and timings. A custom tool like RyzenTimingChecker should do the trick. Someone would have to code such a thing, but it would be doable. And I think Thaiphoon Burner would at least get the SPD data off the DIMMs. CPU-Z, HWiNFO64, and any number of other applications should be able to get them at least the speed and primary timings.
My guess is they used CPU-Z, got the speed and primaries, and called it a day. Their apparent inability to glean subtimings (note the exact wording of the review) probably comes from the system having a barebones UEFI.
Also, concretely - doesn't the review show a non-trivial performance regression under AVX2 code?
It did seem a little odd, didn't it? I'd have to re-read the thing to get a better idea of exactly which tests showed an apparent AVX2 performance regression for the i3-8121U.