OK, I just thought there was a performance penalty for doing it,
since the xmm registers overlay the ymm registers.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
But hey, this stuff is way over my head anyway.
That's a very different thing. Maybe I wasn't clear previously, but I was talking about the possibility of using a 256-bit unit to "aggregate" two 128-bit calculations, in order to maximize the use of the fatter FPU. That's not possible.
But obviously it's possible to do 128-bit (ONLY!) calculations with a 256-bit unit, and it's also possible to mix SSE and AVX (even 128-bit) calculations; only in this last case is there a penalty.
You're somewhat over-estimating the advantage of a long instruction set: the longer the instruction window you have, the higher the latency you get, and the performance gain is not linear, or even proportional, when moving to wider vectors.
Correct, but I have made no statement about it.
I've only made a comparison between the 4 128-bit FPUs approach vs 2 256-bit FPUs on the only test available.
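To make that comparison concrete, here is a rough back-of-the-envelope sketch of the two layouts. It is a toy model, not a description of either real pipeline: the function names are made up for illustration, and it assumes that every unit is identical, that each unit starts at most one vector op per cycle, that a unit cannot pack two narrower ops into one slot (the "aggregation" point above), and that an op wider than the unit is split into unit-sized pieces. It ignores FMA, specialized add/multiply pipes, latency, and memory bandwidth.

```python
def ops_per_cycle(num_units, unit_bits, op_bits):
    """Vector ops of width op_bits that can start per cycle (toy model).

    Assumptions: one op per unit per cycle, no packing of narrow ops
    into a wide unit, wide ops split into unit-sized pieces.
    """
    if op_bits <= unit_bits:
        # A narrow op still occupies a whole unit for that cycle.
        return float(num_units)
    pieces = op_bits // unit_bits  # e.g. a 256-bit op on 128-bit units -> 2
    return num_units / pieces


def bits_per_cycle(num_units, unit_bits, op_bits):
    """Useful data bits processed per cycle under the same assumptions."""
    return ops_per_cycle(num_units, unit_bits, op_bits) * op_bits


if __name__ == "__main__":
    for op_bits in (128, 256):
        four_narrow = bits_per_cycle(4, 128, op_bits)
        two_wide = bits_per_cycle(2, 256, op_bits)
        print(f"{op_bits}-bit code: 4x128-bit units -> {four_narrow:.0f} bits/cycle, "
              f"2x256-bit units -> {two_wide:.0f} bits/cycle")
```

Under these assumptions the two layouts tie on 256-bit code, while on 128-bit-only code the 4 x 128-bit layout gets through twice as much per cycle. That's why it matters which instruction set the Blender build was actually using.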
I don't know where to read the so-called FPU unit count, but natively wider ALUs can run narrower algorithms if the application is well tuned. OTOH, it doesn't even make sense to justify performance with the ALU count. A CPU is not a GPU.
The FPU unit counts come from the respective microarchitecture details.
They likely used Blender as it is. Why would they need to recompile it, given that once Zen is released it will be tested with Blender as well? AMD are not crazy enough to display a bench whose results are not reproducible. The argument that they could have deliberately rigged the software is a poor one, and has its roots in a part of the public that absolutely doesn't want AMD to outperform Intel in any way, hence the tendency to dismiss not only the results but even Blender's relevance...
Well, if they used Blender, it's because it's the test where Zen performs well.
And I have NOT said that they have recompiled it. This is unknown.
I have only said that we don't know whether this Blender version was using SSE, AVX 128-bit, or AVX 256-bit.
As I said, I'm 100% sure that if they had used PovRay they would have shown even better performance relative to Broadwell, but they have their reasons not to do so. First, the same people who downplay Blender would have been even more critical of PovRay, since AMD currently performs better in this renderer. Second, AMD has no advantage in showing better results than they did; they are not here to help their competitor position itself...
"Currently" means with the current AMD architectures, which are different.
Zen is another one. So maybe it performs better on Blender than on PovRay.
Anyway, Blender is an application that scales very well with the number of cores, and greatly makes use of SMT capabilities as well. It's also FPU-intensive. And last but not least, it's quite "linear" (read: the code is not full of branches and so on, like an emulator, compiler, etc.).
So, it's a perfect benchmark for testing Zen's capabilities, with its 4 ALUs + 2 AGUs/LS + 4 128-bit FPUs.
But it performs only 2% better than a Broadwell, which has far fewer resources from this point of view.
PS & BTW: I'm doing QA for Intel's Application Debugger team (especially Xeon Phis).