AMD didn't say whether the 40% was a minimum, a mean, or a maximum, but looking at Blender, the maximum is at least +80%. This implies that 40% is at least the mean value, which is reasonable. Stating that 40% is the minimum in an official statement would be very risky, because a single example with less than a 40% gain would be enough to prove it wrong.
Thinking more about this subject, I believe (and it can probably be verified by looking at the fine print) that AMD meant mean IPC, as that is the reasonable reading.
IPC is a mean, and that's what I expect to be used when multiple applications are involved in calculating some "aggregated" IPC.
In the case of Zen you can safely assume a weighted mean between 4 and 6 uops/cycle, weighted by the uop cache hit/miss ratio. Even if there is a branch misprediction, the uops can still be fetched from the uop cache, so on a hit the throughput is still 6 uops/cycle.
You don't need the decoders again, except if the branch lands in a never-taken piece of code... But we are talking about loops here, right? A single path without loops has negligible execution time compared to loops, and so it can be safely ignored in the overall performance evaluation. We are talking about programs that take at least seconds to execute, so code executed only once, and the first iteration of a loop (where the uop cache would probably miss), are a negligible fraction of the total execution time...
Well, even emulators have loops, including a big one (the main "evaluation" loop at least), but that doesn't mean a uop cache can be of any benefit.
If you take a look at the main loop of an emulator, for example, you'll see a big (C-like) switch statement, which is usually translated internally to an indirect jump through a table of pointers to execute the code for each specific instruction. And so on with other switches for handling other cases, especially with more complex architectures to be emulated (x86 and 68K are prime examples, the latter being even more complicated due to its 16-bit opcodes).
Virtual machines for programming languages have very similar behaviors.
Geez, 74 pages, and 73 of them are special friends ELF and Cidimauro arguing about what "40%" means, as if that has any bearing on current Zen performance whatsoever. Can we give these trolls their own thread to fill with this drivel? It's pedantic and totally unrelated to technology; it's also threadcrapping at this point.
Discussions about IPC are perfectly on-topic. Ignore them if you don't like them, but don't invoke censorship.
And if you have a problem with specific members here, there's an ignore list.
I expected 40% to be the minimum actually, and I have seen it quoted somewhere here, but when searching I could not find a direct reference, except:
The first point suggests pretty clearly that the 40% includes SMT. Otherwise, why list it under how the 40% IPC is gained?
Going off HotChips, I'd say it's IPC pretty clearly, and that means per core. Not 40% performance.
The slide talks about core improvements, so I agree with you that it should include SMT.
Which is obvious, considering how a core works (it commits instructions coming from ANY thread) and the definition of IPC.
And I still think that it's complete nonsense to talk of "ST IPC", unless you are measuring a purely ST application (in which case the two coincide): an MT application will use BOTH hardware threads (in an SMT-2 design) to achieve its goal, and so the core will execute/commit instructions from BOTH threads.
P.S. I know that, after the presentation, an AMD executive talked about "ST IPC" for the +40%.