It is largely Front-End and Load Store.

Yes, but you're going to see that covered by the prefetch and load parts of the breakdown diagram. That will only be a component of those sections, though.
But I would suppose you are getting weary of all the back-and-forth hype and anti-hype, the sarcasm and mudslinging. You are not alone. There is a group who enjoys this kind of discussion, but there is another group who doesn't and can no longer participate in any discussion on the topic of Zen 5. Maybe we should create a different thread to discuss only the architectural/technical aspects of this upcoming product with limited speculation (no hype, no market-share discussion, no memes), based on publicly available evidence (patches, GB results, manuals, official statements).
Is it because geekbench overweights those instructions with regards to how often they're actually seen by users?

Geekbench 5 isn't any better, as it includes AVX-512 to pump up scores. So any non-Intel (except 11th gen) and ARM CPUs will score lower.
Cinebench 2024 is probably the best one now for consumers
Just on this, remember that the rumours come from server-side SKUs: +40% SPEC Int Rate 1N and +50% SPEC Int Rate nT (with +25% power). While on the server side SPEC Int increases may be a good proxy for general application performance increases, that might not be the case for desktop scenarios. The rumour is currently that the IOD is not changed for Granite Ridge, so you have to be cautious in applying the server-side performance numbers to desktop SKUs, since memory bandwidth is not likely to change much.

No, not really. E.g. I just quoted this regarding SpecINT being very close to average IPC:
Just on this, remember that the rumours come from server-side SKUs: +40% SPEC Int Rate 1N and +50% SPEC Int Rate nT (with +25% power). While on the server side SPEC Int increases may be a good proxy for general application performance increases, that might not be the case for desktop scenarios. The rumour is currently that the IOD is not changed for Granite Ridge, so you have to be cautious in applying the server-side performance numbers to desktop SKUs, since memory bandwidth is not likely to change much.
I have no idea on how many benchmarks in AMD's IPC suite are bandwidth sensitive under nT loads, but I would wager its greater than zero.
Hence SPEC Int numbers !== IPC for Granite Ridge.
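The bandwidth caveat above can be sketched with a toy throughput model (all numbers below are made up for illustration, not rumoured Zen 5 figures): once the cores' aggregate demand exceeds what an unchanged IOD can feed, per-core gains stop showing up in the nT rate score.

```python
# Toy model (illustrative numbers only): why nT SPEC Int Rate gains need not
# translate 1:1 to desktop if shared memory bandwidth stays fixed.

def rate_score(per_core_ipc, cores, bw_per_core_needed, bw_available):
    """Aggregate throughput, capped by the shared memory bandwidth."""
    demanded = cores * bw_per_core_needed
    scale = min(1.0, bw_available / demanded)  # bandwidth cap
    return per_core_ipc * cores * scale

# Hypothetical: +20% per-core IPC, same IOD (same bandwidth available);
# faster cores also demand proportionally more bandwidth.
old = rate_score(1.00, 16, 5.0, 60.0)
new = rate_score(1.20, 16, 6.0, 60.0)
print(f"nT gain with fixed bandwidth: {new / old - 1:.1%}")  # -> 0.0%
```

In this deliberately extreme, fully bandwidth-bound case the +20% IPC buys nothing at all under nT load; real workloads sit somewhere between this and perfect scaling.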
No, server CCDs are different.

Maybe with Zen 5 the server products clock at like 5 GHz (+15%, with the remaining +20% being the IPC increase)?
Wow, you've proved AMD's comparison between Zen 2 and Zen 3 wrong using a small subset of tests comparing Zen 4 to Zen 4 X3D.

C&C:
"VCache provides a notable 33% L3 hitrate increase here. Bringing average hitrate to 78% is more than enough to compensate for the slight L3 latency increase. GHPC enjoys a 9.67% IPC gain from running on the VCache CCD, so the other CCD should fall short even with its higher clock speed."
"With affinity set to the VCache CCD, IPC increased from 1.26 to 1.43. That’s a 13.4% increase, or basically a generational jump in performance per clock. VCache really turns in an excellent performance here. L3 hitrate with VCache is 63.74% – decent for a game, but not the best in absolute terms. Therefore, there’s still plenty of room for improvement. Modern CPUs have a lot of compute power, and DRAM performance is so far behind that a lot of that CPU capability is left on the table. Cyberpunk 2077 is an excellent demonstration of that."
"Zen 4’s normal 32 MB cache suffers heavily in this game, eating a staggering 8.66 MPKI while hitrates average under 50%. VCache mitigates the worst of these issues. Hitrate goes up by 47%, while IPC increases by over 19%."
"Hitrate improves by 16.75%, going from 61.5% to 72.8%. That’s a measurable and significant increase in hitrate, but like DCS, libx264 doesn’t suffer a lot of L3 misses in the first place. It’s not quite as extreme at 1.48 L3 MPKI with the non-VCache CCD. But for comparison, Cyberpunk and GHPC saw 5 and 5.5 L3 MPKI respectively. We still see a 4.9% IPC gain, but that’s not great when the regular CCD clocks 7% higher. Performance doesn’t scale linearly with clock speed, but that’s mostly because memory access latency falls further behind core clock. But given libx264’s low L3 miss rate, it’ll probably come close."
"With affinity set to the VCache CCD, we see a 29.37% hitrate improvement. IPC increases by 9.75%, putting it in-line with GHPC. This is a very good performance for VCache, and shows that increased caching capacity can benefit non-gaming workloads. However, AMD’s default policy is to place regular applications on the higher clocked CCD. Users will have to manually set affinity if they have a program that benefits from VCache."
"Zen 4’s VCache implementation is an excellent follow-on to AMD’s success in stacking cache on top of Zen 3. The tradeoffs in L3 latency is very minor in comparison to the massive capacity increase, meaning that VCache can provide an absolute performance advantage in quite a few scenarios. Zen 4’s larger L2 also puts it in a better position to tolerate the small latency penalty created by VCache, because the L2 satisfies more memory requests without having to incur L3 latency. The results speak for themselves. While we didn’t test a lot of scenarios, VCache provided an IPC gain in every one of them. Sometimes, the extra caching capacity alone is enough to provide a generational leap in performance per clock, without any changes to the core architecture."
It was developed by someone smarter who didn't just look at marketing slides.

AMD’s 7950X3D: Zen 4 Gets VCache — chipsandcheese.com
"Compute performance has been held back by memory performance for decades, with DRAM performance falling even further behind with every year. Caches compensate for this by trying to keep frequently …"
Obfuscation? Read previous entries. Previous speakers claim that cache has nothing to do with the increase in IPC. I have provided clear proof that it is quite the opposite, and that proof is VCache, which can increase IPC a lot. Not in every application, but the L3 design itself and its capacity affect IPC.

Wow, you've proved AMD's comparison between Zen 2 and Zen 3 wrong using a small subset of tests comparing Zen 4 to Zen 4 X3D.
Honestly I'm not sure what the point is here other than obfuscation. More cache is good for some games and that's why Zen 3 was well above the geomean AMD showed for some games. And Zen 3D further ahead still in even more games for the same reasons.
Almighty Patterson, bless my soul.

Previous speakers claim that cache has nothing to do with the increase in IPC.
And where did I say that?

Obfuscation? Read previous entries. You claim that cache has nothing to do with the IPC increase. I provided clear proof that it is quite the opposite, and that proof is VCache, which can increase IPC a lot. Not in every application, but the L3 design itself and its capacity have an impact on IPC measurements.
Are you sure it didn't increase L3? Each Zen 3 core has direct access to 32 MB instead of just 16 MB like a single Zen 2 core. I see the difference, but you don't see it.

Almighty Patterson, bless my soul.
Not in case of vanilla Z3.
V$ triples the cache.
Z3 didn't add a single meg.
Neither does Z5.
You clearly wanted to deny that cache has any effect on IPC, since AMD didn't state that on the slide, while claiming that latency has no impact on IPC.

It's for ISSCC. From the people who designed it. I'll take that over 'a gamer explains'.
There's a reason AMD markets the X3Ds as 'the ultimate in latency reduction' not 'the ultimate in IPC'.
Clearly not. Stop tilting at windmills and read it again.

You clearly wanted to deny that cache has any effect on IPC because AMD didn't state that on the slide.
At what point am I wrong? Enlighten me.

Clearly not. Stop tilting at windmills and read.
Yea.

Are you sure it didn't increase L3?
There isn't much, that's the point.

I see the difference, but you don't see it.
I see this is still a problem.

Yea.
There isn't much, that's the point.
If you have bigger caches (at any given point in the cache hierarchy) and your prefetchers and predictors aren't any different/better, then all you have is a larger victim cache. If the working set/hot loop/etc. fits within the existing victim cache size, then you're going to see next to zero benefit from a larger cache.

Are you sure it didn't increase L3? Each Zen 3 core has direct access to 32 MB instead of just 16 MB like a single Zen 2 core. I see the difference, but you don't see it.
It doesn't matter that much outside of like DB workloads.

A Zen 2 CCD has 2x CCX (4 cores and 16 MB each, 2x 16 MB = 32 MB total).
The problem is that each Zen 2 core only has direct access to 16 MB; the other 16 MB sits behind the much slower IF.
A Zen 3 CCD has 1x CCX, i.e. 8 cores and 32 MB. This allows each Zen 3 core direct access to the full 32 MB of L3.
Can you see the difference?
And did I write somewhere that there are no other improvements apart from the cache?

If you have bigger caches (at any given point in the cache hierarchy) and your prefetchers and predictors aren't any different/better, then all you have is a larger victim cache. If the working set/hot loop/etc. fits within the existing victim cache size, then you're going to see next to zero benefit from a larger cache.
So, as people keep saying, the front end is going to be super critical: prefetch and predict, along with the decode that feeds the beast. Cache is a support to that, not the key contributor. If it were so super important to enabling performance, AMD would be shipping 4/8-hi VCache in servers and rolling up with 6 GB of cache per socket. If it were that enabling of performance, they could charge whatever price, because CPUs are like 1-5% of the hardware cost of a server (depending on exact config) and irrelevant in full-life TCO. But obviously those don't exist as a product, so what does that tell you?
Everything matters. Games benefit mainly from this.

It doesn't matter that much outside of like DB workloads.
like Nar der....

And did I write somewhere that there are no other improvements apart from the cache?
VCache clearly shows that reducing effective RAM access latency, by going out to RAM less frequently, results in an additional IPC gain, which mainly benefits games.
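That mechanism can be sketched with the textbook average-memory-access-time (AMAT) formula. The cycle counts below are assumed ballpark figures, not measured Zen 4 latencies; the hitrates are loosely based on the Cyberpunk numbers quoted earlier (under-50% baseline, +47% relative with VCache).

```python
# AMAT for requests that miss L2: an L3 hit costs l3_hit_cycles,
# a miss additionally pays the DRAM penalty.
# All cycle counts are assumed for illustration, not measured Zen 4 numbers.

def amat(l3_hit_cycles, l3_hitrate, dram_cycles):
    return l3_hit_cycles + (1.0 - l3_hitrate) * dram_cycles

base   = amat(50, 0.50, 400)   # regular CCD: ~50% L3 hitrate
vcache = amat(54, 0.735, 400)  # VCache: slightly slower L3, much higher hitrate

print(f"AMAT regular CCD: {base:.0f} cycles, VCache CCD: {vcache:.0f} cycles")
# -> AMAT regular CCD: 250 cycles, VCache CCD: 160 cycles
```

A large AMAT drop despite the small L3 latency penalty means fewer stall cycles per instruction, hence higher IPC — exactly the trade the quoted article describes.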
No, they benefit from total cache capacity.

Games benefit mainly from this.
VCache is billions of additional transistors spent to extract additional IPC from the cores. VCache only lets you get closer to the theoretical peak IPC of a given architecture; to see further gains, you need a new, more complex core design (to put it very simply).

like Nar der....
But it's called a diminishing-returns curve; otherwise we would have 6 GB-cache processors right now...
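That diminishing-returns curve is often approximated with the classic power-law rule of thumb for cache misses, miss_rate ∝ capacity^-0.5 (the "√2 rule"); the starting MPKI below is illustrative, not a measured figure.

```python
# Under the sqrt rule, each doubling of capacity cuts the miss rate by
# only ~30%, so the absolute MPKI saved per doubling keeps shrinking.

base_mpki = 8.0  # assumed misses-per-kilo-instruction at 32 MB (illustrative)
for doublings in range(5):
    capacity_mb = 32 * 2 ** doublings
    mpki = base_mpki * 2 ** (-0.5 * doublings)
    print(f"{capacity_mb:5d} MB L3 -> ~{mpki:.2f} MPKI")
```

Going 32 MB → 64 MB saves about 2.3 MPKI in this sketch, while 256 MB → 512 MB saves well under 1 MPKI, which is why nobody ships multi-gigabyte L3s even though they would still technically help.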
Raptor Lake has no inter-chiplet latency problem, and the RAM controller is on the same chip. Thanks to 2 MB of L2 instead of 1.25 MB, Raptor Lake gains approximately 4-5% higher IPC.

No, they benefit from total cache capacity.
(They also love fat L2's, as you've seen in RPL).