> RaptorLake has no inter-chiplet latency problem and the RAM controller is on the same chip

Yes.
> Thanks to 2MB L2 instead of 1.25MB, RaptorLake gains approximately 4-5% higher IPC.

In select* workloads.
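A claim like "4-5% IPC from a bigger L2" can be sanity-checked with a textbook CPI/stall model. A minimal sketch, where the base CPI, memory references per instruction, miss rates, and miss penalty are all illustrative assumptions rather than measured Raptor Lake numbers:

```python
# Toy CPI model illustrating how a lower L2 miss rate can buy a few
# percent of IPC. Every number below is an illustrative assumption,
# not a measured Raptor Lake figure.

def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty):
    """Cycles per instruction including stall cycles from cache misses."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty

# Hypothetical: the bigger L2 trims the miss rate from 5.0% to 4.4%.
cpi_small_l2 = effective_cpi(0.55, 0.25, 0.050, 30)
cpi_big_l2   = effective_cpi(0.55, 0.25, 0.044, 30)

# IPC = 1 / CPI, so the relative IPC gain is CPI_old / CPI_new - 1.
ipc_gain = cpi_small_l2 / cpi_big_l2 - 1
print(f"IPC gain: {ipc_gain:.1%}")
```

With these assumed numbers a modest drop in L2 miss rate yields roughly a 5% IPC gain, the right order of magnitude for the quoted claim, and workload-dependent by construction.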
> Zen has a RAM controller on a separate IOD

The d2d penalty is tiny.
> so it needs larger L3 to compensate

It needs larger L3 to compensate for a skinnier core with less reordering capacity.
> and this is mainly why it benefits from large L3 + VCache in games.

It's a skinnier core.
> Moreover, communication of one CCD with the neighboring CCD is via IOD.

Not that relevant.
> Zen 6 is expected to introduce a single CCD with 16 cores and a shared 64MB L3.

WRONG
> shared 64MB L3

Really? On the same chip or another? Seems huge. People say SRAM scaling stalled on N5/N3. And if I'm eyeballing it correctly, that'd be like 70% of a current CCD's size just for the 64MB L3.
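The eyeballing can be made concrete with published bitcell sizes. A back-of-envelope sketch, assuming an N5-class high-density SRAM bitcell of about 0.021 µm² and a 2x overhead factor for tags, ECC, and periphery (both figures are assumptions, not die-shot measurements):

```python
# Back-of-envelope die area for a 64MB L3. The bitcell size (~0.021 um^2,
# roughly TSMC N5-class high-density SRAM) and the 2x overhead factor for
# tags, ECC and periphery are both assumptions, not die-shot measurements.

BITCELL_UM2 = 0.021
OVERHEAD = 2.0

bits = 64 * 1024 * 1024 * 8          # 64 MB of data bits
raw_mm2 = bits * BITCELL_UM2 / 1e6   # 1 mm^2 = 1e6 um^2
total_mm2 = raw_mm2 * OVERHEAD

print(f"raw arrays ~{raw_mm2:.0f} mm^2, ~{total_mm2:.0f} mm^2 with overhead")
```

Against a Zen 4 CCD of roughly 70 mm², that lands closer to a third of the die than 70% — though with SRAM barely shrinking on N5/N3, the share is unlikely to fall on newer nodes.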
> Guys, what processor is this: https://browser.geekbench.com/v6/cpu/6030993
> EDIT: It's Phoenix (not Zen5). But the score is a bit high (for 4.1GHz!)
> Any info about ROBs and op-cache in Zen5?

It's Zen 4; family 1A is Zen 5 afaik (26 in decimal).
> Guys, what processor is this: https://browser.geekbench.com/v6/cpu/6030993
> EDIT: It's Phoenix (not Zen5). But the score is a bit high (for 4.1GHz!)
> Any info about ROBs and op-cache in Zen5?

It's Zen 4; Zen 5 has 48 KB L1D.
No, something must have been mixed up there.

The following is ONLY from a distributed-computing perspective. That means two things: computing power and efficiency. We use all cores and, most of the time, SMP.
Zen 1: way better than Bulldozer in all respects, and if I remember correctly cheaper and more efficient than the Intel counterparts.
Zen 2: small improvements in performance, about the same efficiency as Zen 1.
Zen 3: MUCH better performance AND efficiency compared to Zen 2. The larger L3 cache made a big difference in some apps.
Zen 4: MUCH better performance than Zen 3, but about the same efficiency. In apps that use AVX-512, though, nothing could touch the performance. For primegrid, we had to disable SMT and pin cores to a CCX for maximum performance, but when we do, nothing that Intel has comes close.
> For primegrid, we had to disable SMT and pin cores to a CCX for maximum performance,

Actually, SMT does measurably improve throughput in PrimeGrid on Zen 4, desktop and server, and does improve perf/W slightly. In contrast, on Zen 2 and Zen 3, SMT in PrimeGrid provides no or sometimes a small host-throughput advantage but always reduces perf/W. (PrimeGrid is vectorized FP with a large cache footprint, but not too large on Zen 3 and 4 if the user gives hints to the OS's process scheduler. Zen 2's cache is too small in many, but not all, of PrimeGrid's currently active projects.)
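The "SMT raises throughput but lowers perf/W" pattern described for Zen 2/3 is just arithmetic on two measurements: if enabling SMT costs more power than it adds in throughput, perf/W drops even though the host completes more work. A hypothetical illustration — the task and power figures are invented, not PrimeGrid data:

```python
# How SMT can raise throughput yet lower perf/W: if the power cost of the
# second thread exceeds its throughput gain, perf/W drops even though the
# host completes more work per day. All figures below are hypothetical.

def perf_per_watt(tasks_per_day, watts):
    return tasks_per_day / watts

smt_off = perf_per_watt(tasks_per_day=100.0, watts=140.0)
smt_on  = perf_per_watt(tasks_per_day=108.0, watts=160.0)  # +8% work, +14% power

print(f"SMT off: {smt_off:.3f} tasks/day/W, SMT on: {smt_on:.3f} tasks/day/W")
```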
> Zen 5 in Distributed Computing? I trust that AMD carves out a decent perf/W update once again, despite only a minor manufacturing-node update. But how much? Various hints earlier in this thread sounded promising to me. Though so far, 1T or/and iso-clock or/and integer performance characteristics have been more of a focus in this thread, rather than nT iso-power FP.

PS: in the context of this specific application scenario, I expect that the top-end Zen 5 desktop SKUs with a 230 W PPT limit (170 W TDP) will make considerably more efficient use of this power budget than their direct predecessors, which would make them more attractive to somebody like myself who is interested in perf/Watt and in perf/host.
No, they benefit from total cache capacity.
(They also love fat L2's, as you've seen in RPL).
> Separate cache pools waste most of cache capacity to duplication

Ughh, not quite.
> And AMD did make it clear that the unified 32MB cache pool of Zen3 is responsible for most of the game speed-up.

Not even.
> That's mainly for a different reason - that is, eliminating the inter-CCX penalty that made Zen2 suffer

Not even that. Zen3 was a major improvement in other ways, most notably branch prediction.
Kinda wish C&C did more profile testing on games where Z3 is miles ahead of Z2. I think people have assumed it is the cache when it could more so be something core specific.
> I have noticed that there is less contribution from the technically inclined posters (including myself), so a separation of threads and focus may help to keep like-minded forum members engaged.

Sorta OT, but interesting stuff is happening at Chips & Cheese:
> The first project we have been working on is a new microbenchmark framework. This new framework will hopefully allow for standardization between different tests to keep things as consistent as possible. In the short term, this will also allow folks other than the Chips and Cheese team to add to the Chips and Cheese test suite.
>
> In the long term, we hope that this framework will allow for more tests to be written than the current Chips and Cheese team could ever write on its own, along with diversifying test authors.
>
> As for the members of the board:
> ...
> - George J. Cozma: President and Chairman
> - Dr. Ian Cutress: Vice President
> - Ryan Mull: Treasurer
> - ...
> So we now have:
> - 9800X, 8 cores, 170W TDP
> - Clock regression, ~100MHz
> - IPC, ~10% compared to Zen4 <NEW>

It's probably an invented leak by Micro Center because they want to clear their Zen4 stock before the Zen5 announcement at Computex. 🤣
OMG. I strongly recommend that Mike Clark not wake up anytime soon and keep sleeping until Zen6.
> Kinda wish C&C did more profile testing on games where Z3 is miles ahead of Z2. I think people have assumed it is the cache when it could more so be something core specific.

Core-architecture changes + the unified L3 cache act as a whole. I don't know how you can still think that L3 was completely irrelevant to IPC.
> F.P.S. ≠ I.P.C.

Not true. If core A at 4GHz + V-Cache gets +15% more FPS than core B at 4GHz without V-Cache, that is an increase in the processor's IPC.
> "Instructions Per Cycle" means instructions per cycle. Is that so hard to memorize?

A processor with a 0.2 ms latch can process more instructions per cycle than a processor with a 0.3 ms latch.

Edit: as an example, when one processor spins on a lock for 0.2 ms and the other for 0.3 ms, which of the two processors got the higher Instructions Per Cycle count?
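Taken literally, the spin-lock example shows exactly why FPS and IPC diverge: a spinning core retires its test loop at full tilt, so its measured IPC can be high while useful work is zero. A minimal sketch with hypothetical counter values:

```python
# IPC is retired instructions divided by elapsed core cycles, nothing more.
# A core spinning on a lock retires its test loop constantly, so it can
# report high IPC while doing zero useful work. Counts are hypothetical.

def ipc(instructions_retired, cycles):
    return instructions_retired / cycles

spinning  = ipc(instructions_retired=3_200_000, cycles=1_000_000)  # tight spin loop
real_work = ipc(instructions_retired=1_800_000, cycles=1_000_000)  # miss-bound work

print(f"spinning: {spinning:.1f} IPC, real work: {real_work:.1f} IPC")
```

This is also why a V-Cache FPS gain and an IPC gain can coincide: fewer stall cycles per instruction of real work does raise IPC, but a high IPC number alone does not imply progress.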
> Will you continue to try to distort reality?

If you need to know: I can't continue something I never started. Just take what I wrote and avoid adding meaning that is not there.