https://blog.hjc.im/spec-cpu-2017 David Huang posted SPECint results some time ago for a 9800X3D overclocked to 5.7 GHz [the fact that the vanilla score uses slower RAM might muddle things a bit; not sure how each subtest is affected by memory], but the stock 9950X vs. 9800X3D numbers also show that the 9800X3D is extremely close to the 9950X despite the clock deficit, which agrees with the C&C results. A slight clock bump on top of what the 9800X3D offers should make both CCDs almost equal.

The 9950X3D is heterogeneous too, but hopefully not as bad as the 7950X3D. (That is, hopefully with a clock frequency differential small enough that cache-insensitive workloads behave practically identically on both CCXs.)
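As a rough sanity check on that last claim, here is a minimal sketch of the upper bound a pure clock bump can buy under perfect frequency scaling; the clock values are illustrative assumptions, not numbers from Huang's post:

```python
# Minimal sketch of the best case a pure clock bump can buy, assuming perfect
# frequency scaling (optimistic: memory-bound SPEC subtests scale worse).
# Clock values are illustrative placeholders, not taken from the linked results.

stock_clock_ghz = 5.2   # hypothetical 9800X3D boost clock
bumped_clock_ghz = 5.7  # the overclock mentioned above

gain = bumped_clock_ghz / stock_clock_ghz - 1
print(f"upper bound from clocks alone: {gain:.1%}")  # ~9.6%
```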
For Zen 6 they really need to make a super superchip dedicated 100% to real performance: max TDP, max V-Cache, everything maxed out, without wasting silicon on NPUs and AVX-512 and whatnot,
like Halo, but 100% CPU,
to hold the indisputable crown even higher than the popular *800X3D models.
Will we get 9c/18t x600 and 18c/36t x900 CPUs then? That would be a really nice floor for mainstream parts.

If the main Zen 6 CCD is indeed 12 cores, and everything scales 1.5x (L2, L3, V-Cache), then a 24-thread processor would be a significant upgrade over the 9800X3D.
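As a quick sketch of what that 1.5x rumor would mean in absolute terms: the Zen 5 figures below are the shipping 9800X3D configuration, while the Zen 6 values are just the speculated multiplier applied literally.

```python
# Speculative 1.5x scale-up of a Zen 5 CCD. The zen5_ccd numbers are the known
# shipping configuration (8c/16t, 32 MB L3 + 64 MB stacked V-Cache); the Zen 6
# column is purely the rumor from the post applied as-is.
zen5_ccd = {"cores": 8, "threads": 16, "l3_mb": 32, "vcache_mb": 64}
zen6_ccd = {k: int(v * 1.5) for k, v in zen5_ccd.items()}

print(zen6_ccd)
# {'cores': 12, 'threads': 24, 'l3_mb': 48, 'vcache_mb': 96}
print(zen6_ccd["l3_mb"] + zen6_ccd["vcache_mb"])
# 144 MB total L3 on a hypothetical 12-core X3D CCD
```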
Desktop and server do not share the CCD, just the design itself.
More differences than just another stepping?
Bingo! I am guessing:
– Turin CCD: focus on a low V-f curve, notably in power-limited load scenarios, of course without compromising correctness?
– GNR CCD: focus on an ability to operate at high V?

Yeah, very very different xtor-level optimizations.
For once, AMD should just release a product without any rehearsed marketing speak. When asked what's great about it, they just say, "We will let the community find out for themselves," and everyone frantically scrambles and tests everything under the moon. One month later, people are still finding pros and cons. All this exposure will translate into product sales. Then AMD announces a free X3D2 CPU for every reviewer/community member who discovered a previously unknown perf benefit in any application/game with a decent user base. It will be a tremendous red-letter day for AMD Marketing.

This will lead to another 'waste of sand' video by GN, HWU et al.
The 16-core chips have always been sort of prosumer, no? Who's buying 16 cores JUST for gaming?

The issue is, we know with near certainty the things it will be good at. Phoronix has done extensive testing on the Epyc X3D parts and has an excellent roadmap of where it shines. ServeTheHome also has a few good benches that show the X3D parts' strengths.
99% of their strengths are in semi-pro or full server level tasks. Sell it under the Epyc brand and watch them fly off the shelves.
Lots of "I have a bigger one than yours"?
To quote part of the article's conclusion:

It does seem like an obvious place for AMD to work some magic in Zen 6 though, assuming they still care about client performance.
Zen 5's improved op cache focuses on maximizing single threaded performance, while the decoders step in for certain multithreaded workloads where the op cache isn't big enough.

So if they ever let go of the op cache, they'd better make the decoder wider; but with the op cache in place, they should rather make it faster. Wider won't help, which lines up with other C&C pieces where they mention that decode width is not a bottleneck.
And Intel's example shows that wider is not necessarily the answer:

Intel's Lion Cove might not take the gaming crown, but Intel's ability to run a plain 8-wide decoder at 5.7 GHz should turn some heads. For now, that advantage doesn't seem to show up. I haven't seen a high-IPC, low-threaded workload with a giant instruction-side cache footprint. But who knows what the future will bring.
AMD has been beefing up the op cache since Zen 1. Zen 5 invested a lot in that cache. They are not about to drop it anytime soon, right?
Never; unless they would, for some reason, need to build a special-purpose core which sacrifices perf and perf/W for area savings. [Edit: Or if they, for some reason, made a core for a simpler-to-decode ISA such as ARM.]
Wider won't help Zen 5; all they need is to cut the frontend latency. Anyway, that hyped clustered frontend was purposely done to handle server/SMT.

The clustered decoder might also help mobile (compared to a single decoder pipeline of the same total width [or even compared to any >4-wide single-pipeline decoder]): while only one thread is running on a core, power off half of the decoder.
As for the power consumption aspect, I just rediscovered this post on the prices to pay for wide decoders (in ISAs with variable instruction lengths):
Even after you know the lengths, you still have to pay for the massively wide mux tree to align the instruction starts with the decoders. (Technically, this usually happens after the first stage of decode begins on every byte boundary, but the point is, you still have to do it at some point.) This structure is huge and high-latency, and grows quadratically with decode width. (x86 instructions are 1-15 bytes long. The first instruction slot selects the first byte. The second instruction slot selects any byte between the second and the 16th. The third slot selects any between the third and the 31st. You get where this is going.) And unless instructions are always the same width, all those transistors switch every cycle, so you pay a lot of power too.
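A quick sanity check on that quadratic claim; this is just a sketch that counts candidate start positions per decoder slot under x86's 1-15 byte instruction lengths, following the arithmetic in the quoted post:

```python
# Counting mux inputs for a hypothetical N-wide x86 decoder, per the post
# above. Instruction slot k can see a start anywhere from byte k (all
# previous instructions 1 byte long) to byte 1 + (k-1)*15 (all previous
# instructions 15 bytes long).

def candidate_starts(slot: int, min_len: int = 1, max_len: int = 15) -> int:
    earliest = 1 + (slot - 1) * min_len
    latest = 1 + (slot - 1) * max_len
    return latest - earliest + 1

for width in (2, 4, 8):
    total = sum(candidate_starts(k) for k in range(1, width + 1))
    print(f"{width}-wide decoder: {total} candidate byte positions")
# 2-wide: 16, 4-wide: 88, 8-wide: 400
# (closed form: N + 7*N*(N-1) for an N-wide decoder, i.e. quadratic in N)
```

Slot 2 sees 15 candidates (bytes 2 through 16) and slot 3 sees 29 (bytes 3 through 31), matching the post; the total grows roughly quadratically with width, which is the cost being described.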
I could see AMD, or more likely Intel, dropping the micro-op cache. Most of the ARM cores don't have one. Apple's M-series are the fastest and most efficient CPUs. It seems to me that wide decoders are more area-efficient than a big micro-op cache.

I don't know why some people think uop caches are going away in the future. I think AMD should do something with the decoder, though. Its worst case has essentially been 4-wide since Zen 1. Maybe they go to 2x5. That may be too big of a change for Zen 6, though.
Decoding is power hungry. Decoding ARM is easier, so there's less need for an op cache. It also allows them to use a longer pipeline with fewer drawbacks. Of course, that's assuming they plan on keeping high frequencies in the future.

I'm not convinced that a micro-op cache lowers power usage by that much. If that were true, then why don't ARM cores have one? They are more power constrained. The fact that ARM is easier to decode might mean that ARM decoders are less power hungry, but that answer isn't sufficient IMHO. I don't see how the difference between ARM and x86 could be so big that AMD is worried about decoder power consumption but Apple isn't.
Strix Halo does not support dGPUs and features only 12 PCIe Gen 4 lanes from the CPU.
A detail that is no accident:

"Our big Middle cores are better than their army of little cores": AMD's Ben Conrad chats about some of the design decisions behind Ryzen AI APUs and what makes Strix Halo tick (www.notebookcheck.net)

That's a big turn-off. I was hoping we'd get dGPU support, unlike the M chips.

Not at all. That's why Strix Point exists.
More like Fire Range, but yeah, I was hoping for dGPUs.