Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 177

eek2121

Diamond Member
Aug 2, 2005
3,051
4,274
136
There's a Russian saying "9 women can't make a baby in 1 month". It could be that we're at the point of diminishing returns without a paradigm shift. We may need to add some new ISA extensions, update how x86 is compiled and ditch concepts like SMT to take the next step.
They just need the right man.
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose that there is a scenario where all the engineers on the Zen5 development team fell down at once and suffered TBIs and had to stop development b/4 finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.
A lot of folks here are expecting large jumps. Zen 4 has an okay jump. Getting large improvements from an already well optimized design is very hard.
Not really. For example, if we put an RTX 4070 in a CPU SoC package or a monolithic die (with the CPU) that shares DDR5 memory, it's not going to perform well, due to the GPU-to-main-memory bandwidth bottleneck and also the bottleneck from sharing main memory with the CPU.

Also, the cost of the RTX 4070 silicon remains the same irrespective of whether it's an iGPU in a CPU SoC package or a separate dGPU graphics card. The only place where there is cost savings is GDDR6 graphics memory as we won't be typically using them with iGPUs.

To summarize, for identical performance, the iGPU needs to be more powerful (and hence more expensive) than a dGPU. A dedicated RTX 4060 Ti will comfortably beat an integrated RTX 4070.
Go look at the latest article on the Geekom Mini PC on the front page of AT. Those types of boxes are becoming popular. I own one with a Cezanne chip and it works great. Strix Halo would be wonderful in that form factor.

You are wrong about memory constraints. LPDDR5X is available at speeds of 8533 MT/s, and LPDDR5T is even faster. Does it reach GDDR6X speeds? No. Is it fast enough for an IGP? Absolutely. LPDDR5X + a decent cache will likely be more than enough to power it.

If the chip reaches 4060 levels of performance, OEMs can save a ton of money by not including a dGPU. If it hits 4070 levels of performance, it will be an instabuy for me.
 

Gideon

Golden Member
Nov 27, 2007
1,712
3,932
136
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose that there is a scenario where all the engineers on the Zen5 development team fell down at once and suffered TBIs and had to stop development b/4 finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.
While I don't think this is the most likely outcome I can totally see this as a possibility. The industry is actually full of such examples:
  • RDNA2 -> RDNA3 is a prime example, given the amount of added transistors and a major node shrink, the performance uplift is really bad.
  • The entire Samsung Custom ARM CPU family is an example (where significantly wider cores on better nodes ended up with almost no improvement in perf/watt)
  • And particularly Intel's Cypress Cove is an example (the infamous 14nm derivative of Sunny Cove).

AMD has executed near-flawlessly on the CPU side, so this makes it harder to believe. However, they've also had TSMC constantly delivering on the process side. As we know, TSMC screwed up with N3E. AMD might have had limited time/resources to invest into an N4 backport.


Going back to the original leaked slides, the Zen 6 (Morpheus) generation lists 10%+ IPC uplift. That's pretty big for what's rumored to be a minor node bump + a major packaging redesign (but few changes to the core CPU architecture).

I'm not saying that the 10-15% prediction must be true. All I'm saying is that if TSMC indeed hit major problems with N3E and AMD had to switch to N4 quite late, there could be plenty of reasons why an otherwise very good architecture just can't stretch its wings.



Nevertheless, even if the low IPC growth ends up true, these changes might be necessary to reap the benefits in later architectures: Zen6, Zen 7, etc....

In hindsight one can consider Zen 1 to be quite an unbalanced core compared to Zen 4 (for all the parts that didn't change much in the 4 generations, e.g. the 6 micro-op dispatch).

You have to significantly widen the core at some point, even when it immediately looks like a "waste of resources".
 

Timmah!

Golden Member
Jul 24, 2010
1,463
729
136
While I don't think this is the most likely outcome I can totally see this as a possibility. The industry is actually full of such examples:
  • RDNA2 -> RDNA3 is a prime example, given the amount of added transistors and a major node shrink, the performance uplift is really bad.
  • The entire Samsung Custom ARM CPU family is an example (where significantly wider cores on better nodes ended up with almost no improvement in perf/watt)
  • And particularly Intel's Cypress Cove is an example (the infamous 14nm derivative of Sunny Cove).

AMD has executed near-flawlessly on the CPU side, so this makes it harder to believe. However, they've also had TSMC constantly delivering on the process side. As we know, TSMC screwed up with N3E. AMD might have had limited time/resources to invest into an N4 backport.


Going back to the original leaked slides, the Zen 6 (Morpheus) generation lists 10%+ IPC uplift. That's pretty big for what's rumored to be a minor node bump + a major packaging redesign (but few changes to the core CPU architecture).

I'm not saying that the 10-15% prediction must be true. All I'm saying is that if TSMC indeed hit major problems with N3E and AMD had to switch to N4 quite late, there could be plenty of reasons why an otherwise very good architecture just can't stretch its wings.



Nevertheless, even if the low IPC growth ends up true, these changes might be necessary to reap the benefits in later architectures: Zen6, Zen 7, etc....

In hindsight one can consider Zen 1 to be quite an unbalanced core compared to Zen 4 (for all the parts that didn't change much in the 4 generations, e.g. the 6 micro-op dispatch).

You have to significantly widen the core at some point, even when it immediately looks like a "waste of resources".
Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw, but more of a "low-hanging fruit being gone" situation, and a bigger Zen 5 core only drawing with GC or slightly beating it would pretty much prove that?
 

inf64

Diamond Member
Mar 11, 2011
3,764
4,222
136
Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw, but more of a "low-hanging fruit being gone" situation, and a bigger Zen 5 core only drawing with GC or slightly beating it would pretty much prove that?
Well, that will end up being the case if we really get the measly 10-15% IPC increase over Zen 4. Let's wait and see.
 
Reactions: Tlh97 and Timmah!

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw, but more of a "low-hanging fruit being gone" situation, and a bigger Zen 5 core only drawing with GC or slightly beating it would pretty much prove that?
So now AMD has to go after the fruit higher up in the tree. The good news is that they have a larger transistor budget to chase after the gains that can be had. The biggest problem, going forward, is the dramatic slowdown in process node improvements. Chiplet/tile packaging is the only way forward - but that introduces more complexity and costs.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,428
2,914
136
You can just add silicon die area of the 2 parts + extra memory that will get poor utilization + board cost + cooling + extra assembly cost
Don't know what you mean by that extra memory?
Board cost and assembly cost yes, but that should be pretty cheap for a laptop.
Strix Halo supposedly has 120W, so I don't think you really need a stronger cooling for a comparably performing CPU+dGPU.
Shared LPDDR5 vs (LP)DDR5 + GDDR. The shared will win on cost, we will see about performance, especially how much MALL cache helps.
I wouldn't be so sure about which one would win.
We are talking about 24GB LPDDR5X at up to 8.533 Gbps vs 16GB DDR5 at 4.8-5.6 Gbps + 8GB GDDR6 at 16 Gbps.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,322
4,790
96
We are talking about 24GB LPDDR5x
Why so low?
memory's cheap these days.
All I'm saying is that if TSMC indeed hit major problems with N3E and AMD had to switch to N4 quite late, there could be plenty of reasons why an otherwise very good architecture just can't stretch its wings.
Nodes are irrelevant, all bleeding edge stuff has rounding error improvements these days.
N3 is a meagre 30% device-level density bump.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
Don't know what you mean by that extra memory?
Board cost and assembly cost yes, but that should be pretty cheap for a laptop.
Strix Halo supposedly has 120W, so I don't think you really need a stronger cooling for a comparably performing CPU+dGPU.

I wouldn't be so sure about which one would win.
We are talking about 24GB LPDDR5X at up to 8.533 Gbps vs 16GB DDR5 at 4.8-5.6 Gbps + 8GB GDDR6 at 16 Gbps.

Pooled memory tends to provide some savings. A smaller shared pool provides the same benefit as a larger total split into segregated buckets.

Most likely spec of Strix Point is with 4 x 8 GB LPDDR5x chips rated at ~8333 = 32 GB

Equivalent discrete configuration would have to be 32 GB of DDR5 + 8 GB of GDDR6. And this configuration would still more likely run out of graphics memory in AAA games than the Strix Point, because Windows would just allocate more memory dynamically from the shared pool in Strix Point.
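
The capacity argument above can be put as a toy model. This is only a sketch under simplifying assumptions: the function names and the 12 GB + 12 GB workload are hypothetical, and it ignores OS overhead as well as a dGPU's ability to spill (slowly) into system memory over PCIe.

```python
def fits_pooled(cpu_gb: float, gpu_gb: float, pool_gb: float) -> bool:
    """Shared pool: only the combined demand has to fit."""
    return cpu_gb + gpu_gb <= pool_gb


def fits_split(cpu_gb: float, gpu_gb: float, system_gb: float, vram_gb: float) -> bool:
    """Segregated buckets: each demand has to fit in its own bucket."""
    return cpu_gb <= system_gb and gpu_gb <= vram_gb


# Hypothetical game: 12 GB of CPU-side data and 12 GB of graphics data.
cpu_need, gpu_need = 12, 12

print("32 GB shared pool:", fits_pooled(cpu_need, gpu_need, pool_gb=32))
print("32 GB DDR5 + 8 GB GDDR6:", fits_split(cpu_need, gpu_need, system_gb=32, vram_gb=8))
```

In this model the 32 GB shared pool fits the workload, while the 40 GB split configuration does not, because the graphics demand exceeds its 8 GB bucket.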
 
Reactions: Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
With low volume that won't happen. AMD needs to meet demand and margins are not high in this segment.

I see a scenario where this is successful: If there is an exodus from notebooks with discrete GPU.

In this scenario, Strix Point is head and shoulders above the competition in providing an alternative to notebooks with dGPU.
 
Reactions: Tlh97

TESKATLIPOKA

Platinum Member
May 1, 2020
2,428
2,914
136
Pooled memory tends to provide some savings. A smaller shared pool provides the same benefit as a larger total split into segregated buckets.

Most likely spec of Strix Point is with 4 x 8 GB LPDDR5x chips rated at ~8333 = 32 GB

Equivalent discrete configuration would have to be 32 GB of DDR5 + 8 GB of GDDR6. And this configuration would still more likely run out of graphics memory in AAA games than the Strix Point, because Windows would just allocate more memory dynamically from the shared pool in Strix Point.
It could be 32, 48 or 64GB with 4x 8533 Mbps modules on board.

Even in productivity it wouldn't really be equal, because the IGP still needs some memory allocation, though that's just a few hundred megabytes. During gaming, that equivalence just doesn't hold.
You are right that shared memory can bypass the 8GB VRAM limit.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I see a scenario where this is successful: If there is an exodus from notebooks with discrete GPU.

In this scenario, Strix Point is head and shoulders above the competition in providing an alternative to notebooks with dGPU.
AMD needs to pour some money into development of this platform. It's not a typical product, and it will require quite a lot of engineering from OEMs, even if in general it simplifies things for them.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
AMD needs to pour some money into development of this platform. It's not a typical product, and it will require quite a lot of engineering from OEMs, even if in general it simplifies things for them.
I think AMD should go Apple route with Strix Halo, and place memory inside the package. To remove one step of complexity from the OEMs. This would make adopting this platform (by the OEMs) less complex than any other.

Then, changing memory types, speeds, even adding more channels (providing there is enough physical space) would be encapsulated inside the package, and all the OEMs would have to worry about is connecting the IO, power and cooling.

BTW, this will eventually happen. Resisting this trend is counterproductive and self defeating. Giving OEMs ability to pair AMD product with garbage memory is never going to lead to anything good.
 
Reactions: Gideon and Tlh97

adroc_thurston

Diamond Member
Jul 2, 2023
3,322
4,790
96
8533 module.
Gonna be the exact same price as the 7500 one.
They're just ICs.
To remove one step of complexity from the OEMs. This would make adopting this platform (by the OEMs) less complex than any other.
SKU spam. gross.
It's also quad channel, which changes the PCBs, which means new production lines/supply chain parts.
no?
type4 HDI PCB isn't some magick and is necessary to hit 7500 on PHX anyway.
 
Reactions: Tlh97 and Joe NYC

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
8533 module.

RX 7600 has a bit higher BW with 128-bit 18 Gbps than Strix Halo with 256-bit 7500-8533 Mbps, and then there is the extra CPU part on top.

RX 7600 and RTX 4060 Ti both seem to have a bandwidth of 288 GB/s
StrixPoint, if using 8533 speed modules would be 273 GB/s

5800x with DDR4-3200 had max bandwidth of 51 GB/s

This is for comparison, showing how much bandwidth a CPU can take up. Typically, a CPU is a lot more latency-constrained than bandwidth-constrained, so it uses only a fraction of the 51 GB/s, and most of the bandwidth would be left to the GPU.

RX 7600 and RTX 4060 Ti PCIe4 x8 bandwidth: 16 GB/s

When moving large chunks of data, such as loading textures, there is no comparison: the iGPU is ahead by an order of magnitude.
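
For anyone who wants to check the figures above, here is a rough sketch of the arithmetic. It gives theoretical peaks only. The 256-bit and 128-bit widths come from the quoted post; the 128-bit (dual-channel) DDR4 bus and the helper names are my assumptions.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Theoretical peak in GB/s: bytes per transfer times mega-transfers per second."""
    return bus_width_bits / 8 * data_rate_mtps / 1000


def pcie4_bandwidth_gbs(lanes: int) -> float:
    """PCIe 4.0 per direction: 16 GT/s per lane with 128b/130b encoding."""
    return lanes * 16 * (128 / 130) / 8


configs = {
    "256-bit LPDDR5X-8533 (shared pool)": peak_bandwidth_gbs(256, 8533),
    "128-bit GDDR6 at 18 Gbps (RX 7600 class)": peak_bandwidth_gbs(128, 18000),
    "128-bit DDR4-3200 (assumed dual channel)": peak_bandwidth_gbs(128, 3200),
    "PCIe 4.0 x8 link to a dGPU": pcie4_bandwidth_gbs(8),
}

for name, gbs in configs.items():
    print(f"{name}: {gbs:.1f} GB/s")
```

The ratio between the shared LPDDR5X pool and the PCIe 4.0 x8 link comes out at roughly 17x, which is where the "order of magnitude" comes from.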

How about sending commands to the GPU? I think latency here is also favoring iGPU, hands down.

There is another, more speculative area. Say the CPU decompresses a texture and writes it to a certain memory address. It may be written to the MALL cache first. Then the CPU hands it to the GPU, and it might be possible for the GPU to get it from the MALL instead of from memory.

So, MALL may be more than compensating for memory areas in contention, since MALL bandwidth to GPU is likely >> LPDDR5 bandwidth.
 
Reactions: Tlh97 and Kryohi

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
SKU spam. gross.

Ok, so that is one (minor) downside. Although, 32 GB of LPDDR5x-8533 would likely be 90% of volume.

Upside? Say this helps AMD gain 50% more design wins. It would be definitely worth it.

Also, setting up the assembly, and then the per-unit cost of assembly, would be cheaper if AMD did it for all the OEMs than if each OEM were to duplicate the same work.
 
Reactions: Tlh97

biostud

Lifer
Feb 27, 2003
18,399
4,964
136
Since AMD creates the SoC for both PlayStation and Xbox, they could probably do something similar for PC if there were a market for it (which I doubt).
 