Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

soresu · Oct 17, 2023

coercitiv said:
Like in many human activities, making only the most efficient moves isn't necessarily a winning long-term strategy.

Going by the principle of Jevons Paradox, this very seldom happens anyway: the moment you make something more efficient, someone wants more out of it, so you lose the efficiency gain 😅

soresu · Oct 17, 2023

Frenetic Pony said:
Goals should never include performance per watt or per mm

Per mm² is always going to be relevant, in a basic business sense, for the ODMs' profit margins.

eek2121 · Oct 17, 2023

H433x0n said:
There's a Russian saying "9 women can't make a baby in 1 month". It could be that we're at the point of diminishing returns without a paradigm shift. We may need to add some new ISA extensions, update how x86 is compiled and ditch concepts like SMT to take the next step.

They just need the right man.

Ajay said:
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose that there is a scenario where all the engineers on the Zen5 development team fell down at once and suffered TBIs and had to stop development b/4 finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.

A lot of folks here are expecting large jumps. Zen 4 had an okay jump. Getting large improvements from an already well-optimized design is very hard.

SiliconFly said:
Not really. For example, if we put an RTX 4070 in a CPU SoC package or a monolithic die (with the CPU) that shares DDR5 memory, it's not going to perform well, due to the GPU-to-main-memory bandwidth bottleneck and also the bottleneck of sharing main memory with the CPU.

Also, the cost of the RTX 4070 silicon remains the same irrespective of whether it's an iGPU in a CPU SoC package or a separate dGPU graphics card. The only place where there is cost savings is GDDR6 graphics memory as we won't be typically using them with iGPUs.

To summarize, for identical performance, the iGPU needs to be more powerful (and hence more expensive) than a dGPU. A dedicated RTX 4060 Ti will comfortably beat an integrated RTX 4070.

Go look at the latest article on the Geekom Mini PC on the front page of AT. Those types of boxes are becoming popular. I own one with a Cezanne chip and it works great. Strix Halo would be wonderful in that form factor.

You are wrong about memory constraints. LPDDR5X is available at speeds of 8533 MT/s, and LPDDR5T is even faster. Does it reach GDDR6X speeds? No. Is it fast enough for an IGP? Absolutely. LPDDR5X plus a decent cache will likely be more than enough to feed it.

If the chip reaches 4060 levels of performance, OEMs can save a ton of money by not including a dGPU. If it hits 4070 levels of performance, it will be an instabuy for me.

Gideon · Oct 17, 2023

Ajay said:
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose that there is a scenario where all the engineers on the Zen5 development team fell down at once and suffered TBIs and had to stop development b/4 finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.

While I don't think this is the most likely outcome I can totally see this as a possibility. The industry is actually full of such examples:

RDNA2 -> RDNA3 is a prime example, given the amount of added transistors and a major node shrink, the performance uplift is really bad.
The entire Samsung Custom ARM CPU family is an example (where significantly wider cores on better nodes ended up with almost no improvement in perf/watt)
And particularly Intel's Cypress Cove is an example (the infamous 14nm derivative of Sunny Cove).

AMD has executed near-flawlessly on the CPU side, so this makes it harder to believe. However, they've also had TSMC constantly delivering on the process side. As we know, TSMC screwed up with N3E. AMD might have had limited time/resources to invest in an N4 backport.

Going back to the original leaked slides, the Zen 6 (Morpheus) generation lists 10%+ IPC uplift. That's pretty big for what's rumored to be a minor node bump + a major packaging redesign (but few changes to the core CPU architecture).

I'm not saying that the 10-15% prediction must be true. All I'm saying is that if TSMC indeed hit major problems with N3E and AMD had to switch to N4 quite late, there could be plenty of reasons why an otherwise very good architecture just can't stretch its wings.

Nevertheless, even if the low IPC growth ends up true, these changes might be necessary to reap the benefits in later architectures: Zen6, Zen 7, etc....

In hindsight, one can consider Zen 1 to be quite an unbalanced core compared to Zen 4 (for all the parts that didn't change much in the 4 generations, e.g. the 6-macro-op dispatch).

You have to significantly widen the core at some point, even when it immediately looks like a "waste of resources".

Timmah! · Oct 17, 2023

Gideon said:
While I don't think this is the most likely outcome I can totally see this as a possibility. The industry is actually full of such examples:

RDNA2 -> RDNA3 is a prime example, given the amount of added transistors and a major node shrink, the performance uplift is really bad.

The entire Samsung Custom ARM CPU family is an example (where significantly wider cores on better nodes ended up with almost no improvement in perf/watt)

And particularly Intel's Cypress Cove is an example (the infamous 14nm derivative of Sunny Cove).

AMD has executed near-flawlessly on the CPU side, so this makes it harder to believe. However, they've also had TSMC constantly delivering on the process side. As we know, TSMC screwed up with N3E. AMD might have had limited time/resources to invest in an N4 backport.

Going back to the original leaked slides, the Zen 6 (Morpheus) generation lists 10%+ IPC uplift. That's pretty big for what's rumored to be a minor node bump + a major packaging redesign (but few changes to the core CPU architecture).

I'm not saying that the 10-15% prediction must be true. All I'm saying is that if TSMC indeed hit major problems with N3E and AMD had to switch to N4 quite late, there could be plenty of reasons why an otherwise very good architecture just can't stretch its wings.

Nevertheless, even if the low IPC growth ends up true, these changes might be necessary to reap the benefits in later architectures: Zen6, Zen 7, etc....

In hindsight, one can consider Zen 1 to be quite an unbalanced core compared to Zen 4 (for all the parts that didn't change much in the 4 generations, e.g. the 6-macro-op dispatch).

You have to significantly widen the core at some point, even when it immediately looks like a "waste of resources".

Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw but to the "low-hanging fruit" being gone, and a bigger Zen 5 core only matching Golden Cove or slightly beating it would pretty much prove that?

inf64 · Oct 17, 2023

Timmah! said:
Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw but to the "low-hanging fruit" being gone, and a bigger Zen 5 core only matching Golden Cove or slightly beating it would pretty much prove that?

Well, that will end up being the case if we really get only a measly 10-15% IPC increase over Zen 4. Let's wait and see.

Ajay · Oct 17, 2023

Timmah! said:
Agreed. Isn't Golden Cove a significantly bigger core than Zen 4, yet only marginally more performant? Perhaps that's not down to some flaw but to the "low-hanging fruit" being gone, and a bigger Zen 5 core only matching Golden Cove or slightly beating it would pretty much prove that?

So now AMD has to go after the fruit higher up in the tree. The good news is that they have a larger transistor budget to chase after the gains that can be had. The biggest problem, going forward, is the dramatic slowdown in process node improvements. Chiplet/tile packaging is the only way forward - but that introduces more complexity and costs.

TESKATLIPOKA · Oct 17, 2023

Joe NYC said:
You can just add silicon die area of the 2 parts + extra memory that will get poor utilization + board cost + cooling + extra assembly cost

I don't know what you mean by that extra memory.
Board cost and assembly cost yes, but that should be pretty cheap for a laptop.
Strix Halo supposedly has 120W, so I don't think you really need a stronger cooling for a comparably performing CPU+dGPU.

Joe NYC said:
Shared LPDDR5 vs (LP)DDR5 + GDDR. The shared will win on cost, we will see about performance, especially how much MALL cache helps.

I wouldn't be so sure about which one would win.
We are talking about 24GB LPDDR5X at up to 8.533 Gbps vs 16GB DDR5 at 4.8-5.6 Gbps + 8GB GDDR6 at 16 Gbps.

adroc_thurston · Oct 17, 2023

TESKATLIPOKA said:
We are talking about 24GB LPDDR5x

Why so low?
memory's cheap these days.

Gideon said:
All I'm saying is if TSMC indeed hit a major problems with N3E and AMD had to switch to N4 quite late - there could be plenty of reasons why an otherwise very good architecture just can't stretch it's wings.

Nodes are irrelevant; all bleeding-edge stuff has rounding-error improvements these days.
N3 is a meagre 30% device-level density bump.

igor_kavinski · Oct 17, 2023

adroc_thurston said:
memory's cheap these days.

Apple/nGreedia beg to differ.

TESKATLIPOKA · Oct 17, 2023

adroc_thurston said:
Why so low?
memory's cheap these days.

Low speed memory is cheap, but the faster ones cost a lot more.

Strix Halo will use faster memory, so OEMs will charge extra for it.

Joe NYC · Oct 17, 2023

yuri69 said:
Intel is in a completely different market position compared to AMD. Intel has no problem pushing OEMs their stuff (Ultrabooks?). IIRC, AMD resorts to obscure exclusives.

Intel once was in the same position in servers and now it is reversed.

AMD is trying to repeat the same in notebooks. Which will be harder, but worth trying.

TESKATLIPOKA · Oct 17, 2023

Joe NYC said:
AMD is trying to repeat the same in notebooks. Which will be harder, but worth trying.

With low volume that won't happen. AMD needs to meet demand and margins are not high in this segment.

Joe NYC · Oct 17, 2023

TESKATLIPOKA said:
I don't know what you mean by that extra memory.
Board cost and assembly cost yes, but that should be pretty cheap for a laptop.
Strix Halo supposedly has 120W, so I don't think you really need a stronger cooling for a comparably performing CPU+dGPU.

I wouldn't be so sure about which one would win.
We are talking about 24GB LPDDR5X at up to 8.533 Gbps vs 16GB DDR5 at 4.8-5.6 Gbps + 8GB GDDR6 at 16 Gbps.

Pooled memory tends to provide some savings: a smaller shared pool provides the same benefit as a larger total split into segregated buckets.

Most likely spec of Strix Point is with 4 x 8 GB LPDDR5x chips rated at ~8333 = 32 GB

Equivalent discrete configuration would have to be 32 GB of DDR5 + 8 GB of GDDR6. And this configuration would still more likely run out of graphics memory in AAA games than the Strix Point, because Windows would just allocate more memory dynamically from the shared pool in Strix Point.
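A minimal sketch of the pooling argument, assuming a hypothetical game that wants 12 GB on the CPU side and 10 GB on the GPU side (illustrative numbers, not measurements). It deliberately ignores that a real driver would spill the excess VRAM demand into system RAM over PCIe, which costs performance rather than failing outright.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    cpu_gb: float  # memory the CPU-side workload needs
    gpu_gb: float  # memory the GPU-side workload needs


def fits_segregated(d: Demand, sys_ram_gb: float, vram_gb: float) -> bool:
    # Each bucket must independently cover its own side's demand.
    return d.cpu_gb <= sys_ram_gb and d.gpu_gb <= vram_gb


def fits_pooled(d: Demand, shared_gb: float) -> bool:
    # A single shared pool only has to cover the combined demand.
    return d.cpu_gb + d.gpu_gb <= shared_gb


game = Demand(cpu_gb=12, gpu_gb=10)  # hypothetical AAA game footprint
print(fits_segregated(game, sys_ram_gb=32, vram_gb=8))  # False: 10 GB > 8 GB VRAM
print(fits_pooled(game, shared_gb=32))                  # True: 22 GB <= 32 GB
```

The 32 GB segregated setup has more memory in total (40 GB) yet misses on the GPU side, while the 32 GB pool still has 10 GB of headroom.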

Joe NYC · Oct 17, 2023

TESKATLIPOKA said:
With low volume that won't happen. AMD needs to meet demand and margins are not high in this segment.

I see a scenario where this is successful: If there is an exodus from notebooks with discrete GPU.

In this scenario, Strix Point is head and shoulders above the competition in providing an alternative to notebooks with dGPU.

TESKATLIPOKA · Oct 17, 2023

Joe NYC said:
Pooled memory tends to provide some savings: a smaller shared pool provides the same benefit as a larger total split into segregated buckets.

Most likely spec of Strix Point is with 4 x 8 GB LPDDR5x chips rated at ~8333 = 32 GB

Equivalent discrete configuration would have to be 32 GB of DDR5 + 8 GB of GDDR6. And this configuration would still more likely run out of graphics memory in AAA games than the Strix Point, because Windows would just allocate more memory dynamically from the shared pool in Strix Point.

It could be 32, 48 or 64GB with 4 × 8533 MT/s modules on board.

Even in productivity workloads it wouldn't be truly equal, because the IGP still needs some memory allocation, although that's just a few hundred MB; during gaming, that equivalence simply doesn't hold.
You are right, that shared memory can bypass the limit of 8GB Vram.

Glo. · Oct 17, 2023

Joe NYC said:
I see a scenario where this is successful: If there is an exodus from notebooks with discrete GPU.

In this scenario, Strix Point is head and shoulders above the competition in providing an alternative to notebooks with dGPU.

AMD needs to pour some money into developing this platform. It's not a typical product, and it will require quite a lot of engineering from OEMs, even if in general it simplifies things for them.

adroc_thurston · Oct 17, 2023

igor_kavinski said:
Apple/nGreedia beg to differ.

Irrelevant for stx-halo.

TESKATLIPOKA said:
Low speed memory is cheap, but the faster ones cost a lot more.

Ughhhh no?
It's all 7500 ICs everywhere right now.

Glo. said:
It's not a typical product

It's just a fatter APU.
Market traction (well, sales, really) is the question.

TESKATLIPOKA · Oct 17, 2023

adroc_thurston said:
Ughhhh no?
It's all 7500 ICs everywhere right now.

8533 module.

The RX 7600 has slightly higher bandwidth with 128-bit at 18 Gbps than Strix Halo with 256-bit at 7500-8533 MT/s, and then there is the CPU on top sharing it.
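The comparison above follows from the usual peak-bandwidth formula (bus width in bytes × transfer rate). A minimal sketch, assuming the rumored 256-bit LPDDR5X bus for Strix Halo discussed in this thread; these are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mtps / 1000  # MB/s -> GB/s


configs = {
    "RX 7600 (128-bit GDDR6 @ 18 Gbps)": (128, 18000),
    "Strix Halo, rumored (256-bit LPDDR5X-7500)": (256, 7500),
    "Strix Halo, rumored (256-bit LPDDR5X-8533)": (256, 8533),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This gives 288.0 GB/s for the RX 7600 versus 240.0-273.1 GB/s for the rumored Strix Halo configurations, before subtracting whatever the CPU cores consume.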

Joe NYC · Oct 17, 2023

Glo. said:
AMD needs to pour some money into developing this platform. It's not a typical product, and it will require quite a lot of engineering from OEMs, even if in general it simplifies things for them.

I think AMD should go the Apple route with Strix Halo, and place memory inside the package. To remove one step of complexity from the OEMs. This would make adopting this platform (by the OEMs) less complex than any other.

Then, changing memory types, speeds, even adding more channels (providing there is enough physical space) would be encapsulated inside the package, and all the OEMs would have to worry about is connecting the IO, power and cooling.

BTW, this will eventually happen. Resisting this trend is counterproductive and self defeating. Giving OEMs ability to pair AMD product with garbage memory is never going to lead to anything good.

Glo. · Oct 17, 2023

adroc_thurston said:
It's just a fatter APU.
Market traction (well, sales, really) is the question.

It's also quad-channel, which changes the PCBs, which means new production lines/supply chain parts.

Unless the memory is soldered onto the APU package; then it's a completely different discussion.

adroc_thurston · Oct 17, 2023

TESKATLIPOKA said:
8533 module.

Gonna be the exact same price as the 7500 one.
They're just ICs.

Joe NYC said:
To remove one step of complexity from the OEMs. This would make adopting this platform (by the OEMs) less complex than any other.

SKU spam. gross.

Glo. said:
It's also quad-channel, which changes the PCBs, which means new production lines/supply chain parts.

no?
type4 HDI PCB isn't some magick and is necessary to hit 7500 on PHX anyway.

Joe NYC · Oct 17, 2023

TESKATLIPOKA said:
8533 module.

The RX 7600 has slightly higher bandwidth with 128-bit at 18 Gbps than Strix Halo with 256-bit at 7500-8533 MT/s, and then there is the CPU on top sharing it.

RX 7600 and 4060 Ti both seem to have a bandwidth of 288 GB/s
Strix Halo, if using 8533 MT/s modules, would be 273 GB/s

A 5800X with dual-channel DDR4-3200 had a max bandwidth of 51 GB/s

This is for comparison: it shows how much bandwidth a CPU can take up, and typically a CPU is far more latency-constrained than bandwidth-constrained. So the CPU typically uses a fraction of the 51 GB/s, and most of the bandwidth would be left to the GPU.

RX 7600 and 4060 Ti PCIe 4.0 x8 bandwidth: 16 GB/s

When moving large chunks of data, such as loading textures, there is no comparison: the iGPU is ahead by an order of magnitude.
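A quick check of the "order of magnitude" claim, as a sketch: PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, and the unified-memory figure assumes the rumored 256-bit LPDDR5X-8533 bus from this thread. Both are theoretical per-direction peaks; protocol overhead would lower the real PCIe figure further.

```python
def pcie_bandwidth_gbs(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Per-direction PCIe bandwidth in GB/s: line rate x lanes x encoding efficiency / 8 bits."""
    return gt_per_s * lanes * encoding / 8


pcie4_x8 = pcie_bandwidth_gbs(16.0, 8)  # PCIe 4.0 x8, as on the RX 7600 / 4060 Ti
unified = 256 / 8 * 8533 / 1000         # rumored 256-bit LPDDR5X-8533 shared pool, GB/s

print(f"PCIe 4.0 x8: {pcie4_x8:.2f} GB/s")
print(f"Unified LPDDR5X: {unified:.1f} GB/s ({unified / pcie4_x8:.1f}x)")
```

That works out to about 15.75 GB/s versus about 273 GB/s, a ratio of roughly 17x.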

How about sending commands to the GPU? I think latency here is also favoring iGPU, hands down.

There is another, more speculative area. Say the CPU decompresses a texture and writes it to a certain memory address. It may be written to the MALL cache first. Then the CPU hands it to the GPU, and it might be possible for the GPU to read it from the MALL instead of from memory.

So the MALL may more than compensate for contention on the shared memory, since MALL bandwidth to the GPU is likely >> LPDDR5X bandwidth.
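One crude way to reason about this is a bandwidth-amplification model for a memory-side cache: if a fraction of GPU traffic hits in the MALL, DRAM only has to carry the misses, so deliverable traffic grows as DRAM bandwidth / (1 - hit rate), up to the cache's own bandwidth. This is a simplified sketch, not AMD's actual design; the hit rates and the 1000 GB/s MALL bandwidth below are hypothetical, and the 273 GB/s DRAM figure is the rumored 256-bit LPDDR5X-8533 number from this thread.

```python
def effective_bandwidth_gbs(dram_gbs: float, hit_rate: float, mall_gbs: float) -> float:
    """Crude bandwidth-amplification model for a memory-side cache.

    Assumes every request goes through the cache and only misses reach DRAM,
    so DRAM carries (1 - hit_rate) of the traffic. Total deliverable traffic
    is dram_gbs / (1 - hit_rate), capped by the cache's own bandwidth.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return min(dram_gbs / (1.0 - hit_rate), mall_gbs)


DRAM_GBS = 273.056       # rumored 256-bit LPDDR5X-8533
MALL_GBS = 1000.0        # hypothetical MALL bandwidth

for hit_rate in (0.0, 0.5, 0.8):  # hypothetical hit rates
    bw = effective_bandwidth_gbs(DRAM_GBS, hit_rate, MALL_GBS)
    print(f"hit rate {hit_rate:.0%}: ~{bw:.0f} GB/s")
```

Under these assumptions a 50% hit rate doubles the usable bandwidth to about 546 GB/s, and at 80% the cache's own bandwidth becomes the limit.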

Joe NYC · Oct 17, 2023

adroc_thurston said:
SKU spam. gross.

OK, so that is one (minor) downside. Although 32 GB of LPDDR5X-8533 would likely be 90% of volume.

Upside? Say this helps AMD gain 50% more design wins. It would be definitely worth it.

Also, setting up the assembly, and then the per-unit assembly cost, would be cheaper if AMD did it for all the OEMs vs. if each OEM were to duplicate the same work.

biostud · Oct 17, 2023

Since AMD creates the SoC for both PlayStation and Xbox, they could probably do something similar for PC if there were a market for it (which I doubt).
