Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Mopetar · Apr 17, 2023

Exist50 said:
SRAM doesn't scale at all N5->N3E. https://fuse.wikichip.org/news/7343/iedm-2022-did-we-just-witness-the-death-of-sram/

That's not surprising. The scaling from 7nm to 5nm wasn't particularly great either.

I do wonder if we get to a point where it's more economical for the entire L3 cache to be a stacked cache. The latency isn't as critical and being able to manufacture it on an old node makes for a lot of cost savings and lets the base die shrink by a fair bit.

Exist50 · Apr 17, 2023

Mopetar said:
That's not surprising. The scaling from 7nm to 5nm wasn't particularly great either.

I do wonder if we get to a point where it's more economical for the entire L3 cache to be a stacked cache. The latency isn't as critical and being able to manufacture it on an old node makes for a lot of cost savings and lets the base die shrink by a fair bit.

I think that's ultimately the end game, assuming hybrid bonding production/costs scale sufficiently.

turtile · Apr 17, 2023

Geddagod said:
Because both 4 and 3nm variants of Zen 5 are supposed to come out in 2024, and it doesn't make much sense to me IMO to release Zen 5 and Zen 5+ in the same year when AMD is already on a 1.5 year pace, and Zen 5 should be comfortable in terms of performance against its competitors as well.
Esp considering the price of 3nm as well. But that's just my opinion.

They make yearly releases for mobile, so I definitely think there will be a 3nm Zen 5+ out in 2025.

Geddagod · Apr 17, 2023

turtile said:
They make yearly releases for mobile, so I definitely think there will be a 3nm Zen 5+ out in 2025.

Mobile is on a refreshed node without any major changes, no new design rules, nothing like that. 4 to 3nm won't be the same.
Plus, with the rumors of AMD trying to speed up their release cadence, it makes even less sense to do so.
The only way I see it happening is if Zen 5 was designed for 4 and 3nm simultaneously from the start, which is possible ig but still...
Based on competition and cost I don't really think there will be a point for AMD to release that even if it is possible, unless Intel shocks us.

Tigerick · Apr 18, 2023

Exist50 said:
14nm/12nm and N7/N6 are both design compatible, so AMD had to do little to no work to support both. If we truly see full fat Zen 5 on both N4 and N3, that would be a notable departure.

Let's see whether the upcoming STX will use both N4 and N3E processes as rumored. STX1 would be the big-iGPU part from AMD, so AMD has to use the most advanced process for it. I also believe STX will be a monolithic design, seeing that Intel's LNL also remains monolithic.

Kepler_L2 · Apr 18, 2023

Tigerick said:
Let's see whether the upcoming STX will use both N4 and N3E processes as rumored. STX1 would be the big-iGPU part from AMD, so AMD has to use the most advanced process for it. I also believe STX will be a monolithic design, seeing that Intel's LNL also remains monolithic.

STX1 is not the big iGPU model.

soresu · Apr 18, 2023

DisEnchantment said:
Hopefully we will see stock boost 6.5 GHz+ processors, even if not, we are looking at 5 GHz base clock x86 CPUs pretty soon

I'd be thankful for just 4 GHz at lower TDPs until manufacturers finally start making fans and cases with acoustic metamaterials.

soresu · Apr 18, 2023

Exist50 said:
SRAM doesn't scale at all N5->N3E

This is why 3D/V cache was such a great investment for AMD in their design process.

Apart from giving them more freedom to use different nodes for the cache, it also opens up the possibility of using radically different device types to SRAM for the larger caches.

soresu · Apr 18, 2023

Anyone think there is a possibility of a chiplet APU with Zen5 and RDNA4 using a full chiplet for NV4x to come in 2025?

Rumours put the individual NV4x chiplets at 48 CUs, so it would be a pretty damn meaty upgrade for single chip package implementations if it happens.

DisEnchantment · Apr 18, 2023

soresu said:
I'd be thankful for just 4 GHz at lower TDPs until manufacturers finally start making fans and cases with acoustic metamaterials.

High clocks and high TDP are something of an x86 USP, but TSMC is working hard to take away x86's high-TDP advantage.

soresu said:
This is why 3D/V cache was such a great investment for AMD in their design process.

Apart from giving them more freedom to use different nodes for the cache, it also opens up the possibility of using radically different device types to SRAM for the larger caches.

The frequency regression is one thing that might hold them back though. I don't know if stacking logic on SRAM would help.

soresu said:
Anyone think there is a possibility of a chiplet APU with Zen5 and RDNA4 using a full chiplet for NV4x to come in 2025?

Rumors put the individual NV4x chiplets at 48 CUs, so it would be a pretty damn meaty upgrade for single chip package implementations if it happens.

One thing I am wondering about is integrating an LLC in an IO chiplet.
I wonder if reducing L3 to 0.75x and introducing a massive SLC in IOD would bring some tangible benefits. This SLC could be in the IO chiplet for both mobile and desktop SKUs and can be used by the GPU as well.
Current L3 latencies on the 7950X are around 9 ns, which is excellent. An SLC would probably push that past 25 ns, unless a new interconnect comes in.
That makes packaging design one of the most interesting aspects of future designs from AMD.
There is a member in the 3D Fabric portfolio, which is InFO_3D.

SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.
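
The L3-versus-SLC trade-off above can be sketched as an expected-latency estimate. This is only a back-of-envelope model: it reads the post's ~9 and 25 latency figures as nanoseconds, and every hit rate and the DRAM latency below are placeholder assumptions, not measurements of any AMD part.

```python
def average_latency_ns(levels, dram_ns):
    """Expected latency of a request that has already missed in L2.

    levels: (local_hit_rate, latency_ns) pairs, nearest cache first.
    A local hit rate is the share of requests reaching a level that hit there.
    """
    expected = 0.0
    reach = 1.0  # fraction of requests that get as far as this level
    for hit_rate, latency_ns in levels:
        expected += reach * hit_rate * latency_ns
        reach *= 1.0 - hit_rate
    return expected + reach * dram_ns


DRAM_NS = 80.0  # assumption, not a figure from the thread

# Full-size L3 only; the 60% hit rate is a placeholder.
baseline = average_latency_ns([(0.60, 9.0)], DRAM_NS)

# 0.75x L3 (assumed to lose a little hit rate) backed by an SLC in the IOD.
with_slc = average_latency_ns([(0.55, 9.0), (0.50, 25.0)], DRAM_NS)

print(f"L3 only:  {baseline:.1f} ns")
print(f"L3 + SLC: {with_slc:.1f} ns")
```

With these made-up inputs the SLC wins because it catches half of the L3 misses before they go to DRAM; with a low SLC hit rate the extra 25 ns hop would cost more than it saves, which is the concern raised in the post.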

BorisTheBlade82 · Apr 18, 2023

soresu said:
Anyone think there is a possibility of a chiplet APU with Zen5 and RDNA4 using a full chiplet for NV4x to come in 2025?

Rumours put the individual NV4x chiplets at 48 CUs, so it would be a pretty damn meaty upgrade for single chip package implementations if it happens.

Don't think so. It would be brutally bottlenecked by RAM bandwidth.
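
To put rough numbers on the bandwidth concern, the sketch below compares theoretical peak bandwidth for a typical 128-bit APU memory interface with a discrete-card style GDDR6 interface. The three configurations are illustrative assumptions chosen for the comparison, not specs of any rumored product.

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mtps):
    """Theoretical peak in GB/s: bytes per transfer times million transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mtps / 1000


# Illustrative configurations (assumptions, not leaked specs).
apu_ddr5 = peak_bandwidth_gbs(128, 5600)     # dual-channel DDR5-5600
apu_lpddr5x = peak_bandwidth_gbs(128, 7500)  # 128-bit LPDDR5X-7500
discrete = peak_bandwidth_gbs(256, 18000)    # 256-bit GDDR6 at 18 Gbps per pin

for name, gbs in [("DDR5-5600", apu_ddr5),
                  ("LPDDR5X-7500", apu_lpddr5x),
                  ("GDDR6 256-bit", discrete)]:
    print(f"{name:>14}: {gbs:6.1f} GB/s")
```

The APU interface also has to be shared with the CPU cores, so the gap a large iGPU would see in practice is wider than the peak figures suggest.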

Timorous · Apr 18, 2023

DisEnchantment said:
High clocks and high TDP are something of an x86 USP, but TSMC is working hard to take away x86's high-TDP advantage.

The frequency regression is one thing that might hold them back though. I don't know if stacking logic on SRAM would help.

One thing I am wondering about is integrating an LLC in an IO chiplet.
I wonder if reducing L3 to 0.75x and introducing a massive SLC in IOD would bring some tangible benefits. This SLC could be in the IO chiplet for both mobile and desktop SKUs and can be used by the GPU as well.
Current L3 latencies on the 7950X are around 9 ns, which is excellent. An SLC would probably push that past 25 ns, unless a new interconnect comes in.
That makes packaging design one of the most interesting aspects of future designs from AMD.
There is a member in the 3D Fabric portfolio, which is InFO_3D.

SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.

Why not both? Have V-Cache under the CCDs (you can probably fit 2 dies under a 72 mm² die, which is going to be ballpark Zen 5 CCD size, giving 128MB + whatever is in the CCD itself) and have V-Cache on the IOD as an L4 cache. With the IOD die size as it is, you could probably fit 4 cache dies on/under it for 256MB there.

Maybe this is more something for Zen 6. I could see that being 3D as standard, so that could be how they increase the CCD core count. Strip out the L3 almost entirely, have it provided by stacked cache dies, and use that space for more cores. That has to be where we end up at some point, just not sure when.
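
The capacity figures in this post follow from simple die counting. A minimal sketch, assuming 64 MB per cache die (implied by "2 dies = 128MB") and a hypothetical 36 mm² cache-die footprint so that two fit under a 72 mm² CCD:

```python
CACHE_DIE_MB = 64          # implied by the post: 2 dies -> 128 MB
CACHE_DIE_AREA_MM2 = 36.0  # assumption: sized so two fit under a 72 mm^2 CCD


def dies_that_fit(base_area_mm2, die_area_mm2=CACHE_DIE_AREA_MM2):
    """Whole cache dies that fit on a base die, judged by area alone."""
    return int(base_area_mm2 // die_area_mm2)


def stacked_capacity_mb(num_dies, mb_per_die=CACHE_DIE_MB):
    """Total stacked cache capacity for a given number of cache dies."""
    return num_dies * mb_per_die


ccd_dies = dies_that_fit(72.0)
under_ccd_mb = stacked_capacity_mb(ccd_dies)  # per CCD, excluding on-die L3
on_iod_mb = stacked_capacity_mb(4)            # uses the post's estimate of 4 dies

print(ccd_dies, under_ccd_mb, on_iod_mb)
```

Area alone ignores keep-out zones, TSV placement and thermal limits, so the real die count could be lower.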

eek2121 · Apr 18, 2023

DisEnchantment said:
Leaving out density for a moment, N5 --> N3E is on par in terms of perf and efficiency vs N7 --> N5 as per TSMC data.
For chiplet based designs, lack of density scaling while not ideal can be tackled to some extent by advanced packaging. Issue mostly for Monolithic designs.
Bulk of Analog+IO is already off die anyway, and SRAM is main issue with no scaling with N5 --> N3E compared to 1.35x scaling on N7 --> N5
That 15-20% perf looks sweet if it goes with a high clocked long pipeline x86 processor. Hopefully we will see stock boost 6.5 GHz+ processors, even if not, we are looking at 5 GHz base clock x86 CPUs pretty soon.
N3P will be a minor bump just like N4P.

N2 is bringing barely 1.1x scaling, so not much to write home about. But perf and efficiency again improved significantly.

Edited some SRAM scaling values I recollected wrongly.
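
The SRAM scaling figures quoted above compound as follows. This is a sketch that takes the quoted density gains at face value (1.35x for N7 to N5, none for N5 to N3E, 1.1x for N2, assumed here to be relative to N3E); real cache macros will not track bitcell scaling exactly.

```python
# (node, SRAM density gain over the previous node), as quoted in the post.
STEPS = [("N7", 1.0), ("N5", 1.35), ("N3E", 1.0), ("N2", 1.1)]


def relative_sram_area(steps):
    """Area of a fixed-capacity SRAM array on each node, with the first node = 1.0."""
    area = 1.0
    areas = {}
    for node, density_gain in steps:
        area /= density_gain
        areas[node] = area
    return areas


areas = relative_sram_area(STEPS)
for node, area in areas.items():
    print(f"{node:>3}: {area:.2f}x the N7 area")
```

Three node generations shrink the same cache by only about a third, which is why moving cache onto a cheaper stacked die keeps coming up in this thread.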

Yes, and as we move forward, I suspect Intel will have the edge in terms of process. AMD definitely needs to have perfect execution.

Arrow Lake and Zen 5 will be on similar sized nodes and it only gets more challenging from there. AMD hasn’t had to deal with such a competitive threat in quite a while.

IMO, the “only” reason AMD is able to beat Intel (for desktop/mobile, on server Intel has another issue as well) is that they are on a custom N5 process and Intel is still on Intel 7. If Raptor Lake had been on Intel 4, Intel would have had better performance with lower power consumption.

Ajay · Apr 18, 2023

DisEnchantment said:
SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.

Definitely something worth looking at, and I'm sure AMD has done so. The question, I think, is what drawbacks are present that have prevented this from happening yet. Just packaging issues? I think, ultimately, that x86 can go the way of Apple's M series and use larger L2$ and off-chip SLC, there just has to be a high bandwidth connection to the SLC (and that could be the rub, power wise).

Doug S · Apr 18, 2023

DisEnchantment said:
High clocks and high TDP are something of an x86 USP, but TSMC is working hard to take away x86's high-TDP advantage.

The frequency regression is one thing that might hold them back though. I don't know if stacking logic on SRAM would help.

One thing I am wondering about is integrating an LLC in an IO chiplet.
I wonder if reducing L3 to 0.75x and introducing a massive SLC in IOD would bring some tangible benefits. This SLC could be in the IO chiplet for both mobile and desktop SKUs and can be used by the GPU as well.
Current L3 latencies on the 7950X are around 9 ns, which is excellent. An SLC would probably push that past 25 ns, unless a new interconnect comes in.
That makes packaging design one of the most interesting aspects of future designs from AMD.
There is a member in the 3D Fabric portfolio, which is InFO_3D.

SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.

I wonder if that packaging mix would allow Apple to make the Max die stack with a cache chip while still allowing for the integration of multiple Max chips to make an Ultra or "Extreme", or would that be too complex i.e. risky? Packaging will probably be the most interesting thing going forward at TSMC not just for AMD but for all customers. The ability to mix older processes that aren't fully utilized and will thus allow major partners like Apple & AMD to extract favorable pricing will open up new avenues.

The latency concerns you mention might be alleviated somewhat by creating a sort of hybrid two level SLC design. Keep the tags and a smaller "quick access" level close to the CPU on the main die. A lot of SLC access is semi temporal, a smart prefetcher can fill the quick access level and reduce effective latency for that in-between type of cache behavior that isn't random but isn't exactly streaming either.

Tuna-Fish · Apr 19, 2023

Doug S said:
The latency concerns you mention might be alleviated somewhat by creating a sort of hybrid two level SLC design. Keep the tags and a smaller "quick access" level close to the CPU on the main die. A lot of SLC access is semi temporal, a smart prefetcher can fill the quick access level and reduce effective latency for that in-between type of cache behavior that isn't random but isn't exactly streaming either.

Having tags on die could make sense, but a "quick access" portion does not. Simply because the kinds of loads that would likely hit it can be perfectly captured by the L2 instead.

Also, if it's stacked vertically, there is no meaningful latency penalty -- the latency difference between on-die L3 and stacked L3 is just 4 clocks.
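
For scale, the 4-clock penalty can be converted to time. The 5 GHz clock below is an assumption, and the ~9 figure for on-die L3 latency is the one quoted earlier in the thread, read as nanoseconds.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at a given clock (GHz = cycles per ns)."""
    return cycles / clock_ghz


penalty_ns = cycles_to_ns(4, 5.0)  # assumed 5 GHz clock
share_of_l3 = penalty_ns / 9.0     # relative to a ~9 ns on-die L3 hit

print(f"{penalty_ns:.1f} ns extra, about {share_of_l3:.0%} of an L3 hit")
```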

eek2121 · Apr 19, 2023

Doug S said:
I wonder if that packaging mix would allow Apple to make the Max die stack with a cache chip while still allowing for the integration of multiple Max chips to make an Ultra or "Extreme", or would that be too complex i.e. risky? Packaging will probably be the most interesting thing going forward at TSMC not just for AMD but for all customers. The ability to mix older processes that aren't fully utilized and will thus allow major partners like Apple & AMD to extract favorable pricing will open up new avenues.

The latency concerns you mention might be alleviated somewhat by creating a sort of hybrid two level SLC design. Keep the tags and a smaller "quick access" level close to the CPU on the main die. A lot of SLC access is semi temporal, a smart prefetcher can fill the quick access level and reduce effective latency for that in-between type of cache behavior that isn't random but isn't exactly streaming either.

Something to be aware of for Apple, they are targeting mostly consumer and workstation type devices. I do think we will eventually see higher core counts from them, but only to a point. They are not currently 'officially' in the server market, so high core counts aren't a priority for them.

I do think that they will be focused on IPC, frequency increases, and GPU performance, however.

This may change if they decide to get back into the server/enterprise market (which I personally wish they would do, it would be nice to run my software on the same platform it is built on)

soresu · Apr 19, 2023

eek2121 said:
Something to be aware of for Apple, they are targeting mostly consumer and workstation type devices. I do think we will eventually see higher core counts from them, but only to a point. They are not currently 'officially' in the server market, so high core counts aren't a priority for them.

I do think that they will be focused on IPC, frequency increases, and GPU performance, however.

This may change if they decide to get back into the server/enterprise market (which I personally wish they would do, it would be nice to run my software on the same platform it is built on)

Yeah this.

I wouldn't be surprised to find that they had already put their own cores to work in a server farm/datacenter at Close Encounters of the Apple Kind 👽, but beyond that I doubt that they will ever branch into servers for the common market.

A/// · Apr 21, 2023

Apple can take their max ultra contour pad whatever and plant 4-6 of them down on a custom motherboard they designed themselves to run their own servers and we'll never know. i'd presume tim cook personally hires hitmen should anyone at their dc's be open to speaking with the press or other outside parties. it's not a huge stretch of the mind given apple's desire for full vertical integration.

soresu said:
Yeah this.

I wouldn't be surprised to find that they had already put their own cores to work in a server farm/datacenter at Close Encounters of the Apple Kind 👽, but beyond that I doubt that they will ever branch into servers for the common market.

they run their own dc we'll never know unless it leaks. I'm sure someone out there can track shipments of x86 processors to their facilities if they wanted to or we could group together and make it like mission impossible 1 coming down into the core room of their dc by tensile rope working away at the security dialogues to access their inner treasure.

Doug S · Apr 21, 2023

A/// said:
Apple can take their max ultra contour pad whatever and plant 4-6 of them down on a custom motherboard they designed themselves to run their own servers and we'll never know. i'd presume tim cook personally hires hitmen should anyone at their dc's be open to speaking with the press or other outside parties. it's not a huge stretch of the mind given apple's desire for full vertical integration.

they run their own dc we'll never know unless it leaks. I'm sure someone out there can track shipments of x86 processors to their facilities if they wanted to or we could group together and make it like mission impossible 1 coming down into the core room of their dc by tensile rope working away at the security dialogues to access their inner treasure.

I agree we probably wouldn't know until such a project was well underway, but I think Cook would tout it at some point either at WWDC or in an investor call, depending on whether use of their own CPUs was primarily for developer benefit (i.e. an offering that provides APIs to have parts of a Mac and iOS binary able to run in the cloud where maximum performance is beneficial) or for cost savings / efficiency (i.e. Apple saving money by using their own more efficient cores vs x86 alternatives for their future datacenter builds - which would also reduce the per CPU share of design/mask and other fixed costs for the Pro/Max lineup)

A/// · Apr 21, 2023

Doug S said:
I agree we probably wouldn't know until such a project was well underway, but I think Cook would tout it at some point either at WWDC or in an investor call, depending on whether use of their own CPUs was primarily for developer benefit (i.e. an offering that provides APIs to have parts of a Mac and iOS binary able to run in the cloud where maximum performance is beneficial) or for cost savings / efficiency (i.e. Apple saving money by using their own more efficient cores vs x86 alternatives for their future datacenter builds - which would also reduce the per CPU share of design/mask and other fixed costs for the Pro/Max lineup)

but if it's for internal use is there a point? I forget the exact reasoning the late jobs claimed for why apple got out of servers but it was likely a forecast of what was to come. apple's doing of that was at the very beginning of the cloud compute era even though it launched a few years before, it was gaining steam and teeth to it. with the likes of tenstorrent and possibly intel and amd offering their own arm solutions in the future the point is moot. apple will never sell their own hardware creations in that space to keep a competitive edge and in doing so keep the public in wonder. I'm no expert in security but it may fall under security through obscurity. easier to assume apple's running x86 and bog standard linux for their corporate stuff than running a specialised version of macos and special hardware. but even then do recall that apple was running osx 20 years ago on pentium 4 hardware. there's a few good articles online about it involving apple and dell or vaio pentium 4s. apple's skunk works had begun playing around with osx on pc hardware roughly 7 or 8 years before the first intel mac came about. I've not got the link on me atm but you can easily find a write up about it. in short apple may be running their own version of macos server on x86 servers and none would be the wiser.

Mopetar · Apr 21, 2023

Apple got out of servers because they didn't sell many of them and it wasn't worth spending a lot of money to break even at best. When they were on PowerPC it made a certain amount of sense as there weren't a lot of options as far as getting that kind of hardware. Then selling x86 servers in a crowded market wasn't going to work.

Depending on how they design their chips for their pro line, servers might make sense. However, right now they're making SoCs. Most people who want servers want a lot of CPU power or a lot of GPU power, and the market that wants both is pretty small. Apple's SoCs would have a lot of wasted silicon from most buyers' perspectives.

The best argument for why they'd want to get into that market is that the margins can be amazing. Server CPUs and GPUs sell for several times what a high-end Mac will cost. Apple loves high margins. The best argument for why they won't is that Apple is quite stubborn about doing things their way. It's hard to attract a large customer base that are all willing to conform to your will. You can get a fair number of laptop and desktop users that don't care about that, but that's a much harder sell in the enterprise space.

Maybe we'll see a return of Xserve products based on M1 chips. Hell, they might just make their next Mac Pro rack mountable and effectively a server for anyone who wants to use it that way.

Doug S · Apr 22, 2023

A/// said:
but if it's for internal use is there a point?

Why would making servers for internal use "count" less than making them to sell? Why should Apple buy a bunch of x86 CPUs to build servers (like they have been doing and AFAIK are still doing today) if they can build servers cheaper and/or better with Apple Silicon? Or realize some benefit of using the same ISA for their cloud (or at least part of it, for storage i.e. iCloud CPU ISA is obviously irrelevant) as they use for the Mac & iPhone?

Obviously Apple wouldn't bother if they don't feel they are getting some type of benefit from it. Maybe they don't feel they will and don't ever build any AS servers. All I'm saying is that IF they do build them on a mass scale rather than just a small pilot test, we will hear about it eventually - because they would have some reason for doing it that's going to be important enough to share with developers and/or investors at some point.

As a company almost entirely focused on consumer products it is obvious why they don't offer a server for sale, even if they felt they could make better servers than Intel and AMD. Servers aren't a consumer product, and never will be. The Xserve product never made much sense, which is probably why it was not very successful and was discontinued.

BorisTheBlade82 · Apr 22, 2023

I know, it is MLID, but some points do sound plausible to me, especially a Zen5/c 4+8 monolithic SoC. Also the timeline makes much more sense now: the real Zen5 mobile generation will be much later than CES2024 and after the DT/DC SKUs.

AMD Ryzen 8000 APU Leaks: Zen 5 In Fire Range & Strix Point Families, Hawk Point With Zen 4 & RDNA 3.5

AMD's next-gen Zen 5 & Zen 4 powered Fire Range, Strix Point & Hawk Point APUs with RDNA 3.5 GPU cores have leaked out.

wccftech.com

BorisTheBlade82 · Apr 22, 2023

What I cannot believe is the unified L3 for both Zen5 and Zen5c. I expect them to be two separate CCXs, and unifying their L3 would go against their philosophy of recent years.
