SRAM doesn't scale at all N5->N3E. https://fuse.wikichip.org/news/7343/iedm-2022-did-we-just-witness-the-death-of-sram/
I think that's ultimately the end game, assuming hybrid bonding production/costs scale sufficiently.
That's not surprising. The scaling from 7nm to 5nm wasn't particularly great either.
I do wonder if we get to a point where it's more economical for the entire L3 cache to be a stacked cache. The latency isn't as critical and being able to manufacture it on an old node makes for a lot of cost savings and lets the base die shrink by a fair bit.
They make yearly releases for mobile, so I definitely think there will be a 3nm Zen 5+ out in 2025.
Because both 4 and 3nm variants of Zen 5 are supposed to come out in 2024, and it doesn't make much sense to me to release Zen 5 and Zen 5+ in the same year when AMD is already on a 1.5 year pace, and Zen 5 should be comfortable in terms of performance against its competitors as well.
Especially considering the price of 3nm as well. But that's just my opinion.
Mobile is on a refreshed node without any major changes, no new design rules, nothing like that. 4 to 3nm won't be the same.
They make yearly releases for mobile, so I definitely think there will be a 3nm Zen 5+ out in 2025.
Let's see whether the upcoming STX will use both N4 and N3E processes as rumored. STX1 would be the big iGPU part from AMD, so AMD has to use the most advanced process for it, and I also believe STX would be a monolithic design, seeing that Intel LNL also remains monolithic.
14nm/12nm and N7/N6 are both design compatible, so AMD had to do little to no work to support both. If we truly see full fat Zen 5 on both N4 and N3, that would be a notable departure.
STX1 is not the big iGPU model.
Let's see whether the upcoming STX will use both N4 and N3E processes as rumored. STX1 would be the big iGPU part from AMD, so AMD has to use the most advanced process for it, and I also believe STX would be a monolithic design, seeing that Intel LNL also remains monolithic.
I'd be thankful for just 4 GHz at lower TDPs until manufacturers finally start making fans and cases with acoustic metamaterials.
Hopefully we will see stock boost 6.5 GHz+ processors; even if not, we are looking at 5 GHz base clock x86 CPUs pretty soon.
This is why 3D/V cache was such a great investment for AMD in their design process.
SRAM doesn't scale at all N5->N3E
Clocking high and high TDP is like an x86 USP. But TSMC is working hard to take away x86's high TDP advantage.
I'd be thankful for just 4 GHz at lower TDPs until manufacturers finally start making fans and cases with acoustic metamaterials.
The frequency regression is one thing that might hold them back though. I don't know if stacking logic on SRAM would help.
This is why 3D/V cache was such a great investment for AMD in their design process.
Apart from giving them more freedom to use different nodes for the cache, it also opens up the possibility of using radically different device types to SRAM for the larger caches.
One thing I am wondering about is integrating an LLC in an IO chiplet.
Anyone think there is a possibility of a chiplet APU with Zen5 and RDNA4 using a full chiplet for NV4x to come in 2025?
Rumors put the individual NV4x chiplets at 48 CUs, so it would be a pretty damn meaty upgrade for single chip package implementations if it happens.
Don't think so. It would be brutally bottlenecked by RAM bandwidth.
Anyone think there is a possibility of a chiplet APU with Zen5 and RDNA4 using a full chiplet for NV4x to come in 2025?
Rumours put the individual NV4x chiplets at 48 CUs, so it would be a pretty damn meaty upgrade for single chip package implementations if it happens.
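To put a rough number on the "bottlenecked by RAM bandwidth" point, here is a minimal sketch of the peak-bandwidth arithmetic. The two memory configurations are illustrative assumptions, not leaked or confirmed specs: a 128-bit DDR5-5600 setup for the APU and a 192-bit, 16 Gbps GDDR6 setup for a hypothetical 40 CU discrete card. Only the 48 CU figure comes from the rumor quoted above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000  # MT/s * bytes -> MB/s -> GB/s


# Assumed configurations (hypothetical, for illustration only).
apu_bw = peak_bandwidth_gbs(bus_width_bits=128, transfer_rate_mts=5600)    # dual-channel DDR5-5600
dgpu_bw = peak_bandwidth_gbs(bus_width_bits=192, transfer_rate_mts=16000)  # 192-bit GDDR6 at 16 Gbps

apu_cus = 48   # rumored NV4x chiplet size from the thread
dgpu_cus = 40  # assumed discrete card used for comparison

print(f"APU:  {apu_bw:.1f} GB/s total, {apu_bw / apu_cus:.2f} GB/s per CU")
print(f"dGPU: {dgpu_bw:.1f} GB/s total, {dgpu_bw / dgpu_cus:.2f} GB/s per CU")
```

Under these assumptions the APU has roughly a fifth of the per-CU bandwidth, and that is before the CPU cores take their share of the same memory bus.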
Clocking high and high TDP is like an x86 USP. But TSMC is working hard to take away x86's high TDP advantage.
The frequency regression is one thing that might hold them back though. I don't know if stacking logic on SRAM would help.
One thing I am wondering about is integrating an LLC in an IO chiplet.
I wonder if reducing L3 to 0.75x and introducing a massive SLC in IOD would bring some tangible benefits. This SLC could be in the IO chiplet for both mobile and desktop SKUs and can be used by the GPU as well.
Current L3 latencies on 7950X are around 9 ns. Quite excellent. SLC would probably make that >25 ns, unless a new interconnect comes in.
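A back-of-the-envelope way to look at the L3-versus-SLC tradeoff is an average latency estimate for loads that have already missed L2. This is only a sketch: it reads the latency figures above as nanoseconds, and the hit rates and the 80 ns DRAM latency are made-up assumptions chosen for illustration, not measurements.

```python
def avg_latency_ns(levels, dram_ns):
    """Average latency for a load that missed L2.

    levels: list of (hit_latency_ns, hit_rate), nearest level first.
    hit_latency_ns is the total load-to-use latency when the load hits that level.
    """
    total = 0.0
    reach = 1.0  # fraction of loads that get as far as this level
    for latency_ns, hit_rate in levels:
        total += reach * hit_rate * latency_ns
        reach *= 1.0 - hit_rate
    return total + reach * dram_ns  # whatever is left goes to DRAM


DRAM_NS = 80.0  # assumed DRAM latency

# Today: full-size L3 only (hit rate is an assumption).
baseline = avg_latency_ns([(9.0, 0.60)], DRAM_NS)

# Proposal: L3 cut to 0.75x (assumed slightly lower hit rate) plus a large SLC in the IOD
# that catches half of the L3 misses at 25 ns (both numbers assumed).
with_slc = avg_latency_ns([(9.0, 0.55), (25.0, 0.50)], DRAM_NS)

print(f"L3 only:          {baseline:.2f} ns")
print(f"0.75x L3 + SLC:   {with_slc:.2f} ns")
```

With these particular numbers the SLC design comes out ahead, but the result swings heavily with the assumed hit rates, which is really the open question.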
Which makes packaging design one of the most interesting aspects of future designs from AMD.
There is a member in the 3D Fabric portfolio, which is InFO_3D.
View attachment 79660+
SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.
Yes, and as we move forward, I suspect Intel will have the edge in terms of process. AMD definitely needs to have perfect execution.
Leaving out density for a moment, N5 --> N3E is on par in terms of perf and efficiency vs N7 --> N5 as per TSMC data.
For chiplet based designs, the lack of density scaling, while not ideal, can be tackled to some extent by advanced packaging. It is mostly an issue for monolithic designs.
The bulk of analog and IO is already off die anyway, and SRAM is the main issue, with no scaling on N5 --> N3E compared to 1.35x scaling on N7 --> N5.
That 15-20% perf looks sweet if it goes with a high clocked long pipeline x86 processor. Hopefully we will see stock boost 6.5 GHz+ processors, even if not, we are looking at 5 GHz base clock x86 CPUs pretty soon.
N3P will be a minor bump just like N4P.
N2 is bringing barely 1.1x scaling, so not much to write home about. But perf and efficiency again improved significantly.
View attachment 79647
View attachment 79648
Edited some SRAM scaling values I recollected wrongly.
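As a quick sanity check on what those SRAM scaling numbers mean for die area, here is a small sketch that chains the per-node density gains quoted in this post. It assumes the 1.1x figure for N2 applies to SRAM and is measured relative to N3E, which the post does not state explicitly, and the values themselves are the poster's recollection rather than official data.

```python
# (from_node, to_node, SRAM density gain) as quoted in the thread; treat as approximate.
SRAM_DENSITY_GAIN = [
    ("N7", "N5", 1.35),
    ("N5", "N3E", 1.00),  # "doesn't scale at all"
    ("N3E", "N2", 1.10),  # assumed to be SRAM scaling relative to N3E
]


def relative_sram_area(steps, start_node="N7"):
    """Area of a fixed-capacity SRAM block on each node, normalized to the start node."""
    area = 1.0
    areas = {start_node: area}
    for _src, dst, gain in steps:
        area /= gain  # higher density means proportionally less area
        areas[dst] = area
    return areas


areas = relative_sram_area(SRAM_DENSITY_GAIN)
for node, area in areas.items():
    print(f"{node}: {area:.3f}x the N7 area")
```

On these figures the same cache still takes about two thirds of its N7 area on N2, three node steps later, which is the argument for moving large caches onto a cheaper stacked die.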
Definitely something worth looking at, and I'm sure AMD has done so. The question, I think, is what drawbacks are present that have prevented this from happening yet. Just packaging issues? I think, ultimately, that x86 can go the way of Apple's M series and use larger L2$ and off-chip SLC, there just has to be a high bandwidth connection to the SLC (and that could be the rub, power wise).
SLC with a high density of 136MTr/mm2 on N7 like the V Cache chips makes too much sense to stack on IOD. This process is now at 50-60% usage at TSMC, they will be very cheap going forward.
Having tags on die could make sense, but a "quick access" portion does not, simply because the kinds of loads that would likely hit it can be perfectly captured by the L2 instead.
The latency concerns you mention might be alleviated somewhat by creating a sort of hybrid two level SLC design. Keep the tags and a smaller "quick access" level close to the CPU on the main die. A lot of SLC access is semi temporal, a smart prefetcher can fill the quick access level and reduce effective latency for that in-between type of cache behavior that isn't random but isn't exactly streaming either.
Something to be aware of for Apple, they are targeting mostly consumer and workstation type devices. I do think we will eventually see higher core counts from them, but only to a point. They are not currently 'officially' in the server market, so high core counts aren't a priority for them.
I wonder if that packaging mix would allow Apple to make the Max die stack with a cache chip while still allowing for the integration of multiple Max chips to make an Ultra or "Extreme", or would that be too complex i.e. risky? Packaging will probably be the most interesting thing going forward at TSMC not just for AMD but for all customers. The ability to mix older processes that aren't fully utilized and will thus allow major partners like Apple & AMD to extract favorable pricing will open up new avenues.
Yeah this.
Something to be aware of for Apple, they are targeting mostly consumer and workstation type devices. I do think we will eventually see higher core counts from them, but only to a point. They are not currently 'officially' in the server market, so high core counts aren't a priority for them.
I do think that they will be focused on IPC, frequency increases, and GPU performance, however.
This may change if they decide to get back into the server/enterprise market (which I personally wish they would do, it would be nice to run my software on the same platform it is built on)
they run their own dc we'll never know unless it leaks. I'm sure someone out there can track shipments of x86 processors to their facilities if they wanted to or we could group together and make it like mission impossible 1 coming down into the core room of their dc by tensile rope working away at the security dialogues to access their inner treasure.
Yeah this.
I wouldn't be surprised to find that they had already put their own cores to work in a server farm/datacenter at Close Encounters of the Apple Kind 👽, but beyond that I doubt that they will ever branch into servers for the common market.
Apple can take their max ultra contour pad whatever and plant 4-6 of them down on a custom motherboard they designed themselves to run their own servers and we'll never know. i'd presume tim cook personally hires hitmen should anyone at their dc's be open to speaking with the press or other outside parties. it's not a huge stretch of the mind given apple's desire for full vertical integration.
but if it's for internal use is there a point? I forget the exact reasoning the late jobs claimed for why apple got out of servers but it was likely a future forecast of what was to come. apple's doing of that was at the very beginning of the cloud compute era even though it launched a few years before, it was gaining steam and teeth to it. with the likes of tenstorrent and possibly intel and amd offering their own arm solutions in the future the point is moot. apple will never sell their own hardware creations in that space to keep a competitive edge and in doing so keep the public in wonder. I'm no expert in security but it may fall under security through obscurity. easier to assume apple's running x86 and bog standard linux for their corporate stuff than running a specialised version of macos and special hardware. but even then do recall that apple was running osx 20 years ago on pentium 4 hardware. there's a few good articles online about it involving apple and dell or vaio pentium 4s. apple's skunk works had begun playing around with osx on pc hardware roughly 7 or 8 years before the first intel mac came about. I've not got the link on me atm but you can easily find a write up about it. in short apple may be running their own version of macos server on x86 servers and none would be the wiser.
I agree we probably wouldn't know until such a project was well underway, but I think Cook would tout it at some point either at WWDC or in an investor call, depending on whether use of their own CPUs was primarily for developer benefit (i.e. an offering that provides APIs to have parts of a Mac and iOS binary able to run in the cloud where maximum performance is beneficial) or for cost savings / efficiency (i.e. Apple saving money by using their own more efficient cores vs x86 alternatives for their future datacenter builds - which would also reduce the per CPU share of design/mask and other fixed costs for the Pro/Max lineup)