Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
805
1,394
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!
 

DrMrLordX

Lifer
Apr 27, 2000
21,808
11,165
136
So I don't know if that puts Genoa ahead of or behind SPR. One thing we can be quite confident about is that there will be no yield issues for Genoa...

Correct, N5/N5P should have excellent yields by this point, while 10ESF isn't proven for dice that large. Genoa has had ES for ages, and since Dr. Su already announced QS for customers on stage, that process has likely been in play for several months now. The Facebook/Meta acquisition announcement seems to indicate that Facebook - normally a customer of things like Cooper Lake and IceLake-SP - is interested in Genoa for its extended ISA and other positive attributes. So they'll likely get early shipments, along with MS, Google, and Amazon.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Correct, N5/N5P should have excellent yields by this point, while 10ESF isn't proven for dice that large. Genoa has had ES for ages, and since Dr. Su already announced QS for customers on stage, that process has likely been in play for several months now. The Facebook/Meta acquisition announcement seems to indicate that Facebook - normally a customer of things like Cooper Lake and IceLake-SP - is interested in Genoa for its extended ISA and other positive attributes. So they'll likely get early shipments, along with MS, Google, and Amazon.
Meta is using EPYC in its OAM universal platform. Cool beans!
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
@DisEnchantment
I have to admit that I did not understand everything you wrote, so please allow me some questions:

They look nothing like TSMC LSI bridges (which is polymer and RDL interconnect layers with the bridge), which brings more context when AMD said they invested heavily in bringing up the packaging supply chain (from last ER Q&A).
That would greatly help on cost and capacity.
This comes as a bit of a surprise to me. Ryan was saying that EFB was basically INFO-L(SI) renamed by AMD. Ofc that does not rule out AMD having played a big role in the R&D in a joint-venture-like manner.
This was also my assumption. Was it not the same for SoIC?

My guess here
If they were to use the EFB for Genoa, what they would do within the 14 layers of routing in the substrate (Milan) can be moved to the EFB/Fan out package.
So for 12 CCDs they connect the same way as if they connect to the substrate in Milan just that the bumps are making contact with the Fan out package instead.
Cu Pillars route directly to substrate and the EFB could route the chiplet to chiplet connections.
Maybe I misunderstood this completely, but do I understand you correctly:
You suggest they would use EFB for the interconnect between neighbouring CCDs? That would basically mean you could connect up to 3 CCDs with each other, as they are grouped in that way on the package. Do you suggest daisy-chaining or only connections between pairs?
Either way, this is an approach I have not really thought about. So connections to the IOD would still be IFOP? I would guess that - relatively speaking - in general massively parallel workloads only about 10% of the traffic takes place between CCDs while 90% is IOD <-> CCD. So the benefits might be very limited.
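Rough back-of-the-envelope on why the benefit looks limited to me - the 10/90 split and the per-bit costs below are assumptions, not measurements:

```python
# Toy weighted-average estimate: only CCD<->CCD traffic moves to a cheaper
# EFB link, while CCD<->IOD traffic stays on IFOP. All numbers are assumed.
ccd_to_ccd_share = 0.10   # assumed share of traffic between CCDs
iod_share = 0.90          # assumed share of CCD<->IOD traffic
ifop_cost = 1.0           # normalized per-bit cost over IFOP
efb_cost = 0.25           # assumed relative per-bit cost over a silicon bridge

after = ccd_to_ccd_share * efb_cost + iod_share * ifop_cost
print(f"overall interconnect cost vs. all-IFOP baseline: {after:.2f}x")  # ~0.93x
```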

The Zen 4c slide mentioned "density-optimized cache hierarchy" and "significantly improved power efficiency" separately. This suggests the power efficiency being a big deal and separated from the cache.

What kind of tricks can you do to improve power efficiency for a cloud-oriented CPU? Adding moar cores - 2xCCX. Lowering clocks to fit the efficiency curve? Axing FP resources as much as possible since cloud workloads are INTish? Aggressively optimizing DVFS based on workload profile?
You named almost all of them. But for AMD especially, going from IFOP to some sort of silicon interconnect - EFB being the favorite ATM - might very easily be the single most important option to increase power efficiency.
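To put a rough number on that, here is a sketch of the die-to-die IO power for 12 CCD links; the bandwidth and pJ/bit figures are illustrative assumptions, not AMD specs:

```python
# Rough IO power for 12 CCD links at an assumed aggregate bandwidth.
links = 12
bw_per_link_gb_s = 36          # assumed sustained GB/s per CCD link
serdes_pj_bit = 2.0            # ballpark for an organic-substrate SerDes link
bridge_pj_bit = 0.4            # assumed for a silicon-bridge / fan-out link

bits_per_s = links * bw_per_link_gb_s * 1e9 * 8
for name, pj in (("SerDes", serdes_pj_bit), ("bridge", bridge_pj_bit)):
    print(f"{name}: ~{bits_per_s * pj * 1e-12:.1f} W of die-to-die IO power")
# SerDes: ~6.9 W, bridge: ~1.4 W -> several watts saved per socket
```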

4 wide front end (But uop cache helps a lot here)
Zen3 OOO BE is quite wide. 10 Wide for int and 6 wide fp
From Chips&Cheese tests bottlenecks are ROB, LS and Rename Reg file before hitting front end limits.
From what I understood, the ROB as such does not have much relevance with regard to the performance of a core. Please do not get me wrong - of course the ROB defines the OoO window size. But only increasing its size in and of itself does not make a lot of a difference. Moreover, the size of the ROB is a good indicator of how wide and how deep (queues) a core architecture generally is. https://drive.google.com/file/d/1WrMYCZMnhsGP4o3H33ioAUKL_bjuJSPt/view?usp=sharing

Is this the same image as what is behind Lisa Su in the article here:


These do not look like Bergamo. It appears to have 12 CCDs, which would be Genoa. Compared to Rome / Milan, this appears to have some surface-mount capacitors in the middle, between the cpu chiplets, in addition to those along the top and bottom. Bergamo should have 8 CCDs only (for some reason).

There could be some surprise with that though. It isn't coming out for a while, so perhaps it actually uses embedded silicon interconnect. Fitting 8 die close enough to the IO die for embedded silicon interconnects seems like it would be difficult, but it isn't impossible given the dimensions of the normal Zen 4 CCD. Given the supposedly leaked specs, the IO die is 24.79 x 16 mm. The Zen 4 / Genoa CCD is 10.7 x 6.75 mm. 24.79 divided by 4 is about 6.2 mm, so it isn't that much of a stretch that they could put 4 die along each side, directly adjacent to the IO die, with a slightly smaller or differently shaped die, or a larger IO die.

Another possibility is that the Bergamo CCD has little to no L3 cache and the IO die has the L3 or L4 cache. It might be made on 6 nm, so having large caches is plausible, like the 128 MB infinity caches on GPUs. It will need to be a different version of the IO die to use embedded silicon bridges (of some kind; I can’t keep the names straight), but that would fit in with the lower power usage and extreme density. The penalty for going to the IO would be much lower than with serdes based solutions. It might also be lower latency making a somewhat monolithic last level cache reasonable. The IO die might be of similar size, even with the cache. If they don’t have any serdes for the IFOP connections, that would likely save a lot of die area and power that could be used for caches.
This, IMHO, is spot on.
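A quick sanity check on that geometry, using the leaked die sizes quoted above; the side-by-side layout itself is just an assumption:

```python
# Purely geometric check: do four Zen 4-sized CCDs fit along each long edge
# of the leaked Genoa IOD closely enough for embedded bridges?
iod_long_edge_mm = 24.79          # leaked IOD dimension (24.79 x 16 mm)
ccd_w_mm, ccd_h_mm = 10.7, 6.75   # leaked Zen 4 CCD dimensions

row_of_four_mm = 4 * ccd_h_mm     # CCDs placed short-edge-on against the IOD
overhang_mm = row_of_four_mm - iod_long_edge_mm
print(f"4 CCDs short-edge-on: {row_of_four_mm:.2f} mm vs {iod_long_edge_mm} mm "
      f"IOD edge ({overhang_mm:+.2f} mm)")
# 27.00 mm vs 24.79 mm, about 2.2 mm too long, hence the need for a slightly
# larger IOD, a slightly smaller CCD, or a different aspect ratio.
```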
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
This comes as a bit of a surprise to me. Ryan was saying that EFB was basically INFO-L(SI) renamed by AMD
Definitely not InFO-LSI. InFO is WLP, chip first. Good for Mobile, not good when there are many dies to stitch like in EPYC.
What @uzzi38 and David Schor (WikiChip author) suggested is basically what I believe it is.

Maybe I misunderstood this completely, but do I understand you correctly:
You suggest they would use EFB for the interconnect between neighbouring CCDs? That would basically mean you could connect up to 3 CCDs with each other, as they are grouped in that way on the package. Do you suggest daisy-chaining or only connections between pairs?
Either way, this is an approach I have not really thought about. So connections to the IOD would still be IFOP? I would guess that - relatively speaking - in general massively parallel workloads only about 10% of the traffic takes place between CCDs while 90% is IOD <-> CCD. So the benefits might be very limited.
You can read about FOEB; it is a fan-out package like EFB, and as David Schor (who for sure knows a lot more about these things than Anandtech) suggested, SPIL is the most likely supplier.
The fan out package is basically just that, fan out, you can put anything on the mold, bridges, pillars, RDL, whatever. It can cover the entire fan out area with single or multiple bridges or RDL, and then you cut out the mold and place the dies on top.
The Si Bridges can be used if you want active devices, like repeaters (which is very common) or you can have passive routing, using plain RDL.
The Bridges can be few big ones or many small ones, it does not matter, they all sit inside the fan out mold. The logic dies only come in contact with bumps on top of the fan out package.

That said, I am not claiming anything; I am suggesting, and I am very open to hearing other thoughts.

From what I understood, the ROB as such does not have much relevance with regard to the performance of a core.
Of course it is relevant: execution stalls when there are no ROB entries left, because instructions take time to be retired. The Zen3 OOO engine is quite wide.
They have a github repo if you wanna do it yourself.
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
If the IOD is N6, and TSMC has N6 stacking working, why not have the cache stacked on the N6 IOD? We know that the IO interface areas on the IOD do not scale well at all with increasingly dense nodes. Preserve the IOD as being as physically small as the IO pads, control and interface logic, and buffers allow, then stack the communal L3 or L4 cache on top of it using an N6 die with the "levers, knobs and switches" turned to make it as dense and power efficient as possible for cache.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,013
1,610
136
If the IOD is N6, and TSMC has N6 stacking working, why not have the cache stacked on the N6 IOD? We know that the IO interface areas on the IOD do not scale well at all with increasingly dense nodes. Preserve the IOD as being as physically small as the IO pads, control and interface logic, and buffers allow, then stack the communal L3 or L4 cache on top of it using an N6 die with the "levers, knobs and switches" turned to make it as dense and power efficient as possible for cache.

Because the latency of the interconnects between CCDs and the I/O die is way higher (and the bandwidth way lower) than that of a stacked die, obviously.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136

Bold numbers are measured or official numbers, the other numbers are estimates

Estimation for Zen4 CCD (based on Zen3 die shots from Fritzchen Fritz)

My estimation of the Zen4 CCD taking the die size as 72.225mm2 based on the leaked manual from Gigabyte and also shared by ExecuFix
Based on TSMC's values and AMD's recent presentation I am estimating a super conservative density of 90 MTr/mm2 (vs 171 MTr/mm2 from TSMC). (I think it will be around ~95 MTr/mm2 actual)
Main assumption is that AMD is not going full retard and is simply doubling FP resources (Instead of quadrupling them, in which case the FPU will be bigger than the rest of the core itself ).
Another assumption is that the CVML block is in the IOD like DCN and GPU, if any.
The core is massive; Zen4 Core + L2 will be almost double that of Zen3.
Core+L2: ~400 MTr and around 4.3mm2 in size. Close to an A14 in MTr. (Zen3 has a measly 204 MTr and 3.96mm2.)

A plain optical shrink for Zen3 CCD would be a tiny 45 mm2 @90MTr/mm2
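Back-of-envelope behind those numbers; the Zen 3 CCD area (~80.7 mm2) is my own assumption from die shots, the rest is stated above:

```python
# Rough arithmetic behind the Zen 4 CCD estimate above.
zen4_ccd_area_mm2 = 72.225        # from the leaked Gigabyte manual
density_mtr_mm2 = 90              # conservative N5 estimate
print(f"Zen 4 CCD: ~{zen4_ccd_area_mm2 * density_mtr_mm2:.0f} MTr")   # ~6500 MTr

zen3_core_l2_mtr, zen3_core_l2_mm2 = 204, 3.96
zen3_density = zen3_core_l2_mtr / zen3_core_l2_mm2                    # ~51.5 MTr/mm2
print(f"Zen 4 core+L2 @ 4.3 mm2: ~{4.3 * density_mtr_mm2:.0f} MTr")   # ~390 MTr

zen3_ccd_area_mm2 = 80.7          # assumed N7 Zen 3 CCD area
shrink_mm2 = zen3_ccd_area_mm2 * zen3_density / density_mtr_mm2
print(f"optical shrink of Zen 3 CCD: ~{shrink_mm2:.0f} mm2 @ 90 MTr/mm2")  # ~46 mm2
```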
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
@DisEnchantment
I have to admit that I did not understand everything you wrote, so please allow me some questions:


This comes as a bit of a surprise to me. Ryan was saying that EFB was basically INFO-L(SI) renamed by AMD. Ofc that does not rule out AMD having played a big role in the R&D in a joint-venture-like manner.
This was also my assumption. Was it not the same for SoIC?


Maybe I misunderstood this completely, but do I understand you correctly:
You suggest they would use EFB for the interconnect between neighbouring CCDs? That would basically mean you could connect up to 3 CCDs with each other, as they are grouped in that way on the package. Do you suggest daisy-chaining or only connections between pairs?
Either way, this is an approach I have not really thought about. So connections to the IOD would still be IFOP? I would guess that - relatively speaking - in general massively parallel workloads only about 10% of the traffic takes place between CCDs while 90% is IOD <-> CCD. So the benefits might be very limited.


You named almost all of them. But for AMD especially, going from IFOP to some sort of silicon interconnect - EFB being the favorite ATM - might very easily be the single most important option to increase power efficiency.


From what I understood, the ROB as such does not have much relevance with regard to the performance of a core. Please do not get me wrong - of course the ROB defines the OoO window size. But only increasing its size in and of itself does not make a lot of a difference. Moreover, the size of the ROB is a good indicator of how wide and how deep (queues) a core architecture generally is. https://drive.google.com/file/d/1WrMYCZMnhsGP4o3H33ioAUKL_bjuJSPt/view?usp=sharing


This, IMHO, is spot on.
I have posted this several times, but it is a good overview of TSMC stacking tech. It is still very confusing though, especially if more than one stacking type might be in use for a single device. You could have stacked cache with SoIC and then one of the other types of stacking in package assembly. Whatever EFB is, it is certainly one of the TSMC technologies listed here.

 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
Stacking L4 onto the IOD might seem like a good idea, though as with Broadwell-C, if it doesn't significantly outperform system RAM then it's kind of a wash. Skylake-C never happened for reasons.
Genoa is almost certainly serdes connections from cpu die to IO die, but I think Bergamo is likely to use EFB. Bergamo is supposed to be maximum core count, but it seems to be limited to 8 cpu die rather than the 12 in Genoa. If Bergamo was 12 * 16 cores, it would be a 192-core device. This limitation is likely due to EFB limitations; the die have to be very close and it would also have alignment constraints. There should be room to place 8 die directly adjacent to the IO die; 4 on each side. The Genoa IO die is around 24 mm long and a Zen 4 die is near 6 mm in one dimension. Make the IO die slightly larger or the cpu die smaller or a slightly different aspect ratio, and you should be able to fit four 16-core chiplets on each side.

The connection allowed by EFB would be similar to that used by HBM2E, so possibly thousands of bits wide with no added latency from serialization / deserialization. HBM doesn’t have very good latency, but that is due to HBM being DRAM, not due to the interface, so this should make off chip cache a reasonable thing to do. You can get TB/s bandwidth from such interfaces with low latency.
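Rough math on what such a wide parallel link delivers; the width and data rate below are illustrative, HBM2E-class assumptions, not a Bergamo spec:

```python
# Illustrative bandwidth of a wide, slow parallel link: width * data rate / 8.
width_bits = 1024          # per-channel width of an HBM2E-like interface
data_rate_gtps = 3.2       # GT/s per pin (assumed)
bw_gb_s = width_bits * data_rate_gtps / 8
print(f"{width_bits}-bit channel @ {data_rate_gtps} GT/s ~= {bw_gb_s:.0f} GB/s")
# ~410 GB/s per channel; a handful of such channels reaches TB/s, with no
# SerDes serialization/deserialization latency in the path.
```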

I don’t know if it is plausible for the bridge die to actually contain the L3 or L4 cache, but that may be a possibility. I haven’t had time to keep up with the packaging tech; has there been anything said about active bridge die? You could embed or stack a huge amount of cache, so the Zen 4c core may not actually be cut down that much if a lot of cache has been moved off die. I am wondering if AMD GPUs will use the same chips and those are rumored to have up to 512 MB infinity cache. If you have 8 bridge chips for 8 cpu die, then that would be 512 MB if they are actually single 64 MB cache chips. A 512 MB L4 would likely make up for a bit less cache on the cpu die. Perhaps they could be stacked for 1, 1.5, 2 GB cache.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Main assumption is that AMD is not going full retard and is simply doubling FP resources (Instead of quadrupling them, in which case the FPU will be bigger than the rest of the core itself )
Damn, that's a huge FP footprint! Are these here mainly for their HPC customers? I know hyper-scalers do some heavy matrix math, but I thought most nodes were mainly integer driven (plus, I/O ofc).
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
Bergamo being 8 dies is, I figure, more due to power consumption. They need to do something about the IO die power consumption before getting too crazy.
Using EFB is a way to reduce IO die power consumption. It should be a huge reduction compared to running 12 pci-e 5 or pci-e 6 speed serdes links to cpu chiplets. Large caches reduce off die memory access, but they still burn some power. The cache only chips are on a process optimized for cache, so they may be quite low power consumption.

I am relatively confident that Bergamo will use silicon bridge connections. It is less clear what the cache hierarchy implementation will be. It might be plausible that all cache chips are the same chip. The bump-less TSVs don’t take much area, so it might be possible to have pads for micro-solder bumps on the same chip as bump-less contacts for maximum reusability.

Bergamo isn’t coming out for a while. Everyone expects more 2.5D/3D packaging to be used. It will be used in next gen GPUs. RDNA3 seems like it may be a more advanced device than CDNA2, but a lot of compute applications work fine spread across multiple gpus. I have hit cases where the performance was about the same with one monolithic gpu and 2 half size / half bandwidth GPUs. Gaming actually seems to be a more difficult problem. The EFB tech mostly reduces cost issues with stacking. Giant interposers under everything is very expensive and very wasteful. EFB seems like it may be even cheaper and simpler to manufacture than Intel EMIB.

Edit: Also, Bergamo is a 2023 product. If AMD isn’t using some of TSMC’s 2.5D / 3D stacking tech by then, then they are likely to fall behind. I was actually surprised that Genoa doesn’t seem to use stacking tech.
 

Saylick

Diamond Member
Sep 10, 2012
3,389
7,154
136
You can read about FOEB; it is a fan-out package like EFB, and as David Schor (who for sure knows a lot more about these things than Anandtech) suggested, SPIL is the most likely supplier.
The fan out package is basically just that, fan out, you can put anything on the mold, bridges, pillars, RDL, whatever. It can cover the entire fan out area with single or multiple bridges or RDL, and then you cut out the mold and place the dies on top.
The Si Bridges can be used if you want active devices, like repeaters (which is very common) or you can have passive routing, using plain RDL.
The Bridges can be few big ones or many small ones, it does not matter, they all sit inside the fan out mold. The logic dies only come in contact with bumps on top of the fan out package.
Seems like this is spot on. SPIL's FOEB looks exactly like what AMD presented:

Source: https://www.3dincites.com/2020/07/iftle-456-spil-fan-out-embedded-bridge-foeb-technology/

Giant interposers under everything is very expensive and very wasteful. EFB seems like it may be even cheaper and simpler to manufacture than Intel EMIB.
SPIL seems to agree. Better yield, better interconnect density.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
That could be explained by the start of development of Genoa predating Milan-X. They didn't want to interfere with the TTM of Genoa by implementing the chip stacking. Maybe that's where Bergamo comes in.
Yeah, Genoa seems like the best option to get Zen 4 out quickly in the enterprise market. It is kind of a minimal-changes approach. For those that know, a stacked package is a bit more of a risk due to long-term longevity and reliability concerns because of differing thermal expansion in the new packaging. There might be some hesitation.

I suppose Zen 4 will be around for a while in some form, but AMD has this brand new IO die with DDR5 and PCI-e 5; will it still be used with Zen 5 if Bergamo really uses silicon bridge interconnect? Perhaps the IO die will stick around for Zen 5 and there will be at least 2 versions for Zen 5 also. I had thought that Zen 5 would use embedded silicon interconnect or stacking, but Zen 5 will likely be massive with rather high power consumption, so the non-stacked package may be better due to being able to spread the chips out further for lower thermal density. I guess the current IO die has only been through 2 or 3 cpu upgrades (Zen 2 Rome -> Zen 3 Milan -> Zen 3 Milan-x). I guess there is also the possibility that the Genoa and Bergamo IO die are actually the same. The micro-bump interfaces, if present, wouldn’t take much space.
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
@DisEnchantment
Thanks for the elaboration.
WRT the ROB, I am absolutely on your side and I also already read the CaC article. I just wanted to point out the following: just increasing the ROB and not changing anything else will not lead to any significant performance increase if the former size wasn't already much too small for the given architecture.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
Genoa is almost certainly serdes connections from cpu die to IO die, but I think Bergamo is likely to use EFB.

I would think that too. If you consider time progression, Genoa taped out probably a good amount of time before CDNA2, and with CDNA2, AMD still had an option in the back pocket to switch to an interposer if EFB did not work.

The progression of tape outs is likely:
- Genoa
- CDNA2
- RDNA3
- CDNA3
- Bergamo

So a good amount of time to make a choice for Bergamo. So my guess would be EFB or better. Better would be Active Silicon Bridge, possibly with SRAM on it.

Since there is a lot of talk about RDNA 3 having Active Silicon Bridge, it might be a possibility for Bergamo.

The connection allowed by EFB would be similar to that used by HBM2E, so possibly thousands of bits wide with no added latency from serialization / deserialization. HBM doesn’t have very good latency, but that is due to HBM being DRAM, not due to the interface, so this should make off chip cache a reasonable thing to do. You can get TB/s bandwidth from such interfaces with low latency.

The eventual goal is to make the collection of chiplets act as a single massive silicon die (or better, by reducing some distances), and EFB gets AMD almost there.

One better would be hybrid bond connection between the chips, which would allow even higher bandwidth and even lower power, but EFB is close enough in the mean time.

I don’t know if it is plausible for the bridge die to actually contain the L3 or L4 cache, but that may be a possibility. I haven’t had time to keep up with the packaging tech; has there been anything said about active bridge die? You could embed or stack a huge amount of cache, so the Zen 4c core may not actually be cut down that much if a lot of cache has been moved off die. I am wondering if AMD GPUs will use the same chips and those are rumored to have up to 512 MB infinity cache. If you have 8 bridge chips for 8 cpu die, then that would be 512 MB if they are actually single 64 MB cache chips. A 512 MB L4 would likely make up for a bit less cache on the cpu die. Perhaps they could be stacked for 1, 1.5, 2 GB cache.

I always imagined the Active Bridge (which would contain L3) being stacked on top of the die, not underneath it. It may very well be that this will come after EFB, as another technology.

Some of the leaks of RDNA3 and related patents seem to be pointing in that direction, of the bridge connection being TSV and hybrid bond, with the bridge containing L3.

Since Bergamo is likely 6+ months following RDNA3, whatever technology that goes into RDNA3 would be available to Bergamo...
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
Using EFB is a way to reduce IO die power consumption. It should be a huge reduction compared to running 12 pci-e 5 or pci-e 6 speed serdes links to cpu chiplets. Large caches reduce off die memory access, but they still burn some power. The cache only chips are on a process optimized for cache, so they may be quite low power consumption.

I am relatively confident that Bergamo will use silicon bridge connections. It is less clear what the cache hierarchy implementation will be. It might be plausible that all cache chips are the same chip. The bump-less TSVs don’t take much area, so it might be possible to have pads for micro-solder bumps on the same chip as bump-less contacts for maximum reusability.

I am not sure if that's worthwhile, to have different ways to connect to the same chip.

I think the priority for reusability would be some standard way for different chiplets to connect to the IOD, each other, or some other central switch, which would allow mix and match, to be able to easily churn out a variety of designs and also semi-custom MCMs for customers.

The chiplets being CCD, RDNA GPU, CDNA GPU, some compute matrix engine, extra SRAM or DRAM modules, compression / decompression, FPGAs.

Bergamo isn’t coming out for a while. Everyone expects more 2.5D/3D packaging to be used. It will be used in next gen GPUs. RDNA3 seems like it may be a more advanced device than CDNA2, but a lot of compute applications work fine spread across multiple gpus. I have hit cases where the performance was about the same with one monolithic gpu and 2 half size / half bandwidth GPUs. Gaming actually seems to be a more difficult problem. The EFB tech mostly reduces cost issues with stacking. Giant interposers under everything is very expensive and very wasteful. EFB seems like it may be even cheaper and simpler to manufacture than Intel EMIB.

I think that is what Charlie D. at Semi Accurate seems to think. Not only cheaper to manufacture, but also higher yield.

Edit: Also, Bergamo is a 2023 product. If AMD isn’t using some of TSMC’s 2.5D / 3D stacking tech by then, then they are likely to fall behind. I was actually surprised that Genoa doesn’t seem to use stacking tech.

That was a little bit of a disappointment, but Genoa tapeout probably goes way back. And if AMD also decided to have a half generation (Bergamo) between Genoa and Turin, and that generation would get the interconnect upgrade, Genoa did not have to be held back or jeopardized by fiddling with the interconnect.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
I suppose Zen 4 will be around for a while in some form, but AMD has this brand new IO die with DDR5 and PCI-e 5; will it still be used with Zen 5 if Bergamo really uses silicon bridge interconnect?

I don't see it as a huge challenge to re-spin the IO Die, keeping the external interfaces unchanged (PCIe5, DDR5) but only changing the internal connection to chiplets.

For all we know, the DDR5 interface is sharing the same IP / layout used in Steam Deck, Rembrandt and Genoa.

Perhaps the IO die will stick around for Zen 5 and there will be at least 2 versions for Zen 5 also. I had thought that Zen 5 would use embedded silicon interconnect or stacking, but Zen 5 will likely be massive with rather high power consumption, so the non-stacked package may be better due to being able to spread the chips out further for lower thermal density. I guess the current IO die has only been through 2 or 3 cpu upgrades (Zen 2 Rome -> Zen 3 Milan -> Zen 3 Milan-x). I guess there is also the possibility that the Genoa and Bergamo IO die are actually the same. The micro-bump interfaces, if present, wouldn’t take much space.

I don't think there was any change in the IO Die for Milan-X. You must be thinking of Trento.

With the introduction of EFB, I think Genoa will be the last one to use SerDes internally, and in future IO Dies after Genoa it will be taken out and replaced.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
I don't see it as a huge challenge to re-spin the IO Die, keeping the external interfaces unchanged (PCIe5, DDR5) but only changing the internal connection to chiplets.

For all we know, the DDR5 interface is sharing the same IP / layout used in Steam Deck, Rembrandt and Genoa.



I don't think there was any change in the IO Die for Milan-X. You must be thinking of Trento.

With the introduction of EFB, I think Genoa will be the last one to use SerDes internally, and in future IO Dies after Genoa it will be taken out and replaced.
I was saying that essentially the same IO die has been used for Rome, Milan, and Milan-X. I was expecting some form of silicon bridges for Zen 5, but that would make the Genoa IO die (new die with DDR5 and pci-e 5) a single cpu chiplet generation product. So I was speculating that Zen 5 may still have a version that connects to the Genoa IO die.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
The value proposition node at GlobalFoundries is 12FDX, not 12LP+.
22FDX = 1x cost
12LP+ = ~1.6x cost [2019 node]
12FDX = ~1.2x cost [2022 node]
per mm squared

So, I doubt "4c Zen3" plus "a couple RDNA2 WGPs" on 12LP+ is aimed at value.

2c Zen4-lite APU on 12nm would be quite enough horsepower for value segments. If they stick with SMT2 that's four big threads. If they go beyond we can hope for an arrangement like four big threads, and four background threads, each with the ability to execute without speculation, and even in-order.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
I am not sure if that's worthwhile, to have different ways to connect to the same chip.

I think the priority for reusability would be some standard way for different chiplets to connect to the IOD, each other, or some other central switch, which would allow mix and match, to be able to easily churn out a variety of designs and also semi-custom MCMs for customers.

The chiplets being CCD, RDNA GPU, CDNA GPU, some compute matrix engine, extra SRAM or DRAM modules, compression / decompression, FPGAs.



I think that is what Charlie D. at Semi Accurate seems to think. Not only cheaper to manufacture, but also higher yield.



That was a little bit of a disappointment, but Genoa tapeout probably goes way back. And if AMD also decided to have a half generation (Bergamo) between Genoa and Turin, and that generation would get the interconnect upgrade, Genoa did not have to be held back or jeopardized by fiddling with the interconnect.
I don’t know about the IO die. It seems like they will already have trouble with all of the signal pins required for Genoa, so I have wondered if they would use some embedded silicon, perhaps with the PHY interface hardware in the embedded silicon. In the original Zen processor, they actually had 4 IFOP links just to make routing in the package easier; not all of them were ever used, so unused interfaces wouldn’t be new.

For the cache chips, it seems like it could be a good fit to support multiple interfaces. I would need to look up where the TSVs are in the cache chips. I was thinking the TSVs would be near the middle of the chip for SoIC stacking. For the EFB stacking, the pads for the micro-solder balls would necessarily be on the edges, where the cpu die and the IO die overlap it.
 

jamescox

Senior member
Nov 11, 2009
642
1,104
136
I would think that too. If you consider time progression, Genoa taped out probably a good amount of time before CDNA2, and with CDNA2, AMD still had an option in the back pocket to switch to an interposer if EFB did not work.

The progression of tape outs is likely:
- Genoa
- CDNA2
- RDNA3
- CDNA3
- Bergamo

So a good amount of time to make a choice for Bergamo. So my guess would be EFB or better. Better would be Active Silicon Bridge, possibly with SRAM on it.

Since there is a lot of talk about RDNA 3 having Active Silicon Bridge, it might be a possibility for Bergamo.



The eventual goal is to make the collection of chiplets act as a single massive silicon die (or better, by reducing some distances), and EFB gets AMD almost there.

One better would be hybrid bond connection between the chips, which would allow even higher bandwidth and even lower power, but EFB is close enough in the mean time.



I always imagined the Active Bridge (which would contain L3) being stacked on top of the die, not underneath it. It may very well be that this will come after EFB, as another technology.

Some of the leaks of RDNA3 and related patents seem to be pointing in that direction, of the bridge connection being TSV and hybrid bond, with the bridge containing L3.

Since Bergamo is likely 6+ months following RDNA3, whatever technology that goes into RDNA3 would be available to Bergamo...
I don’t think I have seen any SoIC tech that bridges across multiple chips; is that a thing? AFAIK, it is just one chip completely on top of another. The alignment for SoIC (direct metal contact, no micro-solder balls) needs to be significantly more precise. Any movement from thermal expansion of multiple chips may cause failure. The pitch for SoIC is on the order of 1 to 10 microns. For micro-solder ball based connections, it seems to be more like 50 to 100 microns. The stacking tech for EFB should allow mix and match of just about any chips (like HBM made elsewhere) while the SoIC is very limited on what can be stacked.
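The pitch gap translates directly into connection density - simple area math using the pitches quoted above:

```python
# Connections per mm^2 for a square grid of contacts at a given pitch.
def conns_per_mm2(pitch_um: float) -> float:
    return (1000.0 / pitch_um) ** 2

for pitch in (9, 50, 100):   # ~SoIC-class vs micro-solder-bump-class pitches
    print(f"{pitch:>3} um pitch -> ~{conns_per_mm2(pitch):,.0f} connections/mm^2")
# 9 um -> ~12,346/mm^2; 50 um -> 400/mm^2; 100 um -> 100/mm^2
```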
 

Joe NYC

Platinum Member
Jun 26, 2021
2,333
2,947
106
I don’t think I have seen any SoIC tech that bridges across multiple chips; is that a thing? AFAIK, it is just one chip completely on top of another. The alignment for SoIC (direct metal contact, no micro-solder balls) needs to be significantly more precise. Any movement from thermal expansion of multiple chips may cause failure. The pitch for SoIC is on the order of 1 to 10 microns. For micro-solder ball based connections, it seems to be more like 50 to 100 microns. The stacking tech for EFB should allow mix and match of just about any chips (like HBM made elsewhere) while the SoIC is very limited on what can be stacked.

Something I came across:

X3D is how AMD often describes 3D stacked cache, which in this arrangement also acts as a bridge between GCDs. MCD has been the term used for this die.

It is just a patent, not a road map, but some of the speculation around RDNA3 is based on this sort of arrangement.



 