Speculation: Ryzen 4000 series/Zen 3

Page 162

jamescox

Senior member
Nov 11, 2009
640
1,104
136
We are assuming Zen 3 has the concept of a CCX at all. In an ideal world AMD wouldn’t even have an IO die. The chiplets themselves would be completely self contained.

There were some AMD slides showing an 8 core CCX for Zen 3. Zen 1 was essentially “self contained”, but that is actually very wasteful. It obviously allowed for very good design reusability and was probably very cost effective to tape out. It had a lot of wasted die area, though. All of the cpu chips had two 64-bit memory controllers, 32 pci-e/IFIS links, and 4 (only 3 used at any time) IFOP for connecting to other die. The rather large infinity fabric switch wasn’t really necessary for desktop parts; that also probably made latency worse. It also had four 32-bit infinity fabric links that were completely unused in Ryzen parts.

Look up the ISSCC 2018 slides covering the original Zen 1 Zeppelin die for the details.

I don’t know how a distributed system with 8 separate chips would work. The memory controllers are grouped together in pairs. Dual channel is 2x64 for 128-bit, but it is DDR, so it can actually generate 256 bits per memory clock, which is 32 bytes. This is the width of all of the infinity fabric and the cache line size, so it transfers 32 bytes per clock on most pathways. You wouldn’t want to put a single 64-bit controller on each of the 8 cpu die. You would limit the bandwidth available to any one core and create a lot of NUMA nodes. You would also have issues with different numbers of cpu die having different numbers of memory controllers available. AMD makes EPYC processors with 2, 4, 6, and 8 cpu chips. That wouldn’t work well if it was still like the Zen 1 architecture. They did make a Threadripper with 4 cpu die, but it was still limited to 4-channel memory, so two die did not have any connected memory controller. This was not very good under most circumstances. If the 8 memory channels were split across 8 chips, then you would always need 8 chips to connect all memory controllers.
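
The dual-channel arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the post: the function names are made up for illustration, and the 1600 MHz memory clock (DDR4-3200) in the usage lines is an assumed example, not a figure from the thread.

```python
# Worked arithmetic for the dual-channel DDR figure described above.
# All names here are hypothetical helpers, not any vendor API.

CHANNEL_WIDTH_BITS = 64   # one DDR channel is 64 bits wide
TRANSFERS_PER_CLOCK = 2   # DDR moves data on both clock edges


def bytes_per_memory_clock(channels, width_bits=CHANNEL_WIDTH_BITS):
    """Bytes moved per memory clock across the given number of channels."""
    bits = channels * width_bits * TRANSFERS_PER_CLOCK
    return bits // 8


def peak_bandwidth_gb_s(channels, memory_clock_mhz):
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return bytes_per_memory_clock(channels) * memory_clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    print(bytes_per_memory_clock(2))  # 32 bytes per clock for dual channel
    # Assumed example clock: DDR4-3200 runs a 1600 MHz memory clock.
    print(peak_bandwidth_gb_s(2, 1600))
    # A single 64-bit controller per die would halve the per-die figure.
    print(peak_bandwidth_gb_s(1, 1600))
```

The last line shows the concern raised above: one controller per cpu die leaves any single die with half the local bandwidth of a paired controller.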

The current split allows for maximum reusability with very little wasted silicon. They can make huge numbers of the CPU die and bin them for almost their entire product stack. About the only waste is the one extra IFOP on single cpu chip Ryzen parts.

There is a possibility that there isn’t much info about Zen 3 yet since it is using the exact same IO die used with Zen 2 and it doesn’t require much of an update. It would look exactly the same from the outside since cpu cores only connect with the IO die. It would need different microcode and maybe a few other things, but it needs to be the same to use the same socket.

We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.
 
Reactions: Tlh97 and Vattila

Vattila

Senior member
Oct 22, 2004
805
1,394
136
Core A L2 sends request for read from L3 to cluster core interface A, the shared memory table says the data isn't in L3 slice A but is in L3 slice D.

The location of a cache line, i.e. the slice in which it resides, is directly determined by the address, using memory address interleaving. No table lookup is hence needed for this purpose.

So I think your scenario goes like this: The L2 cache controller for core A sends a request to its local L3 cache controller. The latter then routes the request to the L3 cache controller for core D, based on the lower bits of the cache line address.
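
A minimal sketch of that idea, assuming a 64-byte cache line and four L3 slices (one per core in a 4-core CCX). Both numbers and the function name are assumptions for illustration; real hardware may hash more address bits than a simple modulo does.

```python
# Address-interleaved slice selection: the slice is a pure function of the
# address, so no lookup table is needed to find where a line lives.
# CACHE_LINE_BYTES and NUM_SLICES are assumed values for this sketch.

CACHE_LINE_BYTES = 64
NUM_SLICES = 4


def l3_slice_for_address(addr, line_bytes=CACHE_LINE_BYTES, num_slices=NUM_SLICES):
    """Return the L3 slice index chosen by the low bits of the line address."""
    line_address = addr // line_bytes   # drop the byte-within-line offset
    return line_address % num_slices    # low bits of the line address


if __name__ == "__main__":
    # Consecutive cache lines rotate through slices 0, 1, 2, 3, 0, ...
    for addr in range(0, 5 * CACHE_LINE_BYTES, CACHE_LINE_BYTES):
        print(hex(addr), "-> slice", l3_slice_for_address(addr))
```

In this picture the local L3 controller only has to compute the slice index and forward the request; the tag check happens at the target slice.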
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
The location of a cache line, i.e. the slice in which it resides, is directly determined by the address, using memory address interleaving. No table is hence needed.

So I think your scenario goes like this: The L2 cache controller for core A sends a request to its local L3 cache controller. The latter then routes the request to the L3 cache controller for core D, based on the lower bits of the cache line address.
The memory table = address tags for L3.
Also, the L2 unit that does transfers is called the L2 Bus Unit. (Similar to Jaguar's BU)
Also, the L3 unit that does transfers is called the Cluster Core Interface. (Similar to Jaguar's L2 un-distributed cluster core interface)

None of the cores interface directly with the L2 SRAM, but rather with the L2BU; nor with the L3 SRAM, but rather with the CCI.

Core
|
LSU* - L1d SRAM
|
L2 Bus Unit* -> L2 SRAM
|
L3 Cluster Core Interface* -> L3 SRAM
|
CCX Cache-Coherent Master -> Transfer Switch (5x32B/c)

*These units are identical.

SRAM CTL Interface = SRAM Control Box => SRAM CTL Interface + SRAM Data+Tags = Lx SRAM
Lx SRAM < usually 2 Reads / 1 Write > loads and stores to Lx Load+Store Queue Buffering Unit < usually 4 Loads / 4 Stores for L1, unknown for L2/L3 other than min. 1R1W 32B > loads and stores to Core, L2, L3 or other CCI.
 
Last edited:
Reactions: Vattila

juergbi

Junior Member
Apr 27, 2019
12
14
41
We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.

Is there a chance Badami/Trento is Zen 3-based EPYC with updated I/O (on SP5 with DDR5 and PCIe 5) to be launched in Summer 2021? In a Q4 2018 AMD roadmap such a product was listed (although under the name Genoa, which is now Zen 4 and expected in H1 2022). This would help against Sapphire Rapids, if Intel manages to launch that before Genoa.
 

eek2121

Diamond Member
Aug 2, 2005
3,043
4,266
136
There were some AMD slides showing an 8 core CCX for Zen 3. Zen 1 was essentially “self contained”, but that is actually very wasteful. It obviously allowed for very good design reusability and was probably very cost effective to tape out. It had a lot of wasted die area, though. All of the cpu chips had two 64-bit memory controllers, 32 pci-e/IFIS links, and 4 (only 3 used at any time) IFOP for connecting to other die. The rather large infinity fabric switch wasn’t really necessary for desktop parts; that also probably made latency worse. It also had four 32-bit infinity fabric links that were completely unused in Ryzen parts.

Look up the ISSCC 2018 slides covering the original Zen 1 Zeppelin die for the details.

I don’t know how a distributed system with 8 separate chips would work. The memory controllers are grouped together in pairs. Dual channel is 2x64 for 128-bit, but it is DDR, so it can actually generate 256 bits per memory clock, which is 32 bytes. This is the width of all of the infinity fabric and the cache line size, so it transfers 32 bytes per clock on most pathways. You wouldn’t want to put a single 64-bit controller on each of the 8 cpu die. You would limit the bandwidth available to any one core and create a lot of NUMA nodes. You would also have issues with different numbers of cpu die having different numbers of memory controllers available. AMD makes EPYC processors with 2, 4, 6, and 8 cpu chips. That wouldn’t work well if it was still like the Zen 1 architecture. They did make a Threadripper with 4 cpu die, but it was still limited to 4-channel memory, so two die did not have any connected memory controller. This was not very good under most circumstances. If the 8 memory channels were split across 8 chips, then you would always need 8 chips to connect all memory controllers.

The current split allows for maximum reusability with very little wasted silicon. They can make huge numbers of the CPU die and bin them for almost their entire product stack. About the only waste is the one extra IFOP on single cpu chip Ryzen parts.

There is a possibility that there isn’t much info about Zen 3 yet since it is using the exact same IO die used with Zen 2 and it doesn’t require much of an update. It would look exactly the same from the outside since cpu cores only connect with the IO die. It would need different microcode and maybe a few other things, but it needs to be the same to use the same socket.

We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.
There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die as only the top end use 2 chiplets. All of their laptops are also a single monolithic die. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.
 
Last edited:

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Is Badami/Trento that supposed refreshed Epyc between Milan and Genoa people are talking about? How does Badami come into play? Are we assuming that refreshes will be named after Indian cities as opposed to Italian?
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
We are assuming Zen 3 has the concept of a CCX at all. In an ideal world AMD wouldn’t even have an IO die. The chiplets themselves would be completely self contained.
This is an interesting approach and something discussed at length in the past. AM5 being from scratch and breaking with prior compatibility may bring more than just shiny new numbers. OTOH, it increases costs if there are chiplet defects. The IO dies make it dead cheap for AMD to assemble their processors.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
Is Badami/Trento that supposed refreshed Epyc between Milan and Genoa people are talking about? How does Badami come into play? Are we assuming that refreshes will be named after Indian cities as opposed to Italian?
I believe Badami is Brown. Badam = Almond, Badami = Color of Almond aka Brown.
1. almond-colored
2. brownish
3. brown

Badami is Brown, thus its "classic rock" artist/band is James Brown. As far as I am willing to research and speculative-guess it.

19h 00h-0Fh => Genesis => "classic rock" Genesis.
19h 30h-3Fh => Badami as above. There are other bands/artists with Brown in their name in the same era.

Badami is probably Zen4 on SP3 and a newer TRX socket.
SP3r2 = TR4
SP3r3 = sTRX4
SP3r4 = ???

If it is Zen3, it is the real Zen3 core. (SR-28, XV-28, Z3-5, Z4-5 = same cores team)

Genesis(Milan)-SP3 -> Floyd(Genoa)-SP5 -> Badami(Trento)-SP3 -> Stones(___)-SP5
 
Last edited:

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,599
136
There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die as only the top end use 2 chiplets. All of their laptops are also a single monolithic die. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.

There was a clear declaration by AMD that chiplets cut down costs even in the 8 core desktop design.

Of course, the biggest savings are on the highest core count CPUs.

The reasons for going monolithic for mobile are probably that they would otherwise have had to use a 3 chip MCM with larger caches (bigger combined die size), the better power savings from having everything on 7nm, and the lower-latency interconnect, besides not planning to raise the core count beyond 8.
 

Martimus

Diamond Member
Apr 24, 2007
4,488
153
106
There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die as only the top end use 2 chiplets. All of their laptops are also a single monolithic die. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.
How would AMD meet their contractual obligations with Global Foundries if they don't use them to create the IO? The chiplet design didn't just help with yield, it also helped AMD meet their contractual obligations with the foundry that isn't advancing.

But I agree that without the requirement to have a large percentage of chips fabbed at Global Foundries, AMD would probably make more monolithic dies, or at least move the IO to a better process node.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
How would AMD meet their contractual obligations with Global Foundries if they don't use them to create the IO? The chiplet design didn't just help with yield, it also helped AMD meet their contractual obligations with the foundry that isn't advancing.
The current WSA runs out next spring and AMD so far hasn't denoted further contractual liabilities beyond that. Though @NostaSeronx claimed before that there will be a new WSA not yet public after that point.
 

Asterox

Golden Member
May 15, 2012
1,028
1,786
136
Not correct - AMD compared Zen (Summit Ridge) to Excavator (2015); Bulldozer was the first generation, released in October 2011.
https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
https://en.wikipedia.org/wiki/Excavator_(microarchitecture)

Yes, and on the AM3 socket there was not a single Steamroller or Excavator CPU. It is only one test or benchmark, but we can see the difference in CPU performance. For example, the Athlon 3000G score in CR15 is about 385.

Blah, the FX-4100 was, or is, a very bad processor. But hey, it has modern CPU instructions compared to the very old Phenom or Athlon oldtimers.

 

Martimus

Diamond Member
Apr 24, 2007
4,488
153
106
The current WSA runs out next spring and AMD so far hasn't denoted further contractual liabilities beyond that. Though @NostaSeronx claimed before that there will be a new WSA not yet public after that point.
Wasn't there an Anandtech article that showed it was extended 3 years just a couple months ago? Memory is not my strong suit, so it might have just been a part of the WAN show or a news update from Gamers Nexus.

EDIT: https://www.anandtech.com/show/1391...th-globalfoudries-set-to-buy-wafers-till-2021
It looks like the agreement goes through March 1, 2024. However, they can pay GF a percentage of the lost revenue if they don't meet their wafer targets.
 
Last edited:

eek2121

Diamond Member
Aug 2, 2005
3,043
4,266
136
There was a clear declaration by AMD that chiplets cut down costs even in the 8 core desktop design.

Of course, the biggest savings are on the highest core count CPUs.

The reasons for going monolithic for mobile are probably that they would otherwise have had to use a 3 chip MCM with larger caches (bigger combined die size), the better power savings from having everything on 7nm, and the lower-latency interconnect, besides not planning to raise the core count beyond 8.

With low yields, yes. AMD doesn’t have yield issues. Also I wasn’t strictly discussing a monolithic design, but rather, a true MCM based approach where a package can have n number of dies where n can be between 1 and however many the package can hold. Each module can contain x86 cores, ARM cores, tensor cores, GPU compute cores, etc. I will give more specific thoughts later.

EDIT: AMD and Intel have been binning for decades without chiplets.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,005
1,599
136
AMD does not have yield issues in part because they went for small chiplets. I was only pointing out that your statement about MCM costing more than a traditional design was incorrect.
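
To put rough numbers on the die-size point, here is a sketch using the textbook Poisson yield model, yield = exp(-D * A). It is not AMD's cost model: the defect density and die areas are made-up illustrative assumptions, and it ignores the IO die, packaging cost, and salvage of partially defective dies.

```python
import math

# Poisson yield model: probability that a die of a given area has zero defects.
# Function names and all numeric inputs below are illustrative assumptions.


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of dies expected to be defect-free."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_unit(die_area_mm2, dies_per_unit, defects_per_cm2):
    """Wafer area (mm^2) consumed on average to obtain one fully good unit."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_unit * die_area_mm2 / good_fraction


if __name__ == "__main__":
    d0 = 0.2  # assumed defects per cm^2
    # Same total core silicon: four 75 mm^2 chiplets vs one 300 mm^2 die.
    chiplets = silicon_per_good_unit(75, 4, d0)
    monolithic = silicon_per_good_unit(300, 1, d0)
    print(f"chiplets:   {chiplets:.1f} mm^2 per good unit")
    print(f"monolithic: {monolithic:.1f} mm^2 per good unit")
```

Under these assumptions the small dies waste less silicon per good product, and the gap grows with defect density and total area, which fits the remark that the biggest savings are on the highest core count CPUs.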
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
With low yields, yes. AMD doesn’t have yield issues. Also I wasn’t strictly discussing a monolithic design, but rather, a true MCM based approach where a package can have n number of dies where n can be between 1 and however many the package can hold. Each module can contain x86 cores, ARM cores, tensor cores, GPU compute cores, etc. I will give more specific thoughts later.

EDIT: AMD and Intel have been binning for decades without chiplets.
There are many other 2nd order effects of chiplets besides what you're saying. You're only thinking of the simplest, yield @ fabbing cost.

One of the big ones is the ability to bin a CPU in ways that would be impossible for a single unified die of equal core capacity.

Other big ones are design, verification & inventory cost savings.
 
Reactions: spursindonesia

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Wasn't there an Anandtech article that showed it was extended 3 years just a couple months ago? Memory is not my strong suit, so it might have just been a part of the WAN show or a news update from Gamers Nexus.

EDIT: https://www.anandtech.com/show/1391...th-globalfoudries-set-to-buy-wafers-till-2021
It looks like the agreement goes through March 1, 2024. However, they can pay GF a percentage of the lost revenue if they don't meet their wafer targets.
Thanks, the WSA is active through March 1, 2024 indeed, but the current amendment runs out next spring. Neither has a new amendment for the last three years been announced, nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like Zen 1 embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is being announced and what it entails.
 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
I believe Badami is Brown. Badam = Almond, Badami = Color of Almond aka Brown.
1. almond-colored
2. brownish
3. brown

Badami is Brown, thus its "classic rock" artist/band is James Brown. As far as I am willing to research and speculative-guess it.

19h 00h-0Fh => Genesis => "classic rock" Genesis.
19h 30h-3Fh => Badami as above. There are other bands/artists with Brown in their name in the same era.

Badami is probably Zen4 on SP3 and a newer TRX socket.
SP3r2 = TR4
SP3r3 = sTRX4
SP3r4 = ???

If it is Zen3, it is the real Zen3 core. (SR-28, XV-28, Z3-5, Z4-5 = same cores team)

Genesis(Milan)-SP3 -> Floyd(Genoa)-SP5 -> Badami(Trento)-SP3 -> Stones(___)-SP5

This is incorrect. Trento is not Zen 4 on SP3.
 

Martimus

Diamond Member
Apr 24, 2007
4,488
153
106
Thanks, the WSA is active through March 1, 2024 indeed, but the current amendment runs out next spring. Neither has a new amendment for the last three years been announced, nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is being announced and what it entails.
Thanks. The link was a great writeup of what is going on. I apologize for bringing up old stuff, but I haven't been active on Anandtech since 2014 and I've missed a bunch of the conversations here.
 
Reactions: french toast

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Thanks. The link was a great writeup of what is going on. I apologize for bringing up old stuff, but I haven't been active on Anandtech since 2014 and I've missed a bunch of the conversations here.
No need to apologize. The WSA stuff is rather opaque so I just know I looked into it once before, and who aside from Su knows what will still happen.
 
Reactions: Martimus

Lennox0010

Junior Member
Aug 26, 2020
2
1
36
Thanks, the WSA is active through March 1, 2024 indeed, but the current amendment runs out next spring. Neither has a new amendment for the last three years been announced, nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like Zen 1 embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is being announced and what it entails.

I thought the current amendment was 7 which runs until 2024?

“Today, the seventh amendment of the WSA spans on January 2019 through March 2024,” Devinder Kumar, AMD CFO, says during the earnings call (via Seeking Alpha). “It establishes purchase commitments and pricing at 12-nanometer and above for the years 2019 through 2021. The amendment also provides AMD full sourcing flexibility at 7-nanometer and beyond without any one-time payments or royalties for products, purchase from other foundries.”

 
Reactions: amd6502