Speculation: Ryzen 4000 series/Zen 3

jamescox · Aug 26, 2020

eek2121 said:
We are assuming Zen 3 has the concept of a CCX at all. In an ideal world AMD wouldn’t even have an IO die. The chiplets themselves would be completely self contained.

There were some AMD slides showing an 8 core CCX for Zen 3. Zen 1 was essentially “self contained”, but that is actually very wasteful. It obviously allows for very good design reusability and was probably very cost effective to tape out. It had a lot of wasted die area though. All of the cpu chips had two 64-bit memory controllers, 32 pci-e/IFIS links, and 4 (only 3 used at any time) IFOP links for connecting to other die. The rather large infinity fabric switch wasn’t really necessary for desktop parts; that also probably made latency worse. It also had four 32-bit infinity fabric links that were completely unused in Ryzen parts.

Look up the ISSCC 2018 slides covering the original Zen 1 Zeppelin die for the details.

I don’t know how a distributed system with 8 separate chips would work. The memory controllers are grouped together in pairs. Dual channel is 2x64 for 128-bit, but it is DDR, so it can actually deliver 256 bits per memory clock, which is 32 bytes. This is the width of most of the infinity fabric (half of a 64-byte cache line), so it transfers 32 bytes per clock on most pathways. You wouldn’t want to put a single 64-bit controller on each of the 8 cpu die. You would limit the bandwidth available to any one core and create a lot of NUMA nodes. You would also have issues with different numbers of cpu die having different numbers of memory controllers available. AMD makes EPYC processors with 2, 4, 6, and 8 cpu chips. That wouldn’t work well if it was still like the Zen 1 architecture. They did make a Threadripper with 4 cpu die, but it was still limited to 4 channel memory, so two die did not have any connected memory controller. This was not very good under most circumstances. If the 8 memory channels were split across 8 chips, then you would always need 8 chips to connect all memory controllers.
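To make the width arithmetic above concrete, here is a minimal Python sketch. The channel width and the two-transfers-per-clock factor come from the post; the function names and the 1600 MHz example clock (DDR4-3200) are assumptions added for illustration only.

```python
# Bytes delivered per memory clock by a DDR memory setup.
CHANNEL_WIDTH_BITS = 64   # one DDR channel, as stated in the post
TRANSFERS_PER_CLOCK = 2   # double data rate: two transfers per memory clock


def bytes_per_memory_clock(channels: int) -> int:
    """Bytes moved per memory clock across the given number of channels."""
    bits = channels * CHANNEL_WIDTH_BITS * TRANSFERS_PER_CLOCK
    return bits // 8


def peak_bandwidth_gb_s(channels: int, memory_clock_mhz: float) -> float:
    """Theoretical peak in GB/s; memory_clock_mhz is the real clock,
    e.g. 1600 for DDR4-3200 (an assumed example, not from the post)."""
    return bytes_per_memory_clock(channels) * memory_clock_mhz * 1e6 / 1e9


print(bytes_per_memory_clock(2))     # paired controllers (dual channel)
print(bytes_per_memory_clock(1))     # a lone 64-bit controller on one die
print(peak_bandwidth_gb_s(2, 1600))  # dual channel at the assumed clock
```

A lone 64-bit controller per die delivers half as many bytes per clock as a paired one, which is the bandwidth limit the post is worried about.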

The current split allows for maximum reusability with very little wasted silicon. They can make huge numbers of the CPU die and bin them for almost their entire product stack. About the only waste is the one extra IFOP on single cpu chip Ryzen parts.

There is a possibility that there isn’t much info about Zen 3 yet since it is using the exact same IO die used with Zen 2 and it doesn’t require much of an update. It would look exactly the same from the outside since cpu cores only connect with the IO die. It would need different microcode and maybe a few other things, but it needs to be the same to use the same socket.

We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.

Vattila · Aug 26, 2020

NostaSeronx said:
Core A L2 sends request for read from L3 to cluster core interface A, the shared memory table says the data isn't in L3 slice A but is in L3 slice D.

The location of a cache line, i.e. the slice in which it resides, is directly determined by the address, using memory address interleaving. No table lookup is hence needed for this purpose.

So I think your scenario goes like this: The L2 cache controller for core A sends a request to its local L3 cache controller. The latter then routes the request to the L3 cache controller for core D, based on the lower bits of the cache line address.
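As a rough illustration of address interleaving, the sketch below picks a slice from the low bits of the cache line address. It assumes a 64-byte line, four slices (one per core in a 4-core CCX), and plain modulo interleaving; real hardware may hash more address bits, and `l3_slice_for` is a hypothetical name.

```python
CACHE_LINE_BYTES = 64  # x86 cache line size
NUM_SLICES = 4         # assumption: one L3 slice per core in a 4-core CCX


def l3_slice_for(address: int, num_slices: int = NUM_SLICES) -> int:
    """Return the index of the L3 slice that owns this physical address.

    Assumes simple interleaving on the low bits of the cache line
    address; this is an illustration, not AMD's documented mapping.
    """
    line_address = address // CACHE_LINE_BYTES  # drop the offset within the line
    return line_address % num_slices            # low bits select the slice


# Consecutive cache lines rotate through slices A, B, C, D.
for addr in (0x0000, 0x0040, 0x0080, 0x00C0, 0x0100):
    print(hex(addr), "-> slice", "ABCD"[l3_slice_for(addr)])
```

Because the owning slice is a pure function of the address, any controller can compute the destination without consulting a table. The tags are then checked only inside that one slice.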

NostaSeronx · Aug 26, 2020

Vattila said:
The location of a cache line, i.e. the slice in which it resides, is directly determined by the address, using memory address interleaving. No table is hence needed.

So I think your scenario goes like this: The L2 cache controller for core A sends a request to its local L3 cache controller. The latter then routes the request to the L3 cache controller for core D, based on the lower bits of the cache line address.

The memory table = address tags for L3.
Also, the L2 unit that does transfers is called the L2 Bus Unit. (Similar to Jaguar's BU)
Also, the L3 unit that does transfers is called the Cluster Core Interface. (Similar to Jaguar's L2 un-distributed cluster core interface)

None of the cores interface directly with the L2 SRAM or the L3 SRAM; they go through the L2BU and the CCI, respectively.

Core
|
LSU* - L1d SRAM
|
L2 Bus Unit* -> L2 SRAM
|
L3 Cluster Core Interface* -> L3 SRAM
|
CCX Cache-Coherent Master -> Transfer Switch (5x32B/c)

*These units are identical.

SRAM CTL Interface = SRAM Control Box => SRAM CTL Interface + SRAM Data+Tags = Lx SRAM
Lx SRAM < usually 2 Reads / 1 Write > loads and stores to Lx Load+Store Queue Buffering Unit < usually 4 Loads / 4 Stores for L1, unknown for L2/L3 other than min. 1R1W 32B > loads and stores to Core, L2, L3 or other CCI.

juergbi · Aug 26, 2020

jamescox said:
We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.

Is there a chance Badami/Trento is Zen 3-based EPYC with updated I/O (on SP5 with DDR5 and PCIe 5) to be launched in Summer 2021? In a Q4 2018 AMD roadmap such a product was listed (although under the name Genoa, which is now Zen 4 and expected in H1 2022). This would help against Sapphire Rapids, if Intel manages to launch that before Genoa.

eek2121 · Aug 26, 2020

jamescox said:
There was some AMD slides showing an 8 core CCX for Zen 3. Zen 1 was essentially “self contained”, but that is actually very wasteful. It obviously allows for very good design reusability and was probably very cost effective to tape out. It had a lot of wasted die area through. All of the cpu chips had two 64-bit memory controllers, 32 pci-e/IFIS links, and 4 (only 3 used at any time) IFOP for connecting to other die. The rather large infinity fabric switch wasn’t really necessary for desktop parts; that also probably made latency worse. It also had four 32-bit infinity fabric links that were completely unused in Ryzen parts.

The look up the ISSCC 2018 slides covering the original Zen 1 Zeppelin die for the details.

I don’t know how a distributed system with 8 separate chips would work. The memory controllers are grouped together in pairs. Dual channel is 2x64 for 128-bit, but it is DDR, so it can actually generate 256-bits per memory clock, which is 32 bytes. This is the width of all of the infinity fabric and the cache line size, so it transfers 32 bytes per clock on most pathways. You wouldn’t want to put a single 64-bit controller on each of the 8 cpu die. You would limit the bandwidth available to any one core and create a lot of NUMA nodes. You would also have issues with different numbers of cpu die having different numbers of memory controllers available. AMD makes EPYC processors with 2, 4, 6, and 8 cpu chips. That wouldn’t work well if it was still like Zen 1 architecture. They did make a threadripper with 4 cpu die, but it was still limited to 4 channel memory, so two die did not have any connected memory controller. This was not very good under most circumstances. If the 8 memory channels were split across 8 chips, then you would always need 8 chips to connect all memory controllers

The current split allows for maximum reusability with very little wasted silicon. They can make huge numbers of the CPU die and bin them for almost their entire product stack. About the only waste is the one extra IFOP on single cpu chip Ryzen parts.

There is a possibility that there isn’t much info about Zen 3 yet since it is using the exact same IO die used with Zen 2 and it doesn’t require much of an update. It would look exactly the same from the outside since cpu cores only connect with the IO die. It would need different microcode and maybe a few other things, but it needs to be the same to use the same socket.

We don’t get new IO until Zen 4 with DDR5 and PCI-e 5.

There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die, as only the top end uses 2 chiplets. All of their laptop chips are also single monolithic dies. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.

A/// · Aug 26, 2020

Is Badami/Trento that supposed refreshed Epyc between Milan and Genoa people are talking about? How does Badami come into play? Are we assuming that refreshes will be named after Indian cities as opposed to Italian?

A/// · Aug 26, 2020

eek2121 said:
We are assuming Zen 3 has the concept of a CCX at all. In an ideal world AMD wouldn’t even have an IO die. The chiplets themselves would be completely self contained.

This is an interesting approach and something discussed at length in the past. AM5 being from scratch and breaking with prior compatibility may bring more than just shiny new numbers. OTOH, it increases costs if there are chiplet defects. The IO dies make it dead cheap for AMD to assemble their processors.

NostaSeronx · Aug 26, 2020

A/// said:
Is Badami/Trento that supposed refreshed Epyc between Milan and Genoa people are talking about? How does Badami come into play? Are are we assuming that refreshes will be named after Indian cities as opposed to Italian?

I believe Badami is Brown. Badam = Almond, Badami = Color of Almond aka Brown.

বাদামী - Wiktionary

en.wiktionary.org

1. almond-colored
2. brownish
3. brown

Badami is Brown, thus its "classic rock" artist/band is James Brown. As far as I am willing to research and speculative-guess it.

19h 00h-0Fh => Genesis => "classic rock" Genesis.
19h 30h-3Fh => Badami as above. There are other bands/artists with Brown in their name in the same era.

Badami is probably Zen4 on SP3 and a newer TRX socket.
SP3r2 = TR4
SP3r3 = sTRX4
SP3r4 = ???

If it is Zen3, it is the real Zen3 core. (SR-28, XV-28, Z3-5, Z4-5 = same cores team)

Genesis(Milan)-SP3 -> Floyd(Genoa)-SP5 -> Badami(Trento)-SP3 -> Stones(___)-SP5

A/// · Aug 26, 2020

Oh sweet jesus...

leoneazzurro · Aug 26, 2020

eek2121 said:
There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die as only the top end use 2 chiplets. All of their laptops are also a single monolithic die. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.

There was a clear declaration by AMD that chiplets cut down costs even in the 8 core desktop design:

AMD Gives Itself Massive Cost-cutting Headroom with the Chiplet Design

At its 2020 IEEE ISSCC keynote, AMD presented two slides that detail the extent of cost savings yielded by its bold decision to embrace the MCM (multi-chip module) approach to not just its enterprise and HEDT processors, but also its mainstream desktop ones. By confining only those components...

www.techpowerup.com

Of course, the biggest savings are on the highest core count CPUs.

The reasons for going monolithic on mobile are probably that they would otherwise have had to use a 3-chip MCM with larger caches (bigger combined die size), the better power savings from having everything on 7nm, and the lower-latency interconnect, besides not planning to raise the core count beyond 8.

Martimus · Aug 26, 2020

eek2121 said:
There are a lot of ways to make things small and compact without being wasteful. Note that eliminating the IO die would not be done for performance, but rather, package space. It is 2:30am here and I am on mobile so I won’t go into it now, but I do have some thoughts here.

One thing to remember is, excluding the Threadripper and EPYC product lines (because they need MCM), MCM actually costs more than a more classic design. The vast majority of AMD’s desktop chips contain a single chiplet and IO die as only the top end use 2 chiplets. All of their laptops are also a single monolithic die. I am betting AMD will approach MCM very differently in the future.

EDIT: It looks like Renoir is actually much cheaper to make vs. desktop Ryzen. Granted they cut down on the cache, but they also added a GPU. Don’t be surprised if select future Ryzen chips end up being monolithic. Cheaper manufacturing means higher margins after all. Yeah R&D, tape-out, etc. all cost a fortune, but they are (relatively) fixed costs, and provided AMD can sell enough chips, the trade off is worth it.

How would AMD meet their contractual obligations with Global Foundries if they don't use them to create the IO? The chiplet design didn't just help with yield, it also helped AMD meet their contractual obligations with the foundry that isn't advancing.

But I agree that without the requirement to have a large percentage of chips fabbed at Global Foundries, AMD would probably make more monolithic dies, or at least move the IO to a better process node.

moinmoin · Aug 26, 2020

Martimus said:
How would AMD meet their contractual obligations with Global Foundries of they don't use them to create the IO? The chiplet design didn't just help with yield, it also helped AMD meet their contractual obligations with the foundry that isn't advancing.

The current WSA runs out next spring and AMD so far hasn't disclosed further contractual liabilities beyond that. Though @NostaSeronx claimed before that there will be a new WSA not yet public after that point.

Asterox · Aug 26, 2020

rainy said:
Not correct - AMD compared Zen (Summit Ridge) to Excavator (2015), Bulldozer was a first gen released on October 2011.
https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
https://en.wikipedia.org/wiki/Excavator_(microarchitecture)

Yes, and on the AM3 socket there was not a single Steamroller or Excavator CPU. It is only one test or benchmark, but we can see the difference in CPU performance. For example, the Athlon 3000G scores about 385 in CR15.

Blah, the FX-4100 was, and is, a very bad processor. But hey, it has modern CPU instructions compared to the very old Phenom or Athlon oldtimers.

Martimus · Aug 26, 2020

moinmoin said:
The current WSA runs out next spring and AMD so far hasn't denoted further contractual liabilities beyond that. Though @NostaSeronx claimed before that there will be a new WSA not yet public after that point.

Wasn't there an Anandtech article that showed it was extended 3 years just a couple months ago? Memory is not my strong suit, so it might have just been a part of the WAN show or a news update from Gamers Nexus.

EDIT: https://www.anandtech.com/show/1391...th-globalfoudries-set-to-buy-wafers-till-2021
It looks like the agreement goes through March 1, 2024. However, they can pay GF a percentage of the lost revenue if they don't meet their wafer targets.

DrMrLordX · Aug 26, 2020

moinmoin said:
Though NostaSeronx claimed before that there will be a new WSA not yet public after that point.

Which means that it probably won't happen after all.

Kenmitch · Aug 26, 2020

DrMrLordX said:
Which means that it probably won't happen after all.

Viewing the post above yours.....

As Maxwell Smart says, "Missed it by that much".

eek2121 · Aug 26, 2020

leoneazzurro said:
There was a clear declaration by AMD that chiplets cut down costs even in the 8 core desktop design

AMD Gives Itself Massive Cost-cutting Headroom with the Chiplet Design

At its 2020 IEEE ISSCC keynote, AMD presented two slides that detail the extent of cost savings yielded by its bold decision to embrace the MCM (multi-chip module) approach to not just its enterprise and HEDT processors, but also its mainstream desktop ones. By confining only those components...

www.techpowerup.com

of course the biggest savings are on the highest core count CPUs.

The reasons of going monolithic for mobile are probably the fact that they had to use a 3 chip MCM with larger caches (bigger combined die size), better power savings having everything in 7nm and lower latencies interconnection, other that not planning of raising core count beyond 8.

With low yields, yes. AMD doesn’t have yield issues. Also I wasn’t strictly discussing a monolithic design, but rather, a true MCM based approach where a package can have n dies, where n can be between 1 and however many the package can hold. Each module can contain x86 cores, ARM cores, tensor cores, GPU compute cores, etc. I will give more specific thoughts later.

EDIT: AMD and Intel have been binning for decades without chiplets.

leoneazzurro · Aug 26, 2020

AMD does not have yield issues partly because they went for small chiplets. I was only pointing out that your statement about MCM costing more than a traditional design was incorrect.
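The link between die size and yield can be sketched with the textbook Poisson yield model, yield = exp(-D0 × A). This is a rough illustration only: the defect density and die areas below are made-up assumptions rather than AMD or TSMC figures, and the model ignores packaging cost, the IO die, and salvaging partially defective dies.

```python
import math


def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies that have zero defects."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Average wafer area (mm^2) consumed to obtain one defect-free die."""
    return area_mm2 / die_yield(area_mm2, defects_per_cm2)


D0 = 0.2               # assumed defect density in defects/cm^2 (hypothetical)
CHIPLET_AREA = 75.0    # assumed small chiplet area in mm^2 (hypothetical)
MONOLITH_AREA = 300.0  # assumed monolithic die with the same total area

four_chiplets = 4 * silicon_per_good_die(CHIPLET_AREA, D0)
one_monolith = silicon_per_good_die(MONOLITH_AREA, D0)

print(f"chiplet yield:  {die_yield(CHIPLET_AREA, D0):.3f}")
print(f"monolith yield: {die_yield(MONOLITH_AREA, D0):.3f}")
print(f"silicon for 4 good chiplets: {four_chiplets:.1f} mm^2")
print(f"silicon for 1 good monolith: {one_monolith:.1f} mm^2")
```

Under this model, the silicon spent per good product grows exponentially with die area, so at any nonzero defect density four small dies cost less wafer area than one die four times as large.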

maddie · Aug 26, 2020

eek2121 said:
With low yields, yes. AMD doesn’t have yield issues. Also I wasn’t strictly discussing a monolithic design, but rather, a true MCM based approach where a package can have n number of dies where n can be between 1 and however many the package can hold. each nodule can contain x86 cores, ARM cores, tensor cores, GPU compute cores, etc. I will give more specific thoughts later.

EDIT: AMD and Intel have been binning for decades without chiplets.

There are many other 2nd order effects of chiplets besides what you're saying. You're only thinking of the simplest, yield @ fabbing cost.

One of the big ones is the ability to bin a CPU in ways that would be impossible for a single unified die of equal core count.

Other big ones are design, verification & inventory cost savings.

DrMrLordX · Aug 26, 2020

Kenmitch said:
Viewing the post above yours.....

As Maxwell Smart says " Missed it by that much " .

That's just a WSA extension with an out clause, not a new agreement.

moinmoin · Aug 26, 2020

Martimus said:
Wasn't there an Anandtech article that showed it was extended 3 years just a couple months ago? Memory is not my strong suit, so it might have just been a part of the WAN show or a news update from Gamers Nexus.

EDIT: https://www.anandtech.com/show/1391...th-globalfoudries-set-to-buy-wafers-till-2021
It looks like the agreement goes through March 1, 2024. However, they can pay GF a percentage of the lost revenue if they don't meet their wafer targets.

Thanks, the WSA is indeed active through March 1, 2024, but the current amendment runs out next spring. Neither has a new amendment for the last three years been announced, nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like Zen 1 embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is announced and what it entails.

uzzi38 · Aug 26, 2020

NostaSeronx said:
I believe Badami is Brown. Badam = Almond, Badami = Color of Almond aka Brown.

বাদামী - Wiktionary

en.wiktionary.org

1. almond-colored
2. brownish
3. brown

Badami is Brown, thus its "classic rock" artist/band is James Brown. As far as I am willing to research and speculative-guess it.

19h 00h-0Fh => Genesis => "classic rock" Genesis.
19h 30h-3Fh => Badami as above. There are other bands/artists with Brown in their name in the same era.

Badami is probably Zen4 on SP3 and a newer TRX socket.
SP3r2 = TR4
SP3r3 = sTRX4
SP3r4 = ???

If it is Zen3, it is the real Zen3 core. (SR-28, XV-28, Z3-5, Z4-5 = same cores team)

Genesis(Milan)-SP3 -> Floyd(Genoa)-SP5 -> Badami(Trento)-SP3 -> Stones(___)-SP5

This is incorrect. Trento is not Zen 4 on SP3.

Martimus · Aug 26, 2020

moinmoin said:
Thanks, the WSA is active through March 1, 2024 indeed, but the current amendment runs out next spring. Neither has been a new amendment for the last three years be announced nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is being announced and what it entails.

Thanks. The link was a great writeup of what is going on. I apologize for bringing up old stuff, but I haven't been active on Anandtech since 2014 and I've missed a bunch of the conversations here.

moinmoin · Aug 26, 2020

Martimus said:
Thanks. The link was a great writeup of what is going on. I apologize for bringing up old stuff, but I haven't been active on Anandtech since 2014 and I've missed a bunch of the conversations here.

No need to apologize. The WSA stuff is rather opaque, so I only know it because I looked into it once before, and who aside from Su knows what will still happen.

Lennox0010 · Aug 26, 2020

moinmoin said:
Thanks, the WSA is active through March 1, 2024 indeed, but the current amendment runs out next spring. Neither has been a new amendment for the last three years been announced nor has AMD allocated any significant purchase obligations in this year's Form 10-K beyond next spring. As of now I'm of the belief that whatever purchase obligations may be left are at this point dwarfed by the amount of legacy and IOD chips AMD will continue to order from GloFo anyway (like Zen 1 embedded chips that have a guaranteed availability through 2028), so a new amendment may be deemed unnecessary. I guess we'll see next January if a new amendment is being announced and what it entails.

I thought the current amendment was 7 which runs until 2024?

“Today, the seventh amendment of the WSA spans on January 2019 through March 2024,” Devinder Kumar, AMD CFO, says during the earnings call (via Seeking Alpha). “It establishes purchase commitments and pricing at 12-nanometer and above for the years 2019 through 2021. The amendment also provides AMD full sourcing flexibility at 7-nanometer and beyond without any one-time payments or royalties for products, purchase from other foundries.”

AMD no longer has to pay millions in royalties to make 7nm CPUs and GPUs

AMD is now officially weapons-free to build future CPU and GPUs on 7nm and denser process nodes as GlobalFoundries steps down

www.pcgamesn.com
