64-core EPYC Rome (Zen 2) Architecture Overview

Page 19 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Zapetu

Member
Nov 6, 2018
94
165
66
I estimated 250-300 mm² without L4. The actual die size suggests maybe there will be an L4 after all. But perhaps not a big one. Say 128-256 MB. We can hope.

Excuse me, but 256 MB of SRAM is really, really big.

AMD stated that they would be using a 14 nm process, but they didn't specify which one. There might be a small chance that they would use the IBM-based 14HP process, which leverages embedded DRAM (eDRAM). It would be more suitable for the I/O die but also more expensive than the Samsung-based 14 nm process.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
AMD stated that they would be using a 14 nm process, but they didn't specify which one. There might be a small chance that they would use the IBM-based 14HP process, which leverages embedded DRAM (eDRAM). It would be more suitable for the I/O die but also more expensive than the Samsung-based 14 nm process.

That's still extremely large. eDRAM is much less dense than commodity DRAM, though far denser than SRAM: 256 MB of eDRAM takes about as much area as 90 MB of SRAM. I guess it could use eDRAM, as an SRAM cache that size would make for a prohibitively large die. AMD said the L3 cache portion took 16 mm² for 8 MB, so 256 MB would take over 500 mm². Even with eDRAM at 3x the density, that's still about 170 mm².
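The arithmetic above can be sanity-checked with a quick sketch (using the 16 mm² per 8 MB L3 figure quoted in the post, and the assumed ~3x eDRAM density advantage; both are the post's numbers, not official ones):

```python
# Back-of-the-envelope cache area estimate.
# Assumptions from the post above: 16 mm^2 per 8 MB of 14 nm SRAM L3,
# and eDRAM being roughly 3x denser than SRAM.
SRAM_MM2_PER_MB = 16 / 8           # ~2 mm^2 per MB of SRAM
EDRAM_DENSITY_ADVANTAGE = 3        # assumed ratio, as in the post

def sram_area_mm2(megabytes):
    """Estimated die area for an SRAM cache of the given capacity."""
    return megabytes * SRAM_MM2_PER_MB

def edram_area_mm2(megabytes):
    """Same capacity implemented in eDRAM, with the assumed density gain."""
    return sram_area_mm2(megabytes) / EDRAM_DENSITY_ADVANTAGE

print(sram_area_mm2(256))          # 512.0 -> the "over 500 mm^2" figure
print(round(edram_area_mm2(256)))  # 171   -> the "still ~170 mm^2" figure
```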
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
It might be marginally slower than an optimised direct connection, but Zen2 will likely have lower DRAM latencies than Zen1.

So then you have a situation where you make some internal latency improvements for Ryzen 3000 vs. 2000, but then turn around and squander them with an off-die memory controller.

Better to have a monolithic part for Ryzen 3000 to capitalize on latency gains vs. Intel desktop parts.
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
There was a very interesting discussion on twitter about why the I/O chip uses 14nm instead of 16nm:

WSA is the obvious reason, but there was another much more interesting theory, namely: GF 14HP process supports eDRAM (requirement for IBM) while TSMC 16nm does not.

The I/O Die is way too big to only have I/O and no cache, yet it also seems too small to fit 256MB SRAM L4 cache. 256 MB of eDRAM L4 would take roughly 120mm², which is just about right.

That alone would be a very good reason to use GF 14nm HP process! Now we don't know which version of 14nm they are using, but the fact that they are not using 12nm, despite it being available (for the obvious density improvement with 7.5T libraries), lends some credibility to the idea.

EDIT:
Damn it, took too long to write the idea down and @Zapetu beat me to it!
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The I/O Die is way too big to only have I/O and no cache, yet it also seems too small to fit 256MB SRAM L4 cache. 256 MB of eDRAM L4 would take roughly 120mm², which is just about right.

120 mm² is just for the cells. If it's a cache, it needs tags, and those take up an enormous portion of the area.
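To put a rough number on that tag overhead, here's a sketch for a hypothetical 256 MB L4. The line size, associativity, and physical address width below are my assumptions for illustration, not anything AMD has stated:

```python
# Rough tag-overhead estimate for a hypothetical 256 MB L4 cache.
# Assumed parameters (not from the thread): 64-byte lines, 48-bit
# physical addresses, 16-way set associativity, 4 state bits per line.
CACHE_BYTES = 256 * 1024 * 1024
LINE_BYTES = 64
PHYS_ADDR_BITS = 48
WAYS = 16
STATE_BITS = 4  # valid/dirty/coherence state, assumed

lines = CACHE_BYTES // LINE_BYTES                     # 4 Mi cache lines
sets = lines // WAYS
offset_bits = (LINE_BYTES - 1).bit_length()           # 6
index_bits = (sets - 1).bit_length()                  # 18
tag_bits = PHYS_ADDR_BITS - offset_bits - index_bits  # 24
tag_array_bytes = lines * (tag_bits + STATE_BITS) // 8

print(lines)                    # 4194304
print(tag_bits)                 # 24
print(tag_array_bytes / 2**20)  # 14.0 -> ~14 MiB of tag+state SRAM
```

So even under these fairly lean assumptions, the tag and state arrays alone are on the order of an extra 14 MiB of SRAM, which is why the tags eat a significant chunk of the die.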
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
So then you have a situation where you make some internal latency improvements for Ryzen 3000 vs. 2000, but then turn around and squander them with an off-die memory controller.

Better to have a monolithic part for Ryzen 3000 to capitalize on latency gains vs. Intel desktop parts.
Doing a whole new (and considerably bigger) monolithic die just for Ryzen 3000 makes absolutely no sense IMO, considering the costs.

I can see them maybe doing it if the monolithic die had a decent GPU and were usable for 15 W - 105 W desktop and mobile parts, catering to both the Pinnacle Ridge and Raven Ridge markets (as they do need an eventual 7 nm APU).

Then again, if they do add a GPU, they'll run into the problem of Navi not being ready until 2019H2, postponing the whole thing (as I seriously doubt they will add a 7 nm, Linux-only Vega 20 derivative).

Considering AMD's past, there is a near-zero chance they'll tape out a whole new chip just for the Pinnacle Ridge replacement (and then another one for RR). The market is just ludicrously small.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Catering to both Pinnacle Ridge and Raven Ridge market (as they do need an eventual 7nm APU).

Well, if the monolithic die replaces both Pinnacle Ridge and Raven Ridge, like how it works with Intel, it can work. You'll have an iGPU that covers mobile to $400 desktop chips. Then you make an HEDT chip by using the EPYC die and chiplets to get many cores.
 
Reactions: lightmanek

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Well, if the monolithic die replaces both Pinnacle Ridge and Raven Ridge like how it works with Intel it can work. You'll have an iGPU that covers mobile to $400 desktop chips. Then you make an HEDT chip by using the EPYC die and chiplets to get many cores.
My problem with this scenario is the timeframe implication. I thought there would be no 7 nm APU refresh until late 2019. This means no Ryzen 3xxx until then.

Do we really expect this to happen?
 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
Doing a whole new (and considerably bigger) monolithic die just for the Ryzen 3000 makes absolutely no sense IMO, considering the costs.

I can see them maybe doing it if the monolithic die had a decent GPU and were usable for 15 W - 105 W desktop and mobile parts, catering to both the Pinnacle Ridge and Raven Ridge markets (as they do need an eventual 7 nm APU). Considering AMD's past, there is a near-zero chance they'll tape out a whole new chip just for the Pinnacle Ridge replacement (and then another one for RR). The market is just ludicrously small.

First. Doing only an 8C APU was actually something I suggested if they really need to reduce design cost that much.

But the market for that is not ludicrously small. It's nearly the entire consumer PC laptop/desktop market: 200+ million devices/year IIRC. In the worst case AMD gets 10%, so at minimum AMD is selling 20+ million desktop/laptop CPUs/year.

Discrete GPUs OTOH are something like 40 million annually IIRC, and AMD only gets about 30% of that. So I guess they will abandon the GPU business altogether.

Because if they can't afford to build a dedicated CPU for a share of 200 million sales, then they certainly can't build dies for a share of 40 million sales, especially for the multiple GPU models required to stay relevant.
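The market sizing in that argument works out as follows (all figures are the post's own rough IIRC numbers, not official data):

```python
# Rough addressable-market comparison, using the post's figures.
pc_market_units = 200_000_000   # consumer laptop/desktop devices/year (IIRC)
amd_cpu_share_pct = 10          # the post's "worst case" AMD share
dgpu_market_units = 40_000_000  # discrete GPUs/year (IIRC)
amd_gpu_share_pct = 30          # approximate AMD share of discrete GPUs

# Integer math avoids float rounding on the percentages.
cpu_units = pc_market_units * amd_cpu_share_pct // 100
gpu_units = dgpu_market_units * amd_gpu_share_pct // 100

print(cpu_units)  # 20000000 -> the "20+ million CPUs/year" floor
print(gpu_units)  # 12000000 -> AMD's slice of the discrete GPU market
```

The point of the comparison: even the worst-case CPU volume is larger than AMD's entire discrete GPU volume.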
 

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
Well, if the monolithic die replaces both Pinnacle Ridge and Raven Ridge like how it works with Intel it can work. You'll have an iGPU that covers mobile to $400 desktop chips. Then you make an HEDT chip by using the EPYC die and chiplets to get many cores.
This was also my initial guess. Now that we know that the chiplets do not require an interposer, I'm not that convinced! Especially because of the aforementioned timing issue of 7nm Navi coming later.

Right now, I'm more leaning towards them doing both for AM4. That's because:

Doing a 12-16 core chiplet design for AM4 would take nothing away from the theoretical 8-core monolithic APU. But it would allow:

1. AMD to have a clear advantage on the mainstream socket over Intel in heavily threaded apps (or streaming), which would otherwise be just about a draw (at best).
2. It would allow them to release an 8-core APU later down the line, not having to "wait for Navi"™ or release it with Vega.
3. They could still release the monolithic APU later, when it's ready. Or even design the APU as a chiplet design with 3 chiplets (1x 8-core, I/O, GPU), though I doubt they would go that far.

I mean, it'll be quite hard to beat (or match) Intel on desktop in absolute performance with just an 8-core. Yet, as we've seen from EPYC, they could fit up to 16 cores into the same power envelope with similar clock speeds. Yes, it would be somewhat BW-limited, but no more than the 64-core EPYC or 32-core Threadripper (and faster memory support could mitigate some of that).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
My problem with this scenario is the timeframe implication. I thought there would be no 7 nm APU refresh until late 2019. This means no Ryzen 3xxx until then.

Do we really expect this to happen?

You think an APU will be made with an MCM?

For 70-100 mm² chips there's no gain to be had with an MCM, only losses, despite what some may say.

I guess they can do similar to Summit Ridge and Pinnacle Ridge launch. Have an iGPU-less desktop part come early 2019 based on chiplets and monolithic APU in late 2019.
 
Reactions: spursindonesia

Gideon

Golden Member
Nov 27, 2007
1,709
3,927
136
I guess they can do similar to Summit Ridge and Pinnacle Ridge launch. Have an iGPU-less desktop part come early 2019 based on chiplets and monolithic APU in late 2019.
That would be my guess as well (with a small chance of the APU not being monolithic, but I don't really like the implications of that for mobile).

However, if AMD goes that route, they would waste a huge opportunity if they don't make the APU an 8-core (they have teased that in an interview) and the desktop part 16 cores (with the possibility of releasing with 12 initially and holding 16 cores off until near the Threadripper 3 launch).

They would waste the main benefit 7 nm could give them: the ability to run twice the cores at similar all-core frequencies and similar power to today's chips.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
My guess is that a separate I/O die adds too much latency for consumer parts sans interposer.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
You think an APU will be made with an MCM?

For 70-100 mm² chips there's no gain to be had with an MCM, only losses, despite what some may say.

I guess they can do similar to Summit Ridge and Pinnacle Ridge launch. Have an iGPU-less desktop part come early 2019 based on chiplets and monolithic APU in late 2019.
Bolded is what I imagine will happen.

No to MCM APU as of now, BUT, we still haven't a clue as to the detailed parameters of Rome. I remain open to the possibility of MCM across the board. (1) 7nm CPU, (2) IO die & (1 or 2) GPU die provides a product for pretty much everyone as 1st mentioned below.

Atari2600 said the following in an earlier post in this thread (page 8, post #39640605):
Quote:
(i) APU - 1x8C* chiplet + 7nm Vega (AMD already have announced 7Vega can communicate via IF).
(ii) Mainstream with 1 of the ports fused off, so 1x8C* chiplet.
(iii) High end of mainstream for 2x8C* chiplets.

*CPU cores fused off as required by market/salvage.


So 1x 7nm chiplet design and 2x 14nm IOC designs to cover the market from 4C APU right through to 64C Rome. Pretty efficient use of resources IMO.
 
Reactions: BlahBleeBlahBlah

Vattila

Senior member
Oct 22, 2004
805
1,394
136
I have no idea what the underlying topology would be, either for the chiplets or the I/O die, but they both have the same number of nodes (eight cores and eight chiplets). I'm interested to hear any ideas anyone might have, though.

For fun and feedback, I've now drawn a star-like chiplet topology, in which a ring-bus implements the central routing on the IO chiplet. Also, to entertain the proponents of the 8-core CCX hypothesis, I have dispensed with the old CCX in favour of 8 cores in a ring-bus configuration (although, personally, I believe the CCX is still 4-core with fully connected cores).

The diameter of this topology would be 14 hops (4 hops to get off the bus within the CCX, 1 hop to the IO chiplet, 4 hops on the IO bus to get to the destination stop, 1 hop to the destination CCX, and 4 hops to the destination core). The diameter within the CCX is 4 hops. Minimum distance between cores in different CCXs is 5 hops.

Note: This does not take into account additional bus stops needed to attach memory controllers/cache/IO.
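The hop counts in that post can be reproduced with a trivial model. The per-segment values below come from the post's own description of the sketch (8-core ring per chiplet, ring bus on the I/O chiplet), not from anything AMD has disclosed:

```python
# Hop-count model for the star topology sketched above:
# an 8-core ring per chiplet, a ring bus on the I/O chiplet,
# and one link between each CPU chiplet and the I/O chiplet.
INTRA_CHIPLET_DIAMETER = 4  # worst case to reach the chiplet's exit stop
CHIPLET_TO_IO_LINK = 1      # one hop across the chiplet-to-I/O link
IO_RING_DIAMETER = 4        # worst case between two stops on the I/O ring

def worst_case_hops():
    """Core -> off its ring -> I/O chiplet -> across the I/O ring ->
    destination chiplet -> destination core."""
    return (INTRA_CHIPLET_DIAMETER + CHIPLET_TO_IO_LINK + IO_RING_DIAMETER
            + CHIPLET_TO_IO_LINK + INTRA_CHIPLET_DIAMETER)

print(worst_case_hops())  # 14, the topology diameter stated in the post
```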

 
Reactions: Schmide and Zapetu

HurleyBird

Platinum Member
Apr 22, 2003
2,725
1,342
136
So the chiplet has three areas for IO, the inner die uses all three, and the outer only uses one, which passes through the inner die? That's not happening. It's a weird arrangement that is inefficient and serves no purpose.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
So the chiplet has three areas for IO, the inner die uses all three, and the outer only uses one, which passes through the inner die? That's not happening. It's a weird arrangement that is inefficient and serves no purpose.
The outer die connection in the diagram does not "pass through" the inner die; it bypasses it. Under/around?
 

Vattila

Senior member
Oct 22, 2004
805
1,394
136
The outer die connection in the diagram, does not "pass through" the inner die, it bypasses it. Under/around?

Yup. That's the intention. Sorry for the confusing illustration. The point is to draw a star topology with each CPU chiplet having a single connection to the central IO chiplet, as many seem to presume is the likely solution. How the physical wires are routed in this case is interesting. Can they go under the neighbouring chiplet?
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Yup. That's the intention. Sorry for the confusing illustration. The point is to draw a star topology with each CPU chiplet having a single connection to the central IO chiplet, as many seem to presume is the likely solution. How the physical wires are routed in this case, is interesting. Can they go under the neighbouring chiplet?
Theoretically, yes. We do have multi-layer PCBs after all, but as they say, the devil is in the details.
 
Reactions: Vattila

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
This thread is interesting! Most of the technical stuff goes right over my head, but it's interesting nonetheless.
 

Zapetu

Member
Nov 6, 2018
94
165
66
Since many of you think that Ryzen 3 with chiplets is possible to do, here's one layout showing that at least everything fits just fine. It's almost like they intended these chiplets to be compatible with the AM4 socket...

And here's the same picture with some added information (which may or may not be correct):

Obviously AMD could also design the I/O die in a way that one of the chiplets could optionally be replaced with a GPU chiplet, but it's probably not worth the trouble. Like others have said, I also think that at some point there must be a successor to Raven Ridge with a 7 nm monolithic die for the mobile market.

Again, please feel free to use all the images freely and make your own predictions on top of them. I made the I/O die about a quarter of the size of Rome's I/O die, as would be logical. It would be "easy" to fit a maximum of three chiplets on the AM4 package, but since they quite certainly don't work on their own, the I/O die is needed. The total amount of silicon is about 250 mm², which is actually quite fine for most cases.

In order for this to work no silicon interposers are permitted....
 

Zapetu

Member
Nov 6, 2018
94
165
66
In order for this to work no silicon interposers are permitted....

On second thought, Vattila's small silicon interposers might work if the two chiplets are side by side and the I/O die is on the side. That would not be the best way to do it thermally, but it should still work.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,420
1,749
136
I think the PCIe is on the chiplets. IF is physically the same as PCIe (just run at higher data rates thanks to short distances), and EPYC needs to support dual-socket configurations. In current EPYCs, this is done by each Zeppelin connecting to the corresponding Zeppelin on the other socket through the interface that does dual duty as PCIe when not running dual socket. (Which is why dual-socket systems have the same amount of PCIe as single-socket ones.)

I expect the same system with Zen 2. Each chiplet would have its own primary IF link, and then at least 16 lanes of PCIe, which would be used as the GPU PCIe connector on client systems.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,863
3,413
136
The big thing about interposers is TSVs. Until packaging that avoids TSVs becomes available to AMD, silicon interposers for cheap consumer gear are just going to be too expensive and hurt yield too much for them to use.
 
Reactions: beginner99

Zapetu

Member
Nov 6, 2018
94
165
66
For fun and feedback, I've now drawn a star-like chiplet topology, in which a ring-bus implements the central routing on the IO chiplet. Also, to entertain the proponents of the 8-core CCX hypothesis, I have dispensed with the old CCX in favour of 8 cores in a ring-bus configuration (although, personally, I believe the CCX is still 4-core with fully connected cores).

AMD sure likes to use a crossbar topology wherever they can and see fit. If they had enough space, they would likely almost always use it. The ring bus looks nice and clean, though. I still remember how excited I was about Sandy Bridge's ring bus back in the day at university.

I think the PCIe is on the chiplets. IF is physically the same as PCIe (just run at higher data rates thanks to short distances), and EPYC needs to support dual-socket configurations. In current EPYCs, this is done by each Zeppelin connecting to the corresponding Zeppelin on the other socket through the interface that does dual duty as PCIe when not running dual socket. (Which is why dual-socket systems have the same amount of PCIe as single-socket ones.)

I expect the same system with Zen 2. Each chiplet would have its own primary IF link, and then at least 16 lanes of PCIe, which would be used as the GPU PCIe connector on client systems.

Sure, IFIS links use the same wires as PCIe, but still, if PCIe lanes are distributed to the chiplets, all external traffic going through a chiplet may bottleneck the only connection between that chiplet and the I/O die. It would be cleaner to let the I/O die allocate resources to each chiplet, but I guess either way works.

I might be wrong, but I really hope that the PCIe lanes are located in the I/O die. If I remember correctly, there are only four (16b) IFIS links in Naples (per socket), and there are eight chiplets in Rome.
 