Speculation: Ryzen 3000 series


amd6502

Senior member
Apr 21, 2017
971
360
136
https://images.anandtech.com/doci/1...Gaming-CPU_Architecture_06092019-page-008.jpg

3rd AGU, unified AGU scheduler, etc. Definitely, more SMT friendly than Zen/Zen+.

So, this is a weird setup I don't quite get. One unit is for writes only. The other two units are for both read and writes. Up to three reads and one write is allowed per cycle.

https://www.anandtech.com/show/1452...itecture-analysis-ryzen-3000-and-epyc-rome/10



Why did they pick it this way, and can somebody elaborate and speculate on the details?

Judging by the diagram below, it looks like the write-only AGU has the ability to skip ahead of the store queue.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,824
15,826
136
Doesn't the 2990wx have a 4.2 GHz max boost?
Yes, but I have never seen it, since I use all cores, and 3.3 GHz is the best it does on all cores. So maybe it is faster. Anyway, I still think it's great if true.
 
Reactions: Drazick

Elfear

Diamond Member
May 30, 2004
7,159
811
126
Apologies if this has been posted, but PCGamesN put up an article with some quotes from AMD about the 3950X. They said the 3950X will be the best gaming chip of the line-up. Obviously the higher clock speeds are at work here, but it implies to me that the cross-CCX latency issue with gaming has been resolved. That is great news if true.
 
Reactions: IEC

beginner99

Diamond Member
Jun 2, 2009
5,309
1,748
136
Apologies if this has been posted, but PCGamesN put up an article with some quotes from AMD about the 3950X. They said the 3950X will be the best gaming chip of the line-up. Obviously the higher clock speeds are at work here, but it implies to me that the cross-CCX latency issue with gaming has been resolved. That is great news if true.

Speculation is that every CCX-to-CCX access goes via the IO die, even if both CCXs are on the same chiplet. On the plus side, this makes latency and performance consistent and predictable regardless of how many chiplets you have. It avoids making single-chiplet chips better at gaming, for example. But the real reason for this would be predictable server performance.
 

Schmide

Diamond Member
Mar 7, 2002
5,639
838
126
So, this is a weird setup I don't quite get. One unit is for writes only. The other two units are for both read and writes. Up to three reads and one write is allowed per cycle.

https://www.anandtech.com/show/1452...itecture-analysis-ryzen-3000-and-epyc-rome/10



Why did they pick it this way, and can somebody elaborate and speculate on the details?

Judging by the diagram below, it looks like the write-only AGU has the ability to skip ahead of the store queue.

Seems logical to me. (I'm probably grossly simplifying things.)

A (read) + B (read) = C (write)

It's probably way easier to schedule reads, as their order really doesn't matter, whereas writes often require sequential ordering and cache coherence.
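
To put rough numbers on the A + B = C pattern, here is a toy issue-rate sketch. It is not a model of the real Zen 2 load/store pipeline: it ignores dependencies, queue depths, and cache ports, and it assumes the older design had two AGUs that could each handle loads or stores. The function name `min_issue_cycles` is made up for this illustration.

```python
import math

def min_issue_cycles(loads, stores, load_store_agus, store_only_agus):
    """Lower bound on cycles needed to issue all address-generation ops.

    Toy assumptions: one op per AGU per cycle, loads may only use the
    load/store AGUs, stores may use any AGU, nothing else ever stalls.
    """
    total_agus = load_store_agus + store_only_agus
    # Loads are limited by the number of AGUs that can handle loads.
    load_bound = math.ceil(loads / load_store_agus)
    # All ops together are limited by the total number of AGUs.
    total_bound = math.ceil((loads + stores) / total_agus)
    return max(load_bound, total_bound)

iterations = 1000  # C[i] = A[i] + B[i]: two loads and one store each
loads, stores = 2 * iterations, iterations

# Assumed older layout: two AGUs, both load/store capable.
two_agus = min_issue_cycles(loads, stores, load_store_agus=2, store_only_agus=0)
# Layout described in this thread: two load/store AGUs plus one store-only AGU.
three_agus = min_issue_cycles(loads, stores, load_store_agus=2, store_only_agus=1)

print(two_agus, three_agus)
```

Under these assumptions the store-only unit is enough to sustain one full A + B = C iteration per cycle, since the store no longer competes with the two loads for an AGU.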
 

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
I don't understand: if the memory controller is on the IO die, why would you need multiple chiplets to keep the max number of RAM channels?

One chiplet (8 cores) + IO die is sufficient to produce a Rome with 8 cores/8 channels.
Or what did I get wrong?

Thanks
 

naukkis

Senior member
Jun 5, 2002
991
841
136
I don't understand: if the memory controller is on the IO die, why would you need multiple chiplets to keep the max number of RAM channels?

One chiplet (8 cores) + IO die is sufficient to produce a Rome with 8 cores/8 channels.
Or what did I get wrong?

Thanks

They don't, but with more chiplets they will have more L3 cache, so better performance. And they can make use of very bad chiplets with only one or two working cores.
 
Reactions: Glo. and DarthKyrie

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
What gave you this idea?

Some previous posts alluded to this.
Of course keeping 8 chiplets would change cache amounts compared to 1 chiplet, but the posts implied that keeping the 8 chiplets would allow keeping all RAM channels.
Seemed weird, so I wanted to be sure.

There's no link at all between number of chiplets and number of RAM channels.
Thank you.
 

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
They don't, but with more chiplets they will have more L3 cache, so better performance. And they can make use of very bad chiplets with only one or two working cores.

Indeed. They might opt for one or the other, but it would introduce consistency problems if they put all versions behind the same model number.
Guess we'll know soon enough what strategy they chose.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
There's no link at all between number of chiplets and number of RAM channels.
Thank you.

That is true, but does it affect effective bandwidth?


What is the speed and width of Infinity Fabric on the die? Can 8 channels of memory bandwidth be crammed down what would be a single IF link?

I see 25 GT/s on IF2 per link[1]. Naples had a bandwidth of around 130 GB/s[2]. What is the size of each transfer? Is it 32 bits? [25 GT/s x 32 bits / 8 = 100 GB/s]
If so, it might choke a bit feeding 8 channels of DRAM into one chiplet.
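
The back-of-envelope comparison can be written out as a short sketch. The 25 GT/s figure and the 32-bit transfer width come from the post above (the width is only a guess there). DDR4-3200 with a 64-bit (8-byte) data bus per channel is an assumption added here for the DRAM side, and both function names are hypothetical.

```python
def link_bandwidth_gb_s(gigatransfers_per_s, bits_per_transfer):
    """Peak one-way link bandwidth in GB/s, ignoring protocol overhead."""
    return gigatransfers_per_s * bits_per_transfer / 8

def dram_bandwidth_gb_s(channels, megatransfers_per_s, bytes_per_transfer=8):
    """Peak aggregate DRAM bandwidth in GB/s (8 bytes = 64-bit channel)."""
    return channels * megatransfers_per_s * bytes_per_transfer / 1000

# Figures from the post: 25 GT/s per link, guessed 32-bit transfers.
if_link = link_bandwidth_gb_s(25, 32)

# Assumption: 8 channels of DDR4-3200.
dram = dram_bandwidth_gb_s(8, 3200)

print(f"IF link: {if_link:.1f} GB/s")
print(f"8-channel DRAM: {dram:.1f} GB/s")
print(f"link / DRAM: {if_link / dram:.2f}")
```

If the 32-bit guess holds, a single link would cover roughly half of the peak 8-channel bandwidth, which is consistent with the "might choke a bit" concern. A wider link would change the picture entirely.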



[1]https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
[2]https://www.dell.com/support/articl...-infiniband-and-wrf-performance-study?lang=en
 

Thibsie

Golden Member
Apr 25, 2017
1,010
1,186
136
That is true, but does it affect effective bandwidth?


What is the speed and width of Infinity Fabric on the die? Can 8 channels of memory bandwidth be crammed down what would be a single IF link?

I see 25 GT/s on IF2 per link[1]. Naples had a bandwidth of around 130 GB/s[2]. What is the size of each transfer? Is it 32 bits? [25 GT/s x 32 bits / 8 = 100 GB/s]
If so, it might choke a bit feeding 8 channels of DRAM into one chiplet.



[1]https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
[2]https://www.dell.com/support/articl...-infiniband-and-wrf-performance-study?lang=en
Good point. I saw in the other thread that, if those leaks are correct, there would be multiple 8-core Epycs with 32 and 64 MB of L3.
This points to different chiplet configs.

Edit: removed 16MB reference.
 

Asterox

Golden Member
May 15, 2012
1,042
1,837
136

Topweasel

Diamond Member
Oct 19, 2000
5,437
1,659
136
Good point. I saw in the other thread that, if those leaks are correct, there would be multiple 8-core Epycs with 32 and 64 MB of L3.
This points to different chiplet configs.

Edit: removed 16MB reference.

For cooling, balance, and packaging, all SP3-socket EPYCs are going to have either 4 or 8 dies. That's why the 8-chiplet version looks like it does: they can remove a die from each pair. That said, you can't have 8 cores on an 8-chiplet EPYC. The smallest of those will be 16c, but I have a feeling the two configurations merge at 32c and anything less is always 4 chiplets.
 

Mopetar

Diamond Member
Jan 31, 2011
8,307
7,321
136
64 cores at 3.4GHz and 225W TDP, unbelievable.

It might not be wholly unreasonable. A Ryzen 3700X will do 3.6 GHz with a 65W TDP. Consider that Epyc is going to get the best chiplets from the perspective of power efficiency. Factor in that Epyc will have one big IO die, how much of the power budget that typically consumes, and how it scales with additional chiplets, and 3.4 GHz isn't out of the realm of possibility. It could also be the all-core boost and not the base. Considering the previous top Epyc could do a 2.7 GHz all-core boost in a 180W TDP, it's not hard to imagine getting to a 3.4 GHz boost with the top-tier 64C Zen 2 Epyc CPUs.

If you look back at the first Ryzen chips we went from 8C/16T at 3.6 GHz in 95W TDP to 16C/32T at 3.5 GHz (and a much, much higher boost) in 105W when comparing the 1800X to the 3950X.
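
A quick sketch of the arithmetic behind that comparison, using only the TDP and base-clock figures quoted in this post. TDP is a rated figure, not measured power draw, so this is a rough indicator at best. The 8-core count for the 3700X is not stated in the post and is filled in here, and the helper names are hypothetical.

```python
def core_ghz_per_watt(cores, base_ghz, tdp_w):
    """Crude throughput-per-watt proxy: cores x base clock / rated TDP."""
    return cores * base_ghz / tdp_w

def watts_per_core(tdp_w, cores):
    """Rated TDP divided evenly across cores (ignores IO die and uncore)."""
    return tdp_w / cores

# Figures quoted in the post.
r1800x = core_ghz_per_watt(cores=8, base_ghz=3.6, tdp_w=95)
r3950x = core_ghz_per_watt(cores=16, base_ghz=3.5, tdp_w=105)
print(f"1800X: {r1800x:.3f}  3950X: {r3950x:.3f}  ratio: {r3950x / r1800x:.2f}")

# Per-core budget: 3700X (assumed 8 cores) vs. the rumored 64C/225W part.
print(f"3700X: {watts_per_core(65, 8):.2f} W/core")
print(f"64C Epyc: {watts_per_core(225, 64):.2f} W/core")
```

By this crude measure the 3950X offers roughly 1.76 times the core-GHz per rated watt of the 1800X. The rumored 64C part would need each core to fit in well under half the 3700X's per-core budget, before subtracting anything for the IO die, which is why "possible but not a sure bet" seems a fair reading.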

I don't think I would bet even money on that outcome, but I wouldn't write it off as impossible either.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,111
136
Speculation is that every CCX-to-CCX access goes via the IO die, even if both CCXs are on the same chiplet. On the plus side, this makes latency and performance consistent and predictable regardless of how many chiplets you have. It avoids making single-chiplet chips better at gaming, for example. But the real reason for this would be predictable server performance.
The main reason would be cache coherency.
 