Speculation: Ryzen 4000 series/Zen 3

Page 37

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
There’s loads of potential for further increases, and that’s without radical redesigns needed.

Just some basic stuff off the top of my head

- More execution units (Doesn’t have to be ALU, can be AGU, LEA, FPU, etc...)
- Larger Caches
- Increased ROB and Memory, Scheduler Buffers
- More ports to dispatch instructions to execution units and reduce back-end bottlenecks
Makes sense. In a post I wrote about 2 months ago, I thought that the number of micro-ops that could be dispatched was limiting the overall throughput of the core:
http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/speculation-ryzen-4000-series-zen-3.2567589/post-39896162

Not to say that the 6 ops/cycle dispatch & 8 ops/cycle retire was the main bottleneck, but given how wide the core is and how many micro-ops the micro-op cache could deliver (8 ops per cycle), AMD could afford to increase the dispatch rate to match. That alone, in theory, gives a maximum of 33% more instructions in a given cycle assuming nothing else is a bottleneck. Everything else is just making sure there's enough load/store resources and enlarging buffers to keep up.
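As a rough sanity check on the 33% figure, here is a minimal Python sketch. It assumes dispatch width is the only limit, as stipulated above; the function name `dispatch_headroom` is made up for illustration.

```python
def dispatch_headroom(current_width: int, new_width: int) -> float:
    """Fractional increase in peak ops per cycle when dispatch is widened.

    Assumes nothing else in the core is a bottleneck, so this is an
    upper bound rather than an expected real-world gain.
    """
    return (new_width - current_width) / current_width


# 6 ops/cycle dispatch today vs. 8 ops/cycle to match the micro-op cache
gain = dispatch_headroom(6, 8)
print(f"Peak dispatch gain: {gain:.0%}")
```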
 
Reactions: Olikan

Richie Rich

Senior member
Jul 28, 2019
470
229
76
4 ALUs is quite good already and got Zen the 40%+ IPC gain.

The potential single-threaded gains from going from 4 ALUs to 5 (or 6) are going to be much smaller. But here, even a ~5% IPC increase is going to count a lot. And for (SMT2) multithread IPC gains, it's bound to be double digits.
Why does the Apple A12 use 6 ALUs in a power-sensitive mobile CPU? Why would they let those ALUs sit idle?
Why does the Apple A12 have +58% higher IPC than Skylake-X in INT? Isn't it because Apple found a way to feed them efficiently?

It looks like those 6 ALUs are not idling much. They are delivering pure performance.

With SMT2 it must be even easier to utilize all 6 ALUs. It looks like the lowest-hanging fruit available.
It's like tuning a Corolla 4-cylinder engine for every last bit of horsepower. It's far cheaper and easier to buy a car with a V6 engine.
 
Reactions: amd6502

itsmydamnation

Platinum Member
Feb 6, 2011
2,864
3,418
136
Why does the Apple A12 use 6 ALUs in a power-sensitive mobile CPU? Why would they let those ALUs sit idle?
Why does the Apple A12 have +58% higher IPC than Skylake-X in INT? Isn't it because Apple found a way to feed them efficiently?

It looks like those 6 ALUs are not idling much. They are delivering pure performance.

With SMT2 it must be even easier to utilize all 6 ALUs. It looks like the lowest-hanging fruit available.
It's like tuning a Corolla 4-cylinder engine for every last bit of horsepower. It's far cheaper and easier to buy a car with a V6 engine.
Correlation != causation.
Back up your constant posting with some actual data, otherwise it's nothing but https://bit.ly/2Bc1eB8

Also, if Skylake has an average SPEC int IPC of 1.5 (that includes memory ops) and the A13 is 50% faster (so 2.25), how the hell are 8 pipelines (6 ALU, 2 load/store) "not idling much"? Like, actually answer a question for once!
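The arithmetic behind that question, as a quick Python sketch. It treats IPC as instructions per cycle with each instruction occupying one pipeline for one cycle, which is a simplification (it ignores any micro-op expansion); the function name is hypothetical.

```python
def average_pipe_utilization(ipc: float, pipelines: int) -> float:
    """Average fraction of execution pipelines busy per cycle.

    Simplified model: one instruction keeps one pipeline busy for one cycle.
    """
    return ipc / pipelines


skylake_ipc = 1.5            # figure quoted in the post
a13_ipc = skylake_ipc * 1.5  # "50% faster" than Skylake
a13_pipes = 6 + 2            # 6 ALU + 2 load/store, as counted in the post

utilization = average_pipe_utilization(a13_ipc, a13_pipes)
print(f"A13 IPC: {a13_ipc:.2f}, average pipe utilization: {utilization:.0%}")
```

Under this model the pipelines are busy well under half the time on average, which is the point of the question.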
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Correlation != causation.
Back up your constant posting with some actual data, otherwise it's nothing but https://bit.ly/2Bc1eB8

Also, if Skylake has an average SPEC int IPC of 1.5 (that includes memory ops) and the A13 is 50% faster (so 2.25), how the hell are 8 pipelines (6 ALU, 2 load/store) "not idling much"? Like, actually answer a question for once!
To be clear: you're talking about an IPC of 1.5 for CISC code, right? And do you know that the ALUs execute RISC-like instructions internally? It sounds like you're mixing two different things together. Jim Keller said that a current Intel CPU executes 3-6 instructions per clock. Is he wrong, and why?

Anyway, if the Apple A12 with 6 ALUs (50% more than Skylake's 4) is about 58% faster in SPEC2006 int, that means Skylake's ALUs idle more than Apple's. That's pretty impressive to me. That leads me to the idea that a 6-ALU core is a very efficient way to increase IPC. I'm not saying it's easy to design such a state-of-the-art core. Engineering was always a harder job than selling burgers at McDonald's.
 

DrMrLordX

Lifer
Apr 27, 2000
21,805
11,161
136
What I'm curious about is what else is there to do that would also give another 15% IPC on top of the 15% that Zen 2 brought. Larger registers? Another L/D unit? More ALUs?

We already know that there will be a shared L3 cache between CCX pairs in Zen3. That has all kinds of interesting performance implications. There may also be some performance improvements for the TAGE branch predictor that was launched a bit early with Zen2 (thanks AMD!). And um, some other stuff.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Why all the intense hate for Richie Rich posts?

All he's saying is that 6 ALUs will be faster than 4 ALUs IF the rest of the support structures (cache, decode, retire, etc.) are in place. Not that 6 is always better than 4 irrespective of anything else.

Those transistors don't come for free.

If you have 4x 4ALU cores vs. 3x 6ALU cores - which is quicker? So which is the best use of budget?
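One way to frame that trade-off is a break-even calculation. The Python sketch below assumes a perfectly parallel workload, equal clocks, and that both layouts fit the same transistor budget; none of that is established in this thread, and `breakeven_speedup` is a made-up name.

```python
def breakeven_speedup(narrow_cores: int, wide_cores: int) -> float:
    """Per-core throughput ratio the wide cores need to match the narrow
    cores in total throughput, assuming perfect scaling across cores."""
    return narrow_cores / wide_cores


# 4 cores x 4 ALUs versus 3 cores x 6 ALUs
total_alus_narrow = 4 * 4
total_alus_wide = 3 * 6
needed = breakeven_speedup(4, 3)

print(f"ALUs: {total_alus_narrow} vs {total_alus_wide}")
print(f"Each 6-ALU core must be {needed:.2f}x a 4-ALU core to break even")
```

Under these assumptions each 6-ALU core would need roughly a third more throughput than a 4-ALU core just to tie in fully threaded work.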
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Those transistors don't come for free.

If you have 4x 4ALU cores vs. 3x 6ALU cores - which is quicker? So which is the best use of budget?
Of course they don't, but what is the path after you've optimized for the 4 ALU core? There comes a point when you will have to increase them. Are we there yet?

Intel, for one, is obviously paying the price for their lack of progress in transistor budgets and the resultant stagnation in IPC.

I've often mentioned the Soft Machines work, as I'm interested in its possible use of variable sized virtual cores assembled from simpler structures on the fly.
 
Reactions: amd6502

soresu

Platinum Member
Dec 19, 2014
2,966
2,188
136
We already know that there will be a shared L3 cache between CCX pairs in Zen3.
The AMD slide displaying Zen3's unified L3 literally defines a CCX as a complex of cores bound by a shared L3; this effectively makes the new CCD a single CCX.

This will probably count only for the desktop/enthusiast/server parts though.

We may get a multi chiplet APU with a full 8 core CCD in the future, but that will be early 2021 at the earliest.
 

soresu

Platinum Member
Dec 19, 2014
2,966
2,188
136
I've often mentioned the Soft Machines work, as I'm interested in its possible use of variable sized virtual cores assembled from simpler structures on the fly.
I was very interested at the time, but I feel that if it was quite as good as they were making it out to be, they would not have sold out to Intel so easily.

At the very least they would have pitched to Qualcomm, Samsung or some Chinese venture capital interests if it was all that.

Their quick capitulation felt like they had something, but with more of a "look, see what we have; come buy us, former employers, or we will go elsewhere!" vibe.
 

DrMrLordX

Lifer
Apr 27, 2000
21,805
11,161
136
The AMD slide displaying Zen3's unified L3 literally defines a CCX as a complex of cores bound by a shared L3; this effectively makes the new CCD a single CCX.

It doesn't if there's an IF link between cores 1-4 and cores 5-8 just like in Zen2.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Of course they don't, but what is the path after you've optimized for the 4 ALU core? There comes a point when you will have to increase them. Are we there yet?

AMD tried to change the programming model before.

Remind me how Bulldozer worked out again?
 

Guru

Senior member
May 5, 2017
830
361
106
The usual questions spring to mind:
  1. Will it support currently existing motherboards (300/400/500 series chipsets)?
  2. What kind of IPC increase are we talking about?
  3. Will AMD manage to squeeze more frequencies?
  4. What node will it use?
  5. What will be its TDP?
  6. Will it support AVX512 instructions?
  7. When and if we can expect Ryzen 4000 CPUs with modern onboard graphics (e.g. Navi10/Navi20)?
We pretty much know most of these.
1. AMD said they are going to support the same chipset until 2020, so yeah, even Zen 3 should be compatible with current chipsets and mobos.
2. This is up in the air, but they won't be doing a Zen 3 design if they can't get at least a 5% IPC gain over Zen 2.
3. Of course. TSMC has said that their 7nm+ is much better than their 7nm; in fact it's even better than their 6nm, though it is also more expensive for now. So yeah, I do expect some small frequency increases, probably 100 MHz on the lower to mid end and up to 200 MHz on the higher end.
4. It's going to be 7nm+.
5. We know 7nm+ can be up to 20% more power efficient or up to 15% faster, so if AMD decides to spend the 7nm+ benefits on reduced TDP, expect about 20% lower TDP, assuming everything else stays the same. But if they also add frequency and IPC, the reduction might be smaller.
6. Probably.
7. The earliest Ryzen 4000 CPUs are at least 9 months away, and their G-series CPUs come a few months after that, so at least a year.
 

soresu

Platinum Member
Dec 19, 2014
2,966
2,188
136
It doesn't if there's an IF link between cores 1-4 and cores 5-8 just like in Zen2.
Be careful, it discussed both Rome AND Milan on those slides.

Not sure where you get that from, I don't see it on the slide with the diagram.
 

soresu

Platinum Member
Dec 19, 2014
2,966
2,188
136
5. We know 7nm+ can be up to 20% more power efficient or up to 15% faster, so if AMD decides to spend the 7nm+ benefits on reduced TDP, expect about 20% lower TDP, assuming everything else stays the same. But if they also add frequency and IPC, the reduction might be smaller.
No, it's a 20% area decrease, 15% more power efficient at iso design/clock, or 10% more clock at iso power.
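Plugging those figures into a hypothetical part gives a feel for the scale. In this Python sketch the 105 W / 4.0 GHz baseline is invented for illustration, and "15% more power efficient" is read as 15% lower power at the same clock, which is only one possible interpretation.

```python
BASE_TDP_W = 105.0      # hypothetical baseline, not a real SKU
BASE_CLOCK_GHZ = 4.0    # hypothetical baseline, not a real SKU


def iso_clock_power(base_power_w: float, power_saving: float = 0.15) -> float:
    """Power at the same design and clock, given a fractional power saving."""
    return base_power_w * (1.0 - power_saving)


def iso_power_clock(base_clock_ghz: float, clock_gain: float = 0.10) -> float:
    """Clock at the same power, given a fractional clock gain."""
    return base_clock_ghz * (1.0 + clock_gain)


new_power = iso_clock_power(BASE_TDP_W)
new_clock = iso_power_clock(BASE_CLOCK_GHZ)
print(f"Same clock: {new_power:.2f} W; same power: {new_clock:.2f} GHz")
```

The two options are alternatives, not additive: the node gives one or the other at a given design point.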
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
It doesn't if there's an IF link between cores 1-4 and cores 5-8 just like in Zen2.
Or, the new structure could give the same cores 2x bandwidth. Both IF links drop into the same L3 cache. If the workload on the CCD is only quad-core, in Zen2 only 16 megabytes and 1/2 of the total CCD GB/s could be used, while with Zen3 the four cores would have 32+ megabytes and could saturate the full CCD bandwidth.
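A toy Python model of the cache-capacity part of that argument, assuming a 32 MB CCD split evenly into L3 slices (two in Zen2, one in the speculated Zen3 layout). The function name is hypothetical, and bandwidth is not modeled.

```python
def l3_reachable_mb(total_l3_mb: int, l3_slices_per_ccd: int) -> int:
    """L3 capacity a single core can reach, assuming the CCD's L3 is split
    evenly into independent slices and a core only uses its own slice."""
    return total_l3_mb // l3_slices_per_ccd


CCD_L3_MB = 32  # 2 x 16 MB in Zen2; assumed unchanged in total for Zen3

zen2_mb = l3_reachable_mb(CCD_L3_MB, 2)  # two CCXs, each with its own L3
zen3_mb = l3_reachable_mb(CCD_L3_MB, 1)  # speculated unified L3

print(f"Quad-core workload sees {zen2_mb} MB on Zen2, {zen3_mb} MB on Zen3")
```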
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Why does the Apple A12 use 6 ALUs in a power-sensitive mobile CPU? Why would they let those ALUs sit idle?
Why does the Apple A12 have +58% higher IPC than Skylake-X in INT? Isn't it because Apple found a way to feed them efficiently?

It looks like those 6 ALUs are not idling much. They are delivering pure performance.

Perhaps ARM CPUs are more amenable to extreme parallelism than x86 CPUs?

With SMT2 it must be even easier to utilize all 6 ALUs. It looks like the lowest-hanging fruit available.
It's like tuning a Corolla 4-cylinder engine for every last bit of horsepower. It's far cheaper and easier to buy a car with a V6 engine.

I'm not sure if you've noticed, but car manufacturers have been squeezing more performance out of 4-cylinder engines and avoiding 6-cylinder engines wherever possible. These days to get a V6 you pretty much have to buy a truck or a specialty car.
 

soresu

Platinum Member
Dec 19, 2014
2,966
2,188
136
I'm not sure if you've noticed, but car manufacturers have been squeezing more performance out of 4-cylinder engines and avoiding 6-cylinder engines wherever possible. These days to get a V6 you pretty much have to buy a truck or a specialty car.

Oh gawd no, don't encourage people to use car engine metaphors!
 

DrMrLordX

Lifer
Apr 27, 2000
21,805
11,161
136
Be careful, it discussed both Rome AND Milan on those slides.

Not sure where you get that from, I don't see it on the slide with the diagram.

I saw the slides. It doesn't show how the cores on each side of the massive L3 are linked at all. It shows four cores on the left and four on the right, with the L3 in between. And . . . that's it! No link diagram, no topology, no nothing. It does make it very clear that the cores are separated into two blocks just like in Rome. Just the L3 is different.
 