Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 176

H433x0n

Golden Member
Mar 15, 2023
1,068
1,273
96
Yeah, something doesn't add up if it only achieves 10% ST improvement. If so, it would be the LOWEST single threaded uplift of a Zen generation (besides Zen to Zen+) to date, which I find hard to believe considering how many changes are under the hood.

I still think that RGT doesn't know jack and is just re-aligning w/ MLID's slides after the fact.
I don’t find it hard to believe, especially after reading that ChipsAndCheese post coupled with ARL-S perf projections.

There's a Russian saying "9 women can't make a baby in 1 month". It could be that we're at the point of diminishing returns without a paradigm shift. We may need to add some new ISA extensions, update how x86 is compiled and ditch concepts like SMT to take the next step.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
And this is based on what?

You can just add up the silicon die area of the 2 parts + extra memory that will get poor utilization + board cost + cooling + extra assembly cost.

With the amount of LPDDR5x it will likely have, I have to wonder if the price will be lower than a comparably performing laptop with a 4060/4070.

Shared LPDDR5 vs (LP)DDR5 + GDDR. The shared approach will win on cost; we will see about performance, especially how much the MALL cache helps.

I don't think AMD will make that many of them, so I don't expect many laptop models or good availability.

That has always been a problem. AMD builds the chips and expects them to sell themselves, and only a few do...

Nvidia still has the mindshare, and there will still be people overpaying for an Nvidia product to get the same performance, but AMD (and Intel) have to start chipping away somewhere...
 
Reactions: Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
There's a Russian saying "9 women can't make a baby in 1 month". It could be that we're at the point of diminishing returns without a paradigm shift. We may need to add some new ISA extensions, update how x86 is compiled and ditch concepts like SMT to take the next step.

Forget making babies; you can just pull rabbits out of a hat, as MLID thinks Intel can.

MLID thinks Intel can pull up to +20% IPC for MTL and ~+40% IPC for Arrow Lake out of a hat, using cores that are not drastically changed.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
So according to RedGamingTech now, the 8950X scores only 2400 points in CB24. My 7950X scored 2120, and I saw even a 2185 score somewhere... so not that great. Quite a far cry from the previously leaked CB23 score, which should have been 49k. Pretty much half of that.
Granted, there are other workloads where it may slap Zen4.
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose there is a scenario where all the engineers on the Zen5 development team fell down at once, suffered TBIs, and had to stop development before it was finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,454
136
OMG! Yeah, AMD added all those resources to Zen5 for a measly 10% bump in 1T performance. Nutz! I suppose there is a scenario where all the engineers on the Zen5 development team fell down at once, suffered TBIs, and had to stop development before it was finished. Somehow, I think that is unlikely. The only other thing I could see is that TSMC's N3E node is stroking out.

Maybe the leakers got it wrong and the core bloat is due to them going crazy with AI?

(I'm guessing "They only care about AI" is going to be a popular complaint)
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
Frankly, the Halo SKU doesn't make any sense, and it wouldn't be surprising if AMD scrapped it.

Charging a premium for an AMD iGPU product catering to a small niche doesn't sound viable in the current economic situation.

BTW, this is the entire Intel business plan with Meteor Lake and onward: charge more for the Intel iGPU.

Charging more to justify the higher cost of manufacturing a disaggregated CPU and a larger GPU tile manufactured by TSMC.
 

H433x0n

Golden Member
Mar 15, 2023
1,068
1,273
96
Forget making babies; you can just pull rabbits out of a hat, as MLID thinks Intel can.

MLID thinks Intel can pull up to +20% IPC for MTL and ~+40% IPC for Arrow Lake out of a hat, using cores that are not drastically changed.
How did this become about MLID and Intel?

It seems there’s not a lot of low hanging fruit when going wider if ChipsAndCheese’s recent post is to be believed.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
How did this become about MLID and Intel?

It seems there’s not a lot of low hanging fruit when going wider if ChipsAndCheese’s recent post is to be believed.
Yeah, wide is the way every non-x86 CPU has gone, with good results. There's some roadblock with x86 due to variable-length instructions that I've never totally understood, because every time it comes up the interviewee's explanation is "it's complicated" and then they refuse to elaborate.

Well, whatever it is, the lead for Zen has publicly stated that it's being worked on at AMD, and I think it's been implied/leaked that whatever solution they've come up with is due in Zen 5. So it'll be interesting to see if it turns up.
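For what it's worth, the usual explanation can be sketched as toy C code (a made-up toy ISA for illustration, not real x86 decoding): with a variable-length encoding, each instruction boundary is only known after the previous instruction's length has been worked out, so finding N starting points is a serial chain, while a fixed 4-byte encoding lets every boundary be computed independently and decoded in parallel.

```c
/* Toy illustration (not real x86): why variable-length encodings resist wide decode.
 * Build: cc -std=c99 -O2 decode_sketch.c
 */
#include <stddef.h>
#include <stdio.h>

/* Made-up toy ISA: the low 3 bits of the first byte give the length (1..8 bytes).
 * Real x86 length determination involves prefixes, opcode maps, ModRM/SIB,
 * displacements and immediates, which is what makes it expensive in hardware. */
static size_t insn_length(const unsigned char *p) {
    return (size_t)((p[0] & 0x07u) + 1);
}

/* Variable length: boundary i+1 is unknown until instruction i's length is decoded,
 * so this loop is an inherently serial chain. Wide x86 decoders work around it by
 * speculatively decoding at many byte offsets (or caching already-decoded uops)
 * and discarding the wrong guesses, which costs area and power. */
static size_t find_starts_variable(const unsigned char *code, size_t n,
                                   size_t *starts, size_t max_starts) {
    size_t off = 0, count = 0;
    while (off < n && count < max_starts) {
        starts[count++] = off;
        off += insn_length(code + off);   /* depends on the previous iteration */
    }
    return count;
}

/* Fixed 4-byte encoding (typical RISC): every boundary is known up front, so a
 * decoder can pick up many instructions per cycle with no guessing about starts. */
static size_t find_starts_fixed(size_t n, size_t *starts, size_t max_starts) {
    size_t count = 0;
    for (size_t off = 0; off + 4 <= n && count < max_starts; off += 4)
        starts[count++] = off;
    return count;
}

int main(void) {
    unsigned char code[32] = { 2, 0, 0, 5, 0, 0, 0, 0, 0, 1 };  /* rest zero */
    size_t starts[32];
    printf("variable-length: %zu instructions, found one after another\n",
           find_starts_variable(code, sizeof code, starts, 32));
    printf("fixed 4-byte:    %zu instructions, all boundaries known up front\n",
           find_starts_fixed(sizeof code, starts, 32));
    return 0;
}
```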
 

SiliconFly

Golden Member
Mar 10, 2023
1,218
631
96
To reach identical performance, an integrated CPU + iGPU will always have a lower cost than having 2 components doing the same job, with the additional benefits of better power efficiency and smaller size and weight.

The advantage of having a dGPU has always been in delivering greater performance, beyond what an iGPU can offer. Not the cost.
Not really. For example, if we put an RTX 4070 into a CPU SoC package or onto a monolithic die (with the CPU) that shares DDR5 memory, it's not going to perform well due to the GPU-to-main-memory bandwidth bottleneck and also the bottleneck from sharing main memory with the CPU.

Also, the cost of the RTX 4070 silicon remains the same irrespective of whether it's an iGPU in a CPU SoC package or a separate dGPU graphics card. The only place where there are cost savings is GDDR6 graphics memory, as we typically won't be using it with iGPUs.

To summarize, for identical performance, the iGPU needs to be more powerful (and hence more expensive) than a dGPU. A dedicated RTX 4060 Ti will comfortably beat an integrated RTX 4070.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Maybe the leakers got it wrong and the core bloat is due to them going crazy with AI?

(I'm guessing "They only care about AI" is going to be a popular complaint)
I think the most likely problem with the leaked benches is one of two things:

1. The leaks are stale, from older ES silicon.
2. CB is an 'APP' (lol) that isn't well optimized for Zen5, or isn't a good target for showing Zen5's strengths.

ChipsAndCheese made this observation:
Any performance numbers should be taken with a giant grain of salt too. It’s better to assume they are all guesses at this point. Even if a leaker has a “source”, estimating performance is inherently difficult because different applications will behave differently. An engineer might see a 30% IPC instruction uplift in simulation with a specific instruction trace, but that doesn’t mean other applications will enjoy the same improvement.
IPC is always subject to the workload being run. It could well be anywhere between 10% and 30% depending on which application is being executed. Honestly, if the average 1T uplift doesn't hit +20% over Zen4 (measured the same way as Zen4), then I would be a bit disappointed.
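A minimal sketch of that workload dependence, with made-up uplift ratios (purely illustrative, not leaked or measured numbers): individual applications can land anywhere in that 10-30% band while the headline figure, usually a geometric mean over a suite, comes out well below the best case.

```c
/* Aggregating hypothetical per-workload IPC uplifts into one headline number.
 * The ratios are invented for illustration; suites are typically summarized with a
 * geometric mean, which is why a few weak benchmarks drag the average down.
 * Build: cc geomean.c -lm
 */
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical Zen5-vs-Zen4 ratios for five different applications */
    const double uplift[] = { 1.30, 1.08, 1.05, 1.22, 1.10 };
    const int n = (int)(sizeof uplift / sizeof uplift[0]);

    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(uplift[i]);

    double geomean = exp(log_sum / n);
    printf("per-app range: +5%% .. +30%%, aggregate: %+.1f%%\n",
           (geomean - 1.0) * 100.0);   /* ~ +14.6% with these made-up ratios */
    return 0;
}
```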
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,741
14,773
136
I think the most likely problem with the leaked benches is one of two things:

1. The leaks are stale, from older ES silicon.
2. CB is an 'APP' (lol) that isn't well optimized for Zen5, or isn't a good target for showing Zen5's strengths.

ChipsAndCheese made this observation:

IPC is always subject to the workload being run. It could well be anywhere between 10% and 30% depending on which application is being executed. Honestly, if the average 1T uplift doesn't hit +20% over Zen4 (measured the same way as Zen4), then I would be a bit disappointed.
Perfect example. Upcoming PrimeGrid race in 2 weeks. Zen 4 is like 60% faster (5950X -> 7950X) due to AVX-512. The same type of thing could happen going from Zen 4 to Zen 5. And nobody benchmarks Zen 4 against the 13900KS in AVX-512, since it does not support it!
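For anyone wondering why an AVX-512 path can be worth that much in this kind of throughput workload, here is a minimal width comparison in C (nothing to do with PrimeGrid's actual code, just an illustration): one AVX-512 instruction processes 8 doubles where AVX2 handles 4, so a compute-bound loop gets through the same work in roughly half the instructions when the core runs the 512-bit path at full rate.

```c
/* Minimal width comparison: AVX2 (256-bit, 4 doubles/op) vs AVX-512 (512-bit, 8 doubles/op).
 * Not PrimeGrid code; just illustrates why AVX-512-heavy workloads scale so well on Zen4.
 * Build: cc -O2 -mavx2 -mavx512f avx_width.c
 */
#include <immintrin.h>
#include <stddef.h>
#include <stdio.h>

/* Sum n doubles, 4 at a time (n assumed to be a multiple of 4 for brevity). */
static double sum_avx2(const double *a, size_t n) {
    __m256d acc = _mm256_setzero_pd();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(a + i));
    double lane[4];
    _mm256_storeu_pd(lane, acc);
    return lane[0] + lane[1] + lane[2] + lane[3];
}

/* Same work, 8 at a time (n assumed to be a multiple of 8 for brevity). */
static double sum_avx512(const double *a, size_t n) {
    __m512d acc = _mm512_setzero_pd();
    for (size_t i = 0; i < n; i += 8)
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(a + i));
    return _mm512_reduce_add_pd(acc);   /* AVX-512 horizontal-sum convenience */
}

int main(void) {
    static double data[1024];
    for (size_t i = 0; i < 1024; i++) data[i] = 1.0;
    printf("avx2: %.0f, avx512: %.0f (same result, half the vector instructions)\n",
           sum_avx2(data, 1024), sum_avx512(data, 1024));
    return 0;
}
```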
 

Goop_reformed

Senior member
Sep 23, 2023
248
307
96
I don’t find it hard to believe, especially after reading that ChipsAndCheese post coupled with ARL-S perf projections.

There's a Russian saying "9 women can't make a baby in 1 month". It could be that we're at the point of diminishing returns without a paradigm shift. We may need to add some new ISA extensions, update how x86 is compiled and ditch concepts like SMT to take the next step.
So you are talking about "convergence of evidence", but in this case there has been a variety of info coming out of the same (!!) leakers. There's no paradigm shift at the moment, because the AI boom only recently started, whereas Zen 5 and Arrow Lake have been in development for much longer.

What still puzzles me is how many resources both companies have put into their respective uarchs, and yet the results are underwhelming, according to the leaks at least. Makes absolutely no sense.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,331
2,942
106
Not really. For example, if we put an RTX 4070 into a CPU SoC package or onto a monolithic die (with the CPU) that shares DDR5 memory, it's not going to perform well due to the GPU-to-main-memory bandwidth bottleneck and also the bottleneck from sharing main memory with the CPU.

Strix Halo will have additional memory channels of LPDDR5x.

As for the "bottleneck due to sharing", it is actually the opposite. Instead of sending data from CPU-attached DDR5 through the PCIe bus to GDDR (burning bandwidth on both sides), the data doesn't go anywhere; it stays in LPDDR5. At most there is a memory copy, or a pointer is simply redirected to a new address.

Also, the cost of the RTX 4070 silicon remains the same irrespective of whether it's an iGPU in a CPU SoC package or a separate dGPU graphics card. The only place where there are cost savings is GDDR6 graphics memory, as we typically won't be using it with iGPUs.

It is not the same. All of the I/O is shared within one silicon die in Strix Point, including memory controllers, PHYs, PCIe, and display I/O.

Also, the small iGPU that a regular CPU has disappears (is replaced).

So 2 copies of all of these go down to 1.

To summarize, for identical performance, the iGPU needs to be more powerful (and hence more expensive) than a dGPU. A dedicated RTX 4060 Ti will comfortably beat an integrated RTX 4070.

I don't think you have shown any of this. Your contention about bandwidth sharing actually works the opposite way from what you think, by eliminating data copying through the PCIe bus.

The CPU is typically not bandwidth starved, and especially not at the same time the GPU is. If they both are, it is because the CPU is sending data to the GPU, and shared memory will reduce that starvation on both sides.

Now let's look at the bandwidth:

The 4060 Ti has apparently 288 GB/s.
Strix Halo will have 4 channels of ~68 GB/s = ~272 GB/s.

So it is a wash. If Strix Halo is aiming for approximately 4060 Ti performance, it will not be hampered by lack of bandwidth any more than the 4060 Ti is. So there is no reason for its GPU compute to be any more powerful than that of the 4060 Ti (in order to achieve the same performance), and there will instead be die area savings from removing the duplication of analog and I/O.
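Writing that arithmetic out as a quick sketch (the 256-bit bus and LPDDR5X-8533 speed for Strix Halo are rumored/assumed figures, not confirmed specs; the 4060 Ti number is its published spec):

```c
/* Back-of-envelope memory bandwidth for the comparison above.
 * Strix Halo figures (8533 MT/s LPDDR5X on a 256-bit bus, i.e. 4 x 64-bit channels)
 * are rumors/assumptions; the 4060 Ti (18 Gbps GDDR6 on a 128-bit bus) is published. */
#include <stdio.h>

static double bandwidth_gbs(double transfer_rate_mts, double bus_width_bits) {
    /* MT/s * bytes-per-transfer across the bus -> GB/s */
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8.0) / 1e9;
}

int main(void) {
    printf("Strix Halo (assumed 4x64-bit LPDDR5X-8533): %.0f GB/s\n",
           bandwidth_gbs(8533.0, 256.0));   /* ~273 GB/s, i.e. ~68 GB/s per channel */
    printf("RTX 4060 Ti (128-bit GDDR6 @ 18 Gbps/pin):  %.0f GB/s\n",
           bandwidth_gbs(18000.0, 128.0));  /* 288 GB/s */
    return 0;
}
```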
 
Last edited:

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
The trouble I am having with the CnC article is this:
They are basically looking at each possible point of improvement in isolation and stating that it is not worth it.
But that is not how it works. You start by widening your biggest bottleneck. Then a new bottleneck arises, which you widen as well. You iterate on that until your transistor or power budget is exhausted. Of course the whole thing is much more sophisticated than just described, because there are so many ways to approach this.
The point is: looking at single stages of a pipelined system in isolation does not make sense.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Unless you consider good results to be clock speed and latency.

You're right, screw multi core, Pentium 5 should've happened and we've lost the one true way of the gigahertz.

Goals should never include performance per watt or per mm, clockspeed is the only god we bow to and dare I say it, we should be aiming for

One BIIILLLIIOON gigahertz!!!
 

coercitiv

Diamond Member
Jan 24, 2014
6,395
12,827
136
But that is not how it works. You start by widening your biggest bottleneck. Then a new bottleneck arises, which you widen as well. You iterate on that until your transistor or power budget is exhausted. Of course the whole thing is much more sophisticated than just described, because there are so many ways to approach this.
My understanding is that widening part of the core at the cost of efficiency is also part of the price they pay to set up further design evolution; otherwise they'd be stuck rearranging and optimizing the same pool of transistors. Obviously they try to get as much right as possible in the first iteration, but it's still a plan for the future too.

Like in many human activities, making only the most efficient moves isn't necessarily a winning long-term strategy.
 

yuri69

Senior member
Jul 16, 2013
437
717
136
BTW, this is the entire Intel business plan with Meteor Lake and onward: charge more for the Intel iGPU.

Charging more to justify the higher cost of manufacturing a disaggregated CPU and a larger GPU tile manufactured by TSMC.
Intel is in a completely different market position compared to AMD. Intel has no problem pushing its stuff onto OEMs (Ultrabooks?). IIRC, AMD resorts to obscure exclusives.
 