Discussion AMD cools the pace to Moore's Law Death

FlameTail · Friday at 9:26 AM

moinmoin said:
AMD usually uses SpecINT2017 as the base for its IPC claims.

Would a new IOD improve SPEC2017 scores?

GTracing · Friday at 9:46 AM

FlameTail said:
Would a new IOD improve SPEC2017 scores?

There are a few ways that a new I/O die could increase performance.

Lower memory latency
Higher ram bandwidth
Higher bandwidth from the I/O die to the CCD
Lower latency between CCDs

I might be missing some, but they all basically come down to improved bandwidth and latency for the L3 cache or ram.

With that in mind, the question becomes does SPEC benefit from better L3 and RAM? If you look at chips and cheese's 9800X3D vs 9950X benchmarks, I would say yes.

AMD's 9800X3D: 2nd Generation V-Cache

Following the first generation of V-Cache found in the Zen 3 and Zen 4 X3D SKUs, AMD is now following up with the second generation of V-Cache which is a major change for AMD in terms of packaging.

chipsandcheese.com

inquiss · Friday at 11:54 AM

FlameTail said:
• New IOD
• Frequency Increase

I have a question: Doesn't the performance uplift brought by the new IOD fall into the category of "IPC" ?

Interesting, but I don't think so, at least going by the IPC figure suggested for Zen 5, where it seems they stated the upcoming IPC based on what the core was capable of without the limitations of the IOD. That worked against them for Zen 5; maybe it will work for them with Zen 6?

yuri69 · Friday at 12:00 PM

OneEng2 said:
Also, while I think that modifying the IOD of Zen 5 might well unlock more performance for AMD with minimal effort, I wonder if the Zen 5 front end needs to be modified to really take advantage of additional bandwidth?

The main counter point to a new IOD is AMD's history of the cheapest possible solutions. They have done "the XT" twice, why not with Zen 5 too?

gdansk · Friday at 12:08 PM

yuri69 said:
The main counter point to a new IOD is AMD's history of the cheapest possible solutions. They have done "the XT" twice, why not with Zen 5 too?

There is something planned for next year that isn't the cheapest possible solution. But no one will want it and I'm not convinced it'll help much.

StefanR5R · Friday at 12:12 PM

yuri69 said:
The main counter point to a new IOD is AMD's history of the cheapest possible solutions.

They could bring "wide GMI" from sIOD over to cIOD; doesn't seem costly. But this would only improve memory bandwidth, not memory latency, while the latter matters more to the client segment.

moinmoin · Friday at 1:03 PM

FlameTail said:
Would a new IOD improve SPEC2017 scores?

As @GTracing already indicated, the score already differs between different Zen 5 products. Epyc Turin famously achieves ~40%, while X3D-less client chips don't even manage 10%, to the disappointment of pretty much everybody.

I expect Zen 6 to remove all the bottlenecks and then some, creating the uncore that is then reused again with Zen 7 which then is bottlenecked by that again. That's the repeating cycle AMD chose to go through with its Zen gens.

----

Regarding the thread's topic, I still think AMD's biggest issue is not the kind of cadence but the fact it's slowing (@yuri69 pointed it out before in this thread).

gdansk · Friday at 1:21 PM

moinmoin said:
As @GTracing already indicated, the score already differs between different Zen 5 products. Epyc Turin famously achieves ~40%, while X3D-less client chips don't even manage 10%, to the disappointment of pretty much everybody.

But a very large portion of that difference is because client Zen 5 runs up against frequency walls while Turin doesn't. How is it going to fix that?

Edit: Even with finflex and N3 I don't expect more than 6050MHz.

Doug S · Friday at 3:26 PM

GTracing said:
There are a few ways that a new I/O die could increase performance.

Lower memory latency

Higher ram bandwidth

Higher bandwidth from the I/O die to the CCD

Lower latency between CCDs

I might be missing some, but they all basically come down to improved bandwidth and latency for the L3 cache or ram.

With that in mind, the question becomes does SPEC benefit from better L3 and RAM? If you look at chips and cheese's 9800X3D vs 9950X benchmarks, I would say yes.

AMD's 9800X3D: 2nd Generation V-Cache

Following the first generation of V-Cache found in the Zen 3 and Zen 4 X3D SKUs, AMD is now following up with the second generation of V-Cache which is a major change for AMD in terms of packaging.

chipsandcheese.com

Improved memory latency helps SPECint, but improving memory bandwidth only helps SPECfp. Improving the latency between CCDs won't really help SPECrate MT scores because it is a fully parallel benchmark, running multiple independent copies of the SPEC tests.

The reason SPEC is helped by X3D is because of the size of the cache reducing the effective memory latency for stuff that is pushed out of the smaller non-X3D L3 and has to be reloaded from main memory.
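To put rough numbers on the cache-size argument, here is a minimal sketch using the textbook average-memory-access-time model. The latencies and miss rates are made-up illustrative values, not measurements of any Zen 5 part, and `effective_latency_ns` is just a hypothetical helper name.

```python
def effective_latency_ns(l3_hit_ns, dram_ns, l3_miss_rate):
    """Average latency of an access that reaches L3 (simple AMAT model).

    Every access pays the L3 hit latency; the fraction that misses
    additionally pays the trip to main memory.
    """
    if not 0.0 <= l3_miss_rate <= 1.0:
        raise ValueError("l3_miss_rate must be between 0 and 1")
    return l3_hit_ns + l3_miss_rate * dram_ns


# Hypothetical values: the larger L3 is slightly slower to hit,
# but it halves the miss rate for this imaginary workload.
small_l3 = effective_latency_ns(l3_hit_ns=10.0, dram_ns=80.0, l3_miss_rate=0.30)
large_l3 = effective_latency_ns(l3_hit_ns=11.0, dram_ns=80.0, l3_miss_rate=0.15)

print(f"smaller L3: {small_l3:.1f} ns, larger L3: {large_l3:.1f} ns")
```

With these assumed inputs the larger cache wins despite its slower hit time, because fewer accesses go out to DRAM. A faster IOD would act on the `dram_ns` term instead.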

OneEng2 · Friday at 3:49 PM

SteinFG said:
AMD has made a new core every 2 years for the past 8 years, why are you surprised that Zen 6 is coming out 2 years after Zen 5?

The cadence has slowed over time. I am not surprised, in fact I argue that it is a waste of time and energy to design something new without the advantage of a significantly improved process to do it on.

maddie said:
What exactly is termed IPC?

(1) Is it the basic core processing "X" instructions for "Y" cycles = X/Y IPC?

(2) Is it the wider CPU processing "X" instructions for "Y" cycles = X/Y IPC?

(3) Something else?

(2) = real world usage for user. IOD improvements apply here but not in case (1), which seems to assume zero external latency effects. I would assume feeding the cores as relevant to IPC, but it seems core designers think more narrowly (my impression) as in case (1).

Unless we are talking about pure single threaded performance ..... which I think is silly in this day and age, it is my opinion that performance per clock is a much better gauge of the design of a core. This is why I keep questioning the assertion that Skymont is a great idea for a unified design. Only when viewed from a single threaded POV is this true. Once you are talking about MT applications, how much performance per core does Skymont attain compared to Zen 5c with SMT?
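Going back to the two IPC definitions quoted above, the gap between case (1) and case (2) can be sketched with a simple stall-cycle model. All cycle counts below are hypothetical and only show the shape of the effect; `ipc` is an illustrative helper, not anything AMD publishes.

```python
def ipc(instructions, core_cycles, stall_cycles=0):
    """Instructions per cycle.

    core_cycles  : cycles the core would need with a perfect memory system
    stall_cycles : extra cycles spent waiting on L3 / IOD / DRAM
    """
    total_cycles = core_cycles + stall_cycles
    if total_cycles <= 0:
        raise ValueError("total cycle count must be positive")
    return instructions / total_cycles


# Hypothetical workload
instructions = 1_000_000
core_cycles = 250_000

core_only_ipc = ipc(instructions, core_cycles)           # case (1): no external latency
old_iod_ipc = ipc(instructions, core_cycles, 150_000)    # case (2): current uncore
new_iod_ipc = ipc(instructions, core_cycles, 100_000)    # case (2): fewer stalls

print(core_only_ipc, old_iod_ipc, round(new_iod_ipc, 2))
```

In this toy model a better IOD leaves the case (1) figure untouched and only moves the case (2) figure, which is why it matters which definition a vendor's "IPC" claim is using.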

yuri69 said:
The main counter point to a new IOD is AMD's history of the cheapest possible solutions. They have done "the XT" twice, why not with Zen 5 too?

I am not complaining about AMD's efforts to remain very profitable while Intel is burning through money like a drunken sailor (FYI, I have done this personally in a very literal sense). Zen 5 manages to outperform everything Intel has to offer in nearly every way while doing it on a less expensive and less dense process node.

gdansk said:
There is something planned for next year that isn't the cheapest possible solution. But no one will want it and I'm not convinced it'll help much.

No one here will want it, perhaps. We are not the majority of customers. I see AMD gaining traction in the corporate laptop market next year by offering the absolute best performance per $. Intel has spent decades using a loss leader in one segment and offsetting it with a profit leader in another. Doing this squeezed AMD's market share and profit, which has kept AMD from building up further and investing to become more competitive. I think that things have changed. Cheap is good sometimes.

Thunder 57 · Friday at 4:54 PM

OneEng2 said:
The cadence has slowed over time. I am not surprised, in fact I argue that it is a waste of time and energy to design something new without the advantage of a significantly improved process to do it on.

Zen 2 to Zen 3 had great gains gen over gen and both were N7. That said I don't think AMD can gain much more with a better IOD die and fclk.

OneEng2 · Friday at 5:03 PM

Thunder 57 said:
Zen 2 to Zen 3 had great gains gen over gen and both were N7. That said I don't think AMD can gain much more with a better IOD die and fclk.

The core die size did go up from 74 mm² to 80 mm² though. Additionally, Zen 2 was the first chiplet design and, like Intel, AMD had latency issues in the first design that they fixed with Zen 3, giving basically the same design much better performance.

GTracing · Friday at 5:24 PM

OneEng2 said:
The core die size did go up from 74 mm² to 80 mm² though. Additionally, Zen 2 was the first chiplet design and, like Intel, AMD had latency issues in the first design that they fixed with Zen 3, giving basically the same design much better performance.

I don't know what you mean by "latency issues" but that's not what AMD says the IPC comes from.

DavidC1 · Friday at 10:37 PM

inquiss said:
Why do you think 10% performance is all that zen 6 will bring to the table? That's just the suggested IPC increase, there are other vectors for improved performance...

Zen 5 is already at 5.7GHz. There's practically no room to grow there. They used up most of it in Zen 4. We aren't going to get another 10%. Intel literally kills Raptor Lake to get there, and in Arrow Lake gave up significant performance potential.

In certain limited scenarios they will get faster. But when they say "10%" they mean on average: across a wide range of workloads it'll get 10%. It doesn't matter whether you have 100,000MT CUDIMM backed up by Optane running at 1THz, and Infinity Fabric at 10GHz. It's all combined.

Well, maybe they can get 15% if not being able to use the clustered decode is a missed target. But that's about it. This is a losing battle.

OneEng2 said:
This is why I keep questioning the assertion that Skymont is a great idea for a unified design. Only when viewed from a single threaded POV is this true. Once you are talking about MT applications, how much performance per core does Skymont attain compared to Zen 5c with SMT?

What's the size of Zen 4c without L2 cache? The performance per clock difference is 25-30% in Integer and 60% in FP. Gracemont clocks quite a bit higher though on the client part at 4.4GHz.

Now on Skymont that difference gets reduced to 5-10% in Integer and 20-25% in FP. Skymont clocks 5% higher at 4.6GHz, even though Lion Cove had to clock lower by 5%, both on the same die. Turin Dense on N3E is 1.9 mm² by the way.

adroc_thurston · Friday at 11:32 PM

DavidC1 said:
There's practically no room to grow there

Cope?

DavidC1 said:
We aren't going to get another 10%

Hell yeah juice the 3-2 finpile in.

OneEng2 said:
AMD had latency issues in the first design

No?
Zen3 was just a new core and a bigger CCX.

moinmoin · Saturday at 6:27 AM

Going from Zen 2 to 3, the most obvious change was going from a 4-core CCX to an 8-core CCX. That both doubled the L3$ available to all cores in the CCD ("gamecache") and made cross-CCX latency less of an issue (limited to the two-CCD x900 and x950 chips from that point onward). All this didn't even concern the redesigned core.

StefanR5R · Saturday at 7:36 AM

What @moinmoin said. Also,

GTracing said:
I don't know what you mean by "latency issues" but that's not what AMD says the IPC comes from.
View attachment 112010

I don't know what @OneEng2 had in mind particularly either. But when AMD highlights, for example, the changed cache prefetching policy of Zen 3, then this is about reduced latency. (See the middle part of page 5 of Ian Cutress' Ryzen 5000 deep dive.) Edit: I haven't looked up to which extent other items in that list, i.e. execution engine etc., were about reduced latency (versus improved throughput, although you hardly can look at either in isolation).

BTW, AMD apparently didn't disclose what the 25 workloads for their 19% figure were. Hence it is impossible to say if these workloads were largely insensitive to the CCX change. Only then would the left side of this slide correspond well with the right side of the slide. (End note R5K-003: See e.g. press release at amd.com.)

StefanR5R · Saturday at 7:42 AM

OneEng2 said:
The cadence has slowed over time. I am not surprised, in fact I argue that it is a waste of time and energy to design something new without the advantage of a significantly improved process to do it on.

On the other hand, they diversify their products and thus address more target markets or address markets better.

GTracing · Saturday at 7:53 AM

StefanR5R said:
What @moinmoin said. Also,

I don't know what @OneEng2 had in mind particularly either. But when AMD highlights, for example, the changed cache prefetching policy of Zen 3, then this is about reduced latency. (See the middle part of page 5 of Ian Cutress' Ryzen 5000 deep dive.)

BTW, AMD apparently didn't disclose what the 25 workloads for their 19% figure were. Hence it is impossible to say if these workloads were largely insensitive to the CCX change. Only then would the left side of this slide correspond well with the right side of the slide. (End note R5K-003: See e.g. press release at amd.com.)

AMD actually did say which 25 workloads they tested.

I can see where better prefetching could be considered lowering latency, but that's clearly not what OneEng2 was talking about, since he said the first generation chiplets were the issue and prefetching has nothing to do with chiplets.

Meteor Late · Saturday at 8:03 AM

DavidC1 said:
Zen 5 is already at 5.7GHz. There's practically no room to grow there.

Meh, I used to think the same about the 5GHz barrier. We don't know, but I'm willing to bet 6GHz is possible with TSMC N3P.
If you think about it, Intel's 6GHz was achieved on a very old node (Intel 7). They are now at 5.7GHz with just TSMC N3B. Are you telling me N3B to N3P doesn't get you from 5.7GHz to 6GHz? I doubt it.
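For a sense of scale, here is the back-of-the-envelope arithmetic, assuming performance scales as IPC times clock. That assumption is optimistic, since memory latency does not shrink with core clock, and the 10% IPC figure is simply the number being argued about in this thread, not a confirmed spec.

```python
def combined_uplift(ipc_gain, base_ghz, new_ghz):
    """Relative performance gain under the model: performance ~ IPC x clock."""
    if base_ghz <= 0 or new_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return (1.0 + ipc_gain) * (new_ghz / base_ghz) - 1.0


# 5.7 GHz -> 6.0 GHz with no IPC change
clock_only = combined_uplift(ipc_gain=0.0, base_ghz=5.7, new_ghz=6.0)

# Same clock bump stacked on a hypothetical 10% IPC gain
ipc_and_clock = combined_uplift(ipc_gain=0.10, base_ghz=5.7, new_ghz=6.0)

print(f"clock only: {clock_only:.1%}, IPC + clock: {ipc_and_clock:.1%}")
```

So even if 6GHz happens, the clock bump alone is only about 5%, and stacked on a 10% IPC gain the ideal-case total lands just under 16%.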

Meteor Late · Saturday at 8:09 AM

GTracing said:
AMD actually did say which 25 workloads they tested.

View attachment 112036

I can see where better prefetching could be considered lowering latency, but that's clearly not what OneEng2 was talking about, since he said the first generation chiplets were the issue and prefetching has nothing to do with chiplets.

Yeah that 19% IPC uplift is BS because it's heavily gaming focused, that's not what you do when you want to assess average IPC uplift across many applications.
Maybe it's truly 19% or so in Spec, but this test for sure was wrong if they wanted to show average IPC.

Thunder 57 · Saturday at 9:34 AM

Meteor Late said:
Yeah that 19% IPC uplift is BS because it's heavily gaming focused, that's not what you do when you want to assess average IPC uplift across many applications.
Maybe it's truly 19% or so in Spec, but this test for sure was wrong if they wanted to show average IPC.

What kind of garbage post is this? 19% in games, 19% or so in SPEC, but it's all wrong if you look at "average IPC"?

OneEng2 · Saturday at 10:49 AM

StefanR5R said:
On the other hand, they diversify their products and thus address more target markets or address markets better.

I think this is the immediate future. More non-homogeneous computing aimed at more specific workloads.

A general-purpose computer is good at doing many things OK. Specific hardware like a DSP, for example, can perform its specific tasks hundreds or even thousands of times better at a fraction of the cost.

gdansk · Saturday at 11:12 AM

Meteor Late said:
Yeah that 19% IPC uplift is BS because it's heavily gaming focused, that's not what you do when you want to assess average IPC uplift across many applications.
Maybe it's truly 19% or so in Spec, but this test for sure was wrong if they wanted to show average IPC.

It basically matched for Zen 3. But if you haven't realized by now AMD does their marketing numbers backward. Engineers say what they achieved in spec and then the marketing bros pick a bunch of benchmarks and games to put on the slide such that the geomean is around the target given.

It backfires tremendously when the SPEC and gaming results aren't close like Zen 5.
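A quick sketch of why the workload mix matters so much for a geomean claim. The per-workload ratios below are invented for illustration (they are not AMD's numbers), chosen so that the "games" group lags the "SPEC-like" group as in the Zen 5 situation described above.

```python
import math


def geomean(ratios):
    """Geometric mean of per-workload performance ratios (new / old)."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload uplifts
spec_like = [1.15, 1.20, 1.18]
games = [1.05, 1.08, 1.04]

print(f"SPEC-like only: {geomean(spec_like) - 1:.1%}")
print(f"games only:     {geomean(games) - 1:.1%}")
print(f"mixed slide:    {geomean(spec_like + games) - 1:.1%}")
```

The mixed figure always lands between the two groups, so whoever picks the list of workloads effectively picks where the headline number ends up.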

Meteor Late · Saturday at 11:27 AM

Thunder 57 said:
What kind of garbage post is this? 19% in games, 19% or so in SPEC, but it's all wrong if you look at "average IPC"?

What I mean is that the geomean chart is BS; you don't make half of the workloads games if you want to assess geomean IPC.
