Question Zen 6 Speculation Thread


FlameTail

Diamond Member
Dec 15, 2021
More cores are boring.

What is NOT boring?
- V-Cache
- bigger V-Cache
- V-Cache running at full speed, no clock regression
- 16 core CCD with double L3, double V-Cache.
Add to that list "V-cache slice that is bonded laterally across 2 CCDs, connecting them like a bridge"
 

Joe NYC

Platinum Member
Jun 26, 2021
Add to that list "V-cache slice that is bonded laterally across 2 CCDs, connecting them like a bridge"

I think this would be physically achievable.

But the difficulty of developing an algorithm to utilize it across 2 CCDs without prohibitive latency overhead probably makes it impossible to implement.
 

Anhiel

Member
May 12, 2022
SMT-4 not happening. Nobody can beat IBM beyond SMT-2 anyway.

Too bad backside power delivery was removed from N2 and moved to A16.
So the only speculative next-gen unknown is whether an LPDDR die will be stacked on top of the 3D V-Cache die, or whether that is also gone for the same reason.

As for core counts to counter the rumored 8+16 Arrow Lake refresh, I already did the calculations over a year ago.
But if Zen6 is limited to AM5, there's not much hope it will ever get more than 16 Zen6 cores, at least not in the typical steps (24c, 32c). Being generous and assuming we go to N2 and get a total 1.3x perf/power uplift, at most ~20 Zen6 cores could fit within the energy budget... which happens to be enough to match the Arrow Lake refresh. Splitting that across 2 CCDs would give another odd configuration. If we go for a mix of Zen6 and Zen6c, then 8+16 could barely fit.
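The ~20-core estimate above is simple power-budget scaling. A minimal sketch, assuming (as the post does) that a 16-core part sets the AM5 power budget and that a 1.3x perf/power uplift from N2 is spent entirely on extra cores at the same clocks; the constant and function names are made up for illustration:

```python
# Back-of-envelope core-count budget, following the post's assumptions:
# a 16-core part defines the power budget, and a 1.3x perf/power uplift
# is spent entirely on more cores at iso-clock. Not AMD data.
BASELINE_CORES = 16
PERF_PER_POWER_UPLIFT = 1.3

def cores_within_budget(baseline_cores: int, uplift: float) -> int:
    """Largest whole number of cores that fits the same power budget."""
    return int(baseline_cores * uplift)  # floor: a partial core does not count

print(cores_within_budget(BASELINE_CORES, PERF_PER_POWER_UPLIFT))  # 20
```

16 × 1.3 = 20.8, which floors to 20: enough for an odd core count split across 2 CCDs, but well short of 24 or 32.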

The other problem is memory bandwidth. More L3/L4 cache could reduce latency (see above).
Infinity Fabric bandwidth on Zen5 has supposedly been doubled over Zen4. If, supposedly, not much more improves for Zen6, there would be at most a 1.3x relative gap to Zen4. At the same clock speed, IPC would then be limited to 1.3/1.16 ≈ 1.12x over Zen5.
So we can't expect much more IPC, as it wouldn't be worth it. It's an incremental improvement cycle anyway.
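The IPC ceiling works out as a single ratio. This sketch takes the post's figures at face value: Zen6 fabric bandwidth at most 1.3x Zen4's, and 1.16 read as Zen5's IPC gain over Zen4 (AMD's quoted ~16%). It also assumes, as the post does, that IPC can't usefully outgrow bandwidth at the same clock; the names are hypothetical:

```python
# Bandwidth-bound IPC ceiling, following the post's arithmetic.
# Assumptions: Zen6 bandwidth is at most 1.3x Zen4's, Zen5 already has
# a 1.16x IPC gain over Zen4, and IPC is capped by bandwidth at iso-clock.
ZEN6_BW_VS_ZEN4 = 1.3
ZEN5_IPC_VS_ZEN4 = 1.16

def ipc_ceiling_vs_zen5(bw_vs_zen4: float, zen5_ipc_vs_zen4: float) -> float:
    """Headroom left for Zen6 IPC once Zen5's gain has used part of it."""
    return bw_vs_zen4 / zen5_ipc_vs_zen4

print(round(ipc_ceiling_vs_zen5(ZEN6_BW_VS_ZEN4, ZEN5_IPC_VS_ZEN4), 2))  # 1.12
```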
 
Jul 27, 2020
But the difficulty of developing an algorithm to utilize it across 2 CCDs without prohibitive latency overhead probably makes it impossible to implement.
The only latency overhead would be when one CCD needs to get data from a core on the other CCD. For two CCDs, there would be a V-cache die over each CCD, plus a V-cache "bridge" die holding the data most frequently shared between the two CCDs. It might be more expensive and have slightly higher latency, but it would still beat going out to the IOD for data. The only reason AMD isn't doing something like this is because they don't have any competition in this space, so why should they bother increasing their costs?
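A rough sketch of the lookup order this proposal implies: a request checks the V-cache on its own CCD, then the shared bridge die, and only then goes out through the IOD. The `Source` enum, the `serve` function and the set-based cache contents are hypothetical illustrations, not anything AMD has described; real caches track lines by tag and set, not with a Python set of addresses.

```python
from enum import Enum

class Source(Enum):
    """Where a line is served from in the hypothetical bridged layout."""
    LOCAL_VCACHE = "V-cache stacked on the requesting CCD"
    BRIDGE_VCACHE = "bridge V-cache shared by both CCDs"
    IOD = "IOD (other CCD or DRAM)"

def serve(addr: int, local_vcache: set[int], bridge_vcache: set[int]) -> Source:
    """Check the nearest level first; fall back to the IOD only on a miss in both."""
    if addr in local_vcache:
        return Source.LOCAL_VCACHE
    if addr in bridge_vcache:
        # Slightly slower than local, but still avoids the trip through the IOD.
        return Source.BRIDGE_VCACHE
    return Source.IOD

# Hypothetical contents: 0x40 is private to CCD0, 0x80 is shared between CCDs.
ccd0_vcache = {0x40}
bridge = {0x80}
for addr in (0x40, 0x80, 0xC0):
    print(hex(addr), "->", serve(addr, ccd0_vcache, bridge).value)
```

The point of the bridge is the middle branch: shared data that would otherwise require a cross-CCD trip through the IOD gets an extra, closer place to hit.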
 

SarahKerrigan

Senior member
Oct 12, 2014
Oh man, that was a very ambitious design. Can't help lusting over the slides here: https://www.servethehome.com/marvel...s-in-2020/marvell-thunderx3-smt4-improvement/

It must have been shortsighted executives who didn't use that IP to enter the laptop market.

XLP-family cores are firmly server-oriented. They lack the clock scalability and single-thread perf to be good in the PC market, and every generation since the first (that's four, if you include TX3, which was canned before release) has been late and missed clock targets.
 

Tuna-Fish

Golden Member
Mar 4, 2011
Hmm. LPDDR6 will certainly be on the market by the time Zen6 rolls out. The question is whether Zen6 will support LPDDR6.

Historically, the time from a memory standard's release to a new desktop SoC using it is about 2 years. The LPDDR6 spec is currently expected late this year. When do you think Zen6 will be out?
 

naukkis

Senior member
Jun 5, 2002
Marvell claimed 28-121% higher throughput for +5% area from their 4-way SMT implementation. You are not going to get anywhere close to that with a few larger structures here and there.
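Putting Marvell's claim in throughput-per-area terms shows why it is hard to match with a few enlarged structures. A sketch, assuming both the +28% to +121% throughput range and the +5% area are measured against the same core without SMT4; the function name is hypothetical:

```python
# Throughput per unit area for Marvell's claimed SMT4 gains.
# Assumption: throughput and area are both relative to the same non-SMT4 core.
AREA_COST = 1.05  # +5% area, as claimed

def throughput_per_area(throughput_gain: float, area_cost: float = AREA_COST) -> float:
    """Relative throughput divided by relative area."""
    return (1.0 + throughput_gain) / area_cost

for gain in (0.28, 1.21):
    print(f"+{gain:.0%} throughput -> {throughput_per_area(gain):.2f}x throughput/area")
```

Even the low end of the range gives roughly 1.22x throughput per area, and the high end roughly 2.10x.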

With few cores, memory-latency-bound workloads saw big gains from SMT. But on today's 100-core systems, not so much: those memory-latency-bound workloads will saturate the memory system even without SMT. If Intel's numbers for the area and performance/power penalties are really that big, the only sane thing for them is to drop SMT from their server-grade CPU implementations too.
 

SarahKerrigan

Senior member
Oct 12, 2014
With few cores, memory-latency-bound workloads saw big gains from SMT. But on today's 100-core systems, not so much: those memory-latency-bound workloads will saturate the memory system even without SMT. If Intel's numbers for the area and performance/power penalties are really that big, the only sane thing for them is to drop SMT from their server-grade CPU implementations too.

ThunderX3 was 60 cores and still got large gains.

"Hardware multithreading makes stuff go faster" isn't that controversial of a statement. It applies from the niche (hundreds of single-issue cores with 16 threads each, running a large SMP Linux instance) to the normal (large server chips from mainstream vendors.)

It makes heterogeneity harder, and having smaller more efficient cores on the same die arguably makes it less necessary, but SMT exists for good reason, sees real gains, and that isn't going to change.
 

naukkis

Senior member
Jun 5, 2002
ThunderX3 was 60 cores and still got large gains.

"Hardware multithreading makes stuff go faster" isn't that controversial of a statement. It applies from the niche (hundreds of single-issue cores with 16 threads each, running a large SMP Linux instance) to the normal (large server chips from mainstream vendors.)

It makes heterogeneity harder, and having smaller more efficient cores on the same die arguably makes it less necessary, but SMT exists for good reason, sees real gains, and that isn't going to change.
It's already changed. If those Intel numbers are right, they made a big mistake keeping SMT on board so long. There's only one reason to make big, power-inefficient cores: to go after the best possible per-thread performance. Splitting that performance in half, or into quarters, with SMT is just plain stupidity. For workloads that don't need the best possible per-thread performance, use power-optimized cores instead of SMT-neutered, power-hog big cores. Those power-optimized little cores might benefit from SMT in some niche cases, though.
 

SarahKerrigan

Senior member
Oct 12, 2014
It's already changed. If those Intel numbers are right, they made a big mistake keeping SMT on board so long. There's only one reason to make big, power-inefficient cores: to go after the best possible per-thread performance. Splitting that performance in half, or into quarters, with SMT is just plain stupidity. For workloads that don't need the best possible per-thread performance, use power-optimized cores instead of SMT-neutered, power-hog big cores. Those power-optimized little cores might benefit from SMT in some niche cases, though.

"Using smaller cores is better than doing SMT on bigger cores" inevitably leads to "okay, what if we add hardware MT to smaller cores?" - and I can attest from experience that that is something that does, in fact, happen; the manycore Linux I mentioned before wasn't hypothetical.

The core argument, or a core argument, of SMT is that you have a given target for size and complexity of a core and you also want a cheap throughput win on the side. I struggle to think of a category of core where that ceases to be applicable.
 

naukkis

Senior member
Jun 5, 2002
"Using smaller cores is better than doing SMT on bigger cores" inevitably leads to "okay, what if we add hardware MT to smaller cores?" - and I can attest from experience that that is something that does, in fact, happen; the manycore Linux I mentioned before wasn't hypothetical.

The core argument, or a core argument, of SMT is that you have a given target for size and complexity of a core and you also want a cheap throughput win on the side. I struggle to think of a category of core where that ceases to be applicable.

What Intel basically said is that they are losing about one generation of ST performance by implementing SMT. And it's clear that they can't afford it anymore, as there are rivals going all-in on ST performance.
 

naukkis

Senior member
Jun 5, 2002
Funny that Intel is blaming SMT when their rival is eclipsing their ST performance with SMT enabled.

It remains to be seen whether AMD can implement SMT in a way that doesn't hamper ST performance and efficiency. But it's not AMD that Intel is worried about: SMT-less big Arm cores are becoming so powerful, while staying efficient, that desktop/laptop platforms will lose at least as much market share to Arm as to AMD.
 

coercitiv

Diamond Member
Jan 24, 2014
An engineer examines a locust. First the engineer claps his hands, and the locust jumps. He then proceeds to cut off one of its legs, claps his hands, and the locust jumps. He cuts off one more leg, claps his hands, and the locust jumps no more. The engineer writes down his conclusion: when two legs get cut off, the locust loses its hearing.

An engineer looks at a high-performance ARM core and a high-performance, but slower, x86 core. He sees SMT hardware in one, but none in the other. The engineer writes down his conclusion: when SMT2 hardware is missing, the CPU runs faster.

Joke aside, here's another point of view from IBM:
Additionally, newer versions of the Linux kernel than the version in SUSE Linux Enterprise Server SP1 contain enhancements that enhance SMT behavior for POWER7 processors. These enhancements migrate work from tertiary threads to secondary or primary threads and from secondary threads to primary threads when possible. These enhancements can help to provide better processor utilization and decrease or eliminate, the effect shown in Table 1.
 

Nothingness

Diamond Member
Jul 3, 2013
An engineer looks at a high-performance ARM core and a high-performance, but slower, x86 core. He sees SMT hardware in one, but none in the other. The engineer writes down his conclusion: when SMT2 hardware is missing, the CPU runs faster.
Random Joe, who is a hardcore gamer, watches some video comparing HT vs no-HT without reading comments and concludes he should disable HT.

Joke aside, I'm still not convinced of the usefulness or uselessness of SMT. I see some benefit from SMT for my compilation jobs (though it's marginal), while I see none for finely tuned computational software (OTOH, I'd have to turn off SMT in the firmware to see if there's any gain; I don't expect much on modern CPUs that have been tuned for more than 20 years to handle SMT properly).
 

SarahKerrigan

Senior member
Oct 12, 2014
What Intel basically said is that they are losing about one generation of ST performance by implementing SMT. And it's clear that they can't afford it anymore, as there are rivals going all-in on ST performance.

Right, and that's a value calculation that makes sense on a heterogeneous design where you're farming MT perf out to flocks of smaller and more area-efficient cores anyway. That raises the question of whether those smaller cores would benefit from SMT themselves; I suspect my opinion on that is obvious. SMT improves throughput performance without new caches, without new interconnect stops, and is one of the few things in microarchitecture that gives you larger throughput performance gains than you spend on area or power.

There is a lot of distance between "you can separate your cores into throughput-efficiency-optimized vs single-thread-optimized in a heterogeneous design" and "SMT categorically considered harmful."
 

naukkis

Senior member
Jun 5, 2002
An engineer examines a locust. First the engineer claps his hands, and the locust jumps. He then proceeds to cut off one of its legs, claps his hands, and the locust jumps. He cuts off one more leg, claps his hands, and the locust jumps no more. The engineer writes down his conclusion: when two legs get cut off, the locust loses its hearing.

An engineer looks at a high-performance ARM core and a high-performance, but slower, x86 core. He sees SMT hardware in one, but none in the other. The engineer writes down his conclusion: when SMT2 hardware is missing, the CPU runs faster.

Joke aside, here's another point of view from IBM:

This isn't speculation anymore, as Intel did provide numbers explaining why they ditched SMT. Yeah, I had speculated that SMT needs additional logic in CPU critical paths and that some ST performance would be freed up by abandoning it, but I was an order of magnitude too conservative if those Intel numbers are real.
 

marees

Senior member
Apr 28, 2024
This isn't speculation anymore, as Intel did provide numbers explaining why they ditched SMT. Yeah, I had speculated that SMT needs additional logic in CPU critical paths and that some ST performance would be freed up by abandoning it, but I was an order of magnitude too conservative if those Intel numbers are real.
Thinking as an end customer, this might be important in a handheld gaming console or a 5G tablet. Otherwise, I am not going anywhere near such a processor.
 

naukkis

Senior member
Jun 5, 2002
There is a lot of distance between "you can separate your cores into throughput-efficiency-optimized vs single-thread-optimized in a heterogeneous design" and "SMT categorically considered harmful."

Let's talk about the numbers Intel gave. They claim an SMT design loses 15% of ST speed at iso-power. That's pretty much one giant CPU generational step. They have been trying everything they could to hold the ST crown, pushing power limits to the edge of instability, and all along they had 15% left on the table from SMT. That's a sure path to losing the ST performance crown, which, unlike on this forum where throughput is preferred over everything, is still the merit that sets CPU pricing.
 

naukkis

Senior member
Jun 5, 2002
Thinking as an end customer, this might be important in a handheld gaming console or a 5G tablet. Otherwise, I am not going anywhere near such a processor.

We are talking about ST performance, like having the fastest CPU for playing games, or doing anything on a computer that benefits from speed. I don't know how accurate those Intel numbers are, but nobody seems to acknowledge how significant they are. Actually, they are pretty hard to believe; there might be some marketing embellishment added.
 

SarahKerrigan

Senior member
Oct 12, 2014
Let's talk about the numbers Intel gave. They claim an SMT design loses 15% of ST speed at iso-power. That's pretty much one giant CPU generational step. They have been trying everything they could to hold the ST crown, pushing power limits to the edge of instability, and all along they had 15% left on the table from SMT. That's a sure path to losing the ST performance crown, which, unlike on this forum where throughput is preferred over everything, is still the merit that sets CPU pricing.

And in heterogeneity, that's great. Trading some throughput to build a single-thread-optimized core when you also have throughput-optimized cores on the same die is a reasonable tradeoff. I feel that I already acknowledged that in clear terms.

That is not the same as SMT being categorically bad and I don't know why you are having difficulty absorbing that. In a server, +30% throughput perf at +20% power (Intel's numbers, which are worse than I've seen from other vendors) is a really big deal and that's what SMT provides.
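The server argument comes down to perf/W: if throughput rises faster than power, SMT is a net efficiency win. A sketch, assuming the +30% throughput and +20% power apply to the same fully loaded core (figures as attributed to Intel in the post); the function name is hypothetical:

```python
# Perf/W effect of SMT in the server case described above.
# Assumption: both gains are measured on the same fully loaded core.
def perf_per_watt_gain(throughput_gain: float, power_gain: float) -> float:
    """Relative throughput divided by relative power."""
    return (1.0 + throughput_gain) / (1.0 + power_gain)

print(f"{perf_per_watt_gain(0.30, 0.20):.3f}x perf/W")  # 1.083x perf/W
```

So under these numbers SMT buys about 8% better throughput per watt on top of the raw +30%, which is the kind of trade a server buyer cares about.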
 