Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 704

naukkis

Senior member
Jun 5, 2002
877
747
136
So... there is a thin & light laptop which does not run at 5.1 GHz constantly. And this is a fiasco.

OK.

To me a fiasco is if a laptop doesn't have a keyboard with concave keys and enough travel, or lacks a trackpoint, for example.

Wasn't Zen 5 supposed to be an Apple killer? The M4 uses 7 W to get a SPECint result of 10+ in an iPad; Strix in a light laptop scores just above 7. That's 50% more performance at under half the power. The gap between AMD and Apple is widening, not closing.
 

naukkis

Senior member
Jun 5, 2002
877
747
136
SVE brings predicates and first-fault ld/st to the table, which can be quite useful for autovectorization. Some of these features were available starting with AVX-512 and were also added to Intel's new AVX10.

That said, I agree that most of the time hand-tuned NEON code is as fast as 128-bit SVE. I still think the sweet spot is at 256 bits wide, aka AVX2 or AVX10.2 with 256-bit vectors. And if area/power matters that much, do as AMD did on Zen 4 for AVX-512: use multiple uops on narrower paths; that did well on Zen 4.

AMD does not use multiple uops to execute AVX-512. AVX-512 has lane-crossing instructions, and splitting them into multiple uops would tank performance. AMD's AVX-512 on 256-bit hardware uses full 512-bit registers and a single uop per instruction; only the execution ALUs and load/store units are 256 bits wide, so to execute a full instruction the uop is replayed, taking 2 clock cycles instead of the 1 it would take if the execution hardware width matched the instruction. This is nothing new: the Zilog Z80, for example, has 4-bit execution hardware for its 8-bit registers.
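To put rough numbers on the "one uop, replayed over a narrower datapath" idea, here is a toy model in Python. It is only a sketch: the function name `passes_per_uop` is made up for illustration, and it counts nothing but datapath passes. It says nothing about AMD's actual scheduler, latencies, or how lane-crossing operations are handled.

```python
import math


def passes_per_uop(op_bits: int, datapath_bits: int) -> int:
    """Number of passes one uop needs through an execution unit.

    Toy assumption: the uop stays a single uop and is simply replayed
    until all of its bits have gone through the narrower datapath.
    """
    if op_bits <= 0 or datapath_bits <= 0:
        raise ValueError("widths must be positive")
    return math.ceil(op_bits / datapath_bits)


# 512-bit AVX-512 operation on 256-bit execution hardware
print(passes_per_uop(512, 256))
# Same hardware running a 256-bit operation: width matches
print(passes_per_uop(256, 256))
# The Z80 analogy from the post: 8-bit operation on 4-bit hardware
print(passes_per_uop(8, 4))
```

The point of the model is that the front end still tracks one uop per instruction; only the occupancy of the execution unit doubles.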
 

StefanR5R

Elite Member
Dec 10, 2016
5,891
8,759
136
Wasn't Zen5 supposed to be Apple killer?
Not this again. Did Dr. Lisa Su tell you this?

Hyperbole has done more than enough damage to this thread already.

A more practical question is: You have got a laptop which is designed for 17 W default heat dissipation from the SoC but can be reconfigured to put more than this through the SoC. This laptop allows defective software which occupies 1 logical CPU 100% of the time to drive this CPU at 4.98 GHz, but not at 5.10 GHz. What are ASUS's customers missing due to this, well, fiasco?

Edit, and if you think of saying "but Apple..." another time, I can think of saying "trackpoint" one more time, if you wish. ;-)
 

naukkis

Senior member
Jun 5, 2002
877
747
136
Not this again. Did Dr. Lisa Su tell you this?

Hyperbole has done more than enough damage to this thread already.

A more practical question is: You have got a laptop which is designed for 17 W default heat dissipation from the SoC but can be reconfigured to put more than this through the SoC. This laptop allows defective software which occupies 1 logical CPU 100% of the time to drive this CPU at 4.98 GHz, but not at 5.10 GHz. What are ASUS's customers missing due to this, well, fiasco?

Edit, and if you think of saying "but Apple..." another time, I can think of saying "trackpoint" one more time, if you wish. ;-)

What a laptop CPU needs is long battery life and good performance for single-threaded and lightly threaded workloads. That could be balanced well for laptops, but consuming 30 W for a single thread isn't exactly helpful. That Asus customer would have been sold a total fiasco as a laptop: it would burn his lap and drain its 30 Wh battery in two hours of idling. Battery-operated devices should never allow a 1T workload to use the full TDP but should limit it to some sane power level. AMD seems unable to afford that anymore.
 
Reactions: techjunkie123

CouncilorIrissa

Senior member
Jul 28, 2023
520
1,995
96
Strix Point seems worse and worse. It's ridiculous that it cannot even sustain max ST boost clocks on sub-30 W devices. What does full Zen 5 @ 5.7 GHz consume, a full 100 W on ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.
Imagine saying with a straight face that throttling -- we're not even sure if due to power or thermals -- is the same thing as CPUs becoming unstable within a 6-month period.
 

techjunkie123

Member
May 1, 2024
51
109
66
OK!! The fabled 35% has finally reared its head! lol


Looking at the single core performance vs power, wouldn't you want your core to run at ~12W, to get the most performance without absolutely blowing up power consumption? You'd only take a 5-10% hit in performance.

Why does AMD decide to just get maximum performance at the cost of power? It seems Apple is usually (except with M4, although their starting place is so much better) much better in deciding where to sit on the curve.
 

StefanR5R

Elite Member
Dec 10, 2016
5,891
8,759
136
What a laptop CPU needs is long battery life and good performance for single-threaded and lightly threaded workloads. That could be balanced well for laptops, but consuming 30 W for a single thread isn't exactly helpful. That Asus customer would have been sold a total fiasco as a laptop: it would burn his lap and drain its 30 Wh battery in two hours of idling. Battery-operated devices should never allow a 1T workload to use the full TDP but should limit it to some sane power level. AMD seems unable to afford that anymore.
Counter to your worries, Notebookcheck's (for instance) battery runtime test results of Zenbook S 16 look OK to my untrained¹ eye, well in line with current Windows and non-Windows competitors of the same weight/ thickness/ screen class.

________
¹) My own newest laptop has got a Haswell i7 and swappable battery.
 
Reactions: lightmanek

Hitman928

Diamond Member
Apr 15, 2012
6,049
10,379
136
What a laptop CPU needs is long battery life and good performance for single-threaded and lightly threaded workloads. That could be balanced well for laptops, but consuming 30 W for a single thread isn't exactly helpful. That Asus customer would have been sold a total fiasco as a laptop: it would burn his lap and drain its 30 Wh battery in two hours of idling. Battery-operated devices should never allow a 1T workload to use the full TDP but should limit it to some sane power level. AMD seems unable to afford that anymore.

You keep repeating that STX needs 30 W for 1T loads but I think you need to provide proof of this if you are going to take that stand.

Looking into it yes it seems the smaller Asus laptops cannot reach the 5.1GHz. The 16" Zenbook seems fine tho.

View attachment 104212

EDIT: To add some more notes it seems the 5.1GHz boost clock consumes up to ~29 watts. I think that is too much for the smaller Asus laptops to handle for longer periods.

View attachment 104213

This looks like a single core holding 5.1 GHz takes 18 W - 20 W to me. The initial power rush is a transition and probably due to multiple cores spinning up at the beginning of the test as the scene is loaded but then settles down to 18 W - 20 W once the pure 1T compute starts.

Edit: This is also package power, so the core power is obviously less than that and scaling up to 5.7 GHz won't be nearly as dramatic as you make it seem.
 

Nothingness

Diamond Member
Jul 3, 2013
3,031
1,971
136
AMD does not use multiple uops to execute AVX-512. AVX-512 has lane-crossing instructions, and splitting them into multiple uops would tank performance. AMD's AVX-512 on 256-bit hardware uses full 512-bit registers and a single uop per instruction; only the execution ALUs and load/store units are 256 bits wide, so to execute a full instruction the uop is replayed, taking 2 clock cycles instead of the 1 it would take if the execution hardware width matched the instruction. This is nothing new: the Zilog Z80, for example, has 4-bit execution hardware for its 8-bit registers.
I was overusing the word uop to make it clear and obviously failed. I meant the datapaths are 256 bits wide, which helps reduce area and peak power while still giving a good performance uplift over AVX2. The same applies to 256-bit vectors mapped to 128-bit datapaths, which was my point. In that context you get the benefit of both the wider registers and the ISA extensions that AVX-512/SVE offer over AVX2/NEON.
 
Reactions: Hitman928

Saylick

Diamond Member
Sep 10, 2012
3,504
7,764
136
Why does AMD decide to just get maximum performance at the cost of power? It seems Apple is usually (except with M4, although their starting place is so much better) much better in deciding where to sit on the curve.
Because they can. When you design a chip, you try to balance performance, power, and die size (which is analogous to cost). Often times, you can only choose two at the detriment of the last one. In Apple's case, they can take a hit on die size because they can charge a premium for their products. Others don't enjoy that luxury. With a larger xtor budget, you can design a core with suitable performance and power by letting it stay within that sweet spot on the freq/power curve.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
29,477
24,200
146
Strix Point seems worse and worse. It's ridiculous that it cannot even sustain max ST boost clocks on sub-30 W devices. What does full Zen 5 @ 5.7 GHz consume, a full 100 W on ST workloads? That's starting to be as ridiculous as the Intel Raptor Lake fiasco.
Multiple reports for trolling and flame bait. Please stop, or your ban bingo card will fill in quickly.

Mod DAPUNISHER
 
Reactions: DaaQ and r.p

Hitman928

Diamond Member
Apr 15, 2012
6,049
10,379
136
Looking at the single core performance vs power, wouldn't you want your core to run at ~12W, to get the most performance without absolutely blowing up power consumption? You'd only take a 5-10% hit in performance.

Why does AMD decide to just get maximum performance at the cost of power? It seems Apple is usually (except with M4, although their starting place is so much better) much better in deciding where to sit on the curve.

Because competition exists and their main competitor has been pushing performance at all costs for generations now. I agree 100% with what you are saying, but that doesn't mean it's the right move in light of what will win over customers, who by and large don't understand the nuances of CPU performance and power consumption. Even most reviewers only test performance while plugged in, and then the only unplugged thing they test is battery life. This has allowed some laptop makers to play major games: high performance when plugged in, but terrible, unresponsive performance on battery so that their battery life numbers look amazing. So the general PC market space has a "bigger bar better" mentality that AMD has to compete in.
 

coercitiv

Diamond Member
Jan 24, 2014
6,598
13,937
136
This looks like a single core holding 5.1 GHz takes 18 W - 20 W to me. The initial power rush is a transition and probably due to multiple cores spinning up at the beginning of the test as the scene is loaded but then settles down to 18 W - 20 W once the pure 1T compute starts.
That's exactly how it works: anyone running a CB ST test will see a power spike in the "Preparing project" stage. I'm on my 5600U laptop now, on battery, and the ST test will register 20 W for a second, after which CPU package power will run steadily @ 10 W. That spike has nothing to do with single-core power draw.
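For anyone reading a power log themselves, a minimal Python sketch of separating the start-up spike from the steady-state draw. The trace values are made up to mirror the 20 W then 10 W pattern described above, and `steady_state_power` is a hypothetical helper, not part of any monitoring tool.

```python
from statistics import mean


def steady_state_power(samples_w, skip_seconds, sample_period_s=1.0):
    """Average package power after discarding the initial warm-up window."""
    skip = int(round(skip_seconds / sample_period_s))
    tail = samples_w[skip:]
    if not tail:
        raise ValueError("trace is shorter than the skipped warm-up window")
    return mean(tail)


# Hypothetical 1 Hz package power trace in watts:
# one second of scene-loading spike, then the pure 1T phase.
trace = [20.0, 10.0, 10.0, 10.0, 10.0, 10.0]

naive_average = mean(trace)                           # spike included
steady = steady_state_power(trace, skip_seconds=1.0)  # spike excluded

print(f"naive average: {naive_average:.2f} W")
print(f"steady state:  {steady:.2f} W")
```

Reading the peak of such a trace as "what one core needs" overstates the 1T power; the tail after the transition is the number that matters.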
 

therealmongo

Member
Jul 5, 2019
125
284
136
Because they can. When you design a chip, you try to balance performance, power, and die size (which is analogous to cost). Often times, you can only choose two at the detriment of the last one. In Apple's case, they can take a hit on die size because they can charge a premium for their products. Others don't enjoy that luxury. With a larger xtor budget, you can design a core with suitable performance and power by letting it stay within that sweet spot on the freq/power curve.
Regarding core budget, this is such a valid, logical point; it's funny how certain posters are 'oblivious' to this reality /s
 

FlameTail

Diamond Member
Dec 15, 2021
3,762
2,208
106
Because they can. When you design a chip, you try to balance performance, power, and die size (which is analogous to cost). Often times, you can only choose two at the detriment of the last one. In Apple's case, they can take a hit on die size because they can charge a premium for their products. Others don't enjoy that luxury. With a larger xtor budget, you can design a core with suitable performance and power by letting it stay within that sweet spot on the freq/power curve.
Ah, but here's the thing: it doesn't seem like Apple is using significantly more die area than AMD. If you compare for instance Phoenix/HawkPoint (N4) vs Apple M2 (N5), the core sizes are very similar, and so is the resulting performance. So Apple's microarchitecture is better, and that's something AMD has to work on.

Edit: And the cache sizes are also fairly similar. (L1 not included because it's usually counted with the CPU core's die area).

M2
16 MB + 4 MB L2
8 MB SLC

Phoenix
8 MB L2
16 MB L3
 

gdansk

Platinum Member
Feb 8, 2011
2,838
4,221
136
Ah, but here's the thing: it doesn't seem like Apple is using significantly more die area than AMD. If you compare for instance Phoenix/HawkPoint (N4) vs Apple M2 (N5), the core sizes are very similar, and so is the resulting performance. So Apple's microarchitecture is better, and that's something AMD has to work on.
Is it significantly better? Just going by GB6 as a proxy my 7840U is better in 1T and MT than M2:
vs

It uses more power but it has more MT and GPU with admittedly useless raytracing hardware and more FLOP/s.
 

Hitman928

Diamond Member
Apr 15, 2012
6,049
10,379
136
Ah, but here's the thing: it doesn't seem like Apple is using significantly more die area than AMD. If you compare for instance Phoenix/HawkPoint (N4) vs Apple M2 (N5), the core sizes are very similar, and so is the resulting performance. So Apple's microarchitecture is better, and that's something AMD has to work on.

Edit: And the cache sizes are also fairly similar. (L1 not included because it's usually counted with the CPU core's die area).

M2
16 MB + 4 MB L2
8 MB SLC

Phoenix
8 MB L2
16 MB L3

Apple's core sizes are decently bigger than Zen cores on equivalent nodes with similar design frequencies. AMD's high-performance cores only come close in size because they are built to reach high frequencies.

Edit: I broke it down here but I'll copy again if you don't want to read the full post:

Zen 4 Core = 2.56 mm² with max boost of ~5.7 GHz.
Zen 4c Core = 1.43 mm² with max boost of ~3.7 GHz.
M2 Core = 2.76 mm² with max boost of 3.5 GHz.

Zen 4 Core + L2 = 3.84 mm² with max boost of ~5.7 GHz.
Zen 4c Core + L2 = 2.48 mm² with max boost of ~3.7 GHz.
M2 Core + L2 ~ 7.06 mm² with max boost of 3.5 GHz.
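To make the comparison explicit, a small Python sketch that turns the figures listed above into area ratios. The numbers are copied from this post as-is; the dictionary layout and the `area_ratio` helper are just for illustration.

```python
# (core mm^2, core + L2 mm^2, max boost GHz), as listed in the post
CORES = {
    "Zen 4":  (2.56, 3.84, 5.7),
    "Zen 4c": (1.43, 2.48, 3.7),
    "M2":     (2.76, 7.06, 3.5),
}


def area_ratio(name, baseline, with_l2=False):
    """How many times larger `name` is than `baseline`."""
    idx = 1 if with_l2 else 0
    return CORES[name][idx] / CORES[baseline][idx]


for zen in ("Zen 4", "Zen 4c"):
    core = area_ratio("M2", zen)
    core_l2 = area_ratio("M2", zen, with_l2=True)
    print(f"M2 vs {zen}: {core:.2f}x core, {core_l2:.2f}x core + L2")
```

Zen 4c is the closer match to the M2 core in design frequency (~3.7 GHz vs 3.5 GHz), which is why that row is the "similar design frequencies" comparison.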
 

techjunkie123

Member
May 1, 2024
51
109
66
Because they can. When you design a chip, you try to balance performance, power, and die size (which is analogous to cost). Often times, you can only choose two at the detriment of the last one. In Apple's case, they can take a hit on die size because they can charge a premium for their products. Others don't enjoy that luxury. With a larger xtor budget, you can design a core with suitable performance and power by letting it stay within that sweet spot on the freq/power curve.

Because competition exists and their main competitor has been pushing performance at all costs for generations now. I agree 100% with what you are saying, but that doesn't mean it's the right move in light of what will win over customers, who by and large don't understand the nuances of CPU performance and power consumption. Even most reviewers only test performance while plugged in, and then the only unplugged thing they test is battery life. This has allowed some laptop makers to play major games: high performance when plugged in, but terrible, unresponsive performance on battery so that their battery life numbers look amazing. So the general PC market space has a "bigger bar better" mentality that AMD has to compete in.

Regarding core budget, this is such a valid, logical point; it's funny how certain posters are 'oblivious' to this reality /s

Sure, but as pointed out by @FlameTail, Strix actually has a pretty high xtor budget. It's comparable to M3 Pro. Anyway, there's not much to discuss further since we all agree, but just wanted to point this out.

The comment about testing performance while plugged in is a good point. I guess while plugged in it doesn't matter what the power cost is, so you'd have to test while unplugged to see what the max clocks are like. It would be interesting to cap the boost clocks/TDP at different values and test battery life for 1T/low-nT workloads.

Ah, but here's the thing: it doesn't seem like Apple is using significantly more die area than AMD. If you compare for instance Phoenix/HawkPoint (N4) vs Apple M2 (N5), the core sizes are very similar, and so is the resulting performance. So Apple's microarchitecture is better, and that's something AMD has to work on.

Edit: And the cache sizes are also fairly similar. (L1 not included because it's usually counted with the CPU core's die area).

M2
16 MB + 4 MB L2
8 MB SLC

Phoenix
8 MB L2
16 MB L3
 
Reactions: therealmongo

StefanR5R

Elite Member
Dec 10, 2016
5,891
8,759
136
Looking at the single core performance vs power, wouldn't you want your core to run at ~12W, to get the most performance without absolutely blowing up power consumption? You'd only take a 5-10% hit in performance.
One thing to keep in mind is that the graph shows package power.

Though to get a given computational task done, you need an entire computer. Hence, while core power and SoC power are important parts of the picture, the task energy is ultimately the energy spent by the entire system. Thus, laptop reviewers have their battery runtime tests. SPEC have rules for power efficiency benchmarking too, which happen to be restricted to computer efficiency, not CPU efficiency.

So if you have a workload which is strictly serial = can only make use of 1/24th of the width of a CPU like AMD HX 370, it is worthwhile to drive this CPU quite far beyond the per-core power efficiency sweet spot.
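A rough Python sketch of the task-energy argument. Every number here is hypothetical, chosen only to show the shape of the trade-off: "core" power is what changes with the operating point, "rest" is everything else the system draws while the task runs (display, memory, uncore, and so on). The helper names are made up.

```python
def task_energy_j(core_w, rest_w, rel_perf, work_seconds=1.0):
    """Whole-system energy for a fixed serial task.

    work_seconds is how long the task takes at rel_perf == 1.0.
    """
    seconds = work_seconds / rel_perf
    return (core_w + rest_w) * seconds


def break_even_rest_w(slow, fast):
    """Rest-of-system power above which the faster point costs less energy.

    Each operating point is (core watts, relative performance).
    """
    (p_slow, s_slow), (p_fast, s_fast) = slow, fast
    return (s_slow * p_fast - s_fast * p_slow) / (s_fast - s_slow)


# Hypothetical operating points: (core watts, relative performance)
SWEET_SPOT = (4.0, 0.8)
BOOST = (10.0, 1.0)

for rest_w in (5.0, 20.0, 40.0):
    e_slow = task_energy_j(SWEET_SPOT[0], rest_w, SWEET_SPOT[1])
    e_fast = task_energy_j(BOOST[0], rest_w, BOOST[1])
    print(f"rest = {rest_w:4.1f} W: sweet spot {e_slow:.2f} J, boost {e_fast:.2f} J")

print(f"break-even rest power: {break_even_rest_w(SWEET_SPOT, BOOST):.1f} W")
```

Which way it tips depends entirely on the inputs: the larger the share of power that does not scale with the core's clock, the further past the per-core sweet spot it pays to run. That is why whole-system measurements such as battery runtime tests are the relevant check.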
 

FlameTail

Diamond Member
Dec 15, 2021
3,762
2,208
106
Is it significantly better? Just going by GB6 as a proxy my 7840U is better in 1T and MT than M2:
vs
1T is pretty similar between M2 and 7840.


MT advantage of 7840 is expected, because M2 is only 4P+4E, whereas 7840 is 8P with SMT.

And here's the die shot of M2;

You are welcome to do a CPU core area comparison with Phoenix.
 

Hitman928

Diamond Member
Apr 15, 2012
6,049
10,379
136
1T is pretty similar between M2 and 7840.


MT advantage of 7840 is expected, because M2 is only 4P+4E, whereas 7840 is 8P with SMT.

And here's the die shot of M2;
View attachment 104233
You are welcome to do a CPU core area comparison with Phoenix.

I did this already, http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ranite-ridge-ryzen-9000.2607350/post-41265352
 
Reactions: Tlh97 and FlameTail

gdansk

Platinum Member
Feb 8, 2011
2,838
4,221
136
1T is pretty similar between M2 and 7840.


MT advantage of 7840 is expected, because M2 is only 4P+4E, whereas 7840 is 8P with SMT.

And here's the die shot of M2;
View attachment 104233
You are welcome to do a CPU core area comparison with Phoenix.
A quick Google said 2.75mm² vs 2.8mm².

Edit: but I'd defer to Hitman's numbers above
 

Asterox

Golden Member
May 15, 2012
1,037
1,821
136
The price for the 9700X and 9600X is horrible; the 7800X3D costs $329 in Poland and can be found for as low as $300... For the 9700X, wait for the amDiscount.
By what logic, if it is $20 cheaper than the R5 7600X ($299) right from the start?

Look around, in today's global situation AMD will certainly not significantly lower the prices. AMD would be crazy to lower the prices, given the circus that is playing in the Intel camp.
 