Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
Interesting that SPEC with GCC shows a much higher IPC gain than the test with Clang. I wonder what flags they were using as well.
Dunno how they proceeded, given that AT also uses GCC and allegedly saw 0% in INT;
their number here is about the same as what AMD showed in its few INT-based tests.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
Dunno how they proceeded, given that AT also uses GCC and allegedly saw 0% in INT;
their number here is about the same as what AMD showed in its few INT-based tests.

AT didn’t normalize for clock speed and apparently didn’t notice that the CPU in their laptop was thermally throttling during the ST tests, so you can’t get any IPC data from AT’s testing, unfortunately.
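To make the clock-normalization point concrete, here is a minimal sketch. The function name and all the numbers are hypothetical, not taken from any review; the only idea it encodes is that a per-clock gain is the score ratio divided by the ratio of clocks actually sustained during the run.

```python
def ipc_gain(score_new, score_old, clock_new_ghz, clock_old_ghz):
    """Relative per-clock gain, given benchmark scores and the average
    clocks each CPU actually sustained during the run."""
    perf_ratio = score_new / score_old
    clock_ratio = clock_new_ghz / clock_old_ghz
    return perf_ratio / clock_ratio - 1.0


# Hypothetical example: the new chip scores 10% higher.
# If both chips held 5.0 GHz, the per-clock gain is simply 10%.
same_clock = ipc_gain(110.0, 100.0, 5.0, 5.0)

# If the new chip actually averaged 5% lower clocks because of throttling,
# the same scores imply a larger per-clock gain.
throttled = ipc_gain(110.0, 100.0, 4.75, 5.0)

print(f"no throttling: {same_clock:.1%}, with 5% throttling: {throttled:.1%}")
```

Without logged clocks for the run, the second argument pair is a guess, which is why un-normalized scores can't be turned into IPC figures.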
 

Abwx

Lifer
Apr 2, 2011
11,517
4,303
136
AT didn’t normalize for clock speed and apparently didn’t notice that the CPU in their laptop was thermally throttling during the ST tests, so you can’t get any IPC data from AT’s testing, unfortunately.

Well, using the latest data we can at least deduce how much their CPU was throttling, assuming of course that throttling was the only cause.

I guess we need a few more tests to have a definitively accurate picture; anyway, there are only 5 days left before everything becomes crystal clear.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
Well, using the latest data we can at least deduce how much their CPU was throttling, assuming of course that throttling was the only cause.

Even though they are using GCC, they are using a much older version than the linked twitter post, so you won't get a super accurate prediction that way either. Bad data is just bad data*, not much you can do with it. I previously estimated around 5% throttling based on the other tests AT ran, but it's a very rough estimate and Spec is a much longer test than everything else, so it's very possible, if not probable, that there was more throttling during at least some of the Spec tests than the others.


(*I'm saying it's bad data to try and calculate IPC, I have no reason to believe it's not fine as a measurement of the performance you get from STX in that particular laptop).
 

Nothingness

Diamond Member
Jul 3, 2013
3,033
1,976
136
Hey, bring that back. This thread can't become any more of a dumpster fire than it already is.
Well if you insist.

You made it sound like 2x4 decoders would bring 2x16%=32%.

The stupid joke was that percentages don't add, they multiply. So if one 4-wide decoder brings a 16% improvement *and* you add a second one, then the improvement will be almost 35% (1.16 × 1.16 ≈ 1.346), not 32%.

Yeah, that was stupid. But you insisted, so I'll put the blame on you 😀
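For anyone who wants to check the arithmetic behind the joke, a small sketch (the helper name `compound` is made up for illustration):

```python
def compound(gains):
    """Combine independent relative gains multiplicatively."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


added = 0.16 + 0.16                 # naive addition of percentages
compounded = compound([0.16, 0.16])  # 1.16 * 1.16 - 1

print(f"added: {added:.2%}, compounded: {compounded:.2%}")
# prints "added: 32.00%, compounded: 34.56%"
```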
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,480
2,958
136
Strix Point laptops are expensive because there's no competition in this segment, but since the chip is about the same size as Hawk Point it will gradually get cheaper as time goes by.
@FlameTail already pointed out it's bigger and also uses a newer process.
So it will likely get somewhat cheaper with more manufacturers offering it, not just Asus, but not as cheap as Phoenix was (is).
Strix Halo costs more to manufacture, but that's surely less than, say, an 8840 APU + RX 7600 chip, and AMD will cash in on both the CPU and GPU, so in the mid term it will also be substantially cheaper once some R&D cost is amortized.
Strix Halo uses N4 + N3, and the size is 2x ~66mm² + ~300mm² IOD including the IGP, according to the leak.
The 8840 is 178 mm² on N4, plus a 204 mm² RX 7600 on N6.
Yet you think it will be cheaper to manufacture than that CPU+dGPU combo. Not happening.
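Summing the figures quoted above gives a rough sense of the gap. This is only a sketch: the Strix Halo sizes are the leaked, approximate ones, the variable names are made up, and total area is not cost, since the nodes differ and packaging, yield and wafer pricing are ignored.

```python
# Die areas in mm², as quoted in the post (Strix Halo values are from a leak).
strix_halo = {"ccd_0": 66, "ccd_1": 66, "iod_with_igp": 300}
cpu_plus_dgpu = {"apu_8840_n4": 178, "dgpu_n6": 204}

strix_halo_total = sum(strix_halo.values())
combo_total = sum(cpu_plus_dgpu.values())

print(f"Strix Halo: ~{strix_halo_total} mm², CPU + dGPU combo: {combo_total} mm²")
print(f"difference: ~{strix_halo_total - combo_total} mm²")
```

So on raw silicon area alone the leaked Strix Halo configuration is about 50 mm² larger, before accounting for the newer (pricier) nodes.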
 

inquiss

Member
Oct 13, 2010
183
264
136
What's it got to do with manufacturing cost? The point is they get to set the price for both, and a Strix Halo sold "may" be an Nvidia GPU not sold. They get the profit margin for "both" chips, so they can sell it at a lower overall margin than the combo and still make more margin (in $) than they would make on just the CPU alone.
 

jdubs03

Senior member
Oct 1, 2013
712
317
136
Even though they are using GCC, they are using a much older version than the linked twitter post, so you won't get a super accurate prediction that way either. Bad data is just bad data*, not much you can do with it. I previously estimated around 5% throttling based on the other tests AT ran, but it's a very rough estimate and Spec is a much longer test than everything else, so it's very possible, if not probable, that there was more throttling during at least some of the Spec tests than the others.


(*I'm saying it's bad data to try and calculate IPC, I have no reason to believe it's not fine as a measurement of the performance you get from STX in that particular laptop).
Just looked at the version history, 13.3 came out May 21, 2024 and the latest 14.2 came out yesterday. The version history isn’t sequential which is interesting (as there were two releases in between with lower version numbers). Y’all probably know why better than me. But in terms of versioning it was the highest at the most recent time.

How much would that affect the results? I can’t imagine it would be materially.
 

Hitman928

Diamond Member
Apr 15, 2012
6,058
10,403
136
Just looked at the version history, 13.3 came out May 21, 2024 and the latest 14.2 came out yesterday. The version history isn’t sequential which is interesting (as there were two releases in between with lower version numbers). Y’all probably know why better than me. But in terms of versioning it was the highest at the most recent time.

How much would that affect the results? I can’t imagine it would be materially.

I meant that AT used an older version of GCC, but it's wrong anyway. I double checked and AT used Clang 10.0.0 (edit: much older than Geekerwan used as well so I guess the point still stands).
 
Reactions: ZGR and jdubs03

StefanR5R

Elite Member
Dec 10, 2016
5,892
8,764
136
I guess we need a few more tests to have a definitively accurate picture; anyway, there are only 5 days left before everything becomes crystal clear.
The Zenbook S 16 reviews were all rushed.
The Granite Ridge reviews on August 7 (if there will be such reviews on August 7) will all be rushed too. But at least they will be performed on desktop computer platforms.
 

MS_AT

Senior member
Jul 15, 2024
210
507
96
Generally you should look at the environment used for running the test and at the compiler flags. For example, David Huang is running the tests on native Linux, so in his case core pinning will work [telling the OS to run the test on a given core without migrating it to other cores, since migration lowers performance], while the outlet shown today is using WSL2 [Windows Subsystem for Linux version 2], which is a Hyper-V virtual machine running Linux. In this setup core pinning is unlikely to work, so they may think they are measuring a Zen 5c core, but the workload can migrate to a Zen 5 core if the hypervisor feels like it, and the guest OS [Linux in this case] won't be any the wiser. So the best thing one can do is use native Linux, as only native Linux gives you certainty that you are measuring the core you think you are measuring. [Or native Windows, but it seems SPEC is targeting Linux.]
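As a rough illustration of what core pinning looks like on native Linux, here is a sketch using Python's standard `os.sched_setaffinity` (Linux-only). The helper name `pin_to_core` is made up, and which logical CPU number maps to a Zen 5 or Zen 5c core is something you would have to look up on the machine itself. Inside a VM such as WSL2 the call may well succeed, but it only pins to a virtual CPU, which is the problem described above.

```python
import os


def pin_to_core(core: int) -> set:
    """Restrict the calling process to one logical CPU (Linux only)
    and return the affinity mask the kernel reports afterwards."""
    os.sched_setaffinity(0, {core})   # pid 0 means "this process"
    return os.sched_getaffinity(0)


# Usage sketch: pin to the first CPU we are currently allowed to run on,
# then launch the benchmark from this process so it inherits the mask.
allowed = sorted(os.sched_getaffinity(0))
print(pin_to_core(allowed[0]))
```

From a shell, `taskset -c <cpu> <command>` does the same thing.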

When it comes to compiler flags, AnandTech is not using any CPU-specific tuning [like -march=native] but ensures the AVX2 and FMA extensions are enabled, which means the compiler can emit AVX2 instructions. Geekerwan is not enabling AVX or AVX-512, so at best SSE2 will be used, as iirc that is the current default baseline for x86-64. David Huang is using -march=native, or the closest predecessor if the compiler doesn't know the tested architecture [wasn't patched to support it yet]. This is sensible behaviour, as otherwise it would penalize CPUs that are too new [-march=native would fall back to the baseline, so SSE2 in the case of x86-64, even though most of them support at least AVX2], but at the same time it ensures new features like AVX-512 or SME can be used if the compiler supports them. I think they are all using -Ofast, which gives the compiler more leeway to vectorize by ignoring strict rules about FP math. [For example, (a + b) + c might differ from a + (b + c) in FP math.]
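The FP reordering that relaxed-math flags permit is easy to see with three ordinary doubles. A minimal sketch: regrouping a sum changes the rounded result, while swapping the operands of a single addition does not. A vectorizer that splits a long sum into several partial sums is effectively regrouping it.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6

print(left == right)    # prints "False": FP addition is not associative
print(a + b == b + a)   # prints "True": it is still commutative
```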

Then when it comes to newer compiler versions they sometimes learn new tricks, get patched with cost tables for newer architectures or bugfixes. But sometimes they regress so older version can give better results on given hardware, that is not unheard of.

So since people who so far presented SPEC results are using different environments and different compilers with different compiler options, comparing their results is more like comparing apples to oranges than apples to apples.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,466
3,350
106
Z4 already has dual SDP per CCD (EPYC GMI wide), so at the very least with dense fanout interconnect they can enable both SDPs and still consume less than half the energy compared to DT CCD.
So it would have double the BW of the DT CCD if they do this at least.

But that sounds like bare minimal effort, some new innovation should be there.

The other interesting aspect is the MALL, which can do aggressive prefetching for hiding memory latencies (as done on MI300), the one on RDNA2/3 is not capable of this.
I again hope this is the one they use not the one from RDNA3.

According to some rumors, there will also be LP cores as part of the giant SoC, and the LP cores will also benefit from MALL memory, which can, in low power situation, act as its L3.
 
Reactions: Gideon

Joe NYC

Platinum Member
Jun 26, 2021
2,466
3,350
106
Strix Halo is really big. Anyone hoping it will be relatively cheap should forget about that; just look at what they ask for a Strix Point laptop.
My prediction is >2000 euro and for that only 4070 80W level of performance is not very good.
In my opinion Strix Halo is not aimed for gamers, that's just secondary. The main selling point is the 16C32T CPU paired with 64-128GB RAM.

Or, a GPU attached to 128 GB of RAM, which is more than H100 offers for ~$10,000 - $30,000 (depending on who you ask).
 
Reactions: Kryohi

poke01

Platinum Member
Mar 8, 2022
2,008
2,546
106
Intel has so far managed with only 2 SKUs, IIRC.
Exactly, if you’re going to solder RAM you might as well use the best implementation. MoP saves board space and enables wider buses on laptops. AMD just didn’t want to go all out, I guess.

If AMD used MoP, they would likely have 3-4 SKUs: 32, 64, and 128GB RAM SKUs, plus one flagship SKU with the full cores and clocks.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,759
1,455
136
Strix Halo uses N4 + N3, and the size is 2x ~66mm² + ~300mm² IOD including the IGP, according to the leak.
The 8840 is 178 mm² on N4, plus a 204 mm² RX 7600 on N6.
Yet you think it will be cheaper to manufacture than that CPU+dGPU combo. Not happening.

That CPU+dGPU combo is more expensive in every single way that isn't the manufacturing cost of the silicon. More complex PCB, more complex layout, more complex cooling, two different memory pools, etc.

It still won't be cheaper, but it might not be as much more expensive as you think.
 
Reactions: Tlh97 and Joe NYC

gdansk

Platinum Member
Feb 8, 2011
2,843
4,240
136
This is it. It’s not a gamer part. Its competition will be the M3 Max and ML usage. The 128GB SKU will likely be >$2500
I'm pretty sure it is a gamer part by design. Why else would it use an RDNA variant which still doesn't have good ROCm support? They'll have to market it as something else because it isn't competitive where it was aiming.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,466
3,350
106
@FlameTail already pointed out it's bigger and also uses a newer process.
So it will likely get somewhat cheaper with more manufacturers offering it, not just Asus, but not as cheap as Phoenix was (is).

Strix Halo uses N4 + N3, and the size is 2x ~66mm² + ~300mm² IOD including the IGP, according to the leak.
The 8840 is 178 mm² on N4, plus a 204 mm² RX 7600 on N6.
Yet you think it will be cheaper to manufacture than that CPU+dGPU combo. Not happening.

That die size is a bit of a shock. I think most people estimated 200 to 250 mm2.

We will see what it will end up having, between LP cores, possibly bigger NPU, MALL. But it still seems too big on N3E... Or maybe the die size info may turn out to be not correct...

Since this is, in effect, a prototype, cost was not the primary concern, just establishing a new niche.

Similar to Lunar Lake, that was supposed to be a niche, so Intel did not care about the cost. But Intel may be forced to sell it in wider market, with low margins...

From AMD POV on Strix Halo, it is just how aggressive AMD wants to be establishing this new market segment. If successful, AMD can introduce a more cost optimized version for next gen, with more cost optimized chiplet arrangement...
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,492
5,055
96
Intel has so far managed with only 2 SKUs, IIRC.
Have you seen the LNL SKU list?
Why else would it use an RDNA variant which still doesn't have good ROCm support
Because that's the only GFX IP AMD has that draws triangles.
Either way RDNA ROCm support will be streamlined once SPIR-V support gets mainlined.
The 8840 is 178 mm² on N4, plus a 204 mm² RX 7600 on N6.
It's a lot more performant than that.
 

Doug S

Platinum Member
Feb 8, 2020
2,715
4,607
136
So since people who so far presented SPEC results are using different environments and different compilers with different compiler options, comparing their results is more like comparing apples to oranges than apples to apples.

But they aren't doing stuff like replacing malloc libraries or using PGO, so I'd argue the results reported by Anandtech (when they had people doing that) or Geekerwan are more useful for comparison than the "official" submissions.

There isn't any effective way to standardize between, say, macOS and Windows, nor should there be. Pretty much every developer on macOS and iOS uses Xcode, so using the latest Xcode release with some basic optimization flags that ordinary developers might use is how Apple Silicon performance should be demonstrated. It doesn't matter if it performs better or worse running Asahi Linux; that's not what 99.9% of Mac buyers are running. Likewise, on Windows you'd want to use the MS C compiler, though there are arguments for using vendor compilers since some developers may do so.

Trying to make them all equal by saying "OK we'll use gcc on everything as the lowest common denominator" might level the playing field, but the information you get doesn't really prove anything. The goal isn't "how does M4 compare against Zen 5" in some sort of abstract sense divorced from the realities of the Mac and PC platforms, the macOS and Windows APIs and development environments, etc. If Zen 5 performs better under Linux or M4 performed better using DDR5 instead of LPDDR5X that's not relevant as far as I'm concerned, because that's not how those CPUs are used (unless you are the 2% or whatever like me and actually do run Linux on your desktop)

The problem is SPEC is a pain to run, so asking people to re-run it just because there's a new compiler rev or something just isn't worth the trouble. If you want something you can run often there's Geekbench. It's just too bad its repeatability is so poor, but that's going to be true of any benchmark that runs quickly in today's world, where you have a bunch of cores and they can all adjust their frequencies moment to moment depending on temperature, load and the phase of the moon.
 

poke01

Platinum Member
Mar 8, 2022
2,008
2,546
106
Have you seen the LNL SKU list?
That's Intel binning by clock and GPU count. If AMD binned by RAM there should only be 3 SKUs.

 