Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Abwx · Aug 2, 2024

Hitman928 said:
Interesting that Spec with GCC shows that much higher IPC gain versus test with Clang. I wonder what flags they were using as well.

Dunno how they proceeded, given that AT also uses GCC and allegedly got 0% in INT;
their number in this matter is about the same as what AMD showed in the few INT-based tests.

Hitman928 · Aug 2, 2024

Abwx said:
Dunno how they proceeded, given that AT also uses GCC and allegedly got 0% in INT;
their number in this matter is about the same as what AMD showed in the few INT-based tests.

AT didn’t normalize for clock speed and apparently didn’t notice that the CPU in their laptop was thermally throttling during the ST tests, so you can’t get any IPC data from AT’s testing, unfortunately.
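Why clock normalization matters can be sketched in a few lines. This is only an illustration: the scores and clocks below are hypothetical placeholders, not measurements from any review, and `per_clock_gain` is a made-up helper name.

```python
def per_clock_gain(score_new, clock_new_ghz, score_old, clock_old_ghz):
    """Return the relative performance-per-clock gain of 'new' over 'old'.

    Assumes the clocks passed in are the sustained average clocks during
    the run. If a chip throttled and its nominal boost clock is used
    instead, the result understates its real per-clock gain.
    """
    new_per_ghz = score_new / clock_new_ghz
    old_per_ghz = score_old / clock_old_ghz
    return new_per_ghz / old_per_ghz - 1.0


# Hypothetical numbers, chosen only to illustrate the effect.
# Same score on both chips, both assumed to run at a nominal 5.0 GHz:
nominal = per_clock_gain(10.0, 5.0, 10.0, 5.0)
# Same score, but the new chip actually averaged 5% lower clocks:
actual = per_clock_gain(10.0, 4.75, 10.0, 5.0)

print(f"gain using nominal clock:   {nominal:+.1%}")
print(f"gain using sustained clock: {actual:+.1%}")
```

Without a logged sustained clock for each run, the second figure cannot be recovered from the scores alone.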

Abwx · Aug 2, 2024

Hitman928 said:
AT didn’t normalize for clock speed and apparently didn’t notice that the CPU in their laptop was thermally throttling during the ST tests, so you can’t get any IPC data from AT’s testing, unfortunately.

Well, using the latest data we can at least deduce by how much their CPU was throttling, assuming of course that it was the only cause.

I guess we need a few more tests to definitely have an accurate picture; anyway, there are only 5 days left before everything is crystal clear.

Hitman928 · Aug 2, 2024

Abwx said:
Well, using the latest data we can at least deduce by how much their CPU was throttling, assuming of course that it was the only cause.

Even though they are using GCC, they are using a much older version than the linked twitter post, so you won't get a super accurate prediction that way either. Bad data is just bad data*, not much you can do with it. I previously estimated around 5% throttling based on the other tests AT ran, but it's a very rough estimate and Spec is a much longer test than everything else, so it's very possible, if not probable, that there was more throttling during at least some of the Spec tests than the others.

(*I'm saying it's bad data to try and calculate IPC, I have no reason to believe it's not fine as a measurement of the performance you get from STX in that particular laptop).

Nothingness · Aug 2, 2024

CouncilorIrissa said:
Hey, bring that back. This thread can't become any more of a dumpster fire than it already is.

Well if you insist.

You made it sound like 2x4 decoders would bring 2x16%=32%.

The stupid joke was that %ages don't add, they multiply. So if you have one 4 x decoder bringing 16% improvement *and* you add a second one then the improvement will be almost 35%, not 32%.

Yeah, that was stupid. But you insisted, so I'll put the blame on you 😀
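The compounding arithmetic behind the joke can be checked quickly. This is a minimal sketch that takes the joke's own premise at face value (each decoder cluster contributing an independent 16%); it says nothing about how real decoders scale, and `compound` is just an illustrative helper.

```python
def compound(gains):
    """Combine fractional gains multiplicatively, e.g. [0.16, 0.16]."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


added = 0.16 + 0.16                   # naive addition of percentages
multiplied = compound([0.16, 0.16])   # 1.16 * 1.16 - 1

print(f"added:      {added:.2%}")
print(f"compounded: {multiplied:.2%}")
```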

TESKATLIPOKA · Aug 2, 2024

Abwx said:
Strix Point laptops are expensive because there's no competition in this segment, but since the chip is about the same size as Hawk Point it will gradually get cheaper as time goes by.

@FlameTail already pointed out it's bigger and also uses a newer process.
So it will likely get somewhat cheaper as more manufacturers offer it, not just Asus, but not as cheap as Phoenix was (is).

Abwx said:
Strix Halo costs more to manufacture, but that's surely less than, say, an 8840 APU + RX 7600 chip, and AMD will cash in on both the CPU and GPU, so in the mid term it will also be substantially cheaper once some R&D cost is amortized.

Strix Halo uses N4 + N3 and the size is 2x ~66mm² + ~300mm² IOD including the IGP according to leak.
178 mm² 8840 uses N4 + 204mm² RX 7600 using N6.
Yet you think It will be cheaper to manufacture than that CPU+dGPU combo. Not happening.
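Adding up the quoted figures gives a rough sense of the silicon involved. This sketch only sums the areas stated in the post (the Strix Halo numbers are from a leak); area alone is not cost, since the dies are on different nodes and packaging is not counted.

```python
# Die areas in mm², as quoted in the post above (Strix Halo figures are leaked).
strix_halo_mm2 = {"CCD 1": 66, "CCD 2": 66, "IOD incl. IGP": 300}
combo_mm2 = {"8840 APU": 178, "RX 7600": 204}

halo_total = sum(strix_halo_mm2.values())
combo_total = sum(combo_mm2.values())

print(f"Strix Halo total: ~{halo_total} mm²")
print(f"APU + dGPU total: {combo_total} mm²")
print(f"difference:       ~{halo_total - combo_total} mm²")
```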

inquiss · Aug 2, 2024

What's it got to do with manufacturing cost? The point is they get to set the price for both, and a Strix Halo sold "may" be an Nvidia GPU not sold. They get the profit margin for "both" chips and can sell that for cheaper overall margin than the combo and make more margin (in $) than they would make on just the CPU alone.

jdubs03 · Aug 2, 2024

Hitman928 said:
Even though they are using GCC, they are using a much older version than the linked twitter post, so you won't get a super accurate prediction that way either. Bad data is just bad data*, not much you can do with it. I previously estimated around 5% throttling based on the other tests AT ran, but it's a very rough estimate and Spec is a much longer test than everything else, so it's very possible, if not probable, that there was more throttling during at least some of the Spec tests than the others.

(*I'm saying it's bad data to try and calculate IPC, I have no reason to believe it's not fine as a measurement of the performance you get from STX in that particular laptop).

Just looked at the version history, 13.3 came out May 21, 2024 and the latest 14.2 came out yesterday. The version history isn’t sequential which is interesting (as there were two releases in between with lower version numbers). Y’all probably know why better than me. But in terms of versioning it was the highest at the most recent time.

How much would that affect the results? I can’t imagine it would be material.

Hitman928 · Aug 2, 2024

jdubs03 said:
Just looked at the version history, 13.3 came out May 21, 2024 and the latest 14.2 came out yesterday. The version history isn’t sequential which is interesting (as there were two releases in between with lower version numbers). Y’all probably know why better than me. But in terms of versioning it was the highest at the most recent time.

How much would that affect the results? I can’t imagine it would be material.

I meant that AT used an older version of GCC, but it's wrong anyway. I double checked and AT used Clang 10.0.0 (edit: much older than Geekerwan used as well so I guess the point still stands).

StefanR5R · Aug 2, 2024

Abwx said:
I guess we need a few more tests to definitely have an accurate picture; anyway, there are only 5 days left before everything is crystal clear.

The Zenbook S 16 reviews were all rushed.
The Granite Ridge reviews on August 7 (if there will be such reviews on August 7) will all be rushed too. But at least they will be performed on desktop computer platforms.

MS_AT · Aug 2, 2024

Generally you should look at the environment used for running the test and at the compiler flags. For example, David Huang is running the tests on native Linux, so in his case core pinning will work [telling the OS to run the test on a given core without migrating it to other cores, since migration lowers performance], while the outlet shown today is using WSL2 [Windows Subsystem for Linux version 2], which is a Hyper-V virtual machine running Linux. In this setup core pinning is unlikely to work, so they may think they are measuring a Zen5c core, but the workload can migrate to a Zen5 core if the hypervisor feels like it, and the guest OS [Linux in this case] won't be any the wiser. So the best thing one can do is use native Linux, as only native Linux will give you certainty that you are measuring the core you think you are measuring. [Or native Windows, but it seems SPEC is targeting Linux].
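For readers unfamiliar with core pinning, here is a minimal sketch of what it looks like from user space on Linux, using Python's `os.sched_setaffinity`. The helper name `pin_to_core` is made up for illustration, and which logical CPU number maps to a Zen5 or Zen5c core is machine-specific and not something this snippet can tell you. Inside a virtual machine the numbers refer to virtual CPUs, which is exactly the caveat described above.

```python
import os


def pin_to_core(core):
    """Restrict the current process to one logical CPU and return the new mask.

    Only works where os.sched_setaffinity exists (Linux). In a VM this pins
    to a *virtual* CPU; the hypervisor may still move that vCPU between
    physical cores.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    allowed = os.sched_getaffinity(0)  # 0 means the calling process
    if core not in allowed:
        raise ValueError(f"CPU {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    first_allowed = min(os.sched_getaffinity(0))
    print("now pinned to:", pin_to_core(first_allowed))
```

The same effect is commonly achieved from the shell with `taskset -c <cpu> <command>`.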

When it comes to compiler flags, Anandtech is not using any CPU-specific tuning [like -march=native] but ensures the AVX2 and FMA extensions are enabled, which means the compiler can emit AVX2 instructions. Geekerwan is not enabling AVX or AVX512, so at best SSE2 will be used, as iirc that is the baseline default for x86-64. David Huang is using -march=native, or the closest predecessor if the compiler doesn't know the tested architecture [wasn't patched to support it yet]. This is sensible behaviour, as otherwise it would penalize CPUs that are too new [as -march=native will default to the baseline, so SSE2 in the case of x86-64, but most of them support AVX2 at least], but at the same time it ensures new features like AVX512 or SME can be used if the compiler supports them. I think they are all using -Ofast, which gives the compiler more leeway to vectorize by ignoring strict rules about FP math. [For example, (a + b) + c might be different from a + (b + c) in FP math].
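The FP rule that relaxed-math flags let the compiler ignore is about grouping (associativity), not operand order. A small sketch with IEEE 754 doubles shows the difference; the values are arbitrary, picked only because they expose the rounding.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # round after a+b, then add c
right = a + (b + c)   # round after b+c, then add a

# Regrouping changes the rounding steps, so the results can differ.
print("(a + b) + c =", repr(left))
print("a + (b + c) =", repr(right))
print("equal after regrouping:", left == right)

# Swapping the two operands of a single addition does not change the result.
print("a + b == b + a:", a + b == b + a)
```

A vectorized sum adds elements in a different grouping than a scalar loop, which is why a compiler needs permission to reorder before it can vectorize such reductions.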

Then when it comes to newer compiler versions they sometimes learn new tricks, get patched with cost tables for newer architectures or bugfixes. But sometimes they regress so older version can give better results on given hardware, that is not unheard of.

So since people who so far presented SPEC results are using different environments and different compilers with different compiler options, comparing their results is more like comparing apples to oranges than apples to apples.

yuri69 · Aug 2, 2024

mostwanted002 said:
>RDNA double-issue flashbacks.

Not comparable. Compiler patches revealed the horrible double-issue restrictions pretty early. It was "wtfed" very early in the hype cycle.

Joe NYC · Aug 2, 2024

DisEnchantment said:
Z4 already has dual SDP per CCD (EPYC GMI wide), so at the very least with dense fanout interconnect they can enable both SDPs and still consume less than half the energy compared to DT CCD.
So it would have double the BW of the DT CCD if they do this at least.

But that sounds like bare minimal effort, some new innovation should be there.

The other interesting aspect is the MALL, which can do aggressive prefetching for hiding memory latencies (as done on MI300), the one on RDNA2/3 is not capable of this.
I again hope this is the one they use not the one from RDNA3.

According to some rumors, there will also be LP cores as part of the giant SoC, and the LP cores will also benefit from MALL memory, which can, in low power situation, act as its L3.

Joe NYC · Aug 2, 2024

TESKATLIPOKA said:
Strix Halo is really big. Anyone hoping it will be relatively cheap should forget about that; just look at what they ask for a Strix Point laptop.
My prediction is >2000 euro and for that only 4070 80W level of performance is not very good.
In my opinion Strix Halo is not aimed for gamers, that's just secondary. The main selling point is the 16C32T CPU paired with 64-128GB RAM.

Or, a GPU attached to 128 GB of RAM, which is more than H100 offers for ~$10,000 - $30,000 (depending on who you ask).

poke01 · Aug 2, 2024

TESKATLIPOKA said:
The main selling point is the 16C32T CPU paired with 64-128GB RAM.

This is it. It’s not a gamer part. Its competition will be the M3 Max, and it targets ML usage. The 128GB SKU will likely be >$2500

Joe NYC · Aug 2, 2024

adroc_thurston said:
MoP means SKU spam.

Intel has so far managed with only 2 SKUs, IIRC.

poke01 · Aug 2, 2024

Joe NYC said:
Intel has so far managed with only 2 SKUs, IIRC.

Exactly, if you’re going to solder RAM might as well use the best implementation. MoP saves board space and enables higher busses on laptops. AMD just didn’t want to go all out I guess.

If AMD used MoP, they would likely have 3-4 SKUs: 32, 64, and 128GB RAM SKUs, and one flagship SKU with the full cores and clocks.

HurleyBird · Aug 2, 2024

TESKATLIPOKA said:
Strix Halo uses N4 + N3 and the size is 2x ~66mm² + ~300mm² IOD including the IGP according to leak.
178 mm² 8840 uses N4 + 204mm² RX 7600 using N6.
Yet you think It will be cheaper to manufacture than that CPU+dGPU combo. Not happening.

That CPU+dGPU combo is more expensive in every single way that isn't the manufacturing cost of the silicon. More complex PCB, more complex layout, more complex cooling, two different memory pools, etc.

It still won't be cheaper, but might not be as much more expensive than you think.

gdansk · Aug 2, 2024

poke01 said:
This is it. It’s not a gamer part. Its competition will be the M3 Max, and it targets ML usage. The 128GB SKU will likely be >$2500

I'm pretty sure it is a gamer part by design. Why else would it use an RDNA variant, which still doesn't have good ROCm support? They'll have to market it as something else because it isn't competitive where it was aiming.

Joe NYC · Aug 2, 2024

TESKATLIPOKA said:
@FlameTail already pointed out it's bigger and also uses a newer process.
So it will likely get somewhat cheaper as more manufacturers offer it, not just Asus, but not as cheap as Phoenix was (is).

Strix Halo uses N4 + N3 and the size is 2x ~66mm² + ~300mm² IOD including the IGP according to leak.
178 mm² 8840 uses N4 + 204mm² RX 7600 using N6.
Yet you think It will be cheaper to manufacture than that CPU+dGPU combo. Not happening.

That die size is a bit of a shock. I think most people estimated 200 to 250 mm2.

We will see what it will end up having, between LP cores, possibly bigger NPU, MALL. But it still seems too big on N3E... Or maybe the die size info may turn out to be not correct...

Since this is, in effect, a prototype, cost was not the primary concern, just establishing a new niche.

Similar to Lunar Lake, that was supposed to be a niche, so Intel did not care about the cost. But Intel may be forced to sell it in wider market, with low margins...

From AMD POV on Strix Halo, it is just how aggressive AMD wants to be establishing this new market segment. If successful, AMD can introduce a more cost optimized version for next gen, with more cost optimized chiplet arrangement...

Abwx · Aug 2, 2024

Projected Strix Halo based design by Asus :

ASUS preparing ROG Z13 Flow 2025, a gaming tablet with AMD 110W Strix Halo APU - VideoCardz.com

ASUS ROG Flow Z13 gets Strix Halo, a next-gen gaming tablet A Taiwanese company will put a 100W+ APU inside a tablet. Some interesting details have been leaked regarding the next-gen AMD Strix Halo product planning. The leak covers the next generation of the ASUS ROG Z13 Flow 2025, a compact...

videocardz.com

adroc_thurston · Aug 2, 2024

Joe NYC said:
Intel has so far managed with only 2 SKUs, IIRC.

Have you seen the LNL SKU list?

gdansk said:
Why else would it use an RDNA variant which still doesn't have good ROCm support

Because that's the only GFX IP AMD has that draws triangles.
Either way RDNA ROCm support will be streamlined once SPIR-V support gets mainlined.

TESKATLIPOKA said:
178 mm² 8840 uses N4 + 204mm² RX 7600 using N6.

It's a lot more performant than that.

Doug S · Aug 2, 2024

MS_AT said:
So since people who so far presented SPEC results are using different environments and different compilers with different compiler options, comparing their results is more like comparing apples to oranges than apples to apples.

But they aren't doing stuff like replacing malloc libraries or using PGO, so I'd argue the results reported by Anandtech (when they had people doing that) or Geekerwan are more useful for comparison than the "official" submissions.

There isn't any effective way to standardize between, say, macOS and Windows, nor should there be. Pretty much every developer on macOS and iOS uses Xcode, so using the latest Xcode release with some basic optimization flags that ordinary developers might use is how Apple Silicon performance should be demonstrated. It doesn't matter if it performs better or worse running Asahi Linux; that's not what 99.9% of Mac buyers are running. Likewise, on Windows you'd want to use the MS C compiler, though there are arguments for using vendor compilers since some developers may do so.

Trying to make them all equal by saying "OK we'll use gcc on everything as the lowest common denominator" might level the playing field, but the information you get doesn't really prove anything. The goal isn't "how does M4 compare against Zen 5" in some sort of abstract sense divorced from the realities of the Mac and PC platforms, the macOS and Windows APIs and development environments, etc. If Zen 5 performs better under Linux or M4 performed better using DDR5 instead of LPDDR5X that's not relevant as far as I'm concerned, because that's not how those CPUs are used (unless you are the 2% or whatever like me and actually do run Linux on your desktop)

The problem is SPEC is a pain to run, so asking people to re-run it just because there's a new compiler rev or something just isn't worth the trouble. If you want something you can run often there's Geekbench. It's just too bad it is so bad as far as benchmark repeatability goes, but that's going to be true of any benchmark that runs quickly in today's world, where you have a bunch of cores and they can all adjust their frequencies moment to moment depending on temperature, load and the phase of the moon.

Joe NYC · Aug 2, 2024

adroc_thurston said:
Have you seen the LNL SKU list?

Maybe what I have seen was not a complete list. It had only 2 SKUs.

adroc_thurston said:
It's a lot more performant than that.

Any theory about the 300 mm2 die size? It seems higher than most people expected, especially being on N3.

poke01 · Aug 2, 2024

adroc_thurston said:
Have you seen the LNL SKU list?

That's Intel binning by clock and GPU count. If AMD binned by RAM there should only be 3 SKUs.

Intel Core Ultra 200V "Lunar Lake" lineup allegedly leaks out, features one Core Ultra 9 SKU - VideoCardz.com

Intel Core Ultra 9 288V could be Lunar Lake flagship SKU Launch could feature more SKUs than previously expected. In the coming weeks, expect to hear more about Intel’s upcoming Core Ultra 200V series, codenamed Lunar Lake. Intel’s announcements at Computex focused heavily on this series, which...

videocardz.com
