Speculation: Ryzen 4000 series/Zen 3

cherullo · Apr 6, 2020

Thanks @DisEnchantment , those are nice patches.

The first patch is more about RAS features, there is very little information that could help us deduce other behaviour/features.
It pays to be very specific about those errors; it's the natural path, and DDR5 will bring even more powerful memory error reporting features later on.
That said, the L3 cache in Zen2 already acts as a probe filter for the CCX. Quoting https://en.wikichip.org/wiki/amd/microarchitectures/zen_2 :

The L3 cache maintains shadow tags for all cache lines of each L2 cache in the CCX. This simplifies coupled fill/victim transactions between the L2 and L3 cache, and allows the L3 cache to act as a probe filter for requests between the L2 caches in the CCX, external probes and, taking advantage of its knowledge that a cache line shared by two or more L2 caches is exclusive to this CCX, probe traffic to the rest of the system.

This wikichip page is very interesting and well worth a read.

Looking at the second patch, I was expecting to find some evidence of SMT2 or SMT4, but I found nothing conclusive. I don't know what those masks are used for and I'm not sure I found the correct implementation for the topology_is_primary_thread function.

DisEnchantment · Apr 6, 2020

cherullo said:
The first patch is more about RAS features, there is very little information that could help us deduce other behaviour/features.

Indeed you could be right (hence the wild speculation disclaimer), because it could mean the probe filter as implemented in Zen2.
But then, this is a new Load/Store implementation, and I was thinking of coherency probe messages rather than the Zen2-style implementation. So fingers crossed.
Will try to do more digging.

cherullo said:
It pays to be very specific about those errors, it's the natural path, DDR5 will bring even more powerful memory error reporting features later on.

The error strings are only from the Load/Store bank type. Zen2 added a new SMU, Coherent Slave and PSP. Zen3 added a new Load/Store unit. So maybe in Zen4 we can expect a new UMC (and it's something to look out for in the kernel patches).

cherullo said:
Looking at the second patch, I was expecting to find some evidence of SMT2 or SMT4, but I found nothing conclusive. I don't know what those masks are used for and I'm not sure I found the correct implementation for the topology_is_primary_thread function.

The second patch is confirming what is already shown in the AMD slides from Martin @ HPC Advisory council.
Specifically, it attempts to filter perf counters per core, and in this case there is a 3-bit-wide mask for the core ID per PMU (cores 0 to 7), indicating 8 cores per uncore/L3, i.e. 8 cores share one L3/uncore per CCX.
There is no indication of SMT4 in this patch.
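To make the mask argument concrete, here is a minimal sketch of why a 3-bit core ID field implies at most 8 cores behind one L3/uncore. The constant names and the idea that the core ID sits in the low bits are assumptions for illustration only; this is not the field layout used in the kernel patch.

```python
# Hypothetical layout: the core ID occupies the low 3 bits of a selector value.
# Illustration only; the real perf event encoding in the patch may differ.
CORE_ID_BITS = 3
CORE_ID_MASK = (1 << CORE_ID_BITS) - 1  # 0b111


def addressable_cores(bits: int) -> int:
    """Number of distinct core IDs an n-bit field can encode."""
    return 1 << bits


def extract_core_id(selector: int) -> int:
    """Keep only the core ID bits of a selector value."""
    return selector & CORE_ID_MASK


if __name__ == "__main__":
    print(addressable_cores(CORE_ID_BITS))          # distinct IDs: cores 0 to 7
    print([extract_core_id(s) for s in range(10)])  # IDs wrap after 7
```

A 2-bit field would cover only 4 cores (a Zen2-sized CCX), which is why the wider mask reads as a hint toward 8 cores per L3.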

uzzi38 · Apr 6, 2020

I'm the big dumb.

I forgot one more reason to believe Van Gogh isn't semi-custom, and it was a pretty big sign.

It's in the Linux GPU drivers. Each RDNA product has an identifier along the lines of gfx10XY. The X is like a family of products, and the Y is the number of the product in the family, i.e. the order in which the design was created. It's fundamentally the same as Navi1X/2X, but only includes dies that made it past a later stage on the way to becoming a real product. That is to say, for example, Navi11 - a dead project - does not have a gfx number.

Why is this detail important? Well, let me go through the numbers for some dies and their final products.

Navi10Lite - gfx1000 (PS5)
Navi14Lite - gfx1001 (Lockhart?)
Navi10 - gfx1010 (5700XT/5700/5600XT)
Navi12 - gfx1011 (Unknown, but 40CUs and HBM2)
Navi14 - gfx1012 (5500XT/5500M/5300M)
Navi21Lite - gfx1020 (Xbox Series X)
Navi21 - gfx1030 (Rumour: ~500mm^2)
Navi22 - gfx1031 (Rumour: ~250mm^2)
Navi23 - gfx1032 (?)
VanGogh - gfx1033
VanGoghLite - gfx1040

So what does all of that mean? Well, for starters, anything Lite is semi-custom. That should be fairly obvious given the fact that the PS5 and Xbox SoCs are both there.

Next thing you might notice is the pattern there.

gfx1000, 1020 and 1040 are all used for semi-custom projects.

gfx1010 is RDNA1.

gfx1030 is RDNA2.
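The pattern can be sketched as a small decoder. This is only an illustration of the reasoning in this post: the table is copied from the list above, and the family labels are the guesses made here, not anything confirmed by the drivers. The names `GFX_IDS`, `FAMILY_GUESS`, `decode` and `classify` are made up for the example.

```python
import re
from typing import Tuple

# Die name -> gfx ID, as listed in the post (product guesses are speculation).
GFX_IDS = {
    "Navi10Lite": "gfx1000",
    "Navi14Lite": "gfx1001",
    "Navi10": "gfx1010",
    "Navi12": "gfx1011",
    "Navi14": "gfx1012",
    "Navi21Lite": "gfx1020",
    "Navi21": "gfx1030",
    "Navi22": "gfx1031",
    "Navi23": "gfx1032",
    "VanGogh": "gfx1033",
    "VanGoghLite": "gfx1040",
}

# Family digit X -> label, following the pattern guessed in the post.
FAMILY_GUESS = {0: "semi-custom", 1: "RDNA1", 2: "semi-custom", 3: "RDNA2", 4: "semi-custom"}


def decode(gfx: str) -> Tuple[int, int]:
    """Split a gfx10XY identifier into (family X, index Y)."""
    match = re.fullmatch(r"gfx10(\d)(\d)", gfx)
    if match is None:
        raise ValueError(f"not a gfx10XY identifier: {gfx!r}")
    return int(match.group(1)), int(match.group(2))


def classify(gfx: str) -> str:
    """Label an identifier using the guessed family pattern."""
    family, _ = decode(gfx)
    return FAMILY_GUESS.get(family, "unknown")


if __name__ == "__main__":
    for die, gfx in GFX_IDS.items():
        print(f"{die:12s} {gfx}  {classify(gfx)}")
```

Run against the table, every Lite die lands in a semi-custom family and every non-Lite die lands in RDNA1 or RDNA2, which is the whole argument for Van Gogh being a standard part.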

It's worth noting that as semi-custom projects they likely have features their gfx names don't let on. For example, the PS5 is closer to RDNA2 in terms of its performance, given how it can clock so high. Though that might be after future revisions (Oberon, Flute etc). The RTRT functionality was probably brought over from RDNA2 right from the beginning, but no clue as to the rest.

So then, what are we left with for Van Gogh? Well, it's bog-standard RDNA2 for starters. Same as the main Navi2X lineup. There's also a separate semi-custom flavour of Van Gogh. My own personal guess says it's Mero.

That would paint Mero as the semi-custom version of Van Gogh. Possibly with some features from RDNA3... but I kind of doubt that, seems too early for that.

The one oddball in all of this speculation is Navi12. Till date we have absolutely no idea what on Earth that die is for. No benchmarks, nothing. It's weird, because we know it's 40CUs, we know those CUs don't clock high and we know that it can sport 2 stacks of HBM2. Aside that, it's anyone's guess. But it does leave open the possibility that Lite might NOT mean semi-custom... but only a small possibility. Would help if we knew what Navi12 was.

TL;DR - Linux drivers kinda give away the fact that Van Gogh probably isn't semi-custom because it has the same naming scheme as other standard RDNA dies.

EDIT: Some typos and mistakes were corrected.

Glo. · Apr 6, 2020

uzzi38 said:
The one oddball in all of this speculation is Navi12. Till date we have absolutely no idea what on Earth that die is for. No benchmarks, nothing. It's weird, because we know it's 40CUs, we know those CUs don't clock high and we know that it can sport 2 stacks of HBM2. Aside that, it's anyone's guess. But it does leave open the possibility that Lite might NOT mean semi-custom... but only a small possibility. Would help if we knew what Navi12 was.

Apple needs a GPU to replace Navi 14 in MBP 16 at the end of 2020.

Navi 12 fits that perfectly, with HBM2 stack, very low-clocks and 50W TDP.

uzzi38 · Apr 6, 2020

Glo. said:
Apple needs a GPU to replace Navi 14 in MBP 16 at the end of 2020.

Navi 12 fits that perfectly, with HBM2 stack, very low-clocks and 50W TDP.

Navi22 is better for that though, and if we're talking EoY doesn't N22 make more sense?

Glo. · Apr 6, 2020

uzzi38 said:
Navi22 is better for that though, and if we're talking EoY doesn't N22 make more sense?

Navi 22, if it is a replacement for Navi 10, yes, makes more sense, but if Apple wants HBM2 that again will require redesigning the chip.

Well there is still this possibility that Navi 12 is a dead project at AMD...

darkswordsman17 · Apr 6, 2020

DisEnchantment said:
The following merged Linux kernel patches suggest major changes in the Load/Store architecture of Zen3/Family 19h

SMCA Patches for Family 19h
New Load store architecture for Zen3/Family 19h [Jan 10th, 2020]

https://github.com/torvalds/linux/commit/89a76171bf50bd20d44338408b8c09433c302956#diff-468c185c1fd58d8f491a5eed2a15825b

Changes in L3 hierarchy suggested by this perf patch

https://github.com/torvalds/linux/commit/e48667b865480d8bf0f1171a8b474ffc785b9ace#diff-da441899381d8c9f80c02f02999a5c90

When I read the MCA error strings for the LS_V2, it strikes me that there is some kind of correlation with the patent applications made public recently.
Patent Applications
20200065275, 20200099993, 20190108861, 20190199617
See also this post

Page 11 - Discussion - Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

forums.anandtech.com

It seems to me that the Zen3 could in fact be the next stepping stone for the chiplet and Infinity Architecture instead of the direct jump and everything implemented in Zen4.
Zen1 introduced NUMA MCM processors.
Zen2 introduced cache coherent Chiplet processors
Zen3 will introduce something as radical as Zen1/2

[WILD SPECULATION]

Reading the error strings, it seems the L3 could be poisoned by a load/store/eviction (victimization) AND a "probe".
The probe mechanism is described in the above patents and is triggered by the coherency probe messages from other CCXs. There are mentions of address and state as described in the patents.
CCXs are cache coherent at L3 level.
This implies that some extent of the new on-chip communication coherent fabric is implemented in Zen3. Fingers crossed.

What could this mean
- A direct CCX to CCX communication is possible without the IOD.
- The IOD serves as the base die where the routing logic resides.
- Active blocks could still reside on the IOD but the communication is not hierarchical but rather bus/mesh oriented.
- There could be a reduction in latency compared to Zen for inter CCX due to direct die to die communication
- Reduction in memory access latency as the CCXs can access the memory controller attached to a coherent slave

The Zen IMC is also cache aware and can do speculative access. This has been described in some past patents. I am not sure it will apply to Zen3 but it does make sense.
This kind of implementation has already been done before albeit not with a leading edge process and high performance cores... and guess what, made public only recently.
Links for the first ever implementation of a NoC with chiplet and active interposer.

CEA-Leti Demos a 6-Chiplet 96-Core 3D-Stacked MIPS Processor

CEA-Leti demonstrates a high-performance microprocessor architecture with a 96-core MIPS processor built with six chiplets 3D-stacked on an active interposer die.

fuse.wikichip.org

Active Interposer Technology for Chiplet-Based Advanced 3D System Architectures

We report the first successful technology integration of chiplets on an active silicon interposer, fully processed, packaged and tested. Benefits of chiplet-based architectures are discussed. Built up technology is presented and focused on 3D interconnects process and characterization. 3D...

ieeexplore.ieee.org

[/WILD SPECULATION]

So you're saying SMT4? (Sorry I had to!)

Saylick · Apr 7, 2020

darkswordsman17 said:
So you're saying SMT4? (Sorry I had to!)

Veradun · Apr 7, 2020

uzzi38 said:
The one oddball in all of this speculation is Navi12. Till date we have absolutely no idea what on Earth that die is for. No benchmarks, nothing. It's weird, because we know it's 40CUs, we know those CUs don't clock high and we know that it can sport 2 stacks of HBM2.

Can it be a specialized Instinct for DL?

uzzi38 · Apr 7, 2020

Veradun said:
Can it be a specialized Instinct for DL?

Actually that's a really good suggestion. Iirc Navi supports INT4 and INT8. Definitely the latter, not 100% sure about the former.

That actually REALLY makes sense.

uzzi38 · Apr 7, 2020

Glo. said:
Navi 22, if it is a replacement for Navi 10, yes, makes more sense, but if Apple wants HBM2 that again will require redesigning the chip.

Well there is still this possibility that Navi 12 is a dead project at AMD...

Yes, but it's worth taking into account RDNA2's power efficiency uplift.

Navi22 is probably good enough for desktop ~5600XT or maybe even 5700 performance in a notebook chassis.

(As in, within a 90W TDP)

DisEnchantment · Apr 7, 2020

uzzi38 said:
The one oddball in all of this speculation is Navi12. Till date we have absolutely no idea what on Earth that die is for. No benchmarks, nothing. It's weird, because we know it's 40CUs, we know those CUs don't clock high and we know that it can sport 2 stacks of HBM2. Aside that, it's anyone's guess. But it does leave open the possibility that Lite might NOT mean semi-custom... but only a small possibility. Would help if we knew what Navi12 was.

Not the right thread to add this... but
LLVM is used by AMD as the shader compiler, and from there we can see the following feature list for GFX1011:

[FeatureGFX10,
FeatureLDSBankCount32,
FeatureDLInsts,
FeatureDot1Insts,
FeatureDot2Insts,
FeatureDot5Insts,
FeatureDot6Insts,
FeatureNSAEncoding,
FeatureWavefrontSize32,
FeatureScalarStores,
FeatureScalarAtomics,
FeatureScalarFlatScratchInsts,
FeatureDoesNotSupportXNACK,
FeatureCodeObjectV3]

GFX1011 has many low-precision matrix operations not present in Navi10. Additionally, it does not have the LDS bug present in Navi10 either.
Strangely, the GMCv10 code in amdgpu does not show HBM support, for now at least. In contrast, GMCv9 has VRAM width calculation for both HBM and GDDR. Unless it is not meant for Linux?

Glo. · Apr 7, 2020

uzzi38 said:
Yes, but it's worth taking into account RDNA2's power efficiency uplift.

Navi22 is probably good enough for desktop ~5600XT or maybe even 5700 performance in a notebook chassis.

(As in, within a 90W TDP)

If RDNA2 is what people are touting, then RX 5600 XT performance is a very... conservative bet.

Unless Navi 22 is replacing Navi 14, and the die sizes of RDNA2 GPUs are bloated similarly to Turing's.

RDNA2 should once more bring around a 25% IPC increase, and that is not accounting for clock increases. Well, that 50% performance per watt uplift has to come from somewhere.

Thunder 57 · Apr 7, 2020

Glo. said:
If RDNA2 is what people are touting, then RX 5600 XT performance is very... conservative bet .

Unless Navi 22 is replacing Navi 14, and the die sizes of RDNA2 GPUs are bloated similarly to Turing's.

RDNA2 should bring once more around 25% IPC increase, and we are not counting for clock increases. Well that 50% performance per watt uplift has to come from somewhere.

I have a feeling RDNA1 was kinda rushed and RDNA2 will be what RDNA should have been all along. Should be interesting, waiting to learn more about it as it may be my next GPU.

Glo. · Apr 7, 2020

Thunder 57 said:
I have a feeling RDNA1 was kinda rushed and RDNA2 will be what RDNA should have been all along. Should be interesting, waiting to learn more about it as it may be my next GPU.

There are plenty of variables. But if anything, both the efficiency of the XSX and the raw GPU clock speeds of the PS5 are a good indication of what to expect from RDNA2 GPUs.

My dream, however, and I mean dream, is that AMD will push the memory bandwidth envelope from top to bottom. And I wish that we will see a 192-bit GPU with a die size under 200 mm².

Thunder 57 · Apr 7, 2020

Glo. said:
There is plenty of variables. But if anything, Both: efficiency of XSX and Raw GPU Clock speeds of PS5 GPU is good indication of what to expect from RDNA2 GPUs.

My dream, however, and I mean dream, is if AMD will push the memory bandwidth envelope from top to bottom. And I wish that we will see 192 bit GPU that has under 200 mm2 die size.

192 bit is just a nonstarter for me because of the RAM configurations. 6GB is too little (see Doom Eternal) and 12GB probably would be too costly and not make sense.

Glo. · Apr 7, 2020

Thunder 57 said:
192 bit is just a nonstarter for me because of the RAM configurations. 6GB is too little (see Doom Eternal) and 12GB probably would be too costly and not make sense.

Sure, but it also pushes the VRAM in the sub-$200 price bracket from 4 GB to 6 GB, and the bandwidth from 192/224 GB/s to over 300 GB/s.
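For anyone who wants to check those numbers, here is a rough sketch of the arithmetic. It assumes GDDR6 at 12 or 14 Gbps per pin, 32-bit-wide memory chips in 8 Gbit or 16 Gbit densities, and no clamshell mode; those figures are assumptions for the example, not something stated in this thread.

```python
from typing import List, Sequence


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def capacity_options_gb(
    bus_width_bits: int,
    chip_densities_gbit: Sequence[int] = (8, 16),  # assumed GDDR6 densities
    chip_width_bits: int = 32,                     # assumed one chip per 32-bit channel
) -> List[int]:
    """VRAM sizes in GB reachable with one chip per 32 bits of bus."""
    chips = bus_width_bits // chip_width_bits
    return [chips * density // 8 for density in chip_densities_gbit]


if __name__ == "__main__":
    for width, rate in ((128, 12), (128, 14), (192, 14)):
        print(f"{width}-bit @ {rate} Gbps: {peak_bandwidth_gbs(width, rate):.0f} GB/s")
    print("128-bit capacities (GB):", capacity_options_gb(128))
    print("192-bit capacities (GB):", capacity_options_gb(192))
```

Under those assumptions a 128-bit card lands at 192 or 224 GB/s with 4 GB or 8 GB, and a 192-bit card at 14 Gbps lands at 336 GB/s with 6 GB or 12 GB, which is where the awkward 6/12 GB choice comes from.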

Thunder 57 · Apr 7, 2020

Glo. said:
Sure, but it also pushes the VRAM in sub 200$ price bracket from 4 GB to 6 GB's and from 192/224GB/s to over 300 GB/s.

Fair point. I'm just glad I went with the RX 480 8GB ($239) vs 4GB ($199) because of Doom Eternal. Who would've thought $40 four years ago would make such a difference in games like Doom Eternal and Battlefield.

I penny pinched and got the 3570k instead of the 3770k and those extra four threads would've helped a ton in BF1. Upgrading the CPU with the same RX 480 made a significant difference in some of the more "busy" maps.

maddie · Apr 7, 2020

Glo. said:
There is plenty of variables. But if anything, Both: efficiency of XSX and Raw GPU Clock speeds of PS5 GPU is good indication of what to expect from RDNA2 GPUs.

My dream, however, and I mean dream, is if AMD will push the memory bandwidth envelope from top to bottom. And I wish that we will see 192 bit GPU that has under 200 mm2 die size.

Shouldn't you just wish that the GPUs are balanced instead of predetermining what is needed? Unless you have some clairvoyant abilities of course. Starved for bandwidth is bad, but paying for something you don't need is not really better.

Glo. · Apr 8, 2020

maddie said:
Shouldn't you just wish that the GPUs are balanced instead of predetermining what is needed? Unless you have some clairvoyant abilities of course. Starved for bandwidth is bad, but paying for something you don't need is not really better.

Well, ray tracing requires an insane amount of memory bandwidth. And RDNA2 has RT tech from top to bottom, because it's an inherent part of the architecture.

Glo. · Apr 8, 2020

uzzi38 said:
I'm the big dumb.

I forgot one more reason to believe Van Gogh isn't semi-custom, and it was a pretty big sign.

It's in the Linux GPU drivers. So each RDNA product has an identifier along the lines of gfx10XY. The X is like a family of products, the Y is the number of the product in the family - the order in which the design was created. It's fundamentally the same as Navi1X/2X, but only includes dies past a later stage on becoming a real product. That is to say for example, Navi11 - a dead project - does not have a gfx number.

Why is this detail important? Well, let me go through the numbers for some dies and their final products.

Navi10Lite - gfx1000 (PS5)
Navi14Lite - gfx1001 (Lockhart?)
Navi10 - gfx1010 (5700XT/5700/5600XT)
Navi12 - gfx1011 (Unknown, but 40CUs and HBM2)
Navi14 - gfx1012 (5500XT/5500M/5300M)
Navi21Lite - gfx1020 (Xbox Series X)
Navi21 - gfx1030 (Rumour: ~500mm^2)
Navi22 - gfx1031 (Rumour: ~250mm^2)
Navi23 - gfx1032 (?)
VanGogh - gfx1034
VanGoghLite - gfx1040

So what does all of that mean, well, for starters, anything Lite is semi-custom. That should be fairly obvious given the fact that the PS5 and Xbox SoCs are both there.

Next thing you might notice is the pattern there.

gfx1000, 1020 and 1040 are all used for semi-custom projects.

gfx1010 is RDNA1.

gfx1030 is RDNA2.

It's worth noting that as semi-custom projects they likely have features their gfx names don't let on. For example, the PS5 is closer to RDNA2 in terms of its performance, given how it can clock so high. Though that might be after future revisions (Oberon, Flute etc). The RTRT functionality was probably brought over from RDNA2 right from the beginning, but no clue as to the rest.

So then, what are we left with for Van Gogh? Well, it's bog-standard RDNA2 for starters. Same as the main Navi2X lineup. There's also a separate semi-custom flavour of Van Gogh. My own personal guess says it's Mero.

That would paint Mero as the semi-custom version of Van Gogh. Possibly with some features from RDNA3... but I kind of doubt that, seems too early for that.

The one oddball in all of this speculation is Navi12. Till date we have absolutely no idea what on Earth that die is for. No benchmarks, nothing. It's weird, because we know it's 40CUs, we know those CUs don't clock high and we know that it can sport 2 stacks of HBM2. Aside that, it's anyone's guess. But it does leave open the possibility that Lite might NOT mean semi-custom... but only a small possibility. Would help if we knew what Navi12 was.

TL;DR - Linux drivers kinda give away the fact that Van Gogh probably isn't semi-custom because it has the same naming scheme as other standard RDNA dies.

EDIT: Some typos and mistakes were corrected.

Getting back to topic. Here is my question.

Does Van Gogh/Mero have an LPDDR5 memory controller? 16 CUs should be fed with ease by 6400 MT/s LPDDR5 on a 128-bit bus (about 102 GB/s of bandwidth shared by the CPU and GPU). With a 256-bit bus, VGH would have about 205 GB/s of memory bandwidth available.

If VGH has LPDDR5 controller maybe it doesn't need HBM2 stacks? Thoughts?
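As a rough check on what such a controller could supply, peak bandwidth is bus width times transfer rate divided by 8. The sketch below assumes LPDDR5-6400 (6400 MT/s per pin) and a 16 CU GPU sharing the bus with the CPU; the function names are made up for the example and real sustained bandwidth would be lower than the peak.

```python
def lpddr5_peak_gbs(bus_width_bits: int, transfer_rate_mts: int = 6400) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin transfer rate (MT/s)."""
    return bus_width_bits * transfer_rate_mts / 8 / 1000


def per_cu_gbs(total_gbs: float, compute_units: int = 16) -> float:
    """Bandwidth per CU if the GPU had the whole bus to itself (it would not)."""
    return total_gbs / compute_units


if __name__ == "__main__":
    for width in (128, 256):
        total = lpddr5_peak_gbs(width)
        print(f"{width}-bit LPDDR5-6400: {total:.1f} GB/s peak, "
              f"{per_cu_gbs(total):.1f} GB/s per CU at 16 CUs")
```

With these inputs a 128-bit bus peaks at 102.4 GB/s and a 256-bit bus at 204.8 GB/s, before the CPU takes its share.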

NTMBK · Apr 8, 2020

Glo. said:
There is plenty of variables. But if anything, Both: efficiency of XSX and Raw GPU Clock speeds of PS5 GPU is good indication of what to expect from RDNA2 GPUs.

My dream, however, and I mean dream, is if AMD will push the memory bandwidth envelope from top to bottom. And I wish that we will see 192 bit GPU that has under 200 mm2 die size.

The memory bus width is directly linked to GPU cost. Wider bus means

-more pins on the GPU, so larger minimum die size
-more traces on the PCB, so probably more layers to be able to route them all
-more memory chips
-higher power consumption, so a larger cooler and more expensive power delivery circuits

I think it makes sense for a premium laptop APU, because it's replacing a 128-bit CPU and a 128-bit GPU with a 192-bit APU, an overall reduction. But for low-end GPUs, it's going to be hard to make the economics work.

uzzi38 · Apr 8, 2020

Glo. said:
Getting back to topic. Here is my question.

Does Van Gogh/Mero have an LPDDR5 memory controller? 16 CUs should be fed with ease by 6400 MT/s LPDDR5 on a 128-bit bus (about 102 GB/s of bandwidth shared by the CPU and GPU). With a 256-bit bus, VGH would have about 205 GB/s of memory bandwidth available.

If VGH has LPDDR5 controller maybe it doesn't need HBM2 stacks? Thoughts?

I would be very, very surprised if AMD went with HBM2, because they're not even planning on doing that with consumer-facing dGPUs afaik. Not anymore anyway.

As for LPDDR5... no clue. I mean, it would explain some things, but I would have thought supply would be too tight, given that even Intel dropped LPDDR5 support from Tiger Lake.

DisEnchantment · Apr 8, 2020

Any chance for one of these Van Gogh derivatives to be destined for a Microsoft Surface device?
I remember AMD saying they were already working on the next-gen device for MS right at the time they launched the AMD Ryzen™ 5 3580U/3780U Microsoft Surface® Edition.

With all the things MS has going with AMD (XSX, Azure, Surface, xCloud), I bet MS could go to AMD and say: guys, give us some rebates for the XSX and we will put more of your stuff on Azure, or give us a custom DC CPU for xCloud or a custom chip for a Surface device.

uzzi38 · Apr 8, 2020

DisEnchantment said:
Any chance for one of these Van Gogh derivatives to be destined for a Microsoft Surface device?
I remember AMD already said they were already working on the next gen device for MS right at the time they launch the AMD Ryzen™ 5 3580U/3780U Microsoft Surface® Edition

With all the things MS is having with AMD ( XSX, Azure, Surface, xCloud ), I bet MS could go to AMD and say, guys give us some rebates for the XSX and we will put more of your stuff on Azure, or give us a custom DC CPU for xCloud or custom Chip for a Surface device.

If anyone were to ask for anything premium from AMD my first guesses would always be Apple and Microsoft, so yes.
