- But is it going to be ready for a top-to-bottom RDNA2 stack? This type of radical technological shift looks like a prime candidate for a pipe-cleaner product or a mid-gen refresh, not a top-to-bottom stack launch.
Wonder if this is the kind of thing that's being kept in the pipe for an RDNA3 launch or even further down the line.
After all the promises about new pathways, discard accelerators, etc. in Vega and Polaris, I would be less surprised if AMD managed to bork the physical design so the feature is useless than if it shipped working.
They do need some outside-of-the-box solution to compete with Nvidia. I don’t think they will compete well if they just make a bigger GPU. They are increasing efficiency significantly, but that seems necessary rather than sufficient. They may have done better than Nvidia on process, though, with TSMC vs. Samsung.
The cache rumors are still a bit odd. Such a cache may help significantly with ray tracing, though. If they are saying it will perform like it has a 384-bit bus with only 256 bits, then it seems like it would need something like 4 IFOP-style links to deliver that kind of bandwidth, given the speed of GDDR6. I guess it could actually be 4x single-link devices, which would be very interesting, but expensive. That might still be cheap though, especially if it was made at GF or something.
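Quick back-of-envelope on that bandwidth gap, just as a sanity check. The 16 Gbps GDDR6 rate and the 50-64 GB/s per-link figures are my own assumptions (ballpark of Zen 2's on-package links), not anything from the rumor:

```python
# Rough check on the "performs like 384-bit from a 256-bit bus" idea.
# Assumptions are mine: 16 Gbps GDDR6 signaling, and roughly 50-64 GB/s
# per IFOP-style link per direction (not a published spec).

GDDR6_GBPS_PER_PIN = 16  # assumed signaling rate per pin

def gddr6_bandwidth_gbs(bus_bits, gbps_per_pin=GDDR6_GBPS_PER_PIN):
    """Peak GDDR6 bandwidth in GB/s for a given bus width."""
    return bus_bits * gbps_per_pin / 8

bw_384 = gddr6_bandwidth_gbs(384)   # 768 GB/s
bw_256 = gddr6_bandwidth_gbs(256)   # 512 GB/s
gap = bw_384 - bw_256               # 256 GB/s the cache/links would have to cover

print(f"384-bit: {bw_384:.0f} GB/s, 256-bit: {bw_256:.0f} GB/s, gap: {gap:.0f} GB/s")
for link_gbs in (50, 64):           # assumed per-link rates
    print(f"at {link_gbs} GB/s per link: ~{gap / link_gbs:.1f} links to cover the gap")
```

At those assumed per-link rates you land at roughly 4 to 5 links, which is why 4 IFOP-style links seems like the right order of magnitude.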
This had me wondering how such a device could be reused with Epyc processors. For Epyc, the most sense would be as an additional chip that sits between the IO die and the CPU dies without really needing to change either one. If they could fit such a cache chip on either side of the Epyc IO die, they could have 2x 128 MB transparent L4 caches per Epyc package, provided the cache chip has 8 IFOP links rather than just 4: four connecting to the IO die and the other four to the CPU dies. The problem is, such a device would be quite large, possibly 200 to 250 square mm or so, which probably wouldn’t fit with the existing Epyc package layout.
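For what it's worth, a very rough area estimate lands in that range. All of these density figures are my assumptions: ~1 square mm per MB of 7nm SRAM including tags/control (roughly in line with Zen 2's L3 footprint), ~5 square mm per IFOP-style PHY, and ~30% overhead for fabric, power delivery and misc.

```python
# Very rough sanity check of the 200-250 square mm guess for a 128 MB cache die.
# All density figures below are assumptions, not published numbers.

MM2_PER_MB_SRAM = 1.0    # assumed 7nm SRAM density incl. tags/control
MM2_PER_IFOP_PHY = 5.0   # assumed per-link PHY area
OVERHEAD = 1.3           # assumed fabric/power/misc multiplier

def cache_die_area_mm2(cache_mb, num_links):
    """Estimated die area for a cache chip with the assumptions above."""
    return (cache_mb * MM2_PER_MB_SRAM + num_links * MM2_PER_IFOP_PHY) * OVERHEAD

print(f"128 MB, 4 links: ~{cache_die_area_mm2(128, 4):.0f} mm^2")
print(f"128 MB, 8 links: ~{cache_die_area_mm2(128, 8):.0f} mm^2")
```

That comes out around 190-220 square mm, so the 8-link version really would be a big die for what is mostly SRAM.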
This is wild speculation, but it led me to wonder if it could be a “pipe-cleaning” stacked device. With Zen 4, they would want to move to an active interposer for the IO die, with the CPU and possibly memory stacked on top. That is a big change to do all at once. Could this cache device be a precursor, with 8x IFOP in one layer and 128 MB of cache stacked on top? Maybe later the cache die stacks with the CPU dies. That would probably fit on an existing Epyc package with essentially the same layout. They could make some without cache and some with cache, and Intel wouldn’t even compete with the cheaper parts that have no L4.
This is continuing wild speculation, but if an RDNA GPU has 4 IFOP links, then they could technically connect two GPUs together directly, or with one of these caches in between, with something like 150 to 200 GB/s or so in each direction. There have been some Infinity Architecture slides showing CDNA GPUs with what looks like 6 links, connecting up to 8 GPUs to each other. That isn’t actually fully connected, but the slide may not be representative, or they may simply not support a fully connected topology with 8 GPUs. It may be possible to connect GPUs with IFOP on the same board. The dies used in current AMD MCMs are really just BGA packages, which is why they shouldn’t be called chiplets; chiplets should be reserved for devices on silicon interposers. For HPC, I could see them mounting 4 to 8 HBM GPUs very close to each other, with IFOP connecting adjacent GPUs and IFIS-style links for the longer runs. You would definitely need water cooling, but a lot of HPC has already gone to water cooling. IFIS links wouldn’t be quite as fast or power efficient as IFOP, but they would allow multiple GPUs on the same board with larger spacing between them.
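Quick check on the topology point: a full mesh of N GPUs needs N-1 links per GPU, so 6 links per GPU cannot fully connect 8 GPUs. The per-link bandwidth below is the same ~50 GB/s per direction assumption as above, not a spec.

```python
# Link-count and bandwidth sketch for the multi-GPU speculation above.

ASSUMED_GBS_PER_LINK = 50   # assumed per-link, per-direction bandwidth

def links_for_full_mesh(num_gpus):
    """Links each GPU needs for a direct connection to every other GPU."""
    return num_gpus - 1

for n in (4, 8):
    print(f"{n} GPUs fully connected: {links_for_full_mesh(n)} links per GPU")

print(f"4 links per GPU: ~{4 * ASSUMED_GBS_PER_LINK} GB/s each direction to neighbors")
```

So 4 GPUs could be fully connected with only 3 of the 4 links, while 8 GPUs would need 7 links each, which fits with the slides showing something less than a full mesh.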
If they are using it in their CDNA architecture, then it wouldn’t be much of a stretch for it to show up in consumer cards if they can figure out how to make good use of it. Multi-GPU works fine for some compute applications; I have seen cases where 2 GPUs, each with half the compute and half the memory bandwidth, perform almost the same as one large GPU. That doesn’t necessarily work well for rendering. Although, at those link speeds, they could do unified memory and some other things that might be interesting. AMD has fully virtualized memory for their GPUs, similar to CPU memory, which should facilitate sharing.
This brings me back around to wondering if the 128 MB cache rumor is just someone misunderstanding something. We seem to only have one source for it. Could it actually be a 128-bit infinity fabric connection (4x IFOP) rather than 128 MB of “infinity cache”? The whole infinity cache thing could just be made up, or it could refer to something completely different, like sharing memory across infinity fabric.
I think this has been my journey for this weekend. I have things to do.