It's bigger! Bigger and more power efficient. Okay, the doubling, again, of 8-bit performance is cool; something neat there, I suppose. Though I find it odd to put this massively expensive card out there for... inference work. Who'd buy it for that? Well, it'll help with their next gen of inference-specific hardware. But otherwise, it's just bigger-er. And that's fine and dandy for their "we can charge a hell of a lot per card" AI accelerators.
But other than limiting itself to a standard power input, again putting a total damper on the "850-watt 4090" stuff, this announcement doesn't feel like it gives us a whole lot to go on for the consumer end. I mean, surely you could halve this massive card, which would be about the size of a 4090, limit it to 600 watts, and still have headroom to clock it up to 2-point-something GHz. As for bandwidth: HBM? Probably not. No word on any changes to the cache structure either, so no hints of a big LLC one way or the other; and the big AI card might not even include such a feature.
Guess, other than concluding the power draw could easily stick to 600 watts or less, we'll have to wait 5 months or whatever until the Hopper consumer launch to find out.