Discussion Ada/'Lovelace'? Next gen Nvidia gaming architecture speculation


biostud

Lifer
Feb 27, 2003
18,397
4,963
136
Obviously it must be because of cost, but how much more does it cost to go with HBM over traditional GDDR? It's not as if we haven't seen HBM on a consumer card before. And with HBM2e and HBM3 we can get plenty of capacity and bandwidth, unlike with the first generation of HBM.
 

Antey

Member
Jul 4, 2019
105
153
116


SMs = 132
FP32 Cores / SM = 128
FP32 Cores / GPU = 15872

What?

132 SMs x 128 FP32 per SM is 16896

H100 PCIe5 numbers are also 'wrong'
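
A quick sanity check of the arithmetic in this post, as a minimal Python sketch. The figures are the ones quoted above; the function name is made up for illustration, and the "implied SM count" is only what the listed total would correspond to at 128 cores per SM, not a claim about how the spec sheet figure was derived.

```python
def total_fp32_cores(sms: int, cores_per_sm: int) -> int:
    """FP32 cores on the whole GPU, given SM count and cores per SM."""
    return sms * cores_per_sm

LISTED_SMS = 132       # "SMs = 132"
CORES_PER_SM = 128     # "FP32 Cores / SM = 128"
LISTED_TOTAL = 15872   # "FP32 Cores / GPU = 15872"

expected_total = total_fp32_cores(LISTED_SMS, CORES_PER_SM)
implied_sms = LISTED_TOTAL / CORES_PER_SM  # SM count the listed total would imply

print(expected_total)  # 16896
print(implied_sms)     # 124.0
```

So the listed total matches 124 SMs at 128 cores each, not 132.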
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
How is Nvidia planning to counter 256MB/512MB of stacked Infinity Cache on RDNA3?

It won't change anything much. If you look back at the slides AMD provided, the performance really tapered off after 128 MB for anything that wasn't 4K. It just means that a competing NVidia card doesn't gain as much as the resolutions increase like we see with current cards where AMD will win at 1080p but gets slightly overtaken at 1440p and is solidly behind in 4K.

For other workloads it doesn't matter much unless it lets some data set fit entirely in cache that otherwise couldn't. Otherwise the cache is larger, but there are more CUs to feed that are requesting data. For some workloads it comes out a wash.
 

Timmah!

Golden Member
Jul 24, 2010
1,463
729
136
It won't change anything much. If you look back at the slides AMD provided, the performance really tapered off after 128 MB for anything that wasn't 4K. It just means that a competing NVidia card doesn't gain as much as the resolutions increase like we see with current cards where AMD will win at 1080p but gets slightly overtaken at 1440p and is solidly behind in 4K.

For other workloads it doesn't matter much unless it lets some data set fit entirely in cache that otherwise couldn't. Otherwise the cache is larger, but there are more CUs to feed that are requesting data. For some workloads it comes out a wash.

I do think Infinity Cache is L3, while this talk regarding Lovelace is about L2. Surely there is an advantage in that.
I think raytracing will benefit from it especially, and is the primary reason for the increase (if it happens).
 
Jul 27, 2020
17,916
11,685
116
I do think Infinity Cache is L3, while this talk regarding Lovelace is about L2. Surely there is an advantage in that.
I think raytracing will benefit from it especially, and is the primary reason for the increase (if it happens).
Maybe that's the only way AMD knows how to increase raytracing performance.
 
Reactions: maddie

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
It's probably because they haven't finalized the cooling solution yet, so they don't know what they can get away with.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
It's bigger! Bigger and more power efficient. OK, the doubling, again, of 8-bit performance is cool. Something neat there I suppose, though I feel it's odd to put this massively expensive card out there for... inference work. Who'd buy it for that? Well, it'll help on their next gen of inference-specific hardware. But otherwise, it's just bigger-er. And that's fine and dandy for their "we can charge a hell of a lot per card" AI accelerators.

But other than limiting itself to a standard power input, again putting a total damper on the "850 watts 4090" stuff, it doesn't feel like this announcement gives us a whole lot to go on for the consumer end. I mean, surely you can halve the size of this massive card, which is probably about what a 4090 would be, limit it to 600 watts and still have headroom to clock it up to 2.something GHz. As for bandwidth: HBM? Probably not. No word on any changes to the cache structure either, so no hints of a big LLC either way; and the big AI card might not even include such a feature.

Guess, other than concluding the power draw could easily stick to 600 watts or less, we'll have to wait 5 months or whatever until the Hopper consumer launch to find out.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,273
136
It won't change anything much. If you look back at the slides AMD provided, the performance really tapered off after 128 MB for anything that wasn't 4K. It just means that a competing NVidia card doesn't gain as much as the resolutions increase like we see with current cards where AMD will win at 1080p but gets slightly overtaken at 1440p and is solidly behind in 4K.

For other workloads it doesn't matter much unless it lets some data set fit entirely in cache that otherwise couldn't. Otherwise the cache is larger, but there are more CUs to feed that are requesting data. For some workloads it comes out a wash.

Apples and Oranges comparison. AMD's cache hit rate issue was about the architecture, not anything related to caching itself. Improvements to the architecture will drastically improve the hit rate for IC. Nonetheless, as another user stated, this is L2 cache being discussed, not IC.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,235
136
Something is strange with Hopper's power consumption. 700W and going from N7 to N4. I was expecting a really efficient chip using N4, like 400W or less.
N4 > N5 > N7 in terms of power efficiency from a process perspective.
 
Jul 27, 2020
17,916
11,685
116
Something is strange with Hopper's power consumption. 700W and going from N7 to N4. I was expecting a really efficient chip using N4, like 400W or less.
N4 > N5 > N7 in terms of power efficiency from a process perspective.
They have a choice of keeping the same transistor count and gaining power efficiency or increasing transistor density at the expense of power consumption. They went with the latter approach.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,235
136
They have a choice of keeping the same transistor count and gaining power efficiency or increasing transistor density at the expense of power consumption. They went with the latter approach.
I don't know if you fully thought about what you wrote.
If they keep the same transistor count, why bother moving to N4?
Newer full nodes mean higher density, and the tradeoff is between performance and efficiency. Keeping transistor efficiency the same means higher performance.

But the almost 2x TDP means something else is up, and it is not a process problem.
 

Timmah!

Golden Member
Jul 24, 2010
1,463
729
136
I don't know if you fully thought about what you wrote.
If they keep the same transistor count, why bother moving to N4?
Newer full nodes mean higher density, and the tradeoff is between performance and efficiency. Keeping transistor efficiency the same means higher performance.

But the almost 2x TDP means something else is up, and it is not a process problem.

Maybe it's those transformer engines. It takes juice to go full Optimus Prime.
Or the clocks will be through the roof.

Hopefully Lovelace / Hopper RTX won't suffer from that as much. I can't have a card with a cooler bigger than 2.5 slots.
 
Jul 27, 2020
17,916
11,685
116
Doesn't density increase as they move to smaller and smaller nodes? Why is there always an expectation that those smaller nodes are going to be cooler?
Because the required voltage decreases (less heat, so cooler). But then they have to increase the voltage to keep the transistors stable at higher frequencies (more heat again!). It's a balancing act.
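
That balancing act can be put in rough numbers with the usual first-order approximation for switching power, P ≈ C·V²·f. A minimal sketch: the ratios below are made-up illustrative values, not measurements of any node or chip, and leakage is ignored entirely.

```python
def relative_dynamic_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """New switching power relative to old, using P ~ C * V^2 * f (leakage ignored)."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio

# Hypothetical case 1: 10% lower voltage at the same clock and capacitance.
same_clock = relative_dynamic_power(1.0, 0.90, 1.0)

# Hypothetical case 2: clocks pushed 20% higher, with voltage back at 1.0x to stay stable.
higher_clock = relative_dynamic_power(1.0, 1.00, 1.2)

print(f"{same_clock:.2f}")    # 0.81
print(f"{higher_clock:.2f}")  # 1.20
```

Because voltage enters squared, giving back a small voltage reduction to chase clocks costs more power than the clock increase alone suggests.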
 
Reactions: Timmah!

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
I don't know if you fully thought about what you wrote.
If they keep the same transistor count, why bother moving to N4?
Newer full nodes mean higher density, and the tradeoff is between performance and efficiency. Keeping transistor efficiency the same means higher performance.

The 350 W PCIe version is still a lot faster than the A100. FP32 is about 2.5x, for instance (48 TF vs 19.5 TF).
 

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
Doesn't density increase as they move to smaller and smaller nodes? Why is there always an expectation that those smaller nodes are going to be cooler?

If you have the same number of transistors at a higher density, then the signal doesn't have to travel as far, so it takes less voltage.
 
Reactions: Timmah!

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
Reactions: Saylick

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
Apples and Oranges comparison. AMD's cache hit rate issue was about the architecture, not anything related to caching itself. Improvements to the architecture will drastically improve the hit rate for IC. Nonetheless, as another user stated, this is L2 cache being discussed, not IC.

The post I was replying to was specifically about AMD doubling their infinity cache and what NVidia would have to do about it.

Also, I'm not sure how cache architecture improves hit rate, unless by architecture you meant the size of the cache. The only things that do that are more cache, longer lines so that more sequential data is brought into the cache, or higher associativity, which means data sticks around longer before being evicted.

Architectural improvements might make the cache faster and require fewer cycles, but the hit rate isn't going to change. Outside of the hardware itself compilers can structure code and data to get better performance, but changes to the hardware itself that aren't increasing the size won't affect the hit rate terribly much if at all.

But at some point you get diminishing returns. Look at Zen 3D which is fairly niche in terms of what applications will actually get an improvement from the extra cache. RDNA3 similarly isn't going to see a 2x improvement just from having 2x the Infinity Cache.
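
To illustrate the points above about capacity, line length and diminishing returns, here is a toy set-associative LRU cache simulator. This is only a sketch under stated assumptions: the traces are synthetic, the sizes are invented for the example, and none of it models how Infinity Cache or any real GPU cache actually behaves.

```python
from collections import OrderedDict

def hit_rate(addresses, cache_bytes, line_bytes=64, ways=4):
    """Hit rate of a toy set-associative LRU cache over a byte-address trace."""
    num_sets = cache_bytes // line_bytes // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        lru = sets[line % num_sets]
        if line in lru:
            hits += 1
            lru.move_to_end(line)        # mark as most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return hits / total

KIB = 1024

# Synthetic trace 1: sweep an 8 KiB working set ten times, one access per 64-byte line.
loop_trace = [addr for _ in range(10) for addr in range(0, 8 * KIB, 64)]

for size in (4 * KIB, 8 * KIB, 16 * KIB):
    print(size // KIB, "KiB:", hit_rate(loop_trace, size))
# 4 KiB: 0.0   (working set does not fit, LRU thrashes)
# 8 KiB: 0.9   (fits; only the first sweep misses)
# 16 KiB: 0.9  (no further gain)

# Synthetic trace 2: stream through 64 KiB once in 4-byte steps, no reuse at all.
stream_trace = list(range(0, 64 * KIB, 4))
print(hit_rate(stream_trace, 4 * KIB, line_bytes=64))   # 0.9375
print(hit_rate(stream_trace, 4 * KIB, line_bytes=128))  # 0.96875
```

In this toy model the extra capacity only pays off at the point where the working set starts to fit, and doubling it again does nothing, while longer lines help the purely sequential trace.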
 

Saylick

Diamond Member
Sep 10, 2012
3,385
7,151
136
But at some point you get diminishing returns. Look at Zen 3D which is fairly niche in terms of what applications will actually get an improvement from the extra cache. RDNA3 similarly isn't going to see a 2x improvement just from having 2x the Infinity Cache.
Agreed. I'd argue that the 2-4x increase in Infinity Cache is there more to support 3x the compute units on the same sized memory bus, plus any overhead required to coordinate two GCDs, than to improve IPC. As you mentioned previously, even if you took a hypothetical 6900XT and slapped on twice the IC, the hit rate doesn't increase all that much and I suspect the performance won't either.
 
Reactions: Mopetar

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Agreed. I'd argue that the 2-4x increase in Infinity Cache is there more to support 3x the compute units on the same sized memory bus, plus any overhead required to coordinate two GCDs, than to improve IPC. As you mentioned previously, even if you took a hypothetical 6900XT and slapped on twice the IC, the hit rate doesn't increase all that much and I suspect the performance won't either.

192-256 MB of LLC could be useful for super high res stuff, even upscaling. They mentioned they spill out using FSR2 when upscaling to 4K even with 128 MB. Thus I'd guess anything beyond midrange ($400-500) will get more than that, if only for reasons like that.

As for Nvidia, it'll be interesting to see how much their rumored "giant 2nd level cache" strategy is going to pay off. A massive hit rate could certainly make up for the lack of advancement in memory speed, but I don't even want to hazard a guess as to its effectiveness.

Edit: Speaking of which, looks like 600 watts is in, as is a 384-bit bus with 24 GB of RAM for the highest end. Nothing surprising then. I'm a bit confused about the air cooling claims; I was under the impression that air cooling topped out at 450 watts... unless the card is restricted to 450 watts sustained and might go up to 600 very briefly.
 
Reactions: Mopetar and Saylick

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
Edit: Speaking of which, looks like 600 watts is in, as is a 384-bit bus with 24 GB of RAM for the highest end. Nothing surprising then. I'm a bit confused about the air cooling claims; I was under the impression that air cooling topped out at 450 watts... unless the card is restricted to 450 watts sustained and might go up to 600 very briefly.

Good news is 16Gb GDDR6X chips are out, so only the top side of the circuit board will need to be populated. That gets rid of some of the heat problems with the 3090s.

That 600W number has to be peak output, not sustained. Otherwise, the 4090s would need blower fans on a huge copper heat sink, like the new Mac Studio. Even then it would have to vent directly out of the chassis. Or, be water cooled.
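
The point about 16Gb chips only needing one side of the board can be checked with simple arithmetic. A minimal sketch, assuming each GDDR6X package occupies a 32-bit slice of the bus (the helper name and that default are assumptions made for this example, not something stated in the thread):

```python
def chips_per_channel(capacity_gb: int, chip_gbit: int, bus_bits: int, chip_bus_bits: int = 32):
    """Return (total chips, chips sharing each 32-bit slice) for a memory config."""
    chips = capacity_gb * 8 // chip_gbit  # GB -> Gbit, then divide by chip density
    channels = bus_bits // chip_bus_bits  # number of 32-bit slices on the bus (assumed width)
    return chips, chips // channels

print(chips_per_channel(24, 16, 384))  # (12, 1)
print(chips_per_channel(24, 8, 384))   # (24, 2)
```

With 16Gb chips, 24 GB on a 384-bit bus works out to one chip per slice, so everything fits on one side. With 8Gb chips it takes two per slice, which is the double-sided layout behind the 3090 heat problems mentioned above.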
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
Something is strange with Hopper's power consumption. 700W and going from N7 to N4. I was expecting a really efficient chip using N4, like 400W or less.
N4 > N5 > N7 in terms of power efficiency from a process perspective.

It's a combination of new nodes no longer delivering the efficiency gains we got in the past, coupled with Nvidia accepting higher-leakage dies and maximizing performance/clock speeds to prioritize profits.

As we saw with Ampere, an end user could usually shave 30% off the power draw while only losing ~3-5% in performance when lowering the voltage.
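
To put that undervolting trade in perf-per-watt terms, here is a small sketch using only the rough figures from this post (30% less power, 3-5% less performance). The function name is made up for illustration and the inputs are the post's ballpark numbers, not measurements.

```python
def perf_per_watt_gain(power_cut: float, perf_loss: float) -> float:
    """Relative perf/W after cutting power and losing some performance."""
    return (1.0 - perf_loss) / (1.0 - power_cut)

for loss in (0.03, 0.05):
    gain = perf_per_watt_gain(0.30, loss)
    print(f"{loss:.0%} slower, 30% less power: {gain:.2f}x perf/W")
# 3% slower, 30% less power: 1.39x perf/W
# 5% slower, 30% less power: 1.36x perf/W
```

In other words, on those numbers the stock settings give up something like 35-40% efficiency for the last few percent of performance.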
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,452
136
If Ada (or at least the initial products) use 24+ GDDR6X, that could be a problem wrt power consumption.
 

Aapje

Golden Member
Mar 21, 2022
1,467
2,031
106
I only expect that for the 4090. GDDR6X might go onto the 4070, but I doubt it. And I expect/fear relatively low memory quantities again.
 