Discussion Nvidia Blackwell in Q1-2025

Page 64 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Hulk

Diamond Member
Oct 9, 1999
4,941
3,372
136
Fooling around with this. By assuming a GPU can predict a frame in half the time it takes to compute one, I'm assuming that more powerful GPUs can not only compute frames faster than less powerful ones, but can also predict frames faster.

This is all very "caveman" and filled with assumptions. The main one is that the number of frames predicted into the future is based on a fixed time into the future, which I have made 3ms in this example. Faster native GPUs therefore benefit more from this technique because, 1. they can compute frames faster, and 2. they can predict frames faster, so they can render out more frames within the same 3ms window.

Native frame rate              | 40     | 50    | 75       | 100   | 150
Seconds to compute 1 frame     | 0.025  | 0.02  | 0.013333 | 0.01  | 0.006667
ms into future to "predict"    | 3      | 3     | 3        | 3     | 3
Number of prediction frames    | 1.2    | 1.5   | 2.25     | 3     | 4.5
Round to nearest integer       | 1      | 2     | 2        | 3     | 5
Time to predict a frame*       | 0.0125 | 0.01  | 0.006667 | 0.005 | 0.003333
Number of frames in GOP        | 2      | 3     | 3        | 4     | 6
Time to compute GOP            | 0.04   | 0.035 | 0.028333 | 0.025 | 0.021667
"Faker" frame rate             | 55     | 71    | 115      | 160   | 254
Percent frame rate improvement | 38%    | 43%   | 53%      | 60%   | 69%

*Assumes a frame can be predicted in half the time it can be computed.
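The arithmetic above boils down to a few lines of Python. This is just a sketch of the assumptions in this post, not anything Nvidia has published; note also that the numbers only reproduce if the "3ms" horizon is read as 30 ms (0.03 s / 0.025 s = 1.2 prediction frames at 40 fps), so the sketch uses 30 ms.

```python
# Reproduces the table's arithmetic (back-of-envelope assumptions only,
# not anything Nvidia has described). The stated horizon is 3 ms, but
# the table's numbers only work out with 30 ms, so that is used here.
HORIZON = 0.030          # seconds to "predict" into the future
PREDICT_FACTOR = 0.5     # a predicted frame takes half the time of a rendered one

def faker_rate(native_fps):
    render_t = 1.0 / native_fps            # seconds to compute one real frame
    predict_t = render_t * PREDICT_FACTOR  # seconds to predict one frame
    n_pred = HORIZON / render_t            # fractional predicted frames in the horizon
    gop_frames = 1 + n_pred                # one real frame plus its predictions
    gop_time = render_t + n_pred * predict_t
    return gop_frames / gop_time

for fps in (40, 50, 75, 100, 150):
    f = faker_rate(fps)
    print(f"{fps:>3} fps native -> {f:>5.0f} fps 'faker' (+{f / fps - 1:.0%})")
```

Running it gives 55, 71, 115, 160 and 254 fps for the five columns, matching the table.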
 

coercitiv

Diamond Member
Jan 24, 2014
6,957
15,595
136
Yes, it is predicting the future but it is not knowing the future.
Let me explain.
I know you like to think and theorize about these things, but the gaming screen is a very unpredictable place. Games as a whole don't obey any law, Newtonian or otherwise. Balls stop mid-flight, objects materialize from nowhere. Too many things can happen in the next 3 frames that were not there in the last real frame.

This is why knowing instead of predicting is worth the price in performance. They compensate for this by using aggressive sync to minimize system delay and claw back some of the loss. With this new generation they include a new tech that lowers latency further by adjusting rendered frames right before they're displayed using the most up-to-date positioning information from the game engine. This allows them to move objects closer to their "present" location and then paint the gaps using neural models. You can read about it here:
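Nvidia hasn't published how that last-moment adjustment works, but the general idea — shift the already-rendered frame by the newest input delta just before display, leaving holes for a neural inpainting pass — can be sketched in toy form (all names here are illustrative, not from any SDK):

```python
# Toy sketch of a "late warp": shift the last rendered frame by the newest
# camera delta right before display, marking uncovered pixels as holes for
# a later inpainting pass. Purely illustrative; Nvidia's pipeline is not public.
HOLE = None  # placeholder where no source pixel exists after the shift

def warp_horizontal(frame, dx):
    """Shift each row of `frame` by dx pixels (positive = view moved left)."""
    w = len(frame[0])
    out = []
    for row in frame:
        new_row = [HOLE] * w
        for x, px in enumerate(row):
            if 0 <= x + dx < w:
                new_row[x + dx] = px
        out.append(new_row)
    return out

frame = [["a", "b", "c", "d"]]        # 1x4 "rendered" frame
warped = warp_horizontal(frame, 1)    # latest input says the view shifted 1px
print(warped)                         # [[None, 'a', 'b', 'c']] -> hole at x=0
```

The hole at the edge is exactly the gap the neural model would then paint in.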

 
Last edited:

MoogleW

Member
May 1, 2022
95
44
61
I get what you're saying, but in a purely forward predicting approach ("extrapolation") you are more subject to errors if there's sudden and random movement. Making the timesteps smaller just limits the amount of potential error between the last extrapolated frame and the "real" frame that comes after it, but the error can exist and it can be quite large if a lot of movement happens in that timestep. At least with interpolation, the start and end points are already known so you just need to fill in the gap, whether it be linear interpolation, polynomial, or whatever.

In your analogy of predicting the flight of a baseball, that works well because a thrown baseball follows physics. What doesn't is user input, e.g. FPS games where I can be panning left, right, up, down or wherever and however I choose. It's that old stock investing adage of "past performance is no indicator of future results".
Games do still have known assets and scene data that just don't get rendered to the monitor, right? It's not visible, but it's known.
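The extrapolation-error point can be made concrete with a toy case (my own illustration, not from the thread): an object that reverses direction mid-interval. Linear extrapolation from past samples overshoots badly, while interpolation between two known endpoints stays bounded by the motion inside the interval.

```python
# Toy illustration: an object moves right, then abruptly reverses.
# Extrapolating forward from past samples overshoots; interpolating
# between two *known* endpoints cannot blow up the same way.
samples = {0.0: 0.0, 1.0: 1.0}   # position at t=0 and t=1 (moving right)
true_at_2 = -1.0                 # by t=2 the object has reversed to -1

# Linear extrapolation from the last two known samples:
velocity = (samples[1.0] - samples[0.0]) / 1.0
extrapolated = samples[1.0] + velocity * 1.0   # predicts +2.0 at t=2
extrap_error = abs(extrapolated - true_at_2)   # |2 - (-1)| = 3

# Interpolation at t=1.5 once the t=2 endpoint is known:
interpolated = (samples[1.0] + true_at_2) / 2  # midpoint = 0.0
# Interpolation error is bounded by the motion inside the interval;
# extrapolation error can be arbitrarily large after a reversal.
print(extrap_error, interpolated)
```

The cost of interpolation's bounded error is, of course, the extra frame of latency while you wait for the real endpoint.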
 

MoogleW

Member
May 1, 2022
95
44
61
They used Frame generation to make the gains seem larger than they are.

I am willing to bet the 5090 is around 30-40% faster than the 4090. Possibly even more in RT or AI.

You can’t necessarily compare raw transistor counts because the chip design itself has several changes to accommodate for the new memory bus and tech.

Also, don’t knock FG, it is actually pretty solid tech. It helps keep games synced with the monitor’s max refresh. Did NVIDIA do bad by using it in slides? Absolutely. Like for like comparisons are what they should have done. FG itself isn’t bad though…and the fact they crammed it into a 2 slot card is incredible.

If you honestly think these cards are only going to be 5-10% faster than previous gen, I don’t know what to tell you except wait for the reviews.
We're just back to regular scheduled doom posting. Personally I say 30-45% for 5090 and 25-40% for 5080. Primarily from architecture and minor spec bumps in some cases. I doubt clocks are any better in practice vs 2.8ghz for the 4090. Would interpolating the first frame and extrapolating the latter two using the new reflex and so on to account for say, movement of a cursor or sight, make any sense? That would require the last frame to be influenced by the intermediate frames though I guess. Unless you use the second extrapolated frame after the second frame.

Much easier to just interpolate the two in between.
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,377
5,520
136
this is coming to all RTX cards. Good explanation of new Neural features.
That still doesn't explain what it's actually doing. Is it compression of the shader binaries, which will then need decompression at runtime? Or is this some sort of AI driven code optimization (in which case, good luck 😬)? Very vague.

EDIT: There's a paper on the texture compression, if anyone is interested in the details: https://research.nvidia.com/labs/rtr/neural_texture_compression/
 

Grooveriding

Diamond Member
Dec 25, 2008
9,144
1,322
126
We need to see real reviews across lots of games. I think those guesses are optimistic; my guess is it's disappointing, going by the benchmarks and comparisons they used. They don't show the 5080 compared to the 4090, there's very little straight raster, and almost everything is with DLSS4 turned on. One exception to the general average could be RT; they've made decent improvements to RT performance every gen.

It shouldn’t be surprising as it’s the same node with GDDR7 and a bigger die for the 5090, and the 5080 with about the same die as a 4080.

This is 2080/2080 Ti all over again, but possibly with a slightly worse performance uplift: the 5080 likely slower across the board than the 4090 without DLSS4, and a 20-30% uplift for the 5090 over the 4090 in raw performance. Normal Quality DLSS and DLAA are pretty decent IQ-wise. Frame gen looks pretty bad in my experience; we'll have to wait for reviews to see if DLSS4 has improved there. Current FG has plenty of artifacts and smearing, so I'd be impressed if generating even more false frames ends up looking better than current frame gen.
 

Racan

Golden Member
Sep 22, 2012
1,199
2,201
136
I had to laugh when I saw the news, because when the two slot rumor came out and I mentioned it, someone quickly tried to claim it was BS. Now here we are. I didn’t make up the rumor, but I figured it was probably true. NVIDIA has been investing heavily in this area whereas most partners have not. The FE cards were out of this world compared to partner cards.
Though now that both fans are flow-through, it's going to suck for most SFF cases, especially sandwich-style ones. It should work best in something like the Fractal Ridge.
 

eek2121

Diamond Member
Aug 2, 2005
3,202
4,635
136
We're just back to regular scheduled doom posting. Personally I say 30-45% for 5090 and 25-40% for 5080. Primarily from architecture and minor spec bumps in some cases. I doubt clocks are any better in practice vs 2.8ghz for the 4090. Would interpolating the first frame and extrapolating the latter two using the new reflex and so on to account for say, movement of a cursor or sight, make any sense? That would require the last frame to be influenced by the intermediate frames though I guess. Unless you use the second extrapolated frame after the second frame.

Much easier to just interpolate the two in between.
FWIW the clocks for the 4090 were not 2.8 GHz. Aftermarket cards clocked higher, sure, but you can't compare those since we have no info on clocks for 3rd-party 5090s. The 4090's official clock was 2.52 GHz; the 5090's is 2.41 GHz (about a 5% difference). (Numbers may be slightly off, pulled from memory, but they are on NVIDIA's site.)

We don’t know how much OC headroom these parts have.
 

eek2121

Diamond Member
Aug 2, 2005
3,202
4,635
136
Though now that both fans are flow-through, it's going to suck for most SFF cases, especially sandwich-style ones. It should work best in something like the Fractal Ridge.
Actually it may work perfectly with my case. I have to double-check, but I use an AIO for my CPU, and I think my case has venting in the back as well as the side, so airflow would easily be pushed through, unlike with my 4090.
 

jpiniero

Lifer
Oct 1, 2010
15,634
6,111
136
This is 2080/2080ti all over again, but possibly a little worse of a performance uplift.

The 2080 had a much better improvement over the 1080 than this will have.

I think the raster improvement will be 0-5% for the 5080 and 5070 Ti, and -5 to 0% for the 5070 (compared to the 4070 NS), with RT being 15-20% above that.
 
Jul 27, 2020
22,309
15,576
146
this is coming to all RTX cards. Good explanation of new Neural features.
This is another brazen attempt by Nvidia to push a proprietary feature. How much you wanna bet that this has its own texture format so game developers will have to do double the work just to accommodate Nvidia's stingy VRAM addiction. Game sizes will also increase due to needing two different texture packs.
 

Golgatha

Lifer
Jul 18, 2003
12,310
790
126
Random addition to this conversation, but the 5090 FE is out of consideration for me. They're using liquid metal for the TIM and I absolutely loathe this stuff. I want an easy fix if I need to re-paste or re-pad my card.
 

poke01

Diamond Member
Mar 8, 2022
3,040
4,031
106
This is another brazen attempt by Nvidia to push a proprietary feature. How much you wanna bet that this has its own texture format so game developers will have to do double the work just to accommodate Nvidia's stingy VRAM addiction. Game sizes will also increase due to needing two different texture packs.
What? It's being supported by Microsoft. AMD, Intel and Qualcomm will have a similar approach as well, and it will be a DirectX feature.
Read into how this works; it's anything but predatory.

 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,040
4,031
106
What? It's being supported by Microsoft. AMD, Intel and Qualcomm will have a similar approach as well, and it will be a DirectX feature.
Read into how this works; it's anything but predatory.

Here’s AMD version


I bet RDNA5 will have even better support for this than RDNA4, and the PS6 will take advantage of these compression techniques.
 

Win2012R2

Senior member
Dec 5, 2024
647
610
96
I was just thinking that (as of now, based on the limited info we have), since the massive memory bandwidth doesn't seem to be helping performance much, it should be enough to feed a lot more ALUs in the next gen once shrinking logic works again, even if it ends up looking like a half-node compared to N4. At least something to look forward to...
 

Hulk

Diamond Member
Oct 9, 1999
4,941
3,372
136
Yes, but despite all of that reasoning against it, DLSS works, with some minor inconsistencies that are most prominent with text alignment or movement.
In the same way Simpson's rule can estimate the area under a curve with greater and greater precision as delta x gets smaller, the same holds for frame prediction as delta t gets smaller. The fundamental theorem of calculus is correct: as these time slices approach zero, so does the error. If the time slice is small enough and the AI prediction is good enough, the error becomes exceedingly small. Nvidia of course figured this out and decided it was worth pouring tens, perhaps hundreds, of millions of dollars into developing it. I'm running some rough numbers to get a feel for it. And yes, I do like to play with the numbers!

I'm not sure you're fully comprehending how short a time interval 3 ms is in terms of human perception. For example, in the Olympics a false start is called if an athlete is shown to have a reaction time of less than 100 ms. 10 ms would be an order of magnitude faster than human reaction time; 3 ms is a third of that. As the time interval gets smaller, the error between rendered and predicted frames becomes smaller.
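The "smaller time slice, smaller error" argument is easy to sanity-check numerically. A toy model (my own, saying nothing about DLSS internals): linearly extrapolate a smooth trajectory one step ahead and watch the worst-case error fall roughly 4x each time the step is halved.

```python
import math

# Toy check of "smaller delta t -> smaller error": linearly extrapolate
# a smooth trajectory (sin(t)) one step ahead from its two previous
# samples and measure the worst-case error as the step dt shrinks.
def max_extrapolation_error(dt, t_end=6.0):
    worst = 0.0
    t = 2 * dt
    while t <= t_end:
        # predict f(t) from f(t - dt) and f(t - 2*dt)
        predicted = 2 * math.sin(t - dt) - math.sin(t - 2 * dt)
        worst = max(worst, abs(predicted - math.sin(t)))
        t += dt
    return worst

for dt in (0.1, 0.05, 0.025):
    print(f"dt={dt:<6} max error={max_extrapolation_error(dt):.6f}")
# Halving dt cuts the error roughly 4x (second-order behavior), since
# linear extrapolation error scales with f''(t) * dt^2 on smooth motion.
```

The caveat, as noted above, is that this only holds for smooth motion; abrupt user input breaks the smoothness assumption the calculus argument rests on.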

The theory behind what Nvidia is doing is solid. We may not like it, but at the end of the day it produces a meaningful fps improvement with very few visual artifacts, as Linus pointed out in his short demo of the 5090.
 

Golgatha

Lifer
Jul 18, 2003
12,310
790
126
What's the problem with it?
It's annoying to re-paste, and it's electrically conductive. When you separate the cooler from the PCB, the TIM could get somewhere it shouldn't and short out a $2000 card. No thank you. Same reason I would never water-cool a system unless it was absolutely necessary (say, tightly packed servers); it's a potential headache down the road.
 