Discussion Nvidia Blackwell in Q1-2025

Page 64 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Hulk

Diamond Member
Oct 9, 1999
4,941
3,372
136
Fooling around with this. By assuming a GPU can predict a frame in half the time it takes to compute one, I'm assuming that more powerful GPUs can not only compute frames faster than less powerful ones, but can also predict frames faster.

This is all very "caveman" and filled with assumptions. The main one is that the number of frames predicted into the future is based on a fixed time into the future, which I have made 3ms in this example. Faster native GPUs therefore benefit more from this technique because, 1. they can compute frames faster, and 2. they can predict frames faster, so they can render out more frames within the same 3ms window.

Native frame rate              | 40     | 50    | 75       | 100   | 150
Seconds to compute 1 frame     | 0.025  | 0.02  | 0.013333 | 0.01  | 0.006667
ms into future to "predict"    | 3      | 3     | 3        | 3     | 3
Number of prediction frames    | 1.2    | 1.5   | 2.25     | 3     | 4.5
Round to nearest integer       | 1      | 2     | 2        | 3     | 5
Time to predict a frame*       | 0.0125 | 0.01  | 0.006667 | 0.005 | 0.003333
Number of frames in GOP        | 2      | 3     | 3        | 4     | 6
Time to compute GOP            | 0.04   | 0.035 | 0.028333 | 0.025 | 0.021667
"Faker" frame rate             | 55     | 71    | 115      | 160   | 254
Percent frame rate improvement | 38%    | 43%   | 53%      | 60%   | 69%

*Assumes a frame can be predicted in half the time it can be computed.
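The arithmetic above boils down to a few lines of Python. This is just a sketch of the assumptions in this post, not anything Nvidia has published; note also that the numbers only reproduce if the "3ms" horizon is read as 30 ms (0.03 s / 0.025 s = 1.2 prediction frames at 40 fps), so the sketch uses 30 ms.

```python
# Reproduces the table's arithmetic (back-of-envelope assumptions only,
# not anything Nvidia has described). The stated horizon is 3 ms, but
# the table's numbers only work out with 30 ms, so that is used here.
HORIZON = 0.030          # seconds to "predict" into the future
PREDICT_FACTOR = 0.5     # a predicted frame takes half the time of a rendered one

def faker_rate(native_fps):
    render_t = 1.0 / native_fps            # seconds to compute one real frame
    predict_t = render_t * PREDICT_FACTOR  # seconds to predict one frame
    n_pred = HORIZON / render_t            # fractional predicted frames in the horizon
    gop_frames = 1 + n_pred                # one real frame plus its predictions
    gop_time = render_t + n_pred * predict_t
    return gop_frames / gop_time

for fps in (40, 50, 75, 100, 150):
    f = faker_rate(fps)
    print(f"{fps:>3} fps native -> {f:>5.0f} fps 'faker' (+{f / fps - 1:.0%})")
```

Running it gives 55, 71, 115, 160 and 254 fps for the five columns, matching the table.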
 

coercitiv

Diamond Member
Jan 24, 2014
6,957
15,595
136
Yes, it is predicting the future but it is not knowing the future.
Let me explain.
I know you like to think and theorize about these things, but the gaming screen is a very unpredictable place. Games as a whole don't obey any law, Newtonian or otherwise. Balls stop mid-flight, objects materialize from nowhere. Too many things can happen in the next 3 frames that were not there in the last real frame.

This is why knowing instead of predicting is worth the price in performance. They compensate for this by using aggressive sync to minimize system delay and claw back some of the loss. With this new generation they include a new tech that lowers latency further by adjusting rendered frames right before they're displayed using the most up-to-date positioning information from the game engine. This allows them to move objects closer to their "present" location and then paint the gaps using neural models. You can read about it here:
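Nvidia hasn't published how that last-moment adjustment works, but the general idea — shift the already-rendered frame by the newest input delta just before display, leaving holes for a neural inpainting pass — can be sketched in toy form (all names here are illustrative, not from any SDK):

```python
# Toy sketch of a "late warp": shift the last rendered frame by the newest
# camera delta right before display, marking uncovered pixels as holes for
# a later inpainting pass. Purely illustrative; Nvidia's pipeline is not public.
HOLE = None  # placeholder where no source pixel exists after the shift

def warp_horizontal(frame, dx):
    """Shift each row of `frame` by dx pixels (positive = view moved left)."""
    w = len(frame[0])
    out = []
    for row in frame:
        new_row = [HOLE] * w
        for x, px in enumerate(row):
            if 0 <= x + dx < w:
                new_row[x + dx] = px
        out.append(new_row)
    return out

frame = [["a", "b", "c", "d"]]        # 1x4 "rendered" frame
warped = warp_horizontal(frame, 1)    # latest input says the view shifted 1px
print(warped)                         # [[None, 'a', 'b', 'c']] -> hole at x=0
```

The hole at the edge is exactly the gap the neural model would then paint in.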

 
Last edited:

MoogleW

Member
May 1, 2022
95
44
61
I get what you're saying, but in a purely forward predicting approach ("extrapolation") you are more subject to errors if there's sudden and random movement. Making the timesteps smaller just limits the amount of potential error between the last extrapolated frame and the "real" frame that comes after it, but the error can exist and it can be quite large if a lot of movement happens in that timestep. At least with interpolation, the start and end points are already known so you just need to fill in the gap, whether it be linear interpolation, polynomial, or whatever.

In your analogy of predicting the flight of a baseball, that works well because a thrown baseball follows physics. What doesn't is user input, e.g. FPS games where I can be panning left, right, up, down or wherever and however I choose. It's that old stock investing adage of "past performance is no indicator of future results".
Games do still have known assets and scene data that just don't get rendered to the monitor, right? It's not visible, but it's known.
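The extrapolation-error point can be made concrete with a toy case (my own illustration, not from the thread): an object that reverses direction mid-interval. Linear extrapolation from past samples overshoots badly, while interpolation between two known endpoints stays bounded by the motion inside the interval.

```python
# Toy illustration: an object moves right, then abruptly reverses.
# Extrapolating forward from past samples overshoots; interpolating
# between two *known* endpoints cannot blow up the same way.
samples = {0.0: 0.0, 1.0: 1.0}   # position at t=0 and t=1 (moving right)
true_at_2 = -1.0                 # by t=2 the object has reversed to -1

# Linear extrapolation from the last two known samples:
velocity = (samples[1.0] - samples[0.0]) / 1.0
extrapolated = samples[1.0] + velocity * 1.0   # predicts +2.0 at t=2
extrap_error = abs(extrapolated - true_at_2)   # |2 - (-1)| = 3

# Interpolation at t=1.5 once the t=2 endpoint is known:
interpolated = (samples[1.0] + true_at_2) / 2  # midpoint = 0.0
# Interpolation error is bounded by the motion inside the interval;
# extrapolation error can be arbitrarily large after a reversal.
print(extrap_error, interpolated)
```

The cost of interpolation's bounded error is, of course, the extra frame of latency while you wait for the real endpoint.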
 

MoogleW

Member
May 1, 2022
95
44
61
They used Frame generation to make the gains seem larger than they are.

I am willing to bet the 5090 is around 30-40% faster than the 4090. Possibly even more in RT or AI.

You can’t necessarily compare raw transistor counts because the chip design itself has several changes to accommodate for the new memory bus and tech.

Also, don’t knock FG, it is actually pretty solid tech. It helps keep games synced with the monitor’s max refresh. Did NVIDIA do bad by using it in slides? Absolutely. Like for like comparisons are what they should have done. FG itself isn’t bad though…and the fact they crammed it into a 2 slot card is incredible.

If you honestly think these cards are only going to be 5-10% faster than previous gen, I don’t know what to tell you except wait for the reviews.
We're just back to regular scheduled doom posting. Personally I say 30-45% for 5090 and 25-40% for 5080. Primarily from architecture and minor spec bumps in some cases. I doubt clocks are any better in practice vs 2.8ghz for the 4090. Would interpolating the first frame and extrapolating the latter two using the new reflex and so on to account for say, movement of a cursor or sight, make any sense? That would require the last frame to be influenced by the intermediate frames though I guess. Unless you use the second extrapolated frame after the second frame.

Much easier to just interpolate the two in between.
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,377
5,520
136
this is coming to all RTX cards. Good explanation of new Neural features.
That still doesn't explain what it's actually doing. Is it compression of the shader binaries, which will then need decompression at runtime? Or is this some sort of AI driven code optimization (in which case, good luck 😬)? Very vague.

EDIT: There's a paper on the texture compression, if anyone is interested in the details: https://research.nvidia.com/labs/rtr/neural_texture_compression/
 

Grooveriding

Diamond Member
Dec 25, 2008
9,144
1,322
126
We need to see real reviews across lots of games. I think those guesses are optimistic; my guess is it's disappointing, going by the benchmarks and comparisons they used. They don't show the 5080 compared to the 4090, there's very little straight raster, and almost everything is with DLSS4 turned on. One exception to the general average could be RT; they've made decent improvements to RT performance every gen.

It shouldn’t be surprising as it’s the same node with GDDR7 and a bigger die for the 5090, and the 5080 with about the same die as a 4080.

This is 2080/2080 Ti all over again, but possibly with a slightly worse performance uplift: the 5080 likely slower across the board than the 4090 without DLSS4, and a 20-30% uplift for the 5090 over the 4090 in raw performance. Normal Quality DLSS and DLAA are pretty decent IQ-wise. Frame gen looks pretty bad in my experience; we'll have to wait for reviews to see if DLSS4 has improved there. Current FG has plenty of artifacts and smearing, so I'd be impressed if generating even more false frames ends up looking better than current frame gen.
 

Racan

Golden Member
Sep 22, 2012
1,199
2,201
136
I had to laugh when I saw the news, because when the two slot rumor came out and I mentioned it, someone quickly tried to claim it was BS. Now here we are. I didn’t make up the rumor, but I figured it was probably true. NVIDIA has been investing heavily in this area whereas most partners have not. The FE cards were out of this world compared to partner cards.
Though now that both fans are flow-through, it's going to suck for most SFF cases, especially sandwich-style ones. It should work best in something like the Fractal Ridge.
 

eek2121

Diamond Member
Aug 2, 2005
3,202
4,635
136
We're just back to regular scheduled doom posting. Personally I say 30-45% for 5090 and 25-40% for 5080. Primarily from architecture and minor spec bumps in some cases. I doubt clocks are any better in practice vs 2.8ghz for the 4090. Would interpolating the first frame and extrapolating the latter two using the new reflex and so on to account for say, movement of a cursor or sight, make any sense? That would require the last frame to be influenced by the intermediate frames though I guess. Unless you use the second extrapolated frame after the second frame.

Much easier to just interpolate the two in between.
FWIW the clocks for the 4090 were not 2.8 GHz. Aftermarket cards clocked higher, sure, but you can't compare those since we have no info on clocks for 3rd-party 5090s. The 4090's official clock was 2.52 GHz; the 5090's is 2.41 GHz (about a 5% difference). (Numbers may be slightly off, pulled from memory, but they are on NVIDIA's site.)

We don’t know how much OC headroom these parts have.
 

eek2121

Diamond Member
Aug 2, 2005
3,202
4,635
136
Though now that both fans are flow-through, it's going to suck for most SFF cases, especially sandwich-style ones. It should work best in something like the Fractal Ridge.
Actually it may work perfectly with my case. I have to double-check, but I use an AIO for my CPU, and I think my case has venting in the back as well as the side, so airflow would easily be pushed through, unlike with my 4090.
 

jpiniero

Lifer
Oct 1, 2010
15,634
6,111
136
This is 2080/2080ti all over again, but possibly a little worse of a performance uplift.

The 2080 had a much better improvement over the 1080 than this will have.

I think the raster improvement will be 0-5% for the 5080 and 5070 Ti, and -5 to 0% for the 5070 (compared to the 4070 NS), with RT being 15-20% above that.
 
Jul 27, 2020
22,309
15,576
146
this is coming to all RTX cards. Good explanation of new Neural features.
This is another brazen attempt by Nvidia to push a proprietary feature. How much you wanna bet that this has its own texture format so game developers will have to do double the work just to accommodate Nvidia's stingy VRAM addiction. Game sizes will also increase due to needing two different texture packs.
 

Golgatha

Lifer
Jul 18, 2003
12,310
790
126
Random addition to this conversation, but the 5090 FE is out of consideration for me. They're using liquid metal for the TIM and I absolutely loathe this stuff. I want an easy fix if I need to re-paste or re-pad my card.
 

poke01

Diamond Member
Mar 8, 2022
3,040
4,031
106
This is another brazen attempt by Nvidia to push a proprietary feature. How much you wanna bet that this has its own texture format so game developers will have to do double the work just to accommodate Nvidia's stingy VRAM addiction. Game sizes will also increase due to needing two different texture packs.
What? It's being supported by Microsoft. AMD, Intel and Qualcomm will have a similar approach as well, and it will be a DirectX feature.
Read into how this works; it's anything but predatory.

 
Last edited:

poke01

Diamond Member
Mar 8, 2022
3,040
4,031
106
What? It's being supported by Microsoft. AMD, Intel and Qualcomm will have a similar approach as well, and it will be a DirectX feature.
Read into how this works; it's anything but predatory.

Here’s AMD version


I bet RDNA5 will have even better support for this than RDNA4, and the PS6 will take advantage of these compression techniques.
 

Win2012R2

Senior member
Dec 5, 2024
647
610
96
I was just thinking that (as of now, based on the limited info we have), since the massive memory bandwidth doesn't seem to be helping performance much, it should be enough to feed a lot more ALUs in the next gen once shrinking logic works again, even if it ends up looking like a half-node compared to N4. At least something to look forward to...
 

Hulk

Diamond Member
Oct 9, 1999
4,941
3,372
136
Yes, but despite all of that reasoning against it, DLSS works, with some minor inconsistencies that are most prominent with text alignment or movement.
In the same way Simpson's rule can estimate the area under a curve with greater and greater precision as delta x gets smaller, the same holds for frame prediction as delta t gets smaller. The fundamental theorem of calculus is correct: as these time slices approach zero, so does the error. If the time slice is small enough and the AI prediction is good enough, the error becomes exceedingly small. Nvidia of course figured this out and decided it was worth pouring tens, perhaps hundreds, of millions of dollars into developing it. I'm running some rough numbers to get a feel for it. And yes, I do like to play with the numbers!

I'm not sure you're fully comprehending how short a time interval 3 ms is in terms of human perception. For example, in the Olympics a false start is called if an athlete is shown to have a reaction time of less than 100 ms. 10 ms would be an order of magnitude faster than human reaction time; 3 ms is a third of that. As the time interval gets smaller, the error between rendered and predicted frames becomes smaller.
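The "smaller time slice, smaller error" argument is easy to sanity-check numerically. A toy model (my own, saying nothing about DLSS internals): linearly extrapolate a smooth trajectory one step ahead and watch the worst-case error fall roughly 4x each time the step is halved.

```python
import math

# Toy check of "smaller delta t -> smaller error": linearly extrapolate
# a smooth trajectory (sin(t)) one step ahead from its two previous
# samples and measure the worst-case error as the step dt shrinks.
def max_extrapolation_error(dt, t_end=6.0):
    worst = 0.0
    t = 2 * dt
    while t <= t_end:
        # predict f(t) from f(t - dt) and f(t - 2*dt)
        predicted = 2 * math.sin(t - dt) - math.sin(t - 2 * dt)
        worst = max(worst, abs(predicted - math.sin(t)))
        t += dt
    return worst

for dt in (0.1, 0.05, 0.025):
    print(f"dt={dt:<6} max error={max_extrapolation_error(dt):.6f}")
# Halving dt cuts the error roughly 4x (second-order behavior), since
# linear extrapolation error scales with f''(t) * dt^2 on smooth motion.
```

The caveat, as noted above, is that this only holds for smooth motion; abrupt user input breaks the smoothness assumption the calculus argument rests on.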

The theory behind what Nvidia is doing is solid. We may not like it, but at the end of the day it produces a meaningful fps improvement with very few visual artifacts, as Linus pointed out in his short demo of the 5090.
 

Golgatha

Lifer
Jul 18, 2003
12,310
790
126
What's the problem with it?
It's annoying to re-paste, and it's electrically conductive. When you separate the cooler from the PCB, the TIM could get somewhere it shouldn't and short out a $2000 card. No thank you. Same reason I would never water-cool a system unless it was absolutely necessary (say, tightly packed servers); it's a potential headache down the road.
 