Discussion Nvidia Blackwell in Q1-2025

adroc_thurston · Jan 29, 2025

Kepler_L2 said:
Regressions in L1 bandwidth and L2/mem latency:

Wait, lower L1 b/w with a lot more SMs?
And they won exactly 0MHz in fmax on that?

branch_suggestion · Jan 29, 2025

Kepler_L2 said:
Regressions in L1 bandwidth and L2/mem latency:
[Attachments 115829 and 115828: images not preserved]

Yep, that explains a lot.
Latency is expected, bigger chip after all.
Aggregate L1/L2 bandwidth being lower is a huge shock, likely a forced regression due to power limitations.

adroc_thurston · Jan 29, 2025

branch_suggestion said:
Yep, that explains a lot.
Latency is expected, bigger chip after all.
Aggregate L1/L2 bandwidth being lower is a huge shock, likely a forced regression due to power limitations.

Well it has moar SMs, it shouldn't regress in L1 b/w at least.

branch_suggestion · Jan 29, 2025

adroc_thurston said:
Well it has moar SMs, it shouldn't regress in L1 b/w at least.

5090 is 0.88x L1 bandwidth with 1.33x SMs.
That is 50% less bandwidth/SM, what on earth?
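
A quick check of that arithmetic, taking the two quoted ratios (0.88x aggregate L1 bandwidth, 1.33x SMs) at face value. The function name is made up for illustration, and this ignores clock speed entirely. On those numbers the per-SM drop comes out closer to a third than a half:

```python
def per_unit_ratio(total_ratio: float, unit_count_ratio: float) -> float:
    """Ratio of a per-unit quantity, given the ratio of the aggregate
    and the ratio of the unit count (new GPU relative to old GPU)."""
    return total_ratio / unit_count_ratio


# Ratios quoted in the post above: 5090 relative to the previous flagship.
l1_bandwidth_ratio = 0.88   # aggregate L1 bandwidth
sm_count_ratio = 1.33       # number of SMs

per_sm = per_unit_ratio(l1_bandwidth_ratio, sm_count_ratio)
print(f"per-SM L1 bandwidth: {per_sm:.2f}x ({(1 - per_sm) * 100:.0f}% lower)")
# prints "per-SM L1 bandwidth: 0.66x (34% lower)"
```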

adroc_thurston · Jan 29, 2025

branch_suggestion said:
5090 is 0.88x L1 bandwidth with 1.33x SMs.
That is 50% less bandwidth/SM, what on earth?

Well it clocks lower.
But that's a nasty regression either way.
They lobotomized the SM to not win on fmax.
Like Maxwell lobotomized the SM inna bunch of ways relative to Kepler but it also clocked like 30-40% faster iso node.

I think they just missed and by quite a lot.

Kepler_L2 · Jan 29, 2025

branch_suggestion said:
5090 is 0.88x L1 bandwidth with 1.33x SMs.
That is 50% less bandwidth/SM, what on earth?

With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1.

adroc_thurston · Jan 29, 2025

Kepler_L2 said:
With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1.

Turing also had cucked L1 b/w relative to Volta.
Idk what the reason is for doing it here.

Grooveriding · Jan 29, 2025

SolidQ said:
IgorLabs seems to have posted an early RTX 5080 review, then took it down fast.
People who read it said it was about 7% over the 4080S at 4K.

Pathetic. 5080 is a disaster. They know they can get away with it though, as it’s still the third best card, and the second best is only available on the used market for more $$.

branch_suggestion · Jan 29, 2025

adroc_thurston said:
Well it clocks lower.
But that's a nasty regression either way.
They lobotomized the SM to not win on fmax.
Like Maxwell lobotomized the SM inna bunch of ways relative to Kepler but it also clocked like 30-40% faster iso node.

I think they just missed and by quite a lot.

Clock drop is only like 10%, so they clearly architected for density at the cost of associativity, hoping to get a good fmax bump to compensate à la Maxwell.
Instead the opposite happened with seemingly little to show in PPW.

Kepler_L2 said:
With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1.

Sounds like a design lead quirk.

adroc_thurston · Jan 29, 2025

branch_suggestion said:
so they clearly architected for density at the cost of associativity, hoping to get a good fmax bump to compensate à la Maxwell.

Well ain't much density or fmax to find here.

branch_suggestion said:
Instead the opposite happened with seemingly little to show in PPW.

It's actually a PPW regression if it's 15% more power for 8% more perf.

SolidQ · Jan 29, 2025

And people were complaining about 6800XT vs 7800XT, even though the 7800XT was cut down to 60 CUs.

adroc_thurston · Jan 29, 2025

SolidQ said:
And people were complaining about 6800XT vs 7800XT, even though the 7800XT was cut down to 60 CUs.

Well in both cases it was designed to frequency harder.

coercitiv · Jan 29, 2025

Cue the RTX 508% jokes, but then again the joke is on us, unfortunately.

jpiniero · Jan 29, 2025

adroc_thurston said:
It's actually a PPW regression if it's 15% more power for 8% more perf.

Igor measured 4.2% more power. Still not great of course.
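
To put the two power figures side by side, here is a rough perf-per-watt calculation. It assumes the 8% performance gain quoted above holds for both cases, which is an assumption, since the two numbers need not come from the same test. The function name is hypothetical:

```python
def ppw_change(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance per watt.

    Both arguments are fractional gains over the old card,
    e.g. 0.08 for +8% performance, 0.15 for +15% power.
    """
    return (1 + perf_gain) / (1 + power_gain) - 1


# Assumed +8% performance in both cases (figure quoted in the thread).
for label, power_gain in [("15% more power", 0.15), ("4.2% more power", 0.042)]:
    change = ppw_change(0.08, power_gain)
    print(f"{label}: PPW {change * 100:+.1f}%")
# prints:
# 15% more power: PPW -6.1%
# 4.2% more power: PPW +3.6%
```

So on these inputs it is a regression of about 6% in one case and a gain of under 4% in the other.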

adroc_thurston · Jan 29, 2025

jpiniero said:
Igor measured 4.2% more power. Still not great of course.

That's pretty much down to GDDR7 pJ/b improvements then.

ToTTenTranz · Jan 29, 2025

Win2012R2 said:
Supply is very low because this is a **** gen so the only way they can get away with launching it is by creating artificial scarcity.

With 512bit GDDR7 bandwidth and 8x FP4 throughput, the RTX5090 can probably run DeepSeek's (or similarly trained) 32B model super fast.

It's going to be gobbled up by businesses that can now run GPT-o1 (ish) models on a ~$2000 GPU which is ridiculous compared to what they needed before. Same thing with the RTX5080 running 14B models.

adroc_thurston said:
Just genuinely baffled at NV delivering a new SM that's just like, nothing.

Blackwell SMs were upgraded for transformers and tensor FP4, nothing else. There's really nothing there for gaming, except for the new transformer-based DLSS.
Gaming was an afterthought for Blackwell. It's an AI architecture. Be glad there's still ROPs in there.

They're going to be great cards if you want to become an AI hobbyist, though.

Win2012R2 · Jan 29, 2025

ToTTenTranz said:
There's really nothing there for gaming

Changing the 2nd FP32 to also support INT should have done something positive, since according to Nvidia 1/3 of ops are integer and that was supposedly the limiting factor in dual issue. But it looks like it did diddly squat, perhaps because of the L1 bandwidth limitation, which GDDR7 can't fix.

Grooveriding said:
They know they can get away with it though

Can they? Last time 4080 bombed (and "4080" got canned in days).

SolidQ · Jan 29, 2025

4070Super

RTX 5070

So it's possible the 5070 will be slower than the 4070S in some scenarios?

jpiniero · Jan 29, 2025

SolidQ said:
So it's possible the 5070 will be slower than the 4070S in some scenarios?

Seems like it should be.

ToTTenTranz · Jan 29, 2025

Win2012R2 said:
Changing the 2nd FP32 to also support INT should have done something positive, since according to Nvidia 1/3 of ops are integer and that was supposedly the limiting factor in dual issue. But it looks like it did diddly squat, perhaps because of the L1 bandwidth limitation, which GDDR7 can't fix.

Dual-pumping ALUs makes very little difference if they're not increasing caches and registers accordingly. Turing introduced FP32 + INT32 and that's where the bulk of the per-SM performance increase happened. Ampere then made it FP32/INT32+FP32 and it did almost nothing, and now it's FP32/INT32+FP32/INT32 and it did nothing.

The truth is dual-pumping ALUs seems to be super cheap transistor- and area-wise, but the performance gains are just as small.
AMD saw the same when they introduced dual-pumped ALUs in RDNA3.
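
A toy issue-rate model of the three layouts named above, for what it's worth. Everything here is an idealized assumption: two issue slots per cycle, a perfect scheduler, no register, cache or bandwidth limits, and the function and design labels are made up for illustration. It only shows what the pipe layout alone would allow for a given INT share of the instruction mix:

```python
def peak_ipc(int_fraction: float, design: str) -> float:
    """Ideal instructions per cycle for a mix with the given INT share.

    Toy model only: each pipe retires one instruction per cycle and
    nothing else (registers, caches, bandwidth) ever stalls it.
    """
    fp_fraction = 1.0 - int_fraction
    if design == "turing":       # FP32-only pipe + INT32-only pipe
        cycles_per_instr = max(fp_fraction, int_fraction)
    elif design == "ampere":     # FP32-only pipe + FP32/INT32 pipe
        cycles_per_instr = max(0.5, int_fraction)
    elif design == "blackwell":  # two FP32/INT32 pipes
        cycles_per_instr = 0.5
    else:
        raise ValueError(f"unknown design: {design}")
    return 1.0 / cycles_per_instr


# The "1/3 of ops are integer" mix mentioned in the thread.
for design in ("turing", "ampere", "blackwell"):
    print(design, round(peak_ipc(1 / 3, design), 2))
# prints:
# turing 1.5
# ampere 2.0
# blackwell 2.0
```

In this simplified picture the second flexible pipe only helps once INT exceeds half the mix, so at a 1/3 INT share the layout change by itself would not be expected to move much. Real SMs have many other constraints this sketch ignores.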

CastleBravo · Jan 29, 2025

I want to see performance of 5080 vs 4080 with DLSS4 transformer model turned on. If the 50 series doesn't lose ~5% FPS with transformer vs CNN like the 40 series, maybe the 5080 is at least slightly better.

Heartbreaker · Jan 29, 2025

CastleBravo said:
I want to see performance of 5080 vs 4080 with DLSS4 transformer model turned on. If the 50 series doesn't lose ~5% FPS with transformer vs CNN like the 40 series, maybe the 5080 is at least slightly better.

Totally better. 3X as many Fake Frames. What more could you ask for?

CastleBravo · Jan 29, 2025

Heartbreaker said:
Totally better. 3X as many Fake Frames. What more could you ask for?

Not frame gen, just super resolution. Blackwell might handle the new and improved model without a performance hit, unlike Ada and Ampere. Hopefully we end up with at least a "4080 Ti" rather than just a 4080 Tie.

Win2012R2 · Jan 29, 2025

ToTTenTranz said:
Ampere then made it FP32/INT32+FP32 and it did almost nothing

Yeah, but the explanation at the time was that since INT is 33% of the work in gaming, they could not keep the second FP32 busy. Now both supposedly support it, so what's the problem: total bandwidth starvation? It matters because if they fix that on N3, perf in the 60 series might be good, which would be a reason not to buy the 50 series.

ToTTenTranz · Jan 29, 2025

Win2012R2 said:
Yeah, but the explanation at the time was that since INT is 33% of the work in gaming, they could not keep the second FP32 busy. Now both supposedly support it, so what's the problem: total bandwidth starvation? It matters because if they fix that on N3, perf in the 60 series might be good, which would be a reason not to buy the 50 series.

The 60 series is most probably coming only in 2027.

RTX20/30 and RX6000 users are probably going to upgrade this year, and they're not waiting another 2 years.
