adroc_thurston
Diamond Member
- Jul 2, 2023
Quote: "Regressions in L1 bandwidth and L2/mem latency:"
Wait, lower L1 b/w with a lot more SMs?
And they won exactly 0MHz in fmax on that?
Yep, that explains a lot.
Quote: "Yep, that explains a lot."
Well it has moar SMs, it shouldn't regress in L1 b/w at least.
Latency is expected, bigger chip after all.
Aggregate L1/L2 bandwidth being lower is a huge shock, likely a forced regression due to power limitations.
Quote: "Well it has moar SMs, it shouldn't regress in L1 b/w at least."
5090 is 0.88x L1 bandwidth with 1.33x SMs.
Quote: "5090 is 0.88x L1 bandwidth with 1.33x SMs. That is 50% less bandwidth/SM, what on earth?"
Well it clocks lower.
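The per-SM figure can be checked directly from the two ratios quoted above. A minimal Python sketch, assuming the 0.88x figure is aggregate L1 bandwidth and 1.33x is the SM count ratio. The function name `per_sm_ratio` is made up for illustration, and the optional clock normalization uses the rough 10% clock drop estimated elsewhere in this thread, not a measured value:

```python
def per_sm_ratio(aggregate_ratio, sm_ratio, clock_ratio=1.0):
    """Per-SM (optionally per-clock) bandwidth ratio, new chip vs old.

    aggregate_ratio: new/old aggregate L1 bandwidth (0.88 in the thread)
    sm_ratio:        new/old SM count (1.33 in the thread)
    clock_ratio:     new/old clock; 1.0 means "do not normalize for clocks"
    """
    return aggregate_ratio / sm_ratio / clock_ratio


# Figures quoted in the thread: 0.88x aggregate L1 bandwidth, 1.33x SMs.
per_sm = per_sm_ratio(0.88, 1.33)

# Assumption: clocks are about 10% lower, per the estimate in the thread.
per_sm_per_clock = per_sm_ratio(0.88, 1.33, clock_ratio=0.90)

print(f"per SM:           {per_sm:.2f}x ({(1 - per_sm) * 100:.0f}% lower)")
print(f"per SM per clock: {per_sm_per_clock:.2f}x "
      f"({(1 - per_sm_per_clock) * 100:.0f}% lower)")
```

On those inputs the drop works out to roughly a third per SM, or about a quarter per SM per clock, rather than a full 50%.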
Quote: "5090 is 0.88x L1 bandwidth with 1.33x SMs. That is 50% less bandwidth/SM, what on earth?"
With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1.
Quote: "With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1."
Turing also had cucked L1 b/w relative to Volta.
IgorLabs seems to have posted an early RTX 5080 review, and then took it down fast.
People who read it say it was like 7% over the 4080S at 4K.
Quote: "Well it clocks lower."
Clock drop is only like 10% so they clearly architected for density at the cost of associativity, hoping to get a good fmax bump to compensate ala Maxwell.
But that's a nasty regression either way.
They lobotomized the SM to not win on fmax.
Like Maxwell lobotomized the SM inna bunch of ways relative to Kepler but it also clocked like 30-40% faster iso node.
I think they just missed and by quite a lot.
Quote: "With Ampere and the dual-FP32 design they doubled L1 bandwidth, seems like the return to Maxwell/Pascal-like ALU design also returned to the older L1."
Sounds like a design lead quirk.
Quote: "so they clearly architected for density at the cost of associativity, hoping to get a good fmax bump to compensate ala Maxwell."
Well ain't much density or fmax to find here.
Quote: "Instead the opposite happened with seemingly little to show in PPW."
It's actually a PPW regression if it's 15% more power for 8% more perf.
Quote: "and people complaining about 6800XT vs 7800XT, despite 7800XT have reduced to 60CU"
Well in both cases it was designed to frequency harder.
Quote: "Igor measured 4.2% more power. Still not great of course."
That's pretty much down to GDDR7 pj/b improvements then.
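The two power figures being argued over give different perf-per-watt answers, which a few lines of Python make explicit. This is only a sketch: `ppw_ratio` is an illustrative name, and pairing the 8% perf figure with the 4.2% power measurement is an assumption, since the thread does not say both numbers come from the same test.

```python
def ppw_ratio(perf_ratio, power_ratio):
    """New/old performance-per-watt, given new/old perf and power ratios."""
    return perf_ratio / power_ratio


# Case from the thread: 8% more perf for 15% more power.
thread_case = ppw_ratio(1.08, 1.15)

# Assumption: the same 8% perf uplift, paired with the 4.2% power figure.
measured_power_case = ppw_ratio(1.08, 1.042)

for label, ratio in [("15% more power", thread_case),
                     ("4.2% more power", measured_power_case)]:
    print(f"{label}: PPW {ratio:.3f}x ({(ratio - 1) * 100:+.1f}%)")
```

With 15% more power this is about a 6% PPW regression; with 4.2% more power it turns into a gain of under 4%.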
Supply is very low because this is a **** gen so the only way they can get away with launching it is by creating artificial scarcity.
Just genuinely baffled at NV delivering a new SM that's just like, nothing.
Quote: "There's really nothing there for gaming"
Changing 2nd FP32 to also support INT should have done something positive, since according to Nvidia 1/3 of ops are integer and that was supposedly the limiting factor in dual issue, but it looks like it did diddly squat, perhaps because of the L1 bandwidth limitation, and GDDR7 can't fix it.
They know they can get away with it though.
So possibly the 5070 will be slower than the 4070S in some scenarios?
I want to see performance of 5080 vs 4080 with DLSS4 transformer model turned on. If the 50 series doesn't lose ~5% FPS with transformer vs CNN like the 40 series, maybe the 5080 is at least slightly better.
Totally better. 3X as many Fake Frames. What more could you ask for?
Quote: "Ampere then made it FP32/INT32+FP32 and it did almost nothing"
Yeah, but that was explained at the time: since in gaming INT is 33% of the time, they could not keep the second FP32 busy. But now they both supposedly support it, so what's the problem, total bandwidth starving? It's important because if they fix that on N3 then perf in the 60 series might be good = decision not to buy 50 series.
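One way to frame the INT question is a toy throughput model. The Python sketch below is an idealized assumption, not how the hardware actually schedules: it ignores issue limits, register file and L1 bandwidth, latency and memory, and only asks how many lanes per clock an instruction mix could keep busy. The lane counts (16 FP32-only plus 16 FP32/INT32 for the Ampere-style layout, 32 FP32/INT32 for the new one, per SM partition) and the name `sustained_lanes` are assumptions for illustration.

```python
def sustained_lanes(int_fraction, fp_only, shared, int_only=0):
    """Upper bound on ops/clock for a mix with the given INT fraction.

    fp_only:  lanes that can only run FP32
    shared:   lanes that can run either FP32 or INT32
    int_only: lanes that can only run INT32
    """
    bounds = [fp_only + shared + int_only]          # every lane busy
    fp_fraction = 1.0 - int_fraction
    if fp_fraction > 0:
        # FP work can only use the FP-only and shared lanes.
        bounds.append((fp_only + shared) / fp_fraction)
    if int_fraction > 0:
        # INT work can only use the INT-only and shared lanes.
        bounds.append((int_only + shared) / int_fraction)
    return min(bounds)


INT_FRACTION = 1 / 3  # the "1/3 of ops are integer" figure from the thread

# Assumed per-partition layouts (hypothetical labels).
ampere_style = sustained_lanes(INT_FRACTION, fp_only=16, shared=16)
unified_style = sustained_lanes(INT_FRACTION, fp_only=0, shared=32)

print(ampere_style, unified_style)  # both reach the full 32 lanes in this model
```

In this model the Ampere-style layout is already lane-saturated at a 1/3 INT mix, so making both halves INT-capable only helps once INT exceeds half the mix (at 75% INT it gives about 21 vs 32 lanes). That would fit the lack of gaming uplift, but it says nothing about whether L1 bandwidth is the real limiter.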