Question 'Ampere'/Next-gen gaming uarch speculation thread

Ottonomous · Nov 1, 2019

How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if imprudent/uncalled for, just interested in the forum members' thoughts.

IntelUser2000 · Jun 16, 2020

CastleBravo said:
Frequency?

Exactly. Even in GPUs if you can scale frequency, you'd do that over increasing literally anything else, because frequency benefits all.

But the point is the real compute difference is only 20% because the 2080 Super clocks quite a bit higher.

Mopetar · Jun 17, 2020

Glo. said:
And 40-45% difference in ALU count. So something is genuinely bottlenecking the ALUs.

I think that's just a matter of it being more difficult to fully saturate all of them. We saw the same thing with AMD cards where the insane (especially for the time) number of CUs never translated into the theoretical performance you could get with raw compute tasks.

Unless performance is bottlenecked at some other point (memory, ROPs, etc.), having a card with half the cores but twice the clock speed would almost assuredly have better real world performance.

maddie · Jun 17, 2020

IntelUser2000 said:
Exactly. Even in GPUs if you can scale frequency, you'd do that over increasing literally anything else, because frequency benefits all.

But the point is the real compute difference is only 20% because the 2080 Super clocks quite a bit higher.

Yep. It scales almost exactly for the (freq*shader) product.
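As a rough check of that claim, here is a minimal sketch comparing raw shader count against the (freq*shader) product. The card specs are assumptions not stated in the thread: reference figures of 4352 shaders at a 1545 MHz boost for the 2080 Ti and 3072 shaders at 1815 MHz for the 2080 Super. Real cards boost differently, so treat the output as illustrative only.

```python
# Assumed reference specs (not from the thread): (shader count, boost clock in MHz).
RTX_2080_TI = (4352, 1545)
RTX_2080_SUPER = (3072, 1815)


def throughput(shaders: int, clock_mhz: float) -> float:
    """Peak-throughput proxy: shader count times clock."""
    return shaders * clock_mhz


# Advantage of the bigger card measured two ways.
shader_ratio = RTX_2080_TI[0] / RTX_2080_SUPER[0]
compute_ratio = throughput(*RTX_2080_TI) / throughput(*RTX_2080_SUPER)

print(f"shader count advantage: {shader_ratio - 1:.1%}")
print(f"freq*shader advantage:  {compute_ratio - 1:.1%}")
```

Under those assumed specs the shader advantage comes out a little over 40% while the freq*shader advantage is only about 20%, which is the gap being discussed.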

uzzi38 · Jun 21, 2020

Exclusive first look at Nvidia's Ampere Gaming performance - HardwareLeaks.com

Next-gen Nvidia Ampere gaming GPUs are one of the most anticipated pieces of hardware in the PC and gaming scene this year. Over the course of the last few months, a number of leaks surfaced detailing various aspects of the much anticipated gaming GPUs...

hardwareleaks.com

_rogame got his hands on a Time Spy score. Lots of details and comparisons here.

DiogoDX · Jun 21, 2020

uzzi38 said:
Exclusive first look at Nvidia's Ampere Gaming performance - HardwareLeaks.com

Next-gen Nvidia Ampere gaming GPUs are one of the most anticipated pieces of hardware in the PC and gaming scene this year. Over the course of the last few months, a number of leaks surfaced detailing various aspects of the much anticipated gaming GPUs...

hardwareleaks.com

_rogame got his hands on a Time Spy score. Lots of details and comparisons here.

The hope is that this is the 3080. 30% over 2080Ti is too low for the top dog on a new node.

uzzi38 · Jun 21, 2020

I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be few adjustments to the uArch in terms of shaders. Consumer-facing Ampere I'd hope is different, but I can't see some revolutionary jump in performance coming as a result of this.

https://twitter.com/x/status/1274075738992451585

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do hope the final clocks are higher though at the least, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti which is more like what you'd expect generation on generation.
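The 2.3GHz figure can be sanity-checked with a small sketch. It assumes performance scales linearly with core clock, which is an upper bound since memory bandwidth does not scale with it. It takes the roughly 30% lead over the 2080Ti discussed in this thread, and it assumes the leaked run was at about 1.9GHz, which is not confirmed here.

```python
def projected_lead(current_lead: float, current_clock_ghz: float,
                   target_clock_ghz: float) -> float:
    """Scale a relative performance lead linearly with core clock (optimistic)."""
    return current_lead * target_clock_ghz / current_clock_ghz


# 1.30 means 30% ahead of the 2080 Ti; 1.9 GHz is an assumed clock for the leaked run.
lead_at_2_3 = projected_lead(1.30, 1.9, 2.3)
print(f"projected lead at 2.3 GHz: {lead_at_2_3 - 1:.0%}")
```

Perfect scaling lands a bit under 60%, so something like 50% after real-world losses is plausible.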

exquisitechar · Jun 21, 2020

uzzi38 said:
I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be few adjustments to the uArch in terms of shaders. Consumer-facing Ampere I'd hope is different, but I can't see some revolutionary jump in performance coming as a result of this.

https://twitter.com/x/status/1274075738992451585

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do hope the final clocks are higher though at the least, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti which is more like what you'd expect generation on generation.

Explains the 350W rumor. If this is the rumored 3090/whatever, Nvidia is in for a fight in raster performance and they’re cranking up the clocks (and power) to compete to the utmost possible. I expect to see them pushing RT performance and DLSS 2.0 like crazy.

DXDiag · Jun 21, 2020

This is definitely the 3080 (which will be launching first); 30% faster than the 2080Ti is the expected gain for an xx80 SKU on a new node.

That is, if the quoted power figures for the top dog are in any way true. It is not possible for NVIDIA to require 350W for the 3090 for a meager 30% uplift on a new node.

uzzi38 said:
GA100 at least doesn't show any major uplifts in performance/flop or IPC,

Nope, it shows 60-70% general HPC performance improvements despite much lower gains in FP32.

uzzi38 · Jun 21, 2020

DXDiag said:
This is definitely the 3080 (which will be launching first); 30% faster than the 2080Ti is the expected gain for an xx80 SKU on a new node.

That is, if the quoted power figures for the top dog are in any way true. It is not possible for NVIDIA to require 350W for the 3090 for a meager 30% uplift on a new node.

Nobody said 350W for a 30% uplift. You're making the ridiculous mistake of assuming these are final clocks at that 350W power budget, which nobody is suggesting or will suggest.

My estimate for final performance remains the same as it was way back. I think we're looking at a 40-50% uplift over Turing.

DXDiag · Jun 21, 2020

uzzi38 said:
Nobody said 350W for a 30% uplift. You're making the ridiculous mistake of assuming these are final clocks at that 350W power budget, which nobody is suggesting or will suggest.

NVIDIA is launching the 3080 first, the leaks were for the 3080 as well, so it stands to reason that what we are seeing is the 3080 numbers.

uzzi38 · Jun 21, 2020

DXDiag said:
NVIDIA is launching the 3080 first, the leaks were for the 3080 as well, so it stands to reason that what we are seeing is the 3080 numbers.

There is more than one benchmark. This is the only one with a score. _rogame has found tests for GPUs with 10GB VRAM, 12GB VRAM and 24GB VRAM as of yesterday.

The leaks were for all 3 GA102 based dies.

DXDiag · Jun 21, 2020

uzzi38 said:
There is more than one benchmark. This is the only one with a score. _rogame has found tests for GPUs with 10GB VRAM, 12GB VRAM and 24GB VRAM as of yesterday.

Which one are these scores for?

uzzi38 · Jun 21, 2020

DXDiag said:
Which one are these scores for?

VRAM is misreported for this sample. We know what it is because of something else, but I was asked not to share.

sontin · Jun 21, 2020

uzzi38 said:
There is more than one benchmark. This is the only one with a score. _rogame has found tests for GPUs with 10GB VRAM, 12GB VRAM and 24GB VRAM as of yesterday.

The leaks were for all 3 GA102 based dies.

He didn't find scores. He has access to private scores.

DXDiag said:
Nope, it shows 60-70% general HPC performance improvements despite much lower gains in FP32.

A100 with FP16 and TensorCores is 87.5% more efficient than V100.

jpiniero · Jun 21, 2020

sontin said:
A100 with FP16 and TensorCores is 87.5% more efficient than V100.

But that's with the Tensor cores. Raw Single precision isn't any more efficient, and that's what Raster games use. Plus that's on TSMC 7 and not SS 8.

Certainly would be disappointing if it's pushing 300+ W and only 30% faster.

sontin · Jun 21, 2020

No, throughput with FP16 vector operations is 4:1 to FP32. So within 400W, A100 delivers 2.5x more performance with FP16 than V100.
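For reference, the 87.5% efficiency figure and the 2.5x figure are the same claim once board power is factored in. A minimal sketch, assuming a 300W V100 (that number is not given in this thread) against the 400W A100:

```python
def perf_per_watt_gain(perf_ratio: float, new_power_w: float,
                       old_power_w: float) -> float:
    """Relative perf/W of the new part versus the old one."""
    return perf_ratio * old_power_w / new_power_w


# 2.5x FP16 throughput at 400 W versus an assumed 300 W for V100.
gain = perf_per_watt_gain(2.5, new_power_w=400, old_power_w=300)
print(f"perf/W improvement: {gain - 1:.1%}")  # prints "perf/W improvement: 87.5%"
```

The same helper works for any pair of parts, as long as both throughput numbers come from the same kind of workload.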

DXDiag · Jun 21, 2020

jpiniero said:
Raw Single precision isn't any more efficient

A100 is 60 to 70% faster than V100 in regular FP32/FP64 HPC workloads.

Glo. · Jun 21, 2020

If that leaked score from Nvidia's next-gen gaming GPUs was achieved at over 1.9 GHz, at least it gives hope that it is an RTX 3080 Ti that does not have an eye-watering power draw.

30% performance increase sounds about right looking at the rumored/leaked CU counts by Kopite for specific SKUs, with that clock speed, and the same lack of IPC increase in FP32 that GA100 has.

So if anything, I would presume this is the RTX 3080 Ti, but with power draw anywhere between 260 and 300W and 30% higher performance than the RTX 2080 Ti FE.

If it is 350W at 1.9 GHz, it means that by going down a node to SS's 10 nm-class process, Nvidia effectively lost efficiency compared to the 12 FFN process.

jpiniero · Jun 21, 2020

DXDiag said:
A100 is 60 to 70% faster than V100 in regular FP32/FP64 HPC workloads.

Doesn't seem like that's the case, otherwise they would have quoted a much higher raw FP32/FP64 performance. You'd have to make use of the tensor cores. Maybe we will get some real reviews of the A100 at some point.

Glo. said:
If it is 350W at 1.9 GHz, it means that by going down a node to SS's 10 nm-class process, Nvidia effectively lost efficiency compared to the 12 FFN process.

Probably not 350 W, but if there was indeed no efficiency gain, I guess 30% faster for 30% more power does make sense.
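To put a number on that: at constant perf/W, power rises in step with performance. A rough sketch, assuming the 2080 Ti FE's 260W board power as the baseline (the baseline card and its wattage are assumptions for illustration):

```python
def power_at_constant_efficiency(base_power_w: float, perf_ratio: float) -> float:
    """Power needed for perf_ratio times the performance if perf/W is unchanged."""
    return base_power_w * perf_ratio


# 30% faster than an assumed 260 W baseline with no efficiency gain.
needed_w = power_at_constant_efficiency(260, 1.30)
print(f"{needed_w:.0f} W")
```

That works out to roughly 340W, which is within sight of the 350W rumour.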

Konan · Jun 21, 2020

jpiniero said:
Certainly would be disappointing if it's pushing 300+ W and only 30% faster.

Whatever it is, it is not the final launch product.

Not only will they be tuning memory speeds, which will only be higher than what we see here, there are also going to be iterative driver enhancements and optimization through the QA process.

GodisanAtheist · Jun 21, 2020

Whatever it is, hopefully NV will be more sensible in their pricing thanks to the cheaper SS node and upcoming competition from consoles and AMD.

"There are no bad products, only bad pricing" obviously within limits. I fully expect NV to deliver the performance goods, I just hope they don't bleed people dry in the process.

jpiniero · Jun 21, 2020

Videocardz says the PCIe version of the A100 will be announced tomorrow. Maybe you will get a hint as to how high W they are willing to go on PCIe.

maddie · Jun 21, 2020

GodisanAtheist said:
Whatever it is, hopefully NV will be more sensible in their pricing thanks to the cheaper SS node and upcoming competition from consoles and AMD.

"There are no bad products, only bad pricing" obviously within limits. I fully expect NV to deliver the performance goods, I just hope they don't bleed people dry in the process.

At least it's voluntary.

Ajay · Jun 21, 2020

uzzi38 said:
I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be few adjustments to the uArch in terms of shaders. Consumer-facing Ampere I'd hope is different, but I can't see some revolutionary jump in performance coming as a result of this.

https://twitter.com/x/status/1274075738992451585

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do hope the final clocks are higher though at the least, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti which is more like what you'd expect generation on generation.

Well, NV made large gains in CC performance per clock in Maxwell, Pascal and Turing (IIRC). They've run out of great ideas for increasing that for now. The only thing left may be larger cache sizes, but these GPUs are going to be pretty big on SS 8nm and at suspected clocks, probably too power hungry for any big change. The whole memory hierarchy is bottlenecking FMA performance with the high number of compute elements and clock speeds for any realistic set of scenes.

uzzi38 · Jun 21, 2020

Ajay said:
Well, NV made large gains in CC performance per clock in Maxwell, Pascal and Turing (IIRC). They've run out of great ideas for increasing that for now. The only thing left may be larger cache sizes, but these GPUs are going to be pretty big on SS 8nm and at suspected clocks, probably too power hungry for any big change. The whole memory hierarchy is bottlenecking FMA performance with the high number of compute elements and clock speeds for any realistic set of scenes.

The Maxwell -> Pascal performance increase came primarily from increasing shader counts and a huge uptick in clocks. uArch differences were much smaller than in other uArch changes; it's just that the jump from 28nm -> 16nm was huge.

Anyway, everybody's acting like 8LPP is the end of the world, but it's really not. It's still like a 40-50% improvement in power/perf vs TSMC N16 (give or take), it's just abysmal in density is all (compared to N7 that is).
