Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of more than 4K60, say at least 90 fps?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent or uncalled for; I'm just interested in the forum members' thoughts.
 

Mopetar

Diamond Member
Jan 31, 2011
And 40-45% difference in ALU count. So something is genuinely bottlenecking the ALUs.

I think that's just a matter of it being more difficult to fully saturate all of them. We saw the same thing with AMD cards where the insane (especially for the time) number of CUs never translated into the theoretical performance you could get with raw compute tasks.

Unless performance is bottlenecked at some other point (memory, ROPs, etc.) then having a card with half the cores but twice the clock speed would almost assuredly have better real world performance.
 

maddie

Diamond Member
Jul 18, 2010
Exactly. Even in GPUs if you can scale frequency, you'd do that over increasing literally anything else, because frequency benefits all.

But the point is the real compute difference is only 20% because the 2080 Super clocks quite a bit higher.
Yep. It scales almost exactly for the (freq*shader) product.
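The (freq*shader) argument can be checked by computing the product directly. This sketch assumes the comparison is the RTX 2080 Ti vs the RTX 2080 Super at Nvidia's reference boost clocks (4352 CUDA cores at 1545 MHz vs 3072 at 1815 MHz); the posts above don't say which cards or clocks were used, so treat these as illustrative inputs.

```python
def shader_clock_product(cores: int, boost_mhz: float) -> float:
    """Theoretical shader throughput proxy: CUDA cores x boost clock (MHz)."""
    return cores * boost_mhz

# Assumed reference specs (not from the thread): (CUDA cores, boost clock in MHz)
rtx_2080_ti = (4352, 1545)
rtx_2080_super = (3072, 1815)

alu_ratio = rtx_2080_ti[0] / rtx_2080_super[0]
compute_ratio = shader_clock_product(*rtx_2080_ti) / shader_clock_product(*rtx_2080_super)

print(f"ALU count difference: {alu_ratio - 1:.1%}")           # 41.7%
print(f"Shader x clock difference: {compute_ratio - 1:.1%}")  # 20.6%
```

Under those assumptions the ~40-45% ALU gap shrinks to roughly 20% once clocks are factored in.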
 

uzzi38

Platinum Member
Oct 16, 2019

_rogame got his hands on a Time Spy score. Lots of details and comparisons here.
 

DiogoDX

Senior member
Oct 11, 2012

uzzi38 said:
_rogame got his hands on a Time Spy score. Lots of details and comparisons here.

The hope is that this is the 3080. 30% over the 2080Ti is too low for the top dog on a new node.
 

uzzi38

Platinum Member
Oct 16, 2019
I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be only small adjustments to the uArch in terms of shaders. I'd hope consumer-facing Ampere is different, but I can't see some revolutionary jump in performance coming as a result of this.

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do at least hope the final clocks are higher, though, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti, which is more like what you'd expect generation on generation.
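The 2.3GHz projection assumes performance scales linearly with clock at a fixed core count. A minimal sketch of that arithmetic follows; the 2.0GHz test clock is a hypothetical placeholder, since the post doesn't say what clock the Time Spy run used.

```python
def project_lead(measured_ratio: float, test_clock_ghz: float, target_clock_ghz: float) -> float:
    """Scale a measured performance ratio (new GPU / 2080 Ti) linearly with clock.

    Assumes perf ~ cores x clock, i.e. no memory, power or other bottleneck.
    """
    return measured_ratio * (target_clock_ghz / test_clock_ghz)

# Hypothetical inputs: ~30% lead measured at an assumed 2.0 GHz test clock.
lead = project_lead(1.30, 2.0, 2.3)
print(f"Projected lead over 2080 Ti: {lead - 1:.1%}")
```

With those inputs the projected lead lands at roughly 50%; a lower actual test clock would push it higher.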
 

exquisitechar

Senior member
Apr 18, 2017
uzzi38 said:
I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be only small adjustments to the uArch in terms of shaders. I'd hope consumer-facing Ampere is different, but I can't see some revolutionary jump in performance coming as a result of this.

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do at least hope the final clocks are higher, though, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti, which is more like what you'd expect generation on generation.

Explains the 350W rumor. If this is the rumored 3090/whatever, Nvidia is in for a fight in raster performance and they're cranking up the clocks (and power) to compete as hard as possible. I expect to see them pushing RT performance and DLSS 2.0 like crazy.
 

DXDiag

Member
Nov 12, 2017
This is definitely the 3080 (which will be launching first); 30% faster than the 2080Ti is the expected gain for an xx80 SKU on a new node.

If the quoted power figures for the top dog are at all true, it is not possible for NVIDIA to require 350W for the 3090 for a meager 30% uplift on a new node.

uzzi38 said:
GA100 at least doesn't show any major uplifts in performance/flop or IPC,

Nope, it shows 60-70% general HPC performance improvements despite much lower gains in FP32.
 

uzzi38

Platinum Member
Oct 16, 2019
DXDiag said:
This is definitely the 3080 (which will be launching first); 30% faster than the 2080Ti is the expected gain for an xx80 SKU on a new node.

If the quoted power figures for the top dog are at all true, it is not possible for NVIDIA to require 350W for the 3090 for a meager 30% uplift on a new node.

Nobody said 350W for a 30% uplift. You're making the ridiculous mistake of assuming these are final clocks at that 350W power budget, which nobody is suggesting or will suggest.

My estimate for final performance remains the same as it was way back. I think we're looking at a 40-50% uplift over Turing.
 

DXDiag

Member
Nov 12, 2017
uzzi38 said:
Nobody said 350W for a 30% uplift. You're making the ridiculous mistake of assuming these are final clocks at that 350W power budget, which nobody is suggesting or will suggest.

NVIDIA is launching the 3080 first, and the leaks were for the 3080 as well, so it stands to reason that what we are seeing is the 3080 numbers.
 

uzzi38

Platinum Member
Oct 16, 2019
DXDiag said:
NVIDIA is launching the 3080 first, and the leaks were for the 3080 as well, so it stands to reason that what we are seeing is the 3080 numbers.

There is more than one benchmark. This is the only one with a score. _rogame has found tests for GPUs with 10GB VRAM, 12GB VRAM and 24GB VRAM as of yesterday.

The leaks were for all 3 GA102 based dies.
 

sontin

Diamond Member
Sep 12, 2011
uzzi38 said:
There is more than one benchmark. This is the only one with a score. _rogame has found tests for GPUs with 10GB VRAM, 12GB VRAM and 24GB VRAM as of yesterday.

The leaks were for all 3 GA102 based dies.

He didn't find scores. He has access to private scores.

DXDiag said:
Nope, it shows 60-70% general HPC performance improvements despite much lower gains in FP32.

A100 with FP16 and Tensor Cores is 87.5% more efficient than V100.
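An efficiency claim like this is a ratio of throughput per watt. The sketch below plugs in Nvidia's published peak dense FP16 Tensor Core figures (A100 SXM: 312 TFLOPS at 400W; V100 SXM2: 125 TFLOPS at 300W) as an assumption; the post doesn't say which figures produced 87.5%, and this pair gives roughly 87%.

```python
def perf_per_watt_gain(perf_new: float, watts_new: float,
                       perf_old: float, watts_old: float) -> float:
    """Relative perf/W improvement of the new part over the old one."""
    return (perf_new / watts_new) / (perf_old / watts_old) - 1.0

# Assumed spec-sheet values: peak FP16 Tensor Core TFLOPS and board power.
gain = perf_per_watt_gain(312, 400, 125, 300)
print(f"A100 vs V100 FP16 tensor perf/W: +{gain:.1%}")  # +87.2%
```

Note that these are peak Tensor Core numbers, so they say nothing about plain FP32 shader efficiency.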
 

jpiniero

Lifer
Oct 1, 2010
sontin said:
A100 with FP16 and Tensor Cores is 87.5% more efficient than V100.

But that's with the Tensor Cores. Raw single precision isn't any more efficient, and that's what raster games use. Plus that's on TSMC 7 and not SS 8.

Certainly would be disappointing if it's pushing 300+ W and only 30% faster.
 

Glo.

Diamond Member
Apr 25, 2015
If the leaked score from Nvidia's next-gen gaming GPU was achieved at over 1.9 GHz, at least it gives hope that it's an RTX 3080 Ti without eye-watering power draw.

A 30% performance increase sounds about right given the rumored/leaked CU counts from Kopite for specific SKUs, that clock speed, and the same lack of FP32 IPC increase that GA100 shows.

So if anything, I would presume this is the RTX 3080 Ti, but with power draw anywhere between 260 and 300W and 30% higher performance than the RTX 2080 Ti FE.

If it is 350W at 1.9 GHz, it effectively means that by going down a node to SS's 10nm node, Nvidia lost efficiency compared to the 12FFN process.
 

jpiniero

Lifer
Oct 1, 2010
14,820
5,432
136
A100 is 60 to 70% faster than V100 in regular FP32/FP64 HPC workloads.

Doesn't seem like that's the case, otherwise they would have quoted a much higher raw FP32/FP64 performance. You'd have to make use of the tensor cores. Maybe we will get some real reviews of the A100 at some point.

Glo. said:
If it is 350W at 1.9 GHz, it effectively means that by going down a node to SS's 10nm node, Nvidia lost efficiency compared to the 12FFN process.

Probably not 350W, but if there was indeed no efficiency gain, I guess 30% faster for 30% more power does make sense.
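The "no efficiency gain" reading is just performance divided by power, relative to Turing. A sketch, assuming a 2080 Ti FE baseline of 260W (its rated board power, not a figure from the thread) and the leaked ~30% performance lead:

```python
BASELINE_WATTS = 260.0  # assumed: RTX 2080 Ti Founders Edition board power
PERF_RATIO = 1.30       # ~30% faster than the 2080 Ti, per the leaked score

def relative_efficiency(perf_ratio: float, watts: float,
                        baseline_watts: float = BASELINE_WATTS) -> float:
    """Perf/W of the new card relative to the baseline (1.0 = no change)."""
    return perf_ratio / (watts / baseline_watts)

# 30% more power, the upper end of Glo's range, and the 350W rumour.
for watts in (BASELINE_WATTS * 1.30, 300.0, 350.0):
    print(f"{watts:.0f} W -> relative perf/W {relative_efficiency(PERF_RATIO, watts):.2f}")
```

Under these assumptions only the 350W case drops below 1.0, which is the efficiency regression Glo describes.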
 

Konan

Senior member
Jul 28, 2017
jpiniero said:
Certainly would be disappointing if it's pushing 300+ W and only 30% faster.

Whatever it is, it is not the final launch product.

Not only will they be tuning memory speeds, which will only be higher than what we see here, but there will also be iterative driver enhancements and optimizations through the QA process.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
Whatever it is, hopefully NV will be more sensible in their pricing thanks to the cheaper SS node and upcoming competition from consoles and AMD.

"There are no bad products, only bad pricing", obviously within limits. I fully expect NV to deliver the performance goods; I just hope they don't bleed people dry in the process.
 

maddie

Diamond Member
Jul 18, 2010
GodisanAtheist said:
Whatever it is, hopefully NV will be more sensible in their pricing thanks to the cheaper SS node and upcoming competition from consoles and AMD.

"There are no bad products, only bad pricing", obviously within limits. I fully expect NV to deliver the performance goods; I just hope they don't bleed people dry in the process.

At least it's voluntary.
 

Ajay

Lifer
Jan 8, 2001
16,044
8,086
136
uzzi38 said:
I don't think it is.

GA100 at least doesn't show any major uplifts in performance/flop or IPC, whatever you want to call it. There seem to be only small adjustments to the uArch in terms of shaders. I'd hope consumer-facing Ampere is different, but I can't see some revolutionary jump in performance coming as a result of this.

The score is in line with what you'd expect from the same kind of clocks and 20-something% extra CUDA cores. I do at least hope the final clocks are higher, though, especially given the 350W rumour. At, say, 2.3GHz, it would become a 50% lead over the 2080Ti, which is more like what you'd expect generation on generation.

Well, NV made large gains in CUDA core performance per clock with Maxwell, Pascal and Turing (IIRC). They've run out of great ideas for increasing that for now. The only thing left may be larger cache sizes, but these GPUs are going to be pretty big on SS 8nm and, at the suspected clocks, probably too power hungry for any big change. The whole memory hierarchy is bottlenecking FMA performance, given the high number of compute elements and clock speeds, for any realistic set of scenes.
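The memory-bottleneck argument can be illustrated with a simple roofline model: attainable FLOP/s is the lesser of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch uses RTX 2080 Ti reference figures as assumed inputs (4352 CUDA cores, one FMA = 2 FLOPs per core per clock, 1545 MHz boost, 616 GB/s) and ignores caches, which in practice raise effective bandwidth.

```python
def peak_fp32_flops(cores: int, clock_hz: float) -> float:
    """Peak FP32 rate: each CUDA core retires one FMA (2 FLOPs) per clock."""
    return cores * 2 * clock_hz

def roofline(intensity_flop_per_byte: float, peak_flops: float,
             bandwidth_bytes_s: float) -> float:
    """Attainable FLOP/s under the classic roofline model."""
    return min(peak_flops, bandwidth_bytes_s * intensity_flop_per_byte)

# Assumed RTX 2080 Ti reference specs (not from the thread).
peak = peak_fp32_flops(4352, 1.545e9)  # ~13.4 TFLOPS
bandwidth = 616e9                      # bytes/s of GDDR6 bandwidth

ridge = peak / bandwidth  # intensity needed before a kernel becomes compute-bound
print(f"Ridge point: {ridge:.1f} FLOP/byte")
for ai in (4, 16, 64):
    print(f"AI={ai:>2}: {roofline(ai, peak, bandwidth) / 1e12:.2f} TFLOPS")
```

Adding cores or clock raises only the flat part of the roof, so work sitting left of the ridge point gains nothing unless bandwidth (or cache hit rate) improves too.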
 

uzzi38

Platinum Member
Oct 16, 2019
2,698
6,393
146
Ajay said:
Well, NV made large gains in CUDA core performance per clock with Maxwell, Pascal and Turing (IIRC). They've run out of great ideas for increasing that for now. The only thing left may be larger cache sizes, but these GPUs are going to be pretty big on SS 8nm and, at the suspected clocks, probably too power hungry for any big change. The whole memory hierarchy is bottlenecking FMA performance, given the high number of compute elements and clock speeds, for any realistic set of scenes.

The Maxwell -> Pascal performance increase came primarily from increased shader counts and a huge uptick in clocks. The uArch differences were much smaller than in other generational changes; it's just that the jump from 28nm -> 16nm was huge.

Anyway, everybody's acting like 8LPP is the end of the world, but it's really not. It's still something like a 40-50% improvement in power/perf vs TSMC N16 (give or take); it's just abysmal in density (compared to N7, that is).
 