Question 'Ampere'/Next-gen gaming uarch speculation thread

Page 18 - AnandTech Forums

Ottonomous

Senior member
May 15, 2014
559
292
136
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent or uncalled for; I'm just interested in the forum members' thoughts.
 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
Yes. Turing is effectively 3 years old at this point; it's just an RT evolution of Volta, which was introduced 3 years ago. NVIDIA will revamp the uarch.

Take a look at the provided leaked benchmarks: the 7552-core Ampere GPU (@1100MHz) is beating the 4608-core Turing GPU (@1800MHz) by 40% despite running at much lower clocks, which implies a ~40% increase in IPC alone. Imagine if the 7552-core GPU operated at something like 1600MHz or more.
Why are you comparing against the Titan RTX?

Compare vs Volta, which comes with its own optimised driver set and, you know, is the actual predecessor to the GPU in that benchmark: https://browser.geekbench.com/v5/compute/576479

Compared against a V100 with 5120 shaders clocking at 1.38GHz, it manages a mere 10% lead.
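Both comparisons can be sanity-checked with simple cores-times-clock arithmetic, using only the figures quoted in this thread (the leaked numbers themselves are unverified):

```python
# Cores and clocks as quoted in the thread (leaked Ampere figures unverified).
ampere_cores, ampere_mhz = 7552, 1100   # leaked Ampere sample
turing_cores, turing_mhz = 4608, 1800   # Titan RTX
volta_cores, volta_mhz = 5120, 1380     # Tesla V100

def raw_throughput(cores, mhz):
    """Theoretical shader throughput, proportional to cores * clock."""
    return cores * mhz

# vs Titan RTX: raw throughput is nearly identical, so a 40% benchmark
# lead would imply roughly 40% more work per core per clock.
vs_turing = raw_throughput(ampere_cores, ampere_mhz) / raw_throughput(turing_cores, turing_mhz)
print(f"cores*clock vs Titan RTX: {vs_turing:.3f}x")

# vs V100: raw throughput is ~18% higher, so the observed ~10% lead
# implies slightly *less* work per core per clock, not more.
vs_volta = raw_throughput(ampere_cores, ampere_mhz) / raw_throughput(volta_cores, volta_mhz)
print(f"cores*clock vs V100: {vs_volta:.3f}x")
```

Which of the two comparisons is the fair one is exactly what the rest of this page argues about.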
 
Reactions: DisEnchantment

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
I think Nvidia could put out a 3080 Ti with 70% more non-RTX performance than the 2080 Ti, but only if it is a 7nm 700+mm^2 monster with an MSRP of $1,500+. I don't see a 70% performance per dollar increase for the top-end card in anything other than ray tracing.
I agree, they will only get a 70% performance improvement with a monster die; instead they'll make a smaller, cheaper die and go for 35%, which is basically what they aim for with every new arch. That's both all they need, and it gives them somewhere to go for the follow-up to Ampere, which will almost certainly still be on 7nm.
The exception being ray tracing and tensor cores (DLSS) which will see a big upgrade.

Still everyone will buy it - lower power, 35% better raster, much better ray tracing and DLSS - we'll all want one.
 
Reactions: xpea

CastleBravo

Member
Dec 6, 2019
119
271
96
I agree, they will only get a 70% performance improvement with a monster die; instead they'll make a smaller, cheaper die and go for 35%, which is basically what they aim for with every new arch. That's both all they need, and it gives them somewhere to go for the follow-up to Ampere, which will almost certainly still be on 7nm.
The exception being ray tracing and tensor cores (DLSS) which will see a big upgrade.

Going with the smaller die will also give them some room to lower pricing if and when AMD puts too much pressure on them.
 

DXDiag

Member
Nov 12, 2017
165
121
116
Compared against a V100 with 5120 shaders clocking at 1.38GHz, it manages a mere 10% lead.
Because you are comparing an engineering chip with old beta drivers to a fully mature chip, driver- and optimization-wise?

At the same 1.38GHz clocks it would be 40% faster as well! At 1600MHz it would be 60% faster, not counting the factors of mature drivers and optimizations.
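For what it's worth, those numbers do follow from the leaked result if, and only if, you assume perfectly linear clock scaling (an assumption that posters below dispute); a quick sketch:

```python
# Naive linear clock scaling of the leaked result (an assumption, not a fact).
leaked_mhz = 1100      # leaked Ampere sample clock
lead_at_leaked = 1.10  # ~10% Geekbench lead over the V100 at that clock

def scaled_lead(target_mhz):
    """Extrapolate the lead assuming performance scales 1:1 with clock."""
    return lead_at_leaked * target_mhz / leaked_mhz

print(f"at 1380 MHz (V100 clock): {scaled_lead(1380) - 1:.0%} lead")  # ~38%
print(f"at 1600 MHz:              {scaled_lead(1600) - 1:.0%} lead")  # ~60%
```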
 
Reactions: xpea

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
Because you are comparing an engineering chip with old beta drivers to a fully mature chip, driver- and optimization-wise?

At the same 1.38GHz clocks it would be 40% faster as well! At 1600MHz it would be 60% faster, not counting the factors of mature drivers and optimizations.

How is that any different to the Titan RTX? Driver-wise etc. they're the same.

The differences with what I've suggested come down to compute-optimised drivers and non-gimped compute performance.

As for clocks, give it up. You have no ground to stand on, and you've reached the point where you're stretching to try to reach your initial 70% target.

For the record, my expectation for A100 has always been about 50% over Volta. This little discussion has been good for reconfirming my expectations. Thanks.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Because you are comparing an engineering chip with old beta drivers to a fully mature chip, driver- and optimization-wise?

At the same 1.38GHz clocks it would be 40% faster as well! At 1600MHz it would be 60% faster, not counting the factors of mature drivers and optimizations.

I am not understanding your math here, or in your previous post talking about increased IPC.

For one, you are assuming a linear performance increase with clocks, which is very rarely the case. Also, some workloads scale better with core count than clock speed, and some the other way around.

I think claiming a 40% increase in IPC is just wrong. Not only is there nowhere near enough data to extrapolate that, but that kind of IPC increase is unheard of in the GPU world in recent years.
 

DXDiag

Member
Nov 12, 2017
165
121
116
How is that any different to the Titan RTX? Driver wise etc they're the same.
A Titan V also has the same performance as a Titan RTX, if you like that comparison better. Secondly, you are assuming these leaked chips are Teslas; I think they are Quadros.

I like to use the Titan RTX because I want to compare gaming performance, not purely compute. You seriously think a 118CU Ampere part is going to be only 10% faster than an 80CU Volta part? 50% doesn't even cut it if they were on the same node, let alone 7N.


For one, you are assuming a linear performance increase with clocks, which is very rarely the case. But also some work loads scale better on core count than clock speeds, and some the other way around.
I didn't factor linear clock scaling into that comparison; I simply stated the obvious: engineering samples of the 118CU Ampere beat the 80CU Titan V or 72CU Titan RTX by 40% while operating at lower clocks.
 

Elfear

Diamond Member
May 30, 2004
7,115
690
126
I agree, they will only get a 70% performance improvement with a monster die; instead they'll make a smaller, cheaper die and go for 35%, which is basically what they aim for with every new arch. That's both all they need, and it gives them somewhere to go for the follow-up to Ampere, which will almost certainly still be on 7nm.
The exception being ray tracing and tensor cores (DLSS) which will see a big upgrade.

Still everyone will buy it - lower power, 35% better raster, much better ray tracing and DLSS - we'll all want one.

Ya, but not at $1,200+. 35% more performance vs Turing just doesn't seem worth it IMO.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
I figure NV will take the 980Ti -> 1080 approach here and release a "GA104" die on the shrunk process that will provide a 30% boost over 2080Ti performance, and a cut-down version that will marginally edge out the 2080Ti for less.

This scheme worked almost perfectly for them with the Pascal series, I cannot see them letting that winning formula go. The unwashed masses get better performance for the price, NV gets manageable costs, the product cycle gets drawn out with a subsequent "TI" release as well as a respin on the same process with larger dies.

Whatever monster die NV is cooking up right now is almost certainly going to be ML/Compute only.
 
Reactions: DXDiag

Hitman928

Diamond Member
Apr 15, 2012
5,600
8,790
136
No, it's just the increased reticle limit for 16nm.
Exactly. Unlike GF 12LP vs Samsung 14LP (on which it was based under license), where GF got 10% more density, TSMC 12nm and 16nm have no meaningful differences except, like you said, the widened reticle. I can't say the masking process wasn't refined, and as a result there may be some power and heat savings, but density-wise nothing changed.

Where are you two getting your info? Again, according to TSMC this is not true: 12nm has density and efficiency enhancements over 16nm. It's not a full node jump, maybe not even a half node jump, but it's not just the same process with a reticle increase, and the changes are significant enough for multiple companies to use it even when the reticle limit isn't an issue.

The enhanced process is said to feature lower leakage [and] better cost characteristics

12nm FinFET Compact Technology (12FFC) drives gate density to the maximum and provides the best performance among the industry's 16/14nm-class offerings.

The number I've heard quoted as a guideline is 20% higher density for 12 nm over 16 nm but again, I don't personally have access to 12 nm to confirm (and couldn't with any precision due to NDA even if I did).
 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
Where are you two getting your info? Again, according to TSMC this is not true: 12nm has density and efficiency enhancements over 16nm. It's not a full node jump, maybe not even a half node jump, but it's not just the same process with a reticle increase, and the changes are significant enough for multiple companies to use it even when the reticle limit isn't an issue.
The number I've heard quoted as a guideline is 20% higher density for 12 nm over 16 nm but again, I don't personally have access to 12 nm to confirm (and couldn't with any precision due to NDA even if I did).
Oh sorry, forgot about replying to this.

Wikichip lists all the specs of 12nm as being the same as 16nm. Both are under the same table.

 

DXDiag

Member
Nov 12, 2017
165
121
116
This scheme worked almost perfectly for them with the Pascal series, I cannot see them letting that winning formula go. The unwashed masses get better performance for the price, NV gets manageable costs, the product cycle gets drawn out with a subsequent "TI" release as well as a respin on the same process with larger dies.
Yes, I expect this too. 3080 to be like 30% faster than 2080Ti, then 3080Ti to be like 60-70% faster than 2080Ti.
 
Last edited:

Hitman928

Diamond Member
Apr 15, 2012
5,600
8,790
136
Oh sorry, forgot about replying to this.

Wikichip lists all the specs of 12nm as being the same as 16nm. Both are under the same table.


I think they do that because the actual specs are unknown and it's based off of 16 nm. Even in the description under the table they mention that 12 nm has density and power-use improvements.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
I figure NV will take the 980TI -> 1080 approach here and release a "GA104" die on the shrunk process that will provide a 30% boost over the 2080ti performance and a cut down version that will marginally edge out the 2080TI for less.

This scheme worked almost perfectly for them with the Pascal series, I cannot see them letting that winning formula go. The unwashed masses get better performance for the price, NV gets manageable costs, the product cycle gets drawn out with a subsequent "TI" release as well as a respin on the same process with larger dies.

Whatever monster die NV is cooking up right now is almost certainly going to be ML/Compute only.

Except the GTX1080 didn't have any Tensor or RT cores; both need lots of Xtors, and that means less space for raster performance.

It's the same reason the TU104 (RTX2080) with 13.6B Xtors was only 5-10% faster than the GTX1080Ti with only 11.8B Xtors at the time of release.
If all those extra Xtors on the TU104 were utilized for raster performance, the RTX2080 would be way faster.
 
Reactions: beginner99

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
I think they do that because the actual specs are unknown and its based off of 16 nm. Even in the description under the table they mention 12 nm has density and power use improvements.
You're correct, my apologies on that. I shouldn't rely on Wikichip.

12FFC offers 6 track cells (which 16FFC doesn't), and a 10% process shrink to go along with 10% speed gain or 25% power reduction vs 16FFC.
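For what it's worth, this figure is roughly consistent with the ~20% density guideline quoted earlier, if the 10% shrink is read as a linear-dimension shrink (my assumption here, since the source doesn't specify):

```python
# If "10% shrink" means each linear dimension shrinks 10%, area shrinks
# by a factor of 0.9^2 = 0.81, giving ~23% more density, which lands
# near the ~20% guideline quoted earlier in the thread.
linear_shrink = 0.10
density_gain = 1.0 / (1.0 - linear_shrink) ** 2 - 1.0
print(f"implied density gain: {density_gain:.0%}")  # 23%
```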
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,062
7,487
136
Except the GTX1080 didn't have any Tensor or RT cores; both need lots of Xtors, and that means less space for raster performance.

It's the same reason the TU104 (RTX2080) with 13.6B Xtors was only 5-10% faster than the GTX1080Ti with only 11.8B Xtors at the time of release.
If all those extra Xtors on the TU104 were utilized for raster performance, the RTX2080 would be way faster.

- The 2080 was a cut-down chip though.

The 2080S is a better example of what we're looking at, full chip vs full chip, and according to TPU it looks like a 12% performance difference in favor of the 2080S, with the gap only widening at higher resolutions. Given there is roughly a 13% difference in transistor count in favor of the 2080S, that's about as linear as scaling gets despite the added transistor load and new features.

I can see NV boosting performance through higher clocks afforded by the smaller process, plus whatever arch magic they always seem to whip up, to hit that magic 30% number.
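Taking the poster's own numbers at face value (the 12% and 13% figures are from the post above, not independently verified), the near-linear scaling claim checks out arithmetically:

```python
# Perf-per-transistor check using the figures quoted in the post above.
perf_gain = 0.12   # claimed 2080S performance lead (TPU, per the post)
xtor_gain = 0.13   # claimed transistor-count difference in the 2080S's favor
scaling_efficiency = perf_gain / xtor_gain
print(f"perf gained per unit of transistor growth: {scaling_efficiency:.2f}")  # 0.92
```

A ratio near 1.0 is what "about as linear as scaling gets" means here; anything well below it would suggest the extra transistors went to features other than raster throughput.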
 
Reactions: DXDiag

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Yes, I expect this too. 3080 to be like 30% faster than 2080Ti, then 3080Ti to be like 60-70% faster than 2080Ti.

No way. The 2080 was equal to the 1080Ti (slower in some cases); the 2080S is only a bit faster.

We still have no idea when a 3080 would come out; nobody has seen any signs of the chip. Ampere (like Volta) is an HPC card. But to think that a 3080Ti would be an unprecedented 70% (!?!?!) faster than a 2080Ti is ludicrous. nVidia has never had that kind of performance jump.

Especially when the number one technology NV is pushing is RTX, which means the RTX:CUDA hardware ratio is going to swing toward more RTX performance.
 

DXDiag

Member
Nov 12, 2017
165
121
116
No way. The 2080 was equal to the 1080Ti (slower in some cases); the 2080S is only a bit faster.
What does the 2080 have to do with this? Most Turing GPUs are just an evolution of Volta/Pascal with ray tracing and Tensor cores, except for the 2080Ti. Manufactured on essentially the same 16nm-class process, NVIDIA focused on adding ray tracing, not on improving performance.


We still have no idea when a 3080 would come out; nobody has seen any signs of the chip
You don't have to see any signs; the Ampere chips only leaked out this month despite being several months old. NVIDIA is keeping their cards very close to their chest; we didn't hear anything about Turing until the last two months before release either.

But to think that a 3080Ti would be an unprecedented 70% (!?!?!) faster than a 2080Ti is ludicrous. nVidia has never had that kind of performance jump.
They had that kind of jump all the time:
7900GTX to 8800GTX was 120%
280 to 580 was 80%
580 to 780Ti was 85%
980Ti to 1080Ti was 75%
 
Reactions: psolord and xpea

CakeMonster

Golden Member
Nov 22, 2012
1,428
535
136
25-35% is the average yearly bump (with leeway for somewhat uneven release dates), while the 60-80% range would be for a release after two years. The chip hardly matters anymore, whether it's small, large, mature, immature, on a new process or not; NV knows what kind of performance bump is needed to keep sales going the next year, and they're sticking to that as long as they have the technological advantage.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
They had that kind of jump all the time:
7900GTX to 8800GTX was 120%
280 to 580 was 80%
580 to 780Ti was 85%
980Ti to 1080Ti was 75%

The 8800GTX is a bit of an odd duck, as it's technically two dies. It was too large to fit on TSMC's 90nm process, so they broke it out into two chips. It also came out in the same year as the 7900, which was just a die-shrunk rehash of an earlier chip.

580 to 780Ti is not accurate, as the 780Ti came out AFTER the 780. The 780Ti was basically a cheaper Titan. The 780 (non-Ti) was the spiritual successor of the 580, even if it did come out way later.

280 to 580? You are aware there was a 480 in there, right? The 580 was an updated version of Fermi. The 480 was the replacement for the 280.

The 1080Ti was a lot faster than the 980Ti. But it also launched almost two years later, because the Titan was released between the two.
 

DXDiag

Member
Nov 12, 2017
165
121
116
The 8800GTX is a bit of an odd duck,
It was not an odd duck; it was the start of NVIDIA building large chips.
580 to 780Ti is not accurate, as the 780Ti came out AFTER the 780.
The 780Ti is the highest Kepler chip, compared to the highest Fermi chip (the 580)
280 to 580? You are aware there was a 480 in there, right?
The 480 was defective, cut down, and underclocked; NVIDIA fixed it 6 months later with the 580. We are comparing the highest Tesla chip to the highest Fermi chip.

The 1080Ti was a lot faster than the 980Ti. But it also launched almost two years later, because the Titan was released between the two.
1080Ti is the highest Pascal chip compared to the highest Maxwell chip.

I don't know where you are going with this; you are jumping all over the place, mentioning small cut-down chips and time periods. None of that matters. What matters is the chip underneath and the process node, and NVIDIA has achieved this kind of uplift several times before.

I am comparing the highest Ampere chip (3080Ti) to the highest Turing chip (2080Ti); in this comparison I expect NVIDIA to provide a 70-80% performance uplift, just like they did several times before.
 
Last edited:

DXDiag

Member
Nov 12, 2017
165
121
116
See here:

It is also mentioned that Big Red 200 gained an additional 2 petaflops of performance even though it uses a smaller number of GPUs than the Volta V100 based design. The reason for going with a smaller number of next-generation GPUs is simply because they offer 70-75% better performance than existing parts and by that, we are comparing it with Volta-based Tesla V100 GPUs as no Tesla GPUs based on the Turing GPU architecture, aside from the Tesla T4, exist.


 

xpea

Senior member
Feb 14, 2014
449
148
116
For the record, my expectation for A100 has always been about 50% over Volta. This little discussion has been good for reconfirming my expectations. Thanks.
The uplift is higher (over 70%) in FP64 and much higher in ML (see DXDiag reply above for a nearly official statement)
better source here:
 
Reactions: DXDiag