Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping Turing at cheaper prices, while offering 'beefed-up RTX' options at the top?)
Will the top card be capable of more than 60 fps at 4K, ideally at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if this is imprudent or uncalled for; I'm just interested in forum members' thoughts.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,451
136
- RTX 3080, as a top mainstream card, based on GA102 (biggest GPU for customer segment)
- Reported power consumption for a top high-end model in range of 300 - 375W

You know, if nVidia thought they "had to", they could raise the stock power limits into the 300 W range. I still think the 300-375 W rumors are more because they are using dual 8-pin connectors.

Why did nVidia select Samsung 10nm for this generation of large-die graphics cards?

They originally were going to do it on Samsung's 7 nm EUV process. Samsung gave them a good deal. Turns out it was too good.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
You guys are both right. It just depends whether or not you want to include the acceleration afforded by the tensor cores: A100 Blogpost

I recall reading that Volta tensor cores were not separate, discrete execution units but were essentially FP32 cores ganged together to do matrix math; this was suspected because you can't use the FP32 cores in the same clock cycle as the tensor cores. I imagine the same is true for Ampere, except they've added more math formats to the cores, including FP64. Can someone remind me whether the FP64 units are discrete as well, or whether they are paired-up FP32 units? I lean towards the former, in which case Ampere tensor cores probably gang the FP32 units together to do FP64 math somehow in conjunction with the dedicated FP64 units.
So Nvidia effectively invented 1.5 Precision Operations? Good to know.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Nope, Tensor FP64 is 20TF, and it encompasses IEEE FP64 code as well. This is a substantial upgrade for AI HPC workloads.
So this is effectively ONLY for AI workloads, and not ALL FP64.

Well, great, they increased the accuracy of AI. But it's not "real" FP64, and it's still 9.7 TFLOPS of FP64, guys, no matter how you spin it.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So this is effectively ONLY for AI workloads, and not ALL FP64.

Well, great, they increased the accuracy of AI. But it's not "real" FP64, and it's still 9.7 TFLOPS of FP64, guys, no matter how you spin it.

This.

It's one thing to make a claim, but when it's contradicted straight from the horse's mouth, it's time to admit being wrong.

It's ok, that's how you learn.

Now this doesn't reflect gaming Ampere. But A100 is a great example of what happens when they focus on something else. If they focus on Tensor/RT cores for Ampere gaming you can expect rasterizer performance increases to be smaller as well.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Now this doesn't reflect gaming Ampere. But A100 is a great example of what happens when they focus on something else. If they focus on Tensor/RT cores for Ampere gaming you can expect rasterizer performance increases to be smaller as well.
Based on the rumors, its exactly what is happening with next gen Gaming cards from Nvidia.

Smaller rasterization perf. increase, larger Ray Tracing performance increase.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Why did nVidia select Samsung 10nm for this generation of large-die graphics cards?
I thought TSMC and nVidia had a better relationship, since TSMC made that 12nm process for nVidia's graphics cards.
Now that Huawei must stop using TSMC, I don't think there will be a huge problem allocating TSMC resources on the 7nm process.

From what we know/surmise, nVidia announced a deal with Samsung to use their 7nm EUV process in order to try and force a better deal with TSMC. TSMC already had more 7nm orders than they could fulfill, so they told nVidia there would be no special deal: buy your wafers at the same price as everybody else, or take a hike. While this was happening, Samsung found that their 7nm EUV process had issues preventing its use with large-die chips. This forced nVidia to make changes to their designs, and chips that had already taped out had to be rerolled on another process. A100 had to be done at TSMC because of its size and market, so they bit the bullet and paid TSMC what they wanted. The gaming chips were put onto Samsung's "8nm" (an updated 10nm): because nVidia had been playing hardball, they did not pre-purchase 7nm wafers at TSMC, so there was no capacity available for their mainstream GPUs.

This is one of the reasons the rumors of high power draw are coming about. nVidia made their chips larger expecting a node drop, but they didn't get the full node drop, so a lot of the power savings that would have come from it never materialized.
 

Mopetar

Diamond Member
Jan 31, 2011
8,005
6,449
136
I'm a little leery about making any assumptions based on GA-100 because it's built for a completely different workload as well as on a different process.

Even if NVidia is using a single underlying Ampere architecture for all of their cards, that doesn't exclude them from making tweaks or other changes where appropriate.

It seems like they're probably pushing the cards a bit more, but that could just as easily be because they anticipate AMD to have a worthy competitor and don't want to lose the performance crown. They probably can't get away with running sub-300 watt chips against Fury or Vega power hogs any more.

A $150 cooler is alarming in some ways because it gives you an idea of what NVidia is planning to charge. At the same time, it's nice to know they're not being cheap and cutting corners on what they intend to be a premium product. Though at some point, the extra cost and better performance of liquid cooling make such an expensive air cooler questionable.

I'm sure we'll learn more details as we get closer to the launch date, whenever that might be. It just seems a bit hasty to get attached to any particular predictions or assumptions when there's still so much unknown and up in the air.

But we can be sure of one thing, and it's that NVidia will do everything they can to make sure their card comes out on top.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
From what we know/surmise, nVidia announced a deal with Samsung to use their 7nm EUV process in order to try and force a better deal with TSMC. TSMC already had more 7nm orders than they could fulfill, so they told nVidia there would be no special deal: buy your wafers at the same price as everybody else, or take a hike. While this was happening, Samsung found that their 7nm EUV process had issues preventing its use with large-die chips. This forced nVidia to make changes to their designs, and chips that had already taped out had to be rerolled on another process. A100 had to be done at TSMC because of its size and market, so they bit the bullet and paid TSMC what they wanted. The gaming chips were put onto Samsung's "8nm" (an updated 10nm): because nVidia had been playing hardball, they did not pre-purchase 7nm wafers at TSMC, so there was no capacity available for their mainstream GPUs.

This is one of the reasons the rumors of high power draw are coming about. nVidia made their chips larger expecting a node drop, but they didn't get the full node drop, so a lot of the power savings that would have come from it never materialized.
Basically this. Samsung's node is pretty efficient up to around 1.9 GHz for large chips, but beyond that it falls off a cliff.

Yes, those GPUs will clock higher than 1.9 GHz, but they will also lose efficiency. If the RTX 3080 uses 300W under load, let's not expect to ever see a sub-75W desktop GPU on this node that is competitive with AMD's entry-level offerings (which also will not be 75W TDP, but higher).
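The "falls off a cliff" point follows from the usual dynamic-power relation, P ≈ C·V²·f: pushing clocks past a node's sweet spot also demands more voltage, so power rises much faster than frequency. A toy Python sketch of the effect; the 1.9 GHz sweet spot comes from the post above, but the linear voltage slope and its 0.25 V/GHz value are purely illustrative assumptions, not Samsung figures:

```python
# Dynamic power scales roughly as P ~ C * V^2 * f.
# Above the node's sweet spot, voltage must also rise with frequency,
# so power grows much faster than clock speed.
def relative_power(freq_ghz, base_freq=1.9, base_volt=1.0, volt_per_ghz=0.25):
    # Illustrative assumption: voltage rises linearly past the sweet spot.
    volt = base_volt + max(0.0, freq_ghz - base_freq) * volt_per_ghz
    return (volt / base_volt) ** 2 * (freq_ghz / base_freq)

for f in (1.9, 2.1, 2.3):
    print(f"{f} GHz -> {relative_power(f):.2f}x power")
```

Under these made-up numbers, a ~21% clock bump (1.9 to 2.3 GHz) costs ~46% more power, which is the shape of the efficiency cliff being described.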

The flip side is that before this retooling of the designs we could expect GTX 1660 Ti performance from the 107 die; now we might actually be getting RTX 2060-level performance. Same for the 106 die: before, we would have been getting RTX 2070 to 2070 Super performance; now it's more like 2070 Super to RTX 2080 performance for the 3060 SKU.

Price it correctly, say $279 for the RTX 3060, and it's a great value product.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
No, you can't; it's basic science. RT splits work between shaders and RT cores, and the shader split is indispensable.

Yes, and if you reduce the impact of ray tracing by boosting RT performance, you'll improve overall performance.

No, those FP64 tensor ops are still full double precision.

You don't get anything for free. It's still a specialized unit; otherwise the HPC guys would go for the double-throughput FP64 tensor instead of the real FP64.
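The disagreement here is essentially Amdahl's law: if a frame's time is split between shader work and RT-core work, accelerating only the RT portion caps the overall gain. A quick sketch; the 40% RT share and 2x RT speedup are made-up numbers for illustration, not measurements from any card:

```python
def frame_speedup(rt_fraction, rt_speedup):
    """Amdahl's law: only the RT share of frame time gets accelerated."""
    return 1.0 / ((1.0 - rt_fraction) + rt_fraction / rt_speedup)

# Hypothetical split: 40% of frame time in RT work, RT cores 2x faster.
print(frame_speedup(0.4, 2.0))   # ~1.25x overall

# Even infinitely fast RT cores can't beat 1 / (1 - 0.4):
print(frame_speedup(0.4, 1e9))   # ~1.67x overall
```

This is why both sides have a point: faster RT cores do improve frame rates, but the shader share of the work bounds how far that alone can go.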
 

DXDiag

Member
Nov 12, 2017
165
121
116
You don't get anything for free. It's still a specialized unit, otherwise HPC guys would go for the double flops FP64 tensor instead of the real FP64.
Yes, they would; NVIDIA already encourages them to change their code to take advantage of the new speedups.

Yes and if you reduce the impact of Ray Tracing by boosting RT performance you'll improve performance.
You can't reduce the shader work, only accelerate it.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
People really need to read more about the HPC/ML/AI world. A100 is nothing more than a tailor made super GPU for NVIDIA customers to accelerate their workloads. The GPU is balanced to achieve these goals.
And I think you should STOP believing Nvidia's marketing gibberish, and think about real-world workloads, the financial responsibility behind them, and what calculations are done in FP64 math.

Just because Nvidia says something works does not mean it actually works.

Jensen said that laptops with the RTX 2080 Max-Q are better than next-generation consoles.

If he said that buying sandals in winter is a good idea, would you believe him too?
 

DXDiag

Member
Nov 12, 2017
165
121
116
Jensen said that laptops with the RTX 2080 Max-Q are better than next-generation consoles.
Didn't you hear? A laptop with these specs already ran the UE5 demo better than the PS5. Have you seen benchmarks that proved him wrong?
And I think you should STOP believing Nvidia's marketing gibberish, and think about real-world workloads, the financial responsibility behind them, and what calculations are done in FP64 math.
And you need to stop being anti-NVIDIA at every step of the way; it's really immature and childish. These aren't some kids playing around in your backyard, these are multi-billion-dollar companies doing their homework to satisfy their customers. Many HPC FP64 workloads can be transformed into FP64 GEMM code to take advantage of the speedups.

And you also need to stop drumming on about how A100 has a low FP32 increase. The caches in A100 have grown considerably, which has already boosted the IPC of the cores; the end result is an average of ~65% performance increase in general HPC workloads despite the modest FP32 increase.
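The "transform to FP64 GEMM" argument can be shown in miniature: a batch of independent double-precision matrix-vector products, written as a loop, is mathematically one dense FP64 matrix multiply, which is the shape a tensor-core DGEMM path is built to accelerate. A NumPy sketch; the shapes and data are arbitrary, and this only illustrates the code transformation, not the hardware itself:

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.random((1000, 64))     # 1000 FP64 input vectors
weights = rng.random((64, 32))    # shared FP64 weight matrix

# Naive form: a loop of per-vector products, running as general FP64 code.
loop_out = np.stack([weights.T @ v for v in vecs])

# GEMM form: the identical math expressed as one dense FP64 matrix
# multiply, the shape that tensor-core DGEMM hardware accelerates.
gemm_out = vecs @ weights

assert np.allclose(loop_out, gemm_out)
```

Whether a given HPC code can be rewritten this way is exactly what the two of you are arguing about; when it can, the tensor FP64 rate applies, and when it can't, the 9.7 TFLOPS figure does.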

 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
No, you can't; it's basic science. RT splits work between shaders and RT cores, and the shader split is indispensable.

No, those FP64 tensor ops are still full double precision.
OK, I'll ask outright.

What is the optimum shader/RT cores ratio at present? You seem to know a lot on this topic.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Didn't you hear? a laptop with these specs already ran UE5 demo better than PS5. Have you seen benchmarks that proved him wrong?
Considering that neither of the next-gen consoles is out, we haven't seen any benchmarks to prove him right in the first place.
And you need to stop being so anti-NVIDIA every step of the way, it's really immature and childish, these aren't some kids playing around your backyard, these are multi billion companies doing their home work to satisfy their customers. Many of the HPC FP64 workloads can be transformed to FP64 GEMM code to take advantage of the speed ups.

And you also need to stop drumming about how A100 has low FP32 increases, the caches in A100 have increased considerably, which already boosted the IPC of the cores, end result is this: an average of ~65% performance increase in general HPC workloads despite the modest FP32 increase.

So by pointing out obvious BS from Nvidia's marketing, you believe I am anti-Nvidia?
By posting the truth that you, or other people here, find... uncomfortable about the next-gen gaming cards, you believe I am anti-Nvidia?

You say that about a person who has NEVER owned an AMD/ATi GPU and has always had Nvidia GPUs, who only once in his lifetime had an AMD CPU, back in the good old Sempron days, and has otherwise always had Intel CPUs?

I'm not anti-Nvidia. I'm against typical forum-user BS, against marketing BS.

What you are constantly showing is typical Nvidia marketing BS, because you have nothing else to prove your point.

When people look beyond Nvidia's marketing BS, they see:

1) Performance per watt of the GA100 chip in FP32 and FP64 is lower than Volta's.
2) Those features only try hard to mitigate the problems that stacking all of those tensor cores and AI features created for number-crunching workloads (the fabled tensor-accelerated FP64 performance).

There is no free lunch. And Nvidia always tries to push the agenda that there is, with their solution. Which is blatantly stupid to people who look beyond the marketing hype.
 

DXDiag

Member
Nov 12, 2017
165
121
116
By posting the truth that you, or other people here, find... uncomfortable about the next-gen gaming cards, you believe I am anti-Nvidia?
What truth? All I see is a bunch of speculation and baseless rumors, fueled by an anti-NVIDIA agenda; worse, you are consistently being proven wrong.

When people look beyond Nvidia's marketing BS, they see:
When people who have no basic knowledge of the HPC/ML world look at it, you mean.
There is no free lunch. And Nvidia always tries to push the agenda that there is, with their solution. Which is blatantly stupid to people who look beyond the marketing hype.
People are clamoring to buy the A100 because it basically has no match; the V100 had no match for three years before it, and this one extends the lead significantly.

1) Performance per watt of the GA100 chip in FP32 and FP64 is lower than Volta's.
Completely irrelevant to the task at hand, and completely inconsequential considering the benchmarks I posted.

2) Those features only try hard to mitigate the problems that stacking all of those tensor cores and AI features created for number-crunching workloads (the fabled tensor-accelerated FP64 performance).
Are you an HPC expert by any means? How did you conclude that FP64 GEMM is worthless? Where are the "facts" in that?
 