Question 'Ampere'/Next-gen gaming uarch speculation thread


Ottonomous

Senior member
May 15, 2014
559
292
136
How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if imprudent/uncalled for, just interested in the forum members' thoughts.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Well I don't want to comment much on die sizes, but I mean, 20% alongside the rest sounds perfectly reasonable to me.
Are we now always replying only to the latest comment? My reply was this: if it's 20% smaller and on 7nm, then it should consume much less than half the power; then comes the 20% higher clock speed, which would result in ca. half the power usage altogether.

Edit: please don't bring Vega into this, that computing monster would consume 5KW even on 2nm.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Are we now always replying only to the latest comment? My reply was this: if it's 20% smaller and on 7nm, then it should consume much less than half the power; then comes the 20% higher clock speed, which would result in ca. half the power usage altogether.

Edit: please don't bring Vega into this, that computing monster would consume 5KW even on 2nm.
These numbers are confusing everyone.

Let's assume 7nm is twice as dense as 16nm and uses 1/2 the power at identical frequency.

A 50% smaller die at same frequency will use 50% the power. (same # transistors)
A 20% smaller die at same frequency will use 80% the power. (160% # transistors)

A 20% smaller die and 20% higher clocks will use ~ 115% the power. ( architectural changes can offset this to increase frequency of total unit)
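
The three scenarios above can be checked with a small sketch. It takes the post's stated premise (7nm doubles density and halves power per transistor at the same frequency). The `clock_exponent=2.0` default is an assumption not stated in the post: it is chosen only because it reproduces the ~115% figure (0.8 × 1.2² ≈ 1.15), whereas a purely linear dependence on clock would give 96%. Function and parameter names are hypothetical.

```python
def relative_power(area_ratio, clock_ratio=1.0, density_gain=2.0,
                   power_per_transistor=0.5, clock_exponent=2.0):
    """Power of the new-node chip relative to the old-node chip.

    area_ratio:            new die area / old die area
    clock_ratio:           new clock / old clock
    density_gain:          transistors per mm2, new / old (post assumes 2x)
    power_per_transistor:  per-transistor power at same clock (post assumes 0.5x)
    clock_exponent:        ASSUMPTION: how power grows with clock (2.0 matches ~115%)
    """
    transistor_ratio = area_ratio * density_gain
    return transistor_ratio * power_per_transistor * clock_ratio ** clock_exponent


if __name__ == "__main__":
    scenarios = [
        ("50% smaller die, same clock", 0.5, 1.0),
        ("20% smaller die, same clock", 0.8, 1.0),
        ("20% smaller die, 20% higher clock", 0.8, 1.2),
    ]
    for label, area, clock in scenarios:
        print(f"{label}: {relative_power(area, clock):.0%} of original power")
```

Raising `clock_exponent` above 2 models the case where higher clocks also need more voltage, which makes the last scenario more expensive still.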

On the other hand, I remember reading that Nvidia used a custom 12 nm node from TSMC. As we have recently read, GloFlo is offering a 12nm+ node that significantly closes the gap to 7nm especially in power consumption. Could we imagine that TSMC with their much larger R&D has offered a similar process to Nvidia much earlier?

The huge assumption now is that the 12 nm custom to 7nm TSMC jump will be equivalent to the 14nm GloFlo to 7nm TSMC one. I'm not sure this is true.
 
Reactions: Adonisds

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
These numbers are confusing everyone.

Let's assume 7nm is twice as dense as 16nm and uses 1/2 the power at identical frequency.

A 50% smaller die at same frequency will use 50% the power. (same # transistors)
A 20% smaller die at same frequency will use 80% the power. (160% # transistors)

A 20% smaller die and 20% higher clocks will use ~ 115% the power. ( architectural changes can offset this to increase frequency of total unit)

On the other hand, I remember reading that Nvidia used a custom 12 nm node from TSMC. As we have recently read, GloFlo is offering a 12nm+ node that significantly closes the gap to 7nm especially in power consumption. Could we imagine that TSMC with their much larger R&D has offered a similar process to Nvidia much earlier?

The huge assumption now is that the 12 nm custom to 7nm TSMC jump will be equivalent to the 14nm GloFlo to 7nm TSMC one. I'm not sure this is true.

Both new GPUs from AMD and NVIDIA will use the 7nm+ EUV that brings higher density and lower power vs 7nm
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Both new GPUs from AMD and NVIDIA will use the 7nm+ EUV that brings higher density and lower power vs 7nm
My comparison still stands. Don't get hung up on these details, or just substitute 7nm+ for 7nm. Same result.

edit:
My point is that (I think) we're making the assumption that the node improvement for Nvidia will be similar to what AMD experienced in moving to TSMC 7nm (or 7nm+).

I am saying that Nvidia might be starting off from a better position so their node performance jump will not be as great as AMD experienced.

[GloFlo 14nm > TSMC 7nm+] > [TSMC 12nm > TSMC 7nm+]
 
Last edited:

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
My comparison still stands. Don't get hung up on these details, or just substitute 7nm+ for 7nm. Same result.


Well, 12nm didn't bring any density improvements vs 16nm, as we can see from GTX1080Ti vs RTX2080Ti
GTX1080Ti = 11800M transistors / 471mm2 = ~25M transistors per mm2
RTX2080Ti = 18600M transistors / 754mm2 = ~25M transistors per mm2

7nm+ EUV brings 15% higher density + 10% lower power vs 7nm , so it will be a full node+ difference vs 12nm TSMC.
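
For anyone who wants to redo the density comparison, here is a minimal sketch using only the transistor counts and die sizes quoted in this post. The dictionary layout and function name are hypothetical.

```python
# Transistor count (millions) and die area (mm2) as quoted in the post.
CHIPS = {
    "GTX1080Ti (16nm)": (11_800, 471),
    "RTX2080Ti (12nm)": (18_600, 754),
}


def density_m_per_mm2(transistors_m, area_mm2):
    """Average density in millions of transistors per mm2."""
    return transistors_m / area_mm2


if __name__ == "__main__":
    for name, (transistors_m, area_mm2) in CHIPS.items():
        d = density_m_per_mm2(transistors_m, area_mm2)
        print(f"{name}: {d:.1f}M transistors per mm2")
```

This is an average over the whole die, so it says nothing about what the process could reach at maximum density.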
 

Guru

Senior member
May 5, 2017
830
361
106
What process is Nvidia using and which plant? Are they using Samsung's, GloFlo or TSMC?

The only way for Nvidia to make their next gen faster and yet at the same time cheaper, would be to reduce their hardware level raytracing so that their chips are not so massive and obviously have a much smaller process node in order to put more stuff on the die.

I don't see Nvidia actually increasing ray tracing dedicated hardware capabilities, I think they are going to go AMD's route and just use general computing with help from the dedicated hardware raytracing. Otherwise their new chips would have to be even bigger than their current ones and I just don't think they can sell these GPU's much more expensive.

At a certain point even cash loaded gamers who play games for 5+ hours every day are going to question the price for a new gpu.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Well, 12nm didn't bring any density improvements vs 16nm, as we can see from GTX1080Ti vs RTX2080Ti
GTX1080Ti = 11800M transistors / 471mm2 = ~25M transistors per mm2
RTX2080Ti = 18600M transistors / 754mm2 = ~25M transistors per mm2

7nm+ EUV brings 15% higher density + 10% lower power vs 7nm , so it will be a full node+ difference vs 12nm TSMC.
I am speaking strictly about power consumption here. Density has too many unknowns for you to make those claims. You are equating max density with a logic layout. For all I know, Nvidia stayed at the same transistor density for thermals to allow high clocks. Just because a process is capable of a certain density does not mean we see it in practice, as you well know.

Read this on GloFlo 12LP+ 12nm process.

https://www.tomshardware.com/news/globalfoundries-12lp-plus-node-process-12nm,40478.html
"GlobalFoundries (GF) on Tuesday announced the availability of a new addition to its 12 Leading Performance (12LP) platform, called 12LP+. The company claims it will feature a noticeable increase in performance and decrease in power and area. It also contains a low-voltage SRAM bit cell.

GlobalFoundries (GF) is promising much of the power and performance benefits of 7nm but at the lower cost of the 12LP platform. It also delivers a 15% increase in logic transistor density.


More specifically, GF said the 12LP+ FinFET process provides a 20% increase in performance or a 40% decrease in power over the base 12LP platform, (which itself provided 10% improvement over 16/14nm and a 15% improvement in logic area scaling). That's the same amount of improvement that TSMC claimed when comparing its 7nm process to 16nm, (although, TSMC also had 10nm before reaching 7nm).

In a statement, GF compared its new process to 7nm and also mentioned its lower cost: "As an advanced 12nm technology, our 12LP+ solution already offers clients a majority of the performance and power advantages they would expect to gain from a 7nm process, but their NRE (non-recurring engineering) costs will average only about half as much, a significant savings."

What I'm saying is that we Do Not know what exactly is the 12nm custom process that Nvidia used for the 2xxx series. Everyone keeps making the assumption that it's roughly equivalent to the 16nm process with small tweaks. It seems that 12nm can offer a lot more performance than is commonly assumed.

What if it's a lot closer to GloFlo's new 12LP+ process? In that case, all of the assumptions about Nvidia getting a 50%+ reduction in power use due solely to the node advancement are wrong. They might be starting from a position much closer to a plain 7nm process than is commonly assumed.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
A 20% smaller die and 20% higher clocks will use ~ 115% the power. ( architectural changes can offset this to increase frequency of total unit)

I am sure the power increase only holds true at around nominal process voltage and not at the overdrive voltages GPUs tend to run at. Typically, 20% higher clocks would be more expensive power-wise than your calculation.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Well, 12nm didn't bring any density improvements vs 16nm, as we can see from GTX1080Ti vs RTX2080Ti

GTX1080Ti = 11800M transistors / 471mm2 = ~25M transistors per mm2
RTX2080Ti = 18600M transistors / 754mm2 = ~25M transistors per mm2
They are the same between 16 and 12nm except for a slightly tighter metal pitch and therefore a minimal density improvement. IMO, though, 12nm was completely based on branding since each producer just tosses crap against the wall when deciding what to call the process (well, it's not that bad, but it's confusing for sure). I'm sure TSMC saw Samsung 14nm, and GloFo's imminent 12nm, and wanted to play catchup. Their density falls well short of Samsung 14nm, Intel 14nm, and GloFo 12nm.

7nm+ EUV brings 15% higher density + 10% lower power vs 7nm , so it will be a full node+ difference vs 12nm TSMC.
I was wondering about power scaling on Turing vs Navi, based on all this, and what I am 100% sure of is that I really hope both Nvidia and AMD use TSMC 7nm+ EUV for their Ampere / RDNA2 respectively.

Right now, it doesn't appear that either side has a great efficiency advantage, e.g. 5700 vs 2060S draw at peak gaming 180W vs 192W, while # of transistors is 10300M vs 10800M, for a W per billion transistors of 17.5 vs 17.8, and the 5700 is ~5% faster at 1440p.

But for 5700XT vs 2070S, it's 227W vs 220W, 10300M vs 13600M, and a resulting 22.0 vs 16.2, and the 2070S is 27% faster.
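
The W-per-billion-transistors figures in the two paragraphs above can be recomputed with a short sketch. It uses only the peak gaming power and transistor counts quoted in this post; the table layout and function name are hypothetical.

```python
# (peak gaming power in W, transistors in millions), as quoted in the post.
CARDS = {
    "5700": (180, 10_300),
    "2060S": (192, 10_800),
    "5700XT": (227, 10_300),
    "2070S": (220, 13_600),
}


def watts_per_billion_transistors(power_w, transistors_m):
    """Peak power divided by transistor count expressed in billions."""
    return power_w / (transistors_m / 1000)


if __name__ == "__main__":
    for name, (power_w, transistors_m) in CARDS.items():
        ratio = watts_per_billion_transistors(power_w, transistors_m)
        print(f"{name}: {ratio:.1f} W per billion transistors")
```

Note that this metric ignores performance, so it should be read alongside the relative speed numbers rather than instead of them.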

Two take-aways:
1) same process might let us crown a true uarch champion, which I look forward to in discussions
2) there really does seem to be a lot of competition, but AMD clearly cannot simply clock their way to the lead - they're going to need more die area lest Big Navi draw 400+W at 18000M transistors

Some thoughts:
- 7nm is maturing well, with a defect density of 0.09, for 80% yield on 251mm2 on 300mm wafers - but at 471mm2, the yield is 66%; once defect density hits 0.05, yields on 471mm2 will be 80%
- 7nm+ EUV will reset this back to higher defect density for a while
- AMD may do well to put Big Navi on 7nm simply for cost reasons, since 7nm+ EUV may not provide a substantial enough performance benefit to offset the yield cost
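
The yield numbers in the first bullet can be roughly reproduced with standard die-yield models. This is a sketch under assumptions: the post does not state units, so the defect density is taken to be defects per cm2, and it does not say which model was used. Both the Poisson model and Murphy's model are shown; Murphy's lands closer to the quoted 66%. Function names are hypothetical.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Poisson model: Y = exp(-A * D), with A converted to cm2."""
    ad = (area_mm2 / 100.0) * defects_per_cm2
    return math.exp(-ad)


def murphy_yield(area_mm2, defects_per_cm2):
    """Murphy's model: Y = ((1 - exp(-A * D)) / (A * D)) ** 2."""
    ad = (area_mm2 / 100.0) * defects_per_cm2
    if ad == 0:
        return 1.0
    return ((1.0 - math.exp(-ad)) / ad) ** 2


if __name__ == "__main__":
    # Die sizes and defect densities as quoted in the post.
    for area_mm2, d0 in [(251, 0.09), (471, 0.09), (471, 0.05)]:
        print(
            f"{area_mm2}mm2 at D0={d0}: "
            f"Poisson {poisson_yield(area_mm2, d0):.0%}, "
            f"Murphy {murphy_yield(area_mm2, d0):.0%}"
        )
```

Neither model accounts for salvaging partially defective dies, which matters a lot for GPUs.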
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Some thoughts:
- 7nm is maturing well, with a defect density of 0.09, for 80% yield on 251mm2 on 300mm wafers - but at 471mm2, the yield is 66%; once defect density hits 0.05, yields on 471mm2 will be 80%
- 7nm+ EUV will reset this back to higher defect density for a while
- AMD may do well to put Big Navi on 7nm simply for cost reasons, since 7nm+ EUV may not provide a substantial enough performance benefit to offset the yield cost
AFAIK, 7nm+ increases yield. Much lower mask counts at front end and better line definitions.

GPUs have impressively redundant circuitry, leading to high partially functional yields. Designing for X CUs but activating fewer is quite common.

I do not see yields falling with the transition to 7nm+. The problem with EUV was the throughput of the lithography machines. This has now reached satisfactory levels.

 
Last edited:

Adonisds

Member
Oct 27, 2019
98
33
51
AFAIK, 7nm+ increases yield. Much lower mask counts at front end and better line definitions.

GPUs have impressively redundant circuitry, leading to high partially functional yields. Designing for X CUs but activating fewer is quite common.

I do not see yields falling with the transition to 7nm+. The problem with EUV was the throughput of the lithography machines. This has now reached satisfactory levels.

That's all very logical and it's what should happen, but 7nm+ rollout hasn't been very smooth and hasn't come as soon as predicted, so maybe there's something going on
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
The only difference that TSMC 12nm has over 16nm is the reticle limit.
16nm is around 600mm2, 12nm is 850mm2.

If there were other, better stuff, Nvidia or TSMC would be claiming it.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
That's all very logical and it's what should happen, but 7nm+ rollout hasn't been very smooth and hasn't come as soon as predicted, so maybe there's something going on
There is a huge order backlog for ASML EUV machines. Seeing as they have lower wafer output per machine than previous generations, I think capacity constraints were the main reason for EUV being slower to start mass production.
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
According to this anandtech article:
TSMC: N7+ EUV Process Technology in High Volume, 6nm (N6) Coming Soon
Provides 70% area reduction and 60% power savings vs 16nm.

So, assuming that power savings can be converted into performance, and with some IPC increase, it's possible to see an up to 75% performance increase; the problem is the other limitations like memory bandwidth.

I don't know how to math* the size decrease but something like the volta die of 815mm2 at 16nm will do 480mm2 at 7nm.

*Math is 815/1.7 = 480mm2 ?

Edit: Many typos.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
According to this anandtech article:
TSMC: N7+ EUV Process Technology in High Volume, 6nm (N6) Coming Soon
Provides 70% area reduction and 60% power savings vs 16nm.

So assuming that power savings can be converted into performance and with some ipc increase it's possible to see the up 75% performance increase, the problem is the other limitations like memory bandwidth.

I don't know how to math* the size decrease but something like the volta die of 815mm2 at 16nm will do 480mm2 at 12nm.

Math is 815/1.7 = 480mm2 ?
Those process improvements are always 'up to'. Since NV's next gen GPUs aren't a strict die shrink, it's really hard to translate any of those figures into actual performance or die size calculations. If recent trends are any indicator, then GPU clock rates will not change substantially. An increase in die size, total xtors, uarch changes, etc. are all to be expected and will result in a significant performance increase. But that increase will be a result of what marketing wanted, as much as what is technically possible (well, except in the case of GA100, where NV will want to squeeze out every ounce of performance possible).
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
According to this anandtech article:
TSMC: N7+ EUV Process Technology in High Volume, 6nm (N6) Coming Soon
Provides 70% area reduction and 60% power savings vs 16nm.

So assuming that power savings can be converted into performance and with some ipc increase it's possible to see the up 75% performance increase, the problem is the other limitations like memory bandwidth.

I don't know how to math* the size decrease but something like the volta die of 815mm2 at 16nm will do 480mm2 at 12nm.

Math is 815/1.7 = 480mm2 ?
A 70% area reduction and a 60% power savings basically means an identical layout will be 30% of the previous size and consuming 40% of the power, but as Ajay pointed out, it's never the same layout.

A Volta of 815mm2 on 16nm will be 245mm2 on 7nm+.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Those process improvements are always 'up to'. Since NV's next gen GPUs aren't a strict die shrink, it's really hard to translate any of those figures into actual performance or die size calculations. If recent trends are any indicator, then GPU clock rates will not change substantially. An increase in die size, total xtors, uarch changes, etc. are all to be expected and will result in a significant performance increase. But that increase will be a result of what marketing wanted, as much as what is technically possible (well, except in the case of GA100, where NV will want to squeeze out every ounce of performance possible).
I think just like on Ryzen CPU, it seems that simply boosting clocks is not an easy thing to do because of heat density. However, IIRC, I have read that EUV at the same "density" may enable lower power draw for the same work. If that's the case, perhaps there is more overhead to boost clocks.
 
Reactions: Ajay

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
I think just like on Ryzen CPU, it seems that simply boosting clocks is not an easy thing to do because of heat density. However, IIRC, I have read that EUV at the same "density" may enable lower power draw for the same work. If that's the case, perhaps there is more overhead to boost clocks.

I think you are correct wrt heat density problems. It won’t likely be as severe in a GPU because of the lower clocks and because total watts of power are spread over a larger die. EUV will reduce line edge roughness and should improve overall consistency in geometric sizes and shapes, IIRC. Not sure what that’ll get us users, maybe better bins.

Oh, and I’d be shocked if GA100 is < 700 mm^2, depending on the reticle limit of n7+.
 
Last edited:

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
A 70% area reduction and a 60% power savings basically means an identical layout will be 30% of the previous size and consuming 40% of the power, but as Ajay pointed out, it's never the same layout.

A Volta of 815mm2 on 16nm will be 245mm2 on 7nm+.

If I'm not mistaken, 70% higher density means you can have 70% more transistors per mm2 vs the previous process.

GV100 manufactured at 12nm has 21,100M transistors with a die size of 815mm2. Density is 25.89M transistors per mm2 (21,100/815).

At 7nm with 70% higher density, it will have a density close to 44M transistors per mm2 (25.89 + 70%). That means that 21,100M transistors will occupy 480mm2 (21,100/44).

As a rough estimation, we multiply the die size (mm2) by 0.67 to get a glimpse of the area reduction that we can have from a full node process.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
If I'm not mistaken, 70% higher density means you can have 70% more transistors per mm2 vs the previous process.

GV100 manufactured at 12nm has 21,100M transistors with a die size of 815mm2. Density is 25.89M transistors per mm2 (21,100/815).

At 7nm with 70% higher density, it will have a density close to 44M transistors per mm2 (25.89 + 70%). That means that 21,100M transistors will occupy 480mm2 (21,100/44).

As a rough estimation, we multiply the die size (mm2) by 0.67 to get a glimpse of the area reduction that we can have from a full node process.
It's 70% area reduction = now takes up 30% of the area. It's not 70% higher density. That seems to be your interpretation, but is incorrect.
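
The gap between the two readings is easy to see numerically. A minimal sketch, using the 815mm2 Volta die from this discussion and assuming an identical layout (which, as noted earlier in the thread, is never quite the case); function names are hypothetical:

```python
def area_after_reduction(area_mm2, area_reduction):
    """'70% area reduction': the same layout keeps (1 - reduction) of its area."""
    return area_mm2 * (1.0 - area_reduction)


def area_after_density_gain(area_mm2, density_gain):
    """'70% higher density': the same transistors fit in area / (1 + gain)."""
    return area_mm2 / (1.0 + density_gain)


def density_multiplier_from_reduction(area_reduction):
    """Density multiplier implied by a given area reduction."""
    return 1.0 / (1.0 - area_reduction)


if __name__ == "__main__":
    volta_mm2 = 815
    print(f"70% area reduction:  {area_after_reduction(volta_mm2, 0.70):.1f} mm2")
    print(f"70% higher density:  {area_after_density_gain(volta_mm2, 0.70):.1f} mm2")
    print(f"Implied density gain of a 70% area reduction: "
          f"{density_multiplier_from_reduction(0.70):.2f}x")
```

So a 70% area reduction gives roughly 245mm2 and corresponds to about 3.3x the density, while 70% higher density gives roughly 480mm2.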
 

Glo.

Diamond Member
Apr 25, 2015
5,762
4,667
136
I think everybody forgets that TSMC's N7 process has around 25M xtors/mm2 density for anything high performance. The EUV process on which Nvidia has taped out Ampere has 20% better density than TSMC's N7 process.

We have to base every calculation on this factor, not the mythical 70% higher densities.
 