Question 'Ampere'/Next-gen gaming uarch speculation thread

Ottonomous · Nov 1, 2019

How much is the Samsung 7nm EUV process expected to provide in terms of gains?
How will the RTX components be scaled/developed?
Any major architectural enhancements expected?
Will VRAM be bumped to 16/12/12 for the top three?
Will there be further fragmentation in the lineup? (Keeping turing at cheaper prices, while offering 'beefed up RTX' options at the top?)
Will the top card be capable of >4K60, at least 90?
Would Nvidia ever consider an HBM implementation in the gaming lineup?
Will Nvidia introduce new proprietary technologies again?

Sorry if imprudent/uncalled for, just interested in the forum member's thoughts.

moonbogg · Sep 1, 2020

blckgrffn said:
How many 1080ti owners are going to see that buffer downgrade and take a pause on the 3080? I would, but I already sold my 1080 ti.

How many large vram cards with weak GPUs have been built for suckers in the past?

I would have committed to buying a 3080 by now, honestly, if it weren't for the downgrade in Vram. It bothers me. It's the reason I'm not clicking the buy button on day one. As for large buffers for suckers, I agree completely. We've all seen those cards. The argument doesn't apply to the 3080 IMO. People will be using the card for high resolutions, texture mods, ray tracing and future generation games. 8Gb has been breached in some cases, even at 1440p. The 3080 would have been a good fit for 16Gb. They could have got away with 12 even.
I don't care about quantities not matching the memory architecture or whatever. It's not my job to be a GPU engineer and base my purchasing decisions on that kind of knowledge. I base it on the simple fact that an upgrade should be an upgrade, especially one priced at $700.

I need to be fair though. Historically, GPU releases have often come with the same Vram as the last gen and it wasn't an issue. I feel this time is an issue.

uzzi38 · Sep 1, 2020

DisEnchantment said:
It was quite disappointing that it was not 7LPP. Ampere could have stretched its legs.
There could be something wrong with 7LPP for bigger dies.
Also this means there is no quick re-tapeout path to the 6LPP/5LPE for later refreshes.

Well you know, GA103 is still apparently around, maybe, just maybe we'll see something interesting

CP5670 · Sep 1, 2020

It will be interesting to see what the AIB versions of these cards look like. Usually the AIB cards have significantly better coolers (as well as stock clocks), but in this case the stock Nvidia coolers are already quite beefy. I wonder how much of an improvement the AIB cards will be.

JasonLD · Sep 1, 2020

DisEnchantment said:
It was quite disappointing that it was not 7LPP. Ampere could have stretched its legs.
There could be something wrong with 7LPP for bigger dies.
Also this means there is no quick re-tapeout path to the 6LPP/5LPE for mid gen Super refreshes.

I do expect Ampere refresh on either 7nm or 6nm. I don't expect 5nm until Hopper.
Though looking at pricing, I think Nvidia did get some really good prices on those 8nm.

That said, 3080 is looking lot more impressive than I thought it would end up being.

Ajay · Sep 1, 2020

uzzi38 said:
Nvidia are no doubt getting a good deal.

AIBs? I feel so bad for them. Margins must be awful having to deal with GDDR6X in all it's cursedness. Power draw per module is notably up, but worsed of all is the PAM4 signalling. The extra PCB costs alongside the cost per module is not fun.

I don't know why to you think PAM4 is all that bad - it's pretty common in other areas of electronics. The controller shouldn't be that difficult to design, toughest part I see is keeping the signals clean.

DisEnchantment said:
It was quite disappointing that it was not 7LPP. Ampere could have stretched its legs.
There could be something wrong with 7LPP for bigger dies.
Also this means there is no quick re-tapeout path to the 6LPP/5LPE for mid gen Super refreshes.
Or maybe 6LPP is not part of wafer shuttle arrangement?
Samsung has to come out publicly in their Foundry Forums. Too much rampant FUD from Taiwan.

I think the defect rate must has been pretty bad for large dice. If Samsung keeps improving their 8nm (NVIDIA special), then they may be able to turn the clock up in another year.
I suppose it depends on whether AMD plans on coming out with RNDA3 on 5nm in another 12-18 months after RDNA2.

MrTeal · Sep 1, 2020

Has there been any information on whether RTX IO is interface agnostic? There's obvious performance differences between a top 4.0 NMVE drive and a SATA drive, but will a PCIe based drive be required to get good performance out of it?

uzzi38 · Sep 1, 2020

Ajay said:
I don't know why to you think PAM4 is all that bad - it's pretty common in other areas of electronics. The controller shouldn't be that difficult to design, toughest part I see is keeping the signals clean.

12 layer PCBs.

Reduced distances between GPU die and memory.

Failure to reach 21Gbps as GDDR6X is supposed to allow at launch.

There's already several signs that GDDR6X has major limitations. Add onto the fact that between data travelling and module power you're looking at notably over 100W on the 3090 and yes, the stuff is absolutely cursed.

Just because Nvidia didn't want to have to take on the HBM tax on themselves. But they don't need to worry, as AIBs can deal with all the extra costs. No biggie for them

DiogoDX · Sep 1, 2020

AtenRa said:
Its the slide they mentioned 1.9x perf/watt and according to official NVIDIA slide,

Turing 60fps / 240w = 0.25
vs
Ampere 105fps / 320W = 0.33

This is 32% higher perf/watt , again this is according to official NVIDIA slide.

2080S is like 20% better perf/watt that a 5700XT. So if AMD delivers the 50% better perf/watt with RDNA2 they will really catch Nvidia.

itsmydamnation · Sep 1, 2020

pj- said:
RTX IO seems interesting. Does that erase the big advantage of PS5/XBSX over pc? Will big navi have something similar?

Yes, but no and no.
So there is a lot more to File I/O like security etc. So it probably wont be as good as the Xbox system for equal parts. That said we can probably easily just brute force that on the PC side so xbox level or greater I/O I would expect to be quite achievable . PS5 on the other hand seems to be another level entirely that i dont know if brute force will be able to catch.

Genx87 · Sep 1, 2020

3080 is in my sights!

maddie · Sep 1, 2020

Nvidia is calculating the perf/W improvement by using the max performance part of the Turing curve and comparing it to around the max efficiency portion of Ampere. Creative thinking by the marketing team & fooling a lot of people.

JasonLD · Sep 1, 2020

maddie said:
Nvidia is calculating the perf/W improvement by using the max performance part of the Turing curve and comparing it to around the max efficiency portion of Ampere. Creative thinking by the marketing team & fooling a lot of people.

That is how the perf/w improvement is calculated on marketing slide, since it is easy to inflate that figure depends on how it is being compared. Reason I am not being so hot on 50% Perf/w claim on RDNA2 side.

Beside that, I think this is a pretty good win for Samsung foundry side since this might be the first time they scored big GPU dies.

Thala · Sep 1, 2020

maddie said:
Nvidia is calculating the perf/W improvement by using the max performance part of the Turing curve and comparing it to around the max efficiency portion of Ampere. Creative thinking by the marketing team & fooling a lot of people.

Technically there is nothing wrong with specifying perf/W at iso performance levels - thats what they did. Of course they could have added the iso-power number, which naturally is lower and then peak-to-peak, which would be even lower again. But its all in the graph, its not that NVidia is hiding something.

moonbogg · Sep 1, 2020

I'm having confused feelings about the 3080. I think I like it. I wish it had more Vram, but I'm guessing 10Gb might be fine for 3440x1440. Even I can admit when I'm being too damn picky. It looks like a great card at $700 and should lay waste to anything thrown at it for quite a long time. I do suspect an impressive refresh may be in the works, or a 3080Ti, but who cares. Waiting another year doesn't sound like fun, but playing around with a new card does sound like fun. I don't think I'll be able to resist taking a hammer to the "buy" button and smashing it into the ground on release day. Jacket man delivered.

DiogoDX · Sep 1, 2020

JasonLD said:
That is how the perf/w improvement is calculated on marketing slide, since it is easy to inflate that figure depends on how it is being compared. Reason I am not being so hot on 50% Perf/w claim on RDNA2 side.

Beside that, I think this is a pretty good win for Samsung foundry side since this might be the first time they scored big GPU dies.

They said the same with RDNA vs VEGA and delivered (5700XT vs Vega64 or 5700 vs Vega 56). But we know that Vega was terrible in perf/watt so was "easy". If they deliver again, 50% over RDNA1 will put RDNA2 at the same perf/watt that Ampere.

A/// · Sep 1, 2020

I'll be waiting it out since I don't plan to build any sooner than 5 or 6 months. I'll have a goo idea of where AMD is then but as I've said before, I expect RDNA2 to fall flat on its face. If nv wasn't bs'ing, these are some of the best cards they've put out in a long time. I missed the event and couldn't believe some of the prices I'd seen while wading through online articles that went up in near live time. If the 3060 ends up being around 275-350, nv will cover all their bases and there really won't be a point in team red's product, especially given the expected letdown with faulty software.

raghu78 · Sep 1, 2020

Glo. said:
Its 2xFP32 THROUGHPUT.

Its the same marketing gibberish they used for A100 chips.

In reality, when tested actual, native FP32 performance, it was the same, as V100.

Nvidia have provided specs for RTX 3090, RTX 3080 and RTX 3070

RTX 3090 - 10496 cuda cores

3090 & 3090 Ti-Grafikkarten

Die Leistung der TITAN-Klasse für das ultimative Gaming.

www.nvidia.com

RTX 3080 - 8704 cuda cores

NVIDIA GeForce RTX 3080-Familie

Die Superleistung, nach der sich Gamer sehnen – mit der NVIDIA Ampere-Architektur.

www.nvidia.com

RTX 3070 - 5888 cuda cores

NVIDIA GeForce RTX 3070-Familie

Liefert dir die Leistung, die du brauchst, um die anspruchsvollsten Spiele meistern zu können.

www.nvidia.com

But as always the devil is in the details and actual benchmarks in games should give us an idea. From my estimates based on Nvidia benchmarks and the digitalfoundry preview the RTX 3080 is 30% faster than RTX 2080 Ti. But the TDP is 23% higher. So the perf/watt improvement is just 5.6% . RTX 3070 is looking to be slightly faster than the RTX 2080 Ti for 40w less power. So the perf/watt increase is roughly 20-25%.

tomli747 · Sep 1, 2020

Saylick said:
Yeah, I agree with Glo here. They used the term "Shader-FLOPS" not "FP32 FLOPS". The INT cores actually do single-precision math (i.e. 32-bit) but just not floating point specifically. Going off of the SM diagram for A100, they don't list the INT cores as being capable of doing FP math either so my guess is either Nvidia tuned Ampere for graphics so that the the pipelines for an SM are (2) x 16-wide FP or 16-wide INT + 16-wide FP, or they are just listing Shader-FLOPS as a catch all term for all the concurrent single-precision math the entire GPU can do, INT and FP included. My money is on the latter.

My speculation is that the ALUs are running at doubled speed just like Fermi as Jensen said "Ampere does 2 shader calculations per clock". (i.e. 3070 has 2944 core @3.4GHz)

In this case, a CUDA core at doubled speed is advertised as 2 CUDA cores.
It probably also explains the high power consumption and the crazy numbers of core count for its die size and transistor density.

lobz · Sep 1, 2020

raghu78 said:
Nvidia have provided specs for RTX 3090, RTX 3080 and RTX 3070

RTX 3090 - 10496 cuda cores

3090 & 3090 Ti-Grafikkarten

Die Leistung der TITAN-Klasse für das ultimative Gaming.

www.nvidia.com

RTX 3080 - 8704 cuda cores

NVIDIA GeForce RTX 3080-Familie

Die Superleistung, nach der sich Gamer sehnen – mit der NVIDIA Ampere-Architektur.

www.nvidia.com

RTX 3070 - 5888 cuda cores

NVIDIA GeForce RTX 3070-Familie

Liefert dir die Leistung, die du brauchst, um die anspruchsvollsten Spiele meistern zu können.

www.nvidia.com

But as always the devil is in the details and actual benchmarks in games should give us an idea. From my estimates based on Nvidia benchmarks and the digitalfoundry preview the RTX 3080 is 30% faster than RTX 2080 Ti. But the TDP is 23% higher. So the perf/watt improvement is just 5.6% . RTX 3070 is looking to be slightly faster than the RTX 2080 Ti for 40w less power. So the perf/watt increase is roughly 20-25%.

Well, the phrase '2x throughput' was on an official NVIDIA slide during the presentation, so...

Don't forget, Zen2 has also 2x FP throughput compared to Zen1. That does NOT translate to 100% performance gain at all.

FatherMurphy · Sep 1, 2020

I'm just a hobbyist and gamer, so no expert. But, I think it is safe to say that Nvidia didn't get rid of the separate INT unit and concurrent INT/FP performance. It was a big factor in the Turing uplift in "IPC". At the same time, Nvidia admitted at the time that for every 100 FP operations, there were ~ 35 INT operations.

It makes sense that Nvidia would create a GA102 with more flexible INT unit per SM so that it isn't sitting there idle. So, now you have FP + INT/FP, where you can do FP + INT or FP + FP, thus "twice" the throughput on FP32.

Given the GA100 was tailored more for HPC/AI with an emphasis on either higher precision (FP64) or lower precision (TF32 and lower), it is not a FP32 monster. GA102 is designed to be the FP32 monster to be used in workstations/servers where traditional FP32 is important. If I'm not mistaken, Nvidia has targeted its sub-Gx100 chips (e.g. TU102, 104, GP104) like this before. So, Nvidia took the transistors it saved by stripping out GA100s HPC specific parts and invested them in a more robust/hyrbid FP/INT unit. This helps in gaming (with somewhat limited gains because the rest of the graphics pipeline isn't doubled) but also gives Nvidia a FP32 (and raytracing) beast to sell to those industries who need it.

I think the CUDA core marketing could be clearer (if this is what has happened).

All of this is uneducated speculation.

AtenRa · Sep 1, 2020

DiogoDX said:
2080S is like 20% better perf/watt that a 5700XT. So if AMD delivers the 50% better perf/watt with RDNA2 they will really catch Nvidia.

Well, if you take same tiers NAVI 10 has higher perf/w vs Turing

RX5600Xt higher perf/w vs RTX2060
RX5700 higher perf/w vs 2060S

RX5700XT is way too pushed in order to be faster vs RTX2070 but even then its very close

RTX2070 Super and RTX2080/S are using the much much bigger TU104

ASUS Radeon RX 5600 XT TUF EVO Review

The ASUS Radeon RX 5600 XT TUF EVO uses the improved version of the EVO cooler. Temperatures and noise levels are excellent because ASUS decided not to downgrade the heatsink for their RX 5600 XT despite the much lower heat output of the GPU. Idle fan stop and a backplate are included, too.

www.techpowerup.com

ddarko · Sep 1, 2020

Since there will be no preorders for the Founders Edition cards this time, has the Nvidia store been the only place to get the cards on launch day in the past or have they also been available at retailers like Amazon, Best Buy, Microcenter etc.? Or do the FE cards only show up later at retailers?

Karnak · Sep 1, 2020

You guys remember Jensen by the end of last year?

During GTC 2019 in China, NVIDIA founder and CEO Jensen Huang addressed questions from the tech press over who will be building out most ogf the new 7nm GPUs in 2020 and beyond. Huang said that most of the 7nm GPU production will be done by TSMC while Samsung will only handling a small portion of 7nm GPU production for NVIDIA.

TSMC will handle most of NVIDIA's next-gen 7nm Ampere GPUs

TSMC will be handling most of NVIDIA's new 7nm GPU according to these reports, Samsung will only have a small number of orders.

www.tweaktown.com

Still possible though if GA106 and smaller ones will be on TSMC N7. But for now that looks weird to me since GA102 and GA104 are on Samsung's 8nm...

jpiniero · Sep 1, 2020

Karnak said:
Still possible though if GA106 and smaller ones will be on TSMC N7. But for now that looks weird to me since GA102 and GA104 are on Samsung's 8nm...

Most likely if they do end up releasing something on SS7, it would be something exclusive to Quadros.

linkgoron · Sep 1, 2020

Given Nvidia's recent GPU history, I wonder if GA104 failed to deliver the expected performance, and the 3080/3090 cards were released earlier because of this. This is the first time since 2013 that the 80s card is not a X04 - 980 was GM204 in 2014, 1080 was GP104 in 2016 and 2080 was TU104 in 2018. The 780 was GK110 in 2013 (Technically, the 680 was also GK104). In addition, everything seems over-the-top with the cards (cooling and TDP).

This is obviously pending reviews, but I wonder if the 3070 just failed to deliver the required performance for an 80 class card (i.e. faster than the previous gen's ti). According to Nvidia it's just on-par with the 2080ti, while even the 2080 was around 10% faster than the 1080ti (and it was also released alongside the 2080ti).

Karnak said:
You guys remember Jensen by the end of last year?

TSMC will handle most of NVIDIA's next-gen 7nm Ampere GPUs

TSMC will be handling most of NVIDIA's new 7nm GPU according to these reports, Samsung will only have a small number of orders.

www.tweaktown.com

Still possible though if GA106 and smaller ones will be on TSMC N7. But for now that looks weird to me since GA102 and GA104 are on Samsung's 8nm...

Samsung is 8nm, they're technically not making any 7nm cards.

Question 'Ampere'/Next-gen gaming uarch speculation thread

Senior member

Lifer

Platinum Member

Diamond Member

Senior member

Lifer

Diamond Member

Platinum Member

Senior member

Platinum Member

Lifer

Diamond Member

Senior member

Golden Member

Lifer

Senior member

Diamond Member

Diamond Member

Junior Member

Platinum Member

Senior member

Lifer

Senior member

Senior member

Lifer

Platinum Member