NVIDIA Pascal Thread

Feb 19, 2009
10,457
10
76
OK, I'm lost. What is a 3840CC Maxwell?

Lol you're right, I had glossed over it.

980Ti is 2816 CC.

Titan X is 3072.

55% higher clocks vs 980Ti.

25% higher perf.

~10% fewer cores.

Er, there are actually no IPC gains.

What the heck.

There's even a regression based on 3DMark!!!

Edit: Hmm, perhaps adding fine-grained preemption to the uarch, along with instant graphics <-> compute context switching, may hurt IPC in game engines that don't benefit from it, such as 3DMark. Becoming GCN-like for better compute + graphics workloads has an associated cost, and it looks to be increased TDP and poorer IPC in older games. Thoughts?
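
For what it's worth, the "no IPC gains" reading follows directly from those ratios; a quick back-of-envelope check in Python, using only the leaked/approximate figures quoted above:

```python
# Back-of-envelope check of the "no IPC gains" claim, using only the ratios
# quoted above (leaked, approximate figures -- not confirmed specs).
clock_ratio = 1.55   # "55% higher clocks vs 980Ti"
core_ratio  = 0.90   # "~10% fewer cores"
perf_ratio  = 1.25   # "25% higher perf"

# If per-clock, per-core throughput were unchanged, performance would scale
# with clocks * cores:
expected = clock_ratio * core_ratio
implied  = perf_ratio / expected

print(f"expected scaling at equal IPC: {expected:.2f}x")
print(f"implied per-clock, per-core change: {implied:.2f}x")
# ~1.40x expected vs 1.25x observed -> ~0.90x implied, i.e. a ~10% per-clock
# regression rather than a gain, which is what the post is reacting to.
```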
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
Lol you're right, I had glossed over it.

980Ti is 2816 CC.

Titan X is 3072.

55% higher clocks vs 980Ti.

25% higher perf.

~10% fewer cores.

Er, there are actually no IPC gains.

What the heck.

There's even a regression based on 3DMark!!!

Edit: Hmm, perhaps adding fine-grained preemption to the uarch, along with instant graphics <-> compute context switching, may hurt IPC in game engines that don't benefit from it, such as 3DMark. Becoming GCN-like for better compute + graphics workloads has an associated cost, and it looks to be increased TDP and poorer IPC in older games. Thoughts?
How reliable is the 2560CC in GP104? An error there throws everything off.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
I think 1070 will end up being the star of the show here. NV will probably charge a hefty premium for the 1080 with GDDR5X being an excuse to bump up the price more. Anyone on a 980 Ti already that wants bang for their buck should probably wait for the true high end Pascal cards.

The 980 was a bad value relative to the 970, and ended up being completely obliterated by the 980 Ti for not much more money, taking launch prices into account; expect it'll be the same deal here.

Most likely true. This time, though, because AMD is MIA, NV could literally raise the 1070 to $399-449 (~980Ti level) and the 1080 to $599-649 (980Ti reference + 20-25%) and still come out looking like they delivered a stellar update. If NV prices the 1070 at $329, it could sell even better than the 970 did, considering just how many people on Steam have HD 7900 and R9 200 series cards.

As a GTX 980 owner looking to add a bit more GPU kick for running games in 4k, is the 1080 going to be worth the upgrade do you think? Or would it be worth waiting for 2nd gen FinFET GPUs with stacked mem?

I've only recently come back to the GPU discussion so I'm behind on a lot of this and 113 pages of thread is too much info to scan through.

The opportunity cost of holding onto a 980 for another year is too high. If you sell a 980 for $350-375 and put that $350 toward a new $550 card, the new card won't drop much in price in its first 6 months due to the lack of competition from AMD. Alternatively, it should be possible to sell a 980 and step up to the 1070 for almost free or very little money, and the 1070 should be close to a 980Ti, essentially a free 25-30% boost in performance.

For 4K, I'd consider 1070 SLI over 1080 (if 970 SLI vs. 980 generation was anything to go by), but best bet is to wait for benchmarks.
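
To put rough numbers on the sell-and-step-up math above (a sketch only; the resale value and launch prices are assumptions from the post, not confirmed pricing):

```python
# Rough cost of the sell-and-step-up path described above. The resale value
# and new-card price are assumptions, not confirmed pricing.
resale_gtx_980 = 350    # assumed resale value of a used GTX 980, $
new_card_price = 550    # assumed launch price of the new card, $

net_cost = new_card_price - resale_gtx_980
print(f"out-of-pocket cost to upgrade: ${net_cost}")
# -> $200 for a speculated ~25-30%+ jump; stepping up to a 1070 instead
# (if it lands near the resale value of a 980) would cost close to nothing.
```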

A 2560 CC Pascal is keeping up with a 3840 CC Maxwell; looks like clock for clock, Pascal's overall throughput is much better than Maxwell's.

There is no such thing. The comparison is being made to an after-market 980Ti with 2816 CC.
 

SteveGrabowski

Diamond Member
Oct 20, 2014
7,424
6,156
136
Hmm, perhaps adding fine-grained preemption to the uarch along with graphics <-> compute instant context switch features may hurt IPC in game engines that don't benefit from it, such as 3dMark. Becoming GCN-like for better compute + graphics workloads has a cost associated, and it looks to be increased TDP and poorer IPC in older games. Thoughts?

What games do you think we should expect huge gains vs 980 Ti in with the better compute power of 1080? Quantum Break and Hitman instead of Crysis 3 and Witcher 3?
 
Feb 19, 2009
10,457
10
76
What games do you think we should expect huge gains vs 980 Ti in with the better compute power of 1080? Quantum Break and Hitman instead of Crysis 3 and Witcher 3?

Any game that has a higher % of the rendering using compute shaders.

Quantum Break definitely, expect a massive performance leap for the 1080 vs 980Ti.

It's not better compute power.

It's the graphics <-> compute switch that the GPU has to go through currently. If you're rendering graphics and the game calls for a compute workload, it has to flush everything and wait for full idle before starting the compute work. This is a slow process which leads to a performance penalty.

GCN can do this instantly and according to NV's paper, Pascal can too.
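
A toy model of why that drain-to-idle hurts (the numbers are invented purely for illustration; only the structure matters):

```python
# Toy frame-time model of coarse vs fine-grained graphics<->compute switching.
# All numbers are invented for illustration; only the structure matters.
graphics_work_ms = 10.0   # total graphics work per frame
compute_work_ms  = 4.0    # total compute-shader work per frame
transitions      = 6      # graphics -> compute switches per frame
flush_penalty_ms = 0.3    # assumed cost of draining the GPU to idle per switch

coarse = graphics_work_ms + compute_work_ms + transitions * flush_penalty_ms
fine   = graphics_work_ms + compute_work_ms   # instant switch: no drain penalty

print(f"with drain-to-idle per switch: {coarse:.1f} ms/frame")
print(f"with instant context switch:   {fine:.1f} ms/frame")
# The more an engine interleaves compute passes with rendering, the bigger the
# win from fast preemption / instant context switching.
```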
 

SteveGrabowski

Diamond Member
Oct 20, 2014
7,424
6,156
136
Any game that has a higher % of the rendering using compute shaders.

Quantum Break definitely, expect a massive performance leap for the 1080 vs 980Ti.

It's not better compute power.

It's the graphics <-> compute switch that the GPU has to go through currently. If you're rendering graphics and the game calls for a compute workload, it has to flush everything and wait for full idle before starting the compute work. This is a slow process which leads to a performance penalty.

GCN can do this instantly and according to NV's paper, Pascal can too.

OK, thanks!
 

renderstate

Senior member
Apr 23, 2016
237
0
0
Determining IPC from a single number is borderline delusional. Not to mention that IPC tends to decrease with higher frequency, since the memory subsystem cannot indefinitely keep up with it. Moreover, if giving up 5% IPC allows you to modify the architecture and increase the clock by 20%, then you take it.. (assuming everything else doesn't change, which is probably unrealistic..)
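
The 5%-IPC-for-20%-clock trade is easy to quantify, assuming (as above) that everything else stays constant:

```python
# Net effect of trading per-clock throughput ("IPC") for frequency,
# all else held equal (which, as noted above, is probably unrealistic).
ipc_change   = 0.95   # give up 5% per-clock throughput
clock_change = 1.20   # gain 20% frequency

print(f"net throughput change: {ipc_change * clock_change:.2f}x")   # ~1.14x
```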


Sent from my iPhone using Tapatalk
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Moreover, if giving up 5% IPC allows you to modify the architecture and increase the clock by 20%, then you take it..

That is a very risky design strategy. On the CPU front, both Intel (Netburst) and AMD (Bulldozer) tried that, and both efforts were miserable failures - the clock gains weren't nearly as high as hoped, and the IPC losses were worse than expected.

Maybe it works better on GPUs, but it has the potential to fail hard if anything is out of place even a little.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
That is a very risky design strategy. On the CPU front, both Intel (Netburst) and AMD (Bulldozer) tried that, and both efforts were miserable failures - the clock gains weren't nearly as high as hoped, and the IPC losses were worse than expected.

Maybe it works better on GPUs, but it has the potential to fail hard if anything is out of place even a little.


First of all, let me clarify that I don't believe for a second NVIDIA traded IPC for higher clocks (and at the moment we have no data whatsoever to suggest they did..).

Second, if you have a very good pre-silicon simulation infrastructure, you don't have to take huge risks. You model your architectural changes and verify whether they work or not on a wide range of workloads. Of course it's not as easy as it sounds, but it's doable.

Third, IIRC on the P4 the IPC losses were mostly due to the super-long (30+ stage) pipeline getting flushed and refilled after branch mispredictions. On GPUs this is not an issue since there is no speculative execution; they hide latencies much better by running other warps/wavefronts while waiting to resolve a branch target & fetch new instructions.
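
A simplified sketch of that latency-hiding argument, with made-up cycle counts; the point is only that enough resident warps cover a stall without any speculation:

```python
# Simplified latency-hiding model: a warp issues `work` cycles of instructions,
# then stalls for `latency` cycles (e.g. waiting on a branch target or memory).
# With enough resident warps, the scheduler always finds ready work.
def utilization(work_cycles, latency_cycles, num_warps):
    warps_needed = 1 + latency_cycles / work_cycles   # warps to cover one stall
    return min(1.0, num_warps / warps_needed)

for warps in (1, 2, 4, 8, 16):
    print(f"{warps:>2} warps -> ~{utilization(4, 40, warps):.0%} busy")
# 1 warp  -> ~9%   (the stall is exposed, like an unhidden CPU pipeline bubble)
# 16 warps -> 100% (latency fully hidden with no speculation at all)
```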
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
That is a very risky design strategy. On the CPU front, both Intel (Netburst) and AMD (Bulldozer) tried that, and both efforts were miserable failures - the clock gains weren't nearly as high as hoped, and the IPC losses were worse than expected.

Maybe it works better on GPUs, but it has the potential to fail hard if anything is out of place even a little.

GPUs are not designed for latency. They are designed for throughput. There is no "IPC" in the same context.

Higher clocks have huge benefits: better geometry performance, better pixel performance (ROPs), better L2 cache bandwidth, etc.
Decreasing the core count and combining it with a higher clock helps utilize the architecture better.
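
One way to see the point about fixed-function rates: shader throughput scales with cores × clock, but ROP fill, geometry, and L2 bandwidth scale with clock alone. A sketch with invented configurations (not real SKUs):

```python
# Illustrative comparison of a wide/slow vs narrow/fast configuration with the
# same raw FP32 throughput. Numbers are invented to show the scaling, not specs.
def rates(cores, rops, clock_ghz):
    fp32_tflops = cores * 2 * clock_ghz / 1000.0   # 2 FLOPs per core per clock (FMA)
    fill_gpix_s = rops * clock_ghz                 # pixels per clock * clock
    return fp32_tflops, fill_gpix_s

wide_slow   = rates(cores=3072, rops=96, clock_ghz=1.0)
narrow_fast = rates(cores=2048, rops=96, clock_ghz=1.5)

print("wide/slow  : %.1f TFLOPS, %.0f Gpix/s" % wide_slow)
print("narrow/fast: %.1f TFLOPS, %.0f Gpix/s" % narrow_fast)
# Same ~6 TFLOPS of shader math, but the higher-clocked part gets ~50% more
# pixel fill (and geometry throughput / L2 bandwidth scale the same way).
```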
 

coercitiv

Diamond Member
Jan 24, 2014
6,624
14,032
136
Maxwell has a better perf/watt than Kepler and GCN.
Are you really trying to convince us that increasing frequency (linear increase in perf, exponential increase in power) is the way to increase efficiency?
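
For reference, dynamic power scales roughly with C·V²·f, and voltage usually has to rise with frequency, so power grows much faster than performance. A rough sketch with assumed numbers:

```python
# Rough dynamic-power scaling: P_dyn ~ C * V^2 * f. Voltage typically has to be
# raised to sustain higher clocks, so power grows much faster than performance.
# Illustrative values only.
base_clock, base_volt = 1.00, 1.00   # normalized
oc_clock,   oc_volt   = 1.20, 1.08   # assume +20% clock needs ~+8% voltage

perf_gain  = oc_clock / base_clock
power_gain = (oc_clock * oc_volt**2) / (base_clock * base_volt**2)

print(f"performance: +{(perf_gain - 1):.0%}, dynamic power: +{(power_gain - 1):.0%}")
# ~+20% perf for ~+40% power in this example -- perf/W goes down, which is the
# objection being raised above.
```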
 

Timmah!

Golden Member
Jul 24, 2010
1,512
824
136
GPUs are not designed for latency. They are designed for throughput. There is no "IPC" in the same context.

Higher clocks have huge benefits: better geometry performance, better pixel performance (ROPs), better L2 cache bandwidth, etc.
Decreasing the core count and combining it with a higher clock helps utilize the architecture better.

I would prefer if they kept the core count intact and combined it with a higher clock. Somehow I feel this would be better for me, regardless of architectural utilization levels.
 

xpea

Senior member
Feb 14, 2014
451
153
116
Are you really trying to convince us that increasing frequency (linear increase in perf, exponential increase in power) is the way to increase efficiency?
Every process has a different sweet spot in efficiency.
28nm and 14nm don't have the same electrical characteristics or the same voltage/power curve (planar and FinFET don't behave the same).

But even more important, Maxwell and GCN are very different architectures in terms of front end, pipeline length, arithmetic units, LDS, cache size/structure, etc.
That's why, despite using the same TSMC process, Maxwell can clock much higher than GCN.

For Pascal, Nvidia may have optimized the uarch to reach its best efficiency at high clocks on 16FF+ (even if 2GHz is nothing spectacular; many ARM SoCs already hit this frequency at 28nm...)
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
Lol you're right, I had glossed over it.

980Ti is 2816 CC.

Titan X is 3072.

55% higher clocks vs 980Ti.

25% higher perf.

~10% fewer cores.

Er, there are actually no IPC gains.

What the heck.

There's even a regression based on 3DMark!!!

Edit: Hmm, perhaps adding fine-grained preemption to the uarch, along with instant graphics <-> compute context switching, may hurt IPC in game engines that don't benefit from it, such as 3DMark. Becoming GCN-like for better compute + graphics workloads has an associated cost, and it looks to be increased TDP and poorer IPC in older games. Thoughts?
http://vrworld.com/2016/04/25/16nm-nvidia-geforce-gtx-1080-leak-ahead-computex-taipei-2016/
VRZone said:
While the specs of GP104 are still hidden, we can now say that the chip packs 1920 CUDA Cores and a 256-bit controller supports both GDDR5 and GDDR5X (MSI uses GDDR5X) memory.
How come people did not see this, even though it was on previous pages already?
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
Maxwell has a better perf/watt than Kepler and GCN.

No, it doesn't. Performance per watt is calculated from compute power. That's the first thing. If you want to compare GPUs, compare the GTX 980 with the R9 380X, the GTX 980 Ti with the R9 390X, and the Fury X and Fury Nano to any Nvidia GPU.

DirectX 12 benchmarks also show that this performance is currently reflected in games.
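
Taking "performance per watt from compute power" literally, this is roughly what that metric looks like using approximate published FP32 and TDP figures (ballpark numbers, not measured power draw):

```python
# Theoretical FP32 throughput per watt of TDP, using approximate published
# specs (ballpark figures only; real board power and gaming perf/W differ).
cards = {
    "GTX 980":    (4.6, 165),   # ~TFLOPS, TDP in watts
    "GTX 980 Ti": (5.6, 250),
    "R9 390X":    (5.9, 275),
    "R9 Fury X":  (8.6, 275),
}
for name, (tflops, tdp) in cards.items():
    print(f"{name:<10}  ~{tflops / tdp * 1000:.0f} GFLOPS/W")
# By this compute-only metric the Fiji cards look strong, which is the basis
# of the comparison above; gaming perf/W is a separate question.
```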
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
So who can actually do a sum-up of the situation?
(I'm talking only about confirmed stuff so far, after the bench leak.)
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,776
136
A 1920 CUDA core GPU with a 1.86 GHz core clock and 10 GHz GDDR5X memory is 25% faster than the reference GTX 980 Ti, and 4% faster than the GTX 980 Ti Waterforce Gaming.

It also has around 1 TFLOPS more compute power than the reference GTX 980 Ti.

TDP is between 165 and 200W. Because of the extremely high core clocks, it will be closer to 200W.
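
The "~1 TFLOPS more" figure checks out from those numbers, assuming the leak is accurate and using an approximate reference 980 Ti boost clock:

```python
# Theoretical FP32 throughput from the leaked GP104 figures vs a reference
# GTX 980 Ti. The GP104 numbers are unconfirmed; the 980 Ti clock is its
# rated boost clock, so treat both as approximations.
def fp32_tflops(cores, clock_ghz):
    return cores * 2 * clock_ghz / 1000.0   # 2 FLOPs per core per clock (FMA)

gp104_leak = fp32_tflops(1920, 1.86)     # leaked spec
gtx_980_ti = fp32_tflops(2816, 1.075)    # reference boost clock, approx.

print(f"GP104 (leak): {gp104_leak:.1f} TFLOPS")
print(f"GTX 980 Ti  : {gtx_980_ti:.1f} TFLOPS")
print(f"difference  : {gp104_leak - gtx_980_ti:.1f} TFLOPS")   # ~1.1 TFLOPS
```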
 

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
A 1920 CUDA core GPU with a 1.86 GHz core clock and 10 GHz GDDR5X memory is 25% faster than the reference GTX 980 Ti, and 4% faster than the GTX 980 Ti Waterforce Gaming.

It also has around 1 TFLOPS more compute power than the reference GTX 980 Ti.

TDP is between 165 and 200W. Because of the extremely high core clocks, it will be closer to 200W.

Where is that TDP from? Do they finally have a hardware sc or something else?
 