NVIDIA Pascal Thread

Feb 19, 2009
10,457
10
76
It is quite funny because you have posted that table in this very thread

http://forums.anandtech.com/showpost.php?p=38146747&postcount=1065

64 Cores from Pascal Arch have exactly the same performance clock to clock as 128 Cores from Maxwell and 192 from Kepler.

That table does not refer to performance. Just the actual layout of the SM/CC clusters and supporting registers etc.

The important point from that table is mentioned in that post I made.

Pascal will be much better than Maxwell in next-gen games that are GCN-optimized. It should not "tank" due to being GCN-like.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
Yes, you are correct. But the IPC of Maxwell cores was higher than Kepler's because of the increased amount of work they were able to do. The same holds for Pascal, but by a much larger ratio.

64 Pascal cores can do exactly the same amount of work as 192 Kepler cores or 128 Maxwell cores. It is in that table.
 
Feb 19, 2009
10,457
10
76
Yes, you are correct. But the IPC of Maxwell cores was higher than Kepler's because of the increased amount of work they were able to do. The same holds for Pascal, but by a much larger ratio.

64 Pascal cores can do exactly the same amount of work as 192 Kepler cores or 128 Maxwell cores. It is in that table.

Not exactly. The table only lists max thread counts (which relate to the warp scheduler and caches); that does not equate to the same amount of work.

If 128 Maxwell cores actually did the same amount of "work" as 192 Kepler cores, the 980 would have a much higher IPC gain than 20-30%.

Remember, the 980 has a big 30% clock speed advantage over the 780Ti and it's barely faster (unless GimpWorks kills Kepler). Overall, the IPC gain is just 20-30%, which is basically the per-core, per-clock improvement.

Pascal has 64 CC per SM, but it has many more SMs to make up for the lower CC count. What you can say is that those 64 CC per SM are more likely to be fully utilized than the 128 CC per SM in Maxwell. So per CC, Pascal is worth more.

I wouldn't say it's much more, since they are essentially the same uarch as per NV's published statements.

Edit: To clarify, because the scheduler and warp scheduler remain the same, the max threads per SM (each SM has 2x warp schedulers) remains the same. You can queue up threads to that limit and the warp scheduler will send them to the CCs to process over time. It's not an instant "send 2048 threads" thing where the CCs magically finish all those threads at the same time, with 64 CC in Pascal vs 128 CC in Maxwell.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
That's because, as Mahigan already pointed out, Maxwell cores are starved for bandwidth. Secondly: scale.

The 2816 CUDA core GTX 980 Ti is around 50% (sometimes more, sometimes less, mostly less) faster than the GTX 780 Ti, which has 2880 CUDA cores. Both have identical bandwidth, but the core clocks are different, and they differ in how much performance each core can output. Each new architecture brings per-core advancements. The jump from 128 Maxwell cores to 128 Pascal cores is exactly 2X. The question is whether they will be fed enough. Core clocks will be high, very high. But the ROP count in GP104 is worrying: 64 ROPs, with GDDR5 and GDDR5X memory. And we already know, by looking at Maxwell, that this may not be enough. That's why Nvidia went with HBM2 for GP100.

The overall performance increase between a 2048-core Maxwell GPU and a 2048-core Pascal GPU will not be 2X. But it may end up between 30 and 50%. And that is quite a feat, to be honest.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
Edit: To clarify, because the scheduler and warp scheduler remain the same, the max threads per SM (each SM has 2x warp schedulers) remains the same. You can queue up threads to that limit and the warp scheduler will send them to the CCs to process over time. It's not an instant "send 2048 threads" thing where the CCs magically finish all those threads at the same time, with 64 CC in Pascal vs 128 CC in Maxwell.

Yes, you are right, the front end is the bottleneck here.
 
Feb 19, 2009
10,457
10
76
That's because, as Mahigan already pointed out, Maxwell cores are starved for bandwidth. Secondly: scale.

The 2880 CUDA core GTX 780 Ti is 50% (sometimes less, sometimes more, depending on the game and... drivers) slower than the reference GTX 980 Ti, which has 2816 CUDA cores. Both have identical bandwidth, but the core clocks are different, and they differ in how much performance each core can output. Each new architecture brings per-core advancements. The jump from 128 Maxwell cores to 128 Pascal cores is exactly 2X. The question is whether they will be fed enough. Core clocks will be high, very high. But the ROP count in GP104 is worrying: 64 ROPs, with GDDR5 and GDDR5X memory. And we already know, by looking at Maxwell, that this may not be enough. That's why Nvidia went with HBM2 for GP100.

The overall performance increase between a 2048-core Maxwell GPU and a 2048-core Pascal GPU will not be 2X. But it may end up between 30 and 50%. And that is quite a feat, to be honest.

In your example, 780Ti vs 980Ti, there is again a huge clock speed deficit on the 780Ti.

The 980Ti has ~25-30% higher clock speeds plus extra VRAM, which has an effect in many games from 2015 onwards. When you work it out, each Maxwell CC is effectively worth 1.2x to 1.3x a Kepler CC at the same clocks. Hence the 20-30% IPC gain.

If you want to claim Pascal has a 30-50% IPC gain, then with its ~20% clock speed advantage GP104 would be 50-70% faster than Titan X.
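
The clock normalization in the two paragraphs above can be sketched roughly as follows. This is only a back-of-the-envelope model that assumes performance scales linearly with core count, clock speed, and per-core throughput; the function names are made up for illustration, and the 1.5x figure is the 980Ti-vs-780Ti ratio claimed earlier in the thread, not a measurement.

```python
def per_core_gain(perf_ratio, clock_ratio, core_ratio=1.0):
    """Per-core, per-clock gain implied by an overall performance ratio.

    Assumes (simplistically) perf = cores * clock * per-core throughput.
    """
    return perf_ratio / (clock_ratio * core_ratio)


def overall_gain(ipc_ratio, clock_ratio, core_ratio=1.0):
    """Overall performance ratio from per-core, clock, and core-count ratios."""
    return ipc_ratio * clock_ratio * core_ratio


# 980Ti vs 780Ti: ~1.5x claimed, 25-30% higher clocks, 2816 vs 2880 cores.
cores = 2816 / 2880
for clock in (1.25, 1.30):
    gain = per_core_gain(1.5, clock, cores)
    print(f"clock x{clock:.2f}: per-core gain x{gain:.2f}")

# Hypothetical Pascal: 30-50% per-core gain compounded with ~20% higher
# clocks, at equal core count (an assumption, not a known spec).
for ipc in (1.3, 1.5):
    print(f"IPC x{ipc:.1f}: overall x{overall_gain(ipc, 1.2):.2f}")
```

Under these assumptions the Maxwell per-core gain comes out at roughly 1.2x, and a 30-50% IPC gain compounded with a 20% clock bump lands at about 1.56x to 1.8x at equal core count, in the same ballpark as (or a bit above) the 50-70% quoted above.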

Now, compare the last node shrink and uarch change, 580 -> 680. The 680 was ~25-30% faster. But the 580 chip itself was compute heavy, so it suffered in perf/mm2 and perf/W; the 680, being a gaming-focused chip, therefore had an even bigger advantage.

This time, Titan X is already beast mode for gaming with gimped compute. It's already a lean, mean chip.

There's zero chance of GP104 being 50-70% faster. That's more for GP100. Because GM200 is already a gaming-focused chip, I would say that this time around the delta will potentially be smaller than in the 680 vs 580 comparison.

A full GP104 at ~Titan X + 20% would be a good result for a small chip, along with the power savings.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
In your example, 780Ti vs 980Ti, there is again a huge clock speed deficit on the 780Ti.

The 980Ti has ~25-30% higher clock speeds plus extra VRAM, which has an effect in many games from 2015 onwards. When you work it out, each Maxwell CC is effectively worth 1.2x to 1.3x a Kepler CC at the same clocks. Hence the 20-30% IPC gain.

If you want to claim Pascal has a 30-50% IPC gain, then with its ~20% clock speed advantage GP104 would be 50-70% faster than Titan X.

Now, compare the last node shrink and uarch change, 580 -> 680. The 680 was ~25-30% faster. But the 580 chip itself was compute heavy, so it suffered in perf/mm2 and perf/W; the 680, being a gaming-focused chip, therefore had an even bigger advantage.

This time, Titan X is already beast mode for gaming with gimped compute. It's already a lean, mean chip.

There's zero chance of GP104 being 50-70% faster. That's more for GP100. Because GM200 is already a gaming-focused chip, I would say that this time around the delta will potentially be smaller than in the 680 vs 580 comparison.

A full GP104 at ~Titan X + 20% would be a good result for a small chip, along with the power savings.

I am not even comparing GP104 to GM200; I am still comparing GP104 to GM204.

And I still think that GP104 will be around 30-50% faster than GM204.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Nvidia launching Pascal Geforce lineup at editor’s event before Computex 2016 (mid-May)

We received independent confirmation of the invites and they were actually handed out before the GTC Keynote. The Nvidia Editor’s event will be the official press launch for the new lineup of 16nm Geforce GPUs by Nvidia. It is not yet known whether Nvidia will allow the event to be covered live or whether it will be under an NDA that lifts when the reviews hit. At any rate, we expect reviews of the new GPUs to be here by mid-May.

http://wccftech.com/nvidia-launch-pascal-editors-event-may

Reviews in a month?
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
The 2880 CUDA core GTX 780 Ti is 50% (sometimes less, sometimes more, depending on the game and... drivers) slower than the reference GTX 980 Ti, which has 2816 CUDA cores.

The performance difference between the 780 Ti and 980 Ti isn't nearly that big. On average the 780 Ti is only 25-30% slower, not 50%. And as Silverforce11 mentioned, the 780 Ti is clocked significantly lower (15-20% lower or so), so at clock parity a 780 Ti would only be about 10-15% slower than a 980 Ti, or inversely the 980 Ti would be about 15% faster. That would indicate an increase in performance per CUDA core of 15-20% from Kepler to Maxwell.

If Pascal achieves a similar 15-20% increase in IPC per core and clocks at 1480 MHz like P100 (vs. 1200 MHz for the reference 980 Ti), then that would result in a 45% performance increase assuming equal CUDA core count. Current rumors seem to put GP104 somewhere around 2560 CUDA cores, or about 10% fewer than GM200, so the net performance increase over a stock 980 Ti would be about 30%, or about 10% higher than an aftermarket 980 Ti. It would be about 60% higher than a 980 non-Ti (GM204).
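
As a rough check of the estimate above, the arithmetic can be written out like this. It is a sketch only: it assumes performance scales linearly with IPC, clock, and core count, takes the midpoint of the 15-20% IPC range, and uses the rumored 2560-core figure against the 980 Ti's 2816 cores; the variable names are just illustrative.

```python
# Inputs taken from the post; none of these are confirmed GP104 specs.
ipc_gain = 1.175               # assumed midpoint of a 15-20% per-core gain
clock_ratio = 1480 / 1200      # P100 clock vs reference 980 Ti clock (MHz)
core_ratio = 2560 / 2816       # rumored GP104 cores vs 980 Ti cores

# Speedup if both chips had the same number of CUDA cores.
equal_core_speedup = ipc_gain * clock_ratio

# Speedup after accounting for the smaller rumored core count.
net_speedup = equal_core_speedup * core_ratio

print(f"equal core count: +{(equal_core_speedup - 1) * 100:.0f}%")
print(f"rumored GP104 vs stock 980 Ti: +{(net_speedup - 1) * 100:.0f}%")
```

With these inputs the equal-core figure comes out at about 45% and the net figure a little above 30%, in line with the numbers in the post.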
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
I did not say it was 50% slower. I said that the GTX 980 Ti is 50% faster than the GTX 780 Ti, which corresponds exactly to the GTX 780 Ti being 25-30% slower. It is a matter of point of view.

IPC gains will be bigger on Pascal, however, because of the scale. 192 vs 128 cores. 128 vs 64 cores. See my point?
 

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
I did not say it was 50% slower. I said that the GTX 980 Ti is 50% faster than the GTX 780 Ti, which corresponds exactly to the GTX 780 Ti being 25-30% slower. It is a matter of point of view.

IPC gains will be bigger on Pascal, however, because of the scale. 192 vs 128 cores. 128 vs 64 cores. See my point?

No, you said the 780Ti was 50% slower than the 980 Ti. antihelten even quoted where you used those exact words.
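
Part of the confusion here is that "X% faster" and "X% slower" are not symmetric. A small sketch of the conversion, assuming both figures describe the same performance ratio between two cards (the helper names are hypothetical):

```python
def faster_to_slower(pct_faster):
    """If A is pct_faster% faster than B, return how many % slower B is than A."""
    ratio = 1 + pct_faster / 100      # A's performance relative to B
    return (1 - 1 / ratio) * 100


def slower_to_faster(pct_slower):
    """If B is pct_slower% slower than A, return how many % faster A is than B."""
    ratio = 1 - pct_slower / 100      # B's performance relative to A
    return (1 / ratio - 1) * 100


print(f"50% faster -> {faster_to_slower(50):.1f}% slower")
for s in (25, 30, 50):
    print(f"{s}% slower -> {slower_to_faster(s):.1f}% faster")
```

By this conversion, "50% faster" corresponds to about 33% slower, "25-30% slower" to roughly 33-43% faster, and "50% slower" would mean the other card is twice as fast.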
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
No, you said the 780Ti was 50% slower than the 980 Ti. antihelten even quoted where you used those exact words.

Whoopsie, I meant the other way around.

Thanks for pointing that out; I will correct it.
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
I did not say it was 50% slower. I said that the GTX 980 Ti is 50% faster than the GTX 780 Ti, which corresponds exactly to the GTX 780 Ti being 25-30% slower. It is a matter of point of view.

Erm, no, you did not say 50% faster, you said 50% slower. Try going back and reading your own post.

IPC gains will be bigger on Pascal, however, because of the scale. 192 vs 128 cores. 128 vs 64 cores. See my point?

Yes, I see your point, but there is no logic to it. I still don't think you understand why Nvidia went from 192 cores in Kepler to 128 cores in Maxwell.

Reducing the number of cores per SM is not some magical method to increase IPC. It was done to achieve a better balance of functional units, and thus better throughput, since Kepler had poor throughput (relative to its theoretical max). It is basically the same reason why AMD switched from a VLIW5 design to a VLIW4 design back in the day. If Nvidia did their job properly, there is no reason to believe that Maxwell requires additional balancing of functional units to achieve proper throughput.

Most likely Pascal cut the Maxwell SM in half whilst keeping the register file the same size, because that was the easiest way of ensuring that each SM had access to a register file twice the size relative to its core count (the alternative would have been to keep the SM the same size as in Maxwell and simply double the size of the register file). The reason why Nvidia wanted double the register file size is probably related to compute, not gaming.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
Erm, no, you did not say 50% faster, you said 50% slower. Try going back and reading your own post.

Yes, I see your point, but there is no logic to it. I still don't think you understand why Nvidia went from 192 cores in Kepler to 128 cores in Maxwell.

Reducing the number of cores per SM is not some magical method to increase IPC. It was done to achieve a better balance of functional units, and thus better throughput, since Kepler had poor throughput. It is basically the same reason why AMD switched from a VLIW5 design to a VLIW4 design back in the day. If Nvidia did their job properly, there is no reason to believe that Maxwell requires additional balancing of functional units to achieve proper throughput.

Most likely Pascal cut the Maxwell SM in half whilst keeping the register file the same size, because that was the easiest way of ensuring that each SM had access to a register file twice the size relative to its core count (the alternative would have been to keep the SM the same size as in Maxwell and simply double the size of the register file). The reason why Nvidia wanted double the register file size is probably related to compute, not gaming.

After thinking about it: you are right. And do you know what it is related to?

Unified Memory.
 

CakeMonster

Golden Member
Nov 22, 2012
1,428
535
136
Nvidia launching Pascal Geforce lineup at editor’s event before Computex 2016 (mid-May)

Aiming to get as many sales as possible before AMD launches? Just a wild guess, but a few weeks could mean a lot of $$ here for whoever manages to launch slightly ahead. Some buyers are that impatient...
 

Creig

Diamond Member
Oct 9, 1999
5,171
13
81
Are they "launching" the Pascal lineup or simply "announcing" the Pascal lineup?

Big difference.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Are they "launching" the Pascal lineup or simply "announcing" the Pascal lineup?

Big difference.

The sooner they announce/launch Pascal for desktop, the sooner people will start to forget how badly Maxwell is performing in 2016 games, especially in DX-12 titles.
NV seriously needs to steer public perception away from Maxwell in 2016 and toward a new DX-12 architecture with Pascal that will "promise" better DX-12 performance.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Are they "launching" the Pascal lineup or simply "announcing" the Pascal lineup?

Big difference.

Looks like a 'press launch' with reviews. The actual launch could happen at or right after Computex though.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Hey, it could happen, but you seem to be avoiding answering me. Probably because you know how stupid the answer sounds.

I'll make it easier for ya, since you have a hard time understanding a simplistic example...

Say a company develops a GPU that is 150% the performance of anything out there today, but can do so at the same cost. Why, exactly, do you feel they should charge the same price?

Ready... GO!

Great! You're calling me stupid. lol I'm crushed. How will I sleep at night?

If you think the example you gave (2x the perf at 1/2 the size) is realistic, fine. I've seen perf/$ improvements occur regularly. It's one of the benefits of technology. See below.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
That scenario isn't just realistic, it's downright inevitable. It's just a question of time. It won't happen on 16nm, obviously, but 10nm should get close.

Over multiple gens, sure. I'm sure we can find examples in that situation, though, where the 2x perf at 1/2 the size actually cost less, and others where it cost more.
 

Kris194

Member
Mar 16, 2016
112
0
0
Can't wait to see Pascal results next month. It seems that Nvidia will have strong competition this year (Polaris 10).
 