Full AMD Polaris 10 GPU has 2304 Stream Processors

Sweepr · Jul 4, 2016

PCGH: Is the P10 a 36-CU-part and no hidden additional CUs?

Evan Groenke: I can absolutely confirm with you right here, that Polaris 10 in its full configuration defined by the silicon is a 36 Compute Unit configuration; there's nothing else hidden on that product that end users might be looking forward to unlocking. This is the pinnacle, the latest and greatest of the Polaris 10 product.

www.pcgameshardware.de/AMD-Polaris-Hardware-261587/News/Interview-1200545/

Arachnotronic · Jul 4, 2016

Sweepr said:
www.pcgameshardware.de/AMD-Polaris-Hardware-261587/News/Interview-1200545/

Good find, should put the rumors of a 2560 SP part to rest.
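For anyone wondering how the CU count maps to the SP numbers in this thread, here is a quick sketch. It assumes the usual GCN layout of 64 stream processors per Compute Unit; that per-CU figure is not stated in the interview, and the function name is just for illustration.

```python
# Assumption: each GCN Compute Unit contains 64 stream processors.
SP_PER_CU = 64

def stream_processors(compute_units: int) -> int:
    """Total stream processors for a given number of Compute Units."""
    return compute_units * SP_PER_CU

confirmed = stream_processors(36)  # full Polaris 10 per the interview
rumored = stream_processors(40)    # the rumored 40-CU configuration
print(confirmed, rumored)  # 2304 2560
```

So the confirmed 36 CUs give the 2304 SPs in the thread title, and the 2560 SP rumor corresponds to a 40-CU chip.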

Kenmitch · Jul 4, 2016

Nothing wrong with paying for and getting a full chip as far as I can see. Yields must be decent at GloFo if the RX 480 is the full chip. Seems silly but I'd rather have the full chip. Guess some dead silicon could somewhat help with heat issues a little bit.

el etro · Jul 4, 2016

Looks like the problem is the process, at this power range. The RX 470 itself is a much more efficient design and Polaris 11 is looking to be even more efficient. 14LPP seems to have a problem scaling over 100 Watts.

The higher-end Vega parts could be fabbed on TSMC16FF+.

Arachnotronic · Jul 4, 2016

el etro said:
Looks like the problem is the process, at this power range. The RX 470 itself is a much more efficient design and Polaris 11 is looking to be even more efficient. 14LPP seems to have a problem scaling over 100 Watts.

Don't blame the process. NVIDIA said when it launched Pascal that if it didn't spend all that time tweaking the critical paths in the chip and just did a straight shrink, it would have gotten only about 1300MHz.

Let this serve as a lesson to those who were mocking Pascal as being a "die shrink" of Maxwell -- a lot of work goes into being able to increase frequency by ~40% generation on generation, even with a new process, while keeping power consumption low.

Pascal uArch might not have changed much from Maxwell, but the layout/implementation saw a ton of work.

There's more to delivering a good architecture than whether it supports some buzzword feature.

JDG1980 · Jul 4, 2016

Arachnotronic said:
Don't blame the process. NVIDIA said when it launched Pascal that if it didn't spend all that time tweaking the critical paths in the chip and just did a straight shrink, it would have gotten only about 1300MHz.

Nvidia says a lot of things, not all of them true. Of course they want to play up the amount of R&D they put into Pascal, so they can justify increasing the price point for medium-sized chips yet again.

Don't forget that the increased clock speeds of Pascal do come with a cost: lower shader density. 2560 shaders for a 314mm^2 chip on 16nm FinFET really isn't that high. Nvidia presumably would have had lower clocks if they'd packed the shaders more densely on the die, as AMD did. In this particular case it appears the trade-off was worth it. But it's closer than you think. GTX 1080 peaks at ~8.9 TFlops at max default boost clock. RX 480 with its 232mm^2 die peaks at ~5.8 TFlops at max default boost. This means Polaris 10 has ~65% of the raw computing power of GP104, at ~74% of the die size. The reason Nvidia comes out much further ahead than that is because Nvidia's drivers and architecture are much better at translating TFlops into real-world gaming performance - at least in DX11.
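The ratios in the post above can be checked with a few lines of Python. This is only a sketch: the GP104 shader count and both die sizes come from the post and the 2304 SP figure from the thread title, while the boost clocks (1733 MHz for the GTX 1080, 1266 MHz for the RX 480) and the 2 FLOPs per shader per clock (one fused multiply-add) are assumptions not stated in it.

```python
def peak_tflops(shaders: int, boost_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per shader per clock."""
    return shaders * 2 * boost_mhz * 1e6 / 1e12

# Boost clocks are assumed reference values, not taken from the post.
gtx1080 = peak_tflops(2560, 1733)
rx480 = peak_tflops(2304, 1266)

compute_ratio = rx480 / gtx1080
area_ratio = 232 / 314  # die sizes in mm^2, from the post

print(f"GTX 1080: {gtx1080:.2f} TFlops, RX 480: {rx480:.2f} TFlops")
print(f"compute ratio: {compute_ratio:.0%}, area ratio: {area_ratio:.0%}")
```

With these inputs the compute ratio lands at roughly 65-66% and the area ratio at roughly 74%, in line with the figures in the post.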

Arachnotronic · Jul 4, 2016

JDG1980 said:
Nvidia says a lot of things, not all of them true. Of course they want to play up the amount of R&D they put into Pascal, so they can justify increasing the price point for medium-sized chips yet again.

They did put a lot of R&D into Pascal, and that R&D has clearly paid off in terms of very high clocks and an efficient, compact design.

Don't forget that the increased clock speeds of Pascal do come with a cost: lower shader density. 2560 shaders for a 314mm^2 chip on 16nm FinFET really isn't that high. Nvidia presumably would have had lower clocks if they'd packed the shaders more densely on the die, as AMD did.

GP104 features more TMUs than Polaris 10 (160 vs 144), twice the ROPs (64 versus 32), and there are obviously other parts of the GPU that aren't related to shader count (Polymorph engine, Simultaneous Multiprojection block, GDDR5X controller, etc.) that may add to the area/xtor count while not ballooning shader count.

In this particular case it appears the trade-off was worth it. But it's closer than you think. GTX 1080 peaks at ~8.9 TFlops at max default boost clock. RX 480 with its 232mm^2 die peaks at ~5.8 TFlops at max default boost. This means Polaris 10 has ~65% of the raw computing power of GP104, at ~74% of the die size. The reason Nvidia comes out much further ahead than that is because Nvidia's drivers and architecture are much better at translating TFlops into real-world gaming performance - at least in DX11.

As I said above, there's more to gaming performance than just raw FLOPs.

In terms of xtor density NVIDIA put 7.2 billion xtors in a 314mm^2 area, while AMD put 5.7 billion in 232mm^2. AMD's chip has ~24.57 million transistors/mm^2, while NVIDIA's is at ~23 million/mm^2.

AMD's design is slightly denser, but the slight areal disadvantage that NVIDIA has is more than offset by the perf/mm^2 advantage that NVIDIA has.
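The density figures are easy to reproduce from the transistor counts and die sizes quoted above. A minimal sketch (the function name is just for illustration):

```python
def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6

gp104 = density_mtr_per_mm2(7.2e9, 314)      # NVIDIA GP104
polaris10 = density_mtr_per_mm2(5.7e9, 232)  # AMD Polaris 10

print(f"GP104: {gp104:.2f} M/mm^2, Polaris 10: {polaris10:.2f} M/mm^2")
print(f"Polaris 10 is {polaris10 / gp104 - 1:.1%} denser")
```

That works out to a density edge of about 7% for Polaris 10, which is what "slightly denser" refers to.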

coercitiv · Jul 4, 2016

Arachnotronic said:
NVIDIA said when it launched Pascal that if it didn't spend all that time tweaking the critical paths in the chip and just did a straight shrink, it would have gotten only about 1300MHz.

So Nvidia said that if they hadn't tweaked the critical paths, going from 980 to 1080 frequency would have stayed more or less the same at... what.. same TDP?! Can I take this with a grain of salt?

AtenRa · Jul 4, 2016

Arachnotronic said:
AMD's design is slightly denser, but the slight areal disadvantage that NVIDIA has is more than offset by the perf/mm^2 advantage that NVIDIA has.

Only in DX-11.

JDG1980 · Jul 4, 2016

coercitiv said:
So Nvidia said that if they hadn't tweaked the critical paths, going from 980 to 1080 frequency would have stayed more or less the same at... what.. same TDP?! Can I take this with a grain of salt?

Yeah, I can buy that a straight port of Maxwell to 16FF+ might have only gotten them to, say, 1600-1700 MHz instead of the 2000-2100 MHz they actually got, but not that they wouldn't get any gains at all.

el etro · Jul 4, 2016

Arachnotronic said:
Don't blame the process. NVIDIA said when it launched Pascal that if it didn't spend all that time tweaking the critical paths in the chip and just did a straight shrink, it would have gotten only about 1300MHz.

The RX 470 packs 87% of RX 480 performance at 2.7x the efficiency of the R9 290, and will be rated at 110W. We have never seen such a disparity in efficiency between cards of the same generation and node. Also, AMD has already stated that they achieved even more than this with Polaris 11. There IS a process problem.

el etro · Jul 4, 2016

coercitiv said:
So Nvidia said that if they hadn't tweaked the critical paths, going from 980 to 1080 frequency would have stayed more or less the same at... what.. same TDP?! Can I take this with a grain of salt?

They certainly could have botched it and ended up with a card that clocks no better and offers little gain in power efficiency; we saw what happened with Fermi.

Sweepr · Jul 4, 2016

el etro said:
The RX 470 packs 87% of RX 480 performance at 2.7x the efficiency of the R9 290, and will be rated at 110W.

What's the source on this?

el etro · Jul 4, 2016

Sweepr said:
What's the source on this?

Just Fire Strike scores and estimates (Robert Hallock said the RX 470 delivers 2.7x the efficiency of some 28nm GCN GPU) based on AMD statements, plus that June 17 slide.

Abwx · Jul 4, 2016

coercitiv said:
So Nvidia said that if they hadn't tweaked the critical paths, going from 980 to 1080 frequency would have stayed more or less the same at... what.. same TDP?! Can I take this with a grain of salt?

They said same frequency, not TDP, but I do not agree with this point; it's about 100% sure that frequency would still have increased substantially. The lower density is due to 16FF+, which has less density than Samsung's 14nm LPP. I guess that was forgotten by Arachnotronic when he explained that the higher frequency was due to said lower density.

Sweepr · Jul 4, 2016

Arachnotronic said:
They did put a lot of R&D into Pascal, and that R&D has clearly paid off in terms of very high clocks and an efficient, compact design.

According to Hardware.fr their perf/watt improvement was more substantial than AMD's, and looking at where they were with Maxwell, that's significant.

Note that Tonga is more efficient than Hawaii according to TPU, and it wasn't included here.

Erenhardt · Jul 4, 2016

Not really:

Note that the 480 is more efficient than the Fury and Fury X according to TPU.

geoxile · Jul 4, 2016

el etro said:
The RX 470 packs 87% of RX 480 performance at 2.7x the efficiency of the R9 290, and will be rated at 110W. We have never seen such a disparity in efficiency between cards of the same generation and node. Also, AMD has already stated that they achieved even more than this with Polaris 11. There IS a process problem.

If that's true then that guy on OCN might have been right about Polaris 10 having problems. Not sure why you think it's a process problem when we've only seen one bad case and heard of one hypothetically good case.

Sweepr · Jul 4, 2016

Erenhardt said:
Note that the 480 is more efficient than the Fury and Fury X according to TPU.

Only at 1080p and below. Fury is ahead and Fury X is closer at 1440p (Hardware.fr resolution).

And Nano beats regular Fury.

tweakboy · Jul 4, 2016

It will go for 900 dollars.

el etro · Jul 4, 2016

Sweepr said:
Only at 1080p and below. Fury is ahead and Fury X is closer at 1440p (Hardware.fr resolution).

And Nano beats regular Fury.

You are right, but RX480 achieves its maximum potential at 1080p.

Also, the Nano is a very special low-power bin, designed to cap a ~240W card at a much lower power limit, where its efficiency shines!

3DVagabond · Jul 4, 2016

Well this went off topic immediately. lol Not surprising though.

Silverforce11 · Jul 4, 2016

It may well be, but we will know the *truth* when the Mac refresh is here. Anything from PR needs to be taken with a grain of salt as you all should be fully aware of that by now!

Sweepr · Jul 4, 2016

Silverforce11 said:
It may well be, but we will know the *truth* when the Mac refresh is here. Anything from PR needs to be taken with a grain of salt as you all should be fully aware of that by now!

Sorry, but no. Polaris 10 is a 36-CU GPU.

Evan Groenke is Senior Product Manager for Polaris 10 and he was deeply involved in the development of Polaris. He should know if the full configuration of Polaris is 40 compute units (like some websites state) or 36. Groenke made absolutely clear that the latter is the case. Here is his full quote from the interview; you will find the whole audio interview at the end of the article.

PCGH: Is the P10 a 36-CU-part and no hidden additional CUs?

Evan Groenke: I can absolutely confirm with you right here, that Polaris 10 in its full configuration defined by the silicon is a 36 Compute Unit configuration there's nothing else hidden on that product that end users might be looking forward to unlocking. This is the pinnacle, the latest and greatest of the Polaris 10 product.

They could not have made it more clear than this.

3DVagabond · Jul 4, 2016

How many times did AMD claim that Tonga only had a 256-bit memory bus? Turned out not to be true.
