Full AMD Polaris 10 GPU has 2304 Stream Processors


StrangerGuy

Diamond Member
May 9, 2004
8,443
124
106
On the other end of the spectrum, the 1060 suggests Pascal is a monster at scaled down core counts. It's not going to be remotely surprising if NV pulls out a 60W 750 Ti successor except only this time at 380 level performance and that sizeable chunk of the market becomes completely locked down by NV.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
I maintain that AMD will need a larger die size and more power usage to match a GTX1080 with Vega 10. Their only way to compete will be on price or price/performance. This means GTX1080Ti/Titan P should have 0 competition this generation again. That's why I also predict a 2nd consecutive generation of NV selling GTX570 cut-down flagship under the x80Ti brand (980Ti and soon GTX1080Ti). I will be pleasantly surprised if NV actually releases a 3840 CC / 96 ROP / 240 TMU 384-bit GTX1080Ti. If they do, and Pascal scales almost linearly in GPU demanding games, then that's another 1.5X increase over the 1080. AMD will have no chance. To me right now this is AMD's HD2000 series generation unless I see some major changes. What masks AMD's engineering failure is RX 480's $199-239 price. The power usage should have been 110-120W with R9 390X performance, not 163-167W with R9 390 level of performance.
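The "1.5X" figure is simply the ratio of CUDA-core counts. A minimal sketch, assuming the GTX 1080's 2560 CUDA cores, the rumoured 3840-core configuration from the post above, equal clocks and perfectly linear scaling (real games rarely scale this cleanly):

```python
# Naive linear-scaling estimate: performance assumed proportional to shader count.
GTX_1080_CUDA_CORES = 2560          # shipping GTX 1080 configuration
RUMOURED_1080TI_CUDA_CORES = 3840   # speculative configuration from the post


def linear_speedup(new_cores: int, old_cores: int) -> float:
    """Speedup if performance scaled perfectly with core count at equal clocks."""
    return new_cores / old_cores


if __name__ == "__main__":
    speedup = linear_speedup(RUMOURED_1080TI_CUDA_CORES, GTX_1080_CUDA_CORES)
    print(f"Naive speedup over GTX 1080: {speedup:.2f}x")
```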

From what I understand, implementing the Polaris 10 scheduler in Fury, or simply designing one specifically to fully utilize such a wide GPU, would already bring Fiji's 4096 GCN cores up to GTX 1080 performance levels.

AMD says that the 14 nm process brings 1.9 times better efficiency, so porting Fury 1:1 to 14 nm would put the GPU at 1050 MHz at 140-150W of power consumption with HBM1, and at exactly GTX 1080 levels of performance, if the new scheduler were implemented.
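The 140-150W estimate follows from dividing a Fury-class board power by AMD's claimed 1.9x efficiency gain. A rough sketch, assuming a 275W starting point (AMD's rated typical board power for the Fury X, used here as an assumption) and that the entire gain goes into lower power at unchanged performance:

```python
FURY_X_BOARD_POWER_W = 275.0   # assumption: rated typical board power of the 28 nm Fury X
CLAIMED_EFFICIENCY_GAIN = 1.9  # AMD's 14 nm perf/watt claim, as quoted in the post


def power_at_same_performance(power_w: float, efficiency_gain: float) -> float:
    """Power needed for identical performance if perf/watt improves by efficiency_gain."""
    return power_w / efficiency_gain


if __name__ == "__main__":
    estimate = power_at_same_performance(FURY_X_BOARD_POWER_W, CLAIMED_EFFICIENCY_GAIN)
    print(f"Ported Fury estimate: {estimate:.0f} W")
```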

Without a new scheduler, the whole GPU is pointless.

Fiji was Graphics IPv8, as were Tonga and Polaris (Tonga was IPv8, Fiji IPv8.1, Polaris IPv8.2). Vega will most likely be IPv9.

Consider that this could indicate a new scheduler, or an entirely new rasteriser or rasterising technology. Only something like that could account for a new graphics IP version number, as in the past.

And then there is one last bit. Let's assume that an 8 GB HBM2, 512 GB/s GPU with 4096 GCN cores at 1050 MHz is already at the same level of performance as the GTX 1080. What will happen if you also bring a new rasteriser and a higher core clock, say 1250 MHz?
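The clock bump alone can be quantified as peak FP32 throughput. A sketch, assuming the usual GCN rate of one fused multiply-add (2 FLOPs) per stream processor per clock; it says nothing about the effect of a new rasteriser, which is the harder part of the question:

```python
def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12


if __name__ == "__main__":
    base = peak_fp32_tflops(4096, 1050)
    boosted = peak_fp32_tflops(4096, 1250)
    print(f"1050 MHz: {base:.2f} TFLOPS")
    print(f"1250 MHz: {boosted:.2f} TFLOPS (+{boosted / base - 1:.0%})")
```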

Finally: http://www.freepatentsonline.com/20160085551.pdf

There is no sign of this patent in the Polaris 10 and 11 architecture, so it is most likely part of the Vega architecture.

Personally, I think it is specifically designed for GPUs like the Nano: power-gated by design.

Polaris 10 and 11 appear to be a stopgap architecture, meant to earn cash by targeting the widest possible market share and to bridge the gap until Vega. Of course, Polaris 10 will most likely reappear in the future as rebadges.

We also cannot forget the rumours that AMD might bring its own gaming platform with Zen and Vega in Project Quantum.

http://wccftech.com/amd-project-quantum-not-dead-zen-cpu-vega-gpu/
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
On the other end of the spectrum, the 1060 suggests Pascal is a monster at scaled down core counts. It's not going to be remotely surprising if NV pulls out a 60W 750 Ti successor except only this time at 380 level performance and that sizeable chunk of the market becomes completely locked down by NV.

At the other end of the spectrum, AMD has both the RX460 and RX470, and we can't just look at the performance/watt of the RX480 and assume they are the same, especially if efficiency has been sacrificed for absolute performance. From what Raja Koduri stated, it seems Polaris 11 was the lead chip of the Polaris generation.
 
Mar 10, 2006
11,715
2,012
126
At the other end of the spectrum, AMD has both the RX460 and RX470, and we can't just look at the performance/watt of the RX480 and assume they are the same, especially if efficiency has been sacrificed for absolute performance. From what Raja Koduri stated, it seems Polaris 11 was the lead chip of the Polaris generation.

He did say Polaris 11 was the lead chip, but I thought the naming was based on which chip was done first? Very confusing naming.
 
May 11, 2008
20,041
1,289
126
From what I understand, implementing the Polaris 10 scheduler in Fury, or simply designing one specifically to fully utilize such a wide GPU, would already bring Fiji's 4096 GCN cores up to GTX 1080 performance levels.

AMD says that the 14 nm process brings 1.9 times better efficiency, so porting Fury 1:1 to 14 nm would put the GPU at 1050 MHz at 140-150W of power consumption with HBM1, and at exactly GTX 1080 levels of performance, if the new scheduler were implemented.

Without a new scheduler, the whole GPU is pointless.

Fiji was Graphics IPv8, as were Tonga and Polaris (Tonga was IPv8, Fiji IPv8.1, Polaris IPv8.2). Vega will most likely be IPv9.

Consider that this could indicate a new scheduler, or an entirely new rasteriser or rasterising technology. Only something like that could account for a new graphics IP version number, as in the past.

And then there is one last bit. Let's assume that an 8 GB HBM2, 512 GB/s GPU with 4096 GCN cores at 1050 MHz is already at the same level of performance as the GTX 1080. What will happen if you also bring a new rasteriser and a higher core clock, say 1250 MHz?

Finally: http://www.freepatentsonline.com/20160085551.pdf

There is no sign of this patent in the Polaris 10 and 11 architecture, so it is most likely part of the Vega architecture.

Personally, I think it is specifically designed for GPUs like the Nano: power-gated by design.

Polaris 10 and 11 appear to be a stopgap architecture, meant to earn cash by targeting the widest possible market share and to bridge the gap until Vega. Of course, Polaris 10 will most likely reappear in the future as rebadges.

We also cannot forget the rumours that AMD might bring its own gaming platform with Zen and Vega in Project Quantum.

http://wccftech.com/amd-project-quantum-not-dead-zen-cpu-vega-gpu/

The whole issue with SIMD ALUs is that when not all data slots are occupied, efficiency goes out the door. SIMD means a single instruction applied to multiple data elements.
And that is what I have been questioning all along. Polaris 10 has 2304 stream processors but actually only 36 compute units. As soon as a 16-wide SIMD unit is not fully occupied with data, Polaris is at a loss. Now of course, Pascal's SM is essentially the same: single instruction, multiple data. But it seems Nvidia is doing a whole lot better at getting graphics data processed.
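For reference, the 2304 figure counts stream processors (individual lanes), not compute units. A small sketch of the hierarchy, assuming the standard GCN layout of four SIMD16 vector units per compute unit (the "4 x 16" layout mentioned later in the thread):

```python
COMPUTE_UNITS = 36    # Polaris 10 (full chip)
SIMDS_PER_CU = 4      # GCN: four vector SIMD units per compute unit
LANES_PER_SIMD = 16   # each SIMD processes 16 lanes per clock

simd_units = COMPUTE_UNITS * SIMDS_PER_CU
stream_processors = simd_units * LANES_PER_SIMD

if __name__ == "__main__":
    print(f"{COMPUTE_UNITS} CUs -> {simd_units} SIMD16 units -> {stream_processors} stream processors")
```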
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
AIB 1080 is 2X faster than an RX 480 @ 1440p/4K, while maintaining 210W typical power usage with a peak of 224W. AMD would need a full generational leap from P10 to Vega to come close to GTX1080 if they were to target a ~320-330mm2 die within the same 210-225W power envelope.

RX 480 currently uses about 160W. That means a naive doubling of Polaris 10 would theoretically use about 320W, which would indeed be bad.

But it's not that dire in reality. As I noted above (and see this thread for more details), P10 is bottlenecked by memory bandwidth. With GDDR5X, P10 could surpass vanilla Fury and fall about halfway between that and Fury X. That figure isn't hypothetical; it's based on actual overclocking results. Not only that, but running the memory controller at 1250 MHz (effective 10Gbps with GDDR5X) is likely to be far more efficient than running it at 2000 MHz like they do now. Throw in a new stepping and/or some normally expected process improvements at GloFo, and AMD should eventually be able to get ~15-20% performance improvements out of Polaris 10 while simultaneously cutting gaming power usage to near 125W. That would close most of the perf/watt gap even without taking into account potential savings from HBM2 or any of the Vega architectural improvements.
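The memory clocks mentioned above convert to per-pin data rates with a fixed multiplier. A sketch assuming the multipliers implied by the post's own numbers (GDDR5 at 2000 MHz giving 8 Gbps, GDDR5X at 1250 MHz giving 10 Gbps); check a memory datasheet before relying on them:

```python
# Assumed transfers per memory clock, inferred from the figures in the post.
TRANSFERS_PER_CLOCK = {"GDDR5": 4, "GDDR5X": 8}


def data_rate_gbps(memory_type: str, clock_mhz: float) -> float:
    """Per-pin data rate in Gbps for a given memory clock."""
    return clock_mhz * TRANSFERS_PER_CLOCK[memory_type] / 1000


if __name__ == "__main__":
    print(f"GDDR5  @ 2000 MHz: {data_rate_gbps('GDDR5', 2000):.1f} Gbps")
    print(f"GDDR5X @ 1250 MHz: {data_rate_gbps('GDDR5X', 1250):.1f} Gbps")
```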
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
RX 480 currently uses about 160W. That means a naive doubling of Polaris 10 would theoretically use about 320W, which would indeed be bad.

But it's not that dire in reality. As I noted above (and see this thread for more details), P10 is bottlenecked by memory bandwidth.

Not more than the GeForce GTX 1070 is. 12.5% extra memory bandwidth provides 6% better performance according to ComputerBase. This would still put it behind the Nano in their charts.

https://www.computerbase.de/2016-06...diagramm-performancerating-speicherbandbreite

There's a huge gap to overcome just to match the GeForce GTX 1080, let alone the GP102 cards rumoured to launch in a few months.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
RX 480 currently uses about 160W. That means a naive doubling of Polaris 10 would theoretically use about 320W, which would indeed be bad.

But it's not that dire in reality. As I noted above (and see this thread for more details), P10 is bottlenecked by memory bandwidth. With GDDR5X, P10 could surpass vanilla Fury and fall about halfway between that and Fury X. That figure isn't hypothetical; it's based on actual overclocking results. Not only that, but running the memory controller at 1250 MHz (effective 10Gbps with GDDR5X) is likely to be far more efficient than running it at 2000 MHz like they do now. Throw in a new stepping and/or some normally expected process improvements at GloFo, and AMD should eventually be able to get ~15-20% performance improvements out of Polaris 10 while simultaneously cutting gaming power usage to near 125W. That would close most of the perf/watt gap even without taking into account potential savings from HBM2 or any of the Vega architectural improvements.
Not really. 110W is the power consumption of the GPU die alone, without the board.

Consider HBM power consumption, which should be around 10W instead of the 37-80W of GDDR5, depending on memory bus width. That already gives you a very different picture.

If a 2304-GCN-core chip draws 110W at 1266 MHz and high voltage, lowering the voltage and slowing the chip down to, let's say, 1250 MHz brings the power consumption down. Is 100W impossible, then? So a 4608-GCN-core GPU could draw 200-220W on its own, without the board.
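The reasoning above relies on the first-order CMOS rule that dynamic power scales with the amount of switching logic, the clock, and the square of the voltage. A minimal sketch under that rule; it ignores leakage, memory and board power, and the 5% voltage reduction used below is a hypothetical value, not a measured one:

```python
def scaled_power_w(base_w: float, f_old_mhz: float, f_new_mhz: float,
                   voltage_ratio: float = 1.0, unit_ratio: float = 1.0) -> float:
    """First-order dynamic power estimate: P ~ units * f * V^2 (leakage ignored)."""
    return base_w * unit_ratio * (f_new_mhz / f_old_mhz) * voltage_ratio ** 2


if __name__ == "__main__":
    # 2304-core die at 110 W / 1266 MHz, dropped to 1250 MHz with a hypothetical 5% lower voltage.
    single = scaled_power_w(110, 1266, 1250, voltage_ratio=0.95)
    # Same settings with twice the shader hardware (4608 cores).
    doubled = scaled_power_w(110, 1266, 1250, voltage_ratio=0.95, unit_ratio=2.0)
    print(f"2304 cores: {single:.0f} W, 4608 cores: {doubled:.0f} W")
```

With the voltage left unchanged, the same model gives about 109W and 217W, near the upper end of the 200-220W range.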

And then there is a Nano-sized board with HBM only. How would that behave? 250W is the highest possibility for a 4608-GCN-core GPU, in a worst-case scenario.

And to add to all that: 8 GB of HBM2 will use only 2 stacks, which will deliver 512 GB/s at 4W of power consumption.
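The 512 GB/s figure follows from the HBM2 interface width. A sketch assuming the standard 1024-bit interface per stack and the top 2 Gbps per-pin speed grade:

```python
HBM2_BITS_PER_STACK = 1024   # interface width of one HBM2 stack
HBM2_PIN_RATE_GBPS = 2.0     # assumption: fastest HBM2 speed grade


def hbm2_bandwidth_gbs(stacks: int, pin_rate_gbps: float = HBM2_PIN_RATE_GBPS) -> float:
    """Aggregate bandwidth in GB/s: stacks * bus width * per-pin rate / 8 bits per byte."""
    return stacks * HBM2_BITS_PER_STACK * pin_rate_gbps / 8


if __name__ == "__main__":
    print(f"2 stacks: {hbm2_bandwidth_gbs(2):.0f} GB/s")
```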
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
Not really. 110W is the power consumption of the GPU die alone, without the board.

Consider HBM power consumption, which should be around 10W instead of the 37-80W of GDDR5, depending on memory bus width. That already gives you a very different picture.

If a 2304-GCN-core chip draws 110W at 1266 MHz and high voltage, lowering the voltage and slowing the chip down to, let's say, 1250 MHz brings the power consumption down. Is 100W impossible, then? So a 4608-GCN-core GPU could draw 200-220W on its own, without the board.

And then there is a Nano-sized board with HBM only. How would that behave? 250W is the highest possibility for a 4608-GCN-core GPU, in a worst-case scenario.

And to add to all that: 8 GB of HBM2 will use only 2 stacks, which will deliver 512 GB/s at 4W of power consumption.
Hey Glo, how many watts would the 480 draw if it used 4GB of HBM?
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
That's not an argument as to why it can't be a successor to both.

It has roughly the same number of xtors as Tonga (which to me makes it interesting to see what level of generational change AMD is getting) and was launched into the same price range. Seems like a match to me.

And afaik no Tonga had this "magical"* 384 bit bus exposed.

*Why the need to be pejorative?

Successors always have more transistors and similar die sizes to their predecessors. GP106 has more transistors than GM206, which had more transistors than GK106, which had more transistors than GF116.

Also, according to many on the web (including AMD execs), Tonga had a physical 384-bit memory bus implementation. It was just never enabled on any products. I said magical because Tonga was an engineering failure. I guess, in that same vein, Polaris could be considered a Tonga successor, since Polaris is not nearly as impressive as Pitcairn was and remained during its shelf life.
 
May 11, 2008
20,041
1,289
126
Successors always have more transistors and similar die sizes to their predecessors. GP106 has more transistors than GM206, which had more transistors than GK106, which had more transistors than GF116.

Also, according to many on the web (including AMD execs), Tonga had a physical 384-bit memory bus implementation. It was just never enabled on any products.

And that was their biggest mistake, whether for contractual reasons or not.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Hey Glo, how many watts would the 480 draw if it used 4GB of HBM?

Ask AMD. I can only give you my opinion. It depends on board size, the number of VRMs, etc. It would be around 125W IMO.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Again, current reviews only measured power consumption and perf/watt in DX-11 games. By the end of the year, when most of the gaming benchmarks in reviews will be DX-12, the perf/watt picture will change again.

I have said this before, Pascal is more power efficient in DX-11, but not in DX-12.


As for the Tonga vs Polaris 10 comparison, it was made because of the roughly equal transistor count, to understand and evaluate how much the new process and new architecture gained over the previous ones.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Not more than the GeForce GTX 1070 is. 12.5% extra memory bandwidth provides 6% better performance according to ComputerBase. This would still put it behind the Nano in their charts.

A switch to GDDR5X would mean 25% more memory bandwidth (256GB/sec -> 320GB/sec). That's a substantial jump. And keep in mind that the test run by ComputerBase was probably running at a core clock substantially lower than the 1266 MHz maximum, because overclocking the RAM would make the card hit the power limit even sooner.
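The 256 to 320 GB/s jump is bus width times per-pin rate. A sketch assuming the RX 480's 256-bit bus with 8 Gbps GDDR5, versus a hypothetical 10 Gbps GDDR5X configuration on the same bus:

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    gddr5 = bandwidth_gbs(256, 8.0)
    gddr5x = bandwidth_gbs(256, 10.0)  # hypothetical GDDR5X card on the same 256-bit bus
    print(f"{gddr5:.0f} GB/s -> {gddr5x:.0f} GB/s (+{gddr5x / gddr5 - 1:.0%})")
```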

In the other thread I linked, someone posted benchmarks indicating that 12.5% memory overclock combined with ~7% core overclock provided nearly 19% improvement in 3DMark FireStrike. (We can assume that this user boosted the power limit in WattMan.) With a few months of process refinement and maybe a new stepping, it shouldn't be hard for AMD to tweak the core clocks upward a bit, while keeping voltage the same or lower. Combined with the additional memory bandwidth of GDDR5X, that could make a big difference.

It's probably fair to say that Pascal will have better perf/watt overall than Polaris even when all is said and done, but what we're seeing with RX 480 is really close to a worst-case scenario. I think much of the gap can be closed in the next couple of months if AMD releases an "RX 485" introducing some of the modifications outlined here. Note that if there is indeed GDDR5X support already in the memory controller, then no expensive silicon-level changes would be necessary to do any of this.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Not only is the Maxwell/Pascal architecture more efficient; its better memory compression also means that the 480 has to run its GDDR5 at what amount to overclocked levels, so to speak.

Going by AMD's own slide, the efficiency gain from Hawaii to Polaris is the same as from Polaris to Vega.

I bet it will edge out the 1080 on performance. As a consumer, fine. But from a business perspective it doesn't really matter. The die will be bigger and use HBM2, and will therefore have a cost disadvantage versus the 1080, and it will be soundly beaten by the bigger NV die.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
On the other end of the spectrum, the 1060 suggests Pascal is a monster at scaled down core counts. It's not going to be remotely surprising if NV pulls out a 60W 750 Ti successor except only this time at 380 level performance and that sizeable chunk of the market becomes completely locked down by NV.
A 4GB P11 at 100 USD is what this market needs, not some expensive product. The 750 situation is gone, although brand plays a major part in this segment. It's not going to be locked down. GF needs to move capacity.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
It's not enough because of how far behind the RX480 is in relation to the GTX1070/1080.

AIB 1080 is 2X faster than an RX 480 @ 1440p/4K, while maintaining 210W typical power usage with a peak of 224W. AMD would need a full generational leap from P10 to Vega to come close to GTX1080 if they were to target a ~320-330mm2 die within the same 210-225W power envelope.

My calculation using typical gaming power usage:

1440p
Palit 1080 = 100% / 210W = 0.476 perf/watt rating
RX 480 = 51% / 163W = 0.313 perf/watt rating

Palit 1080 has a 0.476 / 0.313 = 52% perf/watt advantage

But it's even worse for the reference 1080 against reference RX 480 where the perf/watt advantage goes up to almost 80% in favour of NV.
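The perf/watt comparison above can be reproduced directly. A sketch using only the post's relative-performance and typical gaming power figures (nothing new is measured here):

```python
def perf_per_watt(relative_perf_pct: float, watts: float) -> float:
    """Relative performance (percent) divided by typical gaming power (W)."""
    return relative_perf_pct / watts


if __name__ == "__main__":
    palit_1080 = perf_per_watt(100, 210)
    rx_480 = perf_per_watt(51, 163)
    advantage = palit_1080 / rx_480 - 1
    print(f"Palit 1080: {palit_1080:.3f}, RX 480: {rx_480:.3f}, advantage: {advantage:.0%}")
```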

You actually believe AMD will improve perf/watt by 50-80% from Polaris 10 with Vega? After their performance and perf/watt misrepresentation with Fury X and Polaris 10, I highly doubt it.

I maintain that AMD will need a larger die size and more power usage to match a GTX1080 with Vega 10. Their only way to compete will be on price or price/performance. This means GTX1080Ti/Titan P should have 0 competition this generation again. That's why I also predict a 2nd consecutive generation of NV selling GTX570 cut-down flagship under the x80Ti brand (980Ti and soon GTX1080Ti). I will be pleasantly surprised if NV actually releases a 3840 CC / 96 ROP / 240 TMU 384-bit GTX1080Ti. If they do, and Pascal scales almost linearly in GPU demanding games, then that's another 1.5X increase over the 1080. AMD will have no chance. To me right now this is AMD's HD2000 series generation unless I see some major changes. What masks AMD's engineering failure is RX 480's $199-239 price. The power usage should have been 110-120W with R9 390X performance, not 163-167W with R9 390 level of performance.

Go back to the HD4000 series. When was the last time AMD actually had a significantly inferior $200-250 chip? It hasn't happened until now. GTX1060 is not only going to be more power efficient but also faster. In fact, one could argue that in the last 5 years AMD actually had a superior product line in the $200-250 space. While AMD's cards were less efficient, they were at least as fast or faster. This time AMD's 1060 competitor is both slower and less efficient. Since the $200 RX 480 4GB is MIA and was released in limited quantities, how do they expect to sell the RX 480 for $240-270 when the GTX1060 is a $250-300 card? AMD is blowing it this time.
Well, go ask retailers how it's going for AMD. I don't think AMD has moved so many cards in so short a time since the 5000 series.

If this paper-tiger 1060 comes to market in any real numbers, AIB 480 prices will drop. It's that simple. The 14nm license is done. The wafers will move because nobody else will use them. Sunk cost.

And btw, RS, why 1440p? I think you can make your point even at 1080p.
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
A switch to GDDR5X would mean 25% more memory bandwidth (256GB/sec -> 320GB/sec). That's a substantial jump. And keep in mind that the test run by ComputerBase was probably running at a core clock substantially lower than the 1266 MHz maximum, because overclocking the RAM would make the card hit the power limit even sooner.

The memory overclock results were measured with 'Maximized Power Target', and the card was running at 1,266 MHz. If 12.5% only brings 6% gains, I don't think an extra 11.1% (GDDR5X at 10 Gbps) would do much. Just my 2 cents.

With the Maximized Power Target, the power consumption of the Radeon RX 480 can rise significantly: at a constant 1,266 MHz, the graphics card requires 267 watts, which means an additional 42 watts.
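The 11.1% is measured from the already overclocked 9 Gbps (8 Gbps plus 12.5%) up to 10 Gbps. The sketch below also extrapolates ComputerBase's 6%-for-12.5% result linearly, which is an assumption made only to illustrate the argument; memory scaling is rarely linear:

```python
STOCK_GBPS = 8.0
OC_GBPS = STOCK_GBPS * 1.125   # 12.5% memory overclock tested by ComputerBase
GDDR5X_GBPS = 10.0


def extra_bandwidth(from_gbps: float, to_gbps: float) -> float:
    """Fractional bandwidth increase when moving between per-pin data rates."""
    return to_gbps / from_gbps - 1


if __name__ == "__main__":
    scaling = 0.06 / 0.125  # observed performance gain per unit of bandwidth gain
    step = extra_bandwidth(OC_GBPS, GDDR5X_GBPS)
    print(f"Extra bandwidth: {step:.1%}, naive extra performance: {scaling * step:.1%}")
```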


I have said this before, Pascal is more power efficient in DX-11, but not in DX-12.

Polaris 10 doesn't even match the most efficient Maxwell chips: it is only 0.7-1.8% faster than a GeForce GTX 980 in DX12 (average of 6 games) while using more power.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
The whole issue with SIMD ALUs is that when not all data slots are occupied, efficiency goes out the door. SIMD means a single instruction applied to multiple data elements.
And that is what I have been questioning all along. Polaris 10 has 2304 stream processors but actually only 36 compute units. As soon as a 16-wide SIMD unit is not fully occupied with data, Polaris is at a loss. Now of course, Pascal's SM is essentially the same: single instruction, multiple data. But it seems Nvidia is doing a whole lot better at getting graphics data processed.


It's not the same. The GCN logical SIMD width is 64 vector lanes, not 16. NVIDIA's SIMD width is 32 vector lanes, which makes it less likely than GCN to suffer from so-called SIMD divergence issues. On the other hand, it's highly unlikely that this is what makes NVIDIA hardware vastly more efficient than GCN. The truth is probably hidden in hundreds of micro-architectural details and differences between the two architectures.
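The divergence point can be illustrated with a toy model in which a group of lanes that disagrees on a branch has to execute both sides with some lanes masked off. A sketch, assuming a single if/else with one instruction per side and a hypothetical workload in which only the first 32 of 128 threads take the branch; this pattern is deliberately unfavourable to wider groups and, as the post notes, is unlikely to explain the overall efficiency gap:

```python
from typing import Sequence


def divergent_utilization(branch_taken: Sequence[bool], width: int) -> float:
    """Fraction of issued lane-slots doing useful work for one if/else block.

    Toy model: each side of the branch costs one instruction, and a group of
    `width` lanes that disagrees on the condition must run both sides.
    """
    passes = 0
    for start in range(0, len(branch_taken), width):
        group = branch_taken[start:start + width]
        # Mixed group: 'if' and 'else' paths run one after the other with lanes masked off.
        passes += 2 if any(group) and not all(group) else 1
    # Every thread does exactly one useful instruction (its own side of the branch).
    return len(branch_taken) / (passes * width)


if __name__ == "__main__":
    # Hypothetical workload: 128 threads, only the first 32 take the branch.
    taken = [i < 32 for i in range(128)]
    print(f"32-wide warps:      {divergent_utilization(taken, 32):.2f}")
    print(f"64-wide wavefronts: {divergent_utilization(taken, 64):.2f}")
```

With 32-wide groups every warp is uniform, so no slots are wasted; with 64-wide groups the first wavefront is mixed and pays for both paths, dropping utilization to about two thirds.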
 
May 11, 2008
20,041
1,289
126
It's not the same. The GCN logical SIMD width is 64 vector lanes, not 16. NVIDIA's SIMD width is 32 vector lanes, which makes it less likely than GCN to suffer from so-called SIMD divergence issues. On the other hand, it's highly unlikely that this is what makes NVIDIA hardware vastly more efficient than GCN. The truth is probably hidden in hundreds of micro-architectural details and differences between the two architectures.

If I can believe the slides, it is 4 x 16 SIMD units per compute unit. So if a wavefront cannot fit into a SIMD16, Polaris is at a loss. And that is what Nvidia does very well. At the driver level, Nvidia is king. AMD needs to do the same. I am lam...
 

renderstate

Senior member
Apr 23, 2016
237
0
0
If I can believe the slides, it is 4 x 16 SIMD units per compute unit. So if a wavelet cannot fit into a SIMD16, Polaris is at a loss. And that is what Nvidia does very well. At the driver level, Nvidia is king. AMD needs to do the same.


You are not reading the slides correctly. A GCN instruction runs for 4 cycles; on each cycle 16 vector lanes are processed, for a total of 64 lanes. This is the smallest unit of computation on GCN, and it's called a wavefront. Wavelets are completely different objects.
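The 4-cycle cadence can be written out explicitly. A simplified sketch, assuming the lanes of a 64-wide wavefront are fed to a SIMD16 in order, 16 per cycle:

```python
from typing import List, Tuple


def wavefront_schedule(wave_size: int = 64, simd_lanes: int = 16) -> List[Tuple[int, int]]:
    """Return the (first, last) wavefront lane processed on each cycle of one instruction."""
    cycles = wave_size // simd_lanes
    return [(c * simd_lanes, (c + 1) * simd_lanes - 1) for c in range(cycles)]


if __name__ == "__main__":
    for cycle, (first, last) in enumerate(wavefront_schedule()):
        print(f"cycle {cycle}: lanes {first}-{last}")
```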

The smallest unit of computation on an NVIDIA SM is called a warp, and it's 32 wide. This makes things a bit better for NVIDIA with code that has flow control, but I'd be surprised if it brings a lot of performance to the table. NVIDIA's advantage on this front is likely to be quite small, perhaps irrelevant in most cases.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
Again, current reviews only measured power consumption and perf/watt in DX-11 games. By the end of the year, when most of the gaming benchmarks in reviews will be DX-12, the perf/watt picture will change again.

I have said this before, Pascal is more power efficient in DX-11, but not in DX-12.


As for the Tonga vs Polaris 10 comparison, it was made because of the roughly equal transistor count, to understand and evaluate how much the new process and new architecture gained over the previous ones.

Pascal is 70-80% more efficient than Polaris. There is not one single game that Polaris will ever be more efficient in.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Pascal is 70-80% more efficient than Polaris. There is not one single game that Polaris will ever be more efficient in.
Ashes of the Singularity.
Hitman.

However, that's only two. It seems AMD made Polaris for crypto users and people who use FP64, since it is WAY more efficient than the 1080 at both.
 