Full AMD Polaris 10 GPU has 2304 Stream Processors


.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
Ah, a marketing die shot like GM204's, where all the drawn structures more or less resemble those of the logical diagram.

I have another interpretation of P10's marketing die shot:





GCN1 (Tahiti), GCN2 (Hawaii), GCN3 (Tonga/Fiji) CU diagrams are exactly the same. So this remains constant at the logical level across the four architecture revisions.

36CUs are shown. A GCN CU has 64SP in 4x16 groups, 16+4 units making up texturing hardware plus other necessary hardware. The differently colored 16u + 4u block has to be the texturing hardware.

16u green blocks * 4 = 64 SPs
16u yellow blocks = 16 TF/LS
4u green line next to yellow blocks = 4 TMUs



This forms a CU. The extra hardware depicted in the logical diagram has to be in the 16 large blocks in the outer regions and in the smaller 4u lines in between CUs, which have the same color; these are probably registers, or L1 and L2 caches. Scalar unit, message unit, scheduler, etc. don't seem to be included.

ROPs, geometry processors, etc. are decoupled from the CU and should probably be in the middle section of the diagram. So should the ACE/HWS units. Memory controllers are obviously in the periphery.

Going by this way of seeing it we have:


  • 9 CUs per quadrant (36CUs total)
  • 64SP*9=576SPs per quadrant and 2304 total
  • 16TF/LS units per CU, 144 per quadrant, 576 total
  • 4 TMUs per CU, 36 per quadrant, 144 total
all of which match the high level diagram slide's bullet points and the CU diagram.
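For anyone who wants to re-check the bullet points, here is a minimal Python sketch of the same multiplication. The per-CU counts and the 9-CUs-per-quadrant layout are just the reading of the marketing die shot given above, and the constant and function names are made up for illustration.

```python
# Unit counts per GCN compute unit, as read off the marketing die shot in this post.
SP_PER_CU = 64         # 4 groups of 16 stream processors
TF_LS_PER_CU = 16      # TF/LS units (the yellow blocks)
TMU_PER_CU = 4         # TMUs (the green line next to the yellow blocks)
CUS_PER_QUADRANT = 9   # assumed layout: 4 quadrants of 9 CUs
QUADRANTS = 4


def totals(per_cu: int) -> tuple[int, int]:
    """Return (per-quadrant, whole-chip) counts for a given per-CU unit count."""
    per_quadrant = per_cu * CUS_PER_QUADRANT
    return per_quadrant, per_quadrant * QUADRANTS


if __name__ == "__main__":
    print("CUs total:", CUS_PER_QUADRANT * QUADRANTS)
    for name, per_cu in [("SP", SP_PER_CU), ("TF/LS", TF_LS_PER_CU), ("TMU", TMU_PER_CU)]:
        per_quadrant, chip = totals(per_cu)
        print(f"{name}: {per_quadrant} per quadrant, {chip} total")
```

It only confirms that the drawn blocks add up to the slide's numbers; it says nothing about what is physically on the die.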





All this shows is that the marketing die shot matches the high-level diagram shown in the slides, nothing more. At the physical level, P10 may very well have redundancy built in, as any sanely designed chip should (seen in the console APUs, where 2-4 extra CUs are thrown in and disabled to guarantee yields). It may very well have extra hardware built in that's disabled at this time and may never be enabled (like Tonga's extra 128-bit memory controller chunk, or the GDDR5 support in Kaveri's memory controller). Or it may very well be a design that's shipping 100% functional in what seem to be truckloads, which would make GloFo a miracle foundry, up from a disaster literally overnight with Samsung's help. Overvolted dies across the board, yes, but functional.


Silverforce has a point, quite a valid one in that one shouldn't believe any claims this early in the silicon's life over the specs. Especially with this slide floating around



RX485 could exist going by that. It's anyone's guess where they're going with that naming scheme.




edit: oh, I was late while making this post. Higher resolution marketing die shot confirms what I thought, more or less.
 
Last edited:
Feb 19, 2009
10,457
10
76
@.vodka

You are correct, that isn't a real die shot but a marketing die shot. It looks like someone did a hack job in PowerPoint.
 

jj109

Senior member
Dec 17, 2013
391
59
91
Real die shots look nothing like that.

It was funny watching /r/AMD having a minor freakout though.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
For those who want to believe PR, let me test your logic..

NVIDIA says the 1070 has 64 ROPs. Is this the complete truth?

Look at the diagram and look at the actual test result of Rasterizer performance.



^ If you know anything about NV's architecture layout, you would have quickly realized Rasterizers are within a GPC cluster, if it's cut, bye bye ROPs.

Look at its fillrate performance:



Ohh look at that! Nowhere near the 1080 with its full 64 ROPs. It looks to be missing quite a few, like it's only got 48 usable ROPs.

What a coincidence, each GPC has 16 ROPs, four for the full GP104 equates to 64, 3 for the 1070 equates to 48 ROPs.

Do you trust AMD or NV PR?

Need I remind you, 970 fiasco?

Or recently with AMD, 150W RX 480! lol

1. This is veering off topic, but a 48-ROP gtx1070 (along with the rest of its cut down specs) would have zero chance of catching a gtx1080, but a few AIB OC'd cards are doing just that.

2. Since the GP106 rumors are heavily favoring a 192-bit 48 ROP part with 10 streaming clusters, it completely debunks your logic.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
1. This is veering off topic, but a 48-ROP gtx1070 (along with the rest of its cut down specs) would have zero chance of catching a gtx1080, but a few AIB OC'd cards are doing just that.

2. Since the GP106 rumors are heavily favoring a 192-bit 48 ROP part with 10 streaming clusters, it completely debunks your logic.

Actually... Silverforce11 appears to be right.

Take the clock speed and multiply it by the number of ROPs and you get the theoretical Pixel Fillrate.

Take 1898MHz x 48 = 91 GPixels/s. If you look at the GTX 1070's performance, it is around 85 GPixels/s. In other words... it only has 48 active ROPs. If the GTX 1070 had 64 active ROPs, it would be scoring up to 121 GPixels/s when running at 1898MHz. The GTX 1080 delivers around that performance (taking into account clock throttling), but the GTX 1070 is far, far away from those figures and in line with what one would expect from a 48-ROP GPU.
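The arithmetic above can be sketched in a few lines of Python. This is only the theoretical peak, assuming one pixel per ROP per clock; the function name is made up for illustration and the clock is the 1898MHz figure from the test being discussed.

```python
def pixel_fillrate_gpix(clock_mhz: float, rops: int) -> float:
    """Theoretical pixel fillrate in GPixels/s, assuming one pixel per ROP per clock."""
    return clock_mhz * rops / 1000.0


if __name__ == "__main__":
    for rops in (48, 64):
        rate = pixel_fillrate_gpix(1898, rops)
        print(f"{rops} ROPs @ 1898 MHz: {rate:.1f} GPixels/s")
```

A measured result will always land somewhat below the theoretical number, so the comparison is against the nearest peak rather than an exact match.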

The ROPs on the GTX 1070 are present and are connected to the memory controller but are not being used. Most probably the GTX 1060 depending on how nVIDIA disable it... will not have a true 48 ROPs design either.

nVIDIA are... once again... deceiving their users.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
Actually... Silverforce11 appears to be right.

Take the clock speed and multiply it by the number of ROPs and you get the theoretical Pixel Fillrate.

Take 1898MHz x 48 = 91 GPixels/s. If you look at the GTX 1070's performance, it is around 85 GPixels/s. In other words... it only has 48 active ROPs. If the GTX 1070 had 64 active ROPs, it would be scoring up to 121 GPixels/s when running at 1898MHz. The GTX 1080 delivers around that performance (taking into account clock throttling), but the GTX 1070 is far, far away from those figures and in line with what one would expect from a 48-ROP GPU.

nVIDIA are... once again... deceiving their users.

Nominal boost clock for GTX 1070 is 1693 MHz. And it gives 108 GPixels/second with 64 ROPs.

48 ROPs times 1.693 give 81 GPixels.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
Nominal boost clock for GTX 1070 is 1693 MHz. And it gives 108 GPixels/second with 64 ROPs.

48 ROPs times 1.693 give 81 GPixels.

Which only further proves that it is a 48 ROPs part. In the test above they have set the boost speed to 1898MHz (read the picture posted by Silverforce carefully).
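Turning the formula around gives a rough feel for how much the clock assumption matters. The sketch below divides the roughly 85 GPixels/s measurement quoted earlier by each of the two clocks mentioned in this exchange; the function name is hypothetical, and the result is only an estimate, since real throughput sits below the theoretical peak.

```python
def implied_rops(measured_gpix: float, clock_mhz: float) -> float:
    """ROP count implied by a measured fillrate, assuming one pixel per ROP per clock."""
    return measured_gpix * 1000.0 / clock_mhz


if __name__ == "__main__":
    measured = 85.0  # GPixels/s, the approximate GTX 1070 result quoted in this thread
    for clock in (1898, 1693):
        print(f"@ {clock} MHz: about {implied_rops(measured, clock):.1f} ROPs' worth of throughput")
```

Either clock puts the result much closer to 48 than to 64, but the estimate moves by several ROPs depending on which clock the card actually held during the test.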

I think TechReport's GTX 1070 review (once they post it, along with their "sizing them up" section) may prove enlightening and could confirm this theory posted by Silverforce.
 
Last edited:

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Any memory access tests, like when the 970 turned out to be 3.5GB only?
I think it deserves a thread on its own.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
On the ultra tiny chance this is real, then it means the gtx 1070 also has a 192-bit effective bus and 2gb of gimped vram, correct?

Well it appears to be true... look at the ALU performance (proving it is running at 1898MHz).


2(1898 x 1920) = ~7.3 TFlops and the figure that PCGamesHardware obtained is 6.9 TFlops.

If it were running at a nominal boost clock of 1683MHz, then we would get a theoretical maximum of about 6.5 TFlops, which is not in line with what we are seeing.

We can also perform the calculations for the TitanX posted there... 2(1189 x 3072) = 7.3 TFlops and it obtains 6.8 TFlops. This pretty much confirms that the GTX 1070 is operating (part of the time at least) at its 1898MHz clock (overclock?).
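Here is a minimal Python sketch of the same TFlops calculation, assuming 2 FLOPs per shader per clock (one fused multiply-add). The function name is made up for illustration; the clocks and shader counts are the ones used in this post.

```python
def peak_tflops(clock_mhz: float, shaders: int) -> float:
    """Theoretical FP32 throughput in TFlops, assuming 2 FLOPs per shader per clock."""
    return 2 * clock_mhz * shaders / 1e6


if __name__ == "__main__":
    cases = [
        ("GTX 1070 @ 1898 MHz", 1898, 1920),
        ("GTX 1070 @ 1683 MHz", 1683, 1920),
        ("Titan X  @ 1189 MHz", 1189, 3072),
    ]
    for label, clock, shaders in cases:
        print(f"{label}: {peak_tflops(clock, shaders):.2f} TFlops")
```

Since a measured ALU result cannot exceed the theoretical peak, a 6.9 TFlops measurement rules out the 1683MHz case and fits the 1898MHz one.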

In other words... the GTX 1070 does really appear to only have 48 active ROPs.


I agree... this needs a new thread.
 
Last edited:

Mahigan

Senior member
Aug 22, 2015
573
0
0
On the ultra tiny chance this is real, then it means the gtx 1070 also has a 192-bit effective bus and 2gb of gimped vram, correct?

Potentially? Yes. It all depends on how nVIDIA cut down the GTX 1070. We do not have this information, so we would need someone to test it.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,359
5,017
136
Going back to the point of PR not being believable...

Keep in mind that AMD was saying that Polaris 10 (RX 480) was a 110W part. Then they "clarified" that it was chip only, and that the RX 470 would be closer to that overall. The saying "trust, but verify" applies here as well as in other cases (the 48 ROP piece being interesting, since all the reviews were parroting the same 64 ROP line with no verification).

AMD has at least made it so that some GPUs are unlockable to a higher tier of performance... e.g. Hawaii (290) and Fiji chips (Fury). In the case of the RX 480 it seems that early 4GB models are actually physically using the same 8Gb chips (x8) as the 8GB models - it's only a matter of time before they get unlocked, and I get my double memory for $40 less per card

I wouldn't be surprised to see a Polaris GPU with > 2304 SPs, but it'd likely be the RX 485. Global Foundries would have to get their process tuned up to par since it's pretty apparent they've got plenty of room for improvement, judging from the less than spectacular power efficiency of Polaris 10.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
Potentially? Yes. It all depends on how nVIDIA cut down the GTX 1070. We do not have this information so we would need for someone to test it.

Amazing if true... a 192 GB/s bandwidth GTX 1070 destroying the 256 GB/s bandwidth RX 480 by 50%. The GTX 1070 having worse specs on paper makes it look all the more impressive in comparison.
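For reference, the two bandwidth figures follow from bus width times per-pin data rate. This is a minimal sketch assuming an 8 Gbps effective GDDR5 data rate on both cards, which the thread itself does not state; the function and constant names are made up for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8.0


GDDR5_RATE_GBPS = 8.0  # assumption: effective per-pin data rate, not given in the thread

if __name__ == "__main__":
    print("256-bit bus:", memory_bandwidth_gbs(256, GDDR5_RATE_GBPS), "GB/s")
    print("192-bit bus (hypothetical cut-down):", memory_bandwidth_gbs(192, GDDR5_RATE_GBPS), "GB/s")
```

The 192-bit case is the hypothetical from the quoted question, not a confirmed spec.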
 
Last edited:

Mahigan

Senior member
Aug 22, 2015
573
0
0
Amazing if true... a 192 GB/s bandwidth GTX 1070 destroying the 256 GB/s bandwidth RX 480 by 50%. The GTX 1070 having worse specs on paper makes it look all the more impressive in comparison.

The clock speed and additional ROPs are the most likely culprits.

Some dude just overclocked a reference RX 480 to 1.5GHz..


He obtained near Fury-X performance. Add another 200MHz or so and you should be near GTX 980 Ti performance... with only 32 ROPs.

So it is not that impressive. What is most impressive is TSMC's process vs GloFo's.

I am looking forward to seeing where AIB graphics cards end up, overclocking-wise, on the RX 480.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Amazing if true... a 192 GB/s bandwidth GTX 1070 destroying the 256 GB/s bandwidth RX 480 by 50%. The GTX 1070 having worse specs on paper makes it look all the more impressive in comparison.

facepalm
 
Feb 19, 2009
10,457
10
76
On the ultra tiny chance this is real, then it means the gtx 1070 also has a 192-bit effective bus and 2gb of gimped vram, correct?

It's real if the fillrate test is accurate, which it is for other GPUs. The rasterizer throughput gap between the 1070 and the 1080 is massive.

However, Pascal may have a different memory subsystem or crossbar, we can't be certain how the memory controller is affected with the cut.

It could well be the full 8GB.. need some reliable testing. I'm sure the geeks at beyond3D forums will have a look into this.

Either way, even if it's segmented, it doesn't make any difference, since 6/7 GB is plenty and the 1070 performs great already.
 

Glo.

Diamond Member
Apr 25, 2015
5,761
4,666
136
The clock speed and additional ROPs are the most likely culprits.

Some dude just overclocked a reference RX 480 to 1.5GHz..


He obtained near Fury-X performance. Add another 200MHz or so and you should be near GTX 980 Ti performance... with only 32 ROPs.

So it is not that impressive. What is most impressive is TSMC's process vs GloFo's.

I am looking forward to seeing where AIB graphics cards end up, overclocking-wise, on the RX 480.

The biggest, and really the only, bottleneck for the Polaris architecture is memory bandwidth. The scheduler does a really good job handling it, but the lack of memory bandwidth completely erases its capabilities.

I was theorizing lately that porting Fury X to the 14 nm GloFo process would make the GPU run at around 140W, and that the new scheduler would actually make the GPU perform at GTX 1080 levels (at the Fury X core clock of 1050 MHz).

Hawaii XT has 68.75% of the GCN cores of Fiji XT. Fiji should be 30% faster than Hawaii. But is it? Every DX12 game shows that Fiji is only 15% faster. It is due to the scheduler, taken directly from Hawaii. If it were based on a similar (not the same) architecture as the one in Polaris, it would already be at GTX 1080 levels in DX12 titles.

As for RX 480/Polaris 10: give the GPU GDDR5X or HBM and it will be a completely different story.
 

omek

Member
Nov 18, 2007
137
0
0
The undervolting results show a lot of potential headroom.
https://www.youtube.com/watch?v=L00yplZVDhQ
undervolted from 1060mV to 1000

http://www.legitreviews.com/amd-radeon-rx-480-undervolting-performance_183699
1137 to 1050, 87mV drop and the performance increases in both instances, -60mV and -87mV (without downclocking mind you).

Power draw decrease isn't as linear as it should be at this point either.
IMO the overall power target and voltage threshold is being overshot by quite a bit and when the power target is dropped the 480 gains performance and also clock consistency.
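As a rough yardstick for what the undervolts above should save, here is a minimal Python sketch using the textbook first-order model where dynamic power scales with voltage squared times frequency. That model is an assumption brought in for illustration (it ignores leakage and board power) and the function name is hypothetical; the voltages are the ones from the two links.

```python
def relative_dynamic_power(v_new_mv: float, v_old_mv: float, freq_ratio: float = 1.0) -> float:
    """Dynamic power relative to baseline under the first-order P ~ V^2 * f model."""
    return (v_new_mv / v_old_mv) ** 2 * freq_ratio


if __name__ == "__main__":
    for v_old, v_new in [(1060, 1000), (1137, 1050)]:
        ratio = relative_dynamic_power(v_new, v_old)
        print(f"{v_old} mV -> {v_new} mV: about {ratio:.1%} of baseline dynamic power")
```

Under that model, even at unchanged clocks, the -60mV and -87mV drops would cut dynamic power by roughly 11% and 15% respectively, which gives something to compare the measured power draw against.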



When overclocking and overvolting results in dropped performance, there are power delivery problems. The 480 should see lots of potential for overclocking as the process matures and AIB partners feed these things more power. Not just that, but they should actually show us the potential of a full stock P10, because at the moment it's being stifled.
 
Last edited:

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
I love all the team members coming here and claiming "there is no process problem." Really? Think about it for a second. This is the first dGPU ever produced by this foundry vs a foundry that has produced almost every single dGPU ever made.

Of course TSMC is going to be better on GPUs... or do you think 15+ years of making GPUs doesn't give them an edge?

There is no direct evidence to support either theory so we have to resort to basic logic.
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
Going back to the point of PR not being believable...

Keep in mind that AMD was saying that Polaris 10 (RX 480) was a 110W part. Then they "clarified" that it was chip only, and that the RX 470 would be closer to that overall. The saying "trust, but verify" applies here as well as in other cases (the 48 ROP piece being interesting, since all the reviews were parroting the same 64 ROP line with no verification).

AMD has at least made it so that some GPUs are unlockable to a higher tier of performance... e.g. Hawaii (290) and Fiji chips (Fury). In the case of the RX 480 it seems that early 4GB models are actually physically using the same 8Gb chips (x8) as the 8GB models - it's only a matter of time before they get unlocked, and I get my double memory for $40 less per card

I wouldn't be surprised to see a Polaris GPU with > 2304 SPs, but it'd likely be the RX 485. Global Foundries would have to get their process tuned up to par since it's pretty apparent they've got plenty of room for improvement, judging from the less than spectacular power efficiency of Polaris 10.

My rx480 stays within that range in all the games I've played: 110-120W according to GPU-Z. This is with a slight overclock to 1320MHz.

Also I see people equating rx480's efficiency, or lack thereof, to a bad process at GloFo. The 14nm process at GloFo is actually quite good. For one, the yields must be excellent: with rx480 being a full part, they've shipped a lot of GPUs. GCN is just not as streamlined for graphical workloads as Maxwell/Pascal are. If you compare the efficiency of R9 380 to rx480 you'll notice an almost 200% gain.

GCN has features which make it inherently less efficient at purely graphical tasks which is most of DX11 titles. It's the difference that's been there since the 1st GCN card so I am not sure why blame the process now all of a sudden.

Obviously AMD has decided to stay the course in regards to compute resources on their GPUs. Which is understandable.. Apple, Sony and Microsoft are all utilizing those, and that's the majority of AMD's revenues. DX12 and Vulkan adoption is also inevitable.

IMHO rx480 is quite a competitive GPU despite the fact that it isn't a one trick pony like Maxwell and Pascal to some degree.
 
Last edited:

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
GloFo sinks another AMD ship. They have got to get Zen/Vega away from them ASAP.
Are you aware that Fury X couldn't OC much and that was on TSMC? Fury X has 4096 stream processors, while this thing only has 2300. The fact that it's even coming close to Fury X is a testament to 14nm being quite good.
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136