AMD's next GPU uarch is called "Polaris"


sontin

Diamond Member
Sep 12, 2011
3,273
149
106
I didn't say otherwise. I just replied to monstercameron's post, saying that checking for nVidia hardware and disabling Async Compute is exactly what Oxide said they did, as they couldn't rely on the driver's reporting being accurate.

What? DX12 is a low-level API. The developer is responsible for making it work. nVidia fully supports Async Compute.
 

MrTeal

Diamond Member
Dec 7, 2003
3,586
1,747
136
What? DX12 is a low-level API. The developer is responsible for making it work. nVidia fully supports Async Compute.

Again...
Kollock @ Oxide said:


also



I didn't write the game or the driver. I'm just going by what an Oxide developer has said happened with Async Compute in this case. Keep in mind that this happened last summer, and their driver very well might have Async Compute fully implemented by now. Really, this whole discussion should be forked into a new thread.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Look in the beyond3d.com thread. nVidia hardware supports Async Compute. It works for the application without problems.

Oxide wrote their application for AMD, and their implementation exposed a flaw in nVidia's implementation. But this wasn't nVidia's fault because, as you can see in the beyond3d.com thread, Oxide didn't use a real graphics + compute implementation. They used only the graphics pipeline and launched graphics and compute commands within that one pipeline.
 
Last edited:

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Look in the beyond3d.com thread. nVidia hardware supports Async Compute. It works for the application without problems.

Oxide wrote their application for AMD, and their implementation exposed a flaw in nVidia's implementation. But this wasn't nVidia's fault because, as you can see in the beyond3d.com thread, Oxide didn't use a real graphics + compute implementation. They used only the graphics pipeline and launched graphics and compute commands within that one pipeline.

Excuses ...

That is a fault on Nvidia's part, and if you don't like that, then you can take it up with Microsoft instead of pinning the blame on Oxide/AMD, because the D3D12 spec is controlled by Microsoft ...

As for that last bit: if that were true, then by definition Oxide isn't using Multi-Engine, and hence there is no possible concurrent execution of compute shaders, which renders your initial statement about Oxide's implementation flaw void ...
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
So the whole lineup will be on 14 nm.

That changes a lot for Polaris/GCN4.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Excuses ...
That is a fault on Nvidia's part and if you don't like that then you can take it up with Microsoft instead of pinning the blame on Oxide/AMD because the D3D12 spec is controlled by Microsoft ...

So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

As for that last bit, if that were true then by definition Oxide isn't using Multi-Engine and hence no possible concurrent execution of compute shaders which renders your initial statement about Oxide's implementation flaw void ...
No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...



No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.
He never said Nvidia can't do compute + graphics, just not async compute.

Just asking, are you trying to get this thread locked?
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
There is no question that Nvidia GPUs can use Asynchronous Compute. The problem is how it affects the performance of NV GPUs. But for that we will need to wait and see.

P.S. It is off topic.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

Define "without problems". Nvidia still hasn't shown that it can run compute shaders while the 3D engine is busy rasterizing a G-buffer, building shadow maps, etc ...

Oxide isn't using any trick to purposely optimize for AMD, like driver extensions or such; what they are doing IS part of the D3D12 spec ...

If you want to throw a fit about the less-than-ideal performance on your Nvidia GPU, then direct your frustration towards Microsoft, or towards Nvidia themselves for not delivering as good an implementation of multiple queues as AMD's ...

No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.

Guess what the conclusion is? They have yet to find that AMD's competitor can execute the 3D engine + compute engine concurrently ...
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
So, why was the guy over at beyond3d.com able to write something with Async Compute which runs without problems on nVidia hardware?

The software from B3D ran async compute without problems on Nvidia hardware in the same way that multithreaded software can run without problems on a single-core CPU, i.e. non-concurrently.
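That single-core analogy can be put into a toy timing model. This is purely illustrative: the unit costs and the switch penalty are made-up numbers, not measured behavior of any real GPU; the point is only the difference between "accepts async submissions but serializes them" and "genuinely overlaps them".

```python
# Toy model: total time for one graphics task plus one compute task.
# All costs are arbitrary illustrative units, not real GPU timings.

def serialized_time(graphics_cost, compute_cost, switch_penalty):
    """Async submissions are accepted, but work runs back-to-back
    with a context-switch cost in between (no real overlap)."""
    return graphics_cost + switch_penalty + compute_cost

def concurrent_time(graphics_cost, compute_cost):
    """Compute work fills idle shader time while graphics runs,
    so total time is bounded by the longer of the two tasks."""
    return max(graphics_cost, compute_cost)

g, c, penalty = 10, 4, 1
print(serialized_time(g, c, penalty))  # 15 units: "runs without problems", no overlap
print(concurrent_time(g, c))           # 10 units: genuine concurrent execution
```

Both models "support" async compute in the sense that the submission succeeds; only the second one actually gains performance from it, which is the distinction the thread is arguing about.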
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

I doubt the beyond3d tester created a test that actually relied on the async compute task having the good context-switching behavior that games need in order to respond to game events. Please link it if they did run metrics on the non-async tasks' responsiveness during a mixed workload.

John Carmack, who generally prefers Nvidia hardware, even stated AMD's context switching is way ahead right now when discussing his VR work.

On topic: so either it's all 14nm, or we get Polaris on 14nm (likely smaller dies) and Arctic Islands on 16nm (likely larger dies). AMD might be a bit slow to launch big dies if they are solely on 14nm; I wonder if they'll do an HBM2 ~300-350mm2 die as a stopgap.
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
Considering AMD is talking about the 950, is it reasonable to think that the full 110 mm2 Polaris is in the performance range of the 950 and will cost roughly the same? I bet the TDP is closer to 60-75 W though, which is still a huge improvement. The bigger die would then maybe be in the 970/390 range?
 

Wall Street

Senior member
Mar 28, 2012
691
44
91
Considering AMD is talking about the 950, is it reasonable to think that the full 110 mm2 Polaris is in the performance range of the 950 and will cost roughly the same? I bet the TDP is closer to 60-75 W though, which is still a huge improvement. The bigger die would then maybe be in the 970/390 range?

AMD said that the price would be similar to the 950. I expect the performance to be closer to equal with the 960.

The demo was apparently running at 850 MHz, and I would expect them to try to push the release version to 1.2-1.5 GHz. My uneducated guess would be that this has 1024 shaders like the 960.
 
Feb 19, 2009
10,457
10
76
I agree with your assessment.

These two consecutive sentences in the quote seem to say it quite clearly.


1) Some publications have reported that Polaris will be a mix of both TSMC 16nm and GlobalFoundries 14nm GPUs, which is where some of the confusion could potentially have stemmed from.


2) However, according to AMD, Polaris is only 14nm.
The only other possible option is that the name Polaris is for this specific die, and I don't think the code name is that narrow.


As to the multi die on interposer approach, join the fun. Some of us have been speculating about this quite attractive possibility. This year will be great for graphics.

It can't be just a name for that specific chip.

Because AMD's official Polaris video talks about what Polaris is: an "architecture ecosystem" that incorporates 4th-gen GCN along with other tech.

Specific chip names would be the Arctic Islands series.

AMD will have to pay GloFo for capacity regardless of whether they use it or not, so they had best work with what they've got. If GloFo can't do big dies, then make multiple small dies and link them with an interposer.
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
Did anyone see the difference?
https://www.youtube.com/watch?v=oNA6fll2DDQ

The left panel's colors are sharper and more saturated (GTX 950), whereas the right panel's colors are totally washed out, and that's the one equipped with AMD's next gen.

They're TN panels, and the monitor on the left is more face-on to the camera; that's what you're seeing. Look here
 
Last edited:

Nachtmaer

Junior Member
Oct 26, 2014
11
2
81
My uneducated guess would be that this has 1024 shaders like the 960.

Shouldn't it be more than that? Bonaire is already 160mm² with 12 CUs/896 shaders, and that's on 28nm. I know that die sizes don't scale linearly with shader count or with the space savings from a smaller node, but something doesn't add up here.

I guess the main problem is that this showcase was just to prove that this chip can keep up with a 950 under the same conditions at a lower power draw, which makes it hard to guess its true potential. For all we know GM206 is still the faster chip.

That's just my uneducated guess, so feel free to point out what I'm missing.
 

Wall Street

Senior member
Mar 28, 2012
691
44
91
Shouldn't it be more than that? Bonaire is already 160mm² and has 12CUs/896 shaders and that's on 28nm. I know that die sizes don't scale linearly with the shader count or with the space savings from a smaller node, but something doesn't add up here.

I think that you are correct. I will revise my estimate to 1280 shaders. Pitcairn was 212 mm2 with 1280 shaders, giving a 45-50% scaling factor. We have to see if their memory compression and enhanced memory clock speed can give what I presume to be a 128-bit bus enough throughput to keep that many shaders fed. It makes me wonder if they are going to scale up the clock speed or actually release this generation of products at under 1000 MHz. I guess I am speculating too much though from the few things we know.
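The back-of-the-envelope math behind that revision can be written out explicitly. Everything here is the thread's own speculation: the 212 mm2 Pitcairn figure and the assumed 45-50% area scaling from 28nm to 14nm are estimates, not official numbers.

```python
# Rough die-area scaling estimate using the thread's own figures (speculative).
pitcairn_area_28nm = 212.0              # mm², 1280 shaders at 28nm
scaling_low, scaling_high = 0.45, 0.50  # assumed 14nm area as a fraction of 28nm area

area_low = pitcairn_area_28nm * scaling_low
area_high = pitcairn_area_28nm * scaling_high
print(f"1280 shaders at 14nm: ~{area_low:.0f}-{area_high:.0f} mm²")
# Lands near the rumored ~110 mm² Polaris die, which is why 1280 shaders is plausible.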
 

Nachtmaer

Junior Member
Oct 26, 2014
11
2
81
I think that you are correct. I will revise my estimate to 1280 shaders. Pitcairn was 212 mm2 with 1280 shaders, giving a 45-50% scaling factor. We have to see if their memory compression and enhanced memory clock speed can give what I presume to be a 128-bit bus enough throughput to keep that many shaders fed. It makes me wonder if they are going to scale up the clock speed or actually release this generation of products at under 1000 MHz. I guess I am speculating too much though from the few things we know.

Yeah, that's pretty much what I was thinking too. Something along the lines of a "redone" Pitcairn seems very likely to me.

I still wonder what AMD will do in the mid to high range and what happened to that mysterious third chip that was rumored. People mentioned before that a die in the 300-350mm² range should be able to outperform Fiji/GM200 enough for it to be top of the line until the really big dies are ready (whenever that may be). I can see them do something similar to what Nvidia has done in recent years with Gx104.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
Isn't 14 nm a bit denser than that? If 160mm² fits an 896-core GCN GPU, then at that die size on 14 nm it would be at least a 1792-core GCN GPU. But we aren't taking into account how much denser the architecture itself is. Maybe that can also impact the density/die size of the chip itself?

Tonga with 2048 GCN cores is 360mm². Divide the die size by 2 (at least) and we get into the range of 180mm² on 14nm. But we still aren't taking into account the density of the process itself, and how the new architecture affects density and die size.

https://www.semiwiki.com/forum/content/3884-who-will-lead-10nm.html On the bottom are density comparisons. If I'm reading this correctly, TSMC's 28 nm process does not divide exactly by 2 into the GloFo process. It goes more like 2.2-2.3.

So the die sizes will be different on GloFo nodes.
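Running that density arithmetic through explicitly (all speculative: the 2.2x factor is one reading of the SemiWiki comparison, and architectural changes would shift these numbers further):

```python
# Re-running the density arithmetic from the post above (speculative).
density_factor = 2.2    # assumed 28nm TSMC -> 14nm GloFo/Samsung density gain

tonga_area = 360.0      # mm² at 28nm, 2048 GCN cores
print(f"Tonga-class (2048 cores) at 14nm: ~{tonga_area / density_factor:.0f} mm²")

bonaire_area = 160.0    # mm² at 28nm, 896 GCN cores
print(f"Bonaire-class (896 cores) at 14nm: ~{bonaire_area / density_factor:.0f} mm²")
```

With a 2.2x factor, a Tonga-sized design comes out around 164 mm² and a Bonaire-sized one around 73 mm², which is why a ~110 mm² die could plausibly land somewhere between those two shader counts.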
 
Last edited:

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Isn't 14 nm a bit denser than that? If 160mm² fits an 896-core GCN GPU, then at that die size on 14 nm it would be at least a 1792-core GCN GPU. But we aren't taking into account how much denser the architecture itself is. Maybe that can also impact the density/die size of the chip itself?

Tonga with 2048 GCN cores is 360mm². Divide the die size by 2 (at least) and we get into the range of 180mm² on 14nm. But we still aren't taking into account the density of the process itself, and how the new architecture affects density and die size.

https://www.semiwiki.com/forum/content/3884-who-will-lead-10nm.html On the bottom are density comparisons. If I'm reading this correctly, TSMC's 28 nm process does not divide exactly by 2 into the GloFo process. It goes more like 2.2-2.3.

So the die sizes will be different on GloFo nodes.

One of the factors to consider is that even though logic shrinks a full 50% from 28nm to 16/14nm, I/O does not; the PCI-E and memory controllers will shrink less. AMD is likely to go for a highly clocked 128-bit memory bus with improved memory compression and bandwidth efficiency for the 110 mm sq die. A smaller memory bus helps save power and lowers PCB cost. To help offset the loss of raw bandwidth, AMD might increase the L2 cache, similar to what Nvidia did with Maxwell (compared to Kepler). AMD needs low power and low cost to compete against Pascal's smallest GPU, which will be around 100-110 sq mm. AMD needs to try to get as close as possible to Nvidia in terms of perf/SP vs perf/CC, perf/watt and perf/sq mm.

AMD has made significant changes to the command processor, geometry processor, GCN shaders, L2 cache and memory controller. New features such as the primitive discard accelerator could improve power efficiency at the cost of a slight die-size increase. We need to see how the end products fare. AMD has failed against Maxwell and thus faces an uphill battle. Nvidia has plundered market share and mindshare. AMD needs to be competitive across the stack in terms of perf/watt, perf/sq mm and perf/$ to gain back market share.
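The "I/O doesn't shrink" point can be illustrated with a mixed-scaling estimate. The area split and the per-component scaling factors below are assumptions chosen for illustration only, not measured values for any real die:

```python
# Illustrative mixed-scaling estimate: logic shrinks well from 28nm to 14nm,
# while I/O (PHYs, memory controllers, analog) barely shrinks at all.
# All fractions and factors are assumptions, not data for any actual GPU.
logic_fraction = 0.75   # assumed share of a 28nm die that is logic/SRAM
io_fraction = 0.25      # assumed share that is I/O and analog
logic_scale = 0.50      # logic area at 14nm relative to 28nm
io_scale = 0.90         # I/O area shrinks only slightly

def shrunk_area(area_28nm_mm2):
    """Weighted shrink: each portion of the die scales by its own factor."""
    return area_28nm_mm2 * (logic_fraction * logic_scale + io_fraction * io_scale)

print(f"212 mm² 28nm die -> ~{shrunk_area(212):.0f} mm² at 14nm")
```

Under these assumptions a 212 mm² die lands around 127 mm² rather than the ~106 mm² a pure 50% shrink would suggest, which is one reason to expect a narrower memory bus (less I/O area) on a small 14nm part.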
 
Last edited:

KaRLiToS

Golden Member
Jul 30, 2010
1,918
11
81
One of the factors to consider is that even though logic shrinks a full 50% from 28nm to 16/14nm, I/O does not; the PCI-E and memory controllers will shrink less. AMD is likely to go for a highly clocked 128-bit memory bus with improved memory compression and bandwidth efficiency for the 110 mm sq die. A smaller memory bus helps save power and lowers PCB cost. To help offset the loss of raw bandwidth, AMD might increase the L2 cache, similar to what Nvidia did with Maxwell (compared to Kepler). AMD needs low power and low cost to compete against Pascal's smallest GPU, which will be around 100-110 sq mm. AMD needs to try to get as close as possible to Nvidia in terms of perf/SP vs perf/CC, perf/watt and perf/sq mm.

AMD has made significant changes to the command processor, geometry processor, GCN shaders, L2 cache and memory controller. New features such as the primitive discard accelerator could improve power efficiency at the cost of a slight die-size increase. We need to see how the end products fare. AMD has failed against Maxwell and thus faces an uphill battle. Nvidia has plundered market share and mindshare. AMD needs to be competitive across the stack in terms of perf/watt, perf/sq mm and perf/$ to gain back market share.


+50 Rep to you.
I always like to read your posts when it comes to GPU predictions.
 

DeathReborn

Platinum Member
Oct 11, 2005
2,757
752
136
I think it'll probably be a 1536-shader part, maybe cut down to 1408 for the demo/yield purposes. With GCN, AMD nearly always needs more shaders than the competition to compensate for clock speed among other things; I can't really see that changing with Polaris.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Something else I've not seen mentioned is that every previous GPU from both AMD and Nvidia used TSMC.

This generation, in addition to the 28>14nm scaling, Samsung has a further 8-10% increase in density vs TSMC.

edit: Missed Glo.'s post; it seems more like a 10-15% density increase at Samsung vs TSMC.
 
Last edited:
Feb 19, 2009
10,457
10
76
Something else I've not seen mentioned is that every previous GPU from both AMD and Nvidia used TSMC.

This generation, in addition to the 28>14nm scaling, Samsung has a further 8-10% increase in density vs TSMC.

edit: Missed Glo.'s post; it seems more like a 10-15% density increase at Samsung vs TSMC.

Density gives better perf/mm2, but traditionally it has led to worse perf/W due to leakage.

Now, if the FinFET transistors can prevent much of that leakage (as is frequently advertised), then the density advantage should lead to better perf/W and perf/mm2.
 