AMD's next GPU uarch is called "Polaris"


sontin

Diamond Member
Sep 12, 2011
3,273
149
106
I didn't say otherwise. I just replied to monstercameron's post, saying that checking for nVidia hardware and disabling Async Compute is exactly what Oxide said they did, as they couldn't rely on the driver's reporting being accurate.

What? DX12 is a low-level API. The developer is responsible for making it work. nVidia fully supports Async Compute.
 

MrTeal

Diamond Member
Dec 7, 2003
3,586
1,747
136
What? DX12 is a low-level API. The developer is responsible for making it work. nVidia fully supports Async Compute.

Again...
Kollock @ Oxide said:


also



I didn't write the game or the driver. I'm just going by what an Oxide developer has said happened with Async Compute in this case. Keep in mind that this happened last summer, and their driver very well might have Async Compute fully implemented by now. Really, this whole discussion should be forked into a new thread.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Look in the beyond3d.com thread. nVidia hardware supports Async Compute. It works for the application without problems.

Oxide wrote their application for AMD, and their implementation exposed a flaw in nVidia's implementation. But this wasn't nVidia's fault because, as you can see in the beyond3d.com thread, Oxide didn't use a real graphics + compute implementation. They used only the graphics pipeline and launched graphics and compute commands within that one pipeline.
 
Last edited:

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Look in the beyond3d.com thread. nVidia hardware supports Async Compute. It works for the application without problems.

Oxide wrote their application for AMD, and their implementation exposed a flaw in nVidia's implementation. But this wasn't nVidia's fault because, as you can see in the beyond3d.com thread, Oxide didn't use a real graphics + compute implementation. They used only the graphics pipeline and launched graphics and compute commands within that one pipeline.

Excuses ...

That is a fault on Nvidia's part, and if you don't like that, then you can take it up with Microsoft instead of pinning the blame on Oxide/AMD, because the D3D12 spec is controlled by Microsoft ...

As for that last bit: if that were true, then by definition Oxide isn't using Multi-Engine, and hence there is no possible concurrent execution of compute shaders, which renders your initial statement about Oxide's implementation flaw void ...
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
So the whole lineup will be on 14 nm.

That changes a lot for Polaris/GCN4.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Excuses ...
That is a fault on Nvidia's part and if you don't like that then you can take it up with Microsoft instead of pinning the blame on Oxide/AMD because the D3D12 spec is controlled by Microsoft ...

So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

As for that last bit, if that were true then by definition Oxide isn't using Multi-Engine and hence no possible concurrent execution of compute shaders which renders your initial statement about Oxide's implementation flaw void ...
No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...



No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.
He never said Nvidia can't do compute + graphics, just not async compute.

Just asking, are you trying to get this thread locked?
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
There is no question that Nvidia GPUs can use Asynchronous Compute. The problem is how it affects the performance of NV GPUs. But for that we will need to wait and see.

P.S. It is off topic.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

Define "without problems". Nvidia still hasn't shown that it can run compute shaders while the 3D engine is busy rasterizing a G-buffer, building shadow maps, etc ...

Oxide isn't using any trick to purposely optimize for AMD, like driver extensions or such; what they are doing IS part of the D3D12 spec ...

If you want to throw a fit about the less-than-ideal performance on your Nvidia GPU, then direct your frustration towards Microsoft, or towards Nvidia themselves for not delivering as good an implementation of multiple queues as AMD's ...

No, read the beyond3d.com thread. AMD is able to fetch different commands from the graphics pipeline without a context switch or penalty. nVidia can't do this, so every time a compute command is launched the hardware incurs a penalty. On the other hand, when these commands are placed in the compute pipeline, the driver is able to fetch them and schedule them after the graphics commands from the graphics pipeline without a penalty. They are not executed concurrently, but 32 compute command kernels run in parallel.

Guess what the conclusion is? They have yet to find that AMD's competitor can execute the 3D engine + compute engine concurrently ...
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
So, why was the guy over at beyond3d.com able to write something with Async Compute which runs without problems on nVidia hardware?

The software from B3D ran async compute without problems on Nvidia hardware in the same way that multithreaded software can run without problems on a single-core CPU, i.e. non-concurrently.
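That single-core analogy can be put into a toy timing model. This is purely illustrative: the unit costs and the switch penalty are made-up numbers, not measured behavior of any real GPU; the point is only the difference between "accepts async submissions but serializes them" and "genuinely overlaps them".

```python
# Toy model: total time for one graphics task plus one compute task.
# All costs are arbitrary illustrative units, not real GPU timings.

def serialized_time(graphics_cost, compute_cost, switch_penalty):
    """Async submissions are accepted, but work runs back-to-back
    with a context-switch cost in between (no real overlap)."""
    return graphics_cost + switch_penalty + compute_cost

def concurrent_time(graphics_cost, compute_cost):
    """Compute work fills idle shader time while graphics runs,
    so total time is bounded by the longer of the two tasks."""
    return max(graphics_cost, compute_cost)

g, c, penalty = 10, 4, 1
print(serialized_time(g, c, penalty))  # 15 units: "runs without problems", no overlap
print(concurrent_time(g, c))           # 10 units: genuine concurrent execution
```

Both models "support" async compute in the sense that the submission succeeds; only the second one actually gains performance from it, which is the distinction the thread is arguing about.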
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
So, why was the guy over at beyond3d.com able to write something with Async Compute that runs without problems on nVidia hardware?
So either there is more than one "D3D12 spec controlled by Microsoft", or Oxide went with an implementation optimized for AMD only...

I doubt the beyond3d tester created a test that actually relied on the async compute task having the good context-switching behavior that games need in order to respond to game events. Please link it if they did run metrics on the non-async tasks' responsiveness during a mixed workload.

John Carmack, who generally prefers Nvidia hardware, even stated AMD's context switching is way ahead right now when discussing his VR work.

On topic: so either it's all 14nm, or we get Polaris on 14nm (likely smaller dies) and Arctic Islands on 16nm (likely larger dies). AMD might be a bit slow to launch big dies if they are solely on 14nm; I wonder if they'll do an HBM2 ~300-350mm2 die as a stopgap.
 
Last edited:

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
Considering AMD is talking about the 950, is it reasonable to think that the full 110 mm2 Polaris is in the performance range of the 950 and will cost roughly the same? I bet the TDP is closer to 60-75 W though, which is still a huge improvement. The bigger die would then maybe be in the 970/390 range?
 

Wall Street

Senior member
Mar 28, 2012
691
44
91
Considering AMD is talking about the 950, is it reasonable to think that the full 110 mm2 Polaris is in the performance range of the 950 and will cost roughly the same? I bet the TDP is closer to 60-75 W though, which is still a huge improvement. The bigger die would then maybe be in the 970/390 range?

AMD said that the price would be similar to the 950. I expect the performance to be closer to equal with the 960.

The demo was apparently running at 850 MHz, and I would expect them to try to push the release version to 1.2-1.5 GHz. My uneducated guess would be that this has 1024 shaders like the 960.
 
Feb 19, 2009
10,457
10
76
I agree with your assessment.

These two consecutive sentences in the quote seem to say it quite clearly.


1) Some publications have reported that Polaris will be a mix of both TSMC 16nm and GlobalFoundries 14nm GPUs, which is where some of the confusion could potentially have stemmed from.


2) However, according to AMD, Polaris is only 14nm.
The only other possible option is that the name Polaris is for this specific die, and I don't think the code name is that narrow.


As to the multi die on interposer approach, join the fun. Some of us have been speculating about this quite attractive possibility. This year will be great for graphics.

It can't be just a name for that specific chip.

Because AMD's official Polaris video talks about what Polaris is: an "architecture ecosystem" that incorporates 4th-gen GCN along with other tech.

Specific chip names would be the Arctic Islands series.

AMD will have to pay GloFo for capacity regardless of whether they use it or not, so they had best work with what they've got. If GloFo can't do big dies, then make multiple small dies and link them with an interposer.
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
Did anyone see the difference?
https://www.youtube.com/watch?v=oNA6fll2DDQ

The left panel's colors are sharper and more saturated (GTX 950), whereas the right panel's colors are totally washed out, and that's the one equipped with AMD's next gen.

They're TN panels, and the monitor on the left is more face-on to the camera; that's what you're seeing. Look here
 
Last edited:

Nachtmaer

Junior Member
Oct 26, 2014
11
2
81
My uneducated guess would be that this has 1024 shaders like the 960.

Shouldn't it be more than that? Bonaire is already 160mm² with 12 CUs/896 shaders, and that's on 28nm. I know that die sizes don't scale linearly with shader count or with the space savings from a smaller node, but something doesn't add up here.

I guess the main problem is that this showcase was just to prove that this chip can keep up with a 950 under the same conditions at a lower power draw, which makes it hard to guess its true potential. For all we know GM206 is still the faster chip.

That's just my uneducated guess, so feel free to point out what I'm missing.
 

Wall Street

Senior member
Mar 28, 2012
691
44
91
Shouldn't it be more than that? Bonaire is already 160mm² and has 12CUs/896 shaders and that's on 28nm. I know that die sizes don't scale linearly with the shader count or with the space savings from a smaller node, but something doesn't add up here.

I think that you are correct. I will revise my estimate to 1280 shaders. Pitcairn was 212 mm2 with 1280 shaders, giving a 45-50% scaling factor. We have to see if their memory compression and enhanced memory clock speed can give what I presume to be a 128-bit bus enough throughput to keep that many shaders fed. It makes me wonder if they are going to scale up the clock speed or actually release this generation of products at under 1000 MHz. I guess I am speculating too much though from the few things we know.
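The back-of-the-envelope math behind that revision can be written out explicitly. Everything here is the thread's own speculation: the 212 mm2 Pitcairn figure and the assumed 45-50% area scaling from 28nm to 14nm are estimates, not official numbers.

```python
# Rough die-area scaling estimate using the thread's own figures (speculative).
pitcairn_area_28nm = 212.0              # mm², 1280 shaders at 28nm
scaling_low, scaling_high = 0.45, 0.50  # assumed 14nm area as a fraction of 28nm area

area_low = pitcairn_area_28nm * scaling_low
area_high = pitcairn_area_28nm * scaling_high
print(f"1280 shaders at 14nm: ~{area_low:.0f}-{area_high:.0f} mm²")
# Lands near the rumored ~110 mm² Polaris die, which is why 1280 shaders is plausible.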
 

Nachtmaer

Junior Member
Oct 26, 2014
11
2
81
I think that you are correct. I will revise my estimate to 1280 shaders. Pitcairn was 212 mm2 with 1280 shaders, giving a 45-50% scaling factor. We have to see if their memory compression and enhanced memory clock speed can give what I presume to be a 128-bit bus enough throughput to keep that many shaders fed. It makes me wonder if they are going to scale up the clock speed or actually release this generation of products at under 1000 MHz. I guess I am speculating too much though from the few things we know.

Yeah, that's pretty much what I was thinking too. Something along the lines of a "redone" Pitcairn seems very likely to me.

I still wonder what AMD will do in the mid to high range and what happened to that mysterious third chip that was rumored. People mentioned before that a die in the 300-350mm² range should be able to outperform Fiji/GM200 enough for it to be top of the line until the really big dies are ready (whenever that may be). I can see them do something similar to what Nvidia has done in recent years with Gx104.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
Isn't 14 nm a bit denser than that? If 160mm² fits an 896-core GCN GPU, then at that die size on 14 nm it would be at least a 1792-core GCN GPU. But we aren't taking into account how much denser the architecture itself is. Maybe that can also impact the density/die size of the chip itself?

Tonga with 2048 GCN cores is 360mm². Divide the die size by 2 (at least) and we get into the range of 180mm² on 14nm. But we still aren't taking into account the density of the process itself, and how the new architecture affects density and die size.

https://www.semiwiki.com/forum/content/3884-who-will-lead-10nm.html On the bottom are density comparisons. If I'm reading this correctly, TSMC's 28 nm process does not divide exactly by 2 into the GloFo process. It goes more like 2.2-2.3.

So the die sizes will be different on GloFo nodes.
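Running that density arithmetic through explicitly (all speculative: the 2.2x factor is one reading of the SemiWiki comparison, and architectural changes would shift these numbers further):

```python
# Re-running the density arithmetic from the post above (speculative).
density_factor = 2.2    # assumed 28nm TSMC -> 14nm GloFo/Samsung density gain

tonga_area = 360.0      # mm² at 28nm, 2048 GCN cores
print(f"Tonga-class (2048 cores) at 14nm: ~{tonga_area / density_factor:.0f} mm²")

bonaire_area = 160.0    # mm² at 28nm, 896 GCN cores
print(f"Bonaire-class (896 cores) at 14nm: ~{bonaire_area / density_factor:.0f} mm²")
```

With a 2.2x factor, a Tonga-sized design comes out around 164 mm² and a Bonaire-sized one around 73 mm², which is why a ~110 mm² die could plausibly land somewhere between those two shader counts.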
 
Last edited:

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Isn't 14 nm a bit denser than that? If 160mm² fits an 896-core GCN GPU, then at that die size on 14 nm it would be at least a 1792-core GCN GPU. But we aren't taking into account how much denser the architecture itself is. Maybe that can also impact the density/die size of the chip itself?

Tonga with 2048 GCN cores is 360mm². Divide the die size by 2 (at least) and we get into the range of 180mm² on 14nm. But we still aren't taking into account the density of the process itself, and how the new architecture affects density and die size.

https://www.semiwiki.com/forum/content/3884-who-will-lead-10nm.html On the bottom are density comparisons. If I'm reading this correctly, TSMC's 28 nm process does not divide exactly by 2 into the GloFo process. It goes more like 2.2-2.3.

So the die sizes will be different on GloFo nodes.

One of the factors to consider is that even though logic shrinks a full 50% from 28nm to 16/14nm, I/O does not; the PCI-E and memory controllers will shrink less. AMD is likely to go for a highly clocked 128-bit memory bus with improved memory compression and bandwidth efficiency for the 110 mm sq die. A smaller memory bus helps save power and lowers PCB cost. To help offset the loss of raw bandwidth, AMD might increase the L2 cache, similar to what Nvidia did with Maxwell (compared to Kepler). AMD needs low power and low cost to compete against Pascal's smallest GPU, which will be around 100-110 sq mm. AMD needs to try to get as close as possible to Nvidia in terms of perf/SP vs perf/CC, perf/watt and perf/sq mm.

AMD has made significant changes to the command processor, geometry processor, GCN shaders, L2 cache and memory controller. New features such as the primitive discard accelerator could improve power efficiency at the cost of a slight die-size increase. We need to see how the end products fare. AMD has failed against Maxwell and thus faces an uphill battle. Nvidia has plundered market share and mindshare. AMD needs to be competitive across the stack in terms of perf/watt, perf/sq mm and perf/$ to gain back market share.
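The "I/O doesn't shrink" point can be illustrated with a mixed-scaling estimate. The area split and the per-component scaling factors below are assumptions chosen for illustration only, not measured values for any real die:

```python
# Illustrative mixed-scaling estimate: logic shrinks well from 28nm to 14nm,
# while I/O (PHYs, memory controllers, analog) barely shrinks at all.
# All fractions and factors are assumptions, not data for any actual GPU.
logic_fraction = 0.75   # assumed share of a 28nm die that is logic/SRAM
io_fraction = 0.25      # assumed share that is I/O and analog
logic_scale = 0.50      # logic area at 14nm relative to 28nm
io_scale = 0.90         # I/O area shrinks only slightly

def shrunk_area(area_28nm_mm2):
    """Weighted shrink: each portion of the die scales by its own factor."""
    return area_28nm_mm2 * (logic_fraction * logic_scale + io_fraction * io_scale)

print(f"212 mm² 28nm die -> ~{shrunk_area(212):.0f} mm² at 14nm")
```

Under these assumptions a 212 mm² die lands around 127 mm² rather than the ~106 mm² a pure 50% shrink would suggest, which is one reason to expect a narrower memory bus (less I/O area) on a small 14nm part.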
 
Last edited:

KaRLiToS

Golden Member
Jul 30, 2010
1,918
11
81
One of the factors to consider is that even though logic shrinks a full 50% from 28nm to 16/14nm, I/O does not; the PCI-E and memory controllers will shrink less. AMD is likely to go for a highly clocked 128-bit memory bus with improved memory compression and bandwidth efficiency for the 110 mm sq die. A smaller memory bus helps save power and lowers PCB cost. To help offset the loss of raw bandwidth, AMD might increase the L2 cache, similar to what Nvidia did with Maxwell (compared to Kepler). AMD needs low power and low cost to compete against Pascal's smallest GPU, which will be around 100-110 sq mm. AMD needs to try to get as close as possible to Nvidia in terms of perf/SP vs perf/CC, perf/watt and perf/sq mm.

AMD has made significant changes to the command processor, geometry processor, GCN shaders, L2 cache and memory controller. New features such as the primitive discard accelerator could improve power efficiency at the cost of a slight die-size increase. We need to see how the end products fare. AMD has failed against Maxwell and thus faces an uphill battle. Nvidia has plundered market share and mindshare. AMD needs to be competitive across the stack in terms of perf/watt, perf/sq mm and perf/$ to gain back market share.


+50 Rep to you.
I always like to read your posts when it comes to GPU predictions.
 

DeathReborn

Platinum Member
Oct 11, 2005
2,757
752
136
I think it'll probably be a 1536-shader part, maybe cut down to 1408 for the demo/yield purposes. With GCN, AMD nearly always needs more shaders than the competition to compensate for clock speed among other things; I can't really see that changing with Polaris.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Something else I've not seen mentioned is that every previous GPU from both AMD and Nvidia used TSMC.

This generation, in addition to the 28>14nm scaling, Samsung has a further 8-10% increase in density vs TSMC.

edit: Missed Glo.'s post; it seems more like a 10-15% density increase at Samsung vs TSMC.
 
Last edited:
Feb 19, 2009
10,457
10
76
Something else I've not seen mentioned is that every previous GPU from both AMD and Nvidia used TSMC.

This generation, in addition to the 28>14nm scaling, Samsung has a further 8-10% increase in density vs TSMC.

edit: Missed Glo.'s post; it seems more like a 10-15% density increase at Samsung vs TSMC.

Density gives better perf/mm2, but traditionally it has led to worse perf/W due to leakage.

Now, if the FinFET transistors can prevent much of that leakage (as is frequently advertised), then the density advantage should lead to better perf/W and perf/mm2.
 