[WCCFtech] AMD and NVIDIA DX12 big picture mode

Carfax83 · Sep 3, 2015

I don't usually use WCCFtech as a source, but this article of theirs is actually quite good, and helps to put things back into perspective regarding DX12.

The more I learn about this, the more I think that AMD has been quite clever in how they've managed to hype their asynchronous shader technology. For instance, a lot of people now erroneously think that hardware based asynchronous compute engines are part of the DX12 spec (a small percentage are even thinking about selling their NVidia GPUs and going AMD because of this), yet if you look at the DX12 feature levels in the article, nowhere is a hardware implementation for asynchronous compute mentioned.

Intel GPUs don't support it in hardware, and apparently neither does NVidia. This doesn't mean that Intel and NVidia cannot do asynchronous compute however, just that they lack the specialized hardware that AMD has in that regard.

Regardless, it's unlikely that current NVidia GPUs will ever do asynchronous compute as effectively as AMD's GCN models, or achieve a similar performance gain. But is this a bad thing? This is where it gets murky.

Whilst asynchronous compute obviously has significant performance improvements for AMD's GCN cards, there is no evidence that it would for Maxwell 2, or even Intel's latest Skylake GPUs. Asynchronous compute relies on idle resources to perform compute shaders in parallel with rendering. So GPUs with a lot of idle resources at any particular time would presumably see large performance increases relative to GPUs with less idle resources, assuming they have the means to exploit it....and modern GCN GPUs with their eight dedicated ACEs definitely do.

So AMD have OBVIOUSLY and purposely designed their GPUs to be capable of asynchronous compute operations for a long time, knowing that one day it would become useful for achieving maximum throughput in their GPUs, which have an issue with under utilization.

And they OBVIOUSLY want developers to use asynchronous compute as much as possible, knowing it would give them an advantage. This is where their domination of the console market could finally give AMD the leverage they've always wanted. Developers that target the current gen consoles are guaranteed to utilize asynchronous compute because of the performance gains, and also because the CPUs in the consoles are so weak. Some of the first party developers may even develop their engines around it.

With DX11, AMD could never really leverage their domination of the console market to gain an edge over NVidia, because DX11's abstractive layers prevented that. But with DX12, this changes. Console optimizations can now be implemented in PC games, and so provide AMD with a unique advantage. Asynchronous compute shaders may just be the tip of the iceberg!

So what can, or should NVidia do? Maxwell 2 is perhaps the most efficient GPU architecture ever designed up to this point, and DX12 should increase that efficiency even more. Thing is, DX12 massively improves Radeon efficiency as well, perhaps even more so relatively speaking. Still, NVidia seem very confident of the DX12 performance for Maxwell, and Maxwell hasn't really displayed any salient weaknesses so far..

The jury is still out on AotS, as it's still in alpha, and as NVidia continues to refine and optimize their DX12 driver, things could change much like what occurred with the Star Swarm benchmark.

ShintaiDK · Sep 3, 2015

They couldnt even get the featurelist correct in the IHV compare.

cen1 · Sep 3, 2015

I ignore every wccftech article that tries to "finally" explain a highly technical problem because it is obvious to me that the guy writing it has absolutely no idea how things actually work. They just throw in a lot of quotes and try to sound smart but on the end of the day, they don't really tell anything substantial.

Keysplayr · Sep 3, 2015

Now it's time to see if Lisa Su can put Rory Read to shame. Or not.

Carfax83 · Sep 3, 2015

ShintaiDK said:
They couldnt even get the featurelist correct in the IHV compare.

What did they get wrong?

cen1 said:
I ignore every wccftech article that tries to "finally" explain a highly technical problem because it is obvious to me that the guy writing it has absolutely no idea how things actually work. They just throw in a lot of quotes and try to sound smart but on the end of the day, they don't really tell anything substantial.

Of course the article is going to be light on technical details, considering it's source.

But the reason why I posted it is because it brings to light one simple truth. Asynchronous compute is NOT part of the DX12 specification.

GCN cards have had the ability to do asynchronous compute in hardware for years, before DX12 was even released. AMD knowingly implemented this feature into their designs, with the thought that it would become useful once low level APIs started to emerge, with Mantle being the first.

And now they are using their console domination leverage, and their marketing department to sort of delude everyone into believing that asynchronous compute is some major DX12 feature that NVidia does not properly support..

pepone1234 · Sep 3, 2015

Carfax83 said:
What did they get wrong?

Of course the article is going to be light on technical details, considering it's source.

But the reason why I posted it is because it brings to light one simple truth. Asynchronous compute is NOT part of the DX12 specification.

GCN cards have had the ability to do asynchronous compute in hardware for years, before DX12 was even released. AMD knowingly implemented this feature into their designs, with the thought that it would become useful once low level APIs started to emerge, with Mantle being the first.

And now they are using their console domination leverage, and their marketing department to sort of delude everyone into believing that asynchronous compute is some major DX12 feature that NVidia does not properly support..

It doesn't matter if it is or not part of dx12. Nvidia promised their maxwell cards were going to support async compute and now everything is telling us that maxwell does not support async compute.

Goatsecks · Sep 3, 2015

Maxwell does support async compute?

TheELF · Sep 3, 2015

Carfax83 said:
AMD knowingly implemented this feature into their designs, with the thought that it would become useful once low level APIs started to emerge, with Mantle being the first.

It's the exact same move they pulled with the fx line of processors,bring out a CPU with 8 cores try to convince everybody that the multithreaded future is right around the corner push people towards rendering benchmarks "proving" that they are faster then an I7 and sit back collecting the cash.
Same thing here
make a GPU with 8 ACEs bring out a benchmark "proving" how fast it will be,try to convince everybody that the async compute future is right around the corner and sit back collecting the cash..

NTMBK · Sep 3, 2015

TheELF said:
It's the exact same move they pulled with the fx line of processors,bring out a CPU with 8 cores try to convince everybody that the multithreaded future is right around the corner push people towards rendering benchmarks "proving" that they are faster then an I7 and sit back collecting the cash.
Same thing here
make a GPU with 8 ACEs bring out a benchmark "proving" how fast it will be,try to convince everybody that the async compute future is right around the corner and sit back collecting the cash..

With Bulldozer, AMD had no way to push the software market towards multithreading. They have put GCN and its ACEs in both consoles, meaning all multiplatform games will be developed for that architecture. It's a totally different situation.

cmdrdredd · Sep 3, 2015

pepone1234 said:
It doesn't matter if it is or not part of dx12. Nvidia promised their maxwell cards were going to support async compute and now everything is telling us that maxwell does not support async compute.

Yet on beyond3D they have written simple apps that do async compute and Nvidia GPUs do it. They are not as fast when you go beyond their queue limit though.

When one alpha benchmark from a company that has been pimping AMD for a while seems to show a slight disadvantage in performance for Nvidia in DX12 people go nuts though.

ShintaiDK · Sep 3, 2015

Carfax83 said:
What did they get wrong?

Try compare with someone that did their homework.

http://www.pcgameshardware.de/Core-...ylake-Test-Core-i7-6700K-i5-6600K-1166741/#a3

PrincessFrosty · Sep 3, 2015

Fundamentally if you fully support an API or not does not come down to if you can execute the features in hardware or not, its if you can deal with all input/output for that API call correctly, which Nvidia can clearly do.

Performance isn't a requirement for API support, things implemented in software might or might not be slower than hardware implementation depending on a huge number of factors. But that's a 2ndry conversation to what is "supported".

It's also important to have some realistic context regarding feature sets and their scope and usefulness, early adoption of a new version of DX has almost always been pointless, mostly because the GPUs can deal with the API calls long before they have the kind of core speed to actually implement features using these newer features, when you consider this along side finite and often very short lifespans of GPUs. There's a lot of fanboyism around this, a lot of bad synthetic benchmarks and a lot of hype based on a small number of games.

What is a "win" on paper for Nvidia or AMD vs what is actually good for the gamers playing actual games is often very different.

boozzer · Sep 3, 2015

PrincessFrosty said:
Fundamentally if you fully support an API or not does not come down to if you can execute the features in hardware or not, its if you can deal with all input/output for that API call correctly, which Nvidia can clearly do.

Performance isn't a requirement for API support, things implemented in software might or might not be slower than hardware implementation depending on a huge number of factors. But that's a 2ndry conversation to what is "supported".

It's also important to have some realistic context regarding feature sets and their scope and usefulness, early adoption of a new version of DX has almost always been pointless, mostly because the GPUs can deal with the API calls long before they have the kind of core speed to actually implement features using these newer features, when you consider this along side finite and often very short lifespans of GPUs. There's a lot of fanboyism around this, a lot of bad synthetic benchmarks and a lot of hype based on a small number of games.

What is a "win" on paper for Nvidia or AMD vs what is actually good for the gamers playing actual games is often very different.

if amd can pull off backwards compatibility with their gcn cards it would be huge. but we got nothing concrete on that front. so far it is just 1 game. AMD needs to spread some moola and grease some elbows.

NomanA · Sep 3, 2015

cmdrdredd said:
Yet on beyond3D they have written simple apps that do async compute and Nvidia GPUs do it. They are not as fast when you go beyond their queue limit though.

Nope, the nVidia GPUs don't show asynchronous graphic and compute in those results. That's the whole point. Otherwise, why would there be this long debate?

cmdrdredd · Sep 3, 2015

PrincessFrosty said:
Fundamentally if you fully support an API or not does not come down to if you can execute the features in hardware or not, its if you can deal with all input/output for that API call correctly, which Nvidia can clearly do.

Performance isn't a requirement for API support, things implemented in software might or might not be slower than hardware implementation depending on a huge number of factors. But that's a 2ndry conversation to what is "supported".

It's also important to have some realistic context regarding feature sets and their scope and usefulness, early adoption of a new version of DX has almost always been pointless, mostly because the GPUs can deal with the API calls long before they have the kind of core speed to actually implement features using these newer features, when you consider this along side finite and often very short lifespans of GPUs. There's a lot of fanboyism around this, a lot of bad synthetic benchmarks and a lot of hype based on a small number of games.

What is a "win" on paper for Nvidia or AMD vs what is actually good for the gamers playing actual games is often very different.

All true and thank you for putting into words what so many of us have thought and not articulated very well. Games may or may not use all these features and future GPUs will be more efficient at using those features. I think that by the time this matters in a majority of hot AAA games, we will not be using 980s or 390s anymore.

Carfax83 · Sep 3, 2015

pepone1234 said:
It doesn't matter if it is or not part of dx12. Nvidia promised their maxwell cards were going to support async compute and now everything is telling us that maxwell does not support async compute.

It matters in the context of marketing your product. AMD is making asynchronous compute out to be a major feature of DX12, but in reality it isn't.

AMD is really the only IHV that's been constantly banging their drums about asynchronous compute. Intel and NVidia not so much at all..

Perhaps it's because asynchronous compute just isn't as important for their architectures as it is for GCN..

NTMBK said:
With Bulldozer, AMD had no way to push the software market towards multithreading. They have put GCN and its ACEs in both consoles, meaning all multiplatform games will be developed for that architecture. It's a totally different situation.

That's true, and it's a masterful stroke by AMD to do this. I congratulate them wholeheartedly! :thumbsup:

ShintaiDK said:
Try compare with someone that did their homework.

http://www.pcgameshardware.de/Core-...ylake-Test-Core-i7-6700K-i5-6600K-1166741/#a3

Wow, Skylake IGP is the boss when it comes to specifications isn't it?

Hitman928 · Sep 3, 2015

Maxwell owners at beyond3d are showing little to no benefit in trying to run the test asynchronously:

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-22

ToTTenTranz said:
it doesn't look like your test [is] doing any async compute either [on a 970]

Devnant said:
Seems like I don't get any async benefits [on a 980 Ti]

and more.

One example:

The test isn't perfect but some well respected people there have said that this test should show whether Async compute is working or not. Maxwell seems to get little speed ups here and there, but nothing significant. Until Nvidia comments, we probably won't know what's going on and if they never do, then it probably is as this test shows.

Will this matter? Yes, at some point. Will that time be soon enough for the market to be effected? TBD. The industry is moving towards more compute being used in game engines which could obviously enhance the situation but Nvidia will probably be fine because their next gen will come out before a lot of games support asynch compute and they can tell everyone to jump on the latest and greatest at that point.

cmdrdredd · Sep 3, 2015

Hitman928 said:
Will this matter? Yes, at some point. Will that time be soon enough for the market to be effected? TBD. The industry is moving towards more compute being used in game engines which could obviously enhance the situation but Nvidia will probably be fine because their next gen will come out before a lot of games support asynch compute and they can tell everyone to jump on the latest and greatest at that point.

Or it may not matter at all since it isn't part of DX12. People like to harp on how consoles do it but the consoles also have a pretty weak CPU so anything that can be done GPU side helps.

Carfax83 · Sep 3, 2015

Hitman928 said:
Will this matter? Yes, at some point. Will that time be soon enough for the market to be effected? TBD. The industry is moving towards more compute being used in game engines which could obviously enhance the situation but Nvidia will probably be fine because their next gen will come out before a lot of games support asynch compute and they can tell everyone to jump on the latest and greatest at that point.

Compute has been used heavily in games for some time now, especially since the launch of the current gen consoles. Frostbite 3 uses it a lot, and so does CryEngine for instance.. So asynchronous compute isn't really the "industry" harbinger that AMD is portraying it to be. Asynchronous compute is less about compute performance, and more about increasing GPU efficiency; particularly for GCN.

And besides, NVidia's compute performance since Maxwell has been very good, on par with or better than the GCN cards.. Pascal is certainly going to be a compute monster, as that's where the industry is headed towards.

cmdrdredd · Sep 3, 2015

NomanA said:
Nope, the nVidia GPUs don't show asynchronous graphic and compute in those results. That's the whole point. Otherwise, why would there be this long debate?

How come some of the results show Nvidia GPUs with shorter durations in the tests up to a certain point? Has that been explained?

I'm not gonna read through that entire thread, I only browsed it and submitted my results there.

Hitman928 · Sep 3, 2015

cmdrdredd said:
How come some of the results show Nvidia GPUs with shorter durations in the tests up to a certain point? Has that been explained?

I'm not gonna read through that entire thread, I only browsed it and submitted my results there.

Yes.

DX12 Async Compute Latency thread

Other stuff has been mentioned as well, but that's the biggest thing.

Silverforce11 · Sep 3, 2015

cmdrdredd said:
How come some of the results show Nvidia GPUs with shorter durations in the tests up to a certain point? Has that been explained?

I'm not gonna read through that entire thread, I only browsed it and submitted my results there.

Pretty much the program was made to test async compute functionality, whether GPUs can process graphics & compute in parallel. It was not made to test throughput or performance.

http://forums.anandtech.com/showpost.php?p=37674878&postcount=820

http://forums.anandtech.com/showpost.php?p=37675312&postcount=829

http://forums.anandtech.com/showpost.php?p=37676035&postcount=841

This comes back to the ACEs, without multiple separate engines, parallel execution would be very difficult.

I also don't believe that AMD originally designed GCN for DX12 or Mantle. This I believe is a common myth, a well planned (or lucky) coincidence.

Reading the articles, Sony actually worked closely with AMD to design GCN because it was Sony who requested 8 ACEs with 8 queues each (some say it was a legacy of the unique PS3 architecture that prompted Sony to want more compute power), whereas the base GCN had 2 ACEs with 2 queues (probably original for Xbone/MS). This was back in 2007 to 2009. So the ground work was laid a long time ago.

GCN would always be crippled for DX11 because it was designed for consoles with their close to the metal API that can extract peak performance from multiple independent engines. So what happens latter? Well, AMD realizes that it now needs a console like API to take advantage of its architecture on PC, and that's where Mantle (port of an Xbone API) comes in, which is also why it's so closely related to DX12 (most likely a port of the Xbone API + Mantle) and it now lives on in Vulkan/Metal. If AMD had not gave Mantle away for free to MS's biggest competitors, we would unlikely be getting DX12 so soon.

Robert Hallock posted several times using the phrase "AMD's gamble paying off", and I do think it was a nice move given their limited R&D $$ and financial situation. They need the consoles to survive as a business. Lisa Su in the recent analyst talk focus heavily on AMD GCN powering over 200 million gaming consoles. With Nintendo NX, it will sustain them, and now with PC having closer to the metal DX12, it's a huge pay-off too. It means they just design the best architecture for consoles and also reap the reward on PC.

Oh, there's already rumor AMD and Sony are at it again, designing the future architecture for the PS5.

NomanA · Sep 3, 2015

cmdrdredd said:
How come some of the results show Nvidia GPUs with shorter durations in the tests up to a certain point? Has that been explained?

I'm not gonna read through that entire thread, I only browsed it and submitted my results there.

The varying times are very likely because of the minor differences in the three different runs, done for each of the 512 iterations per test. The poster who tabulated the graph said this
https://forum.beyond3d.com/posts/1869730/

3DVagabond · Sep 3, 2015

Carfax83 said:
What did they get wrong?

Of course the article is going to be light on technical details, considering it's source.

But the reason why I posted it is because it brings to light one simple truth. Asynchronous compute is NOT part of the DX12 specification.

GCN cards have had the ability to do asynchronous compute in hardware for years, before DX12 was even released. AMD knowingly implemented this feature into their designs, with the thought that it would become useful once low level APIs started to emerge, with Mantle being the first.

And now they are using their console domination leverage, and their marketing department to sort of delude everyone into believing that asynchronous compute is some major DX12 feature that NVidia does not properly support..

When did AMD do any deluding? It was a benchmark released by a game dev that blew this open. Other than make a tweet about it. I haven't seen AMD do anything.

selni · Sep 4, 2015

Hitman928 said:
Maxwell owners at beyond3d are showing little to no benefit in trying to run the test asynchronously:

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-22

and more.

One example:

The test isn't perfect but some well respected people there have said that this test should show whether Async compute is working or not. Maxwell seems to get little speed ups here and there, but nothing significant. Until Nvidia comments, we probably won't know what's going on and if they never do, then it probably is as this test shows.

Will this matter? Yes, at some point. Will that time be soon enough for the market to be effected? TBD. The industry is moving towards more compute being used in game engines which could obviously enhance the situation but Nvidia will probably be fine because their next gen will come out before a lot of games support asynch compute and they can tell everyone to jump on the latest and greatest at that point.

You really have to go into a lot of detail about exactly what you're doing and how the card/driver is executing it to draw any conclusions about this. Right now what it's showing is that there's no/only a small benefit to explicitly running things asynchronously, but that doesn't mean much by itself.

Asynchronous execution and parallel execution are different concepts. Something can be executed in parallel without being explicitly asynchronous, and tasks can be asynchronous without necessarily being executed in parallel. Further to that parallel execution won't give any speedup unless there's unused resources somewhere that can be utilised (which is hardly uncommon, but also not a guarantee).

Right away we can say nvidia probably supports async compute or the benchmark wouldn't work at all. Whether it's being executed in parallel or not is a more difficult question and requires some work that hasn't been done. Just comparing the times and saying "well it didn't get faster" isn't particularly useful as it doesn't address how well the synchronous case is loading the GPU (eg, it's probably not executing on 1 CUDA core...). Was nvidia already getting enough parallelism to near fully load their hardware anyway?

[WCCFtech] AMD and NVIDIA DX12 big picture mode

Diamond Member

Lifer

Member

Elite Member

Diamond Member

Member

Senior member

Diamond Member

Lifer

Lifer

Lifer

Platinum Member

Golden Member

Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Member

Lifer

Senior member