DX12 case studies (NVidia GDC 2017)

Status
Not open for further replies.

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I just found this very interesting PDF while searching for info on Ubisoft's AnvilNext engine and DX12, so apologies if this has already been posted.

It has a list of several different engines and their DX12 implementations with regard to NVidia hardware. It also confirms some suspicions that others and I have had regarding DX12 performance on NVidia: that matching the DX11 driver is very difficult to do manually through DX12 when it comes to certain aspects, especially memory management.

Though considering how optimized the NVidia DX11 driver is, I suppose sensible people wouldn't be surprised by that at all. This probably won't stop certain people from gloating about AMD's supposed superiority when it comes to DX12 performance though, I'm sure.

One thing that popped out to me, is that certain folks (you know who you are) said that AMD's hardware ACEs which are theoretically capable of handling heavy asynchronous loads are handicapped by developers due to NVidia's weaker async implementation. But if you read the PDF, you might be surprised that The Division utilizes multi engine very heavily. In fact, the game uses three copy queues!

The PDF also heavily implies that the next Assassin's Creed game (scheduled to be released this year) will run on Ubisoft's AnvilNext engine, and will use DX12. I've been wondering why Ubisoft hasn't ported their biggest engine to DX12 yet, but I guess they have been busy for quite some time doing exactly that.

And this might be perhaps the very first game to support the DX12 binding model.

Member callouts are not allowed. Also, this thread reeks of trolling.
Markfw
Locking this to stop the war....
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,979
126
It also confirms some suspicions that others and I have had regarding DX12 performance on NVidia: that matching the DX11 driver is very difficult to do manually through DX12 when it comes to certain aspects, especially memory management
nVidia's driver engineers are some of the best in the business. Almost no game developer will be able to match someone with intimate insider knowledge of the hardware. It's like expecting an assembler programmer to beat a modern optimizing compiler.

DX12/Vulkan titles on nVidia almost universally run worse or no better than DX11/OpenGL.

Low level APIs are a crutch for AMD's lacking driver. But to be fair AMD has far less resources than nVidia, so it's amazing they do as well as they do. Hopefully if they start making more money they can allocate more to driver engineering.
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
nVidia's driver engineers are some of the best in the business. Almost no game developer will be able to match someone with intimate insider knowledge of the hardware. It's like expecting an assembler programmer to beat a modern optimizing compiler.

DX12/Vulkan titles on nVidia almost universally run worse or no better than DX11/OpenGL.

Low level APIs are a crutch for AMD's lacking driver. But to be fair AMD has far less resources than nVidia, so it's amazing they do as well as they do. Hopefully if they start making more money they can allocate more to driver engineering.

As I recall, Doom and Deus Ex Mankind Divided do run better in Vulkan/DX12 even on Nvidia GPUs, just off the top of my head, so I think saying "nearly universally" is exaggerating things.
 
Reactions: Bacon1

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
One thing that popped out to me, is that certain folks (you know who you are) said that AMD's hardware ACEs which are theoretically capable of handling heavy asynchronous loads are handicapped by developers due to NVidia's weaker async implementation. But if you read the PDF, you might be surprised that The Division utilizes multi engine very heavily. In fact, the game uses three copy queues!

One thing to bear in mind is that NVidia has specific hardware devoted to offloading copy operations, and has recommended asynchronous copy queues in CUDA for years. Async copying and async compute are slightly different things

Nice find though, thanks!
 

Krteq

Senior member
May 22, 2015
993
672
136
It has a list of several different engines and their DX12 implementations with regard to NVidia hardware. It also confirms some suspicions that others and I have had regarding DX12 performance on NVidia: that matching the DX11 driver is very difficult to do manually through DX12 when it comes to certain aspects, especially memory management.
Yep, current nV uarchs have some limitations with binding resources. GCN is a "bindless" uarch, so there is no issue.

One thing that popped out to me, is that certain folks (you know who you are) said that AMD's hardware ACEs which are theoretically capable of handling heavy asynchronous loads are handicapped by developers due to NVidia's weaker async implementation. But if you read the PDF, you might be surprised that The Division utilizes multi engine very heavily. In fact, the game uses three copy queues!
There is no issue with the copy queue in async compute; the copy queue has its own "engine" and it's just copying values to other queues.

The elephant in the room for current nV uarchs is the async-compute implementation, due to relatively slow context switching and preemption.

Anyway, thx for link. Interesting read.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
As I recall, Doom and Deus Ex Mankind Divided do run better in Vulkan/DX12 even on Nvidia GPUs, just off the top of my head, so I think saying "nearly universally" is exaggerating things.

I'd say using the word "nearly" takes care of the exaggeration claim. You listed two games, and I'm not even sure if they do in fact run better. I know they didn't when I was playing them, so saying "nearly universally" sounds right on the mark.
 

ryzenmaster

Member
Mar 19, 2017
40
89
61
One thing that popped out to me, is that certain folks (you know who you are) said that AMD's hardware ACEs which are theoretically capable of handling heavy asynchronous loads are handicapped by developers due to NVidia's weaker async implementation. But if you read the PDF, you might be surprised that The Division utilizes multi engine very heavily. In fact, the game uses three copy queues!

Last time I checked, The Division actually ran slightly worse on DX12 than DX11 on Nvidia. I believe Quantum Break on DX12 had the Fury X actually outperforming the 1070, and evenly matching it in Sniper Elite 4. Both take a performance hit in BF1 when switching over to DX12, so it may indeed be difficult to do a DX12 implementation right, and it certainly doesn't help when you need to ensure your game runs well on all recent GPUs. Whether it means handicapping async compute is something I don't know, since I don't write code for the GPU in any capacity.

AFAIK Doom on Vulkan does perform better on both GCN and Pascal though, so there certainly can be performance gains when using modern APIs.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Low level APIs are a crutch for AMD's lacking driver. But to be fair AMD has far less resources than nVidia, so it's amazing they do as well as they do. Hopefully if they start making more money they can allocate more to driver engineering.

Completely and totally wrong. Lower levels of abstraction have had benefits and trade offs since the first language past assembly was written. Cut the garbage. DX12 vs DX11 is just yet another step up or down the abstraction ladder.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Completely AND totally wrong... Sounds pretty serious. A bit redundant, but serious.
 

Guru

Senior member
May 5, 2017
830
361
106
Can someone make a good summary of the content of the PDF, like what are Nvidia's low level disadvantages?

From real life testing AMD has a better implementation for low level APIs; as someone said above, the Fury X in some titles equals or even beats the 1070, while it's close in others. Point being, even an older architecture on 28nm with only 4GB VRAM (albeit HBM) can still beat Nvidia's 1070.

This is a big thing since the 1070 is always 25-35% faster in DX11. So clearly Nvidia has big issues with low level APIs, but their raw performance is so much higher that it ultimately doesn't matter that their DX12 performance is often slower than the DX11 implementation.

Only in ROTTR and Hitman (only very recently) is DX12 performance higher, while in The Division it's the same as DX11.
 

Flapdrol1337

Golden Member
May 21, 2014
1,677
93
91
From real life testing AMD has a better implementation for low level APIs; as someone said above, the Fury X in some titles equals or even beats the 1070, while it's close in others. Point being, even an older architecture on 28nm with only 4GB VRAM (albeit HBM) can still beat Nvidia's 1070.
The Fury X has more TFLOPS and twice the memory bandwidth. On paper it should easily win.

Let's hope Vega performs more up to spec.
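
A rough way to sanity-check the on-paper comparison is peak FP32 throughput (2 FLOPs per shader per clock) and memory bandwidth (bus width times per-pin data rate). The spec figures below are not from this thread; they are assumed from the publicly listed reference specs (Fury X: 4096 shaders at 1050 MHz, 4096-bit HBM at 1 Gbps per pin; GTX 1070: 1920 shaders at 1683 MHz boost, 256-bit GDDR5 at 8 Gbps per pin). Real cards clock differently, so treat this as a sketch only.

```python
# Paper-spec comparison. All inputs are assumed reference specs, not measurements.

def peak_tflops(shaders, clock_mhz):
    """Peak FP32 TFLOPS, counting one fused multiply-add (2 FLOPs) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width in bits times per-pin rate, divided by 8."""
    return bus_bits * gbps_per_pin / 8


# Assumed reference specs (hypothetical inputs for illustration).
cards = {
    "Fury X":   {"shaders": 4096, "clock_mhz": 1050, "bus_bits": 4096, "gbps_per_pin": 1.0},
    "GTX 1070": {"shaders": 1920, "clock_mhz": 1683, "bus_bits": 256,  "gbps_per_pin": 8.0},
}

results = {
    name: (
        peak_tflops(spec["shaders"], spec["clock_mhz"]),
        bandwidth_gbs(spec["bus_bits"], spec["gbps_per_pin"]),
    )
    for name, spec in cards.items()
}

for name, (tflops, gbs) in results.items():
    print(f"{name}: {tflops:.2f} TFLOPS, {gbs:.0f} GB/s")
```

Under those assumed specs the Fury X comes out roughly a third ahead in peak TFLOPS and at twice the bandwidth, which is the "on paper" gap being described. Peak numbers say nothing about how well either architecture keeps its shaders fed.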
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
DX12/Vulkan titles on nVidia almost universally run worse or no better than DX11/OpenGL.

Actually there are quite a few games that run better/faster in DX12/Vulkan than in DX11/OpenGL: Ashes of the Singularity, Hitman, Sniper Elite 4 and Doom, for example. The Division is still undergoing DX12 optimization so I don't think we can come to a conclusion on that game yet.

Low level APIs are a crutch for AMD's lacking driver. But to be fair AMD has far less resources than nVidia, so it's amazing they do as well as they do. Hopefully if they start making more money they can allocate more to driver engineering.

I definitely disagree with you. Low level APIs are the future, and when games are properly optimized for them, will show massive improvements in performance and hardware utilization.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
As I recall, Doom and Deus Ex Mankind Divided do run better in Vulkan/DX12 even on Nvidia GPUs, just off the top of my head, so I think saying "nearly universally" is exaggerating things.

Doom definitely runs better on Vulkan for NVidia, but Deus Ex MD? No way. At least not with the Creators Update for Windows 10. DX12 is buggy as hell and it's much slower than DX11 now, at least on my rig.

But then again, Deus Ex MD has always had very spotty DX12 performance, even for AMD. That's why most reviewers test the game only in DX11.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
One thing to bear in mind is that NVidia has specific hardware devoted to offloading copy operations, and has recommended asynchronous copy queues in CUDA for years. Async copying and async compute are slightly different things

Nice find though, thanks!

Yeah but the point is that the Snowdrop engine is making heavy use of the multi engine capabilities of DX12 hardware. I was actually surprised at how much they were leveraging multi engine.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Can you find some talks that aren't purely Nvidia biased? I mean obviously they are going to paint themselves in the best light possible. The article slides are all listed as www.gameworks.nvidia.com and have Nvidia branding throughout.

Also I'm confused as to why some people are stating that NVidia's drivers are unmatched when multiple games run better using DX12/Vulkan than DX11/OpenGL... for Pascal at least.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
At least not with the Creators Update for Windows 10. DX12 is buggy as hell and it's much slower than DX11 now, at least on my rig.

But then again, Deus Ex MD has always had very spotty DX12 performance, even for AMD. That's why most reviewers test the game only in DX11.

Yeah DX12 is definitely going to have some teething problems. The Creators Update added WDDM 2.1, so it could be related to that or to drivers needing to be updated.

I'm sad how little we get in the way of updated testing from the review sites. Why don't we have any "June 2017 DX11 vs DX12 Revisited/Showdown" type articles?

Ideal:

3 GPUs each: 1060, 1070, 1080, 470, 580, Fury (X)

5 CPUs each: Pentium, i5, R5, i7, R7

3 Resolutions: 1080p, 1440p, 4k

2 Settings: Medium, Very High

Games: Hitman, Deus Ex MD, Doom, Sniper Elite, The Division, Rise of the Tomb Raider, Battlefield 1 (any others I missed?)

Someone please!!
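
To get a feel for how big that wish list is, here is a minimal sketch that enumerates the full test matrix. The hardware and game names come from the list above; the two-way API split (legacy vs low-level) is my own assumption about what a "DX11 vs DX12" showdown would compare, and I take the lists as six GPUs and five CPUs in total.

```python
from itertools import product

# Items taken from the wish list above.
gpus = ["GTX 1060", "GTX 1070", "GTX 1080", "RX 470", "RX 580", "Fury (X)"]
cpus = ["Pentium", "i5", "R5", "i7", "R7"]
resolutions = ["1080p", "1440p", "4K"]
settings = ["Medium", "Very High"]
games = [
    "Hitman", "Deus Ex MD", "Doom", "Sniper Elite", "The Division",
    "Rise of the Tomb Raider", "Battlefield 1",
]
# Assumption: each game is run once on its legacy API and once on its low-level API.
apis = ["DX11/OpenGL", "DX12/Vulkan"]

# Every combination is one benchmark run.
runs = list(product(gpus, cpus, resolutions, settings, games, apis))

print(f"{len(runs)} benchmark runs")
print("first run:", runs[0])
```

That works out to 2,520 individual runs before any repeats for variance, which probably explains why nobody publishes this.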
 

Spjut

Senior member
Apr 9, 2011
928
149
106
I saw slides from a Dice presentation somewhere talking about Frostbite getting big VRAM savings in DX12.
I've seen some graphics programmers being excited for what Shader Model 6.0 is bringing.

But programmers have said multiple times that DX12's benefits will be small for as long as the games still must support DX11.

I don't do any thorough testing logging the FPS and frametimes, but on my original GTX Titan, Hitman 2016 felt consistently smoother in DX12 than DX11. Benchmarks and my own tests indicated higher max FPS in DX11, but my own tests in DX11 had some stuttering in certain areas that wasn't there in DX12.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Yep, current nV uarchs have some limitations with binding resources. GCN is a "bindless" uarch, so there is no issue.

NVidia hardware is at tier 2 for bindless resources, and AMD is at tier 3. Whether that makes any difference in practice remains to be seen since no games are currently using the DX12 bindless model.

There is no issue with the copy queue in async compute; the copy queue has its own "engine" and it's just copying values to other queues.

Did I say anything about asynchronous compute? If you read my quote properly, I said asynchronous loads. I purposely left compute out, because I was specifically talking about The Division's multi engine implementation, which seems to be rather robust.

The elephant in the room for current nV uarchs is the async-compute implementation, due to relatively slow context switching and preemption.

In the context of this discussion, preemption has no relevance since it's not used in gaming scenarios outside of VR. As for context switching, if NV uarchs have such a problem with it, then why does Maxwell 2 generally outperform AMD in VR?

As far as I know, NVidia hardware has much faster context switching than AMD.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Last time I checked, The Division actually ran slightly worse on DX12 than DX11 on Nvidia. I believe Quantum Break on DX12 had the Fury X actually outperforming the 1070, and evenly matching it in Sniper Elite 4. Both take a performance hit in BF1 when switching over to DX12, so it may indeed be difficult to do a DX12 implementation right, and it certainly doesn't help when you need to ensure your game runs well on all recent GPUs. Whether it means handicapping async compute is something I don't know, since I don't write code for the GPU in any capacity.

The Division is still undergoing DX12 optimization so the jury is still out on that one. With patch 1.6, DX12 has stuttering, but this is supposedly fixed in the 1.6.1 patch which is still in test mode and hasn't been released. I think The Division's Snowdrop engine will eventually have proper DX12 support as it seems they are on the right track.

As for Quantum Break, that game was a disaster in DX12 for NVidia. But the fact that the game runs faster on NVidia for DX11 than it does for AMD on DX12 says a lot about the level of optimization for both vendors.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Can you find some talks that aren't purely Nvidia biased? I mean obviously they are going to paint themselves in the best light possible. The article slides are all listed as www.gameworks.nvidia.com and have Nvidia branding throughout.

I wasn't looking for any "biased articles" at all when I found this one. I just thought I would post it since it has some interesting info regarding DX12 optimization. Also, Hitman isn't exactly a GameWorks title now, is it? And the AnvilNext DX12 engine from Ubisoft hasn't been released yet, but it likely will be with the new Assassin's Creed game.

Also I'm confused as to why some people are stating that NVidia's drivers are unmatched when multiple games run better using DX12/Vulkan than DX11/OpenGL... for Pascal at least.

Yeah I can't believe that myth still persists. There is a section in Doom where I got a 150% increase in performance vs OpenGL, so yeah, low level APIs are useful
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
As for Quantum Break, that game was a disaster in DX12 for NVidia. But the fact that the game runs faster on NVidia for DX11 than it does for AMD on DX12 says a lot about the level of optimization for both vendors.
Remedy massively improved the performance after an update. Too bad that by then they had already decided to skip support for the UWP version and focus on the DX11 version for Steam.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Remedy massively improved the performance after an update. Too bad that by then they had already decided to skip support for the UWP version and focus on the DX11 version for Steam.

Yeah QB was my most regretted purchase in a long time. Not only did it take weeks after launch to even get my Win 10 key, but the game ran very poorly and they dropped support for it super fast. Then they came out with the Steam version and stated they'd keep both versions equal, when it's obvious that they made a completely separate build for Steam, as it had some issues that the Win 10 version had fixed (like the Fraps logo in the in-game movie!) months before. They optimized it with Nvidia, left AMD to hang and never updated the Win 10 version again.

Rise of the Tomb Raider on the Windows Store is equal to the Steam version and offers both DX11 and DX12. No idea why Remedy worked harder to give people fewer options. But w/e, I wouldn't include it in any comparison since the game builds differ by more than just an API change.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
nVidia's driver engineers are some of the best in the business. Almost no game developer will be able to match someone with intimate insider knowledge of the hardware. It's like expecting an assembler programmer to beat a modern optimizing compiler.

You mean a non assembler programmer, right? Cause it's not that amazing to outperform a compiler. It's a program vs a human being.
 