DX12 / Vulkan and new GPU architecture (?)


stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
Lighting can be computed in two ways.

The classic way of doing it is forward rendering. You basically have to re-render the scene for each light to compute the change in lighting, so your performance is cut to roughly 1/(number of lights). It's hard to get above about 8 lights in a scene like this, but it doesn't really cause any other issues.

There's also deferred lighting, in which case lighting is deferred until the very end and just rendered on the final 2D image. Lighting is rendered at the resolution of the image, and you can do thousands of lights because it's more of a filter on the final image. However, it breaks anti-aliasing and in general doesn't look as good as forward rendering.
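
For illustration, here's a bare-bones C++ sketch of the difference. All the types and helper functions are made up (they're not from any real engine); the point is only the cost structure of the two loops.

Code:
#include <vector>

// Hypothetical placeholder types and helpers - not any real engine's API.
struct Light { float intensity = 1.0f; };
struct GBufferTexel { float albedo = 1.0f; };   // normals, depth, etc. omitted
struct Image { int w = 0, h = 0; std::vector<float> px; };

// Stub "passes" so the sketch compiles; a real renderer does these on the GPU.
void drawSceneLit(const Light&, Image&) { /* full geometry pass, lit by one light */ }
std::vector<GBufferTexel> drawSceneToGBuffer(int w, int h) { return std::vector<GBufferTexel>(w * h); }
float shadeTexel(const GBufferTexel& t, const Light& l) { return t.albedo * l.intensity; }

// Forward: the scene geometry is re-rendered once per light, so cost is roughly
// (geometry work) x (number of lights) - hence the practical ceiling of ~8 lights.
void forwardRender(const std::vector<Light>& lights, Image& frame) {
    for (const Light& l : lights)
        drawSceneLit(l, frame);          // one full scene pass per light
}

// Deferred: a single geometry pass fills the G-buffer; lighting then runs as a
// per-pixel loop over the final 2D image, so thousands of lights become feasible -
// but MSAA and transparency get awkward, as noted above.
void deferredRender(const std::vector<Light>& lights, Image& frame) {
    std::vector<GBufferTexel> gbuf = drawSceneToGBuffer(frame.w, frame.h);
    frame.px.assign(static_cast<size_t>(frame.w) * frame.h, 0.0f);
    for (size_t i = 0; i < gbuf.size(); ++i)
        for (const Light& l : lights)
            frame.px[i] += shadeTexel(gbuf[i], l);   // screen-space light loop
}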

Physics computations are unrelated to the lighting. The objects will be moved into the correct positions each frame prior to any lighting being rendered. You could just calculate positions every frame, but if things are non-deterministic (like dynamic physics) then you really don't want a variable number of calculations per second, or the results could change. You also don't want to compute something very expensive more often than needed, so tying it to the frame rate would lower the frame rate too.
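
To see why a variable rate changes the results, here's a tiny self-contained example (the spring constants and step counts are made up): the same one second of simulation, stepped at two different rates, ends up in two different states.

Code:
#include <cstdio>

// A damped spring integrated with explicit Euler. The final state depends on
// the step size, so letting the physics rate float with the frame rate makes
// the outcome framerate-dependent - which is why engines fix the physics step.
struct Body { double x = 1.0, v = 0.0; };

void stepSpring(Body& b, double dt) {
    double a = -10.0 * b.x - 0.5 * b.v;   // spring force + damping
    b.v += a * dt;
    b.x += b.v * dt;
}

int main() {
    Body at60, at144;
    for (int i = 0; i < 60; ++i)  stepSpring(at60, 1.0 / 60.0);    // "60 fps" machine
    for (int i = 0; i < 144; ++i) stepSpring(at144, 1.0 / 144.0);  // "144 fps" machine
    std::printf("x after 1s in 60 steps:  %f\n", at60.x);
    std::printf("x after 1s in 144 steps: %f\n", at144.x);         // not identical
    return 0;
}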

Thanks!

I still don't get how physics and lighting are unrelated. You were saying that the objects are positioned prior to the lighting. Positioning takes physics into account, no? Then that means physics goes before lighting, which is contrary to what the article/slides say.

I'm missing something here - sorry for the bother.

That's nothing new from what people (gamedevs as well as AMD's engineers directly) have been saying for a long time. GCN is built for a Mantle-like API, as its uarch cannot be fully taken advantage of with DX11. It was a forward-looking uarch, meant to last them a long time.

I am hoping that in the not-too-distant future I can buy a Zen APU with GCN 2 and HBM2. That would make for an awesome base for an expanding rig, i.e. plug in a GCN 2 dGPU and have DX12 games take advantage of both.

This isn't new either. The VLIW5-to-VLIW4 transition was also about underutilization.

What's annoying me is that this article is only coming out now. They have known this for years!
 

werepossum

Elite Member
Jul 10, 2006
29,873
463
126
Thanks!

I still don't get how physics and lighting are unrelated. You were saying that the objects are positioned prior to the lighting. Positioning takes physics into account, no? Then that means physics goes before lighting, which is contrary to what the article/slides say.

I'm missing something here - sorry for the bother.

This isn't new either. The VLIW5-to-VLIW4 transition was also about underutilization.

What's annoying me is that this article is only coming out now. They have known this for years!
He isn't saying that physics and lighting are unrelated, he's saying that physics computations are unrelated to the lighting. Physics affects lighting, but lighting does not affect physics. This might seem like semantics, but it's not because it allows one to run physics calculations out of sequence, so that when it's time to run the lighting computations the positional geometry has been pre-determined for each frame. Thus there is no delay while the element's geometry is calculated before its shading can be done. It's also important because the element's positional geometry may not change with frame rate - that's a very difficult calculation since frame rate is generally variable - so by running physics out of the GPU pipeline, the rendering engine can call up the required pre-calculated information at the instant it is needed to create that frame. Then all that is needed is the exact time that frame will be rendered, which is a much simpler calculation. This probably isn't that big a deal where the CPU is calculating physics, but it's a big advantage for particle effects calculated by the GPU. Or that's my take on it anyway - my understanding may be as fuzzy as yours.
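
If that picture is right, the CPU-side shape of it might look something like the snippet below - std::async is used purely for illustration (real engines use their own job systems, and GPU async compute does the equivalent on the GPU's own queues): the next frame's expensive simulation is kicked off early and merely picked up when the renderer needs it.

Code:
#include <future>
#include <vector>

struct ParticleState { std::vector<float> positions; };

// Stand-in for an expensive simulation step (e.g. a particle system).
ParticleState simulateParticles(double simTime) {
    ParticleState s;
    s.positions.assign(10000 * 3, static_cast<float>(simTime));   // placeholder work
    return s;
}

void renderWithParticles(const ParticleState&) { /* draw-call stand-in */ }

int main() {
    const double dt = 1.0 / 60.0;
    double t = 0.0;
    // Start computing the state needed for the *next* frame ahead of time.
    auto pending = std::async(std::launch::async, simulateParticles, t + dt);

    for (int frame = 0; frame < 10; ++frame) {
        // ...other per-frame work overlaps the simulation here...
        ParticleState ready = pending.get();   // usually already finished, so little or no stall
        t += dt;
        pending = std::async(std::launch::async, simulateParticles, t + dt);  // queue next
        renderWithParticles(ready);            // render with pre-computed data
    }
    return 0;
}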
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
DirectX 12 really can't come soon enough. Absolute total sea change in graphics. So much potential. I hope at least some of it comes to fruition. The CPU overhead reduction seems to be a certainty, multi-adapter is a lot fuzzier, and increased async compute use is probably pretty certain, but to what degree is the question.

If I were making games and I knew Kepler couldn't do async compute well even in DX12, I'd still try to avoid it so my game would run all right on lots of hardware.

So we might see async-compute-based effects as "ultra" settings and add-ons, which never have as much impact as graphics features built in from the ground up. The DX12 feature level 12_1 stuff will be the same in all likelihood: nice, but the potential mostly wasted because no one is going to make it a requirement for their game.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
He isn't saying that physics and lighting are unrelated, he's saying that physics computations are unrelated to the lighting. Physics affects lighting, but lighting does not affect physics. This might seem like semantics, but it's not because it allows one to run physics calculations out of sequence, so that when it's time to run the lighting computations the positional geometry has been pre-determined for each frame. Thus there is no delay while the element's geometry is calculated before its shading can be done. It's also important because the element's positional geometry may not change with frame rate - that's a very difficult calculation since frame rate is generally variable - so by running physics out of the GPU pipeline, the rendering engine can call up the required pre-calculated information at the instant it is needed to create that frame. Then all that is needed is the exact time that frame will be rendered, which is a much simpler calculation. This probably isn't that big a deal where the CPU is calculating physics, but it's a big advantage for particle effects calculated by the GPU. Or that's my take on it anyway - my understanding may be as fuzzy as yours.

Thanks again. Many good people

1. I have the feeling that this relies heavily on a specific way of doing things in current 3D engines.

2. So, if I get you correctly, this whole parallel thing is a play on the pacing of physics calculations? And more specifically - physics are not calculated as part of the usual pipeline - the one that dictates the FPS, right?
I frankly don't know how they do it correctly, and I guess it's not that simple.

If it's not so, I still don't understand what you were trying to explain to me and how the calculations can be made concurrently (except in the case where parts of the scene are calculated separately)

3. If it's true, it's still not really as the slides show... they show a specific case that might not always hold, as sometimes you are bound to update your physics state.


I'm pretty sure that a lot is missing in all these slides and articles. No?
 

Sabrewings

Golden Member
Jun 27, 2015
1,942
35
51
2. So, if I get you correctly, this whole parallel thing is a play on the pacing of physics calculations? And more specifically - physics are not calculated as part of the usual pipeline - the one that dictates the FPS, right?
I frankly don't know how they do it correctly, and I guess it's not that simple.

Correct. Physics is independent of the rendering pipeline calculations and is refreshed on a specific schedule regardless of the number of frames in between. The number 50 Hz comes to mind. So, whether you're at 144 FPS or 60 FPS, the physics engine updates bodies at that specific rate.
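
There are good write-ups of the standard pattern (usually called a fixed timestep with an accumulator). Roughly, the loop looks like the sketch below - the 50 Hz figure and the function names are just illustrative:

Code:
#include <chrono>

constexpr double kPhysicsDt = 1.0 / 50.0;   // physics always steps at 50 Hz

void updatePhysics(double dt) { /* advance rigid bodies by exactly dt */ }
void renderFrame(double blend) { /* draw, interpolating between physics states */ }
bool gameRunning() { return true; }

void runLoop() {
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();
    double accumulator = 0.0;

    while (gameRunning()) {
        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Consume real time in fixed 20 ms slices: at 144 fps most frames run
        // zero or one physics step, at 30 fps a frame may run two - but every
        // step always uses the same dt, so the simulation stays consistent.
        while (accumulator >= kPhysicsDt) {
            updatePhysics(kPhysicsDt);
            accumulator -= kPhysicsDt;
        }
        // Render with the leftover fraction so motion still looks smooth.
        renderFrame(accumulator / kPhysicsDt);
    }
}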
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
Correct. Physics is independent of the rendering pipeline calculations and is refreshed on a specific schedule regardless of the number of frames in between. The number 50 Hz comes to mind. So, whether you're at 144 FPS or 60 FPS, the physics engine updates bodies at that specific rate.

Thanks. Really valuable info. Isn't there a good tutorial on all this?

So how does it work now, really? The physics calculations would have taken some GPU pipeline time while not really needing to be part of the pipeline?
But from what I understand, some of the recent AMD GPUs already had more than one ACE.

And I guess that it still requires some syncing... since even if you calculate physics at 50 Hz, changes might be too fast for it and you might need to somehow sync it, do some correction, or loosen the accuracy levels... no?
 

werepossum

Elite Member
Jul 10, 2006
29,873
463
126
Thanks again. Many good people

1. I have the feeling that this relies heavily on a specific way of doing things in current 3D engines.

2. So, if I get you correctly, this whole parallel thing is a play on the pacing of physics calculations? And more specifically - physics are not calculated as part of the usual pipeline - the one that dictates the FPS, right?
I frankly don't know how they do it correctly, and I guess it's not that simple.

If it's not so, I still don't understand what you were trying to explain to me and how the calculations can be made concurrently (except in the case where parts of the scene are calculated separately)

3. If it's true, it's still not really as the slides show... they show a specific case that might not always hold, as sometimes you are bound to update your physics state.


I'm pretty sure that a lot is missing in all these slides and articles. No?
I agree with all three of your points. Fox or Sabrewings or Silverforce could state it better than can I as they obviously know it better, but my understanding is that with traditional GPU physics the rendering pipeline essentially halts while the geometry is calculated. (Or to perhaps be more clear, this is a part of the pipeline process, just calculated in a different part of the GPU from rendering.) With some asynchronous computing there is more predictability, so that the physics geometry can be calculated simultaneously and merely recalled when needed to reduce the delay to just a couple clock cycles, making a comparatively expensive operation (measured in clock cycles) into a relatively cheap operation. IFF I understand it correctly. Think of it as just-in-time inventory delivery versus traditional order-based pull inventory.

It's also worth pointing out that the Maxwell architecture also has some asynchronous computing capability and that, as Silverforce pointed out, there are practical limitations which suggest that NVidia's very limited capability is an intentional design choice. Whether this design choice works out well or poorly depends on what developers do. Personally I'm guessing that developers use async compute more than NVidia anticipated, but compared to the understanding NVidia's designers necessarily have, obviously mine is by definition a WAG, so consider it worth what it cost you.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
I agree with all three of your points. Fox or Sabrewings or Silverforce could state it better than can I as they obviously know it better, but my understanding is that with traditional GPU physics the rendering pipeline essentially halts while the geometry is calculated. (Or to perhaps be more clear, this is a part of the pipeline process, just calculated in a different part of the GPU from rendering.) With some asynchronous computing there is more predictability, so that the physics geometry can be calculated simultaneously and merely recalled when needed to reduce the delay to just a couple clock cycles, making a comparatively expensive operation (measured in clock cycles) into a relatively cheap operation. IFF I understand it correctly. Think of it as just-in-time inventory delivery versus traditional order-based pull inventory.

It's also worth pointing out that the Maxwell architecture also has some asynchronous computing capability and that, as Silverforce pointed out, there are practical limitations which suggest that NVidia's very limited capability is an intentional design choice. Whether this design choice works out well or poorly depends on what developers do. Personally I'm guessing that developers use async compute more than NVidia anticipated, but compared to the understanding NVidia's designers necessarily have, obviously mine is by definition a WAG, so consider it worth what it cost you.

Wouldn't one or two complete charts help us a lot more? It would also help AMD/NV (and others) focus better and understand what they are trying to do.

Personally, I think that the fact that clear and accurate information is so hard to come by is intentional - more so than the design intentions of NV ((-:
And I'm sure they predicted these trends years ago.

Do you know of any tool for monitoring the inner workings of a GPU? No way, huh?
 

werepossum

Elite Member
Jul 10, 2006
29,873
463
126
Wouldn't one or two complete charts help us a lot more? It would also help AMD/NV (and others) focus better and understand what they are trying to do.

Personally, I think that the fact that clear and accurate information is so hard to come by is intentional - more so than the design intentions of NV ((-:
And I'm sure they predicted these trends years ago.

Do you know of any tool for monitoring the inner workings of a GPU? No way, huh?
Better charts and graphs would help you and me, but remember that GPUs are simply specialized processors and, as such, developers have a fair amount of control over how they accomplish their work. Also, there is a large amount of variation possible within each API, so we're always going to get fairly generic descriptions.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
Better charts and graphs would help you and me, but remember that GPUs are simply specialized processors and, as such, developers have a fair amount of control over how they accomplish their work. Also, there is a large amount of variation possible within each API, so we're always going to get fairly generic descriptions.

So how good are these marketing slides/charts?... (lol)
Yeah, I know that; I'm just annoyed by the fact that big hardware sites act like a "pipeline" for such marketing.
 
Silverforce11

Feb 19, 2009
10,457
10
76
Here's the reference to devs talking about async compute:

https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=19m40s

Note the terminology used is extremely similar to what other gamedevs on B3D forums have said regarding GCN. It's a compute beast, able to run a lot of async compute.

Basically the ACE engines are useless in DX11 now, which is one of the reasons we see the extra shaders in Fury/X scaling poorly at lower resolutions. The GPU just can't be fed, as shown by Tom's Hardware's 1080p and 1440p power usage tests: it uses less power than at 4K, so there are lots of idle shaders.

Come DX12, there should be a nice spike in shader uptime for GCN, and in games with lots of async compute (or VR, for any of you willing to wear a mask while gaming... not weird at all!) it'll really shine.
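
For anyone wondering what "an extra async compute queue" actually looks like to a programmer, here's a rough D3D12 sketch. It assumes a device and two pre-recorded command lists already exist, skips all error handling, and in a real engine the queues would be created once at startup rather than per submit:

Code:
#include <d3d12.h>
#include <wrl/client.h>     // link with d3d12.lib
using Microsoft::WRL::ComPtr;

// gfxList / computeList are assumed to be already-recorded command lists,
// e.g. the main rendering pass and an independent compute pass (lighting,
// particles, post-processing...).
void submitAsync(ID3D12Device* device,
                 ID3D12GraphicsCommandList* gfxList,
                 ID3D12GraphicsCommandList* computeList)
{
    // The "normal" 3D queue.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // The extra async compute queue - the kind of work GCN's ACEs are there to schedule.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Work on the two queues is allowed to overlap on the GPU; how much actually
    // runs concurrently is up to the hardware and driver.
    ID3D12CommandList* gfx[]  = { gfxList };
    ID3D12CommandList* comp[] = { computeList };
    gfxQueue->ExecuteCommandLists(1, gfx);
    computeQueue->ExecuteCommandLists(1, comp);
}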
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
He isn't saying that physics and lighting are unrelated, he's saying that physics computations are unrelated to the lighting. Physics affects lighting, but lighting does not affect physics. This might seem like semantics, but it's not because it allows one to run physics calculations out of sequence, so that when it's time to run the lighting computations the positional geometry has been pre-determined for each frame. Thus there is no delay while the element's geometry is calculated before its shading can be done. It's also important because the element's positional geometry may not change with frame rate - that's a very difficult calculation since frame rate is generally variable - so by running physics out of the GPU pipeline, the rendering engine can call up the required pre-calculated information at the instant it is needed to create that frame. Then all that is needed is the exact time that frame will be rendered, which is a much simpler calculation. This probably isn't that big a deal where the CPU is calculating physics, but it's a big advantage for particle effects calculated by the GPU. Or that's my take on it anyway - my understanding may be as fuzzy as yours.

That's a pretty good explanation.

You can do all game calculations in two ways: independent of the framerate, or synced to the framerate. Console games used to be (and may still be) synced tightly to the framerate, so gameplay actually changes if the framerate changes. PC games moved away from that a long time ago, so generally graphics and lighting are unrelated to AI, gameplay, and physics.
If code that's relevant to gameplay slows down, you either get a change in gameplay or the game actually slows down. The latter is preferred, but it's rare you'll see a game's speed actually slow down apart from the rendering framerate.

Game calculation speeds can be uncapped, uncapped up to the framerate, or capped at some arbitrary rate. In the case of Nvidia PhysX, it runs at its own simulation speed apart from the rest of the game, although few if any games use it in a way that affects anything other than visuals. I imagine this is because of the intended use scenario, where PhysX is calculated on a separate graphics card, so it's hard to tightly couple the physics simulation with the rest of the game. A CPU-only physics engine would be easier to couple to gameplay since there's no communication penalty.

As another example, many strategy games run the gameplay on a separate thread from the graphics. So it's quite common in a game like Civilization to get 60 fps but have the simulation speed be very, very slow.
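
In bare-bones form, that split looks something like this (the tick rate and function names are invented for illustration; a real engine would also guard the shared game state with locks or double-buffering):

Code:
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{true};

void simulationStep() { /* AI, turn processing, unit updates... */ }
void renderOneFrame() { /* draw whatever state is currently visible */ }

int main() {
    // Gameplay/simulation thread: ticks at its own pace (20 Hz here) and can
    // fall behind without dragging the render loop down with it.
    std::thread sim([] {
        while (running) {
            simulationStep();
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    });

    // Render loop on the main thread: pushes frames as fast as it can, so the
    // FPS counter can read 60+ even while the simulation crawls.
    for (int frame = 0; frame < 600; ++frame)
        renderOneFrame();

    running = false;
    sim.join();
    return 0;
}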
 
Silverforce11

Feb 19, 2009
10,457
10
76
Yup, physics (inc. PhysX usage) in Project Cars runs at 600 Hz.

Useful for all the rigs that can run that game at 600 fps...
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Here's the reference to devs talking about async compute:

https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=19m40s

Note the terminology used is extremely similar to what other gamedevs on B3D forums have said regarding GCN. It's a compute beast, able to run a lot of async compute.

Basically the ACE engines are useless in DX11 now, which is one of the reasons we see the extra shaders in Fury/X scaling poorly at lower resolutions. The GPU just can't be fed, as shown by Tom's Hardware's 1080p and 1440p power usage tests: it uses less power than at 4K, so there are lots of idle shaders.

Come DX12, there should be a nice spike in shader uptime for GCN, and in games with lots of async compute (or VR, for any of you willing to wear a mask while gaming... not weird at all!) it'll really shine.

I was wondering if async would result in higher power usage
 

greatnoob

Senior member
Jan 6, 2014
968
395
136
From what you've all said, it sounds like async compute is no different from multithreading in the CPU landscape, and if that's the case, does anybody know if we'll have to be dealing with semaphore-like context switching between workloads? How would you deal with synchronising tasks, for example where x depends on y and is updated at 16.6 ms but y finishes 166 ms later?

I was wondering if async would result in higher power usage

If anything, it should be lower when workloads are low (that is, if they can be turned off entirely; otherwise it'll be the same as it always has been).
 
Silverforce11

Feb 19, 2009
10,457
10
76
From what you've all said, it sounds like async compute is no different from multithreading in the CPU landscape, and if that's the case, does anybody know if we'll have to be dealing with semaphore-like context switching between workloads? How would you deal with synchronising tasks, for example where x depends on y and is updated at 16.6 ms but y finishes 166 ms later?

Which is why gamedevs on b3d have said don't be fooled by how many queues each engine can support. It works best with 1 extra async queue, more than that, it trashes data and makes it very difficult to work with.

This is why GCN with its 8 ACEs (queue engines) is superior for this DX12 workload: while each supports 8 queues, it works best with 1 added async queue. This is where the 1 rendering + 8 compute numbers were thrown around.

The context here is that Kepler (which can't even do async compute) & Maxwell have only one queue engine.

For further info we'll need input from gamedevs accustomed to DX12/Mantle programming; most of them reside on B3D, and I believe only zlatan frequents the AT forums.
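
On the earlier "x depends on y" question: in DX12 a dependency between queues is expressed with a fence - the producing queue signals a value when its work completes and the consuming queue waits for it on the GPU side, so the CPU doesn't block. Continuing the earlier hypothetical D3D12 setup (error handling omitted):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// computeQueue produces a result (say, a particle buffer) that gfxQueue consumes.
// The fence makes the graphics work start only after the compute work is done,
// without stalling the CPU or any earlier work already on either queue.
void submitWithDependency(ID3D12Device* device,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12CommandQueue* gfxQueue,
                          ID3D12GraphicsCommandList* computeList,
                          ID3D12GraphicsCommandList* gfxList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    ID3D12CommandList* comp[] = { computeList };
    computeQueue->ExecuteCommandLists(1, comp);
    computeQueue->Signal(fence.Get(), 1);   // "y is finished" -> fence reaches 1

    gfxQueue->Wait(fence.Get(), 1);         // GPU-side wait until the fence reaches 1
    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);  // "x" runs only after the signal
}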
 

Pottuvoi

Senior member
Apr 16, 2012
416
2
81
Lighting can be computed in two ways.

The classic way of doing it is forward rendering. You basically have to re-render the scene for each light to compute the change in lighting, so your performance is cut to roughly 1/(number of lights). It's hard to get above about 8 lights in a scene like this, but it doesn't really cause any other issues.

There's also deferred lighting, in which case lighting is deferred until the very end and just rendered on the final 2D image. Lighting is rendered at the resolution of the image, and you can do thousands of lights because it's more of a filter on the final image. However, it breaks anti-aliasing and in general doesn't look as good as forward rendering.
These are the basic ideas for how you render things, but there are plenty of ways in which you can light your scene; it's a world where anything goes. (There are tens of variations of forward and deferred.)

Then there are also more unusual approaches, like texture-based shading, in which you do the lighting in texture space.
In such a scheme, shading is done at a resolution independent of the screen resolution, and the shading frequency is not constrained by framerate at all. (This could work wonderfully with virtual/megatextures as well.)
A version of this is also used in Ashes of the Singularity.
https://www.youtube.com/watch?v=t9UACXikdR0
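
A toy C++ sketch of that idea (sizes and rates are made up): lighting is written into the object's own UV-space texture on its own schedule, and the per-frame render pass only samples it, so the shading cost is tied to neither the screen resolution nor the frame rate.

Code:
#include <vector>

struct Texel { float r = 0, g = 0, b = 0; };

struct ShadingAtlas {
    int size;
    std::vector<Texel> texels;
    explicit ShadingAtlas(int s) : size(s), texels(static_cast<size_t>(s) * s) {}
};

Texel computeLighting(int /*u*/, int /*v*/) { return Texel{0.5f, 0.5f, 0.5f}; }  // stub lights/BRDF

// Runs at its own cadence (every other frame here; could also be spread out).
void shadeAtlas(ShadingAtlas& atlas) {
    for (int v = 0; v < atlas.size; ++v)
        for (int u = 0; u < atlas.size; ++u)
            atlas.texels[static_cast<size_t>(v) * atlas.size + u] = computeLighting(u, v);
}

// The per-frame render pass is just a lookup, at whatever output resolution.
Texel sampleAtlas(const ShadingAtlas& atlas, float uNorm, float vNorm) {
    int u = static_cast<int>(uNorm * (atlas.size - 1));
    int v = static_cast<int>(vNorm * (atlas.size - 1));
    return atlas.texels[static_cast<size_t>(v) * atlas.size + u];
}

int main() {
    ShadingAtlas atlas(256);            // 256x256 in texture space, regardless of 1080p/4K output
    for (int frame = 0; frame < 120; ++frame) {
        if (frame % 2 == 0)
            shadeAtlas(atlas);          // shading at half the frame rate, for example
        Texel c = sampleAtlas(atlas, 0.5f, 0.5f);
        (void)c;                        // stand-in for drawing with the sampled color
    }
    return 0;
}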
 

flopper

Senior member
Dec 16, 2005
739
19
76
Ubisoft: what's DX12?
DICE: we are already doing it.
It won't matter how good DX12 is unless it's implemented and used.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
That's a pretty good explanation.

You can do all game calculations in two ways: independent of the framerate, or synced to the framerate. Console games used to be (and may still be) synced tightly to the framerate, so gameplay actually changes if the framerate changes. PC games moved away from that a long time ago, so generally graphics and lighting are unrelated to AI, gameplay, and physics.
If code that's relevant to gameplay slows down, you either get a change in gameplay or the game actually slows down. The latter is preferred, but it's rare you'll see a game's speed actually slow down apart from the rendering framerate.

Game calculation speeds can be uncapped, uncapped up to the framerate, or capped at some arbitrary rate. In the case of Nvidia PhysX, it runs at its own simulation speed apart from the rest of the game, although few if any games use it in a way that affects anything other than visuals. I imagine this is because of the intended use scenario, where PhysX is calculated on a separate graphics card, so it's hard to tightly couple the physics simulation with the rest of the game. A CPU-only physics engine would be easier to couple to gameplay since there's no communication penalty.

As another example, many strategy games run the gameplay on a separate thread from the graphics. So it's quite common in a game like Civilization to get 60 fps but have the simulation speed be very, very slow.

Still, some syncing has to be done, but it seems like Silverforce11's reply is the answer - much higher refresh rates.

Yup, physics (inc. PhysX usage) in Project Cars runs at 600 Hz.

Useful for all the rigs that can run that game at 600 fps...

That explains the no-need-to-sync issue

Ubisoft: what's DX12?
DICE: we are already doing it.
It won't matter how good DX12 is unless it's implemented and used.

And how you use it...
I still don't get what's so beautiful about "The Witcher 3" graphically, and yet it is very demanding on a system, even without this HairWorks thing.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
I was wondering if async would result in higher power usage

You would also finish more quickly

Which is why gamedevs on b3d have said don't be fooled by how many queues each engine can support. It works best with 1 extra async queue, more than that, it trashes data and makes it very difficult to work with.
[...]

I think that we'll see a change in the way things work. I expect some more ability to utilize the GPU for other things too (Super-OpenCL style)

Otherwise, why so many ACEs, except for pure compute?
I would think that you could power more than one "station"/PC with one GPU... that would be nice.
 

werepossum

Elite Member
Jul 10, 2006
29,873
463
126
Which is why gamedevs on b3d have said don't be fooled by how many queues each engine can support. It works best with 1 extra async queue, more than that, it trashes data and makes it very difficult to work with.

This is why GCN with its 8 ACEs (queue engines) is superior for this DX12 workload: while each supports 8 queues, it works best with 1 added async queue. This is where the 1 rendering + 8 compute numbers were thrown around.

The context here is that Kepler (which can't even do async compute) & Maxwell have only one queue engine.

For further info we'll need input from gamedevs accustomed to DX12/Mantle programming; most of them reside on B3D, and I believe only zlatan frequents the AT forums.
So for the immediate future, Kepler is at a disadvantage to GCN but Maxwell really is not unless and until developers develop and implement better systems of predicting what resources should be pre-calculated and at what time value, correct? Or are there some async calcs which are useful now but beyond Maxwell's async abilities?

Also, am I correct to think that right now this would be limited to effects physics without real effects on gameplay, but DX12 promises to also move gameplay physics onto the GPU if needed? Or will gameplay physics continue to be the province of CPUs for the foreseeable future?

This plus the consoles' AMD GPUs is still making me believe that DX12 is going to favor GCN over current-gen NVidia hardware in spite of the latter's fuller coverage of 12.1. And yet . . . Wasn't it here at Anandtech that early DX12 testing revealed Maxwell blowing the doors off GCN?
 
Silverforce11

Feb 19, 2009
10,457
10
76
If anything should convince you DX12 is going to favor GCN, it should be this:



Devs have praised GCN for DX12 publicly on tech forums; they even go so far as to highlight its ability for fine-grained execution of async compute beyond 1 thread/queue, to saturate the shaders.

But whether it has an impact in games will depend on the game. Just because it's a DX12 game does not mean it uses a lot of async compute features. It could simply be the low-level CPU overhead reduction being employed.

The Star Swarm demo favors NV, while the 3DMark draw-call bench favors AMD. Basically, synthetics tell us very little; we can only judge once a bunch of DX12 games hit.

Interesting times ahead, not least because I am looking forward to Battlefront & Deus Ex!
 

Sabrewings

Golden Member
Jun 27, 2015
1,942
35
51
Hmm, similar programming guides absolutely have to mean it's going to favor GCN! It can't possibly be just a high-level document that has no direct representation of the low-level code! By George, I think he's got it!

Just more unsubstantiated reaching, just like that patent for a high-reliability memory controller for a chip that he is purporting to be AMD's HBM controller patent. Similarities do not mean they are ==.
 