Ashes of the Singularity User Benchmarks Thread

sontin · Aug 24, 2015

VR Enthusiast said:
I'm not sure what Nvidia said but what Oxide says is true regarding there being no bug in the application. It is clear from the benchmarks (without MSAA) that this is not the reason why Nvidia scores so badly in DX12.

It is not about bugs. This engine is still under development and the developer warned reviewers not to use MSAA because their DX12 implementation is still experimental.

So what else is it if not MSAA and not the async shaders? The obvious answer is it is the async shaders.

Because DX12 is slower than DX11 so it must be "async shaders" and not the "Multi-Threaded Performance" or "Explicit Frame Management" or "Updated Memory Management Design" or whatever?!

The fact that "async shaders" doesnt cost any performance on hardware which doesnt support it is still ignored by you...

Silverforce11 · Aug 24, 2015

TheELF said:
Not what I was saying.
Console games will have async in the amount of what the apu-gpu can handle, this will be way less than what even a middle class desktop gpu can handle.

Async shaders are a potential bottleneck on current gen Nvidia cards ONLY IF YOU DEVELOP TO PUSH THE BIGGEST OF DESKTOP CARDS,ashes is doing that (and after a year of optimizing until it comes out a lot will change) but a classic console port will not.

FYI, PS4 has 8 ACE (same as recent GCN SKUs), XBone has 2. So devs will have to tone down usage for the Xbone, PC will probably have sliders, no worries.

Silverforce11 · Aug 24, 2015

sontin said:
The fact that "async shaders" doesnt cost any performance on hardware which doesnt support it is still ignored by you...

It's a fact?

Where can we verify this FACT?

Carfax83 · Aug 24, 2015

VR Enthusiast said:
Async shaders are a potential bottleneck on current gen Nvidia cards that devs will either have to remove to increase performance on Nvidia, or just leave it as it is and accept that Nvidia cards will perform less optimally. What do you think they'll do?

I have to agree with sontin, there is no evidence for this at all.

If you understood the nature of asynchronous shaders, you would know that this is quite a ridiculous assertion..

VR Enthusiast · Aug 24, 2015

sontin said:
The fact that "async shaders" doesnt cost any performance on hardware which doesnt support it is still ignored by you...

Carfax83 said:
If you understood the nature of asynchronous shaders, you would know that this is quite a ridiculous assertion..

I think the issue here is that you guys seem to be confusing AMD's (actual) async shaders with whatever rubbish hack Nvidia is trying to con you into believing is actual working async shaders.

A game that has been developed on GCN to make use of AMD's async shaders will likely run worse on Nvidia.

Silverforce11 · Aug 24, 2015

Carfax83 said:
I have to agree with sontin, there is no evidence for this at all.

If you understood the nature of asynchronous shaders, you would know that this is quite a ridiculous assertion..

Well it certainly isn't "spare cycles".

DX12 performance for NV in Ashes.
1. No async shaders (no dynamic lights), draw call test = Big perf boost.
2. Heavy test with dynamic lights (which the devs CONFIRMED is via async compute) = Small perf drop. Across the Kepler & Maxwell series.

A few possible conclusions you can draw from that. But to say there's no evidence that Async Compute is causing NV hardware some issues, is wrong. In fact, there's NO evidence it does not. There's NO evidence Maxwell excels at async shaders.

You would have evidence IF and WHEN you find gamedevs that works with DX12 which publicly say Maxwell is great at it, or a demo. Both will be viable evidence.

Remember this?

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-4#post-1867649

I don't know anything about Maxwell's async compute implementation, but I know that GCN gets huge benefits from it.

Carfax83 · Aug 24, 2015

VR Enthusiast said:
Look at that graph from Carfax above, you can see even the 370 gains with DX12 and it only has 2 ACEs. There are no benchmarks showing lower fps on AMD in DX12. That probably means that Ashes is nowhere near pushing the maximum limit (8 ACEs) on later GCN cards.

Or it could also mean that since AMD's DX11 performance is so terrible in this benchmark, that to go any lower would be going under the bottom of the barrel :awe:

sontin · Aug 24, 2015

Silverforce11 said:
It's a fact?

Yes it is. DX12 doesnt behave different to DX11. If the hardware doesnt support it the context switch is exactly the same. You dont get "negative performance" over DX11 because nothing changed. That's the reason why anyone is promoting it as a huge feature.

Page 32 - 34: http://on-demand.gputechconf.com/gtc/2015/presentation/S5561-Chas-Boyd.pdf

VR Enthusiast · Aug 24, 2015

sontin said:
Yes it is. DX12 doesnt behave different to DX11. If the hardware doesnt support it the context switch is exactly the same. You dont get "negative performance" over DX11 because nothing changed. That's the reason why anyone is promoting it as a huge feature.

Page 32 - 34: http://on-demand.gputechconf.com/gtc/2015/presentation/S5561-Chas-Boyd.pdf

Nobody said Nvidia didn't support it. What I'm saying is they are doing it badly.

We don't know if there are other benefits to DX12 that we can't see in the benchmarks, ie smoothness. Even though Nvidia's DX12 results look bad it could offer smoother gameplay compared to DX11.

Silverforce11 · Aug 24, 2015

sontin said:
Yes it is. DX12 doesnt behave different to DX11. If the hardware doesnt support it the context switch is exactly the same. You dont get "negative performance" over DX11 because nothing changed. That's the reason why anyone is promoting it as a huge feature.

Page 32 - 34: http://on-demand.gputechconf.com/gtc/2015/presentation/S5561-Chas-Boyd.pdf

There's nothing in p32-34 that supports your statement: "Async shaders doesnt cost any performance on hardware which doesn't support it".

The compute job is a queue, if it's not supported asynchronously it has to default back to the normal serial behavior and the compute task is done on the shaders AFTER rendering is done.

If a dev is calling for compute shaders to do dynamic lights on a scene, and the hardware can't handle that asynchronously, do you think the hardware is going to IGNORE the call and never render the lights? That would tantamount to cheating.

The default behavior (current APIs) is a serial pipeline, rendering gets done, then compute, then rendering, then compute. This is the expected behavior of Kepler since it cannot do async compute at all. So its no shock to see it lose performance when async compute is used.

Now you see why there's potential for async compute to cause delays in rendering if the hardware can't handle it. So the drop in Maxwell perf is either 1) Driver bug (remember everyone saying drivers have less impact in DX12 because devs talk to the hardware directly?? So its unlikely a driver issue), 2) Oxide's fault or 3) Maxwell is also gimped for async compute/shaders.

Your statement: "Async shaders doesnt cost any performance on hardware which doesn't support it" implies hardware which cannot support async compute ignores or discard the async compute task, never rendering it, ie. skipping it. There's no chance of that happening or there be a crapstorm as features aren't rendered, skipped. It will be rendered, just in serial and slower.

sontin · Aug 24, 2015

If the hardware doesnt support a graphic and a compute queue at the same time then the frame gets delayed till both queues are finished. And even then you can get more performance because certain hardware supports more than one compute queue to process compute task more efficiently.

The job of the developer is to look after the supported hardware and to optimize the behaviour of their engine for it. They cant blame the hardware vendor when their DX12 implementation isnt properly prepared for them.

Pottuvoi · Aug 24, 2015

If the speed problem is only evident with huge amount of dynamic lights, I'm willing to bet the problem is in 'render/shade to textures' part of the code or driver which handles it.

AotS is quite unique as it writes a lot to textures (Lighting etc.) and has to keep unique texture for each unit to do so.

Carfax83 · Aug 24, 2015

Silverforce11 said:
Well it certainly isn't "spare cycles".

DX12 performance for NV in Ashes.
1. No async shaders (no dynamic lights), draw call test = Big perf boost.
2. Heavy test with dynamic lights (which the devs CONFIRMED is via async compute) = Small perf drop. Across the Kepler & Maxwell series.

OK here's my question. Look at these benchmarks between DX11 and DX12. The benchmarks are presumably the exact same except for the API. The Radeon gains in all categories, but gains an especially MASSIVE increase in performance in the heavy load; well over a 100%...

The Geforce on the other hand, either gains a smidgen, or loses some.

DX12 benchmark

DX11 benchmark

If the benchmarks are the exact same, and the DX11 version is doing the SAME STUFF as the DX12 version but without the "benefit" of asynchronous compute which supposedly increases performance and efficiency, how on Earth can the Geforce lose performance instead of gaining it?

Remember that DX11 is acting in a serial manner which is the slowest possible way, but is doing the same workload as in the DX12 benchmark (which is done in a parallel fashion) but is actually faster in some cases for the Geforce card.

Or even more strange, the GTX 980 Ti in DX11 mode beats the Radeon in DX12 mode in everything but the heavy load

It makes no sense. You should not be able to lose performance because of asynchronous compute, given that it supposedly does not conflict with the rendering.. And the Radeon in DX12 mode should not be losing to the Geforce in DX11 mode. So something is broken here for sure.

Silverforce11 · Aug 24, 2015

@Carfax83

This is why I have asked if ANYONE has seen DX11 benching in action, I suspect the dynamic lights are DISABLED.

Because in DX12 mode, the CPU test (draw calls only), has no lights (some site found 80% gain for NV in this test). I've seen a screenshot of it in action already, confirmed no dynamic lights in that mode.

If we can see a screen of DX11 mode in action, we can confirm immediately. If its a pure draw call test like the DX12 CPU mode, then bang, there's the answer: NV is already excellent for draw calls in DX11, AMD is bad at high draw calls.

TheELF · Aug 24, 2015

VR Enthusiast said:
Ashes might be worst case but I don't think so. I think that when devs really start using async shaders ie pushing all 8 ACEs on a PS4 for example, then you'll start to see worst case for Nvidia.

They didn't up to now?
They didn't have to worry about dx11 from the get go and they have low level access, if they didn't until now why should they in the future?

I believe the only reason why the ps4 has the level of performance that it has with such weak hardware is because they already use as much of it as they can,they already use all of these features and that's why the ports run so badly on PCs.

AtenRa · Aug 24, 2015

Silverforce11 said:
Btw, does anyone have a video of Ashes running in DX11 mode? I wanna know if there's thousands of dynamic lights (all those units firing & explosions cast a light in DX12 via async compute), since I wonder how they actually do it (if its there) in DX11 at all.

https://www.youtube.com/watch?v=9bvpyvoNXcw

Carfax83 · Aug 24, 2015

Silverforce11 said:
@Carfax83

This is why I have asked if ANYONE has seen DX11 benching in action, I suspect the dynamic lights are DISABLED.

Because in DX12 mode, the CPU test (draw calls only), has no lights (some site found 80% gain for NV in this test). I've seen a screenshot of it in action already, confirmed no dynamic lights in that mode.

If we can see a screen of DX11 mode in action, we can confirm immediately. If its a pure draw call test like the DX12 CPU mode, then bang, there's the answer: NV is already excellent for draw calls in DX11, AMD is bad at high draw calls.

I have no in action footage, but from these screenshots from wccftech, the benchmark appears to be the same between DX11 and DX12 judging by the config window:

DX11 benchmark
DX12 benchmark

The only difference between the two is that the DX12 version displays more information.

VR Enthusiast · Aug 24, 2015

TheELF said:
They didn't up to now?
They didn't have to worry about dx11 from the get go and they have low level access, if they didn't until now why should they in the future?

I believe the only reason why the ps4 has the level of performance that it has with such weak hardware is because they already use as much of it as they can,they already use all of these features and that's why the ports run so badly on PCs.

This isn't the case. Very few games up till this point use async shaders. I think maybe 3 on PS4, none on Xbox and 1 or 2 on PC. From anandtech article,

The reason for this is that old habits die hard and there is always a time to make a transition to new methods. I expect to see many more games using async shaders from this point on especially now that DX12 allows for it, even though it will run worse on Nvidia you have to understand that the devs don't really care about Nvidia's much smaller PC market share compared to console.

TheELF · Aug 24, 2015

Silverforce11 said:
FYI, PS4 has 8 ACE (same as recent GCN SKUs), XBone has 2. So devs will have to tone down usage for the Xbone, PC will probably have sliders, no worries.

Yes,ps4 also has 6 x86 cores for running the game same as the fx-6xxx that doesn't mean that the ps4 has the same level of computing power as the fx-6.

ShintaiDK · Aug 24, 2015

VR Enthusiast said:

It didnt do anything performance wise for BF4 and Thief. Just adds more points to that AOTS is broken in its current state.

Digidi · Aug 24, 2015

shady28 said:
It's 20k calls per frame, you're right my math is off but so is yours

Already shown that it takes 1.2ms per frame spent on API calls on the 980, and 1ms on the 290, assuming 20k draws/frame.

Using that, the assumption is the game is running at 100fps. Lets say that is on the 290.

So the amount of time spent on the API calls is

980 : 100fps X 0.0012s/frame = 0.12s
290 : 100fps x 0.001s/frame = 0.1s

Those numbers are how much time doing draw calls they spent for all 100 frames, not for one frame. Another way of saying that is that during that second, the 980 will spend 20ms (0.020s) spread out across 100 frames more than the 290x is spending doing draw calls.

20ms (0.020s) is 1/50th of a second : 1000ms(x)=20ms solve for X= 0.02 or 2%

So in 1 second at 20k calls/frame and 100 frames/s , the 980 will spend 2% of its time (20ms) doing draw calls that the 290X is free to do other things during. The TOTAL time it will spend doing draw calls is 120ms vs the 290s 100ms.

So yes, that's more significant, but it still doesn't explain the huge performance drop-offs on this game. It would amount to ~2fps compared to a 100fps 290x (if all else is equal), pretty close to the 1.6fps you found.

In any case, my main point is that the ability of the GPU to handle parallel draw calls is not the hold-up here. The bottleneck is elsewhere.

Edit : This is where I messed up with the math in the earlier post -

That should read that the 980 is handicapped by spending an additional 20ms doing draw calls for 100 frames during that 1s, not .002 fps.

I also not thinking that nvidia is bottelnecks by the rasterizer. I only want to say thaz AMD is also not bottelnecked by raszerizer.

AMD furyx can do 18.000.000 drawcalls in 3dmark test. Each drawcall have 112 polygons. So the polygonoutput from AMD is 2.000.000.000 Polygons each second. Nvidia can only output 1.500.000.000 billion Polygons. So AMD's rasterizer is close to its maximum. Nvidia is far away.

Gcn can handle in the worste case (when one polygon is as big like a pixel) 4 pixel per clock. That means at 1050mhz you get 2.300.000.000 Polygons. Thats realy close to the drawcalls test.

If a polygon is big as a pixel under UHD AMD can output 290fps per second

2.400.000.00/(3840*2160)=290fps

VR Enthusiast · Aug 24, 2015

ShintaiDK said:
It didnt do anything performance wise for BF4 and Thief. Just adds more points to that AOTS is broken in its current state.

BF4 on PC didn't use async shaders. Thief on Mantle shows big perf increases which may or may not be down to async shaders.

http://wccftech.com/mantle-api-directx-thief-benchmarks-direct3d-creamed/

http://www.legitreviews.com/wp-content/uploads/2014/03/Normal-645x305.jpg

You did not respond to my previous question yet,

Do you accept that AMD's LiquidVR and Asynchronous shaders are better at removing latency than Nvidia's solution, which is unacceptable for VR as stated by established industry veterans?

ShintaiDK · Aug 24, 2015

VR Enthusiast said:
BF4 on PC didn't use async shaders. Thief on Mantle shows big perf increases which may or may not be down to async shaders.

http://wccftech.com/mantle-api-directx-thief-benchmarks-direct3d-creamed/

http://www.legitreviews.com/wp-content/uploads/2014/03/Normal-645x305.jpg

I have to disappoint you. No async shader benefit in Thief. You forgot to post that you picked a CPU limited case with a very weak AMD CPU. No "free performance".

Also we know Thief is broken on GCN 1.2. Is that the future we gonna see?

VR Enthusiast said:
You did not respond to my previous question yet,

I thought it was made clear I dont support your view.

Silverforce11 · Aug 24, 2015

AtenRa said:
https://www.youtube.com/watch?v=9bvpyvoNXcw

Thanks. It looks like its rendering the same workloads in DX11 vs DX12, therefore in DX11 the compute lighting is coming in serial after rendering.

So now we can narrow down to 2 possibilities for why in DX12 GPU heavy test, NV tanks in performance.

1. Oxide's implementation is poor/unoptimized for NV hardware.
2. Kepler/Maxwell incur an small overhead using DX12 async shaders.

We have no evidence to say which is the likely one. Speculate away!

Silverforce11 · Aug 24, 2015

ShintaiDK said:
I thought it was made clear I dont support your view.

Strange you say that since its the official view from NV themselves.

They still have work to do to improve their latency for a good VR experience!

"The standard VR pipeline from input in (when you move your head) to photons out (when you see the action occur in-game) is about 57 milliseconds (ms). However, for a good VR experience, this latency should be under 20ms. Presently, a large portion of this is the time it takes the GPU to render the scene and the time it takes to display it on the headset (about 26ms). To reduce this latency we've reduced the number of frames rendered in advance from four to one, removing up to 33ms of latency, and are nearing completion of Asynchronous Warp, a technology that significantly improves head tracking latency, ensuring the delay between your head moving and the result being rendered is unnoticeable.

Combined, and with the addition of further NVIDIA-developer tweaks, the VR pipeline is now only 25ms."

http://www.geforce.com/whats-new/ar...us-the-only-choice-for-virtual-reality-gaming

Ashes of the Singularity User Benchmarks Thread

Diamond Member

Lifer

Lifer

Diamond Member

Member

Lifer

Diamond Member

Diamond Member

Member

Lifer

Diamond Member

Senior member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Member

Diamond Member

Lifer

Junior Member

Member

Lifer

Lifer

Lifer