Ashes of the Singularity User Benchmarks Thread

Glo. · Aug 23, 2015

Enigmoid said:
This kind of thing is much easier to do in DX12 vs. DX11. You can simply send the GPU commands that work well on one set of hardware but not on another, and due to the lack of driver overhead, the driver cannot correct for this.

As a handwaving explanation consider programming in Java (DX11) and C++ (DX12). A well written C++ program should always be faster than the java equivalent. However java can be run with some very aggressive performance enhancements and libraries (driver intervention) so that a poorly written C++ program will run worse than that same program in java as the JRE can add performance enhancements. As well in a program written in C++ your performance optimizations can favour one architecture over another (ie loop unrolling).

Its the same thing with DX12 and Oxide (hate them or love them) have stated as much.

So in your mind whole DirectX 12 API is locked for Nvidia hardware?

Game engine executes itself on the API, and API on the hardware. There is nothing except this.

Yes, of course you can lock one vendor from performance. By using asynchronous shading. But that is the problem of the nature of the hardware you have. Now you see the problem? Parallelism is the way to go from now on.

Everything worked for both GPU vendors. Nvidia, which designed GPUs with planned obsolescence, AMD GPUs that made GPUs for yesterday, today and tomorrow, but already outdated and being bottlenecked by their nature in games.

14/16 nm battle will be extremely interesting, however Im starting to feel the winner already here.

Intel.

Enigmoid · Aug 23, 2015

crislevin said:
I can see your point that malicious developers can purposely hinder a vendor.

But if you are implying this is the case in AoS, or if you are suggesting nVidia's hardware capability is not the problem with DX12, or if you are saying allowing low-level access to GPU is somehow bad, then you would be ................... lets just say "wrong".

I'm saying that the developer needs to be careful. Low level hardware access brings benefits but to do a good job at using those benefits is hard. Architectures differ and optimizations for one may hurt another.

Digidi · Aug 23, 2015

Glo. said:
I think what people forget, and what pretty much gives more credit to Mahigan posts is the effect of more cores on Nvidia GPU in 4K resolution.

To 1600p resolution, we completely don't see any difference between the use of 4 or 6 cores, with, and without HT. The Arstechnicas test with R9 290X and GTX 980 Ti provides it.
http://cdn.arstechnica.net/wp-conte...ew-chart-template-final-full-width-3.0021.png - 4 cores, without HT.
http://cdn.arstechnica.net/wp-conte...ew-chart-template-final-full-width-3.0011.png - fully enabled CPU.

29 vs 32 FPS in DX12 mode with 4 vs 6 cores.

More work done by the CPU and scheduled by the ACE in Maxwell provides higher performance, only in 4K. The engine does not need more CPU work on lower resolutions therefore, we don't see of any benefit in them. 4K is something rather different.

The biggest problem about all this is:
Nvidia knew what will be with DX12, and deliberately released Maxwell architecture, to sell as many GPUs as they can. With DX12 performance gimped, now they can bring new design, which will make all of the people who bought Maxwell and older architectures, more likely to buy new hardware.
Planned obsolescence.
As for GCN. Unfortunately its the same story. Fury X will not fly in future games because of Rasterization Bottleneck.
All of our hardware in fact is already outdated for DX12 games if we look at the potential of the API, and future hardware will make that potential extremely possible, and that is a shame.

Furyx is not rasterizer bottlenecked. Otherwise the 3dmark DirectX 12 drawcalltest will not run good on AMD Hardware. Nvidia is rasterizer bottlenecked but it's not the rasterizer it self which make the bottleneck, the Problem which nvidia have is that the command processor can't feed enought the rasterizer!

A drawcall test is also always a test of the rasterizer. Each drawcall send a huge amount of polgons to the gpu. The gpu have to change the Polygons to pixel, thats what a rasterizer is doing. If the rasterizer lags the drawcalls will be much lower. But in the pure drawcalltest AMD beats out Nvidia!

Glo. · Aug 23, 2015

Thank you digidi, this makes everything even more interesting.

Azix · Aug 23, 2015

It's interesting how having bad drivers becomes a defense when nvidia is losing.

Also that people are so willing to assume the developer is shady but not so much with gameworks and other situations where nvidias involvement is bad for AMD but not the other way.

strange

shady28 · Aug 23, 2015

Digidi said:
A drawcall test is also always a test of the rasterizer. Each drawcall send a huge amount of polgons to the gpu. The gpu have to change the Polygons to pixel, thats what a rasterizer is doing. If the rasterizer lags the drawcalls will be much lower. But in the pure drawcalltest AMD beats out Nvidia!

You can calculate the effect of draw calls. Its impact on performance does not explain any of this.

Lets say you are in DX11 and can issue 2M draw calls per second, and in DX12 you can do 12M/s (from a PCPer bench on the 3dmark API).

Only about 10k-20k are used in AAA games (ref at bottom). So assume the game is going to ask for 20,000 draw calls/s.

DX11 MT :
980 Max 2.75M draw calls/s
290X Max 1.05M draw calls/s

DX12 :
980 Max 15.7M draw calls/s
290x Max 19.12M draw calls/

(game draw calls/s) / (Max draws/s) = % of time used for draw calls

Multiply %time by 1s to get the amount of time in s, or by 1000ms to get the number of milliseconds. I'm going express results in milliseconds.

Using the 20k call/s game :

DX11 / GTX 989 :
20k/2.75m = 7.2ms

DX11 / R9 290X :
20k/1.05m = 19ms

DX12 / GTX 980 :
20k / 15.7M = 1.2ms (833 fps)

DX12 / R9 290X :
20k / 19.12M = 1.0ms (1000fps)

The above means a 980 will take 0.2ms more for the actual calls when doing 20k draw calls per frame.

What does that do to fps if all else were equal?

Well it would be the difference in draw call time x the fps.

So if you have a game that runs at 100fps ignoring draw call time:

On the 980 the time doing draws would be (100frames/s) X (.0012s) = 0.12 fps

On the 290x the time doing draws would be (100frames/s) X (.001s) = 0.1 fps

So in other words, the 980 would be 'handicapped' by 0.02FPS - the time it spent issuing the draws.

In DX11 it is a lot more significant, by factor of 7-20. If you take 0.02fps and multiply it by 7 or 20 you get some fps hits that might be noticeable on a benchmark.

But the ability to issue the draw call itself is clearly so high in DX12 on both AMD and Nvidia that any perceivable bottlenecks have to be elsewhere.

References :

http://www.pcper.com/reviews/Editorial/What-Exactly-Draw-Call-and-What-Can-It-Do

Graphics vendors state that amazing PC developers should be able to push about 10,000 to 20,000 draw calls per frame with comfortable performance. This Assassin's Creed, on the other hand, was rumored to be pushing upwards of 50,000 at some points, and some blame its performance issues on that.

The number of simple draw calls that a graphics card can process in a second does not have a strong effect on overall performance. If the number of draw calls in the DirectX 12 results are modeled as a latency, which is not the best way to look at it but it helps illustrate a point, then a 10% performance difference is about five nanoseconds (per task). This amount of time is probably small compared to how long the actual workload takes to process.

antihelten · Aug 23, 2015

shady28 said:
So if you have a game that runs at 100fps ignoring draw call time:

On the 980 the time doing draws would be (100frames/s) X (.0012s) = 0.12 fps

On the 290x the time doing draws would be (100frames/s) X (.001s) = 0.1 fps

So in other words, the 980 would be 'handicapped' by 0.02FPS - the time it spent issuing the draws.

Your math here is a bit off.

It's not (100frames/s) X (.0012s), it's (100frames/s) X (.0012s/frame), which then gives you 0.12 s/s, or in other words for every one second of rendering minus draw calls you will need 0.12 seconds of draw calls. So rendering your 100 frames will take a total of 1.12 seconds, and thus run at 89.3 fps

the 290X example would run at 90.9 fps and thus the difference is 1.62 fps. this still isn't significant from a gaming experience pov of course, but it's significantly more than 0.02 fps.

Edit: On a second reading it's not really clear what your scenario is? are we talking 20k draw calls per second (and thus 200 draw calls per frame if running at 100 fps), or are we talking 20k draw calls per frame (and thus 2M draw calls per second if running at 100fps)? My correction is based on the latter scenario (20k draw calls per frame)

Silverforce11 · Aug 23, 2015

It should be calculated as 10-20K PER FRAME. The example of ACU pushing up to 50K per frame would definitely cause a CPU bottleneck, as 60 fps = 3M per second.

antihelten · Aug 23, 2015

Silverforce11 said:
It should be calculated as 10-20K PER FRAME. The example of ACU pushing up to 50K per frame would definitely cause a CPU bottleneck, as 60 fps = 3M per second.

Agreed, 200 draw calls per frame is unrealistically low, even for DX11 (by an order of magnitude roughly)

shady28 · Aug 23, 2015

antihelten said:
Edit: On a second reading it's not really clear what your scenario is? are we talking 20k draw calls per second (and thus 200 draw calls per frame if running at 100 fps), or are we talking 20k draw calls per frame (and thus 2M draw calls per second if running at 100fps)? My correction is based on the latter scenario (20k draw calls per frame)

It's 20k calls per frame, you're right my math is off but so is yours

Already shown that it takes 1.2ms per frame spent on API calls on the 980, and 1ms on the 290, assuming 20k draws/frame.

Using that, the assumption is the game is running at 100fps. Lets say that is on the 290.

So the amount of time spent on the API calls is

980 : 100fps X 0.0012s/frame = 0.12s
290 : 100fps x 0.001s/frame = 0.1s

Those numbers are how much time doing draw calls they spent for all 100 frames, not for one frame. Another way of saying that is that during that second, the 980 will spend 20ms (0.020s) spread out across 100 frames more than the 290x is spending doing draw calls.

20ms (0.020s) is 1/50th of a second : 1000ms(x)=20ms solve for X= 0.02 or 2%

So in 1 second at 20k calls/frame and 100 frames/s , the 980 will spend 2% of its time (20ms) doing draw calls that the 290X is free to do other things during. The TOTAL time it will spend doing draw calls is 120ms vs the 290s 100ms.

So yes, that's more significant, but it still doesn't explain the huge performance drop-offs on this game. It would amount to ~2fps compared to a 100fps 290x (if all else is equal), pretty close to the 1.6fps you found.

In any case, my main point is that the ability of the GPU to handle parallel draw calls is not the hold-up here. The bottleneck is elsewhere.

Edit : This is where I messed up with the math in the earlier post -

So in other words, the 980 would be 'handicapped' by 0.02FPS - the time it spent issuing the draws.

That should read that the 980 is handicapped by spending an additional 20ms doing draw calls for 100 frames during that 1s, not .002 fps.

Carfax83 · Aug 23, 2015

Silverforce11 said:
Lionhead Studios disagree with you.

https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=20m59s

Yeah, Fable Legends uses asynchronous compute, and Fable Legends definitely runs faster on NVidia hardware in DX12 mode compared to DX11 as shown by those videos I posted earlier..

So why the discrepancy between Fable Legends and AotS? I still think it's because Oxide just haven't spent much time optimizing the NVidia path as they have the AMD path for obvious reasons, ie being an AMD partner..

Not because of some conspiracy though or anything nefarious. I don't think Oxide would do such a thing, as they would be hurting their potential sales due to NVidia's dominance of the discrete GPU market, and they haven't shown themselves to be unethical about such things in the past.

At this stage, it's relatively harmless despite causing verbose discussion all across the net about NVidia's DX12 capabilities, since the game is still in alpha..

Silverforce11 said:
It should be calculated as 10-20K PER FRAME. The example of ACU pushing up to 50K per frame would definitely cause a CPU bottleneck, as 60 fps = 3M per second.

ACU never pushed 50K draw calls. That was just an unsubstantiated rumor..

Silverforce11 · Aug 23, 2015

Carfax83 said:
Yeah, Fable Legends uses asynchronous compute, and Fable Legends definitely runs faster on NVidia hardware in DX12 mode compared to DX11 as shown by those videos I posted earlier..

Because Ashes also run faster on NV when you test draw call mode, without dynamic lighting/async compute. You have to ask Fable devs for more info at what's going on in the video to be certain.

If you mean this video: https://www.youtube.com/watch?v=Z_XLX7qYmGY

It's the UAV implementation of DX12 giving them a speed up. Notice they mention it specifically and also look at the spell bolts & weapon effects, NO DYNAMIC LIGHTING present in that test case.

You can compare it directly to the presentation they did here: https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=21m45s

Look at the spell bolt travelling on the ground, the dynamic lighting it gives off, compared to the one demoing UAV, nothing.

Carfax83 · Aug 23, 2015

Silverforce11 said:
It's the UAV implementation of DX12 giving them a speed up. Notice they mention it specifically and also look at the spell bolts & weapon effects, NO DYNAMIC LIGHTING present in that test case.

Yes, DX12 has typed UAV which apparently can give performance increases compared to the standard UAV in DX11..

But how do you know there was no dynamic lighting present in the video? Fable Legends uses a dynamic GLOBAL illumination system. So all light sources are dynamic, including the Sun. That's how they are able to have time of day.

And this system relies heavily on compute shaders as stated in the blogpost, and will use asynchronous compute capabilities in both Maxwell and GCN GPUs.

Video showcasing dynamic global illumination.

Lionhead studios blogpost about dynamic global illumination in Fable Legends.

shady28 · Aug 23, 2015

Carfax83 said:
At this stage, it's relatively harmless despite causing verbose discussion all across the net about NVidia's DX12 capabilities, since the game is still in alpha..

Definitely in alpha. I downloaded it and right now, it really doesn't hold a candle to SupCom FA from 2008. There are only a few aspects where the graphics look a lot better, but the overall package is far weaker.

That said they have almost an entire year to finish it. They need it.

SupCom FA:

Ashes :

Silverforce11 · Aug 23, 2015

Carfax83 said:
Yes, DX12 has typed UAV which apparently can give performance increases compared to the standard UAV in DX11..

But how do you know there was no dynamic lighting present in the video? Fable Legends uses a dynamic GLOBAL illumination system. So all light sources are dynamic, including the Sun. That's how they are able to have time of day.

I don't know, I'm just saying from visuals, its possible.

In the video you mentioned: https://www.youtube.com/watch?v=Z_XLX7qYmGY

There's no dynamic light source for spells & weapon effects. In the demo showed where they mention async compute, there is. So from a viewer PoV, one's very static, other isn't.

If you want more info you have to talk to Lionhead. Otherwise let's not assume to be correct, positive or negative for anything from the vids.

Edit: The contention here is that NV drops off in performance in Ashes in the full test (but gains perf in the draw call test!) is perhaps due to poor async compute/shading performance OR poor implementation by Oxide. For the blame on Oxide to be true, you would need to show proof that Kepler/Maxwell is great at async compute/shading. As I said, I only see examples of gamedevs praising async compute/shading for GCN, not for Kepler/Maxwell.

antihelten · Aug 23, 2015

shady28 said:
It's 20k calls per frame, you're right my math is off but so is yours

Already shown that it takes 1.2ms per frame spent on API calls on the 980, and 1ms on the 290, assuming 20k draws/frame.

Using that, the assumption is the game is running at 100fps. Lets say that is on the 290.

So the amount of time spent on the API calls is

980 : 100fps X 0.0012s/frame = 0.12s
290 : 100fps x 0.001s/frame = 0.1s

Those numbers are how much time doing draw calls they spent for all 100 frames, not for one frame. Another way of saying that is that during that second, the 980 will spend 20ms (0.020s) spread out across 100 frames more than the 290x is spending doing draw calls.

20ms (0.020s) is 1/50th of a second : 1000ms(x)=20ms solve for X= 0.02 or 2%

So in 1 second at 20k calls/frame and 100 frames/s , the 980 will spend 2% of its time (20ms) doing draw calls that the 290X is free to do other things during. The TOTAL time it will spend doing draw calls is 120ms vs the 290s 100ms.

So yes, that's more significant, but it still doesn't explain the huge performance drop-offs on this game. It would amount to ~2fps compared to a 100fps 290x (if all else is equal), pretty close to the 1.6fps you found.

In any case, my main point is that the ability of the GPU to handle parallel draw calls is not the hold-up here. The bottleneck is elsewhere.

Edit : This is where I messed up with the math in the earlier post -

That should read that the 980 is handicapped by spending an additional 20ms doing draw calls for 100 frames during that 1s, not .002 fps.

Actually my math wasn't off, as the numbers you just posted here are essentially the same as mine (with the exception that I converted my numbers to fps)

89.3 fps for the 980 is the same as 11.2 ms per frame, or a total of 1.12 seconds for 100 frames, which is the same as 1 second plus 120 ms (the same number as yours).

90.9 fps for the 290 is the same as 11 ms per frame, or a total of 1.1 second for 100 frames, which is the same as 1 second plus 100 ms (again the same number as yours).

This then gives a difference of 1.6 fps or 0.2 ms per frame (20 ms for 100 frames). If you normalize the 290 to 100 fps, then the 980 would be running at 98 fps, and the difference becomes 2 fps as you mentioned (and still the same 0.2 ms per frame). So the overall difference in fps depends upon what level you normalize to, but the difference in ms remains constant.

For what it's worth I absolutely agree with your overall point of the bottleneck being elsewhere.

TheProgrammer · Aug 23, 2015

VR Enthusiast said:
They don't have the hardware ACEs that AMD has. Just check a block diagram of GCN and Maxwell and you can see AMD has ACEs while Nvidia has none.

I found this forum due to a post by Zlatan (a PS4 dev) while searching for more information. I recommend paying attention to his posts.

Recently again he has pointed out the deficiencies in Maxwell's VR,

And this post further down.

Another post by Zlatan covers this.

Nvidia just doesn't have the hardware and it's something they can't optimise around. Even with Pascal they still won't have anything like Mantle. AMD's VR superiority is real and assured for at least two more years. This is why all VR games are being developed on LiquidVR. That's not to say that GameWorks VR wont get development time too - it will - but it will always be an inferior option.

Even an R9 380 will offer a better VR experience than a Titan X.

Excellent post and accurate information. It's time to scramble to get away from Nvidia and buy into GCN if you haven't already.

Times have changed.

TheELF · Aug 24, 2015

TheProgrammer said:
Excellent post and accurate information. It's time to scramble to get away from Nvidia and buy into GCN if you haven't already.

Times have changed.

Sure,and also dump your amd cpus since now the i3 is officially faster then fx-8xxx in dx12 games.

Silverforce11 · Aug 24, 2015

TheELF said:
Sure,and also dump your amd cpus since now the i3 is officially faster then fx-8xxx in dx12 games.

Probably nothing can help save the faildozer.

Zen is their only hope.

ocre · Aug 24, 2015

Yeah, some alleged ps4 dev says maxwell might have to use a context switch which might have a penalty and at the same time admits that it might not even be useful to have more than one asynchronous pipeline.

Sounds like for sure nvidia is doomed. Yep...

Makes perfect good sense right?

Oh yeah, then he says they don't have a chance because of mantle.

TheELF · Aug 24, 2015

The thing is that 99% (if not more) of AAA games coming out are console ports and people are sitting here talking about the 1 game designed on pc for pc that is showcasing a worst case scenario for drawcalls,sure its a valid point for this kind of games, but this kind of games will be very sparse.
And it's not even finished yet so you can't draw conclusions yet.

Silverforce11 · Aug 24, 2015

TheELF said:
The thing is that 99% (if not more) of AAA games coming out are console ports and people are sitting here talking about the 1 game designed on pc for pc that is showcasing a worst case scenario for drawcalls,sure its a valid point for this kind of games, but this kind of games will be very sparse.
And it's not even finished yet so you can't draw conclusions yet.

Who is drawing conclusions?

I'm only speculating as to WHY DX12 scales for NV under "worse case" heavy draw call test in Ashes (so devs have DX12 optimized fine for NV!) but it tanks in GPU testing with dynamic lights via async compute.

I lay no blame, not on Oxide, not on NV. Curiously, with the VR latency & Async Warp info (provided by NV themselves), it seems Maxwell is perhaps gimped with Async Shaders. It seems a logical explanation for the observed results in Ashes.

Until more evidence is available, its only a speculation.

Drawing conclusions would be to claim with certainty that Oxide is being dirty due to AMD sponsorship.

3DVagabond · Aug 24, 2015

TheELF said:
Sure,and also dump your amd cpus since now the i3 is officially faster then fx-8xxx in dx12 games.

The CPU forum is ^ that a way.

3DVagabond · Aug 24, 2015

ocre said:
Yeah, some alleged ps4 dev says maxwell might have to use a context switch which might have a penalty and at the same time admits that it might not even be useful to have more than one asynchronous pipeline.

Sounds like for sure nvidia is doomed. Yep...

Makes perfect good sense right?

Oh yeah, then he says they don't have a chance because of mantle.

Can't refute the message attack the messenger? When you say alleged, are you accusing the poster of lying, or just trying to cast shadows?

Are you saying that DX12 can only use one asynchronous pipeline? And if not, then how do you propose Maxwell deal with it?

TheELF · Aug 24, 2015

Silverforce11 said:
Who is drawing conclusions?

I said you can't draw conclusions,not that someone is drawing conclusions.

Ashes of the Singularity User Benchmarks Thread

Diamond Member

Platinum Member

Junior Member

Diamond Member

Golden Member

Platinum Member

Golden Member

Lifer

Golden Member

Platinum Member

Diamond Member

Lifer

Diamond Member

Platinum Member

Lifer

Golden Member

Member

Diamond Member

Lifer

Golden Member

Diamond Member

Lifer

Lifer

Lifer

Diamond Member