Ashes of the Singularity User Benchmarks Thread

Silverforce11 · Aug 31, 2015

Shivansps said:
Reemplace "Compute" for "Mantle" and its the exact same argument from Mantle launch hype... again how Mantle is doing howdays?

So i hope you understand that argument alone does not work.

Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".

dzoni2k2 · Aug 31, 2015

Hitman928 said:
Here is Tahiti (GCN1.0) for comparison, it's not mine, I pulled the numbers from the thread to graph them.

And that's how it should look if asynchronous compute was actually working as it should.

Shivansps · Aug 31, 2015

Silverforce11 said:
Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".

Sure, thats probably why those APIs are AMD only.

Again you are using the exact same arguments used for mantle defense, why should i think anything will turn out diferently now? And you dont seem to know what to say.

You are talking about a feature that is petty much AMD only right now, no much diferently of DX11 and the Deferred Contexts that where Nvidia only that was rarely used. the only thing that may be different now is that xone is running on DX12, contrary to what happened with mantle, its the only thing that may shift the balance.

TheProgrammer · Aug 31, 2015

ocre said:
<snip snip>
Ultimately, we have very little data and a lot of wishful thinking. The real games of dx12 will be all that matters, not some alpha benchmark that starts up with an AMD logo on it

Sounds to me like you regret getting robbed by Nvidia for that 980. :biggrin:

The only wishful thinking is that you could turn it into a Sapphire Tri-X R9 Fury. :whiste:

Trolling is against the forum rules and you should know that.

-Rvenger

Silverforce11 · Aug 31, 2015

Shivansps said:
Sure, thats probably why those APIs are AMD only.

Wow, we still have deniers? You must not have kept up to date!

https://www.khronos.org/assets/uplo...y/2015-gdc/Valve-Vulkan-Session-GDC_Mar15.pdf <-- As official as you can get

desprado · Aug 31, 2015

Silverforce11 said:
So @Shivansps says Compute is no big deal in games.. devs say it is, thats why they wanted next-gen APIs.

Sony defnitely thinks its important, thats why they help AMD designed GCN with 8 ACEs, and PS4 games have already started to use Async Compute to give some some performance gains.

As said, when working with weaker hardware on consoles, devs will have to find methods to extract peak performance out of it, so ofc the statement from Oxide that console devs are pushing the console's ACEs in next-gen games makes perfect sense.

The question is what will be done for those heavy async compute console games when they get ported to PC? This is where you are free to speculate. But to say consoles don't use async compute, is wrong.

Be specific which developers claimed or asked AMD for low level API?

DICE and Oxide?

So these are only the one who asked?

Do you have proper names of developers who claimed or are you just quoting AMD that many developers asked them for level API which they never told the name even in the interview they never told the name.

Plz post your theroy on logic so that later you do not get disappointed like you just recently claiming that Fury X will be 20% or 40% faster than Titan X so it better you just come to reality.

dogen1 · Aug 31, 2015

Shivansps said:
Sure, thats probably why those APIs are AMD only.

Again you are using the exact same arguments used for mantle defense, why should i think anything will turn out diferently now? And you dont seem to know what to say.

You are talking about a feature that is petty much AMD only right now, no much diferently of DX11 and the Deferred Contexts that where Nvidia only that was rarely used. the only thing that may be different now is that xone is running on DX12, contrary to what happened with mantle, its the only thing that may shift the balance.

Both XB1 and PS4 games use async compute, if that's what you're talking about.

Silverforce11 · Aug 31, 2015

desprado said:
Plz post your theroy on logic so that later you do not get disappointed like you just recently claiming that Fury X will be 20% or 40% faster than Titan X so it better you just come to reality.

I never claimed such. Go find my post, dare you to prove me wrong.

I've always said I expect or hope it to be ~15% above Titan X, because that is when I will consider throwing $ at it to upgrade. It did not meet my expectations, so I have not bought Fury X.

Now, if you have anything else of value to add besides the usual "Mantle is useless, Async Compute is useless"..

Oh, here's the source where Sony thought that Async Compute was an important feature by 2015 (from 2009!):

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1

desprado · Aug 31, 2015

Silverforce11 said:
I never claimed such. Go find my post, dare you to prove me wrong.

I've always said I expect or hope it to be ~15% above Titan X, because that is when I will consider throwing $ at it to upgrade. It did not meet my expectations, so I have not bought Fury X.

Now, if you have anything else of value to add besides the usual "Mantle is useless, Async Compute is useless"..

Oh, here's the source where Sony thought that Async Compute was an important feature by 2015 (from 2009!):

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1

I am asking the name of those developers who asked AMD for low API.So in short can you provide those "many developers" names.
If you do not have names than plz you do not need to post base on fanboyism.

I have a pic that you were calming that Fury X will be 20% faster than Titan X. Therefore, it is not hard to talk sense and base on reality.

Erenhardt · Aug 31, 2015

Silverforce11 said:
980Ti

GCN has the same time for either compute, or combined graphics + compute async.

Hitman928 said:
Here is Tahiti (GCN1.0) for comparison, it's not mine, I pulled the numbers from the thread to graph them.

Interesting...
If you put one graph on the other you can see where tahiti matches 980ti in performance.

So, it is possible that with enough graphics offloaded to compute 280X will be faster than 980ti thanks to async compute.

Ofcourse it depends on how much of that will be in games. Fun times ahead.

selni · Aug 31, 2015

Silverforce11 said:
Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".

We do appear to be seeing some of the traditional drawbacks of low level APIs here with DX12 though - you often need significantly different codepaths per hardware platform to get good performance. Eg, mantle implementations had some initial issues with newer GCN revisions (thinking of tonga here) and nvidia DX12 performance here is actually worse than DX11.

It's still looking promising judging from the industry reception, but there's some warning signs appearing here too.

Magee_MC · Aug 31, 2015

I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

From what I can tell from the discussions and information presented so far is that while yes, Maxwell can do asynchronous compute, its hardware implementation requires a context switch that in essence not only negates the increased performance that asynchronous compute is used to extract, the context switch actually degrades performance compared to not using it.

If this is true, then for Maxwell, the only options that NV will have is to have the devs add code to check for a Maxwell card, and if one is detected then run the code without asynchronous compute, or have the drivers tell the engine that asynchronous compute isn't available on the card.

One thought for a possible solution might be, if a system is running in SLI have one card do only compute and the other do graphics and async compute. This might allow the card only doing compute to use async compute without a context change, but I have no idea if it's even possible in DX12 or how much effort it would take.

The real billion dollar question that needs to be answered is when did NV realize that async compute was going to be such an important part in increasing game speed in the new APIs (DX12, Vulkan, PS4).

AMD and Dice announced that they were working on Mantle in September of 2013 and both D3D12 and Pascal were announced in March of 2014. I would expect that work on Pascal had started sometime before then, but how much before, and most importantly when was the design finalized? It's possible that NV could have known that something was in the wind since in early 2013 the word was that the PS4 had added 8 ACEs, and D3D12 had been in development well prior to that.

So as I see it the choices are; either NV realized that they needed to revamp how they handle async compute before Pascal was finished being designed, and made the needed changes, or they didn't realize what was coming before Pascal was complete. If it's the first, then they'll be ok. By the time DX12 is really ramping up, they'll have Pascal and an upgrade path. If not, then they really have to have it in Volta, but that leads to the same question, since Volta was announced in March of 2013 how will it handle async compute and where is it in the design process.

Very interesting times are ahead.

littleg · Aug 31, 2015

Would this also explain the higher latency NV cards are seeing in VR applications?

Silverforce11 · Aug 31, 2015

Magee_MC said:
I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

They were silent for awhile with the 3.5gb 970. Then their PR came out and called it a feature.

Their PR is no doubt busy preparing a statement for this. Let me guess what's gonna be included:

1. "We said we support async compute with Maxwell 2, and it's true, it does work. [It's just not any good at it!]"

2. "Async compute is one of many new features in DX12 that game devs can use to increase performance and visuals, we will work with developers to ensure DX12 games will run well on our hardware"

3. "Async compute is a small part of DX12. We are fully FL12.1 ready, 12.1 is bigger than 12"

4. "Pascal will arrive soon, it's much better at Async Compute, so for users who are unhappy, we have given them a viable upgrade route"

Silverforce11 · Aug 31, 2015

littleg said:
Would this also explain the higher latency NV cards are seeing in VR applications?

Yup. Async Timewarp, can't timewarp when it's stuck in traffic waiting for the graphics task in front to finish rendering.

AtenRa · Aug 31, 2015

ocre said:
You cannot ignore the fact that the PC gaming market is 80% nvidia.

For the millionth time, NVIDIA doesnt have 80% of the PC gaming market. They just shipped 80% of Desktop dGPUs in Q2 2015 only.

PC gaming market is comprise of Desktop APUs, Desktop dGPUs, Laptop APUs and Laptop dGPUs.
So NVIDIA marketshare in PC Gaming is not that high as people make it or believe it is.

Erenhardt · Aug 31, 2015

Magee_MC said:
I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

From what I can tell from the discussions and information presented so far is that while yes, Maxwell can do asynchronous compute, its hardware implementation requires a context switch that in essence not only negates the increased performance that asynchronous compute is used to extract, the context switch actually degrades performance compared to not using it.

If this is true, then for Maxwell, the only options that NV will have is to have the devs add code to check for a Maxwell card, and if one is detected then run the code without asynchronous compute, or have the drivers tell the engine that asynchronous compute isn't available on the card.

One thought for a possible solution might be, if a system is running in SLI have one card do only compute and the other do graphics and async compute. This might allow the card only doing compute to use async compute without a context change, but I have no idea if it's even possible in DX12 or how much effort it would take.

The real billion dollar question that needs to be answered is when did NV realize that async compute was going to be such an important part in increasing game speed in the new APIs (DX12, Vulkan, PS4).

AMD and Dice announced that they were working on Mantle in September of 2013 and both D3D12 and Pascal were announced in March of 2014. I would expect that work on Pascal had started sometime before then, but how much before, and most importantly when was the design finalized? It's possible that NV could have known that something was in the wind since in early 2013 the word was that the PS4 had added 8 ACEs, and D3D12 had been in development well prior to that.

So as I see it the choices are; either NV realized that they needed to revamp how they handle async compute before Pascal was finished being designed, and made the needed changes, or they didn't realize what was coming before Pascal was complete. If it's the first, then they'll be ok. By the time DX12 is really ramping up, they'll have Pascal and an upgrade path. If not, then they really have to have it in Volta, but that leads to the same question, since Volta was announced in March of 2013 how will it handle async compute and where is it in the design process.

Very interesting times are ahead.

Or it is planned obsolesce. Without much improvement the next gpu lineup will be a lot faster in dx12 thanks to one change that enabled async compute.

They will release new GPUs about a year into DX12. It is not much of a window for amd to get traction.

People will downplay it as usual, protecting their buying decision. And the minute nv releases dx12 card that support async compute, they will upgrade to the new nv gpu immediately.

NV dictates trends here, and async compute will be important when nv says so.

AMD took their sweet time to leverage those ACE units we saw with first GCN cards 4 years ago..

sontin · Aug 31, 2015

dzoni2k2 said:
And that's how it should look if asynchronous compute was actually working as it should.

Have you looked at all of the numbers?
Fury X:https://forum.beyond3d.com/posts/1869030/

one kernel/batch:
Graphics: 25,18ms
Compute: 49,65ms
G+C: 55,93ms

And a GTX970: https://forum.beyond3d.com/posts/1869008/
One kernel/batch:
Graphics: 32,13ms
Compute: 9,77ms
G+C: 41,63ms

Up to 32 batches the GTX970 is faster than the Fury X. Only after this the GTX970 gets slower.

Using a compute queue on AMD hardware is introducing a huge latency. That's the reason why asynchronous compute is "free" for them.

That is really ironic, huh? :biggrin:

Silverforce11 · Aug 31, 2015

sontin said:
Have you looked at all of the numbers?
Fury X:https://forum.beyond3d.com/posts/1869030/

One queue:
Graphics: 25,18ms
Compute: 49,65ms
G+C: 55,93ms

And a GTX970TI: https://forum.beyond3d.com/posts/1869008/
One queue:
Graphics: 32,13ms
Compute: 9,77ms
G+C: 41,63ms

Up to 32 queues the GTX970TI is faster than the Fury X. Only after this the GTX970 gets slower.

Using a compute queue on AMD hardware is introducing a huge latency. That's the reason why asynchronous compute is "free" for them.

That is really ironic, huh? :biggrin:

You have to understand the application better to draw that context, but I was expecting you or ocre to point that out. Please ask the programmer of the app for better clarity.

Carfax83 · Aug 31, 2015

Magee_MC said:
From what I can tell from the discussions and information presented so far is that while yes, Maxwell can do asynchronous compute, its hardware implementation requires a context switch that in essence not only negates the increased performance that asynchronous compute is used to extract, the context switch actually degrades performance compared to not using it

Apparently, AMD's ACEs use context switches as well. There's lots of assumptions being thrown about, as usual. The guys over at beyond3d are a lot more sober when it comes to discussing this compared to this forum with all of its biases and partisanship.

For instance, someone at beyond3d said that Maxwell 2 definitely supports asynchronous compute as it's mentioned in the CUDA developer toolkit.

And from Andrew Lauritzen, something that some of us including myself have been saying in this thread:

And let's remember, an ideal architecture would not require additional parallelism to reach full throughput, so while the API is nice to have, seeing "no speedup" from async compute is not a bad thing if it's because the architecture had no issues keeping the relevant units busy without the additional help It is quite analogous to CPU architectures that require higher degrees of multi-threading to run at full throughput vs. ones with higher IPCs.

Source

3DVagabond · Aug 31, 2015

AtenRa said:
For the millionth time, NVIDIA doesnt have 80% of the PC gaming market. They just shipped 80% of Desktop dGPUs in Q2 2015 only.

PC gaming market is comprise of Desktop APUs, Desktop dGPUs, Laptop APUs and Laptop dGPUs.
So NVIDIA marketshare in PC Gaming is not that high as people make it or believe it is.

But it sounds good. It's just one stop for the hype train as it keeps rolling along.

sontin · Aug 31, 2015

Silverforce11 said:
You have to understand the application better to draw that context, but I was expecting you or ocre to point that out. Please ask the programmer of the app for better clarity.

Huh? I see a huge 25ms penalty when using a compute queue next to the graphics queue. It doubles the time to process the informationen over graphics only and put the Fury X behind a GTX970.

Using compute workload is adding a processing penalty to AMD's GCN architecture. So developers need to use a huge amount of kernels to bring AMD at the same performance level as nVidia when using compute...

Silverforce11 · Aug 31, 2015

sontin said:
Huh? I see a huge 25ms penalty when using a compute queue next to the graphics queue. It doubles the time to process the informationen over graphics only and put the Fury X behind a GTX970.

Using compute workload is adding a processing penalty to AMD's GCN architecture. So developers need to use a huge amount of kernels to bring AMD at the same performance level as nVidia when using compute...

Because he said he develop it to only test whether async compute is functional or not, it's not a benchmark. You are jumping to conclusion thinking its a benchmark. So if you want to use the result as such, go ask the developer for his thoughts to prevent misinterpretation.

@Carfax83
CUDA and DX12 are different animals, please don't confuse the two. As far as we know, Hyper-Q is fully functional in CUDA, with 32 compute queues running in parallel.

sontin · Aug 31, 2015

And yet his programm shows us that AMD has a huge latency problem when using the compute pipeline of their architecture.

"Asynchronous shaders" is not a feature, it is necessary for AMD to utilize their architectur in a best way. On the other hand using only compute workload and the graphics queue is waiting for the result of the compute queue will hurt AMD when developers dont have the workload to saturate the compute units.

Silverforce11 · Aug 31, 2015

sontin said:
And yet his programm shows us that AMD has a huge latency problem when using the compute pipeline of their architecture.

"Asynchronous shaders" is not a feature, it is necessary for AMD to utilize their architectur in a best way. On the other hand using only compute workload and the graphics queue is waiting for the result of the compute queue will hurt AMD when developers dont have the workload to saturate the compute units.

Do you remember Starswarm and the folks here who tried to use that as a benchmark to compare different GPUs?

Do you remember the 3dMark DX12 API test? <- synthetic best case scenario draw call test.

In those threads, I said both were synthetics, don't draw conclusions.

You repeat the same mistakes others have in the prior examples. Even worse because the developer has said its only to test for functional async compute or not. You had better make an account on b3d and ask for more clarification before you jump to such definitive statements.

This is what the dev had to say:

It's running from 1 to 128 single lane compute kernels which are quite long and require minimal amount of bandwidth. The graphics queue is basically just pushing fillrate with triangles occupying 4k x 4k offscreen render target.
So basically the best possible case.

It's a synthetic best case scenario test. Its purpose is to test for a feature support or not.

Ashes of the Singularity User Benchmarks Thread

Lifer

Member

Diamond Member

Member

Lifer

Golden Member

Senior member

Lifer

Golden Member

Diamond Member

Senior member

Senior member

Senior member

Lifer

Lifer

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Lifer