Ashes of the Singularity User Benchmarks Thread

Feb 19, 2009
10,457
10
76
Replace "Compute" with "Mantle" and it's the exact same argument from the Mantle launch hype... again, how is Mantle doing nowadays?

So I hope you understand that argument alone does not work.

Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".
 

Shivansps

Diamond Member
Sep 11, 2013
3,873
1,527
136
Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".

Sure, that's probably why those APIs are AMD-only.

Again, you are using the exact same arguments used in Mantle's defense, so why should I think anything will turn out differently now? And you don't seem to know what to say.

You are talking about a feature that is pretty much AMD-only right now, not much different from DX11 and deferred contexts, which were Nvidia-only and rarely used. The only thing that may be different now is that the Xbox One is running on DX12, contrary to what happened with Mantle; that is the one thing that may shift the balance.
 
Last edited:

TheProgrammer

Member
Feb 16, 2015
58
0
0
<snip snip>
Ultimately, we have very little data and a lot of wishful thinking. The real DX12 games will be all that matters, not some alpha benchmark that starts up with an AMD logo on it.

Sounds to me like you regret getting robbed by Nvidia for that 980. :biggrin:

The only wishful thinking is that you could turn it into a Sapphire Tri-X R9 Fury. :whistle:


Trolling is against the forum rules and you should know that.

-Rvenger
 
Last edited by a moderator:

desprado

Golden Member
Jul 16, 2013
1,645
0
0
So @Shivansps says Compute is no big deal in games... devs say it is; that's why they wanted next-gen APIs.

Sony definitely thinks it's important; that's why they helped AMD design GCN with 8 ACEs, and PS4 games have already started to use async compute for some performance gains.

As said, when working with the weaker hardware in consoles, devs have to find methods to extract peak performance from it, so of course the statement from Oxide that console devs are pushing the consoles' ACEs in next-gen games makes perfect sense.

The question is what will be done with those heavy async compute console games when they get ported to PC. This is where you are free to speculate. But to say consoles don't use async compute is wrong.

Be specific: which developers claimed this or asked AMD for a low-level API?

DICE and Oxide?

So those are the only ones who asked?

Do you have actual names of developers who made that claim, or are you just quoting AMD saying that many developers asked them for a low-level API? They never gave names; even in the interview they never gave names.

Please base your theory on logic so that you do not get disappointed later, like when you recently claimed that Fury X would be 20% or 40% faster than Titan X. It is better you just come back to reality.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Sure, that's probably why those APIs are AMD-only.

Again, you are using the exact same arguments used in Mantle's defense, so why should I think anything will turn out differently now? And you don't seem to know what to say.

You are talking about a feature that is pretty much AMD-only right now, not much different from DX11 and deferred contexts, which were Nvidia-only and rarely used. The only thing that may be different now is that the Xbox One is running on DX12, contrary to what happened with Mantle; that is the one thing that may shift the balance.

Both XB1 and PS4 games use async compute, if that's what you're talking about.
 
Feb 19, 2009
10,457
10
76
Please base your theory on logic so that you do not get disappointed later, like when you recently claimed that Fury X would be 20% or 40% faster than Titan X. It is better you just come back to reality.

I never claimed such. Go find my post; I dare you to prove me wrong.

I've always said I expect or hope it to be ~15% above Titan X, because that is when I will consider throwing $ at it to upgrade. It did not meet my expectations, so I have not bought Fury X.

Now, if you have anything else of value to add besides the usual "Mantle is useless, Async Compute is useless"..

Oh, here's the source showing Sony thought async compute would be an important feature by 2015 (anticipated as far back as 2009!):

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1
 

desprado

Golden Member
Jul 16, 2013
1,645
0
0
I never claimed such. Go find my post; I dare you to prove me wrong.

I've always said I expect or hope it to be ~15% above Titan X, because that is when I will consider throwing $ at it to upgrade. It did not meet my expectations, so I have not bought Fury X.

Now, if you have anything else of value to add besides the usual "Mantle is useless, Async Compute is useless"..

Oh, here's the source showing Sony thought async compute would be an important feature by 2015 (anticipated as far back as 2009!):

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1

I am asking for the names of those developers who asked AMD for a low-level API. So, in short, can you provide the names of those "many developers"?
If you do not have names, then please do not post based on fanboyism.

I have a pic where you were claiming that Fury X would be 20% faster than Titan X. Therefore, it is not hard to talk sense and stay grounded in reality.
 
Last edited:

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
[Graph: 980 Ti async compute timings]

GCN has the same time for either compute alone or combined graphics + compute async.

Here is Tahiti (GCN 1.0) for comparison. It's not mine; I pulled the numbers from the thread to graph them.


Interesting...
If you overlay one graph on the other, you can see where Tahiti matches the 980 Ti in performance.

So it is possible that, with enough graphics work offloaded to compute, a 280X could be faster than a 980 Ti thanks to async compute.

Of course, it depends on how much of that shows up in games. Fun times ahead.
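That crossover logic can be sketched with a toy model (all numbers here are invented for illustration; the 1.5x raw-speed gap and the 20 ms workload are my assumptions, not measurements from the thread's graphs):

```python
# Toy model: a slower GPU that overlaps graphics and compute (frame time
# ~= max(G, C)) can beat a faster GPU that serializes the two queues
# (frame time ~= G + C) once enough work is moved to the compute queue.

def frame_time_overlap(graphics_ms, compute_ms):
    """Async-capable GPU: queues run concurrently, frame takes the longer one."""
    return max(graphics_ms, compute_ms)

def frame_time_serial(graphics_ms, compute_ms):
    """GPU that context-switches between queues: they run back to back."""
    return graphics_ms + compute_ms

# Hypothetical frame: 20 ms of total work, some fraction offloaded to compute.
# Assume the serializing GPU is 1.5x faster per unit of work.
for offload in (0.0, 0.25, 0.5):
    work = 20.0
    overlap_gpu = frame_time_overlap(work * (1 - offload), work * offload)
    serial_gpu = frame_time_serial(work * (1 - offload) / 1.5, work * offload / 1.5)
    print(f"offload={offload:.2f}  overlap GPU: {overlap_gpu:.1f} ms  "
          f"serial GPU: {serial_gpu:.1f} ms")
```

With these made-up numbers the faster-but-serial GPU wins at low offload, and the slower-but-overlapping GPU pulls ahead around a 50/50 graphics/compute split; how close real games get to that split is exactly the open question.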
 

selni

Senior member
Oct 24, 2013
249
0
41
Mantle is doing great, say hi to Vulkan, Metal and DX12.

Classic "downplay the feature your favorite hardware doesn't support". Okay fine, if you feel that way, move along, nothing else for you in this thread because now your only contribution is "Async Compute is like Mantle it's not important".

We do appear to be seeing some of the traditional drawbacks of low-level APIs here with DX12, though: you often need significantly different codepaths per hardware platform to get good performance. E.g., Mantle implementations had some initial issues with newer GCN revisions (thinking of Tonga here), and Nvidia's DX12 performance here is actually worse than DX11.

It's still looking promising judging from the industry reception, but there are some warning signs appearing here too.
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

From what I can tell from the discussions and information presented so far, while Maxwell can do asynchronous compute, its hardware implementation requires a context switch that not only negates the performance gains asynchronous compute is meant to extract; it actually degrades performance compared to not using it.

If this is true, then for Maxwell the only options NV will have are to have devs add code to check for a Maxwell card and, if one is detected, run without asynchronous compute, or to have the drivers tell the engine that asynchronous compute isn't available on the card.

One thought for a possible solution might be, if a system is running in SLI have one card do only compute and the other do graphics and async compute. This might allow the card only doing compute to use async compute without a context change, but I have no idea if it's even possible in DX12 or how much effort it would take.

The real billion dollar question that needs to be answered is when did NV realize that async compute was going to be such an important part in increasing game speed in the new APIs (DX12, Vulkan, PS4).

AMD and Dice announced that they were working on Mantle in September of 2013 and both D3D12 and Pascal were announced in March of 2014. I would expect that work on Pascal had started sometime before then, but how much before, and most importantly when was the design finalized? It's possible that NV could have known that something was in the wind since in early 2013 the word was that the PS4 had added 8 ACEs, and D3D12 had been in development well prior to that.

So as I see it, the choices are: either NV realized they needed to revamp how they handle async compute before Pascal's design was finalized and made the needed changes, or they didn't realize what was coming before Pascal was complete. If it's the first, they'll be OK: by the time DX12 is really ramping up, they'll have Pascal and an upgrade path. If not, then they really have to have it in Volta, but that leads to the same question: since Volta was announced in March of 2013, how will it handle async compute, and where is it in the design process?

Very interesting times are ahead.
 

littleg

Senior member
Jul 9, 2015
355
38
91
Would this also explain the higher latency NV cards are seeing in VR applications?
 
Feb 19, 2009
10,457
10
76
I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

They were silent for a while with the 3.5GB 970. Then their PR came out and called it a feature.

Their PR is no doubt busy preparing a statement for this. Let me guess what's gonna be included:

1. "We said we support async compute with Maxwell 2, and it's true, it does work. [It's just not any good at it!]"

2. "Async compute is one of many new features in DX12 that game devs can use to increase performance and visuals, we will work with developers to ensure DX12 games will run well on our hardware"

3. "Async compute is a small part of DX12. We are fully FL12.1 ready, 12.1 is bigger than 12"

4. "Pascal will arrive soon, it's much better at Async Compute, so for users who are unhappy, we have given them a viable upgrade route"
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
You cannot ignore the fact that the PC gaming market is 80% nvidia.

For the millionth time, NVIDIA doesn't have 80% of the PC gaming market. They shipped 80% of desktop dGPUs in Q2 2015 only.

The PC gaming market is comprised of desktop APUs, desktop dGPUs, laptop APUs, and laptop dGPUs.
So NVIDIA's market share in PC gaming is not as high as people make it out to be.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
I think that the most interesting part of all of this so far is that NV has been completely silent on the subject while this has played out.

From what I can tell from the discussions and information presented so far, while Maxwell can do asynchronous compute, its hardware implementation requires a context switch that not only negates the performance gains asynchronous compute is meant to extract; it actually degrades performance compared to not using it.

If this is true, then for Maxwell the only options NV will have are to have devs add code to check for a Maxwell card and, if one is detected, run without asynchronous compute, or to have the drivers tell the engine that asynchronous compute isn't available on the card.

One thought for a possible solution might be, if a system is running in SLI have one card do only compute and the other do graphics and async compute. This might allow the card only doing compute to use async compute without a context change, but I have no idea if it's even possible in DX12 or how much effort it would take.

The real billion dollar question that needs to be answered is when did NV realize that async compute was going to be such an important part in increasing game speed in the new APIs (DX12, Vulkan, PS4).

AMD and Dice announced that they were working on Mantle in September of 2013 and both D3D12 and Pascal were announced in March of 2014. I would expect that work on Pascal had started sometime before then, but how much before, and most importantly when was the design finalized? It's possible that NV could have known that something was in the wind since in early 2013 the word was that the PS4 had added 8 ACEs, and D3D12 had been in development well prior to that.

So as I see it, the choices are: either NV realized they needed to revamp how they handle async compute before Pascal's design was finalized and made the needed changes, or they didn't realize what was coming before Pascal was complete. If it's the first, they'll be OK: by the time DX12 is really ramping up, they'll have Pascal and an upgrade path. If not, then they really have to have it in Volta, but that leads to the same question: since Volta was announced in March of 2013, how will it handle async compute, and where is it in the design process?

Very interesting times are ahead.

Or it is planned obsolescence. Without much other improvement, the next GPU lineup will be a lot faster in DX12 thanks to one change that enables async compute.

They will release new GPUs about a year into DX12. That is not much of a window for AMD to get traction.

People will downplay it as usual, protecting their buying decisions. And the minute NV releases a DX12 card that supports async compute, they will upgrade to the new NV GPU immediately.

NV dictates trends here, and async compute will be important when NV says so.

AMD took their sweet time to leverage those ACE units we saw in the first GCN cards 4 years ago...
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
And that's how it would look if asynchronous compute were actually working as it should.

Have you looked at all of the numbers?
Fury X: https://forum.beyond3d.com/posts/1869030/

One kernel/batch:
Graphics: 25.18 ms
Compute: 49.65 ms
G+C: 55.93 ms

And a GTX 970: https://forum.beyond3d.com/posts/1869008/
One kernel/batch:
Graphics: 32.13 ms
Compute: 9.77 ms
G+C: 41.63 ms

Up to 32 batches the GTX 970 is faster than the Fury X. Only after that does the GTX 970 get slower.

Using a compute queue on AMD hardware introduces a huge latency. That's the reason why asynchronous compute is "free" for them.

That is really ironic, huh? :biggrin:
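For what it's worth, here is a rough way to read those timings (this overlap metric is my own back-of-the-envelope sketch, not something from the test's author): if the two queues truly run concurrently, G+C should land near max(G, C); if they serialize, near G + C.

```python
# How much of the shorter queue's time was hidden behind the longer one?
# 1.0 = perfect overlap (G+C == max(G, C)); 0.0 = fully serialized (G+C == G + C).

def overlap_ratio(g_ms, c_ms, gc_ms):
    """Fraction of the shorter workload that ran concurrently with the other."""
    return (g_ms + c_ms - gc_ms) / min(g_ms, c_ms)

# Timings (ms) from the beyond3d posts linked above.
fury_x = overlap_ratio(25.18, 49.65, 55.93)
gtx970 = overlap_ratio(32.13, 9.77, 41.63)
print(f"Fury X overlap:  {fury_x:.0%}")   # most of the shorter queue is hidden
print(f"GTX 970 overlap: {gtx970:.0%}")   # near zero: essentially back-to-back
```

By this metric the Fury X hides roughly three quarters of the shorter queue's time, while the GTX 970's combined time is almost exactly the sum of the two, which is the serialization both sides of this argument are circling around.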
 
Last edited:
Feb 19, 2009
10,457
10
76
Have you looked at all of the numbers?
Fury X: https://forum.beyond3d.com/posts/1869030/

One kernel/batch:
Graphics: 25.18 ms
Compute: 49.65 ms
G+C: 55.93 ms

And a GTX 970: https://forum.beyond3d.com/posts/1869008/
One kernel/batch:
Graphics: 32.13 ms
Compute: 9.77 ms
G+C: 41.63 ms

Up to 32 batches the GTX 970 is faster than the Fury X. Only after that does the GTX 970 get slower.

Using a compute queue on AMD hardware introduces a huge latency. That's the reason why asynchronous compute is "free" for them.

That is really ironic, huh? :biggrin:

You have to understand the application better before drawing that conclusion, but I was expecting you or ocre to point that out. Please ask the programmer of the app for clarity.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
From what I can tell from the discussions and information presented so far, while Maxwell can do asynchronous compute, its hardware implementation requires a context switch that not only negates the performance gains asynchronous compute is meant to extract; it actually degrades performance compared to not using it.

Apparently, AMD's ACEs use context switches as well. There are lots of assumptions being thrown around, as usual. The guys over at Beyond3D are a lot more sober when discussing this than this forum, with all its biases and partisanship.

For instance, someone at beyond3d said that Maxwell 2 definitely supports asynchronous compute as it's mentioned in the CUDA developer toolkit.

And from Andrew Lauritzen, something that some of us including myself have been saying in this thread:

And let's remember, an ideal architecture would not require additional parallelism to reach full throughput. So while the API is nice to have, seeing "no speedup" from async compute is not a bad thing if it's because the architecture had no issues keeping the relevant units busy without the additional help. It is quite analogous to CPU architectures that require higher degrees of multi-threading to run at full throughput vs. ones with higher IPC.

Source

 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
For the millionth time, NVIDIA doesn't have 80% of the PC gaming market. They shipped 80% of desktop dGPUs in Q2 2015 only.

The PC gaming market is comprised of desktop APUs, desktop dGPUs, laptop APUs, and laptop dGPUs.
So NVIDIA's market share in PC gaming is not as high as people make it out to be.

But it sounds good. It's just one stop for the hype train as it keeps rolling along.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You have to understand the application better to draw that context, but I was expecting you or ocre to point that out. Please ask the programmer of the app for better clarity.

Huh? I see a huge 25 ms penalty when using a compute queue next to the graphics queue. It doubles the time to process the information compared to graphics only and puts the Fury X behind a GTX 970.

Using a compute workload adds a processing penalty on AMD's GCN architecture. So developers need to use a huge number of kernels to bring AMD to the same performance level as nVidia when using compute...
 
Feb 19, 2009
10,457
10
76
Huh? I see a huge 25 ms penalty when using a compute queue next to the graphics queue. It doubles the time to process the information compared to graphics only and puts the Fury X behind a GTX 970.

Using a compute workload adds a processing penalty on AMD's GCN architecture. So developers need to use a huge number of kernels to bring AMD to the same performance level as nVidia when using compute...

Because he said he developed it only to test whether async compute is functional or not; it's not a benchmark. You are jumping to conclusions by treating it as one. So if you want to use the results that way, go ask the developer for his thoughts to prevent misinterpretation.

@Carfax83
CUDA and DX12 are different animals, please don't confuse the two. As far as we know, Hyper-Q is fully functional in CUDA, with 32 compute queues running in parallel.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
And yet his program shows us that AMD has a huge latency problem when using the compute pipeline of their architecture.

"Asynchronous shaders" is not a feature; it is a necessity for AMD to utilize their architecture in the best way. On the other hand, a compute-only workload, or one where the graphics queue has to wait for the result of the compute queue, will hurt AMD when developers don't have enough work to saturate the compute units.
 
Feb 19, 2009
10,457
10
76
And yet his program shows us that AMD has a huge latency problem when using the compute pipeline of their architecture.

"Asynchronous shaders" is not a feature; it is a necessity for AMD to utilize their architecture in the best way. On the other hand, a compute-only workload, or one where the graphics queue has to wait for the result of the compute queue, will hurt AMD when developers don't have enough work to saturate the compute units.

Do you remember Starswarm and the folks here who tried to use it as a benchmark to compare different GPUs?

Do you remember the 3DMark DX12 API test? <- a synthetic best-case-scenario draw call test.

In those threads, I said both were synthetics; don't draw conclusions.

You are repeating the same mistakes others made in those prior examples. Even worse, because the developer has said it's only a test of whether async compute is functional or not. You had better make an account on B3D and ask for clarification before you jump to such definitive statements.

This is what the dev had to say:

It's running from 1 to 128 single-lane compute kernels which are quite long and require a minimal amount of bandwidth. The graphics queue is basically just pushing fillrate with triangles occupying a 4k x 4k offscreen render target.
So basically the best possible case.

It's a synthetic best-case-scenario test. Its purpose is to test whether the feature is supported or not.
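In other words, the only question the tool is built to answer looks something like this (a sketch; the 5% tolerance is an arbitrary assumption of mine, the real tool just reports raw timings for people to eyeball):

```python
# Functional check, not a benchmark: did submitting the compute queue alongside
# the graphics queue save any wall-clock time versus running them serially?

def async_compute_functional(g_ms, c_ms, gc_ms, tolerance=0.05):
    """True if combined G+C ran measurably faster than strictly serial G then C."""
    serial_ms = g_ms + c_ms
    return gc_ms < serial_ms * (1 - tolerance)

# Plugging in the timings quoted earlier in the thread (ms):
print(async_compute_functional(25.18, 49.65, 55.93))  # Fury X numbers
print(async_compute_functional(32.13, 9.77, 41.63))   # GTX 970 numbers
```

A yes/no answer like this says nothing about relative performance between cards, which is exactly why treating the tool's timings as a cross-vendor benchmark is a misuse of it.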
 
Last edited: