Ashes of the Singularity User Benchmarks Thread


Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
I think what people forget, and what lends even more credit to Mahigan's posts, is the effect of more CPU cores on Nvidia GPUs at 4K resolution.

Up to 1600p we see no difference at all between 4 and 6 cores, with or without HT. Ars Technica's test with the R9 290X and GTX 980 Ti shows it.
http://cdn.arstechnica.net/wp-conte...ew-chart-template-final-full-width-3.0021.png - 4 cores, without HT.
http://cdn.arstechnica.net/wp-conte...ew-chart-template-final-full-width-3.0011.png - fully enabled CPU.

29 vs 32 FPS in DX12 mode with 4 vs 6 cores.

More work done by the CPU and scheduled by the ACE in Maxwell yields higher performance only at 4K. The engine does not need the extra CPU work at lower resolutions, so we don't see any benefit there. 4K is something rather different.

The biggest problem with all of this is:
Nvidia knew what was coming with DX12 and deliberately released the Maxwell architecture to sell as many GPUs as they could. With DX12 performance gimped, they can now bring out a new design that will make everyone who bought Maxwell and older architectures more likely to buy new hardware.
Planned obsolescence.
As for GCN: unfortunately it's the same story. Fury X will not fly in future games because of its rasterization bottleneck.
All of our hardware is in fact already outdated for DX12 games if we look at the potential of the API, and future hardware will make that potential actually reachable, which is a shame.

However, that raises another point. Right now the best possible buy is the R9 390X: it has really good DX12 performance even at 1440p, a nice price, and it will gain much more performance from DX12 for free. What a shame that we cannot buy the R9 290X anymore.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
DX11 is Serial, DX12 is Parallel. nVIDIAs hardware was built to function better under Serial conditions (Kepler, Maxwell 1). Maxwell 2 is a bit of a mixed bag. A Middle of the road between Serial and Parallel. Take a look at the GTX 980 Ti. It gains from using DX12 over DX11.

Everything is parallel for a GPU; otherwise it wouldn't be able to get information to all of its cores and units.
The difference between DX11 and DX12 is how the API can feed the GPU, and DX12 is much better at this.
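
To make that concrete, here is a minimal sketch of the DX12 submission model, assuming an existing device, command queue and pipeline state. The function name, thread count and structure are illustrative only, not how Oxide or any shipping engine actually does it; error handling and GPU fencing are omitted:

```cpp
// Minimal sketch: several CPU threads record command lists in parallel,
// then one graphics queue executes them. This is the part a DX11
// immediate context serializes onto a single thread.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void SubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue,
                 ID3D12PipelineState* pso)
{
    const int kThreads = 4;                     // illustrative thread count
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocs(kThreads);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(kThreads);
    std::vector<std::thread> workers;

    for (int i = 0; i < kThreads; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocs[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocs[i].Get(), pso,
                                  IID_PPV_ARGS(&lists[i]));
        // Each worker records the draws for its own slice of the scene.
        workers.emplace_back([&lists, i] {
            // ... lists[i]->SetPipelineState / DrawInstanced calls here ...
            lists[i]->Close();
        });
    }
    for (auto& w : workers) w.join();

    // One submission hands all the pre-built work to the GPU.
    ID3D12CommandList* raw[kThreads];
    for (int i = 0; i < kThreads; ++i) raw[i] = lists[i].Get();
    queue->ExecuteCommandLists(kThreads, raw);
}
```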

Star Swarm does not use Asynchronous Shading.
And you know this how? :sneaky:
Star Swarm is based on the same engine. That is what Oxide told the world, with nVidia hardware showing a massive gain over DX11. The approach to rendering hasn't changed.

DX12 is not slower than DX11. The GTX 980 Ti is slower under DX12 than DX11 under certain conditions.
DX12 is slower when nVidia hardware is at nearly full utilization. DX11, on the other hand, is still capable of feeding the GPU with enough information to perform faster. There is a bottleneck in the DX12 path which is limiting performance.

DX11 can be faster. There are some cases where DX11 is faster than DX12. This can happen under lower CPU load conditions. It can also happen if the DX11 driver is providing shortcuts rather than rendering the Game Engine's desired work.
I like how you repeat the nonsense of Oxide.
First you should understand that DX12 and other zero-driver-overhead approaches reduce the CPU workload. This is the future. It was highlighted even on the first slide shown by Microsoft:
http://wccftech.com/microsoft-unveiels-directx-12-api-gdc-2014-mantle-level-features/

By Oxide's logic, DX12 would always be slower than DX11 with the same workload, up to the point where DX11's limitations start to limit performance on the older API. :hmm:

The second part is the best one: developers have asked for more control over the GPU. It is their job to optimize for the underlying hardware now. To make that possible, the whole driver abstraction layer was reduced to a minimum. They are responsible for doing the job. Blaming better DX11 optimization is so wrong that I even think Oxide is laughing about it.

As for your comparing of Apples and Oranges with the GTX 770 and GTX 960.
I used it to show you how wrong your assumption about asynchronous compute is. GK104 only supports 1 graphics or 1 compute queue, so the performance impact should be much bigger on GK104 than on GM206.
But I see where you stand. Let me use the GTX 980 Ti instead. At 1080p with the normal number of draw calls and medium settings (CPU limited!), this card is only 48% faster with DX12 than the GTX 770 with DX11. It is quite clear at this point that the DX12 implementation, at least on nVidia hardware, is not good enough to release this benchmark. Instead of delaying it, they are throwing stones at nVidia over the performance.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
First you should understand that DX12 and other zero-driver-overhead approaches reduce the CPU workload. This is the future. It was highlighted even on the first slide shown by Microsoft:
http://wccftech.com/microsoft-unveiels-directx-12-api-gdc-2014-mantle-level-features/

So why does a 6-core CPU give more performance than a 4-core CPU in that benchmark on Nvidia hardware at 4K? Can you explain why at lower resolutions there is no difference at all between core counts, while at 4K there is one, and only on Nvidia hardware?

Because the Nvidia GPU is bottlenecked here. It is exactly what Mahigan described: the CPU has more work to do because of the number of ACEs in the Maxwell GPU, and the more cores you have, the better on an Nvidia GPU, because more work can be fed to the scheduler in the Maxwell GPU. That is what bottlenecks it.

It is not software, because the API exposes the hardware to software. It is simply the inability of the Nvidia GPU to work in a parallel environment.

Because of the APIs we are comparing, we can make a nice analogy. In DirectX 11 the API bottlenecked the hardware on the AMD side, whereas Nvidia and AMD got around this problem to some degree with drivers. In DirectX 12 the only bottleneck is the hardware. Nothing else.
 
Last edited:
Feb 19, 2009
10,457
10
76
Sontin, has it occurred to you at all, that maybe NV's hardware cannot handle async compute/shaders well? Is it not even a possibility in your world?

You seem adamant that Oxide is at fault here. Why blame a studio of veterans who have proven themselves neutral in the past (remember Civ 5 and its DX11 multi-threading advantage for NV) and in the present (source code available to all IHVs for a year already in alpha, willingness to implement NV's optimized shaders, etc.)?

Why be so negative on a studio that has proven they are above the dirty approach of GameWorks developers?

I ask this because in 2016, when there are more DX12 games, if the trend holds and async compute performance suffers on NV GPUs, will you come and apologize for your attempts to drag Oxide through the mud? Conversely, if other DX12 games arrive and NV's async compute works great with no issues, I will remember this and blame Oxide for an anti-NV approach.
 
Feb 19, 2009
10,457
10
76
It is not software, because the API exposes the hardware to software. It is simply the inability of the Nvidia GPU to work in a parallel environment.

You can't be certain. None of us can be.

Unless you have serious DX12 programming experience, you cannot be certain. It's just hypothesis and conjecture. If someone like Sebbi comes out and says Maxwell is gimped for async compute, his words are much more solid evidence than those of any other forum-goer, for example.

Currently, due to unknown factors, there are many possible reasons why NV's DX12 performance is lackluster. To declare any of them impossible, or a fact, is being too liberal with the truth.

For us non-experts, when we have more DX12 games to compare, a trend will show itself real fast.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
There is a way to work around this: not add asynchronous shaders to the engine mechanics at all. The problem is that this way we are going back to DirectX 11. It will not work in the grand scheme of things.

We HAVE TO HAVE low-level APIs, whether we want them or not, because they simply give us far more possibilities, especially for eGPUs in the future.
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
You can't be certain. None of us can be.

Unless you have serious DX12 programming experience, you cannot be certain. It's just hypothesis and conjecture. If someone like Sebbi comes out and says Maxwell is gimped for async compute, his words are much more solid evidence than those of any other forum-goer, for example.

Currently, due to unknown factors, there are many possible reasons why NV's DX12 performance is lackluster. To declare any of them impossible, or a fact, is being too liberal with the truth.

For us non-experts, when we have more DX12 games to compare, a trend will show itself real fast.
I agree; I may have used inexact words for what I was referring to. Parallel compute on Maxwell cards is gimped. That is what I meant, even if we count the ACE in Maxwell 2 and compare it to what AMD offers here.

1 ACE with 32 queues is, IMO, too little. I have no idea whether it would be better to have 8 ACEs with 32 queues each or 8 ACEs with 8 queues each, but a single ACE is simply too little to get work done in a parallel environment.
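
For reference, this is roughly what "more queues" means at the API level: DX12 lets the application feed a dedicated compute queue next to the graphics queue, and how far the two actually overlap is up to the GPU's scheduling hardware (the ACEs on GCN). A hedged sketch, assuming the device and already-recorded command lists exist; the function and variable names are made up for illustration:

```cpp
// Minimal sketch of asynchronous compute at the D3D12 API level.
// Whether the two queues really run concurrently is decided by the
// GPU's hardware scheduler, not by this code.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void KickAsyncCompute(ID3D12Device* device,
                      ID3D12CommandQueue* graphicsQueue,
                      ID3D12CommandList* graphicsWork,
                      ID3D12CommandList* computeWork)   // recorded as a COMPUTE-type list
{
    // A compute-only queue alongside the normal DIRECT (graphics) queue.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    // Both queues are fed independently.
    graphicsQueue->ExecuteCommandLists(1, &graphicsWork);
    computeQueue->ExecuteCommandLists(1, &computeWork);

    // A fence is still needed before the graphics queue may consume
    // anything the compute queue produced.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1);
    graphicsQueue->Wait(fence.Get(), 1);
}
```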
 

tential

Diamond Member
May 13, 2008
7,348
642
121
You know, the R9 290/X was selling for less than the GTX 970 a few months ago. It seems now that the 290/X was the better card to get. And today it also seems that the 390 is even better than the GTX 970, so if you are in the market for a $300 card, the R9 390 is the better choice. Same goes for the R9 380 vs the GTX 960.

I know that, but sadly I can't get a 290X/390X etc. because of the gimped VSR. So it's Fury or better for me. I'm waiting to see what the Fury Nano brings, and waiting for the price of the 65-inch FreeSync monitor, so I can decide what my plan is.

But back to the point: having a forward-looking architecture doesn't mean much. Being forward-looking doesn't stop the competition from making a new architecture when the new times arrive. AMD keeps making forward-looking architectures while their competitors build for customers' current needs, and it's not boding well for them.

For example:

Fury won't be fully utilized until DX12 games arrive, but it won't play ANY DX12 game at the quality level I want. And it also won't be fully utilized in ANY of the older titles I want to play, due to architecture bottlenecks there as well.

In that case I'd rather opt for a more efficient design like the 980 Ti than get a GPU like Fury for new games where I'll be getting sub-60 FPS anyway and which I couldn't care less about playing (when far better GPUs come out next year, I can pick the best GPU for DX12 games then, because both the games and the GPUs will be out).

If AMD wants to be successful, they need to build things people want TODAY, not build things and tell people to wait for whatever is needed for their GPUs, or anything else they develop, to be fully utilized.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
So why does a 6-core CPU give more performance than a 4-core CPU in that benchmark on Nvidia hardware at 4K? Can you explain why at lower resolutions there is no difference at all between core counts, while at 4K there is one, and only on Nvidia hardware?

Because the Nvidia GPU is bottlenecked here. It is exactly what Mahigan described: the CPU has more work to do because of the number of ACEs in the Maxwell GPU, and the more cores you have, the better on an Nvidia GPU, because more work can be fed to the scheduler in the Maxwell GPU. That is what bottlenecks it.

It is not software, because the API exposes the hardware to software. It is simply the inability of the Nvidia GPU to work in a parallel environment.

Because of the APIs we are comparing, we can make a nice analogy. In DirectX 11 the API bottlenecked the hardware on the AMD side, whereas Nvidia and AMD got around this problem to some degree with drivers. In DirectX 12 the only bottleneck is the hardware. Nothing else.

Because this software might have a broken DX12 path on nVidia hardware?

And don't draw conclusions from benchmarks from only one site. Take this from PC Perspective: http://www.pcper.com/files/imagecache/article_max_width/review/2015-08-16/ashes-gtx980.png

Look at how the 6-core processor doesn't show any performance improvement on a GTX 980. Or how the difference between the AMD processors stays the same. Or how an i3 with only 2C/4T is faster than an eight-core FX processor.

/edit: Compare it to the 390X. The 4- and 6-core Intel processors increase performance 15-20% over the i3 at 1600p and high settings. On the GTX 980 it is only 6%. So does AMD have a broken asynchronous compute implementation too?
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
Sontin, did you read my post? Up to 1600p we don't see any difference between core counts. This is extremely consistent with EVERY benchmark out in the wild. At 4K, however, we do, and we see the difference ONLY on Nvidia hardware. At 4K, 6 cores affect the framerate on the GTX 980 Ti. It has more work to do, and that is what bottlenecks it.

The software is not broken here; that is like saying DirectX 12 is broken on Nvidia hardware. Mahigan already explained all of this, and the core-count scaling at 4K supports that theory.

In DX12 the game engine talks to the API and the API talks to the GPU. That is what makes the difference.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
And only on nVidia hardware do we see a performance hit with the DX12 path. So you are drawing a conclusion from a broken DX12 path and questioning nVidia's hardware, while at the same time thinking there is nothing wrong with the developers' DX12 implementation.

The fact that DX12 is slower shows only that DX12 is broken. The result with the 6-core processor should always be faster, at any resolution, if more cores really helped to feed the GPU.

You can see at PC Perspective that more cores don't help when the GPU is under load.
 
Last edited:
Feb 19, 2009
10,457
10
76
The fact that DX12 is slower shows only that DX12 is broken.

Either Oxide did a bad job, or NV's hardware can't handle async compute in the GPU test mode of the benchmark, because NV shows improvement in the CPU (draw call) test (no lighting enabled).

You seem certain Oxide did a bad job with DX12. Why are you so certain? What has the team at Oxide done in the past to make you question their competence, ability or ethics? These guys and DICE have been the major players showcasing DX12 at every single conference of late; do you think they are incompetent?
 

Glo.

Diamond Member
Apr 25, 2015
5,802
4,773
136
The only thing that could happen with a proper implementation of DirectX 12 is... widening the gap between AMD GCN GPUs and Nvidia GPUs.

The reference R9 290X has much more compute power than the GTX 980; it has almost as much compute power as the GTX 980 Ti, and those benchmarks reflect that. The R9 280X will have almost as much power as the GTX 980. You have to bear in mind that compute power will be the most important thing from here on. Applications will reflect that, provided the hardware is not bottlenecked somewhere else: asynchronous shading or rasterization performance.

I knew that AMD GPUs would fly on DirectX 12, but I had no idea what would happen with Nvidia GPUs. It was a bit shocking for me, but on the other hand pretty predictable, given the limited asynchronous compute abilities of Maxwell GPUs.

In DirectX 11, Nvidia was able to use drivers to get around the problem of having less compute power than comparable AMD cards, simply by optimizing the paths for the GPU. The GPU knew how to execute tasks to get the best performance out of them.

In DirectX 12 the situation is the exact opposite. Nvidia hardware does not know how to optimize itself to get the task done; it has to do what the API tells it. Between the API and the hardware there is only the API driver. Nothing else.
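
As a small illustration of that shift in responsibility (a sketch only, with made-up names, not Oxide's or anyone's actual code): in DX11 the driver silently tracked a hazard like "this render target is about to be read as a texture"; in DX12 the application has to state the transition explicitly, and the thin driver mostly just passes it through:

```cpp
// Sketch: the application, not the driver, declares the resource hazard.
#include <d3d12.h>

void UseRenderTargetAsTexture(ID3D12GraphicsCommandList* cmdList,
                              ID3D12Resource* renderTarget)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = renderTarget;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &barrier);
    // ... bind renderTarget as an SRV and draw with it ...
}
```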
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
A company that is AMD-sponsored runs better in a pre-alpha on AMD hardware, and people are already drawing conclusions about future nVidia performance.

There is no such thing as "free performance" with async compute.

nVidia has a problem with MSAA, but otherwise the point of DirectX 12/Vulkan/Mantle is that it's the developer's task to fix it all. Do I need to remind people of the broken Mantle in BF4/Hardline/Thief with GCN 1.2?

And funny enough Oxide is the same company behind Star Swarm. And we all know how that turned out in the end.
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
In DirectX 12 the situation is the exact opposite. Nvidia hardware does not know how to optimize itself to get the task done; it has to do what the API tells it. Between the API and the hardware there is only the API driver. Nothing else.

Wow. I don't even know what to say...

DX12 allows much better optimization on GPUs than DX11. It has nothing to do with nVidia or AMD. The whole API is better suited to new GPU architectures than DX11. That's the reason why it is even supported on Fermi - a GPU architecture from 2010, introduced 6 months after DX11...

You should re-read what everybody is saying about these new low-level APIs, starting with AZDO in OpenGL 4.4.

In the end you just said that DX12 is bad for nVidia because DX12 is able to do more on their hardware. :|
 
Feb 19, 2009
10,457
10
76
Some of us can read past PR.

Can you? Is 3.5GB a feature?

It seems to me to be a question of trust. Some people don't trust Oxide to do the right thing, whilst Oxide have stated they are very fair to all IHVs. They are so fair that despite AMD's sponsorship, FX CPUs perform like crap.

When a lead dev says "free performance", he means it, look at the way he talks, nothing PR about it at all. ACEs on GCN are not utilized in DX11, they are in DX12, that's free performance which does not detract from rendering performance.

There's gotta be a reason why other devs have praised GCN on async compute & shaders on b3d... it must be all PR, right?

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-4#post-1867649
I don't know anything about Maxwell's async compute implementation, but I know that GCN gets huge benefits from it.

There's a reason Sony insisted AMD improve GCN with 8 ACEs for the PS4 APU; they specifically mentioned async compute as a feature they wanted to give them an edge in 2015 and beyond. So it's no shock when devs say GCN is excellent for that task. But it's probably useless, it must be all PR...
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
It seems to me to be a question of trust. Some people don't trust Oxide to do the right thing, whilst Oxide have stated they are very fair to all IHVs. They are so fair that despite AMD's sponsorship, FX CPUs perform like crap.

Oh, Oxide is blaming reviewers for the "crap performance" of FX CPUs:
https://twitter.com/dankbaker/status/634748937149280256

Very fair. :awe:

When a lead dev says "free performance", he means it, look at the way he talks, nothing PR about it at all. ACEs on GCN are not utilized in DX11, they are in DX12, that's free performance which does not detract from rendering performance.
When Microsoft says DX12 gives "free performance" on nVidia hardware, people don't believe it, because a developer who blames reviewers for the bad performance of AMD CPUs says otherwise. :hmm:

There's gotta be a reason why other devs have praised GCN on async compute & shaders on b3d... it must be all PR, right?
I guess there are a thousand or a million reasons for this. :sneaky:
 
Feb 19, 2009
10,457
10
76
Oh, Oxide is blaming reviewers for the "crap performance" of FX CPUs:
https://twitter.com/dankbaker/status/634748937149280256

Very fair. :awe:

When Microsoft says DX12 gives "free performance" on nVidia hardware, people don't believe it, because a developer who blames reviewers for the bad performance of AMD CPUs says otherwise. :hmm:

I guess there are a thousand or a million reasons for this. :sneaky:

"I think some reviews are actually seeing MB issues. My numbers for 8350 are much higher. Could be an AGP/PCIE issue on the MB"

= blaming reviewers. You sure English is a language you understand? Because in my understanding, words such as COULD aren't definitive.

If MS comes out and says Async Compute gives free performance on NV hardware, I'll be very interested in that source! Cos so far, the devs have only used that to describe GCN.

Is this the same as you thinking a DX12 showcase like Wushu = BENCHMARK to show NV is faster in DX12 than AMD?

https://www.youtube.com/watch?v=AB5iuX8UDHk

Look closely at the fps: a 10% improvement, on an NV setup only. Somehow that is a benchmark confirming NV is faster in DX12 than AMD.

Now I get why you believe Oxide is some unethical studio (like GimpWorks devs) out to be anti-NV... You live in a strange world, buddy.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Yeah, their blaming reviewers for the slowpoke FX CPUs says it all. Sponsor money dictates. Ethical, my rear end. It's a business, not a charity.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
When a lead dev says "free performance", he means it, look at the way he talks, nothing PR about it at all. ACEs on GCN are not utilized in DX11, they are in DX12, that's free performance which does not detract from rendering performance.

When I run AVX2 loads on my Haswell or Skylake CPUs versus non-AVX2 loads, is that free performance?

We can talk about many things, but it's never free. Especially not if graphics cards have to throttle or lower boost clocks to run it due to power or temperature limits.
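
For what that AVX2 comparison looks like in code, here is a rough, illustrative sketch (function names are made up): the vector path moves eight integers per instruction instead of one, but on some CPUs sustained 256-bit work lowers the boost clock, which is exactly the "not free" part of the argument:

```cpp
// Rough sketch of the AVX2 vs scalar comparison (illustrative only).
#include <immintrin.h>
#include <cstddef>

void add_scalar(const int* a, const int* b, int* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

void add_avx2(const int* a, const int* b, int* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)                  // 8 x 32-bit ints per step
    {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_add_epi32(va, vb));   // AVX2 integer add
    }
    for (; i < n; ++i)                          // scalar tail
        out[i] = a[i] + b[i];
}
```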
 
Feb 19, 2009
10,457
10
76
Yeah, their blaming reviewers for the slowpoke FX CPUs says it all. Sponsor money dictates. Ethical, my rear end. It's a business, not a charity.

You should re-read the quote. Nowhere does he blame anyone; he said it could be a motherboard issue.

Blaming is what NV PR did to Oxide, saying they have an MSAA bug in their game, which is hypocritical too, since it's a bug in NV's own DX12 drivers that Oxide offered to help fix.
 
Feb 19, 2009
10,457
10
76
When I run AVX2 loads on my Haswell or Skylake CPUs versus non-AVX2 loads, is that free performance?

I dunno for YOUR case. But for Lionhead's case, they seem to think Async Compute is "free performance" for GCN.

I'll take their word over that of forum warriors with no DX12 programming experience, any day of the week.
 

VR Enthusiast

Member
Jul 5, 2015
133
1
0
AMD can do a lot to feed their shaders via the driver. Also, I don't think the problem is in the front end. The problem AMD has is the horrible DX11 driver, which can't push draw calls to the GPU. There are rumors that AMD's driver uses only one CPU thread; that would also look like a front-end bottleneck. But now the FPS at low resolution in DX12 is the same as Nvidia's, so I think there is no front-end bottleneck!

Nvidia has the problem that they can't fix their command processor issue, because it is a 100% hardware limit.

Yes, the limit is in the ACEs, which they don't have in Maxwell. Context switching on Nvidia also comes with a huge penalty.

Fair enough. Though I will say (and I think you will agree with me) that had AMD maintained Pitcairn-like scaling throughout their lineup, they would have done significantly better.

This has always been the case; architectures tend to have a sweet spot. For Maxwell it's clearly the 970/980 range, and for AMD it's Pitcairn. It's hard to believe that Pitcairn is only around 10% worse than Maxwell in performance per watt there.

There is no such thing as "free performance" with async compute.

It's better than free because it reduces latency too. This is why GCN is so much better than Maxwell in VR.

And funny enough Oxide is the same company behind Star Swarm. And we all know how that turned out in the end.

If they were really sponsored by AMD I doubt this would have happened.
 