Ashes of the Singularity User Benchmarks Thread

Page 5

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Star Swarm is a synthetic draw-call bottleneck test. That's also why I posted in those same threads not to draw too many conclusions from it, because it isn't the entire game. Same as what you and I posted about the 3DMark DX12 API test on the same issue.

Notice the CPU test in Ashes: no lights, no dynamic light sources, just lots of smoke/trails to generate draw calls. It keeps GPU load minimal and maximizes CPU loading.

Nobody is putting eggs anywhere. I FULLY expect GCN to shine on DX12 given the similarities to Mantle. Do you deny that still?

If you go back in history, Star Swarm was used back then just as much as AotS is now to say AMD will win the DX12 game and get some magic bonus over nVidia that will somehow change the tide. Star Swarm was also AMD-backed.

What happened:


If anything, we learned that basing conclusions on alpha or even pre-alpha numbers is pointless.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
NV has only claimed Maxwell 2 support for DX12 & FLs, they didn't claim it would be any good at it.

This really comes down to faith in a way. Maxwell 2 is a marvelously engineered GPU with lots of strengths and very few weaknesses. I have a hard time believing that the engineers would be so shortsighted as to not see the importance of asynchronous compute, especially given that they had significantly increased general compute performance with Maxwell over Kepler.

NVidia was already caught napping in that regard with Kepler. The main reason why GCN has aged so well compared to Kepler is because the former has very strong compute performance, and more and more game engines are using compute shaders to increase performance.

Asynchronous shaders will continue that trend..
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The main reason why GCN has aged so well compared to Kepler is because the former has very strong compute performance, and more and more game engines are using compute shaders to increase performance.

It's more likely because AMD doesn't have anything else. Look what happened to the VLIW uarchs. If AMD ever gets something past GCN, GCN will suffer exactly the same fate. But until then they have to work with what they've got.
 
Feb 19, 2009
10,457
10
76
I already covered Star Swarm; I was there in those threads telling people not to read too much into a synthetic. Just like I was present in the 3DMark DX12 API thread, which showed an R9 290X beating a Titan X, and I said it's a synthetic, useless for drawing parallels to DX12 gaming performance.

Ashes isn't a synthetic; it's a game about to enter beta. Its developers have consistently championed DX12, and until they prove otherwise, I will respect that.

Now, your criticism should be: It's one DX12 game, it's in alpha/closed beta. That is the fair criticism.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The simple fact that DX11 performs like DX12 should make you wonder and conclude it's broken at the moment. Then we can play the blame game. But it's still not an indication of anything.
 
Feb 19, 2009
10,457
10
76
This really comes down to faith in a way. Maxwell 2 is a marvelously engineered GPU with lots of strengths and very few weaknesses. I have a hard time believing that the engineers would be so shortsighted as to not see the importance of asynchronous compute, especially given that they had significantly increased general compute performance with Maxwell over Kepler.

NVidia was already caught napping in that regard with Kepler. The main reason why GCN has aged so well compared to Kepler is because the former has very strong compute performance, and more and more game engines are using compute shaders to increase performance.

Asynchronous shaders will continue that trend..

It depends on whether you believe DX12 was built with Mantle as an inspiration or foundation. Or whether you even believe at all that DX12 is Mantle-like, similar to Vulkan & Metal (some people here are still in denial regarding the similarities).

If you do, you have to doubt whether NV actually saw it coming at all: that Mantle or Mantle-like APIs would form the basis of next-gen DX. If they saw it coming, they could have engineered Maxwell 2 to excel at it. If they didn't, then they got caught with their pants down, and async compute & shaders will incur a performance hit from context switching on their in-order uarch. Which comes back to the promised DX12 features: it can do async compute/shaders, but that doesn't mean it's good at it.

So until we're fully in the DX12 era and can look back with certainty, whether you believe this speculation is up to you.

I choose to believe that Mantle is the foundation of Vulkan, Metal and DX12. Therefore I fully expect GCN to excel at it. Whether NV will be any good at it with current GPUs is up for speculation.
 
Last edited:
Feb 19, 2009
10,457
10
76
The simple fact that DX11 performs like DX12 should make you wonder and conclude it's broken at the moment. Then we can play the blame game. But it's still not an indication of anything.

http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/

DirectX 11 vs. DirectX 12 performance

There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. Second, D3D11 has more opportunities for driver intervention. The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor’s driver teams. On a closed system, this may not be the best choice if you’re burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game’s updates.

While Oxide is showing off D3D12 support, Oxide also is very proud of its DX11 engine. As a team, we were one of the first groups to use DX11 during Sid Meier’s Civilization V, so we’ve been using it longer than almost anyone and know exactly how to get the most performance out of it. However, it took 3 engines and 6 years to get to this point. We believe that Nitrous is one of the fastest, if not the fastest, DX11 engines ever made.

It would have been easy to engineer a game or benchmark that showed D3D12 simply destroying D3D11 in terms of performance, but the truth is that not all players will have access to D3D12, and this benchmark is about yielding real data so that the industry as a whole can learn. We’ve worked tirelessly over the last years with the IHVs and quite literally seen D3D11 performance more than double in just a few years time. If you happen to have an older driver laying around, you’ll see just that. Still, despite these huge gains in recent years, we’re just about out of runway.

Unfortunately, our data is telling us that we are near the absolute limit of what it can do. What we are finding is that if the total dispatch overhead can fit within a single thread, D3D11 performance is solid. But eventually, one core is not enough to handle the rendering. Once that core is saturated, we get no more performance. Unfortunately, the constructs for threading in D3D11 turned out to be not viable. Thus, if we want to get beyond 4 core utilization, D3D12 is critical.

Oxide is fully aware of their engine and its potential issues. Their prior experience was Civ 5, one of the few DX11 games with multi-threaded rendering (which excelled on NV's Fermi & later GPUs!) and DirectCompute shaders.
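To make the threading point from the Oxide quote concrete, here's a rough sketch of the D3D12 pattern they're describing: command lists recorded on several worker threads, then submitted from one thread. This is just the general API pattern with made-up names (RecordAndSubmit, workerCount, etc.), not Oxide's actual code:

Code:
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

// Sketch: record one command list per worker thread, then submit them all at once.
// Error handling, PSOs, barriers and the actual draw calls are omitted.
void RecordAndSubmit(ID3D12Device* device, ID3D12CommandQueue* directQueue, int workerCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);

    // Each worker gets its own allocator + command list (command lists are not free-threaded).
    for (int i = 0; i < workerCount; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // The expensive part, recording draw calls, runs on all cores in parallel.
    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([&, i] {
            // ... record this thread's share of the frame's draws/dispatches into lists[i] ...
            lists[i]->Close();
        });
    }
    for (auto& t : workers) t.join();

    // Submission itself is one cheap call from a single thread.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    directQueue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}

This is the work that, per the quote above, D3D11 effectively funnels through one core once its threading constructs turn out "to be not viable".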
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
I think I know what is happening.

Ashes of the Singularity makes use of Asynchronous Shading. We know AMD has been big on advertising this feature; it is used in quite a few PlayStation 4 titles, and it allows the developer to make efficient use of the available compute resources. GCN achieves this with 8 Asynchronous Compute Engines (ACEs for short), found in GCN 1.1 290 series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks, so a total of 64 tasks may be queued on GCN hardware with 8 ACEs.
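For reference, this is roughly what asynchronous shading looks like from the application side in D3D12: a second command queue of type COMPUTE created next to the normal direct (graphics) queue, with a fence to synchronize the two. Just a sketch of the general API pattern (names are made up), not Oxide's implementation; how those queues get scheduled onto ACEs or HyperQ is entirely up to the hardware and driver:

Code:
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Sketch: one graphics (direct) queue plus one async compute queue,
// with a fence so the graphics queue only consumes compute results once they're ready.
void CreateAsyncComputeQueues(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC directDesc  = {};
    directDesc.Type  = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only

    ComPtr<ID3D12CommandQueue> directQueue, computeQueue;
    device->CreateCommandQueue(&directDesc,  IID_PPV_ARGS(&directQueue));
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // ... ExecuteCommandLists() compute work (e.g. lighting) on computeQueue ...

    // The compute queue signals when its work is done; the graphics queue waits
    // on the GPU timeline (no CPU stall) before using the results.
    const UINT64 fenceValue = 1;
    computeQueue->Signal(fence.Get(), fenceValue);
    directQueue->Wait(fence.Get(), fenceValue);

    // ... ExecuteCommandLists() graphics work on directQueue ...
}

Whether the two queues actually overlap on the GPU is the vendor's problem, which is exactly the GCN-vs-Maxwell question being argued in this thread.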

nVIDIA can also do Asynchronous Shading through its HyperQ feature. The amount of available information on the nVIDIA side regarding this feature is minimal. What we do know is that nVIDIA has said Maxwell 2 is capable of queuing 32 Compute, or 1 Graphics and 31 Compute, for Asynchronous Shading.

Anandtech made a BIG mistake in their article on this topic, which seems to have become the de facto standard article on the subject. Their information has been copied all over the web, and it is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were capable of 1 Graphics and 8 Compute queues per cycle. This is in fact false. The truth is that GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues per cycle.

Anandtech also had barely any information on Maxwell's capabilities. Ryan Smith, the graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units, and therefore Anandtech published that Maxwell 2 had a total of 32 Compute Units. This information is false.

The truth is that Maxwell 2 has only a single "Asynchronous" Compute Engine tied to 32 Compute Queues (or 1 Graphics and 31 Compute queues). ("Asynchronous" is in quotes because, as you will see, it isn't really asynchronous.)

I figured this out when I began to read up on the Kepler/Maxwell/Maxwell 2 CUDA documentation and found what I was looking for. Basically, Maxwell 2 makes use of a single ACE-like unit. nVIDIA names this unit the Grid Management Unit.

How does it work?



The CPU's various cores send parallel streams to the Stream Queue Management. The Stream Queue Management sends streams to the Grid Management Unit (parallel to serial thus far). The Grid Management Unit can then create multiple hardware work queues (1 Graphics and 31 Compute, or 32 Compute), which are then sent in a serial fashion to the Work Distributor (one after the other, based on priority). The Work Distributor, in a parallel fashion, assigns the workloads to the various SMXs. nVIDIA calls this entire process "HyperQ".

Here's the documentation: http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf

GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPU's various cores send parallel streams to the Asynchronous Compute Engines' various queues (up to 64). The Asynchronous Compute Engines prioritize the work and then send it off directly to specific Compute Units based on availability. That's it.

Maxwell 2 HyperQ is thus potentially bottlenecked at the Grid Management and then Work Distributor segments of its pipeline. This is because these stages of the Pipeline are "in order". In other words HyperQ contains only a single pipeline (Serial not Parallel).

AMDs Asynchronous Compute Engine implementation is different. It contains 8 Parallel Pipelines working independently from one another. This is why AMDs implementation can be described as being "out of order".

A few obvious facts come to light. AMD's implementation incurs less latency and can make more efficient use of the available compute resources.

This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity under DirectX 12, even when compared to a lowly R9 290X. Asynchronous Shading kills its performance relative to GCN 1.1 (290 series)/GCN 1.2, whose performance is barely impacted.

GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is due to their peak rasterization rate, or Gtris/s. Many objects and units permeate the screen under Ashes of the Singularity, and each one is made up of triangles (polygons). Since both the Fury-X and the 290x/390x have the same number of hardware rasterization units, I believe that this is the culprit. Some people have attributed this to the number of ROPs (64) that both the Fury-X and the 290/390x share. I thought the same at first, but then I was reminded of the color compression found in the Fury/Fury-X cards. The Fury/X make use of color compression algorithms which have been shown to alleviate the pixel fill rate issues found in the 290/390x cards. Therefore I do not believe that the ROPs (Render Back Ends) are the issue. Rather, the Triangle Setup Engine (Raster/Hierarchical Z) is the likely culprit.

I've been away from this stuff for a few years so I'm quite rusty but Direct X 12 is getting me interested once again.

PS. Don't expect an nVIDIA fix through driver intervention either. DirectX 12 allows far less driver intervention because it is closer to the metal than DirectX 11. Therefore nVIDIA's penchant for replacing shaders at the driver level is nullified with DirectX 12. DirectX 12 will be far more hardware limited than DirectX 11.

Oxide confirmed it here:

DirectX 11 vs. DirectX 12 performance

There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. Second, D3D11 has more opportunities for driver intervention. The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor’s driver teams. On a closed system, this may not be the best choice if you’re burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game’s updates.

The developer can optimize by replacing shaders on their end. This was already done as confirmed here:

To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.

http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
It depends on whether you believe DX12 was built with Mantle as an inspiration or foundation. Or whether you even believe at all that DX12 is Mantle-like, similar to Vulkan & Metal (some people here are still in denial regarding the similarities).

I have no problem believing that Mantle, DX12 and Vulkan all share similarities. Why wouldn't they, as they are all low level APIs.. I don't believe that Mantle was the inspiration or foundation for DX12, however. The timeline doesn't support that notion. Plus DX12 was designed to support multiple architectures from its inception, unlike Mantle.

If they didn't, then they got caught with their pants down, and async compute & shaders will incur a performance hit from context switching on their in-order uarch. Which comes back to the promised DX12 features: it can do async compute/shaders, but that doesn't mean it's good at it.

From what I have been able to research (and getting good information about Maxwell architecture is difficult), many context switches are free as long as they are within kernel. Unlike CPUs, GPUs don't do context switching in software, but in hardware.

And one of the improvements for Maxwell over Kepler was faster context switching, much like Kepler improved over Fermi. So I don't know for sure how much of a performance hit, if any, would be caused by a context switch.

I'm not even sure whether a context switch would be required to do asynchronous compute, since it's part of the DX12 spec. It's not like it requires CUDA or OpenCL or anything..
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Can you link me to those articles so I can be on the same page as you?

Alright, that makes a lot more sense now though.
And I've been saying that, due to the speed of next-gen hardware, none of us are going to care about the 980 Ti/Fury X. It's why I'm struggling to buy them when I know next-gen hardware is going to be fast.

Agreed. If AMD was smart, they'd want their users to upgrade (they need the money right now).

However, your posts made me think of something:
Nvidia - "artificially" inhibits Kepler through Gameworks - "riot time!"
AMD - "artificially" inhibits DX11 through "GCN was designed for DX12/this API" - "AMD IS THE BEST!!!"

Some people just baffle me. They can swallow years of subpar DX11 performance, but the moment NV ignores an older series of cards they need to be burned down.
 

BSim500

Golden Member
Jun 5, 2013
1,480
216
106
Which is excellent and showcases DX12's potential for draw call & parallelism.
Software rendering is completely fake as a realistic "showcase" of any game's normal CPU load on any GPU. In fact, I don't think I've used a software-rendered benchmark since 3DMark '03 (the one with the planes dogfighting), and that was totally useless for telling me how 2003 games ran on 2003 GPUs in reality. It's almost like being back in the mid-90s, measuring the difference between a "2D Windows accelerator" card vs an actual proper first-generation 3D card with "revolutionary" new tech like "bilinear texture filtering" and "MIP mapping"... :biggrin:

If there's one thing I've learned here, it's this: whenever a new technology comes out, during the 6-12 month build-up to actually seeing at least half a dozen different real-world games released that make use of that technology, avoid daily tech forum "speculation" like the plague and you'll end up twice as educated... I doubt we'll know what DX12 is really like for the average game until the first 3 or 4 FPS games come out, as Star Swarm & AotS are obviously being received as 'pseudo-synthetic benchmarks first, games second', regardless of what marketing is pushing.
 

Shivansps

Diamond Member
Sep 11, 2013
3,875
1,530
136
Which is excellent and showcases DX12's potential for draw call & parallelism. I was very impressed with Frostbite when it uses 6 threads, this is on another level and all involved with DX12 should be commended (and certainly not belittled like some members here who are unhappy with the results!).

You've got to be joking. Go and play a DX9 single-threaded game under the "Windows Basic Adapter" (the X3TC demo is a 100% single-threaded game) and check how many cores it uses.

If the CPU part is a software renderer, it can only be used for raw CPU power comparison, nothing more.
 
Feb 19, 2009
10,457
10
76
I have no problem believing that Mantle, DX12 and Vulkan all share similarities. Why wouldn't they, as they are all low level APIs.. I don't believe that Mantle was the inspiration or foundation for DX12, however. The timeline doesn't support that notion. Plus DX12 was designed to support multiple architectures from its inception, unlike Mantle.

Vulkan supports multiple architectures, yet it was built on Mantle (and hastily too). Metal is the other Mantle contribution and also supports multiple architectures, but somehow DX12 is different.. well, it's your prerogative, believe what you will.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
@ Mahigan welcome to Anandtech! That PDF document concerned Kepler and not Maxwell. HyperQ was certainly improved for Maxwell over Kepler, as Maxwell can do everything in parallel. Ryan explains it in his article:

On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once – early implementations of HyperQ cannot be used in conjunction with graphics. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage..

Source

This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity under DirectX 12, even when compared to a lowly R9 290X.

Only one benchmark had results where the GTX 980 Ti performed so poorly, and that was Ars Technica's. Computerbase.de and ExtremeTech showed much better results for the GTX 980 Ti, on par more or less with the R9 Fury X:


 
Last edited:
Feb 19, 2009
10,457
10
76
Agreed. If AMD was smart, they'd want their users to upgrade (they need the money right now).

However, your posts made me think of something:
Nvidia - "artificially" inhibits Kepler through Gameworks - "riot time!"
AMD - "artificially" inhibits DX11 through "GCN was designed for DX12/this API" - "AMD IS THE BEST!!!"

Some people just baffle me. They can swallow years of subpar DX11 performance, but the moment NV ignores an older series of cards they need to be burned down.

What do you mean years of subpar DX11 performance?

My R9 290s stomped all over the 780s (and they were $100 cheaper each at the time!). The gap is even bigger now: an R9 290X on recent drivers is faster than a 970. If that's subpar, it's pretty decent.

Also, by buying an R9 290X, the gamer is rewarded years later with DX12 boosting its performance by a massive leap. That's bad? Your world is a strange one.
 

VR Enthusiast

Member
Jul 5, 2015
133
1
0
Agreed. If AMD was smart, they'd want their users to upgrade (they need the money right now).

However, your posts made me think of something:
Nvidia - "artificially" inhibits Kepler through Gameworks - "riot time!"
AMD - "artificially" inhibits DX11 through "GCN was designed for DX12/this API" - "AMD IS THE BEST!!!"

Some people just baffle me. They can swallow years of subpar DX11 performance, but the moment NV ignores an older series of cards they need to be burned down.

There's a big difference between optimising for a new API at the expense of the old one, and optimising for a new graphics architecture at the expense of the old one. In this case it would be that the former doesn't require another $500 outlay.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
If you go back in history, Star Swarm was used back then just as much as AotS is now to say AMD will win the DX12 game and get some magic bonus over nVidia that will somehow change the tide. Star Swarm was also AMD-backed.

What happened:


If anything, we learned that basing conclusions on alpha or even pre-alpha numbers is pointless.

I can answer this. Star Swarm makes use of 100,000 draw calls, and both AMD's and nVIDIA's hardware have no problem processing that many.

What is limiting AMD GCN performance, under Star Swarm, is the Peak Rasterization Rate (Gtris/s). nVIDIA has a relatively higher Peak Rasterization Rate on its Kepler and Maxwell GPUs than comparable AMD GCN cards. Why is the peak rasterization rate important? Because all of those units on the screen are made up of triangles (Polygons). This is the same issue we see under Ashes of the Singularity on AMD GCN hardware (why a Fury-X performs nearly the same as a 290/390x).

Star Swarm also makes no use of Asynchronous Shading (which is used under Ashes of the Singularity). If it did, the AMD GCN numbers would remain around the same whereas the nVIDIA Kepler and Maxwell numbers would take a nose dive.
 
Last edited:
Feb 19, 2009
10,457
10
76
There's a big difference between optimising for a new API at the expense of the old one, and optimising for a new graphics architecture at the expense of the old one. In this case it would be that the former doesn't require another $500 outlay.

Some people don't care; they always upgrade to the latest & greatest. In that case, Maxwell 2 won't matter once Pascal is out.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Vulkan supports multiple architectures, yet it was built on Mantle (and hastily too). Metal is the other Mantle contribution and also supports multiple architectures, but somehow DX12 is different.. well, it's your prerogative, believe what you will.

I meant from its inception. DX12 was created from the beginning to support multiple architectures. Mantle was not. You're right that Vulkan was created using Mantle as a foundation, but that's beside the point, since Vulkan comes from the same group that brought us OpenGL, a cross-architecture API.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
@ Mahigan welcome to Anandtech! That PDF document concerned Kepler and not Maxwell. HyperQ was certainly improved for Maxwell over Kepler, as Maxwell can do everything in parallel. Ryan explains it in his article:



Source

The difference between Maxwell and Maxwell 2 is that Maxwell's Grid Management Unit can only send either a Graphics task or 32 Compute tasks to the work Distributor. It cannot send both in Parallel.

Therefore you're correct in pointing out that with Maxwell 2 the communication between the Grid Management Unit and Work Distributor is now Parallel.

The problem is that this doesn't change the fact that Maxwell 2 still only contains a single Grid Management Unit. This still remains as a bottleneck.

nVIDIAs Parallelism, under Maxwell 2, is thus limited to 1 Graphics and 31 Compute tasks. AMDs Parallelism, under GCN 1.1 (290 series) and GCN 1.2 is limited to 1 Graphics and 64 Compute tasks.

Another difference is that AMDs GCN 1.1 (290 series)/GCN 1.2 have 8 independent Asynchronous Compute Engines each able to schedule and prioritize work independently of one another. With Maxwell 2, it's a single Grid Management Unit. You can see why GCN 1.1 (290 series)/GCN 1.2 can best take advantage of the available compute resources.

Take a look at all of those light sources floating around in Ashes of the Singularity. Each unit emits its own light sources in Parallel to other units. Each one of those light sources is a Compute task.

Therefore if there are more than 31 Compute tasks (assuming there is a Graphics task which there ought to be because of the amount of Rasterization going on), it takes two cycles for Maxwell 2 to assign the tasks to the Work Distributor. This looks to be the culprit (explaining why Maxwell 2 tends to match, but not beat, AMDs GCN 1.1/1.2 architecture).
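Purely as an illustration of the queueing model described above (made-up task counts, not measured data): with 50 concurrent compute tasks plus graphics, the 1 + 31 split would need ceil(50 / 31) = 2 submission rounds on Maxwell 2, while GCN 1.1/1.2's 64 compute queues cover it in ceil(50 / 64) = 1; at 100 compute tasks it becomes 4 rounds versus 2.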

I'm quite certain that Pascal will incorporate more than a single Grid Management Unit for this very reason.

What we do know is that with each run of the benchmark, the amount of work on the screen changes, because the benchmark isn't static. It is dynamic, rendered in real time. If we look at 4K over at the Dutch website you linked, we see this...


The GTX 980 Ti isn't doing poorly over at Ars Technica. At Ars they used a 290x to compare with the GTX 980 Ti. There is no 290x over at computerbase.de; they used a 390 (which is a slightly overclocked 290 with 8GB of GDDR5 RAM). That's 2816 ALUs vs 2560. Since Asynchronous Shading allows 100% efficient use of the GPU (as seen in the Ashes benchmarking tools), that translates into being able to add the 10% more ALUs figure directly. Add that difference and you'll see how Ars Technica got their numbers: the 390's 48.5 FPS times 10% equals 4.85, and 48.5 plus 4.85 equals 53.35 FPS. That's close to the 55.4 FPS of the GTX 980 Ti. Feel me?

A stock 290x can nearly match a stock GTX 980 Ti under Ashes of the Singularity. That's what is impressive imo.

I am near damn certain that if there was no Asynchronous Shading going on, the GTX 980 Ti would walk away with a large lead over the Fury-X just as it does under Star Swarm.

Ashes of the Singularity can thus be considered, imo, an Asynchronous Shading benchmark. I doubt we will see this degree of Asynchronous Shading under most DX12 titles, but I could be wrong.
 
Last edited:
Feb 19, 2009
10,457
10
76
I meant from its inception. DX12 was created from the beginning to support multiple architectures. Mantle was not. You're right that Vulkan was created using Mantle as a foundation, but that's beside the point, since Vulkan comes from the same group that brought us OpenGL, a cross-architecture API.

Actually AMD have said they intended for Mantle to be cross-architecture, as a foundation for new APIs, but the early showcase was on GCN as a proof of concept.

This is from a while ago:
http://www.eteknix.com/interview-amds-richard-huddy-responds-criticisms-mantle/
Right now Mantle only works with AMD hardware, yes, that’s true. But AMD has created what could become the foundation of a new Open Standard.

AMD didn’t “undercut” Microsoft, instead AMD lead the way in bringing low level APIs into the 21st Century.

Your contention is that Mantle is specific to GCN and therefore cannot have been the foundation of DX12. But clearly Vulkan is cross-architecture, appearing very quickly out of nowhere, built with Mantle as a foundation. The similar timing of Apple's Metal also suggests they got a similar deal to Khronos. Two big players (& competitors to MS) got a brand new API for free, without doing major work or investment.. surely that would have enticed MS big-time to "borrow" some Mantle-like features to bring forth DX12. That's the way I see it. We are free to disagree. In time, we shall see who was right.

Edit: Reading that statement from AMD again: that was a while ago, and they had already planned (& were allowed to say publicly) that Mantle could become the foundation of a new Open Standard.. obviously their plans go back further than that public statement, and I would have to say that Mantle's purpose was exactly that, to become the new API standard. It's no coincidence that when Mantle debuted we saw the rise of Metal and Vulkan, then MS announcing DX12 "soon", and here we are.
 
Last edited:

shady28

Platinum Member
Apr 11, 2004
2,520
397
126
http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/



Oxide is fully aware of their engine and its potential issues. Their prior experience was Civ 5, one of the few DX11 games with multi-threaded rendering (which excelled on NV's Fermi & later GPUs!) and DirectCompute shaders.

You keep quoting, as if it has complete veracity, a source that has an AMD logo on its front page and was closely aligned with AMD for Mantle and later for DirectX 12.

The conflict of interest is quite obvious. When this all washes out, most likely the only victor will be Stardock with huge "Founder" sales for their RTS game - a genre that has been ever so slowly fading for 10 years.

Given we have virtually nothing else to benchmark, I suspect that being "first" to field with something to measure DirectX 12 with was a planned move. I wonder how many man-hours they spent optimizing their AMD DX12 code paths vs their Nvidia DX12 code paths.

Meanwhile, we are starting to get some other items to benchmark.

"Unreal Engine 4.9 – DirectX 12 Tech Demo Now Available for Download"

http://techfrag.com/2015/08/21/unreal-engine-4-9-directx-12-tech-demo-now-available-for-download/
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The problem is that this doesn't change the fact that Maxwell 2 still only contains a single Grid Management Unit. This still remains as a bottleneck.

nVIDIAs Parallelism, under Maxwell 2, is thus limited to 1 Graphics and 31 Compute tasks. AMDs Parallelism, under GCN 1.1 (290 series) and GCN 1.2 is limited to 1 Graphics and 64 Compute tasks.

Take a look at all of those light sources floating around in Ashes of the Singularity. Each unit emits its own light sources in Parallel to other units. Each one of those light sources is a Compute task.

Therefore if there are more than 31 Compute tasks (assuming there is a Graphics task which there ought to be because of the amount of Rasterization going on), it takes two cycles for Maxwell 2 to assign the tasks to the Work Distributor. This looks to be the culprit (explaining why Maxwell 2 tends to match, but not beat, AMDs GCN 1.1/1.2 architecture under the majority of the tests done across the web, you point to a single test which shows differing results).

From everything I've been able to discern, the Grid Management Unit (or GMU) is a system that works in tandem with HyperQ to increase parallelism as much as possible by prioritizing workloads dynamically, even to the point where it can pause them or hold pending or suspended grids..

So having one GMU shouldn't matter, as all it's doing is dynamically managing which grids get sent to processing.

I'm quite certain that Pascal will incorporate more than a single Grid Management Unit for this very reason.

If this were such a big deal, I think NVidia would have already done so for Maxwell.. The entire process is very dynamic with Maxwell, so adding another GMU wouldn't likely make a big difference, considering that the GMU can pause active grids or hold pending and suspended grids.

I think you're also picking and choosing your benchmarks based on the results you'd like to see. What we do know is that with each run of the benchmark, the amount of work on the screen changes, because the benchmark isn't static. It is dynamic, rendered in real time. If we look at 4K over at the Dutch website you linked, we see this...

That's the same benchmark I used in my earlier post. It's German, not Dutch.

Also, the Fury X has a big bandwidth advantage over the GTX 980 Ti, which becomes more pronounced at higher resolutions, particularly when MSAA is thrown into the mix. So it's not surprising that the Fury X would gain an edge at 4K when MSAA is being used..
 