How many ACEs does Maxwell 2 have?

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I've been researching ACEs a lot lately, as I believe they'll play a massive role in reshaping PC gaming for the better; nearly as much as DX12's higher draw call ceiling. Asynchronous compute could finally put the nail in the coffin of GPU PhysX, for instance, at least when it comes to relying on CUDA. With DX12, asynchronous compute is done natively, just like DirectCompute with DX11, whereas with CUDA, the GPU must engage in expensive context switching.

Anyway, the technology has loads of potential to be sure, especially with physics. But as I was researching it, I found conflicting information as to how many ACEs Maxwell 2 has.

According to our very own Ryan Smith who wrote an excellent article about asynchronous compute shaders, Maxwell 2 has 31 dedicated compute engines with 1 queue dispatch each, or 32 in pure compute mode. The latest GCN core has 8 compute engines, with 8 queue dispatches for each one, totalling 64 queues.
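To make the counting dispute concrete, here's a rough sketch (Python, purely illustrative; the engine and queue counts are just the ones quoted above, and the distinction it shows is engines versus total queues):

```python
# Illustrative tally of the queue counts discussed in the thread.
# "Engines" and "queues per engine" are the two axes people conflate.

def total_queues(engines: int, queues_per_engine: int) -> int:
    """Total concurrently-addressable queues = engines x queues per engine."""
    return engines * queues_per_engine

# Latest GCN as described above: 8 compute engines (ACEs), 8 queues each
gcn_compute_queues = total_queues(engines=8, queues_per_engine=8)        # 64

# Maxwell 2 per Ryan's chart: 31 compute queues (plus 1 graphics queue),
# or 32 in pure compute mode -- whether that's "31 engines" or
# "1 engine with 31 queues" is exactly what the thread argues about
maxwell2_compute_queues = total_queues(engines=1, queues_per_engine=31)  # 31

print(gcn_compute_queues, maxwell2_compute_queues)  # 64 31
```

Counting engines gives 8 vs 1 (or 31, depending on who you believe); counting queues gives 64 vs 31. Both numbers are "correct" depending on which axis the chart is labelling.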

Several AT members have asked him to correct the data, but he has not done so, claiming, and I quote, "The queue counts are correct. Keep in mind we're counting engines, not queues within an engine."

Extremetech's Joel Hruska seems to back up Ryan's data; he wrote a similar article and, in the comments section below it, stated, "I've seen no evidence that Nvidia has dumbed down anything. Maxwell has 32 asynchronous engines to AMD's eight, and can operate in similar fashion."

Several members on this very forum (as evidenced in this thread) have stated that Maxwell 2 has only ONE compute engine with 32 queues. How do they respond to this conflicting information, with two major tech websites claiming one thing while they claim another?

As a Maxwell 2 owner, the idea that NVidia would improve compute performance so substantially over Kepler but keep only a single compute engine strikes me as silly and illogical.

So right now I'm inclined to believe Ryan and Joel. In any case, it will be very interesting to see how NVidia's approach compares with AMD's when we finally start seeing games use this feature. Hopefully Gears of War Ultimate edition will use asynchronous compute, as it will likely be the first game this year to have DX12.
 
Last edited:

Vaporizer

Member
Apr 4, 2015
137
30
66
Since NV controls 70% of the market, developers will tailor ACE usage to NV. In principle this means that AMD's greater potential ACE power will have no measurable effect until NV decides to increase ACE counts on their hardware. So business as usual.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
GTX 960 is already Maxwell 2. So I'm guessing zero with the GTX 950 series.

Maxwell 2 is GTX 970, 980, 980 Ti and Titan X. Maxwell 1 is the GTX 750 Ti, since that was the first Maxwell-based GPU.

Since NV controls 70% of the market, developers will tailor ACE usage to NV. In principle this means that AMD's greater potential ACE power will have no measurable effect until NV decides to increase ACE counts on their hardware. So business as usual.

That's possible but besides the point of this thread. It seems there are an awful lot of people already discounting Maxwell's asynchronous compute capabilities against AMD's Fury, but is there a basis for this?

Some people are convinced that Maxwell only has one compute engine, like Kepler. But looking at the benchmarks, Maxwell is much faster than Kepler in compute, and is very competitive with AMD at the moment.

But if Maxwell really does have 31 dedicated compute engines, does this make it the more capable GPU when it comes to asynchronous compute? Likely the answer is a lot more complicated than mere number comparisons.
 
Feb 19, 2009
10,457
10
76
You have to go to the B3D forums for more detailed talk on this topic, but the gist of it is that CUDA exposes the compute engine to devs, and they say it remains the same as Kepler: one engine with 32 queues. The difference is that Maxwell 2 has an enhanced DMA engine that enables asynchronous operation, whereas Kepler does not.

Ryan's article is obtuse, because the chart has:

"GPU Queue Engine Support"

In pure compute mode, later GCN has 64 compute queues which can run in parallel. Not 8 as listed. Clearly wrong.

Joel's article was also wrong on XBOX uarch details; it's not 8 ACEs but only 2. A lot of people recycle AT's Ryan article.

NV is being very quiet on this issue regarding their architecture and DX12, async compute, VR etc. Without them confirming officially, we only have forum posts from devs to compare; for that, again, go to B3D.

This article comes from AMD/NV/MS & Devs at GDC:
http://www.hardware.fr/news/14133/gdc-d3d12-amd-parle-gains-gpu.html

"On Nvidia's side, Fermi and Kepler GPUs make do with a single command processor that cannot handle graphics-type and compute-type commands simultaneously. It is either in one mode or the other, with a relatively heavy state change to switch between them.

Moreover, apart from GK110/GK210, these older GPUs have only a single queue at the command processor level, which processes tasks in the order they are submitted. This severely limits the ability to handle multiple simultaneous tasks in compute mode when there are dependencies between them.

With GK110, Nvidia introduced a more advanced command processor that supports a technology called Hyper-Q. It provides the ability to handle up to 32 queues, but only in compute mode. This GPU, and the GM107/GM108 which inherit this feature, are well suited to running many small concurrent compute-type tasks (for physics, for example?).

GK110 also introduced a second DMA engine, but it is reserved for the Tesla and Quadro variants. Finally, with the Maxwell 2 GPUs (GM200/GM204/GM206), Nvidia lifted all of these limitations, contrary to what we thought. First, the second DMA engine is active on the GeForce versions. More importantly, when Hyper-Q is active, one of the 32 queues can be of the graphics type.

In terms of software support, fully concurrent handling of all types of work is supported under Direct3D 12 on Maxwell 2 GPUs. By contrast, concurrent execution between Direct3D 12 and CUDA, which relies on a different driver, is not possible. Nvidia has also implemented limited support in its driver for concurrent execution under Direct3D 11, for compute tasks only.

The API does not allow developers to access this explicitly, but some tricks let the drivers activate such a mode to optimize performance (in the case of GPU PhysX?). Nvidia told us that the question of adding fully concurrent handling of all task types, as in Direct3D 12, remains open. It is not impossible if there were a purpose for it, but no current Direct3D 11 game is expected to benefit. Without it being clearly exposed in the API, it is very difficult for developers to make sure it can work."

From this hardware.fr article, it sounds like Maxwell 2's Hyper-Q has issues working with CUDA, which could explain the weird CUDA results that make it look "Kepler-like". Maybe under DX12 it's fully 32 queues, or 1 graphics + 31 compute.

But if you count it like that, then for GCN it would be 1 graphics + 64 compute, because GCN has a separate graphics command processor and 8 independent ACEs with support for 8 queues each.
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You can't use AMD's marketing names for nVidia's hardware.

Since GK110 every nVidia GPU supports 32 independent compute queues. They call it Hyper-Q. A Grid Management Unit is feeding the compute units with data.

With Maxwell v2 nVidia has expanded the functionality. Now it is possible to use one graphics queue and 31 compute queues at the same time.
 
Feb 19, 2009
10,457
10
76
You can't use AMD's marketing names for nVidia's hardware.

Since GK110 every nVidia GPU supports 32 independent compute queues. They call it Hyper-Q. A Grid Management Unit is feeding the compute units with data.

With Maxwell v2 nVidia has expanded the functionality. Now it is possible to use one graphics queue and 31 compute queues at the same time.

GK110 Tesla could also do that, because it has 2 functional DMA engines; GeForce had 1 disabled to neuter its compute. The difference here is that consumer Maxwell 2 also has 2 DMA engines.

It depends on what you want to count. If it's ENGINES, then GCN is 1 graphics + 8 compute.

But if it's how many queues it can execute, then it's 1 + 64, or 64 compute. Ryan needs to fix his chart; it's wrong against AMD's own released info.



Why is NV not hyping up DX12 and VR? Why are they not explaining their uarch in the context of DX12??

All the hype is coming from Gamedevs and AMD.

This is what I like about DX12, from Devs: https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=19m40s

"Compute is basically FREE performance". It's a game changer for games.
 
Last edited:

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
How many ACEs does Maxwell 2 have?

Zero? :hmm:

Both vendors support all the needed task types (graphics, compute, copy) and have more than enough dispatch power.

AMD's ACEs should be a benefit for async compute, as the GPU has more tasks to choose from and can fill up the "blanks" faster. (Ryan states the same in the article.)

That said, those engines certainly will not be the bottleneck in the new DX12 games.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
ACE is an AMD thing, and Nvidia doesn't call their approach by any fancy name.

On GCN 1.1/1.2, one ACE is an out-of-order compute command engine with the ability to support up to 8 command queues. Most newer Radeons have 8 ACEs.
On Maxwell 2 there is one in-order command engine with the ability to support up to 31 compute command queues in mixed mode.
This means that a D3D12 engine with multi-engine support can run up to eight different async pipelines on the newer GCN GPUs, while Maxwell 2 can only execute one. But we are speaking about theoretical numbers here. In practice it might not be useful to execute more than one async compute pipeline. You have to think about the register usage.

Handling async shaders is complicated, so the front-end is just one aspect of the problem. On Maxwell 2 the compute is not stateless, so a context switch might be needed to perform an async task, and this comes with a huge performance penalty. GCN is stateless compute hardware, so it can run any async compute task without a context switch. This is more important than the front-end. And GCN has a very robust cache system, which is also important in these scenarios.
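The context-switch point above can be sketched with a toy cost model (all numbers invented; only the structure matters): hardware that must switch state between graphics and compute pays a fixed penalty on every transition, while a stateless design pays nothing extra.

```python
# Toy cost model (all numbers invented) contrasting a GPU that needs a
# context switch to run an async compute task against a stateless one.

def run_mixed_workload(tasks, switch_cost: float) -> float:
    """Sum task costs, adding switch_cost every time the task type changes."""
    total, prev_kind = 0.0, None
    for kind, cost in tasks:
        if prev_kind is not None and kind != prev_kind:
            total += switch_cost  # graphics <-> compute transition penalty
        total += cost
        prev_kind = kind
    return total

# Alternating graphics/compute work, e.g. physics interleaved with rendering
workload = [("gfx", 1.0), ("compute", 0.2)] * 4

stateless_time = run_mixed_workload(workload, switch_cost=0.0)  # GCN-like
switching_time = run_mixed_workload(workload, switch_cost=0.5)  # penalty on each switch

print(stateless_time, switching_time)
```

The more finely graphics and compute are interleaved, the more the switching design loses; that's why a fixed per-switch penalty matters so much for async workloads in particular.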
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
DMA engines have nothing to do with it. They are responsible for data transfer between the GPU and the CPU.

Hyper-Q is nVidia's term for asynchronous computing. They can feed the GMU from 32 independent input streams and the GMU can put out 32 independent queues.

Calling it "one engine" is just wrong.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Why is NV not hyping up DX12 and VR?

VR is a different beast. Async timewarps do not run well on Nvidia GPUs, because Fermi, Kepler and Maxwell only support draw-based preemption with the context priority feature. This is very inefficient with long draws, and with that the timewarp might not be calculated in time, which will result in a stutter.
GCN handles this differently with stateless compute and the out-of-order ACEs. Also, the newer GCN 1.2 design supports fine-grained preemption for very low-latency timewarp rendering.

Pascal will be different. It will support fine-grained preemption, just like GCN 1.2.
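The draw-based vs fine-grained preemption difference can be modelled with a tiny sketch (invented millisecond figures, purely illustrative): a timewarp request that arrives mid-draw must wait for the whole draw to finish under draw-based preemption, but can interrupt almost immediately under fine-grained preemption.

```python
# Toy model (invented numbers) of async timewarp latency under draw-based
# vs fine-grained preemption.

def timewarp_wait(draw_length_ms: float, request_at_ms: float,
                  fine_grained: bool, slice_ms: float = 0.1) -> float:
    """Time the timewarp request waits before it can start executing."""
    if fine_grained:
        return slice_ms                       # preempt at the next fine boundary
    return draw_length_ms - request_at_ms     # must wait for the draw to finish

long_draw = 10.0  # ms; a "long draw" as described in the post
wait_coarse = timewarp_wait(long_draw, request_at_ms=1.0, fine_grained=False)
wait_fine = timewarp_wait(long_draw, request_at_ms=1.0, fine_grained=True)

print(wait_coarse, wait_fine)  # 9.0 vs 0.1 ms
```

With a ~11 ms frame budget at 90 Hz, a 9 ms stall from one long draw is exactly the kind of missed-timewarp stutter described above.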
 
Feb 19, 2009
10,457
10
76
DMA engines have nothing to do with it. They are responsible for data transfer between the GPU and the CPU.

Hyper-Q is nVidia's term for asynchronous computing. They can feed the GMU from 32 independent input streams and the GMU can put out 32 independent queues.

Calling it "one engine" is just wrong.

Is Hyper-Q active for geforce or only a Tesla thing?
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Some people are convinced that Maxwell only has one compute engine, like Kepler. But looking at the benchmarks, Maxwell is much faster than Kepler in compute, and is very competitive with AMD at the moment.

"Compute" benchmarks don't tell you a whole lot about asynchronous compute engines. Lots of the benchmarks are little more than FLOPs tests, launching a single massively parallel kernel and crunching it as fast as possible. Multiple compute engines come into play when you have multiple different kernels in flight simultaneously, and are trying to keep the GPU occupied.
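This gap-filling effect can be shown with a toy occupancy model (invented figures; the point is the structure, not the numbers): a single small kernel leaves most of the GPU idle, while several independent kernels in flight let the scheduler fill those gaps.

```python
# Toy occupancy model (invented numbers): small kernels run back-to-back
# leave most execution units idle; running them concurrently fills the gaps.

GPU_UNITS = 16  # hypothetical machine width

def serial_time(kernels):
    """Run kernels one after another: total time is the sum of durations."""
    return sum(duration for units, duration in kernels)

def concurrent_time(kernels):
    """Idealized concurrent execution: total work / machine width,
    but never faster than the longest single kernel."""
    work = sum(units * duration for units, duration in kernels)
    longest = max(duration for units, duration in kernels)
    return max(work / GPU_UNITS, longest)

# Three small kernels, each occupying only 4 of 16 units for 1 ms
kernels = [(4, 1.0), (4, 1.0), (4, 1.0)]

print(serial_time(kernels), concurrent_time(kernels))  # 3.0 vs 1.0
```

A FLOPs-style benchmark is the opposite case: one kernel already saturating all units, so it tells you nothing about how well multiple engines juggle concurrent work.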
 
Feb 19, 2009
10,457
10
76
VR is a different beast. Async timewarps do not run well on Nvidia GPUs, because Fermi, Kepler and Maxwell only support draw-based preemption with the context priority feature. This is very inefficient with long draws, and with that the timewarp might not be calculated in time, which will result in a stutter.
GCN handles this differently with stateless compute and the out-of-order ACEs. Also, the newer GCN 1.2 design supports fine-grained preemption for very low-latency timewarp rendering.

Pascal will be different. It will support fine-grained preemption, just like GCN 1.2.

Is this the reason devs working on VR (I've read it a few times in different places) have said NV GPUs can't handle it well, with a ton of latency, and users get motion sickness?

Somebody on this forum has mentioned in the past that Maxwell 2 is more GCN-like than Kepler; I wonder if building a GPU to take advantage of DX12 would therefore turn Pascal into something "very GCN-like".

Maybe that explains NV being quiet on this front until Pascal... then it's all aboard the DX12 & VR train!

Personally I don't think the differences will matter that much for current GPUs in DX12, because games built from the ground up for DX12 will be a while coming, by which point current hardware is kinda pointless. i.e. the real DX12 battle is going to be Arctic Islands vs Pascal.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Why is NV not hyping up DX12 and VR? Why are they not explaining their uarch in the context of DX12??

All the hype is coming from Gamedevs and AMD.

This is what I like about DX12, from Devs: https://www.youtube.com/watch?v=7MEgJLvoP2U&feature=youtu.be&t=19m40s

"Compute is basically FREE performance". It's a game changer for games.

Did you burn your fingers enough on hype already? Especially AMD hype?

And marketing slides and reality have a tendency to be far apart. Go figure out why.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Is this the reason devs working on VR (I've read it a few times in different places) have said NV GPUs can't handle it well, with a ton of latency, and users get motion sickness?

Somebody on this forum has mentioned in the past that Maxwell 2 is more GCN-like than Kepler; I wonder if building a GPU to take advantage of DX12 would therefore turn Pascal into something "very GCN-like".

Maybe that explains NV being quiet on this front until Pascal... then it's all aboard the DX12 & VR train!

Personally I don't think the differences will matter that much for current GPUs in DX12, because games built from the ground up for DX12 will be a while coming, by which point current hardware is kinda pointless. i.e. the real DX12 battle is going to be Arctic Islands vs Pascal.
Yes, the long draws will kill the experience. There might be some engine redesign to improve performance, but it is hard to handle this situation.

Pascal won't magically solve the problems for NV, because they don't have their own API like Mantle. So they need the hardware, and then the standard VR API to use the new abilities.
 
Feb 19, 2009
10,457
10
76
Did you burn your fingers enough on hype already? Especially AMD hype?

And marketing slides and reality have a tendency to be far apart. Go figure out why.

I take more value from game devs who directly hype up DX12, not AMD; that's just interesting, but beyond that, always expect some PR when it comes to official company stuff.

We're seeing devs hype async compute on consoles as well, not just PC. And the VR devs are always full of enthusiasm. I'm reserving final judgement until I can get hands-on with it, but atm my leaning is towards VR being a gimmick.
 
Feb 19, 2009
10,457
10
76
Yes, the long draws will kill the experience. There might be some engine redesign to improve performance, but it is hard to handle this situation.

Pascal won't magically solve the problems for NV, because they don't have their own API like Mantle. So they need the hardware, and then the standard VR API to use the new abilities.

Why do they need a separate API for VR, is DX12 not capable?
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Why do they need a separate API for VR, is DX12 not capable?
For the current specs, no. Microsoft will need to extend the multi-engine compute with fine-grained preemption support, and the API must support late data latching for optimal head tracking. Only Mantle supports these right now, but a standard will surely evolve in the future.
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Is this the reason devs working on VR (I've read it a few times in different places) have said NV GPUs can't handle it well, with a ton of latency, and users get motion sickness?

Somebody on this forum has mentioned in the past that Maxwell 2 is more GCN-like than Kepler; I wonder if building a GPU to take advantage of DX12 would therefore turn Pascal into something "very GCN-like".

Maybe that explains NV being quiet on this front until Pascal... then it's all aboard the DX12 & VR train!

Personally I don't think the differences will matter that much for current GPUs in DX12, because games built from the ground up for DX12 will be a while coming, by which point current hardware is kinda pointless. i.e. the real DX12 battle is going to be Arctic Islands vs Pascal.

DX12, fortunately or unfortunately, was designed for current hardware. Future hardware might do things faster, but I don't expect current up-to-date hardware to be pointless. We have DX12 games coming sooner than that: Fable Legends in DX12 alone, and Ashes of the Singularity built from the ground up for DX12. There will be more. The good thing is that a lot of DX12 features are for performance, so current hardware should get faster in the near future, with either higher fps or better graphics.

Nvidia's silence is suspicious. I am surprised to hear they don't do VR that well, even though that was one thing they chose to talk about. Maybe we missed some information releases.
 
Last edited:

Elixer

Lifer
May 7, 2002
10,376
762
126
Since NV controls 70% of the market the developers will adjust ACE usage to NV. This means in principle that the potential more ACE power of AMD will have no measureable effect till NV decides to increase ACE on their hardware. So Business as usual.

Nope... not even close.

The big dog is still Intel.

As for # of ACEs, that is the wrong thing to be looking at, since they are not called that on different hardware.

For whatever reason Ryan didn't correct the table, which made everything murkier (processors vs. queues), and it really should be corrected.
He writes:
Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode)
So here he is clearly talking about compute queues. If that is the case, then the table should show 64 for GCN 1.1 & 1.2.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
Nope... not even close.

The big dog is still Intel.

As for # of ACEs, that is the wrong thing to be looking at, since they are not called that on different hardware.

For whatever reason Ryan didn't correct the table, which made everything murkier (processors vs. queues), and it really should be corrected.
He writes:
So here he is clearly talking about compute queues. If that is the case, then the table should show 64 for GCN 1.1 & 1.2.

There is the graphics market and there is the gaming market.

Most of those Intel IGPs are not part of the gaming market.

For instance, the devs of Witcher 3 really didn't care about Intel IGPs. No IGP can play Witcher 3 at acceptable framerates.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
Maybe it explains NV being quiet on this front, until Pascal.. then it's all aboard DX12 & VR train!

From a casual VR-interest perspective, I don't see how Nvidia is being any different from AMD or anyone else regarding VR interest/PR/initiatives. When the 980 came out, they touted GM204's VR capabilities in their press deck slides; I remember that very specifically. And this past week they released their VRWorks software suite. They have also discussed different techniques for improving VR performance, like this: http://www.fudzilla.com/news/graphics/38179-nvidia-vr-multi-rendering-saves-20-percent-performance

I'm not particularly familiar with other companies' initiatives (besides Oculus and HTC/Valve), but I don't get the impression that Nvidia is being "quiet" on VR.
 