The AMD Mantle Thread

Status
Not open for further replies.

desprado

Golden Member
Jul 16, 2013
1,645
0
0
How can so many still not get this?



- Heavily multithreaded rendering
- Asynchronous compute
- That's where the complexity is (meaning with DX)
- This now translates easily to the PC (meaning with Mantle)

It's almost a direct copy - you're talking about AMD x86 CPU and AMD GCN GPU in the console and in the (AMD) PC. How much different can it be? WHY would it be so different?

The reason it's so hard under DX is because DX can't deal with multithreading properly and doesn't know what to do with compute either so it just mashes it all together in a single thread, causing the game to stall. Mantle - and the consoles - has separate queues so compute can run simultaneously with graphics, with no chance of either stalling the other.
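
As a rough illustration of the queueing argument above, here is a toy Python timing model (the millisecond figures are made up for illustration; this is not anything from the Mantle or DX APIs):

```python
# Toy timing model: with one shared queue (DX-style), compute work
# serializes behind graphics; with separate queues (Mantle-style), compute
# overlaps graphics and the frame takes only as long as the longer workload.

def serial_frame_time(graphics_ms, compute_ms):
    """Single shared queue: compute runs only after graphics finishes."""
    return graphics_ms + compute_ms

def overlapped_frame_time(graphics_ms, compute_ms):
    """Separate graphics and compute queues: the two run concurrently."""
    return max(graphics_ms, compute_ms)

# Hypothetical frame: 12 ms of graphics (say, shadow maps) plus 4 ms of
# compute (say, post-processing). Serial: 16 ms. Overlapped: 12 ms, so the
# compute work becomes "close to zero cost".
```

The same arithmetic is behind the TechReport quotes later in the thread about overlapping shadow-map rendering with post-processing compute.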

Devs will start taking advantage of this feature on consoles very soon, and it will transfer to Mantle very easily. It will probably need to be cut from the DX version, or will be dropped there due to time/difficulty.
Wow man, you believe in this hype so much. The product isn't even out yet, and only four developers are using it, all under AMD's Gaming Evolved contract.

These are all PR slides, which anyone can make at any time.
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
I guess I'm confused. How can you take features that were designed to work on consoles, port them to Mantle, and then not be able to implement them on any other PC API, when the Xbox One uses DX 11.x and the PS4 uses OGL?

If they work on both of those two separate APIs, then it should be possible to make them work on whatever DX version is on PC. Granted, perhaps not as efficiently, but still work.

Just working isn't enough - it has to work efficiently. Look at the most recent example of the motion blur with Nitrous. It's still "working" on DX but it's so slow as to be pointless.

Now, if there are specific routines or functions enabled only by GCN hardware, great. But there are ways to deal with that in current APIs. DirectX games, and the developers who code for them, seem to manage just fine with capability queries to figure out whether you can run TressFX/MLAA/FXAA/TXAA, whatever. Similarly, with current coding, if I decide I don't want to run one of those features, or my hardware can't efficiently support it, I'm not shunted into a completely unoptimized code path running a separate game build (well, unless you are talking about DX version builds, but that's somewhat different).

I'm still struggling, because the "This enables visuals millions of times ahead of the consoles" and "This is the same capability set and code as the consoles, so faster and cheaper ports!" flags seem to be switched and reswitched depending on what logic or questions get asked. I can't see how it can be both, but the frantic pace of posting in this thread means that both sets of statements aren't in proximity long enough for most people to try and compare them.
There's nothing that *can't* be done on DX that can be done in Mantle, but there are things that just can't be done efficiently. That's the real difference, and that's what they mean when they talk about new stuff being enabled by Mantle: it's not that DX couldn't do it at all, it's that it could never do it efficiently, which in practice amounts to the same thing.

DX can multi-thread but it will be shown up as badly inefficient compared to Mantle. BF4 will show it - the game is already well threaded (for DX) but on Mantle it will perform in a different league.

DX: [slide not preserved]

Mantle: [slide not preserved]

DX can also run compute but it can't do it in its own queue, so it all ends up competing for the same resources. These slides show it best:

[slides not preserved]

You wouldn't attempt this final slide under DX because it would all end up being mashed together in one thread and would stall the game. With Mantle it's all in separate queues - the devs know what will happen and when, they have full control over it. The devs don't know what will happen with DX half of the time.

This is really good information on it - http://techreport.com/review/25683/delving-deeper-into-amd-mantle-api/3

The APIs we have right now, they just allow us to queue synchronous workloads. We say, "draw some triangles," and then, "do some compute," and the driver can try to be a little smart, and maybe it'll overlap some of that. But for the most part, it's serial, and where we're doing one thing, it's not doing other things.
That paragraph is basically all you need to understand it, but the rest is worth reading as well.

With Mantle . . . we can schedule compute work in parallel with the normal graphics work. That allows for some really interesting optimizations that will really help your overall frame rate and how . . . with less power, you can achieve higher frame rates.
What we'd see, for example—say we're rendering shadow maps. There's really not much compute going on. . . . Compute units are basically sitting there being idle. If, at the same time, we are able to do post-processing effects—say maybe even the post-processing from a previous frame, or what we could do in Tomb Raider, [where] we have TressFX hair simulations, which can be quite expensive—we can do that in parallel, in compute, with these other graphics tasks, and effectively, they can become close to zero cost.
If we guessed that maybe only 50% of that compute power was utilized, the theoretical number—and we won't reach that, but in theory, we might be able to get up to 50% better GPU performance from overlapping compute work, if you would be able to find enough compute work to really fill it up.

What Katsman is saying is that the hardware has the capability but DX can't make it work without seriously affecting the rest of the game.

This part of the Oxide presentation is well worth watching - http://www.youtube.com/watch?feature=player_detailpage&v=QIWyf8Hyjbg#t=839

They are avoiding features of the hardware because of the overhead in DX - the features are there but they are unusable because of the overhead. That must be annoying as hell to see features that are available but unusable just because of the API.

Sigh. I should probably just stop trying to force it to make sense and wait for the benchmarks. The sheer volume of information being slung about though makes me wonder what sorts of repercussions this is all likely to have, if only on the fate of civil discussion.
This is pretty advanced stuff and based on what you said you're understanding it more than most are. People understand it at different levels anyway - I couldn't program a GPU if my life depended on it, but I am beginning to get a decent understanding of the concepts behind how it all works at an abstract level, so I can now figure out what the devs are talking about even if I couldn't fix it.
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
Wow man, you believe in this hype so much. The product isn't even out yet, and only four developers are using it, all under AMD's Gaming Evolved contract.

This again? How many people have told you otherwise on this point, yet you continue to ignore them?

These are all PR slides, which anyone can make at any time.
Make some, come back and show us.
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
DX and OpenGL on the PC don't have the hardware access that they do on the consoles.

What don't they have access to? I thought it was mostly because it would be pointless to attempt it with DX or OGL, nothing to do with not having access.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136

OK that slide looks a bit fishy to me. They claim DX doesn't scale beyond 2-3 cores, yet in their Frostbite 3 presentation, they claimed the engine could use up to 8 threads:

Source slide number 11.

I've played BF4 multiplayer, and every single one of the threads on my 3930K was lit up. Crysis 3 is the same, and the Firaxis presentation on Civ 5 showed high activity across 12 threads as well.

So DX isn't limited to 2 or 3 from what I've seen and experienced.
 

Spjut

Senior member
Apr 9, 2011
928
149
106
That's actually impressive off a laptop. Could anybody translate what he's saying?

I'm Swedish. Much of what he says is what we've heard before and is said in the text. The CPU is an AMD A10. 40-45% faster according to AMD; no specific framerates were given (but this demo ran at around 30 FPS for them). The demo crashed twice during "the last 10 minutes".
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
Windows might show every thread of a 12 thread CPU running at 10%, but it's really only running on two threads.

A quad core CPU showing 20% on each thread in Windows is probably only running on one thread too, at 80% utilisation, or two threads at say 60% and 20%.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Windows might show every thread running at 10%, but it's really only running on two threads.

A quad core CPU showing 20% on each thread in Windows is probably only running on one thread too, at 80% utilisation, or two threads at say 60 and 20.

What on Earth are you talking about?

So the Windows task manager is lying? I seriously doubt that. I think what they meant was that DX is limited to 2 or 3 threads for RENDERING perhaps, but definitely not explicitly limited to 2 or 3 threads.

But even that's not really true. It might be true for immediate-context rendering, but with DX11 multithreading, deferred contexts can use as many worker threads as required to feed the GPU, resulting in draw calls in excess of 15K to 20K.
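
The deferred-context split described above can be sketched as a toy model (Python threads standing in for render worker threads; the function names are illustrative, not real D3D11 calls):

```python
import threading

def record_command_list(scene_chunk):
    # Stand-in for recording draw calls on a deferred context (roughly what
    # ID3D11DeviceContext::FinishCommandList produces in D3D11).
    return [f"draw({obj})" for obj in scene_chunk]

def render_frame(scene_chunks):
    """Record command lists on worker threads, then replay them in order."""
    lists = [None] * len(scene_chunks)

    def worker(i, chunk):
        lists[i] = record_command_list(chunk)  # recording runs in parallel

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(scene_chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    executed = []
    for cl in lists:          # execution is still serial and in submission
        executed.extend(cl)   # order, like ExecuteCommandList on the
    return executed           # immediate context
```

The key property, as in DX11, is that only the recording parallelizes; final submission to the GPU remains a single serial stream.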
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
What don't they have access to? I thought it was mostly because it would be pointless to attempt it with DX or OGL, nothing to do with not having access.

You don't even have DMA on PC; there is a lot done by DX that you have no access to change and minimal control over.
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
What on Earth are you talking about?

So the Windows task manager is lying?

No, but a single thread running in Windows will most likely show about 25% per core on a quad core. It won't just fully load one core and leave the rest doing nothing or very little.
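
The arithmetic behind this is simple (a hypothetical model, assuming the scheduler migrates each busy thread evenly across all cores):

```python
def per_core_readout(busy_threads, cores, load_per_thread=1.0):
    """Average per-core utilization Task Manager would show if the scheduler
    spreads the busy threads evenly over the cores. A single thread can
    occupy at most one core's worth of time."""
    total_cores_of_work = min(busy_threads * load_per_thread, cores)
    return total_cores_of_work / cores

# One fully-busy thread on a quad core reads as ~25% on every core's graph,
# even though the game is effectively single-threaded.
```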
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
What on Earth are you talking about?

So the Windows task manager is lying? I seriously doubt that.

Lying? Well not exactly but it's not really showing you the whole truth either. I thought this was common knowledge by now.

I think what they meant was that DX is limited to 2 or 3 threads for RENDERING perhaps, but definitely not explicitly limited to 2 or 3 threads.

Probably that's what they meant, but the slide still shows the serial nature of the rendering under DX that doesn't exist with Mantle.
 
Feb 19, 2009
10,457
10
76
Here is the possible virtuous cycle that I see.
-Mantle is released on BF4.
-Performance is better on GCN cards.
-Gamers upgrade to new GCN cards or APUs.
-Gamers buy an Oxide game or Thief because they want to see what else is possible with their new card.
-Other game developers see the possibilities, and implement Mantle in their next project. Maybe release a Mantle patch for an existing game (as long as they see a possible new revenue stream).
-More gamers buy GCN cards.
-Intel investigates.
-More games are released with Mantle options.
-Games get better and faster and smoother.
-Intel adds Mantle support (or licenses it).
-More cards have Mantle.
-nVidia stays nVidia.
-Etc.

Here is the "certain" vicious cycle that a lot of others see.
-Mantle is released on BF4.
-Performance is better on GCN cards.
-Gamers upgrade to new GCN cards or APUs.
-nVidia releases their own API.
-Intel releases their own API.
-Game makers get confused.
-All games from here forward are a convoluted mess of multiple API code, and gamers lose.


The first option is a possibility in my mind. Or else the benefits are minimal, gamers ignore it, and it goes away. I like it that AMD is trying this. Gamers might benefit. The other multi-API, end-of-PC-gaming scenario is just FUD. I don't see it happening.
Intel is so far behind that it's pointless for them to release their own rendering API, since anyone who seriously games on PC isn't using an Intel iGPU. Casual gamers may be, but those games are so undemanding it won't matter.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Lying? Well not exactly but it's not really showing you the whole truth either. I thought this was common knowledge by now.

I've never heard that before. It's possible that the early versions of task manager in XP may have been inaccurate, but in Vista, Windows 7 and especially Windows 8/8.1, the task managers display lots of information about system resources and their usage.

Probably that's what they meant, but the slide still shows the serial nature of the rendering under DX that doesn't exist with Mantle.
DirectX hasn't had serial rendering since DX9

If DX still had serial rendering, then you couldn't make games like Crysis 3, BF4, AC IV, Witcher 3 etc without some serious compromises in detail.
 
Last edited:

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
I've never heard that before. It's possible that the early versions of task manager in XP may have been inaccurate, but in Vista, Windows 7 and especially Windows 8/8.1, the task managers display lots of information about system resources and their usage.

It's also possible that Windows is showing 1 actual strong thread split over your 12 CPU threads, and that's why when gaming your task manager shows 10-20% utilisation on each thread. Not just possible, in fact that's what is happening.

DirectX hasn't had serial rendering since DX9
The vast majority of games c. 2014 are still rendering in one thread.

If DX still had serial rendering, then you couldn't make games like Crysis 3, BF4, AC IV, Witcher 3 etc without some serious compromises in detail.
You can hack anything together on any hardware/software combo; it just takes a lot of work in some cases. If DX were truly multi-threaded then CPUs would be irrelevant for 99% of today's games, the way it should be and the way it will be under Mantle.
 

mikk

Diamond Member
May 15, 2012
4,175
2,211
136
Doesn't scale beyond 2-3 cores is actually true for the big majority of DirectX games. Task manager scaling is different. What matters in the end is the overall CPU utilization. With 2-3 core scaling, a 4/4 quad-core gets 50-75% overall CPU utilization. The Windows scheduler splits it, so of course all threads are utilized, but not fully. There are a couple of games/engines with better scaling, like Crysis 3 and BF4 @Windows 8.1, as well though.
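
The 50-75% figure follows directly from the arithmetic (a sketch, assuming the game's workload contains only a given number of cores' worth of truly parallel work):

```python
def overall_utilization(cores_of_parallel_work, physical_cores):
    """Overall CPU utilization ceiling when the workload only contains
    a given number of cores' worth of truly parallel work, regardless of
    how the scheduler spreads it across the per-core graphs."""
    return min(cores_of_parallel_work, physical_cores) / physical_cores

# 2-3 cores' worth of work on a 4-core CPU caps overall utilization at
# 50-75%, even if every per-core graph shows some activity.
```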
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
It's also possible that Windows is showing 1 actual strong thread split over your 12 CPU threads, and that's why when gaming your task manager shows 10-20% utilisation on each thread. Not just possible, in fact that's what is happening.

This is my task manager:

[screenshot not preserved]

It shows the individual usage for every single one of my 12 threads.

The vast majority of games c. 2014 are still rendering in one thread.
That's because the vast majority of games don't need more than one rendering thread. Only large, complex games with lots of detail and objects on screen benefit from multithreaded rendering.

You can hack anything together on any hardware/software combo - it just takes a lot of work in some cases. If DX was truly multi-threaded then CPU's would be irrelevant for 99% of today's games - the way it should be and the way it will be under Mantle.
You're simply wrong on this. The big AAA games like Crysis 3, AC IV and BF4 are all native DX11 titles that use immediate-context multithreaded rendering. AC IV may use deferred contexts as well, like its predecessor AC III.

If DX11 wasn't truly multithreaded, then there would be no gain from multicore CPUs, which obviously isn't the case as some of these games will scale all the way to 8 threads.
 
Feb 19, 2009
10,457
10
76
It depends on the game engine these games are made on. AAA engines from Crytek and Frostbite scale very well with heaps of cores, but once you get outside the AAA studios there's a lot less effort to support more cores, because the engines they license don't have it.

But, if Mantle is getting support in more game engines, that situation could improve a lot. So there is a case for some optimism.
 

mikk

Diamond Member
May 15, 2012
4,175
2,211
136
If DX11 wasn't truly multithreaded, then there would be no gain from multicore CPUs, which obviously isn't the case as some of these games will scale all the way to 8 threads.


Scaling != utilization


Let's say each core is utilized by 10%... sure, it scales over all threads, but your utilization is very poor and the scaling is just bad. This is the big issue with DirectX. Ideally it would scale over all cores with 100% or close to 100% utilization. As I said, in the end the overall CPU utilization matters.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Scaling != utilization


Let's say each core is utilized by 10%... sure, it scales over all threads, but your utilization is very poor and the scaling is just bad. This is the big issue with DirectX. Ideally it would scale over all cores with 100% or close to 100% utilization. As I said, in the end the overall CPU utilization matters.

Yes, scaling doesn't equal utilization. But scaling shows you the potential of a 3D engine to make use of a multicore processor.

Also, expecting a game to utilize all cores at 100% or close to 100% utilization is silly, because game engines are dynamic by nature.

In BF4 single player campaign, my CPU usage isn't very high (both overall and individual thread usage) because the game isn't demanding much from my CPU. Multiplayer now is another matter entirely. It's much more demanding on the CPU because there's a lot more going on for the CPU to orchestrate.

Even in single player games you can see this. In the Welcome to the Jungle section of Crysis 3, CPU usage is much higher than in other areas of the game due to the grass physics. Conversely, the Root of all Evil level is extremely GPU intensive and not very CPU intensive because it's a shader intensive level.
 
Last edited:

Gloomy

Golden Member
Oct 12, 2010
1,469
21
81
Classic case of CPU bottleneck. Luckily that doesn't happen (as bad) in multiplayer, because it would be unplayable.

There are certain spots in MP that do this. I thought it was because of the whole 64 player situation. This guy is staring at a boat in single player and running into a CPU bottleneck, seems legit. :hmm:
 

MutantGith

Member
Aug 3, 2010
53
0
0
...

This is pretty advanced stuff and based on what you said you're understanding it more than most are. People understand it at different levels anyway

...

Thanks for all the information. Though I'm pretty sure I'd seen most or all of that before, it's nonetheless of some value to write it out, in case someone doesn't want to go back and pick through the thread to read it.

When I said I was confused, it was more a figure of speech. While I thought about going through and pointing out where (as I see it) contradictory and deliberately confusing information is being bandied about in a lot of the statements repeated here, I don't think I have the time, or any belief that it would benefit the discussion, to do that.

The quoted segment above though, I think, sums up my frustration with this discussion on this API, as well as the way it's being marketed so aggressively. No matter how much people know, or think they know, there are certain things that just can't be fully known about the API and how it works (on a nuts and bolts level) at this point, unless people here actually work at development houses implementing the code, or at AMD. There is just too much unreleased information, and as time goes on, releases and statements directly contravene previous statements (and assumptions).

That lack of information and understanding isn't stopping truckloads of hyperbole from being slung around by a lot of people, though. It also seems like any time the nuts and bolts of how this system might/should/could work get dragged back into the limelight of the conversation, the hyperbole floods right back in, and any more detailed examination as a community gets drowned out.

Meh. Thanks for the reply in any event.

We'll just have to wait and see, and hope that we aren't moving into a walled garden scenario in what was a traditionally relatively open PC gaming era.
 