The AMD Mantle Thread

Page 194
Status
Not open for further replies.

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
This is my task manager:



It shows the individual usage for every single one of my 12 threads.

That's because the vast majority of games don't need more than one rendering thread. Only large, complex games with lots of detail and objects on screen benefit from multithreaded rendering.

You're simply wrong on this. The big AAA games like Crysis 3, AC IV, BF4 are all native DX11 titles that use immediate context multithreaded rendering. AC IV may possibly use deferred context as well, like its predecessor AC III.

If DX11 wasn't truly multithreaded, then there would be no gain from multicore CPUs, which obviously isn't the case, as some of these games will scale all the way to 8 threads.

The point people are trying to make to you is that a single threaded app won't show 100% on one core and 0 on others, it will show it spread over all cores.
 

jj109

Senior member
Dec 17, 2013
391
59
91
Classic case of CPU bottleneck. Luckily that doesn't happen (as bad) in multiplayer, because it would be unplayable.

A classic case of AMD drivers, or just a fanboy trying to hype up Mantle, is much more probable.

GTX 780 GPU-bound


0x MSAA CPU-bound

 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The point people are trying to make to you is that a single threaded app won't show 100% on one core and 0 on others, it will show it spread over all cores.

Nope, a single threaded app will use two threads on my machine due to hyper-threading.

If it was a single threaded app, then how would it be able to spread over all cores? That makes no sense.

If the graph is showing it spread over all cores, it's because you have it configured to show overall utilization of the entire CPU, and not each thread like I do.
 

jj109

Senior member
Dec 17, 2013
391
59
91
If it was a single threaded app, then how would it be able to spread over all cores? That makes no sense.

If the graph is showing it spread over all cores, it's because you have it configured to show overall utilization of the entire CPU, and not each thread like I do.

The thread gets moved across the cores by the OS. This is a single threaded CB R15 run:
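A rough way to see why a per-core graph can look like that is the toy model below. It is not a measurement: it assumes a hypothetical scheduler that moves one always-busy thread to a random core every time slice, and the names `simulate_migration`, `cores`, and `slices` are made up for illustration.

```python
import random

def simulate_migration(cores=12, slices=10_000, seed=0):
    """Toy model: one always-busy thread, moved to a random core each time slice.

    Returns the fraction of time slices each core spent running the thread.
    """
    rng = random.Random(seed)
    busy = [0] * cores
    for _ in range(slices):
        busy[rng.randrange(cores)] += 1  # only one core is busy in any slice
    return [count / slices for count in busy]

usage = simulate_migration()
for core, share in enumerate(usage):
    print(f"core {core:2d}: {share:6.1%}")
print(f"total: {sum(usage):.1%} of one core")
```

Under these assumptions every core shows a small share of load, yet the shares add up to one core's worth of work, which is what a single-threaded run that hops between cores would look like in a per-core view.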

 

ASM-coder

Member
Jan 12, 2014
193
0
0
There is just too much unreleased information, and as time goes on, releases and statements directly contravene previous statements (and assumptions).
Can you point to a statement from AMD or Dice where they have contradicted themselves or each other, or are you just referring to the unwashed masses on this thread?
 

MutantGith

Member
Aug 3, 2010
53
0
0
Contradiction-wise: Mantle can be open sourced and an open standard / will only work with GCN hardware.

Is exactly the API for one or more consoles / is not the API for consoles.

Will completely replace drivers in the Windows environment for specific hardware / will have a hardware abstraction layer.

That sort of thing.

Some of it is definitely people in this and other forum environments embellishing. However, I can't help but feel that statements are being made as intentionally vague yet suggestive as possible, so as to encourage this sort of assumption without having to worry about being proved technically wrong later.

A complete technical briefing, white paper, or similar from AMD would of course solve these issues, but that seems to be something of a pipe dream. Rather than just explain - "This is how Mantle works, here is a sample of API execution, here is our plan for licensing/releasing extension sets... etc" - there is instead just this giant, nebulous cloud of claims. The lack of certainty and particulars in the information actually being shared means that people are filling in the gaps in that knowledge with assertions ranging from hailing Mantle as the greatest achievement in computers since Floating Point Operations, all the way to dismissing it out of hand as a complete non-starter.

For such a small amount of actual information having been released, it certainly has generated a heck of a lot of page hits, forum messages, and completely flooded news reporting throughout the holidays. Unfortunately, because of the polarized nature of this particular hobby and, frankly, this forum in particular, all of this delay in details being filled in has just left everyone time to drive way out to the extremes of rhetoric.

Hope that clarifies.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The thread gets moved across the cores by the OS. This is a single threaded CB R15 run:

OK I see what you're saying. But still, if the OS is moving the workload across each core, that's different from saying that the workload is spread across each core.

For the latter to be true, the workload would have to be multithreaded.

BTW jj109, that graph in the bottom left of the screenshot you took, is that part of the inbuilt FPS counter, or is that something else?
 

jj109

Senior member
Dec 17, 2013
391
59
91
OK I see what you're saying. But still, if the OS is moving the workload across each core, that's different from saying that the workload is spread across each core.

For the latter to be true, the workload would have to be multithreaded.

BTW jj109, that graph in the bottom left of the screenshot you took, is that part of the inbuilt FPS counter, or is that something else?

That's the BF4 performance overlay graph that can be pulled up using the console command:

PerfOverlay.DrawGraph 1

Or something along those lines.

I have it up so you guys can't accuse me of cherry picking high FPS screenshots. :biggrin:
 

jj109

Senior member
Dec 17, 2013
391
59
91
Something must be up with that guy's machine to get such low performance compared to jj109's.

I know AMD's drivers aren't as extensively multithreaded as NVidia's, but that's just far too large a deficit.

Well, it's not the reflection on the scope :biggrin:

 

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
OK that slide looks a bit fishy to me. They claim DX doesn't scale beyond 2-3 cores, yet in their Frostbite 3 presentation, they claimed the engine could use up to 8 threads.

Frostbite is not DX .... Frostbite uses DX.

DX may be limited to a number of cores, but the engines are not ....

It's possible to prepare a lot of stuff in multiple cores before sending it to DX.
 

Gloomy

Golden Member
Oct 12, 2010
1,469
21
81
I knew that performance seemed fishy :thumbsup:

I would test if it's AMD drivers, but I just woke up and found my PC won't boot. Have to troubleshoot now, sorry-- pretty sure the motherboard is dead. It's like a month old. :|
 

jj109

Senior member
Dec 17, 2013
391
59
91
Frostbite is not DX .... Frostbite uses DX.

DX may be limited to a number of cores, but the engines are not ....

It's possible to prepare a lot of stuff in multiple cores before sending it to DX.

DX using deferred contexts and command lists can scale to as many threads as the engine developer deems efficient.

I found this slide from GDC 2013 where Nvidia talks about using command lists and deferred context to remove CPU bottlenecks without manual threading.

https://developer.nvidia.com/sites/...dev/docs/GDC_2013_DUDASH_DeferredContexts.pdf

Summary:
Civ V - 10k+ draw calls per frame => 50% speed up
Assassin's Creed 3 => 24%+ speed up.

Makes me wonder how much of the up to 45% speed up AMD is claiming for BF4 is actually just catch up to a fully implemented DX11 render path.

I knew that performance seemed fishy :thumbsup:

I would test if it's AMD drivers, but I just woke up and found my PC won't boot. Have to troubleshoot now, sorry-- pretty sure the motherboard is dead. It's like a month old. :|

Sorry to hear that man. I hope it's something stupid like a plug wiggled out.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Frostbite is not DX .... Frostbite uses DX.

DX may be limited to a number of cores, but the engines are not ....

It's possible to prepare a lot of stuff in multiple cores before sending it to DX.

Yeah, I said on the previous page that they may have been talking about rendering threads only, which makes sense. I think with DX11 immediate context you only use two or three threads for rendering, which is still quite a bit for just rendering.

But as jj109 says, with deferred context, there is no limit on how many threads you can use for rendering purposes.

But of course, AMD does not support that feature as they can't get it to work properly, which may have something to do with why they're pushing Mantle so aggressively.

Honestly, it seems as though their driver team is really behind NVidia in supporting multithreading.

In BF4 multiplayer, NVidia have gained the edge because their drivers are more efficient at exploiting multicore processors.
 

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
DX using deferred contexts and command lists can scale to as many threads as the engine developer deems efficient.

I found this slide from GDC 2013 where Nvidia talks about using command lists and deferred context to remove CPU bottlenecks without manual threading.

We had a discussion about this some pages ago. DR allows you to prepare commands on multiple cores and queue them, but your main renderer is still a single process.

Modern engines (e.g. Frostbite) already do this without even using DX DR, with better results (that's what they say).
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136

I've seen that before. Lots of good and interesting info about deferred context.

It really puts to shame some of the comments by Richard Huddy about DX11. In the Bit Tech interview, he claims:

On consoles, you can draw maybe 10,000 or 20,000 chunks of geometry in a frame, and you can do that at 30-60fps. On a PC, you can't typically draw more than 2-3,000 without getting into trouble with performance, and that's quite surprising - the PC can actually show you only a tenth of the performance if you need a separate batch for each draw call.

Now the PC software architecture – DirectX – has been kind of bent into shape to try to accommodate more and more of the batch calls in a sneaky kind of way. There are the multi-threaded display lists, which come up in DirectX 11 – that helps, but unsurprisingly it only gives you a factor of two at the very best, from what we've seen. And we also support instancing, which means that if you're going to draw a crate, you can actually draw ten crates just as fast as far as DirectX is concerned.

Source

So according to Huddy, PC is limited to 2 to 3 thousand draw calls per frame, and double that with deferred context. But according to Firaxis, they got more than 15K draw calls at 60 FPS using deferred context, and he claimed he was GPU limited.

Source

So basically, while Mantle will be better than DX11 multithreading, the lead is nowhere near as large as they're making it out to be. Honestly, between instancing and deferred context, no PC game should be draw call limited.
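As a rough illustration of why instancing cuts the draw call count (the "ten crates as fast as one" point in the Huddy quote), here is a small counting sketch. It is only a model: `count_draw_calls` and the `scene` list are hypothetical, and a real engine would also have to split batches by material, shader state and so on.

```python
from collections import Counter

def count_draw_calls(objects, instancing=False):
    """Count draw calls for a list of (mesh_name, position) objects.

    Without instancing: one call per object.
    With instancing: one call per distinct mesh, however many copies exist.
    """
    if not instancing:
        return len(objects)
    return len(Counter(mesh for mesh, _ in objects))

scene = [("crate", i) for i in range(10)] + [("barrel", i) for i in range(5)]
print(count_draw_calls(scene))                   # one call per object
print(count_draw_calls(scene, instancing=True))  # one call per distinct mesh
```

In this toy scene, fifteen objects need fifteen calls without instancing but only two with it, one per distinct mesh.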
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
We had a discussion about this some pages ago. DR allows you to prepare commands on multiple cores and queue them, but your main renderer is still a single process.

Yes, because much of rendering is still sequential. It has to be done in an orderly fashion for it to work properly.

Deferred context gets around that by using multiple threads to upload data to a command list alongside the main rendering thread, so that whenever something needs to be rendered at a certain time, it can be executed quickly by the main rendering thread or immediate context.
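To make the pattern concrete, here is a minimal sketch in plain Python rather than real Direct3D code. The functions `record_command_list` and `render_frame` are hypothetical stand-ins (in D3D11 the corresponding calls are `ID3D11DeviceContext::FinishCommandList` on a deferred context and `ExecuteCommandList` on the immediate context). It only shows the shape of the idea: record in parallel, submit in order on one thread.

```python
from concurrent.futures import ThreadPoolExecutor

def record_command_list(chunk_id, objects):
    """Worker thread: record draw commands for one chunk of the scene.

    Stands in for recording on a deferred context; nothing is drawn here.
    """
    return [f"draw chunk={chunk_id} object={obj}" for obj in objects]

def render_frame(scene_chunks, workers=4):
    """Record command lists in parallel, then submit them in order on one thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, regardless of which
        # worker finishes first, so the submitted frame stays deterministic.
        command_lists = list(pool.map(record_command_list,
                                      range(len(scene_chunks)), scene_chunks))
    submitted = []
    for commands in command_lists:  # the "immediate context": a single thread
        submitted.extend(commands)
    return submitted

frame = render_frame([["tree", "rock"], ["house"], ["car", "road", "sign"]])
print(len(frame), "commands submitted in order")
```

Python threads won't show a real speed-up here because of the interpreter lock. The point is only the split between parallel recording and single-threaded, ordered submission.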

Modern engines (e.g. Frostbite) already do this without even using DX DR, with better results (that's what they say).
I remember Repi said he really wanted to implement driver command lists in BF3, but the drivers weren't available from either NVidia or AMD, so he couldn't test it out and in the end decided against it.

Check slide number 34.

So BF3 used lots of instancing to get around any draw call limitations, which is probably the same for BF4. Still, if BF4 had been an NVidia sponsored title, deferred context could have been properly implemented imo as the drivers are now available.
 

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
Still, if BF4 had been an NVidia sponsored title, deferred context could have been properly implemented imo as the drivers are now available.

Maybe, but I do think doing instances to solve this problem is more effective.

They do use NVAPI, maybe they did implement DR in the last Frostbite version.
I will ask Johan on twitter

Edit: Yes, they do use DR
Source : https://twitter.com/repi/status/424155738613620736
 

Markymarc206

Junior Member
Mar 14, 2010
6
0
0
This Mantle tech is really putting a major halt on my build. I want it and an AMD card because it says it adds 45% in BF4 (the only game that I play), but then I was thinking: is this 45% boost only with one of AMD's APUs? What if I have an Intel 4670K CPU, will I see a drastic boost in gameplay vs a 760 for the same price as an R9 270X?
 