(Discussion) Futuremark 3DMark Time Spy Directx 12 Benchmark


Elixer

Lifer
May 7, 2002
10,376
762
126
It all boils down to what I have thought: they are not using the same code path for each vendor, and the code path that is the same is made for the lowest common denominator.

What we need is for the full source to be made available; then people can really see what is going on. But this will never happen unless you pay to play.
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
It all boils down to what I have thought: they are not using the same code path for each vendor, and the code path that is the same is made for the lowest common denominator.

What we need is for the full source to be made available; then people can really see what is going on. But this will never happen unless you pay to play.

It sounds like the same code, but the driver decides what to do with it.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
The one spreading misinformation is you, so please inform yourself about DX12.
The feature levels are these:
https://msdn.microsoft.com/en-us/library/windows/desktop/mt186615(v=vs.85).aspx
Async is part of the DX12 Multi-engine feature, which needs to be there to get any DX12 compatibility:
https://msdn.microsoft.com/en-us/li...)()#asynchronous_compute_and_graphics_example

Without Multi-engine support you don't have DX12 support, regardless of the feature level. There is a difference between the API features of DX12 and the hardware feature levels.

It's you that's spreading misinformation ...

https://msdn.microsoft.com/en-us/library/windows/desktop/ff476154(v=vs.85).aspx

How do you explain why the library interface for D3D11.3 is different from D3D12?

Why doesn't D3D11.3 expose separate queues or synchronization primitives for them?
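
To make the contrast concrete, here is a minimal sketch of the separate queues and fence that D3D12 exposes and D3D11.3 does not (this assumes a valid ID3D12Device already exists; the function and variable names are illustrative only):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a graphics queue, a separate compute queue and a fence.
// D3D11.3 offers none of this; everything goes through one immediate context.
HRESULT CreateQueuesAndFence(ID3D12Device* device,
                             ComPtr<ID3D12CommandQueue>& directQueue,
                             ComPtr<ID3D12CommandQueue>& computeQueue,
                             ComPtr<ID3D12Fence>& fence)
{
    // Graphics ("direct") queue: accepts draw, dispatch and copy commands.
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    HRESULT hr = device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));
    if (FAILED(hr)) return hr;

    // Dedicated compute queue: the multi-engine part of the API.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    hr = device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
    if (FAILED(hr)) return hr;

    // Fence: the explicit synchronization primitive used to order work between queues.
    return device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
}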
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
This I am not so sure about. Like I said before, the engine is the core, and so many games now use established engines, or new engines with great developers behind them. Even id Software went so far in what they did with DOOM as to use shader intrinsics. This will continue to benefit any game they choose to use that engine for. So the investment for a lot of these AAA games comes in the engine that is made 'once': Snowdrop, Frostbite, CryEngine, Nitrous, etc. So I don't think it will be odd for engine developers to pick and choose features to integrate with separate code paths. I am sure your own developers could have done it as well if that was the plan. If you could then build multiple benchmarks on that engine, it would suggest multiple games can be built on the same engine with that investment already made. Compatibility is not affected, but the potential gains with newer hardware might be worth it.

Is this not what happens in the game/engine relationship?

It sounds like Jarnis' point was that the difference between Time Spy's engine and more established game engines is that Time Spy was built from the ground up for DirectX 12 -- a daunting task, to be sure. Even Doom was developed for OpenGL first, with the Vulkan renderer being a really nice bonus on PC but not integral to the game's design. In an interview with Eurogamer, Doom's lead rendering programmer Tiago Sousa had this to say about games using the new APIs moving forward:

Tiago Sousa: From a different perspective, I think it will be interesting to see the result of a game entirely taking advantage by design of any of the new APIs - since no game has yet. I'm expecting to see a relatively big jump in the amount of geometry detail on-screen with things like dynamic shadows. One other aspect that is overlooked is that the lower CPU overhead will allow art teams to work more efficiently - I'm predicting a welcome productivity boost on that side.

http://www.eurogamer.net/articles/digitalfoundry-2016-doom-tech-interview

Sousa's expectations are actually reflected in Time Spy, with Time Spy boasting a lot more geometry detail than Fire Strike and making heavier use of shadows.

Middleware engine vendors may provide a way to implement FL12 code paths in games, but that doesn't mean game developers will actually use them. I'm sure it won't be as simple as flipping a switch and, boom, your game is running with FL12 features. Those features would still require development resources to implement properly, and Jarnis seems to be saying there won't be a lot of motivation on the part of developers to invest those resources into FL12 when FL11 is already working (at least initially).
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
http://www.futuremark.com/downloads/3DMark_Technical_Guide.pdf

It was not tailored for any specific architecture. It overlaps different rendering passes for asynchronous compute, in parallel when possible. Drivers determine how they process these - multiple parallel queues are filled by the engine.

The reason Maxwell doesn't take a hit is because NVIDIA has explicitly disabled async compute in Maxwell drivers. So no matter how much we pile into the queues, it cannot be set to run asynchronously because the driver says "no, I can't do that". Basically the NV driver tells Time Spy to go "async off" for the run on that card. If NVIDIA enables async compute in the drivers, Time Spy will start using it. Performance gain or loss depends on the hardware & drivers.

Yes it is. But the engine cannot dictate what the hardware does or doesn't have available.

Async compute is about utilizing "idle" shader units. The slower the card, the fewer idle ones you have. Less capable hardware may also be hard pressed to utilize all of them even if the engine asks nicely. Also, there may be limitations as to which workloads in the engine *can* run in parallel. Yes, Time Spy is very graphics-heavy, since, well, it's a graphics benchmark. But even there many of the rendering passes have compute tasks that can use this.

Ultimately some AMD cards gain quite a bit (i.e. they have a lot of shader units idling while rendering and they are very good at using them for the available parallel loads). Some AMD cards gain less or not at all (either less capable at parallelizing, fewer idle shader units, or no idle shader units at all - for example, an HD 7970 is hard pressed to have any to "spare")

Some NVIDIA cards cannot do this at all. The driver simply says "hold your horses, we'll do this nicely in order". Some NVIDIA cards can do some of it. They might use a different approach than AMD (more driver/software based), but the end result is the same - the card hardware is capable of doing more through some intelligent juggling of the work.



http://steamcommunity.com/app/223850/discussions/0/366298942110944664/
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
He probably refers to this:

http://www.futuremark.com/downloads/3DMark_Technical_Guide.pdf

I'm also completely happy to pass on any detailed tech questions directly to the lead developer of the DX12 engine. I can't guarantee he can answer everything, but I'll promise to ask. For complex stuff, please email to info@futuremark.com so I can forward it directly.

Source code? You need to join the Futuremark Benchmark Development Program to get access to the git repo

I have a question: does 3DMark Time Spy have an FL 12_0 code path?
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
I have a question: does 3DMark Time Spy have an FL 12_0 code path?

It doesn't appear to. This is what the 3DMark technical guide has to say about DirectX 12 feature levels:

Direct3D feature levels

DirectX 11 introduced a paradigm called Direct3D feature levels. A feature level is a well-defined set of GPU functionality. For instance, the 9_1 feature level implements the functionality in DirectX 9.

With feature levels, 3DMark tests can use modern DirectX 12 and DirectX 11 engines and yet still target older DirectX 10 and DirectX 9 level hardware. For example, 3DMark Cloud Gate uses a DirectX 11 feature level 10 engine to target DirectX 10 compatible hardware.

Time Spy uses DirectX 12 feature level 11_0. This lets Time Spy leverage the most significant performance benefits of the DirectX 12 API while ensuring wide compatibility with DirectX 11 hardware through DirectX 12 drivers. Game developers creating DirectX 12 titles are also likely to use this approach since it offers the best combination of performance and compatibility.
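
As a rough sketch of what "a DirectX 12 feature level 11_0 engine" means at device-creation time (the adapter is assumed to have been picked elsewhere; this is illustrative, not Futuremark's code):

Code:
#include <d3d12.h>
#include <dxgi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Ask for feature level 11_0 only, so any DX11-class GPU with a DX12 driver
// can create the device; FL 12_0 / 12_1 features are simply never requested.
ComPtr<ID3D12Device> CreateFeatureLevel11Device(IDXGIAdapter1* adapter)
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return nullptr;
    return device;
}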
 
Feb 19, 2009
10,457
10
76
FM_Jarnis: Async compute is about utilizing "idle" shader units. The slower the card, the fewer idle ones you have.

Ultimately some AMD cards gain quite a bit (i.e. they have a lot of shader units idling while rendering and they are very good at using them for the available parallel loads). Some AMD cards gain less or not at all (either less capable at parallelizing, fewer idle shader units, or no idle shader units at all - for example, an HD 7970 is hard pressed to have any to "spare")

Figured as much. When I see a sharp drop in performance gains for GCN cards with fewer shaders, I know they are heavily focused on improving shader utilization with the async approach taken, and not on a true multi-engine approach.

In Doom, the RX 480 gains massively, almost as much as the Fury X, which is indicative of a true multi-engine design, as the RX 480 has far fewer shaders by comparison and already has an improved scheduler, pre-fetch, etc. to improve its shader utilization.

When you get Rasterizers & DMAs to run in parallel with Shaders, you improve performance all round. Then add fillers to improve shader utilization on top, and it's a huge gain for all GPUs that are capable of DX12/Vulkan Multi-Engine.

---------------------

The idea that a 7970 won't gain with AC because it has fewer shaders and fewer idle shaders runs counter to the point of a Multi-Engine design.

On consoles, they have even FEWER shaders, roughly half that of an RX 480 or 7970. Why do the devs who use AC on consoles report the biggest performance gains?!

Because it's NOT just about shader utilization. -_-

Example here: https://www.computerbase.de/2016-07/doom-vulkan-benchmarks-amd-nvidia/

Look at the % gains the RX 480 gets with Doom's approach, compared to the 390, which has more shaders. About the same or better. This is because the emphasis is not just on shader utilization, filling in the gaps, overlapping rendering, etc. It's about real parallel execution, Multi-Engine: getting the GPU's ROPs + DMAs + Shaders to work in parallel.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
7,070
7,492
136
This is probably a question of what a dedicated benchmark should represent.

- License or work with the teams behind the most widespread/popular game engines (Frostbite and CryEngine, for example) to create an encapsulated, repeatable benchmark built on engines that people will actually use. You could even launch separate benchmarks based on separate engines.

3DMark has its own community and serves its own purpose within the PC gaming world, but its relevance as a benchmark utility for drawing sweeping conclusions about card performance is limited (although I freely admit that this might be more an impression foisted on 3DMark by gamers than by the company itself).
 

Det0x

Golden Member
Sep 11, 2014
1,065
3,116
136
Intel, AMD and NVIDIA are all part of the Benchmark Development Program. They have source code read access and they can suggest changes and give feedback (with the feedback public within the BDP, so any changes they suggest have to be accepted by the other vendors as well, while Futuremark retains final say as to what goes into the benchmark).

AMD: We want an (enableable) mode that supports the true multi-engine approach, so we can get maximum utilization of our hardware.

Nvidia: No, out of the question. Our hardware doesn't support it. If we agreed to this, we would see the same results as we see in the Doom Vulkan benchmarks.

Futuremark: Nvidia's "AC lite" version it is, then.


That's at least how I read this... :\

*edit*

For reference (in sharp contrast to what the Futuremark 3DMark Time Spy DirectX 12 benchmark shows):

[image] vs [image]
 
Feb 19, 2009
10,457
10
76
@Det0x

NV has the biggest market share, so it would be wise of a company that makes benchmarks to prioritize their opinions. If all the IHVs have to agree on a certain implementation for it to be included, then rest assured, it's targeting the lowest-hanging fruit.

What I dislike about using Async Compute just to fill in bubbles or use idle shaders is that it has little benefit for GPUs with fewer shaders, or at 4K when the shaders are already being used (folks can test this with Time Spy: run at 4K and Pascal sees almost no gains). It doesn't really benefit more GPUs at all resolutions the way a Multi-Engine approach would.

Remember, every GPU has idling Rasterizers & DMAs when Shaders are busy & vice versa under DX11.
 

Det0x

Golden Member
Sep 11, 2014
1,065
3,116
136
@Det0x

NV has the biggest market share, so it would be wise of a company that makes benchmarks to prioritize their opinions. If all the IHVs have to agree on a certain implementation for it to be included, then rest assured, it's targeting the lowest-hanging fruit.

What I dislike about using Async Compute just to fill in bubbles or use idle shaders is that it has little benefit for GPUs with fewer shaders, or at 4K when the shaders are already being used (folks can test this with Time Spy: run at 4K and Pascal sees almost no gains). It doesn't really benefit more GPUs at all resolutions the way a Multi-Engine approach would.

Remember, every GPU has idling Rasterizers & DMAs when Shaders are busy & vice versa under DX11.

True, and I understand the reasoning behind it, but I don't have to like it.

And I could very well be mistaken, but if memory serves me right, didn't we have an "enableable" option for PhysX or something like that in one of the previous 3DMarks a few years back?

*edit*

I think you could enable the PhysX workload to be done on an NV graphics card instead of the CPU, which solely benefited Nvidia owners' scores.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
What I dislike about using Async Compute just to fill in bubbles or use idle shaders is that it has little benefit for GPUs with fewer shaders, or at 4K when the shaders are already being used (folks can test this with Time Spy: run at 4K and Pascal sees almost no gains). It doesn't really benefit more GPUs at all resolutions the way a Multi-Engine approach would.

Remember, every GPU has idling Rasterizers & DMAs when Shaders are busy & vice versa under DX11.

Async Compute is only about filling the gaps in the shaders. I hope you don't think that your "rasterizers" or "DMAs" are operating on compute shaders...
 
Feb 19, 2009
10,457
10
76
Async Compute is only about filling the gaps in the shaders. I hope you don't think that your "rasterizers" or "DMAs" are operating on compute shaders...

No, they don't execute compute shaders. But they can do a lot of other things that are involved in rendering a frame.

http://www.eurogamer.net/articles/d...n-patch-shows-game-changing-performance-gains

Senior engine programmer Jean Geffroy goes into depth on the profound advantages that async compute brings to the table.

"When looking at GPU performance, something that becomes quite obvious right away is that some rendering passes barely use compute units. Shadow map rendering, as an example, is typically bottlenecked by fixed pipeline processing (eg rasterisation) and memory bandwidth rather than raw compute performance. This means that when rendering your shadow maps, if nothing is running in parallel, you're effectively wasting a lot of GPU processing power.

DMAs can certainly stream textures in parallel with Async Compute, rather than doing it in serial mode.

https://twitter.com/idsoftwaretiago/status/738427826089512965

It's why DX12/Vulkan has 3 distinct queues, a Multi-Engine design: Graphics, Compute & Copy.
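
As a sketch of what copy-queue streaming can look like in D3D12 (the queue, command list and fence objects here are hypothetical and assumed to be created elsewhere; the upload list would contain pre-recorded CopyTextureRegion calls):

Code:
#include <d3d12.h>

// Kick off a texture upload on the COPY queue (the DMA engine) while the
// DIRECT queue keeps rendering; a GPU-side fence wait orders the two.
void StreamTextureAsync(ID3D12CommandQueue* copyQueue,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12CommandList*  uploadList,
                        ID3D12Fence*        uploadFence,
                        UINT64&             fenceValue)
{
    copyQueue->ExecuteCommandLists(1, &uploadList);   // DMA engine starts copying
    copyQueue->Signal(uploadFence, ++fenceValue);     // mark completion on the fence

    // Only graphics work submitted after this point waits for the copy;
    // the CPU and earlier graphics work are not blocked.
    graphicsQueue->Wait(uploadFence, fenceValue);
}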
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Where have I read that? Oh right, in the technical guide for Time Spy:
Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering [...]

Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue.
http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
 
Feb 19, 2009
10,457
10
76
@sontin
Why are you contradicting what you just posted?

You jump into here with this claim:

Async Compute is only about filling the gaps in the shaders. I hope you don't think that your "rasterizers" or "DMAs" are operating on compute shaders...

Then I refuted you with proof... and now you use the Time Spy guide (if they did what they claim) as further proof, which refutes you again.

Async Compute is NOT "only about filling the gaps in the shaders." Period. Don't say such BS again when it's already proven false.
 
Feb 19, 2009
10,457
10
76
Also, what the devs say on Steam and what they write in the guide conflict. On Steam, they claim they only use it to increase shader utilization, or to use idle shaders. In the guide, if it's accurate, they are running Shaders in parallel while the Rasterizers are working on Shadows.

That's a Multi-Engine approach. The former, dealing with idle shaders only, is not.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
You complained that Time Spy doesn't use "rasterizers" and "DMAs" concurrently with "shaders". I proved to you that this is wrong.

Your proof is fanfiction. My proof comes straight from the developer. You can refuse to believe them, but I think that would look very foolish...
 
Mar 10, 2006
11,715
2,012
126
I genuinely don't understand why people are spending so much time trying to argue so vehemently that "AMD's async is better than NVIDIA's."

All that matters is delivered performance. It doesn't really matter to gamers how an IHV gets there.
 
Feb 19, 2009
10,457
10
76
You complained that Time Spy doesn't use "rasterizers" and "DMAs" concurrently with "shaders". I proved to you that this is wrong.

Your proof is fanfiction. My proof comes straight from the developer. You can refuse to believe them, but I think that would look very foolish...

This is you:

Async Compute is only about filling the gaps in the shaders. I hope you don't think that your "rasterizers" or "DMAs" are operating on compute shaders...

Then you come back and say that Time Spy uses Rasterizers & DMAs alongside Shaders. That has nothing to do with only filling the gaps in the shaders. That's a true Multi-Engine approach.

I am referring to what the devs actually say on Steam; it conflicts with what they write in their guide. Do you understand English? This is what they say:

http://steamcommunity.com/app/223850/discussions/0/366298942110944664/

Async compute is about utilizing "idle" shader units. The slower the card, the fewer idle ones you have.

Ultimately some AMD cards gain quite a bit (i.e. they have a lot of shader units idling while rendering and they are very good at using them for the available parallel loads). Some AMD cards gain less or not at all (either less capable at parallelizing, fewer idle shader units, or no idle shader units at all - for example, an HD 7970 is hard pressed to have any to "spare")

Some NVIDIA cards cannot do this at all. The driver simply says "hold your horses, we'll do this nicely in order". Some NVIDIA cards can do some of it. They might use a different approach than AMD (more driver/software based), but the end result is the same - the card hardware is capable of doing more through some intelligent juggling of the work.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Also, what the devs say on Steam and what they write in the guide conflict. On Steam, they claim they only use it to increase shader utilization, or to use idle shaders. In the guide, if it's accurate, they are running Shaders in parallel while the Rasterizers are working on Shadows.

That's a Multi-Engine approach. The former, dealing with idle shaders only, is not.

It's one and the same. Shadows are rendered with shaders but underutilize the GPU because they are typically heavily ROP-bound. If you're running a compute shader in parallel and performance improves, then you are increasing utilization.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
English is not my native language, but:
Ultimately some AMD cards gain quite a bit (i.e. they have a lot of shader units idling while rendering and they are very good at using them for the available parallel loads)

He wrote the exact same thing...
 
Feb 19, 2009
10,457
10
76
It's one and the same. Shadows are rendered with shaders but underutilize the GPU because they are typically heavily ROP-bound. If you're running a compute shader in parallel and performance improves, then you are increasing utilization.

That is the idea: if you use a Multi-Engine approach, you get work done on the Shaders while the Rasterizer or DMA is doing other work.

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

"If you look at the portion of the GPU available to compute throughout the frame, it varies dramatically from instant to instant. For example, something like opaque shadow map rendering doesn't even use a pixel shader, it’s entirely done by vertex shaders and the rasterization hardware -- so graphics aren't using most of the 1.8 teraflops of ALU available in the CUs. Times like that during the game frame are an opportunity to say, 'Okay, all that compute you wanted to do, turn it up to 11 now.'"

I was merely correcting @sontin when he makes this claim:

Async Compute is only about filling the gaps in the shaders. I hope you don't think that your "rasterizers" or "DMAs" are operating on compute shaders...
 
Feb 19, 2009
10,457
10
76
English is not my native language, but:

Ultimately some AMD cards gain quite a bit (i.e. they have a lot of shader units idling while rendering and they are very good at using them for the available parallel loads)
He wrote the exact same thing...

Continue with that quote:

"Some AMD cards gain less or not at all (either less capable at parallelizing, fewer idle shader units, or no idle shader units at all - for example, an HD 7970 is hard pressed to have any to "spare")"

And suddenly it's not a true Multi-Engine approach, because ALL GPUs have idle Rasterizers/DMAs when the main Shaders are engaged under serial rendering.

Serial rendering: DMA > Shaders > Rasterizer > etc

Multi-Engine rendering: DMA & Shaders & Rasterizer > etc

If you target idle shaders (Doom's Vulkan renderer uses post-processing as filler for idle shaders), you only benefit one engine: the Shaders, i.e. the Compute Units/SMX. If you target Multi-Engine, you benefit ALL GPUs regardless of how well they use their shaders, because they all have Rasterizers/DMAs that can benefit.
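
For illustration, a minimal sketch of the Multi-Engine submission pattern being described (the command lists, fence and names are hypothetical; whether the two queues actually run concurrently is up to the hardware and the driver):

Code:
#include <d3d12.h>

// Rasterizer-heavy shadow rendering goes to the DIRECT queue while shader-only
// work (light culling, SSAO, particle simulation) goes to the COMPUTE queue.
void SubmitFrame(ID3D12CommandQueue* directQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList*  shadowList,
                 ID3D12CommandList*  asyncComputeList,
                 ID3D12Fence*        computeDone,
                 UINT64&             fenceValue)
{
    directQueue->ExecuteCommandLists(1, &shadowList);
    computeQueue->ExecuteCommandLists(1, &asyncComputeList);

    // Join the two timelines before the illumination pass consumes both results.
    computeQueue->Signal(computeDone, ++fenceValue);
    directQueue->Wait(computeDone, fenceValue);
}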
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
The graphics pipeline is serial. Rasterizers are idling because the running workload either has not yet reached them or has already passed them. You can't access these Rasterizers through a dedicated queue to fill them with "work". You should read a little bit more about this: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx

DMA engines copy information from the host system to the GPU before the GPU starts the workload. They have nothing to do with "asynchronous shaders (compute)".
 