(Discussion) Futuremark 3DMark Time Spy DirectX 12 Benchmark


sontin

Diamond Member
Sep 12, 2011
3,273
149
106
It's even worse for ASC (Async Compute). It would be a great opportunity to compare the max possible gains between GPU vendors/generations with ASC, but with suboptimal utilisation on some vendors this is not possible at the moment.

"Max possible gaines"? That doesnt make any sense.

The "max possible gaines" on AMD hardware would be a total pixel limited scenario. And guess which vendor outputs more pixel per clock?

I can imagine how you would react to this. :|
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
Wow, that website referenced above is one of the most biased sites I've ever seen posted here. Suggest you don't post any of their articles again, unless you want to be laughed at...

Jarnis, I commend you for trying to explain to the people here the choices FM made for this benchmark, but do understand you are talking to some who will just not accept anything you say. Don't take it personal, just the way things are 'round here.

^Making a 'true' empty statement as an attempt to rebut a logical argument, without actually having any tangible, logical counter-arguments to the article in question. Stating a conclusion without any premises that support the conclusion is the very definition of a flawed argument.

Let's recap what the article states:

1) NV lacks hardware Async Compute - true

2) NV introduced Enhanced Async Compute but they really meant pre-emption because NV has no ACE engines - true

3) NV positions Pre-emption as some killer feature, nearly an Async "equivalent", in its marketing slides. However, pre-emption, in prioritizing key tasks, must stop the existing workload before a new workload begins. This is not true multi-engine parallelism and it is not the same as Async Compute. Thus, pre-emption's goal is to prioritize tasks, whereas Async Compute hardware engines can issue tasks in parallel - true

4) In real world DX12/Vulkan games, modern APIs show massive gains in CPU performance. In Doom, a 1.2GHz 5820K gains 70%+ with a Fury X when running Vulkan. In Hitman, an FX-8350 gains 50-70%. There is no correlation between a faster CPU and an improved 3DMark score. Normally, this is what we want -- a GPU-limited benchmark. However, in this case, the GPUView timeline data shows that the Fury X stalls waiting for serial code, which means the CPU has no work to issue. Hitman and Doom do not exhibit this behaviour. Since all existing DX12/Vulkan PC games show CPU scaling and dependency and Time Spy does not show it, this also proves Time Spy has no relevance to real world DX12/Vulkan games, as it does not mimic the programming behaviour/trends of real world modern API games.

5) By definition of DX12/Vulkan, programming responsibility should move away from AIB (AMD/NV) driver reliance to the developer. Why? Because to get the full advantage of the new APIs' "closer to the metal" approach / removal of the old API abstraction layer, developers MUST optimize the game engine's code for the specific GPU architecture to maximize coding to the metal. This means using Pre-Emption and the large L2 cache to the max on Pascal, while on AMD running as much code as GCN can handle under Async Compute, shader intrinsics, etc. (a sketch of what such per-vendor path selection could look like follows after this list). This is how coding works on consoles, Doom, etc.

By definition, then, since Futuremark failed to create deep architecture-specific optimizations, the Time Spy benchmark does NOT compare the true potential of the GCN and Pascal architectures under next generation DX12 games. What it instead compares is ONLY how those graphics architectures can run Time Spy's synthetic code -- nothing else. This is 100% true. If next generation games are coded to specifically take advantage of Pascal and GCN under DX12/Vulkan, Time Spy had to do the same -- they did NOT. This means Time Spy fails to capture both the GCN Async Compute console port effect and all scenarios where DX12/Vulkan AAA games will take better advantage of GCN's DX12 capabilities.

Since we already see the RX 480 beating the 1060 in 80%+ of DX12/Vulkan games, it is clear that real world DX12/Vulkan gaming performance does not accurately align with Time Spy. Since TimeSpy does not try to measure average GPU performance (TPU charts), but tries to predict future DX12 performance, the performance delta between the 480 and the 1060 should roughly match what's already happening in DX12/Vulkan games. It does not:

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/73040-nvidia-gtx-1060-6gb-review-21.html

6) It's impossible to have a synthetic DX12 benchmark that tries to predict performance in future DX12 games without taking full advantage of GCN's most important DX12 feature - Async Compute. Since FutureMark is not doing much in the way of GCN-specific optimizations, in effect they are assuming that almost no AAA developer making DX12/Vulkan games will do so on average over the next 1-2 years. The 280X losing to the 960 also makes no sense, as that is not what we see in modern DX12/Vulkan games.

7) By definition, ALL synthetic GPU benchmarks are worthless for predicting performance in next generation games, because next generation games do NOT use the game engines these synthetic benchmarks use. That means Unigine, 3DMark, GPUMark, Catzilla, etc. are all worthless predictors of the real world gaming experience a PC gamer will get in a new game. Unless we actually test a new PC game, a 3DMark score tells us nothing about how a GTX 1070 or an RX 480 will actually run that game as far as the user experience is concerned.
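
To make point 5 above concrete, here is a minimal C++ sketch of what per-vendor path selection could look like in a DX12 engine. This is illustrative only: the GpuPath enum and the path names are hypothetical; only the DXGI calls are real API.

```cpp
// Hypothetical sketch: per-vendor render path selection in a DX12 engine.
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

enum class GpuPath { GenericDx12, GcnAsyncComputeHeavy, PascalTuned };  // made-up names

GpuPath PickRenderPath()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    factory->EnumAdapters1(0, &adapter);   // first (primary) adapter

    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);

    switch (desc.VendorId) {
        case 0x1002: return GpuPath::GcnAsyncComputeHeavy; // AMD: lean on async compute
        case 0x10DE: return GpuPath::PascalTuned;          // NVIDIA: tune for Pascal
        default:     return GpuPath::GenericDx12;          // safe common path
    }
}
```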

The people who work at FutureMark will defend their work because it's their job. The ONLY way to gauge how well modern graphics cards perform in DX12/Vulkan games is to test them in real world DX12/Vulkan games.

Another litmus test as to why 3DMark is worthless: if it ceased to exist, it would change nothing about our graphics card choices, because we should only care about how a graphics card runs the games we play, not about some worthless synthetic score.

There is no need to have synthetic power viruses like FurMark or synthetic GPU benchmarks like 3DMark or Unigine when we can test real world applications for which the graphics cards were purchased in the first place. But eh, if you love playing Furmark, 3DMark, Catzilla, Unigine games, by all means defend synthetics!!

The industry just likes a simple, repeatable benchmark that they can run and that spits out a score in minutes. If more games shipped with in-game benchmarks, there would be even less reason for 3DMark. Also, YouTube and online reviewers want to please all their viewers/readers. There is no doubt that a fraction of PC gamers still think useless synthetics like 3DMark, Unigine and Passmark mean something. To please them, some reviewers still include synthetics, just so as not to alienate readers who would otherwise turn to another review that does.
 

FM_Jarnis

Member
Jul 16, 2016
28
1
0
Everything related to driver- and hardware-level implementation details of the different ways to do async compute is pretty meaningless. From the programmer's standpoint, you don't know what the driver/hardware will do. Do not confuse hardware implementation details with DX12 programming. DX12 literally offers no way to control how the GPU processes the queues.
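
A minimal C++ sketch of that point, assuming an already-created ID3D12Device and two pre-recorded command lists (not Futuremark's code): the application can create a DIRECT and a COMPUTE queue and submit to both, but nothing in the API dictates whether the GPU overlaps them.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitTwoQueues(ID3D12Device* device,
                     ID3D12GraphicsCommandList* gfxList,
                     ID3D12GraphicsCommandList* computeList)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;    // compute + copy only
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    ID3D12CommandList* gfx[]  = { gfxList };
    ID3D12CommandList* comp[] = { computeList };
    gfxQueue->ExecuteCommandLists(1, gfx);
    computeQueue->ExecuteCommandLists(1, comp);
    // No API call exists to force concurrent execution, preemption, or
    // serialization here -- the two queues only express *potential* parallelism,
    // and the driver/hardware decides what actually happens.
}
```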

Fury does not stall in Time Spy. GPU utilization is effectively 100%. Do not make claims based on misunderstanding GPUView screenshots.

Existing DX12 games are effectively DX11 engines with a DX12 renderer bolted on. Not the best comparison target.

3DMark alone is not the only data point you should use. Neither are games (especially cherry-picked ones). Use all data points together.

3DMark is infinitely more reproducible than game benchmarks since it renders the exact same thing every time.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Let's recap what the article states:

This will be fun:

1) NV lacks hardware Async Compute - true
Pascal supports "hardware Async Compute": http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9

2) NV introduced Enhanced Async Compute but they really meant pre-emption because NV has no ACE engines - true
NV introduced Async Compute with Pascal in the form of "overlapping workload" and a better "Preemption" implementation:
"Overlapping Workload": http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9
"Preemption": http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/10

3) NV positions Pre-emption as some killer feature, nearly an Async "equivalent", in its marketing slides. However, pre-emption, in prioritizing key tasks, must stop the existing workload before a new workload begins. This is not true multi-engine parallelism and it is not the same as Async Compute. Thus, pre-emption's goal is to prioritize tasks, whereas Async Compute hardware engines can issue tasks in parallel - true
Pascal fully supports "overlapping workload" and is not using "Preemption" for it: http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9

4) In real world DX12/Vulkan games, modern APIs show massive gains in CPU performance. In Doom, a 1.2GHz 5820K gains 70%+ with a Fury X when running Vulkan. In Hitman, an FX-8350 gains 50-70%. There is no correlation between a faster CPU and an improved 3DMark score. Normally, this is what we want -- a GPU-limited benchmark. However, in this case, the GPUView timeline data shows that the Fury X stalls waiting for serial code, which means the CPU has no work to issue. Hitman and Doom do not exhibit this behaviour.
TimeSpy has no support for DX11. :|
And no, the TimeSpy data doesn't show "stalls waiting for serial code".

By definition, then, since Futuremark failed to create architecture-specific optimizations, the Time Spy benchmark does NOT compare the true potential of the GCN and Pascal architectures under next generation DX12 games. What it compares is ONLY how those graphics architectures can run Time Spy code -- nothing else. This is 100% true. If next generation games are coded to specifically take advantage of Pascal and GCN under DX12/Vulkan, Time Spy had to do the same -- they did NOT.

Since we already see the RX 480 beating the 1060 in 80%+ of DX12/Vulkan games, it is clear that real world DX12/Vulkan gaming performance does not accurately align with Time Spy. Since TimeSpy does not try to measure average GPU performance (TPU charts), but tries to predict DX12 performance, the performance delta between the 480 and the 1060 should roughly match what's already happening in DX12/Vulkan games. It does not:
80% of these games are sponsored by AMD and have no nVidia-optimized path. :|
TimeSpy does the exact same thing. By your definition those 80% can't count as benchmarks and don't tell the true story.

6) It's impossible to have a synthetic DX12 benchmark that tries to predict performance in future DX12 games without taking full advantage of GCN's most important DX12 feature - Async Compute. Since FutureMark is not doing GCN-specific optimizations, in effect they are assuming almost no AAA developer making DX12/Vulkan games will do so on average over the next 1-2 years.
TimeSpy is using Async Compute.

The people who work at FutureMark will defend their work because it's their job. The ONLY way to gauge how well modern graphics cards perform in DX12/Vulkan games is to test them in real world games.
"The people who work at [AMD sponsored developer] will defend their work because it's their job. The ONLY way to gauge how well modern graphics cards perform in DX12/Vulkan games is to test them in [neutral] games."

Your whole posting is nonsense. I don't even know why anybody would write something like this, riddled with false claims.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
No, they meant dynamic load balancing, though preemption is important to make sure every asynchronous task does get worked on.

https://forum.beyond3d.com/posts/1911

This is what you linked:

"Nvidia withholding the launch of GeForce4 Ti 4200"

---

Just to expand on my points. Even if 3DMark optimized fully for DX12 and used all the Async Compute of GCN, it would still be worthless, because then it would show an unrealistic scenario no game would use. It would become the same as the Tessellated City test for tessellation under DX11.

Point is, there is zero need to try to predict the performance of DX12/Vulkan games with 3DMark when we already have DX12/Vulkan games that we can test. As more DX12/Vulkan games come out, we just add that data to the overall picture. The existence or non-existence of Time Spy should change nothing about BF1, Deus Ex MD, etc., since we won't know how graphics cards perform in those titles before they come out, are patched, etc. That's easily the biggest flaw of synthetics - they aren't based on the specific game engine code our graphics cards will run in 2016-2018.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
It's unfortunate that the developers did not include a major feature of technology.

And what exactly are they not including?
Here is their queue (multi-engine) setup.



Now they even describe the order of submission and execution of these tasks.




So, first, they simulate particles in parallel with rendering the g-buffer. This is very likely a pretty efficient combination (conducive to parallel execution, and therefore latency hiding), because particle simulation is likely ALU-bound, and g-buffer rendering (while it does touch the geometry processors) is ROP- and bandwidth-bound.

Now the next step they do is kick off shadow map rendering and, in parallel, run compute shaders that handle light tiling + culling, environment reflections, HBAO, and unshadowed surface illumination.

Now this is also a good combination. Shadow map rendering is extremely ROP-bound, and compute shaders do not use ROPs at all. That doesn't say anything exact about how the compute shaders combine with each other, but there should be a latency reduction here of at least the length of the shadow map rendering, and possibly even greater.
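
A rough C++ sketch of that second step's submission pattern (not Futuremark's actual code; the queues, command lists, and fence are assumed to already exist, and the dependency structure is my reading of the description above): the fence expresses the one hard dependency -- the lighting compute needs the finished g-buffer -- while the shadow-map pass is left free to overlap.

```cpp
#include <d3d12.h>

void SubmitLightingStep(ID3D12CommandQueue* directQueue,
                        ID3D12CommandQueue* computeQueue,
                        ID3D12Fence* fence, UINT64& fenceValue,
                        ID3D12GraphicsCommandList* gbufferList,
                        ID3D12GraphicsCommandList* shadowList,
                        ID3D12GraphicsCommandList* lightingList)
{
    // Render the g-buffer, then signal its completion on the direct queue.
    ID3D12CommandList* gbuf[] = { gbufferList };
    directQueue->ExecuteCommandLists(1, gbuf);
    directQueue->Signal(fence, ++fenceValue);

    // The compute pass (light tiling/culling, HBAO, reflections...) depends
    // on the finished g-buffer. Wait() is a GPU-side wait; the CPU moves on.
    computeQueue->Wait(fence, fenceValue);
    ID3D12CommandList* comp[] = { lightingList };
    computeQueue->ExecuteCommandLists(1, comp);

    // Shadow-map rendering (ROP-bound) has no such dependency, so on hardware
    // that executes queues concurrently it can overlap the ALU-heavy compute.
    ID3D12CommandList* shadow[] = { shadowList };
    directQueue->ExecuteCommandLists(1, shadow);
}
```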
 

FatherMurphy

Senior member
Mar 27, 2014
229
18
81
I would recommend everyone read Ryan's analysis of AC that is contained in his 1080 review as well as his discussion of GPU preemption on the following page.

http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9

Folks on this board and others (Overclock) appear to want to fit AC into a neat little box with only one means of execution. From Ryan's analysis, clearly AC means a variety of methods to execute work as opposed to just one way (i.e. the AMD way or the Nvidia way). It is also clear that Nvidia's Dynamic Load Balancing addresses the more common AC definition, while preemption addresses a different (and, for strictly graphics, less common and useful) aspect of AC.

Anywho, from my layman's perspective, AC is a much clearer issue after reading Ryan's analysis (and the analysis over at the Beyond3d forums).
 

Yakk

Golden Member
May 28, 2016
1,574
275
81
So per FM themselves, through their press release, they've made TimeSPY a pointless metric for DX12 game performance.

Emm, ok.. I don't know what more there is to say about this.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
So per FM themselves, through their press release, they've made TimeSPY a pointless metric for DX12 game performance.

Emm, ok.. I don't know what more there is to say about this.

If you expect every game to have optimized paths for every architecture (for all time?), then Time Spy is not realistic.

If you think (inevitably realize?) that few games, if any, will have more than one or two optimized paths for specific architectures, then maybe you can take it as a reasonable approximate baseline of DX12 app performance.
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
So, has anyone figured out why AMD GPUs got as much as a 20-50% FPS increase in Doom with Vulkan? AC accounted for a good 10-20% of that. I can't figure it out as a gamer. Hoping some of you guys might know.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
So, has anyone figured out why AMD GPUs got as much as a 20-50% FPS increase in Doom with Vulkan? AC accounted for a good 10-20% of that. I can't figure it out as a gamer. Hoping some of you guys might know.

Part of it was the removal of the extra OpenGL overhead, which was significantly higher (50%+) than on Nvidia cards.
 

SirDinadan

Member
Jul 11, 2016
108
64
71
boostclock.com
AMD's GCN architecture was developed from the outset with HSA in mind. DX11 and OpenGL simply cannot utilize the architecture efficiently. Now with the advent of close-to-metal APIs, it's quite evident that with some proper coding, developers can finally unlock the power of GCN.
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
Hmm, seems like more and more overhead is being removed = release-day drivers are no longer needed, as games are already optimized, if the game dev is up to the task.

Power in the hands of devs? I like it.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Part of it was the removal of the extra OpenGL overhead, which was significantly higher (50%+) than on Nvidia cards.

The 390 wasn't much slower than the 970 in OpenGL, yet now it's over 25% faster in Vulkan.
 

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
It is not a suboptimal utilization. I have no clue where you pull that out of.

So, you are telling me that your "vendor neutral solution" is the best way to extract all the power from a specific card from a specific vendor? :hmm:

You seem to fundamentally misunderstand what a benchmark is and what it is supposed to do.

I don't think I do, but thx ^_^
 

dogen1

Senior member
Oct 14, 2014
739
40
91
The 390 wasn't much slower than the 970 in OpenGL, yet now it's over 25% faster in Vulkan.

I didn't say it was 50% slower in OpenGL, only that the overhead was around 50% higher. It wasn't purely CPU-bottlenecked.
 
May 11, 2008
20,068
1,292
126
AMD's GCN architecture was developed from the outset with HSA in mind. DX11 and OpenGL simply cannot utilize the architecture efficiently. Now with the advent of close-to-metal APIs, it's quite evident that with some proper coding, developers can finally unlock the power of GCN.

Here is my take on it; I might be wrong, since I am still studying the matter.

I think the magic is in the ACEs and HWS. I wonder if, when those are not used properly, GCN is limited to the three basic queue types, with each queue holding a bunch of command lists that must run in sequence within that queue: DMA (copy or streaming), graphics and compute. All three can run in parallel; that much is demanded by DX12.

If I understand it correctly, GCN can run multiple compute queues in parallel, and switch when dependencies require compute shaders to wait for each other. DX12 does not demand that, but the application can request it. And if the application is not written for it, GCN behaves just like Pascal, only slower, doing the "basic" three DX12 queues.

The AMD GPU driver is in this case the agent in the graphics processing that has to find out which compute shaders can run independently from each other and thus truly in parallel.
If I am correct, this only works when there are CUs free to run compute shaders on. The SIMD units in a CU can run independently (but in a four-cycle lockstep). If I am not wrong, they share execution resources in the pipeline to reduce die real estate and power consumption.

So, I think the driver must tell the ACEs what can run in parallel (concurrently, without dependencies). If the driver cannot do that because the application does not provide the information, GCN cannot shine.
Maybe that is also why the GCN card has fewer threads...
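
A minimal C++ sketch of that idea (illustrative only, assuming two compute command lists with no data dependency): the absence of any Wait/Signal between the two queues is how the application declares the work independent, which is what leaves GCN's ACEs/HWS (or any other scheduler) free to run it concurrently.

```cpp
#include <d3d12.h>

void SubmitIndependentCompute(ID3D12CommandQueue* computeQueueA,
                              ID3D12CommandQueue* computeQueueB,
                              ID3D12GraphicsCommandList* dispatchA,
                              ID3D12GraphicsCommandList* dispatchB)
{
    ID3D12CommandList* a[] = { dispatchA };
    ID3D12CommandList* b[] = { dispatchB };
    computeQueueA->ExecuteCommandLists(1, a);
    computeQueueB->ExecuteCommandLists(1, b);   // no cross-queue fence: declared independent
    // Add a Signal/Wait pair between the queues and the work must serialize
    // at that point, regardless of what the hardware could have overlapped.
}
```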
 

jckaboom

Junior Member
Jul 20, 2016
2
0
0
I see. Does it do the same if you rename 3DMark to something else?
Looks like the Nvidia drivers are detecting that 3DMark is running, and applying "optimizations".

While on this subject, running those same tests in CrossFire/SLI would also shed much more light on this...

Also, how big are the ETL files from the tests you are doing?

Can you please follow up on this? It seems like it's not the first time that "optimizations" have changed scores.

http://www.theinquirer.net/inquirer/news/1048824/nvidia-cheats-3dmark-177-drivers
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
I am disappointed, but only because I was expecting 3DMark to show the maximum that is possible with 3DMark.

Otherwise not really bothered.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
I am disappointed, but only because I was expecting 3DMark to show the maximum that is possible with 3DMark.

Otherwise not really bothered.

Well, they did say that this is basically the 3DMark 11 equivalent for DX12. I'm sure they will make one like Fire Strike in a few years that will go all out.
 

Det0x

Golden Member
Sep 11, 2014
1,063
3,112
136
Well, they did say that this is basically the 3DMark 11 equivalent for DX12. I'm sure they will make one like Fire Strike in a few years that will go all out.

My guess is that they'll release it just in time for Volta.
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
Futuremark will implement IHV-specific paths when Volta comes out... don't worry, guys.

Seriously, after all these years of seeing this shady business, there are still those who don't believe Nvidia's interests are being protected here?

We've seen in Doom's Vulkan patch what a game optimized for both IHVs looks like. Nvidia had engineers on site working on the Nvidia Vulkan implementation.

Futuremark is as innocent of bias as Tom's Hardware was when it benchmarked the RX 480 vs the 1060 using six of nine GameWorks titles, including Project CARS, and then, in the one game where the RX 480 dominated (Hitman), didn't even use DX12.

Or Guru3D, who, after benchmarking the Doom Vulkan patch in a previous review, still ended up using Doom OpenGL when comparing the 1060 and the RX 480 in their 1060 review.

All this is pure "coincidence" and totally innocent.
 