Some performance values from that video. Intel's asteroid field benchmark program is used.
First off, Dx 11:
FPS: 28, CPU utilization: 20% (only 2 cores could be utilized)
Now for basic Dx 12 with no extra features other than REAL multithreading: all the asteroid rocks are drawn one by one in a loop (DrawLoop) just like Dx 11, but on all the cores! (Which is actually not so smart, as you can see further down.)
FPS: 74, CPU utilization: 38-39% (on all 4 cores and 8 threads available!)
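To make the "same loop, but on all the cores" idea a bit more concrete, here is a minimal C++ sketch. It is not the benchmark's code and it does not touch the real Direct3D 12 API: `DrawCommand`, `CommandList`, `recordInParallel` and `totalDraws` are hypothetical names made up for illustration. The only point is that each worker thread records its own slice of the rocks into its own list, so the threads never have to wait on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one recorded draw; not a Direct3D 12 type.
struct DrawCommand {
    std::size_t asteroidIndex;
};

// Hypothetical stand-in for a per-thread command list.
using CommandList = std::vector<DrawCommand>;

// Record one draw per asteroid, split into contiguous slices across
// threadCount worker threads. Each worker writes only to its own list,
// so no locking is needed.
std::vector<CommandList> recordInParallel(std::size_t asteroidCount,
                                          unsigned threadCount) {
    assert(threadCount > 0);
    std::vector<CommandList> lists(threadCount);
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&lists, t, asteroidCount, threadCount] {
            // Slice [begin, end) of the asteroids handled by worker t.
            const std::size_t begin = asteroidCount * t / threadCount;
            const std::size_t end = asteroidCount * (t + 1) / threadCount;
            lists[t].reserve(end - begin);
            for (std::size_t i = begin; i < end; ++i) {
                lists[t].push_back(DrawCommand{i});
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return lists;
}

// Sum the draws recorded over all per-thread lists.
std::size_t totalDraws(const std::vector<CommandList>& lists) {
    std::size_t total = 0;
    for (const CommandList& list : lists) {
        total += list.size();
    }
    return total;
}
```

Calling `recordInParallel(1000, 8)` would give 8 lists of 125 draws each. The total number of draw calls is unchanged, the work is just spread out, which fits the numbers above: higher FPS, but also more total CPU in use.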
Dx 11 vs Dx 12 multithreading, with commands issued from all the cores, can be seen in the per-core CPU graphs from the video:
On Dx 11 the load only reaches about the middle of each CPU core graph, and only 2 cores are active (one of them very jittery). But on Dx 12 all cores are active and really steady.
OK! Now with the usage of descriptor tables (a software feature of Dx 12), all the different asteroid rocks are again drawn in a loop (DrawLoop), but this time from a single descriptor heap in a bindless fashion.
FPS: 80, CPU utilization: 35%
THIS IS NOT ALL! What comes next is the mother lode of CPU utilization savings!!
Up until now all the rocks were drawn in a loop. Each pass through the loop issued a draw call, with that work spread across the different CPU cores. But if you know beforehand what is to be drawn for the static/unchanging parts of the game (like the developers do!), you can describe all of the objects to be drawn in one single descriptor table and then use ExecuteIndirect (prepare what is to be drawn beforehand and execute it in one fell swoop), so all the different objects are drawn together in exactly one go (the smartest possible way to render).
DrawLoop on the left: start drawing, one rock per pass, and keep going until the stop condition is met.
ExecuteIndirect on the right: draw everything that is in the descriptor table, all the asteroid rocks in one heap in that table.
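The difference between the two sides can be sketched as a toy C++ model that only counts how many calls the CPU has to make. This is an assumption-laden illustration, not the real API: `DrawArgs`, `MockCommandList` and the two `submitWith...` functions are hypothetical. In actual Direct3D 12, `ID3D12GraphicsCommandList::ExecuteIndirect` takes a command signature and reads the per-draw arguments from a GPU buffer, which is exactly why the CPU can step out of the way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-draw arguments, loosely modelled on an indexed draw.
struct DrawArgs {
    unsigned indexCount;
    unsigned startIndex;
    int baseVertex;
};

// Hypothetical mock of a command list. It does no rendering and only
// counts CPU-side calls versus draws handed to the GPU.
struct MockCommandList {
    std::size_t cpuCalls = 0;     // calls the CPU had to make
    std::size_t drawsQueued = 0;  // draws the GPU will end up executing

    // One CPU call queues exactly one draw.
    void draw(const DrawArgs&) {
        ++cpuCalls;
        ++drawsQueued;
    }

    // One CPU call queues every draw described in the argument buffer.
    void executeIndirect(const std::vector<DrawArgs>& argumentBuffer) {
        ++cpuCalls;
        drawsQueued += argumentBuffer.size();
    }
};

// DrawLoop: the CPU walks the asteroid list and issues one call per rock.
MockCommandList submitWithDrawLoop(const std::vector<DrawArgs>& asteroids) {
    MockCommandList list;
    for (const DrawArgs& args : asteroids) {
        list.draw(args);
    }
    return list;
}

// ExecuteIndirect style: the arguments were prepared beforehand, so the
// CPU issues a single call covering all rocks.
MockCommandList submitWithExecuteIndirect(const std::vector<DrawArgs>& asteroids) {
    MockCommandList list;
    list.executeIndirect(asteroids);
    return list;
}
```

For 50,000 rocks the loop version makes 50,000 CPU calls and the indirect version makes 1, while both queue the same 50,000 draws. The model says nothing about GPU time; it only shows where the CPU work goes.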
Dx 12 with ExecuteIndirect and descriptor heaps:
FPS: 90 !! and CPU utilization: ONLY 9% !!!!
So going from Dx 11 at 28 FPS to Dx 12 at 90 FPS is roughly 3.2 times the frame rate, while CPU utilization drops from 20% to 9%. I say Dx 12 is REAL.