[ComputerBase.de] Forza 7 Benchmark: Vega has more gasoline in the blood than Pascal


Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
One possible reason why AMD has the overhead advantage in this game compared to Nvidia is that the game makes extensive use of the D3D12 resource binding architecture but by design it's not completely hardware agnostic ...

Even if Forza 7 can't use the fully bindless feature in DX12, making use of partially bindless with structured buffers is enough to cause Nvidia hardware some slowdowns in these specific cases ...

Forza 7, like Forza Horizon 3 uses resource binding tier 2. But if FH3 had no performance issues after the parallel rendering patch on NVidia hardware, and in fact runs faster than on comparable AMD hardware, then we can't really blame the resource binding architecture for the performance issues.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
One of the things that makes me question the heavily utilized CPU core theory is the 7700K doesn't do much better than the R7 1800X and in fact trails it in frame times slightly. Normally the 7700K has quite a bit better per core performance over Ryzen in games.

The CPU benchmark was for AMD hardware. We can't really know for sure the full extent of the CPU performance issues until we see NVidia hardware benchmarked for CPU scaling.

This is definitely one of the weirder gaming results, both GPU and CPU.

I agree with you 100%!
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Forza 7, like Forza Horizon 3 uses resource binding tier 2. But if FH3 had no performance issues after the parallel rendering patch on NVidia hardware, and in fact runs faster than on comparable AMD hardware, then we can't really blame the resource binding architecture for the performance issues.

Forza Horizon 3 has different bottlenecks compared to Forza 7 and Turn 10 studios wasn't the main developer behind Forza Horizon 3, that was Playground Games ...

A better comparison point for Forza 7 would be Forza 6: Apex, and it's AMD hardware that had the advantage over comparable Nvidia hardware in Forza 6 ...

Overhead didn't matter much with Forza Horizon 3 since it's built with a lower performance profile in mind (30FPS on consoles) but it starts to matter much more in trying to reach lower frametime targets such as Forza 6/7 ...

D3D12 resource binding is not entirely agnostic by design since some vendors have more input on different parts of the gfx API. Fully bindless exists in D3D12 not only because of AMD but because Intel had new hardware (Gen 9) on the way to expose the feature so you don't really see fully bindless exposed in Metal 2. D3D12 multi-engine is where AMD and Nvidia (you'd be surprised they would collaborate on this but they share common functionality so it's good to expose separate copy queues for async copy on both sides) had more input than Intel (no DMA engine or compute engines) ... (it's normal for gfx APIs to have vendor bias, in fact it's almost impossible to be perfectly agnostic)
 

4K_shmoorK

Senior member
Jul 1, 2015
464
43
91
Horizon 3 had tons of problems at launch, and yes, the game was entirely single threaded. I haven't played any of 7, just a bit of the demo where I didn't take a look at any of the perf.

But Playground Games and the group who was contracted for the PC port (I assume they contracted? I could be wrong) have improved the performance massively. Just redownloaded Horizon 3 the other day. Here you can see the distribution across all 8c on my system and the game is much better for it. I'm willing to bet Forza 7 will have a multithreaded patch pushed out much sooner than Horizon 3.

People in this forum sure get touchy. Hopefully this is a sign that DX12 is starting to come of age and the era of single threaded performance holding back GPUs at mainstream 1080p will be a thing of the past.

3440x1440 Ultra 8xMSAA (had FPS limited to 95 by accident) This title hammered a single core at launch.

 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
You are comparing a 3.5GB card to an 8GB card, on a beta code.

This is the full game:

*snip*

Not a whole lot different than I imagined, AMD hardware is fairly competitive in Forza 6 ... (poor scaling from GP104 to GP102 which is also observed in Forza 6 just like Forza 7)

The second chart you posted shows Fiji having the smoothest experience out of them all since it has the best frametimes ...

The only thing that changed with Forza 7 is that AMD GPUs of lower performance tiers had a moderate shift and Vega meeting expectations set by Fiji ...

Additional optimizations made by the team for Forza 7 on the Xbox One X are also likely transferred to PC, since the team had to be more aggressive with async compute optimizations because the X1X is more skewed towards compute performance than the original X1 was ...

X1 had 6 CUs per shader engine while X1X has 10 CUs per shader engine. For comparison Polaris 10 had 9 CUs per shader engine (like PS4 Pro) and Vega 10 had 16 CUs per shader engine ... (missed opportunity by AMD for not being aggressive enough with the CU/SE ratio for the console hardware updates, when they could really stand to benefit by pushing PS4 Pro to 10 CU/SE and X1X to 12 CU/SE IMO)
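
The ratios quoted above are just the active CU count divided by the number of shader engines. A quick sketch in Python; the total CU and shader engine counts below are commonly cited figures and are an assumption here, since the thread only gives the per-engine ratios:

```python
# Active compute units (CUs) and shader engines (SEs) per GPU.
# These totals are assumed from commonly cited specs, not taken from the thread.
GPUS = {
    "Xbox One": (12, 2),
    "Xbox One X": (40, 4),
    "PS4 Pro": (36, 4),
    "Polaris 10": (36, 4),
    "Vega 10": (64, 4),
}


def cus_per_se(active_cus: int, shader_engines: int) -> float:
    """Return the number of active CUs per shader engine."""
    return active_cus / shader_engines


# Map each GPU name to its CU/SE ratio.
RATIOS = {name: cus_per_se(*spec) for name, spec in GPUS.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: {ratio:g} CUs per shader engine")
```

With those assumed totals the ratios match the ones in the post (6, 10, 9, 9 and 16).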
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
CPU limitation, at 4K it's quite different:

*snip*

Which is behaviour just like Forza 7 ...

Poor scaling from GP104 to GP102 on Forza 6/7 to better scaling at 4K on those games ...

It's not all that rare for DX12 backends to be skewed in AMD's favour since advanced D3D12 resource binding is not vendor agnostic to begin with (AMD has a latent advantage in this area, but it has only shown up in two game series so far: Forza and possibly Total War: Warhammer), and optimizing async compute for platforms with a higher CU/SE ratio such as X1X will yield similar gains on similar hardware such as Vega 10 or Fiji ...

D3D12 bindless and async compute usage designed around X1X is probably a powerful optimization recipe going forward for both AMD and Vega. Exposing rapid packed math (available in PS4 Pro) for shader model 6.2 gives even more room for AMD to work towards a definitive performance crown on PC ...
 

Eric1987

Senior member
Mar 22, 2012
748
22
76
8xMSAA? Why? 2x at MOST. 4K doesn't even need AA. I hate tests that use it; it isn't accurate for me.
 
Reactions: guachi

Jackie60

Member
Aug 11, 2006
118
46
101
8xMSAA? Why? 2x at MOST. 4K doesn't even need AA. I hate tests that use it; it isn't accurate for me.
4K does need AA unless you're on a tiny 23" monitor. I game on a 40 inch at 4K and you can definitely see the crawlies/jaggies without at least 4x MSAA (I use 8x). I've got old eyes and wear reading glasses, so if I can see the difference most other people can too.
 
Reactions: Gikaseixas and ZGR

amenx

Diamond Member
Dec 17, 2004
4,005
2,275
136
Aliasing may be there at 4K but it does not bother me in the least. Quite comfortable with little to no AA in most titles. I also game on a 40" 4K display. Aliasing bugged me at 1080p, less so at 1440p and much less so at 4K. It may be there, others may notice it, others may be bothered by it, I'm just happy it does not affect me. In fact I don't even notice it when I'm enjoying my 4K experience.
 
Reactions: ZGR

zlatan

Senior member
Mar 15, 2011
580
291
136
While I didn't test the game itself, I think this is not really a "D3D12 loves Vega" thing. Vega simply shines in well designed tiled/clustered renderers. It has a lot of binning cache to store a lot of useful data inside the chip, and the LDS usage is also better. Forza 7 is just a well designed engine with a clustered renderer. Based on the earlier Forza games, the GPUs were limited by memory access and occupancy. This is perfect for Vega, because the architecture is designed to solve, or at least to treat, exactly these limitations. Same as Dirt 4 and PUBG. As long as the rendering parameters allow the chip to avoid most of the memory access, the performance will be amazing. The advantage is bigger if the game runs into an occupancy limit caused by the LDS. The previous architectures are not built for these scenarios, even if the mentioned limitations are more common in today's engines.
 

TeknoBug

Platinum Member
Oct 2, 2013
2,084
31
91
I have been waiting on the Vega 56, but the pricetag is a bit of a turnoff, right now I can get the GTX1080 for a bit cheaper.
 

Muhammed

Senior member
Jul 8, 2009
453
199
116
The previous architectures are not built for these scenarios, even if the mentioned limitations are more common in today's engines.
Yet they are ahead of NVIDIA as well, the 390 is far ahead of the 970, the RX 580 is far ahead of the 1060, and Vega 64 is far ahead of the 1080. This isn't about Vega alone.

I think this is not really a "D3D12 loves Vega" thing.
Yup, the game doesn't even use D3D12, it's basically D3D11 wrapped in 12.

Forza 7 was developed by Turn 10 Studios and is based on the in-house ForzaTech engine. The game uses only the DirectX 12 API, but only the feature level 11_0 and, like all other DirectX 12 games so far, no new hardware features of the API. Whether Async Compute is used, is unknown, but appears in view of the test results as probable.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
I've heard some developers support this notion, but they also say that a properly implemented multithreaded renderer would have a minimal increase on input latency. Just look at Doom for instance. Doom's Vulkan renderer is amazing, and has extremely low input latency. It's actually one of the best games to play on a high refresh rate monitor because the game is so responsive.

And then there's FH3's post parallel rendering patch, which all would agree performed much better than the original model, which mainly loaded one or two cores.

Yet, Doom actually doesn't have nearly as good input latency as other games like CoD, as tested by Digital Foundry. Idk what Forza would measure at, but these guys live and breathe 60 fps. I'm pretty sure they know what they're doing. Plus, just because one core is at or near 100% load doesn't mean it's actually being utilized fully (check core load while using a frame limiter like RTSS; it's just a wait loop), nor does it automatically mean it's acting as a bottleneck for the game. If that one core is really just running an input loop, it's not gonna bottleneck anything else. The entire point is to guarantee that input is being captured as often as possible without any interference.
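
To illustrate the wait loop point: a thread that spins on a timer shows up as a fully loaded core even though it does no useful work, while a sleeping wait of the same length uses almost no CPU time. A minimal Python sketch, with hypothetical function names; it is a generic illustration and not what Forza or RTSS actually do internally:

```python
import time


def busy_wait(seconds: float) -> None:
    """Spin until the deadline; the core looks fully loaded but does no useful work."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def sleeping_wait(seconds: float) -> None:
    """Hand the core back to the OS until the deadline."""
    time.sleep(seconds)


def cpu_seconds_used(wait_fn, seconds: float) -> float:
    """Measure the CPU time (not wall time) this process burns while waiting."""
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


if __name__ == "__main__":
    print(f"busy wait CPU time:     {cpu_seconds_used(busy_wait, 0.2):.3f} s")
    print(f"sleeping wait CPU time: {cpu_seconds_used(sleeping_wait, 0.2):.3f} s")
```

Both waits take the same wall-clock time; only the spinning one would register as core load in a monitoring tool.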
 
Reactions: Carfax83

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Yet, Doom actually doesn't have nearly as good input latency as other games like CoD, as tested by Digital Foundry. Idk what Forza would measure at, but these guys live and breathe 60 fps. I'm pretty sure they know what they're doing. Plus, just because one core is at or near 100% load doesn't mean it's actually being utilized fully (check core load while using a frame limiter like RTSS; it's just a wait loop), nor does it automatically mean it's acting as a bottleneck for the game. If that one core is really just running an input loop, it's not gonna bottleneck anything else. The entire point is to guarantee that input is being captured as often as possible without any interference.

That's actually a fairly common side effect of multi-threading in games ...

What developers will often do to 'use' more cores is that they'll buffer up multiple frames so that game logic and rendering can be done in parallel across multiple frames along with the synchronization thus spiking usage across more CPU cores ...

Instead of the input lag costs simply being game logic + rendering it becomes game logic/rendering (1st frame) + game logic/rendering (2nd frame) which effectively doubles the input latency ...

At first multi-threaded rendering sounds like a great idea since higher CPU utilization = win but you're arguably getting a less responsive experience in the end ...

I'd liken it to the comparison between framerate and frametime. Higher framerate =/= smoother experience but lower frametimes do translate to a smoother experience ...

It's not just about the numbers, one has to ask themselves if they are also getting a better experience in the end ...
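
The buffering trade-off described above can be put into a toy model. This is only a sketch under stated assumptions: the two stages hand off at a shared frame boundary, the stage costs are made-up numbers, and the function names are hypothetical. Real engines differ in how they sample input and synchronize.

```python
from typing import Tuple


def serial_frame(logic_ms: float, render_ms: float) -> Tuple[float, float]:
    """Game logic and rendering run back to back on one thread.

    Returns (frame_interval_ms, input_latency_ms).
    """
    interval = logic_ms + render_ms
    return interval, interval


def pipelined_frame(logic_ms: float, render_ms: float) -> Tuple[float, float]:
    """Logic for frame N+1 overlaps with rendering of frame N.

    Assumes both stages hand off at a shared frame boundary, so each
    stage occupies one full frame interval.
    """
    interval = max(logic_ms, render_ms)
    latency = 2 * interval  # one interval in logic, one in rendering
    return interval, latency


if __name__ == "__main__":
    for logic, render in [(8.0, 8.0), (4.0, 12.0)]:
        print(f"logic={logic} ms, render={render} ms")
        print("  serial    (interval, latency):", serial_frame(logic, render))
        print("  pipelined (interval, latency):", pipelined_frame(logic, render))
```

In this model the pipelined case always carries two frame intervals of latency instead of one. The framerate goes up, but the latency in milliseconds is never lower than in the serial case, and it gets worse when the two stages are unbalanced.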
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Forza Horizon 3 has different bottlenecks compared to Forza 7 and Turn 10 studios wasn't the main developer behind Forza Horizon 3, that was Playground Games ...

A better comparison point for Forza 7 would be Forza 6: Apex, and it's AMD hardware that had the advantage over comparable Nvidia hardware in Forza 6 ...

AFAIK, Forza 6 Apex runs faster on NVidia hardware as well.

Overhead didn't matter much with Forza Horizon 3 since it's built with a lower performance profile in mind (30FPS on consoles) but it starts to matter much more in trying to reach lower frametime targets such as Forza 6/7 ...

But how does that console overhead translate to PCs? It doesn't. I tried the FH3 demo after the parallel rendering patch, and the framerates could easily hit triple digits at 1440p maxed settings if I remember. So basically, the overhead associated with lower frametimes on consoles doesn't mean squat on PC, provided your machine is reasonably modern.

D3D12 resource binding is not entirely agnostic by design since some vendors have more input on different parts of the gfx API. Fully bindless exists in D3D12 not only because of AMD but because Intel had new hardware (Gen 9) on the way to expose the feature so you don't really see fully bindless exposed in Metal 2. D3D12 multi-engine is where AMD and Nvidia (you'd be surprised they would collaborate on this but they share common functionality so it's good to expose separate copy queues for async copy on both sides) had more input than Intel (no DMA engine or compute engines) ... (it's normal for gfx APIs to have vendor bias, in fact it's almost impossible to be perfectly agnostic)

Supposedly NVidia supports fully bindless now, remember? I don't know if it's been tested yet, but NVidia updated their resource binding tier from tier 2 to tier 3 a few months ago with a driver update.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Yet, Doom actually doesn't have nearly as good input latency as other games like CoD, as tested by Digital Foundry. Idk what Forza would measure at, but these guys live and breathe 60 fps. I'm pretty sure they know what they're doing. Plus, just because one core is at or near 100% load doesn't mean it's actually being utilized fully (check core load while using a frame limiter like RTSS; it's just a wait loop), nor does it automatically mean it's acting as a bottleneck for the game. If that one core is really just running an input loop, it's not gonna bottleneck anything else. The entire point is to guarantee that input is being captured as often as possible without any interference.

I remember that test, and it was done on consoles if I recall. Translating console performance to PC is extremely difficult, if not impossible as the consoles use a different API, plus they have far weaker CPUs. Doom never ran at a stable 60 FPS on either console, even with dynamic resolution. It would dip below 60 FPS during heavy scenes.

But I see your point. I still don't know if such an extreme measure is necessary though on PC, as the CPUs in most gaming PCs (for example Intel's Core series) are just much more powerful and have much lower latencies than the CPUs in the consoles.

PC vs PS4 input lag test in Tekken
 

dogen1

Senior member
Oct 14, 2014
739
40
91
That's actually a fairly common side effect of multi-threading in games ...

What developers will often do to 'use' more cores is that they'll buffer up multiple frames so that game logic and rendering can be done in parallel across multiple frames along with the synchronization thus spiking usage across more CPU cores ...

Instead of the input lag costs simply being game logic + rendering it becomes game logic/rendering (1st frame) + game logic/rendering (2nd frame) which effectively doubles the input latency ...

At first multi-threaded rendering sounds like a great idea since higher CPU utilization = win but you're arguably getting a less responsive experience in the end ...

I'd liken it to the comparison between framerate and frametime. Higher framerate =/= smoother experience but lower frametimes do translate to a smoother experience ...

It's not just about the numbers, one has to ask themselves if they are also getting a better experience in the end ...

Yeah, I know that trick. Not a huge fan. In UE3 it was called "oneframethreadlag" and turning it off made games feel immediately better. Not sure what Forza is doing exactly, but I assume they're trying to minimize latency every way they can.
 

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
Instead of the input lag costs simply being game logic + rendering it becomes game logic/rendering (1st frame) + game logic/rendering (2nd frame) which effectively doubles the input latency ...
That would mean that the game would only check for input after every second frame. What's the reasoning for that if the input thread can be, and probably is, a separate thread?
Or the game only refreshes the screen after every second frame. We know for sure that that doesn't happen. I mean, look at the support forums for Forza: if anything, frames get thrown to the display even if they are not ready yet, resulting in missing textures and graphics glitches.
A lot of "advanced" games do this lately, since this is the only way to get more performance out of multiple threads: just don't waste any time on syncing the frames anymore and throw anything at the screen whenever it's ready...
 

Muhammed

Senior member
Jul 8, 2009
453
199
116
Not a whole lot different than I imagined, AMD hardware is fairly competitive in Forza 6 ... (poor scaling from GP104 to GP102 which is also observed in Forza 6 just like Forza 7)

The second chart you posted shows Fiji having the smoothest experience out of them all since it has the best frametimes ...
Actually a whole lot different, as NVIDIA cards don't suffer as badly @1080p, they stay ahead of AMD GPUs.

FuryX having better minimums doesn't really prove anything, since it's slower than 980Ti, and the same thing couldn't be replicated on any Vega card, could be just an anomaly in the test.

The behavior of Forza 6 is the same as Forza 7 @4K though.


 

Head1985

Golden Member
Jul 8, 2014
1,866
699
136
So you think Phynaz's comments (and others) are about damage control and that's it? Sorry but that's ridiculous. This is a legitimate complaint. It's obvious that something is amiss with these scores, as the GTX 1080 Ti is only 7% faster than the GTX 1080 at 1080p, which means that it's being bottlenecked somewhere.

It could be the driver, or it could be the game's CPU thread model, we don't know. But what we do know is that it's not normal. And since this FH3 had some serious CPU performance issues when it first released that was eventually resolved, my bet is that the game's CPU usage model is at fault. NVidia's driver uses more CPU than AMD's, which probably explains why NVidia is more affected by this than AMD.
Nvidia uses a software scheduler in DX12, the same as in DX11, so they run DX12 in software through the driver. This causes more CPU overhead under DX12 compared to AMD, but it works better under DX11 than AMD's approach.
AMD uses hardware schedulers in DX12, so they don't have any CPU overhead under DX12. That's why Vega is faster at lower resolutions.

The 1080 Ti is only 7% faster than the GTX 1080 at 1920x1080 and 2560x1440, and only 20% faster at 4K. That indicates a CPU bottleneck even at 4K (the 1080 Ti is still pushing 90+ fps at 4K).
 

geoxile

Senior member
Sep 23, 2014
327
25
91
Nvidia uses a software scheduler in DX12, the same as in DX11, so they run DX12 in software through the driver. This causes more CPU overhead under DX12 compared to AMD, but it works better under DX11 than AMD's approach.
AMD uses hardware schedulers in DX12, so they don't have any CPU overhead under DX12. That's why Vega is faster at lower resolutions.

The 1080 Ti is only 7% faster than the GTX 1080 at 1920x1080 and 2560x1440, and only 20% faster at 4K. That indicates a CPU bottleneck even at 4K (the 1080 Ti is still pushing 90+ fps at 4K).

Without any bottlenecks the 1080 Ti is only a little over 20% faster than the 1080. It's like 11 TFLOPS to 9 TFLOPS.
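
The arithmetic behind that estimate, as a short Python sketch. The TFLOPS values are the rough figures from the post above and the observed gains are the ones quoted earlier in the thread; the helper names are hypothetical:

```python
def percent_faster(fast: float, slow: float) -> float:
    """Return how much faster `fast` is than `slow`, in percent."""
    return (fast / slow - 1.0) * 100.0


# Rough FP32 throughput in TFLOPS, as quoted in the post above.
GTX_1080_TI_TFLOPS = 11.0
GTX_1080_TFLOPS = 9.0

# Upper bound on the gain if performance scaled purely with compute throughput.
theoretical_gain = percent_faster(GTX_1080_TI_TFLOPS, GTX_1080_TFLOPS)

# Observed 1080 Ti gains over the 1080 (percent), as quoted earlier in the thread.
OBSERVED_GAINS = {"1920x1080": 7.0, "2560x1440": 7.0, "4K": 20.0}


def fraction_of_theoretical(observed_pct: float, theoretical_pct: float) -> float:
    """Share of the compute-bound gain that actually shows up in the benchmark."""
    return observed_pct / theoretical_pct


if __name__ == "__main__":
    print(f"theoretical gain: {theoretical_gain:.1f}%")
    for resolution, gain in OBSERVED_GAINS.items():
        share = fraction_of_theoretical(gain, theoretical_gain)
        print(f"{resolution}: {gain:.0f}% observed, {share:.0%} of theoretical")
```

By this rough measure the 4K result is already close to what the compute difference allows, while the 1080p and 1440p results fall well short of it.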
 