Ryzen's poor performance with Nvidia GPUs. Foul play? Did Nvidia know?


Dygaza

Member
Oct 16, 2015
176
34
101
Seems to me that in most benchmarks even a Radeon performs better with an Intel CPU than on a Ryzen, so what's your point?

Be careful when making assumptions about games before actually checking their threading details: which thread is bottlenecking, and when. If your game is bound by the main game logic thread, bigger driver overhead won't have a real impact on fps, since you're still bottlenecked by the main game thread. In that case, releasing a more CPU-efficient driver would only result in a lower CPU load. However, when you are bound by the driver thread, a more efficient driver has a big impact on fps.

Before Ryzen, it was really rare to see Nvidia actually being driver-thread bound in games, but with Ryzen it's happening.
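A toy model of what I mean (my own sketch, with made-up millisecond numbers, not a measurement): frame time is set by whichever stage is slowest, so making the driver thread faster only helps when the driver thread is the one on the critical path.

```cpp
// Hypothetical per-frame costs; frame time is dominated by the slowest stage.
#include <algorithm>
#include <cstdio>

int main() {
    double game_logic_ms = 12.0;   // main game thread
    double driver_ms     = 9.0;    // driver / render submission thread
    double gpu_ms        = 7.0;    // GPU execution

    double frame_ms = std::max({game_logic_ms, driver_ms, gpu_ms});
    std::printf("frame time %.1f ms -> %.0f fps\n", frame_ms, 1000.0 / frame_ms);

    // Making the driver thread twice as efficient changes nothing here,
    // because the main game thread is still the bottleneck.
    driver_ms /= 2.0;
    frame_ms = std::max({game_logic_ms, driver_ms, gpu_ms});
    std::printf("after halving driver cost: %.1f ms -> %.0f fps\n",
                frame_ms, 1000.0 / frame_ms);
    return 0;
}
```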
 
May 11, 2008
20,055
1,290
126
I never said that the CPU has less work because of the resolution increase. I said that the CPU has less work because of the reduced framerate.

You can test this yourself if you want. Start up any game and test with Vsync enabled and then disabled. You'll notice that CPU usage increases when Vsync is disabled, because the GPU is drawing frames at a faster pace than when it is on.
I am wondering about that. If the CPU has less work to do because it is waiting on the GPU, I get the impression the game engine is not that well designed, because there is always work to be done.
The CPU should not be waiting.


I used to be a big proponent of GPU physics, but honestly, CPU physics has come a long way over the years due to multithreading and SIMD optimization. Some of the effects that used to be only possible with GPU physics a few years ago (ie cloth simulation), now run very quickly on the CPU with PhysX 3.xx. NVidia completely rewrote the PhysX API starting with version 3.0 to make effective use of multithreading and SIMD.

So I think that CPU physics has actually become viable for advanced physics effects. The only real problem of course is that quad cores aren't really enough to go balls out, as they get bogged down too easily with modern games.
I had to read up on that. The latest PhysX 3.4 seems very promising, making use of SSE2 and AVX when available when running on the CPU rather than via CUDA. It is indeed multithreaded, so it should take good advantage of multicore processors. PhysX has indeed been rewritten, so it is highly likely a concurrent programming style is used with as few mutexes as possible, which should be possible with independent physics calculations.
If I may believe the wiki, it is available for everyone to use commercially and non-commercially, even though it is proprietary.

Now that the consoles have 8 cores as standard and the PC is gaining cores again, I can understand that physics on the CPU can be better. Also, now that even the most advanced GPU is pushed to its limits by 4K rendering (as it previously was by 1080p), I wonder how often the GPU still has free execution time for other async compute tasks.

That makes me wonder: the AMD Jaguars have SSE2 and AVX extensions, so I can understand that the PhysX programmers would use them. The console world benefits from that, and it is an easy market for the PhysX programmers as well.
Maybe the custom Scorpio Jaguars are modified for improved AVX throughput. AVX is better than SSE2 anyway because of its distinct source and destination register format.
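To make the SIMD point concrete, here is a minimal sketch (my own example, not PhysX source) of the kind of data-parallel update SSE2/AVX is good at: integrating four particle positions in a single instruction stream.

```cpp
// Toy SSE example: pos += vel * dt for four particles at once.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(16) float pos[4] = {0.0f, 1.0f, 2.0f, 3.0f};   // made-up positions
    alignas(16) float vel[4] = {4.0f, 3.0f, 2.0f, 1.0f};   // made-up velocities
    const float dt = 1.0f / 60.0f;                         // ~60 Hz timestep

    __m128 p = _mm_load_ps(pos);
    __m128 v = _mm_load_ps(vel);
    __m128 d = _mm_set1_ps(dt);

    p = _mm_add_ps(p, _mm_mul_ps(v, d));   // four lanes updated in parallel
    _mm_store_ps(pos, p);

    for (float x : pos) std::printf("%f\n", x);
    return 0;
}
```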

edit:
It says here clearly: free to use commercially and non-commercially, with no license fee or royalties.
https://developer.nvidia.com/physx-sdk
 
Last edited:

dogen1

Senior member
Oct 14, 2014
739
40
91
Nope.

Stalker: Clear Sky 1366x768
High Preset


With 1932x1086 DSR


If anything, I shifted the bottleneck to the GPU, which dropped frame rate by 10fps, and got slightly higher utilization on the first core.

So, nope, this entire premise of increased GPU load decreasing CPU utilization is FLAWED and MISLEADING.


Well, that's surprising to me. I won't deny the results, but I'm really not sure how the mechanics behind this are working. I can't see how simply increasing the resolution can significantly increase CPU load. I guess it's possible there are more conflicting factors, but I'd have expected the exact opposite result.
 
Last edited:

EightySix Four

Diamond Member
Jul 17, 2004
5,121
49
91
Well, that's surprising to me. I won't deny the results, but I'm really not sure how the mechanics behind this are working. I can't see how simply increasing the resolution can significantly increase CPU load. I guess it's possible there are more conflicting factors, but I'd have expected the exact opposite result.

Why is this confusing? Not every step of the process is handled by the GPU; portions are still done by the drivers, which run on the CPU. If the GPU is doing more, it is because the drivers (running on the CPU) are sending more instructions.
 
Reactions: DarthKyrie

dogen1

Senior member
Oct 14, 2014
739
40
91
Why is this confusing? Not every step of the process is handled by the GPU; portions are still done by the drivers, which run on the CPU. If the GPU is doing more, it is because the drivers (running on the CPU) are sending more instructions.

I guess I don't understand the internal mechanics that well yet. I didn't know there was increased driver work involved with higher resolutions. I'll find some resources on this to read, so I can understand better.
 
Reactions: EightySix Four

dogen1

Senior member
Oct 14, 2014
739
40
91
I'm probably too tired to think through this right now, but I'll try.
Do most games run simulation decoupled from rendering? That would result in resolution not really changing CPU load.

Cause if not, I mean, you're still uploading the exact same shaders to the GPU. I can't see how running your shaders on a few more pixels would change CPU load by itself. I can understand if you're streaming in higher quality textures, or you have more draw calls for longer draw distance, or higher LODs..

Is there something I'm missing?
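For reference, the decoupled setup I have in mind is the usual fixed-timestep loop. This is a generic sketch (not taken from any particular engine), where the simulation advances at a fixed rate no matter how long rendering takes, so the simulation side of CPU load is largely independent of resolution:

```cpp
// Generic fixed-timestep game loop pattern (illustration only).
#include <chrono>
#include <cstdio>

using clock_type = std::chrono::steady_clock;

void update_simulation(double dt) { (void)dt; /* game logic, physics, AI ... */ }
void render_frame()               { /* build and submit draw calls */ }

int main() {
    const double dt = 1.0 / 60.0;   // fixed simulation timestep
    double accumulator = 0.0;
    auto previous = clock_type::now();

    for (int frame = 0; frame < 300; ++frame) {   // a few frames for the demo
        auto now = clock_type::now();
        accumulator += std::chrono::duration<double>(now - previous).count();
        previous = now;

        // Simulation catches up in fixed steps, independent of render cost.
        while (accumulator >= dt) {
            update_simulation(dt);
            accumulator -= dt;
        }

        render_frame();   // runs as fast (or as slow) as the GPU allows
    }
    std::printf("done\n");
    return 0;
}
```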
 

Spjut

Senior member
Apr 9, 2011
928
149
106
Is DSR a built-in feature in Clear Sky, or Nvidia's driver DSR?
If it's Nvidia's driver DSR, I'd guess that could explain the changed CPU load. Comparisons should stick to the in-game settings.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Is DSR a built-in feature in Clear Sky, or Nvidia's driver DSR?
If it's Nvidia's driver DSR, I'd guess that could explain the changed CPU load.

Nvidia driver feature. I wouldn't guess that it affects CPU load, but it couldn't hurt to eliminate the possibility.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Is DSR a built-in feature in Clear Sky, or Nvidia's driver DSR?
If it's Nvidia's driver DSR, I'd guess that could explain the changed CPU load. Comparisons should stick to the in-game settings.
My native resolution is 768p so I have to use DSR to test higher resolutions, but since you mentioned it I tried it differently.

Crysis 2 DX11, lowest in-game graphics preset.

800x600



1280x960



This scene was chosen because it offers stable FPS and frame times. Even though the frame times are almost identical (it seems RTSS updates the frame times faster than the FPS counter, but this is what I get in terms of FPS), I pushed the GPU to maximum utilization, as expected due to the 2.56x increase in the number of pixels. CPU utilization changed from a 60:40 distribution to 50:50, meaning overall it is roughly equal.
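Quick sanity check of the 2.56x figure, for anyone who wants to verify it:

```cpp
#include <cstdio>

int main() {
    // (1280 * 960) / (800 * 600) = 2.56, matching the factor quoted above.
    double ratio = (1280.0 * 960.0) / (800.0 * 600.0);
    std::printf("%.2fx more pixels\n", ratio);
    return 0;
}
```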
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I had to read up on that. The latest PhysX 3.4 seems very promising, making use of SSE2 and AVX when available when running on the CPU rather than via CUDA. It is indeed multithreaded, so it should take good advantage of multicore processors. PhysX has indeed been rewritten, so it is highly likely a concurrent programming style is used with as few mutexes as possible, which should be possible with independent physics calculations.
If I may believe the wiki, it is available for everyone to use commercially and non-commercially, even though it is proprietary.

Check this out. This is running on the CPU using PhysX 3.4. A few years ago, if I had posted this people would never have believed it was running on the CPU:


According to the developer, they got a 2-3 times performance increase in rigid body collisions by going from PhysX 3.3 to 3.4. This is a good example of how software optimization can have a dramatic impact on CPU performance. And for the most part, PhysX only uses about four threads. They could scale it even higher if they wanted to.

But like I said before, most people these days are running quad cores, which prevents advanced physics effects from running on the CPU.

Now that the consoles have standard 8 cores and the pc is gaining in cores again, i can understand that physics on the cpu can be better. Also, now that even the most advanced gpu is pushed to its limits (again, previously with 1080p) with 4k rendering, i wonder how often the gpu still has free execution time for other async compute tasks.

GPUs always have some resources that are idling at any given time, even if the hardware monitoring shows 100% utilization. Asynchronous compute is a good way to go about using the GPU for physics calculations, but the latency cost is very high since the communication has to go across the PCI-e bus.

That's why a hybrid model is probably best, because the CPU can take care of the latency sensitive physics calculations, whilst the GPU does the calculations that respond well to extreme parallelism.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Well, that's surprising to me. I won't deny the results, but I'm really not sure how the mechanics behind this are working. I can't see how simply increasing the resolution can significantly increase CPU load. I guess it's possible there are more conflicting factors, but I'd have expected the exact opposite result.

How about we try a more modern game? Look at the framerate between the two, and look at the CPU usage. Detail level is the same in both screenshots. This is also on an octa-core CPU, which gives the CPU plenty of headroom.

Witcher 3 at 1440p:



Witcher 3 at 1024x768:

 
Last edited:

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
How about we try a more modern game? Look at the framerate between the two, and look at the CPU usage. Detail level is the same in both screenshots. This is also on an octa-core CPU, which gives the CPU plenty of headroom.

Witcher 3 at 1440p:



Witcher 3 at 1024x768:

How about you test at a fixed aspect ratio? You are rendering less horizontally at 1024x768. Also Witcher 3 taxes the GPU under most scenarios regardless of resolution, so it really doesn't show the GPU kicking in at higher resolutions.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Yeah, but it's not rendering the same things in between scenes.

Yeah.. do you expect CPU load to go down once the lower res shot is rendering as much as the 1440p one?

My native resolution is 768p so I have to use DSR to test higher resolutions, but since you mentioned it I tried it differently.

I think one thing that determines whether there'll be a decrease in CPU load at higher resolutions is whether simulation and rendering run in lockstep or not. If a separate thread is chugging away at updating the state of the game, then CPU load is going to be closer between resolutions than otherwise. The number of draw calls submitted per frame also counts, as does the difference in framerate between the two resolutions: less difference in framerate, less difference in draw calls per second.
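Rough back-of-the-envelope version of that last point (all numbers made up): driver-side CPU work per second scales with draw calls per frame times framerate, so a smaller framerate gap between resolutions means a smaller gap in CPU load.

```cpp
#include <cstdio>

int main() {
    const double draw_calls_per_frame = 2500.0;  // hypothetical scene complexity
    const double fps_low_res  = 100.0;           // hypothetical low-resolution framerate
    const double fps_high_res = 75.0;            // hypothetical high-resolution framerate

    // Submission work per second scales with calls/frame * frames/second.
    std::printf("low res : %.0f draw calls submitted per second\n",
                draw_calls_per_frame * fps_low_res);
    std::printf("high res: %.0f draw calls submitted per second\n",
                draw_calls_per_frame * fps_high_res);
    return 0;
}
```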
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Yeah.. are you saying once the lower res shot is rendering more like the 1440p shot CPU load will go down?
I cannot say *definitively* whether it will go down, but it will be a more apples-to-apples comparison.
 
May 11, 2008
20,055
1,290
126
Check this out. This is running on the CPU using PhysX 3.4. A few years ago, if I had posted this people would never have believed it was running on the CPU:


According to the developer, they got a 2-3 times performance increase in rigid body collisions by going from PhysX 3.3 to 3.4. This is a good example of how software optimization can have a dramatic impact on CPU performance. And for the most part, PhysX only uses about four threads. They could scale it even higher if they wanted to.

But like I said before, most people these days are running quad cores, which prevents advanced physics effects from running on the CPU.



GPUs always have some resources that are idling at any given time, even if the hardware monitoring shows 100% utilization. Asynchronous compute is a good way to go about using the GPU for physics calculations, but the latency cost is very high since the communication has to go across the PCI-e bus.

That's why a hybrid model is probably best, because the CPU can take care of the latency sensitive physics calculations, whilst the GPU does the calculations that respond well to extreme parallelism.

Well, it strikes me as odd that the GPU would be idle. When the system struggles to stay at 60 fps, how can the GPU have idle moments when the GPU is the bottleneck? It is calculating non-stop, and frame times would be mainly determined by the GPU.
But it is possible that the needed graphics calculations do not fill up all the available SIMD units, and some units could be allocated in between, which is what async compute is all about. That would be a better explanation of the GPU always having some resources available, as long as the SIMD units are independent and can execute different instructions for the async compute job and the graphics shader calculations.


I think the HSA model is the best way to combine CPU scalar operations with GPU SIMD operations, because there the latency is lowest: with zero-copy, nothing has to travel across a system bus. Of course, there is always a way to program around high latency by prefetching, but sometimes it is impossible to know what is up ahead to calculate.

The web blog of one of the PhysX programmers is very enlightening.
The programmer of PhysX 3.4 explained that there is not that much SIMD code present in PhysX, because they do a lot of logic testing and branching that cannot be converted to SIMD operations.
In all honesty, I have no knowledge of all the possible integer and floating-point SIMD instructions, or of how the PhysX code works.
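To illustrate the difference as I understand it (my own toy example, not PhysX code): branch-heavy logic resists vectorization, while something like a clamp can be written branchlessly and maps directly onto SIMD min/max.

```cpp
#include <immintrin.h>
#include <cstdio>

// Branch-heavy scalar version: data-dependent control flow, hard to vectorise.
float clamp_scalar(float x, float lo, float hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

int main() {
    alignas(16) float vals[4] = {-2.0f, 0.5f, 3.0f, 7.0f};   // made-up inputs

    // Branchless SIMD version: min/max over four values at once, clamped to [0, 1].
    __m128 v  = _mm_load_ps(vals);
    __m128 lo = _mm_set1_ps(0.0f);
    __m128 hi = _mm_set1_ps(1.0f);
    v = _mm_min_ps(_mm_max_ps(v, lo), hi);
    _mm_store_ps(vals, v);

    for (float x : vals) std::printf("%f\n", x);
    std::printf("scalar clamp of 3.0 -> %f\n", clamp_scalar(3.0f, 0.0f, 1.0f));
    return 0;
}
```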

It would be interesting to see physics engine programmers from different programming groups debate with each other on how to solve problems.

edit:
Here is the link.
I had it on my mobile phone but not on the pc.

http://www.codercorner.com/blog/?p=1129
 
Last edited:

dogen1

Senior member
Oct 14, 2014
739
40
91
1024x768 - 65 FPS 74% CPU



1280x720 - 61 FPS 68% CPU


1920x1080 - 51 FPS 58% CPU


960x540 - 98 FPS 98% CPU

1280x720 - 97 FPS 97% CPU

1920x1080 - 74 FPS 84% CPU
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
1024x768 - 65 FPS 74% CPU



1280x720 - 61 FPS 68% CPU


1920x1080 - 51 FPS 58% CPU


960x540 - 98 FPS 98% CPU

1280x720 - 97 FPS 97% CPU

1920x1080 - 74 FPS 84% CPU
Interesting results. Is the first screenshot from the Talos Principle? Anyway, the GPU utilization is already pretty high across all the resolutions, whereas I think situations where GPU headroom decreases by going to higher resolutions, like the second set of screenshots of Shadow of Mordor, are much more interesting.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
This is Insurgency, a Source engine game, so the CPU is already being taxed pretty heavily while GPU utilization is usually quite low. Relevant because Counter-Strike is a thing.

640x480



1280x960



Note that both FRAPS and RTSS are running because I was focusing on the FRAPS counter to remove any bias by looking at the counters on the right, and only clicked the screenshots when FRAPS was reporting the highest FPS at that particular moment.

My takeaway from these results: it depends on the game and the hardware combination. Also, I might have been wrong with my comments on the Crysis 2 screenshots; the FPS counter seems to be more reliable than the frame times reported by RTSS.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Interesting results. Is the first screenshot from the Talos Principle? Anyway, the GPU utilization is already pretty high across all the resolutions, whereas I think situations where GPU headroom decreases by going to higher resolutions, like the second set of screenshots of Shadow of Mordor, are much more interesting.

I tried the medium preset (+ some MSAA to keep GPU load high at 1080p) instead of high and I get around 73% CPU at 1024, then 71% and then 69%.


So this is Insurgency, a Source engine game, so the CPU is already being taxed pretty high, while GPU utilization is usually quite low. Relevant because Counter Strike is a thing.

640x480
Note that both FRAPS and RTSS are running because I was focusing on the FRAPS counter to remove any bias by looking at the counters on the right, and only clicked the screenshots when FRAPS was reporting the highest FPS at that particular moment.

My takeaway from these results: it depends on the game and the hardware combination. Also I might have been wrong with my comments in the Crysis 2 screenshots, the FPS counter seems to be more reliable than the frame times reported by RTSS.

Oh, I also just tried CS:GO, and while framerates are really high, changing the resolution doesn't affect CPU load much for me; once I tried DSR, small differences started to show up. I think the game probably just keeps working in the background regardless of how well your GPU can keep up. It could also be pretty light on draw calls, but I'm just guessing.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
I tried the medium preset (+ some MSAA to keep GPU load high at 1080p) instead of high and I get around 73% CPU at 1024, then 71% and then 69%.




Oh, I also just tried CS:GO and while framerates are really high changing the resolution doesn't affect CPU load much for me, once I tried DSR small differences started to show up. I think the game probably just keeps working in the background regardless of how well your GPU can keep up. It could also be pretty light on draw calls, but I'm just guessing.
I tried DSR in a couple of other games; it seems to add some CPU overhead. Not that much, but it's there.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
My takeaway from these results: it depends on the game and the hardware combination.
It really does not. Crysis 2 is single-threaded, and you were CPU-limited in both screenshots; if you could raise the resolution any further, you would probably see CPU utilization start dropping... quickly.

Anyway, the man's point was that CPU utilization depends on framerate, and lower resolutions produce higher framerates due to lower load on the GPU... hence, CPU utilization does depend on resolution until you hit a single-thread limit.
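A crude model of that relationship (made-up numbers, assuming one main thread with a fixed CPU cost per frame): CPU load tracks whatever framerate the GPU allows, until that thread saturates.

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

int main() {
    const double cpu_ms_per_frame = 8.0;   // hypothetical main-thread cost per frame

    for (double fps : {50.0, 75.0, 100.0, 125.0, 150.0}) {
        // Percentage of one core busy: (ms spent per second) / 1000 ms * 100.
        double load = std::min(100.0, cpu_ms_per_frame * fps / 10.0);
        std::printf("GPU allows %3.0f fps -> main thread ~%3.0f%% busy\n", fps, load);
    }
    // Past ~125 fps the main thread is saturated here, so the CPU becomes the
    // limit and the framerate stops scaling with resolution.
    return 0;
}
```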
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
It really does not. Crysis 2 is single-threaded, and you were CPU-limited in both screenshots; if you could raise the resolution any further, you would probably see CPU utilization start dropping... quickly.

Anyway, the man's point was that CPU utilization depends on framerate, and lower resolutions produce higher framerates due to lower load on the GPU... hence, CPU utilization does depend on resolution until you hit a single-thread limit.
No, it isn't. Maybe in DX9 mode, but not in DX11. It was always a mix between 60:40 and 70:30 during the times I took the screenshots, roughly speaking.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
So I had to push 2732x1536 DSR to really choke the GPU before CPU utilization dropped.
After correcting my initial position, I think it is reasonable to conclude that unless you are absolutely choking your GPU with crazy high resolutions, changing between different resolutions won't affect CPU utilization that much.
 