Official AMD Ryzen Benchmarks, Reviews, Prices, and Discussion

Page 219

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Interesting, I'll update you once I get in touch with the tester.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Watch this and tell me why GCN is flawed.
You do understand that, within DX11, this video agrees with me and brings even more evidence than I meant to.
Within the DX12 framework it gets more complicated, but you understand as well as I do that making draw-call submission multithreaded is not exactly trivial, even when it is possible at the API level.
Uhm, isn't that obvious, considering we know that even the mobo makers did not have much time with Ryzen before release?
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
You do understand that, within DX11, this video agrees with me and brings even more evidence than I meant to.
Within the DX12 framework it gets more complicated, but you understand as well as I do that making draw-call submission multithreaded is not exactly trivial, even when it is possible at the API level.
Making draw calls multithreaded on DX11 in a way that is developer- and game-engine-agnostic was only possible due to the SW scheduler in the NVIDIA driver. With GCN, AMD hoped that developers would code their games such that the draw call thread is kept as free as possible so it can efficiently feed the ALUs via the command processor, i.e. its hardware scheduler. Like Civilization V. But games continued to load the draw call thread with other tasks (this is the reason why Project Cars performed so poorly on GCN), and until recently there haven't been many DX11 games with an incentive to do it properly, because the NVIDIA driver took care of that.
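The multithreaded-recording / single-submission pattern described above can be sketched as a toy model. This is plain Python threads standing in for deferred contexts and the driver's submission thread; none of it is actual D3D API code:

```python
import threading
import queue

# Toy model of multithreaded command recording (NOT real D3D code):
# worker threads record their own command lists in parallel, and only
# one thread -- standing in for the immediate context / driver thread --
# submits them, so the submission point stays serial.
submission_q = queue.Queue()

def record_commands(worker_id, n_draws):
    # Each worker builds a command list independently; this is the part
    # that deferred contexts (or a driver's helper threads) parallelize.
    cmd_list = [f"draw_{worker_id}_{i}" for i in range(n_draws)]
    submission_q.put(cmd_list)

workers = [threading.Thread(target=record_commands, args=(w, 3)) for w in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Single serial submission point: the "draw call thread" the post says
# must be kept as free as possible.
submitted = []
while not submission_q.empty():
    submitted.extend(submission_q.get())

print(len(submitted))  # 12 commands funneled through one submission point
```

The recording scales with cores, but the final submission is still one thread's job, which is why loading that thread with other work hurts.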

However, the SW scheduler of the NVIDIA driver has its own overhead, and this shows in comparatively recent DX11 games like COD:BLOPS 3 and Witcher 3, where it causes near-max CPU utilization across multiple cores in CPU-limited scenarios. This happens in Titanfall 2 as well, which the video doesn't show. Incidentally, the RX 480 is faster than the GTX 1060 in Titanfall 2. This is also carried over to DX12, where NVIDIA GPUs take a hit regardless of what CPU is being used. The only way to overcome this, as the video points out, is for developers to take the time to code in a way that prevents driver bottlenecks on NVIDIA cards (bottlenecks due to features that were originally intended to help DX11 draw call multithreading but are redundant on DX12), examples of this being TW:W and Doom.

This is basically the difference between the AMD and NVIDIA software and hardware implementations that the video talks about, and if you think it only strengthens your claim that GCN is flawed, I suggest you rewatch it.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,668
136
However, the SW scheduler of the NVIDIA driver has its own overhead, and this shows in comparatively recent DX11 games like COD:BLOPS 3 and Witcher 3, where it causes near-max CPU utilization across multiple cores in CPU-limited scenarios. This happens in Titanfall 2 as well, which the video doesn't show. Incidentally, the RX 480 is faster than the GTX 1060 in Titanfall 2. This is also carried over to DX12, where NVIDIA GPUs take a hit regardless of what CPU is being used. The only way to overcome this, as the video points out, is for developers to take the time to code in a way that prevents driver bottlenecks on NVIDIA cards (bottlenecks due to features that were originally intended to help DX11 draw call multithreading but are redundant on DX12), examples of this being TW:W and Doom.
This exact reason is why, in most of today's game engines, you NEED a very fast CPU rather than a slightly slower one with a higher core count. But brace yourselves: Shader Model 6.0 will change this.
 
Reactions: Drazick

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
So guys, small update. I am going to backtrack on this being a core penalty. It's old and not quite completely relevant, but it does seem to establish that at one time NVIDIA scaled fine with CPU cores on DX12.

http://www.pcgameshardware.de/Battl...attlefield-1-Technik-Test-Benchmarks-1210394/

At the very bottom there is a BF1 test with both a Fury and 1080. This is on a 6900. They do the test with 8c16t, 4c8t, and 4c4t.

The point being that the 1080 scaled well with more cores. The Fury capped out at 4c, but I think that had to do with the Fury running out of steam, since minimums went up but maximums didn't.

When combined with Computerbase's numbers for BF1 with the 6900 and a Titan X, I noticed that Computerbase's numbers lined up better with the 4c numbers from PCGamesHardware. I think the loss in performance we are seeing on NVIDIA GPUs is not a core penalty so much as a thread cap plus API overhead issue. The Fury never loses performance going to DX12, but the 1080 does; in almost every game and example, NVIDIA loses performance when moving to DX12. Considering the across-the-board drop on NVIDIA, I wonder if, in the last 6 months, to boost DX12 performance on the "best" and most-sold gaming CPU and eke out that little bit of extra performance for the Titans and 1080s, they capped threading to that of 4c CPUs. What we are seeing would then be the API overhead that would normally be glossed over with more cores, on top of an IPC and clock-speed deficit. DX12 threading in games is going to be much more dependent on the API than it would be with DX11.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Making draw calls multithreaded on DX11 in a way that is developer and game-engine agnostic was only possible due to the SW scheduler in the NVIDIA driver.
Yes, I have seen that.
With GCN, AMD hoped that developers would code their games such that the draw call thread is kept as free as possible so it can efficiently feed the ALUs via the command processor, i.e. its hardware scheduler.
Yes, I have seen that too.
However, the SW scheduler of the NVIDIA driver has its own overhead, and this shows in comparatively recent DX11 games like COD:BLOPS 3 and Witcher 3, where it causes near-max CPU utilization across multiple cores in CPU-limited scenarios.
Yes, I have seen even that, though it has to be noted that such near-max CPU utilization is already present there; the NV driver just makes it worse in this case.
Incidentally, the RX 480 is faster than the GTX 1060 in Titanfall 2.
*After a minute of checking* Technically correct, but I fail to see the relevance, since it is hardly CPU-heavy in the first place.
This is also carried over to DX12, where NVIDIA GPUs take a hit regardless of what CPU is being used.
Go on... I mean, AMD GPUs take a hit on a regular basis too, but it is a fair statement that NV GPUs do so way more often.
The only way to overcome this, as the video points out, is for developers to take the time to code in a way that prevents driver bottlenecks
Yes, it is a kind of twisted mirror of the GCN DX11 situation. Make no mistake, I have no issue admitting that NV's approach is flawed too, if that's your point.
examples of this being TW:W and Doom.
Wait, I listened to the entire video, but that's one thing I did miss; did he mention it at some point, or is that your addition?

This is basically the difference between the AMD and NVIDIA software and hardware implementations that the video talks about, and if you think it only strengthens your claim that GCN is flawed, I suggest you rewatch it.
It does, because when I claimed GCN is flawed I meant the implementation, not the very idea. A video highlighting the design flaw of GCN in certain situations is an addition I totally did not have in mind, but granted, it does more to explain why badly made games run worse on AMD hardware than I ever hoped to; too bad I only saw it after writing my original post.

This exact reason is why, in most of today's game engines, you NEED a very fast CPU rather than a slightly slower one with a higher core count. But brace yourselves: Shader Model 6.0 will change this.
And here I am thinking it is because games are ultimately ALWAYS bound by single-threaded performance, or rather, the performance of an event-loop thread.
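The "bound to an event-loop thread" point is just Amdahl's law in disguise; a quick back-of-the-envelope sketch (the 40% serial fraction below is an assumption for illustration, not a measured number from any game):

```python
# Amdahl's law: if the event loop / draw-call thread keeps a fixed fraction
# of each frame serial, adding cores gives rapidly diminishing returns.
def speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Assumed 40% serial work per frame (illustrative only):
print(round(speedup(0.4, 4), 2))   # 1.82
print(round(speedup(0.4, 16), 2))  # 2.29
```

Going from 4 to 16 cores barely helps if the event loop stays serial, which is why single-threaded performance dominates in these engines.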
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
*After a minute of checking* Technically correct, but I fail to see the relevance, since it is hardly CPU-heavy in the first place.
It's a heavily modified Source engine; the CPU plays a part in it.
Wait, I listened to the entire video, but that's one thing I did miss; did he mention it at some point, or is that your addition?
It is there at the 14:12 mark. Creative Assembly and the TW series are mentioned before that.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
Watch this and tell me why GCN is flawed.
Well dang. That pretty much explains everything, doesn't it? Well, except why NVIDIA stops scaling on DX12 and why it might be only recent. But that would explain just about everything: why Radeons are so consistently better at DX12, and why NVIDIA sees a near-consistent performance hit on DX12. So then the real question, again, is why the NVIDIA driver seems to cap the MT scheduling at 4c. Probably goes back to what I was thinking: they thought they could eke out better performance on their cards by doing that, considering the performance of the 7700K.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
It's a heavily modified Source engine; the CPU plays a part in it.
Fair, I guess, though it is still hardly CPU-heavy from what I see in benchmarks and such.
It is there at the 14:12 mark. Creative Assembly and the TW series are mentioned before that.
Oh yeah, I noticed it now, but I believe TW is still mentioned in a different context (in the context of single-threaded games, I believe).
 

james1701

Golden Member
Sep 14, 2007
1,873
59
91
Working on P-state overclocking now. I am running into trouble: it wants to ignore P0 and only go up to P1. If I make P1 identical to P0, then it won't drop to P2. Any ideas?
 

sushukka

Member
Mar 17, 2017
52
39
61
Watch this and tell me why GCN is flawed.
Very enlightening video. It explains lots of things, and lolfail9001 got more gas on his fire to continue his crusade.
It also shows how rotten the game is behind the curtains. Intel has kept the 4c limit for nearly a decade, which opened the room for NVIDIA to exploit it. With all the extra money NVIDIA/Intel have in their pockets, there have probably been very few choices for AMD to make (developer contracts etc.). They gambled on DX12 and Vulkan/Mantle, and sadly it seems to pay off only a little bit. As said in the video, AMD has to back up a few steps to address the DX11 deficit, because they don't have the money for developer contracts to make the API leap happen faster. NVIDIA and Intel surely don't want that as long as their technology is in this phase. Just from a business perspective, it seems that keeping 4c processors and DX11 has been the best milking strategy for both NVIDIA and Intel. The more I read these threads, the more my sympathy for AMD grows. It's really, really good to have them back to bring competition and actual development back to this area.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
That's what I followed. It will not let P0 activate. P1 is as high as it goes.
Set P0 to what you desire, keep the rest unchanged. Use High Performance power profile and set the Minimum Processor State to a low value and report back.
 

james1701

Golden Member
Sep 14, 2007
1,873
59
91
Set P0 to what you desire, keep the rest unchanged. Use High Performance power profile and set the Minimum Processor State to a low value and report back.

If I only change P0, it maxes out at 3.2GHz even though I have P0 set to 4GHz. That's with everything else left alone. If I change P1 and P2 to manual, it still does the same thing: 3.2GHz down to 2.2GHz. If I change the minimum processor state to 100%, it still only does 3.2GHz.

I did run some CPU benchmarks to see if programs are misreporting the speed. They are not; the benchmark numbers reflect a slower processor.
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Watch this and tell me why GCN is flawed.

We went over this nonsense last year in the GPU section. This whole video is just a bunch of nonsense that doesn't reflect reality.
Straight from Microsoft:
In D3D12 the concept of a command queue is the API representation of a roughly serial sequence of work submitted by the application. Barriers and other techniques allow this work to be executed in a pipeline or out of order, but the application only sees a single completion timeline. This corresponds to the immediate context in D3D11.
https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

Multi-threaded rendering is an API concept. Work only gets submitted through one command queue, but you can fill that command queue from multiple threads. After the GPU receives the queue, the hardware schedules the workload on the GPU. The application can create multiple command queues; the number depends on the hardware.
 

james1701

Golden Member
Sep 14, 2007
1,873
59
91
I found a janky way around it: treat P0 as the XFR speed and set it to 4.1, then set P1 to 4.0GHz. The voltage will make a jump to P0 levels, but the multiplier stays at P1. Benchmark numbers come up at 4GHz speeds. The only downside is that some programs report my speed as 4.1GHz, which makes my scores look lower than they are.
 

Agent-47

Senior member
Jan 17, 2017
290
249
76
We went over this nonsense last year in the GPU section. This whole video is just a bunch of nonsense that doesn't reflect reality.
Straight from Microsoft:

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

Multi-threaded rendering is an API concept. Work only gets submitted through one command queue, but you can fill that command queue from multiple threads. After the GPU receives the queue, the hardware schedules the workload on the GPU. The application can create multiple command queues; the number depends on the hardware.
And here we are, talking about DX12 in a CPU related thread.

Did you read the link you posted? The video only describes, in plain English, what that link mentions.

In particular, the following scenarios can be addressed with D3D12:

  • Asynchronous and low priority GPU work. This enables concurrent execution of low priority GPU work and atomic operations that enable one GPU thread to consume the results of another unsynchronized thread without blocking.
  • High priority compute work. With background compute it is possible to interrupt 3D rendering to do a small amount of high priority compute work. The results of this work can be obtained early for additional processing on the CPU.
  • Background compute work. A separate low priority queue for compute workloads allows an application to utilize spare GPU cycles to perform background computation without negative impact on the primary rendering (or other) tasks. Background tasks may include decompression of resources or updating simulations or acceleration structures. Background tasks should be synchronized on the CPU infrequently (approximately once per frame) to avoid stalling or slowing foreground work.
  • Streaming and uploading data. A separate copy queue replaces the D3D11 concepts of initial data and updating resources. Although the application is responsible for more details in the D3D12 model, this responsibility comes with power. The application can control how much system memory is devoted to buffering upload data. The app can choose when and how (CPU vs GPU, blocking vs non-blocking) to synchronize, and can track progress and control the amount of queued work.
  • Increased parallelism. Applications can use deeper queues for background workloads (e.g. video decode) when they have separate queues for foreground work
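The scenarios in that list boil down to independent queues drained concurrently and fenced roughly once per frame. A toy sketch of that shape (plain Python queues, not the D3D12 API; the direct/compute names just mirror the split the docs describe, and None stands in for a fence signal):

```python
import threading
import queue

# Toy model of the D3D12 multi-queue idea (not real API calls): a "direct"
# queue for rendering work and a separate "compute" queue drained
# concurrently, each terminated by a None sentinel standing in for a
# once-per-frame fence signal.
direct_q, compute_q = queue.Queue(), queue.Queue()
results = {"direct": [], "compute": []}

def drain(q, name):
    while True:
        item = q.get()
        if item is None:        # fence reached: stop for this frame
            break
        results[name].append(item)

for i in range(5):
    direct_q.put(f"draw_{i}")
for i in range(3):
    compute_q.put(f"dispatch_{i}")
direct_q.put(None)
compute_q.put(None)

threads = [threading.Thread(target=drain, args=(direct_q, "direct")),
           threading.Thread(target=drain, args=(compute_q, "compute"))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results["direct"]), len(results["compute"]))  # 5 3
```

The two queues make progress independently; the only coordination point is the per-frame fence, which matches the "synchronize on the CPU infrequently" guidance above.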

The video states that commands can be issued from multiple workers. With NVIDIA the driver handles the scheduling, while with AMD the hardware does it; hence AMD has lower overhead.

The video then goes on to say that, due to this, NV will load up the main worker with scheduling tasks, leaving less room for other things like player management, UDP calls, etc. Perfectly logical reasoning, assuming the developers are not interested in accounting for a lower-IPC single-threaded CPU, due to NGW or otherwise, i.e. AMD's platform.
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,867
3,418
136
No, it is a logical consequence of GCN flaws, nothing else. But I digress: what business does NV have hurting its consumer base? It's not exactly in competition with Ryzen, y'know.

What flaws in GCN? GCN 1.0 is still more capable in terms of ALU features and flexibility than Pascal.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
The video states that commands can be issued from multiple workers. With NVIDIA the driver handles the scheduling, while with AMD the hardware does it; hence AMD has lower overhead.

The NVIDIA driver doesn't handle the scheduling. There is no difference from AMD. "Inter-warp scheduling" has nothing to do with it.

The video then goes on to say that, due to this, NV will load up the main worker with scheduling tasks, leaving less room for other things like player management, UDP calls, etc. Perfectly logical reasoning, assuming the developers are not interested in accounting for a lower-IPC single-threaded CPU, due to NGW or otherwise, i.e. AMD's platform.

And this is wrong too. NVIDIA creates helper threads under DX11/OpenGL to support the main render thread, i.e. offloading workload to other threads.

Instead of listening to people on YouTube, you could watch GDC videos from NVIDIA about DX11 and DX12: http://www.gdcvault.com/play/1023517/Advanced-Rendering-with-DirectX-11
And here is one for OpenGL and Vulkan: http://www.gdcvault.com/play/1023516/High-performance-Low-Overhead-Rendering

Maybe you guys can now go back to Ryzen. That is so much more interesting than this nonsense about NVIDIA, AMD, DX11 and DX12. The last two pages are really off-topic.
 

Agent-47

Senior member
Jan 17, 2017
290
249
76
The NVIDIA driver doesn't handle the scheduling. There is no difference from AMD. "Inter-warp scheduling" has nothing to do with it.

And this is wrong too. NVIDIA creates helper threads under DX11/OpenGL to support the main render thread, i.e. offloading workload to other threads.

Instead of listening to people on YouTube, you could watch GDC videos from NVIDIA about DX11 and DX12: http://www.gdcvault.com/play/1023517/Advanced-Rendering-with-DirectX-11
And here is one for OpenGL and Vulkan: http://www.gdcvault.com/play/1023516/High-performance-Low-Overhead-Rendering

Maybe you guys can now go back to Ryzen. That is so much more interesting than this nonsense about NVIDIA, AMD, DX11 and DX12. The last two pages are really off-topic.

Wow, that totally explains how DX12 performance is so awesome on GTX cards.

/sarcasm

BTW, did you know that there is added latency when creating each worker thread? That is what we are saying. It's an overhead AMD does not have, as it has a silicon-based implementation. I do not know how else to explain it. I understand it's difficult to accept the flaws in NV, but seriously, know the basics before preaching in a tech forum.
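The thread-creation latency claim is easy to demonstrate in miniature. These are Python threads, not driver worker threads, and the absolute numbers are machine-dependent; the only point is that the cost is nonzero:

```python
import threading
import time

# Spawning and joining a short-lived worker thread is not free; a driver
# that creates helper threads pays this kind of cost in software, where a
# hardware scheduler does not.
def noop():
    pass

t0 = time.perf_counter()
for _ in range(100):
    th = threading.Thread(target=noop)
    th.start()
    th.join()
elapsed = time.perf_counter() - t0

# Machine-dependent, but always greater than zero.
print(f"100 create/start/join cycles took {elapsed * 1000:.2f} ms")
```

Real drivers amortize this by keeping thread pools alive, but the scheduling work those threads do is still CPU time taken from the game.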


Cheers
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136
eurocom said:
Eurocom is launching the most powerful laptop ever created. The 19.1” EUROCOM Tornado F9 is fully configurable and can support dual AMD Ryzen 7 series processors, dual SLI NVIDIA GeForce GTX 1080Ti with 16GB GDDR5X VRAM or AMD Radeon RX580 dual CrossFireX desktop graphics setup, eight memory slots for up to 128 GB DDR4 memory and six M.2 NVMe solid-state drives.

Source : http://www.eurocom.com/ec/release(372)ec

What the hell?

Edit : Twitter
 