Official AMD Ryzen Benchmarks, Reviews, Prices, and Discussion


DrMrLordX

Lifer
Apr 27, 2000
21,808
11,165
136
I only pimp performance, brother, and Ryzen is coming up short against Intel's finest in that regard. X99 motherboards are not that expensive, are comparable in cost to AM4 boards, and provide quad-channel memory on top.

Um . . . okay? If that floats your boat. Ryzen is clearly a superior productivity CPU to anything at the same price point (especially the 1700), and the 7700k is currently Intel's gaming king except in those circumstances where chips like the 6950x win with huge L3. As time goes on, Intel's 6900k, 6950x, and Skylake-X CPUs will take over the gaming crown in newer titles. I don't see where the 6800k really fits into that scenario.

The absolute cheapest LGA2011 v3 board I can find right now is ~$150 (ASRock X99 Extreme3). You won't have to pay that much for AM4 once the hype dies down, and it'll probably be possible to run a 1700 @ 3.8 GHz on a fairly stripped-down B350 board. You only need ~1.24v to achieve that clockspeed. Everything above that looks like a pissing contest.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
I have to correct myself. I ran 1080p on a tv that I used as a secondary monitor for a while for text display but never for a primary monitor resolution. I currently use 3440x1440 in a 34" curved panel and my 4790k and gtx 780 ti do okay for gaming.
It's a pretty high resolution. Is that a 60Hz monitor? Depending on what you game, you might be better off just changing the GPU to a 1080 Ti, or to Vega if it ends up being fast. The 4790K is a nice CPU.
 

tential

Diamond Member
May 13, 2008
7,355
642
121
Jesus wept, is it possible that you guys could do a more spectacular job of missing the point? You won't be getting those "crazy frame rates" if you're CPU bound.

Incidentally, a GTX 780 couldn't sustain 60fps in Crysis at 1920x1080 with 4xAA.
As I follow the Ryzen release I'm realizing more and more that the average level of CPU knowledge and testing methodology on hardware forums is very, very low. It's very sad; if Jesus has been weeping, I have been joining him.
 
Reactions: Trender

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Ok, you build a Ryzen system with the best parts: an Asus Crosshair VI Hero, a couple of sticks of 3200 DDR4 (2x16 for 32 GB dual channel), and then buy a new case and a high-end AIO water cooler to keep that 95C chip at about 80C for a 3.9GHz overclock. Then put all your important data on it. Something will happen: either the RAM won't work, the mobo will brick itself with a BIOS update, the CPU will run too hot, or Windows will see it as two 4-core CPUs instead of one 8-core CPU. You will toggle SMT off and on for each game you use, and then every Windows patch will probably crash something, and you can pray you don't lose important data.

Read the 100 reviews over the last two weeks from people that KNOW how to build a system. They ALL had troubles and headaches, and are not 100% stable. These systems are crashing after less than an hour of stress testing. What do you think will happen in 6 hours? Water coolers are not air coolers; they work great at first, but hours later, when the water reaches max temp, not so much.

Don't get me wrong, I was very excited and ready to build one, but right now there is no way you can say they are stable for daily use as your only system. Believe me, even if you are lucky enough to be stable now, something is going to go wrong soon, unless AMD, mobo manufacturers, RAM manufacturers, and Microsoft all work together to make this platform stable. It is NOWHERE NEAR the stability of an Intel rig right now, not even stable enough to be used for business.




Trolling isn't allowed here.


esquared
Anandtech Forum Director

Are you somehow expecting a new clean-sheet ecosystem to be as stable on day one as an ecosystem that is, in effect, 10 years old?

There are, right now, THOUSANDS of developers working on making Ryzen work as perfectly as possible. If AMD needs to release a revised CPU, they will do it. They could call it a Ryzen 7 1x50[X], for example...

However, the issues, at this moment, are largely resolvable - and MINOR. I'm not bothered by a 10~15% difference in performance in a few games because Windows 10 doesn't (yet) know how to balance threads on a dual-CCX Ryzen CPU. That issue will quite possibly be resolved before I even get a motherboard in.

I'm not bothered by BIOS issues - my Z77 Extreme4 was a horribly finicky motherboard when I first got it and was running its original BIOS (something like version 0.9, I kid you not). It is now running P2.90Q and is the most stable consumer board I've ever run (and I have NVMe support).

X99 had some horrible teething issues - and, yes, even performance issues. AMD is doing something very different and unique; it will take effort to make it mature. We are simply seeing that effort first-hand with Ryzen instead of it being hidden behind closed doors.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
What you said right there parallels what Raja said in an interview about the memory management in the Vega GPU. Their design team seems to think that they can make better use of a smaller cache than Nvidia, and since it's based upon the same architecture, I'm concerned that it will be hampered in the same manner.


Well, not exactly. Vega is designed to make better use of available cache and memory in memory constrained areas (think an APU with only 2GB allocated for graphics by the system...). This will also hide some bandwidth limitations (though you will always end up being limited by the final bandwidth).

Think of a nice 32MB HBC on the APU dedicated to the frame buffer output (1080p is 9MB and would, undoubtedly, be the upper target for APU optimizations).

There's still some contention as to what the HBC actually is, though. Some seem to think it is just HBM. I don't, I think it exists prior to the memory controller(s).

As far as how that relates to Ryzen.... it doesn't. Ryzen has a very fast, and large, L3 - something is simply gutting its performance when accessing more than 4MB of it within the same CCX... and there is, it seems, ZERO access between the CCXes. As soon as we exceed 6MB accesses we end up at full-on memory latencies.

Ryzen is REALLY good, however, at predicting linear accesses.



Source
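To make the "full-on memory latencies beyond ~4-6MB" claim concrete, below is a minimal pointer-chasing sketch of the kind of latency probe that surfaces this behavior. It is an illustration of mine, not the linked source's methodology; the working-set sizes, RNG seed, and hop count are arbitrary choices, and the absolute numbers will vary by machine.

```cpp
// Illustrative pointer-chasing latency probe: a randomly linked walk defeats
// the prefetchers, so average time per hop approximates load-to-use latency
// for a given working-set size (L1/L2/L3/DRAM transitions show up as steps).
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

static double chase_ns_per_load(std::size_t bytes, std::size_t hops = 20'000'000) {
    const std::size_t n = bytes / sizeof(std::size_t);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});

    // Sattolo's algorithm: permute into a single cycle so the walk
    // touches the entire working set instead of a short sub-loop.
    std::mt19937_64 rng{42};
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }

    std::size_t idx = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t h = 0; h < hops; ++h) idx = next[idx];   // dependent loads
    const auto t1 = std::chrono::steady_clock::now();

    volatile std::size_t sink = idx;                           // keep the loop live
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / double(hops);
}

int main() {
    for (std::size_t mb : {1, 2, 4, 6, 8, 16, 32})
        std::printf("%2zu MB working set: %5.1f ns per load\n", mb, chase_ns_per_load(mb << 20));
}
```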
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Um . . . okay? If that floats your boat. Ryzen is clearly a superior productivity CPU to anything at the same price point (especially the 1700), and the 7700k is currently Intel's gaming king except in those circumstances where chips like the 6950x win with huge L3. As time goes on, Intel's 6900k, 6950x, and Skylake-X CPUs will take over the gaming crown in newer titles. I don't see where the 6800k really fits into that scenario.

The absolute cheapest LGA2011 v3 board I can find right now is ~$150 (ASRock X99 Extreme3). You won't have to pay that much for AM4 once the hype dies down, and it'll probably be possible to run a 1700 @ 3.8 GHz on a fairly stripped-down B350 board. You only need ~1.24v to achieve that clockspeed. Everything above that looks like a pissing contest.

I bought the cheapest AM4 board, an Asus Prime B350M. The only one in stock, btw. I guess it's about 80 USD, right?
But anyway. Yes, it's clearly an early BIOS and software with lots of bugs. But it booted Windows fine without a single driver whatsoever. 100% stable. Plug and play. Far easier than my last HSW build.
I have stopped buying expensive motherboards with all the colors and shrouds, and only care that it's from a reputable company.
It goes 3.8 at 1.375V, and my guess is that is generally what's needed for that frequency. Don't know if it goes to 3.9 at 1.39V, but I don't want that efficiency loss anyway. As it is, a Zen 1700 at 3.9 consumes about the same as a 6900K at stock. And it is faster in many loads. But it is a lot of TDP to handle.
Imo the 1700 is for those that OC. The 1700X and 1800X are for those that don't want to OC. As a benefit they keep the nice efficiency features that us OCers lose.
 
Reactions: lightmanek

HutchinsonJC

Senior member
Apr 15, 2007
465
202
126
The problem is that the methodology doesn't hold up across core counts or for gaming evolution.

And if you read a forum thread from one post to the next, or by chance glanced my next post after what you quoted, I had already answered this. I'm not unaware of how these benchmarks do no justice for the "evolution" of gaming. (edit: I should say "how these benchmarks do no justice for the "evolution" of gaming in how they've often recently been presented; I don't want to sound as if you can't use the same information, the same benchmarks, and still come to a different conclusion)

You can have a benchmark and you can have two review sites, and get totally different interpretations of the result depending on personal agendas or whatever.

It STILL doesn't negate the usefulness of these benchmarks. My targeted audience, even if not stated, was those denouncing low resolution gaming benchmarks with reasoning that "no one games with a 1080 GTX on a 1080P screen - so it's not realistic". In the last several pages I've seen this come up several times. I've actually read almost every single page in this thread. The problem is, though, no one benches at low resolution for realistic gaming performance. You bench at low resolution to specifically address how strong or weak the CPU is in a particular game or several. Ideally, you would do so with games that are reliant on several powerful threads on a CPU and in games that are more sensitive to high ipc/high clocks on a single thread, since many games coming out now are STILL a varied mix of both realities. And you write your material to talk about BOTH of those realities and maybe even how only one of those realities has a strong future.
 
Last edited:

Agent-47

Senior member
Jan 17, 2017
290
249
76
The 6900k is faster in gaming mostly because of its unified L3. Ryzen basically only has 4MB of full-speed L3 usable for any given core from testing. 6900k has some 16MB.

I think you are overestimating the impact of L3 in gaming. Doubling L3 only accounts for a 2-5% improvement in gaming.
Also, you have to consider that this cache deficit also applies to other benchmarks, where AMD pulls off the performance despite it. So gaming is unlikely to be limited by it.

Yes, having a truly unified cache would be preferable, and if they had one, IPC could have been at KL level, but this limitation does not explain why gaming is below BWE while workstation performance is on par. Gaming therefore cannot be limited by cache alone.

 
Last edited:

Puffnstuff

Lifer
Mar 9, 2005
16,046
4,805
136
It's a pretty high resolution. Is that a 60Hz monitor? Depending on what you game, you might be better off just changing the GPU to a 1080 Ti, or to Vega if it ends up being fast. The 4790K is a nice CPU.
Yes, I run at 60Hz using DP, and it's primarily a productivity monitor but is running at 5ms. Previously I was running a 27" Dell UltraSharp 1920x1200 flat panel that was just too small for multiple applications to use simultaneously. Having large spreadsheets open was a real pita when one window held the data set, forcing me to scroll repeatedly. With that said, I will never return to a flat panel for desktop use. As for the GPU, upgrading to the 1080 Ti will probably be the first step I take in upgrading this system. If AMD and their partners can raise the performance of the 1800X at stock clocks to something closer to its Intel counterparts, I might consider it as an upgrade path.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
As far as how that relates to Ryzen.... it doesn't. Ryzen has a very fast, and large, L3 - something is simply gutting its performance when accessing more than 4MB of it within the same CCX... and there is, it seems, ZERO access between the CCXes. As soon as we exceed 6MB accesses we end up at full-on memory latencies.

Ryzen is REALLY good, however, at predicting linear accesses.



Source

I don't know how the 2xCCX design pans out technically in the future, and neither can I understand all the implications, but imo it's clearly a performance tradeoff in exchange for saving a lot of cost by using the same die and platform technology across all segments, from consumer to servers. Imo it's a pretty brilliant move from a business perspective - classic platform-strategy thinking at its best. Gives us all cheaper products.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
Yes, I run at 60Hz using DP, and it's primarily a productivity monitor but is running at 5ms. Previously I was running a 27" Dell UltraSharp 1920x1200 flat panel that was just too small for multiple applications to use simultaneously. Having large spreadsheets open was a real pita when one window held the data set, forcing me to scroll repeatedly. With that said, I will never return to a flat panel for desktop use. As for the GPU, upgrading to the 1080 Ti will probably be the first step I take in upgrading this system. If AMD and their partners can raise the performance of the 1800X at stock clocks to something closer to its Intel counterparts, I might consider it as an upgrade path.
If it isn't outright obvious that you are directly CPU bound, it's always good to test by just lowering resolution and quality and taking a tour of the places where your games are toughest. That method has always worked. Don't forget it.
Gaming with 60fps as the minimum target means that only BF1, in some corner cases, can take it down to 40fps. Straight off the bat I don't know of other games that are so taxing against that 60fps limit. So unless that's the game you play, your CPU is perfectly fine now.
The 1800X will improve in some games, and perhaps generally, but it's the new game engines that will make a difference here imo, so the 6c and especially 8c Intel parts will stay strong and get stronger here. But that's also a different cost.

Interesting with a curved screen, btw. Will have to try it.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
As far as how that relates to Ryzen.... it doesn't. Ryzen has a very fast, and large, L3 - something is simply gutting its performance when accessing more than 4MB of it within the same CCX... and there is, it seems, ZERO access between the CCXes. As soon as we exceed 6MB accesses we end up at full-on memory latencies.

Hmm, VERY interesting.

I wonder what would happen to performance in certain applications if AMD forced full coherency* (write-broadcasting) between the two L3 address spaces (effectively halving L3 cache from 16MB to 8MB).

Could they even enable that in bios updates? Or in Ryzen Master?


*if full coherency is forced, then as long as the bandwidth between CCX is sufficient to allow updating of both local and remote L3 within the timeframe for the existing remote L3 request, they should see a latency improvement - right?


**My thinking** If it takes say, 85 ns to send a request to the remote CCX and receive a return, then it should take less than 40ns to write-broadcast to the remote cache. The remote cores then take 10ns to perform a request on their local cache data. Total time (at worst) = 50 ns.



edit: Hmmm. Actually, now I think of it - what I posted here may be gibberish. It has been demonstrated that L3 cache performance beyond around 4MB makes very little difference in games. Which then wouldn't explain the poor performance. I suppose that cache thrashing (like the old Clovertown) could be pulling performance down 10-15%.
 
Last edited:

Trender

Junior Member
Mar 4, 2017
23
1
16
Yep. But this argument is only valid as long as you run an 8350 that couldn't get the most out of the games when it launched. It was too slow out of the gate for even 60fps gaming.
This situation now is different.

There are a few / a handful of games today where those with a 144Hz monitor benefit from a 7700. They get e.g. 144 vs 120 fps, with better frametimes, with a very fast GPU at 1080p.

And if they can buy a new i7/CPU in 2 years and they play those games, certainly the 7700 is clearly fastest. It's pretty apparent, as the numbers are all over. But it's also a 144Hz situation, not the 60fps that the huge majority still runs.

In BF1 today you can make a modern i7 tank below 60fps. Absolutely a corner case in the game, and as far as I know the only game that kills the processor like that. But it's actually a point in relation to the 8350's 2012 non-60fps situation.

As we know, it's the min 1%/0.01% that defines the experience, for fps/action games at least.

I think for people playing with a 60/90Hz monitor and/or keeping their processor over 2 years, 8c is far more futureproof than a fast 4c.
Well, I mean, if it tanks now, won't both tank anyway in newer games, which are heavier, and the i7 will tank less, just like it gets more fps now?
 

AMDisTheBEST

Senior member
Dec 17, 2015
682
90
61
i need an am4 itx board. if i have that, i may be abl
There's nothing novel or particularly insightful in that video. He just points out the obvious (games have become more threaded over time) and fails miserably to make a compelling argument against low res benchmarks. As I said on OCN, there are two possible objectives one might have when choosing a CPU for longevity:

a) Improving your future experience with existing games, or:
b) Improving your future experience with future games

The video focuses entirely on b, and draws grand conclusions from a relatively small pool of games tested by a single site (though admittedly one of the better ones).

It's also readily apparent, if you look at some of his other videos, that he has somewhat of an infatuation with AMD, so he's the last person I'd look to for unbiased analysis.

It's great that AMD is finally back in the game, but I feel that die-hard AMD fans are kind of ruining the moment by turning it into such a desperate pissing match and cooking up absurd conspiracy theories about the evil tech press. Can't we just enjoy that there are now several massively competent and fairly priced parts in the mid-high end?
None of what you said invalidates any of his points. He makes the case that Ryzen is more future-proof, which is 100% substantiated by the empirical benchmarks he included in the vid. "Infatuation with AMD" - I've heard this way too often from others who dismissed the merits of my posts simply because of my profile name. It's nothing more than a subtle ad hominem which refutes nothing.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
I think you are overestimating the impact of L3 in gaming. Doubling L3 only accounts for a 2-5% improvement in gaming.
Also, you have to consider that this cache deficit also applies to other benchmarks, where AMD pulls off the performance despite it. So gaming is unlikely to be limited by it.

Yes, having a truly unified cache would be preferable, and if they had one, IPC could have been at KL level, but this limitation does not explain why gaming is below BWE while workstation performance is on par. Gaming therefore cannot be limited by cache alone.


The difference we are talking about is four fold. After ~4MB of L3 being written on Ryzen by one core it looks like that data is then no longer accessible anywhere near as quickly. This may be a result of its victim-cache policy, unseen cache contention from other cores, a CCX bug, L3 reservations, or even an AGESA/BIOS bug.

On Intel's 6900k, it can access 16MB with near perfect uniformity from any core. Ryzen can only do about 4MB - and that 4MB won't be the same 4MB between context switches about 50% of the time, which makes it even worse as cache lines are then flushed and all new data is fetched from RAM (with abysmal latency to boot).

There's a perfect storm of problems that impacts random-access memory operations negatively (such as fetching the data for AI computations in game, or organizing and issuing draw calls from that data...).

Random access performance is only as expected in that ~4MB region. Cache aware applications will treat Ryzen as having fast, uniform, access to either 8 or 16MB of data depending on if it is NUMA aware and has enough data to fill more than that. Many game engines are cache aware simply because they can manage the data stream more efficiently... and try to keep computations in the caches.

Windows's scheduler is cache aware but does not treat the two L3 segments as anything special - even when it does accurately identify them. Cache locality is lost on non-pinned thread groups.

These are all problems which are non-existent on Intel's L3 cache design.
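As a side note on "even when it does accurately identify them": an application can see the split L3 for itself through the standard Windows cache-topology API. Below is a hedged sketch of mine (plain Win32 calls, nothing Ryzen-specific) which, on a dual-CCX part, should report two separate L3 entries, each shared by only half of the logical processors.

```cpp
// Enumerate cache descriptors via GetLogicalProcessorInformationEx.
// On a dual-CCX Ryzen this should report two L3 entries, each shared
// by only half of the logical processors.
#include <windows.h>
#include <cstdio>
#include <vector>

int main() {
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationCache, nullptr, &len);   // size query
    std::vector<char> buf(len);
    auto* base = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buf.data());
    if (!GetLogicalProcessorInformationEx(RelationCache, base, &len))
        return 1;

    for (DWORD off = 0; off < len;) {
        auto* info = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buf.data() + off);
        if (info->Cache.Level == 3) {
            std::printf("L3 slice: %lu KB, logical CPU mask 0x%llx (group %u)\n",
                        static_cast<unsigned long>(info->Cache.CacheSize / 1024),
                        static_cast<unsigned long long>(info->Cache.GroupMask.Mask),
                        info->Cache.GroupMask.Group);
        }
        off += info->Size;
    }
}
```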
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
And if you read a forum thread from one post to the next, or by chance glanced my next post after what you quoted, I had already answered this. I'm not unaware of how these benchmarks do no justice for the "evolution" of gaming. (edit: I should say "how these benchmarks do no justice for the "evolution" of gaming in how they've often recently been presented; I don't want to sound as if you can't use the same information, the same benchmarks, and still come to a different conclusion)

You can have a benchmark and you can have two review sites, and get totally different interpretations of the result depending on personal agendas or whatever.

It STILL doesn't negate the usefulness of these benchmarks. My targeted audience, even if not stated, was those denouncing low resolution gaming benchmarks with reasoning that "no one games with a 1080 GTX on a 1080P screen - so it's not realistic". In the last several pages I've seen this come up several times. I've actually read almost every single page in this thread. The problem is, though, no one benches at low resolution for realistic gaming performance. You bench at low resolution to specifically address how strong or weak the CPU is in a particular game or several. Ideally, you would do so with games that are reliant on several powerful threads on a CPU and in games that are more sensitive to high ipc/high clocks on a single thread, since many games coming out now are STILL a varied mix of both realities. And you write your material to talk about BOTH of those realities and maybe even how only one of those realities has a strong future.

It really doesn't matter if a game can run 200FPS or 300FPS... no one, at all, would say a game running at 200FPS is running poorly.

It's the conclusions drawn from the data that are problematic - and the idea that this somehow indicates future gaming performance... an idea which has been thoroughly debunked when comparing CPUs with different core counts.

However, if you were testing two quad/hex/octo core CPUs with SMT... suddenly these tests are fully relevant again.

If you were testing just how a single game scales with CPU performance, then it would be very relevant as well. It would set the floor for what people should expect from that game.

But comparing a quad core at 4.5GHz+ to an octo core at 3.7GHz at low resolutions, seeing a 10% difference in framerates that are well above what is needed, then declaring the octo core a bad gaming CPU is... well... dishonest.

In order for my 2600k to bottleneck my RX 480, I had to drop it to its lowest possible resolution (1440x900, IIRC) and the lowest settings across the board (50% resolution scaling as well...). Then my RX 480 started being utilized at only ~90% in a few peaks - therefore being held back by my CPU. Never mind that I was running 200FPS+ on average in multiplayer.

If Ryzen can only manage 180FPS in that scenario... guess what? I don't care - I don't play like that - I can't even make out people on the ground. As soon as I turn the game up to playable settings, I become GPU bottlenecked - and still push 120FPS @ 1080p.

As soon as settings reach that level of performance in a game - CPU performance is irrelevant for that game, move on.
 

inolvidable

Junior Member
Mar 30, 2009
5
2
81
If you take a Handbrake H.265 encode that uses AVX2, a Ryzen 8c is at about 6800 6c performance level. And that's an H.265 corner case in Handbrake.
I mean, this "up to 50%" is optimistic, and even then, so what? For the rest of the stuff Zen is at BWE 6900-like perf and a dirt-cheap pro tool.
Zen has 2 FPU units and it shows in a lot of productivity workloads; it compensates a lot for not having AVX2. Especially in those loads it can use its FPU resources with its superior SMT. So it compensates.
Intel has segmented itself into a corner and removed AVX2 from a huge part of their portfolio.
AVX2 uptake is slow. We are more in SSE than AVX for many applications as far as I can tell.

My guess is AVX2 is a 7nm thing due to transistor cost. So a Zen-plus-plus in 2-3 years.

My take on this AVX2 issue is that it's actually the omission of AVX2 that makes Zen so darn fast. It's a fine priority. AMD historically tended to bring tech to market too early. This time they did it right.

The main problem I see is that we are talking about things that may or may not happen. I share most of your views, but I fear I could be underestimating Intel. IMHO one of the most cost-effective ways to crush the 8C/16T Ryzen (the ones aimed at workstations) would be to reverse course and invest big money so the major players implement 256-bit AVX2 asap in some of the "must have" software that can benefit from it. This way, by the time Skylake-X and the X299 chipset arrived (in a few months), Intel would have a significant advantage over Zen out of nothing - I mean, just by tweaking software, with no need to change the price scheme or anything at the hardware level.
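For readers wondering what "implementing 256-bit AVX2" looks like in practice, here is a minimal hand-vectorized sketch of my own (not from the thread): the same inner loop written scalar and with AVX2/FMA intrinsics, processing eight floats per iteration. The function names are made up; compile with -mavx2 -mfma (or /arch:AVX2).

```cpp
// Scalar vs. AVX2+FMA version of the same y[i] += a * x[i] kernel.
#include <immintrin.h>
#include <cstddef>

void axpy_scalar(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

void axpy_avx2(float a, const float* x, float* y, std::size_t n) {
    const __m256 va = _mm256_set1_ps(a);      // broadcast a into all 8 lanes
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);     // fused multiply-add: va*vx + vy
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                        // scalar tail for leftover elements
        y[i] += a * x[i];
}
```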

As a side note, we saw big headlines last week about Google choosing Skylake CPUs for their servers due to the leap in performance that the AVX-512 instruction set offers for this segment. In Google's words: "Skylake's AVX-512 doubles the floating-point performance for the heaviest calculations" and it is therefore being advanced as eminently suitable for "scientific modeling, genomic research, 3D rendering, data analytics and engineering simulations."
I get that it is not an accurate comparison, because the server segment has little to do with the HEDT one, and because Naples was not there when this deal was made. However, I think it is a good example that the instruction-set strategy has already been used by Intel with success. By the way, AVX-512 seems to be one of the most important selling points for the Cannon Lake Xeons (it seems like the Skylake "mainstream" Xeons would have it at the hardware level, but disabled for some reason).

I am neither a programmer nor an Intel strategist, and what I see as the most effective way to hurt Zen in the HEDT segment might perfectly well be wrong for a myriad of reasons. Unfortunately, recent events and the lack of knowledge on my part advise me to wait until Intel makes some move, and to enjoy in the meantime every new discovery about Ryzen and every new performance gain it gets through updates in microcode, BIOS, Windows or software in general.

All in all, I am truly impressed with AMD and I really hope they can offer us similar levels of performance at half the price of their Intel counterparts for a long time, so they can regain some deserved market share and, in the process, force true innovation in CPUs again at reasonable prices.
 
Last edited:

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Hmm, VERY interesting.

I wonder what would happen to performance in certain applications if AMD forced full coherency* (write-broadcasting) between the two L3 address spaces (effectively halving L3 cache from 16MB to 8MB).

Could they even enable that in bios updates? Or in Ryzen Master?


*if full coherency is forced, then as long as the bandwidth between CCX is sufficient to allow updating of both local and remote L3 within the timeframe for the existing remote L3 request, they should see a latency improvement - right?


**My thinking** If it takes say, 85 ns to send a request to the remote CCX and receive a return, then it should take less than 40ns to write-broadcast to the remote cache. The remote cores then take 10ns to perform a request on their local cache data. Total time (at worst) = 50 ns.



edit: Hmmm. Actually, now I think of it - what I posted here may be gibberish. It has been demonstrated that L3 cache performance beyond around 4MB makes very little difference in games. Which then wouldn't explain the poor performance. I suppose that cache thrashing (like the old Clovertown) could be pulling performance down 10-15%.


I had originally postulated that Ryzen would have two global L3 tag sets attached to the data fabric and one per CCX. The tag sets would be kept synchronized with global data and would act as a virtual L4 - except using the L3 in each CCX as the store. Synchronization would be several times faster than for writing back out to main memory (which would happen asynchronously from the global data synchronization).

This would have sped up pretty much every scenario we are seeing being problematic right now. Inter-CCX traffic, however, would have increased and there would be a first-access penalty - and global data would incur a full-time penalty to ensure that the global tag and the local tag are synchronized... which would cause a performance slow down under heavy lock contention relative to other areas of performance - BUT it would still be better than going back out to RAM.

Another option - which is simpler, but costs more die space, is just to use two small L4 caches attached to the data fabric adjacent to each memory controller. This would just make accesses to memory more uniform and there would be no need for explicit inter-CCX communication or awareness.
 
Reactions: Drazick

Agent-47

Senior member
Jan 17, 2017
290
249
76
The difference we are talking about is four fold. After ~4MB of L3 being written on Ryzen by one core it looks like that data is then no longer accessible anywhere near as quickly. This may be a result of its victim-cache policy, unseen cache contention from other cores, a CCX bug, L3 reservations, or even an AGESA/BIOS bug.

On Intel's 6900k, it can access 16MB with near perfect uniformity from any core. Ryzen can only do about 4MB - and that 4MB won't be the same 4MB between context switches about 50% of the time, which makes it even worse as cache lines are then flushed and all new data is fetched from RAM (with abysmal latency to boot).

There's a perfect storm of problems that impacts random-access memory operations negatively (such as fetching the data for AI computations in game, or organizing and issuing draw calls from that data...).

Random access performance is only as expected in that ~4MB region. Cache aware applications will treat Ryzen as having fast, uniform, access to either 8 or 16MB of data depending on if it is NUMA aware and has enough data to fill more than that. Many game engines are cache aware simply because they can manage the data stream more efficiently... and try to keep computations in the caches.

Windows's scheduler is cache aware but does not treat the two L3 segments as anything special - even when it does accurately identify them. Cache locality is lost on non-pinned thread groups.

These are all problems which are non-existent on Intel's L3 cache design.

Again, yes, this is not the most ideal choice. I bet they did a full cost-benefit analysis before pulling the trigger on the design. If it had access to all 16 MB, their ST score would have touched KL. But the 4MB limit (if it's true at all) is also true for non-game applications, where AMD is able to keep Intel under control. Hence this cannot be the reason why AMD cannot keep Intel at bay in games.

Again, the benchmark I listed shows that going from 2MB to 8MB (four-fold), the % increase is roughly 2-7% depending on the title.

In fact, one game where Ryzen is far behind BWE is Tomb Raider, and from the graph TR is not sensitive to L3 at all.

EDIT:
As for cache-aware applications trying to access the full 16 MB: if this is indeed true, that is what scheduler/software optimization is for. These problems are non-existent with Intel because AMD never had a core design that deserved tailor-made optimization. Just give it some time. Until now, Intel was literally the only CPU.
 
Last edited:

tential

Diamond Member
May 13, 2008
7,355
642
121
What are the chances that Ryzen will have smt working by the end of the year? If I can't emulate dual/tri core Ryzen with Ryzen 7 for multiple gaming units on one cpu, I'm going to ignore Ryzen til Ryzen+(Ryzen 2 or whatever)

I pray they fix it.
This was the most interesting hardware upgrade I was going to do
 

Agent-47

Senior member
Jan 17, 2017
290
249
76
The main problem I see is that we are talking about things that may or may not happen. I share most of your views, but I fear I could be underestimating Intel. IMHO one of the most cost-effective ways to crush the 8C/16T Ryzen (the ones aimed at workstations) would be to reverse course and invest big money so the major players implement 256-bit AVX2 asap in some of the "must have" software that can benefit from it. This way, by the time Skylake-X and the X299 chipset arrived (in a few months), Intel would have a significant advantage over Zen out of nothing - I mean, just by tweaking software, with no need to change the price scheme or anything at the hardware level.

Don't know what Google said, but there are reports that after AMD showed off Ryzen on open-source Blender, Intel updated the code path to add AVX support. Yet AVX failed to give Intel any meaningful improvements. I've never personally worked with AVX, because it's faster to use C++ AMP to offload calculations to the GPU.
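For readers unfamiliar with it, here is a minimal sketch of the C++ AMP pattern being referred to (my own example, assuming Visual C++ with AMP support, with a made-up function name): the lambda marked restrict(amp) runs on the GPU, so no hand-written AVX is involved.

```cpp
// Offload an element-wise kernel to the GPU with C++ AMP instead of
// vectorizing it by hand on the CPU.
#include <amp.h>
#include <vector>

void saxpy_amp(float a, const std::vector<float>& x, std::vector<float>& y) {
    using namespace concurrency;
    array_view<const float, 1> xv(static_cast<int>(x.size()), x);
    array_view<float, 1>       yv(static_cast<int>(y.size()), y);

    parallel_for_each(yv.extent, [=](index<1> i) restrict(amp) {
        yv[i] += a * xv[i];                   // executes on the accelerator
    });
    yv.synchronize();                         // copy results back to the host vector
}
```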
 
Last edited:

looncraz

Senior member
Sep 12, 2011
722
1,651
136
Again, yes, this is not the most ideal choice. I bet they did a full cost-benefit analysis before pulling the trigger on the design. If it had access to all 16 MB, their ST score would have touched KL. But the 4MB limit (if it's true at all) is also true for non-game applications, where AMD is able to keep Intel under control. Hence this cannot be the reason why AMD cannot keep Intel at bay in games.

Again, the benchmark I listed shows that going from 2MB to 8MB (four-fold), the % increase is roughly 2-7% depending on the title.

In fact, one game where Ryzen is far behind BWE is Tomb Raider, and from the graph TR is not sensitive to L3 at all.

We aren't interested in Ryzen matching Intel in absolute performance - it's relative to its own performance in other applications.

Ryzen is more than two times slower in accessing data after ~4MB than it should be. However, as soon as you are streaming data linearly ... that completely goes away and Ryzen even bests Intel everywhere after 16MB (that's Smart Prefetch in action).

Capacity isn't the only issue - it's the uniformity. On top of the same-core uniformity issue for random access, it also has a context switch double-penalty. Usually a high-load thread seeing a context switch will be rescheduled for another core by the kernel - this is simple load balancing. The cost of this is fairly trivial as the L2 and L1 caches on the CPU core have already been flushed, so scheduling it for another core is a non-issue.

The larger the L3 cache the CPU has - or the shorter the interruption - the higher the probability that the thread's data will still be in the L3 and still be fresh, so execution resumes seamlessly. The kernel intentionally plans on this situation and tries to optimize for it. It needs to treat the cache exactly right for this to work best.

Where Ryzen has a problem with this is that the thread has a 50% chance of being pushed to the other CCX. This occurs every 10ms (kernel scheduler preemption interrupt interval used by Windows the last I checked). Games are VERY reliant on this context restoration behavior as they tend to have at least one thread that has exactly zero yields to the scheduler - so it is being interrupted every 10ms like clockwork - if even for only a few dozen nanoseconds - then moved to another core, with its primary context restored... a process which takes time (and is beyond the scope of this comment ).

This violates all the rules of cache locality - a major no-no.

Setting cpugroups in Windows is almost a valid solution, but most games are not NUMA aware and will end up just running on one CCX - so it's better to patch the games to be NUMA aware - or at least to manually set thread affinity on Ryzen for threads in relation to their cache needs. That's an easy 10% bump from just calling SetThreadAffinityMask().
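As a rough illustration of that last point, here is a hedged sketch of the SetThreadAffinityMask() call being described. The 0x00FF mask is an assumption of mine (logical processors 0-7 mapping to the first CCX, two SMT threads per core); on a real system the mapping should be confirmed against the reported topology rather than taken as given.

```cpp
// Pin the calling thread to the first CCX so its working set stays in one
// L3 slice and the scheduler can't bounce it across the CCX boundary.
#include <windows.h>
#include <cstdio>

bool pin_current_thread_to_ccx0() {
    const DWORD_PTR ccx0_mask = 0x00FF;        // assumption: logical CPUs 0-7 = CCX0
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), ccx0_mask);
    if (previous == 0) {
        std::printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        return false;
    }
    return true;                               // 'previous' holds the old mask
}
```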
 