Will AMD support AVX-512 and Intel TSX?


tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
We don't see it because AVX isn't faster in the case of the consoles, where it's practically half rate, and Sandy Bridge supporting it isn't good enough either since there are lots of users who refuse to ditch old systems (as we can see from the No Man's Sky and 3DMark Time Spy backlash when they required SSE4.1?), so that just means the app developers hold off on using the new extensions ... (There's a lot of gravity involved in software support, such as userbase, drivers, and compilers, before new hardware and extensions get adopted)

I believe that only consoles can set an example for common end users of how important these features are, by showing how this reflects on PC games ...

Realistically speaking, if you can find a speedup using SSE in a game engine then you can almost certainly find a speedup using AVX-512. You would usually need tens of thousands of elements to work on before you start seeing a payoff, whereas if you only need to do several hundred or a couple of thousand iterations then AVX-512 is best suited to the task: the vector units run at higher clocks, and the startup cost is lower since you can keep the scalar and vector code entangled with each other, which translates to lower latency in the end, and that's a fair bit of a win ...
I said that AVX has been supported since SB, which means every Intel CPU since then and every AMD CPU since Bulldozer has supported AVX, yet we have seen it achieve only limited traction, even though it has been four years since Haswell, which offered a marked improvement in AVX support over previous generations. Oxide's Nitrous engine supposedly uses AVX, and even then there's just one game on the market which uses that engine.

No Man's Sky wasn't the first instance where SSE4.1/4.2 support was a requirement that was later dropped due to pressure from customers stuck on old CPUs. MGS V:TPP was one of the first cases where this issue was reported, and it was patched the next day. This really suggests to me that it was more of an oversight, considering how easy it was to fix, especially when Ground Zeroes, released almost a year earlier, didn't have this requirement and looked better in certain aspects.

If there are potential performance gains to be had in executing hundreds of small loops in a game engine like you say, why should game developers skip AVX and AVX2 and go directly to AVX-512? I don't see how Zen splitting an AVX2 op into 2x128-bit ops is going to be a problem in these situations, where it is much less about overall throughput compared to HPC workloads.
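
For concreteness, the sort of small per-frame loop being talked about looks something like this with AVX intrinsics (just an illustrative sketch; the function and data names are made up). On Zen the 256-bit ops below would simply be issued as pairs of 128-bit ops, with no change to the source:

Code:
#include <immintrin.h>
#include <cstddef>

// Hypothetical example: scale a few hundred particle masses by a constant,
// the kind of small, hot loop a game engine runs every frame.
void scale_masses(float* masses, std::size_t count, float factor)
{
    const __m256 vfactor = _mm256_set1_ps(factor);
    std::size_t i = 0;
    // Main vector loop: 8 floats per 256-bit iteration.
    for (; i + 8 <= count; i += 8) {
        __m256 v = _mm256_loadu_ps(masses + i);
        _mm256_storeu_ps(masses + i, _mm256_mul_ps(v, vfactor));
    }
    // Scalar tail for the leftover 0-7 elements.
    for (; i < count; ++i)
        masses[i] *= factor;
}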
 
Last edited:

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Subset of PCs with a particular type of hardware? No idea what you're talking about. PhysX isn't just restricted to PCs, it's also found in loads of console games. Witcher 3 used PhysX, for instance, and so does every Unreal Engine 3 and Unreal Engine 4 game, plus many other games. In fact, PhysX is by far the most popular middleware physics engine.
Well cross-platform engines like Unreal or Unity aren't usually the torchbearers in graphical quality or performance when it comes to consoles. Developers who push the limits of what can be extracted from console hardware, like Naughty Dog - do they use PhysX? Even if they use AVX the benefits aren't handed down to PC gamers because those games are usually exclusives.

As for Witcher 3, a lot was promised from its PhysX implementation that didn't make it into the final release. Moreover, AVX clearly wasn't an issue, as it ran on Conroe. Frostbite 3 and AnvilNext seem to run on Phenom II. How much of a performance hit falling back to SSE causes, if they indeed use AVX, is something we can only speculate on.
As for latency issues with wider SIMD, I would imagine that SIMD generally reduces latency, since it uses fewer instructions to perform calculations. Also, remember that there are twice as many AVX-512 registers, from 16 to 32.
That depends on the type of operation being performed. In throughput-oriented benchmarks, like LINPACK and STREAM triad, latency issues arise when you spill out of the caches. That mostly concerns the HPC guys. In the real world, however, when you call real functions they can quickly become I/O bound, and when that happens nothing can be done.
I'm not a programmer, but from what I've read, most of the problems with AVX stem from the lack of certain instructions, or their performance. For instance, when AVX was first introduced with SB, there was no gather instruction. We had to wait till AVX2 with Haswell for the gather instruction, and even then, it was considered too slow to use. But Intel kept refining it, first with Broadwell and then with Skylake.

And now with AVX-512, you have scatter and masked operations, which make auto vectorization easier and more performant. So the point is, that wide SIMD on x86 has been a work in progress, and AVX-512 is a major milestone; not just in terms of performance, but practicality as well.
There are alignment issues when fitting narrower ops into registers designed to handle larger widths, and there are lots of threads @realworldtech which discuss these things in great detail. It's not so simple - yes, it is much less painful than before, but there are still a lot of things to consider.
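
To make the masked-operations point above concrete, here's a rough sketch with AVX-512F intrinsics (hypothetical function, not taken from any engine): the mask lets the final partial iteration reuse the vector path instead of needing a separate scalar tail.

Code:
#include <immintrin.h>
#include <cstddef>

// Hypothetical example: y[i] += a * x[i] for an arbitrary element count.
void axpy_avx512(float a, const float* x, float* y, std::size_t count)
{
    const __m512 va = _mm512_set1_ps(a);
    std::size_t i = 0;
    // Full 16-wide iterations.
    for (; i + 16 <= count; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        _mm512_storeu_ps(y + i, _mm512_fmadd_ps(va, vx, vy));
    }
    // Remainder: one masked iteration covers the last 1-15 elements.
    if (i < count) {
        __mmask16 m = (__mmask16)((1u << (count - i)) - 1u);
        __m512 vx = _mm512_maskz_loadu_ps(m, x + i);
        __m512 vy = _mm512_maskz_loadu_ps(m, y + i);
        _mm512_mask_storeu_ps(y + i, m, _mm512_fmadd_ps(va, vx, vy));
    }
}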
 
Reactions: Drazick

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
I don't want to say you can't have Zen without L3, but it would be pretty close to the truth. A big reason AMD went without an L3 on their APUs was that any additional cache subtracted from the number of CUs they could add. With Zen they obviously didn't want to be stuck in that situation again, so they made L3 more important than ever and put the L3 cache at the center of the CCX, instead of doing what they and Intel have done before, which is adding the cache afterwards once they figure out how much they want.

Zen really isn't about the cores themselves but the CCX module. That is Zen. Just like the Bulldozer architecture wasn't about a single BD core but the full CMT module.

Now again, these are custom chips, so it's all up for grabs. But it's going to take a lot of money to get AMD to basically carve up a CCX.
AMD explicitly said Zen is going to cover both small and big cores. How will they do that in consoles, then? And what mm² budget will there be?
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Well cross-platform engines like Unreal or Unity aren't usually the torchbearers in graphical quality or performance when it comes to consoles. Developers who push the limits of what can be extracted from console hardware, like Naughty Dog - do they use PhysX? Even if they use AVX the benefits aren't handed down to PC gamers because those games are usually exclusives.

Well this is kind of a goal post shift, but I'll answer your question anyway. Naughty Dog uses their own 3D and physics engines, but whether they're any better than the Unreal Engine or any other major 3D engine is another matter. Unreal Engine is typically bleeding edge, and constantly evolving. Same thing with PhysX.

PhysX was the first physics engine to bring cloth simulation to gaming, first on a PPU, then the GPU and finally the CPU. PhysX Flex is also bringing advanced physics simulation for effects like smoke, fluid and hair that will run on the GPU using asynchronous compute. But these will eventually be ported over to the software version, and that's where wide vectors can become useful, since the simulation of these effects is much more compute intensive.

As for Witcher 3, a lot was promised from its PhysX implementation that didn't make it into the final release. Moreover, AVX clearly wasn't an issue, as it ran on Conroe. Frostbite 3 and AnvilNext seem to run on Phenom II. How much of a performance hit falling back to SSE causes, if they indeed use AVX, is something we can only speculate on.

Witcher 3 was downgraded in a lot of different ways unfortunately. Luckily, it still turned out to be one of the best games in memory. As for PhysX, CDPR used PhysX 3.2x which runs on the CPU. And the game features a LOT of very nice cloth simulation.

Regarding the game running on Conroe and Yorkfield, I've seen the videos. This shows that PhysX itself has multiple paths depending on what CPU is used; a CPU without AVX will use SSEn. Looking at the videos though, the older CPUs get bogged down very easily, especially in Novigrad.


Here's one in Beauclair from Blood and Wine. Blood and Wine uses more cloth simulation than the base game:


That depends on the type of operation being performed. In throughput-oriented benchmarks, like LINPACK and STREAM triad, latency issues arise when you spill out of the caches. That mostly concerns the HPC guys. In the real world, however, when you call real functions they can quickly become I/O bound, and when that happens nothing can be done.

There are alignment issues when fitting narrower ops into registers designed to handle larger widths, and there are lots of threads @realworldtech which discuss these things in great detail. It's not so simple - yes, it is much less painful than before, but there are still a lot of things to consider.

Not being a programmer myself, I'll leave solving these problems to them. Like I mentioned previously, taking advantage of wider vectors will likely require rethinking the entire approach to software development.
 
Last edited:
Reactions: Sweepr

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
How far in the future are you talking? Because I feel like we may be talking at cross purposes somewhat. The jump from what's in today's consoles to what you are proposing is enormous. For reference, here's a comparison of die sizes of CPUs:



Even comparing 28nm to 22nm, a Haswell core without AVX-512 is almost five times the size. We will need a lot of die shrinks to happen before we can fit 8 Haswell-plus-AVX-512 CPUs onto a console APU without significantly increasing die size.

I would agree that in the long, long term, it could well become a small enough addition to a core to make its way in. It's cool tech, and it has its usage. But by the time it's feasible in a console in, say, 2027, will we even still be on x86? Maybe we'll be talking about ARM SVE instead.

I'm talking roughly a little less than half a decade out, when Samsung will introduce 4nm LPP featuring its new MCBFET (multi-channel bridge MOSFET) transistor technology by late 2021, and I expect that date to coincide with the release of the next generation of consoles. With SS 4nm LPP we'll be able to achieve nearly 10x the transistor density, which should translate to cores nearly 10x smaller.

I'd be surprised if AVX-512 took nearly as much die space as people proclaim, because I don't expect it to take more than 2mm^2 per core on a 22nm process to implement the functionality when all you're adding is 8 or so ALUs, a 2KB register file (which pales in comparison to the hundreds of kilobytes that a CU or an SM carries) and wider load and store ports for practicality. The rest of the implementation is pretty much optional, such as increasing cache sizes.
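
(For what it's worth, that 2KB figure matches the architectural AVX-512 state alone, by my own back-of-the-envelope math: 32 zmm registers x 512 bits = 32 x 64 bytes = 2,048 bytes per thread, before counting any extra physical registers the core adds for renaming.)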

In the end my proposed console cores would take up less than 1.7mm^2, so console hardware designers should go all out on CPU performance when the clear trend is that the CPU takes up less and less die space, to the point where it will be vanishingly small, and by then EUV will be ready. I could see some valid concerns about thermal budget, but with the new MCBFETs we'll get lower leakage and a lower threshold voltage, which will also keep the thermals in balance ... (even better, we can enhance performance by introducing III-V semiconductor materials to the channels)

To answer your last line: yes, we'll probably still be on x86, because BC will be important for a couple of decades to come unless there's an industry-wide drive to really port over the vast majority of established code bases (and I doubt the console manufacturers would want to give up BC so easily) ...

The above is precisely why I want console hardware designers to set an example for all high-performance application programmers, so that we can establish a new software ecosystem with much better hardware infrastructure. (Considering the next generation will most likely be the last, I wouldn't want to hold back in terms of features, and if AVX-512 will let us hit the magical 16.6ms frame time target then I'm all for it, even more so for VR applications, which need much tighter frame times.)
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
I should point out that:

A.) There may not even be another generation of consoles, period, or at least not for some time. If Scorpio flops, MS may either give up completely or go to Plan C, which is "OEMs selling PCs that can't run Steam". Hell, that's probably what Scorpio is in the first place. Especially with the Switch selling pretty well, Sony could sit on the PS4 for a while if MS were to give up. Sony has even started to push the regular PS4 more because the Pro flopped (although that was more due to the PSVR flopping, because VR in general is dead).

B.) There is no guarantee that any future console processors would be made by AMD, or even be x86. A Sony exec came out not too long ago and said something to the effect that backwards compatibility is talked about but not actually used that much. And then there was that report that only 1.5% of time played on XBone was in BC games.

So any talk of AVX-512 on consoles is really premature.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
I said that AVX has been supported since SB, which means every Intel CPU since then and every AMD CPU since Bulldozer has supported AVX, yet we have seen it achieve only limited traction, even though it has been four years since Haswell, which offered a marked improvement in AVX support over previous generations. Oxide's Nitrous engine supposedly uses AVX, and even then there's just one game on the market which uses that engine.

No Man's Sky wasn't the first instance where SSE4.1/4.2 support was a requirement that was later dropped due to pressure from customers stuck on old CPUs. MGS V:TPP was one of the first cases where this issue was reported, and it was patched the next day. This really suggests to me that it was more of an oversight, considering how easy it was to fix, especially when Ground Zeroes, released almost a year earlier, didn't have this requirement and looked better in certain aspects.

If there are potential performance gains to be had in executing hundreds of small loops in a game engine like you say, why should game developers skip AVX and AVX2 and go directly to AVX-512? I don't see how Zen splitting an AVX2 op into 2x128-bit ops is going to be a problem in these situations, where it is much less about overall throughput compared to HPC workloads.

Sandy Bridge supporting it just isn't good enough. Maybe if Intel weren't so adamant about fusing off the functionality in lower-tier parts (Skylake Pentiums/Celerons without AVX/AVX2?!), if support had begun with Nehalem (understandably not possible, since the spec wasn't finalized), and if consoles had full-rate AVX, then maybe we would have seen more console-centric engines like Frostbite 3, AnvilNEXT 2, Dunia 2, Glacier 2, EGO, Creation, id Tech 5/6 and so on require AVX-capable CPUs ...

There's nothing good about lowering system requirements when it just sacrifices the technical quality of the game. I'm just glad many console-centric engines don't run on mobile hardware, since they have higher technical standards. (Too bad Unreal Engine 4 and Unity have to be downgraded for the likes of Android and iOS, since there's a lot of hardware there that doesn't support compute shaders.)

I didn't say that game developers should skip AVX/AVX2, if that's what you thought I implied. I said console hardware designers should implement AVX-512 (it's backwards compatible with AVX and SSE, since AVX-512 is really just an extension of those extensions), because the only way to show the fruits of new hardware and get people to upgrade is to reflect those benefits in games and other high-performance applications ...
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Sandy Bridge supporting it just isn't good enough. Maybe if Intel weren't so adamant about fusing off the functionality in lower-tier parts (Skylake Pentiums/Celerons without AVX/AVX2?!), if support had begun with Nehalem (understandably not possible, since the spec wasn't finalized), and if consoles had full-rate AVX, then maybe we would have seen more console-centric engines like Frostbite 3, AnvilNEXT 2, Dunia 2, Glacier 2, EGO, Creation, id Tech 5/6 and so on require AVX-capable CPUs ...

There's nothing good about lowering system requirements when it just sacrifices the technical quality of the game. I'm just glad many console-centric engines don't run on mobile hardware, since they have higher technical standards. (Too bad Unreal Engine 4 and Unity have to be downgraded for the likes of Android and iOS, since there's a lot of hardware there that doesn't support compute shaders.)

I didn't say that game developers should skip AVX/AVX2, if that's what you thought I implied. I said console hardware designers should implement AVX-512 (it's backwards compatible with AVX and SSE, since AVX-512 is really just an extension of those extensions), because the only way to show the fruits of new hardware and get people to upgrade is to reflect those benefits in games and other high-performance applications ...
Well, one issue that I've heard people complain about a lot regarding SSE is how fragmented it has become over the years, with developers not always using all the features that the latest versions bring, and Intel is largely to blame for this. If AVX is supposedly an attempt to begin with a clean slate, then it is welcome. But who's to say that the same mistakes won't be repeated? That's why I feel only cautiously optimistic about it.

Coming to the PC, Steam HW stats show that 99.98% of PCs have a CPU with SSE3 support. For SSE4.1 and 4.2 the figures are 90.03% and 87.59%, respectively. With AVX, however, there is a sharp drop to 76.21%. That's one reason why adoption of AVX by game developers has been so slow. The threshold for supporting older hardware is lower than that for supporting older software, which is reflected in these numbers. Now with AVX-512 supported only on upcoming HEDT processors, there will be a significant time lag before it trickles down to PCs running sub-$100 CPUs, even if console designers implement it during that time, because the majority of releases are multiplatform. So it is not only about AMD supporting it; it can't take off if Intel does things like restrict AVX2 and above to its i3 lineup while its most value-oriented chip, the G4560, goes unsupported.
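
This is also why the engines that do use AVX tend to detect it at run time instead of requiring it, so the same binary still covers the roughly one in four Steam machines without AVX. A rough sketch of that kind of dispatch (GCC/Clang-specific attributes and builtins; the kernel is a made-up placeholder):

Code:
#include <cstddef>

// Three builds of the same tiny kernel; the target attribute lets GCC/Clang
// compile each one for a different ISA within a single source file.
__attribute__((target("sse2")))
static void scale_sse2(float* p, std::size_t n, float f)
{ for (std::size_t i = 0; i < n; ++i) p[i] *= f; }

__attribute__((target("avx2")))
static void scale_avx2(float* p, std::size_t n, float f)
{ for (std::size_t i = 0; i < n; ++i) p[i] *= f; }

__attribute__((target("avx512f")))
static void scale_avx512(float* p, std::size_t n, float f)
{ for (std::size_t i = 0; i < n; ++i) p[i] *= f; }

// Runtime dispatch: pick the widest path the CPU actually supports.
void scale(float* p, std::size_t n, float f)
{
    if (__builtin_cpu_supports("avx512f"))
        scale_avx512(p, n, f);
    else if (__builtin_cpu_supports("avx2"))
        scale_avx2(p, n, f);
    else
        scale_sse2(p, n, f);
}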
 
Last edited:
Reactions: Drazick

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Well this is kind of a goal post shift, but I'll answer your question anyway. Naughty Dog uses their own 3D and physics engines, but whether they're any better than the Unreal Engine or any other major 3D engine is another matter. Unreal Engine is typically bleeding edge, and constantly evolving. Same thing with PhysX.

PhysX was the first physics engine to bring cloth simulation to gaming, first on a PPU, then the GPU and finally the CPU. PhysX Flex is also bringing advanced physics simulation for effects like smoke, fluid and hair that will run on the GPU using asynchronous compute. But these will eventually be ported over to the software version, and that's where wide vectors can become useful, since the simulation of these effects is much more compute intensive.
Uncharted 4 does physics simulation far better than anything achieved by UE, like Gears of War 4.
Then you have to consider the historical problems with performance of UE games on consoles, and to add to that the compromises made due to its multiplatform nature given that most UE titles that get released for consoles also get a PC release.
Witcher 3 was downgraded in a lot of different ways unfortunately. Luckily, it still turned out to be one of the best games in memory. As for PhysX, CDPR used PhysX 3.2x which runs on the CPU. And the game features a LOT of very nice cloth simulation.

Regarding the game running on Conroe and Yorkfield, I've seen the videos. This shows that PhysX itself has multiple paths depending on what CPU is used; a CPU without AVX will use SSEn. Looking at the videos though, the older CPUs get bogged down very easily, especially in Novigrad.
As for Witcher 3, it is impossible to say how much of the performance hit is due to the slowness of Conroe itself and how much to the fact that it has to fall back to SSE for the cloth simulation.

I'll admit that AVX can be useful, but it will take time before AVX512 finds its usefulness with respect to consoles, as the new console generation is still 3-4 years away.
 
Reactions: Drazick

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Uncharted 4 does physics simulation far better than anything achieved by UE, like Gears of War 4.

How do you know it's being "simulated?" If you watch that video, the host explains that many of those effects are merely scripted animations which playback a certain way depending on the circumstance. While some computation is used for these animations, they are nowhere near as computationally intensive as actual full on simulation.

And this makes sense, because as impressive as Uncharted 4 is, it still runs on a PS4 with a low end CPU. Over the years, developers have gotten very clever when it comes to mimicking actual physics with scripted animations. And this is doubly so for the Uncharted games, which have some amazing set pieces. I remember seeing the plane crash scene for Uncharted 3 years ago and being blown away......before I figured out that it was mostly scripted.

Then you have to consider the historical problems with performance of UE games on consoles, and to add to that the compromises made due to its multiplatform nature given that most UE titles that get released for consoles also get a PC release.

Well last gen, UE was the king among the consoles. The vast majority of games were made on UE3. As for current gen, it seems that most developers have opted to make their own internal engines rather than use someone else's. But I do believe that UE still ranks first amongst the list of cross platform engines.

I'll admit that AVX can be useful, but it will take time before AVX512 finds its usefulness with respect to consoles, as the new console generation is still 3-4 years away.

That's a good find. Yep, like I said, wide vectors can be very useful. I can easily imagine AVX2 or AVX-512 being used for advanced fluid, smoke and cloth simulations, plus some insane rigid body calculations. In fact, PhysX 3.4 has MUCH faster rigid body performance compared to previous versions, so I'm curious as to what kind of under the hood optimizations they made:

 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
How do you know it's being "simulated?" If you watch that video, the host explains that many of those effects are merely scripted animations which playback a certain way depending on the circumstance. While some computation is used for these animations, they are nowhere near as computationally intensive as actual full on simulation.

And this makes sense, because as impressive as Uncharted 4 is, it still runs on a PS4 with a low end CPU. Over the years, developers have gotten very clever when it comes to mimicking actual physics with scripted animations. And this is doubly so for the Uncharted games, which have some amazing set pieces. I remember seeing the plane crash scene for Uncharted 3 years ago and being blown away......before I figured out that it was mostly scripted.
If the end result is better, then I don't find any merit in not using such techniques on the PC as well, where much more powerful CPUs are available, even though it might be possible to do a full simulation.

It's the same thing as brute-force 4K vs checkerboard - there is no reason why a mid-range GPU by today's standards can't do 4K@60 if there is the option of not having to render at full resolution.
 
Reactions: Drazick

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Well, one issue that I've heard people complain about a lot regarding SSE is how fragmented it has become over the years, with developers not always using all the features that the latest versions bring, and Intel is largely to blame for this. If AVX is supposedly an attempt to begin with a clean slate, then it is welcome. But who's to say that the same mistakes won't be repeated? That's why I feel only cautiously optimistic about it.

Coming to the PC, Steam HW stats show that 99.98% of PCs have a CPU with SSE3 support. For SSE4.1 and 4.2 the figures are 90.03% and 87.59%, respectively. With AVX, however, there is a sharp drop to 76.21%. That's one reason why adoption of AVX by game developers has been so slow. The threshold for supporting older hardware is lower than that for supporting older software, which is reflected in these numbers. Now with AVX-512 supported only on upcoming HEDT processors, there will be a significant time lag before it trickles down to PCs running sub-$100 CPUs, even if console designers implement it during that time, because the majority of releases are multiplatform. So it is not only about AMD supporting it; it can't take off if Intel does things like restrict AVX2 and above to its i3 lineup while its most value-oriented chip, the G4560, goes unsupported.

Frankly, I think SSE stopped fragmenting once development on it ended long ago ...

Nobody is making a new spec to extend SSE. Well, TBH it's only Intel who standardizes x86 ISA extensions, so AMD can't afford to take part in anything too risky, like flooding the x86 opcode space or doing extensive validation for mammoth amounts of new functionality the way Intel had to for TSX, and Intel is not at all interested in developing new extensions for SSE ...

Consoles may support SSE4a, which amounts to 2 or so instructions, but if console hardware designers were smart they'd have disabled it, either by fusing it off, deactivating it via microcode, or just avoiding generating the two instructions at the compiler level ...

Plus, every new x86 processor sold supports the most common SSE extensions (as sanctioned by Intel). I think SSE is now pretty much cemented as the standard, seeing as Intel will include it even in their lower-tier CPUs, so I wouldn't be surprised if a new Windows release went 64-bit and SSE4.2 only ...

I realize it takes years, maybe even decades, to shift an entire software ecosystem, as evidenced by the 15 years it took for AAA games to make 64-bit CPUs the norm, but I think there's an even bigger benefit to consoles adopting AVX-512: it will drive new x86 processor sales through the roof, which will greatly help AMD's bottom line with their new clean slate. If the latest next-gen AAA games require AVX-512, then the value-oriented next-generation Ryzen 3s will be flying off the shelves compared to Intel's Pentiums/Celerons, if Intel is not willing to keep feature parity with its competitors.

So in the end, if Intel can't be trusted to deliver the best, then maybe we can depend on upgrading to AMD CPU systems instead, with their newer and more reliable microarchitectures. We don't need Intel anymore for things to take off; we need someone who can establish a new ecosystem, much like how Microsoft dictates a baseline for DirectX. I believe AMD too can dictate a minimum standard through their ISV partners, and in turn maybe get Intel to change its ways about segmenting feature sets ...
 

knutinh

Member
Jan 13, 2006
61
3
66
I believe that SIMD computation amounts to a minute fraction of the CPU area, and a larger (but still small) fraction of the CPU power budget.

I have seen energy breakdowns for fetching 64 bits from memory, doing a double-precision multiply-accumulate, and storing the result back to memory. It turns out that data movement is really expensive; multiplications and additions, not so much.

Of course, if you increase the multiplication capability by 2x, you would want to increase the data movement accordingly, which is (I would guess) one reason why Intel reduces clocks when encountering AVX code.

I think that some contributors to this thread have an odd view of the general usability of (wide) SIMD. If you are writing a text editor, SIMD might not matter much. But number crunching is critical to a number of consumer and prosumer tasks. Sometimes it makes sense to do that number crunching in fixed-function hardware or in limited-function GPUs. But strong encryption, complex filtering in your audio application, high-quality image and video processing, and the currently hyped convolutional neural networks could all benefit from having SIMD available on a fairly generic and "rich" CPU.
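
To put a number on the data-movement point (a rough sketch, not a measurement): a STREAM-triad-style loop moves 24 bytes per element for a single multiply-add, so once the arrays fall out of cache the limit is memory bandwidth, not how wide the SIMD units are.

Code:
#include <cstddef>

// STREAM-triad-style kernel: a[i] = b[i] + s * c[i].
// Per element it reads 16 bytes, writes 8 and does one multiply-add,
// so with large arrays the loop is bandwidth bound regardless of
// whether the compiler emits SSE, AVX or AVX-512 for it.
void triad(double* a, const double* b, const double* c,
           double s, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];   // compilers auto-vectorize this readily
}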

-k
 

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
No Man's Sky wasn't the first instance where SSE4.1/4.2 support was a requirement that was later dropped due to pressure from customers stuck on old CPUs. MGS V:TPP was one of the first cases where this issue was reported, and it was patched the next day. This really suggests to me that it was more of an oversight, considering how easy it was to fix, especially when Ground Zeroes, released almost a year earlier, didn't have this requirement and looked better in certain aspects.

I wouldn't be surprised if, at least for TPP, they had a non-vector fallback which was compiled out for the launch but put back in with the patch.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,814
4,108
136
I don't want to say you can't have Zen without L3, but it would be pretty close to the truth. A big reason AMD went without an L3 on their APUs was that any additional cache subtracted from the number of CUs they could add. With Zen they obviously didn't want to be stuck in that situation again, so they made L3 more important than ever and put the L3 cache at the center of the CCX, instead of doing what they and Intel have done before, which is adding the cache afterwards once they figure out how much they want.

Zen really isn't about the cores themselves but the CCX module. That is Zen. Just like the Bulldozer architecture wasn't about a single BD core but the full CMT module.

Now again, these are custom chips, so it's all up for grabs. But it's going to take a lot of money to get AMD to basically carve up a CCX.

No L3 could surely be done, but it would be a horrible idea. The L3 is generally used as the last level cache (LLC). Cores can benefit greatly from having this, as it makes it easier to share/find data with other cores. The problem with the Construction cores was that the L2 and L3 caches had very high latencies. The L3 was particularly bad; it wasn't all that much quicker than main memory. According to Anand (in one of those CPU reviews), AMD eventually discovered the cause of the high L3 latency. They decided not to do anything about it though, as servers are more forgiving with latency.

The L2 I've read was supposed to be smaller and much quicker. For whatever reason that didn't happen, so that's why they made it so large. This is all speculation of course, based on things I've read over the years.

With Zen, they really fixed those cache problems, in a HUGE way. The L1 went back to being a write-back cache, and the L1/L2 changed from exclusive to inclusive (a first for AMD, I believe). That made a significant difference. With the L3, they apparently got 5x the bandwidth. How? I have no idea. The latencies are also better.



TL;DR: AMD could get away with losing the L3 cache with the Construction cores because it was high latency and did not do much for desktop tasks. With Zen, they really made it a priority to fix the cache system, and since it is much better now, they would almost certainly need to keep it.
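
For anyone curious how those latency numbers are usually obtained, here's a rough pointer-chase sketch (my own illustration, not AMD's or Anand's methodology): every load depends on the previous one, so the time per hop approximates the load-to-use latency of whichever cache level the working set fits in.

Code:
#include <chrono>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

// Build one big random cycle and chase it; dependent loads stop the
// out-of-order core from hiding the latency.
double ns_per_load(std::size_t bytes)
{
    const std::size_t n = bytes / sizeof(std::size_t);
    std::vector<std::size_t> next(n);
    for (std::size_t i = 0; i < n; ++i) next[i] = i;

    // Sattolo's algorithm: a random permutation that is a single cycle.
    std::mt19937_64 rng(42);
    for (std::size_t i = n - 1; i > 0; --i)
        std::swap(next[i], next[rng() % i]);

    const std::size_t hops = 10000000;
    std::size_t p = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t h = 0; h < hops; ++h) p = next[p];
    auto t1 = std::chrono::steady_clock::now();

    volatile std::size_t sink = p;   // keep the chase from being optimized out
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / hops;
}

int main()
{
    // Working sets sized to land roughly in L2, in L3 and in DRAM.
    for (std::size_t kib : {256, 4 * 1024, 64 * 1024})
        std::printf("%6zu KiB: %.1f ns per load\n", kib, ns_per_load(kib * 1024));
}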
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
The newest mid-generation console updates, like the PS4 Pro and Project Scorpio, only give a frequency boost of 31% and 37% respectively, and they're built on the much more recent 16/14nm process node.

That's a pretty biased argument. Project Scorpio also increases GPU clocks by a similar amount (37%) and more than triples the number of SPs, which leads to roughly 5 times the peak throughput. So yeah, the new process is what matters mostly. The rest is just icing on top.

MS seems to be convinced that a tweaked Jaguar is enough to feed this GPU. The tweaking they did is exactly what other posters suggested: instead of going for AVX-512, tweak the CPU so everything performs faster. And for consoles, which are pretty dedicated devices where cost matters, it's usually most efficient to add some fixed-function hardware for the important stuff rather than some "niche" ISA extension that carries significant developer overhead. In the case of the Xbox One this is the custom command processor, which was further enhanced in Project Scorpio (see below). DX12 draw calls are done by the GPU command processor and not the CPU. That's why something as tiny as Jaguar can actually drive a GPU more powerful than an RX 480: fixed-function hardware for the most critical piece of gaming CPU performance, draw calls.

However, potentially the most exciting aspect surrounding the CPU revamp doesn't actually relate to the processor blocks at all, but rather to the GPU command processor - the piece of hardware that receives instructions from the CPU, piping them through to the graphics core.

"We essentially moved Direct3D 12," says Goossen. "We built that into the command processor of the GPU and what that means is that, for all the high frequency API invocations that the games do, they'll all natively implemented in the logic of the command processor - and what this means is that our communication from the game to the GPU is super-efficient."

Processing draw calls - effectively telling the graphics hardware what to draw - is one of the most important tasks the CPU carries out. It can suck up a lot of processor resources, a pipeline that traditionally takes thousands - perhaps hundreds of thousands - of CPU instructions. With Scorpio's hardware offload, any draw call can be executed with just 11 instructions, and just nine for a state change.

Given this, why would you want to add stuff like AVX-512 to the CPU in a console? Makes no sense. Just make a bigger GPU and then the developer can choose whether they want better graphics or better physics. A bigger GPU is always a benefit; something like AVX-512 is not.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
No L3 could surely be done, but it would be a horrible idea. The L3 is generally used as the last level cache (LLC). Cores can benefit greatly from having this, as it makes it easier to share/find data with other cores. The problem with the Construction cores was that the L2 and L3 caches had very high latencies. The L3 was particularly bad; it wasn't all that much quicker than main memory. According to Anand (in one of those CPU reviews), AMD eventually discovered the cause of the high L3 latency. They decided not to do anything about it though, as servers are more forgiving with latency.

The L2 I've read was supposed to be smaller and much quicker. For whatever reason that didn't happen, so that's why they made it so large. This is all speculation of course, based on things I've read over the years.

With Zen, they really fixed those cache problems, in a HUGE way. The L1 went back to being a write-back cache, and the L1/L2 changed from exclusive to inclusive (a first for AMD, I believe). That made a significant difference. With the L3, they apparently got 5x the bandwidth. How? I have no idea. The latencies are also better.



TL;DR: AMD could get away with losing the L3 cache with the Construction cores because it was high latency and did not do much for desktop tasks. With Zen, they really made it a priority to fix the cache system, and since it is much better now, they would almost certainly need to keep it.
This has less to do with how the cache is used and much more to do with module design. Core design is important and a big step up in Zen, and in the future AMD could design a "small" Zen for extremely small, low-power setups that may or may not sport L3. But the part of Zen that really makes Zen Zen is the CCX design.

The CCX is more than just two groups of CPUs in a Ryzen. It's the building block for all of their desktop and laptop CPUs, a design that they can drag and drop, one or more at a time, into any die design with little effort.

Why does this matter? Because unlike just about every CPU we have seen before, the L3 cache is included at the center of this module. My point about the APUs was that in the past, with whatever core design, any L3 would have to be added after the fact: you place the cores (or in BD's case the modules), you place whatever other SoC functionality, and then you decide how much L3 to add and place it in a memory bank connected to, but separate from, the cores.

On the APUs, any L3, because it was added after the fact, would cut into the number of GPU units you could add. What you stated also mattered: L3 didn't have as much of an impact, on top of the fact that the L3 had to include everything that was in the L2, creating a minimum size you had to exceed before it was useful.

On Zen the L3 cache isn't added on later. It is a preset 2MB per core and it's part of the module. Not only part of the module, but an integral part of it; it's literally at the center of the CCX design. Which is my point: you don't create a reusable, scalable module system, use it on two designs, and then gut it and have to completely redesign it again. Therefore my whole premise is that not only is the L3 cache important to Zen, but because of how a CCX module is designed, only in extreme cases would we see AMD remove it.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
That's a pretty biased argument. Project Scorpio also increases GPU clocks by a similar amount (37%) and more than triples the number of SPs, which leads to roughly 5 times the peak throughput. So yeah, the new process is what matters mostly. The rest is just icing on top.

Yet despite a 2x density improvement, all they could do was increase CPU clocks by 37% ...

Process matters most but it's not all about the TFlops when you have to use it in different ways ...

MS seems to be convinced that a tweaked Jaguar is enough to feed this GPU. The tweaking they did is exactly what other posters suggested: instead of going for AVX-512, tweak the CPU so everything performs faster. And for consoles, which are pretty dedicated devices where cost matters, it's usually most efficient to add some fixed-function hardware for the important stuff rather than some "niche" ISA extension that carries significant developer overhead. In the case of the Xbox One this is the custom command processor, which was further enhanced in Project Scorpio (see below). DX12 draw calls are done by the GPU command processor and not the CPU. That's why something as tiny as Jaguar can actually drive a GPU more powerful than an RX 480: fixed-function hardware for the most critical piece of gaming CPU performance, draw calls.

It really doesn't matter much what Microsoft thinks when game developers will make a targeted effort to adjust to the console's platform bottlenecks. Here's a question for you ...

Is Scorpio's CPU enough to drive 60FPS for the vast majority of AAA games?

Given this, why would you want to add stuff like AVX-512 to the CPU in a console? Makes no sense. Just make a bigger GPU and then the developer can choose whether they want better graphics or better physics. A bigger GPU is always a benefit; something like AVX-512 is not.

It's really obvious: to increase CPU performance! A bigger GPU isn't all you want, since there's far more to a game than just graphics or physics; it's also about the game logic, so don't just downplay AVX-512 if it can deliver more 60FPS games on consoles! (We could even get 120FPS or even 144FPS games, or some could just hit a 100FPS lock with adaptive vsync!)

Frankly, you just can't use a GPU for everything when there are tasks a CPU is better at, even in games, like driving latency down!
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
Yet despite a 2x density improvement, all they could do was increase CPU clocks by 37% ...

Process matters most but it's not all about the TFlops when you have to use it in different ways ...



It really doesn't matter much what Microsoft thinks when game developers will make a targeted effort to adjust to the console's platform bottlenecks. Here's a question for you ...

Is Scorpio's CPU enough to drive 60FPS for the vast majority of AAA games?



It's really obvious: to increase CPU performance! A bigger GPU isn't all you want, since there's far more to a game than just graphics or physics; it's also about the game logic, so don't just downplay AVX-512 if it can deliver more 60FPS games on consoles! (We could even get 120FPS or even 144FPS games, or some could just hit a 100FPS lock with adaptive vsync!)

Frankly, you just can't use a GPU for everything when there are tasks a CPU is better at, even in games, like driving latency down!

If you want to improve the performance of the CPU in the Xbox One X, there's a whole laundry list of ways you could improve it: larger OoO windows, more execution units, higher clock speeds, better branch prediction, better cache designs... basically, replace it with Zen. AVX-512 would personally be a long way down that list.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,595
136
It's mostly a matter of what is on the shelf and what is made for the shelf.
Making a 6c CCX and staying on 128-bit AVX to save some mm² makes a lot of sense for a platform reused in servers, mobile and consoles. More sense than a 4c CCX and wider vectors.
It seems to me from the die pics that the CCX module is quite small. We ought to ask Hans for an estimate here.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,867
3,418
136
It's mostly a matter of what is on the shelf and what is made for the shelf.
Making a 6c CCX and staying on 128-bit AVX to save some mm² makes a lot of sense for a platform reused in servers, mobile and consoles. More sense than a 4c CCX and wider vectors.
It seems to me from the die pics that the CCX module is quite small. We ought to ask Hans for an estimate here.
We don't need one; AMD already gave the data out themselves. I can't remember exactly, I think 45mm².

http://techreport.com/news/31402/amd-touts-zen-die-size-advantage-at-isscc
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
If you want to improve the performance of the CPU in the Xbox One X, there's a whole laundry list of ways you could improve it: larger OoO windows, more execution units, higher clock speeds, better branch prediction, better cache designs... basically, replace it with Zen. AVX-512 would personally be a long way down that list.

Zen is big (like 16.5mm^2 per core on SS/GF 14nm) compared to Jaguar (3.1mm^2 per core on 28nm) ...

A 10x more dense core for a 3x increase in IPC doesn't sound like the best trade off to me ...

Higher clock speeds are also not ideal from a perf/watt perspective either. You only need a high enough clock and IPC to speed up the long dependency chains in code; beyond that, the gains in single-threaded performance on code with little DLP start to taper off while the perf/die-area and perf/watt costs soar ...

I would not place AVX-512 on low priority given how common it is to exploit some moderate DLP in game logic, but you do have a point that there are other ways to increase CPU performance. It's just that those other ways are pretty extreme in terms of thermal and silicon area cost, whereas AVX-512 is fairly tame in die logic size and power consumption. But it's not as if I'm closed to the idea of improving the entire core itself; I think it's also a good idea to implement most of those improvements ...
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
Zen is big (like 16.5mm^2 per core on SS/GF 14nm) compared to Jaguar (3.1mm^2 per core on 28nm) ...
Correct your math after this...
44 mm² => 4 cores + L2 + L3
L3 = 16 mm²
L2 = 1.5 mm² per core.
Zen -- ((44 - 16)/4) - 1.5 => 5.5 mm² per core without L2.
Jaguar => 3.1 mm² per core without L2.

~3.6x more dense for a 3x increase in perf?
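
(Presumably the ~3.6x comes from something like: 5.5 / 3.1 ≈ 1.8x the area, and if GF/Samsung 14nm is taken as roughly twice as dense as 28nm, that works out to about 3.5-3.6x the transistor budget per core.)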
 