> I gather it means a reduction in margin due to moving down the product stack.

I'm leaning towards him talking about performance.
> I gather it means a reduction in margin due to moving down the product stack.

Agreed. Dual-sourcing Zen 3 on some inferior node? I don't think it would be worth it.
> I think it's just about Milan cores being that much better than Intel's best.

I think Rome is already better core-for-core than Intel's best, or at least equal. Not to mention blowing them away in cores per socket.
> Two Anandtech posts that might be relevant. It seems that 5nm is ramping very quickly, plus it's using a totally different fab than 7nm production, so no shared equipment.

Well, if TSMC cannot give AMD more wafers, then AMD has to be able to compete with Intel's leading-edge offerings either by using less silicon per package (better cores, cache structure, and thermal/power management to allow more frequency for a given package power rating), or by coming up with wafers from another source or node. TSMC has extra capacity in the 8 and 10nm nodes, but neither shares design rules with 7nm. Samsung is moving into N7 territory with its leading-edge node, but, again, with far different design rules. We've heard absolutely nothing in the rumor mill about AMD sourcing from Samsung.
> I think it's just about Milan cores being that much better than Intel's best.

I've wondered if AMD could find a way to fit a lower-performance Zen 3 CCD on GloFo's improved 12LP node. More specifically, could AMD put an eight-core CCX with 16 MB of L3 in a CCD on GloFo's 12LPP, and use a maximum of four of them on a pin-compatible EPYC package? 12LPP is supposed to be denser with better power characteristics than 12LP, and CCDs with half the L3 should be small enough to fit in the place of two N7 CCDs. This could be sold to the lower end of the market for I/O-heavy SKUs or just lower-performance packages.
That's obviously not likely to happen, but I think it's technically possible.
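As a rough feasibility check on that 12LPP-CCD idea, here is a back-of-the-envelope sketch. All inputs are assumptions, not published figures: a Zen 2 CCD of roughly 74 mm² on N7, about half of that die being L3/SRAM, and an assumed ~3x logic-density gap between N7 and a 12nm-class node.

```python
# Rough feasibility check for porting a half-L3 CCD to 12LPP.
# Every constant here is an assumption for illustration, not an official spec.
ZEN2_CCD_MM2   = 74.0   # assumed approx. Zen 2 CCD area on N7
CACHE_FRACTION = 0.5    # assumed share of the CCD occupied by L3/SRAM
DENSITY_RATIO  = 3.0    # assumed N7 : 12LPP logic-density ratio

# Halve the L3 (32 MB -> 16 MB), then scale the whole die to 12LPP.
n7_area_half_l3 = ZEN2_CCD_MM2 * (1 - CACHE_FRACTION / 2)
lpp_area        = n7_area_half_l3 * DENSITY_RATIO
budget          = 2 * ZEN2_CCD_MM2   # footprint of two N7 CCDs

print(f"~{lpp_area:.0f} mm^2 on 12LPP vs a ~{budget:.0f} mm^2 two-CCD footprint")
```

With these assumptions the 12LPP die slightly overshoots a two-CCD footprint, so the idea would hinge on how much denser 12LPP really is versus 12LP.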
@maddie: True, but if there is more 5nm capacity available, don't you think AMD would want to move too?
If TSMC can move enough of their mobile customers to 5nm, that would leave more N7+ wafers for AMD. AMD's move to 5nm may be rather sluggish.
I think whenever Intel still (rarely) wins vs Rome, it's because of cache hierarchy or AVX512. Milan should eliminate that first (sometimes) advantage.
> Current rumor is ~15% improvement on Int and ~10% on FP and a ~5% improvement on frequency.

They have stated a 50% FP performance boost, but Intel leads by quite a bit in some AVX benchmarks.
The 32 MB monolithic cache on each CCX should take care of almost all cases where Intel was leading due to cache size accessible from a single core. Top Intel part is 28 cores with 38.5 MB of cache.
> Except... it's not really true.

Haven't read most of the replies on this thread, but a '22 triple rate for yield makes sense, as 2022 will probably be when Zen 4 releases on AM5 with whatever the rumored spec sheet ends up being.
> AM5, in my opinion, is right there with SP4. It either doesn't exist or doesn't exist in the way people think it does.

Yeah, who would have thought a new process being compared to an older, still-active node would be only a fraction of the latter's size. Mind = blown. With AM5 being a new platform offering new tech, don't be surprised if Zen 4 launches 16-18 months after Zen 3, falling in line with your own suggestion of 2Q2022.
> They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies.

SP3 -> SP5 (Server X3D: 8x4 Hi => 256 cores)
AM4 -> AM6 (Desktop X3D: 2x4 Hi => 64 cores)
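Those speculative core counts are just stack arithmetic; a quick sketch, taking the poster's assumed configuration of 8 cores per CCD (none of these are announced products):

```python
# Speculative X3D core-count arithmetic from the roadmap guess above.
# CCD counts, stack heights, and cores per CCD are the poster's assumptions.
def stacked_cores(ccds: int, stack_height: int, cores_per_ccd: int = 8) -> int:
    """Total cores if each CCD site carries a vertical stack of logic dies."""
    return ccds * stack_height * cores_per_ccd

server  = stacked_cores(ccds=8, stack_height=4)  # "8x4 Hi"
desktop = stacked_cores(ccds=2, stack_height=4)  # "2x4 Hi"
print(server, desktop)  # 256 64
```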
> Well, we are still at least 1.5 years away from the stacked sockets.

They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies. Unless the stack is using some kind of interior thermal via like ICECool, the heat build-up would just be way too much to handle. At server frequencies, maybe. It seems more likely that they would start with 2-high stacks at 5nm to begin with, then 4-high at N3 or, more likely, N2.
What if the stacking doesn't happen on the CCDs? What if the stacking is on the large IO die? You wouldn't want to stack SRAM cache on the CCDs, as they need to dissipate heat at prodigious rates. However, while the IO die gets warm, it's not a huge energy sink. What if the IO die moves to the enhanced 12LPP process with a reduced Z-height and instead stacks a large L4 cache on top? If the CCDs maintain 32 MB of L3, and they stick with eight CCDs max, then the L4 would have to be 256 MB exclusive to be of any real use, or 1 GB inclusive. Can that much L4 SRAM even fit on a 12LPP chip that size?
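To put a rough number on that last question: a back-of-the-envelope area estimate, assuming a 14/12nm-class high-density SRAM bit cell of about 0.08 µm² (an assumed figure) and counting only the raw array, so no tags, sense amps, decoders, or redundancy, which in practice add substantial area on top:

```python
# Back-of-the-envelope: raw array area for an L4 SRAM die on a 12nm-class node.
# BITCELL_UM2 is an assumed high-density bit-cell size; real arrays add
# significant overhead for tags, sense amps, decoders, and redundancy.
BITCELL_UM2 = 0.08  # assumed 14/12nm-class HD SRAM bit cell, in um^2

def sram_array_mm2(megabytes: float) -> float:
    bits = megabytes * 2**20 * 8
    return bits * BITCELL_UM2 / 1e6  # 1 mm^2 = 1e6 um^2

for size_mb in (256, 1024):
    print(f"{size_mb} MB -> ~{sram_array_mm2(size_mb):.0f} mm^2 of raw array")
```

Even before overhead, a 1 GB inclusive L4 lands near reticle-limit territory, while 256 MB exclusive looks at least conceivable on a large IO-die-sized chip.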
> I would actually expect most of the improvement to be cache hierarchy re-design.

A 15% increase for the integer core points to significant changes. My guess is a slightly widened core (+1 ALU) among other refinements to match. Hopefully that 15% holds for more than just the multithreaded Int increase.
> Current rumor is ~15% improvement on Int and ~10% on FP and a ~5% improvement on frequency.

Basically an improved/enhanced Zen 2, rather than a next-generation Zen 3. So Milan should be launching as Zen2+ on N7e (N7 -> N7P -> N7e; note N7e != N7+/N6); if not, then there will be problems.
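If those rumored numbers hold, the IPC and frequency gains compound multiplicatively; a quick check of the implied throughput uplift (all percentages are the rumor quoted above, not confirmed figures):

```python
# Compound uplift implied by the rumored gains (rumor, not confirmed figures).
ipc_int, ipc_fp, freq = 0.15, 0.10, 0.05

int_uplift = (1 + ipc_int) * (1 + freq) - 1  # IPC and clock multiply
fp_uplift  = (1 + ipc_fp)  * (1 + freq) - 1
print(f"Int: ~{int_uplift:.1%}, FP: ~{fp_uplift:.1%}")
```

So roughly a 20% Int and 15% FP per-core throughput bump, if the rumor is accurate.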
> You could have cases where a huge number of threads are trying to share data. Technically, you could have up to 56 threads with SMT on sharing ~40 MB of cache with Intel. With AMD, it will be up to 16 threads with 32 MB. That is a big improvement, since current Zen 2 is 8 threads and 16 MB. I am still wondering if AMD is going to pull off some kind of larger-cache variant. There is still that one slide where they have "32+ MB L3" for Milan. It does look like there would be room for longer chips on the EPYC package.

Icelake-SP should increase that to 42 MB. Still, it's difficult to envision many scenarios where 1x42 MB will win out over 8x32 MB of L3.
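The trade-off in those two posts is easy to see as per-thread arithmetic. A small sketch, using the core counts and cache sizes quoted in this thread (the Icelake-SP and Milan figures are rumors, not official specs):

```python
# Cache capacity reachable per thread, using figures quoted in the thread.
# The Icelake-SP and Milan entries are rumored numbers, not official specs.
configs = {
    "Intel 28C (current top part)": {"threads": 56, "shared_mb": 38.5},
    "Intel Icelake-SP (rumored)":   {"threads": 56, "shared_mb": 42.0},
    "AMD Zen 2 CCX":                {"threads": 8,  "shared_mb": 16.0},
    "AMD Milan CCX (rumored)":      {"threads": 16, "shared_mb": 32.0},
}

for name, c in configs.items():
    per_thread = c["shared_mb"] / c["threads"]
    print(f"{name}: {c['shared_mb']} MB shared by {c['threads']} threads "
          f"-> {per_thread:.2f} MB/thread")
```

Intel's advantage is the large pool reachable by one thread; AMD's is more cache per thread once all cores are loaded, which is the 8x32 MB vs 1x42 MB argument above.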
> They would have to achieve an insane leap in power efficiency to stack 4 logic dies on top of each other at desktop frequencies. Unless the stack is using some kind of interior thermal via like ICECool, the heat build-up would just be way too much to handle. At server frequencies, maybe. It seems more likely that they would start with 2-high stacks at 5nm to begin with, then 4-high at N3 or, more likely, N2.

It doesn't seem like they would want to stack multiple logic dies at all. Stacked memory dies are much more likely, since they do not consume as much power. I have been wondering about stacking a cache chip with a CPU die, but SRAM takes a bit of power; it may be a non-starter unless they have come up with a way to cool it. There was an AMD patent a while back about using an integrated TEC to cool stacks of memory and logic, but that doesn't seem like it would be very effective. They seem to always have the logic die at the bottom. It seems like it would make more sense to put the logic die on top and the memory on the bottom. I wonder what prevents the memory die from being under the logic die.
> That'd be an ingenious approach to increasing the core count further, even with N7+ (or whatever node AMD now uses for Zen 3) not significantly increasing density over Zen 2/N7. And as always, smaller dies also mean higher yield. Though so far there was no single indication (or was there?) that Zen 3 is actually going to increase the core count.

I have wondered if it would make sense to have something like a 2-CCX chiplet with 16 cores stacked on top of a cache die that is just the L3 cache and fabric interface. It seems like you would want the CPU die on top for cooling. With a stacked die, you could have a much larger cache and also much shorter connections, since the cache would be right under (or on top of) the CPU cores rather than halfway across the chip. I wouldn't think TSVs would incur much of any latency penalty. You could probably do 64 MB per CCX in a very compact die stack. The current Zen 2 CCD is actually around 50% cache, so you would have double the cores and cache in roughly the same footprint. It may also make better use of available process tech, since the best process for SRAM cache may not be optimal for the CPU cores. Also, such a stacked die wouldn't need to be placed on an interposer. You could make the bottom SRAM die different depending on whether it is meant for an interposer or not.