Speculation: Ryzen 4000 series/Zen 3

A/// · Jan 1, 2020

Veradun said:
I want to casually point out that 5nm, PCIe5, DDR5, AM5 would fit nicely on a 5/5/2021 launch date. Note that 2021->2+0+2+1=5

I suppose that works. I only brought it up because AMD launched Zen2 on 7/7 this year. I haven't bought AMD in like two decades, so I don't know if these numbers games are something they do all the time.

IntelUser2000 · Jan 1, 2020

amd6502 said:
As far as this table goes, it's a bit unfair to the 4 and 5 GHz contenders because scaling is a very crude approximation. You should run the 9900K and 3950X at 2.5GHz. Otherwise it's as if you handicapped these competitors with very high latency memory (about 2x the latency of whatever RAM the A12 was using).

SpecCPU2006 has a scaling factor of ~85% for most CPUs. Just by that it would bring advantage against Zen 2 to 70%.

But, SpecCPU is not the lightest workload. The 9900K and 3950X are both at the edge of what's possible in terms of frequency. Because the boost can be easily dislodged, it further disadvantages them.

I would say realistically it's 60-65%. That's still an astonishing difference.

There's also a matter of x86 vendors standing still. What if it was going against Golden Cove derivatives instead of Skylake? Even if you consider a lower gain of 15% for Sunny Cove, and 15% for Golden Cove, you would reduce the difference to 25%. Add 8% for Willow Cove? It becomes 15%.
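The arithmetic in that paragraph can be checked with a short sketch. It assumes the gains compound multiplicatively and that the starting lead is the 60-65% estimated above; the function name `remaining_lead` is made up for illustration.

```python
def remaining_lead(lead, gains):
    """Lead left over after the trailing side improves by each gain in turn.

    All values are fractions (0.65 means 65%). Gains are assumed to
    compound multiplicatively, which is a simplification.
    """
    ratio = 1.0 + lead
    for gain in gains:
        ratio /= 1.0 + gain
    return ratio - 1.0


for lead in (0.60, 0.65):
    # Two 15% steps (Sunny Cove, Golden Cove), then 8% more (Willow Cove).
    after_two = remaining_lead(lead, [0.15, 0.15])
    after_three = remaining_lead(lead, [0.15, 0.15, 0.08])
    print(f"{lead:.0%} lead -> {after_two:.1%} -> {after_three:.1%}")
```

Starting from a 65% lead, this lands at roughly 25% after the two 15% steps and roughly 15% after the extra 8%, which matches the figures quoted. Starting from 60% gives somewhat smaller numbers.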

Let's say we had 3.5GHz Golden Cove instead of 5GHz Skylake. Would look much better against ARM competition even though the overall result may not be a huge advance on the desktop side.

amd6502 · Jan 1, 2020

IntelUser2000 said:
SpecCPU2006 has a scaling factor of ~85% for most CPUs. Just by that it would bring advantage against Zen 2 to 70%.

But, SpecCPU is not the lightest workload. The 9900K and 3950X are both at the edge of what's possible in terms of frequency. Because the boost can be easily dislodged, it further disadvantages them.

I would say realistically it's 60-65%. That's still an astonishing difference.

Can anyone point to how to run a SpecCPU on linux?

It should be trivial to limit the frequency and test at the correct frequency where there is minimal scaling approximation error.

Just do a

for c in $(seq 0 23); do cpufreq-set -c "$c" -u 2.5GHz; done

and this should get you to a very close frequency. Then look up where it would be running with a cpufreq-info query. Then run your benchmark.
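As a rough alternative to eyeballing the cpufreq-info output, the cap can also be read back from sysfs. This is only a sketch: it assumes a Linux system whose cpufreq driver exposes `scaling_max_freq` (reported in kHz) under `/sys/devices/system/cpu`, and the helper name `max_freq_ghz` is made up.

```python
from pathlib import Path


def max_freq_ghz(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu name: scaling_max_freq in GHz} for each CPU exposing cpufreq."""
    caps = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_max_freq"
    for freq_file in sorted(Path(sysfs_root).glob(pattern)):
        khz = int(freq_file.read_text().strip())  # sysfs reports kHz
        caps[freq_file.parent.parent.name] = khz / 1_000_000
    return caps


if __name__ == "__main__":
    for cpu, ghz in max_freq_ghz().items():
        print(f"{cpu}: capped at {ghz:.2f} GHz")
```

Note that this only shows the configured ceiling. The clock actually reached during the benchmark still has to be checked separately.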

If this bench is easy to get then pretty much anybody with a 9900k-gen core or 3600/Matisse system could get us more accurate IPC comparison numbers.

For benchmarks that fit easily within the cache you're right, the scaling should be very high (close to 100%). How long the peak boost is sustained is a good point, and this will also depend a lot on case cooling and the CPU cooler, so it's a very bad idea to run at boost frequency if you want to measure IPC.

I haven't looked into it at all but I'm very skeptical of Apple's acorn core achieving a 60% IPC lead over newest gen x86.

IntelUser2000 · Jan 1, 2020

amd6502 said:
If this bench is easy to get then pretty much anybody with a 9900k-gen core or 3600/Matisse system could get us more accurate IPC comparison numbers.

For benchmarks that fit easily within the cache you're right, the scaling should be very high (close to 100%). How long the peak boost is sustained is a good point, and this will also depend a lot on case cooling and the CPU cooler, so it's a very bad idea to run at boost frequency if you want to measure IPC.

Even with good cooling, I found some errors in calculations when allowing the CPU to boost to Turbo. The clock frequency has to be fixed.

I assume it should be easy to set low frequencies? Just go to BIOS and set the multiplier to 27x for 2.7GHz. Right?

I haven't looked into it at all but I'm very skeptical of Apple's acorn core achieving a 60% IPC lead over newest gen x86.

I can believe this. AMD is just catching up to Skylake, and Intel has stood still for years. ~15% each for Sunny Cove and Golden Cove, plus 7-8% for Willow Cove, gets us 40-45%.

That's still behind A13, but it has the advantage of having a top to bottom vertical stack plus running at much lower frequencies.

For benchmarks that fit easily within the cache you're right the scaling should be very high (close to 100%).

Even 85% is a high figure for some workloads. Some may end up even lower at 60%, and this is true for many floating point workloads because they gobble up bandwidth. I don't like playing with SpecFP numbers for the same reason. It's harder to isolate uarch differences. SpecInt is much better.

Dhrystone scales 100% with clocks, and Geekbench probably at 99% or even higher.
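To show how much the assumed scaling factor moves a clock-normalized estimate, here is a minimal sketch. It uses one simple reading of "scaling factor" (a linear model where an X% clock change gives s times X% performance change), and the score of 50 at 5.0 GHz is a made-up number, not a measurement.

```python
def project_score(score, measured_ghz, target_ghz, scaling):
    """Project a benchmark score to another clock under a linear scaling model.

    scaling = 1.0 means the score tracks the clock perfectly;
    scaling = 0.85 means a 10% clock change moves the score by 8.5%.
    This linear model is an assumption, not how SPEC defines anything.
    """
    clock_change = target_ghz / measured_ghz - 1.0
    return score * (1.0 + scaling * clock_change)


# Hypothetical: a score of 50 measured at 5.0 GHz, projected down to 2.5 GHz.
for s in (1.0, 0.85, 0.60):
    print(f"scaling {s:.2f}: {project_score(50.0, 5.0, 2.5, s):.2f}")
```

With these made-up inputs the projected score ranges from 25 (perfect scaling) to 35 (60% scaling), so the choice of factor alone shifts the per-clock comparison considerably. Running at a fixed clock avoids the guess entirely.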

DisEnchantment · Jan 1, 2020

DisEnchantment said:
Another week, another couple of AMD patent applications related to stacked memory.

This one is a very novel idea. Quite interesting to read.
20190333876
METHOD AND APPARATUS FOR POWER DELIVERY TO A DIE STACK VIA A HEAT SPREADER
Various chip stack power delivery circuits are disclosed. In one aspect, an apparatus is provided that includes a stack of semiconductor chips that has an uppermost semiconductor chip and a lowermost semiconductor chip. A heat spreader is positioned on the uppermost semiconductor chip. A power transfer circuit is configured to transfer electric power from the heat spreader to the uppermost semiconductor chip.
View attachment 12775

To me this patent could be related to the integrated thermo-electric cooler patent (20180358080/20190122704). They could deliver power to the integrated thermo-electric device to extract the heat. They talk about the many ways of delivering power to different layers in the stack device from the heat spreader. Each layer could be the SRAM or could be a heat transfer layer which also serves as a protection layer.

20190333876
CONFIGURATION OF MULTI-DIE MODULES WITH THROUGH-SILICON VIAS
A data processing system includes a processing unit that forms a base die and has a group of through-silicon vias (TSVs), and is connected to a memory system. The memory system includes a die stack that includes a first die and a second die. The first die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads. The group of micro-bump landing pads are connected to the group of TSVs of the processing unit using a corresponding group of micro-bumps. The first die has a group of memory die TSVs. The subsequent die has a first surface that includes a group of micro-bump landing pads and a group of TSV landing pads connected to the group of TSVs of the first die. The first die communicates with the processing unit using first cycle timing, and with the subsequent die using second cycle timing.

View attachment 12778

This is also very interesting, with a quite detailed description of how the layout is going to look, data transfer mechanisms, clocking, synchronization, etc. Looks like development of this is quite advanced.

Zen 4 stuff, I would say.
There are lots of novel patents around PIM as well but probably GPU and FPGA related.

Quoting myself for reference, this new patent is a lot more interesting and seems more practical than the thermoelectric stacked die patent that they made some years ago.
The theme of stacked dies recurs so often in all these patent applications that you can bet it is happening real soon (Zen 4 at the latest, I would bet), similar to how I kept quoting patents for Zen 2 and they did happen for real.

This time, there are other chips stacked on top of the IOD.
It seems highly probable that the IOD will be based on the same process node.
The stacked chips are manufactured differently.
Everything will be wrapped together by a molding material as a single chip.

20190393124 ARRANGEMENT AND THERMAL MANAGEMENT OF 3D STACKED DIES

What is described here is that

The cache and the IO will be located on the center die which is a low heat producing block. This seems to reconcile with their patents of a big unified L3.
The processor cores/compute blocks are on the periphery, with a dummy die stacked on top of them to take the heat out to the IHS. These are thermal hotspots which need a good thermal path to the IHS.
Fully 3D integrated. This allows a lot more room for what can be integrated on a single chip for a specific socket.
The desire to route power via the IHS comes from the fact that they want to stack the high heat producing blocks like cores on the top of the stack close to the IHS, with the low heat producing blocks like the memory below, on top of the substrate. This also applies to routing power to other die stacks.
Multiple dies will be stacked on top of the center IOD/Cache Die. These could be memory or other SFUs.
Stacked dies are connected via TSV, bumps, conductive pillars and others (see quoted patents).

This patent is a specialization of the one they filed in 2017. The patent seems to indicate that they have a more mature idea much closer to production based off patents they filed two years ago.

Their 3D stacked chiplet architecture coming full circle.

I think AMD has a good understanding of the thermal issues associated with highly dense processes cropping up from the move to 7nm for HPC applications, and this is reflected in a lot of their recent patents.

There are lots of patents around load/store improvements and Fabric efficiency which are very interesting as well but perhaps for another post.

DrMrLordX · Jan 2, 2020

@DisEnchantment

Good follow-up. One wonders what this means for delidding, as a practice?

Andrei. · Jan 2, 2020

soresu said:
The most advanced CPU core? Arguable to say the least with the likes of A64FX knocking around now with 512 bit SVE - there's more to being advanced than great scalar IPC.

Remember that for all that vaunted scalar oomph, Axx cores are currently crippled in the SIMD arena when comparing to x86 Zen and Core competitors.

Apple also lacks the software on any iOS platform that compares to x86 systems using MacOS, Windows or Linux - all that awesome IPC is as useful as a paperweight for most serious work outside the new Photoshop iOS release.

Scalar IPC is the hardest design characteristic to achieve, the A64FX doesn't look that great other than it being an SIMD and bandwidth monster.

In regards to Apple's freq scaling, people forget Apple still has to cater to energy efficiency. Simply by using fatter transistors, going above 3GHz to around 3.5GHz shouldn't be very hard if actually designed for it. We'll see Macs with their CPUs soon enough so I hope that'll finally shut people up.

Olikan · Jan 2, 2020

DisEnchantment said:
There are lots of patents around load/store improvements and Fabric efficiency which are very interesting as well but perhaps for another post.

Also, many patents for cpu's front end latency, decode, uop and L1 cache latency

uzzi38 · Jan 2, 2020

Renoir is Ryzen 4000 APUs, so I'm plopping these here.

https://twitter.com/x/status/1212656329833115653

https://twitter.com/x/status/1212675865366212608

teejee · Jan 2, 2020

uzzi38 said:
Renoir is Ryzen 4000 APUs, so I'm plopping these here.

https://twitter.com/x/status/1212656329833115653

https://twitter.com/x/status/1212675865366212608

8 cores then, at least according to the info above in the leaked benchmark.

Veradun · Jan 2, 2020

DisEnchantment said:
Quoting myself for reference, this new patent is a lot more interesting and seems more practical than the thermoelectric stacked die patent that they made some years ago.
The theme of stacked dies recurs so often in all these patent applications that you can bet it is happening real soon (Zen 4 at the latest, I would bet), similar to how I kept quoting patents for Zen 2 and they did happen for real.

This time, there are other chips stacked on top of the IOD.
It seems highly probable that the IOD will be based on the same process node.
The stacked chips are manufactured differently.
Everything will be wrapped together by a molding material as a single chip.

20190393124 ARRANGEMENT AND THERMAL MANAGEMENT OF 3D STACKED DIES

View attachment 15171

What is described here is that

The cache and the IO will be located on the center die which is a low heat producing block. This seems to reconcile with their patents of a big unified L3.

The processor cores/compute blocks are on the periphery, with a dummy die stacked on top of them to take the heat out to the IHS. These are thermal hotspots which need a good thermal path to the IHS.

Fully 3D integrated. This allows a lot more room for what can be integrated on a single chip for a specific socket.

The desire to route power via the IHS comes from the fact that they want to stack the high heat producing blocks like cores on the top of the stack close to the IHS, with the low heat producing blocks like the memory below, on top of the substrate. This also applies to routing power to other die stacks.

Multiple dies will be stacked on top of the center IOD/Cache Die. These could be memory or other SFUs.

Stacked dies are connected via TSV, bumps, conductive pillars and others (see quoted patents).

This patent is a specialization of the one they filed in 2017. The patent seems to indicate that they have a more mature idea much closer to production based off patents they filed two years ago.

Their 3D stacked chiplet architecture coming full circle.

I think AMD has a good understanding of the thermal issues associated with highly dense processes cropping up from the move to 7nm for HPC applications, and this is reflected in a lot of their recent patents.

There are lots of patents around load/store improvements and Fabric efficiency which are very interesting as well but perhaps for another post.

In one of those slides from the leaked event that sparked a long discussion on SMT2 here, I remember the cache was still inside the compute chiplet for Milan. Do you think this patent would apply to an "L4 cache" if used for Milan?

naukkis · Jan 2, 2020

IntelUser2000 said:
But, SpecCPU is not the lightest workload. The 9900K and 3950X are both at the edge of what's possible in terms of frequency. Because the boost can be easily dislodged, it further disadvantages them.

I would say realistically it's 60-65%. That's still an astonishing difference.

You are comparing a phone SoC to a desktop CPU. It's the phone SoC that is boosted to its max and is throttling to keep temperatures under control. Give the A13 a proper heatsink and more power headroom and it will sustain clocks better - there's absolutely no contest about which one is more disadvantaged in the comparison. So Apple's core is much, much better than what benchmarks run on a phone would suggest.

amd6502 · Jan 2, 2020

naukkis said:
You are comparing a phone SoC to a desktop CPU. It's the phone SoC that is boosted to its max and is throttling to keep temperatures under control. Give the A13 a proper heatsink and more power headroom and it will sustain clocks better - there's absolutely no contest about which one is more disadvantaged in the comparison. So Apple's core is much, much better than what benchmarks run on a phone would suggest.

not if you put the telephone in the freezer. (not sure if this will make the battery explode though, so don't try without a firetruck nearby).

soresu · Jan 2, 2020

naukkis said:
there's absolutely no contest about which one is more disadvantaged in the comparison.

Yes there is, it's not even arguable the difference in software - the uphill battle of Windows on ARM shows that whatever other commenters on here have said to the contrary, there is clearly much more than a mere re-compile needed to port a lot of the software on x86 platforms.

I'm not saying this because I'm any big proponent of x86, quite the opposite as I wish there were far more AAA game ports on Android (#cough#KOTOR2#cough#), but it simply isn't happening - what little effort was previously made seems to have slowed to a crawl now, even with Windows on ARM to bolster the potential market (and Switch on the same ISA for that matter).

Are there even any significant AAA games playing natively on WARM? If so the news certainly doesn't seem to be circulating as well as might be expected from a properly invested platform vendor.

Coupled with the ridiculous amount of time they are taking with x64 binary translation support, it does seem as if MS never really had much faith in the platform at all.

Carfax83 · Jan 2, 2020

soresu said:
The most advanced CPU core? Arguable to say the least with the likes of A64FX knocking around now with 512 bit SVE - there's more to being advanced than great scalar IPC.

I won't lie. When I first read that sentence, I thought you were referring to the Athlon 64 FX, which makes no sense whatsoever!

Carfax83 · Jan 2, 2020

Andrei. said:
Scalar IPC is the hardest design characteristic to achieve, the A64FX doesn't look that great other than it being an SIMD and bandwidth monster.

Perhaps scalar IPC is harder to achieve, but that doesn't negate the increasing importance of SIMD. So many workloads these days can be accelerated by SIMD, and Intel (with AMD following) are both hellbent on wider vectors. It's easy for non industry professionals like myself to not realize the importance, until I see the ridiculous performance gains you get when an application is optimized for it.

We'll see Macs with their CPUs soon enough so I hope that'll finally shut people up.

I hope to see this one day myself. I'd like to see how Macs with a beefed up A series CPU deal with heavier workloads.

soresu · Jan 2, 2020

Carfax83 said:
I won't lie. When I first read that sentence, I thought you were referring to the Athlon 64 FX, which makes no sense whatsoever!

Yes it is a bit of a funny name likely to cause such confusion, but it's not meant for wider consumption so I guess it never went through a more exhaustive PR effort.

soresu · Jan 2, 2020

Carfax83 said:
until I see the ridiculous performance gains you get when an application is optimized for it.

Absolutely, it's the difference between usable and unusable for the moment with AV1 decoding at higher resolutions, even with dav1d's superior SW engineering.

DrMrLordX · Jan 2, 2020

amd6502 said:
not if you put the telephone in the freezer.

Hey that's not a bad idea . . . gets crazy ideas.

amd6502 · Jan 2, 2020

DrMrLordX said:
Hey that's not a bad idea . . . gets crazy ideas.

I would not do that with a fully charged warm battery. I would guess (I am really not sure) that it might be safe to put a telephone in the freezer for benchmarking if there is say a 25%-65% charge.

Are there any Apple apps to measure boosting behaviour and frequencies? Something equivalent to cpufreq-aperf or more elaborate?

soresu · Jan 2, 2020

amd6502 said:
I would not do that with a fully charged warm battery. I would guess (really not sure) that it might be safe to put a telephone in the freezer for benchmarking if there is say a 25%-65% charge.

Are there any Apple apps to measure boosting behaviour and frequencies? Something equivalent to cpufreq-aperf or more elaborate?

My guess would be a condensation fueled short would be the main danger, though pure condensed water is not nearly as conductive.

amd6502 · Jan 2, 2020

soresu said:
My guess would be a condensation fueled short would be the main danger, though pure condensed water is not nearly as conductive.

I lost my phone in the snow for an evening, and it was single digits Fahrenheit at the time. No damage. But the charge was near empty. I somewhat worry that a fully 100% charged battery could get shocked when quickly going from 40-45C (which it might easily reach coming fresh off a fast charger) to -20C. The energy that a battery can hold at room temperature is significantly more than what it can hold at -20C. So what happens to that energy that it now all of a sudden cannot hold any more?

arandomguy · Jan 2, 2020

AnandTech way back did do some smartphone reviews with the phone in a freezer to compare -

CPU: https://www.anandtech.com/show/6914/samsung-galaxy-s-4-review/4

GPU: https://www.anandtech.com/show/6914/samsung-galaxy-s-4-review/5

Nexus 4: https://www.anandtech.com/show/6440/google-nexus-4-review/3

DrMrLordX · Jan 2, 2020

Sadly they only did it for one bench. That or it only made a difference on one bench. It's a bit off-topic so I'll stop now.

To segue: Sub-ambient cooling might be tempting for Matisse since it gets to be such a hassle to cool it with just a little bit of a clock bump. Hopefully 7nm+ will be easier to deal with than 7nm, but it's doubtful. Maybe someone could come up with a nitrogen-sealed, chilled case for PCs? Most non-HDD components can handle temps down to -60C without malfunction. Nitrogen-sealed means: no condensation. The only trick would be external peripherals. Also, there are some things you don't want to add to your heat load when running an active-cooled environment, like the PSU. Also NVMe drives might not be happy at those temps either.

soresu · Jan 3, 2020

amd6502 said:
I lost my phone in the snow for an evening, and it was single digits Fahrenheit at the time. No damage. But the charge was near empty. I somewhat worry that a fully 100% charged battery could get shocked when quickly going from 40-45C (which it might easily reach coming fresh off a fast charger) to -20C. The energy that a battery can hold at room temperature is significantly more than what it can hold at -20C. So what happens to that energy that it now all of a sudden cannot hold any more?

From what I understand it has to do with the chemistry and structure of the electrodes and electrolyte - especially the electrolyte as solid electrolyte cell batteries can function well at somewhat lower temps than liquid electrolyte cells can.

The energy stored in a rechargeable electrochemical battery cell is not like bottled lightning so much as ions induced to move from one electrode to the other during charging - even without use it will discharge anyway over time as those ions move back to their natural states, seemingly a drop in temperature accelerates it.

It seems very counter intuitive as cold should slow down chemical processes - but I probably have the wrong end of the electrochemical stick, so don't take me at my word.

Edit:

Found this link which takes a stab at an explanation without exact specifics.

Sounds like the battery does not discharge so much as simply stops working well enough to detect a charge coming from it after the electrolyte temp dips to a critical level which impedes ion movement - like a power wire that suddenly develops a massive resistance between the two electrodes in each cell.

The device (phone) makes an educated guess at the current level of charge in its battery based on the output voltage I think.

As it drops during discharge, the device interprets the battery voltage output as an equivalent percentage of remaining charge until it gets to an unacceptable voltage for the device minimum spec to run at - which the device interprets as 0% on the battery charge, rather than the absolute discharge of the battery (repeated absolute discharge is not a good idea for long term battery health anyway).

When the liquid electrolyte freezes, the battery can no longer discharge properly, so the output voltage drops dramatically - leading the device to think the battery has actually discharged instead.
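The voltage-based guess described above can be sketched as a toy model. Everything here is an assumption for illustration: real fuel gauges are more sophisticated than a linear map, and the 3.3 V cutoff, 4.2 V full level, 1 A load, and resistance values are made-up numbers.

```python
def terminal_voltage(open_circuit_v, current_a, internal_resistance_ohm):
    """Voltage seen by the device under load: V = V_oc - I * R."""
    return open_circuit_v - current_a * internal_resistance_ohm


def estimated_percent(voltage, v_empty=3.3, v_full=4.2):
    """Naive linear voltage-to-percent guess, clamped to the 0..100 range."""
    fraction = (voltage - v_empty) / (v_full - v_empty)
    return max(0.0, min(100.0, fraction * 100.0))


# Same hypothetical cell and load, only the internal resistance changes.
warm = terminal_voltage(3.9, 1.0, 0.1)  # low resistance at room temperature
cold = terminal_voltage(3.9, 1.0, 0.8)  # much higher resistance when frozen
print(f"warm: {warm:.2f} V -> {estimated_percent(warm):.0f}%")
print(f"cold: {cold:.2f} V -> {estimated_percent(cold):.0f}%")
```

In this toy model the cold cell sags below the cutoff under load and reads as empty, even though the stored charge is unchanged.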

Solid electrolyte circumvents this problem by having sort of engineered fixed channels for ion movement - not quite as mobile/fast as liquid electrolyte, but less temperature dependent.
