Speculation: Ryzen 4000 series/Zen 3

turtile · Dec 27, 2019

lobz said:
I get that, they could get it out, technically. But they won't, not on the desktop so shortly after Matisse; it's just not plausible. Nevertheless, please believe me on this: I'll be quite happy to be wrong and pleasantly surprised on this particular question.

Why do you think it's not possible? What would take more time for 7nm+ vs 7nm? With the move to Zen 2, I assume they reused much of the I/O, considering the Epyc I/O die remained on 14nm and the Ryzen one on 12nm. If Zen 3 still reuses the same I/O, they only need to tape out new chiplets (or at least, 14nm/12nm should be easy to change in comparison). GF's 12nm+ won't be ready until Zen 4.

lobz · Dec 27, 2019

turtile said:
Why do you think it's not possible?

lobz said:
I get that, they could get it out, technically. But they won't, not on the desktop so shortly after Matisse, it's just not plausible.

turtile said:
If Zen 3 still reuses the same I/O, they only need to tape out new chiplets (or at least, 14nm/12nm should be easy to change in comparison). GF's 12nm+ won't be ready until Zen 4.

According to this leaked slide we're talking about, Zen 3 will not use either 14 or 12nm I/O dies.

DrMrLordX · Dec 28, 2019

moinmoin said:
Would be an odd intermediary step, but I could see Zen 3 with 3x256b FMAC and Zen 4 with 4x256b FMAC. The latter may need too much space to fit in 7nm+'s density improvement.

3x256b FMAC would be weird. I agree that AMD may not spend their up-to-20% increase in density on 4x256b FMAC, but it's probably possible. They're getting that extra fp performance somewhere.

The pattern so far is a new Ryzen gen every 14 months.

Yes and no. That's too simplistic of an analysis.

Pinnacle Ridge was 13 months, and it probably could have been March 2018 if AMD had pushed it (they didn't). Matisse, on the other hand, was ready in silicon within about the same time frame (13 months) but delayed because of . . . looks like board UEFI and binning on the 3900X this time around. The July launch also coincided with the 7/7 launch (remember that?) that was as much a marketing gimmick as anything else. AMD likely could have had at least the 3600 and 3700X out by April 2019 if board UEFI had been up to snuff (look at early UEFI revs: even the non-crippled ones were sketchy).

I don't expect another 15-month delay for Zen3. Remember AMD has to manage inventory as much as anything else. They're keeping inventories tight on Matisse products, so why not just launch Vermeer in July 2020 to alleviate demand for Matisse?

AMD's entire strategy is to keep up a steady drumbeat in case Intel comes back on them in 2022. They want to face off against Intel 7nm with the strongest core they can have under the circumstances. Delays only threaten that strategy.
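For anyone who wants to check the cadence arithmetic, here is a rough sketch. The first-gen date (March 2, 2017) and the 7/7 Matisse launch come from this thread; the Pinnacle Ridge date (April 19, 2018) is not stated here and is filled in as an assumption from the public launch. The helper name `gaps_in_months` is made up for illustration.

```python
from datetime import date

# Desktop Ryzen launch dates. The Pinnacle Ridge date is an assumption
# added for this sketch; it is not quoted anywhere in the thread.
launches = [
    ("Summit Ridge (Ryzen 1000)", date(2017, 3, 2)),
    ("Pinnacle Ridge (Ryzen 2000)", date(2018, 4, 19)),
    ("Matisse (Ryzen 3000)", date(2019, 7, 7)),
]

AVG_DAYS_PER_MONTH = 365.25 / 12  # average month length in days


def gaps_in_months(items):
    """Return (name, months since the previous launch) for each later entry."""
    gaps = []
    for (_, prev), (name, cur) in zip(items, items[1:]):
        gaps.append((name, (cur - prev).days / AVG_DAYS_PER_MONTH))
    return gaps


for name, months in gaps_in_months(launches):
    print(f"{name}: {months:.1f} months after its predecessor")
```

With those dates the two gaps land between 13 and 15 months, so a single "every 14 months" figure hides a shorter first gap and a longer second one.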

Antey · Dec 28, 2019

DrMrLordX said:
They're getting that extra fp performance somewhere.

What if it's just a rumour?

DrMrLordX · Dec 28, 2019

Antey said:
What if it's just a rumour?

wccftech is the source, and they themselves said that it's a rumour. Take it for what it is, and nothing more. Granted, Milan QS should be out by now (AMD announced that Milan was already sampling to customers in October). It's just a matter of time before there are more performance details leaked to the public.

soresu · Dec 28, 2019

DrMrLordX said:
I agree that AMD may not spend their up-to-20% increase in density on 4x256b FMAC, but it's probably possible.

I'd be more concerned about the power than area density - 15% doesn't sound like much to cover another doubling of FMAC resources.

Olikan · Dec 28, 2019

Didn't we have a rumour that the L1 cache had a 50% bandwidth increase?

exquisitechar · Dec 28, 2019

lobz said:
Zen 2 came out Q3 2019, consumer Zen 3 is practically impossible in Q3 2020, I'd say it's even improbable @ holidays 2020, but not impossible at least. There's also no reason for it, it's not like CML will be that huge of a threat on the desktop.

No, it’s coming in Q3 if all goes well. A bit later than Ryzen 3000, but still Q3.

lobz · Dec 28, 2019

exquisitechar said:
No, it’s coming in Q3 if all goes well. A bit later than Ryzen 3000, but still Q3.

Like I said, I will be very happy if I'm mistaken. I will need a new PC next year, so it would be pretty convenient for me to have yet another new Zen core just over a year after Zen 2.
I'm still not holding my breath. But you know what? If TGL comes out in Q1, as many people here seem to be convinced it will, then I may believe Zen 3 comes to consumers in Q3. I see them as equally improbable.

rainy · Dec 28, 2019

lobz said:
Zen 2 came out Q3 2019, consumer Zen 3 is practically impossible in Q3 2020, I'd say it's even improbable @ holidays 2020, but not impossible at least. There's also no reason for it, it's not like CML will be that huge of a threat on the desktop.

I'm pretty much sure that we will see Vermeer (Ryzen 4xxx) in the summer next year - at worst it would be September.

Btw, I disagree completely with the last sentence - if you're fighting against a much bigger/stronger company like Intel, then you must throw as many strong punches at them as you can, especially with the 10nm debacle.

lobz · Dec 28, 2019

rainy said:
I'm pretty much sure that we will see Vermeer (Ryzen 4xxx) in the summer next year - at worst it would be September.

Btw, I disagree completely with the last sentence - if you're fighting against a much bigger/stronger company like Intel, then you must throw as many strong punches at them as you can, especially with the 10nm debacle.

1: that would be super cool.
2: that's OK for me, it's why we're here

DrMrLordX · Dec 28, 2019

Olikan said:
Didn't we have a rumour that the L1 cache had a 50% bandwidth increase?

Red Gaming Tech seems to think it'll be 40%:

Zen 3 & Ryzen 4000 Analysis - IPC, Clock Speed, Core Count & More Analysis

On March 2nd of next year, it will have been three years since the launch of the first-gen Ryzen processors and with their launch, AMD caused significant d

www.redgamingtech.com

soresu said:
I'd be more concerned about the power than area density - 15% doesn't sound like much to cover another doubling of FMAC resources.

AMD will rely on their boost algorithm to drop clocks in order to keep within power limits (and to mitigate problems of power density). That may be why a doubling of FMACs won't result in a larger real-world gain in fp performance.

exquisitechar said:
No, it’s coming in Q3 if all goes well. A bit later than Ryzen 3000, but still Q3.

To post a pic from the above-cited article:

[Image: leaked roadmap from the Red Gaming Tech article]

Based on that, what do you think will be the launch date for Vermeer? Also remember that the above roadmap has nothing to do with commercial availability (per se); for example, Milan is shown as being maybe a March/April 2020 product in the above leaked roadmap, but we all know that the Big Cloud Boys will be the first (and possibly only) ones getting chips at that time. You'll have to wait awhile to order a small batch of servers from an OEM featuring Milan.

Tuna-Fish · Dec 28, 2019

DrMrLordX said:
Maybe, maybe not. Correct me if I'm wrong, but Zen1/Zen+ were 2x128b FMAC and Zen2 is 2x256b FMAC, correct? Zen3 may just be 4x256b FMAC though I would think that would give more than +50% performance in fp. That would hint at AVX512 support, but that wouldn't absolutely be necessary.

dnavas said:
The API, sure, the vector width, not so much. Even after this rumor. Otherwise it wouldn't be "50%". "Up to 50%" sounds more like a third FMA. Interesting that they didn't aim for Cascade Lake parity, but not completely surprising for all the reasons quoted above.

It's important to note that real applications using a lot of FP compute also move around a lot of data. Doubling compute without widening the pipe to cache gives very diminishing returns, and more load units is a lot more expensive in power and silicon than more AVX units. If they just doubled the FMA pipes, but still have 2 load pipes of 256 bits, "up to 50%" might be a realistic estimation.
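A toy model of the argument above, as a rough sketch only: sustained FMA throughput is capped either by the number of FMA pipes or by how many operands the load pipes can deliver per cycle. The function names and the `loads_per_fma` parameter are hypothetical, and real cores have many more limits (store bandwidth, scheduler, register file ports, clocks under power limits).

```python
def fma_per_cycle(fma_pipes, load_pipes, loads_per_fma):
    """Sustained FMAs per cycle under a simple compute-vs-load cap.

    loads_per_fma is the average number of 256-bit loads each FMA needs
    from cache (an assumed workload property, not a measured value).
    """
    if loads_per_fma <= 0:
        # Everything stays in registers, so only the FMA pipes limit throughput.
        return float(fma_pipes)
    return min(float(fma_pipes), load_pipes / loads_per_fma)


def speedup(old_fma_pipes, new_fma_pipes, load_pipes, loads_per_fma):
    """Throughput ratio when only the FMA pipe count changes."""
    before = fma_per_cycle(old_fma_pipes, load_pipes, loads_per_fma)
    after = fma_per_cycle(new_fma_pipes, load_pipes, loads_per_fma)
    return after / before


# Going from 2 to 4 FMA pipes while keeping 2 load pipes:
for loads in (0.5, 2 / 3, 1.0):
    print(f"{loads:.2f} loads per FMA -> {speedup(2, 4, 2, loads):.2f}x")
```

In this model a kernel needing one load per FMA gains nothing from the extra pipes, one needing about two loads per three FMAs gains 50%, and only code with heavy register reuse sees the full doubling.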

lobz · Dec 28, 2019

Tuna-Fish said:
It's important to note that real applications using a lot of FP compute also move around a lot of data. Doubling compute without widening the pipe to cache gives very diminishing returns, and more load units is a lot more expensive in power and silicon than more AVX units. If they just doubled the FMA pipes, but still have 2 load pipes of 256 bits, "up to 50%" might be a realistic estimation.

Thus it's also very unlikely.

krumme · Dec 28, 2019

lobz said:
Thus it's also very unlikely.

Why?

lobz · Dec 28, 2019

krumme said:
Why?

In the past years AMD have been talking a lot about moving data around and feeding compute units being so important, therefore it would be a very strange move for them to double just the maximum throughput again and leave it at that - seems not worth either the silicon cost or the fuss.

Richie Rich · Dec 28, 2019

Tuna-Fish said:
It's important to note that real applications using a lot of FP compute also move around a lot of data. Doubling compute without widening the pipe to cache gives very diminishing returns, and more load units is a lot more expensive in power and silicon than more AVX units. If they just doubled the FMA pipes, but still have 2 load pipes of 256 bits, "up to 50%" might be a realistic estimation.

I agree that "up to 50%" might be a realistic estimation.
Based on IPC comparisons we can see that Zen 3 is on the edge of a 4/6xALU design (Apple's A10 has around 13 pts/GHz). So it might be the last 4xALU uarch or the first 6xALU uarch. Since AMD stated it's a completely new uarch from scratch, I'd guess 6xALU is highly probable. Not only for higher IPC and future potential for core evolution in Zen 4, but also for spreading heat over a larger surface.

Regarding FP, that's a question. My original estimation was 6xALU and 4xFPU (8xpipes) with SMT4. Now the leaks suggesting +40-50% FP IPC might point to just 6xFP pipes (3xFPU 256-bit). However, IPC doesn't scale linearly, so doubling the FPUs might result in a +50% IPC increase. So doubled FPU pipes seem a feasible way IMHO.

Another trick AMD could do is a shared FPU à la Bulldozer. Two cores would have 12xpipes (6x FPU 256-bit) instead of 8+8xpipes (8xFPUs). Such a configuration would save a lot of transistors while producing similar performance. The cost is a radical uarch change (it cannot be done as a Zen2 evolution). But it's a new uarch and AMD has experience from Bulldozer, so who knows. Such a shared FPU (and a front-end) has one nice advantage: a 4-core CCX becomes an 8-core CCX (L2$ is shared by two cores). This configuration is less probable IMHO.

IPC calculations of SPECint2006:

- 9900K .... 54.28/5 GHz = 10.86 pts/GHz
- 3950X .... 50.02/4.6 GHz = 10.87 pts/GHz
- A76 ........ 26.65/2.84 GHz = 9.38 pts/GHz
- A77 ........ 33.32/2.84 GHz = 11.73 pts/GHz ...... +8% IPC over 9900K
- A11 ........ 36.80/2.39 GHz = 15.40 pts/GHz .... +42% IPC over 9900K
- A12 ........ 45.32/2.53 GHz = 17.91 pts/GHz .... +65% IPC over 9900K
- A13 ........ 52.82/2.65 GHz = 19.93 pts/GHz .... +83% IPC over 9900K
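For reference, a small sketch that reproduces the arithmetic in the list above: score divided by clock, then compared against the 9900K. The scores and clocks are the ones quoted in this post; the helper names `pts_per_ghz` and `relative_to` are made up for illustration.

```python
# SPECint2006 score and clock (GHz) for each chip, as quoted in the list above.
scores = {
    "9900K": (54.28, 5.0),
    "3950X": (50.02, 4.6),
    "A76": (26.65, 2.84),
    "A77": (33.32, 2.84),
    "A11": (36.80, 2.39),
    "A12": (45.32, 2.53),
    "A13": (52.82, 2.65),
}


def pts_per_ghz(score, ghz):
    """Score normalised by clock speed, used here as a rough IPC proxy."""
    return score / ghz


def relative_to(baseline, name):
    """Fractional pts/GHz difference of `name` versus `baseline`."""
    base = pts_per_ghz(*scores[baseline])
    return pts_per_ghz(*scores[name]) / base - 1.0


for name, (score, ghz) in scores.items():
    delta = relative_to("9900K", name) * 100
    print(f"{name:6s} {pts_per_ghz(score, ghz):6.2f} pts/GHz  {delta:+.1f}% vs 9900K")
```

Note that this divides by a single quoted clock per chip, so it inherits whatever error there is in assuming the chip actually held that clock during the run.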

amd6502 · Dec 28, 2019

lobz said:
Thus it's also very unlikely.

I agree.

They already doubled the FPU on Zen2. It would be more their style (alternating between brute-force capacity and optimization+refining) to work on an optimization stage. So rather than doubling the FPU units again, they would work on optimizations like a faster cache interface and front end, and a speed upgrade for some of the more complex FPU operations.

I don't think a doubling of FPU units will happen in Zen3, but a minor FPU pipe widening very well might. Would a very minor, low-transistor-count FPU pipe upgrade like two more FADDs be worth it?

itsmydamnation · Dec 28, 2019

lobz said:
In the past years AMD have been talking a lot about moving data around and feeding compute units being so important, therefore it would be a very strange move for them to double just the maximum throughput again and leave it at that - seems not worth either the silicon cost or the fuss.

The problem is actual cache design. It's really hard to design lots of ports on a cache, especially a fast one. Remember AMD already has 4x 256-bit pipes (or 8x 128-bit if you like to count wrong like nosta). AMD wouldn't need to add too many more read ports to take those 4 FP pipes from 2x FMA, 2x FADD to 4x FMA, and it would only grow the FPU size modestly. Maybe they could go to 4x 256-bit FMA, support AVX512 over multiple cycles and widen data paths to 512-bit; that would basically match Intel on 512-bit ops and smash them on 256-bit code that had the right load/store/register reuse pattern.

The really interesting question for me about the Zen3 FPU concerns AI: is it just adding things like bfloat, or could we see something new like a matrix multiply with gather?

soresu · Dec 28, 2019

itsmydamnation said:
The really interesting question for me about the Zen3 FPU concerns AI: is it just adding things like bfloat, or could we see something new like a matrix multiply with gather?

Considering it is touted as a 'new uArch' and all other recent CPU designs are leaning towards ML, it seems to be highly likely we will see something like this in Zen3.

NTMBK · Dec 28, 2019

soresu said:
Considering it is touted as a 'new uArch' and all other recent CPU designs are leaning towards ML, it seems to be highly likely we will see something like this in Zen3.

Adding a bunch of hardware to do AI badly on the CPU seems like a silly idea, when anyone doing serious amounts of AI work in server will have accelerators better suited to the task.

NostaSeronx · Dec 28, 2019

itsmydamnation said:
Remember AMD already has 4x 256-bit pipes (or 8x 128-bit if you like to count wrong like nosta).

I am counting it right.

It isn't native FP256 like how Intel does it. It is literally 8x 128-bit datapaths, FP0-3 being low 128-bit and FP4-7 being high 128-bit. It would be relatively simple to switch 4x 128-bit / 4x 256-bit to 8x 128-bit / 4x 256-bit. Increasing FPU availability to the rest of the 128-bit datapaths absolutely will give higher IPC than AVX512. AVX512 requires a new ISA and requires using full-width 512-bit instructions to max out usage, much like on Zen2, where an instruction must be AVX256 to use all datapaths.

Allowing FP128 instructions to also execute on FP4-7 is the easier to implement option, with instant IPC growth for FP128/legacy SSE2+ workloads.

soresu · Dec 28, 2019

NTMBK said:
Adding a bunch of hardware to do AI badly on the CPU seems like a silly idea, when anyone doing serious amounts of AI work in server will have accelerators better suited to the task.

You could say the same thing of ARM when they have independent ML accelerator cores, yet still accelerate ML functions on their CPU cores - arguably they know what they are doing.

I would personally call 512 bit SIMD on a CPU an insane idea, but Intel and ARM clearly believe otherwise again.

DrMrLordX · Dec 28, 2019

NTMBK said:
Adding a bunch of hardware to do AI badly on the CPU seems like a silly idea, when anyone doing serious amounts of AI work in server will have accelerators better suited to the task.

Agreed, but . . .

soresu said:
You could say the same thing of ARM when they have independent ML accelerator cores, yet still accelerate ML functions on their CPU cores - arguably they know what they are doing.

Different target markets. AMD isn't in cell phones/tablets at all, so adding ML instructions to their CPUs for the reasons that the mobile/tablet SoC designers add them to their designs would be ludicrous. AMD has their dGPUs for ML/AI, arguably at better perf/watt and definitely at better overall performance than anything from the mobile SoC sector.

Then there's VIA's bizarre decision to include ML in their CPUs. Their market rationale is truly odd and out-of-place. I would think AMD following suit would be seen in a similar fashion. Intel's decision to include bfloat seems really weird when they should be upselling Loihi instead.

soresu · Dec 28, 2019

DrMrLordX said:
Agreed, but . . .

Different target markets. AMD isn't in cell phones/tablets at all, so adding ML instructions to their CPUs for the reasons that the mobile/tablet SoC designers add them to their designs would be ludicrous. AMD has their dGPUs for ML/AI, arguably at better perf/watt and definitely at better overall performance than anything from the mobile SoC sector.

Then there's VIA's bizarre decision to include ML in their CPUs. Their market rationale is truly odd and out-of-place. I would think AMD following suit would be seen in a similar fashion. Intel's decision to include bfloat seems really weird when they should be upselling Loihi instead.

It only sounds bizarre because most people still haven't grasped how pervasive ML workloads are becoming.

It will soon become as important a workload as any, which means needing to support it as well as possible across all system compute hardware, regardless of whatever may do it better in the ideal scenario.

Not every AMD CPU system will have an AMD GPU, and even those that do may well be mismatched with an older GPU.
