Question AMD Phoenix/Zen 4 APU Speculation and Discussion


scineram

Senior member
Nov 1, 2020
361
283
106
Right, and what does Intel have in mobile that even has comparable iGPU performance, or, in the case of Phoenix, will have?
What they have is perfectly fine. Few care about gaming on integrated and their multimedia engine is very good. The GPU is a non-issue.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Since mobile Zen cores having lower frequencies than desktop cores somehow seems to be a point of contention I looked up the boost values for previous gens:

          desktop            mobile
Zen       4.1 GHz (1800X)    3.8 GHz (2800H)
Zen+      4.3 GHz (2700X)    4.0 GHz (3750H)
Zen 2     4.7 GHz (3950X)    4.4 GHz (4900H)
Zen 3     4.9 GHz (5950X)    4.8 GHz (5980HX)
Zen 3+    -                  5.0 GHz (6980HX)

The HX models are specially binned; the more common H models are significantly slower (5800H 4.4 GHz, 6800H 4.7 GHz).

So going by this precedent, and with the 7950X's official boost clock being 5.7 GHz, we can expect AMD to have targeted 5.6 GHz for an HX and 5.2-5.4 GHz for an H Phoenix Point chip (with Dragon Range joining, the model names will likely be different though).
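
As a rough sketch of that extrapolation, the desktop-to-mobile boost gaps can be tabulated from the numbers above and applied to the 7950X. The names used here are made up for illustration; only the clock values come from the table and the H-model note, and subtracting a historical gap from the desktop boost is just one simple way to read the precedent.

```python
# Boost clocks in GHz from the post: label -> (desktop, mobile).
boost_ghz = {
    "Zen": (4.1, 3.8),       # 1800X vs 2800H
    "Zen+": (4.3, 4.0),      # 2700X vs 3750H
    "Zen 2": (4.7, 4.4),     # 3950X vs 4900H
    "Zen 3 H": (4.9, 4.4),   # 5950X vs 5800H
    "Zen 3 HX": (4.9, 4.8),  # 5950X vs 5980HX
}

DESKTOP_ZEN4_GHZ = 5.7  # 7950X official boost clock


def gap(desktop: float, mobile: float) -> float:
    """Desktop-minus-mobile boost gap, rounded to 0.1 GHz."""
    return round(desktop - mobile, 1)


def project(gap_ghz: float) -> float:
    """Zen 4 mobile boost if the same gap to desktop held (an assumption)."""
    return round(DESKTOP_ZEN4_GHZ - gap_ghz, 1)


gaps = {label: gap(*clocks) for label, clocks in boost_ghz.items()}

for label, g in gaps.items():
    print(f"{label}: gap {g} GHz -> projected Zen 4 mobile boost {project(g)} GHz")
```

The HX gap lands on 5.6 GHz and the H-class gaps on 5.2-5.4 GHz, which is where the targets above come from.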

Zen 4c vs Zen 4 is a bit of a complicated affair. It has its own v/f characteristics; it doesn't always behave like Zen 4 does. Not going to give too much context, but the efficiency crossover point is a bit too low.
AMD's previous mobile cores all had v/f curves that deviate significantly from the desktop/server cores: an earlier efficiency inflection point but a much lower starting point. Or have you seen something worse than historical precedent?

I've previously linked a comparison by CnC for Zen 2, which shows the inflection point around 0.7-0.8 GHz lower, but about double the efficiency at the lowest frequency for the mobile core:
For example this is how much Zen 2's v/f curve differs as part of desktop 3950X vs mobile 4800H (taken from Chips and Cheese's article on ADL's power efficiency):
 

DisEnchantment

Golden Member
Mar 3, 2017
1,687
6,243
136
So going by this precedent, and with the 7950X's official boost clock being 5.7 GHz, we can expect AMD to have targeted 5.6 GHz for an HX and 5.2-5.4 GHz for an H Phoenix Point chip (with Dragon Range joining, the model names will likely be different though).
Phoenix will be a much more interesting chip than Dragon Range definitely.
Some important metrics for AMD Ryzen™ 7 6800U --> 4.7 GHz Fmax, 214 mm2 die, 15-28 W TDP, ~60 MTr/mm2

Phoenix being on N4 is pretty much guaranteed to hit and sustain 5.2+ GHz with a higher XTor density than N5 Zen 4 CCDs.
Taking the cumulative frequency gains from N5 and N4 for density- and efficiency-optimized parts, I would guesstimate a density in the 112-118 MTr/mm2 range and ~60% better energy efficiency going from N6 to N4, if the more efficient flavors of N4 are used.
Which could make a high boosting <20W part possible against the 28W part from 6000U series. Much more interesting are base clocks. I would expect 3.6GHz - 4GHz base clocks for this chip.

Going by the stellar characteristics of most SoCs refreshed on N4, like smartphone SoCs and GPUs, the odds for Phoenix to have excellent efficiency and frequency are very very good.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
Phoenix will be a much more interesting chip than Dragon Range definitely.
Some important metrics for AMD Ryzen™ 7 6800U --> 4.7 GHz Fmax, 214 mm2 die, 15-28 W TDP, ~60 MTr/mm2

Phoenix being on N4 is pretty much guaranteed to hit and sustain 5.2+ GHz with a higher XTor density than N5 Zen 4 CCDs.
Taking the cumulative frequency gains from N5 and N4 for density- and efficiency-optimized parts, I would guesstimate a density in the 112-118 MTr/mm2 range and ~60% better energy efficiency going from N6 to N4, if the more efficient flavors of N4 are used.
Which could make a high boosting <20W part possible against the 28W part from 6000U series. Much more interesting are base clocks. I would expect 3.6GHz - 4GHz base clocks for this chip.

Going by the stellar characteristics of most SoCs refreshed on N4, like smartphone SoCs and GPUs, the odds for Phoenix to have excellent efficiency and frequency are very very good.
I think a 4 GHz base clock for higher-TDP, albeit mobile, parts is very possible at this point.

45W TDP, anyone?
 

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136

yuri69

Senior member
Jul 16, 2013
438
719
136
According to THW some Linux Kernel patches point to Dragon Range as well as Phoenix Point being RDNA3.
While the latter one was a given for me I honestly do not believe in the former one. So far everything indicated Dragon Range to be Raphael for Mobile.
But maybe we are in for a big surprise: a different IOD and maybe different packaging as well. Time will tell...

PRO TIP: Always read and translate the original yourself!

The original article at coelacanth-dream states there is an RDNA3-based GC 11.0.4 IP different from Phoenix's. It then *speculates* that the only known upcoming APU is Dragon Range, so that might be the one using that IP. Or not, since the article further confirms that Dragon Range uses regular 'desktop' DDR5 memory.
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
PRO TIP: Always read and translate the original yourself!

The original article at coelacanth-dream states there is an RDNA3-based GC 11.0.4 IP different from Phoenix's. It then *speculates* that the only known upcoming APU is Dragon Range, so that might be the one using that IP. Or not, since the article further confirms that Dragon Range uses regular 'desktop' DDR5 memory.
Hardware Times has a decent direct but edited translation at https://www.hardwaretimes.com/amd-r...ips-to-leverage-rdna-3-graphics-linux-driver/

It indicates that there is an SMU update as well. In my opinion, an IP revision for the SMU indicates different silicon for the Dragon Range IOD, which leads me to believe that either it's a respin of the Raphael IOD with feature swaps and other tweaks for the mobile platform, or they actually bothered to develop a completely different IOD specifically for Dragon Range on a different process. I suppose there is precedent here. They are developing all of the components for an IOD on Phoenix on N5, including a non-trivial layout change for RDNA3 that is possibly focused on increasing its performance and efficiency when memory-starved in a mobile platform (no Infinity Cache, but maybe larger internal caches at lower levels) and on improving performance when using features like FSR and XeSS. They are also doing IP for an N5 DDR5 controller, as well as all the USB and Ethernet connectivity that is needed. It makes some sense to do parallel development of an N5 IOD for Dragon Range and reuse a LOT of the functional blocks that they developed on that IOD.

I realize that this will bring the cost of doing another package for Dragon, but making it BGA, as I suspect it is, was going to require that anyway. The advantages here are a common APU code base, lower idle power from the improved process, possibly a smaller IOD to make packaging easier, and likely the ability to better support higher DDR5 speeds without blowing the power budget.
 
Reactions: Vattila

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
I'd like to remind us all about one thing:

"“Phoenix Point” innovations include the AIE inference accelerator, image signal processor, advanced display for refresh and response, AMD chiplet architecture, and extreme power management. "


It's still here, so most likely this is NOT a mistake.

So, do we have any wild guesses as to what it means, considering that the CPU+GPU part is, indeed, monolithic?
 

Tigerick

Senior member
Apr 1, 2022
686
576
106
This diagram is from RGT (I know, I know); it seems like a bit of over-engineering.

3 chiplets from 3 processes combined into one APU. They do solve the issue of too little cache on the GPU. We shall know more in a month.
 
Reactions: lightmanek

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
I'd like to remind us all about one thing:

"“Phoenix Point” innovations include the AIE inference accelerator, image signal processor, advanced display for refresh and response, AMD chiplet architecture, and extreme power management. "


It's still here, so most likely this is NOT a mistake.

So, do we have any wild guesses as to what it means, considering that the CPU+GPU part is, indeed, monolithic?
To me the most likely explanation would be that they incorporate some Xilinx IP or some AI accelerator on the same package.

A wilder guess: the monolithic Phoenix Point die can be connected to a Raphael CCD, resulting in Dragon Range.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
This diagram is from RGT (I know, I know); it seems like a bit of over-engineering.

3 chiplets from 3 processes combined into one APU. They do solve the issue of too little cache on the GPU. We shall know more in a month.
This diagram is incorrect.

IMO, the most likely scenario is that we get a monolithic die with CPU+GPU, and the chiplets are where the caches and memory controllers are, just like in Navi 31.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,362
5,029
136
Strange: the IOD, which scales even worse than cache, is on the smallest node.

The IO die is explicitly one of the items AMD puts on larger process nodes, because 1) it just does not scale well with node shrinks, and 2) it is not worth decreasing the yields of the parts that matter.

Unless there is some power efficiency reason to do so, I'd wager that diagram is flat-out wrong.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,430
2,914
136
I also think that if Phoenix is not a single monolith, then IO+memory+cache would be separate.
If it had SLC cache, I would expect more than just a 12CU IGP.
I think clocks will be 2.8-3 GHz.
100*1.14*(28 or 30)/24 => 133-143
I think we will see 33-43% better performance of the IGP compared to Rembrandt.
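
Spelled out, the estimate above looks roughly like this. A small sketch only: the 1.14 factor is taken from the formula as-is (the post does not say what it represents), the 24 is read as Rembrandt's 2.4 GHz iGPU clock in units of 100 MHz, and the function name is made up for illustration.

```python
def relative_igp_perf(clock_ghz: float, scale: float = 1.14,
                      baseline_clock_ghz: float = 2.4) -> float:
    """Projected iGPU performance with Rembrandt = 100.

    Assumes performance scales linearly with clock, times a fixed
    factor (1.14 in the post; its origin is not stated there).
    """
    return 100 * scale * clock_ghz / baseline_clock_ghz


for clock in (2.8, 3.0):
    print(f"{clock} GHz -> {relative_igp_perf(clock):.1f}")
```

That gives about 133 at 2.8 GHz and 142.5 at 3.0 GHz, i.e. the 33-43% range once the upper end is rounded up.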
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
I expect up to 43% better performance of the IGP compared to Rembrandt.
A "bit" higher.

PHX is supposed to make the 3050M irrelevant for thin and light laptops. For that it needs to score, depending on the thermal envelope of the design, between 4000 and 5000 pts in Time Spy.

RMB scores 2400-2800 pts here.
 
Reactions: lightmanek

Tigerick

Senior member
Apr 1, 2022
686
576
106
Yeah, that is what I thought at first, but we are in a new era here. Clearly this is AMD's response to Intel's Meteor Lake with its 4-tile solution. Below is the Ryzen 6000 topology:


AMD already has experience with AIO APUs, and with RDNA3 AMD can easily split the GCD and MCD out of it. Let's say the GCD has 768 SPs; that is 20% of Navi 32, so with a simple cut-down we have a new GCD of around 50 mm2. The MCD should be similar to RDNA3's, with different memory support (can anyone tell me whether different memory types like GDDR6 and LPDDR5 can co-exist on the same die?).

So unless AMD wants to create another die for the IOD or use the big IOD from desktop Raphael, they have to combine the CPU and IOD on the N4 process.

The modular design also helps future development. Strix Point should keep the 3-chiplet design, so AMD only needs to focus on the CPU die (most likely using Zen 5 + 4c); the GCD and MCD should follow the RDNA3+ design.
 
Reactions: Tlh97

TESKATLIPOKA

Platinum Member
May 1, 2020
2,430
2,914
136
A "bit" higher.

PHX is supposed to make the 3050M irrelevant for thin and light laptops. For that it needs to score, depending on the thermal envelope of the design, between 4000 and 5000 pts in Time Spy.

RMB scores 2400-2800 pts here.
Not sure where you found 2800 pts for Rembrandt.

RTX 3050M scores from Notebookcheck.net:
TDP                         40 W    45 W         65 W
3DMark Time Spy Graphics    3853    3931-4503    4490-4960

The highest Time Spy Graphics score on Notebookcheck.net for the 680M was only 2449 points.

To be on par with the weakest 3050M you would need a 57% higher score, and for the 65 W one you would need a 103% higher score.
Even for the lowest score you would need 3.3 GHz, and the likelihood of that happening is very, very low, especially for mobile.
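
For anyone who wants to check the percentages, here is a minimal sketch using only the scores quoted above. The clock estimate reuses the linear-scaling formula from the earlier post (factor 1.14, 2.4 GHz Rembrandt baseline), which is an assumption rather than a measured relationship, and the function names are illustrative.

```python
RMB_680M_BEST = 2449  # highest 680M Time Spy Graphics score cited above


def uplift_needed(target: float, baseline: float = RMB_680M_BEST) -> float:
    """Percentage increase over the baseline needed to reach the target score."""
    return (target / baseline - 1) * 100


def clock_needed_ghz(target: float, baseline: float = RMB_680M_BEST,
                     scale: float = 1.14, baseline_clock_ghz: float = 2.4) -> float:
    """iGPU clock needed if score scaled linearly with clock times `scale` (assumed)."""
    return target / baseline / scale * baseline_clock_ghz


print(round(uplift_needed(3853)))        # weakest 3050M (40 W)
print(round(uplift_needed(4960)))        # best 65 W result
print(round(clock_needed_ghz(3853), 1))  # clock needed to match the 40 W part
```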
 

Tigerick

Senior member
Apr 1, 2022
686
576
106
Here is a likely diagram of Intel's upcoming Meteor Lake:


If we omit the TB4 controller, Intel has used 3 different tiles combined into one SoC. Ironically, they are on three different processes: Intel 4, TSMC N5 and N6. Intel's design is more like a traditional PC, and I can see why people think AMD should follow... but as I said in my previous post, AMD has the design experience, so why not use it?

Of course, I still have the question of whether the CPU die can access the SLC cache or just bypasses it.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
This diagram is from RGT (I know, I know); it seems like a bit of over-engineering.

3 chiplets from 3 processes combined into one APU. They do solve the issue of too little cache on the GPU. We shall know more in a month.
That picture is plain stupid: it would require separate memory for the CPU (IOD) and the GPU (MCD). That's not an APU anymore, so even if it exists it won't be Phoenix Point. And at that point one can simply combine a CPU and a dGPU on the same package, connected by plain on-package PCIe, which is exactly what Kaby Lake-G was all about, a concept never used again since.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
That picture is plain stupid: it would require separate memory for the CPU (IOD) and the GPU (MCD). That's not an APU anymore, so even if it exists it won't be Phoenix Point. And at that point one can simply combine a CPU and a dGPU on the same package, connected by plain on-package PCIe, which is exactly what Kaby Lake-G was all about, a concept never used again since.
100% this.
 

Glo.

Diamond Member
Apr 25, 2015
5,763
4,667
136
Not sure where you found 2800 pts for Rembrandt.

RTX 3050M scores from Notebookcheck.net:
TDP                         40 W    45 W         65 W
3DMark Time Spy Graphics    3853    3931-4503    4490-4960

The highest Time Spy Graphics score on Notebookcheck.net for the 680M was only 2449 points.

To be on par with the weakest 3050M you would need a 57% higher score, and for the 65 W one you would need a 103% higher score.
Even for the lowest score you would need 3.3 GHz, and the likelihood of that happening is very, very low, especially for mobile.
All I can say to all of this is: Yes!

On a serious note.


5200 MHz DDR5 - 2536 pts, 6400 LPDDR5 - 2633 pts.

RMB does not gain much from increased memory bandwidth.
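
To put numbers on that, a quick sketch comparing the two results quoted above. It assumes both configurations have the same bus width, so that bandwidth scales with the quoted transfer rate; the names are illustrative.

```python
# Rembrandt Time Spy Graphics scores quoted above, keyed by memory transfer rate.
scores = {5200: 2536, 6400: 2633}


def gain_pct(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1) * 100


# Assumption: same bus width, so bandwidth scales with the transfer rate.
bandwidth_gain = gain_pct(6400, 5200)
score_gain = gain_pct(scores[6400], scores[5200])

print(f"bandwidth +{bandwidth_gain:.1f}%, score +{score_gain:.1f}%")
```

Under that assumption, roughly 23% more bandwidth buys a bit under 4% in score.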

Phoenix gets much more internal bandwidth from the larger caches, much higher clock speeds (expect 3 GHz on the shaders, even at 45 W TDP), I think 5600 MHz DDR5, and a much better culling pipeline.

The only question is: is there cache on the chiplets (assuming they are memory/cache chiplets)? Because based on what we know, there is no Infinity Cache on the dies.
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,946
136
We know that AMD has a different spin of RDNA3 for mobile APUs, and it's about maintaining performance in a memory-restricted environment by deepening internal caches throughout the WGPs. There is no reason for AMD to even consider using an SLC.

I suspect that the chiplets are for separating out IPs to where they already exist. It appears that AMD will have an N6 RDNA3 chip, so they have made an RDNA3 block for that process. We know that AMD has a dedicated low-end processor again in Mendocino, so they aren't concerning themselves with quad core and below for Phoenix, and there is maybe no thought given to recovering low-function chips. I suspect that there will be a small-cache N4 CCD and an N5 or N6 IOD with GPU and integrated Xilinx blocks. That's going to be a big N6 IOD though, so it's possible that they use dense libraries on N5 to keep it small, but those large internal caches aren't going to scale well.

In a mobile product, I can't see AMD separating the memory controller from the iGPU. That's just too much power burned. It works in desktop add-in cards because you have a higher power budget.
 