Question AMD Phoenix/Zen 4 APU Speculation and Discussion

Page 12 - AnandTech Forums

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
Do you really think people who need the extra sustained CPU performance care much about battery life in that same situation? 16W and 26W at the CPU is something close to 25W and 35W at the platform level, which means on a 50Wh battery you end up with 2 hours and 1.4 hours. Big deal, they are both terrible.

You're assuming ST usage 100% of the time, and from that perspective 1.4x the battery life at 2h is not that thrilling in absolute time length. But at an average 25% duty cycle over the battery's life, which is closer to real conditions, the battery lives will be 5.7h and 8h respectively, and that's quite a difference.

If you wanted sustained-load battery life you'd get a Y-series CPU with 7W, or even one of the Jasper Lake parts. I know many here will scoff at that comment, but whatever, that's the reality.

That makes sense, since it can easily be demonstrated mathematically that a system is at its best efficiency when the CPU consumes as much as the rest of the system, and 7W is close to a laptop's power consumption excluding the CPU.

That being said, best efficiency is not necessarily best usability, and a trade-off has to be made between those two metrics, hence those frequency boosts; but with recent CPUs these are more of a marketing-driven choice than an actual best efficiency/usability balance.
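The duty-cycle arithmetic above can be sketched in a few lines (a simplified model, assuming negligible idle draw during the inactive portion):

```python
# Battery life under a duty cycle: 50 Wh battery, platform-level load power,
# and a simplified model in which idle power is treated as negligible.
def battery_life_hours(battery_wh, load_w, duty_cycle):
    avg_w = load_w * duty_cycle  # average draw over the whole period
    return battery_wh / avg_w

# Sustained load (100% duty cycle), as in the quoted post:
print(battery_life_hours(50, 25, 1.0))   # 2.0 h
print(battery_life_hours(50, 35, 1.0))   # ~1.4 h

# 25% duty cycle, closer to real conditions:
print(battery_life_hours(50, 25, 0.25))  # 8.0 h
print(battery_life_hours(50, 35, 0.25))  # ~5.7 h
```

In practice idle draw is nonzero, which compresses the gap somewhat.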
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
That makes sense, since it can easily be demonstrated mathematically that a system is at its best efficiency when the CPU consumes as much as the rest of the system, and 7W is close to a laptop's power consumption excluding the CPU.
I'm not sure about that. The rest of the system at idle or in use? My Renoir laptop's lowest idle usage in normal use (so with screen and wifi enabled) is slightly below 3W, but NVMe use visibly raises that value, though usually only for a very short time. Also, best performance-per-watt is usually at the efficiency inflection point most CPUs have in the frequency/voltage curve, with the addition of the system's power use moving that point slightly above it. This gives the best possible performance return for the available battery capacity, which imo should always be the goal for all mobile use cases.
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
I'm not sure about that. The rest of the system at idle or in use? My Renoir laptop's lowest idle usage in normal use (so with screen and wifi enabled) is slightly below 3W, but NVMe use visibly raises that value, though usually only for a very short time. Also, best performance-per-watt is usually at the efficiency inflection point most CPUs have in the frequency/voltage curve, with the addition of the system's power use moving that point slightly above it. This gives the best possible performance return for the available battery capacity, which imo should always be the goal for all mobile use cases.

It can be proved by computing the first- and second-order derivatives of the energy; the optimum is at CPU power = rest-of-system power.

To give a numerical example, let's assume that the rest-of-system power is 1W and that the task executes in 1 second when the CPU uses 1W; that amounts to 2 joules for the full system.

If you increase frequency by 1.1x, time is reduced by 1.1x while CPU power increases by 1.1³ ≈ 1.33x. The total energy consumed by the CPU is then about 1.21 joules while the rest of the system consumes 0.91 joule, for a grand total of about 2.12 joules.

If you increase frequency even further, efficiency decreases accordingly; but as said, efficiency and usability can be contradictory when it comes to the power level required for a responsive enough system.
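A minimal sketch of the numbers above, assuming cubic power-versus-frequency scaling (P ∝ f³) and a task whose runtime scales inversely with frequency:

```python
# Total energy for a fixed task versus frequency multiplier x, assuming
# cubic CPU power scaling and 1 W of rest-of-system power (the example above).
def total_energy(x, cpu_w=1.0, rest_w=1.0, base_time=1.0):
    t = base_time / x              # the task finishes faster at higher clocks
    cpu_energy = cpu_w * x**3 * t  # P_cpu scales as the cube of frequency
    rest_energy = rest_w * t       # rest-of-system power is constant
    return cpu_energy + rest_energy

print(total_energy(1.0))  # 2.0 J at the baseline
print(total_energy(1.1))  # ~2.12 J (CPU ~1.21 J + rest ~0.91 J)
```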
 
Reactions: maddie

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
It can be proved by computing the first- and second-order derivatives of the energy; the optimum is at CPU power = rest-of-system power.

To give a numerical example, let's assume that the rest-of-system power is 1W and that the task executes in 1 second when the CPU uses 1W; that amounts to 2 joules for the full system.

If you increase frequency by 1.1x, time is reduced by 1.1x while CPU power increases by 1.1³ ≈ 1.33x. The total energy consumed by the CPU is then about 1.21 joules while the rest of the system consumes 0.91 joule, for a grand total of about 2.12 joules.

If you increase frequency even further, efficiency decreases accordingly; but as said, efficiency and usability can be contradictory when it comes to the power level required for a responsive enough system.
Sorry, that's nonsense. The optimum point can only be at CPU power = rest-of-system power if said CPU is actually able to run at that amount of power. In your particular example, good luck finding an x86 chip that offers significant performance at 1W; they will be heavily throttled due to lack of juice, if they turn on to begin with. As a result, for the first couple of watts added, the increase in performance is so dramatic that the performance increase is much higher than the energy increase. That's exactly why one wants to look at the efficiency inflection point of the CPU's frequency/voltage curve.
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
Sorry, that's nonsense. The optimum point can only be at CPU power = rest-of-system power if said CPU is actually able to run at that amount of power. In your particular example, good luck finding an x86 chip that offers significant performance at 1W; they will be heavily throttled due to lack of juice, if they turn on to begin with. As a result, for the first couple of watts added, the increase in performance is so dramatic that the performance increase is much higher than the energy increase. That's exactly why one wants to look at the efficiency inflection point of the CPU's frequency/voltage curve.

If the rest of the system uses 10W, then best efficiency is at 10W for the CPU; translating the function to other conditions changes nothing about this efficiency point. You could do the maths for 5 or 10W rest-of-system and find exactly the same ratios.

The so-called race to idle is a marketing gimmick, because increasing performance above said optimum by a ratio X will only reduce time by 1/X.

Meanwhile, CPU power has to be increased by a ratio X³; the result is that the CPU will consume (1/X)·(X³) = X² times more energy (in joules) to process the task.

As for the power/frequency curve, it's cubic shaped because in ST CPUs are pushed to the extremity of their frequency capability, where they are very inefficient; but even perfect MOSFETs would still behave at minimum as square-law devices, and the CPU would then consume X times more energy than at the optimal point to do the aforementioned task.

Edit: Some AMD APU based laptops have a setting with 12W CPU power, which makes a lot of sense when it comes to battery life; you should do some tests with your 5800U if it has such an option.
 
Last edited:

podspi

Golden Member
Jan 11, 2011
1,982
102
106
I am not aware of any reason that 50% of total system power should always be dedicated to the CPU for maximum efficiency. It would seem to me that this is highly dependent on what the task actually is.

Conceptually, computers are systems with many sub-components. Sticking with joules, maximum efficiency for a given task is achieved when each marginal joule used by each component contributes the same incremental increase in performance.

Of course, in the real world, power and performance may not scale in a way that makes that possible, as moinmoin pointed out.

This is the same concept as the selection of inputs to minimize the cost of production: https://en.wikipedia.org/wiki/Production_function

It is also the same concept behind AMD's SmartShift and many other similar technologies. It isn't marketing, it's basic economics.
 
Reactions: Tlh97 and Bigos

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
At the beginning of the year I actually did some measurements of Speedometer 2.0 with the 4500U set to different low SoC cTDPs using ryzenadj (specifically setting STAPM LIMIT, PPT LIMIT FAST and PPT LIMIT SLOW all to the same value).

So on my 4500U with a majorly ST-heavy workload the efficiency inflection point is around 5.5W for the SoC.

I would need to find some decently repeatable MT, GPU and mixed workloads for Linux to test the effect of SoC cTDPs on those workloads and their resulting efficiency inflection points.
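The kind of sweep described above boils down to picking the power limit with the best score-per-watt; a sketch, where the (watts, score) pairs are hypothetical placeholders rather than the actual measurements:

```python
# Locate the perf-per-watt peak ("efficiency inflection point") in a cTDP
# sweep. The data points below are hypothetical, for illustration only.
measurements = [  # (SoC power limit in W, benchmark score)
    (3.0, 35), (4.0, 55), (5.5, 85), (7.0, 98), (10.0, 110), (15.0, 118),
]

best = max(measurements, key=lambda m: m[1] / m[0])  # highest score per watt
watts, score = best
print(f"best perf/W at {watts} W: {score / watts:.1f} points per watt")
```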
 

Abwx

Lifer
Apr 2, 2011
11,166
3,862
136
I am not aware of any reason that 50% of total system power should always be dedicated to the CPU for maximum efficiency. It would seem to me that this is highly dependent on what the task actually is.

The conclusion will be the same whatever the task or the active core count: best efficiency with respect to battery life is when CPU power equals rest-of-system power; that's mathematically provable.

But as said, there's usability to consider for tasks longer than, say, one second, and one would prefer a CPU that boosts briefly to 45W instead of staying at an optimal 10W if it cuts a 10s task down to 5s.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
The conclusion will be the same whatever the task or the active core count: best efficiency with respect to battery life is when CPU power equals rest-of-system power; that's mathematically provable.
What odd redefinition of the term efficiency is that? Since battery capacity is limited, the most efficient CPU is the one that manages to do the most work before the battery runs out. This is highly workload dependent, since e.g. hardware-accelerated video is different from ST, which is different from MT, from GPU, and from different balances of mixed workloads. There is no mathematical way to simplify or generalize that calculation.
 

uzzi38

Platinum Member
Oct 16, 2019
2,702
6,405
146
That could make sense. What irritates me is that there's been no mention of any chip below Phoenix Point so far, with Phoenix Point supposedly only targeting the 35-45W TDP range but being treated as a straight successor to Rembrandt. That it's supposedly already chiplet based came as a surprise to many. Rebadging monolithic Rembrandt for the lower end, while disappointing, could happen.
Phoenix-U is still a thing, wut?
 

Bigos

Member
Jun 2, 2019
138
322
136
The conclusion will be the same whatever the task or the active core count: best efficiency with respect to battery life is when CPU power equals rest-of-system power; that's mathematically provable.

It is not true in general. It is only true for some designs.

Consider a design that uses very little power outside of the CPU, e.g. 1W. What if the CPU needs to run at 1GHz to use 1W but can run up to 2GHz at the same voltage? Such a CPU will use 2W at 2GHz. Now you can run your CPU at 1GHz and the design will use 2W, or you can run it at 2GHz and it will use 3W. Which one is more efficient?

(The above ignores leakage, which does not scale with frequency. I.e. such a CPU at 2GHz will use even less than 2W, depending on how much can be attributed to leakage.)

Obviously, the frequency/voltage curve often means that you cannot double your frequency at only twice the power consumption. However, you cannot just say "it is mathematically provable that a design that uses half its power on the CPU is the most efficient". Or rather, you can say it once you provide such a proof.

There are far too many variables (frequency/voltage curve, leakage, ...) for there to be one single solution to this. It depends on the design.
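The 1 GHz versus 2 GHz example above can be checked by comparing energy per unit of work, i.e. total power divided by work rate:

```python
# Energy per unit of work for the hypothetical design above: 1 W of
# rest-of-system power, and a CPU drawing 1 W at 1 GHz or 2 W at 2 GHz
# (same voltage, so power scales linearly with frequency; leakage ignored).
def joules_per_work(cpu_w, rest_w, ghz):
    return (cpu_w + rest_w) / ghz  # total power divided by work rate

print(joules_per_work(1, 1, 1))  # 2.0 J per unit of work at 1 GHz
print(joules_per_work(2, 1, 2))  # 1.5 J per unit of work at 2 GHz
```

So the 2 GHz point wins even though CPU power (2W) exceeds rest-of-system power (1W), illustrating that the 50/50 split is not a universal optimum.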
 

qmech

Member
Jan 29, 2022
82
179
66
What odd redefinition of the term efficiency is that? Since battery capacity is limited, the most efficient CPU is the one that manages to do the most work before the battery runs out. This is highly workload dependent, since e.g. hardware-accelerated video is different from ST, which is different from MT, from GPU, and from different balances of mixed workloads. There is no mathematical way to simplify or generalize that calculation.
It is not true in general. It is only true for some designs.

Consider a design that uses very little power outside of the CPU, e.g. 1W. What if the CPU needs to run at 1GHz to use 1W but can run up to 2GHz at the same voltage? Such a CPU will use 2W at 2GHz. Now you can run your CPU at 1GHz and the design will use 2W, or you can run it at 2GHz and it will use 3W. Which one is more efficient?

(The above ignores leakage, which does not scale with frequency. I.e. such a CPU at 2GHz will use even less than 2W, depending on how much can be attributed to leakage.)

Obviously, the frequency/voltage curve often means that you cannot double your frequency at only twice the power consumption. However, you cannot just say "it is mathematically provable that a design that uses half its power on the CPU is the most efficient". Or rather, you can say it once you provide such a proof.

There are far too many variables (frequency/voltage curve, leakage, ...) for there to be one single solution to this. It depends on the design.

This is getting silly. I prefer to lurk, but fine, I'll provide the math and then shut up again.

Consider a system where we divide power consumption into processor and "rest of system". For a task that takes a given amount of time, we can trivially write:

E(total) = (P(cpu) + P(rest)) * t

We want to examine the P(cpu) that minimizes total energy consumption, which in the case of a battery driven device like a phone or a notebook directly translates to increased battery life.

We will look at 3 regimes of CPU frequency-power scaling: linear, quadratic, and cubic. If you go look at actual power profiles, even the best case is usually at least somewhat superlinear, so linear is clearly a best-case scenario. We will also assume that the task scales perfectly with frequency, which is likewise a best-case scenario.

Let us examine the middle case first, where scaling is quadratic. We can then rewrite P(cpu) in terms of some (arbitrary) power, A, and a frequency scaling factor, x. The time taken for our task must be scaled as well. We will call P(rest) B to make the equations somewhat simpler visually.

E = (A*x² + B)*t/x
where P(cpu) = A*x²

To find the minimum, we differentiate and look at the roots:
dE/dx = (A - B/x²)*t
=>
x² = B/A
x = sqrt(B/A), given our non-negative constraints.
P(cpu) for minimum power consumption:

P(cpu) = A*x² = A*B/A = B

I.e. at quadratic scaling, the most efficient (least energy consuming) case is where the processor matches the rest of the system in power, as stated by Abwx.

There are a few additional observations, but we'll deal with linear and cubic scaling first. Feel free to do the math, I'll simply note the results.

For linear scaling, the processor uses the same amount of energy for the task regardless of frequency (it just takes longer if clocked lower). Total system energy is obviously minimized by finishing as fast as possible. In this very idealized regime, race-to-idle is indeed best.

For cubic scaling, plugging in x³ instead of x² above gives a minimum at:

x³ = B/(2*A)

corresponding to the most efficient P(cpu) of:

P(cpu) = B/2

What we are seeing here is clear: the stronger the power scaling, the lower the optimal processor power. For linear scaling we want the processor to finish as quickly as possible, at the quadratic limit we want its power equal to the (rest of the) system power, and for superquadratic scaling we want it below system power.

At this point I need to point out a few assumptions:

1. Performance scaling is very rarely linear and any sub-linear scaling will push the processor towards lower clocks (and thus lower power), even in the linear power scaling regime.

2. The concept of isolated task-energy is *usually* completely wrong when looking at real world usage patterns. Outside of things like compilation or rendering tasks, the system is in use before and after the isolated task. In this case, the equation should be:

E(tot) = (A*x² + B)*t/x + (A0 + B0)*(T-t/x)

This is the relevant equation when looking at how long your phone or notebook will last while browsing, watching a movie, programming, using a word processor or anything else that can be considered a "base load" (even if not idle), and asking what the most efficient power consumption for the processor is while spiking to do whatever Windows is up to (or any other non-critical task that would slow down what you are doing). A0 and B0 are the base loads of the CPU and the "rest" respectively, and T is the total (fixed) period under consideration.

Solving that gives this for most efficient P(cpu):

P(cpu) = B - (A0+B0)

I.e. the processor should consume about as much power as the rest of the system at load MINUS the idle (or base) power of the system.

The conclusion to all of this is that race-to-idle is very often a very bad choice, at least for non-interactive/blocking tasks.

It is quite easy to plug in real numbers for real systems, and the results are, in my experience, quite close to Abwx's postulate that the most efficient point is when processor power is close to rest-of-system power.
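The closed-form results above can be verified numerically with a brute-force grid search over x (a sketch; A and B are arbitrary illustrative values):

```python
# Minimize E(x) = (A*x**n + B) * t / x over x for power-scaling exponent n,
# then read off P(cpu) = A*x**n at the optimum.
def optimal_cpu_power(A, B, n, t=1.0):
    xs = [i / 10000 for i in range(1, 50000)]  # x in (0, 5), step 0.0001
    x_best = min(xs, key=lambda x: (A * x**n + B) * t / x)
    return A * x_best**n

# Quadratic power scaling (n=2): optimum at P(cpu) = B.
print(optimal_cpu_power(A=1.0, B=4.0, n=2))  # ~4.0
# Cubic power scaling (n=3): optimum at P(cpu) = B/2.
print(optimal_cpu_power(A=1.0, B=4.0, n=3))  # ~2.0
```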
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Phoenix-U is still a thing, wut?
Is it? (And is it still called Phoenix-U, not Phoenix Point-U?)
Honestly I haven't seen a mention of it in ages, which didn't bother me after the revelation that Phoenix Point is in some way using chiplet technology. I would still think monolithic chips are financially best suited to the lower end of the product range. But I know nothing.
 

DrMrLordX

Lifer
Apr 27, 2000
21,797
11,143
136
Honestly I haven't seen a mention of it in ages, which didn't bother me after the revelation that Phoenix Point is in some way using chiplet technology. I would still think monolithic chips are financially best suited to the lower end of the product range. But I know nothing.

Kinda makes you wonder what will be the successor to Rembrandt.
 

LightningZ71

Golden Member
Mar 10, 2017
1,659
1,942
136
I don't think we will see a successor to Rembrandt for a long time. TSMC has already stated that N6 is going to be a long-life node that will offer value to customers in volume. Given the capability Rembrandt brings to the table, it seems to me that it will be in production for years and years, especially for the life of DDR5.

Instead, we should see APUs move further upmarket and smaller ones created on legacy nodes to fill the bottom end, à la Mendocino.
 

ahimsa42

Senior member
Jul 16, 2016
225
194
116
I don't think we will see a successor to Rembrandt for a long time. TSMC has already stated that N6 is going to be a long-life node that will offer value to customers in volume. Given the capability Rembrandt brings to the table, it seems to me that it will be in production for years and years, especially for the life of DDR5.

Instead, we should see APUs move further upmarket and smaller ones created on legacy nodes to fill the bottom end, à la Mendocino.
We may not even see Rembrandt widely available at a decent price for a long time, let alone a Rembrandt successor lol.
 

soresu

Platinum Member
Dec 19, 2014
2,955
2,173
136
Is it? (And is it still called Phoenix-U, not Phoenix Point-U?)
Honestly I haven't seen a mention of it in ages, which didn't bother me after the revelation that Phoenix Point is in some way using chiplet technology. I would still think monolithic chips are financially best suited to the lower end of the product range. But I know nothing.
I think it's a matter of packaging cost for the most part.

If the packaging costs can be reduced enough, then manufacturing the IO and the other less scalable components of the GPU on a less advanced, more mature and cheaper node with a chiplet SoP architecture absolutely becomes worth it.
 

MadRat

Lifer
Oct 14, 1999
11,923
259
126
Smaller chips that fit the bottom end are outside of AMD's scope. AMD can target a swath of the lower mid-range market no problem. But let's face it, below that is a commodity market. No real profit for them.
 

DrMrLordX

Lifer
Apr 27, 2000
21,797
11,143
136
Smaller chips that fit the bottom end are outside of AMD's scope. AMD can target a swath of the lower mid-range market no problem. But let's face it, below that is a commodity market. No real profit for them.

Clearly AMD has disagreed in the immediate past: see Rembrandt-U.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
I think it's a matter of packaging cost for the most part.

If the packaging costs can be reduced enough then manufacturing the IO and other less scalable components of the GPU on a less advanced, more mature and cheaper node with chiplet SoP architecture absolutely becomes worth it.
Packaging cost is one thing. The other is whether an MCM package can reach the efficiency of monolithic chips, i.e. whether the interconnects can avoid adding decisive power consumption at lower wattages. What at higher wattages, like on desktop, amounts to a noise floor that can safely be ignored can make the difference between good and bad performance in frugal mobile systems.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Packaging cost is one thing. The other is whether an MCM package can reach the efficiency of monolithic chips, i.e. whether the interconnects can avoid adding decisive power consumption at lower wattages. What at higher wattages, like on desktop, amounts to a noise floor that can safely be ignored can make the difference between good and bad performance in frugal mobile systems.
More than the use of chiplets, how you connect them is the key. Zen using IF through the substrate is vastly less efficient than using TSMC's SoIC tech. In other words, a chiplet design using SoIC to integrate the chiplets can be considered a monolithic design from a power consumption viewpoint.
 
Reactions: MadRat

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
More than the use of chiplets, how you connect them is the key. Zen using IF through the substrate is vastly less efficient than using TSMC's SoIC tech. In other words, a chiplet design using SoIC to integrate the chiplets can be considered a monolithic design from a power consumption viewpoint.
Vastly closer to a monolithic design anyway, yes. But also significantly more expensive packaging compared to using bog-standard substrate. It will be very interesting to see what balance AMD strikes in this area.
 
Reactions: Kaluan

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Vastly closer to a monolithic design anyway, yes. But also significantly more expensive packaging compared to using bog-standard substrate. It will be very interesting to see what balance AMD strikes in this area.
This is something we all assume, but is it accurate? SoIC requires very flat surfaces and precise positioning, but the bond itself is automatic, no solder needed. I can see this becoming very cheap when mass produced.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
This is something we all assume, but is it accurate? SoIC requires very flat surfaces and precise positioning, but the bond itself is automatic, no solder needed. I can see this becoming very cheap when mass produced.
Do we know whether AMD plans to do SoIC packaging on its own, like it does with substrate packaging? If not, I expect it to cost more in any case, simply because AMD can't move it in-house and directly profit from economies of scale. I haven't seen any news indicating whether the TF AMD packaging plant in Malaysia, the current one or the one coming next year, is capable of SoIC or not.
 