AMD Realizes Significant Reduction in Power Consumption by Implementing Cyclos Resona


Joseph F

Diamond Member
Jul 12, 2010
3,523
2
0
Wrong, AMD designed the GPU originally, but they sold the design to MS, which has made changes, one of which was designing the world's first APU based on it.
The Xbox 360 GPU underwent several optical shrinks before being integrated with the CPU into a single chip, along with a special "slow down module" designed to artificially slow down the CPU and GPU on the fusion chip to compensate for the speed improvements of the design over two separate chips. It was slowed down to original Xbox 360 specs.

I've heard about this "slow down module" before, but I never saw anyone cite any sources. Can you tell me where you heard this?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Compared to what, though?

For example, according to this-
http://www.legionhardware.com/articles_pages/amd_fx_8150fx_8120fx_6100_and_fx_4170,7.html

8150 uses 3% more power than 8120, but runs at a 500MHz higher base clock. So why does Piledriver need 5% more power for a 100MHz increase? It doesn't really add up...

I'm assuming Piledriver will be a line of CPUs, not just one single CPU, so I don't understand how they can make such a statement, unless they are only talking about the top-bin Piledriver CPU. But we don't know what its base speed was before the extra 100MHz or 5% power savings. If it's 100MHz more on top of a base speed of 5.5GHz, that is something different from 100MHz more than 3.6GHz.

Since the article was written by the people who would know such caveats firsthand, and was further vetted internally by people who would also know them before being released to the public, I am inclined to believe they would have stated a different conclusion if those caveats made a materially significant difference.

You can choose to second-guess the authors but I am going to take them at their word and assume they knew exactly what they meant when they chose to state their conclusions as they did.

FWIW, if you follow the data analysis in this thread, crunching those numbers results in the following graph of the power-consumption increase for each additional 100MHz increment on Intel's own 32nm Core i7-2600K:



Even when optimizing the operating voltage to take advantage of the shmoo-curve, from 1.6GHz to 4.5GHz the 2600K results in an average 7% increase in power usage for every 100MHz increase in its clockspeed.

(I excluded the last 5 measurements from the average; in going from 4.5GHz to 5GHz, each 100MHz increase requires an average 10% increase in power usage.)

So, AMD claiming that a 100MHz increase uses up the 5-10% power savings at any point on the clockspeed curve certainly seems reasonable to me, and it evidently seemed reasonable to them to state in the first place.
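
As a rough illustration of how a "percent per 100MHz" figure like the one above can be derived, the sketch below takes a list of (clockspeed, power) measurements and averages the step-to-step percentage increases. The measurement values are made-up placeholders, not the 2600K data behind the graph, and the function name is hypothetical.

```python
def pct_increase_per_step(samples):
    """Return the % power increase between consecutive samples.

    samples: list of (clock_mhz, power_watts) tuples, sorted by clock
    and spaced exactly 100 MHz apart.
    """
    steps = []
    for (f0, p0), (f1, p1) in zip(samples, samples[1:]):
        if f1 - f0 != 100:
            raise ValueError("samples must be 100 MHz apart")
        steps.append((p1 / p0 - 1.0) * 100.0)
    return steps


# Hypothetical measurements (NOT the real 2600K numbers).
samples = [(3000, 50.0), (3100, 53.5), (3200, 57.2), (3300, 61.2)]

steps = pct_increase_per_step(samples)
average = sum(steps) / len(steps)
print([round(s, 1) for s in steps], round(average, 1))
```

Averaging the per-step percentages (rather than dividing the total increase by the number of steps) matches the "average increase for every 100MHz" wording, since each step is measured relative to the point just before it.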
 

lifeblood

Senior member
Oct 17, 2001
999
88
91
I have a 9XX series AM3+ MB. I wonder if the new power mode will be supported on it? I wonder if it can overclock PD? AMD said PD would be AM3+, but it didn't say it would work with older AM3+ boards.
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
I understand what you are saying, and I generally agree, but that doesn't change the fact that we don't know what base speed they are assuming before the extra 100MHz or 5% power savings is applied.

Ultimately, all the info tells us is that this change gives a small boost, but it doesn't give us the info to know what the actual performance of Piledriver will be.

Also, do you have a good explanation for this graph?

http://www.legionhardware.com/articles_pages/amd_fx_8150fx_8120fx_6100_and_fx_4170,7.html

It shows the power usage difference between the 8120 and 8150 under load as 244 to 252, a 3% difference for a 500MHz difference. Even assuming that is full system load instead of CPU load, and subtracting 100 watts for the rest of the system (which is probably overkill), it's still only a 5.5% difference.
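
The arithmetic behind those two percentages can be checked in a few lines. The 244 and 252 figures are the load readings quoted above; the 100 W allowance for the rest of the system is the poster's guess, not a measured value, and the function name is just for illustration.

```python
def pct_diff(low, high):
    """Percentage by which `high` exceeds `low`."""
    return (high - low) / low * 100.0


fx8120_load = 244.0     # load reading quoted above for the FX-8120
fx8150_load = 252.0     # load reading quoted above for the FX-8150
rest_of_system = 100.0  # assumed non-CPU draw (a guess, per the post)

whole_system = pct_diff(fx8120_load, fx8150_load)
cpu_only = pct_diff(fx8120_load - rest_of_system,
                    fx8150_load - rest_of_system)

print(f"whole system: {whole_system:.1f}%")
print(f"CPU only (assumed split): {cpu_only:.1f}%")
```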

I'm not trying to say you are wrong, but how can you explain this? Does the 8120 just naturally use too much power?

Also, the scaling obviously hasn't been linear for Bulldozer so far, given the absurd power usage when overclocked:

http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/10

A 1200MHz clockspeed increase gives a 140% power usage increase. If power usage rose by a steady 5% per 100MHz, compounded, the increase for 1200MHz would be about 80%. I mean, your own example shows the i7 scaling at about 7% per 100MHz; does it make any sense at all that Bulldozer would be scaling BETTER when we all know it uses more power in most load situations?
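
To make the "about 80%" figure concrete: a steady 5% per 100MHz only reaches roughly 80% over 1200MHz if each step compounds on the previous one; simply adding twelve 5% steps gives 60%. A small sketch, assuming a constant per-step rate (which real chips do not follow exactly):

```python
def compounded_increase(pct_per_step, steps):
    """Total % increase when each step multiplies power by (1 + pct/100)."""
    return ((1.0 + pct_per_step / 100.0) ** steps - 1.0) * 100.0


steps = 1200 // 100  # twelve 100 MHz steps

additive_5 = 5.0 * steps                      # simple sum of the steps
compound_5 = compounded_increase(5.0, steps)  # 5% per step, compounded
compound_7 = compounded_increase(7.0, steps)  # 7% per step, compounded

print(f"5% additive:   {additive_5:.0f}%")
print(f"5% compounded: {compound_5:.0f}%")
print(f"7% compounded: {compound_7:.0f}%")
```

Even the 7% rate compounds to well under the 140% reported for the overclocked chip, which fits the point that the overclocked result is not following a simple fixed-rate curve.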
 

Haserath

Senior member
Sep 12, 2010
793
1
81
The 8120 is probably at the same Vcc as the 8150, so power rises roughly linearly with clockspeed. The graph of the 2600K steps every 100MHz using the lowest Vcc that works at each speed.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
I understand what you are saying, and I generally agree, but that doesn't change the fact that we don't know what base speed they are assuming before the extra 100MHz or 5% power savings is applied.
I was hoping to show you with the graph that they don't need to specify one; it doesn't matter. Pick any point on that curve, and if you want to increase the clockspeed another 100MHz, it's going to cost you another 5-10% increase in power consumption.

The reason they didn't specify the "base speed" is because the statement holds true across the entire range of tested speeds. (up to 4.1GHz in the paper IIRC)

If you made that statement about SB then it would be true, at least per my data, and I can only surmise that the same holds true for Piledriver given the authors' decision to state such a generalized conclusion.

The review comparing the 8120 to the 8150 is comparing physically different CPUs.

You have to use the exact same die, physically the same chip, in order to eliminate the intrinsic differences in RC and parasitic inductance from chip to chip and get down to power differences that are truly due to the device physics in question.

Comparing against a chip that was binned as an 8120 and not an 8150, possibly for good reasons relating to its shmoo plot in the first place, is going to confound the resulting power-usage observations like crazy.

In other words, had they taken the 8150 they had in hand and underclocked it to the same clockspeed as the 8120, I bet they would have observed substantially lower power usage than what their 8120 CPU was showing.

For the 4.8GHz OC'ed example, their results make me think they probably had a rather large temperature difference between the 3.6GHz and 4.8GHz testing points. That temp delta will drive a considerable increase in the static leakage of the CPU.



If they get a large ΔT going then that middle term will be large even if they have not changed the last term (which is the only term that incorporates clockspeed).
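
The equation referred to here is not reproduced in the text. A common three-term decomposition that matches the description (a temperature-dependent middle term, and a last term that is the only one containing clockspeed) is sketched below. The exponential form of the leakage term and every constant are assumptions chosen for illustration, not the poster's fitted values.

```python
import math


def total_power(v, f_ghz, temp_c, p_static=10.0, leak_ref=15.0,
                temp_ref=50.0, leak_scale=60.0, cap=8.0):
    """Three-term CPU power model with illustrative (not measured) constants.

    first term  : p_static, a fixed overhead in watts
    middle term : leakage, which grows with temperature
                  (the exponential shape is an assumption)
    last term   : cap * V^2 * f, the only term that depends on clockspeed
    """
    leakage = leak_ref * math.exp((temp_c - temp_ref) / leak_scale) * v
    dynamic = cap * v ** 2 * f_ghz
    return p_static + leakage + dynamic


# Same voltage and clock, different die temperature.
cool = total_power(v=1.3, f_ghz=3.6, temp_c=50.0)
hot = total_power(v=1.3, f_ghz=3.6, temp_c=90.0)
print(f"same clock and voltage: {cool:.1f} W at 50C vs {hot:.1f} W at 90C")
```

With clock and voltage held fixed, the only thing that moves the total is the middle term, which is the effect being described for a review that lets the temperature drift between test points.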

Again, this is not the kind of mistake a professional would make, but review articles are generally not authored by academic or industry professionals. They are written at the layman's level for digestion by the enthusiast, so you have to expect that some apples-to-oranges comparisons will occur by accident.
 

blckgrffn

Diamond Member
May 1, 2003
9,198
3,185
136
www.teamjuchems.com
Thank goodness you are here, IDC. Sometimes I feel much more informed after reading your posts.

Then you start throwing in the fancy math... well, I hope you are well compensated IRL.

Thank you for sharing your data.

Just to be clear on your last point: if you can keep the temperature delta between two clockspeeds to a minimum, the resulting power usage will be lower?
 

Chiropteran

Diamond Member
Nov 14, 2003
9,811
110
106
Yeah, IDC, that makes sense to me now. The 100mhz per 5% power thing being independent of yield and binning is the part I guess I missed. It feels a bit like cheating to do it that way though. I mean, how do they actually *know* what percentage difference is due to the new technology and how much is due to yield?

I mean, this is my take, please correct me where I am wrong.

(Case 1) Without Cyclos Resona, 100 Piledrivers are etched

20 unusable
40 reliably work at 3.2GHz 95W
30 reliably work at 3.8GHz 95W
10 reliably work at 4.4GHz 95W

(Case 2) Another batch is done, with the "Cyclos Resona" improvement

20 unusable
40 reliably work at 3.3GHz 95W
30 reliably work at 3.9GHz 95W
10 reliably work at 4.5GHz 95W

Is what I get... but aren't yields random to some degree, and constantly changing as the process is improved?

Isn't it completely possible to have a later wafer with yields similar to case 2 without the Cyclos Resona improvement at all? I'm just trying to wrap my head around how they know the 100mhz difference is due to the change, given all the other variables.
 

GroundZero7

Member
Feb 23, 2012
55
29
91
The engineering samples came back. That's how they know the technology works.

Clock generation uses ~35% of a chip's power. RCM recovers 85% of the clock power in the new ARM chips.

Bulldozer gets a 10% overall power reduction so that equates to only a ~30% reduction in clock power.
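
These figures can be cross-checked with simple arithmetic. The 35%, 85%, and 10% values are the poster's numbers; treating the chip-level saving as the clock share multiplied by the fraction of clock power saved is a simplifying assumption.

```python
clock_share = 0.35     # clock power as a fraction of chip power (per the post)
overall_saving = 0.10  # chip-level power reduction quoted in the post
arm_recovery = 0.85    # fraction of clock power recovered on the ARM chips (per the post)

# Fraction of clock power that must be saved to cut chip power by 10%.
implied_clock_saving = overall_saving / clock_share

# Chip-level saving if 85% of clock power were recovered at the same share.
chip_saving_at_arm_recovery = clock_share * arm_recovery

print(f"implied clock-power saving: {implied_clock_saving:.3f}")
print(f"chip-level saving at 85% recovery: {chip_saving_at_arm_recovery:.4f}")
```

The first result lands close to the "~30%" stated in the post; the second shows how much larger the chip-level saving would be if the 85% recovery figure carried over unchanged.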

RCM also simplifies the design of processors, because the clock elements of a processor are normally responsible for a large % of failures. The materials used in the clock mesh as well as the inductors are far less prone to failure.

RCM eliminates clock skew, which generally holds back overclocking due to a clock dead zone that has to be there in traditional tree-based clock designs (i.e. Bulldozer). Eliminating clock skew will free up this dead zone, somewhere around 10%, and allow it to be used.

This all should allow PD to clock ~20% higher, provided the hot spots in the processor design do not limit clocks.

Future designs, e.g. Steamroller, should do even better.
 

GroundZero7

Member
Feb 23, 2012
55
29
91
Does that mean that in future iterations they could reduce the clock generation waste even more?

From what I read, you could get close to a 90% reduction in clock power if a processor was designed around RCM. Whether that is possible on a big processor, I have no idea.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Yeah, IDC, that makes sense to me now. The 100mhz per 5% power thing being independent of yield and binning is the part I guess I missed. It feels a bit like cheating to do it that way though. I mean, how do they actually *know* what percentage difference is due to the new technology and how much is due to yield?

I mean, this is my take, please correct me where I am wrong.

(Case 1) Without Cyclos Resona, 100 Piledrivers are etched

20 unusable
40 reliably work at 3.2GHz 95W
30 reliably work at 3.8GHz 95W
10 reliably work at 4.4GHz 95W

(Case 2) Another batch is done, with the "Cyclos Resona" improvement

20 unusable
40 reliably work at 3.3GHz 95W
30 reliably work at 3.9GHz 95W
10 reliably work at 4.5GHz 95W

Is what I get... but aren't yields random to some degree, and constantly changing as the process is improved?

Isn't it completely possible to have a later wafer with yields similar to case 2 without the Cyclos Resona improvement at all? I'm just trying to wrap my head around how they know the 100mhz difference is due to the change, given all the other variables.

I think what you are missing is that they implemented BOTH the traditional clock as well as the Cyclos Resona in Piledriver.

Every chip they tested had the ability to have power consumption and clockspeed tested without Cyclos Resona and then tested again with Cyclos Resona enabled, within the exact same chip.

That is how they were able to isolate the exact benefits of their implementation of Cyclos Resona, they held everything else constant in the test and only varied the clocking method.
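
The value of toggling both clock modes on the same die can be illustrated with a small simulation. Everything in it is hypothetical: the chip-to-chip spread, the 7% saving, the noise level, and the function names are invented for illustration and are not taken from the paper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SAVING = 0.07  # assumed resonant-mode power saving (hypothetical)


def measure(base_power, resonant):
    """Power of one chip in direct-drive or resonant mode, with small noise."""
    factor = 1.0 - TRUE_SAVING if resonant else 1.0
    return base_power * factor * random.gauss(1.0, 0.005)


# Chip-to-chip variation: every die has its own baseline power.
chips = [random.gauss(95.0, 8.0) for _ in range(50)]

# Paired design: both modes are measured on the same die.
paired = [1.0 - measure(p, True) / measure(p, False) for p in chips]

# Unpaired design: resonant chips are compared against a different batch.
other_batch = [random.gauss(95.0, 8.0) for _ in range(50)]
unpaired = [1.0 - measure(p, True) / measure(q, False)
            for p, q in zip(chips, other_batch)]

print(f"paired:   mean {statistics.mean(paired):.3f}, "
      f"stdev {statistics.stdev(paired):.3f}")
print(f"unpaired: mean {statistics.mean(unpaired):.3f}, "
      f"stdev {statistics.stdev(unpaired):.3f}")
```

In the paired case the die's own baseline cancels out of the ratio, so the spread of the estimate comes only from measurement noise. In the unpaired case the batch-to-batch variation swamps a saving of a few percent, which is the yield concern raised in the question.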

This is from the article:
To support testability and robust operation at the wide range of operating frequencies required of a commercial processor, the clock system operates in two modes: direct-drive (cclk) and resonant (rclk).
To operate in both modes, the clock driver needs to support frequency-dependent drive strength and pulse modulation, both of which are efficiently implemented using a split-buffer topology.

Pretty slick, I thought this was the coolest way to test it out. Not that they did it this way for academic purposes though. Obviously they implemented both as a means of risk-reduction in case the resonant clocking technique turned out to be borked.

So they knew exactly what the benefit is, from the very best kind of data, and the conclusion was:
The power savings from rclk enable either a frequency increase of about 100 MHz for the same power, or a power reduction of 5-10% for the same frequency.

Which isn't bad; it just doesn't sound as sexy as the headline "significant reduction in power consumption" when you boil it down to brass tacks.

Still, as another member pointedly noted, engineers rack their brains attempting to lower the power usage of a microarchitecture by 1-2%, so getting a 5-10% reduction is rather phenomenal even if it is paltry in the grander scheme of things.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Thank goodness you are here, IDC. Sometimes I feel much more informed after reading your posts.

Then you start throwing in the fancy math... well, I hope you are well compensated IRL.

Thank you for sharing your data.

Just to be clear on your last point: if you can keep the temperature delta between two clockspeeds to a minimum, the resulting power usage will be lower?

Absolutely.

Here is a real-world example of how much the power consumption will rise for a CPU when the clockspeed is kept constant and the only thing changing is the operating temperature.



The absolute numbers may not seem all that impressive, but consider that what this graph is showing you is that the power consumption for this specific test rose an astonishing 33% (from 68W to 91W) solely due to an increase in the operating temperature (from 47C to 96C).
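
The percentage quoted above follows directly from the two endpoints. The per-degree figure computed below is only an average over the range; leakage does not rise linearly with temperature, so treat it as a rough summary rather than a rule.

```python
p_cool, p_hot = 68.0, 91.0  # watts at the two endpoints quoted above
t_cool, t_hot = 47.0, 96.0  # degrees C at the same endpoints

pct_rise = (p_hot - p_cool) / p_cool * 100.0
watts_per_degree = (p_hot - p_cool) / (t_hot - t_cool)

print(f"power rise: {pct_rise:.1f}%")
print(f"average slope: {watts_per_degree:.2f} W per degree C")
```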
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Wrong, AMD designed the GPU originally, but they sold the design to MS, which has made changes, one of which was designing the world's first APU based on it.
The Xbox 360 GPU underwent several optical shrinks before being integrated with the CPU into a single chip, along with a special "slow down module" designed to artificially slow down the CPU and GPU on the fusion chip to compensate for the speed improvements of the design over two separate chips. It was slowed down to original Xbox 360 specs.

The first? Must be in how you are defining APU? Sony released a single-die chip containing both the emotion engine and the graphics synthesizer in 2004.

 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
I stand corrected.

In that case the history of fusion/APU is:
Sony Emotion Engine, followed by MS Vejle, followed by Intel SB, followed by AMD.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Must be in how you are defining APU?

Exactly. The definition of the term "Accelerated Processing Unit", or more generally "heterogeneous processor", would always include at least two different types of compute cores. So for each of those CPU+GPU hybrids, it just needs to be checked whether their non-generic compute cores (e.g. in the graphics processing units) could be used for compute tasks. The Emotion Engine is a special case, as its MIPS core is accompanied by 2 vector processors and the graphics synthesizer.

Otherwise we might go back further and have a look at e.g. Cyrix' MediaGX from 1997.
 

blckgrffn

Diamond Member
May 1, 2003
9,198
3,185
136
www.teamjuchems.com
Absolutely.

Here is a real-world example of how much the power consumption will rise for a CPU when the clockspeed is kept constant and the only thing changing is the operating temperature.



The absolute numbers may not seem all that impressive, but consider that what this graph is showing you is that the power consumption for this specific test rose an astonishing 33% (from 68W to 91W) solely due to an increase in the operating temperature (from 47C to 96C).

That's really interesting, I think, because when we see these tests (that we argue about so much...) that show power usage levels, it becomes much more important to know what kind of cooling that CPU was under, i.e., stock coolers? Huge aftermarket coolers? Were they tested at the same ambient temp? Since I would think that many sites keep a simple DB of their results and don't run through every platform for a review, this could be very informative.

And it further casts a dim light on sites like Tom's (IMHO) that seem somewhat uninterested in rigorous testing conditions.

Thank you for your reply and the backing data
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Every chip they tested had the ability to have power consumption and clockspeed tested without Cyclos Resona and then tested again with Cyclos Resona enabled, within the exact same chip.

I give them a lot of credit for this. Having built-in risk mitigation in case something goes wrong was a very wise move on their part. Another failure would have put them out of business.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Exactly. The definition of the term "Accelerated Processing Unit", or more generally "heterogeneous processor", would always include at least two different types of compute cores. So for each of those CPU+GPU hybrids, it just needs to be checked whether their non-generic compute cores (e.g. in the graphics processing units) could be used for compute tasks. The Emotion Engine is a special case, as its MIPS core is accompanied by 2 vector processors and the graphics synthesizer.

Otherwise we might go back further and have a look at e.g. Cyrix' MediaGX from 1997.

The moniker "APU" is AMD's to define as they like, right?

Technically AMD was the first with an APU the same as technically Nvidia was the first with a "GPU".

Other products with similar features may have existed before the first APU or first GPU, but since they weren't called APU or GPU they don't get to be called "the first" unless the originators of the terms themselves (AMD and Nvidia in these cases) elect to anoint them so.

In the end, though, we are still left to ponder whether it all matters anyway. Sony's EE+GS did not usher in an era of unbridled synergistic performance or never-before-seen reductions in cost. Neither did the MediaGX, or Brazos.

5 or 6 years later, the nuts and bolts of the conclusion on Fusion, HSA, and APUs seem to be on a collision course with one big "meh".

The transition from uni-core to multi-core was nice, as were the re-introduction of the IMC and 64-bit for >4GB memory. But IGPs? Still waiting for the "big to do" to finally come to fruition.
 

mrcool63

Member
Apr 26, 2010
26
0
0
Absolutely.

Here is a real-world example of how much the power consumption will rise for a CPU when the clockspeed is kept constant and the only thing changing is the operating temperature.



The absolute numbers may not seem all that impressive, but consider that what this graph is showing you is that the power consumption for this specific test rose an astonishing 33% (from 68W to 91W) solely due to an increase in the operating temperature (from 47C to 96C).

Temperatures as such are an indication of electron loss in the circuit; the lost energy is radiated as heat. It is akin to pain in a human being: the more pain, the bigger the problem. In a circuit, it goes to show how efficient a processor is. More leakage implies more heat generated, which implies that more power is required to maintain efficiency. This applies to all electrical circuitry.

This is the basis for more power being required at higher temperatures.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91


I don't know about all that, but the basis for increasing power consumption due to rising temperature in integrated circuits is Ohm's law and the Poole-Frenkel effect.

 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Exactly. The definition of the term "Accelerated Processing Unit", or more generally "heterogeneous processor", would always include at least two different types of compute cores. So for each of those CPU+GPU hybrids, it just needs to be checked whether their non-generic compute cores (e.g. in the graphics processing units) could be used for compute tasks. The Emotion Engine is a special case, as its MIPS core is accompanied by 2 vector processors and the graphics synthesizer.

Otherwise we might go back further and have a look at e.g. Cyrix' MediaGX from 1997.

So, you mean, something like sandwiching integer and floating point onto the same die? hmm....
 