AMD Realizes Significant Reduction in Power Consumption by Implementing Cyclos Resona


pelov

Diamond Member
Dec 6, 2011
3,510
6
0
BD is here for a long time...

The FPU is truly a breakthrough, while in the coming years
they will put their efforts into improving its integer capabilities.

The shrink to smaller nodes will allow them to simply double the FPU
units and gain double the FP performance.

While that's certainly true, what would be the purpose of HSA then? I think we'll be seeing AMD offering more "specialty" cores in the near future, likely ahead of Intel in that sense. Though I think Intel's Knights Corner will come out as a co-processor before we see full-fledged GPU number crunching.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
BD is here for a long time since its frequency
potential is a valuable asset....

The FPU is truly a breakthrough, while in the coming years
they will put their efforts into improving its integer capabilities.

The shrink to smaller nodes will allow them to simply double the FPU
units and gain double the FP performance.

Here's a fresh one from Dresdenboy: they have Steamroller in development and are verifying its new FPU.

http://www.russinoff.com/papers/srt8.pdf
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
HSA would be useful for repetitive calculations that follow a regular pattern,
as it allows massive parallelization, but for many apps the generic FPUs
are out of reach, as there is a lot of FP computing where an operation
depends on the preceding results; that is, data dependency prevents
any parallelization.
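
To make the data-dependency point concrete, here is a minimal Python sketch. It is not HSA code; the function names and the toy recurrence are assumptions chosen only for illustration. The first function can be split into chunks and computed in any order, while the second needs each previous result before it can continue.

```python
def independent(values):
    # Each output depends only on its own input, so chunks of the
    # list can be handed to different workers in any order.
    return [v * v + 1.0 for v in values]


def dependent(values, start=0.0):
    # Each output depends on the previous output (a toy recurrence),
    # so a chunk cannot start before the chunk ahead of it finishes.
    acc = start
    out = []
    for v in values:
        acc = 0.5 * acc + v
        out.append(acc)
    return out


vals = [1.0, 2.0, 3.0, 4.0]
# Splitting the independent work gives the same answer as one pass.
print(independent(vals[:2]) + independent(vals[2:]) == independent(vals))
# Splitting the dependent work naively (restarting from 0.0) does not.
print(dependent(vals[:2]) + dependent(vals[2:]) == dependent(vals))
```

The second comparison fails because the second chunk starts without the running value produced by the first one.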
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
Here's a fresh one from Dresdenboy: they have Steamroller in development and are verifying its new FPU.

http://www.russinoff.com/papers/srt8.pdf

Wow, that's a first-rate mathematical theory paper.
Thanks for the link.

They solved equations to find the most efficient and error-free
base (radix) and algorithm for fast computation of square root and division.
According to this paper, the already known theorems on that matter
were flawed and as a result:

any radix16 or 32 SRT hardware divider based on these
results (the previous flawed theorems) is likely to have a bug similar to that of the original Pentium FDIV instruction
CPU with a radix16 anyone ??...
 
Last edited:

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
They use a radix-4 SRT-based hardware divider/square root extractor, but as pointed out in their paper it is not rigorously precise either.
 

intangir

Member
Jun 13, 2005
113
0
76
They use a radix-4 SRT-based hardware divider/square root extractor, but as pointed out in their paper it is not rigorously precise either.

You can't draw that conclusion. First of all, your "quote" from the paper does not appear in the paper, and second, in the absence of first-hand reports from the Penryn designers, the paper can only make suppositions about the proof methods that Intel may have used. Given that this divider has been out in the wild for 4 years without any division bugs being found, I think it's been sufficient time to say that it actually works!
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
You can't draw that conclusion. First of all, your "quote" from the paper does not appear in the paper,

It is explicitly in the paper; I copied it as it was written,
so first of all you didn't read the paper accurately,
not to say not at all, since it's on page 3, first column.
 

intangir

Member
Jun 13, 2005
113
0
76
It is explicitly in the paper; I copied it as it was written,
so first of all you didn't read the paper accurately,
not to say not at all, since it's on page 3, first column.

Hm, that is not what is in my paper. This is the exact sentence, which you misquoted:

Moreover, any radix-16 or -32 SRT hardware divider based on these results is likely to have a bug very similar to that of the original Pentium FDIV instruction.

So, you did not copy it as it was written; you dropped two words and punctuation which caused my search for the text I sourced from you to fail, and you added a parenthetical comment which was not in the original document. I find your conduct and motives suspicious.

Plus, there's still the issue that there is no evidentiary chain that provides a link between the formal proof methods used on the Penryn divider and the theorems discussed in this paper.
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
So, you did not copy it as it was written; you dropped two words and punctuation which caused my search for the text I sourced from you to fail, and you added a parenthetical comment which was not in the original document. I find your conduct and motives suspicious.

I added (the flawed theorem) since without this clarification
the quote would be meaningless, as there would be no indication
of what it was talking about.

What is suspicious is rather your insistence on discrediting AMD's
theoretical findings.

Specifically, using a radix-16 SRT can lead to computation errors
in the case of square root extraction.
The theorem is a mathematical proof of the radix-16 SRT limitation;
it is also proved that a radix-8 SRT has no such limitation.

The author also highlights the fact that the previous theorem,
dating from 2005, was flawed despite having been reviewed
by experienced mathematicians.

http://www.russinoff.com/papers/srt8.pdf
 
Last edited:

intangir

Member
Jun 13, 2005
113
0
76
I added (the flawed theorem) since without this clarification
the quote would be meaningless, as there would be no indication
of what it was talking about.

What is suspicious is rather your insistence on discrediting AMD's
theoretical findings.

I wasn't trying to discredit them. I am trying to discredit you and what you're trying to imply.

I have no doubt David Russinoff does good work. I was merely pointing out that his results don't actually apply to any processors you might have thought would answer the question
Abwx said:
CPU with a radix16 anyone ??...
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
I wasn't trying to discredit them. I am trying to discredit you and what you're trying to imply.

I have no doubt David Russinoff does good work. I was merely pointing out that his results don't actually apply to any processors you might have thought would answer the question

Intel employee ??..

The only thing you'll discredit is yourself.

Russinoff clearly stated that the 2005 theorem, which was
proved "right" at the time, was flawed and as such, any
CPU using a radix-16 SRT for square root extraction based on the said flawed 2005 theorem is undoubtly bugged; he provides mathematical proof
of it and gives examples.

Since his findings were not known at the time, we can assume
that Intel's radix-16 SRT, which is used for sqrt extraction
in their CPUs, is likely to be bugged in this respect.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
AMD is catching up to 2008? I thought they were out on the edge with FMA?

Let's just say Oregon has several people who are beasts when it comes to FP divides. I worked with them firsthand on timing and logic verification when we tore up the Nehalem FDIV (which opened the Pandora's box of "how does this thing even work?")

I got roped in conversations with mathematicians since divides are all about error bounds. The more bits of precision you try to include, the smaller the error bound but the slower the circuit. There was an optimal tradeoff they found that was tweaked as the mathematicians developed their proofs "oh, ok maybe you need one more bit here". To me, it was all voodoo.

The lead architect for it basically wrote the original FP specs back in the day. Only he has the guts to try and revamp the whole thing (in light of the Pentium FDIV bug)
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
Intel employee ??..

The only thing you'll discredit is yourself.

Russinoff clearly stated that the 2005 theorem, which was
proved "right" at the time, was flawed and as such, any
CPU using a radix-16 SRT for square root extraction based on the said flawed 2005 theorem is undoubtly bugged; he provides mathematical proof
of it and gives examples.

Since his findings were not known at the time, we can assume
that Intel's radix-16 SRT, which is used for sqrt extraction
in their CPUs, is likely to be bugged in this respect.

Just going to point out the obvious here, it would not be at all uncommon for Intel to have known that and remedied it while at the same time intentionally withholding that info from the public domain as a matter of trade-secret.

When it comes to the public domain, there are a lot of strategic corporate politics and subterfuge going on in this industry. Just saying you can't rule it out, never should attempt to rule it out, and just because there is nothing on it from Intel does not mean there is nothing on it within Intel.

And of course there is the possibility that Intel had no idea, we can't rule that out either of course.
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
Just going to point out the obvious here, it would not be at all uncommon for Intel to have known that and remedied it while at the same time intentionally withholding that info from the public domain as a matter of trade-secret.

When it comes to the public domain, there are a lot of strategic corporate politics and subterfuge going on in this industry. Just saying you can't rule it out, never should attempt to rule it out, and just because there is nothing on it from Intel does not mean there is nothing on it within Intel.

And of course there is the possibility that Intel had no idea, we can't rule that out either of course.

They have enough mathematicians to do the work, but what is surprising
is that they would have implemented a radix-16 SRT knowing that
a radix-8 SRT would be better, and that they would not have patented their
findings, at the risk of someone else doing it instead.

Also, Russinoff states that the earlier 2005 theorem claimed
that an error-free radix-16 SRT was possible and that a team of mathematicians found the said theorem to be right.

The maths involved are very high level; Russinoff's paper, which
is linked above, is a very condensed summary, as his complete
mathematical proof uses about 800 lemmas,
so it's very likely that even Intel's first-rate mathematicians
could have gone off track.
 
Last edited:

intangir

Member
Jun 13, 2005
113
0
76
Russinoff clearly stated that the 2005 theorem, which was
proved "right" at the time, was flawed and as such, any
CPU using a radix-16 SRT for square root extraction based on the said flawed 2005 theorem is undoubtly bugged; he provides mathematical proof
of it and gives examples.

But here's where you misunderstand the results. The theorems only prove correctness of a divider algorithm; they do not necessarily apply to any physical dividers, which could implement the algorithm in different ways. There's no evidence that Intel relied on the 2005 "theorem" (now known to be false) to validate their radix-16 SRT dividers. Even if they did, a flaw in the proof does not translate to a flaw in the design. You can build any of a huge continuum of possible designs implementing a radix-16 SRT divider algorithm, differing only in the way quotient digit selection is done, and they would all work. That's the nature of iterative numeric methods; they tend towards convergence. How can you possibly know that the particular digit selection method used by Penryn would fail Russinoff's revised proof criteria? "Undoubtly" is a completely unjustified word!
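
As a rough illustration of the redundancy argument, here is a minimal Python sketch of a textbook radix-4 SRT division recurrence with digits -2..2. It is not the radix-16 Penryn divider and not Russinoff's table formulation. The three selection rules and all names are assumptions made up for this example, and they compare against the exact remainder, where real hardware would use a small lookup table on truncated values.

```python
from fractions import Fraction
import math

RHO = Fraction(2, 3)  # redundancy factor for radix 4 with digits -2..2


def select_round(y):
    # Round y = 4*w/d to the nearest digit, clamped to -2..2.
    return max(-2, min(2, math.floor(y + Fraction(1, 2))))


def select_low(y):
    # Thresholds at the lower edge of each overlap region.
    return max(-2, min(2, math.floor(y + RHO)))


def select_high(y):
    # Thresholds at the upper edge of each overlap region.
    return max(-2, min(2, math.ceil(y - RHO)))


def srt4_divide(x, d, steps, select):
    """Approximate x/d with `steps` radix-4 digits chosen by `select`."""
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or abs(x) > RHO * d:
        raise ValueError("need d > 0 and |x| <= (2/3) * d")
    w = x                      # partial remainder
    q = Fraction(0)            # accumulated quotient
    digits = []
    for j in range(1, steps + 1):
        digit = select(4 * w / d)
        w = 4 * w - digit * d
        # The invariant every valid selection rule must preserve.
        assert abs(w) <= RHO * d
        q += Fraction(digit, 4 ** j)
        digits.append(digit)
    return q, digits


x, d = Fraction(1, 3), Fraction(3, 4)
for rule in (select_round, select_low, select_high):
    q, digits = srt4_divide(x, d, 10, rule)
    print(rule.__name__, digits, float(abs(x / d - q)))
```

For x = 1/3 and d = 3/4, select_round and select_low pick different digits at the third step, yet each quotient stays within (2/3) * 4^-n of x/d after n steps. A rule whose thresholds fell outside the overlap regions would break the remainder bound, and that bound is the kind of condition a correctness proof has to establish for a given table.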

In summary, I would take your alarmist statements as FUD, trumped up by a person with a poor understanding of how formal proof methods are applicable, in an attempt to cast doubt on a perfectly fine piece of engineering, for no better reason than it competes with AMD's. I've seen it from you before, in threads about transactional memory, in threads about hardware multithreading implementations, in threads about IPC (tell me again what "constant IPC" means!), in this very thread about clock mesh technology. You type up a whole bunch of authoritative-sounding technical language, but when it's parsed through or when someone asks a pointed question, it becomes clear that you don't actually know what you're talking about. I could continue at length about how revolted I am at intellectual dishonesty of this magnitude, but I think I would violate forum rules if I did.

Since his findings were not known at the time, we can assume
that Intel's radix-16 SRT, which is used for sqrt extraction
in their CPUs, is likely to be bugged in this respect.

No, we cannot. Again, Russinoff's statement only applies to a "hardware divider based on these results". There's no evidence that Intel designed the Penryn divider based on these results. I'm not even sure the time window would have allowed it. Penryn taped out, in, what, 2006? The design work must have been started at least 2 years before that.
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
You can build any of a huge continuum of possible designs implementing a radix-16 SRT divider algorithm, differing only in the way quotient digit selection is done, and they would all work. That's the nature of iterative numeric methods; they tend towards convergence. How can you possibly know that the particular digit selection method used by Penryn would fail Russinoff's revised proof criteria? "Undoubtly" is a completely unjustified word!

It seems that you didn't catch the paper's importance..

What is proved is that although radix-16 SRT works, there are cases
where it is simply unable to yield a digit selection, i.e., the underlying
digit selection function, whatever solution is chosen in your
infinite continuum, does not always exist.

Only for a radix-8 SRT does a digit selection function always exist.


In summary, I would take your alarmist statements as FUD, trumped up by a person with a poor understanding of how formal proof methods are applicable, in an attempt to cast doubt on a perfectly fine piece of engineering, for no better reason than it competes with AMD's. I've seen it from you before, in threads about transactional memory, in threads about hardware multithreading implementations, in threads about IPC (tell me again what "constant IPC" means!), in this very thread about clock mesh technology. You type up a whole bunch of authoritative-sounding technical language, but when it's parsed through or when someone asks a pointed question, it becomes clear that you don't actually know what you're talking about. I could continue at length about how revolted I am at intellectual dishonesty of this magnitude, but I think I would violate forum rules if I did.
Your summary is quite long and has no value other perhaps
than as an example of the very thing it is supposed to denounce..

You simply browsed through my posts to see what my opinions
and usual tenets are, and now pretend that you have been
noticing my posts on this site for a long time....

No, we cannot. Again, Russinoff's statement only applies to a "hardware divider based on these results". There's no evidence that Intel designed the Penryn divider based on these results. I'm not even sure the time window would have allowed it. Penryn taped out, in, what, 2006? The design work must have been started at least 2 years before that.

His statements apply more precisely, if you had really paid attention,
to radix-16 SRT hardware dividers when used for square root extraction,
not when performing divisions.

Intel used the knowledge of the time, when such a flaw wasn't
known; moreover, they also used their radix-16 hardware divider
to compute square roots...


 

intangir

Member
Jun 13, 2005
113
0
76
It seems that you didn't catch the paper's importance..

What is proved is that although radix-16 SRT works, there are cases
where it is simply unable to yield a digit selection, i.e., the underlying
digit selection function, whatever solution is chosen in your
infinite continuum, does not always exist.

Only for a radix-8 SRT does a digit selection function always exist.

Um, he doesn't say anything of the sort. You can always get an admissible table of any radix if you choose M, N and K large enough. In fact, Russinoff gives a formula for phi(i, j) on page 11! Just plug in rho=4, choose M, N, and K, test the inequalities he gives for all entries and if they all check out, voila, you've constructed a K-admissible M-by-N radix-16 SRT table, the thing you claim does not exist.

Your summary is quite long and has no value other perhaps
than as an example of the very thing it is supposed to denounce..

You simply browsed through my posts to see what my opinions
and usual tenets are, and now pretend that you have been
noticing my posts on this site for a long time....

Oh, there's no pretending going on with me. Look at this post addressing more of your FUD back in September 2011:

http://forums.anandtech.com/showthread.php?p=32264037&highlight=#post32264037

Priceless. Are you going to tell me now that your strained interpretation of "constant IPC" was correct?
 

Abwx

Lifer
Apr 2, 2011
11,172
3,869
136
Um, he doesn't say anything of the sort. You can always get an admissible table of any radix if you choose M, N and K large enough. In fact, Russinoff gives a formula for phi(i, j) on page 11! Just plug in rho=4, choose M, N, and K, test the inequalities he gives for all entries and if they all check out, voila, you've constructed a K-admissible M-by-N radix-16 SRT table, the thing you claim does not exist.


Notice that you're using his results as if they were published prior
to Intel's Penryn launch........

You should also notice that the number of p-bit table entries
is 2^(M + N), so there's really no point in taking M and N "large enough"
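
A quick sketch of how fast that grows, taking the 2^(M + N) entry count at face value. The function name and the sample (M, N) pairs are arbitrary choices for illustration, not values from the paper.

```python
def table_entries(m_bits, n_bits):
    # Entry count for a table indexed by M + N bits.
    return 2 ** (m_bits + n_bits)


# Arbitrary sample index widths, not taken from the paper.
for m_bits, n_bits in [(3, 4), (4, 5), (6, 7), (8, 9)]:
    print(m_bits, n_bits, table_entries(m_bits, n_bits))
```

Every extra index bit doubles the table, so going from 7 index bits to 17 takes it from 128 entries to 131072.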


Oh, there's no pretending going on with me. Look at this post addressing more of your FUD back in September 2011:

http://forums.anandtech.com/showthread.php?p=32264037&highlight=#post32264037

Priceless. Are you going to tell me now that your strained interpretation of "constant IPC" was correct?

Constant is an accurate enough word.
A 2-ALU core that reaches 80-85% of the throughput of a 3-ALU core
effectively has a more constant IPC; otherwise it would at best
reach 2/3 of the latter's IPC, i.e., the average IPC per ALU is higher.
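
The arithmetic behind the per-ALU claim can be sketched as follows. The function name is made up for this example, and the only inputs are the 2-ALU vs 3-ALU counts and the 80-85% figures from the post.

```python
from fractions import Fraction


def per_alu_ratio(throughput_ratio, small_alus, big_alus):
    # Throughput per ALU of the smaller core, relative to the bigger core.
    return throughput_ratio * Fraction(big_alus, small_alus)


for pct in (Fraction(2, 3), Fraction(80, 100), Fraction(85, 100)):
    print(float(pct), float(per_alu_ratio(pct, 2, 3)))
```

At exactly 2/3 of the throughput the per-ALU figure is equal (ratio 1); at 80% it is 1.2 times the 3-ALU core's, and at 85% it is 1.275 times.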
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
One of the guys at P3DNow forum mentioned the possibility of implementing a Radix-16 SRT by cascading two Radix-4 stages:

http://www.planet3dnow.de/vbulletin/showpost.php?p=4573441&postcount=20

The papers he linked:
http://www.acsel-lab.com/arithmetic/papers/ARITH07/ARITH07_Taylor.pdf
http://www2.imm.dtu.dk/~an/pubs/ARITH20.pdf
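
The cascading idea can be checked with a few lines of Python. This is only a sketch of the arithmetic identity, assuming a textbook radix-4 step with digits -2..2 and exact digit selection; the function names are made up, and it is not claimed to match the designs in the linked papers.

```python
from fractions import Fraction
import math


def radix4_step(w, d):
    # One radix-4 SRT step: pick a digit in -2..2, update the remainder.
    digit = max(-2, min(2, math.floor(4 * w / d + Fraction(1, 2))))
    return 4 * w - digit * d, digit


def radix16_step_by_cascade(w, d):
    # Two chained radix-4 steps act as one radix-16 step whose digit
    # is 4 * (first digit) + (second digit), so it lies in -10..10.
    w1, hi = radix4_step(w, d)
    w2, lo = radix4_step(w1, d)
    return w2, 4 * hi + lo


w, d = Fraction(1, 3), Fraction(3, 4)
w2, digit16 = radix16_step_by_cascade(w, d)
# Same remainder as a single radix-16 update with the combined digit.
print(digit16, w2 == 16 * w - digit16 * d)
```

The identity is just algebra: 4 * (4w - q1*d) - q2*d = 16w - (4*q1 + q2)*d.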

@at "constant IPC":
That's more like an academic discussion, because to nail this term, it has to be defined, how IPC will be measured - e.g. using legacy code, recompiled code, one or two threads sharing a module.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
@at "constant IPC":
That's more like an academic discussion, because to nail this term, it has to be defined, how IPC will be measured - e.g. using legacy code, recompiled code, one or two threads sharing a module.

And what specific instruction is being referred to?

We've got a lot to choose from, and the execution latencies vary widely across them in any given microarchitecture.



So we'd need to define the instruction under consideration, or if it is to be more than one instruction then we must define the instruction mix (and weightings).

In short, it's not simply an academic matter; that would actually be easier than the errand we are setting ourselves upon here.

We would be talking about defining our own BAPCo SYSmark or PassMark with which "effective IPC" would be characterized, and it would be "workload class" dependent. IPC for office apps is different than IPC for MATLAB apps because the instruction mix, and their weightings, are so different.

What the heck would "constant IPC" mean? To me it means "steady-state IPC", you process the exact same execution loop indefinitely and measure the average IPC that comes from doing so. But what instructions? And perhaps more importantly, to what end?
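
A toy model shows how much the instruction mix matters. Everything here is hypothetical: the instruction classes, the cycles-per-instruction values, and both mixes are invented for illustration, and the model ignores overlap between instructions.

```python
def effective_ipc(mix, cpi):
    # mix: fraction of executed instructions per class (must sum to 1).
    # cpi: average cycles per instruction for each class.
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    avg_cpi = sum(share * cpi[name] for name, share in mix.items())
    return 1.0 / avg_cpi


# Hypothetical per-class costs on one imaginary core.
cpi = {"int_alu": 0.5, "load": 1.0, "fp_div": 10.0}
# Hypothetical workload mixes.
office_mix = {"int_alu": 0.7, "load": 0.3, "fp_div": 0.0}
math_mix = {"int_alu": 0.3, "load": 0.3, "fp_div": 0.4}

print(effective_ipc(office_mix, cpi))
print(effective_ipc(math_mix, cpi))
```

The same core reports a very different IPC for the two mixes, so any single "IPC" number only means something once the mix and weightings are pinned down.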
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
not sure if somebody saw this...but

Figure 6: Simulated cclk and rclk waveforms at Vdd=1.2V, frequency=4.25GHz under different drive-strength configurations

Bulldozer at 4.2GHz uses 1.4V, right?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
59
91
I saw that, but I was disappointed they were only showing simulated waveforms rather than the actual waveforms.
 