Why can't AMD make 3.8GHz processors?


dmens

Platinum Member
Mar 18, 2005
2,274
959
136
This is somewhat accurate but it is very indirectly related to the pipeline length. A longer pipeline requires more logic than a comparable shorter pipeline and, in this respect, a longer pipeline leads to higher power draw

Not necessarily, since the issue of activity factors comes into play. Dynamic power draw (which is still the dominant factor for processors in the market today) has a linear correlation with signal activity. A long pipeline with low signal activity, for whatever reason (architectural or just good design), will have lower power draw than a short pipeline that is switching constantly.
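To put rough numbers on the activity-factor point, here is a minimal Python sketch of the textbook dynamic-power relation P_dyn ≈ α·C·V²·f; every value in it is invented for illustration, not taken from any real chip:

```python
# Toy illustration of the standard dynamic-power relation P_dyn ~ a * C * V^2 * f,
# where 'a' is the activity factor (fraction of nodes switching per cycle).
# All numbers below are invented for illustration, not real chip data.

def dynamic_power(activity, cap_farads, vdd, freq_hz):
    """Approximate dynamic (switching) power in watts."""
    return activity * cap_farads * vdd**2 * freq_hz

# Long pipeline: more total capacitance, but low signal activity.
long_pipe = dynamic_power(activity=0.10, cap_farads=120e-9, vdd=1.3, freq_hz=3.8e9)
# Short pipeline: less capacitance, but nodes switching constantly.
short_pipe = dynamic_power(activity=0.35, cap_farads=80e-9, vdd=1.3, freq_hz=2.4e9)

print(f"long, quiet pipeline: {long_pipe:.0f} W")   # ~77 W
print(f"short, busy pipeline: {short_pipe:.0f} W")  # ~114 W
```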

Another thing is that fast-switching transistors require a thinner gate oxide, which greatly diminishes the dielectric's insulating ability and leads to higher leakage current. The problem with Prescott was that it was huge (for a 90nm CPU) and it operated at high clock speeds that required fast-switching transistors.

Using low-Vt transistors is a design choice. Given its pipeline depth, netburst has fewer transistors per pipestage than other designs, so blowing up leakage power with low-Vt is a last-ditch effort. Besides, leakage was the least of prescott's concerns imo.
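A toy model of the low-Vt tradeoff being described: subthreshold leakage grows roughly one decade for every subthreshold-swing's worth of Vt reduction. The swing value and the 100 mV delta below are assumed, typical figures, purely for illustration:

```python
# Toy model of the low-Vt tradeoff. Subthreshold leakage grows roughly one
# decade for every 'swing' mV that Vt drops; the swing value here (~85 mV per
# decade) is an assumed, typical figure, not data for any specific process.

def leakage_ratio(delta_vt_mv, swing_mv_per_decade=85.0):
    """How much subthreshold leakage multiplies when Vt is lowered by delta_vt_mv."""
    return 10 ** (delta_vt_mv / swing_mv_per_decade)

# Swapping a nominal-Vt gate for one with Vt lowered by 100 mV:
print(f"leakage multiplier: {leakage_ratio(100):.0f}x")  # roughly 15x

# ...which is why low-Vt cells are used sparingly, only on critical timing paths.
```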
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
I think that current leakage is actually a huge issue with Prescott. The only thing that somewhat mitigates this is the use of SpeedStep in the newer revisions and the many tweaks that Intel gave it in order to curb this leakage.

Dynamic power is also a big issue, of course, because of the insane number of transistors in current chips, but this actually decreases with process improvement and lowered Vt, while leakage actually increases (relatively). Low-threshold-voltage transistors are indeed a design choice, but netburst's expected clock scaling actually made this necessary, or dynamic power would have been insane (not that it isn't right now).

Most likely Prescott's problems are a combination of both dynamic and static power, because the use of low-Vt transistors was actually aimed at lowering the dynamic power at higher clock speeds and a greater speed potential. The huge surprise at 90nm was leakage, though. Both Intel and IBM had pretty bad leakage problems (though AMD avoided them to a large degree due to its more expensive SOI process, which, by the way, reduces leakage throughout the transistor, not just at junctions) and the low Vt doesn't help.
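A back-of-envelope sketch of the scaling argument above, with entirely invented per-node numbers, just to show how leakage's share of the budget can grow even while dynamic power improves:

```python
# Back-of-envelope sketch with invented numbers: per shrink, assume dynamic
# power improves ~25% (smaller capacitance, lower voltage) while total leakage
# keeps creeping up, so leakage's *share* of total power grows every node.

dyn, leak = 80.0, 10.0        # watts at the starting node (illustrative)
for node in ["130nm", "90nm", "65nm"]:
    total = dyn + leak
    print(f"{node}: dynamic {dyn:5.1f} W, leakage {leak:5.1f} W "
          f"({100 * leak / total:.0f}% of total)")
    dyn *= 0.75               # assumed dynamic-power improvement per shrink
    leak *= 1.3               # assumed leakage growth per shrink
# 130nm: 11% leakage share -> 90nm: 18% -> 65nm: 27%
```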
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: Furen
LOL, a Sempron 2800+ can easily crush a Celeron M in most tasks, excepting gaming (where the cache makes too much of a difference). The Sempron has a 12-stage pipeline, by the way; the original K7 is the one that had a 10-stage pipeline.

Actually not.

http://www.anandtech.com/mobile/showdoc.aspx?i=2625

The 2800+ Sempron64 (1.6GHz) is easily matched by the lowly Celeron-M 1.4GHz/1MB in most tasks.

Originally posted by: Furen
Itanium2 is just massive because of its width, though its clock speed is low (and the first Itanium 2 was a 180nm chip, which makes its transistors close to 4x bigger than Prescott's, which by itself makes them draw more power). Power4 is also pretty wide and its clock speeds are a bit higher than Itanium2's (not to mention that it also uses a 180nm process). By the way, a K8 at 3GHz may draw about 100W, but this is because it wasn't BUILT for those clocks; as a comparison, a dual-core FX-60 at 2.6GHz draws much less than 100W (something like 70-80W).

An FX-60 draws close to 100W at stock at peak. Itanium2 is 130nm and the latest Power4 is 90nm.
 

dmens

Platinum Member
Mar 18, 2005
2,274
959
136
Originally posted by: Furen
I think that current leakage is actually a huge issue with Prescott. The only thing that somewhat mitigates this is the use of SpeedStep in the newer revisions and the many tweaks that Intel gave it in order to curb this leakage.

Dynamic power is also a big issue, of course, because of the insane number of transistors in current chips, but this actually decreases with process improvement and lowered Vt, while leakage actually increases (relatively). Low-threshold-voltage transistors are indeed a design choice, but netburst's expected clock scaling actually made this necessary, or dynamic power would have been insane (not that it isn't right now).

Most likely Prescott's problems are a combination of both dynamic and static power, because the use of low-Vt transistors was actually aimed at lowering the dynamic power at higher clock speeds and a greater speed potential. The huge surprise at 90nm was leakage, though. Both Intel and IBM had pretty bad leakage problems (though AMD avoided them to a large degree due to its more expensive SOI process, which, by the way, reduces leakage throughout the transistor, not just at junctions) and the low Vt doesn't help.

No, Prescott leakage is not that bad. SpeedStep doesn't do anything for leakage; it is purely targeted at dynamic power by reducing activity factors. And the total transistor gate width that comes with a high transistor count has more effect on leakage, since activity factors on most nets are quite low, even in a power virus. But even then, leakage is much less than dynamic power draw on Prescott.

As for low Vt, that increases leakage to the benefit of switching speed, not the other way around. Is its usage an absolute necessity to converge timing? Maybe. But even then, the percentage of low-Vt gates is not high enough to contribute significantly to Prescott's static power draw.

Leakage will become a problem only if the process guys screw up.
 

Continuity28

Golden Member
Jul 2, 2005
1,653
0
76
Originally posted by: dmens
We want efficiency and shorter pipelines until the fabrication process allows 10GHz naturally

LOL, I don't know who "we" are, but Intel, AMD, IBM and everyone else certainly don't want shorter pipelines and "efficiency", whatever the hell that means. Are you implying that long pipelines are inefficient? OK, maybe if the machine nuked and/or the frontend mispredicted every 50 cycles... oh wait, that doesn't happen.

Throughput (and performance) is achieved by width, frequency (longer pipelines... gasp!) and smart speculation, not short pipelines and stalling on every flow uncertainty in the pursuit of "efficient" execution.

It's nice that you can take my post out of context.

By efficiency, I was obviously talking about HEAT and POWER DRAW in comparison to performance. Where did I say you can't have a solid performer with more pipeline stages, hmm? I didn't.

The whole point is that they increased the length of the pipeline to get more speed instead of waiting for a better manufacturing process. If they had stuck with their Pentium III architecture and moved it to 90nm, 65nm, etc., do you honestly think they would be so far behind? That's why I said to stick with the comparatively "shorter" pipeline design, as opposed to netburst, which went too far. Instead of adding so much in the way of branch prediction to help offset their pipelines (using many transistors in the process, mind you), they could have done other things to make their original architectures better.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
I suppose it depends on what you focus on. I actually tend to give encoding, encryption and other similar tasks more weight in my assessments (and the very basic benchmarks in that Anandtech article tend to focus on integer performance), so let's compromise and say that the 2800+ performs about the same as a Celeron M at 1.4GHz.

Initial Itanium 2s were 180nm, as were initial Power4s. A 90nm Power4+ at 1.3GHz has a TDP of around 115W. Assuming that this is what it draws at load then it draws significantly less power than a 3.8GHz P4, for example, which draws significantly more than its TDP rating. My point was that these two are significantly wider than a P4 (the Power4 is dual-core AND wider) and that is why their power draw is high. Lost Circuits has nice CPU power draw measurements. These show that the FX-60 consumes around 81W. Assuming a 90-95% efficiency from the motherboard's VRMs then real power consumption is 73-77W.
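The VRM arithmetic is easy to check; a quick sketch using only the figures quoted above:

```python
# Checking the arithmetic: Lost Circuits measures ~81 W at the VRM input;
# if the motherboard's VRMs are 90-95% efficient, the CPU itself draws less.

measured_at_vrm = 81.0                     # watts, per the cited measurement
for eff in (0.90, 0.95):
    print(f"VRM efficiency {eff:.0%}: CPU draws ~{measured_at_vrm * eff:.0f} W")
# -> ~73 W and ~77 W, matching the 73-77 W range above
```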
 

dmens

Platinum Member
Mar 18, 2005
2,274
959
136
What's out of context? You and countless others said/implied shorter pipelines are the means to better performance per watt, which is wrong.

Why in the world would chip designers want to rely on process shrinks to get more speed? That would be admitting all architectural options have been exhausted and the only way to extract performance is with die shrinks. x86 performance would have grown at a snail's pace. How much faster do you think transistors get per process generation? Here's a hint: it's not the insane 40% that certain PR releases have been claiming.

Oh, and "netburst" didn't go too far. I wrote a post about netburst misconceptions a while back, check it: http://forums.anandtech.com/messageview...&STARTPAGE=2&FTVAR_FORUMVIEWTMP=Linear

FYI, if Intel had stuck with a P3 design and just done die shrinks, it would suck horribly. Even if they took the P3 and deepened all its buffers to, say, as deep as Merom's, it would still be a crippled chip utilizing a mere fraction of its available floorspace, with crappy throughput because of its short, low-frequency pipeline.

Regardless, pipelining is still the best way to get throughput. You can shoot for pipestage width, but chances are the stage in question will need repipelining, since width saps speed like nothing else.
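A minimal sketch of the frequency side of that argument: each pipestage must fit its slice of the logic plus a fixed latch/clock overhead, so deeper pipelines raise frequency, but sub-linearly. The delay and overhead values below are assumed purely for illustration:

```python
# Sketch of why deeper pipelines raise frequency with diminishing returns:
# each stage must fit (logic_delay / stages) plus a fixed latch/clock overhead,
# so cycle time stops shrinking as the overhead dominates. Numbers illustrative.

def max_freq_ghz(total_logic_delay_ps, stages, latch_overhead_ps=30.0):
    """Highest clock allowed by the (here: average) pipestage delay."""
    cycle_ps = total_logic_delay_ps / stages + latch_overhead_ps
    return 1000.0 / cycle_ps   # 1000 ps per ns -> GHz

for depth in (10, 12, 20, 31):
    print(f"{depth:2d} stages -> ~{max_freq_ghz(3000.0, depth):.1f} GHz")
# 10 stages -> ~3.0 GHz, 31 stages -> ~7.9 GHz: frequency grows sub-linearly
```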
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
High clock speed is what's really needed as far as I'm concerned. This whole multicore fad will last a while, as process tech allows companies to double their cores every 2 years, but this is a rapidly diminishing-returns game. In real life most problems just don't parallelize that well. Especially when you're going to have 8-64 cores, how are you going to divide something up so that you get anywhere near full throughput on that? There's no way a human could do it, so basically you need compilers which auto-parallelize problems, but I doubt you will be able to extract enough parallelism to keep that many cores fed no matter what you use.

Sure, you can add asymmetric cores, where some are very powerful and others are weak, so you can try to put the kernel threads on the powerful cores and the other threads on the weak ones, but that will require tons of arbitration logic (or software) to distribute the threads across the cores. Either way, you eventually run into a wall where you cannot increase performance at the same clock speed no matter how many cores you have, since there are certain essential code paths which must run in a serial manner. The only way to speed them up is to add clock speed. I mean, let's face it, adding 16 cores will probably get you the same performance increase as maybe 4x the clock speed (assuming you give both the same RAM bandwidth, which would obviously have to be very large to keep either method working well).
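The "16 cores vs. 4x clock" comparison is essentially Amdahl's law; a quick sketch with an assumed 90%-parallel workload:

```python
# Amdahl's-law sketch of the point above: with any serial fraction in the
# workload, piling on cores hits a hard ceiling, while a clock-speed bump (in
# this idealized model) speeds up serial and parallel work alike.

def amdahl_speedup(parallel_fraction, cores):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for cores in (2, 4, 16, 64):
    print(f"{cores:3d} cores, 90% parallel code: {amdahl_speedup(0.9, cores):.2f}x")
#   2 cores -> 1.82x, 16 cores -> 6.40x, 64 cores -> 8.77x (ceiling is 10x)
# A 4x clock bump in this ideal model would be a flat 4x -- hence the comparison.
```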
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
Seriously, Intel is pushing multi-core a bit too much for my taste. The benefit from going quad-core is going to be significantly less than the benefit we got from going dual-core since, even in the workstation environment, such a huge amount of multithreading is rare (most apps are dual-threaded). AMD is actually not talking up massively multi-core CPUs, so hopefully they have a plan (i.e. a new microarchitecture or a significant improvement on the current one) that will deepen the pipelines a bit to increase clocks and improve SIMD and integer performance. If Conroe actually performs on par with K8 clock for clock, then Intel should be in pretty good shape for a while yet, since the EE clock speed looks pretty good (and perhaps production maturity will allow it to increase even further).
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
1. AMD is pushing multicore as much if not more than Intel; they are the ones that started dual core, and they are looking to go quad core ASAP.

2. Conroe EE speed is speculation at this point; also, absolutely no info has been released about its release date, so it could easily be Q1 2007.

Anyways, both AMD and Intel are fully on the multicore bandwagon; at this point it is really the only place to be, since improving the IPC of single-threaded apps has pretty much gone as far as it can go. However, it will not be too long before increasing the speed of multithreaded apps also reaches very limited returns. The fact is simply that the pace of advance in CPU speed cannot continue indefinitely; certainly there are many avenues to explore, but as time goes on it will be harder and harder to find ways to meaningfully improve speed as all the easy ways of improving performance reach their limits.
 

dmens

Platinum Member
Mar 18, 2005
2,274
959
136
Having a multicore product does not mean single thread performance is ignored, that is still the most important performance parameter.
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
No, single-threaded performance is pretty much relegated to a secondary or tertiary objective: the same cores are used as before, and likely running at slower clock speeds than single cores. Single-threaded performance improvements are not easily obtained anymore without additional clock speed, so single-threaded performance is being pushed aside in order to increase maximum throughput through multithreaded software.
 

dmens

Platinum Member
Mar 18, 2005
2,274
959
136
single-threaded performance is pretty much relegated to a secondary or tertiary objective

That's not the directive I'm working under, and I doubt AMD is so dumb as to think that way.

Single-threaded performance improvements are not easily obtained anymore without additional clock speed

It's not easy anymore, but there are still plenty of ways to extract performance. I don't know how you can possibly justify that statement.

so single-threaded performance is being pushed aside in order to increase maximum throughput through multithreaded software.

LOL, maybe if the hardware guys had a *lot* of faith in both the compilers and the programmers. But we don't. Trust me, single-thread performance is still extremely important.
 

F1shF4t

Golden Member
Oct 18, 2005
1,583
1
71
Having a longer or shorter pipeline will have a very small effect on performance if you're unable to do tasks in parallel, assuming the same clock speed and that each stage takes one clock cycle in the ideal case.
Having a longer pipeline will allow you to reach higher clock speeds, since the highest clock speed of the CPU IS set by the slowest pipeline stage; this means that balancing the stages' complexity has to be done well for a deep pipeline to actually allow high clock speeds.

The advantage of having a larger number of pipeline stages in a CPU is that once the pipeline is full you're able to do an instruction per clock at a higher clock speed. Now, the biggest problem with this is that you have to keep the pipeline full, as any stall has a larger effect on performance: processing just one instruction takes as many clock cycles as there are stages.

Now, theoretically, I think when you just take into account the execution core (fully utilized), the Pentium 4 should be able to blow the A64 out of the water, and in fact a number of synthetic benchmarks show this.
The problem is that ideal theoretical performance is never reached in the real world; there, the best product is the one that is designed better overall (I'm referring to the memory controller, etc.).

Like what is said in engineering: the sum of all the parts solved separately will always be worse than if the problem was solved taking all parts into account.
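A toy model of the depth-versus-stalls tradeoff described above, using the thread's 12-stage/2.8GHz versus 30+-stage/3.8GHz example and an assumed misprediction rate (all numbers illustrative):

```python
# Toy model of the tradeoff above: deeper pipelines clock higher (see the
# frequency sketch earlier in the thread), but every branch mispredict refills
# the pipe, costing roughly 'depth' cycles. Numbers illustrative, not real data.

def throughput_gips(freq_ghz, depth, mispredicts_per_instr=0.01):
    """Instructions/ns: 1 per cycle when full, minus pipeline-refill stalls."""
    cycles_per_instr = 1.0 + mispredicts_per_instr * depth
    return freq_ghz / cycles_per_instr

print(f"12-stage @ 2.8 GHz: {throughput_gips(2.8, 12):.2f} Ginstr/s")  # 2.50
print(f"31-stage @ 3.8 GHz: {throughput_gips(3.8, 31):.2f} Ginstr/s")  # 2.90
# With a 1% mispredict rate, the deep pipe's 36% clock advantage shrinks
# to roughly a 16% throughput advantage.
```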
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Originally posted by: dmens
single threaded performance is pretty much relegated to a secondary or tertiary objective

That's not the directive I'm working under, and I doubt AMD is so dumb as to think that way.

Well, I was under the impression that the current cores weren't going to change all that much, and that the big changes were going to be the interconnection between cores, and of course the addition of more cores. Obviously Intel and AMD are looking into improving single-threaded performance where they can, but clearly the focus is on multicore designs and improving multithreaded tasks. At least this is what they keep talking about in all their press reports and the like.

Maybe it's all just hype about multicore and AMD and Intel really only push it to sound futuristic, but personally I believe that both companies are working more on integrating multiple cores together than they are on creating new cores.
 

stuepfnick

Junior Member
Mar 1, 2006
6
0
0
I think the real way to boost performance in the future will be SIMD units.


Take the Cell for example. It reaches 200 GFLOPS, where a P4 at 3.2GHz manages 25.6 GFLOPS. That's just for matrix multiplication; in Linpack it reaches 156 GFLOPS, where the P4 is at 25.6 too.

At double precision it is still not very fast: it has 9.6 GFLOPS compared to the 7.2 GFLOPS of the P4.
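For what it's worth, the oft-quoted ~200 GFLOPS figure is a single-precision peak that falls straight out of the commonly cited Cell configuration (8 SPEs, 4-wide single-precision SIMD with fused multiply-add, 3.2GHz); a back-of-envelope check:

```python
# Back-of-envelope for the oft-quoted ~200 GFLOPS single-precision Cell peak,
# assuming the commonly cited configuration (SPEs only, PPE excluded):
spes, simd_width, flops_per_fma, clock_ghz = 8, 4, 2, 3.2
peak_sp = spes * simd_width * flops_per_fma * clock_ghz
print(f"Cell SP peak: {peak_sp:.1f} GFLOPS")   # 204.8 -> the "~200 GFLOPS" figure
# Double precision on the original SPEs is far slower, hence the ~10 GFLOPS DP.
```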

BUT there is a Cell2 CPU planned for 2007 that speeds up double precision in a similar way.

And I think 2 cores for the consumer is fine, 4 cores for workstations; more doesn't really make sense, although multithreaded apps usually run in 2-10 threads.

So I think the most important thing is a powerful SIMD unit. SSE3 is good, Altivec is better, and the Cell is by far the best here!!

So IBM is on the right track; they also say they can achieve double the clock speed at 65nm without a longer pipeline, with the same power consumption and the same performance per clock cycle.

The plans are: a Power6 in 2007, which will run at clock speeds of 5.6GHz. I think that's pretty impressive!

greetings,
Stuepfnick
 

stuepfnick

Junior Member
Mar 1, 2006
6
0
0
PS: I don't mind if SIMD units can't boost Office, Internet or Mail. They all run fast enough! What Cell really boosts is audio, video, 3D - the things where performance is really needed!
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
The problem with Cell, as I see it, is that it is an in-order, narrow and deep CPU with a pretty lousy branch predictor. And the SPEs are even worse than the PPU. It does great with streaming media, but you'd take a huge performance hit once you start running good old general-purpose code with branches and the like. I must say that a Cell CPU as a media encoder/renderer does sound very good...

Even so, both AMD and Intel have shown some interest in asymmetric, specialized cores, but since going multi-core currently yields the greatest benefits (because many applications are already optimized for SMP), trying to introduce specialized cores that need to be handled differently (and optimized for differently) is just not worth it.
 

Centoros

Member
Mar 1, 2006
70
0
0
That's true, but let us not forget AMD uses an on-die memory controller too, which makes them more efficient. This also contributes to their being faster.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
Originally posted by: Centoros
That's true, but let us not forget AMD uses an on-die memory controller too, which makes them more efficient. This also contributes to their being faster.

The microarchitectures are too different to be able to compare the impact a single feature has on them. AMD has its integrated memory controller while Intel uses larger caches, more aggressive prefetching and the like. I tend to believe that clock-for-clock AMD is currently slightly more efficient but performance is pretty damn close considering that AMD has the integrated memory controller advantage. Of course AMD currently has a clock speed advantage (and x86-64) compared to Yonah but Conroe is supposed to take care of this with a relatively minor increase in pipeline depth and TDP.
 

sofarfrome

Senior member
Apr 27, 2005
787
0
0
____________________________________________________________
Think about which is more impressive at the same 0.09u process:

12-stage pipeline @ 2.8GHz, runs warm, or
30+ stage pipeline @ 3.8GHz, runs oven-hot

The former is, right?
____________________________________________________________

9 microns? Damn, that is small
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: stuepfnick
Take the Cell for example. It reaches 200 GFLOPS, where a P4 at 3.2GHz manages 25.6 GFLOPS. That's just for matrix multiplication; in Linpack it reaches 156 GFLOPS, where the P4 is at 25.6 too.

At double precision it is still not very fast: it has 9.6 GFLOPS compared to the 7.2 GFLOPS of the P4.

BUT there is a Cell2 CPU planned for 2007 that speeds up double precision in a similar way.

And I think 2 cores for the consumer is fine, 4 cores for workstations; more doesn't really make sense, although multithreaded apps usually run in 2-10 threads.

So I think the most important thing is a powerful SIMD unit. SSE3 is good, Altivec is better, and the Cell is by far the best here!!

That's peak performance under optimal conditions, i.e. it won't achieve it in the real world.

The Cell is an 8-way SPE/PPE design. The assembly for it is so massively difficult to program that it's not even funny. Algorithms are still pretty much under development. To fully utilize Cell, it will literally take a GENERATION's worth of doctoral theses and research.

Originally posted by: stuepfnick
So IBM is on the right track; they also say they can achieve double the clock speed at 65nm without a longer pipeline, with the same power consumption and the same performance per clock cycle.

Yea, and IBM promised Apple they'd have 3GHz PPCs by now.
 

stuepfnick

Junior Member
Mar 1, 2006
6
0
0
Originally posted by: Furen
I must say that a Cell CPU as a media encoder/renderer does sound very good...

That's what I thought of. For the moment a Cell co-processor could do it. But I think a Cell-ish on-chip SIMD unit, consisting of, e.g., 8 SPEs (or are they called APUs?), would be the way to go.

Or simply replace the Cell main core with a more powerful CPU, like a G5 (which would be the easiest way, I think) or an AMD64 X2, etc.

My preferred system would be a dual-core G5 with a Cell2 extension (the one that speeds up double precision too)!! WHOOOHOO! ;-) :-D
 

stuepfnick

Junior Member
Mar 1, 2006
6
0
0
Originally posted by: dexvx
Originally posted by: stuepfnick
Take the Cell for example. It reaches 200 GFLOPS, where a P4 at 3.2GHz manages 25.6 GFLOPS. That's just for matrix multiplication; in Linpack it reaches 156 GFLOPS, where the P4 is at 25.6 too.

At double precision it is still not very fast: it has 9.6 GFLOPS compared to the 7.2 GFLOPS of the P4.

BUT there is a Cell2 CPU planned for 2007 that speeds up double precision in a similar way.

And I think 2 cores for the consumer is fine, 4 cores for workstations; more doesn't really make sense, although multithreaded apps usually run in 2-10 threads.

So I think the most important thing is a powerful SIMD unit. SSE3 is good, Altivec is better, and the Cell is by far the best here!!

That's peak performance under optimal conditions, i.e. it won't achieve it in the real world.

The Cell is an 8-way SPE/PPE design. The assembly for it is so massively difficult to program that it's not even funny. Algorithms are still pretty much under development. To fully utilize Cell, it will literally take a GENERATION's worth of doctoral theses and research.

Originally posted by: stuepfnick
So IBM is on the right track; they also say they can achieve double the clock speed at 65nm without a longer pipeline, with the same power consumption and the same performance per clock cycle.

Yea, and IBM promised Apple they'd have 3GHz PPCs by now.

Oh no, you are a little misinformed. They told Apple they would have 3GHz G5s by 2004, but they have them now. And not simply 3GHz: they have dual-core 3GHz, so a 3GHz quad would be possible at the moment; the only question is, does Apple still want it? It would be hard for it to be topped by a Conroe-based Mac this year.

The reason why IBM only reached 2.5GHz in 2004 and not 3GHz was unexpected difficulties with the 90nm process, which ALL MAJOR CPU MANUFACTURERS HAD.

Intel said it would reach 4-5GHz with the 90nm process. Do you notice something?

IBM says it has solved the problems with higher clock frequencies, and will bring out a 5.6GHz Power6 CPU.
 