AMD 2014 Desktop roadmap

Stuka87

Diamond Member
Dec 10, 2010
So bummed there are no new performance parts. I'm planning on upgrading in the spring and was hoping to keep an AMD CPU, but the current FX chips are not enough of an upgrade over what I have now.
 

ShintaiDK

Lifer
Apr 22, 2012
When we look into 2015, the current rumours seem to fit AMD's new stance.

In 2015, 'Carrizo' APUs will be launched to succeed Kaveri in the desktop market, featuring the Excavator architecture at two TDPs: 45W and 65W.
Lower TDPs, so 1- and 2-module parts, I assume, since they are now in the Celeron/Pentium/i3 area. That leaves the i5 and i7 alone in the performance and enthusiast segments.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
I'll get back to this after breakfast and unlock it, but I expect I'll evict a few of you from this thread.
-ViRGE


Cleaned and reopened. Try sticking to the subject of AMD's roadmap this time, for those of you who aren't on vacation for accumulated infractions.
-ViRGE
 

inf64

Diamond Member
Mar 11, 2011
DT claims this is a Steamroller (SR) die (or rather, part of the die shot). They got it from ExtremeTech, who supposedly got it from AMD. Very fishy.

 

erunion

Senior member
Jan 20, 2013

Ajay

Lifer
Jan 8, 2001
good research.

That just shows that it's the process that is going to make or break Kaveri. I have a feeling that the process is what delayed Kaveri.

AMD's recent strategy has been to add performance as the process matures. So maybe we'll see that 1 TFLOP Kaveri in a year, under a new name.

Something like the Trinity->Richland scenario? That may well be the case, especially if Excavator *needs* 20nm.
 

Ajay

Lifer
Jan 8, 2001
So it's an objective fact that CMT is "crap"? And I haven't "missed" anything; it's the same old garbage: a topic gets created about "something" and it's really just a poorly veiled front to turn the topic into an AMD/Bulldozer-sucks thread. Funny how, out of the countless tech forums I've been on, it seems to happen the most here. Now that you guys get called out on derailing, you resort to the classic "you guys are just defending the company/being one-sided" cop-out. Try again!

CMT, objectively, doesn't scale well if it has to serve diverse workloads. It uses a lot more xtors (transistors) than SMT, at least compared to 2-way SMT. So a CPU designer faces a choice: more, somewhat anemic, cores for servers, with horrible ST performance on the desktop; or beefier cores for the client side, at the cost of having too few cores for heavily multithreaded server apps.

Since AMD was and is behind in process technology (via GF), they have no way of scaling up to enough CMT cores to compete with Intel. So instead they are making beefier cores, improving instruction decode bandwidth, etc. The problem is that this is an inefficient way to build a multicore CPU, but that's the design they are stuck with for now. AMD doesn't have the resources to do a ground-up redesign of, say, the Thuban architecture (plus what they've learned from BD) and come out with a new, more efficient multicore microprocessor.

Keller has guided the development of Excavator and that processor will show us what he and the best of AMD's architects can do to get the most out of AMD's CMT CPUs. Because of this, EX will almost assuredly include more radical changes than the PD->SR uarch update.

This is why AMD is more focused on winning back some desktop market share than they were before. This is the course of action they've taken to recover something from the BD debacle. At least AMD's chief architect seems confident:
Jim Keller said:
"AMD are on track to catch up on high performance cores"*
We will have to wait and see if that confidence has any merit.

*http://www.rage3d.com/articles/hardware/amd_worldcast/
 

NaroonGTX

Member
Nov 6, 2013
That die shot is pretty old, and is of a single module of some unknown processor. Another version was posted, supposedly the "full shot" including the L2 cache, but people who saw it said it was definitely fake. I think the entire thing may be fake, but who knows for sure... Steamroller v1, whatever it was, was canceled at some point in 2012, and they started on Steamroller 2.0 (bdver3b, Steamroller B).

AMD doesn't have the resources to do a ground-up redesign of, say, the Thuban architecture (plus what they've learned from BD) and come out with a new, more efficient multicore microprocessor.

I don't think it's necessary to throw away the current uarch. Thuban and its predecessors hit the performance wall; there wasn't much more AMD could get out of those K7 derivatives. That's why AMD was in such a scramble to rush Bulldozer out in 2011. Llano showed this further with its low clocks (it was supposed to reach 4 GHz+, but GloFo's screw-ups prevented that), although its IPC was roughly 6-7% higher than Phenom II's.

I think CMT is a good idea; it's just that the current implementation leaves a bit to be desired. They'll be able to improve on it more and more with each revision of the overall uarch.

I remember that Jim Keller quote, and I think if he's confident enough to say that, it only spells good things for EX. I don't know if he's had any role in SR2, but we'll see come December/January.
 

Rickyyy369

Member
Apr 21, 2012
So it looks like AMD wants to capture the low- and mid-range segments and completely surrender the high end to Intel. Disappointing for consumers, but it makes sense for AMD: they don't have the money to keep producing products that aren't selling, so they're going to focus on the products that do sell, APUs and cat cores.
 

cytg111

Lifer
Mar 17, 2008
CMT, objectively, doesn't scale well if it has to serve diverse workloads...

I've always thought of SMT as the bastard child, and here is why. Suppose you have one core and you enable SMT on it. That means you get to tap into the ~20% of the core's resources that go unused, if you just so happen to have another thread lying around that fits the bill... and that bill really has to fit, doesn't it?
I like my Hyper-Threading; it's a little something extra here and there. It's spice, but that's it.
From a coding and operating-system angle this must be a nightmare: I've got these 4 real cores and then these 'maybe 20%' threads, and if I put the wrong things (i.e. threads that don't fit the bill) on those 20% threads, the real cores/threads begin to suffer as well. We'll need some pretty smart AI to guide those schedulers soon.
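For illustration, a minimal sketch of one common mitigation: fill every physical core before doubling up on SMT siblings, so threads only share a core once all cores are busy. It assumes a hypothetical 4-core, 2-way-SMT topology, and `assign_threads` is an illustrative helper, not any real OS scheduler's API; real schedulers (Linux's, for example) also weigh cache sharing, load balance and power.

```python
from typing import List, Tuple


def assign_threads(num_threads: int, cores: int = 4, smt_ways: int = 2) -> List[Tuple[int, int]]:
    """Hypothetical placement policy: return a (core, hw_thread) slot per
    software thread, using every core's first hardware thread before any
    SMT sibling."""
    if num_threads > cores * smt_ways:
        raise ValueError("more software threads than hardware threads")
    slots = []
    for i in range(num_threads):
        core = i % cores        # round-robin across physical cores
        hw_thread = i // cores  # 0 until every core is occupied, then 1, ...
        slots.append((core, hw_thread))
    return slots


# With 6 threads on 4 cores, only cores 0 and 1 end up running two threads.
print(assign_threads(6))
```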
 

inf64

Diamond Member
Mar 11, 2011
That die shot cannot be SR. It is either Excavator or a fake.
 

Homeles

Platinum Member
Dec 9, 2011
In the time it would take to fake a die shot like that, you could build one from the ground up. There are too many intricacies to fake.
 

Ajay

Lifer
Jan 8, 2001
I've always thought of SMT as the bastard child, and here is why. Suppose you have one core and you enable SMT on it. That means you get to tap into the ~20% of the core's resources that go unused, if you just so happen to have another thread lying around that fits the bill... and that bill really has to fit, doesn't it?
I like my Hyper-Threading; it's a little something extra here and there. It's spice, but that's it.
From a coding and operating-system angle this must be a nightmare: I've got these 4 real cores and then these 'maybe 20%' threads, and if I put the wrong things (i.e. threads that don't fit the bill) on those 20% threads, the real cores/threads begin to suffer as well. We'll need some pretty smart AI to guide those schedulers soon.

Well, the 20% applies to general-purpose desktop computing, IIRC. I've seen better results for certain more bandwidth-intensive apps with small working sets. As each thread stalls waiting on memory fetches, the other thread gets full use of the processor, so the net speedup can be better for very specific classes of parallel applications. I know that back with the P4 with HT, F@H was getting around a 30% speedup.
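The stall-hiding effect can be captured in a deliberately idealized model: each thread alternates between `compute` cycles of useful work and `stall` cycles waiting on memory, and SMT threads fill each other's stalls perfectly. The function name and cycle counts are illustrative assumptions; the model ignores cache and bandwidth contention, which is exactly where real SMT gains shrink or turn negative.

```python
def smt_speedup(compute: float, stall: float, threads: int = 2) -> float:
    """Idealized SMT throughput gain over one thread running alone.

    A lone thread keeps the core busy for compute / (compute + stall) of the
    time. With `threads` threads perfectly overlapping their stalls, busy time
    scales with the thread count but can never exceed 100%.
    """
    single = compute / (compute + stall)
    combined = min(1.0, threads * single)
    return combined / single


# Memory-bound thread (busy 40% of the time): a second thread doubles throughput.
print(round(smt_speedup(compute=40, stall=60), 2))
# Compute-bound thread (busy 80% of the time): only +25% before the core saturates.
print(round(smt_speedup(compute=80, stall=20), 2))
```

Under these assumptions the gain depends entirely on how much of the time a single thread leaves the core idle, which is why memory-stalling workloads benefit more than typical desktop code.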

In any case, even a 20% speedup for 5% more xtors is a good deal. SPARC T5s now run 8-way SMT per core. I wonder how it's implemented and what the speedup is with that many threads. I know simulations done by DEC showed a drop-off in efficiency per xtor beyond 4-way SMT per core (2nd ed. of Hennessy and Patterson, IIRC).
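As a quick check on the "good deal" claim, here is the perf-per-transistor arithmetic using the post's own round numbers (20% speedup, 5% more transistors). The figures are forum estimates, not measurements, and `perf_per_transistor` is just an illustrative helper.

```python
def perf_per_transistor(speedup: float, extra_transistors: float) -> float:
    """Relative throughput per transistor vs. the non-SMT core (1.0 = no change).

    Both arguments are fractional, e.g. 0.20 for +20%.
    """
    return (1.0 + speedup) / (1.0 + extra_transistors)


ratio = perf_per_transistor(speedup=0.20, extra_transistors=0.05)
print(f"{ratio:.3f}x throughput per transistor")  # about 14% better than without SMT
```

The break-even point is simply where the fractional speedup equals the fractional transistor cost.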
 

Pilum

Member
Aug 27, 2012
SMT sure does have trade-offs in 'CPU core quality', or else Intel would use it in each and every one of their processors. It's obvious from Intel's own product releases that SMT is an improvement only under a specific set of conditions.
I was thinking about general-purpose cores that scale from client to server across a wide range of workloads; sorry, I should have made that clearer. Even so, only Silvermont and Quark lack SMT. Quark is obvious, and Silvermont couldn't benefit much from SMT as long as its out-of-order engine can achieve high utilization, which seems probable in a 2-issue design. If there are only a few unused execution slots, implementing SMT doesn't make sense.

Also, I doubt there is much, if anything, stopping Intel from switching between SMT and CMT, using a combination, or dropping it, as long as it is the most favorable improvement for whatever uarch they develop, of course. The same holds true for AMD.
True, but AFAIK nobody has used CMT in a high-performance CPU. Seems to indicate that it isn't regarded as useful for producing competitive products.

Dunno. Current pipelines have gotten quite fat and ILP scaling seems to be limited. CMT really isn't a bad idea even for client workloads.
ILP scaling is getting slower but still seems to be doing well; see Power8 and Haswell. It may be quite some time before it drops as dead as frequency scaling seems to have. And even when ILP improvements stop, CMT isn't an obvious solution; just stack more of your CMT cores on a die. This also gives more threads, but should be easier to balance for an optimum of ILP/integer/FP/die area/power use.

I do think CMT can be useful for client workloads, at least in Kaveri's implementation. I'm more sceptical about other uses: even widely threaded server workloads often require high single-threaded performance to keep response times down, and for HPC the design wastes die area on decode/integer components that see low utilization relative to the FPU parts.

Keller has guided the development of Excavator and that processor will show us what he and the best of AMD's architects can do to get the most out of AMD's CMT CPUs. Because of this, EX will almost assuredly include more radical changes than the PD->SR uarch update.
Keller joined AMD in August 2012; that's too late for him to have influenced the basic design of Excavator, if it comes out in 1Q15. He might have helped with some improvements/optimizations, but that's it. The design phase for complex CPUs currently seems to be in the range of 4-5 years, so a from-the-ground-up "Keller design" might come out in 2016/17. (Intel has given a 5-year project time for Haswell, it took Apple 5 years from the acquisition of P.A. Semi to the release of the A7, and even something as primitive as the original P4 took 4 years from first RTL implementation to release.) But if the result is even remotely as good as the A7, it will be worth the wait.
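The timing argument is plain date arithmetic. A small sketch, taking the post's 4-5 year design-phase range, the 1Q15 Excavator launch and Keller's 2012 start as inputs; the helper names are illustrative, and working in whole years is a simplification.

```python
KELLER_START = 2012    # joined AMD in August 2012, per the post
DESIGN_YEARS = (4, 5)  # design-phase range cited in the post


def design_start_window(release_year: int) -> range:
    """Years in which a design shipping in release_year would have started."""
    return range(release_year - max(DESIGN_YEARS), release_year - min(DESIGN_YEARS) + 1)


def release_window(start_year: int) -> range:
    """Years in which a design started in start_year would ship."""
    return range(start_year + min(DESIGN_YEARS), start_year + max(DESIGN_YEARS) + 1)


excavator_start = design_start_window(2015)             # Excavator assumed to ship in 1Q15
print(list(excavator_start))                            # [2010, 2011]
print(all(y < KELLER_START for y in excavator_start))   # True: begun before Keller arrived
print(list(release_window(KELLER_START)))               # [2016, 2017]
```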

I've always thought of SMT as the bastard child, and here is why. Suppose you have one core and you enable SMT on it. That means you get to tap into the ~20% of the core's resources that go unused, if you just so happen to have another thread lying around that fits the bill... and that bill really has to fit, doesn't it?
I like my Hyper-Threading; it's a little something extra here and there. It's spice, but that's it.
From a coding and operating-system angle this must be a nightmare: I've got these 4 real cores and then these 'maybe 20%' threads, and if I put the wrong things (i.e. threads that don't fit the bill) on those 20% threads, the real cores/threads begin to suffer as well. We'll need some pretty smart AI to guide those schedulers soon.
SMT doesn't add a "20% thread"; when a core starts running in SMT mode, you simply get two threads each running at roughly 60% of a single thread's speed, while on a BD-family module you get two threads at ~90% of single-thread performance. Both approaches are transparent to the programmer and work without problems most of the time. Occasionally there are scaling problems when shared resources are insufficient to satisfy both threads, but this limitation is shared by both architectures. It is less common in AMD's CMT architecture because fewer resources are shared, but it still happens: there have been cases of negative scaling on Bulldozer when going from 4 to 8 threads (e.g. CLOMP v3.3 @ Phoronix). This should also become very visible with Kaveri when comparing integer to FP scaling.
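These figures also reconcile with the "~20%" in the quoted post: two SMT threads at ~60% each add up to ~1.2x a single thread, while two threads on a CMT module at ~90% each add up to ~1.8x. The sketch below only does that arithmetic with the post's round numbers; it says nothing about the transistor cost of each approach, which is where CMT pays more.

```python
def aggregate_throughput(threads: int, per_thread_fraction: float) -> float:
    """Combined throughput relative to one thread running alone on the core."""
    return threads * per_thread_fraction


smt = aggregate_throughput(2, 0.60)  # two SMT threads sharing one core
cmt = aggregate_throughput(2, 0.90)  # two threads on one BD-family module
print(f"SMT: {smt:.1f}x, CMT module: {cmt:.1f}x")
```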

In any case, even a 20% speed up for 5% more xtors is a good deal. Sparc T5s are now running 8 way SMT/core. I wonder how it's implemented and what the performance speed up is with that many threads. I know simulations done by DEC showed a drop of in efficiency/xtor after 4 way SMT/core (2nd ed. of Hennessy and Patterson, IIRC).
That will depend on the workload. If you expect a high cache-miss rate but want high utilization of the execution units, 8-way could be useful. I don't know the T5's details, but I'd guess that it has switchable SMT modes like Power does. Power7 can be statically set to 1/2/4 threads per core, so you can configure your system for your specific workload; Intel's SMT can be disabled as well. This is another advantage of SMT: you can adjust the number of threads per core for specific scenarios while paying only a small die-area penalty. With CMT this is far less useful, or the penalties are higher. Hm, although the cost of 4/8-way SMT will also be higher than 5%; it would be nice to know the specific costs of the Power7 and T5 implementations.

Something like the Trinity->Richland scenario? That may well be the case, especially if Excavator *needs* 20nm.
I don't think Excavator will really need 20nm. There's been a lot of talk that the performance benefits of 20nm will be much smaller than in previous generations and that the cost per transistor won't go down much either, at least initially. And while the WSA is in force, AMD is operating under special restrictions. It may be much better to have a sub-optimal Excavator produced at GFL than to try to compete with an older architecture. If Excavator really fixes most of the BD family's weak points, AMD should get it onto the market ASAP and then shrink to 20nm when feasible.
 

Piroko

Senior member
Jan 10, 2013
True, but AFAIK nobody has used CMT in a high-performance CPU. Seems to indicate that it isn't regarded as useful for producing competitive products.
Fair enough.
Power7 can be statically set to 1/2/4 threads per core, so you can configure your system for your specific workload; Intel's SMT can be disabled as well. This is another advantage of SMT: you can adjust the number of threads per core for specific scenarios while paying only a small die-area penalty. With CMT this is far less useful, or the penalties are higher. Hm, although the cost of 4/8-way SMT will also be higher than 5%; it would be nice to know the specific costs of the Power7 and T5 implementations.
One thing that has always irked me is that 5% die-penalty figure, which I believe dates back to the P4 era?
I mean, even if Haswell's SMT itself still has such a small penalty, the question remains whether some of its uarch improvements, like larger buffers and additional ports, would get any justifiable utilization without it.
 

Ajay

Lifer
Jan 8, 2001
Keller joined AMD in August 2012; that's too late for him to have influenced the basic design of Excavator, if it comes out in 1Q15. He might have helped with some improvements/optimizations, but that's it. The design phase for complex CPUs currently seems to be in the range of 4-5 years, so a from-the-ground-up "Keller design" might come out in 2016/17. (Intel has given a 5-year project time for Haswell, it took Apple 5 years from the acquisition of P.A. Semi to the release of the A7, and even something as primitive as the original P4 took 4 years from first RTL implementation to release.) But if the result is even remotely as good as the A7, it will be worth the wait.

I was expecting EX on 20nm, so not until 1H2016; in that case, I think Keller would have had a lot of input into EX. His main influence, of course, is the reorg he did of the CPU teams in 2012, mainly by picking the 'winners' who would stay and lead the various teams and groups. Given the lower headcount, I think AMD is relying more on automation tools and doing less hand placement out of necessity; this will shorten development times, but at a cost. I wonder if Kaveri 'B' came about because AMD realized GF wasn't going to be able to hit their clock targets and Kaveri needed more hand tuning to get better clocks/higher throughput.

I don't think Excavator will really need 20nm. There's been a lot of talk that the performance benefits of 20nm will be much smaller than in previous generations and that the cost per transistor won't go down much either, at least initially. And while the WSA is in force, AMD is operating under special restrictions. It may be much better to have a sub-optimal Excavator produced at GFL than to try to compete with an older architecture. If Excavator really fixes most of the BD family's weak points, AMD should get it onto the market ASAP and then shrink to 20nm when feasible.

If EX is going to be 28nm, and is thus further along than I thought, then I agree: AMD needs to get it to market as fast as possible. I really hope that EX at least has the option of using triple- or quad-channel memory, even if most vendors don't use it. It would allow AMD to offer a 'performance' APU to their fans that wouldn't be crippled by bandwidth, assuming AMD has enough fans left to attract Asus and Gigabyte to supply the requisite motherboards.

Lastly, from the rummage room of my mind: I think that if AMD wants a place at the table in even the low-end x86 server market, they will have to add SMT, since they just can't scale CMT well enough to improve perf/watt for high-utilization multithreaded apps. But that is going to take a lot of effort and is a ways out.

On the other hand, 20nm could be an inflection point for AMD, since it will be more than a full node drop from 32nm and could allow them to place something like 16 EX cores on a die and see very significant gains in performance and perf/watt, which may allow them to re-enter the server market. GFL will have to come through with a process that's the right fit for such a product to exist. They wouldn't be a big player, but they don't have to be if margins significantly higher than what they get on desktop are available to them.
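The "more than a full node" and "16 cores" points can be sanity-checked with ideal geometric scaling, under which die area shrinks with the square of the feature-size ratio. That is an assumption: earlier in the thread it was pointed out that real 20nm density and cost gains were expected to fall short of ideal, and uncore, cache and I/O typically shrink worse than logic. The 8-core (4-module) 32nm baseline is a hypothetical stand-in for an FX-class die.

```python
FULL_NODE_LINEAR = 0.7  # conventional full-node shrink: ~0.7x linear, ~0.5x area


def linear_ratio(old_nm: float, new_nm: float) -> float:
    return new_nm / old_nm


def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Transistors per unit area, assuming perfect geometric scaling."""
    return (old_nm / new_nm) ** 2


ratio = linear_ratio(32, 20)       # 0.625, i.e. a bigger shrink than 0.7
gain = ideal_density_gain(32, 20)  # ~2.56x
baseline_cores = 8                 # hypothetical 32nm 4-module, 8-core die
print(f"linear shrink {ratio:.3f} (full node ~{FULL_NODE_LINEAR})")
print(f"ideal density gain {gain:.2f}x -> up to ~{baseline_cores * gain:.0f} cores in the same area")
```

Even with generous losses from non-ideal scaling, 16 cores sits below the ideal ceiling, which is the kind of headroom the post is counting on.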

In any case, thanks for the engaging comments.
 

NTMBK

Lifer
Nov 14, 2011
I was expecting EX on 20nm, so not until 1H2016; in that case, I think Keller would have had a lot of input into EX. His main influence, of course, is the reorg he did of the CPU teams in 2012, mainly by picking the 'winners' who would stay and lead the various teams and groups. Given the lower headcount, I think AMD is relying more on automation tools and doing less hand placement out of necessity; this will shorten development times, but at a cost. I wonder if Kaveri 'B' came about because AMD realized GF wasn't going to be able to hit their clock targets and Kaveri needed more hand tuning to get better clocks/higher throughput.

2016?! I know GlobalFoundries are bad, but surely 20nm isn't running *that* late.
 

Ajay

Lifer
Jan 8, 2001
2016?! I know GlobalFoundries are bad, but surely 20nm isn't running *that* late.

Well, how late was 28nm? If the Samsung engineers at GF are there to speed up the 20nm process, then maybe GF will have more success with 20nm. But GF is prioritizing LP, which isn't great news for AMD.
 

Ajay

Lifer
Jan 8, 2001
Who knows; 28nm HP products are next year, for example.

Well, that's good news, in a sense. Maybe EX will come out earlier rather than later and deliver higher IPC and frequencies than Kaveri. Then Keller's quote from Rage3D may have some real merit to it.
 

Homeles

Platinum Member
Dec 9, 2011
It's got double the ALUs/AGUs and FPU width, and the data caches expand to 32KB each. It'll certainly be faster per clock. If they end up fixing their memory controller issues, it could be a very compelling design.
 

NostaSeronx

Diamond Member
Sep 18, 2011
It's got double the ALUs/AGUs and FPU width, and the data caches expand to 32KB each. It'll certainly be faster per clock. If they end up fixing their memory controller issues, it could be a very compelling design.
Branch Prediction -> Logic|Doubled / RAM|6 to 8
Instruction Fetch -> Logic|Doubled / RAM|2-way 64 KB -> 3-way 96 KB
Instruction Pick -> Doubled
Instruction Decode -> Doubled
x86 core PRF/Scheduler/RetireQ -> Doubled
x86 core ALU/AGU -> Doubled
x86 core Immediate Value Storage -> Quadrupled or doubled with smaller units.
Load/store L1D/DTLB -> Doubled
Load/store L2 DTLB -> Half-doubled
x86 FPU PRF -> Quadrupled
x86 FPU Scheduler/Dispatch/Retire/FMAC/MMX -> Doubled

- What appears the same -
L1 ITLB / L2 ITLB / L1 BTB
L2 Interface
Load/Store Interface

Relative to Piledriver.
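One detail in the instruction-fetch line: going from 2-way 64 KB to 3-way 96 KB keeps the capacity per way at 32 KB, so the cache would grow by adding a way rather than by adding sets. The sketch below checks that arithmetic; the 64-byte line size is an assumption (typical for AMD x86 cores, but not stated in the post), and the helper names are illustrative.

```python
LINE_BYTES = 64  # assumed cache-line size


def way_size_kb(total_kb: int, ways: int) -> float:
    return total_kb / ways


def num_sets(total_kb: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Sets = capacity / (ways * line size)."""
    return (total_kb * 1024) // (ways * line_bytes)


old = (64, 2)  # Piledriver-class L1I: 64 KB, 2-way
new = (96, 3)  # per the list above: 96 KB, 3-way
print(way_size_kb(*old), way_size_kb(*new))  # 32.0 32.0
print(num_sets(*old), num_sets(*new))        # 512 512 -> same set indexing
```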
 