6 or 8 core Steamroller based AMD CPU likely?


Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Well it's certainly much cheaper than designing a new CPU where all the parts have to be designed from scratch (or "upgraded" from previous design). E.g. going from Richland->Kaveri should be much more expensive compared to reusing IP blocks from the Kaveri design in a 6/8 core CPU based on the Kaveri Steamroller CPU core.

Regarding the reason why we're not seeing all the possible CPU versions you mentioned from AMD, I would say the primary reason is that AMD sells in much less volume. So they simply can't have that many SKUs, because the volume of each SKU would be too small. Entering a new segment would be a different matter though. I.e. if they make 6 and 8 core CPUs without iGPU, they can enter a completely different market segment for relatively little cost. Hence it makes sense.

Also, look at Intel instead and the multitude of SKUs they have. Surely they would not have so many SKU variants if it would require a huge additional cost for each SKU.

It's not just about whether or not the potential profit offsets the initial costs - AMD needs to have the money to make those different SKUs in the first place. Intel has a lot of volume potential and they also have a lot of money. It probably also costs them less to make all these different dies because they own the fabs.

But yes, if they can afford it in the first place then it comes down to volume.. and in the end FX may now basically be just another SKU they can't justify a die for.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Perhaps, but since both PS4 and XBONE now have 8 core CPUs, it's a trend that will be much more common going forward. If you don't code your next-gen console game to make use of that you'll get run over by other games that do. Thus 6 and 8 core AMD CPUs based on the Steamroller CPU make much more sense now than back when the 6/8 core Bulldozer based FX CPUs were released.

You forget a fast dualcore got the same performance as consoles. And that 2 cores are locked for other usage than gaming. So 8 cores is not even in question. Not to mention the performance scaling.

And as shown, your rationale with BF4 was completely off since it was already there in BF3.

According to AMD's server roadmaps there won't be more than 2M/4T SR in 2014. So a 6 or 8 core SR is purely hypothetical.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
408
126
You forget a fast dualcore got the same performance as consoles.
If they've already coded the game for parallelism on the PS4/XBONE it's not like they will disable that in the PC version. So if the PC has more than 2 cores then they'll be used too. And that will enable the PC version of the game to surpass the console version of the same game.
And that 2 cores are locked for other usage than gaming.
That's not known for sure; so far it's only speculation. Also, often when you decide to code for parallelism it does not matter if you target 6 or 8 or N cores. If you do it in a nice way it can often scale nicely to any number of cores. Just as an example, when coding PC applications several frameworks like .NET handle that automatically for you if you use the correct APIs when coding for parallelism.
So 8 cores is not even in question.
Wrong. See above.
Not to mention the performance scaling.
Wrong. See above.
And as shown, your rationale with BF4 was completely off since it was already there in BF3.
It was just an example. The point is that 8 core CPUs will now be the standard on both PS4 and XBONE, so it's what game developers will have to adapt to and make use of, or they'll get run over by those who do.
According to AMD's server roadmaps there won't be more than 2M/4T SR in 2014. So a 6 or 8 core SR is purely hypothetical.
Sure, anything is speculation until officially announced. E.g. Intel's Broadwell-K was not on the roadmap for 2014 until very late either.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
You should contact all software developers. You have obviously single-handedly solved all multithreading issues. You will be a rich man.

And consoles have already been demonstrated to you several times. You just keep refusing to accept that they don't use 8 cores for gaming.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
408
126
You should contact all software developers. You have obviously single-handedly solved all multithreading issues. You will be a rich man.

And consoles have already been demonstrated to you several times. You just keep refusing to accept that they don't use 8 cores for gaming.

You're being ridiculous. Obviously it takes some work to make code parallelized. But going forward that is the trend we'll see.

We're entering a new era now with 8 CPU core consoles. So what history has "demonstrated" does not say so much about the future. Because previously it was not worth the effort to make code use 6/8+ cores simply because it was very uncommon to have HW making use of that, but now that is changing.

But I know you'd rather live in that past with single core CPUs and single threaded code being the only options allowed. I don't think any facts, reasoning or new HW & SW will make you change your mind on that.
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
You're being ridiculous. Obviously it takes some work to make code parallelized. But going forward that is the trend we'll see.

We're entering a new era now with 8 CPU core consoles. So what history has "demonstrated" does not say so much about the future. Because previously it was not worth the effort to make code use 6/8+ cores simply because it was very uncommon to have HW making use of that, but now that is changing.

But I know you'd rather live in that past with single core CPUs and single threaded code being the only options allowed. I don't think any facts, reasoning or new HW & SW will make you change your mind on that.

So in your Brave New World of 8 cores, what do you think of future AMD roadmaps showing only 4 core products? :hmm:
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
AMD did use to make more specialised dies: the Phenom II generation came in two, four or six core die sizes, and some two core dies physically had no L3 (not just disabled L3 cache). So it's not entirely beyond the realms of possibility. But AMD are in such a tight monetary bind right now that I'm not sure it's very likely. A six core Steamroller die on FM2+ would be nice, but there's no sign of it so far.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
408
126
So in your Brave New World of 8 cores, what do you think of future AMD roadmaps showing only 4 core products? :hmm:

Well, not everybody needs or desires 6 or 8 core CPUs of course. I think AMD is starting out with 4 core Kaveri CPUs, which is most mainstream. Then they might add 6 or 8 core versions later (maybe some without iGPU in that case), perhaps in late 2014 or so. Possibly e.g. when PS4/XBONE has made sure there are enough games on the market making use of more cores, so that demand for CPUs with 6/8 cores rises on the desktop PC market as well (if we're talking about the gaming PC segment).

As some have said, AMD is in a tight monetary situation so they can't do everything at once. But possibly in sequence.
 
Jun 24, 2012
112
0
0
You're being ridiculous. Obviously it takes some work to make code parallelized. But going forward that is the trend we'll see.

We're entering a new era now with 8 CPU core consoles. So what history has "demonstrated" does not say so much about the future. Because previously it was not worth the effort to make code use 6/8+ cores simply because it was very uncommon to have HW making use of that, but now that is changing.

But I know you'd rather live in that past with single core CPUs and single threaded code being the only options allowed. I don't think any facts, reasoning or new HW & SW will make you change your mind on that.


You're not getting it. The reason that 8 cores might be used on a Jaguar-based CPU, but not on a higher-end PC, is that the code might need a certain level of performance, and once it gets it, either from an extremely awesome dual core or a really sluggish 8 core, it's done.

It's like you have a quart of milk that is your optimal performance in a reasonable world where performance dropoff hits. You can fill a tall, narrow container or you can fill a short, squat one, but either way there's only so much milk before you hit performance dropoff that makes it not scale much farther. We see this all the time.

Just because a game is designed to use the 8 cores (more like 5 since Xbone uses 3 cores for its OSes and other things), it doesn't mean it's not also designed around those slower cores. If given faster cores, it might be optimal to just run on fewer cores. CPUs just don't scale that well with the games we've seen thus far.

It's pure speculation to say that they're going to find some magic sauce that's going to suddenly make any computer application--game or otherwise--run well and need 8 cores.

Especially since the Xbone is only giving its developers access to five of those cores anyway. Plus, the PS4 is giving six.

That, btw, is the reason AMD isn't pushing more than quad-cores (if you can call their 2 module CPU's quadcores at all because they're a hybrid between hyperthreading and real cores, but they ain't real cores). The next gen consoles are going to have games built to look for 5-6 cores, not 8. With PC CPU's so much faster (literally in speed and in performance), even an AMD Kaveri quad-core should match the eight cores of a Jaguar-based CPU at much lower speeds.

Even with OS overhead. The problem Kaveri and other APU's have is they can't match the integrated GPU of what's in the PS4. I'm not as confident about the Xbone's APU being ahead, though.
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,938
408
126
You're not getting it. The reason that 8 cores might be used on a Jaguar-based CPU, but not on a higher-end PC, is that the code might need a certain level of performance, and once it gets it, either from an extremely awesome dual core or a really sluggish 8 core, it's done.

It's like you have a quart of milk that is your optimal performance in a reasonable world where performance dropoff hits. You can fill a tall, narrow container or you can fill a short, squat one, but either way there's only so much milk before you hit performance dropoff that makes it not scale much farther. We see this all the time.

Just because a game is designed to use the 8 cores (more like 5 since Xbone uses 3 cores for its OSes and other things), it doesn't mean it's not also designed around those slower cores. If given faster cores, it might be optimal to just run on fewer cores.
Is all you're trying to say that some games will not make use of more processing power despite it being available, when a certain threshold has been reached? Because then I fully agree with you.

But that does not mean that all games designed for PS4/XBONE will not make use of additional CPU performance on a PC, if available.

Also, note that games can be CPU bottlenecked because of poor parallelism too. And that's what we've mainly been discussing here. I.e. if a game is only designed to make use of 1 or 2 CPU cores, it does not matter if you have 8. And my point is that with the PS4/XBONE that kind of bottlenecking due to poor parallelism will likely be less common going forward, since the games will have to be designed for parallelism to make good use of the 8 core CPU available in those next-gen consoles.
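To put rough numbers on that kind of bottleneck, here is a minimal sketch using Amdahl's law. The model is my choice for illustration, and the parallel fractions are made-up values, not measurements from any game or console.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for fraction in (0.5, 0.9, 0.95):
        row = ", ".join(
            f"{cores} cores: {amdahl_speedup(fraction, cores):.2f}x"
            for cores in (2, 4, 8)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Under this model, a game where half of each frame is serial can never reach 2x no matter how many cores it gets, while one that is mostly parallel keeps gaining from the extra cores.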
CPUs just don't scale that well with the games we've seen thus far.

It's pure speculation to say that they're going to find some magic sauce that's going to suddenly make any computer application--game or otherwise--run well and need 8 cores.

Especially since the Xbone is only giving its developers access to five of those cores anyway. Plus, the PS4 is giving six.
See this. Also to clarify, if you code for parallelism in a good way, you don't design specifically for X number of cores. Instead you try to redesign as much of your code as possible to make it possible to execute in parallel. And then you just throw all those parallel work tasks onto the OS/framework scheduler and let it distribute the work among the CPU cores available, be that 2 or 16 cores.
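A minimal sketch of that pattern, using Python's standard `concurrent.futures` pool rather than any game engine's scheduler. `simulate_entity` and `run_frame` are hypothetical names standing in for independent work items. Note that CPython threads share the interpreter lock, so this shows the structure (no core count baked into the code) rather than a real speedup; for CPU-bound work you would swap in `ProcessPoolExecutor`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def simulate_entity(entity_id: int) -> int:
    """Hypothetical stand-in for one independent unit of per-frame work."""
    total = 0
    for step in range(1_000):
        total = (total + entity_id * step) % 1_000_003
    return total


def run_frame(entity_ids, max_workers=None):
    """Hand every task to the pool and let it spread them over its workers.

    With max_workers=None the library picks a default based on the CPU
    count it detects, so nothing here assumes 2, 8 or 16 cores.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in submission order.
        return list(pool.map(simulate_entity, entity_ids))


if __name__ == "__main__":
    print("logical CPUs reported:", os.cpu_count())
    results = run_frame(range(64))
    print("tasks completed:", len(results))
```

The same `run_frame` call works unchanged whether the pool ends up with 2 workers or 16; only the wall-clock time differs.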
That, btw, is the reason AMD isn't pushing more than quad-cores (if you can call their 2 module CPU's quadcores at all because they're a hybrid between hyperthreading and real cores, but they ain't real cores). The next gen consoles are going to have games built to look for 5-6 cores, not 8. With PC CPU's so much faster (literally in speed and in performance), even an AMD Kaveri quad-core should match the eight cores of a Jaguar-based CPU at much lower speeds.
See this. And also the other post I linked to above.
Even with OS overhead. The problem Kaveri and other APU's have is they can't match the integrated GPU of what's in the PS4. I'm not as confident about the Xbone's APU being ahead, though.
I agree, but that does not contradict anything I have stated, and I'm not sure what your point is in bringing that up.

As I mentioned before a 6/8 core Steamroller based CPU does not have to contain any iGPU. And it could be used together with a discrete GFX card, so it can surpass the PS4/XBONE iGPU easily.
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
SteamrollerB (a.k.a. Steamroller 2.0, a.k.a. bdver3b) has been known about for quite some time: http://www.fudzilla.com/home/item/29986-richland-successor-in-2014-is-kaveri

That, btw, is the reason AMD isn't pushing more than quad-cores (if you can call their 2 module CPU's quadcores at all because they're a hybrid between hyperthreading and real cores, but they ain't real cores).

This is incorrect. A 2 module chip has 4 real physical integer units on it. Don't use the "b-b-but they share the FPU!" thing as an excuse because it doesn't work. In that case any chip that came before the i486 wasn't a 'real' CPU because there wasn't an FPU on the die. There are two FMACs which can either work as 2x 128-bit units or combine into 1x 256-bit unit for AVX instructions. In comparison, a Phenom II x4 chip had four separate physical INT units on it and each one had a 128-bit FPU on it. The modular approach saves die space and increases efficiency.

I also don't see the point in trying to downplay the fact that the consoles have 8 cores directly affecting the PC games. We've already seen several titles such as BF4, Watch_Dogs, Need For Speed: Rivals, etc. recommend hexa- or octo-cores for the PC versions. This means that yes, more games will be properly multi-threaded on PC finally and anyone with the proper chips will get better performance. The consoles having eight low-power weaker cores doesn't mean the devs will code on that platform (which is also x86-64) and then suddenly dump that work and "limit" the PC versions.

To further expand on this, Planetside 2 on PC will receive an update later which will give the engine better multi-threaded capabilities. This is because the game is coming to PS4, which has eight weak cores on it. They obviously needed to tweak the engine to get the game running properly, so this will trickle over to the PC version as well. Source

I agree that not everyone needs more than 4 cores necessarily, especially not gamers. Kaveri's increased integer performance will be more impressive in games, especially single-threaded ones than any previous octocore chip. I myself occasionally run very CPU-heavy emulators like PCSX2 and Dolphin, so I would much prefer Kaveri's SR cores over BD or PD.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
This is incorrect. A 2 module chip has 4 real physical integer units on it.

Nobody is saying differently, but those integer cores do share the front-end and instruction decoder. That happens to be 4 instructions wide, one more than the Phenom II.

 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
It's only the decode that's being "individualized" in Steamroller. The fetch etc. is still shared. AMD is slowly moving away from the failed CMT concept.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
You're saying that as if the extra decode hardware is a minor addition -- it's not. The L1D will also be separated into two 48KB partitions, one for each core.

And it's hard to say what's planned after Excavator.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
Not Steamroller, which is the subject of this thread and the ongoing discussion.

That was some poor wording on my part. About the only difference between that Bulldozer diagram and Steamroller is that the decoders are being duplicated (4 x2) and the L1 cache is being doubled.

But if it delivers a 15% increase in IPC, that's fine by me. We'll know soon enough.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
You're saying that as if the extra decode hardware is a minor addition -- it's not. The L1D will also be separated into two 48KB partitions, one for each core.

And it's hard to say what's planned after Excavator.

I assume you meant L1I, not L1D. Where are you hearing that the icache will be partitioned per core? This is the first I've heard something like that.

Bulldozer/Piledriver has a 2-way set associative 64KB instruction cache like all of its predecessors going back to the K7 (and even the K6 had 2-way set associative 32KB instruction cache). The natural extension to get 96KB would be to move to a 3-way set associative instruction cache; incidentally this is what Cortex-A57 is doing too. You can't partition this into two separate 48KB caches, you can't have 1.5-way caches.
 

sniffin

Member
Jun 29, 2013
141
22
81
You should contact all software developers. You have obviously single-handedly solved all multithreading issues. You will be a rich man.

And consoles have already been demonstrated to you several times. You just keep refusing to accept that they don't use 8 cores for gaming.

Except that developers have been using the 6 available SPEs onboard the PS3 for some time now out of necessity, with the added bonus of them being far more difficult to use than general purpose cores like Jaguar.

Multithreading in gaming isn't a magical unicorn. When it's necessary they make it happen
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Except that developers have been using the 6 available SPEs onboard the PS3 for some time now out of necessity, with the added bonus of them being far more difficult to use than general purpose cores like Jaguar.

Multithreading in gaming isn't a magical unicorn. When it's necessary they make it happen

The thing with PS3 is that as years went on the SPEs were used more and more for tasks that you'd use the GPU for now - or even things you'd normally use the GPU for then (if the GPU were stronger).
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Bulldozer/Piledriver has a 2-way set associative 64KB instruction cache like all of its predecessors going back to the K7 (and even the K6 had 2-way set associative 32KB instruction cache). The natural extension to get 96KB would be to move to a 3-way set associative instruction cache; incidentally this is what Cortex-A57 is doing too. You can't partition this into two separate 48KB caches, you can't have 1.5-way caches.

I dont think you can have a 3-way set associative cache. It has to be a power of 2. Also, the size of the cache shouldnt have any direct bearing on how many "ways" it is. iirc, increasing a 2-way set associative to a 4-way set associative just means that you can cache 4 totally separate sections of main memory (which allows for more branch paths to be cached).

What I dont get is why AMD caches are 128 bit when Intel's are 256 and Intel completely destroys AMD mainly due to its massive cache performance advantage. An FX8350 has 8MB of L2 and 8MB of L3. What they should have is simply a large shared L2 and no L3, but the cache datapath/width should be 256 bit. The sheer wiring mess was probably too much for them to handle. But they should at least make the L1 256 bit. All I know is AMD is probably painted into a corner with their cache design. It is big and slow and needs a complete rework.
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
Nobody is saying differently, but those integer cores do share the front-end and instruction decoder. That happens to be 4 instructions wide, one more than the Phenom II.

The guy I quoted was saying differently. CMT isn't hyperthreading or close to it. The only reason people say that is because of the shared resources. Intel's implementation of SMT shares many more resources than this CMT implementation does...and the reason why multi-threaded perf doesn't scale 100% but rather 80% is because of the narrow decoder (which is fixed in Steamroller), not the shared FPU.

AMD is slowly moving away from the failed CMT concept.

Can you enlighten me on how it's failed if it actually works exactly as they said it would?

The thing with PS3 is that as years went on the SPEs were used more and more for tasks that you'd use the GPU for now - or even things you'd normally use the GPU for then (if the GPU were stronger).

An interesting thing is how originally, the PS3 didn't have a real dedicated GPU. They were gonna have two XPE's (no idea what this stood for, but it's basically the Cell/PPE) rather than just one. One would have taken the traditional CPU role, and the other would be used as the GPU... Naughty Dog were apparently the first to tell the Japanese engineers that this was insane, and wouldn't work too well, which is the reason why the PS3 missed its original late 2005 launch date and got delayed a year.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The benefit of Intels hyperthreading is the extremely small die usage. Its essentially free.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I dont think you can have a 3-way set associative cache. It has to be a power of 2. Also, the size of the cache shouldnt have any direct bearing on how many "ways" it is. iirc, increasing a 2-way set associative to a 4-way set associative just means that you can cache 4 totally separate sections of main memory (which allows for more branch paths to be cached).

Of course you can have set-associativity that isn't a power of two, why would you think otherwise? Cortex-A57 is one example I gave that's 3-way. For another example look at all Atoms (both the old in-order Bonnell/Saltwell and the newer Silvermont) which sport 24KB six-way set associative caches.

I'm not sure you totally understand how set-associativity in caching works. The way isn't picked from the address bits; the address bits only select the set, and the tags for each way in that set are compared in parallel (ignoring some kind of way prediction). So the number of ways can be totally arbitrary.

The size of each way, on the other hand, must be a power of two. So when you see a cache that's overall a non-power-of-two size like 96KB, that means that the number of ways is not a power of two.

I don't know why you think I'm saying that the size of the cache has any direct relation with the associativity, I just said that increasing from 64KB/2-way to 96KB/3-way is a natural evolution. It's also what BSN claims that a confidential AMD document says. If what Homeles says is true it'd have to be split into something like 6-ways total.
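As a rough illustration of that arithmetic, here is a small sketch that works out the geometry of the caches mentioned above. It uses the usual textbook naming (address bits pick a set, each set holds one line per way), and the 64-byte line size is an assumption for the example, not a figure taken from any AMD or BSN document. `cache_geometry` is a hypothetical helper.

```python
def cache_geometry(size_kb: int, ways: int, line_bytes: int = 64):
    """Return (number_of_sets, bytes_per_way) for a set-associative cache."""
    size_bytes = size_kb * 1024
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a whole multiple of ways * line size")
    sets = size_bytes // (ways * line_bytes)
    return sets, sets * line_bytes


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    # Line size of 64 bytes is assumed throughout.
    for label, size_kb, ways in (
        ("64KB 2-way", 64, 2),
        ("96KB 3-way", 96, 3),
        ("24KB 6-way", 24, 6),
    ):
        sets, way_bytes = cache_geometry(size_kb, ways)
        print(
            f"{label}: {sets} sets, {way_bytes // 1024}KB per way, "
            f"power-of-two set count: {is_power_of_two(sets)}"
        )
```

Under that assumption, going from 64KB/2-way to 96KB/3-way keeps the same 32KB per way and simply adds one way, whereas a 48KB cache built from 32KB ways would need 48/32 = 1.5 of them.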

What I dont get is why AMD caches are 128 bit when Intel's are 256 and Intel completely destroys AMD mainly due to its massive cache performance advantage. An FX8350 has 8MB of L2 and 8MB of L3. What they should have is simply a large shared L2 and no L3, but the cache datapath/width should be 256 bit. The sheer wiring mess was probably too much for them to handle. But they should at least make the L1 256 bit. All I know is AMD is probably painted into a corner with their cache design. It is big and slow and needs a complete rework.

I expect AMD would generally benefit more from lowering L2 latency than increasing its bandwidth. 256-bit L1 makes no sense so long as nothing in the system performs 256-bit transactions (ie, AVX/AVX2).

The guy I quoted was saying differently. CMT isn't hyperthreading or close to it. The only reason people say that is because of the shared resources. Intel's implementation of SMT shares many more resources than this CMT implementation does...and the reason why multi-threaded perf doesn't scale 100% but rather 80% is because of the narrow decoder (which is fixed in Steamroller), not the shared FPU.

There's other stuff that's shared besides decode and FPU/SIMD (everything else in the frontend: branch prediction, icache, ITLB, fetch/PFB). Without an in-depth analysis you can't say that most or all of it was because of a decode bottleneck alone, although at a glance it looks like the single most limiting thing by far.
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
Going off the slides from Hot Chips 2012, it seems whatever was causing the CMT thread-scaling penalty was highlighted and subsequently nipped in the bud with SR. The fact that some AMD engineers kept saying SR is what BD "was supposed to be" seems accurate there, but you're right that it may have been more than the decoder.

The benefit of Intels hyperthreading is the extremely small die usage. Its essentially free.

Uses less die space but doesn't scale as well as real cores do. The point of the modular design was to lower die space as well and it pretty much works. HT itself, or rather Intel's implementation of it at least has always left me unimpressed. IBM's POWER series on the other hand...
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Uses less die space but doesn't scale as well as real cores do. The point of the modular design was to lower die space as well and it pretty much works. HT itself, or rather Intel's implementation of it at least has always left me unimpressed. IBM's POWER series on the other hand...

So you are unimpressed with getting 10-25% for less than 5% extra core diespace? Yet you are impressed by IBMs solution? Hah..
 