[AT] AMD Kaveri APU Launch Details: Desktop, January 14th

Page 3
Status
Not open for further replies.

rtsurfer

Senior member
Oct 14, 2013
733
15
76
hmm, what about mixed cores a la big.LITTLE?

2 SR big cores [1 module]
4 bobcat cores
512 gpu style cores
+
opencl and boom!

single threaded perf and multithreaded perf in one package the size of richland

Like inf64 said, maybe Excavator will bring the single-threaded performance up, and even current Piledriver multi-threaded performance isn't that bad.

So, I think the big.LITTLE-like thing is going to be 2M SR and 512 GPU cores; no need for the Bobcat cores.

AMD's HSA = ARM's big.LITTLE, just that the LITTLE is going to be the GPU instead of another low-power CPU.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
Assuming AMD's memory controller has improved.

Not seeing the scaling. At most, in Tom's review, it is around 22% in games.
Yeah, that's the point anyway -- it's about games, not synthetics. Your chart is pretty much a worst case scenario.
I was gonna say, isn't that an issue with AMD's current memory controller? While DDR3-2133 may provide a significant amount of bandwidth over DDR3-1600, AMD's memory controller isn't efficient enough to effectively use all of it; something Intel doesn't have as much of a problem with.
I'm fairly confident that it works the opposite of what you're suggesting. A more efficient memory controller would reduce the need for higher-bandwidth memory. It'd scale better, but you'd solve your bandwidth problem more quickly, and therefore wouldn't see any real-world improvement.
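For reference, here is a back-of-the-envelope sketch of the peak numbers under discussion. It assumes a dual-channel setup with 64-bit (8-byte) channels, which nobody in the thread states explicitly, and the function name is made up for illustration. It only gives theoretical peak bandwidth, not what any memory controller actually sustains.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (assumed: dual channel, 8-byte bus)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9

ddr3_1600 = peak_bandwidth_gbs(1600)  # DDR3-1600
ddr3_2133 = peak_bandwidth_gbs(2133)  # DDR3-2133

# Relative increase in peak bandwidth when moving from 1600 to 2133
uplift = ddr3_2133 / ddr3_1600 - 1

print(f"DDR3-1600: {ddr3_1600:.1f} GB/s")
print(f"DDR3-2133: {ddr3_2133:.1f} GB/s")
print(f"Peak uplift: {uplift:.1%}")
```

Under those assumptions DDR3-2133 has about a third more peak bandwidth than DDR3-1600 on paper, so a gain of around 22% in games is below the theoretical uplift.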
Isn't it scheduled for a redesign come Excavator?
That's what's been theorized. It's not making it in with Steamroller; however, the issue "has been isolated," according to Anand. There wasn't any mention as to when AMD plans to do an overhaul, but given the timing of things, Excavator makes the most sense. Excavator will show up with DDR4, which would be the perfect time to do a re-plumbing since you're already tearing up the asphalt, so to speak.
Like inf64 said, maybe Excavator will bring the single-threaded performance up, and even current Piledriver multi-threaded performance isn't that bad.
In regards to Piledriver, it's actually pretty damn terrible, given the thread count. This is why AMD is focusing so much on the front end in Steamroller.

Excavator will bring better single threaded performance though. Die shots are already floating around -- basically the integer and floating point hardware is doubling.
 

rtsurfer

Senior member
Oct 14, 2013
733
15
76
.

In regards to Piledriver, it's actually pretty damn terrible, given the thread count. This is why AMD is focusing so much on the front end in Steamroller.

Are you talking about Piledriver multi threaded..??

Because it is in no way terrible,
http://www.anandtech.com/bench/product/697?vs=836

AMD's thread count is 8 using their module implementation, the same as Intel's 8 threads using Hyper-Threading.

MediaTek is a part of the HSA alliance

MediaTek is their partner in the HSA Foundation (my guess).

Thanks for the clarification guys..
 

rtsurfer

Senior member
Oct 14, 2013
733
15
76
It's describing mediatek/ARM products.. Just like the other slides talking about mobile products.

You might have a point.

But do you think they intend to run H.264 encoding or ray tracing on a mobile platform..??
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
Are you talking about Piledriver multi threaded..??

Because it is in no way terrible
The scaling is quite poor, relative to the die space committed to the second thread. The uplift is better than hyperthreading, however the die space penalty is far more substantial -- an additional integer execution unit and LSU, among other things. Hyperthreading is a more efficient use of transistors than the CMT implementation seen in Bulldozer and Piledriver.

Steamroller addresses this. Whether it is a more efficient concept in comparison to Intel's hyperthreading is yet to be seen, but it should be a substantial improvement over Bulldozer/Vishera.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
The scaling is quite poor, relative to the die space committed to the second thread. The uplift is better than hyperthreading, however the die space penalty is far more substantial -- an additional integer execution unit and LSU, among other things. Hyperthreading is a more efficient use of transistors than the CMT implementation seen in Bulldozer and Piledriver.

The scaling is actually quite good, much higher than Intel's SMT. You spend more logic (die space) for higher scaling. Simple as that.

A single 32nm Piledriver module at 4GHz is equal in performance to a 32nm Sandy Bridge core (+HT). Also, those two are almost the same die size.
Here we have almost the same performance from two different microarchitectures on the same litho process.

 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Excavator core seems to be going after just that, building up the individual core

(...)

But if the Ex. core comes in at ~90% of Haswell's IPC, or roughly in IB+ territory, it won't matter if Skylake is +10% or so over Haswell


Can we quote you on this?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
hmm, what about mixed cores a la big.LITTLE?

2 SR big cores [1 module]
4 bobcat cores
512 gpu style cores
+
opencl and boom!

single threaded perf and multithreaded perf in one package the size of richland

big.LITTLE only makes sense for extreme power saving, when you lack the ability to make proper cores. Qualcomm and Apple, for example, rejected big.LITTLE. So it would be a waste of space for AMD. Also, the lack of compatibility (when moving a running application) between SR and Jaguar cores would require software that isn't there yet on Windows. big.LITTLE is not even working fully on Android yet.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
The scaling is actually quite good, much higher than Intel's SMT. You spend more logic (die space) for higher scaling. Simple as that.

A single 32nm Piledriver module at 4GHz is equal in performance to a 32nm Sandy Bridge core (+HT). Also, those two are almost the same die size.
Here we have almost the same performance from two different microarchitectures on the same litho process.


That would only mean Intel's die space is used much more efficiently. And it again shows the weakness of AMD's CMT, due to the scaling issue. AMD's CMT only works in very high scaling multithreaded cases without demanding main threads. In short, you get all the good with Intel's SMT without being penalized, unlike AMD's CMT solution, which depends on very high scaling to yield the same efficiency. It also explains why AMD is slowly moving away from CMT one step at a time.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
hmm, what about mixed cores a la big.LITTLE?

2 SR big cores [1 module]
4 bobcat cores
512 gpu style cores
+
opencl and boom!

single threaded perf and multithreaded perf in one package the size of richland

AMD's competition at the bottom of the market isn't 240mm^2 CPUs like the ones they are selling today, but the 120mm^2 and sub-100mm^2 Silvermont CPUs. If anything, the only thing that would "boom" is AMD's margins with this chip, because they would need extra everything in order to support the extra small cores on the already big APU die.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
That would only mean Intel's die space is used much more efficiently. And it again shows the weakness of AMD's CMT, due to the scaling issue. AMD's CMT only works in very high scaling multithreaded cases without demanding main threads. In short, you get all the good with Intel's SMT without being penalized, unlike AMD's CMT solution, which depends on very high scaling to yield the same efficiency. It also explains why AMD is slowly moving away from CMT one step at a time.

I would say that AMD CMT does not work at all. It's misleading to compare core per core, because all the support infrastructure that the core needs to work just isn't there. 32nm Sandy Bridge without the CPU part is 30% smaller than the CPU part of a 32nm APU. And it's not blank space; it's transistor budget that consumes power even when only leaking, making the design a lot less efficient than Intel's.

In servers we can see how badly CMT designs scale, because once AMD tried to go over 4 modules they could achieve only paltry clocks, while Intel could get much higher clocks out of their big-die parts. While people here on the forums are clamoring for an 8-core Steamroller, Intel has 12-core/24-thread parts that would mop the floor with AMD chips in whatever multithreaded task you can think of, and even the 8C parts would be enough to hold the line against whatever AMD throws at them. AMD has only the poor scaling of their designs to blame for their server debacle.

And speaking about sharing... SMT is about sharing *all* resources of the core, while CMT is about sharing just a few of the resources. Intel (and IBM, and Sun, and everyone with SMT) can go huge on core resources and IPC because the resources will be used by more than one thread at a given time, while AMD cannot, because if they go huge on core resources it might end up with only added leakage while the core sits still for lack of threads. This is the reason for the anemic core (which they tried to compensate for with high clocks), and this is why you cannot expect much IPC from AMD parts (I'll really save that 90% post for posterity). CMT ends up delivering a much more inflexible processor than SMT, one that only shines when there are a lot of light threads on the fly, and sucks at everything else.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
AMD's competition at the bottom of the market isn't 240mm^2 CPUs like the ones they are selling today, but the 120mm^2 and sub-100mm^2 Silvermont CPUs. If anything, the only thing that would "boom" is AMD's margins with this chip, because they would need extra everything in order to support the extra small cores on the already big APU die.

Silvermont and Kaveri in competition? Please! Jaguar and Silvermont are the competitors in the craptop market.

But I agree, adding these little cores to a big core die makes little sense. They'd be better off widening the resources in a module, and having SMT+CMT (4 threads per module).
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Silvermont and Kaveri in competition? Please! Jaguar and Silvermont are the competitors in the craptop market.

Oh, they are. Last time I noticed AMD didn't have many places to dump their single module parts.

But I agree, adding these little cores to a big core die makes little sense. They'd be better off widening the resources in a module, and having SMT+CMT (4 threads per module).

They would be better off falling back to conventional designs like everyone else did.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
I would say that AMD CMT does not work at all. It's misleading to compare core per core, because all the support infrastructure that the core needs to work just isn't there. 32nm Sandy Bridge without the CPU part is 30% smaller than the CPU part of a 32nm APU. And it's not blank space; it's transistor budget that consumes power even when only leaking, making the design a lot less efficient than Intel's.

In servers we can see how badly CMT designs scale, because once AMD tried to go over 4 modules they could achieve only paltry clocks, while Intel could get much higher clocks out of their big-die parts. While people here on the forums are clamoring for an 8-core Steamroller, Intel has 12-core/24-thread parts that would mop the floor with AMD chips in whatever multithreaded task you can think of, and even the 8C parts would be enough to hold the line against whatever AMD throws at them. AMD has only the poor scaling of their designs to blame for their server debacle.

And speaking about sharing... SMT is about sharing *all* resources of the core, while CMT is about sharing just a few of the resources. Intel (and IBM, and Sun, and everyone with SMT) can go huge on core resources and IPC because the resources will be used by more than one thread at a given time, while AMD cannot, because if they go huge on core resources it might end up with only added leakage while the core sits still for lack of threads. This is the reason for the anemic core (which they tried to compensate for with high clocks), and this is why you cannot expect much IPC from AMD parts (I'll really save that 90% post for posterity). CMT ends up delivering a much more inflexible processor than SMT, one that only shines when there are a lot of light threads on the fly, and sucks at everything else.

Don't read too much into the fact that AMD's >4 module server parts had low clocks. Remember, these aren't more than 4 modules on a die; it's just two FX-8350 dies connected together by HyperTransport. It's a dual-socket system on a single package, not a true 16-core part. The clock drops are due to thermal constraints, not any difference in the core itself.

And don't forget that Intel has a much more advanced uncore design. AMD are just hanging more and more modules off a crossbar, while Intel has a fast bidirectional ringbus. It's hard to know what comes down to core design, and what comes down to uncore. AMD for some reason cancelled their 10-core Terramar die, which was meant to come on a new platform, and instead doubled down on their already outdated AM3/MCM server platform. I don't know why, but I suspect that the on-chip fabric just didn't scale well enough to handle that many cores.

EDIT: Although we're getting pretty off topic here. We may be better off making a new thread to discuss CMT vs SMT before this thread turns into another flamewar again. Server level scaling doesn't really matter to Kaveri.
 

SiliconWars

Platinum Member
Dec 29, 2012
2,346
0
0
Oh, they are. Last time I noticed AMD didn't have many places to dump their single module parts.

What's the difference between AMD and Intel in this? Where are all those dual-core Celerons and Pentiums going? Both companies' salvage products are starting to look very weak because of their new craptop chips.

They would be better off falling back to conventional designs like everyone else did.

I'm sure they will be after Excavator.
 

NaroonGTX

Member
Nov 6, 2013
106
0
76
I'm pretty sure AMD will stick with the modular design past EX, at least for the big-core APUs.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
And don't forget that Intel has a much more advanced uncore design. AMD are just hanging more and more modules off a crossbar, while Intel has a fast bidirectional ringbus. It's hard to know what comes down to core design, and what comes down to uncore. AMD for some reason cancelled their 10-core Terramar die, which was meant to come on a new platform, and instead doubled down on their already outdated AM3/MCM server platform. I don't know why, but I suspect that the on-chip fabric just didn't scale well enough to handle that many cores.

Well, I agree that the scaled-up parts bring some issues beyond the core, but even if we stay with consumer parts we see Intel fielding smaller, more power-efficient CPU parts than AMD, and that at the same node. With CMT, AMD got neither the performance nor the die space savings they said they would. Steamroller simply doesn't address any of the issues; that's why I'm not very upbeat about it.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
On a side note, it looks like the Haswell Core i3 CPUs are not using a dedicated die unless they are the ones with an HD5000 series IGP:

http://www.anandtech.com/show/7003/the-haswell-review-intel-core-i74770k-i54560k-tested/5

There is a distinct lack of information about the die size of the 2C GT2 parts, unlike in the past.

So, that might (and I say might) mean the Haswell-based Pentiums and Celerons will be using a 177mm^2 die too. It would also explain why the Core i3 now has the top desktop IGP, i.e. the HD4600, as standard.

Ultimately, both companies will probably want to sell smaller chips at the lower price points eventually.

Anyway, this is all off-topic - back to AMD Kaveri.
 