[WCCF] AMD Kaveri Mobile APUs Vs. ULV Haswell Benchmarks


Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Does AMD's implementation even allow the igp to use L2?

I would expect so for HSA at least. The CUs likely have their own caches too, but nothing as big as the L2. I would guess that any stacked eDRAM or whatever AMD could use would be an L3 cache, much like Intel's Iris Pro processors treat the eDRAM as L4.

The thing is, realistically and in terms of cost, what could AMD get from a stacked eDRAM/RAM implementation versus an additional DDR3/4 memory controller or two?
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
People mentioned Iris Pro, but Iris Pro is not only on the R versions, it's also on the mobile versions. Actually they are the same thing with different clocks, turbos and TDP windows.

Same with Kaveri, what's your point?

That the Brix were able to use the R version in these things is kind of surprising, and they are getting named because of that; we have benchmarks.

You can have a 45/65W TDP Kaveri in that small form factor if you like, and it would be way cheaper than those Iris Pro BRIX.
In fact, you can get the A10-7850K at 65W TDP by lowering CPU frequency only. GPU performance will remain almost the same as the 95W TDP SKU.

Then you decided to go and compare it to a 7850K, and... hang on a minute, I remember you telling me to my face that I can't compare the 5350 to a G1820... ok.

AMD has dual-core Richland to compete against Intel Haswell Celerons and Pentiums. They have already released new, updated entry-level A4 series SKUs (A4-4020, A4-6300 and A4-6320).
Kabini is competing against Atom-based SKUs, simple as that. Iris Pro is in the same segment as Kaveri.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Will be a nice increase for integrated GPU even if it's not as good as having an eDRAM cache.

50+% FPS gains in Linux going from DDR3-1333 to DDR3-2133.

http://www.phoronix.com/scan.php?page=article&item=amd_kaveri_memory&num=2

Granted, it will be a bit disappointing if the lowest-end DDR4 in notebooks isn't at least DDR4-1866. The JEDEC spec has a DDR4-1600 speed bin, but hopefully that will only be seen in phones and tablets.

Don't get me wrong, it will be a great increase compared to today. But the IGP will suffer the same bottleneck as today, if not an even bigger one, without something else as well.
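
For a rough sense of the numbers, here is a minimal sketch of theoretical peak bandwidth at the speed grades mentioned above. It assumes a dual-channel setup with 64-bit channels and uses the nominal transfer rates; the function name and its defaults are illustrative, and real-world throughput will be lower.

```python
def peak_bandwidth_gbs(transfers_mt_s, channels=2, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumption: dual-channel, 64-bit-per-channel memory interface.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Nominal speed grades discussed in the thread.
speed_grades = [("DDR3-1333", 1333), ("DDR3-2133", 2133), ("DDR4-1866", 1866)]

baseline = peak_bandwidth_gbs(1333)
for name, rate in speed_grades:
    bw = peak_bandwidth_gbs(rate)
    print(f"{name}: {bw:.1f} GB/s ({bw / baseline:.2f}x DDR3-1333)")
```

Going from DDR3-1333 to DDR3-2133 raises the theoretical peak by about 60%, which is in the same range as the 50+% FPS gains quoted above, as you would expect from a bandwidth-bound IGP.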
 

coercitiv

Diamond Member
Jan 24, 2014
6,402
12,861
136
Don't get me wrong, it will be a great increase compared to today. But the IGP will suffer the same bottleneck as today, if not an even bigger one, without something else as well.
Chip architecture can also help, as Maxwell shows: in the mobile field the DDR3-equipped 850M beats the 750M by as much as 80%. Sure it's a bigger chip, but it still shows what can be done using the same cheap DDR3 as the last generation.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Chip architecture can also help, as Maxwell shows: in the mobile field the DDR3-equipped 850M beats the 750M by as much as 80%. Sure it's a bigger chip, but it still shows what can be done using the same cheap DDR3 as the last generation.

Maxwell probably has larger GPU caches.
 

coercitiv

Diamond Member
Jan 24, 2014
6,402
12,861
136
Indeed. GM107 has 2MB L2, GK107 "only" has 256KB. So that's 8x the on-chip cache right there.
Cache size is only half of the story, what you do with that cache is also important.

All this makes me think I should read more about how Intel and AMD make use of their cache structure in Haswell and Kaveri. I know Haswell GPU makes use of L3 cache, but know little about Kaveri from this point of view.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
Cache size is only half of the story, what you do with that cache is also important.

Of course. I thought that went without saying. But still, adding that much extra cache is bound to have an effect.

All this makes me think I should read more about how Intel and AMD make use of their cache structure in Haswell and Kaveri. I know Haswell GPU makes use of L3 cache, but know little about Kaveri from this point of view.

Trinity/Richland/Kaveri/Kabini's GPU doesn't use the CPU's L2. The IGP is connected directly to the northbridge/memory controller by a dedicated bus.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Of course. I thought that went without saying. But still, adding that much extra cache is bound to have an effect.



Trinity/Richland/Kaveri/Kabini's GPU doesn't use the CPU's L2. The IGP is connected directly to the northbridge/memory controller by a dedicated bus.

Wow, not even Kaveri? Seems counterintuitive considering it's supposed to lead us into the HSA era. I assume there's still a GPU L2 like any other GCN CU array?
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
I assume there's still a GPU L2 like any other GCN CU array?

I haven't been able to dig up anything solid on that, but I have a strong suspicion that's the case.

I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

That would further erode CPU performance, because AMD's L2 cache is already slow. Moving everything else to L3 and sharing it with a big iGPU would certainly ensure that most CPU workloads go to main memory.
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,692
136
That would further erode CPU performance, because AMD's L2 cache is already slow. Moving everything else to L3 and sharing it with a big iGPU would certainly ensure that most CPU workloads go to main memory.

I'm not sure that's the case. Firstly, I didn't say "remove" the CPU L2 cache, just reduce its size a bit. Intel's CPUs since Nehalem have gotten along just fine with 256KB L2. AMD's L1 cache is also pretty big on Kaveri (192KB), so I don't think it'll have too much influence on CPU performance. Secondly, the above assumes that you're running both CPU- and GPU-heavy applications simultaneously. Thirdly, to make room for the additional cache, you could cut the number of CUs, e.g. from 8 to 6. You lose a bit of raw computing power, but the additional performance from the big "fast" (faster than main memory anyway) L2 would probably offset that.

I do agree that AMD's memory controller could do with an upgrade though.
 

Schmide

Diamond Member
Mar 7, 2002
5,590
724
126
I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

I'm not sure that's the case. Firstly, I didn't say "remove" the CPU L2 cache, just reduce its size a bit. Intel's CPUs since Nehalem have gotten along just fine with 256KB L2. AMD's L1 cache is also pretty big on Kaveri (192KB), so I don't think it'll have too much influence on CPU performance. Secondly, the above assumes that you're running both CPU- and GPU-heavy applications simultaneously. Thirdly, to make room for the additional cache, you could cut the number of CUs, e.g. from 8 to 6. You lose a bit of raw computing power, but the additional performance from the big "fast" (faster than main memory anyway) L2 would probably offset that.

I do agree that AMD's memory controller could do with an upgrade though.

You can't just route a cache to another subsystem and expect it to not only fit well, but also fulfill the other system's needs. A CPU cache has a fixed number of ways, meaning each set can only mirror a certain number of lines of memory before one of them needs to be evicted.
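
To illustrate the point about ways, here is a toy set-associative cache. The class name, the sizes and the LRU replacement policy are all assumptions made for illustration; this is not how any particular AMD or Intel cache is implemented.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model: num_sets sets, each holding at most `ways` lines (LRU)."""

    def __init__(self, num_sets, ways, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used line
        entries[tag] = None
        return False


# Hypothetical 2-way cache with 4 sets; addresses 0, 256 and 512 share set 0.
cache = SetAssociativeCache(num_sets=4, ways=2)
results = [cache.access(addr) for addr in (0, 256, 0, 512, 256)]
print(results)  # [False, False, True, False, False]
```

Three lines competing for a 2-way set is enough to force an eviction, so the final access to 256 misses even though it was cached earlier. A GPU streaming through large buffers would put that kind of pressure on a cache sized and organized for CPU access patterns.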

I personally think the biggest problem for the Bulldozer family was its horrid cache system. The original had an L3 that was basically the same speed as main memory and an L2 that was only fractionally better. Having possible 50-100 tick bubbles can't be good for business, especially when Intel is returning sub-20 tick bubbles for the same type of event.

Steamroller and Piledriver improved this area fractionally, but nowhere near the magnitude needed to catch up with Intel.
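
A back-of-the-envelope way to see why L2 latency matters is the standard average memory access time (AMAT) formula. The hit time, miss rates and memory latency below are hypothetical placeholders, and the two L2 latencies are only loosely based on the 20 and 50 tick figures mentioned above; none of these are measured values.

```python
def amat(l1_hit_cycles, l1_miss_rate, l2_cycles, l2_miss_rate, mem_cycles):
    """Average memory access time in cycles for a two-level cache hierarchy."""
    # Cost of an L1 miss: the L2 lookup, plus main memory when L2 also misses.
    l1_miss_penalty = l2_cycles + l2_miss_rate * mem_cycles
    return l1_hit_cycles + l1_miss_rate * l1_miss_penalty


# Hypothetical parameters: identical except for the L2 latency.
fast_l2 = amat(l1_hit_cycles=4, l1_miss_rate=0.05, l2_cycles=20,
               l2_miss_rate=0.20, mem_cycles=200)
slow_l2 = amat(l1_hit_cycles=4, l1_miss_rate=0.05, l2_cycles=50,
               l2_miss_rate=0.20, mem_cycles=200)

print(f"AMAT with 20-cycle L2: {fast_l2:.2f} cycles")
print(f"AMAT with 50-cycle L2: {slow_l2:.2f} cycles")
```

Even with a low L1 miss rate, the slower L2 raises the average cost of every memory access, and the gap widens as the L1 miss rate goes up.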
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
I haven't been able to dig up anything solid on that, but I have a strong suspicion that's the case.

I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

The GPU cores have 16KB of L1 cache. They have access to the CPU L2 cache when running HSA workloads (which of course don't exist yet).

http://www.xbitlabs.com/articles/cpu/display/amd-a10-7850k_3.html
 

coercitiv

Diamond Member
Jan 24, 2014
6,402
12,861
136
The GPU cores have 16KB of L1 cache. They have access to the CPU L2 cache when running HSA workloads (which of course don't exist yet).
Cape Verde has 512KB of L2 cache, 256KB per memory controller. AMD seems to be increasing its cache size for GCN, from 64 > 128 > 256KB per memory controller.
Since the L2 cache is tied to the memory controller, I should have figured Kaveri's iGPU would have none.

This is an interesting fact, since without some other form of caching the Kaveri iGPU has no chance of matching its dGPU brethren (the ones that are comparable in computation power).
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,868
3,419
136