[WCCF] AMD Kaveri Mobile APUs Vs. ULV Haswell Benchmarks

Blitzvogel · May 11, 2014

Enigmoid said:
Does AMD's implementation even allow the igp to use L2?

I would expect so for HSA at least. The new CUs likely have their own caches too, but nothing as big as the L2. I would guess that any stacked eDRAM or whatever AMD could use would be an L3 cache, much like Intel's Iris Pro processors recognize the eDRAM as L4.

The thing is, realistically and in terms of cost, what could AMD get from a stacked eDRAM/RAM implementation versus an additional DDR3/4 memory controller or two?

AtenRa · May 12, 2014

Shivansps said:
People mentioned Iris Pro, but Iris Pro is not only on the R versions, it's also on the mobile versions; actually they are the same thing with different clocks, turbos and TDP windows.

Same with Kaveri, what's your point?

Shivansps said:
That the Brix were able to use the R version in these things is kinda surprising, and they are getting named because of that; we have benchmarks.

You can have a 45/65W TDP Kaveri in that small form factor if you like, and it would be way cheaper than those Iris Pro BRIX.
In fact, you can get the A10-7850K at 65W TDP by lowering CPU frequency only. GPU performance will remain almost the same as the 95W TDP SKU.

Shivansps said:
Then you decided to go and compare it to a 7850K, and... hang on a minute, I remember you telling me to my face that I can't compare the 5350 to a G1820... ok.

AMD has dual-core Richland to compete against Intel Haswell Celerons and Pentiums. They have already released new, updated entry-level A4 series SKUs (A4-4020, A4-6300 and A4-6320).
Kabini is competing against Atom-based SKUs, simple as that. Iris Pro is in the same segment as Kaveri.

ShintaiDK · May 12, 2014

Vesku said:
Will be a nice increase for integrated GPU even if it's not as good as having an eDRAM cache.

50+% FPS gains in Linux going from DDR3-1333 to DDR3-2133.

http://www.phoronix.com/scan.php?page=article&item=amd_kaveri_memory&num=2

Granted will be a bit disappointing if lowest end DDR4 in notebooks isn't at least DDR4-1866. JEDEC spec has a DDR4-1600 speed bin but hopefully that will only be seen in phones and tablets.

Don't get me wrong, it will be a great increase compared to today. But the IGP will suffer the same if not an even higher bottleneck than today without something else as well.
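
For reference, the theoretical peak bandwidth behind the DDR3-1333 vs DDR3-2133 comparison above can be worked out quickly. This is only a rough sketch: it assumes a dual-channel setup with 64-bit (8-byte) channels, and the function name is made up for illustration. Real-world throughput is lower than these peak figures.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes `channels` independent channels, each `bus_bytes` wide
    (8 bytes = 64 bits). Both defaults are assumptions, not measurements.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    speeds = [("DDR3-1333", 1333), ("DDR3-2133", 2133),
              ("DDR4-1600", 1600), ("DDR4-1866", 1866)]
    baseline = peak_bandwidth_gbs(1333)
    for name, rate in speeds:
        bw = peak_bandwidth_gbs(rate)
        # Ratio is relative to dual-channel DDR3-1333.
        print(f"{name}: {bw:.1f} GB/s peak, {bw / baseline:.2f}x DDR3-1333")
```

Going from 1333 to 2133 MT/s is a 60% increase in peak bandwidth, so a 50+% FPS gain suggests the IGP in that test was scaling almost linearly with memory bandwidth.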

coercitiv · May 12, 2014

ShintaiDK said:
Don't get me wrong, it will be a great increase compared to today. But the IGP will suffer the same if not an even higher bottleneck than today without something else as well.

Chip architecture can also help, as Maxwell shows: in the mobile field the DDR3 equipped 850M beats 750M by as much as 80%. Sure it's a bigger chip, but it still shows what can be done using the same cheap DDR3 as the last generation.

Blitzvogel · May 12, 2014

coercitiv said:
Chip architecture can also help, as Maxwell shows: in the mobile field the DDR3 equipped 850M beats 750M by as much as 80%. Sure it's a bigger chip, but it still shows what can be done using the same cheap DDR3 as the last generation.

Maxwell probably has larger GPU caches.

Insert_Nickname · May 13, 2014

NUSNA_Moebius said:
Maxwell probably has larger GPU caches.

Indeed. GM107 has 2MB L2, GK107 "only" has 256KB. So that's 8x the on-chip cache right there.

It'll be interesting to see if the bigger Maxwell cores increase that further.

coercitiv · May 13, 2014

Insert_Nickname said:
Indeed. GM107 has 2MB L2, GK107 "only" has 256KB. So that's 8x the on-chip cache right there.

Cache size is only half of the story, what you do with that cache is also important.

All this makes me think I should read more about how Intel and AMD make use of their cache structure in Haswell and Kaveri. I know Haswell GPU makes use of L3 cache, but know little about Kaveri from this point of view.

Insert_Nickname · May 13, 2014

coercitiv said:
Cache size is only half of the story, what you do with that cache is also important.

Of course. I thought that went without saying. But still, adding that much extra cache is bound to have an effect.

coercitiv said:
All this makes me think I should read more about how Intel and AMD make use of their cache structure in Haswell and Kaveri. I know Haswell GPU makes use of L3 cache, but know little about Kaveri from this point of view.

Trinity/Richland/Kaveri/Kabini's GPU doesn't use the CPU's L2. The IGP is connected directly to the northbridge/memory controller by a dedicated bus.

Blitzvogel · May 13, 2014

Insert_Nickname said:
Of course. I thought that went without saying. But still, adding that much extra cache is bound to have an effect.

Trinity/Richland/Kaveri/Kabini's GPU doesn't use the CPU's L2. The IGP is connected directly to the northbridge/memory controller by a dedicated bus.

Wow, not even Kaveri? Seems counterintuitive considering it's supposed to lead us into the HSA era. I assume there's still a GPU L2 like any other GCN CU array?

Insert_Nickname · May 14, 2014

NUSNA_Moebius said:
I assume there's still a GPU L2 like any other GCN CU array?

I haven't been able to dig up anything solid on that, but I have a strong suspicion that's the case.

I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

mrmt · May 14, 2014

Insert_Nickname said:
I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

That would further erode the CPU performance, because AMD's L2 cache is already slow. Moving everything else to L3 and sharing it with a big iGPU would all but ensure that most CPU workloads go to main memory.

Insert_Nickname · May 14, 2014

mrmt said:
That would further erode the CPU performance, because AMD's L2 cache is already slow. Moving everything else to L3 and sharing it with a big iGPU would all but ensure that most CPU workloads go to main memory.

I'm not sure that's the case. Firstly, I didn't say "remove" the CPU L2 cache, just reduce its size a bit. Intel's CPUs since Nehalem have gotten along just fine with 256KB L2. AMD's L1 cache is also pretty big on Kaveri (192KB), so I don't think it'll have too much influence on CPU performance. Secondly, the above assumes that you're running both CPU- and GPU-heavy applications simultaneously. Thirdly, to make room for the additional cache, you could cut the number of CUs, e.g. from 8 to 6. You lose a bit of raw computing power, but the additional performance from the big "fast" (faster than main memory anyway) L2 would probably offset that.

I do agree that AMD's memory controller could do with an upgrade though.
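
One way to frame the disagreement above is average memory access time (AMAT). The sketch below is purely illustrative: every latency and hit rate in it is a made-up placeholder, not a measurement of Kaveri, Haswell or any real chip, and the `amat` helper is a hypothetical name.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, local_hit_rate), ordered from L1
            outward. A local hit rate is the fraction of accesses that
            reach a level and hit there.
    memory_latency: cycles paid by accesses that miss every cache level.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency      # everyone reaching the level pays its latency
        reach *= (1.0 - hit_rate)     # only the misses continue outward
    return total + reach * memory_latency


if __name__ == "__main__":
    MEMORY = 200  # hypothetical main-memory latency in cycles

    # All numbers below are invented placeholders for illustration only.
    configs = {
        "large L2, no L3": [(4, 0.95), (20, 0.80)],
        "small L2 + L3, GPU idle": [(4, 0.95), (15, 0.60), (40, 0.70)],
        "small L2 + L3, GPU contending": [(4, 0.95), (15, 0.60), (40, 0.30)],
    }
    for name, levels in configs.items():
        print(f"{name}: {amat(levels, MEMORY):.2f} cycles")
```

With these placeholder numbers the shared-L3 layout comes out ahead when the GPU leaves the L3 alone and behind when the GPU pushes CPU data out of it, which is essentially the two positions being argued here. Different inputs will give different answers.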

Schmide · May 14, 2014

Insert_Nickname said:
I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

Insert_Nickname said:
I'm not sure that's the case. Firstly, I didn't say "remove" the CPU L2 cache, just reduce its size a bit. Intel's CPUs since Nehalem have gotten along just fine with 256KB L2. AMD's L1 cache is also pretty big on Kaveri (192KB), so I don't think it'll have too much influence on CPU performance. Secondly, the above assumes that you're running both CPU- and GPU-heavy applications simultaneously. Thirdly, to make room for the additional cache, you could cut the number of CUs, e.g. from 8 to 6. You lose a bit of raw computing power, but the additional performance from the big "fast" (faster than main memory anyway) L2 would probably offset that.

I do agree that AMD's memory controller could do with an upgrade though.

You can't just route a cache to another subsystem and expect it to not only fit well, but also fulfill the other system's needs. A CPU cache has a fixed number of ways (its associativity), meaning each set can only mirror a certain number of slices of memory before one slice needs to be evicted.

I personally think the biggest problem for the Bulldozer family was its horrid cache system. The original had an L3 that was basically the same speed as main memory and an L2 that was only fractionally better. Having possible 50-100 tick bubbles can't be good for business especially when Intel is returning sub 20 tick bubbles for the same type of event.

Steamroller and Piledriver improved this area fractionally but not near the magnitude needed to catch up with Intel.
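
The point about ways and eviction can be shown with a toy model. This is a minimal sketch of a set-associative cache with LRU replacement; the class name, sizes and addresses are all made up for illustration and do not describe any actual AMD or Intel cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (illustrative only)."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Touch an address; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets   # which set the line maps to
        tag = line // self.num_sets    # identifies the line within that set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # set is full: evict least recently used
        lines[tag] = True
        return False


if __name__ == "__main__":
    # Addresses 0, 256 and 512 all map to set 0 when there are 4 sets of 64-byte lines.
    for label, pattern in [("two lines", [0, 256] * 3),
                           ("three lines", [0, 256, 512] * 2)]:
        cache = SetAssociativeCache(num_sets=4, ways=2)
        for addr in pattern:
            cache.access(addr)
        print(f"{label}: {cache.hits} hits, {cache.misses} misses")
```

Two lines that share a set fit in a 2-way cache and hit after the first pass. Add a third line to the same set and every access evicts something that is about to be needed, even though the other three sets sit empty. A second client with a different access pattern sharing the same cache makes that kind of conflict more likely.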

Phynaz · May 14, 2014

Insert_Nickname said:
I haven't been able to dig up anything solid on that, but I have a strong suspicion that's the case.

I wonder if APUs could benefit from perhaps shrinking the CPU L2 and adding a larger L3 shared with the GPU? Considering how much performance Nvidia has been able to extract with the 2MB L2 of the GM107, despite the "low" memory bandwidth available, I'm wondering if AMD's APUs would benefit from a larger L2 cache given how much they're constrained by memory bandwidth...?

The GPU cores have 16KB of L1 cache. They have access to the CPU L2 cache when running HSA workloads (which of course don't exist yet).

http://www.xbitlabs.com/articles/cpu/display/amd-a10-7850k_3.html

coercitiv · May 15, 2014

Phynaz said:
The GPU cores have 16KB of L1 cache. They have access to the CPU L2 cache when running HSA workloads (which of course don't exist yet).

Cape Verde has 512KB of L2 cache, 256KB per memory controller. AMD seems to be increasing its cache size for GCN, from 64 > 128 > 256KB per memory controller.
Since the L2 cache is tied to the memory controller, I should have figured Kaveri's iGPU would have none.

This is an interesting fact, since without some other form of caching the Kaveri iGPU has no chance of matching its dGPU brethren (the ones that are comparable in computation power).

itsmydamnation · May 15, 2014

coercitiv said:
Since the L2 cache is tied to the memory controller, I should have figured Kaveri's iGPU would have none.

No, there is an L2. It is needed: it maintains coherency, and it is where all the L1s write to. It's better to think of it as the memory controllers being tied to the cache, not the other way around.

http://www.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah
