Can AMD "rescue" the Bulldozer?


Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
It would be nice if the concrete issues with Bulldozer weren't scattered among posts and blogs such as "8 LOLcores" and the like. So far I have not seen any solid criticism of the CMT implementation, even though there is some to be had especially regarding the L1 cache coherency. If anyone knows of some grounded analysis of whether and how AMD can improve Bulldozer's CMT implementation, I would love to read it.

It also keeps the real WTF regarding the years of Bulldozer design unfairly buried. Why would the chip engineers set such high MHz targets? It had bitten them before on previous nodes, and it ignores one of the key mistakes Intel made with the Pentium 4. It is actually quite mind-boggling. It feels like they stripped down the Phenom II and then bolted on the cutting-edge instructions, with very little plan to improve the core itself, relying instead on higher clockspeed per watt. Was marketing in charge of Bulldozer core design decisions, and is that why they got the axe (a bit more Pentium 4 deja vu than I would care for)? Or is AMD's core x86 talent pool so thin that they were unwilling to bank on making a better core than Phenom II?

Edit: It seems such an odd design choice that pre-FX launch I thought surely AMD would be getting a 5-15% IPC increase over Phenom II out of a BD module running a single thread.
 
Last edited:

Martimus

Diamond Member
Apr 24, 2007
4,488
153
106
It would be nice if the concrete issues with Bulldozer weren't scattered among posts and blogs such as "8 LOLcores" and the like. So far I have not seen any solid criticism of the CMT implementation, even though there is some to be had especially regarding the L1 cache coherency. If anyone knows of some grounded analysis of whether and how AMD can improve Bulldozer's CMT implementation, I would love to read it.

It also keeps the real WTF regarding the years of Bulldozer design unfairly buried. Why would the chip engineers set such high MHz targets? It had bitten them before on previous nodes, and it ignores one of the key mistakes Intel made with the Pentium 4. It is actually quite mind-boggling. It feels like they stripped down the Phenom II and then bolted on the cutting-edge instructions, with very little plan to improve the core itself, relying instead on higher clockspeed per watt. Was marketing in charge of Bulldozer core design decisions, and is that why they got the axe (a bit more Pentium 4 deja vu than I would care for)? Or is AMD's core x86 talent pool so thin that they were unwilling to bank on making a better core than Phenom II?

Edit: It seems such an odd design choice that pre-FX launch I thought surely AMD would be getting a 5-15% IPC increase over Phenom II out of a BD module running a single thread.

Pentium 4 was actually rather successful for Intel, in that it sold well and at a price premium. The fact that functionally it didn't do very well compared to the competition (AMD), yet still beat it where it counted (in sales), may have been the reason AMD went with the high-clockspeed design at the expense of IPC.

I just don't see this strategy working as well as it did for Intel all those years ago, however.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Intel was riding their historical ability to charge a premium due to market dominance and branding. They were milking a position they held, even at the cost of hastening the general decline in ASPs. It would have been a gamble for AMD to go this route at the height of their Athlon 64 success, let alone years later.

My best guess at the moment was a desire to have a decent "MHz" number on their Fusion products for a given TDP.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
I'm not sold on CMT yet, Vesku. It is a nifty trick when trying to jam a lot of cores into a single chip, saving die space by having them share certain resources and cache, but currently it's something that just isn't implemented well and frankly isn't needed. Trying to justify that 20% decrease in performance between 2-per-module and 1-per-module numbers for the 2 added "cores" is something that has yet to be explained and frankly just doesn't seem to be worth it. As soon as you take a glance at the numbers the 2600K puts up with its measly 4 cores and 8 threads, you can't help but think it's a failure.
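
Just to put that trade-off in rough numbers (everything below is hypothetical, purely to show the shape of the argument, not measured data):

Code:
# Hypothetical illustration of the ~20% CMT scaling penalty vs. an SMT design.
# All numbers are made up for the sake of the arithmetic.
bd_single_thread = 1.00     # per-core throughput with one thread per module
cmt_penalty      = 0.20     # quoted loss when both cores in a module are busy
bd_8_threads = 8 * bd_single_thread * (1 - cmt_penalty)    # 6.4 "core-equivalents"

sb_single_thread = 1.40     # assume a 2600K core is ~40% faster per thread
smt_gain         = 0.25     # rough Hyper-Threading uplift per core
sb_8_threads = 4 * sb_single_thread * (1 + smt_gain)       # 7.0 "core-equivalents"

print(f"BD, 8 threads: {bd_8_threads:.1f}")
print(f"SB, 8 threads: {sb_8_threads:.1f}")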
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
It would be nice if the concrete issues with Bulldozer weren't scattered among posts and blogs such as "8 LOLcores" and the like. So far I have not seen any solid criticism of the CMT implementation, even though there is some to be had especially regarding the L1 cache coherency. If anyone knows of some grounded analysis of whether and how AMD can improve Bulldozer's CMT implementation, I would love to read it.

I know I've appealed to the masses in generic form on many occasions for someone to explain in layman's terms what it is about SB that makes it such an IPC demon, and what it is about Bulldozer that makes it a borderline step back from Stars in IPC.

And I'm sure some nice folks have answered that inquiry a time or two, but I'll be damned if I can remember.

The closest I recall to such an analysis was some innuendo posted by Francois in which he claimed (paraphrasing here) that the slides depicting the microarchitecture of Bulldozer (x-wide, y-retires, etc.) were wrong and it was provable if you ran the right kind of code.

But that was a while ago, and has probably been debunked by now.

So yeah. I hope someone with authority on the topic rises to the call and delivers us ignorant noobs from our collective morass :awe:
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
Speaking of the TLB bug, how's Intel coming along with their broken virtualization? Just going to keep selling the broken silicon and let marketing gloss it over?

This is the reason they delayed the Xeon E5 launch until Q1. Yes, the Sandy Bridge E parts are kinda underwhelming, though.

Regarding the TLB issue...

You don't want to go there.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
I'm not sold on CMT yet, Vesku. It is a nifty trick when trying to jam a lot of cores into a single chip, saving die space by having them share certain resources and cache, but currently it's something that just isn't implemented well and frankly isn't needed. Trying to justify that 20% decrease in performance between 2-per-module and 1-per-module numbers for the 2 added "cores" is something that has yet to be explained and frankly just doesn't seem to be worth it. As soon as you take a glance at the numbers the 2600K puts up with its measly 4 cores and 8 threads, you can't help but think it's a failure.

See, that is something IDC just posted on. It's not so much the CMT, though. What you are really asking is why BD's performance is so far behind. Give BD back the 20% from CMT, add in the die space you are then not saving, and it would still look a bit sickly for an 8-core 32nm chip.
 
Last edited:

StrangerGuy

Diamond Member
May 9, 2004
8,443
124
106
It's gonna be Prescott -> Cedar Mill, but without the die shrink and increased cache. Unsalvageable is the word, yeah.
 

denev2004

Member
Dec 3, 2011
105
1
0
Actually BD has already saved die area to some extent.
The area of a BD module with 2MB of L2 is just a little bit bigger than a Sandy Bridge core with 2MB of L3.
The problem with die area is still the cache system, which I can't understand at all.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Actually BD has already saved die area to some extent.
The area of a BD module with 2MB of L2 is just a little bit bigger than a Sandy Bridge core with 2MB of L3.
The problem with die area is still the cache system, which I can't understand at all.

This is a consistent weakness in AMD's designs. Intel's cache is an asset; AMD's cache seems to be a detriment.

I never really understood why, though. It could be a straight-up money issue (Intel can afford the requisite R&D into cache topologies and AMD can't), or more of an IP-space one (AMD operates its cache as it does because it can't license the IP needed to make it like Intel's).

I know if you read the PR then you are led to believe that AMD's cache is the best thing since sliced bread, but reality paints a very different picture.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
It really is 65nm deja vu all over again. Remember, 65nm Athlon X2s had problems just getting to the same clockspeeds as their 90nm siblings, and before Phenom even launched the biggest concern among enthusiasts was that the K10 would not be able to hit the intended clockspeeds because 65nm seemed to be a miss.

If I remember correctly, the 65nm parts were all 65W at launch, while the 90nm parts were 89W+. Did the performance recover if you allowed as much power consumption?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
If I remember correctly, the 65nm parts were all 65W at launch, while the 90nm parts were 89W+. Did the performance recover if you allowed as much power consumption?

I couldn't remember myself, so I went to the AT article on Brisbane and looked. You are right, the TDPs for the 65nm parts were lower.

But at least at the time of the initial 65nm reviews the power-savings didn't really show up all that much.

[power consumption charts from the AT Brisbane review omitted]
Your point still stands, the 65nm parts were lower power consumption, but the TDP paints a story that is rosier than the reality.

But the question has merit: had Brisbane been clocked such that it ran 10-15W higher in power usage (enough to match Windsor's power consumption), it would certainly have clocked higher. How much higher, though? 200, maybe 300 MHz tops?
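
As a crude sanity check on that estimate (assuming dynamic power scales roughly as C*V^2*f, with made-up voltage steps and a hypothetical 80W budget):

Code:
# Rough estimate of how much extra clock 10-15W of headroom buys,
# assuming dynamic power scales as C * V^2 * f. All figures are hypothetical.
base_power = 65.0    # W, Brisbane-class TDP
base_clock = 2.6     # GHz
base_volt  = 1.30    # V
target_power = 80.0  # W, roughly Windsor-class consumption

# Assume each +100 MHz needs ~+0.025 V to stay stable (made-up figure).
clock, volt = base_clock, base_volt
while True:
    next_clock = clock + 0.1
    next_volt  = volt + 0.025
    power = base_power * (next_volt / base_volt) ** 2 * (next_clock / base_clock)
    if power > target_power:
        break
    clock, volt = next_clock, next_volt

print(f"Estimated headroom: ~{(clock - base_clock) * 1000:.0f} MHz at {volt:.3f} V")

Which lands in the same 200-300 MHz ballpark.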

There were good reasons why the 65nm chips weren't a major step up from the 90nm ones, though. AMD had already pulled many of the 65nm xtor improvements into 90nm in advance as part of their STT (Shared Transistor Technology) program, itself a subset of the CTI (Continuous Transistor Improvement) model.

[AMD STT/CTI process technology slides omitted]
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
And so I think you are right in asserting that Bulldozer's clocks, while falling slightly short of what was needed, are a testament to the capability of the microarchitecture, in light of the canary-in-the-coal-mine observation that Llano provides regarding the capability of GloFo's 32nm process tech.

I mentioned "heat density" because I feel we as computer enthusiasts have been spoiled by the benefits of the node progressions.

Every time a chip is built on a new process tech we have yet another reason to rejoice: faster clocks, higher IPC, greater single-threaded performance, lower prices... what is not to like about this?

But when does the gravy train come to an end without making drastic alterations in design philosophy either at the level of process tech or the uarch itself?

If we look at the last three nodes it has been nothing but victory for Intel using planar xtors.

1. 65nm Kentsfield G0 stepping: 3.6 GHz on air
2. 45nm Nehalem: 4.0 to 4.2 GHz on air
3. 32nm Sandy Bridge: 4.8 GHz on air

Comparing #3 to #1 we are seeing much greater single-threaded performance in an area only 1/4 the die size! <---- Amazing, but then I wonder if the release of Intel's 22nm tri-gate is the first sign that these kinds of advances on high-power/high-leakage planar xtors are coming to an end?
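
For what it's worth, the 1/4 figure is just what two full-node shrinks give you at the textbook ~0.7x linear scale factor per node:

Code:
# Two full node shrinks (65nm -> 45nm -> 32nm) at the textbook ~0.7x linear
# scale factor per node shrink the same layout to roughly 1/4 the area.
linear_shrink_per_node = 0.7
area_per_node = linear_shrink_per_node ** 2     # ~0.49x per node
area_after_two_nodes = area_per_node ** 2       # ~0.24x overall

print(f"Area per node:        {area_per_node:.2f}x")
print(f"Area after two nodes: {area_after_two_nodes:.2f}x (~1/4)")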

If so, how will AMD deal with this? Will they be able to rotate serial single-threaded tasks around to various cores? Maybe larger CPUs on 22nm planar with configurable dark-silicon strategies within the core?
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
I'm not sold on CMT yet, Vesku. It is a nifty trick when trying to jam a lot of cores into a single chip, saving die space by having them share certain resources and cache, but currently it's something that just isn't implemented well and frankly isn't needed. Trying to justify that 20% decrease in performance between 2-per-module and 1-per-module numbers for the 2 added "cores" is something that has yet to be explained and frankly just doesn't seem to be worth it. As soon as you take a glance at the numbers the 2600K puts up with its measly 4 cores and 8 threads, you can't help but think it's a failure.
If it is only loaded by a single thread within a single core of a single module, it is slower than Stars, on average. CMT and the shared L2 caches should affect performance scaling, but unless they are working around a bug in the implementation that we don't know about (possible), CMT should have no effect on one little thread. The large shared L2 cache, OTOH, should affect a single thread, and is significantly higher latency than a PhII's, in addition to BD's L1D being on the small side.
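
A crude way to see why the small L1D and the slow, shared L2 hurt a single thread regardless of CMT is average memory access time; the latencies and hit rates below are made up for illustration, not measured BD or Phenom II figures:

Code:
# Average memory access time: AMAT = L1_hit_time + L1_miss_rate * L2_latency.
# All latencies and hit rates are hypothetical, just to show the shape of the effect.
def amat(l1_hit_cycles, l1_miss_rate, l2_cycles):
    return l1_hit_cycles + l1_miss_rate * l2_cycles

phenom_like = amat(l1_hit_cycles=3, l1_miss_rate=0.03, l2_cycles=15)  # big L1D, quick L2
bd_like     = amat(l1_hit_cycles=4, l1_miss_rate=0.06, l2_cycles=20)  # small L1D, slower L2

print(f"Phenom II-ish AMAT: {phenom_like:.2f} cycles")
print(f"BD-ish AMAT:        {bd_like:.2f} cycles")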
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
See, that is something IDC just posted on. It's not so much the CMT, though. What you are really asking is why BD's performance is so far behind. Give BD back the 20% from CMT, add in the die space you are then not saving, and it would still look a bit sickly for an 8-core 32nm chip.

But it is the CMT, and that's my point. That 20% decrease is significant. Very significant. It's also the reason they're able to brand it as an 8-core rather than a 4-core with threads (though I'd argue that it's 4 "super cores" with 8 threads). The poor performance of their CMT design is partly responsible for an 8-"core" chip losing to a 2600K with only 4 cores and 8 threads.

Would that 20%-ish gap have been smaller if they were 8 dedicated cores? Yes, but what would the tradeoff be? It's not easy designing these things, but I think what nearly everyone can agree on is that today an enthusiast favors an IPC gain if they already have 4 cores. The pursuit to "hold the line" on IPC was partly what doomed AMD from the beginning. I think that was in part due to their aim to provide more cores, and thus CMT. The fact that they weren't able to even keep IPC at the same level doesn't bode well for their "moar cores with CMT" approach (and if you read their responses on hardocp you'll see they contradicted themselves in their answers with regards to IPC). Once you hit 4 cores, at least for today's computing needs, you should weigh the advantages and disadvantages of adding more cores and/or threads and just how they would impact the chip's architecture and performance. If there's one thing that AMD grossly underestimated and misunderstood, it was software progression.

There's obviously an issue with the cache latency, and I tried asking about it on [H] but they didn't pick that one to answer. Do we need that much L3 and L2? Probably not. Could we use just a bit more L1 instead? Probably.
 
Last edited:

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Would that 20%-ish gap have been smaller if they were 8 dedicated cores? Yes
No.

It would look like the above, unless there is currently an undisclosed design bug related to their CMT implementation. In the above benchmark, a single thread should get to use the entire front-end, as if CMT were not being used. If there were no CMT, that would still end up ~20% slower per clock per thread.

[benchmark chart omitted]
Here, CMT limitations might be coming into play, though isolating those effects from cache effects would be a challenge.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
But what if we're talking 4 threads? 6? 2? How are they assigned? And most importantly, why does the Thuban get so damn close if it's got two fewer cores/threads? Thuban relies on isolated cores, whereas the BD design takes the 8-cores/8-threads-with-shared-resources approach. I can't shake the feeling that 4 cores and higher IPC, along with 8 threads, would have been the better approach. Just look at the 2600K.

I'm not impressed by CMT because of benchmarks like the one you posted above. It doesn't make up the lost single-threaded performance with the gains in multi-threaded benchmarks, where, more often than not, it still loses to the 2600K. The benchmarks I've seen where it gets a decent lead are benchmarks that use its new instruction sets, and even there the Thubans and the 2600K are within spitting distance.

And yeah, I think the general consensus has been that the cache issues are hampering the chip's performance. By how much? Well, that depends on the size of the cache and just what they've done with Piledriver. We'll have to wait until they release the AM3+ chip, and we can't really rely on the Trinity benchmarks to show us just what they've done. Trinity will have Piledriver cores, but it's also an APU and has no L3 cache.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
The single threaded performance is just not very good at the current default clocks. All indications are that if the single threaded performance were higher then CMT would respond to that with the corresponding CMT penalty. That can be seen through overclocking results.

Current situation: NotGood * 8 * 0.8 (CMT penalty)

Better single-threaded: Better * 8 * 0.8 (CMT penalty)

Now, "NotGood" isn't quite as bleak in software that uses all of Bulldozer's new instructions. But it will be a while before most of the popular Windows software is updated, perhaps long enough that BD rev 2 will be in retail.
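
Put as a quick calculation (per-thread throughput is in arbitrary units; the 0.8 factor is the same CMT penalty as above, and the 25% uplift is just an example):

Code:
# Vesku's point in numbers: the CMT penalty multiplies whatever the
# single-thread baseline is, so fixing the baseline fixes the total.
cores = 8
cmt_factor = 0.8        # the ~20% penalty with both cores per module loaded
not_good = 1.0          # arbitrary per-thread throughput today
better   = 1.25         # hypothetical 25% single-thread improvement

print(f"Current:  {not_good * cores * cmt_factor:.1f}")   # 6.4
print(f"Improved: {better * cores * cmt_factor:.1f}")     # 8.0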

Again, I have yet to see any well constructed breakdowns on Bulldozer's CMT with the good and the bad explored. There is some analysis of the cache system and how it seems lackluster for the transistors and node size, but even that isn't extensive.

It seems Intel thinks AMD hasn't completely Pentium 4ed itself. They've gotten much looser lipped in the last few weeks regarding future products.
 

hooflung

Golden Member
Dec 31, 2004
1,190
1
0
I'm not sold on CMT yet, Vesku. It is a nifty trick when trying to jam a lot of cores into a single chip, saving die space by having them share certain resources and cache, but currently it's something that just isn't implemented well and frankly isn't needed. Trying to justify that 20% decrease in performance between 2-per-module and 1-per-module numbers for the 2 added "cores" is something that has yet to be explained and frankly just doesn't seem to be worth it. As soon as you take a glance at the numbers the 2600K puts up with its measly 4 cores and 8 threads, you can't help but think it's a failure.

It really doesn't matter if you are sold on CMT. The truth of the matter is that C++ and Java developers who write industrial, financial, heuristic and scientific applications need more true threads.

At best, Hyper-Threading is a surge tank. Sometimes it works absolutely wonderfully, but it is unpredictable and dependent on two layers of proper execution (the internal scheduler and the OS's). CMT is not. You have a 1-to-1 thread nature. You have to quantify how much work you can get out of an Intel chip to negate the overhead.
 

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
The single threaded performance is just not very good at the current default clocks. All indications are that if the single threaded performance were higher then CMT would respond to that with the corresponding CMT penalty. That can be seen through overclocking results.

Given their initial plan to hold IPC and boost stock clocks, those penalties wouldn't be so drastic. Given its current state, with lower IPC and hungry power consumption, it's difficult to see where they were going with this, and hard to imagine that they didn't at least try to tweak it a bit to reach those initial goals. Maybe they did but it didn't work? Who knows what these things looked like 2 years ago. But frankly, a stock clock in the mid-4GHz range does reek of Pentium 4-like fanaticism.

Just to utter something AMD themselves admitted to:

These CPUs were originally designed for the server space. Why they released them on the enthusiast desktop when they themselves knew they would underperform is bewildering. Maybe an AMD 1366/2011-style platform is in order? Workstation/server? Somewhere the 8 threads can be put to good use without having to worry too much about single-threaded benchmarks?
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
My best guess at the moment was a desire to have a decent "MHz" number on their Fusion products for a given TDP.

Maybe the Bulldozer module was aimed primarily at mobile and "many-core" servers, with the desktop application (with its higher clockspeed) an afterthought?

Essentially such a decision would be the opposite of what we saw with Phenom II and Lisbon, right? On those 45nm parts, wasn't the CPU core primarily designed around 3 to 3.6 GHz operation, with the lower-clocked server parts being the afterthought?

With that being said, I am quite interested to see how Bulldozer's high clockspeed fares on GF's 32nm as the node matures. Will power consumption drop, showing ~4.5 GHz operation actually being in the "sweet spot" for this design? Or is that kind of result going to be quite unlikely even after many steppings?
 
Last edited:

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
But what if we're talking 4 threads? 6? 2? How are they assigned?
Depends on the OS. With the shared L2 cache, that ends up just like using HT.
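
As a toy sketch of the two placement policies an OS could use on a 4-module part (purely illustrative, not any real scheduler's logic):

Code:
# Two ways an OS could place N threads on a 4-module / 8-"core" Bulldozer:
# spread one thread per module first, or pack both cores of a module before
# moving on. Returns (module, core) pairs. Not real scheduler code.
MODULES = 4

def spread(n_threads):
    # one thread per module first, then fill the second cores
    return [(t % MODULES, t // MODULES) for t in range(n_threads)]

def pack(n_threads):
    # fill both cores of a module before using the next one
    return [(t // 2, t % 2) for t in range(n_threads)]

for n in (2, 4, 6):
    print(n, "threads  spread:", spread(n), " pack:", pack(n))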

And most importantly, why does the Thuban get so damn close if it's got 2 less cores/threads?
Each one of those cores can do more per clock.
I can't shake the feeling that 4 cores and higher IPC, along with 8 threads, would have been the better approach.
8 cores with higher IPC all-around than Thuban would have also been a better approach, CMT or not.

I'm not impressed by CMT because of benchmarks like the one above that you posted. It doesn't make up the lost performance in single-threaded performance with the gains in multi-threaded benchmarks
It can't, and it's not supposed to. With each core able to perform better, it should allow for better and more predictable scaling than HT, with maximum performance tapering off a bit as more threads are active.

What it should do is allow a small number of high-ILP threads to perform very well, then back off on peak performance as it maxes out with many active threads competing for either IPC [at fetch/decode/rename], or data bandwidth, or cache space, and then also provide some long-term side benefits, regarding perf/W, perf/mm^2, and ideally, perf/R&D$, that would end up making those trade-offs worthwhile (such as a high turbo speed, or higher non-turbo speeds, on the consumer end of things). Well, as it stands, the cores behind that shared front end are losing to improved Hammers, while executing very few threads.

Executing more threads, sometimes it scales great (low-ILP + low DLP + CMT = win), sometimes it doesn't (high-ILP + CMT = meh, small L1D + long distance to a big L2 = meh), sometimes it scales worse than Stars (cache hit rates and latencies, small L1D?). While I$ and L2 being shared certainly complicates figuring out how much of the poorer scaling when maxing out threads is actually CMT, surely some of it is. But, however much of it is down to CMT, the first half of the threads needs to at least beat Stars more often than not. If those first 4 threads in a 4-module BD CPU performed, on average, 15% better per clock than they do today, BD would look like a very good start, instead of a 2011 vision of the P4.

IoW, yes, BD's performance leaves quite a bit to be desired, but unless there is a design bug in the CMT implementation, CMT doesn't seem to have much, if anything, to do with the average regression v. Stars when only a few threads are active.

Regarding the server v. desktop thing: the server performance is better, as expected, but it is still far from impressive. Remember: K8 and Stars were also server-first CPUs, and excelled there, so BD had its work cut out for it in that arena, too, and only gave a slightly better showing than on the desktop. Recall that Stars was able to at least remain moderately competitive with the early Nehalem Xeons. CMT certainly should have business application advantages, but those will only matter if the per-core performance/Watt can be brought up by at least 25% or so.
 
Last edited:

pelov

Diamond Member
Dec 6, 2011
3,510
6
0
Regarding the server v. desktop thing: the server performance is better, as expected, but it is still far from impressive. Remember: K8 and Stars were also server-first CPUs, and excelled there, so BD had its work cut out for it in that arena, too, and only gave a slightly better showing than on the desktop. CMT certainly should have business application advantages, but those will only matter if the per-core performance/Watt can be brought up by at least 25% or so.

I'd assume that optimized compilers would account for a healthier net gain as well, but, as you noted already, the chips wouldn't be used unless the perf-per-watt shows more improvement. Bear in mind I was only stating what AMD themselves admitted to in that [H] thread. I've read the AT review on Interlagos and was expecting it to do a little better than it did, but I wasn't surprised by the conclusion either.

8 cores with higher IPC all-around than Thuban would have also been a better approach, CMT or not.

http://www.anandtech.com/show/5174/why-ivy-bridge-is-still-quad-core

A lot of what he mentioned could transfer over to the BD argument. If it's going to be released on the desktop, then why add another 4 (or 2) cores when you know they won't be utilized? If they had 8 true cores and released the chip on a workstation platform/socket with higher clocks and less regard for keeping TDP down, then it might have had more glittering reviews despite even higher power consumption figures.

IoW, yes, BD's performance leaves quite a bit to be desired, but unless there is a design bug in the CMT implementation, CMT doesn't seem to have much, if anything, to do with the average regression v. Stars when only a few threads are active.

I've looked at CMT as a way of sidestepping making beefier cores, and the CMT implementation was also a way of saving die space. From my perspective, there isn't a "bug" in the CMT implementation; rather, it's the reason the BD cores are so unimpressive, because with CMT and heavily threaded applications AMD thought they could afford to be. If we're to believe AMD's slides on Piledriver, then we're to expect a 10-15% improvement in IPC, so that may be the original iteration of BD they were gunning for. Unfortunately we'll be comparing those chips against IVB. If they put up a good showing against SB, then at least we'll know their original expectations weren't so far-fetched.
 
Last edited:

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
But it is the CMT, and that's my point. That 20% decrease is significant. Very significant. It's also the reason they're able to brand it as an 8-core rather than a 4-core with threads (though I'd argue that it's 4 "super cores" with 8 threads). The poor performance of their CMT design is partly responsible for an 8-"core" chip losing to a 2600K with only 4 cores and 8 threads.

Would that 20%-ish gap have been smaller if they were 8 dedicated cores? Yes, but what would the tradeoff be? It's not easy designing these things, but I think what nearly everyone can agree on is that today an enthusiast favors an IPC gain if they already have 4 cores. The pursuit to "hold the line" on IPC was partly what doomed AMD from the beginning. I think that was in part due to their aim to provide more cores, and thus CMT. The fact that they weren't able to even keep IPC at the same level doesn't bode well for their "moar cores with CMT" approach (and if you read their responses on hardocp you'll see they contradicted themselves in their answers with regards to IPC). Once you hit 4 cores, at least for today's computing needs, you should weigh the advantages and disadvantages of adding more cores and/or threads and just how they would impact the chip's architecture and performance. If there's one thing that AMD grossly underestimated and misunderstood, it was software progression.

There's obviously an issue with the cache latency, and I tried asking about it on [H] but they didn't pick that one to answer. Do we need that much L3 and L2? Probably not. Could we use just a bit more L1 instead? Probably.

First of all, CMT is working the way AMD said it would. CMT doesn't decrease single-thread performance. IPC is measured on a single core with a single thread; BD lacks single-core performance, and that's the big problem.

BD was created with large L2 and L3 caches because TLP/DLP and SIMD workloads need them. Desktop applications need smaller and faster caches.

If I remember correctly, with deeper pipelines (20 stages?) we need a smaller L1 D-cache; that's the reason AMD put a 16KB L1 D-cache in each core, versus 64KB in Thuban.

BD has the potential for high performance but only IF software will take advantage of its abilities.

They could increase IPC per core, but that would increase core size. I don't expect them to do that; I expect them to increase the core count (add more modules) because they are targeting parallel workloads (cloud, virtualization, SIMD, TLP, DLP, etc.).

One more thing: I believe the Bulldozer architecture is the beginning of the FSA (Fusion System Architecture) that will fuse the CPU and GPU together.
The Bulldozer architecture could be the foundation for the architectural integration (unified address space for CPU and GPU) that will happen in the next APUs after Trinity.

AFDS Keynote: “The Programmer’s Guide to the APU Galaxy.”
Phil Rogers, AMD Corporate Fellow
 