First, the aforementioned APU would cost 299$, not 500$.
Secondly, for a 4C+16CU design the die cost would be very close to that of the Ryzen 8C CPU, because the die size is similar.
So in terms of wafer cost we are talking about roughly 20$ per die.
Thirdly, which APU are we talking about? 4C/8T+12CU is the mobile design, with up to 35W TDP. It is called Raven Ridge.
4C/8T+16CU+HBM2 is most likely the Horned Owl APU, with TDPs up to 95W, which is targeted not only at the mainstream market but also at the embedded, professional, server, and machine learning markets. Those are the markets where the margins can be higher.
Lastly we have the Snowy Owl design: 16C/32T, 64 CUs, 16 GB of HBM2. Note that it is exactly 4 times the size of the Horned Owl design. Coincidence? Or perfect scaling?
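The "exactly 4 times" claim is easy to check from the numbers quoted above. A minimal sketch, using only the core, thread and CU counts from the post (HBM2 capacity is left out because no figure is given for Horned Owl; the variable names are just illustrative):

```python
# (cores, threads, compute units) as quoted in the post
horned_owl = (4, 8, 16)
snowy_owl = (16, 32, 64)

# Ratio of each Snowy Owl figure to the matching Horned Owl figure
ratios = [big / small for big, small in zip(snowy_owl, horned_owl)]
print(ratios)  # [4.0, 4.0, 4.0]
```

The same ratio on all three figures fits the "4x building block" reading, though it says nothing about whether it is one die or several.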
Next is the cost. The whole package, APU plus 2 stacks of HBM2, will not cost more than 40$. Price it at 300$ and you still have a huge margin, in a market that is much bigger than the CPU-only one. You are targeting the market where you have BOTH a CPU and a GPU.
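The margin argument is plain arithmetic. A rough sketch, taking the 40$ package cost and the 300$ price at face value (both are estimates from this post, not confirmed figures, and `gross_margin` is just an illustrative helper):

```python
def gross_margin(price, unit_cost):
    """Return (absolute margin, margin as a fraction of the price)."""
    margin = price - unit_cost
    return margin, margin / price

# Figures from the post: 40$ package cost estimate, 300$ selling price.
# Ignores R&D, test, distribution and retail cuts, so it is an upper bound.
margin, fraction = gross_margin(300, 40)
print(f"{margin}$ per unit, {fraction:.1%} of the price")
```

Since dev costs and channel margins are not included, the real figure would be lower than what this prints.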
Lastly, which markets would explode with such a product? Small-form-factor, efficient designs. NUCs. The markets where the money currently is.
The funniest part is that this APU design could be a cure for the dying mainstream market.
I'm sure we will see this design this year, but it will be Q4 2017 at the earliest.
Seriously, you want 2 HBM stacks for 16 CUs, lol? They have 2 stacks for 64 CUs with Vega 10 at 12.5 TFLOPS and 200+W.
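To put numbers on that comparison, a quick sketch using only the figures quoted above. The 16CU throughput estimate assumes the same per-CU rate as Vega 10, which is my assumption and not something anyone has confirmed; the helper name is illustrative:

```python
def cus_per_stack(compute_units, hbm_stacks):
    """Compute units fed by each HBM stack."""
    return compute_units / hbm_stacks

proposed_apu = cus_per_stack(16, 2)  # 8 CUs per stack
vega10 = cus_per_stack(64, 2)        # 32 CUs per stack
print(vega10 / proposed_apu)         # 4.0

# ASSUMPTION: the 16CU part gets the same TFLOPS per CU as Vega 10.
tflops_per_cu = 12.5 / 64
apu_tflops = 16 * tflops_per_cu
print(f"{apu_tflops:.3f} TFLOPS")
```

So the proposed APU would carry 4x the memory stacks per CU that Vega 10 does, for roughly a quarter of the compute under that assumption.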
And how much does it cost? 1GB of DDR4-2133 is 6-8$ today. Even a single 2GB stack of HBM doubles the cost or more, considering the interposer and yields.
Versus an 11-12CU APU, what do they get?
Higher dev costs (at much lower sales volumes), more than 2x the manufacturing costs (larger die, HBM, interposer), and what do they gain?
11-12 CUs vs 16 CUs at the same overall TDP: the 16CU part has the HBM adding significant heat, and IF perf is higher, the CPU's TDP rises a bit too.
So they more than double the manufacturing costs, dev costs are much higher on a per-unit basis, and they gain almost nothing in perf at the same TDP for the entire SoC.
If the OS could use the HBM as system memory to reduce the amount of DRAM required, the costs look slightly better, but it still won't make any sense unless you have a firm laptop commitment from some big client.
If you consider the basics (CPU, GPU, memory interface), you are better off finding the right balance between GPU width and clocks than going after costly solutions that get you nothing. The memory BW is just a limitation they need to accept and deal with, just like folks in phones don't go for a 128-bit memory interface because it is too power hungry.
The 16-core server SKU should be a more traditional MCP (not a monolithic die); the CPUs are not on the interposer, so they just add a Vega 10 to the package, more or less.
As I have mentioned, such a solution comes with minimal cost and could be doable: pairing a 4C die with a Vega 11 (let's say 32CU or so; note that Vega 11 might not use HBM, and then this solution isn't viable anymore). It would be a niche market for folks that need a very compact machine, like laptops. This would make sense if Vega is good, as it would hurt Nvidia, but it would need to be close in perf, power and cost to Nvidia's discrete offering (+ a discrete CPU), and that might be tough. It wouldn't be an APU per se, but close enough.
Anyway, the normal 11-12CU APU will be up to 300-350$. APUs are not cheaper; they just replace some cores with a GPU to serve a certain market. Kaby Lake is an APU.
Anyone expecting Raven Ridge with 4 cores and 11-12 CUs at 150$ is delusional. And BTW, Raven Ridge is laptop first; we don't even know if it comes to desktop this year.
The point of using advanced packaging in APUs and CPUs is to lower costs (better yield and lower dev costs) and gain flexibility. To achieve that, they need much cheaper solutions like an organic interposer or Intel's silicon bridge. On the memory side, HBM in its current form is not ideal either, from a cost perspective. It's highly likely that we'll see them use chiplets and advanced packaging, but it will take a bit more time.
There is a packaging conference in about a week, and there could be some interesting things shown:
http://www.imaps.org/devicepackaging/