Yes, I'm aware of that. But I'm not saying AMD should decrease/remove L2/L3 cache and put HBM there instead. Just that HBM could be used as the "first memory choice", before DDR. Similar to GPUs - when one runs out of VRAM, it falls back to slower DDR3/4 if needed. It might be a good thing if it's cheaper/better than investing in a quad (or more) channel platform and insisting on many RAM modules.
Though I don't know why I'm talking about this, since I'm not so familiar with this stuff and it might just be some stupid/unrealistic solution.
The way to think about this is that, in consumer, it's about costs, not perf. Spending a lot for small gains in perf doesn't work, just like quad channel is not worth it today in consumer.
If they invest in a solution, they need to at least generate sufficient revenue to avoid losses. Revenue would be about units and price. Low volumes would require a very high price.
High perf means low volumes, so how do you make a product that can be sold at a high price by being better than the alternatives?
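As a rough illustration of that volume/price tradeoff (all numbers hypothetical, just to show the amortization effect, not anyone's actual costs):

```python
# Why low volume forces a high price: a one-time dev cost spread over units.
# All numbers are hypothetical, purely to illustrate the arithmetic.
dev_cost = 300e6      # one-time development cost, dollars (hypothetical)
unit_cost = 150       # per-unit manufacturing cost, dollars (hypothetical)
margin = 1.3          # sell at ~30% over total cost (hypothetical)

for units in (100_000, 1_000_000, 10_000_000):
    amortized = dev_cost / units             # dev cost carried by each unit
    price = (unit_cost + amortized) * margin
    print(f"{units:>10,} units -> needed price ≈ ${price:,.0f}")
```

At 100k units the dev cost alone adds thousands of dollars per unit; at 10M units it's almost noise. That's why a niche high-perf APU is hard to price.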
A very large APU would have to compete with discrete GPUs and, in this case, beat their own solutions and Nvidia's solutions to sell well enough. The math might not add up.
A small APU with HBM would be too costly vs normal APUs while offering small perf gains.
The memory could be HBM-like, so some 3D stacked DRAM with a much cheaper interposer, or even just a DRAM die placed very close to the CPU with a very high bandwidth link. And sure, if the CPU can use this memory, maybe they save on DRAM costs. However, it's likely not quite enough to justify spending on it.
The best chance to see an APU with HBM anytime soon would be if they make it for server and decide to also sell it in consumer. Server pays for it and consumer just widens the market a bit.
Longer term, with cheaper advanced packaging solutions, they could do it to save money. They develop a CPU die and a separate GPU die and pair them - 1 CPU no GPU, CPU and 1 GPU, CPU and 2 GPUs, 1 GPU no CPU and so on - and use those combinations across the board. Development on the most advanced process is very costly and getting worse. A small die also has better yields (see the sketch below), and in consumer, some extra latency between CPU and GPU doesn't quite matter if it enables substantially lower costs - of course, compared to discrete, the latency is nothing. They would lose a bit in perf but gain a lot in costs and flexibility.
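To put a number on the yield point - a minimal sketch using the simple Poisson defect model (yield ≈ exp(-defect density × die area)); the defect density here is a hypothetical value, not a figure for any real process:

```python
import math

# Why smaller dies yield better: simple Poisson defect model,
# yield = exp(-defect_density * die_area). Defect density is hypothetical.
defect_density = 0.1  # defects per cm^2 (hypothetical)

for area_cm2 in (0.8, 2.0, 4.0):  # small chiplet vs. larger monolithic dies
    yield_rate = math.exp(-defect_density * area_cm2)
    print(f"die area {area_cm2:>3} cm^2 -> yield ≈ {yield_rate:.1%}")
```

The small die keeps far more of the wafer usable, and the good dies can then be mixed and matched across the whole product stack.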
If they do this, they are already paying for advanced packaging and developing every die that goes with it, and since the GPU needs bandwidth, they might as well use some HBM-like memory in the SKUs that require it. Cheaper advanced packaging and stacked memory solutions are crucial to enable this, and those are arriving soon.
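For a sense of why the GPU die wants that kind of memory, here's a back-of-the-envelope bandwidth comparison, assuming typical published data rates (dual-channel DDR5-6000 on 2×64-bit vs. a single 1024-bit HBM2e stack at 3.2 Gb/s per pin):

```python
# Peak bandwidth in GB/s = (bus width in bytes) * (transfers per second in G).
def bandwidth_gbs(bus_bits: int, data_rate_gtps: float) -> float:
    return (bus_bits / 8) * data_rate_gtps

ddr5_dual = 2 * bandwidth_gbs(64, 6.0)      # dual-channel DDR5-6000
hbm2e_stack = bandwidth_gbs(1024, 3.2)      # one HBM2e stack, 3.2 Gb/s/pin
print(f"dual-channel DDR5-6000 ≈ {ddr5_dual:.0f} GB/s")    # ~96 GB/s
print(f"one HBM2e stack        ≈ {hbm2e_stack:.0f} GB/s")  # ~410 GB/s
```

Roughly 4x the bandwidth from a single stack is what makes it attractive for anything GPU-heavy.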
They might already be doing this in server to save on dev costs, although it is a bit weird to lose perf in a segment where costs matter a lot less; but they have limited resources today, and at least the ASPs in server are high enough to cover the current costs for advanced packaging. It's a bit upside down for now, and maybe they give up on that in server to chase every bit of perf and power they can, but adopt it in consumer when cheaper packaging and stacked memory solutions become available.