It sounds like you're mixing up VRAM power and GPU power (or you're not expressing yourself quite clearly). I (we?) was talking about stacking HBM on top of a relatively high-powered chip like a CPU, GPU or APU. Even though the HBM does consume some power, it's bound to be far lower than that of the chip beneath, unless you're talking about a Core m competitor, in which case a couple of stacks of low-clocked HBM2 would probably be roughly equal. The problem here is twofold. The first (and lesser) issue is that you're packing heat-generating chips into a smaller surface area, i.e. concentrating the heat. The second, and bigger, issue is that the stacked-on-top HBM will act as a thermal insulator between the CPU/GPU/APU die and the IHS (or cold plate in a laptop), making thermal transfer (cooling) considerably more difficult. In essence, you're trapping part of the heat from the lower chip, not allowing it to be carried away by the cooling system.
While switching from PoP to stacked dice will lower this effect (as @imported_jjj was pointing out), it will in no way remove it entirely. And the hotter the chip below, the bigger the problem becomes (as more heat generated means it needs to be dissipated more quickly to avoid overheating). You'll run a very high risk of your CPU becoming heat-soaked, while your cooling solution will become far less efficient.
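To put rough numbers on why this scales with the power of the chip below, here's a back-of-the-envelope sketch using the usual series thermal resistance model (temperature rise = power × thermal resistance). The resistance values, the ambient temperature and the function names are made up for illustration, not measurements of any real package, and the model ignores the heat the memory itself generates:

```python
def junction_temp_c(power_w, ambient_c, resistances_k_per_w):
    """Steady-state die temperature for heat flowing through layers in series."""
    return ambient_c + power_w * sum(resistances_k_per_w)


# All values below are illustrative assumptions, not measurements.
AMBIENT_C = 25.0
R_COOLER = 0.30  # die -> IHS/cold plate -> air, in K/W (hypothetical)
R_STACK = 0.20   # extra resistance added by memory stacked on the die, in K/W (hypothetical)


def stacking_penalty_c(power_w):
    """Extra die temperature caused by the stacked layer at a given power draw."""
    bare = junction_temp_c(power_w, AMBIENT_C, [R_COOLER])
    stacked = junction_temp_c(power_w, AMBIENT_C, [R_COOLER, R_STACK])
    return stacked - bare


if __name__ == "__main__":
    for power_w in (5, 15, 100):
        print(f"{power_w:>3} W: +{stacking_penalty_c(power_w):.1f} °C from the stacked layer")
```

Under these assumptions the penalty grows linearly with power: the same extra layer that costs a tablet-class chip very little costs a desktop-class chip twenty times as much.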
This barely works for ultra-low-power devices like tablets (where, as I said, the iPad outperforms everything else in sustained loads largely due to having off-package RAM), but can never, ever work with chips any more power-hungry than that.