It's really the other way around: millions of people are using GPT and other commercial/closed models, whereas most LLM questions, and even a lot of image work, will run just fine on a 16GB GPU from 2020 using distilled models. Meanwhile, look at Apple's AI: they critically depend on GPT simply because they don't have any models or public-facing tech of their own.
So in this world the real OEMs are HP, Dell, Lenovo, etc., and their customers have a pressing need for high-bandwidth matrix calculations. That job can either be done by a giant GPU chip using special GDDR RAM, or you can put an NPU into an SoC, save tons of power, use the same RAM chips already on the mobo, and still get good-enough performance. And if everyone has AI on their machine, it becomes easy for developers to make AI-based apps and programs they can sell to people. This is both Microsoft's and Apple's angle.
Right now it's awful for me to try and run AI on my non-accelerated CPU, and I have to fully commit my GPU and a lot of power for good performance. NPUs on CPUs, by contrast, really open the door to things like games using AI for NPCs, or digital media where the NPU frees up compute for effects and transitions. For development it really blows the hinges off, because running one AI into another, locally, suddenly becomes feasible, and that's where things really shine. I for one enjoy using different system prompts on the same model to get more varied results: e.g., one programmer bot thinks it's a webdev, the other I tell it's John Carmack, now give me code for X. It would be super nice to get both results at the same time for comparison.
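A minimal sketch of that side-by-side persona idea, assuming an OpenAI-style chat payload format (the model name and persona strings here are placeholders I made up; in real use each payload would be POSTed to a local server such as llama.cpp or Ollama):

```python
# Sketch: query one local model under two different system prompts
# and collect both answers for comparison.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical personas; swap in whatever framing you want to compare.
PERSONAS = {
    "webdev": "You are a senior web developer.",
    "carmack": "You are John Carmack.",
}

def build_request(persona: str, task: str) -> dict:
    """Build an OpenAI-style chat payload for one persona."""
    return {
        "model": "local-model",  # placeholder; use your local model's name
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": task},
        ],
    }

def compare(task: str) -> dict:
    # With an NPU handling inference, both requests could genuinely
    # run concurrently; here we just build the two payloads in parallel.
    with ThreadPoolExecutor() as pool:
        pairs = pool.map(lambda p: (p, build_request(p, task)), PERSONAS)
        return dict(pairs)

reqs = compare("Write a fast inverse square root in C.")
```

The interesting part is that nothing model-side changes between the two runs; only the system message differs, so any divergence in the answers comes purely from the persona framing.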
Also, look at the GPU market lately: it's not like there's a crypto rush right now, so those sales and that scalping are legit pressure from the AI market. And if AMD can leverage even a tiny bit of that onto desktop CPUs, that becomes a win for potential volume and margins.