Question: By 2030, we will be buying massive NPUs with a CPU and GPU attached to them.

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
Today, NPUs such as Apple's Neural Engine take up less space than the CPU or GPU in a SoC. By 2030, I predict that we won't be buying "CPUs". We will all be buying NPUs with a CPU and a GPU attached to them.

NPUs will become the new CPUs.

More applications will start to make massive use of AI inference. Soon, consumers will demand that their laptops and mobile phones run models as large as GPT, LLaMA, Stable Diffusion, or future large models locally. It has been theorized that the current iPhone 14 Pro could run inference on Meta's LLaMA model, though slowly and at much reduced accuracy.
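
As a rough back-of-the-envelope (my own numbers, assuming a 7B-parameter model and simple weight-only quantization, ignoring activations and KV cache), model memory scales with parameter count times bits per weight:

Code:
def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
    # bytes = params * bits / 8; report in GB (1e9 bytes)
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: ~{model_size_gb(7, bits):.1f} GB")
# 16-bit: ~14 GB, 8-bit: ~7 GB, 4-bit: ~3.5 GB. Only the last fits in
# a phone with 6-8 GB of RAM, which is why 4-bit on-device inference
# is plausible at all.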

In order to do this, chip makers will focus on making NPUs and making them huge.

We are at the beginning of a complete paradigm shift in chip requirements.

Apologies if this is the wrong place to post this. There is no NPU forum on Anandtech.
 
Reactions: Vattila

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
If AI becomes standard in systems, then it will come as accelerator modules inside CPUs and GPUs, not the other way around.
Intel already included Nervana and then Habana in their GPUs years ago.
The inference for apps will still happen on servers; there is no reason for the same work to be done a billion times over instead of just once.
 
Reactions: Hulk

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
I will be buying an RTX 7090 with motherboard, CPU, RAM and whatnot plugged into it.
Nvidia maybe not, but Intel already has this; it's just a question of whether they can pull it off with one of their full-power Arc GPUs on there.
This thing is supposed to plug into a backboard with nothing on it other than a PCIe bridge for a GPU, but der8auer put it into a system.
Yo dawg...
 
Reactions: Timmah!

Mopetar

Diamond Member
Jan 31, 2011
8,000
6,433
136
The NPU in your phone will actually shape your thought patterns in order to make you want a better NPU so that more of human economic activity is devoted to developing a better NPU.

If you want to protect yourself from your future AI overlords, I suggest you purchase one of the lovely tinfoil hats I happen to have available for sale. Get one before the NPU gets you.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,688
1,222
136
*shakes foretelling 8-ball* ah yes...indeed hmm..

I do not foresee a neural/inference processing unit craze. CPUs and GPUs will remain the big features, and NPUs/IPUs will remain minor features. Nice to have, but nothing important.

Proof of the 8-ball:


It has been foretold! *magic poof and Nosta without an R vanished*
 

soresu

Platinum Member
Dec 19, 2014
2,933
2,156
136
I will be buying an RTX 7090 with motherboard, CPU, RAM and whatnot plugged into it.
Not really viable, simply because of how much I/O mobos have attached to them and how much space those connectors take up on the board.

It's possible that they could split some of this I/O onto separate daughter boards, as was done decades ago. That would probably make installation less of an issue: instead of matching up all of those troublesome connectors once the mobo is already inside the case, you could just plug everything into one or two slots on the mobo. Especially on ATX12VO mobos, which need extra connectors anyway.
 

moinmoin

Diamond Member
Jun 1, 2017
4,993
7,763
136
How can one start this thread and not be aware of Lisa Su's recent talk about compute efficiency?


Her point essentially is: energy efficiency is the primary limiter. To hit a target of >10,000 GFLOPS/watt, i.e. zettascale at ≤100 MW, the industry will need to find ways to use AI in integral ways as efficiency shortcuts. So: CPUs and GPUs fundamentally assisted by NPUs.
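
For reference, the target multiplies out to exactly one zettaFLOPS (my arithmetic, not from the talk itself):

Code:
# 10,000 GFLOPS/W at a 100 MW power budget works out to 1e21 FLOPS,
# i.e. one zettaFLOPS: that is where the "zettascale" label comes from.
efficiency_flops_per_watt = 10_000 * 1e9   # >10,000 GFLOPS/watt target
power_watts = 100e6                        # <=100 MW budget
print(f"{efficiency_flops_per_watt * power_watts:.0e} FLOPS")  # 1e+21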

 

Shivansps

Diamond Member
Sep 11, 2013
3,872
1,527
136
Nvidia maybe not, but Intel already has this; it's just a question of whether they can pull it off with one of their full-power Arc GPUs on there.
This thing is supposed to plug into a backboard with nothing on it other than a PCIe bridge for a GPU, but der8auer put it into a system.
Yo dawg...

That concept has been attempted since the ISA slot days; it never had success because you're only exchanging which card is the motherboard and which card is an I/O expansion board. That said, with the upcoming PCIe 6.0 revision I'm thinking it may actually be possible to make "CPU add-on cards" that have a socket or a soldered CPU and just add to the system, using the same system RAM, via the PCIe slot and NUMA nodes.
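
FWIW, the software side of that already exists: Linux exposes NUMA topology via sysfs today, so (hypothetically) such a CPU add-on card would just show up as another node, much like a second socket does now. A minimal Linux-only sketch of listing them, purely for illustration:

Code:
import os
import re

# Enumerate NUMA nodes as Linux exposes them via sysfs. A hypothetical
# PCIe CPU add-on card sharing system RAM would simply appear here as
# an extra node alongside the existing sockets.
node_dir = "/sys/devices/system/node"
for node in sorted(n for n in os.listdir(node_dir)
                   if re.fullmatch(r"node\d+", n)):
    with open(f"{node_dir}/{node}/cpulist") as f:
        print(node, "CPUs:", f.read().strip())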

What is definitely clear here: iGPUs are already a huge part of a CPU die, and if they have to add an NPU to it as well, then the CPU cores will be a very minor part. When that happens they may actually change the name for marketing reasons (like they did with APUs).
 

Doug S

Platinum Member
Feb 8, 2020
2,469
4,024
136
Today, NPUs such as Apple's Neural Engine take up less space than the CPU or GPU in a SoC. By 2030, I predict that we won't be buying "CPUs". We will all be buying NPUs with a CPU and a GPU attached to them.

NPUs will become the new CPUs.

More applications will start to make massive use of AI inference. Soon, consumers will demand that their laptops and mobile phones run models as large as GPT, LLaMA, Stable Diffusion, or future large models locally. It has been theorized that the current iPhone 14 Pro could run inference on Meta's LLaMA model, though slowly and at much reduced accuracy.


So you're saying "build it and they will come"?

Most inference tasks are far less latency sensitive than CPU or GPU tasks and would work just fine via the cloud for a lot of stuff where privacy is less of an issue. Even where privacy is a potential concern, the fact that consumers keep using products like Facebook and buying gadgets like Alexa and Ring demonstrates that they either don't care or don't understand these issues, so most will not be willing to pay more for an SoC with a giant NPU to keep those inference tasks local.

Not to mention there has to be a 'killer app' for this, something people can't live without. ChatGPT is fun to play with, but a killer app it is not.
 
Reactions: Tlh97 and Ranulf

alcoholbob

Diamond Member
May 24, 2005
6,271
323
126
I'd be disappointed if CPUs didn't have enough on-die cache to offset the unified memory of consoles; otherwise even a 5-year-old console would outperform a $10,000 TOTL PC in minimum framerate.

Also, I hope microLED is affordable by 2030…
 

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
That concept has been attempted since the ISA slot days; it never had success because you're only exchanging which card is the motherboard and which card is an I/O expansion board.
I don't get your point here; this is an all-inclusive card, it only needs power.
All the I/O is on board.
If somebody were able to come up with a mobo that combined a bunch of these into a cloud-like network, it would be a thing to behold.
 

Shivansps

Diamond Member
Sep 11, 2013
3,872
1,527
136
I don't get your point here; this is an all-inclusive card, it only needs power.
All the I/O is on board.
If somebody were able to come up with a mobo that combined a bunch of these into a cloud-like network, it would be a thing to behold.

It's a very old concept.


Those things have existed for a long, long time; it's just an SBC in a very inefficient format, used mostly for industrial purposes. You can make SBCs smaller than that with more expandability. In fact, LGR made a video on one of those old devices some time ago.

It actually made FAR more sense with the ISA slot because, due to how that bus worked, it allowed you to have an I/O board with multiple ISA slots. You can't really do that with PCIe without adding a bridge on the I/O board.
 
Reactions: lightmanek

kschendel

Senior member
Aug 1, 2018
270
203
116
The problem I see with this notion that neural engine processing will take over is the lack of accuracy. Wetware NPUs have been developing for millions of years, and yet we still get garbage results. (Election denial in the US, for instance.) Systems like ChatGPT might be cute, but if you're using their output without a thorough vetting, you're building your house on sand, in a flood zone.

In other words, not gonna happen.
 

FlameTail

Diamond Member
Dec 15, 2021
3,122
1,786
106
If AI becomes standard in systems, then it will come as accelerator modules inside CPUs and GPUs, not the other way around.
Intel already included Nervana and then Habana in their GPUs years ago.
The inference for apps will still happen on servers; there is no reason for the same work to be done a billion times over instead of just once.
There are already matrix accelerators for AI embedded in the CPU cores of Arm processors (see Apple AMX and Arm SME).

Those accelerators are used for tasks where latency is more important than brute computing power, while the NPUs are used for tasks where the opposite is true.
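
The split is easy to see even in plain software. This is just a rough NumPy illustration of the principle, nothing AMX/SME-specific: many tiny matmuls are dominated by per-call overhead (the latency case), while one big matmul is dominated by raw FLOPS (the throughput case):

Code:
# Crude latency-vs-throughput illustration (not vendor-specific):
# in-core matrix units target the overhead-bound small case,
# NPUs target the compute-bound large case.
import time
import numpy as np

rng = np.random.default_rng(0)
small = [rng.standard_normal((16, 16)) for _ in range(10_000)]
big = rng.standard_normal((2048, 2048))

t0 = time.perf_counter()
for m in small:
    _ = m @ m                  # 10,000 tiny ops: per-call latency dominates
t1 = time.perf_counter()
_ = big @ big                  # one large op: raw throughput dominates
t2 = time.perf_counter()

print(f"10k 16x16 matmuls: {t1 - t0:.3f}s, one 2048x2048: {t2 - t1:.3f}s")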
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
M4 will be "AI focused". It will likely feature a significantly larger NPU, maybe with added acceleration for quantization types such as INT4.


See OP. I believe that by 2030, the NPU will be the biggest part of the SoC. We're going to be buying NPUs with a CPU and GPU attached to them.
 

Doug S

Platinum Member
Feb 8, 2020
2,469
4,024
136
M4 will be "AI focused". It will likely feature a significantly larger NPU, maybe with added acceleration for quantization types such as INT4.


See OP. I believe that by 2030, the NPU will be the biggest part of the SoC. We're going to be buying NPUs with a CPU and GPU attached to them.

I would think the future direction will be to integrate the GPU and NPU to avoid duplication of resources. There is a lot of similar floating-point hardware in both; the main difference is the support for smaller formats like INT16, INT8, and even INT4.

It hasn't made sense for Apple to do this yet because the NPU is a tiny block on the overall SoC, but the bigger the NPU becomes relative to the GPU and the rest of the SoC, the more sense it makes.
 

mikegg

Golden Member
Jan 30, 2010
1,815
445
136
I would think the future direction will be to integrate the GPU and NPU to avoid duplication of resources. There is a lot of similar floating-point hardware in both; the main difference is the support for smaller formats like INT16, INT8, and even INT4.

It hasn't made sense for Apple to do this yet because the NPU is a tiny block on the overall SoC, but the bigger the NPU becomes relative to the GPU and the rest of the SoC, the more sense it makes.
There's a reason they don't do this today. GPUs take a substantial amount of power to run and are becoming more and more like general-purpose compute. NPUs are far more like ASICs: they're for efficiency, optimized for low-precision inference.

In 2024, AI labs are experimenting with different quantizations. Some are trying INT4 and finding that LLMs are still pretty darn good. Some are even trying single-bit inference.

We don't know what the best is yet. Once we do, the NPU will evolve to focus on a particular quantization.
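
For anyone curious what "trying INT4" looks like concretely, here's a toy sketch of symmetric weight quantization (my own illustration, not any lab's actual scheme):

Code:
# Toy symmetric INT4 weight quantization: map floats to the 16 levels
# [-8, 7], then dequantize. Real schemes (per-group scales, GPTQ, etc.)
# are more sophisticated; this just shows the core idea.
import numpy as np

def quantize_int4(w: np.ndarray):
    scale = np.abs(w).max() / 7.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean abs quantization error: {err:.4f}")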
 
Reactions: Vattila