Tensor Processing Units (TPUs) for Consumers: The Next Big Thing?

PeterScott

Platinum Member
Jul 7, 2017
With the new RTX cards from NVidia, most of the focus is on ray tracing and performance in last-generation games.

But we are also seeing the first Tensor Processing Units (TPUs) aimed at consumers. TPUs were developed by Google for machine learning, and Google open-sourced its TensorFlow software, which is driving the industry's work on deep learning.

More info on TPUs:
https://en.wikipedia.org/wiki/Tensor_processing_unit
https://cloud.google.com/blog/produ...k-at-googles-first-tensor-processing-unit-tpu

It looks like this could enable a variety of new capabilities once these chips are in the hands of consumers.

My expectation is that they will soon (within three years) migrate into all new GPU designs above entry level, putting TPUs in large numbers of consumer machines, and there may also be some migration into CPUs/SoCs.

The big potential stumbling block is the TPUs being locked to first-party trained neural networks. For example, with the RTX cards you can only use trained networks supplied by NVidia to do denoising for ray tracing and some video/imaging processing tasks. Vendors won't be able to supply their own trained networks, and consumers won't be able to experiment with training their own. Eventually, though, we should get unlocked access.

IMO, putting TPUs in the hands of consumers seems like a big step. Will they take off and become pervasive?
 
Reactions: Schmide
Mar 11, 2004
I'm not sure this takes off significantly in the consumer space, as it makes more sense to put that hardware in the cloud, where it can utilize all the data to improve. I think that's an aspect you're not taking into account: this stuff needs to be fed lots of data to improve, so the only way to realize its potential is at large scale. I think this is just something they can hype right now, but that it'll be fairly limited on the consumer side.

I do think we'll start seeing more specialized hardware, though. GPUs are becoming more complex and general-purpose, and growing so large that they're the biggest hog of power on chips (and of cost, because of their size). We're going to slowly see some of their general-purpose functionality moved to specialized hardware that is more tailored and efficient at processing it, so that power and die size can be kept in check. Heck, even things you'd think GPUs are especially suited for (image processing, video) have already been offloaded to specialized hardware.

What specifically are you talking about "variety of new capabilities when these are in the hands of consumers"? Because average people don't even know what "deep learning" is, let alone how to take advantage of it. Maybe I'm completely misunderstanding what you're talking about, but this reminds me of the person in the CPU subforum who was arguing that, now that stuff like Threadripper offers 32 threads, we're going to see consumers take over the cloud (meaning build their own individual clouds) and take down Google and the like. I think that fundamentally ignores the data aspect of both situations (plus, whatever advantage people think consumers gain from this new level of hardware, these corporations are even more able to exploit it).

So I'm assuming you're talking about someone coming up with software that automatically uses the hardware for consumers, since average consumers are definitely not going to be programming their own inference and AI programs. People are still going to be at the behest of others to enable them to actually use it in a meaningful manner, and so they're most likely just going to use companies like Google. I mean, people could definitely build their own indexes and search engines, but somehow we didn't see that happen.

That's not to say this isn't interesting, in that consumers will be able to tinker around with it as it becomes economically feasible for them to get in on it. But then, Tensor cores have been around for a little while, so it's kind of like arguing that consumers could design their own tensor chips and get them built; that's about as feasible as expecting they'll code their own programs to use them. Much like Threadripper, it's great for tinkerers and enthusiasts, but I think people believing this is going to spur some huge consumer development (when has that ever happened?) are being naive (don't take that as an insult, I love the optimism!). I could maybe see it being like app and game development, where these new tools enable a lot more people to try to do those things, and we'll see some development (and some really neat/well-done stuff), but the large players will still dominate.
 

PeterScott

Platinum Member
Jul 7, 2017
What specifically are you talking about "variety of new capabilities when these are in the hands of consumers"? Because average people don't even know what "deep learning" is, let alone how to take advantage of it.

I don't foresee many normal consumers training their own DL neural networks.

I just see them taking advantage of pre-trained DL networks, and they wouldn't really have to know deep learning exists, just that certain applications get a boost from certain video cards.

The obvious early uses are likely image/video processing, but as the hardware gets into the hands of more consumers, developers would likely start finding more uses for this new problem-solving tool.
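To make that concrete, here's a rough sketch of what "an application using a pre-trained network" looks like on the developer side. This assumes TensorFlow/Keras and its bundled MobileNetV2 ImageNet weights, and the photo file name is made up; the consumer would just see a feature like automatic photo tagging that runs better on cards with the right hardware.

```python
# Minimal sketch: an application shipping a pre-trained network and running
# local inference. Assumes TensorFlow/Keras with the bundled MobileNetV2
# ImageNet weights; the end user never touches any of this directly.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")  # pre-trained, no training required

def tag_image(path):
    """Return the top-3 labels for an image file (e.g. for photo auto-tagging)."""
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    preds = model.predict(x)
    return tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3)[0]

print(tag_image("holiday_photo.jpg"))  # hypothetical file name
```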

We may be on the verge of a different paradigm in computer problem solving making it into homes. How it might be used, or even whether it will be, remains a big unknown. But historically, new tools lead to new approaches and new applications.

To me, this is a lot more interesting than just having more CPU/GPU performance.
 
Reactions: TheF34RChannel

ThatBuzzkiller

Golden Member
Nov 14, 2014
Define "next big thing" ?

The biggest killer apps for machine learning so far are voice/text recognition, computer vision, maybe artificial intelligence, video/image processing like you mentioned, data mining, finance, search engine queries, and time series analysis. Machine learning will unquestionably change the way we collect and analyze data, but I don't think it will fundamentally change the way we do signal processing on a grand scale ...
 

coercitiv

Diamond Member
Jan 24, 2014
We may be on the verge of a different paradigm in computer problem solving making it into homes. How it might be used, or even whether it will be, remains a big unknown. But historically, new tools lead to new approaches and new applications.
Let me introduce you to your alter ego:
LOL!

In our constantly connected, living-online world, everything is migrating to the cloud: people are abandoning the ownership model for the service model (Netflix, Spotify), as well as backing up (or even only keeping) their personal data in the cloud.

That applies to computing resources as well.
 
Reactions: IEC

Braznor

Diamond Member
Oct 9, 2005
Has it been confirmed that the tensor cores in the RTX series are locked down and cannot be used to train our own networks?
 

coercitiv

Diamond Member
Jan 24, 2014
Exactly what is the zinger you think you found in your out-of-context quote mining?
On one hand you talk about putting TPUs in a large number of consumer machines; on the other hand you LOL at people who entertain the idea of increased computation power at home, declaring the cloud model the only alternative for computation-intensive applications.
 

PeterScott

Platinum Member
Jul 7, 2017
On one hand you talk about putting TPUs in a large number of consumer machines; on the other hand you LOL at people who entertain the idea of increased computation power at home, declaring the cloud model the only alternative for computation-intensive applications.

I LOL'd at the absurd statement that "the cloud era is over". This was emphasized in my original post.

Which is why quote mining out of context, from other threads no less, isn't great.

Highlighting the absurdity of that statement by pointing out the obvious growth in cloud usage doesn't mean I think the cloud is the only way for computation to take place. It was just a counterpoint to the absurdity.

Do you have any interest in discussing TPUs, or do you just want to mine threads for out of context quotes?
 

PeterScott

Platinum Member
Jul 7, 2017
Has it been confirmed that the tensor cores in the RTX series are locked down and cannot be used to train our own networks?

No, but given the way companies like to price-segment resources, my unfortunate expectation is that this is what will happen.
 

Midwayman

Diamond Member
Jan 28, 2000
TPUs on consumer cards feel more like Nvidia trying to spread out its R&D cost for workstation/datacenter or self-driving compute than anything else. I.e., Nvidia would rather not make a gaming-specific GPU, so we're getting TPUs, and now they need to find a way to market that as a good thing.
 

PeterScott

Platinum Member
Jul 7, 2017
TPUs on consumer cards feel more like Nvidia trying to spread out its R&D cost for workstation/datacenter or self-driving compute than anything else. I.e., Nvidia would rather not make a gaming-specific GPU, so we're getting TPUs, and now they need to find a way to market that as a good thing.


NVidia has been pursuing real-time ray tracing a lot longer than it has been involved in self-driving cars, or deep learning for that matter.

TPUs are on consumer cards for the ray tracing, nothing more, nothing less. If NVidia could denoise effectively without TPUs, they wouldn't be there.

The interesting thing for me is how much use they get outside of ray tracing.
 

Dribble

Platinum Member
Aug 9, 2005
TPUs are definitely something for the future, for all devices. Where you put them is a different question. In phones etc. you have one chip with a number of modules (CPU/GPU/sound/video decode) that all access the same memory over a very fast interconnect. I can see an AI module to do voice recognition and the like.

PCs traditionally have lots of separate bits (CPU, sound, graphics), but that leads to problems with communication between them, and each bit needs its own memory, etc. It's not very efficient, but it is easy to upgrade. I guess in Nvidia's future for the PC, the GPU becomes the centre of everything, as it's more than a GPU: it provides most of the specialised processing, and it's attached to a CPU for general-purpose stuff and I/O. I guess AMD's and Intel's dream is to go the way of phones/consoles and turn PCs into one-chip solutions.
 

PeterScott

Platinum Member
Jul 7, 2017
TPUs are definitely something for the future, for all devices. Where you put them is a different question. In phones etc. you have one chip with a number of modules (CPU/GPU/sound/video decode) that all access the same memory over a very fast interconnect. I can see an AI module to do voice recognition and the like.

PCs traditionally have lots of separate bits (CPU, sound, graphics), but that leads to problems with communication between them, and each bit needs its own memory, etc. It's not very efficient, but it is easy to upgrade. I guess in Nvidia's future for the PC, the GPU becomes the centre of everything, as it's more than a GPU: it provides most of the specialised processing, and it's attached to a CPU for general-purpose stuff and I/O. I guess AMD's and Intel's dream is to go the way of phones/consoles and turn PCs into one-chip solutions.

While we don't have a clear breakdown of how much die area the TPUs are using, it does seem like a significant portion of an enormous die, which makes putting them in phones questionable for some time. You also need a significant use case to justify spending the die area.

GPUs are a good starting location, since you have a use case for them (ray tracing), and the GPU might be a good home for related applications (video/image processing). I don't know how much memory bandwidth they require, but GPUs have the most. Overall this seems like the right place to start. Applications will determine if/where they go next.
 

Qwertilot

Golden Member
Nov 28, 2013
DLSS? It's very tentative, but the graphs make it look like it might really do something big performance-wise.
(Probably more benefit per unit of area than more shaders, etc.)

If it does, it could make sense to ship the smaller chips with TPUs but not the ray tracing bits.
 

TheF34RChannel

Senior member
May 18, 2017
DLSS? It's very tentative, but the graphs make it look like it might really do something big performance-wise.
(Probably more benefit per unit of area than more shaders, etc.)

If it does, it could make sense to ship the smaller chips with TPUs but not the ray tracing bits.

We have only the one graph, and it's from Nvidia, so it's highly biased. Although DLSS looks very promising, I'd like to see independent articles on it.

Can a chip exist without one of those core designs? You lot are much more knowledgeable than I am in that regard.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Regardless of whether the average home user finds much use for dedicated deep learning hardware, the fleet of mid-range, average software developers who don't work for huge companies will absolutely benefit. This, plus increasingly easy-to-use deep learning and machine learning libraries like TensorFlow, means that average Joe developer at an average small-to-mid-cap company ($25m-$100m in revenue-ish) can start to build software that utilizes this. Multiply that across the thousands of different niched-up apps these developers are working on. That can be huge, even if it only reaches consumers indirectly. It's nothing you can't already do on AWS, but having a card locally in your machine for dev purposes will make it even more accessible.
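For a sense of how low that barrier has gotten, here's a hedged sketch of the entry point: a complete, trainable TensorFlow/Keras model on random stand-in data (the data shapes and the saved file name are made up). If TensorFlow's GPU support is set up, the same script will use a local card without any code changes.

```python
# Rough sketch of how low the barrier has become: a complete, trainable model
# in a few lines of TensorFlow/Keras, using random stand-in data.
import numpy as np
import tensorflow as tf

# Stand-in for whatever niche tabular data "average Joe developer" has.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)
model.save("my_niche_model.h5")  # hypothetical output path, shipped with the app
```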
 

PeterScott

Platinum Member
Jul 7, 2017
We have only the one graph, and it's from Nvidia, so it's highly biased. Although DLSS looks very promising, I'd like to see independent articles on it.

Can a chip exist without one of those core designs? You lot are much more knowledgeable than I am in that regard.

Video link from the main thread (below); I watched the whole thing. Lots of info indicating developers will be able to add tensor AI to games in whatever way they want. NVidia will help with training the networks.

Lots of specific DLSS info starting at the time stamp below, including that this is definitely a post-processing buffer AA and, key, that it is trained specifically for each game. NVidia will train DLSS for specific games on their supercomputer network.

Talks about training DLSS for each game here:
https://youtu.be/YNnDRtZ_ODM?t=24m4s

Downside: there will be no forcing DLSS on, or adding it easily. It has to be trained for each game.

Upside: DLSS should be really good if it is trained on each game.
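To illustrate the "trained per game" idea (and only to illustrate; this is not NVidia's actual DLSS network, and every size and number below is made up), here's a tiny Keras sketch of a post-process network fitted on pairs of cheaply rendered frames and high-quality supersampled reference frames captured from one game:

```python
# Illustrative sketch ONLY -- not NVidia's actual DLSS network or training setup.
# It just shows the general idea of a per-game, post-process network: train on
# pairs of (cheaply rendered frame, supersampled reference frame) captured from
# that game, then run the cheap frame through the net at runtime.
import numpy as np
import tensorflow as tf

H, W = 270, 480          # low-res render target (assumed sizes)
SCALE = 4                # upscale factor to the display resolution

def build_upscaler():
    inp = tf.keras.Input(shape=(H, W, 3))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(3 * SCALE * SCALE, 3, padding="same")(x)
    # Pixel-shuffle the channel dimension into spatial resolution.
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, SCALE))(x)
    return tf.keras.Model(inp, out)

model = build_upscaler()
model.compile(optimizer="adam", loss="mae")

# Random stand-ins for frame pairs captured from one specific game.
cheap_frames = np.random.rand(4, H, W, 3).astype("float32")
reference_frames = np.random.rand(4, H * SCALE, W * SCALE, 3).astype("float32")
model.fit(cheap_frames, reference_frames, epochs=1, batch_size=1)
```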

Edit:
Below he talks about seeding an innovation platform, putting AI computing within reach of garage computer scientists.
https://youtu.be/YNnDRtZ_ODM?t=40m32s
 
Reactions: TheF34RChannel

NTMBK

Lifer
Nov 14, 2011
I think we'll definitely see more DNN inference accelerators integrated into PC chips; they're too useful and power-efficient for lots of tasks that consumers might want to use them for (see all the stuff like face tracking and recognition on phones). Microsoft has already added Windows ML and DirectML: https://blogs.msdn.microsoft.com/directx/2018/03/19/gaming-with-windows-ml/ Hopefully Intel and AMD will add backends at some point. AMD's INT8 packed math should work great for inference tasks.
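As a sketch of what vendor-neutral local inference looks like from the application side, here's a minimal example using ONNX Runtime. The onnxruntime-directml package and its DmlExecutionProvider are my assumption for the DirectML path, and "model.onnx" is a placeholder for an already-converted network; the CPU provider acts as the fallback where no accelerator backend exists.

```python
# Sketch of vendor-neutral local inference through ONNX Runtime, assuming the
# onnxruntime-directml package (which exposes DirectML as an execution
# provider) and a hypothetical pre-converted model.onnx file.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # GPU via DirectML, CPU fallback
)

input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype("float32")  # stand-in input tensor
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```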
 

Dribble

Platinum Member
Aug 9, 2005
Video link from the main thread (below); I watched the whole thing. Lots of info indicating developers will be able to add tensor AI to games in whatever way they want. NVidia will help with training the networks.

Lots of specific DLSS info starting at the time stamp below, including that this is definitely a post-processing buffer AA and, key, that it is trained specifically for each game. NVidia will train DLSS for specific games on their supercomputer network.

That supercomputer will be in the cloud. You'll pay for some time on it to optimise, so in that way it will be available to everyone (at a cost, obviously). It does raise the question of what happens when you get AMD AI cores, or maybe Google or ARM AI cores in phones. If you have to train the network, then how cross-compatible is that training? If it isn't very compatible, then that's a huge barrier to entry for, say, ARM or AMD. Google and Nvidia can write the software and build the cloud supercomputers to train with; AMD and ARM don't have the same level of software skills or history of cloud computing.
 

NTMBK

Lifer
Nov 14, 2011
That supercomputer will be in the cloud. You'll pay for some time on it to optimise, so in that way it will be available to everyone (at a cost, obviously). It does raise the question of what happens when you get AMD AI cores, or maybe Google or ARM AI cores in phones. If you have to train the network, then how cross-compatible is that training? If it isn't very compatible, then that's a huge barrier to entry for, say, ARM or AMD. Google and Nvidia can write the software and build the cloud supercomputers to train with; AMD and ARM don't have the same level of software skills or history of cloud computing.

From the WinML/DirectML article I linked above:

As we disclosed earlier this month, The WinML API allows game developers to take their trained models and perform inference on the wide variety of hardware (CPU, GPU, VPU) found in gaming machines across all vendors. A developer would choose a framework, such as CNTK, Caffe2, or Tensorflow, to build and train a model that does anything from visually improving the game to controlling NPCs. That model would then be converted to the Open Neural Network Exchange (ONNX) format, co-developed between Microsoft, Facebook, and Amazon to ensure neural networks can be used broadly. Once they've done this, they can pipe it up to their game and expect it to run on a gamer's Windows 10 machine with no additional work on the gamer's part. This works, not just for gaming scenarios, but in any situation where you would want to use machine learning on your local machine.
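A hedged sketch of that convert-to-ONNX step, assuming TensorFlow/Keras plus the tf2onnx converter package (my choice of tooling, not something the article specifies); the model here is a throwaway stand-in rather than anything trained on real data:

```python
# Sketch of the "train in a framework, convert to ONNX" step the article
# describes, assuming TensorFlow/Keras plus the tf2onnx converter package.
# The model is a throwaway stand-in; a real workload would be trained first.
import tensorflow as tf
import tf2onnx

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, output_path="model.onnx")
# model.onnx can then be shipped with the game and loaded through WinML/DirectML.
```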
 

PeterScott

Platinum Member
Jul 7, 2017
That supercomputer will be in the cloud. You'll pay for some time on it to optimise, so in that way it will be available to everyone (at a cost, obviously). It does raise the question of what happens when you get AMD AI cores, or maybe Google or ARM AI cores in phones. If you have to train the network, then how cross-compatible is that training? If it isn't very compatible, then that's a huge barrier to entry for, say, ARM or AMD. Google and Nvidia can write the software and build the cloud supercomputers to train with; AMD and ARM don't have the same level of software skills or history of cloud computing.

During the video I linked above, Tom mentions at least once that training your networks, for DLSS at least, will be FREE as part of their developer relations program.
 

TheF34RChannel

Senior member
May 18, 2017
During the video I linked above, Tom mentions at least once that training your networks, for DLSS at least, will be FREE as part of their developer relations program.

Hopefully this leads to a broad variety of developers opting to include DLSS. I'd like to know how long it takes to train a network.
 