The importance of PCIe bandwidth for GPUs is greatly exaggerated. Knowing that the theoretical bandwidth of PCIe 3.0 x4 equals that of PCIe 1.0 x16, that for single-card performance it makes no real difference, and that an m.2 GPU would be size/heat constrained anyway, I think there could be a niche within a niche somewhere.
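For reference, here's the back-of-envelope maths on that bandwidth equivalence (the per-lane rates are the commonly quoted theoretical figures, so treat this as approximate):

```python
# PCIe per-lane throughput, one direction:
# PCIe 1.0: 2.5 GT/s with 8b/10b encoding  -> 250 MB/s per lane
# PCIe 3.0: 8 GT/s with 128b/130b encoding -> ~985 MB/s per lane
pcie_1_x16 = 250 * 16  # 4000 MB/s
pcie_3_x4 = 985 * 4    # 3940 MB/s
print(pcie_1_x16, pcie_3_x4)  # near enough identical
```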
My point exactly. Also, given that this would be for relatively low power cards (<50W), you're not talking about the kind of power where PCIe bandwidth matters much anyway. Laptop gaming at >1080p is as yet more or less a pipe dream (it exists, but at massive cost and with little to no upgradeability, i.e. no longevity). What I'm proposing here is a compact solution for ~1080p gaming in small form factor devices.
Let's use the GTX 980m that was brought up before as a baseline for a thought experiment. And yes, I'm entirely unqualified for this. But I'll give it a go anyway.
The 980m performs at a certain level - more than enough for gaming at 1080p. It does that at ~100W average power (given that it's a Maxwell card, I suspect it spikes far higher than that, but drops far lower as well).
What's a reasonable expectation for power reductions moving from 28nm to 16nm? Not a 50% drop, obviously, but quite a lot. After all, Intel dropped the TDP of their mainstream laptop CPUs from M class (35W) to U class (17W) moving from 32nm to 22nm, only sacrificing base clocks slightly. Let's say this move reduces GPU power consumption by 25%. That gives us 100% performance at 75W average power.
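In numbers (the 25% figure is purely my assumption, deliberately more modest than Intel's cut):

```python
# Intel's M -> U move for comparison, then my assumed GPU shrink benefit.
intel_tdp_drop = 1 - 17 / 35  # ~0.51, i.e. a 51% TDP cut
gpu_power_28nm = 100          # W, rough GTX 980m average from above
assumed_drop = 0.25           # my guess for the 28nm -> 16nm move
gpu_power_16nm = gpu_power_28nm * (1 - assumed_drop)  # 75 W, same performance
print(intel_tdp_drop, gpu_power_16nm)
```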
Now, let's talk about that 400mm2 GPU. If area scaling worked like simple maths, moving from 28nm to 16nm would turn a 20x20mm chip into a roughly 12x12mm chip. Of course, it's not that simple. But would I be wrong in thinking that something like 16x16mm would be doable? After all, Samsung's Exynos SoCs shrank by 40% moving from 28nm to 14nm (http://www.anandtech.com/show/9330/exynos-7420-deep-dive) - and while a straight comparison is impossible, isn't that a reasonable pointer? After all, that shrink happened while moving to "larger" and more complex CPU cores.
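Spelling out the "simple maths" next to my hedged 16x16mm guess (node names are marketing more than physics, so the ideal number is only an upper bound on the shrink):

```python
# Ideal scaling: linear dimensions shrink by 16/28, area by the square of that.
side_28nm = 20.0                  # mm, for a 400 mm^2 die
ideal_side = side_28nm * 16 / 28  # ~11.4 mm if scaling were perfect
ideal_area = ideal_side ** 2      # ~131 mm^2
guessed_area = 16 ** 2            # 256 mm^2, my more conservative 16x16mm guess
shrink = 1 - guessed_area / side_28nm ** 2  # ~0.36, close to Exynos' ~40%
print(round(ideal_side, 1), round(ideal_area), round(shrink, 2))
```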
The GTX 980m has 1536 cores and runs at "1038 + boost" MHz. For additional area and power savings, let's drop that down to, say, 1280 cores and 900 MHz + boost. (This would of course shrink the die even further.) That's a 17% drop in cores and a 13% drop in clock speed. Given that power doesn't scale linearly with clock speed, along with architectural improvements, power usage should drop far more than performance from this. I'd go out on a limb and say you could get perhaps 80% of GTX 980m performance at 50W of power.
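Here's a crude sanity check on that limb, assuming dynamic power scales roughly as cores x frequency x voltage squared, and that voltage can drop in step with frequency (both sizeable assumptions):

```python
# Crude dynamic-power model: P ~ cores * f * V^2, with V assumed to track f.
cores_scale = 1280 / 1536               # ~0.83 (the 17% core cut)
clock_scale = 900 / 1038                # ~0.87 (the 13% clock cut)
perf_scale = cores_scale * clock_scale  # ~0.72, ignoring architectural gains
power_scale = cores_scale * clock_scale ** 3  # ~0.54 under the f * V^2 assumption
print(round(perf_scale, 2), round(75 * power_scale))  # ~0.72x perf at ~41 W
```

That lands in the same ballpark as my 80%-at-50W guess, if you let architectural improvements claw back some of the performance.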
Now, add in the reduced power usage of HBM. According to Anandtech (http://www.anandtech.com/show/9266/amd-hbm-deep-dive/4), four stacks of HBM1 should use ~14W. A single stack of HBM2, even at 4GB instead of 1GB, should use less power than that. Anandtech's figures also show that to be roughly half the VRAM power usage of the Titan X and the R9 290X, both of which use GDDR5. As the 980m has fewer GDDR5 chips running slower and on a narrower bus than those cards, it uses less power for RAM, but a 10W drop still doesn't seem unreasonable to me. All the while, the bus width would quadruple to 1024 bits and effective bandwidth would increase to 256GB/s (numbers from this news post about HBM2: http://www.extremetech.com/gaming/2...specifics-up-to-16gb-of-vram-1tb-of-bandwidth). Heck, you could even squeeze an 8GB stack in there if you really wanted to, though bandwidth would stay the same and power usage would increase. Also, as Anandtech noted, HBM saves area by requiring less complex power delivery to the RAM chips.
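For what it's worth, the bandwidth numbers check out against the per-pin rates those articles quote (2 Gbps/pin is the headline HBM2 figure; the 980m comparison assumes its stock 5 Gbps GDDR5 on a 256-bit bus):

```python
# Single HBM2 stack vs. the GTX 980m's GDDR5 setup.
hbm2_bw = 1024 * 2.0 / 8   # 1024-bit bus at 2 Gbps/pin -> 256 GB/s
gddr5_bw = 256 * 5.0 / 8   # 256-bit bus at 5 Gbps      -> 160 GB/s
print(hbm2_bw, gddr5_bw)   # a 60% bandwidth increase on top of the power savings
```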
Put all of this together, and I'm pretty sure you'd be able to get 80% of GTX 980m performance at half the power usage, with significant area savings to boot.
How significant are those area savings?
[Image: memory chips, GPU and memory traces in red; rough outline of an m.2 22110 card in green for comparison.]
We now have a 16x16mm GPU. HBM1 stacks are 5x7mm (I haven't been able to find any information on HBM2 die size, but I'd guess it grows at least somewhat - a 4x density increase sounds unlikely, no?).
Now I'm speculating quite a bit, but I'd guess a lot of the package area outside the die goes to routing traces for the VRAM. If that's right, the package should shrink quite a lot with HBM. Narrowing the external interface to PCIe 3.0 x4 should shrink it further. And, for small form factors like this, why not integrate the substrate into the board itself, i.e. attach the interposer directly to the PCB? I'd guess this would have to be done at the chip manufacturing plant, but would it be any harder than what they're doing today? Using HBM takes the choice of RAM chips away from board OEMs anyway, so why not integrate production accordingly?
Would making the package 20x40mm be impossible? If not, this leaves more than half the length of an m.2 22110 card for power delivery.
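The length budget, assuming my 20x40mm package guess holds:

```python
# m.2 22110 = 22 mm wide, 110 mm long.
card_length, package_length = 110, 40         # mm
power_section = card_length - package_length  # 70 mm left for power delivery
print(power_section, round(power_section / card_length, 2))  # 70 mm, ~64%
```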
I'd also suggest standardizing "external" 12V power connectors for modules like this (pads or some form of connector at the far end of the card?), removing the limitation of the m.2 connector's 3.3V power supply. This would also make for a very tidy board layout: power pins -> power circuitry -> chip -> m.2 connector. It also removes the need for routing large power traces under/around the chip.
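The current draw is what makes the 3.3V rail a non-starter at these power levels (50W being my estimate from above):

```python
# Current needed to deliver ~50 W at each voltage.
power = 50              # W, the target from the estimates above
amps_3v3 = power / 3.3  # ~15.2 A -- far more than m.2 pins can sensibly carry
amps_12v = power / 12   # ~4.2 A  -- much more manageable
print(round(amps_3v3, 1), round(amps_12v, 1))
```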
I know I'm probably wrong in at least half of the assumptions and estimations I make in this post. But is it really such a stretch?
Tl;dr: Wild, more or less unfounded speculations resulting in ~80% of GTX 980m performance at 50W in an m.2 card.