AMD's next GPU uarch is called "Polaris"

maddie · Jan 5, 2016

raghu78 said:
Actually it's well known that 16/14nm yields for die sizes above 200 sq mm are really bad. In fact only TSMC 16FF+ is going to be feasible for 300+ sq mm GPUs in terms of yields. But TSMC will be capacity constrained as demand far outstrips supply at TSMC 16FF+. Moreover TSMC will give first priority to Apple A9, A9X and A10/A10X (Q3 2016 release). Nvidia and AMD are better served by using 16FF+ for selling high performance USD 300+ GPUs. AMD's choice to thus go with two GPU dies - a 110-120 sq mm low power GPU die fabbed at GF 14LPP and a high performance 300 sq mm GPU die fabbed at TSMC 16FF+ - makes sense.

My guess is that the low power GPU specs will be:
R7 470 - 768 sp,
R7 470X - 1024 sp, 1 geometry engine, 1 raster engine, 32 ROP, 128 bit memory bus 8 GHz GDDR5

The performance will be on par with GTX 960 for the fully enabled SKU and GTX 950 for the salvage SKU.

The high performance GPU using HBM2 will power 4 SKUs, as I expect yields to be really bad for 300 sq mm GPUs in 2016. I think there are going to be heavily salvaged SKUs in 2016 to fill the product stack. We will see a dedicated mid range chip in 2017 once yields are much better.

R9 490X - 4096 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048 bit HBM2, 512 GB/s, 8 GB.

R9 490 - 3072 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048 bit HBM2, 512 GB/s, 8 GB.

R9 480X - 2048 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024 bit HBM2, 256 GB/s, 4 GB.
R9 480 - 1792 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024 bit HBM2, 256 GB/s, 4 GB.
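
As a rough sanity check on the bandwidth figures in the speculated SKU list, peak memory bandwidth is just bus width times per-pin data rate divided by 8. The sketch below assumes "8 GHz GDDR5" means 8 Gbit/s per pin and assumes an HBM2 pin rate of 2 Gbit/s, which is the rate the quoted 512 GB/s and 256 GB/s numbers imply; the function name is only illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * gbps_per_pin / 8


# Assumption: "8 GHz" GDDR5 is the effective rate, i.e. 8 Gbit/s per pin.
gddr5_128 = peak_bandwidth_gb_s(128, 8.0)

# Assumption: HBM2 at 2 Gbit/s per pin, inferred from the quoted GB/s figures.
hbm2_2048 = peak_bandwidth_gb_s(2048, 2.0)
hbm2_1024 = peak_bandwidth_gb_s(1024, 2.0)

print(f"128-bit GDDR5 : {gddr5_128:.0f} GB/s")
print(f"2048-bit HBM2 : {hbm2_2048:.0f} GB/s")
print(f"1024-bit HBM2 : {hbm2_1024:.0f} GB/s")
```

Under those assumptions the HBM2 figures in the list are self-consistent, and the small GDDR5 die would sit at about a quarter of the flagship's bandwidth.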

AMD's approach makes a lot of sense as they use GF 14LPP to serve the high volume GPU market as AMD has a WSA to meet. GF will be able to yield a 110-120 sq mm die reasonably well enough and AMD can try and push as much volume as possible from GF 14LPP. TSMC 16FF+ will be used for the bleeding edge GPUs of 2016.

I expect 4th gen GCN to have significant improvements in perf/sp and thus I think we can expect a 25-30% faster flagship R9 490X GPU compared to Fury X. I think Nvidia will come out with a faster GPU as Maxwell already has impressive perf/cc and Pascal should bring more. I think the Nvidia GPU will be 10% faster than AMD's flagship GPU.

I guess we're on opposite ends of this.

You actually see <50% shaders as a harvested die? Wow. I would never have considered that.

My belief is that we are still mentally trapped in the old world of monolithic designs. The use of interposers radically changes the old design limits. Interposers are NOT PCBs.

With your above example, the 490X is a bit better than FuryX, maybe 20-25% performance wise, not power.

Two 200-225mm^2 die should allow a 490X to be 80 % better than FuryX AND maybe cost the same as a 300mm^2 one, if your prediction of terrible yields as you approach 300mm^2 is true.
You will need an interposer and HBM in both cases, and the present one in Fury is big enough for (2) 225 GPU die.
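
To put rough numbers on the "two smaller dies might cost the same as one 300mm^2 die" argument, here is a minimal sketch using a simple Poisson yield model, Y = exp(-A * D0). The defect density `D0` is a purely hypothetical value for an immature process, not a published figure, and the model ignores die harvesting, wafer edge loss, and interposer or assembly cost.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a Poisson model: Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def relative_cost_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Silicon cost per good die in arbitrary units: area paid for / yield."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)


D0 = 0.5  # hypothetical defects per cm^2 for an early 14/16nm process

single_300 = relative_cost_per_good_die(300, D0)
dual_225 = 2 * relative_cost_per_good_die(225, D0)
cost_ratio = dual_225 / single_300

print(f"yield at 300 mm^2: {poisson_yield(300, D0):.2f}")
print(f"yield at 225 mm^2: {poisson_yield(225, D0):.2f}")
print(f"cost of 2 x 225 mm^2 relative to 1 x 300 mm^2: {cost_ratio:.2f}")
```

With this assumed defect density the two 225mm^2 dies land within a few percent of the silicon cost of one 300mm^2 die while providing 50% more area. A lower `D0` tilts the comparison back toward the single die, so the conclusion depends heavily on how bad early yields really are.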

IF AMD wants to regain market share, they can't dance with Nvidia. They have to be clear in front. Raja must know this.

Also, if you see 300mm^2 as being poor yielding at first, why are so many Nvidia fans expecting a big GP100 early?

Techhog · Jan 5, 2016

maddie said:
Just bringing us back to earth for a moment.

My error in not being clear. This was for Kenmitch, two posts earlier. It had nothing to do with the rest of the post. Since yesterday he has been dropping some hilarious posts. Strangely though, quite a few here are taking him at face value adding to the humor.

Now to the rest of your post.

Just found this example of high performance multi-die on interposer:

http://www.semiwiki.com/forum/showw...ree-Dimensional+Integrated+Circuit+3D+IC+Wiki

"Production interposer: Xilinx Virtex-7
Xilinx is using this technology for their Virtex-7 FPGAs. They call the technology “stacked silicon interconnect” and claim that it gives them twice the FPGA capacity at each process node. This is because very large FPGAs only become viable late after process introduction when a lot of yield learning has taken place. Earlier in the lifetime of the process, Xilinx have calculated, it makes more sense to create smaller die and then put several of them on a silicon interposer instead. It ends up cheaper despite the additional cost of the interposer because such a huge die would not yield economic volumes.
The Xilinx interposer consists of 4 layers of 65um metal on a silicon substrate. TSVs through the interposer allow this metal to be connected to the package substrate. Microbumps allow 4 FPGA die to be flipped and connected to the interposer. See the picture above. An additional advantage of the interposer is that it makes power distribution across the whole die simpler. This seems to be the only design in high volume production today."

I bolded the advantages possible in the above quote.

All I'm saying is what I posted before. We get strange conclusions if RTG is only producing 2 Die and the 1st one is 100-110 mm^2.

Do you believe that they will surpass Fury X this year?
What size Die?
Where is the mid range?
Is mid range a 50-60% activated full die?
Can AMD afford this waste as mid-range will sell many multiples of high end?

If this can be done, we can have very high performance 14nm GPUs very early in the cycle. The interposer will introduce latency, but as I keep reading on this forum, "GPU latency can be designed around".

In my opinion the advantages are compelling. Surely AMD has thought about it.

This theory is now an annual thing with AMD. It's not going to happen. That quote doesn't mean that it can actually work for GPU without the same limitations as Crossfire. That would mean that there would be no benefit whatsoever to producing this die other than a smaller board. So, I'm standing by what I said. In fact, if it happens and it's functionally the same as a single GPU, I will buy the card for one of you people who insist that this will happen every year.

MrTeal · Jan 5, 2016

maddie said:
I guess we're on opposite ends of this.

You actually see <50% shaders as a harvested die? Wow. I would never have considered that.

My belief is that we are still mentally trapped in the old world of monolithic designs. The use of interposers radically changes the old design limits. Interposers are NOT PCBs.

With your above example, the 490X is a bit better than FuryX, maybe 20-25% performance wise, not power.

Two 200-225mm^2 die should allow a 490X to be 80 % better than FuryX AND maybe cost the same, if your prediction of terrible yields as you approach 300mm^2 is true.
You will need an interposer and HBM in both cases, and the present one in Fury is big enough for (2) 225 GPU die.

IF AMD wants to regain market share, they can't dance with Nvidia. They have to be clear in front. Raja must know this.

What exactly do you think an interposer allows you to do that a PCB doesn't? An interposer is used for HBM to allow much higher density routing than is possible with a standard substrate, while also decreasing the drive necessary on each pin. Other than that, it doesn't itself enable anything that you can't do with a PCB.

Could you put two 200mm^2 GPUs on a silicon interposer? Sure. To what end? The GPUs would still need to communicate over PCI-E, and you'd have essentially a traditional dual GPU card with all the pitfalls of that.

It's possible that using an interposer could allow you to have an extremely dense parallel interface between the GPU dies that would allow it to be treated as a single monolithic GPU and have both dies working on the same scene without major issues, but no one's talked about such a technology being developed, let alone used in Polaris. I'd imagine it would have to be a pretty fundamental change to the front and back ends of the chip.

Re: the Virtex FPGAs with multiple dies, while I haven't looked into it, I imagine there are going to be some pretty interesting restrictions on crossing signals between the dies, and some severe timing implications. Even turning the corner in an FPGA can severely impact your timing; going off die would be interesting, to say the least.

Techhog · Jan 5, 2016

MrTeal said:
What exactly do you think an interposer allows you to do that a PCB doesn't? An interposer is used for HBM to allow much higher density routing than is possible with a standard substrate, while also decreasing the drive necessary on each pin. Other than that, it doesn't itself enable anything that you can't do with a PCB.

Could you put two 200mm^2 GPUs on a silicon interposer? Sure. To what end? The GPUs would still need to communicate over PCI-E, and you'd have essentially a traditional dual GPU card with all the pitfalls of that.

It's possible that using an interposer could allow you to have an extremely dense parallel interface between the GPU dies that would allow it to be treated as a single monolithic GPU and have both dies working on the same scene without major issues, but no one's talked about such a technology being developed, let alone used in Polaris. I'd imagine it would have to be a pretty fundamental change to the front and back ends of the chip.

Re: the Virtex FPGAs with multiple dies, while I haven't looked into it, I imagine there are going to be some pretty interesting restrictions on crossing signals between the dies, and some severe timing implications. Even turning the corner in an FPGA can severely impact your timing; going off die would be interesting, to say the least.

You are the one truly bringing us back to Earth.

maddie · Jan 6, 2016

MrTeal said:
What exactly do you think an interposer allows you to do that a PCB doesn't? An interposer is used for HBM to allow much higher density routing than is possible with a standard substrate, while also decreasing the drive necessary on each pin. Other than that, it doesn't itself enable anything that you can't do with a PCB.

Could you put two 200mm^2 GPUs on a silicon interposer? Sure. To what end? The GPUs would still need to communicate over PCI-E, and you'd have essentially a traditional dual GPU card with all the pitfalls of that.

It's possible that using an interposer could allow you to have an extremely dense parallel interface between the GPU dies that would allow it to be treated as a single monolithic GPU and have both dies working on the same scene without major issues, but no one's talked about such a technology being developed, let alone used in Polaris. I'd imagine it would have to be a pretty fundamental change to the front and back ends of the chip.

Re: the Virtex FPGAs with multiple dies, while I haven't looked into it, I imagine there are going to be some pretty interesting restrictions on crossing signals between the dies, and some severe timing implications. Even turning the corner in an FPGA can severely impact your timing; going off die would be interesting, to say the least.

Black bold:
I disagree that you have to use PCIe inter-die connection.

Red bold:
This is what I'm thinking. The sheer increase in signal lines for an interposer is impossible for a PCB. You can have literally thousands of connections through the microbumps.
Yes, a radical change to the front and rear is needed. You might even need at least 2 die besides HBM on the interposer: an In-Out die to also feed/coordinate the processing + (1 or more) shader/vertex/etc. processing die. Remember AMD appears to have changed their design philosophy and made a lot of IP synthesizable. They can combine IP blocks more easily than ever in their history.

Actually, papers on interposers have covered cases where it might be possible to space out an SoC and have the previously monolithic die replaced by component blocks. One advantage is easier upgrades, another is lower cost. I'll try to find the ones I saved.

If this can happen, there is no way AMD will announce early as it allows them to have the undisputed performance crown.

Now I have to admit as you know that this is pure speculation, but at least a reasoned one.

By the way, if the demoed Polaris is using around 35-40W from a roughly 100-110mm^2 die with GDDR5, then a max sized 600mm^2 die with HBM would come in well under the traditional 250-300W high end power budget. That leaves power room for more processing than can fit in a traditional max size 600mm^2 die.
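
The arithmetic behind that estimate can be sketched as a naive linear scaling of power with die area. This assumes the same clocks and voltage across die sizes and ignores the difference between GDDR5 and HBM power, so treat it as a back-of-the-envelope bound rather than a prediction; the function name is just for illustration.

```python
def scaled_power_w(small_power_w: float, small_area_mm2: float,
                   big_area_mm2: float) -> float:
    """Scale power linearly with die area (assumes equal power density)."""
    return small_power_w * big_area_mm2 / small_area_mm2


# Most optimistic case: 35 W measured on the larger 110 mm^2 estimate.
low = scaled_power_w(35, 110, 600)
# Most pessimistic case: 40 W measured on the smaller 100 mm^2 estimate.
high = scaled_power_w(40, 100, 600)

print(f"600 mm^2 at the same power density: {low:.0f} W to {high:.0f} W")
```

Under these assumptions a 600mm^2 die lands at roughly 190 W to 240 W, which is where the "under 250-300W" headroom claim comes from.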

Thanks for at least considering the possibilities and seeing possible solutions. Too many just shoot down any strange idea without the slightest examination. Very strange for a supposed enthusiast forum.

maddie · Jan 6, 2016

Techhog said:
You are the one truly bringing us back to Earth.

Didn't you read my post explaining the "bringing us back to Earth" statement? It was for Kenmitch and his Polaris aliens post.

3DVagabond · Jan 6, 2016

maddie said:
Just bringing us back to earth for a moment.

What do we know:
2 new 14nm GPU die this year [Raja Koduri]
1st die released probably = 100-110 mm^2 [GTX950 price range @PcPer]

Infer:
A 250mm^2 die needed to equal FuryX assuming some architectural gains
A 300 mm^2 die needed to give FuryX + 20%
Implies 100mm^2 and 300mm^2 as the two new die designs

Problems:
A huge gap between them [worse ratio than R7 260 : R9 290X]
Lots of value wasted in die harvesting

Conclusion:
Unrealistic for cash poor AMD
Missing some important information

Suggested Solution:
100-110 mm^2 Gddr5 die
200-225 mm^2 HBM die
Interposer multi-die approach for high end market [Fury interposer is big enough]

Well, some had Hawaii as a dual die because they were certain that AMD would never do a 550mm^2+ chip. We had Fury being dual Tonga, for some reason I can't remember, and now it's the top Polaris SKU? Eh, why not.

3DVagabond · Jan 6, 2016

raghu78 said:
Actually it's well known that 16/14nm yields for die sizes above 200 sq mm are really bad. In fact only TSMC 16FF+ is going to be feasible for 300+ sq mm GPUs in terms of yields. But TSMC will be capacity constrained as demand far outstrips supply at TSMC 16FF+. Moreover TSMC will give first priority to Apple A9, A9X and A10/A10X (Q3 2016 release). Nvidia and AMD are better served by using 16FF+ for selling high performance USD 300+ GPUs. AMD's choice to thus go with two GPU dies - a 110-120 sq mm low power GPU die fabbed at GF 14LPP and a high performance 300 sq mm GPU die fabbed at TSMC 16FF+ - makes sense.

My guess is that the low power GPU specs will be:
R7 470 - 768 sp,
R7 470X - 1024 sp, 1 geometry engine, 1 raster engine, 32 ROP, 128 bit memory bus 8 GHz GDDR5

The performance will be on par with GTX 960 for the fully enabled SKU and GTX 950 for the salvage SKU.

The high performance GPU using HBM2 will power 4 SKUs, as I expect yields to be really bad for 300 sq mm GPUs in 2016. I think there are going to be heavily salvaged SKUs in 2016 to fill the product stack. We will see a dedicated mid range chip in 2017 once yields are much better.

R9 490X - 4096 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048 bit HBM2, 512 GB/s, 8 GB.

R9 490 - 3072 sp, 4 geometry engines, 4 raster engines, 128 ROPs, 2048 bit HBM2, 512 GB/s, 8 GB.

R9 480X - 2048 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024 bit HBM2, 256 GB/s, 4 GB.
R9 480 - 1792 sp, 2 geometry engines, 2 raster engines, 64 ROPs, 1024 bit HBM2, 256 GB/s, 4 GB.

AMD's approach makes a lot of sense as they use GF 14LPP to serve the high volume GPU market as AMD has a WSA to meet. GF will be able to yield a 110-120 sq mm die reasonably well enough and AMD can try and push as much volume as possible from GF 14LPP. TSMC 16FF+ will be used for the bleeding edge GPUs of 2016.

I expect 4th gen GCN to have significant improvements in perf/sp and thus I think we can expect a 25-30% faster flagship R9 490X GPU compared to Fury X. I think Nvidia will come out with a faster GPU as Maxwell already has impressive perf/cc and Pascal should bring more. I think the Nvidia GPU will be 10% faster than AMD's flagship GPU.

In which workload? 1080P DX11?

MrTeal said:
I don't see why it wouldn't be possible, though I'd agree with you that we're probably not likely to see a true dual GPU chip. Even if they did do something like this, there's no indication that AMD included any synchronization technology that would make the thing anything but a glorified 295X2 with both GPUs under a common heatsink.

This^ Strapping together GPUs is not like 2x4C CPUs. I don't believe syncing thousands of cores has ever been attempted, because it simply can't be feasibly done.

wege12 · Jan 6, 2016

How do you guys predict the first 14nm GPUs that AMD releases will compare, performance wise, to the Fury X?

I apologize if this has already been discussed...

Techhog · Jan 6, 2016

wege12 said:
How do you guys predict the first 14nm GPUs that AMD releases will compare, performance wise, to the Fury X?

I apologize if this has already been discussed...

Slower. They'll be mobile GPUs.

wege12 · Jan 6, 2016

Techhog said:
Slower. They'll be mobile GPUs.

Okay, that makes sense. What about their non-mobile 14nm GPUs?

Techhog · Jan 6, 2016

wege12 said:
Okay, that makes sense. What about their non-mobile 14nm GPUs?

No idea. That's all a bit hazy. Both AMD and Nvidia have only talked about efficiency so far.

Headfoot · Jan 6, 2016

MrTeal said:
Could you put two 200mm^2 GPUs on a silicon interposer? Sure. To what end? The GPUs would still need to communicate over PCI-E, and you'd have essentially a traditional dual GPU card with all the pitfalls of that.

What? No. Where are you getting this? Do you think HBM and Fiji talk via PCI-e??? No.... You can use any fabric you choose to implement. Obviously, and this is very, very obvious, if AMD were to do a multiple-die-on-interposer strategy they would be attached via a fabric better suited to the task than PCI-e. They were talking up how great their SeaMicro-acquired fabric was when they still were pushing that, and they are really big on memory tech (and a large part of memory tech is the fabric associated with it). I doubt multi die on interposer is happening for Polaris. It's hardly impossible, though.

wege12 · Jan 6, 2016

Techhog said:
No idea. That's all a bit hazy. Both AMD and Nvidia have only talked about efficiency so far.

Interesting. Kind of leads me to believe both Nvidia and AMD are unfortunately going to stagger their new GPU releases to fully milk all performance gains from the process node.

ShintaiDK · Jan 6, 2016

wege12 said:
Interesting. Kind of leads me to believe both Nvidia and AMD are unfortunately going to stagger their new GPU releases to fully milk all performance gains from the process node.

The lifespan of 28nm will look short compared to 14/16nm. And it will be interesting to see if we ever see a dGPU below it. Remember it's also about cost. A 28nm transistor is slightly cheaper than a 14/16nm transistor. So in rough terms, they could shrink current GPUs and the price would be the same. Any extra performance from more transistors costs $$$ on the bottom line.

However I am all for performance/watt and lower power consumption

MrTeal · Jan 6, 2016

Headfoot said:
What? No. Where are you getting this? Do you think HBM and Fiji talk via PCI-e??? No.... You can use any fabric you choose to implement. Obviously, and this is very, very obvious, if AMD were to do a multiple-die-on-interposer strategy they would be attached via a fabric better suited to the task than PCI-e. They were talking up how great their SeaMicro-acquired fabric was when they still were pushing that, and they are really big on memory tech (and a large part of memory tech is the fabric associated with it). I doubt multi die on interposer is happening for Polaris. It's hardly impossible, though.

HBM and Fiji obviously talk through their 1024-bit interface, which is a fundamental design feature of HBM. The use of the interposer allows the kinds of routing and bump density needed to facilitate that.

The point isn't that you couldn't do a multiple die strategy, it's that you can't just take two GPUs and toss them on an interposer, play some Barry White, and expect to get anything other than Crossfire over XDMA. Unless AMD has some unannounced major additions to Polaris, there's nothing indicated on their slides that would allow two GPUs to synchronize over anything but PCI-E.

A GPU would have to be designed from the ground up to support that kind of functionality regardless of what you use as a fabric, and I would imagine it would be considered a fairly high risk new direction. Even if you did see this as the preferred way forward, I can't imagine a way that you could implement a dual 200mm^2 die strategy without compromising at least the perf/mm^2 of a single 200mm^2 die. What that hit would be, and whether it would force you to have a separate big die designed just for multi-GPU, I have no idea. Like I said, something like this isn't impossible, but there's absolutely no indication thus far that Polaris has anything remotely like this planned and it is pretty much a standard GPU. Slapping two of them together through anything other than xfire right now is just a flight of fancy.

MrTeal · Jan 6, 2016

96Firebird said:
I don't see a post saying they disabled Async compute... Am I missing it?

Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.

Probably this one. As far as I read it, Async Compute was disabled for nVidia specifically because it causes massive performance issues. He didn't say that they globally disabled Async Compute.

lilltesaito · Jan 6, 2016

96Firebird said:
I don't see a post saying they disabled Async compute... Am I missing it?

Did you hit page two?

lilltesaito · Jan 6, 2016

MrTeal said:
http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1200#post_24356995

Probably this one. As far as I read it, Async Compute was disabled for nVidia specifically because it causes massive performance issues. He didn't say that they globally disabled Async Compute.

Think this is what everyone was talking about.

AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than to not. Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.

96Firebird · Jan 6, 2016

I didn't even see there was a page two, thanks for finding it MrTeal. :thumbsup:

So, going back to the point that started all this, Nvidia is faster than AMD (980 Ti vs Fury X) in DX12 Ashes with AMD using async compute and Nvidia not?

Kenmitch · Jan 6, 2016

MrTeal said:
HBM and Fiji obviously talk through their 1024-bit interface, which is a fundamental design feature of HBM. The use of the interposer allows the kinds of routing and bump density needed to facilitate that.

The point isn't that you couldn't do a multiple die strategy, it's that you can't just take two GPUs and toss them on an interposer, play some Barry White, and expect to get anything other than Crossfire over XDMA. Unless AMD has some unannounced major additions to Polaris, there's nothing indicated on their slides that would allow two GPUs to synchronize over anything but PCI-E.

A GPU would have to be designed from the ground up to support that kind of functionality regardless of what you use as a fabric, and I would imagine it would be considered a fairly high risk new direction. Even if you did see this as the preferred way forward, I can't imagine a way that you could implement a dual 200mm^2 die strategy without compromising at least the perf/mm^2 of a single 200mm^2 die. What that hit would be, and whether it would force you to have a separate big die designed just for multi-GPU, I have no idea. Like I said, something like this isn't impossible, but there's absolutely no indication thus far that Polaris has anything remotely like this planned and it is pretty much a standard GPU. Slapping two of them together through anything other than xfire right now is just a flight of fancy.

It's not that it's impossible....Just hasn't been done yet....As far as we know.

Maybe the name Polaris was chosen wisely?

Looking up Polaris on wiki shows it really isn't what the normal person thinks.
https://en.m.wikipedia.org/wiki/Polaris

Food for thought....Somewhat out there still.

Why couldn't such a dGPU be invented? Or is it more like AMD couldn't design such a thing?

AtenRa · Jan 6, 2016

96Firebird said:
I didn't even see there was a page two, thanks for finding it MrTeal. :thumbsup:

So, going back to the point that started all this, Nvidia is faster than AMD (980 Ti vs Fury X) in DX12 Ashes with AMD using async compute and Nvidia not?

I don't believe they selectively disabled Async Compute only for the NV hardware but left it enabled on the AMD.

lilltesaito · Jan 6, 2016

MrTeal said:
http://www.overclock.net/t/1569897/...ingularity-dx12-benchmarks/1200#post_24356995

Probably this one. As far as I read it, Async Compute was disabled for nVidia specifically because it causes massive performance issues. He didn't say that they globally disabled Async Compute.

Kind of sounds like Nvidia's drivers were trying to access the Compute and they had to disable it completely to stop their drivers from using it.

Edit:
Found more information about this. It does look like AMD is still using Async Compute and that they redirected Nvidia to DX 11 pathways.

desprado · Jan 6, 2016

AtenRa said:
I don't believe they selectively disabled Async Compute only for the NV hardware but left it enabled on the AMD.

AMD would have said so if they had done that. Oxide is very close with AMD.

maddie · Jan 6, 2016

Headfoot said:
What? No. Where are you getting this? Do you think HBM and Fiji talk via PCI-e??? No.... You can use any fabric you choose to implement. Obviously, and this is very, very obvious, if AMD were to do a multiple-die-on-interposer strategy they would be attached via a fabric better suited to the task than PCI-e. They were talking up how great their SeaMicro-acquired fabric was when they still were pushing that, and they are really big on memory tech (and a large part of memory tech is the fabric associated with it). I doubt multi die on interposer is happening for Polaris. It's hardly impossible, though.

In MrTeal's defense, his next paragraph conceded that you could have a massively parallel interface between die.

MrTeal said:
HBM and Fiji obviously talk through their 1024-bit interface, which is a fundamental design feature of HBM. The use of the interposer allows the kinds of routing and bump density needed to facilitate that.

The point isn't that you couldn't do a multiple die strategy, it's that you can't just take two GPUs and toss them on an interposer, play some Barry White, and expect to get anything other than Crossfire over XDMA. Unless AMD has some unannounced major additions to Polaris, there's nothing indicated on their slides that would allow two GPUs to synchronize over anything but PCI-E.

A GPU would have to be designed from the ground up to support that kind of functionality regardless of what you use as a fabric, and I would imagine it would be considered a fairly high risk new direction. Even if you did see this as the preferred way forward, I can't imagine a way that you could implement a dual 200mm^2 die strategy without compromising at least the perf/mm^2 of a single 200mm^2 die. What that hit would be, and whether it would force you to have a separate big die designed just for multi-GPU, I have no idea. Like I said, something like this isn't impossible, but there's absolutely no indication thus far that Polaris has anything remotely like this planned and it is pretty much a standard GPU. Slapping two of them together through anything other than xfire right now is just a flight of fancy.

1) All of the elements are in place for this as they have never been in the past. Interposer tech. Synthesizable IP blocks.
2) The improvement in yield of a 200mm^2 die over a 300mm^2 die [to be heavily harvested] might save enough to tolerate a lower perf/mm^2 and still lower costs.
3) A separate small and mid/big die strategy with the same IP blocks but physically arranged differently.
4) Small die as shown for notebooks and low end discrete.
Finally:
Does anyone here see an AMD market-share comeback soon if they don't take risks?
In any case, are the risks as great as one might think at first?
All of the tech is there and has been tested. Remember the game console SOCs gave them a lot of experience in mixing and integrating IP blocks to suit the desired product.

Kenmitch said:
It's not that it's impossible....Just hasn't been done yet....As far as we know.

Maybe the name Polaris was chosen wisely?

Looking up Polaris on wiki shows it really isn't what the normal person thinks.
https://en.m.wikipedia.org/wiki/Polaris

Food for thought....Somewhat out there still.

Why couldn't such a dGPU be invented? Or is it more like AMD couldn't design such a thing?

I have been having a lot of fun reading your posts in this thread and I know this will sound crazy to most, but knowing how marketing thinks, this might actually be a clue.

Very much out there, but still fun to speculate.

Polaris, a double star that appears as a single one.

Correction, a multiple star
