Multiple dies acting as one on interposer


maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Before I watch this, do they talk about multiple dies on one interposer? Is this on topic?
No, but it's relevant to the discussion that the OP, several others, and I were having about how it might be possible to lay out a multi-die GPU on an interposer.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
Navi appears to be pointing the way to a multi-die approach. Koduri indicated the yield/cost benefit of multiple small dies vs. a large monolithic die as key to going forward on smaller processes.

Late 2017/early 2018 and counting.
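For the curious, the yield side of that argument can be sketched with a simple Poisson defect model; the defect density and die areas below are made-up illustrative numbers, not AMD figures:

```cpp
#include <cmath>
#include <cstdio>

// Back-of-the-envelope die-yield comparison using a simple Poisson model:
// yield = exp(-area * defect_density). All numbers are illustrative assumptions.
int main() {
    const double defect_density = 0.10;  // defects per cm^2 (assumed)
    const double big_die_area   = 6.0;   // cm^2, large monolithic GPU (assumed)
    const double small_die_area = 1.5;   // cm^2, one of four small dies (assumed)

    const double y_big   = std::exp(-big_die_area * defect_density);
    const double y_small = std::exp(-small_die_area * defect_density);

    std::printf("Monolithic die yield:   %.1f%%\n", 100.0 * y_big);
    std::printf("Single small die yield: %.1f%%\n", 100.0 * y_small);
    // A defect only kills the one small die it lands on, so the wafer area thrown
    // away per defect is roughly a quarter of what a monolithic design loses,
    // and good dies can be binned and mixed across products.
    return 0;
}
```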
 

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
Navi appears to be pointing the way to a multi-die approach. Koduri indicated the yield/cost benefit of multiple small dies vs. a large monolithic die as key to going forward on smaller processes.

Late 2017/early 2018 and counting.

This timeline appears reasonable, and given the (extremely limited) information posted about Navi it would appear to be a good possibility.

It will be interesting to have some details when they do come out. Raja's statements were in the context of pushing Fiji X2 and multi-GPU to developers to improve support for multi-GPU going forward. We might not see early versions of this technology appear quite as seamless as has been discussed here; they could still appear to the system as multiple GPUs, but hopefully with a much more robust interconnect between them.
 

hrga225

Member
Jan 15, 2016
81
6
11
Quite reasonable estimates from both of you. I think that Navi will use PIM (processing in memory), which should be the first stage toward a multi-die chip.
 

maddie

Diamond Member
Jul 18, 2010
4,787
4,771
136
This timeline appears reasonable, and given the (extremely limited) information posted about Navi it would appear to be a good possibility.

It will be interesting to have some details when they do come out. Raja's statements were in the context of pushing Fiji X2 and multi-GPU to developers to improve support for multi-GPU going forward. We might not see early versions of this technology appear quite as seamless as has been discussed here; they could still appear to the system as multiple GPUs, but hopefully with a much more robust interconnect between them.
Quite true. It appears that DX12 and Vulkan might be necessary to fully exploit this first iteration.

In any case, a single small die on 10nm should surpass 980Ti performance and allow good frame rates in older DX games. Sort of like the single thread/multi-thread performance analogy of CPUs.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
We might not see early versions of this technology appear quite as seamless as has been discussed here; they could still appear to the system as multiple GPUs, but hopefully with a much more robust interconnect between them.

This seems like by far and away the most likely outcome. Between NV's NVLink, AMD's interposer experience plus SeaMicro fabric, and Intel's increasing emphasis on their Xeon Phi fabric, it definitely seems like fabric comes first, and transparently gluing the cores together comes later, if at all, should DX12/Vulkan multi-GPU end up being sufficient.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Feb 19, 2009
10,457
10
76
Raja was talking about multiple small dies on an interposer as a way to compensate for low yields before, during Capsaicin IIRC.

I would say Raja talking about that so specifically is also a PR move for investors, to show that AMD has plans for the future, when nodes will be more problematic.
 

hrga225

Member
Jan 15, 2016
81
6
11
Those GMI links are interesting and answer a question I had in my mind.

With the later versions of the APU (since Kaveri) and heterogeneous computing, the GPU and the CPU can read and write each other's memory because of the unified memory. If the CPU needs to do some work on a data set, it does not have to physically copy that data set to "CPU" memory, perform the modification, and write it back to "GPU" memory; with shared memory, that round trip is rather silly. The later APUs solved the issue with a hardware method (via the MMU, the memory management unit) that is comparable to passing a pointer to the data set rather than copying the data set itself. This relieves the memory of a lot of unnecessary copying (i.e., a lot of reads and writes), so the memory is used more efficiently.
This is called zero-copy through pointer passing.
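As a rough illustration of what that pointer passing looks like to software, here is a minimal OpenCL 2.0 fine-grained SVM sketch. It is just one possible realization of HSA-style zero copy, it assumes a device that actually reports fine-grained SVM support, and error handling is omitted:

```cpp
// Minimal sketch of zero-copy "pointer passing" on an HSA-style APU,
// using OpenCL 2.0 fine-grained Shared Virtual Memory (SVM).
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>
#include <cstdio>

static const char* kSrc =
    "__kernel void scale(__global float* data) {"
    "    size_t i = get_global_id(0);"
    "    data[i] *= 2.0f;"
    "}";

int main() {
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, device, nullptr, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
    clBuildProgram(prog, 1, &device, "-cl-std=CL2.0", nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "scale", nullptr);

    // One allocation, visible to both CPU and GPU: no staging buffer,
    // no explicit copy. The kernel receives the pointer itself.
    const size_t n = 1024;
    float* data = static_cast<float*>(clSVMAlloc(
        ctx, CL_MEM_READ_WRITE | CL_MEM_SVM_FINE_GRAIN_BUFFER, n * sizeof(float), 0));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // CPU writes directly

    clSetKernelArgSVMPointer(kernel, 0, data);        // pass the pointer, not a copy
    clEnqueueNDRangeKernel(q, kernel, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clFinish(q);

    std::printf("data[0] = %f\n", data[0]);           // CPU reads the result in place
    clSVMFree(ctx, data);
    return 0;
}
```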

With the arrival of Fury, I always wondered how AMD was going to solve that issue, since Fury is connected over the PCIe bus and has its own memory. So data copying is needed again, with the latency of the PCIe bus added. The only way to address this is to optimize the drivers so that as little data as possible is moved between GPU and CPU, but one can do only so much.
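For contrast, a sketch of that discrete-card path: the GPU's memory is a separate pool, so the data set has to cross PCIe in both directions. The context, queue and kernel here are assumed to be set up as in the previous snippet:

```cpp
// Discrete-GPU path: explicit copies across PCIe in both directions.
#define CL_TARGET_OPENCL_VERSION 200
#include <CL/cl.h>

void run_with_copies(cl_context ctx, cl_command_queue q, cl_kernel kernel,
                     float* host_data, size_t n) {
    // 1. Allocate a buffer in the GPU's own memory pool.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), nullptr, nullptr);
    // 2. Copy host -> device over PCIe (bandwidth- and latency-bound).
    clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host_data, 0, nullptr, nullptr);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q, kernel, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    // 3. Copy device -> host over PCIe again just to see the result.
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host_data, 0, nullptr, nullptr);
    clReleaseMemObject(buf);
}
```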

It seems that AMD is slowly unveiling its plans.
Now I finally understand why they mention a CPU-only Zen.
They are going the MCM road (not a PCB but an interposer with multiple dies on it: CPU, GPU and HBM). With the confidence gained from Fury (a die with HBM at the same height, with very strict tolerances for cooling purposes), AMD is now confident enough to create an interposer-based MCM (multi-chip module). The whole point is that at billions of transistors the energy density is huge: if you want performance, you need to go wide and crank up the clock speed, and that creates heat which cannot be removed easily.
Also, a single-die APU with tens to hundreds of billions of transistors would be statistically more prone to defects than separate dies with lower transistor counts. Even adding redundant circuitry and, as a last resort, binning can only do so much.
And either the CPU architecture or the GPU architecture can be kept while the other is modified, or more or less HBM can be used; an MCM allows for more flexibility.

But the challenge is getting the same latency advantage the APUs have had since Kaveri: the zero-copy method. AMD appears to have solved that riddle by coming up with the GMI link. One GMI link allows 25 GB/s, and the more GMI links, the higher the bandwidth; but the real benefit is the much lower latency compared to PCIe. If the slides can be believed, they will start with a 100 GB/s link and crank it up as far as the CPU and GPU can handle the data. So data copying is still needed, but at a much lower latency. And once the need for external DDR4/DDR5 memory is again eliminated by an HBM2-only, MCM-based "SoC", the zero-copy pointer-passing method becomes available again, by grace of once more having a unified memory architecture. HBM2 allows a configuration with enough memory on the package that unified memory makes sense: a complete MCM with 16GB would be more than enough for most people, meaning a PC with 16GB of memory.
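The bandwidth side of that is simple arithmetic; a quick sketch assuming the 25 GB/s-per-link and 100 GB/s figures from the slides, and the usual ~15.75 GB/s one-way for a PCIe 3.0 x16 slot:

```cpp
#include <cstdio>

// Rough interconnect-bandwidth arithmetic for the argument above.
int main() {
    const double gmi_link_gbs  = 25.0;    // per GMI link (per the leaked slides)
    const double pcie3_x16_gbs = 15.75;   // PCIe 3.0 x16, one direction
    for (int links = 1; links <= 4; ++links) {
        const double total = links * gmi_link_gbs;
        std::printf("%d GMI link(s): %6.1f GB/s  (%.1fx PCIe 3.0 x16)\n",
                    links, total, total / pcie3_x16_gbs);
    }
    // Four links reach the 100 GB/s the slides mention; and unlike PCIe, an
    // on-package link's latency is far lower, which is the bigger win for
    // pointer-style sharing.
    return 0;
}
```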

The future is looking good for AMD if they execute correctly.

Zen will be a powerhouse because AMD no longer has to worry about the GPU being on the same die. That has benefits but also drawbacks, especially as transistor counts keep growing.

Expect the next generation of consoles to again be an AMD design win.
And this might well be the end of the PC as we know it for a lot of people.
If Microsoft plays its cards right in the near future, it can finally execute an Apple-like strategy: a Microsoft home PC that is powerful enough to game (for most people) but can also be used as a normal desktop PC.
With an optimized Windows 10 successor that taps deeply into the hardware thanks to optimized low-level drivers with no HAL (hardware abstraction layer).
Just as Apple does with its hardware and optimized drivers and software.


EDIT:
Of course, it could be that Apple goes down that road first, or perhaps Sony with a Linux-based PC, or even Google. Maybe Sony will come out with an Android-based home PC in cooperation with Google.
But Microsoft has the advantage of having the largest software base.
Hopefully they are not so stupid as to break compatibility with all old software without providing a proper virtualization solution to keep old Win32 software running.

Great post from the CPU forum.
 
Feb 19, 2009
10,457
10
76
@William Gaatjes
@hrga225

Well explained, though a few of us said those things when HBM/interposer tech was first shown a while ago.

Interposer linked HBM + CPU + GPU with AMD's HSA tech can definitely take performance to the next level.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
The question in my mind isn't whether we'll get multi-die on interposer -- that I believe is 100% inevitable, and arguably has already happened, depending on how much logic/processing you believe sits in Fiji's HBM stacks. My question is whether we'll ever get something better than explicit management (e.g. DX12 multi-adapter or equivalent) in the graphics card space. I'm starting to think we will not. It seems more likely we skip the transparently-stitch-together-small-GPUs stage (i.e. many small dies appearing as one single GPU) altogether and move straight into full-on SoC mode (stitch together everything).
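For reference, this is roughly what the explicitly managed route looks like with D3D12 multi-adapter today: every GPU (and presumably every die, if each stays individually visible) is enumerated as its own adapter, and the application has to create and drive a separate device for each one. A minimal Windows-only sketch with error handling trimmed:

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>
#include <cstdio>
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;  // skip WARP

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device)))) {
            std::wprintf(L"Adapter %u: %s\n", i, desc.Description);
            devices.push_back(device);   // the app must split work across these itself
        }
    }
    // Nothing here is transparent: command lists, heaps and synchronization are all
    // per-device, which is exactly the "explicitly managed" burden discussed above.
    return 0;
}
```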
 

hrga225

Member
Jan 15, 2016
81
6
11
The question in my mind isn't whether we'll get multi-die on interposer -- that I believe is 100% inevitable, and arguably has already happened, depending on how much logic/processing you believe sits in Fiji's HBM stacks. My question is whether we'll ever get something better than explicit management (e.g. DX12 multi-adapter or equivalent) in the graphics card space. I'm starting to think we will not. It seems more likely we skip the transparently-stitch-together-small-GPUs stage (i.e. many small dies appearing as one single GPU) altogether and move straight into full-on SoC mode (stitch together everything).
I would not be so pessimistic. Far trickier things have been solved.
 

Pottuvoi

Senior member
Apr 16, 2012
416
2
81
@William Gaatjes
@hrga225

Well explained, though a few of us said those things when HBM/interposer tech was first shown a while ago.

Interposer linked HBM + CPU + GPU with AMD's HSA tech can definitely take performance to the next level.
Zen + Polaris cores and HBM2 on an interposer certainly sounds like a very interesting combination for almost any system, especially if AMD allows additional memory (DDR4?) for users that need more.
 