This has to be my longest post ever. Here goes.
Predicting this:
Radeon ????? 8GB HBM WCE full die
Radeon 390X 4-8GB HBM air full die
Radeon 390X 4-8GB HBM air cut die
Radeon 380X 8GB GDDR5 air full die [based on Hawaii, not rebranded]
Radeon 380X 4GB GDDR5 air cut die [based on Hawaii, not rebranded]
With regard to the naysayers on 8GB HBM, consider this.
Anyone who claimed that the first iteration of HBM only allowed 4 stacks was shown to be wrong. Some saw the graphic with 4 stacks and took that as the limit; had they read the papers on interposer technology, they would have realized that position was wrong. Notice Macri below basically saying you can do more with 4GB than you thought, not that 4GB is the limit of the card.
http://arstechnica.co.uk/informatio...hbm-why-amds-high-bandwidth-memory-matters/2/
Joe Macri, AMD's corporate VP and product CTO, explained the 4GB limitation to Ars in a telephone call:
"You're not limited in this world to any number of stacks, but from a capacity point of view, this generation-one HBM, each DRAM is a two-gigabit DRAM, so yeah, if you have four stacks you're limited to four gigabytes. You could build things with more stacks, you could build things with less stacks. Capacity of the frame buffer is just one of our concerns. There are many things you can do to utilise that capacity better. So if you have four stacks you're limited to four [gigabytes], but we don't really view that as a performance limitation from an AMD perspective."
"If you actually look at frame buffers and how efficient they are and how efficient the drivers are at managing capacities across the resolutions, you'll find that there's a lot that can be done. We do not see 4GB as a limitation that would cause performance bottlenecks. We just need to do a better job managing the capacities. We were getting free capacity, because with [GDDR5] in order to get more bandwidth we needed to make the memory system wider, so the capacities were increasing. As engineers, we always focus on where the bottleneck is. If you're getting capacity, you don't put as much effort into better utilising that capacity. 4GB is more than sufficient. We've had to go do a little bit of investment in order to better utilise the frame buffer, but we're not really seeing a frame buffer capacity [problem]. You'll be blown away by how much [capacity] is wasted."
If you read the HBM papers, the interposer can be as large as the wafer [not talking costs here]. Interposers can also be as cheap as $2 per 200mm2. Remember, the steps to produce an interposer do not require advanced lithography.
Here is a 2012 item.
http://electroiq.com/blog/2012/12/lifting-the-veil-on-silicon-interposer-pricing/
At the recent Georgia Tech-hosted International Interposer Conference, Matt Nowak of Qualcomm and Nagesh Vordharalli of Altera both pointed to the necessity for interposer costs to reach $1 per 100mm2 for them to see wide acceptance in the high-volume mobile arena. For Nowak, the standard interposer would be something like ~200mm2 and cost $2. The question that was posed but unanswered was: "Who will make such a $2 interposer?"
Less than a month later, this question began to be answered as several speakers at the year-ending RTI ASIP conference (Architectures for Semiconductor Integration and Packaging) began to lift the veil on silicon interposer pricing.
Sesh Ramaswami, managing director at Applied Materials, showed a cost analysis which resulted in 300mm interposer wafer costs of $500-$650 / wafer. His cost analysis showed the major cost contributors are damascene processing (22%), front pad and backside bumping (20%), and TSV creation (14%).
Ramaswami noted that the dual damascene costs have been optimized for front-end processing, so there is little chance of cost reduction there; whereas cost of backside bump could be lowered by replacing polymer dielectric with oxide, and the cost of TSV formation can be addressed by increasing etch rate, ECD (plating) rate, and increasing PVD step coverage.
Since one can produce ~286 200mm2 dies on a 300mm wafer, at $575 per wafer (his midpoint cost) that works out to roughly a $2 200mm2 silicon interposer.
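To sanity-check that figure, and to extrapolate to something Fiji-sized, here is a quick back-of-envelope calculation in Python. The $575 midpoint, the ~286 dies per wafer and the 200mm2 size come straight from the article; the ~1000mm2 "big interposer" size (roughly a 600mm2 GPU plus eight 5x7mm stacks plus routing) and the simple area scaling of dies per wafer are my own rough assumptions, not anything AMD has stated:

# Figures from the electroiq article
WAFER_COST = 575.0              # $/wafer, midpoint of the $500-$650 range
SMALL_INTERPOSER_MM2 = 200.0
SMALL_DIES_PER_WAFER = 286      # Ramaswami's figure for 200mm2 interposers on a 300mm wafer

cost_small = WAFER_COST / SMALL_DIES_PER_WAFER   # ~$2.01 per 200mm2 interposer

# My assumption: a Fiji-class interposer (GPU + 8 HBM stacks + routing) at ~1000mm2,
# with dies per wafer scaled linearly by area (ignores extra edge loss for larger dies).
BIG_INTERPOSER_MM2 = 1000.0
big_dies = int(SMALL_DIES_PER_WAFER * SMALL_INTERPOSER_MM2 / BIG_INTERPOSER_MM2)  # ~57 per wafer
cost_big = WAFER_COST / big_dies                                                   # ~$10 each

print(f"200mm2 interposer: ~{SMALL_DIES_PER_WAFER}/wafer, ~${cost_small:.2f} each")
print(f"{BIG_INTERPOSER_MM2:.0f}mm2 interposer: ~{big_dies}/wafer, ~${cost_big:.2f} each")

Even a full Fiji-sized interposer comes out at around ten dollars on those numbers, which is why I don't see interposer cost as the blocker.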
Thus costs should not be too prohibitive for larger interposers enabling 8+ HBM stacks. I say 8+ because Fiji will almost certainly have to serve as a professional graphics card as well. In fact, AMD should have an opportunity to make inroads here, as Maxwell seems to be gimped in double precision and Nvidia will have to wait until Pascal to offer a next-gen professional card.
After the four-stack limit was shown to be false, the usual suspects began to say that dual-link HBM is a fantasy, or that you would need an 8192-bit memory bus, which was impossible.
I can't comment on dual-link HBM, but let's look at the bus size as a limit.
An HBM stack is 5x7 mm and handles 1024 memory lanes. These lanes take at most 1/3 of this area [actually less], which is roughly 12mm2. Therefore 4096 lanes = 48mm2 and 8192 lanes = 96mm2. Remember, the whole base of the GPU is now available for I/O, not just the perimeter. Even 8192 lanes should take only about 1/6 of the die area of Fiji [reputed to be 550-600mm2]. From an area perspective, lack of space is not a problem.
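If you'd rather see that area budget spelled out, here is the same arithmetic as a small Python sketch. The 5x7mm footprint, 1024 lanes per stack and the 1/3 upper bound come from the reasoning above; the 550-600mm2 range is only the rumoured Fiji die size, and the per-stack PHY area is a ballpark, not a measured figure:

# Area budget for the HBM PHY on the GPU die, using the figures above.
STACK_FOOTPRINT_MM2 = 5 * 7        # 35mm2 per HBM stack
LANES_PER_STACK = 1024
PHY_FRACTION = 1.0 / 3.0           # upper bound: lanes occupy at most 1/3 of the footprint

phy_per_stack = STACK_FOOTPRINT_MM2 * PHY_FRACTION   # ~11.7mm2 (the ~12mm2 above)

for stacks in (4, 8):
    lanes = stacks * LANES_PER_STACK
    phy_area = stacks * phy_per_stack
    for die_mm2 in (550, 600):     # rumoured Fiji die size range
        print(f"{lanes} lanes: ~{phy_area:.0f}mm2 of PHY, "
              f"~{phy_area / die_mm2:.0%} of a {die_mm2}mm2 die")

Even at the 1/3 upper bound, eight stacks' worth of lanes take well under a fifth of the die.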