- Tahiti had added redundancy (which incurred a bigger die area, on top of the area penalty for a 1.5x wider bus and higher DP rate) in order to achieve better yields.
The problem with that statement is that AMD explicitly denied it. The initial allegation was that Tahiti has 8 purely redundant CUs (for a total of 40, with only 32 active). That's an incredible amount of redundancy that can't possibly be exploited profitably over the lifetime of a product. It makes much more sense to sell partially disabled products initially and then deploy fully enabled ones later on, which is exactly what Nvidia did for GK110. GK104 was small enough for it not to really matter. A 1-out-of-5 redundancy ratio is a staggering amount of real estate wasted, and it becomes a bigger liability with time. AMD engineers are not stupid. If you look at typical redundancy ratios, as used in RAMs etc., you get ratios of 1 out of 128 and such. The reason for this is that redundancy math is one of extremely rapid diminishing returns.
This is easy to see as follows:
- You need a perfect die.
- That means: zero defects in all non-RAM structures. (Assuming larger RAMs already have redundancy.)
- If 1 defect can kill a die, and yields are nonetheless workable, then defects must in practice be quite rare, generally speaking.
- This means that 1 redundant group will already recover the vast majority of otherwise failing dies. It has the power to increase a yield of, say, 70% to, say, 95%. (Once again: RAM redundancy ratios support this claim.)
- Adding another level of redundancy is plain stupid, because, at best, it can only recover an additional 5%.
- And let's not forget: as time goes by, that 70% will go up, as will the 95%, so the payoff gets even smaller, and the cost even higher. (The toy calculation below illustrates how fast the returns diminish.)
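Here's a back-of-the-envelope version of that math, assuming a simple Poisson defect model. Every number in it (defect density, CU size, die layout) is invented purely for illustration, not an actual Tahiti figure:

```python
# Toy yield model under a Poisson defect assumption. All numbers are
# illustrative placeholders, not real 28nm / Tahiti data.
from math import comb, exp

def sellable_yield(defect_density, rest_area, cu_area, n_cus, n_spares):
    """Fraction of dies sellable when up to n_spares defective CUs can be fused off.

    defect_density : defects per mm^2
    rest_area      : die area outside the CU array (mm^2), must be defect-free
    cu_area        : area of a single CU (mm^2)
    """
    p_cu_bad = 1.0 - exp(-defect_density * cu_area)   # P(a given CU has >= 1 defect)
    p_rest_ok = exp(-defect_density * rest_area)      # P(non-CU logic is clean)
    # Sellable if the rest of the die is clean and at most n_spares CUs are bad.
    p_cus_ok = sum(comb(n_cus, k) * p_cu_bad**k * (1.0 - p_cu_bad)**(n_cus - k)
                   for k in range(n_spares + 1))
    return p_rest_ok * p_cus_ok

# Invented numbers: 0.001 defects/mm^2, 32 CUs of 5 mm^2 each, 205 mm^2 of other logic.
for spares in range(4):
    y = sellable_yield(0.001, 205.0, 5.0, 32, spares)
    print(f"{spares} spare CU(s): {100 * y:.1f}% yield")
```

With these made-up inputs the first spare CU lifts yield from roughly 70% to roughly 81%, the second adds less than a point, and the third adds essentially nothing, because the remaining losses come from defects outside the CU array. That's the diminishing-returns curve in a nutshell.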
- This allowed AMD not only to have a few months' head start compared to NV, but also to have a better yield progression curve than NV, because their design was mostly die-area optimized at the expense of yields.
Yes, because correlation always means causation. /s
Here's the trouble with that statement: defect density changes very gradually over time. The difference in in-store availability between the 7970 and the GTX680 was all of 2 months. Defect density doesn't change much in such a short amount of time.
- Having a better yield percentage with fewer total dies (functional + nonfunctional ones) per wafer might result in roughly the same cost per functional die as having more dies per wafer but a lower yield rate.
It might, but it's unlikely. GTX680 had a die that was small enough (~300 mm²) for it not to matter all that much. And whatever wasn't perfect could be used anyway for the GTX670. That's a crucial point you forgot: the vast majority of the not-fully-functional dies that you tally up as a negative for the GTX680 could be reused for the GTX670, which arrived on the scene just a couple of months later. So they were never a loss in the first place.
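Just to put some (entirely made-up) numbers on that cost argument, here's a crude cost-per-good-die sketch; wafer price, die areas and yields are all assumptions, and it deliberately ignores salvage parts, which is exactly what tilts the real-world math:

```python
# Crude cost-per-good-die comparison. Wafer price, die areas and yields are
# assumptions for illustration only, not real 28nm figures. Salvage (selling
# imperfect dies as a cut-down SKU) is deliberately ignored here.
from math import pi

WAFER_COST = 5000.0              # hypothetical processed-wafer cost, USD
WAFER_AREA = pi * 150.0 ** 2     # rough usable area of a 300 mm wafer, mm^2

def cost_per_good_die(die_area_mm2, yield_fraction):
    gross_dies = WAFER_AREA // die_area_mm2   # ignores edge loss and scribe lines
    return WAFER_COST / (gross_dies * yield_fraction)

# Smaller die with lower yield vs. bigger die with higher yield (invented numbers):
print(f"300 mm^2 @ 60%: ${cost_per_good_die(300, 0.60):.2f} per good die")
print(f"365 mm^2 @ 75%: ${cost_per_good_die(365, 0.75):.2f} per good die")
```

Depending on the numbers you plug in, the two cases can indeed land close to each other; but as soon as most of the imperfect GK104 dies can be sold as a GTX670 instead of being scrapped, the effective yield of the smaller die goes way up and the comparison falls apart.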
- Overall, AMD's design decisions behind Tahiti were OK, considering they allowed AMD to get a head start on the 28nm node by shipping products earlier, and to reach better yield rates earlier than NV to compensate for the bigger die area, as a consequence of the point explained earlier.
You have a product that takes 3 years to develop. What do you think is more likely? That this extra redundancy made a difference in time-to-market of 2 months, or that 2 months in a large engineering project are in the noise?
...the rest of your statement...
Agreed.