Here's my reply to the guys in the Vega thread: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=threads/vega-navi-rumors-updated.2486940/
Stuka87 said: No, it isn't. 28 to 20 is a "half node", just like 20 to 14 is half a node.
Correct. Stuka87 gets it.
PeterScott said: I think people are getting mixed up because they remember a full node is half size/double density, and it is. But that is based on area, and node names are based on linear measurement.
No.
PeterScott said: I should have added, "In Theory".
Remember Intel complaining that its competitors' node names were bullshit? This is what it is all about.
Their assertion is right. However, I believe their numbers are, in practice, shady as well.
Paratus said: That would suggest Intel is coming nowhere close to the theoretical density of their process, which is probably based on the size of an SRAM cell.
I would assume so. Or it's a lot more complex.
Why are foundries' node numbers marketing numbers that don't necessarily reflect reality?
Because post-28nm nodes use full node names for half nodes. Pre-28nm, going from 110nm to 90nm was a half node, as it should be, and 130nm to 90nm was a full node. Post-28nm, that changes. Why? Because scaling started to become really hard, and transistor performance benefits started to diminish. Because manufacturers wanted to perpetuate the notion that "everything is fine", they gave 20nm and 14/16nm full node names when in reality each should have been a half node.
20nm = 28nm with double the density (conversely, half the size)
14nm = 20nm density but with FinFET transistors for performance
So most manufacturers "skipped" 20nm and went from 28nm to "16/14nm", because they NEEDED that to get full node benefits. Traditionally a shrink brought both the density gain and the performance gain. Post-28nm, they needed FinFET to get the full node performance gain. Hence, 28nm to 16/14nm is a single full node reduction, not a double full node reduction. It's a double half node reduction.
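To put numbers on the naming argument, here is a minimal sketch. It assumes the usual rule of thumb that a full node is roughly a 0.7x linear shrink (about half the area). The `classify_step` helper and its cutoff are made up for illustration and are not anything the foundries publish.

```python
# Rule-of-thumb assumption: a full node shrinks linear dimensions by ~0.7x,
# which roughly halves the area (0.7 * 0.7 ~= 0.49). A half node is a smaller step.

def linear_ratio(old_nm, new_nm):
    """Ratio of the new node name to the old node name."""
    return new_nm / old_nm

def classify_step(old_nm, new_nm, cutoff=0.76):
    """Hypothetical helper: label a step by what its NAME implies.

    The cutoff is an arbitrary midpoint between a full node (~0.7x)
    and a half node (~0.8x), chosen for illustration only.
    """
    return "full" if linear_ratio(old_nm, new_nm) <= cutoff else "half"

steps = [(130, 90), (110, 90), (28, 20), (20, 14)]
for old, new in steps:
    r = linear_ratio(old, new)
    print(f"{old}nm -> {new}nm: name ratio {r:.2f}, implied area {r * r:.2f}x, "
          f"name implies a {classify_step(old, new)} node")
```

By name alone, 28nm to 20nm and 20nm to 14nm each look like full nodes. That is the naming the argument above says doesn't match what was actually delivered.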
Is Intel completely correct in saying they have a massive density lead?
This is questionable.
Ivy Bridge 4C GT2: 1.958mm2 per 1MB L3. Total = 160mm2, 1.4 billion transistors (8.75 mil tr/mm2)
*Broadwell 2C GT2: 1.09mm2 per 1MB L3. Total = 82mm2, 1.3 billion transistors (~15.9 mil tr/mm2)
*Knights Landing: 0.7mm2 per 512KB L2. Total die = 650mm2, ~8 billion transistors (12.3 mil tr/mm2)
*Skylake: 1.2mm2 per 1MB L3
With the whole-die numbers done above, let's calculate million transistors per mm2 (mil tr/mm2) for the caches. The caches use a 6T setup, so it takes 6 transistors per bit. 6 transistors x 8 bits per byte x bytes per MB (1024x1024) = 50.3 million transistors per 1MB of cache.
Ivy Bridge L3 cache density = 25.69 mil tr/mm2
Broadwell L3 cache density = 46.15 mil tr/mm2
Knights Landing L2 cache density = 35.9 mil tr/mm2
Skylake L3 cache density = 41.9 mil tr/mm2
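The cache density figures can be reproduced with a short sketch. It assumes plain 6T SRAM for every cache and ignores tags, ECC and redundancy, so it only counts data bits. The function names are mine, and the areas are the measured values quoted above.

```python
# 6T SRAM assumption: six transistors per bit, data bits only.
TRANSISTORS_PER_BIT = 6
BITS_PER_MB = 8 * 1024 * 1024

def transistors_per_mb():
    """Transistors needed for 1MB of 6T SRAM (50,331,648)."""
    return TRANSISTORS_PER_BIT * BITS_PER_MB

def cache_density(area_mm2, capacity_mb=1.0):
    """Million transistors per mm2 for a cache block of the given area."""
    return transistors_per_mb() * capacity_mb / area_mm2 / 1e6

# (area in mm2, capacity in MB), figures quoted in the post
caches = {
    "Ivy Bridge L3": (1.958, 1.0),
    "Broadwell L3": (1.09, 1.0),
    "Knights Landing L2": (0.7, 0.5),
    "Skylake L3": (1.2, 1.0),
}
for name, (area, mb) in caches.items():
    print(f"{name}: {cache_density(area, mb):.1f} mil tr/mm2")
```

The results land within rounding of the figures listed, since the post rounds 50,331,648 down to 50.3 million before dividing.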
Caveats for the chips with * next to their names:
-Post Ivy Bridge, Intel started the practice of hiding transistor count and die size metrics. We get the numbers until Haswell. After that, they're hard to find.
-Knights Landing numbers are approximate, though they shouldn't be off by more than 20%.
-Broadwell has blurry die shots. 10% margin of error.
-Skylake has a weird cache configuration. Should be more accurate than Broadwell though.
For comparison, Ryzen needs 1.0mm2 per 1MB of L3 cache, which is ~50 mil tr/mm2. This number is quite accurate.
Ryzen has over 3x the transistors of Intel's quad-core chips. Total transistor numbers became irrelevant.
Courtesy of Paratus:
Ryzen is 4.8B with 195mm2 for 24.6MTr/mm2
4C GT2 Haswell had a transistor count of 1.4 billion. It's unlikely we're much higher with Skylake.
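For reference, whole-die density is just transistor count divided by die area. A minimal sketch using the totals quoted in this post (the `die_density` name is mine, and the Knights Landing count is the approximate ~8 billion figure):

```python
def die_density(transistors_billion, area_mm2):
    """Whole-die density in million transistors per mm2."""
    return transistors_billion * 1000 / area_mm2

# (transistors in billions, die area in mm2), as quoted in the post
chips = {
    "Ivy Bridge 4C GT2": (1.4, 160),
    "Broadwell 2C GT2": (1.3, 82),
    "Knights Landing": (8.0, 650),
    "Ryzen 8C": (4.8, 195),
}
for name, (tr, area) in chips.items():
    print(f"{name}: {die_density(tr, area):.1f} mil tr/mm2")
```

Note how far the whole-die figure sits below the cache-only figure for the same chip: logic, I/O and analog blocks pack much less densely than SRAM.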
SRAM (caches) is by far the biggest consumer of transistors.
Ryzen's caches
8x 64KB I-cache
8x 32KB D-cache
8x 512KB cache for L2
2x 8MB cache for L3
Total: 20.75MB of cache
Haswell 4C GT2:
4x 32KB I-cache
4x 32KB D-cache
4x 256KB L2 cache
1x 8MB L3 cache
Total: 9.25MB of cache
But wait, you might say, Ryzen has TWICE the number of cores! That doesn't matter, because the cores themselves take a small share of the transistors. Let's compare it to Broadwell-E, shall we?
Broadwell-E 10 core, 3.4 billion transistors, 246mm2 die = 13.8 mil tr/mm2. Ryzen still has 40% more transistors.
10x 32KB I-cache
10x 32KB D-cache
10x 256KB L2 cache
1x 25MB L3 cache
Total: 28.125MB of cache
28.125(BDW-E) - 20.75(Ryzen 8C) = 7.375MB of cache! Or 7.375 x 50.3 mil tr = 371 million more transistors used up in caches for Broadwell-E. Yet, Ryzen has 1.4 billion more.
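The cache totals and the Broadwell-E vs Ryzen difference can be checked with a quick sketch. It uses the cache configurations listed in this post and the same 6T assumption (data bits only). The variable names are illustrative.

```python
KB_PER_MB = 1024
TRANSISTORS_PER_MB = 6 * 8 * 1024 * 1024  # 6T SRAM assumption, data bits only

# (count, size in KB) for each cache level, as listed in the post
configs = {
    "Ryzen 8C": [(8, 64), (8, 32), (8, 512), (2, 8 * 1024)],
    "Haswell 4C GT2": [(4, 32), (4, 32), (4, 256), (1, 8 * 1024)],
    "Broadwell-E 10C": [(10, 32), (10, 32), (10, 256), (1, 25 * 1024)],
}

def total_cache_mb(levels):
    """Sum all cache levels and convert KB to MB."""
    return sum(count * size_kb for count, size_kb in levels) / KB_PER_MB

totals = {name: total_cache_mb(levels) for name, levels in configs.items()}
for name, mb in totals.items():
    print(f"{name}: {mb} MB of cache")

# Extra cache in Broadwell-E relative to Ryzen, and what it costs in transistors
extra_mb = totals["Broadwell-E 10C"] - totals["Ryzen 8C"]
extra_transistors = extra_mb * TRANSISTORS_PER_MB
print(f"Broadwell-E has {extra_mb} MB more cache, "
      f"about {extra_transistors / 1e6:.0f} million transistors")
```

So the cache difference accounts for a few hundred million transistors in Broadwell-E's favor, nowhere near enough to explain a gap of over a billion the other way.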
Transistor counts as a whole don't matter. What is Ryzen using the transistors for? Since the die size is pretty compact, we don't really care. Unlike Vega, which has a huge number of transistors ending up in a very large die.
Conclusion:
Intel claims in their presentations that competing 14/16nm solutions have a transistor density of 25-30 mil tr/mm2. Ryzen 8C has an L3 cache density of 50 mil tr/mm2, which is far higher than that.
I call Intel's tactics just as "shady" as the other foundries'. Because in practice there's little difference. You cannot equalize it like you do with benchmarks to do a "fair" comparison.
-Intel's numbers don't matter because they basically only make their own chips.
-Transistor density is heavily influenced by implementation. I suspect the reason Ryzen is quite dense and Intel chips are not is that the latter are optimized for performance. You need more transistors and larger transistors to get greater drive current and higher clocks, or lower instruction latency (for per-thread performance).
-Density metrics are not as important as before because the transistors themselves may not be as high performance. Yes, Intel's 10nm may be dense, but who cares when they themselves claim 10nm is lower performing than 14nm++?
It seems Intel started using weird density metrics and focusing on density when they started offering foundry services! How about this, Intel: focus on things that matter, because right now no one uses your foundry. YOUR chips are used, which is what the process should be made for.
Node numbers are basically marketing numbers. Only products speak the truth. Post 32/28nm, the numbers mean little. A new number just means it's better than the previous generation. How much better? You'll know when you get the product.