With the release of Alder Lake less than a week away and the "Lakes" thread having turned into a nightmare to navigate, I thought it might be a good time to start a discussion thread solely for Alder Lake.
I haven't kept up with the layout of Aldy, how does the GPU area relate to all of this?
Also, how would a hypothetical GPUless Alder derivative look with an 8+16 layout (P+E)? Similar size?
It is driving efficiency. As per https://www.computerbase.de/2021-11...hnitt_wie_effizient_ist_die_hybridarchitektur, the 12900K is on average 32% faster with 8+8 vs 8+0 in the no power limit, 125 W/241 W PL2, and 125 W PL2 scenarios in the MT suite.
Agree that ADL not having separate voltage planes is an oversight, but again, I don't see how that's such a fundamental, unfixable issue as to render the entire hybrid model a bad idea, especially since this is essentially Intel's first mainstream implementation of the hybrid model.
Adding more E-cores => lower P-core clocks in MT for a set performance target => lower voltages => better efficiency.
Amdahl's law is the exact reason a hybrid model does make sense. Since speedups become more marginal the more cores/threads there are, it's only logical that the cores/threads responsible for the MT speedup be as area and power efficient as possible.
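To put rough numbers on the Amdahl's law point, here is a minimal sketch. The 90% parallel fraction is purely an assumed value for illustration, not a measurement of any real workload, and the textbook formula treats all cores as identical, so it only shows the shape of the curve rather than modeling a P+E mix.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over a single core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


P = 0.90  # assumed parallel fraction (hypothetical, for illustration only)

prev_cores, prev_speedup = 1, 1.0
for cores in (2, 4, 8, 16, 32):
    speedup = amdahl_speedup(P, cores)
    # Average gain contributed by each core added since the previous row.
    gain_per_core = (speedup - prev_speedup) / (cores - prev_cores)
    print(f"{cores:2d} cores: {speedup:5.2f}x total, {gain_per_core:.3f}x per added core")
    prev_cores, prev_speedup = cores, speedup
```

Each added core buys less than the one before it, which is the argument for making the extra cores as cheap in area and power as possible.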
16c Golden Cove would also be an entirely different die size.
I'm no expert at this but I gave it a try. GPU plus display logic looks to be about 43mm^2 out of 209mm^2 total. About 20% of the total die. GC core with L1/L2/L3 is around 10mm^2. GC about 12 or 13mm^2.
Maybe without GPU ADL at current die size could accommodate 12+8? Or maybe 14 GC's and no GM's or GPU.
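For anyone who wants to redo the area arithmetic, a quick sketch using only the estimates quoted above (43 mm^2 of GPU plus display logic out of 209 mm^2 total). The per-core footprints in the loop are the rough 10 to 13 mm^2 figures from the post, and the function name is made up for this example. It ignores floorplan, ring stops, and uncore, so treat the result as an upper bound at best.

```python
import math

DIE_MM2 = 209.0  # total die area, the poster's estimate
GPU_MM2 = 43.0   # GPU plus display logic, the poster's estimate


def extra_cores_that_fit(freed_mm2: float, core_mm2: float) -> int:
    """Whole cores of a given footprint that fit in the freed area (pure area math)."""
    return math.floor(freed_mm2 / core_mm2)


gpu_share = GPU_MM2 / DIE_MM2
print(f"GPU share of die: {gpu_share:.1%}")

for core_mm2 in (10.0, 12.0, 13.0):  # rough per-core estimates from the post
    fit = extra_cores_that_fit(GPU_MM2, core_mm2)
    print(f"{core_mm2:.0f} mm^2 per core -> {fit} extra cores in the GPU area")
```

With a 12 to 13 mm^2 footprint the freed area holds three extra cores by pure area, so 12+8 would need either the 10 mm^2 figure or some extra savings elsewhere.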
I think Gracemont is just the start. In a generation or two, you might see "little" cores reach Golden Cove performance per clock.
Using the dimensions from this die shot, I calculated the CB R23 points/mm^2. In the same amount of die space, Gracemont is 28% more efficient than Golden Cove. This includes HT for Golden Cove and has the cores running at stock speeds: 4.9 GHz for Golden Cove and 3.7 GHz for Gracemont. Gracemont really packs quite a bit of compute into a small amount of die space, as Coercertiv has been telling us.
Furthermore, if all of the die area were used for Gracemont cores, approximately 8.47 Gracemont clusters would fit on the die, with a resulting CB R23 score of 32,700.
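The per-cluster score is not stated directly, but it can be backed out from the two figures above. A small sketch, assuming "28% more efficient" means the Gracemont points/mm^2 figure is 1.28 times the Golden Cove one; that reading is my assumption, as are the variable names.

```python
CLUSTERS_ON_DIE = 8.47        # from the post above
ALL_GRACEMONT_SCORE = 32_700  # CB R23 score from the post above
CORES_PER_CLUSTER = 4         # a Gracemont cluster is four cores
DENSITY_ADVANTAGE = 1.28      # assumed reading of "28% more efficient"

# Implied CB R23 points per Gracemont cluster and per E-core.
score_per_cluster = ALL_GRACEMONT_SCORE / CLUSTERS_ON_DIE
score_per_core = score_per_cluster / CORES_PER_CLUSTER

# Points Golden Cove would deliver from the same area as one cluster.
golden_cove_same_area = score_per_cluster / DENSITY_ADVANTAGE

print(f"Per Gracemont cluster: {score_per_cluster:.0f} points")
print(f"Per Gracemont core:    {score_per_core:.0f} points")
print(f"Golden Cove in the same area: {golden_cove_same_area:.0f} points")
```

That works out to roughly 3,860 points per cluster, against roughly 3,020 for Golden Cove in the same area under that assumption.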
Intel already has new iterations in Arrow Lake/Meteor Lake. I imagine that will get us a good bit closer to Golden Cove performance. I have seen others claim that you need 8 big cores ideally, but I don’t know about that. If Intel can boost IPC another 10-20% and improve latency and throughput, I imagine most users would only need 2-4 “P” cores and 16 “E” cores.
Design is ready for that. I think to start encroaching on big core performance they would only need:
1) Backend: a 5th ALU, and increase allocation to be 6 wide as well (as on the big core). They might not even need the 5th ALU; just going to 6-wide allocation would already give great returns. On the big core, ports are shared between ALU and VEC ops, so its 5 ALUs are in reality limited by ports already being used for something else.
Maybe throw more hardware at VEC. They likely have 4 ports driving 2x256 units that can already do 4x128 operations, so just beefing up these ports with execution resources could match Zen 3 and the big core.
2) Frontend needs to copy all execution elimination tricks from big core and increase various buffers including ROB.
Remember, they don't really need to match the big core in some peak FLOP/clock figure on esoteric FMA ops, but rather in common-sense execution capability.
The real challenge will be the clocks. Can they keep the tight L1 cache and not ruin things by relaxing it and adding extra pipeline stages to chase clocks? They are already clocked at 4 GHz and don't go above 4.2 or so even with a ton of voltage, so corporate morons and beancounters will definitely pressure them to give more clock.
Define "most users" in this statement...
Because for me "most users" are your aunts-and-uncles type of people who are fine with just 4 cores but might go up to 6 or even 8 just because they have the money to spend. Seriously, a modern i3 with 4 cores + HTT is even good for decent gaming. The only reason to go above that would be that you are making money with your PC, and that is definitely not "most users".
JUST Coercertiv? 🤣
I have been telling people for months that Gracemont was/is a game changer.
If you're limited in thread-level parallelism, then you want your cores to offer as much instruction-level parallelism, IPC, and clock speed as possible to keep pushing performance higher. A few workloads that are "embarrassingly parallel" will scale near-perfectly with additional E cores, but honestly, do you think Alder Lake-S will benefit much from adding extra E cores when running Handbrake? Especially when its successor (Raptor Lake) could have more than 16 of the things? There will be a relatively small number of applications that will be able to benefit from more than 8 E cores (any review of Threadripper vs AM4 CPUs should show you where the scaling will be poor). Gracemont may be area-efficient, but if it's going to sit totally unutilized, then what's the point?
I'm well aware of that, but sometimes if you want to get it done right, you make the necessary sacrifices.
Eh, there are pros and cons for both methods. If Intel had infinite resources I could see them having gone with 10 or 12 real cores instead of having E-cores on the desktop parts. I'm pretty sure that supply issues played a big part in making this choice at this point.
I'm currently using a Skylake-X i7-7820X overclocked on all cores to 4.4 GHz, 64GB RAM, GTX 1080. What kind of single core speed improvements should I expect from the i7-12700 or i9-12900? I primarily use ACR and Photoshop for stitching panoramas, merging HDR, processing film scans of 70-100MP photos (B&W and Color), applying the SRDx dust removal filter, etc. Most of the workload I do seems to be single threaded. Would I be able to tell a significant speed improvement?
Unless your workload is embarrassingly parallel, you don't need 16 performance cores either.
You are basically making up in your head some kind of nonexistent workload that ideally matches 16 cores, then arguing that 16 performance cores would be best.
Do what right? Run imaginary workloads?
A hybrid design will match low thread count workload performance and exceed high thread count performance, while consuming less power and using less silicon. It's win-win.
The thing is that "needs" change over time.
The 12900K gets 1400 points in PugetBench Photoshop v0.93.3. https://www.pugetsystems.com/labs/a...Gen-Intel-Core-vs-AMD-Ryzen-5000-Series-2245/
Are you sure your workloads are mostly single-threaded?
If money is no big deal, I'd expect at least a 40% improvement versus said Skylake system. The expense would mainly come from getting 64GB of DRAM; I'd wait for faster DDR5 to appear at more affordable prices.
You aren't the genius, that discovered they are all wrong, that you seem to think you are.
Back in 2018, Intel released the 9900K that swept every benchmark, beating the 2700X decisively. And the 2700X wasn't even half a year old when they did it. Intel did so with a die that was larger than anything they had launched on 14nm for the consumer desktop. Doesn't take a genius to figure out how they did it, either.
We aren't living in the past.
The Big.little solution from Intel didn't run the table. 16c Golden Cove would have. What does Intel have to show for it as an advantage?
A smaller die and more power consumption.
They also have a larger profit margin to show for it.