Question Alder Lake - Official Thread

Page 41

Hulk

Diamond Member
Oct 9, 1999
4,367
2,234
136
I haven't kept up with the layout of Aldy, how does the GPU area relate to all of this?

Also, how would a hypothetical GPUless Alder derivative look with an 8+16 layout (P+E)? Similar size?

I'm no expert at this but I gave it a try. GPU plus display logic looks to be about 43mm^2 out of 209mm^2 total. About 20% of the total die. GC core with L1/L2/L3 is around 10mm^2. GC about 12 or 13mm^2.

Maybe without GPU ADL at current die size could accommodate 12+8? Or maybe 14 GC's and no GM's or GPU.
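
As a rough check on the arithmetic above, here is a minimal sketch that turns the quoted area estimates into a core count. The 43 and 209 mm^2 figures and the two per-core guesses (about 10 mm^2 and about 12 to 13 mm^2) come from the post; the helper name, the 12.5 midpoint, and the idea that freed area converts cleanly into whole cores are assumptions. It ignores floorplan, ring stops, and I/O.

```python
import math

# Figures quoted in the post (eyeballed from a die shot, so treat as rough).
GPU_AND_DISPLAY_MM2 = 43.0
TOTAL_DIE_MM2 = 209.0


def extra_p_cores(freed_area_mm2: float, area_per_core_mm2: float) -> int:
    """Whole P-cores that fit in the freed area (hypothetical helper).

    Assumes area converts directly into cores, which a real floorplan
    would not allow.
    """
    if area_per_core_mm2 <= 0:
        raise ValueError("area_per_core_mm2 must be positive")
    return math.floor(freed_area_mm2 / area_per_core_mm2)


gpu_share_pct = GPU_AND_DISPLAY_MM2 / TOTAL_DIE_MM2 * 100
print(f"GPU + display share of die: {gpu_share_pct:.1f}%")

# Try both per-core area guesses; 12.5 is an assumed midpoint of "12 or 13".
for core_area in (10.0, 12.5):
    extra = extra_p_cores(GPU_AND_DISPLAY_MM2, core_area)
    print(f"{core_area} mm^2 per core -> {8 + extra}+8 layout")
```

With these inputs the 10 mm^2 guess lands on 12+8 and the 12.5 mm^2 guess on 11+8, so 12+8 sits at the optimistic end of the estimate.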
 
Reactions: Arkaign

DrMrLordX

Lifer
Apr 27, 2000
21,794
11,143
136
It is driving efficiency. As per https://www.computerbase.de/2021-11...hnitt_wie_effizient_ist_die_hybridarchitektur, the 12900k is on average 32% faster with 8+8 vs 8+0 with no power limit, 125w/241w PL2, and 125w PL2 scenarios in the MT suite.

You're basically ignoring the fact that the unified voltage rail causes the chip to burn maybe ~50W needlessly by overvolting the efficiency cores when sustaining PL2 power values (241W). The potential was there but Intel blew it in multiple ways.

Agree that ADL not having separate voltage planes is an oversight, but again, I don't see how that's such a fundamental, unfixable issue as to render the entire hybrid model a bad idea, especially since this is essentially Intel's first mainstream implementation of the hybrid model.

Intel will allegedly solve the problem with Raptor Lake. That isn't much of an endorsement of the current design.

Adding more E-cores => lower P-core clocks in MT for a set performance target => lower voltages => better efficiency.

Not sure Alder Lake currently exhibits boost clock behavior consistent with this strategy. It's hard to tell since so many benchmarks are conducted with no effective power limit on the CPU.
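
The chain quoted above (lower clocks, then lower voltage, then better efficiency) leans on the usual first-order rule that dynamic power scales with capacitance times voltage squared times frequency. A minimal sketch of that rule follows; the function name and the 10% figures are illustrative assumptions, not Alder Lake measurements, and leakage power is ignored.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float) -> float:
    """Dynamic power relative to baseline under P ~ C * V^2 * f.

    Switched capacitance C is assumed unchanged, so it cancels out.
    """
    if voltage_ratio <= 0 or frequency_ratio <= 0:
        raise ValueError("ratios must be positive")
    return voltage_ratio ** 2 * frequency_ratio


# Illustrative only: 10% lower clock that also allows 10% lower voltage.
power = relative_dynamic_power(voltage_ratio=0.9, frequency_ratio=0.9)
print(f"relative power: {power:.3f}, relative perf/W: {0.9 / power:.3f}")
```

The same rule cuts the other way on a shared voltage rail: a core held at a higher voltage than its clock needs pays the squared penalty.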

Amdahl's law is the exact reason a hybrid model does make sense, since speedups become more marginal the more cores/threads there are, it's only logical that the cores/threads responsible for the MT speedup be as area and power efficient as possible.

It absolutely does not. Amdahl's Law should show you that for a given workload, you can only service so many threads before adding more thread-handling capability results in a precipitous drop-off in performance gains. Specific applications will only ever use so many threads. If you're limited in thread-level parallelism, then you want your cores to feature as much instruction-level parallelism, IPC, and clock speed as possible to keep pushing higher performance. A few workloads that are "embarrassingly parallel" will scale near-perfectly with additional E cores, but honestly, do you think Alder Lake-S will benefit much from adding extra E cores when running Handbrake? Especially when its successor (Raptor Lake) could have more than 16 of the things? There will be a relatively small number of applications that will be able to benefit from more than 8 E cores (any review of Threadripper vs AM4 CPUs should show you where the scaling will be poor). Gracemont may be area-efficient, but if it's going to sit totally unutilized, then what's the point?
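
Since both sides are invoking Amdahl's law, here is a minimal sketch of the textbook formula, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of workers. It treats every core as identical, which is a simplifying assumption: it does not model the P/E split, and the parallel fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup over a single worker when only part of the job parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative parallel fractions, not measurements of any real application.
for p in (0.50, 0.90, 0.99):
    row = ", ".join(
        f"{n:>2} cores: {amdahl_speedup(p, n):5.2f}x" for n in (8, 16, 24)
    )
    print(f"parallel fraction {p:.2f} -> {row}")
```

At p = 0.50 the speedup can never reach 2x no matter how many cores are added, while at p = 0.99 extra cores keep paying off. Which regime a given application sits in is exactly what the two posters disagree about.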

16c Golden Cove would also be an entirely different die size.

I'm well aware of that, but sometimes if you want to get it done right, you make the necessary sacrifices.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I'm no expert at this but I gave it a try. GPU plus display logic looks to be about 43mm^2 out of 209mm^2 total. About 20% of the total die. GC core with L1/L2/L3 is around 10mm^2. GC about 12 or 13mm^2.

Maybe without GPU ADL at current die size could accommodate 12+8? Or maybe 14 GC's and no GM's or GPU.

Yea but that won't happen, because you lose a ton of sales (I mean $0, not low margins), you need to significantly rearrange all blocks to keep within the rectangular shape, and the iGPU is pretty much paid for by the mainstream market. Also with 14 GCs you run into ringbus trouble, while even with 32 Gracemonts you can use tricks to reduce the number of stops such as moving to an octo-core cluster, which also saves die area.

I think Gracemont is just the start. In a generation or two, you might see "little" cores reach Golden Cove performance per clock.

The trick is to keep the mont cores high enough performance so you lose only a small bit when switching threads and it happens to land on the E cores.

Mobiles should shine as well. I expect it to live up to the 2x performance claim. 2+8 15W should outperform 4C Tigerlake by 30-50%, and 4+8 20W-plus should double it. Remember that Apple slide saying they beat Tigerlake-H by a significant amount and it beats 4C Tigerlake to a pulp?

Well, enter Alderlake. Also, of course, they conveniently did not include AMD Cezanne chips, because a 20% advantage against Cezanne looks way worse for M1 than 70% versus Tigerlake.

I think the reinvigorated Intel/AMD, Qualcomm with Nuvia, and Apple cores should see amazing advancements in the coming years.
 
Reactions: lightmanek

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I think Gracemont is just the start. In a generation or two, you might see "little" cores reach Golden Cove performance per clock.

Design is ready for that. I think to start encroaching on big core performance they would only need:

1) Backend: 5th ALU and increase allocation to be also 6 wide ( as big core ). They might not even need 5th ALU and just allocation to 6 wide already would give great returns - on big core ports are shared between ALU and VEC ops, so 5 ALU in reality are impacted by port already being used for something else.
Maybe throw more hardware at VEC, they likely have 4 ports driving 2x256 units that can do 4x128 operations already, so just beefing up these ports with execution resources can match ZEN3 and big core already.

2) Frontend needs to copy all execution elimination tricks from big core and increase various buffers including ROB.

Remember they don't really need to match big core in some retarded FLOP/clock in esoteric FMA ops, but rather in common sense execution capability.

The real challenge will be the clocks. Can they keep the tight L1 cache and not ruin things by relaxing it and adding additional stages in the pipeline to hunt the clocks? They are already clocked at 4 GHz and don't go above 4.2 or so with a ton of voltage, so corporate morons and beancounters will definitely pressure them to give more clock.
 

eek2121

Diamond Member
Aug 2, 2005
3,043
4,265
136
Using the dimensions from this die shot I calculated the CB R23 points/mm^2. In the same amount of die space Gracemont is 28% more efficient than Golden Cove. This is including HT for Golden Cove and running cores at stock speeds: 4.9 GHz for Golden Cove and 3.7 GHz for Gracemont. Gracemont really packs quite a bit of compute into a small amount of die space. As Coercertiv has been telling us.

Furthermore, if all of the die area was used for Gracemont cores, approximately 8.47 Gracemont clusters would fit on the die with a resulting CB R23 score of 32,700.
JUST Coercertiv? 🤣

I have been telling people for months that Gracemont was/is a game changer.
Design is ready for that. I think to start encroaching on big core performance they would only need:

1) Backend: 5th ALU and increase allocation to be also 6 wide ( as big core ). They might not even need 5th ALU and just allocation to 6 wide already would give great returns - on big core ports are shared between ALU and VEC ops, so 5 ALU in reality are impacted by port already being used for something else.
Maybe throw more hardware at VEC, they likely have 4 ports driving 2x256 units that can do 4x128 operations already, so just beefing up these ports with execution resources can match ZEN3 and big core already.

2) Frontend needs to copy all execution elimination tricks from big core and increase various buffers including ROB.

Remember they don't really need to match big core in some retarded FLOP/clock in esoteric FMA ops, but rather in common sense execution capability.

The real challenge will be the clocks. Can they keep the tight L1 cache and not ruin things by relaxing it and adding additional stages in the pipeline to hunt the clocks? They are already clocked at 4 GHz and don't go above 4.2 or so with a ton of voltage, so corporate morons and beancounters will definitely pressure them to give more clock.
Intel already has new iterations in Arrow Lake/Meteor Lake. I imagine that will get us a good bit closer to Golden Cove performance. I have seen others claim that you need 8 big cores ideally, but I don’t know about that. If Intel can boost IPC another 10-20% and improve latency and throughput, I imagine most users would only need 2-4 “P” cores and 16 “E” cores.
 

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
I imagine most users would only need 2-4 “P” cores and 16 “E” cores.
Define "most users" in this statement...
Because for me "most users" are your aunts and uncles type people that are fine with just 4 cores but might go up to 6 or even 8 just because they have the money to spend. Seriously a modern i3 with 4 cores +HTT is even good for decent gaming, the only reason to go above that would be that you are making money with your PC and that is definitely not "most users".
 
Reactions: Hotrod2go

eek2121

Diamond Member
Aug 2, 2005
3,043
4,265
136
Define "most users" in this statement...
Because for me "most users" are your aunts and uncles type people that are fine with just 4 cores but might go up to 6 or even 8 just because they have the money to spend. Seriously a modern i3 with 4 cores +HTT is even good for decent gaming, the only reason to go above that would be that you are making money with your PC and that is definitely not "most users".

Anyone not needing Sapphire Rapids or Threadripper? Shoot, even half of those users would be better off.

If 4x Gracemont cores take up the space of 1 Golden Cove core, but 4x Gracemont cores have greater efficiency than 1 Golden Cove core, then, by logic, you only need just enough big cores to cover your big foreground tasks. The rest can run off the small cores.

Raptor Lake will be a test of this use case for sure.
 

Hulk

Diamond Member
Oct 9, 1999
4,367
2,234
136
JUST Coercertiv? 🤣

I have been telling people for months that Gracemont was/is a game changer.

Yes you too! Coercertiv is the one who came to mind as I was writing.

I think the easiest way for Intel to increase the performance of the Gracemont cores would be to add Hyperthreading. They just need to add registers and other relatively minor structures and, considering how wide GM is, that would probably yield an easy 20% in MT, which is where the small cores are supposed to shine.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,260
5,257
136
If you're limited in thread-level parallelism, then you want your cores to feature as much instruction-level parallelism, IPC, and clock speed as possible to keep pushing higher performance. A few workloads that are "embarrassingly parallel" will scale near-perfectly with additional E cores, but honestly, do you think Alder Lake-S will benefit much from adding extra E cores when running Handbrake? Especially when its successor (Raptor Lake) could have more than 16 of the things? There will be a relatively small number of applications that will be able to benefit from more than 8 E cores (any review of Threadripper vs AM4 CPUs should show you where the scaling will be poor). Gracemont may be area-efficient, but if it's going to sit totally unutilized, then what's the point?

Unless your workload is embarrassingly parallel, you don't need 16 performance cores either.

You are basically making up in your head, some kind of non existent work load that ideally matches 16 cores, then arguing that 16 performance cores would be best.

I'm well aware of that, but sometimes if you want to get it done right, you make the necessary sacrifices.

Do what right? Run imaginary workloads?

A hybrid design will match low thread count work load performance, exceed high thread count performance, while consuming less power, and using less silicon. It's Win-Win.

Everyone (AMD included, according to rumors) is moving to big-little with performance and efficiency cores, for exactly these reasons.

You aren't the genius, that discovered they are all wrong, that you seem to think you are.
 

imported_pk

Junior Member
Feb 11, 2007
11
0
66
I'm currently using Skylake-X i7-7820X overclocked on all cores to 4.4ghz, 64GB RAM, GTX1080. What kind of single core speed improvements should I expect from i7-12700 or i9-12900 ? I primarily use ACR, Photoshop for stitching panoramas, merging HDR, processing film scans of 70-100MP photos (B&W and Color), applying SRDx dust removal filter, etc. Most of the workload I do seems to be single threaded. Would I be able to tell a significant speed improvement?
 

TheELF

Diamond Member
Dec 22, 2012
3,990
744
126
You aren't the genius, that discovered they are all wrong, that you seem to think you are.
Eh, there are pros and cons for both methods. If Intel had infinite resources I could see them having gone with 10 or 12 real cores instead of having E-cores on the desktop parts. I'm pretty sure that supply issues played a big part in making this choice at this point.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,260
5,257
136
Eh, there are pros and cons for both methods. If Intel had infinite resources I could see them having gone with 10 or 12 real cores instead of having E-cores on the desktop parts. I'm pretty sure that supply issues played a big part in making this choice at this point.

If the "pros" side of your argument requires "infinite resources", it's really a "con".

Engineering is all about working around real world constraints like minimizing die size for economic and supply reasons.

Everyone is going Big-Little because it makes the best use of Power and Silicon budgets.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
I'm currently using Skylake-X i7-7820X overclocked on all cores to 4.4ghz, 64GB RAM, GTX1080. What kind of single core speed improvements should I expect from i7-12700 or i9-12900 ? I primarily use ACR, Photoshop for stitching panoramas, merging HDR, processing film scans of 70-100MP photos (B&W and Color), applying SRDx dust removal filter, etc. Most of the workload I do seems to be single threaded. Would I be able to tell a significant speed improvement?

If money is no big deal, I'd expect at least 40% improvement versus said Skylake system. The expense would mainly come from getting 64GB DRAM; I'd wait for faster DDR5 to appear at more affordable prices.
 

Hulk

Diamond Member
Oct 9, 1999
4,367
2,234
136
Unless your workload is embarrassingly parallel, you don't need 16 performance cores either.

You are basically making up in your head, some kind of non existent work load that ideally matches 16 cores, then arguing that 16 performance cores would be best.

Do what right? Run imaginary workloads?

A hybrid design will match low thread count work load performance, exceed high thread count performance, while consuming less power, and using less silicon. It's Win-Win.

This is a really good point. If your workload scales perfectly with additional threads then the most efficient use of die space and power is going to be lots of small cores. If your workload only uses 4, or 6, or 8 cores effectively, then that amount of big cores will be optimal.
 

dullard

Elite Member
May 21, 2001
25,203
3,617
126
I have seen others claim that you need 8 big cores ideally, but I don’t know about that. If Intel can boost IPC another 10-20% and improve latency and throughput, I imagine most users would only need 2-4 “P” cores and 16 “E” cores.
The thing is that "needs" change over time.

1) To have a functional computer, you only need one core. But, it isn't a great experience using it.

2) For a halfway decent user experience you need one thread to respond to the user, one thread to perform the task that the user wants, and a third thread to perform all operating system tasks. They aren't equal in processor needs, but at a bare minimum you need 3 threads. That is why going from 1 core to 2 cores to 4 cores were very significant jumps in the way the user felt the computer responded. The days of the endless spinning beach ball or hourglass were over unless you were doing something quite intensive. That could theoretically be handled with 1 P core and 2 E cores.

3) As a programmer, I can say without a doubt that I can make almost anything that I've programmed run faster with 2 threads than one with very little work. So, realistically, that minimum should be 2 P cores and 2 E cores. Going past that becomes very case-specific and might not be worth my time.

4) Operating Systems and internet browsers have had scope creep, especially Windows. I'm doing next to nothing right now and there are 10 threads taking up significant CPU resources. True, they don't each need their own core, but ~2 E cores would handle most of that. So now we are at 2 P cores and 4 E cores.

5) What if we decide with all these E cores available that we "need" 100% virus scanning of everything with no user-impact. 2 P cores and 5 E cores.

6) What if we then decide we "need" 100% encryption of everything. 2 P cores and 6 E cores.

7) "Need" Teams or Skype or similar connectivity at all times? 2 P cores and 7 E cores.

8) "Need" Alexa-like functionality to actually work well? 2 P cores and 8 E cores.

9) "Need" interactive email that updates the emails on the fly? 2 P cores and 9 E cores. Google is already going down this path.

Etc. The number of E cores we "need" will just keep increasing. Once we reach Arrow Lake and the E cores take less power than your case fan, it is easy to decide that there is one more "need".

And the numbers above don't even account for actual power use that many people have at some point.
 

dullard

Elite Member
May 21, 2001
25,203
3,617
126
I'm currently using Skylake-X i7-7820X overclocked on all cores to 4.4ghz, 64GB RAM, GTX1080. What kind of single core speed improvements should I expect from i7-12700 or i9-12900 ? I primarily use ACR, Photoshop for stitching panoramas, merging HDR, processing film scans of 70-100MP photos (B&W and Color), applying SRDx dust removal filter, etc. Most of the workload I do seems to be single threaded. Would I be able to tell a significant speed improvement?
The 12900K gets 1400 points in PugetBench Photoshop v0.93.3. https://www.pugetsystems.com/labs/a...Gen-Intel-Core-vs-AMD-Ryzen-5000-Series-2245/

The best stock speed 7820X gets 855 points, although do note that it is using a different benchmark version 0.93.1. https://www.pugetsystems.com/benchm...toshop&application=&specs=7820x#results-table I do not know how much the benchmark version change affects the scores, so take them with a bit of caution. 1400 vs 855 is roughly a 64% gain at stock speeds. If you have a 4.4 GHz overclock, then the difference is closer to 30% assuming you don't overclock the 12900K.
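
For reference, the stock comparison can be reproduced with a one-line calculation. The two scores are the ones quoted above; the helper name is made up for illustration, and the benchmark-version mismatch still applies.

```python
def percent_gain(new_score: float, old_score: float) -> float:
    """Percentage improvement of new_score over old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score / old_score - 1.0) * 100.0


# PugetBench Photoshop scores quoted in the post (different bench versions).
gain = percent_gain(1400, 855)
print(f"12900K vs stock 7820X: {gain:.1f}% higher score")
```

That works out to about 64% at stock. How much of it an overclocked 7820X claws back depends on how the score scales with clock speed, which this sketch does not try to model.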

The benches above are for Photoshop as a whole which does have a lot of multi-threaded tasks. I don't have knowledge of your specific individual tasks. But, you can look into those links for individual filter benchmarks. If your workload is truly single threaded, then I think you are looking at the wrong CPUs. The 12900K main advantage is a lot of threads. Same with the 7820X. Are you sure your workloads are mostly single-threaded?
 

imported_pk

Junior Member
Feb 11, 2007
11
0
66
Are you sure your workloads are mostly single-threaded?

Thank you for your advice. I do also have multithreaded workflows, but I'm less concerned about them - anything with 8+ cores is good enough, e.g. for Adobe Premiere video encoding, Lightroom imports/exports, etc. It's the single threaded tasks that I would love to speed up as much as possible. 12700 seems reasonably priced where I wouldn't have a reason to get anything lesser/cheaper. 12900 with its higher turbo clock seems interesting as well. If all I get is a 25-30% single threaded performance increase then it's likely not worth it. If I can get a 40-50% increase then the upgrade starts looking more reasonable.
 

DrMrLordX

Lifer
Apr 27, 2000
21,794
11,143
136
Unless your workload is embarrassingly parallel, you don't need 16 performance cores either.

10P would be fine too, 16P just covers most use cases and gives Intel the edge they need to whip the 5950X in every conceivable workload. Not really that hard to figure it out mang.

Meanwhile, in a 10t workload, which would you rather have? 8P + 8E or 16P? I'm voting 16P, every time.

You aren't the genius, that discovered they are all wrong, that you seem to think you are.

Back in 2018, Intel released the 9900k that swept every benchmark, beating the 2700x decisively. And the 2700x wasn't even half a year old when they did it. Intel did so with a die that was larger than anything they had launched on 14nm for the consumer desktop. Doesn't take a genius to figure out how they did it, either.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,260
5,257
136
Back in 2018, Intel released the 9900k that swept every benchmark, beating the 2700x decisively. And the 2700x wasn't even half a year old when they did it. Intel did so with a die that was larger than anything they had launched on 14nm for the consumer desktop. Doesn't take a genius to figure out how they did it, either.

We aren't living in the past. Core counts have doubled since then, and when going to very high core counts (or even at lower counts for mobile), the Big-Little solution makes more sense.

8 performance cores is plenty (if not overkill) for thread limited workloads, and beyond that point you are better served by efficiency cores for highly parallel workloads.

As stated multiple times, they deliver both better silicon area efficiency, and power efficiency. It's why they exist.

You can then build a chip that is smaller, delivers higher MT performance, with better power efficiency with this approach. So it's win-win-win.

Which is why EVERYONE is doing it. The last holdout is now AMD, which apparently will also soon go this route.
 

DrMrLordX

Lifer
Apr 27, 2000
21,794
11,143
136
They also have a larger profit margin to show for it

Well yeah. They can get more dice per wafer out of it. That certainly must have affected their decision-making process. Plus unless Intel has a different die for Alder Lake-P, they can basically use the same die between Alder Lake-S and at least some Alder Lake-P by just disabling two Golden Cove cores for -P.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,260
5,257
136
The Big.little solution from Intel didn't run the table. 16c Golden Cove would have. What does Intel have to show for it as an advantage?

A smaller die and more power consumption.

IIRC the 12900K consumes something like 250+ watts, while the E-cores are consuming less than 50 watts. Drop a couple of P-Cores and include more E-cores and they could have increased MT performance while using less power.

16 P cores at these clocks likely would need 400 watts, and a lot more die area.

The problem with the 12900K trying to compete with the 5950x is not from too many E-Cores, it's from having not enough E-Cores.
 