Discussion Intel current and future Lakes & Rapids thread

IntelUser2000 · Aug 22, 2021

insertcarehere said:
2. @IntelUser2000's die size estimates were based on Gracemont being an incremental step-change from Tremont, when it's clear that isn't the case here (more than doubling of the back-end resources + 2x L1 size)

I haven't changed my estimation. I calculated the size using the diagram they had earlier on, not based on architectural changes. There's no way to estimate die size using high level additions.

The core size is well under 2mm2.

DrMrLordX · Aug 22, 2021

biostud said:
Intel Core i9-12900K 16 Core Alder Lake CPU Benchmarked on ASUS ROG STRIX Z690-E Gaming WIFI Motherboard, Faster Than Core i9-11900K

Intel Alder Lake Core i9-12900K 16 Core Desktop CPU has been benchmarked on ASUS's ROG STRIX Z690-E Gaming WIFI motherboard.

wccftech.com

Well that's not very impressive. But it's just one data point, and it's a QS so it could be buggy.

biostud · Aug 22, 2021

DrMrLordX said:
Well that's not very impressive. But it's just one data point, and it's a QS so it could be buggy.

So the performance cores are faster and the efficiency cores slower, but overall it's roughly the same as 16 Zen 3 cores in MT.

Abwx · Aug 22, 2021

mikk said:
Power consumption, energy efficiency - Intel Core i7-3770K and i5-3570K: Ivy Bridge 22nm review - HardWare.fr

Helped by a new 22nm process, Intel is launching new Ivy Bridge Core i5 and i7 chips, which have the tough job of succeeding the Sandy Bridge parts that have topped the charts for more than a year. Did the bet pay off?

www.hardware.fr

It's not even close in these.

mikk said:
3770K= 64.8W
2600k= 73.2W

13% more not 42%.

Fair enough, I confused the scores. Nevertheless, I stand by the second part of my post: 10nm brings something like 2x the perf/watt of 14nm.
Indeed, the perf/watt improvement from 10SF to 10ESF is stated as 10-15%, and that's within the same node, so the gap to (average) 14nm is undoubtedly huge.

Besides, as a member pointed out, we don't know which 14nm they used in the comparisons; there's considerable improvement from the first 14nm iterations to the later ones.
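
For reference, the 13% figure from the quoted power numbers can be checked with a quick Python sketch. The helper name `percent_more` is made up for illustration; the wattages are the two values quoted above.

```python
def percent_more(higher, lower):
    """Return how much larger `higher` is than `lower`, in percent."""
    return (higher - lower) / lower * 100.0

# Power figures quoted above from the HardWare.fr review, in watts.
i7_3770k_w = 64.8
i7_2600k_w = 73.2

# Relative difference in power draw, 2600K vs 3770K.
delta = percent_more(i7_2600k_w, i7_3770k_w)
print(f"2600K draws about {delta:.0f}% more than the 3770K")
```

This only compares power draw; it says nothing about perf/watt unless the scores are factored in as well.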

biostud said:
So the performance cores are faster and the efficiency cores slower, but overall it's roughly the same as 16 Zen 3 cores in MT.

Looking at the 11900K score, the test hardly makes use of more than 8 cores, so in 16T mode ADL 8 + 8 is slightly faster (at 5GHz + 3.9GHz..) than a 5950X (at 4.3-4.4GHz)...

SAAA · Aug 22, 2021

Honestly, that bench shows almost no difference between an 8-core, a 16-core and a 24-core with different clocks and IPC. Whatever it measures, it's moot if all current/next-gen CPUs perform almost the same. I'd even take leaked Geekbench scores as more indicative than that suite.

diediealldie · Aug 22, 2021

JoeRambo said:
Intel has returned to doing the sane things and is turning its back to FMA crowd ( the two of them who are running Linpack and prime95 all day ) that are ruining performance for normal people. Skylake has degraded latency of simple FP add / mul instructions from 3 to 4, and even if throughput is good, latency still matters. Small Atom like Tremont in fact had 3 cycle latency FP add, when big core had 4 cycle.

Since everything in floating point world is executed on what Intel calls "vector" units, even if we are talking about simple, not vectorized floating point variables (float x; double y) - they are loaded in 128bit XMM registers and instructions like ADDSS / ADDSD and MULSS / MULSD are executed.
So looking at resources "small" core has - it can in fact match Skylake in throughput and beat it in latency for those small ops while also having additional FP/VEC port for ALU operations. So it already starts the game with more execution resources than Skylake and is more similar to Sunny Cove, than Skylake.
And the funny thing is, since we are talking about separate execution ports for FP/VEC, it means that the additional four integer ALU ports are free to do operations, unlike on Skylake/Sunny Cove where PORT0 / PORT1 are overcrowded with hardware and, once busy with FP/VEC, are not available. For example, Skylake/SNC will have just one shift ALU available for various operations, while Atom has 4 to choose from; and while just one integer multiplier unit is available and zero dividers, Atom can choose from 2 ports to do these ops.

I think the only real bottleneck with so many ports is gonna be 5-wide allocation to feed so many ports, if they had 6-wide allocation like Skylake, they would be matching Sunny Cove instead. Next-generation of Atom is gonna be exciting, even if the current one is good for marketing Cinebench numbers only.

These are good points. I'm actually quite curious how the Atom team fit everything into such a small area. Some parts of the core are even bigger than in Golden Cove (L1 cache and issue ports). We already know that the Willow Cove core (which is almost the same as Sunny Cove) uses more than 10mm2 of die space. 2015 Skylake used 8mm2 of die space; that would be 3~4mm2 on 10ESF assuming a 50~70% shrink. But Gracemont uses 1.5~2mm2. Surprising.
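
The shrink arithmetic can be sketched in a few lines of Python. This is a rough illustration only: `shrunk_area` is a made-up helper, and it assumes the whole core scales uniformly with the node, which real designs do not (SRAM and analog scale worse than logic).

```python
def shrunk_area(area_mm2, shrink_fraction):
    """Area left after shrinking by `shrink_fraction` (0.5 means 50% smaller)."""
    return area_mm2 * (1.0 - shrink_fraction)

skylake_core_mm2 = 8.0  # 2015 Skylake core area from the post above

# Walk the assumed shrink range from the post in 10-point steps.
for shrink in (0.50, 0.60, 0.70):
    area = shrunk_area(skylake_core_mm2, shrink)
    print(f"{shrink:.0%} shrink -> {area:.1f} mm2")
```

Under that simple model, the full 50~70% range works out to roughly 2.4 to 4 mm2, so the 3~4mm2 figure sits toward the conservative end. Either way it is still well above the 1.5~2mm2 quoted for Gracemont.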

diediealldie · Aug 22, 2021

CakeMonster said:
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for regular user with the average workload to use those very last threads on AL or especially RL.

Maybe not. 20~25% additional throughput from a big core is still quite a lot. Also, if they go with hardware-guided scheduling, there will probably be a point where the operating system can figure out the CPU's back-end utilization. For example, current OSes do not really know whether a certain thread is integer-ALU intensive or floating-point-ALU intensive. They still have room to improve.

lobz · Aug 22, 2021

DrMrLordX said:
And yet . . .

. . . you're still making predictions. Against a CPU that won't even be Alder Lake's main competition.

Why is that a problem? If only everyone here would try to make educated guesses instead of swearing fearlessly on the things they think they know - when in reality, the only people who really know anything for sure may not come to this forum, not just morally but legally too.

Borealis7 · Aug 22, 2021

Will I ever be able to get a 6C/12T (or similar configuration) for gaming, with 5GHz on all cores and just decent thermals? Or is all we're ever gonna get hot chips with more threads than we can use? I live in the desert, and thermals matter to me because a throttled CPU performs worse than stock. Also, I've given up on OCing entirely. It's just not worth the effort and investment anymore, on both CPU and GPU.

jpiniero · Aug 22, 2021

Borealis7 said:
Will I ever be able to get a 6C/12T (or similar configuration) for gaming, with 5GHz on all cores and just decent thermals? Or is all we're ever gonna get hot chips with more threads than we can use? I live in the desert, and thermals matter to me because a throttled CPU performs worse than stock. Also, I've given up on OCing entirely. It's just not worth the effort and investment anymore, on both CPU and GPU.

May as well buy Zen 3D then.

Abwx · Aug 22, 2021

SAAA said:
Honestly, that bench shows almost no difference between an 8-core, a 16-core and a 24-core with different clocks and IPC. Whatever it measures, it's moot if all current/next-gen CPUs perform almost the same. I'd even take leaked Geekbench scores as more indicative than that suite.

The upside is that all CPU cores can be compared somewhat meaningfully, since there's no apparent core-count advantage beyond 8. Among the subscores, rendering is surely the most IPC-sensitive, in which case the IPC improvement from RKL to ADL is 11.4% in this task, assuming both are at 5GHz.
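
As a rough sketch of how that kind of IPC estimate falls out of a score, here is the per-clock normalization in Python. The function name and the two scores are hypothetical placeholders (picked only so the ratio lands at 11.4%); they are not the actual subscores from the leak.

```python
def ipc_gain(score_new, ghz_new, score_old, ghz_old):
    """Relative per-clock gain, assuming the score scales linearly with clock."""
    per_clock_new = score_new / ghz_new
    per_clock_old = score_old / ghz_old
    return per_clock_new / per_clock_old - 1.0

# Hypothetical rendering subscores, both chips assumed to run at 5 GHz.
rkl_score, adl_score = 1000.0, 1114.0

gain = ipc_gain(adl_score, 5.0, rkl_score, 5.0)
print(f"Estimated IPC gain: {gain:.1%}")
```

If the two chips were not actually at the same clock during the run, the estimate shifts accordingly, which is why the 5GHz assumption matters.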

Borealis7 · Aug 22, 2021

jpiniero said:
May as well buy Zen 3D then.

ADL-S releases first, in October. Zen3D releases further into 2022, we'll see.

jpiniero · Aug 22, 2021

Borealis7 said:
ADL-S releases first, in October. Zen3D releases further into 2022, we'll see.

You can still turn off the Atoms if you want. Intel might make the cheapest i5 a 6+0 instead of a 6+4, but that will be locked, so no 5 GHz.

Mopetar · Aug 22, 2021

I'd almost like to see Intel release a chip using only the small cores, but stacking the hell out of them like the ARM server chips do.

It's probably not practical for a variety of reasons without Intel doing a lot of other work for such a design, but it would be cool to see them make something like that.

Thunder 57 · Aug 22, 2021

eek2121 said:
Agreed. Intel is almost hiding some workloads. The use of spec_rate, or ANY synthetic workload, is a huge red flag for me. You'll notice that AMD does NOT do such things at launch. Lovers can argue about CPU X vs. CPU Y all they want, but I start to wonder when the benchmarks are hidden in fine print. P.S. anyone at Intel reading this, feel free to grow a pair (of CPU clusters?) any time now. That being said, I suspect we'll find the 8 golden cove cores ahead of Zen 3 by 10-15% on average and the 8 Gracemont cores should provide another 25%-35% uplift. The higher boosts will be benchmarks that benefit cache and/or AVX and the lower boosts will not. Overall, I stand by my predictions: close to 10% faster than the 5950x in the majority of workloads, with a few being above and a few being below.

You had me until you started speculating. Golden Cove sounds like the biggest update since Sandy Bridge. Gracemont sounds too good to be true. I think ADL takes the single-thread crown, but Zen 3 still holds MT. Interesting times ahead, nice to see CPU competition again.

lobz · Aug 22, 2021

SAAA said:
Honestly, that bench shows almost no difference between an 8-core, a 16-core and a 24-core with different clocks and IPC. Whatever it measures, it's moot if all current/next-gen CPUs perform almost the same. I'd even take leaked Geekbench scores as more indicative than that suite.

Yep, mostly a GPU bench. At least we know there are no big problems.

Mopetar said:
I'd almost like to see Intel release a chip using only the small cores, but stacking the hell out of them like the ARM server chips do.

It's probably not practical for a variety of reasons without Intel doing a lot of other work for such a design, but it would be cool to see them make something like that.

Kind of a self-fulfilling wish: they are obviously going to put Gracemont into servers in some form. Not sure if it will be Xeon D; they might still stay with the Coves for the D family. AMD will not be far behind this time though. If the many leaks are to be believed, a 128C Bergamo will probably not be using full-fledged Zen 4 cores, but of course I don't know that yet.
I bet they are both terrified of Apple entering the server space all of a sudden, and with good reason.

Doug S · Aug 22, 2021

lobz said:
I bet they are both terrified of Apple entering the server space all of a sudden, and with good reason.

That's silly. When has Apple ever shown it can successfully sell to enterprises? The sales it makes to businesses now (which are mostly to smaller/mid sized businesses not Fortune 500 sized companies) are almost an afterthought, they make those sales without really trying or appearing to care that much.

Entering the server market would be a huge departure for them, and even if they totally dominated on benchmarks that wouldn't be any guarantee of success as people making those purchase decisions value vendor support (and knowing that vendor support will still be as good five years down the road as it is on day one) more than anything else. There's a reason why when AMD started beating Intel with Opteron and more recently with Epyc it took a couple years before that started having any real impact on market share. Enterprise buyers are conservative by nature, the "nobody ever got fired for buying IBM" saw is kind of true - it is a bigger risk to say "let's go a different way this time" so most will want to have seen others successfully take that risk before they will jump.

If Apple repurposes the "Jade-C" building blocks for their higher end into server CPUs I bet it would be for internal use only. Maybe after a few years of using it successfully internally they might think about productizing it, but I'm skeptical. One of the reasons they've been so successful is that they focus on a few fairly narrowly defined markets, and are almost entirely consumer focused. Consider that the entire worldwide server market (from OEMs so counting everything from RAM to storage to racks and so on, way more than just CPUs) is a bit less than $100 billion a year. They'd need to take 20-25% of that market just to equal what they make from Airpods! So it isn't like they'd be able to have a big impact on the overall business with even the rosiest possible outcome.

CakeMonster · Aug 22, 2021

Thanks for the replies to the HT question. I didn't mean to imply that the smaller cores needed HT, more that HT threads on the big cores would be used very little now that there are small cores ahead of them in the prioritization queue. On average HT will be utilized a LOT less than currently on AL and its successors if Intel stick with this kind of configuration.

Dayman1225 · Aug 22, 2021

Mopetar said:
I'd almost like to see Intel release a chip using only the small cores, but stacking the hell out of them like the ARM server chips do.

It's probably not practical for a variety of reasons without Intel doing a lot of other work for such a design, but it would be cool to see them make something like that.

Intel is doing this for server - it’s codenamed Sierra Forest and is using the next Atom core.

jpiniero · Aug 22, 2021

Doug S said:
That's silly. When has Apple ever shown it can successfully sell to enterprises? The sales it makes to businesses now (which are mostly to smaller/mid sized businesses not Fortune 500 sized companies) are almost an afterthought, they make those sales without really trying or appearing to care that much.

I would expect it to be more Cloud; either selling directly to cloud companies exclusively or maybe their own service. Or both.

AMDK11 · Aug 22, 2021

Thunder 57 said:
You had me until you started speculating. Golden Cove sounds like the biggest update since Sandy Bridge. Gracemont sounds too good to be true. I think ADL takes the single-thread crown, but Zen 3 still holds MT. Interesting times ahead, nice to see CPU competition again.

I dare say that GoldenCove is the biggest update since Conroe (Core2). Conroe (2006) introduced a 4-way x86 decoder, so from Conroe (Core2) to SunnyCove (2021) the x86 decoder was 4-way. I would like to add that from the time of Bulldozer (x86 CMT core (called module)) to Zen-Zen3 the x86 decoder is also 4 way. For the first time in 15 years in x86 history, Intel extended the x86 decoder by 50% to 6-way. This is really a very big x86 core upgrade. Additionally, Intel for the first time since P6 (Pentium Pro) to SunnyCove is switching from L1-I cache fetching 16 Bytes per clock cycle to fetching 32 Bytes in GoldenCove. GoldenCove is the largest redevelopment and expansion since Conroe (Core2). SandyBridge wasn't that revolutionary.

JoeRambo · Aug 22, 2021

AMDK11 said:
SandyBridge wasn't that revolutionary.

Can't agree with you here. Sandy Bridge WAS revolutionary, even if Intel undersold it at the time. It was the design that moved from the P6 (PPro) legacy to a modern core with a different OoO machine based on a physical register file (as in, from actual registers in the OoO machinery to pointers to said registers in the PRF), and it added the uOP cache.
Haswell, Skylake, Sunny Cove and Golden Cove all share Sandy Bridge's machinery and kept piling "more" onto it.

So in the evolution of P6->P4->P6-based C2D->Sandy Bridge, we are still on that uArch. Widening instruction fetch and widening the decoders is loooong overdue, just like adding a 5th ALU, but that does not make a revolution; it is rather a belated evolution of necessity.

Oh, and let's wait for the actual chip release before we evaluate how good that 6-wide decode is in practice. It could be as bad as having 1 complex + 5 simple decoders, which is 6-wide only when all the stars align in the instruction stream.
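
To illustrate why a 1-complex + 5-simple arrangement would only hit 6-wide on a friendly instruction stream, here is a toy Python model. Everything about it is an assumption for illustration: instructions decode strictly in order, only slot 0 can take a complex instruction, and the other slots take simple ones only. It is not a description of how Golden Cove's decoders actually work, which Intel had not detailed at the time.

```python
def decode_cycles(stream, width=6):
    """Cycles needed to decode `stream` on a toy 1-complex + (width-1)-simple front end.

    `stream` is a string of 'S' (simple) and 'C' (complex) instructions.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        # Slot 0 always takes the next instruction, simple or complex.
        i += 1
        slot = 1
        # The remaining slots fill only while the next instruction is simple.
        while slot < width and i < len(stream) and stream[i] == "S":
            i += 1
            slot += 1
        cycles += 1
    return cycles

for name, stream in [("all simple", "S" * 12),
                     ("alternating", "SC" * 6),
                     ("all complex", "C" * 12)]:
    cycles = decode_cycles(stream)
    print(f"{name}: {len(stream)} instructions in {cycles} cycles")
```

In this model, an all-simple stream sustains 6 per cycle, while an alternating simple/complex stream drops to about 2 per cycle, because every complex instruction has to wait for slot 0.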

AMDK11 · Aug 22, 2021

JoeRambo said:
Can't agree with you here. Sandy Bridge WAS revolutionary, even if Intel undersold it at the time. It was the design that moved from the P6 (PPro) legacy to a modern core with a different OoO machine based on a physical register file (as in, from actual registers in the OoO machinery to pointers to said registers in the PRF), and it added the uOP cache.
Haswell, Skylake, Sunny Cove and Golden Cove all share Sandy Bridge's machinery and kept piling "more" onto it.

So in the evolution of P6->P4->P6-based C2D->Sandy Bridge, we are still on that uArch. Widening instruction fetch and widening the decoders is loooong overdue, just like adding a 5th ALU, but that does not make a revolution; it is rather a belated evolution of necessity.

Oh, and let's wait for the actual chip release before we evaluate how good that 6-wide decode is in practice. It could be as bad as having 1 complex + 5 simple decoders, which is 6-wide only when all the stars align in the instruction stream.

With this, I agree that SandyBridge is revolutionary in these respects.

I was talking about pipelines, and I consider the transition from the 4-way to the 6-way x86 decoder to be a revolution. It's not just adding resources. All the control logic and the algorithms contained in it have been completely redesigned or replaced with new, more extensive ones. You have a heavily rebuilt and expanded front-end with a completely new predictor and preselector with new mechanisms. There have been big changes to the rest of the x86 core as well, but Intel hasn't revealed everything yet. The fact that GoldenCove basically uses similar mechanisms to SandyBridge does not mean that it is the same microarchitecture. With each generation, new mechanisms and algorithms are added to the x86 core to increase the IPC. In my opinion SunnyCove is the biggest change since SandyBridge and GoldenCove is the biggest change since Conroe.

That's my opinion but you don't have to agree with it

Cardyak · Aug 22, 2021

CakeMonster said:
Thanks for the replies to the HT question. I didn't mean to imply that the smaller cores needed HT, more that HT threads on the big cores would be used very little now that there are small cores ahead of them in the prioritization queue. On average HT will be utilized a LOT less than currently on AL and its successors if Intel stick with this kind of configuration.

I hypothesise that HyperThreading really has its days numbered. Processing more threads simultaneously via HT seems a bit redundant now we have more physical cores than we know what to do with.

HT was really just a quick and dirty way to increase the utilisation of CPU architectures that didn’t have enough depth to keep the core fed. Dropping HT will also reduce the burden of complexity on the scheduler, and remove the management around virtual cores.

I’d honestly rather just see designers go down the Apple/ARM/Intel Atom route. One thread per core and focus on building a more balanced design that keeps the width as busy as possible via strong OoO, Caches, Branch Predictors, etc.

moinmoin · Aug 22, 2021

lobz said:
I bet they are both terrified of Apple entering the server space all of a sudden, and with good reason.

Any indication of that?
