Discussion Intel current and future Lakes & Rapids thread

mikk · Aug 21, 2021

AtenRa said:
Well it looks very similar to me......I mean the marketing.

Skylake and Gracemont had been compared on real hardware, the 22nm slide is an example of Abwx being too optimistic on a simple shrink. SB and IVB is a good example, IVB only got minor architecture tweaks over SB.

SAAA · Aug 21, 2021

Hulk said:
Seems like the big question here is the "golden" optimal ratio of Golden Cove to Gracemont cores? Which will vary from user-to-user by workload(s). If Gracemont really is nearly Skylake at 1/4 the die space I think I'd rather have 4 Coves and 24 Gracemont cores. But of course we won't know until we get our sweaty hands on these things.

I think the point is they are going to reach and even surpass that ratio long term, Raptor and Meteor lake going 8/16 and 8/32, but for the time being no "big" cores regression from the 8 that Skylake and Rocketlake had. (ignore Cometlake in this argument as they went up to 10 cores out of desperation, if they could have pushed out 8 Icelake cores sooner there would have been no 10 core part at all)
That way they can keep increasing IPC and transistor budget on the big cores, without increasing their number and wasting die area on parallel performance that many small cores can achieve better.
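The area trade-off in this exchange can be put into rough numbers. The sketch below takes Hulk's "if" at face value (one Gracemont core occupying 1/4 the area of one big core) and counts core area only, ignoring cache, ring and uncore. The 8+8 entry is the Alder Lake configuration discussed in this thread, and the 8/16 and 8/32 entries are the future configurations SAAA mentions; the function name and the constant are illustrative, not anything from Intel.

```python
# Area of one small core relative to one big core.
# ASSUMPTION: the 1/4 figure is Hulk's hypothetical, not a measured value.
SMALL_CORE_AREA = 0.25


def big_core_equivalents(big: int, small: int,
                         small_area: float = SMALL_CORE_AREA) -> float:
    """Core area of a big+small configuration, in units of one big core."""
    return big + small * small_area


configs = {
    "8 big + 8 small": (8, 8),
    "4 big + 24 small (Hulk)": (4, 24),
    "8 big + 16 small": (8, 16),
    "8 big + 32 small": (8, 32),
}

for name, (big, small) in configs.items():
    area = big_core_equivalents(big, small)
    print(f"{name}: {area:.1f} big-core areas")
```

Under that assumption, 4+24 costs the same core area as 8+8 (10 big-core areas each), which is why the choice between them comes down to workload rather than die size.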

AtenRa · Aug 21, 2021

mikk said:
Skylake and Gracemont had been compared on real hardware, the 22nm slide is an example of Abwx being too optimistic on a simple shrink. SB and IVB is a good example, IVB only got minor architecture tweaks over SB.

According to anand, Ivybridge has a 38% increase in perf/watt over Sandybridge.
Since Ivy had minor mArch tweaks, then the vast majority of its increased perf/watt is coming from the fabrication (22nm tri-gate).

The Intel Ivy Bridge (Core i7 3770K) Review

www.anandtech.com

JoeRambo · Aug 21, 2021

jpiniero said:
re: R20, Golden Cove did increase the vector units to 3 from 2. Between that and the DDR5 bandwidth increase you'd think R20 would be way faster. Don't ask about power, especially at 5 GHz.

Sunny Cove already had 3 vector ALUs, it was able to pump 3x256 ALU type instructions per clock. I think most of the Cinebench MT prowess instead comes from:

1) Big cores can do 3x256 loads per cycle and can execute a wide mix of ALU and vector ALU operations per clock. As a ZEN3 investigation found, Cinebench R23 IPC on ZEN3 is 1.41 and it is 23% bound by backend resources, so more execution capability goes a long way toward enhancing performance. And I suspect Intel is adding a fast FADD unit because it helps in Cinebench-style workloads.
2) Small cores are actually beastly and offer the same 3 vector ALUs, even if one is limited to ALU operations only ( think something like VPANDX but not VADDXX or VMULXX ). And all that is backed by 2 load + 2 store per cycle. Question is how wide it is, but remember ZEN3 can only do 2x256 load + 1x256 store per cycle.

Intel has returned to doing the sane things and is turning its back on the FMA crowd (the two of them who are running Linpack and prime95 all day) who are ruining performance for normal people. Skylake degraded the latency of simple FP add / mul instructions from 3 to 4 cycles, and even if throughput is good, latency still matters. A small Atom core like Tremont in fact had 3-cycle FP add latency when the big core had 4.
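To illustrate why one cycle of latency matters even when throughput is good, here is a toy model, not a simulator. It compares a chain of dependent FP adds (for example a naive sum with a single accumulator, where each add needs the previous result) against fully independent adds on a pipelined unit. The 3 and 4 cycle latencies are the figures from the post; the issue rate of 2 adds per cycle and both function names are assumptions made up for the example.

```python
def chain_cycles(n_ops: int, latency: int) -> int:
    """Cycles for n_ops dependent ops: each one waits for the previous result."""
    return n_ops * latency


def independent_cycles(n_ops: int, latency: int, per_cycle: int) -> int:
    """Cycles for n_ops independent ops on a pipelined unit.

    ASSUMPTION: the unit accepts `per_cycle` new ops every cycle and the
    last op still needs `latency` cycles to complete.
    """
    issue_cycles = -(-n_ops // per_cycle)  # ceiling division
    return issue_cycles + latency - 1


N = 1000
for latency in (3, 4):
    dependent = chain_cycles(N, latency)
    independent = independent_cycles(N, latency, per_cycle=2)
    print(f"latency {latency}: dependent={dependent} cycles, "
          f"independent={independent} cycles")
```

In this model the independent case barely changes (502 vs 503 cycles), while the dependent chain goes from 3000 to 4000 cycles, a 33% increase. Real code sits somewhere between the two extremes.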

Since everything in floating point world is executed on what Intel calls "vector" units, even if we are talking about simple, not vectorized floating point variables (float x; double y) - they are loaded in 128bit XMM registers and instructions like ADDSS / ADDSD and MULSS / MULSD are executed.
So looking at resources "small" core has - it can in fact match Skylake in throughput and beat it in latency for those small ops while also having additional FP/VEC port for ALU operations. So it already starts the game with more execution resources than Skylake and is more similar to Sunny Cove, than Skylake.
And the funny thing is, since we are talking about separate execution ports for FP/VEC, it means that additional four integer ALU ports are free to do operations, unlike on Skylake/Sunny Cove where PORT0 / PORT1 are overcrowded with hardware and once busy with FP/VEC, they are not available. For example Skylake/SNC will have just one Shift ALU available for various operations, while Atom has 4 to choose from; while just one integer multiplier unit is available and zero divisors, Atom can choose from 2 ports to do these ops.

I think the only real bottleneck with so many ports is gonna be 5-wide allocation to feed so many ports, if they had 6-wide allocation like Skylake, they would be matching Sunny Cove instead. Next generation of Atom is gonna be exciting, even if current one is good for marketing Cinebench numbers only.

Mopetar · Aug 21, 2021

Hougy said:
I used to think MLID was a fraud, but apparently he got a few things right recently. Maybe he was a case of fake it till you make it and he got a lot better at his job

He's in the rumor reporting business which means he'll publish anything. Even if he gets something he himself thinks is ridiculous or unlikely he doesn't get any money from a video he doesn't make.

Stick around long enough publishing the junk rumors and eventually a few good leaks are bound to fall into your lap.

mikk · Aug 21, 2021

AtenRa said:
According to anand, Ivybridge has a 38% increase in perf/watt over Sandybridge.
Since Ivy had minor mArch tweaks, then the vast majority of its increased perf/watt is coming from the fabrication (22nm tri-gate).

Intel Core i7-3770K Review: A Small Step Up For Ivy Bridge

One of Intel's worst-kept secrets ever, Ivy Bridge is an evolutionary die shrink of Sandy Bridge with improved integrated graphics. The flagship Core i7-3770K is great if you're replacing an old PC. But it's a tough sell if you already own a Core i7 CPU.

www.tomshardware.com

Core i7 3770K review with Z77 (Page 12)

We review the Core i7 3770K Ivy bridge processors alongside Intel's Z77 motherboard. Will Ivy Bridge be the processor series everything you expected? Go find out in this extensive review here at Guru3D.

www.guru3d.com

Power consumption, energy efficiency - Intel Core i7-3770K and i5-3570K: Ivy Bridge 22nm tested - HardWare.fr

Helped by a new 22nm process, Intel is launching new Ivy Bridge Core i5 and i7 processors, which have the difficult task of succeeding the Sandy Bridge chips that have topped the bill for more than a year. A successful bet?

www.hardware.fr

Intel Core i7-3770K CPU Review | bit-tech.net

Intel's range of Ivybridge CPUs has arrived; we take a look at the top-end Core i7 3770K.

bit-tech.net

Intel Core i7-3770K Ivy Bridge Processor Review - Page 13

HotHardware takes a detailed look at the new Intel Core i7-3770K desktop processor, featuring Intel's 22nm Ivy Bridge Core. - Page 13

hothardware.com

Intel Core i7-3770K (22nm Ivy Bridge)

Say hello to the 3rd Generation Core Processor Family.

hexus.net

It's not even close in these.

CakeMonster · Aug 21, 2021

Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for regular user with the average workload to use those very last threads on AL or especially RL.

Abwx · Aug 21, 2021

mikk said:
Power consumption, energy efficiency - Intel Core i7-3770K and i5-3570K: Ivy Bridge 22nm tested - HardWare.fr

Helped by a new 22nm process, Intel is launching new Ivy Bridge Core i5 and i7 processors, which have the difficult task of succeeding the Sandy Bridge chips that have topped the bill for more than a year. A successful bet?

www.hardware.fr

It's not even close in these.

If you look at the CPU-only power (on the 12V rail), then the 2600K consumes 42% more and has lower throughput, which, if accounted for, results in more than a 50% perf/watt advantage for the 3770K despite a slightly higher frequency; at isofrequency the improvement would be closer to 60%.
Not quite 2x, but so far both TSMC and GF have announced roughly 2x the perf/watt for the same process transition, so Intel should get similar results.
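The arithmetic behind a claim like this can be sketched as follows. Perf/watt gain is the product of the power ratio (old over new) and the throughput ratio (new over old). The 1.42 power ratio is the figure claimed in this post; the thread gives no throughput number, so the 1.06 below is purely a hypothetical placeholder, as is the function name.

```python
def perf_per_watt_gain(power_old_over_new: float,
                       perf_new_over_old: float) -> float:
    """Fractional perf/watt advantage of the new chip over the old one."""
    return power_old_over_new * perf_new_over_old - 1.0


POWER_RATIO = 1.42       # 2600K drawing 42% more, as claimed in the post
HYPOTHETICAL_PERF = 1.06  # ASSUMPTION: placeholder, not a figure from the thread

# Equal throughput: the gain is just the power difference.
print(f"equal throughput: {perf_per_watt_gain(POWER_RATIO, 1.0):.1%}")

# With a hypothetical throughput lead for the newer chip.
print(f"with throughput lead: "
      f"{perf_per_watt_gain(POWER_RATIO, HYPOTHETICAL_PERF):.1%}")

# Throughput lead needed to reach a 50% perf/watt advantage.
needed = 1.5 / POWER_RATIO
print(f"throughput lead needed for +50%: {needed - 1.0:.1%}")
```

With a 1.42 power ratio, a throughput lead of roughly 5.6% is enough to cross the 50% mark, so the conclusion depends almost entirely on whether the 42% power figure holds.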

Besides, contrary to what you say, the SKL vs GRMT comparison was not done on real silicon; on top of that, they use a system-dependent benchmark rather than the IPC-oriented version.

Efficient-core

Internal Estimates as of June 22, 2021 using internal architecture simulation.

Workload: SPECrate2017_int_base estimates with GCC 8.1.0 -O2 binaries

Architecture Day 2021 - 1 | Performance Index

edc.intel.com

jpiniero · Aug 21, 2021

JoeRambo said:
I think most of the Cinebench MT prowess instead comes from:

I was talking more about the single thread R20 test where the rumor was that it hits 800+ which is 30% faster than Rocket Lake at 5.3 GHz.

John Carmack · Aug 21, 2021

As much as I enjoy the dmens vs world debates (featuring CapFrameX), aren't we past the point of taking marketing slides at face value?

Haven't you people been burned enough by those misleading Sunny Cove/Ice Lake/Rocket Lake/Tiger Lake slides of the past year?

I'll believe the performance uplift when I see it in print on this web site.

Saylick · Aug 21, 2021

CakeMonster said:
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for regular user with the average workload to use those very last threads on AL or especially RL.

I doubt it. Modern x86 cores are pretty dang wide, so it really helps having an additional thread to increase utilization. Heck, if MLID is to be believed, SMT4 might be on the table for Lunar Lake.

mikk · Aug 21, 2021

Abwx said:
If you look at the CPU-only power (on the 12V rail), then the 2600K consumes 42% more and has lower throughput, which, if accounted for, results in more than a 50% perf/watt advantage for the 3770K despite a slightly higher frequency; at isofrequency the improvement would be closer to 6

3770K= 64.8W
2600k= 73.2W

13% more not 42%.
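The 13% figure follows directly from the two wattages quoted above. A quick check, with the helper name being just an illustration:

```python
def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


W_3770K = 64.8  # watts, as quoted in the post
W_2600K = 73.2  # watts, as quoted in the post

# The 2600K relative to the 3770K.
print(f"2600K draws {percent_more(W_2600K, W_3770K):.1f}% more")

# The same gap expressed from the other side.
print(f"3770K draws {-percent_more(W_3770K, W_2600K):.1f}% less")
```

Note that the two percentages differ (about 13.0% more versus about 11.5% less) because they use different baselines, which is worth keeping in mind when comparing figures from different reviews.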

eek2121 · Aug 21, 2021

jpiniero said:
I was talking more about the single thread R20 test where the rumor was that it hits 800+ which is 30% faster than Rocket Lake at 5.3 GHz.

Does not surprise me at all.

Hulk · Aug 21, 2021

CakeMonster said:
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for regular user with the average workload to use those very last threads on AL or especially RL.

I'm not doubting the validity of this comment but I don't understand it? It is my understanding that HT basically utilizes CPU resources that would otherwise not be utilized. Regardless of how low down the priority list the threads assigned to HT are, wouldn't the additional compute being assigned to these threads increase overall performance? What am I missing here?

Mopetar · Aug 21, 2021

It really depends on the design being used. HT was originally introduced because Intel had a long pipeline in P4 and it was easy for it to sit around doing nothing for many of those stages. Hyper-threading allowed those resources to be utilized better when the situations that would have otherwise left them idle arose.

The Gracemont cores may not have a design with the kind of slack that exists in other x86 cores, so adding in the additional hardware to enable SMT may not add enough performance to be worth the extra transistor cost.

eek2121 · Aug 21, 2021

Mopetar said:
It really depends on the design being used. HT was originally introduced because Intel had a long pipeline in P4 and it was easy for it to sit around doing nothing for many of those stages. Hyper-threading allowed those resources to be utilized better when the situations that would have otherwise left them idle arose.

The Gracemont cores may not have a design with the kind of slack that exists in other x86 cores, so adding in the additional hardware to enable SMT may not add enough performance to be worth the extra transistor cost.

This is likely the correct answer. Atom is designed to be small and energy efficient. It will be interesting to see the size, number of transistors, and power consumption numbers compared to Tiger Lake or Ice Lake.

EDIT: it is a shame we never saw a 10nm skylake port.

DrMrLordX · Aug 21, 2021

John Carmack said:
aren't we past the point of taking marketing slides at face value?

Yes. Let's see Alder Lake in QS or launch form on a standardized benchmark suite.

gdansk · Aug 21, 2021

Goldmont is such an unusual design. But I think it has some benefit being without SMT. Tasks that aren't very low latency and care more about security could avoid some SMT side channel attacks if they can mark themselves as preferring it.

eek2121 · Aug 22, 2021

DrMrLordX said:
Yes. Let's see Alder Lake in QS or launch form on a standardized benchmark suite.

Agreed. Intel is almost hiding some workloads. The use of spec_rate, or ANY synthetic workload, is a huge red flag for me. You'll notice that AMD does NOT do such things at launch. Lovers can argue about CPU X vs. CPU Y all they want, but I start to wonder when the benchmarks are hidden in fine print. P.S. anyone at Intel reading this, feel free to grow a pair (of CPU clusters?) any time now. That being said, I suspect we'll find the 8 golden cove cores ahead of Zen 3 by 10-15% on average and the 8 Gracemont cores should provide another 25%-35% uplift. The higher boosts will be benchmarks that benefit cache and/or AVX and the lower boosts will not. Overall, I stand by my predictions: close to 10% faster than the 5950x in the majority of workloads, with a few being above and a few being below.

bigboxes · Aug 22, 2021

Intel. LOL

DrMrLordX · Aug 22, 2021

eek2121 said:
Agreed.

And yet . . .

Overall, I stand by my predictions: close to 10% faster than the 5950x in the majority of workloads, with a few being above and a few being below.

. . . you're still making predictions. Against a CPU that won't even be Alder Lake's main competition.

scannall · Aug 22, 2021

Attention passengers, the Hype Train is arriving at platform 6. Please have your tickets ready.

coercitiv · Aug 22, 2021

Performance train has already left the station. Efficiency train is still boarding.

Measuring devices are strictly prohibited on board and will be confiscated on sight!

biostud · Aug 22, 2021

Intel Core i9-12900K 16 Core Alder Lake CPU Benchmarked on ASUS ROG STRIX Z690-E Gaming WIFI Motherboard, Faster Than Core i9-11900K

Intel Alder Lake Core i9-12900K 16 Core Desktop CPU has been benchmarked on ASUS's ROG STRIX Z690-E Gaming WIFI motherboard.

wccftech.com

A/// · Aug 22, 2021

Yes, spotted. Until a retail sample gets passed around like a candy tray and we see actual benches, I don't care enough to speculate. If it's faster than a 5900X but uses 150-200 watts more, then it's DOA to me. While you don't have to worry about power consumption on a desktop, there is a limit to how silly you can get with power draw. I'm not trying to outbattle my dang central air.
