Discussion Intel current and future Lakes & Rapids thread

Page 502

mikk

Diamond Member
May 15, 2012
4,168
2,205
136
Well it looks very similar to me......I mean the marketing.


Skylake and Gracemont were compared on real hardware; the 22nm slide is an example of Abwx being too optimistic about a simple shrink. SB and IVB are a good example: IVB only got minor architecture tweaks over SB.
 
Reactions: Zucker2k

SAAA

Senior member
May 14, 2014
541
126
116
Seems like the big question here is the "golden" optimal ratio of Golden Cove to Gracemont cores? Which will vary from user-to-user by workload(s). If Gracemont really is nearly Skylake at 1/4 the die space I think I'd rather have 4 Coves and 24 Gracemont cores. But of course we won't know until we get our sweaty hands on these things.
I think the point is that they are going to reach and even surpass that ratio long term, with Raptor Lake and Meteor Lake going 8/16 and 8/32, but for the time being there is no regression in "big" core count from the 8 that Skylake and Rocket Lake had. (Ignore Comet Lake in this argument, as they went up to 10 cores out of desperation; if they could have pushed out 8 Ice Lake cores sooner there would have been no 10-core part at all.)
That way they can keep increasing IPC and transistor budget on the big cores without increasing their number and wasting die area on parallel performance that many small cores can achieve better.
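
A rough sketch of the area arithmetic behind this ratio argument, assuming the "1/4 the die space" figure from the quote above is taken at face value. The area constants and the `core_area` helper are hypothetical and only count core area (no cache, ring, or uncore); the 8+16 and 8+32 mixes are the rumored configurations mentioned above.

```python
# Hypothetical area model: one big core = 1.0 unit, one small core = 0.25 units.
# The 0.25 comes from the "1/4 the die space" claim, not from a measurement.
BIG_CORE_AREA = 1.0
SMALL_CORE_AREA = 0.25


def core_area(big: int, small: int) -> float:
    """Total core area in big-core equivalents for a given big/small mix."""
    return big * BIG_CORE_AREA + small * SMALL_CORE_AREA


configs = {"8+8": (8, 8), "4+24": (4, 24), "8+16": (8, 16), "8+32": (8, 32)}
for name, (big, small) in configs.items():
    print(f"{name}: {core_area(big, small):.1f} big-core equivalents")
```

Under that assumption, 4 big + 24 small cores cost the same core area as 8 + 8, so the 4/24 wish is an area-neutral trade of big cores for small ones rather than a larger die.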
 
Reactions: mikk

AtenRa

Lifer
Feb 2, 2009
14,002
3,357
136
Skylake and Gracemont were compared on real hardware; the 22nm slide is an example of Abwx being too optimistic about a simple shrink. SB and IVB are a good example: IVB only got minor architecture tweaks over SB.

According to anand, Ivybridge has a 38% increase in perf/watt over Sandybridge.
Since Ivy had minor mArch tweaks, then the vast majority of its increased perf/watt is coming from the fabrication (22nm tri-gate).




 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
re: R20, Golden Cove did increase the vector units to 3 from 2. Between that and the DDR5 bandwidth increase you'd think R20 would be way faster. Don't ask about power, especially at 5 Ghz.

Sunny Cove already had 3 vector ALUs, it was able to pump 3x256 ALU type instructions per clock. I think most of the Cinebench MT prowess instead comes from:

1) Big cores can do 3x256 loads per cycle and can execute a wide mix of ALU and vector ALU instructions per clock. As the Zen3 investigation found, on ZEN3 the IPC of Cinebench R23 is 1.41 and it is 23% bound by backend resources, so more execution capabilities go a long way toward enhancing performance. And I suspect Intel is adding a fast FADD unit because it helps in Cinebench-style workloads.
2) Small cores are actually beastly and offer the same 3 vector ALUs, even if one is limited to ALU operations only ( think something like VPANDX but not VADDXX or VMULXX ). And all that is backed by 2 load + 2 store per cycle. Question is how wide it is, but remember ZEN3 can only do 2x256 load + 1x256 store per cycle.

Intel has returned to doing the sane things and is turning its back on the FMA crowd ( the two of them who are running Linpack and prime95 all day ) that is ruining performance for normal people. Skylake degraded the latency of simple FP add / mul instructions from 3 to 4 cycles, and even if throughput is good, latency still matters. A small Atom core like Tremont in fact had a 3-cycle FP add latency when the big core had 4 cycles.

Since everything in floating point world is executed on what Intel calls "vector" units, even if we are talking about simple, not vectorized floating point variables (float x; double y) - they are loaded in 128bit XMM registers and instructions like ADDSS / ADDSD and MULSS / MULSD are executed.
So looking at resources "small" core has - it can in fact match Skylake in throughput and beat it in latency for those small ops while also having additional FP/VEC port for ALU operations. So it already starts the game with more execution resources than Skylake and is more similar to Sunny Cove, than Skylake.
And the funny thing is, since we are talking about separate execution ports for FP/VEC, it means that the additional four integer ALU ports are free to do operations, unlike on Skylake/Sunny Cove where PORT0 / PORT1 are overcrowded with hardware and, once busy with FP/VEC, are not available. For example, Skylake/SNC will have just one shift ALU available for various operations, while Atom has 4 to choose from; and while just one integer multiplier unit is available and zero divisors, Atom can choose from 2 ports to do these ops.

I think the only real bottleneck with so many ports is gonna be 5-wide allocation to feed so many ports, if they had 6-wide allocation like Skylake, they would be matching Sunny Cove instead. Next generation of Atom is gonna be exciting, even if current one is good for marketing Cinebench numbers only.
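
To illustrate the "latency still matters" point above, here is a deliberately simplified cycle-count sketch. The two helper functions and the 2-adds-per-cycle throughput are assumptions for illustration only; real cores have many more constraints (allocation width, port contention, memory).

```python
def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Each op needs the previous result, so the full latency is paid per op."""
    return n_ops * latency


def independent_cycles(n_ops: int, latency: int, ops_per_cycle: int) -> int:
    """Independent ops pipeline: pay the latency once, then issue at full rate."""
    issue_cycles = -(-n_ops // ops_per_cycle)  # ceiling division
    return latency + issue_cycles - 1


n = 1000  # e.g. a running sum like `acc += x[i]`, which forms one dependent chain
for latency in (3, 4):
    chain = dependent_chain_cycles(n, latency)
    parallel = independent_cycles(n, latency, ops_per_cycle=2)  # assumed throughput
    print(f"latency {latency}: dependent chain {chain} cycles, independent ops {parallel} cycles")
```

In this model the dependent chain gets a third slower going from 3 to 4 cycles, while the independent case barely changes, which is why good throughput numbers alone do not capture the regression.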
 
Last edited:
Reactions: Vattila

Mopetar

Diamond Member
Jan 31, 2011
7,997
6,426
136
I used to think MLID was a fraud, but apparently he got a few things right recently. Maybe he was a case of fake it till you make it and he got a lot better at his job

He's in the rumor reporting business which means he'll publish anything. Even if he gets something he himself thinks is ridiculous or unlikely he doesn't get any money from a video he doesn't make.

Stick around long enough publishing the junk rumors and eventually a few good leaks are bound to fall into your lap.
 

mikk

Diamond Member
May 15, 2012
4,168
2,205
136
According to anand, Ivybridge has a 38% increase in perf/watt over Sandybridge.
Since Ivy had minor mArch tweaks, then the vast majority of its increased perf/watt is coming from the fabrication (22nm tri-gate).




It's not even close in these.
 

CakeMonster

Golden Member
Nov 22, 2012
1,425
530
136
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for a regular user with an average workload to use those very last threads on AL or especially RL.
 
Reactions: scineram

Abwx

Lifer
Apr 2, 2011
11,157
3,857
136


It's not even close in these.

If you look at the CPU-only power (on the 12V rail), then the 2600K consumes 42% more and has lower throughput, which, if accounted for, results in more than a 50% perf/watt advantage for the 3770K despite its slightly higher frequency; at isofrequency the improvement would be closer to 60%.
Not quite 2x, but so far both TSMC and GF have announced roughly 2x the perf/watt for the same process transition, so Intel should get similar results.
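
For reference, the perf/watt arithmetic in this claim can be sketched as below. The function names are hypothetical and the throughput ratio is left as a free input, since no throughput figure is given in the post; only the 42% power figure is taken from the text.

```python
def perf_per_watt_ratio(old_over_new_power: float, new_over_old_throughput: float) -> float:
    """perf/watt of the new chip relative to the old one.

    (T_new / P_new) / (T_old / P_old) = (T_new / T_old) * (P_old / P_new)
    """
    return new_over_old_throughput * old_over_new_power


def required_throughput_ratio(target_ratio: float, old_over_new_power: float) -> float:
    """Throughput advantage needed to reach a target perf/watt ratio."""
    return target_ratio / old_over_new_power


power_ratio = 1.42  # "2600K consumes 42% more", as stated above
print(f"equal throughput: {perf_per_watt_ratio(power_ratio, 1.0):.2f}x perf/watt")
print(f"throughput ratio needed for 1.5x: {required_throughput_ratio(1.5, power_ratio):.3f}")
```

So, if the 42% power figure holds, a throughput lead of a little under 6% for the 3770K would be enough to cross the 50% mark.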

Besides, contrary to what you say, the SKL vs GRMT comparison is not done on real silicon; there is that, and the fact that they use a system-dependent benchmark rather than the IPC-oriented version.

Efficient-core
Internal Estimates as of June 22, 2021 using internal architecture simulation.

Workload: SPECrate2017_int_base estimates with GCC 8.1.0 -O2 binaries

 

John Carmack

Member
Sep 10, 2016
156
248
116
As much as I enjoy the dmens vs world debates (featuring CapFrameX), aren't we past the point of taking marketing slides at face value?

Haven't you people been burned enough by those misleading Sunny Cove/Ice Lake/Rocket Lake/Tiger Lake slides of the past year?

I'll believe the performance uplift when I see it in print on this web site.
 

Saylick

Diamond Member
Sep 10, 2012
3,369
7,096
136
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for a regular user with an average workload to use those very last threads on AL or especially RL.
I doubt it. Modern x86 cores are pretty dang wide, so it really helps having an additional thread to increase utilization. Heck, if MLID is to be believed, SMT4 might be on the table for Lunar Lake.
 

mikk

Diamond Member
May 15, 2012
4,168
2,205
136
If you look at the CPU-only power (on the 12V rail), then the 2600K consumes 42% more and has lower throughput, which, if accounted for, results in more than a 50% perf/watt advantage for the 3770K despite its slightly higher frequency; at isofrequency the improvement would be closer to 6


3770K= 64.8W
2600k= 73.2W

13% more not 42%.
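
A quick check of that percentage, using only the two wattage figures quoted above. The `percent_more` helper is hypothetical, and the two posters may be referring to different measurements, so this only verifies the arithmetic, not which power number is the right one.

```python
def percent_more(a: float, b: float) -> float:
    """How many percent larger a is than b."""
    return (a / b - 1.0) * 100.0


p_3770k = 64.8  # watts, as quoted above
p_2600k = 73.2  # watts, as quoted above

print(f"2600K draws {percent_more(p_2600k, p_3770k):.1f}% more than the 3770K")
# For comparison: the 2600K power that a 42% gap over 64.8 W would imply.
print(f"42% more than {p_3770k} W would be {p_3770k * 1.42:.1f} W")
```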
 
Reactions: Zucker2k

Hulk

Diamond Member
Oct 9, 1999
4,364
2,230
136
Considering the HT threads are prioritized last with Thread Director(?), is HT becoming irrelevant in a generation or two? It would take a lot today for a regular user with an average workload to use those very last threads on AL or especially RL.

I'm not doubting the validity of this comment, but I don't understand it. It is my understanding that HT basically utilizes CPU resources that would otherwise not be utilized. Regardless of how low down the priority list the threads assigned to HT are, wouldn't the additional compute being assigned to these threads increase overall performance? What am I missing here?
 

Mopetar

Diamond Member
Jan 31, 2011
7,997
6,426
136
It really depends on the design being used. HT was originally introduced because Intel had a long pipeline in P4 and it was easy for it to sit around doing nothing for many of those stages. Hyper-threading allowed those resources to be utilized better when the situations that would have otherwise left them idle arose.

The Gracemont cores may not have a design with the kind of slack that exists in other x86 cores, so adding in the additional hardware to enable SMT may not add enough performance to be worth the extra transistor cost.
 

eek2121

Diamond Member
Aug 2, 2005
3,041
4,257
136
It really depends on the design being used. HT was originally introduced because Intel had a long pipeline in P4 and it was easy for it to sit around doing nothing for many of those stages. Hyper-threading allowed those resources to be utilized better when the situations that would have otherwise left them idle arose.

The Gracemont cores may not have a design with the kind of slack that exists in other x86 cores, so adding in the additional hardware to enable SMT may not add enough performance to be worth the extra transistor cost.

This is likely the correct answer. Atom is designed to be small and energy efficient. It will be interesting to see the size, number of transistors, and power consumption numbers compared to Tiger Lake or Ice Lake.

EDIT: it is a shame we never saw a 10nm skylake port.
 

gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
Goldmont is such an unusual design. But I think it has some benefit being without SMT. Tasks that aren't very low latency and care more about security could avoid some SMT side channel attacks if they can mark themselves as preferring it.
 

eek2121

Diamond Member
Aug 2, 2005
3,041
4,257
136
Yes. Let's see Alder Lake in QS or launch form on a standardized benchmark suite.
Agreed. Intel is almost hiding some workloads. The use of spec_rate, or ANY synthetic workload, is a huge red flag for me. You'll notice that AMD does NOT do such things at launch. Lovers can argue about CPU X vs. CPU Y all they want, but I start to wonder when the benchmarks are hidden in fine print. P.S. anyone at Intel reading this, feel free to grow a pair (of CPU clusters?) any time now. That being said, I suspect we'll find the 8 golden cove cores ahead of Zen 3 by 10-15% on average and the 8 Gracemont cores should provide another 25%-35% uplift. The higher boosts will be benchmarks that benefit cache and/or AVX and the lower boosts will not. Overall, I stand by my predictions: close to 10% faster than the 5950x in the majority of workloads, with a few being above and a few being below.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,155
136
Yes, spotted. Until a retail sample gets passed around like a candy tray and we see actual benches, I don't care enough to speculate. If it's faster than a 5900X but uses 150-200 watts more, then it's DOA to me. While you don't have to worry about power consumption on a desktop, there is a limit to how silly you can get with power draw. I'm not trying to outbattle my dang central air.
 