Discussion Intel current and future Lakes & Rapids thread

HurleyBird · Mar 21, 2021

SAAA said:
Not sure how schematic it is vs reality, but the fact they added 8 Gracemont cores rather than 4 Golden cores (guess estimating) for that area hints the contrary. They can't be that large or it would be pointless.

I don't think that the Gracemont cores are that large, but even if they were I wouldn't necessarily agree with your assertion. Whether or not they would be "worthless" versus adding additional GC cores would come down to the perf/watt difference. Besides lower power consumption for small tasks, power density eventually becomes a limiting factor also.

DrMrLordX · Mar 22, 2021

@cortexa99

Looks more like 12-19% there. 16.8% overall.

Hulk · Mar 22, 2021

11700K is currently $400 at MC.

dullard · Mar 22, 2021

Hulk said:
11700K is currently $400 at MC.

Same price at Newegg and you don't have to go into the store to get it.

www.newegg.com

Sure beats the original pre-release sale prices that we were seeing such as 480 Euros (~$572) at Mindfactory.

Mopetar · Mar 22, 2021

Well yeah, if you're the only source for something then you can charge more for it.

dullard · Mar 22, 2021

Dell now has Rocket Lake up on their website. April 19 delivery date for most models (a week or so earlier for the Alienware Aurora R12). At least for now, the price is a lot higher for their typical i5 model. Instead of the $500 range for an i5, they are starting at $750. But that will change when they update the Inspiron model.

Dell XPS Desktop with up to 11th Gen Intel Processor

Create without limits using the all-new XPS Desktop. Featuring powerful performance, a minimalist design and a highly expandable chassis.

www.dell.com

dullard · Mar 22, 2021

Dell needs to fix their Rocket Lake text:

Dell XPS Desktop with up to 11th Gen Intel Processor

Create without limits using the all-new XPS Desktop. Featuring powerful performance, a minimalist design and a highly expandable chassis.

www.dell.com

Unleash the power to create: With Intel®’s latest 11th gen, up to 10 core, 20 threads i9 processor you can keep at your most demanding workloads for even longer.

IntelUser2000 · Mar 22, 2021

mikk said:
They have a real small core architecture which is a much better fit, they don't need Skylake.

Someone gets it.

Gracemont is a continuation of the dual-cluster decode introduced with Tremont.

The -mont line substantially diverges from -core.
-The small core decoders directly use most of the x86 instructions rather than changing them to internal instructions - continuation of what the original Bonnell Atom did.
-Sunny Cove increases the L1 data cache, while Gracemont will increase the L1 instruction cache. The L1I increase likely helps with the doubled decode.
-Starting with Tremont, it uses dual-cluster decode, which according to the chief architect saves space compared to a uop cache.
-Starting from Goldmont it also has a predecode cache. A massive 64KB on Goldmont Plus.

I expect Gracemont cores to be substantially bigger than Tremont, but they'll still be barely over 1mm2. The Tremont cores are like 0.7mm2. Sunny Cove core is 5-6x the size of Tremont, not 3-4x.

It may perform like Skylake but it'll use a fraction of the space and much lower power.

It's not the addition of AVX-512 that makes the Core cores bloated. It's the ridiculous focus on clock speeds that's the issue. Xeon Phi shows the AVX-512 units can be far, far smaller.*

*Of course there might be the factor that the Core design itself might be bloated and inefficient in general.

dmens · Mar 22, 2021

IntelUser2000 said:
The small core decoders directly use most of the x86 instructions rather than changing them to internal instructions - continuation of what the original Bonnell Atom did.

Nope, every Atom has its own internal uop mapping just like the big cores.

It may perform like Skylake

Fat chance, it will not be even remotely close.

It's not the addition of AVX-512 that makes the Core cores bloated. It's the ridiculous focus on clock speeds that's the issue.

The main source of bloat is the area resources required to implement depth of execution speculation. If the CPU pipeline is designed appropriately for the target clock frequency there is no reason for an extraordinary area cost. There is no free lunch, you are not going to get performance without spending area... therefore all this talk of big core level performance from atoms is pure fantasy.

IntelUser2000 · Mar 22, 2021

By the way Rocketlake is trash. Good thing they got Tigerlake out rather than Rocketlake-U. It would have been a laughingstock of the industry.

Relaxing the L3 and other latencies is the cost of trying to backport a 10nm design to an older, less efficient 14nm process.

yuri69 said:
ADL topping out at a maximum 20% ST gain? The average IPC gain is thus possibly significantly lower. I expected the 20% figure to be an average IPC gain.

I don't see what's the big deal here. Did anyone really expect higher clocks? They only reach 5.3GHz by taking away any overclocking headroom. If they back down clocks a bit that's a good thing since it'll get power and thermals under control.

Also the new microcode shows 18.5% in SpecInt and 21% in SpecFP.

Intel Core i7-11700K Review: Blasting Off with Rocket Lake

www.anandtech.com

MLiD actually claimed 2x MT with Alderlake mobile, not desktop. He personally expects Alderlake 8+8 to be like a 12-13 core Rocketlake part.

Gracemont is roughly 1/3rd of Golden Cove considering:
-2/3rd clocks
-2/3rd perf/clock
-20% loss without HT
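
The three factors above multiply out as follows. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and expressing an 8+8 part in big-core units assumes perfect multithreaded scaling, which the post does not claim.

```python
# Back-of-the-envelope check of the "roughly 1/3rd" estimate above.
# All inputs are the rough figures from the post, not measurements.

def gracemont_vs_golden_cove(clock_ratio=2/3, ipc_ratio=2/3, no_ht_factor=0.8):
    """Relative MT throughput of one Gracemont core vs one Golden Cove core."""
    return clock_ratio * ipc_ratio * no_ht_factor

def big_core_equivalents(big=8, small=8):
    """Hypothetical helper: express a big+small config in Golden Cove units,
    assuming perfect multithreaded scaling (an assumption, not a given)."""
    return big + small * gracemont_vs_golden_cove()

ratio = gracemont_vs_golden_cove()
print(f"Gracemont / Golden Cove: {ratio:.3f}")                    # 0.356
print(f"8+8 in big-core units:   {big_core_equivalents():.1f}")   # 10.8
```

Under those assumptions an 8+8 part lands a little under 11 big-core equivalents.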

IntelUser2000 · Mar 22, 2021

dmens said:
Nope, every Atom has its own internal uop mapping just like the big cores.

Intel's Atom Architecture: The Journey Begins

www.anandtech.com

Fat chance, it will not be even remotely close.

You want a bet or something? Tremont is already at Ivy Bridge levels. Another 30% gets us to Skylake.

Do you know what makes that possible? Because ARM cores can do it.

Nope. Let's not even get that far. Their own "little" core team is owning them.

There is no free lunch, you are not going to get performance without spending area... therefore all this talk of big core level performance from atoms is pure fantasy.

Oh right, cause there's no such thing as optimization. That's why since Zen they beat Intel to a pulp in perf/mm2. And that's why M1 almost beats it in overall performance with again, a fraction of the size.

If it was just about the ISA, why does the GPU in M1 perform so fantastic in perf/mm2?

Because there are other factors such as: motivation, execution, optimism about the future and the project you are working on, which are all human factors not "ISA".

Also speaking of competition, I believe in the Zen 4 rumors being 30% faster per clock. With Genoa offering 50% more cores, their performance target must be 2x over Milan. They'll need it against Ampere Altra.
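
For what it's worth, the rumored figures above do multiply out to roughly 2x. A quick sketch, assuming equal clocks and linear scaling with core count (neither is stated in the rumor):

```python
# Rough check of the "2x over Milan" target from the rumored figures above.
core_gain = 1.50       # Genoa: 50% more cores than Milan (from the post)
per_clock_gain = 1.30  # rumored Zen 4 per-clock uplift (from the post)

# Assumes equal clocks and linear scaling with core count.
throughput_gain = core_gain * per_clock_gain
print(f"Estimated MT gain over Milan: {throughput_gain:.2f}x")  # 1.95x
```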

dmens · Mar 22, 2021

IntelUser2000 said:
Intel's Atom Architecture: The Journey Begins

www.anandtech.com

LOL, that is micro-op fusion. It is when two uops are fused coming out of the fast decoder for rename and dispatch. It has absolutely nothing to do with executing x86 instructions directly since there is still an internal uop mapping, and the instruction still has to go through the fast decoder just like everything else. By the way, the big cores did that exact same thing, before the first atom was even a design concept. See this quote: "With the Pentium M Intel began fusing certain micro-ops." Merom (Core 2 Duo) derived directly from Pentium M Yonah and carried the concept over.

You want a bet or something? Tremont is already at Ivy Bridge levels. Another 30% gets us to Skylake.

Sure, any time. I love how people just throw around double-digit gains like nothing. Here, Ivy Bridge is still 30% faster than Tremont in GB5:

Intel Corporation Elkhart Lake Embedded Platform - Geekbench

Benchmark results for an Intel Corporation Elkhart Lake Embedded Platform with an Intel Atom x6413E processor.

browser.geekbench.com

Intel Core i7-3770K Benchmarks - Geekbench

browser.geekbench.com

They won't spend the area/power to increase small core perf because it will blow through the budgets.

Do you know what makes that possible? Because ARM cores can do it.

Well yes, ARM has several crucial advantages over x86 after all. Just because ARM can do it, does not mean Atom can.

Oh right, cause there's no such thing as optimization. That's why since Zen they beat Intel to a pulp in perf/mm2. And that's why M1 almost beats it in overall performance with again, a fraction of the size.

What are you talking about? Firestorm cores are huge. Zen 3 CPU cores are 5mm2 apiece. We are not talking about massive size differentials here. You are right that optimization can make a huge difference. But it won't turn a 1mm2 core into a 5mm2 core.

If it was just about the ISA, why does the GPU in M1 perform so fantastic in perf/mm2?

Because there are other factors such as: motivation, execution, optimism about the future and the project you are working at, which are all human factors not "ISA".

Sorry but no amount of intangibles will turn an atom size core into a fat core in performance. Not gonna happen. The M1 GPU is also huge by the way. It just also happens to be a really well designed, highly optimized GPU with extremely tight-knit API support. Software support is more important to GPU than CPU so I don't know why you would drag GPU into this, but whatever.

RTX2080 · Mar 23, 2021

The title by Gamersnexus is a bit ruthless:

In real-world pure CPU workloads like Blender and Adobe Premiere, the 11700K has well below a ~10% advantage over the 10700K, which looks worse than theoretical tests like Cinebench/CPU-Z, where it could be ~15-20%.

coercitiv · Mar 23, 2021

cortexa99 said:
The title by Gamersnexus is a bit ruthless:

At least they're consistent:

jpiniero · Mar 23, 2021

dmens said:
The main source of bloat is the area resources required to implement depth of execution speculation. If the CPU pipeline is designed appropriately for the target clock frequency there is no reason for an extraordinary area cost. There is no free lunch, you are not going to get performance without spending area... therefore all this talk of big core level performance from atoms is pure fantasy.

Talking about IPC more than actual pure performance. Obviously Gracemont isn't going to clock to 5 GHz. Surely there's some amount of density gain they could realize even if it lowers max frequency.

JoeRambo · Mar 23, 2021

IntelUser2000 said:
-The small core decoders directly use most of the x86 instructions rather than changing them to internal instructions - continuation of what the original Bonnell Atom did.
-Sunny Cove increases the L1 data cache, while Gracemont will increase the L1 instruction cache. The L1I increase likely helps with the doubled decode.
-Starting with Tremont, it uses dual-cluster decode, which according to the chief architect saves space compared to a uop cache.
-Starting from Goldmont it also has a predecode cache. A massive 64KB on Goldmont Plus.

Increasing L1i is a must to improve decode rate. Same with the predecode L2 cache, a must for such an architecture.
A uop cache (as done by Sandy+/Zen, not P4), while complex, saves energy by not having to decode the same instructions over and over again. At some point those savings, plus not needing those massive decoder supporting structures, win out.

I feel like a big part of Core bloat comes from the sizing of various buffers: ROB, int PRF, FP PRF, uop cache, branch prediction unit buffers, multi-level I/D TLBs, TLBs that support various page sizes, store and load queues.
Atoms used to cut corners in all those structures and make great use of the diminishing returns of their sizing. For example, large pages in the TLB? Only added in Goldmont? 3 decoders? Also a recent addition.

Combine these cuts with obvious cuts to execution resources, vector size support, cache sizes, ports and path widths => tiny die area is the end result.

IntelUser2000 said:
I expect Gracemont cores to be substantially bigger than Tremont, but they'll still be barely over 1mm2. The Tremont cores are like 0.7mm2. Sunny Cove core is 5-6x the size of Tremont, not 3-4x.

Atom is more like a "dial the transistor budget to hit the performance targets they need" product. It's not like it was a secret to them in 2014 that an iTLB with large page support would help performance, or that designing such a TLB was beyond their capabilities. It was a conscious decision to forgo better performance for die size savings.

What I don't share with you is the optimism about performance targets. Honestly, that hybrid 1+4 CPU was a disaster, and that is already an understatement. And it used those Tremont cores that have already dialed structure sizes up a lot on 10nm. But look at the disastrous performance compared to other mobile CPUs in, for example, Cinebench R15 or R20.

So Intel's Alder Lake 2x MT performance is very likely a pipe dream on 10nm, probably based on a comparison with some hilariously wattage-constrained mobile CPU.

The desktop reality will be ~11 Golden Coves in Cinebench and waaaaaaay behind AMD 16-core CPUs. And since it will struggle in the poster children of linear scaling, MT loads that barely touch memory, the best way to use it will be disabling those Atom clusters and not having to deal with scheduling problems.

ondma · Mar 23, 2021

JoeRambo said:
Increasing L1i is a must to improve decode rate. Same with the predecode L2 cache, a must for such an architecture.
A uop cache (as done by Sandy+/Zen, not P4), while complex, saves energy by not having to decode the same instructions over and over again. At some point those savings, plus not needing those massive decoder supporting structures, win out.

I feel like a big part of Core bloat comes from the sizing of various buffers: ROB, int PRF, FP PRF, uop cache, branch prediction unit buffers, multi-level I/D TLBs, TLBs that support various page sizes, store and load queues.
Atoms used to cut corners in all those structures and make great use of the diminishing returns of their sizing. For example, large pages in the TLB? Only added in Goldmont? 3 decoders? Also a recent addition.

Combine these cuts with obvious cuts to execution resources, vector size support, cache sizes, ports and path widths => tiny die area is the end result.

Atom is more like a "dial the transistor budget to hit the performance targets they need" product. It's not like it was a secret to them in 2014 that an iTLB with large page support would help performance, or that designing such a TLB was beyond their capabilities. It was a conscious decision to forgo better performance for die size savings.

What I don't share with you is the optimism about performance targets. Honestly, that hybrid 1+4 CPU was a disaster, and that is already an understatement. And it used those Tremont cores that have already dialed structure sizes up a lot on 10nm. But look at the disastrous performance compared to other mobile CPUs in, for example, Cinebench R15 or R20.

So Intel's Alder Lake 2x MT performance is very likely a pipe dream on 10nm, probably based on a comparison with some hilariously wattage-constrained mobile CPU.

The desktop reality will be ~11 Golden Coves in Cinebench and waaaaaaay behind AMD 16-core CPUs. And since it will struggle in the poster children of linear scaling, MT loads that barely touch memory, the best way to use it will be disabling those Atom clusters and not having to deal with scheduling problems.

Obviously, 8+8 will not compete with 16 real Zen cores. I think the real target is 12-core Zen performance. Hopefully, Intel will price AL to somewhat reflect this.

mikk · Mar 23, 2021

https://twitter.com/x/status/1374467303845158921

I guess we will see Meteor Lake for mobile first, he talked about SoC.

jpiniero · Mar 23, 2021

Ian's article says Meteor Lake is using Foveros. Article also says they are talking about fabbing products (including CPU tiles) at external foundries as well as 7 nm.

Plus they are going to try the foundry system again. Good luck with that.

mikk · Mar 23, 2021

Alder Lake for desktop followed by mobile, as expected. They have shipped 40 million Tiger Lake chips so far.

IntelUser2000 · Mar 23, 2021

JoeRambo said:
What I don't share with you is the optimism about performance targets. Honestly, that hybrid 1+4 CPU was a disaster, and that is already an understatement. And it used those Tremont cores that have already dialed structure sizes up a lot on 10nm. But look at the disastrous performance compared to other mobile CPUs in, for example, Cinebench R15 or R20.

It's Lakefield that's a disaster. Even the Sunny Cove core performance sucks on that chip. 110 on Cinebench R15 ST best case, when Icelake can get 180.

The 6W N6000 gets the same ST scores as the Sunny Cove cores in Lakefield. And it gets almost 800 points in MT, which compares well to the 900 points some low-end Icelake i5 laptops get: https://www.notebookcheck.net/Acer-...Also-a-good-subnotebook-with-i5.468200.0.html

That's a 50%+ gain over Goldmont Plus based N5000 which corresponds to the 30% perf/clock + clock speed gains.

dmens said:
Sure, any time. I love how people just throw around double-digit gains like nothing. Here, Ivy Bridge is still 30% faster than Tremont in GB5:

Ivy Bridge also clocks 44% higher in that comparison. In Geekbench it's actually at Haswell levels.
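
To make the clock normalization explicit, here is a minimal sketch using only the two percentages quoted in this exchange. No actual Geekbench scores or clock speeds are assumed, and performance is treated as scaling linearly with clock.

```python
# Normalize the Geekbench 5 gap by clock speed, using only the two
# percentages quoted in this exchange (no actual scores are assumed).
ivb_score_lead = 1.30   # Ivy Bridge is "30% faster" overall
ivb_clock_lead = 1.44   # Ivy Bridge "clocks 44% higher"

# Per-clock performance of Tremont relative to Ivy Bridge,
# assuming performance scales linearly with clock.
tremont_per_clock = ivb_clock_lead / ivb_score_lead
print(f"Tremont per-clock vs Ivy Bridge: {tremont_per_clock:.2f}x")  # 1.11x
```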

What are you talking about? Firestorm cores are huge. Zen 3 CPU cores are 5mm2 apiece. We are not talking about massive size differentials here. You are right that optimization can make a huge difference. But it won't turn a 1mm2 core into a 5mm2 core.

Zen 3 is only 3.2mm2. Sunny Cove without L2 and FIVR is 4.4mm2, not to mention it's a worse performing core.

We know on the GPU side ARM chips have a far better perf/mm2. Icelake level GPU performance is achieved at less than half the size with the ARM chips.

And it used those Tremont cores that have already dialed structure sizes up on 10nm a lot.

If you call 0.7-0.8mm2 a lot sure.
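
Putting the area figures quoted in this post side by side, as a rough sketch. It assumes the numbers are comparable, although the posts do not say they were measured on the same basis:

```python
# Core-area ratios from the figures quoted in this post (mm^2).
sunny_cove = 4.4                  # without L2 and FIVR
zen3 = 3.2
tremont_lo, tremont_hi = 0.7, 0.8

sc_vs_zen3 = sunny_cove / zen3
sc_vs_tremont_min = sunny_cove / tremont_hi
sc_vs_tremont_max = sunny_cove / tremont_lo

print(f"Sunny Cove / Zen 3:   {sc_vs_zen3:.1f}x")   # 1.4x
print(f"Sunny Cove / Tremont: {sc_vs_tremont_min:.1f}x to {sc_vs_tremont_max:.1f}x")  # 5.5x to 6.3x
```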

jpiniero said:
Talking about IPC more than actual pure performance. Obviously Gracemont isn't going to clock to 5 GHz. Surely there's some amount of density gain they could realize even if it lowers max frequency.

Exactly. Even AMD, a company that seriously struggled, does far better. Compared to Zen cores, Intel chips need 50% or more area to achieve similar performance.

dullard · Mar 23, 2021

JoeRambo said:
What I don't share with you is the optimism about performance targets. Honestly, that hybrid 1+4 CPU was a disaster, and that is already an understatement. And it used those Tremont cores that have already dialed structure sizes up a lot on 10nm. But look at the disastrous performance compared to other mobile CPUs in, for example, Cinebench R15 or R20.

Lakefield had problems, yes. But complaining about rendering performance on a 7 W device is probably the worst possible argument you could make against Lakefield. Honestly who buys a laptop with low performance and long battery life with the intention of fast image rendering? That would be like a high-end restaurant buying their ingredients from McDonald's down the street, doing poorly, then complaining that therefore the McDonald's food could not possibly be successful.

gdansk · Mar 23, 2021

jpiniero said:
Plus they are going to try the foundry system again. Good luck with that.

It will be a success based on how much effort they put into it. Intel's 14nm would be a massive improvement for many designs. The majority of TSMC revenue comes from 16nm and older processes. I think Intel's soon-to-be-spare 14nm could compete well in that space if they help potential customers get their designs working on the process.

RTX · Mar 23, 2021

2.8GHz 80W W-1390
1.5GHz 35W W-1390T
2.9GHz 80W W-1370
3.3GHz 80W W-1350
4.0GHz 125W W-1350P

ASRock Z490 Taichi

Supports 10th Gen Intel Core™ Processors and 11th Gen Intel Core™ Processors (LGA1200)*; 15 Phase Dr.MOS Power Design; Supports DDR4 4666MHz+ (OC); 3 PCIe 3.0 x16, 2 PCIe 3.0 x1; Supports NVIDIA SLI™, AMD 3-Way CrossFireX™; Graphics Output Options: HDMI...

www.asrock.com

jpiniero · Mar 23, 2021

Might be some typos. The site suggests that Rocket Lake W will work on Z490 which isn't the case for Comet Lake W.
