Question Intel Mont thread

igor_kavinski · Feb 20, 2023

https://store.acer.com/en-us/aspire-3-laptop-a315-510p-3905

First Gracemont laptop available for sale.

Anybody got disposable $500 to buy and test this laptop?

ajsdkflsdjfio · Dec 23, 2024

adroc_thurston said:
You, in fact, cannot. No one knows any actual perf/power/area/Cac/whatever difference between i7 and N3B. Just a futile effort.

That isn't my point here. In the first place, no one knows the actual perf/power/area differences between even i7 and i4, despite having Intel's official numbers, since they depend on many factors that aren't dictated by the node. Power/perf gains vary by chip design, even when the same chip is used to compare different nodes. A GPU designed on two different nodes may show different gains than a CPU designed on the same two nodes.

Either way, what I am pointing out here is that LNC gains about 10% IPC over GLC, while Skymont gains somewhere in the mid-30% range over Gracemont. BOTH WITH THE SAME NODE SHRINK AND SIZE INCREASES. You can extrapolate from this that Skymont has a much larger generational increase than other parts at iso node/area.

DZero · Dec 23, 2024

Meteor Late said:
AVX512 is niche for consumers, but so are 16-core CPUs, it's just a Cinebench benchmarking contest about "which CPU has more raw power".

Funny thing I see with Windows 11 being more thread-hungry... I wouldn't be surprised if eight threads turn out not to be enough...

cannedlake240 · Dec 23, 2024

adroc_thurston said:
is this why CWF-AP and future Atom xeons are dead?

CWF-AP? Thought it was SP that was canned. SRF-AP is the one rumored to be cancelled

Panino Manino · Mar 17, 2025

What about this MiniPC Alder Lake N95 as a substitute for a Haswell 4400 fried by power failure?

Good enough for my old man to browse, right?
Any catches?

Beelink | Beelink MINI S12 Intel® Alder Lake-N N95

www.bee-link.com

igor_kavinski · Mar 17, 2025

Panino Manino said:
Good enough for my old man to browse, right?
Any catches?

Depends on number of tabs. If never more than a dozen tabs, should be ok.

Better to increase the RAM to 16GB.

Panino Manino · Thursday at 6:41 AM

Returned the Gracemont Mini-PC because it was actually a scam with the store trying to push a Celeron N5100. Already got something else for my father, but got me thinking...

We are all satisfied by old computers. I think the most important feature is the decode capabilities of the iGPU. My brother also needs a computer; maybe an N5100 is enough for him, or even for me, since I am 90% satisfied by a mobile Skylake dual core.

How bad is the lack of AVX on Jasper Lake/Tremont for browsing?
Does it hurt usability too much, even compared with a slower Skylake?

DavidC1 · Thursday at 6:53 AM

Panino Manino said:
How bad is the lack of AVX on Jasper Lake/Tremont for browsing?
Does it hurt usability too much, even compared with a slower Skylake?

Vector instructions account for only a fraction of overall performance, so the capability of the uarch matters more. And software support for even the first AVX is not 100%.

Gracemont supporting AVX did not improve overall performance in a noticeable way beyond the uarch changes. It's Skymont, which doubled the unit count, that got an extra 30% FP gain on top of the 30% uarch gain.
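
A quick sketch of how the two 30% figures combine, assuming they multiply rather than add (the post does not say which, and both inputs are the rough numbers quoted above, not measurements). The function name `compound` is purely illustrative.

```python
def compound(*gains: float) -> float:
    """Combine fractional gains multiplicatively, e.g. 0.30 means +30%."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Assumed inputs: ~30% from the uarch, ~30% more from doubling the FP units.
uarch_gain = 0.30
fp_unit_gain = 0.30

multiplicative = compound(uarch_gain, fp_unit_gain)
additive = uarch_gain + fp_unit_gain

print(f"FP gain if the two effects multiply: {multiplicative:.0%}")
print(f"FP gain if the two effects simply add: {additive:.0%}")
```

Either way the total FP uplift lands well above the integer-only figure, which is the point being made.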

511 · Thursday at 8:13 AM

DavidC1 said:
The portion of the vector instructions for overall performance is a fraction, thus the capability of the uarch is more important. And the support for even first AVX is not 100%.

Gracemont supporting AVX did not improve overall performance in a noticeable way beyond uarch changes. It's Skymont, that doubled the unit count that got extra 30% FP gain on top of the 30% uarch gain.

Skymont raw FP gains were more than 30%.

Now that Arctic Wolf is getting AVX-512, what would be the difference between the P and E cores? For once, I think E will match or overtake P in IPC with Nova Lake.

DavidC1 · Thursday at 3:20 PM

511 said:
Skymont raw FP gains were more than 30%.

Yes, because aiming for high scalar Integer performance benefits everything: Integer, FP, AI, cryptography.

It won't be 30% Integer and 0% FP.

On the other hand, if you aim for high FP performance, you could end up with very little gain on Integer.

511 said:
Now Arctic Wolf is getting AVX-512 now what would be the difference between P and E core cause for once I think E will match or overtake P in IPC with Nova Lake.

It won't matter in most applications, because the amount of code supporting AVX2 is small, never mind AVX-512.

511 · Thursday at 9:31 PM

DavidC1 said:
It won't matter in most applications, because code supporting AVX2 is small, nevermind AVX-512.

My point is that general availability of HW drives adoption: if AVX-512 is available, there will be more software leveraging it.

DavidC1 · Friday at 7:00 AM

511 said:
My point is general availability of hw drive adoption if AVX-512 is available there will be more software leveraging this.

You said: "Now Arctic Wolf is getting AVX-512 now what would be the difference between P and E core cause for once I think E will match or overtake P in IPC with Nova Lake."

What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP, or benefitting from it, is small.

Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you watch any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources trying to get new features supported.

It's like ray tracing in games: you end up with an OK-looking game that has shallow gameplay, is unoptimized, and is only a little improvement over a well-polished one without the fancy new checkboxes.

Rather than a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?

Now that they have 512-bit with AVX-512, they should stop expanding... forever.

GTracing · Friday at 7:58 AM

DavidC1 said:
What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting it is small.

Sunny Cove supports AVX-512, doesn't it?

511 · Friday at 7:59 AM

DavidC1 said:
You said: "Now Arctic Wolf is getting AVX-512 now what would be the difference between P and E core cause for once I think E will match or overtake P in IPC with Nova Lake."

What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting it is small.

Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you look at any videos or read anything about modern software development, they are heavily burdened in terms of time and financial resources trying to get new features supported.

Like Ray Tracing for games, and you end up having a ok looking game but shallow gameplay, unoptimized, and only a little improvement over a well polished one without fancy new checkboxes.

You need HW to have the features before software can take advantage of them, and AVX-512 is not new the way AMX is. There is already plenty of AVX-512-accelerated code, crypto and JSON parsing for example, both of which are used heavily in everyday use.

DavidC1 said:
Rather than doing 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?

Now they got 512-bit with AVX-512, they should stop expanding... forever.

For Arctic Wolf I don't know the number of FP units, but I agree they should go with 4x256-bit; they can double-pump for AVX-512.
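
A toy model of the double-pumping idea, assuming ideal peak throughput, no port or scheduling limits, and that a 512-bit op on a 256-bit unit costs exactly two passes. The unit counts are the hypothetical ones from this discussion, not known Arctic Wolf specs, and `peak_ops_per_cycle` is an illustrative name.

```python
from math import ceil


def peak_ops_per_cycle(units: int, unit_bits: int, op_bits: int) -> float:
    """Peak vector ops issued per cycle in this simplified model.

    An op wider than the unit is split into several passes (double-pumping
    when op_bits is twice unit_bits); a narrower op still occupies one unit.
    """
    passes = ceil(op_bits / unit_bits)
    return units / passes


designs = [("4x256-bit, double-pumped", 4, 256), ("2x512-bit, native", 2, 512)]
for label, units, unit_bits in designs:
    for op_bits in (256, 512):
        rate = peak_ops_per_cycle(units, unit_bits, op_bits)
        print(f"{label}: {rate:.0f} x {op_bits}-bit ops/cycle")
```

In this model both layouts sustain the same number of 512-bit ops per cycle, while the 4x256-bit layout issues twice as many 256-bit ops.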

511 · Friday at 7:59 AM

GTracing said:
Sunny Cove supports AVX-512, doesn't it?

Yes, so does Willow Cove.

Nothingness · Friday at 8:36 AM

DavidC1 said:
What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting it is small.

SIMD code in general is not FP-only.

DavidC1 said:
Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you look at any videos or read anything about modern software development, they are heavily burdened in terms of time and financial resources trying to get new features supported.

Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that, thanks to segmentation, Intel made sure that AVX2 and, even worse, AVX-512 were not available on all their CPUs, which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.
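
To make the validation burden concrete, here is a minimal sketch of runtime dispatch on CPU feature flags. It assumes Linux-style `/proc/cpuinfo` text, where `sse2`, `avx2` and `avx512f` are flag names Linux reports; the sample flag lines are abbreviated and hypothetical, and the kernel names are placeholders rather than any real library's API.

```python
def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Widest extension first. Every tier is a separate code path that someone
# has to write, test and validate.
TIERS = [
    ("avx512f", "avx512 kernel"),
    ("avx2", "avx2 kernel"),
    ("sse2", "sse2 kernel"),
]


def pick_kernel(flags: set[str]) -> str:
    """Pick the widest kernel the CPU supports, falling back to scalar."""
    for flag, kernel in TIERS:
        if flag in flags:
            return kernel
    return "scalar kernel"


# Hypothetical, abbreviated flag lines (not real dumps).
no_avx_cpu = "flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2"
avx2_cpu = "flags\t\t: fpu sse sse2 avx avx2 fma"
avx512_cpu = "flags\t\t: fpu sse sse2 avx avx2 avx512f avx512vl"

for text in (no_avx_cpu, avx2_cpu, avx512_cpu):
    print(pick_kernel(parse_flags(text)))
```

Three tiers plus a scalar fallback already means four variants of every hot loop, which is the resource problem described above.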

DavidC1 said:
Rather than doing 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?

On this I agree: if you want to run existing code faster, you'd better expand the number of units rather than making them wider; the latter would only bring benefit by recompiling/rewriting existing code (and then you're back to the previous issue: you'd need different sources/binaries to be able to run on existing older CPUs).
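
A toy illustration of that trade-off, assuming an old binary made of 128-bit vector ops, one op retired per unit per cycle, and perfect re-vectorization when recompiling. The op count and the three designs are hypothetical, and the helper names are illustrative only.

```python
from math import ceil

LEGACY_OPS = 1200   # hypothetical number of 128-bit vector ops in an old binary
LEGACY_BITS = 128


def cycles(ops: int, units: int) -> int:
    """Ideal cycle count when each unit retires one vector op per cycle."""
    return ceil(ops / units)


def recompiled_ops(ops: int, old_bits: int, new_bits: int) -> int:
    """Op count after perfectly re-vectorizing to a wider vector length."""
    return ceil(ops * old_bits / new_bits)


designs = {
    "2x256 (baseline)": (2, 256),
    "2x512 (wider units)": (2, 512),
    "4x256 (more units)": (4, 256),
}

for name, (units, width) in designs.items():
    old_binary = cycles(LEGACY_OPS, units)
    new_binary = cycles(recompiled_ops(LEGACY_OPS, LEGACY_BITS, width), units)
    print(f"{name}: old binary {old_binary} cycles, recompiled {new_binary} cycles")
```

In this model the old binary only speeds up on the design with more units, while the wider design needs a recompile to show any gain.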

DZero · Friday at 10:14 AM

No info about the next small Intel core?

Cardyak · Friday at 11:30 AM

DZero said:
No info about the next small Intel core?

It's called "Darkmont"

The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)
The instruction decode length limit per cluster has been raised from 20B -> 24B
IPC increase expected to be minor, around 5%

That's it so far, most of this is from Raichu on X

igor_kavinski · Friday at 12:19 PM

Cardyak said:
The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)

I guess there is a bottleneck elsewhere if 2 taken branches per cycle still leads to only an expected ~5% IPC increase. What do you think it could be? Any chance we may finally see HT in an E-core?

511 · Friday at 1:07 PM

Cardyak said:
It's called "Darkmont"

The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)

The instruction decode length limit per cluster has been raised from 20B -> 24B

IPC increase expected to be minor, around 5%

That's it so far, most of this is from Raichu on X

Considering that the difference in IPC between Lion Cove P and Skymont E cores is about 7% according to Intel, if it's 5-8% again then it is matching a previous-gen P core with a much better power curve.
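
A rough sketch of that arithmetic, assuming "about 7%" means Lion Cove IPC is 1.07x Skymont's and that the rumored 5-8% gain applies on top of Skymont. The helper name is made up for illustration, and the inputs are the figures quoted in this thread, not measurements.

```python
def remaining_gap(p_core_lead: float, e_core_gain: float) -> float:
    """P-core IPC lead left after the E-core gains e_core_gain.

    Positive means the older P-core is still ahead; negative means the new
    E-core has overtaken it.
    """
    return (1.0 + p_core_lead) / (1.0 + e_core_gain) - 1.0


lion_cove_lead = 0.07  # assumed: Lion Cove IPC = 1.07 x Skymont IPC

for gain in (0.05, 0.08):
    gap = remaining_gap(lion_cove_lead, gain)
    print(f"E-core gain {gain:.0%}: Lion Cove lead becomes {gap:+.1%}")
```

At the low end of the rumored range the new E-core stays slightly behind Lion Cove, and at the high end it edges slightly ahead.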

DavidC1 · 2025-03-31T19:17:53-0400

igor_kavinski said:
There is a bottleneck elsewhere I guess if 2 taken branches is still leading to only expected ~5% IPC increase. What do you think it could be? Any chance we may finally see HT in E-core?

That IS the bottleneck, because performance increases don't come free. The investment in area and transistors for that is probably 5% at the most, so getting a gain much greater than that would actually be an outlier, not the norm.

Note in comparison how many things they needed to beef up to get Lion Cove just a 10% gain.

The fact that the Mont team has been increasing performance nearly linearly with the amount of extra transistors is impressive in itself, because just increasing the size of everything runs into a square-root law (usually called Pollack's Rule), where the performance gain is approximately the square root of the transistors/power/area invested.
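
The square-root relationship described above (commonly known as Pollack's Rule) can be sketched as follows. This is a rule-of-thumb model only; the function name and the sample budgets are illustrative assumptions.

```python
from math import sqrt


def pollack_gain(extra_resources: float) -> float:
    """Expected fractional performance gain under the square-root rule.

    extra_resources is the fractional increase in transistors/area,
    e.g. 1.0 means doubling the budget.
    """
    return sqrt(1.0 + extra_resources) - 1.0


for extra in (0.05, 0.50, 1.00):
    print(
        f"+{extra:.0%} transistors: square-root rule {pollack_gain(extra):.1%}, "
        f"linear scaling {extra:.0%}"
    )
```

Under this rule a 5% larger core would be expected to gain only about half of that in performance, so a ~5% IPC gain for a ~5% investment is already ahead of the curve.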

Two taken branches per cycle likely helps the clustered decode work better. Considering that HT increases the complexity and validation time of each and every core, I can't see them adding it anytime soon. A one-month-equivalent delay per year would add up to being a full generation behind in about 15 years.

And I'd argue based on the core size, it doesn't need HT to perform like one.

DavidC1 · 2025-03-31T19:33:40-0400

Nothingness said:
SIMD code in general is not FP-only.

Yes, but most of the gains come from having the necessary instructions. For example, an article from HPCwire promoting AVX512 said it allows vectorizing code that wasn't previously vectorizable. So do that, and keep it at 256-bit, not 512.

Nothingness said:
Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that thanks to segmentation, Intel made sure that AVX2 and even worse AVX-512 was not available for all their CPUs which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.

My whole point is that AVX512 was created because of Intel's obsession with keeping GPUs out of the lucrative HPC market by beefing up the FP capability of general-purpose CPUs, so that developers, being lazy, would not move away from them.

But at some point, there should have been a realization that there's too much vector unit in what is otherwise a general-purpose CPU. And if they really wanted double the performance, rather than inconveniencing everyone, they could have upped the number of FP units from 2x to 4x, which would have substantially increased performance in all applications, for everyone that bought the new CPU.

And I know it was also partly because Intel wanted to lock AMD out by forcing a new instruction set, because they were a greedy monopoly.

The counterargument that SIMD is more than FP is satisfied by a maximum width of 256 bits. I'd argue further that it should have stayed at a maximum of 128 bits, and that rather than moving to 256-bit, they should have doubled the number of 128-bit units. Same instructions as "AVX-512" without the 512-bit part.

Value CPU: 2x 128-bit
Mainstream: 4x 128-bit
Server: 6-8x 128-bit

The fact that this is what the ARM CPU vendors are doing, who actually care about area/power efficiency unlike Intel/AMD, tells me it's the right approach, because per-generation perf gains from GAA/RibbonFET transistors are going to crash.

adroc_thurston · 2025-03-31T22:07:09-0400

DavidC1 said:
My whole point is that AVX512 was created because of Intel's obsession of keeping GPU out of the lucrative HPC market by beefing up general purpose CPUs FP capability so the developers being lazy would not move from it.

It was created because AVX(2) were rudimentary ISAs.

DavidC1 said:
Value CPU: 2x 128-bit
Mainstream: 4x 128-bit
Server: 6-8x 128-bit

Just don't.
