Question Intel Mont thread


ajsdkflsdjfio

Member
Nov 20, 2024
185
132
76
You, in fact, cannot. No one knows any actual perf/power/area/Cac/whatever difference between i7 and N3B. Just a futile effort.
That isn't my point here. In the first place, no one knows the actual perf/power/area differences between even i7 and i4 despite having Intel's official numbers, since they depend on many factors that aren't dictated by the node. Power/perf increases vary by chip design, even when the same chip is used to compare two nodes. A GPU designed on two different nodes may show different gains than a CPU designed on the same two nodes.

Either way, what I am pointing out here is that LNC gains something like 10% IPC over GLC, while Skymont gains somewhere in the mid-30% range over Gracemont. BOTH WITH THE SAME NODE SHRINK AND SIZE INCREASES. You can extrapolate from this that Skymont has a much larger generational increase than other parts, ISO node/area.
 

Panino Manino

Golden Member
Jan 28, 2017
1,002
1,243
136
Returned the Gracemont Mini-PC because it was actually a scam, with the store trying to push a Celeron N5100 instead. Already got something else for my father, but it got me thinking...

We are all satisfied by old computers. I think the most important feature is the decode capabilities of the iGPU. My brother also needs a computer, and maybe an N5100 is enough for him. Even I am 90% satisfied by a mobile Skylake dual core.

How bad is the lack of AVX on Jasper Lake/Tremont for browsing?
Does it hurt usability too much, even compared with a slower Skylake?
 

DavidC1

Golden Member
Dec 29, 2023
1,438
2,335
96
How bad is the lack of AVX on Jasper Lake/Tremont for browsing?
Does it hurt usability too much, even compared with a slower Skylake?
Vector instructions account for only a fraction of overall performance, so the capability of the uarch is more important. And support for even the first AVX is not at 100%.

Gracemont supporting AVX did not improve overall performance in a noticeable way beyond the uarch changes. It was Skymont, which doubled the unit count, that got an extra 30% FP gain on top of the 30% uarch gain.
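
To put a rough number on that argument, here is a minimal Python sketch of Amdahl's law. The function name `overall_speedup` and the example fractions are illustrative assumptions, not measurements of Gracemont, Skymont, or any real workload.

```python
def overall_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only a fraction of the
    run time (the vectorized part) gets faster."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# Hypothetical workloads: the fractions below are made up for illustration.
for fraction in (0.05, 0.20, 0.80):
    gain = overall_speedup(fraction, 2.0)
    print(f"{fraction:.0%} of time in vector code, 2x faster vector units "
          f"-> {gain:.3f}x overall")
```

The smaller the share of time spent in vector code, the less a faster vector unit shows up in the overall result, which is the point being made about browsing-type workloads.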
 

511

Golden Member
Jul 12, 2024
1,740
1,602
106
Vector instructions account for only a fraction of overall performance, so the capability of the uarch is more important. And support for even the first AVX is not at 100%.

Gracemont supporting AVX did not improve overall performance in a noticeable way beyond the uarch changes. It was Skymont, which doubled the unit count, that got an extra 30% FP gain on top of the 30% uarch gain.
Skymont raw FP gains were more than 30%.

Now that Arctic Wolf is getting AVX-512, what would the difference between the P and E cores be? For once, I think E will match or overtake P in IPC with Nova Lake.
 

DavidC1

Golden Member
Dec 29, 2023
1,438
2,335
96
Skymont raw FP gains were more than 30%.
Yes, because aiming for high scalar Integer performance benefits everything: Integer, FP, AI, cryptography.

It won't be 30% Integer and 0% FP.

On the other hand, if you aim for high FP performance, you could end up with very little gain on Integer.
Now that Arctic Wolf is getting AVX-512, what would the difference between the P and E cores be? For once, I think E will match or overtake P in IPC with Nova Lake.
It won't matter in most applications, because the amount of code supporting AVX2 is small, never mind AVX-512.
 

DavidC1

Golden Member
Dec 29, 2023
1,438
2,335
96
My point is that general availability of HW drives adoption: if AVX-512 is available, there will be more software leveraging it.
You said: "Now Arctic Wolf is getting AVX-512 now what would be the difference between P and E core cause for once I think E will match or overtake P in IPC with Nova Lake."

What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP, or benefiting from it, is small.

Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you watch any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources just trying to get new features supported.

Like Ray Tracing for games: you end up with an OK-looking game that has shallow gameplay, is unoptimized, and is only a little improvement over a well-polished one without the fancy new checkboxes.

Rather than doing a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?

Now that they've got 512-bit with AVX-512, they should stop expanding... forever.
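
A toy throughput model may make the units-versus-width argument concrete. It assumes each instruction occupies one whole unit no matter how narrow it is, and that an instruction wider than a unit is split across several unit slots. The `VectorBackend` class and both configurations are hypothetical. Real cores are also limited by issue ports, load/store bandwidth, and scheduling, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorBackend:
    """Hypothetical SIMD back end: `units` execution units, each `unit_bits` wide."""
    units: int
    unit_bits: int

    def ops_per_cycle(self, op_bits: int) -> float:
        """Instructions of width `op_bits` completed per cycle (toy model)."""
        if op_bits <= self.unit_bits:
            # A narrower instruction still ties up a whole unit.
            return float(self.units)
        # A wider instruction is split into several unit-sized pieces.
        slots = -(-op_bits // self.unit_bits)  # ceiling division
        return self.units / slots

    def bits_per_cycle(self, op_bits: int) -> float:
        """Useful vector bits processed per cycle for code of width `op_bits`."""
        return self.ops_per_cycle(op_bits) * op_bits


wide = VectorBackend(units=2, unit_bits=512)  # "2x512-bit"
many = VectorBackend(units=4, unit_bits=256)  # "4x256-bit"

for op_bits in (128, 256, 512):
    print(f"{op_bits}-bit code: 2x512 -> {wide.bits_per_cycle(op_bits):.0f} bits/cycle, "
          f"4x256 -> {many.bits_per_cycle(op_bits):.0f} bits/cycle")
```

Under these assumptions the 4x256 layout doubles throughput on existing 128-bit and 256-bit code and ties with 2x512 on 512-bit code, which is the case for more units over wider units.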
 

GTracing

Senior member
Aug 6, 2021
442
1,041
106
What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP, or benefiting from it, is small.
Sunny Cove supports AVX-512, doesn't it?
 

511

Golden Member
Jul 12, 2024
1,740
1,602
106
You said: "Now Arctic Wolf is getting AVX-512 now what would be the difference between P and E core cause for once I think E will match or overtake P in IPC with Nova Lake."

What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP, or benefiting from it, is small.

Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you watch any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources just trying to get new features supported.

Like Ray Tracing for games: you end up with an OK-looking game that has shallow gameplay, is unoptimized, and is only a little improvement over a well-polished one without the fancy new checkboxes.
You need the HW to have the features before software can take advantage of them, and AVX-512 is not new like AMX is. There is already plenty of AVX-512-accelerated software, such as crypto and JSON parsing, and both are used heavily every day.

Rather than doing a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?

Now that they've got 512-bit with AVX-512, they should stop expanding... forever.
For Arctic Wolf I don't know the number of FP units, but I agree they should go with 4x256-bit, since they can double-pump for AVX-512.
 

Nothingness

Diamond Member
Jul 3, 2013
3,277
2,329
136
What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP, or benefiting from it, is small.
SIMD code in general is not FP-only.

Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you watch any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources just trying to get new features supported.
Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that, thanks to segmentation, Intel made sure that AVX2 and, even worse, AVX-512 were not available on all their CPUs, which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.

Rather than doing a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?
On this I agree: if you want to run existing code faster, you'd better expand the number of units rather than making them wider; the latter would only bring benefit by recompiling/rewriting existing code (and then you're back to the previous issue: you'd need different sources/binaries to be able to run on existing older CPUs).
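
The support burden described here can be sketched as a runtime dispatcher: one code path per extension, each of which has to be written and validated. This is a minimal Python illustration, not how any particular library does it. The flag names follow the Linux `/proc/cpuinfo` spelling, while `pick_kernel` and the three example flag sets are hypothetical.

```python
from typing import Iterable

# Preference order, widest extension first. Every entry is a separate
# code path that has to be maintained and tested.
KERNEL_PREFERENCE = ("avx512f", "avx2", "sse2")


def pick_kernel(cpu_flags: Iterable[str]) -> str:
    """Return the name of the widest kernel the CPU can run, else "scalar"."""
    available = set(cpu_flags)
    for extension in KERNEL_PREFERENCE:
        if extension in available:
            return extension
    return "scalar"


# Hypothetical feature sets standing in for three CPU segments.
examples = {
    "no AVX": {"sse2"},
    "AVX2 only": {"sse2", "avx2"},
    "AVX-512": {"sse2", "avx2", "avx512f"},
}

for label, flags in examples.items():
    print(f"{label}: dispatches to the {pick_kernel(flags)} kernel")

print(f"code paths to validate: {len(KERNEL_PREFERENCE) + 1}")
```

Adding more units leaves this table unchanged and speeds up every existing path. Adding a wider extension adds another row to maintain.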
 

511

Golden Member
Jul 12, 2024
1,740
1,602
106
It's called "Darkmont"
  • The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)
  • The instruction decode length limit per cluster has been raised from 20B -> 24B
  • IPC increase expected to be minor, around 5%
That's it so far, most of this is from Raichu on X
Considering that the difference in IPC between Lion Cove P-cores and Skymont E-cores is about 7% according to Intel, if the gain is 5-8% again, then it is matching a previous-gen P-core with a much better power curve.
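
The arithmetic behind that comparison can be sketched in a few lines of Python. The 7% lead and the 5-8% gain are the figures quoted in this post and treated here as inputs, not verified numbers. The function name `relative_ipc` is an assumption.

```python
def relative_ipc(p_core_lead: float, e_core_gain: float) -> float:
    """Next-gen E-core IPC divided by current P-core IPC.

    Both arguments are fractions (0.07 means 7%), measured against the
    current E-core as the baseline of 1.0.
    """
    return (1.0 + e_core_gain) / (1.0 + p_core_lead)


P_CORE_LEAD = 0.07  # P-core IPC lead over the current E-core, as quoted above

for gain in (0.05, 0.08):
    ratio = relative_ipc(P_CORE_LEAD, gain)
    print(f"E-core gain of {gain:.0%} -> {ratio:.3f}x the P-core's IPC")
```

A gain at the low end of the range leaves the new E-core slightly behind the older P-core, and a gain at the high end puts it slightly ahead.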
 

DavidC1

Golden Member
Dec 29, 2023
1,438
2,335
96
There is a bottleneck elsewhere, I guess, if 2 taken branches still leads to only an expected ~5% IPC increase. What do you think it could be? Any chance we may finally see HT in E-core?
That IS the bottleneck, because a performance increase doesn't come free. The investment in area and transistors for that is probably 5% at the most, so getting a gain much greater than that would actually be an outlier, not the norm.

Note in comparison how many things they needed to beef up to get Lion Cove just a 10% gain.

The fact that the Mont team has been increasing performance nearly linearly with the amount of extra transistors is impressive in itself, because just increasing the size of everything gets you into what's called an Inverse Square Law, where the performance gain is approximately the square root of the transistors/power/area invested.
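
The square-root relationship described here is usually cited as Pollack's Rule. A small Python sketch shows how it compares with linear scaling. The function name `sqrt_rule_gain` and the resource ratios are illustrative assumptions, not data for any Intel core.

```python
import math


def sqrt_rule_gain(resource_ratio: float) -> float:
    """Performance ratio predicted by the square-root rule for a given
    ratio of transistors/area invested (1.05 means 5% more)."""
    if resource_ratio <= 0.0:
        raise ValueError("resource_ratio must be positive")
    return math.sqrt(resource_ratio)


for ratio in (1.05, 1.5, 2.0, 4.0):
    predicted = sqrt_rule_gain(ratio)
    print(f"{ratio:.2f}x resources -> {predicted:.3f}x perf by the square-root rule, "
          f"{ratio:.2f}x if scaling were linear")
```

By this rule a 5% increase in transistors would be expected to return only about 2.5% more performance, so a roughly 5% gain for a 5% investment is ahead of the usual curve.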

Two taken branches per cycle likely helps the clustered decode work better. Considering that HT increases the complexity and validation time of each and every core, I can't see them adding it anytime soon. A one-month-equivalent delay per year would add up to being a full generation behind in about 15 years.

And I'd argue based on the core size, it doesn't need HT to perform like one.
 
Reactions: 511

DavidC1

Golden Member
Dec 29, 2023
1,438
2,335
96
SIMD code in general is not FP-only.
Yes, but most of the gains come from having the necessary instructions. An article from HPCWire promoting AVX512 said it allows vectorizing instructions that weren't previously vectorizable. So do that, and keep it at 256-bit, not 512.
Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that, thanks to segmentation, Intel made sure that AVX2 and, even worse, AVX-512 were not available on all their CPUs, which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.
My whole point is that AVX512 was created because of Intel's obsession with keeping GPUs out of the lucrative HPC market by beefing up the FP capability of general-purpose CPUs, so that developers, being lazy, would not move away from them.

But at some point, there should have been a realization that there's too much vector unit in what is otherwise a general-purpose CPU. And if they really wanted double the performance, rather than inconveniencing everyone, they could have upped the number of FP units from 2x to 4x. That would have substantially increased performance in all applications, for everyone who bought the new CPU.

And I know it was also partly because Intel wanted to lock AMD out by forcing a new instruction set, because they were a greedy monopoly.

The counterargument that SIMD is more than FP is satisfied by a maximum width of 256 bits. I'd argue further that it should have stayed at a maximum of 128 bits: rather than moving to 256-bit, they should have doubled the number of 128-bit units. Same instructions as "AVX-512", without the 512-bit part.

Value CPU: 2x 128-bit
Mainstream: 4x 128-bit
Server: 6-8x 128-bit

The fact that this is what the ARM CPU vendors are doing, who actually care about area/power efficiency, unlike Intel/AMD, tells me it's the right approach, because per-generation perf gains from GAA/RibbonFET transistors are going to crash.
 
Reactions: Nothingness

adroc_thurston

Diamond Member
Jul 2, 2023
5,365
7,547
96
My whole point is that AVX512 was created because of Intel's obsession with keeping GPUs out of the lucrative HPC market by beefing up the FP capability of general-purpose CPUs, so that developers, being lazy, would not move away from them.
It was created because AVX(2) were rudimentary ISAs.
Value CPU: 2x 128-bit
Mainstream: 4x 128-bit
Server: 6-8x 128-bit
Just don't.
 
Reactions: MS_AT and 511