First Gracemont laptop available for sale.
Anybody got disposable $500 to buy and test this laptop?
> You in fact cannot. No one knows any actual perf/power/area/Cac/whatever difference between i7 and N3b. Just a futile effort.
That isn't my point here. In the first place, no one knows the actual perf/power/area differences between even i7 and i4 despite having Intel's official numbers, since they depend on many factors that aren't dictated by the node. Power/perf gains vary by chip design even when the same chip is compared across different nodes. A GPU designed on two different nodes may show different gains than a CPU designed on the same two nodes.
> AVX512 is niche for consumers, but so are 16-core CPUs. It's just a Cinebench benchmarking contest about "which CPU has more raw power".
Funny story: I see Windows 11 being more thread hungry... I wouldn't be surprised if eight threads won't be enough...
> Is this why CWF-AP and future Atom Xeons are dead?
CWF-AP? Thought it was SP that was canned. SRF-AP is the one rumored to be cancelled.
> Good enough for my old man to browse, right?
> Any catches?
Depends on the number of tabs. If never more than a dozen tabs, it should be OK.
> How bad is the lack of AVX on Jasper Lake/Tremont for browsing?
> Does it hurt usability too much, even compared with a slower Skylake?
Vector instructions account for only a fraction of overall performance, so the capability of the uarch is more important. And support for even the first AVX is not 100%.
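To put a rough number on the "only a fraction" argument, here is a minimal sketch using Amdahl's law. The 10% vectorizable share and the 2x vector speedup are made-up inputs for illustration, not measurements of any browser or CPU, and `overall_speedup` is just a name picked for this example.

```python
def overall_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the runtime gets faster."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# Hypothetical workload: 10% of the runtime is vectorizable and
# wider SIMD makes that part twice as fast.
print(f"{overall_speedup(0.10, 2.0):.3f}x")
```

With those assumed inputs the whole program gets roughly 5% faster, which is why a general uarch improvement that speeds up the other 90% tends to matter more.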
> Vector instructions account for only a fraction of overall performance, so the capability of the uarch is more important. And support for even the first AVX is not 100%.
Skymont raw FP gains were more than 30%.
Gracemont supporting AVX did not noticeably improve overall performance beyond the uarch changes. It was Skymont, which doubled the unit count, that got an extra 30% FP gain on top of the 30% uarch gain.
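A quick sketch of what "30% on top of 30%" works out to, assuming the two gains multiply rather than add. That reading is an assumption; if they simply added up the total would be 60%. The helper name `combined_gain` is made up for this example.

```python
def combined_gain(*gains: float) -> float:
    """Compound independent fractional gains, e.g. 0.30 for +30%."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


uarch_gain = 0.30    # general uarch improvement, figure taken from the post above
fp_unit_gain = 0.30  # extra FP gain attributed to the doubled unit count
print(f"{combined_gain(uarch_gain, fp_unit_gain):.0%}")
```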
> Skymont raw FP gains were more than 30%.
Yes, because aiming for high scalar integer performance benefits everything: integer, FP, AI, cryptography.
> Now that Arctic Wolf is getting AVX-512, what would be the difference between P and E cores? For once I think E will match or overtake P in IPC with Nova Lake.
It won't matter in most applications, because the amount of code supporting AVX2 is small, never mind AVX-512.
> It won't matter in most applications, because the amount of code supporting AVX2 is small, never mind AVX-512.
My point is that general availability of hardware drives adoption: if AVX-512 is available, more software will leverage it.
> My point is that general availability of hardware drives adoption: if AVX-512 is available, more software will leverage it.
You said: "Now that Arctic Wolf is getting AVX-512, what would be the difference between P and E cores? For once I think E will match or overtake P in IPC with Nova Lake."
> What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting from it is small.
Sunny Cove supports AVX-512, doesn't it?
> You said: "Now that Arctic Wolf is getting AVX-512, what would be the difference between P and E cores? For once I think E will match or overtake P in IPC with Nova Lake."
> What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting from it is small.
> Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you look at any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources trying to get new features supported.
> Like ray tracing for games: you end up with an OK-looking game but shallow gameplay, unoptimized, and only a little improvement over a well-polished one without fancy new checkboxes.
You need the hardware to have a feature before software can take advantage of it, and AVX-512 is not new the way AMX is. There is already plenty of AVX-512 accelerated code, crypto and JSON parsing for example, and both are used heavily in everyday use.
> Rather than doing a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?
> Now that they've got 512-bit with AVX-512, they should stop expanding... forever.
For Arctic Wolf I don't know the number of FP units, but I agree they should go with 4x 256-bit; they can double-pump for AVX-512.
> Sunny Cove supports AVX-512, doesn't it?
Yes, and so does Willow Cove.
> What is the performance difference between AVX2 Sunny Cove and AVX-512 Skylake? For the vast majority of applications, Sunny Cove is significantly better. That tells us the proportion of code accelerated by FP or benefitting from it is small.
SIMD code in general is not FP-only.
> Also, I don't get the constant focus on wanting wider SIMD units, because it requires recompiling every time. If you look at any videos or read anything about modern software development, developers are heavily burdened in terms of time and financial resources trying to get new features supported.
Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that, thanks to segmentation, Intel made sure that AVX2 and, even worse, AVX-512 were not available on all their CPUs, which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.
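To illustrate the support-and-validate burden, here is a minimal sketch of runtime dispatch over ISA tiers. Everything in it is hypothetical: the kernel names are placeholders, the feature strings are used only as labels, and a real program would query the CPU instead of being handed a set of strings.

```python
from typing import Iterable, List, Tuple

# Widest first. Each entry is a separate code path that has to be built and validated.
# The feature names mirror common x86 flag spellings but are only labels here.
KERNEL_TIERS: List[Tuple[str, str]] = [
    ("avx512f", "kernel_avx512"),
    ("avx2", "kernel_avx2"),
    ("sse2", "kernel_sse2"),
]
FALLBACK = "kernel_scalar"


def pick_kernel(cpu_features: Iterable[str]) -> str:
    """Return the name of the widest kernel whose required feature is present."""
    available = set(cpu_features)
    for feature, kernel in KERNEL_TIERS:
        if feature in available:
            return kernel
    return FALLBACK


def paths_to_validate() -> int:
    """Every tier plus the scalar fallback needs its own test coverage."""
    return len(KERNEL_TIERS) + 1


# Hypothetical feature sets: one part without AVX-512, one with it.
print(pick_kernel({"sse2", "avx2"}))
print(pick_kernel({"sse2", "avx2", "avx512f"}))
print(paths_to_validate())
```

The dispatch itself is trivial; the cost is that every extra tier in the table is another binary path to build, test and debug, which is the resource problem described above.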
> Rather than doing a 2x256-bit to 2x512-bit transition, they should have done a 4x256-bit transition, so it benefits ALL code dating back 40+ years. ARM is already doing this, and Skymont is doing this. What's their latest core now? 6x 128-bit FP units?
On this I agree: if you want to run existing code faster, you'd better expand the number of units rather than make them wider. Wider units only bring a benefit if existing code is recompiled or rewritten, and then you're back to the previous issue: you'd need different sources/binaries to be able to run on existing older CPUs.
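A toy model of the more-units versus wider-units trade-off, as a sketch only. It assumes an instruction wider than a unit is split into unit-wide pieces (double pumping) and that a narrower instruction leaves the rest of the unit idle. It ignores front-end width, load/store bandwidth, port sharing and clocks, so the numbers are peak figures and not predictions for any real core.

```python
import math


def lanes_per_cycle(units: int, unit_bits: int, code_bits: int, elem_bits: int = 32) -> float:
    """Peak elements processed per cycle in a toy execution model.

    An instruction wider than a unit is split into several unit-wide pieces;
    an instruction narrower than a unit leaves the remaining width unused.
    """
    pieces = max(1, math.ceil(code_bits / unit_bits))
    instructions_per_cycle = units / pieces
    return instructions_per_cycle * (code_bits / elem_bits)


# Compare two hypothetical designs on 128-bit, 256-bit and 512-bit code.
for label, units, unit_bits in [("2x512", 2, 512), ("4x256", 4, 256)]:
    for code_bits in (128, 256, 512):
        print(label, code_bits, lanes_per_cycle(units, unit_bits, code_bits))
```

Under these assumptions both designs peak at the same rate on 512-bit code, while the 4x256 design is twice as fast on existing 128-bit and 256-bit code that nobody recompiles.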
> No info about the next small Intel core?
It's called "Darkmont"
> - The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)
There is a bottleneck elsewhere, I guess, if 2 taken branches per cycle still leads to only an expected ~5% IPC increase. What do you think it could be? Any chance we may finally see HT in E-cores?
> It's called "Darkmont"
> That's it so far, most of this is from Raichu on X
> - The branch predictor can handle 2 taken branches per cycle (similar to Zen 5)
> - The instruction decode length limit per cluster has been raised from 20B -> 24B
> - IPC increase expected to be minor, around 5%
Considering that the IPC difference between Lion Cove P-cores and Skymont E-cores is about 7% according to Intel, if the gain is 5-8% again then it matches a previous-gen P-core with a much better power curve.
> There is a bottleneck elsewhere, I guess, if 2 taken branches per cycle still leads to only an expected ~5% IPC increase. What do you think it could be? Any chance we may finally see HT in E-cores?
That IS the bottleneck, because a performance increase doesn't come free. The investment in area and transistors for that is probably 5% at the most, so getting a gain much greater than that would actually be an outlier, not the norm.
> SIMD code in general is not FP-only.
Yes, but most of the gains come from having the necessary instructions. An article from HPCWire promoting AVX512 said it allows vectorizing instructions that weren't previously vectorizable. So do that, and keep it at 256-bit, not 512.
> Modern compilers can vectorize code and benefit from wider units with little extra work. The problem is that, thanks to segmentation, Intel made sure that AVX2 and, even worse, AVX-512 were not available on all their CPUs, which leads to the very true last part of what you wrote: SW developers don't have the resources to support and validate multiple extensions.
My whole point is that AVX512 was created because of Intel's obsession with keeping GPUs out of the lucrative HPC market by beefing up general-purpose CPUs' FP capability so that developers, being lazy, would not move away from them.
> My whole point is that AVX512 was created because of Intel's obsession with keeping GPUs out of the lucrative HPC market by beefing up general-purpose CPUs' FP capability so that developers, being lazy, would not move away from them.
It was created because AVX(2) were rudimentary ISAs.
> Value CPU: 2x 128-bit
> Mainstream: 4x 128-bit
> Server: 6-8x 128-bit
Just don't.