Discussion Intel current and future Lakes & Rapids thread

Page 751

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,761
14,785
136
Again, BACK ATCHA. You're ignoring Intel's recent inability to execute and leaning on products that don't even exist yet, and may never on the broader market! You keep pumping up e-cores in contrast to cores that are known to have area and power-efficiency problems. There is no "arguing in bad faith".
Not to mention trying to compare Zen4c to E-cores (not in the same class) and saying "hybrid solutions are the future" when only Intel has done it (aside from iPhones and the like). Even ARM stays with all the same cores, from what I have seen.
 
Last edited:

Hulk

Diamond Member
Oct 9, 1999
4,378
2,256
136
When you get over 8 E cores, ie the 13900 series, the biggest problem is finding applications that actually load all of those cores optimally. Or should I say "my" biggest problem as I'm not seeing huge increases due to the huge number of E's. One of these days I'm going to do some benches with 8 E's and 16 E's with the applications I use on a regular basis.
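One way to reason about why going from 8 to 16 E-cores may not show up as a big gain is Amdahl's law. The sketch below is only an illustration: the 90% parallel fraction is a made-up number, not a measurement of any application, and the model treats every core as identical, which a P+E hybrid is not.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 90% of the runtime parallelizes perfectly.
speedup_8 = amdahl_speedup(0.9, 8)
speedup_16 = amdahl_speedup(0.9, 16)
gain_from_doubling = speedup_16 / speedup_8

print(f"8 cores:  {speedup_8:.2f}x")
print(f"16 cores: {speedup_16:.2f}x")
print(f"doubling the cores gives {gain_from_doubling:.2f}x")
```

Under that assumption, doubling the core count buys well under a 2x gain, which is in line with what the 8 E vs 16 E benches would be checking.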
 
Reactions: lightmanek

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
Again, BACK ATCHA. You're ignoring Intel's recent inability to execute and leaning on products that don't even exist yet, and may never on the broader market! You keep pumping up e-cores in contrast to cores that are known to have area and power-efficiency problems. There is no "arguing in bad faith".
This is precisely what I'm talking about. The discussion was on the inherent merit of small cores vs big ones for certain markets. Now you're trying to twist it into being about Intel's execution issues, which you don't even bother to connect to the subject of small cores. There is no discussion to be had with someone who changes their argument with every reply.
 
Reactions: mderbarimdiger

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
Not to mention trying to compare Zen4c to E-cores (not in the same class)
According to which set of made-up numbers now?
and saying "hybrid solutions are the future" when only Intel has done it (aside from iPhones and the like). Even ARM stays with all the same cores, from what I have seen.
Of the consumer application processor market, the majority is ARM, almost all of which are hybrid. Of the remainder, the vast majority are Intel, which is also all-in on hybrid. So no, hybrid solutions are not the future. They are the present.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,761
14,785
136
According to which set of made-up numbers now?

Of the consumer application processor market, the majority is ARM, almost all of which are hybrid. Of the remainder, the vast majority are Intel, which is also all-in on hybrid. So no, hybrid solutions are not the future. They are the present.
Zen4c for one has AVX-512, which E-cores do NOT. Not to mention benchmarkwise they are not even in the same class.

As for the hybrid, would you care to list all the hybrid CPUs that are NOT for phones or Apple? Like real server CPUs? Intel desktop is excluded from my question, as we know they have it.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
Zen4c for one has AVX-512, which E-cores do NOT.
And since when did that matter for the cloud market these chips are targeting?
Not to mention benchmarkwise they are not even in the same class.
Where are you comparing benchmarks for not one, but two unreleased products?
As for the hybrid, would you care to list all the hybrid CPUs that are NOT for phones or Apple? Like real server CPUs?
So aside from the billions of devices that use hybrid? No one was suggesting it for servers.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
It looks like AMD CMT processors with additional inter-core resources to me. Will it really perform well in ST workloads compared to vanilla BF OOO cores? I agree if they are planning to mitigate ST degradation due to small core sizes, but I'm not sure it's really the way to boost ST performance.
The penalty for Intel-sided fusion of processors into a monolithic virtual core is between 90% and 98% of the fictional monolithic core.

P-core(2c hw-fusion) => Huge area penalty of physically fusing two processors into one.
W/ it done on Intel 10/7
Area penalty: 7.04 mm2 -> less than 14.08 mm2
Gives monolithic performance of: 10-wide decode+caches, etc
This as far as I can tell is not better than the Israeli managed P-core successor, which killed off all American managed P-core projects.

E-core(4c hw-fusion) => Less area penalty of physically fusing four processors into one.
W/ it done on Intel 10/7
Area penalty: 1.7*4 + 1.98 = 8.78 mm2 -> 5.1*2 [min-element of two fused cores, which scales to the max-element of four fused cores] + 1.98 = 12.18 mm2
Gives monolithic performance of: 24-wide decode, etc
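The area sums above can be re-run as a quick check. This is only a sketch of the arithmetic as posted: the constant names are made up for illustration, and what the 1.98 mm2 term covers is not stated, so it is simply treated as a shared block added to both sums.

```python
# All figures are mm^2 on Intel 10/7, copied from the post above.
E_CORE_AREA = 1.7        # one unfused E-core
FUSED_PAIR_AREA = 5.1    # "min-element": two E-cores fused together
SHARED_AREA = 1.98       # term added to both E-core sums (meaning not stated)
P_CORE_AREA = 7.04       # one P-core


def cluster_area(unit_area: float, units: int, shared: float = SHARED_AREA) -> float:
    """Area of `units` identical blocks plus the shared term."""
    return round(unit_area * units + shared, 2)


unfused = cluster_area(E_CORE_AREA, 4)      # four plain E-cores
fused = cluster_area(FUSED_PAIR_AREA, 2)    # two fused pairs = four fused cores
overhead = fused / unfused - 1.0            # relative area cost of fusion

# The post only bounds the fused P-core as "less than" two P-cores.
p_fused_upper = round(P_CORE_AREA * 2, 2)

print(f"E-core x4 unfused: {unfused} mm^2")
print(f"E-core x4 fused:   {fused} mm^2 ({overhead:.0%} larger)")
print(f"P-core x2 fused:   under {p_fused_upper} mm^2")
```

On these numbers the fused E-core cluster comes out a bit under 40% larger than the unfused one, and still smaller than the upper bound given for the fused P-core.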

The use of 24-wide in single-threaded is dependent on the 2nd level load balancer which is the exo-core[the exact component which fuses cores] variant of;
Gracemont's: hardware-driven load balancer is also capable of taking long chains of sequential instructions and automatically inserts toggle points to ensure parallelism.

{If PRF sharing is not the target, the latency for L1d sharing is:
3-cycle for local core
5-cycle for neighbor core
7-cycle for opposite core}
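For a feel of what those three latencies mean on average, here is a minimal sketch of a weighted mean. The access mix is purely hypothetical (nothing above says how loads would be distributed across local, neighbor and opposite cores); only the 3/5/7-cycle figures come from the post.

```python
# Cycle counts from the post; the keys are illustrative names.
L1D_LATENCY = {"local": 3, "neighbor": 5, "opposite": 7}


def mean_l1d_latency(mix: dict) -> float:
    """Average L1d latency in cycles for a given access mix (fractions sum to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(L1D_LATENCY[where] * share for where, share in mix.items())


# Hypothetical mix: half of the loads hit the local core's L1d.
example_mix = {"local": 0.5, "neighbor": 0.25, "opposite": 0.25}
average = mean_l1d_latency(example_mix)
print(f"average L1d latency: {average} cycles")
```

The more of a fused thread's data lands in a remote core's L1d, the closer the average drifts toward the 7-cycle case.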

Intel 10, Tremont(Dense)
Intel 7, Gracemont(HP) <== L1 Load Balancer in client
Intel 4, Crestmont(HP)
Intel 3's Mont(Dense HP)
Intel Test Node [FinFET+PowerVIA] (Test Lib) <== L2 Load Balancer in test E-core fusion
Intel 20A's(Dense) <== E-core Tick (Intel 3-like E-core)
Intel 18A(Dense) <== E-core Tock L2 LB in client if greenlit.

E-core path isn't going for peak frequency, so it will go for area density allowing inclusion of fusing elements for cheap. Peak frequency can be ignored in a couple server and ultra-low-TDP workloads. Which means the only path for E-core to increase parallelity(^x)/execution(^-x) scaling without P-cores is to have core fusion. One High IPC fused core can increase work scaling to multiple Low IPC non-fused cores faster than one P-core :: Single-threaded perf is connected to multi-threaded scaling rate.

Intel -> ARM(2018-2022)
Also, the fused P-core project that was developed at ARM from Intel employees jumping ship is still running.
A715 (2x 128-bit Vx's+4 Int(2x 2SX+2MX))
vs
Fused P-core(2018 file date at ARM => 2024~2025 launch date) :: 2x4IQ Int(4x+4x 2SX+2MX) + 2x3IQ (3x+3x 128-bit:: 3x 256-bit(High-performance mode) or 6x 128-bit(Low-power mode))

Intel -> Apple (2020-2022)
Fused E-core project for big boy Datacenter CPU. No specs, since Apple doesn't pop off like ARM or Intel.
 
Last edited:

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,761
14,785
136
And since when did that matter for the cloud market these chips are targeting?

Where are you comparing benchmarks for not one, but two unreleased products?

So aside from the billions of devices that use hybrid? No one was suggesting it for servers.
Way to avoid my question. I was not even replying to you originally, just agreeing with another member. I should know better than to trade posts with you as you always avoid my questions, or change the subject. I am done with you....
 

Timorous

Golden Member
Oct 27, 2008
1,727
3,152
136
Not to mention benchmarkwise they are not even in the same class.

If the 45 W vs 65 W 7950X is anything to go by (roughly the same ppc as 128c Bergamo and 96c Genoa), the performance drop-off is quite significant. Unless the Zen4c library shifts the v/f curve such that it can run at higher clocks than the standard Zen 4 core when given such low amounts of power, the performance delta could actually be closer to that of the E-core vs P-core.

Very much napkin math there but we don't have anything else to go off of at the moment.
 

Dayman1225

Golden Member
Aug 14, 2017
1,153
982
146
Looks like the specs of the new Sapphire Rapids based Xeon-W line have leaked ahead of the launch event today.

ranging from 6-56 cores, 64-112 PCIe Gen5 lanes and quad or octo channel configs.
56 core price tops out at $5889





 

palladium

Senior member
Dec 24, 2007
538
2
81
Looks like the specs of the new Sapphire Rapids based Xeon-W line have leaked ahead of the launch event today.

ranging from 6-56 cores, 64-112 PCIe Gen5 lanes and quad or octo channel configs.
56 core price tops out at $5889





Hmm, where do you think the i9 13900k will sit in MT, non AVX 512 performance? I'm guessing around W5-2465X - W5-2475X ?
 
Reactions: Geddagod

Timmah!

Golden Member
Jul 24, 2010
1,463
729
136
Looks like the specs of the new Sapphire Rapids based Xeon-W line have leaked ahead of the launch event today.

ranging from 6-56 cores, 64-112 PCIe Gen5 lanes and quad or octo channel configs.
56 core price tops out at $5889






So 24 core costing about as much as 24 core TR 5965X? Color me surprised. The joy of duopoly and fake competition.
If these are true, and it seems fairly legit, then I made the right choice going with the 7950X. Because if I was not willing to spend 2,5k on TR, it's safe to say the same goes for these Intels.
 

Kocicak

Senior member
Jan 17, 2019
982
974
136
So 24 core costing about as much as 24 core TR 5965X? Color me surprised. The joy of duopoly and fake competition.
These are still made from one large piece of silicone and are not cheap to make at all. They may have an advantage of lower core to core latency than Threadrippers. They may actually force AMD to lower the price of Threadrippers somewhat.
 
Jul 27, 2020
18,021
11,751
116
I made the right choice going with the 7950X.
Plus, you would have gotten stuck with no upgrade path if you went with workstation CPU. Now you can enjoy increased performance in 4 or 5 years with a CPU upgrade without having to change the whole system.
 

Timmah!

Golden Member
Jul 24, 2010
1,463
729
136
These are still made from one large piece of silicone and are not cheap to make at all. They may have an advantage of lower core to core latency than Threadrippers. They may actually force AMD to lower the price of Threadrippers somewhat.

That's really their problem, is it not?
Not sure about lower core-to-core latency - they probably use a mesh, and that was not exactly great in that particular respect in the past either. Or am I wrong here? Anyway, it's possible they improved on that and it's not an issue anymore; we shall see.
 
Reactions: lightmanek

MrTeal

Diamond Member
Dec 7, 2003
3,587
1,748
136
These are still made from one large piece of silicone and are not cheap to make at all. They may have an advantage of lower core to core latency than Threadrippers. They may actually force AMD to lower the price of Threadrippers somewhat.
You can pay for two larger pieces of silicone for less than one of those W9's, and probably get a lot more enjoyment out of them.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
@DrMrLordX The E core team is one area within Intel that has been executing fantastically. Also, the Networking division is far, far smaller than the general server division. In addition to the fact that Ridge products are basically replacing entrenched ASIC competitors. So @Exist50 has a point about the real server E core being Sierra Forest.

You do know Ampere Altra is Ice Lake level in core performance? Graviton 2 is even lower, at Skylake level. Crestmont-class cores in Sierra Forest are very much in the same ballpark.

It looks like AMD CMT processors with additional inter-core resources to me. Will it really perform well in ST workloads compared to vanilla BF OOO cores? I agree if they are planning to mitigate ST degradation due to small core sizes, but I'm not sure it's really the way to boost ST performance.

There were rumors a while ago the original Nehalem was a "proper" CMT CPU aimed at increasing performance rather than saving area per thread as with Bulldozer.

The practicality of achieving performance is probably what killed it along with TTM worries.

I have read the E cores are actually "Cinebench accelerators." While funny, there is some truth to that statement as there aren't a lot of applications that will fully utilize all 16 E's in a 13900 series part.

This is probably the real reason why Arrow Lake is going to be 8+16 rather than 8+32. There were rumors a while ago that ARM cores in phones/tablets would quickly go up to 16+ cores. They haven't.
 
Last edited:

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Feature Set Comparison between 8480+ and The W9 3495X

 

Hulk

Diamond Member
Oct 9, 1999
4,378
2,256
136
This is probably the real reason why Arrow Lake is going to be 8+16 rather than 8+32. There were rumors a while ago that ARM cores in phones/tablets would quickly go up to 16+ cores. They haven't.

Most software has yet to catch up to CPUs with 16 threads, much less 32 or more.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,761
14,785
136
Most software has yet to catch up to CPUs with 16 threads, much less 32 or more.
There are tasks which can use all the cores/threads you can throw at them. Some rendering/encoding tasks. What I do can use all the cores/threads 24/7, but DC is the only place you will see that.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,689
1,224
136
There were rumors a while ago the original Nehalem was a "proper" CMT CPU aimed at increasing performance rather than saving area per thread as with Bulldozer.
The original Bulldozer design was also a proper cluster-based multithreading arch. The boss of cores at the time wanted to reduce Opteron power and increase core counts with many 2 GHz low-power/high-IPC cores. Bulldozer 2007's HPC 16c = 16 actual processors and 32 int-clusters.

David Witt(Chief for K8 generation secondary design) => 1997 homo-clustered architecture (Two monolithic clusters each with Int/FPU//LD/ST)
Jim Keller => 1999 hetero-clustered architecture (Three clusters, one with Int, one with FPU, one with LD/ST::Proto-Bobcat // handled by the converted MIPS->x86 Alchemy team with Brad B.(Chief Arch(Bobcat))/McKinney(Chief Tech(Bobcat&Bulldozer)))
Andy Glew => 2002-2004 K10 architecture (Independent from the above, but spun out from LP-project::clusters are not yet cores)
Charles R. Moore => 2005-1H2007, Original Bulldozer architecture (Derived from Witt/Keller with four clusters: Two Int(CMT2), One LD/ST(SMT2), One FPU(SMT2)::Bulldozer)
Mike Butler => 2H2007-2013, K10 as "Bulldozer" (Derived from Glew's microarchitecture at AMD::clusters became cores)

Glew/Butler Architected design focused towards high performance => higher frequency
"Perhaps I should say here that my MCMT had a significant difference from
clustering in, say, the Alpha 21264,
Those clusters bypass to each other: there is a fast bypass within a
cluster, and a slightly slower (+1 cycle) bypass of results between
clusters. The clusters are execution units only, and share the data
cache. This bypassing makes it easy (or at least easier) to spread a
single thread across both clusters. My MCMT clusters, on the other
hand, do NOT bypass to each other. This motivates separate threads per
cluster, whether explicit or implicit."

Boss of Glew "K10 Architect" and Butler "K10 as Bulldozer Architect" wanted high frequency. As well as not implementing SpMT or anything complex that could compromise frequency.

Moore Architected design focused towards low power and scalability => higher IPC(TLP->IPC[MT] and ILP->IPC[ST])
Boss of Moore's time as OG Bulldozer Architect wanted low power.



New naming scheme of ongoing AMD development;
HP1(launched) -> HP2(launched) -> HP3(dropped:2012->2013) // HP3 ST had access to 8 add/inc/dec/etc ALUs
LP1(hiatus:2014->2018) // Started while StoneyX was being done. FT4 socket was supposed to be Excavator and LP-NextGen(LP1).
ULP1(restart:LP1, resume on new target node:2019->2023)
 
Last edited:
Reactions: BorisTheBlade82

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
@NostaSeronx
I guess you would highly enjoy this:

And this:

Quite interesting findings and interpretations.
 