Discussion Intel current and future Lakes & Rapids thread

Page 443

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Some thoughts on Sapphire Rapids and interconnects.

Sapphire Rapids-
If Sapphire Rapids maxes out at 56 cores, we can hope for only 60-80% performance gain, assuming everything works out. While that'll beat Milan, it'll only do it by a small amount.

56 cores = 40% more cores, Golden Cove = 20-25% more perf/clk

The thing is just putting more cores isn't enough as scaling isn't perfect so resources need to be added just to compensate for the loss. So 40% gain from 40% more cores is optimistic.
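
A rough sketch of the arithmetic behind that estimate: multiply the core-count ratio by the per-clock gain, then discount the extra cores by a scaling factor. The `scaling_efficiency` parameter and its 0.8 value are assumptions made up for illustration, not figures from any leak, and clocks are assumed unchanged.

```python
def combined_gain(core_ratio, ipc_gain, scaling_efficiency=1.0):
    """Estimated throughput multiplier versus the baseline chip.

    core_ratio: new core count / old core count (56 / 40 = 1.4 here)
    ipc_gain: per-clock uplift as a fraction (0.20 means +20%)
    scaling_efficiency: hypothetical fraction of the extra cores'
        throughput that is actually realized (1.0 = perfect scaling)
    """
    extra_cores = core_ratio - 1.0
    effective_core_ratio = 1.0 + extra_cores * scaling_efficiency
    return effective_core_ratio * (1.0 + ipc_gain)


for ipc in (0.20, 0.25):
    perfect = combined_gain(56 / 40, ipc)
    discounted = combined_gain(56 / 40, ipc, scaling_efficiency=0.8)
    print(f"perf/clk +{ipc:.0%}: perfect scaling x{perfect:.2f}, "
          f"80% scaling x{discounted:.2f}")
```

With perfect scaling this gives 1.4 x 1.20 = 1.68 to 1.4 x 1.25 = 1.75, so anything below perfect scaling lands in the lower part of the 60-80% range.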

In Integer scenarios where it's representative of a uarch gain, things like HBM2 will only help in a minor way.

They really need to get the 72 core version working somehow, if not 68. If Genoa is really that big of a gain,* potentially even with 68 cores, Genoa will end up being 50%+ faster. Intel focuses more on enterprise so it might be closer there, but I can't see how they'll be in a better competitive position than they are with Icelake-SP. Maybe by a wee bit?

Interconnects-
This is more of a what-if. Based on how closely the 4 tiles are placed in leaked Sapphire Rapids shots, perhaps they can go with "rings-of-meshes". Each tile would have its cores connected using a mesh, but inter-tile connections would use a ring.

*Based on how Ampere is looking, Genoa will need all the performance it can get.
 
Reactions: lightmanek

andermans

Member
Sep 11, 2020
151
153
76
56 cores = 40% more cores, Golden Cove = 20-25% more perf/clk

Open question is how the new process and the cores change the clock at a given TDP per core level. For the top SKU they're going from 6.75 W/core for Ice Lake SP to 6.25 W/core for Sapphire Rapids, so not too much of a regression. AFAIU those power levels (~54W for 8 cores) were around the level where Ice Lake on mobile was already seeing the clocking issues from the 10nm process at the time?

Interestingly even for AMD with the 64 core -> 96 core move the TDP per core barely changes for the highest TDP models (280W for Milan and 400W cTDP for Genoa), going from 4.375 W to 4.167 W. Same for the standard TDP of 225W -> 320W, that is a 3.516 W to 3.333 W per core change. (though this overlooks all of the IO die effects wrt power).
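
The per-core figures above can be reproduced with a few lines. The AMD TDPs and core counts are the ones quoted in this post; the Intel TDPs (270 W for 40 cores, 350 W for 56 cores) are back-calculated from the 6.75 and 6.25 W/core figures and should be treated as an assumption. As noted, this lumps IO die and uncore power in with the cores.

```python
def watts_per_core(tdp_watts, cores):
    """Package TDP divided evenly across cores (ignores IO die / uncore)."""
    return tdp_watts / cores


# name: (TDP in watts, core count)
parts = {
    "Ice Lake-SP top SKU (assumed 270 W)": (270, 40),
    "Sapphire Rapids top SKU (assumed 350 W)": (350, 56),
    "Milan 280 W": (280, 64),
    "Genoa 400 W cTDP": (400, 96),
    "Milan 225 W": (225, 64),
    "Genoa 320 W": (320, 96),
}

for name, (tdp, cores) in parts.items():
    print(f"{name:42s} {watts_per_core(tdp, cores):.3f} W/core")
```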

So either going to lower TDP per core isn't that helpful for power efficiency or both AMD/Intel likely are not really trying to compete with the power efficiency of ARM servers.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
Some thoughts on Sapphire Rapids and interconnects.

Sapphire Rapids-
If Sapphire Rapids maxes out at 56 cores, we can hope for only 60-80% performance gain, assuming everything works out. While that'll beat Milan, it'll only do it by a small amount.

56 cores = 40% more cores, Golden Cove = 20-25% more perf/clk

The thing is just putting more cores isn't enough as scaling isn't perfect so resources need to be added just to compensate for the loss. So 40% gain from 40% more cores is optimistic.

In Integer scenarios where it's representative of a uarch gain, things like HBM2 will only help in a minor way.

They really need to get the 72 core version working somehow, if not 68. If Genoa is really that big of a gain,* potentially even with 68 cores, Genoa will end up being 50%+ faster. Intel focuses more on enterprise so it might be closer there, but I can't see how they'll be in a better competitive position than they are with Icelake-SP. Maybe by a wee bit?

Interconnects-
This is more of a what-if. Based on how closely the 4 tiles are placed in leaked Sapphire Rapids shots, perhaps they can go with "rings-of-meshes". Each tile would have its cores connected using a mesh, but inter-tile connections would use a ring.

*Based on how Ampere is looking, Genoa will need all the performance it can get.

I don’t think Intel is quite as worried about core density for Sapphire Rapids. They would rather sell you more chips. Remember, Sapphire Rapids marks a return of more than 2 sockets. A 4S system would technically support 224 cores/448 threads. An EPYC server currently tops out at 128 cores/256 threads.

I am willing to bet that there will be aggressively clocked 32 core chips designed specifically for 4S systems.

EDIT: Oh and an 8S system? 448 cores/896 threads.
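
A quick sketch of the socket arithmetic, assuming 56 cores per socket (the rumored top Sapphire Rapids SKU discussed in this thread) and 2 threads per core. The function name and defaults are just for illustration.

```python
CORES_PER_SOCKET = 56   # rumored top Sapphire Rapids SKU, per this thread
THREADS_PER_CORE = 2    # assumes 2-way SMT (Hyper-Threading)


def system_totals(sockets, cores_per_socket=CORES_PER_SOCKET,
                  threads_per_core=THREADS_PER_CORE):
    """Return (total cores, total threads) for a multi-socket system."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core


for sockets in (2, 4, 8):
    cores, threads = system_totals(sockets)
    print(f"{sockets}S Sapphire Rapids: {cores} cores / {threads} threads")

# Current 2S EPYC with 64-core parts, for comparison
epyc_cores, epyc_threads = system_totals(2, cores_per_socket=64)
print(f"2S EPYC: {epyc_cores} cores / {epyc_threads} threads")
```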
 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
Tapeout costs shooting up? This is a data transfer. Please explain.
By tape-out costs I refer to the cost of actually taping out a chip. That includes all of the R&D and design work required to get to that stage in the first place.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
By tape-out costs I refer to the cost of actually taping out a chip. That includes all of the R&D and design work required to get to that stage in the first place.
It’s better to just use design costs. That is what is growing so much, even with ever better automated design tools. Verification, which is part of design, is becoming very complex and time consuming as the number of xtors per die climbs that exponential (though slowing) curve.
 

Hitman928

Diamond Member
Apr 15, 2012
5,611
8,826
136
By tape-out costs I refer to the cost of actually taping out a chip. That includes all of the R&D and design work required to get to that stage in the first place.

Fabrication costs themselves go up too, as the cost per wafer rises and the cost of a complete mask set increases.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Interestingly even for AMD with the 64 core -> 96 core move the TDP per core barely changes for the highest TDP models (280W for Milan and 400W cTDP for Genoa), going from 4.375 W to 4.167 W. Same for the standard TDP of 225W -> 320W, that is a 3.516 W to 3.333 W per core change. (though this overlooks all of the IO die effects wrt power).

The actual core power is going to be half this. So for Genoa you are looking at 2-2.3W.

The L3 takes maybe 10W at most but the rest like memory controllers, hubs for managing core to core communication, and other I/O take a ton of power.

Intel especially aims very broadly, so some features seem unnecessary for one application but benefit others. That'll show in extra die space and power use.

Like in the SpecJVMM benchmark or OLTP transaction processing benchmarks, Icelake is much closer to Milan and gets an up to 2x gain over Cascade Lake.

I don’t think Intel is quite as worried about core density for Sapphire Rapids. They would rather sell you more chips.

This is not much of an excuse for a chip that's significantly behind the competition. Way back when they were using the Xeon MP brand, those chips might have been 4-socket capable and performed OK in enterprise applications, but they truly sucked against the regular dual-socket Xeons everywhere else. Basically, if the company didn't care about spending a few thousand dollars extra on a CPU, then you went for Xeon MP.

At some point, probably starting with Nehalem-EX, the non-enterprise performance was no longer horrid in comparison to Nehalem-EP. Probably not a coincidence their marketshare and revenue really grew.

If Sapphire Rapids turns out to be 20% faster in 2P, it'll also be 20% faster in 4P and 8P.
 
Last edited:

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
The actual core power is going to be half this. So for Genoa you are looking at 2-2.3W.

The L3 takes maybe 10W at most but the rest like memory controllers, hubs for managing core to core communication, and other I/O take a ton of power.

Intel especially aims very broadly, so some features seem unnecessary for one application but benefit others. That'll show in extra die space and power use.

Like in the SpecJVMM benchmark or OLTP transaction processing benchmarks, Icelake is much closer to Milan and gets an up to 2x gain over Cascade Lake.



This is not much of an excuse for a chip that's significantly behind the competition. Way back when they were using the Xeon MP brand, those chips might have been 4-socket capable and performed OK in enterprise applications, but they truly sucked against the regular dual-socket Xeons everywhere else. Basically, if the company didn't care about spending a few thousand dollars extra on a CPU, then you went for Xeon MP.

At some point, probably starting with Nehalem-EX, the non-enterprise performance was no longer horrid in comparison to Nehalem-EP. Probably not a coincidence their marketshare and revenue really grew.

If Sapphire Rapids turns out to be 20% faster in 2P, it'll also be 20% faster in 4P and 8P.
I get where you are coming from, but I don’t think Intel cares since AMD is only 2S. A 4U Intel system will hold more cores than a 4U AMD system and that is probably all Intel cares about.

Of course, with Genoa this may change.

Also can we pause for a moment and appreciate the fact that a few months from now we may see a single system with 448 cores? (Ignoring the fact it will use as much power as a small town 🤣)
 
Reactions: lightmanek

andermans

Member
Sep 11, 2020
151
153
76
I get where you are coming from, but I don’t think Intel cares since AMD is only 2S. A 4U Intel system will hold more cores than a 4U AMD system and that is probably all Intel cares about.

I think the problem with that is that if you need more sockets that will mean more expensive motherboards as well as needing even cheaper CPUs to be price competitive. So while having >2 socket systems mitigates the issue to some extent I don't think the issue is resolved.
 

energy23

Junior Member
May 26, 2021
1
0
6
According to this link:
There seems to be a CPU variant with 8 big cores and no little cores, but with an iGPU. Is this correct? Any guesses whether this would be something like an i7 or i9 processor, e.g. a 12700K or something?

Asking because I'm looking to build a Hackintosh in the future and have doubts that macOS will recognize the little cores.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
According to this link:
There seems to be a CPU variant with 8 big cores and no little cores, but with an iGPU. Is this correct? Any guesses whether this would be something like an i7 or i9 processor, e.g. a 12700K or something?

Asking because I'm looking to build a Hackintosh in the future and have doubts that macOS will recognize the little cores.

There have been no rumors/leaks of such a die. But it's at least imaginable that such a SKU could be offered as a cut down version of the 8+8 die.
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
I think the problem with that is that if you need more sockets that will mean more expensive motherboards as well as needing even cheaper CPUs to be price competitive. So while having >2 socket systems mitigates the issue to some extent I don't think the issue is resolved.

Price does not matter nearly as much as density. Real estate costs more than anything; if you can double or triple your capacity/performance by paying a one-time fixed cost, that cost would have to be pretty high to make you hesitate.
 

Gideon

Golden Member
Nov 27, 2007
1,714
3,938
136
ServeTheHome IMO has a very good article on the subject

The obvious answer here is cost. Each node simply costs significantly more even though one may be able to get per-node cost savings. For smaller installations of a rack or less, many organizations simply want more nodes for smaller failure domains. With fewer more costly nodes a node that is offline for any reason represents a higher proportional cost of offline resources in small installations.

So if you're not all-in on rack density you can also just put more nodes or racks in. If density were the only important metric, EPYC Rome would have been a lot more popular than it was. After all, since 2019 it has allowed consolidating old Sandy Bridge, Ivy Bridge, etc. nodes, often 4 to 1.
 
Reactions: Tlh97 and moinmoin

repoman27

Senior member
Dec 17, 2018
381
536
136
The term "tape out" refers to the final engineering milestone immediately prior to the production of photomasks. It actually predates EDA tools going back to the early days when the layouts were done by hand with Rubylith tape and then photo reduced.

The Calma GDSII system employed the terms "stream in" for retrieving from and "stream out" for writing out to magnetic tape storage. These terms and the GDSII file format are still in common use today, despite magnetic tape having gone the way of the dodo.

With the shift to EDA and fabless, I believe the term "tape out" gradually transitioned to meaning the point where the GDSII database with the final layout was sent off to the foundry or mask shop. However, this is almost exactly how Intel has defined "tape in" for customers using their foundry services—as delivery to Intel of the GDSII or OASIS database for Intel's use in creating masks and tooling. Intel is far from fabless and apparently committed to being an "integrated device manufacturer", so I've always wondered how this term was being applied internally. My suspicion was that it represented an engineering milestone following the completion of the logical design work but before the physical layout was completed, as Ian is suggesting in that tweet. Although I thought it was more along the lines of all of the final IP blocks having been submitted to the team responsible for the physical layout of the SoC, but perhaps prior to place and route.

Maybe someone like dmens could shed more light on this, having actually been on the inside?
 

naukkis

Senior member
Jun 5, 2002
782
637
136
If the manufacturing process and tools are ready, they can just tape out a product to them. But when the manufacturing process isn't complete yet, there is no way to tape out a design. So they tape in the design, keep developing the design and the process design rules until it is actually manufacturable, and once those are ready the design is taped out.

In those Intel links they estimate that tape-out for a product family should happen within three years of the tape-in of the first product...
 

eek2121

Diamond Member
Aug 2, 2005
3,051
4,276
136
ServeTheHome IMO has a very good article on the subject



So if you're not all-in on rack density you can also just put more nodes or racks in. If density were the only important metric, EPYC Rome would have been a lot more popular than it was. After all, since 2019 it has allowed consolidating old Sandy Bridge, Ivy Bridge, etc. nodes, often 4 to 1.

The upgrade cycle in enterprise will blow your mind. Servers run until they die or a major infrastructure project is planned. Most companies I have worked with replace servers every 6-8 years.

You can expect EPYC to become more popular in a few years.

Which do you think costs more in the long run, 2X 64 core EPYC servers or 1X 128 core EPYC server? What if you have 2 racks in a tiny (well cooled) closet as one of my clients does? Building a new closet is a significant capital investment. Adding a 2S motherboard or 4S motherboard for a few hundred or thousand more + additional CPU does not.
 

andermans

Member
Sep 11, 2020
151
153
76
The upgrade cycle in enterprise will blow your mind. Servers run until they die or a major infrastructure project is planned. Most companies I have worked with replace servers every 6-8 years.

You can expect EPYC to become more popular in a few years.

Which do you think costs more in the long run, 2X 64 core EPYC servers or 1X 128 core EPYC server? What if you have 2 racks in a tiny (well cooled) closet as one of my clients does? Building a new closet is a significant capital investment. Adding a 2S motherboard or 4S motherboard for a few hundred or thousand more + additional CPU does not.

I think the caveat there is that for large scale deployments densification has gone so fast that in a lot of cases you're power/cooling limited now. (especially once you involve accelerators)
 

uzzi38

Platinum Member
Oct 16, 2019
2,703
6,405
146
Alder Lake-P/M steppings and PL1/PL2 reference values | Coelacanth's Dream (coelacanth-dream.com)

Some ADL-P laptop PL1 and PL2s for their RVPs

On adlrvp (482):
CPU PL1 = 28 Watts
CPU PL2 = 64 Watts

On adlrvp (682):
CPU PL1 = 45 Watts
CPU PL2 = 115 Watts

On brya (282):
CPU PL1 = 15 Watts
CPU PL2 = 55 Watts

And one ADL-M:

# DPTF
register "dptf_enable" = "1"
register "power_limits_config" = "{
.tdp_pl1_override = 9,
.tdp_pl2_override = 30,
}"

The latter is a 10W increase over Jasper Lake's PL2
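
For comparing the configurations, a small sketch that tabulates the PL2/PL1 ratio for each board listed above. The values are copied from this post; the Jasper Lake PL2 is not stated directly and is only inferred from the "10W increase" remark.

```python
# board: (PL1, PL2) in watts, as listed above
power_limits = {
    "adlrvp (482)": (28, 64),
    "adlrvp (682)": (45, 115),
    "brya (282)": (15, 55),
    "ADL-M": (9, 30),
}


def boost_ratio(pl1, pl2):
    """How far the short-term limit (PL2) sits above the sustained one (PL1)."""
    return pl2 / pl1


for board, (pl1, pl2) in power_limits.items():
    print(f"{board:13s} PL1={pl1:3d} W  PL2={pl2:3d} W  "
          f"PL2/PL1={boost_ratio(pl1, pl2):.2f}")

# Inferred, not quoted: ADL-M's PL2 is said to be 10 W above Jasper Lake's
implied_jasper_lake_pl2 = power_limits["ADL-M"][1] - 10
print(f"Implied Jasper Lake PL2: {implied_jasper_lake_pl2} W")
```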
 
Last edited:

yuri69

Senior member
Jul 16, 2013
438
720
136
TGL-U was PL1 42W/PL2 64W in the most aggressive setup, right? So 115W is kinda high compared to that. Maybe the Golden Coves are really fast but hungry.

However, it got moar corez potentially affecting the balance.
 
Reactions: lightmanek

jpiniero

Lifer
Oct 1, 2010
14,841
5,456
136
TGL-U was PL1 42W/PL2 64W in the most aggressive setup, right? So 115W is kinda high compared to that. Maybe the Golden Coves are really fast but hungry.

However, it got moar corez potentially affecting the balance.

P is really a merger of U and H. The 45/115 model is something that would replace TGL-H.
 

dmens

Platinum Member
Mar 18, 2005
2,271
917
136
The term "tape out" refers to the final engineering milestone immediately prior to the production of photomasks. It actually predates EDA tools going back to the early days when the layouts were done by hand with Rubylith tape and then photo reduced.

The Calma GDSII system employed the terms "stream in" for retrieving from and "stream out" for writing out to magnetic tape storage. These terms and the GDSII file format are still in common use today, despite magnetic tape having gone the way of the dodo.

With the shift to EDA and fabless, I believe the term "tape out" gradually transitioned to meaning the point where the GDSII database with the final layout was sent off to the foundry or mask shop. However, this is almost exactly how Intel has defined "tape in" for customers using their foundry services—as delivery to Intel of the GDSII or OASIS database for Intel's use in creating masks and tooling. Intel is far from fabless and apparently committed to being an "integrated device manufacturer", so I've always wondered how this term was being applied internally. My suspicion was that it represented an engineering milestone following the completion of the logical design work but before the physical layout was completed, as Ian is suggesting in that tweet. Although I thought it was more along the lines of all of the final IP blocks having been submitted to the team responsible for the physical layout of the SoC, but perhaps prior to place and route.

Maybe someone like dmens could shed more light on this, having actually been on the inside?

Nah, tape-in is after verification and PD sign-off, you can't tape in if you have not verified your interfaces.

Historically tape-in just meant before the GDS generation and tape-out meant handing off GDS to the fab. But considering the likely state of Intel 7nm it could just mean that design has signed off and moved on while the fab figures out how to make the thing.
 