Intel Skylake / Kaby Lake


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Unless Intel has significantly altered their cache model, the L3 cache is still inclusive. This means that everything in a given core's L2 cache also has to exist in the L3 cache, and for every 1 MB of L2 cache there is only 1.375 MB of L3 cache.

And the L1s also need to be in L3 (but not necessarily in L2, at least it used to be this way). So that leaves just 320 KB of "free" L3 per core in the worst case (but very unlikely).

Question is, would Intel "burn" 1.05MB of L3 cache per core just to provide coherency?
It is all speculation now. For example, Knights Landing also has 1 MB of L2 per "tile" that is fully cache coherent with other tiles, so there is no reason Intel could not reuse it and throw in an eviction L3 to speed up certain workloads.
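The per-core arithmetic above can be checked with a quick sketch. It assumes 1.375 MB of L3 and 1 MB of L2 per core as quoted, 32 KB L1I + 32 KB L1D per core (my assumption, the post does not state the L1 sizes), and a strictly inclusive L3 with no overlap between L1 and L2 contents. The function name is made up for illustration.

```python
KB_PER_MB = 1024

def free_l3_per_core_kb(l3_mb, l2_mb, l1_kb):
    """Worst-case L3 capacity per core (KB) that is not mirroring L1/L2 lines.

    Assumes a strictly inclusive L3 and that L1 and L2 hold disjoint lines.
    """
    return l3_mb * KB_PER_MB - l2_mb * KB_PER_MB - l1_kb

L1_KB = 32 + 32                      # assumed: 32 KB L1I + 32 KB L1D
mirrored_kb = 1 * KB_PER_MB + L1_KB  # L2 + L1 copies held in L3
free_kb = free_l3_per_core_kb(1.375, 1.0, L1_KB)

print(free_kb)                  # 320.0
print(mirrored_kb / KB_PER_MB)  # 1.0625
```

So the worst case mirrors about 1.06 MB per core, close to the 1.05 MB figure in the question, and leaves 320 KB.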
 

formulav8

Diamond Member
Sep 18, 2000
7,004
522
126
Question is, would Intel "burn" 1.05MB of L3 cache per core just to provide coherency?

If trading a much smaller L3 for a much larger L2 wasn't worth it overall, I don't think they would do it.

The bigger question, I guess, is which workloads benefit the most or are hurt the most by Intel's cache changes.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
So if I am reading this correctly, Skylake-X (let's use the 7920X as the example) now has only 16.5 MB of unique cache between the L2 and L3, while the 6950X has 25 MB of unique cache? So the hope here would be that the extra speed of the L2 makes up for losing about 10 MB of total data capacity, which in theory would mean more cycles wasted on refilling the cache from memory?

That's the question. But it's also a big if. It depends on whether the level 3 cache is inclusive, which it has been forever in Intel uArchs, including Skylake and Kaby Lake. So it's highly unlikely this is different, but entirely possible. Inclusive means the L3 cache contains all data from the L2 caches. IMHO this is important if the OS moves threads from core to core. But with this new config there really isn't much space left in level 3. Strange.
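To put numbers on the "big if", here is a rough sketch of the unique cache capacity under each assumption. The core counts and cache sizes (12 cores, 1 MB L2 each, 16.5 MB L3 for the i9-7920X; 10 cores, 256 KB L2 each, 25 MB L3 for the i7-6950X) come from public spec sheets, not from this thread, and the function name is hypothetical.

```python
def unique_cache_mb(cores, l2_per_core_mb, l3_total_mb, inclusive):
    """Unique L2+L3 capacity in MB.

    Inclusive: every L2 line is duplicated in L3, so only L3 counts.
    Otherwise (idealised, no duplication at all): L2 and L3 add up.
    """
    l2_total_mb = cores * l2_per_core_mb
    return l3_total_mb if inclusive else l2_total_mb + l3_total_mb

chips = {
    "i9-7920X": (12, 1.0, 16.5),
    "i7-6950X": (10, 0.25, 25.0),
}

for name, (cores, l2, l3) in chips.items():
    print(name,
          unique_cache_mb(cores, l2, l3, inclusive=True),
          unique_cache_mb(cores, l2, l3, inclusive=False))
# i9-7920X 16.5 28.5
# i7-6950X 25.0 27.5
```

If the L3 stays inclusive the 7920X is 8.5 MB behind the 6950X; if it does not, the two end up roughly level.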
 

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
That's the question. But it's also a big if. It depends on whether the level 3 cache is inclusive, which it has been forever in Intel uArchs, including Skylake and Kaby Lake. So it's highly unlikely this is different, but entirely possible. Inclusive means the L3 cache contains all data from the L2 caches. IMHO this is important if the OS moves threads from core to core. But with this new config there really isn't much space left in level 3. Strange.

An inclusive L3$ benefits cache coherency performance between CPUs (or in the case of AMD, dice within an MCM package).
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
An inclusive L3$ benefits cache coherency performance between CPUs (or in the case of AMD, dice within an MCM package).

Exactly my point. And if it is still inclusive, there isn't much room for data not already in L2. If it is exclusive, it could negatively affect performance as you said due to lacking cache coherency and resulting latency penalties.
 

DrMrLordX

Lifer
Apr 27, 2000
21,813
11,168
136
This effectively means that 72.7% of Skylake-X's L3 cache could be used for mirroring L2 cache, where Broadwell-E only used 10% of its L3 for mirroring L2 cache.

Does this mean cache will be slower? No. We need benchmarks.

I don't think the cache will be slower, but I do think there will be more scenarios where Skylake-X has to hit the system RAM due to a large-ish working set.
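The 72.7% and 10% figures in the quote follow from the per-core sizes. A quick check, assuming 1 MB L2 against 1.375 MB L3 per core on Skylake-X and 256 KB L2 against 2.5 MB L3 per core on Broadwell-E (the Broadwell-E sizes are taken from public specs, not from this thread):

```python
def mirrored_fraction(l2_per_core_mb, l3_per_core_mb):
    """Share of an inclusive L3 slice that could be spent duplicating L2."""
    return l2_per_core_mb / l3_per_core_mb

skx = mirrored_fraction(1.0, 1.375)  # Skylake-X
bdw = mirrored_fraction(0.25, 2.5)   # Broadwell-E

print(f"{skx:.1%}")  # 72.7%
print(f"{bdw:.1%}")  # 10.0%
```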

Stop worrying about level 3 cache.
Level 3 cache is no better than RAM. And you really don't need an extra megabyte of RAM.

No better than RAM huh? Right.

An inclusive L3$ benefits cache coherency performance between CPUs (or in the case of AMD, dice within an MCM package).

I was also thinking of that. HEDT customers won't worry since they will all be on 1P systems, but for Xeon buyers, that could be an issue.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Couldn't we turn off some cores on some chips and get an idea of what the different L3 cache levels do to performance?
IOW make the chips the same except for the L3 cache level.

Given some examples such as i7 vs i5 and i3 vs Pentium, I suspect it doesn't affect the performance much?

Maybe make a couple of BW-E chips the same except for the L3 amounts?
 

Sweepr

Diamond Member
May 12, 2006
5,148
1,142
131
Kaby Lake-G rumors getting intense:

Raja Koduri, Senior Vice President and Chief Architect, Radeon Technologies Group worked at Apple and worked closely with Intel too. He probably played an important role in these negotiations.

It remains to be seen when it will be formally announced and when it will start affecting AMD’s bottom line. The cooperation and agreement will allow Intel to access AMD’s graphics Intellectual Properties and most things Radeonish.

AMD will weaken its position to fight Intel in integrated solutions, but licensing money should help overcome that issue. Despite the fact that these companies compete, they are close when it comes to graphics.

Of course, now that we know of the existence of the deal, we will dig much harder to get many more details about potential new products and whether Radeon will get inside future Intel CPUs. Intel takes a lot of time to implement a new architecture.

http://www.fudzilla.com/news/graphics/43663-intel-is-licensing-amd-graphics
 

Hans de Vries

Senior member
May 2, 2008
321
1,018
136
www.chip-architect.com
Pretty much confirms it doesn't have full speed AVX-512.
How do you figure from that result?

Because the so called "Multi Media" test used Mandelbrot/Julia
fractal programs which run almost entirely within the CPU's
register-set.

That is:

The bandwidth from the AVX CPU-registers to the L1D cache is
virtually irrelevant, let alone the bandwidth to L2, L3 and DRAM.

This means that the performance is proportional to the SIMD
vector length (512) divided by the number of cycles it takes
to execute the AVX operation.

The result is more or less the same as for the six core i7-6800K
so it looks like it handles 256 bits per cycle.

http://ranker.sisoftware.net/show_d...b984b593fbc6f3d5ad90a187e287ba8aacdfe2d2&l=en

SiSoft Sandra has already implemented the snippets of AVX-512
assembly code needed for these tests:

http://www.sisoftware.eu/2016/02/24/future-performance-with-avx512-in-sandra-2016-sp1/

But it seems there are no results which show any sign of full
speed AVX-512 execution.

http://ranker.sisoftware.net/top_de...f3ceffd9b18cb99fe7daebcda8cdf0c0e695a898&l=en

Of course real Multi Media (images, videos) does not quite fit in
the CPU register file! Calling it representative for Multi Media
tasks is therefore misleading. 512 bit words at full speed need
an enormous bandwidth for real life applications.
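To illustrate why a Mandelbrot/Julia kernel barely touches memory, here is a minimal scalar sketch of the escape-time inner loop. This is not SiSoft Sandra's code; the function name and the iteration cap are made up for illustration. A SIMD version would run the same loop on 4 doubles per 256-bit register or 8 doubles per 512-bit register in lockstep.

```python
def escape_time(cr, ci, max_iter=100):
    """Iterate z = z*z + c from z = 0 and return the step at which |z| > 2.

    The whole loop state is zr, zi, their squares and a counter, so a
    compiled version of this loop can live entirely in registers.
    """
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:          # |z|^2 > 4 means the point escaped
            return n
        zi = 2.0 * zr * zi + ci      # imaginary part of z*z + c
        zr = zr2 - zi2 + cr          # real part of z*z + c
    return max_iter

print(escape_time(0.0, 0.0))  # 100 (never escapes)
print(escape_time(2.0, 2.0))  # 1
```

Nothing in the loop loads or stores data beyond the two input coordinates, which is why throughput here tracks vector width and instruction latency and not cache or DRAM bandwidth.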
 

mikk

Diamond Member
May 15, 2012
4,175
2,211
136
They'll need something bigger than 149mm^2 if they are to fit 2MB L3/core.


You were wrong on this. I've got some news and I can say Coffeelake 6/12 comes with 12 MB L3. Don't state things as given fact next time without confirmation.

Also Coffeelake-S is going to support DDR4-2667, at least the 6C version. I'm not sure about the 4C version, it might support DDR4-2400 only, not sure.
 
Mar 10, 2006
11,715
2,012
126
I still can't see why AMD would want Intel to essentially be able to make APUs to compete with AMD APUs and with AMD low end GPUs.

If Apple wants a single MCM/single chip solution in place of discrete Polaris 11 + Intel CPU, then a product like Kaby Lake-G would make sense.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
You were wrong on this. I've got some news and I can say Coffeelake 6/12 comes with 12 MB L3. Don't state things as given fact next time without confirmation.

Also Coffeelake-S is going to support DDR4-2667, at least the 6C version. I'm not sure about the 4C version, it might support DDR4-2400 only, not sure.
We know 6C/6T is 9MB L3, and unless die sizes are official, I'm not wrong - perhaps they are using a bigger die than 149mm^2 or the 149mm^2 figure was inaccurate to begin with.

What's so special about DDR4-2667 support? Ryzen supports DDR4-2667 in single-rank 2 DIMM configuration.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
We know 6C/6T is 9MB L3, and unless die sizes are official, I'm not wrong - perhaps they are using a bigger die than 149mm^2 or the 149mm^2 figure was inaccurate to begin with.

What's so special about DDR4-2667 support? Ryzen supports DDR4-2667 in single-rank 2 DIMM configuration.
It would be faster memory than KL or BW-E supports. Not everything is about RyZen...
I'm not sure that CL support of 2667 is accurate, though.

I have seen no bench leaks of a 6C12T part, which would presumably have 12 MB of L3.
How this fits in with 1151, I don't know.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
It would be faster memory than KL or BW-E supports. Not everything is about RyZen...
I'm not sure that CL support of 2667 is accurate, though.

I have seen no bench leaks of a 6C12T part, which would presumably have 12 MB of L3.
How this fits in with 1151, I don't know.
Every new generation has officially supported faster memory than the previous one in the past few years. What would have been surprising is if it didn't do so.
 
Mar 10, 2006
11,715
2,012
126
We know 6C/6T is 9MB L3, and unless die sizes are official, I'm not wrong - perhaps they are using a bigger die than 149mm^2 or the 149mm^2 figure was inaccurate to begin with.

What's so special about DDR4-2667 support? Ryzen supports DDR4-2667 in single-rank 2 DIMM configuration.

Bro, the 6C/12T CFL-S will have 12MB of L3$, not 9MB.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Every new generation has officially supported faster memory than the previous one in the past few years. What would have been surprising is if it didn't do so.
Except that this is still really just Skylake/Kabylake. It's not even close to a new generation. It's a 6C KL chip.
However, I think the point of the poster who mentioned the 2667 might have been for performance numbers.
Bench testers often use only the officially supported memory speeds, so if CL supports 2667, that would give it an advantage over SL/KL.
It could also mean that boards will be able to use higher speed overclocked memory than previously, if the board is validated for 2667.
 