Question [AT] Cannon Lake deep dive review

Page 2

NTMBK

Lifer
Nov 14, 2011
10,270
5,135
136
The benefit to AVX-512 is that it's right there on the CPU. If you're a data center and you're going to buy a bunch of CPUs, it's really nice to have that capability right there in the silicon you're already buying. Don't dismiss that advantage over GPGPU.

Sure, but in the context of these low power parts, the ARM chips they are competing with will have compute capable GPUs, DSPs, DL accelerators etc. integrated into the SoC.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
That also doesn't explain why they seem to feel the need to put it into every chip they're making.

This review rather shows some massive trade-offs involved...
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
How many of those trade-offs are directly or indirectly related to AVX512 capability?

A lot, actually. AVX-512 takes up a lot of space; on SL-X it was estimated to be 20-25% of the core area. No matter who you are, AVX-512 doesn't have enough consumer potential to justify a place on a consumer die, let alone when it accounts for as much die space as it does. Those transistors would be better spent on just about anything else.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Deep dive, but some basic questions are left unanswered. The sore spot that can hurt performance a lot is memory latency. Just installing some random DIMMs does not make them run at their SPD speed of 2400 CL17; they are much more likely to run in JEDEC safe mode at 2133 with some hilariously loose primary timings. In the Core i3-8121U test from October, it was running DDR4L at 2400 CL24, for an AIDA64 latency of 100 ns.

Given AT's results, they must have had something even worse, and in the end that horrible memory latency ended up hurting CNL core performance in quite a few tests.

I have to call bs on this:

At first we thought this was a bug. For both systems to have dual channel memory and running DDR4-2400, something had to be wrong. We double checked the setups – both systems were running in dual channel mode, giving the same memory bandwidth. The Cannon Lake processor was running at DDR4-2400 17-17-17, whereas the Kaby Lake system was at DDR4-2400 16-16-16 (due to memory SPD differences), which isn’t a big enough change to have such a big difference. The only reason we can come up with is that the memory controller on Cannon Lake must have additional overhead from the core to the memory controller – either a slower than expected PLL or something.

Strong words, especially given this statement on the previous page. They had no way of knowing the frequency and timings?

For the Cannon Lake based Lenovo Ideapad 330-15ICN, we removed the low-end SSD and HDD that was shipped with the design and put in our own Crucial MX200 1TB and 2x4 GB DDR4 SO-DIMMs for testing. Unfortunately we can’t probe the exact frequency the memory seems to be running at, nor the sub-timings, because of the nature of the system. However the default SPD of the modules is DDR4-2400 17-17-17.
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
How many of those trade-offs are directly or indirectly related to AVX512 capability?

Also, concretely: doesn't the review show a non-trivial performance regression under AVX2 code?

That seemed likely to be somehow down to the AVX-512 support.

If it is, then that's surely not ideal, given how long it might take to get software recompiled to use AVX-512.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
Deep dive, but some basic questions are left unanswered. The sore spot that can hurt performance a lot is memory latency. Just installing some random DIMMs does not make them run at their SPD speed of 2400 CL17; they are much more likely to run in JEDEC safe mode at 2133 with some hilariously loose primary timings. In the Core i3-8121U test from October, it was running DDR4L at 2400 CL24, for an AIDA64 latency of 100 ns.

Given AT's results, they must have had something even worse, and in the end that horrible memory latency ended up hurting CNL core performance in quite a few tests.

I have to call bs on this:



Strong words, especially given this statement on the previous page. They had no way of knowing the frequency and timings?
They couldn't run CPU-Z or something?
 

DrMrLordX

Lifer
Apr 27, 2000
21,819
11,173
136
A lot, actually. AVX-512 takes up a lot of space; on SL-X it was estimated to be 20-25% of the core area. No matter who you are, AVX-512 doesn't have enough consumer potential to justify a place on a consumer die, let alone when it accounts for as much die space as it does. Those transistors would be better spent on just about anything else.

My impression is that Intel expected the node shrink to 10nm to "take care of the problem", and in a worst-case scenario we might see some changes to cache architecture that might be unfavorable for overall CPU operation compared to the cache architecture from the Skylake family (consider Skylake vs Skylake-X). So I don't think the trade-offs were anything that would explain why the i3-8121U was so much slower than the i3-8130U in some of the tests.

I see what you're saying, and that the implementation of 512-bit SIMD would take up a lot of die space. I'm just not sure that AVX-512 made the situation "worse" than, say, a Skylake-derived chip. It certainly did prevent them from improving some other aspects of the uarch.

Deep dive, but some basic questions are left unanswered. The sore spot that can hurt performance a lot is memory latency. Just installing some random DIMMs does not make them run at their SPD speed of 2400 CL17; they are much more likely to run in JEDEC safe mode at 2133 with some hilariously loose primary timings. In the Core i3-8121U test from October, it was running DDR4L at 2400 CL24, for an AIDA64 latency of 100 ns.

Given AT's results, they must have had something even worse, and in the end that horrible memory latency ended up hurting CNL core performance in quite a few tests.

I have to call bs on this:

Strong words, especially given this statement on the previous page. They had no way of knowing the frequency and timings?

I was a bit disappointed with that aspect of the review. There certainly were ways for them to find out the frequency and timings. A custom tool like RyzenTimingChecker would do the trick; someone would have to code an equivalent for this platform, but it would be doable. I think Thaiphoon Burner would at least read the SPD data off the DIMMs, and CPU-Z, HWiNFO64, and any number of other applications should be able to give them at least the speed and primary timings.

My guess is they used CPU-Z, got the speed and primaries, and called it a day. Their apparent inability to read the subtimings (note the exact wording of the review) probably comes from the system having a barebones UEFI.

Also, concretely: doesn't the review show a non-trivial performance regression under AVX2 code?

It did seem a little odd, didn't it? I'd have to re-read the thing to get a better idea of exactly which tests had apparent AVX2 performance regression for the i3-8121U.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
My guess is they used CPU-Z, got the speed and primaries, and called it a day. Their apparent inability to read the subtimings (note the exact wording of the review) probably comes from the system having a barebones UEFI.

Or they assumed it was running DDR4-2400 CL17, because that is what was in the SPD?

A quick glance at http://users.atw.hu/instlatx64/GenuineIntel0060663_CannonLake_NewMemLat.txt doesn't show any crazy latencies worth noting; it looks good for a mobile-class memory subsystem.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
My impression is that Intel expected the node shrink to 10nm to "take care of the problem", and in a worst-case scenario we might see some changes to cache architecture that might be unfavorable for overall CPU operation compared to the cache architecture from the Skylake family (consider Skylake vs Skylake-X). So I don't think the trade-offs were anything that would explain why the i3-8121U was so much slower than the i3-8130U in some of the tests.

I see what you're saying, and that the implementation of 512-bit SIMD would take up a lot of die space. I'm just not sure that AVX-512 made the situation "worse" than, say, a Skylake-derived chip. It certainly did prevent them from improving some other aspects of the uarch.

No, that's a good point. I can't say whether it had a negative impact on performance comparatively. I was thinking of the indirect cost: without it, they would have had room for other uarch improvements.
 

DrMrLordX

Lifer
Apr 27, 2000
21,819
11,173
136
Or they assumed it was running DDR4-2400 CL17, because that is what was in the SPD?

Also possible. It would have been nice if they had given us at least a CPU-Z screenshot.

No, that's a good point. I can't say whether it had a negative impact on performance comparatively. I was thinking of the indirect cost: without it, they would have had room for other uarch improvements.

Right, which is a valid concern. AVX-512 is an odd choice for a mobile product (an unfortunate side effect of what was probably a plan to use the same basic core in everything from mobile parts to S-class desktop chips).
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Also possible. It would have been nice if they had given us at least a CPU-Z screenshot.

Yeah. Or, even better, an AIDA64 memory/cache benchmark screenshot. Those latencies and bandwidths would instantly validate so many things, like dual-channel mode, uncore clocks, and so on.
 