Discussion Intel current and future Lakes & Rapids thread

Page 307

Ajay

Lifer
Jan 8, 2001
16,094
8,106
136
It may as well not have.

That aside, 250W for an 8c seems a bit ridiculous. One of the potential problems I see here is that, in games, Rocket Lake-S may actually chew up more power than a 10900k since a 10900k may not use all its cores during gaming. Rocket Lake-S is more likely to do so. That increase in performance will come at a price.
Unless you play AOS, I don't think there will be added power consumption because of the number of game threads. If RKL sucks more power, it's just because a 10nm architecture, shoehorned into 14nm and running at 14nm clocks, sucks power like there is no tomorrow.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,453
136
According to Notebookcheck ADL-S supports 3 display outputs and RKL-S supports only 3 display outputs. You have to understand that the display and media engine could differ even when both are Xe LP based.

Tiger Lake's Gen12 supports 4 display outputs. If Rocket Lake-S doesn't support that, I'd have to think that's due to LGA1200 packaging limitations.
 

utahraptor

Golden Member
Apr 26, 2004
1,053
199
106
My take on Rocket Lake. If Intel and AMD are car companies with Intel being Chevy and AMD being Ford it would seem Ford switched to electronic fuel injection (12nm) recently and has ironed out all the bugs. Chevy tried (10nm) and failed so they are bringing back carburetors (14nm) one last time, but they managed to get remote start working! (PCIE 4.0.)
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
If the clockspeed of Rocket Lake is really good then this could be a nice upgrade because the Sunny Cove core incorporates quite a lot of new features, probably the biggest jump since Ivy Bridge to Haswell. Compared to Skylake we're looking at a larger L1 data cache, addition of another complex micro-op decoder, larger L2 AND L3 cache, larger micro-op cache, larger/better OoO scheduler, and of course 2 more execution ports on the back end. It's a huge upgrade. Intel really opened up the intake and exhaust!

One thing that I keep wondering is why they didn't go with the Willow Cove core which has larger L2 and L3 caches? But I'm thinking at 14nm the additional die space was just too much from a cost/performance point of view.

As for the big/little approach it could be a real winner. 95% of PC users are probably using about 20% of available compute 90% of the time. But when you need that additional compute (video editing, photo editing, DAW, etc.) you need it! If the little cores could be "tuned" to work well on these redundant operations then perhaps this hybrid design could result in a really powerful and efficient cpu.
 

AMDK11

Senior member
Jul 15, 2019
341
235
116
@Hulk
Because WillowCove has a non-inclusive cache subsystem, which is a much slower solution when exchanging data between cores. There is multi-threaded software, including games, that is very sensitive to this.

Skylake L1D 32KB, L2 256KB and L3 2MB (inclusive)

SunnyCove L1D 48KB, L2 512KB and L3 2MB (inclusive)

CypressCove L1D 48KB, L2 512KB and L3 2MB (inclusive)

Skylake-X L1D 32KB, L2 1MB and L3 1.375MB (non-inclusive)

WillowCove L1D 48KB, L2 1.25MB and L3 3MB (non-inclusive)
 

yuri69

Senior member
Jul 16, 2013
437
717
136
True that. The Willow Cove's memory subsystem is similar to the "server Skylake" which sucks in games.
 

DrMrLordX

Lifer
Apr 27, 2000
21,805
11,159
136
Unless you play AOS, I don't think there will be added power consumption because of the number of game threads. If RKL sucks more power, it's just because a 10nm architecture, shoehorned into 14nm and running at 14nm clocks, sucks power like there is no tomorrow.

We aren't exactly disagreeing. It's about utilization of available computational resources.

If I have an 8t game (for example), a 10c chip with narrower cores vs. an 8c chip with wider cores will use less power, assuming only 8c are ever used and assuming the same TDP/PL1/PL2 values.

I would expect Rocket Lake-S to use more power per core per MHz. At that point it's a matter of how many cores are engaged to see which CPU will use more power. In a workload like CBR20 or Blender, a 10900k should use the same amount of power.
 

AMDK11

Senior member
Jul 15, 2019
341
235
116
True that. The Willow Cove's memory subsystem is similar to the "server Skylake" which sucks in games.
There are rumors that Intel deployed a non-inclusive cache under pressure from cloud customers.
 

AMDK11

Senior member
Jul 15, 2019
341
235
116
Be kind of dumb to do inclusive if you have 1 MB L2/core and 1.375 MB L3/core.
All in all, you're right, it was more about a larger L2 than non-inclusive. Simply put, for an inclusive design to make sense the L3 would have to be several times larger than the 1.25MB L2 (3-4x I think). On the other hand, while Skylake-X going non-inclusive was logical because its L3 is only slightly larger than its L2, WillowCove has an L3 2.4x larger than its L2. It can be seen from Alderlake leaks that GoldenCove has the same cache subsystem and capacities as WillowCove, which suggests that Intel is standardizing high-performance x86 cores without dividing them into inclusive and non-inclusive.
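To put numbers on the L3-to-L2 argument, here is a minimal sketch that computes the ratio for each core from the per-core sizes listed earlier in the thread. The dictionary layout and function names are made up for illustration, and the "duplicated fraction" is only a rough way to express how much of an inclusive L3 slice would be spent mirroring one core's L2.

```python
# Per-core cache sizes in KB, copied from the list posted earlier in the thread.
CACHES_KB = {
    "Skylake":     {"l2": 256,  "l3": 2048},  # inclusive
    "SunnyCove":   {"l2": 512,  "l3": 2048},  # inclusive
    "CypressCove": {"l2": 512,  "l3": 2048},  # inclusive
    "Skylake-X":   {"l2": 1024, "l3": 1408},  # non-inclusive (1.375 MB)
    "WillowCove":  {"l2": 1280, "l3": 3072},  # non-inclusive (1.25 MB / 3 MB)
}


def l3_to_l2_ratio(core):
    """How many times larger the per-core L3 slice is than the L2."""
    sizes = CACHES_KB[core]
    return sizes["l3"] / sizes["l2"]


def duplicated_fraction(core):
    """Share of the L3 slice that would mirror the L2 if the L3 were inclusive."""
    sizes = CACHES_KB[core]
    return sizes["l2"] / sizes["l3"]


if __name__ == "__main__":
    for core in CACHES_KB:
        print(f"{core:12s} L3/L2 = {l3_to_l2_ratio(core):.3f}x, "
              f"duplicated if inclusive = {duplicated_fraction(core):.0%}")
```

With these sizes the inclusive cores sit at 4x to 8x, Skylake-X at 1.375x and WillowCove at 2.4x, which is below the 3-4x guess above.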
 

mikk

Diamond Member
May 15, 2012
4,173
2,210
136
Some might remember this test:
On YouTube there is a Vivobook 15 with an i7-1165G7 running dual-channel DDR4-3200. It confirms that Xe LP is heavily bandwidth starved:


Firestrike graphics: 4000
Time spy graphics: 1100

LPDDR4x-4266 ultrabook devices can reach 5500 in Firestrike and 1500+ in Time Spy (without throttling). It's a 35% gap, which is massive. And according to JZWSVIC, single/dual rank memory also makes a big difference: https://www.zhihu.com/question/424267432/answer/1509741848

He says dual-rank DDR4-3200 can match single-rank LPDDR4x-4266; however, all 2x8 GB DDR4 devices will likely use single-rank memory. Bad news for DDR4 devices but good news for ADL-P which will support DDR5 - because it indicates that Xe LP will scale further up with faster RAM.
 

jpiniero

Lifer
Oct 1, 2010
14,835
5,453
136
All in all, you're right, it was more about a larger L2 than non-inclusive. Simply put, for an inclusive design to make sense the L3 would have to be several times larger than the 1.25MB L2 (3-4x I think). On the other hand, while Skylake-X going non-inclusive was logical because its L3 is only slightly larger than its L2, WillowCove has an L3 2.4x larger than its L2. It can be seen from Alderlake leaks that GoldenCove has the same cache subsystem and capacities as WillowCove, which suggests that Intel is standardizing high-performance x86 cores without dividing them into inclusive and non-inclusive.

I think Rocket Lake's cache hierarchy is mostly because it's derived from Sunny Cove; but it's going to be Intel's biggest mainstream die in a long time as it is...
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
For fun I made a little table comparing IPC for all of the generations of Intel Core Architecture and some of the P4's. I averaged results for like architectures (P4's and Skylakes). Last column shows IPC improvement from generation-to-generation.
Benchmark is Cinebench 11.5 single threaded and scores have been weighted to account for differences in clockspeed.

Yes I know this is a very limited analysis but I thought it might be fun to see.

If anyone has some older systems (pre P4) and would like to run the bench that would be great. Also, there is no install for the benchmark; it runs from the exe.

Of course Conroe shows the greatest IPC improvement, coming from P4. Nehalem and Sandy Bridge are also impressive. Less so Haswell. Sunny Cove is showing double digit improvement.

Architecture  | Model      | Clockspeed (GHz) | Cinebench 11.5 ST | Weighted | % of Willow Cove | Average | Improvement
--------------|------------|------------------|-------------------|----------|------------------|---------|------------
Pentium 4     | Smithfield | 4.336            | 0.56              | 0.66     | 23.1%            | 24.7%   |
Pentium 4     | Prescott   | 3.2              | 0.42              | 0.67     | 23.5%            |         |
Pentium 4     | Dothan     | 4.272            | 0.6               | 0.72     | 25.2%            |         |
Pentium 4     | Cedarmill  | 3                | 0.43              | 0.73     | 25.7%            |         |
Pentium 4     | Presler    | 4.007            | 0.58              | 0.74     | 25.9%            |         |
Conroe        | C2D        | 2.53             | 0.76              | 1.53     | 53.8%            | 53.8%   | 29.1%
Nehalem       | 965x       | 3.2              | 1.14              | 1.82     | 63.8%            | 63.8%   | 10.0%
Sandy Bridge  | 2700k      | 3.8              | 1.52              | 2.04     | 71.6%            | 71.6%   | 7.8%
Ivy Bridge    | 3770k      | 3.9              | 1.64              | 2.14     | 75.3%            | 75.3%   | 3.7%
Haswell       | 4770k      | 3.9              | 1.76              | 2.30     | 80.8%            | 80.8%   | 5.5%
Broadwell     | 5200u      | 2.7              | 1.26              | 2.38     | 83.6%            | 83.6%   | 2.8%
Skylake       | 6700k      | 4.2              | 2.05              | 2.49     | 87.4%            | 86.9%   | 3.3%
Kaby Lake     | 7700k      | 4.5              | 2.19              | 2.48     | 87.2%            |         |
Coffee Lake   | 8086k      | 5                | 2.4               | 2.45     | 86.0%            |         |
Coffee Lake R | 9900k      | 5                | 2.43              | 2.48     | 87.0%            |         |
Comet Lake    | 10900k     | 5.1              | 2.47              | 2.47     | 86.7%            |         |
Sunny Cove    | Ice Lake   | 3.9              | 2.16              | 2.82     | 99.2%            | 99.2%   | 12.3%
Willow Cove   | Tiger Lake | 4.8              | 2.68              | 2.85     | 100.0%           | 100.0%  | 0.8%
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
The P4's and Skylakes have an averaged percent value but the merged cell didn't come through the copy/paste operation. It's 24.7% average for P4's and 86.9% average for the 'lakes.

All percentages are based on the fastest core IPC-wise (Willow) being 100% performance-wise.
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
All results except for the P4's came from Anandtech reviews so they should be pretty much apples-to-apples.

Willow Cove does show a slight improvement from Sunny in this bench but Intel probably decided it wasn't worth the extra die space right? I mean for Rocket Lake.
 

AMDK11

Senior member
Jul 15, 2019
341
235
116
I think Rocket Lake's cache hierarchy is mostly because it's derived from Sunny Cove; but it's going to be Intel's biggest mainstream die in a long time as it is...
The difference between SunnyCove and WillowCove is the same as with Skylake and Skylake-X. The x86 core logic remains the same, the major change is in the cache subsystem.

Multithreaded code, including games, is sensitive to fast communication between the cores, and the inclusive cache subsystem is much faster in this regard: core B does not have to poll core A for the L1/L2 data it is currently processing, because it gets copies from L3. Additionally, L2 on SunnyCove is much faster than on WillowCove. WillowCove will have an advantage with independent threads and in programs sensitive to L2 capacity.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Multithreaded code, including games, is sensitive to fast communication between the cores, and the inclusive cache subsystem is much faster in this regard: core B does not have to poll core A for the L1/L2 data it is currently processing, because it gets copies from L3. Additionally, L2 on SunnyCove is much faster than on WillowCove. WillowCove will have an advantage with independent threads and in programs sensitive to L2 capacity.


There is quite a lot of multithreaded code that does not care about inter-thread communication at all. Just look at the 32C ZEN1 Threadrippers: some of their dies had no direct access to memory at all, yet they scaled just fine in MT loads like Cinebench and co.

Inclusive L3 is not the only way to do fast inter-thread communication either. Chips like SKL-X and AMD ZEN either keep shadow tags for what is in the inner core caches, or said shadow tags are part of the lower cache structure. Just look at the 3300X, which has a single CCX enabled: its L3 is not inclusive and yet it has 12-24ns inter-thread comm latency.

It was mentioned in this thread already that it is completely impractical to have an inclusive L3 when the L2 is nearly the same size. What was not said is that an inclusive L3 can become a bottleneck in various ways:
1) For example, a single thread busy reading/writing memory will keep dirtying the L3 and evicting the working sets of every other thread, since everything needs to go through L3. Intel used to have some technologies in HSW/BDW generation server chips for "partitioning" the cache, but they were pretty much impossible to use without a black belt in know-how. SKL-SP with its large L2 solved this not only for cloud titans, but for average joes like me.
2) Even if bandwidth use is small, cores still impact each other in unpredictable ways, as cache ways, slice address partitioning etc. are limited resources, and when you have multiple cores generating load, scalability will suffer.
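Point 1) can be illustrated with a toy simulation. This is only a sketch under heavy assumptions: fully associative LRU caches, made-up sizes counted in lines, one "game" core looping over a tiny working set and one "stream" core touching new lines forever. The non-inclusive case is modeled simply as "no back-invalidation", which ignores how real non-inclusive designs use the L3 as a victim cache with a snoop filter. All names here are hypothetical.

```python
from collections import OrderedDict


class LruCache:
    """Tiny fully associative LRU cache that tracks line addresses only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def __contains__(self, addr):
        return addr in self.lines

    def fill(self, addr):
        """Insert or refresh addr; return the evicted address, if any."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return None
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None

    def drop(self, addr):
        self.lines.pop(addr, None)


def game_l2_misses(inclusive, rounds=40, l2_lines=8, l3_lines=16):
    """Count L2 misses of the 'game' core while a 'stream' core thrashes L3."""
    l2 = {"game": LruCache(l2_lines), "stream": LruCache(l2_lines)}
    l3 = LruCache(l3_lines)
    misses = 0
    next_stream_addr = 1000

    def access(core, addr):
        if addr in l2[core]:
            l2[core].fill(addr)  # L2 hit: L3 never sees it, so its LRU info goes stale
            return True
        if inclusive:
            victim = l3.fill(addr)  # every L2 fill must also allocate in L3
            if victim is not None:
                for cache in l2.values():
                    cache.drop(victim)  # back-invalidate to keep L3 inclusive
        l2[core].fill(addr)
        return False

    for _ in range(rounds):
        for addr in range(4):  # game core: 4 hot lines, fits easily in its L2
            if not access("game", addr):
                misses += 1
        for _ in range(4):  # stream core: 4 brand-new lines every round
            access("stream", next_stream_addr)
            next_stream_addr += 1
    return misses


if __name__ == "__main__":
    print("game L2 misses, inclusive L3:     ", game_l2_misses(True))
    print("game L2 misses, no back-invalidate:", game_l2_misses(False))
```

In the inclusive case the game core keeps losing lines that are hot in its own L2, purely because the other core pushed them out of L3. Without back-invalidation it only takes the initial cold misses.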

So this cache redesign was required to keep growing caches. And "non-inclusive" SKL-SP hurting games was not due to the L3 architecture; rather, the implementation was horrible: 2.4GHz uncore, terrible L3 latency and abysmal cumulative bandwidth.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Some might remember this test:
On YouTube there is a Vivobook 15 with an i7-1165G7 running dual-channel DDR4-3200. It confirms that Xe LP is heavily bandwidth starved:

I'm still very skeptical that it's that bound by memory bandwidth. A 35% gap with 33% more bandwidth is actually showing super-linear scaling. Even 100% scaling sounds unrealistic because it means literally nothing else matters other than memory bandwidth - forget EUs, texture units, caches, power limits.
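The arithmetic behind that can be sketched quickly using the scores quoted above. One assumption is baked in: both memory configurations are treated as having the same bus width, so bandwidth scales with the transfer rate alone (3200 vs 4266 MT/s). The helper name is made up.

```python
def pct_gain(new, old):
    """Percentage increase going from old to new."""
    return 100.0 * (new / old - 1.0)


# Assumption: same total bus width, so bandwidth scales with transfer rate.
bandwidth_gain = pct_gain(4266, 3200)

# Graphics scores quoted in the thread (DDR4-3200 vs LPDDR4x-4266 devices).
firestrike_gain = pct_gain(5500, 4000)
timespy_gain = pct_gain(1500, 1100)  # "1500+" in the post, so this is a floor

# A value above 1.0 means the score grew faster than the bandwidth did.
firestrike_scaling = firestrike_gain / bandwidth_gain
timespy_scaling = timespy_gain / bandwidth_gain

if __name__ == "__main__":
    print(f"bandwidth:  +{bandwidth_gain:.1f}%")
    print(f"Firestrike: +{firestrike_gain:.1f}% (scaling {firestrike_scaling:.2f})")
    print(f"Time Spy:   +{timespy_gain:.1f}% (scaling {timespy_scaling:.2f})")
```

Both ratios come out above 1.0, which is the super-linear result that looks suspicious: something other than bandwidth (power limits, cooling, rank configuration) is probably contributing to the gap.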

Ultrabookreview did a comparison between the i5 and the i7: https://www.ultrabookreview.com/41841-asus-zenbook-14-ux425ea-review/

Even at 19W, the 1165G7 is noticeably faster than the 1135G7 (ignore the NFS result, as it's likely capped at the display refresh rate of 60Hz).

Why is the Iris Xe Max showing only minimal gains despite having 25W all to itself and dedicated LPDDR4x?

Plus, it's a Vivobook, meaning it's Asus, which underperforms. The 17W Acer outperforms the 26W Asus in their review, which can explain the discrepancy.

@Hulk
Good, but a few things to note.

-Dothan isn't Pentium 4, it's the 90nm update of the Pentium M. There's no way Core 2 is 2x as fast as Dothan per clock.
-Core i7 965EE "Nehalem" also has a Turbo clock of 3.46GHz. With that out of the way, it'll be at 1.05/1.68, or 10% gain over Conroe. 2700K has 3.9GHz Turbo, not 3.8GHz. 2700K will be at 1.48/1.99, showing 18% gain over Nehalem. Now that makes a lot more sense!
-Haswell and Skylake are performing relatively lower than they should. Though this isn't your fault.

Their results are just very low in general. 5% over Haswell?
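The turbo correction above can be reproduced with a short sketch. It takes the weighted scores from the table (1.53 Conroe, 1.82 Nehalem, 2.04 Sandy Bridge) and rescales them for the higher turbo clocks. Note that the gains here are relative to the previous generation, unlike the table's "Improvement" column. The function name is made up.

```python
def rescale(value, assumed_ghz, actual_ghz):
    """Re-normalize a per-clock figure when the chip really ran at actual_ghz."""
    return value * assumed_ghz / actual_ghz


conroe_weighted = 1.53                        # from the table, unchanged
nehalem_weighted = rescale(1.82, 3.2, 3.46)   # 965EE turbo is 3.46GHz, not 3.2GHz
sandy_weighted = rescale(2.04, 3.8, 3.9)      # 2700K turbo is 3.9GHz, not 3.8GHz

# Relative per-clock gain over the previous generation, in percent.
nehalem_gain = 100.0 * (nehalem_weighted / conroe_weighted - 1.0)
sandy_gain = 100.0 * (sandy_weighted / nehalem_weighted - 1.0)

if __name__ == "__main__":
    print(f"Nehalem weighted {nehalem_weighted:.2f}, +{nehalem_gain:.0f}% over Conroe")
    print(f"Sandy Bridge weighted {sandy_weighted:.2f}, +{sandy_gain:.0f}% over Nehalem")
```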
 

mikk

Diamond Member
May 15, 2012
4,173
2,210
136
Plus, it's a Vivobook, meaning it's Asus, which underperforms. The 17W Acer outperforms the 26W Asus in their review, which can explain the discrepancy.


Looks like you are watching the best run from the Swift, it uses more than 17W. In Cinebench R15 there is a big drop off from 970 to 700 after a few runs. At 17W it scores 700, it looks similar to the Asus. There is no Asus running on 26W sustained load on ultrabookreview, only up to 18W. Where do you see 26W?

In terms of cooling and CPU performance the Vivobook S15 performs very well. He wrote in Cinebench R20 it scores 2190 in the first run and even after 10 runs it scores 2120. Wattage stays at 28W and the temps are around 75 degrees. The cooling is much better than the LPDDR4x ultrabook devices from Asus or Swift. In fact the Vivobook S15 was one of the best if not best sustained load CML-U performer on laptopmedia.




I would hope there is a specific S15 issue involved which hurts iGPU performance a lot but I don't believe in this, unfortunately.
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
@Hulk
Good, but a few things to note.

-Dothan isn't Pentium 4, it's the 90nm update of the Pentium M. There's no way Core 2 is 2x as fast as Dothan per clock.
-Core i7 965EE "Nehalem" also has a Turbo clock of 3.46GHz. With that out of the way, it'll be at 1.05/1.68, or 10% gain over Conroe. 2700K has 3.9GHz Turbo, not 3.8GHz. 2700K will be at 1.48/1.99, showing 18% gain over Nehalem. Now that makes a lot more sense!
-Haswell and Skylake are performing relatively lower than they should. Though this isn't your fault.

Their results are just very low in general. 5% over Haswell?

As I was typing "Dothan" I was thinking to myself "I don't remember this as a P4?" Good catch.
That 965 score at 3.2GHz was recorded in the Haswell review written by Anand himself. He noted the frequency at 3.2GHz for that test. The 2700k was actually a 2600k score.

I can't stand behind the P4/Dothan result but except for the 2600k typo the other results are from Anandtech benches so I think these results are accurate. I will do some digging though.

I've done other such IPC comparisons such as this and the generation-to-generation gains are generally smaller than the numbers thrown around by both manufacturers and users. A 10% IPC gain in the post Conroe era is quite an accomplishment. As Anand wrote long ago "all of the low hanging fruit has been picked."

And of course this is just one benchmark we're looking at, single core and frequency isolated so it is really a look at IPC alone.

Skylake to Sunny Cove has the largest IPC increase in the post Conroe era. Quite an accomplishment for Intel. Rocket Lake may be more formidable than I had imagined...

Here is where I got most of the stats: https://www.anandtech.com/bench/CPU-2019/2199

And here is the 4770k review: https://www.anandtech.com/show/7003/the-haswell-review-intel-core-i74770k-i54560k-tested/6

I think Anand may have not included the fact that the 4770k and 3770k turbo up to 3.9 as those scores are what I'm seeing from those parts at 3.9GHz.
 

ondma

Platinum Member
Mar 18, 2018
2,771
1,351
136
I don't even consider Anand a go-to reviewer anymore. Many of its later reviews (from Skylake on) are handicapped by limiting RAM speeds to the officially supported values, even for a K model chip.
 

Hulk

Diamond Member
Oct 9, 1999
4,373
2,251
136
For what it's worth I double-checked the results and found two additional Nehalem results, which I averaged with the one I had from Anand.

P4 to Conroe: 29.2%
Conroe to Nehalem: 8.6%
Nehalem to Sandy Bridge: 9.3%
Sandy Bridge to Ivy Bridge: 3.7%
Ivy Bridge to Haswell: 6.4%
Haswell to Broadwell: 1.8%
Broadwell to Skylake: 3.3%
Skylake to Ice Lake: 12.3%
Ice Lake to Tiger Lake: 0.8%


Architecture  | Model      | Clockspeed (GHz) | Cinebench 11.5 ST | Weighted | % of Willow Cove | Average | Improvement
--------------|------------|------------------|-------------------|----------|------------------|---------|------------
Pentium 4     | Smithfield | 4.336            | 0.56              | 0.66     | 23.1%            | 24.6%   |
Pentium 4     | Prescott   | 3.2              | 0.42              | 0.67     | 23.5%            |         |
Pentium 4     | Cedarmill  | 3                | 0.43              | 0.73     | 25.7%            |         |
Pentium 4     | Presler    | 4.007            | 0.58              | 0.74     | 25.9%            |         |
Conroe        | C2D        | 2.53             | 0.76              | 1.53     | 53.8%            | 53.8%   | 29.2%
Nehalem       | 965x       | 3.2              | 1.14              | 1.82     | 63.8%            | 62.4%   | 8.6%
Nehalem       | 980x       | 4.101            | 1.43              | 1.78     | 62.5%            |         |
Nehalem       | 965        | 3.737            | 1.27              | 1.73     | 60.9%            |         |
Sandy Bridge  | 2600k      | 3.8              | 1.52              | 2.04     | 71.6%            | 71.6%   | 9.3%
Ivy Bridge    | 3770k      | 3.9              | 1.64              | 2.14     | 75.3%            | 75.3%   | 3.7%
Haswell       | 4770k      | 3.9              | 1.78              | 2.33     | 81.7%            | 81.7%   | 6.4%
Broadwell     | 5200u      | 2.7              | 1.26              | 2.38     | 83.6%            | 83.6%   | 1.8%
Skylake       | 6700k      | 4.2              | 2.05              | 2.49     | 87.4%            | 86.9%   | 3.3%
Kaby Lake     | 7700k      | 4.5              | 2.19              | 2.48     | 87.2%            |         |
Coffee Lake   | 8086k      | 5                | 2.4               | 2.45     | 86.0%            |         |
Coffee Lake R | 9900k      | 5                | 2.43              | 2.48     | 87.0%            |         |
Comet Lake    | 10900k     | 5.1              | 2.47              | 2.47     | 86.7%            |         |
Sunny Cove    | Ice Lake   | 3.9              | 2.16              | 2.82     | 99.2%            | 99.2%   | 12.3%
Willow Cove   | Tiger Lake | 4.8              | 2.68              | 2.85     | 100.0%           | 100.0%  | 0.8%
 

SAAA

Senior member
May 14, 2014
541
126
116
For what it's worth I double-checked the results and found two additional Nehalem results, which I averaged with the one I had from Anand.

P4 to Conroe: 29.2%
Conroe to Nehalem: 8.6%
Nehalem to Sandy Bridge: 9.3%
Sandy Bridge to Ivy Bridge: 3.7%
Ivy Bridge to Haswell: 6.4%
Haswell to Broadwell: 1.8%
Broadwell to Skylake: 3.3%
Skylake to Ice Lake: 12.3%
Ice Lake to Tiger Lake: 0.8%
...
Wait, something is off: how are you measuring the jumps? Because I'm certain Conroe was 2x Prescott IPC. All the others look quite small too.
Still, it puts into perspective how large the change is with Icelake cores, biggest since Conroe, hopefully the next one is just as good.
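One possible reading of the discrepancy: the posted jumps match the difference between consecutive "% of Willow Cove" values in the table (for example 53.8% - 24.6% = 29.2%), so they are percentage points of Tiger Lake's per-clock score, not gains relative to the previous generation. A minimal sketch comparing the two, using single-chip weighted scores from the table rather than the group averages (so the numbers differ slightly from the posted list); the function names are made up:

```python
# Weighted (clock-normalized) Cinebench 11.5 ST scores from the table.
WEIGHTED = {
    "Prescott": 0.67,
    "Conroe": 1.53,
    "Skylake": 2.49,
    "Sunny Cove": 2.82,
    "Willow Cove": 2.85,
}
BEST = WEIGHTED["Willow Cove"]


def point_gain(old, new):
    """Gain in percentage points of the best core's score (the table's method)."""
    return 100.0 * (WEIGHTED[new] - WEIGHTED[old]) / BEST


def relative_gain(old, new):
    """Gain relative to the older core, the usual meaning of 'IPC uplift'."""
    return 100.0 * (WEIGHTED[new] / WEIGHTED[old] - 1.0)


if __name__ == "__main__":
    for old, new in [("Prescott", "Conroe"), ("Skylake", "Sunny Cove")]:
        print(f"{old} -> {new}: {point_gain(old, new):.1f} points, "
              f"{relative_gain(old, new):.1f}% relative")
```

Measured relative to Prescott, Conroe comes out at more than 2x per clock in this data, which agrees with the "2x Prescott IPC" recollection. The later generations also look somewhat larger when expressed as relative gains.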
 