2990WX review thread: It's live!

Page 6 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

french toast

Senior member
Feb 22, 2017
988
825
136
Yes, it's true, but when Intel started using DDR4, they abandoned the L4 cache. I never said it doesn't have its uses; I just said that L4 cache is usually used to mitigate bandwidth problems.
Yes, it helps bandwidth and is primarily used for graphics, but a hit in L4 also has lower latency than going to main memory. That's why it would have to be big: if your data is too big for the L4 and you are constantly looking up the L4 before going to main memory anyway, it will increase latency.
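
To put rough numbers on that trade-off, here is a minimal sketch of the average latency seen after an L3 miss, with and without an L4 in front of main memory. It assumes a simple serial lookup (every L4 miss pays the L4 lookup and then the full DRAM access). The function name and the latency figures are hypothetical, picked only for illustration, and are not measurements of any real part.

```python
def post_l3_latency_ns(dram_ns, l4_ns=None, l4_hit_rate=0.0):
    """Average latency (ns) for a request that has already missed L3.

    Assumes a serial lookup: the L4 is checked first, and only on an
    L4 miss does the request go on to DRAM.
    """
    if l4_ns is None:
        # No L4 present: every L3 miss goes straight to DRAM.
        return dram_ns
    # Every request pays the L4 lookup; misses additionally pay DRAM.
    return l4_ns + (1.0 - l4_hit_rate) * dram_ns


def break_even_hit_rate(dram_ns, l4_ns):
    """L4 hit rate below which the L4 makes average latency worse."""
    return l4_ns / dram_ns


# Hypothetical latencies, for illustration only.
DRAM_NS = 90.0
L4_NS = 40.0

for hit_rate in (0.8, 0.2):
    avg = post_l3_latency_ns(DRAM_NS, L4_NS, hit_rate)
    print(f"L4 hit rate {hit_rate:.0%}: {avg:.1f} ns vs {DRAM_NS:.1f} ns without L4")
```

With these made-up numbers the L4 only pays off when its hit rate is above `L4_NS / DRAM_NS`, which is the "it has to be big" argument in arithmetic form.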

The reason it is not included more often is that it is expensive, uses power, and may affect clocking potential. So on desktop it is only used for IGP graphics; other than that, the die area is better used for more cores, or saved for a smaller die and higher clocking potential.

In enterprise it has more potential, as you are looking at throughput and efficiency: loads of low-clocked cores with wide vectors and lots of memory-sensitive workloads, whether that be latency or bandwidth. As we see better multi-chip techniques, higher-density nodes and less core scaling, we will see big caches come into play.
The reason it has not been used in enterprise and HPC yet is that it has been more beneficial to use that die area for more execution units, and there has been enough bandwidth to get the job done.

Adding more than 32 cores to a quad-channel DDR4 bus, with some dies not having direct IMC access, is useless except in corner cases.
New topology and better IF will help, as will ddr5...but it won't be effective having much more than 32 cores on that socket without seeing some large L4 cache in future.
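
A back-of-the-envelope sketch of the bandwidth side of this: theoretical peak DRAM bandwidth is channels times transfer rate times bus width, and dividing by core count shows how thin the per-core share gets. The helper names are hypothetical, the DDR4-2933 speed is an assumed example, and these are peak figures rather than what any workload would sustain.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    channels  : number of memory channels
    mt_per_s  : transfer rate in MT/s (e.g. 2933 for DDR4-2933)
    bus_bytes : bytes per transfer per channel (8 for a 64-bit channel)
    """
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return channels * mt_per_s * bus_bytes / 1000.0


def per_core_bandwidth_gbs(channels, mt_per_s, cores):
    """Peak bandwidth share per core if all cores pull data at once."""
    return peak_bandwidth_gbs(channels, mt_per_s) / cores


# Assumed example configurations (same memory speed, only channels differ).
for label, channels in (("quad channel", 4), ("eight channel", 8)):
    total = peak_bandwidth_gbs(channels, 2933)
    share = per_core_bandwidth_gbs(channels, 2933, 32)
    print(f"{label}: {total:.1f} GB/s peak, {share:.2f} GB/s per core at 32 cores")
```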

Just my take.
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,945
136
EPYC has distinct advantages in both total RAM bandwidth (2X) and locality of memory (it has RAM on each die, meaning that, with proper NUMA segmenting of processes, the memory can be accessed much more quickly than in the TR-WX scenario for distant cores).
 

LightningZ71

Golden Member
Mar 10, 2017
1,661
1,945
136
On the L4 cache question: L4 cache is very expensive in die area. It can double or triple the size of a processor die if properly sized. My proposal for an L4 cache for Zeppelin is contingent on the dies themselves shrinking significantly with new process technology. The package will always be quite large to handle the number of pins needed for I/O, while the die can get smaller and smaller. That being the case, there is eventually a lower bound on how small the die can actually be and still allow all the I/O connections that are needed. On 7nm, it may not make sense to add more die area for an L4 cache, but perhaps at 5nm or 3nm there will be so much shrinkage that the die needs to be kept somewhat artificially big. If that's the case, an L4 can make a lot of sense. An L4 can also still hide latency. DDR5, when it comes, will have even more cycles of access latency, making each memory access more painful from a cycle point of view (the actual ns won't change a whole lot). Anything that can reduce that penalty will be a big benefit to performance.
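
The cycles-versus-nanoseconds point can be sketched like this: CAS latency is counted in memory clock cycles, and the memory clock is half the transfer rate, so a higher cycle count at a higher clock can land at a similar absolute time. The function name is hypothetical, and the two speed/timing pairs are assumed examples for illustration, not a prediction for any particular kit.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """Convert CAS latency from memory clock cycles to nanoseconds.

    cl_cycles     : CAS latency in memory clock cycles
    data_rate_mts : transfer rate in MT/s (two transfers per clock for DDR)
    """
    clock_mhz = data_rate_mts / 2.0          # DDR: 2 transfers per clock
    return cl_cycles / clock_mhz * 1000.0    # cycles / MHz = us, so * 1000 for ns


# Assumed example timings, for illustration only.
examples = (
    ("DDR4-3200 CL22", 22, 3200),
    ("DDR5-4800 CL40", 40, 4800),
)

for name, cl, rate in examples:
    print(f"{name}: {cl} cycles = {cas_latency_ns(cl, rate):.2f} ns")
```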
 
Reactions: french toast

french toast

Senior member
Feb 22, 2017
988
825
136
I think it can get to a point whereby you use the area of one die (25% of TR~4 die) as a large L4.
So for Threadripper 3: 3x 12-core Matisse dies, 36/72 (much faster Zen 2 cores), with one L4 die instead of the extra die as in TR2.
This, along with more efficient IF links, would be much faster than the 2990WX, and more well rounded also.
Epyc Rome could be bespoke 16 core die X 4 with L4 cache.
 
Reactions: ZGR

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
I don't understand the obsession with L4 cache. Why not just increase the L3 cache? Or increase the IF frequency?
 

french toast

Senior member
Feb 22, 2017
988
825
136
I don't understand the obsession with L4 cache. Why not just increase the L3 cache? Or increase the IF frequency?
Increasing IF frequency? We need to improve fabric efficiency, and even that is not enough to solve the issues of non-direct memory access and bandwidth.
 

Shivansps

Diamond Member
Sep 11, 2013
3,873
1,527
136
I don't understand the obsession with L4 cache. Why not just increase the L3 cache? Or increase the IF frequency?

You can't "just increase L3 cache". Cache sizes are calculated to be the optimal amount for the CPU; adding more may degrade performance or offer no improvement at all.

And with adding another level of cache, the problems are A) cost and B) where to place it.

In a nutshell, you can't fix this with more cache. You are always going to have a high number of cache misses, and the only thing you are going to win with more cache is faster benchmarks. Real-world scenarios are another thing.
 
Reactions: ub4ty

ub4ty

Senior member
Jun 21, 2017
749
898
96
You can't "just increase L3 cache". Cache sizes are calculated to be the optimal amount for the CPU; adding more may degrade performance or offer no improvement at all.

And with adding another level of cache, the problems are A) cost and B) where to place it.

In a nutshell, you can't fix this with more cache. You are always going to have a high number of cache misses, and the only thing you are going to win with more cache is faster benchmarks. Real-world scenarios are another thing.
^what he said right here.
Essentially comp-arch. A general overview for those interested:
http://courses.cs.vt.edu/cs2506/Fall2014/Notes/L16.CachePoliciesAndPerformance.pdf
Note that CPI (https://en.wikipedia.org/wiki/Cycles_per_instruction) comes into the picture again, this time in relation to cache design and performance. Memory stalls are real and can dog performance. Some micro-architectures have lived and died based on their cache design and subsequent performance.
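
For anyone who wants the textbook relationship in runnable form, here is a minimal sketch: effective CPI is the base CPI plus the memory stall cycles per instruction, where stalls are memory references per instruction times miss rate times miss penalty. The function name and all input values are hypothetical, chosen only to show how quickly stalls can dominate.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Effective cycles per instruction including memory stalls.

    base_cpi            : CPI with a perfect (always-hit) cache
    mem_refs_per_instr  : average memory references per instruction
    miss_rate           : fraction of references that miss the cache
    miss_penalty_cycles : CPU cycles lost per miss
    """
    stall_cycles = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles


# Hypothetical inputs, for illustration only.
cpi = effective_cpi(base_cpi=1.0, mem_refs_per_instr=0.3,
                    miss_rate=0.02, miss_penalty_cycles=200)
print(f"effective CPI: {cpi:.2f}")
```

Even a 2% miss rate more than doubles the CPI in this made-up case, which is why miss penalty matters as much as cache size.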
 

naukkis

Senior member
Jun 5, 2002
782
636
136
You can't "just increase L3 cache". Cache sizes are calculated to be the optimal amount for the CPU; adding more may degrade performance or offer no improvement at all.

And with adding another level of cache, the problems are A) cost and B) where to place it.

In a nutshell, you can't fix this with more cache. You are always going to have a high number of cache misses, and the only thing you are going to win with more cache is faster benchmarks. Real-world scenarios are another thing.

The L1 and L2 caches inside the CPU core are speed-sensitive, so increasing their sizes can be harmful. L3 cache is much less so, so L3 sizing is mainly about chip size.

wahdangun is spot on: with on-chip caches you don't implement an L4, you increase the size of the L3. Implementing an L4 on chip would be just stupid, as it would duplicate logic, waste chip area and increase memory latency for nothing. If they implement an L4, it's only because the chip has no space for an L3 of that size and an off-chip cache has to be implemented as the next-level cache.
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Increasing IF frequency? We need to improve fabric efficiency, and even that is not enough to solve the issues of non-direct memory access and bandwidth.

Yes, I know the interconnect has a cost, but that can be mitigated with a node shrink or by increasing efficiency. No matter what you do, though, you can't make a die without direct memory access perform like a normal die, even with an L4 cache.



You can't "just increase L3 cache". Cache sizes are calculated to be the optimal amount for the CPU; adding more may degrade performance or offer no improvement at all.

And with adding another level of cache, the problems are A) cost and B) where to place it.

In a nutshell, you can't fix this with more cache. You are always going to have a high number of cache misses, and the only thing you are going to win with more cache is faster benchmarks. Real-world scenarios are another thing.


That's true if the L3 acts as an inclusive cache, as in Intel's -lake architectures.
 

Asterox

Golden Member
May 15, 2012
1,028
1,786
136
Reactions: lightmanek

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I asked Ian. He used the figures reported by CPU registers. Which ones I dunno.

Definitely not the correct way to do it.

On AMD for example the measured (TFN) figures require separate calibration for each motherboard model, and the reported values can be skewed as well (MSI even provides control for the end-user to do that).
DCR / RdsOn measurements made by the controller itself would be acceptable, but still completely useless for determining "IF" power consumption (as you can only measure the whole SoC plane).

AMD has some software tools which can display the power consumption (at least the calculated one) of the individual blocks of the CPU, however I'm pretty certain they weren't available in this case.

With Intel the software power measurements are generally accurate, but only if the CPU is running COMPLETELY stock.
Any change in the frequencies or voltages will throw them off. Also I haven't seen any figure on e.g. Skylake-X which would indicate the power consumption of the mesh itself.
 