Yes, that's true, but when Intel started using DDR4 they abandoned the L4 cache. I never said it doesn't have its uses, I just said that an L4 cache is usually there to mitigate a bandwidth problem.
Yes, it helps bandwidth and is primarily used for graphics, but it does reduce latency compared to going out to main memory... that's why it would have to be big: if your working set is much larger than the L4 and you constantly have to check the L4 before missing out to main memory, the extra lookup actually increases latency.
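A rough average-memory-access-time sketch shows the trade-off; the latency figures and hit rates below are illustrative assumptions, not measurements from any particular chip:

```python
# Rough AMAT sketch: a serial-lookup L4 only pays off if its hit rate is
# high enough to cover the extra probe cost on every miss.
L4_LOOKUP_NS = 40.0    # assumed L4 (eDRAM-style) access latency
DRAM_NS = 90.0         # assumed main-memory latency

def amat_with_l4(hit_rate):
    # Every access probes the L4; misses then pay DRAM latency on top.
    return L4_LOOKUP_NS + (1.0 - hit_rate) * DRAM_NS

print(amat_with_l4(0.9))   # ~49 ns, better than DRAM alone
print(amat_with_l4(0.2))   # ~112 ns, worse than just going to DRAM (90 ns)
```

So a small L4 with a low hit rate only adds a step to the path; it has to be big enough to catch most of the traffic before it helps.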
The reason it is not included more often is that it is expensive, uses power, and may affect clocking potential... so on desktop it is only used for iGPU graphics; otherwise the die area is better spent on more cores, or saved to keep the die small and the clocks high.
In enterprise it has more potential, since there you are looking at throughput and efficiency: lots of low-clocked cores with wide vectors and plenty of memory-sensitive workloads, whether latency- or bandwidth-bound. As we get better multi-chip techniques, higher-density nodes and less per-core scaling, we will see big caches come into play.
The reason it has not been used in enterprise and HPC yet is that it has been more beneficial to spend that die area on more execution units, and there has been enough bandwidth to get the job done.
Adding more than 32 cores to a quad-channel DDR4 bus, with some dies not having direct IMC access, is useless except in corner cases; per-core bandwidth just gets too thin, rough numbers below.
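Back-of-envelope numbers, assuming DDR4-2666 across four channels at theoretical peak (actual speeds and efficiency vary):

```python
# Per-core bandwidth on a quad-channel DDR4 socket.
# Transfer rate and core counts are assumptions for illustration.
CHANNELS = 4
MT_PER_S = 2666e6        # DDR4-2666 transfers per second
BYTES_PER_TRANSFER = 8   # 64-bit channel width

peak_gbps = CHANNELS * MT_PER_S * BYTES_PER_TRANSFER / 1e9  # ~85 GB/s total

for cores in (16, 32, 64):
    print(f"{cores} cores -> ~{peak_gbps / cores:.1f} GB/s per core")
# 16 cores -> ~5.3, 32 -> ~2.7, 64 -> ~1.3 GB/s per core
```

Past 32 cores you're down to roughly a gigabyte and a bit per core, which is where bandwidth-hungry workloads start starving.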
A new topology and better IF will help, as will DDR5... but going much beyond 32 cores on that socket won't be effective without some large L4 cache showing up in the future.
Just my take.