Question: Will there be a 16-core TRX40 Threadripper?

Page 4 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Kedas

Senior member
Dec 6, 2018
355
339
136
I wanted a 4-memory-channel 16-core part, but it seems the leaks only show 24 cores and up.
A 16-core I could ALSO use for games, but 24 cores? I don't think that will work well.
Buying one just to disable 8 cores isn't really acceptable.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
So you are saying it's easier to believe that they have broken their uniform design for EPYC than that they would do something they have done several times, i.e. disabling L3?
You call the 4-die design uniform EPYC design, but even without the freedom of an I/O die, with Zen 1 they made a 1-die SKU for embedded EPYC.
They could disable L3, but I don't think so for Rome or TR3.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
You call the 4-die design uniform EPYC design, but even without the freedom of an I/O die, with Zen 1 they made a 1-die SKU for embedded EPYC.
They could disable L3, but I don't think so for Rome or TR3.
There are tons of write- and read-performance quirks you can run into with disabled dies, things server workloads can be sensitive to.

As for embedded EPYC: it wasn't a 4-die full socket with one die turned on. It was a Ryzen 1000 die sold as an embedded EPYC with all server functionality enabled and validated.

I'll ask again: if it's so flexible, why does EPYC go 24-32-48-64 cores? If they are willing to sell 8c and 16c EPYCs, why not have more higher-end pricing tiers? Again, where is the 56c EPYC?
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
There is good reason to believe that they are limited to either 4 or 8 CCDs. Until we see anything different, any configuration of cores will be based on that.
At least we won't see anything different in Epyc chips. The server IOD contains 4 distinct dual-channel IMCs; to make uniform use of them all, each has to be connected/close to an equal number of dies. The big question is how the IOD is configured/changed for Threadripper, considering it is still 4-channel unlike the 8-channel Epyc.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
At least we won't see anything different in Epyc chips. The server IOD contains 4 distinct dual-channel IMCs; to make uniform use of them all, each has to be connected/close to an equal number of dies. The big question is how the IOD is configured/changed for Threadripper, considering it is still 4-channel unlike the 8-channel Epyc.
I expect it's something similar. They can probably do 1 CCD to 1 memory channel, 2 CCDs to 1 memory channel, or 1 CCD to 2 memory channels. There is a reason besides space that AMD put the 2 CCDs side by side in that arrangement, and I believe they had a sort of interconnect between the two CCDs on top of that. But it has to be the correct target memory channels due to substrate wiring.

People want to use the 2970WX and 2990WX as examples of unbalanced configurations, but it's really a different situation. The IO die is a hub, and that hub probably needs proper connections to work within spec. Some flexibility exists there, but we already see things like write speed to the IO die dropping due to the use of a single die on desktop. We don't know all the pitfalls of randomly removing CCDs just because it should be possible. It's a very tight design, and I wouldn't assume they could do something willy-nilly just because somewhere out there it might be slightly feasible.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,655
136
According to this the 16-core TR3 will be 140 W (or 280 W).
It's not the 3950X, otherwise it would be 105 W.
The 280 W 16C version may be the 4-die version.

I would question whether anything in that lineup will ever exist. It seems mostly nonsensical, maybe based on early design options rather than shipping products. Also, the 140 W 16c chips aren't labeled Threadripper, so that's probably the 3950X, a guess made by someone who thought the 3950X might take the power level up from the 3900X. Which makes the document look even more fake.
 
Reactions: scannall

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
It's not just AVX-512: Intel MKL, which is used by default if you use Anaconda and NumPy, runs SSE code or an even slower path on Ryzen. It doesn't even use AVX2. This is what you get with Ryzen if you follow the normal route according to tutorials.

A follow-up on this: basically any application that is compiled against MKL suffers from the issue. This includes MATLAB, for which there is no alternative with OpenBLAS.

However, just this week on Reddit there was a workaround making its rounds. You can set an (undocumented) environment variable, MKL_DEBUG_CPU_TYPE=5, and MKL will then run the correct AVX2 code path even on a Ryzen CPU for any MKL-compiled application. This increases performance massively, easily >4x depending on the exact test.
This is also helpful on Linux if the application is only available with MKL.
The problem is, Intel can remove this "feature" at any time, so AMD needs to act quickly. If you use MATLAB, an AMD CPU makes no sense right now...
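For the record, the workaround being passed around amounts to something like this in Python. The variable must be set before NumPy (and thus MKL) is first loaded, or it has no effect. A minimal sketch, assuming an MKL-linked NumPy build; the variable itself is undocumented and can stop working with any MKL update:

```python
import os

# Undocumented switch: "5" makes MKL report an AVX2-capable CPU,
# so Zen gets the fast code path instead of the legacy SSE one.
# Must be set BEFORE MKL is loaded for the first time.
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

try:
    import numpy as np  # an MKL-linked NumPy reads the variable at load time

    a = np.random.rand(1000, 1000)
    _ = a @ a  # this matmul should now take the AVX2 path on Ryzen
except ImportError:
    pass  # NumPy not installed; setting the variable is still the key step
```

For MATLAB or other closed applications, the same effect can be had by exporting the variable in the shell (or system environment) before the program starts.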
 
Reactions: moinmoin

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
In fact, since I have the 3900X on Windows (my previous link used Ubuntu), I tried to recreate the results, and they pretty much match what that guy got on Ubuntu. However, getting NumPy installed with OpenBLAS instead of MKL was a real pain on Windows with Anaconda; it took me an hour to get it working, and that is just with NumPy.

Hmm, didn't know that.

Most of my stuff uses custom libraries compiled with VS2017. Comparing a 2950X to an E5-2630 v3, the Threadripper is over 2x as fast (per thread) using my own C extensions.

Might be an avenue you could go down?

It's really not that hard once you have a template for how, as long as you can live without sending over 2D lists. I could put some guidance into the Programming subforum if it would be of use to youse?
 
Reactions: lightmanek

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
A follow-up on this: basically any application that is compiled against MKL suffers from the issue. This includes MATLAB, for which there is no alternative with OpenBLAS.

However, just this week on Reddit there was a workaround making its rounds. You can set an (undocumented) environment variable, MKL_DEBUG_CPU_TYPE=5, and MKL will then run the correct AVX2 code path even on a Ryzen CPU for any MKL-compiled application. This increases performance massively, easily >4x depending on the exact test.
This is also helpful on Linux if the application is only available with MKL.
The problem is, Intel can remove this "feature" at any time, so AMD needs to act quickly. If you use MATLAB, an AMD CPU makes no sense right now...


I know this is a very old thread, but I need to add here that with the newest Intel MKL version 2020.1, Intel has closed this loophole. Hence Intel MKL is slow on any AMD CPU and nothing can be done about it, other than telling the software supplier not to use Intel MKL or to offer an alternative.

AMD needs to get their redacted together. It's getting similar to the CUDA lock-in. The software game matters a lot. AMD needs their own BLAS working as nicely on all operating systems (including Windows...). Another option is to at least help the open-source community get OpenBLAS builds for as much software as possible. At work I simply can't deal with getting ugly hacks to work.

Profanity is not allowed in tech areas.

AT Mod Usandthem


EDIT:

In a surprising twist of events, it turns out the above statement about Intel MKL was completely wrong. I simply believed people on social media claiming this was happening. It is true that MKL_DEBUG_CPU_TYPE=5 has no effect anymore. Why?

MKL 2020.1 by default used the fast path on my AMD Ryzen CPU.

This is actually exactly the opposite of what I wrote! It's a very good and extremely surprising thing which hopefully lasts. More in an additional comment.
 
Last edited:
Reactions: Drazick

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
with the newest Intel MKL version 2020.1, Intel has closed this loophole and hence Intel MKL is slow on any AMD CPU and nothing can be done about it, other than telling the software supplier not to use Intel MKL or to offer an alternative.
Can't software suppliers keep using the last version with the loophole instead of upgrading to newer versions that remove it? Possibly shipping both and keeping the old one in use on AMD systems?

If the software suppliers won't do that, the users certainly will (try as much as they can).

Anyway, with this move this has now become a case for antitrust, since with this lapse Intel itself has already shown that supporting a competitor is possible.
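If a supplier did want to ship both behaviours and keep the old path only on AMD systems, the dispatch could be as simple as a vendor check at startup. A hedged sketch: is_amd_cpu is a hypothetical helper, not any real MKL API, and the environment variable is only honoured by MKL builds up to 2020.0:

```python
import os
import platform


def is_amd_cpu() -> bool:
    """Best-effort CPU vendor check (hypothetical helper, not an MKL API)."""
    if os.path.exists("/proc/cpuinfo"):  # Linux
        with open("/proc/cpuinfo") as f:
            return "AuthenticAMD" in f.read()
    # On Windows the vendor string usually appears in PROCESSOR_IDENTIFIER
    ident = os.environ.get("PROCESSOR_IDENTIFIER", "") or platform.processor()
    return "AuthenticAMD" in ident


if is_amd_cpu():
    # Only honoured by MKL <= 2020.0, before Intel removed the switch;
    # must run before the MKL-linked library is first imported.
    os.environ.setdefault("MKL_DEBUG_CPU_TYPE", "5")
```

Whether suppliers would bother with this, rather than just shipping the newest MKL everywhere, is exactly the incentive question raised below.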
 

DrMrLordX

Lifer
Apr 27, 2000
21,802
11,157
136
Can't software suppliers keep using the last version with the loophole instead upgrading to newer versions which remove it? Possibly shipping both and keep using the old one on AMD systems?

Possibly. If they never upgrade, then yes. The real question is: what incentive do they have to do so? Intel may make it worth their while to upgrade.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Possibly. If they never upgrade, then yes. The real question is: what incentive do they have to do so? Intel may make it worth their while to upgrade.
My question was more a legal one: whether Intel can otherwise outright forbid them the use of their libraries. Otherwise software suppliers are more likely to do nothing. This change has the potential to create a lot of angry customers.
 

LightningZ71

Golden Member
Mar 10, 2017
1,659
1,944
136
Well, I mean, that's a HUGE incentive for me, as an end user, to upgrade to the latest version of their program that uses the new Intel BLAS. I can't wait to pay for a new version of the software that upgrades my calculation performance by quartering it!

I'm hoping that this makes enough hay in the social media circuits to really tank the sales of any package that uses the updated library version. I know it won't, as most people will just upgrade blindly and not realize that anything changed, because their AMD system was always slower than the Intel systems in the lab, or their Intel system didn't change performance at all.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
There's nothing illegal about Intel modifying their own software to lock AMD out. AMD should be working on developing their own libraries and/or contributing to OpenBLAS like @beginner99 said, and helping software vendors like MathWorks, Anaconda, Wolfram etc. integrate those libraries into their software.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
Could violate the antitrust settlement.
Intel isn't violating anything: per the settlement they are simply required to explicitly disclose that their software behaves differently on non-Intel processors:


disclose to software developers that Intel computer compilers discriminate between Intel chips and non-Intel chips, and that they may not register all the features of non-Intel chips. Intel also will have to reimburse all software vendors who want to recompile their software using a non-Intel compiler.
 

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
Intel isn't violating anything - they are as per the settlement simply allowed to explicitly state that their software behaves differently on non-Intel processors:
While technically not violating anything, I do think it makes a difference that, however accidentally, they showed themselves that better support is possible, and are now actively taking away said support. I expect the public to react accordingly, and antitrust to take a second look. With the current market situation (Intel still dominant but AMD having the upper hand performance-wise in nearly all markets), antitrust may still consider this particular change an abuse of a dominant market position.

Of course it would be much better for AMD (and all the software suppliers, for that matter) to support open, vendor-agnostic implementations more and better, but that's a separate consideration.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
I have edited my initial comment that triggered the last couple of comments, but to help visibility I'm also posting it here as a new comment:

In a surprising twist of events, it turns out the statement about Intel MKL crippling AMD and closing the loophole was completely wrong. I simply believed people on social media claiming this was happening. It is true that MKL_DEBUG_CPU_TYPE=5 has no effect anymore. Why?

MKL 2020.1 by default used the fast path on my AMD Ryzen CPU.

This is actually exactly the opposite of what I wrote! It's a very good and extremely surprising thing which hopefully lasts.

I was naive and only tested this on my own Ryzen after writing the comment here. I have now tested extensively (Anaconda Python with NumPy MKL vs. NumPy OpenBLAS), and I'm confident this is true and MKL is actually fixed!!! MKL NumPy is equivalent to or a lot faster than OpenBLAS depending on the exact test. It's a lot faster (3x) in SVD, which is relevant for AI/ML. Other users are now making the same observations. After all, MATLAB was fixed as well, and I doubt they ship it with a different BLAS depending on CPU.
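For anyone who wants to repeat the comparison, the test is essentially timing the same NumPy call under an MKL build and under an OpenBLAS build. The two libraries can't be swapped inside one process, so the script has to be run once in each conda environment. A rough sketch; the matrix size and repeat count are arbitrary choices:

```python
import time

try:
    import numpy as np
except ImportError:
    np = None  # the sketch still illustrates the procedure


def bench_svd(n=1000, repeats=3):
    """Return the best wall-clock time for one SVD of an n x n matrix."""
    rng = np.random.default_rng(0)  # fixed seed so both runs see the same data
    a = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.linalg.svd(a)
        best = min(best, time.perf_counter() - t0)
    return best


if np is not None:
    print(f"SVD best of {3}: {bench_svd():.3f} s")  # compare across environments
```

Running this in an MKL-linked environment and again in an OpenBLAS-linked one gives the kind of per-test ratio quoted above.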

This is so shocking given Intel's track record that I blindly believed the "bad press" on social media. But in fact it's extremely good news.
 

beginner99

Diamond Member
Jun 2, 2009
5,223
1,598
136
Huh! That's surprising.

It is, but I'm not keeping my hopes up, and it's hard for me to make a full judgement. The current MKL version is clearly faster than the gimped one available at the end of 2019 that needed the "loophole" to give acceptable performance. The current version is faster on my 3900X in all the tests I tried compared to OpenBLAS.

However, in some tests (NumPy svd and eig) my 3900X performs pretty much identically to my 6-core Coffee Lake Intel laptop. That makes it hard for me to judge whether the test simply isn't that parallel, whether the Intel CPU benefits from Intel-specific cache optimizations obviously missing for any AMD CPU, or whether MKL is still hurting AMD performance, just less than before. For now the conclusion stands that even with an AMD CPU on Windows, MKL will give you better performance than OpenBLAS.

For a final verdict I would need a high-core-count Intel CPU to check whether svd and eig scale with cores.

EDIT:

With setting

os.environ["MKL_NUM_THREADS"] = "1"

before the NumPy import, one can limit the threads used by MKL. svd takes a little less than double the time compared to using all threads, while eig is just 25% slower with one thread, so these two tests clearly don't scale well with core count. Setting it to 2 for these tests gives similar performance to the default of all threads (=6 on the Intel machine), which I guess explains why the 3900X isn't any faster.
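Put together, the single-thread experiment looks roughly like this. The variable has to be set before NumPy is imported, and OPENBLAS_NUM_THREADS is the analogous knob for OpenBLAS builds:

```python
import os

# Pin the BLAS thread pool BEFORE NumPy/MKL is first imported
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"  # same idea for OpenBLAS builds

import time

try:
    import numpy as np

    a = np.random.rand(500, 500)
    t0 = time.perf_counter()
    np.linalg.eig(a)
    print(f"eig, 1 thread: {time.perf_counter() - t0:.3f} s")
except ImportError:
    pass  # NumPy not installed; the env-var ordering is the point
```

Re-running with the variable set to "2" (or unset) and comparing the times shows how little svd and eig gain from extra cores.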
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,994
7,765
136
MKL 2020.2 adds a Zen-specific kernel. Unfortunately it's not complete, and everything missing falls back to the slowest code path. Here are some tests by a researcher who shows a new workaround under Linux, essentially replacing a function of the library:

MKL 2020.3 isn't complete either. MATLAB appears to keep using MKL 2020.0, the last version to allow enabling the debug mode previously used to enable the AVX2 code path on Zen chips.

via https://www.computerbase.de/2020-09/intel-mkl-2020-2-zen-kernel-amd-workaround/ (German)
 
Reactions: lightmanek