2990WX review thread: it's live!


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,740
14,772
136
Thanks, don't know how I missed that thread. Don't recall if I saw it mentioned, but since it's your first custom loop, I'd do a leak test before plugging in the power to the motherboard.
Please reply in that thread about how to do a leak test
 
Reactions: Drazick

coercitiv

Diamond Member
Jan 24, 2014
6,393
12,825
136
Some people only want to have one desktop computer in their house, so it makes sense to see if this CPU can fulfill the need for such folk.
If you own a system like it, chances are you have a daily driver or dedicated gaming box too (e.g. i7 8700K or Ryzen 2700). Gaming is not one of its main purposes by far.
AMD themselves marketed the X series to gamers.



If AMD marketing doesn't know any better, why expect this from consumers?
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
832
136
If you own a system like it, chances are you have a daily driver or dedicated gaming box too (e.g. i7 8700K or Ryzen 2700). Gaming is not one of its main purposes by far.

Which is why I used the term "Some people" and not something like "Most people".
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
This is a reflection of the hard trade-off designers have to make. Infinity Fabric and MCM = easier to make many-core chips by ganging up dies, but it costs more in power and latency. Mesh and single die = lower latency and power requirements, but it takes extra time to put everything on one die.

AMD has made the right choices for now, but two years into the future when intel has icelake SP, EMIB, likely new protocols and topologies, AMD is going to need a better system.

EMIB will likely cost more in power and latency compared to a monolithic die. Perhaps it's better than a regular MCM with an associated interconnect like the IF used in Threadripper. Still, more work is needed, and at higher cost.

Cooper Lake goes back to an MCM not because they assume it's the absolute technically best solution, but because, given what they have and what they need to do, they decided it was the best option.
 
Reactions: french toast

Schmide

Diamond Member
Mar 7, 2002
5,589
724
126
(snip)
These are the instructions for a simple printf program. I'm sure you don't need this lesson as you seem informed. Now extrapolate this to a benchmark application.

Analysis of an execution flow :
https://en.wikipedia.org/wiki/Cycles_per_instruction

I like what you write.

I have a good concept of how things move through the pipeline and often count certain instructions in my code to get an idea of how efficient the code is.

The above examples, while great as abstract examples for learning, do not represent real world code. No one would ever measure a program like hello world. It basically does some setup then makes a system call.

As well, the other example is loaded with dependencies (some designed to confuse), is in order, and is simplified. I do not believe any modern processor could return a dependency in 1 or 2 cycles. I would think it would be more on the order of 7-14.

So here is what I do. In my current project I'm looking at better ways of transposing a matrix of bits for faster output to the GPIO (aka bit banging). It will be open sourced, so I don't mind posting some snippets here.

This is a major routine. It takes a matrix and interleaves the bytes from the middle over and over. The result is a transposed matrix. There are more direct ways of doing this, but this method is very friendly to a pipelined arch.

Code:
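#include <emmintrin.h> // SSE2: __m128i, _mm_unpacklo_epi8, _mm_unpackhi_epi8

// Each pass interleaves the bytes of from[i] with from[i + offset] and writes
// count vectors. A pass reads count/2 vector pairs, so offset is presumably
// count/2 (the middle of the buffer). Passes ping-pong between out and scratch.
// Note: passes is unsigned, so the loop at the bottom assumes passes >= 2.
void InterleaveBytes(__m128i *in, __m128i *out, __m128i *scratch, long count, unsigned long passes, unsigned long offset)
{
   __m128i *to, *nextTo;
   __m128i *from = in;
   // Pick the first destination from the parity of passes.
   if (passes-- & 1) { // odd
      to = scratch;
      nextTo = out;
   } else {
      to = out;
      nextTo = scratch;
   }
   __m128i *end = &to[count];
   // First pass: read from in, write to the first buffer.
   do {
      *to++ = _mm_unpacklo_epi8(from[0], from[offset]);
      *to++ = _mm_unpackhi_epi8(from[0], from[offset]);
      from++;
   } while (to < end);
   from = &to[-count]; // start of the buffer just written
   to = nextTo;
   // Remaining passes: ping-pong between the two buffers.
   do {
      end = &to[count];
      do {
         *to++ = _mm_unpacklo_epi8(from[0], from[offset]);
         *to++ = _mm_unpackhi_epi8(from[0], from[offset]);
         from++;
      } while (to < end);
      end = &from[-(count>>1)]; // start of the buffer just read
      from = &to[-count];       // start of the buffer just written
      to = end;                 // next pass overwrites the buffer just read
   } while (--passes > 0);
}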

So typically this routine would loop 4-5 times on a matrix size of 128 bytes. For loops like this most of the instructions are superfluous. As long as you feed the major instructions at a decent rate they will be the determining factor.
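
For anyone wondering why interleaving from the middle over and over ends up as a transpose, here is a scalar sketch in plain C. It is not the SSE routine above, just the same byte shuffle done one byte at a time. The names `interleave_pass` and `transpose_by_interleave` are made up for illustration, and the sketch assumes the row count is a power of two. Each pass sends the byte at index i to 2i (first half) or 2(i - n/2) + 1 (second half), so log2(rows) passes move row-major element (r, c) to position c*rows + r.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One pass: interleave the first half of src with the second half.
   dst[2i] = src[i], dst[2i+1] = src[i + n/2]. n must be even. */
static void interleave_pass(const unsigned char *src, unsigned char *dst, size_t n)
{
    size_t half = n / 2;
    assert(n % 2 == 0);
    for (size_t i = 0; i < half; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i + half];
    }
}

/* Transpose a rows x cols byte matrix stored row-major in m, using
   log2(rows) interleave passes that ping-pong between m and tmp.
   Assumption of this sketch: rows is a power of two. */
static void transpose_by_interleave(unsigned char *m, unsigned char *tmp,
                                    size_t rows, size_t cols)
{
    size_t n = rows * cols;
    unsigned char *from = m, *to = tmp;
    assert(rows != 0 && (rows & (rows - 1)) == 0);
    for (size_t r = rows; r > 1; r >>= 1) {
        interleave_pass(from, to, n);
        unsigned char *swap = from; /* swap buffers for the next pass */
        from = to;
        to = swap;
    }
    if (from != m) /* odd number of passes: result is sitting in tmp */
        memcpy(m, from, n);
}
```

With 16 rows that is 4 passes, e.g. `transpose_by_interleave(m, tmp, 16, 8)` on a 128-byte buffer.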

On a Haswell at 4 GHz:

Code:
      /* 40+128=168 non loop operations
      10m passes comes in at
      0.48 sec so 10m/0.48 = 21m
      4g / 168 = 24m
      87% efficiency
      */

The first routine accounts for the 40. A more complex routine that shifts out the bits accounts for the 128.

I now have a metric of how certain instructions are moving through the system. It is ordered, repeatable, and logical.

The same can be said for almost any routine in the abstract. Yes there could be better terms for what is going on, but IPC is a good term that people understand.

In the pure sense, you are correct.

However, I believe there is room for many contexts for the term IPC.

Edit: The routine for AVX is a bit more complicated because of the 128-bit lanes. Here are the results for it though. It almost doubles the throughput at a lower efficiency. It has lower IPC. In this case your argument against IPC shows validity.

Code:
   /* 24+64=88 non loop operations (sans byte bit swap)
      10m passes comes in at 0.3 sec so
      10m/0.3 = 33m
      4g/88 = 49m
      67% efficiency
      */
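
The arithmetic in those comment blocks can be reproduced with a few lines of C. This is only a sketch: `efficiency` is a hypothetical helper, and the "ideal" rate assumes one counted operation retires per clock at 4 GHz, which is the assumption implied by the 4g/168 figure.

```c
#include <assert.h>

/* Measured passes per second divided by the ideal passes per second,
   where ideal = clock rate / counted operations per pass
   (assumes one operation per clock). */
static double efficiency(double clock_hz, double ops_per_pass,
                         double passes, double seconds)
{
    assert(ops_per_pass > 0 && seconds > 0 && clock_hz > 0);
    double measured = passes / seconds;       /* e.g. 10m / 0.48 */
    double ideal    = clock_hz / ops_per_pass; /* e.g. 4g / 168   */
    return measured / ideal;
}
```

Plugging in the SSE numbers (168 ops, 10m passes, 0.48 sec) gives 0.875, matching the 87%. Plugging in the AVX numbers exactly as posted (88 ops, 0.3 sec) gives 4g/88 of about 45m and roughly 73%, not 49m and 67%, so one of the posted AVX figures is probably rounded or mistyped.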
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
OMG, people are complaining about gaming performance.
Gaming's not its main purpose, sure. It's also not Skylake-X's main purpose, but that didn't stop people raking it over the coals when its gaming performance turned out to have dropped all the way back to Ivy Bridge levels in certain cases.

If you own a system like it, chances are you have a daily driver or dedicated gaming box too (e.g. i7 8700K or Ryzen 2700). Gaming is not one of its main purposes by far.
In my case, I wouldn't physically have the room for a dedicated gaming box. Meaning that if I were putting together a one-size-fits-all rig, my choices would boil down to the 2950X (good, cheap all-rounder, but not really the performance leader in anything), 7980XE (very solid, consistent performance, plus AVX-512 support, but ridiculously expensive and less PCIe connectivity) or 2990WX (crazy fast in multi-thread situations, but wildly inconsistent in single/low thread).

There's not really an obvious "best" CPU out of that line-up, and I'm sure AMD would have loved to have marketed the 2990WX as the CPU that can be all things to all people - which, let's face it, is how Intel markets the 7980XE, despite it being by no means their best gaming chip - but clearly there's too many pitfalls as it is for them to do that.

You could also argue that having to build a second gaming box would eliminate its price advantage over Intel's line-up, but then again this thing's real competition will be the rumoured LGA3647 i9s, and right now we have absolutely no idea how they'll be priced or how well they'll do in games, so it'll still probably come out in front on that count.
 

french toast

Senior member
Feb 22, 2017
988
825
136
Gaming's not its main purpose, sure. It's also not Skylake-X's main purpose, but that didn't stop people raking it over the coals when its gaming performance turned out to have dropped all the way back to Ivy Bridge levels in certain cases.


In my case, I wouldn't physically have the room for a dedicated gaming box. Meaning that if I were putting together a one-size-fits-all rig, my choices would boil down to the 2950X (good, cheap all-rounder, but not really the performance leader in anything), 7980XE (very solid, consistent performance, plus AVX-512 support, but ridiculously expensive and less PCIe connectivity) or 2990WX (crazy fast in multi-thread situations, but wildly inconsistent in single/low thread).

There's not really an obvious "best" CPU out of that line-up, and I'm sure AMD would have loved to have marketed the 2990WX as the CPU that can be all things to all people - which, let's face it, is how Intel markets the 7980XE, despite it being by no means their best gaming chip - but clearly there's too many pitfalls as it is for them to do that.

You could also argue that having to build a second gaming box would eliminate its price advantage over Intel's line-up, but then again this thing's real competition will be the rumoured LGA3647 i9s, and right now we have absolutely no idea how they'll be priced or how well they'll do in games, so it'll still probably come out in front on that count.
The 1950X and 2950X are the "all things to all people" perf/$ champs though. Sure, they are not the best at anything, except perhaps value, but they are the best all-round workhorses for your money, with no real weaknesses in anything.
With Intel Skylake-X you have to pay twice the RRP to get that better all-round performance. For some people it is worth it, for many others it is overpriced, and I feel this will be reflected in the sales.

If you have the money spare and you want a jack of all trades..top performer..buy the intel solution.
If you have a smaller budget buy the 2950x which gets slightly lower performance, but at half the price.

Both are good choices. You wouldn't buy the 2990WX unless you had a specific rendering workload to use it for 80% of the time.
 
Reactions: msroadkill612

mattiasnyc

Senior member
Mar 30, 2017
356
337
136
Which is why I used the term "Some people" and not something like "Most people".

I think the other point still stands though: Once the buyer has built their 2990wx system they've spent so much money that a top-of-the-line graphics card suited for gaming and thus gaming at high resolution seems reasonable. And at that point aren't the video cards the limiting factor?

In addition, at those prices why not just get a cheaper dedicated gaming computer or a console?
 
Jul 24, 2017
93
25
61
I think the other point still stands though: Once the buyer has built their 2990wx system they've spent so much money that a top-of-the-line graphics card suited for gaming and thus gaming at high resolution seems reasonable. And at that point aren't the video cards the limiting factor?

In most cases yes, the video card would become the limiting factor, but there are a few games where performance on the 2990WX is so bad that you would actually be bottlenecking a high-end card even at 1440p/4K. Attempting to use this as a gaming CPU even "from time to time" really isn't a good idea, at least not unless we get major Windows and game updates to make them work better with the 2990WX's unique design.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
I think the other point still stands though: Once the buyer has built their 2990wx system they've spent so much money that a top-of-the-line graphics card suited for gaming and thus gaming at high resolution seems reasonable. And at that point aren't the video cards the limiting factor?

In addition, at those prices why not just get a cheaper dedicated gaming computer or a console?

The GFX that suit an eventual TR WX build :


https://www.anandtech.com/show/13210/amd-announces-radeon-pro-wx-8200
 

Hitman928

Diamond Member
Apr 15, 2012
5,600
8,790
136
The 2990wx has a game mode which can bring it down to one die with 8c/16 where it performs about equal to a 2700. That should be fine for those who want to play the occasional game on a workstation system. . .
 
Reactions: french toast

french toast

Senior member
Feb 22, 2017
988
825
136
The 2990wx has a game mode which can bring it down to one die with 8c/16 where it performs about equal to a 2700. That should be fine for those who want to play the occasional game on a workstation system. . .
If money is no object, buy an Epyc or something: none of the drawbacks, loads of everything.
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
If money is no object, buy an Epyc or something: none of the drawbacks, loads of everything.
Epyc's clocked a lot lower, though. Besides, the overheads which are hurting performance in certain scenarios on the 2990WX might actually be worse on Epyc, since there's four memory pools for Windows to juggle instead of two.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,740
14,772
136
Epyc's clocked a lot lower, though. Besides, the overheads which are hurting performance in certain scenarios on the 2990WX might actually be worse on Epyc, since there's four memory pools for Windows to juggle instead of two.
Also, the Linux performance is insanely good. I mean, it crushes everything on all the benchmarks, and that's what I will be running after it's all set up and OC'ed.
 

Guru

Senior member
May 5, 2017
830
361
106
Seems like the issue lies with the way they configured the CCX cores to access the memory: only 2 out of the four can access the memory directly, and this is causing an issue. I wonder why they did not make all CCXs access the memory? Can someone with expertise in CPU design answer what the drawback would be to allowing the other CCX cores to access the memory as well? Does it increase the die size, or is there some other penalty?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Seems like the issue lies with the way they configured the CCX cores to access the memory: only 2 out of the four can access the memory directly, and this is causing an issue. I wonder why they did not make all CCXs access the memory? Can someone with expertise in CPU design answer what the drawback would be to allowing the other CCX cores to access the memory as well? Does it increase the die size, or is there some other penalty?

You do realize there are four Zeppelin dies (MCM4) attached on the same substrate?
 
Reactions: Drazick

dnavas

Senior member
Feb 25, 2017
355
190
116
Also, the Linux performance is insanely good. I mean, it crushes everything on all the benchmarks, and that's what I will be running after it's all set up and OC'ed.

Yeah. If you skipped the Phoronix reviews, go back and have a look-see. They wrote up a bunch of reviews which are long on graphs and short on words. The performance difference between Linux and Windows is really quite impressive.
 

CuriousMike

Diamond Member
Feb 22, 2001
3,044
543
136
Seems like the issue lies with the way they configured the CCX cores to access the memory: only 2 out of the four can access the memory directly, and this is causing an issue. I wonder why they did not make all CCXs access the memory? Can someone with expertise in CPU design answer what the drawback would be to allowing the other CCX cores to access the memory as well? Does it increase the die size, or is there some other penalty?

Hardware Unboxed explained it in a way I finally understood:
https://youtu.be/QI9sMfWmCsk?t=3m49s

And how that impacts memory bandwidth:
https://youtu.be/QI9sMfWmCsk?t=17m3s
 