Why Does AMD GPU Architecture Appear to be Under-Designed?

crazzy.heartz

Member
Sep 13, 2010
183
26
81
A 24 CU / 1500 shader mobile GPU is able to outperform a full-blown GTX 1060, all because of 64 ROPs.

Whereas on the desktop front, it takes them 36 CUs / 2300 shaders to equal the GTX 1060's performance.

RX Vega M GH performance:
(64 ROPs competing with a GTX 1060)


http://www.tomshardware.com/news/intel-amd-radeon-vega-gpu,36250.html






https://www.engadget.com/2018/01/07/intel-amd-rx-vega-m/

It's further validated by the fact that its cousin, Vega M GL, featuring 20 CUs / 1280 shaders, could only trade blows with a GTX 1050, crippled by 32 ROPs.




Why does AMD release 2000+ shader GPUs with a measly 32 ROPs, whereas Nvidia engineers even their 1280-shader cards with 64?

Shouldn't AMD be using more ROPs to maximise the performance of its graphics cards?
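For a rough sense of what ROP count buys, peak pixel fill rate is just ROPs × clock. A quick sketch in Python; the clock figures are approximate boost clocks assumed for illustration, not exact specs:

```python
# Peak pixel fill rate = ROPs * clock. Clocks below are approximate
# boost clocks (assumed figures, not official specs).
def pixel_fill_rate(rops, clock_ghz):
    """Peak pixel fill rate in gigapixels per second."""
    return rops * clock_ghz

# Polaris 10 (RX 580): 32 ROPs at roughly 1.34 GHz boost.
rx580 = pixel_fill_rate(32, 1.34)
# Vega M GH: 64 ROPs at roughly 1.19 GHz boost.
vega_m_gh = pixel_fill_rate(64, 1.19)

print(f"RX 580:    {rx580:.1f} GP/s")
print(f"Vega M GH: {vega_m_gh:.1f} GP/s")
```

Even at a lower clock, doubling the ROPs from 32 to 64 nearly doubles the theoretical pixel fill rate.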


Edit: trying to fix broken images.

Thread title moderated.
-- stahlhart
 
Last edited:

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
A 24 CU / 1500 shader mobile GPU is able to outperform a full-blown GTX 1060, all because of 64 ROPs.

Whereas on the desktop front, it takes them 36 CUs / 2300 shaders to equal the GTX 1060's performance.

RX Vega M GH performance:
(64 ROPs competing with a GTX 1060)


Why does AMD release 2000+ shader GPUs with a measly 32 ROPs, whereas Nvidia engineers even their 1280-shader cards with 64?

Shouldn't AMD be using more ROPs to maximise the performance of its graphics cards?

It's not a "full-blown GTX 1060". It's a Max-Q model.

Max-Q GTX 1060 < laptop GTX 1060 < desktop GTX 1060. So it's a couple of steps down from "full blown".

Also, "outperform" is a stretch. It scrapes ahead in a couple of cherry-picked, AMD-favorable benchmarks.

Those might be games more prone to show the advantages of more ROPs/HBM.

Also, this is a much newer design than the 36 CU Polaris you are comparing it with, and HBM memory is no doubt an advantage. The reason you don't see it on all models is that it's more expensive.

So a newer design with faster, more expensive memory, on cherry-picked games that make it look good, looks better than an older design with slower memory. Hardly a newsflash.

When we get beyond the marketing to independent testing on more games, we'll get a clearer picture.
 

IllogicalGlory

Senior member
Mar 8, 2013
934
346
136
So a newer design with faster, more expensive memory, on cherry-picked games that make it look good, looks better than an older design with slower memory. Hardly a newsflash.
The memory isn't faster. It's actually 204.8 GB/s versus the RX 480 8GB's 256 GB/s.


https://www.engadget.com/2018/01/07/intel-amd-rx-vega-m/

But yes, the desktop 1060 is about 30% faster than the 1060 Max-Q.


https://www.notebookcheck.net/NVIDIA-GeForce-GTX-1060-Max-Q-GPU-Benchmarks-and-Specs.224734.0.html
https://www.notebookcheck.net/NVIDIA-GeForce-GTX-1060.167603.0.html

Nonetheless, the results look pretty good, considering the RX 480's 58% FLOPS advantage.
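That FLOPS figure follows from peak FP32 throughput = 2 × shaders × clock (one FMA counts as 2 FLOPs). A quick check, with approximate boost clocks taken as assumptions:

```python
def tflops(shaders, clock_ghz):
    """Peak FP32 throughput in TFLOPS: 2 FLOPs (one FMA) per shader per cycle."""
    return 2 * shaders * clock_ghz / 1000

rx480 = tflops(2304, 1.266)      # RX 480: 2304 shaders, ~1266 MHz boost (assumed)
vega_m_gh = tflops(1536, 1.190)  # Vega M GH: 1536 shaders, ~1190 MHz boost (assumed)

print(f"RX 480:    {rx480:.2f} TFLOPS")
print(f"Vega M GH: {vega_m_gh:.2f} TFLOPS")
print(f"RX 480 advantage: {(rx480 / vega_m_gh - 1) * 100:.0f}%")
```

which lands right around the quoted ~58% advantage, give or take the exact clocks used.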
 

crazzy.heartz

Member
Sep 13, 2010
183
26
81
It's not a "full-blown GTX 1060". It's a Max-Q model.

Max-Q GTX 1060 < laptop GTX 1060 < desktop GTX 1060. So it's a couple of steps down from "full blown".

When we get beyond the marketing to independent testing on more games, we'll get a clearer picture.

I wrote "full blown" in the sense that it's the whole 1060 chip, not a cut-down version like the 1060 3GB or 1060 5GB with fewer resources. Of course, the desktop versions of both chips feature more aggressive clocks, which results in higher performance.

Yes, they are going to choose games with favorable results. The point I'm trying to make is that such a small chip is able to compete with a full-size GTX 1060 in a frequency-constrained environment.

Both the 24 CU / 1500 shader AMD chip and the GTX 1060 perform similarly when operating at 1100/1200 MHz. This simple increase in the number of ROPs boosted an RX 560-class chip to RX 580 levels.

Why are you intentionally creating an incendiary thread title?
Just ask the question you want to ask.

Is it not intentional for AMD to limit the performance of its graphics cards by using fewer ROPs? I always thought that the GTX 1060/1050 punch above their weight compared to their AMD counterparts simply because they have more ROPs, which results in a higher pixel fill rate.

AMD could very well implement the same in its desktop designs; it simply chooses not to. However, on the very first chip they designed for Intel, they increased the ROP count, and now it performs at a whole new level.

Polaris, when Designed for Intel:

 

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
I wrote "full blown" in the sense that it's the whole 1060 chip, not a cut-down version like the 1060 3GB or 1060 5GB with fewer resources. Of course, the desktop versions of both chips feature more aggressive clocks, which results in higher performance.

Yes, they are going to choose games with favorable results. The point I'm trying to make is that such a small chip is able to compete with a full-size GTX 1060 in a frequency-constrained environment.

Both the 24 CU / 1500 shader AMD chip and the GTX 1060 perform similarly when operating at 1100/1200 MHz. This simple increase in the number of ROPs boosted an RX 560-class chip to RX 580 levels.

If they are both running at about 1100/1200 MHz, that explains a lot.

That is pretty close to the desktop speed Polaris runs at now, while the desktop GTX 1060 runs at over 1700 MHz.

So Max-Q knocks performance down a LOT through clock speed. Where this new Vega has fewer CUs, it runs them closer to stock Polaris speed.

The GTX loses performance from clock speed, and Vega Mobile from fewer units, so they end up in the same ballpark.
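The clocks-versus-units trade can be sketched with simple arithmetic, since shader throughput scales roughly with shaders × clock. The clock figures below are rough assumptions taken from this thread, not measured values:

```python
# Shader throughput scales roughly with shader count * clock speed.
# Clocks are rough figures from this discussion, not official specs.
def relative_throughput(shaders, clock_mhz):
    return shaders * clock_mhz

desktop_1060 = relative_throughput(1280, 1700)  # desktop GTX 1060, ~1700 MHz
maxq_1060 = relative_throughput(1280, 1200)     # Max-Q GTX 1060, ~1200 MHz
vega_m_gh = relative_throughput(1536, 1190)     # Vega M GH, ~1190 MHz

print(f"Max-Q 1060 vs desktop 1060: {maxq_1060 / desktop_1060:.0%}")
print(f"Vega M GH vs Max-Q 1060:    {vega_m_gh / maxq_1060:.0%}")
```

On those assumed clocks, the Max-Q part gives up roughly 30% to the desktop card on clock speed alone, which puts the smaller but similarly clocked Vega M GH in the same ballpark.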


Is it not intentional for AMD to limit the performance of its graphics cards by using fewer ROPs? I always thought that the GTX 1060/1050 punch above their weight compared to their AMD counterparts simply because they have more ROPs, which results in a higher pixel fill rate.

AMD could very well implement the same in its desktop designs; it simply chooses not to. However, on the very first chip they designed for Intel, they increased the ROP count, and now it performs at a whole new level.

Polaris, when Designed for Intel:

It's not Polaris, it's Vega Mobile. Usually the GTX GPUs punch above their weight because they run at MUCH higher clock speeds. When moving a GTX from desktop to mobile, they lose that.

Vega Mobile benefits from being a much newer design than Polaris, and likely from actually being designed for mobile.

It is not a case of AMD crippling Polaris on purpose (totally absurd); it's a question of a newer mobile design doing relatively better with fewer CUs, because its competitor, which isn't designed for mobile, takes a big hit from loss of clock speed when going mobile.
 
Reactions: Krteq and tential

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
I've also seen speculation - from the main site here, IIRC - that these results might actually be down to the CPU. Because the CPU and GPU in this thing share an overall power budget, they can ramp the CPU up quite hard when it's the limiting factor.

Certainly, if there's an edge case like this where something really performs, you can fully expect Intel's marketing to seek it out and loudly proclaim it. I'd really wait until it gets proper reviews before concluding anything much.
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
I complained about Polaris' 32 ROPs the second I saw it, and the performance showed my concerns were correct: despite higher "IPC" with everything else equal (see Tonga vs Polaris), it has worse performance-per-FLOP than Hawaii with its 64 ROPs.

I can only speculate that Polaris was cheaper to manufacture with 32 ROPs, and AMD thought that with their new engine tweaks to GCN 4 that 32 ROPs would be fine.

In retrospect they might have gone with something different, though. Seeing the 1536 SP / 64 ROP Vega compete with the 1060 Max-Q (underclocked compared to the desktop version) makes me confident a 1920 SP / 64 ROP Polaris would be able to compete with a desktop GTX 1060. But it may have been more expensive.

I have noted before that as the well-oiled Nvidia machine churns forward, they generally push higher ROP-to-shader ratios: from 48:2880 in Kepler to 48:1280 in Pascal. In the same time span, AMD has done the opposite, moving from 64:2816 to 32:2304. While there is a lot more at play than ROP ratios, I still feel AMD has chosen the wrong approach. Again, they're probably either overly optimistic about eliminating bottlenecks or just trying to save cash, not purposefully reducing their competitiveness.
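Laying those ratios out side by side makes the divergence easy to see. The chip names are my reading of the configurations cited (48:2880 matches Kepler GK110, 48:1280 matches Pascal GP106):

```python
# ROPs per 1000 shaders for the configurations cited above.
configs = {
    "Kepler GK110 (48:2880)": (48, 2880),
    "Pascal GP106 (48:1280)": (48, 1280),
    "Hawaii (64:2816)": (64, 2816),
    "Polaris 10 (32:2304)": (32, 2304),
}
ratios = {name: 1000 * rops / shaders for name, (rops, shaders) in configs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} ROPs per 1000 shaders")
```

Nvidia's ratio more than doubled between those two parts, while AMD's fell.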

I have also noted that I wish AMD had made a bigger Polaris with the Hawaii configuration. It might have had to run at lower clocks than Polaris 10 to keep power in check, but unleashing the Polaris architecture (close to 20% higher IPC than GCN 1) with the fewest bottlenecks GCN has ever seen would have been nice to see, and it could easily have competed with the 1070 and possibly the 1080. But overconfidence in Polaris 10, and especially in Vega 10's release timing, made them intentionally not do this, imo.

Just some thoughts.
 
Reactions: CatMerc and tential

TempAcc99

Member
Aug 30, 2017
60
13
51
While there is a lot more at play than ROP ratios

Yeah, the RX 4/580 is also memory-bottlenecked. This is best seen between the 480 and 580, where the 580 has worse performance/watt from higher clocks but barely more performance. They went to the limits of the design with their SKUs.
 

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
The main takeaway is that the comparison system in question (Dell Inspiron 15 looking at the notes) appears to use massively less power.

https://www.notebookcheck.net/Dell-...HQ-GTX-1060-Max-Q-Laptop-Review.264134.0.html

59W AT THE WALL in 3DMark, 98W for The Witcher 3 (the i5 Dell variant), compared to an Intel + AMD chip rated at 100W and designed to exploit all of its power headroom. Even looking at less efficient models, it's very obvious that the Intel + Nvidia systems have significant power advantages. For instance, the MSI 1060 takes 125W at the wall in The Witcher 3 (I have the notebook and observe similar power numbers). I doubt the 100W equivalent system is more efficient.
 
Reactions: xpea

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
The main takeaway is that the comparison system in question (Dell Inspiron 15 looking at the notes) appears to use massively less power.

https://www.notebookcheck.net/Dell-...HQ-GTX-1060-Max-Q-Laptop-Review.264134.0.html

59W AT THE WALL in 3DMark, 98W for The Witcher 3 (the i5 Dell variant), compared to an Intel + AMD chip rated at 100W and designed to exploit all of its power headroom. Even looking at less efficient models, it's very obvious that the Intel + Nvidia systems have significant power advantages. For instance, the MSI 1060 takes 125W at the wall in The Witcher 3 (I have the notebook and observe similar power numbers). I doubt the 100W equivalent system is more efficient.
What about on battery power?
If you are plugging in your laptop to play games, you'd probably do better on power with an efficient desktop system?
 

tential

Diamond Member
May 13, 2008
7,355
642
121
What about on battery power?
If you are plugging in your laptop to play games, you'd probably do better on power with an efficient desktop system?

The point of having a laptop you can game on is that you can easily move it.

An efficient desktop system would not serve the same purpose. A Surface Book (GTX 1060) and an efficient desktop are just two completely different beasts.
 
Reactions: IEC

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,362
5,033
136
The point of having a laptop you can game on is that you can easily move it.

An efficient desktop system would not serve the same purpose. A Surface Book (GTX 1060) and an efficient desktop are just two completely different beasts.

As someone who used to travel 100% and had a "gaming" laptop/DTR for work + play on the side... this is exactly the reason. When you fly every week and want to fit everything in carry-on luggage there really isn't any other option.
 
Reactions: tential

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I totally agree that AMD has not been designing desktop GPUs with an optimal ROP-to-shader ratio. Nvidia has a much better ratio. The last really competitive high-end AMD GPU was Hawaii back in 2013, which had 64 ROPs for 44 CUs. Fiji and Vega were stuck at the same 64 ROPs for 64 CUs. AMD has a problem with scalability and efficiency in its current GCN designs, and I don't know if they can fix it without a clean-sheet design like Zen. Anyway, I like what I see with the custom RX Vega M for Intel. AMD has been able to get RX Vega M to compete with at least the GTX 1060 Max-Q by providing it with lots of ROPs and low-power HBM2. I look forward to seeing the specs of the discrete RX Vega mobile.
 
Last edited:

gregoryvg

Senior member
Jul 8, 2008
241
10
76
If it's as easy as just adding more ROPs, why didn't AMD do that when it realized Vega wasn't as competitive with Nvidia as it needed to be? What would stop them from offering a Vega 64 Ti or something and doubling the ROPs (to, oddly enough, 64)? That sounds like it would be a good way to at least compete with Nvidia's high end and gain some mindshare.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
AMD focused on other parts of its design. To add more ROPs, they would have to add more shader engines, and adding more shader engines means spending engineering work on replumbing GCN. As it is, GCN is limited to 64 ROPs.

It was simply a question of trade-offs; they saw fit to spend the engineering work elsewhere.

https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2

On a brief aside, the number of compute engines has been an unexpectedly interesting point of discussion over the years. Back in 2013 we learned that the then-current iteration of GCN had a maximum compute engine count of 4, which AMD has stuck to ever since, including the new Vega 10. Which in turn has fostered discussions about scalability in AMD’s designs, and compute/texture-to-ROP ratios.

Talking to AMD’s engineers about the matter, they haven’t taken any steps with Vega to change this. They have made it clear that 4 compute engines is not a fundamental limitation – they know how to build a design with more engines – however to do so would require additional work. In other words, the usual engineering trade-offs apply, with AMD’s engineers focusing on addressing things like HBCC and rasterization as opposed to doing the replumbing necessary for additional compute engines in Vega 10.

As an aside: Hawaii remains the highest performance per clock per shader of the GCN GPUs. It's also the GPU with the highest ROP-to-shader ratio before the Intel one.
 
Last edited:

JimKiler

Diamond Member
Oct 10, 2002
3,559
205
106
AMD focused on other parts of its design. To add more ROPs, they would have to add more shader engines, and adding more shader engines means spending engineering work on replumbing GCN. As it is, GCN is limited to 64 ROPs.

It was simply a question of trade-offs; they saw fit to spend the engineering work elsewhere.

https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2



As an aside: Hawaii remains the highest performance per clock per shader of the GCN GPUs. It's also the GPU with the highest ROP-to-shader ratio before the Intel one.
So AMD focused on Vulkan instead?
 

SPBHM

Diamond Member
Sep 12, 2012
5,058
410
126
Going back to GCN 1.0 with the 7970, it already looked strange that it had 32 ROPs like the 7800s (given 2048 vs 1280 SPs and a 384-bit memory bus).
 