Vega/Navi Rumors (Updated)

Glo. · Oct 9, 2016

I have one or two issues with most of what people are saying here.

Let's get back to the main attraction.

XFX RX 480 GTR: 1288 MHz@ 1.05v. OC: 1475 MHz@ 1.175v.

Something here is wrong, don't you all think?

I genuinely think this is a revised process, or simply the Samsung process (how on earth would the GTX 1050 Ti otherwise be able to achieve 1.7 GHz clocks on this process?).

This may also be the reason for the much lower power consumption of the embedded GPUs (lower voltages at the same clocks).

I wonder what this actually means, and what it could mean for Vega. But that remains to be seen.

Piroko · Oct 9, 2016

Glo. said:
When there is exception from the rule, then there is no rule.

That's taking it too easy. I think I was the first person to quote this video so I'll also take the freedom to put it into context: This was the fourth model Jayz reviewed and the only model that was this good. So, a rule, as in "a usually valid generalization", is defendable.

Generally, reviews of the 480 had a larger spread in performance and power consumption than 1060 reviews. Consensus seems to be though that the 480 can take a healthy undervolt at stock clocks, which shrinks the gap between those two significantly. To say that GCN won't be able to close the gap to Pascal will have a high chance of putting you into an awkward position.

Glo. · Oct 9, 2016

Piroko said:
That's taking it too easy. I think I was the first person to quote this video so I'll also take the freedom to put it into context: This was the fourth model Jayz reviewed and the only model that was this good. So, a rule, as in "a usually valid generalization", is defendable.

Generally, reviews of the 480 had a larger spread in performance and power consumption than 1060 reviews. Consensus seems to be though that the 480 can take a healthy undervolt at stock clocks, which shrinks the gap between those two significantly. To say that GCN won't be able to close the gap to Pascal will have a high chance of putting you into an awkward position.

Whether Polaris is efficient or inefficient is not what I care about, or what I am getting at.

I am talking about a change in the process itself, because this is TOO BIG an outlier to be considered that way.

Look at the graphs. Reference Ellesmere XT: 1120 MHz @ 1.04 V, 1266 MHz @ 1.175 V. XFX RX 480 GTR: 1288 MHz @ 1.05 V, with a possible stable OC of 1475 MHz @ 1.175 V.

Previously, no Polaris GPU was able to reach this high an OC at reference boost voltages.

That's why I think this is much more than simply an outlier. And all of this is in the context of the possible, rumored 1.5 GHz clocks for the big Vega GPU.

Piroko · Oct 9, 2016

Glo. said:
Look at the graphs. Reference Ellesmere XT: 1120 MHz @ 1.04 V, 1266 MHz @ 1.175 V. XFX RX 480 GTR: 1288 MHz @ 1.05 V, with a possible stable OC of 1475 MHz @ 1.175 V.

Previously, no Polaris GPU was able to reach this high an OC at reference boost voltages.

That's why I think this is much more than simply an outlier.

One point I would like to make is that this is based on a software readout, which might be inaccurate. I doubt that it's completely off, though; more along the lines of ±0.05 V. And that's enough to bring this card back in line with what AMD presented. But I won't rule out the idea of a relaunch on a tweaked stepping and/or process either. That idea definitely isn't unheard of with AMD (Hawaii/Grenada among others).

biostud · Oct 10, 2016

Maybe AMD's design ran into similar problems as the GTX 480 did, with heat and rising electrical resistance when reaching higher clocks.

ElFenix · Oct 10, 2016

Glo. said:
I genuinely think this is a revised process, or simply the Samsung process (how on earth would the GTX 1050 Ti otherwise be able to achieve 1.7 GHz clocks on this process?).

or, you know, pascal made different design choices that enable higher clocks as one benefit. port pascal over to intel's latest and it ain't hitting 4 GHz.

tviceman · Oct 10, 2016

Glo. said:
I genuinely think this is a revised process, or simply the Samsung process (how on earth would the GTX 1050 Ti otherwise be able to achieve 1.7 GHz clocks on this process?).

R&D... Oomph! Ah! What is it good for? To you, absolutely nothing.

Perhaps Nvidia's larger number of engineers with a higher budget working on fewer total projects has something to do with higher clocks, faster performance, higher efficiency, and top to bottom product releases.

Vega with GDDR5 or G5X or GDDR6 won't be a massive departure in perf/w from Polaris.

IllogicalGlory · Oct 10, 2016

By the same token, is it really the process advantage that allows AMD to have almost twice as many shaders as comparable NV cards? It's the design that makes the difference, and each has its advantages.

SpaceBeer · Oct 11, 2016

The average user cares (mostly) about only one thing when buying a new graphics card, and that is FPS/$. Of course, there are also things like power consumption, noise, etc. But if you offered anyone, e.g., a 1060 3GB and a 980 Ti at the same price, even people with cheap PSUs would take the 980 Ti and buy a new PSU.

But this is a forum for PC enthusiasts, and we like to talk about other stuff too. And one should be very careful when analyzing a product or comparing different architectures, especially today, when chips are so complex and companies use different approaches to solve the same problems.

If we would like to compare AMD’s and nVidia’s architecture, we should take chips with exactly the same configuration and clocks. But there are no such chips. So let’s see what we have.

The good thing on AMD’s side is that we have chips with 32 CUs in three generations (Tahiti, Tonga and the cut-down Polaris 10 in the RX 470), which makes it much easier to compare them. There is also a test of Tonga vs Polaris at the same clock. So we can see that the main reason for the better performance of the RX 470 is its higher clock.

On nVidia’s side it’s a little bit harder, since there are no GP and GM chips with exactly the same configuration, though we will soon have the GTX 1050 Ti to compare with the GTX 950. But even without that, once we account for the larger (different) number of SM units and the higher clocks, it seems the main reason for the better Pascal performance is also the higher clock (compared to Maxwell cards).

So it looks like there are no significant performance per CU/SM improvements on either side compared to previous generations.

Now we need to compare apples vs oranges, and I really don’t know what would be the best choice. We could take the reference RX 470 and GTX 980, for example, since they have very similar (boost) clocks and configurations, though the 980 has 2 times more ROPs, which leads to a much higher pixel fillrate. Or maybe it would be better to take the R9 390 and GTX 1080: exactly the same configuration, but quite a difference in clocks and production process.

Let’s take a look at a performance chart*. We can see that the GTX 980 is ~20% faster than the RX 470, but is the difference caused by a better architecture or by double the number of ROPs (pixel fillrate)? The GTX 1080 is ~70% faster than the R9 390. But it is also clocked ~70% higher. Does that mean that AMD’s and nVidia’s chips have the same performance per clock?
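To make the per-clock question concrete, here is a minimal sketch that simply divides relative performance by relative clock. The function name is made up for illustration, the percentages are the round figures quoted above, and the 1.0 clock ratio for the GTX 980 vs RX 470 pair is an assumption standing in for "very similar (boost) clocks".

```python
def per_clock_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """Relative performance per clock of card A vs card B.

    perf_ratio  -- A's performance divided by B's (1.70 means 70% faster)
    clock_ratio -- A's clock divided by B's (1.70 means 70% higher)
    """
    return perf_ratio / clock_ratio


# GTX 1080 vs R9 390: ~70% faster at a ~70% higher clock (figures from the post).
gtx1080_vs_r9_390 = per_clock_ratio(1.70, 1.70)

# GTX 980 vs RX 470: ~20% faster; clock ratio of 1.0 is an assumption.
gtx980_vs_rx470 = per_clock_ratio(1.20, 1.00)

print(f"GTX 1080 vs R9 390, per clock: {gtx1080_vs_r9_390:.2f}")
print(f"GTX 980 vs RX 470, per clock:  {gtx980_vs_rx470:.2f}")
```

A ratio near 1.0 for the first pair only says the two chips land close per clock at the same shader/TMU/ROP count in this one chart. The second pair stays ambiguous because the ROP count differs, which this simple division cannot separate out.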

So it looks like the main (not to say only) advantage nVidia has is the ability to clock its GPUs much higher than AMD’s. But since GPUs are made for parallel tasks, that problem is easily solved by adding more cores, which is what AMD is doing. Of course, that leads to larger chips, and therefore higher cost. That doesn’t necessarily mean P10 is more expensive than GP106, since we don’t know GF’s and TSMC’s prices.

But GPUs are used for tasks other than gaming, so we don’t know where nVidia’s efficiency is coming from or, for example, how much the DP compute capabilities in GCN affect max clocks and power consumption. We do know GCN has an advantage in those tasks. How much could AMD increase clock speeds and lower TDP if they decided to cut its DP performance? I suppose the efficiency of the blocks for audio, video encoding, CF/SLI, display controllers, etc. differs as well, though that’s only a small part of overall card efficiency.

So after all, I think it’s hard to say how far AMD is behind nVidia when it comes to chip design, and whether it is better for them to increase R&D funds to improve the architecture, or to go with a brute-force approach if they can get a lower production price (GF vs TSMC). The end customer only wants to get as many FPS per dollar as possible, and doesn’t care about what’s inside the chip.

* I really hate those charts, since it is impossible to make a relevant one. Depending on the games chosen, relative performance between 2 products can vary a lot. If in one game card A is 40% faster than card B, and in the next one card B is 20% faster, it looks silly to say card A is usually 20% faster than card B. And we can’t be sure whether the reason for that lies in hardware or in software. You could also say that the developers of game C are much better than those who developed game D. But we have to use something, and here’s one to take a look at: https://www.computerbase.de/thema/grafikkarte/rangliste/
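A small sketch of why a single summary number is slippery, using the two-game example above (A is 40% faster in one game, B is 20% faster in the other). The function names are illustrative, and using the geometric mean is a common convention for averaging ratios, not something the linked chart is known to do.

```python
from math import prod


def arithmetic_mean(ratios):
    """Plain average of per-game performance ratios."""
    return sum(ratios) / len(ratios)


def geometric_mean(ratios):
    """n-th root of the product of per-game performance ratios."""
    return prod(ratios) ** (1 / len(ratios))


# A/B ratios: game 1, A is 40% faster -> 1.40; game 2, B is 20% faster -> 1/1.20.
a_over_b = [1.40, 1 / 1.20]
# The same two games seen from B's side.
b_over_a = [1 / r for r in a_over_b]

print(f"A vs B, arithmetic mean: {arithmetic_mean(a_over_b):.3f}")
print(f"B vs A, arithmetic mean: {arithmetic_mean(b_over_a):.3f}")
print(f"A vs B, geometric mean:  {geometric_mean(a_over_b):.3f}")
print(f"B vs A, geometric mean:  {geometric_mean(b_over_a):.3f}")
```

The arithmetic mean gives answers that do not invert cleanly when you swap which card is the baseline, while the geometric mean of A vs B is exactly the reciprocal of B vs A. Neither tells you anything about which games were picked, which is the bigger problem.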

Glo. · Oct 11, 2016

tviceman said:
R&D... Oomph! Ah! What is it good for? To you, absolutely nothing.

Perhaps Nvidia's larger number of engineers with a higher budget working on fewer total projects has something to do with higher clocks, faster performance, higher efficiency, and top to bottom product releases.

Vega with GDDR5 or G5X or GDDR6 won't be a massive departure in perf/w from Polaris.

ElFenix said:
or, you know, pascal made different design choices that enable higher clocks as one benefit. port pascal over to intel's latest and it ain't hitting 4 GHz.

General forum consensus is that the GloFo and Samsung processes are worse than TSMC's. We know how Polaris 10 and 11 behave on the GloFo process in terms of power consumption. Then we have the XFX RX 480 GTR, which behaves differently from what we know about RX 480 chips, with almost 1.5 GHz core clocks at almost reference voltages. And we have information about the GTX 1050 Ti possibly being made on the Samsung process, and being able to achieve 1.75 GHz boost clocks.

And lastly, we have information about embedded GPUs behaving completely differently in terms of thermals from what we know of the desktop lineup.

All of this was in the context of the possibility of the Samsung process being better than GloFo's, and in the context of Vega achieving 1.5 GHz core clocks.

This is a theory. Unfortunately for this theoretical extrapolation, most of the information here comes from rumors and a crystal ball. We have to wait and observe what happens, for every company mentioned directly or indirectly here.

gamervivek · Oct 11, 2016

The overwhelming advantage that nvidia have is their higher clockspeeds. And unlike when AMD were ahead on the clockspeed front they make big chips as well. So AMD can only go so far with adding shaders, there's simply too big of a gap in clockspeeds to make up for it. Squeezing performance by upping the voltages also hurts the perf/W.

Kepler cards were clocking to 1.3-1.4 GHz with added voltage and AMD are just getting there.

SpaceBeer · Oct 11, 2016

But we know clocks can't be increased indefinitely and they will hit the wall at some point. So even nVidia will have to focus more on other improvements (more shaders, architectural changes), and less on clocks. Sure, no one can tell where AMD will be when nVidia reaches 3 GHz. Might be 2.0 GHz or maybe even 2.7 GHz, who knows.

We know they have managed to lower power consumption significantly in Carrizo and Bristol Ridge APUs in 28nm, compared to previous generations in 32nm and 28nm processes. So they are surely working on it. We also know it takes years to make big changes in processor architecture. It took them like 8 years or so to implement HBM, and they have been working on Zen for 4 years.

And last, but not least - there are no bad graphics cards, there are only badly priced cards.

Phynaz · Oct 11, 2016

Glo. said:
(how on earth would the GTX 1050 Ti otherwise be able to achieve 1.7 GHz clocks on this process?).

Different design.

Edit: ElFenix beat me to it.

crisium · Oct 11, 2016

gamervivek said:
Kepler cards were clocking to 1.3-1.4 GHz with added voltage and AMD are just getting there.

I'm not sure about Kepler at 1.4 on air, but GCN 1 (Tahiti, Pitcairn) were the first cards to be able to hit 1.3GHz. Radeon 7950 800MHz->1.3GHz was the greatest GPU OC ever, thanks to silly low stock clocks. GCN 2 and 3 were setbacks here, tending to max out closer to 1.2 (or less with early Fiji chips).

It seems GCN 4 on the new process can only OC a little more (100-200 MHz more?) than GCN 1 max clocks. The lower power consumption is, I guess, the biggest node advantage. Even with revisions, GCN might just not be able to go much over 1.4-1.5 GHz. Hopefully Vega changes something.

Glo. · Oct 11, 2016

XFX RX 480 GTR on air: 1288 MHz@ 1.05v around 85-90W in Heaven benchmark, 1475 MHz@1.185v - 133W.
Water Cooled RX 480: 1266 MHz@1.05v around 105W in Heaven benchmark, 1470 MHz@1.185v - 150W, and unstable/crashes.

Gigabyte G1 from the same reviewer:

125W at stock clocks. And 180-190W when OC'ed.

Something is just not right here, guys...

I do think that the XFX RX 480 must use at least a new revision of the process, or be made on the Samsung process. Then it would be logical.
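As a rough sanity check on the numbers above, here is a sketch that scales each card's stock power to its overclock using the usual first-order rule that dynamic power goes with frequency times voltage squared. That rule ignores leakage, temperature and board losses, so treat it as an assumption rather than a measurement. The 87.5 W input is just the midpoint of the 85-90 W figure, and the function name is made up.

```python
def scaled_power(p0_w: float, f0_mhz: float, v0: float,
                 f1_mhz: float, v1: float) -> float:
    """First-order estimate: P1 = P0 * (f1 / f0) * (v1 / v0) ** 2.

    Only models dynamic (switching) power; leakage is ignored.
    """
    return p0_w * (f1_mhz / f0_mhz) * (v1 / v0) ** 2


# XFX RX 480 GTR on air: 1288 MHz @ 1.05 V, midpoint of 85-90 W.
gtr_oc = scaled_power(87.5, 1288, 1.05, 1475, 1.185)

# Water-cooled RX 480: 1266 MHz @ 1.05 V, ~105 W.
water_oc = scaled_power(105.0, 1266, 1.05, 1470, 1.185)

print(f"GTR predicted OC power:          {gtr_oc:.0f} W (reported: 133 W)")
print(f"Water-cooled predicted OC power: {water_oc:.0f} W (reported: 150 W)")
```

Under this simple model each card's overclocked power follows roughly from its own stock power. What the scaling cannot account for is the gap between the cards at nearly the same clock and voltage (85-90 W vs around 105 W vs 125 W), which is the part that looks odd.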

I have derailed the thread a bit, while staying on the edges of the topic (Vega achieving high clocks). Sorry for this.

Phynaz · Oct 11, 2016

SpaceBeer said:
The average user cares (mostly) about only one thing when buying a new graphics card, and that is FPS/$.

Since Nvidia in general has a lower FPS/$, and outsells AMD 3:1, this statement can't be true.

crisium · Oct 11, 2016

Indeed. Brand is the most important metric for consumers.

Glo. · Oct 11, 2016

Phynaz said:
Different design.

Edit: ElFenix beat me to it.

No. If the Samsung process, on which the GTX 1050 Ti is presumably made, were as rubbish as GloFo's, it would hold back the potential of the architecture. If the GTX 1050 Ti is able to clock that high on this process, then the process is not the problem.

That was the whole point from the start.

Mopetar · Oct 11, 2016

Glo. said:
XFX RX 480 GTR on air: 1288 MHz@ 1.05v around 85-90W in Heaven benchmark, 1475 MHz@1.185v - 133W.
Water Cooled RX 480: 1266 MHz@1.05v around 105W in Heaven benchmark, 1470 MHz@1.185v - 150W, and unstable/crashes.

Gigabyte G1 from the same reviewer:
125W at stock clocks. And 180-190W when OC'ed.

Something is just not right here, guys...

That would suggest a high variance in the process and that AMD binned artificially low and (probably) doesn't have a higher bin. My guess is the process was having some issues or wasn't maturing as fast as expected which led to AMD releasing some questionable silicon that probably should have been cut in order to hit production numbers.

William Gaatjes · Oct 11, 2016

It got me wondering, the rumour that the size of the wavefronts (64 threads) might become smaller (Like a half, similar to 32 Nvidia) may work out for AMD. That is, if they can still execute these smaller wavefronts in parallel as if the amount of execution units virtually increases and for different instructions at the same time. Splitting up the SIMD unit to not do the same instructions on 4 (cycles) * 16 lanes (=64). I got the impression that that is why Nvidia is doing very well.
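A toy model of why smaller wavefronts could help, assuming a 16-lane SIMD that issues one wavefront instruction over wavefront_size / 16 cycles (the "4 cycles * 16 lanes" picture above). It only counts idle lane-slots when fewer threads are active than the wavefront holds. Scheduling, latency hiding and divergence within a wavefront are all left out, and the function names are made up.

```python
from math import ceil

SIMD_LANES = 16  # lanes per SIMD, as in "4 (cycles) * 16 lanes (=64)"


def lane_utilization(active_threads: int, wavefront_size: int) -> float:
    """Fraction of issued lane-slots that carry an active thread."""
    wavefronts = ceil(active_threads / wavefront_size)
    issued_slots = wavefronts * wavefront_size
    return active_threads / issued_slots


def cycles_per_instruction(active_threads: int, wavefront_size: int) -> int:
    """Cycles one SIMD spends issuing a single instruction for all threads."""
    wavefronts = ceil(active_threads / wavefront_size)
    return wavefronts * (wavefront_size // SIMD_LANES)


for threads in (20, 40, 64):
    for size in (64, 32):
        util = lane_utilization(threads, size)
        cycles = cycles_per_instruction(threads, size)
        print(f"{threads:2d} threads, wave{size}: "
              f"{util:6.1%} lanes busy, {cycles} cycles")
```

In this model the smaller wavefront only wins when the work does not fill a 64-wide wavefront (the 20-thread case). With full wavefronts both sizes come out the same, so any gain would depend on how often real workloads leave lanes empty.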

Glo. · Oct 11, 2016

William Gaatjes said:
It got me wondering, the rumour that the size of the wavefronts (64 threads) might become smaller (Like a half, similar to 32 Nvidia) may work out for AMD. That is, if they can still execute these smaller wavefronts in parallel as if the amount of execution units virtually increases and for different instructions at the same time. Splitting up the SIMD unit to not do the same instructions on 4 (cycles) * 16 lanes (=64). I got the impression that that is why Nvidia is doing very well.

That would increase potential throughput on a properly fed GPU. It would also require new, next-generation rasterization and next-generation schedulers. Without these, the whole patent may be rendered pretty much useless in terms of performance and mediocre in terms of efficiency.

That is, if I understand the patents, and how GCN works.

rainy · Oct 11, 2016

Glo. said:
I do think that the XFX RX 480 must use at least a new revision of the process, or be made on the Samsung process.

Isn't it too early for a new revision?
Usually it takes 6-7 months for a new one.

Btw, I don't know when production of Polaris has started.

Despoiler · Oct 11, 2016

rainy said:
Isn't it too early for a new revision?
Usually it takes 6-7 months for a new one.

Btw, I don't know when production of Polaris has started.

I'm with Mopetar. AMD/GloFo was probably already working on a new revision. Their launch date came up and they decided to launch with the current revision rather than delay for several months. It would make sense given the power fix they implemented. I find it hard to believe that the engineers missed something related to the power delivery. More likely they designed/programmed the first release to target values that didn't hold across all silicon. Either way, we need more data to determine if this is truly a new rev or just a really good piece of silicon. My gut feeling is that any process issues will be nailed down by the time Vega releases.

William Gaatjes · Oct 11, 2016

Glo. said:
That would increase potential throughput on a properly fed GPU. It would also require new, next-generation rasterization and next-generation schedulers. Without these, the whole patent may be rendered pretty much useless in terms of performance and mediocre in terms of efficiency.

That is, if I understand the patents, and how GCN works.

What keeps me wondering:
The question is: where is the bottleneck located? Can the schedulers not feed the execution units fast enough, or can they, but the given calculation task is not optimal? I wonder if they cannot feed the SIMD unit with data that uses all 4 * 16 lanes effectively. What if the SIMD units are fed with data that does not make use of the maximum throughput? I wonder if anybody ever ran a debugging or trace program to see what happens with the data inside the GCN GPU.

I mean, if I compare the GTX 1060 and the RX 480, it is kind of strange. I still favor the RX 480, though.
I am just a noob, but curious.

How are the rasterizers coupled to the SIMD units?

Glo. · Oct 11, 2016

William Gaatjes said:
What keeps me wondering:
The question is: where is the bottleneck located? Can the schedulers not feed the execution units fast enough, or can they, but the given calculation task is not optimal? I wonder if they cannot feed the SIMD unit with data that uses all 4 * 16 lanes effectively. What if the SIMD units are fed with data that does not make use of the maximum throughput? I wonder if anybody ever ran a debugging or trace program to see what happens with the data inside the GCN GPU.

I mean, if I compare the GTX 1060 and the RX 480, it is kind of strange. I still favor the RX 480, though.
I am just a noob, but curious.

How are the rasterizers coupled to the SIMD units?

For Vega, if we have variable wavefronts, it would require... wait for it... tile-based rasterization, and splitting the data into smaller pieces.

That would also save a lot of the memory bandwidth required to feed the GPU with work, hence the 2048-bit memory bus rumored for the Vega architecture.

That is the exact opposite of previous generations of GCN, which are, strictly speaking, memory bottlenecked. The RX 480 gets a bigger performance increase from memory overclocks than from core overclocks. It also gives me an abstract idea of "why", but I cannot explain it in simple words, or even illustrate it.
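For scale, peak memory bandwidth is just bus width times per-pin data rate divided by 8. The sketch below plugs in figures that are not from this thread and should be read as assumptions: a 256-bit bus at 8 Gbps for the 8 GB RX 480, and 2.0 Gbps per pin (the top HBM2 speed grade) on the rumored 2048-bit Vega bus.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) * per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Assumed figures, not taken from the thread:
rx480_gddr5 = bandwidth_gb_s(256, 8.0)   # 256-bit GDDR5 at 8 Gbps
vega_hbm2 = bandwidth_gb_s(2048, 2.0)    # rumored 2048-bit bus, assumed 2.0 Gbps

print(f"RX 480 (assumed 256-bit @ 8 Gbps): {rx480_gddr5:.0f} GB/s")
print(f"Vega (rumored 2048-bit @ 2 Gbps):  {vega_hbm2:.0f} GB/s")
```

If the per-pin rate ends up lower than assumed, the Vega figure scales down linearly, so the bus width alone does not settle how much bandwidth headroom there would be.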
