Vega/Navi Rumors (Updated)

Bacon1 · Mar 13, 2017

Dave2150 said:
Nvidia is having massive success with GDDR5X - why on earth haven't AMD jumped on this bandwagon? Do NVIDIA have exclusive access to it?

If GDDRX is so great, why is Nvidia using HBM 2 on GP100?

Vega is already being used in massive server farms by Liquid Sky. Desktop sales pale in comparison to server sales.

beginner99 · Mar 13, 2017

Bacon1 said:
f GDDRX is so great, why is Nvidia using HBM 2 on GP100?

Because it's better but also more expensive. Why would NV use GDDR5x on 1080Ti (GP102) if HBM2 was cheaper?

I'm not an NV fan but they clearly have an edge in memory bandwidth and power usage. So they can get away with GDDR5x instead of hbm2. If a technology is better and cheaper, we would have seen way, way faster adoption.

Krteq · Mar 13, 2017

beginner99 said:
Why would NV use GDDR5x on 1080Ti (GP102) if HBM2 was cheaper?

It's simple. GP104 and GP102 doesn't have IMC with HBM support, it was not designed for use on interposer etc.

beginner99 · Mar 13, 2017

Krteq said:
It's simple. GP104 and GP102 doesn't have IMC with HBM support, it was not designed for use on interposer etc.

Of course. But why was it designed like that? Because GDDR5x is cheaper and good enough for the target market.

Dave2150 · Mar 13, 2017

Bacon1 said:
If GDDRX is so great, why is Nvidia using HBM 2 on GP100?

Vega is already being used in massive server farms by Liquid Sky. Desktop sales pale in comparison to server sales.

GP100 = Extremely expensive part, sold in much smaller numbers than a 1070 for example. That means they can use the super expensive HBM2, 16GB of it, without any supply/pricing issues.

AMD on the other hand have presumably had to wait for HBM2 supply, price etc to come down to levels where it makes sense for consumer video cards. I think they would have been far better off releasing Vega months ago, with GDDR5X.

w3rd · Mar 13, 2017

Arachnotronic said:
Because HBM2 is expensive to implement, is expensive itself, makes it prohibitively expensive for large memory configurations, and is just generally a poor solution for consumer products when a GDDR-follow on that can offer similar performance is available.

It has nothing to do with "release rights."

Sorry, but Nvidia doesn't have a cohesive unified memory controller like Vega does. So, it can not make use of HBM/HBM2 like Vega can. HBM is just a selling point/marketing for Nvidia anyways, their technology can't make full use of it's bandwidth. That is why NVidia cheapened up using inferior GDDR5x.

Valantar · Mar 13, 2017

w3rd said:
Sorry, but Nvidia doesn't have a cohesive unified memory controller like Vega does. So, it can not make use of HBM/HBM2 like Vega can. HBM is just a selling point/marketing for Nvidia anyways, their technology can't make full use of it's bandwidth. That is why NVidia cheapened up using inferior GDDR5x.

Uh, okay. None of that is true. We don't yet know what the memory controller of Vega can actually do, but AMD used a "simple" HBM controller in the Fury cards - which worked perfectly well - and Nvidia has something comparable in P100. A "simple" memory controller can perfectly well utilize all the bandwidth of the memory connected to it. That's up to a) the bandwidth and speed of the controller (which again in a determinant of what memory ends up being connected to it, not the other way around), and b) the tasks given to it. Or are you saying Nvidia's GDDR5/5X controllers can't utilize the full bandwidth of those either?

I'm a big supporter of AMDs desicion to go HBM2 with Vega - it has a ton of advantages - but it's very clear that Nvidia's choice of GDDR5X for high-end consumer cards stems from a combination of cost, availability, GDDR5X being "close enough" in performance, and Nvidia's GPU architecture efficiency advantage. Why? Nvidia sells far more GPUs than AMD, so they'd need a much healthier supply of HBM. This doesn't exist, and even if it did, costs for higher volumes would be huge. They can fit the 20+W of a 8GB+ GDDR5X setup within reasonable board TDPs due to their (for now) superior GPU efficiency. And, as the 1070, 1080 and 1080Ti show, GDDR5X is no slouch. Would HBM have been better? Probably, but it would significantly cut into Nvidia's margins (or force prices higher) while giving them a small advantage (over themselves?) at best.

While the Vega memory controller sounds innovative and might be ground-breaking, it certainly isn't the sole reason why AMD went HBM2. After all, they made HBM GPUs way before a controller like that existed.

lobz · Mar 13, 2017

amenx said:
Whenever there are big delays, you know things dont look good.

that explicitly means that things are very, very bad at intel for like 2-3 years now

lobz · Mar 13, 2017

Sweepr said:
Two stack HBM2 with only a minor bandwidth advantage compared to cut down GP102 (8GB 512GB/s vs 11GB 484GB/s) isn't really impressive - Fury X had 50% more bandwidth than GTX 980 Ti. Meanwhile rumor says the competition is embracing 16GB GDDR6 at >14Gbps for their next generation.

That is just by a design choice, they can go (and most certainly will, with professional products first) to 4 stacks any time. With the new hbcc, more bandwidth is simply not needed, and they have this bandwidth on an area much, much smaller than they'd have gotten it with ddr ram

lobz · Mar 13, 2017

beginner99 said:
Because it's better but also more expensive. Why would NV use GDDR5x on 1080Ti (GP102) if HBM2 was cheaper?

I'm not an NV fan but they clearly have an edge in memory bandwidth and power usage. So they can get away with GDDR5x instead of hbm2. If a technology is better and cheaper, we would have seen way, way faster adoption.

I think NVIDIA hate working with interposers, they just don't have any other choice in the pro market

lobz · Mar 13, 2017

Valantar said:
Uh, okay. None of that is true. We don't yet know what the memory controller of Vega can actually do, but AMD used a "simple" HBM controller in the Fury cards - which worked perfectly well - and Nvidia has something comparable in P100. A "simple" memory controller can perfectly well utilize all the bandwidth of the memory connected to it. That's up to a) the bandwidth and speed of the controller (which again in a determinant of what memory ends up being connected to it, not the other way around), and b) the tasks given to it. Or are you saying Nvidia's GDDR5/5X controllers can't utilize the full bandwidth of those either?

I'm a big supporter of AMDs desicion to go HBM2 with Vega - it has a ton of advantages - but it's very clear that Nvidia's choice of GDDR5X for high-end consumer cards stems from a combination of cost, availability, GDDR5X being "close enough" in performance, and Nvidia's GPU architecture efficiency advantage. Why? Nvidia sells far more GPUs than AMD, so they'd need a much healthier supply of HBM. This doesn't exist, and even if it did, costs for higher volumes would be huge. They can fit the 20+W of a 8GB+ GDDR5X setup within reasonable board TDPs due to their (for now) superior GPU efficiency. And, as the 1070, 1080 and 1080Ti show, GDDR5X is no slouch. Would HBM have been better? Probably, but it would significantly cut into Nvidia's margins (or force prices higher) while giving them a small advantage (over themselves?) at best.

While the Vega memory controller sounds innovative and might be ground-breaking, it certainly isn't the sole reason why AMD went HBM2. After all, they made HBM GPUs way before a controller like that existed.

worked 'well' -> it couldn't use more than 2/3 of its theoretical bandwidth. That is changing with hbcc and hbm2 for vega. We must wait to be able to test gp100 against vega, so we can compare their memory solutions, but fury's HMB is not an indicator for this.

Glo. · Mar 13, 2017

w3rd said:
Sorry, but Nvidia doesn't have a cohesive unified memory controller like Vega does. So, it can not make use of HBM/HBM2 like Vega can. HBM is just a selling point/marketing for Nvidia anyways, their technology can't make full use of it's bandwidth. That is why NVidia cheapened up using inferior GDDR5x.

You do not need to have Unified Memory controller on hardware to get the feature compatibility. Nvidia did with GP100 the same thing, but through CUDA. CUDA Unified Memory has exactly the same level of compatibility with this feature(49 bit register), like Vega has on hardware level. But here are starting the differences.

Vega has on hardware level compatibility with Unified Memory, in OpenCL 2.0. CUDA has software level compatibility with Unified Memory in OpenCL 2.0.

Where is the difference? Software layer of abstraction. You will not have the benefit of 49 bit register, if you do not use software, specifically designed for the hardware.

AMD hardware will always adapt itself, to the situation.

It will be interesting from my perspective about Vega. Vega appears to not bring anything to the table in terms of compute capabilities, and I am a part of professional forum, on which professionals make their decisions about hardware purchases, based on gaming benchmarks, and completely ignoring compute benchmarks. I could post time and time again, showing RX 480 being in professional applications just 10% slower than GTX 1070, and nobody there cared about it. That is how strong appeal Nvidia brand has. I will have a very good laugh from them, watching what will happen with Vega.

Valantar · Mar 13, 2017

lobz said:
worked 'well' -> it couldn't use more than 2/3 of its theoretical bandwidth. That is changing with hbcc and hbm2 for vega. We must wait to be able to test gp100 against vega, so we can compare their memory solutions, but fury's HMB is not an indicator for this.

Uh, okay? Numbers/benchmarks to demonstrate that, please? That account for delta color compression and other factors, and that measure VRAM-to-GPU bandwidth only? If you're drawing conclusions from comparisons with Nvidia cards with slower memory, you have to account for their clearly superior compression algorithms.

crisium · Mar 13, 2017

Valantar said:
Uh, okay? Numbers/benchmarks to demonstrate that, please? That account for delta color compression and other factors, and that measure VRAM-to-GPU bandwidth only? If you're drawing conclusions from comparisons with Nvidia cards with slower memory, you have to account for their clearly superior compression algorithms.

http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4

780 Ti, Titan X, and 980 Ti all have 336 GB/s. 980 has 224 GB/s.
290X has 346 GB/s (factory OC). Fury X has 512 GB/s.

Fury X loses random texture efficiency here compared to the 290X. So you don't even have to compare it to Nvidia to see they are not doing so well with HBM. In this one test only, of course. But it's the only one I have seen, so anyone with more link them please.

The 'good news' from this is that AMD have room to work with. Even though Fiji and Vega have the same theoretical bandwidth, if AMD can catch up to Maxwell v2 efficiency then there will be an appreciable bandwidth increase.

Edit: Techreport says they have 346 GB/s for the 290X. Must be a factory OC. Numbers updated.

w3rd · Mar 13, 2017

Glo. said:
You do not need to have Unified Memory controller on hardware to get the feature compatibility. Nvidia did with GP100 the same thing, but through CUDA. CUDA Unified Memory has exactly the same level of compatibility with this feature(49 bit register), like Vega has on hardware level. But here are starting the differences.

Vega has on hardware level compatibility with Unified Memory, in OpenCL 2.0. CUDA has software level compatibility with Unified Memory in OpenCL 2.0.

Where is the difference? Software layer of abstraction. You will not have the benefit of 49 bit register, if you do not use software, specifically designed for the hardware.

AMD hardware will always adapt itself, to the situation.

It will be interesting from my perspective about Vega. Vega appears to not bring anything to the table in terms of compute capabilities, and I am a part of professional forum, on which professionals make their decisions about hardware purchases, based on gaming benchmarks, and completely ignoring compute benchmarks. I could post time and time again, showing RX 480 being in professional applications just 10% slower than GTX 1070, and nobody there cared about it. That is how strong appeal Nvidia brand has. I will have a very good laugh from them, watching what will happen with Vega.

I understand about the 49 bit registers..
But what I am talking about is the actual non-volatile memory that interacts with HBM as a cache. Nvidia doesn't have this, therefore they can't use HBM to it's fullest potential. That is why GDDR5x is good enough.

Bacon1 · Mar 13, 2017

crisium said:
http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4

780 Ti, Titan X, and 980 Ti all have 336 GB/s. 980 has 224 GB/s.
290X has 320 GB/s. Fury X has 512 GB/s.

Fury X has 1.6x the theoretical bandwidth of the 290X, but does not come close to that in this test. So you don't even have to compare it to Nvidia to see they are not doing so well with HBM. In this one test only, of course. But it's the only one I have seen, so anyone with more link them please.

I think that test is flawed though, and has some kind of limit.

The 980 Ti / Titan X has max bandwidth of 336 but only gets 234/238. 290x? 346 vs the 263 it gets in the test. 780 Ti also matches the 980 Ti and Titan X at 336 max but only gets 197.

So its not just Fury X that is under performing in that test, all cards are massively underperforming. 980 Ti / Titan X should be getting another 40%.

crisium · Mar 13, 2017

The 780 Ti and Fury X stand out the most as under performing. Clearly Nvidia improved a lot with Maxwell, but AMD lost efficiency compared to Hawaii. I think a fair argument can be made that AMD did not work "well" with their first HBM experiment. Again, try to view the good news that they have likely improved for Vega.

Percentage of theoretical bandwidth achieved
Black / Random Texture:
780 Ti: 66% / 59%
980 Ti: 108% / 70%
290X: 76% / 76%
Fury X: 76% / 65%

I invite more tests if there are any, but this is the only one I remember from Fiji launch.

Edit: Techreport says they have 346 GB/s for the 290X. Must be a factory OC. Numbers updated.

Bacon1 · Mar 13, 2017

You can't compare compression and theoretical bandwidth though, because like you saw, you get over 100% which doesn't make sense. Compression is there to help lower bandwidth because you compress it, then send it and then uncompress on the other side.

Edit: You see the same thing with 480:

http://techreport.com/review/30328/amd-radeon-rx-480-graphics-card-reviewed/5

Compressed outpaces the bandwidth limit.

And even more so with 1080 / 1070:

http://techreport.com/review/30413/nvidia-geforce-gtx-1070-graphics-card-reviewed/4

Where 1070 / 1080 get way over their max when compressed. So you have to just look at the random ones which all fall short... so that is why I think the test is flawed as it doesn't fully saturate the bandwidth for any card. Even 1080 is only 247 / 320 or should have 30% more bandwidth.

Glo. · Mar 13, 2017

w3rd said:
I understand about the 49 bit registers..
But what I am talking about is the actual non-volatile memory that interacts with HBM as a cache. Nvidia doesn't have this, therefore they can't use HBM to it's fullest potential. That is why GDDR5x is good enough.

Nope, they can. HBM will act just like "normal memory" not like just a cache for Data. It is important to understand the difference here, because it changes approach, but does not make one better or worse form the other.

Valantar · Mar 13, 2017

crisium said:
http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4

780 Ti, Titan X, and 980 Ti all have 336 GB/s. 980 has 224 GB/s.
290X has 320 GB/s. Fury X has 512 GB/s.

Fury X has 1.6x the theoretical bandwidth of the 290X, but does not come close to that in this test. So you don't even have to compare it to Nvidia to see they are not doing so well with HBM. In this one test only, of course. But it's the only one I have seen, so anyone with more link them please.

The 'good news' from this is that AMD have room to work with. Even though Fiji and Vega have the same theoretical bandwidth, if AMD can catch up to Maxwell v2 efficiency then there will be an appreciable bandwidth increase.

So to avoid compressible textures, you need to look at the "Random texture" numbers.

The 290X scores 82% of its theoretical maximum. The 980 scores 76%. The 980Ti scores 70%. The Titan X scores 71%. The 780Ti scores 58%. And the Fury X scores 65%. For a first-generation controller for a never-seen-before memory technology (competing with controllers refined through 5 (AMD)/4 (Nvidia 9XX) generations), I'd say that's pretty decent. All that really shows is that the 290X has a very efficient memory subsystem (although the lack of compression is killing it when it comes to compressible textures). And the first-gen HBM controller in the Fury X trounces the efficiency of the 780Ti, which was a third-gen GDDR5 controller. I see nothing wrong with those numbers.

lobz · Mar 13, 2017

Valantar said:
Uh, okay? Numbers/benchmarks to demonstrate that, please? That account for delta color compression and other factors, and that measure VRAM-to-GPU bandwidth only? If you're drawing conclusions from comparisons with Nvidia cards with slower memory, you have to account for their clearly superior compression algorithms.

I did absolutely not compare fury to any nvidia gpu. I was reading a lot of professional forums back then with tests (not game benches) that check actually usable memory bandwidth. I apologize for not wanting to search for a forum discussion that has happened more than 1,5 years ago.

Sweepr · Mar 14, 2017

- Radeon RX580/RX570 launching April 18th, right after Ryzen 5
- Same Polaris GPUs, frequency bump (unknown if other differences)
- Polaris 12 positioned against GTX 1050
- Vega expected in May

http://news.mydrivers.com/1/523/523439.htm

swilli89 · Mar 14, 2017

Sweepr said:
- Radeon RX580/RX570 launching April 18th, right after Ryzen 5
- Same Polaris GPUs, frequency bump (unknown if other differences)
- Polaris 12 positioned against GTX 1050
- Vega expected in May

http://news.mydrivers.com/1/523/523439.htm

Thanks for the post.

Will be interesting what AMD can do with a process +1 year matured as well as better tuning and binning. Been seeing more and more 480's on reddit/r/amd that hit 1400mhz at near stock voltage.

Krteq · Mar 14, 2017

Hmm, I'm curious if they used Polaris 10XT2 which has been spotted in Mac OS update.

raghu78 · Mar 14, 2017

Polaris 12 is not going to compete against GTX 1050 no matter what some rumours say. I think Polaris 11 based RX 560 will be priced to compete against GTX 1050 at around USD 99- USD 109.

http://wccftech.com/amd-radeon-rx-580-570-launch-delay-rumor/

AMD's problem with Polaris is its less efficient than Pascal in terms of perf/sq mm and perf/watt. Nvidia can easily adjust prices to make the price perf very similar and they can do so while still making higher margins due to a smaller GPU and cheaper BOM. I think Rx 570 could turn out to be a very good card from a price/perf and efficiency standpoint. We know Polaris efficiency falls off sharply once you push clocks beyond 1250 Mhz. Nvidia will still call the shots with some tweaked SKUs based on GP106 and GP107 with slightly higher core clocks.

I hope AMD Vega can compete with Nvidia in terms of perf/watt and perf/sq mm. AMD cannot win a price war until it matches Nvidia on those 2 important metrics.

Vega/Navi Rumors (Updated)

Diamond Member

Diamond Member

Golden Member

Diamond Member

Senior member

Senior member

Golden Member

Platinum Member

Platinum Member

Platinum Member

Platinum Member

Diamond Member

Golden Member

Platinum Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Golden Member

Platinum Member

Diamond Member

Golden Member

Golden Member

Diamond Member