Vega/Navi Rumors (Updated)

Status
Not open for further replies.

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Tell that to the people who want a passively cooled GPU for HTPC use. Green sounds like a great moniker all of a sudden.

It all depends on how they handle it, of course. I remember the Nvidia GTX 460 Green; that one seemed to confuse/annoy people more than denote anything.

Of course AMD wouldn't use it even if it made perfect sense. The demented Ryzen naming system, which seems to be an amalgamation of Intel's scheme and pure madness, suggests we'll get the Radeon RX Vega 5080 Fury Ti or something similarly cruddy.

Haha, nice! Kudos for that one.
 

Malogeek

Golden Member
Mar 5, 2017
1,390
778
136
yaktribe.org
AMD would be better off with a GDDR5 8 Gig card and an HBM2 8 Gig card.
That would be even more confusing for the average gamer.

There are many instances over the years of a higher-tier card having less VRAM than a lower-performing version, the 1060 being the most obvious recent example.

Brand PC makers will want a 4GB version, but it's highly likely we'll see standalone GPUs in a 4GB version as well.
 

Despoiler

Golden Member
Nov 10, 2007
1,966
770
136
I've still never seen anyone provide the cost of an HBM implementation. Everyone keeps saying it's astronomically expensive. While I agree it's more expensive than GDDR, given the added complexity (interposer, TSVs, final packaging) and lower economies of scale, I'm not sure what the offset is from a smaller PCB, a less complex board and board power design, and lower cooling requirements.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
I agree: having two different 8GB SKUs would be a nightmare. Besides, if AMD's demos are anything to go by, the HBCC should have the potential to alleviate many of the potential issues with "only" 4GB of VRAM. And 4GB is plenty even for contemporary games at 1440p (unless you run Skyrim modded into the ground, in which case you have yourself to blame). Need a higher resolution? Get the 8GB version.

I for one was more excited by this:
Hexus.net said:
The biggest takeaway from China (sorry) was Herkelman's revelation that Vega would be important to AMD's laptop plans. As AMD's 14nm Vega is compact and efficient it can be leveraged in laptops, said the VP. However, detail wasn't specific enough to determine whether Vega would come to laptops as a discrete graphics chip, in a new APU design, or via both of those routes. Whatever the way to the portable market, Vega will facilitate OEM partner creation of "thinner and lighter notebooks, that still pack that punch you need to drive virtual reality or the latest and greatest AAA games." That sounds rather attractive
([url=http://hexus.net/tech/news/graphics/103900-amd-says-rx-vega-just-around-corner/]link[/url])
While I admittedly am going out on a limb here, could this mean that my previous (totally and utterly unfounded) speculation that small Vega is mobile-focused is close to the truth?

If they're aiming for mobile in a big way (where power and space constraints are king, and cost is less of an issue than on desktop since margins are higher), launching a Vega chip around or slightly above RX 480 levels of performance would make sense in my mind, as long as it consumes noticeably less power and production costs are reasonable. If they could compete with the mobile 1060 while using less power, beat it at the same retail price, or both, that's huge. Far more important than competing with the 1070 and 1080 in the mobile market. The area savings are a given with HBM2, so that's already done. Heck, in a 14" chassis, the area savings of HBM2 would be a pretty big deal, requiring nothing but power delivery outside of the package itself. If both Gigabyte and Razer can cool a 1060 reasonably well in a 14" chassis, could this open up possibilities for 1070 levels of performance, with the extra room being used for bigger heatsinks and/or fans?

Also: has anyone, ever, made a mobile GPU with a ~500mm2 chip? That sounds utterly insane, even if clocked down significantly. And brilliant, if they can swing it. Huge chip, tiny area due to HBM, low power consumption for the size and performance due to arch improvements and purposely lowered clocks - I struggle to see how this wouldn't beat the mobile 1080 in pretty much every metric. If the desktop chip can compete with the 1080Ti at 225W (which we of course don't know, but let's go with it for now), 10-20% lower clocks and power binning should make for a killer mobile chip. After all, wide and slow is pretty much always more efficient than narrow and fast in the GPU space.
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
I thought the Fury X had no bottleneck at 4K with 4GB of HBM? Considering Vega is supposedly designed with memory capacity constraints alleviated, maybe it's 2GB HBM2 for HD, 4GB HBM2 for 4K and 8GB HBM2 for 8K? I know if I had no plans to upgrade my monitor for a few years, I'd be fine with less memory for less money.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
I suggest going back to the Vega slides presented, and reading about the effect HBCC has on memory management, memory requirements, and understanding the whole situation.

P.S. 4GB GPUs will have 2 stacks of HBM2.
 
Reactions: 3DVagabond

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Raja did say they're shifting the advertisement to be more about memory bandwidth than amount. So you'd see 512GB/s on the box instead of 4GB. Reason being that the former is more important with HBCC than the latter.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
I suggest going back to the Vega slides presented, and reading about the effect HBCC has on memory management, memory requirements, and understanding the whole situation.

P.S. 4GB GPUs will have 2 stacks of HBM2.
According to Hexus, it's unclear whether they'll use one or two stacks (see link above).
I thought the Fury X had no bottleneck at 4K with 4GB of HBM? Considering Vega is supposedly designed with memory capacity constraints alleviated, maybe it's 2GB HBM2 for HD, 4GB HBM2 for 4K and 8GB HBM2 for 8K? I know if I had no plans to upgrade my monitor for a few years, I'd be fine with less memory for less money.
For the vast majority of un-modded games, 4GB is okay for 4K, although some newer titles, at least on Nvidia cards, have been shown to use more. If you use graphics-heavy mods, exceeding 4GB is quite easy. I've seen situations where Nvidia GPUs allocate significantly more VRAM (~1GB) than AMD counterparts in the same game and the same situation with the same amount of VRAM (although those cards also significantly outperform the AMD cards, so this might be good for performance, or not; we don't know).

As for 8K, GPU performance is nowhere near sufficient for that yet. Remember, that's 4x the pixels of 4K, so expecting much more than 1/4 of 4K performance is naive. As such, even a 1080Ti couldn't sustain anything close to 30fps at 8K ultra. Probably not even in something light like Overwatch.
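The resolution arithmetic behind that claim is easy to check; here's a quick sketch (pure pixel-count scaling, ignoring everything else that affects framerate):

```python
# Rough pixel-count arithmetic behind the "8K is 4x 4K" point.
resolutions = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
    "8K":    (7680, 4320),
}
pixels = {name: w * h for name, (w, h) in resolutions.items()}

print(pixels["8K"] / pixels["4K"])  # 4.0 -- four times the pixels

# If a card manages 60 fps at 4K and performance scaled purely
# with pixel count, 8K would land around:
print(60 * pixels["4K"] / pixels["8K"])  # 15.0 fps
```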
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
According to Hexus, it's unclear whether they'll use one or two stacks (see link above).
First of all, Vega has a Unified Memory feature on the GPU. Required data will be "streamed" to the GPU, at the available memory bandwidth, when it's needed. So the higher the memory bandwidth you have, the better your results. There will be much less of a difference in 4K performance between the 4 and 8 GB versions of the GPU. So this is simply a technical argument. A $399 price tag for the 4GB GPU is enough to cover the manufacturing costs. And the fact that this market is much wider than the $599-699 one also allows AMD to get higher volume, and therefore lower manufacturing cost, from 2Hi stacks.
 
Reactions: w3rd

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
First of all, Vega has a Unified Memory feature on the GPU. Required data will be "streamed" to the GPU, at the available memory bandwidth, when it's needed. So the higher the memory bandwidth you have, the better your results. There will be much less of a difference in 4K performance between the 4 and 8 GB versions of the GPU. So this is simply a technical argument. A $399 price tag for the 4GB GPU is enough to cover the manufacturing costs. And the fact that this market is much wider than the $599-699 one also allows AMD to get higher volume, and therefore lower manufacturing cost, from 2Hi stacks.
a) I'm just reiterating what Hexus said, which is that this seems unclear. Given that all of this seems to stem from a Videocardz post, I'm trying to take as much as possible with a grain of salt.
and b) Isn't part of the big improvement with the HBCC meant to be transfers between storage/system memory and VRAM, rather than between VRAM and GPU? Otherwise, how would it compensate for low amounts of VRAM? After all, that's where the biggest bandwidth bottleneck is, by far.
 

Malogeek

Golden Member
Mar 5, 2017
1,390
778
136
yaktribe.org
We'll have to see how this translates to real-world use in low-VRAM tests. After all, we've had things like AGP memory long ago, which proved to perform horribly. Obviously with PCIe 3 and DDR4 it's a different world now for system memory bandwidth, but the point of onboard VRAM is to provide the fastest possible storage for the GPU to access.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
This also shows 16GB from 2 stacks of HBM (8GB each).

The card they showed off in Doom and everything else public has been an 8GB card.

So either the graphic is wrong, or we were seeing another GPU.
Or they just put more RAM on their HPC/datacenter versions of the same GPU (which is what those slides purportedly cover). Which is the norm, after all. Two stacks of HBM2 allows for 4, 8, and 16 GB with different stack heights, no other changes necessary. Heck, they could probably do 2, 6, 10, 12 and 14 GB too, if they wanted. I just don't see the need for 8 different SKUs ...
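Valantar's stack-height math above can be sketched quickly. This assumes the common HBM2 configuration of 1GB per die with 2Hi/4Hi/8Hi stack heights; mixing different heights between the two stacks is my own hypothetical, not something AMD has announced:

```python
# Capacities reachable with two HBM2 stacks, assuming the common
# 1GB-per-die configuration and 2Hi/4Hi/8Hi stack heights.
GB_PER_DIE = 1
heights = [2, 4, 8]

# Both stacks at the same height:
matched = sorted({2 * h * GB_PER_DIE for h in heights})
print(matched)  # [4, 8, 16]

# Mixing heights between the two stacks (hypothetical) adds sizes:
mixed = sorted({(a + b) * GB_PER_DIE for a in heights for b in heights})
print(mixed)  # [4, 6, 8, 10, 12, 16]
```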
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Or they just put more RAM on their HPC/datacenter versions of the same GPU (which is what those slides purportedly cover). Which is the norm, after all. Two stacks of HBM2 allows for 4, 8, and 16 GB with different stack heights, no other changes necessary. Heck, they could probably do 2, 6, 10, 12 and 14 GB too, if they wanted. I just don't see the need for 8 different SKUs ...

Oh I know, I was just pointing out that we can't quote that as the specs we've seen or what it will release at, since it's already been shown to be different.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
a) I'm just reiterating what Hexus said, which is that this seems unclear. Given that all of this seems to stem from a Videocardz post, I'm trying to take as much as possible with a grain of salt.
and b) Isn't part of the big improvement with the HBCC meant to be transfers between storage/system memory and VRAM, rather than between VRAM and GPU? Otherwise, how would it compensate for low amounts of VRAM? After all, that's where the biggest bandwidth bottleneck is, by far.

Transfers between system memory and GPU memory are not a problem with the traditional VRAM model. The real issue is the software-based memory management. There will be a lot of data in the GPU memory that's just sitting there, not even used. It is very typical today that a card with 8 GB of VRAM will only use 2-3 GB efficiently; the other 5-6 GB is just loaded but unused data. With HBCC, x GB means x GB of usable memory: there won't be any unused data in the HBC.
 
Reactions: Despoiler

zlatan

Senior member
Mar 15, 2011
580
291
136
Raja did say they're shifting the advertisement to be more about memory bandwidth than amount. So you'd see 512GB/s on the box instead of 4GB. Reason being that the former is more important with HBCC than the latter.

The size of the GPU memory still matters; AMD just wants to oversimplify this in their advertising.
A very simple but mostly accurate way to put it is that a 4-8 GB card under the HBCC model effectively means 12-24 GB under traditional software-based management. But that sounds too silly for an advertisement.
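Taking zlatan's claimed ratio at face value (his estimate, not an official AMD figure), the arithmetic is simply:

```python
# zlatan's claim, stated as arithmetic. The ~3x factor is his
# estimate of how much VRAM traditional software management wastes,
# not an official AMD number.
claimed_multiplier = 3
for hbcc_gb in (4, 8):
    equivalent = hbcc_gb * claimed_multiplier
    print(f"{hbcc_gb} GB with HBCC ~ {equivalent} GB traditionally managed")
# 4 GB -> 12 GB and 8 GB -> 24 GB, i.e. the 12-24 GB range he cites
```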
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
Just an FYI: you can quote several messages in one post. Makes for a cleaner forum. Now, onto the topic:

The real issue is the software based memory management. There will be a lot of data in the GPU memory that's just in the memory, but not even used.
Yes, and why is this a problem? Because when new assets are loaded, the data needed a) isn't in VRAM, and b) needs to be transferred into VRAM, leading to bottlenecks rearing their heads. Transferred in from where, you ask? From system memory or storage. If this wasn't the case, the situation you're describing wouldn't be a problem in any way, shape or form. After all, if all you need are those 2-3GB of used assets, having 5-6GB of unused assets in VRAM has zero effect on performance. Windows does this constantly and on purpose, using "unused" system RAM for caching all kinds of stuff - all without any effect on system performance at all, as it's immediately overwritten if needed. So the only situation in which what you're describing is a problem, is a situation in which the GPU needs new data transferred to VRAM.

Now, I don't know close to enough about how various games and drivers allocate VRAM or how various data blocks are transferred to it to truly understand this, but my takeaway from these discussions is that current systems work something like this: The GPU needs an asset from block X, so block X is transferred to VRAM (filling the framebuffer in this scenario). Subsequently, an asset from block Y is needed, ejecting block X from VRAM as block Y is loaded - and so on. The point of the HBCC system seems to be granularity - that both blocks X and Y (and Z, and ....) can be held in a tiered caching system consisting of both system memory and storage, and thus only the relevant data are transferred to VRAM, not filling the framebuffer as quickly, thus not ejecting data as quickly (perhaps ejected data is again kept in system RAM?), and overall improving transfer efficiency and VRAM utilization dramatically. This would dramatically lower GPU wait times for out-of-VRAM assets (the difference in transfer time between transferring the specific textures you need vs. the entire texture library would be rather noticeable, no?), while simultaneously lowering the rate of occurrence of this due to more selective use of VRAM.

Now, as I said, I could be entirely wrong about this. I barely understand how GPUs work at all. But if I am, what exactly does the HBCC improve upon?
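The block X/block Y behaviour Valantar describes can be sketched as a toy paging model. Everything here (page size, LRU eviction policy, class and method names) is invented for illustration; it is not AMD's actual HBCC design:

```python
# Toy model of hardware-managed paging: the GPU sees one virtual
# address space; a controller keeps hot pages resident in a small
# "VRAM" and faults cold pages in from a backing store, evicting the
# least-recently-used page when full. Page size and policy invented.
PAGE = 4096

class PagedVram:
    def __init__(self, vram_pages, backing):
        self.capacity = vram_pages
        self.backing = backing     # page number -> data ("system RAM")
        self.resident = {}         # page number -> data ("VRAM")
        self.lru = []              # least-recently-used order
        self.faults = 0

    def read(self, vaddr):
        page = vaddr // PAGE
        if page not in self.resident:
            # Page fault: pull in only this page, not a whole asset block.
            self.faults += 1
            if len(self.resident) >= self.capacity:
                victim = self.lru.pop(0)
                self.backing[victim] = self.resident.pop(victim)
            self.resident[page] = self.backing.get(page, b"\0" * PAGE)
        else:
            self.lru.remove(page)
        self.lru.append(page)
        return self.resident[page]

vram = PagedVram(vram_pages=4, backing={})
for addr in [0, 4096, 8192, 0, 65536, 4096]:
    vram.read(addr)
print(vram.faults)  # 4 distinct pages touched -> 4 faults, 2 hits
```

The point of the sketch is the granularity argument from the post: only the pages actually read ever occupy "VRAM", and re-reads of recently used pages are free.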
 

richaron

Golden Member
Mar 27, 2012
1,357
329
136
The size of the GPU memory still matters; AMD just wants to oversimplify this in their advertising.
A very simple but mostly accurate way to put it is that a 4-8 GB card under the HBCC model effectively means 12-24 GB under traditional software-based management. But that sounds too silly for an advertisement.

This is just wrong. It's an apples to oranges comparison. 4GB local memory is most accurately compared to another 4GB local, no matter what type.

From what I know, the HBCC allows directly accessing system RAM (and more) by passing pointers or a virtual memory address space. Any other GPU can also access system RAM (via drivers), but because of the lack of hardware support and the added software complexity/overhead, the HBCC system is much simpler and more efficient. But being more efficient doesn't mean it's magically equal to more local GPU memory, and they all still rely on optimized software to prefetch and manage the data if they run out of local memory.

I could be wrong though, can you provide a quantitative way of working out your 12-24GB number?
 
Reactions: w3rd and Valantar

zlatan

Senior member
Mar 15, 2011
580
291
136
Just an FYI: you can quote several messages in one post. Makes for a cleaner forum. Now, onto the topic:


Yes, and why is this a problem? Because when new assets are loaded, the data needed a) isn't in VRAM, and b) needs to be transferred into VRAM, leading to bottlenecks rearing their heads. Transferred in from where, you ask? From system memory or storage. If this wasn't the case, the situation you're describing wouldn't be a problem in any way, shape or form. After all, if all you need are those 2-3GB of used assets, having 5-6GB of unused assets in VRAM has zero effect on performance. Windows does this constantly and on purpose, using "unused" system RAM for caching all kinds of stuff - all without any effect on system performance at all, as it's immediately overwritten if needed. So the only situation in which what you're describing is a problem, is a situation in which the GPU needs new data transferred to VRAM.

Now, I don't know close to enough about how various games and drivers allocate VRAM or how various data blocks are transferred to it to truly understand this, but my takeaway from these discussions is that current systems work something like this: The GPU needs an asset from block X, so block X is transferred to VRAM (filling the framebuffer in this scenario). Subsequently, an asset from block Y is needed, ejecting block X from VRAM as block Y is loaded - and so on. The point of the HBCC system seems to be granularity - that both blocks X and Y (and Z, and ....) can be held in a tiered caching system consisting of both system memory and storage, and thus only the relevant data are transferred to VRAM, not filling the framebuffer as quickly, thus not ejecting data as quickly (perhaps ejected data is again kept in system RAM?), and overall improving transfer efficiency and VRAM utilization dramatically. This would dramatically lower GPU wait times for out-of-VRAM assets (the difference in transfer time between transferring the specific textures you need vs. the entire texture library would be rather noticeable, no?), while simultaneously lowering the rate of occurrence of this due to more selective use of VRAM.

Now, as I said, I could be entirely wrong about this. I barely understand how GPUs work at all. But if I am, what exactly does the HBCC improve upon?

The problem is how the GPU is connected to the system. We chose a very wrong way years ago, and this added a lot of complexity. For the older APIs, the driver model is too robust, and the graphics kernel driver is unable to delete an allocation without causing stuttering. And this is a huge problem. The actual kernel drivers just load the stuff into VRAM, and deleting an allocation is the last thing the management will do. The hardware can work much, much better, but the software stack is not designed to map well to the actual hardware.
With D3D12 and Vulkan the application gets explicit control over the memory, but this also doesn't work as expected, because managing the data is very hard and most devs don't have the resources to do it well.
HBCC just cuts out the very inefficient software stack and makes the whole system hardware-managed.

Even today's hardware could get much better efficiency if we changed the WDDM software stack. Just think about the consoles. But it won't happen, because there is a lot of money in that code.
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
This is just wrong. It's an apples to oranges comparison. 4GB local memory is most accurately compared to another 4GB local, no matter what type.

From what I know, the HBCC allows directly accessing system RAM (and more) by passing pointers or a virtual memory address space. Any other GPU can also access system RAM (via drivers), but because of the lack of hardware support and the added software complexity/overhead, the HBCC system is much simpler and more efficient. But being more efficient doesn't mean it's magically equal to more local GPU memory, and they all still rely on optimized software to prefetch and manage the data if they run out of local memory.

I could be wrong though, can you provide a quantitative way of working out your 12-24GB number?
This is my (oversimplified) take on what you are saying:

First of all, Vega has a Unified Memory feature on the GPU. Required data will be "streamed" to the GPU, at the available memory bandwidth, when it's needed. So the higher the memory bandwidth you have, the better your results. There will be much less of a difference in 4K performance between the 4 and 8 GB versions of the GPU. So this is simply a technical argument. A $399 price tag for the 4GB GPU is enough to cover the manufacturing costs. And the fact that this market is much wider than the $599-699 one also allows AMD to get higher volume, and therefore lower manufacturing cost, from 2Hi stacks.
Zlatan is absolutely correct on the memory framebuffer and software memory allocation management.

However, it means that you effectively have 4GB of RAM with direct and immediate access to larger sets of data that are not allocated in memory.

I have said this before: Vega is possibly the most advanced GPU architecture we have ever seen.
 
Reactions: w3rd

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
It's like "Infinity Fabric", but for GPUs: a marketing term designed to hide the fact that, due to memory pricing realities, AMD is forced to introduce extra HW/driver schemes to manage GPU memory. And just like with CPUs, once you hit a link that has bandwidth/latency limits and is also being used for other things, you will get reduced performance.

We already got this kind of management with the GTX 970, and how that card performs once near/above 3.5GB is a perfect indicator of how this scheme will work once near the 4GB limit for this card. PCIe bandwidth/latency is still the same and it is still being used for other things. And no, AMD has not invented a time machine: it cannot anticipate user actions and does not know which assets will be needed.
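JoeRambo's bandwidth point can be put in numbers with a back-of-envelope sketch. The figures are nominal peak bandwidths (and the 512 GB/s HBM2 number is the rumored Vega spec, not confirmed):

```python
# Back-of-envelope transfer times: once data must come over PCIe
# instead of local HBM2, latency balloons. Nominal peaks, not
# measured figures.
PCIE3_X16_GBPS = 15.75   # PCIe 3.0 x16 usable bandwidth, roughly
HBM2_GBPS = 512.0        # two-stack HBM2, as rumored for Vega

payload_gb = 0.5         # half a gigabyte of overflow assets
t_pcie = payload_gb / PCIE3_X16_GBPS * 1000
t_hbm = payload_gb / HBM2_GBPS * 1000
print(f"over PCIe: {t_pcie:.1f} ms, from local HBM2: {t_hbm:.2f} ms")
# roughly two full 60fps frames versus a small fraction of one
```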
 
Reactions: xpea and Sweepr