William Gaatjes
Lifer
- May 11, 2008
It's an old picture. There are newer ones w/ updated components and respective latencies.
Picked the 1st one I found while googling as it conveys the point of wildly differing access latencies for respective components and scales therein. People often throw around components while ignoring the access latencies.
Also, to put this marketing gimmick to rest: all modern video cards have an HBCC, as it's nothing more than a DMA controller paging memory and communicating back and forth through system memory. The only questions are how it's implemented, how well it performs, and what code/driver support it needs to function. Radeon hasn't provided any of these details, so it's nothing more than fanciful marketing until then.
Hilariously, Nvidia outperforms Radeon cards tenfold in this area of the pipeline. So don't expect some earth-shattering change when the details arrive along with real-world, non-canned performance numbers.
I'm really getting tired of the Vega b.s and I was 100% on board waiting for its arrival.
The more I look into the technical details of the card, the more I understand how little of note these marketed features are. I don't judge AMD's CPU division in the same light. They seem to have actually gotten themselves together and fixed such glaring issues in their hardware pipeline.
Interesting that you mention DMA, because that is what I was thinking about too when reading about HBCC in the previous posts, and I was thinking about the IOMMU as well. The DMA controllers indeed take care of retrieving data in parallel with current execution. Normally it is the game engine or 3D CAD software that must schedule the DMA transfers correctly, so that they actually prefetch data before the execution units in the GPU need it. Otherwise the execution units stall while waiting for data, which nullifies the benefit of DMA in this kind of case. That is DMA under software control; now on to the GPU IOMMU.
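To put rough numbers on the stall argument, here is a toy timing model in Python. It is only a sketch: it assumes every chunk takes the same time to transfer and the same time to process, and the names `serial_time` and `prefetched_time` are made up for illustration, not taken from any engine or driver.

```python
def serial_time(chunks: int, transfer: float, compute: float) -> float:
    """Total time when nothing overlaps: the execution units sit idle
    during every transfer and only then start computing."""
    return chunks * (transfer + compute)


def prefetched_time(chunks: int, transfer: float, compute: float) -> float:
    """Total time when the DMA engine fetches chunk i+1 while the
    execution units work on chunk i (double buffering)."""
    if chunks == 0:
        return 0.0
    # The first transfer cannot be hidden and the last compute has nothing
    # left to overlap with. In between, each step costs whichever is slower.
    return transfer + (chunks - 1) * max(transfer, compute) + compute


print(serial_time(4, 2.0, 3.0))      # 4 * (2 + 3) = 20.0
print(prefetched_time(4, 2.0, 3.0))  # 2 + 3 * 3 + 3 = 14.0
```

If the transfer is slower than the compute step, the transfer time dominates and prefetching can no longer hide it completely, which is the slow-link case discussed further down.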
The IOMMU translates the device's virtual addresses into physical addresses in addressable system memory (all the memory locations the CPU can address, not just the physically present and available memory).
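As a rough illustration of that translation step, here is a toy page-table lookup in Python. It is a sketch under stated assumptions: a single 4 KiB page size, a flat dictionary instead of a multi-level table, and a hypothetical class name `ToyIommu`. Real IOMMUs are considerably more involved.

```python
PAGE_SIZE = 4096  # assumed page size; real IOMMUs support several sizes


class ToyIommu:
    """Maps device-visible virtual pages to physical frames (illustration only)."""

    def __init__(self):
        # virtual page number -> physical frame number
        self.page_table = {}

    def map_page(self, virt_page: int, phys_frame: int) -> None:
        self.page_table[virt_page] = phys_frame

    def translate(self, virt_addr: int) -> int:
        # Split the address into a page number and an offset within the page.
        virt_page, offset = divmod(virt_addr, PAGE_SIZE)
        if virt_page not in self.page_table:
            # Real hardware would report a fault to the OS here.
            raise LookupError(f"no mapping for device address {virt_addr:#x}")
        return self.page_table[virt_page] * PAGE_SIZE + offset


iommu = ToyIommu()
iommu.map_page(virt_page=2, phys_frame=7)
print(hex(iommu.translate(2 * PAGE_SIZE + 0x10)))  # 0x7010
```

The device only ever sees its own virtual addresses; where a page really lives is decided by whoever fills in the table.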
I think the IOMMU can also perform some caching tasks, but I am not sure about this.
And that is what got me wondering.
I wonder if HBCC works well in an HSA environment.
The whole zero copy idea of HSA was that only pointers are passed and no data is actually unnecessarily copied, saving bandwidth and preventing high latencies.
On a CPU/GPU combination like an APU with a single memory space, that works fantastically.
On a CPU + GPU system where the GPU is connected over a high-speed serial link (many serial lanes in parallel, like PCIe or IF), only pointers are passed over the link, and the GPU can access main system memory over that link. But as you mentioned, that is indeed the slowdown factor, and the only way to avoid it is intelligent prefetching.
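The difference between passing a pointer and copying the data can be shown with plain Python buffers. This is only an analogy for the zero-copy idea described above, not HSA code: `memoryview` stands in for a shared pointer, and the buffer contents are invented for the example.

```python
frame = bytearray(b"vertex data")  # stands in for a buffer in shared memory

copied = bytes(frame)        # a real copy: costs bandwidth and its own storage
shared = memoryview(frame)   # zero copy: only a reference to the same bytes

frame[0:6] = b"VERTEX"       # the producer updates the buffer in place

print(copied)                # b'vertex data'  (stale snapshot)
print(shared.tobytes())      # b'VERTEX data'  (sees the update)
```

The reference costs nothing to hand over, but every access through it still has to reach the original memory, which is where the link latency comes back in on a discrete GPU.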
Normally the game engine or 3D CAD program takes care of this prefetching. (By prefetching I mean here that the data is fetched ahead of actual use, hiding the latency.)
If I understand correctly, AMD boasts that they can do that prefetching automatically through the driver, even from slower devices such as SSD storage.
I wonder how well that works. The software does have to tell the hardware what to retrieve in advance.
It would be really difficult to track all the behavior of a program to know what is needed.
That would make a GPU driver very complex.
And it seems that AMD already has a dire need to expand its GPU software department.
Also, HBCC should extend physical addressing beyond system memory, which is all a normal IOMMU covers, to data on storage as well (HBCC is designed with 512 TB of addressing range).
So: virtual addressing to system memory and to other storage devices. At least, that is how I interpreted it.
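For a sense of scale, the 512 TB figure can be turned into an address width with a few lines of Python. This assumes the figure is meant in binary units (512 × 2^40 bytes) and uses a 4 KiB page size purely as an example.

```python
TIB = 2**40                 # one tebibyte in bytes
hbcc_range = 512 * TIB      # the 512 TB figure, read as binary units (assumption)

# Number of address bits needed to reach every byte in that range.
address_bits = (hbcc_range - 1).bit_length()

# Number of pages in that range, assuming 4 KiB pages (example value only).
pages_4k = hbcc_range // 4096

print(address_bits)         # 49
print(pages_4k == 2**37)    # True
```

So the range corresponds to 49-bit byte addressing, far more than any installed system memory, which fits the idea that it is meant to cover storage as well.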
Looking at the diagram from this AMD whitepaper, the HBCC replaces the memory controller. Note the lack of an IOMMU. So I assume the HBCC is also doing the IOMMU tasks.
When reading here,
http://www.guru3d.com/articles-pages/amd-radeon-rx-vega-64-8gb-review,31.html
Starting with Vega architecture you have the ability to use a bit of your system memory and assign it to the graphics card. In the drivers (global settings) you will see a HBCC (High Bandwidth Cache Controller) entry, you can assign a part of your system memory that the graphics card can then use as extra cache.
Now as my editor Ian reminded me on, this idea is pretty similar to technologies used for years like Turbocache (Nv) or Hypermemory (ATi) like ten years ago. HBCC it is totally different and yet offer the same.
That sure is interesting.
And that is what you mentioned.
Nvidia turbocache and ATI hypermemory.
But AMD claims they can do it better now by making use of mass storage devices as well.
edit:
Forgot to mention that the DMA engine is missing too from the picture.
XDMA is for CrossFire, so I assume that the HBCC is doing the DMA and IOMMU tasks in addition to being a memory controller.