Let me get back on track:
https://forum.beyond3d.com/threads/...ssion-soc-tegra-x1.59968/page-14#post-1973875
Sebbbi from Beyond3D:
Just wanted to clarify that I meant AMD GCN2 (consoles) vs Nvidia's latest (Maxwell/Pascal). AMD PC GPUs have also improved since GCN2.
Improvements for general performance:
- GCN3 introduced delta color compression, including the ability to sample/load compressed textures without a decompression step.
- GCN3 improved geometry tessellation performance
- GCN4 improved geometry performance in general (including fast strips, primitive discard, etc).
- GCN4 improved delta color compression.
- GCN4 added instruction prefetch (reduces pipeline latency, again helps with geom bottleneck).
- GCN4 improved async compute scheduling (GPU side)
GCN5 (Vega) adds these general performance improvements:
- L2 cache includes L2 ROP cache (L1 ROP caches under L2). Don't need to flush caches between pixel shader passes.
- Tiled rasterizer. Reduces overdraw, bandwidth and makes ROPs more efficient in general.
- Improved geometry pipeline (including proper load balancing, up to 2x higher peak throughput)
- General purpose memory paging system
(I didn't list features that don't bring performance improvements without programmer intervention)
All of these improvements mean that GCN5 should run general purpose pixel/vertex shader code much better than GCN2. GCN5 has most of the same tricks that are seen in modern Nvidia GPUs. There are nice compute improvements as well, but they need special programmer support (DPP, SDWA, FP16). We will see the real impact of these improvements when DX12 SM 6.0 becomes available. Doom is already using these features with Vulkan, resulting in nice gains.
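One reason FP16 is opt-in rather than automatic: half precision keeps far fewer significand bits than FP32, so the programmer has to decide which values can tolerate it. The snippet below is only an illustration of that precision loss on the CPU, using Python's standard `struct` module and its `e` (IEEE 754 half) format; the helper name `to_half` is made up for this example and has nothing to do with any GPU API.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Small values that fit the 11-bit significand survive unchanged.
print(to_half(1.0))      # 1.0

# Integers above 2048 can no longer all be represented: 2049 rounds to 2048.
print(to_half(2049.0))   # 2048.0

# Common fractions pick up a visible rounding error.
print(abs(to_half(0.1) - 0.1))
```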
So to get performance gains from FP16 you need to update your application's code; the same goes for Primitive Shaders (Programmable Geometry Pipeline). As for the General Purpose Memory Paging System:
https://forums.macrumors.com/thread...her-apu-polaris-announcements.1975249/page-58
The part about the memory system:
The General Purpose Memory Paging System works in a way that is pretty much revolutionary. You have 512 TB of indexing available, done at the hardware level, for full HSA 2.0 Unified Memory compatibility without any software-level abstraction. Let's take a theoretical approach and say we have a GPU with 3072 GCN cores and 4 GB of HBM2 with 512 GB/s of bandwidth.
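To put the 512 TB figure in perspective, here is a rough bit of arithmetic. It assumes "TB" and "GB" are meant in the binary sense (TiB/GiB); the function name `address_bits` is just for this example.

```python
TIB = 2 ** 40  # bytes in one tebibyte (assumed meaning of "TB" here)
GIB = 2 ** 30  # bytes in one gibibyte (assumed meaning of "GB" here)

def address_bits(space_bytes: int) -> int:
    """Number of address bits needed to index every byte of the space."""
    return (space_bytes - 1).bit_length()

virtual_space = 512 * TIB   # the 512 TB of indexing quoted above
local_hbm2 = 4 * GIB        # the 4 GB of HBM2 on the theoretical GPU

print(address_bits(virtual_space))   # 49
print(virtual_space // local_hbm2)   # 131072
```

In other words, under these assumptions the indexable space is a 49-bit address space, about 131072 times larger than the 4 GB of local memory in the theoretical example.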
Current memory systems keep both used and unused data in the GPU's memory, because in general GPUs do not have enough horsepower to process all of it at any given time. Vega changes this approach. Tile Based Rasterization, a next-generation Pixel Engine connected to the L2 cache, and massively improved geometry performance increase the throughput of the GPU. What matters is feeding this GPU with data. GDDR5 memory cannot give enough bandwidth to feed those cores at a reasonable power consumption. Neither can GDDR5X. The Titan Xp's memory consumes around 50 W on its own, and the memory subsystem consumes 75 W at peak because of the number of memory controllers, though averages are lower thanks to Memory Compression, Tile Based Rasterization, and ROPs connected to the L2 cache. HBM2 memory cubes consume 8 W, and the whole memory subsystem will consume 15 W at peak, while you still get the benefit of Tile Based Rasterization, ROPs connected to the L2 cache rather than the memory controller, etc.
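Taking the peak power numbers quoted above at face value (they are the post's figures, not measurements), the difference works out as follows. The bandwidth-per-watt line uses the 512 GB/s of the theoretical GPU, so it is only a back-of-the-envelope number.

```python
# Peak memory-subsystem power as quoted in the post (not measured here).
gddr5x_peak_w = 75.0
hbm2_peak_w = 15.0

# Bandwidth of the theoretical 4 GB HBM2 configuration from the post.
hbm2_bandwidth_gbs = 512.0

saving_w = gddr5x_peak_w - hbm2_peak_w   # absolute saving at peak
ratio = gddr5x_peak_w / hbm2_peak_w      # how many times less power
gbs_per_watt = round(hbm2_bandwidth_gbs / hbm2_peak_w, 1)

print(saving_w)       # 60.0
print(ratio)          # 5.0
print(gbs_per_watt)   # 34.1
```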
What does this memory arrangement actually do? The framebuffer is smaller than a large pool of GDDR5/X memory, but its data is available to the GPU immediately, and larger portions can be processed at any given time. Think of the data stored in the GPU's memory as non-volatile, and the indexed data in the rest of the system as volatile (the memory controller has access to data in system RAM, the SSDs and HDDs in your computer, and even network storage). You save even more memory power because unused data does not have to be kept there. The framebuffer is small enough not to exceed PCIe bandwidth, so the data can be delivered where it's needed, when it's needed. It's all done in hardware, without any software abstraction.
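The general idea of a small, fast local memory that holds only the pages in use, backed by a much larger indexed store, can be sketched with a toy simulation. This is not how Vega's hardware actually decides what to keep: the class name `PagedLocalMemory`, the page granularity, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict

class PagedLocalMemory:
    """Toy model: a small local pool caching pages of a larger backing store."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page id -> True, oldest use first
        self.hits = 0
        self.faults = 0

    def access(self, page: int) -> None:
        if page in self.resident:
            # Page is already local: mark it as most recently used.
            self.resident.move_to_end(page)
            self.hits += 1
            return
        # Page must be fetched from the backing store (system RAM, SSD, ...).
        self.faults += 1
        if len(self.resident) >= self.capacity:
            # Assumed policy: evict the least recently used page.
            self.resident.popitem(last=False)
        self.resident[page] = True

mem = PagedLocalMemory(capacity_pages=4)
for page in [0, 1, 2, 3, 0, 1, 4, 0, 1, 5, 2]:
    mem.access(page)

print(mem.hits, mem.faults)   # 4 7
print(list(mem.resident))     # [0, 1, 5, 2]
```

Pages 0 and 1 are touched often and stay resident, while pages that are used once get pushed out; only the working set needs to live in the fast memory.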
I'm sure people can explain all of this much more clearly and in more detail.
In essence this works not only for games, but compute, rendering, professional applications, everything that can benefit from it.