That's weird, and the first I've heard of Arctic Islands being non-GCN. Unless Polaris is just GCN 2.0.
It's a name. It's meaningless. You could easily argue that Maxwell is Fermi 3.0.
It's not like they just trash their old designs and start over from scratch. Once they tweak GCN enough, it makes sense to give it a new name, just for the marketing boost.
Personal wishlist: more registers per work unit, and dismantling the existing geometry/tessellation units and doing it all distributed on the CUs (with the necessary tweaks to make that fast).
More registers per thread is certainly a plus when it comes to increasing occupancy, but lowering their wavefront size from 64 to 16 to match the vector unit width would do wonders for them, since it reduces the register file by a factor of 4! Around a billion transistors are used for JUST the register file alone on Fiji. Naively increasing the size of the register file would increase latencies, which lowers the clocks and potentially leads to lower performance...
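A rough sizing sketch of the two claims above, under stated assumptions: a 64 KiB vector register file per SIMD (as in GCN), 32-bit registers, and the same registers per lane and the same number of resident wavefronts for both wavefront widths. The Fiji figure further assumes 64 CUs with 4 SIMDs each and 6-transistor SRAM cells, ignoring decoders, sense amplifiers and other periphery, so it is an order-of-magnitude check only. The function name is made up for illustration.

```python
BYTES_PER_REG = 4  # each lane holds one 32-bit value per vector register


def vgpr_file_bytes(wave_width: int, regs_per_lane: int, waves: int) -> int:
    """Bytes needed to keep `waves` wavefronts resident, each using
    `regs_per_lane` vector registers across `wave_width` lanes."""
    return wave_width * regs_per_lane * BYTES_PER_REG * waves


# Assumed occupancy: 4 wavefronts per SIMD at 64 registers each.
wide = vgpr_file_bytes(wave_width=64, regs_per_lane=64, waves=4)
narrow = vgpr_file_bytes(wave_width=16, regs_per_lane=64, waves=4)
print(wide, narrow, wide // narrow)  # 65536 16384 4

# Fiji-scale estimate: 64 CUs x 4 SIMDs, each with a `wide`-sized file,
# assuming 6 transistors per SRAM bit and no peripheral circuitry.
total_bits = 64 * 4 * wide * 8
print(total_bits * 6)  # 805306368
```

Holding occupancy fixed is the key assumption here: a narrower wavefront could instead keep more wavefronts resident in the same storage, in which case the saving shrinks.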
You can't just cut the work unit size like that, because AMD uses a trick where the register file is actually split into 4 separate register files for the 4 segments of the 64-unit wavefront, each of which has only a single r/w port from which the operands are read over 4 clocks. Four single-ported register files are much cheaper than one 4-ported register file, even if you make the latter a lot smaller.
Umm, no ...
Each of those separate register files is dedicated to its own vector unit...
AMD arrived at a wavefront size of 64 because it takes 4 cycles to execute an instruction for an entire wavefront on their 16-wide vector units...
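The arithmetic behind that, as a small sketch assuming a 16-lane SIMD that handles one quarter of the wavefront per clock (the constant and function names are made up for illustration):

```python
SIMD_WIDTH = 16        # lanes the vector ALU processes per clock
CYCLES_PER_INSTR = 4   # clocks one instruction occupies for a whole wavefront
WAVE_SIZE = SIMD_WIDTH * CYCLES_PER_INSTR  # 16 x 4 = 64


def lanes_issued(cycle: int) -> range:
    """Wavefront lanes handled in the given clock of a 4-clock instruction."""
    start = (cycle % CYCLES_PER_INSTR) * SIMD_WIDTH
    return range(start, start + SIMD_WIDTH)


for c in range(CYCLES_PER_INSTR):
    lanes = lanes_issued(c)
    print(f"clock {c}: lanes {lanes[0]}-{lanes[-1]}")
```

Each clock covers a different block of 16 lanes, so after 4 clocks all 64 lanes have executed the instruction exactly once.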
AMD GCN has at least 3 read ports and 1 write port per 32-bit ALU, considering that it supports native ternary operations (going by their OpenGL extension or their compiler heuristics). That comes to a total of 64 R/W ports per register file, which means a multi-ported register file design isn't as expensive as you led me to believe...
No. GCN has the number 4 at several different levels, though, so it's easy to get confused. To put it more clearly:
Each SIMD pipe inside a GCN CU has 4 separate 16-wide register files, each with a single r/w port. So a CU has 16 register files.
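A quick tally of the layout described above, next to the per-ALU port count from the earlier post (3 reads and 1 write for each of 16 ALUs). Both organizations move the same number of 32-bit lane operands per clock; the disagreement is whether that bandwidth comes from one heavily ported structure or from four banks with a single 16-lane-wide port each. The constants are taken from the posts and are illustrative assumptions, not published figures.

```python
SIMDS_PER_CU = 4
BANKS_PER_SIMD = 4     # per the description above: 4 separate register files per SIMD
LANES_PER_BANK = 16    # each bank is 16 lanes wide
ALUS_PER_SIMD = 16

register_files_per_cu = SIMDS_PER_CU * BANKS_PER_SIMD

# Lane-operand accesses per clock for one SIMD under each model.
per_alu_ports = ALUS_PER_SIMD * (3 + 1)        # 3 reads + 1 write per ALU
banked = BANKS_PER_SIMD * 1 * LANES_PER_BANK   # 1 r/w port per bank, 16 lanes wide

print(register_files_per_cu, per_alu_ports, banked)  # 16 64 64
```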
Yes, but it also lets them make the register files cheaper: because they barrel-process 64 elements over 4 cycles, and each instruction needs 3 operand reads and 1 write, they can split the register file into 4 quarters and touch each quarter exactly four times over four clocks. That allows a very simple register file with a single r/w port.
To illustrate, each wavefront is split into quarters a, b, c and d, and each operation requires r1, r2, r3 and w:
(Writes are probably delayed by a wavefront so that the execute stage gets the full 4 cycles, and results are forwarded over one cycle when required.)

             wavefront 1     wavefront 2
reg file a:  r1 r2 r3 w      r1 r2 r3 w
reg file b:  r1 r2 r3 w      r1 r2 r3 w
reg file c:  r1 r2 r3 w      r1 r2 r3 w
reg file d:  r1 r2 r3 w      r1 r2 r3 w
The ultimate result is that all execution units get fed from energetically very cheap register files, but you can only read "in sequence", so you have to use 64-element wavefronts instead of operating on a new 16-element vector each cycle.
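A small model of the access pattern in the table, to check the single-port claim. It counts register-file accesses only (no operand collection, forwarding or ALU timing), and the "unbanked" alternative, where a new 16-lane vector issues every clock from one file, is a hypothetical comparison point rather than anything AMD builds.

```python
from collections import Counter

OPS = ("r1", "r2", "r3", "w")   # three operand reads and one result write
BANKS = ("a", "b", "c", "d")    # one register-file quarter per 16-lane group


def banked_schedule(num_instructions):
    """Table-style schedule: each instruction occupies 4 clocks, and each
    bank performs one of r1, r2, r3, w per clock for its own 16 lanes."""
    accesses = []
    for instr in range(num_instructions):
        start = instr * len(OPS)
        for bank in BANKS:
            for i, op in enumerate(OPS):
                accesses.append((bank, start + i, op))
    return accesses


def unbanked_schedule(num_instructions):
    """Hypothetical alternative: a new 16-lane vector issues every clock from
    a single file. In steady state three reads and one (earlier) write hit
    that file every clock; modelled here as landing in the same clock."""
    return [("single", clock, op) for clock in range(num_instructions) for op in OPS]


def max_ports_needed(accesses):
    """Largest number of accesses any one register file sees in one clock."""
    per_file_clock = Counter((bank, clock) for bank, clock, _ in accesses)
    return max(per_file_clock.values())


print(max_ports_needed(banked_schedule(2)))    # 1
print(max_ports_needed(unbanked_schedule(8)))  # 4
```

Both schedules complete 128 lane-operations in 8 clocks; the banked one does it with one port per file, which is the trade-off described above.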
The industry is going for higher resolution (4K, VR) instead of higher image quality. You can forget about ray tracing for the next 10-20 years.
Nice one, thanks, but I was talking about the hardware.
With 4K monitors becoming mainstream in the next 2-3 years, GPUs will need 5-10 times (perhaps more) the hardware performance that ray tracing at 1080p would require.
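For scale, the resolution part of that estimate can be checked directly, assuming 4K means 3840x2160 (UHD) and one primary ray per pixel at 60 fps; the frame rate, sample count and function name are assumptions for illustration. Resolution alone gives a factor of exactly 4, so anything beyond that in the estimate above would come from other assumptions.

```python
def primary_rays_per_second(width: int, height: int, fps: int = 60, spp: int = 1) -> int:
    """Primary rays per second: one ray per sample, `spp` samples per pixel."""
    return width * height * spp * fps


rays_1080p = primary_rays_per_second(1920, 1080)
rays_4k = primary_rays_per_second(3840, 2160)
print(rays_1080p, rays_4k, rays_4k / rays_1080p)  # 124416000 497664000 4.0
```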
Surprised they show no changes to the rasterizer or the ROPs, as this is where Fury X struggles against Maxwell.
The way AMD is presenting it, Polaris will just be another "generation" of GCN, implying only a few small tweaks yet again. This is really bad. Pascal is going to leave it in the dust...
Also, in before the "I told you so" parade. :/
Did you not look at the slide? Most of the chip is all new. Don't get hung up on a name. Unfortunately for AMD, they don't understand the marketing power of simply calling it something different. Again, their engineering degrees are showing instead of their marketing degrees.