[Videocardz] AMD Polaris 11 SKU spotted, has 16 Compute Units

MrTeal · Apr 11, 2016

@Silverforce and Mahigan, is there anything to actually say that GCN4 will resemble the system described in the patents, such as Linux checkins or similar? That seems like a pretty radical departure from the current architecture, and if they were introducing GPUs based on that within a few months you would think they would want cards in the hands of every dev out there now especially in the DX12 era.

AtenRa · Apr 11, 2016

1, you dont get more work per ALU

2, you get higher perf/watt per SIMD if you have less than 16x Threads wavefronts

3, you get lower latency with smaller than 16x Threads wavefronts

4, You need more CUs to maintain the same throughput as a 4x SIMD 16x Threads (64) CU.

AtenRa · Apr 11, 2016

MrTeal said:
@Silverforce and Mahigan, is there anything to actually say that GCN4 will resemble the system described in the patents, such as Linux checkins or similar? That seems like a pretty radical departure from the current architecture, and if they were introducing GPUs based on that within a few months you would think they would want cards in the hands of every dev out there now especially in the DX12 era.

The new design described in the patent could actually be better for current software if the software uses lots of smaller than 16x Threads Wavefronts.

If the software is heavily optimized with lots of 16x threads wavefronts you dont gain much, in some cases you may actually even lose performance.

Slaughterem · Apr 11, 2016

I think you all are forgetting that the scaler counts as one. The CU - 1 wide 1 wide 2 wide 4 wide and 8 wide. The scalers are 2 cycle high speed compared to the other alu that would make 64 cu = 1024 sp.

By providing a set of execution resources within each GPU compute unit tailored to a range of execution profiles, the GPU can handle irregular workloads more efficiently. Current GPUs (for example, as shown in FIG. 3) only support a single uniform wavefront size (for example, logically supporting 64 thread wide vectors by piping threads through 16 thread wide vector units over four cycles). Vector units of varying widths (for example, as shown in FIG. 4) may be provided to service smaller wavefronts, such as by providing a four thread wide vector unit piped over four cycles to support a wavefront of 16 element vectors. In addition, a high-performance scalar unit may be used to execute critical threads within kernels faster than possible in existing vector pipelines, by executing the same opcodes as the vector units. Such a high performance scalar unit may, in certain instances, allow for a laggard thread (as described above) to be accelerated. By dynamically issuing wavefronts to the execution unit best suited for their size and performance needs, better performance and/or energy efficiency than existing GPU architectures may be obtained.

Slaughterem · Apr 11, 2016

My understanding is 64 wave equals 4cycles x 16 threads. Some of the threads are useless and are predicated off so a 16 thread with 4 useless threads could be run in 8x4 wide ALU while the rest of the ALuU could be power gated off.

antihelten · Apr 11, 2016

Head1985 said:
Fiji at 14nm and with better fron-end will be only tahiti size.Amd can pull hawaii size SKU with around 5120SP

AMD could also make a 600mm2 8192 shader monster GPU on 14nm if they wanted to. But the question is what makes sense from a business standpoint at this stage. If AMD can sell a ~300-350mm2 Vega 10 to people now, and then a 450-600mm2 GPU to the same people down the road, then obviously that is preferable to simply selling them the 450-600mm2 GPU from the beginning.

flash-gordon · Apr 11, 2016

So, with smaller wavefronts and higher occupancy, IPC and efficiency will be higher, latency and size will be smaller, as will throughput.

To achieve higher throughput AMD will need to multiply their shader engines which increases ROPs and geometry.

16nm plus this stuff is pretty exciting!

AtenRa · Apr 11, 2016

flash-gordon said:
So, with smaller wavefronts and higher occupancy, IPC and efficiency will be higher, latency and size will be smaller, as will throughput.

To achieve higher throughput AMD will need to multiply their shader engines which increases ROPs and geometry.

16nm plus this stuff is pretty exciting!

ROP count will increase only if you keep the same amount per Shader Engine. But since ROPs are decoupled from Memory you could have more Shader Engines with fewer ROPs each and still keep the same overall ROP count as before.

Saylick · Apr 11, 2016

Slaughterem said:
I think you all are forgetting that the scaler counts as one. The CU - 1 wide 1 wide 2 wide 4 wide and 8 wide. The scalers are 2 cycle high speed compared to the other alu that would make 64 cu = 1024 sp.

Good catch.

If the Scalar ALUs can be used to catch the one-off vector math, then the total ALU count for GCN 4's CU will be 16 ALUs, which allows it to operate 16-thread wide vector math in one cycle if all ALUs are engaged. Thus, (4) "GCN 4" CUs will match (1) "GCN 1-3" CU (64 ALUs total for each).

Head1985 · Apr 11, 2016

antihelten said:
AMD could also make a 600mm2 8192 shader monster GPU on 14nm if they wanted to, but question is what makes sense from a business standpoint at this stage. If AMD can sell a ~300-350mm2 Vega 10 to people now, and then a 450-600mm2 GPU to the same people down the road, then obviously that is preferable to simply selling them the 450-600mm2 GPU from the beginning.

Tahiti size SKU is vega11 and that SKu will go against GP104.Vega 10 should go agains Big pascal.
To be honest GP104 and 106 looks like crap.
Gp106 with 128bit vs polaris 10 with 256bit 200mm2GP106 vs 232mm2 polaris
Gp104 with crap 256bit and DDR5 vs Vega11 with HBM2
AMd might come 6months late with vega, but they will win in this generation.Unless Nv pulls some miracle like again 20% IPC vs maxwell and miracle delta compression and 1.6Ghz stock boost.
I just cant wait for vega11-Tahiti size SKU(350mm2) HBM2 with 1TB/S.Same SP as Fiji, but with new architecture and front-end.That should crush Gp104 and it will be great Card.

Tapoer · Apr 11, 2016

Fusion is definitely on AMD future.
Bulldozer brought a more separated FPU, and now GCN brings the flexibility and the low latency needed for a CPU FPU/SIMD Unit.
EDIT: I'm imagining a wide low latency GCN like FPU shared between several integer cores (with only simple scalar FP ALU), Niagara style.

There is a limit on how many ALU that can be feeded each frame on a GPU, the more ALU the less they scale, and latency starts to matter, this GCN change makes all the sense.

I'm wondering if currency mining or any workload that can use as many ALU a GPU have if this approach will weaker compared to a more wide GCN 1-3, but Polaris and Vega will be more efficient anyway.

Mopetar · Apr 11, 2016

MrTeal said:
That seems like a pretty radical departure from the current architecture, and if they were introducing GPUs based on that within a few months you would think they would want cards in the hands of every dev out there now especially in the DX12 era.

I don't think it's a change that developers need to be too concerned with as it's something that the hardware should be handling on its own. If anything, it means that developers don't need to go crazy trying to optimize their code to perfectly feed the GPU and worry about keeping the utilization high because the GPU will powergate unused resources with more granularity than before.

I'm also somewhat skeptical about any SP predictions based on previous generations. It's quite likely that in order to use this kind of arrangement where you can powergate groups of ALUs you would need additional transistors that increase the size of the CU, which means fewer total CUs relative to the older designs. Of course there's also the chance that they've made other optimizations to cut down on the transistors used or to lay them out better, but considering the major overhauls that they've already confirmed, it seems like that's something we'll see in future generations more than we will here.

Slaughterem · Apr 11, 2016

Mopetar said:
I'm also somewhat skeptical about any SP predictions based on previous generations. It's quite likely that in order to use this kind of arrangement where you can powergate groups of ALUs you would need additional transistors that increase the size of the CU, which means fewer total CUs relative to the older designs. Of course there's also the chance that they've made other optimizations to cut down on the transistors used or to lay them out better, but considering the major overhauls that they've already confirmed, it seems like that's something we'll see in future generations more than we will here.
*

It appears that 4 cu NewGCN = 1 cu OldGCN. OGCN has 4x16 ALU plus 1 scaler and associated registers. NGCN will be 4 x 8 ALU = 2 X 16 OGcn alu + 4 x 4 ALU NGCN = 1 X 16 OGCN + 4 x 2 NGCN = half of the last OGCN 1 x 16 + 4 X 1 additional scaler. It might be the same size even with power gating due to the space savings, it depends on how big a scaler is compared to 2 ALU.

antihelten · Apr 11, 2016

Head1985 said:
Tahiti size SKU is vega11 and that SKu will go against GP104.Vega 10 should go agains Big pascal.
To be honest GP104 and 106 looks like crap.
Gp106 with 128bit vs polaris 10 with 256bit 200mm2GP106 vs 232mm2 polaris
Gp104 with crap 256bit and DDR5 vs Vega11 with HBM2
AMd might come 6months late with vega, but they will win in this generation.Unless Nv pulls some miracle like again 20% IPC vs maxwell and miracle delta compression and 1.6Ghz stock boost.
I just cant wait for vega11-Tahiti size SKU(350mm2) HBM2 with 1TB/S.Same SP as Fiji, but with new architecture and front-end.That should crush Gp104 and it will be great Card.

We have no way of knowing the size of Vega 11 at this stage, nor what it will be competing with, same goes for Vega 10.

To be honest GP106/104 neither looks like crap, nor do they look like non-crap, since there is practically no information available for them at this stage (other than some die shots from Chiphell, which could be of literally anything).

In other words we have no idea of who will be winning or losing this generation, and as such your predictions are imho wildly premature.

dzoni2k2 · Apr 11, 2016

Well Raja Koduri already hinted what we can expect from Polaris. It's gonna be smaller than in the past and it's gonna be higher perf/$.

In other words I expect something similar to what HD 4000 series was. Not quite the perf crown but a no brainer when it comes to perf/$.

This is where most volume is and this is what AMD will focus on.

So don't be expecting FuryX killer from Polaris

I expect the same from Vega. I don't think you will see another 600mm2 GPU from AMD anytime soon.

Glo. · Apr 11, 2016

So AMD will not be on Computex and on E3. Interesting. Are they gonna show new GPUs on external event? WWDC? It is directly between them.

And appearance on that would make huge traction for them...

tential · Apr 11, 2016

Glo. said:
So AMD will not be on Computex and on E3. Interesting. Are they gonna show new GPUs on external event? WWDC? It is directly between them.

And appearance on that would make huge traction for them...

Who said amd won't be at e3? I thought they'd continue to debut new gpus there for the average gamer. Especially when they're targeting the midrange market with Polaris

96Firebird · Apr 11, 2016

They aren't on the exhibitor list yet...

http://pr311jm731.mapyourshow.com/6_0/alphalist.cfm

I looked under "A" for AMD or Advanced Micro Devices, and under "T" and "R" for The Radeon Group. Nothing there...

It would be odd for them to miss it.

tential · Apr 11, 2016

96Firebird said:
They aren't on the exhibitor list yet...

http://pr311jm731.mapyourshow.com/6_0/alphalist.cfm

I looked under "A" for AMD or Advanced Micro Devices, and under "T" and "R" for The Radeon Group. Nothing there...

It would be odd for them to miss it.

Would be a huge missed opportunity in my opinion

Glo. · Apr 11, 2016

If they would announce the GPUs on Apple's WWDC keynote presentation it would make much more stir than anything before.

StereoPixel · Apr 11, 2016

http://insightfulvr.com/insightfulvr/

monstercameron · Apr 11, 2016

Knowing amd theyll probably be in some hotel close by...

Sent from my SM-G930T using Tapatalk

Kenmitch · Apr 11, 2016

tential said:
Would be a huge missed opportunity in my opinion

Weren't they entwined in the pc gaming show last year? Think it's coming back this year also.

tential · Apr 11, 2016

Glo. said:
If they would announce the GPUs on Apple's WWDC keynote presentation it would make much more stir than anything before.

You think apple would let amd get a new product launch at their own keynote presentation?

antihelten · Apr 11, 2016

dzoni2k2 said:
Well Raja Koduri already hinted what we can expect from Polaris. It's gonna be smaller than in the past and it's gonna be higher perf/$.

Polaris doesn't really seem to be much smaller as far as I can tell. Polaris is obviously intended as the lower end of the stack with Vega being the higher end. As such Polaris is the update for chips like Cape Verde and Pit Cairn, which at 123mm2 and 212mm2 would seem to be quite close to the rumored Polaris sizes.

As far as perf/$ goes, that's a bit of a no-brainer. New nodes always bring GPUs with higher perf/$ (at least if we compare MSRPs). So unless Koduri was hinting at a bigger jump than usual, higher perf/$ doesn't really tell us anything.

[Videocardz] AMD Polaris 11 SKU spotted, has 16 Compute Units

Diamond Member

Lifer

Lifer

Member

Member

Golden Member

Member

Lifer

Diamond Member

Golden Member

Member

Diamond Member

Member

Golden Member

Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Member

Diamond Member

Diamond Member

Diamond Member

Golden Member