[Videocardz] AMD Polaris 11 SKU spotted, has 16 Compute Units


jpiniero

Lifer
Oct 1, 2010
14,842
5,457
136
Based on what?

Well, peak-wise the P100 is roughly only 2X SP and DP of what Hawaii is. You'd have to think that Vega would be a refined Hawaii and the shrink would let them double the shader count plus higher clocks and would be much faster than P100. If it's only 4096 it would probably still be competitive but nowhere near the size of GP100.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
GTX780Ti
384bit with 336GB/s bandwidth

GTX980
256bit with 224GB/s bandwidth

I don't need to tell you which of the two is faster.

The difference is only half of the next change. And you can find examples where a GTX 780 Ti is faster than a GTX 980 because of memory bandwidth.

A GTX980 is memory bandwidth limited. I know first hand.
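For reference, the two bandwidth figures above follow from bus width times per-pin data rate. A minimal sketch, assuming both cards use GDDR5 at an effective 7 Gbps per pin (that rate is not stated in the post, and the helper name is made up for illustration):

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


GDDR5_RATE_GBPS = 7.0  # assumption: effective per-pin data rate for both cards

gtx_780_ti = memory_bandwidth_gbs(384, GDDR5_RATE_GBPS)
gtx_980 = memory_bandwidth_gbs(256, GDDR5_RATE_GBPS)

print(f"GTX 780 Ti: {gtx_780_ti:.0f} GB/s")
print(f"GTX 980:    {gtx_980:.0f} GB/s")
```

This only reproduces the peak numbers; it says nothing about how much of that bandwidth a given architecture actually needs.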
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
The difference is only half of the next change. And you can find examples where a GTX 780 Ti is faster than a GTX 980 because of memory bandwidth.

A GTX980 is memory bandwidth limited. I know first hand.

The GTX980 may be, Polaris 10 may not.

Also, we are not expecting Polaris 10 to reach Fiji performance at 4k. We expect Polaris 10 to reach Hawaii/Fiji performance at 1080/1440p.
 

tential

Diamond Member
May 13, 2008
7,355
642
121
The difference is only half of the next change. And you can find examples where a GTX 780 Ti is faster than a GTX 980 because of memory bandwidth.

A GTX980 is memory bandwidth limited. I know first hand.

It still doesn't change the fact that memory bandwidth isn't a pure indicator of performance.

Otherwise, you'd have a GTX 780Ti....
 

ultima_trev

Member
Nov 4, 2015
148
66
66
I hope those clocks are that low because these are engineering samples. Even then, I doubt GCN 1.3 can have much higher IPC than GCN 1.1 / 1.2, so I hope Polaris 11 is the R7 460 series and Polaris 10 has at least as many shaders as Hawaii.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Wow Russian, nice attack. I don't see anything in ShintaiDK's post predicting anything, he is simply posting fact. You've hit a new low there... :\

I'm surprised he didn't post something about my GTX980. Another thing he can't get right and that haunts him.
 

parvadomus

Senior member
Dec 11, 2012
685
14
81
That chart looks like absolute trash, 3.7 TFLOPs is just too low. Or suddenly AMD's resource utilization just went through the roof and reached insane levels.
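To see what clock a figure like that implies, peak single-precision throughput can be taken as 2 FLOPs (one fused multiply-add) per ALU per clock. A rough sketch, assuming the chart figure is 3.7 TFLOPs and belongs to a 2,304-ALU part (the ALU count is the one floated elsewhere in this thread; tying it to this chart entry is an assumption, and the function names are illustrative):

```python
def peak_fp32_tflops(alus: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per ALU per clock."""
    return 2 * alus * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops: float, alus: int) -> float:
    """Clock needed to reach a given peak FP32 figure with a given ALU count."""
    return tflops * 1e12 / (2 * alus) / 1e6


ALUS = 2304          # assumption: ALU count of the part shown in the chart
CHART_TFLOPS = 3.7   # assumption: the chart figure, read as TFLOPs

clock = implied_clock_mhz(CHART_TFLOPS, ALUS)
print(f"Implied clock: {clock:.0f} MHz")
```

Under those assumptions the figure corresponds to a clock of roughly 800 MHz, which would be consistent with low engineering-sample clocks rather than a jump in utilization.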
 
May 11, 2008
20,068
1,292
126
Yeah that memory bus means maybe people should start dialing back expectations, at least at high resolutions. Polaris 10 might be a 1080p monster though, and we all know the 970 made a killing because of that.

I am very interested in that Polaris 11. Looks like it will be cheap with only a 128-bit bus.

It got me worried for a second as well. But with better memory compression techniques, more cache and the intelligent prefetching I have been reading about, that "meager" 128-bit bus might not be a problem at all at 1080p. We will have to wait and see.
 
May 11, 2008
20,068
1,292
126
GCN 4.0 will basically power down unused GCN cores and boost GCN cores that are in use. Vector ALUs will also intelligently match incoming workloads, leaving very little inefficiency compute-wise:
http://www.freepatentsonline.com/20160085551.pdf

So 2,304 GCN 4.0 ALUs might act more like 1.5GHz+ GCN 3.0 ALUs. That means a significant improvement in compute performance over GCN 3.0. It all depends on how high they will boost.

Seems like we know one way that they will boost CU performance and compute efficiency.

Say we're comparing GCN 3.0 with 4.0 cores. A single GCN 4.0 core would be around 50% more powerful (if not more).

Airfathaaaaa sent me this link.

That is interesting to read.
I once read that NVIDIA's driver compiles and schedules the shader programs at runtime to make efficient use of the calculation units.
When I read this patent, I get the impression that AMD does the same: that the driver compiles or even recompiles the shader programs for optimal use.
Is this true?
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
That is interesting to read.
I once read that NVIDIA's driver compiles and schedules the shader programs at runtime to make efficient use of the calculation units.
When I read this patent, I get the impression that AMD does the same: that the driver compiles or even recompiles the shader programs for optimal use.
Is this true?

Not precisely that. NVIDIA's scheduler is a static scheduler, meaning that scheduling takes place in software, in the driver, by the CPU.

For AMD, scheduling takes place in hardware. So no matter how the software is written, the hardware will handle the efficient scheduling as defined in that patent.

This means that these tweaks are software agnostic. They will apply across the board, on all shader/compute loads, regardless of software/driver optimizations.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
If I'm not mistaken the latest rumors have Vega 10 at 4096 shaders, which is the same as Fury X.

That rumor comes from a questionable interpretation of someone's LinkedIn profile (which has since been deleted). The profile claimed the author was working on the "Greenland" project, and that this GPU would have 4096 shaders. It's pure speculation that the GPU formerly known as Greenland is now going to be Vega 10 (the bigger of the two). It could be Vega 11 (the smaller chip). It could be a project that will never be publicly released for whatever reason. It could even be the iGPU for the massive "Zeppelin" server APU we've heard rumors about.

When you think about it, it doesn't make much sense for the biggest of four AMD 14LPE GPUs to have only 4096 shaders. We know from eyewitness testimony that Polaris 11 (the smallest chip) is a bit smaller than Cape Verde, probably around 110mm^2. There is good reason to think Polaris 10 is 232mm^2; that comes from a LinkedIn profile, but it's around Pitcairn's size, so it would make perfect sense as the next step up. What about the two Vega chips? It seems to me that it would be most logical for the smaller Vega chip to be roughly Tahiti/Tonga-sized (~350mm^2) and the bigger chip to be approximately Hawaii-sized (~450mm^2). Fiji is about 600mm^2 on 28nm, so a hypothetical die-shrink with no other changes would be roughly 272mm^2 (14LPP has about 2.2x the transistor density of 28nm). That leaves plenty of room to beef up the front-end and add all the new features (HEVC, HDMI 2.0, etc.) while still staying at or below the Tahiti size class. If a Tahiti-sized 14LPP chip has 4096 shaders, then the Hawaii-sized 14LPP chip would obviously have a lot more.
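The die-shrink estimate above is simple area scaling. A minimal sketch using only the numbers from this post (about 600mm^2 for Fiji, about 2.2x density, about 350mm^2 for the Tahiti size class); the function name is illustrative, and real shrinks do not scale every block equally:

```python
def shrunk_area_mm2(area_mm2: float, density_gain: float) -> float:
    """Idealized die area after a shrink, assuming uniform density scaling."""
    return area_mm2 / density_gain


FIJI_AREA_MM2 = 600.0       # approximate figure from the post
DENSITY_GAIN = 2.2          # 14nm vs 28nm density ratio claimed in the post
TAHITI_CLASS_MM2 = 350.0    # approximate size class from the post

shrunk = shrunk_area_mm2(FIJI_AREA_MM2, DENSITY_GAIN)
headroom = TAHITI_CLASS_MM2 - shrunk

print(f"Shrunk Fiji: ~{shrunk:.0f} mm^2")
print(f"Headroom within Tahiti class: ~{headroom:.0f} mm^2")
```

On these assumptions a straight shrink leaves a bit under 80mm^2 for front-end changes and new fixed-function blocks before reaching the Tahiti size class.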

GP100 (with 56 SMs) should be about 60-70% faster than a stock Fury X, so Vega 10 would have to pick that up from increases in IPC and frequency, which seems like a bit too much imho. At best I could imagine Vega 10 gaining 30-40% on Fury X via IPC/frequency improvements.

A Vega chip with 4096 shaders should be able to do quite a bit better than that. Remember, Fiji is severely bottlenecked in many games by its weak front-end, which is no better than Hawaii's. Raja Koduri admitted that this was a trade-off to fit it in the reticle limit on 28nm. A Vega chip won't have that problem, and should be able to actually make use of its 4096 shaders. Right now, Fury X is only ~16% faster than R9 390X at 1080p, despite having 45% more shaders. Even at 4K, it's only ~25% faster. If a better front end and improved scheduler let a 4096-shader card actually be 45% better than Hawaii, that alone would be a substantial improvement. On top of that, FinFET should allow far higher clock rates than we get now. Both Apple's A9 SoC and Nvidia's GP100 Tesla accelerator sport clock rates about 40% higher than their planar predecessors. GCN on 14LPP should be able to do ~1.4 GHz core clock without any stability problems; perf/watt may suffer a bit, but for high-end desktop cards, that will be an acceptable tradeoff as long as the overall power envelope stays in the 250-300W range.
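The scaling argument can be put into numbers. A small sketch, assuming 2,816 shaders for the R9 390X (not stated in the post) and taking the ~16% and ~25% performance gaps quoted above at face value; "scaling efficiency" here is just an illustrative ratio of performance gain to shader gain:

```python
FURY_X_SHADERS = 4096
HAWAII_SHADERS = 2816  # assumption: R9 390X shader count, not given in the post

# Fractional increase in shader count from Hawaii to Fiji.
shader_gain = FURY_X_SHADERS / HAWAII_SHADERS - 1


def scaling_efficiency(perf_gain: float, resource_gain: float) -> float:
    """Share of the added resources that shows up as added performance."""
    return perf_gain / resource_gain


eff_1080p = scaling_efficiency(0.16, shader_gain)  # ~16% faster at 1080p
eff_4k = scaling_efficiency(0.25, shader_gain)     # ~25% faster at 4K

print(f"Shader gain: {shader_gain:.1%}")
print(f"Scaling efficiency at 1080p: {eff_1080p:.0%}, at 4K: {eff_4k:.0%}")
```

By this measure Fury X turns only about a third of its extra shaders into performance at 1080p and a bit over half at 4K, which is the gap a better front end would have to close.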
 
Feb 19, 2009
10,457
10
76
Polaris 11 = 1,280 SP (~110mm2)
Polaris 10 = 2,560 SP (~232mm2)

Vega 11 = 3,840 SP (~360mm2)
Vega 10 = 5,120 SP (~500mm2)

Because of the block arrangement, SP per SM etc. That's my guess.

If you guys want to understand IPC gains, read the patent. It's an insane change.

GCN 4 is a wide and parallel architecture, even at the individual SP level, where multiple threads and different workloads can exist concurrently.

Each SP has 2x Scalar & multiple different Vector ALUs of different width.

The wider ALUs handle more complex maths, the simpler ALUs handle simpler maths. Each SP feeds back to the hardware scheduler at each cycle about its occupancy for each ALU. The scheduler then submits more work threads suitable for the vector & scalar units.

If there's no suitable workload and the SP has idling ALU, the SP itself will boost clocks. The beauty of this is they claim no change in software is required because the hardware scheduler is dynamic enough to distribute workloads to hit peak efficiency for each SP.

Polaris will blow the socks off earlier GCN this much is certain.
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
Polaris 11 = 1,280 SP (~110mm2)
Polaris 10 = 2,560 SP (~232mm2)

Vega 11 = 3,840 SP (~360mm2)
Vega 10 = 5,120 SP (~500mm2)

Because of the block arrangement, SP per SM etc. That's my guess.

If you guys want to understand IPC gains, read the patent. It's an insane change.

GCN 4 is a wide and parallel architecture, even at the individual SP level, where multiple threads and different workloads can exist concurrently.

Each SP has 2x Scalar & multiple different Vector ALUs of different width.

The wider ALUs handle more complex maths, the simpler ALUs handle simpler maths. Each SP feeds back to the hardware scheduler at each cycle about its occupancy for each ALU. The scheduler then submits more work threads suitable for the vector & scalar units.

If there's no suitable workload and the SP has idling ALU, the SP itself will boost clocks. The beauty of this is they claim no change in software is required because the hardware scheduler is dynamic enough to distribute workloads to hit peak efficiency for each SP.

Polaris will blow the socks off earlier GCN this much is certain.

How can you be sure that AMD has implemented the functionality referenced in the patent in Polaris/Vega processors?
 
Feb 19, 2009
10,457
10
76
How can you be sure that AMD has implemented the functionality referenced in the patent in Polaris/Vega processors?

Next gen GCN. Could be Navi? Sure.

But AMD got the patent for the HBM memory controller just before Fiji was released.

We'll know for certain in a few weeks.
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
Next gen GCN. Could be Navi? Sure.

But AMD got the patent for the HBM memory controller just before Fiji was released.

We'll know for certain in a few weeks.

Fair enough. I just wasn't sure if I had missed something, or if you were making an educated guess. It would make a lot of sense for AMD to have it come out this generation, especially since AMD and NV cross-license patents. If they put the patent out too soon, then NV can bake it into their next generation of GPUs. If AMD has it in this generation then it definitely won't be in Pascal, and I don't know if they'll even be able to put it into Volta, given how far ahead of release the architectures are finalized.
 

Vaporizer

Member
Apr 4, 2015
137
30
66
Patents are published 18 months after filing. Therefore, if it is public now, then the invention was made 18+ months ago. That should be in line with the design of the Polaris/Vega architecture.

 

Adored

Senior member
Mar 24, 2016
256
1
16
I would say Vega also, that would start to make sense of the unexpected perf/Watt increase over Polaris.
 

Mahigan

Senior member
Aug 22, 2015
573
0
0
AMD have an edge over NVIDIA under DX12 with the current generation of hardware.

I stated this last summer when I uncovered the lack of Asynchronous Compute support in NV hardware, which led to the Asynchronous Compute controversy. I had predicted AMD's DX12 superiority and this is precisely what is happening.

Some look towards informed theories as being equitable to mere opinions. As a result of this popular mindset, I faced a lot of backlash and anger from those who wanted my views dismissed as the mere ramblings of some partisan commentator.

I find that there is a lot to learn about overall Human behaviors when one dabbles in theories and the resulting backlash.

For one, it appears that "spin" is to be found across various markets. We find "spin" in politics as various partisan political pundits attempt to deflect criticisms by injecting differing perspectives into the issues surrounding the criticisms.

This behavior spans across the entire realm of human public discourse. From Coke vs Pepsi, to Democrats vs Republicans etc etc etc. Wherever there is a dichotomy, there is spin.

Secondly, the amount of personal offense taken when one's informed opinions (read: objective truths) cast a dark shadow over one half of a dichotomy, or both parts of it, is alarming.

If you speak truth to power, and these truths cast a shadow over one half of a dichotomy, then it will be assumed that you're part of the partisan crowd defending the opposing aspect of this dichotomy.

As such, I've been consistently labeled as an AMD partisan fan even if all I was doing was communicating truths. Truths which could have gone the other way, in which case I would have been labeled an NVIDIA partisan fan. This remains as one of the biggest lessons I've learned about the general human population.

This lesson is that the majority of humans do not care about the truth...they cannot handle the truth. They instead prefer to live their lives clinging onto non-truths/fantasies/lies and seek the validation of their pre-conceived notions of reality. That is to say that the majority of humans suffer from a confirmation bias.

In a world where the majority clings to lies, how can the human species, as a whole, transcend war, poverty, racism, sexism, famine, corruption etc?

Those are big questions that are clearly off topic, but to bring this back on topic... How will NVIDIA deal with a market which no longer favors their architectures?

We've caught a glimpse of that with the P100 unveiling. NVIDIA are attempting to mimic GCN in terms of the organizational format of their Shader Multiprocessors. Is this going far enough when AMD appear to be heading towards further tweaks to GCN, one step ahead of NVIDIA, with the GCN 4.0 patent I've shared here?

For all those discounting Polaris, before it has even launched, do you comprehend what the Patent is saying? GCN 4.0 is a revolutionary change over GCN 3.0. The perf/watt is improved architecturally, not simply by means of a node shrink from 28nm planar to 14LPP FinFet.

While NVIDIA's Pascal will be boosting all of its CUDA cores, and will have the power usage that comes with it, from say 1300MHz to 1500MHz, GCN 4.0 will have clock gating incorporated at the Vector ALU level. This means that the potential for a boost clock is well beyond 1500MHz, dare I say perhaps even near or over 2GHz?!

This means that occupied Vector ALUs could theoretically each output twice the performance of a GCN 3.0 Vector ALU.

A low-ALU part, like say Polaris 11, which is presumed to have 1,024 Vector ALUs on tap, could outperform a 2,048 Vector ALU GCN 3.0 part regardless of memory bandwidth, due to the Instruction Prefetch feature.

That is quite impressive. Here's to hoping that this patent is tied to Polaris/Vega and not Navi.
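For scale, the 1,024 figure matches the 16 Compute Units in the thread title at 64 ALUs per GCN CU, and the clock ratio needed just to match a larger part on peak throughput is the ratio of ALU counts. A hedged sketch (function names are illustrative; this covers peak arithmetic only and ignores utilization, bandwidth and any patent features):

```python
ALUS_PER_CU = 64  # a GCN compute unit has 4 SIMD units of 16 lanes each


def alus_from_cus(compute_units: int) -> int:
    """Vector ALU count for a given number of GCN compute units."""
    return compute_units * ALUS_PER_CU


def break_even_clock_ratio(small_alus: int, large_alus: int) -> float:
    """How much faster the smaller part must clock to match peak throughput."""
    return large_alus / small_alus


polaris_11_alus = alus_from_cus(16)  # 16 CUs, per the thread title
ratio = break_even_clock_ratio(polaris_11_alus, 2048)

print(f"Polaris 11 ALUs: {polaris_11_alus}")
print(f"Clock ratio needed to match a 2,048-ALU part: {ratio:.1f}x")
```

So on peak numbers alone the smaller part would need twice the effective clock of the 2,048-ALU part to pull level; anything beyond that has to come from better utilization.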
 

Glo.

Diamond Member
Apr 25, 2015
5,765
4,670
136
There should be a gigantic like button on this forum.

Mahigan, we also should remember improved DX11 performance on new architecture.
 