By what metric do you mean that 128 Maxwell cores come within 90% of 192 Kepler cores? One Maxwell core had the same number of FLOPs as one Kepler core, obviously. Even in gaming at launch, normalized for frequency and core count, Titan X was only 12% faster than the 780Ti at 1080p, but plenty of things other than the cores affect that.
This was covered more than 2 years ago:
Maxwell 1:
"for space efficiency a single 128 CUDA core SMM can deliver 90% of the performance of a 192 CUDA core SMX at a much smaller size." ~
Source
That works out to 192 * 0.90 / 128 = 1.35X, i.e. a 35% IPC increase per CUDA core, directly from NV themselves.
Maxwell 2 improved that to 40% IPC from Kepler:
"GM204 is overall a more efficient chip, and although it possesses just 33% more CUDA cores than GK104 its performance advantage is much greater, on the order of 50% or more, highlighting the fact that NVIDIA is getting more work out of their CUDA cores than ever before. Altogether,
NVIDIA tells us that on average they're getting 40% more performance per core, which is one of the reasons why GTX 980 can beat even the full GK110-based GTX 780 Ti, with its 2880 CUDA cores." ~
Source
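For anyone who wants to check the math on those quotes, here's a quick sketch. The 90% and 40% figures are NVIDIA's own claims from the quotes above; equal per-core clocks are assumed for the sanity check, which the real 980 vs. 780 Ti comparison doesn't satisfy (the 980 clocks higher):

```python
# Back-of-envelope per-core throughput math from NVIDIA's quoted claims.

# Maxwell 1: a 128-core SMM delivers 90% of a 192-core SMX's performance.
smx_cores, smm_cores = 192, 128
per_core_gain_m1 = (smx_cores * 0.90) / smm_cores
print(f"Maxwell 1 per-core gain vs Kepler: {per_core_gain_m1:.2f}x")  # 1.35x

# Maxwell 2: NVIDIA quotes ~40% more performance per core outright.
per_core_gain_m2 = 1.40
# Sanity check: GTX 980 (2048 cores) vs GTX 780 Ti (2880 cores), equal clocks assumed.
relative = (2048 * per_core_gain_m2) / 2880
print(f"980 vs 780 Ti at equal clocks (paper): {relative:.2f}x")  # ~1.00x
```

So at equal clocks the two would be roughly at parity on paper; the 980's higher clocks are what push it past the 780 Ti, consistent with the quote.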
That means we cannot straight up assume that Pascal CUDA cores are equal to Maxwell 1/2. Having said that, I do not expect the same 35-40% increase in IPC per CUDA core with Pascal as Maxwell was a phenomenal breakthrough over Kepler in this regard.
I'm not going to pretend to be able to know what the specs of NV's gaming Pascal cards will be, but it's important not to underestimate NVIDIA in light of what has been phenomenal execution over the last several years. This company knows that it lives & dies by its performance in the gaming GPU market, so I would expect products tailored with that notion in mind to roll off the line when the time comes.
JHH himself mentioned, I believe in the GTX 690 presentation live stream, that Quadro/Tesla profits pay for a lot of GeForce R&D. Meaning that if you removed the profits/revenue from Quadro/Tesla, it would be FAR harder to design massive monolithic Big Kepler, Maxwell, Pascal, etc. It stands to reason then that NV's top-end cards are based on or around the flagship Quadro/Tesla. The GeForce versions of those have higher clocks but neutered FP64, compute and other features. That should continue unless GP102 is a drastically gaming-focused design that throws away all the redundant GP100 stuff in favour of some 5000+ CUDA core monster. To me that sounds like fan fantasy right now, sorry.
My point is you suggested that NV will finally split their professional large-die designs completely from their gaming designs, or at least that's how I read your post. NV mentioned nothing of the sort regarding Asynchronous Compute, which suggests to me Pascal is going with a brute-force approach (high clocks + massive memory bandwidth). There is nothing wrong with that since there are few DX12 Async games out, and Quantum Break/GOW have been unoptimized ports. Point being, from what NV revealed, it seems they have not focused on this DX12 functionality. This suggests AMD doesn't need to devote too many transistor resources to increasing ACEs from, say, 8 to 12-16. That means AMD can focus on its biggest weaknesses and not waste more die space on ACEs/Compute.
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu
Nvidia has already done it before; their first HPC-only GPU was GK210.
Ya but the fundamental performance/specs of GK110 and 210 were nearly identical to Titan Black / 780Ti. From what I am reading here it sounds like some people actually think NV will release a 3584 CUDA core P100 but GP102 will be a completely different design with more than 240 TMUs/3840 CUDA cores? I've never seen such a situation where NV's biggest fully unlocked Tesla/Quadro chip has much lower technical specs (not clocks or TDP) compared to the gaming GeForce series.
Honestly I was expecting a little more from GP100 ...
The clocks just aren't that special considering an overclocked GM200 chip can hit 1200MHz base and 1400+MHz boost easily, while Pascal seems to be already at its thermal limits ...
For a 610mm^2 die size with double the transistor density, you would think there'd be a 50% increase in shader count, but it's half of that. Maybe there's an increase in IPC, but I thought Maxwell already had good IPC, plus it needs less parallelism too ...
Most people don't overclock though. Stock vs. stock, a fully unlocked GP100 is >70% more capable (on paper specs) than the Titan X. Also, there could be a 15-20% increase in IPC on top of that. You do make a good point that without knowing Pascal's overclocking headroom, it's hard to gauge 980Ti / Titan X OC vs. Pascal OC. Usually NV is pretty conservative with their clocks. Overclocking actually improved every generation since Fermi, and Fermi itself was better than HD5800/6900. I wouldn't underestimate NV's overclocking capabilities. Even my 8800GTS overclocked at least 20%.
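To put numbers on that "paper specs" comparison: counting an FMA as 2 FLOPs per core per clock, and assuming a hypothetical fully unlocked GP100 (3840 cores) at P100's 1480MHz boost against Titan X's rated ~1075MHz boost:

```python
def fp32_tflops(cores, boost_mhz):
    # FMA counts as 2 FLOPs per core per cycle.
    return cores * boost_mhz * 2 / 1e6

titan_x = fp32_tflops(3072, 1075)      # GM200 Titan X at rated boost
gp100_full = fp32_tflops(3840, 1480)   # hypothetical full GP100 at P100's boost clock
print(f"Titan X: {titan_x:.1f} TFLOPS, full GP100: {gp100_full:.1f} TFLOPS")
print(f"Paper-spec advantage: {gp100_full / titan_x - 1:.0%}")  # ~72%
```

That lands right around the >70% figure, before any IPC differences.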
You must have missed the context of the post. Pascal's async compute abilities are supposedly only as good as GCN 1.0
I guess some people in this thread are trying to evade the entire async compute Pascal situation. It's possible NV is keeping it secret, but I'd be pretty surprised if they had improved it significantly over Maxwell and then didn't tout it as a new killer feature, given how terrible Maxwell is with Async. That only reinforces the rumors that Pascal still lacks a strong Async architecture. As I mentioned earlier, probably not a big deal since NV focused on more profitable markets (AI, Neural Networks, Deep Learning), but for games it looks like they had to pick and choose, and Async wasn't a priority when they were designing Pascal.
P100 has no ROPs listed on the spec sheet. That's probably why.
EDIT: Jesus, I was thinking GP100 is the consumer-class chip, but Nvidia has both a product name (P100) and a chip name (GP100). They seemingly interchange the two on their website as if they are the same thing.
Actually they are not:
- Full GP100/102 was rumored to have 1TB/sec HBM2; P100 uses 1.4GHz HBM2 for 720GB/sec.
- Full GP100 has 60 SMs => 3840 CUDA cores with 240 TMUs; P100 is a cut-down version with 56 SMs => 3584 CUDA cores with 224 TMUs.
- GP100 is the underlying chip for Pascal family and P100 is a particular Tesla model which uses a variant of GP100. That is to say as yields and the node mature, we could easily see P110/P210 (whatever they call it) with a full GP100 die and higher HBM2 clocks.
Actually, what people have said is that Maxwell is a derivative of Pascal, Maxwell having been created when 20nm fell through. The GP consumer-series chips could be radically different from P100 to be more suited for gaming. You can see that P100 Pascal adds back the DP and compute that Maxwell is missing, as well as NVLINK and HBM2, both of which have been known for some time. Pascal has changed its compute capability level, which is new information. It's really not all that different spec-wise compared to Maxwell. We still don't know if it can be used in conjunction with DX12 or Vulkan.
Edit: spelling
For gamers, of course it would be way better if NV made a pure gaming chip with 610mm2 based on Pascal, minus NVLink, FP64, etc. but I think that's exactly what Volta will be. That's because it's going to be too hard to enlarge the die size beyond 610mm2 with Volta and it's probably too hard to increase GPU clocks/IPC by another 35-40% from Pascal. It then stands to reason that the lean gaming Big Daddy Pascal will be Volta in 2018. That's my theory but I think Volta will also incorporate Async Compute and many other tricks like next gen VXGI/other lighting techniques, etc.
It's entirely plausible. Just look at the previous smaller GK104 chip, it had a lower FP64 pipe ratio than GK110/GK210 and less compute oriented features. NVIDIA could just be taking this one step further and it makes sense to do so. If true, it would mean that NVIDIA has realized that big enterprise compute workloads have evolved so much and become so different from graphics workloads that it makes sense to sell a chip tailored for each market and I believe they have the size and capability to do so.
I believe AMD mentioned before that it costs $300-500M to design an ASIC (Pg. 15 has similar data). I am not sure the market for 1080Ti chips is large enough to justify spending $300-500M on a from-scratch design. I think it would be better to tweak GP100 and remove some parts, but I don't envision a 5000-6000 CUDA core GP102 selling alongside a 3584-3840 CUDA core GP100. Seems too good to be true. Again, if that were to happen, it would contradict the entire GPGPU business model NV adopted starting with G80. I can see GP104/106/107 designed as lean gaming GDDR5(X) chips, but a completely unique >4000 CUDA core / >240 TMU GP102 is a huge undertaking. That would be unprecedented, and of course awesome if true. Instead, I see GP102 as a 3840 CUDA core part with higher clocks.
I mean, they are talking up deep learning a lot, but did anyone see the 5 TFLOPS (FP64) number?? That is absolutely huge for a single GPU....! Surprised they weren't highlighting this more compared to all the VR/deep learning, the trendy stuff.
They did highlight it but once you mention 5.3TFlops FP64, it's kinda self explanatory. How much can you really talk about that?
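The 5.3 figure falls straight out of the specs: GP100 runs FP64 at half the FP32 rate, so P100's 3584 enabled FP32 cores imply 1792 FP64 units. A quick sketch:

```python
fp32_cores = 3584              # Tesla P100 (56 of 60 SMs enabled)
fp64_units = fp32_cores // 2   # GP100's FP64 rate is 1/2 of FP32
boost_hz = 1480e6              # P100 boost clock
fp64_tflops = fp64_units * boost_hz * 2 / 1e12  # 2 FLOPs per FMA
print(f"P100 FP64: {fp64_tflops:.1f} TFLOPS")  # ~5.3
```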
Lol is that your take away from the info?
Pascal still is Maxwell+ on 16FF. What's the plus? FP64, FP16 mix-mode, NVLink, HBM2.
This is what NV says about GP100 on that blog post: Tesla P100: Built for HPC and Deep Learning
Yup, Pascal sounds like Maxwell Refined with next gen HBM2 + FinFET and enlarged L2 cache, while re-balancing the CUDA cores for maximum utilization mimicking GCN, minus the Async capabilities that AMD will bring with GCN 4.0.
300W TDP is actually going backwards. AS EXPECTED, because they tore out all the power-hungry features in Maxwell to make it a gaming-focused chip, and now Pascal needs to put those back in to compete in the HPC market.
Not a peep from the perf/watt people. Let's revisit all the smack talk around the Radeon Pro Duo launch:
"Wow. That's like a small space heater..."
"Two of these in XFire would be the equivalent wattage of a midsize space heater for sure. "
"At 350W? Seriously?"
"24 pins for power! I bet the lights dim when you turn the system on."
AMD managed to package 16.4 Tflops of performance into a 350W TDP, while Pascal brings 10.6 Tflops in a 300W TDP.
Not seeing the same posters crapping on P100's horrible perf/watt with respect to FP32, or acknowledging how impressive it was that AMD delivered 16 Tflops of FP32 on 28nm!
Just pointing out all the double standards...
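For reference, the perf/watt math both ways, using the paper FP32 TFLOPS and board TDPs quoted above:

```python
# Paper FP32 throughput per watt, using the TDPs quoted above.
pro_duo_tflops, pro_duo_tdp = 16.4, 350  # Radeon Pro Duo, 28nm
p100_tflops, p100_tdp = 10.6, 300        # Tesla P100, 16nm FinFET

pro_duo_eff = pro_duo_tflops * 1000 / pro_duo_tdp  # GFLOPS per watt
p100_eff = p100_tflops * 1000 / p100_tdp
print(f"Pro Duo: {pro_duo_eff:.1f} GFLOPS/W, P100: {p100_eff:.1f} GFLOPS/W")
```

On paper FP32 alone, the 28nm dual-GPU card comes out ahead per watt, which is the double standard being pointed out (real workloads, FP64 and scaling caveats obviously apply).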
I'm going to call it now, Pascal will perform great in GCN-optimized game engines, better than Maxwell and vastly better than Kepler.
Don't forget another key Pascal feature: Maxwell driver obsolescence, with driver focus shifting 100% to Pascal. That alone should give Pascal an automatic 10-20% increase in IPC in modern DX12 games.
Despite only a small increase in FP32 or potential gaming performance (comparing GM200's ~7 TFlops to GP100's ~10.6 TFlops), the change above means each paper-spec flop is worth more for gaming due to improved CUDA core utilization (for those under the impression Maxwell was already at 100% utilization, lololol).
Don't forget you are comparing fully unlocked GM200 on a mature 28nm node (higher clocks) to a cut-down GP100 (most likely GP102/110 clocks could improve over the next 12-18 months as 16nm node matures). Not exactly fair in that sense.
While some were expecting a doubling of performance, it's not going to happen given how HPC/compute-focused GP100 has to be, compared to GM200, which was made for gaming.
Using 4K GPU-limited gaming scores, the 980Ti shows up as 45% faster than the 780Ti. With a huge 25% factory overclock, it is 79% faster, still well short of 2X. I am not sure who was expecting Pascal OC to be 2X faster than GM200 max OC. At the same time, NV did raise TDP from 250W to 300W, giving them more wiggle room to up the GPU clocks with GeForce GP100/102. If modern games become more GPU/memory-bandwidth bottlenecked, I can see a 1080Ti OC beating a 980Ti OC by 70%+.
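As a rough sketch of how close to linear that overclock scaling was (45% and 79% are the figures from the 4K scores cited above):

```python
stock_gain = 1.45  # 980 Ti vs 780 Ti, stock, 4K GPU-limited
oc_gain = 1.79     # factory-overclocked (~25%) 980 Ti vs 780 Ti
implied_oc_uplift = oc_gain / stock_gain - 1
print(f"Implied uplift from the ~25% factory OC: {implied_oc_uplift:.0%}")  # ~23%
```

In other words, the overclock converted into performance at a nearly 1:1 rate in GPU-limited scenarios, which is why OC headroom matters so much in these comparisons.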
:thumbsup: Great post Silverforce11
:thumbsup::thumbsup:
Your understanding of NVIDIA's business needs some work. Tesla is a very, very small portion of NV's overall revenues; GeForce GTX is far more important to the company in terms of total revenue and gross profit dollars.
Doesn't explain then why NV is launching P100 first, prioritizing that market above all else. NV likely thinks it can increase Tesla revenue and profits dramatically due to pent-up demand from customers on older GK110/210, as well as an entirely new, growing market segment with Deep Learning/Neural Networks/Autonomous Driving. Sure, the GeForce market is more important, but for NV both are strategically important, and that shapes how the chips are designed. If NV thought launching GP100 in Q1 2016 as a GeForce chip was a better strategy, they would have done it. That means they know something we average folk don't.
Concerning the previous rumours, where there's smoke, there's fire - they could very well announce the new GeForces at Computex for a Q3 launch. GP104/GP106 should be the first if SweClockers is right.
They could but months ago we had people on these boards claiming 980Ti/Titan X's replacement could launch in April of 2016 as a GeForce. That's come and gone. Then we had rumors of 1080 launching in May and supposedly NV was going to show them today. GTX 1080 launching in May and Q3 launch could be dramatically different since Q3 goes all the way to September. If NV launches GP104/106 late May/early June, that would be very impressive given how they kept any leaks to a bare minimum for GP100.
Looks bad taking specs into account only. Probably 2x Hawaii performance; if AMD does not get a better product out this round, they probably never will.
I think it could end up faster than that. Either way, 2X R9 390X is 65% faster than the 980Ti at 4K. At $699, that would sell like hot cakes. That's way more than the stock 980Ti delivered over the 780Ti, but yes, well short of GTX480/580 -> 780Ti, which was literally a 2X increase. Fury X is actually very competitive with the 980Ti stock vs. stock, beating the 980Ti at 4K. What bombed Fury X were the lack of overclocking headroom, the pump noise of early review samples and the 4GB VRAM marketing. In any case, as you've seen with 7970Ghz vs. 680, AMD beating NV in performance isn't enough. They have to be ahead by probably 20-25% to start converting NV users.
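Working backwards from that 65% figure (the 0.825 relative-performance number here is my own illustrative assumption, not a cited review result):

```python
# Assumed: a single R9 390X lands around 0.825x of a stock 980 Ti at 4K.
r390x_vs_980ti = 0.825
perfect_scaling = 2 * r390x_vs_980ti  # idealized 2x multi-GPU scaling
print(f"2x R9 390X vs 980 Ti: {perfect_scaling - 1:.0%} faster (perfect scaling assumed)")
```

Real multi-GPU scaling is rarely perfect, so a single next-gen chip hitting that same target would be the more meaningful win.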
If a chip of this same size was done Maxwell-style with minimal DP support, we could be looking at a whopping 7680 shaders! Now that's more like it. That said, though, I don't think GP102 will be quite this massive. A stronger possibility is 6144 shaders, double GM200's count. Whether Nvidia will use HBM2 on GP102 is another question, but given their traditional conservatism when it comes to new memory types, I think a 384-bit GDDR5X bus is more likely. GP102 would then be the highest actual GPU in the Pascal lineup, powering the most expensive Quadro card and the next-generation Titan when it arrives.
We might well see a situation where even GP104 has more single-precision power than GP100.
Wow, someone is an optimist. Looks like some of you aren't happy that Pascal may only be 70-75% faster; it needs to be 2-2.25X faster? Do you realize how freaken fast a 6144 CUDA core 1480MHz chip would be? Borderline wishful thinking; plus it leaves zero room for Volta on the same node.
Pascal seems to have gone toward a lower number of cores at higher clocks. Power generally scales with the square of voltage, and the higher the frequency, the higher the voltage required. AMD might go for more units at lower clocks, having been criticized for poor power efficiency. Nice contest.
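The power argument in a nutshell: CMOS dynamic power goes roughly as C*V^2*f, so a frequency bump that also requires a voltage bump costs disproportionately. A toy sketch (the +10% voltage figure is an illustrative assumption):

```python
def dynamic_power(c_eff, voltage, freq_hz):
    # Classic CMOS dynamic power approximation: P = C * V^2 * f
    return c_eff * voltage ** 2 * freq_hz

base = dynamic_power(1.0, 1.00, 1.0e9)
# 20% higher clock, assuming it needs ~10% more voltage to stay stable:
fast = dynamic_power(1.0, 1.10, 1.2e9)
print(f"Power cost of a 20% clock bump: +{fast / base - 1:.0%}")  # ~45%
```

This is why "more units at lower clocks" can be the more power-efficient way to hit the same throughput, which fits the wide-and-slow vs. narrow-and-fast contrast being described.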
That's why I keep saying this could be AMD's lifeline. If AMD can fit 5120-6144 SPs into Vega and 14nm FinFET gives it 20-25% OCing headroom (unlike poor Fiji), we could actually have a solid competitive generation. This round 980Ti/TX blew Fiji away, and that's not healthy for competition or price wars. However, rumors of a 4096 SP Vega and a 2304-2560 SP Polaris 10 aren't inspiring confidence in this scenario.
No.
"Gaming", which is really GeForce GTX, is ~4x the size of Professional Visualization.
Data Centre (Tesla) and Auto (Autonomous Driving, Deep Learning/Neural Networks), grew 40% and 80%, respectively. NV is thinking/looking long-term. A big part of GeForce growth is utter domination over AMD. Do you honestly believe NV will maintain 82.8% market share in the gaming market vs. AMD over the next 10 years? It's only a matter of time before AMD comes back with a solid gaming GPU series -- they always do. Frankly, AMD forfeited the entire mobile dGPU market since 2012 and it's been 4.5 years. You expect them to continue not getting any design wins with mobile dGPUs? It's virtually guaranteed that over the next 5 years AMD will regain market share in GPUs and that means GeForce growth will start slowing down unless everyone and their mother starts buying $350-700 NV GPUs for VR.
It is extremely unlikely that they neglect GeForce GTX for Quadro/Tesla.
No one said that NV is going to neglect GeForce but clearly they used a large chunk of transistors/die space on HPC/Professional features in GP100. We'll just have to see over the next 2-3 years how important Async Compute will become and whether or not AMD decided to double down on this feature (say going even wider with 12-16 ACEs) or AMD retains the same 8 ACEs and focuses on raw performance/efficiency instead. In any case, it's pretty clear that Pascal is much like Fermi and to some extent Kepler => much more well-rounded chip with compute functionality while Maxwell was almost a pure gaming/FP32 focused design. Don't worry, NV will have a chance to refine Pascal into an even more beastly gaming GPU with Volta much like Kepler evolved into Maxwell.
Buddy, it was all the FUD that had NV releasing Titan Pascal in April 2016 that was cheered upon. Then it was later pushed to June. Then no, it's a GP104 mid-range, not the big Pascal. Now we're hearing no, it's not even GP104, but GP106/107 first. -_-
Ya, true. Don't forget 1070/1080 were supposed to be unveiled today with May launch.
You have to understand the context for why NV got the 30% jump in Gaming.
Marketshare vs. AMD ring a bell? They went from 60:40 to 82:18 in that time period.
Let's say NV destroys AMD and gets to 95:5; how much more growth do you expect?
Compared to this?
Some people believe in NV gaining 99% of dGPU market share, I guess. Soon they'll be in for a rude awakening once GCN 4.0 launches. AMD will go from having 0 mobile dGPUs anyone wants to having an entire stack to choose from. Is it going to flat out beat Pascal? It doesn't need to, as long as the chips find some market. The difference is, almost nothing AMD had from 2012-2016 was wanted in the mobile dGPU market; hence almost no design wins whatsoever. Similarly, on the desktop side, I know a lot of people still don't acknowledge how 7970/7970Ghz hammered 680/770, but history has a tendency of repeating itself (hint: 9700/9800Pro, X800/850 XT PE, X1900/1950XTX, etc.). Sooner or later AMD will have its day, and some people have short memories. We are already seeing the panic now, with people making up new theories about GP102/104 having way more CUDA cores than GP100 since clearly they expected way more out of Pascal.