By what metric do you mean that 128 Maxwell cores come within 90% of 192 Kepler cores? One Maxwell core had the same number of FLOPs as one Kepler core, obviously. Even in gaming at launch, normalized for frequency and core count, Titan X was only 12% faster than the 780Ti at 1080p, but plenty of things other than the cores affect that.
This was covered more than 2 years ago:
Maxwell 1:
"for space efficiency a single 128 CUDA core SMM can deliver 90% of the performance of a 192 CUDA core SMX at a much smaller size." ~
Source
That works out to 192 * 0.90 / 128 = 1.35X, i.e. a 35% IPC increase per CUDA core, directly from NV themselves.
Maxwell 2 improved that to 40% IPC from Kepler:
"GM204 is overall a more efficient chip, and although it possesses just 33% more CUDA cores than GK104 its performance advantage is much greater, on the order of 50% or more, highlighting the fact that NVIDIA is getting more work out of their CUDA cores than ever before. Altogether,
NVIDIA tells us that on average they're getting 40% more performance per core, which is one of the reasons why GTX 980 can beat even the full GK110-based GTX 780 Ti, with its 2880 CUDA cores." ~
Source
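For anyone who wants to check the math on those quotes, here's a quick sketch. The 90% and 40% figures are NVIDIA's own claims from the quotes above; equal per-core clocks are assumed for the sanity check, which the real 980 vs. 780 Ti comparison doesn't satisfy (the 980 clocks higher):

```python
# Back-of-envelope per-core throughput math from NVIDIA's quoted claims.

# Maxwell 1: a 128-core SMM delivers 90% of a 192-core SMX's performance.
smx_cores, smm_cores = 192, 128
per_core_gain_m1 = (smx_cores * 0.90) / smm_cores
print(f"Maxwell 1 per-core gain vs Kepler: {per_core_gain_m1:.2f}x")  # 1.35x

# Maxwell 2: NVIDIA quotes ~40% more performance per core outright.
per_core_gain_m2 = 1.40
# Sanity check: GTX 980 (2048 cores) vs GTX 780 Ti (2880 cores), equal clocks assumed.
relative = (2048 * per_core_gain_m2) / 2880
print(f"980 vs 780 Ti at equal clocks (paper): {relative:.2f}x")  # ~1.00x
```

So at equal clocks the two would be roughly at parity on paper; the 980's higher clocks are what push it past the 780 Ti, consistent with the quote.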
That means we cannot straight up assume that Pascal CUDA cores are equal to Maxwell 1/2. Having said that, I do not expect the same 35-40% increase in IPC per CUDA core with Pascal as Maxwell was a phenomenal breakthrough over Kepler in this regard.
I'm not going to pretend to be able to know what the specs of NV's gaming Pascal cards will be, but it's important not to underestimate NVIDIA in light of what has been phenomenal execution over the last several years. This company knows that it lives & dies by its performance in the gaming GPU market, so I would expect products tailored with that notion in mind to roll off the line when the time comes.
JHH himself mentioned, I believe in the GTX 690 presentation live stream, that Quadro/Tesla profits pay for a lot of GeForce R&D. Meaning that if you removed the profits/revenue from Quadro/Tesla, it would be FAR harder to design massive monolithic Big Kepler, Maxwell, Pascal, etc. It stands to reason then that NV's top-end cards are based on or around the flagship Quadro/Tesla. The GeForce versions of those have higher clocks but neutered FP64, compute and other features. That should continue unless GP102 is a drastically gaming-focused design that throws away all the redundant GP100 stuff in favour of some 5000+ CUDA core monster. To me that sounds like fan fantasy right now, sorry.
My point is you suggested that NV will finally split their professional large-die designs completely from their gaming designs, or at least that's how I read your post. NV mentioned nothing of the sort regarding Asynchronous Compute, which suggests to me Pascal is going with a brute-force approach (high clocks + massive memory bandwidth). There is nothing wrong with that since there are few DX12 Async games out, and Quantum Break/GOW have been unoptimized ports. Point being, from what NV revealed, it seems they have not focused on this DX12 functionality. This suggests AMD doesn't need to devote too many transistor resources to increasing ACEs from, say, 8 to 12-16. That means AMD can focus on its biggest weaknesses and not waste more die space on ACEs/Compute.
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu
Nvidia has already done it before; their first HPC-only GPU was GK210.
Ya but the fundamental performance/specs of GK110 and 210 were nearly identical to Titan Black / 780Ti. From what I am reading here it sounds like some people actually think NV will release a 3584 CUDA core P100 but GP102 will be a completely different design with more than 240 TMUs/3840 CUDA cores? I've never seen such a situation where NV's biggest fully unlocked Tesla/Quadro chip has much lower technical specs (not clocks or TDP) compared to the gaming GeForce series.
Honestly I was expecting a little more from GP100 ...
The clocks just aren't that special considering an overclocked GM200 chip can hit 1200MHz base and 1400+MHz boost easily, while Pascal seems to be already at its thermal limits ...
For a 610mm^2 die size with double the transistor density, you would think there'd be a 50% increase in shader count, but it's half of that. Maybe there's an increase in IPC, but I thought Maxwell already had good IPC, plus it needs less parallelism too ...
Most people don't overclock though. Stock vs. stock, a fully unlocked GP100 is >70% more capable (on paper specs) than the Titan X. Also, there could be a 15-20% increase in IPC on top of that. You do make a good point that without knowing Pascal's overclocking headroom, it's hard to gauge 980Ti / Titan X OC vs. Pascal OC. Usually NV is pretty conservative with their clocks. Overclocking actually improved every generation since Fermi, and Fermi itself was better than HD5800/6900. I wouldn't underestimate NV's overclocking capabilities. Even my 8800GTS overclocked at least 20%.
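To put numbers on that "paper specs" comparison: counting an FMA as 2 FLOPs per core per clock, and assuming a hypothetical fully unlocked GP100 (3840 cores) at P100's 1480MHz boost against Titan X's rated ~1075MHz boost:

```python
def fp32_tflops(cores, boost_mhz):
    # FMA counts as 2 FLOPs per core per cycle.
    return cores * boost_mhz * 2 / 1e6

titan_x = fp32_tflops(3072, 1075)      # GM200 Titan X at rated boost
gp100_full = fp32_tflops(3840, 1480)   # hypothetical full GP100 at P100's boost clock
print(f"Titan X: {titan_x:.1f} TFLOPS, full GP100: {gp100_full:.1f} TFLOPS")
print(f"Paper-spec advantage: {gp100_full / titan_x - 1:.0%}")  # ~72%
```

That lands right around the >70% figure, before any IPC differences.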
You must have missed the context of the post. Pascal's async compute abilities are supposedly only as good as GCN 1.0
I guess some people in this thread are trying to evade the entire async compute Pascal situation. It's possible NV is keeping it secret, but I'd be pretty surprised if they had improved it significantly over Maxwell and then didn't tout it as a new killer feature, given how terrible Maxwell is with Async. That only reinforces the rumors that Pascal still lacks a strong Async architecture. As I mentioned earlier, probably not a big deal since NV focused on more profitable markets (AI, Neural Networks, Deep Learning), but for games it looks like they had to pick and choose, and Async wasn't a priority when they were designing Pascal.
P100 has no ROPs listed on the spec sheet. That's probably why.
EDIT: Jesus, I was thinking GP100 is the consumer-class chip, but Nvidia has both a product name (P100) and a chip name (GP100). They seemingly interchange the two on their website as if they are the same thing.
Actually they are not:
- Full GP100/102 was rumored to have 1TB/sec HBM2; P100 uses 1.4GHz HBM2 for 720GB/sec.
- Full GP100 has 60 SMs => 3840 CUDA cores with 240 TMUs; P100 is a cut-down version with 56 SMs => 3584 CUDA cores with 224 TMUs.
- GP100 is the underlying chip for Pascal family and P100 is a particular Tesla model which uses a variant of GP100. That is to say as yields and the node mature, we could easily see P110/P210 (whatever they call it) with a full GP100 die and higher HBM2 clocks.
Actually, what people have said is that Maxwell is a derivative of Pascal, Maxwell having been created when 20nm fell through. The GP consumer-series chips could be radically different from P100 to be more suited for gaming. You can see that P100 Pascal adds back the DP and compute that Maxwell is missing, as well as NVLINK and HBM2, both of which have been known for some time. Pascal has changed its compute capability level, which is new information. It's really not all that different spec-wise compared to Maxwell. We still don't know if it can be used in conjunction with DX12 or Vulkan.
Edit: spelling
For gamers, of course it would be way better if NV made a pure gaming chip with 610mm2 based on Pascal, minus NVLink, FP64, etc. but I think that's exactly what Volta will be. That's because it's going to be too hard to enlarge the die size beyond 610mm2 with Volta and it's probably too hard to increase GPU clocks/IPC by another 35-40% from Pascal. It then stands to reason that the lean gaming Big Daddy Pascal will be Volta in 2018. That's my theory but I think Volta will also incorporate Async Compute and many other tricks like next gen VXGI/other lighting techniques, etc.
It's entirely plausible. Just look at the previous smaller GK104 chip, it had a lower FP64 pipe ratio than GK110/GK210 and less compute oriented features. NVIDIA could just be taking this one step further and it makes sense to do so. If true, it would mean that NVIDIA has realized that big enterprise compute workloads have evolved so much and become so different from graphics workloads that it makes sense to sell a chip tailored for each market and I believe they have the size and capability to do so.
I believe AMD mentioned before that it costs $300-500M to design an ASIC (Pg. 15 has similar data). I am not sure the market for 1080Ti chips is large enough to justify spending $300-500M on a from-scratch design. I think it would be better to tweak GP100 and remove some parts, but I don't envision a 5000-6000 CUDA core GP102 selling alongside a 3584-3840 CUDA core GP100. Seems too good to be true. Again, if that were to happen, it would contradict the entire GPGPU business model NV adopted starting with G80. I can see GP104/106/107 designed as lean gaming GDDR5(X) chips, but a completely unique >4000 CUDA core / >240 TMU GP102 is a huge undertaking. That would be unprecedented, and of course awesome if true. Instead, I see GP102 as a 3840 CUDA core part with higher clocks.
I mean, they are talking up deep learning a lot, but did anyone see the 5 TFLOPS (FP64) number?? That is absolutely huge for a single GPU....! Surprised they weren't highlighting this more compared to all the VR/deep learning, the trendy stuff.
They did highlight it but once you mention 5.3TFlops FP64, it's kinda self explanatory. How much can you really talk about that?
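The 5.3 figure falls straight out of the specs: GP100 runs FP64 at half the FP32 rate, so P100's 3584 enabled FP32 cores imply 1792 FP64 units. A quick sketch:

```python
fp32_cores = 3584              # Tesla P100 (56 of 60 SMs enabled)
fp64_units = fp32_cores // 2   # GP100's FP64 rate is 1/2 of FP32
boost_hz = 1480e6              # P100 boost clock
fp64_tflops = fp64_units * boost_hz * 2 / 1e12  # 2 FLOPs per FMA
print(f"P100 FP64: {fp64_tflops:.1f} TFLOPS")  # ~5.3
```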
Lol is that your take away from the info?
Pascal still is Maxwell+ on 16FF. What's the plus? FP64, FP16 mix-mode, NVLink, HBM2.
This is what NV says about GP100 on that blog post: Tesla P100: Built for HPC and Deep Learning
Yup, Pascal sounds like Maxwell Refined with next gen HBM2 + FinFET and enlarged L2 cache, while re-balancing the CUDA cores for maximum utilization mimicking GCN, minus the Async capabilities that AMD will bring with GCN 4.0.
300W TDP is actually going backwards. AS EXPECTED, because they tore out all the power-hungry features in Maxwell to make it a gaming-focused chip, and now Pascal needs to put those back in to compete in the HPC market.
Not a peep from the perf/watt people. Let's revisit all the smack talk around the Radeon Pro Duo launch:
"Wow. That's like a small space heater..."
"Two of these in XFire would be the equivalent wattage of a midsize space heater for sure. "
"At 350W? Seriously?"
"24 pins for power! I bet the lights dim when you turn the system on."
AMD managed to package 16.4 Tflops of performance into a 350W TDP, while Pascal brings 10.6 Tflops in a 300W TDP.
Not seeing the same posters crapping on P100's horrible perf/watt with respect to FP32, or acknowledging how impressive it was that AMD delivered 16 Tflops of FP32 on 28nm!
Just pointing out all the double standards...
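For reference, the perf/watt math both ways, using the paper FP32 TFLOPS and board TDPs quoted above:

```python
# Paper FP32 throughput per watt, using the TDPs quoted above.
pro_duo_tflops, pro_duo_tdp = 16.4, 350  # Radeon Pro Duo, 28nm
p100_tflops, p100_tdp = 10.6, 300        # Tesla P100, 16nm FinFET

pro_duo_eff = pro_duo_tflops * 1000 / pro_duo_tdp  # GFLOPS per watt
p100_eff = p100_tflops * 1000 / p100_tdp
print(f"Pro Duo: {pro_duo_eff:.1f} GFLOPS/W, P100: {p100_eff:.1f} GFLOPS/W")
```

On paper FP32 alone, the 28nm dual-GPU card comes out ahead per watt, which is the double standard being pointed out (real workloads, FP64 and scaling caveats obviously apply).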
I'm going to call it now, Pascal will perform great in GCN-optimized game engines, better than Maxwell and vastly better than Kepler.
Don't forget another key Pascal feature: Maxwell driver obsolescence, with driver focus shifting 100% to Pascal. That alone should give Pascal an automatic 10-20% increase in IPC in modern DX12 games.
Despite only a small increase in FP32 or potential gaming performance (comparing GM200's ~7 TFlops to GP100's ~10.6 TFlops), the change above means each paper-spec flop is worth more for gaming due to improved CUDA core utilization (for those under the impression Maxwell was already at 100% utilization, lololol).
Don't forget you are comparing fully unlocked GM200 on a mature 28nm node (higher clocks) to a cut-down GP100 (most likely GP102/110 clocks could improve over the next 12-18 months as 16nm node matures). Not exactly fair in that sense.
While some were expecting a doubling of performance, it's not going to happen given how HPC/compute-focused GP100 has to be, compared to GM200, which was made for gaming.
Using 4K GPU-limited gaming scores, the 980Ti shows up as 45% faster than the 780Ti. With a huge 25% factory overclock, it is 79% faster, still well short of 2X. I am not sure who was expecting Pascal OC to be 2X faster than GM200 max OC. At the same time, NV did raise TDP from 250W to 300W, giving them more wiggle room to up the GPU clocks with GeForce GP100/102. If modern games become more GPU/memory-bandwidth bottlenecked, I can see a 1080Ti OC beating a 980Ti OC by 70%+.
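As a rough sketch of how close to linear that overclock scaling was (45% and 79% are the figures from the 4K scores cited above):

```python
stock_gain = 1.45  # 980 Ti vs 780 Ti, stock, 4K GPU-limited
oc_gain = 1.79     # factory-overclocked (~25%) 980 Ti vs 780 Ti
implied_oc_uplift = oc_gain / stock_gain - 1
print(f"Implied uplift from the ~25% factory OC: {implied_oc_uplift:.0%}")  # ~23%
```

In other words, the overclock converted into performance at a nearly 1:1 rate in GPU-limited scenarios, which is why OC headroom matters so much in these comparisons.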
:thumbsup: Great post Silverforce11
:thumbsup::thumbsup:
Your understanding of NVIDIA's business needs some work. Tesla is a very, very small portion of NV's overall revenues; GeForce GTX is far more important to the company in terms of total revenue and gross profit dollars.
Doesn't explain then why NV is launching P100 first, prioritizing that market above all else. NV likely thinks it can increase Tesla revenue and profits dramatically due to pent-up demand from customers on older GK110/210, as well as an entirely new, growing market segment with Deep Learning/Neural Networks/Autonomous Driving. Sure, the GeForce market is more important, but for NV both are strategically important, and that shapes how the chips are designed. If NV thought launching GP100 in Q1 2016 as a GeForce chip was a better strategy, they would have done it. That means they know something we average folk don't.
Concerning the previous rumours, where there's smoke, there's fire - they could very well announce the new GeForces at Computex for a Q3 launch. GP104/GP106 should be the first if SweClockers is right.
They could but months ago we had people on these boards claiming 980Ti/Titan X's replacement could launch in April of 2016 as a GeForce. That's come and gone. Then we had rumors of 1080 launching in May and supposedly NV was going to show them today. GTX 1080 launching in May and Q3 launch could be dramatically different since Q3 goes all the way to September. If NV launches GP104/106 late May/early June, that would be very impressive given how they kept any leaks to a bare minimum for GP100.
Looks bad taking specs into account only. Probably 2x Hawaii performance; if AMD does not get a better product out this round, they probably never will.
I think it could end up faster than that. Either way, 2X R9 390X is 65% faster than the 980Ti at 4K. At $699, that would sell like hot cakes. That's way more than the stock 980Ti delivered over the 780Ti, but yes, well short of GTX480/580 -> 780Ti, which was literally a 2X increase. Fury X is actually very competitive with the 980Ti stock vs. stock, beating the 980Ti at 4K. What bombed Fury X were the lack of overclocking headroom, the pump noise of early review samples and the 4GB VRAM marketing. In any case, as you've seen with 7970Ghz vs. 680, AMD beating NV in performance isn't enough. They have to be ahead by probably 20-25% to start converting NV users.
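Working backwards from that 65% figure (the 0.825 relative-performance number here is my own illustrative assumption, not a cited review result):

```python
# Assumed: a single R9 390X lands around 0.825x of a stock 980 Ti at 4K.
r390x_vs_980ti = 0.825
perfect_scaling = 2 * r390x_vs_980ti  # idealized 2x multi-GPU scaling
print(f"2x R9 390X vs 980 Ti: {perfect_scaling - 1:.0%} faster (perfect scaling assumed)")
```

Real multi-GPU scaling is rarely perfect, so a single next-gen chip hitting that same target would be the more meaningful win.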
If a chip of this same size was done Maxwell-style with minimal DP support, we could be looking at a whopping 7680 shaders! Now that's more like it. That said, though, I don't think GP102 will be quite this massive. A stronger possibility is 6144 shaders, double GM200's count. Whether Nvidia will use HBM2 on GP102 is another question, but given their traditional conservatism when it comes to new memory types, I think a 384-bit GDDR5X bus is more likely. GP102 would then be the highest actual GPU in the Pascal lineup, powering the most expensive Quadro card and the next-generation Titan when it arrives.
We might well see a situation where even GP104 has more single-precision power than GP100.
Wow, someone is an optimist. Looks like some of you aren't happy that Pascal may only be 70-75% faster; it needs to be 2-2.25X faster? Do you realize how freaken fast a 6144 CUDA core 1480MHz chip would be? Borderline wishful thinking; plus it leaves zero room for Volta on the same node.
Pascal seems to have gone toward a lower number of cores at higher clocks. Power generally scales with the square of voltage, and the higher the frequency, the higher the voltage required. AMD might go for more units at lower clocks, having been criticized for poor power efficiency. Nice contest.
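The power argument in a nutshell: CMOS dynamic power goes roughly as C*V^2*f, so a frequency bump that also requires a voltage bump costs disproportionately. A toy sketch (the +10% voltage figure is an illustrative assumption):

```python
def dynamic_power(c_eff, voltage, freq_hz):
    # Classic CMOS dynamic power approximation: P = C * V^2 * f
    return c_eff * voltage ** 2 * freq_hz

base = dynamic_power(1.0, 1.00, 1.0e9)
# 20% higher clock, assuming it needs ~10% more voltage to stay stable:
fast = dynamic_power(1.0, 1.10, 1.2e9)
print(f"Power cost of a 20% clock bump: +{fast / base - 1:.0%}")  # ~45%
```

This is why "more units at lower clocks" can be the more power-efficient way to hit the same throughput, which fits the wide-and-slow vs. narrow-and-fast contrast being described.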
That's why I keep saying this could be AMD's lifeline. If AMD can fit 5120-6144 SPs into Vega and 14nm FinFET gives it 20-25% OCing headroom (unlike poor Fiji), we could actually have a solid competitive generation. This round 980Ti/TX blew Fiji away, and that's not healthy for competition or price wars. However, rumors of a 4096 SP Vega and a 2304-2560 SP Polaris 10 aren't inspiring confidence in this scenario.
No.
"Gaming", which is really GeForce GTX, is ~4x the size of Professional Visualization.
Data Centre (Tesla) and Auto (Autonomous Driving, Deep Learning/Neural Networks), grew 40% and 80%, respectively. NV is thinking/looking long-term. A big part of GeForce growth is utter domination over AMD. Do you honestly believe NV will maintain 82.8% market share in the gaming market vs. AMD over the next 10 years? It's only a matter of time before AMD comes back with a solid gaming GPU series -- they always do. Frankly, AMD forfeited the entire mobile dGPU market since 2012 and it's been 4.5 years. You expect them to continue not getting any design wins with mobile dGPUs? It's virtually guaranteed that over the next 5 years AMD will regain market share in GPUs and that means GeForce growth will start slowing down unless everyone and their mother starts buying $350-700 NV GPUs for VR.
It is extremely unlikely that they neglect GeForce GTX for Quadro/Tesla.
No one said that NV is going to neglect GeForce but clearly they used a large chunk of transistors/die space on HPC/Professional features in GP100. We'll just have to see over the next 2-3 years how important Async Compute will become and whether or not AMD decided to double down on this feature (say going even wider with 12-16 ACEs) or AMD retains the same 8 ACEs and focuses on raw performance/efficiency instead. In any case, it's pretty clear that Pascal is much like Fermi and to some extent Kepler => much more well-rounded chip with compute functionality while Maxwell was almost a pure gaming/FP32 focused design. Don't worry, NV will have a chance to refine Pascal into an even more beastly gaming GPU with Volta much like Kepler evolved into Maxwell.
Buddy, it was all the FUD that had NV releasing Titan Pascal in April 2016 that was cheered upon. Then it was later pushed to June. Then no, it's a GP104 mid-range, not the big Pascal. Now we're hearing no, it's not even GP104, but GP106/107 first. -_-
Ya, true. Don't forget 1070/1080 were supposed to be unveiled today with May launch.
You have to understand the context for why NV got the 30% jump in Gaming.
Marketshare vs. AMD ring a bell? They went from 60:40 to 82:18 in that time period.
Let's say NV destroys AMD and gets to 95:5; how much more growth do you expect?
Compared to this?
Some people believe in NV gaining 99% of dGPU market share, I guess. Soon they'll be in for a rude awakening once GCN 4.0 launches. AMD will go from having 0 mobile dGPUs anyone wants to having an entire stack to choose from. Is it going to flat out beat Pascal? It doesn't need to, as long as the chips find some market. The difference is, almost nothing AMD had from 2012-2016 was wanted in the mobile dGPU market; hence almost no design wins whatsoever. Similarly, on the desktop side, I know a lot of people still don't acknowledge how 7970/7970Ghz hammered 680/770, but history has a tendency of repeating itself (hint: 9700/9800Pro, X800/850 XT PE, X1900/1950XTX, etc.). Sooner or later AMD will have its day, and some people have short memories. We are already seeing the panic now, with people making up new theories about GP102/104 having way more CUDA cores than GP100 since clearly they expected way more out of Pascal.