1080 = new 680, Polaris 10 = new 7870?


Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
How can a $699 and $449 SKU crush Polaris?

You surely don't think those prices are affordable for the entry and mainstream masses.

In hardware terms, to crush something, you have to offer better performance at a similar price. If you only deliver marginally better performance at a higher price, you're competitive. Or, vice versa, if you deliver 90% of the performance at 25-40% less in price, you're very competitive.

AMD can't crush GP104, because they won't out-perform it. NV can't crush Polaris 10, cos they don't want to price their SKUs that low.

The battle is not on until GP106 is released for NV to have a chance vs Polaris 10. Likewise, until Vega, AMD has no chance to beat GP104.
Well, let round 16 of the video card wars begin.

 

Fallen Kell

Diamond Member
Oct 9, 1999
6,063
437
126
Just like how massively OCing a 980Ti won't make it faster than a 390x when CF scaling works....

Had to highlight that part for you there. Not all games work well with crossfire and/or SLI. And to be perfectly honest, you are much less likely to have crossfire support on day one of a new game than you are to have SLI support. That said, as I mentioned both have problems which is why I typically just buy the fastest single GPU, which unfortunately hasn't been AMD for a while.
 

tential

Diamond Member
May 13, 2008
7,355
642
121
Had to highlight that part for you there. Not all games work well with crossfire and/or SLI. And to be perfectly honest, you are much less likely to have crossfire support on day one of a new game than you are to have SLI support. That said, as I mentioned both have problems which is why I typically just buy the fastest single GPU, which unfortunately hasn't been AMD for a while.

I think the times where SLI/CF don't work are grossly exaggerated. It works for the big titles, maybe not on day 1, but then again, on day 1, what works....

I wouldn't expect to play any game on day 1 and have everything working anymore. Day 1 is essentially the open paid beta.
How can a $699 and $449 SKU crush Polaris?

You surely don't think those prices are affordable for the entry and mainstream masses.

In hardware terms, to crush something, you have to offer better performance at a similar price. If you only deliver marginally better performance at a higher price, you're competitive. Or, vice versa, if you deliver 90% of the performance at 25-40% less in price, you're very competitive.

AMD can't crush GP104, because they won't out-perform it. NV can't crush Polaris 10, cos they don't want to price their SKUs that low.

The battle is not on until GP106 is released for NV to have a chance vs Polaris 10. Likewise, until Vega, AMD has no chance to beat GP104.


As for the 1070 stealing AMD's thunder? You'd be surprised how many people with midrange cards have been waiting a LONG time to upgrade, and are now just like "WOW, Titan X? I've waited 5 years for a GPU! Screw it, yay, Titan X for $380 (actually $450, but we put the lower $380 price tag on to make you feel better until you take out your CC later)". Nvidia isn't stupid; just because you think it's a bad value doesn't mean Nvidia won't make it out to be the deal of the century and convince a large number of people to buy at the $450 and $380 price tags before Polaris gets into consumers' hands.....

When the 1080 is the talk of every person on reddit/neogaf/etc., when your favorite streamer is using a GTX 1080 now, when your friends are rushing to pick up 1080s/1070s, you expect Polaris 10 to go up against the Nvidia hype machine???

You're thinking with zero emotion, which is why you don't understand how the vast majority of people think, and Nvidia does understand that. Nvidia will upsell people, just like they did with the GTX 970. It's what Nvidia does best.

The problem is, you're thinking logically like "Oh, we're priced in 2 different segments so we're not competing". I'm sure AMD thinks that too.
Nvidia doesn't. Nvidia is going to convince people the 1070/1080 are so great that even if you can't afford them, you'll wait for the 1060 Ti. My money is on Nvidia marketing doing what they do best, and they already are with the hiked-up Founders Edition prices and early releases for those cards.... No one predicted those 2 things, and we're just getting started.

Wait til you guys see the reviews based on the Nvidia reviewer guide!
 
Last edited:

renderstate

Senior member
Apr 23, 2016
237
0
0
Yes and no.

AMD does FP64 very differently from NV. They do not have dedicated ALUs/CCs that only run FP64.

AMD's SPs are very flexible, in that two SPs can combine to process an FP64 workload, or a single SP can process a 2x FP16 (half-precision) workload.
Flexibility in arithmetic units is the enemy of power efficiency. I suspect this is why NVIDIA uses dedicated FP64 units in a market where area is less of a problem and power efficiency is way more important. There is nothing particularly hard or magical in fusing 2 or more ALUs to support wider operations.

This 2x FP16 capability has just been included in Pascal and hyped to the moon for "Deep Learning/AI" compute. You can chalk up one extra feature that Pascal is taking from GCN's books, along with instant graphics <-> compute switching, fine-grained preemption and 64-wide wavefront optimization.
I can't find any reference to FP16 ops running at double rate on GCN. Pointers? Even the PS3 GPU, designed by NVIDIA, supports FP16 registers and some limited FP16 math; it's not exactly something new.

The rest is inaccurate or just wrong. IIRC GP100 compute preemption is more fine-grained than GCN's. Also, GP100 doesn't have 64-wide warps; they are still 32 wide. They simply split their SM into two halves, doubling the register file for each half. Both physical and logical warp sizes are unchanged, so no, they didn't go the GCN route, apart from putting down more registers, which is something NVIDIA has been doing pretty much every generation.

I keep telling folks that GCN is old, but it aint obsolete and I hope they've started to realize that is the truth.

GCN is a *great* architecture, but to suggest NVIDIA is copying it or imitating it is based on wishful thinking. Do you really think that's how developing computing architectures works? Let's be serious.
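[For readers who want to see what the "2x FP16 per 32-bit lane" claim above looks like from the programming side, here is a minimal sketch using CUDA's packed-half intrinsics (cuda_fp16.h, sm_53 and later). The kernel and buffer names are illustrative, not taken from either vendor's documentation, and the buffers are left uninitialized since only the structure matters here.]

Code:
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Each 32-bit register lane holds two FP16 values; __hadd2 issues a single
// instruction that performs two half-precision adds.
__global__ void packed_fp16_add(const __half2 *a, const __half2 *b,
                                __half2 *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = __hadd2(a[i], b[i]);
}

int main()
{
    const int n = 1024;                       // 1024 packed pairs = 2048 halves
    __half2 *a, *b, *c;
    cudaMalloc((void **)&a, n * sizeof(__half2));
    cudaMalloc((void **)&b, n * sizeof(__half2));
    cudaMalloc((void **)&c, n * sizeof(__half2));

    packed_fp16_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

[Whether the hardware retires those packed ops at twice the FP32 rate is exactly the per-architecture question being argued in this thread; the source code looks the same either way.]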
 
Feb 19, 2009
10,457
10
76
Flexibility in arithmetic units is the enemy of power efficiency. I suspect this is why NVIDIA uses dedicated FP64 units in a market where area is less of a problem and power efficiency is way more important.

You must be joking. The cut-down GP100 Tesla is 300W. Sell it for HPC that needs FP64, and all those FP32 units do JACK ALL but eat up chip space and TDP.

Sell it as a Quadro for rendering, and all those FP64 CCs do JACK ALL but eat up chip space and TDP.

Such an efficient design.

It seems your bias has gotten the better of your processing capabilities to claim having separate 1:2 FP64 units is an efficient design.

If NV could pull off what Hawaii does with 1:2 FP64, they would. A straight shrink & 2x of Hawaii to 14nm, ~400mm2, higher finfet clocks, and it would pwn P100 on FP64 performance. As a much smaller chip and less TDP.
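[As a rough sanity check on that claim (approximate numbers, not from the post above): peak FP64 throughput is roughly

ALUs x 2 (FMA per clock) x clock x FP64 rate

With Hawaii-class figures (2816 ALUs, ~1 GHz, a 1:2 rate) that works out to about 2.8 TFLOPS FP64, so a doubled Hawaii at similar or higher clocks would indeed land in the same ballpark as GP100's quoted ~5.3 TFLOPS FP64, die area and power aside.]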

IIRC GP100 compute preemption is more fine-grained than GCN's.

Rubbish.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
You must be joking. The cut-down GP100 Tesla is 300W. Sell it for HPC that needs FP64, and all those FP32 units do JACK ALL but eat up chip space and TDP.
Area is not an issue in the HPC market: when they can sell a couple of chips for the price of a whole wafer, why would they care about area as much as they do for the gaming market? Unused ALUs are clock gated, if not power gated.

Sell it as a Quadro for rendering, and all those FP64 CCs do JACK ALL but eat up chip space and TDP.
Same story.

Such an efficient design.
It is power efficient, not area efficient. If you are building a supercomputer with thousands of GPUs, each costing thousands of dollars, your power bill is still way higher than your GPU bill in the long term.

It seems your bias has gotten the better of your processing capabilities to claim having separate 1:2 FP64 units is an efficient design.
A well designed dedicated FP64 unit will use less power than 2 FP32 units fused together. Nothing groundbreaking here. The extra area, which is precious in the gaming market, is not a concern for HPC.

If NV could pull off what Hawaii does with 1:2 FP64, they would. A straight shrink & 2x of Hawaii to 14nm, ~400mm2, higher finfet clocks, and it would pwn P100 on FP64 performance. As a much smaller chip and less TDP.
If that's true, well done AMD. But I have my doubts. There is much more to performance than peak FLOPS. In a modern GPU, ALUs are probably less than 30% of the area of the chip. IIRC, AMD once tripled their ALUs per pipe from one generation to the next and the area hardly increased (don't ask me when/which product, it was a long time ago...).

Well, NVIDIA claims GP100 can preempt CUDA programs on a per-instruction basis (page 9): https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
What's GCN pre-emption granularity in OpenCL?
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
Time to change hobby. I'm seriously considering it.
If you're looking at AAA titles (maxed out settings no less) only, then by all means do so. Many decent, less demanding titles do exist, and they tend not to care what brand hardware you're running as long as it's decent. Some of my favorites will happily run on mediocre IGPs.
 
Feb 19, 2009
10,457
10
76
Well, NVIDIA claims GP100 can preempt CUDA programs on a per-instruction basis (page 9): https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf
What's GCN pre-emption granularity in OpenCL?

Already had since GCN 1.0, even more granularity actually.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

The ACEs can switch between task queues by stopping a task and selecting the next task from a different queue. For instance, if the currently running task graph is waiting for input from the graphics pipeline due to a dependency, the ACE could switch to a different task queue that is ready to be scheduled.

Here's what sebbi @ beyond3d had to say about Pascal's preemption:

https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-32



Like I said, Pascal is taking some very nice features from GCN's books.

That 64-wide wavefront optimization is key for all console ports. The split into 32x2, each targeting 32 FP32 CCs, means Pascal hits peak utilization instantly, like GCN does, for game engines that use 64-wide wavefronts.

Both GCN and Pascal can do preemption and instant graphics <-> compute context switching. However, only GCN allows simultaneous graphics & compute workloads to be processed by the CU/SIMDs, i.e. at wavefront granularity. True Async Shaders. NV is half way there. Volta will add the other half, but it'll need a hardware scheduler and a multi-engine design.
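[As a concrete illustration of the 64-wide point (a sketch only; the kernel is hypothetical and not from any console title), a thread group sized for GCN's wavefront looks like this in CUDA terms:]

Code:
#include <cuda_runtime.h>

// A thread group of 64 work-items: on GCN this is exactly one 64-wide
// wavefront per SIMD; on an NVIDIA SM the same group is executed as two
// 32-wide warps, which is the 32x2 split being discussed above.
__global__ void console_style_pass(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *buf;
    cudaMalloc((void **)&buf, n * sizeof(float));

    // 64 threads per block, mirroring a 64-wide-wavefront-sized workload.
    console_style_pass<<<n / 64, 64>>>(buf, n);
    cudaDeviceSynchronize();

    cudaFree(buf);
    return 0;
}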
 
Last edited:

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Yes and no.

AMD does FP64 very differently from NV. They do not have dedicated ALUs/CCs that only run FP64.

AMD's SPs are very flexible, in that two SPs can combine to process an FP64 workload, or a single SP can process a 2x FP16 (half-precision) workload.

This is true, and Hawaii is a very space-efficient chip - probably the best perf/mm^2 of any of AMD's 28nm offerings. That said, there must be some tradeoff for that 1/2 FP64 support, since if adding that was "free" in terms of die space, why didn't they do it on Fiji?
 
Mar 10, 2006
11,715
2,012
126
NV is half way there. Volta will add the other half, but it'll need a hardware scheduler and a multi-engine design.

A hardware scheduler probably burns more power than a software one, so the performance gains from adding the necessary hardware/complexity to support "True Async" (something that developers seem to say adds 5-10% in performance) might not be worth the power consumption trade-off (i.e. there could be other ways to get that performance increase elsewhere that, for NV, may be more efficient) in a particular generation/at a given node.

The architects at both AMD and NVIDIA are very smart and I'm sure there's good reasoning behind the trade-offs that both IHVs make.
 
Feb 19, 2009
10,457
10
76
@Arachnotronic

In the time-frames involved with Volta, DX12 will be common and advanced compute features will be the norm. GPUs that can run those compute effects in parallel with graphics rendering will get a massive leg-up.

PS: DX12 multi-engine is not just about shader/ALU utilization, as it's commonly misunderstood to be. I hope I've made enough posts on this for some of you guys to no longer think such things.
 
Feb 19, 2009
10,457
10
76
This is true, and Hawaii is a very space-efficient chip - probably the best perf/mm^2 of any of AMD's 28nm offerings. That said, there must be some tradeoff for that 1/2 FP64 support, since if adding that was "free" in terms of die space, why didn't they do it on Fiji?

The trade-off occurs in the scheduling front-end for the CU; it has a small mm^2 impact. The GCN CUs/SIMDs themselves don't differ between Tahiti/Hawaii/Fiji; it's the other parts, which allow it to handle 1:2 FP64, that differ. There's more on that in the GCN PDF from AMD.
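[To make the 1:2 versus 1:32 discussion concrete, here is a minimal double-precision kernel (a DAXPY-style sketch; names are illustrative and the buffers are left uninitialized). The source is identical on every GPU; what changes is only the rate at which the hardware retires the FP64 FMAs.]

Code:
#include <cuda_runtime.h>

// y = a*x + y in double precision; each iteration compiles to one FP64
// fused multiply-add. On a 1:2 part like GP100 these retire at half the
// FP32 rate; on a 1:32 consumer part at 1/32 of it -- the code itself
// does not change.
__global__ void daxpy(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    double *x, *y;
    cudaMalloc((void **)&x, n * sizeof(double));
    cudaMalloc((void **)&y, n * sizeof(double));

    daxpy<<<(n + 255) / 256, 256>>>(n, 2.0, x, y);
    cudaDeviceSynchronize();

    cudaFree(x); cudaFree(y);
    return 0;
}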
 

renderstate

Senior member
Apr 23, 2016
237
0
0
Already had since GCN 1.0, even more granularity actually.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf



Here's what sebbi @ beyond3d had to say about Pascal's preemption:

https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-32



Like I said, Pascal is taking some very nice features from GCN's books.
Scheduling granularity *is not* the same thing as pre-emption granularity. The latter is about stopping processing, saving state, and possibly re-loading state and re-starting at a later time. Being able to schedule at wavefront/warp granularity has nothing to do with it. As far as I know, GCN cannot stop executing a wavefront and give up its resources to another wavefront until it has run to completion. When GCN is running compute and gfx on the same CU, they are using distinct & separate resources. If you have a program executing a trillion iterations and you want to stop it and start doing something else, you have to wait for it to complete first. If the program is stuck in an infinite loop, you are toast. On Pascal a CUDA program can be interrupted at any time, at instruction granularity, and doesn't need to run to completion. Do you understand the difference now?
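[To make that distinction concrete, here is a hedged sketch of the "long-running program" case; the kernel is hypothetical and not taken from the whitepaper. The interesting part is not the code but where the hardware is allowed to interrupt it.]

Code:
#include <cuda_runtime.h>

// A kernel that busy-loops far longer than a display watchdog would
// normally tolerate. On architectures that preempt compute at thread-block
// granularity (Kepler/Maxwell), this block cannot be interrupted until it
// finishes or is killed; per the GP100 whitepaper, instruction-level
// compute preemption lets the GPU suspend it mid-loop and resume it later.
__global__ void long_running(unsigned long long iters, float *out)
{
    float acc = 0.0f;
    for (unsigned long long i = 0; i < iters; ++i)
        acc = acc * 0.999999f + 1.0f;        // arbitrary busy work, no early exit
    *out = acc;
}

int main()
{
    float *out;
    cudaMalloc((void **)&out, sizeof(float));

    long_running<<<1, 1>>>(1ULL << 34, out);  // a single, very long-lived block
    cudaDeviceSynchronize();

    cudaFree(out);
    return 0;
}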

That 64-wide wavefront optimization is key for all console ports. The split into 32x2, each targeting 32 FP32 CCs, means Pascal hits peak utilization instantly, like GCN does, for game engines that use 64-wide wavefronts.
That's not peak utilization; you don't even cover ALU latencies with 2 warps, not to mention cache and memory latencies.

Both GCN and Pascal can do preemption and instant graphics <-> compute context switching. However, only GCN allows simultaneous graphics & compute workloads to be processed by the CU/SIMDs, i.e. at wavefront granularity. True Async Shaders. NV is half way there. Volta will add the other half, but it'll need a hardware scheduler and a multi-engine design.
Again, this is completely unrelated to what I was talking about (OpenCL, not graphics; pre-emption granularity, not scheduling granularity). We know nothing about GP104; all the information I quoted was from a GP100 white paper.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
If you have a program executing a trillion iterations and you want to stop it and start doing something else, you have to wait for it to complete first.

http://www.extremetech.com/extreme/...axwell-production-ahead-of-june-pascal-launch

From Nvidia's whitepaper:
Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out.
And ExtremeTech's conclusion:


This suggests that Pascal still won't support asynchronous compute workloads in the same fashion AMD's GCN hardware does, though the verdict is still out on whether this capability will play a significant role in shaping performance in a majority of DirectX 12 titles. It also suggests Pascal will see a smaller performance penalty with async compute enabled than Maxwell did, since it can interleave compute workloads more effectively than its predecessor.
From what I read here, it's like a limitation has been partially removed but is still below what is possible with GCN. In short, as pointed out by Silverforce, Nvidia is just reinventing a GCN-like wheel...
 
Feb 19, 2009
10,457
10
76
@Abwx
Async Compute is as much of a factor as GPU PhysX (i.e. not much), because DX12 games don't automatically use it unless AMD sponsors them and adds it.

Pascal has just what it needs to perform well in a future where games use more compute effects, and importantly, for VR.
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
A hardware scheduler probably burns more power than a software one, so the performance gains from adding the necessary hardware/complexity to support "True Async" (something that developers seem to say adds 5-10% in performance) might not be worth the power consumption trade-off (i.e. there could be other ways to get that performance increase elsewhere that, for NV, may be more efficient) in a particular generation/at a given node.

The architects at both AMD and NVIDIA are very smart and I'm sure there's good reasoning behind the trade-offs that both IHVs make.

Not so long ago Nvidia published a paper in which they said that hardware scheduling still can't be done efficiently and that there's no readily available means to implement an efficient one....

Well, it looks like some people at AMD did find a workaround...
 

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
@Abwx
Async Compute is as much of a factor as GPU PhysX (i.e. not much), because DX12 games don't automatically use it unless AMD sponsors them and adds it.

Pascal has just what it needs to perform well in a future where games use more compute effects, and importantly, for VR.

For VR the bar will be dramatically raised, since current GPUs are hardly good enough, so any efficiency that can be squeezed out will be welcomed and used if available; it's not like the market shares are set in stone...

As for Nvidia's next GPU, it won't benefit from any perf/watt advantage; I think it's rather the contrary that is looming, so increasing frequency won't be as effective as it was for Maxwell relative to Hawaii. The only solution left will be bigger dies if they want to be on par perf/watt-wise.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
http://www.extremetech.com/extreme/...axwell-production-ahead-of-june-pascal-launch

And ExtremeTech's conclusion:


From what I read here, it's like a limitation has been partially removed but is still below what is possible with GCN. In short, as pointed out by Silverforce, Nvidia is just reinventing a GCN-like wheel...

Many posters on AT, especially in VC&G, confuse or don't understand (and fair enough, not everyone on the forums is from this field or one relevant or close to it) just how complex these products are.

When I read sweeping statements like "nVIDIA is copying GCN" or something along those lines, I feel many don't really do their due diligence on the subject matter (or are perhaps just making stuff up, which is normally the case). It's quite ignorant. This is regardless of nVIDIA/AMD; it applies to every electronic product. But when it comes to GPUs, it's quite clear that what gets talked about here is VERY high level, which from my experience in a similar field means nothing. Nice diagrams, flow charts, etc., but what's really important is the actual low-level stuff, the part where it actually makes it work. A good example is power supplies. You can have hundreds of different ways of delivering 300W DC from your mains, yet I can't say that one or two topologies out of the hundred are the right answer.

Now, one thing that gets me a bit annoyed is that some members in this subforum are rather vocal about how the architecture needs to be done like this or that. In the engineering world, there are ALWAYS trade-offs. There is no right or wrong solution; it's all about trade-offs. In a GPU, you can expect to have a lot of these. That is why AMD GPUs are different from nVIDIA's (and most likely a lot MORE different if you understand the low level or have had access to both architectures' blueprints).

Basically, performance gains can come from different areas of the GPU, and it just so happens that with the GCN architecture, increasing utilisation by running compute shaders in parallel with the graphics workload (because the hardware is capable of it) in some situations increases performance rather than leaving ALUs idling. This is probably one of many ways to speed up performance on a GCN-based GPU. But it also has its downsides, like power consumption, more transistors required, and overall complexity, e.g. scheduling, shared resources, etc. (along with situations like: what if the game doesn't use compute shaders in the first place, or the workload is just a small %?).

Within the next year, we will know which architecture is more suited (or perhaps both have their strengths and weaknesses) to the current generation of games.
 

renderstate

Senior member
Apr 23, 2016
237
0
0
It would be nice if people would actually read and understand what they read before replying with the usual async compute story. Let me explain it again: async compute and scheduling granularity have *nothing* to do with preemption granularity. You can have a great architecture that can schedule compute and gfx work on the same SIMD processor at the speed of light and run it concurrently and you might still have to wait 3 seconds (or forever ) for a warp/wavefront to run to completion and free its resources (registers, etc.).

On GP100 CUDA programs can be interrupted at any time, you just need to wait for an instruction to be completed. No more driver timing out because the program doesn't stop, no more waiting milliseconds to get control of the GPU (again, in CUDA). This has nothing to do with async compute, so please, just stop bringing it up. Thank you


Sent from my iPhone using Tapatalk
 

Magee_MC

Senior member
Jan 18, 2010
217
13
81
It would be nice if people would actually read and understand what they read before replying with the usual async compute story. Let me explain it again: async compute and scheduling granularity have *nothing* to do with preemption granularity. You can have a great architecture that can schedule compute and gfx work on the same SIMD processor at the speed of light and run it concurrently and you might still have to wait 3 seconds (or forever ) for a warp/wavefront to run to completion and free its resources (registers, etc.).

On GP100 CUDA programs can be interrupted at any time, you just need to wait for an instruction to be completed. No more driver timing out because the program doesn't stop, no more waiting milliseconds to get control of the GPU (again, in CUDA). This has nothing to do with async compute, so please, just stop bringing it up. Thank you

Sent from my iPhone using Tapatalk


If CUDA programs can be interrupted at any time, does it automatically follow that non-CUDA programs can also be similarly interrupted? I understood that in the past there were things that CUDA could do with the GPU that were not possible with DX.

I'm asking because NV specified CUDA programs, and as I read it, that's a specific limitation that they wouldn't have inserted unless it was a genuine limitation. If it were universally available, they wouldn't have phrased it in that way.

Edit: I should have checked the white paper first. NV's phrasing doesn't specify CUDA programs, so what I wrote above is moot.

Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out. Programmers no longer need to modify their long-running applications to play nicely with other GPU applications. With Compute Preemption in GP100, applications can run as long as needed to process large datasets or wait for various conditions to occur, while scheduled alongside other tasks. For example, both interactive graphics tasks and interactive debuggers can run in concert with long-running compute tasks.
 
Last edited:

airfathaaaaa

Senior member
Feb 12, 2016
692
12
81
Flexibility in arithmetic units is the enemy of power efficiency. I suspect this is why NVIDIA uses dedicated FP64 units in a market where area is less of a problem and power efficiency is way more important. There is nothing particularly hard or magical in fusing 2 or more ALUs to support wider operations.

I can't find any reference to FP16 ops running at double rate on GCN. Pointers? Even the PS3 GPU, designed by NVIDIA, supports FP16 registers and some limited FP16 math; it's not exactly something new.

The rest is inaccurate or just wrong. IIRC GP100 compute preemption is more fine-grained than GCN's. Also, GP100 doesn't have 64-wide warps; they are still 32 wide. They simply split their SM into two halves, doubling the register file for each half. Both physical and logical warp sizes are unchanged, so no, they didn't go the GCN route, apart from putting down more registers, which is something NVIDIA has been doing pretty much every generation.



GCN is a *great* architecture, but to suggest NVIDIA is copying it or imitating it is based on wishful thinking. Do you really think that's how developing computing architectures works? Let's be serious.
So how do you explain Nvidia's own whitepaper? Are they lying? Was it some sort of typo?
 