Thanks for this, I appreciate the work you put in. Thinking of switching out my 290CF for 1070s. Also moving from my 3770K to an 8 core. We shall see.
You're welcome - thanks for the feedback!
No it is not ...or at least it shouldn't be. So long as power usage is 'reasonable', and all the latest-gen GPUs meet that definition, it matters not one jot. No one is, or should be, basing their GPU purchase on whether a card uses 150W versus one that uses 160W.
Give me FPS minimums or averages or frame-times or performance per dollar/pound ...those are what count.
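For what it's worth, all of those are easy to pull out of a frame-time capture (PresentMon, FRAPS, etc.). A minimal sketch, with the frame times and card price invented purely for illustration:

```cpp
// Toy frame-time log in milliseconds (hypothetical numbers) plus a
// hypothetical card price, used to derive the metrics that matter.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> frame_ms = {16.2, 16.5, 15.9, 33.1, 16.0, 16.3,
                                    16.1, 41.7, 16.4, 15.8, 16.2, 16.0};
    const double price = 249.0;  // hypothetical, in dollars or pounds

    double total_ms = 0.0;
    for (double t : frame_ms) total_ms += t;
    const double avg_fps = 1000.0 * frame_ms.size() / total_ms;

    // "Minimum" as the FPS implied by the worst frame time; with a real
    // capture you would take the worst 1% rather than a single frame.
    const double worst_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    const double min_fps = 1000.0 / worst_ms;

    printf("average: %.1f fps, worst-frame minimum: %.1f fps\n", avg_fps, min_fps);
    printf("value: %.3f average fps per unit of currency\n", avg_fps / price);
    return 0;
}
```

The worst-frame numbers are the whole point: the two spikes in that toy log barely move the average but halve the minimum.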
It is not even remotely close to the most important metric for a consumer.
Did I say anything about consumer?
You posted your broad assertion on a consumer forum. Not an engineering forum. Context matters. And to a consumer, perf/watt is near the bottom of the list of important factors: absolute performance comes first, price second, price/perf ratio third.
This is all by design, IMHO. These are games that could've easily been coded to run on far lower hardware requirements. But I think it's a you-scratch-my-back arrangement between software and hardware companies to constantly up the ante on what's required to play.
Intel needs more CPUs sold. Send a memo out to Blizzard and EA (whoever) to make their next AAA titles crush our current CPU lineup. And cut 'em a check. Tin foil hat? You betcha.
Did I say anything about consumer? I was speaking from an engineering perspective, which I admit is probably beyond the scope of this debate.
You didn't state yours was from an engineering perspective, and you would help your cause if you left out the patronising tone.
What you did say was "Performance per watt is the most important metric by far..."
From a consumer's point of view ...it is not, for the reasons I stated above. What's more, given AMD's perceived lead in DX12/Vulkan, from an engineer's point of view the 480 appears to have the better architecture for the job the card is going to be asked to do now and going forward.
"They are using 2 different architectures, on 2 different nodes, with different clock speeds. You can't compare them unless you are comparing the card, not the architecture."
Look at the GTX 1060. It has a measly 1280 CCs, yet it manages to compete favorably against, and even outperform, the RX 480, which has 80% more shaders, whilst using less power.
"Nvidia is taking advantage of high frequency; if it had similar clocks to Maxwell the difference would be minimal or none."
We debate engineering stuff on this forum all the time. My point is that you can't divorce performance per watt from absolute performance, because the two are very much related, as shown by the GTX 1060 vs RX 480 example. NVidia squeezes a LOT more performance out of their CUDA cores than AMD does out of their SPs, which gives them a tremendous advantage in performance and flexibility when it comes to designing their GPUs.
"GCN can do asynchronous compute in parallel; Pascal can't."
AMD has no real DX12/Vulkan advantage, vs Pascal at any rate.
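To put rough numbers on that: paper FP32 throughput is 2 FLOPs (one FMA) per shader per clock. A quick sketch using the public launch specs (nominal boost clocks and board power, so the outputs are ballpark only):

```cpp
#include <cstdio>

// Paper FP32 throughput in TFLOPs: 2 FLOPs (one FMA) per shader per clock.
static double paper_tflops(int shaders, double clock_ghz) {
    return 2.0 * shaders * clock_ghz / 1000.0;
}

int main() {
    // GTX 1060: 1280 CUDA cores, ~1.71 GHz boost, 120 W board power.
    // RX 480:   2304 SPs,        ~1.27 GHz boost, 150 W board power.
    const double gtx1060 = paper_tflops(1280, 1.708);
    const double rx480   = paper_tflops(2304, 1.266);

    printf("GTX 1060: %.1f TFLOPs, %.1f GFLOPs/W\n", gtx1060, 1000.0 * gtx1060 / 120.0);
    printf("RX 480:   %.1f TFLOPs, %.1f GFLOPs/W\n", rx480, 1000.0 * rx480 / 150.0);
    printf("RX 480 paper advantage: %.0f%%\n", 100.0 * (rx480 / gtx1060 - 1.0));
    return 0;
}
```

On paper the RX 480 carries roughly a third more FP32; that the two cards trade blows in games while the 1060 draws less power is exactly the per-core efficiency gap described above.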
"Still, the pipeline on each side of the GPU differs, as does each part of the GPU; for example, AMD can't do tile-based rasterization."
I beg to differ. They may not be directly comparable of course, but they can still be compared, as both of them are GPUs and have the same purpose. While their methods might differ, the end result is the same: artfully rendered pixels on your screen.
"For Paxwell, the frequency is all."
Frequency is just one aspect of GPU performance at any rate, and not even the most important one.
"Increasing clock speed can also increase texel and pixel fill rate, so overall performance can increase too."
If you haven't heard, 3D rendering is embarrassingly parallel, so higher frequencies are nowhere near as important as shader array performance and bandwidth.
"Well, they improved the clock speed on GCN from 1050MHz to 1260-1340MHz with slight tweaks to the architecture, and using 14nm FinFET allowed better perf/watt."
Also, Pascal's higher frequencies are a direct result of its architecture and node process. AMD could not attain similar frequencies without radically changing their architecture.
"TimeSpy isn't using parallel asynchronous compute+graphics."
Really? That explains why I get a 4-5 FPS increase with asynchronous compute turned on in Gears of War 4, and a boost in Time Spy DX12.
"It can, but not in parallel."
With the overwhelming evidence out now, I can't believe there are still naysayers who think Pascal can't do concurrent asynchronous compute.
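DX12-style graphics+compute overlap can't be reproduced from CUDA, but compute/compute concurrency on the same GPU is easy to demonstrate with streams. A sketch (kernel, sizes, and loop counts invented for illustration; build with nvcc):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for an independent workload: a long arithmetic loop per element.
__global__ void busy(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = 0.0f;
        for (int k = 0; k < 20000; ++k) v = v * 1.0001f + 0.5f;
        out[i] = v;
    }
}

int main() {
    const int n = 1 << 14;  // small grids, so two kernels can share the GPU
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Two independent launches in different streams: nothing orders them,
    // so the hardware is free to execute them concurrently.
    busy<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    busy<<<(n + 255) / 256, 256, 0, s2>>>(b, n);
    cudaDeviceSynchronize();

    printf("both kernels done\n");
    return 0;
}
```

Run it under a profiler (nvprof/nvvp) and the two kernels show up side by side on a concurrency-capable part.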
Can you post some worthwhile evidence to back that up? Last I remember, it was accepted among the more technically knowledgeable boards that Pascal is able to schedule compute shaders to work in parallel with graphics shaders (Maxwell can too, but without dynamic load balancing). Can it do both in one SM or GPC? I don't know. Can it do both in one GPU? Absolutely.
For overlapping workloads, Pascal introduces support for "dynamic load balancing." In Maxwell generation GPUs, overlapping workloads were implemented with static partitioning of the GPU into a subset that runs graphics, and a subset that runs compute. This is efficient provided that the balance of work between the two loads roughly matches the partitioning ratio. However, if the compute workload takes longer than the graphics workload, and both need to complete before new work can be done, then the portion of the GPU configured to run graphics will go idle. This can cause reduced performance that may exceed any performance benefit that would have been provided from running the workloads overlapped. Hardware dynamic load balancing addresses this issue by allowing either workload to fill the rest of the machine if idle resources are available.

Time critical workloads are the second important asynchronous compute scenario. For example, an asynchronous timewarp operation must complete before scanout starts or a frame will be dropped. In this scenario, the GPU needs to support very fast and low latency preemption to move the less critical workload off of the GPU so that the more critical workload can run as soon as possible.
From the GTX 1080 whitepaper, that says to me that while the compute queues are being processed, the graphics portion is idling. It can do load balancing, but it seems it has to do a context switch with preemption, using fences; that isn't really parallel.
This in turn is where Pascal steps in. Along with the aforementioned improvements to how Pascal can fill up its execution pipelines, Pascal also implements a radically improved preemption ability. Depending on whether it's a graphics or a pure compute task, Pascal can now preempt at the thread level or even the instruction level respectively.
Starting with the case of a graphics task or a mixed graphics + compute task, Pascal can now interrupt at the thread level. For a compute workload this is fairly self-explanatory. Meanwhile for a graphics workload the idea is very similar. Though we’re accustomed to working with pixels as the fundamental unit in a graphics workload, under the hood the pixel is just another thread. As a result the ability to preempt at a thread has very similar consequences for both a graphics workload and the compute threads mixed in with a graphics workload.
With Maxwell 2 and earlier architectures, the GPU would need to complete the whole draw call before preempting. However now with Pascal it can preempt at the pixel level within a triangle, within a draw call. When a preemption request is received, Pascal will stop rasterizing new pixels, let the currently rasterized pixels finish going through the CUDA cores, and finally initiate the context switch once the above is done. NVIDIA likes to call this "Pixel Level Preemption."
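For what it's worth, CUDA exposes a rough analogue of the time-critical scenario through stream priorities. This is not the DX12 mechanism, and it isn't pixel-level preemption, just a sketch of the same prioritization idea (kernels invented for illustration):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// A long-running background filler and a short latency-critical task.
__global__ void background(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) for (int k = 0; k < 100000; ++k) out[i] += 1.0f;
}

__global__ void critical(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] *= 0.5f;
}

int main() {
    const int n = 1 << 14;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Numerically lower values mean higher priority.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t lowPrio, highPrio;
    cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking, least);
    cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatest);

    // When both streams have work queued, the scheduler favors the
    // high-priority stream, much like a timewarp jumping the queue.
    background<<<(n + 255) / 256, 256, 0, lowPrio>>>(a, n);
    critical<<<(n + 255) / 256, 256, 0, highPrio>>>(b, n);
    cudaDeviceSynchronize();

    printf("priority range: %d (least) to %d (greatest)\n", least, greatest);
    return 0;
}
```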
The RX 480 should always outperform the GTX 1060 due to its higher FP32 performance, but it doesn't.
FTFY
You do realize that most games are not 100% shader/ALU limited, right? If every single game were 100% shader/ALU (FP32) limited, we would see linear scaling across the entire AMD and NV GPU line-ups. Clearly, that is not the case for Fury X vs. 290X/390X vs. RX 480, or for Titan XP vs. 1080 vs. 1070. Since we do not see linear scaling with FP32 performance when comparing various AMD to AMD GPUs, various NV to NV GPUs, and various AMD to NV GPUs, there are at least 4 possibilities:
(4) A Tflop does not always equal another Tflop due to architectural efficiency differences -- for example, when comparing GTX580 to GTX680 or R9 280X to R9 380X, we can clearly see that FP32 performance on paper often has nothing to do with real-world gaming performance, even when comparing NV to NV or AMD to AMD. As a result, it's even more flawed to compare real-world gaming performance of an AMD GPU to an NV GPU based on FP32 alone, while ignoring all the other facets that make up the GPU. Comparing GTX580 to 680, or Fury X to GTX1080, should already be enough to see that comparing GPUs from different generations based on FP32 specs is often a total waste of time.
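The GTX580-vs-GTX680 case is easy to put numbers on. A sketch using the public specs (580: 512 cores on a 1544MHz shader clock; 680: 1536 cores at a 1006MHz base clock):

```cpp
#include <cstdio>

static double paper_tflops(int shaders, double clock_ghz) {
    return 2.0 * shaders * clock_ghz / 1000.0;  // 2 FLOPs (one FMA) per clock
}

int main() {
    const double gtx580 = paper_tflops(512, 1.544);   // ~1.6 TFLOPs
    const double gtx680 = paper_tflops(1536, 1.006);  // ~3.1 TFLOPs
    printf("GTX 580: %.2f TFLOPs\nGTX 680: %.2f TFLOPs\n", gtx580, gtx680);
    printf("paper ratio: %.2fx\n", gtx680 / gtx580);  // ~1.95x on paper
    // Launch reviews put the real-world gaming gap at roughly 1.3x, i.e.
    // nowhere near what the FP32 specs alone would predict.
    return 0;
}
```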
"Yeah, but you were suggesting a comparison between 2 different architectures in the core units, not bandwidth, shader array, or FP32; they differ in their core compute units."
Of course there are differences, but you can do a very high-level general comparison when you look at different factors such as FP32 throughput, memory bandwidth, shader array count, etcetera.
"The video is exactly suggesting that Pascal is just Maxwell on 16nm FinFET at higher clock speeds; a better architecture would give better performance per cycle, like GCN4 does against GCN3 and the GCN1 2048SP GPUs."
I don't think you interpreted the video correctly. In fact, it supports my claim. In the end, he had to underclock the GTX 980 Ti for it to match the Tflop rating of the underclocked GTX 1080, because the 980 Ti has more shaders, more cache, more registers, plus more bandwidth due to its 384-bit memory bus.
"You are not even seeing that the small SM count difference between the 1080 and the 980 Ti is being leveled out by reducing the 1080's clock speed to match TFLOPs; if a Paxwell GPU with 3000 or so CUDA cores were overclocked, it would match and beat the GTX Titan X Pascal."
So in the end, those other aspects matter to performance. If they didn't, then a GTX 1080 would be able to beat a Titan X Pascal because of its higher clock speed, but obviously it doesn't. The Titan XP has 160GB/s more bandwidth and 40% more shaders than the GTX 1080, which allows it to perform significantly faster, even though the GTX 1080 has higher clock speeds.
"What I mean is that if Pascal (Paxwell) can use pre-emption to assign tasks and make a context switch with the use of fences, that probably means it doesn't do it in parallel, like the Anandtech review says."
Reread your first quote. That issue was in Maxwell. Pascal can now schedule work for the idling portions of the GPU. Preemption isn't really related to this. Maybe for high-priority tasks, but not for work you want to overlap (aka run in parallel).
The only reason Gears (and other games) gets a performance improvement from the option is that work is being overlapped. If it were not, you would not see a change in the amount of time taken.
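That's also the simplest way to check for overlap yourself in CUDA: time the same two kernels back to back in one stream, then split across two. If nothing overlapped, the two timings would match (kernel, sizes, and loop counts invented for illustration; build with nvcc):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = 0.0f;
        for (int k = 0; k < 20000; ++k) v = v * 1.0001f + 1.0f;
        out[i] = v;
    }
}

// Times two launches: in one stream (serial) or two streams (overlappable).
static float time_pair(float *a, float *b, int n, cudaStream_t s1, cudaStream_t s2) {
    const int blocks = (n + 255) / 256;
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventRecord(t0);  // recorded on the default stream
    work<<<blocks, 256, 0, s1>>>(a, n);
    work<<<blocks, 256, 0, s2>>>(b, n);
    cudaEventRecord(t1);  // legacy default stream waits on both blocking streams
    cudaEventSynchronize(t1);
    float ms;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return ms;
}

int main() {
    const int n = 1 << 14;  // small grids, so both kernels fit at once
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    printf("one stream:  %.2f ms\n", time_pair(a, b, n, s1, s1));
    printf("two streams: %.2f ms\n", time_pair(a, b, n, s1, s2));
    return 0;
}
```

On hardware that can run the two kernels concurrently, the two-stream time comes in well under the one-stream time; on hardware that truly could not overlap them, it wouldn't budge.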
"It says it can do dynamic scheduling, but nowhere does it say it can do it in parallel; instead, preemption of graphics and compute does it by switching context. Asynchronous compute+graphics done that way would take more time than doing it in parallel."
Holy hell dude. You're not even trying to understand. Pascal can dynamically schedule more work for IDLING PORTIONS OF THE GPU. Why the hell would they need to context switch if they're doing nothing?
Did you skip the page right before the one you keep quoting? Go read it.
http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9
Dynamic scheduling requires a greater management of hazards that simply weren’t an issue with static scheduling, as now you need to handle everything involved with suddenly switching an SM to a different queue
...
So what is preemption then? In a nutshell, it’s the ability to interrupt an active task (context switch) on a processor and replace it with another task, with the further ability to later resume where you left off.
...
out to fine-grained context switching that allows for an almost immediate switch at any point in time. What’s new for Pascal then is that preemptive context switching just got a lot finer grained, especially for compute.
...
But in the end, the result is that Pascal can now execute a preemptive context switch for graphics much more rapidly than Maxwell 2 could.
...
Meanwhile I'll quickly note that while the current Pascal drivers only implement thread/pixel level preemption for graphics and mixed workloads, ...