Thanks for this, I appreciate the work you put in. Thinking of switching out my 290CF for 1070s. Also moving from my 3770K to an 8 core. We shall see.
You're welcome - thanks for the feedback!
No it is not ...or at least it shouldn't be. So long as power usage is 'reasonable', and all the latest-gen GPUs meet that definition, it matters not one jot. No one is, or should be, basing their GPU purchase on whether a card uses 150W versus one that uses 160W.
Give me FPS minimums or averages or frame-times or performance per dollar/pound ...those are what count.
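For what it's worth, all of those are easy to pull out of a frame-time capture (PresentMon, FRAPS, etc.). A minimal sketch, with the frame times and card price invented purely for illustration:

```cpp
// Toy frame-time log in milliseconds (hypothetical numbers) plus a
// hypothetical card price, used to derive the metrics that matter.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> frame_ms = {16.2, 16.5, 15.9, 33.1, 16.0, 16.3,
                                    16.1, 41.7, 16.4, 15.8, 16.2, 16.0};
    const double price = 249.0;  // hypothetical, in dollars or pounds

    double total_ms = 0.0;
    for (double t : frame_ms) total_ms += t;
    const double avg_fps = 1000.0 * frame_ms.size() / total_ms;

    // "Minimum" as the FPS implied by the worst frame time; with a real
    // capture you would take the worst 1% rather than a single frame.
    const double worst_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    const double min_fps = 1000.0 / worst_ms;

    printf("average: %.1f fps, worst-frame minimum: %.1f fps\n", avg_fps, min_fps);
    printf("value: %.3f average fps per unit of currency\n", avg_fps / price);
    return 0;
}
```

The worst-frame numbers are the whole point: the two spikes in that toy log barely move the average but halve the minimum.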
It is not even remotely close to the most important metric for a consumer.
Did I say anything about consumer?
You posted your broad assertion on a consumer forum. Not an engineering forum. Context matters. And to a consumer, perf/watt is near the bottom of the list of important factors: absolute performance comes first, price second, price/perf ratio third.
This is all by design, IMHO. These are games that could've easily been coded to run on far lower hardware requirements. But I think it's a you-scratch-my-back arrangement between software and hardware companies to constantly up the ante on what's required to play.
Intel needs more CPUs sold. Send a memo out to Blizzard and EA (whoever) to make their next AAA titles crush our current CPU lineup. And cut 'em a check. Tin foil hat? You betcha.
Did I say anything about consumer? I was speaking from an engineering perspective, which I admit is probably beyond the scope of this debate.
You didn't state yours was from an engineering perspective, and you would help your cause if you left out the patronising tone.
What you did say was "Performance per watt is the most important metric by far..."
From a consumer's point of view ...it is not, for the reasons I stated above. What's more, given AMD's perceived lead in DX12/Vulkan, from an engineer's point of view the 480 appears to have the better architecture for the job the card is going to be asked to do now and going forward.
"They are using 2 different architectures, on 2 different nodes, with different clock speeds. You can't compare them unless you are comparing the card, not the architecture."
Look at the GTX 1060. It has a measly 1280 CCs, yet it manages to compete favorably against, and even outperform, the RX 480, which has 80% more shaders, whilst using less power.
"Nvidia is taking advantage of high frequency; if it had similar clocks to Maxwell the difference would be minimal or none."
We debate engineering stuff on this forum all the time. My point is that you can't divorce performance per watt from absolute performance, because the two are very much related, as shown by the GTX 1060 vs RX 480 example. NVidia squeezes a LOT more performance out of their CUDA cores than AMD does out of their SPs, which gives them a tremendous advantage in performance and flexibility when it comes to designing their GPUs.
"GCN can do asynchronous compute in parallel; Pascal can't."
AMD has no real DX12/Vulkan advantage, vs Pascal at any rate.
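To put rough numbers on that: paper FP32 throughput is 2 FLOPs (one FMA) per shader per clock. A quick sketch using the public launch specs (nominal boost clocks and board power, so the outputs are ballpark only):

```cpp
#include <cstdio>

// Paper FP32 throughput in TFLOPs: 2 FLOPs (one FMA) per shader per clock.
static double paper_tflops(int shaders, double clock_ghz) {
    return 2.0 * shaders * clock_ghz / 1000.0;
}

int main() {
    // GTX 1060: 1280 CUDA cores, ~1.71 GHz boost, 120 W board power.
    // RX 480:   2304 SPs,        ~1.27 GHz boost, 150 W board power.
    const double gtx1060 = paper_tflops(1280, 1.708);
    const double rx480   = paper_tflops(2304, 1.266);

    printf("GTX 1060: %.1f TFLOPs, %.1f GFLOPs/W\n", gtx1060, 1000.0 * gtx1060 / 120.0);
    printf("RX 480:   %.1f TFLOPs, %.1f GFLOPs/W\n", rx480, 1000.0 * rx480 / 150.0);
    printf("RX 480 paper advantage: %.0f%%\n", 100.0 * (rx480 / gtx1060 - 1.0));
    return 0;
}
```

On paper the RX 480 carries roughly a third more FP32; that the two cards trade blows in games while the 1060 draws less power is exactly the per-core efficiency gap described above.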
"Still, the pipeline on each side of the GPU differs, as does each part of the GPU; for example, AMD can't do tile-based rasterization."
I beg to differ. They may not be directly comparable of course, but they can still be compared, as both of them are GPUs and have the same purpose. While their methods might differ, the end result is the same: artfully rendered pixels on your screen.
"For Paxwell, the frequency is all."
Frequency is just one aspect of GPU performance at any rate, and not even the most important one.
"Increasing clock speed can also increase texel and pixel fill rate, so overall performance can increase too."
If you haven't heard, 3D rendering is embarrassingly parallel, so higher frequencies are nowhere near as important as shader array performance and bandwidth.
"Well, they improved the clock speed on GCN from 1050MHz to 1260-1340MHz with slight tweaks to the architecture, and using 14nm FinFET allowed better perf/watt."
Also, Pascal's higher frequencies are a direct result of its architecture and node process. AMD could not attain similar frequencies without radically changing their architecture.
"TimeSpy isn't using parallel asynchronous compute+graphics."
Really? That explains why I get a 4-5 FPS increase with asynchronous compute turned on in Gears of War 4, and a boost in Time Spy DX12.
"It can, but not in parallel."
With the overwhelming evidence out now, I can't believe there are still naysayers who think Pascal can't do concurrent asynchronous compute.
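DX12-style graphics+compute overlap can't be reproduced from CUDA, but compute/compute concurrency on the same GPU is easy to demonstrate with streams. A sketch (kernel, sizes, and loop counts invented for illustration; build with nvcc):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for an independent workload: a long arithmetic loop per element.
__global__ void busy(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = 0.0f;
        for (int k = 0; k < 20000; ++k) v = v * 1.0001f + 0.5f;
        out[i] = v;
    }
}

int main() {
    const int n = 1 << 14;  // small grids, so two kernels can share the GPU
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Two independent launches in different streams: nothing orders them,
    // so the hardware is free to execute them concurrently.
    busy<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    busy<<<(n + 255) / 256, 256, 0, s2>>>(b, n);
    cudaDeviceSynchronize();

    printf("both kernels done\n");
    return 0;
}
```

Run it under a profiler (nvprof/nvvp) and the two kernels show up side by side on a concurrency-capable part.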
Can you post some worthwhile evidence to back that up? Last I remember, it was accepted among the more technically knowledgeable boards that Pascal is able to schedule compute shaders to work in parallel with graphics shaders (Maxwell can too, but without dynamic load balancing). Can it do both in one SM or GPC? I don't know. Can it do both in one GPU? Absolutely.
For overlapping workloads, Pascal introduces support for "dynamic load balancing." In Maxwell generation GPUs, overlapping workloads were implemented with static partitioning of the GPU into a subset that runs graphics, and a subset that runs compute. This is efficient provided that the balance of work between the two loads roughly matches the partitioning ratio. However, if the compute workload takes longer than the graphics workload, and both need to complete before new work can be done, then the portion of the GPU configured to run graphics will go idle. This can cause reduced performance that may exceed any performance benefit that would have been provided from running the workloads overlapped. Hardware dynamic load balancing addresses this issue by allowing either workload to fill the rest of the machine if idle resources are available.

Time critical workloads are the second important asynchronous compute scenario. For example, an asynchronous timewarp operation must complete before scanout starts or a frame will be dropped. In this scenario, the GPU needs to support very fast and low latency preemption to move the less critical workload off of the GPU so that the more critical workload can run as soon as possible.
From the GTX 1080 whitepaper, that says to me that while the compute queues are being processed, the graphics portion is idling. It can do load balancing, but it seems it has to do a context switch with preemption, using fences; that isn't really parallel.
This in turn is where Pascal steps in. Along with the aforementioned improvements to how Pascal can fill up its execution pipelines, Pascal also implements a radically improved preemption ability. Depending on whether it's a graphics or a pure compute task, Pascal can now preempt at the thread level or even the instruction level respectively.
Starting with the case of a graphics task or a mixed graphics + compute task, Pascal can now interrupt at the thread level. For a compute workload this is fairly self-explanatory. Meanwhile for a graphics workload the idea is very similar. Though we’re accustomed to working with pixels as the fundamental unit in a graphics workload, under the hood the pixel is just another thread. As a result the ability to preempt at a thread has very similar consequences for both a graphics workload and the compute threads mixed in with a graphics workload.
With Maxwell 2 and earlier architectures, the GPU would need to complete the whole draw call before preempting. However now with Pascal it can preempt at the pixel level within a triangle, within a draw call. When a preemption request is received, Pascal will stop rasterizing new pixels, let the currently rasterized pixels finish going through the CUDA cores, and finally initiate the context switch once the above is done. NVIDIA likes to call this "Pixel Level Preemption."
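For what it's worth, CUDA exposes a rough analogue of the time-critical scenario through stream priorities. This is not the DX12 mechanism, and it isn't pixel-level preemption, just a sketch of the same prioritization idea (kernels invented for illustration):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// A long-running background filler and a short latency-critical task.
__global__ void background(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) for (int k = 0; k < 100000; ++k) out[i] += 1.0f;
}

__global__ void critical(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] *= 0.5f;
}

int main() {
    const int n = 1 << 14;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Numerically lower values mean higher priority.
    int least, greatest;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t lowPrio, highPrio;
    cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking, least);
    cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatest);

    // When both streams have work queued, the scheduler favors the
    // high-priority stream, much like a timewarp jumping the queue.
    background<<<(n + 255) / 256, 256, 0, lowPrio>>>(a, n);
    critical<<<(n + 255) / 256, 256, 0, highPrio>>>(b, n);
    cudaDeviceSynchronize();

    printf("priority range: %d (least) to %d (greatest)\n", least, greatest);
    return 0;
}
```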
The RX 480 should always outperform the GTX 1060 due to its higher FP32 performance, but it doesn't.
FTFY
You do realize that most games are not 100% shader/ALU limited, right? If every single game were 100% shader/ALU (FP32) limited, we would see linear scaling across the entire AMD and NV GPU line-ups. Clearly, that is not the case for Fury X vs. 290X/390X vs. RX 480, or for Titan XP vs. 1080 vs. 1070. Since we do not see linear scaling with FP32 performance when comparing various AMD to AMD GPUs, various NV to NV GPUs, and various AMD to NV GPUs, there are at least 4 possibilities:
(4) A Tflop does not always equal another Tflop due to architectural efficiency differences -- for example, when comparing GTX580 to GTX680 or R9 280X to R9 380X, we can clearly see that FP32 performance on paper often has nothing to do with real-world gaming performance, even when comparing NV to NV or AMD to AMD. As a result, it's even more flawed to compare real-world gaming performance of an AMD GPU to an NV GPU based on FP32 alone, while ignoring all the other facets that make up the GPU. Comparing GTX580 to 680, or Fury X to GTX1080, should already be enough to see that comparing GPUs from different generations based on FP32 specs is often a total waste of time.
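The GTX580-vs-GTX680 case is easy to put numbers on. A sketch using the public specs (580: 512 cores on a 1544MHz shader clock; 680: 1536 cores at a 1006MHz base clock):

```cpp
#include <cstdio>

static double paper_tflops(int shaders, double clock_ghz) {
    return 2.0 * shaders * clock_ghz / 1000.0;  // 2 FLOPs (one FMA) per clock
}

int main() {
    const double gtx580 = paper_tflops(512, 1.544);   // ~1.6 TFLOPs
    const double gtx680 = paper_tflops(1536, 1.006);  // ~3.1 TFLOPs
    printf("GTX 580: %.2f TFLOPs\nGTX 680: %.2f TFLOPs\n", gtx580, gtx680);
    printf("paper ratio: %.2fx\n", gtx680 / gtx580);  // ~1.95x on paper
    // Launch reviews put the real-world gaming gap at roughly 1.3x, i.e.
    // nowhere near what the FP32 specs alone would predict.
    return 0;
}
```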
"Yeah, but you were suggesting a comparison between 2 different architectures in the core units, not bandwidth, shader array, or FP32; they differ in their core compute units."
Of course there are differences, but you can do a very high-level general comparison when you look at different factors such as FP32 throughput, memory bandwidth, shader array count, etcetera.
"The video is exactly suggesting that Pascal is just Maxwell on 16nm FinFET at higher clock speeds; a better architecture would give better performance per cycle, like GCN4 does against GCN3 and the GCN1 2048SP GPUs."
I don't think you interpreted the video correctly. In fact, it supports my claim. In the end, he had to underclock the GTX 980 Ti for it to match the Tflop rating of the underclocked GTX 1080, because the 980 Ti has more shaders, more cache, more registers, plus more bandwidth due to its 384-bit memory bus.
"You are not even seeing that the small SM count difference between the 1080 and the 980 Ti is being leveled out by reducing the 1080's clock speed to match TFLOPs; if a Paxwell GPU with 3000 or so CUDA cores were overclocked, it would match and beat the GTX Titan X Pascal."
So in the end, those other aspects matter to performance. If they didn't, then a GTX 1080 would be able to beat a Titan X Pascal because of its higher clock speed, but obviously it doesn't. The Titan XP has 160GB/s more bandwidth and 40% more shaders than the GTX 1080, which allows it to perform significantly faster, even though the GTX 1080 has higher clock speeds.
"What I mean is that if Pascal (Paxwell) can use pre-emption to assign tasks and make a context switch with the use of fences, that probably means it doesn't do it in parallel, like the Anandtech review says."
Reread your first quote. That issue was in Maxwell. Pascal can now schedule work for the idling portions of the GPU. Preemption isn't really related to this. Maybe for high-priority tasks, but not for work you want to overlap (aka run in parallel).
The only reason Gears (and other games) gets a performance improvement from the option is that work is being overlapped. If it were not, you would not see a change in the amount of time taken.
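That's also the simplest way to check for overlap yourself in CUDA: time the same two kernels back to back in one stream, then split across two. If nothing overlapped, the two timings would match (kernel, sizes, and loop counts invented for illustration; build with nvcc):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = 0.0f;
        for (int k = 0; k < 20000; ++k) v = v * 1.0001f + 1.0f;
        out[i] = v;
    }
}

// Times two launches: in one stream (serial) or two streams (overlappable).
static float time_pair(float *a, float *b, int n, cudaStream_t s1, cudaStream_t s2) {
    const int blocks = (n + 255) / 256;
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventRecord(t0);  // recorded on the default stream
    work<<<blocks, 256, 0, s1>>>(a, n);
    work<<<blocks, 256, 0, s2>>>(b, n);
    cudaEventRecord(t1);  // legacy default stream waits on both blocking streams
    cudaEventSynchronize(t1);
    float ms;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    return ms;
}

int main() {
    const int n = 1 << 14;  // small grids, so both kernels fit at once
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    printf("one stream:  %.2f ms\n", time_pair(a, b, n, s1, s1));
    printf("two streams: %.2f ms\n", time_pair(a, b, n, s1, s2));
    return 0;
}
```

On hardware that can run the two kernels concurrently, the two-stream time comes in well under the one-stream time; on hardware that truly could not overlap them, it wouldn't budge.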
"It says it can do dynamic scheduling, but nowhere does it say it can do it in parallel; instead, preemption of graphics and compute does it by switching context. Asynchronous compute+graphics done that way would take more time than doing it in parallel."
Holy hell dude. You're not even trying to understand. Pascal can dynamically schedule more work for IDLING PORTIONS OF THE GPU. Why the hell would they need to context switch if they're doing nothing?
Did you skip the page right before the one you keep quoting? Go read it.
http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9
Dynamic scheduling requires a greater management of hazards that simply weren’t an issue with static scheduling, as now you need to handle everything involved with suddenly switching an SM to a different queue
...
So what is preemption then? In a nutshell, it’s the ability to interrupt an active task (context switch) on a processor and replace it with another task, with the further ability to later resume where you left off.
...
out to fine-grained context switching that allows for an almost immediate switch at any point in time. What’s new for Pascal then is that preemptive context switching just got a lot finer grained, especially for compute.
...
But in the end, the result is that Pascal can now execute a preemptive context switch for graphics much more rapidly than Maxwell 2 could.
...
Meanwhile I'll quickly note that while the current Pascal drivers only implement thread/pixel level preemption for graphics and mixed workloads, ...