How do you know this for a fact, though? Do you have some inside info that CUDA functions properly on the 970? That may very well be the sole problem, but I don't know. I'm wondering how you know.
You don't need inside info to know how CUDA memory management operates, at least to the level pertinent to this discussion.
cudaMalloc (which is what the tool is using) allocates memory on the device. The tool actually just blindly keeps trying to allocate 128 MB chunks until the driver responds with an error. If you get 3800 MB through cudaMalloc, then there are 3800 MB of the card's VRAM in use by the tool. There are no two ways about it.
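To make the pattern concrete, here is a minimal sketch of that greedy-allocation loop. The 128 MB chunk size comes from the discussion above, but the loop structure is my reconstruction, not the tool's actual source:

```cuda
// Sketch of the greedy-allocation pattern: keep asking the driver for
// 128 MB blocks via cudaMalloc until it returns an error.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t chunkBytes = 128u * 1024u * 1024u;  // 128 MB per cudaMalloc
    std::vector<void*> chunks;
    void* p = nullptr;

    // Each successful cudaMalloc is real device memory handed to this app.
    while (cudaMalloc(&p, chunkBytes) == cudaSuccess)
        chunks.push_back(p);
    cudaGetLastError();  // clear the expected out-of-memory error

    printf("Allocated %zu MB of device memory in %zu chunks\n",
           (chunks.size() * chunkBytes) >> 20, chunks.size());

    for (void* c : chunks) cudaFree(c);
    return 0;
}
```

On a 4 GB card with nothing else resident, a loop like this is what gets you to roughly 3800 MB before the driver refuses the next chunk.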
Yes, WDDM's virtualization can mean a few of those chunks are shared by multiple apps (the CUDA tool, the DWM, Explorer, etc.) and data would need to be swapped back in before each of these applications accesses its context, but that's a completely different debate. It doesn't take away from the fact that the memory allocated by cudaMalloc is on the device, available to the application.
The only thing NVidia said was that applications needing less than 3.5GB will get all of the memory allocated from the 3.5GB region, and that if an application needs more RAM, it *will* get it.
Somehow this has been turned into a strong belief by a few on this forum that CUDA apps can only get 3.5GB of RAM, and that only games can access the whole 4GB. This is utter nonsense.
The tool gets access to 4GB on the device, and it's uncontested access on a card with no display attached. If it shows low bandwidths, then that's how the card is performing. Those low rates would actually be suspiciously high if the system were really swapping 128MB in from somewhere by magic, and even if that were true, the L2 rates would not be reduced by the same proportion (once the data has been swapped in, L2 access should not take the same proportional hit).
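For what it's worth, a per-chunk bandwidth number can be produced with nothing more exotic than a timed device-to-device copy. This is just a sketch of the general measurement idea, not the actual tool's methodology:

```cuda
// Hypothetical per-chunk bandwidth check: time a device-to-device copy
// within one 128 MB allocation using CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 128u * 1024u * 1024u;  // one 128 MB chunk
    char *src = nullptr, *dst = nullptr;
    if (cudaMalloc(&src, bytes) != cudaSuccess ||
        cudaMalloc(&dst, bytes) != cudaSuccess) return 1;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    // The copy reads and writes each byte once: 2 * bytes moved in ms milliseconds.
    printf("Effective bandwidth: %.1f GB/s\n", 2.0 * bytes / (ms * 1.0e6));

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

A chunk that landed in the slow segment would show a markedly lower figure than one in the fast 3.5GB region, which is exactly the kind of drop the tool reports.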