WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

Status
Not open for further replies.

Spanners

Senior member
Mar 16, 2014
325
1
0
I don't think you are understanding that these tests aren't even touching the .5GB partition on the 970. Games do access that extra .5GB of VRAM.

I'm not sure this is accurate. Why would Nvidia prioritize the 3.5GB section of memory if it had similar performance? Why create two sections at all, in fact? I think the 0.5GB section has 1/8th of the bandwidth, as the test shows.

Could be wrong though admittedly.
 
Feb 19, 2009
10,457
10
76
Nvidia's statement demonstrates there is a performance hit when using the 0.5GB section. They are just saying it's pretty small (few percent). Most likely source of the performance hit is that 0.5GB section having lower throughput than the other 3.5GBs.

https://forums.geforce.com/default/...tx-970-3-5gb-vram-issue/post/4432672/#4432672

That is FPS though, have to wait and see if there is an impact on frame delivery. Or perhaps this is the source of the dropped frames issue with 970 SLI?

If you compare the performance between 980 vs 970 at <3.5gb and >3.5gb, the 970 tapers off more than the 980. The delta is ~5%, not 1% as claimed.

Call of Duty: AW (<3.5gb)
980 = 82 fps
970 = 71 fps

Performance delta = 82/71 = 15% faster for 980

>3.5gb
980 = 48
970 = 40

Performance delta = 48/40 = 20% faster for 980

The 970 suffers more relative to the 980 as >3.5gb vram is used.
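
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It only uses the four average fps figures quoted in this post; the helper name `percent_faster` is made up for illustration.

```python
def percent_faster(fast_fps, slow_fps):
    """Return how much faster fast_fps is than slow_fps, in percent."""
    return (fast_fps / slow_fps - 1.0) * 100.0

# Average fps figures quoted above for Call of Duty: AW
below = percent_faster(82, 71)  # VRAM use under 3.5gb
above = percent_faster(48, 40)  # VRAM use over 3.5gb

print(f"<3.5gb: 980 is {below:.1f}% faster")
print(f">3.5gb: 980 is {above:.1f}% faster")
print(f"gap widens by {above - below:.1f} percentage points")
```

The widening comes out at about 4.5 percentage points, which is where the "~5%" figure comes from.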

As I posted, this NV provided data can be assumed to be their "best case scenario" where their drivers are optimized for the game/engine, by shifting non critical assets into the last 0.5gb for the 970.

We already know Frostbite will allocate vram and use more than "needed" by enhancing LOD for distant objects. It could well be that other game engines, particularly more advanced ones, do something similar. So the "vram used" may not equate to "vram needed". Combined with NV driver optimizations, the performance impact can be reduced.

So far, only Skyrim & Arma 3 have documented stutter issues; both are older games which NV may not have optimized Maxwell for.

Edit: In Skyrim, the texture mods are just pure ultra-res textures that either fit into vram or don't; I can't imagine there's much optimization that can "make it fit" better. Whereas the techniques used to test FC4 or these other games are MSAA 8x, Super Sampling or DSR, where Maxwell drivers could optimize performance by prioritizing what goes into the 3.5gb partition and what goes into the remaining 0.5gb. Either that or you can say Skyrim & Arma 3 are buggy with 970 but not 980. :/
 
Last edited:

Abwx

Lifer
Apr 2, 2011
11,167
3,862
136
Why would Nvidia prioritize the 3.5GB section of memory if it had similar performance? Why create two sections at all in fact?

Simplified MC for efficiency purposes: it can't feed a given SM with data picked randomly from the whole 4GB of RAM. Each SM has a limited addressing range through this MC. When data is picked from the 0.5GB partition it faces disabled SMs, hence it must be swapped into another part of the RAM, specifically the 3.5GB part. The MC width is 256-bit, but the actual datapath width seems to be less.
 
Last edited:

Enigmoid

Platinum Member
Sep 27, 2012
2,907
31
91
If you compare the performance between 980 vs 970 at <3.5gb and >3.5gb, the 970 tapers off more than the 980. The delta is ~5%, not 1% as claimed.

Call of Duty: AW (<3.5gb)
980 = 82 fps
970 = 71 fps

Performance delta = 82/71 = 15% faster for 980

>3.5gb
980 = 48
970 = 40

Performance delta = 48/40 = 20% faster for 980

The 970 suffers more relative to the 980 as >3.5gb vram is used.

Your sample size is too small to confirm such tiny differences. Taking 0.7 fps from the 970 at <3.5 GB (the error in the difference of two sample points, each with half the reporting precision, i.e. 0.5 fps, of error) and taking 0.7 fps from the 980 at >3.5 GB changes the percentages to

82/70.3 = 16.6%
47.3/40 = 18.2%
faster for the 980
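
The adjustment above can be reproduced with a short Python sketch. It assumes the 0.7 fps comes from adding two 0.5 fps reporting errors in quadrature, and that it is applied to the 970 below 3.5 GB and to the 980 above 3.5 GB, which is what the two ratios use. The names are made up for illustration.

```python
import math

REPORTING_ERROR = 0.5  # fps; assumed to be half of the last reported digit

# Error of a difference of two such readings, added in quadrature
diff_error = round(math.sqrt(2) * REPORTING_ERROR, 1)  # 0.7 fps

def percent_faster(fast_fps, slow_fps):
    """Return how much faster fast_fps is than slow_fps, in percent."""
    return (fast_fps / slow_fps - 1.0) * 100.0

# Shift each pair by the error in the direction that narrows the gap
low_vram = percent_faster(82, 71 - diff_error)   # <3.5 GB, 970 reads low
high_vram = percent_faster(48 - diff_error, 40)  # >3.5 GB, 980 reads low

print(f"<3.5 GB: {low_vram:.2f}% faster for the 980")
print(f">3.5 GB: {high_vram:.2f}% faster for the 980")
print(f"remaining gap: {high_vram - low_vram:.2f} percentage points")
```

With the error applied this way, the gap shrinks from about 4.5 points to under 2, which is the basis for calling the single COD result inconclusive.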

Not statistically significant until we get a larger sample. The three games on record give something similar. I mean look at the bf4 drop. 980 from 36 to 19 fps, the 970 from 30 to 15. That is easily reporting error. Same with shadow of mordor. COD is the only thing that shows deviations above standard error. Averaging them together would indicate a small loss for the 970 but nothing major.
 

Pneumothorax

Golden Member
Nov 4, 2002
1,182
23
81
Sitting here reading this thread is making me very upset. I recently upgraded from a single 780ti to a 970 sli configuration specifically for the extra gb of vram. The 970 would've never even been on my radar if I knew from the start it's really a 3.5gb card with an extra 512mb at 1/4 speed. I would be rocking a 290x cf config instead. Too bad I bought my cards from newegg which doesn't allow returns!
 

3ziz_3bqr

Junior Member
Jan 24, 2015
1
0
0
Hi all ,
first of all I'm sorry for my bad English

nvidia hasn't told us the technical reasons of the problem
but I have some thoughts about it ( I'm not sure of anything and it's just my view )

the problem isn't how many gigabytes the GTX 970 has .. the problem is how the GPCs control the memory controllers

how ?

http://www.bjorn3d.com/wp-content/uploads/2014/09/GeForce_GTX_980_Block_Diagram_FINAL-700x652.png

the full GM204 has 4 GPCs each of them has 4 SMMs
and the full chip has 4 memory controllers

so with 4GB of VRAM we can say there is 1024MB for each memory controller to control
so there's 1 GPC for each memory controller, which controls 1024MB
and you can say there is 1 SMM per 1/4GB



http://images.anandtech.com/doci/8931/970_Block_Diagram.png

but with the GTX 970 we only have 13 SMMs * 1/4GB = 3328MB !!

and 3328MB is the limit above which we get less bandwidth ..
here :

http://i.imgur.com/v7EVufN.png

this is the problem ... the configuration of SMMs and memory controllers gives us "low bandwidth" when using more than 3328MB

maybe I'm right and maybe I'm wrong... waiting nvidia official statement
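
The arithmetic in this post can be checked with a few lines of Python. This is only a sketch of the poster's hypothesis: the idea that each SMM "owns" an equal share of VRAM is an assumption made in the post, not something Nvidia has confirmed, and the variable names are made up.

```python
MB_PER_GB = 1024

total_vram_mb = 4 * MB_PER_GB  # 4GB card
full_smm_count = 4 * 4         # full GM204: 4 GPCs x 4 SMMs each

# Hypothesis from the post: VRAM is split evenly across the SMMs
mb_per_smm = total_vram_mb // full_smm_count  # 256MB, i.e. 1/4GB

gtx970_smm_count = 13  # SMMs enabled on the GTX 970
reachable_mb = gtx970_smm_count * mb_per_smm

print(f"{mb_per_smm}MB per SMM")
print(f"{reachable_mb}MB with {gtx970_smm_count} SMMs")
```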
 

jj109

Senior member
Dec 17, 2013
391
59
91
Hi all ,
but with the GTX 970 we only have 13 SMMs * 1/4GB = 3328MB !!

and 3328MB is the limit above which we get less bandwidth ..
here :

http://i.imgur.com/v7EVufN.png

this is the problem ... the configuration of SMMs and memory controllers gives us "low bandwidth" when using more than 3328MB

maybe I'm right and maybe I'm wrong... waiting nvidia official statement

Coincidence. Windows DWM is taking up 200-300 MB and the 500 MB isn't paging properly for Nai's benchmark. On multi-monitor systems, the users will see it begin to thrash at 3 GB instead of 3.3 GB.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Coincidence. Windows DWM is taking up 200-300 MB and the 500 MB isn't paging properly for Nai's benchmark. On multi-monitor systems, the users will see it begin to thrash at 3 GB instead of 3.3 GB.

I didn't know that last bit about multi-monitor. Interesting info.
 

NomanA

Member
May 15, 2014
128
31
101
How do you know this for a fact though? Do you have some inside info that cuda functions properly on the 970? That may very well be the sole problem, but I don't know. I'm wondering how you know.

You don't need inside info to know how CUDA memory management operates, at least to the level pertinent to this discussion.

cudaMalloc (which is what the tool is using) allocates memory on the device. The tool actually blindly keeps trying to allocate 128MB chunks of memory until the driver responds with an error. If you get 3800MBytes through cudaMalloc, then there are 3800MBytes in system VRAM used by the device. There are no two ways about it.
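
To make the allocation loop described above concrete, here is a rough Python sketch. It does not call CUDA: `FakeDevice` is a hypothetical stand-in for the driver, and its `malloc` plays the role of cudaMalloc failing once the device is full. The 3900 MB of free memory is an arbitrary example value.

```python
CHUNK_MB = 128  # chunk size the tool requests on each call

class FakeDevice:
    """Hypothetical stand-in for the GPU driver; not a real CUDA binding."""

    def __init__(self, free_mb):
        self.free_mb = free_mb

    def malloc(self, size_mb):
        # Mimics cudaMalloc returning an error when memory runs out
        if size_mb > self.free_mb:
            raise MemoryError("out of device memory")
        self.free_mb -= size_mb
        return object()  # opaque handle for the allocated chunk

def probe(device, chunk_mb=CHUNK_MB):
    """Allocate fixed-size chunks until the device refuses; return total MB."""
    chunks = []
    while True:
        try:
            chunks.append(device.malloc(chunk_mb))
        except MemoryError:
            break
    return len(chunks) * chunk_mb

allocated_mb = probe(FakeDevice(free_mb=3900))
print(f"allocated {allocated_mb} MB in {allocated_mb // CHUNK_MB} chunks")
```

The point is that the total is simply the number of successful calls times the chunk size, so whatever the loop reports was actually granted by the driver.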

Yes, WDDM's virtualization can mean a few of those chunks are shared by multiple apps (cuda tool, dwm, explorer etc) and data would need to be swapped back in before each of these applications accesses its context, but that's a completely different debate. It doesn't take away from the fact that the memory allocated by cudaMalloc is on the device and available to the application.

The only thing NVidia said was that applications needing less than 3.5GB will get all of the memory allocated from the 3.5GB region, and that if an application needs more RAM, it *will* get it.

Somehow this has been turned into a strong belief by a few on this forum that CUDA apps can only get 3.5GB RAM, and that only games can access the whole 4GB. This is utter nonsense.

The tool gets access to 4GB on the device, and it's uncontested access on a headless display. If it shows low bandwidths, then that's how the card is performing. Those low rates are still very high if you think the system was swapping in 128MB from somewhere by magic, and even if that was true, the L2 rates would not be reduced by the same amount (once the data is swapped in, the L2 access should not take the same proportional hit).
 

jj109

Senior member
Dec 17, 2013
391
59
91
The L2 cache section would be absolutely affected by a swap bug. What the benchmark calls "bandwidth" is not really bandwidth at all...

The benchmark simply measures the amount of time it needs to do 10 additions on 128 MB worth of floats. The only difference between the DRAM section and the L2 cache section is the addition of an inner loop in the kernel that basically makes each thread do 3 additions instead of 1. The "bandwidth" returned is higher because it's multiplying the calculation by 3.

So based on the algorithm used in the cache section, then cache "bandwidth" has an upper limit of 3 * DRAM "bandwidth".
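
A small Python sketch of that reasoning, based only on the description in this post and not on the benchmark's actual source code. The function name and the 1 ms kernel time are made up for illustration.

```python
CHUNK_BYTES = 128 * 1024**2  # one 128 MB chunk of floats

def reported_bandwidth_gbs(elapsed_s, additions_per_thread=1):
    """'Bandwidth' as described above: bytes in the chunk, scaled by the
    number of additions each thread performs, divided by elapsed time."""
    return CHUNK_BYTES * additions_per_thread / elapsed_s / 1e9

elapsed_s = 0.001  # hypothetical kernel time, taken as equal for both sections

dram = reported_bandwidth_gbs(elapsed_s)
cache = reported_bandwidth_gbs(elapsed_s, additions_per_thread=3)

print(f"DRAM section:  {dram:.1f} GB/s")
print(f"cache section: {cache:.1f} GB/s")
print(f"ratio: {cache / dram:.1f}")
```

If the cache kernel can never finish faster than the DRAM kernel on the same chunk, the reported ratio cannot exceed 3, and anything that slows the chunk down (such as a swap) drags both numbers down together.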
 

SPBHM

Diamond Member
Sep 12, 2012
5,058
410
126
A small average 3% FPS drop could indeed hide some nasty frame time variance. So far I have the impression that this problem with the 970 memory is not all that serious and the Nvidia explanation makes some sense, but more testing is needed indeed, with all those nice Nvidia FCAT setups.
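
To illustrate how a small average drop can hide large spikes, here is a Python sketch using invented frame times. The numbers are synthetic and chosen only to land near a 3% drop; they are not measurements from any card.

```python
def average_fps(frame_times_ms):
    """Average fps over a run: frames rendered divided by total seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

# Synthetic runs of 1000 frames each (made-up numbers)
smooth = [16.0] * 1000              # every frame takes 16 ms
spiky = [16.0] * 990 + [66.0] * 10  # 1% of frames stall for 66 ms

fps_smooth = average_fps(smooth)
fps_spiky = average_fps(spiky)
drop_percent = (1 - fps_spiky / fps_smooth) * 100

print(f"smooth: {fps_smooth:.1f} fps, worst frame {max(smooth):.0f} ms")
print(f"spiky:  {fps_spiky:.1f} fps, worst frame {max(spiky):.0f} ms")
print(f"average fps drop: {drop_percent:.1f}%")
```

The average only moves by about 3%, while the worst frames take roughly four times as long, which is the kind of thing FCAT or frame time logs would show and an average fps table would not.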
 

jj109

Senior member
Dec 17, 2013
391
59
91
http://www.neogaf.com/forum/showpost.php?p=148935641&postcount=654

Looks like users who measure frame time do indeed see much more spikes when vram goes above 3.5gb. It's real.

Hold your horses. The first 1080p bench wasn't even GPU bottlenecked and the correlation between frame time spikes and VRAM > 3.5 GB is very weak on the other two runs. The last run is just flat running out of VRAM.

Edit:

Can someone with a GTX 970 compare the results with Nai's bench when running in PCIe 16x and PCIe 8x mode?
 
Last edited:

amenx

Diamond Member
Dec 17, 2004
4,011
2,279
136
Mordor: 970 using full 4gb on 4k ultra preset. About in line with 290.

http://www.hardwarepal.com/shadow-mordor-benchmark/

Shame they didn't have a 3gb 780ti to throw in the mix, a card that's been overlooked in this discussion. When a user is claiming performance tanks at the 3.5gb vram mark on a 970, what happens to a 3gb 780ti? Its performance has generally been above the 970 in 4k benches and any other very demanding situation, yet there are no threads claiming it failed or tanked above 3gb in whatever game or task was handed to it.
 
Feb 19, 2009
10,457
10
76
Mordor: 970 using full 4gb on 4k ultra preset. About in line with 290.

http://www.hardwarepal.com/shadow-mordor-benchmark/

Shame they didn't have a 3gb 780ti to throw in the mix, a card that's been overlooked in this discussion. When a user is claiming performance tanks at the 3.5gb vram mark on a 970, what happens to a 3gb 780ti? Its performance has generally been above the 970 in 4k benches and any other very demanding situation, yet there are no threads claiming it failed or tanked above 3gb in whatever game or task was handed to it.

Interesting review, we see the 970 is set by drivers to try to remain <=3.5gb vram where possible:





Where it can't, by forcing ultra at 4K, none of the 4GB setups can handle the game and SLI/CF stops scaling.



"At 4K with the Ultra HD pack the 4GB cards don't seem to be enough. We see full utilization of our video memory and swapping occurs with system resources. We see a system memory usage of above 6GB as well as a pagefile of over 11GB bringing all our setups to their knees."

The results on Neogaf at 1080p with Ultra retain great average fps and not too bad min fps. What we see that's messed up is frame time spikes. FCAT will provide a clearer picture, i.e. does the 980 under the same settings at 1080p show those frame time spikes? It's NOT due to maxed GPU load because the user shows GPU load isn't maxed (running vsync). These cards have the power for 1080p gaming with ultra textures & the vram to handle it fine (supposedly).
 
Last edited:

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
http://www.neogaf.com/forum/showpost.php?p=148935641&postcount=654

Looks like users who measure frame time do indeed see much more spikes when vram goes above 3.5gb. It's real.

As to NV's own release of Average FPS, it's AVERAGE. If it's dropping 3-5% average over the bench, it could well be due to major MIN FPS drops that lead to the overall average decrease.

Now it's time for review sites to cover it with FCAT.

Strange that the company that pushed FCAT so much when AMD had frame pacing problems now does this awful backpedaling and reports average fps counts to show the issue is not a biggie.

I expect that when games start to REQUIRE 3.5+ GB VRAM we will see it have huge impact on gameplay, as it has on the benchmark.
 

iiiankiii

Senior member
Apr 4, 2008
759
47
91
Strange that the company that pushed FCAT so much when AMD had frame pacing problems now does this awful backpedaling and reports average fps counts to show the issue is not a biggie.

I expect that when games start to REQUIRE 3.5+ GB VRAM we will see it have huge impact on gameplay, as it has on the benchmark.

Yeah, some of the people here were pushing hard for FCAT testing back then. Now, when it comes to FCAT testing to see stutters with the 970, silence. You would think there should be more push for FCAT testing since it actually concerns the GPUs they're using! More FCAT testing needed. Vram was one of the main reasons I replaced my 780ti with a 970.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Already tested and it doesnt.

I see. Case closed. Nothing to see here. Go buy, buy, buy!



Are you trying to tell me that 780ti is not able to provide smooth experience in 1080p shadow of mordor?
The fact that the whole VRAM is populated doesn't mean you will experience stuttering. I don't want to go through forum history to find out whether it was you who posted about console devs shoving all assets into vram because they have 6.5gb of it in consoles, which makes pc ports VRAM bloated.

You can't have cake and eat cake.
Also, where is that curiosity about the issue we saw when AMD had frame pacing problems? Why is the pursuit of "better products" gone now?
 
Last edited:

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
From what I understand, Shadow of Mordor is loading up the memory pool with textures and such from the game world, even ones it doesn't need, rather than dynamically streaming them.

Your argument that ShintaiDK quoted was saying that because the benchmark shows some weird and very large bandwidth reduction, games will suffer. I will tell you again, as I have previously, that if the bandwidth suffered as much as supposed, someone would have noticed it by now for sure in games. If you go from 150GB/s to 22GB/s I guarantee that it would be such a huge drop off that everyone would be seeing it and feeling it. Loading up 8x MSAA and watching a game stutter doesn't prove anything either. Even a 980 can stutter in situations when trying to run 8x MSAA due to lack of performance.
 

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
Keysplayr why don't you ask NVidia to run FCAT on the popular games and demonstrate whether there is an issue? Isn't that why they developed it?
 

Haserath

Senior member
Sep 12, 2010
793
1
81
If AMD did this, the internet would explode with WTF AMD.

But it's Nvidia so all is forgiven. Brush it under the rug along with everything else!

Warning issued for trolling.
-- stahlhart
 
Last edited by a moderator:

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
If Nvidia did this, the internet would explode with WTF Nvidia!

...as is happening.

Meanwhile, Tweakguides has a bit on it and tests his own 970 on FC4 over the 3.5gb point.

http://www.tweakguides.com/

Yep and it's in line with what I have found as well.

I decided to test this out for myself. Running Far Cry 4 at settings just high enough to use around 4GB of VRAM (3,840x2,400 via DSR plus all other settings to maximum, except SMAA) resulted in expectedly lower framerates, but no significant stuttering or hitching as demonstrated in this YouTube Video. See this Screenshot from the video to confirm that 4GB is being used - zoom in on the top left, second entry on the second line of the Afterburner overlay. Unfortunately some people have been "testing" this issue by using system-crippling settings, such as 4K resolution combined with 8x MSAA, and blaming the inevitable 5FPS slideshow on the VRAM. Quite aside from the fact that no current single GPU performs well at those settings, remember that slow or insufficient VRAM manifests itself as severe hitching (longer pauses) and stuttering (frequent brief hiccups), not an overall reduction in FPS. I did notice in my testing however that the GTX 970 definitely prefers using only 3.5GB of its VRAM in most cases; in the VRAM-hungry Watch Dogs for example, as settings were raised the 970 remained stuck at ~3.5GB VRAM usage right up until 8x MSAA was engaged at 4K resolution. After several hours of testing though my conclusion is that there's no discernible practical impact from this issue: the GTX 970 performs smoothly, whether using 3.5 or 4GB of VRAM. If you're experiencing stuttering or low framerates on a 970, in my opinion it is quite likely due to a general system issue or excessive GPU load, not VRAM segmentation.
 