Question: Diablo 4 causing GPUs to die


Ranulf

Platinum Member
Jul 18, 2001
2,405
1,303
136
Just a heads up for anyone trying the Diablo 4 beta this weekend. The game is apparently bricking 3080 Ti cards, Gigabyte ones in particular. There are reports of it hitting other cards too, though, including AMD.


While Diablo IV's lenient PC spec requirements indicate a well-optimized game, some users share troubling reports of their expensive graphics cards failing during gameplay. There have been multiple reports of NVIDIA RTX 3080 Ti GPUs failing while playing the Diablo IV early access beta, with symptoms like GPU fan speed skyrocketing to 100% following an outright hardware shutdown.

Blizz forum post on it:


Jayz2c video:

 

KompuKare

Golden Member
Jul 28, 2009
1,069
1,101
136
Nonsense. It's absolutely not a game developer's problem if a GPU dies under a certain workload. The $billions invested by nVidia in R&D and testing are supposed to include proper safeguards for their products.


I agree. If we're relying on developers to put framerate caps in their menus (a trivial bit of code; see the sketch below), then the product is a catastrophic design failure, and nVidia should be paying compensation to all affected parties.
This is probably a minor thing, so they could easily afford it.
However, it's telling that after they shipped millions of failing parts with a design fault (the solder defect during the transition from leaded to lead-free solder, aka "bumpgate"), nVidia basically wriggled out of most of the liability. As I recall there was a class action in the US and nVidia set aside $250 million or so, but the rest of the world got nothing, and even the "winners" of the class action got very little.
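
For reference, the kind of cap being discussed is a trivial bit of code. A minimal hypothetical sketch in C++ (not Blizzard's or any game's actual implementation):

```cpp
#include <chrono>
#include <thread>

// Minimal fixed frame cap: pace the loop to a 60 FPS budget and sleep
// away whatever time is left, instead of re-rendering as fast as the
// GPU will go. Hypothetical sketch, not any game's actual code.
int main() {
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::microseconds(1'000'000 / 60);

    auto next_frame = clock::now();
    for (int frame = 0; frame < 600; ++frame) { // ~10 seconds at 60 FPS
        // ... simulate and render the frame here ...

        next_frame += frame_budget;
        std::this_thread::sleep_until(next_frame); // GPU idles until the next tick
    }
}
```

A ten-line courtesy like that is nice to have, but it cannot be the thing standing between a GPU and its own destruction.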
 

gdansk

Platinum Member
Feb 8, 2011
2,478
3,373
136
I've said it before and I'll say it again: I wonder how many corners got cut on graphics cards from the 2021 production runs.

2020 cards should be fine, since they were made with quality parts fabbed in 2019 and early 2020, but stuff built in 2021 and even early 2022 was likely squeezed out with substandard components subbed in to ensure orders were fulfilled.

Now that supply chain issues are largely resolved, I wonder if manufacturers will keep using the substandard parts and pocket the savings (I mean, hey, no one in that supply chain benefits from using 20-year parts rather than 7-10-year parts).
And at the end of the day, it's something Nvidia should have prevented if it was poor component choice by board partners. That's an area where Nvidia should be tyrannical: specify that the critical current-protection and other important components are not subpar. It seems like basic brand management.

Any way you slice it, the people responsible here are Nvidia and perhaps their board partners - not Blizzard. Developers have a totally reasonable expectation that non-malicious code shouldn't be able to actually kill a GPU (if that even happened; it seems like it may have only been shutdowns).
 
Reactions: VirtualLarry

GodisanAtheist

Diamond Member
Nov 16, 2006
7,039
7,461
136
And at the end of the day, it's something Nvidia should have prevented if it was poor component choice by board partners. That's an area where Nvidia should be tyrannical: specify that the critical current-protection and other important components are not subpar. It seems like basic brand management.

Any way you slice it, the people responsible here are Nvidia and perhaps their board partners - not Blizzard. Developers have a totally reasonable expectation that non-malicious code shouldn't be able to actually kill a GPU (if that even happened; it seems like it may have only been shutdowns).

- When $650 cards were selling for $1500, no one was gonna be shy about getting these damn things out the door. AIBs were basically air-dropping pallets of cards to miners, and there was no way some component shortages were going to stop anyone from making money hand over fist. The cards really only had to live for as long as their now-piddly warranties lasted.

NV and AMD, if not complicit, were likely keen to turn a blind eye.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
It's absolutely not the developer's problem to load-balance hardware in order to prevent physical failure.


Sure, which further highlights their design flaws. That's nVidia's problem, not the developers'.


But this is nothing more than your opinion. Who decides which game is "allowed" a given FPS? Why shouldn't a 500Hz monitor owner be allowed to play Diablo 4 at 500 FPS, for example?

I personally cap my 2070 @ 60 FPS for many reasons, but I have the absolute right to run it uncapped full-bore 24/7 and expect it to last the duration of the warranty.


Which nVidia reviewer's guide says to cap the framerate to ensure correct operation? And what if a reviewer wants to test Diablo 4?

Also where does it say on nVidia's GPU boxes "warning, product not designed to run at uncapped framerate, doing so may damage the hardware and void the warranty, do so at your own risk"?

Please note, I am not defending nVidia.

But having worked in the development of both software and hardware: the hardware manufacturer cannot protect a card from every single possible piece of software out there. They can test for a whole lot of different situations, but they can't test software that didn't exist at the time of development.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,442
10,113
126
Please note, I am not defending nVidia.

But having worked in the development of both software and hardware: the hardware manufacturer cannot protect a card from every single possible piece of software out there. They can test for a whole lot of different situations, but they can't test software that didn't exist at the time of development.
Funny how this has never been much of an issue with the "CPU guys".

Yes folks, proper engineering is hard.
 

Mopetar

Diamond Member
Jan 31, 2011
8,000
6,433
136
This reminds me of earlier years, when people would use FurMark as a torture test or to make sure their overclock was actually stable. Even review sites would use it to help test max power draw, since it would usually max out the GPU more than any actual game or software could.

I think both AMD (then ATI) and NVidia hated it to a large extent, and I remember them basically calling it a power virus, but they still had to keep their cards from blowing up while running it. This shouldn't be a difficult problem, and there's no reason my 100-billion-transistor GPU can't devote a few hundred thousand of them to preventing its own destruction.
 

Ranulf

Platinum Member
Jul 18, 2001
2,405
1,303
136
If I remember right, there were a few people on the D4 forums with AMD cards reporting the problem, at least in the form of a black screen and a game shutdown/error.
 

amenx

Diamond Member
Dec 17, 2004
4,005
2,274
136
Tom's Hardware reported on it with a list of the cards involved.



Gigabyte does not inspire much faith in their products. Last year it was their PSUs blowing up, which turned out to be down to cheaping out on internal components (according to GN).

A similar case happened with 3090s "blowing up" in New World. That was with EVGA cards, which turned out to have weak solder joints near the MOSFET circuits.
 

KompuKare

Golden Member
Jul 28, 2009
1,069
1,101
136
The other takeaway from that Tom's Hardware list is that once again the 3080 Ti comes out ahead: not only was it pre-scalped* for the convenience of the consumer, it also acts as a canary in the coal mine and sacrifices itself if a "bad" game is detected. What a noble card!

* Compared to the original 3080. Other post-crypto-boom cards were also pre-scalped, like the 6700 XT vs. the 6800, etc.
 

amenx

Diamond Member
Dec 17, 2004
4,005
2,274
136
I don't think these numbers are statistically significant enough to push Nvidia to make any wide-reaching changes in tightening AIB QC, but the least they should do is ensure ANY such incidents involve full compensation for card owners. Perhaps pressure AIBs to offer more generous warranties (5 years minimum).
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
The only software that directly interacts with the hardware is the driver.

I have worked on Windows drivers before, so I am aware of how user-space software interacts with the hardware. And it's worth noting that with DX12 and Vulkan, games are much closer to the metal than they were with older APIs. The issue with New World was the game itself being poorly coded, which resulted in load characteristics that caused massive spikes in current. We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.

But hey, thanks for thumbing down my post for no real reason.
 
Reactions: Cableman and Ranulf

Tup3x

Golden Member
Dec 31, 2016
1,008
996
136
Tom's Hardware reported on it with a list of the cards involved.


Gigabyte does not inspire much faith in their products. Last year it was their PSUs blowing up, which turned out to be down to cheaping out on internal components (according to GN).

A similar case happened with 3090s "blowing up" in New World. That was with EVGA cards, which turned out to have weak solder joints near the MOSFET circuits.
I'm not surprised... I had a Gigabyte GTX 780 GHz Edition which squealed like a pig that was about to die. On top of that, it was a horrible card that ran hot and loud. I had it for a month or two and switched to an MSI R9 290 (which I had for about three months before switching back to a GTX 780, an ASUS this time, and it was just so much better than that Gigabyte trash).
 

KompuKare

Golden Member
Jul 28, 2009
1,069
1,101
136
We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.
Yet the point made earlier about CPUs being able to handle whatever you throw at them is valid.

If GPU silicon, firmware, and drivers are not able to ensure that the chip and card are never able to be run in such a way that they might damage themselves, then the GPU vendors are doing something seriously wrong.
 

jpiniero

Lifer
Oct 1, 2010
14,823
5,440
136
I have worked on Windows drivers before, so I am aware of how user-space software interacts with the hardware. And it's worth noting that with DX12 and Vulkan, games are much closer to the metal than they were with older APIs. The issue with New World was the game itself being poorly coded, which resulted in load characteristics that caused massive spikes in current. We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.

A game running at a thousand FPS shouldn't be able to cause hardware failures. This is without a doubt 100% a Gigabyte issue, although perhaps you could blame NV for not enforcing quality control.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Yet the point made earlier about CPUs being able to handle whatever you throw at them is valid.

If GPU silicon, firmware, and drivers are not able to ensure that the chip and card are never able to be run in such a way that they might damage themselves, then the GPU vendors are doing something seriously wrong.

But there are key differences.

1: A CPU does not have any power circuits on it. The motherboard handles all the power delivery. If this issue happened with a CPU, the CPU would not be impacted at all. It would be the motherboard that failed. Which HAS happened. But the blame is always on the motherboard maker, not the CPU manufacturer.

2: At most, a consumer level CPU has 16 cores. A GPU has thousands, so load transients can be significantly larger.

3: CPUs use less power than the GPUs being impacted by these issues. So if a load profile was created that could cause large transient spikes on a CPU, those spikes would be much smaller.

4: The types of loads a CPU sees are drastically different from what a GPU sees. CPUs being general purpose means they are constantly context switching. Video encoding or the like would be similar, but those loads are very constant. Little risk of transients.

And for the second time, I am not defending the GPU makers.

And for those saying there is no reason the GPU makers could not prevent this sort of thing: they are limited in what they can do. Yes, the board maker could add some sort of hardware overcurrent protection, with circuit breakers that shut off all power to the card if it hit the designated limit. However, these circuits rarely react fast enough to handle transients. So then it comes down to only triggering on sustained load, which would result in system crashes whenever that limit was hit. That would inevitably make end users angry, and then look bad for the board manufacturer.

Most cards already have software power limits, but these will also not catch transients. And if the software detects high power usage, all it can really do is ramp down clocks in an attempt to lower power consumption. These systems have to do a lot of averaging, though, and they are slow to react.
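
To put rough numbers on that averaging point, here is a toy C++ simulation (invented figures, not any vendor's actual algorithm) of a limiter built on an exponential moving average. Even a huge spike that lands exactly on a sample barely moves the average:

```cpp
#include <cstdio>

// Toy model of a software power limiter: board power is sampled every
// 10 ms and smoothed with an exponential moving average (EMA); clocks
// ramp down only when the *average* crosses the limit. All numbers are
// invented for illustration.
int main() {
    const double limit_w = 450.0; // board power limit
    const double alpha   = 0.1;   // EMA weight per 10 ms sample
    double ema_w         = 300.0; // smoothed power estimate

    for (int sample = 0; sample < 20; ++sample) {
        double instant_w = 350.0;            // steady gaming load
        if (sample == 10) instant_w = 900.0; // one-sample transient spike

        ema_w = alpha * instant_w + (1.0 - alpha) * ema_w;

        std::printf("sample %2d: instant %4.0f W, EMA %3.0f W -> %s\n",
                    sample, instant_w, ema_w,
                    ema_w > limit_w ? "ramp clocks down" : "no action");
    }
    // The 900 W spike only nudges the EMA to roughly 390 W, so the limiter
    // never reacts; a spike that falls *between* samples is never seen at
    // all. Catching those is a job for the VRM hardware, not software.
    return 0;
}
```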

All of these issues can likely be traced back to the fact that newer high-end GPUs draw crazy amounts of power. With these very high loads, there is far less room for error in the power delivery circuit. We saw this with the 12-pin power connectors and the EVGA 3090s: a tiny build-up of tolerances resulted in catastrophic failure of the units. High-end GPUs used to draw only 200-300W. Now we have cards that draw 450W.
 

coercitiv

Diamond Member
Jan 24, 2014
6,369
12,746
136
In other words folks, don't buy used video cards from gamers. What if they disabled vsync and the card already has one foot in the grave? /s
 
Reactions: Leeea

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,224
1,649
136
But there are key differences.

1: A CPU does not have any power circuits on it. The motherboard handles all the power delivery. If this issue happened with a CPU, the CPU would not be impacted at all. It would be the motherboard that failed. Which HAS happened. But the blame is always on the motherboard maker, not the CPU manufacturer.

2: At most, a consumer level CPU has 16 cores. A GPU has thousands, so load transients can be significantly larger.

3: CPUs use less power than the GPUs being impacted by these issues. So if a load profile was created that could cause large transient spikes on a CPU, those spikes would be much smaller.

4: The types of loads a CPU sees are drastically different from what a GPU sees. CPUs being general purpose means they are constantly context switching. Video encoding or the like would be similar, but those loads are very constant. Little risk of transients.

And for the second time, I am not defending the GPU makers.

And for those saying there is no reason the GPU makers could not prevent this sort of thing: they are limited in what they can do. Yes, the board maker could add some sort of hardware overcurrent protection, with circuit breakers that shut off all power to the card if it hit the designated limit. However, these circuits rarely react fast enough to handle transients. So then it comes down to only triggering on sustained load, which would result in system crashes whenever that limit was hit. That would inevitably make end users angry, and then look bad for the board manufacturer.

Most cards already have software power limits, but these will also not catch transients. And if the software detects high power usage, all it can really do is ramp down clocks in an attempt to lower power consumption. These systems have to do a lot of averaging, though, and they are slow to react.

All of these issues can likely be traced back to the fact that newer high-end GPUs draw crazy amounts of power. With these very high loads, there is far less room for error in the power delivery circuit. We saw this with the 12-pin power connectors and the EVGA 3090s: a tiny build-up of tolerances resulted in catastrophic failure of the units. High-end GPUs used to draw only 200-300W. Now we have cards that draw 450W.
If you have to state multiple times that you're not defending a party, you're probably defending the party.

This isn't 1999, when power viruses could actually exploit the hardware to damage it. It's 2023. The hardware can manage itself just fine. Anything less is a total and complete failure of the design or components.

Keep in mind New World and Diablo 4 are not maliciously designed applications - they are completely legitimate games designed to be games. If you can somehow twist and spin legitimate AAA games causing GPUs to physically damage themselves as not 100% the OEM's or vendor's fault, you need to reevaluate.
 