Question Diablo4 causing gpu's to die

Ranulf · Mar 22, 2023

Just a heads up for anyone trying the Diablo4 beta this weekend. Game is apparently bricking 3080ti cards, gigabyte ones. There are reports of it hitting other cards though, including AMD.

RTX 3080 Ti GPUs reportedly bricking while playing Diablo IV beta

Multiple PC users are reporting that their expensive NVIDIA RTX 3080 Ti video cards had bricked while playing the recent Diablo IV early access beta test.

www.tweaktown.com

While Diablo IV's lenient PC spec requirements indicate a well-optimized game, some users share troubling reports of their expensive graphics cards failing during gameplay. There have been multiple reports of NVIDIA RTX 3080 Ti GPUs failing while playing the Diablo IV early access beta, with symptoms like GPU fan usage skyrocketing to 100% following an outright hardware shut down.

Blizz forum post on it:

Diablo 4 Bricked my GIGABYTE 3080 TI

I am going to play devil’s advocate here and say that it might not be entirely blizzards fault. Yes, it probably helped a little, but I am running a full AMD am 4 setup Ryzen 5 5600x 6600xt 16 3200 ram and my setting for the game are maxed out and have not adjusted them since I started the beta...

us.forums.blizzard.com

Jayz2c video:

Tup3x · Mar 24, 2023

I always cap framerate. I have 165Hz screen and use NVIDIA NULL to achieve that. It also ensures that I never see tearing which I just can't stand at all.

In2Photos · Mar 24, 2023

Where do I cap the frame rate when I'm running Folding@Home?

KompuKare · Mar 24, 2023

BFG10K said:
Nonsense. It's absolutely not a game developers problem if a GPU dies under a certain workload. The $billions invested by nVidia for R&D and testing are supposed to include proper safeguards for their products.

I agree. If we're relying on developers putting framerate caps in the menu then the product is a catastrophic design failure, and nVidia should be paying compensation to all affected parties.

This is probably a minor thing so they could easily afford it.
However, telling is that after they had shipped millions of failing parts with a design fault (the solder defect during the transition from leaded to unleaded solder, aka "bumpgate"), nVidia basically wiggled out of most of the liability. As I recall there was a class action in the US and nVdidia put aside $250 million or so but the rest of the world got nothing and even the "winners" of the class action got very little.

gdansk · Mar 24, 2023

GodisanAtheist said:
I've said it before and I'll say it again: I wonder how many corners got cut on 2021 run graphics cards.

2020 cards should be fine since they were made with quality parts fabbed in 2019 and early 2020, but stuff built in 2021 and even early 2022 were likely squeezed out with substandard components subbed in to ensure orders were full filled.

Now that supply chain issues are largely resolved, I wonder if manufacturers will keep using substandard parts and pocket the savings (I mean, hey, no one on that supply chain benefits using 20yr parts rather than 7-10yr parts).

And at the end of the day it's something Nvidia should have prevented if it was poor component choice by board partners. That's an area Nvidia should be tyrannical. Specify that the critical current protection and other important components are not subpar. It seems like basic brand management.

Everyway you slice it the people responsible here are Nvidia and perhaps their board partners - not Blizzard. Developers have a totally reasonable expectation that non-malicious code shouldn't be able to actually kill a GPU (if that even happened, seems like it may have only been shutdowns).

Leeea · Mar 24, 2023

GodisanAtheist · Mar 24, 2023

gdansk said:
And at the end of the day it's something Nvidia should have prevented if it was poor component choice by board partners. That's an area Nvidia should be tyrannical. Specify that the critical current protection and other important components are not subpar. It seems like basic brand management.

Everyway you slice it the people responsible here are Nvidia and perhaps their board partners - not Blizzard. Developers have a totally reasonable expectation that non-malicious code shouldn't be able to actually kill a GPU (if that even happened, seems like it may have only been shutdowns).

- When $650 cards were selling for $1500, no one was gonna be shy about getting these damn things out the door. AIBs basically air dropping pallets of cards to miners, and there was no way some component shortages were going to stop anyone from making money hand over fist. The cards really only had to live for as long as their now piddly warranties managed.

NV and AMD, if not complicit, were likely keen to turn a blind eye.

Stuka87 · Mar 24, 2023

BFG10K said:
It's absolutely not the developers problem to load balance hardware in order to prevent physical failure.

Sure, which further highlights their design flaws. That's nVidia's problem, not the developers.

But this is nothing more than your opinion. Who decides which game is "allowed" a given FPS? Why shouldn't a 500Hz monitor owner be allow to play Diablo 4 at 500 FPS, for example?

I personally cap my 2070 @ 60 FPS for many reasons, but I have the absolute right to run it uncapped full-bore 24/7 and expect it to last the duration of the warranty.

Which nVidia reviewer's guide says to cap the framerate to ensure correct operation? And what if a reviewer wants to test Diablo 4?

Also where does it say on nVidia's GPU boxes "warning, product not designed to run at uncapped framerate, doing so may damage the hardware and void the warranty, do so at your own risk"?

Please note, I am not defending nVidia.

But having worked in the development of software, and hardware, The hardware manufacturer cannot protect a card from every single possible piece of software out there. They can test for a whole lot of different situations, but they can't test software that didn't exist at the time of development.

VirtualLarry · Mar 24, 2023

Stuka87 said:
Please note, I am not defending nVidia.

But having worked in the development of software, and hardware, The hardware manufacturer cannot protect a card from every single possible piece of software out there. They cant test for a whole lot of different situations, but they can't test software that didn't exist at the time of development.

funny how this has never been much of an issue with the "CPU guys".

Yes folks, proper engineering is hard.

Mopetar · Mar 24, 2023

This reminds me of earlier years when people would use fur mark as a torture test or to ensure that their overclock was actually stable. Even review sites would use it to help test max power draw since it would usually max out the GPU more than any actual game or software could.

I think both AMD (then ATI) and NVidia hated it to a large extent and I remember them basically calling it a power virus, but they still had to keep their cards from blowing up while running it. This shouldn't be a difficult problem and there's no reason my 100 billion transistor GPU can't devote a few hundred thousand to preventing its own destruction.

DooKey · Mar 24, 2023

IS this restricted to NV cards or not?

Ranulf · Mar 25, 2023

If I remember right there were a few people on the D4 forums with AMD cards reporting the problem at least causing a black screen and game shut down/error.

amenx · Mar 25, 2023

Toms HW had reported on it with a list of cards involved.

RTX 3080 Ti GPUs Are Mysteriously Dying On Diablo IV Beta

Lilith's reaping RTX 3080 Ti souls.

www.tomshardware.com

Gigabyte does not inspire much faith in their products. Last year it was their PSUs blowing up which turned out to be cheaping out on internal components (according to GN).

Similar case happened with 3090s "blowing up" with New World game. That was with Evga cards which turned out to be weak solder points near the mosfet circuits.

KompuKare · Mar 25, 2023

The other takeaway from that Tomshardware list is that once again the 3080 Ti comes out ahead: not only was it pre-scalped* for the convenience of the consumer, it also acts as a canary in the mine and sacrifices itself if a "bad" game is detected. What a noble card!

* Compared to the original 3080. Other post crypto boom cards were also pre-scalped like the 6700 XT be the 6800 etc.

amenx · Mar 25, 2023

I dont think these numbers are statistically significant to push Nvidia to make any wide-reaching changes in AIB tightening of QC, but the least they should do is ensure ANY such incidents involve full compensation for card owners. Perhaps pressure AIBs to offer more generous warranties (5 year minimum).

Geegeeoh · Mar 25, 2023

Stuka87 said:
The hardware manufacturer cannot protect a card from every single possible piece of software out there.

The only software that directly interacts with the hardware is the driver.

snoopy7548 · Mar 25, 2023

It'd be interesting to know the setups of the bricked GPUs. PSU? GPU OC'd w/ increased power limit?

Stuka87 · Mar 25, 2023

Geegeeoh said:
The only software that directly interacts with the hardware is the driver.

I have worked on Windows drivers before. So I am aware of how user space software interacts with the hardware. And its worth noting with DX12 and Vulcan, games are much closer to the metal than they were with older APIs. The issue with New World was the game itself being poorly coded which resulted in load characteristics that caused massive spikes in current. We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.

But hey, thanks for thumbing down my post for no real reason.

Tup3x · Mar 25, 2023

amenx said:
Toms HW had reported on it with a list of cards involved.

RTX 3080 Ti GPUs Are Mysteriously Dying On Diablo IV Beta

Lilith's reaping RTX 3080 Ti souls.

www.tomshardware.com

View attachment 78606

Gigabyte does not inspire much faith in their products. Last year it was their PSUs blowing up which turned out to be cheaping out on internal components (according to GN).

Similar case happened with 3090s "blowing up" with New World game. That was with Evga cards which turned out to be weak solder points near the mosfet circuits.

I'm not surprised... I had Gigabyte GTX 780 Ghz Edition which squealed like a pig that was about to die. On top of that it was a horrible card that run hot and was loud. I had it like a month or two and switched to MSI R9 290 (which I had about three months and switched back GTX 780 - ASUS this time and it was just sooo much better than that Gigabyte trash).

KompuKare · Mar 25, 2023

Stuka87 said:
We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.

Yet they point made earlier about CPUs being able to handle whatever you throw at them is valid.

If GPU silicon, firmware, and drivers are not able to ensure that the chip and card are never able to be run in such a way that they might damage themselves, then the GPU vendors are doing something seriously wrong.

jpiniero · Mar 25, 2023

Stuka87 said:
I have worked on Windows drivers before. So I am aware of how user space software interacts with the hardware. And its worth noting with DX12 and Vulcan, games are much closer to the metal than they were with older APIs. The issue with New World was the game itself being poorly coded which resulted in load characteristics that caused massive spikes in current. We do not yet know what is causing these failures, but it would not be surprising if it is caused by a load profile that induces radical current transients.

A game running at a thousand FPS shouldn't be able to cause hardware failures. This is without a doubt 100% a Gigabyte issue, although perhaps you could blame nv for not enforcing quality control.

PhonakV30 · Mar 25, 2023

I guess this happens during cut scene? game will be like FurMark behind cut scene ?

Stuka87 · Mar 25, 2023

KompuKare said:
Yet they point made earlier about CPUs being able to handle whatever you throw at them is valid.

If GPU silicon, firmware, and drivers are not able to ensure that the chip and card are never able to be run in such a way that they might damage themselves, then the GPU vendors are doing something seriously wrong.

But there are key differences.

1: A CPU does not have any power circuits on it. The motherboard handles all the power delivery. If this issue happened for a CPU, the CPU would not be impacted at all. It would be the motherboard that failed. Which HAS happened. But the blame is always on the motherboard maker, not the CPU manufacturer.

2: At most, a consumer level CPU has 16 cores. A GPU has thousands, so load transients can be significantly larger.

3: CPUs use less power than the GPUs being impacted by these issues. So if a load profile was created that could cause large transient spikes on a CPU, those spikes would be much smaller.

4: The types of loads a CPU sees are drastically different from what a GPU sees. CPUs being general purpose means they are constantly context switching. Video encoding or the like would be similar, but those loads are very constant. Little risk of transients.

And for the second time, I am not defending the GPU makers.

And for those saying there is no reason the GPU makers could not prevent this sort of thing, they are limited in what they could do. Yes, the board maker could add in some sort of hardware over current protection, and have circuit breakers that would shut all power off to the card if it hit the designated limit. However, these circuits rarely react fast enough to handle transients. So then it comes down to only triggering for sustained load, which would result in system crashes if that limit was hit. Which would inevitably make end users angry, and then look bad for the board manufacturer.

Most cards already have software power limits, but these will also not catch transients. And if the software detects high power usage, all it can really do is ramp down clocks in an attempt to lower power consumption. These systems have to have a lot of averaging in them though, and they are slow to react.

All of these issues can likely be tracked back to the fact that newer high end GPUs draw crazy amounts of power. With these very high loads, there is far less room for error in the power delivery circuit. We saw this with the 12pin power connectors, and the EVGA 3090s. A tiny build up of tolerances resulted in catastrophic failure of the units. High end GPUs used to only draw 200-300W. Now we have cards that draw 450W.

coercitiv · Mar 26, 2023

In other words folks, don't buy used video cards from gamers. What if they disabled vsync and the card is already one foot in the grave? /s

Hail The Brain Slug · Mar 26, 2023

Stuka87 said:
But there are key differences.

1: A CPU does not have any power circuits on it. The motherboard handles all the power delivery. If this issue happened for a CPU, the CPU would not be impacted at all. It would be the motherboard that failed. Which HAS happened. But the blame is always on the motherboard maker, not the CPU manufacturer.

2: At most, a consumer level CPU has 16 cores. A GPU has thousands, so load transients can be significantly larger.

3: CPUs use less power than the GPUs being impacted by these issues. So if a load profile was created that could cause large transient spikes on a CPU, those spikes would be much smaller.

4: The types of loads a CPU sees are drastically different from what a GPU sees. CPUs being general purpose means they are constantly context switching. Video encoding or the like would be similar, but those loads are very constant. Little risk of transients.

And for the second time, I am not defending the GPU makers.

And for those saying there is no reason the GPU makers could not prevent this sort of thing, they are limited in what they could do. Yes, the board maker could add in some sort of hardware over current protection, and have circuit breakers that would shut all power off to the card if it hit the designated limit. However, these circuits rarely react fast enough to handle transients. So then it comes down to only triggering for sustained load, which would result in system crashes if that limit was hit. Which would inevitably make end users angry, and then look bad for the board manufacturer.

Most cards already have software power limits, but these will also not catch transients. And if the software detects high power usage, all it can really do is ramp down clocks in an attempt to lower power consumption. These systems have to have a lot of averaging in them though, and they are slow to react.

All of these issues can likely be tracked back to the fact that newer high end GPUs draw crazy amounts of power. With these very high loads, there is far less room for error in the power delivery circuit. We saw this with the 12pin power connectors, and the EVGA 3090s. A tiny build up of tolerances resulted in catastrophic failure of the units. High end GPUs used to only draw 200-300W. Now we have cards that draw 450W.

If you have to state multiple times you're not defending a party, you're probably defending the party.

This isn't 1999 when power viruses could actually exploit the hardware to damage it. It's 2023. The hardware can manage itself just fine. Anything less Is a total and complete failure of the design or components.

Keep in mind new world and diablo 4 are not maliciously designed applications - they are completely legitimate games designed to be games. If you can somehow twist and spin legitimate, AAA games causing GPUs to physically damage themselves as not 100% completely the OEM or vendors fault, you need to reevaluate.

Geegeeoh · Mar 26, 2023

Stuka87 said:
And its worth noting with DX12 and Vulcan, games are much closer to the metal than they were with older APIs.

Never closer or bypassing the driver.

Question Diablo4 causing gpu's to die

Platinum Member

Golden Member

Platinum Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

No Lifer

Diamond Member

Golden Member

Platinum Member

Diamond Member

Golden Member

Diamond Member

Member

Diamond Member

Diamond Member

Golden Member

Golden Member

Lifer

Senior member

Diamond Member

Diamond Member

Diamond Member

Member