NV 12VHPWR issues revisited


Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
the nv board design is such that vrm is expecting one 12v blob. the vrm therefore only sees 1 rail so it cant do anything to change the situation other than shut down. the asus 3 shunt solution can only monitor and notify you if there is a problem because the vrm still doesnt see the 3 rails after they merge back into 1 rail.



People keep acting like balancing is a high priority, but all you can really depend on, is monitoring, alerting, and then shut down/go into limp mode.

If you have a situation like Der8auer had in his testing with 20 amps on one wire and 2 amps on another, you can't balance that.
 

gorobei

Diamond Member
Jan 7, 2007
3,900
1,385
136
People keep acting like balancing is a high priority, but all you can really depend on, is monitoring, alerting, and then shut down/go into limp mode.

If you have a situation like Der8auer had in his testing with 20 amps on one wire and 2 amps on another, you can't balance that.
monitoring and alerting like the asus astral is the 2nd weakest protection as you have to install the alert software and be willing to stop your gaming as soon as it chimes. monitoring and shutdown before you even try to game is better as it actually tells you your cables arent plugged in correctly. the vrm balancing multiple "rails" is part of the monitoring and has some ability to deal/correct with the problem.

nv using a single 12v blob into the vrm with no ability to put multiple shunts or meaningful fuses just means that the card gets zero possibility of being easily/cheaply fixed by replacing the powerstage and blown fuse. buildzoid goes into some of the fuse math. too low of a fuse value and the card shuts down due to frequent transient spikes in normal gameplay. too high and the fuse doesnt actually trip when you would want it to, leading to the card self-destructing in a non repairable way. splitting the lines into multiple 12v "rails" at least gives you a chance to let the vrm try to fix some of the issue and allows you to protect from the worst case outcome.
melting cables and gpu sockets with a side of burning psu socket is not a "oh we cant do anything about that" scenario.

if nv lets board partners split up the rails and fuse them, then fine let the buyer take the responsibility of stopping their gaming session before something fails. if they let board partners fix the issue like the msi 2080 3rail, then great.

but what is more likely: nv acknowledges their mistake in the 12v single blob into the vrm and the failure point of the 12vHF cable, or nv forces its partners to stick to the blob and blames everyone else for the cards failing (soldergate, gpu killing driver patch, thermi) so they never have to pay for the repair/rma.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
monitoring and alerting like the asus astral is the 2nd weakest protection as you have to install the alert software and be willing to stop your gaming as soon as it chimes. monitoring and shutdown before you even try to game is better as it actually tells you your cables arent plugged in correctly.

Agreed. Monitoring and shutdown/limp mode should be card level, not dependent on PC software.


the vrm balancing multiple "rails" is part of the monitoring and has some ability to deal/correct with the problem.


Tell me how you balance when one cable has 20 amps and one has 2 amps?
 
Reactions: wilds

amenx

Diamond Member
Dec 17, 2004
4,283
2,616
136
Agreed. Monitoring and shutdown/limp mode should be card level, not dependent on PC software.





Tell me how you balance when one cable has 20 amps and one has 2 amps?
This was brought up and addressed earlier. The 40/50 series were a departure from Ampere, which had 3 shunt resistors that capped load at 200W each. The 40/50 series now have just one blob taking in 600W, which can easily overload any of the 6 power leads to the GPU if contact is poor.



 

Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
This was brought up and addressed earlier. The 40/50 series were a departure from Ampere, which had 3 shunt resistors that capped load at 200W each. The 40/50 series now have just one blob taking in 600W, which can easily overload any of the 6 power leads to the GPU if contact is poor.



This is still only about monitoring. It does nothing to balance a 2 amp vs 20 amp situation (hint: you can't balance that).
 

MrTeal

Diamond Member
Dec 7, 2003
3,888
2,592
136
This is still only about monitoring. It does nothing to balance a 2 amp vs 20 amp situation (hint: you can't balance that).
Are you sure on that? I believe Ampere split the 12 pin to three separate 12V regions to power different phases.
 

coercitiv

Diamond Member
Jan 24, 2014
7,096
16,374
136
This is still only about monitoring. It does nothing to balance a 2 amp vs 20 amp situation (hint: you can't balance that).
It feeds the groups of pins into different (groups of) power phases on the card. GPUs with multiple power connectors have been doing this for many years, in this instance you have 3 groups of 2 pins. The "balancing" comes from the lowered max amperage that can end up on a single pin in a worst case scenario, max load per pin goes down linearly with the number of groups.
 
Reactions: wilds and amenx

Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
It feeds the groups of pins into different (groups of) power phases on the card. GPUs with multiple power connectors have been doing this for many years, in this instance you have 3 groups of 2 pins. The "balancing" comes from the lowered max amperage that can end up on a single pin in a worst case scenario, max load per pin goes down linearly with the number of groups.

Unless you get down to individual pins, you can still have something like the 2 amp and 20 amp situation that Der8auer had, even with groups of two. All you need is one bad pin/connector/wire and it will try to pull all the current through the other one of the pair, and at 600W there is a lot of current through each wire.
 

DrMrLordX

Lifer
Apr 27, 2000
22,504
12,374
136
Unless you get down to individual pins, you can still have something like the 2 amp and 20 amp situation that Der8auer had, even with groups of two. All you need is one bad pin/connector/wire and it will try to pull all the current through the other one of the pair, and at 600W there is a lot of current through each wire.
Not if you have 4 PCIe 8-pins. You're getting at most 12.5A per connector.
 

coercitiv

Diamond Member
Jan 24, 2014
7,096
16,374
136
All you need is one bad pin/connector/wire and it will try to pull all the current through the other one of the pair, and at 600W there is a lot of current through each wire.
With a three-way split of power phases on groups of 2 pins you get only 17A max per pin in case the other one makes no contact at all. That's 5A less than we saw in the Der8auer video, which was not even the worst case possible for that setup.
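The worst-case figures in this exchange can be reproduced with a rough sketch. It assumes a 12 V supply, an even power split across groups, and that every pin but one in a group makes no contact at all; the function name and parameters are hypothetical, for illustration only.

```python
def worst_case_pin_current(total_watts, groups, volts=12.0):
    """Current on the lone conducting pin of one group, assuming the
    group carries an equal share of the load and every other pin in
    that group has lost contact."""
    return total_watts / volts / groups

# Single 6-pin blob vs. 3 groups of 2 pins vs. 6 individually fed pins at 600 W
for groups in (1, 3, 6):
    print(groups, round(worst_case_pin_current(600, groups), 1))
```

Under these assumptions a single blob leaves one pin carrying the full 50 A, while three groups cap it at roughly 16.7 A, which is where the ~17A figure comes from.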

This is one of the reasons the multiple PCIe 8-pin solution worked fine until now, AMD and Nvidia were allocating each connector to a group of power phases. Apparently Nvidia went even further in the past, they had methods to dynamically allocate a couple of phases from one circuit to another.
 

MrTeal

Diamond Member
Dec 7, 2003
3,888
2,592
136
In my opinion 3x2 is better than 6x1 for rails. Yeah a 2A and 14A split between the two wires in a group isn't ideal since the one would be out of spec, but it's not a ridiculous amount over.
The reason you get that imbalance, though, is a difference in contact resistance between the two conductors in that group. The conductor with the poor contact has higher resistance and thus carries less current, which helps keep heat down since the heat is generated by I²R losses in the connector.
If you create 6 separate 12V regions on the GPU, they're all going to pull ~8A with a 5090 load. If one of the contacts has a high contact resistance there's no other path, and it's still going to pull its normal current or even slightly more due to the voltage drop across the contact, and it'll very quickly start to overheat.
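The current-divider argument above can be sketched numerically. The contact resistances below are made-up illustrative values, not measurements, and the model ignores wire resistance and temperature effects.

```python
def split_current(group_amps, resistances_ohm):
    """Divide a group's current among parallel contacts in proportion
    to their conductance (1/R), as a simple current divider."""
    conductances = [1.0 / r for r in resistances_ohm]
    total = sum(conductances)
    return [group_amps * g / total for g in conductances]

def contact_heat(amps, resistance_ohm):
    """I^2 * R power dissipated in one contact, in watts."""
    return amps ** 2 * resistance_ohm

# Hypothetical pair: one good contact (5 mOhm) and one poor contact (35 mOhm)
resistances = [0.005, 0.035]
currents = split_current(16.0, resistances)
for amps, r in zip(currents, resistances):
    print(f"{amps:.1f} A through {r * 1000:.0f} mOhm -> {contact_heat(amps, r):.2f} W")

# The same poor contact fed on its own at 8 A (the 6x1 case)
print(f"{contact_heat(8.0, 0.035):.2f} W")
```

With these assumed values the pair splits 14 A / 2 A and the poor contact dissipates about 0.14 W, while the same contact forced to carry 8 A on its own rail would dissipate about 2.24 W.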
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
If you create 6 separate 12V regions on the GPU, they're all going to pull ~8A with a 5090 load. If one of the contacts has a high contact resistance there's no other path, and it's still going to pull its normal current or even slightly more due to the voltage drop across the contact, and it'll very quickly start to overheat.

You shouldn't attempt to pull more current through any connection that has a current drop. This is why I've always been arguing for alert/shutdown over load balancing.

Because load balancing is likely to attempt to pull more current through the connections with the most resistance, if you attempt to balance them.

Going in groups of two won't save you from this. What if you have two bad connections in a group, and you still try to force the 200 watts through them? 200 watts through a bad connection can still melt/catch fire.

The only reasonable approach is alert and shutdown.
 

amenx

Diamond Member
Dec 17, 2004
4,283
2,616
136
Going in groups of two won't save you from this. What if you have two bad connections in a group, and you still try to force the 200 watts through them? 200 watts through a bad connection can still melt/catch fire.
200W is a big difference vs 600W. 2 wires handling 200W is still potentially less problematic than 6 wires handling 600W, where any single wire can more easily melt when carrying the entire load due to poor contact on the other 5 wires.

200w x3 worked well with Ampere and no reports of molten connectors.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,752
6,256
136
200W is a big difference vs 600W. 2 wires handling 200W is still potentially less problematic than 6 wires handling 600W, where any single wire can more easily melt when carrying the entire load due to poor contact on the other 5 wires.

200w x3 worked well with Ampere and no reports of molten connectors.

If you simply monitor and alert/shut down, you can't have issues. That's better than attempting to push more current through bad connections via load balancing.
 

MrTeal

Diamond Member
Dec 7, 2003
3,888
2,592
136
You shouldn't attempt to pull more current through any connection that has a current drop. This is why I've always been arguing for alert/shutdown over load balancing.

Because load balancing is likely to attempt to pull more current through the connections with the most resistance, if you attempt to balance them.

Going in groups of two won't save you from this. What if you have two bad connections in a group, and you still try to force the 200 watts through them? 200 watts through a bad connection can still melt/catch fire.

The only reasonable approach is alert and shutdown.
There's no reason you can't do both. Pretty much every card from the dawn of supplemental power has involved some form of per connector current balancing, and while I've melted my share of PCIe 6 and 8 pin connectors it was pretty much a non-issue overall. Similarly with Ampere and the 3x2 arrangement we didn't see the same issues with melting connectors even without monitoring.

The issues we're seeing now are twofold. First, the blobbing of 6 conductors together means that worst-case current is several times the normal single-conductor current. Second, each conductor is running closer to its current rating. Even with the rating there's still some safety factor built in, and balancing them across groups limits the maximum current per pin to the full group current, double the nominal per-pin figure (so 16A for 3x2 @ 575W), but that's still pushing it for a contact rated at 9.5A. Way better than the 22A der8auer measured though.

You can still measure current per pin and restrict power at some value; everything just becomes more reasonable if you have more control. For instance, if you had a card pulling 575W through the connector (8A per pin average, 16A per group), and you found you were actually pulling 12A/4A/8A/8A/8A/8A, you would have seamless options. You could adjust the phase timings to add 2A on average to each of the other two groups and reduce the first group's current by 4A, giving 9A/3A/9A/9A/9A/9A. All contacts are still in spec, and there is no need to throttle the card.
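The rebalancing arithmetic in that example can be checked with a short sketch. It assumes the controller can set each group's total current and that the split between the two pins of a group stays fixed by their contact resistances; the function is hypothetical, not an actual VRM controller interface.

```python
def rebalance(pin_amps, group_deltas):
    """Shift each 2-pin group's total current by the given delta (in amps),
    keeping the ratio between the two pins of a group unchanged."""
    result = []
    for g, delta in enumerate(group_deltas):
        pair = pin_amps[2 * g: 2 * g + 2]
        group_total = sum(pair)
        scale = (group_total + delta) / group_total
        result.extend(a * scale for a in pair)
    return result

measured = [12, 4, 8, 8, 8, 8]              # amps per pin, first group imbalanced
adjusted = rebalance(measured, [-4, 2, 2])  # take 4 A off group 1, add 2 A to each other group
print(adjusted)
```

Under those assumptions the result is 9/3/9/9/9/9 A, the total stays at 48 A, and no pin exceeds the 9.5A rating mentioned above.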
 

reb0rn

Senior member
Dec 31, 2009
272
90
101
high power cables need to be soldered, ppl that want modular cables should burn their house down, the connector on the NV side needs to be way better quality-wise so those crappy cables would not have loose pins, or all cards should detect current per pin

I see zero reason to have basic pcie and mobo/cpu cables modular, just a fancy gimmick made by idiots and praised by stupid users
 

basix

Member
Oct 4, 2024
89
179
66
If you short all wires at the cable connector level with a decently thick piece of metal (on both sides of the cable), you would at least get rid of excessive current loads at the wire level. It will not solve the problem that you might push too much current across a single connector pin, but an improvement is always welcome.

A second benefit of that shorting is, that heat gets transferred better between pins. This would reduce hot spot risks at a single pin (thermal conduction & power dissipation via connector to the PCB and along all wires of the cable).

These two things should already lead to a decent improvement of the situation with existing cards. It would reduce the likelihood of worst cases drastically. And the implementation / solution is very simple. So cable manufacturers should start to do that immediately. I think Nvidia does that on their adapter cable on the card side's connector.
 

amenx

Diamond Member
Dec 17, 2004
4,283
2,616
136
Asus GPU Tweak III tool now features pin readouts for the connectors on select Asus cards.


HWInfo64 now has beta versions that should work on more 50 series cards. And with promising results. Discussion of 50 series owners with the developer on this.

 
Reactions: Elfear

wilds

Platinum Member
Oct 26, 2012
2,059
673
136
Excluding the 3090 Ti, all other implementations of the 12VHPWR from AMD or NVIDIA are not safe; and should not be recommended.

The lack of safety margin compared to traditional 8-pin should be alarming.

This is no longer an Nvidia only issue; with Sapphire showing off their stupidity too. I fear someone’s house is going to have to burn down for any meaningful change to occur.
 

DrMrLordX

Lifer
Apr 27, 2000
22,504
12,374
136
Excluding the 3090 Ti, all other implementations of the 12VHPWR from AMD or NVIDIA are not safe; and should not be recommended.

The lack of safety margin compared to traditional 8-pin should be alarming.

This is no longer an Nvidia only issue; with Sapphire showing off their stupidity too. I fear someone’s house is going to have to burn down for any meaningful change to occur.
In general I agree, though the 4090Ti had 2 12VHPWR/12v2x6 connectors which isn't as bad as the single-connector 4090 or 5090 cards (especially not as bad as the 5090FE).
 

coercitiv

Diamond Member
Jan 24, 2014
7,096
16,374
136
Some interesting tidbits from the connector specs in the PCIe_CEM_R5.1_V1.0 documentation:

Power Pin Current Rating: (Excluding sideband contacts) 9.2 A per pin/position minimum with a limit of a 30 °C T-Rise above ambient temperature conditions at +12 V VDC with all twelve contacts energized.
Due to variations in contact resistance, an individual pin may see more than 9.2A of current depending on cable contact resistance nonuniformity, but the total current for the assembly shall not exceed 55A RMS in each direction. See section 9.3.1.1 for contact resistance variability measurements and requirements.

And here's the relevant part from section 9.3.1.1

12V-2x6 Cable Plug Housing Assembly and Contact Construction
Contact resistance variability within a cable assembly causes current imbalance between contacts and may result in individual contacts and/or wires exceeding the current per pin specified in section 9.1. In addition, this current imbalance may be increased due to cable bending and/or side loading of cable assembly mated to the Add-in Card. The specific wire, connectors and manufacturing process used for a cable assembly must be designed to accept the current imbalance due to contact resistance variability and side loading. Side loading is defined here as a load applied in each direction defined in Figure 9-10 perpendicular to housing bodies.

To measure low-level contact resistance and qualify that a mated header and cable assembly design can control contact resistance variability under side loading conditions, the following methodology is provided. LLCR measurements shall be conducted according to EIA-364-23, Low Level Contact Resistance Test Procedure for Electrical Connectors and Sockets.
  • Secure the Add-in Card connector to a fixture.
  • LLCR of the cable assembly is measured from the footprint of the receptacle on the top of the Add-in Card PCB to 50mm from the point where the wire exits the body of the plug. The purpose of controlling the measurement point is to ensure that wire length and its contribution to contact resistance is repeatable. This does not imply a restriction on specific cable assembly implementation.
  • Perform 30 mating cycles between the connector and the cable assembly.
  • Record LLCR of each conductor of the cable assembly in the unloaded condition.
  • Apply side load of 20N in each direction as defined in Figure 9-10. Load must be applied to the wire bundle and beyond any cable tie or strain relief feature of the assembly if present. Record LLCR of each conductor once the value has reached a stable value.
  • Calculate average contact resistance of pin groups 1-6 and 7-12 independently for each side load condition.
  • The LLCR shall not vary on any pin more than 50% from the average of that pins respective group in each test condition. A maximum LLCR of 6mohm/contact is required on each conductor.

To me this shows increased reliance on the testing and the quality standards of each manufacturer. The 9.2A per pin load is actually the minimum requirement before testing. The final max rating can be anything based on connector side-load testing.
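The pass/fail criterion at the end of the quoted procedure can be sketched as a simple check. This is only an illustration of the two stated limits (50% deviation from the group average, 6 mohm maximum) for one pin group in one test condition; the function name and the sample readings are made up.

```python
def llcr_within_spec(group_mohm, max_mohm=6.0, max_deviation=0.5):
    """Check one pin group (1-6 or 7-12) for one test condition:
    no pin above 6 mOhm, and every pin within 50% of the group average."""
    average = sum(group_mohm) / len(group_mohm)
    for r in group_mohm:
        if r > max_mohm:
            return False
        if abs(r - average) > max_deviation * average:
            return False
    return True

# Made-up readings in milliohms, for illustration only
uniform = [3.0, 3.2, 2.9, 3.1, 3.0, 2.8]
one_outlier = [2.0, 2.0, 2.0, 2.0, 2.0, 5.5]
print(llcr_within_spec(uniform))
print(llcr_within_spec(one_outlier))
```

The second set fails even though every pin is under 6 mohm, because one pin sits more than 50% above its group average.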
 
Reactions: Elfear