Ryzen Locking on Certain FMA3 Workloads

Mockingbird

Senior member
Feb 12, 2017
Last week a thread on the HWBOT forum discussed a specific workload that caused a hard lock every time it was run. This was tested with a variety of motherboards and Ryzen processors from the 1700 to the 1800X. At default power and clock settings, every sample locked up, both the ones the thread's participants worked on and the chips other contributors were able to test themselves.






This is quite reminiscent of Intel's Coppermine-based Pentium III 1133 MHz processor, which failed in one specific workload (compiling). Intel had shipped only a limited number of these CPUs at the time, and Kyle from HardOCP and Tom from Tom's Hardware were the first to show this behavior in a repeatable environment. Intel stopped shipping that model and had to wait until the Tualatin version of the Pentium III to reach that speed (and above) while staying stable in all workloads.

The interesting thing about this FMA3 finding is that the problem does not appear on some overclocked Ryzen chips. To me this indicates that it could be a power delivery issue within the chip. A particular workload that leans heavily on the FPU could require more power than the chip's Control Fabric delivers, causing a hard lock. On several of the tested overclocked chips, with much more power being pushed to them, it seems that enough power reaches that specific area of the chip for the operation to complete successfully.

To me, this implies that AMD does not necessarily have a bug like Intel's infamous FDIV bug in the original Pentium, or AMD's own issue with the B2 stepping of Phenom. AMD has a very complex voltage control system, managed by the Control Fabric portion of the Infinity Fabric, so this could be fixable with a firmware or microcode update. If that is the case, AMD could simply increase the power supplied to the FPU/SIMD/SSE portion of the Ryzen cores. This may come at the cost of lower burst speeds, to keep TDP within the stated envelope.

At the time of posting, AMD has confirmed this issue and says a fix will be provided via a motherboard firmware update. Most likely this will come in the form of an updated AGESA release.
 

formulav8

Diamond Member
Sep 18, 2000
The P3 was a clock speed issue. The FMA3 one is a firmware issue, from what I understand from reading elsewhere.
 

DaveSimmons

Elite Member
Aug 12, 2001
Yep, the P3 was an out-of-spec, over-overclocked chip Intel rushed out in response to the 1 GHz Athlon, basically retaking the speed crown by cheating.
 

SPBHM

Diamond Member
Sep 12, 2012
The Phenom B2 BIOS fix came with a big performance loss; hopefully that's not the case with Ryzen.

Intel had stable Coppermines at 1 and 1.1GHz I think.
 

JDG1980

Golden Member
Jul 18, 2013
Intel's first attempt at a 1.13 GHz Pentium III had to be withdrawn from the market after Tom's Hardware tests showed that it was unstable on Linux kernel compilation and various other tasks. It was in effect a factory-overclocked chip with very specific motherboard, BIOS, etc. requirements, and it was OEM-only. In many ways this is the most shameful product Intel ever released.
 

Jan Olšan

Senior member
Jan 12, 2017
IMHO, this is more reminiscent of last year's freeze bug in Skylake, which manifested in Prime95. That was fixed successfully, while in the days of the Pentium III and the like, what could be done via microcode updates probably wasn't as comprehensive. (Fingers crossed that Zen will also be fixable the soft way without performance issues.)

At least it is not accelerated aging of some I/O interface; those bugs are scary.
 

Doom2pro

Senior member
Apr 2, 2016
I believe The Stilt said the bug has been fixed since February; however, the new code takes a long time to validate, which is why the bug is still in the wild.
 

Jan Olšan

Senior member
Jan 12, 2017
http://forum.hwbot.org/showpost.php?p=481602&postcount=42

It seems that some Gigabyte boards already have the fix in the latest BIOS, with the person who originally reported the bug confirming it on his board.
I don't see any information on possible performance hits. Does anybody have something on that?


Edit: it seems the BIOS updates in question are not public yet; you need a beta image that can only be found unofficially at the moment: https://www.reddit.com/r/Amd/commen...ly_new_beta_bios_f6b/?st=j0mianmr&sh=0e3b8915
https://www.reddit.com/r/Amd/commen...th_fma3_fix_released/?st=j0milmkt&sh=a56f2ec0
Apparently Gigabyte used to have some of them listed, but then removed them from the publicly visible downloads (maybe a sign it is better to wait for the final release).
 

LTC8K6

Lifer
Mar 10, 2004

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
Here is the disassembled code causing the crash in many Ryzen systems before applying the patch - in nice colors (compared to my posting in the other thread):


As you can see, it doesn't do useful calculations. It repeatedly multiplies xmm0 and xmm1 and adds, then later subtracts, using that result with different registers:
a := b*c + a
later
a := b*c - a
which means b*c is actually not relevant: b*c - (b*c + a) = -a, so the product cancels out (up to rounding).
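As a minimal sketch of why the product drops out, the Python below replays that arithmetic on scalars. It is only an illustration under assumptions: the original test worked on packed xmm registers, the helper names `fma_pair` and `run_loop` are hypothetical, and the inputs are chosen to be exactly representable so the results are exact. `math.fma` only exists in Python 3.13 and later, so the sketch falls back to an ordinary multiply-add on older versions; with these inputs both paths give the same numbers.

```python
import math

# math.fma exists only in Python 3.13+; fall back to an unfused multiply-add.
# With the exactly representable inputs used below, both give identical results.
_fma = getattr(math, "fma", None) or (lambda x, y, z: x * y + z)


def fma_pair(a: float, b: float, c: float) -> float:
    """One round of the pattern described above (hypothetical helper)."""
    a = _fma(b, c, a)    # multiply-add:      a := b*c + a
    a = _fma(b, c, -a)   # multiply-subtract: a := b*c - a
    return a             # = b*c - (b*c + a_old) = -a_old, up to rounding


def run_loop(a: float, b: float, c: float, iterations: int) -> float:
    """Repeat the pair; the b*c term cancels, so a only flips sign each round."""
    for _ in range(iterations):
        a = fma_pair(a, b, c)
    return a


print(run_loop(0.25, 1.5, 2.0, 1))  # -0.25
print(run_loop(0.25, 1.5, 2.0, 2))  # 0.25
```

Whatever b*c is, the value just flips sign on every pass, so the loop produces nothing useful, matching the observation that the code "doesn't do useful calculations".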
 

Jan Olšan

Senior member
Jan 12, 2017
It seems the Asus Crosshair VI Hero is also fixed now (BIOS 1002).

Still no tests on performance before/after the update?
 