AMD Ryzen 7000 Burning Out: Root Cause Identified, EXPO and SoC Voltages to Blame

(Image credit: Speedrookie/Reddit)

Multiple reports of Ryzen processors burning out have burst onto the internet over the last few days. The damaged chips have not only bulged out and become desoldered, but they have also done significant damage to the motherboards they are installed in. We reached out to our industry contacts and learned some new information about the nature of the problem and the scope of AMD’s planned fix. Our information comes from multiple sources that wish to remain anonymous, but the info from our sources aligns on all key technical criteria.

First, it is important to know that this condition can occur with both standard Ryzen 7000 models and the new Ryzen 7000X3D chips, though the latter are far more sensitive to the condition, and the root cause could vary between the two types of chips. AMD will issue a fix soon, but the timeline is unknown. We’re told that failures have occurred with all motherboard brands, including Biostar, ASUS, MSI, Gigabyte, and ASRock.

According to our sources and seconded by an ASUS statement to Der8auer, the problem stems from SoC voltages being raised to unsafe levels. This can come either from the pre-programmed voltages used in EXPO memory overclocking profiles or from a user manually adjusting the SoC voltage (a common practice to eke out a bit more memory overclocking headroom).

Our sources also added further details about the nature of the chip failures — excessive SoC voltages destroy the chips’ thermal sensors and thermal protection mechanisms, completely disabling their only means of detecting and protecting themselves from overheating. As a result, the chip operates without knowing its temperature.

AMD’s modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range (it isn’t uncommon for them to run at 95°C during normal operation), so they will automatically continue to draw more power until the chip’s internal mechanisms tell it to dial back to remain within a safe temperature. In this case, the lack of working temperature sensors and protection mechanisms allows the chip to continue to draw more power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip, like the bowing we’ve seen on the outside of several chip packages, or the desoldering reported by Der8auer.
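The feedback loop described above can be illustrated with a toy simulation. This is only a rough sketch under stated assumptions, not a model of AMD's actual boost algorithm: the function name, the starting power, the 5 W step size, and the 0.6°C-per-watt thermal resistance are all hypothetical values chosen for illustration. Only the 95°C limit comes from the article.

```python
def simulate_peak_temp(sensor_ok: bool, steps: int = 50) -> float:
    """Toy boost loop: return the peak temperature (deg C) reached.

    All constants except the 95 C limit are hypothetical illustration values.
    """
    thermal_limit_c = 95.0   # normal operating limit cited in the article
    ambient_c = 25.0         # assumed ambient temperature
    c_per_watt = 0.6         # assumed thermal resistance of chip plus cooler
    power_w = 50.0           # assumed starting package power
    peak_c = ambient_c

    for _ in range(steps):
        # Simplified steady-state model: temperature tracks power directly.
        temp_c = ambient_c + c_per_watt * power_w
        peak_c = max(peak_c, temp_c)

        # A destroyed sensor yields no usable reading at all.
        reading = temp_c if sensor_ok else None

        if reading is not None and reading >= thermal_limit_c:
            power_w -= 5.0   # protection works: dial power back
        else:
            power_w += 5.0   # no limit seen: keep boosting

    return peak_c


if __name__ == "__main__":
    print("working sensor, peak C:", simulate_peak_temp(True))
    print("dead sensor, peak C:   ", simulate_peak_temp(False))
```

With a working sensor the toy chip hovers just around its limit, while with the sensor removed nothing ever tells the loop to back off, so power and temperature climb for as long as the loop runs.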

The chip pulls excessive power through the motherboard during this death spiral of sorts, leading to the visible damage we can see in the socket at the vCore pins and the bulging on the chip’s LGA pads. However, less visible damage also extends to the CPU SoC, CPU_VDDCR_SOC, and CPU VDD MISC rails/pins; they just don’t pull enough current to leave visible scorching like we see with the vCore pins.

(Image credit: Enwyi/Reddit)

For now, we’re getting conflicting information on just what constitutes a safe SoC voltage. We do know that 1.25V is the recommended safe limit, but we’re told that voltages of 1.4V and beyond definitely increase the likelihood of the condition occurring. To be clear, running beyond 1.4V doesn’t ensure that your chip will burn out, but your odds will increase. Conversely, 1.35V appears to be “safe.” Proceed at your own risk, though.

Our sources say that AMD is working on a fix that includes a voltage cap or lock in the firmware/SMU, which should prevent EXPO memory profiles and simple BIOS manipulations from exceeding an as-yet-undefined limit. We’re also told that AMD can’t completely prevent SoC voltage manipulations because the voltage fed to the chip is dictated by the VRMs, leaving a means for crafty motherboard vendors to allow voltage changes despite AMD’s lock (this would not be the first time motherboard vendors have circumvented limits to offer rare functionality).

A few motherboard vendors, like ASUS and MSI, have already issued new BIOSes to correct some of the issues. However, we have confirmed that failures have also occurred on Biostar, ASRock, and Gigabyte boards, so all vendors are impacted to some degree.

As with all forms of overclocking, any damage from using an EXPO overclocking profile is not covered by your warranty, but given the situation, we don’t think that AMD or the motherboard vendors would use the lack of warrantied EXPO support to invalidate warranties. 

(Image credit: LT-Cc/Baidu)

The advertised performance you get from an EXPO profile is also not guaranteed by the chipmaker. It’s also noteworthy that AMD’s purportedly planned SoC voltage cap could lead to lower stable memory overclocking frequencies. We don’t think that will matter too much to most Ryzen 7000 owners, as the sweet spot of DDR5-6000 should work just fine within the proposed limits. However, extreme overclockers and those pushing the very bleeding edge of performance could end up with lower overclocking limits. Time will tell.

For now, you could take a few common-sense steps to potentially protect your chip while we await an official statement from AMD, but don’t take our advice as a 100% solution (proceed at your own risk).

This condition means that, even though the odds are small, an EXPO profile could lead to physical damage to your chip and motherboard. If you use an EXPO profile, you should check your SoC voltage in your BIOS or with a utility like HWInfo. If it is at or above 1.4V, you should disable the profile and run the memory at standard stock settings. If you have manually dialed in a 1.4V or higher SoC voltage, dial that back to a safer setting for now.
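As a rough illustration of that check, the sketch below sorts an SoC voltage reading into the bands mentioned in this article. It assumes you have read the value yourself from the BIOS or a monitoring utility and typed it in; it does not query any hardware. The function name and the wording of the messages are our own, and the thresholds are simply the provisional 1.25V and 1.4V figures quoted above, not an official AMD specification.

```python
RECOMMENDED_LIMIT_V = 1.25  # recommended safe limit cited in the article
HIGH_RISK_V = 1.40          # level at which our sources say risk clearly rises


def assess_soc_voltage(volts: float) -> str:
    """Classify a manually read SoC voltage (hypothetical helper)."""
    if volts <= 0:
        raise ValueError("SoC voltage reading must be positive")
    if volts >= HIGH_RISK_V:
        return "high risk: disable EXPO or lower the manual SoC voltage"
    if volts > RECOMMENDED_LIMIT_V:
        return "above recommended limit: proceed at your own risk"
    return "within recommended limit"


if __name__ == "__main__":
    # Example readings typed in by hand, not measured by this script.
    for reading in (1.20, 1.35, 1.40):
        print(f"{reading:.2f} V -> {assess_soc_voltage(reading)}")
```

Treat the output as a reminder of the figures reported so far rather than a guarantee; the bands may change once AMD publishes an official limit.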

Now all that is missing is the official word from AMD on the matter. We’re told the company is moving quickly to resolve the issue, so we expect a statement to arrive soon. We’ll update as necessary. 