VRM Overtemperature & Sealed vs. Unsealed Delid Thermals (Ft. Bitwit's 7980XE)
Posted on October 30, 2017
Tripping VRM overtemperature isn’t something we do too often, but it happened when working on Bitwit Kyle’s 7980XE. We’re working on a “collab” with Kyle, as the cool kids call it, and delidded an i9-7980XE for Kyle’s upcoming $10,000 PC build. The delidded CPU underwent myriad thermal and power tests, including similar testing to our previous i9-7980XE delid & 7900X “thermal issues” content pieces. We also benchmarked sealant vs. no sealant (silicone adhesive vs. nothing), as all of our previous tests have been conducted without resealing the delidded CPUs – we just rest the IHS atop the CPU, then clamp it under the socket. For Kyle’s CPU, we’re going to be shipping it across the States, so that means it needs to not leak liquid metal everywhere. Part of this is resolved with nail polish on the SMDs, but the sealant – supposing no major thermal detriment – should also help.
Tripping overtemperature is probably the most unexpected side of our journey on this project. We figured we’d publish some data to demonstrate an overtemperature trip, and what happens when the VRMs exceed safe thermals, but the CPU is technically still under TjMax.
Let’s start with the VRM stuff: This is a complete side discussion. We might expand it into a separate content piece with more testing, but we wanted to talk through some of the basics first. This is primarily observational data at this point, though it was logged.
The scenario was this: We were testing stock compound vs. LM and sealant vs. unsealed CPUs, and ended up running into issues with the multiplier bouncing between 12x and our 45x multiplier. This occurred even when permitting unlimited current. The motherboard is the ASUS ROG Rampage VI Extreme (X299), the CPU is the 7980XE (later overclocked to 4.5GHz / 1.24VID), and the memory is 32GB of G.Skill Trident Z Black at 3600MHz. For the cooler, we are using an NZXT Kraken X62 at max RPMs on the pump and fans.
Multiplier Drops from VRM Overtemperature
This chart of the averaged multipliers tells the story: It was all over the place, but we weren’t actually tripping TjMax when under liquid metal. We were at about 83-89C intermittently across all cores, averaging ultimately to 83.4C. This isn’t anywhere remotely close to the 104C throttle threshold for Skylake X, so obviously something’s wrong in that scenario. It wasn’t the current limit, either, but in this chart, you can see that the current fluctuation also reflected the frequency modulation. This line is ideally fairly steady and straight in a Blender-type workload, as each thread renders one tile ad infinitum, until render completion. The orange and red lines are crashing against an unknown wall, at this time – it’s not current and it’s not TjMax – but the blue line remains steady. The question, of course, is what we changed for that blue line.
We added fans. During the initial burn-in, the heatsink was hot enough to cause skin burns if you held your hand there long enough. We stick thermocouples on lots of things, but every now and then, a simple feel test will suffice to determine if something is over temperature. The VRM clearly was, and its heatsink had no means to really spread or dissipate that heat meaningfully. For the blue line, we added a fan that was pointed straight down at the VRM heatsink. We’ll look into monitoring VRM temperatures with thermocouples in the future.
Even air coolers wouldn’t solve for this, as they’re often situated above the VRMs. On this particular motherboard, the VRM is sandwiched between the GPU backplate (so we’re dealing with radiative heat), the CPU (so we’re dealing with upwards of 50A of current into that thing), and the memory. The memory alone is burning at about 60C per stick. All that heat just sits there, and it’d be even worse in a case.
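The diagnosis above (clocks falling to 12x while the cores sit well under TjMax) can be sketched as a simple check over logged samples. This is a rough illustration, not our logging tooling: the `Sample` type, the `classify` helper, the 2C margin, and the sample values are all hypothetical. Only the 45x target and the 104C threshold come from the testing described here.

```python
from dataclasses import dataclass

TJMAX_C = 104.0    # Skylake-X throttle threshold cited in this article
TARGET_MULT = 45   # multiplier of the 4.5GHz overclock


@dataclass
class Sample:
    """One logged reading (hypothetical structure, for illustration only)."""
    multiplier: float
    max_core_temp_c: float


def classify(sample: Sample, margin_c: float = 2.0) -> str:
    """Label a sample by the most likely reason for its clock state.

    margin_c is an assumed tolerance around TjMax, not a measured value.
    """
    if sample.multiplier >= TARGET_MULT:
        return "ok"
    if sample.max_core_temp_c >= TJMAX_C - margin_c:
        return "cpu_thermal"
    # Clocks dropped, but the CPU is not near TjMax: look elsewhere
    # (current limits, VRM overtemperature, and so on).
    return "unexplained"


# Made-up samples resembling the behavior described above.
log = [Sample(45, 84.0), Sample(12, 86.0), Sample(45, 83.0), Sample(12, 104.0)]
labels = [classify(s) for s in log]
print(labels)
```

A run of "unexplained" labels is the cue to check something other than core temperature, which is what pointed us at the VRM.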
Kyle’s Delidding Results (Sealed vs. Unsealed IHS)
We’re using Thermal Grizzly Conductonaut for Kyle’s delidded CPU and configuring it with the stable overclock from our own 7980XE: 4.5GHz/1.24VID, which is a bit on the high side. We haven’t validated Kyle’s CPU for maximum overclock and minimum voltage – we’ll leave that up to him.
Starting with the delid results from our 4.5GHz overclock at 1.24VID, Kyle’s 7980XE with stock TIM had several cores bumping into TjMax at 104C, causing clocks to drop to a multiplier of 12x. We were not able to sustain the overclock during this test with our X62 at max speeds. We’d either need a better cooler or lower clocks and voltage.
After a delid, our heavy silicone adhesive attempt – wherein we used the adhesive on both layers and on all sides of the IHS – had us at TjMax again. Even with the liquid metal, it wasn’t any better.
Manually spreading the adhesive into a thin film on just the top layer of the substrate resulted in 83C average core temperature, resolving our clock throttling issue completely. The overclock is now stable and operates at 45x constantly.
As for the GN parts, we were at 75C with our unsealed IHS. This was with a different chip, liquid metal application, and without any sealant.
With Prime95 fixed at 3.6GHz and 1.15VID, the GN TIM unit operated at an all-core average of 89.5C, peaking at 90C, with the unsealed liquid metal variant at 72.4C, resulting in a 17C improvement. Kyle’s TIM unit ran out-of-box thermals of 86.6C, peaking at 89C – more or less what we saw. With a light sealant in only the pertinent areas, we managed a 76C average and 77C peak, for a peak-to-peak drop of 12C. Keep in mind that these are two separate CPUs, so we cannot definitively state that the sealant caused the 5C difference. It would be disingenuous to do so. What we can state is that a combination of these factors resulted in a 5C delta, and that Kyle’s CPU still benefits from a 12C improvement with liquid metal and a light re-seal.
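The arithmetic behind those deltas can be laid out as a short sketch. The temperatures are the ones reported above; the variable names are ours for illustration, and reading the 5C figure as the gap between the two improvements (17C vs. 12C) is an assumption about how that number is derived.

```python
# Prime95 at 3.6GHz / 1.15VID, temperatures in degrees C (from the text above).
gn_tim_avg = 89.5           # GN chip, stock TIM, all-core average
gn_lm_unsealed_avg = 72.4   # GN chip, liquid metal, no sealant
kyle_tim_avg = 86.6         # Kyle's chip, stock TIM, all-core average
kyle_tim_peak = 89.0        # Kyle's chip, stock TIM, peak
kyle_lm_sealed_avg = 76.0   # Kyle's chip, liquid metal, light sealant
kyle_lm_sealed_peak = 77.0  # Kyle's chip, liquid metal, light sealant, peak

# Improvement on the GN chip (average to average).
gn_drop = round(gn_tim_avg - gn_lm_unsealed_avg, 1)

# Improvement on Kyle's chip, peak to peak and average to average.
kyle_peak_drop = round(kyle_tim_peak - kyle_lm_sealed_peak, 1)
kyle_avg_drop = round(kyle_tim_avg - kyle_lm_sealed_avg, 1)

# Assumed reading of the ~5C figure: the gap between the two improvements.
improvement_gap = round(gn_drop - kyle_peak_drop, 1)

print(gn_drop, kyle_peak_drop, kyle_avg_drop, improvement_gap)
```

Because the two chips differ in silicon and liquid metal application as well as sealant, the gap describes the combination of factors rather than the sealant alone.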
Out of curiosity, we also tested a medium seal on Kyle’s CPU, finding that the output temperature was 78-80C, up from 76-77C. We can definitively state that the sealant caused this 2-3C leap in thermals.
Just because the rhetoric often gets out of control with delidding, let’s keep in mind that these chips are 100% operable with the stock TIM – it’s just a matter of how far you can push them. With auto settings, which introduce variance in voltage and power consumption, the CPUs run at around 50 degrees with Blender, or 50-62 with Prime95. We don’t use auto for comparison between TIM and liquid metal as it is unpredictable for testing.
When & Why Delidding Matters
You do not “need” to delid, and those who’ve started that rhetoric do not understand why testers use fixed (higher than necessary) voltages and clocks for benchmarking: We need stability and predictable performance, which means not relying on auto settings. The CPU is just fine with thermal paste, but liquid metal helps significantly in overclocking endeavors (like 4.5GHz/1.24VID). To achieve stability at these OCs without stepping out of bounds, you’d need a much bigger, louder cooler, and that’s where the delid comes in – it’ll reduce noise, cooler expense and size requirement, and very slightly reduce power consumption (about 4% for every 10C drop in thermals).
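As a rough illustration of that power figure, the sketch below applies the "about 4% per 10C" rule of thumb to the temperature drops measured earlier. Scaling it linearly is our simplifying assumption, and `estimated_power_reduction_pct` is a hypothetical helper, not a measured model.

```python
def estimated_power_reduction_pct(temp_drop_c: float,
                                  pct_per_10c: float = 4.0) -> float:
    """Estimate the power reduction (%) for a given temperature drop (C).

    Assumes the ~4% per 10C rule of thumb scales linearly, which is a
    simplification for illustration only.
    """
    return pct_per_10c * temp_drop_c / 10.0


# Drops reported in this article: 12C (Kyle's chip), 17C (GN chip).
for drop_c in (12.0, 17.0):
    pct = estimated_power_reduction_pct(drop_c)
    print(f"{drop_c:.0f}C drop: roughly {pct:.1f}% less power")
```

Under that assumption, the 12C and 17C improvements work out to somewhere around 5% and 7% respectively, which fits the "very slightly" framing.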
We've lightly resealed the CPU and have applied nail polish around the SMDs. This should hopefully prevent LM migration during shipping. Keep an eye out for Kyle’s final video. It’ll go live on his YouTube channel.
Editorial, Testing: Steve Burke
Video: Andrew Coleman