Do GPUs Wear Out From Heavy Use?



If you’re planning on doing intensive gaming, GPU computing, graphics rendering, Folding@home, or crypto mining on your graphics card, you might be worried that your GPU will wear out from heavy use. But will it? We’ll investigate.

Yes, But It’s Complicated

Most information about lifespans of graphics cards you’ll find online is anecdotal, with numbers that can vary dramatically depending on whom you ask. With hundreds of different models of graphics cards released over the past decade, it’s hard to boil down data on such wildly different cards into simple generalizations.

So far, we know this: According to a 2020 report from a German retailer, most recent graphics cards have about a 2-5% failure rate (measured in returns to the retailer) overall. And in 2021, Nvidia still provided driver updates for cards that were around 9-10 years old (such as the GTX 600 series), so you can possibly expect a decade of use out of a well-treated GPU card—although those might be outliers, as we’ll see ahead.

Regardless of the numbers, there’s some hard physics at work. The materials and components used in the composition of GPU cards aren’t magical: The more you use them, the faster the parts degrade, and the more likely they will fail completely. So heavy use does affect lifespan.

Several GPU cards in a crypto miner.
As you’ll see, crypto mining will decrease the lifespan of a graphics card. socrates471/Shutterstock.com

Whether you’ll see a failure in your GPU card depends on wildly different variables, including exactly how heavily the GPU has been used, the nature and degree of temperature swings in the circuitry, how many times the card has been powered on and off, and how clean the operating environment is.

Because a GPU card is a complex device with many parts, each one can fail or degrade in different ways. We’ll go through several major parts of a GPU card and examine how they might wear out from heavy use over time.

First to Go: Cooling Fans

The parts of a graphics card most likely to fail first are the cooling fans (or fan), which are physical moving parts. Fans keep your GPU cool by blowing air across a heat sink that draws heat away from the GPU chip, so the chip can keep operating.


Why is heat bad? With enough heat, transistors don’t work properly, which means the GPU card won’t function. With even more heat, the transistors in chips on the card can be permanently damaged.

Over time, cooling fans often clog up with dust, reducing their ability to move air efficiently. Or the fans might fail completely if an internal lubricant breaks down. Either scenario will raise the temperature of the GPU.

Every GPU protects itself from overheating by using thermal throttling, which slows down the operation of the GPU to lower the operating temperature. Doing so severely limits performance. So if you have a GPU that’s suddenly noisier than usual (the fan is spinning faster) or performing worse, thoroughly clean your GPU’s cooling fans and heat sink with compressed air.
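If you'd rather check temperatures than guess, here is a minimal Python sketch for Nvidia cards. It assumes the `nvidia-smi` command-line tool that ships with Nvidia's driver is installed and on your PATH. The helper names (`parse_gpu_stats`, `read_gpu_stats`, `looks_hot`) and the 83°C warning threshold are our own illustrative choices, not official limits, so check your card's documentation for its real throttling point.

```python
import subprocess

# Fields requested from nvidia-smi: GPU core temperature (Celsius) and fan speed (%).
QUERY_FIELDS = "temperature.gpu,fan.speed"


def _to_int(field):
    """Return the field as an int, or None when nvidia-smi reports e.g. '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text):
    """Parse 'csv,noheader,nounits' output: one line per GPU, 'temp, fan'."""
    stats = []
    for line in csv_text.strip().splitlines():
        temp, fan = line.split(",")
        stats.append({"temp_c": _to_int(temp), "fan_percent": _to_int(fan)})
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return the parsed stats (needs an Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)


def looks_hot(stat, limit_c=83):
    """Flag a GPU at or above limit_c. The default is an assumed, illustrative value."""
    return stat["temp_c"] is not None and stat["temp_c"] >= limit_c


# On a machine with an Nvidia card, you could run:
#     for gpu in read_gpu_stats():
#         print(gpu, "HOT" if looks_hot(gpu) else "ok")
```

If the reported temperature keeps climbing under the same workload over weeks or months, that's a hint that dust buildup or a tired fan is hurting cooling.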

If a GPU cooling fan has failed completely, you can usually replace it if you can find an equivalent fan from a computer parts supplier.


Another Suspect: Faulty Thermal Compound

Between every heat sink and GPU chip, there is a layer of thermally conductive material, such as a pad, putty, or paste, that helps transfer heat from the GPU chip to the heat sink.

Over time, thermal paste can dry out, crack, or otherwise lose effectiveness. When that happens, the heat sink doesn’t cool as effectively, and the GPU temperature will rise. As we’ve seen in the fan section above, high GPU temps result in thermal throttling, which will slow down your GPU.

The best fix in that scenario is replacing the thermal paste yourself. You can buy thermal paste from computer parts sellers.

Failures in Other Components, Solder

Aside from the GPU chip, a graphics card will include dozens of other electronic components such as capacitors, resistors, memory chips, and more. Any of those could potentially fail from heavy use or exposure to too much heat. Some are more likely to fail than others.


Capacitors in particular are prone to failure over time. They’re sensitive to frequent temperature changes, and some are defective when first produced. If you’re handy enough to troubleshoot capacitor issues, you can potentially replace bad capacitors on a GPU card if you can find equivalent replacement parts.

Also, the solder that bonds chips and components to your GPU card’s circuit board can age and crack over time from frequent temperature shifts, rough physical handling, improper storage, or running too hot. So yes, heavy GPU usage could increase the risks of solder joint failure. Repairing bad solder joints can be technically difficult, but it’s not impossible.

Failures in the GPU Chip Itself

So the question remains: Can a GPU chip eventually wear out from heavy use? The answer is yes, theoretically, under extreme circumstances. But you’ll likely see the failure of another component on the graphics card long before that time.

The GPU chip on your graphics card contains millions or billions of transistors, etched into a piece of silicon. Transistors age over time, affecting their performance. When enough transistors misbehave, the chip will fail.

According to Semiconductor Engineering, there are several major reasons why transistors malfunction as they age (one of which is heat), and errors become more likely as the feature size on the chip shrinks. Experts suspect computer chips made today won’t last as long as chips made in the 1990s, but predicting an exact lifespan is still guesswork since the technology is so new.


Currently, Nvidia does not publish MTBF (mean time between failures) estimates for its consumer graphics cards, but the company does publish them for some of its industrial and business graphics accelerators. For example, the datasheet for the Tesla K20X GPU Accelerator cites the MTBF for the card (at 35°C/95°F) as 14.7 years for an “uncontrolled environment” and 23.8 years for a “controlled environment.” (Note that, generally, industrial graphics hardware is expected to be more robust and hold up better under heavy use than consumer graphics hardware.)

Interestingly, we can compare this theoretical number with hard data from the field. One of the few empirical studies of GPU lifespan is a 2020 paper titled “GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability,” authored by researchers at Oak Ridge National Laboratory. The paper reports on the reliability of the 18,688 Nvidia K20X Kepler GPU cards used in the now-retired Cray XK7 Titan supercomputer over a period of almost 7 years (2012-2019).

The Cray XK7 Titan Supercomputer
The Cray XK7 supercomputer provided valuable data about GPU lifespan. ORNL

After some initial hiccups due to connection issues, the researchers found relatively high reliability with the XK7’s graphics cards until 2016 (about 3-4 years in), when many began to fail. But guess what? They traced most of the failures in the first batch of cards (before replacement) to a faulty resistor on the graphics card’s circuit board, not the GPU chip itself. Overall, the study’s authors found the MTBF of the heavily used K20X GPU cards to be around 3 years (not 14-23 years, as cited in Nvidia’s datasheet), with some of the hottest cards in the core failing first. They concluded, “GPU reliability is dependent on heat dissipation.”
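To get a feel for what those MTBF figures mean, here is a rough back-of-the-envelope sketch in Python. It assumes a constant failure rate (the simple exponential model), which is our simplification and not a method taken from the datasheet or the study. Real hardware tends to fail more often when brand new and again when old, so treat the results as ballpark only. The function name `failure_probability` is ours.

```python
import math


def failure_probability(mtbf_years, years):
    """Chance that a card fails within `years`, assuming a constant failure rate.

    Under that (assumed) exponential model: P = 1 - exp(-years / MTBF).
    """
    return 1.0 - math.exp(-years / mtbf_years)


# MTBF figures quoted in the article (the labels are ours).
scenarios = [
    ("Datasheet, controlled environment", 23.8),
    ("Datasheet, uncontrolled environment", 14.7),
    ("Titan study, heavy use", 3.0),
]

for label, mtbf in scenarios:
    one_year = failure_probability(mtbf, 1)
    five_years = failure_probability(mtbf, 5)
    print(f"{label}: {one_year:.1%} within 1 year, {five_years:.1%} within 5 years")
```

Under this simplified model, a 3-year MTBF works out to roughly a 28% chance of failure in any given year, compared with about 7% for a 14.7-year MTBF. Note that an MTBF of 3 years does not mean every card dies at the 3-year mark.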

So the odds are high that if you use your graphics card as intensely as one of the world’s largest supercomputers (at the time), it will wear out faster, and that other components such as fans and resistors will fail long before the GPU chip itself. Exactly how long you’ll get depends on factors that we can’t predict.

Ultimately, Heat is the Enemy

In the end, according to every source we’ve read, the biggest factor in how long a GPU card will last is how hot it runs. The hotter the card, the faster all of its components degrade. Also, the hotter the card, the more it throttles down in performance to prevent catastrophic failure. Good cooling both extends your card’s lifespan and increases its performance.

So whether you’re mining crypto or gaming, if you keep your GPU card reasonably cool with clean, working fans and effective thermal paste, you’ll likely have a high-performing card that, if you’re lucky, might last until it becomes obsolete and you upgrade.

If you’re planning on buying a used GPU, you should definitely take its history into account, including how its owner treated and used it. Heavily used cards that work now will likely keep working in the short term, but they are more prone to failure in the long term. We can’t put any exact number on a card’s lifespan, but heavy use definitely wears graphics cards out faster.

Good luck!





