Let's talk about Nvidia's RTX 3080 CTD Issues - Are SPCAPS and MLCCs to blame?

These Crash To Desktop issues are a lot more complicated than you think

It should be common knowledge by now that some RTX 3080 users are experiencing stability issues, with Nvidia's latest Ampere graphics cards crashing to desktop (CTD) when playing certain games at high clock speeds.

Reports of stability issues have come from various sources and affect a range of custom RTX 3080 designs. Right now, we know that these issues cannot be blamed on a single manufacturer or GPU model, which makes them a lot harder to diagnose than many realise.

These stability issues are not a universal problem, so don't expect every RTX 3080 graphics card to be affected.

What's the problem? Why are RTX 3080 GPUs crashing?

The problems facing some RTX 3080 users sit firmly within the realm of electrical engineering. To say that modern graphics cards are complicated is an understatement, and we do not have the expertise to explain the RTX 3080's stability issues in great detail. Even those with degrees in electrical engineering will tell you that power delivery within modern computers is an incredibly complex field of study.

The CTD issues facing Nvidia's RTX 3080 graphics cards can be pinned on power delivery, but saying that alone oversimplifies the matter. So many factors can impact GPU power delivery that blaming a single component or factor is unhelpful. While much of the conversation surrounding these issues has come down to SPCAPS and MLCC arrays, there are other factors at play.

Igor's Lab has written an excellent piece describing the electronics behind this issue in great detail. However, we will note that factors other than capacitor selection can impact power delivery. The stability of your power supply is a factor, as are the power circuitry of your graphics card, its intended clock speeds, Nvidia's GeForce drivers and the quality of your Ampere GPU's silicon.

While most of the conversation surrounding this issue is focused on SPCAPS vs MLCC arrays, remember that these other factors matter.

Are SPCAPS to blame? 

Recent statements from Nvidia board partners like EVGA have implied that POSCAPs/SPCAPS can be blamed for RTX 3080 stability issues. On its forums, EVGA has confirmed that "a full 6 POSCAPs solution cannot pass the real world applications testing" for its custom RTX 3080 models. The implication is that SPCAPS are not good enough to supply Nvidia's RTX 3080 with clean power, but the matter is a lot more complicated than that.

Remember that there is no single SPCAPS or MLCC component that GPU manufacturers use. Manufacturers can select from thousands of different capacitors depending on their power requirements. Each of these capacitors has different electrical characteristics and is therefore best suited to different applications. Some GPU manufacturers may have selected better SPCAPS or MLCCs for their chosen tasks, making component selection a vitally important aspect of product design.

Depending on the capacitors that Nvidia's AIB partners have selected, some RTX 3080 designs may face more issues than others. That said, RTX 3080 graphics cards that use more MLCCs appear to be more resilient to these stability issues. Other design elements of a graphics card will also have an impact, as will the quality of your GPU's silicon.

Things are a lot more complicated than you think!

 
Not all graphics card silicon is created equal. When transistor sizes are measured in nanometers, it is understandable that the chips within modern graphics cards and processors cannot be manufactured with complete uniformity or without any defects. Some chips will require more power than others, while some will be able to run at higher core clock speeds when pushed to their limits. These factors play a part in the RTX 3080's stability issues, as more power-hungry RTX 3080 chips will place more strain on the graphics card's power circuitry at high clock speeds, potentially causing these CTD issues.

Some RTX 3080 users with supposedly sub-par 6-SPCAPS configurations could face no issues if the quality of their RTX 3080 silicon is high enough, while those with more power-hungry chips could face issues despite using supposedly stronger MLCC arrays. Several factors are at play here, and which one takes the blame depends on how you look at the situation.

Software can also play a role in these issues. Nvidia's AIB partners informed us that Nvidia planned to increase RTX 3080 stability with a new driver update, and Nvidia has now released its GeForce 456.55 WHQL driver, which "improves stability" of RTX 30 series graphics cards. Both software and hardware play a role when it comes to hardware stability.



Where does the blame lie? 

As we have stated before, RTX 3080 stability issues can be blamed on a number of factors. Pinning these issues on a single company or decision does a disservice to everyone involved, given the complexity of these products and how many factors impact both GPU performance and stability.

Perhaps Nvidia has been too aggressive with the RTX 30 series' power and frequency targets, as small underclocks have eliminated the RTX 3080's crashing issues for many consumers. On the other hand, Nvidia could have given its AIB partners more time to put their designs through quality control, or given them access to its full driver stack earlier so that real-world applications could be tested at an earlier date.

Several aspects of Nvidia's RTX 30 series launch appear to have been rushed, and this has created time constraints that allowed problems to slip under the radar of several GPU manufacturers. Igor's Lab has claimed that Nvidia didn't give its AIB partners full access to "suitable drivers" at an early enough stage to enable full QA testing for their custom PCBs. More extensive QA testing could have prevented these RTX 3080 stability issues entirely.

My RTX 3080 is crashing. What should I do?   

Earlier today, Nvidia released its GeForce 456.55 WHQL driver. Nvidia claims that this driver "improves stability" of RTX 30 series graphics cards in "certain games", allegedly addressing RTX 3080 stability concerns. That said, we do not know if this fix will address stability for all RTX 3080 users, or how this driver achieves higher levels of stability.

Owners of unstable RTX 3080 graphics cards have reported that lowering their GPU's clock speeds by around 100 MHz, using tools like MSI Afterburner and EVGA Precision X1, addresses their crashing issues. If Nvidia's RTX 3080 crashing issues are not addressed by the company's latest drivers, consumers should contact the manufacturer of their graphics card for support. We do not consider underclocking to be a true solution to this problem, but it should allow your graphics card to function until Nvidia and its partners have fixes and full hardware support plans in place.
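
To put an underclock of around 100 MHz into perspective, a quick back-of-the-envelope calculation helps. The sketch below is illustrative only. The 1710 MHz figure is Nvidia's reference boost clock for the RTX 3080, while the 1950 MHz observed boost clock is a hypothetical value chosen for the example, not a measurement. The function name is also our own.

```python
# Rough arithmetic for a ~100 MHz underclock. Illustrative values only.
RATED_BOOST_MHZ = 1710     # Nvidia's reference boost clock for the RTX 3080
OBSERVED_BOOST_MHZ = 1950  # ASSUMPTION: hypothetical in-game boost clock
UNDERCLOCK_MHZ = 100       # the approximate offset reported by affected users


def underclock_summary(observed_mhz, offset_mhz, rated_mhz):
    """Summarise what a clock offset costs relative to the observed boost clock."""
    if observed_mhz <= 0 or offset_mhz < 0:
        raise ValueError("clock must be positive and offset non-negative")
    new_clock = observed_mhz - offset_mhz
    return {
        "new_clock_mhz": new_clock,
        # Share of the observed clock speed that is given up.
        "clock_loss_pct": 100 * offset_mhz / observed_mhz,
        # How far the underclocked card still sits above its rated boost clock.
        "margin_over_rated_mhz": new_clock - rated_mhz,
    }


summary = underclock_summary(OBSERVED_BOOST_MHZ, UNDERCLOCK_MHZ, RATED_BOOST_MHZ)
print(f"New clock: {summary['new_clock_mhz']} MHz")
print(f"Clock speed given up: {summary['clock_loss_pct']:.1f}%")
print(f"Margin over rated boost: {summary['margin_over_rated_mhz']} MHz")
```

Under these assumed numbers, the card gives up roughly 5% of its clock speed and still runs above its rated boost clock. Real cards will boost differently, and a change in clock speed does not translate one-to-one into a change in frame rates.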

These RTX 3080 stability issues don't impact all GeForce customers, and it is possible that these early issues could be addressed entirely in software. There are still a lot of unknowns surrounding this story, so we expect more detailed information to become available at a later date.

At this time, several Nvidia AIB partners are expected to alter the PCB designs of their custom RTX 3080 models to help address these stability concerns. That said, it is still possible that Nvidia could address these issues in software, making radical design changes unnecessary. 

You can join the discussion on Nvidia's RTX 3080 CTD issues on the OC3D Forums

Most Recent Comments

28-09-2020, 14:12:27

AlienALX
No I don't think so. And thanks Mark for starting this thread. I just got done posting on another forum, and here are my thoughts.

It wasn't about the capacitors in the first place. If the filtering and noise was becoming an issue (whereas it wasn't on Turing etc) then it is because the core is not stable. Once again people jumped the gun in order to claim a world's first, when the real issue was nothing to do with it.

For some reason the driver itself, without any software, is boosting the cards faster than they are rated at. Quite probably to make them look better in reviews, IE let them automatically overclock and then you don't have to worry about reviewers marking them down by 10% and then adding it in with overclocking to make people think "Well I don't overclock so to me it's only going to be 20% faster". The problem though, just like with the 5600XT after launch driver is Nvidia have written cheques they can't cash. IE, not every single 3080 die out there is going to be able to do what the drivers are telling them to do without running into stability issues. And that was your problem, not capacitors or filters or anything else.

The fact is that some cores are going to need more power. 10W more it turns out, and lower boosts (they have been lowered to 1930 IIRC from 1970 odd). Quite why they are doing this when the box rated is 1710 I don't know. However what I do know is that 1900+ MHz is 10% more than 1700+ MHz. Especially in your benchmark scores.

It's why I have had my cards under water ever since the air coolers simply couldn't cope any more. I hated leaving that sort of performance and more wasted blowing around in a haze of hot air.

The TGP was already absolutely terrible. However, it would have sounded even more terrible with another 10W slapped onto it.

28-09-2020, 14:47:42

CaTcHmG
Sorry but undervolting and underclocking should not be THE fix, it clearly states BOOST, so why pointing to reference speeds on the box when it clearly states cards can boost???

Another factor concerning cards that are going to be doing the rounds in the next few months that are going to be in the market as STAY AWAY 🤫 maybe give the 3080 series a miss 👍

28-09-2020, 15:02:13

WYP
Quote:
Originally Posted by CaTcHmG
Sorry but undervolting and underclocking should not be THE fix, it clearly states BOOST, so why pointing to reference speeds on the box when it clearly states cards can boost???

Another factor concerning cards that are going to be doing the rounds in the next few months that are going to be in the market as STAY AWAY 🤫 maybe give the 3080 series a miss 👍

Underclocking is not a fix, but it should allow affected users to get their cards working in the meantime. It may take a while for AIBs to be ready to replace cards or for Nvidia to fix things with drivers etc.

Companies still need to react to this, and that takes time. In the meantime, RTX 3080 owners can underclock their GPUs.

Quote:
Originally Posted by AlienALX
No I don't think so. And thanks Mark for starting this thread. I just got done posting on another forum, and here are my thoughts.

It wasn't about the capacitors in the first place. If the filtering and noise was becoming an issue (whereas it wasn't on Turing etc) then it is because the core is not stable. Once again people jumped the gun in order to claim a world's first, when the real issue was nothing to do with it.

For some reason the driver itself, without any software, is boosting the cards faster than they are rated at. Quite probably to make them look better in reviews, IE let them automatically overclock and then you don't have to worry about reviewers marking them down by 10% and then adding it in with overclocking to make people think "Well I don't overclock so to me it's only going to be 20% faster". The problem though, just like with the 5600XT after launch driver is Nvidia have written cheques they can't cash. IE, not every single 3080 die out there is going to be able to do what the drivers are telling them to do without running into stability issues. And that was your problem, not capacitors or filters or anything else.

The fact is that some cores are going to need more power. 10W more it turns out, and lower boosts (they have been lowered to 1930 IIRC from 1970 odd). Quite why they are doing this when the box rated is 1710 I don't know. However what I do know is that 1900+ MHz is 10% more than 1700+ MHz. Especially in your benchmark scores.

It's why I have had my cards under water ever since the air coolers simply couldn't cope any more. I hated leaving that sort of performance and more wasted blowing around in a haze of hot air.

The TGP was already absolutely terrible. However, it would have sounded even more terrible with another 10W slapped onto it.

Nvidia's last couple of GPU generations have had GPUs go far past their rated boost clocks. That's just how GPU Boost 2.0 and newer work.

As far as the capacitors go, it's a multitude of factors. If Nvidia weren't as aggressive with their clocks, this wouldn't be a problem. If Nvidia had given their AIB partners more time to test their designs and given them full driver access earlier, this wouldn't be a problem. Stricter component guidelines may also have helped matters.

There are a lot of ways that Nvidia could have avoided this, and a lot of them involve Nvidia taking their time with this launch. Ampere feels rushed.

Like lots of Intel's recent offerings, Nvidia has pushed hard on the Voltage-Frequency curve. There's a reason why Ampere's real world performance/watt improvements over Turing aren't that substantial. To me, it sounds like Nvidia feels threatened by RDNA 2.

28-09-2020, 15:02:44

Greenback
Quote:
Originally Posted by CaTcHmG
Sorry but undervolting and underclocking should not be THE fix, it clearly states BOOST, so why pointing to reference speeds on the box when it clearly states cards can boost???

Another factor concerning cards that are going to be doing the rounds in the next few months that are going to be in the market as STAY AWAY 🤫 maybe give the 3080 series a miss 👍

If it's on the box as 1710 and goes up to 1710.5 every 4 hours it can boost, can is a very grey area.

28-09-2020, 15:10:17

Warchild
I still think this was all known before launch, and part of the reason why stock is so bad. Once you realise the issue, and you already had XX amount of stock on its way to sellers, you aren't going to keep making more, if you think Nvidia can resolve it through drivers or not.

If Nvidia fix it via drivers then AIBs can continue to use cheaper components. Otherwise a small redesign on the PCB is needed.

Regardless of it all, a driver fix is just a workaround solution and no way on Earth should any owner of a 3080/3090 have to settle for this without entitlement to an RMA. It does not matter what is on the box. If you are allowed to overclock it then you should be able to push it to its limits without hitting issues because of component instability. This is not the same as silicon lottery.