# Replaced 3950X with 5950X = WHEA and reboots



## Deepcuts

Hello,

*Please vote in the poll only if your system is not stable with BIOS defaults, memory at 2133 MHz without XMP, without any CPU or RAM overclocking, without PBO or any voltage tweaks and, of course, if you do not have any issues with your Ryzen 5000 or your problem has been fixed.*
_* you can select 2 values. 
Motherboard+CPU if you have issues. 
No, I tested extensively for several days+CPU if you do not have issues._
_It did, but *+CPU if your issue has been fixed._

*See **https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/post-28698010** for the solution to my issue.*


I bought the new AMD Ryzen 5950X to replace my AMD Ryzen 3950X.
This is the only new component in the system. The rest of the components are in the signature.


*Problem*

As soon as I booted up to Windows, the system started rebooting and crashing, sometimes with the BSOD WHEA Uncorrectable Error.

*What I tried*

*Long story short:*
I have replaced every component except the CPU and the motherboard.

*Long story:*
removed all RAM sticks and tested with only one at a time in different memory slots.
tested with memory at 2133 MHz auto timings, XMP and manual timings without XMP.
took out my RAM and tested with one stick of G-Skill F4-2400C15S-8GNS and one KIT of 2 sticks Corsair CMK8GX4M1A2400C16.
replaced the PSU with a Corsair AX760i
removed any other USB devices besides mouse and keyboard
tested with only a Bluetooth mouse. No other USB connected.
removed any other HDD and SSD besides the system/windows one.
replaced the system/Windows SSD and tried reinstalling Windows. Crashes while installing.
removed the CPU to check for bent pins with a magnifying glass. Twice. All good.
downgraded BIOS to version F30.
re-flashed BIOS version F31e.
upgraded BIOS to F31h, F31i, F31k, F31l, F31n, F31o, F31
cleared CMOS and tried booting without setting anything in BIOS.
booted Ubuntu 20 Desktop live USB. Crashes before desktop with some cryptic error about CPU.
checked CPU and motherboard temperatures. All fine.
reseated the GPU.
tested with an RX 460 GPU instead of GTX 1080 ti.
tested with an RX 590 GPU instead of GTX 1080 ti. Takes longer to crash than with the GTX 1080 ti on BIOS version F31n.
disabled C-States
disabled HPET-Timer
forced PCIe to gen 2/3/4
disabled AMD Cool&Quiet
disabled PBO (always have it on Auto anyway)
removed all SSDs and HDDs and tried booting from Ubuntu live USB
tried all levels of LLC
Enabled Preferred Cores


*Temporary fix*

After many failed attempts with various BIOS settings, the only one that fixes this problem is setting "Core Performance Boost" to disabled. Of course, with this setting disabled, this new CPU performs a lot worse than the old 3950X. With "Core Performance Boost" disabled, I can run my RAM at 3600 and IF/UCLK at 1800 with tight timings without any problems. 300+ Handbrake CPU stable encodes so far.

With F31h Windows no longer crashes at boot, but it still crashes under load or randomly at idle like before.
The fastest way to crash the system is to run the AIDA64 memory copy benchmark (crashes when the CPU reaches 100% usage), a Handbrake encode (crashes as soon as it starts encoding) or a game (Guild Wars 2 crashes at the login screen).

Opened a ticket with Gigabyte, but knowing Gigabyte, their response will be "We will inform our engineers" and then silence.
Opened a ticket with AMD. No response. Received an email requesting some details. Still waiting. Received another email requesting details already sent in the original RMA ticket. I guess AMD support and Gigabyte support are outsourced at the same helpdesk. RMA accepted after 5 weeks: my reply to AMD.


Anyone else having problems with the new 5950X and Core Performance Boost?

Thank you.


----------



## Korital

I had the exact same problem on the aorus master x570. What fixed it for me was I had to manually set the ram timings, ram dram voltage and vcore soc, and also manually select the proper infinity fabric frequency. I think there is some type of ram bug with the latest bios.

See if that helps you.


----------



## Deepcuts

What value for VCORE SOC are you using now?


----------



## Marucins

Same as with me.

Swapped 3950X to 5950X Aorus Extreme motherboard. Now the system is rebooting at any time.

Latest BIOS, Windows 10 (20H2), Chipset Driver, AMD Ryzen Master, etc.
Settings for XMP memory, I also checked for OC (frequency, ending with voltages through timings).

The same!!! 
Restart at sudden undefined moments.

I did not move the voltage on the CPU. I always have an AUTO.


----------



## PraiseKek

Bios issues


----------



## Marucins

No more GIGABYTE

I will change the BIOS to the older one - F30 - reviewers did not have such a circus.


----------



## Marucins

FFFUUU..... Nothing has changed.


----------



## Deepcuts

It seems other brands are having the same issue. MSI released a new BIOS for some boards that seems to solve the problem for some users.
Meanwhile, Gigabyte did not even reply to my ticket, not even with their usual copy-paste standard stuff. /s

Question is: nobody is testing these CPUs on actual motherboards these days? How the f do they release support for new CPUs with new BIOS versions and not test it for 5 minutes to see if the thing boots at least.

later edit: 
finally some good news

/s

I have 5 support tickets ever opened with Gigabyte. 100% of their replies are identical to the above one. And then, nothing.


----------



## cstkl1

the amd experience still strong.

wonder will that experience double with 6800xt..


----------



## Marucins

*Deepcuts*, write me what you wrote to them. I will also report from myself.


----------



## Deepcuts

Marucins said:


> *Deepcuts*, write me what you wrote to them. I will also report from myself.


What I wrote to Gigabyte is the exact text from the 1st post, minus the fact that I have tested with another GPU,RX460 and of course, without the last 3 lines.


----------



## MoW

Hi to all. Just registered so I could share that I too am having stability issues with the 5950X and the Aorus Xtreme. It randomly BSODs with WHEA errors on both BIOS F30 and F30E with stock settings, and also with XMP settings. I still have my 3950X, and it works well without BSODs on the Xtreme.
I have raised a ticket with Gigabyte; hopefully they will address the issue with a BIOS update.
I would urge those having issues to raise it with Gigabyte.


----------



## Deepcuts

MoW said:


> I would urge those having issues to raise it with Gigabyte.


Sorry to hear so many have this issue.
Thank you for writing to Gigabyte. Hope more people writing about this issue will expedite things with them.


----------



## Blackfyre

WOW, reading these issues just makes me want to wait. And this is a ~$1200 motherboard in Australia. You'd expect the highest end boards to get the fastest release for fixes. But at least in Australia we are protected by Consumer Laws, basically I would have 0 problems returning the board and getting a full refund if it had these issues. Sadly in other countries you have to contact Gigabyte, here it's not my job to contact Gigabyte, the store you buy from has to deal with them.


----------



## Deepcuts

F31g same situation.
Do not bother.


----------



## Marucins

I wrote ticket to the support... 

Did you check the new BIOS with the old CPU (ZEN 2)?


----------



## Deepcuts

Marucins said:


> I wrote ticket to the support...
> 
> Did you check the new BIOS with the old CPU (ZEN 2)?


Was using F31e with 3950X without any problems.
Cannot check F31g or newer because I gave the 3950X to a friend.


----------



## 84stangman

Hey, have you tried the original chipset driver from AMD's Site?


----------



## Marucins

*Deepcuts*, me too.

F31g doesn't correct the problem with the computer restarting :\

*84stangman,* of course.


----------



## MoW

Anyone tried F31h?


----------



## Marucins

*MoW*, I'm at work now. I will check it out after 4 p.m.

*Deepcuts*, can you check the new BIOS? Your specification is the closest to my rig (CPU, motherboard, RAM).


----------



## Deepcuts

Trying now.
Don't hold your breath.


----------



## Deepcuts

Updated to F31h.
Cleared CMOS via the back IO button for good measure.
Entered UEFI and noticed FCLK is 104.4 MHz. Weird. Never seen it so high on auto. Set it to manual 100.
Rest on default.
Problem remains. Windows crashes, Ubuntu live does not even boot.

Set VCORE SOC to 1.000 and I could boot Windows, but it crashes under Aida64 memory copy and most games.
Tried almost all combinations of VDDG, VDDP, VCORE SOC, IF, RAM Speed, LLC, etc.
The higher the VCORE SOC, the more likely it is that Windows will not even boot.
No matter the combo, Aida64 will crash on the copy benchmark, as will most games. It will crash 100% of the time at idle after a while, or during plain internet browsing, with a WHEA uncorrectable error BSOD.

The only setting that gets this rig stable is setting "Core Performance Boost" to disabled.
With it disabled, I can set everything else on Auto and will work just fine.

On the upside, somehow it seems F31h is very stable when changing various voltages, like VDDP/G.
Up until now, a lot of times when changing anything voltage related, the board will beep twice and reboot 3 times. Somehow F31h fixes this for me.


----------



## Tweedilderp

Blackfyre said:


> WOW, reading these issues just makes me want to wait. And this is a ~$1200 motherboard in Australia. You'd expect the highest end boards to get the fastest release for fixes. But at least in Australia we are protected by Consumer Laws, basically I would have 0 problems returning the board and getting a full refund if it had these issues. Sadly in other countries you have to contact Gigabyte, here it's not my job to contact Gigabyte, the store you buy from has to deal with them.


I have the same board and live in Queensland, and am expecting my 5950X in a few days by post. I am relishing troubleshooting as just doing a fresh install with my 3900X has caused nothing but headaches.

What I suggest to everyone here is to check a few things: if you're running a PCIe 4 card, make sure TBT and 10-bit tag support are ON. Also make sure gear down mode is enabled in every option in the BIOS and switch PCIe mode to auto. 10-bit tag support is used in PCIe 4, IIRC, so if it's disabled that would likely cause reboots.

While in the BIOS settings, go into the power config (in the middle tab of the BIOS) and try setting HPET (high precision event timer) to OFF to see if it helps. I found turning mine on after changing dynamic ticks in Windows helped a lot, but if you haven't touched that then maybe turning it off will help. I am not sure if it's the new chipset drivers as well, but I have found my 2080 Ti (yes, still waiting on 3090 stock D:) is getting a lot of high DPC latency spikes even with "Prefer maximum performance" enabled in the control panel. Apparently this is due to Windows 10 now requiring/preferring the DCH version of drivers.

NVIDIA has those drivers but you will need to use their advanced driver search, NOT GFE or their usual web portal. You have to disconnect the net/disable NIC and reboot in safe mode with DDU to uninstall the standard drivers and then reboot again into safe mode and install the DCH version of the drivers and then REBOOT AGAIN into regular OS and get the NVIDIA control panel from the windows store.....ugh such a mouthful to explain driver updates in 20H2 now.....

I am about to do exactly that but apparently that has solved a lot of stuttering for people with using 20H2 and nvidia drivers. I also suggest to watch FR33THY on youtube, he has an awesome introduction and optimisation guide to 20H2 and has multiple google drive folders for all the presets and batch files he uses. His A-Z guides are something to behold but I swear the guy is like a chihuahua on coke, can't sit still for 1 second he is always tweaking hahaha. Really it's what I will be like once I get this new hardware up and running and I am on the modafinil playing FPS' til the cows come home.

OHHHHHH and another thing to try with on/off in the BIOS is maybe the spread spectrum option under the core multiplier. The changes above and this one all affect the windows timers and the new 20H2 timer is still relatively unknown in its actual function unless you're an admin or super-user.....which you would charge to spend the time divulging all the info.

Anyhoo hope that helps and look forward to joining you sweaty 5950x boiz in a few days so we can all cry thermal paste tears together as we do the re-seat dance.


----------



## Tweedilderp

Deepcuts said:


> Updated to F31h.
> Cleared CMOS via the back IO button for good measure.
> Entered UEFI and noticed FCLK is 104.4 MHz. Weird. Never seen it so high on auto. Set it to manual 100.
> Rest on default.
> Problem remains. Windows crashes, Ubuntu live does not even boot.
> 
> Set VCORE SOC to 1.000 and I could boot Windows, but it crashes under Aida64 memory copy and most games.
> Tried almost all combinations of VDDG, VDDP, VCORE SOC, IF, RAM Speed, LLC, etc.
> The higher the VCORE SOC, the more likely it is that Windows will not even boot.
> No matter the combo, Aida64 will crash on the copy benchmark, as will most games. It will crash 100% of the time at idle after a while, or during plain internet browsing, with a WHEA uncorrectable error BSOD.
> 
> The only setting that gets this rig stable is setting "Core Performance Boost" to disabled.
> With it disabled, I can set everything else on Auto and will work just fine.
> 
> On the upside, somehow it seems F31h is very stable when changing various voltages, like VDDP/G.
> Up until now, a lot of times when changing anything voltage related, the board will beep twice and reboot 3 times. Somehow F31h fixes this for me.


Try my tip for spread spectrum, that affects the base clock mate, should hopefully see some stability.

If it still has the options, also try setting CPPC and CPPC Preferred Cores to enabled, and C-states to disabled, with Cool'n'Quiet ENABLED and P-state 0. Also, I believe in the advanced core settings there is a low idle voltage option; try that on and off as well, rebooting each time to test (it's annoying, but you'll thoroughly rule things out).

Those settings are also how I got 20H2 stable with my 3900X while still retaining 4.65 GHz light-threaded and 4.4 GHz all-core in gaming. A good way to check that core boosts are being assigned properly and won't get "stuck" is to open Event Viewer after you log in, go to the System section, and look for kernel entries that have "preferred core" in the details. Make sure they all have slightly different numbers like 130, 133, 142, etc. That's the chipset confirming with the OS so that boosts and cores are assigned properly.
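
A minimal Python sketch of the "slightly different numbers" check described above, assuming you have copied the per-core values out of Event Viewer by hand. The function name and the sample values are made up for illustration; it does not read the event log itself.

```python
def summarize_core_values(values):
    """Summarize per-core values copied by hand from Event Viewer.

    Returns the number of cores, the sorted distinct values, and a flag
    that is True when every core reported the same value.
    """
    distinct = sorted(set(values))
    return {
        "cores": len(values),
        "distinct": distinct,
        "all_identical": len(values) > 1 and len(distinct) == 1,
    }


if __name__ == "__main__":
    # Hypothetical values, in the style of the 130, 133, 142 example above.
    sample = [130, 133, 142, 138]
    print(summarize_core_values(sample))
```

If `all_identical` comes back true for a multi-core CPU, that would be the "stuck" case the post warns about; some repeated values among cores are not necessarily a problem.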


----------



## Deepcuts

@Tweedilderp 
Thank you for the information. Will try some of your suggestions.
Please note that setting manual FCLK to 100 disables Spread Spectrum, which I did, and also that an Ubuntu live USB crashes before the desktop as well. So slim chances that NVIDIA has anything to do with this.


----------



## Marucins

This makes me more and more convinced that we have broken processors...: GIGABYTE Latest Beta BIOS - TweakTown Forums

It is impossible that it works for one and not for another. The same CPU, the same motherboard, and the new BIOS works with him, but not with us?

Maybe check this CPU on another motherboard?


----------



## InsaneMembrane

I upgraded from a 3300X to a 5950X on the Gigabyte B550I AORUS PRO AX motherboard and get WHEA_UNCORRECTABLE_ERROR BSOD in games. What worked for me was unplugging every single USB device (including keyboard and mouse). If you have things connected to the internal motherboard USB headers you might need to unplug those too.

Usually I would BSOD within 2-10 minutes in Shadow of the Tomb Raider. After unplugging every USB device I went 2 hours without a crash and then plugged them back in and was fine. Then I quit my game, restarted Windows (keeping all USB devices plugged in) and crashed within 2 minutes of the same game. After the BSOD I noticed in Device Manager I had a "Unknown USB Device (Set Address Failed)" showing. I checked Event Viewer and saw every time I had the BSOD, Windows Error Reporting was taking a kernel dump and a USBHUB3 dump. I checked what the Unknown USB Device was (it goes back to normal after a restart) and it turns out to be the Gigabyte RGB Fusion 2.0 controller.

I have G.Skill TridentZ Neo RAM too. Is this some incompatibility with Ryzen 5000 CPUs, Gigabyte motherboards, this RAM and some USB devices?


----------



## Deepcuts

Tested with only a Bluetooth mouse. No other USB connected.
BIOS reset to factory settings and optimized defaults loaded.
Same crash with Aida64 copy benchmark and games.
It seems to be a bit more stable nevertheless, or just placebo in my case.


----------



## Deepcuts

To rule out RAM, I have tried with one stick of G-Skill F4-2400C15S-8GNS and one KIT of 2 sticks Corsair CMK8GX4M1A2400C16.
Same problem, BSOD with or without XMP loaded.
So up until now I have replaced everything but the mainboard and CPU.


----------



## ColdDeckEd

At this point I think you'd be justified in RMAing the CPU


----------



## Marucins

Or even try replacing the motherboard. Maybe one of your friends has one, or a neighbor in the area, or a colleague from work?

For me, the computer service says that it will check in 2 weeks because they have a lot of work now. Absurd.


----------



## Deepcuts

Have a couple of X570 and B550 boards I can test the CPU on, in remote offices. Will try to test there.
In my 20 years working in IT I never once bought or had to service a defective brand new CPU. I had lots of defective boards though.
All over the internet people are starting to post about issues with 5950X, 5900X and WHEA crashes and reboots. I guess if more units were available at launch, even more people would have posted.
I am 99% sure this is a firmware issue, with AMD AGESA or Gigabyte (or both)
I just hope the fix comes sooner rather than later.
Pretty disappointing to pay for a Ferrari only to find you have to keep it under 90 Km/h.
Nevertheless, will try to test the CPU on an Asus and MSI X570 and B550 and get back with details.


----------



## MoW

Got a reply from gigabyte (what ever that is worth) as per below:

As we are looking into this issue, have you done a clean install to your system before after upgrading hardware?
To our understanding, this issue is not identical through all systems. Majority of people with this setup has the system running stably.
What we are concerns is that your old operating system is not quite compatible with the 5950X processor.
We can only confirm if you test Window 10 latest version to a spare drive and booting into the system with the 5950X.
As it is fair to assume the motherboard, and your components are all functional as it is working in previous setup, it is most likely a issue with the software (OS) or motherboard (BIOS).
If you can have the system run stably without rebooting/crash when in BIOS menu, it is less likely an BIOS issue.
BSOD only occurs when Windows system is loaded and certain detail has failed to work.

Regards,
GIGABYTE


----------



## Marucins

I listened to the geniuses from GIGABYTE and did a fresh install of the system. It didn't help. It still reboots itself.

On Friday, the CPU is flying to the RMA. I'm fed up with this unequal fight.


----------



## Redwoodz

cstkl1 said:


> the amd experience still strong.
> 
> wonder will that experience double with 6800xt..


  Of course Intel doesn't have that problem, they don't change tech and force you to buy a new motherboard for each new CPU.


----------



## Redwoodz

Deepcuts said:


> Have a couple of X570 and B550 boards I can test the CPU on, in remote offices. Will try to test there.
> In my 20 years working in IT I never once bought or had to service a defective brand new CPU. I had lots of defective boards though.
> All over the internet people are starting to post about issues with 5950X, 5900X and WHEA crashes and reboots. I guess if more units were available at launch, even more people would have posted.
> I am 99% sure this is a firmware issue, with AMD AGESA or Gigabyte (or both)
> I just hope the fix comes sooner rather than later.
> Pretty disappointing to pay for a Ferrari only to find you have to keep it under 90 Km/h.
> Nevertheless, will try to test the CPU on an Asus and MSI X570 and B550 and get back with details.


 Early adopter's tax. No hardware is immune. I always wait for the kinks to get ironed out.
Don't forget this could also be caused by Windows. Try Ubuntu.


----------



## Deepcuts

F31i same problem.


----------



## Marucins

Gigabyte even removed the previous versions from my regional (Polish) website. Now the newest one is F30.

My package is on its way to the service - I bought it at Proshop.

Deepcuts, you did too many things and checked it was the BIOS fault. 
The faster you send your CPU to the RMA, the sooner you will get a new one.
From what I saw on the Proshop website, new processors (5950X) will only be available on December 28, 2020. But the service is not included in the store's range. After all, the service also has to get you a new processor from somewhere.


----------



## MoW

Returned my 5950x to the retailer, they tested it on the asus dark hero. Still the same issue, Bsod with Whea errors.
Kinda sucks, doesn't it? Don't tell me we all have faulty chips?
All the 5950X units are sold out in my area, so I can't get a replacement right away.
It's either wait for new stock to arrive (that will take ages) or get a refund.


----------



## Deepcuts

MoW said:


> Returned my 5950x to the retailer, they tested it on the asus dark hero. Still the same issue, Bsod with Whea errors.


So what was the shop's take on this? CPU busted? BIOS incompatibility? Raised shoulders and "no clue"?


----------



## Esticbo

Gigabyte 👎🏻


----------



## nevcairiel

I'm facing the same issue with a 5950X and a brand new Gigabyte X570 Aorus Master. I can boot into windows and just let it sit idle, and it'll eventually reboot.

With the number of reports of the exact same problem description, I somehow don't believe the CPU would actually be faulty in a similar manner for all of us, and it seems more likely that the boards are just not really compatible with this CPU yet. Maybe screwed up voltages somewhere. Some people might be lucky that their silicon behaves differently and works with those voltages.

I did a bit of digging around, and similar issues crop up not only for Gigabyte boards, but at least MSI and Asus as well. That leaves me to conclude that its maybe the AGESA, or perhaps the CPUs afterall.. But that would be a lot of faulty CPUs.

If I knew for sure that the gigabyte board just has issues and another board from eg. ASUS or MSI would solve the issue for sure, that would be on order right now. But alas, it does not look like that would be a guaranteed fix.


----------



## MoW

Deepcuts said:


> So what was the shop's take on this? CPU busted? BIOS incompatibility? Raised shoulders and "no clue"?


They said most likely hardware issue (cpu). The fact we are facing the same issue would have ruled out a faulty CPU.
Maybe it's compatibility or a weak silicon sample.


----------



## MoW

nevcairiel said:


> If I knew for sure that the gigabyte board just has issues and another board from eg. ASUS or MSI would solve the issue for sure, that would be on order right now. But alas, it does not look like that would be a guaranteed fix.


From what I gather from the retailer, Gigabyte boards have the most issues. Has anyone tried another chip in the lineup, like the 5800X, to see if the same error crops up?


----------



## Deepcuts

The only review featuring Gigabyte Aorus X570 Xtreme and AMD Ryzen 5950X I can find is from Gear Seekers.
He does not mention any problems with the build whatsoever.
The only difference as far as I can tell is that he is using an OEM CPU part, as seen here 



100-100000059WOF is retail BOX (what I have) while his 100-100000000059 is OEM (tray)
I find it strange that so many Youtube reviews about Ryzen 5950X are up, yet none mention any problem, reboots, WHEA errors. Nothing!
I am thinking: either those youtubers received cherry picked OEM parts or they are not disclosing problems in fear of not receiving free stuff to test in the future.


----------



## MoW

Another one to join the club. This one is a 5800x with crosshair 8

New Ryzen 5800X Build BSOD (WHEA_UNCORRECTABLE_ERROR)

> Hi! I built a new gaming PC yesterday after receiving my Ryzen 5800x. Around 30 minutes to an hour after a fresh install I get a “WHEA_UNCORRECTABLE_ERROR” BSOD. Once I log back in that I get the same BSOD after a few seconds every single time. It also happens while in safe mode. My motherboard...

www.overclock.net


----------



## Kryptonic83

Here’s some pics of my F31e bios settings running stable 5950x x570 Aorus Master 


http://imgur.com/a/xshjK5m




http://imgur.com/j3am9h3


----------



## Deepcuts

Kryptonic83 can you please tell me what is your CPU part number?

later edit:
and also if your system is stable with BIOS Optimized defaults? I mean, no tweaking, all auto, RAM at 2133.


----------



## newls1

excellent thread here... im tuned in


----------



## Kryptonic83

Deepcuts said:


> Kryptonic83 can you please tell me what is your CPU part number?
> 
> later edit:
> and also if your system is stable with BIOS Optimized defaults? I mean, no tweaking, all auto, RAM at 2133.


100-100000059WOF
Yeah, I'm sure it was stable at defaults as well as that's how I installed windows, I don't think I did much testing with it with ram at 2133 though, I could maybe try that at some point.


----------



## nevcairiel

My old system is still working just fine, and the new one is not usable at all, so I'm pondering to just go for a RMA on the CPU at this point, and see what happens, even if it takes a bit to get a replacement.

I feel like it's more likely that some CPUs might be a bit iffy rather than the mainboards, considering so many similar issues across vendors and mainboard models.


----------



## Deepcuts

I might have found a configuration that is stable without crippling performance. Need to do more testing to be sure.
RAM at 2133 for the moment.


----------



## Marucins

And how is the new BIOS, "X570 AORUS Xtreme - F31k"?

Cosmetics or something about stability improved?


----------



## Deepcuts

I would say from F31 to F31k, there have been improvements in stability.
I have spent a lot of time with various combinations of BIOS settings trying to make this new CPU stable on my board.
I actually thought I got it stable, being able to run any benchmark and game for over 30 minutes without a crash and with good performance.
As I was ready to call it a day, I started a Handbrake batch, the actual purpose of this machine really and lo and behold, my "stability" went down the drain. Crashed and rebooted in seconds.
To be clear, RAM, IF and UCLK at 2133/1066. No XMP or memory overclock.
It is my belief that most people with Aorus X570 Xtreme and 5950X are far from stable.
Some of them tweaked the hell out of that BIOS and are on the edge of stability, with dangerous voltages and possibly having data degradation without their knowledge. Some went over the edge but they don't know it yet, like I was prior to running Handbrake, which I am sure can be substituted by other programs also.
Some of them like their sanity and just returned the CPU/motherboard instead of wasting their time. From this bunch, I would say a fair percentage don't bother posting on forums.

The shop I bought my CPU and motherboard from told me to send both for testing.
Waiting another week or so hoping Gigabyte or AMD will fix this. Would really ruin my day to install my old Asus Hero, 8700K and all the required programs and settings.
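
The 2133/1066 and 3600/1800 pairs quoted in this thread follow from DDR arithmetic: DDR memory transfers data twice per clock, so the real memory clock is half the rated data rate, and a 1:1 setup runs IF (FCLK) and UCLK at that same clock. Below is a small Python sketch of that arithmetic; the function names and the 1 MHz tolerance are illustrative choices, not anything taken from a vendor tool.

```python
def memory_clock_mhz(ddr_rate_mts):
    """Real memory clock in MHz for a DDR data rate in MT/s.

    DDR transfers twice per clock, so the clock is half the rated speed.
    """
    return ddr_rate_mts / 2


def is_one_to_one(ddr_rate_mts, fclk_mhz, uclk_mhz, tolerance_mhz=1.0):
    """True when FCLK and UCLK both match the memory clock.

    The tolerance (an assumption) absorbs rounding such as 1066 vs 1066.5.
    """
    mclk = memory_clock_mhz(ddr_rate_mts)
    return (abs(mclk - fclk_mhz) <= tolerance_mhz
            and abs(mclk - uclk_mhz) <= tolerance_mhz)


if __name__ == "__main__":
    # The two configurations mentioned in this thread.
    for rate, clock in ((2133, 1066), (3600, 1800)):
        print(rate, memory_clock_mhz(rate), is_one_to_one(rate, clock, clock))
```

Both configurations from the thread come out as 1:1, so a crash at 2133/1066 cannot be blamed on a mismatched fabric ratio.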


----------



## Marucins

Or maybe there are too many of these new features?

We know that Gigabyte motherboards have 128 Mbit BIOS memory chips.
Previously, other companies, due to a small BIOS chip and major changes to AGESA, eliminated some functions.


> _"ASRock officially confirmed that the first beta BIOS revision for B450 motherboards was made available on the official website a few days ago. However, it is worth bearing in mind that updating the firmware will result in the loss of support for older processors (Ryzen 1000 and Ryzen 2000) and there may be problems with proper operation."_


Maybe for better integration of new features and free memory, GIGABYTE should remove support for Ryzen 1000 from X570 chipsets?


----------



## manuela45

Soooo, can I join this club? I didn't exactly upgrade. Just did a BIOS update from F31e to F31j and I'm still getting WHEA-Logger Event 19 and sometimes Kernel-Power Event 41 to go with it. I don't get reboots. I get straight unclean shutdowns.

System specs

CPU- Ryzen 7 3800x
Motherboard- Gigabyte X570 Aorus Master
Ram -DDR4-3600MHz CL15 Gskill Trident Z
PSU- Corsair HX1000i

Windows 10 Version is 20H2
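
For anyone comparing notes on how often these events show up, here is a minimal Python sketch that tallies WHEA-Logger Event 19 and Kernel-Power Event 41 from a CSV saved out of Event Viewer. The column names (`Source`, `Event ID`) are an assumption about the export format, and the sample rows are invented purely to show the shape of the input; adjust both to match your own export.

```python
import csv
import io
from collections import Counter

# Assumed column names in the exported CSV; change if yours differ.
SOURCE_COL = "Source"
EVENT_ID_COL = "Event ID"

# (source suffix, event id) pairs mentioned in this thread.
WATCHED = {
    ("WHEA-Logger", "19"): "WHEA corrected hardware error",
    ("Kernel-Power", "41"): "unclean shutdown or reboot",
}


def tally_events(csv_text):
    """Count the watched events in exported Event Viewer CSV text."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get(SOURCE_COL) or "").strip()
        event_id = (row.get(EVENT_ID_COL) or "").strip()
        for (name, wanted_id), label in WATCHED.items():
            # endswith() accepts both short and fully qualified source names.
            if source.endswith(name) and event_id == wanted_id:
                counts[label] += 1
    return counts


# Invented sample rows, only to illustrate the expected layout.
SAMPLE = """Level,Date and Time,Source,Event ID,Task Category
Warning,2020-11-20 10:01:00,WHEA-Logger,19,None
Warning,2020-11-20 10:03:00,Microsoft-Windows-WHEA-Logger,19,None
Critical,2020-11-20 10:05:00,Kernel-Power,41,(63)
Information,2020-11-20 10:06:00,Service Control Manager,7036,None
"""

if __name__ == "__main__":
    for label, n in tally_events(SAMPLE).items():
        print(f"{label}: {n}")
```

Counting before and after a BIOS change (for example, reverting to F30) gives a rough way to tell whether the change made any difference.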


----------



## Marucins

On Monday, Proshop service got my CPU. Today I got information that the new processor will be shipped. 
When? 
Just when will I get a new CPU? Looking at the shortages in stores, I may wait until the end of December. 

But I must admit that I am satisfied with the Proshop service.


----------



## MoW

I am inclined to think that the silicon lottery plays a role here in causing all the havoc.
Why are some people using the same 5950X having no issues while others like us are facing this mess?


----------



## Deepcuts

manuela45 said:


> Soooo, can I join this club? I didn't exactly upgrade. Just did a BIOS update from F31e to F31j and I'm still getting WHEA-Logger Event 19 and sometimes Kernel-Power Event 41 to go with it. I don't get reboots. I get straight unclean shutdowns.


Shutdowns make me think of overheating.
I would revert to F30 and test if it happens again.

By the way: club membership denied! Shoo with that oldie CPU. We are a fancy club.


----------



## manuela45

Deepcuts said:


> Shutdowns make me think of overheating.
> I would revert to F30 and test if it happens again.
> 
> By the way: club membership denied! Shoo with that oldie CPU. We are a fancy club.


Haha dang. Goodbye fellas. 

Btw, my temps are good. I did small FFTs on Prime95 and I hit a maximum of 77c. I'm using a Noctua NH-D15. I'll revert back to F30 and see how that goes.


----------



## mdzcpa

And just when I was going to hand my 9900k Hero XI system to my son and build a new AMD rig. I haven't built an AMD system since Core2 arrived. With images of Athlon Slot As, Thunderbirds and Athlon 64 Hammers dancing in my head...my dreams are snuffed out. This is about the worst CPU/platform launch I've seen in years with 5000 series Ryzen chips puking all around the net. How is this not caught by review sites? Or AMD and mobo partners. 

I'm subscribed. I really hope this resolves soon. Very disappointing!


----------



## Deepcuts

mdzcpa said:


> How is this not caught by review sites? Or AMD and mobo partners.


I am asking myself the same exact question.
Not ONE review stating ANY problems whatsoever. I do not count those saying they were unable to overclock IF to 2000 or more.
Possibility1: most if not all the "big" reviewers received cherry picked samples; decent chance
Possibility2: they do not talk about this issue in fear of not receiving free stuff to test in the future; slim chance but not zero.


----------



## MoW

mdzcpa said:


> This is about the worst CPU/platform launch I've seen in years with 5000 series Ryzen chips puking all around the net. How is this not caught by review sites? Or AMD and mobo partners.
> 
> I'm subscribed. I really hope this resolves soon. Very disappointing!





Deepcuts said:


> I am asking myself the same exact question.
> Not ONE review states ANY problems whatsoever. I do not count those saying they were unable to overclock IF to 2000 or more.
> Possibility 1: most if not all the "big" reviewers received cherry-picked samples; decent chance.
> Possibility 2: they do not talk about this issue for fear of not receiving free stuff to test in the future; slim chance, but not zero.


It's really a letdown. I had high hopes for Zen 3 after my experience with Zen 2. I was really excited when I plugged the 5950X into the socket. Then problems started to crop up, with BSODs.
You are right that none of the reviewers mentioned anything about any sort of compatibility issues.
Maybe this was intentionally left for end users to find out, huh?


----------



## xeizo

I have a 5900X incoming tonight, but I'm not particularly nervous. I have and use three Zen+/Zen 2 processors, and all work very well. I fail to see why the 5900X should be any different.


----------



## MoW

xeizo said:


> I have a 5900X incoming tonight, but I'm not particularly nervous. I have and use three Zen+/Zen 2 processors, and all work very well. I fail to see why the 5900X should be any different.


5000 series is a different beast altogether. We came from Zen 2 and it worked well too.


----------



## newls1

Truly hope this comes down to a simple BIOS fix and not CPUs with issues. I'm crossing my fingers for you guys and for a fix.


----------



## URV

I also encounter instability issues with my Gigabyte B550M Aorus Pro and 5800X.
My system shuts down randomly, most of the time after a heavy load and then restarting from the Start menu.
I got some WHEA-Logger errors previously, which seem to be fixed with BIOS F11j (probably; I didn't have the patience to test since it was still unstable), however shutdowns still occur.
Temps were OK each time, 60-70 degrees at most under normal load, and with the newest BIOS it didn't even shut down under 100% CPU load.
First I thought it was the PSU, so I got another one, a 750 W Gold-rated one, which should be enough in this combination with an RX 580 video card.
Then I noticed a mistake: my RAM is not on the QVL. I borrowed two sticks that should work, which run at a lower frequency (3000 instead of 3600), and the shutdowns are still there.
Core Performance Boost was also my only option to make this stable, and XMP works too.
Gigabyte gave me a copy-paste answer, and all I can do now is wonder whether it's worth getting another motherboard or whether this CPU has a problem after all.


----------



## xeizo

Well, I've been running the 5900X for four hours now, done all the benchmarking, games and productivity and some surfing. I had exactly ONE WHEA error during that time. But that is one too many. I never had one with Zen+/Zen 2. My guess is the BIOS needs some patching, but we will eventually know. No black screen or crash though, just that single error.

"
A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

The details view of this entry contains further information."

*System*

*Provider* [*Name*] Microsoft-Windows-WHEA-Logger
*Provider* [*Guid*] {c26c4f3c-3f66-4e99-8f8a-39405cfed220}
*EventID* 19
*Version* 0
*Level* 3
*Task* 0
*Opcode* 0
*Keywords* 0x8000000000000000
*TimeCreated* [*SystemTime*] 2020-11-27T10:14:03.0190543Z
*EventRecordID* 34256
*Correlation* [*ActivityID*] {c7ff0226-af93-4ce2-aaa3-7cc4f49d9d03}
*Execution* [*ProcessID*] 5412
*Execution* [*ThreadID*] 5192
*Channel* System
*Computer* DESKTOP-XXXXXXX
*Security* [*UserID*] S-1-5-19

*EventData*

*ErrorSource* 0
*ApicId* 0
*MCABank* 27
*MciStat* 0x982000000002080b
*MciAddr* 0x0
*MciMisc* 0xd01a0ffe00000000
*ErrorType* 10
*TransactionType* 256
*Participation* 0
*RequestType* 0
*MemorIO* 2
*MemHierarchyLvl* 3
*Timeout* 0
*OperationType* 256
*Channel* 256
*Length* 936

*RawData*435045521002FFFFFFFF03000200000002000000A8030000020E0A001B0B14140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B248949139377F4BA8F1E0062805C2A36543F06CA1C4D60100000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000200000000000000000000000000000000000000000000007F010000000000000002040000030000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100FA200000818000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000B3F8F31CB1C5A249AA595EEF92FFA63C01000000000000009E07C000040000000000000000000000000000000000000000000000000000000000000000000000020000000200000070CE4D07A6C4D60100000000000000000000000000000000000000001B0000000B08020000002098000000000000000000000000FE0F1AD00000000000000000000500002E0001000500025A000000007D000000270000000000000000000000000000000000000000000000000010000000000000001000000000000000100000000000000010003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
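
For anyone who wants to read the MciStat value above without Event Viewer, here is a minimal Python sketch that splits it into the generic x86 machine-check status fields. The function name `decode_mci_status` and the dictionary keys are made up for illustration, and only the architectural bits (63 down to 57, plus the low 32 bits) are decoded; vendor-specific bits are left alone. Matching the result to the Participation / RequestType / MemorIO / MemHierarchyLvl / Timeout fields of the event is my reading of the dump, not something stated in the post.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its generic fields."""
    error_code = status & 0xFFFF  # bits 15:0, MCA error code
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "error_code": error_code,
    }
    # Bus/interconnect error codes have the form 0000 1PPT RRRR IILL.
    if error_code & 0xF800 == 0x0800:
        fields["bus_error"] = {
            "participation": (error_code >> 9) & 0x3,  # PP
            "timeout": (error_code >> 8) & 0x1,        # T
            "request": (error_code >> 4) & 0xF,        # RRRR
            "mem_or_io": (error_code >> 2) & 0x3,      # II (2 = I/O)
            "level": error_code & 0x3,                 # LL (3 = generic)
        }
    return fields


# Value taken from the MciStat field of the event above.
decoded = decode_mci_status(0x982000000002080B)
for name, value in decoded.items():
    print(f"{name}: {value}")
```

With the uncorrected bit clear, this agrees with the "corrected hardware error" wording, and the bus-error sub-fields line up with the values Windows lists in EventData.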


----------



## fessmm

I just built my first PC and did everything by following YouTube videos. When I start my PC, there is no display output. HDMI is connected to the GPU. I need help, I don't know what to do.
Before I started building, I did a BIOS update following this tutorial:




And the BIOS update was from here:
B550M AORUS PRO (rev. 1.0) Unterstützung | Mainboards - GIGABYTE Germany
The F11i version.
After that I started building, and now when I press the power button, it turns on but there is no display output.
I tried to reset CMOS but it did not help. Anyone, please help.
Also, when I long-press the power button to shut the PC down, nothing happens. The power button just blinks.
Maybe the CPU is dead? It accidentally came out when I tried to take off my CPU cooler. Could that have broken it? The pins look good, so I don't think that's the reason.
What can I do?
Build: https://de.pcpartpicker.com/list/sBb2rr
Pics:


http://imgur.com/a/q50Ont3


----------



## ENTERPRISE

fessmm said:


> I just built my first PC and did everything by following YouTube videos. When I start my PC, there is no display output. HDMI is connected to the GPU. I need help, I don't know what to do.
> Before I started building, I did a BIOS update following this tutorial:
> 
> 
> 
> 
> And the BIOS update was from here:
> B550M AORUS PRO (rev. 1.0) Unterstützung | Mainboards - GIGABYTE Germany
> The F11i version.
> After that I started building, and now when I press the power button, it turns on but there is no display output.
> I tried to reset CMOS but it did not help. Anyone, please help.
> Also, when I long-press the power button to shut the PC down, nothing happens. The power button just blinks.
> Maybe the CPU is dead? It accidentally came out when I tried to take off my CPU cooler. Could that have broken it? The pins look good, so I don't think that's the reason.
> What can I do?
> Build: https://de.pcpartpicker.com/list/sBb2rr
> Pics:
> 
> 
> http://imgur.com/a/q50Ont3


Please create your own thread for this, as this is an older thread related to another user's issue.


----------



## Marucins

We all have almost identical memory kits :\

Do you have any other memory kit? That way you can check whether the memory controller in the CPU handles these kits properly...


----------



## nevcairiel

I've tried two memory kits: a 3200 CL14 B-Die kit that I usually run in my old Intel box, and the new 3600 CL16 kit I got for this build. Identical issue.

I'm trying for an RMA now, but AMD support takes about a week to answer each reply on the ticket.


----------



## Deepcuts

Marucins said:


> We all have almost identical memory kits :\
> 
> Do you have any other memory kit? That way you can check whether the memory controller in the CPU handles these kits properly...


If you read the 1st post, you will see I have tried two other memory kits. No soup.

later edit:
With CPB disabled, my 64 GB G-Skill KIT works just fine at XMP or very tight manual timings with IF/UCLK 1800.


----------



## Todeseng3l

Good evening Gentlemen,

I am joining the circle of misery. New build for me coming from an X299 Intel HEDT platform, this is my first AMD system since the FX-51.

Stock settings (no OC, BIOS adjustments etc):
AMD 5950X (new)
Gigabyte AORUS X570 Xtreme (new)
Trident Z Neo Kit F4-3600C14Q-64GTZN (new)
WD_BLACK SN850 1TB (new) 
EVGA RTX 3090 FTW3 Ultra (recycled from working Intel rig)
Corsair AX1600i (recycled from working Intel rig)

Out of the box the BIOS recognized the 5950X with the factory installed F30 revision. I decided to roll the dice and install Windows without updating the BIOS. I couldn't even get through the preamble set-up without black screen restarts or WHEA BSOD.

I flashed the BIOS to revision F31I and tried a fresh install of Windows. Things appeared to be looking up, made it through the set-up and into Windows. Started updating/installing everything and received a random WHEA BSOD. Windows event viewer records Event 41, Kernel Power at every BSOD. Upon restarting Windows repaired something and removed my display driver. Continued my tasks and was good for about 5min before the next WHEA BSOD. Rebooted again, this time was able to work for 40 minutes before the next instance.

I haven't done many processor intensive tasks yet, however, max temp in HW monitor through installs etc. has been 34C (running 420mm AIO). Prior to finding this forum my first thought was memory... even though I was running stock (no XMP or manual tuning). I tried booting UEFI MemTestX86 and it couldn't find anything wrong.

Based on guidance here I disabled Core Performance Boost and have been stable so far. I purchased directly from AMD, so I will open a ticket with them and also with GIGABYTE.

I hope a BIOS update resolves this. I too find it hard to believe there could be this many defective processors in the wild...

Cheers,

Tony


----------



## Deepcuts

Welcome to the club, *Todeseng3l*.
Nice and expensive setup right there.
Hope we all get our fix soon, one way or another.


----------



## Todeseng3l

Deepcuts said:


> Welcome to the club, *Todeseng3l*.
> Nice and expensive setup right there.
> Hope we all get our fix soon, one way or another.


I shot an email to Gamers Nexus asking them to investigate. Who knows if they will even read my email but it appears this is a pretty widespread issue, maybe they will take it on.

I seem to be stable, as everyone has mentioned by disabling Core Performance Boost. Would be interesting to tinker around with some manual OC settings to see if it is stable, could just be bad auto settings with CPB enabled?


----------



## bwana

Has anyone tried PBO2? I read some reports of it actually stabilizing performance.


----------



## Deepcuts

I know it is blasphemy, being on overclock.net and all, but I never tried overclocking any AMD CPUs so far.
While I am pretty sure I have tested almost all possible combinations of BIOS settings, I never actually tried overclocking.
Cleared CMOS and loaded setup defaults.
Only the Tweaker page was set up like in the image below. Rest default. 4 hours stable so far with CPB enabled (Auto). CPU temps under load max 85 Celsius. Reverted to no overclock and CPB disabled.
Let me know if this is a stupid thing to do with AMD.
Clock ratio on auto with same settings = same problem as before.


----------



## MoW

Todeseng3l said:


> I shot an email to Gamers Nexus asking them to investigate. Who knows if they will even read my email but it appears this is a pretty widespread issue, maybe they will take it on.
> 
> I seem to be stable, as everyone has mentioned by disabling Core Performance Boost. Would be interesting to tinker around with some manual OC settings to see if it is stable, could just be bad auto settings with CPB enabled?


Disabling Cpb is akin to running a crippled 5xxx processor. 
Looks like there's no fix yet for our woes.


----------



## Todeseng3l

Ended up taking a tour through the BIOS and tweaking a bunch of settings. Mostly followed Buildzoid's advice.

Spread Spectrum Control --> Disabled
VCORE SOC --> 1.1V
CPU VDD18 --> 1.96V
AMD Quiet Cool --> Disabled
Global C-state Control --> Disabled
CPU Vcore Loadline Calibration --> Turbo
Vcore SOC Loadline Calibration --> Turbo
CPU Vcore Protection --> 400mV
CPU Vcore SOC Protection --> 400mV
CPU Vcore Current Protection --> Extreme
PWM Phase Control --> Exm Performance
PCIe Slot Configuration --> Gen 4
Precision Boost Overdrive --> Manual
PPT Limit --> 666
TDC Limit --> 666
EDC Limit --> 666
Precision Boost Overdrive Scaler --> Manual
Customized Precision Boost Overdrive Scaler --> 10x

With Core Performance Boost enabled, this has been the longest I have been stable thus far. No crashes for 1.5hrs and counting.

Max single core frequency I hit was 5.05GHz with max temp of 64C. Fingers crossed this remains stable.

EDIT: 4hrs stable and counting, toes crossed now too

EDIT 2: 10hrs of stability with a lot of gaming. Looks like the issue is resolved for me, I would recommend tweaking BIOS settings until you find something that works for your system. Also, Arctic Liquid Freezer II 420mm AIO is a beast. Haven't seen above 64C CPU temp.


----------



## aa.delite

I have the same reboot problem on my *5950X* and *Gigabyte B550 Aorus Master* using default settings. And devices connected to any *USB 2.0* ports continuously disconnect 20-30 times a minute. Temporarily using USB 3.0 ports. I've tried F11i, F11j, F11k.
*F11j* seems to be the best. *F11k is crappy*: it brings back critical USB 2.0 problems and decreases performance by 50%. You may download F11j by using the latest BIOS link and replacing the letter. So I hope it's just a BIOS problem. I don't want to disable turbo boost; that's not what I paid so much for.
No WHEA errors on F11j. *Maybe* no reboots yet (still testing, not sure; all AIDA64 tests ran with no WHEA errors and no reboots). Fewer USB 2.0 problems (not fixed, it still disconnects, but not every second).


----------



## Deepcuts

New F31n beta BIOS for Aorus X570 Xtreme: X570AORUSXTREME


Date 12/01/2020
Agesa 1.1.0.0 *D* and *fix random reboots*.

Other Gigabyte models on this page: GIGABYTE Latest Beta BIOS - TweakTown Forums

Happy testing.

later edit:
false alarm (at least in my case). Defaults loaded = reboots in games and Handbrake.


----------



## nevcairiel

No perceived changes from that beta BIOS. Still BSODs or reboots within minutes from starting the PC.
And also still waiting for AMD Warranty Support to get back to me....


----------



## Todeseng3l

Others are reporting on Reddit stability with my settings. I haven't had a crash with CPB enabled since I updated. Checking out of this thread now that I am good to go. Good luck all!

-Tony


----------



## Deepcuts

Todeseng3l said:


> Others are reporting on Reddit stability with my settings. I haven't had a crash with CPB enabled since I updated. Checking out of this thread now that I am good to go. Good luck all!
> 
> -Tony


Good news. Glad you got yours "stable".
The question is: is it stable at defaults without overclocking it/probably shortening your CPU's service duration/possible warranty problems/possible data loss?


----------



## Todeseng3l

Deepcuts said:


> Good news. Glad you got yours "stable".
> The question is: is it stable at defaults without overclocking it/probably shortening your CPU's service duration/possible warranty problems/possible data loss?


I usually upgrade my processor every 2-3 years. I have never had one fail or noticeably degrade. Given that this is running much cooler than my other builds (64C), I am confident with the settings, as they really aren't that exotic.

From what I can tell, at default BIOS settings Core Performance Boost is pushing the CPU too hard and it runs into either a resource limit or a 'protection' barrier that won't let it draw the resources it needs to boost to the clock it sets. I agree you shouldn't have to change anything in BIOS for a processor to work stock; however, it's almost been a month since release... I wouldn't hold my breath for a quick fix.


----------



## MoW

Todeseng3l said:


> I usually upgrade my processor every 2-3 years. I have never had one fail or noticeably degrade. Given that this is running much cooler than my other builds (64C), I am confident with the settings, as they really aren't that exotic.
> 
> From what I can tell, at default BIOS settings Core Performance Boost is pushing the CPU too hard and it runs into either a resource limit or a 'protection' barrier that won't let it draw the resources it needs to boost to the clock it sets. I agree you shouldn't have to change anything in BIOS for a processor to work stock; however, it's almost been a month since release... I wouldn't hold my breath for a quick fix.


Great to hear that you managed to fix your problems. However, having to use those extreme settings above just for stability's sake is mind-boggling. Still, I'd prefer a full fix in a final stable BIOS. This is not what we paid for with this processor.


----------



## RandomOtaku

Same issue here. 

My board is the X570i from Gigabyte, and the CPU is a 5950X. I picked up a pair of 3600 C18 memory sticks from Klevv because they are cheap and stable (according to some reviews I found). My system runs Win10 20H2. BIOS version is F31l. It runs smoothly when I disable the XMP profile and let the RAM run at 2666 MHz. After enabling the XMP profile, I started to experience random black screens and reboots, and the Event Viewer was flooded with WHEA-Logger ID 19. Then I manually set the timings and voltage for the RAM in the BIOS and tuned them down to 3200 MHz, C16 and 1.4 V, with FCLK at 1600 MHz. The WHEA-Logger ID 19 entries stopped showing up; however, the system couldn't pass memtest. It started to get errors and reboots. At last, I tuned them down to 3000 MHz C16, FCLK 1500 MHz. The system runs smoothly and passes memtest at 700%. Although running memory at 3000 MHz isn't a big deal from a performance perspective, it still feels disappointing to own a flagship CPU that can't even run memory at 3600 MHz.

(If anyone is curious, the rest of the specs are: GPU: evga3080 xc3-the hottest AIB 3080 LOL, storage: sn750 1t nvme-don't have the budget for a gen4 ssd, PSU: coolermaster sfx850w-expensive, but never save a buck on psu or you will regret it, cooler: EK classic 240 without RGB-my case doesn't even have a side panel, case: Ncase M1 v6-I move quite a lot, and I am addicted to itx builds)

Jerry here


----------



## nevcairiel

Well, technically it's not the same issue: I can't even run with memory at 2133 and stupidly bad timings. I don't think the memory is the problem for that issue.


----------



## Deepcuts

So like....I have found a RX590 GPU in a PC.
Having tested with a RX460 before, I had low expectations.
Took out my GTX 1080 ti and plugged in the RX590. No drivers installed besides the Windows 10 default ones for this card.
I enabled CPB, XMP and started a Handbrake encode. Stable!
Nothing else tweaked. All default. IF and UCLK 1800.
With the exact same settings it would have crashed in seconds with the GTX 1080 ti

I really don't know what to make of this. Hope it is not a lucky boot of some sort. (haven't rebooted yet). Rebooted twice to be sure. All good.
It is true that when I tested the RX460, the BIOS version was different and a lot crappier.
Will torture the poor CPU some more and get back with more info.

later edit:
It takes a lot longer, but in the end the exact same problem. The computer reboots.
Weird that an AMD GPU makes this new CPU behave differently than with an NVIDIA GPU.
So, false alarm. Again. F this.


----------



## Alvy

Man my hopes went up for about 10 seconds reading that. There must be something different that all those Xtreme+5950X users that are stable on defaults have on us. My first thought was GPU maybe. 2nd one was SSD GEN4 and that they're using only gen3/sata. Could be a mix of both.


----------



## Deepcuts

Alvy said:


> Man my hopes went up for about 10 seconds reading that. There must be something different that all those Xtreme+5950X users that are stable on defaults have on us. My first thought was GPU maybe. 2nd one was SSD GEN4 and that they're using only gen3/sata. Could be a mix of both.


Sorry for the false hope.
Tried RX590 + Samsung M.2 EVO GEN3 and with a Gigabyte M.2 Aorus GEN4. No other drives. No soup.


----------



## blucube

EDIT: I will also mention that one of the beta builds with ASUS that were having issues were in fact using the same "AMD AM4 AGESA V2 PI 1.1.0.0 Patch C" as the GB BIOS I suggested below despite claiming FCLK improvements. If that does end up playing a role in all of this the ASUS build that is stable for my board is on AMD AM4 AGESA V2 PI 1.1.0.0 Patch B which looks like the GB board jumped over?

Still pending confirmation by users

--

I didn't thoroughly read through the entire posts though wanted to drop a quick bit of info.

From my experience the issue seems to be following Zen 3 and X570 boards encountering WHEA with FCLK above 1600mhz regardless of any iteration of timings voltages etc.

After working with ASUS they had me flash their latest non-beta BIOS 2802 and I'm currently running 3666/1833 CL15 using settings gathered from DRAM Calc with no errors.

There's a X570 GB user on my thread who is reporting this issue and if it is in fact BIOS/FCLK related I questioned if they had tried this specific build simply because it states FCLK improvements - they haven't gotten back to me so I don't know if it's the/a solution - also I apologize if it's already been ruled out previously on the thread as I just skimmed it but...

If it doesn't fix it the general consensus from my perspective is Zen 3 + X570 & FCLK 1633+ which in my case definitely was resolved by a BIOS version. Why or what exactly I don't know. (Possibly AGESA patch version?)

I also can't speak for any DOCP/XMP settings as I've tuned with DRAM Calc this entire process.

--

F31I

X570 AORUS ELITE (rev. 1.0) Support | Motherboard - GIGABYTE Global

Or (evidently if WIFI version of the board?)

X570 AORUS ELITE WIFI (rev. 1.x) Support | Motherboard - GIGABYTE U.S.A.


----------



## MoW

Deepcuts said:


> Sorry for the false hope.
> Tried RX590 + Samsung M.2 EVO GEN3 and with a Gigabyte M.2 Aorus GEN4. No other drives. No soup.


It's definitely not the drives. I have been using a gen 3 NVMe and had no issues whatsoever with the board paired with the 3950X. The problem started when I plugged in the new CPU.


----------



## Deepcuts

blucube said:


> From my experience the issue seems to be following Zen 3 and X570 boards encountering WHEA with FCLK above 1600mhz regardless of any iteration of timings voltages etc.


With CPB disabled, I can enable XMP @ 3600 and IF/UCLK @ 1800 without any issues
With CPB enabled, I can try most manual settings under the sun, RAM at 2133 with IF/UCLK 1066, without success. WHEA and reboots. I say "try most" because manually setting the CPU ratio to 45 or even 46 with VCORE +0.0300, XMP or manual timings set and IF/UCLK 1800 seems stable here under a quick 4-hour test. Then again, I just want this CPU to work at defaults, without worrying about it going kaboom, so overclocking without knowing the platform's true limits at all is not an option for me.

No sane person would buy a brand new car only to have to chip mod it in order for the car's engine to maybe not shut down while driving. But...here I am. A beta tester on my own coin.


----------



## Marucins

...here we go again

*X570 series -> F31o* 









GIGABYTE Latest Beta BIOS - TweakTown Forums


----------



## blucube

Deepcuts said:


> With CPB disabled, I can enable XMP @ 3600 and IF/UCLK @ 1800 without any issues
> With CPB enabled, I can try most manual settings under the sun, RAM at 2133 with IF/UCLK 1066, without success. WHEA and reboots. I say "try most" because manually setting the CPU ratio to 45 or even 46 with VCORE +0.0300, XMP or manual timings set and IF/UCLK 1800 seems stable here under a quick 4-hour test. Then again, I just want this CPU to work at defaults, without worrying about it going kaboom, so overclocking without knowing the platform's true limits at all is not an option for me.
> 
> No sane person would buy a brand new car only to have to chip mod it in order for the car's engine to maybe not shut down while driving. But...here I am. A beta tester on my own coin.


I guess the unfortunate sentiment I've heard historically is "Early Adoption Tax" or something along those lines... I do believe there is an issue to some degree (although no merits to suggest such a thing)... but it'll be refined...

In my case, I've admittedly left all CPU config up to the BIOS / boosting. In HWiNFO, core clocks jump up to 4.6 GHz and Vcore a bit above 1.3 V (as it happens, I was just testing out some different DRAM clocks and will add what HWiNFO has gathered thus far):

Zen Timings + HWinfo just under 2hrs of logging Witcher 3: GOTY Full Settings 2k w/ OBS 2k streaming (not exactly a great control but... anyone with the issue will notice cut and dry difference between WHEA accumulation towards assumed issue vs. no)

*Zen Timings / HWinfo stats (spoiler, screenshots not included)*

Having resolved my issues by working with ASUS, and confirmed as much, I've just sent them a message asking for their "take" on the matter, as they may be somewhat aware of it and able to shed some light on this particular scenario.

Additionally, on the subject of "bad science / control": at the lower FCLKs above 1600 MHz, running tests like OCCT wasn't always enough to generate WHEA errors. If you added further load, such as running OCCT plus a relatively demanding game, then WHEA errors would occur. In my case, simply initiating a large download would accelerate WHEA accumulation where OCCT standing alone would appear clean.


----------



## Deepcuts

I have created a poll for this topic.
If you can spare a minute, please cast your vote.


----------



## blucube

Deepcuts said:


> I have created a poll for this topic.
> If you can spare a minute, please cast your vote.


Well, thing is mine never rebooted/BSOD. However, not tooting my own horn but that could just simply be due to how quickly I caught it as it sounds like most the BSOD messages people are getting are affiliated with WHEA messages. It was a brand new build using DOCP and I had just turned it on and had HWinfo up mainly because I was more concerned with the thermals/cooling while steam was redownloading some games and running OCCT then I noticed it probably within 5 minutes or less. 

I'd be curious whether everyone on Zen 3 + X570 & >1600 FCLK who is experiencing issues would notice steady WHEA errors prior to crashing if they were watching HWiNFO, or running OCCT + extra load to really drive it. Then do the same at 3200/1600 and see if it clears up. Everything's a little different here and there, but by and large I think the issues are all related to some extent.

When doing a battery of tests I more or less terminated the tests to continue making adjustments the moment WHEA errors appeared, which was extremely consistent and immediate under any load using OCCT + loading FFXV, for instance. Higher FCLKs (1800 MHz) would pretty much start WHEA errors on their own; if not, OCCT for sure would drive them. Lower FCLKs (1633 MHz) would not generate WHEA errors, or at least nowhere near as quickly, with just OCCT, but the moment I loaded FFXV in tandem with OCCT they'd immediately appear.
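
As an alternative to watching HWiNFO during such a test, the WHEA-Logger Event 19 entries can be pulled from the Windows System log afterwards. The following is only a sketch in Python, assuming the built-in `wevtutil` tool is on the PATH; the helper names `build_whea_command` and `recent_whea_events` are made up for illustration. The provider name and event ID are the ones shown in the event dump posted earlier in the thread.

```python
import subprocess
import sys

# XPath filter for corrected-error events from the WHEA logger (Event ID 19).
WHEA_QUERY = (
    "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger'] "
    "and (EventID=19)]]"
)


def build_whea_command(max_events=20):
    """Build the wevtutil command line (hypothetical helper, newest first)."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        f"/c:{max_events}",   # limit the number of events returned
        "/rd:true",           # read newest events first
        "/f:text",            # plain-text output instead of XML
    ]


def recent_whea_events(max_events=20):
    """Run the query and return the raw text output (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_whea_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `recent_whea_events()` before and after a load test and comparing the newest timestamps gives a rough idea of whether errors accumulated during the run.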

Not trying to "dock" this thread - but here is the post I had created more or less the moment I started noticing there was nothing I could really do to get WHEA to stop after FCLK over 1600mhz.









[SOLVED] Zen 3 + X570 WHEA (ASUS BIOS 3001)





For the ASUS ROG STRIX X570-I the solution was BIOS 2802 AGESA 1.1.0.0 Patch B
Both beta BIOSes after that had no effect and use AGESA 1.1.0.0 Patch C


----------



## nevcairiel

blucube said:


> Well, thing is mine never rebooted/BSOD. However, not tooting my own horn but that could just simply be due to how quickly I caught it as it sounds like most the BSOD messages people are getting are affiliated with WHEA messages. It was a brand new build using DOCP and I had just turned it on and had HWinfo up mainly because I was more concerned with the thermals/cooling while steam was redownloading some games and running OCCT then I noticed it probably within 5 minutes or less.


I couldn't even get through a Windows install without reboots during the install, and I never saw WHEA errors in HWInfo, just instant reboots, so don't underestimate the impact of these problems. It crashes in idle, not running any test or any stress whatsoever.

In fact, during stress testing it felt more stable, and would crash when I stopped the test and dropped to the desktop.

This is also on full stock, no XMP, no increased FCLK, which means it's running on 2133 RAM and thus 1067 FCLK. I was only trying to install Windows and drivers before messing with performance and it already rebooted/BSODed constantly.

You should consider yourself lucky to not experience these symptoms and "just" not being able to run memory at full speed. My system is entirely unusable, so I'm still using my old one.
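
The memory speed / FCLK pairs quoted in this thread (3666/1833, 3600/1800, 3200/1600, 2133/1067) all follow the same rule: the DDR rating is a transfer rate, the memory clock is half of it, and in 1:1 mode FCLK runs at that memory clock. A small Python sketch of that arithmetic follows; the function names and the 1600 MHz default threshold are illustrative, with the threshold taken from blucube's observation rather than from any AMD specification.

```python
def fclk_for_1to1(ddr_rate_mts):
    """Memory clock in MHz for a DDR transfer rate; FCLK equals it in 1:1 mode."""
    return ddr_rate_mts / 2


def above_reported_threshold(ddr_rate_mts, threshold_mhz=1600):
    """True if the 1:1 FCLK is above the threshold reported in this thread."""
    return fclk_for_1to1(ddr_rate_mts) > threshold_mhz


for rate in (2133, 3200, 3600, 3666):
    flag = "above 1600" if above_reported_threshold(rate) else "at or below 1600"
    print(f"DDR4-{rate}: FCLK {fclk_for_1to1(rate):.1f} MHz ({flag})")
```

Note that this only illustrates the FCLK-related reports; the crashes described at 2133 with CPB enabled fall well below that threshold, so they do not fit the same pattern.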


----------



## blucube

nevcairiel said:


> I couldn't even get through a Windows install without reboots during the install, and I never saw WHEA errors in HWInfo, just instant reboots, so don't underestimate the impact of these problems. It crashes in idle, not running any test or any stress whatsoever.
> 
> In fact, during stress testing it felt more stable, and would crash when I stopped the test and dropped to the desktop.
> 
> This is also on full stock, no XMP, no increased FCLK, which means it's running on 2133 RAM and thus 1067 FCLK. I was only trying to install Windows and drivers before messing with performance and it already rebooted/BSODed constantly.
> 
> You should consider yourself lucky to not experience these symptoms and "just" not being able to run memory at full speed. My system is entirely unusable, so I'm still using my old one.


Ouch. Yeah, that does sound a little different. I was already feeling lucky for a couple of other reasons on this build - fortunate that ASUS worked with me a bit and found something that worked in my scenario. Unfortunately, everyone's systems aren't apples to apples, but there is an abundance of issues nevertheless with Zen 3 & X570 boards. The trend doesn't really seem to follow B5's etc.

Another user, I noticed, seemed to be having I/O issues on their X570, which evidently cleared up after switching PCIe from 4.0 to 3.0 - although I don't suspect this is necessarily relevant to this thread. Just more quirky stuff.


----------



## Alvy

I spent the whole evening (~8 hours) with my case open on the workdesk testing multiple hardware swaps / F31X BIOSes / CMOS clears with the settings the other dude mentioned (and some variants of them), and wasn't able to get a single successful boot into the Windows install that works fine on the same system with a 3600XT, nor a successful start of a Windows installation from a USB drive (tested with 3 different sticks). It always gives some sort of bootloader error or "file signature not validated" (like data corruption). Hell, at some point entering the BIOS setup hit me with some "file create error" popup. That convinces me even more that my 5950X is defective. As soon as I got the 5950X out of there and put the 3600XT back, Windows ran a quick chkdsk and then booted successfully. Hopefully I'll be able to get an RMA before 2022, considering that AMD are replying to ticket emails every 10 days.


----------



## nevcairiel

Alvy said:


> considering that AMD are replying to ticket email every 10 days.


Yes.. their support response times currently are ridiculously bad. I even got an automated response today that there was no interaction for 10 days, guess what AMD, you did not respond for 10 days.

The AMD experience so far has been extremely disappointing. 15 years of buying Intel before, and I never had any trouble. Hardware and/or software was clearly not ready, retail and support channels are entirely unresponsive. AMD, you should do better.


----------



## geriatricpollywog

nevcairiel said:


> Yes.. their support response times currently are ridiculously bad. I even got an automated response today that there was no interaction for 10 days, guess what AMD, you did not respond for 10 days.
> 
> The AMD experience so far has been extremely disappointing. 15 years of buying Intel before, and I never had any trouble. Hardware and/or software was clearly not ready, retail and support channels are entirely unresponsive. AMD, you should do better.


I feel you. I have mistakenly purchased bargain CPUs from the likes of Cyrix and AMD in the past. Now I always look for the blue box when shopping for a CPU. If it doesn’t say “Genuine Intel” or “Intel Inside” it might be a counterfeit.


----------



## Walrusbonzo

Same problem here....

Using a Gigabyte X570 Aorus Master 1.2, I just switched from a rock solid 3950X to a 5950X and now I'm getting WHEA errors, code 19 in the System Event log. I cannot get Linpack extreme stable at all under Windows. It quits almost instantly due to detected hardware failure. Bizarrely, running the same Linpack extreme using the USB bootable Porteus Linux distro passes completely.

Tried F30, F30l, F30o BIOSes so far. Same issue with all of them.

Of course, I've tested at stock. Turned off core boost, tested with RAM at 2400 and IF at 1200. Linpack extreme still detects hardware failure in Windows.

Fingers crossed this is a fixable issue with BIOS updates.


----------



## Deepcuts

Walrusbonzo said:


> Of course, I've tested at stock. Turned off core boost, tested with RAM at 2400 and IF at 1200. Linpack extreme still detects hardware failure in Windows.


With CPB disabled you are still getting errors? How about reboots?


----------



## Walrusbonzo

Deepcuts said:


> With CPB disabled you are still getting errors? How about reboots?


Even with CPB disabled LinPack Extreme for Windows still errors out due to hardware failure detection very early on. Not had a chance to try much else as I only installed it 4 hours ago

Initially I tested at stock with XMP on and I was getting WHEA errors when running Cinebench. I had one reboot around this time too.

I've now got it running with manually adjusted memory timings and doing some crypto mining (a CPU-heavy coin), and so far so good. The WHEA errors have stopped.

I'll leave it overnight and test more tomorrow.


----------



## RandomOtaku

The newest bios (F31o) fixed the problem for me.


----------



## Deepcuts

RandomOtaku said:


> The newest bios (F31o) fixed the problem for me.


Good to hear.
Nevertheless, which part of "My system with a Ryzen 5000 CPU reboots with BIOS defaults" is unclear?
Emphasis on "BIOS defaults".
Your 1st post specified you cannot use XMP and you have no problems at stock.


----------



## RandomOtaku

I am able to pass memtest at 700% now. BUT I've started to experience some weird small glitches in Windows. When I am typing Chinese, some Chinese characters are glitched, and number labels are glitched too. The icons for software running in the background are larger than they should be. The glitches aren't severe and they don't affect my use of the computer, but they started to appear after I changed the RAM speed to 3600 MHz. I am not sure whether the two are related, and I am not unhappy with the newest BIOS.


----------



## RandomOtaku

Deepcuts said:


> Good to hear.
> Nevertheless, which part of "My system with a Ryzen 5000 CPU reboots with BIOS defaults" is unclear?
> Emphasis on "BIOS defaults".
> Your 1st post specified you cannot use XMP and you have no problems at stock.


Maybe different versions of Windows? Mine is the newest win10 20h2 professional version.


----------



## thunk_stuff

I have also had the same problem from the instant I upgraded to a 5900x, and have shared my issue on the amd.com forums, /r/amdhelp and /r/asrock.

Getting WHEA 18 bluescreens at default BIOS settings. WHEA error type is "Bus/Interconnect Error" or "Cache Hierarchy Error". Blue screens at idle, loading Windows, opening a Window, running a GPU benchmark. 95% crashes within one minute of Windows loaded. But not normally when running a high usage multicore CPU benchmark (I can't remember if zero cases or not).
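If you want to pull these WHEA-Logger entries out of the System log without clicking through Event Viewer, something like the following should do it. It is only a sketch: it wraps the stock Windows `wevtutil` tool, the function name is made up, and it only builds and runs the query (no parsing of the output).

```python
import subprocess
import sys

WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events: int = 20) -> list[str]:
    """Build a wevtutil command that lists the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # only events from the WHEA-Logger provider
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # newest first
        "/f:text",           # human-readable output
    ]


if __name__ == "__main__":
    cmd = build_whea_query()
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        print("Windows only; would run:", " ".join(cmd))
```

The event IDs people in this thread report (18 and 19) show up in the text output, along with the error type such as "Cache Hierarchy Error".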

Had a Ryzen 3 3100 with the 3080 GPU installed over the previous month/week without a problem. The CPU crashes anyway when the GPU is in low power mode. Turning off XMP and running at 2133Mhz does not help. Nor does swapping in different RAM at various non-overclocked speeds. Reinstalling Windows on a SATA SSD (not NVMe) does not help; it crashes when installing. Different BIOS versions do not help. The only thing I haven't tried (besides an identical CPU) is swapping the motherboard.

Originally I disabled PBO and it was stable. Running in Windows eco-mode was also stable. That's cool (literally). But I was not happy the 5900x could only run in $100 budget CPU mode.

*Temporary solution: I have achieved 5 days of 100% stability so far by upping the voltage +8 on the PBO curve optimizer for all cores. I can now enable PBO and run full speeds, and I have my memory set to 3800Mhz and Fabric to 1900Mhz. No other voltage or overclock setting has been changed. Curve optimizer is a new feature as part of PBO 2 and will only be found in some but not all of the latest BIOS releases.*

Other curve settings: +6 was mostly stable but crashed once after a day. +10 was also stable, but this is even more voltage, which I'm trying to avoid. Undervolting the curve, not surprisingly, crashed almost immediately. I might have tried some lower values like +2 or +4, but can't remember. Anyone who plays with this setting should start small, like +2, and go up from there. The scale goes up to +30. The curve is a very new feature and not documented well. Each step (AMD calls it magnitude) is equivalent to around 3-7mV at the lower end, but this range changes the higher you go.
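To put rough numbers on that: taking the ~3-7 mV per step estimate above at face value, and assuming it scales linearly (which, as noted, it probably does not at higher magnitudes), the offsets work out as in this small sketch. The function name is just for illustration.

```python
def curve_offset_range_mv(steps: int, mv_per_step=(3.0, 7.0)) -> tuple[float, float]:
    """Estimate the (low, high) voltage offset in mV for a positive
    Curve Optimizer magnitude, assuming a linear 3-7 mV per step."""
    low, high = mv_per_step
    return steps * low, steps * high


# Magnitudes mentioned in this post.
for steps in (2, 6, 8, 10):
    lo, hi = curve_offset_range_mv(steps)
    print(f"+{steps}: roughly {lo:.0f}-{hi:.0f} mV")
```

So +8 would be somewhere in the region of 24-56 mV of extra voltage under that assumption.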

TODO: Play around with the curve optimizer, and see if I can actually pin down the issue to a single CCX or core.

Specs:

B550 Phantom Gaming ITX/ax
1.61 BIOS.
Latest AMD chipset drivers.
AGESA 1.1.0.0 Patch C (Earlier versions also crashed)
G.SKILL Ripjaws V Series 32GB (2 x 16GB). Also tried and am running currently G.SKILL TridentZ RGB Series 32GB.
Zotac 3080 amp holo
Samsung 970 2TB NVMe
Win 10 Pro 20H2
Corsair 750watt Platinum

I, too, would like to run this CPU stable at stock BIOS values. Upping the voltage curve makes the CPU hotter, and who knows if it's taking life away from it. Overclocking is nice on occasion, but normally I would not overclock and would want to be able to undervolt and still have a fast but cooler and less power-hungry system. I am not an experienced overclocker, so if anyone has better suggestions let me know.

Put in an RMA request with AMD but have not heard back yet.


----------



## Schnuppl

I had the same problems. The reboot came with every installation. In addition, the first chipset m.2 is gone.
There are no more reboots with F31o. But I have a constant USB plugin sound. And the M.2 port is still gone.

(GB x570 Aorus xtreme, R9 5950x)


----------



## Deepcuts

Schnuppl said:


> I had the same problems. The reboot came with every installation. In addition, the first chipset m.2 is gone.
> There are no more reboots with F31o. But I have a constant USB plugin sound. And the M.2 port is still gone.
> 
> (GB x570 Aorus xtreme, R9 5950x)


Too lazy to take out the GPU and M.2 cover to test, but one thing that comes to mind is my motherboard loses the 10 Gb nic and front USB port after every BIOS update. To fix it I must unplug the PSU for 1 minute. 
Maybe try that?


----------



## RandalGraves

Just looking to chime in as I am having the same issues. (Poll completed)

Build:
AMD Ryzen 9 5950x Vermeer 3.4Ghz 16-Core AM4
Asus ROG Strix X570-F Gaming
4x 16GB G.Skill Ripjaws V DDR4 3600 / 2x 16GB Corsair Vengeance RGB Pro DDR4 3200 (Bought a bundle for mobo, cpu, cooler and ram, but wanted to change up to the Ripjaws ram for actual latency)
ASUS TUF Gaming OC GeForce RTX 3080
Western Digital BLACK SN850 NVMe 1TB
Win 10 Pro 20H2
Corsair HX 1200w Platinum

Latest AMD Chipset Drivers
Current BIOS = 2602

After installing Windows I have continually run into WHEA BSODs and have tried a multitude of things, including: swapping in either set of RAM, 1 RAM stick in each slot, kit speed on both RAM sets (2133Mhz), DOCP (3600Mhz on Ripjaws/3200Mhz on Vengeance), DOCP clocked to 3200/3000 (keeping timings the same) on the Ripjaws RAM, reseating all cables into the mobo and all PSU cables (along with obviously reseating the memory multiple times), a different GPU (1080ti with drivers reinstalled), the 3080 GPU plugged into a different PCIe slot, reinstalling Windows, and trying multiple BIOS updates (the bundle came on BIOS 2802; once I first had issues I upgraded to 2816, then read about all the issues with AGESA 1.1.0.0 Patch C, so went back to 2802, then decided to try 2602 recently due to it being AGESA 1.0.8.0).

Almost every time, the PC boots into Windows with no problems, but as soon as I try to do something that isn't "just sit on the desktop" it will crash, 90% of the time with a WHEA BSOD; sometimes it will just reboot. I have run things like Prime95 for an hour and a half straight with no issues, but as soon as I stop it the PC crashes. Or when I go to play a game, it will crash on loading 50% of the time; the other 50% it crashes as I exit back to the main menu. Even changing the quality on a video in the Chrome browser crashed it earlier.

The only thing that seems to stop the crashes was turning off Core Performance Boost in BIOS. Then it ran for a solid 2 hours, in and out of games, etc. But the performance was really "jittery", and wasn't butter smooth like you would expect for a build like this.

I would love to try a fix like @Todeseng3l 's with editing the power settings. But I am dumber than a bag of hammers, so unable to figure out how those settings translate to my BIOS. As I am sure a lot of you are, I am just desperate for a fix where I want to just call a PC shop and pay for them to resolve it, but I can't see how they would have any better idea than me/you guys.


----------



## Schnuppl

Deepcuts said:


> Too lazy to take out the GPU and M.2 cover to test, but one thing that comes to mind is my motherboard loses the 10 Gb nic and front USB port after every BIOS update. To fix it I must unplug the PSU for 1 minute.
> Maybe try that?


Thanks for the info but I don't have a front USB connected.


----------



## JohnnyFlash

Someone needs to test at full manual static voltage and see if that removes the issue.

It seems that the CPU is not stable at all of the voltage/clock steps.


----------



## Deepcuts

Schnuppl said:


> Thanks for the info but I don't have a front USB connected.


I am saying try to cut the power supply unit power for 1 minute.
On my motherboard, this fixes several problems. Might fix yours also.


----------



## excitebike

thunk_stuff said:


> Temporary solution: I have achieved 5 days of 100% stability so far by upping the voltage +8 on the PBO curve optimizer for all cores. I can now enable PBO and run full speeds, and I have my memory set to 3800Mhz and Fabric to 1900Mhz. No other voltage or overclock setting has been changed. Curve optimizer is a new feature as part of PBO 2 and will only be found in some but not all of the latest BIOS releases.


I've been battling this for a couple of days now and this actually fixed it for me. Stock + Curve Optimizer at all-core +8 keeps it stable, where before I could easily cause an idle reboot (WHEA-Logger, code 18, cache hierarchy error) by starting and stopping Cinebench R20/R23 over and over again.

It's unfortunate that it comes at the cost of a lot of PB2 headroom... my 5950x is a pretty poor 5900x now and I can't even think about undervolting even the slightest because the instability becomes unbearable.

I can't even decide if I should RMA this and try to get another one. It's a shame because I was able to dial-in nearly 5.1Ghz effective clocks which were perfectly stable under load... but I get constant *idle* reboots. It's so sad.

Thanks for the tip @thunk_stuff, any advice on RMA would be appreciated. Does this sort of thing normally get fixed by BIOS updates? 3003 on my Dark Hero hasn't fixed it...


----------



## thunk_stuff

Hi excitebike, glad the temp fix worked for you. The more people who can get a stable system with a similar workaround, the more we can understand what the source of the issue is and draw attention to it. It would be great if someone like Gamers Nexus could investigate and get some media attention on this. That will get us closer to understanding whether this is an AGESA issue that can be fixed with a BIOS update or we all have faulty CPUs that need to be RMA'd.

Haven't gotten any response yet regarding my RMA request. It won't hurt to at least submit an RMA request and get a conversation going with AMD. That will draw more attention to this issue. Be as specific as you can about the symptoms and the workaround. WHEA errors can have various causes, and it helps to say that you can get the system stable with a voltage tweak and that it's not related to another component like RAM, PCIe, etc.

We all deserve a response from AMD, and to at least know whether they are aware of this issue and investigating it. It is in their interest to minimize RMA returns. So far it's just been crickets.

RMA Form: https://www.amd.com/en/support/kb/warranty-information/rma-form

This is my opinion as someone who is not a computer expert. Others please chime in if you have better ideas.


----------



## aa.delite

What can you say about idle reboots now? Is it BIOS problem or AMD RMA required? I saw the poll but everyone could vote once. Maybe BIOS already fixed idle reboots for those who voted "yes" before.


----------



## Deepcuts

aa.delite said:


> What can you say about idle reboots now? Is it BIOS problem or AMD RMA required? I saw the poll but everyone could vote once. Maybe BIOS already fixed idle reboots for those who voted "yes" before.


You have a "Change vote" button at the bottom right of the poll.


----------



## thunk_stuff

I did some more testing with curve optimizer to see if I could narrow down the problem to a specific CCX. No luck.

*TEST 1*
All Cores: +8 magnitude
Result: Stable for 5 days so far no crashes

*TEST 2*
CCX1 (cores 0-5): 0 magnitude (default setting)
CCX2 (cores 6-11): +8 magnitude
Result: Crashes Within 2 Minutes

*TEST 3*
CCX1 (cores 0-5): +8 magnitude
CCX2 (cores 6-11): 0 magnitude (default setting)
Result: Crashes Within 2 Minutes
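One way to read those three results, written out as a tiny sketch. The interpretation baked into it is an assumption: that a crash is caused by whichever CCX was left at the default curve. The names are made up for illustration.

```python
# Results from the three tests above: (CCX1 offset, CCX2 offset) -> stable?
results = {
    (8, 8): True,   # TEST 1: all cores +8
    (0, 8): False,  # TEST 2: only CCX2 at +8
    (8, 0): False,  # TEST 3: only CCX1 at +8
}


def ccx_needing_offset(results: dict, offset: int = 8) -> list[str]:
    """List the CCXs that crash when left at the default curve."""
    needing = []
    # CCX1 at default while CCX2 has the offset crashed -> CCX1 needs it.
    if results.get((0, offset)) is False:
        needing.append("CCX1")
    # CCX2 at default while CCX1 has the offset crashed -> CCX2 needs it.
    if results.get((offset, 0)) is False:
        needing.append("CCX2")
    return needing


print(ccx_needing_offset(results))
```

Both CCXs end up on the list, which is why the problem can't be pinned to one of them from these tests alone; per-core offsets would be the next thing to try.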


----------



## aa.delite

Deepcuts said:


> You have a "Change vote" button at the bottom right of the poll.


lol, thanks, I did not notice. I see you voted "yes". Latest beta bios didn't fix idle reboots for you? Do you know any method to cause an idle reboot for testing?
I've no reboots yet on F11n (Gigabyte B550 Aorus Master), but idle voltage is 1.44-1.49. Not sure if it's normal. Maybe motherboard has to keep it high in advance to prevent BSOD if any core reaches 5 GHz.


----------



## excitebike

thunk_stuff said:


> I did some more testing with curve optimizer to see if I could narrow down the problem to a specific CCX. No luck.
> 
> *TEST 1*
> All Cores: +8 magnitude
> Result: Stable for 5 days so far no crashes
> 
> *TEST 2*
> CCX1 (cores 0-5): 0 magnitude (default setting)
> CCX2 (cores 6-11): +8 magnitude
> Result: Crashes Within 2 Minutes
> 
> *TEST 3*
> CCX1 (cores 0-5): +8 magnitude
> CCX2 (cores 6-11): 0 magnitude (default setting)
> Result: Crashes Within 2 Minutes


I did the exact same experiment and got the same results - regardless of the CCD that got the overvolt, I'd get an idle reboot. I'll probably end up trying to isolate the core(s) responsible but this type of thing makes me believe we've just got bad hardware. I happen to have a 5800x I was about to return, so I think I might try to swap for that and see if I've got the same issues. In the meantime, I'll start the RMA process. 



aa.delite said:


> lol, thanks, I did not notice  I see you voted "yes". Latest beta bios didn't fix idle reboots for you? Do you know any method to cause idle reboot for testing?
> I've no reboots yet on F11n (Gigabyte B550 Aorus Master), but idle voltage is 1.44-1.49. Not sure if it's normal. Maybe motherboard has to keep it high in advance to prevent BSOD if any core reaches 5 GHz.


I updated my C8DH to the latest BIOS (3003, released this morning) and I'm still getting idle reboots. My most reliable method is to...

1. Make sure the system is *as idle as possible*, including uninstalling any background processes that would keep voltage on your CPU (like the ASUS Aura Sync lighting service, in my case, which ridiculously keeps enough load to reduce (but not eliminate) the idle reboots). You can confirm in Ryzen Master that all of your cores "Sleep" more often than not.
2. Launch HWInfo64 and Ryzen Master.
3. Launch Cinebench R23.
4. Start a single core benchmark and then stop it shortly after it starts.
5. Repeatedly start and stop the single core benchmark
6. Try to start it when all the cores are showing "Sleep" in Ryzen Master.

I can usually get an idle reboot right as the CPU comes out of idle to start the next benchmark within 5 minutes.
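The start/stop part of the steps above can be automated if you don't want to sit there clicking. This is only a sketch under a few assumptions: the function is made up for illustration, it does not check Ryzen Master for sleeping cores (it just waits a fixed idle time), and the benchmark path and command-line switch in the comment are placeholders you would need to verify for your own install.

```python
import subprocess
import time


def start_stop_cycles(cmd, run_seconds=5.0, idle_seconds=10.0, cycles=20):
    """Start `cmd`, let it run briefly, stop it, idle, and repeat.

    Returns the number of completed cycles. On an affected system the
    machine may reboot before this returns, which is the point.
    """
    completed = 0
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        time.sleep(run_seconds)   # short burst of load
        proc.terminate()          # stop the benchmark early
        proc.wait()
        time.sleep(idle_seconds)  # give the cores time to drop back to sleep
        completed += 1
    return completed


# Example call (hypothetical path and switch, check both before use):
# start_stop_cycles([r"C:\Tools\CinebenchR23\Cinebench.exe",
#                    "g_CinebenchCpu1Test=true"])
```

Keep HWInfo64 open while it runs so you can see whether any WHEA errors are counted before a reboot.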


@shamino1978 @elmor - maybe you'll see this and have some idea about what's going on. I know there was a data collection effort around idle reboots in the ASUS X570 thread.


----------



## Imraneo

Hello fellas!
Just registered to announce that I have the exact same problem. Using 5900X + Asus Strix X570-F.
Totally stressed out. Tried many BIOS settings.. Patch B/C, beta, non-beta, etc. I've yet to try out the curve optimizer though.
I find it hard to believe it's a chip issue - There are so many with these issues and yet no announcement by any board partner/AMD?
Keep sharing guys..


----------



## Deepcuts

Damn, Asus people will overtake us at this rate!
Soon we will have to get reinforcements to keep our leadership.


----------



## Imraneo

Deepcuts said:


> Damn, Asus people will overtake us at this rate!
> Soon we will have to get reinforcements to keep our leadership.


Hahaha.. thank you for keeping our spirits up!
So far, based on what I've read online, MSI seems to be the most vocal, followed by Gigabyte and then Asus. I may be wrong though.
But really.. the only commonality (besides the CPU itself) is the AGESA thingy. I really really hope there's at least an announcement somewhere. I don't mind switching off my CPB and waiting for an update. 
Such a bummer that none of the YouTubers got this...


----------



## Schnuppl

Deepcuts said:


> I am saying try to cut the power supply unit power for 1 minute.
> On my motherboard, this fixes several problems. Might fix yours also.


Hello, I did that. It didn't help.

My system hasn't restarted itself since F31o. But..
The 2nd M.2 slot (the first from the chipset) has been gone since F31o. And it stays gone with every BIOS version.

I often get USB plug-in sounds, even when no device is connected. Rebooting a few times fixes it at some point.

The PC hangs (rarely) when starting. The graphics card is displayed, then the system hangs with a cursor line at the left. Then no restarts help, no pulling the power cable, no switching to the 2nd BIOS, no CMOS reset button.
And then it starts like a miracle.


----------



## Gri77o

Hi guys! Same here.
Getting WHEA errors, random black screens and reboots after installing my new Ryzen 7 5800x in an MSI X570 Tomahawk WiFi. I was so desperate that I almost RMA'd the CPU.

*Workaround 1*
After some digging I installed BIOS 7C84v151 (beta version), cleared CMOS, updated chipset drivers (from AMD), disabled PBO and CPB. Enabled XMP P2 (3600). No more reboots or errors, but the CPU was stuck at 3800MHz. Went into the BIOS and set Vcore to 1.25V, Vsoc to 1.1125v, LLC to m6, Vram to 1.35v and CPU multiplier to 45.75.

It is stable, including in games, running at 4575 Mhz all core. Max temps in Cinebench r20 near 64ºC in multicore. Setting multiplier to 46 needs 1.30v to be stable and temps go up to 75ºC. Works but not worth it.
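In case the multiplier numbers look odd: the all-core clock is just the multiplier times the base clock. A quick sketch, assuming the usual 100 MHz base clock (BCLK is not stated in the post, so that value is an assumption):

```python
BCLK_MHZ = 100.0  # assumed default base clock; not stated in the post


def core_clock_mhz(multiplier: float, bclk_mhz: float = BCLK_MHZ) -> float:
    """All-core clock in MHz for a fixed CPU multiplier."""
    return multiplier * bclk_mhz


print(core_clock_mhz(45.75))  # the 1.25 V setting
print(core_clock_mhz(46))     # the setting that needed 1.30 V
```

So the extra 25 MHz from going to a multiplier of 46 is what costs the additional voltage and roughly 10ºC.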

*Workaround 2*
I disabled Global C States and enabled PBO and CPB (multiplier on auto) and manually set Vcore to 1.375 volts, Vsoc to 1.125 volts and LLC to m6. It WORKED!!!! Stable without errors and boosting up to 4850 MHz with temps up to 75ºC in Cinebench R20 multi-core. Something with auto voltage is very wrong!! Moving back to workaround 1 because it works with much lower voltages and temps.

Waiting for a fix.

I hope it helps!


----------



## Imraneo

Gri77o said:


> Hi guys! Same here.
> Getting WHEA errors, random black screens and reboots after installing my new Ryzen 7 5800x in an MSI X570 Tomahawk WiFi. I was so desperate that I almost RMA'd the CPU.
> 
> *Workaround 1*
> After some digging I installed BIOS 7C84v151 (beta version), cleared CMOS, updated chipset drivers (from AMD), disabled PBO and CPB. Enabled XMP P2 (3600). No more reboots or errors, but the CPU was stuck at 3800MHz. Went into the BIOS and set Vcore to 1.25V, Vsoc to 1.1125v, LLC to m6, Vram to 1.35v and CPU multiplier to 45.75.
> 
> It is stable, including in games, running at 4575 Mhz all core. Max temps in Cinebench r20 near 64ºC in multicore. Setting multiplier to 46 needs 1.30v to be stable and temps go up to 75ºC. Works but not worth it.
> 
> *Workaround 2*
> I disabled Global C States and enabled PBO and CPB (multiplier on auto) and manually set Vcore to 1.375 volts, Vsoc to 1.125 volts and LLC to m6. It WORKED!!!! Stable without errors and boosting up to 4850 MHz with temps up to 75ºC in Cinebench R20 multi-core. Something with auto voltage is very wrong!! Moving back to workaround 1 because it works with much lower voltages and temps.
> 
> Waiting for a fix.
> 
> I hope it helps!


I tried Workaround 2 and didn't work for me. 
I think this might be the first 5800X I've seen with this issue? It seems that the 5600X are spared from this. Lower power requirements perhaps..


----------



## excitebike

Gri77o said:


> *Workaround 2*
> I disabled Global C States and enabled PBO and CPB (multiplier on auto) and manually set Vcore to 1.375 volts, Vsoc to 1.125 volts and LLC to m6. It WORKED!!!! Stable without errors and boosting up to 4850 MHz with temps up to 75ºC in Cinebench R20 multi-core. Something with auto voltage is very wrong!! Moving back to workaround 1 because it works with much lower voltages and temps.


Yeah, disabling Global C-States or DF C-States definitely can significantly reduce how often the reboots happen, but I can still get idle reboots. I've not had a single one with Curve Optimizer > All Core > Positive > 8.


----------



## aa.delite

excitebike said:


> Yeah, disabling Global C-States or DF C-States definitely can significantly reduce how often the reboots happen, but I can still get idle reboots. I've not had a single one with Curve Optimizer > All Core > Positive > 8.


What core voltage does hwinfo64 show when CPU is idle? Mine is 1.44-1.50 using new beta bios.


----------



## excitebike

aa.delite said:


> What core voltage does hwinfo64 show when CPU is idle? Mine is 1.44-1.50 using new beta bios.


With +8, I'm idling between 1.038V-1.050V. Without it, I'm seeing around 0.970V.

Those are the requests (as I understand it). SVI2 TFN shows 1.475V-1.481V.
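For comparison with the ~3-7 mV per step figure mentioned earlier in the thread, here is what my idle numbers work out to per Curve Optimizer step. Just a sketch of the arithmetic, assuming the offset is spread evenly over the 8 steps (it may well not be); the function name is only for illustration.

```python
def mv_per_step(base_v: float, offset_v: float, steps: int) -> float:
    """Average voltage change per Curve Optimizer step, in mV."""
    return round((offset_v - base_v) * 1000 / steps, 1)


baseline = 0.970  # idle request without the offset
for with_offset in (1.038, 1.050):  # idle request range with +8
    print(f"{with_offset:.3f} V -> {mv_per_step(baseline, with_offset, 8)} mV per step")
```

That lands a bit above the 3-7 mV estimate, at least for the idle voltage requests on my chip.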


----------



## Gri77o

Imraneo said:


> I tried Workaround 2 and didn't work for me.
> I think this might be the first 5800X I've seen with this issue? It seems that the 5600X are spared from this. Lower power requirements perhaps..


Keep in mind that your CPU may need more voltage. To find the 1.375v value I ran several tests with lower voltages until I got it stable.


excitebike said:


> Yeah, disabling Global C-States or DF C-States definitely can significantly reduce how often the reboots happen, but I can still get idle reboots. I've not had a single one with Curve Optimizer > All Core > Positive > 8.


I do not have that option!


----------



## thunk_stuff

Gri77o said:


> I do not have that option!


Curve optimizer for the Tomahawk X570? Have you looked at:


https://www.reddit.com/r/Amd/comments/k0jk3d


----------



## excitebike

In the interest of not letting this thread die and hopefully getting some visibility from someone that can diagnose AGESA, I put in my RMA for the 5950x last night. I've got more thermal paste arriving today, so I'll probably try my 5800x once the RMA goes through and I send it back.

I linked this thread in the RMA, just in case that helps. We'll see.


----------



## aa.delite

excitebike said:


> I put in my RMA for the 5950x last night.


No one knows yet if it's a CPU fault. But I don't see thousands of angry 5900x-5950x owners. I see ~50 people all over the world. Maybe WE are lucky. Or no one else has bought it yet.


----------



## thunk_stuff

excitebike said:


> I'll probably try my 5800x once the RMA goes through and I send it back.


Testing your 5800x would be interesting. If it runs stable and great at default settings, it is more evidence that the problem is with the CPU and not the motherboard.

I guess it can't say definitively, since it's not apples to apples 5900x comparison. And there is still a chance we just all have "special snowflake" CPUs that are fine but need a little extra help running to their full and stable potential through an AGESA update.


----------



## crossblk

Time to join "the party". I just replaced my 5 year old system (Intel) with an Aorus Master X570, 5900x (5950x coming this week), TUF 3080, and some G.Skill Trident Neo whatever, 64GB, CL16 ... To keep it short, CPB disabled was the only fix so far. XMP is on, and I still get some random reboots, rarely, mostly at idle, but at least I can use it, 'cause I was thinking I'd have to settle for playing Solitaire with this new build. I'm not a specific "team" fan; I always try to get what's best at the moment, don't care if it's AMD, Intel, whatever. But I have to say that I never had such problems with Intel over a 20 year span; I even had 2 Kingstons and 1 Corsair in the same build, no issues. It's probably gonna get fixed with BIOS updates. For me, it's fine, since even the 5900x with CPB off is better than what I had.


----------



## smbell1979

I've got an Asus Crosshair VIII Hero that has previously been running a 5800x with no issues whatsoever for the last three weeks straight. Stock settings in bios, with a Dark Rock Pro 4 cooler.

I finally got my 5950x in today, so I popped it in and I'm getting random reboots and then a "CPU over temperature" error. Most of the time this happens immediately after windows boots up to the desktop, like not even enough time for me to open ryzen master and look at anything. But when I have been able to get it open quickly, I see the temps get into the high 70s for a quick second, voltage is at 1.4ish.

I can let it sit on the login screen for an hour and it's fine, but as soon as it loads up the desktop, it reboots.

All settings in the BIOS are stock, BIOS has been updated to the latest version 3003 as of today.

The other weird thing is, when it actually manages to not reboot, I can run cinebench 50 times, multi and single core, I can play games for hours at a time and the temps are fine.

Full load never goes above 65c, single core never over 75c. Idle at 35-40c.

My heatsink is mounted correctly, thermal paste is fine. The fans are running and ramp up when needed etc.

This is my first Ryzen system, I realize I'm on overclock.net, but I don't want to overclock this thing at all, I just want it to work.

This seems really similar to what is happening to everyone else in here. The over temperature error just seems odd. I'm almost certain it's not temperature related.


----------



## MoW

Final F31 BIOS is out. Has anyone had the time to fiddle with it yet?


----------



## Deepcuts

Tested F31 on Aorus X570 Xtreme. Same issue.
BIOS defaults loaded. No XMP.
YouTube in Chrome was stable for the ~15 minutes I tested.
Handbrake CPU encode and Guild Wars 2 launcher = instant reboot.

Sorry for the bad news.
Others might be luckier.


----------



## Imraneo

OK guys. I've had some progress.
I'm using Asus Strix X570-F BIOS 3001. Stock settings didn't solve my reboots. So I finally narrowed it down to these 2 settings, which worked:

1) Disable Global C-state control
2) Vcore at 1.1V

All this while I've been over-volting. In fact I should be under-volting. In auto settings, the default voltage is 1.44V.
I have narrowed down the above 2 settings to be relevant for this fix. I've had other changes slowly removed as they're not needed.

More info:


> Idle temp: 44 deg (3.6Ghz)
> Single core boost: ~51 deg (max boost 4.82Ghz)
> All core boost: ~71 deg (max boost 4.52Ghz)
> XMP 3600 turned ON


Idled for about 10hrs straight (this is where reboots happen almost immediately, so this gives me huge confidence). Also ran some CineBench stably.
Still keeping my fingers crossed tightly. I think I'm getting what I paid for. Please share if this helped.
Also, is there any issue if I disable C-state control? I read this is a power saving feature, but I'm pretty new to this.

Cheers.


----------



## crossblk

Imraneo said:


> OK guys. I've had some progress.
> I'm using Asus Strix X570-F BIOS 3001. Stock settings didn't solve my reboots. So I finally narrowed it down to these 2 settings, which worked:
> 
> 1) Disable Global C-state control
> 2) Vcore at 1.1V
> 
> All this while I've been over-volting. In fact I should be under-volting. In auto settings, the default voltage is 1.44V.
> I have narrowed down the above 2 settings to be relevant for this fix. I've had other changes slowly removed as they're not needed.
> 
> More info:
> 
> 
> Idled for about 10hrs straight (this is where reboots happen almost immediately, so this gives me huge confidence). Also ran some CineBench stably.
> Still keeping my fingers crossed tightly. I think I'm getting what I paid for. Please share if this helped.
> Also, is there any issue if I disable C-state control? I read this is a power saving feature, but I'm pretty new to this.
> 
> Cheers.



This works for my 5900x for now. Did around 1 hour of Cinebench, played some Overwatch, no reboots so far, hope it stays this way. (XMP 3600 is on, CPB on). Default voltage was 1.2 here, I've changed that to 1.1. Not sure it's optimal, but it works for now.


----------



## nevcairiel

I read about Global C-States causing issues with other things before, so I'm going to try that as well and see what happens. If it's stable at idle until, say, tomorrow, I might reinstall the system and try actually using it as my main system.

This is on an X570 Aorus Master with the new F31 BIOS now, full default settings with just C-State disabled and 1.1 SOC. I'm not trying to dilute the results by throwing XMP into the mix immediately; I want to be sure.

With C-State Enabled it still crashes, so testing will be interesting.

Edit:
First hour in. Before this, it had never gone a full hour without crashing. I'm feeling hopeful. Going to leave it idle overnight and then try to "use" it tomorrow.
One thing I noticed so far is that the CPU never drops below 3.6GHz now and consistently consumes 40W+ at idle (it goes as low as 16W with C-States enabled).


----------



## Schnuppl

Now my 5950x only has 8 cores..
With every BIOS. With main and backup bios.


----------



## V!v!d

I have a 5900x on the way and am looking for a mobo (will do some OC). Are there any that are known not to have the issues reported in this thread? Sorry if this info is already listed somewhere. Thank you!


----------



## nevcairiel

The majority of chips are probably fine, but it appears a sizeable percentage is having some degree of issues.

That said, with C-States disabled it is still stable after a couple of hours of idle; previously it crashed within minutes. Going to run it through stress testing and benchmarking tomorrow to make sure the performance is as expected.


----------



## frollic

Registered just to be able to write in this thread.

Bought a 5900x, and 3200 CL16 DIMMs.

Tried three different mobos, Asrock B550M-ITX/ac, ASUS ROG B550-I GAMING and Gigabyte B550I AORUS PRO AX.
All brand new, same WHEA issue on all three.
Did also try a 2nd set of DIMMs to rule out the RAM, still same issue.

Requested an RMA with AMD 3 days ago.

I noticed everything works just fine if I disable half of the CPU (1 CCD). Obviously not a great workaround, but it might be as good as it gets for now.

I'll try some of the solutions provided here, perhaps they'll work, fingers X:ed.


----------



## excitebike

I had pretty good luck with disabling Global C-States... it does dramatically reduce the reboots, but I've definitely reproduced it with them disabled, it's just a lot harder. Just a fair warning to anyone using it that you may see an occasional reboot still. (I've not experienced a single reboot with +8 Curve Optimizer, despite spending a long time trying).

From a thermal perspective, I'm not sure which one is better - they both do something similar, based on my observations. When c-states are disabled, the cores won't reach their lowest power state. This reduces their ability to get out of the way of boosting cores, resulting in compromised boost frequencies. Overvolting with Curve Optimizer raises the floor on the voltage/frequency curves, also preventing idle cores from getting out of the way of boosting cores, also resulting in compromised boost frequencies.

I think disabling c-states keeps the voltage a little lower, because I can still get idle reboots. I feel like it's a stability trade off.


----------



## nevcairiel

Disabling Global C-States was stable overnight, but what excitebike said had me worried from the start: what if it's not actually 100% stable and the crashes are just much rarer? I really don't want a system that reboots once a month, so I'm definitely going to continue with the RMA.


----------



## Schnuppl

nevcairiel said:


> Disabling Global C-States was stable over night, but what excitebike said had me worried from the start, what if its not actually 100% stable and just much rarer, I really don't want a system that reboots once a month, so I'm definitely going to continue with the RMA.


RMA with Mainboard or CPU?


----------



## MakubeX

5950X on x570 Aorus Master 1.1 here. My weird issue (other than a hard 1867 FCLK wall) is Per Core overclock only affecting half the cores. Tested on both F31o and F31 beta.

Can someone else with a 5950X (and others too for comparison) please enable Per CCX CPU clock and see if they experience the same issue?


----------



## iraff1

Is it possible my hardware has degraded over the week I've had it, without increasing any voltages?

I have had my 5950x machine stable at 3600MHz CL16, 1800 FCLK for the past week and have not seen a single WHEA error. I left the machine idle during the night and suddenly I see WHEA errors. The problem is that after this occurrence of errors during idle, the errors are ALSO showing up under load now. I have not changed ANYTHING; I have not even been in the BIOS.

Literally went from rock solid stable for 1 week+ to idle during the night, WHEA errors, and now WHEA errors all over the place. What kind of a beta is this CPU really?


----------



## iraff1

iraff1 said:


> is it possible my hardware has degraded over the week ive had it without increasing any voltages?
> 
> i have had my 5950x machine stable at 3600mhz cl16 1800flck for the past week and not seen a single WHEA error, i left the machine idle during the night and suddenly i see WHEA errors, the problem is , after this occurrence of errors during idle the errors are ALSO showing up during load now, i have not changed ANYTHING, i have not even been in the bios.
> 
> Literally went from rock solid stable for 1 week + to idle during night, WHEA Errors and now WHEA errors all over the place. What kind of a beta is this cpu really?


Update: I am now running everything at stock, memory at 2666 and FCLK at 1333, and I'm STILL getting a bunch of

*A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Cache Hierarchy Error
Processor APIC ID: 11

The details view of this entry contains further information.*

and 

*A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Cache Hierarchy Error
Processor APIC ID: 10

The details view of this entry contains further information.*

This is so strange. I have not changed anything, and I've run the machine fully stable at 3600MHz/1800 FCLK for a week and a half without seeing a single one of those messages. Then I leave the computer idle during the night and now I have these errors even at stock speeds? I think the CPU is about to have a complete meltdown.
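
For anyone who wants to compare logs, here is a minimal Python sketch that tallies pasted WHEA-Logger text by error type and APIC ID. The sample text is just the two entries above; `tally_whea` and `apic_to_core` are made-up helper names, and the core mapping assumes two SMT threads per core with consecutive APIC IDs, which may not hold on every CPU.

```python
import re
from collections import Counter

# Sample text copied from the two event-log entries quoted above.
SAMPLE_LOG = """\
A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Cache Hierarchy Error
Processor APIC ID: 11

A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Cache Hierarchy Error
Processor APIC ID: 10
"""


def tally_whea(text):
    """Count (error type, APIC ID) pairs found in pasted WHEA-Logger text."""
    types = [t.strip() for t in re.findall(r"Error Type:\s*(.+)", text)]
    apic_ids = [int(n) for n in re.findall(r"Processor APIC ID:\s*(\d+)", text)]
    return Counter(zip(types, apic_ids))


def apic_to_core(apic_id, threads_per_core=2):
    """ASSUMPTION: SMT siblings have consecutive APIC IDs, so core = id // 2."""
    return apic_id // threads_per_core


counts = tally_whea(SAMPLE_LOG)
for (err_type, apic_id), n in sorted(counts.items()):
    print(f"{err_type} | APIC ID {apic_id} | core {apic_to_core(apic_id)} | {n}x")
```

Under that assumption, APIC IDs 10 and 11 would be the two threads of the same physical core. Treat that as a hint for where to look, not as a diagnosis.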


----------



## Deepcuts

iraff1 said:


> Update, i am now running everything at stock, memory at 2666 and flck at 1333, STILL getting a bunch of


I have seen this numerous times. Stock DDR4 is 2133, not 2666.
I have yet to see any DDR4 above 2133 without XMP or manual settings.


----------



## iraff1

Deepcuts said:


> I have seen this numerous times. Stock DDR4 is 2133, not 2666
> I have yet to see any DDR4 above 2133 without XMP or manual settings.


Maybe stock was the wrong word; "optimized defaults" is what you get when running that. Either way, going from fully stable 3600/1800 with tight timings to not even being able to run 2666/1333 with slow ass timings = suddenly broken hardware?


----------



## Imraneo

iraff1 said:


> is it possible my hardware has degraded over the week ive had it without increasing any voltages?
> 
> i have had my 5950x machine stable at 3600mhz cl16 1800flck for the past week and not seen a single WHEA error, i left the machine idle during the night and suddenly i see WHEA errors, the problem is , after this occurrence of errors during idle the errors are ALSO showing up during load now, i have not changed ANYTHING, i have not even been in the bios.
> 
> Literally went from rock solid stable for 1 week + to idle during night, WHEA Errors and now WHEA errors all over the place. What kind of a beta is this cpu really?


Pretty much the same happened to me. Worked great for a week, then downhill.
Anyways, do check out my post above and see if it helps.
Cheers!


----------



## iraff1

Imraneo said:


> Pretty much the same happened to me. Worked great for a week, then downhill.
> Anyways, do check out my post above and see if it helps.
> Cheers!


Will try it out. I am very surprised that it worked so well in the beginning and then suddenly there was a plethora of issues straight out of the blue. Glad to hear I am not alone in this though. It should make an RMA easier to deal with: if you also had this issue (stable in the beginning, then slowly degrading into ****), it's clear there are some big issues with the 5000 series.

The only thing that annoys me is that I use this machine for work. After 5 days of heavy testing I decided to migrate my stuff over, and now that I am finally set up, it all breaks. Luckily I still have my old machine available; it's just really god damn annoying to deal with heh


----------



## iraff1

Imraneo said:


> Pretty much the same happened to me. Worked great for a week, then downhill.
> Anyways, do check out my post above and see if it helps.
> Cheers!


Setting 1.1 vcore won't even let me boot into Windows.
Disabling C-states did nothing; just as many WHEA errors.

To me it all points to broken silicon. They made this thing consume 1.5v at idle, and that probably broke the silicon during idle at night. It was stable for a week because I was constantly loading the system, and that brings the vcore down to about 1.25v. I left it idle for too long and that broke it.

Now I cannot even operate at regular slow speeds without a bunch of errors. The silicon is broken. ****. Hopefully it's a dud and the next one I get is not like this.


----------



## iraff1

Manually locking the CPU at a 30x multiplier (3000MHz) fixed the issue for me. That means this thing is slower than what I upgraded from, but I have no choice, I need to work. I'll have to make sure to rebuild my old machine this weekend so I can RMA this thing.


----------



## aa.delite

iraff1 said:


> Manually locking the CPU at a 30x multiplier (3000MHz) fixed the issue for me. That means this thing is slower than what I upgraded from, but I have no choice, I need to work. I'll have to make sure to rebuild my old machine this weekend so I can RMA this thing.


Are you sure it's time to RMA already? I'm not sure AMD has fixed their CPUs yet. Did you try the latest beta BIOS?


----------



## iraff1

aa.delite said:


> Are you sure it's about time to RMA? I'm not sure AMD fixed their CPUs yet. Did you try latest beta bios?


Yeah, tested F31 and same issue. A system doesn't just go from being rock stable for a week to suddenly not even running "stock speeds". The only thing that changed during the week I've had the system is that last night I let it idle for the first time. Letting it idle means it will spike the CPU with 1.5v a lot for those smaller workloads where it boosts to 5050+ MHz. My belief is that I left it idle for too long, and the high voltages served to it during this almost 10 hour period made the silicon degrade. Now I am here with a nearly broken CPU that won't even run stock.

I've re-flashed my BIOS and tested a ton of settings. Nothing yields any fix except setting the multiplier to x30, which makes the boosting algorithm go away. I mean, I would probably not RMA if it had started out like this, but please consider that I had this system for over a week; things can't just go from one state to another without something changing. The hardware has somehow changed. It's not acting like it did yesterday before the long idle session.

EDIT: The only other thing I could consider being the issue is Windows 10. Windows changes all the time, without your knowledge. I will try a new installation of Win10 this weekend just to see if the WHEA errors occur on a fresh install. Perhaps it's something in Windows that has to "come into form" before the errors start showing up in the event log, who knows. In my mind, however, all things point to the CPU having somehow changed for the worse.


----------



## Cavokk

iraff1 said:


> Yeah, tested F31 and same issue. A system doesn't just go from being rock stable for a week to suddenly not even running "stock speeds". The only thing that changed during the week I've had the system is that last night I let it idle for the first time. Letting it idle means it will spike the CPU with 1.5v a lot for those smaller workloads where it boosts to 5050+ MHz. My belief is that I left it idle for too long, and the high voltages served to it during this almost 10 hour period made the silicon degrade. Now I am here with a nearly broken CPU that won't even run stock.
> 
> I've re-flashed my BIOS and tested a ton of settings. Nothing yields any fix except setting the multiplier to x30, which makes the boosting algorithm go away. I mean, I would probably not RMA if it had started out like this, but please consider that I had this system for over a week; things can't just go from one state to another without something changing. The hardware has somehow changed. It's not acting like it did yesterday before the long idle session.
> 
> EDIT: The only other thing I could consider being the issue is Windows 10. Windows changes all the time, without your knowledge. I will try a new installation of Win10 this weekend just to see if the WHEA errors occur on a fresh install. Perhaps it's something in Windows that has to "come into form" before the errors start showing up in the event log, who knows. In my mind, however, all things point to the CPU having somehow changed for the worse.


Hmmmm this is worrying me - keen to get your feedback on a fresh win 10 install....


----------



## Spiriva

AMD 5000 series are pure trash. I had the 5950x and the Dark "Hero". BSODs, hard reboots all the time. It even rebooted itself while in the BIOS.

The 3003 BIOS did nothing. I tried changing the DDR4, graphics card, PSU and M.2, resetting the BIOS, clearing CMOS, reinstalling Windows, setting DDR4 timings etc.

Nothing changed; the garbage just BSOD'd or hard rebooted.

FU AMD! Your products are trash. No one in their right mind will buy a $1500 system and "wait weeks for a BIOS". I will never again buy anything from AMD.

Pure garbage.


----------



## Imraneo

Cavokk said:


> Hmmmm this is worrying me - keen to get your feedback on a fresh win 10 install....


At this point, it's probably worth trying everything..
I tried reinstalling but it reboots during the early stages of the install causing data corruptions...
It's definitely not a software issue...


----------



## Cavokk

Spiriva said:


> AMD 5000 series are pure trash. I had the 5950x and the Dark "Hero". BSODs, hard reboots all the time. It even rebooted itself while in the BIOS.
> 
> The 3003 BIOS did nothing. I tried changing the DDR4, graphics card, PSU and M.2, resetting the BIOS, clearing CMOS, reinstalling Windows, setting DDR4 timings etc.
> 
> Nothing changed; the garbage just BSOD'd or hard rebooted.
> 
> FU AMD! Your products are trash. No one in their right mind will buy a $1500 system and "wait weeks for a BIOS". I will never again buy anything from AMD.
> 
> Pure garbage.


I really understand your frustration, and I must say that, personally having been on Intel platforms since the first Athlons in the 90's, it's quite a wake-up call for me as well with all the issues I am seeing and experiencing since changing to the Ryzen 5950x. For sure I would NOT use R5000 for professional work as it is now. I'll give them a chance to remedy this though, as my rig is only for entertainment and I like to tinker.

all the best
C


----------



## thunk_stuff

frollic said:


> Tried three different mobos, Asrock B550M-ITX/ac, ASUS ROG B550-I GAMING and Gigabyte B550I AORUS PRO AX.
> All brand new, same WHEA issue on all three.
> Did also try a 2nd set of DIMMs to rule out the RAM, still same issue.
> 
> I'll try some of the solutions provided here, perhaps they'll work, fingers X:ed.


You are the first person who has confirmed this problem on different boards. I'm going to quote you when I go through the RMA process with AMD to convince them it's not a motherboard issue.

If you could try out +8 on all cores in the Curve Optimizer (if your BIOS supports it), that would be helpful. It's what has made the system stable for me and excite, and it would confirm we are all running into the exact same issue. You do lose a hundred or so MHz on max boost.

I also posted this on the AMD forum, and we have a theory that we happen to be a small minority who got bad silicon that AMD should have binned for lower performing chips. For us, default settings behave as if we had undervolted the chip too much.


----------



## iraff1

Did some testing, results here: Is it possible my new 5950x system hardware has already degraded?

I pretty much can't run the machine without setting a static multiplier. Disabling Core Performance Boost does indeed disable the boosts, but the WHEA errors remain. Only when I set a static multiplier like x36 do the WHEA errors go away. I can run my memory and FCLK overclocked no problem with a set multiplier.

I was too quick to think my hardware had degraded. I now believe that my hardware didn't degrade, but that I somehow managed to update the microcode in the CPU when I tested BIOS F31, and that the microcode remains even after you flash back to F30. That would be why I am suddenly screwed like the rest of you in here. Hopefully this is something AMD can fix, but the question is... when? Why are we beta testing this thing lol


----------



## thunk_stuff

iraff1 said:


> Why are we beta testing this thing lol


AMD assumed only scalpers would get the CPUs, and so we must be the schmucks who support the scalper industry and should be punished.

Just kidding, AMD doesn't care.


----------



## iraff1

frollic said:


> Registered just to be able to write in this thread.
> 
> Bought a 5900x, and 3200 CL16 DIMMs.
> 
> Tried three different mobos, Asrock B550M-ITX/ac, ASUS ROG B550-I GAMING and Gigabyte B550I AORUS PRO AX.
> All brand new, same WHEA issue on all three.
> Did also try a 2nd set of DIMMs to rule out the RAM, still same issue.
> 
> Requested a RMA with AMD 3 days ago.
> 
> I noticed everything works just fine if I disable half of the CPU (1CCD), obviously not a great work around but it might be as good as it gets, for now.
> 
> I'll try some of the solutions provided here, perhaps they'll work, fingers X:ed.


This is valuable information, thank you. My theory that there is microcode on the CPU that may have been corrupted during my BIOS update is the only practical explanation I have for why my fully working setup suddenly became a fully non-working setup, no matter how low I go on the FCLK or memory speeds. The only thing that solves it for me is setting a static multiplier, but maybe disabling one of the CCDs would also fix it. I have not tried that yet.

Anywho, we're all in this beta boat together it seems. I will probably RMA my CPU too if they don't get this sorted within a couple of weeks. But I'm in no rush with the RMA, because I know there are zero CPUs to be had; the next batch that arrives in Sweden is next year.


----------



## Deepcuts

When I first swapped my 3950X for the new 5950X, I set up the BIOS the exact same way again, and I recall it worked for about 2 hours without problems. Then it started misbehaving.
At first I didn't give it much thought, but seeing more people state the same thing jolted my neuron.
The only tweaks were with RAM: manual 3600 @ 1.38V with tight timings, UCLK and FCLK at 1800. PBO was disabled and the only manual voltage was VCORE -0.0100 mV. The rest was on auto, including LLC.
VCORE was negative as that gave me the best performance and lower temps.
Then again, maybe it was just my imagination or dumb luck. No way to know for sure.


----------



## reqq

I get WHEA errors at my current mem overclock but not with XMP.. but I don't get any actual problems.. 5 hours stable with Karhu memtest.. and everything else. It's a BIOS thing 100%..


----------



## aa.delite

Somehow need to ask Gigabyte about the 1.44-1.50v at idle. It stays that high even below 4000 MHz. That's too high for normal use. Fine for a 1 second boost to 5030MHz, but not all the time.


----------



## aa.delite

reqq said:


> i get whea error at my current mem overclock but not with xmp.. but i dont get any problems.. 5 hours stable with karhu memtest.. and everything else. Its a bios thing 100%..


try to use 3600 XMP, not 3733


----------



## reqq

iraff1 said:


> Update, i am now running everything at stock, memory at 2666 and flck at 1333, STILL getting a bunch of
> 
> *A corrected hardware error has occurred.
> 
> Reported by component: Processor Core
> Error Source: Corrected Machine Check
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 11
> 
> The details view of this entry contains further information.*
> 
> and
> 
> *A corrected hardware error has occurred.
> 
> Reported by component: Processor Core
> Error Source: Corrected Machine Check
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 10
> 
> The details view of this entry contains further information.*
> 
> This is so strange, i have not changed anything, and ive ran the machine full stable with 3600mhz/1800flck for a week and a half and not seen a single of those messages, then i leave the computer idle during night and now i have these errors even when i am at stock speeds? I think the cpu is about to have a complete meltdown


That's not the WHEA I get.. the only one I get is Bus/Interconnect Error, APIC-ID:0..


----------



## reqq

aa.delite said:


> try to use 3600 XMP, not 3733


Yeah.. 3600 XMP didn't give WHEA errors.. however 3733 is completely stable, so that's what I run... You should see the difference in AOTS score between these mem speeds, it's like night and day.. I'm not gonna give that up.. it's a good indicator of how much more you get in games.
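
Since memory speed and FCLK keep coming up together in this thread (3600/1800, 2666/1333), here is a small Python sketch of the 1:1 relationship. It assumes the usual DDR convention of two transfers per clock, so the memory clock is half the rated speed and a 1:1 FCLK matches it. `fclk_for_1to1` is a hypothetical helper name, not something from any tool.

```python
def fclk_for_1to1(ddr_rate_mts):
    """Return the FCLK in MHz that matches the memory clock 1:1.

    DDR memory transfers data twice per clock, so the real memory clock
    is half of the advertised rate (e.g. DDR4-3600 runs at 1800 MHz).
    """
    return ddr_rate_mts / 2


# Memory speeds mentioned in this thread.
for rate in (2133, 2666, 3600, 3733):
    print(f"DDR4-{rate}: 1:1 FCLK = {fclk_for_1to1(rate):.1f} MHz")
```

By this arithmetic 3733 lands at roughly the 1867 FCLK mentioned earlier in the thread, so going from 3600 to 3733 also pushes the fabric clock up, not only the RAM.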


----------



## MoW

iraff1 said:


> Manually locking the CPU at a 30x multiplier (3000MHz) fixed the issue for me. That means this thing is slower than what I upgraded from, but I have no choice, I need to work. I'll have to make sure to rebuild my old machine this weekend so I can RMA this thing.


It's equivalent to setting CPB to disabled (no boosting). Setting it to 3000 manually is way below its stock base speed.
Some have found that disabling CPB solves the random reboots and errors. It's just a temporary fix.
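
To put numbers on that, a quick Python sketch of what a locked multiplier works out to. It assumes the nominal 100 MHz reference clock (BCLK) and uses AMD's published base clocks for the two Ryzen 9 parts; `locked_clock_mhz` is a made-up helper name.

```python
BCLK_MHZ = 100  # ASSUMPTION: nominal reference clock, not measured on any board here

# Published base clocks in MHz (AMD spec sheet values).
BASE_CLOCK_MHZ = {"5950X": 3400, "5900X": 3700}


def locked_clock_mhz(multiplier, bclk_mhz=BCLK_MHZ):
    """Core clock that results from a fixed multiplier."""
    return multiplier * bclk_mhz


# The two static multipliers mentioned in this thread.
for mult in (30, 36):
    clock = locked_clock_mhz(mult)
    delta = clock - BASE_CLOCK_MHZ["5950X"]
    print(f"x{mult}: {clock} MHz ({delta:+d} MHz vs 5950X base clock)")
```

So x30 sits below the 5950X base clock, while the x36 setting mentioned earlier sits slightly above it.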


----------



## thunk_stuff

reqq said:


> thats not the WHEA i get.. only one i get is Bus/Interconnect Error, APIC-ID:0..


I have gotten both "Cache Hierarchy" and "Bus/Interconnect" WHEA errors. Both went away with voltage tweak. So they can both point to the same issue.


----------



## MoW

Correct me if I am wrong, but the poll statistics speak for themselves. Most of the CPUs having issues are Ryzen 9s, and those have 2 CCDs. Are Ryzen 5 & 7 (with a single CCD) less prone to errors? And some have found that disabling one of the CCDs on a Ryzen 9 did prevent the reboots.


----------



## crossblk

Got my 5950x yesterday and replaced the 5900x. BIOS with default settings and only the XMP profile on (3600), FCLK at 1800, and everything works smoothly, for now at least. As I said in a previous post, with the 5900x I had WHEA errors lots of times; the only fix was to disable CPB and set vcore to 1.1 from 1.2 (default), and that worked. After switching to the 5950x I was expecting more errors, but for some reason everything works. BIOS is F31o (Aorus Master). Ran Cinebench a couple of times, played some games for a few hours, and rendered a 6 hour 3ds Max scene with no issues. Left it idle overnight, no reboots. But it's only been 1 day, and from what I've seen here, I'm expecting it to start "degrading". Someone mentioned Windows, and I remember that after I installed the 5900x 3 days ago, everything worked for like 8-9 hours with no issues. After that the WHEA errors started and I had to disable CPB.


----------



## Xvelved

For the record, this happens with Zen 2 CPUs as well. I have a 3700x with a Gigabyte X570 Aorus Elite and Trident Z Neo 16GB 3600MHz. If I upgrade the BIOS to version F10, F11, F20, F21, F30, F31o or F31, it instantly produces tons of WHEA error event 19 (cache hierarchy error) and random reboots while gaming, surfing websites or even at idle.
The temporary fix is one of:
1) Disable CPB (this defeats the purpose of CPU auto boost)
2) Static overclock of 4.0GHz @ 1.32v
3) Downgrade to BIOS F4 (the most stable BIOS for me)

1 and 2 solve the WHEA errors if I'm using a BIOS version higher than F4.

The most stable BIOS for me is F4, which has AGESA 1.0.0.4 B, with zero WHEA errors and no random reboots. Higher versions of AGESA will surely trigger tons of WHEA errors and an unstable system.
Another test I did was swapping in my friend's R5 3600, and to my surprise there were no errors! Something is wrong with my 3700x, or maybe with the microcode in the BIOS. I cannot RMA my CPU yet since it's the only PC I have.
Many users have complained about the error on the AMD forums, reddit and other tech forums. People have tried to fix this; some succeeded by updating the BIOS or swapping RAM, but others like me didn't. Believe me, I have tried all of the fixes but they don't work at ALL. Changing CPU voltage, SOC voltage, LLC, changing other BIOS settings, you name it. I'm no tech expert, I just tried all of the suggested fixes. End users like us can't really do much; this has something to do with AMD itself.

I don't really mind about CPU or RAM overclocking, I just want a stable system. With the current Zen 3 stock issues, I just have to live with it for now, probably until next year when I can purchase a new CPU and RMA mine.

tl;dr: the ultimate fix is RMA.


----------



## iraff1

For what it's worth, I realized after tweaking a million settings that these WHEA errors on APIC ID 10 and 11 that I am getting seem to be, first of all, coming and going. It was probably just damn random luck that I wasn't seeing the error for the past 5 days. After tweaking some settings the errors went away and I thought I had solved it, and then suddenly they were back. Again, I am not changing anything between when they are not there and when they are there.

Who knows at this point. I am by far too inexperienced to try to understand a complex CPU like this. I am choosing to ignore these errors, as they seem to have no effect on my system stability (at least I haven't had any issues so far except seeing this message spam in the event log). I guess time will tell.

I hope AMD can get this **** sorted. I am baffled that not a single reviewer who has had a 5000 series CPU has reported ANY issues like this, yet this issue is huge and massively widespread. I am starting to think the entire reviewer community is on the AMD payroll, because they sure as hell trash Intel every time they get a chance, but with AMD it's pretty damn quiet if you ask me. The WHEA issues should be covered by them; it's a massive issue that's affecting a ton of users. To me it seems like reviewers get their own specially crafted silicon, while the rest of us get scrap metal.


----------



## Imraneo

Do you think it makes sense to make some noise over at the AMD forums too? Not sure if they will take this more seriously then...

One of the threads:

Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power

Mainboard: MSI x570 Unify Mainboard-BIOS: 7C35vA82 (Beta version) CPU: Ryzen 5900x RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2) Drive: M.2 Samsung 970 Evo+ 1TB SSD Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT PSU: be quiet straight power 11 750w Platinum OS: Win 10 Pro (64bit)...

community.amd.com


----------



## thunk_stuff

Imraneo said:


> Do you think it makes sense to make some noise over at the AMD forums too? Not sure if they will take this.more seriously then...


I've been posting in that thread along with here. Also have done: /r/ASRock, /r/AMDhelp, starting RMA with AMD, bugging ASRock for BIOS update (their support is pretty bad BTW). Lowest tier AMD support is probably aware of the thread, but no official response yet from AMD to even acknowledge there's an issue.

If you have an ASUS motherboard you might be able to try out an AGESA 1.1.8.0 BIOS, which looks like the last hope before RMA becomes reality. For at least some people it's eliminated WHEA issues. If you don't have ASUS you could try bugging support for your motherboard and seeing if they have a beta bios.

I'm almost tempted to reach out to Gamers Nexus and offer to ship my processor to them for testing, to get more media attention on this issue. But the voltage tweak is giving me a stable system and I'm only losing 100MHz compared to default. And when it's stable it's an amazing processor.


----------



## iraff1

Imraneo said:


> Do you think it makes sense to make some noise over at the AMD forums too? Not sure if they will take this more seriously then...
> 
> One of the threads:
> 
> Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power
> 
> Mainboard: MSI x570 Unify Mainboard-BIOS: 7C35vA82 (Beta version) CPU: Ryzen 5900x RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2) Drive: M.2 Samsung 970 Evo+ 1TB SSD Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT PSU: be quiet straight power 11 750w Platinum OS: Win 10 Pro (64bit)...
> 
> community.amd.com


Def worth it. I am appalled to read the "solution to the problem" in the thread you linked. How is removing 50% of the performance the CPU delivers a fix for the problem? Shouldn't the CPU be marketed at that performance level then? Disabling Core Performance Boost is definitely not an acceptable solution to this problem. I expect 640 single thread performance in Cinebench and 11500-12000 in multi thread. That is what was advertised, and that is what I am paying for.


----------



## iraff1

thunk_stuff said:


> I've been posting in that thread along with here. Also have done: /r/ASRock, /r/AMDhelp, starting RMA with AMD, bugging ASRock for BIOS update (their support is pretty bad BTW). Lowest tier AMD support is probably aware of the thread, but no official response yet from AMD to even acknowledge there's an issue.
> 
> If you have an ASUS motherboard you might be able to try out an AGESA 1.1.8.0 BIOS, which looks like the last hope before RMA becomes reality. For at least some people it's eliminated WHEA issues. If you don't have ASUS you could try bugging support for your motherboard and seeing if they have a beta bios.
> 
> Almost tempted to reach out to Gamer Nexus and be willing to ship my processer to them to test and get more media attention on this issue. But the voltage tweak is giving me a stable system and I'm only losing a 100Mhz compared to default. And when it's stable it's an amazing processor.


The fact that all of these reviewers are obviously getting extremely well tested samples makes me think there's a whole new market opening up for reviewers to buy their own retail hardware at release, then compare and re-test it against the review samples they got.

A real world comparison between the 2 would definitely be very popular among viewers. Given the extreme influence some of these reviewers have, it's safe to say a big company like AMD, Intel or Nvidia would NEVER gamble and just send them a random sample. The decrease in sales from just one bad review early on probably means 100+ million in lost revenue, if not more, so it's understandable that they handpick samples.


----------



## frollic

thunk_stuff said:


> If you could try out the +8 all cores in the curve optimizer (if you bios supports it), that would be helpful. It's what has made the system stable for me and excite, and would confirm we are all running into the exact same issue. You do lose a hundred or so mhz on max boost.


Nope, no curve optimizer on the Gigabyte B550I, or at least not able to find it :/

I actually have another 5900x sitting on my desk, still not sure if I should return it, or unpack it, and check if it'd crash too :|


----------



## gcji

frollic said:


> Nope, no curve optimizer on the Gigabyte B550I, or at least not able to find it :/
> 
> I actually have another 5900x sitting on my desk, still not sure if I should return it, or unpack it, and check if it'd crash too :|


I have a B550I as well, with a 5950x - curve optimizer is only on BIOS revision F11, it won't show up on F10. You also need to set PBO to advanced to see it. Setting it to +8 has been the only way to keep my CPU somewhat stable, though I still get occasional crashes. I want to say I had better luck with F11j (beta bios) than the final F11 release but I have no idea how to reproduce the crashes so no systematic way to tell.

Consistent with many other people in this thread, my crashes never happen under heavy load but just when surfing the web, roughly every 4 hours even with curve optimizer on (like every 15 minutes without).


----------



## thunk_stuff

gcji said:


> Setting it to +8 has been the only way to keep my CPU somewhat stable, though I still get occasional crashes.


Have you tried it slightly higher, like +10? I was mostly stable at +6, but it did crash after a day. Setting it to +8 has been stable for 12+ days. I figure even if we all are having the same issue, the silicon quality between every CPU will be slightly different.


----------



## aa.delite

Now testing the F11 (test) pre-final BIOS (B550 Aorus Master). Defaults, memory XMP profile 3733 MHz, SVM (virtualization) enabled, core voltage "Normal", core voltage offset -0.2v. Prime95 is stable for hours. Temperature is very low, all-core turbo boost is 4566 MHz while running Prime95. Played Cyberpunk for a few hours: no WHEA, no BSOD, no USB 2.0 problems, no reboots yet. Core offset is magic: it works at voltages so low that you couldn't get stable with manual fixed settings.


----------



## Imraneo

thunk_stuff said:


> I've been posting in that thread along with here. Also have done: /r/ASRock, /r/AMDhelp, starting RMA with AMD, bugging ASRock for BIOS update (their support is pretty bad BTW). Lowest tier AMD support is probably aware of the thread, but no official response yet from AMD to even acknowledge there's an issue.
> 
> If you have an ASUS motherboard you might be able to try out an AGESA 1.1.8.0 BIOS, which looks like the last hope before RMA becomes reality. For at least some people it's eliminated WHEA issues. If you don't have ASUS you could try bugging support for your motherboard and seeing if they have a beta bios.
> 
> Almost tempted to reach out to Gamers Nexus and ship my processor to them to test, to get more media attention on this issue. But the voltage tweak is giving me a stable system and I'm only losing 100 MHz compared to default. And when it's stable it's an amazing processor.


I'm on the 1.1.8.0. BIOS with Asus and it's not fixed there.
1.1V is still the sweet spot for me. Running well with advertised clock speeds. I'm only losing the C-states functionality.
Last night I ran with 1.2V and an all-core benchmark pushed it to 150W, causing restarts. 1V was too weak.
1.15V came very close to 150W, with higher temps.
And all this makes me wonder.. why is my stock voltage set to 1.44V?!
I've already discussed with my retailer and he's fairly certain that the BIOS needs to be polished. I've sent him an email about this which he will forward to AMD through the distributor channel.
I'll buy some time before I decide to RMA..


----------



## aa.delite

Imraneo said:


> Last night I ran with 1.2V and all-core benchmark threw it to 150W, causing restarts. 1V was too weak.
> 1.15V came very close to 150W, with higher temps.


Don't use manual voltage, use negative offset. Core offset or curve. Try -0.2v offset.


----------



## Imraneo

aa.delite said:


> Now testing F11(test) pre-final bios (b550 aorus master). Defaults, memory XMP profile 3733mhz, SVM (virtualization) enabled, core voltage "Normal", Core voltage offset -0.2v. Prime95 is stable for hours. Temperature is very low, turbo boost all cores 4566 while prime95. Played Cyberpunk for a few hours, no WHEA, no BSOD, no USB 2.0 problems, no reboots yet. Core offset is a magic, it works with so low voltage so you can't get stable by manual fixed settings.


How does voltage offset work differently than core voltage? Does it still fluctuate?
Although 1.1Vcore works, I still need to disable C-state functionality. I'll try the offset and see if it can work with C-state on.
For benchmarking, try Cinebench. It seems to be more stressful than Prime95.


----------



## gcji

thunk_stuff said:


> Have you tried it slightly higher, like +10? I was mostly stable at +6, but it did crash after a day. Setting it to +8 has been stable for 12+ days. I figure even if we all are having the same issue, the silicon quality between every CPU will be slightly different.


Yeah, I tried +3 up to +15. It seems like at +8 the really bad crashes stop but even at +15 I would still get occasional crashes. I'm running the 5950x on eco mode in a tiny SFF PC with a nominally 100W TDP cooler (waiting on a new case and watercooling), so my temps are also really high which may affect stability.



aa.delite said:


> Now testing F11(test) pre-final bios (b550 aorus master). Defaults, memory XMP profile 3733mhz, SVM (virtualization) enabled, core voltage "Normal", Core voltage offset -0.2v. Prime95 is stable for hours. Temperature is very low, turbo boost all cores 4566 while prime95. Played Cyberpunk for a few hours, no WHEA, no BSOD, no USB 2.0 problems, no reboots yet. Core offset is a magic, it works with so low voltage so you can't get stable by manual fixed settings.


Which setting are you changing? DVID? I was under the impression only curve optimizer modifies the voltage-frequency curve and that the offset just limits maximum CPU voltage.


----------



## aa.delite

gcji said:


> Which setting are you changing? DVID? I was under the impression only curve optimizer modifies the voltage-frequency curve and that the offset just limits maximum CPU voltage.


Yep, DVID. Got a huge temperature/voltage drop and higher boost. There is no 1.44-1.50v idle anymore, and heavy load is ~1.04-1.13v instead of the 1.30v default. Seems very stable for now. It's not about WHEA; I haven't had WHEA errors since the F11j BIOS anyway. But I could not pass Prime95 at stock: black screen within 15 minutes, and the power button didn't work. I also had random reboots and up to 1.50v at idle. I don't face these problems with a negative DVID offset for now.


----------



## MoW

frollic said:


> Nope, no curve optimizer on the Gigabyte B550I, or at least not able to find it :/
> 
> I actually have another 5900x sitting on my desk, still not sure if I should return it, or unpack it, and check if it'd crash too :|


It would be great if you could test that CPU. If it doesn't reproduce the problems, that should confirm our suspicion about the silicon lottery.


----------



## newls1

anyone try bios 3003 for the crosshair viii dark yet?


----------



## iraff1

aa.delite said:


> Yep, DVID. Got huge temperature/voltage drop and higher boost. And there is no 1.44-1.50v idle anymore, and heavy load is ~1.04-1.13v instead of 1.30v defaults. Seems very stable for now. It's not about WHEA. I haven't had WHEA errors since F11j bios anyway. But I could not pass prime95 in stock, was a black screen in 15 minutes and power button didn't work. I've had random reboots and up to 1.50v idle. I don't face these problems with negative DVID offset for now.


Cool, -0.2 seems like a lot, is it working just fine? I will try this too as i am interested


----------



## Deepcuts

Tried -0.20000 in the past.
Tried again today on F31. No soup for handbrake encodes. Will reboot.


----------



## excitebike

newls1 said:


> anyone try bios 3003 for the crosshair viii dark yet?


I immediately updated my C8DH to it when it was released on 12/07. Still have the issue without +8 Curve Optimizer.


----------



## aa.delite

iraff1 said:


> Cool, -0.2 seems like a lot, is it working just fine? I will try this too as i am interested


Working just fine. It depends on the CPU; yours may work fine with -0.15, maybe -0.2 or more.



Deepcuts said:


> Tried -0.20000 in the past.
> Tried again today on F31. No soup for handbrake encodes. Will reboot.


Do you know a method to trigger a reboot pretty fast? Do you have WHEA errors?


----------



## Deepcuts

aa.delite said:


> Do you know a method to cause reboot pretty fast? Do you have WHEA errors?


Never had WHEA errors reported by HWInfo on this board. Just crashes or BSOD with the error WHEA Uncorrectable Error. 
A Handbrake CPU encode will crash pretty fast. I would say about 30 minutes.


----------



## aa.delite

Deepcuts said:


> A Handbrake CPU encode will crash pretty fast. I would say about 30 minutes.


30 minutes is not so fast.. I will try. Encode to H.264 FullHD? Reboots are usually caused by idle or by frequent 1-core boosts up to 5000+.
I see up to ~70-80% all-core load when encoding a BDRip with Handbrake.
With a low-quality TS/DVDRip source it's up to 20% load, with turbo boost floating across all/random cores, jumping from 3600 up to 4641 MHz and back, temp 61C. So encoding low-quality videos seems useful for getting jumping voltage and boost spread across random cores.
Update: encoded all night, no reboot issues. Maybe the latest BIOS fixed it for me.


----------



## newls1

I'm so confused with all the "curve optimizer" talk... Can someone point me to an article/YouTube vid that describes what all that talk means, please? I actually have the CHVIII "DARK" Hero motherboard still sitting in the box untouched as I can't get ahold of a 5950X, so until I can get my hands on this CPU, I sit here hoping these issues get resolved by the time I build my machine. I bought this Dark Hero board for the new OCing feature, so I'm trying to find a video on how to set that up.


----------



## Imraneo

Since my stock Vcore is around 1.44-1.48V and I'm running great at 1.1V, I decided to try an offset of -0.3, which brings me to a max voltage of 1.18V.
The advantage of using an offset is that the voltage goes lower when idling (down to about 0.8V), thus reducing energy use. However, it's strange that I get a performance hit all round: about 100+ MHz lower boost speeds for both all-core and single-core, despite having a higher max voltage (1.18 vs 1.1), and this shows in Cinebench results as well.
Just an observation here.. I feel there's a lot more going on in power management. For now I'll stick to a constant 1.1V, which gives me all-round stability/performance.
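
For anyone following the numbers, here is a rough sketch of the offset arithmetic. It assumes the offset is simply added to whatever voltage the CPU requests, which is a simplification (the real power management clearly does more than that), and the function name is made up for illustration.

```python
def apply_offset(requested_v: float, offset_v: float) -> float:
    """Return the voltage after adding a (usually negative) offset.

    Assumption: the board just adds the offset to the requested voltage.
    """
    return round(requested_v + offset_v, 3)


# Stock peak Vcore range reported in this post, and the offset that was tried.
stock_peak = (1.44, 1.48)
offset = -0.3

with_offset = tuple(apply_offset(v, offset) for v in stock_peak)
print(with_offset)
```

Under that assumption the top of the stock range lands on the 1.18V max mentioned here. A fixed voltage, by contrast, stays where it is set at idle and under load alike.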


----------



## DenneSyd

I just got the same problem as well. Aorus Xtreme with 5950x. Worked without problems for 3 days. Now it reboots randomly. Sometimes when booting, sometimes I get to the desktop. I’m not sure if it’s a BIOS problem or if the CPU is defective. No idea what to do now. I tried to skip XMP and boost performance, but no luck.


----------



## Deepcuts

DenneSyd said:


> I just got the same problem aswell. Aorus Xtreme with 5950x. Worked without problems for 3 days. Now it reboots randomly. Sometimes when booting, sometimes I get to the desktop. I’m not sure if it’s a bios problem or of the CPU is defect. No idea what to do now. I tried to skip xmp and boost performance, but no luck.


Clear CMOS via the back button.
Only disable Core Performance Boost. Nothing else.
Still not stable?


----------



## newls1

newls1 said:


> Im so confused with all the "curve optimer" talk... Can someone point me to an article/youtube vid that describes what all that talk means please? I actually have the CHVIII "DARK" hero motherboard still sitting in the box untouched as I cant get ahold of a 5950x, so until i can get my hands on this cpu, i sit here hoping these issues get resolved by the time i build my machine. I bought this Dark hero board for the new OCing feature, so im trying to find a video on how to set that up.


anyone please shed some light on this for me!


----------



## iraff1

DenneSyd said:


> I just got the same problem aswell. Aorus Xtreme with 5950x. Worked without problems for 3 days. Now it reboots randomly. Sometimes when booting, sometimes I get to the desktop. I’m not sure if it’s a bios problem or of the CPU is defect. No idea what to do now. I tried to skip xmp and boost performance, but no luck.


Interesting, I have not had any random reboots (yet), but I had WHEA errors randomly come back in my event log, something that wasn't there for the first 5-7 days after I built my computer. They came out of nowhere. Now to the funny part: they also disappeared out of nowhere, and I currently don't have them anymore.

What BIOS version are you using? What memory/FCLK settings? Did you verify your settings were stable prior to the 3 days of usage? Like leaving the computer running memtest overnight, etc.?


----------



## DenneSyd

Deepcuts said:


> Clear CMOS via the back button.
> Only disable Core Performance Boost. Nothing else.
> Still not stable?


Still not stable.. it is still randomly rebooting. It also feels like the more often i start the computer, the faster it reboots.


----------



## DenneSyd

iraff1 said:


> Interesting, i have not had any random reboots (yet) but i had random comeback of WHEA errors in my eventlog something which wasn't there for the past 5-7 days when i had build my computer. They came out of nowhere. Now to the funny part, they also disappeared out of nowhere and i currently don't have them anymore.
> 
> What bios version are you using? What Memory/FLCK settings? Did you verify your setting was stable prior to the 3 days of usage? Like leaving the computer over night running memtest etc?


How do you check the eventlog for WHEA errors? I’m using f30, the xmp is 3600, 16-16-16-36, and I have not verified anything overnight. Just playing/working on it. I’m not that good at this level of fixing a computer, I’m afraid.


----------



## Spiriva

newls1 said:


> anyone please shed some light on this for me!







I think this is what you are after?


----------



## aa.delite

I've noticed the CPU core clock no longer exceeds 4990 MHz (watching HWiNFO for a few days). Before the BIOS update I saw 5030 MHz and random reboots.


----------



## iraff1

DenneSyd said:


> How do you check the eventlog for WHEA errors? I’m using f30, the xmp is 3600, 16-16-16-36, and I have not verified anything overnight. Just playing/working on it. I’m not that good at this level of fixing a computer, I’m afraid.


Look for "Event Viewer" (or "Loggboken" in Swedish), then Windows Logs > System. If you see a ton of yellow warnings, click on one and see if it's a WHEA error.
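
If you'd rather not click through Event Viewer, something like this should pull the same entries from a script. It's only a sketch: it assumes Windows with the built-in `wevtutil` tool on the PATH and that the events are logged under the provider name `Microsoft-Windows-WHEA-Logger`. The function names are my own.

```python
import subprocess

# Assumed provider name for WHEA entries in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events: int = 20) -> list[str]:
    """Build a wevtutil command that lists the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # filter by provider name
        "/f:text",                   # human-readable output
        f"/c:{max_events}",          # limit the number of events
        "/rd:true",                  # newest events first
    ]


def read_whea_events(max_events: int = 20) -> str:
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected machine, `print(read_whea_events())` should dump the most recent entries. An empty result would suggest nothing was logged before the shutdown.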


----------



## DenneSyd

iraff1 said:


> Look for "Event Viewer" or "loggboken" in swedish, windows logs > system, if you see a ton of yellow warnings click on one and see its a WHEA error


Thank you (tack!) for explaining. My problem is that when it crashes, the computer just turns off as if it lost power. It’s totally random. Sometimes it takes 3 seconds, sometimes I get to log into Windows for some time.


----------



## aa.delite

DenneSyd said:


> My problem is that my crashes (the computer just turns off as if it lost it’s power.)


did you check CPU temperature?


----------



## OCmember

I'm not on a 5k chip yet but I'm seeing the WHEA CPU Bus/Interconnect errors with my 3800X. Gigabyte board & F31 bios. I'm pushing my IF and RAM 1900/3800 (1:1). Yesterday I tried lowering my IF and RAM to 1800/3600 and it seemed like the WHEA errors came sooner in TM5. I'm not having complete computer crashes but the errors are definitely there.


----------



## iraff1

OCmember said:


> I'm not on a 5k chip yet but I'm seeing the WHEA CPU Bus/Interconnect errors with my 3800X. Gigabyte board & F31 bios. I'm pushing my IF and RAM 1900/3800 (1:1). Yesterday I tried lowering my IF and RAM to 1800/3600 and it seemed like the WHEA errors came sooner in TM5. I'm not having complete computer crashes but the errors are definitely there.


Have you stress tested with WHEA errors present? The bus/interconnect issues often lead to reboots/crashes. Be aware that your OS might get corrupted, so use an installation you don't care about when stress testing with these errors present.


----------



## OCmember

iraff1 said:


> Have you stresstested with whea errors present? the bus/interconnect issues often lead to reboot/crashing, be aware your os might corrupt so use an installation you don't care about before stress testing with these errors present.


I've only started using HWiNFO64 recently to observe if any errors show up while stress testing. So I only notice them during that time.


----------



## Alyjen

OCmember said:


> I've only started using HWiNFO64 recently to observe if any errors show up while stress testing. So I only notice them during that time.


You can check them in Windows Event Viewer. It's rather easy. You can also create a custom view which will only show you WHEA-Logger errors & warnings.

You are looking for this. If it's a few per day, then it's not that dangerous, but I'd still look for a way to get rid of them. If it's in the hundreds and they also appear while you just sit at idle, then it's not stable.

I was fighting for over two weeks to get rid of them without going all the way down to 3200MHz memory speed. 

BR


----------



## frollic

gcji said:


> I have a B550I as well, with a 5950x - curve optimizer is only on BIOS revision F11, it won't show up on F10. You also need to set PBO to advanced to see it. Setting it to +8 has been the only way to keep my CPU somewhat stable, though I still get occasional crashes. I want to say I had better luck with F11j (beta bios) than the final F11 release but I have no idea how to reproduce the crashes so no systematic way to tell.
> 
> Consistent with many other people in this thread, my crashes never happen under heavy load but just when surfing the web, roughly every 4 hours even with curve optimizer on (like every 15 minutes without).


Ah, I'll check it out, thank you.

Not very familiar with AMD BIOSes, seems to me things are very well hidden, and there are too many options.

I've been an intel fan boy since 1985, up until now


----------



## frollic

MoW said:


> It will be great if you can test that cpu. If it can't reproduce back the problems then it should confirm our suspicion on the silicon lottery.


Ended up returning it, sorry.
Just couldn't be arsed to take the system apart _again_, to play with the CPUs.

I did however get a reply from AMD regarding my RMA, they (obviously) wanted more information 



Code:


Response and Service Request History:

Thank you for your email, I really appreciate your patience.

I understand that you your system is unstable, unless half of the cores are disabled. This issue may be related to general system settings or defective hardware.

We would like to get your system up and running as quickly as possible. To achieve this, we need to determine if the processor is faulty, or if the issue could be caused by something else.

To aid troubleshooting suggestions please provide the following information:

Can you please elaborate on what you mean by “unstable”? are getting error messages? Performance issue?etc
What is the make and model of your Motherboard?
Is your system overclocked or running at default?
What are your RAMs make, model and clock speed that is running at?
What is the make and model of the CPU cooler?
Make and model of your Power Supply?
Also, can you provide any or all the following reports to analyze?

DxDiag report- Click Start, type in DxDiag and press Enter. Under Run DirectX 64-Bit, click on Save All Information and save this as dxdiag.txt.
System Information report – Click Start, type in msinfo32 and press Enter. Click File and then click.  enter a name and then save the file.
Belarc report (if possible - optional)- Go to www.belarc.com/free_download.html and download/install/run the Free Belarc Advisor. This will generate an HTML report locally on your system, to be saved through "File->Save As" in your browser as report.html.


----------



## OCmember

@Alyjen Thanks for the reply. I'll take a look through it and update this post.

Cheers


----------



## DenneSyd

aa.delite said:


> did you check CPU temperature?


Yes! 35-45c in idle.


----------



## nevcairiel

frollic said:


> I did however get a reply from AMD regarding my RMA, they (obviously) wanted more information


I've been going through the motions with them for what feels like weeks now, with a new response about once a week or so. I tried to be nice and provide all they asked for, but in my last response (about 2 past the step you are on, or so it seems) I directly asked to start the RMA process now. Awaiting their next response..

I understand that they want to filter out bad setups or other obviously faulty components, but I already tested pretty much everything else, and tried to tell them that.


----------



## Deepcuts

nevcairiel said:


> I've been going through the motions with them for what feels like weeks now, with a new response once about a week or so. I tried to be nice and provide all they asked for, but in the last response (about 2 from the step you are on, or so it seems) I directly asked to start the RMA process now, awaiting their next response..
> 
> I understand that they want to filter out bad setups or other obviously faulty components, but I already tested pretty much everything else, and tried to tell them that.


Same here with the email interaction with them.
1st response from AMD:
_Looks like you have done all necessary troubleshooting, We want to make sure that your CPU is indeed defective so we can save time. can you provide *any or all* the following reports to analyze?_
2nd response from AMD:
Asking about GPU, memory and QVL.

I am just amazed that such a big name as AMD totally lacks the balls to just admit "there is a problem with some batches of 5000 and we are working on it". Or maybe even do a recall based on the serial number.
I am 100% confident they know about the problem, but for some reason, they play dumb.


----------



## aa.delite

Deepcuts said:


> I am 100% confident they know about the problem, but for some reason, they play dumb.


Just remember Nvidia: the first RTX 3080 cards crashed in games if the core boosted to 2050+ MHz. They fixed it with drivers that made the core slower, then changed the capacitor group in the 2nd batch, but never said the 1st batch was defective. Well, it wasn't the Nvidia reference card, but almost all the other brands.
Do you see 5000+ core boost? Maybe it's the problem.


----------



## smbell1979

I'm still getting WHEA errors every now and then, but my main "can't even use the computer or barely get to the windows desktop" reboot issue was solved by disabling the two Asus PBO tweaks in the BIOS, "The Stilt FMAX" one and the other one below it, can't remember the name of it off hand. Neither of which seemed to have much of an effect on performance after being disabled.

These reboots happened when a single core was boosted past 5050ish. I could watch it happen in HWinfo.

Just something else to try if you are on an Asus board.


----------



## Spiriva

smbell1979 said:


> I'm still getting WHEA errors every now and then, but my main "can't even use the computer or barely get to the windows desktop" reboot issue was solved by disabling the two Asus PBO tweaks in the BIOS, "The Stilt FMAX" one and the other one below it, can't remember the name of it off hand. Neither of which seemed to have much of an effect on performance after being disabled.
> 
> These reboots happened when a single core was boosted past 5050ish. I could watch it happen in HWinfo.
> 
> Just something else to try if you are on an Asus board.


Send the garbage back, and do like me: wait for the next Intel platform to come out.

Amd is *trash*.


----------



## nevcairiel

aa.delite said:


> Just remember Nvidia and first RTX 3080 crashed in games if the core boosted up to 2050+ Mhz. They've fixed it by drivers made the core slower. Then changed capacitors group in 2nd batch but didn't say 1st batch is defective. Well, it wasn't Nvidia reference card, but almost all other brands.
> Do you see 5000+ core boost? Maybe it's the problem.


Oddly, it's actually the opposite for me. Mine is fine as long as I put a lot of load on it. It's at idle that it crashes. Turning off some power saving features makes it more stable (but not entirely so). It's some odd interaction for sure.

My hunch is that the IO die has some issues, but it's really hard to come to firm conclusions with such random crashes.


----------



## arvu

While trying to find a solution for the problem, I found this thread. I guess I can also join your club. My current configuration is:

AMD 5950x
ASUS TUF Gaming B550M-Plus (Wi-Fi) (latest bios release 1401 w/ AGESA V2 PI 1.1.0.0 Patch C )
Corsair Vengeance LPX 3600CL18 2x32GB kit
ADATA XPG SX8200 1TB m.2 nvme SSD
Corsair RM750x ATX psu
Palit GTX 970 (recycled from old computer)

I have tried also MSI MAG B550M MORTAR wifi motherboard with rest of the components, but it had even worse symptoms (it could not boot windows login without fixed voltage and cpu clocks).

With the current system at default settings it can boot Windows, but sometimes I get a WHEA BSOD. It might happen many times in a row when starting the system cold, but I have also managed to run system for > hours running prime95 or cinebench r20. Several times it has crashed right after I stopped the stress tests. The problem seems to manifest more often when the system has little or no load.

It seems to be more stable with following settings:

Fixed core voltage & cpu clocks, disabled pbo, disabled c-states or
default settings, disabled c-states, fMax enhancement disabled, w/ Manual PBO w/ PPT 200, TDC 150, EDC 300.
latest bios release 1401 w/ AGESA V2 PI 1.1.0.0 Patch C

I do not want to resort to fixed voltage & clocks, due to high temperatures in stress testing. I've now been running the latter settings for a few hours without any WHEA BSOD or other issues.

Are there some settings to try that could make system more stable?


----------



## OCmember

@Alyjen Hey. I created a custom event viewer and on my 3800X I have a total of 491 WHEA logger CPU Bus/Interconnect errors. As for today, I only have 2 roughly 4 hours apart. 1:1 1900/3800 cl15 A2 PCB. I'm starting to have second thoughts about going through with my Vermeer build and switching over to Intel build.


----------



## WinterActual

Guys maybe we should check our build dates. Probably we got some really bad batches. My 5600x is made in October and I bought it just a week ago..


----------



## Spiriva

OCmember said:


> @Alyjen Hey. I created a custom event viewer and on my 3800X I have a total of 491 WHEA logger CPU Bus/Interconnect errors. As for today, I only have 2 roughly 4 hours apart. 1:1 1900/3800 cl15 A2 PCB. I'm starting to have second thoughts about going through with my Vermeer build and switching over to Intel build.





OCmember said:


> @Spiriva Well it's too late now, a lot of people would of helped you on forums. Shame, you have all those extra components but no spare case. Odd.


I thought you said I gave up too fast on AMD and that there are so many ppl on the forum that could help?

But I guess it's not as fun when it's yourself sitting there with the AMD garbage system. Do yourself a huge favor and dump that AMD trash system.
The 5000 series is a mess, who wants a system that BSODs or randomly hard reboots every now and then.

Send it back and wait for Intel. Intel should have a new cpu out in Q1 2021.


----------



## Deepcuts

Please, let's not start an AMD vs Intel. Keep it on point.


----------



## WinterActual

Hey Deepcuts, I noticed that you are from Romania. I am from Bulgaria but I bought my cpu from Romania. Maybe we got units from the same bad batch lol


----------



## Alyjen

OCmember said:


> @Alyjen Hey. I created a custom event viewer and on my 3800X I have a total of 491 WHEA logger CPU Bus/Interconnect errors. As for today, I only have 2 roughly 4 hours apart. 1:1 1900/3800 cl15 A2 PCB. I'm starting to have second thoughts about going through with my Vermeer build and switching over to Intel build.


Yea I'm pretty sure that dropping IF a little bit could improve the situation. Or you have to look for other fixes. On Reddit I saw something like this,

_Corrected WHEA errors occur when Vcore drops too low when it is necessary, but the CPU manages to recover from it. (If it doesn't, the PC will just BSOD with "WHEA_UNCORRECTED_ERROR".) You may want to set your own flat voltage instead of relying on auto. This error is typically more evident when you try to push FCLK too hard.

However, corrected errors are really only a concern if they happen too often. I would say that a maximum of one corrected error per 1-2 hours is "safe". Any lower, and you will fail some stability tests, especially when RAM overclocking. It may also randomly throw a BSOD during an extended session of gaming or rendering. (If you are a heavy overclocker, compromises must be made as it is near impossible to never get corrected WHEA errors.)_
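
To make that rule of thumb concrete, here is a small sketch that works out corrected errors per hour from a list of timestamps. The threshold of one error per hour is just one reading of the quoted "one per 1-2 hours", and the timestamps and function name are made up for illustration.

```python
from datetime import datetime, timedelta


def errors_per_hour(timestamps: list[datetime]) -> float:
    """Number of errors divided by the hours between the first and last one."""
    if len(timestamps) < 2:
        return 0.0
    span = max(timestamps) - min(timestamps)
    hours = span / timedelta(hours=1)
    return len(timestamps) / hours if hours > 0 else float("inf")


# Hypothetical log: two corrected errors roughly 4 hours apart.
start = datetime(2020, 12, 20, 10, 0)
sample = [start, start + timedelta(hours=4)]

MAX_SAFE_RATE = 1.0  # assumed reading of "one corrected error per 1-2 hours"
rate = errors_per_hour(sample)
print(f"{rate:.2f} errors/hour, within rule of thumb: {rate <= MAX_SAFE_RATE}")
```

Hundreds of errors in a day would blow well past that threshold, while a couple per day stays under it.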

I was going to give these CPU voltage tweaks a go, but in my case 1866/3733 is looking very promising (not a single error/warning for 4 days now, with a lot of gaming, testing and normal use during that period).
Grass is always greener on the other side I guess. I don't regret going to AMD after using Intel since Barton 2500+ times.. I only wish I did it later, but during these crazy times you either buy on release and face all early adopters issues, or wait for weeks or months in some parts of the world

Full thread: https://www.reddit.com/r/Amd/comments/jxvl19


----------



## Spiriva

Deepcuts said:


> Please, let's not start an AMD vs Intel. Keep it on point.


What do you mean? I had an Asus x570 Dark Loser & 5950x; it crashed, hard rebooted, and BSOD'd all the time. I changed: PSU, DDR4 (4 different kits), graphics card, unplugged all USB, changed settings to more volts, less volts, set the DDR4 timings. *NOTHING* stopped the crashes, hard reboots, and BSODs.

So now you say I can't give the advice to send the AMD garbage back? We are supposed to sit here like idiots and wait months for a BIOS update to fix this trash?
Ye...sure. Spend €1500 on a motherboard and a CPU and then get a lot of problems, and the solution is to "sit and wait for a few months for a BIOS fix that *MIGHT* fix your problems".

So yes, my advice to anyone who owns a garbage AMD 5000 series system with crashes, hard reboots, and BSODs is to send it back ASAP and get your money back. And then just wait for a new Intel launch.

Since AMD is garbage, *IN MY OPINION*.


----------



## Deepcuts

@Spiriva breathe. Shouting is pointless.
Keep it on point means to discuss the topic. In this case, AMD and the supporting hardware/software around it. Intel has no place in this discussion.


----------



## Spiriva

Deepcuts said:


> @Spiriva breathe. Shouting is pointless.
> Keep it on point means to discuss the topic. In this case, AMD and the supporting hardware/software around it. Intel has no place in this discussion.


"Shouting is pointless." This is in text form tho, so there is no shouting.

And again, isn't sharing experience with AMD products what this forum is for? I'm sorry, but you seem butthurt about the fact that I think AMD is garbage.

Selling products that don't work out of the box is, however, huge garbage. That is what AMD has done.


----------



## Spectre73

Spiriva said:


> "Shouting is pointless." This is in text form tho, so there is no shouting.
> 
> And again, is sharing experience with AMD products not what this forum is for? I'm sorry, but you seem butthurt about the fact that I think AMD is garbage.
> 
> Selling products that don't work out of the box is, however, huge garbage. That is what AMD has done.


So, just to understand the issue. Are you experiencing these problems with default/stock settings or with any kind of (RAM) overclock? Because, most users reporting problems only have them with some kind of OC or tinkering in place.
If you run EVERYTHING at stock, I agree, errors are unacceptable.


----------



## Spiriva

Spectre73 said:


> So, just to understand the issue. Are you experiencing these problems with default/stock settings or with any kind of (RAM) overclock? Because, most users reporting problems only have them with some kind of OC or tinkering in place.
> If you run EVERYTHING at stock, I agree, errors are unacceptable.


Yes, at stock, and at "load default".


----------



## nevcairiel

Well, my RMA was approved and it's now on its way back to AMD. Will post an update when I hear back and/or receive a replacement.


----------



## frollic

nevcairiel said:


> Well, my RMA was approved and it's now on its way back to AMD. Will post an update when I hear back and/or receive a replacement.


good work, fingers X:ed.


----------



## OCmember

I was trying to encourage @Spiriva to hang on to his rig and wait for help. However, this whole WHEA CPU Bus/Interconnect error has resurfaced and is now making me doubt building my Vermeer rig. The difference between you and me, Spiriva, is that I'm not trash-talking AMD. I'm taking it slow and trying to find out what's going on. And if this is a lesson for you, it's one reason to have a rig up and running, which I kept asking you about, because it could have helped you find answers like the one Alyjen gave in post #240 in this thread.


----------



## Spiriva

OCmember said:


> I was trying to encourage @Spiriva to hang on to his rig and wait for help. However, this whole WHEA CPU Bus/Interconnect error has resurfaced and is now making me doubt building my Vermeer rig. The difference between you and me, Spiriva, is that I'm not trash-talking AMD. I'm taking it slow and trying to find out what's going on. And if this is a lesson for you, it's one reason to have a rig up and running, which I kept asking you about, because it could have helped you find answers like the one Alyjen gave in post #240 in this thread.


AMD sold you and me a product that doesn't work. AMD doesn't deserve us being calm and nice and waiting for them to fix this. There are threads dating back 30 days with the same problem. It's not a new thing.
AMD is asking people to pay for a product that doesn't work, a product that is pure garbage.


----------



## OCmember

@Spiriva Stop with the toxic posts


----------



## dehun

Hi folks, I am new to this board and to overclocking in general, but I ran into the same issue.

Gigabyte X570 AORUS Elite
AMD Ryzen 7 5800X
BIOS version F31o, with "Load Optimized Defaults", aka stock (AFAIK), in BIOS.
Everything worked well under moderate load and the OCCT test.
However, when I tried to play a game, it crashed with WHEA uncorrectable in ~10 minutes.
I tried to reproduce the issue, and the OCCT power test seems to reproduce it quite quickly.
The AIDA64 memory read benchmark also seems to trigger it.

I tried adjusting LLC and that did not help.

Setting Vcore to 1.3 V and the CPU multiplier to 46 actually did the trick for me.

Unfortunately, that apparently disables thermal throttling, so the OCCT power test is not possible anymore.
But it seems to be stable: no crashes in games.

CBS and PBO are set to auto.


----------



## Spiriva

OCmember said:


> @Spiriva Stop with the toxic posts


Is it "toxic" to say how a product works? (well in this case not working).



dehun said:


> Hi folks, I am new to this board and to overclocking in general, but I ran into the same issue.
> 
> Gigabyte X570 AORUS Elite
> AMD Ryzen 7 5800X
> BIOS version F31o, with "Load Optimized Defaults", aka stock (AFAIK), in BIOS.
> Everything worked well under moderate load and the OCCT test.
> However, when I tried to play a game, it crashed with WHEA uncorrectable in ~10 minutes.
> I tried to reproduce the issue, and the OCCT power test seems to reproduce it quite quickly.
> The AIDA64 memory read benchmark also seems to trigger it.
> 
> I tried adjusting LLC and that did not help.
> 
> Setting Vcore to 1.3 V and the CPU multiplier to 46 actually did the trick for me.
> 
> Unfortunately, that apparently disables thermal throttling, so the OCCT power test is not possible anymore.
> But it seems to be stable: no crashes in games.
> 
> CBS and PBO are set to auto.



Return it. Never accept a product, no matter what it costs, if it doesn't work. Return it ASAP.


----------



## OCmember

@Spiriva Yes. Do you not realize you're being toxic? _Continually_ posting "Garbage" and "Trash" is not describing how a product works; it's more than that, and some people have already commented on your behavior. At this point it's like extreme Intel fanboyism. We've already given you several passes for your emotions on this, but now it's becoming obvious you have some sort of agenda.


----------



## WinterActual

I called my retailer earlier today and they said the current RMA process is very simple: I send them my CPU and as soon as they see it's on its way, they will ship me a new unit. So I will probably swap my 5600X for a new one tomorrow.


----------



## Deepcuts

So far, found 2 users (not on overclock.net) with the same issue that have managed to RMA their 5950X and the replacement was stable.
I am starting to think this thread would have been better suited to the AMD CPUs forum instead of AMD Motherboards, and also that those of us waiting for a miracle BIOS update are waiting in vain.
That's it for me. Sending the CPU to the shop. 1 month is enough waiting.
Hope to get back with good news. o/


----------



## arvu

I've now run my system for 24 hours with various tests (idling, web browsing, light gaming, prime95, cinebench) and zero WHEA or other issues except one unintended reboot due to windows update. My current settings are:


disabled ASUS performance enhancement
disabled fMax enhancement
disabled global c-states
Manual PBO
PPT 200
TDC 160
EDC 350
DOCP enabled, default memory settings w/ PC-3600 memory.
No voltage or other adjustments. Everything else should be at defaults.

These settings lower maximum clocks a bit, but I still get ~600 in Cinebench R20 single-threaded and ~10700 multi-threaded.

I've seen a lot of WHEA errors, BSODs and reboots over the past two weeks, and this is the first day without any such issues.


----------



## WinterActual

I managed to eliminate the WHEA crashes with the following steps: loaded optimized defaults, enabled XMP, PBO ON +200, Limits: Motherboard, Scalar: X1 (this was very important; anything above that brings back the WHEA crashes). Everything else is at the default/auto setting. I played 4-5 different games with no crashes and no errors in the event log. I usually test with GTA V because with other settings the game crashes EVERY TIME during loading. I never managed to get into the game with other settings, but with these everything runs OK. It's just my temps that are through the roof even though I have an AIO: 50 °C idle, 62-72 °C during gaming. I will run the CPU for another day or two and then decide whether to swap it for another one via RMA.


----------



## HKisd

I had a 5950X manufactured during week 45 (2045SUS) which crashed even in the BIOS settings and rebooted while installing Windows, even with BIOS defaults and a memory clock of 2133 MHz. It rebooted or crashed while idling once or twice per day. I troubleshot that system/CPU for 10 days and can totally understand why @Spiriva is angry. Two days ago I was able to swap it for another 5950X from week 46 (2046SUS), and this one seems to be stable; so far it has worked flawlessly with XMP enabled.
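
For anyone collecting date codes to compare batches, the codes quoted above can be split mechanically. This is only a minimal sketch: it assumes, as this post reads them, that the first two digits are the year and the next two the week of manufacture. The meaning of the trailing letters is not established in this thread, so the hypothetical `parse_date_code` helper simply passes them through.

```python
import re
from typing import NamedTuple


class DateCode(NamedTuple):
    year: int    # four-digit year, assuming a 20xx part
    week: int    # week of manufacture
    suffix: str  # trailing letters; meaning not established in this thread


def parse_date_code(code: str) -> DateCode:
    """Split a code such as '2045SUS' into year, week and suffix."""
    match = re.fullmatch(r"(\d{2})(\d{2})([A-Z]*)", code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised date code: {code!r}")
    yy, ww, suffix = match.groups()
    week = int(ww)
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {code!r}")
    return DateCode(2000 + int(yy), week, suffix)
```

Calling `parse_date_code("2045SUS")` gives year 2020, week 45 and the suffix `SUS`. A handful of reports is far too small a sample to pin a defect on one week, so treat any grouping as a hint only.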


----------



## arvu

HKisd said:


> I had a 5950X manufactured during week 45 (2045SUS) which crashed even in the BIOS settings and rebooted while installing Windows, even with BIOS defaults and a memory clock of 2133 MHz. It rebooted or crashed while idling once or twice per day. I troubleshot that system/CPU for 10 days and can totally understand why @Spiriva is angry. Two days ago I was able to swap it for another 5950X from week 46 (2046SUS), and this one seems to be stable; so far it has worked flawlessly with XMP enabled.


My CPU is from the same week and fab (2046SUS). BIOS defaults did cause WHEA BSOD. Now with tuned settings it passed tests for more than 24 hours. It seems to be quite sensitive to changes in BIOS settings. 

I'd guess some mobo + cpu combos are more sensitive to signal/power integrity issues. Individual chips might be borderline cases that pass tests in ideal conditions, but fail to some minor random interference.


----------



## nevcairiel

I don't think you can necessarily assign a timespan so easily to broken chips. It would be nice if they could be identified, but if it's a production problem that validation didn't catch, it may have gone on for a while (or may even be ongoing) until production matures and/or validation is improved. They are of course also producing loads of them, and who knows what percentage may have such problems. Not sure if AMD would ever give out RMA figures, but some shops might.

I just hope the RMA process goes without further complications, and I can get a replacement chip from AMD that does not have this issue.


----------



## dehun

I tried another configuration today: everything was set to default except the RAM settings.
I have HX432C16PB3AK2/64 (https://www.kingston.com/dataSheets/HX432C16PB3AK2_64.pdf) which I decided to overclock a bit.


mem vddio = 1.45v
mem multiplier = 36
using XMP Profile #1: DDR4-3200 CL16-18-18 @1.35V (but with above settings)

This surprisingly results in no crashes, and the system can withstand the OCCT power test, which was yielding WHEA errors quite consistently with the default configuration.

This RAM is on the supported list for this CPU, but not on the supported list for the motherboard (see https://download.gigabyte.com/FileList/Memory/mb_memory_x570-aorus-elite_vermeer.pdf)

So perhaps it's RAM in my case.


----------



## excitebike

I was lucky enough to get my hands on another 5950x and after testing yesterday and letting my system idle overnight, I'm pretty certain the new 5950x has completely fixed my issues. I'm still going through the RMA for the other 5950x, but I'm convinced. Idle reboots with WHEA-Logging Code 18 (Cache Hierarchy Error) seem to be faulty processors.

Smooth sailing at default settings for me. On to tuning my system with working silicon now.


----------



## aa.delite

excitebike said:


> I'm still going through the RMA for the other 5950x, but I'm convinced. Idle reboots with WHEA-Logging Code 18 (Cache Hierarchy Error) seem to be faulty processors.


I bought the OEM version; the boxed version is still hard to buy here. So I can't RMA; I have to use the seller's warranty.
There are no statements from AMD, so the seller's service center will test the CPU for faults. I think there is a 90% chance of getting the current CPU back as "working well" with "no problem detected".
It seems the new BIOS fixed the reboots for me, but if it's a CPU fault they may come back later.


----------



## scarfield

Registered to jump on the bandwagon.

Running an Asus ROG Strix X570-F with the 5950X and I've gotten the same types of bluescreens, mostly with the stop code "KMODE_EXCEPTION_NOT_HANDLED", sometimes with a "What failed: amdppm.sys" added to it. Tried two different BIOS versions, 3001 and 2802 with AGESA V2 PI 1.1.0.0 Patch B.

I was attempting a fresh install, and this happened at random times during the Windows install from USB. Sometimes I never even got into the installer: it would stop the moment it loaded from USB, or sometimes during the copy process.

I've disassembled my desktop several times and swapped the PSU and M.2 NVMe SSD. Tried running with 1 to 4 RAM sticks, and even a completely different RAM kit just to be on the safe side.

I'm not an overclocker, so I haven't dared play around with voltages, but the only thing I've found to work is to lock the core multiplier. I can run stable from 3400 MHz up to 4400 MHz. If I attempt higher, I get bluescreens again.

If locked to 3400 MHz, or any other speed, and then do a stress test, it blacks out when it seems to reach about 4600 MHz. Not even a bluescreen.

As a sidenote, I have a 3950x which runs smoothly with nothing else changed.

I've gone for RMA today after reading through this thread, so thanks for all the info. I'll update if I ever get a new one.


----------



## aa.delite

scarfield said:


> Running an Asus ROG Strix X570-F with the 5950X and I've gotten the same types of bluescreens, mostly with the stop code "KMODE_EXCEPTION_NOT_HANDLED", sometimes with a "What failed: amdppm.sys" added to it.


Seems like something different. Try not to use USB 2.0 and check again. Use USB 3.0 ports. Check newer BIOS here


----------



## scarfield

aa.delite said:


> Seems like something different. Try not to use USB 2.0 and check again. Use USB 3.0 ports. Check newer BIOS here


Tried both with the same result. The install went perfectly with the core multiplier locked. And the problems persist when I try to change back to default BIOS settings after Windows is running smoothly.


----------



## aa.delite

scarfield said:


> Tried both with the same result. The install went perfectly with the core multiplier locked. And the problems persist when I try to change back to default BIOS settings after Windows is running smoothly.


I've had the same USB 2.0 problems, but the latest Gigabyte BIOS fixed them. Your BSOD code is something different, related to USB, not a cache hierarchy error. Check the Event Log (or HWiNFO64) for WHEA errors. Maybe it's a defective CPU, but it's something new in this thread.
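
If clicking through Event Viewer gets tedious, the same check can be scripted. Below is a rough sketch that shells out to the `wevtutil` tool that ships with Windows and lists the newest System-log entries from the `Microsoft-Windows-WHEA-Logger` provider (the source behind the "WHEA-Logger" ID 18 entries people report here). The function names and the default event count are my own choices for illustration.

```python
import subprocess
from typing import List

# Provider name under which Windows records WHEA events in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events: int = 20) -> List[str]:
    """Return a wevtutil command line that lists the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # XPath filter on the provider name
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # read newest events first
        "/f:text",           # human-readable text instead of XML
    ]


def recent_whea_events(max_events: int = 20) -> str:
    """Run the query (Windows only) and return wevtutil's raw text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a Windows machine, `print(recent_whea_events(10))` should show the ten most recent entries, or nothing if the log has none. An empty result only means nothing was logged, not that the system is stable, since a hard reset can happen before anything is written.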


----------



## scarfield

aa.delite said:


> I've had the same USB 2.0 problems, but the latest Gigabyte BIOS fixed them. Your BSOD code is something different, related to USB, not a cache hierarchy error. Check the Event Log (or HWiNFO64) for WHEA errors. Maybe it's a defective CPU, but it's something new in this thread.


Just for ****s and giggles, I disconnected both the USB 2.0 and 3.0 cables from the motherboard and then set the BIOS back to defaults. Same issue. Bluescreen almost immediately after POST. A couple of times I got to the Windows login screen before the bluescreen.


----------



## Imraneo

dehun said:


> I have tried another configuration today - everything was put onto default except ram settings.
> I have HX432C16PB3AK2/64 (https://www.kingston.com/dataSheets/HX432C16PB3AK2_64.pdf) which I decided to overclock a bit
> 
> 
> mem vddio = 1.45v
> mem multiplier = 36
> using XMP Profile #1: DDR4-3200 CL16-18-18 @1.35V (but with above settings)
> 
> This surprisingly results in no crashes, and the system can withstand the OCCT power test, which was yielding WHEA errors quite consistently with the default configuration.
> 
> This RAM is on the supported list for this CPU, but not on the supported list for the motherboard (see https://download.gigabyte.com/FileList/Memory/mb_memory_x570-aorus-elite_vermeer.pdf)
> 
> So perhaps it's RAM in my case.


Isn't the mem multiplier used to determine your overall CPU operating speed? BCLK 100 MHz x 36 = 3.6 GHz.
This could mean that your CPU is stuck at 3.6 GHz and not boosting, which is why it's stable. Check HWMonitor to confirm your clocks.

Now I'm in a dilemma..
Should I RMA or leave my vCore tweak on and move on with life? Perhaps wait a couple more months to see where the BIOS updates go? If I RMA, I'll need a spare CPU to use.. 😓


----------



## arvu

scarfield said:


> I'm not an overclocker, so I haven't dared play around with voltages, but the only thing I've found to work is to lock the core multiplier. I can run stable from 3400 MHz up to 4400 MHz. If I attempt higher, I get bluescreens again.


My symptoms were similar when I was using an MSI B550 motherboard. It worked only with a fixed core clock multiplier. I bought a new ASUS B550 motherboard and it works much better, but I still got random WHEA BSODs. Now, after some tuning, my PC has been stable. It has been running nonstop for > 48h without any issues.

I still don't trust this CPU, because it isn't stable with default stock settings. I need it to be stable 24/7 for years to come. I'll try to return my current CPU to the shop for a refund.


----------



## WinterActual

Yesterday I sent my CPU in for RMA. I will report back on how the new unit behaves when it arrives. The shop said they will send me a new one from their stock as soon as my faulty CPU arrives there, and they will deal with AMD themselves, so everything should happen pretty soon.


----------



## glith

Hi,
I also have this issue. Sent the CPU back to the store today and am awaiting their assessment and the RMA process. I hope it will be relatively fast, but they do not have any processors available for sale until the beginning of January...

Mobo: Asus ROG crosshair VIII Dark Hero(BIOS: 3003)
CPU: AMD Ryzen 5950X (Manu. Date: 2045SUS)

Tried most of the "workarounds" without success. Must have been a bad batch that manufacturing week.


----------



## Imraneo

During RMA, did you guys happen to have a spare CPU for use?


----------



## Vorwrath

I've got the same issue with a Ryzen 5950X randomly rebooting when idle with a WHEA 18 "Cache Hierarchy Error". In my case it's on an MSI X570 Ace motherboard (7C35v1D2 BIOS). Thankfully it's not as common as some users here are seeing, and tends to happen once every few days, so I've just been putting up with it so far.

Figured it was probably an early BIOS teething problem, but don't seem to be moving any closer to a solution after more than a month. I've therefore opened a ticket with AMD and will see what they have to say.


----------



## Deepcuts

Imraneo said:


> During RMA, did you guys happen to have a spare CPU for use?


I have ordered another 5950X. Should be here next Monday. 
Sending back the old one as soon as the new one arrives.
Would be a lot of fun to discover that the new one has the same problem.


----------



## nevcairiel

Imraneo said:


> During RMA, did you guys happen to have a spare CPU for use?


I'm still using my old Intel system until I get my 5950X back from RMA and switch over to the new one. Most people likely had some kind of system before they ordered a Ryzen 5000, I would assume. One doesn't build their first ever PC on a 5950X, usually.


----------



## Imraneo

I passed my old 6700K Intel system to my son, so I've got nothing to work on if I RMA my CPU. I could perhaps get a cheap 3600 to tide me over.
Buying another 5900X is also risky as I'm worried if the same problem happens. Not to mention the big problem of availability..
Thus I'll wait out for more BIOS updates. I guess I can afford to wait since my vCore tweak works..


----------



## scarfield

Imraneo said:


> During RMA, did you guys happen to have a spare CPU for use?


I have a 3950x, and it's not really a slow CPU yet, so there's no rush for me. Sent my 5950x back to the retailer today. Curious what they will find


----------



## newls1

Deepcuts said:


> I have ordered another 5950X. Should be here next Monday.
> Sending back the old one as soon as the new one arrives.
> Would be a lot of fun to discover that the new one has the same problem.


where in gods name did you find a place to get you another one?!


----------



## Deepcuts

newls1 said:


> where in gods name did you find a place to get you another one?!


Here we have one of the fastest internet in the world. So we just download more CPUs and RAM.


----------



## nevcairiel

newls1 said:


> where in gods name did you find a place to get you another one?!


Ordering another one wouldn't be a problem, I would just pay 20%-30% more than what I paid on the AMD online shop, so I'm just awaiting the RMA now.

Edit:
Actually just checked, the best offer is 15% more, which is better than what I expected.


----------



## Deepcuts

I paid the exact same amount for the 2nd 5950X. Around 882 USD EUR including VAT and delivery.
later edit:
Just checked some prices on Amazon. VVTF is going on there? No wonder Jeffy is so rich.


----------



## Schnuppl

newls1 said:


> where in gods name did you find a place to get you another one?!


Germany, Mindfactory.de


----------



## aerodee80

I'm in to monitor this issue.

In terms of the time we spent trying to fix this and rule out bad hardware, a lot of time is lost as well as the money spent on this CPU. I can empathize with @Spiriva on this. I ended up returning mine as I wasn't getting anywhere with an unstable computer.

We are probably dealing with early-adopter issues. When supply stabilizes and BIOS/drivers mature, I will reconsider getting this CPU again.


----------



## Blazeiam

I've been having issues as well

5950x
asus dark hero
4x8 corsair 3600 cl16

I've tried everything to fix this issue and it just seems like a bad CPU. One odd thing I've noticed is that if I do BSOD or crash, I land in the BIOS with no drives to boot from (I only have M.2). The only way to boot from the drives is a full power cycle. Not sure what that means, but it's quite a headache with this expensive new build. I've started an RMA with AMD but I doubt I'll see a new CPU before February.


----------



## dehun

Imraneo said:


> Isn't the mem multiplier used to determine your overall CPU operating speed? BCLK 100 MHz x 36 = 3.6 GHz.
> This could mean that your CPU is stuck at 3.6 GHz and not boosting, which is why it's stable. Check HWMonitor to confirm your clocks.
> 
> Now I'm in a dilemma..
> Should I RMA or leave my vCore tweak on and move on with life? Perhaps wait a couple more months to see where the BIOS updates go? If I RMA, I'll need a spare CPU to use.. 😓


Yeah, that was my concern as well, so I tested it with benchmarks.
There are 2 multipliers: one for RAM and one for the CPU. The CPU multiplier is set to auto. The RAM multiplier I have set to 36, while the XMP profile is 32, so the RAM is running faster than it should.
As far as I understand, the memory multiplier is used to determine the Infinity Fabric frequency (FCLK) when that is set to auto.
The benchmark results seem comparable to what other people are getting with this CPU:

Gigabyte Technology Co., Ltd. X570 AORUS ELITE - Geekbench Browser (browser.geekbench.com): benchmark results for a Gigabyte Technology Co., Ltd. X570 AORUS ELITE with an AMD Ryzen 7 5800X processor.
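
To make the two multipliers concrete, here is a rough sketch of the arithmetic. It assumes a 100 MHz base clock, that the memory multiplier is expressed in units of 100 MT/s (so 36 corresponds to DDR4-3600), and that FCLK on auto simply follows the memory clock 1:1. The function names are made up for illustration, and the 1:1 assumption in particular may not hold on every board or BIOS.

```python
BCLK_MHZ = 100.0  # assumed base clock


def cpu_clock_mhz(cpu_multiplier: float, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core clock when the CPU multiplier is fixed, e.g. 46 -> 4600 MHz."""
    return cpu_multiplier * bclk_mhz


def memory_clocks(mem_multiplier: float, bclk_mhz: float = BCLK_MHZ) -> dict:
    """Derive DDR4 figures from the memory multiplier."""
    data_rate = mem_multiplier * bclk_mhz  # MT/s, e.g. 36 -> DDR4-3600
    mclk = data_rate / 2                   # DDR transfers twice per clock
    return {
        "data_rate_mts": data_rate,
        "mclk_mhz": mclk,
        "fclk_auto_mhz": mclk,  # assumption: FCLK on auto runs 1:1 with MCLK
    }
```

Under these assumptions a memory multiplier of 36 means DDR4-3600 with an 1800 MHz memory clock, against DDR4-3200 and 1600 MHz for the XMP value of 32. None of this touches the CPU multiplier, which only fixes the core clock when it is set manually.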


----------



## Emman253

Hello everyone. This problem has been talked about since the beginning of November, especially with ASUS motherboards (I own one), but it affects almost all manufacturers. The main problem is a bug in AMD AM4 AGESA V2 PI 1.1.0.0: while the system is running, it triggers WHEA (Windows Hardware Error Architecture), which detects a hardware error and "corrects" it by rebooting the system. This is obviously not caused by a real physical defect but by software, namely the AGESA integrated into the BIOS.

I solved the problem by downgrading to a BIOS version from August 2020 (2606 for owners of Asus motherboards) with AMD AGESA 1.0.8.0, which is currently the stable one and, at least on Asus, supports the Ryzen 5000 series (5600X, 5800X, 5900X, 5950X). For Gigabyte, you have to check with the manufacturer which BIOS version supports the 5000 series with AGESA 1.0.8.0, to work around the problem temporarily until AMD fixes it.


----------



## Imraneo

Emman253 said:


> Hello everyone. This problem has been talked about since the beginning of November, especially with ASUS motherboards (I own one), but it affects almost all manufacturers. The main problem is a bug in AMD AM4 AGESA V2 PI 1.1.0.0: while the system is running, it triggers WHEA (Windows Hardware Error Architecture), which detects a hardware error and "corrects" it by rebooting the system. This is obviously not caused by a real physical defect but by software, namely the AGESA integrated into the BIOS.
> 
> I solved the problem by downgrading to a BIOS version from August 2020 (2606 for owners of Asus motherboards) with AMD AGESA 1.0.8.0, which is currently the stable one and, at least on Asus, supports the Ryzen 5000 series (5600X, 5800X, 5900X, 5950X). For Gigabyte, you have to check with the manufacturer which BIOS version supports the 5000 series with AGESA 1.0.8.0, to work around the problem temporarily until AMD fixes it.


May I ask if you've received any official info about this AGESA bug?
I did try v2606. The very first that supported Zen3. Still the same issue...


----------



## MoW

Emman253 said:


> Hello everyone. This problem has been talked about since the beginning of November, especially with ASUS motherboards (I own one), but it affects almost all manufacturers. The main problem is a bug in AMD AM4 AGESA V2 PI 1.1.0.0: while the system is running, it triggers WHEA (Windows Hardware Error Architecture), which detects a hardware error and "corrects" it by rebooting the system. This is obviously not caused by a real physical defect but by software, namely the AGESA integrated into the BIOS.
> 
> I solved the problem by downgrading to a BIOS version from August 2020 (2606 for owners of Asus motherboards) with AMD AGESA 1.0.8.0, which is currently the stable one and, at least on Asus, supports the Ryzen 5000 series (5600X, 5800X, 5900X, 5950X). For Gigabyte, you have to check with the manufacturer which BIOS version supports the 5000 series with AGESA 1.0.8.0, to work around the problem temporarily until AMD fixes it.


I did try BIOS F30, Gigabyte's first BIOS with 5000 series support. The result is still the same: BSOD at BIOS defaults. So may I know where this bug you spoke of is officially documented? There's no mention of it anywhere.


----------



## cattlecatcat

Joining the club... dumping some of my experiences below (plus a couple of workarounds that seem to have worked for me... so far):

I get random reboots usually followed by WHEA-Logger Cache Hierarchy Errors in the event log. Sometimes it's just the reboot and no error. I haven't seen BSODs or freezes (which I haven't been able to attribute to something else).

*Specs:*
5900x no OC
MSI Unify x570 running 7C35vA82 beta BIOS
64GB F4-3600C16Q-64GTZNC - 4x16GB kit no XMP
Corsair RM650
Noctua D15
EVGA RTX2070 Super Black all stock settings
Samsung 980 Pro 1TB NVME

*Repro:*
It seems to be a bit different for everyone, but for me, running Cinebench 23 in single core mode and otherwise not touching the machine will usually cause it to reboot within 30 minutes (but has taken up to 2 hours before - see below for more info on that).

It only seems to happen when the machine is partly idle, or coming in or out of idle. If I just leave the machine with nothing running, it seems to stay up. It's also stable when running stress tests - it can Prime95, multicore Cinebench, memtest86, TestMem64 iusmus config and play Doom 2016 until the cows come home. And not produce any errors.
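
For anyone who wants to time this kind of repro without watching the screen, a minimal sketch of a heartbeat logger follows. It is not Cinebench: it just keeps one thread busy in bursts with short idle gaps and appends a timestamp to a file after each burst, so the last line written before a reboot shows roughly how long the machine survived. The burst lengths and function names are arbitrary choices for illustration, and whether such a synthetic load triggers the fault the way Cinebench single core does is untested.

```python
import os
import time


def burn_one_core(seconds: float) -> int:
    """Keep the calling thread busy for roughly `seconds`."""
    deadline = time.perf_counter() + seconds
    x = 1
    while time.perf_counter() < deadline:
        x = (x * 1103515245 + 12345) % 2147483648  # cheap integer work
    return x


def run_heartbeat(log_path: str, cycles: int,
                  busy_s: float = 5.0, idle_s: float = 1.0) -> None:
    """Alternate busy and idle phases, logging elapsed seconds after each cycle."""
    start = time.time()
    for i in range(cycles):
        burn_one_core(busy_s)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{i} {time.time() - start:.1f}\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so a reboot does not lose it
        time.sleep(idle_s)


def last_heartbeat(log_path: str) -> float:
    """Elapsed seconds recorded in the final line of the log."""
    with open(log_path, encoding="utf-8") as log:
        lines = [line for line in log if line.strip()]
    return float(lines[-1].split()[1])
```

Start it with something like `run_heartbeat("heartbeat.log", cycles=5000)`, leave the machine alone, and after an unexpected reboot call `last_heartbeat("heartbeat.log")`. The log is opened in append mode, so delete it between runs.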

*Causes eliminated:*

XMP is off. Running memory at default 2133 or whatever it is
No overclocking of any form. All BIOS settings at their stock settings
Tried multiple mice in case it's something bizarre with USB. It wasn't
Unplugged everything except the power cables from the MB
Swapped the memory for 32GB (4x8) of older and well proven Samsung DIMMS. Did nothing, including when running just one stick by itself
Tried the previous non-beta 7C35vA7 BIOS - no difference
Setting Power Supply Idle to Typical does nothing
Upgrading no end of random drivers and trying different versions has no effect
*Reduced the frequency of reboots:*
Disabling the onboard 2.5Gb LAN (RTL-8125), Wi-Fi and Bluetooth in Device Manager seemed to increase the time to reboot when running Cinebench single core from 3-30 minutes to 2-3 hours. I checked this twice, but I didn't do any more testing because it didn't actually fix it. And it might've been random coincidence.

*Workarounds:*
For me, one of either:

Global C-State Control = Disabled or;
DRAM Power Down = Disabled
...seems to stop it (touch a giant massive ungodly sized piece of wood). I saw that disabling c-states seemed to help some people, but not sure if I've seen the DRAM power down thing yet. Disabling one or the other has allowed Cinebench single core to last > 8 hours before I get bored and shut it down, with no reboots.

I've not tried both together. Currently using DRAM Power Down = Disabled as it seems to have no noticeable effect on anything, whereas disabling C-states seems to increase the power draw at the wall by a measurable amount.

Going to attempt to use this system as a daily driver for a bit and see if it manages to stay up, and pray for a BIOS update in the meantime, because currently I have no confidence in it. I don't want to RMA the CPU at the moment because I suspect it will mean weeks without one.


----------



## Emman253

Imraneo said:


> May I ask if you've received any official info about this AGESA bug?
> I did try v2606. The very first that supported Zen3. Still the same issue...


Yeah, there are a lot of topics on the AMD and ROG forums about this problem (Reddit too, if you check). Asus tried to fix it with patches in beta releases 2812/16. The final "stable" release at the moment is 3001, but people still have problems (not just with Zen 3 but with Zen 2 too), and the problem appears from BIOS version 2802 with AGESA 1.1.0.0. I already tried it a month ago with a Ryzen 7 3700X and a Ryzen 9 5900X, and for me the most stable version is 2606, without any problems or random reboots like other users have. Obviously you can't use all the optimized settings, or even PBO, for Zen 3.


https://www.reddit.com/r/ASUS/comments/k97mm6

Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power (community.amd.com)

Can you explain your problem in more detail, so we can determine whether it is related to a BIOS error?
Did you reset your BIOS to default settings before flashing?
Did you flash from the BIOS with a USB stick instead of from Windows?
Did you decompress the BIOS file with WinRAR?
Did you clear CMOS too?
Which motherboard variant?
Can you send me your Windows event logs and CPU model?
Dump files too: *C:\Windows\MiniDump*
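
To see quickly whether any dumps were written at all, and when, something along these lines can list them. It is only a sketch: the folder is the one named above, the helper name is hypothetical, and reading that folder may require an elevated prompt.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Tuple

# Folder named in the post above; reading it may need administrator rights.
MINIDUMP_DIR = Path(r"C:\Windows\MiniDump")


def list_minidumps(folder: Path = MINIDUMP_DIR) -> List[Tuple[str, datetime, int]]:
    """Return (file name, modified time, size in bytes) per .dmp file, newest first."""
    if not folder.is_dir():
        return []  # no folder usually means no dump has been written
    dumps = []
    for path in folder.glob("*.dmp"):
        stat = path.stat()
        dumps.append((path.name, datetime.fromtimestamp(stat.st_mtime), stat.st_size))
    return sorted(dumps, key=lambda item: item[1], reverse=True)
```

An empty list is itself informative: a machine that resets instantly, as several posts here describe, often never gets the chance to write a dump.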

Best Motherboards for AMD Ryzen 5000 CPUs: X570, B550 for Gaming & Content Creation - Buildzoid / GN (community.amd.com)


----------



## Emman253

MoW said:


> I did try BIOS F30, Gigabyte's first BIOS with 5000 series support. The result is still the same: BSOD at BIOS defaults. So may I know where this bug you spoke of is officially documented? There's no mention of it anywhere.


Like I said, this depends on the manufacturer's BIOS version. With Asus we can downgrade to the previous AGESA version, which supports Zen 3 too. I don't know about other manufacturers.

Best Motherboards for AMD Ryzen 5000 CPUs: X570, B550 for Gaming & Content Creation - Buildzoid / GN (community.amd.com)

According to Buildzoid, the AMD AGESA is still changing so BIOS for the motherboards are still changing and buggy. 
He claims there are lots of issues such as: 

USB3 port issues. 
Memory overclock issues.
XMP issues. 
Infinity fabric stability issues. 
He is surprised he has not yet heard about a sleep bug. Expects to. 
Claims all motherboards are affected because it comes from the AMD AGESA
Hopes it will be fixed in a few months at most. 

So, there you go. Don't RMA your CPUs, because you cannot tell what is wrong yet if the AGESA is causing motherboard BIOS bugs.


----------



## Deepcuts

@Emman253 AGESA surely plays a big part, but if AGESA alone were to blame, everyone with a 5000 CPU would have issues. 
And that is not the case. A lot of users are stable at stock and even with high IF and tight timings.
Some of them are using the exact same motherboard, CPU and RAM kit as the next user with WHEA errors and reboots.
That only leaves two possibilities: faulty CPU or faulty motherboard. I am still 50/50, but that will change come Monday when I will be able to test another CPU.
I see you voted that a BIOS update fixed it, but you did not specify which CPU you have.


----------



## nevcairiel

Emman253 said:


> Like i say, this depends on manufacturer BIOS Version, with Asus we have the possibility to downgrade with the previous AGESA Version with Zen 3 supports too, i don't know with other manufacturer.


The Gigabyte BIOS with 1.0.8.1 didn't resolve the issue for me. There might be different levels of this issue, some being exacerbated by the newer BIOS. But ultimately some chips are practically unusable right now, while others work just fine, so there must be some difference in the silicon. People have reported night-and-day differences from changing their chip, and the majority of people are not getting instant BSODs or restarts.

I appreciate that the AGESA might not be perfect yet, but there are definitely certain chips that are far, far worse right now and don't run stable with any known BIOS version. Unless you have experienced this, I don't think you can judge how bad it is on some chips. Your issue might not be our issue.

On the other hand, there are many chips that seem to not have any issues, at worst maybe a bit of a limit in overclocking until WHEA errors come up. I just want a chip that I can run right now without constant BSODs, on any BIOS version, and until I have that, I'm going to RMA the CPU, since right now I literally cannot use this expensive system.


----------



## cattlecatcat

Update: Just got a reboot and a WHEA Bus/Interconnect error while playing Hitman for about 5-10 minutes with DRAM power down disabled. Have now disabled global c state control as well to see if that helps.


----------



## GRABibus

This thread is nice advertising for Intel.
I am waiting for my build with 5900x and honestly I am wondering if I made a good choice when reading all these posts...


----------



## newls1

GRABibus said:


> This thread is a nice advertising for Intel.
> I am waiting for my build with 5900x and honestly I am wondering if I made a good choice when reading all these posts...


I'm in the same boat... waiting for a 5950X but have all other parts bought and ready... hope I made a good choice.


----------



## nevcairiel

Don't worry about it until it actually happens to you. Most chips are probably fine.


----------



## Taku123

5950x here and have the same issue with x570-i board on 3001 bios. It's only happened twice so far idling just browsing the web. I DO NOT think it's the CPU. I think it's the bios. This should be fixed with a bios update.


----------



## GRABibus

Taku123 said:


> 5950x here and have the same issue with x570-i board on 3001 bios. It's only happened twice so far idling just browsing the web. I DO NOT think it's the CPU. I think it's the bios. This should be fixed with a bios update.


And with the CPU at 100%, even when idling (just by setting the minimum CPU state to 100% in the Windows power options), and with a fixed vcore (override), do you get the same issue?


----------



## Taku123

GRABibus said:


> and with a CPU at 100% , even during idling (just by putting minimum Cpu at 100% in power options in windows) and with a fixed vcore (override), same issue ?


This is a pretty new issue for me, my build is less than a week old and it's only happened twice. Everything is at stock except a DOCP profile for my RAM running at 3600 speed. I have the same motherboard/GPU as you too. 

I can benchmark everything just fine, just light idle or web browsing and it restarted twice so far. PBO is on auto (disabled) and everything else at stock.


----------



## GRABibus

Taku123 said:


> This is a pretty new issue for me, my build is less than a week old and it's only happened twice. Everything at stock except a docc profile for my RAM running at 3600 speed. I have the same specs as you motherboard/gpu too.
> 
> I can benchmark everything just fine, just light idle or web browsing and it restarted twice so far. PBO is on auto (disabled) and everything else at stock.



I don’t have my build yet. And it will be a 5900X.
Can you just disable PBO, set the CPU multiplier to 40 to get 4 GHz, then set a vcore override of 1.25 V to be sure it is stable?
Then, under Windows, set the minimum CPU state to 100% in the power options.
The CPU will then sit at 4 GHz and 1.25 V constantly, even at idle. Do you have the same issue?
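
The multiplier and vcore override have to be set in the BIOS, but the Windows half of this test can be scripted. A small Python sketch follows, as an illustration only: the function names are hypothetical, a 100 MHz reference clock is assumed, and the `powercfg` aliases (`SCHEME_CURRENT`, `SUB_PROCESSOR`, `PROCTHROTTLEMIN`) are assumed to be available on your Windows build (check with `powercfg /aliases`).

```python
BCLK_MHZ = 100.0  # assumption: reference clock left at its default


def fixed_core_clock_mhz(multiplier: float, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core clock that results from a fixed multiplier and reference clock."""
    return multiplier * bclk_mhz


def min_processor_state_commands(percent: int = 100) -> list[list[str]]:
    """powercfg commands that set 'Minimum processor state' on AC power and re-apply the plan."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)],
        # Re-activating the current scheme makes the new value take effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


if __name__ == "__main__":
    print(f"Expected fixed clock: {fixed_core_clock_mhz(40):.0f} MHz")
    for command in min_processor_state_commands(100):
        print(" ".join(command))
```

If the reboots stop with the clock and voltage pinned like this, that would point at the idle/low-power behavior rather than at load.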


----------



## Taku123

I'm also on high performance mode.


----------



## Taku123

GRABibus said:


> I didn’
> 
> 
> I don’t have my build yet. And it will be a 5900x.
> Can you just disable PBO , set the multiplier for cpu at 40 to get 4GHz then a vcore override at 1,25 V to be sure it is stable.
> Then under windows, set 100% for minimum
> CPU in power options.
> Cpu will be at 4GHZ constantly with 1,25V constantly, even at idle.do you have the same issue ?


When I manually disabled PBO (instead of having it on auto) I took a pretty big performance hit.

I'm back at stock everything except DOCP for my RAM. If it happens again I'm going to try these settings. I have to figure out how to work with the Asus BIOS since this is my first time using it.


----------



## WinterActual

I finally received an email from AMD. My CPU has already gone to my retailer for RMA, so AMD were a bit slow with their response, but I think they are finally aware of the problem, because their answer was: try running it with just 1 stick of RAM, and if the problem still persists, send the CPU straight in for RMA.


----------



## Deepcuts

Take two.








If I am not back in 2 hours with good news, there is a good chance a big hammer was involved.


----------



## Jay109

Deepcuts said:


> Take two.
> View attachment 2470412
> 
> If I am not back in 2 hours with good news, there is a good chance a big hammer was involved.


Please let me know how it goes. I'm literally in the same boat and about to RMA my 5950X, but if your new one is also faulty, then maybe it's not the CPU.


----------



## glith

Deepcuts said:


> Take two.
> View attachment 2470412
> 
> If I am not back in 2 hours with good news, there is a good chance a big hammer was involved.


Please also tell us which production date is on the new one... It's written on the CPU itself.


----------



## glith

WinterActual said:


> I finally received an email from AMD. My cpu is still sent for rma to my retailer so AMD were a bit slow with their response but I will tell you - I think they are finally aware of the problem because their answer was - try to run it with just 1 stick of ram, if the problem still persist directly send the cpu for rma.


I also got a reply from AMD which isn't really helpful: "Remove CMOS-battery and reset BIOS. If it doesnt help, try the CPU on another system."
(I already did reset the BIOS... and I'm not "lucky" enough to have another AMD system close by.)


----------



## Deepcuts

*Good news.*
Shut down the PC. Removed the power cord. Reset CMOS via the back I/O button.
Replaced the CPU. Started the PC. Entered BIOS and loaded setup defaults. RAM 2133 UCLK/FCLK 1067. No XMP. Nothing else changed. Same Windows install.
BIOS version is F31

First 30 minutes light internet browsing with youtube videos, some internet speed test, launched Guild Wars 2. All good.
Next, I started a Handbrake encode. 1st encode took 25 minutes and everything is still stable.
After the 1st Handbrake Encode did some AIDA64 benchmarks. Still stable.
Left the system idle for another 30 minutes. Still stable.
The 1st CPU would have crashed a long time ago.

Of course, 1-2 hours of testing cannot be labeled as 100% stable. But the fact is: the 2nd CPU is light years ahead of the 1st CPU stability wise.
Cannot stress this enough: *the only thing I have replaced is the CPU*.
3950X = stable
1st 5950X = Not stable at stock
2nd 5950X = Stable so far at stock.

Will leave it at stock for 1 week. If still stable, will start tweaking.
Almost 6 weeks now with the 2nd CPU and the system is stable. No WHEA or reboots.
Running my RAM kit at a manual 3600 MHz with tight timings and 1:1 IF, 57-58 ns latency in AIDA64, Curve Optimizer (C.O.) at negative 15 on all cores, with PBO limits of 200W TDP, 130 TDC and 160 EDC.
Max spike temps at 90-92 Celsius with an average of 67 Celsius tdie during long handbrake encodes.
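
For readers wondering how the numbers above fit together (RAM 2133 with UCLK/FCLK 1067, and 3600 MHz at 1:1 IF), here is the arithmetic as a short Python sketch. The function names are made up for illustration. It relies only on the general fact that DDR memory transfers twice per clock, so the memory clock is half the rated speed, and that "1:1" means the memory controller (UCLK) and Infinity Fabric (FCLK) run at that same clock.

```python
def memory_clock_mhz(ddr_rate_mts: float) -> float:
    """DDR transfers data twice per clock, so the real memory clock is half the rated speed."""
    return ddr_rate_mts / 2


def coupled_clocks(ddr_rate_mts: float) -> dict[str, float]:
    """Clocks when memory, memory controller and Infinity Fabric all run 1:1."""
    mclk = memory_clock_mhz(ddr_rate_mts)
    return {"MCLK": mclk, "UCLK": mclk, "FCLK": mclk}


if __name__ == "__main__":
    for rate in (2133, 3600):
        print(rate, coupled_clocks(rate))
```

So the 3600 kit at 1:1 means FCLK 1800, and the 2133 default gives about 1066.5, which is the 1067 quoted for the stock run.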










If this is not proof enough AMD did a boo-boo, I do not know what is. But my 2 cents bet is that they will never acknowledge the scale of this issue.
I was under the wrong impression that somehow AMD's quality check would not let so many broken CPUs slip through. I was wrong.

Left 1st CPU Right 2nd CPU​


----------



## Jay109

Deepcuts said:


> *Good news.*
> Shutdown PC. Removed the power cord. Reset CMOS via the back IO button.
> Replaced the CPU. Started the PC. Entered BIOS and load setup defaults. RAM 2133 UCLK/FCLK 1067. No XMP. Nothing else changed. Same Windows install.
> BIOS version is F31
> 
> First 30 minutes light internet browsing with youtube videos, some internet speed test, launched Guild Wars 2. All good.
> Next, I started a Handbrake encode. 1st encode took 25 minutes and everything is still stable.
> After the 1st Handbrake Encode did some AIDA64 benchmarks. Still stable.
> Left the system idle for another 30 minutes. Still stable.
> The 1st CPU would have crashed a long time ago.
> 
> Of course, 1-2 hours of testing cannot be labeled as 100% stable. But the fact is: the 2nd CPU is light years ahead of the 1st CPU stability wise.
> Cannot stress this enough: *the only thing I have replaced is the CPU*.
> 3950X = stable
> 1st 5950X = Not stable at stock
> 2nd 5950X = Stable so far at stock.
> 
> Will leave it at stock for 1 week. If still stable, will start tweaking.
> 
> If this is not proof enough AMD did a boo-boo, I do not know what is. But my 2 cents bet is that they will never acknowledge the scale of this issue.
> I was under the wrong impression that somehow AMD's quality check would not let so many broken CPUs slip through. I was wrong.
> 
> Left 1st CPU Right 2nd CPU​
> View attachment 2470439
> 
> 
> View attachment 2470440



This is with boost enabled and overdrive, correct?


----------



## Deepcuts

Jay109 said:


> This is with boost enabled and overdrive, correct?


Core Performance Boost on Auto, meaning Enabled.
No clue what overdrive is, but if you are referring to PBO, then no. PBO is by default on Auto on Gigabyte boards, meaning disabled.
Check the last image in the post above for frequency and other details. Click or open in new tab for better image quality.


----------



## Jay109

Deepcuts said:


> Core Performance Boost on Auto, meaning Enabled.
> No clue what overdrive is, but if you are refering to PBO, then no. PBO is by default on Auto on Gigabyte boards, meaning disabled.
> Check the last image in the post above for frequency and other details. Click or open in new tab for better image quality.


Ah I see, thanks a lot for this! I sent an RMA request and called an AMD support center who seemingly aren't aware. I emailed them this thread too.


----------



## Imraneo

Deepcuts said:


> Core Performance Boost on Auto, meaning Enabled.
> No clue what overdrive is, but if you are refering to PBO, then no. PBO is by default on Auto on Gigabyte boards, meaning disabled.
> Check the last image in the post above for frequency and other details. Click or open in new tab for better image quality.


Thanks for sharing. Looks like it's a silicon lottery issue. Some are just bad.
Again, pls continue to share. My chip decided to misbehave after about 1 week of usage.
Cheers!


----------



## MoW

My question is: how could AMD allow such defective chips to be sold in the first place? I believe they knew about it but swept it under the carpet.


----------



## Deepcuts

Imraneo said:


> Thanks for sharing. Looks like it's silicon lottery issue. Some are just bad.
> Again, pls continue to share. My chip decided to misbehave after about 1 week of usage.
> Cheers!


Silicon lottery would be a golden chip that boosts above the rest and/or at lower voltages, not what we are seeing here.
This is plain silicon fkery.


----------



## Midian

Good to hear, Deepcuts, hopefully it will stay stable. My 5950X (2046SUS) has had zero WHEA errors since a Windows reinstall on 2 Dec with the F31J BIOS. I only ever had one error, but that was on an old Windows install. It boosts up to around 5050 MHz; that boost down there is from opening a movie.


----------



## buildorbust

Deepcuts said:


> Silicon lottery would be a golden chip that boosts above the rest and/or at lower voltages, not what we are seeing here.
> This is plain silicon fkery.


For the new 5950X, what is the max EDC at stock BIOS settings with only the EDC limit raised from 140 A to 200 A?

Mine peaks at 173 A EDC in Cinebench R20 multi-core (stock settings, with only the EDC limit raised from 140 A to 200 A for testing).

The better the quality of the CPU, the lower the EDC value should be.

Please test it...


----------



## aa.delite

MoW said:


> My question is how would AMD allow such defective chips to be sold in the first place ?


Are 5000 series the first AMD CPUs made in China? Maybe it's the answer.



Deepcuts said:


> 2nd 5950X = Stable so far at stock.


1.44-1.50 V at idle? Aren't you afraid? Today I talked with a guy who also said his CPU is stable after replacement. Seems like RMA is the ultimate solution. His has been stable for about a week. But there is another guy who had no problems for a week with his 1st 5950X, and after a week it started to reboot.


----------



## Deepcuts

aa.delite said:


> 1.44-1.50v idle? Aren't you afraid? Today I've talked with a guy who also said CPU is stable after replacement. Seems like RMA is an ultimative solution.


Where did you see 1.44-1.50 V idle?
~0.950 idle (minimum) as seen in the screenshot.
The screenshot was taken during Handbrake encode.


----------



## aa.delite

Deepcuts said:


> Where did you see 1.44-150V idle?


I see 1.44-1.50v idle on latest Gigabyte bios (F11k,l,m,n,test for Aorus Master). Wonder if F31o/test is different...


----------



## Deepcuts

aa.delite said:


> I see 1.44-1.50v idle on latest Gigabyte bios (F11k,l,m,n,test for Aorus Master). Wonder if F31o/test is different...


I am on F31.
I do not see this behavior on Aorus Xtreme.


----------



## nevcairiel

I had those high idle voltages on my "broken" 5950X. I wonder if it's part of the issue. I'll see how it looks on the replacement.

Speaking of the replacement, my CPU arrived at AMD and was swiftly tested, and they approved the replacement, so they must've confirmed the issue immediately or something, no further questions or complications. A new CPU is supposed to ship soon.


----------



## Spiriva

nevcairiel said:


> I had that high idle voltages on my "broken" 5950X. I wonder if its part of the issues. I'll see how it looks on the replacement.
> 
> Speaking of the replacement, my CPU arrived at AMD and was swiftly tested, and they approved the replacement, so they must've confirmed the issue immediately or something, no further questions or complications. A new CPU is supposed to ship soon.


You should have asked for the money back like I did. The next 5950X you get will probably be broken too.
Take the money back instead!


----------



## kingmob

Spiriva said:


> You should have asked for the money back like i did. The next 5950x you get will prolly be broken too.
> Take the money back instead!


Starting to think you're on to something. Do you know of a 10850K board that can run 2 M.2s and still have one lane at x16? PCIe Gen 4 is keeping me from returning this mobo and CPU and going with a much cheaper and presumably more stable system.


----------



## iraff1

Should I RMA my CPU? I've been running it at 3600/1800 FCLK for the past week and it's 100% stable, but it still spits out WHEA errors from time to time. It's not related to system load or system overclock: even if I put the system at default settings, the WHEA errors appear randomly. What say you?
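
When WHEA errors show up "randomly" like this, it can help to count them by type (people in this thread have reported Bus/Interconnect and Cache Hierarchy errors) before deciding on an RMA. A rough Python sketch follows. It assumes the events were exported as text and that each event's description contains an `Error Type:` line, which is how WHEA-Logger entries usually read; the function name and the sample text are invented for illustration.

```python
import re
from collections import Counter

# Assumption: each exported WHEA-Logger event has a line like "Error Type: <name>".
ERROR_TYPE = re.compile(r"^\s*Error Type:[ \t]*(.+?)[ \t\r]*$", re.MULTILINE)


def tally_error_types(event_text: str) -> Counter:
    """Count how often each WHEA error type appears in exported event text."""
    return Counter(ERROR_TYPE.findall(event_text))


if __name__ == "__main__":
    # Invented sample, only to show the expected shape of the input.
    sample = (
        "Error Source: Machine Check Exception\n"
        "Error Type: Cache Hierarchy Error\n"
        "\n"
        "Error Source: Machine Check Exception\n"
        "Error Type: Bus/Interconnect Error\n"
    )
    for name, count in tally_error_types(sample).most_common():
        print(f"{count:3d}  {name}")
```

A tally that stays at one type across BIOS versions and memory settings is at least a more useful thing to send to AMD support than "random WHEA errors".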


----------



## Redwoodz

iraff1 said:


> Should i RMA my CPU? I've been running it at 3600/1800 flck for the past week and its 100% stable, but it still spits out whea errors from time to time, its not related to system load or system overclock, even if i put the system at a default setting the whea errors appear randomly. What say you?


I say you wait a little longer. Everyone is focused on BIOSes and AGESA, but there is another player that everyone is ignoring, and that is Windows. Windows could be the cause of all of these problems.


----------



## Taku123

Did some further testing on my end. Take this with a grain of salt: *I'VE ONLY HAD TWO BLACK SCREEN REBOOTS. NEVER ANY BSOD! *

First time was setting up my computer and installing windows, just light browsing downloading programs from chrome. It rebooted, didn't think anything of it. 

The second time was after playing Cyberpunk 2077 for 2 hours. Temps were fine, CPU never reached above 73c and GPU never reached above 70c. About 30mins later, more light browsing refreshing a page on google, boom. Same black screen reboot.

Every single thing is on stock EXCEPT DOCP for my RAM. I'm running GSKILL Trident Neo 3600 speed CL16.

I really think it's either a Windows problem or Bios issue. My motherboard is Asus Strix x570-I bios 3001.

The only thing I wish is that AMD would acknowledge this problem, so I know whether or not it's my chip. I have a ticket open, and it seems that for most people who have replaced their 5950X the issue was solved. Granted, I haven't run into any restarts in 2 days, but still, it would be nice to have peace of mind.


----------



## Redwoodz

Taku123 said:


> Did some further testing on my end. Take this with a grain of salt: *I'VE ONLY HAD TWO BLACK SCREEN REBOOTS. NEVER ANY BSOD! *
> 
> First time was setting up my computer and installing windows, just light browsing downloading programs from chrome. It rebooted, didn't think anything of it.
> 
> The second time was after playing Cyberpunk 2077 for 2 hours. Temps were fine, CPU never reached above 73c and GPU never reached above 70c. About 30mins later, more light browsing refreshing a page on google, boom. Same black screen reboot.
> 
> Every single thing is on stock EXCEPT DOCP for my RAM. I'm running GSKILL Trident Neo 3600 speed CL16.
> 
> I really think it's either a Windows problem or Bios issue. My motherboard is Asus Strix x570-I bios 3001.
> 
> The only thing I wish is if AMD acknowledged this problem so I know whether or not it's my chip. I have a ticket open and it seems like most people that have replaced their 5950x this issue was solved. Granted I haven't run into any restarts in 2 days but still, it's nice to have peace of mind.


 But was the issue solved with a new cpu or was it solved by updates in the interim?


----------



## Taku123

Redwoodz said:


> But was the issue solved with a new cpu or was it solved by updates in the interim?


I'm not 100% sure the issue is completely solved. I just haven't had the issue in 2 days, and the only thing that I've done is turn DOCP on and off again lol, but it's on now and I can stress test and play games with no issues. I feel like as soon as I let my guard down it's going to happen again.


----------



## t4t3r

Sorry but I don't think the chips are an issue. I got a Cache Hierarchy WHEA error yesterday that rebooted the system with my 3900x + C7H after updating the bios to the latest version that supports Zen 3. I've had this chip on multiple boards for the past ~6 months and it's been rock solid. There's no way this chip "went bad" on the same day as I loaded up a new BIOS.

This is a Windows/BIOS/etc issue or if running certain OC (and even then probably related to BIOS changes to support Zen 3). Bad processors are exceedingly rare, especially out of the box. It's nice that AMD is willing to accept RMAs but the more that gets used the less likely they are to cover this type of stuff over time. Your warranty is 3 years - let manufacturers sort out the BIOS issues first.


----------



## Taku123

t4t3r said:


> Sorry but I don't think the chips are an issue. I got a Cache Hierarchy WHEA error yesterday that rebooted the system with my 3900x + C7H after updating the bios to the latest version that supports Zen 3. I've had this chip on multiple boards for the past ~6 months and it's been rock solid. There's no way this chip "went bad" on the same day as I loaded up a new BIOS.
> 
> This is a Windows/BIOS/etc issue or if running certain OC (and even then probably related to BIOS changes to support Zen 3). Bad processors are exceedingly rare, especially out of the box. It's nice that AMD is willing to accept RMAs but the more that gets used the less likely they are to cover this type of stuff over time. Your warranty is 3 years - let manufacturers sort out the BIOS issues first.


May I ask what bios/mobo you are using?


----------



## Deepcuts

t4t3r said:


> Your warranty is 3 years - let manufacturers sort out the BIOS issues first.


How about sort out the issues first then release the product and not the other way around?
What is this, Cyberpunk 2077 for hardware?
I also was 100% sure that my 1st 5950X CPU was good and the motherboard/BIOS was to blame. Lost one month testing and troubleshooting only to discover the CPU was actually faulty.


----------



## Taku123

Deepcuts said:


> How about sort out the issues first then release the product and not the other way around?
> What is this, Cyberpunk 2077 for hardware?
> I also was 100% sure that my 1st 5950X CPU was good and the motherboard/BIOS was to blame. Lost one month testing and troubleshooting only to discover the CPU was actually faulty.


How is your second cpu coming along?


----------



## t4t3r

Taku123 said:


> May I ask what bios/mobo you are using?


I mentioned it in my post. I also have several B550 and X570 boards from Gigabyte and MSI, had at least one Asrock x570 board and tested probably 6 others over the past year. I've had more than a handful of Zen 2 chips and also have 2 5900x currently. These BIOS versions are all over the place for almost every manufacturer.


----------



## Deepcuts

Taku123 said:


> How is your second cpu coming along?


So far one day of stress testing without issues.
Also with XMP enabled. No manual timings tweaks yet for RAM.


----------



## iraff1

Deepcuts said:


> So far one day of stress testing without issues.
> Also with XMP enabled. No manual timings tweaks yet for RAM.
> View attachment 2470628


Nice to see it works for you with the new CPU! Makes me wonder how many chips were shipped that actually had issues and somehow passed quality testing. I mean, all chips do pass some kind of standard tests. Many people have issues only when the CPU is not stressed, which makes me think a lot of those chips passed testing because they never idle the chip, they stress it... Still surprised no tech influencer has picked up on the story about bad chips from AMD yet.


----------



## Taku123

Deepcuts said:


> So far one day of stress testing without issues.
> Also with XMP enabled. No manual timings tweaks yet for RAM.
> View attachment 2470628


How long did the whole RMA process take for your new CPU?


----------



## Deepcuts

Taku123 said:


> How long did the whole RMA process take for your new CPU?


I don't even want to think about how long a complete RMA for this CPU would take at this time of the year.
I just bought a new one. Ordered Friday, arrived Monday. Today I have sent the 1st CPU to the shop I bought it from. AMD still sleeping on my one-month-old RMA request.


----------



## Taku123

Deepcuts said:


> I don't even want to think about how long a complete RMA for this CPU would take at this time of the year.
> I just bought a new one. Ordered Friday, arrived Monday. Today I have sent the 1st CPU to the shop I bought it from. AMD still sleeping on my one-month-old RMA request.


Gotcha, I'll look for a new one. I just talked with a rep from AMD and he said that he thinks it's a bad chip and he would RMA it.


----------



## HKisd

Update on my new 5950X. It has been totally stable for over a week now. Absolutely no problems. My previous 5950X did reboot while idling about 2 times per day. Luckily I returned that faulty CPU and did not wait for BIOS updates.


----------



## Taku123

HKisd said:


> Update on my new 5950X. It has been totally stable for over a week now. Absolutely no problems. My previous 5950X did reboot while idling about 2 times per day. Luckily I returned that faulty CPU and did not wait for BIOS updates.


Did you buy a new one or RMA? Also, what does everyone think about this thread? [SOLVED] Zen 3 + X570 WHEA (ASUS BIOS 3001)

In the thread I linked it seems to be a RAM thing. I'm thinking the next time my system reboots I'm going to set the RAM to 3200 speed until a new BIOS update comes out.


----------



## HKisd

Taku123 said:


> Did you buy a new one or RMA? Also, what does everyone think about this thread? [SOLVED] Zen 3 + X570 WHEA (ASUS BIOS 3001)
> 
> It seems like in the thread I linked it's a RAM thing. I'm thinking the next time my system reboots i'm going to set the RAM to 3200 speed until new bios update comes out.


I bought a new one and returned faulty CPU to the seller. It took AMD about 10 days to answer to my query, and when they did, I had already returned the faulty CPU. I have Asus X570-E motherboard. That faulty CPU rebooted in idle with bios 3001, bios default settings and any memory clock, 2133 MHz, 3200 MHz and 3600 MHz. New 5950X has been totally stable with bios version 3001, bios optimized defaults and DOCP 3600 MHz memory settings.


----------



## Taku123

HKisd said:


> I bought a new one and returned faulty CPU to the seller. It took AMD about 10 days to answer to my query, and when they did, I had already returned the faulty CPU. I have Asus X570-E motherboard. That faulty CPU rebooted in idle with bios 3001, bios default settings and any memory clock, 2133 MHz, 3200 MHz and 3600 MHz. New 5950X has been totally stable with bios version 3001, bios optimized defaults and DOCP 3600 MHz memory settings.


Gotcha. I got this chip from someone in a trade, so I'll have to wait for the RMA process.


----------



## Redwoodz

Deepcuts said:


> How about sort out the issues first then release the product and not the other way around?
> What is this, Cyberpunk 2077 for hardware?
> I also was 100% sure that my 1st 5950X CPU was good and the motherboard/BIOS was to blame. Lost one month testing and troubleshooting only to discover the CPU was actually faulty.


That's what happens when you buy the latest, fastest tech on release. Intel does not have any new tech so you don't see it there.




Taku123 said:


> Did you buy a new one or RMA? Also, what does everyone think about this thread? [SOLVED] Zen 3 + X570 WHEA (ASUS BIOS 3001)
> 
> It seems like in the thread I linked it's a RAM thing. I'm thinking the next time my system reboots i'm going to set the RAM to 3200 speed until new bios update comes out.


Have you guys tried turning off low power states for RAM?


----------



## Taku123

Redwoodz said:


> That's what happens when you buy the latest, fastest tech on release. Intel does not have any new tech so you don't see it there.
> 
> 
> 
> Have you guys tried turning of low power states for RAM?


This is my next step but I'm new to AMD Bios so how would I do that?


----------



## HKisd

Redwoodz said:


> Have you guys tried turning of low power states for RAM?


I did try disabling RAM power saving from BIOS with that faulty 5950X. Did not help. It still rebooted in idle.


----------



## o1dschoo1

Deepcuts said:


> I don't even want to think about how long a complete RMA for this CPU would take at this time of the year.
> I just bought a new one. Ordered Friday, arrived Monday. Today I have sent the 1st CPU to the shop I bought it from. AMD still sleeping on my one-month-old RMA request.


Back in '08 I RMA'd a CPU through Intel and it took a month for me to get a CPU back. With everything going on I'd say 3 months minimum... 

What I don't understand is why people are flocking over these chips with known issues..


----------



## rob-tech

This is typical AMD. I had to RMA the 3950X twice, as it would crash in Prime95 small FFTs after about 30 minutes. The second chip they sent was garbage and would cause the system to bug check and reboot in less than 10 seconds. I boxed that unit up and they sent a third one, which behaved exactly like the first one. I then proceeded to RMA the X570 Aorus Xtreme, and the replacement that Gigabyte sent behaved only slightly better with the third CPU, as I could now do about 1.5 hours before a worker stopped.

In my case it only happens in this scenario (prime95 smallFFT's), in all other cases including the OCCT suite and general usage the system is 100% rock solid. 

I now refuse to buy a new AMD product at launch and am postponing getting a 5950x until mid 2021 and only from a retailer that will allow me to exchange without hassle. The CPUs are binned poorly and seem to starve themselves of power in the worst case scenario.

My system is nothing fancy and is designed around long term reliability and ease of setup rather than extracting the last bit of performance, I have 64 GB of 3200 CL14 memory which I plan to transfer to the new CPU. 

I feel the frustration of those in this thread, and the reboots at idle are most likely caused by sloppy binning and the vcore being requested that is too low for the chip causing a crash, this is also common on Zen 2 if you browse Reddit.


----------



## o1dschoo1

rob-tech said:


> This is typical AMD, I had to RMA the 3950x twice as it would crash in prime95 smallFFT's after about 30 minutes, the second chip that they sent was garbage and would cause the system to bug check and reboot after less than 10 seconds. I boxed that unit up and they sent a third one which behaved exactly like the first one. I then proceeded to RMA the X570 Aorus Xtreme and the replacement that Gigabyte sent behaved only slightly better with the third CPU as I could now do about 1.5 hours before a worker stopped.
> 
> In my case it only happens in this scenario (prime95 smallFFT's), in all other cases including the OCCT suite and general usage the system is 100% rock solid.
> 
> I now refuse to buy a new AMD product at launch and am postponing getting a 5950x until mid 2021 and only from a retailer that will allow me to exchange without hassle. The CPUs are binned poorly and seem to starve themselves of power in the worst case scenario.
> 
> My system is nothing fancy and is designed around long term reliability and ease of setup rather than extracting the last bit of performance, I have 64 GB of 3200 CL14 memory which I plan to transfer to the new CPU.
> 
> I feel the frustration of those in this thread, and the reboots at idle are most likely caused by sloppy binning and the vcore being requested that is too low for the chip causing a crash, this is also common on Zen 2 if you browse Reddit.


It is, and people wonder why I say don't buy Ryzen.


----------



## MoW

Just an update. I returned my initial 5950X to the retailer last month because of BSODs. So during this long Christmas break, I built another system. This time I used a 5800X and a Crosshair VIII Formula. And voila, stable at BIOS defaults and also at XMP. The chip was manufactured in week 45.
From what we have seen, shared and experienced in this thread, I believe it's fair to conclude that there is something wrong with certain batches of the Ryzen 9 5000 series, particularly the 5950X.
I suggest that those who got a new chip continue to monitor the system for stability. Yeah, I think RMAing a BSOD chip would be a logical step instead of waiting for an elusive stable AGESA. Kinda makes me think the more we pay for a chip, the more headache/trouble it will cause (for Ryzen).


----------



## MoW

HKisd said:


> I did try disabling RAM power saving from BIOS with that faulty 5950X. Did not help. It still rebooted in idle.


Just RMA it until you get back a good chip.


----------



## Alyjen

I must say I read through this thread shocked. I had none of these issues with my 5800X on a not very expensive B550 board. Yeah, sure, the FCLK setting and memory optimization were a bit of a bumpy ride compared to my previous Intel platform, but apart from that it's rock stable and manages to stay well under 90C in most stress tests and benchmarks, with no BSODs and no crashes. I think the weirdest thing I've encountered was PCIe cards (WLAN & sound) disappearing when I messed with IF & related voltages too much, but I recovered from it and it has been working without hiccups since then.


----------



## Deepcuts

o1dschoo1 said:


> What I don't understand is why people are flocking over these chips with known issues..


I have been building computers since 1998-99. I would say around 2500+ CPUs installed. I know it is not a lot compared to people assembling dozens of units/day, but still.
I managed to kill only one AMD Athlon XP. Cracked the silicon because there were no die shields back then.
Not a single CPU DoA.
If Alzheimer's is not my friend yet, this would be my first ever CPU, be it Intel, AMD, Cyrix or VIA, that was CoA ( Comatose on Arrival. Yeah, I just made this term up. Or did I? )
So no wonder people are not searching for *AMD Ryzen 5950X WHEA* before purchasing one. The thought of getting a CPU that is not stable even at stock is on nobody's mind.
I do not recall any past CPU launch with so many samples with issues *at stock*. Correct me if I am wrong, please.
So I would ease off with "What I don't understand is why people are flocking over these chips with known issues.."


----------



## o1dschoo1

Deepcuts said:


> I build computers from 1998-99. I would say around 2500+ CPUs installed. I know it is not a lot for people assembling dozens of units/day, but still.
> Managed to kill only one AMD Athlon XP. Cracked the silicon because no die shields back then.
> Not a single CPU DoA.
> If Alzeheimer is not my friend yet, this would be my 1st ever CPU, be it Intel, AMD, Cyrix or VIA that was CoA ( Comatose on Arrival. Yeah, I just made this term up. Or did I? )
> So no wonder people are not searching for *AMD Ryzen 5950X WHEA *before purchasing one. The thought of getting a CPU that is not stable even at stock is on nobody's mind.
> I do not recall any past CPU launch with so many samples with issues *at stock*. Correct me if I am wrong, please.
> So I would ease off with "What I don't understand is why people are flocking over these chips with known issues.."


 I guess it's just common sense to me to look up issues on a certain product no matter the company and reputation


----------



## aerodee80

o1dschoo1 said:


> I guess it's just common sense to me to look up issues on a certain product no matter the company and reputation


There is a lot of hype around these processors right now, so potential customers won't be looking for these issues. I was one of them =(

Anyway, any news on the new BIOS released by MSI?








MSI First To Roll Out AGESA 1.1.9.0 BIOS Firmware For X570 & B550 Motherboards, Intros AMD Curve Optimizer & Enables Resizable BAR For NVIDIA GPUs (wccftech.com)


----------



## Riplex

Same problem here with an Asus Crosshair Hero VIII Bios 3003
Tried two cpus (5900x and 5950x) 
2 x 16 GB DDR 3600 G.Skill Trident Z NEO Certified for Ryzen 5000
Reboots @ idle 
I can run CB R20 and OCCT without problems....


----------



## aa.delite

aerodee80 said:


> There is a lot of hype with these processors right now so no one will be looking at these issues for potential customers. I was one of them =(
> 
> Anyway, any news on the new BIOS released by MSI?
> 
> 
> 
> 
> 
> 
> 
> 
> MSI First To Roll Out AGESA 1.1.9.0 BIOS Firmware For X570 & B550 Motherboards, Intros AMD Curve Optimizer & Enables Resizable BAR For NVIDIA GPUs
> 
> 
> MSI has started rolling out the latest AGESA 1.1.9.0 BIOS firmware for its AMD X570 & B550 chipset lineup of motherboards.
> 
> 
> 
> 
> wccftech.com


Gigabyte also released new bios


----------



## smbell1979

Riplex said:


> Same problem here with an Asus Crosshair Hero VIII Bios 3003
> Tried two cpus (5900x and 5950x)
> 2 x 16 GB DDR 3600 G.Skill Trident Z NEO Certified for Ryzen 5000
> Reboots @ idle
> I can run CB R20 and OCCT without problems....


I have the same board and 5950x and what helped my situation out a lot was to disable the "PBO _Fmax_ Enhancer" in the BIOS.


----------



## Taku123

smbell1979 said:


> I have the same board and 5950x and what helped my situation out a lot was to disable the "PBO _Fmax_ Enhancer" in the BIOS.


Did it eliminate it completely?


----------



## smbell1979

Taku123 said:


> Did it eliminate it completely?


I have had one WHEA cache hierarchy reboot and one unknown reboot in the three weeks since, where before I could barely get it to boot into windows or run even the simplest programs (windows explorer) without constant reboots. That setting apparently causes it to boost higher on lighter, single thread workloads, which seems to correspond to idle/very low usage situations.

I don't trust a setting in my BIOS that is created by a youtuber/forum member personally, even if he is active on this forum. I'll pass on that.

I've not noticed any performance issues after it, and my CCX 1 boosts to 5050 on all cores still after it is disabled, so it doesn't hurt anything to turn it off that I've seen.


----------



## aa.delite

smbell1979 said:


> I have had one WHEA cache hierarchy reboot and one unknown reboot in the three weeks since


So you have a defective CPU. Good if you bought the boxed version; you can RMA it then. Mine is OEM and I don't know how to prove the problem to the seller. No one else has complained about it yet.


----------



## t4t3r

aa.delite said:


> So you have defective CPU. Good if you've bought BOX version, you can RMA it then. Mine is OEM and I don't know how to prove a problem to the seller. No one else complained about it yet.


This is NOT indicative of a defect. Look at my post from a couple days ago - got that error on my 3900x I’ve been using for over 6 months on the exact same day I updated the bios on my C7H - is my 3900x now defective? I’m not saying people here aren’t getting 5950x with issues but that single WHEA error doesn’t unequivocally mean a chip is defective. Come on.


----------



## Deepcuts

@t4t3r Got a brain freeze reading your last reply. @aa.delite did not quote you, but @smbell1979


----------



## aa.delite

t4t3r said:


> got that error on my 3900x I’ve been using for over 6 months on the exact same day I updated the bios on my C7H - is my 3900x now defective?


Did you get only the error, or reboots as well?


----------



## MoW

t4t3r said:


> This is NOT indicative of a defect. Look at my post from a couple days ago - got that error on my 3900x I’ve been using for over 6 months on the exact same day I updated the bios on my C7H - is my 3900x now defective? I’m not saying people here aren’t getting 5950x with issues but that single WHEA error doesn’t unequivocally mean a chip is defective. Come on.


If you tried it again with the older BIOS version and got no errors, then it's not a defective CPU.


----------



## t4t3r

MoW said:


> If you tried it back again with the older bios version and with no errors , then it's not defective cpu.


Exactly. It's been used with probably 10 different bios on at least half as many motherboards, literally my daily since I bought it. I'm not sure what causes that specific WHEA error since other people have posted about it in this thread, and I was just giving an example of a non-5000 chip.

I can't speak to others experience but I won't argue against someone who is able to get a replacement and it solves their issue, that is awesome. There's just a lot of variables right now, especially the state of BIOS versions which is known to cause WHEA errors with EVERY board out there. I hope people get their issues resolved, defective chips replaced, etc, but it will be interesting to see how things go as more people get their hands on them with improved supply. It seems to be almost entirely 5950x parts thus far.


----------



## MoW

Deepcuts said:


> I do not recall any past CPU launch with so many samples with issues *at stock*. Correct me if I am wrong, please.
> So I would ease off with "What I don't understand is why people are flocking over these chips with known issues.."


Nope, I haven't come across it either. Only this particular Zen 3 launch and these particular CPU models (the Zen 3 Ryzen 9 family, and the flagship 5950X in particular) have had this many BSOD issues.
The fact that it's happening all over the globe suggests something is not right with the silicon.
Lastly, we flock to these chips because some of us are using Zen 2 without issues and we thought, "well, this Zen 3 will work just as well", and we bought it at launch without giving it much thought.


----------



## WinterActual

Today my replacement 5600x arrived. So far no wheas or any errors or restarts. Booted with 3800 1:1:1 right away.

edit: btw its an older cpu. The faulty one was 2043, this one is 2037.


----------



## GRABibus

WinterActual said:


> Today my replacement 5600x arrived. So far no wheas or any errors or restarts. Booted with 3800 1:1:1 right away.
> 
> edit: btw its an older cpu. The faulty one was 2043, this one is 2037.


Please also give us an update after 2 to 4 days, as some people experienced no issues on the first day and then got those WHEAs and reboots after several days.

Let it run at default settings for several days 😊

I still don’t have my rig with the 5900X, but if I experience the same issues as most of the people here, I will switch to an 11900K in spring.


----------



## Deepcuts




----------



## WinterActual

GRABibus said:


> give also some news after 2 or 4 days as some people experienced no issues the first day and get those whea and reboots after several days.
> 
> let it run at default settings during several days 😊
> 
> I still don’t have my rig with 5900X, but if I experience same issues as most of the people here, I will switch to 11900K in Spring.


Yes, I am aware. My previous CPU was fine for 3 or 4 days. I'll report back if I get any WHEAs.


----------



## GRABibus

Deepcuts said:


> View attachment 2470889


😂

let’s go for 11900K in spring !


----------



## arvu

My PC now has been stable for more than a week, after I changed some BIOS settings. With default settings it was giving WHEA and BSOD all the time, but now it's been without any issues - not even a single WHEA, BSOD, or any other HW related issues. Changing BIOS settings might make your system more stable, even if it does not work with default settings.


----------



## WinterActual

Well, you could have shared with us what settings you changed...


----------



## GRABibus

arvu said:


> My PC now has been stable for more than a week, after I changed some BIOS settings. With default settings it was giving WHEA and BSOD all the time, but now it's been without any issues - not even a single WHEA, BSOD, or any other HW related issues. Changing BIOS settings might make your system more stable, even if it does not work with default settings.


 Ok.
Can you please post your changes ?


----------



## Deepcuts

GRABibus said:


> Ok.
> Can you please post your changes ?


Judging from his earlier post in this thread (Replaced 3950X with 5950X = WHEA and reboots), I guess he manually overclocks with a fixed voltage and maybe increased LLC.


----------



## WinterActual

I also fixed my problems with a fixed OC, but that's not a proper fix IMO. That's not how the CPU is designed to work out of the box.


----------



## GRABibus

What I have in mind is that if the CPU were defective, these kinds of issues should also show up with a static OC...
Let’s see, but maybe some BIOSes and/or AGESA versions will solve these « at stock » issues in the near future.
It is a bet, but let’s see... What a mess...

Has anyone tested the latest MSI BIOS with AGESA 1.1.9.0?


----------



## PJVol

Maybe I haven't read enough posts here, but it's amazing how people jump to the conclusion of faulty hardware, forgetting that current firmware is just as buggy as it usually is during the first couple of months after release.
I am pretty sure the issue is on the software side. What makes me confident (?) is the fact that I bought a B550 mobo a month before the 5600X CPU arrived, and after flashing the BIOS required to boot the 5000 series (it was based on AGESA 1080 or so), all that WHEA and BSOD s**t started to flow, on any firmware based on AGESA 1080 or 1100 patch C. Mind you, it all happened with a 3600X CPU which had worked flawlessly for a solid year before.
Thankfully, the mobo vendor responded promptly and effectively, just in time for when I switched CPUs )), and it was a good idea to abandon any of that 1180 crap and release 1100 patch D instead, which I am currently using and not shooting in the dark.


----------



## GRABibus

PJVol said:


> May be I haven't have read enough posts here, but it amazing how people jump into conclusion of faulty hardware, forgetting that curent firmware is just as buggy as it usually was, during a couple of months after release.
> I am pretty sure the issue is on software side. What makes me confident (?) is the fact, that I've bought b550 mobo a month before 5600X cpu arrived, and after flashing bios that of required to boot 5000 series (it was kinda 1080 agesa based), all that WHEAs and BSODs s**t start to flow, on any 1080 or 1100 patch C agesa based firmware. Mind you, it all happened with a 3600X CPU which was worked flawlessly a solid year before.
> Thankfully, mobo's vendor responded promptly and effectively, just in time when i switched CPU's )), and that was good idea to abandon any of that 1180 crap, and release 1100 patch D instead, which I currently using and not shooting in the dark.


I assume you use 1.80 bios ?
No more Whea ? No more reboots at idle ?


----------



## PJVol

I was, till recently, when ASRock sent me the 1.81 beta; basically it's the same as 1.80 (I suppose) with a boost override limit of 500 MHz instead of 200. And yes, since patch D, no more of that ****. TBF, the most troublesome was on 1080, IIRC 1.2. Patch C then greatly reduced those to something like one reboot in 2-3 days. Waiting for 1190 or 1200 ).


----------



## GRABibus

PJVol said:


> I was, till recently, when Asrock sent me 1.81 beta, basically it's the same 1.80 (I suppose) with boost override limit of 500mhz instead of 200. And yes, since patch D, no more of that ****  Tbf, the most troublesome was on 1080, iirc 1.2. Patch C then greatly reduced those to a something like one reboot in 2-3 days. Waiting for 1190 or 1200 ).


Thanks.
I hope you are right and, at least with your ASRock mobo, it looks like you are.

Asus seems to suck, as we are still on patch C. Let’s see what happens in January on the Asus side with a new BIOS based on patch D.
If that turns out to be the solution to all these ****ty idle reboots and WHEAs, we would all be happy 😊


----------



## PJVol

Rumour has it that all vendors are going to jump straight to 1.1.9.0 very soon. MSI has already rolled it out for some boards; I think it won't take too long for others to follow.


----------



## arvu

GRABibus said:


> Ok.
> Can you please post your changes ?


You can find the changes I made in my earlier post here: Replaced 3950X with 5950X = WHEA and reboots


----------



## WinterActual

PJVol said:


> May be I haven't have read enough posts here, but it amazing how people jump into conclusion of faulty hardware, forgetting that curent firmware is just as buggy as it usually was, during a couple of months after release.
> I am pretty sure the issue is on software side.


If it was a software problem (****ty AGESA), why do I and other people who went through RMA have zero issues with our new CPUs?


----------



## GRABibus

WinterActual said:


> If it was software problem (****ty agesa), why me and other people who went through RMA have zero issues with our new cpus?


maybe it is a combination of both....
He solved by upgrading Bios.
Others solved by changing CPU


----------



## Deepcuts

Maybe we will get an official response from AMD regarding this issue and then we will know for sure. But I doubt it.
To me, crashes and reboots without overclocking anything is not something you can truly fix with software, but merely apply a band-aid and hope the user will not notice or care.
I was in a privileged position that allowed me to buy another CPU and actually get one in order to rule out AGESA/BIOS or the motherboard. I could have just as well received another bad CPU and then I would have been in the same camp with people saying AGESA is to blame. That would've sucked big time.
At the end of the day, it is your prerogative to wait for that sweet sweet band-aid for your expensive CPU.


----------



## GRABibus

Before receiving my rig with the 5900X (in January), I also had the chance to get another brand new 5900X.
So, let’s see what happens for me.
If the 5900X in my rig shows reboots at idle and WHEAs, even with the next Asus BIOS based on AGESA patch D, then I will try the other 5900X.

If I get the same issue with the new CPU, I will definitely check whether the 11900K is more stable at launch.


----------



## machine038

I got a 5950X with an Asus ROG Strix X570-I on November 15.

I was also plagued by idle WHEA BSODs. Manually setting the VSOC, CLDO VDDP, IOD and CCD voltages fixed the restarts at idle, but playing games would still trigger those WHEA BSODs.

The latest BIOS, version 3001, allowed raising the DRAM/IF clock past the 3200/1600 "barrier", but it did not fix the WHEA BSODs.

Using @excitebike's tip of raising the curve optimizer made it fully stable, and I only had to push it to +1 instead of +8.

Also, note that the latest BIOS sets VSOC, CLDO VDDP, IOD and CCD to sane values (similar to what I had set) on AUTO, which prevents the idle WHEA BSODs.

I've "RMA'd" it and received a new 5950X, which worked flawlessly out of the box without any adjustments. I think there is an issue with the CPU, and I'd recommend RMAing it even if a curve optimizer offset gets it working.


----------



## MoW

It's easy to assume it can be fixed by AGESA software. I really doubt it. After paying hard dollars for a high-end chip, we should expect it to at least be stable at BIOS defaults. It's not our job to find temporary fixes for a flawed product.
People should be aware of this and start the RMA process to request a replacement chip that works as it should.
We shouldn't be wasting time waiting and betting that some elusive future AGESA will be the cure.
If it BSODs, RMA it. AMD ought to take some responsibility for this.
This serves as a good lesson when considering purchasing AMD products.


----------



## Anulu

I have had a 5950X for two days now, and yesterday I found out I have those WHEA errors. It's always that "Bus-Interconnect Error", and I have never had a BSOD or reboot.
I could run TestMem5 usmus without errors @ IF2000/DDR4000, but later, while playing BF V, my USB headset stopped working and the game stuttered for a minute.
I replugged the headset; the game didn't crash and the sound was back. Later I stopped playing and saw the errors in HWiNFO.
More than 5000!!

I figured out a way to quickly check for errors with BF V and the AIDA RAM benchmark. OCCT works too but needs more time.
When I start BF V they come very quickly; how many errors I get depends on the IF/memory clock and voltages.
With 1900/3800 it's 2-4 right at the start of the game when it switches to fullscreen; after that I can play for hours without a single error!


-1600/[email protected] no Errors
-1733/[email protected] and 1800/3600 more Errors than 1900/[email protected] Strange

The funny thing is I'm running on an X370 ASRock board with AGESA 1.1.0.0 and have no BSODs or reboots, while the people with B550/X570 have them.
Without the USB incident I wouldn't even have checked for WHEAs, and BF V is very demanding on the CPU compared to other games.

I don't think my CPU is damaged; that "Bus-Interconnect Error" looks firmware-related to me. However, I'm going to test with a 5800X.
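
For readers less familiar with the "IF/memory" pairs quoted above (1900/3800, 1800/3600, IF2000/DDR4000): DDR4 makes two transfers per memory clock, so a DDR4-3800 setting means a 1900 MHz memory clock, and the pairs above are all the 1:1 case where the Infinity Fabric clock (FCLK) matches it. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and nothing here reads real hardware.

```python
def memory_clock_mhz(ddr_rate_mts: int) -> float:
    """Real memory clock for a DDR rating: DDR moves two transfers per clock."""
    return ddr_rate_mts / 2


def is_one_to_one(fclk_mhz: int, ddr_rate_mts: int) -> bool:
    """True when FCLK equals the memory clock (the 1:1 case quoted in the thread)."""
    return fclk_mhz == memory_clock_mhz(ddr_rate_mts)


# FCLK/DDR pairs mentioned in the posts above.
for fclk, ddr in [(1600, 3200), (1800, 3600), (1900, 3800), (2000, 4000)]:
    print(f"FCLK {fclk} MHz with DDR4-{ddr}: 1:1 = {is_one_to_one(fclk, ddr)}")
```

A mismatched pair such as FCLK 1800 with DDR4-3800 would report `False`, which is the decoupled case people in this thread generally try to avoid.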


----------



## aa.delite

MoW said:


> If it's BSOD , RMA it. AMD ought to take some responsibility on this.


What if it DID BSOD, but doesn't anymore after BIOS update? No reboots, no WHEA errors, no USB problems.
I feel like it's still defective. The new BIOS keeps the core voltage at 1.44-1.50 V while idle, so I got worried and set a -0.2 V offset. It causes a performance drop in Cinebench R23. Deepcuts said he has 0.98 V at idle. It's strange. I thought everyone had 1.44 V+ at idle with the new BIOS.


----------



## t4t3r

If there are no errors after a bios update with the same settings how can you still say the chip is defective? 

Idle voltage is based on the chip throttling down correctly, not the bios version. With the correct chipset drivers installed (and correct power plan enabled) ryzen chips will drop voltage under ~1V at idle and throttle back up as needed. 

These are the kind of huge jumps to conclusions that make it hard to take some of these issues seriously.


----------



## MoW

aa.delite said:


> What if it DID BSOD, but doesn't anymore after BIOS update? No reboots, no WHEA errors, no USB problems.
> I feel like it's still defective. New BIOS keeps 1.44-1.50v core voltage while idle, so I'm afraid and set -0.2v offset. It causes performance drop in Cinebench R23. Deepcuts said he has 0.98v idle. It's strange. I thought anyone has 1.44v+ idle using new BIOS.


It shouldn't be at 1.44 V at idle. My 5800X is only at 0.9xxx V at idle. Even my previous 3950X was below 1.0 V at idle. My results tally with Deepcuts'.


----------



## Spectre73

aa.delite said:


> What if it DID BSOD, but doesn't anymore after BIOS update? No reboots, no WHEA errors, no USB problems.
> I feel like it's still defective. New BIOS keeps 1.44-1.50v core voltage while idle, so I'm afraid and set -0.2v offset. It causes performance drop in Cinebench R23. Deepcuts said he has 0.98v idle. It's strange. I thought anyone has 1.44v+ idle using new BIOS.


Did you disable C-states? Many people here recommended it to solve WHEA errors, but of course it messes up your power saving.


----------



## aa.delite

Spectre73 said:


> Did you disable c-states?


No, I just flashed to F11 and set a -0.2 V dynamic core voltage offset to prevent 1.5 V at idle.


----------



## GRABibus

aa.delite said:


> No, just flashed to F11 and dynamic core voltage offset -0.2v to prevent 1.5v at idle.


1.5 V at idle is completely normal for Ryzen.
This is the way boost works!
Read this:

https://www.reddit.com/r/Amd/comments/cbls9g


----------



## cstkl1

GRABibus said:


> 1,5V at idle is completely normal for Ryzen.
> This is the way boost works !
> Read this :
> 
> 
> __
> https://www.reddit.com/r/Amd/comments/cbls9g


It's actually normal for Intel too.
It's just that people crank up the load line when the fear factor hits,
or have C-states enabled with VID downclocking.


----------



## aa.delite

GRABibus said:


> 1,5V at idle is completely normal for Ryzen.


That "observer effect" kinda works. I've been using HWiNFO64 as my monitoring tool, but CPU-Z really does show much lower voltage. Well, it seems you're right.
BTW, I've set a -0.275 V dynamic core offset, so it's 1.19 V (HWiNFO) at idle (0.97 V in CPU-Z) and 1.03 V (HWiNFO) under heavy load (1.08 V in CPU-Z).
-0.3 V is unstable; it can't boot Windows.
-0.2 V is stable. I'm now testing -0.275 V, which seems stable; next I'll test -0.281 V.
I thought it caused a performance drop in Cinebench, but I've found it does not. It causes a huge temperature drop though.
My Cinebench R23.2 result seems low(?) anyway: 24473 (DDR4 at 3733 MHz), no matter whether I use the core offset or defaults. But there has been no WHEA and no BSOD since the F11j BIOS. F11j/k/l/m/n/test/p on B550 Aorus Master.
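
A quick sanity check on the numbers above, as a hedged sketch: if the dynamic offset were simply added to whatever voltage the CPU requests (an assumption; real boost behaviour and sensor readings are more complicated), the 1.44-1.50 V idle readings reported earlier would shift as shown below. Values are kept in millivolts to avoid floating-point noise, and the function name is hypothetical.

```python
def apply_offset_mv(low_mv: int, high_mv: int, offset_mv: int) -> tuple:
    """Shift a voltage range by a (usually negative) offset, all in millivolts."""
    return low_mv + offset_mv, high_mv + offset_mv


# Idle range reported in HWiNFO before any offset: 1.44-1.50 V.
STOCK_IDLE_MV = (1440, 1500)

# Offsets mentioned in the post above (assumed to apply linearly).
for offset_mv in (-200, -275, -281, -300):
    low, high = apply_offset_mv(*STOCK_IDLE_MV, offset_mv)
    print(f"offset {offset_mv / 1000:+.3f} V -> expected idle "
          f"{low / 1000:.3f}-{high / 1000:.3f} V")
```

With -0.275 V this gives 1.165-1.225 V, which brackets the 1.19 V idle reading from HWiNFO quoted above.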


----------



## Redwoodz

WinterActual said:


> If it was software problem (****ty agesa), why me and other people who went through RMA have zero issues with our new cpus?


Generally, a CPU that runs fine under load but reboots at idle is not faulty; it is almost always a software bug.


----------



## o1dschoo1

C states have always caused issues with overclocking.. hasn't it been the standard for the past 20 years to disable speedstep and c states?


----------



## Deepcuts

o1dschoo1 said:


> C states have always caused issues with overclocking.. hasn't it been the standard for the past 20 years to disable speedstep and c states?


With Intel yes, with AMD not so much.
C-states actually help Ryzen 2 and 3 reach higher boost clocks.
Won't comment on speedstep on an AMD CPU


----------



## trojan92

My setup (3800X, B450 Gaming Pro Carbon AC, 2x8 GB Viper Steel at 3600 MHz CL17, Corsair AX860w, D15s) was upgraded with a 5900X when the new beta BIOS came out. Everything was 100% stable before; the PC even booted and worked the same after the BIOS update but before the 5900X installation. Other B450 owners on Reddit and I have experienced a problem where the PC boots fine and works normally but black-screens under load. The power button on the case becomes unresponsive, and the only way to bring it back to life is to switch the PSU off for a few minutes and then turn it on again. I turned Core Performance Boost off, and everything that caused a black screen before now runs fine. I'm still hoping it's just a buggy BIOS.


----------



## Gebeleisis

I have an Asus hero viii x570 wifi
I had the 3950x and upgraded to a 5950x

I am on the latest bios and always have been on the latest available. 

I have had zero issues with it. All is working fine.


----------



## CubanB

It's possible that it's both a hardware and software problem. If you think of the silicon quality and the AGESA combined.. it could be that with the current problems, it only works on well binned chips. But on poor binned chips (whether it be the IO Die or the cores etc), it becomes more fussy. And that in future, those same chips will be more stable with newer AGESA versions. While the well binned ones won't notice a difference and are already fine. When I say binned.. I just mean good silicon vs bad silicon.

There's also an issue with having a PCIE4.0 GPU or not. And there's also an issue with having C states on or off. And who knows what other random factors.. that might be contributing to it.


----------



## Deepcuts

After one week with the new Ryzen 5950X I have to say we are getting along just fine.
No hissy fits of any sort. Everything stable. I guess this one is a keeper.
Using F31q BIOS.
A mild (in my opinion) PBO of PPT 225 EDC 125 TDC 155 with a negative curve of 10-15 gives me decent performance and temperatures.


----------



## newls1

So, have we come to any conclusion yet on whether there is a "better" batch code? Newer, older, or what?


----------



## Vorwrath

I think setting the SoC voltage to 1.1 volt instead of 1.0 has either much improved or fixed it for me. Hasn't idle rebooted for a week since I changed that, but I wasn't running into it that often, so it's a little early to be 100% sure.

Weirdly I saw a user on Reddit who had the opposite experience, and manually setting 1.0 instead of 1.1 cured it for him. Wondering if some of these chips are just very picky about the SoC voltage they want.


----------



## Taku123

With ryzen chips is it okay or recommended to change your power plan to high performance or keep it as balanced?


----------



## Imraneo

Vorwrath said:


> I think setting the SoC voltage to 1.1 volt instead of 1.0 has either much improved or fixed it for me. Hasn't idle rebooted for a week since I changed that, but I wasn't running into it that often, so it's a little early to be 100% sure.
> 
> Weirdly I saw a user on Reddit who had the opposite experience, and manually setting 1.0 instead of 1.1 cured it for him. Wondering if some of these chips are just very picky about the SoC voltage they want.


I've been running stably at 1.1 V or 1.05 V for a while now with C-states disabled.
BUT...
All my benchmarks run poorly, with about a 10% performance hit. There are reports that this CPU can boost its best core to 5 GHz. That never happens for me. It idles at 3.6 instead of 3.7 GHz (for strange reasons).
Not what I paid for. I'm really hoping for a proper BIOS fix; otherwise I think an RMA would be the better bet.

Perhaps check your benchmarks (Cinebench/Geekbench), compare them against the results in their databases, and see where your chip stands.


----------



## Vorwrath

Imraneo said:


> I've been running stably at 1.1V or 1.05V for a while now with C-states disabled.
> BUT...
> All my benchmarks run poorly. Takes about 10% performance hit. There are reports that this CPU can boost its best core toll 5Ghz. That's never happens for me. It idles at 3.6 instead of 3.7Ghz (for strange reasons).
> Not what I paid for. Really hoping for a proper BIOS fix, or I think RMA would be the better bet.
> 
> Perhaps check your benchmarks.. Cinebench/Geekbench and compare against the one on their DB and see where your chip stands.


Mine seems to score as expected. I'm not sure everyone here has the same problem though, think maybe there are some BIOS/AGESA related issues going on, and some actual bad hardware out there as well. I turned C states back on, since I haven't seen the issue since changing the SoC voltage.


----------



## newls1

I have a batch code 2047PGS 5950X coming. Does anyone have data on issues with that batch?


----------



## newls1

New 3101 beta BIOS for us Crosshair Dark users. Has anyone tried it yet?

Version 3101 Beta Version
2020/12/25 20.38 MBytes
ROG CROSSHAIR VIII DARK HERO BIOS 3101
Improved system compatibility
Updated AGESA code to ComboV2PI 1190
Updated graphical firmware
Improved RAID function
Improved system performance


----------



## Deepcuts

newls1 said:


> I have a batch code 2047PGS 5950x coming, anyone have date with issues?


2037 bad
2047 good


----------



## Midian

Mine is 2046 zero issues since Windows reinstall on 2 dec.


----------



## aa.delite

Batch 2044SUS neutral. BIOS update fixed WHEA reboots. But I don't trust it. Problems may be back later and I have only 1 year OEM warranty.


Midian said:


> Mine is 2046 zero issues since Windows reinstall on 2 dec.


Were there issues before Windows reinstall?


----------



## Hueristic

aa.delite said:


> Were there issues before Windows reinstall?


Generally re-installs are not done for the fun of it.


----------



## newls1

Deepcuts said:


> 2037 bad
> 2047 good


 thanks


----------



## Midian

aa.delite said:


> Batch 2044SUS neutral. BIOS update fixed WHEA reboots. But I don't trust it. Problems may be back later and I have only 1 year OEM warranty.
> 
> Were there issues before Windows reinstall?


Yes I had one WHEA error when I started Cinebench R23 one time and even though that error didn't return (and games and other benchmarks worked just fine including Cinebench R23) I opted for a reinstall of Windows which I planned on doing anyway since I had just changed from a 3950X to 5950X.


----------



## WinterActual

Deepcuts said:


> 2037 bad
> 2047 good


Apparently it doesn't matter. My faulty cpu was 2043, the new one is 2037 and it works perfectly fine.


----------



## kairi_zeroblade

WinterActual said:


> Apparently it doesn't matter. My faulty cpu was 2043, the new one is 2037 and it works perfectly fine.


hmm..in general you say the whole batch 2043 is bad?? I have mine on 2043 and seems to be good..just the rams suck and board maybe lackluster..


----------



## WinterActual

No, I am not saying the whole 43 batch is bad, I am just saying that it's random, since tons of people don't have any issues with CPUs from all current production batches.
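
To make the "it's random" point concrete, here is a rough tally of the batch weeks reported in the posts above. It is only a sketch: the good/bad/neutral labels are my reading of each poster's own words, and seven data points prove nothing statistically.

```python
from collections import defaultdict

# (poster, batch week as written in the thread, outcome as the poster described it)
REPORTS = [
    ("Deepcuts", "2037", "bad"),
    ("Deepcuts", "2047", "good"),
    ("Midian", "2046", "good"),
    ("aa.delite", "2044", "neutral"),
    ("WinterActual", "2043", "bad"),
    ("WinterActual", "2037", "good"),
    ("kairi_zeroblade", "2043", "good"),
]


def tally_by_week(reports):
    """Count outcomes per batch week."""
    counts = defaultdict(lambda: defaultdict(int))
    for _poster, week, outcome in reports:
        counts[week][outcome] += 1
    return {week: dict(outcomes) for week, outcomes in counts.items()}


def mixed_weeks(reports):
    """Return the weeks that have at least one good and one bad report."""
    tally = tally_by_week(reports)
    return sorted(w for w, o in tally.items() if "good" in o and "bad" in o)


if __name__ == "__main__":
    for week, outcomes in sorted(tally_by_week(REPORTS).items()):
        print(week, outcomes)
    print("weeks with both good and bad reports:", mixed_weeks(REPORTS))
```

Both 2037 and 2043 show up with a good and a bad chip, which is consistent with the week alone not predicting anything.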


----------



## newls1

Does anyone who had prior issues have any feedback on the new 3101 BIOS for the ASUS Dark yet? I'm having high hopes here, guys. I get my 5950X in a couple of days and will flash to this BIOS as soon as I have the CPU in hand.


----------



## arvu

I just got my 5950x from batch 2046SUS replaced with a new one from batch 2047SUS. New CPU seems to boot windows with default BIOS settings unlike the first one. Now I have run only couple of stress tests on the new CPU, but so far it has been without any WHEAs or other issues.


----------



## newls1

arvu said:


> I just got my 5950x from batch 2046SUS replaced with a new one from batch 2047SUS. New CPU seems to boot windows with default BIOS settings unlike the first one. Now I have run only couple of stress tests on the new CPU, but so far it has been without any WHEAs or other issues.


what board and bios?


----------



## machine038

newls1 said:


> Anyone have any feedback using new bios for the asus dark 3101


You should flash to the latest stable first, if you have any issues, then try the beta version.
Here are some links with people discussing this 3101 version.





Asus Crosshair VIII 3101 Beta Bios out ! (rog.asus.com)

X570 Beta BIOS Update (12/25/2020). 1. Improved System Compatibility 2. Update AMD AGESA ComboAM4 V2 PI 1.1.9.0 3. Update Graphical Firmware 4. Improve RAID Function 5. Improve System Performance (nonWifi)...

https://www.reddit.com/r/Amd/comments/kk744y

Results seem mixed overall: some people got good speed improvements, some got slower, and someone else became unstable.


----------



## arvu

newls1 said:


> what board and bios?


I'm now using ASUS TUF Gaming B550M-Plus (Wi-Fi) with BIOS version 1401.


----------



## pSickOpatA

arvu said:


> I'm now using ASUS TUF Gaming B550M-Plus (Wi-Fi) with BIOS version 1401.


With this BIOS, the same board without Wi-Fi, and a 5600X, I can barely use Windows. It reboots all the time, or as soon as I open a browser.

1004 is the one that holds up longest here. It can be one hour or one week.


----------



## Anthraxious

I just joined to say this:
I'm using x570 Aorus Elite.
I switched my 3800X to my new 5800X and had WHEA errors approximately twice a day. Hard to pinpoint it. At the time my BIOS version was F30a. I updated to F30o, which I noticed was out, in hopes it'd help. Now the WHEA errors happened every 30 min. I switched off PBO and CPB because I saw people say it helped, and sure enough, it did. No WHEA but no boost either, which is kinda ****.
I then updated yesterday to F30q, which was the latest, from 2020-12-18. I reset the BIOS, loaded optimized settings and enabled XMP only. Again the WHEA happened. I once again turned the two boosts off and I'm not having errors anymore but still, no boosts.

EDIT: I forgot to mention that the BIOS menu also got sluggish for the first time with latest version. Never had that problem before but there's like half a second delay in commands. Kinda annoying but nothing that makes it unmanageable. Thought I'd mention that too.

I'm waiting for a new BIOS as well but I have started an RMA with AMD and will maybe ask the reseller I got mine from to have it exchanged. There's at least 1y warranty and I can live with no boost for a while, at least until new batches release and stocks get back to normal cause I don't have my old one and can't sit without a CPU for weeks on end.

Nice to see a collective thread tho cause before today I found sporadic, albeit many, threads about this but nobody had my exact combo although the issues are generically the same.

At any rate, I will follow this development. If it's needed: where do I contact Gigabyte to at least have it on record that I'm having issues? The more people report, the more they'd be keen on fixing it instead of sweeping it under the rug, no? I couldn't really find the actual contact mail/page for them...


----------



## Deepcuts

Anthraxious said:


> At any rate, I will follow this development. If it's needed: where do I contact Gigabyte to at least have it on record that I'm having issues? The more people report, the more they'd be keen on fixing it instead of sweeping it under the rug, no? I couldn't really find the actual contact mail/page for them...








GIGABYTE - eSupport
esupport.gigabyte.com


----------



## aa.delite

You may try to increase dynamic CPU Vcore and/or dynamic VCORE SOC voltage using positive offset. +0.05v should be enough to fix reboots.


----------



## arvu

pSickOpatA said:


> With this BIOS, the same board without Wi-Fi, and a 5600X, I can barely use Windows. It reboots all the time, or as soon as I open a browser.
> 
> 1004 is the one that holds up longest here. It can be one hour or one week.


It seems to have very similar symptoms to what my CPU had. You should try to RMA the CPU or return it to the seller. Such an unstable CPU is not acceptable. AMD should fix their quality problems.

If you can't wait for a new CPU, you can try tuning BIOS settings. By disabling some features and doing manual PBO, I was able to run it stable for almost two weeks. Others have been successful in adjusting voltages.


----------



## Manuru

Hey guys.
I'm experiencing unstable behavior on 5900x as well. Noticed that there were a lot of WHEA errors and Windows crashed once during browsing.
I disabled XMP and used my MSI B550 Gaming Edge timings preset: 3600 Mhz ram, 1800 Mhz FCLK, 1.4V.

There're no errors for now, but I still get WHEA log events in the operational journal with ids 42 (no meaningful description) and 5 (reports amount of error sources).
Error journal is empty. Need more time to test stability, but it's still not OK, isn't it?

Also, I'll try different RAM. If it won't work in XMP too, should I go for RMA?
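
For anyone who wants to pull those WHEA events out of the logs instead of clicking through Event Viewer, here is a minimal sketch that builds a `wevtutil qe` command for the two Kernel-WHEA channels. The channel names are what I understand the "operational" and "error" journals to be called; that mapping is an assumption, so confirm it with `wevtutil el` on your own machine. The helper name `build_query` is made up for this example.

```python
import subprocess
import sys

# Assumed channel names for the two journals mentioned above; verify with `wevtutil el`.
CHANNELS = {
    "operational": "Microsoft-Windows-Kernel-WHEA/Operational",
    "errors": "Microsoft-Windows-Kernel-WHEA/Errors",
}


def build_query(channel, count=20):
    """Build a wevtutil command that reads the newest `count` events as text."""
    return [
        "wevtutil",
        "qe",                # query events
        CHANNELS[channel],
        f"/c:{count}",       # maximum number of events to return
        "/rd:true",          # read newest events first
        "/f:text",           # plain-text output
    ]


if __name__ == "__main__":
    cmd = build_query("errors", count=10)
    print(" ".join(cmd))
    # wevtutil only exists on Windows, so only run it there.
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout or result.stderr)
```

An empty error channel with entries only in the operational one would match what is described above, but what those operational IDs mean is not something this sketch can tell you.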


----------



## dtm-be

Hi guys,

I'm running a 5950X in a Asus crosshair VIII WiFi with the 3101 bios, no oc but with DOCP on 3600mhz (this makes no difference in stability for me). This bios has been the most stable for me but still gives WHEA errors multiple times a day. If it registers a code, it's always: LiveKernelevent Code: 124. (No idea what it means) 
I even once got an overheating error (using corsair 360mm AIO). 

I'm probably going to ask for an RMA as well... 

If anyone has any tips or solution please let me know!


----------



## Deepcuts

Manuru said:


> Hey guys.
> I'm experiencing unstable behavior on 5900x as well. Noticed that there were a lot of WHEA errors and Windows crashed once during browsing.
> I disabled XMP and used my MSI B550 Gaming Edge timings preset: 3600 Mhz ram, 1800 Mhz FCLK, 1.4V.
> 
> There're no errors for now, but I still get WHEA log events in the operational journal with ids 42 (no meaningful description) and 5 (reports amount of error sources).
> Error journal is empty. Need more time to test stability, but it's still not OK, isn't it?
> 
> Also, I'll try different RAM. If it won't work in XMP too, should I go for RMA?


Do you have these issues at stock? 
As far as I can tell, your issue is with overclocking RAM. 
This topic is about issues with the Ryzen 5000 series at stock. 
Please keep on topic.


----------



## Deepcuts

dtm-be said:


> Hi guys,
> 
> I'm running a 5950X in a Asus crosshair VIII WiFi with the 3101 bios, no oc but with DOCP on 3600mhz (this makes no difference in stability for me). This bios has been the most stable for me but still gives WHEA errors multiple times a day. If it registers a code, it's always: LiveKernelevent Code: 124. (No idea what it means)
> I even once got an overheating error (using corsair 360mm AIO).
> 
> I'm probably going to ask for an RMA as well...
> 
> If anyone has any tips or solution please let me know!


I have seen such errors in overheating setups.
Check all your sensors and make sure the CPU cooler is seated properly.


----------



## Anulu

I just put the 5800X (2049/00036) on my ASRock X370 Fatality mITX with IF @ 1933 MHz, 16-17-16-36-1T, and it doesn't show any errors at the moment.
The 5950X (2049/00378) produced WHEA BusInterconnect errors at every IF/RAM 1:1 setting over 3200, even with lazy timings, but never a BSOD or reboot.

Not sure if I should RMA the 5950X. I couldn't really test it: because of the weak VRM I had to stay within a safe power limit.
Maybe I should buy a new mITX board and retest both CPUs with a newer AGESA, since AMD doesn't want ASRock to bring more updates for the X370.
It should be no problem to sell the Fatality with my old 3700X and some of the trash-bin B-die I don't use.

Edit:
Just got the errors again in BF V at IF 1800, even more errors than at IF 1933. Need AGESA patch D or a new board.
Does anyone know a good mITX board with good BIOS support?
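
Since IF/FCLK numbers and RAM speeds get mixed freely in this thread, here is a small sketch of the 1:1 arithmetic: in 1:1 mode the fabric clock equals the memory clock, which is half the DDR transfer rate. The function names and the 1 MHz rounding tolerance are my own choices for illustration.

```python
def fclk_for_ddr_rate(ddr_rate):
    """Fabric clock (MHz) for 1:1 mode: half the DDR transfer rate."""
    return ddr_rate / 2


def ddr_rate_for_fclk(fclk):
    """DDR transfer rate that pairs 1:1 with a given fabric clock (MHz)."""
    return fclk * 2


def is_one_to_one(ddr_rate, fclk, tolerance_mhz=1.0):
    """True if the pair is 1:1, allowing for BIOS rounding of odd values."""
    return abs(fclk_for_ddr_rate(ddr_rate) - fclk) <= tolerance_mhz


if __name__ == "__main__":
    for rate in (3200, 3600):
        print(f"DDR4-{rate} -> FCLK {fclk_for_ddr_rate(rate):.0f} MHz")
    print(f"IF 1933 -> DDR4-{ddr_rate_for_fclk(1933)}")
```

So "1:1 over 3200" means any fabric clock above 1600 MHz, and IF 1933 corresponds to running the RAM at 3866.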


----------



## pSickOpatA

arvu said:


> It seems to have very similar symptoms to what my CPU had. You should try to RMA the CPU or return it to the seller. Such an unstable CPU is not acceptable. AMD should fix their quality problems.
> 
> If you can't wait for a new CPU, you can try tuning BIOS settings. By disabling some features and doing manual PBO, I was able to run it stable for almost two weeks. Others have been successful in adjusting voltages.


I don't want to RMA, at least not for now. I tried some things in the BIOS but nothing works consistently. I will keep trying some more of the tweaks that I've read here.

For two days straight I spent 5+ hours gaming on Warzone with zero reboots, but if I open any browser I get an instant reboot. I can't navigate at all. Doing other stuff in Windows is OK too.

Another thing that seems to trigger it is opening Steam or some random game there.


----------



## MoW

Everyone with a Ryzen 5xxx that BSODs at BIOS defaults should initiate the RMA process, instead of waiting for a hopeful AGESA that will cure it or wasting time tinkering with BIOS settings to make it stable. 

It's pretty clear by now that some chips are of poor/bad silicon quality. 
Hold AMD accountable for it.


----------



## aa.delite

pSickOpatA said:


> but if I open any browser I get an instant reboot. I can't navigate at all.


Try to increase dynamic VCORE SOC voltage using a positive offset. +0.05v should be enough to fix reboots.
If that doesn't help, try to increase dynamic CPU Vcore using a positive offset of +0.05v.
Then wait for a good moment to RMA.


----------



## Anthraxious

Deepcuts said:


> GIGABYTE - eSupport
> esupport.gigabyte.com


Thanks mate



MoW said:


> Everyone with a Ryzen 5xxx that BSODs at BIOS defaults should initiate the RMA process, instead of waiting for a hopeful AGESA that will cure it or wasting time tinkering with BIOS settings to make it stable.
> 
> It's pretty clear by now that some chips are of poor/bad silicon quality.
> Hold AMD accountable for it.


If there were any CPUs to change to I would but nobody wants to sit without a CPU for weeks, that's why people wait with the RMA (Also AMD could say that the CPU is fine and send it back meaning it's the Motherboards fault). When the market recovers a bit and there are more of them out there, I'll go back and change it. Warranty is minimum 1 year anyway.


----------



## kairi_zeroblade

WinterActual said:


> No, I am not saying the whole 43 batch is bad, I am just saying that it's random, since tons of people don't have any issues with CPUs from all current production batches.


ohh..well, TBH, I never had issues since I built on this Ryzen 5800x platform I am using..I even myself updated the board so it can post via usb flashback tool..and since it booted I never had a single issue..(with either my B-die kit and my CJR/DJR kit..or the mentioned idle or sleep bugs)..yesterday a new bios for my board is out and after flashing it..still had no issues to report..left it all night sleeping and when I woke up and wake it from sleep still had no issues on doing so..so I don't really know where people are getting the issues..I myself, has been waiting when I will get 1 too..


----------



## dtm-be

Deepcuts said:


> I have seen such errors in overheating setups.
> Check all your sensors and make sure the CPU cooler is seated properly.


Will do that as soon as my thermal paste arrives. But I don't think it's going to solve anything. When I run Cinebench R23 I get a solid 62° C on my CPU and it finishes with an average 5950x score (better than first gen TR, so no thermal throttling I think). But sometimes doing the most basic things it can spike up to 85 degrees using only 4 threads. My WHEA crashes also occur at really random moments so I don't think it's temperature related.


----------



## slvr

(MSI MAG Tomahawk X570, 5950X, memory is 4x16gb 3600 CL16 GSkill Ripjaws V)
Having these problems (WHEA errors, constant reboots when idle), dug the entire web for the solution and ended up with this:


BIOS v151 beta
Disabled Global C State
Curve optimizer +2 all core
XMP enabled
The rest is stock, i.e. it boosts all core 3.8, single core 4.9

no crashes so far. Finally, I am able to use my PC after three days of non-stop investigation.
Waiting for a new BIOS with AGESA 1.1.9.0 ...


----------



## nevcairiel

Anthraxious said:


> If there were any CPUs to change to I would but nobody wants to sit without a CPU for weeks, that's why people wait with the RMA (Also AMD could say that the CPU is fine and send it back meaning it's the Motherboards fault). When the market recovers a bit and there are more of them out there, I'll go back and change it. Warranty is minimum 1 year anyway.


AMD CPUs are easily available here now. A bit overpriced still (which doesn't matter for RMA), but in stock at various vendors. So I don't think thats a huge problem anymore.


----------



## newls1

Not in the States, my friend... what is available is from the scalpers, and you're paying $1200 for a 5950X etc....


----------



## nperpublic

I had the problem (whea when gaming) with my 5800x. I RMAed and got a new one. That fixed it, with everything set to the same.


----------



## Anthos

I ll try to make a long story short. (5950x, dark hero)
-Had the system for a week on an old windows installation, fine for that period of time
-decided to make a fresh install on Sunday
-started getting multiple WHEA reboots, almost all of them at idle (or simple use like browsing etc), only once during gaming I think
-yesterday flashed to the latest beta bios and disabled c-states only, (had DOCP enabled) etc
-left the pc continuous working since (~24h) and most of the time has been idle with the occassional game last night.
-no WHEA errors since


----------



## GRABibus

Anthos said:


> I ll try to make a long story short. (5950x, dark hero)
> -Had the system for a week on an old windows installation, fine for that period of time
> -decided to make a fresh install on Sunday
> -started getting multiple WHEA reboots, almost all of them at idle (or simple use like browsing etc), only once during gaming I think
> -yesterday flashed to the latest beta bios and disabled c-states only, (had DOCP enabled) etc
> -left the pc continuous working since (~24h) and most of the time has been idle with the occassional game last night.
> -no WHEA errors since


Beside bios, bad silicon, etc....Could current windows updates be also the culprit ?


----------



## Anthos

GRABibus said:


> Beside bios, bad silicon, etc....Could current windows updates be also the culprit ?


God knows. Both of my installations were on the latest half-year update, but I found it very strange that the old Windows installation, which previously ran an Intel CPU with DDR2 memory, didn't give any crashes, yet I got them on a fresh install. Maybe because the installation was a few years old there was a renegade process that just kept the cores from idling as much? Not.. a.. single.. idea... I found it very illogical and doubt it's just a coincidence. I was going to keep the old installation around, and it would have proven really handy in investigating this situation, but alas, when creating the new installation I chose the wrong disk because I was distracted. Couldn't recover the data. Insult to injury.


----------



## thunk_stuff

I posted this on the AMD forum thread, but thought I'd include it here. Updated my vote to say "replacement CPU fixed".

*RMA UPDATE:* I received my replacement 5900X from AMD yesterday and it looks to be stable. The difference is like night and day. Only one day of testing may be too early to tell, but it's a big improvement from the old chip which crashed right after I first installed it. Below is a summary of everything.

*SYMPTOMS:* Received 5900X beginning of December that within minutes of installing for the first time had bluescreen WHEA crashes, usually cache hierarchy error. Normally crashed at idle or doing a light task like opening an app or web browsing, most often within a minute of Windows loaded. Sometimes was stable long enough to do a benchmark, but never got beyond 5 minutes after Windows loaded before crashing. Does not crash in BIOS.

*THINGS I TRIED:*

All BIOS options set to default (no overclock)
Two different sets of RAM (G.SKILL TridentZ RGB Series, G.SKILL Ripjaws V Series)
XMP off (2133Mhz)
XMP 3000, 3200, etc
FCLK 1500, 1600, etc
All compatible BIOS versions (ASRock B550 Phantom ITX). Latest AGESA I could test: 1.1.0.0 Patch C.

*WORK AROUND:* I could make my system stable by either turning off core performance boost (CPB), setting power profile to "eco", or setting all core magnitude in curve optimizer to +8 (overvolting). I ran +8 stable for two weeks without any crashes. I preferred this work around because it did not affect performance as much as turning off CPB or using eco mode.

*RMA PROCESS WITH AMD (total 4 weeks): *

Dec 1: Submitted RMA to AMD
Dec 14: Received response asking for proof of ownership and purchase. I provided same day.
Dec 15: Accepted for return
Dec 16: I shipped via AMD's provided FedEx ground shipping label
Dec 21: AMD received
Dec 22: Approved for replacement
Dec 28: Received replacement

*RMA RESULT: *My replacement 5900X works great. It is like night and day. At default settings, I am able to set FCLK to 1867Mhz and memory to 3733Mhz, and _undervolt_ CPU to -15 in curve optimizer. Previously, I had to _overvolt_ CPU +8 to make it stable. I am seeing 5-10% better performance in benchmarks and at lower voltage. And this is running AGESA 1.1.0.0 Patch C, which is still immature for this chip and more can be expected from it with AGESA 1.1.8.0 and later.
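
For anyone weighing whether to RMA, the timeline above can be broken down into waiting time per step. This is just date arithmetic on the dates listed in the post; the year 2020 is assumed from the context of the thread.

```python
from datetime import date

# Steps as listed in the post; the year 2020 is an assumption from thread context.
STEPS = [
    (date(2020, 12, 1), "Submitted RMA to AMD"),
    (date(2020, 12, 14), "AMD asked for proof of ownership and purchase"),
    (date(2020, 12, 15), "Accepted for return"),
    (date(2020, 12, 16), "Shipped with AMD's FedEx label"),
    (date(2020, 12, 21), "AMD received"),
    (date(2020, 12, 22), "Approved for replacement"),
    (date(2020, 12, 28), "Received replacement"),
]


def days_between_steps(steps):
    """Return (step label, days waited since the previous step) pairs."""
    return [
        (label, (day - prev_day).days)
        for (prev_day, _), (day, label) in zip(steps, steps[1:])
    ]


def total_days(steps):
    """Days from the first step to the last."""
    return (steps[-1][0] - steps[0][0]).days


if __name__ == "__main__":
    for label, waited in days_between_steps(STEPS):
        print(f"{waited:>2} days -> {label}")
    print("total:", total_days(STEPS), "days")
```

The total is 27 days, and almost half of it (13 days) is the wait for AMD's first response.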


----------



## GRABibus

thunk_stuff said:


> I posted this on the AMD forum thread, but thought I'd include it here. Updated my vote to say "replacement CPU fixed".
> 
> *RMA UPDATE:* I received my replacement 5900X from AMD yesterday and it looks to be stable. The difference is like night and day. Only one day of testing may be too early to tell, but it's a big improvement from the old chip which crashed right after I first installed it. Below is a summary of everything.
> 
> *SYMPTOMS:* Received 5900X beginning of December that within minutes of installing for the first time had bluescreen WHEA crashes, usually cache hierarchy error. Normally crashed at idle or doing a light task like opening an app or web browsing, most often within a minute of Windows loaded. Sometimes was stable long enough to do a benchmark, but never got beyond 5 minutes after Windows loaded before crashing. Does not crash in BIOS.
> 
> *THINGS I TRIED:*
> 
> All BIOS options set to default (no overclock)
> Two different sets of RAM (G.SKILL TridentZ RGB Series, G.SKILL Ripjaws V Series)
> XMP off (2133Mhz)
> XMP 3000, 3200, etc
> FCLK 1500, 1600, etc
> All compatible BIOS versions (ASRock B550 Phantom ITX). Latest AGESA I could test: 1.1.0.0 Patch C.
> 
> *WORK AROUND:* I could make my system stable by either turning off core performance boost (CPB), setting power profile to "eco", or setting all core magnitude in curve optimizer to +8 (overvolting). I ran +8 stable for two weeks without any crashes. I preferred this work around because it did not affect performance as much as turning off CPB or using eco mode.
> 
> *RMA PROCESS WITH AMD (total 4 weeks): *
> 
> Dec 1: Submitted RMA to AMD
> Dec 14: Received response asking for proof of ownership and purchase. I provided same day.
> Dec 15: Accepted for return
> Dec 16: I shipped via AMD's provided FedEx ground shipping label
> Dec 21: AMD received
> Dec 22: Approved for replacement
> Dec 28: Received replacement
> 
> *RMA RESULT: *My replacement 5900X works great. It is like night and day. At default settings, I am able to set FCLK to 1867Mhz and memory to 3733Mhz, and _undervolt_ CPU to -15 in curve optimizer. Previously, I had to _overvolt_ CPU +8 to make it stable. I am seeing 5-10% better performance in benchmarks and at lower voltage. And this is running AGESA 1.1.0.0 Patch C, which is still immature for this chip and more can be expected from it with AGESA 1.1.8.0 and later.


Even if it seems to occur on random batches, can you provide both date codes for your CPU’s ?


----------



## thunk_stuff

GRABibus said:


> Even if it seems to occur on random batches, can you provide both date codes for your CPU’s ?


Good point. The bad chip was BG 2043PGS.

Unfortunately I didn't make note of the new chip date before I installed it. Looks like I can't get this from the box.


----------



## pSickOpatA

aa.delite said:


> Try to increase dynamic VCORE SOC voltage using a positive offset. +0.05v should be enough to fix reboots.
> If that doesn't help, try to increase dynamic CPU Vcore using a positive offset of +0.05v.
> Then wait for a good moment to RMA.


Didn't work either. I don't even have enough time to reply to you here before it reboots. Had to use my phone.

I don't know what to do anymore. I think I'm going to send it for RMA and wait at least 30 days to resolve this. 

Edit: apparently I can use Firefox... but chrome/brave/edge its impossible to navigate...

Edit 2: No I cant. Just saw that Asus released a new beta BIOS for my board. 1601

TUF GAMING B550M-PLUS BIOS 1601
Update AMD AM4 AGESA V2 PI 1.1.9.0.

Will try this one.


----------



## Mandarb

No BSODs or WHEA errors, but I had issues with USB devices - namely my Elgato Cam Link freezing when accessed by MS Teams or when the CPU was under load in OBS. Restricting PCIe to Gen 3 fixed it, new ASUS beta BIOS for my X570-E Gaming fixed all issues with PCIe Gen 4 including the statics in the audio in Cyberpunk 2077 (which persisted under Gen 3). The BIOS Version is 3201 based on AGESA V2 PI 1.1.9.0., might give it a try if a BIOS with this AGESA version pops up for your motherboard.


----------



## newls1

Something I've noticed.... the batch codes of people with (possibly) faulty CPUs all end with "PGS", whereas the chips that end in "SUS" seem perfect...... wonder what those letters mean? Anyways, my 5950X comes in tomorrow and I'm happy and nervous at the same time, as I've been following this thread for weeks now and hope I don't suffer from the issues.... My CPU has a batch code of 2047PGS. Please wish me luck.


----------



## yaniv82

thunk_stuff said:


> Good point. The bad chip was BG 2043PGS.
> 
> Unfortunately I didn't make note of the new chip date before I installed it. Looks like I can't get this from the box.


I have a 5950x that reboots at idle, same batch, BG 2043PGS. Already sent to RMA and waiting to hear from AMD


----------



## arvu

newls1 said:


> Something I've noticed.... the batch codes of people with (possibly) faulty CPUs all end with "PGS", whereas the chips that end in "SUS" seem perfect...... wonder what those letters mean? Anyways, my 5950X comes in tomorrow and I'm happy and nervous at the same time, as I've been following this thread for weeks now and hope I don't suffer from the issues.... My CPU has a batch code of 2047PGS. Please wish me luck.


I had a faulty chip that had batch number ending with "SUS". It's code for production location.
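
A quick sketch for splitting the codes posted in this thread into their parts and checking the PGS/SUS theory against the bad chips reported so far. Reading the four digits as a two-digit year plus week number is an assumption (a common date-code convention, not something anyone here has confirmed), and the `BatchCode` type is made up for this example.

```python
import re
from typing import NamedTuple


class BatchCode(NamedTuple):
    year: int   # assumed: first two digits are the year
    week: int   # assumed: last two digits are the week of that year
    site: str   # three-letter suffix, said above to be the production location


# Optional two-letter prefix (e.g. "BG"), four digits, optional three-letter suffix.
_PATTERN = re.compile(r"^(?:[A-Z]{2}\s+)?(\d{2})(\d{2})([A-Z]{3})?$")


def parse_batch_code(code):
    """Parse codes such as '2047PGS', 'BG 2043PGS' or a bare '2037'."""
    match = _PATTERN.match(code.strip().upper())
    if not match:
        raise ValueError(f"unrecognised batch code: {code!r}")
    yy, ww, site = match.groups()
    return BatchCode(2000 + int(yy), int(ww), site or "")


# Chips described as faulty in the posts above, with the suffix given.
BAD_CODES = ["BG 2043PGS", "BG 2043PGS", "2046SUS"]


def sites_with_bad_chips(codes):
    """Set of suffixes that appear on at least one chip reported as bad."""
    return {parse_batch_code(code).site for code in codes}


if __name__ == "__main__":
    print(parse_batch_code("2047PGS"))
    print(sorted(sites_with_bad_chips(BAD_CODES)))
```

Both suffixes appear among the bad chips, so the suffix alone does not look like a reliable predictor.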


----------



## Imraneo

Just wanna chime in here. Mines 2043SUS. It's a "bad" chip.
Yet to try the new BIOS release by Asus.

And yea.. like one bro mentioned. The Asus team is taking the lead here in the polls 😁


----------



## Hueristic

W0w, never seen so many new members with the same cpu problems, this must be a record.


----------



## aa.delite

Hueristic said:


> W0w, never seen so many new members with the same cpu problems, this must be a record.


And... silence from AMD.


----------



## DemonAk

Same reboots/bsod at idle with stock/defaults settings

ryzen 5950x (Batch BG* 2044SUS*), MB: B550 Taichi (bios 1.70, agesa 1.1.0.0 patch D)

Can't replace right now because cpu out of stock in store =\ and can't RMA because OEM

a temporary solution that helps me is *disable global c-state in bios*


----------



## aa.delite

DemonAk said:


> ryzen 5950x (Batch BG* 2044SUS*)


My OEM one is also 2044SUS, but on a Gigabyte motherboard. There were reboots until the F11j beta BIOS. The later F11k/l/m/n/o/p work well at default settings. But I can't trust this chip anymore and want to replace it. I can't yet, though: it works well and CPUs are out of stock in the store.
Maybe you can fix the reboots by increasing SOC voltage by +0.05v, or by using Curve Optimizer +5 on all cores. But that's a temporary solution; you need to replace the CPU, I guess.


----------



## BluePaint

DemonAk said:


> a temporary solution that helps me is *disable global c-state in bios*


That's interesting, because that can also lower single-core boost clocks, and the 5950X has the highest of those by default.


----------



## Deepcuts

AMD "finally" approved my RMA. Five weeks after I opened the ticket and one week after I already sent the CPU to the shop. So nice of them.
Add two more weeks prior to opening the ticket because I thought Gigabyte was to blame. Sorry G.
Maybe two to five weeks to get a new CPU back from AMD. Let's make it five weeks to be on the safe side because that is how long it took for them to even accept the RMA.
That is twelve weeks/three months.
And I feel guilty when any problem a client has is not solved the same day...
All this without taking into account the amount of time lost trying to debug this and the amount of time lost not working on what I should have with this computer. Should I factor in the money for the 2nd CPU I had to buy? How about the ungodly amounts of coffee required to keep me awake throughout testing?
What I am trying to say is: that is a lot of damage.
And the worst part is: all I want from AMD is a public apology and explanation, which I am 99.9% sure will never happen.
So I can only dream that everyone involved in this mess got only a lump of coal for Xmas. But who am I kidding. They got my money.


----------



## Makino

I managed to get a 5900x going from a 3900x fully stable @stock. At first i was getting a lot of whea errors and random BSOD, now its stable but i had to change my ram voltage from 1.35V to 1.41V.

My mobo is a Gigabyte x570 Aorus Elite. The Ram model is a GSkill f4-3600c16d-16gtzr


----------



## villason

New BIOS for the CH7 NON-WIFI:

*Version 4101 Beta Version*
2020/12/29 15.42 MBytes
ROG CROSSHAIR VII HERO BIOS 4101
1. Improve system performance
2. Improve CrashFree function
3. Update AMD AM4 AGESA V2 PI 1.1.9.0. 



https://dlcdnets.asus.com/pub/ASUS/mb/SocketAM4/ROG_CROSSHAIR-VII-HERO/ROG-CROSSHAIR-VII-HERO-ASUS-4101.ZIP



Use it at your own risk.


----------



## Taku123

there was a new bios update for x570 strix-i i downloaded and installed, will do further testing with the 5950x but already have an RMA in place.


----------



## luckyjj10

Just signed up to chime in and say my 5800x is quite unstable as well in various ways.
From first asus tuf 570 bios it came with, 3001 and 3201 all problems at stock+xmp (just 3200 ram). no xmp seems less crashy but I haven't waited long enough. memtests on the ram itself are good.
I'm getting by with manual SOC (1.1v atm) related voltages for the stock crash and 1.4v ram to maybe help. Along with voltage offset. (pbo enabled in any way makes my benchmarks even worse since it just throttles 90c more/sooner). Without the soc voltage: if I start prime95 it will run for xxx? time, but as soon as I close prime95 I get instant WHEA bluescreens pretty consistently when the load STOPS.

The cpu itself can't do any negative curve optimizer or +mhz in pbo without even more crash city. I laugh at people doing like +200mhz or -20 curve when -5 or +25mhz makes me crash harder.

I'd love to get it replaced if I could but I don't know which is more of an ass pain for me right now. I literally only upgraded from my old i7 2600k because my ancient mobo/ram was crashing (with bad memtest results), and I pay such a big premium just to keep crashing


----------



## frollic

frollic said:


> Requested a RMA with AMD 3 days ago.


Got a reply from AMD today, RMA accepted, shipping label provided.


----------



## hisXLNC

could it be the latest windows update? i had a problem with my intel 5930k which would give me bsod on boot with the latest update. i see someone here say this started only with a fresh install of windows?


----------



## frollic

hisXLNC said:


> could it be the latest windows update? i had a problem with my intel 5930k which would give me bsod on boot with the latest update. i see someone here say this started only with a fresh install of windows?


I tried several releases of win10 pro, same issue using all of them.


----------



## GRABibus

hisXLNC said:


> could it be the latest windows update? i had a problem with my intel 5930k which would give me bsod on boot with the latest update. i see someone here say this started only with a fresh install of windows?


me too, with my former 5930k, I had reboots at idle or when browsing but not at heavy loads.
It occurred just after windows update beginning of November.


----------



## pSickOpatA

im done with a hundred tweaks and all bios available and nothing works.
Tried to contact amd, they pass the ball to Asus, and Asus is on a break till january 5th... now going directly from retailer to replace this cpu as soon as possible.


----------



## MoW

Don't put the blame on Windows. It's not Windows. I tried testing with an earlier version, and it still BSODs at stock settings. 
RMA is the way to go. Don't bother asking AMD for leniency. Silence is all you're getting.


----------



## newls1

Guys, I'm pretty stoked (SO FAR 🤞). My 5950X came in a few hours ago (2047PGS). I installed Windows, applied DOCP settings for my 2x16GB 3600 CL14/15/15/35 RAM, and set PBO + Curve Optimizer to all-core NEGATIVE 15. Here is my first Cinebench R20 SINGLE THREAD score, and for MULTI THREAD I got 11720.... I'm 5 minutes into this OC. No WHEA errors (YET). Crossing my fingers I got a good CPU. I did flash to the latest BIOS for my Dark Hero (3101, I think it was, with the newest AGESA 1190). What do you think? Is this OK for a first OC attempt, and maybe a good CPU?


----------



## Deepcuts

Good score but 87-88 Celsius after a CB20 run is way too high.
A Handbrake encode would throttle for sure.
I would say dial back that PBO until CB20 maxes out at ~70-75 Celsius.
Congratulations on getting a good sample!


----------



## iraff1

I still have my problem where i get random whea errors and i noticed now that whenever i do there's "popping sounds" if i playback music while it occurs. This occurs no matter what setting i put the cpu in, and doesn't matter if i "put load" or just let the cpu idle, it happens at complete random. 

I've been using the computer for over a month now and it has never crashed and i've done several of benchmark/tests/renders etc so its 100% stable. I have no idea why these random whea errors come in floods. I haven't looked into RMA because it would take forever to get a new cpu at this point and considering its fully functional (- the now newly discovered popping noise when the whea flood occurs) i'll wait and see if these bios changes can solve it.

Otherwise I think it's pretty clear here: AMD has released the CPU without proper testing, unlike the samples they sent to all the hardware reviewers, and we're getting the brunt of it. I am definitely not a fan of having to spend nearly 2 weeks of head-scratching and testing to figure out AMD has faulty silicon.


----------



## newls1

Deepcuts said:


> Good score but 87-88 Celsius after a CB20 run is way too high.
> A Handbrake encode would throttle for sure.
> I would say dial back that PBO until CB20 maxes out at ~70-75 Celsius.
> Congratulations on getting a good sample!


Yes, agreed. Given the cooling setup I have, I was shocked by those load temps. I see my loaded vcore was 1.320, so I'm hoping to try a few things to bring the vcore down a bit. I'm currently set to LLC4 and -15 on Curve Opt, so I'm hoping to drop to LLC3 and -20 C.O... I'll try that tomorrow when I get home from work. Crazy though that I'm seeing cores boost to 5.2GHz with my current settings... crazy


----------



## nevcairiel

I just got a shipping confirmation from AMD RMA that my replacement is on its way. We'll see next week if everything is working as expected then.


----------



## Deepcuts

The shop I bought my CPU from agreed to reimburse me the price of the 1st CPU without any questions asked. Took one week from delivery to solution. +Karma 

Only from my small circle of friends and acquaintances, I calculated six 5950X (including mine), two 5900X and one 5600X with the same problem. Some of them already sent back to the shop. None of them even considered opening a ticket with AMD (smart people). Also, none of them posted here (bad bad people).
I wouldn't want to be the tech team that has to check all these CPUs at this time of year.


----------



## nevcairiel

I bought from AMDs online shop on launch day, because they were at MSRP and still available (shipped fast too), so I had no other RMA options. But after all the back and forth the replacement is on its way now.

For statistics, I also built two 5800X systems for friends, and those do not show any problems.


----------



## aa.delite

Can you test whether BoostTester causes a reboot on your defective CPU? It may be a fast way to check. It boosts every core one by one.


----------



## Deepcuts

aa.delite said:


> Can you test if BoostTester causes reboot of your defective CPU? Maybe it's the fast way to examine. It boosts every core one by one.
> Note! Executable files may contain viruses. It's clean, but you should not trust anyone, so experienced users only.


No need for Drive and access
https://jedi95.com/files/BoostTester.exe or https://github.com/jedi95/BoostTester/releases





VirusTotal: www.virustotal.com


----------



## aa.delite

Deepcuts said:


> No need for Drive and access


Thx, edited


----------



## felek

I am on F31 (without any letter) now and still have BSOD WHEA 
Gigabyte x570 Gaming X, 5900X
Bios downloaded from GIGABYTE Latest Beta BIOS - TweakTown Forums


----------



## Deepcuts

felek said:


> I am on F31 (without any letter) now and still have BSOD WHEA
> Gigabyte x570 Gaming X, 5900X
> Bios downloaded from GIGABYTE Latest Beta BIOS - TweakTown Forums


Welcome and sorry to hear that.
If you are not stable at stock (no XMP or any other tweaks), do yourself a favor and return it. Don't waste your time debugging.
Until you can send it back, disable Core Performance Boost and use it without boost.


----------



## Anthos

Anthos said:


> I ll try to make a long story short. (5950x, dark hero)
> -Had the system for a week on an old windows installation, fine for that period of time
> -decided to make a fresh install on Sunday
> -started getting multiple WHEA reboots, almost all of them at idle (or simple use like browsing etc), only once during gaming I think
> -yesterday flashed to the latest beta bios and disabled c-states only, (had DOCP enabled) etc
> -left the pc continuous working since (~24h) and most of the time has been idle with the occassional game last night.
> -no WHEA errors since


Just an update to this
Since yesterday evening I re-enabled the c-states, so pretty much had everything as when I started getting the multiple WHEA reboots. 90% idle time since, 10% mixed. Still no errors repeated.
Now the only difference is that I am on a newer bios. Did that fix it? Well no.. it shouldn't, as I was on the stock bios for a week and everything was fine and the errors then started while on that same bios. I honestly don't know, they might still happen, just being really really spaced but still, there was a point that I had multiple within an hour. What changed? 
Anyway, I am not sure whether I am more annoyed with this issue or with AMD keeping complete radio silence. I highly doubt that the only people affected by this are the 100 or so of us gathered here. I can't imagine how many people out there might be trying to build PCs, have them keep crashing, and are not tech savvy enough to understand what's going on or to reach forums like this to make their issue known. I guess AMD's marketing department is only good at overhyping products but probably skipped class when they were teaching things like damage control.
Anyway, as it seems to be stable at the moment I'll try to escalate a bit by enabling PBO etc and see if stability remains or if it forces errors.


----------



## xeizo

Anthos said:


> Just an update to this
> Since yesterday evening I re-enabled the c-states, so pretty much had everything as when I started getting the multiple WHEA reboots. 90% idle time since, 10% mixed. Still no errors repeated.
> Now the only difference is that I am on a newer bios. Did that fix it? Well no.. it shouldn't, as I was on the stock bios for a week and everything was fine and the errors then started while on that same bios. I honestly don't know, they might still happen, just being really really spaced but still, there was a point that I had multiple within an hour. What changed?
> Anyway I am not sure though if I am more annoyed with this issue or with AMD keeping complete radio silence. I highly doubt that only people affected by this it's us 100 whatever people gathered here. I can't imagine how many people out there might be trying to build pcs and have them keep crashing and are not tech savvy enough to understand what's going on or to reach forums like this to make their issue known. I guess AMD's marketing department is only good at overhyping products but probably skipped class when they were teaching things like damage control.
> Anyway, as it seems to be stable at the moment I'll try to escalate a bit by enabling PBO etc and see if stability remains or if it forces errors.


My thoughts as well. This was never a problem with Zen 2 until the Zen 3 BIOSes. Sure, there were occasional WHEA errors, but sudden reboots were very uncommon.

This new AGESA 1.1.9.0 BIOS boosts much lower though, 4950MHz single vs 5150MHz with the same settings. I haven't had any reboots on it using Zen 3, but there are still WHEA errors. With Zen 2 on another rig I've had a sudden reboot even with the new AGESA.

Everything else is stable, guess I'm fortunate, but these WHEA are annoying.


----------



## mongoled

newls1 said:


> yes, agreed. Given the cooling setup i have, i was shocked with those loaded temps. I see my loaded vcore was 1.320, so what im hoping to try a few things to bring down the vcore a bit. Im currently set to LLC4 and -15 on Curve Opt, so im hoping to drop to LLC3 and -20 C.O... Ill try that tomorrow when i get home from work. Crazy tho that im seeing cores boost to 5.2ghz with my current settings... crazy


As you are using PBO with curve optimiser your CPU will boost and hold the boost up until a thermal/current limit is detected.

Your cooling subsystem, paired with your CPU, has found that those load temperatures are optimal for peak CPU performance.

As others have said, you will need to test each core separately to see what tolerance it has to heavy and light loads when using Curve Optimizer.

Otherwise, you can reduce CPU temperature either by imposing PBO limits (reducing PPT/TDC/EDC) or by setting a CPU vCore offset below the default voltage.

Oh and congrats on a nice specimen, 5.2ghz boost sounds awful nice


----------



## Alvy

Got my RMA # from AMD as well \o/. At least things are moving, holidays are pretty much over too. I bought a temporary 3950X a week ago since I couldn't find a 2nd 5950X and it's been working flawlessly on the x570 xtreme with 0 issues.


----------



## newls1

mongoled said:


> As you are using PBO with curve optimiser your CPU will boost and hold the boost up until a thermal/current limit is detected.
> 
> Your cooling subsystem, paired with your CPU, has found that those load temperatures are optimal for peak CPU performance.
> 
> As others have said, you will need to test each core sepearately to see what tolerances it hasto heavy and light loads when using curve optimiser.
> 
> Otherwise you can reduce CPU temperatue is to impose either PBO limits via reducing PPT/TDC/EDC or setting a CPU vCore offset to a lower voltage than its default voltage.
> 
> Oh and congrats on a nice specimen, 5.2ghz boost sounds awful nice


so can i actually set a manual vcore while using PBO? I thought i had to keep that on auto??


----------



## xeizo

Using these settings I haven't had any WHEA as of yet:

VTT = 725mV
VDDP = 950mV
VDDG IOD = 1000mV
VDDG CCD = 1000mV
SOC = 1.05V-1.1V (not Auto)
SB 1.05 = 1.1V
VDIMM = 1.39V (3800MHz)

I also used HWINFO64 to check power consumption for each core during load, and gave the thirsty ones more offset and the good ones less so that now each core draws about as much power under full load.

Fmax is disabled; CPPC, CPPC Preferred Cores and Global C-states are enabled; all else is on Auto.
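
For anyone who wants to repeat the per-core comparison, here is a minimal Python sketch. It assumes you have typed in the per-core power readings by hand (the numbers below are hypothetical, not readings from this system) and it only shows how far each core sits from the average. How a given difference translates into an actual offset is not something the script knows; that part remains trial and error.

```python
def deviation_from_mean(core_watts):
    """Return watts above (+) or below (-) the average for each core."""
    mean = sum(core_watts.values()) / len(core_watts)
    return {core: watts - mean for core, watts in core_watts.items()}


# Hypothetical full-load readings in watts, keyed by core number.
readings = {0: 10.0, 1: 12.0, 2: 8.0, 3: 10.0}

# List the thirstiest cores first.
for core, delta in sorted(deviation_from_mean(readings).items(), key=lambda kv: -kv[1]):
    print(f"core {core}: {delta:+.1f} W vs average")
```

Cores with a positive number are the "thirsty" ones in the sense used above.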


----------



## xeizo

newls1 said:


> so can i actually set a manual vcore while using PBO? I thought i had to keep that on auto??


PBO uses voltage offset as one of the parameters to control boost, that would stop working with a set vcore. I can't see any benefit.


----------



## newls1

xeizo said:


> PBO uses voltage offset as one of the parameters to control boost, that would stop working with a set vcore. I can't see any benefit.


so is that a no? keep my voltage on "auto" then? im trying to reduce my temps some, so im on LLC4 and -15 for curve optimizer. Would dropping to LLC3 and -20 drop my voltage some? Crossing fingers i retain my stability of course. Im at work and cant test this till tomorrow.


----------



## Imraneo

I just refreshed my BIOS page for my Strix X570-F and the latest 3201 is gone! Lol..
Isn't it strange that these BIOS releases across the mobo models for Asus came at a staggered pace and it seems to even be getting withdrawn one at a time? Amusing...


----------



## Catscratch

xeizo said:


> About my thoughts as well, this was never a problem with Zen 2, until with the Zen 3 bioses. Sure, there was the occasional WHEA errors but sudden reboots was very uncommon.
> 
> This new AGESA 1.1.9.0 bios boosts much lower though, 4950MHz single vs 5150MHz same settings. I haven't had any reboot on it using Zen 3, but there is still WHEA errors. With Zen 2 on another rig I've had a sudden reboot even with the new AGESA.
> 
> Everything else is stable, guess I'm fortunate, but these WHEA are annoying.


I guess this thread is a warning for older ZEN owners to stay away from bioses that support Zen3 series. I was wondering about them myself, however I never rush to update bios or any drivers unless they fix a problem I have.

I wish you all good luck and speedy RMA in 2021.


----------



## xeizo

newls1 said:


> so is that a no? keep my voltage on "auto" then? im trying to reduce my temps some, so im on LLC4 and -15 for curve optimizer. Would dropping to LLC3 and -20 drop my voltage some? Crossing fingers i retain my stability of course. Im at work and cant test this till tomorrow.


The easiest way to lower temps is to set PPT/TDC/EDC to lower values when using PBO. You lose a little performance, but it can run a lot cooler.
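
As a rough illustration of what "lower values" could look like, here is a small Python sketch that scales a set of limits by a percentage. The starting numbers are the commonly quoted stock limits for a 105 W TDP AM4 part and are an assumption here; your board may apply different limits once PBO is enabled, so read the actual values from your BIOS or monitoring tool first. The 90% figure is just an example, not a recommendation.

```python
# Assumed stock limits for a 105 W TDP AM4 CPU; verify against your own BIOS.
STOCK_LIMITS = {"PPT_W": 142, "TDC_A": 95, "EDC_A": 140}


def scale_limits(limits, percent):
    """Scale every limit to the given percentage, rounding down to whole units."""
    return {name: value * percent // 100 for name, value in limits.items()}


# Example: try 90% of the assumed stock limits as a starting point.
print(scale_limits(STOCK_LIMITS, 90))
```

Enter the resulting numbers manually in the PBO limits section, then re-test temperatures and scores before going lower.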


----------



## mongoled

newls1 said:


> so can i actually set a manual vcore while using PBO? I thought i had to keep that on auto??


I did not say to set a manual vcore for the CPU, I said to set a CPU vcore offset.

Setting a CPU vcore offset means the CPU will still be on auto but with a higher/lower ceiling, depending on whether you set a positive or negative offset.



xeizo said:


> PBO uses voltage offset as one of the parameters to control boost, that would stop working with a set vcore. I can't see any benefit.


Again, the information you posted above is correct for manually setting vcore to an arbitrary value, but this is not what I said could be done to lower temps.....


----------



## villason

I just installed my 5950X on a ASUS CH7 with AGESA 1.1.9.0. All BIOS defaults except DOCP. Note my RAM is 3200 CL14 so it's officially supported by the CPU, not considered OC by AMD.

FLAWLESS so far. Idle at 35deg and max 67deg during Cinebench R20 run. NH-D15 properly installed (I am sure most people install it wrong).


----------



## xeizo

mongoled said:


> I did not say to set a manual vcore for the CPU, i said to set a a CPU vcore offset
> 
> Setting a CPU vcore offset means the CPU will stll be on auto but with a higher/lower ceiling depending on if you set a positive or negative offset.
> 
> 
> Again, the information you posted above is correct for manually setting vcore to a arbitrary value, but this is not what I said could be done to lower temps.....


But AMD has explicitly said to not use vcore offset with Ryzen 5000.


----------



## hisXLNC

villason said:


> I just installed my 5950X on a ASUS CH7 with AGESA 1.1.9.0. All BIOS defaults except DOCP. Note my RAM is 3200 CL14 so it's officially supported by the CPU, not considered OC by AMD.
> 
> FLAWLESS so far. Idle at 35deg and max 67deg during Cinebench R20 run. NH-D15 properly installed (I am sure most people install it wrong).


whats the week/manufacturer code on the chip?


----------



## GRABibus

hisXLNC said:


> whats the week/manufacturer code on the chip?


useless.
This is completely random.


----------



## mongoled

xeizo said:


> But AMD has explicitly said to not use vcore offset with Ryzen 5000.


I was not aware of that. Can you post where you saw this information. 

I remember Tom saying "undervolting" the CPU would be a thing of the past when using Curve Optimizer, but I don't recall him saying anything explicitly regards using offsets

Thanks


----------



## aa.delite

xeizo said:


> But AMD has explicitly said to not use vcore offset with Ryzen 5000.


It's safe. You'll get a performance drop that depends on the offset value. You can also lower the PBO limits, which costs performance too. So use either an offset or PBO limits to lower temperature, but it will cost performance.


----------



## yaniv82

villason said:


> I just installed my 5950X on a ASUS CH7 with AGESA 1.1.9.0. All BIOS defaults except DOCP. Note my RAM is 3200 CL14 so it's officially supported by the CPU, not considered OC by AMD.
> 
> FLAWLESS so far. Idle at 35deg and max 67deg during Cinebench R20 run. NH-D15 properly installed (I am sure most people install it wrong).


Could you explain how to properly install an NH-D15? I’ve followed the instructions and videos and doesn’t seem to be that complicated but I’m getting higher temps compared to yours


----------



## newls1

I have a handful of updates regarding my PBO + Curve Optimizer OC... I've had a metric ton of reboots and insta-freezes while playing with settings by trial and error. Before people say RMA the CPU, which is the going method so far (I understand all your frustrations, people), here are the settings I ended up with:
PBO - Enabled and set to +150 MHz
PBO Fmax Enhancer ENABLED (all other PBO options set to "auto")
Curve Optimizer ALL CORE NEGATIVE 15
CPU Vcore - AUTO
CPU LLC 4 (3 crashes every now and then, 4 seems stable)
VDDG CCD 1.05
VDDG IOD 1.05
DDR 1.45
SOC .950
Mem speed 3800 CL 14/16/16/35
FCLK 1900
procODT 48 ohm (if I set it to 43.6 like DRAM Calc said, the PC errors out with code "22" on the code readout)
With these, the reboots have stopped so far, no insta-reboots (so far), and this is after SEVERAL hours of back-to-back 100% CPU loads testing memory and overall CPU stability using various runs of CB R20 and AIDA64. My temps are a concern, and I want to know what I need to do to drop them; a good 5-6c would make me happy. I've noticed that when just doing basic PC tasks the voltage hovers around 1.43-1.49v, then during fully loaded CB R20 runs it holds at 1.312 and temps are 90c CCD0 and 81c CCD1 (using LLC4); with LLC3 it's 87c CCD0 and 78c CCD1.

Can someone please help me figure out what to adjust to keep my OC but lower temps? I'm very confused, because if I change vcore from "auto" I lose PBO.. This is so different than OC'ing on Intel! Any feedback would be greatly appreciated.


----------



## MoW

yaniv82 said:


> Could you explain how to properly install an NH-D15? I’ve followed the instructions and videos and doesn’t seem to be that complicated but I’m getting higher temps compared to yours





newls1 said:


> I have a handful of updates regarding my PBO + Curve Optimizer OC... I've had a metric ton of reboots and insta-freezes while playing with settings using trial and error. Before people say RMA the cpu which is the going method so far (I understand all your frustrations people) but after the following settings
> PBO - Enabled and set to +150Mhz
> PBO Fmax enhancer ENABLED (all other PBO options set to "auto")
> Curve Optimizer ALL CORE NEGATIVE 15
> CPU Vcore - AUTO
> CPU LLC 4 (3 crashes every now and then, 4 seems stable)
> Vddg-ccd 1.05
> vddg-iod 1.05
> ddr 1.45
> SOC .950
> Mem speed 3800 CL 14/16/16/35
> fclk 1900
> procODT _48ohm (if I set it to 43.6 like dram calc said to pc error codes on "22" on code read out)
> reboots so far have stopped, no insta-reboots (so far) and this is after SEVERAL hours of back to back 100% cpu loads testing memory, and overall cpu stabilty using various runs of CB R20 and aida64. My temps are of a concern and want to know what i need to do in order to drop temps a good 5-6c would make me happy. I've noticed just doing basic pc tasks, voltage hovers in the 1.43-1.49v then when fully loaded CB R20 runs, it holds to 1.312 and temps are 90c CCD0 81c CCD1 (using LLC4) if i use LLC3 CCD0 87c CCD1 78c.
> 
> Can someone please help me figure out what to adjust to keep my OC but lower temps. Im very confused cause if i change vcore from "auto" i loose PBO.. This is so different then OC'ing on Intel! Any feedback would be greatly appreciated


This thread is for BSODs at BIOS defaults. Please keep it on track, guys. Thanks.


----------



## iraff1

xeizo said:


> This new AGESA 1.1.9.0 bios boosts much lower though, 4950MHz single vs 5150MHz same settings. I haven't had any reboot on it using Zen 3, but there is still WHEA errors. With Zen 2 on another rig I've had a sudden reboot even with the new AGESA.
> 
> Everything else is stable, guess I'm fortunate, but these WHEA are annoying.


I had a feeling the new AGESA would just be a downgrade of the boost curve rather than actually optimizing anything. I don't understand how this is acceptable: they showcase a certain level of performance in their presentation of the CPU, then they have to downgrade the boost algorithm 3 months after release to suit the massive number of CPUs that can't handle the aggressive boosting. Lowering the boost from 5150 to 4950 is not a feasible solution in my view. They are downgrading the performance to gain stability, but they already showcased the CPU with better performance, and that should be stable out of the box.
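
To put a number on that drop, a quick Python sketch using the two single-core boost figures quoted above (5150 MHz before, 4950 MHz after). It only does the percentage arithmetic and says nothing about how much real-world performance follows the clock.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Boost clocks in MHz as reported earlier in the thread.
print(f"{percent_change(5150, 4950):.1f}%")
```

That works out to a peak boost clock roughly 3.9% lower on the new AGESA.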


----------



## GRABibus

Please, let’s post this thread on all the AMD forums


----------



## Anthos

Anthos said:


> Just an update to this
> Since yesterday evening I re-enabled the c-states, so pretty much had everything as when I started getting the multiple WHEA reboots. 90% idle time since, 10% mixed. Still no errors repeated.
> Now the only difference is that I am on a newer bios. Did that fix it? Well no.. it shouldn't, as I was on the stock bios for a week and everything was fine and the errors then started while on that same bios. I honestly don't know, they might still happen, just being really really spaced but still, there was a point that I had multiple within an hour. What changed?
> Anyway I am not sure though if I am more annoyed with this issue or with AMD keeping complete radio silence. I highly doubt that only people affected by this it's us 100 whatever people gathered here. I can't imagine how many people out there might be trying to build pcs and have them keep crashing and are not tech savvy enough to understand what's going on or to reach forums like this to make their issue known. I guess AMD's marketing department is only good at overhyping products but probably skipped class when they were teaching things like damage control.
> Anyway, as it seems to be stable at the moment I'll try to escalate a bit by enabling PBO etc and see if stability remains or if it forces errors.


Possibly and hopefully my last update to this. Gave it another day and started experimenting with ram (EDIT: when I say ram overclock I mean above DOCP) and cpu overclocks (in addition to the previous CPO + CBP). As I never had AMD before I ended up with multiple crashes because I didn't really know what I was doing but none was a WHEA error. I did manage to run several different stable overclocks though. So technically I haven't had a WHEA error again since Monday. The ONLY difference between the before, during and after the WHEA errors is maturity of os? On my old installation it was solid. Started crashing on multiple fresh installations (even once when booting off USB to install the windows) and after a day or so didn't crash again on the fresh installation (possible windows update might have been installed in the meantime that rectified it?) OR possibly installation of a more updated/stable driver?
I literally have no idea. All I know at the moment is not only is the system very stable but yesterday I even caught one of my cores boosting to 5150 so yeah, hope it stays this way.


----------



## xeizo

I have tried a lot of things; the one that seems to affect WHEA the most is VDDG being too low. At 1.05V I haven't had a single WHEA.

The sudden reboots can easily be triggered with too enthusiastic Curve Optimizer.


----------



## xdev

Joined only to comment here.

I'm glad I found this thread. Now I feel justified in RMA'ing my CPU. I am also experiencing the WHEA crashes with completely default BIOS (no XMP/DOCP) on an ASUS Strix X570 E. A fixed, all core overclock seems to "resolve" the issue. I don't have the desire to tweak all of the random PB settings you guys are playing with. I just want to run the CPU at the stock settings. So I am simply RMA'ing the CPU. Unfortunately, due to stock issues I assume exchanging with Amazon is probably not an option.


----------



## newls1

dumb question i know, but where do i go to check for whea errors?


----------



## Deepcuts

newls1 said:


> dumb question i know, but where do i go to check for whea errors?


Download the attachment and rename it to .xml
Import it in Event Viewer under Custom Views








Should work.


----------



## GRABibus

Guys,
Go there and post this thread with your experiences :






Processors: community.amd.com





The more complaints AMD sees, and the more people who say they will go to Intel for their next build, the greater the chance that AMD finally communicates about this ****


----------



## rob-tech

iraff1 said:


> I had a feeling the new AGESA would just be a downgrade of the boost curve rather then actually optimizing anything. I don't understand how this is acceptable, they showcase a certain level of performance on their presentation of the cpu, then they have to downgrade the boost algotimn 3 months after release to suit the massive amounts of cpus that cant handle the agressive boosting. Lowering the boost from 5150 to 4950 is not a feeseble solution according to me, they are downgrading the performance to gain stability, but they already showcased the cpu with better performance and that should be stable out of the box.


Agreed, this is just a ****show and I'm surprised no review publication picks apart BS like this, probably they get golden samples from AMD.

There is no quality and even though my flagship x570 system is currently working well, I did have to go through a lot also to get to this point and hope Intel comes back strong, I can't see myself choosing AMD for the processor and platform in the future with the exception of the video card, I have been pleased with the 5700 XT and the open source Linux drivers are a significant step up from anything Nvidia offers.


----------



## GRABibus

So funny :
















It is a multidimensional issue....


----------



## xdev

Deepcuts said:


> Download and rename to .xml
> Import this in Custom Views
> View attachment 2472350
> 
> Should work.


 Interestingly, I don't see any errors with this filter. I've only had BSODs with WHEA as the error.


----------



## Deepcuts

xdev said:


> Interestingly, I don't see any errors with this filter. I've only had BSODs with WHEA as the error.


The provided xml only filters by source WHEA-Logger.
I can see your source is BugCheck
To be frank, I never saw any WHEA errors logged on this board so I cannot be 100% sure. Not even hwinfo ever showed any WHEA errors.
No clue if this means I actually have no WHEA errors or the board just hides them or doesn't report them.
I recall using that filter on my old Intel 8700K when I had problems getting the RAM to work at XMP.
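
If you would rather check from a script, here is a minimal Python sketch of the same idea: count events per source, so you can see whether anything was logged by WHEA-Logger at all or only by something else. It assumes you have saved the System log from Event Viewer as XML; the embedded sample is made up purely to show the shape of the data, and the "BugCheck" provider name in it is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace used by Windows event records.
EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"e": EVENT_NS}


def count_events_by_provider(xml_text):
    """Count events per provider (source) name in exported event log XML."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for event in root.iter(f"{{{EVENT_NS}}}Event"):
        provider = event.find("e:System/e:Provider", NS)
        if provider is not None:
            counts[provider.get("Name", "unknown")] += 1
    return counts


# Hypothetical two-event sample, not taken from any real system.
SAMPLE = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><Provider Name="Microsoft-Windows-WHEA-Logger"/></System>
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><Provider Name="BugCheck"/></System>
</Event>
</Events>"""

for name, n in count_events_by_provider(SAMPLE).items():
    print(name, n)
```

To use it on a real export, read the file's text and pass it to `count_events_by_provider` instead of `SAMPLE`. A count of zero for Microsoft-Windows-WHEA-Logger would match what the custom view shows.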


----------



## dehun

I managed to stabilize this system, yet it took quite some time. 
I have 

gigabyte aorus x570 elite
amd ryzen 5800x
hyperx predator 64gb (2x32gb) (xmp 3200mhz at 1.35v)
bios version F31*o* - latest available at a time
be quiet shadow rock 3 with replaced fan(1600 -> 2200)
I experienced WHEA uncorrectable error BSODs, mostly during gaming. Originally I thought the video card was bad, but stability tests for the video card did not produce any crashes.
OCCT for 1 hour was also stable with no errors. The OCCT power test produced a crash every time within several minutes. Stopping the test after 2-3 minutes also produced a crash immediately. Adjusting LLC did not help.

Stable configuration in my case is:

upgraded bios to *F31q*
setting voltage for DRAM to 1.35v in addition to configuring XMP profile
setting SoC voltage to 1.1v instead of AUTO
PBO on auto
Enabling or disabling PBO with F31q and correctly configured RAM does not affect stability of the system.

Quite unhappy with gigabyte aorus - bios F31o with XMP profile for ram did not put 1.35v, but remained at 1.2v. Asus mobo on other PC required only profile to be configured.

PBO was a big party breaker in my case: if I left it at auto, F31o crashed during gaming. Setting PBO to manual and leaving everything there at AUTO intensified the crashes.
Playing with the options for manual PBO, it would not even POST, or it crashed right after login.
When I updated to *F31q*, PBO became way more stable. I can enter some manual values that actually work.

I have stumbled upon stable configuration by attempting to overclock RAM.
I managed to get stable configuration with manual overclocks for RAM and CPU. However it was slightly slower than stock. 
Setting CPU multiplier to 47 and CPU voltage to 1.4 resulted in thermal shutdown during OCCT power test or any AVX/SSE intensive workloads. 

However, in my case PBO makes the system less responsive. Also, afaik PBO is not stock, so it is not guaranteed to work.
In the end I decided to send back the CPU, RAM and motherboard. Going to replace them with a 5950X, an Asus mobo and G.Skill RAM. Wish me luck %)


----------



## Midian

dehun said:


> I managed to stabilize this system, yet it took quite some time.
> I have
> 
> gigabyte aorus x570 elite
> amd ryzen 5800x
> hyperx predator 64gb (2x32gb) (xmp 3200mhz at 1.35v)
> bios version F31*o* - latest available at a time
> be quiet shadow rock 3 with replaced fan(1600 -> 2200)
> I have experienced WHEA uncorrectable error BSODs mostly during gaming. Originally I thought that video card is not good - but stability tests for video card did not produced any crashes.
> OCCT for 1 hour also was stable and no errors. OCCT power test produced crash every time within several minutes. Stopping test after 2-3 minutes also produced crash immidiately. Adjusting LLC did not helped with it.
> 
> Stable configuration in my case is:
> 
> upgraded bios to *F31q*
> setting voltage for DRAM to 1.35v in addition to configuting XMP profile
> setting SoC voltage to 1.1v instead of AUTO
> PBO on auto
> Enabling or disabling PBO with F31q and correctly configured RAM does not affect stability of the system.
> 
> Quite unhappy with gigabyte aorus - bios F31o with XMP profile for ram did not put 1.35v, but remained at 1.2v. Asus mobo on other PC required only profile to be configured.
> 
> PBO was a big party breaker in my case - if I leave it at auto - F31o crashes during gaming. Setting PBO to manual and leaving everything there at AUTO - intensifies crashes.
> Playing with options for manual PBO - not even post, or crash right after log in.
> When I updated to *F31q *PBO became way more stable. I can enter some manual values that actually work.
> 
> I have stumbled upon stable configuration by attempting to overclock RAM.
> I managed to get stable configuration with manual overclocks for RAM and CPU. However it was slightly slower than stock.
> Setting CPU multiplier to 47 and CPU voltage to 1.4 resulted in thermal shutdown during OCCT power test or any AVX/SSE intensive workloads.
> 
> However in my case PBO makes system less responsive. Also afaik PBO is not stock = is not guaranteed to work.
> In the end I decided to send back CPU, RAM and motherboard. Going to replace them with 5950x with asus mobo and g.skill ram. Wish me luck %)











Strange. On my Xtreme (not the same motherboard, of course, but still) XMP works flawlessly; it even seems to give almost 1.4v to the RAM (not sure if that's good or bad). I have had no WHEA errors at all on this Windows install from Dec 2, first using *F31j* and now *F31q*, with PBO on auto, which I think is off.


----------



## brasoveanul

I have just created an account on this forum specifically to confirm that the launch of the Ryzen 5000 series seems to be plagued by consistently careless quality control. I replaced my former 3950X with a 5950X (week 40) on an otherwise stable hardware configuration (X570 Plus WiFi, 128 GB Micron memory chips, etc.). A few minutes after the processor swap, the nightmare started, with random reboots that appeared in the event logger as Kernel-Power or WHEA Cache Hierarchy errors. I managed to stabilize the system to some degree by applying the BIOS setting changes that are generally discussed, plus some other ones. Nevertheless, the reboots did not disappear completely.

Considering that the supply of the new Ryzen 5000 series is fine in Romania, I ordered another 5950X (week 46) and initiated the return process for the previous processor; I am still waiting to receive the reimbursement in my bank account. I installed the new 5950X, loaded the optimized defaults in the BIOS, enabled PBO in the advanced section of the Asus BIOS, and started using the system. The difference is striking: I haven't had any reboot or any other concerning error appear in the event viewer. Furthermore, even the synthetic benchmarks show substantially better performance from the new CPU (R20 is 11200 plus without any advanced OC and with a standard air cooler, although a Noctua, while R23 is around 29000 without any advanced OC). Nevertheless, the most important thing is that the system is stable and can be used.

It is obvious that AMD is putting defective units on the market, with no concern for the quality control process or for the awful experience these units cause their unlucky buyers. I would strongly advise any user who suffers from instability issues after installing a Ryzen 5000 on a previously stable platform to go ahead and RMA/exchange/return the CPU.


----------



## hisXLNC

I saw someone on the AMD community forums say that, IIRC, Gigabyte told them to increase their DRAM voltage by 0.05 V. Could be something to do with the memory controller.


----------



## aa.delite

brasoveanul said:


> R23 is around 29000 without any advanced OC


Do you have water cooling? 29000 in R23 looks like high PBO limits set in the BIOS. I think the default 5950X Cinebench R23 score is 24500-25500, isn't it? Did you modify the Precision Boost PPT/TDC/EDC settings to get 29000 in R23, or are you using default BIOS settings?
Well, mine scores 24500 at default settings. I'm trying to figure out if that's normal. What if defective CPUs score low?


----------



## xeizo

brasoveanul said:


> I have just created an account on this forum especially to also confirm that the launch of the Ryzen 5000 series seems to be plagued by a consistent carelessness for the quality control. Thus, I replaced the former 3950X with a 5950X(week 40) on an otherwise stable hardware configuration (X570 Plus WiFi, 128 GB Micron memory chips, etc.). A few minutes after the processor swap, the nightmare started, with random reboots that were appearing in the event logger as Kernel-Power or WHEA Cache Hierarchy error. I have relatively managed to stabilize the system by applying the BIOS settings changes that are generally discussed and some other ones. Nevertheless, the reboots have not disappeared completely. Considering that the supply for the new Ryzen 5000 series is fine in Romania, I ordered another 5950X (week 46), and I initiated the return process for the previous processor, I still wait to receive the reimbursement in my bank account. Thus, I installed the new 5950X, I loaded the optimized defaults in BIOS, I enabled PBO in the advanced section of the Asus' BIOS, and I started using the system. The difference is striking, as I haven't had any reboot or any other concerning error to appear in the event viewer. Furthermore, even the synthetic benchmarks prove a substantially better performance of the new CPU (R20 is 11200 plus without any advanced OC and a standard air cooler, although Noctua, while R23 is around 29000 without any advanced OC). Nevertheless, the most important is that the system is stable and it can be used. It is obvious that AMD throws defective units on the market, without any preoccupation for the quality control process and the awful experience these units provoke to the respective unlucky buyers. I would strongly advise any user, which suffers from instability issues after installing a Ryzen 5000 on a previously stable platform to go ahead and RMA/exchange/return the CPU.


Yes, I think QC is part of the issue here. The whole idea of using chiplets is to sell more of the silicon from each wafer, and thus make more profit. 7nm wafers from TSMC are extremely expensive. Maybe AMD is taking a slight risk here, hoping not to get caught.


----------



## brasoveanul

aa.delite said:


> Do you have water cooling? 29000 R23 seems like high PBO TDP limits set in bios. I think default 5950x Cinebench R23 score is 24500-25500, isn't it? Did you modify Precision Boost PPT/TDC/EDC settings to get 29000 R23 or using default bios settings?
> Well, mine one 24500 at default settings. I'm trying to figure if it's normal. What if defective CPUs have low score.


As I said, I haven't made any custom settings in the BIOS, I have just loaded the so-called optimized defaults, then I set PBO from the advanced section only to enabled, and this is pretty much all that I did. I don't know what the future will bring, I hope the new processor stays stable. Nevertheless, considering the nightmare that I experienced, I am still skeptical, and I'll probably avoid AMD for a future platform change. I use a standard air cooler. The CPU temperature while running R20 is around 62-65 degrees celsius. From what I experienced, it seems that the problematic chips are both unstable and perform below the expected margins.


----------



## aa.delite

brasoveanul said:


> then I set PBO from the advanced section only to enabled


That's enough to raise the PBO limits and get a 29000 R23 score.
But I'm impressed by the air cooling. 60C during a Cinebench run seems impossible even for water cooling if PBO is enabled.
Well, I see the CPU issues were 100% resolved by the RMA.


----------



## brasoveanul

aa.delite said:


> That's enough to raise PBO limits and get 29000 R23 score.
> But I'm impressed about air cooling. 60C during Cinebench test seems impossible even for watercooling if PBO Enabled.
> Well, 100% CPU issues were resolved by RMA I see.


Yes, it seems that the new CPU behaves significantly differently from the old one, which points to a defective processor.


----------



## machine038

aa.delite said:


> Well, mine one 24500 at default settings


That is the default. To get a 29000 score you have to get your CPU clocked at around 4.5 GHz all-core.
I've got a faulty 5950X, one that crashes at default BIOS settings, and I managed to score 27k with it with PBO enabled.

Both CPUs (faulty and replacement) net around 24.5k at default settings.

With the replacement chip I've managed to get around 4.55 GHz all-core in R23 (stable), scoring around 29k points, with PBO enabled and Curve Optimizer at -30 on 14 cores and -10 on two.

I forgot to take a screenshot with the clock speeds.

This score difference (24k vs 29k) doesn't mean much in real-world usage, though: there is only a very small gain (a few seconds) on my workload. That is stock vs PBO.

I think you should RMA if you're having issues with default BIOS settings. If it is crashing with PBO or XMP enabled, it might just require some adjustments.
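
To put the two scores from this post side by side, here is a minimal arithmetic sketch. The function name `percent_gain` is made up for illustration, and the two inputs are the rounded scores quoted above (about 24.5k at stock, about 29k with PBO), not fresh measurements.

```python
def percent_gain(baseline: float, tuned: float) -> float:
    """Relative improvement of `tuned` over `baseline`, in percent."""
    return (tuned - baseline) / baseline * 100.0


# Rounded Cinebench R23 multi-core scores quoted in the post above.
stock_score = 24_500  # default BIOS settings
pbo_score = 29_000    # PBO enabled plus Curve Optimizer

gain = percent_gain(stock_score, pbo_score)
print(f"PBO score is about {gain:.1f}% higher than stock")
```

That works out to a bit over 18% in the benchmark, which is consistent with the poster's point that a sizeable synthetic gain can still translate into only a few seconds on a real workload.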


----------



## brasoveanul

machine038 said:


> That is the default.. To get 29000 score you have to get your CPU all core clocked around 4.5GHz...
> I've got a faulty 5950x, one that crashes at default bios, I managed to score 27k with it PBO enabled.
> 
> Both CPU (faulty/replacement) at default settings net around the 24.5k score.
> 
> With the replacement chip I've managed to get around 4.55GHz all core on R23 (stable) getting around 29k points, with PBO enabled and curve optimizer at 14 cores with -30, two at -10..
> 
> View attachment 2472741
> 
> 
> I forgot to take a screenshot with the clock speeds
> View attachment 2472742
> 
> 
> This score doesn't reflect much on real world usage tough (24k score vs 29k), there is a very small gain (few seconds) on my workload. Stock VS PBO that is.
> 
> I think you should RMA if you're having issues with default bios settings.. If is crashing with PBO enabled or enabling XMP it might require some adjustments..


In my case, I am very close to 29000 just with PBO enabled, without any other adjustment, but that is a detail, the most important is stability, which I hope would stay.


----------



## xeizo

I haven't had any WHEA errors since raising PLL. Multicore takes a hit, so it's nothing for the benchmarkers, but it looks stable now. Maybe that's worth more than setting records. Single-core doesn't take any hit, which is more important IMHO.


----------



## Marucins

Deepcuts said:


> *Good news.*
> Shutdown PC. Removed the power cord. Reset CMOS via the back IO button.
> Replaced the CPU. Started the PC. Entered BIOS and loaded setup defaults. RAM 2133 UCLK/FCLK 1067. No XMP. Nothing else changed. Same Windows install.
> BIOS version is F31
> 
> First 30 minutes light internet browsing with youtube videos, some internet speed test, launched Guild Wars 2. All good.
> Next, I started a Handbrake encode. 1st encode took 25 minutes and everything is still stable.
> After the 1st Handbrake Encode did some AIDA64 benchmarks. Still stable.
> Left the system idle for another 30 minutes. Still stable.
> The 1st CPU would have crashed a long time ago.
> 
> Of course, 1-2 hours of testing cannot be labeled as 100% stable. But the fact is: the 2nd CPU is light years ahead of the 1st CPU stability wise.
> Cannot stress this enough: *the only thing I have replaced is the CPU*.
> 3950X = stable
> 1st 5950X = Not stable at stock
> 2nd 5950X = Stable so far at stock.
> 
> Will leave it at stock for 1 week. If still stable, will start tweaking.
> 
> If this is not proof enough AMD did a boo-boo, I do not know what is. But my 2 cents bet is that they will never acknowledge the scale of this issue.
> I was under the wrong impression that somehow AMD's quality check would not let so many broken CPUs slip through. I was wrong.
> 
> Left: 1st CPU, right: 2nd CPU
> View attachment 2470476
> 
> 
> View attachment 2470440


Cool!
And I said to replace the CPU right away.
I am glad that everything works for you as it should.
I have been waiting for mine for over a month; I should receive it on January 7.


----------



## GamBoTron

brasoveanul said:


> I have just created an account on this forum especially to also confirm that the launch of the Ryzen 5000 series seems to be plagued by a consistent carelessness for the quality control. Thus, I replaced the former 3950X with a 5950X(week 40) on an otherwise stable hardware configuration (X570 Plus WiFi, 128 GB Micron memory chips, etc.). A few minutes after the processor swap, the nightmare started, with random reboots that were appearing in the event logger as Kernel-Power or WHEA Cache Hierarchy error. I have relatively managed to stabilize the system by applying the BIOS settings changes that are generally discussed and some other ones. Nevertheless, the reboots have not disappeared completely. Considering that the supply for the new Ryzen 5000 series is fine in Romania, I ordered another 5950X (week 46), and I initiated the return process for the previous processor, I still wait to receive the reimbursement in my bank account. Thus, I installed the new 5950X, I loaded the optimized defaults in BIOS, I enabled PBO in the advanced section of the Asus' BIOS, and I started using the system. The difference is striking, as I haven't had any reboot or any other concerning error to appear in the event viewer. Furthermore, even the synthetic benchmarks prove a substantially better performance of the new CPU (R20 is 11200 plus without any advanced OC and a standard air cooler, although Noctua, while R23 is around 29000 without any advanced OC). Nevertheless, the most important is that the system is stable and it can be used. It is obvious that AMD throws defective units on the market, without any preoccupation for the quality control process and the awful experience these units provoke to the respective unlucky buyers. I would strongly advise any user, which suffers from instability issues after installing a Ryzen 5000 on a previously stable platform to go ahead and RMA/exchange/return the CPU.


Thanks for this tip. I am waiting for my 5950X; it should arrive in a couple of weeks. If I get similar issues with WHEA cache errors etc., I will return the CPU right away instead of trying to OC or tweak too much.

It's a real shame the quality control is so bad for this product, but thankfully users like you give other consumers valuable tips, and I thank you for that.


----------



## brasoveanul

GamBoTron said:


> Thanks for this tip, i am waiting for my 5950x, it should arrive in a couple of weeks. If i get similar issues with WHEA cache error etc i will return the CPU right away instead of trying to OC or tweak too much.
> 
> its a real shame the quality control is so bad for this product, but thankfully, users like you give other consumers valuable tips and i thank you for that.


You're welcome. It is important that we, as customers, react so that the manufacturers and suppliers get an incentive to offer functional products. Unfortunately, this is the level at which we are at present, in the 21st century, craving to buy functional products that are supposed to be manufactured in high tech facilities.


----------



## GamBoTron

brasoveanul said:


> You're welcome. It is important that we, as customers, react so that the manufacturers and suppliers get an incentive to offer functional products. Unfortunately, this is the level at which we are at present, in the 21st century, craving to buy functional products that are supposed to be manufactured in high tech facilities.


Yes, it's a shame.

It's been a crazy year in that sense. Cyberpunk, for example, was also released unfinished and with plenty of bugs.

It seems like this is becoming more normal, and that manufacturers/publishers try to get away with sloppy programming.

I really hope they start to take these things more seriously, but then again, money is always going to dictate things.


----------



## iraff1

I also want to exchange my CPU, but the truth of the matter is that there are no 5950Xs to be had here. Once there are, I will buy another one and then force my way to an RMA. It is awful that AMD releases broken chips onto the market like this, and what's even worse is that the entire community of people having issues is more or less being ignored by all the tech reviewers. Why is nobody discussing the obvious fact that a bunch of silicon is broken? Way more than usual; their quality control must be awful.


----------



## nevcairiel

My 5950X RMA replacement from AMD arrived, 2051SUS production, so pretty new. Testing will commence shortly.


----------



## brasoveanul

iraff1 said:


> I also want to exchange my cpu but the truth of the matter is there are no 5950x to be had here, once there are i will buy another one and then force my way to an RMA, it is awful that AMD release broken chip out on the market like this, and whats even worse is that the entire community of people having issues are more or less being ignored by all the tech reviewers, why are no body discussing the obvious fact that a bunch of silicone is broken? Way more than usual, their quality control must be awful.


Indeed, you raise an important issue here. There are so many tech reviewers who praise themselves for being objective, but I haven't seen even one of them mention this issue, which is by no means isolated. Even if AMD sent them fully tested processors, they should have found out about the nightmare that ordinary customers face, and should have taken action to check and discuss it, if they were really so objective and so close and open to their followers. This is why I seriously question their openness and fairness. Sure, I don't have material evidence for this, but their attitude, and the obvious fact that they act as if the problem didn't exist, is proof enough. I am curious if/when the first tech "influencer" will post a video on this subject.


----------



## frollic

nevcairiel said:


> My 5950X RMA replacement from AMD arrived, 2051SUS production, so pretty new. Testing will commence shortly.


AMD received my CPU today, waiting for them to send me a new one.

I noticed Gigabyte removed all recent BIOS updates for my B550.
The only one left is vF10 which is from mid Sep. All F11s are gone.


----------



## glith

frollic said:


> AMD received my CPU today, waiting for them to send me a new one.
> 
> I noticed Gigabyte removed all recent BIOS updates for my B550.
> The only one left is vF10 which is from mid Sep. All F11s are gone.


Asus has removed the beta BIOS completely now, for the ROG Crosshair VIII Dark Hero at least.


----------



## frollic

glith said:


> Asus has removed the beta bios completely now, for the rog crosshair VIII dark hero at least.


Gigabyte's weren't betas, or at least weren't flagged as such, as far as I was able to see/remember...


----------



## glith

frollic said:


> AMD received my CPU today, waiting for them to send me a new one.
> 
> I noticed Gigabyte removed all recent BIOS updates for my B550.
> The only one left is vF10 which is from mid Sep. All F11s are gone.


Regarding my CPU, I did send it back to the seller.. but due to the holidays they haven't yet looked at it... so I went ahead and ordered a new one.. hopefully sent to me next week if their supplier doesn't mess up.
The waiting is painful and I really cross my fingers I get "lucky" this time.


----------



## nevcairiel

frollic said:


> Gigabytes' weren't, or at least not flagged as such, what I was able to see/remember ...


Gigabyte BIOSes with a letter at the end are betas, so e.g. F11d is a beta version of F11, and the final F11 comes after the beta phase.

For X570, they issued a final stable BIOS on 12/31; not sure why B550 wouldn't have anything like that. I hear they should have an F12 now, or at least soon.
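
The naming rule described in this post can be sketched in a few lines of Python. This is only an illustration: the function name `parse_bios_version` is hypothetical, and the pattern assumes nothing beyond the "F, number, optional lowercase letter" shape of the version strings mentioned in this thread (F30, F31e, F11d and so on).

```python
import re
from typing import Tuple

# Assumed shape, based only on versions named in this thread:
# "F" + release number + optional lowercase beta letter.
_BIOS_VERSION = re.compile(r"^F(\d+)([a-z]?)$")


def parse_bios_version(version: str) -> Tuple[int, bool]:
    """Return (release_number, is_beta) for a Gigabyte-style version string."""
    match = _BIOS_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised BIOS version string: {version!r}")
    release = int(match.group(1))
    is_beta = match.group(2) != ""  # a trailing letter marks a beta build
    return release, is_beta


for name in ("F10", "F11d", "F31q", "F31"):
    release, is_beta = parse_bios_version(name)
    print(name, "-> beta" if is_beta else "-> final", f"(release {release})")
```

Under that rule, a board whose download page lists only F10 with every F11 letter variant gone has had its betas pulled while the last final release stays up.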


----------



## frollic

nevcairiel said:


> Gigabyte BIOSes with a letter at the end are Beta, so eg. F11d is a beta version of F11, and the final F11 comes after the beta phase.


Ah, I wasn't aware of that. 

In that case yes, all betas were pulled, and only the regular ones remain.


----------



## Deepcuts

nevcairiel said:


> My 5950X RMA replacement from AMD arrived, 2051SUS production, so pretty new. Testing will commence shortly.


Mate, you have been at it for 5 hours already.
Have you thrown your PC out the window, or has the new CPU fixed it?


----------



## GRABibus

brasoveanul said:


> Indeed, it is an important issue that you raise here, there are so many tech reviewers that would praise themselves for being objective, but I haven't seen at least one of them mentioning something about this issue, which is by no means isolated. Even if AMD sent them fully tested processors, they should have learned out about the nightmare that ordinary customers face, and should have taken action to check and discuss it, if they were so objective, close and open to their followers. This is why I seriously question their openness and fairness. Sure, I don't have material evidence about this, but their attitude, and the obvious reality that they act as it wouldn't exist, it is proof enough. I am curious if/when the first tech "influencer" will post a video on this subject.


They will not, especially if they are sponsored by AMD.


----------



## rob-tech

GRABibus said:


> they will not, especially if they are sponsored by AMD


AMD probably sends them tested CPUs and it is like you said, this is quite frankly a joke as people use these things for serious work and not only gaming and recreation. It is irresponsible of AMD to do this and as I have lost the trust in them, I will make an effort to build with Intel in the future, so hopefully they catch up.


----------



## GRABibus

rob-tech said:


> AMD probably sends them tested CPUs and it is like you said, this is quite frankly a joke as people use these things for serious work and not only gaming and recreation. It is irresponsible of AMD to do this and as I have lost the trust in them, I will make an effort to build with Intel in the future, so hopefully they catch up.


I am still waiting for my build from the company I ordered it from (5900X, CH8 Hero).
If I experience the same bullshit as you all, I will ask for a refund and wait for the 11900K launch.


----------



## nevcairiel

Deepcuts said:


> Mate, you are at it for 5 hours already.
> Have you thrown your PC out the window or the new CPU fixed it?


I guess I'm cautiously optimistic. So far it seems fine, but I'll leave it idle overnight and then do stress testing tomorrow.


----------



## o1dschoo1

Jesus, 86 people who voted in this thread alone have issues. Imagine how many people aren't reporting and are just RMAing stuff.


----------



## GamBoTron

o1dschoo1 said:


> Jesus 86 people just in this thread that voted have issues. Imagine how many people arnt reporting and just rmaing stuff.


Reddit is loaded with threads about these kinds of issues as well.

The interesting question is how much of this is due to bad BIOS software/compatibility (which can potentially be fixed) and how much is due to AMD and faulty hardware (basically, people receiving faulty CPUs because the quality control is just straight-up bad).

Either way, it's not a very good situation for any of the parties involved: not for the company and not for the customer.


----------



## pSickOpatA

Just returned my CPU to the retailer, asking for a brand new one or a refund. I can't wait any longer without a CPU, so I ordered a new one. Hope to have better luck this time.

Btw the returned one was batch BG 2044PGS
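
For anyone collecting batch codes like the ones posted in this thread, here is a small Python sketch for pulling a production week out of them. It rests on an assumption that is not confirmed anywhere in this thread: that the four digits are a two-digit year followed by a two-digit week (so 2044 would mean week 44 of 2020, in line with posters describing their chips as "week 40" or "week 46"). The function name `production_week` is hypothetical.

```python
import re
from typing import Tuple

# ASSUMPTION: the batch marking contains YYWW followed by three capital
# letters, e.g. "2044PGS" or "2051SUS". This format is inferred, not official.
_BATCH = re.compile(r"\b(\d{2})(\d{2})[A-Z]{3}\b")


def production_week(batch_code: str) -> Tuple[int, int]:
    """Return (year, week) guessed from a batch code such as 'BG 2044PGS'."""
    match = _BATCH.search(batch_code)
    if match is None:
        raise ValueError(f"no YYWW group found in {batch_code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    if not 1 <= week <= 53:
        raise ValueError(f"implausible week number in {batch_code!r}")
    return year, week


for code in ("BG 2044PGS", "2051SUS"):
    print(code, "->", production_week(code))
```

Sorting reported codes this way only shows when a chip was probably made; as later replies point out, the same batch can contain both good and bad samples.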


----------



## dev1ance

rob-tech said:


> AMD probably sends them tested CPUs and it is like you said, this is quite frankly a joke as people use these things for serious work and not only gaming and recreation. It is irresponsible of AMD to do this and as I have lost the trust in them, I will make an effort to build with Intel in the future, so hopefully they catch up.


Spent a week dealing with a 5900x+X570-E that would get sporadic WHEAs even underclocked to 4GHz 1.29v all core, ****ed up my Windows install and that was the last straw. Tested my memory from DDR4-3200->3800 at the loosest timings and IF throughout. I was only able to fully eliminate WHEAs at DDR4-3200 and IF1600. Wasn't up for more beta testing and precisely why I went back to Intel. This is reminiscent of my 3700x build, probably will take a while before things get better (I should've learned).


----------



## JohnnyFlash

Well, this may have just pushed me to just get a 3950X and regular hero instead of waiting any longer.


----------



## brasoveanul

GamBoTron said:


> reddit is loaded with threads with these kinds of issues as well.
> 
> The interesting part is how much of these issues is due to bad BIOS software/compatibility (that can potentially be fixed) and how much is due to AMD and faulty hardware (basically that people receive faulty cpu's because the quality control is just straight up bad)
> 
> either way, its not a very good situation for any of the parts involved: not for the company or the customer


The BIOS bugs seem to play a secondary role here, as simply changing the processor allows you to get rid of, basically, the entire unstable behaviour.


----------



## smbell1979

Yeah, it seems like better chips can just manage to handle whatever the BIOS issues are causing, voltage spikes or not enough voltage maybe, who knows...


----------



## GamBoTron

brasoveanul said:


> The BIOS bugs seem to play a secondary role here, as simply changing the processor allows you to get rid of, basically, the entire unstable behaviour.


True, it seems like bad batches of the CPU were indeed sent out. It's just too many cases to be random.

Making a list of bad batches would be helpful, so people can just return them right away without going through the stress of trying to make it work with errors and such.

I don't know what kind of numbering/coding these CPUs are delivered with, but there must be a way to differentiate them.


----------



## Anthos

GamBoTron said:


> True, seems like bad batches of the cpu were sent out. Its just too many cases to be random
> 
> Making a list of bad batches would be helpful, so people can just return them without going trough the stress of trying to make it work.
> 
> i dont know what kind of coding these cpus are delivered with, but there must be a way to differentiate them


There's no specific "bad batch". There are numerous reports of people having processors from the same batch where one person has constant errors and the other has none. It's probably multifactorial.


----------



## GamBoTron

Anthos said:


> There's no specific "bad batch". There's numerous supports of people having the same batch processor and one person has constant errors and the other one never. It's probably multifactorial.


Ah OK, in that case it's even harder to sort out. It's basically a lottery then. Well, good luck I guess, and fingers crossed lol


----------



## boyman

GamBoTron said:


> True, seems like bad batches of the cpu were indeed sent out. Its just too many cases to be random
> 
> Making a list of bad batches would be helpful, so people can just return them right away without going trough the stress of trying to make it work with errors and such.
> 
> i dont know what kind of numbering/coding these cpus are delivered with, but there must be a way to differentiate them


Hi
I have read somewhere that, for those in the "know", it is possible to tell from the serial number where on the wafer the chiplet(s) in the processor originate.
The center of the wafer provides the best samples, while towards the edges quality gets worse.
I looked at pictures of processors on different reviewers' sites, and it seemed that for the 5950X they all had serials that started with 9JF2......
Mine, and those of all 4 of my friends who have one, start with 452.....
This may have more bearing on the quality than the batch does; perhaps the reviewers are given center-wafer samples.
Or it may mean nothing at all and be total rubbish; apologies if so.


----------



## boyman

Edit: By quality getting worse toward the edges of the wafers, I really mean, that the chances of lower quality increase.


----------



## GamBoTron

boyman said:


> Hi
> I have read from somewhere, that for those in the "know", it is possible to read from the serial number from where in the wafer the chiplet(s) in the processor originate.
> The center of the wafer provides the best samples, while towards the edges, quality gets worse.
> I did try to look pictures of processors, from different reviewers sites, and it seemed, that for 5950x they all had serials that started 9JF2......
> Mine, and all those of my 4 friends who have one, start 452.....
> This may have more bearing on the quality than the batch, perhaps the reviewers are given center-wafer samples.
> Or, it may mean nothing at all, and is total rubbish, apologies if so.


Thanks for the reply.
I am not sure about the "bad batches" theory. According to @Anthos it does not matter, and it is more or less random (if I understood him correctly, that is). Apparently some people have problems and some don't, even with the same batch.

I haven't even received my 5950X yet, but what I have gathered from this thread is that if I encounter problems with mine, it is better to return it and ask for a new one.


----------



## boyman

GamBoTron said:


> thanks for the reply.
> I am not sure about the "bad batches" theory. according to @Anthos it does not matter , and is more or less random (if i understood him correctly, that is) Apparently some people have problems, some dont (even with the same batch)
> 
> I havent even received my 5950x yet, but from what i have gathered from this thread is that if i encounter problems with mine, it is better to return it and ask for a new one.


Well, my build is not ready yet; I'm waiting for some cooling components.
But my friends report significant differences among their chips' abilities, and the reviewers seem to have chips of more similar quality.
It could indicate the wafer position, but it could also just be more rigorous testing...
The information about the serial number comes from a forum, where a person who is very tight with an expert at a retailer was given a processor by said expert,
with the comment that "it should be a good one, I checked the serial, and it's from the center of the wafer".
Maybe true, maybe b****cs.

Edit: To be clear, this specifically is NOT a bad batch idea, but rather the opposite; the bad ones are everywhere.


----------



## Midian

boyman said:


> Hi
> I have read from somewhere, that for those in the "know", it is possible to read from the serial number from where in the wafer the chiplet(s) in the processor originate.
> The center of the wafer provides the best samples, while towards the edges, quality gets worse.
> I did try to look pictures of processors, from different reviewers sites, and it seemed, that for 5950x they all had serials that started 9JF2......
> Mine, and all those of my 4 friends who have one, start 452.....
> This may have more bearing on the quality than the batch, perhaps the reviewers are given center-wafer samples.
> Or, it may mean nothing at all, and is total rubbish, apologies if so.


Mine is a 9JG6 and I have no problems. It might be BS, or there might be something to it.


----------



## glith

Midian said:


> View attachment 2472948
> 
> 
> Mine is a 9JG6 and I have no problems might be bs or there is something to it.


Mine was 9JG5 and had problems...


----------



## Anthosm

boyman said:


> Hi
> I have read from somewhere, that for those in the "know", it is possible to read from the serial number from where in the wafer the chiplet(s) in the processor originate.
> The center of the wafer provides the best samples, while towards the edges, quality gets worse.
> I did try to look pictures of processors, from different reviewers sites, and it seemed, that for 5950x they all had serials that started 9JF2......
> Mine, and all those of my 4 friends who have one, start 452.....
> This may have more bearing on the quality than the batch, perhaps the reviewers are given center-wafer samples.
> Or, it may mean nothing at all, and is total rubbish, apologies if so.


And how can you work out from the serial number what position on the wafer a specific CPU came from?


----------



## boyman

Anthosm said:


> And how can you extrapolate based on the serial number at what position on the wafer was a specific cpu?


I have absolutely no idea.
I tried for days and days to find something on it, but could not.
If it is true, it is very discreet information.
And it may even be untrue.

Edit: Well, if it is true, it seems that the first few characters give out this information. Perhaps there is a coordinate system; it could easily be, say, the first 4 characters. Or 2, as letters are used too, but I don't know how many chiplets come from a wafer, nor the layout.


----------



## boyman

glith said:


> Mine was 9JG5 and had problems...


And this might invalidate the whole idea, though, we are talking about probabilities of bad quality increasing toward the edges...


----------



## brasoveanul

Deepcuts' bad processor is from a serial number range that starts with 9JF3. I am not sure about mine; it may have been from another serial number range. Thus, I don't think we can draw a definitive conclusion from this criterion.


----------



## boyman

brasoveanul said:


> Deepcut's bad processor is from a serial number range that starts with 9JF3, I am not sure about mine, it may have been another serial number range. Thus, I don't think we can infer a definitive reasoning on this criteria.


Seems that way. And, anyway, we'd need a much larger data set to draw any realistically valid conclusions.
A lottery is a lottery.


----------



## OndrejVasicek

OK guys, I'm joining the party. I received my 5950X in the middle of December. Everything at base clocks, not even DOCP. Fresh Windows installation. Everything was OK.

Then idle black-screen freezes appeared. They happened a few times per day (0-5). After every restart there was the obligatory WHEA-Logger entry (Machine Check Exception / Cache Hierarchy Error). I tried everything mentioned here and elsewhere, except fixes that cripple the CPU (like turning off its ability to boost).

After seeing many people solve the problem by changing the CPU, I decided to do the same. I gave the CPU back and bought a new one. I have been running the new CPU since Sunday, which is 4 days, without a problem. Even the performance is better; the original one was quite bad in this respect.

But then, just a few minutes ago, I left the PC for a few minutes and came back to see Windows restarted ☹ fuu******k. WHEA-Logger error. So I guess I'm just bloody unlucky, or it's a BIOS/motherboard problem.
What do you think, guys?

ASUS ROG STRIX X570-E (BIOS 3001; I also had the new beta for a moment with the old CPU, and I'm not rock-solid sure, but I think there was no black screen for those two days)
Noctua NH-D15
G.SKILL 64GB KIT DDR4 4000MHz CL18 Ripjaws V
SSD Samsung 980 Pro 1TB
Corsair RM1000x 1000W
GTX 1080Ti (from previous PC)


----------



## BluePaint

@OndrejVasicek 
How is your RAM + FCLK configured?


----------



## GamBoTron

OndrejVasicek said:


> Ok guys, I’m joining the party. I received my 5950X in mid-December. Everything at base clocks, not even DOCP. Fresh Windows installation. Everything was OK.
> 
> Then idle black-screen freezes appeared, a few times per day (0-5). After every restart there was the obligatory WHEA-Logger entry (Machine Check Exception/Cache Hierarchy Error). I tried everything mentioned here and elsewhere that doesn’t cripple the CPU (like turning off the ability to boost).
> 
> After seeing many people solve the problem by changing the CPU, I decided to do the same. I gave the CPU back and bought a new one. I have been running the new CPU since Sunday, which is 4 days, without a problem. Even the performance is better; the original one was quite bad in that respect.
> 
> But then, just a few minutes ago, I left the PC for a few minutes and came back to find Windows had restarted ☹ fuu******k. WHEA-Logger error. So I guess I’m just bloody unlucky, or it’s a BIOS/motherboard problem.
> What do you think, guys?
> 
> ASUS ROG STRIX X570-E (BIOS 3001 – I also had the new beta for a moment with the old CPU, and I’m not rock-solid sure, but I think there was no black screen for two days)
> Noctua NH-D15
> G.SKILL 64GB KIT DDR4 4000MHz CL18 Ripjaws V
> SSD Samsung 980 Pro 1TB
> Corsair RM1000x 1000W
> GTX 1080Ti (from previous PC)


Not sure, but I heard some users say that changing their RAM solved the issues. It might be something else though, but it's worth checking out.


----------



## OndrejVasicek

BluePaint said:


> @OndrejVasicek
> How is your RAM + FCLK configured?


When the problems started (with the old CPU, after the Windows installation), the RAM was at base SPD clocks: 2666MHz at 1.2V, with FCLK at 1333MHz. Then I decided to try the limits just for fun (I didn’t want to do the final tweaks with a faulty CPU) and tested different combinations of DDR frequency and FCLK, some synced and some unsynced, and measured the performance (I was even able to run RAM 4000 / 2000 FCLK, but with a lot of WHEA RAM corrections). I ended up with:

RAM downclocked from 4000 to 3733, 1.4V to 1.375V
FCLK at 1867 – so the frequencies are 1:1
I was about to do proper testing in the next few days, but now I’m not sure.
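
The 1:1 figures above follow from simple arithmetic: DDR memory transfers data twice per clock, so the real memory clock is half the advertised DDR rating, and a synced FCLK equals that memory clock. A small sketch of the calculation (the function name is just illustrative):

```python
def fclk_for_1_to_1(ddr_rating: float) -> float:
    """Return the FCLK in MHz that runs 1:1 with the memory clock.

    DDR transfers twice per clock, so the memory clock is half the
    advertised rating (e.g. DDR4-3733 -> about 1867 MHz).
    """
    return ddr_rating / 2


# The three configurations mentioned in the post above.
for rating in (2666, 3733, 4000):
    print(f"DDR4-{rating}: MEMCLK = FCLK = {fclk_for_1_to_1(rating):.1f} MHz")
```

This reproduces the 2666/1333, 3733/1867 (after rounding) and 4000/2000 pairs quoted in the post.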


----------



## GRABibus

OndrejVasicek said:


> Ok guys, I’m joining the party. I received my 5950X in mid-December. Everything at base clocks, not even DOCP. Fresh Windows installation. Everything was OK.
> 
> Then idle black-screen freezes appeared, a few times per day (0-5). After every restart there was the obligatory WHEA-Logger entry (Machine Check Exception/Cache Hierarchy Error). I tried everything mentioned here and elsewhere that doesn’t cripple the CPU (like turning off the ability to boost).
> 
> After seeing many people solve the problem by changing the CPU, I decided to do the same. I gave the CPU back and bought a new one. I have been running the new CPU since Sunday, which is 4 days, without a problem. Even the performance is better; the original one was quite bad in that respect.
> 
> But then, just a few minutes ago, I left the PC for a few minutes and came back to find Windows had restarted ☹ fuu******k. WHEA-Logger error. So I guess I’m just bloody unlucky, or it’s a BIOS/motherboard problem.
> What do you think, guys?
> 
> ASUS ROG STRIX X570-E (BIOS 3001 – I also had the new beta for a moment with the old CPU, and I’m not rock-solid sure, but I think there was no black screen for two days)
> Noctua NH-D15
> G.SKILL 64GB KIT DDR4 4000MHz CL18 Ripjaws V
> SSD Samsung 980 Pro 1TB
> Corsair RM1000x 1000W
> GTX 1080Ti (from previous PC)


All these problems may be showing fast CPU degradation, because even at stock it can peak at 1.5V at idle due to a simple background task.

Guys, it is time to stop AMD builds and wait for the 11900K.


----------



## machine038

OndrejVasicek said:


> was even able to run RAM 4000 / 2000 FCLK


Oh yeah, that would be the cause of the restarts and correctable WHEA errors in your case, since you replaced it with a CPU that works fine at default settings.
I don't think it's possible to run IF (FCLK) at 2000 at the moment; maybe it never will be.
Anything above 1600 is considered overclocking and is not guaranteed (not even with DOCP).

With the current BIOS from ASUS (3001) I've read reports of IF @ 1900 being alright.

Have you tried checking out the DRAM Calculator? VSOC, VDDP, IOD, CDD might need some adjustment if you're running that high.


----------



## Midian

GRABibus said:


> All these problems may be showing fast CPU degradation, because even at stock it can peak at 1.5V at idle due to a simple background task.
> 
> Guys, it is time to stop AMD builds and wait for the 11900K.


This is normal, it's just one core boosting; voltage should decrease once the task is done.

"I'm specifically looking for reports where the voltage is stuck at a particular value, or a small range of values, around 1.4V--no matter how long you sit there and watch it. It is perfectly okay if your CPU is periodically using 1.4-1.5V to achieve boost frequencies, and you should see dips into sub-1.0V as the CPU goes into idle. These dips may be brief, and that's okay. Load voltages of around 1.2-1.3V are perfectly okay also. This is the processor working as expected. Ryzen is a highly dynamic system, with up to 1000 voltage and clockspeed changes every second. You will see a lot of bouncing around as you work with your system." - Robert Hallock
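
To make the distinction in that quote concrete, here is a rough sketch that checks a series of logged core-voltage samples for the "stuck" pattern. The 1.4V and sub-1.0V figures come from the quote; the 0.05V band width, the function name and the sample values are my own assumptions for illustration, and how you log the samples (HWiNFO, Ryzen Master, etc.) is up to you.

```python
from typing import Sequence


def looks_stuck(samples_v: Sequence[float],
                high_v: float = 1.4,
                idle_v: float = 1.0,
                band_v: float = 0.05) -> bool:
    """Return True if the voltage sits in a narrow band near high_v.

    high_v and idle_v follow the quoted figures; band_v is an assumed
    tolerance for "a small range of values".
    """
    if not samples_v:
        raise ValueError("need at least one voltage sample")
    lowest, highest = min(samples_v), max(samples_v)
    dips_to_idle = lowest < idle_v              # healthy CPUs dip below ~1.0V
    narrow_band = (highest - lowest) <= band_v  # almost no movement at all
    return narrow_band and lowest >= high_v - band_v and not dips_to_idle


# Made-up example logs, not measurements from this thread.
healthy_log = [1.45, 0.95, 1.25, 1.48, 0.90]
stuck_log = [1.40, 1.41, 1.42, 1.40]
print(looks_stuck(healthy_log), looks_stuck(stuck_log))
```

Brief peaks at 1.4-1.5V with dips in between come out as not stuck, which matches what the quote describes as normal behaviour.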


----------



## GRABibus

Midian said:


> This is normal, it's just one core boosting; voltage should decrease once the task is done.
> 
> "I'm specifically looking for reports where the voltage is stuck at a particular value, or a small range of values, around 1.4V--no matter how long you sit there and watch it. It is perfectly okay if your CPU is periodically using 1.4-1.5V to achieve boost frequencies, and you should see dips into sub-1.0V as the CPU goes into idle. These dips may be brief, and that's okay. Load voltages of around 1.2-1.3V are perfectly okay also. This is the processor working as expected. Ryzen is a highly dynamic system, with up to 1000 voltage and clockspeed changes every second. You will see a lot of bouncing around as you work with your system." - Robert Hallock


Yes, I know this wording from Hallock.
So the question is: why do most people see their first WHEAs and BSODs only after some days?


----------



## Anthosm

GRABibus said:


> All these problems may be showing fast CPU degradation, because even at stock it can peak at 1.5V at idle due to a simple background task.
> 
> Guys, it is time to stop AMD builds and wait for the 11900K.


Up to 1.5V is within spec.


----------



## Midian

GRABibus said:


> Yes, I know this wording from Hallock.
> So the question is: why do most people see their first WHEAs and BSODs only after some days?


I don't know. I only had one, but that was on the old Windows install, so it was probably because of that; after the reinstall, zero. Either way, it shouldn't be because of the CPU voltage.


----------



## Anthosm

Midian said:


> I don't know. I only had one, but that was on the old Windows install, so it was probably because of that; after the reinstall, zero. Either way, it shouldn't be because of the CPU voltage.


I was the complete opposite. Old installation: zero problems. Fresh installation: constant WHEAs. After a day they went away and didn't come back. It's weird how this problem manifests in such varied ways.


----------



## Midian

Anthosm said:


> I was the complete opposite. Old installation: zero problems. Fresh installation: constant WHEAs. After a day they went away and didn't come back. It's weird how this problem manifests in such varied ways.


Now that is really peculiar. For me, the only unique thing that happened was that I removed the old SATA HDD and SSD, went NVMe only, and reinstalled Windows.


----------



## boyman

It is odd (apologies to all of you, AMD, and Lisa Su), not just in this forum, that when suggestive ideas of bad product quality arise, the conversation is drowned in repetitive, non answerable posts.


----------



## aa.delite

Use Ryzen Master or CPU-Z to monitor the real voltage and frequency. You may be surprised: there is no 1.5V at idle.
I've found the best way to test a CPU for reboots is Geekbench 5, up to 5 passes. It includes browsing and other tests.


----------



## glith

Asus has posted a new beta bios 3102 for the rog crosshair VIII dark hero.


----------



## smbell1979

glith said:


> Asus has posted a new beta bios 3102 for the rog crosshair VIII hero.


Not seeing it here in US.

Edit: Found it in the Crosshair VIII Hero thread.


----------



## glith

Asus has posted a new beta bios 3102 for the rog crosshair VIII hero. (Edited: Dark)


smbell1979 said:


> Not seeing it here in US.


So sorry, I meant the _dark_ hero.
But the ordinary shouldn't be far off..


----------



## BluePaint

GRABibus said:


> All these problems may be showing fast CPU degradation, because even at stock it can peak at 1.5V at idle due to a simple background task.


Doubt it. It's more likely that the boost is too aggressive. The 5950X is officially specced at 4900 MHz max but boosts up to 5050 MHz out of the box. I guess that not all CPU samples can handle that ambitious boost behavior.


----------



## xeizo

BluePaint said:


> Doubt it. It's more likely that the boost is too aggressive. The 5950X is officially specced at 4900 MHz max but boosts up to 5050 MHz out of the box. I guess that not all CPU samples can handle that ambitious boost behavior.


I doubt the degradation too. My 3900X was purchased at launch and has been a beater. It did folding for months etc. and it still performs as it should and boosts as well as ever, 4625MHz on four cores. It's the same process, so if the 3900X didn't degrade, I doubt the 5000 series will degrade.

Too much boost, yes, that is what I think. Much the same as what hit Nvidia with the RTX 3080, where they had to reduce boost via the drivers pretty fast.


----------



## OndrejVasicek

machine038 said:


> Oh yeah, that would be the cause of the restarts and correctable WHEA errors in your case, since you replaced it with a CPU that works fine at default settings.
> I don't think it's possible to run IF (FCLK) at 2000 at the moment; maybe it never will be.
> Anything above 1600 is considered overclocking and is not guaranteed (not even with DOCP).
> 
> With the current BIOS from ASUS (3001) I've read reports of IF @ 1900 being alright.
> 
> Have you tried checking out the DRAM Calculator? VSOC, VDDP, IOD, CDD might need some adjustment if you're running that high.


You got me wrong (sorry for my English, I'm not a native speaker), or I don’t understand correctly, but the restarts happened and still happen at the defaults – 2666/1333. The testing I was talking about was just a brief phase. So, simply put: regardless of the RAM/FCLK setting, the PC restarts even with the new CPU.

So I’m asking – do you think it’s another bad CPU, or might there be some different problem? It would be pretty awkward to give back another "good" CPU.


----------



## slvr

slvr said:


> (MSI MAG Tomahawk X570, 5950X, memory is 4x16gb 3600 CL16 GSkill Ripjaws V)
> Having these problems (WHEA errors, constant reboots when idle), dug the entire web for the solution and ended up with this:
> 
> 
> BIOS v151 beta
> Disabled Global C State
> Curve optimizer +2 all core
> XMP enabled
> The rest is stock, i.e. it boosts single core 3.8, single core 4.9
> 
> no crashes so far. Finally, I am able to use my PC after three days of non-stop investigation.
> Waiting for a new BIOS with AGESA 1.1.9.0 ...


I'd like to post an update.
With AGESA 1.1.9.0 (that's BIOS version v153 for my X570 Tomahawk), everything works fine on auto with XMP enabled. 
No more idle reboots, no WHEA no nothing.


----------



## BluePaint

@slvr 
How are performance and clocks? Maybe you can do a quick benchmark like CPU-Z with PBO enabled?


----------



## machine038

GRABibus said:


> Because even at stock, it can peak at 1.5V at idle due to a simple background task.


1.5V on ST workloads is within specification.



OndrejVasicek said:


> You got me wrong (sorry for my English, I'm not a native speaker), or I don’t understand correctly, but the restarts happened and still happen at the defaults – 2666/1333. The testing I was talking about was just a brief phase. So, simply put: regardless of the RAM/FCLK setting, the PC restarts even with the new CPU.
> 
> So I’m asking – do you think it’s another bad CPU or there might be some different problem? It would be pretty awkward to give back another "good" CPU.


Oh, sorry, I got the wrong impression.
There are three things that I'd check first:


Are your AMD chipset drivers the latest version?
What is your BIOS version? There are tangible improvements in the latest BIOS updates and people are reporting that it solved their issues.
A CLEAR CMOS might help get rid of some configuration that got "stuck", like the Curve Optimizer. I saw someone claim that after setting the Curve Optimizer and then disabling PBO, stock Precision Boost still uses the Curve Optimizer settings. I tried that on my Asus board and it seems to check out.

I have an ASUS ROG STRIX X570-I. I think they removed the latest beta BIOS because there was a bug affecting the 3xxx CPUs. For me, 3201 was the only thing that fixed the correctable WHEA errors when leaving everything on auto after enabling DOCP.

Do I think it is another faulty CPU? Yeah, it could be. I'd be suspicious of the motherboard or RAM too at this point.
Maybe try to run it on another motherboard if you can, or with only one stick of RAM.



boyman said:


> It is odd (apologies to all of you, AMD, and Lisa Su), not just in this forum, that when suggestive ideas of bad product quality arise, the conversation is drowned in repetitive, non answerable posts.


Because when suggestive ideas of bad product quality arise, they are usually laced with incorrect statements.
I got a bad CPU, and I agree the QC is lacking; as to why, we can only conjecture.
Used the manufacturer warranty, got a replacement, new CPU works great.


----------



## Deepcuts

machine038 said:


> WHEA errors on leaving everything on auto after enabling DCOP.


We have an undercover 5-0 in the house.
He is enabling DCOP behind our backs.


----------



## machine038

Deepcuts said:


> We have an undercover 5-0 in the house.
> He is enabling DCOP behind our backs.


Had me scratching my head there for a while.
I will fix it, thanks.


----------



## Imraneo

Ok.. Asus got the 3201 removed for my Strix X570-F.
But this morning, I see a 3202 in, AGESA 1.1.9.0. Not having high hopes here, but will check it out end of the day.
Other Asus boards should have new BIOSes now too.


----------



## Imraneo

slvr said:


> I'd like to post an update.
> With AGESA 1.1.9.0 (that's BIOS version v153 for my X570 Tomahawk), everything works fine on auto with XMP enabled.
> No more idle reboots, no WHEA no nothing.


May I ask how bad was your situation prior to this?
Constant idle reboots? Are you using 5900X?

Cheers & Congrats


----------



## KatanaSW

Hello guys, so I have a 5600X; it has not been opened yet. The seal is intact. Unfortunately, like the rest of you, replacement in my country is a no-go. Only RMA is a possibility and even that would take months. I’m building my first PC in 5 years and definitely don’t want to go through such issues. So should I give this away and get a 10700K? I’m only going to be gaming at 1440p, never going to overclock, and will only enable XMP (G.Skill 3600MHz CL16 Ripjaws V). Any response would help.


----------



## Imraneo

KatanaSW said:


> Hello guys, so I have a 5600X; it has not been opened yet. The seal is intact. Unfortunately, like the rest of you, replacement in my country is a no-go. Only RMA is a possibility and even that would take months. I’m building my first PC in 5 years and definitely don’t want to go through such issues. So should I give this away and get a 10700K? I’m only going to be gaming at 1440p, never going to overclock, and will only enable XMP (G.Skill 3600MHz CL16 Ripjaws V). Any response would help.


Based on all the posts I've read regarding this issue, I have seen only some 5800X and mainly 5900X and 5950X.
Not a single 5600X. It seems to be that higher core processors are generally more complex with more chiplets, which causes higher fallout. Thus, you need not be worried.
Again, I'll let more experts chime in on this!


----------



## KatanaSW

Imraneo said:


> Based on all the posts I've read regarding this issue, I have seen only some 5800X and mainly 5900X and 5950X.
> Not a single 5600X. It seems to be that higher core processors are generally more complex with more chiplets, which causes higher fallout. Thus, you need not be worried.
> Again, I'll let more experts chime in on this!


Hello, thanks for the reply. Yes, as you said in this forum not many people have issues with the 5600x. But Reddit is plagued with 5600x bsod’s. That’s why I wanted to be sure before opening the box.


----------



## xeizo

The higher-end CPUs boost higher; a lot points to an overly generous boost being what creates the problems.


----------



## Deepcuts

Imraneo said:


> Based on all the posts I've read regarding this issue, I have seen only some 5800X and mainly 5900X and 5950X.
> Not a single 5600X.


That is why you have the poll.
All 5 users with 5600x are having problems.
later edit:
and @kilianlievens got a working replacement.


----------



## OndrejVasicek

machine038 said:


> Oh, sorry, I got the wrong impression.
> There are three things that I'd check first:
> 
> 
> Are your AMD chipset drivers the latest version?
> What is your BIOS version? There are tangible improvements in the latest BIOS updates and people are reporting that it solved their issues.
> A CLEAR CMOS might help get rid of some configuration that got "stuck", like the Curve Optimizer. I saw someone claim that after setting the Curve Optimizer and then disabling PBO, stock Precision Boost still uses the Curve Optimizer settings. I tried that on my Asus board and it seems to check out.
> 
> I have an ASUS ROG STRIX X570-I. I think they removed the latest beta BIOS because there was a bug affecting the 3xxx CPUs. For me, 3201 was the only thing that fixed the correctable WHEA errors when leaving everything on auto after enabling DOCP.
> 
> Do I think it is another faulty CPU? Yeah, it could be. I'd be suspicious of the motherboard or RAM too at this point.
> Maybe try to run it on another motherboard if you can, or with only one stick of RAM.



Yes - 2.10.13.408
I had 3001. Before returning the old CPU I was running 3201 (I managed to download it before Asus removed it). Today I installed 3202.
Let's see what it brings. So far the performance has dropped a bit in R23: from 1608/25743 to 1601/25471.
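
To put the R23 drop above in perspective, here is a quick percentage calculation. I am assuming the first number of each pair is the single-core score and the second the multi-core score; the helper name is just illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# R23 scores quoted above; the single/multi labelling is an assumption.
scores = {
    "single-core": (1608, 1601),
    "multi-core": (25743, 25471),
}

for name, (before, after) in scores.items():
    print(f"{name}: {percent_change(before, after):+.2f}%")
```

Both changes are around one percent or less, which may well be within normal run-to-run variation.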


I didn't use the Curve Optimizer. I didn't even find anything like it in my BIOS.

Thanks for the advice. I'll wait and see if another restart happens with this BIOS, and if it does I'll check the RAM.
Unfortunately, I’m not able to change the motherboard.


----------



## Catscratch

KatanaSW said:


> Hello, thanks for the reply. Yes, as you said in this forum not many people have issues with the 5600x. But Reddit is plagued with 5600x bsod’s. That’s why I wanted to be sure before opening the box.


What's your current rig ?


----------



## Imraneo

Quick update.
BIOS 3202 AGESA 1190. Same ****


----------



## GamBoTron

slvr said:


> I'd like to post an update.
> With AGESA 1.1.9.0 (that's BIOS version v153 for my X570 Tomahawk), everything works fine on auto with XMP enabled.
> No more idle reboots, no WHEA no nothing.


Nice, must be a great feeling 😁 I'm getting my CPU shortly, same motherboard as you; hopefully mine works as smoothly.


----------



## brasoveanul

Imraneo said:


> Quick update.
> BIOS 3202 AGESA 1190. Same ****


Return and exchange it as soon as you have this possibility.


----------



## Imraneo

brasoveanul said:


> Return and exchange it as soon as you have this possibility.


Yup. Waiting for AMD to respond to my RMA request.
Basically my chip is so sensitive that it has to be run at around 1.05 - 1.15V to be stable. Auto settings bring it to 1.44V.


----------



## frollic

frollic said:


> AMD received my CPU today, waiting for them to send me a new one.


Return approved, we'll see if the new CPU ships tomorrow ....



> I noticed Gigabyte removed all recent BIOS updates for my B550.
> The only one left is vF10 which is from mid Sep. All F11s are gone.


F11 non beta is out....


----------



## GRABibus

I am curious about one thing:
When you ask for an RMA, I assume they ask you for the reasons?
What exactly do you say to AMD?


----------



## yaniv82

GRABibus said:


> I am curious about one thing:
> When you ask for an RMA, I assume they ask you for the reasons?
> What exactly do you say to AMD?


They ask to send a picture of the CPU seated on the motherboard / showing the serial number, proof of purchase and "Details of all the steps and resolution tests performed, including the results of these, where it is shown that the processor is defective".
I listed everything I tried to fix the issue (including assembling the computer, checking all cables and connections, reseating the cpu cooler/thermal paste, testing components, bios default settings, updated drivers and OS, different ram configuration ) and sent screenshots of the stress tests performed.


----------



## GRABibus

yaniv82 said:


> They ask to send a picture of the CPU seated on the motherboard / showing the serial number, proof of purchase and "Details of all the steps and resolution tests performed, including the results of these, where it is shown that the processor is defective".
> I listed everything I tried to fix the issue (including assembling the computer, checking all cables and connections, reseating the cpu cooler/thermal paste, testing components, bios default settings, updated drivers and OS, different ram configuration ) and sent screenshots of the stress tests performed.


So crazy....
We should expect the reverse: that they explain to us how to solve it....
In fact, as beta testers, we are helping them fill their database by collecting issues from the field...
What a bullshit company, really.


----------



## yaniv82

GRABibus said:


> So crazy....
> We should expect the reverse: that they explain to us how to solve it....
> In fact, as beta testers, we are helping them fill their database by collecting issues from the field...
> What a bullshit company, really.


Totally. I sent my CPU 2 weeks ago and the RMA was approved on Monday, but I haven't received any notification regarding the replacement ETA and they haven't replied to my emails.


----------



## Yuke

Can't talk about Zen3 yet, but Zen2 was rock stable at stock and is also running stable at +0.05V offset with EDC/PBO2 (easy 1.6V boost voltage in low loads)... those "overclock" settings had reboot issues at first, but that got solved by disabling C-states.

You really gotta try hard to "degrade" it, like many are claiming here...


----------



## frollic

yaniv82 said:


> They ask to send a picture of the CPU seated on the motherboard / showing the serial number, proof of purchase and "Details of all the steps and resolution tests performed, including the results of these, where it is shown that the processor is defective". I listed everything I tried to fix the issue (including assembling the computer, checking all cables and connections, reseating the cpu cooler/thermal paste, testing components, bios default settings, updated drivers and OS, different ram configuration ) and sent screenshots of the stress tests performed.


They didn't ask that of me.
Obviously they wanted the serial #, but no pictures, or proof of purchase, nothing, only a description of the problem,
and what I had tried to do, to mitigate the error.

This is a screen dump of the description I sent them, it's the 1st and only reply back to them after initiating the RMA.


----------



## GRABibus

I receive my rig with 5900x tomorrow.
I will soon see whether I lost the « Ryzen » lottery.


----------



## frollic

yaniv82 said:


> Totally. I sent my CPU 2 weeks ago and the RMA was approved on Monday, but I haven't received any notification regarding the replacement ETA and they haven't replied to my emails.


This is weird too, my approval only took 48 hrs, they received the CPU on Tue, approval mail came in earlier today.


----------



## GRABibus

The worst thing is that we, customers, have the feeling that they don’t care because they don’t communicate at all.

Is there a way to file some kind of complaint about their « silence »?


----------



## rob-tech

Yuke said:


> Can't talk about Zen3 yet, but Zen2 was rock stable at stock and is also running stable at +0.05V offset with EDC/PBO2 (easy 1.6V boost voltage in low loads)... those "overclock" settings had reboot issues at first, but that got solved by disabling C-states.
> 
> You really gotta try hard to "degrade" it, like many are claiming here...


Stable in normal usage, however I wouldn't call it rock stable (at least in my case). My 3950X passes all tests with the exception of Prime95 small FFTs with AVX2: at stock the system becomes TDC constrained and voltage dips to about 0.925V, usually triggering a single worker stoppage after about 1-2 hours. I went through three 3950X units and two motherboards.

I am using the X570 Aorus Xtreme with Seasonic Titanium 850 watt power supply. Maybe it works on other motherboard models, however AMD really dropped the ball on Zen 2 and the binning seems too tight. At least the system doesn't crash after 10 seconds with a reboot like my second processor. I'm also currently pleased with this system as I have months of usage with hundreds of hours and no crash of any kind.

I'm looking forward to my Zen 3 lottery (not really)


----------



## JohnnyFlash

I've seen multiple posts on reddit saying that changing the nvme slots to x3 fixed it for some people, anyone here try that?


----------



## reqq

JohnnyFlash said:


> I've seen multiple posts on reddit saying that changing the nvme slots to x3 fixed it for some people, anyone here try that?


Yeah, some people tried a different M.2 drive and the WHEA errors went away..


----------



## silot

Reporting in: 5900X, Asus Strix-E X570, 3080 Amp Holo. My WHEA errors started with BIOS 3001; I was fine on 2802, but I can't go back to 2802 to be 100% sure.


----------



## Imraneo

Time to work on my info for RMA. This is what they asked for:



> In order to review the issue further I request you provide the below requested details.
> · Could you please provide me the screenshot showing latest BIOS and chipset drivers? Please ensure that your system BIOS is on factory default settings
> · And I request you provide the screenshot of the Ryzen master showing the CPU temperature and other parameters.
> · What kind of troubleshooting have you completed in order to determine that this is in fact a processor issue? (E.g. was any component swap/BIOS update performed) Please do describe in brief what you have done.
> · I request you provide the complete system configuration details for the better understanding of the issue (GPU, RAM, memory module, OS, PSU, etc).
> · please check with your heatsink/fan retailor to see if your HSF meets the processor's thermal design power request and please provide the heatsink/fan model or product link on manufacture website.
> · And I request you to send Dxdiag report:
> To get the Dxdiag report, please follow the below steps:
> 
> Click on "START" > Click "Run” > Type in "dxdiag" and click "OK" > Click "Save all Information”. Please attach this to the email.
> In order to update this service request, please respond without deleting or modifying the service request reference number in the email subject or in the email correspondence below.
> Please Note: This service request will automatically close if we do not receive a *response within 10 days* and cannot be reopened.
> If it is not feasible to respond within 10 days, feel free to open a new service request and reference this ticket for continued support.
> Best regards,
> AMD Global Customer Care
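
If you would rather script the Dxdiag step than click through the dialog, the sketch below is one way to do it. It assumes Windows and relies on the `dxdiag /t <file>` command-line switch, which writes the text report straight to a file; the helper name and the output file name are just illustrative.

```python
import subprocess
import sys
from pathlib import Path


def dxdiag_command(report_path: Path) -> list[str]:
    """Build the command that saves a DxDiag text report to report_path.

    The helper name is an illustrative assumption; /t is the dxdiag
    switch for writing a text report.
    """
    return ["dxdiag", "/t", str(report_path)]


if __name__ == "__main__" and sys.platform == "win32":
    report = Path.home() / "dxdiag_report.txt"  # hypothetical output location
    subprocess.run(dxdiag_command(report), check=True)
    print(f"Report requested at {report}")
```

The report can take a little while to appear, so check that the file exists before attaching it to the email.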


----------



## MikeS3000

So my 5900x is running on an x570 Gigabyte board with AGESA 1.1.0.0 patch D. My #1 core (gold star in Ryzen Master) fails Prime95 Large FFT, non-avx single thread in a matter of 2 minutes at BIOS defaults. I can only stabilize by running a Positive 5 on that core on my curve optimizer. Would you RMA the cpu or is this just buggy AGESA and wait for 1.1.9.0?


----------



## Deepcuts

MikeS3000 said:


> So my 5900x is running on an x570 Gigabyte board with AGESA 1.1.0.0 patch D. My #1 core (gold star in Ryzen Master) fails Prime95 Large FFT, non-avx single thread in a matter of 2 minutes at BIOS defaults. I can only stabilize by running a Positive 5 on that core on my curve optimizer. Would you RMA the cpu or is this just buggy AGESA and wait for 1.1.9.0?


If the CPU goes belly up at BIOS defaults, it is 100% clear that CPU is busted.
A future firmware might stabilize it, but the fact remains: that is a defective CPU with a band-aid on it.
I would return the CPU.


Also, to all newcomers to this thread: pretty please, stop suggesting alternative settings and tweaks without even reading the 1st post (not to mention the rest of them). We have tried them all.
The only solution is RMA and hope to get a working CPU.


----------



## MikeS3000

I hear ya on the RMA. I ordered from Antonline so we'll see how easy this is. I emailed them to open up a support ticket today but haven't heard back. The issue is whether or not they have a replacement in stock and how long I would have to be without a computer. I've got it tuned so it rarely crashes now at idle, but still pretty annoying to discover this bad core.


----------



## Anthos

Deepcuts said:


> Also, to all newcomers to this thread: pretty please, stop suggesting alternative settings and tweaks without even reading the 1st post (not to mention the rest of them). We have tried them all.
> The only solution is RMA and hope to get a working CPU.


I completely disagree. First of all, not everyone sees this appear in exactly the same way. Second, lots of people have managed to stabilize their systems in different ways: some by disabling C-states, others could not; some by increasing vcore, others could not; others by playing with the curve, others could not; others by decreasing the IF/DOCP settings, others could not; others by flashing a different BIOS, others could not. For some, the only solution seems to be RMA. How can you say "We" when there is such a varied response? The more information we have, the better our chances of understanding this thing. Anyway, I could go on, but I'll just drop it here. At the end of the day it is a.. discussion forum.


----------



## Deepcuts

Anthos said:


> I completely disagree. First of all, not everyone sees this appear in exactly the same way. Second, lots of people have managed to stabilize their systems in different ways: some by disabling C-states, others could not; some by increasing vcore, others could not; others by playing with the curve, others could not; others by decreasing the IF/DOCP settings, others could not; others by flashing a different BIOS, others could not. For some, the only solution seems to be RMA. How can you say "We" when there is such a varied response? The more information we have, the better our chances of understanding this thing. Anyway, I could go on, but I'll just drop it here. At the end of the day it is a.. discussion forum.


You have to be kidding.
Do you think disabling C-States, increasing Core voltage, "playing with the curve" and maybe downgrading PCIe speeds are solutions to this problem?
You and I have vastly different views on what a stable system is.


----------



## brasoveanul

Anthos said:


> I completely disagree. First of all, not everyone sees this appear in exactly the same way. Second, lots of people have managed to stabilize their systems in different ways: some by disabling C-states, others could not; some by increasing vcore, others could not; some by playing with the curve, others could not; some by decreasing the IF/DOCP settings, others could not; some by flashing a different BIOS, others could not. For some the only solution seems to be RMA. How can you say "We" when there is such a varied response? The more information we have, the better our chances of understanding this thing. Anyway, I could go on but I'll just drop it here. At the end of the day it is a... discussion forum.


A decent quality processor would just be installed on the motherboard and it would just work. The settings should be changed only to optimize its performance, not to "stabilize" it. It is so easy to understand and from the realm of the basic common sense that I am astonished to see opinions like yours.


----------



## Nolan21

hi,

I am sorry I did not see this forum topic earlier. I tried every possible combination I could find on Reddit and the AMD forums. Some of them helped a bit but I still had reboots. Nothing overclocked at all. Defaults loaded because I was afraid the CPU might burn or something. Around 85 degrees in games and 45-50 at idle.
Using a Gigabyte mainboard and 5950x. I got lucky and the shop replaced it the same day I sent it in.
2 days now with the new CPU and everything is very stable. Even temperature is a lot lower. Coming from Intel and never had such problems, like ever. Hope this second cpu will play nice in the future.


----------



## JohnnyFlash

Nolan21 said:


> 2 days now with the new CPU and everything is very stable. Even temperature is a lot lower. Coming from Intel and never had such problems, like ever. Hope this second cpu will play nice in the future.


I wonder if they're cherry-picking RMAs, or if this means the new batches are safer than the original release. Either way, great news!


----------



## Anthos

brasoveanul said:


> A decent quality processor would just be installed on the motherboard and it would just work. The settings should be changed only to optimize its performance, not to "stabilize" it. It is so easy to understand and from the realm of the basic common sense that I am astonished to see opinions like yours.





> You have to be kidding.
> Do you think disabling C-States, increasing Core voltage, "playing with the curve" and maybe downgrading PCIe speeds are solutions to this problem?
> You and I have vastly different views on what a stable system is.


What part of "discussion forum" do you lot fail to understand? Close the damn thread then, and just edit the original post to read "If you are having WHEA errors RMA your CPU" end of story. What's the point in discussing anything then?

P.s And what's acceptable for you doesn't mean it's acceptable for everybody else. ffs.


----------



## pSickOpatA

Well, I'm using a new chip now and don't wanna jinx it or anything... but so far it works flawlessly.
I'm even able to use a browser for over 10 seconds without a reboot, that's a huge *W*.

Gaming all day, no issues... stress test OK. Oh man, that feels nice after about a month of pure stress.


----------



## Imraneo

I agree that if you have to tweak the BIOS one bit to get it to run, it means you have a flawed CPU.
What this discussion does help with is letting us dive in further and temporarily stabilize the system until a BIOS update fixes it or you get convinced to RMA it. At least that's how it is for me.


----------



## rob-tech

If it's not stable at stock, it's defective and you guys should RMA. Hopefully, this is fixed shortly and they don't continue pushing this crap going forward, it's a waste of our time and quite frankly insulting to customers. 

Maybe when they get a bunch of returns they will reconsider their approach to quality control, it's unbelievable what I am reading here. No band aid fix should have to be applied for a CPU to be stable at stock.


----------



## brasoveanul

Anthos said:


> What part of "discussion forum" do you lot fail to understand? Close the damn thread then, and just edit the original post to read "If you are having WHEA errors RMA your CPU" end of story. What's the point in discussing anything then?
> 
> P.s And what's acceptable for you doesn't mean it's acceptable for everybody else. ffs.


It seems that you don't understand or don't want to understand some things...At first, people thought that a buggy bios, maybe a bad motherboard, generate the reported issues. It has fairly long been proven that the root and main cause is a sub-standard manufacturing process of these new Ryzen 5000 processors. The BIOS bugs represent a secondary issue, if at all.


----------



## brasoveanul

At this point, the thread is more of a heads-up for any other unlucky customers who face these kinds of issues: don't waste any more time trying to "stabilize" the CPU and find working BIOS configurations, and simply return it as soon as possible.


----------



## jasstarr

Korital said:


> I had the exact same problem on the aorus master x570. What fixed it for me was I had to manually set the ram timings, ram dram voltage and vcore soc, and also manually select the proper infinity fabric frequency. I think there is some type of ram bug with the latest bios.
> 
> See if that helps you.


Just wanted to chime in and say that I was running into this WHEA BSOD error constantly with default BIOS settings. I have the Asus x570-Pro and 5800x. It was a brand new build that I built twice and wiped clean twice- I also changed several different BIOS settings. What worked for me was changing the ram speed (DOCP) DRAM FREQUENCY to 3200mhz. Even though I bought 3600mhz RAM.


----------



## xeizo

jasstarr said:


> Just wanted to chime in and say that I was running into this WHEA BSOD error constantly with default BIOS settings. I have the Asus x570-Pro and 5800x. It was a brand new build that I built twice and wiped clean twice- I also changed several different BIOS settings. What worked for me was changing the ram speed (DOCP) DRAM FREQUENCY to 3200mhz. Even though I bought 3600mhz RAM.


Working DOCP isn't a given; you should try setting voltages/timings/subtimings manually. I run 3600MHz memory at 3800MHz on two rigs with no problems, other than that I had to do some testing/verification myself.


----------



## Anthos

brasoveanul said:


> It seems that you don't understand or don't want to understand some things...At first, people thought that a buggy bios, maybe a bad motherboard, generate the reported issues. It has fairly long been proven that the root and main cause is a sub-standard manufacturing process of these new Ryzen 5000 processors. The BIOS bugs represent a secondary issue, if at all.


And it seems that some people don't understand that it's not really raining Ryzen Cpus around. For some it is not an option at the moment to RMA and wait for a month for a replacement. For them if they can change a couple of settings and everything is fine it could be acceptable for now. And you also seem to be under the impression that I am advocating that the only thing people should do is meddle in their bios changing stuff until their machine doesn't crash all the time. No! What I am saying is people should be free to give their 2 cents on the matter. If they tried a setting and it worked for them, just because it didn't work for OP that doesn't mean that they should stay silent.


----------



## aa.delite

If BSOD not caused by RAM, you should RMA. You should test 2400 MHz JEDEC RAM without DOCP/XMP. Up to 3200 should be stable.

Cache Hierarchy Error (WHEA_UNCORRECTABLE random reboots) usually means RMA. There is a chance you'll get stable CPU after some bios updates. Like me. But is it healthy or just alive with a band-aid? I can't use any negative Curve Optimizer. I don't trust this CPU instance. But it's stable at stock now so I can't return it anymore. So you should RMA while you can. To get rock stable better CPU with overclocking / curve downvoltage potential.


----------



## LAA

Deepcuts said:


> If the CPU goes belly up at BIOS defaults, it is 100% clear that CPU is busted.
> A future firmware might stabilize it, but the fact remains: that is a defective CPU with a band-aid on it.
> I would return the CPU.
> 
> 
> Also, to all newcomers to this thread: pretty please, stop suggesting alternative settings and tweaks without even reading the 1st post (not to mention the rest of them). We have tried them all.
> The only solution is RMA and hope to get a working CPU.


Not sure I agree with this.
I had WHEA errors and random restarts pretty much every day with my 5950x on x570 aorus master using F30 bios, all on stock settings too other than xmp.
Then I flashed F31 final just before the new year, again using all stock settings except XMP and not had an issue since.


----------



## Anthos

aa.delite said:


> If BSOD not caused by RAM, you should RMA. You should test 2400 MHz JEDEC RAM without DOCP/XMP. Up to 3200 should be stable.
> 
> Cache Hierarchy Error (WHEA_UNCORRECTABLE random reboots) usually means RMA. There is a chance you'll get stable CPU after some bios updates. Like me. But is it healthy or just alive with a band-aid? I can't use any negative Curve Optimizer. I don't trust this CPU instance. But it's stable at stock now so I can't return it anymore. So you should RMA while you can. To get rock stable better CPU with overclocking / curve downvoltage potential.


I was just now typing a response saying I have my build for 3 weeks and only had a whea error on day 7 of having my build and none since... and as I was typing it the pc restarted and in the event logger states a whea error. Wow you can't make this up, what are the chances. This is quite weird. 4 Whea errors on day 7, and the next one on day 20. it's quite hard finding a pattern to it. Because I would have expected it to happen a lot more frequently no matter what exactly is instigating this.


----------



## xeizo

Anthos said:


> I was just now typing a response saying I have my build for 3 weeks and only had a whea error on day 7 of having my build and none since... and as I was typing it the pc restarted and in the event logger states a whea error. Wow you can't make this up, what are the chances. This is quite weird. 4 Whea errors on day 7, and the next one on day 20. it's quite hard finding a pattern to it. Because I would have expected it to happen a lot more frequently no matter what exactly is instigating this.


There are suspicions that it is triggered because the CPU tries to boost too high; try reducing Boost Override and using less offset. Raising VDDG and SOC seems to have a positive impact too.

I have noticed the most replicable way to trigger a WHEA(without reboot) is just running CPUZ benchmark, running other benchmarks like CB, AIDA, Geekbench or 3DMark doesn't trigger any WHEA. CPUZ almost always does.


----------



## Anthos

xeizo said:


> There are suspicions that it is triggered because the CPU tries to boost too high; try reducing Boost Override and using less offset. Raising VDDG and SOC seems to have a positive impact too.
> 
> I have noticed the most replicable way to trigger a WHEA(without reboot) is just running CPUZ benchmark, running other benchmarks like CB, AIDA, Geekbench or 3DMark doesn't trigger any WHEA. CPUZ almost always does.


It's a bit weird. Over the past couple of weeks I ran the CPU-Z benchmark multiple times and never had a single WHEA error. Even a few days ago, when I was overclocking aggressively and a bit randomly, there was still no WHEA error (just your regular instability crashes). The only WHEA errors I've had were when doing simple things like browsing, aside from one that happened, if I remember correctly, as soon as a game map loaded.

Edit: also does anyone know in the whea error where it states "Processor APIC ID: 0" is the ID the number on the core affected or something else?
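
On the APIC ID question, here is a rough sketch, not something confirmed anywhere in this thread. It assumes SMT is enabled and that the ID is laid out the way it commonly is on Zen 3 desktop parts: bit 0 is the SMT thread, bits 1-3 are the core within a CCD, and the higher bits are the CCD index. The function name `decode_apic_id` and the bit layout are assumptions for illustration only, and the core numbering shown by Windows or monitoring tools may differ, so treat the result as a hint rather than proof of which core is affected.

```python
def decode_apic_id(apic_id):
    """Split an APIC ID into (ccd, core_in_ccd, thread).

    ASSUMED layout (not taken from this thread, verify for your CPU):
      bit 0     -> SMT thread within the core
      bits 1-3  -> core index within the CCD
      bits 4+   -> CCD index
    """
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    thread = apic_id & 0b1
    core_in_ccd = (apic_id >> 1) & 0b111
    ccd = apic_id >> 4
    return ccd, core_in_ccd, thread


if __name__ == "__main__":
    # A few sample IDs as they might appear in a WHEA-Logger event.
    for apic_id in (0, 1, 6, 17):
        ccd, core, thread = decode_apic_id(apic_id)
        print(f"APIC ID {apic_id}: CCD {ccd}, core {core}, thread {thread}")
```

Under that assumption, "Processor APIC ID: 0" would point at the first thread of the first core on the first CCD, which is why it is worth noting whether the ID changes between errors.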


----------



## Imraneo

My system was pretty stable for the first 5 days. Then came reboots during idle in Windows. At this point I am unable to boot into Windows at all.
I'm not the only one who saw their CPU "degrade" over time. What can you make of this? Really puzzling, but the bottom line is, it's just not right.


----------



## yaniv82

For anyone starting an RMA for a 5950x I just got this from AMD: 
_Please be informed as there is no stock for 100-100000059WOF inventory in our warehouse. The replacement part will be shipped once the stock arrives in the warehouse. I appreciate you cooperation in this regard._
No idea how long this will be.


----------



## aa.delite

Imraneo said:


> My system was pretty stable for the first 5 days. Then came reboots during idle in Windows. At this point I am unable to boot into Windows at all.
> I'm not the only one who saw their CPU "degrade" over time. What can you make of this? Really puzzling, but the bottom line is, it's just not right.


Try without XMP/DOCP memory settings using 2400 MHz memory speed. If you have defective CPU, you're good candidate to make some tests. Like Curve Optimizer +5 (+10?) all cores. Or overvoltage (Normal voltage + small positive offset). Or higher Loadline Calibration.


----------



## JohnnyFlash

Imraneo said:


> My system was pretty stable for the first 5 days. Then came reboots during idle in Windows. At this point I am unable to boot into Windows at all.
> I'm not the only one who saw their CPU "degrade" over time. What can you make of this? Really puzzling, but the bottom line is, it's just not right.


If it's the same install, it's possible it wasn't 100% stable and windows files were slowly being corrupted, which is why you can't get in now. I've seen that happen before with unstable overclocks, not that this was the reason in this case.


----------



## DemonAk

Found little utility, max boost tester

https://github.com/jedi95/BoostTester/releases/download/1.1/BoostTester.exe

Maybe we can trigger bsod or reboot using this tool

I already had a reboot without errors once (kernel power 41 (63))


----------



## reqq

I have no more WHEA errors with my memory overclock on the MSI beta 1.1.9.0 BIOS. Curve Optimizer is also working pretty sweet... some cores boosting to 5075 MHz.


----------



## MusicalPulse

Received my 5900x and Aorus Master x570 two days ago and I have the same problem... random reboots and whea bsods at stock settings 2133 mhz ram on a new windows install (xmp 3200 basically instantly bsods). Tried a bunch of different voltage settings and things from this thread and nothing worked. Disabling core performance boost seemed to work but then got a restart a few hours in. Testing +8 curve optimizer all cores suggested by aa.delite and no restarts yet for 2 hours..will see if its okay. Even if it is stable though, really doubting I'll get 3200 mhz to work. Do you guys think I should just RMA the cpu or wait for BIOS updates?

Pretty upset about all of this right now.


----------



## frollic

yaniv82 said:


> For anyone starting an RMA for a 5950x I just got this from AMD:
> _Please be informed as there is no stock for 100-100000059WOF inventory in our warehouse. The replacement part will be shipped once the stock arrives in the warehouse. I appreciate you cooperation in this regard._
> No idea how long this will be.


Awesome.

Local stores here in .se say new CPUs coming in around Feb 1st. If AMD RMA uses the same channels to replenish, I'm going to have to wait for another 3 weeks.


----------



## brasoveanul

MusicalPulse said:


> Received my 5900x and Aorus Master x570 two days ago and I have the same problem... random reboots and whea bsods at stock settings 2133 mhz ram on a new windows install (xmp 3200 basically instantly bsods). Tried a bunch of different voltage settings and things from this thread and nothing worked. Disabling core performance boost seemed to work but then got a restart a few hours in. Testing +8 curve optimizer all cores suggested by aa.delite and no restarts yet for 2 hours..will see if its okay. Even if it is stable though, really doubting I'll get 3200 mhz to work. Do you guys think I should just RMA the cpu or wait for BIOS updates?
> 
> Pretty upset about all of this right now.


According to my first hand experience, I would say, RMA.


----------



## OndrejVasicek

Fingers crossed, but it has been 3 days since I updated the BIOS to version 3202 and there has been no black-screen restart. The performance is a bit lower and the temps are maybe somewhat higher (even though the cooler is quite cold), but it works. I also ended up with FCLK 1933MHz and RAM 3866MHz, so coupled mode, with 18-22-22-42 at a lowered voltage of 1.375V (originally it was 4000MHz at 1.4V). So I think that's pretty good for a 64GB dual-stick kit. The higher capacities generally have worse timings.

I also tested a lot of variations of FCLK and RAM. Like a LOT, from 3200MHz with CL14 to 4000MHz with CL18, combined with FCLK from 1600MHz to 2000MHz. I did a lot of different tests and have some interesting results. In general, I was very surprised that not having coupled frequencies isn't always such a tragic scenario as some benchmarks on the internet show. Sometimes it really makes sense to have a higher RAM frequency with uncoupled FCLK. One of the results definitely was that it's better to have a higher frequency with higher timings than the opposite. Like 3200MHz/CL14 is worse than 3600MHz/CL16, which is worse than 4000MHz/CL18, no matter the FCLK (coupled mode of course matters, but doesn't beat the higher frequencies all the time). So sometimes just go uncoupled if it means having higher FCLK and higher RAM.

In my case I ended up at coupled 1933/3866 because 1933 FCLK was the highest that was stable without WHEA errors. 1933/4000 had better synthetic memory results but was worse in some more important scenarios. 1966/3933 worked well, passed all the tests and had the best results, but generated a lot of WHEA errors in some tests, which could potentially cause restarts. The golden goal of 2000/4000 worked as well: Windows was running, and so were some tests. The performance was superior, but there were a looooooooooot of WHEA errors and it crashed in one of the tests 😊 So I gave it up.
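
For anyone following the numbers above, here is a minimal sketch of the arithmetic behind "coupled" mode and CAS latency. It relies only on the standard DDR relationship that the memory clock is half the transfer rate (so "3866MHz" RAM runs a roughly 1933MHz clock). The helper names and the 1 MHz tolerance are my own assumptions for illustration, and first-word latency is only one factor, so this does not by itself explain the benchmark results described in this post.

```python
def memclk_mhz(ddr_rate):
    """Real memory clock for a DDR transfer rate (DDR moves data twice per clock)."""
    return ddr_rate / 2


def is_coupled(fclk_mhz, ddr_rate, tolerance_mhz=1.0):
    """True when FCLK matches the memory clock (1:1). Tolerance is an assumed rounding margin."""
    return abs(fclk_mhz - memclk_mhz(ddr_rate)) <= tolerance_mhz


def cas_latency_ns(cl, ddr_rate):
    """CAS latency in nanoseconds: CL cycles divided by the memory clock."""
    return cl * 1000 / memclk_mhz(ddr_rate)


if __name__ == "__main__":
    # FCLK / RAM pairs mentioned in the post above.
    for fclk, rate in ((1933, 3866), (1933, 4000), (1966, 3933), (2000, 4000)):
        state = "coupled" if is_coupled(fclk, rate) else "uncoupled"
        print(f"FCLK {fclk} / RAM {rate}: {state}")

    # CL / rate combinations compared in the post above.
    for cl, rate in ((14, 3200), (16, 3600), (18, 4000)):
        print(f"{rate} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

The three kits come out within about a quarter of a nanosecond of each other in CAS latency, so the higher-frequency settings give up very little latency while gaining bandwidth, which fits the observation that frequency mattered more than tight timings here.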

So in the end I have some conclusions:

- Black-screen restarts don't always mean a bad CPU. My second 5950x, which had black screens just like my first 5950x (which was returned), is probably an OKish piece of silicon, but the problem was in the BIOS or AGESA, or in a combination with a slightly bad CPU. I'm not sure, but it seems the new BIOS somehow fixed it.
- Don't always go for coupled mode or low timings at the price of much lower RAM and FCLK frequencies. Push them both, measure, and you will see the best results.
- Don't bother with single-core Cinebench. The score is always the same 😊. Also don't be surprised that multi-core Cinebench gives you different results all the time; the initial temps matter a lot. A few hundred points up or down is no problem.


----------



## ghiga_andrei

MusicalPulse said:


> Received my 5900x and Aorus Master x570 two days ago and I have the same problem... random reboots and whea bsods at stock settings 2133 mhz ram on a new windows install (xmp 3200 basically instantly bsods). Tried a bunch of different voltage settings and things from this thread and nothing worked. Disabling core performance boost seemed to work but then got a restart a few hours in. Testing +8 curve optimizer all cores suggested by aa.delite and no restarts yet for 2 hours..will see if its okay. Even if it is stable though, really doubting I'll get 3200 mhz to work. Do you guys think I should just RMA the cpu or wait for BIOS updates?
> 
> Pretty upset about all of this right now.


+8 CO is not normal; there are a lot of people running -5 on their best cores and -30 on the others and still stable. Your CPU is bad. Return it and get another. A ton of people did this, including myself.
Keeping it at +8 means less performance and more heat, and it will still crash in a few days.


----------



## aa.delite

MusicalPulse said:


> hours in. Testing +8 curve optimizer all cores suggested by aa.delite and no restarts yet for 2 hours..will see if its okay. Even if it is stable though, really doubting I'll get 3200 mhz to work. Do you guys think I should just RMA the cpu or wait for BIOS updates?


Positive CO is just for testing purposes while you wait for the RMA. Your CPU works well with a small overvoltage and slightly higher temps. CO is safe, but the CPU should work perfectly at default BIOS settings and default voltage. So either the CPU is defective or the BIOS is unstable. RMA is recommended if you've tried the latest beta BIOS. A healthy CPU should be stable even with some negative Curve Optimizer value. You'll get a little higher performance and lower temps after the RMA. And a quality product for your money.
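
To put a rough number on "small overvoltage", here is a sketch that converts Curve Optimizer counts into an approximate voltage offset. It assumes the commonly cited figure of roughly 3 to 5 mV per count; that per-count range and the helper name `co_offset_range_mv` are assumptions, not something measured in this thread, and the real offset varies with load and temperature.

```python
# ASSUMED size of one Curve Optimizer count, in millivolts (commonly cited range).
MV_PER_COUNT_MIN = 3
MV_PER_COUNT_MAX = 5


def co_offset_range_mv(counts):
    """Approximate (low, high) voltage offset in mV for a Curve Optimizer value.

    Positive counts add voltage, negative counts remove it.
    """
    a = counts * MV_PER_COUNT_MIN
    b = counts * MV_PER_COUNT_MAX
    return (min(a, b), max(a, b))


if __name__ == "__main__":
    # Values mentioned in this thread: +8 and +10 for testing, -5 and -30 on healthy chips.
    for counts in (8, 10, -5, -30):
        low, high = co_offset_range_mv(counts)
        print(f"CO {counts:+d}: roughly {low:+d} to {high:+d} mV")
```

Under that assumption, +8 would be somewhere around +24 to +40 mV, which is consistent with calling it a small overvoltage rather than a fix.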


----------



## Anthos

aa.delite said:


> You'll get a little higher performance and less temps after RMA. And quality product for your money.


Just wanted to say that although the chances are in your favour that an RMA gets you one without the problem, there is no guarantee that the new one won't suffer from the same thing. There have been reports of people getting another CPU after RMA and being unlucky enough to get hit with it again.


By the way, has AMD at any point, through any medium, acknowledged these problems even unofficially, or are they still acting as if completely oblivious to their existence?


----------



## brasoveanul

Anthos said:


> Just wanted to say that although the chances are in your favour that an RMA gets you one without the problem, there is no guarantee that the new one won't suffer from the same thing. There have been reports of people getting another CPU after RMA and being unlucky enough to get hit with it again.
> 
> 
> By the way, has AMD at any point, through any medium, acknowledged these problems even unofficially, or are they still acting as if completely oblivious to their existence?


I haven't come across of any such acknowledgment from AMD or any of the tech "influencers" that claim objectivity, good will and closeness to their followers.


----------



## Deepcuts

Anthos said:


> By the way, has AMD at any point, through any medium, acknowledged these problems even unofficially, or are they still acting as if completely oblivious to their existence?


Not that I know of. And I searched a lot.


----------



## ghiga_andrei

Anthos said:


> Just wanted to say that although the chances are in your favour that an RMA gets you one without the problem, there is no guarantee that the new one won't suffer from the same thing. There have been reports of people getting another CPU after RMA and being unlucky enough to get hit with it again.
> 
> 
> By the way, has AMD at any point, through any medium, acknowledged these problems even unofficially, or are they still acting as if completely oblivious to their existence?


OP says he wrote AMD and 2 tech youtubers about a collection of people complaining about this behavior, here (see update VI):

https://www.reddit.com/r/AMDHelp/comments/kfyst7

He also says he received template response from AMD with no actual content and no response from tech youtubers.


----------



## GRABibus

ghiga_andrei said:


> OP says he wrote AMD and 2 tech youtubers about a collection of people complaining about this behavior, here (see update VI):
> 
> https://www.reddit.com/r/AMDHelp/comments/kfyst7
> 
> He also says he received template response from AMD with no actual content and no response from tech youtubers.


We're wasting our time.
They will never answer.
Solution? Switch to the 11900K when it launches.


----------



## Anthos

I guess AMD after so many years still acts like a teenager. When it's about hyping their products they are all over the place, but when **** happens they just disappear. I wonder what the AMD fanboys have to say now: they were ****ting all over Intel when all those security exploits appeared, which at the end of the day didn't really affect your average person, but here you have some people who can't use their PCs for more than 2 minutes and AMD hasn't even given a response 2 months later.
Obviously they MUST be aware of this problem (if not then they are really... reaaally terrible at their job). Don't they know when **** happens your first response is damage control? By staying silent and letting people speculate on the problems and having everyone affected start RMAing their CPUs how can they not see the ****storm that's brewing?
I mean, how much simpler could it be:
A)"-Hi guys we are aware of the problem and we have identified it to the #CPU #BIOS #whatever and we are planning appropriate action"
B)"-Hi guys we are aware of the problem, we have no idea why it's happening but we are investigating and we'll respond again once we have a more clear understanding"
C)Do/Say nothing


----------



## kzaspam

I managed to get a 5900X a few weeks ago, and I had the exact same problem. After many hours of trying everything, the only thing that worked for me was to disable CPB (Core Performance Boost).
So i decided to RMA this CPU. AMD agreed to send me a replacement unit and i received it today. To my surprise, when I opened the box, it turns out that they sent me a 5800X instead of a 5900X. *** amd! 
So now I'm waiting again for an answer from their support with a PC I assembled from old parts i had lying around...


----------



## ghiga_andrei

kzaspam said:


> I managed to get a 5900X a few weeks ago, and I had the exact same problem. After many hours of trying everything, the only thing that worked for me was to disable CPB (Core Performance Boost).
> So i decided to RMA this CPU. AMD agreed to send me a replacement unit and i received it today. To my surprise, when I opened the box, it turns out that they sent me a 5800X instead of a 5900X. *** amd!
> So now I'm waiting again for an answer from their support with a PC I assembled from old parts i had lying around...


That sucks. They could have at least sent you a 5950x by mistake.


----------



## aa.delite

kzaspam said:


> AMD agreed to send me a replacement unit and i received it today. To my surprise, when I opened the box, it turns out that they sent me a 5800X instead of a 5900X.


They must RMA it to 5950x for free now. I will be disappointed if not.


----------



## Hueristic

aa.delite said:


> They must RMA it to 5950x for free now. I will be disappointed if not.



Let me let you in on a little history.

Back in the day when EVGA started giving people a better grade card on rma's people started intentionally destroying hardware just to get a free upgrade.

This is why we can't have nice things.


----------



## JohnnyFlash

Hueristic said:


> Let me let you in on a little history.
> 
> Back in the day when EVGA started giving people a better grade card on rma's people started intentionally destroying hardware just to get a free upgrade.
> 
> This is why we can't have nice things.


Yep. When I worked in retail, our cell extended warranty covered accidental damage. If we didn't have your phone anymore, you got the next step up in the line that was in stock; people sometimes went from an iphone 3 to 6 at no extra charge.

Guy came in to warranty his phone because an app wouldn't work. It was 100% the app, he just wanted a new phone. After going over everything that was covered, he left and came back 5 min later with a smashed phone. Said he dropped it getting into the car. That turned into a lawsuit and physical damage was removed from warranties going forward.

We had him on camera smashing the phone, so he got nothing and ruined it for everyone else.


----------



## MusicalPulse

aa.delite said:


> Positive CO is just for testing purposes to wait RMA requests. Your CPU works well with a small overvoltage and a little higher temps. CO is safe, but CPU should work perfect at default bios settings and default voltage. So either CPU is defective or BIOS is unstable. RMA is recommended if you've tried latest beta BIOS. Healthy CPU should be stable even with some negative curve optimizer value. You'll get a little higher performance and less temps after RMA. And quality product for your money.


Yeah thanks for the advice. I got a restart 10-12 hrs in at 8 CO so I have it at 10 now. I'll be sending in the RMA request. Does anyone know if it would be better to request it directly through AMD or through Amazon? I'm assuming neither has stock so hopefully it doesn't take a long time.


----------



## Deepcuts

Hueristic said:


> Let me let you in on a little history.
> 
> Back in the day when EVGA started giving people a better grade card on rma's people started intentionally destroying hardware just to get a free upgrade.
> 
> This is why we can't have nice things.


The idea here is that AMD already messed up the 1st RMA by sending the wrong, cheaper CPU.
Or maybe I should say AMD messed up twice already: first by selling a defective CPU and second by sending the wrong CPU back from RMA.


----------



## ghiga_andrei

Cannot help but notice we are at least 4 guys from Romania on this page with problems with new Ryzen chips. Could it be our suppliers received a very bad batch or just a coincidence ?


----------



## Deepcuts

ghiga_andrei said:


> Cannot help but notice we are at least 4 guys from Romania on this page with problems with new Ryzen chips. Could it be our suppliers received a very bad batch or just a coincidence ?


Look at the bright side: all 4 of us managed to get a working replacement.


----------



## ghiga_andrei

Deepcuts said:


> Look at the bright side: all 4 of us managed to get a working replacement.


I guess being in the EU has some advantages. Extended warranty and the right to return directly to the shop are great customer protection features, but you can be sure these contribute to the higher prices that we have in the EU for the same products as in the USA.


----------



## Deepcuts

ghiga_andrei said:


> I guess being in the EU has some advantages. Extended warranty and the right to return directly to the shop are great customer protection features, but you can be sure these contribute to the higher prices that we have in the EU for the same products as in the USA.


Not for long


----------



## ghiga_andrei

I wonder what happens with the CPUs that we returned to the stores. Usually they are put back on sale immediately at a lower price and, depending on the store, branded as Resealed, Opened, whatever. So I am sure some other people will re-buy our bad CPUs. It will take a few returns until the store decides to take it up with the supplier. I know from a discussion with an employee at Evomag, for example, that they don't even brand previously returned products as Resealed like Emag does. They just put a discount on it, and if you are unlucky you get it. For sure, if I received an opened CPU box I would return it immediately, but I don't know about other people.


----------



## GamBoTron

ghiga_andrei said:


> I wonder what happens with the CPUs that we returned to the stores. Usually they are put back on sale immediately at a lower price and, depending on the store, branded as Resealed, Opened, whatever. So I am sure some other people will re-buy our bad CPUs. It will take a few returns until the store decides to take it up with the supplier. I know from a discussion with an employee at Evomag, for example, that they don't even brand previously returned products as Resealed like Emag does. They just put a discount on it, and if you are unlucky you get it. For sure, if I received an opened CPU box I would return it immediately, but I don't know about other people.


That is disgusting on so many levels. Reselling pure frustration and a faulty CPU to people when the customer spends hundreds of dollars in good faith... I am still waiting for my 5950x, which should arrive in 2 weeks, but after reading threads like these I'm not very hopeful for a smooth system.

I am expecting the worst tbh


----------



## ghiga_andrei

I know this because some months ago I bought a new monitor and it had 4 dead pixels (quality control is non-existent not only at AMD but also at Aorus), so I returned it. I asked the guy working there who took it in what happens to the monitor now, and he said they'll put it back on sale, and only if the next customer returns it again do they have to report it to management and see what to do with it. This was at Evomag, one of the big stores here I would say.

Anyway, after building a complete PC last year I am full of frustration and disappointed with the state of components these days. The monitor had 4 dead pixels, the CPU is crashing, the rear fan in the case (Cooler Master H500) made a ticking noise and I had to replace it, there was no stock of video cards and even those had problems (remember the capacitor problems on the RTX 3080). I also recently bought a rather expensive scanner (Epson V600) and it has air bubbles in the scanning glass, if you can believe that; I have to return that soon as well. What I'm saying is that we're paying a lot of money for high-end components and we don't even get decent-quality ones. I would say that 12 years ago, when I built my last PC, these things did not happen. I don't remember spending days changing settings in the BIOS or having 4 dead pixels on a new monitor. Maybe I would see 1 dead pixel in a corner somewhere, but 4 is just a total lack of respect for the customer.


----------



## GamBoTron

ghiga_andrei said:


> I know this because some months ago I bought a new monitor and it had 4 dead pixels (quality control is non-existing not only at AMD but also Aorus) so I returned it and I asked the guy working there who took it in what happens now with the monitor and he said they'll put it back on sale and only if the next customer returns it again they have to report it to management and see what to do with it. This was at Evomag, one of the big stores here I would say.
> 
> Anyway, after building a complete PC last year I am full of frustration and disappointed with the state of components these days. The monitor had 4 dead pixels, the CPU is crashing, the back fan in the case (H500 CoolerMaster) made ticking noise and I had to replace that, no stock of video cards and even those had problems (remember the capacitor problems on RTX3080). I also recently bought a rather expensive scanner (Epson V600) and it has air bubbles in the scanning glass if you believe that, have to return that also soon. What I'm saying is that we're paying a lot of money for high-end components and we get not even decent quality components. I would say 12 years ago when I built my last PC these things did not happen. I don't remember spending so much time changing settings in the BIOS for days or having 4 dead pixels on a new monitor. Maybe I would see 1 dead pixel in a corner somewhere, but 4 is just a total lack of respect for the customer.


Wow, that is indeed unlucky. And I agree, it is not a good development for the industry. Money is always the religion of these companies, and if they can get away without good quality control for their products and save money, they will probably do that.

That being said, most customers don't experience this and receive perfectly fine products, but there is definitely a lot of room for improvement from the companies.


----------



## Deepcuts

_Hi, 
I am Deeps
Three weeks now sober and without any issues with my new 5950X
Over 600 Handbrake encodes without a crash._


Off-topic.
@ghiga_andrei Evomag has been a crappy shop since they started, with crappy management with whom I had the misfortune to interact. F them and pick another retailer. Plenty to choose from.


----------



## JohnnyFlash

ghiga_andrei said:


> I wonder what happens with the cpus that we returned to the stores. Usually they are put to sale immediately with lower price and depending of the store, branded as Resealed, Opened, whatever. So I am sure some other people will re-buy our bad cpus. Will take a few returns until the store will just decide to take it up with the supplier. I know from a discussion with an employee at Evomag for example that they don't even brand previously returned products as Resealed like Emag. They just put a discount and if you are unlucky you get it. For sure if I received an opened CPU box I would return it immediately, but I don't know about other people.


I can say for certain that doesn't happen, in Canada at least. If a product is returned citing a defect, the manufacturer takes it back and the store gets a new replacement or credit.

Open box items are just straight returns. If you get a wonky one, it's usually because the customer failed to tell the store, and/or the store didn't check the item closely enough.


----------



## GRABibus

Guys,
I suggest that you all post this thread at the enclosed link.

I did it.


----------



## Anthos

GRABibus said:


> Guys,
> suggest that you all post this thread on the enclosed link
> 
> 
> 
> 
> 
> 
> I did it


Especially after reading those comments I do wonder if there is indeed anything more annoying than an AMD fanboy. Jesus all that cringe.


----------



## brasoveanul

GamBoTron said:


> Wow that is indeed unlucky. And i agree, it is not a good development for the industry. Money always is the religion of these companies and if they can get away without having good quality control for their products and save money , they will probably do that.
> 
> *That being said, most customers dont experience this and receive perfectly fine products, but there is definitely a lot of potential for improvements from the companies*


This is debatable. What does "most" mean: 60, 70, 90 percent? Even if only five percent of customers experienced what we did (exchanging the CPU and leaving with the hope, not the certainty, that the replacement stays stable, as seems to be the case at the moment), that would be a tremendously high percentage. I presume the real figure is noticeably higher than 5%.


----------



## GamBoTron

brasoveanul said:


> This is debatable, what does "most" mean, 60, 70, 90 percent? Even if five percent of the customers experienced what we did, and after exchanging the CPU, leaving with the hope that it will be stable as it seems to be the case at the moment, and it would be a tremendously high percentage. I presume that the percent is sensibly higher that 5%.


When I said "most", I meant in general when it comes to hardware components, not only Ryzen products or the 5950X for that matter.

Bigger companies/manufacturers with a reputation of course have to do quality control to make sure their products actually function as intended and can thrive in a market full of competition. That's pretty much a given (of course there are special cases where they have to pull a product because of a widespread issue, but those are exceptions).

But different products are, well, different. Many factors determine whether a product will be stable or not, especially when it comes to cutting-edge technology.

That being said, you have to question how well they controlled this specific product when so many people (like in this thread) report stability issues and bad performance, so in that sense I agree with you.

My theory is that they rushed this one out before ensuring compatibility for customers, instead of waiting for a "safer" release.
The hype was too strong, too much money involved.


----------



## brasoveanul

GamBoTron said:


> when i said "most" , I meant in general when it comes to hardware components , not only for ryzen products or 5950x for that matter.
> 
> Companies/manufacturers offcourse have to do quality control to make sure their products actually function as intended and can thrive in a market full of competition. thats pretty much a given (offcourse there is special cases where they have to pull back the product because of a widespread issue, but thats exceptions)
> 
> That being said, you have to queestion how well they controlled this specific product when so many people (like in this thread ) report stability issues and bad performance, so in that sense i agree with you.


I spoke about Ryzen 5000 because it is one of the most significant headaches (technological ones, at least) of the past year. Nevertheless, I suspect this carelessness and, ultimately, disrespect for customers is widespread in the industry. Just one example: I have just bought a monitor with excellent reviews according to the "influencers'" videos, and it was defective from the factory; it simply didn't turn on. The man who processed my return said that they get more and more so-called high-tech products back as immediate returns (DOA), products that are simply sent out of the factory broken or non-functional, or with defects that even the most permissive end user could not live with in normal usage.


----------



## GamBoTron

brasoveanul said:


> I spoke about Ryzen 5000 because this is one of the most significant (at least technological) headaches during the past year. Nevertheless, I suspect this carelessness and, ultimately, disrespect for the customers, is widespread in the industry. Just an example, I have just bought a monitor, with excellent reviews according to the "influencers'" videos, it was defective from the factory, it simply didn't turn on. The man that processed my return said that they get more and more of the so-called high tech products as immediate returns(DOA) that are simply sent out broken, non-functional out of the factory, or with defects that are incompatible with the normal usage of the most permissive end user.


I see. And for sure, I've heard similar stories from friends who work in the industry: a lot of sloppy and bad programming, products being released without proper consideration, rushing out releases rather than opting for stability, more refunds as you mentioned, etc.

You might be right about it being an increasing tendency; I don't know and I don't have figures on it. I'm certainly not gonna argue or disagree with you.

This industry has exploded over the last 10 years, so naturally you get more of everything, the good and the bad, on a bigger scale.


----------



## brasoveanul

GamBoTron said:


> I see. And for sure , i heard similar stories from friends that work in the industry: a lot of sloppy and bad programming, products being released without proper consideration rushing out releases rather than opting for stability, more refunds etc
> 
> You might be right about it being an increasing tendency, i dont know and i dont have figures on it. Im certainly not gonna argue or disagree with you.
> 
> this industry has exploded the last 10 years, so naturally you get more of everything also


I would strongly prefer to get less of everything, as long as it is stable and absolutely dependable over sufficiently long periods of time.


----------



## GamBoTron

brasoveanul said:


> I would rather strongly prefer to get less of everything, but stable and absolutely dependable considering sufficiently long periods of time.


Agreed. I like having options, but when they come at the cost of things actually functioning correctly, less is more.


----------



## MusicalPulse

Anyone who RMA'd their CPU here, how long did it take for AMD to reply to your RMA request?


----------



## Deepcuts

MusicalPulse said:


> Anyone who RMA'd their CPU here, how long did it take for AMD to reply to your RMA request?


5 weeks


----------



## iambkm01

kzaspam said:


> I managed to get a 5900X a few weeks ago, and i had the exact same problem. After many hours of trying everything, the only thing that worked for me was to disable CPB.
> So i decided to RMA this CPU. AMD agreed to send me a replacement unit and i received it today. To my surprise, when I opened the box, it turns out that they sent me a 5800X instead of a 5900X. *** amd!
> So now I'm waiting again for an answer from their support with a PC I assembled from old parts i had lying around...


If I want to RMA my 5950X, which I bought from Amazon, do I contact Amazon or AMD? Also, what do I tell them? I don't want to be stuck with some amateur technician troubleshooting this with me; I want a replacement.


----------



## iambkm01

aa.delite said:


> If BSOD not caused by RAM, you should RMA. You should test 2400 MHz JEDEC RAM without DOCP/XMP. Up to 3200 should be stable.
> 
> Cache Hierarchy Error (WHEA_UNCORRECTABLE random reboots) usually means RMA. There is a chance you'll get stable CPU after some bios updates. Like me. But is it healthy or just alive with a band-aid? I can't use any negative Curve Optimizer. I don't trust this CPU instance. But it's stable at stock now so I can't return it anymore. So you should RMA while you can. To get rock stable better CPU with overclocking / curve downvoltage potential.


Hi, I have an ASUS Dark Hero with a 5950X on the latest BIOS and a G.Skill 3800 CL14 kit. I cannot run anything stable past 2133/1066 (tried 3600 with various timings/voltages, DOCP, 3800, etc.). All my errors are stop code WHEA Uncorrectable Error.

I ran out of ideas, so I took it to a PC shop today for a full diagnostic. I wonder what they'll tell me. Do you think I should RMA the CPU?


----------



## iambkm01

LAA said:


> Not sure I agree with this.
> I had WHEA errors and random restarts pretty much every day with my 5950x on x570 aorus master using F30 bios, all on stock settings too other than xmp.
> Then I flashed F31 final just before the new year, again using all stock settings except XMP and not had an issue since.


What do you mean you flashed? You did a BIOS flash? I'm using a Dark Hero with 3800 MHz CL14 RAM and can't even run stable at 3600 CL16.


----------



## iambkm01

MusicalPulse said:


> Received my 5900x and Aorus Master x570 two days ago and I have the same problem... random reboots and whea bsods at stock settings 2133 mhz ram on a new windows install (xmp 3200 basically instantly bsods). Tried a bunch of different voltage settings and things from this thread and nothing worked. Disabling core performance boost seemed to work but then got a restart a few hours in. Testing +8 curve optimizer all cores suggested by aa.delite and no restarts yet for 2 hours..will see if its okay. Even if it is stable though, really doubting I'll get 3200 mhz to work. Do you guys think I should just RMA the cpu or wait for BIOS updates?
> 
> Pretty upset about all of this right now.


Bro, I have the exact same issue but with a 5950X, Dark Hero and 3800 CL14, only stable at 2133/1066... I'm too new to even mess around with timings or voltages, but it's my first build in 12 years. Very upset.


----------



## iambkm01

Btw, for anyone reading my comments: my system was stable at the default 2133/1066 for 24+ hours of testing with various RAM tests. Does this mean that my CPU is fine and my RAM kit may be bad, or should I change timings/voltages?


----------



## JohnnyFlash

iambkm01 said:


> Btw for any reading my comments. My system was stable on default 2133/1066 for 24 hours + testing various ram tests. Does this mean that my CPU is fine and my ram kit may be bad or I should change timings/voltages?


Turn off "Core Performance Boost" in the BIOS and then test your ram at higher speeds.


----------



## machine038

iambkm01 said:


> My system was stable on default 2133/1066 for 24 hours + testing various ram tests. Does this mean that my CPU is fine and my ram kit may be bad or I should change timings/voltages?


If you can run your system at default BIOS settings, that means your CPU is fine as far as I can tell.
Can you enable DOCP and share your ZenTimings data? There is a PNG export tool, and you can upload the result here.

From my own experience troubleshooting my CPU issues, your current BIOS version probably has unstable default settings. This gets fixed with newer AGESA versions and BIOS updates. For example, I couldn't run 3600/1800 on BIOS 2802 for my Asus ROG Strix-I; it was stable only after setting sane voltage values and dropping the clock to 3200/1600. On the latest BIOS, 3202, it is stable at 3600/1800. I've read around the internet that AGESA 1.1.9.0 helps with reaching overclocked Infinity Fabric speeds of up to 2000, so that is worth a shot.

See "AMD Announces AGESA 1.1.9.0 Firmware Updates, Improve FCLK OC Stability" for the news about it.

PS: Setting your Infinity Fabric faster than 1600 is considered overclocking, and stability is not guaranteed.
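
For anyone confused by the "3600/1800" notation used in these posts: DDR4 transfers data twice per clock, so the real memory clock is half the advertised rate, and at a 1:1 ratio the Infinity Fabric clock (FCLK) matches it. Here is a minimal Python sketch of that arithmetic. The function names and the `STOCK_FCLK_LIMIT_MHZ` constant are just illustrative (the 1600 figure is taken from the PS above, not from any AMD tool), and the 1:1 ratio is an assumption.

```python
# Sketch of the "3600/1800" notation: DDR rate (MT/s) vs. real clock (MHz).
# Assumption: FCLK runs 1:1 with the memory clock, as discussed in this thread.

STOCK_FCLK_LIMIT_MHZ = 1600  # per the PS above: anything faster counts as overclocking


def memory_clock_mhz(ddr_rate: int) -> int:
    """DDR memory transfers twice per clock, so the clock is half the rate."""
    return ddr_rate // 2


def describe(ddr_rate: int) -> str:
    """Format a rate the way the thread writes it, e.g. '3600/1800'."""
    fclk = memory_clock_mhz(ddr_rate)  # 1:1 ratio assumed
    status = "overclock" if fclk > STOCK_FCLK_LIMIT_MHZ else "within spec"
    return f"{ddr_rate}/{fclk} ({status})"


if __name__ == "__main__":
    for rate in (2133, 3200, 3600, 3800):
        print(describe(rate))
```

So a 3600 or 3800 kit on DOCP/XMP already asks the fabric to run past 1600, which is why testing at 2133 or 3200 first separates "bad CPU" from "fabric overclock that doesn't hold".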


----------



## iambkm01

machine038 said:


> If you can run you system at default bios settings means that your CPU is fine as far I can tell.
> Can you enable DOCP and share your ZenTimings data? There is PNG export tool that you can upload here.
> 
> With my own experience troubleshooting my CPU issues, probably your current BIOS version have unstable default settings. This is fixed with newer AGESA versions and BIOS updates. For example, I couldn't run at 3600/1800 at 2802 bios for my Asus ROG Strix-I, was stable only after setting sane voltage values & setting the clock of 3200/1600. At the latest 3202 is stable at 3600/1800. I've read around the internet that AGESA 1.1.9.0 help reaching the overclocked infinite fabric speed up to 2000, so that is worth a shot.
> 
> See AMD Announces AGESA 1.1.9.0 Firmware Updates, Improve FCLK OC Stability for the news about it.
> 
> PS: Setting your infinity fabric faster than 1600 is considered overclocking and is not guaranteed stability.


Sure, here is the latest run I did: 3600/1800, tweaked DRAM voltage and SOCM a little bit, unstable.


----------



## iambkm01

machine038 said:


> If you can run you system at default bios settings means that your CPU is fine as far I can tell.
> Can you enable DOCP and share your ZenTimings data? There is PNG export tool that you can upload here.
> 
> With my own experience troubleshooting my CPU issues, probably your current BIOS version have unstable default settings. This is fixed with newer AGESA versions and BIOS updates. For example, I couldn't run at 3600/1800 at 2802 bios for my Asus ROG Strix-I, was stable only after setting sane voltage values & setting the clock of 3200/1600. At the latest 3202 is stable at 3600/1800. I've read around the internet that AGESA 1.1.9.0 help reaching the overclocked infinite fabric speed up to 2000, so that is worth a shot.
> 
> See AMD Announces AGESA 1.1.9.0 Firmware Updates, Improve FCLK OC Stability for the news about it.
> 
> PS: Setting your infinity fabric faster than 1600 is considered overclocking and is not guaranteed stability.



I ran these tests on both BIOS versions, the latest beta and the one before that.


----------



## Deepcuts

iambkm01 said:


> sure here is the latest run i did. 3600/1800, tweaked DRAM voltage and SOCM a little bit, unstable.
> 
> View attachment 2473903


That is a very very low ProcODT. Increase it to something like 48.


----------



## machine038

iambkm01 said:


> sure here is the latest run i did. 3600/1800, tweaked DRAM voltage and SOCM a little bit, unstable.
> 
> View attachment 2473903


Oh yeah, your VSOC is too low for that speed and can cause instability. Try setting 1.1 V in the BIOS. For reference, here are my current settings.

There is a program called DRAM Calculator for Ryzen (v1.7.3).
It was super helpful for calculating the timing and voltage values.


----------



## mwwl

I've been experiencing this after replacing a 3600xt with a 5900x (and changing nothing else). I have an MSI x570 ACE, which wouldn't even boot the 5900x using XMP (IF at 1800) until I upgraded to a beta BIOS. Even once it would boot, I'd still get WHEA BSODs (Cache Hierarchy Error and the like), usually after something intensive like playing a game and then going to something less intensive like a menu or lobby.

MSI released a new beta 7C35v1D6 today (with ComboAM4PIV2 1.2.0.0), which I already tried. Still BSODs unless CPB is off.

I've tried several suggestions from this thread, but the only thing that keeps it stable is turning off Core Performance Boost. I'm still hopeful that a BIOS update will fix this, but I'd do a cross ship RMA in a heartbeat if that were a thing (is it?).


----------



## Imraneo

mwwl said:


> I've been experiencing this after replacing a 3600xt with a 5900x (and changing nothing else). I have an MSI x570 ACE, which wouldn't even boot the 5900x using XMP (IF at 1800) until I upgraded to a beta BIOS. Even once it would boot, I'd still get WHEA BSODs (Cache Hierarchy Error and the like), usually after something intensive like playing a game and then going to something less intensive like a menu or lobby.
> 
> MSI released a new beta 7C35v1D6 today (with ComboAM4PIV2 1.2.0.0), which I already tried. Still BSODs unless CPB is off.
> 
> I've tried several suggestions from this thread, but the only thing that keeps it stable is turning off Core Performance Boost. I'm still hopeful that a BIOS update will fix this, but I'd do a cross ship RMA in a heartbeat if that were a thing (is it?).


Same here. Tried everything. Only 2 things let my system run:

1) Turn off Core Performance Boost
2) Fix Vcore at 1.1 V AND turn off C-states

Option 2 is the better one, since I still get the boosts and whatnot. However, performance still suffers, as there's neither headroom to boost nor legroom to save energy.

You may want to try 2). It is clear that I get reboots at higher voltages/boosts. Nothing can fix this.
I'm just glad AMD responded to me, asking for a pic of my chip and the sales invoice. They're ready to receive my chip once they get these.


----------



## frollic

MusicalPulse said:


> Anyone who RMA'd their CPU here, how long did it take for AMD to reply to your RMA request?


I had my RMA approved in 3 weeks.
It's been a week since they received the CPU and approved the return.

Just now I received an email about the replacement being shipped. If they ship it with DHL Express, as they did with the RMA, I should have it tomorrow or on Friday.


----------



## MikeS3000

So basically, can someone walk us through what to expect from an RMA with AMD, start to finish? Antonline, where I bought my 5900X less than a month ago, won't take a return and is sending me to AMD for warranty support. My understanding is that you go online and submit an RMA request. AMD does follow-up troubleshooting, and if they confirm the CPU is defective they issue you an RMA # and a shipping label. This can take a few weeks to approve. My concern is how long it has been taking to get the replacement once the defective CPU is shipped out to AMD. Start to finish, from when the CPU left your hands, when did you get a replacement back? My CPU is functional but has 1 defective core (#1, gold star) that fails single-thread benchmarks at stock. It needs +3-5 on Curve Optimizer to pass benchmarks on that core alone, which is not acceptable for a new CPU.


----------



## frollic

MikeS3000 said:


> So basically can someone walk us through what to expect for an RMA with AMD start to finish?


Replaced 3950X with 5950X = WHEA and reboots (www.overclock.net)

My scenario was somewhat different though, since I had tried different hardware and it all failed.
The screenshot was the only conversation I had with them. The next step was the RMA being approved, with no emails bouncing back and forth.

It still took three weeks from the initial RMA request to approval.


----------



## MikeS3000

How long did it take once you shipped out your defective CPU to get your replacement?


----------



## frollic

MikeS3000 said:


> How long did it take once you shipped out your defective CPU to get your replacement?


That's in the post above yours


----------



## MikeS3000

My bad. Eyes must be tired. So a week or so of turnaround isn't too bad. It will just suck to be without a working system for a few weeks.


----------



## GRABibus

Have any of you asked AMD a crucial question?

« AMD, you have approved my RMA, thank you.
What are the results of your quality analysis on the RMAs from your customers?
How can you ensure that the new one you send will work perfectly? »

I work at a big electronic components company, and when a customer has a quality issue, they send the product back to my factory. We then perform analyses and send them an 8D report.
Then we replace the product with good ones.

Please ask AMD those questions.
Because if they accept the RMA and send a new CPU, and this is a major company subject to the high quality standards applied to all semiconductor companies (ISO, etc.), then they should be sending a good CPU, which means the issues are well known to their quality department.


----------



## frollic

MikeS3000 said:


> My bad. Eyes must be tired . So a week or so turnaround isn't too bad. It just will suck to be without a working system for a few weeks.


Someone from Mexico posted that he'd been waiting longer, so don't take my experience as the absolute truth.


----------



## MikeS3000

Well I opened up a conversation with Antonline again asking why I am being forced to go through AMD when I am within my 30 day return policy. They did just email me and ask if I have any bent or broken pins so maybe they will end up handling the RMA and make life easier. Otherwise, perhaps I could buy like a $100 ebay Ryzen 3 1200 so I can at least have a cheap cpu to use while waiting on warranty support.


----------



## frollic

MikeS3000 said:


> Otherwise, perhaps I could buy like a $100 ebay Ryzen 3 1200 so I can at least have a cheap cpu to use while waiting on warranty support.


Just double-check the CPU support list for your mobo; my B550 only supports the 3100 or higher.


----------



## yaniv82

MikeS3000 said:


> My bad. Eyes must be tired . So a week or so turnaround isn't too bad. It just will suck to be without a working system for a few weeks.


Here's my RMA experience, with dates and some of the emails I received from AMD:

Dec 14
Started warranty claim

Dec 18
Your service request SR # *__* has been reviewed and updated.
We understand that you would like to request warranty service for your Ryzen 5950x processor.
We ask you to provide us with the following information in order to verify your request:
1. A photo of the processor installed in the motherboard socket, with the heatsink removed, clearly showing the processor model and its serial number.
2. Original invoice or purchase receipt, showing all details of the product and the purchase. For online purchases, only the original PDF invoice or invoice email will be accepted.
3. Details of all the steps and resolution tests performed, including the results of these, where it is demonstrated that the processor is defective.
It should be noted that without this information we will not be able to process the request.
Once the information has been verified satisfactorily, we will be happy to authorize the replacement of the processor.

Dec 18
Thank you for submitting your warranty claim.
Your request has been approved and the authorization number is: RMA#*_* which is valid for the next 30 days.
To ensure your processor returns safely at the return center, please review the Packing & Shipping Guidelines on the following link before you pack your processor: http://www.amd.com/returnguidelines/en
AMD’s service provider SIR Company will contact you to arrange collection of your defective processor 2-3 business days after you have received this email.
Your return processor will be subjected to inspection upon arrival at AMD returns center, and it must pass this inspection before a replacement product is approved.

Dec 23
Shipped CPU to Florida address

Jan 04
Your return processor has successfully passed the inspection and your replacement product is now approved.
Please expect a follow up email shortly confirming your replacement product shipment. 

Jan 04
This is a confirmation notice from AMD Global Customer Care with regards to Cross Ship Warranty Request RMA# *____*. for your AMD retail packaged Processor(s) in a Box.
Product Serial# Result________________ RETURN RECEIVED
This is to confirm receipt of your defective product which you shipped back to us as part of the Cross Ship Warranty Replacement.
This completes this transaction and we thank you for your business. 

Jan 11
I sent an email asking for an update:
Due to high demand of AMD products, our stock depleted very fast and we are in the process of getting new replacement inventory.
We sincerely apologize for the delay & we will ship replacement on top priority as soon as it arrives in warehouse. 

Jan 13
No ETA
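
To put numbers on the dates in this post, here is a rough Python sketch that counts the days between steps. The event labels are my own shorthand, and the years (2020/2021) are an assumption, since the post only gives month and day.

```python
from datetime import date

# Dates from the post above; the years (2020/2021) are an assumption,
# since the post only gives month and day. Labels are shorthand.
EVENTS = [
    ("Started warranty claim", date(2020, 12, 14)),
    ("RMA approved", date(2020, 12, 18)),
    ("Shipped CPU to Florida address", date(2020, 12, 23)),
    ("Inspection passed, return received", date(2021, 1, 4)),
    ("Asked for update, stock depleted", date(2021, 1, 11)),
    ("Still no ETA", date(2021, 1, 13)),
]


def days_between(start: date, end: date) -> int:
    """Whole days from start to end."""
    return (end - start).days


def timeline(events):
    """Return (label, days since first event, days since previous event)."""
    first = events[0][1]
    prev = first
    rows = []
    for label, when in events:
        rows.append((label, days_between(first, when), days_between(prev, when)))
        prev = when
    return rows


if __name__ == "__main__":
    for label, total, step in timeline(EVENTS):
        print(f"day {total:>2} (+{step:>2}): {label}")
```

Under that assumption the claim had been open for 30 days at the last update, 12 of them between shipping the CPU and the inspection result.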


----------



## ghiga_andrei

They have so many RMAs that they depleted the stock of good chips... A class action suit would be great. I hope a lawyer buys a crap 5000 series and starts a suit...


----------



## frollic

ghiga_andrei said:


> they have so many RMAs that hey depleted the stock of good chips... class action suit would be great, i hope a lawyer buys a crap 5000 series and starts a suit...


I think you're posting from the wrong continent for that to work out.


----------



## ghiga_andrei

As soon as the USA sees more stock, we'll see what happens... Also, I think the EU updated some laws so that we can now file class actions too, but I'm an electrical engineer, not a lawyer, so I don't know for sure...


----------



## ghiga_andrei

Well s**t... it will happen in 2 years, maybe for Zen 4, if any of us ever buys AMD again:

Class-action lawsuits to become EU law – DW – 11/24/2020 (www.dw.com)

The European Parliament has adopted legislation to allow EU consumers to defend their rights collectively. The measure, which is to become law within two years, aims to give individuals more power against corporations.


----------



## JohnnyFlash

ghiga_andrei said:


> as soon as USA will see more stock we'll see what happens... also, I think the EU also updated some laws that now we can also make a class action, but i'm an electrical engineer not a lawyer, so don't know for sure...


I'm not a lawyer, but I do have a law degree. As long as they are acting in good faith replacing CPUs and following up with customers, they're pretty safe. You can't just sue because you're angry if there are no substantive damages, and if they're allowing RMAs/returns there aren't really any of weight.


----------



## ghiga_andrei

JohnnyFlash said:


> I'm not a lawyer, but I do have a law degree. As long as they are acting in good faith replacing CPUs and following up with customers, they're pretty safe. You can't just sue because you're angry if there are no substantive damages, and if they're allowing RMAs/returns there aren't really any of weight.


Well, people spend a lot of effort debugging the CPUs, and this time may be worth a lot... The stores taking returns also have costs involved... And as for me, to return the 1st defective CPU I paid for the shipment to the store and spent some gas to go and buy a new one... but mostly it's the stress and the debug time... I think I spent 1 week testing different options, mounted the cooler back 3 times, and used up thermal paste (which is not cheap btw; I had to buy another gram of Kryonaut for like 10 euros just because I mounted the CPU 3 times)...


----------



## JohnnyFlash

ghiga_andrei said:


> well people spend a lot of effort to debug the cpus and this time may be valued a lot... also stores taking up returns have a cost involved in this... also me for returning the 1st defective cpu, i payed for the shipment to the store and spend some gas to go and buy a new one... but mostly the stress and debug time... I think I spent 1 week testing different options, mounted the cooler back 3 times, spent thermal paste (which is not cheap btw, I had to buy another gram of Kryonaut for like 10euros just because i mounted the cpu 3 times)...


I understand how you feel. Don't worry about the stores, they will get taken care of.

Ask yourself: What's the maximum amount of money the impact you just described is worth in $ that you could prove to someone else? Then remove 40% of that for the lawyer's contingency. Is that amount of money worth giving a deposition and waiting 4-5 years for a judgement? That would be assuming it can be proven they knew they were sending out bad CPUs, which doesn't appear to be the case based on their actions so far.

Look at the Red Bull lawsuit: Everyone sees 13 million and thinks "wow". But that was actually $10/person and 5 million in legal fees. I know you're upset, I would be too. Let's see how this plays out.


----------



## ghiga_andrei

JohnnyFlash said:


> I understand how you feel. Don't worry about the stores, they will get taken care of.
> 
> Ask yourself: What's the maximum amount of money the impact you just described is worth in $ that you could prove to someone else? Then remove 40% of that for the lawyer's contingency. Is that amount of money worth giving a deposition and waiting 4-5 years for a judgement? That would be assuming it can be proven they knew they were sending out bad CPUs, which doesn't appear to be the case based on their actions so far.
> 
> Look at the redbull lawsuit: Everyone sees 13 million and thinks "wow". But that was actually $10/person and 5 million in legal fees. I know you're upset, I would be too. Let see how this plays out.


Agreed with most of what you said, since you have the experience and I do not, except that at this point I am most certain they know they are sending out bad CPUs. They have constant RMAs, and both the AMD subreddit and this forum are full of people complaining. That is not counting returns to the stores, which are surely reported back to AMD. With so many RMAs accepted relative to how few CPUs have been sold due to lack of stock, they surely know something is wrong.


----------



## Deepcuts

JohnnyFlash said:


> I understand how you feel. Don't worry about the stores, they will get taken care of.


They will be taken care of, but not how you think.
The shops will increase prices, so in the end, the clients will pay through the nose.


----------



## iambkm01

Hi guys, I need help. I changed the memory settings to 16 16 16 16 36, DRAM voltage to 1.35, and SOC to 1.1, then ran TestMem5. You can see my screenshots here on Google Drive for convenience:

defsdf - Google Drive (drive.google.com)

I had so many errors, and my PC crashed 1 second after that memory test. Also, for some reason ZenTimings doesn't properly show my adjusted SOC voltage of 1.1.


----------



## JohnnyFlash

iambkm01 said:


> Hi GuysI need help. I changed the memory settings to 16 16 16 16 36. DRAM volt to 1.35, SOC to 1.1 and ran TestMem5 test - you can see my screenshots here on google drive for convenience;
> 
> defsdf - Google Drive (drive.google.com)
> 
> i had so many errors and my PC crashed 1 seconds after that mem test. also for some reason zentimings doesnt properly show my adjusted SOC voltage of 1.1


Turn off core performance boost and try again. You still have it on "Auto" in the pics.


----------



## iambkm01

Hi All,

Finally figured it out. For anyone having issues: I am running an Asus Dark Hero with BIOS 3102, a 5950X, and G.Skill 3800 CL14.

ZenTimings: see pics in my Google Drive:

IMG_9037.JPG (drive.google.com)

I changed my settings to 3600 CL 16 16 16 16 36, DRAM voltage to 1.35. No issues in TestMem5. Stable as a rock after 1 hr 30 min of testing now. No other options or voltages changed.

I wonder what is stopping me from running 3800 CL14... hmmm


----------



## Deepcuts

iambkm01 said:


> Hi All,
> 
> Finally figured it out. For anyone having issues, I am running Asus Dark Hero, with 3102 bios, 5950x, G Skill 3800 CL 14.
> 
> ZenTimings: see pics in my google drive:
> 
> IMG_9037.JPG (drive.google.com)
> 
> I changed my settings to 3600 CL 16 16 16 16 36 , DRAM VOLT to 1.35 - no issues on TestMem5. Stable as a rock 1hr 30m testing now. No other options changed/voltages set.
> 
> I wonder what is stopping me from Running 3800 CL 14..hmmm


Which part of this topic made you think it is about overclocking RAM? Pretty please, stay on topic.


----------



## iambkm01

Deepcuts said:


> Which part of this topic made you think it is about overclocking RAM? Pretty please, stay on topic.



Well, I was getting WHEA reboots with the RAM above, and I finally got it stable. I think other users with my parts are having RAM stability issues, so I shared my settings for other Dark Hero / 5950X users.


----------



## GRABibus

iambkm01 said:


> Well i was getting WHEA REboots using my ram above and i finally got it stable. i think other users with my parts are having issues with ram stability so i shared my settings with other dark hero / 5950x users.


This thread is mainly for people who have reboots, BSODs, and WHEAs at stock settings.


----------



## Jay109

So I RMAd my 5950x and got it approved. They're now shipping me a new model and I'll retry my tests, the rma process took about 2 weeks.


----------



## mwwl

Imraneo said:


> Same here. Tried everything. Only 2 things that makes me run my system:
> 
> 1) Turn off Core performance boost
> 2) Fix Vcore at 1.1V AND turn off C-states
> 
> Option 2 is more ideal, I get the boosts and what not. however, performance still suffers as there's neither headroom to boost nor legroom to save energy.
> 
> You may want to try 2). It is clear that I get reboots with higher voltages/boosts. Nothing can fix this.
> I'm just glad AMD responded to me asking for a pic of my chip and sales invoice. They're ready to receive my chip once they get these.


Awesome thanks! I just gave 2) a try. So far so good (I have a fairly good repro case to trigger WHEAs using Overwatch and strategic alt-tabbing), though I wonder if I did it correctly? In Ryzen Master it still shows cores going to sleep, even though I turned off c-states, and the core voltage is a lot higher than the 1.1 override I set. Do you know if that's expected?

EDIT: well, looking at the idle cpu wattage, it does seem like even though the cores go to sleep, they're still using a decent amount of power. And using HWMonitor I see VCORE at around 1.1 V


----------



## Hueristic

Have you guys even checked into cross shipping?


----------



## NikitoOficial

I have a *Gigabyte* B450M and an *R5 5600X*, and I was getting *WHEA* errors.

I noticed that in a 10-minute test (AIDA64), one of the cores got to *75 ºC*, another one to *15 ºC*, and the last 4 to around *55 ºC*. I changed the thermal compound but got the same results... Is this normal???

So... *I opened my case to lower the temperature*, and... voilà... *no more WHEA*.


----------



## JohnJ27

Hi, I have a brand new PC and experienced the same BSODs / freezes (41, Kernel-Power).
My system is:
AMD Ryzen 7 5800X
ASUS ROG STRIX B550-F GAMING (WI-FI) - AMD B550
Corsair Vengeance LPX Black 32GB (2x16GB) DDR4 3600 CL18

I passed memtest for 4 hours with two different BIOS versions (AGESA V2 PI 1.1.0.0 Patch B and AGESA V2 PI 1.1.9.0).
I spent a lot of hours reading forums about this problem, then I realized one thing! Every time I experienced this problem (BSOD, freeze) I was running the HWMonitor software.

*Can anybody tell whether HWMonitor can cause this??? Can somebody who has these crashes try it?*
Now I did a stress test with FS2020 (GPU 100% usage) together with Cinebench R23 running: no crash after half an hour. With HWMonitor, I get random crashes.

I am now running AGESA V2 PI 1.1.0.0 Patch B, which I downgraded to from 1.1.9.0, which is a beta BIOS. I did this before I suspected the HWMonitor software.

I am considering updating to the new AGESA V2 PI 1.2.0.0, which is still marked as beta.


----------



## glith

I got my replacement CPU today. After a few hours it still seems fine. I did update the BIOS to the latest beta 3202.
One thing I noticed right away is that it doesnt boost as high as my first CPU, Only to 4.7-4.8Ghz where the old faulty went over 5ghz without problems... 
But a long as it is stable I cant really complain at this moment.


----------



## Redlurkeraite

I RMAed my 5950X as I was having constant BSODs, reboots and WHEAs.
I tested the unit with three different boards (B450 DS3H, CH8 and CH8 Dark Hero) and the issue still persisted.
I managed to get a replacement unit after conversing with the warranty team for just over a month.
The replacement unit I received is also defective, as the second CCD does not boost over 3.8GHz at stock settings.
Now the warranty team is making me jump through hoops with endless irrelevant troubleshooting questions, such as taking a picture of the CPU attached to the motherboard.
Now the AMD technician team has said that as long as the CPU is running above 3.4GHz it is within specifications...

:/ This definitely has not been a great experience with AMD.


----------



## JohnnyFlash

Redlurkeraite said:


> I Rmaed my 5950x as I was having constant bsods, reboots and wheas.
> I tested the unit with three different boards, b450 ds3h, ch8 and ch8 dark hero the issue still persisted.
> I managed to get a replacement unit after conversing with the warranty team for a period just over a month.
> The replacement unit I have received is also defective as the second ccd is does not boost over 3.8GHz on stock settings.
> Now the warranty is making jump through hoops with endless irrelevant troubleshooting questions, such as taking a picture of the CPU attached to the motherboard.
> Now the AMD technician team has said as long as the CPU is running above 3.4GHz it is within specifications...
> 
> :/ This definitely has not been a great experience with AMD.


That really sucks, but they are right: you are only guaranteed 3.4GHz. Intel does this too; I had a 7940X that would only do the base clock under full load at default.

If you're having no WHEAs though, that's great news. Play with the settings a bit, maybe you can get that extra performance you want.


----------



## Redlurkeraite

JohnnyFlash said:


> That really sucks, but they are right, you are only guarenteed 3.4GHz. Intel does this too, I had a 7940X that would only do the base clock under full load at default.
> 
> If you're having no WHEAs though, that's great news. Play with the settings a bit, maybe you can get that extra performance you want.


I'll probably end up returning the CPU.
I don't really want to keep a 5950X that performs worse than a 5600X or Intel equivalent in single core.
Bear in mind this is with a custom watercooling loop; one can only imagine how it would perform if it was air-cooled or on an AIO.
While running Cinebench there's a 15-20 degree delta between the two CCDs.
I don't really know what AMD is playing at this time.


----------



## aa.delite

JohnJ27 said:


> Now I did a stress test with FS2020 (GPU 100% usage) together with Cinebench R23 running: no crash after half an hour. With HWMonitor, I get random crashes.


Seems like you have reboots at idle. There are no reboots under heavy load. Try browsing/idling so that a single core hits max boost. You'll get reboots without HWMonitor. Seems like your CPU is defective.
You can check this by adding a positive Curve Optimizer value of up to 8-10. The CPU should become stable, and then you should ask for an RMA.


----------



## wadec22

Man.... I was so excited I have a 5950x arriving Tuesday. Now I'm wondering if I should just sell it and keep my 3950x and avoid a potential hassle.

Sorry for all those having trouble


----------



## RemoteSpecialist

Hello, friends!

A bit more stats from my side.

In December I got a 5950X. There were random black-screen restarts, usually under idle workload (without a WHEA event), with or without CPB/PBO. For example, if I left the PC on overnight I got ~2 such restarts. I spent about a week trying to solve this using different combinations. No luck.

I changed the processor to a 5900X, and it works even worse. With the 5900X I got this WHEA error and a BSOD immediately during a simple benchmark run in Shadow of the Tomb Raider. So I disabled CPB and PBO again, but the same black-screen restarts under low load were reproduced. So I'm changing the CPU again (this time I sent the CPU + motherboard + memory back to the shop, to be retested together in the shop's service center). I think there is a very small chance that the issue is in the motherboard (Gigabyte B550 Aorus Master with the latest firmware), but I'm 99% sure it's the CPU again. That's really frustrating. I did not expect such poor quality control for CPUs.

I attached some logs from events for both cases.


----------



## Deepcuts

wadec22 said:


> Man.... I was so excited I have a 5950x arriving Tuesday. Now I'm wondering if I should just sell it and keep my 3950x and avoid a potential hassle.
> 
> Sorry for all those having trouble


Just keep your 3950X until you can be sure the new 5950X is stable.
I am bad at math, but I think most Ryzen 5000 users are getting working samples.


----------



## Imraneo

Heads up:
Asus released a new BIOS 3402 based on AGESA 1.2.0.0 for anyone who wants to try.
Doubt I will try it. My chip is in a box, waiting to be shipped out.


----------



## Anthos

wadec22 said:


> Man.... I was so excited I have a 5950x arriving Tuesday. Now I'm wondering if I should just sell it and keep my 3950x and avoid a potential hassle.
> 
> Sorry for all those having trouble


Well, to be fair, for any product, if you go into its technical forum you are bound to run into people with issues that will make you not want to buy it. What percentage that represents is impossible to tell, since people with negative experiences are always more vocal than those without any. The chances are in your favour that everything will be OK. Worst case, if you have issues and you are worried, just return it.


----------



## Imraneo

wadec22 said:


> Man.... I was so excited I have a 5950x arriving Tuesday. Now I'm wondering if I should just sell it and keep my 3950x and avoid a potential hassle.
> 
> Sorry for all those having trouble


I believe these chips are still hard to find now? If yes, I would keep the 5950X. I still believe the majority of chips out there are good ones.
However, don't sell your 3950X yet until you do proper tests of the 5950X, Also, take note of the RMA procedures just in case. I do hope you get a good sample.
Cheers.


----------



## JohnnyFlash

Imraneo said:


> I believe these chips are still hard to find now? If yes, I would keep the 5950X. I still believe the majority of chips out there are good ones.
> However, don't sell your 3950X yet until you do proper tests of the 5950X, Also, take note of the RMA procedures just in case. I do hope you get a good sample.
> Cheers.


Great advice. If you have a chip that can go back in, open the 5950X and test it out right away.


----------



## frollic

frollic said:


> Just now I received an email about the replacement being shipped, if they ship it with DHL
> Express, as they did with the RMA, I should have it tomorrow or on Friday.


Got the replacement in my hand.

Sux AMD didn't provide any DHL tracking # for the new CPU, but at least it's here.


----------



## JohnJ27

aa.delite said:


> Seems like you have reboots at idle. There are no reboots while heavy load. Try browsing/idling to max boost 1 core. You'll get reboots without hwmonitor. Seems like your CPU is defective.
> You may check it by adding Curve Optimizer positive value up to 8-10. CPU should become stable and you should ask for RMA.


Nope. No crashes any more, under load or at idle. I use it mostly for Photoshop, Lightroom, web browsing, and some gaming. Ryzen Master is my way to go for monitoring. I do not recommend using CPU-Z HW monitor or Open HW monitor. These cause crashes for me.


----------



## brasoveanul

JohnJ27 said:


> Nope. No crashes any more - load or idle. I am using mostly for Photoshop, Lightroom, web browsing, some gaming. Ryzen Master is my way to go for monitoring. I do not recommned using CPU-Z HW monitor or Open HW monitor. These causes me crashes.


You should not have crashes on a decently sane system, even using other applications than Ryzen Master.


----------



## JohnnyFlash

brasoveanul said:


> You should not have crashes on a decently sane system, even using other applications than Ryzen Master.


I agree. What does Who Crashed have to say?


----------



## silot

Tried the 3402. Now I get a WHEA uncorrectable BSOD after 10 minutes of gaming, which is even worse than previous BIOS versions, where I had to play at least an hour to get a BSOD.


----------



## ghiga_andrei

JohnJ27 said:


> Nope. No crashes any more - load or idle. I am using mostly for Photoshop, Lightroom, web browsing, some gaming. Ryzen Master is my way to go for monitoring. I do not recommned using CPU-Z HW monitor or Open HW monitor. These causes me crashes.


just give it more time... most probably those monitor apps just create the boost condition more often, but your other software will too, eventually... happens in my case also if I go too low on Curve Optimizer, after a few days I will get a reboot at random points during very light load or idle...


----------



## thunk_stuff

Redlurkeraite said:


> I Rmaed my 5950x as I was having constant bsods, reboots and wheas.
> I tested the unit with three different boards, b450 ds3h, ch8 and ch8 dark hero the issue still persisted.
> I managed to get a replacement unit after conversing with the warranty team for a period just over a month.
> The replacement unit I have received is also defective as the second ccd is does not boost over 3.8GHz on stock settings.
> Now the warranty is making jump through hoops with endless irrelevant troubleshooting questions, such as taking a picture of the CPU attached to the motherboard.
> Now the AMD technician team has said as long as the CPU is running above 3.4GHz it is within specifications...
> 
> :/ This definitely has not been a great experience with AMD.


Are you talking about all core boost? This thread and a link in it to anandtech show all core boost 5950 averages 3800Mhz:

Reddit Link



https://images.anandtech.com/doci/16214/PerCore-1-5950X.png



Have you tried undervolting in curve optimizer to see if boost goes up if you can get overall power down?


----------



## SpeedyIV

Where is Asus BIOS 3402? For the Dark Hero, Asus Support still has 3202 Beta. Same for hardwareluxxe.de. Thanks.


----------



## RemoteSpecialist

Anthos said:


> Well to be fair for any product if you go into its technical forum you are bound to run into people that have issues that will make you not wanna buy it. Now to what percentage this is, is impossible to tell, as always people with negative experiences are more vocal than those that don't have any.


Yep, for sure any product can have some technical issues, but I got 2 defective CPUs in a row and I am unable to get working solution for 1.5 month already. So I am really expecting some details about these crashes from AMD.


----------



## Deepcuts

You gotta bump those numbers up! Those are rookie numbers!


----------



## liweichen6

Mine doesn't BSOD but the boost clock is quite low.
By default it boosts to 4.7 in CB R15/20/23 cpuz 1T, with curve-25 offset+0.375 it boosts to 4.8. Never saw a single burst to 4.9 in these workloads.
The only time I saw 4.9 is during a time spy run, when graphics test 2 just started. The sustained boost is still at 4.6-4.7.


----------



## Anthos

Deepcuts said:


> View attachment 2474426
> 
> You gotta bump those number up!Those are rookie numbers!


I don't know why you don't understand that these CPUs are still almost all the time out of stock and that RMA with AMD literally takes MONTHS.



liweichen6 said:


> Mine doesn't BSOD but the boost clock is quite low.
> By default it boosts to 4.7 in CB R15/20/23 cpuz 1T, with curve-25 offset+0.375 it boosts to 4.8. Never saw a single burst to 4.9 in these workloads.
> The only time I saw 4.9 is during a time spy run, when graphics test 2 just started. The sustained boost is still at 4.6-4.7.


Isn't it contradictory to use a negative curve and a positive CPU offset, or am I missing something?


----------



## Deepcuts

@Anthos I know 1st hand how much an AMD RMA takes these days
If you can't take a hint that I wish more people get their CPUs changed for working ones, take a chill pill and stop posting when you are angry.


----------



## Anthosm

Deepcuts said:


> @Anthos I know 1st hand how much an AMD RMA takes these days
> If you can't take a hint that I wish more people get their CPUs changed for working ones, take a chill pill and stop posting when you are angry.


How selfless of you. Now stop posting the same thing over and over.


----------



## brasoveanul

Anthosm said:


> How selfless of you. Now stop posting the same thing over and over.


There is no point in attempting to be sarcastic. We understand your frustration, and we agree that in Romania it is significantly easier to replace the processor than seems to be the case in other countries. Still, Deepcuts is simply trying to build a statistic that may convince people this is mostly a problem caused by a faulty processor, which may be an incentive for them to replace the CPU at their earliest opportunity rather than wasting their time on endless testing and tweaking.


----------



## Anthos

brasoveanul said:


> There is no point in attempting to be sarcastic. We understand your frustration, and we agree that in Romania is significantly easier to replace the processor than it seems to be the case in other countries, but still, Deepcuts simply tries to build a statistic that may convince people this is mostly a problem that is generated by a faulty processor, which may represent an incentive for them to attempt replacing the CPU at their earliest opportunity, rather than wasting their time pointlessly with endless testing and tweaking hours.


He tries to build a statistic? Well based on his poll an X number of users fixed their problem by replacing their cpu and 0,45X fixed it through bios. That should be enough to keep him a bit more quiet.


----------



## brasoveanul

Anthos said:


> He tries to build a statistic? Well based on his poll an X number of users fixed their problem by replacing their cpu and 0,45X fixed it through bios. That should be enough to keep him a bit more quiet.


Pointless.....


----------



## Anthos

brasoveanul said:


> Pointless.....


Yeah.. just what I expected.


----------



## liweichen6

Anthos said:


> I don't know why you don't understand that these CPUs are still almost all the time out of stock and that RMA with AMD literally takes MONTHS.
> 
> 
> 
> Isn't it contradicting to put a negative curve and a positive cpu offset or am I missing something?


I saw this article, "Ryzen 9 5950X Curve Optimizer to 5.1 GHz, PBO and overclocking", and decided to give it a try. Interestingly, the results did improve compared to default and curve -15.


----------



## lobbo232

I thought I'd post my experience here too. I'm a little stuck on what to do...

I have a 5900x with a Gigabyte b550i Aorus Pro AX board running bios F11.

I built the PC over the 9th and 10th of January and used it for work for the first week with no problems at all.
Since the evening of Saturday the 16th I have been getting the WHEA Uncorrectable Error blue screen while booting.
It happens every time now: it will try and fail to boot a few times but eventually works. Once in Windows, the system then runs like normal.

BIOS settings are stock apart from fan curves. I have tried with XMP enabled and disabled.


----------



## frollic

lobbo232 said:


> I thought I'd post my experience here too. I'm a little stuck on what to do...
> 
> I have a 5900x with a Gigabyte b550i Aorus Pro AX board running bios F11.
> 
> I build the PC over 9th and 10th January and have used it for work for the first week with no problems at all.
> Since the evening of Saturday 16th I have started experiencing WHAE Unrecoverable Error blue screen while booting.
> It happens every time now, but will try and fail to boot a few times but eventually work. Once in Windows the system will then run like normal.
> 
> BIOS setting are stock apart from fan curves. I have tried with XMP enabled and disabled.


I have the same mobo, and CPU.

Only way to get _my_ 5900x stable, was to set 1CCD in BIOS, didn't touch anything else.
But your unit might be faulty in a different way than mine. 

It all ended with me RMAing the CPU with AMD, replacement arrived last week, and it's 
working flawlessly.


----------



## Anthosm

lobbo232 said:


> I thought I'd post my experience here too. I'm a little stuck on what to do...
> 
> I have a 5900x with a Gigabyte b550i Aorus Pro AX board running bios F11.
> 
> I build the PC over 9th and 10th January and have used it for work for the first week with no problems at all.
> Since the evening of Saturday 16th I have started experiencing WHAE Unrecoverable Error blue screen while booting.
> It happens every time now, but will try and fail to boot a few times but eventually work. Once in Windows the system will then run like normal.
> 
> BIOS setting are stock apart from fan curves. I have tried with XMP enabled and disabled.


From what I could see, F11 is based on AGESA 1.1.0.0. Do you know if there is a beta BIOS by any chance based on 1.1.9.0 or 1.2.0.0? Some people reported much better stability on those versions. If not, and your system is heavily unstable no matter what, it is probably best to return it so as not to miss the 14-day return window (if you haven't already). Otherwise, if you can't be without a PC for work reasons, you can play around in the BIOS to see if you can get it to work, and RMA once these supply issues are sorted.

On a related note, I got my first WHEAs as well after a week of being fully stable, which is a bit puzzling. Others have reported this as well. I can't imagine what changes to make them come up after some time in some cases and not from the get-go.


----------



## frollic

Anthosm said:


> From what I could see the F11 is based on Agesa 1.1.0.0. Do you know if there is a beta bios by any chance based on 1.1.9.0 or 1.2.0.0?


The user stasio frequently posts beta BIOSes in GIGABYTE Latest Beta BIOS - TweakTown Forums

Usually before they hit Gigabyte's home page; some of the firmware versions he's posted don't even make it there (for reasons unknown).

He's easy to spot, he's the guy with the subtle Speedtest.net footer in the posts.

Although I have to say my B550i (and RMAed 5900x) is rock stable with F11.


----------



## JohnnyFlash

Anthosm said:


> On a related note I got my first WHEAs as well after a week of being fully stable which is a bit puzzling. Reported by others as well. Can't imagine what changes and makes them come up after some time in some cases and not from the get go.


When you say stable, were you checking the windows event viewer for WHEA errors? 

If the system is unstable, it will slowly corrupt the windows install, which leads to something like this happening.
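
For anyone who would rather not click through Event Viewer, here is a minimal sketch in Python, assuming a Windows machine where the built-in `wevtutil` tool is on the PATH. It filters the System log by the `Microsoft-Windows-WHEA-Logger` provider. The function names `build_whea_query` and `recent_whea_events` are made up for illustration.

```python
import subprocess

# Provider name Windows uses when it logs hardware error (WHEA) events.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events=20):
    """Build a wevtutil command listing the newest WHEA events in the System log."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # XPath filter on the event provider
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # reverse direction: newest events first
        "/f:text",           # human-readable text instead of XML
    ]


def recent_whea_events(max_events=20):
    """Run the query (Windows only) and return wevtutil's raw text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running `print(recent_whea_events())` from a command prompt should list the newest entries first; empty output would suggest that no WHEA events have been logged.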


----------



## Anthosm

JohnnyFlash said:


> When you say stable, were you checking the windows event viewer for WHEA errors?
> 
> If the system is unstable, it will slowly corrupt the windows install, which leads to something like this happening.


Yeah. Going back in Event Viewer to even before I installed the new system, my only WHEA errors were 4 on day 7, 1 on day 20 and 1 on day 24. Ironically, I got the initial 4 when I decided to do a fresh install of Windows, and one of them (I assume) happened while booting off the USB from the BIOS to install said Windows, so there is no possibility of a corrupted install, etc.


----------



## Marucins

Has anyone checked the GIGABYTE BIOS F32abcdef...?
I don't know whether to install it or stay with a functional, working F31.


----------



## Imraneo

I'm one of the guys who had severe degradation of the CPU after 1 week of usage.
Sorry, but I don't buy into the theory of Windows corruption. Mainly because I did a fresh install after the CPU degraded (by using a fixed vCore/disabling CPB).
Just sent off my CPU for RMA. Disappointing. I expected AMD to have a statement at least, but I figure this would be very damaging to them.


----------



## JohnnyFlash

Ya, it doesn't look like that was the case. I think I have to decide to either sell my dark hero or get a 3950X.

Does turning off core performance boost stop the crashes?


----------



## Imraneo

For me, turning off CPB made it rock stable.
It's basically sensitive to voltage spikes or high voltages. In other words, it could neither boost properly nor go to sleep (because when it has to wake up, poof! reboots).
It also meant that a constant 1.1V vcore worked too, so it would boost only as far as 1.1V allows. This meant performance was limited.


----------



## Anthosm

Imraneo said:


> I'm one of the guys who had severe degradation of the CPU after 1 week of usage.
> Sorry, but I don't buy into the theory of Windows corruption. Mainly because I did a fresh install after the CPU degraded (by using a fixed vCore/disabling CPB).
> Just sent off my CPU for RMA. Disappointing. I expected AMD to have a statement at least, but I figure this would be very damaging to them.


It's pretty much impossible for a CPU to degrade after a week of use at stock settings. If that were possible, the overclocking threads, where people push these chips to the limit, would have multiple people reporting problems. Most likely this happens due to several factors. But yeah, AMD is really dropping the ball here by not issuing a statement about it. Their silence is deafening.


----------



## MikeS3000

Anyone had luck with AMD sending out a new working CPU while they are waiting for the defective one to be sent back? I want to RMA but also know that I will be without a PC for probably a month or more. It's a 3 year warranty right? Maybe it is best to RMA in 6 months when there hopefully is adequate supply.


----------



## JohnnyFlash

Imraneo said:


> For me, turning off CPB made it rock stable.
> It's basically sensitive to voltage spikes or high voltages. In order words, if could neither boost properly, nor go to sleep (cos when it has to wake up, poof! reboots)
> Also meant that constant 1.1V vcore worked too, so it will boost as much as energy 1.1V would give. This meant performance was limited.


This is what gives me hope, as I won't be using boost. However, not being able to sleep is not great either; I may have to vent my rad exhaust to the HVAC system.


----------



## Imraneo

JohnnyFlash said:


> This is what give me hope, as I won't be using boost. However not being able to sleep is not great either; I may have to exhaust vent my rad to the hvac system.


If you're not going to boost, you'll be running at base clock 3.8Ghz all the way. Are you sure you're ok with that?


----------



## JohnnyFlash

Imraneo said:


> If you're not going to boost, you'll be running at base clock 3.8Ghz all the way. Are you sure you're ok with that?


For my goals, yes. 

If my chip is stable enough to do a manual all-core overclock, then I will try for a small amount there, but I'm more about efficiency and the temperature of my office than anything.


----------



## Alvy

2020-11-13
"5950X Delivered"

2020-11-16
"AMD: Your Service Request has been received and will be processed shortly. Depending on the nature of your inquiry, further automated messages with additional instructions might follow"

2020-12-31
"AMD: Your RMA request has been approved"

2021-01-18
"AMD: Your return processor has successfully passed the inspection and your replacement product is now approved. Please expect a follow up email shortly confirming your replacement product shipment." 

\o/


----------



## GamBoTron

Alvy said:


> 2020-11-13
> "5950X Delivered"
> 
> 2020-11-16
> "AMD: Your Service Request has been received and will be processed shortly. Depending on the nature of your inquiry, further automated messages with additional instructions might follow"
> 
> 2020-12-31
> "AMD: Your RMA request has been approved"
> 
> 2021-01-18
> "AMD: Your return processor has successfully passed the inspection and your replacement product is now approved. Please expect a follow up email shortly confirming your replacement product shipment."
> 
> \o/


Gratz! Must be a good feeling after all that waiting.

I bought mine through eBay, so in case of an RMA I'm pretty much not going to get it. Fingers crossed that it works as intended
🤞


----------



## JohnnyFlash

Alvy said:


> 2020-11-13
> "5950X Delivered"
> 
> 2020-11-16
> "AMD: Your Service Request has been received and will be processed shortly. Depending on the nature of your inquiry, further automated messages with additional instructions might follow"
> 
> 2020-12-31
> "AMD: Your RMA request has been approved"
> 
> 2021-01-18
> "AMD: Your return processor has successfully passed the inspection and your replacement product is now approved. Please expect a follow up email shortly confirming your replacement product shipment."
> 
> \o/


Congrats! Let us know how the new one does.


----------



## JohnnyFlash

GamBoTron said:


> I bought mine through eBay, so in case of an RMA I'm pretty much not going to get it. Fingers crossed that it works as intended
> 🤞


You still can. Without receipt, warranty period starts the day of product release.


----------



## GamBoTron

JohnnyFlash said:


> You still can. Without receipt, warranty period starts the day of product release.


Well, I asked AMD, and they gave me this response:










The seller is a German reseller (he only has an eBay "store", though), so hopefully that helps. Still, I'm not going to take anything for granted.


----------



## JohnnyFlash

GamBoTron said:


> Well, I asked AMD, and they gave me this response:
> 
> The seller is a German reseller (he only has an eBay "store", though), so hopefully that helps. Still, I'm not going to take anything for granted.


Oh man, they changed the policy. Hope they take that as good enough.


----------



## GamBoTron

JohnnyFlash said:


> Oh man, they changed the policy. Hope they take that as good enough.


Well, I'm not too hopeful.

In the end it's my risk anyway. I should have just waited, but I'm impatient.

The CPU is literally the ONLY component I'm missing. My 3080 ROG Strix is just chilling in my room waiting to be assembled lol.


----------



## frollic

GamBoTron said:


> The seller is a German reseller (he only has an eBay "store", though), so hopefully that helps. Still, I'm not going to take anything for granted.


AMD didn't ask me for proof of purchase. TBH, they didn't request any documents or photos at all.
I mean, there's a 3-year warranty and the processor is at most 2.5 months old, so it's not like it could be out of warranty.

Perhaps it all comes down to who handles your ticket at the RMA center. Or it's just a way of stalling, due to replacement shortages.


----------



## GamBoTron

frollic said:


> AMD didn't ask me for any proof of purchase at all. TBH, they didn't request any documents or photos.
> 
> I mean, there's a 3-year warranty and the processor is at most 2.5 months old, so it's not like it could be out of warranty.


Ah, that's good news!

I really hope the same applies to me.


----------



## goondam

running 5950x on x470 ch7, no whea errors 

running 4201 beta bios


----------



## kr0mka

Hey all, I recently sent the RMA form for my 5900X via the AMD site.
I received a reply by e-mail asking for basic info about the system, a photo of the CPU, and my troubleshooting steps.
I sent an extensive reply with all the information required and more.
A couple of days later I received a reply asking for the same information again (CPU photo, troubleshooting steps, etc.), just phrased a bit differently (as if it was sent by a different helpdesk tech).
I sent another reply with the details I had provided before copied in. Now I'm waiting for a response.

Am I doing something wrong? Or is it normal for this RMA process to be this confusing? All the SR ticket numbers match, so it's not that I'm creating a new ticket each time I reply; the request number stays in the e-mail body every time.


----------



## Imraneo

kr0mka said:


> Hey all, I've recently sent the RMA form via the amd site for my 5900x.
> Received the reply on e-mail asking for basic info about the system, photo of the cpu and troubleshooting steps.
> Sent an extensive reply with all the information required and more.
> A couple days later I've received a reply asking for the same information again (cpu photo, troubleshooting steps etc) but just phrased a bit different (like it was sent by a different helpdesk tech).
> I've sent another reply with the details i've provided before copied in there. Now waiting for response.
> 
> Am I doing something wrong? Or is it normal for this rma process to be this confusing? All the SR ticket numbers are matching, it's not that I'm making a new one each time I reply, since the request number stays in the e-mail body every time.


It's strange that they are asking the same thing again. Mine was pretty alright. First request was the steps and system specs. 2nd request was pic of invoice and CPU.
Overall, patience is required. At times I got replies in 1 to 1.5 days.
As long as you get an automated response that you have sent something to the ticket, you're good to go.


----------



## xeizo

I suppose they have to moderate the RMAs a bit, or all "overclockers" would try to win the silicon lottery that way


----------



## frollic

xeizo said:


> I suppose they have to moderate the RMAs a bit, or all "overclockers" would try to win the silicon lottery that way


A CPU unable to overclock isn't "bad", it just doesn't OC very well.
Hopefully AMD's RMA department does more than just a physical inspection of the returned units, but who knows.


----------



## goondam

my 5950x serial number was 2047 for those curious about the production batch


----------



## SpeedyIV

A guy on the Asus ROG forum used Curve Optimizer to figure out that his 5900X has 2 really weak cores. He fed them more voltage than the other cores and was able to overcome the WHEA errors and low load BSODs. He may be on to something. Here's a link.

Interesting find, 5900x not stable without curve optimizer, is my CPU faulty?


----------



## xeizo

SpeedyIV said:


> A guy on the Asus ROG forum used Curve Optimizer to figure out that his 5900X has 2 really weak cores. He fed them more voltage than the other cores and was able to overcome the WHEA errors and low load BSODs. He may be on to something. Here's a link.
> 
> Interesting find, 5900x not stable without curve optimizer, is my CPU faulty?


Yes, I was the one answering him and became inspired. I've found out I have one bad core, I isolated it in Curve Optimizer and now everything looks really good. Mine is not as bad as having to use positive offset, it can do with -1 but anything more and it spits out WHEA. All the other 11 cores can take a beating with CO.
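
To put the Curve Optimizer numbers mentioned in this thread into voltage terms, here is a rough sketch. It assumes the commonly quoted figure of about 3 to 5 mV per count; that range is an assumption, not something measured here, and the real step depends on the core and the load. The function name is made up for illustration.

```python
from typing import Tuple

# Assumption: one Curve Optimizer count shifts the voltage by roughly 3 to 5 mV.
# The real step size varies with the core, the frequency, and the load.
MV_PER_COUNT_LOW = 3
MV_PER_COUNT_HIGH = 5


def co_offset_mv(counts: int) -> Tuple[int, int]:
    """Return the (lowest, highest) estimated voltage shift in mV for a CO count."""
    a = counts * MV_PER_COUNT_LOW
    b = counts * MV_PER_COUNT_HIGH
    return (min(a, b), max(a, b))


# Counts that come up in this thread: -1 on a weak core, -20 on good cores,
# and +5 on a core that needs extra voltage.
for counts in (-1, -20, 5):
    low, high = co_offset_mv(counts)
    print(f"CO {counts:+d}: about {low:+d} to {high:+d} mV")
```

Under that assumption, a weak core that only tolerates -1 is running within a few millivolts of stock, while the cores that "can take a beating" at -20 are giving up somewhere around 60 to 100 mV.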


----------



## SpeedyIV

Yeah I saw your posts there. In his case he will probably RMA anyway but I think this could be a viable solution for a lot of people. Seems like under low loads, the weak cores need more voltage than the system is set to provide, which results in WHEA errors and sometimes a BSOD. I have been following this thread and others - there are a lot of people fighting with this. I don't even have my CPU yet. So far, case, PSU, RAM, and Mobo (Dark Hero). Still trying for a 5900X. I figure in the mean time the least I can do is keep abreast of what others are experiencing. May end up saving me a lot of trouble-shooting time later (assuming I ever GET the CPU).


----------



## JohnnyFlash

So it was TSMC then, and not AMD to blame. They're not producing to spec.


----------



## jomama22

liweichen6 said:


> Mine doesn't BSOD but the boost clock is quite low.
> By default it boosts to 4.7 in CB R15/20/23 cpuz 1T, with curve-25 offset+0.375 it boosts to 4.8. Never saw a single burst to 4.9 in these workloads.
> The only time I saw 4.9 is during a time spy run, when graphics test 2 just started. The sustained boost is still at 4.6-4.7.


You realize setting a positive core offset is completely negating your curve, yeah? That's why you aren't boosting...


----------



## goondam

JohnnyFlash said:


> So it was TSMC then, and not AMD to blame. They're not producing to spec.


Not sure about that, but IIRC it was the older production batches that had issues









BG 2038-2044 had issues
mine is 2047, some of the newer ones are 2049
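
For comparing the codes people post, here is a small sketch that turns the four digits into a calendar week. It assumes the digits are a year-week (YYWW) code and that the weeks follow ISO numbering; neither is confirmed in this thread, and the function name is made up for illustration.

```python
import re
from datetime import date


def parse_date_code(marking: str) -> date:
    """Return the Monday of the week encoded in a marking such as 'BG 2044PGS'.

    Assumption: the first four consecutive digits are YYWW (two-digit year,
    then week number) and the week follows ISO 8601 numbering.
    """
    match = re.search(r"(\d{2})(\d{2})", marking)
    if match is None:
        raise ValueError(f"no four-digit code found in {marking!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    return date.fromisocalendar(year, week, 1)  # day 1 = Monday


for marking in ("2038", "BG 2044PGS", "2047", "2049"):
    print(marking, "->", parse_date_code(marking).isoformat())
```

If the YYWW reading is right, 2038 to 2044 would cover roughly mid-September to late October 2020, and 2047 would be mid-November.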


----------



## Imraneo

goondam said:


> Not sure about that, but IIRC it was the older production batches that had issues
> 
> 
> 
> 
> 
> 
> 
> 
> 
> BG 2038-2044 had issues
> mine is 2047, some of the newer ones are 2049


Is this for sure? I remember this was discussed previously, and batch numbers are all over the place.
Anyways, mine is 2043, just reached AMD today. 

It is highly possible that it's a TSMC fabrication issue. Design from AMD is probably ok. I ain't no expert though.. lol!


----------



## xeizo

As these are so hard to get, I will get by with my 5900X and its one weak core until Zen3+ lands, then make another try for the 5950X replacement (5950XT?). I had the 5950X on order for a long time. I ordered during the first 30 seconds it was available, but eventually gave up when a 5900X popped up for a few minutes.

I'm sure batches will improve in time, as they always have. The problem now is to even buy one


----------



## Marucins

GIGABYTE released the final BIOS for several motherboards


X570....F32 .....01/18/2021
B550....F12......01/18/2021
X470....F60f......01/15/2021
B450....F60f......01/15/2021

But why is old AGESA still there?


----------



## Anthosm

JohnnyFlash said:


> So it was TSMC then, and not AMD to blame. They're not producing to spec.





Imraneo said:


> Is this for sure? I remember this was discussed previously, and batch numbers are all over the place.
> Anyways, mine is 2043, just reached AMD today.
> 
> It is highly possible that it's a TSMC fabrication issue. Design from AMD is probably ok. I ain't no expert though.. lol!


I'd personally doubt it's a TSMC issue. They've been fabricating so many chips for so many electronics, and as far as I know there has never been an issue with them screwing up production themselves. My bet is a combination of silicon lottery and BIOS.
AMD probably knows what's going on but, it seems, doesn't want to share anything about it.


----------



## frollic

Marucins said:


> But why is old AGESA still there?


You can grab AGESA 1.2 beta FWs from GIGABYTE Latest Beta BIOS - TweakTown Forums ,
check the 1st post, and then look for updates starting from the end, and going backwards.

Here, one week old - GIGABYTE Latest Beta BIOS - TweakTown Forums


----------



## Dazog

I just picked up a New 5900x in Canada

2050 Date code.

Zero issues with this batch.

FYI


----------



## brasoveanul

Dazog said:


> I just picked up a New 5900x in Canada
> 
> 2050 Date code.
> 
> Zero issues with this batch.
> 
> FYI


At most, zero issues with that particular sample. One working CPU does not allow any batch-level conclusion about the processor's reliability.


----------



## thigobr

My 5950X is BG 2044PGS and doesn't seem to be affected. But it also can't do FCLK higher than 1867MHz. Is inability to suspend/sleep a symptom? The computer will never come back from suspend/sleep without a reset (Kernel power 41 event logged)


----------



## DemonAk

I handed in my problematic Ryzen 5950X (BG 2044SUS, OEM) under warranty. The replacement has been approved, and the new processor is due to arrive on Tuesday. I'm waiting and hoping that there will be no problems with it.


----------



## MikeS3000

How long will you be without your defective cpu for the rma process?


----------



## silot

How do you test and come to the conclusion that your CPU is defective?


----------



## frollic

MikeS3000 said:


> How long will you be without your defective cpu for the rma process?


1 week for me (EU RMA center)


----------



## MikeS3000

silot said:


> How do you test and come to the conclusion that your CPU is defective?


Lots of ways to test. For many users they are getting blue and black screen resets just idling at stock settings. For my 5900x the #1 best core fails single thread Prime95 small and large as well as single thread OCCT small and large at stock settings. I need +5 curve optimizer on that single core to pass stress tests. This comes at the expense of lower boost clock for that core.
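
To run a single-thread stress test on one specific core, the process has to be pinned to that core's logical processors. A minimal sketch of building the affinity mask follows. It assumes the usual Windows layout with SMT enabled, where core N shows up as logical processors 2N and 2N+1; check your own system before relying on it. The function name is made up for illustration.

```python
def core_affinity_mask(core: int, smt: bool = True) -> int:
    """Return a CPU affinity bitmask that selects one physical core.

    Assumption: with SMT on, physical core N is exposed as logical
    processors 2N and 2N+1 (typical on Windows); with SMT off, core N
    is simply logical processor N.
    """
    if core < 0:
        raise ValueError("core index must be non-negative")
    if smt:
        return 0b11 << (2 * core)
    return 1 << core


# Print the hex mask for the first few cores of an SMT-enabled CPU.
for core in range(4):
    print(f"core {core}: mask {core_affinity_mask(core):X}")
```

The hex value can then be passed to something like `start /affinity C prime95.exe` from a Windows command prompt (here `C` would select core 1 under the assumption above), so that each core is tested in turn.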


----------



## silot

So I am really baffled, because I have run various stress tests (OCCT, Prime95, Time Spy, Port Royal, memtest) and I have no problems with or without OC. I have tried all of the 5000-series BIOS updates for my Strix-E, but I always get WHEA uncorrectable errors in games, no matter the BIOS version, with or without OC, and with CPB/C-states disabled. I changed the memory and PSU, used DDU to clean my GPU drivers and installed the latest ones, and installed the latest chipset drivers. Still WHEA errors, and only in games, so I think that my problem is something else entirely.


----------



## JohnnyFlash

silot said:


> So I am really baffled, because I have run various stress tests (OCCT, Prime95, Time Spy, Port Royal, memtest) and I have no problems with or without OC. I have tried all of the 5000-series BIOS updates for my Strix-E, but I always get WHEA uncorrectable errors in games, no matter the BIOS version, with or without OC, and with CPB/C-states disabled. I changed the memory and PSU, used DDU to clean my GPU drivers and installed the latest ones, and installed the latest chipset drivers. Still WHEA errors, and only in games, so I think that my problem is something else entirely.


Turn off "Core Performance Boost" in the BIOS. Do the WHEAs still happen?


----------



## silot

JohnnyFlash said:


> Turn off "Core Performance Boost" in the BIOS. Do the WHEAs still happen?


Yes, I did turn it off and I still got a WHEA error in game, although much later than usual. Maybe that was just a coincidence, though, because I get it at random times while gaming, anywhere from 10 minutes to 3 hours in. I formatted my PC and am going to retest. The problem is that I can't reproduce it with a stress test; I need to open a game and stress the system that way.
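
When the error only shows up after hours of gaming, tallying the logged WHEA events by processor can at least show whether the same core keeps coming up. Here is a rough sketch that counts events per APIC ID from exported event message text. It assumes each message contains a line of the form `Processor APIC ID: N`, as WHEA-Logger entries in the Windows System log typically do; the sample messages below are invented for illustration, not taken from anyone's log.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: each WHEA-Logger message carries a "Processor APIC ID: N" line.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def count_by_apic_id(messages: Iterable[str]) -> Counter:
    """Count WHEA event messages per reported processor APIC ID."""
    counts: Counter = Counter()
    for message in messages:
        match = APIC_RE.search(message)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Invented sample messages, only to show the expected input shape.
sample = [
    "A fatal hardware error has occurred.\nProcessor APIC ID: 4",
    "A corrected hardware error has occurred.\nProcessor APIC ID: 4",
    "A corrected hardware error has occurred.\nProcessor APIC ID: 9",
    "Some unrelated event text",
]

for apic_id, hits in count_by_apic_id(sample).most_common():
    print(f"APIC ID {apic_id}: {hits} event(s)")
```

If one APIC ID dominates, that core is a candidate for a less aggressive Curve Optimizer offset. The mapping from APIC ID to the core numbering shown in the BIOS is not covered here and would need to be checked separately.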


----------



## xeizo

silot said:


> Yes, I did turn it off and I still got a WHEA error in game, although much later than usual. Maybe that was just a coincidence, though, because I get it at random times while gaming, anywhere from 10 minutes to 3 hours in. I formatted my PC and am going to retest. The problem is that I can't reproduce it with a stress test; I need to open a game and stress the system that way.


My guess is that one or more cores are worse than the others. Since I identified my bad core and gave it less offset, I haven't had a single WHEA. Dubious silicon quality from AMD, but it looks like Curve Optimizer can fix it in many cases.


----------



## Anthos

silot said:


> Yes, I did turn it off and I still got a WHEA error in game, although much later than usual. Maybe that was just a coincidence, though, because I get it at random times while gaming, anywhere from 10 minutes to 3 hours in. I formatted my PC and am going to retest. The problem is that I can't reproduce it with a stress test; I need to open a game and stress the system that way.


Usually the problems encountered by most people in this thread happen during light load, so the fact that everything works fine during a stress test lines up with everyone else here. It usually happens because of fluctuations in voltage when a core switches from idle to boost (if I remember correctly). In a stress test, all cores run at a uniform, lower voltage, which doesn't trigger a WHEA error (some people do get them this way, but that's a different issue).


----------



## yaniv82

MikeS3000 said:


> How long will you be without your defective cpu for the rma process?


Completely different experience depending on your country. I'm based in Mexico City and started the RMA process on Dec 14th. AMD received my faulty 5950X in Florida and the RMA process was completed on Jan 4th. I've been waiting three weeks for a replacement CPU to be shipped from the US (5 weeks since I shipped mine), and nobody at AMD can provide a date or a follow-up.


----------



## JohnnyFlash

xeizo said:


> My guess is that one or more cores are worse than the others. Since I identified my bad core and gave it less offset, I haven't had a single WHEA. Dubious silicon quality from AMD, but it looks like Curve Optimizer can fix it in many cases.


I would lay it more on TSMC by the looks of things, they took on way too much to hit their targets with proper QA. I would imagine that defective consoles would be more of an issue than CPUs from their point of view.


----------



## goondam

JohnnyFlash said:


> I would lay it more on TSMC by the looks of things, they took on way too much to hit their targets with proper QA. I would imagine that defective consoles would be more of an issue than CPUs from their point of view.


That's why I was curious about the batch numbers of the CPUs; many people with older batches are reporting problems


----------



## Imraneo

I sent my CPU in on Monday and I just got an update that my RMA has passed and a replacement is ready to ship. I should be expecting a replacement in 5 days.
Wish me luck guys!


----------



## Marucins

Another flood of new BIOSes. I'm afraid to even touch them; people are already reporting that M.2 drives have detection problems ...
*GA-X570*


> - Previous BetaBIOS
> X570 AORUS Xtreme - F33a
> X570 AORUS Master - F33a
> X570 AORUS Elite - F33a
> X570 AORUS Elite WiFi - F33a
> X570 AORUS Ultra - F33a
> X570 AORUS Pro - F33a
> X570 AORUS Pro WIFI - F33a
> X570I AORUS Pro WIFI - F33a
> X570 Gaming X - F33a


Currently, my CPU is stable (F32), I am afraid to touch it so as not to break something.


*I have a little memory problem.*

So far I have used 2x 16GB modules (F4-3600C16D-32GTZN, 3600 MHz CL16 (16-16-16-36)). I did a little upgrade and now I have 4x 16GB (2x F4-3600C16D-32GTZN).
While XMP works OK (3600 CL16-16-16-...), raising the clock to 3800 makes the PC reboot after start, with a strange F9 error.
Reducing the clock to 3733 also allows for trouble-free operation.
I checked the kits in pairs. The modules work without problems: either 2x 16GB pair runs at 3800 with IF 1900 and CL 16-16-16-...
But when I put all 4 into the motherboard, apparently it's too much of a load for the memory controller. The speaker on the board beeps, the computer resets 3 times (error F9), and on the 4th attempt it starts up with the default JEDEC timings.
But... when I then restart the computer, the settings I entered are accepted and the system starts normally! :\ What's more, 3 hours of AIDA and TM5 memory tests pass. ***!?
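
For reference, the "3800 & IF 1900" pairing comes from running the Infinity Fabric clock 1:1 with the memory clock, which is half the DDR4 transfer rate. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def fclk_for_1to1(ddr_rate_mts: float) -> float:
    """Return the FCLK in MHz needed to run 1:1 with a DDR4 transfer rate.

    DDR memory transfers twice per clock, so the memory clock is half the
    rated MT/s, and 1:1 mode sets FCLK equal to that memory clock.
    """
    return ddr_rate_mts / 2


for rate in (3600, 3733, 3800):
    print(f"DDR4-{rate}: FCLK {fclk_for_1to1(rate):g} MHz")
```

So 3733 needs roughly 1867 MHz on the fabric and 3800 needs 1900 MHz, which is why a CPU that tops out at FCLK 1867 cannot run 3800 in 1:1 mode.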


----------



## Pongu

I have not read through all 42 pages, but I managed to somehow fix this. It is not a permanent solution, and I don't know if the CPU is behaving as it should. My 5600X kept BSODing until I got to the troubleshooting screen, where I tried the "System Restore" option and got to my desktop. I did this twice. The first time, it had a problem with not recognizing the GPU, but when I started GeForce Experience it worked again. The second time, everything worked fine. I did not test any games or benchmarks; I wanted to see if the fix actually held, so I restarted the PC. After the restart it seemed to work until I logged in, when it got stuck on "Welcome" and those spinning dots. I pressed Ctrl+Alt+Del and closed it again, and I got to the desktop, but nothing was working. Everything was frozen (the time, the icons), and the only thing I could do was refresh the desktop, which changed nothing.

I don't know how AMD continues to push out these broken CPUs. I ordered mine this week, and it only came into my country this week, meaning either it was shipped from a place that had old, unsold CPUs lying around, or they are still producing them. I know that there are refurbished models of the CPUs, as some of the serial numbers etc. are different on the ones that work and those that do not.



My BSODs have been saying different things, but the one that caught my eye was "BAD SYSTEM CONFIG INFO"


----------



## Anthosm

Pongu said:


> I have not read through all 42 pages, but I managed to somehow fix this. It is not a permanent solution, and I don't know if the CPU is behaving as it should. My 5600X kept BSODing until I got to the troubleshooting screen, where I tried the "System Restore" option and got to my desktop. I did this twice. The first time, it had a problem with not recognizing the GPU, but when I started GeForce Experience it worked again. The second time, everything worked fine. I did not test any games or benchmarks; I wanted to see if the fix actually held, so I restarted the PC. After the restart it seemed to work until I logged in, when it got stuck on "Welcome" and those spinning dots. I pressed Ctrl+Alt+Del and closed it again, and I got to the desktop, but nothing was working. Everything was frozen (the time, the icons), and the only thing I could do was refresh the desktop, which changed nothing.
> 
> I don't know how AMD continues to push out these broken CPUs. I ordered mine this week, and it only came into my country this week, meaning either it was shipped from a place that had old, unsold CPUs lying around, or they are still producing them. I know that there are refurbished models of the CPUs, as some of the serial numbers etc. are different on the ones that work and those that do not.
> 
> My BSODs have been saying different things, but the one that caught my eye was "BAD SYSTEM CONFIG INFO"


This does not sound like a cpu issue


----------



## Pongu

Anthosm said:


> This does not sound like a cpu issue


That is exactly what I was thinking. The problem is, my old 2600 works completely fine without any problems at all.


----------



## JohnnyFlash

Well that does it, I'm getting a 3950X for now.


----------



## Anthosm

Pongu said:


> That is exactly what I was thinking. The problem is, my old 2600 works completely fine without any problems at all.


Is your BIOS updated and your motherboard compatible with your new CPU? I know it's quite a basic question, but the problem you describe seems more likely to stem from something like this.


----------



## Pongu

Anthosm said:


> Is your BIOS updated and your motherboard compatible with your new CPU? I know it's quite a basic question, but the problem you describe seems more likely to stem from something like this.


Yes, I have a GIGABYTE Aorus Pro B450 running BIOS F60f. Everything is up to date; the chipset drivers and GPU drivers are also on the latest versions.

I forgot to mention that I also checked the clock speed of my CPU when it actually booted up. It was 4.2 GHz, which I believe is normal; correct me if I'm wrong there.


----------



## Anthos

Pongu said:


> Yes, I have a GIGABYTE Aorus Pro B450 running BIOS F60f. Everything is up to date; the chipset drivers and GPU drivers are also on the latest versions.
> 
> I forgot to mention that I also checked the clock speed of my CPU when it actually booted up. It was 4.2 GHz, which I believe is normal; correct me if I'm wrong there.


Is there any other memory you could possibly use? I just have a feeling that the components are not working well together. A crashing CPU would not leave the Windows loading dots spinning. Usually when the CPU crashes, it goes straight to black.


----------



## Pongu

I think I might have to change out my RAM. I currently have two sticks of Corsair Vengeance RGB Pro 8GB 2666MHz, so a total of 16GB. I'm pretty sure they are too slow for my CPU, but I was planning on changing them out within a month or two, depending on how they work with the CPU.


----------



## xeizo

Pongu said:


> I think I might have to change out my ram. I currently have two sticks of Corsair Vengeance RGB pro 8GB 2666mhz so a total of 16GB. I'm pretty sure they are too slow for my CPU, but I was planning on changing them out within a month or two depending on how they would work with the CPU.


You could OC them and save some money. I just reassembled a 2700X rig and put some spare 2x8GB 2666MHz in it. It was no big problem to OC the RAM to 3133MHz at 1.4V VDIMM for a nice performance uplift. The key to success is the timings/subtimings: set the most important ones, like the primaries, tRFC, ProcODT, and DrvStr, then let the board train at POST. If it POSTs successfully, manually enter the rest of the values the board has selected.

You have a newer and better CPU than the 2700X, so you could possibly reach 3200MHz at CL16, which ain't that bad.


----------



## mark007

OK I have some strange findings on my setup that I hope might help ASUS but also others. I have a Dark Hero VIII with bios 3202 and a 5950x with 2 sticks of 16GB Trident Z 3600 memory. I have tried all sorts of settings to make my setup faster, like PBO, DOC etc. I eventually settled on all stock (to keep temperatures down, even though its an NH-D15) except for setting memory to DOCP 3600 and PBO enabled, and that's it. Very stable, many tens of hours of games like the Witcher, until today.

Today I tried setting a negative offset of negative 10 in the bios for the curve optimizer (to see would it even further reduce temps), it was unstable pretty quickly in "The Witcher" which seems to detect instabilities for me better than prime 95 / cinebench.

So I proceeded to disable "Advanced" overclocking in the advanced section by setting PBO in the Advanced section to Auto. To my surprise, I was still unstable. After many hours of tweaking memory / SoC voltages, I kept getting random reboots at some point within the Witcher game, usually within 30 minutes. Even with the main PBO setting (not the advanced one) set to disabled.

So, I decided to line by line undo what I did in the bios. I set the "Advanced" PBO offset to positive 0, then, I set it to completely disabled not auto, finally I went back to the normal PBO, set it to enabled. All of a sudden I have been stable for the whole day.

My belief is that these BIOSes have a bug where they retain some advanced overclocking settings even when set to "auto", and possibly "disabled", so those settings need to be undone by hand one by one. Please let me know if this can be reproduced. I think if it can, it can be fixed by ASUS.


----------



## JohnnyFlash

mark007 said:


> Today I tried setting a negative offset of negative 10 in the bios for the curve optimizer (to see would it even further reduce temps), it was unstable pretty quickly in "The Witcher" which seems to detect instabilities for me better than prime 95 / cinebench.


Everything about this sounds like temperature issues, what is the case airflow like? If it's less stable when the GPU is going as well, could be because of the extra GPU heat.


----------



## mark007

JohnnyFlash said:


> Everything about this sounds like temperature issues, what is the case airflow like? If it's less stable when the GPU is going as well, could be because of the extra GPU heat.


Quite the opposite, strangely; a negative offset should be lowering voltages and temperatures. Anyway, it's a Meshify 2 with 3 Noctua 140mm fans in front, 2 on the NH-D15, and 1 Noctua 140mm in the rear. I think I have found the culprit of at least my instabilities. It's not SoC voltage, RAM voltage, or PBO. It was advanced PBO / Curve Optimizer, plus the fact that advanced PBO / optimizer settings stay active even when set to disabled/auto. They need to be set to positive 0, even when advanced PBO is set to disabled/auto. Otherwise that positive/negative offset seems to stick, for me anyway, and causes the weird instability. Thank God I found this, as I'm fully stable now on pure PBO only. 11700ish multi core and 640ish single. I'm finished tweaking haha.


----------



## Anthos

mark007 said:


> OK I have some strange findings on my setup that I hope might help ASUS but also others. I have a Dark Hero VIII with bios 3202 and a 5950x with 2 sticks of 16GB Trident Z 3600 memory. I have tried all sorts of settings to make my setup faster, like PBO, DOC etc. I eventually settled on all stock (to keep temperatures down, even though its an NH-D15) except for setting memory to DOCP 3600 and PBO enabled, and that's it. Very stable, many tens of hours of games like the Witcher, until today.
> 
> Today I tried setting a negative offset of negative 10 in the bios for the curve optimizer (to see would it even further reduce temps), it was unstable pretty quickly in "The Witcher" which seems to detect instabilities for me better than prime 95 / cinebench.
> 
> So I proceeded to disable "Advanced" overclocking in the advanced section by setting PBO in the Advanced section to Auto. To my surprise, I was still unstable. After many hours of tweaking memory / SoC voltages, I kept getting random reboots at some point within the Witcher game, usually within 30 minutes. Even with the main PBO setting (not the advanced one) set to disabled.
> 
> So, I decided to line by line undo what I did in the bios. I set the "Advanced" PBO offset to positive 0, then, I set it to completely disabled not auto, finally I went back to the normal PBO, set it to enabled. All of a sudden I have been stable for the whole day.
> 
> My belief is these bios'es have some bug where they retain some advanced overclocking settings, even when set to "auto", and possibly 'disabled', have settings that need to be hand undone one by one. Please let me know if this can be reproduced, I think if it can, it can be fixed by ASUS.


I've noticed something like this too. When I play around with changing values, trying overclocks, etc. and end up with an unstable system, then go back to the BIOS and load my previous stable profile, most of the time I am still stuck with an unstable system despite the values being back to what they were supposed to be. If I reset CMOS and THEN load my profile, everything is stable again. I also have a Dark Hero.


----------



## qiller

mark007 said:


> My belief is these bios'es have some bug where they retain some advanced overclocking settings, even when set to "auto", and possibly 'disabled', have settings that need to be hand undone one by one.


Oh, Asus has that problem too, interesting. On Gigabyte boards it's a common problem. It also happens with vcore: set it to "normal", set an offset, save and reboot, then set vcore back to "auto" without first setting the offset back to +0V, and the offset is still active.

If you want to be sure no bios-bugs are in your way, always start from a saved profile or better from bios-defaults.


----------



## xeizo

I always disable all the Asus "performance enhancement" stuff in the BIOS. It only makes the CPU run hot, actually lessens performance (except on LN2, I guess), and also introduces some weird issues. Disable as much as possible without taking out functionality. It's like color TVs with demo mode settings that look awful.


----------



## silot

The format didn't do anything for me, but now I am sure it's not a software issue. I also saw that the Ryzen power plans are no longer installed with the new chipset drivers, and AMD said that Windows Balanced works now. I messed with the Curve Optimizer, and although my system is not stable yet, it's better than without PBO/curve enabled and everything at default. I am sure that I can get it stable if I test more, but it takes way too long, since I can only get the WHEA errors in games after at least an hour. Right now my curve is negative 20 on everything except the star/circle cores, which are at negative 10, and benchmarks are stable.


----------



## pSickOpatA

thigobr said:


> My 5950X is BG 2044PGS and doesn't seem to be affected. But it also can't do FCLK higher than 1867MHz. Is inability to suspend/sleep a symptom? The computer will never come back from suspend/sleep without a reset (Kernel power 41 event logged)


My defective 5600X was *BG 2044PGS*. Got a refund...
The new one is *2040* and it's OK for now.


----------



## GamBoTron

I don't think you could find a pattern in the serial numbers. Some users have perfectly fine CPUs and others have a faulty one from the same batch. No correlation.


----------



## brasoveanul

GamBoTron said:


> I don't think you could find a pattern in the serial numbers. Some users have perfectly fine CPUs and others have a faulty one from the same batch. No correlation.


This is even worse for AMD, if a certain batch was of sub-standard quality, it would have been relatively understandable, as it has happened before with AMD and others. Nevertheless, it seems that the quality control issues are persistent with more recent batches, as well.


----------



## xeizo

brasoveanul said:


> This is even worse for AMD, if a certain batch was of sub-standard quality, it would have been relatively understandable, as it has happened before with AMD and others. Nevertheless, it seems that the quality control issues are persistent with more recent batches, as well.


All the good chiplets probably went to server parts first, then to OEMs, and lastly to enthusiasts. Maybe tray CPUs have more consistent quality than the retail packaged ones? OEMs will make their unhappiness known to AMD much more directly.


----------



## mark007

qiller said:


> Oh, Asus got that problem too, interesting. On Gigabyte boards it's a common problem. This also happens with vcore set to "normal", set an offset, save/reboot settings and after that you set back vcore to "auto" without setting back the offset to +0V, you still have the offset active.
> 
> If you want to be sure no bios-bugs are in your way, always start from a saved profile or better from bios-defaults.


Thank you. Yeah, I got instabilities again later yesterday while testing with The Witcher 3.

Your suggestion was what I tried today, among many other things, and it seemed to work for me, i.e. loaded optimized defaults, set DOCP, enabled PBO and set my fan speeds, and that's it. Seems to be stable all day. *** it's so weird how these BIOSes behave. I have to say The Witcher 3 has been great for testing stability. With it, CPU usage seems to hover between 1 and 5 percent, which is where I seem to see all the reboots, so I can just leave the game running for a few minutes. It usually reboots within ten minutes. Anyway, this was a lifesaver for me: simply clear CMOS, enable PBO and set DOCP, pretty much. The game was stable for hours.


----------



## Hueristic

Everyone needs to remember to clear their BIOS settings after a flash, EVERY TIME!


----------



## Deepcuts

Also, after RAM has been replaced or moved.
Short read: When to Clear CMOS RAM After Hardware Configuration Changes


----------



## Marucins

My broken Ryzen (the one that went to be replaced) boosted to over 5 GHz.

The max I saw on the new 5950X was 4975 MHz in the CB20 single-core test (PBO +500, scalar x10, curve -10); if I leave the PC alone, it may finally hit 5 GHz. With the same settings, all-core was 4500 MHz at 75.5 degrees, MC: 11493.

The max in CB20 is 11635 with +500, x10, -15; clocks are 4550 MHz and the temperature is 75.
But -15 is not stable. It crashed in 3DMark.

*PBO: +500, scalar: x10, Curve: -15, Vcore: Normal, LLC: Low*
*OCCT*
Large AVX 4698 MHz / AVX2 4673 MHz
Tiny AVX 4423 MHz / AVX2 4398 MHz
*C20*
MC: 4479 MHz -> Result: 11579
SC: 4973 MHz
*C23*
MC: 4498 MHz -> Result: 29815
SC: 4998 MHz
*CPU-Z*
Benchmark result: 13277/689

And it turned out to be not stable either. I had to reduce the curve to -10.


PS. Has anyone noticed a performance degradation with these recent BIOS versions?


----------



## Alvy

Got my RMA replacement. So far so good at stock & XMP 4000 on F33a.


----------



## mwwl

I just swapped my faulty one out for a temp cpu and noted that my faulty 5900x is from BG 2047PGS (I think earlier in the thread people speculated that that batch was fine). I'm going to return the 5900x and wait a bit until things have settled and they're more widely available.


----------



## Deepcuts

Alvy said:


> Got my RMA replacement. So far so good at stock & XMP 4000 on F33a.


Don't forget to update your vote when you are sure it is stable.


----------



## bloot

Alvy said:


> Got my RMA replacement. So far so good at stock & XMP 4000 on F33a.


Did they ever send you the tracking number? Last Friday I received the automated AMD email saying my replacement is on its way, but no tracking info yet.


----------



## Imraneo

bloot said:


> Did they ever send you the tracking number? Last Friday I received the automated AMD email saying my replacement is on its way, but no tracking info yet.


I was just thinking of this. Got my last update to the ticket "shipped" and told me to expect the replacement within 5 business days. No tracking number.
Friday is the last day, so hopefully I get it by then.


----------



## Anthos

This machine must be self-aware, I can't explain it. One of the WHEA errors I got a couple of weeks ago happened while I was replying on a forum about WHEA errors. I haven't had any since, for 2 weeks!! As soon as I decided to visit some other forums to see if there's any more info about this issue, I got one that very moment. Like ***? How does it know?? It's not bad enough to consider an RMA at the moment, but it's enough to piss you off, knowing that at any time your PC can just.. go..
Arghh.


----------



## Imraneo

When you guys get WHEA errors, does the system reboot, or does an error pop up?
I'm trying to correlate the WHEA errors with reboots. Mine were all reboots, but when I checked the event log, I did see WHEA entries.


----------



## thigobr

WHEA code 18 seems to be a crash and code 19 is a correctable error (hardware was able to correct the error but it has a performance impact)


----------



## Priv-Au

Hey guys, made an account in regards to this issue.

I am having very similar issues: constant rebooting under varying load with a 5950X. Stress tests show no issues, however. I can run it flat stick for 30 minutes with no reboot, then all of a sudden the reboots start again after the test finishes.
It won’t reboot in BIOS but anything after BIOS is fair game. 
I even had reboots trying to use Windows recovery. 
Windows constantly telling me it was a hardware failure. 

I RMA’ed the motherboard (ROG Crosshair VIII Formula w/ BIOS 3003) and the company conducting the tests uncovered no issues with a 5800X CPU. No reboots or anything.

Today I finally got the RMA approved with AMD and the 5950X will be sent off Friday, I can only hope this is my issue and I can finally use the system I’ve purchased.
Everything else that is going into this system works with an Aorus MB and Intel 8700K.


----------



## qiller

WHEA ID 18 (error): could be anything (too much undervolt/CO, unstable IF, faulty CPU)
WHEA ID 19 (warning): corrected error, almost always IF-related, can be mitigated or eliminated with changes to VSOC/VDDG IOD
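
A rough sketch of that distinction, in case anyone wants to tally what Event Viewer shows. The meanings are only what has been described in this thread, the sample list is made up, and nothing here reads the Windows event log; you would type in the IDs you see yourself.

```python
from collections import Counter

# Meanings as described in this thread, not an official reference.
WHEA_EVENT_MEANING = {
    18: "fatal hardware error (usually ends in a crash or reboot)",
    19: "corrected hardware error (logged as a warning, system keeps running)",
}


def summarize_whea(event_ids):
    """Count WHEA-Logger event IDs and attach the meaning used in this thread."""
    counts = Counter(event_ids)
    return {
        event_id: (counts[event_id],
                   WHEA_EVENT_MEANING.get(event_id, "not covered in this thread"))
        for event_id in sorted(counts)
    }


if __name__ == "__main__":
    # Hypothetical IDs copied by hand from Event Viewer.
    sample = [19, 19, 18, 19]
    for event_id, (count, meaning) in summarize_whea(sample).items():
        print(f"WHEA ID {event_id}: {count}x - {meaning}")
```

Lots of ID 19 and no ID 18 would point towards the Infinity Fabric side rather than a core, if the description above holds.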


----------



## Anthos

Imraneo said:


> When you guys get WHEA errors, does the system reboots or is there any error that pops out?
> I'm trying to correlate the WHEA errors to reboots. Mine was all reboots but upon checking the event logger, I did see WHEA.


There are different kinds... some people get correctable ones and wouldn't know about them if they hadn't checked the event log.
The nastier ones are those where, as you are using your PC, it just randomly restarts in an instant with no warning, no freezing etc., just straight back to the BIOS.


----------



## mwwl

Priv-Au said:


> Hey guys, made an account in regards to this issue.
> 
> I am having very similar issues, constant rebooting under varying load with a 5950X. Stress tests have no issues however I can run it flat stick for 30 minutes with no reboot, then all of a sudden they’ll start again after the tests finishes.


Good news (well, since you already did the RMA) – this is pretty consistent with what others have experienced. It seems as though it's often the transition between high load and low load where the problem occurs (some have speculated that it may have to do with some weaker cores having trouble with particular voltages experienced during these transitions).


----------



## Anthos

All my WHEA errors have looked like this:
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

Does anyone know whether APIC ID 0 relates to the core? (All of them have been "0" for me.) My #1 fastest (or preferred, I should say) core happens to be 0. I was wondering whether people who get different numbers find that those numbers also correlate with their fastest cores.


----------



## silot

After thorough testing and messing with the Curve Optimizer, I couldn't get my PC to avoid the random WHEA errors in games. I still don't get any errors in any stress test or benchmark, and I have tested my other components, so my last resort is the new stable BIOS. If that doesn't fix it, I'll have to RMA my 5900X too.


----------



## Spectre73

mwwl said:


> Good news (well, since you already did the RMA) – this is pretty consistent with what others have experienced. It seems as though it's often the transition between high load and low load where the problem occurs (some have speculated that it may have to do with some weaker cores having trouble with particular voltages experienced during these transitions).


I can second that. At least regarding the transitions. This seems to be the main culprit. Regarding the weaker cores theory: I am MOSTLY stable at stock (not 100%, but I am not sure of the reason), but yesterday I tried CO with a -10 offset for the two best cores and immediately got a WHEA error (at least with a bluescreen). So it does not only happen with the weaker cores.

No idea if this carries over to a totally stock configuration, though.


----------



## Spectre73

Is someone able to summarize what the most likely cause for the WHEA errors is and what is the best known solution, apart from RMAing the CPU?


----------



## MikeS3000

Anthos said:


> All my WHEA errors have looked like this:
> Reported by component: Processor Core
> Error Source: Machine Check Exception
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 0
> 
> If anyone knows does APIC ID: 0 relate to the core? (all of them have been "0" for me). Because my #1 fastest (or preferred I should say) core happens to be 0. I was wondering if people that get different numbers if those numbers also correlate to their fastest cores.


Run CPU-Z and click "Tools". Save the report as *.txt and open it. This should show which APIC ID corresponds to which thread and core. More than likely that is Core 0 and Thread 0, so the very first core and thread of your CPU. My WHEA errors were always in the high 20s for APIC IDs, and I had to match these to the 11th and 12th cores on my 5900X as being the culprit.
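
Once you have the mapping from the report, tallying errors per core is simple. Below is a minimal sketch, assuming you copy the APIC ID to core/thread table by hand from your own CPU-Z report; the mapping and the error list shown are made up for illustration, and the function name is hypothetical. APIC IDs are not guaranteed to be contiguous, so don't try to compute the core from the ID.

```python
from collections import Counter


def errors_per_core(apic_to_core, whea_apic_ids):
    """Count WHEA errors per physical core.

    apic_to_core: dict {apic_id: (core, thread)} copied from your own report.
    whea_apic_ids: APIC IDs read from the WHEA event details.
    Returns (core, error_count) pairs, most affected core first.
    """
    counts = Counter()
    for apic_id in whea_apic_ids:
        core, _thread = apic_to_core[apic_id]
        counts[core] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical mapping, for illustration only.
    example_map = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
    # Hypothetical APIC IDs taken from four WHEA events.
    print(errors_per_core(example_map, [0, 0, 1, 2]))  # [(0, 3), (1, 1)]
```

If one core dominates the list, that is the one to look at first.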


----------



## MikeS3000

Spectre73 said:


> Is someone able to summarize what the most likely cause for the WHEA errors is and what is the best known solution, apart from RMAing the CPU?


Just like I posted above. If you can find consistency in your APIC ID or IDs linked to the WHEA event 18 error then maybe you can tune that core. If it's failing at stock setting then maybe go into PBO and curve optimizer and try giving small bumps in positive offset to the core or cores that are linked to the crashes. If multiple cores are failing at stock then RMA. Even if just a few cores are failing at stock then RMA would make sense unless it's not worth the headache and you can tune the CPU out of crashes.


----------



## Imraneo

silot said:


> After thorough testing and messing with the curve optimiser i couldn't get my pc to avoid the random whea errors in games i still don't get any errors in any stress test/ benchmark and have tested my other components so my last resort is the new stable bios and if that doesn't fix it I'll have to RMA my 5900x too.


How I tested my system was to keep it on with some apps running (multiple chrome tabs) and that's it. I made sure the sleep settings were turned off.
If the system is still up in the morning, it's good to go.
I know during full load, it doesn't reboot. Mine was reboots all the way. Every day I wait for my replacement to arrive..
Patience.. patience....


----------



## Bakgrund

My 5600X system with an MSI B550I Gaming Edge WiFi also crashes. I've found that Cinebench R23 is the easiest way to recreate it; it always crashes within 1-5 seconds. Locking the core ratio to 37 (3.7 GHz) makes it stable. Disabling CPB is not enough; it seems like it will also crash when going from light to heavy loads or when changing from idle clock frequency to maximum clock.

By disabling cores in Ryzen Master, I also figured out that core 01 is probably the culprit in my CPU. Cores have to be disabled in pairs in Ryzen Master, so by trying a few variations I came to the conclusion that as long as core 01 is enabled, I experience the same unstable behaviour.

I've spent two full days trying to get this to work with various BIOS settings now, so I'm just going to return mine for a replacement.


----------



## Spectre73

MikeS3000 said:


> Just like I posted above. If you can find consistency in your APIC ID or IDs linked to the WHEA event 18 error then maybe you can tune that core. If it's failing at stock setting then maybe go into PBO and curve optimizer and try giving small bumps in positive offset to the core or cores that are linked to the crashes. If multiple cores are failing at stock then RMA. Even if just a few cores are failing at stock then RMA would make sense unless it's not worth the headache and you can tune the CPU out of crashes.


What about increasing LLC instead? Would this also solve the problem?


----------



## MikeS3000

Spectre73 said:


> What about increasing LLC instead? Would this also solve the problem?


Maybe, but if you just have one core that can't handle stock settings then increasing LLC will decrease your boost on all of your good cores. For me the easiest was to isolate the bad core and add some positive offset to curve. Or, you could leave LLC alone and bump vcore with a +offset but again this will have a similar effect as more LLC and you risk overvolting beyond 1.5v at low loads. Depends on if that is ok to you. Alternatively, leave LLC and Vcore alone and if you are not wanting to put in the effort to isolate the bad core or cores then try positive offset on all cores for curve optimizer. That has helped some.


----------



## ghiga_andrei

Just some info for those using Gigabyte boards (at least Aorus Elite x570):

I updated to F33a and would have reboots with XMP enabled.
Then I did a clear CMOS by removing battery and enabled XMP and no reboots.
Then I set CO to -5 all core and got a reboot.
Then I set it back to 0 and still got reboots.
I did a new clear CMOS and just enabled XMP and no reboots in the last 3 days.
I test with 5 runs of R20 MC followed immediately by Geekbench. When it's unstable, that crashes most of the time.

So for me it seems that once you enable any setting in the Curve Optimizer, even if you set it back to 0 all core or disable it, the system will still be unstable.
Only resetting CMOS really disables CO. This is horrible and cost me a lot of time.
Using Load Optimized Defaults, or whatever it's called, did not do the same thing as clearing CMOS by removing the battery.


----------



## Anthosm

ghiga_andrei said:


> Just some info for those using Gigabyte boards (at least Aorus Elite x570):
> 
> I updated to F33a and would have reboots with XMP enabled.
> Then I did a clear CMOS by removing battery and enabled XMP and no reboots.
> Then I set CO to -5 all core and got a reboot.
> Then I set it back to 0 and still got reboots.
> I did a new clear CMOS and just enabled XMP and no reboots in the last 3 days.
> I test with 5 runs of R20 MC and then immediately Geekbench. Crashes most of the times.
> 
> So for me it seems that once you enable any setting on the Curve Optimizer even if you set it back to 0 all core or disable it, it will still be unstable.
> Only by resetting CMOS really disables the CO. This is horrible and cost me a lot of time.
> Using Load optimized defaults or whatever its called did not do the same thing as battery clearing the CMOS.


I've noticed this too. Play around with overclocking, crash. Load known-good profile. Still crash. Reset CMOS. Load profile. Stable. (Well, stable aside from the once-in-a-while WHEAs that pop up.)


----------



## ghiga_andrei

If I were an AMD dev I would just look through the RMA CPUs and test my code with them also, not only with golden samples and see if CO and PBO and whatever works also with the lower end chips. I don't know what monkeys work at AMD but they clearly don't know what they are doing. At least the AGESA team. I am a semiconductor dev myself and when I find out something is fishy on some chips in the lab I immediately investigate those samples with higher prio. And now AMD has a large chunk of RMA CPUs even after reading only this thread.


----------



## Bakgrund

ghiga_andrei said:


> If I were an AMD dev I would just look through the RMA CPUs and test my code with them also, not only with golden samples and see if CO and PBO and whatever works also with the lower end chips. I don't know what monkeys work at AMD but they clearly don't know what they are doing. At least the AGESA team. I am a semiconductor dev myself and when I find out something is fishy on some chips in the lab I immediately investigate those samples with higher prio. And now AMD has a large chunk of RMA CPUs even after reading only this thread.


It would be interesting to see the RMA statistics for the Zen 3 CPUs. Hopefully some retailers will publish them.


----------



## yaniv82

I received my 5950x RMA replacement yesterday and have been testing for the past 8 hours. So far everything seems stable at bios default settings and DOCP 3600. No random reboots at idle compared to the previous cpu.
The new processor is from the same batch as the previous one 2043PGS.


----------



## shaksiwnl

ghiga_andrei said:


> If I were an AMD dev I would just look through the RMA CPUs and test my code with them also, not only with golden samples and see if CO and PBO and whatever works also with the lower end chips. I don't know what monkeys work at AMD but they clearly don't know what they are doing. At least the AGESA team. I am a semiconductor dev myself and when I find out something is fishy on some chips in the lab I immediately investigate those samples with higher prio. And now AMD has a large chunk of RMA CPUs even after reading only this thread.



5950X approved for RMA. First I got an automated email reply with a label for the RMA facility in Miami. Then I got another email from a rep asking me to send it to Austin for failure analysis. Maybe they're finally doing something.

Email said:
"We wish to receive your RMA return in our Austin lab for failure analysis. Please DO NOT ship to AMD Miami as stated in RMA approval email. Please disregard that email.
We will ship new replacement processor to you after we receive your return in Austin."


----------



## DemonAk

DemonAk said:


> I handed over my problematic ryzen 5950x bg2044sus OEM under warranty. the replacement has been approved. the new processor is due to arrive on Tuesday. I wait and hope  that there will be no problems with him.


OK, I received the new CPU at the local store (because it's OEM), same batch as before (BG 2044SUS). Unfortunately this CPU is even worse than the previous one (all settings stock): BSOD (WHEA uncorrectable error) and hard reboots without BSOD at idle. I can reproduce it 100% using the boost tester tool; every time I get a BSOD and a hard reboot at core 12. I can pass boost tester with LLC1 (LLC2 gives a BSOD) and Curve Optimizer +2 on core 12 (+1 gives a BSOD). I ordered a new CPU from another store. So I think it's a defective batch =(
I can't test with the newest BIOS because my board, the B550 Taichi, only has BIOS 1.70 with AGESA 1.1.0.0 patch D.


----------



## ffletchs

I've "fixed" three 5000 series systems with Curve Optimizer. All were having random reboots at idle and WHEA errors, and it was very difficult to run them at DOCP settings or with some memory OC. Basically the issue is that one or more cores are unstable at the stock voltages/curves.

Identifying the weak cores is easy: start by undervolting one at a time, or half of them, and see which cores are unstable (undervolting will exaggerate the issue, making it easier to spot the weak cores). Then use Curve Optimizer to give these weak cores a positive curve value of +5 to +10.

Of course you can RMA the processors, since this is obviously bad QA/QC by AMD, but that takes time.
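
The "half of them" approach is essentially a bisection. Here is a minimal sketch of that search, assuming a hypothetical `is_stable_when_undervolted` check that you perform by hand (apply a negative curve offset to just that group of cores, run your usual test, report whether it survived). In the demo the check is simulated with a made-up set of weak cores; real stability testing is not this deterministic, so repeat runs before trusting a result.

```python
def find_weak_cores(cores, is_stable_when_undervolted):
    """Narrow down unstable cores by undervolting groups and splitting.

    cores: list of core indices to examine.
    is_stable_when_undervolted: hypothetical callback; takes a list of cores,
        returns True if the system stayed stable with only those cores
        undervolted.
    """
    if not cores or is_stable_when_undervolted(cores):
        return []              # nothing in this group misbehaves
    if len(cores) == 1:
        return list(cores)     # a single core that fails on its own
    mid = len(cores) // 2
    return (find_weak_cores(cores[:mid], is_stable_when_undervolted)
            + find_weak_cores(cores[mid:], is_stable_when_undervolted))


if __name__ == "__main__":
    # Simulation only: pretend cores 3 and 11 are the weak ones.
    simulated_weak = {3, 11}

    def simulated_check(group):
        return not (set(group) & simulated_weak)

    print(find_weak_cores(list(range(12)), simulated_check))  # [3, 11]
```

The cores it returns are the candidates for the positive curve offset described above.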


----------



## ghiga_andrei

For me it's puzzling why on the box and in the Where gaming begins presentation they specify 4.8GHz max boost for 5900x but stock they boost to 4.95GHz. This makes no sense. Also I'm very curious what frequency is actually needed to reach the gaming performance they showed in that presentation. Was that with 4.8GHz boost, 4.95GHz boost or even higher ? Because most of us here probably bought the product that has to meet that performance showcased there, not some random frequency specified on the box. I wished I had an RTX3080 to test the gaming performance but I don't and by the looks of it will not be able to get one until summer at least.


----------



## yaniv82

yaniv82 said:


> I received my 5950x RMA replacement yesterday and have been testing for the past 8 hours. So far everything seems stable at bios default settings and DOCP 3600. No random reboots at idle compared to the previous cpu.
> The new processor is from the same batch as the previous one 2043PGS.


I take that back... everything seemed stable the first couple days and just had a reboot while watching a YouTube video, nothing else running on the background. Will try updating my motherboard to the latest beta BIOS. RMA a second time given how long it took to get a replacement doesn't seem like an option. Really frustrated with AMD.


----------



## machine038

ghiga_andrei said:


> For me it's puzzling why on the box and in the Where gaming begins presentation they specify 4.8GHz max boost for 5900x but stock they boost to 4.95GHz. This makes no sense. Also I'm very curious what frequency is actually needed to reach the gaming performance they showed in that presentation. Was that with 4.8GHz boost, 4.95GHz boost or even higher ? Because most of us here probably bought the product that has to meet that performance showcased there, not some random frequency specified on the box. I wished I had an RTX3080 to test the gaming performance but I don't and by the looks of it will not be able to get one until summer at least.


That is how Precision Boost works: it tries to push to the max it can. If your chip can do it, sure, why not go for 4.95GHz?
Maybe AMD found out during research and development that the minimum they can "guarantee" at any time is 4.8GHz, but if you have "better silicon" (you know, every CPU is unique in its own way), then Precision Boost will reach higher.


https://www.amd.com/en/support/kb/faq/cpu-pb2



About the benchmarks, I couldn't find any details on whether they were run with boost enabled or not. Usually both CPUs are locked to a common core speed so you're comparing apples to apples.

The footnotes sometimes detail the benchmark; for their 19% IPC claim they locked both CPUs at 4GHz.



> Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7 3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary. R5K-003






https://www.amd.com/en/technologies/zen-core-3


----------



## JohnnyFlash

yaniv82 said:


> I take that back... everything seemed stable the first couple days and just had a reboot while watching a YouTube video, nothing else running on the background. Will try updating my motherboard to the latest beta BIOS. RMA a second time given how long it took to get a replacement doesn't seem like an option. Really frustrated with AMD.


Well, that is SUPER concerning. Have you played around with CO yet to see how many cores it is and how much you need to offset?


----------



## MaxHughes

Deepcuts said:


> Hello,
> 
> *Please vote on the pool only if your system is not stable with BIOS defaults, memory at 2133 Mhz without XMP, without any CPU or RAM overclocking, without PBO or any voltage tweaks and of course, if you do not have any issues with your Ryzen 5000 or your problem has been fixed.*
> _* you can select 2 values.
> Motherboard+CPU if you have issues.
> No, I tested extensively for several days+CPU if you do not have issues._
> _It did, but *+CPU if your issue has been fixed._
> 
> *See **https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/post-28698010** for the solution to my issue.*
> 
> 
> I bought the new AMD Ryzen 5950X to replace my AMD Ryzen 3950X.
> This is the only new component in the system. The rest of the components are in the signature.
> 
> 
> *Problem*
> 
> As soon as I booted up to Windows, the system started rebooting and crashing, sometimes with the BSOD WHEA Uncorrectable Error​
> 
> *What I tried*
> 
> *Long story short:*​
> I have replaced every component except the CPU and the motherboard.
> 
> *Long story:*​
> removed all RAM sticks and tested with only one at a time in different memory slots.
> tested with memory at 2133 Mhz auto timings, XMP and manual timings without XMP.
> took out my RAM and tested with one stick of G-Skill F4-2400C15S-8GNS and one KIT of 2 sticks Corsair CMK8GX4M1A2400C16.
> replaced the PSU with a Corsair AX760i
> removed any other USB devices besides mouse and keyboard
> tested with only a Bluetooth mouse. No other USB connected.
> removed any other HDD and SSD besides the system/windows one.
> replaced the system/Windows SSD and tried reinstalling Windows. Crashes while installing.
> removed the CPU to check for bent pins with a magnifying glass. Twice. All good.
> downgraded BIOS to version F30.
> re-flashed BIOS version F31e.
> upgraded BIOS to F31h, F31i, F31k, F31l, F31n, F31o, F31
> cleared CMOS and tried booting without setting anything in BIOS.
> booted Ubuntu 20 Desktop live USB. Crashes before desktop with some cryptic error about CPU.
> checked CPU and motherboard temperatures. All fine.
> reseated the GPU.
> tested with an RX 460 GPU instead of GTX 1080 ti.
> tested with an RX 590 GPU instead of GTX 1080 ti. Takes longer to crash than with the GTX 1080 ti on BIOS version F31n.
> disabled C-States
> disabled HPET-Timer
> forced PCIe to gen 2/3/4
> disabled AMD Cool&Quiet
> disabled PBO (always have it on Auto anyway)
> removed all SSDs and HDDs and tried booting from Ubuntu live USB
> tried all levels of LLC
> Enabled Preferred Cores
> 
> 
> *Temporary fix*
> 
> After many failed attempts with various BIOS settings, the only one that fixes this problem is setting "Core Performance Boost" to disabled. Of course, with this setting disabled, this new CPU performs a lot worse than the old 3950X.​With "Core Performance Boost" disabled, I can run my RAM at 3600 and IF/UCLK at 1800 with tight timings without any problems. 300+ Handbrake CPU stable encodes so far.​
> 
> With F31h Windows no longer crashes at boot, but crashes under load or random at idle like before.
> The fastest way to crash the system is to run AIDA64 memory copy benchmark (will crash when CPU will reach 100% usage), a Handbrake encode (will crash as soon as it starts encoding) or a game (Guild Wars 2 crashes at login screen).
> 
> Opened a ticket with Gigabyte, but knowing Gigabyte, their response will be "We will inform our engineers" and then silence.
> Opened a ticket with AMD. No response. Received an email requesting some details. Still waiting. Received another email requesting details already sent in the original RMA ticket. I guess AMD support and Gigabyte support are outsourced at the same helpdesk.
> 
> Anyone else having problems with the new 5950X and Core Performance Boost?
> 
> Thank you.


3600MHz is Intel XMP not AMD. Try 3466MHz at the same CL.


----------



## MaxHughes

yaniv82 said:


> I take that back... everything seemed stable the first couple days and just had a reboot while watching a YouTube video, nothing else running on the background. Will try updating my motherboard to the latest beta BIOS. RMA a second time given how long it took to get a replacement doesn't seem like an option. Really frustrated with AMD.


RUN AMD RAM SPEEDS: 2933/3200/3466/3733. Intel XMP's 200MHz steps are for Intel. AMD uses 266MHz steps. This was posted on AMD's website the day of Ryzen's launch.


----------



## Imraneo

MaxHughes said:


> RUN AMD RAM SPEEDS. 2933/3200/3466/3733 Intel XMP 200MHz steps is for Intel. AMD is 266MHZ steps. Posted on AMD's website the day of Ryzens launch.


This is interesting. Thanks for sharing.
However, it makes me wonder why G.Skill is selling their TridentZ Neo (specifically for AMD) at 3600Mhz and also why does the BIOS even allow you to select 3600Mhz?


----------



## qiller

MaxHughes said:


> RUN AMD RAM SPEEDS. 2933/3200/3466/3733 Intel XMP 200MHz steps is for Intel. AMD is 266MHZ steps. Posted on AMD's website the day of Ryzens launch.


Sounds stupid. Why are we all able to set a 66MT/s-stepping on memclock? Why is 3600MT/s faster than 3466MT/s, if I follow your logic?
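
To put numbers on it, here is a quick sketch of the arithmetic. It assumes the 66.67 MT/s memclock stepping mentioned above (written as 200/3 so there is no rounding) and the usual 1:1 relation where FCLK is half the data rate, as in the "RAM 3600, IF 1800" setups in this thread. The function name is made up.

```python
from fractions import Fraction

# Assumed stepping of the memory data rate: 66.67 MT/s, kept exact as 200/3.
STEP = Fraction(200, 3)


def nearest_step(data_rate):
    """Return (step count, exact data rate in MT/s) closest to a nominal rate."""
    count = round(Fraction(data_rate) / STEP)
    return count, float(count * STEP)


if __name__ == "__main__":
    for rate in (2933, 3200, 3466, 3600, 3733):
        count, exact = nearest_step(rate)
        print(f"DDR4-{rate}: {count} x 66.67 = {exact:.1f} MT/s, "
              f"1:1 FCLK ~ {exact / 2:.0f} MHz")
```

Under that assumption 3600 lands on the same grid as 3466 and 3733 (54, 52 and 56 steps), so there is nothing special about it.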


----------



## Deepcuts

@MaxHughes don't quote without reading the whole post.
And stop this BS with XMP is Intel only on a topic about CPUs crashing at stock settings.
Go argue with buildzoid if you really want to discuss this further.


----------



## DemonAk

DemonAk said:


> ok, i recived new cpu on local store, because OEM, same batch as previosly (bg 2044sus). Unfortunately this cpu even worse than previously (all setting stock). Bsod (whea uncorrectable error) and hard reboot without bsod at idle. I can reproduce 100% using tool boost tester, every time got bsod and hard reboot at 12 core. I can pass boost tester with LLC1 (LLC2 bsod) and curve optimizer on 12 core +2 (+1 bsod). I Order new CPU in other store. So i think it's defective batch =(
> I Can't test with newest bios because on my board B550 Taichi only have bios 1.70 with agesa 1.1.0.0 patch D


Received the 3rd CPU, AGAIN BG 2044SUS. It is much better: it passes boost tester without BSOD or reboot, and LinX with one thread....


----------



## ghiga_andrei

DemonAk said:


> Received the 3rd CPU, AGAIN BG 2044SUS. It is much better: it passes boost tester without BSOD or reboot, and LinX with one thread....


Since you received such old samples, I wonder whether they gave you CPUs that had been returned by others before.


----------



## JohnnyFlash

ghiga_andrei said:


> I wonder if since you received such old samples if they gave you cpus returned by others before.


I think it's more likely that TSMC only made a couple batches before switching to console chips, and the delays are mostly distribution related.


----------



## DemonAk

ghiga_andrei said:


> I wonder if since you received such old samples if they gave you cpus returned by others before.


I don't rule this out. Our local stores can easily sell returned CPUs.


----------



## ghiga_andrei

JohnnyFlash said:


> I think it's more likely that TSMC only made a couple batches before switching to console chips, and the delays are mostly distribution related.


I've seen people reporting buying batches like 2050 on Reddit so I think there are newer batches out there, somewhere.


----------



## RemoteSpecialist

Hello! I posted this some time ago:

"I changed the 5950X for a 5900X and now it works even worse. With the 5900X I got this WHEA error and a BSOD immediately during a simple benchmark run in Shadow of Tomb Raider. So I disabled CPB and PBO again, but the same black-screen restart at low load was reproduced. So I'm changing the CPU again (this time I sent the CPU, motherboard and memory back to the shop to be retested together in the shop's service center). I think there is a very small chance that the issue is in the motherboard (Gigabyte B550 Aorus Master with the latest firmware), but 99% it's the CPU again. That's really frustrating. I did not expect such poor quality control for CPUs."

After the parts were checked I got unexpected results: the mainboard (Gigabyte B550 Master) and the memory (Predator 2x16 GB 3600C17) had issues, but they did not find any errors in the CPU.
So I got a new mainboard (MSI B550 Carbon, BIOS with AGESA 1.1.9.0) and memory (Crucial 2x16 GB 3600C16), and kept the same 5900X.
I also have an RTX 3080 and a Corsair HX-1000i in the build.

On the new build I checked these *settings:*
Settings 1. CPB disabled, PBO disabled. XMP enabled (Ram 3600, IF 1800)
Settings 2. CPB enabled, PBO disabled. XMP enabled (Ram 3600, IF 1800)
Settings 3. CPB enabled, PBO disabled. Ram 3200, IF 1600

I ran these *tests:*
Test 1: Cinebench R23 (single core and multi core)
Test 2: AIDA64 (stability test)
Test 3: Shadow of Tomb Raider benchmark (2560*1440, Vsync off, Highest settings with RTX Ultra)
Test 4: Red Dead Redemption 2 benchmark (2560*1440 Highest settings)

*Results:*
1. All tests passed fine if CPB disabled (settings 1)
2. Cinebench and AIDA work fine with CPB enabled (settings 2, 3)
3. I still got *fatal errors* and bsods in games with CPB enabled (settings 2, 3):
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 11 (I also saw IDs 10, 7 and 8 for this error instead of 11)

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 8

Summary after comparing my 2 builds:
1. If you have restarts at idle, the cause can be a failed CPU, the memory (RAM) or the BIOS version; any of those.
2. I think that my CPU is defective, as I still get these WHEA errors.

*Questions:*
1. Can somebody help me find another simple way to reproduce this WHEA issue, ideally without any GPU load, so I can share it with the service center?
2. Can somebody tell me the best way (if any) to fix these issues?
3. Has anybody checked a BIOS with AGESA 1.2.0.0 to see whether it fixes these WHEA issues?

Thx!
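
For anyone collecting these records to hand to a service center, a small script can tally them. The following is a minimal Python sketch, not a tool anyone in the thread used: it assumes the Event Viewer text has been pasted into a string in the same layout as the records quoted in this post, and the names `RECORD`, `tally_whea` and `apic_counts` are made up for illustration.

```python
import re
from collections import Counter

# Sample text in the layout quoted in the post (two records).
SAMPLE = """\
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 11

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 8
"""

# Assumption: "Error Type" is immediately followed by the "Processor APIC ID" line.
RECORD = re.compile(
    r"Error Type:\s*(?P<type>.+?)\s*\n\s*Processor APIC ID:\s*(?P<apic>\d+)"
)


def tally_whea(text: str) -> Counter:
    """Count (error type, APIC ID) pairs found in pasted WHEA event text."""
    return Counter((m["type"], int(m["apic"])) for m in RECORD.finditer(text))


def apic_counts(text: str) -> Counter:
    """Count how often each APIC ID appears, regardless of error type."""
    return Counter(int(m["apic"]) for m in RECORD.finditer(text))


if __name__ == "__main__":
    for (err_type, apic), n in tally_whea(SAMPLE).most_common():
        print(f"{n}x {err_type} on APIC ID {apic}")
```

If one APIC ID dominates the tally, that points at a single core worth isolating; an even spread, as some posters report, is less conclusive.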


----------



## hisXLNC

Have you guys tried manually setting FCLK and changing nothing else? I noticed it won't boot at certain FCLKs, or it boots and produces lots of WHEAs. Funnily enough, it won't boot at, for example, 1900 FCLK but will boot at 1933 and produce WHEAs. Could it be that some CPUs are not stable at the default FCLK with default RAM speeds, and might be stable if you raise FCLK to something else?
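
As a reference for the numbers in this post, here is a rough Python sketch of the 1:1 relationship between FCLK and the DDR4 transfer rate. It assumes FCLK is kept equal to the memory clock, which is half the MT/s figure; the function names are hypothetical.

```python
def fclk_for_ddr(mts: int) -> int:
    """FCLK in MHz that runs 1:1 with a given DDR4 rate in MT/s."""
    return mts // 2


def ddr_for_fclk(fclk_mhz: int) -> int:
    """DDR4 rate in MT/s that runs 1:1 with a given FCLK in MHz."""
    return fclk_mhz * 2


if __name__ == "__main__":
    for fclk in (1600, 1800, 1900, 1933):
        print(f"FCLK {fclk} MHz <-> DDR4-{ddr_for_fclk(fclk)}")
```

Setting FCLK by hand without touching the RAM speed, as suggested here, breaks this 1:1 pairing, so it is a diagnostic step rather than a tuned configuration.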


----------



## Anthos

hisXLNC said:


> have you guys tried manually setting fclk and changing nothing else? I noticed it wont boot on certain fclks or boots and produces lots of wheas. funnily enough wont boot at for example 1900 fclk but will boot at 1933 but produce wheas. could be some cpus at default fclk speed with default ram speeds are not stable and maybe if you boost fclk to something else it will be?


What type of WHEA error, 18 or 19? The two are quite different from each other.


----------



## mwwl

RemoteSpecialist said:


> *Questions:*
> 1. Can somebody help me to find another simple way to reproduce this WHEA issue - ideally without any GPU load - so I can share it with service?
> 2. Can somebody told me the best way (if any) to fix these issues?
> 3. Does anybody check bios with Agesa 1.2.0.0 - if it fixed this WHEA issues or not?
> 
> Thx!


I think it's tough to reproduce reliably. The only way I pulled it off was running a game and then occasionally going to the menus to make the load lower, where it would often crash. Earlier in this thread someone spoke about a program that explicitly tries to exercise boosting, which may help.

CPB disabled also prevented the crashes for me. With CPB on, I still got WHEA issues with AGESA 1.2.0.0.

Elsewhere in this thread people mention isolating individual cores to figure out which ones are voltage sensitive and then tuning them manually with the Curve Optimizer. Might be worth a shot for you (though ideally you'd get an RMA; this is obviously not what you paid for).


----------



## RemoteSpecialist

Thx a lot to *DemonAk* for the help and the many hints on this issue.

*mwwl* "The only way I pulled it off was running a game and then occasionally going to the menus to make the load lower, where it would often crash" - exactly the same for me: I run the Shadow of Tomb Raider benchmark, wait some time, cancel it, and it usually crashes; if not, I repeat the process.
Today I tried to trigger this error with some synthetic tests, but no luck: every test passes without issues.
Thx a lot for the info about AGESA 1.2.0.0; there's no point waiting for it.

It looks like I have to change the CPU one more time. And we still do not have any response from AMD about this issue.


----------



## ghiga_andrei

RemoteSpecialist said:


> Hello! I posted some time ago
> 
> "Change 5950x processor to 5900x and now it starts to work even worse. With 5900x - I got this WHEA error and bsod immediately during simple benchmark run in Shadow Of Tomb Rider. So I disable again CPB and PBO - but the same black screen restart in low load were reproduced. So I'm changing CPU again (this time I sent back to the shop cpu+motherboard+memory - to be retested together in the shop's service center). I think there is a very small chance that there is an issue in motherboard (Gygabyte B550 Aorus Master with latest firmware), but for 99% it's CPU again. That's really frustrating. I did not expect such low quality control for CPUs"
> 
> After checking parts I got unexpected results - mainboard (Gigabyte b550 Master) and memory (Predator 2*16gb 3600c17) were with issues, but they did not find any errors in CPU.
> So I got new mainboard (MSI B550 Carbon bios with Agesa 1.1.9.0) and memory (Crucial 2*16gb 3600c16) and the same 5900x.
> Also I have RTX 3080 and Corsair HX-1000i in a build.
> 
> On a new build I checked *settings:*
> Settings 1. CPB disabled, PBO disabled. XMP enabled (Ram 3600, IF 1800)
> Settings 2. CPB enabled, PBO disabled. XMP enabled (Ram 3600, IF 1800)
> Settings 3. CPB enabled, PBO disabled. Ram 3200, IF 1600
> 
> I run such *tests:*
> Test 1: Cinebench R23 (single core and multi core)
> Test 2: AIDA64 (stability test)
> Test 3: Shadow of Tomb Raider benchmark (2560*1440, Vsync off, Highest settings with RTX Ultra)
> Test 4: Red Dead Redemption 2 benchmark (2560*1440 Highest settings)
> 
> *Results:*
> 1. All tests passed fine if CPB disabled (settings 1)
> 2. Cinebench and AIDA work fine with CPB enabled (settings 2, 3)
> 3. I still got *fatal errors* and bsods in games with CPB enabled (settings 2, 3):
> Reported by component: Processor Core
> Error Source: Machine Check Exception
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 11 (also instead of 11 I saw 10, 7, 8 ids at this error)
> 
> Reported by component: Processor Core
> Error Source: Machine Check Exception
> Error Type: Bus/Interconnect Error
> Processor APIC ID: 8
> 
> Resume after comparing 2 my builds:
> 1. If you have restarts on idle - it can be failed cpu or memory or ram or bios version - anything from that
> 2. I think that my CPU is defective as I still got these WHEA errors
> 
> *Questions:*
> 1. Can somebody help me to find another simple way to reproduce this WHEA issue - ideally without any GPU load - so I can share it with service?
> 2. Can somebody told me the best way (if any) to fix these issues?
> 3. Does anybody check bios with Agesa 1.2.0.0 - if it fixed this WHEA issues or not?
> 
> Thx!


Run Cinebench R20 MC 5 times in a row to heat up the CPU under full load, then run the Geekbench CPU test immediately with no pause in between (have Geekbench open before starting CB20). It crashes most of the time this way. The issue occurs when the CPU switches from heavy load to light load while above some temperature.


----------



## RemoteSpecialist

ghiga_andrei said:


> Run Cinebench R20 MC for 5 times in a row to heat up the CPU with full load and then run Geekbench CPU test immediately with no pause between (have Geekbench open before running CB20). Crashes most of the time like this. The issue is when the CPU switches from heavy load to light load and above some temperature.


Tried this several times; still no luck reproducing it.


----------



## Twirlz

Was wondering if anybody could help me in identifying if I really am suffering from this issue or something else.

I'm having idle reboots. Everything seems perfectly fine at load whereas on idle it just reboots randomly. Could be while browsing the web or when I'm away from the computer, it happened four times yesterday. I first installed the 5900X a few weeks ago and it ran beautifully until the 28th, which coincidentally is when I first installed the RX 6800.

Although I first assumed the RX 6800 is the cause of the reboots, upon checking event viewer I've had 11 WHEA errors since the 28th. Error type Cache Hierarchy Error, event ID 18 (and one Bus/Interconnect Error). I've been reading this thread but I'm getting a little confused. In my case, the GPU is the only thing which changed yet I'm now experiencing these reboots and getting processor core WHEAs in event viewer.

Thank you for any insight.

Specs:
5900X (PBO off, bg2046sus)
Corsair 32GB 3200MHz CL16
Asus Crosshair X470 (latest BIOs tested)
EVGA G2 750W
Sapphire RX 6800 Nitro


----------



## aa.delite

Twirlz said:


> I'm having idle reboots. Everything seems perfectly fine at load whereas on idle it just reboots randomly.
> Error type Cache Hierarchy Error


I suppose you're using the latest BIOS with AGESA 1.1.0.0 patch D or newer. If not, update the BIOS.
There is a 20% chance it is caused by memory. You should test for reboots at the default JEDEC DRAM speed (2400 MHz).
There is an 80% chance you have a defective CPU. Some core(s) that boost up to 5 GHz at idle (or under light load like browsing) are unable to work at default voltages, so you need to RMA. You can temporarily "fix" the issue by setting a positive Curve Optimizer value (+5 or +10 on all cores), which means a small core overvoltage. You can find the exact weak core(s) with BoostTester and set a positive Curve Optimizer value for just those core(s), or just set CO for all cores if you're too lazy. You could run for years with CO set, but it's better to replace the defective CPU when you get the chance.


----------



## Anthos

Twirlz said:


> Was wondering if anybody could help me in identifying if I really am suffering from this issue or something else.
> 
> I'm having idle reboots. Everything seems perfectly fine at load whereas on idle it just reboots randomly. Could be while browsing the web or when I'm away from the computer, it happened four times yesterday. I first installed the 5900X a few weeks ago and it ran beautifully until the 28th, which coincidentally is when I first installed the RX 6800.
> 
> Although I first assumed the RX 6800 is the cause of the reboots, upon checking event viewer I've had 11 WHEA errors since the 28th. Error type Cache Hierarchy Error, event ID 18 (and one Bus/Interconnect Error). I've been reading this thread but I'm getting a little confused. In my case, the GPU is the only thing which changed yet I'm now experiencing these reboots and getting processor core WHEAs in event viewer.
> 
> Thank you for any insight.
> 
> Specs:
> 5900X (PBO off, bg2046sus)
> Corsair 32GB 3200MHz CL16
> Asus Crosshair X470 (latest BIOs tested)
> EVGA G2 750W
> Sapphire RX 6800 Nitro


Are you on stock BIOS settings or have you changed anything so far? What APIC ID do your hierarchy errors state?


----------



## Twirlz

aa.delite said:


> I suppose you're using latest bios AGESA 1.1.0.0 patch D or newer. If not, update the BIOS.
> You have defective CPU. Some core(s) boosting up to 5 GHz at idle (or light load like browsing) unable to work on default voltages, need to RMA. You may temporary "fix" the issue by setting positive Curve Optimizer value (+5 or +10 all cores) means small core(s) overvoltage. You may find exact weak core(s) by using BoostTester and set positive Curve Optimizer value for that core(s). Or just set CO for all cores if too lazy


Yep, Asus published an AGESA V2 PI 1.2.0.0 BIOS on the 29th but sadly it still has this problem.

I'll have a play with Curve Optimizer just to see if it brings any improvement, and consider an RMA after I've looked into it more. Thank you.



Anthos said:


> Are you on stock bios settings or have you changed anything so far? Your hierarchy errors what APIC ID do they state?


Stock BIOS settings don't seem to help; it just rebooted within a few hours. Usually the only changes I make anyway are enabling DOCP (which worked perfectly with a 3700X on this board and RAM) and disabling PBO.

The APIC ID varies. 2, 24, 0, 4, 20, 23. Sometimes they appear more than once (20 most common) but it's not a specific ID it seems.


----------



## DemonAk

aa.delite said:


> You may find exact weak core(s) by using BoostTester and set positive Curve Optimizer value for that core(s)


Yep, you're right. But we need to find other good stress-testing utilities like Boost Tester to check the stability of each core.
So far, I have written a small .bat file for the Ryzen 5950X that tests each core (switching every 10 seconds) for an hour in the Cinebench single-core test. Create a .bat file with the code below, put it in the Cinebench folder and run it. You can edit the file for a 12- or 8-core CPU, and change the test time (in seconds, default 3600) and the per-core check interval (10 seconds on each core). It seems to me, though, that Cinebench does not push the core to its maximum boost.


test_per_core(1st_thread_on_each_core).bat:

```bat
@echo off
rem Pins a single-core Cinebench run to the first thread of each core in turn.
rem Each ProcessorAffinity value is a bitmask with only bit 2*N set for core N.
echo Launch cinebench test (single core)
start .\Cinebench.exe -g_CinebenchCpu1Test=true -g_acceptDisclaimer=true -g_CinebenchMinimumTestDuration=3600
rem Give Cinebench time to start before changing its affinity.
timeout /t 20 /nobreak >nul
:loop
echo Testing stability on core 0...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=1"
timeout /t 10 /nobreak >nul
echo Testing stability on core 1...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=4"
timeout /t 10 /nobreak >nul
echo Testing stability on core 2...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=16"
timeout /t 10 /nobreak >nul
echo Testing stability on core 3...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=64"
timeout /t 10 /nobreak >nul
echo Testing stability on core 4...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=256"
timeout /t 10 /nobreak >nul
echo Testing stability on core 5...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=1024"
timeout /t 10 /nobreak >nul
echo Testing stability on core 6...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=4096"
timeout /t 10 /nobreak >nul
echo Testing stability on core 7...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=16384"
timeout /t 10 /nobreak >nul
echo Testing stability on core 8...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=65536"
timeout /t 10 /nobreak >nul
echo Testing stability on core 9...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=262144"
timeout /t 10 /nobreak >nul
echo Testing stability on core 10...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=1048576"
timeout /t 10 /nobreak >nul
echo Testing stability on core 11...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=4194304"
timeout /t 10 /nobreak >nul
echo Testing stability on core 12...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=16777216"
timeout /t 10 /nobreak >nul
echo Testing stability on core 13...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=67108864"
timeout /t 10 /nobreak >nul
echo Testing stability on core 14...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=268435456"
timeout /t 10 /nobreak >nul
echo Testing stability on core 15...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=1073741824"
timeout /t 10 /nobreak >nul
goto loop
```

test_per_core(2_thread_on_each_core).bat:

```bat
@echo off
rem Same loop as the first script, but allows both threads of each core.
rem Each ProcessorAffinity value is a bitmask with bits 2*N and 2*N+1 set for core N.
echo Launch cinebench test (single core)
start .\Cinebench.exe -g_CinebenchCpu1Test=true -g_acceptDisclaimer=true -g_CinebenchMinimumTestDuration=3600
rem Give Cinebench time to start before changing its affinity.
timeout /t 20 /nobreak >nul
:loop
echo Testing stability on core 0...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=3"
timeout /t 10 /nobreak >nul
echo Testing stability on core 1...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=12"
timeout /t 10 /nobreak >nul
echo Testing stability on core 2...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=48"
timeout /t 10 /nobreak >nul
echo Testing stability on core 3...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=192"
timeout /t 10 /nobreak >nul
echo Testing stability on core 4...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=768"
timeout /t 10 /nobreak >nul
echo Testing stability on core 5...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=3072"
timeout /t 10 /nobreak >nul
echo Testing stability on core 6...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=12288"
timeout /t 10 /nobreak >nul
echo Testing stability on core 7...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=49152"
timeout /t 10 /nobreak >nul
echo Testing stability on core 8...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=196608"
timeout /t 10 /nobreak >nul
echo Testing stability on core 9...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=786432"
timeout /t 10 /nobreak >nul
echo Testing stability on core 10...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=3145728"
timeout /t 10 /nobreak >nul
echo Testing stability on core 11...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=12582912"
timeout /t 10 /nobreak >nul
echo Testing stability on core 12...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=50331648"
timeout /t 10 /nobreak >nul
echo Testing stability on core 13...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=201326592"
timeout /t 10 /nobreak >nul
echo Testing stability on core 14...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=805306368"
timeout /t 10 /nobreak >nul
echo Testing stability on core 15...
PowerShell "$Process = Get-Process Cinebench; $Process.ProcessorAffinity=3221225472"
timeout /t 10 /nobreak >nul
goto loop
```
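
To adapt the two scripts to a 12- or 8-core CPU, the affinity values can be generated instead of typed by hand. This is a minimal Python sketch of the pattern the scripts use, assuming SMT is enabled and that logical processors 2*N and 2*N+1 belong to core N (the same assumption the scripts make); `affinity_mask` and `masks_for_cpu` are hypothetical helper names.

```python
from typing import List


def affinity_mask(core: int, both_threads: bool = False) -> int:
    """Affinity bitmask for one physical core.

    Assumes SMT is on and logical processors 2*core and 2*core+1
    are the two threads of that core.
    """
    if core < 0:
        raise ValueError("core index must be non-negative")
    bits = 0b11 if both_threads else 0b01
    return bits << (2 * core)


def masks_for_cpu(cores: int, both_threads: bool = False) -> List[int]:
    """Masks for every core of a CPU with the given core count."""
    return [affinity_mask(c, both_threads) for c in range(cores)]


if __name__ == "__main__":
    # Example: values for a 12-core CPU, first thread of each core.
    for core, mask in enumerate(masks_for_cpu(12)):
        print(f"core {core}: ProcessorAffinity={mask}")
```

For 16 cores this reproduces the values in both scripts, so trimming the loop to the first 12 or 8 entries should be enough for smaller CPUs.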







DemonAk said:


> Received a 3rd CPU, AGAIN BG 2044SUS. This one is much better: it passes Boost Tester without a BSOD or reboot, LinX with one thread....



Passes Boost Tester without a BSOD or reboot
Passes LinX with one thread, 100 runs with 5k size
Passes Blender bench
Passes Geekbench
Passes LinX, 20 runs with 40k size
Passes RealBench, 5 runs
Passes 20 runs of the x264 Stability Test
Passes Prime95, 5 hours, with custom settings: min FFT 4k, max FFT 400k
Passes OCCT small data set and large data set, 1 hour each
Passes y-cruncher, all 9 tests, 10 min each
Right now I have Curve Optimizer set to -15 on all cores (-20 and -25 BSOD in Boost Tester); the previous 2 CPUs couldn't boot Windows with a curve of -5.


----------



## xProlific

Anyone having idle reboots: try setting "Power Supply Idle Control" in the BIOS to "Typical Current Idle". This fixed it for me.


----------



## RemoteSpecialist

xProlific said:


> Anyone having Idle reboots try setting "Power Supply Idle Control" in the bios to "Typical Current Idle". This fixed it for me.


I tried. This did not help in my case.


----------



## geoxile

The only "fix" for this is an RMA, right? Any idea how long 5950X RMAs are taking right now? I've been getting idle resets with a 5950X on a B550M Steel Legend. Turning off global C-states seems to be a workaround, but it's not a good solution because now my CPU "idles" at 70W.


----------



## ghiga_andrei

Twirlz said:


> Was wondering if anybody could help me in identifying if I really am suffering from this issue or something else.
> 
> I'm having idle reboots. Everything seems perfectly fine at load whereas on idle it just reboots randomly. Could be while browsing the web or when I'm away from the computer, it happened four times yesterday. I first installed the 5900X a few weeks ago and it ran beautifully until the 28th, which coincidentally is when I first installed the RX 6800.
> 
> Although I first assumed the RX 6800 is the cause of the reboots, upon checking event viewer I've had 11 WHEA errors since the 28th. Error type Cache Hierarchy Error, event ID 18 (and one Bus/Interconnect Error). I've been reading this thread but I'm getting a little confused. In my case, the GPU is the only thing which changed yet I'm now experiencing these reboots and getting processor core WHEAs in event viewer.
> 
> Thank you for any insight.
> 
> Specs:
> 5900X (PBO off, bg2046sus)
> Corsair 32GB 3200MHz CL16
> Asus Crosshair X470 (latest BIOs tested)
> EVGA G2 750W
> Sapphire RX 6800 Nitro


Try to do a BIOS reset by removing the battery and waiting for 15 seconds. It is different than loading defaults in the menu. Then do not change anything in BIOS and see if you get reboots.


----------



## Imraneo

I just got my replacement 5900X. Batch 5052SUS (previously 5043SUS) - not that the batch matters.
This new CPU seems conservative. Importantly, there are no reboots (keeping my fingers crossed here). Single core boosts to around 4.7 - 4.8 GHz. All-core is stuck at 4 GHz; I was expecting at least 4.5 GHz. I also noticed that it does not go below 3599 MHz. Is this normal? I thought it should go to sleep to conserve energy.
I'm so confused here. This CPU is like a different beast altogether.

Also, is it recommended to install Ryzen Master?

Cheers


----------



## ghiga_andrei

Imraneo said:


> I just got my replacement 5900X. Batch 5052SUS (previously 5043SUS) - not that the batch matters.
> This new CPU seems conservative. Importantly there are no reboots (keeping my fingers crossed here). Single core boosts around 4.7 - 4.8Ghz. All core is stuck at 4Ghz. I am expecting at least 4.5Ghz. I also noticed that is does not go below 3599Mhz. Is this normal? I thought it should go into sleep to conserve energy.
> I'm so confused here. This CPU is like a difference beast altogether.
> 
> Also, is it recommended to install Ryzen Master?
> 
> Cheers


Your batches are for sure 2052 and 2043, not starting with 50... it matters at least to know that your new CPU is 9 weeks newer in production... I don't know about the new boosts since I also have an "old" 5900x... mine boosts to 4.95GHz with default settings and all core it depends on benchmark and if you enable PBO... could be anything between 3.7GHz to 4.4GHz... To see the energy consumption see in HWInfo what is Total Socket Power and Effective clocks... mine sits at around 50W Socket Power in idle and Effective clocks are close to 100MHz except the core that runs HWInfo...
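
The "9 weeks newer" figure can be checked with a few lines of Python. This is only a sketch under the reading used in this post, namely that the four digits before "SUS" are a two-digit year followed by a week number; treating those weeks as ISO weeks is an additional assumption, and `parse_batch` and `weeks_between` are hypothetical names.

```python
import re
from datetime import date
from typing import Tuple

# Assumption: batch codes look like "BG 2044SUS" = year 2020, week 44.
_BATCH = re.compile(r"(\d{2})(\d{2})\s*SUS", re.IGNORECASE)


def parse_batch(code: str) -> Tuple[int, int]:
    """Return (year, week) read from a batch code such as 'BG 2044SUS'."""
    m = _BATCH.search(code)
    if m is None:
        raise ValueError(f"no batch code found in {code!r}")
    return 2000 + int(m.group(1)), int(m.group(2))


def weeks_between(older: str, newer: str) -> int:
    """Whole weeks between two batch codes, using ISO week numbering."""
    y1, w1 = parse_batch(older)
    y2, w2 = parse_batch(newer)
    d1 = date.fromisocalendar(y1, w1, 1)
    d2 = date.fromisocalendar(y2, w2, 1)
    return (d2 - d1).days // 7


if __name__ == "__main__":
    print(parse_batch("BG 2044SUS"))
    print(weeks_between("2043SUS", "2052SUS"))
```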


----------



## RemoteSpecialist

Imraneo said:


> I just got my replacement 5900X


On my 5900x (2046SUS) with PBO disabled I saw 4100 in multicore load and boost up to 4950 for the single-core load.


----------



## ghiga_andrei

RemoteSpecialist said:


> On my 5900x (2046SUS) with PBO disabled I saw 4100 in multicore load and boost up to 4950 for the single-core load.


With PBO disabled it depends a lot. If you run CPU-Z you get 4.1 - 4.3GHz all-core because it doesn't need much current, but if you run Prime95 Small FFT you get 3.6GHz all-core because you are limited by the stock EDC or TDC.


----------



## RemoteSpecialist

ghiga_andrei said:


> With PBO disabled it depends a lot


yep - thx for the point - I checked in the Cinebench R23 multi-core bench and monitored clocks in HWInfo.


----------



## Imraneo

ghiga_andrei said:


> Your batches are for sure 2052 and 2043, not starting with 50... it matters at least to know that your new CPU is 9 weeks newer in production... I don't know about the new boosts since I also have an "old" 5900x... mine boosts to 4.95GHz with default settings and all core it depends on benchmark and if you enable PBO... could be anything between 3.7GHz to 4.4GHz... To see the energy consumption see in HWInfo what is Total Socket Power and Effective clocks... mine sits at around 50W Socket Power in idle and Effective clocks are close to 100MHz except the core that runs HWInfo...


Yes, they're 2052 and 2043! Yup, the difference in batch is known, but whether it's "good" or "bad" remains to be seen.
4.95 GHz simultaneously on all cores? That's insane.
There are a few things I did. I went back to the Asus 3001 non-beta BIOS. This gave me better CB23 results, closer to what I'd expect. The CPU also runs cooler. It's funny: although the clocks on this one are lower than on my previous CPU, the CB23 scores are higher. Single core now goes up to 4.95 GHz. My all-core clocks are still low.

PBO is on by default. I'm trying out the Curve Optimizer, -10 on all cores. I do see slight improvements, but I'm trying not to get ahead of myself.
As for energy consumption, I remember my previous CPU went way below base clock speeds while idling. Does yours stay at 3.8 GHz or go lower?

Again, back to Ryzen Master. Is it a better way of tuning this CPU? Gotta admit, there's a lot to learn there.
Cheers


----------



## ghiga_andrei

Imraneo said:


> Yes, they're 2052 and 2043! Yup, the difference of batch is known, but whether its "good" or "bad" remains to be seen.
> 4.95Ghz simultaneously on all cores? That's insane.
> There are a few things I did. Went back to Asus 3001 non-beta BIOS. This gave me better CB23 results, closer to what I'd expect. CPU also runs cooler. It's funny although the clocks on this one are lower than my previous CPU, the CB23 scores are higher. Single core now goes up to 4.95Ghz. My all-core clocks are still low.
> 
> PBO by default is on. I'm trying out the curve optimizer, -10 on all cores. I do see slight improvements, but I'm trying not to get ahead of myself.
> As for energy consumption, I remember my previous CPU went way below base clock speeds while idling. Does yours stay at 3.8Ghz or go lower?
> 
> Again, back to Ryzen Master. Is it any better way of tuning this CPU? Gotta admit, there's alot to learn there.
> Cheers


4.95GHz single core of course...


----------



## xeizo

ghiga_andrei said:


> 4.95GHz single core of course...


Of course, 4.95 all core is with LN2


----------



## Imraneo

Sorry for being a noob here, but is "PBO auto" considered ON or OFF?
Coz now I've set it to enabled and whoooa... multi-core moved up from 4 GHz to 4.2 GHz, and now my CB23 score closely matches the online DB. Also, temps hit 90 deg too. Ouch.
At this point, my Curve Optimizer is off (I do clear the BIOS by shorting the pins. Thanks guys). I will slowly play around with more settings and eventually settle down.


----------



## RemoteSpecialist

PBO Auto = PBO Disabled


----------



## ghiga_andrei

Imraneo said:


> Sorry for being a noob here. But is "PBO auto" considered ON or OFF?
> Coz now I've set it to enabled and whoooa...Multicore moved up from 4Ghz to 4.2Ghz, and now my CB23 score closely matches the only DB. Also, temps hit 90 deg too. Ouch.
> At this point, my curve optimizer is off (I do clear the BIOS by shorting the pins. Thanks guys). Will slowly play around with more settings and eventually settle down.


Depends on your motherboard model, but remember that PBO depends also on the limits set. Stock limits are lower and Auto for sure uses them (maximum 142W total socket power). When you set it to enabled maybe you also set Limits to Motherboard instead of Default and that sets it to a maximum of 200W total socket power. So better multi-core performance but higher temps due to the extra 58W power.
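
The arithmetic behind those numbers, as a small Python sketch. The 142 W stock figure is consistent with AMD's usual AM4 rule of PPT = 1.35 x TDP for a 105 W part; the 200 W motherboard limit is simply the value quoted in this post and will differ between boards. The function names are made up for illustration.

```python
# AMD's stock package power limit (PPT) on AM4 is 1.35 x the rated TDP.
PPT_TO_TDP_RATIO = 1.35


def stock_ppt_watts(tdp_watts: float) -> int:
    """Stock PPT in watts for a given TDP, rounded to the nearest watt."""
    return round(tdp_watts * PPT_TO_TDP_RATIO)


def extra_power_watts(board_limit_watts: int, tdp_watts: float) -> int:
    """Extra socket power allowed by a board limit over the stock PPT."""
    return board_limit_watts - stock_ppt_watts(tdp_watts)


if __name__ == "__main__":
    print(stock_ppt_watts(105))         # stock limit for a 105 W TDP part
    print(extra_power_watts(200, 105))  # headroom with the quoted 200 W limit
```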


----------



## kr0mka

Hey again. My 5900X 2037SUS, which I couldn't make stable at all, went to RMA last week, and a new 2052SUS came back last Friday as a replacement from AMD. It is certainly at least a bit better than the 37 was, in that it doesn't reboot as fast as the old one lol, but I've already had a couple of WHEAs. So far I've tested plain stock with PBO (WHEA), without PBO (WHEA), with Typical Current Idle (WHEA), and I'm now testing with C-states off (2 hours so far without a WHEA reboot).

Is it possible the replacement CPUs aren't tested by AMD before shipping? It honestly surprises me.

Obviously it already looks like this one is bad too, but I'm starting to think something other than the CPU might be causing the idle reboots/WHEA 18 hierarchy errors. Is that possible?

The mobo is my main suspect here, but I can't really test that as I don't have another mobo to check with. What pointed me in this direction is that my PC was freezing/rebooting at idle even with the backup 3200G I bought cheap in case the RMA had delays. There were no WHEAs with it, but the system would freeze randomly while in the browser or idling and I'd need to reboot it manually. I couldn't make it stable either; I tried multiple combinations of RAM clocks and number of RAM sticks, a basic manual OC, and several other BIOS settings.

I've already tried 3 PSUs (CM G750M Bronze, XFX TS750 Gold and my current LC Power 1200W Platinum), so I think I've eliminated the PSU from the equation.


The things left in my spec that I haven't changed yet (as I don't have any replacements right now) are:

4 x 3200CL14 G.Skill Ares B-Die 8GB (I doubt it's the cause of the issues here, but I'll get 2 different sticks from a friend next week to test) 
Reference 6800XT (doubt it causes reboot issues here)
MSI X570 Tomahawk mobo - tried all the available bioses with my old 5900x and 3200g and couldn't make them stable at all.

I'm planning to test the new 5900x a bit more and check it out with the new ram sticks, if it doesn't help then I will try to RMA again and if the new replacement will still reboot randomly then I guess I'll go with RMAing the mobo.


----------



## Imraneo

ghiga_andrei said:


> Depends on your motherboard model, but remember that PBO depends also on the limits set. Stock limits are lower and Auto for sure uses them (maximum 142W total socket power). When you set it to enabled maybe you also set Limits to Motherboard instead of Default and that sets it to a maximum of 200W total socket power. So better multi-core performance but higher temps due to the extra 58W power.


Now, to get the CB23 scores on par with the online DB, here are my final settings (for now):
-Last non-beta BIOS AGESA 1.1.8.0
-XMP 3600
-PBO ON, Mobo limits (Thank you very much!)
-Curve optimizer -10

All cores boost to 4.4 GHz, single core up to 4.9 GHz.
I know Curve Optimizer -20 gave me a reboot, so I know one or more cores aren't playing well. I'll run this for a while to check stability and, if I have the energy, I will play around with per-core Curve Optimizer settings.

Cheers guys.


----------



## ghiga_andrei

After a lot of frustration and time wasted I keep the Curve Optimizer disabled. Even with -5 I get a reboot once every 3-4 days and it drives me mad. Stability is very hard to test.


----------



## xeizo

ghiga_andrei said:


> After a lot of frustration and time wasted I keep the Curve Optimizer disabled. Even with -5 I get a reboot once every 3-4 days and it drives me mad. Stability is very hard to test.


You have to find out whether you have one or a couple of bad cores and use Curve Optimizer to give them more juice. I have one very bad core at -1, one rather bad at -10, most at -15 and the best at -20. So per-core is necessary; all-core would have forced -1 on all of them.
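
As a small illustration of that last point: an all-core offset has to be one that every core tolerates, so the weakest core caps it. The core numbering below is made up; only the -1/-10/-15/-20 values come from the numbers above.

```python
# Most negative Curve Optimizer offset each core tolerates.
# Core numbers are hypothetical; the offsets are the ones quoted above
# for a 12-core CPU (one core at -1, one at -10, most at -15, best at -20).
per_core_limit = {0: -15, 1: -15, 2: -1, 3: -15, 4: -10, 5: -15,
                  6: -15, 7: -20, 8: -15, 9: -15, 10: -15, 11: -15}


def all_core_offset(limits):
    """An all-core offset must suit every core, so the weakest core decides.

    Offsets are negative, so the least aggressive one is the maximum.
    """
    return max(limits.values())


def average_offset(limits):
    """Mean offset actually applied when each core gets its own limit."""
    return sum(limits.values()) / len(limits)


print(all_core_offset(per_core_limit))           # -1
print(round(average_offset(per_core_limit), 1))  # -13.8
```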


----------



## ghiga_andrei

xeizo said:


> You have to find if you have one or a couple of bad cores and use Curve Optimizer to give them more juice. I have one very bad at -1, one rather bad at -10, most at -15 and the best at -20. So, per core is necessary, all core would have given me -1 on all.


Right, but since your best core does not benefit from CO, your single-core performance stays the same. Multi-core is slightly improved, but we already have 12 cores, so you will never notice 3-4% there. So what's the point of wasting weeks on rebooting and stability testing? Single-core performance matters most in normal daily tasks like browsing, YouTube, maybe even gaming, depending on the game.


----------



## RemoteSpecialist

Imraneo said:


> here's are my final settings (for now)


Congrats on getting it stable with those settings.


----------



## xeizo

ghiga_andrei said:


> Right, but since your best core does not benefit from CO then your Single core performance stays the same. And the Multi core is slightly improved, but already we have 12 cores so 3-4% there you will never notice. So what's the point of wasting weeks of rebooting and stability testing ? Single core performance is most important in normal daily tasks like browsing, youtube, even maybe gaming, depending on the games.


You are wrong: the best core does benefit from CO, as it boosts higher and, more importantly, sustains its boost, which is important in gaming. The benefit of a good core is that it can take CO without crashing. I define "good" as the most stable core, not the one boosting the highest for one microsecond.

All my cores can do 5GHz one way or the other, at least five can do 5150MHz, but only a few can sustain 4.9GHz single core in real workloads. And be stable.

Per core CO gives ~500-1000p extra in Time Spy CPU and a higher boosting monitor curve vs CO turned off.

I don't value multi very much, I have set PPT at 165W to keep temps low.


----------



## Imraneo

There's still one thing bugging me.
My CPU doesn't go to sleep; it constantly runs at 3.6 GHz at idle. Is yours the same? I had the impression the Global C-state setting would handle this, so that my cores run really slow, using less power and generating less heat.


----------



## ghiga_andrei

xeizo said:


> You are wrong, the best core do benefit from CO as it boosts higher but more important enjoys sustained boost which is important in gaming. The benefit of the good core is it can take CO without crashing. I label "good" as the most stable core, not as boosting the highest for one micro second.
> 
> All my cores can do 5GHz one way or the other, at least five can do 5150MHz, but only a few can sustain 4.9GHz single core in real workloads. And are stable.
> 
> Per core CO gives ~500p extra in Time Spy CPU and a higher boosting monitor curve vs CO turned off.


In most cases the highest-boosting core (also the Windows preferred core) is the one that takes the least negative CO. That's also my case. So how can the single-core score increase if my best core does not benefit from CO?


----------



## ghiga_andrei

Imraneo said:


> I still have 1 thing that's still bugging me.
> My CPU doesnt go to sleep. Constantly running at 3.6Ghz during idle. Is yours also the same? I had the impression the Global C-state setting will handle this, so that my cores run really slow, using less power and generating less heat.


Install Ryzen Master and see if it shows Sleep instead of core frequency in Advanced view.


----------



## brasoveanul

I have Global C-States on Auto and the cores go down to 2 GHz, even lower sometimes, at idle. By the way, does Auto for C-States mean that it is on?


----------



## xeizo

ghiga_andrei said:


> In most cases the highest boosting core (also windows preferred core) is also the one which takes the least negative CO. Also in my case. So how can the single core score increase if my best core does not benefit from CO ?


Then you have a boring CPU. My best core is certainly the one that can take the most CO, i.e. the most stable core. Trying the same on any other core is a guaranteed WHEA.

Edit: my current GB5, tested just now. Single core looks to hold up well, while multi is limited by the 165 W PPT.


----------



## Deepcuts

Imraneo said:


> I still have 1 thing that's still bugging me.
> My CPU doesnt go to sleep. Constantly running at 3.6Ghz during idle. Is yours also the same? I had the impression the Global C-state setting will handle this, so that my cores run really slow, using less power and generating less heat.


Try changing your power plan.
For me, the lowest clock on Power Saver is 1720 MHz, while on Balanced and Performance it is 2880 MHz.


----------



## Imraneo

ghiga_andrei said:


> Install Ryzen Master and see if it shows Sleep instead of core frequency in Advanced view.





Deepcuts said:


> Try changing your power plan.
> For me, Power Saver lowest clock is 1720 Mhz while Balanced and Performance is 2880 Mhz.


In Ryzen Master, I do see my cores sleeping, and some running at very low (<1 GHz) speeds at idle. This seems to be working OK.
Strangely, HWMonitor always shows a minimum of 3599 MHz. So I guess Ryzen Master is more reliable?


----------



## ghiga_andrei

Imraneo said:


> In Ryzen Master, I do see my cores sleeping, and some with very less <1Ghz speeds during idle. This seems to be working ok.
> Strangely HWmonitor always shows min 3599Mhz. So I guess Ryzen Master is more reliable?


It's made by AMD, so in theory nothing can be more reliable.


----------



## Imraneo

ghiga_andrei said:


> It's made by AMD, cannot be more reliable, theoretically.


Yup, that's what I think too. The numbers in Ryzen Master seem more "realistic" too. My temps are lower there (8 degrees lower!). Let me read around on what's going on between these two programs.


----------



## Anthosm

Imraneo said:


> Yup, thats what I think so too. The numbers in Ryzen Master seems more "realistic" too. My temps are lower there (8degs lower!) Let me read around on what's going on between these 2 softwares..


I think I read somewhere that other software is not as good at reading the actual sleep states as Ryzen Master is. Apparently another issue is that when you poll the cores to see what they are doing, you wake them by doing so (which is why some people see constantly high "idle" voltages: in reality the monitoring software doesn't truly allow the CPU to idle as much). Could be dead wrong though. Don't know.


----------



## Spectre73

ghiga_andrei said:


> After a lot of frustration and time wasted I keep the Curve Optimizer disabled. Even with -5 I get a reboot once every 3-4 days and it drives me mad. Stability is very hard to test.


Same for me, so no co for now.


----------



## kr0mka

kr0mka said:


> Hey again, so, my 5900X 2037SUS that i couldn't even make stable went to rma last week and and a new 2052SUS came back last friday as a replacement from AMD. It is certainly at least a bit better than the 37 was, in way that it doesn't reboot as fast as the old one lol but I've got a couple wheas so far already, managed to test plain stock with pbo (whea), without pbo (whea), with typical idle current (whea) and now testing with c-states off (2 hours so far without whea reboot).
> 
> Is it possible the replacement CPUs aren't tested by AMD before shipping? It honestly surprises me.
> 
> Obviously it already seems as this one is bad too, but I'm starting to think something other than the CPU in the system might be causing the idle reboots/whea 18 hierarchy errors, is it possible?
> 
> 
> The mobo is my main culprit here but I can't really test it, as I don't have any other mobo to check. The thing that pointed me in this direction is the fact that my pc was idle freezing/rebooting even when I put in my backup 3200G I bought for cheap in case the rma has delays. There was no wheas using it, but the system would freeze randomly when in browser/idling and i'd need to reboot it manually. I couldn't make it stable either, tried multiple combinations of ram clocks/amount of ram sticks in the system, basic manual oc, several other settings in bios.
> 
> 
> I've already tried 3 PSUs, CM G750M Bronze, XFX TS750 Gold and my current LC Power 1200W Platinum, so I think i've eliminated the PSU out of the equation.
> 
> 
> The things left in my spec that I haven't changed yet (as I don't have any replacements right now) are:
> 
> 4 x 3200CL14 G.Skill Ares B-Die 8GB (I doubt it's the cause of the issues here, but I'll get 2 different sticks from a friend next week to test)
> Reference 6800XT (doubt it causes reboot issues here)
> MSI X570 Tomahawk mobo - tried all the available bioses with my old 5900x and 3200g and couldn't make them stable at all.
> 
> I'm planning to test the new 5900x a bit more and check it out with the new ram sticks, if it doesn't help then I will try to RMA again and if the new replacement will still reboot randomly then I guess I'll go with RMAing the mobo.


Seems like my post was locked from viewing for some time, any opinions on the quoted?


----------



## xeizo

kr0mka said:


> Seems like my post was locked from viewing for some time, any opinions on the quoted?


Have you enabled GDM for the memory? It helps with stability. Also, it's c14 memory, it's like 3600c16 meaning it's OC. You need to check VDIMM, VTT, PLL, SOC, VDDP and VDDG for a stable memory OC. Or just run it in c16 instead. And not below 1.35V. You may also have to check secondary timings for the memory, it could happen the motherboard does silly memory training.

Do you use dual 12V wires to your GPU from the PSU? 6800XT is known for freezing if using splitters.

The two most obvious culprits.


----------



## kr0mka

xeizo said:


> Have you enabled GDM for the memory? It helps with stability. Also, it's c14 memory, it's like 3600c16 meaning it's OC. You need to check VDIMM, VTT, PLL, SOC, VDDP and VDDG for a stable memory OC. Or just run it in c16 instead. And not below 1.35V. You may also have to check secondary timings for the memory, it could happen the motherboard does silly memory training.
> 
> Do you use dual 12V wires to your GPU from the PSU? 6800XT is known for freezing if using splitters.
> 
> The two most obvious culprits.


Yeah, GDM is enabled by default with XMP here. I've also tried running the memory at silly speeds like 2133 with JEDEC timings, and the old CPU was still freezing. The new 5900X just crashed on me while replying to this post, with C-state control disabled.
Now I'm trying the aforementioned 2133 memory speed on the new one.
Yeah, I have 2x8 pin on separate cables coming from psu to the 6800xt.

I honestly doubt I'm that unlucky since most people with wheas seem to be able to fix it with c-state/idle current or the offsets on vcore.


----------



## ENTERPRISE

Thought I would chime in, though no problems here (touch wood). I have a launch 5950X; not sure of the batch number unless I can get it from the box. I am using an ASRock Aqua X570. I did notice on the poll that ASRock motherboards come out better than the others. This does not necessarily mean anything, but it makes you wonder whether it is just down to the popularity of other motherboard vendors or whether ASRock does something in its BIOS that alleviates the issue. Who knows at this point.

I am eagerly awaiting the new AGESA for improvements.


----------



## GamBoTron

ENTERPRISE said:


> Though I would chime in, Though no problems here (Touch Wood), I have a launch 5950X. not sure on the batch number unless I can get it from the Box. I am using an Asrock Aqua X570. I did notice on the poll that the Asrocks motherboards come in better than the others, this does not necessarily mean anything but makes you wonder if it is just down to popularity of other motherboard vendors or if Asrock do something with their BIOS that alleviates the issue? Who knows at this point.
> 
> I am eagerly awaiting the new AGESA for improvements.


Motherboards definitely play their part in this, but overall at a much lower level than the CPU and its power specifications. This is happening across so many different setups, and even though the problems come in different variations, they all lead to more or less the same thing.


----------



## kr0mka

kr0mka said:


> Yeah gdm is enabled by default on xmp here, I've also tried running the memory on silly speeds like 2133 with JEDEC timings and the old one also was freezing. The new 5900x just crashed on me replying to this post, with C-state control disabled.
> Now I'm trying the forementioned 2133 memory speed.on the new one.
> Yeah, I have 2x8 pin on separate cables coming from psu to the 6800xt.
> 
> I honestly doubt I'm that unlucky since most people with wheas seem to be able to fix it with c-state/idle current or the offsets on vcore.


Soo, I think I've found the issue: HWINFO was probably the culprit here with my cpu/gpu combo. Saw a reddit post mentioning it, later the thread that I linked just now. I'm running without hwinfo since yesterday with pbo and xmp on and haven't encountered a whea yet. 

So anyone experiencing wheas with 6800XT & Ryzen CPUs ( or probably any other navi gpu ) and is running Hwinfo in the background for monitoring, please check out this thread and try disabling hwinfo for some time.
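
For anyone trying this, a before/after count of WHEA-Logger events makes the comparison less of a gut feeling. This is a minimal sketch, assuming Windows' built-in `wevtutil` and a text output in which each event has a `Date:` line followed by an `Event ID:` line; the function names are made up for illustration, and the parsing is an assumption about that layout rather than a documented format.

```python
import re
import subprocess
from collections import Counter

# XPath filter for the WHEA-Logger provider in the System log.
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def read_whea_events_text():
    """Return WHEA-Logger events as text (Windows only, uses wevtutil)."""
    result = subprocess.run(
        ["wevtutil", "qe", "System", f"/q:{WHEA_QUERY}", "/f:text"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_whea_by_day(text):
    """Count events per (date, event ID) from wevtutil-style text output.

    Assumes each event has a 'Date: YYYY-MM-DD...' line followed later by
    an 'Event ID: N' line; all other lines are ignored.
    """
    counts = Counter()
    current_date = None
    for line in text.splitlines():
        date_match = re.match(r"\s*Date:\s*(\d{4}-\d{2}-\d{2})", line)
        id_match = re.match(r"\s*Event ID:\s*(\d+)", line)
        if date_match:
            current_date = date_match.group(1)
        elif id_match and current_date is not None:
            counts[(current_date, int(id_match.group(1)))] += 1
            current_date = None
    return counts
```

On Windows, `count_whea_by_day(read_whea_events_text())` gives a per-day tally that can be compared for the days with and without HWiNFO running.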


----------



## RemoteSpecialist

kr0mka said:


> try disabling hwinfo for some time.


I tried - this did not help - I can easily reproduce WHEA error without hwinfo.

P.S.
I don't think software can provoke a hardware CPU error if the CPU is not faulty.


----------



## xeizo

kr0mka said:


> Soo, I think I've found the issue: HWINFO was probably the culprit here with my cpu/gpu combo. Saw a reddit post mentioning it, later the thread that I linked just now. I'm running without hwinfo since yesterday with pbo and xmp on and haven't encountered a whea yet.
> 
> So anyone experiencing wheas with 6800XT & Ryzen CPUs ( or probably any other navi gpu ) and is running Hwinfo in the background for monitoring, please check out this thread and try disabling hwinfo for some time.


Thanks! There is also a new build posted late in that thread, worth trying


----------



## kr0mka

RemoteSpecialist said:


> I tried - this did not help - I can easily reproduce WHEA error without hwinfo.
> 
> P.S.
> I think it is impossible to invoke Hardware CPU Error programmatically if the CPU is valid.


This probably applies only to Navi GPUs, since the dev himself suspects the new Navi monitoring features added in the last version.
I wasn't able to make it stable with bios settings at all on three CPUs, so I really started to doubt it was the cpu's fault here. 

And since you managed to fix the issue with typical current bios setting then I guess you are dealing with a real hardware issue here.


----------



## yaniv82

There's a new AMD Chipset Driver version on Asus's ROG Crosshair VIII Dark Hero support site released today (Version 2.11.26.106). I'm trying it out with the latest BIOS 3204. So far PC has been stable on BIOS default settings.


----------



## folklore11

yaniv82 said:


> There's a new AMD Chipset Driver version on Asus's ROG Crosshair VIII Dark Hero support site released today (Version 2.11.26.106). I'm trying it out with the latest BIOS 3204. So far PC has been stable on BIOS default settings.


I've had no issues with my rig: 5950X, MSI RTX 2080 Super, 32 GB G.Skill B-die at 3400 with tight timings, Arctic AIO 360, two Samsung EVO Plus NVMe drives, two 2 TB WD Black HDDs and a Corsair HX1000i PSU.


----------



## MikeS3000

Submitted my warranty claim today for my 5900x. We'll see what happens. Anybody have some more recent experiences to share with the amount of time they were without their CPU in the United States after sending in for RMA?


----------



## xeizo

yaniv82 said:


> There's a new AMD Chipset Driver version on Asus's ROG Crosshair VIII Dark Hero support site released today (Version 2.11.26.106). I'm trying it out with the latest BIOS 3204. So far PC has been stable on BIOS default settings.


Thanks! Trying it with the CH8 WiFi, so far so good


----------



## RaEyE

Not sure if it has been posted here already, but there seems to be a correlation between WHEA errors on AMD platforms with Navi cards and HWiNFO.

Is HWiNFO causing the dreaded WHEA-Logger Event ID XX Cache Hierarchy Errors and sudden reboots on AMD Ryzen systems? (www.hwinfo.com)

Hello everyone: A couple of users and myself have been suffering sudden reboots with our computers composed of Ryzen CPU systems (Ryzen 3000, but especially 5000) under different load conditions. The quickest way for us to trigger it, however, has been by using software designed to test RAM...

I'm not sure if this also applies to NVIDIA GPUs, but it should be worth a shot, to simply disable HwInfo or switch to a version <= 6.40.


----------



## xeizo

RaEyE said:


> Not sure if it has been posted here already, but there seems to be a correlation between WHEA errors on AMD platforms with Navi cards and HWiNFO.
>
> Is HWiNFO causing the dreaded WHEA-Logger Event ID XX Cache Hierarchy Errors and sudden reboots on AMD Ryzen systems? (www.hwinfo.com)
>
> Hello everyone: A couple of users and myself have been suffering sudden reboots with our computers composed of Ryzen CPU systems (Ryzen 3000, but especially 5000) under different load conditions. The quickest way for us to trigger it, however, has been by using software designed to test RAM...
> 
> I'm not sure if this also applies to NVIDIA GPUs, but it should be worth a shot, to simply disable HwInfo or switch to a version <= 6.40.


There is a new test version in the HWINFO forum with a possible fix, 6.43-4362


----------



## iNeri

xeizo said:


> There is a new test version in the HWINFO forum with a possible fix, 6.43-4362


Yep, Mumak already confirmed the issue.

So, if AMD doesn't want to provide samples for developers, then developers have to remove support for untested hardware to avoid this kind of problem.

But hey, it's more important that the YouTuber of the day has a sample!!! Yay.


----------



## Deepcuts

Had a 6800 XT for 3 days and had no issue with hwinfo. Too bad it was not mine. Back to my 1080ti


----------



## iraff1

Can someone confirm that the long-awaited AGESA patch is just a down-tune of the boost algorithm? My CPU used to hit 5050 MHz on 6-7 cores and now maxes out at 5000 MHz on 3 cores, so clearly they lowered the performance and called it optimized... My CPU was perfectly stable on the old AGESA, so what's the point of upgrading if it's a downgrade?


----------



## Hueristic

iraff1 said:


> can someone confirm that the long awaited AGESA patch is just a down tune of the boost algoritm? I see my cpu that use to hit 5050 mhz on 6-7 cores are now max hitting 5000mhz on 3 cores, so clearly they lowered the performance and called it optimized... my cpu was perfectly stable in old AGESA so whats the point of upgrading if its a downgrade?


Why would you change your BIOS when it was functioning optimally is the real question.


----------



## domdtxdissar

iraff1 said:


> can someone confirm that the long awaited AGESA patch is just a down tune of the boost algoritm? I see my cpu that use to hit 5050 mhz on 6-7 cores are now max hitting 5000mhz on 3 cores, so clearly they lowered the performance and called it optimized... my cpu was perfectly stable in old AGESA so whats the point of upgrading if its a downgrade?


I can confirm this: I lost ~100 MHz single-thread updating the Crosshair VIII Hero WiFi from BIOS 3003 to 3204 (non-beta, final BIOS) with AGESA 1.2.0.0.
Effective clocks went from ~5100 MHz to 5000 MHz on my 5950X in Cinebench R20 single-thread (highly optimized 24/7 tuned settings, with C-states disabled so that -30 all-core CO is stable).

Multithread is the same or a smidge better.

The only reason for my upgrade to AGESA 1.2.0.0 is SAM support for Nvidia. Otherwise I was perfectly fine on the 3003 BIOS.


domdtxdissar said:


> One last hurrah for bios 3003 before i update to a bios with AMD AM4 AGESA V2 PI 1.2.0.0 and support for Nvidia smart access memory.
> Cold air benching with EK custom waterloop+TechN Zen3 waterblock
> Curve optimizer = -30 allcore
> Stable in everything i throw at it, and no WHEA errors.
> 
> Cinebench r23 multithread = 32229 points
> Cinebench r23 singlethread = 1729 points
> 
> Cinebench r20 multithread = 12441 points
> Cinebench r20 singlethread = 674 points
> 
> Cinebench r15 multithread = 5404 points
> Cinebench r15 singlethread = 288 points
> 
> CPU-Z validator @ AMD Ryzen 9 5950X @ 4798.88 MHz - CPU-Z VALIDATOR
> 
> Some Asus realbench + Passmark performancetest numbers @ PassMark Software - Display Baseline ID# 1359214 (This machine is ranked #36 out of 156355 results globally)
> 
> Geekbench 4 @ ASUS System Product Name - Geekbench Browser
> Singlethread = 8215 points
> Multithread = 74733 points
> 
> Geekbench5 @ ASUS System Product Name - Geekbench Browser
> Singlethread = 1844 points
> Multithread = 20054 points
> 
> Some heavy IBT high+very high and Y-Cruncher numbers:
> 
> Did also run a full sweep of all 3dmarks, but i will post that in one other thread


----------



## Deepcuts

iraff1 said:


> can someone confirm that the long awaited AGESA patch is just a down tune of the boost algoritm? I see my cpu that use to hit 5050 mhz on 6-7 cores are now max hitting 5000mhz on 3 cores, so clearly they lowered the performance and called it optimized... my cpu was perfectly stable in old AGESA so whats the point of upgrading if its a downgrade?


Gigabyte F33a with AGESA 1.2.0.0 with a -15 CO all core. Looks about the same


----------



## iraff1

Hueristic said:


> Why would you change your bios when it was functioning optimal is the real question.


Just to try out the "optimized" performance, but turns out optimized was a huge lie, they just tune everything down until its stable. The real question should be why didn't any of the hundreds of reviewers get a sample that had these problems yet so many of us regular plebs have issues with the default boostcurves. AMD scamboozled everyone, shipped perfect golden samples to reviewers and the plebs (the customers) got whatever didn't make the cut for the reviewer scene i guess? And now they are tuning down the performance from the original base level that was advertised/reviewed.


----------



## RemoteSpecialist

My 5900X crashes at 4950 MHz. The specification says the max is 4800. So if they make it rock-solid at 4800, I will thank them.


----------



## ghiga_andrei

RemoteSpecialist said:


> My 5900X crashes at 4950 MHz. The specification says the max is 4800. So if they make it rock-solid at 4800, I will thank them.


That's not enough. They have to meet the performance shown in the launch presentation "Where Gaming Begins"; otherwise it was all a scam and they lied to us.


----------



## RemoteSpecialist

4950 vs 4800 - I don’t think it’s gonna be easy to catch this 3%. Also just take a look at 3dmark results for any CPU - you will always see some difference to higher or lower level.

But the black-screen restarts, when the PC suddenly loses all power, are much worse, as one of them can leave you with a damaged part, an RTX 3080 for example.


----------



## GamBoTron

iraff1 said:


> Just to try out the "optimized" performance, but turns out optimized was a huge lie, they just tune everything down until its stable. The real question should be why didn't any of the hundreds of reviewers get a sample that had these problems yet so many of us regular plebs have issues with the default boostcurves. *AMD scamboozled everyone*, shipped perfect golden samples to reviewers and the plebs (the customers) got whatever didn't make the cut for the reviewer scene i guess? And now they are tuning down the performance from the original base level that was advertised/reviewed.


That's a bit extreme. This isn't an issue for everyone.


----------



## xeizo

GamBoTron said:


> thats a bit extreme. This isnt a issue for everyone


I agree, no hidden agenda here, it just is what it is. I still have the same performance as with the first V2 bioses. I can just look at my old benchmarks and they agree. The only thing I have had to tweak is to remove WHEA.


----------



## GamBoTron

xeizo said:


> I agree, no hidden agenda here, it just is what it is. I still have the same performance as with the first V2 bioses. I can just look at my old benchmarks and they agree. The only thing I have had to tweak is to remove WHEA.


I got a bit confused here: do you agree with my statement, or that everyone got scamboozled (funny word, btw)?

I have heard so many different stories when it comes to this product, ranging from fantastic performance to WHEA hell.

Scam is a bit too harsh imo.


----------



## xeizo

GamBoTron said:


> i got a bit confused here, do you agree with my statement or that everyone got scamboozled (funny word btw )?
> 
> I have heard so many different stories when it comes to this product: ranging from fantastic performance to WHEA hell
> 
> Scam is a bit to harsh imo.


I agree it is no scam, the product has issues though (WHEA, sudden reboots)


----------



## MikeS3000

While waiting on RMA request from AMD for my 5900x with one bad core, my crazy last ditch effort is to install my CPU on a different MB. I just ordered B550 MSI Unify that will be here tomorrow. This YouTube video, while older and pertaining to Zen 2 got me very suspicious of my x570 Aorus Pro Wifi. 



Let's see what happens tomorrow. I will run the same single-core stress tests under BIOS defaults, and let's see if I can "save" my 5900X.


----------



## DestrucSean7

*Replacing the CPU cannot be the fix for everyone.*

I got fed up with this issue after excessive Windows reboots (Gigabyte Aorus Elite X570 + Ryzen 5950X). I decided to reinstall my old Ryzen 2700X and load optimized defaults in BIOS, and the exact same issue occurred. My 2700X was 100% stable on my old B350 motherboard. I have literally tried every fix that I could find except:

- Increased SoC voltage
- Downgrading PCIe Gen 4 to Gen 3

I'm considering development of a single website for everyone to voice their issues. This site would force everyone to list all hardware peripherals connected so we could look for trends.


----------



## MikeS3000

DestrucSean7 said:


> *Replacing the CPU cannot be the fix for everyone.*
> 
> I got fed up with this issue after excessive Windows reboots (Gigabyte Aorous Elite X570 + Ryzen 5950X). I decided to reinstall my old Ryzen 2700X and load optimized defaults in BIOS, and the exact same issue occurred. My 2700X was 100% stable on old B350 motherboard. I have literally tried every fix that I could find except:
> 
> - Increased SoC voltage
> - Downgrading PCIe Gen 4 to Gen 3
>
> I'm considering development of a single website for everyone to voice their issues. This site would force everyone to list all hardware peripherals connected so we could look for trends.


I do know that I have owned this Gigabyte board since 8/19. I ran a 3900X on it without any of these issues up until Nov. '20. It could be the combo of newer BIOS versions with Zen 3 CPUs on Gigabyte.


----------



## kr0mka

DestrucSean7 said:


> *Replacing the CPU cannot be the fix for everyone.*
> 
> I got fed up with this issue after excessive Windows reboots (Gigabyte Aorous Elite X570 + Ryzen 5950X). I decided to reinstall my old Ryzen 2700X and load optimized defaults in BIOS, and the exact same issue occurred. My 2700X was 100% stable on old B350 motherboard. I have literally tried every fix that I could find except:
> 
> - Increased SoC voltage
> - Downgrading PCIe Gen 4 to Gen 3
>
> I'm considering development of a single website for everyone to voice their issues. This site would force everyone to list all hardware peripherals connected so we could look for trends.





MikeS3000 said:


> I do know that I have owned this Gigabyte board since 8/19. I ran a 3900x on it without any of these issue up until Nov. '20. Could be the combo of newer BIOS versions with Zen 3 CPUs on Gigabyte.



If you're running hwinfo in the background be sure to turn it off from start up or run the latest beta build if you have a radeon GPU. Seems like HWINFO has been causing wheas for some people (including me) running ryzen along with radeon gpu. It started for me mid december I think and since I stopped using hwinfo the wheas stopped too.


----------



## Catscratch

DestrucSean7 said:


> *Replacing the CPU cannot be the fix for everyone.*
> 
> I got fed up with this issue after excessive Windows reboots (Gigabyte Aorous Elite X570 + Ryzen 5950X). I decided to reinstall my old Ryzen 2700X and load optimized defaults in BIOS, and the exact same issue occurred. My 2700X was 100% stable on old B350 motherboard. I have literally tried every fix that I could find except:
> 
> - Increased SoC voltage
> - Downgrading PCIe Gen 4 to Gen 3
>
> I'm considering development of a single website for everyone to voice their issues. This site would force everyone to list all hardware peripherals connected so we could look for trends.





MikeS3000 said:


> I do know that I have owned this Gigabyte board since 8/19. I ran a 3900x on it without any of these issue up until Nov. '20. Could be the combo of newer BIOS versions with Zen 3 CPUs on Gigabyte.


Robert Hallock tells those who have Zen 2 CPUs to stay on a non-Zen 3 BIOS.


https://twitter.com/i/web/status/1324562768914239494


----------



## DestrucSean7

kr0mka said:


> If you're running hwinfo in the background be sure to turn it off from start up or run the latest beta build if you have a radeon GPU. Seems like HWINFO has been causing wheas for some people (including me) running ryzen along with radeon gpu. It started for me mid december I think and since I stopped using hwinfo the wheas stopped too.


Crazy strange timing on this. I literally just got suspicious of HWiNFO (and other monitoring software that could be causing race conditions) and disabled it on startup 5 minutes before reading the message. Currently on Gigabyte F32. System specs are:

Ryzen 5950X
Gigabyte Aorus Elite X570
32GB G.Skill 3200MHz
Radeon 6900XT
4TB Sabrent PCIe Gen 4 NVMe SSD
2TB Samsung 860 Evo SATA SSD
Creative SoundBlasterX AE-5
NZXT H510 Elite

*UPDATE:* Highly recommend anyone with WHEA errors and desktop reboots to remove HWiNFO... epic thanks to kr0mka - I've been 99% stable for 3 weeks now.


----------



## xeizo

Catscratch said:


> Robert Hallock says to stay on non-zen3 bios to those who have zen2 cpus.
> 
> 
> __ https://twitter.com/i/web/status/1324562768914239494


He has to say that, as every bios upgrade is a risk of bricking the device.

The AGESA 1.2.0.0 bioses have excellent memory support, I have set new OC records for crappy RAM I have on both Zen 2 and Zen+ with 1.2.0.0. Better than ever before.

On my Zen 3 rig I don't see WHEA anymore after identifying the bad cores(two) and giving them more juice than the others in CO.


----------



## Imraneo

How do I find out which of my cores is weak? There are 12 and the number of combinations is huge.
My best cores (according to Ryzen Master) are 1, 4, 7, 8 and I have set them to -5 in CO. The rest I've set to -10.
I'm happy with multi-core performance, but I'd like to have just a little more juice for single-core. I see cores 1 & 4 being used most of the time. Should I just play with CO for these 2 cores only? That being said.. I'm probably being too ambitious and don't mind leaving things as they are right now.. lol!


----------



## machine038

Imraneo said:


> How do I find out which of my cores is weak? There are 12 and the combination is huge.
> My best cores (according to Ryzen Master) are 1, 4, 7, 8 and I have set them to -5 in CO. Rest I've set to -10.
> I'm happy with multi-core performance, but I do like to have just a little more juice for single-core. I see cores 1 & 4 being used most of the time. Should just play with CO with respect to these 2 cores only? That being said.. I'm probably being too ambitious and don't mind leaving things as they are right now.. lol!


Start by setting a negative value for all cores, test for a day or two, and leave it idle overnight. If it is stable, set another negative value, in steps: -5, -10, -15...
When it crashes, dial back up one at a time until it is stable: -10, -9, -7. You have just found your weakest core.
Then go core by core: if you're at -7, go back to -10, -15, -20. Rinse and repeat until you've done all cores and your system is stable.

I managed to get almost all cores to -30, with just the other two at -10, doing this. My slowest/weakest core is the "fastest one", boosting only to 4.9GHz; all other cores boost to 5GHz.

Or you can try the new ClockTuner for Ryzen (CTR) v2.0 RC3.
It sorta does that for you in an automated fashion. Please read the guide thoroughly before clicking away.. 

It takes a long time to get some results, but faster than doing by hand.
Also there is a "Silicon Fitness Test" that is pretty neat.
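
For anyone who wants the manual procedure above written down, here is a minimal Python sketch of the search order only. It is an illustration, not a tool: `is_stable` is a hypothetical stand-in for the days of real-world testing described above (nothing here talks to the BIOS or to CTR), and the names `find_limit`, `tune_per_core`, `step` and `floor` are made up for this example. It also assumes stability only gets worse as the offset gets more negative.

```python
from typing import Callable, Dict


def find_limit(is_stable: Callable[[int], bool],
               start: int = 0, step: int = 5, floor: int = -30) -> int:
    """Return the most negative offset that passes is_stable.

    Walks down from `start` in coarse steps; on the first failure it
    backs up one count at a time (e.g. -10 fails, try -9, -8, -7).
    """
    last_good = start
    while last_good > floor:
        candidate = max(last_good - step, floor)
        if is_stable(candidate):
            last_good = candidate
            continue
        # Coarse step failed: probe the values between it and the last good one.
        for fine in range(candidate + 1, last_good):
            if is_stable(fine):
                return fine
        return last_good
    return last_good


def tune_per_core(core_is_stable: Callable[[int, int], bool],
                  core_count: int, all_core_limit: int) -> Dict[int, int]:
    """Starting from the all-core limit, push each core further on its own."""
    return {
        core: find_limit(lambda off, c=core: core_is_stable(c, off),
                         start=all_core_limit)
        for core in range(core_count)
    }


if __name__ == "__main__":
    # Made-up per-core limits, purely to exercise the search.
    hidden = {0: -10, 1: -30, 2: -7}
    all_core = find_limit(lambda off: all(off >= lim for lim in hidden.values()))
    print(all_core)
    print(tune_per_core(lambda c, off: off >= hidden[c], 3, all_core))
```

In practice each `is_stable` call is "run it for a day or two and leave it idle overnight", so the coarse-then-fine order matters mostly because it keeps the number of those long tests down.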


----------



## Imraneo

machine038 said:


> You start with setting for all cores a negative value, test for a day or two, leave it idle overnight, if is stable, set another negative value, in steps, -5, -10, -15...
> When it crash, dial back up one until is stable, -10, -9, -7. You just found your weakest core..
> Then you go by per core, if you're at -7, go back to -10, -15, -20. Rinse and repeat until you've done with all cores and you got your system stable.
> 
> I managed to get almost all cores at -30, just other two at -10 doing this. My slowest/weakest core is the "fastest one" boosting only 4.9GHz, all other cores boost to 5GHz.
> 
> Or you can try the new ClockTuner for Ryzen (CTR) v2.0 RC3.
> It sorta does that for you in an automated fashion. Please read the guide thoroughly before clicking away..
> 
> It takes a long time to get some results, but faster than doing by hand.
> Also there is a "Silicon Fitness Test" that is pretty neat.


Thanks for sharing! It seems like there is no quick way of doing this. I will check out the software you mentioned.
So far this is what I did:
All cores -10: rebooted once overnight
All cores -10, except 4 fastest cores at -5: seems ok so far.
Again, I'm just taking the 4 cores that Ryzen Master told me. In reality it may be different, perhaps?
Anyway, I apologize for going off topic here. I think we should focus on the WHEA idle reboots that many members are still facing.


----------



## Momo6161

Alright, I made an account just to post this 😁 First of all, thank you for showing me that the problem was the CPU... I built my PC on my own and searched for literally 2 days, so there was no way it couldn't be the CPU. The reason I'm writing this message is the following:

I saw that the solution here was a broken CPU, and that the second CPU was produced in week 48 (or 46, I don't know, I'm on my smartphone, don't roast me 😂), and he said that the second CPU was working fine, which confused me a little. Why? Because my CPU was from the same week as his SECOND CPU. So I got scared and still wanted an RMA on my Ryzen 5950X. And guess what? My RMA got approved today! I have mixed feelings. I'm happy to get a new one, but on the other hand it shows that it doesn't matter when the CPU was produced, because with the same production date the thread creator's CPU works and mine didn't. We will see. And for all those asking: I sent my CPU to the Netherlands, so I'm in Europe, and it took 2 days to get approved, after an additional 6 days waiting for the return label... so once you have that return label it should go brrrrrr


----------



## MikeS3000

So I just set up and installed my 5900X in a brand new MSI B550 Unify. I fired up Prime95 large single thread isolated to Core #1 and the CPU fails the test almost instantly. I updated the BIOS to the latest and got the same issue. This confirms a defective CPU and I am going forward with the RMA. I can't blame Gigabyte on this one, as both boards have the exact same failure.


----------



## Deepcuts

MikeS3000 said:


> So I just setup and installed my 5900x into a brand new MSI B550 Unify. I fired up prime95 large single thread isolated to Core #1 and the CPU fails the test almost instantly. I updated the BIOS to the latest and same issue. This confirms a defective CPU and I am going forward with RMA. I can't blame Gigabyte on this one as both boards have the exact same failure.


hehe you wanted to be the 1st one to check "It did, but a replacement motherboard fixed it" ?
Good luck with the RMA.


----------



## MikeS3000

Deepcuts said:


> hehe you wanted to be the 1st one to check "It did, but a replacement motherboard fixed it" ?
> Good luck with the RMA.


For sure I did. The Unify is nice, but I can't justify keeping it over my Gigabyte X570. I got the email from AMD today asking for photos and documentation of my troubleshooting steps. I recorded a YouTube video of the CPU failing Prime95 at stock. Hopefully this will be a quick turnaround and fix the issue.

Found out my CPU is a 2047PGS


----------



## mtavel

After weeks of troubleshooting, I've found that my 5950x unexpected reboots and WHEA errors were caused by core instability at low voltage associated with sleep C-States. Disabling C-States immediately resolved all my reboots - but also raised my idle temps and the low voltage instability prevents me from running any meaningful negative PBO curve optimization. Replaced my 5950x with a 5600x and it ran perfectly (with Global C-State Control enabled, as default). It's definitely the CPU. Going through RMA now.










Here's a summary of the APIC IDs associated with the 41 WHEA errors I had logged over a 16 day period. Core 14 and its neighbors have some issues.
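
If you want to build the same kind of summary yourself, a rough sketch: export the WHEA-Logger events from Event Viewer as text and count the APIC IDs. This assumes each event message contains a `Processor APIC ID: N` field, as WHEA-Logger messages typically do; the sample text below is made up for illustration and is not real log data, and `tally_apic_ids` is just a name picked for this example.

```python
import re
from collections import Counter

# Matches the APIC ID field assumed to be present in each exported event message.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_text: str) -> Counter:
    """Count how often each APIC ID appears in exported event text."""
    return Counter(int(apic) for apic in APIC_RE.findall(event_text))


# Made-up sample, not real log data.
SAMPLE = """
A corrected hardware error has occurred. Processor APIC ID: 28
A corrected hardware error has occurred. Processor APIC ID: 29
A fatal hardware error has occurred. Processor APIC ID: 28
"""

if __name__ == "__main__":
    for apic, count in tally_apic_ids(SAMPLE).most_common():
        print(f"APIC ID {apic}: {count} error(s)")
```

Keep in mind the APIC ID identifies a thread, so it still has to be mapped to a core before touching per-core settings.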


----------



## Hueristic

MikeS3000 said:


> So I just setup and installed my 5900x into a brand new MSI B550 Unify. I fired up prime95 large single thread isolated to Core #1 and the CPU fails the test almost instantly. I updated the BIOS to the latest and same issue. This confirms a defective CPU and I am going forward with RMA. I can't blame Gigabyte on this one as both boards have the exact same failure.



FYI, I had a Gigabyte UD burn out the unlocked cores on a Zosma after it had run fine for 6 years on an ASRock Extreme4. The CPU unlocked fine with no issues, and then on reboot the cores burned out permanently.


----------



## JohnnyFlash

mtavel said:


> After weeks of troubleshooting, I've found that my 5950x unexpected reboots and WHEA errors were caused by core instability at low voltage associated with sleep C-States. Disabling C-States immediately resolved all my reboots - but also raised my idle temps and the low voltage instability prevents me from running any meaningful negative PBO curve optimization. Replaced my 5950x with a 5600x and it ran perfectly (with Global C-State Control enabled, as default). It's definitely the CPU. Going through RMA now.
> 
> View attachment 2477308
> 
> 
> Here's a summary of the APIC ID's associated with the 41 WHEA errors I had logged over a 16 day period. Core 14 and it's neighbors have some issues.


Did disabling sleep states stop the WHEA errors as well?


----------



## mtavel

JohnnyFlash said:


> Did disabling sleep states stop the WHEA errors as well?


Disabling Global C-State Control (which disabled the low power sleep state on the CPU) stopped the idle reboots and WHEA errors I was getting. Switching from my 5950x to my temporary 5600x fixed the issue too - no issue running default C-States on the 5600x. I think I lost the silicon lottery with my 5950x being unstable at the standard low voltages associated with core sleep.


----------



## brasoveanul

I would like to clarify, does Global C-States on Auto (Asus BIOS) mean it is on or off?


----------



## mtavel

brasoveanul said:


> I would like to clarify, does Global C-States on Auto (Asus BIOS) mean it is on or off?


Auto does not disable the feature; it allows the mobo to decide when the feature is enabled or disabled... so at best, it would not be enabled all the time, but it would be hard to know its actual state without constantly observing your cores for low voltages and sleep states. You're best off disabling it if you want to test the impact of this setting on your CPU.


----------



## Imraneo

I sense a non-Beta BIOS coming real soon for Asus.
I saw a glimpse of 3405 based on the same AGESA 1.2.0.0, but the page was not found.. LOL


----------



## MikeS3000

Well, that didn't take long. I sent my supporting documentation to AMD via email at 6:30 p.m. yesterday, including a video that demonstrates the failure. At 2:00 a.m. today I received an approved RMA with a FedEx prepaid ground label to Miami, FL. Does this give hope that the RMA process won't take too long to get a replacement?


----------



## mtavel

MikeS3000 said:


> Well that didn't take long. I sent my supporting documentation to AMD via email at 6:30 p.m. yesterday including a video that demonstrates the failure. 2:00 a.m. today I received an approved RMA with a FedEx prepaid ground label to Miami, FL. Is this hope that the RMA process won't take too long to get a replacement?


I saw someone post it was several weeks turn around time on a 5950x RMA.... I suspect it's completely down to stock and whether they have replacements on hand at the moment or not. I hadn't really stopped to think about how well binned the CCXs for the 5950x have to be to prevent issues like this. I can't imagine how many CCXs would qualify to make a great 5600x that would also make a horrible 5950x CCX.


----------



## ghiga_andrei

mtavel said:


> I hadn't really stopped to think about how well binned the CCXs for the 5950x have to be to prevent issues like this. I can't imagine how many CCXs would qualify to make a great 5600x that would also make a horrible 5950x CCX.


That's not our problem, the price for the 5950x is big enough to justify good silicon. Otherwise, what are you paying for ?


----------



## MikeS3000

I thought that's what's happening, especially with the discovery of dual-CCD 5800X CPUs. A failed 5950X could turn into a 5800X. A failed 5900X would become a 5600X.


----------



## mtavel

Imraneo said:


> I sense a non-Beta BIOS coming real soon for Asus.
> I saw a glimpse of 3405 based on the same AGESA 1.2.0.0, but the page was not found.. LOL


ASUS released 3204 (non-beta) for the Dark Hero a few days ago which includes AGESA 1.2.0.0. - It did not resolve or even reduce my low-voltage/sleep state related reboots and WHEA errors on my 5950x.

Version 3204
2021/01/29 20.41 MBytes
ROG CROSSHAIR VIII DARK HERO BIOS 3204
1. Update AMD AM4 AGESA V2 PI 1.2.0.0


----------



## mtavel

ghiga_andrei said:


> That's not our problem, the price for the 5950x is big enough to justify good silicon. Otherwise, what are you paying for ?


I totally agree, we shouldn't be the ones binning AMD's silicon for them. I just appreciate the level of testing required to properly do the binning (which appears to not be happening consistently).


----------



## Deepcuts

mtavel said:


> I totally agree, we shouldn't be the ones binning AMD's silicon for them. I just appreciate the level of testing required to properly do the binning (which appears to not be happening consistently).


I think it is not happening at all.
I guess some schmuck back at AMD HQ did some math and concluded it is cheaper to let end users do the "binning" for them instead of maybe paying TSMC for better quality control.


----------



## GRABibus

mtavel said:


> Disabling Global C-State Control (which disabled the low power sleep state on the CPU) stopped the idle reboots and WHEA errors I was getting. Switching from my 5950x to my temporary 5600x fixed the issue too - no issue running default C-States on the 5600x. I think I lost the silicon lottery with my 5950x being unstable at the standard low voltages associated with core sleep.


When I disabled global C-states, my temps also increased.
What I did, on the advice of a member, was to disable DF C-states only.

From another member:
The CPU cores have CC6 State, PC6 State and C-State Boost feature that are required for the power gating and the boost to work properly.
Disabling the "Global C-State Control" option will neutralize all of these and more, and hence it affects the efficiency and the performance of the actual CPU cores negatively.
Meanwhile the "DF CState" option, that is the current suggested workaround, alone will not touch any of the three features, and hence it has no effect either on the efficiency or the performance of the CPU cores. It will have a minor impact on the fabric power consumption at idle, but we're talking about a change from around 7W to 8W or so.


----------



## GRABibus

Did you install the latest AMD chipset drivers from yesterday?
I haven't had a single reboot / WHEA since, even when tweaking PBO/CO. But it has only been one day... let's wait longer to see if it really helps.


----------



## Imraneo

mtavel said:


> ASUS released 3204 (non-beta) for the Dark Hero a few days ago which includes AGESA 1.2.0.0. - It did not resolve or even reduce my low-voltage/sleep state related reboots and WHEA errors on my 5950x.


My Strix X570-F BIOS finally got released:
Version 3405
2021/02/05 20.14 MBytes
ROG STRIX X570-F GAMING BIOS 3405
1. Update AMD AM4 AGESA V2 PI 1.2.0.0
2. Update AMD RAID UEFI driver
3. Improve system stability

Will take my time to upgrade. Pretty comfy with 3001 for now.

I have a feeling AMD is well aware of these issues and is ready to accept all RMAs. The numbers may still not be alarming enough for AMD to make a risky announcement of any sort. Also.. let's not forget that a number of members, including myself, reported issues surfacing a week after getting the chips. Strange but true, and this probably slipped through their validation.


----------



## JohnnyFlash

Imraneo said:


> I have a feeling AMD is well aware of these issues and are ready to accept all RMAs. The numbers may still not be alarming enough for AMD to make a risky announcement of any sort. Also.. lets not forget a number of members including myself reported issues surfacing a week after attaining the chips. Strange by true and this probably slipped through their validation.


There is zero reason for AMD to make any statement. Demand is still massive and they are servicing anyone with a bad one. There is no positive impact to making a statement.


----------



## mtavel

GRABibus said:


> Meanwhile the "DF CState" option, that is the current suggested workaround, alone will not touch any of the three features, and hence it has no effect either on the efficiency or the performance of the CPU cores. It will have a minor impact on the fabric power consumption at idle, but we're talking about a change from around 7W to 8W or so.


Thanks for providing a more refined workaround option. Here is some more detail about DF C-States:
[from https://lenovopress.com/lp1267.pdf page 27]

*DF (Data Fabric) C-States*

Much like CPU cores, the Infinity Fabric can go into lower power states while idle. However, there will be delay changing back to full-power mode causing some latency jitter. In a low latency workload, or one with burst I/O, one could disable this feature to achieve more performance with the tradeoff of higher power consumption.

Possible values:

Enable (default): Enables Data Fabric C-states. Data Fabric C-states may be entered when all cores are in CC6.
Disable: Disable Data Fabric (DF) C-states

As with any narrower/more specific workaround, this may work well for people experiencing low voltage infinity fabric issues - but not as well for people with specific CPU core issues. As stated above, DF C-State is only initiated when cores are already in CC6. I haven't done enough testing at this level of detail to have conclusive data myself. I certainly encourage others experiencing issues to try this out and report back!


----------



## Deepcuts

JohnnyFlash said:


> There is zero reason for AMD to make any statement. Demand is still massive and they are servicing anyone with a bad one. There is no positive impact to making a statement.


There's no positive impact for AMD you mean.
A public statement might warn potential new customers to steer clear. You know, those people that actually want to use the thing they purchased when they purchased it and not after several months when a possible good CPU is returned from RMA.
You don't seem to take into account the associated cost of downtime for end-users.


----------



## silot

I tested the latest 3405 "stable" BIOS on my X570 Strix-E and 5900X and I still get the same crashes and WHEA errors in games. Of course everything is on default with no overclocks.


----------



## goondam

so my 5950x becomes unstable if the curve optimizer is set to even a -10 offset along with a 3x or higher scalar option

so yay weird


----------



## kr0mka

mtavel said:


> After weeks of troubleshooting, I've found that my 5950x unexpected reboots and WHEA errors were caused by core instability at low voltage associated with sleep C-States. Disabling C-States immediately resolved all my reboots - but also raised my idle temps and the low voltage instability prevents me from running any meaningful negative PBO curve optimization. Replaced my 5950x with a 5600x and it ran perfectly (with Global C-State Control enabled, as default). It's definitely the CPU. Going through RMA now.
> 
> View attachment 2477308
> 
> 
> Here's a summary of the APIC ID's associated with the 41 WHEA errors I had logged over a 16 day period. Core 14 and it's neighbors have some issues.


Are APIC IDs the actual failing core number? Asking because I've encountered APIC IDs like 17 or 19 in my WHEAs when I was unstable due to overly negative curve optimizer values, and there's no core ID 19 or 17 on a 5900X.


----------



## BluePaint

A -10 undervolt for all cores on a 5950X will often make it unstable, because the 5950X often boosts up to 5050 MHz already, which needs a lot of voltage and already runs at the edge of stability for many CPUs. That is one of the reasons why we see many 5950X chips being unstable out of the box.


----------



## goondam

BluePaint said:


> -10 undervolt for all cores on a 5950x will often make it unstable because the 5950x often boosts up to 5050Mhz already which needs a lot of voltage and already runs at the edge of stability for many CPUs, which is one of the reasons why we see many 5950x being unstable out of the box


i did hear that the solution to this is to find your best and worst cores. then set up individual core pbo curve optimizer for them, with the best cores accepting very high negative offset values and bad cores barely taking any


----------



## MikeS3000

kr0mka said:


> Are APIC IDs the actual failing core number? Asking because i've encountered APIC IDs like 17 or 19 in my wheas when i was unstable due to too much negative curve optimizer values, and there's no core id 19 or 17 on a 5900X.


APIC IDs do not correspond necessarily to the failing core number. They correspond to a failing thread. Open cpu-z and go to tools and save a text report. Open that report and scroll down a little bit and you will see which cores and threads the APIC ID corresponds to. APIC IDs are the last # after the thread #. My 5900x reads as you would think on the first CCD (0-11 for threads and APIC IDs). On the 2nd CCD it resumes APIC IDs at "16" instead of "12".

Socket 0 
-- Node 0 
-- CCX 0 
-- Core 0 (ID 0) 
-- Thread 0 0
-- Thread 1 1
-- Core 1 (ID 1) 
-- Thread 2 2
-- Thread 3 3
-- Core 2 (ID 2) 
-- Thread 4 4
-- Thread 5 5
-- Core 3 (ID 3) 
-- Thread 6 6
-- Thread 7 7
-- Core 4 (ID 4) 
-- Thread 8 8
-- Thread 9 9
-- Core 5 (ID 5) 
-- Thread 10 10
-- Thread 11 11
-- CCX 1 
-- Core 6 (ID 8) 
-- Thread 12 16
-- Thread 13 17
-- Core 7 (ID 9) 
-- Thread 14 18
-- Thread 15 19
-- Core 8 (ID 10) 
-- Thread 16 20
-- Thread 17 21
-- Core 9 (ID 11) 
-- Thread 18 22
-- Thread 19 23
-- Core 10 (ID 12) 
-- Thread 20 24
-- Thread 21 25
-- Core 11 (ID 13) 
-- Thread 22 26
-- Thread 23 27
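
The lookup can also be done mechanically. Below is a minimal Python sketch that parses report lines in the same shape as the paste above (`Core N (ID M)` followed by `Thread T A`, where A is the APIC ID) and returns which core and thread an APIC ID belongs to. The function name `apic_map` is made up for this example, and the real CPU-Z text report may differ in indentation or wording, so treat the patterns as assumptions.

```python
import re
from typing import Dict, Tuple

CORE_RE = re.compile(r"Core\s+(\d+)\s+\(ID\s+(\d+)\)")
THREAD_RE = re.compile(r"Thread\s+(\d+)\s+(\d+)")


def apic_map(report: str) -> Dict[int, Tuple[int, int]]:
    """Map APIC ID -> (core number, thread number) from pasted report lines."""
    mapping: Dict[int, Tuple[int, int]] = {}
    core = None
    for line in report.splitlines():
        core_match = CORE_RE.search(line)
        if core_match:
            core = int(core_match.group(1))  # remember the core the next threads belong to
            continue
        thread_match = THREAD_RE.search(line)
        if thread_match and core is not None:
            thread, apic = int(thread_match.group(1)), int(thread_match.group(2))
            mapping[apic] = (core, thread)
    return mapping


# Excerpt of the 5900X listing above (second CCD).
SAMPLE = """\
-- CCX 1
-- Core 6 (ID 8)
-- Thread 12 16
-- Thread 13 17
-- Core 7 (ID 9)
-- Thread 14 18
-- Thread 15 19
"""

if __name__ == "__main__":
    core, thread = apic_map(SAMPLE)[19]
    print(f"APIC ID 19 is thread {thread} on core {core}")
```

With the excerpt above, APIC ID 17 lands on core 6 and APIC ID 19 on core 7, which is why those IDs can show up in WHEA entries on a 12-core chip.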


----------



## mtavel

goondam said:


> i did hear that the solution to this is to find your best and worst cores. then set up individual core pbo curve optimizer for them, with the best cores accepting very high negative offset values and bad cores barely taking any


This would help in some cases (it reduced my reboots to a degree), but I had cores that became unstable at the voltage levels present during low power and sleep states - which are not really influenced by curve optimization offsets.


----------



## mtavel

Here's the CPU-Z report (preserving formatting) where I found the cores associated with the APIC ID in my event viewer WHEA errors:


----------



## MikeS3000

Interesting. I wonder why my system skips a bunch of those IDs.


----------



## JohnnyFlash

Deepcuts said:


> There's no positive impact for AMD you mean.
> *A public statement might warn potential new customers to steer clear.* You know, those people that actually want to use the thing they purchased when they purchased it and not after several months when a possible good CPU is returned from RMA.
> You don't seem to take into account the associated cost of downtime for end-users.


This right here is why they would not do it.

You're right from a customer's point of view, but that doesn't benefit AMD at all, so they won't do it. The only time a company would release a statement in a case like this, is if doing so reduced the negative impact.


----------



## MikeS3000

JohnnyFlash said:


> This right here is why they would not do it.
> 
> You're right from a customer's point of view, but that doesn't benefit AMD at all, so they won't do it. The only time a company would release a statement in a case like this, is if doing so reduced the negative impact.


Or if their product could potentially hurt somebody. e.g. NZXT pcie riser fire hazard.


----------



## Anthos

MikeS3000 said:


> Or if their product could potentially hurt somebody. e.g. NZXT pcie riser fire hazard.


NZXT only did it because they were heavily pressured by Gamers Nexus. They were aware of the issue yet did diddly squat to fix it (their nylon screws solution was laughable). They were intending to just do a revision change and screw the people who had already bought the defective ones. (And this criticism comes from a person who owns 2 of their products.)

As for the rest, in my opinion AMD SHOULD have released a statement, because by doing nothing they are letting this spread unchecked. People left and right are blaming either the BIOS or the CPUs or whatnot, which just creates confusion and distrust in their product. A lot of the people affected will go to Intel and never look back. If for whatever reason this spikes further, it could bite them in the ass. But for the time being they are taking the childish approach (which is ****up and act dumb).

Now, as an update on my WHEA errors: I've managed to identify the culprit with 99% confidence. Which is.... me. Some of the WHEA errors that I had must have come from negative curves making the system unstable. However, sometimes when you revert changes they don't fully revert and you have to clear CMOS, so it might be that the blame is not 100% on me. Without any curve at all I haven't had a WHEA for quite some time. It seems that my #1 fastest/preferred core happens to be core 0, and while the other ones can take a big negative value, that one does not tolerate anything more negative than -5. I was intending to verify this by running the system without any curve for the whole month of February, but I am getting a bit itchy to start messing around again. So yeah, my 5950X might be the ****tiest overclocker that exists, but it seems that it is at least able to do what it's supposed to do at stock.


----------



## thigobr

Same here on 5950X, best cores won't go lower than -3 CO otherwise I get reboots... Second CCX is happy at -22 so far. I am trying to reduce little by little


----------



## xeizo

I haven't had a sudden reboot since I identified the bad cores(two of them) and gave them less offset in CO, like two weeks ago, seems pretty stable now


----------



## kr0mka

MikeS3000 said:


> APIC IDs do not correspond necessarily to the failing core number. They correspond to a failing thread. Open cpu-z and go to tools and save a text report. Open that report and scroll down a little bit and you will see which cores and threads the APIC ID corresponds to. APIC IDs are the last # after the thread #. My 5900x reads as you would think on the first CCD (0-11 for threads and APIC IDs). On the 2nd CCD it resumes APIC IDs at "16" instead of "12".
> 
> Socket 0
> -- Node 0
> -- CCX 0
> -- Core 0 (ID 0)
> -- Thread 0 0
> -- Thread 1 1
> -- Core 1 (ID 1)
> -- Thread 2 2
> -- Thread 3 3
> -- Core 2 (ID 2)
> -- Thread 4 4
> -- Thread 5 5
> -- Core 3 (ID 3)
> -- Thread 6 6
> -- Thread 7 7
> -- Core 4 (ID 4)
> -- Thread 8 8
> -- Thread 9 9
> -- Core 5 (ID 5)
> -- Thread 10 10
> -- Thread 11 11
> -- CCX 1
> -- Core 6 (ID 8)
> -- Thread 12 16
> -- Thread 13 17
> -- Core 7 (ID 9)
> -- Thread 14 18
> -- Thread 15 19
> -- Core 8 (ID 10)
> -- Thread 16 20
> -- Thread 17 21
> -- Core 9 (ID 11)
> -- Thread 18 22
> -- Thread 19 23
> -- Core 10 (ID 12)
> -- Thread 20 24
> -- Thread 21 25
> -- Core 11 (ID 13)
> -- Thread 22 26
> -- Thread 23 27


Wow, thanks for this! Will be handy for curve optimizer tweaking


----------



## Imraneo

How do you determine your best (or worst?) cores? I took a shortcut and went with whatever Ryzen Master told me (I wonder who determines this?)
Out of the 4 best cores from there, only the 2 from the first CCX are the most active. The ones from the second CCX pretty much sleep during single-threaded workloads.
My CO values are -5 for the best 4 and -15 for the rest.
I can still tweak further, diving down into individual cores, but I doubt I'll see real benefit from here on...


----------



## iraff1

MikeS3000 said:


> APIC IDs do not correspond necessarily to the failing core number. They correspond to a failing thread. Open cpu-z and go to tools and save a text report. Open that report and scroll down a little bit and you will see which cores and threads the APIC ID corresponds to. APIC IDs are the last # after the thread #. My 5900x reads as you would think on the first CCD (0-11 for threads and APIC IDs). On the 2nd CCD it resumes APIC IDs at "16" instead of "12".
> 
> Socket 0
> -- Node 0
> -- CCX 0
> -- Core 0 (ID 0)
> -- Thread 0 0
> -- Thread 1 1
> -- Core 1 (ID 1)
> -- Thread 2 2
> -- Thread 3 3
> -- Core 2 (ID 2)
> -- Thread 4 4
> -- Thread 5 5
> -- Core 3 (ID 3)
> -- Thread 6 6
> -- Thread 7 7
> -- Core 4 (ID 4)
> -- Thread 8 8
> -- Thread 9 9
> -- Core 5 (ID 5)
> -- Thread 10 10
> -- Thread 11 11
> -- CCX 1
> -- Core 6 (ID 8)
> -- Thread 12 16
> -- Thread 13 17
> -- Core 7 (ID 9)
> -- Thread 14 18
> -- Thread 15 19
> -- Core 8 (ID 10)
> -- Thread 16 20
> -- Thread 17 21
> -- Core 9 (ID 11)
> -- Thread 18 22
> -- Thread 19 23
> -- Core 10 (ID 12)
> -- Thread 20 24
> -- Thread 21 25
> -- Core 11 (ID 13)
> -- Thread 22 26
> -- Thread 23 27


That's awesome information. I have now also identified the cores that give me WHEA errors, yet they never lead to any issues like reboots for me.

So the idea here is to use curve optimizer and give the best cores the highest "minus value" and then the bad cores should stay at 0 or even plus value? 

Thanks in advance.


----------



## GRABibus

iraff1 said:


> Thats awesome information, i have now also identified the cores that give me WHEA errors yet they never lead to any issues like reboots for me.
> 
> So the idea here is to use curve optimizer and give the best cores the highest "minus value" and then the bad cores should stay at 0 or even plus value?
> 
> Thanks in advance.


Bad cores (meaning the laziest ones) are the ones you give the "highest negative value", -30 for example as a first try.


----------



## iraff1

GRABibus said:


> Bad cores (Means the laziest one's), you give the "highest negative value", as -30 for example as a first try.


Damn, I must have completely misunderstood. I thought giving a higher negative value means less power to that core, and with less power the core can boost higher, because less power = less heat = it thinks it can go higher, but it will crash if it runs out of power?

So how exactly would I get my 5950X to boost higher than the base level by lowering all the cores' values and keeping the best ones at 0? Or is there a plus value that comes into play for the best cores?


----------



## xeizo

iraff1 said:


> Damn i must have completely misunderstood, i though giving higher negative value means less power to that core and with less power to a core the core can boost higher because less power = less heat = think it can go higher but will crash if it runs out of power?
> 
> So how eaclty would i get my 5950x to boost to boost higher than the base level by lowering all the cores value and keeping the best ones at 0? or is there a plus value that comes into play for the best cores?


There is a misunderstanding of the words best/worst. Most only think about higher scores, but by worst I mean cores that are causing WHEA/reboots/crashes. Those cores need MORE juice, less minus, regardless of how they boost. We don't want crashes, do we?

The other cores, which are considered stable, you can give whatever offset raises your scores.

People only think about performance, not stability, gah!

A recurring phrase here is "I set -30 on all cores" and a screenshot with super scores. Yeah, but it crashes three times every hour during idle (yes, benchmarks seldom crash with too low CO, load just runs).


----------



## mongoled

xeizo said:


> There is a misunderstanding of the words best/worst. Most only think about higher scores, but by worst I mean cores that are causing WHEA/reboots/crashes. Those cores need MORE juice, less minus, regardless of how they boost. We don't want crashes, do we?
> 
> The other cores, which are considered stable, you can give whatever offset raises your scores.
> 
> People only think about performance, not stability, gah!
> 
> A recurring phrase here is "I set -30 on all cores" and a screenshot with super scores. Yeah, but it crashes three times every hour during idle (yes, benchmarks seldom crash with too low CO, load just runs).


 "I set -30 on all cores" precisely!

Little do they understand that their rigs are far from "stable"

A quick Y-Cruncher 15/16 will cause an instant reboot or freeze!


----------



## Anthos

The problem with negative curves, as it was in my case, is that there is a grey area. If I put -30 on all the cores, it crashes under heavy load (even though each core needs less voltage under heavy load because it doesn't boost as much, it's still too little voltage and they crash). However, when I had -15 on all of them it "seemed" to be stable. I didn't have a crash for days, so when I ended up getting one I had totally forgotten that I had the curve set in the first place and didn't associate the crash with it. So, for example, if one of your cores can tolerate up to -13 and you have that core on -15, it won't crash as often; it would have to get into a situation that pushes it to its limits and needs that extra mV that it won't get now. This becomes even harder to spot if it is a core low on the priority list, for example your core #14, which gets much less usage and so rarely gets into that scenario. At least that's how I understand the whole situation.


----------



## iraff1

But giving a core less voltage by having minus in the Curve Optimizer results in the core potentially boosting higher and longer?
I already know that 2 of my cores are bad now, thanks to the findings in CPU-Z where you could filter out which core is related to the APIC ID in the WHEA errors.
So my question is: should I put these "bad cores" at really high minus voltage or even plus voltage? To my understanding, bad cores should not have minus voltage, only the good ones?


----------



## Imraneo

Ok guys, I've been on the following settings for a whole day:
CO -5 for cores 0,3,8,7
CO -15 for rest

I've had 5 idle reboots so far. 4x APIC ID 24, 1x APIC ID 16
Below is my CPU-Z log:

Socket 0 
-- Node 0 
-- CCX 0 
-- Core 0 (ID 0) 
-- Thread 0 0
-- Thread 1 1
-- Core 1 (ID 1) 
-- Thread 2 2
-- Thread 3 3
-- Core 2 (ID 2) 
-- Thread 4 4
-- Thread 5 5
-- Core 3 (ID 3) 
-- Thread 6 6
-- Thread 7 7
-- Core 4 (ID 4) 
-- Thread 8 8
-- Thread 9 9
-- Core 5 (ID 5) 
-- Thread 10 10
-- Thread 11 11
-- CCX 1 
-- Core 6 (ID 8) 
-- Thread 12 16
-- Thread 13 17
-- Core 7 (ID 9) 
-- Thread 14 18
-- Thread 15 19
-- Core 8 (ID 10) 
-- Thread 16 20
-- Thread 17 21
-- Core 9 (ID 11) 
-- Thread 18 22
-- Thread 19 23
-- Core 10 (ID 12) 
-- Thread 20 24
-- Thread 21 25
-- Core 11 (ID 13) 
-- Thread 22 26
-- Thread 23 27


So from this, can I say that threads 20 and 12 are causing the reboots?
So if that's the case, I can dial cores 10 and 6 down to -10 (instead of -15) for my next step?
Thanks Mike for the Epic (APIC) info!
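
The lookup described above can be sketched in a few lines of Python. This is only an illustration: it assumes each `Thread` line in the CPU-Z dump reads "thread index, then APIC ID" (which matches how the log above is being read), and `apic_to_core` is a made-up helper name, not part of CPU-Z.

```python
import re

# Excerpt of the CPU-Z topology dump from the post above.
# Assumption: each "Thread" line holds the thread index followed by the APIC ID.
CPUZ_DUMP = """\
-- CCX 1
-- Core 6 (ID 8)
-- Thread 12 16
-- Thread 13 17
-- Core 10 (ID 12)
-- Thread 20 24
-- Thread 21 25
"""

CORE_RE = re.compile(r"Core\s+(\d+)\s+\(ID\s+(\d+)\)")
THREAD_RE = re.compile(r"Thread\s+(\d+)\s+(\d+)")


def apic_to_core(dump):
    """Return {apic_id: (core_index, thread_index)} parsed from a topology dump."""
    mapping = {}
    core = None
    for line in dump.splitlines():
        core_match = CORE_RE.search(line)
        if core_match:
            # Remember which core the following Thread lines belong to.
            core = int(core_match.group(1))
            continue
        thread_match = THREAD_RE.search(line)
        if thread_match and core is not None:
            thread = int(thread_match.group(1))
            apic = int(thread_match.group(2))
            mapping[apic] = (core, thread)
    return mapping


mapping = apic_to_core(CPUZ_DUMP)
for apic_id in (24, 16):  # APIC IDs reported in the WHEA events
    core, thread = mapping[apic_id]
    print(f"APIC ID {apic_id} -> Core {core}, Thread {thread}")
```

With the excerpt above, APIC ID 24 resolves to Core 10 (Thread 20) and APIC ID 16 to Core 6 (Thread 12), which agrees with the reading in the post.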


----------



## xeizo

iraff1 said:


> But giving a core less voltage by having minus in the Curve Optimizer results in the core potentially boosting higher and longer?
> I already know that 2 of my cores are bad now, thanks to the findings in CPU-Z where you could filter out which core is related to the APIC ID in the WHEA errors.
> So my question is: should I put these "bad cores" at really high minus voltage or even plus voltage? To my understanding, bad cores should not have minus voltage, only the good ones?


The core/cores causing WHEA should be close to 0 or even plus; my worst core is -1, while my highest boosting safe cores are -20. I have quick boost to 5GHz on three cores, effective boost to 5100-5200MHz on two cores, and sustained single-core boost in real applications is 4900-4950MHz. All-core full load is 4400-4500MHz depending on type of load. In games it's usually 4600-4800MHz all the time. I haven't had a reboot for two weeks now, and not a WHEA for several days. The last WHEA was when I was experimenting with Vcore offset; I settled on the smallest possible minus offset which gives no WHEA.

Yes, lower voltage means boost is sustained for longer because of temp limit steps, while too low means it doesn't boost as high. Higher voltage means higher boost freq, with the risk of not being sustained because of too high temp. It's a balance.
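
As a rough illustration of that approach (move the offset of any core that logs WHEA errors back toward zero, leave the stable cores alone), here is a minimal Python sketch. Everything in it is an assumption for illustration: `relax_offsets` is a made-up helper, the step of 5 counts is arbitrary, the ±30 range is what Curve Optimizer is assumed to allow, and the starting values are the ones Imraneo posted earlier. The BIOS has no such API; the values still have to be entered by hand.

```python
CO_MIN, CO_MAX = -30, 30  # assumed Curve Optimizer range, in counts


def relax_offsets(offsets, whea_cores, step=5):
    """Return a copy of `offsets` with each WHEA-flagged core moved `step` counts toward positive."""
    new_offsets = dict(offsets)
    for core in whea_cores:
        # Less negative (or more positive) means more voltage for that core.
        new_offsets[core] = max(CO_MIN, min(CO_MAX, offsets[core] + step))
    return new_offsets


# Starting point taken from the settings posted earlier in the thread:
# -5 on cores 0, 3, 7, 8 and -15 on the rest of a 12-core CPU.
offsets = {core: (-5 if core in (0, 3, 7, 8) else -15) for core in range(12)}

# Cores 6 and 10 are the ones the APIC IDs pointed at.
next_try = relax_offsets(offsets, whea_cores=[6, 10])
for core in sorted(next_try):
    print(f"Core {core:2d}: {next_try[core]:+d}")
```

Repeating this after each idle reboot gives the slow, per-core loop people in this thread describe; it says nothing about whether a given CPU can be made stable that way.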


----------



## iraff1

xeizo said:


> The core/cores causing WHEA should be close to 0 or even plus, my worst core is -1. While my highest boosting cores are -20.
> 
> Yes, lower voltage means boost is sustained for longer because of temp limit steps, while too low means it doesn't boost as high. Higher voltage means higher boost freq, with the risk of not being sustained because of too high temp. It's a balance.


cheers, thanks for explaining, will play around with this a little.
i watched this video to get a better idea : 




To my knowledge, AGESA 1.1.8.0 is the first official release with this type of Curve Optimizer support. My BIOS is still running 1.1.0.0 and 1.2.0.0 is still beta, so I guess I'll hold off until I get 1.2.0.0 rather than having to do this all over again once 1.2.0.0 is released for my motherboard.


----------



## mtavel

Imraneo said:


> So from this, can I say that threads 20 and 12 are causing the reboots?
> So if that's the case, I can dial cores 10 and 6 down to -10 (instead of -15) for my next step?
> Thanks Mike for the Epic (APIC) info!


From what I've seen - if your CPU is running stable at stock (no reboots with bios defaults) and your reboots only start when you add negative optimization curve values, then dialing them back on the sensitive cores will probably eliminate the reboots again.

If your CPU is not stable at stock settings, I have not seen a case where dialing in optimization curves eliminates it.

I'm of course happy to hear if PBO curve makes a CPU that was unstable at defaults now stable - but from my personal experience, the circumstances impacting voltage outside of load situations (idle, for example) are not impacted by optimization curves and will continue to cause reboots.


----------



## machine038

mtavel said:


> I'm of course happy to hear if PBO curve makes a CPU that was unstable at defaults now stable


If you use a positive offset, it is possible. I kept my 5950X stable that way until I mailed the faulty CPU back. There are others in this thread with the same experience.


----------



## mtavel

machine038 said:


> If you use a positive offset, it is possible. I kept my 5950X stable that way until I mailed the faulty CPU back. There are others in this thread with the same experience.


Glad to hear it then. I nearly maxed out my worst offending core with positive offset (in +2 increment steps between idle reboots) and still had the issue. I know this isn't one of those issues where the CPU is either totally bad or perfect, it's a spectrum. 

I definitely got to the point where I couldn't accept how far I had to adjust mine just trying to make the PC stable, much less being able to optimize performance. Especially hard for me to accept for an $800 processor (if you can find it at that price).


----------



## Imraneo

mtavel said:


> From what I've seen - if your CPU is running stable at stock (no reboots with bios defaults) and your reboots only start when you add negative optimization curve values, then dialing them back on the sensitive cores will probably eliminate the reboots again.
> 
> If your CPU is not stable at stock settings, I have not seen a case where dialing in optimization curves eliminates it.
> 
> I'm of course happy to hear if PBO curve makes a CPU that was unstable at defaults now stable - but from my personal experience, the circumstances impacting voltage outside of load situations (idle, for example) are not impacted by optimization curves and will continue to cause reboots.


I'd say it's stable at stock settings (it better be, this is my 2nd unit!)
I guess I'm just tuning it to make it more optimized (higher speeds when needed, less energy when not). This is a pretty time-consuming process though.


----------



## Anthos

It is not a matter of bad or good cores per se. A "good" core that can boost higher than the rest, let's say to 5150, needs more voltage to get there than a core that can only reach 4875. The ideal situation is that all cores are blessed in the silicon lottery and can boost to their maximum speed with a -30 curve. In reality this doesn't hold true for all, or even many. So you kind of need to find the limits of each of your cores, which can be a real pain in the ass, especially if you have a 5900-5950 with a dozen or more cores. But yeah, the fact that my highest boosting core doesn't tolerate less than -5 (I think, haven't tested extensively) doesn't make it bad, as it can achieve high boosts regardless of how I play with the curve. But the curve helps with the weaker ones, as it keeps the heat down and allows them all to reach at least 4900, and several more than at default to reach 5000. Does it make a noticeable difference? Not really. It just satisfies the part of your brain that likes to see round and big numbers (as if the 50-100MHz extra boost means anything in reality).


----------



## xeizo

Anthos said:


> It is not a matter of bad or good cores per se. A "good" core that can boost higher than the rest, let's say to 5150, needs more voltage to get there than a core that can only reach 4875. The ideal situation is that all cores are blessed in the silicon lottery and can boost to their maximum speed with a -30 curve. In reality this doesn't hold true for all, or even many. So you kind of need to find the limits of each of your cores, which can be a real pain in the ass, especially if you have a 5900-5950 with a dozen or more cores. But yeah, the fact that my highest boosting core doesn't tolerate less than -5 (I think, haven't tested extensively) doesn't make it bad, as it can achieve high boosts regardless of how I play with the curve. But the curve helps with the weaker ones, as it keeps the heat down and allows them all to reach at least 4900, and several more than at default to reach 5000. Does it make a noticeable difference? Not really. It just satisfies the part of your brain that likes to see round and big numbers (as if the 50-100MHz extra boost means anything in reality).


True, you will be hard pressed to notice any difference under real-world usage. It's also easy to go back and compare with one's own older benchmarks and quickly see that the difference is small even in benchmarks. However, it's nice to be at the upper end of general performance even if it isn't noticeable.

Something that IS noticeable with undervolting is the possibility of having slower fan speeds and a quieter PC. In fact, that is more desirable than absolute max performance in my book.

There was a large benefit for me going from CH7 to CH8. Not in performance, which is exactly the same, only more I/O and PCIe 4.0, but the CH8 has a much better fan controller than the crappy one on the CH7. It became a lot quieter during normal usage using the same fans.


----------



## machine038

mtavel said:


> Glad to hear it then. I nearly maxed out my worst offending core with positive offset (in +2 increment steps between idle reboots) and still had the issue. I know this isn't one of those issues where the CPU is either totally bad or perfect, it's a spectrum.
> 
> I definitely got to the point where I couldn't accept how far I had to adjust mine just trying to make the PC stable, much less being able to optimize performance. Especially hard for me to accept for an $800 processor (if you can find it at that price).


Yeah, I get what you mean, especially at the release date when I switched over from Intel and was getting those BSODs. I tried everything, not knowing if it was my fault, the CPU, or the new motherboard. There was almost zero information about it and it was driving me crazy, since the computer is for work. In the end I managed to RMA it and the new CPU is stable.

I suppose the QA was tuned using their previous knowledge of Zen 2. I read that they produced a million Zen 3 chips last year, so most likely some faulty units slipped through the automated tests.


----------



## MikeS3000

machine038 said:


> Yeah, I get what you mean, especially at the release date when I switched over from Intel and was getting those BSODs. I tried everything, not knowing if it was my fault, the CPU, or the new motherboard. There was almost zero information about it and it was driving me crazy, since the computer is for work. In the end I managed to RMA it and the new CPU is stable.
> 
> I suppose the QA was tuned using their previous knowledge of Zen 2. I read that they produced a million Zen 3 chips last year, so most likely some faulty units slipped through the automated tests.


So my 5900x got approved for RMA. I have a Fedex Ground shipping label and planning on sending mine out on Monday. Out of curiosity how long did it take to get your replacement after they received the defective one? Thanks.


----------



## Imraneo

MikeS3000 said:


> So my 5900x got approved for RMA. I have a Fedex Ground shipping label and planning on sending mine out on Monday. Out of curiosity how long did it take to get your replacement after they received the defective one? Thanks.


May not be the same as Singapore, but here's my timeline:
18 Jan - Sent out CPU
20 Jan - CPU received
22 Jan - Inspection passed, replacement approved
29 Jan - Received a call for delivery confirmation
1 Feb - Received CPU


----------



## goondam

Imraneo said:


> How do you determine your best (or worse?) cores? I took a shortcut and took in whatever Ryzen Master told me (I wonder who determines this?)
> Out of the 4 best cores from there, only 2 from the first CCX are the most active. The ones from the second CCX pretty much sleep during single threaded workloads.
> My lowest CO numbers for best 4 are -5 and the rest, -15.
> I can still tweak further, diving down in individual cores, but I doubt I'll see real benefit from here on...


Clock Tuner will tell you even more about which cores are good and which is your best CCX. It rates them; the more high-rated cores you have (highest is 150, I believe, but I could be wrong), the better your CPU's silicon quality.

it will also grade your cpu as a whole
bronze < silver < gold


----------



## devvv4ever

Hello everyone,

Thank you for all your tips - I've gone through this thread as a whole.

Just registered here to say the one and *only* thing that helped was to replace the CPU with another one. Sent back the faulty 5800X to the vendor, took another one from a different vendor with a different charge number (00066 was faulty while 00063 working - but I don't know if this has something to do with it).

All these problems are probably coming from a bad QA from AMD and thus they are selling CPUs that are not working within the specifications.


BIOS updates did stabilize it a little bit, maybe due to adjusted voltages. (MB: Aorus X570 Ultra with F33a BIOS)
Adjusting the curve (to about +15) helped most, but a brand new 460 EUR CPU should not need to be tweaked or scaled down anyway.
A new power supply didn't change anything (replaced it just to be sure).
The new CPU works out of the box super nicely with top performance, without any crash in either Windows or Linux.

My recommendation: don't fiddle around with it. *Immediately replace your CPU* if you have an unstable system with WHEA crashes. It will save you a LOT of trouble and time.

Cheers,
Bernhard


_EDIT: After the replacement I never ever had any lockup for almost 2 weeks now!_


----------



## brasoveanul

devvv4ever said:


> Hello everyone,
> 
> Thank you for all your tips - I've gone through this thread as a whole.
> 
> Just registered here to say the one and *only* thing that helped was to replace the CPU with another one. Sent back the faulty 5800X to the vendor, took another one from a different vendor with a different charge number (00066 was faulty while 00063 working - but I don't know if this has something to do with it).
> 
> All these problems are probably coming from a bad QA from AMD and thus they are selling CPUs that are not working within the specifications.
> 
> 
> BIOS updates did stabilize it a little bit, maybe due to adjusted voltages. (MB: Aorus X570 Ultra with F33a BIOS)
> Adjusting the curve (to about +15) helped most, but a brand new 460 EUR CPU should not need to be tweaked or scaled down anyway.
> A new power supply didn't change anything (replaced it just to be sure).
> The new CPU works out of the box super nicely with top performance, without any crash in either Windows or Linux.
> 
> My recommendation: don't fiddle around with it. *Immediately replace your CPU* if you have an unstable system with WHEA crashes. It will save you a LOT of trouble and time.
> 
> Cheers,
> Bernhard


Exactly, this is the only valid advice.


----------



## JohnnyFlash

I haven't trusted what AMD is doing with voltages since Zen 2. Historically with every process shrink, voltage should go down due to physics. Yes, they say it's safe, but really that just means they will warranty it if it degrades.

I plan on using manual per-CCX between 1.2-1.25v and see how far that goes. That's my expectation.


----------



## Anthos

JohnnyFlash said:


> I haven't trusted what AMD is doing with voltages since Zen 2. Historically with every process shrink, voltage should go down due to physics. Yes, they say it's safe, but really that just means they will warranty it if it degrades.
> 
> I plan on using manual per-CCX between 1.2-1.25v and see how far that goes. That's my expectation.


I doubt a company is willing to have millions of CPUs coming back to them after a couple of years and having to replace them all. It's a bit ludicrous to say that just because of a node shrink the voltage HAS to be less as if that's the only parameter. I mean my previous CPU was a q6600 at 65nm and the stock voltage was 1.2875.


----------



## os2wiz

Anthos said:


> I doubt a company is willing to have millions of CPUs coming back to them after a couple of years and having to replace them all. It's a bit ludicrous to say that just because of a node shrink the voltage HAS to be less as if that's the only parameter. I mean my previous CPU was a q6600 at 65nm and the stock voltage was 1.2875.


There was no node shrink between 3950X and 5950X it is the same exact node. I truly doubt there is large number of faulty dies.


----------



## BluePaint

It's like in Fight Club (it's cheaper to let some people burn to death and pay the families some money rather than to improve the safety in millions of cars). 
AMD calculated that it's cheaper to replace some returned CPUs rather than to limit production because of stricter quality control, especially in the current market situation where they can't even meet demand. They can even take a small hit to their reputation due to the current high in that regard.


----------



## JohnnyFlash

os2wiz said:


> There was no node shrink between 3950X and 5950X it is the same exact node. I truly doubt there is large number of faulty dies.


Look at the difference in clock speeds and voltages between Zen 2 and 3.

All I'm saying, is that they wouldn't be pushing these chips this hard if they had a decent lead on Intel. That's why overclocking was a thing in the first place: It was the buffer space for 100% flawless operation. These chips now are coming pre overclocked, it's forcing intel to do the same to compete and boom, it's the new normal.


----------



## Anthos

os2wiz said:


> There was no node shrink between 3950X and 5950X it is the same exact node.


never said there was


----------



## GamBoTron

BluePaint said:


> It's like in Fight Club (it's cheaper to let some people burn to death and pay the families some money rather than to improve the safety in millions of cars).
> AMD calculated that it's cheaper to replace some returned CPUs rather than to limit production because of stricter quality control, especially in the current market situation where they can't even meet demand. They can even take a small hit to their reputation due to the current high in that regard.


great comparison: Also makes sense as their first and foremost priority is and will always be to make money


----------



## Deepcuts

os2wiz said:


> I truly doubt there is large number of faulty dies.


You complain that you cannot buy a Ryzen 5000 CPU, yet somehow you have doubts "there is large number of faulty dies"
Buy one soon. Be a lucky customer and get a good one on 1st try, or more likely get a crappy one like so many of us and have a reason to stop doubting.


----------



## Catscratch

The only different trend with Ryzen is that the frequency goes up with the core count. It used to be that CPUs with fewer cores had the highest clock speeds. That probably means they BIN these CPUs like there's no tomorrow.


----------



## o1dschoo1

MaxHughes said:


> RUN AMD RAM SPEEDS. 2933/3200/3466/3733 Intel XMP 200MHz steps is for Intel. AMD is 266MHZ steps. Posted on AMD's website the day of Ryzens launch.


So imma take a wild guess and say some of this could be caused by the cpu not liking the fclk and dram clock due to people running intel xmp vs the 266 straps that amds prefer?
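
For what it's worth, the arithmetic behind the quoted claim is easy to check. The sketch below treats the quoted figures as DDR data rates in MT/s and the "266" as 266.67 (800/3); `on_step` and the 1 MT/s tolerance are assumptions made up for this example. It only shows which speeds land on a whole step, not whether that has any effect on stability.

```python
from fractions import Fraction

STEP = Fraction(800, 3)  # 266.67 MT/s, the step size named in the quoted post


def on_step(rate, tolerance=1):
    """True if `rate` (MT/s) is within `tolerance` MT/s of a whole multiple of STEP."""
    nearest = round(Fraction(rate) / STEP) * STEP
    return abs(Fraction(rate) - nearest) <= tolerance


for rate in (2933, 3200, 3466, 3600, 3733):
    label = "on a whole step" if on_step(rate) else "between steps"
    print(f"DDR4-{rate}: {label}")
```

The four speeds listed in the quote come out as whole steps (11x to 14x), while the common XMP 3600 setting sits halfway between two steps.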


----------



## JohnnyFlash

o1dschoo1 said:


> So imma take a wild guess and say some of this could be caused by the cpu not liking the fclk and dram clock due to people running intel xmp vs the 266 straps that amds prefer?


People with issues have tried this, both here and on reddit with little/no success.


----------



## o1dschoo1

JohnnyFlash said:


> People with issues have tried this, both here and on reddit with little/no success.


Damn so it's still a cpu/bios issue


----------



## silot

After testing everything and going crazy, including with the latest BIOS, I just RMA'd the 5900X. I will let you know when I receive the new one and retest everything.


----------



## mtavel

silot said:


> After testing everything and going crazy, including with the latest BIOS, I just RMA'd the 5900X. I will let you know when I receive the new one and retest everything.


It's kind of a shame that after BIOS features have advanced so far to allow users to customize almost every element of a system's performance, that we start thinking that extensive tweaking might be required just to make our PC's function to specification. So much time can be wasted (and I'm speaking from my own experience too) tweaking/working-around faulty components that should have never been shipped in the first place! Really frustrating.


----------



## MikeS3000

Those who did an RMA, did AMD provide tracking information for the new CPU? I received the automated emails that they received my defective 5900x and that my replacement is ready to ship. They said something about if I don't receive my cpu in 5 business days that I should contact customer care. I would love to track the replacement.


----------



## mtavel

The AMD service center in NA uses FedEx. You can set up an account with FedEx online where you can login and track any packages to/from your address. This is how I found the tracking number associated with my shipment from the service center.

It took 2 days for my CPU to actually be handed over to FedEx after receiving the 'ready to ship' email from AMD service. 

They ship using FedEx 2 day.


----------



## Imraneo

MikeS3000 said:


> Those who did an RMA, did AMD provide tracking information for the new CPU? I received the automated emails that they received my defective 5900x and that my replacement is ready to ship. They said something about if I don't receive my cpu in 5 business days that I should contact customer care. I would love to track the replacement.


Congrats.
I got exactly the same thing.
They called me on the 5th business day (Friday) making sure I'll be at home on Monday to receive the package. No tracking whatsoever.


----------



## geoxile

Are they actually shipping replacements as soon as they receive it?
edit: I mean, do they have replacements on hand. I'm scared I'll send it in and be stuck without my CPU for months.


----------



## mtavel

Looks like they have replacements on hand right now. Hard to say if they're able to maintain stock consistently, but mine was shipped out to me 2 days after replacement was approved.


----------



## MikeS3000

So far the RMA process has been pretty smooth. I shipped out and paid out of pocket for overnight shipping on Monday. Unfortunately the snow storm caused the CPU to arrive on Wednesday. By Thursday morning I received a bunch of emails from AMD saying that my CPU passed inspection and my replacement is ready to ship. I found my FedEx label created as suggested above on my account as they did not provide a tracking number yet. AMD then communicated that the item should actually ship today and FedEx says I should have it by Tuesday. Obviously if you use the free FedEx ground label provided by AMD to ship out your defective processor then it takes longer. However it's nice to know that the turnaround time was one day in my case once received.


----------



## Gexx

Hmm, hopefully I don't have to ship mine out. However, the WHEA 18 error/random reboots worry me.


----------



## Imraneo

Gexx said:


> Hmm, hopefully I don't have to ship mine out. However, the WHEA 18 error/random reboots worry me.


No harm submitting an RMA request with your details. A good CPU will not cause reboots or WHEAs. One or many of your cores might be defective.


----------



## Dword

5950X and Gigabyte X570 Aorus Pro, same problem. Can we add to the poll where the defective processor was made? CHINA or MALAYSIA


----------



## Imraneo

I thought all processors are made in China?


----------



## smbell1979

With my Asus CH8, the beta BIOS 3102 100% eliminated my reboots and cache hierarchy errors. Upgraded to the non-beta 3204 and all was good still. I finally got my 3090 and now I'm getting constant WHEA uncorrectable errors that I didn't get before with my 1080Tis. Rolled back to the 3102 for now.

Contemplating trying to RMA. Never done this with a CPU before and this is my main workstation I need daily.


----------



## mtavel

Dword said:


> 5950X and Gigabyte X570 Aorus Pro, same problem. Can we add to the poll where the defective processor was made? CHINA or MALAYSIA


What fab was your bad Zen 3 CPU from?


Trying to get a feel for the distribution of origin for bad Zen 3 CPUs. If the batch number on your CPU ends in PGT/PGS, the fab was Penang, Malaysia. SUS/SUT chips are from the Suzhou, China fab. Thanks!




www.overclock.net
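
If you want to sort a pile of reports by origin, the suffix rule from that poll can be written down as a small lookup. This is just a sketch: the four suffixes and their locations come from the poll text above, while `fab_from_batch` and the sample batch string are hypothetical.

```python
# Suffix-to-location table as stated in the linked poll.
FAB_BY_SUFFIX = {
    "PGT": "Penang, Malaysia",
    "PGS": "Penang, Malaysia",
    "SUS": "Suzhou, China",
    "SUT": "Suzhou, China",
}


def fab_from_batch(batch):
    """Return the location for a batch number based on its last three letters, or None if unknown."""
    suffix = batch.strip().upper()[-3:]
    return FAB_BY_SUFFIX.get(suffix)


# Hypothetical batch string, only the suffix matters here.
print(fab_from_batch("XX 0000SUT"))
```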


----------



## mbraz69

Registered after reading through this entire post. Thanks for all the great info! I have a problem of my own. Not sure if its relatable to the masses but I wanted to reach out for any help if anyone has experienced anything similar. Extremely frustrated and deeply regretting this expensive upgrade.


Ryzen 5800x
MSI x570 Unify
2x8gb Corsair Vengeance 3600 CL16
RTX 3070 FE
Samsung 970 Evo Plus NVMEs
EVGA 750w P2

I just finished testing 7C35vA7, 7C35vA86(beta), and 7C35vA8 bios and all produce the same results.


Install Bios
Leave Bios settings completely untouched
Let Windows attempt to boot
*Immediate BSOD* with the error being *CRITICAL_PROCESS_DIED*
Hard restart and enter BIOS
Disable CPB and Save & Exit
Windows boots fine
XMP 3600 only works on the latest 7C35vA8 bios

I have also tried entering the BIOS right after the flash completes and Load Optimized Defaults. It will BSOD immediately with the same CRITICAL_PROCESS_DIED or PAGE_FAULT_IN_NOT_PAGED_AREA.

Worst experience in my 15yrs of being a pc geek, AMD has lost me as a customer after this.


----------



## GRABibus

mbraz69 said:


> Registered after reading through this entire post. Thanks for all the great info! I have a problem of my own. Not sure if its relatable to the masses but I wanted to reach out for any help if anyone has experienced anything similar. Extremely frustrated and deeply regretting this expensive upgrade.
> 
> 
> Ryzen 5800x
> MSI x570 Unify
> 2x8gb Corsair Vengeance 3600 CL16
> RTX 3070 FE
> Samsung 970 Evo Plus NVMEs
> EVGA 750w P2
> 
> I just finished testing 7C35vA7, 7C35vA86(beta), and 7C35vA8 bios and all produce the same results.
> 
> 
> Install Bios
> Leave Bios settings completely untouched
> Let Windows attempt to boot
> *Immediate BSOD* with the error being *CRITICAL_PROCESS_DIED*
> Hard restart and enter BIOS
> Disable CPB and Save & Exit
> Windows boots fine
> XMP 3600 only works on the latest 7C35vA8 bios
> 
> I have also tried entering the BIOS right after the flash completes and Load Optimized Defaults. It will BSOD immediately with the same CRITICAL_PROCESS_DIED or PAGE_FAULT_IN_NOT_PAGED_AREA.
> 
> Worst experience in my 15yrs of being a pc geek, AMD has lost me as a customer after this.


RMA is the only way...


----------



## mbraz69

Just for curiosity's sake, I tried manually setting Vcore to 1.325 and the CPU to 4.4GHz, and it appears to be stable after a quick and dirty stress test. I know it's not ideal to run a static OC because you miss out on the boost, but if I can't get the boost on auto, 4.4GHz is better than a locked 3.7GHz. I honestly am not sure it's the CPU. Something in the BIOS is not set right out of the box, I think.


----------



## mtavel

mbraz69 said:


> Just for curiosity's sake, I tried manually setting Vcore to 1.325 and the CPU to 4.4GHz, and it appears to be stable after a quick and dirty stress test. I know it's not ideal to run a static OC because you miss out on the boost, but if I can't get the boost on auto, 4.4GHz is better than a locked 3.7GHz. I honestly am not sure it's the CPU. Something in the BIOS is not set right out of the box, I think.


As you can imagine, there are lots of reasons a CPU may not be stable (voltage too low, too high, silicon binning not done properly and its not stable at max specified boost, etc.). If you're not stable at BIOS defaults, that's not a great sign. You either have incompatible components, or one of your components is bad.

You may be able to tweak BIOS settings to work around whatever the root-cause of the issue is, but it's unlikely to be a real 'fix'. You could try another CPU in that motherboard to see if the issue goes away, or drop that CPU into another mobo - but it's completely understandable if you don't have a lab bench of spare parts laying around to test with. Some local repair shops may be willing to help you isolate a component for a reasonable fee also.

You shouldn't have to tweak anything to have a stable system at base defaults (that means disabling XMP/D.O.C.P. too). If that's still unstable, I would be leaning towards hardware issues. If you're running into issues as you bring in features like PBO and D.O.C.P., it doesn't rule out hardware issues, but it definitely starts to get you into the neighborhood of the root-cause.


----------



## JohnnyFlash

mbraz69 said:


> I have also tried entering the BIOS right after the flash completes and *Load Optimized Defaults*. It will BSOD immediately with the same CRITICAL_PROCESS_DIED or PAGE_FAULT_IN_NOT_PAGED_AREA.


Try it one more time, but completely clear CMOS, don't use the optimized defaults. Maybe it doesn't do anything, but I have seen this fix things with Ryzen systems before.


----------



## mbraz69

Thank you for the detailed response, mtavel. I completed an RMA form with AMD yesterday and they got back to me today asking for screenshots showing BIOS version, chipset driver version, power settings in Windows set to Balanced, and a DXDIAG .txt file. I only have until March 7 to initiate a return with Amazon for the motherboard, so I am probably going to do that as well, as I highly doubt AMD will process the RMA and get me a new chip by that date.


----------



## Imraneo

mbraz69 said:


> Thank you for the detailed response, mtavel. I completed an RMA form with AMD yesterday and they got back to me today asking for screenshots showing BIOS version, chipset driver version, power settings in Windows set to Balanced, and a DXDIAG .txt file. I only have until March 7 to initiate a return with Amazon for the motherboard, so I am probably going to do that as well, as I highly doubt AMD will process the RMA and get me a new chip by that date.


I had pretty much exactly the same issue as yours: an extremely sensitive CPU that doesn't need any stress testing to fail. Just a simple boot into Windows will fail at BIOS defaults. I got mine to work if I disabled CPB (Core Performance Boost) or set a constant 1.1 V Vcore.
Don't worry, this is a straightforward RMA. It's also my first CPU failure since the Pentium 166 MHz days. LOL!


----------



## mbraz69

This does not seem normal to me after just reinstalling the latest AMD chipset drivers and the installer specifically has an option to install or not install the AMD SMBUS Driver.....?


----------



## Anthos

mbraz69 said:


> This does not seem normal to me after just reinstalling the latest AMD chipset drivers and the installer specifically has an option to install or not install the AMD SMBUS Driver.....?
> 
> View attachment 2478737


That's how mine looks too, and I do not share your issues.


----------



## mbraz69

I ran memtest overnight and no errors were found. I tried another NVMe drive and two 2.5" SSDs, and still got an immediate bluescreen when the Windows installer tries to load. The only other thing I can think of is the PSU. It's an older EVGA 650W P2 Platinum PSU that has served me well for many years on my 6700K system with the same 3070 and NVMe drives. Maybe it doesn't have enough juice for the 5800X?


----------



## RemoteSpecialist

Custom PC Builder PowerGPU Claims Ryzen 5 5000 Zen 3 CPUs Experiencing High Failure Rates

"PowerGPU says *it received 50 units each of the Ryzen 9 5950X and Ryzen 9 5900X, of which eight of the former were DOA, and four of the latter were as well. That works out to 12 out of 100 chips from within the Ryzen 9 family*. In addition, the builder says it received 100 units of the Ryzen 7 5800X, of which four arrived DOA, and 120 units of the Ryzen 5 5600X, of which three were defective.

That is 19 out of 320 chips, for a failure rate of nearly 6 percent. In contrast, the company said it has only received a single dead Intel chip, a Core i7-9700K, in the past two years. When asked how many Intel CPUs it receives versus AMD chips, the company replied, "_Before the [Ryzen] 5000 series, it was 80 percent Intel and 20 percent AMD and we only had 1 Intel CPU die in the past 2 years._"
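
For anyone who wants to check the arithmetic in that quote, here is a minimal Python sketch. The unit counts are only the ones quoted above; the dictionary layout and the `doa_rate` helper are made up for illustration and are not anything PowerGPU published.

```python
# Figures as quoted from the PowerGPU report above: (units received, units DOA).
reported = {
    "Ryzen 9 5950X": (50, 8),
    "Ryzen 9 5900X": (50, 4),
    "Ryzen 7 5800X": (100, 4),
    "Ryzen 5 5600X": (120, 3),
}


def doa_rate(received, dead):
    """Return the dead-on-arrival rate as a percentage."""
    return 100.0 * dead / received


# Per-model rate, then the pooled rate over every chip in the report.
per_model = {name: doa_rate(r, d) for name, (r, d) in reported.items()}
total_received = sum(r for r, _ in reported.values())
total_dead = sum(d for _, d in reported.values())
overall = doa_rate(total_received, total_dead)

for name, rate in per_model.items():
    print(f"{name}: {rate:.1f}% DOA")
print(f"Overall: {total_dead}/{total_received} = {overall:.2f}% DOA")
```

The pooled figure matches the "nearly 6 percent" in the article, but the per-model rates differ a lot, so the single overall number hides most of the spread.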


----------



## ghiga_andrei

RemoteSpecialist said:


> Custom PC Builder PowerGPU Claims Ryzen 5 5000 Zen 3 CPUs Experiencing High Failure Rates
> 
> "PowerGPU says *it received 50 units each of the Ryzen 9 5950X and Ryzen 9 5900X, of which eight of the former were DOA, and four of the latter were as well. That works out to 12 out of 100 chips from within the Ryzen 9 family*. In addition, the builder says it received 100 units of the Ryzen 7 5800X, of which four arrived DOA, and 120 units of the Ryzen 5 5600X, of which three were defective.
> 
> That is 19 out of 320 chips, for a failure rate of nearly 6 percent. In contrast, the company said it has only received a single dead Intel chip, a Core i7-9700K, in the past two years. When asked how many Intel CPUs it receives versus AMD chips, the company replied, "_Before the [Ryzen] 5000 series, it was 80 percent Intel and 20 percent AMD and we only had 1 Intel CPU die in the past 2 years._"


It's a good start. Once more OEMs report this too, it will be an issue AMD can't hide anymore. Waiting for HP or Dell to report this.


----------



## mbraz69

I officially give up. Washing my hands of AMD and never looking back. Thanks all for the assistance


----------



## mtavel

Sounds like AMD's push to ramp up production volume has been far more aggressive than the realities of TSMC's silicon quality would allow. 

Considering the 5950x needs 100% healthy CCXs (can't disable one or two bad cores) - I understand how it might have the higher DOA rate, but 16% DOA (in that sample, anyway) is INSANE!


----------



## JohnnyFlash

If mine has issues when I get it, I'm running at manual all core voltage for a year before RMAing. The vast majority of the issues are tied to boost and sleep voltages, as long as it runs static for now I'm happy. 

By the time I RMA it, the silicon should be better.


----------



## mbraz69

BluescreenView says that it's always caused by ntoskrnl.exe+3f5a80......if that meant anything to me.


----------



## dr.Rafi

Catscratch said:


> The only different trend with ryzen is that, the frequency goes up as the core count. It used to be that fewer core cpus had the most speed. That probably means they BIN these cpus like there's no tomorrow.


Because they are charging more for higher core counts, and most people who pay more want their CPUs to perform better in games and productivity at the same time. Also, AMD pushing this generation to the limit makes it harder to use only high-binned chiplets and dispose of the rest, which is the majority.
From my point of view, lower-core-count CPUs perform better with FCLK, which brings their performance level with the high-core-count ones in less core-demanding and single-threaded workloads like games and light applications.


----------



## dr.Rafi

mbraz69 said:


> Just for curiosity's sake, I tried manually setting Vcore to 1.325 V and the CPU to 4.4 GHz, and it appears to be stable after a quick and dirty stress test. I know it's not ideal to run a static OC because you miss out on the boost, but if I can't get boost working on auto, 4.4 GHz is better than a locked 3.7 GHz. I honestly am not sure it's the CPU. I think something in the BIOS is not set right out of the box.


Motherboards are playing a big role in the 5000-series CPU issues. I have a 5950X, one of 3 I have had so far including one 5900X, and this 5950X is the best one. It can boot at 2033 FCLK with WHEA errors and is fully stable at 1900 with no WHEA errors. I even tested it with a quad-rank setup of 4 x 16 dual-rank memory: stable at 1900 FCLK / 3800 memory with CO -30 on all cores, +100 MHz boost and global C-states disabled. Both CCDs boost similarly, with 12 cores boosting over 5000 (tested on the B550 Unify-X and Aorus Master motherboards). I tested the same CPU on the X570 Unify: continuous reboots and WHEA errors at 1866, and it could never boot at 1900. I even tried manual voltages and timings, no dice. Back on the Unify-X it worked like before, so motherboards are definitely playing a role. We used to talk about the silicon lottery with CPUs; now we have a lottery for motherboards too.


----------



## mbraz69

So is the B550 Aorus Master a better bet than another X570 motherboard? It would be strange if the higher-end boards were more problematic.


----------



## dr.Rafi

mbraz69 said:


> So is the B550 Aorus Master a better bet than another X570 motherboard? It would be strange if the higher-end boards were more problematic.


Sorry, I meant that the Aorus Master *X570* rev 1.2 is good, not rev 1 and 1.1. Rev 1 has issues with RAM overclocking and the 2.5G network adapter; rev 1.1 fixed the RAM; rev 1.2 fixed both memory and network. I tested the X570 Extreme rev 1.1, which is great and performs the same as the Master rev 1.2. The Extreme only goes up to rev 1.1 because only memory overclocking was upgraded (better memory trace alignment with higher overclocking capability), and it has no issue with the network.
The B550 Unify-X is also great; the only issue is coil whine if you are not lucky. I did not test the B550 Master.


----------



## mbraz69

Ya I don't have the luxury of seeing what Rev version any motherboard is before buying it. Thanks for the info though. Guess I will just wait out the AMD rma and return the x570 Unify to Amazon.

Cheers


----------



## dr.Rafi

mbraz69 said:


> Ya I don't have the luxury of seeing what Rev version any motherboard is before buying it. Thanks for the info though. Guess I will just wait out the AMD rma and return the x570 Unify to Amazon.
> 
> Cheers


You're welcome. I buy in person in store and ask them to check; the revision is usually written on the box. If they cannot check it before payment, they can still cancel the transaction and refund.


----------



## Notty

This explains a lot. Embarrassing for AMD if you ask me. I returned my 5600X as well. 1 month without a PC now. I will make sure I never ever buy an AMD CPU again.


----------



## xeizo

Yes, it looks like they were in a hurry to rush out the 5000 series, and QA needs to improve by a lot. Working chips do work, though, so never buying AMD again is a bit of an overreaction, imho.


----------



## GamBoTron

Just got mine up and running yesterday. Didn't test too much, but everything is flawless so far and temps are nice (so far at least). Will test more today.


----------



## mongoled

Nice, more people never going to buy AMD again, how nice of you, more stock availability for the rest of the population of the World


----------



## Deepcuts

mongoled said:


> Nice, more people never going to buy AMD again, how nice of you, more stock availability for the rest of the population of the World


You mean more stock for people who can afford to lose time with RMA, lose time with debugging, or straight-up buy another one and be "lucky" enough to get a 2nd defective sample like some of the users in this thread?
For enthusiasts and home users, it is not the end of the world. I mean it's bad, but not critical.
For business use, I would say staying away from AMD is the right decision for now.


----------



## mongoled

Everyone has free will to choose as they wish.

Just like children do.

And it's usually children that cry; some of the responses in this thread are more like children than adults.

Though just like children they are free to act how they wish, just like I am free to say that there will be more AMD CPUs available for others.


----------



## Catscratch

Notty said:


> This explains a lot. Embarrassing for AMD if you ask me. I returned my 5600X as well. 1 month without a PC now. I will make sure I never ever buy an AMD CPU again.


I also read a Turkish site citing them. Did they delete the tweet? Couldn't find it.
I found this:

https://twitter.com/i/web/status/1361088764068659206


----------



## Imraneo

It is embarrassing for AMD, but I can't say for sure if I will stay away from them. I would be a little worried during purchase though, making sure my warranty is intact. 
Business users probably buy from OEM who will have suffered the fallout, so I'd say if you're buying from a builder, you should be more confident as your system should have been tested well before sale.
We still do not have the global picture on what's going on. Global RMA numbers would be interesting. Also, is it a TSMC issue or an AMD one? Tech-wise, AMD seems pretty good, with their PBO2 and whatnot. Wouldn't shoot them down just yet.


----------



## Marucins

Weird articles have appeared ridiculing AMD and showing the failure rate of the new Ryzen 5000.
They were created by the PowerGPU profile on Twitter.
Later he deleted everything because he had no evidence for his words.
And he could come to us 

Maybe it's another scam. But I experienced a processor failure myself.

Or maybe it's a cleverly organized action. Similar to the one in March 2018, when a recently founded and unknown cybersecurity company called CTS Labs said it had found 13 security flaws in the Zen architecture without providing evidence of their existence. The presence of these vulnerabilities has never been confirmed by more reputable institutions afterwards. The entire announcement from CTS Labs came 3 months after the Meltdown and Spectre vulnerabilities were discovered in Intel processors. It also quickly turned out that the company is registered in Tel Aviv, where by chance there are several research centers of another processor manufacturer.


----------



## 1devomer

mongoled said:


> Everyone has free will to choose as they wish.
> 
> Just like children do.
> 
> And it's usually children that cry; some of the responses in this thread are more like children than adults.
> 
> Though just like children they are free to act how they wish, just like I am free to say that there will be more AMD CPUs available for others.


Kinda twisted way to think and elaborate about things.
Having more widespread cpu availability does not imply that there will be less faulty cpu's!
Instead, the issue will be diluted because of poor adoption rate, over a large volume of units available.


Anyway, little sanity check up:
-AMD shipped Zen/Zen+ cpu's with segfault issues, that had to be RMAed.
-AMD shipped Zen2 cpu's not able to reach specs clocks, at launch.
-AMD shipped Zen3 cpu's with WHEA fault, at launch, along with dual CCD 5600/5800.

If some want to know where the best binned Zen3 chiplets went, i advise you to look at the AMD investor transcript:
_"Milan production began in the fourth quarter as planned, with initial shipments to cloud and HPC customers."
Lisa Su._

"_EPYC processor revenue grew sequentially, including early shipments of third-generation EPYC Milan processors."
Devinder Kumar._

Advanced Micro Devices (AMD) Q4 2020 Earnings Call Transcript | The Motley Fool
AMD earnings call for the period ending December 31, 2020.
www.fool.com

I also will not buy AMD any more and i do not advise anyone to invest into AMD product at the moment.


----------



## mongoled

1devomer said:


> Kinda twisted way to think and elaborate about things.
> Having more widespread cpu availability do not imply that there will be less faulty cpu's!
> Instead, the issue will be diluted because of poor adoption rate, over a large volume of units available.
> 
> 
> Anyway, little sanity check up:
> -AMD shipped Zen/Zen+ cpu's with segfault issues, that had to be RMAed.
> -AMD shipped Zen2 cpu's not able to reach specs clocks at launch.
> -AMD shipped Zen3 cpu's with WHEA fault at launch, along dual CCD 5600/5800.


Welcome to the forum, great way to make your first post!

Nothing twisted about my post.

It's actually very factual if you choose to read thoroughly.

** EDIT **
Ahhh you are a shill! Maybe you are related to a Francois Piednoel !


----------



## mongoled

Imraneo said:


> It is embarrassing for AMD, but I can't say for sure if I will stay away from them. I would be a little worried during purchase though, making sure my warranty is intact.
> Business users probably buy from OEM who will have suffered the fallout, so I'd say if you're buying from a builder, you should be more confident as your system should have been tested well before sale.
> We still do not have the global picture on what's going on. Global RMA numbers would be interesting. Also, is it a TSMC issue or an AMD one? Tech-wise, AMD seems pretty good, with their PBO2 and whatnot. Wouldn't shoot them down just yet.


Thank you for the well balanced post


----------



## JohnnyFlash

That's just it: TSMC bit off way more than they could chew. It's very possible this is on them if they can't deliver at the same quality as the test batches.


----------



## dr.Rafi

Deepcuts said:


> You mean more stock for people who can afford to lose time with RMA, lose time with debugging, or straight-up buy another one and be "lucky" enough to get a 2nd defective sample like some of the users in this thread?
> For enthusiasts and home users, it is not the end of the world. I mean it's bad, but not critical.
> For business use, I would say staying away from AMD is the right decision for now.


Business owners will start ordering more units than they need, with the failure rate in mind, and return the failed units.


----------



## Deepcuts

dr.Rafi said:


> Business owners will start ordering more units than they need, with the failure rate in mind, and return the failed units.


Good one...if this was the Funny section.


----------



## mbraz69

I just received my 3rd email from AMD and now they are asking for pictures of the CPU installed in the socket without heatsink, and a copy of my invoice from Newegg. Hopefully this is the start to a RMA that doesn't take months to get a working 5800x back


----------



## dr.Rafi

mbraz69 said:


> I just received my 3rd email from AMD and now they are asking for pictures of the CPU installed in the socket without heatsink, and a copy of my invoice from Newegg. Hopefully this is the start to a RMA that doesn't take months to get a working 5800x back


I assume the picture of the CPU is to check the serial number and especially the manufacturing week. Either they are checking the authenticity of the CPU, or they want to be sure you are sending them the same CPU you are claiming the RMA for. I wonder: if you buy a CPU from someone with cash and no receipt (but factory sealed), can you still claim warranty with AMD?


----------



## mbraz69

Doubtful as they also asked for me to provide them with _"The original vendor provided invoice or receipt document, clearly displaying the details of the purchase."_


----------



## JohnnyFlash

They are starting to reject CPUs bought from scalpers. Personally I'm 100% for this, but I know some will disagree.


----------



## mbraz69

I'm going to order a new motherboard before I pack up my MSI and send it back to Amazon just to be 100% sure its not the board vs cpu. I was thinking the Asus X570 Crosshair VIII Hero or ASRock X570 Taichi....thoughts?


----------



## RemoteSpecialist

mbraz69 said:


> I'm going to order a new motherboard before I pack up my MSI and send it back to Amazon just to be 100% sure its not the board vs cpu. I was thinking the Asus X570 Crosshair VIII Hero or ASRock X570 Taichi....thoughts?


Asus X570 Crosshair VIII Dark Hero - as this one has no chipset fan


----------



## mbraz69

If I could find it in stock in Canada I would, but that is not an option right now.


----------



## iraff1

Are we finally going to get those tech influencers (GamersNexus, LinusTechTips, Jays2Cents) and whoever else is on the AMD paycheck to talk about the elephant in the room now?

AMD Ryzen 5000 'Zen 3' Desktop CPUs & X570 Motherboards Have High Failure Rates, Reports PowerGPU
DIY PC Builder, PowerGPU, has reported that they are facing very high failure rates with AMD's Ryzen 5000 Zen 3 Desktop CPUs & X570 boards.
wccftech.com

Now a large company that does Ryzen 5000 builds can give us statistics. The tweet from the company in question has already been removed, probably AMD working overtime to save face. Here's a print screen.

View attachment 2479027

This isn't looking good for AMD. Maybe we'll finally get the attention we deserve?
PS: How the hell do you ship that many DOA and still claim to have quality control? I'm pretty sure there is no quality control. The only quality control that ever existed was for the samples that were sent out to the YouTube/influencer community. Awful move on AMD's end.


----------



## JohnnyFlash

It's still only a sample size of 320 chips. I also bet they are grouping WHEA issues as DOA.

It's not a smoking gun.
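
To put a rough number on how much a 320-chip sample can tell us, here is a small Python sketch of a 95% Wilson score interval for the reported 19 failures out of 320. The choice of the Wilson interval is mine, and it assumes the chips are a random sample, which one builder's stock may well not be.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 19 reported failures out of 320 chips, as quoted earlier in the thread.
low, high = wilson_interval(19, 320)
print(f"observed {19 / 320:.1%}, plausible range {low:.1%} to {high:.1%}")

# The 5950X-only figure (8 of 50) rests on far fewer chips, so its range is wider.
low_5950x, high_5950x = wilson_interval(8, 50)
print(f"5950X observed {8 / 50:.1%}, plausible range {low_5950x:.1%} to {high_5950x:.1%}")
```

Under those assumptions the overall rate lands somewhere around 4 to 9 percent, so the sample is small but not meaningless. What it cannot settle is whether "DOA" here includes WHEA-type instability.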


----------



## dr.Rafi

iraff1 said:


> Are we finally going to get those tech influencers (GamersNexus, LinusTechTips, Jays2Cents) and whoever else is on the AMD paycheck to talk about the elephant in the room now?
> 
> AMD Ryzen 5000 'Zen 3' Desktop CPUs & X570 Motherboards Have High Failure Rates, Reports PowerGPU
> DIY PC Builder, PowerGPU, has reported that they are facing very high failure rates with AMD's Ryzen 5000 Zen 3 Desktop CPUs & X570 boards.
> wccftech.com
> 
> Now a large company that does Ryzen 5000 builds can give us statistics. The tweet from the company in question has already been removed, probably AMD working overtime to save face. Here's a print screen.
> 
> View attachment 2479027
> 
> This isn't looking good for AMD. Maybe we'll finally get the attention we deserve?
> PS: How the hell do you ship that many DOA and still claim to have quality control? I'm pretty sure there is no quality control. The only quality control that ever existed was for the samples that were sent out to the YouTube/influencer community. Awful move on AMD's end.


YouTube tech channels depend mainly on hardware sent to them by hardware companies. They will never talk about any failure, otherwise they get banned from new hardware reviews. I never trust their twisted reviews.


----------



## Imraneo

https://twitter.com/i/web/status/1361363595783794701
Hmm.. some development here.
Who is PowerGPU btw? Not familiar with them.


----------



## Deepcuts

iraff1 said:


> AMD Ryzen 5000 'Zen 3' Desktop CPUs & X570 Motherboards Have High Failure Rates, Reports PowerGPU
> DIY PC Builder, PowerGPU, has reported that they are facing very high failure rates with AMD's Ryzen 5000 Zen 3 Desktop CPUs & X570 boards.
> wccftech.com


Man they are bad at citing stuff:
_"PowerGPU also mentions that prior to the launch of AMD's Ryzen 5000 CPUs, the failure rate was 80% Intel and 20% AMD and they only had one CPU die on them in the past 2 years."_
That was the purchase ratio, not the failure rate.


----------



## Bighuman45

Deepcuts said:


> Hello,
> 
> *Please vote on the pool only if your system is not stable with BIOS defaults, memory at 2133 Mhz without XMP, without any CPU or RAM overclocking, without PBO or any voltage tweaks and of course, if you do not have any issues with your Ryzen 5000 or your problem has been fixed.*
> _* you can select 2 values.
> Motherboard+CPU if you have issues.
> No, I tested extensively for several days+CPU if you do not have issues._
> _It did, but *+CPU if your issue has been fixed._
> 
> *See **https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/post-28698010** for the solution to my issue.*
> 
> 
> I bought the new AMD Ryzen 5950X to replace my AMD Ryzen 3950X.
> This is the only new component in the system. The rest of the components are in the signature.
> 
> 
> *Problem*
> 
> As soon as I booted up to Windows, the system started rebooting and crashing, sometimes with the BSOD WHEA Uncorrectable Error​
> 
> *What I tried*
> 
> *Long story short:*​
> I have replaced every component except the CPU and the motherboard.
> 
> *Long story:*​
> removed all RAM sticks and tested with only one at a time in different memory slots.
> tested with memory at 2133 Mhz auto timings, XMP and manual timings without XMP.
> took out my RAM and tested with one stick of G-Skill F4-2400C15S-8GNS and one KIT of 2 sticks Corsair CMK8GX4M1A2400C16.
> replaced the PSU with a Corsair AX760i
> removed any other USB devices besides mouse and keyboard
> tested with only a Bluetooth mouse. No other USB connected.
> removed any other HDD and SSD besides the system/windows one.
> replaced the system/Windows SSD and tried reinstalling Windows. Crashes while installing.
> removed the CPU to check for bent pins with a magnifying glass. Twice. All good.
> downgraded BIOS to version F30.
> re-flashed BIOS version F31e.
> upgraded BIOS to F31h, F31i, F31k, F31l, F31n, F31o, F31
> cleared CMOS and tried booting without setting anything in BIOS.
> booted Ubuntu 20 Desktop live USB. Crashes before desktop with some cryptic error about CPU.
> checked CPU and motherboard temperatures. All fine.
> reseated the GPU.
> tested with an RX 460 GPU instead of GTX 1080 ti.
> tested with an RX 590 GPU instead of GTX 1080 ti. Takes longer to crash than with the GTX 1080 ti on BIOS version F31n.
> disabled C-States
> disabled HPET-Timer
> forced PCIe to gen 2/3/4
> disabled AMD Cool&Quiet
> disabled PBO (always have it on Auto anyway)
> removed all SSDs and HDDs and tried booting from Ubuntu live USB
> tried all levels of LLC
> Enabled Preferred Cores
> 
> 
> *Temporary fix*
> 
> After many failed attempts with various BIOS settings, the only one that fixes this problem is setting "Core Performance Boost" to disabled. Of course, with this setting disabled, this new CPU performs a lot worse than the old 3950X.​With "Core Performance Boost" disabled, I can run my RAM at 3600 and IF/UCLK at 1800 with tight timings without any problems. 300+ Handbrake CPU stable encodes so far.​
> 
> With F31h Windows no longer crashes at boot, but crashes under load or random at idle like before.
> The fastest way to crash the system is to run AIDA64 memory copy benchmark (will crash when CPU will reach 100% usage), a Handbrake encode (will crash as soon as it starts encoding) or a game (Guild Wars 2 crashes at login screen).
> 
> Opened a ticket with Gigabyte, but knowing Gigabyte, their response will be "We will inform our engineers" and then silence.
> Opened a ticket with AMD. No response. Received an email requesting some details. Still waiting. Received another email requesting details already sent in the original RMA ticket. I guess AMD support and Gigabyte support are outsourced at the same helpdesk. RMA accepted after 5 weeks: my reply to AMD.
> 
> 
> Anyone else having problems with the new 5950X and Core Performance Boost?
> 
> Thank you.


Wow I am terrified to get my 5950x that I have had on backorder since release date with B&H Photo. I have the 3950x and the AORUS Elite WiFi x570. Feel like I should just sell it on Ebay when it arrives unless there is a MAJOR BIOS fix.


----------



## JohnnyFlash

Bighuman45 said:


> Wow I am terrified to get my 5950x that I have had on backorder since release date with B&H Photo. I have the 3950x and the AORUS Elite WiFi x570. Feel like I should just sell it on Ebay when it arrives unless there is a MAJOR BIOS fix.


Even if it is defective, you can run an all-core overclock which will still merc your current chip. Defects that are completely unstable exist, but seem to be rare.


----------



## dr.Rafi

Bighuman45 said:


> Wow I am terrified to get my 5950x that I have had on backorder since release date with B&H Photo. I have the 3950x and the AORUS Elite WiFi x570. Feel like I should just sell it on Ebay when it arrives unless there is a MAJOR BIOS fix.


I have tested three 5950Xs and one 5900X so far, all good with no issues. I sold the 5900X and one 5950X. That 5950X could do 4000 memory / 2000 FCLK, but it needed a higher VSOC of 1.17 V, unlike the other two 5950Xs I have now, which can both do 2000 FCLK with VSOC at 1.11 V, CO -30, C-states disabled and +100 MHz boost. Both are stable at 3800/1900, and all four CPUs are stable at defaults. I live in Australia: three of the CPUs, bought from totally different locations and states in Australia, are made in China, and one 5950X from the USA is made in Malaysia, and that one is the best overclocker and the best on FCLK.
Both 5950Xs can also do 2033/4066 with dual-rank memory and 2066/4133 with single-rank memory, but with less performance than at 2000 FCLK.
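
For readers not used to the FCLK/memory pairs quoted above, here is a small Python sketch of how they relate. It assumes the usual 1:1 FCLK:MCLK setup, which the pairs in this post imply but do not state outright; the function name is made up for illustration.

```python
# (FCLK in MHz, quoted memory speed in MT/s), as listed in the post above.
quoted_pairs = [(1900, 3800), (2000, 4000), (2033, 4066), (2066, 4133)]


def ddr_rate_for_fclk(fclk_mhz):
    """Memory data rate in MT/s for a 1:1 FCLK:MCLK setup (DDR = 2 transfers per clock)."""
    return 2 * fclk_mhz


for fclk, quoted in quoted_pairs:
    expected = ddr_rate_for_fclk(fclk)
    print(f"FCLK {fclk} MHz -> DDR4-{expected} (post quotes {quoted})")

# Largest gap between 2 x FCLK and the quoted memory speed; any gap is just rounding.
max_diff = max(abs(ddr_rate_for_fclk(f) - q) for f, q in quoted_pairs)
```

So "3800/1900" and "4000/2000" are the same setting described two ways: the memory speed is simply twice the fabric clock when the two run in sync.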


----------



## 1devomer

Bighuman45 said:


> Wow I am terrified to get my 5950x that I have had on backorder since release date with B&H Photo. I have the 3950x and the AORUS Elite WiFi x570. Feel like I should just sell it on Ebay when it arrives unless there is a MAJOR BIOS fix.



One can behave like the apologists shouting all over the internet and be tempted to keep a dull chip.
Then a few months later, some threads will start to appear shouting:
_"Did i win the silicon lottery 5950x, 5Ghz all cores, no WHEA."_

Or one can return/sell its dull cpu and buy another one, lets say, in 4/6 months.
When the "Did i hit the silicon lottery" threads start appearing.

There is also a 50-page thread about the WHEA issue on the AMD forum, if some are interested:

Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power
Mainboard: MSI x570 Unify Mainboard-BIOS: 7C35vA82 (Beta version) CPU: Ryzen 5900x RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2) Drive: M.2 Samsung 970 Evo+ 1TB SSD Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT PSU: be quiet straight power 11 750w Platinum OS: Win 10 Pro (64bit)...
community.amd.com

In any shape or form, I would not buy an AMD product at the moment.
Especially when AMD launches products, due to the fact that one could end up with a dull, low-binned chip.


----------



## Anthos

1devomer said:


> One can behave like the apologist all over the internet and be tempted to keep a dull chip.
> Then a few months later, some threads will start to appear shouting:
> _"Did i win the silicon lottery 5950x, 5Ghz all cores, no WHEA."_
> 
> Or one can return/sell its cpu and buy another one, lets say, in 4/6 month.
> When the "Did i hit the silicon lottery" threads start appearing.
> 
> There is also 50 pages about the WHEA issue on the AMD forum if some are interested in:
> 
> 
> 
> 
> 
> 
> Ryzen 5900x: System constantly crashing/restarting WHEA-Logger ID 18 and critical error Kernel-Power
> 
> 
> Mainboard: MSI x570 Unify Mainboard-BIOS: 7C35vA82 (Beta version) CPU: Ryzen 5900x RAM: Crucial Ballistix BL2K32G36C16U4B 3600 MHz, 64GB (32GB x2) Drive: M.2 Samsung 970 Evo+ 1TB SSD Graphics: SAPPHIRE Nitro+ Radeon RX 5700 XT PSU: be quiet straight power 11 750w Platinum OS: Win 10 Pro (64bit)...
> 
> 
> 
> 
> community.amd.com
> 
> 
> 
> 
> 
> 
> In any shapes or forms i would not buy an AMD product at the moment.
> Especially when AMD launch products, due to the fact that one could end up with a dull, low binned chip.


Saying to buy no AMD products at all is a bit of a stretch. People ending up with totally unstable chips are quite the minority, not the norm, and if you do get extremely unlucky you can RMA. If AMD were declining to replace such CPUs it would be a different case, but they aren't.


----------



## RemoteSpecialist

I opened

Processors
Processors (Intel® Core™, Intel® Xeon®, etc); processor utilities and programs (Intel® Processor Identification Utility, Intel® Extreme Tuning Utility, Intel® Easy Streaming Wizard, etc.)
community.intel.com

and checked 10 pages for the number of comments. Usually, each topic has fewer than 10 comments. The max number I saw was 17, I think.

If we go to

Processors
community.amd.com

we'll see that a single topic about 5xxx reboot problems already contains more than 500 comments. That's at least twice as many as all comments on all topics across all 10 pages from Intel. And this topic is a new one: it was started on 11-19-2020.

I don't know - maybe these numbers are explained by the fact that nobody buys Intel anymore


----------



## 1devomer

Anthos said:


> Saying to buy no AMD products at all is a bit of a stretch. People ending up with totally unstable chips are quite the minority, not the norm, and if you do get extremely unlucky you can RMA. If AMD were declining to replace such CPUs it would be a different case, but they aren't.


I would agree if some striking facts did not come into play:

_Anyway, a little sanity check:
-AMD shipped Zen/Zen+ CPUs with segfault issues at launch, which had to be RMAed.
-AMD shipped Zen2 CPUs that were not able to reach their specified clocks at launch.
-AMD shipped Zen3 CPUs with WHEA faults at launch, along with dual-CCD 5600/5800._


There is enough material, accumulated over the course of 4 years, showing that it is better to avoid AMD products altogether, especially at launch.
Buying an AMD product has become a gamble, and it is not fine just because only a few users lost their CPU bets.

If something is broken, would you advise someone else to buy it?
Personally I wouldn't, so my best advice, based on facts and personal experience with the products, is clear:
_Return dull CPUs and wait until AMD is able to deliver the same chip quality as the ones currently shipping to its EPYC customers!_


----------



## RAAVANA

Hi guys... I just recently (like 3 days ago) installed a 5950X into my X570 Aorus Master and, like most, I got the crashes and reboots. Now, I wanted to ask something: did any of you find anything else wrong with the system? Some small thing that went unnoticed while we were staring at the big issue of the PC crashing? I did. I noticed that my sound had crackling all over. So I went a different route and seem to have stumbled upon something just now (literally like an hour ago).

I too did what all of you tried: BIOS update, changing BIOS settings and so on. My system crashed (100% of the time) a few minutes into a game (DotA 2 was my choice of game for sacrifice). I think there is some problem with the chipset drivers that you install for your motherboard. All I did was uninstall them, set my XMP to 4000 MHz and the rest to default, try watching some video (which also caused the random crashes before) and then play a bot game of DotA 2 like all the times before. The sound issue completely disappeared and my PC has not crashed in the past 2 hours of testing.

I will perform further testing and update whether there is a crash or not. I really hope this helps at least a few people out there. Have a good day...


----------



## Deepcuts

*This is a rant.*

I am not a retailer or a shop. Just provide support to businesses.
21 years in the field.
For personal use, I started with an Intel 486 and some later models, then briefly used a Cyrix and an AMD Athlon X2.
The Athlon XP was my last AMD CPU for a very long time. Switched back to AMD after Intel 8700K, to a 3950X and now to 5950X.
The only CPU I ever managed to break was the AMD Athlon XP. I have cracked its die. My bad.

For my clients, I have always opted for Intel CPUs.
In these 21 years, I have never had an Intel CPU malfunctioning. Not a single DoA and cannot remember ever servicing a system with an Intel CPU that was dead or partially defective even long after the warranty expired.
By saying that, I agree that retailers with really high sales volume had their fair share of defective Intel CPUs. No doubt.
I suck at math, but I might have built over two thousand systems with Intel CPUs so far.


I am grateful that AMD is kicking Intel's ass right now performance-wise and I hope they will keep doing that for a long time. If not for AMD, I am certain we would all still use quad-core CPUs in 2021.
I am a fan of whatever company provides the best performance/cost and I have never thought until now that I also have to take into consideration the reliability of a new, unused CPU. Somehow, in my mind, it was impossible for a CPU to be DoA.
Up until now, if anyone would have said that their new CPU was defective, I am 99.99% sure I would have thought the user made a mistake, overclocked the hell out of that CPU, or bent some pins.

Whoever states that the amount of defective AMD Ryzen 5000 CPUs is not that high or even worth taking into consideration must be a sandwich short of a picnic. And I am seeing a lot of such statements.

The worst part of this whole story is that even with this very bad experience, I will still purchase the next AMD Ryzen 6950X or whatever it will be named if it will deliver the same increase in performance.
But I will be sure to pay the shop to test the darn thing before delivery.

/rant over


----------



## Midian

I had only one WHEA error but that was before reinstall of Windows and using SATA-SSD+HD for storage. Then I switched to just NVMe storage and new Windows install and zero errors ever since (2020-12-02). Now it could be the Windows reinstall but it could also have something to do with switching to NVMe storage only, might be something worth investigating.


----------



## Anthos

1devomer said:


> I would agree if some striking facts did not come into play:
> 
> _Anyway, a little sanity check:
> -AMD shipped Zen/Zen+ CPUs with segfault issues at launch, which had to be RMAed.
> -AMD shipped Zen2 CPUs that were not able to reach their specified clocks at launch.
> -AMD shipped Zen3 CPUs with WHEA faults at launch, along with dual-CCD 5600/5800._
> 
> 
> There is enough material, accumulated over the course of 4 years, showing that it is better to avoid AMD products altogether, especially at launch.
> Buying an AMD product has become a gamble, and it is not fine just because only a few users lost their CPU bets.
> 
> If something is broken, would you advise someone else to buy it?
> Personally I wouldn't, so my best advice, based on facts and personal experience with the products, is clear:
> _Return dull CPUs and wait until AMD is able to deliver the same chip quality as the ones currently shipping to its EPYC customers!_


I disagree.

For starters, any electronics you buy are always a gamble, from a TV to a car. I know people who, as they rolled out of the dealership, had to roll right back in because of a major issue. Saying that no one should ever buy a Honda or Ford or Mercedes because of it is, again, a bit of a stretch. And on top of that, what answer does Intel have for double-digit core counts? Their new flagship is going to drop back to 8. Obviously that's an issue for people who require more. Don't get me wrong, I am not happy with AMD or giving them a pass about all of this. I am massively pissed that they haven't issued a statement about this yet, but I refrain from going from one extreme to the other.


----------



## Hueristic

1devomer said:


> Fanboi shill first post


Always nice to see a new account show its true colors right out of the gate, welcome to instant ignore.


----------



## GRABibus

Deepcuts said:


> *This is a rant.*
> 
> I am not a retailer or a shop. Just provide support to businesses.
> 21 years in the field.
> For personal use, I started with an Intel 486 and some later models, then briefly used a Cyrix and an AMD Athlon X2.
> The Athlon XP was my last AMD CPU for a very long time. Switched back to AMD after Intel 8700K, to a 3950X and now to 5950X.
> The only CPU I ever managed to break was the AMD Athlon XP. I have cracked its die. My bad.
> 
> For my clients, I have always opted for Intel CPUs.
> In these 21 years, I have never had an Intel CPU malfunctioning. Not a single DoA and cannot remember ever servicing a system with an Intel CPU that was dead or partially defective even long after the warranty expired.
> By saying that, I agree that retailers with really high sales volume had their fair share of defective Intel CPUs. No doubt.
> I suck at math, but I might have built over two thousand systems with Intel CPUs so far.
> 
> 
> I am grateful that AMD is kicking Intel's ass right now performance-wise and I hope they will keep doing that for a long time. If not for AMD, I am certain we would all still use quad-core CPUs in 2021.
> I am a fan of whatever company provides the best performance/cost and I have never thought until now that I also have to take into consideration the reliability of a new, unused CPU. Somehow, in my mind, it was impossible for a CPU to be DoA.
> Up until now, if anyone would have said that their new CPU was defective, I am 99.99% sure I would have thought the user made a mistake, overclocked the hell out of that CPU, or bent some pins.
> 
> Whoever states that the amount of defective AMD Ryzen 5000 CPUs is not that high or even worth taking into consideration must be a sandwich short of a picnic. And I am seeing a lot of such statements.
> 
> The worst part of this whole story is that even with this very bad experience, I will still purchase the next AMD Ryzen 6950X or whatever it will be named if it will deliver the same increase in performance.
> But I will be sure to pay the shop to test the darn thing before delivery.
> 
> /rant over



My comment is off topic, but they will never call the next one the 6950X... Intel BWe... 😊

From a marketing point of view, they also shouldn't have named the 5950X as they did... my first thought was « Hey, Intel is back with BWe or what? »


----------



## folklore11

Anthos said:


> I disagree.
> 
> For starters, any electronics you buy are always a gamble, from a TV to a car. I know people who, as they rolled out of the dealership, had to roll right back in because of a major issue. Saying that no one should ever buy a Honda or Ford or Mercedes because of it is, again, a bit of a stretch. And on top of that, what answer does Intel have for double-digit core counts? Their new flagship is going to drop back to 8. Obviously that's an issue for people who require more. Don't get me wrong, I am not happy with AMD or giving them a pass about all of this. I am massively pissed that they haven't issued a statement about this yet, but I refrain from going from one extreme to the other.




Interesting reading here, gentlemen...

Are Ryzen 5000 CPU failure rates as high as claimed?
Our sources say NO
www.overclock3d.net


----------



## warplane95

5900X here since yesterday, PBO boost to 4.95GHz. 

I had two BIOS reboots and a blue screen within 30 min after setting the curve optimizer to -30, but nothing so far after setting it to -10.

D.O.C.P. to 3600MHz, FCLK to 1800, PBO limit disabled? 




----------



## JohnnyFlash

warplane95 said:


> 5900X here since yesterday, PBO boost to 4.95GHz.
> 
> I had two BIOS reboots and a blue screen within 30 min after setting the curve optimizer to -30, but nothing so far after setting it to -10.
> 
> D.O.C.P. to 3600MHz, FCLK to 1800, PBO limit disabled?


Run it stock for a couple days before messing with anything. I know it's exciting to get it, have your fun then reset the BIOS.

This should be the practice even when there aren't known issues to make sure you have stable hardware.


----------



## Hueristic

warplane95 said:


> 5900X here since yesterday, PBO boost to 4.95GHz.
> 
> I had two BIOS reboots and a blue screen within 30 min after setting the curve optimizer to -30, but nothing so far after setting it to -10.
> 
> D.O.C.P. to 3600MHz, FCLK to 1800, PBO limit disabled?



This is not a thread for failures in over or undervolting. 

It is for stock settings.


----------



## RemoteSpecialist

folklore11 said:


> Interesting reading here, gentlemen...
> 
> Are Ryzen 5000 CPU failure rates as high as claimed?
> Our sources say NO
> www.overclock3d.net


yep - it's interesting. I'm happy to hear that "AMD are taking these claims seriously"
I don't understand though why there is no reaction to the reboots topic with 500+ posts on AMD's official forum.
I don't understand why there is no such topic in Intel's community.
I don't understand why we have 1000+ posts in this topic if things are just fine.
I don't understand why we have 5 different AGESA versions in 3 months if there are no serious issues.
I don't understand how it is possible to get 2 defective CPUs in a row (hard reboots and BSODs on the 1st one, BSODs on the 2nd one)


----------



## mongoled

RemoteSpecialist said:


> yep - it's interesting. I'm happy to hear that "AMD are taking these claims seriously"
> I don't understand though why there is no reaction to the reboots topic with 500+ posts on AMD's official forum.
> I don't understand why there is no such topic in Intel's community.
> I don't understand why we have 1000+ posts in this topic if things are just fine.
> I don't understand why we have 5 different AGESA versions in 3 months if there are no serious issues.
> I don't understand how it is possible to get 2 defective CPUs in a row (hard reboots and BSODs on the 1st one, BSODs on the 2nd one)


Seeing that you don't understand much, maybe try a different hobby.


----------



## RAAVANA

RAAVANA said:


> Hi guys... I just recently (like 3 days ago) installed a 5950X into my X570 Aorus Master and, like most, I got the crashes and reboots. Now, I wanted to ask something: did any of you find anything else wrong with the system? Some small thing that went unnoticed while we were staring at the big issue of the PC crashing? I did. I noticed that my sound had crackling all over. So I went a different route and seem to have stumbled upon something just now (literally like an hour ago).
> 
> I too did what all of you tried: BIOS update, changing BIOS settings and so on. My system crashed (100% of the time) a few minutes into a game (DotA 2 was my choice of game for sacrifice). I think there is some problem with the chipset drivers that you install for your motherboard. All I did was uninstall them, set my XMP to 4000 MHz and the rest to default, try watching some video (which also caused the random crashes before) and then play a bot game of DotA 2 like all the times before. The sound issue completely disappeared and my PC has not crashed in the past 2 hours of testing.
> 
> I will perform further testing and update whether there is a crash or not. I really hope this helps at least a few people out there. Have a good day...


Hi guys, I had stability for some time with the above process, but it eventually crashed... However, after a few attempts, I noticed that the system is very stable if I set the FCLK to 1800. I changed the FCLK to 1800 and set my RAM to match it at 1:1... Since this morning I have not had a single crash. I hope that AMD can fix the FCLK issue; if not, I have just wasted money buying 4000 MHz RAM... Also, my system is running in Auto OC mode using Ryzen Master, and I was switching between PBO and Auto OC without any issues. Hope this helps someone...


----------



## Anthos

RAAVANA said:


> Hi guys, I had stability for some time with the above process, but it eventually crashed... However, after a few attempts, I noticed that the system is very stable if I set the FCLK to 1800. I changed the FCLK to 1800 and set my RAM to match it at 1:1... Since this morning I have not had a single crash. I hope that AMD can fix the FCLK issue; if not, I have just wasted money buying 4000 MHz RAM... Also, my system is running in Auto OC mode using Ryzen Master, and I was switching between PBO and Auto OC without any issues. Hope this helps someone...


For most people, buying RAM in the range of 3800-4000+ is indeed a waste of money. There's no guarantee it will run at those speeds at 1:1 with the FCLK, and as far as I know AMD doesn't claim it will, so it's a total gamble. They might make it more stable in a future AGESA, but as far as I understand it they don't really have to.
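
To put numbers on the 1:1 point: DDR memory transfers data twice per clock, so the real memory clock is half the advertised rate, and running 1:1 means the FCLK has to match that clock. A minimal sketch of the arithmetic (the function name is made up for illustration):

```python
def fclk_for_1_to_1(ddr_rate_mts: int) -> float:
    """Return the FCLK in MHz needed to run 1:1 with DDR4 at the given rate.

    DDR transfers twice per clock, so the memory clock is half the
    advertised transfer rate (DDR4-3600 -> 1800 MHz).
    """
    return ddr_rate_mts / 2


for rate in (3600, 3800, 4000):
    needed = fclk_for_1_to_1(rate)
    print(f"DDR4-{rate}: memory clock {needed:.0f} MHz, needs FCLK {needed:.0f} MHz for 1:1")
```

So a 4000 kit only runs 1:1 if the CPU holds an FCLK of 2000 MHz, while an FCLK of 1800 corresponds to running the memory at DDR4-3600.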


----------



## mtavel

I received my replacement CPU on Monday Feb 15th and have been running with it for about 36 hours now. The new CPU was from batch 2104PGS (4th week 2021 instead of my original 5950x manufactured the 43rd week of 2020).

I'm still monitoring performance, but I have not received a single WHEA error or idle reboot like I saw with my previous (RMA'ed) 5950x. With my old CPU, I would have experienced multiple reboots by now.

I'm quite happy with the replacement so far and will provide updates if anything else comes up.

Overall, I would recommend that anyone experiencing idle/low-power reboots at bios defaults and/or WHEA errors strongly consider RMA'ing your CPU.


----------



## mwwl

I ended up getting a new 5900X recently after returning my WHEA-ing one a month ago. Had a bit of a WHEA scare (mostly correctable, though one reboot) when using the latest stable BIOS (though I never tried it with XMP off) on my MSI X570 ACE, but after updating to the beta BIOS with AGESA 1.2.0.0, absolutely no problems for a week. +1 on RMAing these CPUs. This whole experience will probably make me go with Intel next time out of principle, though. While I was waiting for the 5900X to come out, I had a DOA 3600XT. And this whole business with Zen 3 BIOSes still being in beta months after release is embarrassing.


----------



## RemoteSpecialist

mwwl said:


> I ended up getting a new 5900x recently


Can you also post the batch number? Thx!


----------



## mtavel

One final difference I'd like to point out between my original bad 5950X and the replacement: on the old one I consistently had a 7 to 8 degree C difference between CCD 0 and CCD 1.

The new 5950X has a 1-2 degree temperature difference between CCDs. MUCH more consistent and better matched.
Interestingly, it was CCD 1 (the cooler-running CCD) on my old 5950X that had the most WHEA-causing APIC IDs.
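
For anyone wanting to make the same per-CCD comparison, a rough sketch of the tally follows. It rests on assumptions: that the WHEA event text contains an "APIC ID: <number>" field, and that on a 5950X with SMT enabled APIC IDs 0-15 belong to CCD 0 and 16-31 to CCD 1. The sample strings are made up purely to show the call and are not real log output.

```python
import re
from collections import Counter

# Only the "APIC ID: <n>" fragment is relied on; the rest of the event
# wording is not assumed.
APIC_RE = re.compile(r"APIC ID:\s*(\d+)")


def tally_by_ccd(event_texts, threads_per_ccd=16):
    """Count events per CCD from WHEA event message strings.

    threads_per_ccd=16 assumes a 5950X with SMT on (8 cores x 2 threads
    per CCD, contiguous APIC IDs). Adjust it for other CPUs.
    """
    counts = Counter()
    for text in event_texts:
        match = APIC_RE.search(text)
        if match:
            counts[int(match.group(1)) // threads_per_ccd] += 1
    return counts


# Made-up sample messages, for illustration only.
sample = [
    "Processor APIC ID: 18",
    "Processor APIC ID: 3",
    "Processor APIC ID: 27",
]
print(dict(tally_by_ccd(sample)))
```

The event texts themselves would have to be exported from the Windows event log first; that step is not shown here.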


----------



## mwwl

RemoteSpecialist said:


> Can you also post the batch number? Thx!


New one is 2102PGS


----------



## Priv-Au

Hey guys just posting an update to circumstances.
Constant reboots under no load with a 5950X. 
Through process of elimination It had to be my motherboard or CPU.

Had motherboard tested by my retailer and they found no fault with it. 

Sent an RMA request with AMD. Jumped through a couple of hoops, namely photos, receipts (etc) and had it approved. 
Sent it off.

Yesterday I received an email saying that it had arrived in Singapore. 
Today I received an email saying that it had passed RMA and that a new CPU would be sent off soon. 
20 minutes later an email saying that the new 5950X is on the way.

Either they put it in and it instantly rebooted, or possibly they’re able to nail down which ones are doing it that quickly.
I’ll provide an update again when the new 5950X arrives, but for anyone facing these issues: just start the RMA and get it over and done with.
You *should not* have to underclock or adjust timings on anything.
If it does not work out of the box in factory conditions you should immediately return it.


----------



## Abula

How much time does it take from the initial report for AMD to get back to you? I filled in the initial form today and got two emails within one hour, one giving me my ticket number and the other saying that they have created an account, but nothing else.

Can anyone who went through the RMA process tell me if it's fast in terms of emails? The reason is that a friend of mine is going to the US tomorrow, so it would be practical to send it with him, but I'm not sure if I'll have RMA approval by tomorrow. Does anyone know how much time it takes, give or take?


----------



## mongoled

Priv-Au said:


> Hey guys just posting an update to circumstances.
> Constant reboots under no load with a 5950X.
> Through process of elimination It had to be my motherboard or CPU.
> 
> Had motherboard tested by my retailer and they found no fault with it.
> 
> Sent an RMA request with AMD. Jumped through a couple of hoops, namely photos, receipts (etc) and had it approved.
> Sent it off.
> 
> Yesterday I received an email saying that it had arrived in Singapore.
> Today I received an email saying that it had passed RMA and that a new CPU would be sent off soon.
> 20 minutes later an email saying that the new 5950X is on the way.
> 
> Either they put it in and it instantly rebooted, or possibly they’re able to nail down which ones are doing it that quickly.
> I’ll provide an update again when the new 5950X arrives, but for anyone facing these issues: just start the RMA and get it over and done with.
> You *should not* have to underclock or adjust timings on anything.
> If it does not work out of the box in factory conditions you should immediately return it.


I somewhat agree with this post, except that you have not mentioned anything about troubleshooting!

Any system builder worth their salt will know that before you send any parts back you should at least troubleshoot to see what the issue is; otherwise you may end up sending back something that does not have a problem.

If someone follows your advice, they may end up sending back a perfectly good CPU, whereas the problem may lie with the RAM or some other component...


----------



## mtavel

mongoled said:


> I somewhat agree with this post, except that you have not mentioned anything about troubleshooting!


He did mention that he had the mobo tested and it was fine (no details provided, but I'll give him the benefit of the doubt and assume it was a proper job and he didn't want to write a book about it).

But you're right, it would be good to make sure anyone experiencing a possible CPU failure at least makes sure the problem is happening under BIOS defaults with standard memory timings. Similar behavior could be observed if a "good" CPU was undervolted excessively, memory was overclocked and undervolted, etc. Lots of possibilities that could be eliminated with safe BIOS settings - and that would make it much more clearly a CPU issue.


----------



## mtavel

*This is how my 5950x RMA experience went (in the U.S.):*

*Sunday Jan 31 @ 5pm:* Submitted initial RMA request form

*Sunday Jan 31 @ 8pm:* Received automated confirmation with a service request ticket number

*Wednesday Feb 3 @ 5:45am:* Received a response and service history request from AMD support requesting:

- Pic of the CPU installed in the MOBO (cooler removed with model and S/N visible)
- Original vendor invoice (pdf)
- Make/Model/BIOS Version of the MOBO
- Details of the issue & troubleshooting steps performed

*Wednesday Feb 3 @ 3:35pm:* Replied with requested information. Received automated confirmation of new service request details @ 3:37pm

*Wednesday Feb 3 @ 9:57pm:* Received "RMA Approved" message with details. Received a FedEx Ground shipping label in a separate email.

*Thursday Feb 4 @ 8am:* Shipped the 5950x to AMD using the provided shipping label (handed to employee at a "FedEx Office" shipping location)

CPU shipped back in its original plastic clamshell to protect the pins, and also in a rigid cardboard box with enough bubble wrap and foam to withstand a SpaceX Starship SN9 landing attempt.

*Monday Feb 8 @ 1pm:* 5950x was received at the service center in Miami, FL.

*Tuesday Feb 9 @ 6:37am:* Received confirmation that my processor:

> has successfully passed the inspection and your replacement product is now approved

*Tuesday Feb 9 @ 6:47am:* Received another update indicating:

> Your replacement processor is ready to ship.
> If you do not receive your replacement processor within next 5 business days or have any other queries with respects to this RMA, then please submit an online service request...

No shipment tracking number provided. Only RMA number and serial number provided for tracking the request.

*Tuesday Feb 9 @ 7:20pm:* Shipping label created by service center (2-day FedEx). Package not yet received by FedEx.

*Thursday Feb 11 @ 5:04pm:* FedEx received the package from the service center. Expected delivery on Monday Feb 15th.

*Monday Feb 15 @ 9:07am:* FedEx delivered the package (signature required). New CPU in full retail box packaging.

The new CPU batch number is 2104PGS (produced the 4th week [Jan 25-31] of 2021 in Penang). I received the CPU in the 7th week, so not a lot of time in transit. My previous CPU was 2043PGS (mfg. Oct 19-25th 2020 also in Penang).
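
The batch codes quoted in this thread can be turned into dates along these lines. This is only a sketch that assumes the reading used above: the first two digits are the year, the next two are the ISO week number, and the trailing letters are a plant code that is passed through untouched. `decode_batch` is a made-up helper name, and `date.fromisocalendar` needs Python 3.8 or newer.

```python
import datetime
import re


def decode_batch(code: str):
    """Split a batch code such as '2104PGS' into (monday, sunday, suffix).

    Assumes YYWW followed by letters, with WW being an ISO week number.
    """
    match = re.fullmatch(r"(\d{2})(\d{2})([A-Z]+)", code)
    if match is None:
        raise ValueError(f"unrecognised batch code: {code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    monday = datetime.date.fromisocalendar(year, week, 1)
    sunday = monday + datetime.timedelta(days=6)
    return monday, sunday, match.group(3)


for code in ("2043PGS", "2102PGS", "2104PGS"):
    monday, sunday, suffix = decode_batch(code)
    print(code, monday.isoformat(), "to", sunday.isoformat(), suffix)
```

Under that assumption, 2043PGS and 2104PGS come out as Oct 19-25, 2020 and Jan 25-31, 2021, matching the dates given above.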


----------



## mbraz69

AMD Support is a joke. I've been asked to provide the same thing 3 times now!!! Really regretting this purchase and ever supporting such a s*** company.


----------



## folklore11

Failures? Even more reading today....

If You Buy an AMD Ryzen 5000 CPU, Make Sure You Keep the Box
A PC DIY vendor company sparks controversy by claiming AMD Ryzen 5000 chips have an unusually high failure rate. I also received a bad CPU in November, and it was a hassle to build with and return.
www.pcmag.com


----------



## JohnnyFlash

folklore11 said:


> Failures? Even more reading today....
> 
> If You Buy an AMD Ryzen 5000 CPU, Make Sure You Keep the Box
> A PC DIY vendor company sparks controversy by claiming AMD Ryzen 5000 chips have an unusually high failure rate. I also received a bad CPU in November, and it was a hassle to build with and return.
> www.pcmag.com


Good that it's being discussed, but this is all dovetailing off the same tweet. There's no new info here.


----------



## jvidia

Is there any way to "trigger" this idle reboot problem, or must we just use the PC and wait for it?


----------



## folklore11

JohnnyFlash said:


> Good that it's being discussed, but this is all dovetailing off the same tweet. There's no new info here.


Mostly true. However, author details his personal experience as well... More clout perhaps to "faulty CPU" issue...


----------



## mtavel

jvidia said:


> Is there any way to "trigger" this idle reboot problem, or must we just use the PC and wait for it?


I never found a way to replicate it other than just waiting. Sometimes it would happen while I was scrolling through a web page, other times while the PC was just powered up, sitting on its own.

I would see an idle reboot fairly reliably every 2-3 hours. But I suspect it depends on the quality (or lack thereof) of your silicon. If I was running a test hammering the CPU or Memory, the PC would stay up reliably as long as that was running.


----------



## folklore11

Another update. Still more news concerning "faulty" or "failing" Ryzen 5000 series CPUs:

Ryzen 5000 failure rates: We reality-check the claims
One system vendor reported that Ryzen 5000 chips are failing at a high rate. Our sources and others suggest the problem is isolated.
www.pcworld.com





On a personal note:
My 5950X is performing perfectly. Bought December 2020 from Amazon using HotStock app.


----------



## Deepcuts

folklore11 said:


> Another update. Still more news concerning "faulty" or "failing" Ryzen 5000 series CPUs:
> 
> Ryzen 5000 failure rates: We reality-check the claims
> One system vendor reported that Ryzen 5000 chips are failing at a high rate. Our sources and others suggest the problem is isolated.
> www.pcworld.com
> 
> On a personal note:
> My 5950X is performing perfectly. Bought December 2020 from Amazon using HotStock app.


In case the post goes Houdini.
"PCWorld reached out to AMD, and officials told us this was an isolated incident."
“AMD is looking into a claim by a custom PC builder regarding higher-than-expected failure rates they are experiencing with Ryzen 5000 series desktop processors,” a spokesman said. “We are unaware of any similar issues at this time.” 

I guess everyone can sleep tight tonight. It was just an isolated incident and our CPUs are perfectly stable. /s


----------



## mtavel

Deepcuts said:


> "officials told us this was an isolated incident."


Isolated to Earth at the current time.


----------



## GamBoTron

mwwl said:


> New one is 2102PGS


Where is the actual batch number displayed? I can't seem to find it on my 5950X.


Btw, I've had my system for almost a week now.

Ran a couple of benchmarks, enabled XMP, gamed for several hours: no issues whatsoever. The system runs like butter, at least at stock settings with XMP enabled.

The only issue I found is that my 3080 ROG Strix has a slight coil whine, but yeah, that's another story.

Gonna start messing with curve optimiser tomorrow and also see if I can tighten RAM timings further etc. I guess that's where the real test comes into play.


----------



## Anthos

GamBoTron said:


> Where is the actual batch number displayed? I can't seem to find it on my 5950X.
> 
> 
> Btw, I've had my system for almost a week now.
> 
> Ran a couple of benchmarks, enabled XMP, gamed for several hours: no issues whatsoever. The system runs like butter, at least at stock settings with XMP enabled.
> 
> The only issue I found is that my 3080 ROG Strix has a slight coil whine, but yeah, that's another story.
> 
> Gonna start messing with curve optimiser tomorrow and also see if I can tighten RAM timings further etc. I guess that's where the real test comes into play.


on the cpu


----------



## GamBoTron

Anthos said:


> on the cpu


Makes sense, lol, I'm an idiot. I was looking at the box.


----------



## ghiga_andrei

jvidia said:


> Is there any way to "trigger" this idle reboot problem, or must we just use the PC and wait for it?


I'm having good results rebooting my 5900X using an image filtering algorithm I have in Python, built on the numpy and opencv libraries, which use AVX.
My script does all sorts of things and the load varies a lot, so it is very different from what a benchmark does.

I will try to adjust it and compile it tomorrow and share it here, so you can see if it triggers reboots for you too. Right now it's 1am here.


----------



## jvidia

ghiga_andrei said:


> I'm having good results rebooting my 5900X using an image filtering algorithm I have in Python, built on the numpy and opencv libraries, which use AVX.
> My script does all sorts of things and the load varies a lot, so it is very different from what a benchmark does.
> 
> I will try to adjust it and compile it tomorrow and share it here, so you can see if it triggers reboots for you too. Right now it's 1am here.


Yes please.
Thank you!

Those reboots are made by high or low load?


----------



## ghiga_andrei

jvidia said:


> Yes please.
> Thank you!
> 
> Those reboots are made by high or low load?


Light load after a high load. It's a combination of temperature, boost and the programmed V-F curve. Even with curve optimizer set to 0 there is still a V-F curve. My guess is that there is a small temperature window in each core where going from a high load to a low load makes the CPU think it can sustain a higher boost than it actually can.
Again, that is my guess; I cannot know for sure. But what I am certain of, based on my experiments, is that the reboot happens at a low load, but only immediately after a high load or maybe even a multi-core load. And of course, it depends on the instructions executed. I tried emulating lots of load variations in Python, and only numpy vector multiplications, which surely use AVX or AVX2, can crash it.

I am very sure Chrome also uses AVX, and depending on each page's content it can cause a variable load. But it's a gamble.
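
The load shape described above (a heavy vectorised burst followed immediately by near idle) can be sketched roughly as follows. This is not the image filtering script, only an illustration under assumptions: the function names and all durations are arbitrary, and whether the multiplication actually runs AVX code depends on how the installed NumPy was built. If NumPy is missing, the sketch falls back to a plain Python loop, which keeps it runnable but will not exercise AVX.

```python
import time

try:
    import numpy as np  # matrix multiply goes through NumPy's optimised kernels
except ImportError:     # fallback keeps the sketch runnable without NumPy
    np = None


def burst(seconds: float, size: int = 512) -> int:
    """Multiply matrices for roughly `seconds`; return the iterations done."""
    if np is not None:
        a = np.random.rand(size, size)
        b = np.random.rand(size, size)
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        if np is not None:
            a @ b  # result discarded; only the load matters
        else:
            sum(i * i for i in range(10_000))
        iterations += 1
    return iterations


def heavy_then_light(cycles: int, heavy_s: float, light_s: float) -> int:
    """Alternate a heavy burst with a near-idle pause, `cycles` times."""
    total = 0
    for _ in range(cycles):
        total += burst(heavy_s)  # high load
        time.sleep(light_s)      # drop straight back to almost idle
    return total


# Short demo values; a real attempt would use far more cycles and longer phases.
print(heavy_then_light(cycles=2, heavy_s=0.2, light_s=0.2), "iterations")
```

Varying `heavy_s` and `light_s` between cycles would get closer to the "load varies a lot" behaviour described above. There is no guarantee this triggers anything on a given CPU.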


----------



## jvidia

ghiga_andrei said:


> Light load after a high load. It's a combination of temperature, boost and the programmed V-F curve. Even with curve optimizer set to 0 there is still a V-F curve. My guess is that there is a small temperature window in each core where going from a high load to a low load makes the CPU think it can sustain a higher boost than it actually can.
> Again, that is my guess; I cannot know for sure. But what I am certain of, based on my experiments, is that the reboot happens at a low load, but only immediately after a high load or maybe even a multi-core load. And of course, it depends on the instructions executed. I tried emulating lots of load variations in Python, and only numpy vector multiplications, which surely use AVX or AVX2, can crash it.
> 
> I am very sure Chrome also uses AVX, and depending on each page's content it can cause a variable load. But it's a gamble.


Bring it on mate


----------



## folklore11

Now more on this from Hardware Unboxed: A poll...



https://www.youtube.com/post/Ugw8xfuKoIqgHZsG8MF4AaABCQ


----------



## ghiga_andrei

folklore11 said:


> Now more on this from Hardware Unboxed: A poll...
> 
> 
> 
> https://www.youtube.com/post/Ugw8xfuKoIqgHZsG8MF4AaABCQ


Actually Hardware Unboxed, not GN.


----------



## jvidia

folklore11 said:


> Now more on this from Hardware Unboxed: A poll...
> 
> 
> 
> https://www.youtube.com/post/Ugw8xfuKoIqgHZsG8MF4AaABCQ


That poll is for DOA CPUs, not the idle reboots and WHEA errors.


----------



## Deepcuts

It is not only idle reboots.
My 1st one crashed at load also, during Windows install or Linux Live USB boot.
A bit disingenuous on Hardware Unboxed's part to only consider 100% DoA a problem, imho.


----------



## 1devomer

Well, well, someone at AMD lit the PR beacon looking for help, as it always does.

Furthermore, I'm still waiting for reviewers to look into the dual-CCD 5600X/5800X parts, where Ryzen Master does not work.

But I'm not surprised anymore when I notice that nobody takes a look at those issues.
At the end of the day, these reviewers belong to the same kind of people that gave Cyberpunk a 10/10!!


----------



## mongoled

Deepcuts said:


> It is not only idle reboots.
> My 1st one crashed at load also, during Windows install or Linux Live USB boot.
> A bit disingenuous on Hardware Unboxed's part to only consider 100% DoA a problem, imho.


The same could be said for your comment (a bit disingenuous)!

Did you read through the comments section to see how many people did not have any issues?

This in no way means that the people having issues in this thread are not having issues; it just means that the issue is not as widespread as many shills are hoping or are making it out to be.

Speaking of shills, there is one quoted below...



1devomer said:


> Well, well, someone at AMD lit the PR beacon looking for help, as it always does.
> 
> Furthermore, I'm still waiting for reviewers to look into the dual-CCD 5600X/5800X parts, where Ryzen Master does not work.
> 
> But I'm not surprised anymore when I notice that nobody takes a look at those issues.
> At the end of the day, these reviewers belong to the same kind of people that gave Cyberpunk a 10/10!!


Apologies if you are not a shill, but seeing that you registered just to jump on the bandwagon, I will follow my gut instinct unless you can prove otherwise (sorry, that's a bit unfair, as there is no way for you to prove that!).

The Ryzen Master "issue" is nothing compared to people who have received CPUs that cannot run within spec without producing some hard errors.

I am sure that a Ryzen Master update will correct this (my first 5600x was a dual CCD CPU so I know exactly how the bug occurs).

But some of you peeps coming out here to scream that you should not buy AMD or their stock because of this, that is absolutely fe ck ing re d i cu lou s

Shilling at its finest


----------



## ghiga_andrei

Excluding the 76% who voted but did not buy a CPU, we have 21% "works" and 2% DoA (whatever that means; I guess people voted this for any issue with the CPU). There are 75k votes, so that 2% means 1,500 bad CPUs versus 15,750 good CPUs. I would say that is an alarming defect rate, but it depends how you want to read it, I guess.
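The arithmetic behind those numbers, as a quick Python sketch. The function name is just for illustration, and the inputs are the rounded poll percentages quoted above, so the result is only as accurate as those.

```python
def owner_failure_share(total_votes, works_pct, doa_pct):
    """Split poll votes into working/bad CPUs and return the bad share among owners."""
    works = total_votes * works_pct / 100
    doa = total_votes * doa_pct / 100
    share = doa / (works + doa)  # bad CPUs as a fraction of all CPUs actually bought
    return works, doa, share


works, doa, share = owner_failure_share(total_votes=75_000, works_pct=21, doa_pct=2)
print(f"working: {works:.0f}, bad: {doa:.0f}, bad share among owners: {share:.1%}")
```

Among the people who actually bought a CPU, the bad share comes out at roughly 8.7%. Dividing bad by good instead (1,500 / 15,750) gives about 9.5%, which is a ratio rather than a failure rate.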


----------



## Spectre73

ghiga_andrei said:


> Excluding the 76% who voted but did not buy a CPU, we have 21% "works" and 2% DoA (whatever that means; I guess people voted this for any issue with the CPU). There are 75k votes, so that 2% means 1,500 bad CPUs versus 15,750 good CPUs. I would say that is an alarming defect rate, but it depends how you want to read it, I guess.


Look here:

https://www.reddit.com/r/Amd/comments/lmspdd

Apparently, 2% is nothing out of the ordinary. Even Intel are around 1%.


----------



## azomiss

Hi there,

Another one bites the dust. 

System specs

CPU - Ryzen 9 5900x (PN: 100-000000061)
Motherboard - Gigabyte B550 Aorus Master (Rev.1.0, Bios: F13a)
Ram - 2*16GB [email protected] CL16 Crucial (PN: BL16G36C16U4R.M8FB1)
Cooler - Noctua NH-D15 Chromax.Black
GPU - Sapphire AMD Radeon RX 5600 XT
PSU - Corsair 550 Gold

Windows 10 PRO 64-bit (Build 19042)

Fresh new install, latest drivers. latest bios.

Bios settings:

activated XMP profile for [email protected]
activated virtualization
everything else is on default/auto

Used for almost 2 weeks now, last WHEA was 2 days ago:

*A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 25*

The issue manifests itself when idle.

Tested with:

cinebench r20 - 8509
cinebench r23 - 21611
aida64 - all benchmarks
memtest86 v9 pro 

- found no issues.

Using this computer for remote work + VM's and casual gaming.

P.S.

Just jumped from an i7-6700 / Asus Z170 Deluxe / 2*8GB [email protected] to this rig, so I do not have much experience with AMD.
All Intel starting from my first [email protected] until now.


It seems that I need some help from you guys. I have gone through almost all the pages of this thread.


----------



## ghiga_andrei

azomiss said:


> Hi there,
> 
> Another one bites the dust.
> 
> System specs
> 
> CPU - Ryzen 9 5900x (PN: 100-000000061)
> Motherboard - Gigabyte B550 Aorus Master (Rev.1.0, Bios: F13a)
> Ram - 2*16GB [email protected] CL16 Crucial (PN: BL16G36C16U4R.M8FB1)
> Cooler - Noctua NH-D15 Chromax.Black
> GPU - Sapphire AMD Radeon RX 5600 XT
> PSU - Corsair 550 Gold
> 
> Windows 10 PRO 64-bit (Build 19042)
> 
> Fresh new install, latest drivers. latest bios.
> 
> Bios settings:
> 
> activated XMP profile for [email protected]
> activated virtualization
> everything else is on default/auto
> 
> Used for almost 2 weeks now, last WHEA was 2 days ago:
> 
> *A fatal hardware error has occurred.
> 
> Reported by component: Processor Core
> Error Source: Machine Check Exception
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 25*
> 
> The issue manifests itself when idle.
> 
> Tested with:
> 
> cinebench r20 - 8509
> cinebench r23 - 21611
> aida64 - all benchmarks
> memtest86 v9 pro
> 
> - found no issues.
> 
> Using this computer for remote work + VM's and casual gaming.
> 
> P.S.
> 
> Just jumped from an i7-6700 / Asus Z170 Deluxe / 2*8GB [email protected] to this rig, so I do not have much experience with AMD.
> All Intel starting from my first [email protected] until now.
> 
> 
> It seems that I need some help from you guys. I have gone through almost all the pages of this thread.


The only help you need is to return your cpu and get another. You're from RO so just return it to the store if you're in the 14 days return period.


----------



## azomiss

ghiga_andrei said:


> The only help you need is to return your cpu and get another. You're from RO so just return it to the store if you're in the 14 days return period.


Hi Andrei,

Thank you for the prompt reply.

I wanted to double check with you guys because I have a BIOS version that ends in 'a'. Does that mean it is an alpha version?

Anyway, just noticed that a new version has been added on Gigabyte site today, F13c.

Will do a bios update, test the system and get back to you.

Regards.


----------



## ghiga_andrei

azomiss said:


> Hi Andrei,
> 
> Thank you for the prompt reply.
> 
> I wanted to double check with you guys because I have a BIOS version that ends in 'a'. Does that mean it is an alpha version?
> 
> Anyway, just noticed that a new version has been added on Gigabyte site today, F13c.
> 
> Will do a bios update, test the system and get back to you.
> 
> Regards.


It's the same AGESA version, you won't get much difference.

Try to test stability like this: boot into Windows, open Chrome with some tabs like Facebook, this thread, IMDb, and the Epic Games front page; open Cinebench R20 and run the MC test a few times; then close Chrome, reopen Chrome a bunch of times, and then browse normally.

Test as much as you want but don't wait for the 14 day return period to expire. You can make the return claim to the store now and legally you have another 14 days after you make the claim to actually return the product. If you decide you don't want to return it anymore, don't return it. They won't mind.


----------



## GamBoTron

They talk about it around 17:03, not really giving a lot of info based on overall data, only from an Australian retailer. Minimal amount of time spent on the issue.


----------



## folklore11

GamBoTron said:


> They talk about it around 17:03, not really giving a lot of info based on overall data, only from an Australian retailer


You beat me to it ...LoL


----------



## BFGHDF

Hi, my case is a bit different from yours.
I bought a 5900X and it was an amazing silicon chip. Everything worked like a charm: DOCP RAM at 3600 MHz, Ryzen boosting to 4700 by itself, games working perfectly.
But after a month of use, the computer started to reboot, and the reboots became more frequent until the computer stopped displaying the BIOS. I still have my old computer, so I tested every piece of hardware except the motherboard and CPU (because my old rig was Intel). I RMA'd the CPU with my local store; they confirmed it was dead and replaced it with a new one.
Guess what? The new CPU is a piece of garbage. I can't use DOCP, and I have to turn off PBO and CPB to be able to boot Windows. There is no other way to be stable.
I believe the issue is with the Infinity Fabric; that specific part burns out, it's not well built.
I'm going to RMA directly with AMD this time; at least everyone says the CPUs from AMD's store work well.

First and last time I buy AMD stuff.


----------



## Dword

Spectre73 said:


> Look here:
> 
> https://www.reddit.com/r/Amd/comments/lmspdd
> 
> Apparently, 2% is nothing out of the ordinary. Even Intel are around 1%.


It's not 2%. "there are 75k votes, so that 2% means 1500 bad cpus versus 15750 good cpus..." means a 10% CPU FAILURE rate:
15000 good and 1500 bad!!! 10%!


----------



## Spectre73

Dword said:


> It's not 2%. "there are 75k votes, so that 2% means 1500 bad cpus versus 15750 good cpus..." means a 10% CPU FAILURE rate:
> 15000 good and 1500 bad!!! 10%!


I was referring to this part of the article:



> A third vendor provided even more information. The company said it isn’t seeing PowerGPU’s reported failure rates with its own systems. Interestingly, however, the vendor actually shared data indicating that Ryzen parts are failing the company’s internal quality screening at a higher rate compared to Intel chips—almost three times as high:
> 
> 
> Ryzen 5000 series fails at 2.9 percent.
> Ryzen 3000 series fails at 3 percent.
> ThreadRipper 3000 series fails at 2.5 percent.
> For comparison, the company's data on Intel chips:
> 
> 
> Intel 9th-gen fails at 0.9 percent.
> Intel 10th-gen fails at 1.2 percent.


But it is 3%, not 2%, sorry.


----------



## Redwoodz

ghiga_andrei said:


> It's the same AGESA version, you won't get much difference.
> 
> Try to test stability like this: boot into windows, open chrome and some tabs like facebook, this one, imdb, epic games front page, open Cinebench R20 run MC test a few times and then close Chrome, open Chrome a bunch of times and then browse normally.
> 
> Test as much as you want but don't wait for the 14 day return period to expire. You can make the return claim to the store now and legally you have another 14 days after you make the claim to actually return the product. If you decide you don't want to return it anymore, don't return it. They won't mind.


This is pure BS. Now we know how Intel is trying to inflate return rates. Losers. A guy gets an error and your response is to return the CPU. Get real, man.


----------



## Anthos

Redwoodz said:


> This is pure BS. Now we know how Intel is trying to inflate return rates. Losers. A guy gets an error and your response is to return the CPU. Get real, man.


Yeah, I am pretty sure the guy is an undercover Intel employee. Dude, snap out of it. Personally, I am against the idea of "OMG, you got a sudden reboot, so immediately return the CPU", and I advocate troubleshooting first, buuuut there's obviously enough evidence that it does happen a lot. And Andrei didn't even tell the guy to return it straight away, but to try to consistently reproduce the error first while keeping his options open. And yeah, if you have everything at stock settings, you keep getting WHEA 18 errors, and you are within your return window, then returning it straight to the shop is the much easier thing to do, considering you can get your money back too. If the window passes, you are committed to this CPU forever (obviously you'll end up with a working one at some point if you persevere, but some people just prefer to switch to Intel and end this).


----------



## ghiga_andrei

I am even working on a benchmarking tool to consistently reproduce the issue, but I still need a few more days to have it ready.

But yeah, I'm an intel employee trying hard to make you return your good cpus that constantly crash your system with other good cpus that work perfectly once exchanged.


----------



## BFGHDF

It's like flat-earthers, but for CPUs 😀


----------



## tdimarzio

Just wanted to add my personal experience. I purchased my 5950X in November and it had been running WHEA-free for months, with PBO and RAM overclocking, all of which had been tested extensively. Suddenly, a little over a week ago, I started getting random reboots, and the event was the dreaded WHEA event 18 "Cache Hierarchy Error". So I searched and promptly found this thread. Over a few days, I read the entire thread, all 60 pages and 12,000 posts, from beginning to end.

I tried multiple BIOS versions, BIOS resets, a CMOS reset (on the thought that this is more thorough than an in-BIOS settings reset), VDDP/VDDG adjustments, everything. I was still getting WHEA 18 even with 100% BIOS defaults. So, based on everything I had read, I opened an RMA with AMD. That RMA is now approved and I have the shipping label in hand.

However, I will not be using it, as I have found the problem in the meantime. There is nothing wrong with my 5950X. In fact, I have restored my aggressive BIOS settings with PBO and RAM overclocking, and have been WHEA-free and 100% stable for 3 days now, including extended periods of idle time. In the end, my issue was the result of a combination of the latest RX 6800/6900 Adrenalin drivers and HWiNFO + GPU-Z, which I always have running in the background. The issue is well documented here and here, but the short version is: I updated HWiNFO to the latest beta and the WHEAs went bye-bye.

This is not to diminish any of the other folks who are suffering a true hardware defect. I'm sure many of those are valid. However, in the interest of balance, and perhaps to save someone else from an unnecessary and time-consuming RMA process, I wanted to post this. Hope this helps someone.


----------



## BFGHDF

tdimarzio said:


> Just wanted to add my personal experience. I purchased my 5950x in November and it had been running WHEA free for months, with PBO and RAM overclocking, which had all been tested extensively. Suddenly, a little over a week ago, I started getting random reboots, and the event was the dreaded WHEA event 18 "Cache Hierarchy Error". So, I searched and found this thread promptly. Over a few days, I read the entire thread - all 60 pages and 12,000 posts, from beginning to end. I tried multiple BIOS versions, BIOS resets, CMOS reset (on the thought this is more thorough than an in-BIOS settings reset), VDDP/VDDG adjustments, everything. I was still getting WHEA 18 even with 100% BIOS defaults. So, based on everything I had read, I opened an RMA with AMD. That RMA is now approved and I have the shipping label in-hand. However, I will not be using it, as I have found the problem in the meantime. There is nothing wrong with my 5950x. In fact, I have restored my aggressive BIOS settings w/ PBO and RAM overclocking, and have been WHEA-free, 100% solid stable for 3 days now. This is with extended periods of idle time as well. In the end, my issue was the result of a combination of the latest RX 6800/6900 adrenaline drivers and hwinfo + GPU-Z, which I always have running in the background. The issue is well documented here and here, but the short version is - I updated my hwinfo to the latest beta and WHEAs go bye-bye. This is not to diminish any of the other folks who are suffering a true hardware defect. I'm sure many of those are valid. However, in the interest of balance and perhaps saving someone else from an unnecessary and time-consuming RMA process, I wanted to post this. Hope this helps someone.


I am happy you solved it. In my case I don't have HWiNFO installed. I've decided to go back to Intel; I will ask my local vendor for a refund and go full Intel.


----------



## 1devomer

Which, by the way, cast shade on the HWiNFO maintainer, because a lot of users jumped on the bandwagon attacking HWiNFO, saying it was the cause of the WHEA errors.
In reality, the HWiNFO developers have been working tirelessly with the community for years now to provide decent and stable monitoring across many platforms.
Once again, problematic AMD hardware and/or software was the cause, mainly because AMD does not share much with developers; Ryzen monitoring on Linux still has issues to this day.

And by the way, since we are citing Reddit, AMD has gone full Apple mode regarding the USB issues:
_"AMD is aware of reports *that a small number of users* are experiencing intermittent USB connectivity issues reported on 500 Series chipsets."_

AMD also seems to have fixed the Ryzen Master issue with dual-CCD 5600X/5800X CPUs.


Sadly, these days, just because something is good on paper does not mean it will be good once it is in your hands.
That's a lot of issues all together, if you ask me!


----------



## ghiga_andrei

tdimarzio said:


> Just wanted to add my personal experience. I purchased my 5950x in November and it had been running WHEA free for months, with PBO and RAM overclocking, which had all been tested extensively. Suddenly, a little over a week ago, I started getting random reboots, and the event was the dreaded WHEA event 18 "Cache Hierarchy Error". So, I searched and found this thread promptly. Over a few days, I read the entire thread - all 60 pages and 12,000 posts, from beginning to end. I tried multiple BIOS versions, BIOS resets, CMOS reset (on the thought this is more thorough than an in-BIOS settings reset), VDDP/VDDG adjustments, everything. I was still getting WHEA 18 even with 100% BIOS defaults. So, based on everything I had read, I opened an RMA with AMD. That RMA is now approved and I have the shipping label in-hand. However, I will not be using it, as I have found the problem in the meantime. There is nothing wrong with my 5950x. In fact, I have restored my aggressive BIOS settings w/ PBO and RAM overclocking, and have been WHEA-free, 100% solid stable for 3 days now. This is with extended periods of idle time as well. In the end, my issue was the result of a combination of the latest RX 6800/6900 adrenaline drivers and hwinfo + GPU-Z, which I always have running in the background. The issue is well documented here and here, but the short version is - I updated my hwinfo to the latest beta and WHEAs go bye-bye. This is not to diminish any of the other folks who are suffering a true hardware defect. I'm sure many of those are valid. However, in the interest of balance and perhaps saving someone else from an unnecessary and time-consuming RMA process, I wanted to post this. Hope this helps someone.


It could be that the new drivers just use a combination of instructions that triggers the behavior in the CPU. No one can tell for sure, except AMD.

But I've read those threads, and something interesting I noticed is that they say every single reboot was reported as WHEA 18 with different APIC numbers.
This means that each time a different CPU core caused the fault. So maybe this is a good indication that it's not the CPU.

In my case, and that of other members in this thread, we always get the same APIC (thread) number in the WHEA errors. That points to a specific bad core.

So maybe good advice for everyone is to check whether the APIC number in the WHEA events is always the same or not (or at least limited to the same 1-2 cores).
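That check can be sketched in a few lines of Python. This is only an illustration: the function names and the `max_cores=2` threshold (taken from the "same 1-2 cores" wording) are made up here, and the mapping assumes SMT is enabled so that APIC IDs 2n and 2n+1 are the two threads of one physical core.

```python
from collections import Counter


def cores_from_apic_ids(apic_ids, threads_per_core=2):
    """Count WHEA events per physical core.

    Assumption: with SMT enabled, APIC IDs 2n and 2n+1 are the two
    threads of the same core, so integer division groups them.
    """
    return Counter(apic_id // threads_per_core for apic_id in apic_ids)


def points_to_specific_core(apic_ids, max_cores=2):
    """True if all logged errors fall on at most `max_cores` cores."""
    if not apic_ids:
        return False
    return len(cores_from_apic_ids(apic_ids)) <= max_cores


# Hypothetical logs, not taken from any real system.
same_core = [25, 25, 24, 25]
spread_out = [0, 1, 4, 5, 8, 9, 12, 13]
print(points_to_specific_core(same_core))
print(points_to_specific_core(spread_out))
```

A log where every event lands on one or two cores suggests a bad core; a log spread over many cores suggests looking at drivers or software first.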


----------



## danny9428

Hi, I just finished my 5950X build a few weeks ago but got kind of frustrated with the random reboots and 'CPU Over Temperature Error' prompts from my motherboard, so I figured I might as well write something about it here...


My build :

Ryzen 9 5950X
Asus Crosshair VIII Dark Hero (BIOS 3204)
G.Skill Trident Z Royale 4x32GB 3600 18-22-22-42 kit
Western Digital SN850 1TB NVMe PCIe 4.0 x4 SSD (+ Intel 750 SSD, 2x 6TB HDDs from old build)
AMD Radeon 6900 XT (Reference)
Corsair AX-1000
IceGiant ProSiphon Elite CPU air cooler
Lian-li PC-O11D XL with 9x Arctic P12 PWM-PST fans

The IceGiant cooler came with Thermal Grizzly Kryonaut paste, which I believe is better than most non-conductive compounds on the market?
(Well, it should at least beat the MX-4 lying on my shelf...)


The CPU is not stable at all whenever PBO is enabled (with or without F-max or the board's Core Performance Boost enabled).
It straight up spits 0x124 across my event log, with occasional WHEA ID 18 entries flanked by maybe one or two corrected ID 19 ones.
Even when it lets me run benchmarks or the CPU-Z bench, the PBO clocks don't match the performance, and the single-core score suggests my core clocks are heavily stretched.
My CPU-Z single-core score is only ~630-640, which is literally what my manual OC at 4.6 GHz also scores.
In Cinebench R23 the PBO multi-core score is about 24k, which I believe is on par with a stock 5950X.
Even accounting for the clock stretching at stock PBO, the cores never go above 4.95 GHz in the HWiNFO64 logs.

The funny thing is that my 5950X is only more stable when I apply a manual OC and vcore (nothing too crazy, only CCD1 4.6 and CCD2 4.45 at 1.26 V and LLC3),
yet with the manual OC it instead gives random black-screen reboots, or the motherboard screams 'CPU Over Temperature Error' while temps are only sitting at 70-80 °C.

My previous build was an X99 platform with an overclockable 10-core 6950X, which I would not have needed to upgrade if it hadn't died.
Fun fact: that 6950X died relatively slowly, going from being able to OC to 4.1 GHz, down to only running at stock 3.5 GHz, and finally, click, POST code 00.
Throughout that 9-month period (yes, that chip only lived 9 months) it kept spitting out random 0x124s or system hangs, which I could never replicate with stress tools like Prime95 or OCCT.

This 5950X gave me pretty much the flashbacks of that horrible chip


--


Small update:
While typing out this reply, I tried to verify the chip with all-default, no-OC settings
(i.e. everything on auto with clock boost disabled and the CPU at 3.4 GHz, memory 2133 / 1066 FCLK).
It still crashes with 0x124, so I'll assume RMA is my only choice.

The CPU batch number is 2047PGS (Malaysia), a fairly late chip that I thought would be free of the WHEA issues, but apparently NOPE...

As much as I loved you, Silver Sample 5950X, nah... I need a more stable 24/7 platform than this...


(This is my 2nd AMD CPU and it kind of didn't go well; my Intel 6950X also didn't go well... The names look similar, coincidence? :P)

Also, because of how scarce and scalped these chips are, mine came from Amazon instead of a local retailer. That means I will probably have to bear the shipping cost for the RMA, because by the time the other parts had arrived I was already long past Amazon's return window.


Update 2:
Images of the WHEA and BSOD logs

Small note: CPU-Z shows I'm running 4.6 GHz only because at stock or PBO with everything on auto the PC would not stay up past 10 minutes, so I had to go back to my manual OC settings just to have time to take screenshots and scroll through the logs... lolz


----------



## tdimarzio

ghiga_andrei said:


> Could be that the new drivers just use a combination of instructions that trigger the behavior in the CPU. No one can tell for sure, except AMD.
> 
> But I've read those threads, and something interesting I noticed is that they say every single reboot was reported as WHEA 18 with different APIC numbers.
> This means that each time a different CPU core caused the fault. So maybe this is a good indication that it's not the CPU.
> 
> In my case, and that of other members in this thread, we always get the same APIC (thread) number in the WHEA errors. That points to a specific bad core.
> 
> So maybe good advice for everyone is to check whether the APIC number in the WHEA events is always the same or not (or at least limited to the same 1-2 cores).


Yes, I agree that the takeaway should not be "HWiNFO crashes Zen 3". Even though HWiNFO was able to work around the problematic code path, the root cause of the issue is apparently still present, and any other application could (in theory) trigger that code path in just the right way to result in the same WHEA 18 "Cache Hierarchy Error". So, ultimately, AMD will need to fix this with either a driver update or an AGESA update. In the meantime, I think your guidance about taking note of the APIC IDs is sound. I was seeing WHEA 18 on at least 8 different APIC IDs, corresponding to 4 different cores. It's possible that if I had waited longer, I would have seen it on even more APIC IDs. The chance of that many cores being defective is very small, especially when the CPU was otherwise stable for months. So everyone should take note of the APIC IDs in their WHEA 18 events. If there are more than a couple of different IDs, it may point toward the issue I was experiencing (referenced in post # 1186), which, as far as I know, only applies if you have an RDNA2 GPU and the latest Adrenalin drivers.


----------



## azomiss

My first reply from AMD,

Provided here are some troubleshooting suggestions to help isolate the root cause(s) and resolve the problem. Make sure to check the system for stability after completing each step below:
1. Update the system BIOS to latest version available from motherboard manufacturer (refer to motherboard user manual for instructions on updating the BIOS).
2. Set the BIOS to use factory default settings / optimized default settings (refer to motherboard user manual for instructions on restoring BIOS default settings).
3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
4. Update Windows to the latest version and build via Windows Update. For instructions, refer to article.
5. Update to latest chipset driver from AMD. For instructions, refer to article.
6. In Windows Control Panel, select Power Options and choose the Balanced (recommended) power plan. In Windows Settings, select Power & sleep and set the Performance and Energy slider to the middle.
7. Disable non-Microsoft services and startup items using the System Configuration Tool. For instructions, refer to article.
8. Reseat CPU, RAM, and all PSU power connections (end-to-end for modular PSUs). For more instructions, refer the product’s user manual.
Verify RAM sticks are installed in the correct DIMM slots (for socket AM4 motherboards with 4 DIMM slots, use A2 & B2).


----------



## RemoteSpecialist

azomiss said:


> 4. Update Windows to the latest version and build via Windows Update. For instructions, refer to article.


I would change this to "Make a clean Windows 10 install from the latest available image (20H2) and install all Windows updates".


----------



## azomiss

RemoteSpecialist said:


> I would change this to "Make a clean Windows 10 install from the latest available image (20H2) and install all Windows updates".


Actually I'm on the latest version as I told them but yeah ... I only have 3 steps to test. The other ones are already tested.


----------



## danny9428

azomiss said:


> My first reply from AMD,
> 
> Provided here are some troubleshooting suggestions to help isolate the root cause(s) and resolve the problem. Make sure to check the system for stability after completing each step below:
> 1. Update the system BIOS to latest version available from motherboard manufacturer (refer to motherboard user manual for instructions on updating the BIOS).
> 2. Set the BIOS to use factory default settings / optimized default settings (refer to motherboard user manual for instructions on restoring BIOS default settings).
> 3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
> 4. Update Windows to the latest version and build via Windows Update. For instructions, refer to article.
> 5. Update to latest chipset driver from AMD. For instructions, refer to article.
> 6. In Windows Control Panel, select Power Options and choose the Balanced (recommended) power plan. In Windows Settings, select Power & sleep and set the Performance and Energy slider to the middle.
> 7. Disable non-Microsoft services and startup items using the System Configuration Tool. For instructions, refer to article.
> 8. Reseat CPU, RAM, and all PSU power connections (end-to-end for modular PSUs). For more instructions, refer the product’s user manual.
> Verify RAM sticks are installed in the correct DIMM slots (for socket AM4 motherboards with 4 DIMM slots, use A2 & B2).


I'll assume this means I should expect some weeks before finally receiving RMA approval from AMD :(


----------



## JohnnyFlash

danny9428 said:


> The funny thing is that my 5950X is only more stable when I apply a manual OC and vcore (nothing too crazy, only CCD1 4.6 and CCD2 4.45 at 1.26 V and LLC3),
> yet with the manual OC it instead gives random black-screen reboots, or the motherboard screams 'CPU Over Temperature Error' while temps are only sitting at 70-80 °C.


Wait, you were getting reboots even with manual overclock? Were sleep states enabled?


----------



## azomiss

Getting back with an update after performing all the tests for AMD.

Out of 8 steps, 7 passed and one did not.

*3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).*

After changing that setting and booting into Windows, the system rebooted in less than 3 minutes.

Is that evidence of a bad CPU or not? What do you think?


----------



## RemoteSpecialist

azomiss said:


> Is that evidence of a bad CPU or not? What do you think?


I think that 'Typical' for 'Power Supply Idle Control' should make the system *more* stable with older PSUs. It can fix so-called cold reboots, where there is not even a BSOD and the PC just reboots the same way as if you had cut the power at the PSU.

So if it crashes on the 'Typical' setting, that is more evidence of a bad CPU.


----------



## ghiga_andrei

azomiss said:


> Getting back with an update after performing all the tests for AMD.
> 
> Out of 8 steps, 7 passed and one did not.
> 
> *3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).*
> 
> After changing that setting and booting into Windows, the system rebooted in less than 3 minutes.
> 
> Is that evidence of a bad CPU or not? What do you think?


Typical Idle Current just means that the CPU keeps an active core at all times even if there is no load on it just to make sure the power supply does not think your computer is sleeping due to too low power consumption. Low Idle Current (Default I think) means that all cores in the CPU can sleep at once with low voltages.

It did not make any difference on my bad 5900X; I already tried it. They mention it because the Ryzen 1xxx or 2xxx series had a bug where this setting caused them to deep sleep forever until a hard reset.

For more info, I took these screenshots while trying the settings:

Typical Idle Current - You can see that even though all cores are asleep, 1 core actually stays at 0.98V and the CPU idle power consumption is 7.3W and EDC is 3% = 6A.

Low Idle Current - All voltages are under 0.54V and CPU power consumption is just 3W and EDC is 1% = 2A.









Since the SOC either way draws 18W at all times I don't see how the extra 4W of core CPU power would make any difference for the power supply. But the current draw is indeed triple.
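The comparison above can be checked with a few lines of arithmetic. This is only a sketch using the idle readings quoted in this post; it assumes the 18W SOC figure and the core power figures simply add up, which the screenshots may or may not confirm.

```python
# Idle readings quoted in the post (assumed additive: SOC + core power).
SOC_POWER_W = 18.0
TYPICAL = {"core_power_w": 7.3, "edc_a": 6.0}  # Typical Idle Current
LOW = {"core_power_w": 3.0, "edc_a": 2.0}      # Low Idle Current


def compare_idle_modes(typical, low, soc_power_w):
    """Return the extra power and current that Typical costs over Low."""
    extra_core_w = typical["core_power_w"] - low["core_power_w"]
    total_low_w = soc_power_w + low["core_power_w"]
    return {
        "extra_core_w": extra_core_w,
        "total_increase_pct": 100.0 * extra_core_w / total_low_w,
        "current_ratio": typical["edc_a"] / low["edc_a"],
    }


result = compare_idle_modes(TYPICAL, LOW, SOC_POWER_W)
print(f"Extra core power: {result['extra_core_w']:.1f} W")
print(f"Increase over Low idle total: {result['total_increase_pct']:.1f} %")
print(f"EDC current ratio: {result['current_ratio']:.1f}x")
```

With these numbers the power delta is small next to the constant SOC draw, while the current ratio is 3x, which is the point being made.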


----------



## ghiga_andrei

Also azomiss, go into Event Viewer -> Windows Logs -> System and on the right use Filter Current Log, select Event level only Error and Event sources only WHEA-Logger.
Then look into all occurrences of the WHEA errors and tell us if the APIC IDs reported are always the same or they vary.
If it's always the same it means that only 1 CPU core causes the errors.
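If you copy or export the filtered events as text, tallying the APIC IDs by hand gets tedious. Here is a minimal Python sketch of the counting step. It assumes each WHEA-Logger message contains a line of the form `Processor APIC ID: N`; the sample text is made up for illustration and is not from anyone's log in this thread.

```python
import re
from collections import Counter

# Assumed message format: "Processor APIC ID: <number>".
# Adjust the pattern if your exported text looks different.
APIC_PATTERN = re.compile(r"Processor APIC ID:\s*(\d+)")


def count_apic_ids(event_text: str) -> Counter:
    """Count how many times each APIC ID appears in the exported text."""
    return Counter(int(m) for m in APIC_PATTERN.findall(event_text))


def always_same_id(counts: Counter) -> bool:
    """True when every logged error points at one single APIC ID."""
    return len(counts) == 1


# Made-up sample export, for illustration only.
sample = """
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
Error Type: Bus/Interconnect Error
Processor APIC ID: 4
Error Type: Cache Hierarchy Error
Processor APIC ID: 26
"""

counts = count_apic_ids(sample)
for apic_id, hits in counts.most_common():
    print(f"APIC ID {apic_id}: {hits} event(s)")
print("Always the same ID:", always_same_id(counts))
```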

Also, I've left you a private message suggesting you set a positive Curve Optimizer offset, something like +10 all-core, and see if the reboots are gone. If they are, that is a very clear indication of a bad CPU.


----------



## RemoteSpecialist

While I am waiting for another 5950x to be available, I'm also looking at motherboards.
Currently I have an MSI B550 Carbon WiFi and I cannot say a bad word about it. But what do you think, guys? Is it enough for a 5950x, or should I consider something else?


----------



## danny9428

JohnnyFlash said:


> Wait, you were getting reboots even with manual overclock? Were sleep states enabled?


Don't think I've ever tried disabling it when in manual oc
I can see Ryzen Master would report core going sleep but HWinfo64 doesn't show any C6 states
Maybe I can try and do that, see if that would change any while I wait for AMD to reply

It's just as frustrating as my old 6950X as this kind of reboot only ever happens when my system uptime is past 24 hours lol


----------



## JohnnyFlash

danny9428 said:


> Don't think I've ever tried disabling it when in manual oc
> I can see Ryzen Master would report core going sleep but HWinfo64 doesn't show any C6 states
> Maybe I can try and do that, see if that would change any while I wait for AMD to reply
> 
> It's just as frustrating as my old 6950X as this kind of reboot only ever happens when my system uptime is past 24 hours lol


Ya, this is really surprising as most of my reading has shown that manual all core settings were fine. That's how I planned on running system, but now maybe I'll just get a 10980XE and sell my dark hero.


----------



## danny9428

JohnnyFlash said:


> Ya, this is really surprising as most of my reading has shown that manual all core settings were fine. That's how I planned on running system, but now maybe I'll just get a 10980XE and sell my dark hero.


If AMD lets me in through the RMA I might consider just go for the sTRX4 3960X instead
I should've just went HEDT but the Zen 3 just promised so much it's so tempting : P


----------



## azomiss

More details regarding PSIC setting.

3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
--- after changing this setting and booting into Windows, the system restarted in less than 5 minutes.
--- tried again and the system got stuck for 30 seconds with a black screen, and I noticed that my USB keyboard did not have power.
--- the system restarted by itself and the BIOS was automatically reset to defaults.

Also pictures with PSIC on auto and IDLE:










And with PSIC on Typical and IDLE:










I will get back with the Positive Curve Optimizer results also later today.


----------



## GRABibus

mtavel said:


> One final difference I'd like to point out between my original bad 5950x and the replacement - I consistently had a 7 to 8 degree C difference between CCD 0 and CCD 1.
> 
> The new 5950x has a 1-2 degree temperature difference between CCD's. MUCH more consistent and better matched.
> Interestingly, it was CCD 1 (the cooler running CCD) on my old 5950x that had the most WHEA causing APICs.


For my part, I've had a 5900X for a month and a half with no issues at idle or low loads.
CCX0 runs 10 degrees hotter than CCX1, and the WHEA errors I get at idle or low loads when tweaking PBO/CO (negative offset) report APIC IDs belonging to Core8 (CCX1)....

So it's difficult to draw a conclusion....


----------



## ghiga_andrei

azomiss said:


> More details regarding PSIC setting.
> 
> 3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
> --- after changing this setting and booting into Windows, the system restarted in less than 5 minutes.
> --- tried again and the system got stuck for 30 seconds with a black screen, and I noticed that my USB keyboard did not have power.
> --- the system restarted by itself and the BIOS was automatically reset to defaults.
> 
> Also pictures with PSIC on auto and IDLE:
> 
> View attachment 2479996
> 
> 
> And with PSIC on Typical and IDLE:
> 
> View attachment 2479997
> 
> 
> I will get back with the Positive Curve Optimizer results also later today.


These BIOS settings are very sticky, once you change something from Auto to a specific value and you put it back to Auto it will remain on the last specific value. The only way to really test Auto is to clear the BIOS by CMOS battery removal. Your pictures are identical from my point of view, so they were both on Typical I would bet.

Don't forget to also extract the WHEA logs like I mentioned in the previous post. According to Ryzen Master your best core is 02, so I would expect it to cause the faults since it will boost the highest. If your WHEA APIC IDs are always 2 or 3 then they relate to that core. Core 02 in Ryzen Master is actually core 1 in BIOS / Windows, and the APIC IDs relate to threads. There are 2 threads per core, so APIC IDs 2 and 3 are the threads of core 1.
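The numbering rule described above can be written down as a tiny helper. This is a sketch that follows the rule of thumb from this post only: it assumes APIC IDs are contiguous with two threads per core, which may not hold on parts with disabled cores, where the IDs can have gaps. The function names are made up for illustration.

```python
def apic_to_core(apic_id: int, threads_per_core: int = 2) -> int:
    """Zero-based core index (BIOS / Windows numbering) for an APIC ID.

    Assumes contiguous APIC IDs, two per core, as described in the post.
    """
    return apic_id // threads_per_core


def ryzen_master_label(core_index: int) -> str:
    """Ryzen Master counts cores from 01, so core 1 is shown as 'Core 02'."""
    return f"Core {core_index + 1:02d}"


for apic_id in (2, 3):
    core = apic_to_core(apic_id)
    print(f"APIC ID {apic_id} -> core {core} -> {ryzen_master_label(core)}")
```

Feeding it a whole list of logged APIC IDs and collecting the resulting core indices shows quickly whether the errors cluster on one core or are spread around.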


----------



## ghiga_andrei

GRABibus said:


> For my part, I've had a 5900X for a month and a half with no issues at idle or low loads.
> CCX0 runs 10 degrees hotter than CCX1, and the WHEA errors I get at idle or low loads when tweaking PBO/CO (negative offset) report APIC IDs belonging to Core8 (CCX1)....
> 
> So it's difficult to draw a conclusion....


I read somewhere on the AMD forum that if you have such a large temp difference between the CCDs (you say CCX but you mean CCD, CCXs are inside the CCDs) you can try to slowly unscrew one side of the cooler or the other one to see if it makes any difference. The theory would be that convex / concave metal casing of the CPU combined with concave / convex cooler base could cause it to be more tight on one CCD and more loose on the other. But it's just a theory. 

My 1st 5900x also had a 15 degree difference between CCDs (CCD1 was hotter) and my 2nd 5900x has a 5 degree difference, but this time CCD2 is the hotter one. That's with the same cooler; I tried multiple installations and the results are pretty much the same.


----------



## danny9428

ghiga_andrei said:


> I read somewhere on the AMD forum that if you have such a large temp difference between the CCDs (you say CCX but you mean CCD, CCXs are inside the CCDs) you can try to slowly unscrew one side of the cooler or the other one to see if it makes any difference. The theory would be that convex / concave metal casing of the CPU combined with concave / convex cooler base could cause it to be more tight on one CCD and more loose on the other. But it's just a theory.
> 
> My 1st 5900x also had a 15 degree difference between CCDs (CCD1 was hotter) and my 2nd 5900x has a 5 degree difference, but this time CCD2 is the hotter one. That's with the same cooler; I tried multiple installations and the results are pretty much the same.


My 5950X has CCD0 running about 13~15 degrees hotter than CCD1
Haven't really checked how the IceGiant cooler applies pressure to the chip though, so I can't tell if the loose-screw offsetting would help much
I don't feel like doing that on a 2kg weight air cooler though


----------



## danny9428

Ok so I just got a reply from AMD asking me to provide info on my CPU purchase and photo evidence for my RMA request
is it like you just reply to that generic support email or is it like there's a ticket website like Intel does?

It's my first time dealing with AMD regarding that so ... : P


----------



## mtavel

danny9428 said:


> Ok so I just got a reply from AMD asking me to provide info on my CPU purchase and photo evidence for my RMA request
> is it like you just reply to that generic support email or is it like there's a ticket website like Intel does?
> 
> It's my first time dealing with AMD regarding that so ... : P


I'm in the US and I just replied to their email. I never viewed/edited my ticket through any portal. It's possible they have different processes in different regions, but it definitely seemed like the agent I was corresponding with was in a very different time zone compared with me.


----------



## azomiss

ghiga_andrei said:


> These BIOS settings are very sticky, once you change something from Auto to a specific value and you put it back to Auto it will remain on the last specific value. The only way to really test Auto is to clear the BIOS by CMOS battery removal. Your pictures are identical from my point of view, so they were both on Typical I would bet.
> 
> Don't forget to also extract the WHEA logs like I mentioned in the previous post. According to Ryzen Master your best core is 02, so I would expect it to cause the faults since it will boost the highest. If your WHEA APIC IDs are always 2 or 3 then they relate to that core. Core 02 in Ryzen Master is actually core 1 in BIOS / Windows, and the APIC IDs relate to threads. There are 2 threads per core, so APIC IDs 2 and 3 are the threads of core 1.


Regarding WHEA-Logger .... 4,26,6,8,2,3,25,16


----------



## ghiga_andrei

azomiss said:


> Regarding WHEA-Logger .... 4,26,6,8,2,3,25,16


Something seems weird in your setup; it can't be that so many cores are failing.
Please try also the curve optimizer. You may not have a bad CPU after all.


----------



## GRABibus

ghiga_andrei said:


> Something seems weird in your setup; it can't be that so many cores are failing.
> Please try also the curve optimizer. You may not have a bad CPU after all.


Curve optimiser with positive offsets of course 😊


----------



## silot

I have successfully RMA'd my 5900x. I retested and I am still getting WHEA errors at stock settings, so the problem must be my GPU.


----------



## GRABibus

silot said:


> I have successfully RMA'd my 5900x. I retested and I am still getting WHEA errors at stock settings, so the problem must be my GPU.


Or a second bad CPU?
Have you tried positive offsets with CO?


----------



## azomiss

Okay, here it is with PBO positive 10. Now what?


----------



## RemoteSpecialist

silot said:


> I have successfully RMA'd my 5900x. I retested and I am still getting WHEA errors at stock settings, so the problem must be my GPU.


What’s the batch number of the new CPU?


----------



## ghiga_andrei

azomiss said:


> Okay, here it is with PBO positive 10. Now what?
> 
> View attachment 2480037


Do you still get reboots with positive 10 ?


----------



## MikeS3000

So I finally received my new 5900x under warranty from AMD. That was fun. The package was stuck in Memphis with FedEx for 9 days before it finally got to me. My last CPU had the issue where it would fail Prime95 and OCCT at stock settings on the #1 core. My first impression of the new CPU (2104 PGS batch) is that the boost clock is toned down a bit. My old 5900x would try to boost just over 4.8 GHz running Prime95 non-AVX single core on the best cores. Now, the best cores are just shy of 4.8. My gut feeling is that some of these failing CPUs were programmed to boost a bit too high from the factory and this led to failures at stock. I have lots of testing to do but so far no glaring issues.


----------



## Imraneo

silot said:


> I have successfully RMA'd my 5900x. I retested and I am still getting WHEA errors at stock settings, so the problem must be my GPU.


Turn off CPB and test again. If you do not get errors of any sort without boost, it means the issue is still your CPU.
Experts, pls chime in



MikeS3000 said:


> So I finally received my new 5900x under warranty from AMD. That was fun. The package was stuck in Memphis with FedEx for 9 days before it finally got to me. My last CPU had the issue where it would fail Prime95 and OCCT at stock settings on the #1 core. My first impression of the new CPU (2104 PGS batch) is that the boost clock is toned down a bit. My old 5900x would try to boost just over 4.8 GHz running Prime95 non-AVX single core on the best cores. Now, the best cores are just shy of 4.8. My gut feeling is that some of these failing CPUs were programmed to boost a bit too high from the factory and this led to failures at stock. I have lots of testing to do but so far no glaring issues.


I'd agree that the "new/fixed" CPUs are toned down. My old crappy CPU could easily go 4.5Ghz on all cores with PBO set to auto. My new CPU would only hit 4Ghz. I'd have to configure the PBO and tweak some CO settings and the best it does is 4.4Ghz on multi-core. On the flip side, this 4.4Ghz performs significantly better in my CB23 benchmark (22000 vs 18000), so I ain't complaining here!


----------



## danny9428

So after I wrote a long essay about my CPU and troubleshooting to AMD I decided to rerun all my diagnosis with different bios settings

Funny enough, as soon as I reset everything to optimized default the cpu appeared to be waaaay more stable than I used to deal with
now it can actually keep on running some stress test and benchmarks instead of sending me back to post screen before getting into windows

Sadly, as soon as I go into Prime95 blend, 20 minutes in it still crashes to a black screen + 00 debug hardlock (where power and reset would not work in such a crash)
Now since this took close to half an hour to get a crash and that no WHEA and bugcheck is ever generated it would take way longer for me to even run through all the combinations suggested....

will see what I can do....


----------



## danny9428

danny9428 said:


> So after I wrote a long essay about my CPU and troubleshooting to AMD I decided to rerun all my diagnosis with different bios settings
> 
> Funny enough, as soon as I reset everything to optimized default the cpu appeared to be waaaay more stable than I used to deal with
> now it can actually keep on running some stress test and benchmarks instead of sending me back to post screen before getting into windows
> 
> Sadly, as soon as I go into Prime95 blend, 20 minutes in it still crashes to a black screen + 00 debug hardlock (where power and reset would not work in such a crash)
> Now since this took close to half an hour to get a crash and that no WHEA and bugcheck is ever generated it would take way longer for me to even run through all the combinations suggested....
> 
> will see what I can do....


Okay it turns out all it needs for my cpu to go stable is a positive curve +5 to the core that might be tripping WHEA or 0x124 
(I haven't quite figured which particular core needs that positive adjust yet, since the crashes did not log the failing core or thread)
It happens that my Asus Crosshair VIII Dark Hero was doing funky things with my previous settings that it's mixing them up creating instability before I hit 'Load Optimized Defaults' the second time yesterday, and this may be the root cause my chip just refused to go stable with PBO in the past weeks.

I tried reading previous replies on this post to find more clues and found that some other users also seems to get that funky 'old settings getting stuck' thingy with the same board.

Now I feel bad for not troubleshooting thoroughly enough before calling for an RMA, though it still appears that without tinkering with a positive curve it would still crash, just not as often or as quickly.
Though I just still cannot get those crazy scores those tech reviewers achieved in R20 or CPU-Z v16 bench (mine is still giving only about 630 single in CPU-Z)


Do you guys think I should just keep the chip or continue with the RMA anyway? I also saw someone in older posts who seemed to have a chip incapable of PBO at 0 curve but otherwise stable at +5, yet he went for the RMA anyway.


----------



## MikeS3000

danny9428 said:


> Okay it turns out all it needs for my cpu to go stable is a positive curve +5 to the core that might be tripping WHEA or 0x124
> (I haven't quite figured which particular core needs that positive adjust yet, since the crashes did not log the failing core or thread)
> It happens that my Asus Crosshair VIII Dark Hero was doing funky things with my previous settings that it's mixing them up creating instability before I hit 'Load Optimized Defaults' the second time yesterday, and this may be the root cause my chip just refused to go stable with PBO in the past weeks.
> 
> I tried reading previous replies on this post to find more clues and found that some other users also seems to get that funky 'old settings getting stuck' thingy with the same board.
> 
> Now I feel bad for not troubleshooting thoroughly enough before calling for an RMA, though it still appears that without tinkering with a positive curve it would still crash, just not as often or as quickly.
> Though I just still cannot get those crazy scores those tech reviewers achieved in R20 or CPU-Z v16 bench (mine is still giving only about 630 single in CPU-Z)
> 
> 
> Do you guys think I should just keep the chip or continue with the RMA anyway? I also saw someone in older posts who seemed to have a chip incapable of PBO at 0 curve but otherwise stable at +5, yet he went for the RMA anyway.


RMA it. Your CPU is out of factory spec. My RMA was approved because I needed +5 on my #1 core to pass Prime95 and OCCT single. AMD did not bat an eye and approved it immediately. Something I did was film a YouTube video where I reset the BIOS, booted Windows and replicated the failure. You can't argue with that. If we hadn't had the storm last week I would only have been without my CPU for 8 days.


----------



## ghiga_andrei

danny9428 said:


> Okay it turns out all it needs for my cpu to go stable is a positive curve +5 to the core that might be tripping WHEA or 0x124
> (I haven't quite figured which particular core needs that positive adjust yet, since the crashes did not log the failing core or thread)
> It happens that my Asus Crosshair VIII Dark Hero was doing funky things with my previous settings that it's mixing them up creating instability before I hit 'Load Optimized Defaults' the second time yesterday, and this may be the root cause my chip just refused to go stable with PBO in the past weeks.
> 
> I tried reading previous replies on this post to find more clues and found that some other users also seems to get that funky 'old settings getting stuck' thingy with the same board.
> 
> Now I feel bad for not troubleshooting thoroughly enough before calling for an RMA, though it still appears that without tinkering with a positive curve it would still crash, just not as often or as quickly.
> Though I just still cannot get those crazy scores those tech reviewers achieved in R20 or CPU-Z v16 bench (mine is still giving only about 630 single in CPU-Z)
> 
> 
> Do you guys think I should just keep the chip or continue with the RMA anyway? I also saw someone in older posts who seemed to have a chip incapable of PBO at 0 curve but otherwise stable at +5, yet he went for the RMA anyway.


Any CPU that needs positive CO to be stable is out of spec. Could also be a bad VRM on the MB, but if it's only 1 core that needs the CO then it's clear.


----------



## MikeS3000

ghiga_andrei said:


> Any CPU that needs positive CO to be stable is out of spec. Could also be a bad VRM on the MB, but if it's only 1 core that needs the CO then it's clear.


I already played the "maybe it's my motherboard" card when I ran my old 5900x on a different board and produced the same error. It's not worth blaming the MB, as this is clearly a repeatable CPU issue that affects a decent number of people.


----------



## 1devomer

MikeS3000 said:


> I already played the "maybe it's my motherboard" card when I ran my old 5900x on a different board and produced the same error. It's not worth blaming the MB, as this is clearly a repeatable CPU issue that affects a decent number of people.


I agree!
In fact, most X570/B550 motherboards are designed with good VRMs; most of them nowadays run at least 6 phases powered by DrMOS Smart Power Stages.
We are far, far away from the first X370/B350 motherboards running a mere 4 phases powered by the worst low-end MOSFETs.
So yeah, I don't think the motherboard manufacturers should take the blame, at least not for this particular issue.

In the end it's pretty simple: look at the best CPUs around, look at their core and SoC voltages, and compare them with your own CPU.
If your own CPU is far worse than what the best CPUs around can do, you most likely got a dud, a badly binned CPU.
It needs a positive curve to be stable, which means more core voltage, because one or more cores are not good enough to be in spec.

Like Intel does, AMD uses the same chiplet for all its CPUs, from the best 64-core EPYC with 8 chiplets under the hood to the 5600X _(theoretically)_ with only one chiplet under the IHS.


----------



## danny9428

ghiga_andrei said:


> Any CPU that needs positive CO to be stable is out of spec. Could also be a bad VRM on the MB, but if it's only 1 core that needs the CO then it's clear.


I specifically picked Dark Hero since it's one of the very few X570 that has the more recent 90A power stages and isn't sTRX4-tier expensive
Although it's ASUS, and ASUS hasn't quite lived up to people's expectations, especially with their recent ROG lineups, I still believe that board shouldn't be on the sloppy end.
Though I guess the BIOS behaviour kinda tells the opposite story.... hehe
If I were given a second chance to pick a board, it would probably be MSI instead, since it appears their boards are better at memory OCs (maybe the B550 Unify or X570 Ace)
My Dark Hero does not POST past 3600, even though that's more likely on my end as I'm attempting to squeeze 128GB of RAM onto that poor IMC...

One thing I can be confident about my 5950X is it really does not perform with only the PBO, even if you go full-yolo and whack +200 and scalar 10X, the single core performance is simply not there. I can reach or surpass the single core scores with PBO simply by letting CCD1 to run 4.6Ghz manually with 1.23V which is


----------



## BFGHDF

I forgot to post the picture of my Ryzen that has issues at default settings. Here it is


----------



## danny9428

BFGHDF said:


> I forgot to post the picture of my Ryzen that has issues at default settings. Here it is


That's a fairly recent batch, so AMD still isn't binning the chips well enough for them to run reliably at stock, even with 2021 chips?


----------



## LuchoU

Hi Guys,

I'm new here, I just registered so I can collaborate and share my experience in this thread. I'm from Chile, here there is stock of 5600x and 5800x CPUs, the first cost around USD $482 and 5800x cost around USD $596. As you see prices are inflated, but there IS stock, which is good, bad thing are these WHEA 18 restarts.

I have a 5800x, batch 2043PGS. Like almost everyone here I'm experiencing only WHEA 18 errors, and only under load, which for me means gaming. They are not triggered by Prime or memory benchmarks such as Karhu, only by gaming. The game I'm using to trigger them is Red Dead Redemption 2, which is CPU intensive, and I get a WHEA 18 in less than 1 hour when gaming. I have tested everything: max LLC, memory at 3200MHz (my kit is 3600), disabling PBO, etc. My MB is an Asus ROG Strix B550-F.
Yesterday I tried the suggestion to enable CO +10 for all cores and it seems stable now. At least, yesterday I played for about an hour and a half and it did not trigger any WHEA 18, but the CPU is boosting less: some cores are not boosting to 4.85GHz as before. The good cores marked in Ryzen Master are boosting to 4.85GHz and the other cores are boosting to 4.8 or 4.7 max.

Can someone please confirm whether this behaviour is expected when increasing the curve to positive values? That may be the reason why I gained stability, but I would like to confirm.
Another question: if I set CO +5 or CO +10 and both are stable, will it make a difference? As I understand it from AMD's slides, PBO is intelligent and will use just the mV that the CPU needs, so it won't overuse voltage. Is that correct?

Thank you very much for your answers and I would try to continue collaborating with findings.


----------



## ghiga_andrei

LuchoU said:


> Hi Guys,
> 
> I'm new here, I just registered so I can collaborate and share my experience in this thread. I'm from Chile, here there is stock of 5600x and 5800x CPUs, the first cost around USD $482 and 5800x cost around USD $596. As you see prices are inflated, but there IS stock, which is good, bad thing are these WHEA 18 restarts.
> 
> I have a 5800x, batch 2043PGS. Like almost everyone here I'm experiencing only WHEA 18 errors, and only under load, which for me means gaming. They are not triggered by Prime or memory benchmarks such as Karhu, only by gaming. The game I'm using to trigger them is Red Dead Redemption 2, which is CPU intensive, and I get a WHEA 18 in less than 1 hour when gaming. I have tested everything: max LLC, memory at 3200MHz (my kit is 3600), disabling PBO, etc. My MB is an Asus ROG Strix B550-F.
> Yesterday I tried the suggestion to enable CO +10 for all cores and it seems stable now. At least, yesterday I played for about an hour and a half and it did not trigger any WHEA 18, but the CPU is boosting less: some cores are not boosting to 4.85GHz as before. The good cores marked in Ryzen Master are boosting to 4.85GHz and the other cores are boosting to 4.8 or 4.7 max.
> 
> Can someone please confirm whether this behaviour is expected when increasing the curve to positive values? That may be the reason why I gained stability, but I would like to confirm.
> Another question: if I set CO +5 or CO +10 and both are stable, will it make a difference? As I understand it from AMD's slides, PBO is intelligent and will use just the mV that the CPU needs, so it won't overuse voltage. Is that correct?
> 
> Thank you very much for your answers and I would try to continue collaborating with findings.


Any amount of positive CO will supply more voltage at a given frequency. Theoretically, if you had perfect cooling, this would only mean more voltage. But since the extra voltage produces more heat, PBO will also lower the frequency a bit.

Most people use negative CO to gain performance through the reverse effect: less voltage, less heat, more boost. Positive CO will give less performance, though the loss will be within 3-5%, nothing extreme.

But your CPU will stay at higher voltages and higher temps, which is not good for the long term. CO +10 should only be used to confirm you have a bad CPU.
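To put a rough number on "more voltage", here is a small sketch. It assumes each Curve Optimizer count is worth roughly 3 to 5 mV, which is the commonly quoted ballpark and lines up with the 30-50mV figure mentioned in this thread for +10. The real step size depends on the chip and where on the curve it is running, so treat the result as an estimate only.

```python
from typing import Tuple

# Assumed approximate size of one Curve Optimizer count, in millivolts.
MV_PER_COUNT_RANGE = (3, 5)


def co_offset_mv(counts: int) -> Tuple[int, int]:
    """Estimated (min, max) voltage shift in mV for a CO setting.

    Positive counts mean more voltage, negative counts mean less.
    """
    low_step, high_step = MV_PER_COUNT_RANGE
    a, b = counts * low_step, counts * high_step
    return (min(a, b), max(a, b))


for setting in (5, 10, -30):
    lo_mv, hi_mv = co_offset_mv(setting)
    print(f"CO {setting:+d}: about {lo_mv} to {hi_mv} mV")
```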


----------



## LuchoU

ghiga_andrei said:


> Any amount of positive CO will supply more voltage at a given frequency. Theoretically, if you had perfect cooling, this would only mean more voltage. But since the extra voltage produces more heat, PBO will also lower the frequency a bit.
> 
> Most people use negative CO to gain performance through the reverse effect: less voltage, less heat, more boost. Positive CO will give less performance, though the loss will be within 3-5%, nothing extreme.
> 
> But your CPU will stay at higher voltages and higher temps, which is not good for the long term. CO +10 should only be used to confirm you have a bad CPU.


Hi ghiga_andrei, yes, for sure I have a bad CPU. That was my gut feeling when I got the first WHEA 18 while gaming at stock settings and googled "WHEA 18 AMD 5800X" about a month ago. Now I have certainty; I knew it was not RAM or MB related. It's funny because I feel "lucky", since there are people with issues even at the desktop or while booting, and mine only happen during gaming. I never thought a CPU could fail this way from the factory. Maybe I was naive, but a lot of chips have passed through my hands (RAM, GPUs, etc.) and I never had issues with things failing at stock settings. It's crazy. I mean, it's called the silicon lottery, but come on: one would expect a chip to OVERCLOCK more or less, and that's different from failing at stock settings. They have a very crappy QC checklist at AMD, that's for sure.

Now I need to continue testing and reducing CO counts until I reach instability again. That's very time consuming. Or maybe I could work per core, but that's even more time consuming. I need to weigh whether I want to RMA, given shipping times from Chile to the USA and how long I won't be able to use my PC, or live with a defective CPU that works fine with an extra 30-50mV. It will probably last me until its replacement comes, maybe in 5 years. I know electromigration can happen, but of course it's relative. My last CPU was a 6700K from 2016, so yes, I upgrade my CPU every 5 years, more or less.
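The "reduce CO counts until it breaks again" procedure is basically a step-down search. Here is a hedged sketch of the bookkeeping only. The `is_stable` callback is hypothetical: in practice each check is hours of gaming or stress testing done by hand, and since the crashes in this thread are intermittent, a single pass at one setting proves little.

```python
from typing import Callable, Optional


def lowest_stable_offset(
    is_stable: Callable[[int], bool],
    start: int = 10,
    floor: int = 0,
) -> Optional[int]:
    """Walk the all-core CO offset down from `start` towards `floor`.

    Returns the lowest offset that still passed, or None if even
    `start` failed. Stops at the first failing offset.
    """
    last_good = None
    for offset in range(start, floor - 1, -1):
        if not is_stable(offset):
            break
        last_good = offset
    return last_good


# Made-up example: pretend this chip needs at least +4 to stay stable.
def fake_chip(offset: int) -> bool:
    return offset >= 4


print("Lowest stable all-core offset:", lowest_stable_offset(fake_chip))
```

Working per core would be the same loop repeated for each core with the others held at a known-good value, which is why it takes so much longer.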

BTW I'm looking forward to your WHEA triggering tool, I read some pages back that you were working on one, it would make our "finding stability" adventure a lot easier.


----------



## ghiga_andrei

LuchoU said:


> Hi ghiga_andrei, yes, for sure I have a bad CPU. That was my gut feeling when I got the first WHEA 18 while gaming at stock settings and googled "WHEA 18 AMD 5800X" about a month ago. Now I have certainty; I knew it was not RAM or MB related. It's funny because I feel "lucky", since there are people with issues even at the desktop or while booting, and mine only happen during gaming. I never thought a CPU could fail this way from the factory. Maybe I was naive, but a lot of chips have passed through my hands (RAM, GPUs, etc.) and I never had issues with things failing at stock settings. It's crazy. I mean, it's called the silicon lottery, but come on: one would expect a chip to OVERCLOCK more or less, and that's different from failing at stock settings. They have a very crappy QC checklist at AMD, that's for sure.
> 
> Now I need to continue testing and reducing CO counts until I reach instability again. That's very time consuming. Or maybe I could work per core, but that's even more time consuming. I need to weigh whether I want to RMA, given shipping times from Chile to the USA and how long I won't be able to use my PC, or live with a defective CPU that works fine with an extra 30-50mV. It will probably last me until its replacement comes, maybe in 5 years. I know electromigration can happen, but of course it's relative. My last CPU was a 6700K from 2016, so yes, I upgrade my CPU every 5 years, more or less.
> 
> BTW I'm looking forward to your WHEA triggering tool, I read some pages back that you were working on one, it would make our "finding stability" adventure a lot easier.


I've been working on tweaking the tool the whole week but it's a pain in the ass, sometimes it reboots 3 times in 10 minutes and then no reboot for 3 hours. I still have to figure out what makes it stable for so long with the same settings. 

Also, like I mentioned in a previous post, go into Event Viewer -> Windows Logs -> System and on the right use Filter Current Log, select Event level only Error and Event sources only WHEA-Logger.
Then look into all occurrences of the WHEA errors and see the APIC IDs reported. Those are IDs linked to each core.
This may help you identify what core is causing you trouble, if it's always the same APIC ID.


----------



## Priv-Au

Hi guys, as promised in my previous messages I’d provide you with an update to my issues after going through RMA.

I received the new 5950X on Tuesday and after finally getting around to installing it I have absolutely no issues to report.
It is working flawlessly out of the box with no overclock (I don’t intend to add one).
Some cores are hitting the 5.0GHz range under stress tests absolutely fine as well.

If you are dealing with constant reboots and can pretty much rule out other components then as far as I’m concerned it’s your cpu and you need to RMA it.
RMA took about 3~4 weeks from postage to a new processor at my door.
Of course this could all go to hell and I could have issues again soon but for now things are looking good.
Best of luck with your 5900’s and 5950’s.


----------



## MikeS3000

So I've been playing with my replacement 5900x for a few days. I don't want to jinx it but I have not seen a single blue screen yet. I first tested at stock, then with no PBO and just my tuned 3800 RAM, and then with tuned RAM and PBO, dialing in curve optimizer using Prime95 non-AVX Large FFT single thread, one core at a time. This new CPU is Prime stable with CO from -14 to -30 on various cores. Much better than having to do +5 CO on my #1 core on the defective 5900x. That old CPU was easy to bluescreen on a few cores with pretty minor negative CO. My advice is if you have done the troubleshooting and everything points to a defective CPU, just RMA it.


----------



## danny9428

MikeS3000 said:


> So I've been playing with my replacement 5900x for a few days. I don't want to jinx it but I have not seen a single blue screen yet. I first tested at stock, then with no PBO and just my tuned 3800 RAM, and then with tuned RAM and PBO, dialing in curve optimizer using Prime95 non-AVX Large FFT single thread, one core at a time. This new CPU is Prime stable with CO from -14 to -30 on various cores. Much better than having to do +5 CO on my #1 core on the defective 5900x. That old CPU was easy to bluescreen on a few cores with pretty minor negative CO. My advice is if you have done the troubleshooting and everything points to a defective CPU, just RMA it.


Any chance you kept records of the default curve values and sample ranking from the CTR software? 
I'm curious whether a less capable chip generally shows up as a worse sample when diagnosed in CTR.

Mine is rated a Silver Sample, even though it still needs a positive curve on at least one weak core.
Default curve settings seem to shift quite a lot on my chip before and after previous load optimized settings attempt


Before :
Default curve coefficients
CORE#1 1 CPPC 208
CORE#2 0 CPPC 212
CORE#3 1 CPPC 185
CORE#4 0 CPPC 199
CORE#5 2 CPPC 203
CORE#6 1 CPPC 190
CORE#7 0 CPPC 194
CORE#8 2 CPPC 212
CORE#9 6 CPPC 163
CORE#10 5 CPPC 158
CORE#11 5 CPPC 181
CORE#12 7 CPPC 154
CORE#13 8 CPPC 149
CORE#14 5 CPPC 172
CORE#15 5 CPPC 176
CORE#16 5 CPPC 167

After :
Default curve coefficients
CORE#1 11 CPPC 208
CORE#2 11 CPPC 212
CORE#3 9 CPPC 185
CORE#4 9 CPPC 199
CORE#5 11 CPPC 203
CORE#6 8 CPPC 190
CORE#7 10 CPPC 194
CORE#8 12 CPPC 212
CORE#9 3 CPPC 163
CORE#10 2 CPPC 158
CORE#11 5 CPPC 181
CORE#12 1 CPPC 154
CORE#13 0 CPPC 149
CORE#14 6 CPPC 172
CORE#15 5 CPPC 176
CORE#16 4 CPPC 167


----------



## ghiga_andrei

OK, after weeks of testing and rebooting my computer 1000 times, here is the beta version 0.1 of my WHEA triggering tool:








79.32 MB file on MEGA (mega.nz)





Extract the archive and first edit the test_ryzen_stability.cfg file. Specify which core you want to test and where you have Cinebench R20 installed. Leave the delay at 100 until we settle on a better value; it sets the testing time between Cinebench runs. A delay of 10 would run Cinebench more often, a delay of 1000 much less often.

The tool uses Cinebench to heat up the CPU and generate heat gradients that accelerate the failure. It's also needed because after a Cinebench run the CPU runs faster, due to some Performance Regulator implemented by AMD to cheat at AVX benchmarks (not confirmed, just an assumption by another user on this forum). More details about my findings here:








Gigabyte X570 AORUS Owners Thread (www.overclock.net)





Between CB runs, my tool performs histograms on a large image and auto adjusts contrast for it, using the specified core.
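
To give a rough idea of what "histogram plus auto contrast on one core" means, here is a minimal sketch. It is not the tool's actual code: the real tool uses OpenCV and NumPy, while this version is plain Python so it runs anywhere, and the helper names `pin_to_core`, `histogram` and `auto_contrast` are invented here. The pinning call `os.sched_setaffinity` only exists on Linux and takes a logical CPU number, so on Windows a different mechanism would be needed.

```python
import os

def pin_to_core(logical_cpu):
    """Best effort: restrict this process to one logical CPU (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. Windows or macOS: no pinning in this sketch
    try:
        os.sched_setaffinity(0, {logical_cpu})
        return True
    except (OSError, ValueError):
        return False

def histogram(pixels):
    """Count how many pixels have each 8-bit grey value (0..255)."""
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    return counts

def auto_contrast(pixels):
    """Stretch the occupied grey range linearly so it spans 0..255."""
    if not pixels:
        return []
    counts = histogram(pixels)
    occupied = [value for value, count in enumerate(counts) if count]
    lo, hi = occupied[0], occupied[-1]
    if lo == hi:
        return list(pixels)  # flat image, nothing to stretch
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

pinned = pin_to_core(0)
print("pinned to logical CPU 0:", pinned)
print(auto_contrast([50, 100, 150]))
```

Looping this over a large image keeps a single core busy with simple arithmetic while the others stay idle, which is the light single-core load described above.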

Make sure you have other apps and background stuff like aRGB controllers and Afterburner turned off. It's important to have the other cores sleeping. 

For me, with the same BIOS settings, sometimes my tool resets the computer within 1 minute, other times it doesn't reset at all.
I recommend not running more than 5-10 minutes.
For me, the system is most unstable right after a restart, though this may depend on the MB VRMs and cooling. Probably the VRMs generate a little more ripple while they are heating up. Once everything reaches a stable temperature it doesn't break anymore. Always run the tool multiple times, each time after turning the computer off for 5-10 minutes.

*Please let me know how it works for you. Any feedback is welcome.*

P.S. The source code is included also. I would recommend running the exe file instead of the python, unless you have the latest python and opencv and numpy libraries. With older libraries it seems to behave differently.


----------



## GamBoTron

First Content creator i have seen looking into the issue actually spending some time on the topic, still very short tho


----------



## Peanuts4

Hey people, I'm not entirely sure if I'm experiencing the same thing you folks are or not, but it sounds like it. This seems to be almost random. It happened once 2 weeks ago, and now it happened 3 times in 1 day. Not sure which direction to go with it. I would love some of your input, especially if you had this issue with a 3000 series chip. I was linked to this thread by a person in my own thread, "Trying to figure out why my comp just restarted/crashed...". Feel free to give me your 2 cents. Not sure what I should do.


----------



## Deepcuts

Peanuts4 said:


> Hey people, I'm not entirely sure if I'm experiencing the same thing you folks are or not, but it sounds like it. This seems to be almost random. It happened once 2 weeks ago, and now it happened 3 times in 1 day. Not sure which direction to go with it. I would love some of your input, especially if you had this issue with a 3000 series chip. I was linked to this thread by a person in my own thread, "Trying to figure out why my comp just restarted/crashed...". Feel free to give me your 2 cents. Not sure what I should do.


If you tested everything at stock settings and still have reboots, the solution is simple: RMA/return the CPU to the shop.
People that never had a faulty CPU will never even think that a CPU can be faulty and will blame everything but the CPU. That is until they have a faulty CPU and the only solution is RMA.
Stop wasting your time debugging this.


----------



## ghiga_andrei

Peanuts4 said:


> Hey people, I'm not entirely sure if I'm experiencing the same thing you folks are or not, but it sounds like it. This seems to be almost random. It happened once 2 weeks ago, and now it happened 3 times in 1 day. Not sure which direction to go with it. I would love some of your input, especially if you had this issue with a 3000 series chip. I was linked to this thread by a person in my own thread, "Trying to figure out why my comp just restarted/crashed...". Feel free to give me your 2 cents. Not sure what I should do.


Read the messages from the last 5 pages between me and azomiss and you should have plenty of info to debug. In short, try positive Curve Optimizer values and check the WHEA logs in Windows Event Viewer to see which cores are failing.


----------



## danny9428

ghiga_andrei said:


> OK, after weeks of testing and rebooting my computer 1000 times, here is the beta version 0.1 of my WHEA triggering tool:
> 
> 
> 
> 
> 
> 
> 
> 
> 79.32 MB file on MEGA (mega.nz)
> 
> 
> 
> 
> 
> Extract the archive and first edit the test_ryzen_stability.cfg file. Specify which core you want to test and where you have Cinebench R20 installed. The delay leave it at 100 until we decide for a better value. It's the testing time between Cinebench runs. A delay of 10 would run Cinebench more often, a delay of 1000 much less.
> 
> The tool is using Cinebench to heat up and generate heat gradients to accelerate the fail. It's also needed because after a Cinebench run the CPU works in a faster way due to some Performance Regulator implemented by AMD to cheat at AVX benchmarks (not confirmed, just an assumption by some other user on this forum). More details about my finding here:
> 
> 
> 
> 
> 
> 
> 
> 
> Gigabyte X570 AORUS Owners Thread (www.overclock.net)
> 
> 
> 
> 
> 
> Between CB runs, my tool performs histograms on a large image and auto adjusts contrast for it, using the specified core.
> 
> Make sure you have other apps and background stuff like aRGB controllers and Afterburner turned off. It's important to have the other cores sleeping.
> 
> For me, with the same BIOS settings, sometimes my tool resets the computer within 1 minute, other times it doesn't reset at all.
> I recommend not running more than 5-10 minutes.
> For me, but this maybe depends on the MB VRMs and cooling, it is the most unstable after a restart. Probably when the VRMs are heating up it's generating a little more ripple. If everything reaches a stable temperature it doesn't break anymore. Always run multiple times after turning off the computer for 5-10 minutes.
> 
> *Please let me know how it works for you. Any feedback is welcome.*
> 
> P.S. The source code is included also. I would recommend running the exe file instead of the python, unless you have the latest python and opencv and numpy libraries. With older libraries it seems to behave differently.


I tried your tool today with stock RAM and no vcore or curve offsets on my 5950X.
Based on my relatively small sample of WHEA logs (fewer than 20 so far), I determined that the cores most likely to fail are Core 2 (thread IDs 4/5) and Core 15 (thread ID 30).

So far I haven't gotten a crash during the single-core runs. Instead, all my crashes so far were triggered by the R20 runs firing up all cores.
The crash is also not quite what I hoped for: instead of a bugcheck or a WHEA-Logger entry, the screen goes black and the motherboard shows debug code 00.

I'm kind of lost on what the culprit could be here. I will try other cores and see if that makes any difference.


----------



## ghiga_andrei

danny9428 said:


> I tried your tool today with stock RAM and no vcore or curve offsets on my 5950X.
> Based on my relatively small sample of WHEA logs (fewer than 20 so far), I determined that the cores most likely to fail are Core 2 (thread IDs 4/5) and Core 15 (thread ID 30).
> 
> So far I haven't gotten a crash during the single-core runs. Instead, all my crashes so far were triggered by the R20 runs firing up all cores.
> The crash is also not quite what I hoped for: instead of a bugcheck or a WHEA-Logger entry, the screen goes black and the motherboard shows debug code 00.
> 
> I'm kind of lost on what the culprit could be here. I will try other cores and see if that makes any difference.


Sounds more like VRM or Power supply problem. Did you have a 3000 series in your motherboard before 5950x and know it's ok or is everything new ? Could also be a CPU problem, sure, but it's totally different than what most of us have / had.


----------



## ghiga_andrei

GamBoTron said:


> First Content creator i have seen looking into the issue actually spending some time on the topic, still very short tho


Hmm, it's interesting that, setting aside any percentages from PC suppliers, the guy who wrote the first article at PC Mag (or whatever it was) said he had a bad 5900x which he RMAd, and now this guy in the video said they tested 5 CPUs and 1 was bad... For me it's pretty clear things could start to heat up, but it depends on whether the other, bigger tech reviewers want to invest more time investigating this or just want to let the whole thing slide... And regarding official return rates from big stores, I bet there are a lot of users who are not very tech savvy and have stability problems but have no idea what's going on, or just don't know it's not normal... Maybe they think it's normal for a PC to crash once every 3 days, it's that damn Windows to blame, it is known...


----------



## danny9428

ghiga_andrei said:


> Sounds more like VRM or Power supply problem. Did you have a 3000 series in your motherboard before 5950x and know it's ok or is everything new ? Could also be a CPU problem, sure, but it's totally different than what most of us have / had.


New board (Dark Hero), new power supply (AX1000).
This kind of crash is the same as firing up Aida64 extreme's cache and memory benchmark (especially the L3 cache one)


----------



## ghiga_andrei

danny9428 said:


> New board (Dark Hero), new power supply (AX1000).
> This kind of crash is the same as firing up Aida64 extreme's cache and memory benchmark (especially the L3 cache one)


Your only chance is to try a different combination of components, like a different MB or a different power supply with the rest of the components unchanged. Otherwise you won't be able to tell what's broken with 3 new things. You could find a computer service shop near you and ask them to let you test the components for a small fee. Or ask friends if they can lend you a PSU for a few days, or a MB, or even a 3700x or similar CPU.

For all of us here who RMAd 5000 series CPUs, we had no trouble with benchmarks and high load. We only got the reboots at light load, and always with a WHEA error.


----------



## JohnnyFlash

Peanuts4 said:


> Hey people, I'm not entirely sure if I'm experiencing the same thing you folks are or not, but it sounds like it. This seems to be almost random. It happened once 2 weeks ago, and now it happened 3 times in 1 day. Not sure which direction to go with it. I would love some of your input, especially if you had this issue with a 3000 series chip. I was linked to this thread by a person in my own thread, "Trying to figure out why my comp just restarted/crashed...". Feel free to give me your 2 cents. Not sure what I should do.


First thing you should do when you suspect any problems is reset everything to stock. I see that you're running 3600 memory, put the IF back to stock for now and see if the issue persists.


----------



## Peanuts4

ghiga_andrei said:


> Read the messages from the last 5 pages between me and azomiss and you should have plenty of info to debug. In short, try positive Curve Optimizer values and check the WHEA logs in Windows Event Viewer to see which cores are failing.


With your program which you "Specify which core you want to test" how do you know which core to test? I'm not sure if there is a way to tell that in the event viewer after a crash? I'm honestly curious if I can just skip retesting everything and just use the program as a stability test? If it crashes it crashes then I know something is up. Or is this program a sort of last option?


----------



## ghiga_andrei

Peanuts4 said:


> With your program which you "Specify which core you want to test" how do you know which core to test? I'm not sure if there is a way to tell that in the event viewer after a crash? I'm honestly curious if I can just skip retesting everything and just use the program as a stability test? If it crashes it crashes then I know something is up. Or is this program a sort of last option?


Go into Event Viewer -> Windows Logs -> System and on the right use Filter Current Log, select Event level only Error and Event sources only WHEA-Logger.
Then look into all occurrences of the WHEA errors and see the APIC IDs reported. Those are IDs linked to each core.
APIC IDs are threads so if it is 8 then it is core 4, if 5 it is core 2. (divide by 2 and round down).
Use that info to know which core to test with my tool.
I used my tool to see how far I can go with the Curve Optimizer on each core. It can be used to test for any instability in a core, since it runs Cinebench to test all cores and then lots of instructions to test one single core. I don't see it as a last option.
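
The "divide by 2 and round down" rule above is just integer division. A minimal sketch, assuming SMT is enabled (2 threads per core) and that APIC IDs are numbered contiguously as described in this post; the function name `apic_id_to_core` is made up for illustration:

```python
def apic_id_to_core(apic_id, threads_per_core=2):
    """Map a logical-thread APIC ID to a zero-based core index."""
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    # Integer division rounds down, so IDs 0 and 1 both land on core 0.
    return apic_id // threads_per_core

for apic_id in (8, 5, 0, 1, 30):
    print(f"APIC ID {apic_id} -> core {apic_id_to_core(apic_id)}")
```

With SMT disabled there is one thread per core, so `threads_per_core=1` would apply instead.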


----------



## Anthos

Hey Andrei, I've used your tool to test my negative curve and find the maximum negative offset each of my cores can tolerate. I've tested a couple of cores, and a few seconds after Cinebench closes and the tool starts testing the core I get a hard crash that goes back into the BIOS with a max CPU temp error (I've monitored the temps and they never really get high). If I use the tool without any CO offset then it works normally. Do you have any idea about this?


----------



## ghiga_andrei

Anthos said:


> Hey Andrei, I've used your tool to test my negative curve and find the maximum negative offset each of my cores can tolerate. I've tested a couple of cores, and a few seconds after Cinebench closes and the tool starts testing the core I get a hard crash that goes back into the BIOS with a max CPU temp error (I've monitored the temps and they never really get high). If I use the tool without any CO offset then it works normally. Do you have any idea about this?


My tool only executes some instructions on that core that require vector multiplication and similar stuff. Nothing out of the ordinary, unlike, say, HWInfo accessing drivers or system IRQs. 
So it should not cause a crash unless something is really wrong. 
Could be that the temperature inside the specific core exceeded the limit, but you cannot see that. You can only see the average temp on the whole CCD, so you have no way of knowing how hot a single core got. Check Event Viewer, see what that says also, if you get a WHEA it's clear. My BIOS (Aorus Elite MB) does not have the option to report that cpu temp error, it just reboots directly.


----------



## ghiga_andrei

Anthos said:


> Hey Andrei, I've used your tool to test my negative curve and find the maximum negative offset each of my cores can tolerate. I've tested a couple of cores, and a few seconds after Cinebench closes and the tool starts testing the core I get a hard crash that goes back into the BIOS with a max CPU temp error (I've monitored the temps and they never really get high). If I use the tool without any CO offset then it works normally. Do you have any idea about this?


Also, did you try also cores from 2nd CCD ? My 5900x can hold -30 on all 6 cores from CCD2 but in CCD1 one core reboots even with 0 offset, needs +2 to never reboot. Other cores in CCD1 can hold between -5 and -15 without rebooting. You must use per core CO, not all core.


----------



## Anthos

ghiga_andrei said:


> Also, did you try also cores from 2nd CCD ? My 5900x can hold -30 on all 6 cores from CCD2 but in CCD1 one core reboots even with 0 offset, needs +2 to never reboot. Other cores in CCD1 can hold between -5 and -15 without rebooting. You must use per core CO, not all core.


I am not entirely sure what settings I used when I was trialing it yesterday (was a bit in a rush). I was just perplexed that at normal volts it can run the test but temp error in essence with an undervoltage? I mean yeah I expect stability issues etc but temp? I found it a bit contradicting.


----------



## ghiga_andrei

Anthos said:


> I am not entirely sure what settings I used when I was trialing it yesterday (was a bit in a rush). I was just perplexed that at normal volts it can run the test but temp error in essence with an undervoltage? I mean yeah I expect stability issues etc but temp? I found it a bit contradicting.


Temperature is caused by voltage or frequency. If the chip boosts 2% higher due to 1% lower voltage you will have higher temps.

Silicon chips consume power when switching logic gates, in the transition from 1 to 0 or 0 to 1 they create a current path from VDD (Vcore) to GND. That is the only time when both PMOS and NMOS transistors are active for a very short time. If they switch more times per second (frequency) they will create that current path more times per second. In a stable state, either the PMOS is turned off or the NMOS is turned off and the chips only consume the leakage power, which is getting higher with every technology node reduction, but is much much less than switching power.


----------



## azomiss

Good news people !

Here is what I did last time:


bios reset twice
enabled virtualization
enabled support for DDR4 @ 3600
uninstalled Gigabyte monitoring software
uninstalled HWI_642

The result is this 










Have a great one, me happy!


----------



## 1devomer

ghiga_andrei said:


> Also, did you try also cores from 2nd CCD ? My 5900x can hold -30 on all 6 cores from CCD2 but in CCD1 one core reboots even with 0 offset, needs +2 to never reboot. Other cores in CCD1 can hold between -5 and -15 without rebooting. You must use per core CO, not all core.


It's a bad habit coming from the Ryzen 3k launch, when AMD had limited chiplet availability!
One CCD is usually good or very good, the other CCD is usually barely decent or rubbish.

It's still the case with the Ryzen 5k, if i look at the users reports and by the fact that dual CCD 5800/5600 are being sold.
There were clocks issues when Ryzen 3k launched due to poorly binned chips, but not as much as we are experiencing today with the Ryzen 5k!

That's why one should RMA or send back dull, defective, low binned cpu's, a lot of users already went through this issue with Ryzen 3k, including myself unfortunately.


----------



## Redwoodz

danny9428 said:


> New board (Dark Hero), new power supply (AX1000).
> This kind of crash is the same as firing up Aida64 extreme's cache and memory benchmark (especially the L3 cache one)




https://twitter.com/i/web/status/1364971430337740804


----------



## danny9428

Redwoodz said:


> https://twitter.com/i/web/status/1364971430337740804


I disassembled my parts yesterday to try to find out what was causing the weird black screen with POST code 00 during the AIDA64 cache/Cinebench stress.
It turns out the cause was the board's default 'Cpu Current Capability' setting of 'Auto'.
It appears the 5950X requires 130% or 140% there to avoid triggering a board shutdown (which is weird, as the optimized defaults don't seem to set this high enough).
With it set to 130% the black screen 00 is gone; when I turn it back to 100%, the AIDA64 cache benchmark instantly black screens again.

We'll see if this also rules out the random reboots I had before...

I checked the ASUS website and it seems they still haven't rolled out the official 1.2.0.1 BIOS for the Dark Hero.


----------



## dr.Rafi

ghiga_andrei said:


> Temperature is caused by voltage or frequency. If the chip boosts 2% higher due to 1% lower voltage you will have higher temps.
> 
> Silicon chips consume power when switching logic gates, in the transition from 1 to 0 or 0 to 1 they create a current path from VDD (Vcore) to GND. That is the only time when both PMOS and NMOS transistors are active for a very short time. If they switch more times per second (frequency) they will create that current path more times per second. In a stable state, either the PMOS is turned off or the NMOS is turned off and the chips only consume the leakage power, which is getting higher with every technology node reduction, but is much much less than switching power.


You mean in stable state, when the transistor is idling (or static) ? curious to learn.


----------



## Anthos

danny9428 said:


> I disassembled my parts yesterday to try to find out what was causing the weird black screen with POST code 00 during the AIDA64 cache/Cinebench stress.
> It turns out the cause was the board's default 'Cpu Current Capability' setting of 'Auto'.
> It appears the 5950X requires 130% or 140% there to avoid triggering a board shutdown (which is weird, as the optimized defaults don't seem to set this high enough).
> With it set to 130% the black screen 00 is gone; when I turn it back to 100%, the AIDA64 cache benchmark instantly black screens again.
> 
> We'll see if this also rules out the random reboots I had before...
> 
> I checked the ASUS website and it seems they still haven't rolled out the official 1.2.0.1 BIOS for the Dark Hero.


I have a 5950x and a dark hero and I don't get any crashes with cpu current set at auto when testing in aida.


----------



## danny9428

Anthos said:


> I have a 5950x and a dark hero and I don't get any crashes with cpu current set at auto when testing in aida.


So even at 100% your rig still runs fine without the black screen crash?

Hmm...that just complicates things even more lol


----------



## Anthos

danny9428 said:


> Is it like even with 100% your rig still goes fine without black screen crash?
> 
> Hmm...that just complicates things even more lol


I've just run the AIDA cache test twice back to back with the current specifically at 100% and yeah, no problems whatsoever.


----------



## danny9428

Anthos said:


> I've just run the AIDA cache test twice back to back with the current specifically at 100% and yeah, no problems whatsoever.


Right now it appears my chip can take a negative curve of 10 without crashing randomly, now that CPU current is set to 130%.

Testing -15, core 7 finally gave up and showed an error in Prime95.

The only other notable thing about the mobo is that when CPU load jumps up, the VRM on the board gives off coil whine, though it's only audible when I place it on the test bench.


----------



## ghiga_andrei

dr.Rafi said:


> You mean in stable state, when the transistor is idling (or static) ? curious to learn.


The transistors are used to form logic gates, and logic gates are used to create bigger logic circuits, like adders, multipliers, instruction decoders, pipeline registers, cache, everything that is inside the CPU (and any other digital chip). There are a few logic gates like INV, AND, OR, NOR, NAND, XOR, XNOR, and each of them has a simple logic truth table. Each of these logic gates is made out of a few transistors, mainly PMOS-NMOS pairs. The MOS transistor is like a simple ON/OFF switch. The problem is that there is no single MOS transistor that can switch both high levels and low levels to an output, that's why we have PMOS and NMOS transistors. The PMOS can switch VDD to an output when its input is low, and the NMOS can switch GND to an output when its input is high. The logic done with pairs of PMOS and NMOS is called CMOS = Complementary MOS (see CMOS on Wikipedia).

The simplest study case is an INV gate which has to invert the input signal: 









You have a pair of PMOS and NMOS transistors connected to the same input and same output. When the input is low, the PMOS is active and the NMOS is inactive and the output is VDD (or VCC in the photo), through the PMOS. When the input is high, the PMOS is inactive and the NMOS is active and the output is GND. More pairs like this are used to implement AND, OR, XOR gates. And so on, until you make the complete design with billions of transistors.

Now the transistor itself, in a stable state, which is when the input does not change, consumes only leakage power, regardless of whether it's active or not. This is the advantage of MOS transistors over bipolar transistors. The input gate is insulated by an oxide layer, so it acts like a capacitor, not a resistor, and blocks static currents.

But in the INV gate, at some point you will change the input state. Let's take the case when you switch the input from low to high and the output has to go from high to low. Before the switching, the PMOS is active and the NMOS is inactive. Both transistors are in a stable state and, because the NMOS is inactive, the GND path is not connected to VDD. But when the input signal begins to rise (it's always a slope, a signal will never instantly go from low to high) the NMOS transistor slowly starts to activate, and the PMOS transistor slowly starts to deactivate. This means the GND pin will start to be connected to VDD through the 2 transistors, with a varying equivalent resistance for each transistor. This lasts until the input signal completely reaches the high state and the PMOS deactivates completely. So during this switching time, there will be a current flowing through the transistors between VDD and GND.

This repeats each time the gate switches its input / output, and the same goes for all other gates in the design. Your current consumption will be proportional to how many gates in the design switch at the same time. This is why AVX instructions consume a lot of power, for example: they calculate a 512-bit result instead of a 64-bit one in the same time, which uses 8 times more logic. When the cores are sleeping, many of the gates inside will have stable inputs because no signals change around them.
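
The dependence on how many gates switch, and how often, is usually summarised with the textbook first-order model of CMOS power. This is the generic approximation, not anything specific to Zen 3; the symbols are introduced here for illustration: α is the fraction of gates switching per clock cycle, C the switched capacitance, V_DD the core voltage and f the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f,
\qquad
P_{\mathrm{total}} \approx P_{\mathrm{dyn}} + P_{\mathrm{leak}}
```

Wide vector instructions raise the α·C term because more gates toggle each cycle, while a sleeping core has α close to zero and is left with mostly leakage. The VDD-to-GND current during the input slope described above is an additional per-transition term, so it also grows with f.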


----------



## Peanuts4

ghiga_andrei said:


> Then look into all occurrences of the WHEA errors and see the APIC IDs reported. Those are IDs linked to each core.
> APIC IDs are threads so if it is 8 then it is core 4, if 5 it is core 2. (divide by 2 and round down).
> Use that info to know which core to test with my tool.


So the last time this happened was the 27th, for all of these they all say APICID 0 except for 1 instance that it says APICID 1. Since I can't divide them by 0 does it mean both APICID 0 and APICID 1 are on core 0? Just to be sure I'm understanding this or do I test core 0 and 1? I'm a bit late to the game with these newer CPU's since I think these chips are split into modules now or something. I just want to be sure I'm not testing the wrong cores.

A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: No Error
Processor APIC ID: 0

The details view of this entry contains further information.


----------



## ghiga_andrei

Peanuts4 said:


> So the last time this happened was the 27th, for all of these they all say APICID 0 except for 1 instance that it says APICID 1. Since I can't divide them by 0 does it mean both APICID 0 and APICID 1 are on core 0? Just to be sure I'm understanding this or do I test core 0 and 1? I'm a bit late to the game with these newer CPU's since I think these chips are split into modules now or something. I just want to be sure I'm not testing the wrong cores.
> 
> A corrected hardware error has occurred.
> Reported by component: Processor Core
> Error Source: Unknown Error Source
> Error Type: No Error
> Processor APIC ID: 0
> 
> The details view of this entry contains further information.
> 
> View attachment 2481030


Correct, APIC IDs 0 and 1 are the 2 threads corresponding to core 0. So that means that's your bad core. Try to put a positive curve optimizer only on that core, something like Positive 10. The rest leave at 0.


----------



## Peanuts4

ghiga_andrei said:


> Correct, APIC IDs 0 and 1 are the 2 threads corresponding to core 0. So that means that's your bad core. Try to put a positive curve optimizer only on that core, something like Positive 10. The rest leave at 0.


I'm assuming you mean in the BIOS? What does putting a core optimizer of 10 do? I tried looking it up and I'm not quite finding it. Is it just a zen 3 feature?

Also in your program test_ryzen_stability.cfg what program can you open it and edit it with? I can't just use notepad can I? That would probably be too easy.


----------



## ghiga_andrei

Peanuts4 said:


> I'm assuming you mean in the BIOS? What does putting a core optimizer of 10 do? I tried looking it up and I'm not quite finding it. Is it just a zen 3 feature?
> 
> Also in your program test_ryzen_stability.cfg what program can you open it and edit it with? I can't just use notepad can I? That would probably be too easy.


Yeah, that cfg file is just a text file. Open with notepad.

The curve optimizer is a zen 3 feature, yes. You find it in BIOS where you have PBO settings. Positive values give more voltage to cores and negative values give less voltage. It's a fine tuning of the voltage per frequency factory setting. On a good CPU we use negative values to lower the temperature and allow better PBO performance. On bad cores we must put a positive value because of the bad AMD binning. If your CPU is stable after you put the positive value for core 0 then it is not in spec and you can RMA it at AMD. You should not accept bad cores on your CPU.
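
To get a feel for what a count means in voltage terms, here is a tiny helper. It rests on an assumption that is not stated in this post: the commonly quoted figure that one Curve Optimizer count corresponds to roughly 3 to 5 mV. Treat the result as a ballpark only; the function name `co_counts_to_mv` is made up for illustration.

```python
def co_counts_to_mv(counts, mv_per_count=(3, 5)):
    """Approximate voltage offset range in mV for a Curve Optimizer count.

    mv_per_count is an assumed range (about 3 to 5 mV per count), not a
    value taken from this thread. Positive counts add voltage, negative
    counts remove it.
    """
    low, high = (counts * mv for mv in mv_per_count)
    return (min(low, high), max(low, high))

for counts in (10, 2, -15, -30):
    low, high = co_counts_to_mv(counts)
    print(f"CO {counts:+d}: about {low:+d} to {high:+d} mV")
```

Under that assumption, the suggested Positive 10 works out to roughly 30 to 50 mV of extra voltage on the affected core.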


----------



## dr.Rafi

ghiga_andrei said:


> Transistors are used to form logic gates, and logic gates are used to create bigger logic circuits like adders, multipliers, instruction decoders, pipeline registers, cache, everything that is inside the CPU (and any other digital chip). There are a few logic gates like INV, AND, OR, NOR, NAND, XOR, XNOR, and each of them has a simple truth table. Each of these logic gates is made out of a few transistors, mainly PMOS-NMOS pairs. The MOS transistor is like a simple ON/OFF switch. The problem is that no single MOS transistor can switch both high and low levels to an output; that's why we have PMOS and NMOS transistors. The PMOS can switch VDD to an output when its input is low, and the NMOS can switch GND to an output when its input is high. Logic done with pairs of PMOS and NMOS is called CMOS = Complementary MOS (see "CMOS" on Wikipedia).
> 
> The simplest study case is an INV gate which has to invert the input signal:
> View attachment 2481011
> 
> 
> You have a pair of PMOS and NMOS transistors connected to the same input and the same output. When the input is low, the PMOS is active and the NMOS is inactive, and the output is VDD (or VCC in the photo) through the PMOS. When the input is high, the PMOS is inactive and the NMOS is active, and the output is GND. More pairs like this are used to implement AND, OR, XOR gates, and so on, until you make the complete design with billions of transistors.
> 
> Now the transistor itself, in a stable state (when the input does not change), consumes only leakage power, regardless of whether it's active or not. This is the advantage of MOS transistors over bipolar transistors. The gate is insulated by oxide, so it acts like a capacitor, not a resistor, and blocks static currents.
> 
> But in the INV gate, at some point you will change the input state. Take the case when you switch the input from low to high and the output has to go from high to low. Before the switch, the PMOS is active and the NMOS is inactive. Both transistors are in a stable state, and because the NMOS is inactive, GND is not connected to VDD. But when the input signal begins to rise (it's always a slope; a signal never goes instantly from low to high), the NMOS slowly starts to activate and the PMOS slowly starts to deactivate. This means GND starts to be connected to VDD through the 2 transistors, each with a varying equivalent resistance. This lasts until the input signal fully reaches the high state and the PMOS deactivates completely. So during this switching time, a current flows through the transistors between VDD and GND. This repeats each time the gate switches its input / output, and the same goes for all other gates in the design.
> 
> Your current consumption will be proportional to how many gates in the design switch at the same time. This is why AVX instructions consume a lot of power, for example: they calculate a 512-bit result instead of a 64-bit one in the same time, which uses 8 times more logic. When the cores are sleeping, many of the gates inside have stable inputs because no signals change around them.


Thank you, you made it so easy to understand. Maybe in the future they can invent better technology and make the transistors' switching delay (the active to inactive transient) much shorter or negligible, maybe even using photons instead of electrons.


----------



## GamBoTron

So after I built my computer and used it regularly for one month, I experienced my first reboot today. I looked into it further in Event Viewer and found this:

View attachment 2481150

Anyone know what this might be? The system has been running perfectly up until today (at least I didn't see any other reboots logged before today), but I'm unsure where the unclean reboots come from. The only change I've made in the last two weeks is installing a new SATA SSD.


----------



## xeizo

GamBoTron said:


> so after i built my computer and used it regularly for one month, i experienced my first reboot today. Looked into it further in the event viewer and found this
> View attachment 2481150
> 
> 
> Anyone know what this might be ? System has been running perfect up untill today (at least i didnt see any of the other reboots logged in action before today) but im unsure about what the unclean reboots come from. Only change i did since last two weeks is install a new SATA SSD


Too much Curve Optimizer and too much Boost Override reliably does that when using PBO. Only happens in idle, sometimes when away from the PC. Shouldn't happen using stock though.


----------



## GamBoTron

xeizo said:


> Too much Curve Optimizer and too much Boost Override reliably does that when using PBO. Only happens in idle, sometimes when away from the PC. Shouldn't happen using stock though.


I haven't touched Curve Optimizer or Boost Override. The only thing I did was enable XMP. I read some more about the issue and it's pretty common. From what I gathered, it might be the PSU or even the power settings.


----------



## xeizo

GamBoTron said:


> havent touched curve optimizer or boost override. Only thing i did was enable xmp. Read some more about the issue and its pretty common. Might be the PSU from what i gathered or even the power settings


Yes, it can be a lot of things, I just mentioned two that are a sure way of getting there. Older versions of HWiNFO64 also triggered the same event; the latest version should be good.
I haven't had any WHEA or Kernel-Power events for several weeks now. I'm not sure which setting was "it", since I have changed a lot of settings to my liking.


----------



## mongoled

WHEA warnings and WHEA errors should not be mixed up and treated as the same thing!

WHEA warnings are usually a sign that the FCLK is too high.

WHEA errors usually mean a core has crashed, leading to system instability, a freeze or a reboot.

As this thread is about WHEA reboots I feel this clarification is needed.


----------



## ghiga_andrei

mongoled said:


> WHEA warning and WHEA errors should not be mixed up and used as the same thing !
> 
> WHEA warning are usually a sign that the FCLK is too high.
> 
> WHEA errors are usually an issue where a core crashes leading to system instability/freeze/reboot.
> 
> As this thread is about WHEA reboots I feel this clarification is needed.


More specifically, it should be about WHEA 18 Cache Hierarchy Errors, since those are specific to the bad cores in the Ryzen 5xxx CPUs.


----------



## mark007

ghiga_andrei said:


> OK, after weeks of testing and rebooting my computer 1000 times, here is the beta version 0.1 of my WHEA triggering tool:
> 
> 79.32 MB file on MEGA (mega.nz)
> 
> Extract the archive and first edit the test_ryzen_stability.cfg file. Specify which core you want to test and where you have Cinebench R20 installed. Leave the delay at 100 until we decide on a better value. It's the testing time between Cinebench runs: a delay of 10 would run Cinebench more often, a delay of 1000 much less often.
> 
> The tool uses Cinebench to heat up the CPU and generate heat gradients, which makes the failure show up sooner. It's also needed because after a Cinebench run the CPU runs faster, due to some performance regulator implemented by AMD to cheat at AVX benchmarks (not confirmed, just an assumption by another user on this forum). More details about my findings here:
> 
> (Gigabyte X570 AORUS Owners Thread, www.overclock.net)
> 
> Between CB runs, my tool performs histograms on a large image and auto adjusts contrast for it, using the specified core.
> 
> Make sure you have other apps and background stuff like aRGB controllers and Afterburner turned off. It's important to have the other cores sleeping.
> 
> For me, with the same BIOS settings, sometimes my tool resets the computer within 1 minute, other times it doesn't reset at all.
> I recommend not running more than 5-10 minutes.
> For me, but this maybe depends on the MB VRMs and cooling, it is the most unstable after a restart. Probably when the VRMs are heating up it's generating a little more ripple. If everything reaches a stable temperature it doesn't break anymore. Always run multiple times after turning off the computer for 5-10 minutes.
> 
> *Please let me know how it works for you. Any feedback is welcome.*
> 
> P.S. The source code is included also. I would recommend running the exe file instead of the python, unless you have the latest python and opencv and numpy libraries. With older libraries it seems to behave differently.


Thank you for the tool. I'm planning to use it with my replacement 5950X to see if I can reproduce faults quicker. Do you think it's possible (if it's not already in place) to make it loop through all cores, for a certain period of time per core, and then finish? Currently I assume you have to specify the core and remember to stop it after, say, 10-15 minutes. It'd be great to click and forget, or do you think there's some reason this might not work well? Cheers.


----------



## ghiga_andrei

mark007 said:


> Thank you for the tool. I'm planning to use it with my replacement 5950x to see can I reproduce faults quicker. Do you think its possible (if already not in place) to make it loop through all cores for a certain period of time per core, and then finish. Currently I assume you have to specify the core and remember to stop it say after 10/15 mins. Itd be great to click and forget, or do you think theres some reason this might not work well. Cheers.


I could make the change, no problem. It's just that if you cycle through all cores, you don't know which one actually caused the reboot. Sure, you can look for the APIC ID in the WHEA error after the reboot, but in my case it was sometimes not there. 25% of my reboots did not log a WHEA error; probably the CPU reset too fast, I don't know. But when testing 1 fixed core, I knew it was the one causing it.

Anyway, I will implement this and get back to you, and you can see for yourself which version is more useful.
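
One possible way around the "which core was it?" problem when cycling is to write the core number to disk before each run, so it can be read back after a reset even when no WHEA entry was logged. This is only a sketch of the idea, not the tool's actual code: the file name, the function names and the `run_test` callback are all hypothetical.

```python
import json
import os
import time

STATE_FILE = "core_under_test.json"  # hypothetical file name

def record_core(core, path=STATE_FILE):
    """Write the core about to be tested and force it to disk before the load starts."""
    with open(path, "w") as f:
        json.dump({"core": core, "started": time.time()}, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the note survives a sudden reset

def last_recorded_core(path=STATE_FILE):
    """After a reboot, read back which core was being tested (None if no record)."""
    try:
        with open(path) as f:
            return json.load(f)["core"]
    except (FileNotFoundError, ValueError, KeyError):
        return None

def cycle_cores(cores, seconds_per_core, run_test, path=STATE_FILE):
    """Test each core in turn; run_test(core, seconds) is a placeholder for the real load."""
    for core in cores:
        record_core(core, path)
        run_test(core, seconds_per_core)
    os.remove(path)  # clean finish: no core is left marked as 'under test'
```

If the machine resets mid-run, calling `last_recorded_core()` after the reboot returns the core that was under load at that moment. After a clean pass it returns `None`.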


----------



## mark007

ghiga_andrei said:


> I could make the change, no problem. It's just that if you cycle through all cores, you don't know which one actually caused the reboot. Sure, you can look for the WHEA error APIC ID after reboot, but sometimes in my case it was not there. 25% of my reboots did not log the WHEA error, probably the CPU rebooted too fast, don't know. But testing 1 fixed core I knew it was the one causing it.
> 
> Anyway, I will implement this and get back to you and you see for yourself what version is more useful.


Thanks so much! Yeah, if testing all cores in order helps people find the bad core quicker, it's fantastic, and they can then run it against just that one core as they make tweaks. Perhaps the "core" setting in the cfg file could accept a string like 'all' or 'loop' to indicate that mode, versus an integer to indicate a single core.


----------



## LuchoU

ghiga_andrei said:


> I could make the change, no problem. It's just that if you cycle through all cores, you don't know which one actually caused the reboot. Sure, you can look for the WHEA error APIC ID after reboot, but sometimes in my case it was not there. 25% of my reboots did not log the WHEA error, probably the CPU rebooted too fast, don't know. But testing 1 fixed core I knew it was the one causing it.
> 
> Anyway, I will implement this and get back to you and you see for yourself what version is more useful.


I had the same problem getting the WHEA errors logged in Event Viewer. What worked for me was changing the dump option to "Complete memory dump" in Advanced system settings. It seems that this gives the system some extra time to log the WHEA error, as the dump takes a bit longer to create.


----------



## TheSh4d0w

I got my 5900X a few weeks ago and couldn't even make it through the Windows installer without a BSOD (WHEA_UNCORRECTABLE_ERROR). I RMA'ed it with AMD and got the replacement this morning. Initially everything seemed good: I installed Windows, Steam and a bunch of apps without issue, then suddenly another BSOD. I reset the BIOS to optimized defaults again just in case and ran Prime95, and sure enough it BSODs consistently within a couple of minutes under load.

I'm running the Asus ROG Crosshair VIII Impact board, which Asus hasn't published an AGESA 1.2.0.1 BIOS for yet, but googling around I found this potentially sketchy BIOS site: Test BIOS for Crosshair VIII Impact | bianbao.dev

After installing that bios I ran prime95 stable for about 15 minutes before it BSOD'ed, so maybe it's improved things slightly, but doesn't fix it. Emailed AMD back, curious what they're going to say.

This is my first ever AMD proc, and I think it's going to be my last....


----------



## ghiga_andrei

mark007 said:


> Thanks so much! Yeah if the testing of all cores in order helps people find the bad core quicker its fantastic, and can then possibly run it against just the one core as they make tweaks. Perhaps the "core" in the cfg file could be set to some string like 'all' or 'loop' or something to indicate that mode, vs an integer to indicate just using one core.


You can find the multiple cores version here:

79.32 MB file on MEGA (mega.nz)

In the new .cfg file you have the first line:
cores = [0, 1, 2, 3, 4, 5]

You can change that to any sequence of cores needed, just leave the brackets there always.

If you want to test only a single core like it was before, you can do:
cores = [4]

Let me know how it works.
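
For anyone who wants to sanity-check an edited cfg line before running the tool, here is a minimal Python sketch of how a line of that form can be read safely. It is an assumption about the format based only on the two examples above, not the tool's actual parser, and `parse_cores_line` is a made-up name.

```python
import ast

def parse_cores_line(line):
    """Parse a line such as 'cores = [0, 1, 2]' into a list of ints."""
    name, sep, value = line.partition("=")
    if sep != "=" or name.strip() != "cores":
        raise ValueError("expected a line of the form: cores = [0, 1, 2]")
    cores = ast.literal_eval(value.strip())  # literals only, no code execution
    if not isinstance(cores, list) or not all(isinstance(c, int) and c >= 0 for c in cores):
        raise ValueError("cores must be a bracketed list of non-negative integers")
    return cores

print(parse_cores_line("cores = [0, 1, 2, 3, 4, 5]"))
print(parse_cores_line("cores = [4]"))
```

A line without the brackets, such as `cores = 4`, is rejected with a `ValueError`, which is the same mistake the "leave the brackets there always" note warns about.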


----------



## LuchoU

TheSh4d0w said:


> I got my 5900x a few weeks ago, couldn't even make it through the windows installer without a BSOD (WHEA_UNCORRECTABLE_ERROR). RMA'ed it with AMD and got the replacement this morning, initially everything seemed good. Installed windows, steam, a bunch of apps without issue, then suddenly another BSOD. Reset bios to optimized defaults again just in case, run prime95 and sure enough it bsod's consistently within a couple minutes under load.
> 
> I'm running the Asus ROG Crosshair VIII impact board which Asus hasn't published a agesa 1.2.0.1 bios for yet, but googling around I found this potentially sketchy bios site: Test BIOS for Crosshair VIII Impact | bianbao.dev
> 
> After installing that bios I ran prime95 stable for about 15 minutes before it BSOD'ed, so maybe it's improved things slightly, but doesn't fix it. Emailed AMD back, curious what they're going to say.
> 
> This is my first ever AMD proc, and I think it's going to be my last....


Do you have the batch codes of both CPUs, the one you RMA'ed and the new one (which also seems to be failing, per your post)?


----------



## TheSh4d0w

LuchoU said:


> Do you have the CPU batches?, the one you RMA'ed and the new one (which is also failing it seems per your post).


The one I sent back was (assuming I'm reading it properly) BG 2050PGS
The replacement is BG 2104PGS

I tried pulling one RAM stick at a time and alternating slots in case it's not the CPU, but it consistently crashes at the 10-15 minute mark of Prime95....


----------



## LuchoU

TheSh4d0w said:


> The one I sent back was (assuming I'm reading it properly) BG 2050PGS
> The replacement is BG 2104PGS
> 
> Tried just pulling one ram stick at a time / alternating slots in case it's not the CPU, but it consistently crashes at the 10-15 mark of prime95....


Mine is BG2043PGS, but it's a 5800X. I have the same problems, and I was able to stabilize it by setting a positive +8 on all cores in Curve Optimizer. I will probably RMA too, but I'm a bit scared since your new CPU is quite recent: BG2104PGS was produced this year, in the last week of January. How is it possible that at this point AMD is still producing defective units? I thought it would have been resolved by now. Not sure what to do now.
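
The batch codes can be read with a few lines of Python. This sketch assumes the four digits are a two-digit year followed by an ISO week number, which is how the "2104 = last week of January this year" reading above works. That interpretation is an assumption, and the function names are made up for the example.

```python
import re
from datetime import date

def decode_batch(code):
    """Return (year, iso_week) from the first four digits of a batch code."""
    m = re.search(r"(\d{2})(\d{2})", code)
    if m is None:
        raise ValueError(f"no four-digit date code in {code!r}")
    return 2000 + int(m.group(1)), int(m.group(2))

def week_start(year, week):
    """Monday of the given ISO week (needs Python 3.8+)."""
    return date.fromisocalendar(year, week, 1)

for code in ("BG 2050PGS", "BG 2104PGS", "BG2043PGS"):
    year, week = decode_batch(code)
    print(code, "->", year, "week", week, "starting", week_start(year, week))
```

For BG 2104PGS this gives week 4 of 2021, which starts on 25 January 2021 and is consistent with "last week of January".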


----------



## TheSh4d0w

LuchoU said:


> Mine is BG2043PGS, but it's a 5800x. I also have the same problems, I was able to stabilize it by adding a positive +8 all cores in curve optimizer. Probably I will also RMA, but I'm a bit scared since your new CPU is quite recent, BG2104PGS, it was produced this year, last week of January. How is it possible that at this point in time AMD is still producing deffective units?, I thought that by this time it was resolved. Not sure what to do know.


I bumped to +8 and it ran Prime95 for 25 minutes, a new record! It finally bluescreened, and while watching the Ryzen software I saw core 3 and core 5 rapidly drop down to 5 MHz a few seconds before the crash. My guess is those are likely the defective ones or something.

I'm hoping AMD will both cross-ship and give me a return label for the return shipping, I don't want to keep having to both wait weeks and pay to ship these things back to them.


----------



## LuchoU

TheSh4d0w said:


> I bumped to +8 and it ran prime95 for 25 minutes, a new record! It finally bluescreened and while watching the ryzen software, I saw core 3 + core 5 rapidly drop off down to 5 mhz a few seconds before the crash. My guess is those are likely the defective ones or something.
> 
> I'm hoping AMD will both cross-ship and give me a return label for the return shipping, I don't want to keep having to both wait weeks and pay to ship these things back to them.


You can try bumping it a bit more, to +10 or +15.
+10 was suggested to me and it worked, so I stopped there. +15 adds an additional 0.075 V in the worst case. I monitor Vcore with HWiNFO and I have not seen any crazy Vcore increase with +8, maybe because of vdroop? It's only slightly above 1.5 V in the worst case, but the CPU runs a little bit hotter.
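
The arithmetic behind that number, as a small Python sketch. It assumes the worst case scales linearly at about 5 mV per Curve Optimizer count, which is simply 0.075 V divided by 15 from the figure above. The real step size may be smaller and can vary, so this is an upper-bound estimate, not a measurement.

```python
# Assumption taken from the figure above: +15 counts is about 0.075 V worst case,
# i.e. about 5 mV per Curve Optimizer count.
MV_PER_COUNT = 5

def worst_case_offset_mv(counts, mv_per_count=MV_PER_COUNT):
    """Worst-case voltage increase in millivolts for a positive Curve Optimizer value."""
    return counts * mv_per_count

for counts in (8, 10, 15):
    print(f"+{counts} counts -> up to +{worst_case_offset_mv(counts)} mV")
```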


----------



## TheSh4d0w

LuchoU said:


> You can try to bump a bit more to +10 or +15
> I was suggested to use +10 and it worked so I stopped there. +15 will add an additional of 0.075v worst case. I monitor vcore usage with HWInfo and I have not seen any crazy vcore increase with +8, maybe because of vdroop? only sligtly above 1.5v worst case, but CPU runs a little bit hotter.


Thanks, but I don't care that much. I need a 100% stable computer, so my new one is just going to go back to collecting dust until AMD can sort this **** show out.


----------



## xeizo

Those look like two really ugly samples in a row, which could explain why AMD has such a hard time supplying these. It's not only the consoles (rumored to take 80% of the allocation). Maybe that's why the 5600X is unobtainium: there is no trash-bin silicon that can run stable, so all the (semi)working chiplets go to the 5800X, 5900X and 5950X.
I have a very early sample with one worst core running at -1 and the other cores between -10 and -20. It has been stable for several weeks now, after I found the right curve. I could be envious of those running -30 on many cores, but at least it is stable, which looks to be good enough. It does 5025 MHz on three cores with stable settings, so it's still above spec.


----------



## abbekeff

I thought I had fixed these reboots by setting LLC on highest setting(all else stock). Today randomly I got 3 reboots in a row trying to play games. Yet it's still stable in stress test / benchmarks. *** is even going on with these processors.


----------



## danny9428

xeizo said:


> That looks like two real ugly samples in a row, could explain why AMD has so hard supplying these. It's not only the consoles(rumored 80% of the allocation). Maybe that's why the 5600X is unobtanium, there is no trash bin silicon that can run stable so all the (semi)working chiplets go to 5800X, 5900X and 5950X.
> I have a very early sample with one worst core running -1, the other cores between -10-20, it has been stable for several weeks now after I found the right curve. I could be envious of those running -30 on many cores, but at least it is stable which looks to be good enough. It does 5025MHz on three cores with stable settings, so it's still above spec.


Wait...I thought the 5600X is waaaay more common in stock than the dual-CCD SKUs no? At least here in Hong Kong we do see them in stock a lot more common than the 59x-es



abbekeff said:


> I thought I had fixed these reboots by setting LLC on highest setting(all else stock). Today randomly I got 3 reboots in a row trying to play games. Yet it's still stable in stress test / benchmarks. *** is even going on with these processors.


Previously I fiddled with the CPU current capability settings and thought I'd found my own fix. It turns out it's still not quite the solution, as crashes are still happening, just not as often as when leaving it on auto or at 100-120%.

As far as I know going up in LLC only really improves stability during heavy stress tests but not light loads, do you get any WHEA ID 18s from those crashes?


----------



## xeizo

danny9428 said:


> Wait...I thought the 5600X is waaaay more common in stock than the dual-CCD SKUs no? At least here in Hong Kong we do see them in stock a lot more common than the 59x-es


In Europe only 5800X is widely available


----------



## Anthos

xeizo said:


> In Europe only 5800X is widely available


In the UK the 5600x is the most common of all by a huge margin


----------



## mark007

ghiga_andrei said:


> You can find the multiple cores version here:
> 
> 79.32 MB file on MEGA (mega.nz)
> 
> In the new .cfg file you have the first line:
> cores = [0, 1, 2, 3, 4, 5]
> 
> You can change that to any sequence of cores needed, just leave the brackets there always.
> 
> If you want to test only a single core like it was before, you can do:
> cores = [4]
> 
> Let me know how it works.


That's absolutely great. Thanks very much. I'll test it today and tomorrow.


----------



## JohnnyFlash

I have my 5950X installed, running a battery of tests on an HDD before adding the nvme drives.

So far it's running fine at DDR4-3600 CL16 and 1800 IF with no tweaks.


----------



## LuchoU

Any CPU-intensive game triggers the WHEA 18 errors for me. It is either the cache error or the bus connection error, only those 2 types, and it's always an error 18 followed by a restart. Benchmarks, Prime95 and Karhu do not trigger them for me. Red Dead Redemption 2 triggers the error in less than 1 hour, and Hitman 3 also triggers it. This is different from other overclocking errors I have suffered with Intel in the past: games have all the cores going up and down through different states, so their dynamic nature will make any error show up. I also tried max LLC in the past and thought the WHEA errors were gone, but they came back. What these bad CPU units need is extra voltage to make them stable, so adding positive offsets in Curve Optimizer should resolve most of the issues.


----------



## thigobr

I am not getting many random reboots or WHEA errors, but I just tried Prime95 SSE single thread with thread affinity, and some of the cores on my 5950X throw rounding errors, completely at stock! Before this I had a 3700X and a 1700 in this machine with no issues whatsoever.

Time to start an RMA... By the way this CPU batch is BG 2044PGS


----------



## j96j

I decided to RMA my 5800X 4 weeks ago. New one just came in.

*The new 5800X has been running more stably than the last one.*
I troubleshot the old CPU for 2 months, so I have tried almost all the suggestions on all the forums.

At stock settings, the old 5800X could reboot during Windows installation and at idle. The new 5800X, with XMP enabled (3600C18 RAM), has been running stable so far. Finally I can continue my work. The old 5800X wasted 1-2 months of my time, between tinkering with BIOS settings and waiting for BIOS updates.

I'm pretty sure my old 5800X was the lowest-binned chip though, because even with PBO and CPB disabled, my PC still rebooted.
The new 5800X is still running stable, even with the RAM's XMP profile enabled at 3600 MHz. Note that the only part I swapped was the old 5800X for the new one.


----------



## OCmember

In my experience, the IMC on my 3800X was better than the one on my 5800X. The 5800X can only do 3733 with either single-rank or dual-rank memory before getting WHEA errors. My uneducated guess is that something affected the IMC in production.


----------



## Hueristic

OCmember said:


> In my experience my 3800X IMC was better than on my 5800X.. the 5800 can only do 3733 with either single rank or dual rank before getting WHEA errors. In my uneducated guess something effected the IMC in production.



Wrong thread.


----------



## OCmember

Hueristic said:


> Wrong thread.


I don't think so.


----------



## Mathieu le dégueu

nevcairiel said:


> My replacement 5950X from the AMD RMA has arrived, production 2051SUS, so fairly new. Testing will start shortly.


Hello,
So is this 2051SUS a better one?


----------



## Hilarius

I just read all 66 pages of this thread while waiting for my system to recover from reboots. After about 40 hours of troubleshooting, I have a bit of a wrinkle to add.

My system:
Ryzen 5600X
MSI B550 Unify
G.Skill 3200 14-14-14
Seasonic GX-750
Radeon 7870 (sigh...)

Built the system on Thursday, installed Windows and all my software, and had zero issues. I left every BIOS setting at stock except setting XMP Profile 1 on my RAM. I then gamed all day Friday, once again with zero issues. Saturday I started getting reboots while gaming, but the cause was consistent, it was always on a transition from the game to a cutscene, or to a loading screen, or to my desktop. Never once did it crash while just playing the game. Having heard about WHEA errors, I checked Event Viewer, and sure enough, I was getting WHEA 18 cache hierarchy errors before each reboot. That's when I started troubleshooting. 

I tried using OCCT Power test, OCCT CPU test, Prime95, Core Booster, and Cinebench R23 to reproduce the errors, both multi and single, and shifting between load and idle, and nothing worked. Eventually I found a way to reproduce the error in a game. Watching a 4 minute cutscene, and then immediately restarting the same cutscene was a 100% chance to cause a reboot. Now being able to reproduce the error, I started trying workarounds. I tried lowering my RAM timings, increasing RAM voltage, leaving RAM at the default 2133, disabling RAM Power Down, increasing CPU voltages (both Override and Offset), disabling c-states, disabling Core Performance Boost, enabling and disabling PBO, setting the CPU clock manually, setting RAM manually, setting fclk manually, every BIOS version, multiple chipset versions, multiple video card drivers, and nothing helped. 

Getting tired of watching the same cutscene over and over, I had a new idea. I started an OCCT 3D test, which yielded minimal CPU load, let it run for a minute, then canceled it. Immediately upon canceling the test, my system crashed. This ended up being reproducible as well. I also tested OCCT 3D test while running Cinebench R23 multi, and still got crashes, even as the CPU load remained 100%. At this point, I got to the discussion about curve optimizer, and decided to set an all core curve optimizer to +10. This actually delayed the crashes, but I still got them. So then I decided to use +20, which delayed the crashes even further, but I still got crashes. Then I saw the discussion about checking the APIC IDs, and realized that I was getting errors on all 12 threads, which made me think, "Maybe this isn't a CPU issue?" So then I decided to try something else - I reset my BIOS back to stock and underclocked my GPU. My system is now perfectly stable. No amount of OCCT 3D tests or watching the same cutscene over and over will result in a reboot. I'm glad to now have a stable system, but at this point, I now have even more questions.

I won't be able to do any more testing until this weekend, but some additional tests I plan on doing are putting my video card back into my old system and seeing if it works. As well as trying my old PSU in my new system and seeing if it's stable. I'm also trying to find a friend with a spare video card. I'd love to just get a new one, but who knows when that will happen. The reason I'm doing all of this testing is because I don't want to go 2-4 weeks w/o a CPU, just to discover that the problem isn't the CPU.

So, any thoughts?


----------



## JohnnyFlash

Hilarius said:


> I just read all 66 pages of this thread while waiting for my system to recover from reboots. After about 40 hours of troubleshooting, I have a bit of a wrinkle to add.
> 
> My system:
> Ryzen 5600X
> MSI B550 Unify
> G.Skill 3200 14-14-14
> Seasonic GX-750
> Radeon 7870 (sigh...)
> 
> Built the system on Thursday, installed Windows and all my software, and had zero issues. I left every BIOS setting at stock except setting XMP Profile 1 on my RAM. I then gamed all day Friday, once again with zero issues. Saturday I started getting reboots while gaming, but the cause was consistent, it was always on a transition from the game to a cutscene, or to a loading screen, or to my desktop. Never once did it crash while just playing the game. Having heard about WHEA errors, I checked Event Viewer, and sure enough, I was getting WHEA 18 cache hierarchy errors before each reboot. That's when I started troubleshooting.
> 
> I tried using OCCT Power test, OCCT CPU test, Prime95, Core Booster, and Cinebench R23 to reproduce the errors, both multi and single, and shifting between load and idle, and nothing worked. Eventually I found a way to reproduce the error in a game. Watching a 4 minute cutscene, and then immediately restarting the same cutscene was a 100% chance to cause a reboot. Now being able to reproduce the error, I started trying workarounds. I tried lowering my RAM timings, increasing RAM voltage, leaving RAM at the default 2133, disabling RAM Power Down, increasing CPU voltages (both Override and Offset), disabling c-states, disabling Core Performance Boost, enabling and disabling PBO, setting the CPU clock manually, setting RAM manually, setting fclk manually, every BIOS version, multiple chipset versions, multiple video card drivers, and nothing helped.
> 
> Getting tired of watching the same cutscene over and over, I had a new idea. I started an OCCT 3D test, which yielded minimal CPU load, let it run for a minute, then canceled it. Immediately upon canceling the test, my system crashed. This ended up being reproducible as well. I also tested OCCT 3D test while running Cinebench R23 multi, and still got crashes, even as the CPU load remained 100%. At this point, I got to the discussion about curve optimizer, and decided to set an all core curve optimizer to +10. This actually delayed the crashes, but I still got them. So then I decided to use +20, which delayed the crashes even further, but I still got crashes. Then I saw the discussion about checking the APIC IDs, and realized that I was getting errors on all 12 threads, which made me think, "Maybe this isn't a CPU issue?" So then I decided to try something else - I reset my BIOS back to stock and underclocked my GPU. My system is now perfectly stable. No amount of OCCT 3D tests or watching the same cutscene over and over will result in a reboot. I'm glad to now have a stable system, but at this point, I now have even more questions.
> 
> I won't be able to do any more testing until this weekend, but some additional tests I plan on doing are putting my video card back into my old system and seeing if it works. As well as trying my old PSU in my new system and seeing if it's stable. I'm also trying to find a friend with a spare video card. I'd love to just get a new one, but who knows when that will happen. The reason I'm doing all of this testing is because I don't want to go 2-4 weeks w/o a CPU, just to discover that the problem isn't the CPU.
> 
> So, any thoughts?


To me, that immediately says PSU ripple, if lowering the GPU draw removes the issue.


----------



## LuchoU

JohnnyFlash said:


> To me, that immediately says PSU ripple, if lowering the GPU draw removes the issue.


Yes. When he said he left RAM at 2133 and disabled CPB, that basically put the CPU to work without any boost at all. It should be completely stable at that point; at least it was for me. If he is still having issues after that, it could be the PSU, RAM or GPU. You could run a RAM test such as Karhu to rule the RAM out. I don't think the GPU is bad: in my case, GPU issues never triggered a WHEA 18. I got artifacts or a frozen screen, but no auto-reboots.


----------



## yaniv82

AMD Machine Check Exception - X570/B550 Chipsets

With certain hardware, a system may become unstable and begin rebooting/crashing and only occasionally throwing errors that help identify the culprit. If your system uses an AMD Ryzen 9 CPU and a motherboard based on the X570 series chipset, you may run into this issue as well. This article will...

www.pugetsystems.com


----------



## thigobr

Well, I am already on the latest BETA with AGESA 1.2.0.1 and still can't get Prime95 to pass the single-thread test... If only it were as easy as a BIOS update to fix these issues...

I opened the RMA, and it took 2 days to get a response. First they asked for more information and a picture of the CPU and motherboard. I sent everything I have, including the troubleshooting steps I have tried. I hope the next email is the RMA approval.


----------



## Hilarius

JohnnyFlash said:


> To me, that immediately says PSU ripple, if lowering the GPU draw removes the issue.


Thanks for the reply! Technically, lowering GPU draw _causes_ the issue. I can load my CPU and my GPU to 100% for hours with zero issues. The only time I have a problem is when I load the GPU, and then _remove_ the load. I can reproduce this quite easily. I run the OCCT 3D test for 1 minute, and then cancel the test. I always get a crash as soon as I cancel the test. If I just let the test run, I never get a crash. If I was having issues under load, I would absolutely point to the PSU being the culprit, but why would a PSU issue manifest when removing a load? It's definitely possible that the PSU is the problem, and that's why I still plan on testing a different PSU, I've just never seen a PSU issue manifest in this way before, so I'm pretty confused. But why downclocking my GPU fixes the problem is definitely a mystery to me. This is the first time I've gamed in about 18 months, but I've still been using this card daily for office work, so it's possible something went wrong in the last 18 months and I'm only just now noticing. I'm still working on getting my hands on a spare video card that I can test with.

As for my RAM, I've tested both 2133 (BIOS default) and 3200 (XMP) pretty extensively with both HCI MemTest and TestMem5, and I'm yet to have a single error. I think I can hopefully rule out my RAM being the issue.

If I'm missing something, let me know. I'm all ears!


----------



## RemoteSpecialist

@Hilarius, try running this script:

Single core Prime95 test script for Zen 3 curve offset...

4/2/22 - Uploaded new version. Updated default version of p95 to p95v307b9.win64.zip Fixed p95 not exiting automatically Merged in fix for special characters appearing on some systems 2/28/21 - Uploaded new version. Refactored into a few functions that should help with adding more tests in the...

www.overclock.net


----------



## JohnnyFlash

Hilarius said:


> Thanks for the reply! Technically, lowering GPU draw _causes_ the issue. I can load my CPU and my GPU to 100% for hours with zero issues. The only time I have a problem is when I load the GPU, and then _remove_ the load. I can reproduce this quite easily. I run the OCCT 3D test for 1 minute, and then cancel the test. I always get a crash as soon as I cancel the test. If I just let the test run, I never get a crash. If I was having issues under load, I would absolutely point to the PSU being the culprit, but why would a PSU issue manifest when removing a load? It's definitely possible that the PSU is the problem, and that's why I still plan on testing a different PSU, I've just never seen a PSU issue manifest in this way before, so I'm pretty confused. But why downclocking my GPU fixes the problem is definitely a mystery to me. This is the first time I've gamed in about 18 months, but I've still been using this card daily for office work, so it's possible something went wrong in the last 18 months and I'm only just now noticing. I'm still working on getting my hands on a spare video card that I can test with.


We're talking about the same thing. When the GPU or CPU load drops off, the PSU drops output. If that drop in output is less stable or dips before stabilizing, that can cause problems and seems to mirror your experiences. 

It is by no means a for-sure explanation, only the most likely at the moment, from my point of view. It would seem that these chips are more sensitive to changes than others.


----------



## LuchoU

yaniv82 said:


> AMD Machine Check Exception - X570/B550 Chipsets
> 
> With certain hardware, a system may become unstable and begin rebooting/crashing and only occasionally throwing errors that help identify the culprit. If your system uses an AMD Ryzen 9 CPU and a motherboard based on the X570 series chipset, you may run into this issue as well. This article will...
> 
> www.pugetsystems.com


This is interesting as it comes from a system builder; their clients were probably complaining about WHEA errors on their Ryzen systems.
They are offering a BIOS profile and suggesting AGESA 1.2.0.0. I'm curious which settings their BIOS profile changes to make their systems stable. If someone could load their BIOS profile (Asus X570 or Gigabyte B550) and report what it changes in terms of CPU settings, that would be appreciated. I have an Asus B550 motherboard.


----------



## Hilarius

JohnnyFlash said:


> We're talking about the same thing. When the GPU or CPU load drops off, the PSU drops output. If that drop in output is less stable or dips before stabilizing, that can cause problems and seems to mirror your experiences.
> 
> It is by no means a for-sure explanation, only the most likely at the moment, from my point of view. It would seem that these chips are more sensitive to changes than others.


OK, I figured that's where you were going. That definitely seems plausible. Sadly, I won't be able to try my old PSU until this weekend.

@RemoteSpecialist I just ran the script to completion and it passed on all cores. This is with all BIOS settings at default. I then did my standard OCCT test and got an immediate crash.


----------



## Hilarius

I have an update, and I think I might be both more and less confused. It hit me that I've been testing this system with a single monitor, but I've actually been using this video card with 2 monitors for the last however many years. So I decided to do a test with 2 monitors. If you aren't familiar with this issue, using more than 1 monitor causes your VRAM to stay locked at max frequency. While using a single monitor, my VRAM frequency would jump around quite a bit, but with 2 monitors, it's always 1500 MHz. So I ran a test with all settings at stock, the only difference is that I was now using 2 monitors instead of 1, and I can't force a reboot. I used both my OCCT test and my in-game test, tested for almost 40 minutes, and the system was perfectly stable. How this relates to my other solution (downclocking the GPU), I haven't the slightest idea. This could still point to a PSU issue - solution 1) lower max load means removing the load is less of a drop (medium to low), and solution 2) VRAM can't idle which means removing the load results in less of a drop (high to medium), instead of dropping from high to low. But since the CPU error is "cache hierarchy", I've always been wondering if maybe it was a VRAM issue. At this point, I think I'm out of things to test until I can get my hands on a spare PSU and a spare video card.


----------



## 1devomer

Hilarius said:


> I have an update, and I think I might be both more and less confused. It hit me that I've been testing this system with a single monitor, but I've actually been using this video card with 2 monitors for the last however many years. So I decided to do a test with 2 monitors. If you aren't familiar with this issue, using more than 1 monitor causes your VRAM to stay locked at max frequency. While using a single monitor, my VRAM frequency would jump around quite a bit, but with 2 monitors, it's always 1500 MHz. So I ran a test with all settings at stock, the only difference is that I was now using 2 monitors instead of 1, and I can't force a reboot. I used both my OCCT test and my in-game test, tested for almost 40 minutes, and the system was perfectly stable. How this relates to my other solution (downclocking the GPU), I haven't the slightest idea. This could still point to a PSU issue - solution 1) lower max load means removing the load is less of a drop (medium to low), and solution 2) VRAM can't idle which means removing the load results in less of a drop (high to medium), instead of dropping from high to low. But since the CPU error is "cache hierarchy", I've always been wondering if maybe it was a VRAM issue. At this point, I think I'm out of things to test until I can get my hands on a spare PSU and a spare video card.



This thread is not dedicated to troubleshooting individual users' issues.
This thread is dedicated to reporting the WHEA cache hierarchy issue.

Clearly you did not read all 66 pages of this thread.
Please open your own thread to seek help, because you are apparently going down the wrong path by thinking your GPU or PSU is the cause.


----------



## Hilarius

1devomer said:


> This thread is not dedicated to troubleshooting user issues.
> This thread is dedicated to reporting WHEA cache hierarchy issue.
> 
> Clearly you did not read all the 66 pages composing this thread.
> Please open your own thread to seek help, because apparently you are taking the wrong way thinking your gpu or psu are the cause.


Clearly you're the one that didn't read. Each of my reboots is a WHEA 18 cache hierarchy error while running BIOS defaults. Which is precisely what this thread is about. Other people in this thread have also mentioned WHEA 18 errors while gaming and entering a menu or alt-tabbing, which is the issue that I'm having. Their issue was never resolved, so I've been trying to add additional information to maybe help those people, in addition to solving my own problem.

Edit to add: two issues have been uncovered in this thread. If the APIC IDs in the WHEA 18 errors point to 1 or 2 cores, and using a positive offset with curve optimizer allows those cores to become stable, then you have a broken CPU and need to RMA it. If the APIC IDs are random, you're using HWiNFO, and have a Navi GPU, you either need to stop using HWiNFO or install the beta version. Then there's the problem that others have mentioned. If you're getting WHEA 18 errors while in a game and entering a menu or alt-tabbing, a solution to this hasn't been found yet, and RMAing the CPU might not solve this issue. This is the issue that I'm trying to help with. This entire thread has been about finding the source of the WHEA 18 errors and how to correct them, which is precisely what I'm trying to do.
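To make the triage above a bit more concrete, here is a minimal Python sketch of those rules of thumb. It is not a diagnostic tool. The function name, the labels it returns, and the `apic // 2` grouping (which assumes SMT is on and that the two threads of a core have adjacent APIC IDs) are all assumptions for illustration. The APIC IDs themselves would have to be copied by hand from the WHEA-Logger event 18 entries in Event Viewer.

```python
def triage_whea(apic_ids, stable_with_positive_co=False,
                uses_hwinfo=False, has_navi_gpu=False, max_bad_cores=2):
    """Rough triage of WHEA 18 errors, following the rules of thumb in this thread.

    apic_ids: APIC IDs copied from the WHEA-Logger event 18 entries.
    All names and return labels here are hypothetical.
    """
    # Assumption: with SMT enabled, the two threads of a core have
    # adjacent APIC IDs, so integer division by 2 groups them per core.
    cores = {apic // 2 for apic in apic_ids}

    if not cores:
        return "no-data"
    if len(cores) <= max_bad_cores and stable_with_positive_co:
        # Errors stick to 1 or 2 cores and a positive CO offset fixes them.
        return "likely-defective-cpu"
    if len(cores) > max_bad_cores and uses_hwinfo and has_navi_gpu:
        # Errors spread over many cores on a system running HWiNFO with a Navi GPU.
        return "hwinfo-navi"
    # Everything else, including the menu / alt-tab crashes, is still open.
    return "unresolved"


print(triage_whea([0, 1, 0, 1], stable_with_positive_co=True))
print(triage_whea(list(range(12))))
```

The second call mirrors my own case (errors on all 12 threads, no HWiNFO), which lands in the "unresolved" bucket.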


----------



## 1devomer

Hilarius said:


> Clearly you're the one that didn't read. Each of my reboots is a WHEA 18 cache hierarchy error while running BIOS defaults. Which is precisely what this thread is about. Other people in this thread have also mentioned WHEA 18 errors while gaming and entering a menu or alt-tabbing, which is the issue that I'm having. Their issue was never resolved, so I've been trying to add additional information to maybe help those people, in addition to solving my own problem.
> 
> Edit to add: two issues have been uncovered in this thread. If the APIC IDs in the WHEA 18 errors point to 1 or 2 cores, and using a positive offset with curve optimizer allows those cores to become stable, then you have a broken CPU and need to RMA it. If the APIC IDs are random, you're using HWiNFO, and have a Navi GPU, you either need to stop using HWiNFO or install the beta version. Then there's the problem that others have mentioned. If you're getting WHEA 18 errors while in a game and entering a menu or alt-tabbing, a solution to this hasn't been found yet, and RMAing the CPU might not solve this issue. This is the issue that I'm trying to help with. This entire thread has been about finding the source of the WHEA 18 errors and how to correct them, which is precisely what I'm trying to do.


The source of the issue is already known: poorly binned and defective chiplets are being sold by AMD, due to chiplet supply and manufacturing constraints!
This was already the case with the Ryzen 3000 launch. AMD is just repeating itself, but this time with bigger issues!

Now, I can explain how Ryzen works and why it is crashing the way it is:
-Ryzen CPUs have a lot of built-in power-saving features; the CPU can wake up or put to sleep the cores (and the cache) very, very rapidly.
-When a core wakes up, the PBO boost can ramp up the core clock very, very rapidly.
-The clock ramp-up scheme is based on and driven by the overall CPU load, alongside the CPU's own stored PBO values.
-At high loads the CPU runs lower core clocks; at light loads, on the other hand, it will output the max boost clock the core can "achieve".
-There is a maximum voltage scaling, driven by the CPU load, _*generally*_ 1.5 V single core and 1.35 V all cores (depending on the number of cores).
-The CPU silicon fitness and PBO values are stored inside the CPU. The issue is not related to a particular motherboard, but rather evolves with the AGESA BIOS versions.


Therefore, under light loads, the core boost peaks at its max achievable frequency.
But the core *NEEDS* to be good enough to achieve the max boost frequency at a fixed, predetermined voltage.
That's why users are reporting fewer or no crashes when running P95, OCCT or CB20 at roughly 100% load.
Almost all cores are good enough to sustain the lower clock imposed by 100% load at a given voltage.
It is also why users keep complaining about crashes at idle or under light-ish loads.
What a lot of people consider "idle" in reality consists of very quick boosts up to 5 GHz on a couple of cores, to please AMD's single-core marketing materials.
If the single-core boost is too fast and the default max voltages are too low for the core's silicon quality, the CPU will crash the OS with a WHEA error.

That's why your new rig crashes when you are watching the game cutscene, and does not crash when you load the CPU with P95 on top of the game.
Moreover, your CPU keeps crashing with +20 applied in the curve optimizer, which means you got an awfully binned CPU.
It requires a lot of voltage to boost to its max clocks, and the default voltage is not enough to accommodate the cores' bad silicon quality.

By lowering the GPU boost clocks, you lowered the number of FPS the GPU can output during the cutscene, hence lowering the overall load/stress on the CPU.
Cutscenes are not really heavy for the CPU/GPU, which means a ton of FPS, up to 300+ FPS, during the cutscene, before getting back to a normal GPU/CPU load.
The CPU will still boost to its max clocks, but it will have less to do because it needs to render fewer frames.
Hence it manages to complete the cutscene even if barely stable, and overclocking the GPU would make the CPU crash even faster, I guess!


What crashes the CPU, in my opinion, is a very fast, transient load spike: the CPU clocks overshoot relative to the allowed max voltage curve.
There is no way to recover from a badly binned, dull or defective CPU. The only real solution is to RMA the CPU and hope AMD sends you a decent piece of doped sand!


----------



## Vesimas

I just assembled an Asus Dark Hero, 5800X, G.Skill 3800 Neo and AMD RX 6900 XT. First I updated the BIOS to the latest one, then installed Windows without touching anything in the BIOS. I installed the AMD chipset driver, the Adrenalin driver and Ryzen Master. I ran three multi-thread CB15 runs and one single-thread run (just to compare scores with my old PC), plus one Heaven Benchmark run. So far so good. I'll update in the next few days if something happens. Or, with what I did, should something have happened already?


----------



## jvidia

1devomer said:


> The source of the issue is already known: poorly binned and defective chiplets are being sold by AMD, due to the chiplets supply and manufacturing constraints!
> This was already the case with the Ryzen 3k launch, AMD is just repeating himself, but this time with bigger issues!
> 
> Now, i can explain how Ryzen works and why it is crashing the way it is:
> -Ryzen cpu are built-in with a lot of power saving features, the cpu can wake up or puts to sleep the cores (and the cache) very very very rapidly.
> -When the core wakes up, the PBO boost can ramp up the core clock very very very rapidly.
> -The clock ramp up scheme is base and driven by the cpu overall load, alongside its own PBO stored values.
> -At high loads the cpu runs lower core clocks, on the other hand, at light loads, the cpu will output the max boost clock the core can "achieve".
> -There is a maximum voltage scaling, driven by the cpu load, _*generally *_1.5v single core, 1.35v all cores (depending on the n° of cores).
> -The cpu silicon fitness, PBO values, are stored inside the cpu, the issue is not related to a particular mb, but rather evolve alongside the AGESA bios versions.
> 
> 
> Therefore, under light loads, the core boost peaks at its max achievable frequency.
> But the core* NEEDS *to be good enough to achieve its max boost frequency, at fixed determined voltage.
> That's why users are reporting fewer or no crash when running P95, OCCT, CB20, at 100% ish loads.
> Because almost all cores are good enough to sustain the lower clock imposed by 100% load, at a determined voltage.
> Moreover, that's why instead, users keep complaining about crashing at idle or under light loads ish.
> Because what a lot of people consider "idle", in reality represent very quick boosts up to 5Ghz on a couple of cores, to please the AMD single core marketing materials.
> If the single core boost is too fast and the default max voltages are too low for the core silicon quality, the cpu will crash the OS with a WHEA error.
> 
> That's why your new rig is crashing when you are watching the game cutscene, and it is not crashing when you load the cpu with P95 on top of the game.
> Moreover, your cpu keep crashing when a +20 is applied with the curve optimizer, which mean you got an awfully binned cpu.
> That require a lot of voltage to be able to boost to its max clocks, the default voltage is not enough to accommodate the cores bad silicon quality.
> 
> By lowering the gpu boost clocks, you lowered the amount of FPS that the gpu can output during the cutscene, hence lowering the overall load/stress on the cpu.
> Cutscenes are not really heavy or so for the cpu/gpu, which mean a ton of FPS, up to 300+FPS, during the cutscene, before getting back to a normal gpu/cpu load.
> The cpu will still boost to its max clocks, but it will have fewer things to do because it needs to render fewer frames.
> Hence, managing to complete the cutscene even if barely stable, and overclocking the gpu would make the cpu crash even faster, i guess!
> 
> 
> What crash the cpu, in my opinion, is a very fast, transient load spike, the cpu clocks overshot compared to the allowed max voltage curve.
> There is no way to recover from a badly binned, dull or defective cpu, the only real solution is to RMA the cpu and hope AMD send you a decent piece of doped sand!


<deleted>


----------



## JohnnyFlash

Vesimas said:


> I just assembled an Asus Dark Hero, 5800X, G.Skill 3800 Neo and AMD RX 6900XT. First thing i updated the bios to the last one, then installed windows without touching anything in bios. Installed AMD chipset driver, Adrenalide driver and Ryzen Master. I ran three multi thread CB15 and one single thread (just to compare score with old pc) and i made an Heaven Benchmark run. So far so good, i'll update in the next few days if something happen or with what i did should already happened?


Did you get the new BIOS (3302) that was posted today?


----------



## Vesimas

No. As I just said in the official thread, when I checked the Italian website there was only 3204. If I want the new one I need to follow the link.


----------



## LuchoU

1devomer said:


> The source of the issue is already known: poorly binned and defective chiplets are being sold by AMD, due to the chiplets supply and manufacturing constraints!
> This was already the case with the Ryzen 3k launch, AMD is just repeating himself, but this time with bigger issues!
> 
> Now, i can explain how Ryzen works and why it is crashing the way it is:
> -Ryzen cpu are built-in with a lot of power saving features, the cpu can wake up or puts to sleep the cores (and the cache) very very very rapidly.
> -When the core wakes up, the PBO boost can ramp up the core clock very very very rapidly.
> -The clock ramp up scheme is base and driven by the cpu overall load, alongside its own PBO stored values.
> -At high loads the cpu runs lower core clocks, on the other hand, at light loads, the cpu will output the max boost clock the core can "achieve".
> -There is a maximum voltage scaling, driven by the cpu load, _*generally *_1.5v single core, 1.35v all cores (depending on the n° of cores).
> -The cpu silicon fitness, PBO values, are stored inside the cpu, the issue is not related to a particular mb, but rather evolve alongside the AGESA bios versions.
> 
> 
> Therefore, under light loads, the core boost peaks at its max achievable frequency.
> But the core* NEEDS *to be good enough to achieve its max boost frequency, at fixed determined voltage.
> That's why users are reporting fewer or no crash when running P95, OCCT, CB20, at 100% ish loads.
> Because almost all cores are good enough to sustain the lower clock imposed by 100% load, at a determined voltage.
> Moreover, that's why instead, users keep complaining about crashing at idle or under light loads ish.
> Because what a lot of people consider "idle", in reality represent very quick boosts up to 5Ghz on a couple of cores, to please the AMD single core marketing materials.
> If the single core boost is too fast and the default max voltages are too low for the core silicon quality, the cpu will crash the OS with a WHEA error.
> 
> That's why your new rig is crashing when you are watching the game cutscene, and it is not crashing when you load the cpu with P95 on top of the game.
> Moreover, your cpu keep crashing when a +20 is applied with the curve optimizer, which mean you got an awfully binned cpu.
> That require a lot of voltage to be able to boost to its max clocks, the default voltage is not enough to accommodate the cores bad silicon quality.
> 
> By lowering the gpu boost clocks, you lowered the amount of FPS that the gpu can output during the cutscene, hence lowering the overall load/stress on the cpu.
> Cutscenes are not really heavy or so for the cpu/gpu, which mean a ton of FPS, up to 300+FPS, during the cutscene, before getting back to a normal gpu/cpu load.
> The cpu will still boost to its max clocks, but it will have fewer things to do because it needs to render fewer frames.
> Hence, managing to complete the cutscene even if barely stable, and overclocking the gpu would make the cpu crash even faster, i guess!
> 
> 
> What crash the cpu, in my opinion, is a very fast, transient load spike, the cpu clocks overshot compared to the allowed max voltage curve.
> There is no way to recover from a badly binned, dull or defective cpu, the only real solution is to RMA the cpu and hope AMD send you a decent piece of doped sand!


1devomer, this is a very good analysis. I had something similar in my head, but you wrote it up very well.
If this fitness and PBO info is stored inside each CPU, do you think it's possible for AMD to stabilize things with their AGESA releases through BIOS updates? How are they going to control this? The only way I can think of is some kind of "Min CPU Core Boost Override", similar to Max CPU Core Boost Override where you can go from 0 to +200 MHz, so maybe they will need to provide a 0 to -200 MHz range. Sounds crazy. The other possible way is to add additional voltage in the background, but this would affect all CPUs, even the good ones.


----------



## thigobr

I noticed exactly what 1devomer said... I only get crashes or Prime95 failures when I test single-thread or light loads. Core 0 on my 5950X is the strongest and always boosts highest, but it's unstable when doing that... Prime95 SSE will make it go to 5025 MHz and boom! Rounding error or reboot...


----------



## MikeS3000

thigobr said:


> I noticed exactly what 1devomer said... I only get crashes or Prime95 fails when I test single thread or light loads. The core 0 on my 5950x is the strongest and always boost higher but it's unstable when doing that... Prime95 SSE will make it go to 5025MHz and boom! Rounding error or reboot...


Same issue on my 1st 5900X. AMD approved the RMA in a day for me. When they ask for proof, take a video of resetting BIOS defaults and failing P95 at stock. Send them the link to the video. No questions asked. My new 5900X seems to work fine. I can definitely tell that this CPU is tuned much more conservatively at stock for boost clocks. I have to play with CO a lot more to get more performance out of it (not a bad thing, as your everyday user won't experience crashes at stock and likely won't notice the 0.5% to 1.0% lower boost clocks in favor of stability).


----------



## LuchoU

My bad 5800X boosts every core to 4,850 MHz when running single-thread, and for it to be stable I need to use +8 CO. I believe the max boost clock for this CPU is 4.7 GHz according to the product specifications. If I disable PBO it still boosts to 4,850 MHz, so no change there. Clearly these CPUs are pushed to the max without stability in mind.


----------






## 1devomer

LuchoU said:


> 1devomer, this is a very good analysis. I thought something similar in my head, but you did a very good written explanation.
> If this fitness and PBO info are stored inside each CPU, do you think it's possible for AMD to stabilize things with their AGESA releases through BIOS? how they are going to control this? the only way I can think of is to do some kind of "Min CPU Core Boost Override", something similar to Max CPU Core Boost Override where you can go from 0 to +200Mhz, so maybe they will need to provide a 0 to -200Mhz, sounds crazy. The other possible way is to add additional voltage in the background, but this will affect all CPUs, even the good ones.



BIOS updates cannot fix everything; updating the BIOS will not change the physical properties of a CPU.
A bad CPU will remain a bad CPU, and no BIOS update can change the conditions under which the CPU was made.

However, BIOS boost adjustments and OS tweaks will help to mitigate some issues.
Adding voltage is also an option, but not without drawbacks like heat, which in the end means lower clocks.

If the CPUs being sold differ a lot from one another, it will be harder and harder to fix everything with a simple BIOS update.
Little reminder: a CPU cannot regenerate itself, it can only degrade over time.
It would be nice to get something at least decent to start with, instead of something barely decent or even rubbish.


----------



## TheSh4d0w

TheSh4d0w said:


> The one I sent back was (assuming I'm reading it properly) BG 2050PGS
> The replacement is BG 2104PGS
> 
> Tried just pulling one ram stick at a time / alternating slots in case it's not the CPU, but it consistently crashes at the 10-15 mark of prime95....


After upgrading to a new bios asus released yesterday (3302 for the rog crosshair viii impact), my stability issues appear to be resolved. With all stock bios settings I've been doing prime95 for an hour now successfully, previously it would crash around 12-13 minutes in.

Edit: Nope, now I'm stable under load, but not idle apparently. Just leaving the PC at the windows login screen for a few minutes will result in a bsod.


----------



## thigobr

MikeS3000 said:


> Same issue on my 1st 5900x. AMD approved RMA in a day for me. When they ask for proof, take a video of resetting bios defaults and failing p95 at stock. Send them the link to the video. No questions asked. My new 5900x seems to work fine. I can definitely tell that this CPU is tuned much for conservatively at stock for boost clocks. I have to play with CO a lot more to get more performance out of it (not a bad thing as your everyday user won't experience crashes at stock and likely won't notice the 0.5% to 1.0% lower boost clocks in favor of stability)


Thanks for the tip Mike! I will make a video resetting the bios, booting and running Prime95 1T... It doesn't take more than 3min before first rounding error pops up!


----------



## woozywoo

Hi all - hoping to get some help here with my WHEA errors and random reboots. 

*CPU:* AMD Ryzen 9 5900X
*Motherboard:* ASUS TUF X570-PLUS GAMING (Wi-Fi)
*BIOS:* Version 3405. This is ASUS's latest non-beta version that has AGESA 1.2.0.0
*RAM:* G.Skill 32GB DDR4 3600MHz Ripjaws CL16 (2x16GB)

About once a week, I get a random reboot, or a BSOD (WHEA_UNCORRECTABLE_ERROR). This has happened on light loads (just a few tabs of Chrome open) or on heavier loads (i.e., gaming + streaming). This can happen on just stock settings or when I have XMP/DOCP enabled at 3600mhz or 3200mhz. With stock/3600mhz/3200mhz, the BSODs/restarts all happen with more or less the same frequency -- again, about once a week.

I will note that my mobo initially came with BIOS version 2812 (a beta version) installed. Before I updated the AMD chipset drivers and BIOS, I was getting random restarts and BSODs about once per day so updating drivers and BIOS has been a huge improvement in terms of stability for me.

-In the Event Viewer, I do not see any WHEA warnings or errors. I'm not seeing the WHEA Cache Hierarchy Error that many other folks are seeing.
-I have no issues with CPU or GPU temps. They are all in-line with expectations when idle and under load. 

I've tried the following 3 configs:

-Completely stock settings
-XMP turned on at 3200mhz (IF = 1600mhz), C-states disabled
-XMP turned on at 3600mhz (IF = 1800mhz), C-states disabled
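For anyone checking the IF numbers in those configs: DDR4 is double data rate, so the real memory clock is half the advertised rate, and running Infinity Fabric 1:1 means FCLK equals that memory clock. A tiny Python sketch of the arithmetic (the function name is made up for illustration):

```python
def fclk_for_1to1(ddr_rate_mts):
    """FCLK in MHz for a 1:1 ratio with DDR memory.

    DDR transfers twice per clock, so the memory clock (and a 1:1 FCLK)
    is half the advertised transfer rate.
    """
    return ddr_rate_mts / 2


for rate in (2133, 3200, 3600):
    print(f"DDR4-{rate}: FCLK {fclk_for_1to1(rate)} MHz")
```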

With each of these configs, I've run the following tests and have had no errors/issues:

Karhu RAM test for ~8 hours
Memtest86 for ~8 hours
OCCT CPU Large Data test with extreme option checked for ~1 hour
OCCT Memory test for ~1 hour
Prime95 test, large FFTs and small FFTs, a few hours
Cinebench23 and 3DMark all run without issues

It seems weird to me that my PC can pass all of these stress tests, but still experience BSODs or random restarts about once per week. 

I have not tried touching any of the voltages or other settings in BIOS. (Honestly, not too knowledgeable about voltages.)

Any suggestions as to what else I can try? Has anyone with my motherboard/BIOS/CPU had success with tweaking BIOS settings or voltages to create stability?

It seems like my issue is not a RAM issue and I can only hope that a future BIOS version will make my PC more stable.


----------



## JohnnyFlash

What is your SSD?


----------



## woozywoo

JohnnyFlash said:


> What is your SSD?


512GB ADATA XPG ASX6000LNP PCIe NVMe


----------



## JohnnyFlash

woozywoo said:


> 512GB ADATA XPG ASX6000LNP PCIe NVMe


First thing I'd try is manually setting it to PCI-E 3.0 in the bios. Even though it shouldn't have an effect, this has fixed it for a few people.

I would also download the free version of 'who crashed', so you can get a better understanding of the crash reports.


----------



## TheSh4d0w

TheSh4d0w said:


> After upgrading to a new bios asus released yesterday (3302 for the rog crosshair viii impact), my stability issues appear to be resolved. With all stock bios settings I've been doing prime95 for an hour now successfully, previously it would crash around 12-13 minutes in.


I spoke too soon. After 2 hours in Prime95 (never managed that before) I quit the test and went out for a couple of hours. Came home to it sitting in the BIOS. Hard powered it off and back on, and then got another WHEA BSOD while just sitting at the Windows login prompt. This is so aggravating.

Using a Samsung 980 Pro (PCIe 4), so I just set it to 3 as an experiment...


----------



## Deepcuts

TheSh4d0w said:


> I spoke too soon. After 2 hours in Prime95 (never managed that before) I quit the test and went out for a couple of hours. Came home to it sitting in the BIOS. Hard powered it off and back on, and then got another WHEA BSOD while just sitting at the Windows login prompt. This is so aggravating.
> 
> Using a Samsung 980 Pro (PCIe 4), so I just set it to 3 as an experiment...


If I recall correctly, this is your 2nd CPU with problems?
If so, I see AMD is stepping up their game.
Soon a golden sample will be considered one that does not crash for at least 1 day.


----------



## JohnnyFlash

TheSh4d0w said:


> I spoke too soon. After 2 hours in Prime95 (never managed that before) I quit the test and went out for a couple of hours. Came home to it sitting in the BIOS. Hard powered it off and back on, and then got another WHEA BSOD while just sitting at the Windows login prompt. This is so aggravating.
> 
> Using a Samsung 980 Pro (PCIe 4), so I just set it to 3 as an experiment...


Is your LLC on auto?


----------



## TheSh4d0w

Deepcuts said:


> If I recall correctly, this is your 2nd CPU with problems?
> If so, I see AMD is stepping up their game.
> Soon a golden sample will be considered one that does not crash for at least 1 day.


Correct. AMD support doesn't seem willing to replace the CPU again without me finding another CPU myself to test with, or paying a local store to test it for me. This is my first/last AMD system, so I'm having to grab a machine from the office to steal parts from this weekend.

So far it seems to be stable with my SSD dropped down to 3.0 though, I left it idle for an hour and am now trying prime95.



JohnnyFlash said:


> Is your LLC on auto?


Uhh, I'm on all stock BIOS settings. Next time I reboot I can check, but hopefully that's not for a while.


----------



## danny9428

One thing to note when troubleshooting Zen 3 based CPUs: after a crash, if you do not go into the BIOS and do a save and reset, and instead let it keep booting into your OS, it tends to crash again right out of the gate, potentially in an endless loop.

At least that is what I experienced on my Dark Hero board...
It seems the BIOS and AGESA are still kinda crappy in supporting the CPU

This is tricky and annoying, but I'd recommend that every time you suspect a CPU crash occurred, you go into the BIOS and do a save and restart (or at least try making some adjustment)

btw I think I'm set to go Threadripper instead...I snagged a good deal on a 3960X (at about same price as a 5950X) so I think I'll just pass on my Zen 3 chip when AMD ship it back to me....


----------



## Xziel

Hello all, just got a 5900X. Whenever I play games and watch Netflix, or use UserBenchmark, I get the WHEA error: Uncorrectable.
The WHEA error triggers instantly when UserBenchmark reaches the GPU testing segment.

As long as I stay clear of anything that uses the GPU, I can go all night without crashes.
This did not happen when I was on my 3900X.
Using a 750w for 5900X and 3080. Updated all the BIOS settings, and tested all fixes in this thread.
Could this be a PSU issue?


----------



## LuchoU

Xziel said:


> Hello all , just got a 5900x. Whenever I play games and watch netflix , or use userbenchmark , I get the WHEA error : Uncorrectable.
> The WHEA error instant triggers when the userbenchmark reaches the GPU testing segment.
> 
> As long as I stay clear from using anything that requires the GPU component , I can go all night without crashes.
> This has not happened when I was on my 3900x .
> Using a 750w for 5900x and 3080. Updated all the bios settings , and tested all fixes in this thread.
> Could this be a PSU issue ?


If you are using default BIOS settings and
you have a Ryzen 5xx0X CPU and
you get a WHEA 18 and
it has either of these 2 descriptions in Event Viewer,
your CPU is most probably the culprit.

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error

OR

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
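If you have a lot of events to go through, the check above can be sketched in a few lines of Python. This only looks at the description text copied out of Event Viewer; it cannot know whether you are on default BIOS settings or which CPU you have, so those two conditions are still on you. The function names are made up for illustration.

```python
import re

# Error types named in the post above; anything else is left unclassified.
CPU_SUSPECT_TYPES = {"Bus/Interconnect Error", "Cache Hierarchy Error"}


def parse_whea_description(text: str) -> dict:
    """Turn 'Key: value' lines of an event description into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields


def points_at_cpu(text: str) -> bool:
    """Apply the rule of thumb from this post to one event description."""
    fields = parse_whea_description(text)
    return (
        fields.get("Reported by component") == "Processor Core"
        and fields.get("Error Source") == "Machine Check Exception"
        and fields.get("Error Type") in CPU_SUSPECT_TYPES
    )


sample = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
"""
print(points_at_cpu(sample))
```

A description with a different component or error type simply returns `False`, which means "not covered by this rule", not "the CPU is fine".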


----------



## TheSh4d0w

I just tweeted AMD to push for some sort of official response to this, would encourage everyone else to do the same!

https://twitter.com/i/web/status/1370424134753087493


----------



## Redwoodz

ghiga_andrei said:


> These cpus launched early November and we are now 4 months and 5 agesas later and you are talking about waiting for updated agesa. So it's ok for someone to wait more than 4 months and have reboots every day with a new cpu. Your logic is solid man, don't listen to the hype.


You are silly if you build a brand new PC on new tech without another PC. You also can clearly use the PC you just found a way to make it error.lmao. /thread Good job completely missing my point by the way.


----------



## ghiga_andrei

Redwoodz said:


> You are silly if you build a brand new PC on new tech without another PC. You also can clearly use the PC you just found a way to make it error.lmao. /thread Good job completely missing my point by the way.


So in your opinion everyone building a new ryzen system should have a spare pc at home. Great stuff.


----------



## Deepcuts

ghiga_andrei said:


> So in your opinion everyone building a new ryzen system should have a spare pc at home. Great stuff.


Just ignore him. He is clearly having some issues.


----------



## TheSh4d0w

Welp, after struggling with getting my replaced proc stable I was able to borrow a ryzen 2400g system from the office over the weekend. Swapped its proc into my system, left it running overnight, and when I checked it in the morning it was sitting in bios (ie, it bsod'ed overnight). Looks like not only was my first proc defective, but my motherboard is too! Did a live chat with asus and they were quick to volunteer an RMA since I'd already replaced the CPU.

I just want a working PC


----------



## thigobr

After a week of back and forth with AMD support they finally approved the RMA.

As @MikeS3000 suggested a few posts ago, they stopped asking for extra "troubleshooting" steps after I sent a video of me resetting the BIOS/loading optimized defaults/booting Windows and running Prime95 SSE single thread: rounding error and crash.

On this CPU, core #0 was apparently the only one giving errors. I tried the CoreCycler script with core 0 skipped and it mostly worked (1 cycle of 10 min per core). Core #0, on the other hand, throws errors in less than 5 minutes.

I will mail the CPU today; let's see how long it takes for them to send a new CPU back. I hope the new one is stable...


----------



## JohnnyFlash

TheSh4d0w said:


> Welp, after struggling with getting my replaced proc stable I was able to borrow a ryzen 2400g system from the office over the weekend. Swapped its proc into my system, left it running overnight, and when I checked it in the morning it was sitting in bios (ie, it bsod'ed overnight). Looks like not only was my first proc defective, but my motherboard is too! Did a live chat with asus and they were quick to volunteer an RMA since I'd already replaced the CPU.
> 
> I just want a working PC


Or your PSU.


----------



## Hueristic

TheSh4d0w said:


> Welp, after struggling with getting my replaced proc stable I was able to borrow a ryzen 2400g system from the office over the weekend. Swapped its proc into my system, left it running overnight, and when I checked it in the morning it was sitting in bios (ie, it bsod'ed overnight). Looks like not only was my first proc defective, but my motherboard is too! Did a live chat with asus and they were quick to volunteer an RMA since I'd already replaced the CPU.
> 
> I just want a working PC



Did you hard reset the cmos to default values?


----------



## Catscratch

TheSh4d0w said:


> Welp, after struggling with getting my replaced proc stable I was able to borrow a ryzen 2400g system from the office over the weekend. Swapped its proc into my system, left it running overnight, and when I checked it in the morning it was sitting in bios (ie, it bsod'ed overnight). Looks like not only was my first proc defective, but my motherboard is too! Did a live chat with asus and they were quick to volunteer an RMA since I'd already replaced the CPU.
> 
> I just want a working PC





JohnnyFlash said:


> Or your PSU.





Hueristic said:


> Did you hard reset the cmos to default values?


As a last resort, if possible, i'd revert the bios to a version that doesn't support zen3 series.


----------



## TheSh4d0w

JohnnyFlash said:


> Or your PSU.


I was using the PSU with my last box for a couple months, so pretty sure it's fine.



Hueristic said:


> Did you hard reset the cmos to default values?


More times than I can count



Catscratch said:


> As a last resort, if possible, i'd revert the bios to a version that doesn't support zen3 series.


Had to return the loaner proc, so just gonna send the mobo back to asus and hope a replacement/repair solves it.


----------



## JohnnyFlash

TheSh4d0w said:


> I was using the PSU with my last box for a couple months, so pretty sure its fine.


Honestly, that makes it more likely, not less. PSUs weaken over time, and adding a higher load can expose that degradation faster, especially with chips that are more sensitive to ripple. I'm not saying I'm certain, but don't count it out either.


----------



## TheSh4d0w

JohnnyFlash said:


> Honestly, that makes it more likely, not less. PSUs weaken over time, and adding a higher load can expose that degradation faster, especially with chips that are more sensitive to ripple. I'm not saying I'm certain, but don't count it out either.


It's a brand new psu that I used for 3 months with my old box until I found another pci-e power cable for my old one, so still new but established to be stable


----------



## DizzyAMD

Been lurking for a while and wanted to share my experience. 

Received a 5900X batch 2046PGS from Newegg Canada and this CPU wouldn't even let me install Windows without restarting or getting a BSOD. I had Windows preinstalled on a separate SSD and used that to fire up the computer. Immediately the WHEA uncorrectable errors and random restarts started occurring at idle and under load. I had to set the Curve Optimizer to +8 on all cores to get it to run stable. I ultimately decided to RMA the chip, and the new 5900X batch 2104PGS has been stable for 2 days now without any BSOD or random restarts.


----------



## ethanmalone

Hello there.

Been dealing with my own Zen 3 woes recently. Bought a R7 5800x at Micro Center last Thursday with a corresponding X570 ITX Aorus board. The first chip I had WHEA'd constantly at full stock settings with no XMP. I was able to stabilize it for a few hours on average by under-volting to 1.28750 and locking the ratio at 42. Disabling CPB also made it 100% stable but also reduces the value of the chip significantly.

I exchanged it, cleared CMOS and stress tested at full stock. I also ran the @ghiga_andrei CB R20 WHEA tests on all cores. No crashes. Added XMP. Same deal; no crashes. No "buts" at this point. Appears to be working and the issue was just the "normal failure rate" of the Zen 3 line up.

Tips: if you see WHEA and can circumvent by disabling CPB: exchange your CPU if at all possible. I know I am speaking from a literal state of privilege by having Micro Center available but try buying from a brick and mortar with at least a two week return policy. The guys at MC even looked for a 2021 sku for me as I've heard those are more stable. 

Side note: the Fusion 2.0 software would cause my hard drives not to be accessible. Keep an eye on it.


----------



## OCmember

@ghiga_andrei Thanks for the testing program. 

Decided to run ghiga_andrei's testing program, temps were good, 65 °C for each core. On BIOS 3302 with the Dark Hero. I've had the chip since Nov. 2020. Once since I set up the system, it crashed and was difficult to start again. Said no drive detected. Didn't think much of it. Recently the monitor flickered black for a second, twice, but it's still been running smoothly. System is a daily rig, everything is default except for a few SATA ports set to hot swap. Ran the app, one core failed - core 5 or 6? Not sure, but it's a surprise for sure. I don't get reboots or WHEA errors, but this is an eye opener. Could be the BIOS, or something else, but I'm gonna re-run it several times to see what's going on.


----------



## ghiga_andrei

OCmember said:


> @ghiga_andrei Thanks for the testing program.
> 
> Decided to run ghiga_andrei's testing program, temps were good 65*c for each core. On Bios 3302 with the Dark Hero. I've had the chip since Nov. 2020. Once, since I setup the system, the system crashed and was being difficult trying to start. Said no drive detected. Didn't think much of it. Recently the monitor flickered black for a second twice, it's still been running smoothly. System is a daily rig, everything is default except for a few SATA ports set to hot swap. Ran the app, one core failed - core 5 or 6? not sure but it's a surprise for sure. I don't get reboots or WHEA errors but this is an eye opener for sure. Could be the bios, or something else, but I'm gonna re-run it several times to see what's goin on.


Are you sure it was my tool you used? Because my tool would cause a reboot, not tell you a core has failed.
There is also that Prime95-based new tool that runs single-core SSE on each core and tells you which cores have rounding errors (fails) and maybe you used that one. Not sure who made that one.

Either way, neither of these tests should fail on any CPU.


----------



## OCmember

ghiga_andrei said:


> Are you sure it was my tool you used ? Because my tool would cause a reboot, not tell you a core has failed.
> There is also that Prime95-based new tool that runs single-core SSE on each core and tells you which cores have rounding errors (fails) and maybe you used that one. Not sure who made that one.
> 
> Either way, neither of these tests should fail on any CPU.


CoreCycler 0.7.9.2 Isn't that yours? 

And the error seems consistent to that particular core so it seems one core is not stable. Test 2


----------



## ghiga_andrei

OCmember said:


> CoreCycler 0.7.9.2 Isn't that yours?
> 
> And the error seems consistent to that particular core so it seems one core is not stable. Test 2
> 
> View attachment 2483284


Nope, mine's the one running Cinebench and other stuff afterwards, not Prime95.


----------



## Vesimas

I suppose I'm safe with my 5800X and Dark Hero with BIOS 3204 atm. As I said, the first tests I did after installing Windows days ago were all good. Today I ran some Time Spy Extreme, Cinebench R20 multi and single, Division 2 bench and gameplay, installed a bunch of apps and never had an issue. I also tested some cores (not all) with @ghiga_andrei's app. After that I set the "xmp" profile @3800, still no problems. What do you think? Can I assume I'm safe from WHEA errors?


----------



## xeizo

Vesimas said:


> I suppose i'm safe with my 5800X and Dark Hero with bios 3204 atm. As i said the first test i did after installing Windows days ago were all good. Today i ran some Time Spy Extreme, Cinebench R20 multi and single, Division 2 bench and gameplay, installed a bunch of app and never had an issue. I tested also some cores (not all) with @ghiga_andrei app. After that i have set the "xmp" profile @3800 still no problems. What do you think? Can i assume i'm safe from Whea errors?


No, not completely safe; a sudden reboot (at idle) can happen after like two weeks ...


----------



## Woodie_SL

Vesimas said:


> I suppose i'm safe with my 5800X and Dark Hero with bios 3204 atm. As i said the first test i did after installing Windows days ago were all good. Today i ran some Time Spy Extreme, Cinebench R20 multi and single, Division 2 bench and gameplay, installed a bunch of app and never had an issue. I tested also some cores (not all) with @ghiga_andrei app. After that i have set the "xmp" profile @3800 still no problems. What do you think? Can i assume i'm safe from Whea errors?


For me it happened when the CPU was around 6 weeks old, so it is really hard to tell if a chip is good or not until it reboots on its own. Don't worry about it until WHEA errors actually occur; there is nothing you can do to prevent it from happening. From my observation, it seems early batches have a much higher failure rate than recent batches, as indicated by the declining number of responses in this thread.

My bad 5900X was from batch 2048SUS. After RMAing it I just sold it and got a 10900KF, as I need a 100% stable and reliable PC as my tool to work with during this work-from-home period.

Fun fact: People in MSI know this WHEA issue and told me to RMA the CPU instead of waiting for BIOS update. So there is no way that AMD themselves don't know that this is a thing.


----------



## ghiga_andrei

Anthos said:


> As far as I know unless the laws have changed (obviously varies from country to country) shipping is at the customer's expense. I mean currently I am living in the UK (I am not from around here) but very likely in the next few months I will be moving back home to the other side of the continent. Are the shops and manufacturers now forced to pay more expensive shipping if something breaks down just because I moved? What if I moved to Australia?


Hmm, seems it is only a rule in the EU: Under EU law, within the legal guarantee period of two years, defective products must be repaired or replaced *without any cost to the consumer*. This includes any shipping costs.


----------



## LuchoU

Here are my results after using CoreCycler with a 5800x, all stock settings....

- 1st run for best core, which is N#2 (the one with a star in ryzen master), failed with a rounding error in Iteration 3.

Added CO +5 to that core.

- 2nd run of CoreCycler

It reached the 10th iteration without any core failing after adding CO +5 to core N#2.
I wanted to test with a less positive value for Core N#2, so I used CO +3 instead of CO +5 and let CoreCycler run on core N#2 only, ignoring all the other cores.

- 3rd run of CoreCycler failed on Iteration 17 with rounding error for core N#2, so CO +3 is a no go.

Back to CO +5 for core #2. I left it running over night for 13 hours (129 iterations). Result: Passed
OK at this point I think core N#2 is stable with CO +5, but I still need to know if CO +4 will work, the idea is to lower the vcore. This is pending.

- 4th run of CoreCycler for cores 0,1,3,4,5,6,7 overnight (ignoring core N#2 which I know is stable with positive +5)

Results: Cores 0,1,3,4,5,6,7 passed after 14 hours (20 iterations). This means each one of those cores was tested for 2 hours. I'm not sure if I can certify all cores are stable after 2 hours each, but I will take the risk.

PS: This is time consuming, but I still need to know if CO +4 will work for Core #2, so I will let it run overnight today. If it fails I will go back to CO +5 and I'm done.
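The "2 hours per core" figure can be checked with some quick arithmetic. This is only a sketch and assumes CoreCycler gives every tested core the same time slice in each iteration; the helper names are made up for illustration.

```python
def minutes_per_core_run(total_hours: float, iterations: int, cores: int) -> float:
    """Average minutes one core is tested in one iteration."""
    return total_hours * 60 / (iterations * cores)


def hours_per_core(total_hours: float, cores: int) -> float:
    """Total test time each core received, assuming equal time per core."""
    return total_hours / cores


# 4th run: cores 0,1,3,4,5,6,7 (7 cores), 14 hours, 20 iterations
print(hours_per_core(14, 7))             # 2.0 hours per core
print(minutes_per_core_run(14, 20, 7))   # 6.0 minutes per core per iteration

# Overnight run on core 2 alone: 13 hours, 129 iterations
print(round(minutes_per_core_run(13, 129, 1), 2))  # 6.05
```

Both runs work out to roughly 6 minutes per core per iteration, so the single-core overnight run put about 13 hours on core 2 versus about 2 hours on each of the other cores.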


----------



## mckajvah

So. After some months of not tinkering with the 5950x, I decided to try and optimize with Curve Optimizer again. Tested with CoreCycler and got some stable cores. Ran them for 5 iterations without any faults.... Then I remembered a small little utility that I think I got mentioned on this site, BoostTester. Releases · jedi95/BoostTester

I decided I wanted to log the test, as I thought I would let it run for some time and therefore ran it with (powershell ".\BoostTester.exe | tee Log.txt") command. I'm running without HWiNFO or anything else, to let the cores boost as high as possible.

Oh man, I'm glad I did. Had to adjust down 6 of my cores, as BoostTester would reboot my computer on the first or second run-through. Some of them gave a WHEA error and some just rebooted the machine. I remember running the program before, but never with such "good" results for finding problematic cores. I think maybe the PowerShell command may have done the trick.

Could someone else also test and report back? I have a feeling I might have stumbled upon a new good "core optimizer" test here as CoreCycler did not report back any problems on any of the 6 cores I had to dial back, even after 5 iterations.
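For anyone who wants to log a test that may hard-reboot the machine: a log is only useful if each line actually reaches the disk before the reboot. Below is a minimal Python sketch of that idea as an alternative to the `tee` pipeline. The `run_and_log` helper is made up for illustration and is not part of BoostTester.

```python
import os
import subprocess
import sys
from datetime import datetime


def run_and_log(command: list[str], log_path: str) -> int:
    """Run a command, echo its output, and append each line to a log.

    Every line is flushed and fsync'ed so that the last lines written
    before a sudden reboot are still in the file afterwards.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        for line in proc.stdout:
            stamped = f"{datetime.now().isoformat(timespec='seconds')} {line}"
            sys.stdout.write(stamped)
            log.write(stamped)
            log.flush()
            os.fsync(log.fileno())
        return proc.wait()
```

A hypothetical call would be `run_and_log([r".\BoostTester.exe"], "Log.txt")`, run from the folder where the executable lives. The timestamps make it easier to match the last logged line against the reboot time in Event Viewer.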


----------



## MikeS3000

mckajvah said:


> So. After some months of not tinkering with the 5950x, I decided to try and optimize with Curve Optimizer again. Tested with CoreCycler and got some stable cores. Ran them for 5 iterations without any faults.... Then I remembered a small little utility that I think I got mentioned on this site, BoostTester. Releases · jedi95/BoostTester
> 
> I decided I wanted to log the test, as I thought I would let it run for some time and therefore ran it with (powershell ".\BoostTester.exe | tee Log.txt") command. I'm running without HWiNFO or anything else, to let the cores boost as high as possible.
> 
> Oh man, I'm glad I did. Had to adjust down 6 of my cores as the BoostTester would reboot my computer on first or second run-through. Some of them gave WHEA error and some just rebooted the machine. I can remember running the program before, but never with so "good" results for finding problematic cores. I think maybe the powershell command may have done the trick.
> 
> Could someone else also test and report back? I have a feeling I might have stumbled upon a new good "core optimizer" test here as CoreCycler did not report back any problems on any of the 6 cores I had to dial back, even after 5 iterations.


I use the BoostTester utility as well. You are correct: if you are wildly unstable, you will get reboots. This doesn't mean your curve is set up wrong; however, you may be asking for too much overclocking. BoostTester will give you a nice max effective clock boost if you set HWiNFO to poll at 100 ms. If you are running Auto OC in AMD Overclocking and are stable after running CoreCycler, try decreasing your maximum possible boost and see if reboots disappear.


----------



## mckajvah

MikeS3000 said:


> I use the BoostTester utility as well. You are correct, if you are wildly unstable you will get reboots. This doesn't mean your curve is setup wrong, however you may be asking for too much overclocking. BoostTester will give you a nice max effective clock boost if you set HWinfo to poll at 100 ms. If you are running autooc in AMD Overclocking and are stable after running CoreCycler, try decreasing your maximum possible boost and see if reboots disappear.


Well, the point was that I was not "wildly unstable". I ran the machine with browsing, gaming, editing (DaVinci Resolve) and other things for many hours, with no reboots or problems. CoreCycler was also fine, like I said. I only found those 6 cores I had to dial back after running BoostTester with the logging. I'm also running the max boost at stock +0 MHz (5050 MHz for a 5950X).


----------



## OCmember

You guys might want to try your board's best BIOS, or a different one, before starting an RMA. I rolled my BIOS back and the best core, which would fail on the first iteration, passed.


----------



## MikeS3000

mckajvah said:


> Well, the point was that I was not "wildly unstable". I ran the machine with browsing, gaming, editing (DaVinci Resolve) and other things for many hours, with no reboots or problems. CoreCycler was also fine, like I said. I only found those 6 cores I had to dial back after running BoostTester with the logging. I'm also running the max boost at stock +0 MHz (5050 MHz for a 5950X).


Gotcha, then you just need to keep dialing back your curve and try to catch which cores are causing reboots using BoostTester. Other stress tests that I've used, including Prime95, just won't hit that max boost, even when doing SSE instructions, like BoostTester will. Some people say GeekBench5 is good for finding instability at max boost as well.


----------



## thigobr

Bad news... I just got back from RMA a new 5950X batch 2104PGS and it's also giving rounding errors on Prime95 SSE single thread. Now on core #5 (second best core according to HWINFO).

I haven't played with Curve Optimizer yet but that's already a bad sign... I am planning to test this new CPU again on my other PC (Asus B450I) but it will probably be the same behavior.


----------



## maca88

Hi to all,

I registered just to report that I also had WHEA errors with reboots on my old 5950X with batch number BG 2046SUS, which I was able to fix by replacing the CPU using the RMA form. The new CPU has batch number BG 2105SUS and I have been using it for almost two weeks without any restarts. One thing I would like to point out is that the RMA ticket will be automatically closed after ten days even if you are waiting for them to respond. After ten days of waiting I got an automatic email saying that the ticket was closed, and I was forced to open a new ticket and ask them what was going on. So for those who are in an RMA process, make sure to ping them before the tenth day of waiting for a response, to prevent the ticket from being closed.
Thanks for sharing your experiences as it made me more confident that the issue was indeed the CPU. This is my first AMD processor after being more than 20 years on Intel CPUs and I will think twice before buying from AMD again. I never had a problem with a CPU before and I didn't even think checking whether there were any issues before buying it, I guess I learned my lesson.


----------



## thigobr

Quick update: I just got a 5800X earlier today. Went through 4h of Corecycler and couple hours of game play without any issues! And I just replaced the new 2104PGS 5950X by this 2050PGS 5800X...

It's just sad but it's time for the second RMA!


----------



## ghiga_andrei

thigobr said:


> Quick update: I just got a 5800X earlier today. Went through 4h of Corecycler and couple hours of game play without any issues! And I just replaced the new 2104PGS 5950X by this 2050PGS 5800X...
> 
> It's just sad but it's time for the second RMA!


Very good that you confirmed with another CPU. Just tell AMD exactly that, what you did and how you tested, the Prime95 fails. They are very reasonable at RMA especially now.


----------



## JohnnyFlash

I have never trusted AMD's Deadpool torture machine engineering on these chips.


----------



## LuchoU

JohnnyFlash said:


> I have never trusted AMD's Deadpool torture machine engineering on these chips.


Yes, but it still doesn't make any sense. AMD has been in the processor market for years; by now they should have strong torture tests for processors before sending them to market. We will never know the reasons behind this, and AMD will never confess whether this was a human error or a business decision to send bad chips to market to fulfill the demand. After all, they can accept RMAs from people who are having issues. There is a customer-trust aspect they are not considering, or maybe they don't care. At least for me, if next time I get similar performance/price from Intel and AMD, I will go with Intel.


----------



## mlen

Just pinging back to report that I have not had _any_ crashes on my 5950X for about 3-4 weeks since installing the latest BIOS (+AGESA) update for my Asus Strix X570-E Gaming motherboard. Before the update, I was averaging a few crashes per day.


----------



## Deepcuts

mlen said:


> Just pinging back to report that I have not had _any_ crashes on my 5950X for about 3-4 weeks since installing the latest BIOS (+AGESA) update for my Asus Strix X570-E Gaming motherboard. Before the update, I was averaging a few crashes per day.


I read your replies and I could not understand if your crashes were at stock settings or you had some XMP/PBO setup.
Either way, nice to have your issue fixed.


----------



## mlen

Deepcuts said:


> I read your replies and I could not understand if your crashes were at stock settings or you had some XMP/PBO setup.
> Either way, nice to have your issue fixed.


Tried all combinations of stock, with/without XMP, with/without PBO and a bunch of other variables before the update. Had more and less frequent crashes but was unstable in all configurations. Have been running XMP + PBO now, perfectly stable.


----------



## brasoveanul

mlen said:


> Tried all combinations of stock, with/without XMP, with/without PBO and a bunch of other variables before the update. Had more and less frequent crashes but was unstable in all configurations. Have been running XMP + PBO now, perfectly stable.


Are your synthetic tests' scores normal? How much do you get in R20, R23, CPU-Z? I ask because some customers with defective processors obtained lower computational performance, even if stability improved.


----------



## mlen

brasoveanul said:


> Are your synthetic tests' scores normal? How much do you get in R20, R23, CPU-Z? I ask because some customers with defective processors obtained lower computational performance, even if stability improved.


Getting 28510 from Cinebench R23, I frankly have no idea how that figure compares? Is that decent? Haven't tried the other benchmarks you mentioned yet.


----------



## Catscratch

mlen said:


> Getting 28510 from Cinebench R23, I frankly have no idea how that figure compares? Is that decent? Haven't tried the other benchmarks you mentioned yet.




https://www.reddit.com/r/Amd/comments/kf2gqs


----------



## hadookaan

Same issue here: crashing at Windows sign-in with WHEA_UNCORRECTABLE_ERROR at BIOS defaults. Gigabyte X570 Master rev 1.2, 5950X, latest BIOS F33i. The only way to make it stable is to turn off PBO and manually OC the CPU/vcore. Been pulling my hair out, 2 strands remain. Gonna RMA this junk and go with Intel. For a £750 CPU, the number of people having this issue is totally unacceptable.

There have been forum posts about these errors since release, and AMD is keeping quiet, which is terrible. They just let people waste days/weeks/months trying to understand what's going on. I wasted many hours/days; never ever buying from these crooks/clowns again. Waiting 6 months for the CPU didn't bode too well. I was all excited about finally getting a 5950X; well, that didn't last long at all. AMD came back from the ashes; they can go back there with their rubbish broken-core CPUs!
I can see why there are many 5600X/5800X on the market and hardly any 5900X/5950X: they probably can't get good enough yields, so I think they are just putting broken CPUs out there to make the numbers! All the YouTubers are saying these are the best CPUs ever, so wait, none of these YouTubers ever got these issues? Hmmm!


----------



## danny9428

Tech reviewers get to play with higher-quality dual-CCD 5900X/5950X samples (from batches 2033~2036) that generally have virtually no frequency delta, meaning they won't hit the issues we face when overclocking or fiddling with Curve Optimizer in PBO.

I think der8auer's 5950X even manages to get all 16 cores to 5050 MHz in gaming when he puts sub-ambient cooling on it, whereas my previous one never saw even 1 core hit that.

It also appears (at least in this thread) that those with dual-CCD 5900X/5950X chips run into issues far more often than those with single-CCD chips. I assume this is due to how imbalanced the quality of the 2 CCDs is on these SKUs. We average people have to struggle with a nominal CCD delta of >150 MHz and potentially worse best cores, even in the better CCD1, which may not pass at default settings.

Regarding faulty chips that may score lower on benchmarks: my previous 5950X had a relatively unimpressive single-core score of only 630~640 in CPU-Z, and its R23 single-core score narrowly beat the default score of a 1165G7 (I think it scored about 1580-something). Multi-core was around 24k~27k depending on how I tweaked PBO, which is far below what I could achieve with a manual OC. I can't say I have much confidence in this theory, but I did ask others here who also have a 5950X, and their chips all outperformed mine on those benchmarks (with a mix of air and water cooling). If you do get crashes and hiccups, try running some benchmarks and comparing them so you get some idea of what kind of chip you're dealing with.

Now I guess the good thing is that on AMD Threadripper this CCD/CCX delta thing does not seem to exist. All 8 CCXes in my 3960X appear to have fairly uniform performance and quality. (Though PBO back on Zen 2 is fairly weak and nowhere near as capable at boosting as what Zen 3 can pull.)
Let's hope the upcoming Zen 3-based Threadripper delivers the same insane IPC and good core-to-core latency of the architecture while promising decent chip quality overall. That should at least be a much more hassle-free chip to deal with than this.


----------



## brasoveanul

mlen said:


> Getting 28510 from Cinebench R23, I frankly have no idea how that figure compares? Is that decent? Haven't tried the other benchmarks you mentioned yet.


That score is fine. I get around 29200+/29300 in R23 with mine, very slightly less when the CPU is "warm". It depends heavily on temperature, so your score is absolutely OK, as long as the system is completely stable otherwise. The problem is that I am still left with the feeling that some day, out of nowhere, this fourth (yes, fourth; the other ones were faulty) 5950X will start to malfunction, although it has been stable so far. This is the kind of discomfort AMD has managed to cause its customers. I shall avoid AMD as a platform by all means in the future!!!


----------



## thigobr

Wow! Fourth!? What batches were your previous CPUs? All from RMA? 

I am on my second bad one (this is one from RMA) and it's not very confidence inspiring...


----------



## brasoveanul

Fortunately, I didn't use their RMA process. I simply returned the processors to the store and got another (new!) one each time. They are from several "batches": 2043, 2044, 2046, and so on. The current one is from 2046, so no batch-related inference can be made. It is simply a matter of randomness/luck, which is intolerable when it comes to CPU manufacturing.


----------



## danny9428

brasoveanul said:


> Fortunately, I didn't use their RMA process. I simply returned the processors to the store and got another (new!) one each time. They are from several "batches": 2043, 2044, 2046, and so on. The current one is from 2046, so no batch-related inference can be made. It is simply a matter of randomness/luck, which is intolerable when it comes to CPU manufacturing.


I guess you can also say the CPUs you've encountered are all from early batches (specifically those with the highest odds of being problematic).


----------



## brasoveanul

I guess that, if you also read here or elsewhere, you'll notice that recent "batches" have the same problems. And no, this "early batch" justification is invalid, processors should be fine regardless of chronology. If they are not able to manufacture decent products, they should delay their launch.


----------



## kairi_zeroblade

I just sold a 5900X which was kinda crap..no WHEAs at stock or tweaked..the one thing that was mostly off was the CCD quality..seems one of the posts above is correct about some dual-CCD chips having a well-binned 1st CCD and a crap 2nd one..my 1st CCD could do good CO tuning while the second CCD was crap (a mere -3 made it crash and it wasn't even boosting to max advertised), hence the reason I sold it..slapped my 5800X back in and it's a golden one..


----------



## 1devomer

danny9428 said:


> Tech reviewers get to play with higher-quality dual-CCD 5900X/5950X samples (from batches 2033~2036) that generally have virtually no frequency delta, meaning they won't hit the issues we face when overclocking or fiddling with Curve Optimizer in PBO.




The tech media received cherry-picked batches of CPUs. Nothing unusual or new here, as every company always sends its best bins for review.
The same is usually true for competitive overclocking: one doesn't take 1st place on HWBot's highly popular benchmarks simply by throwing a random retail piece of silicon under LN2.

However, the fact that the tech media received cherry-picked, binned CPUs doesn't absolve them of their journalistic duty to investigate.
Especially when issues show up after the hardware launch, it would be common sense to do a follow-up check on overall platform stability.

And from what I understand, it's not only your favourite YouTubers and/or reviewers who are staying silent about these practices.
The upper-level industry dudes also seem pretty happy that AMD keeps selling rubbish products, due to the high margins.


The point is, just spend a bit of time researching the product you wish to buy.
Better safe than sorry, especially on a tight budget.


----------



## danny9428

No doubt the same thing applies to Intel CPUs when it comes to cherry-picked review samples, but here on AMD multi-CCD chips we are talking about potentially a 10~20% performance difference from what they tested.

I remember the first article I read about the Zen 3 Curve Optimizer: the guy had his 5950X hitting 5.1 GHz and a CPU-Z single-core score past 710, and I'm like, wut, mine does 630 lol.


----------



## Deepcuts

1devomer said:


> However, the fact that the tech media received cherry-picked, binned CPUs doesn't absolve them of their journalistic duty to investigate.
> Especially when issues show up after the hardware launch, it would be common sense to do a follow-up check on overall platform stability.


Some youtuber (forgot the name but it is in this topic) did a poll asking users if they have a problem with their AMD Zen3 CPU.
The poll was asking if the CPU was Dead on Arrival though. I guess in his tiny brain cell, a CPU can only be defective if it is totally DoA.


----------



## JohnnyFlash

hadookaan said:


> Same issue here: crashing at Windows sign-in with WHEA_UNCORRECTABLE_ERROR at BIOS defaults. Gigabyte X570 Master rev 1.2, 5850x, latest BIOS F33i. The only way to make it stable is to turn off PBO and set a manual CPU OC/vcore. I've also been pulling my hair out, 2 strands remain. Gonna RMA this junk and go with Intel. For a £750 CPU, the number of people having this issue is totally unacceptable.
> 
> There are forum posts going back to release about these errors, and AMD is keeping quiet, which is terrible. They just let people waste days/weeks/months trying to understand what's going on. I wasted many hours/days and am never ever buying from these crooks/clowns again. Waiting 6 months for the CPU just doesn't bode too well. I was all excited about finally getting a 5950, well, that didn't last long at all. AMD came back from the ashes, and they can go back there with their rubbish broken-core CPUs!
> I can see why there are many 5600/5800X on the market and not many 5900/5950X at all: they probably can't get good enough yields, so I think they are just putting broken CPUs out there to make the numbers! All the YouTubers are saying these are the best CPUs ever, so wait, none of these YouTubers ever got these issues? Hmmm!


How many cores are failing on yours?

Just for the sake of perspective, myself and three of my IRL friends have 5950X's now and all are running flawless. There are issues out there, but it's not all of them. If I were in your shoes I would be equally upset, but you might just have to tweak one core to get it stable.


----------



## GamBoTron

JohnnyFlash said:


> How many cores are failing on yours?
> 
> *Just for the sake of perspective, myself and three of my IRL friends have 5950X's now and all are running flawless*.


Can you give some info on what tweaks made it stable for you guys? 

Mine is working fine, no Wheas or nothing, but the boost is really low tho on stock. Havent tried to tweak things yet but im getting quite low scores in benchmarks so its gonna be interesting to see if i can push this thing and be stable or not


----------



## JohnnyFlash

GamBoTron said:


> Can you give some info on what tweaks made it stable for you guys?
> 
> Mine is working fine, no Wheas or nothing, but the boost is really low tho on stock. Havent tried to tweak things yet but im getting quite low scores in benchmarks so its gonna be interesting to see if i can push this thing and be stable or not


I made no tweaks to get it stable, just thoroughly tested at stock, then moved to an all-core overclock.

What are your temps like in benchmarks? There are a good number of overclocking how-to threads, but basically you can start by slowly adding a negative offset to all cores in curve optimizer and test at each step. The thing is, if you actually use your chip and cores aren't sitting idle most of the time, you're much better off with an all-core setting. For me, PBO uses 1.28v for 4.3GHz, all-core is 24 hour prime stable at 1.11v. Huge difference in heat output and probably chip lifespan.
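To put a rough number on that voltage gap, here is a back-of-the-envelope sketch. It assumes dynamic power scales roughly with V² × f, which is a common first-order approximation that ignores leakage and was not measured by anyone in this thread. The function name is made up for illustration.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio of dynamic power under the first-order model P ~ V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Voltages quoted above: 1.28 V under PBO vs 1.11 V manual all-core,
# both at (roughly) the same 4.3 GHz, so the frequency ratio is left at 1.
ratio = relative_dynamic_power(v_new=1.11, v_old=1.28)
print(f"all-core draws roughly {ratio:.0%} of the PBO dynamic power")
```

Under that assumption the manual setting lands at about three quarters of the PBO figure, which is at least consistent with the "huge difference in heat output" described above.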


----------



## 1devomer

Deepcuts said:


> Some youtuber (forgot the name but it is in this topic) did a poll asking users if they have a problem with their AMD Zen3 CPU.
> The poll was asking if the CPU was Dead on Arrival though. I guess in his tiny brain cell, a CPU can only be defective if it is totally DoA.


I dunno what to say! ¯\_(ツ)_/¯

Given that the tech media get samples and marketing materials directly from the companies, it is hard to believe that people working within the industry don't know any better about how the CPUs they received work.

It is a lost cause in any case.
The AMD USB issue was pointed out by users complaining about USB disconnections.
The push for the fix did not come from motherboard reviewers, because they didn't spend time thoroughly testing the USB subsystem.

I tend to repeat myself often, but that's a lot of overlooked issues all together, in my opinion.


----------



## GamBoTron

JohnnyFlash said:


> I made no tweaks to get it stable, just thoroughly tested at stock, then moved to an all-core overclock.
> 
> What are your temps like in benchmarks? There are a good number of overclocking how-to threads, but basically you can start by slowly adding a negative offset to all cores in curve optimizer and test at each step. The thing is, if you actually use your chip and cores aren't sitting idle most of the time, you're much better off with an all-core setting. For me, PBO uses 1.28v for 4.3GHz, all-core is 24 hour prime stable at 1.11v. Huge difference in heat output and probably chip lifespan.



Just did this Cinebench run:

View attachment 2485370


View attachment 2485371

Temps look fine in benchmarks, but the score is low from what i have seen others come up with (that said, everything is on stock with this setup, only xmp turned on)

However, what i dont understand is that when im gaming temps go up to around 80 c (highest i have seen is 84 c) , and thats not even demanding games (apex legends low settings for example).
Havent tried cyberpunk yet but i can imagine it would get toasty as hell.

This cpu makes me scratch my head, its seems its not using its full potential but struggling in other scenarios (especially running games, dunno if thats down to it being single core workloads or not) : whats confusing me here is i dont know if its down to my limited cooling or settings in Bios, maybe a bit of both


----------



## brasoveanul

GamBoTron said:


> Just did this Cinebench run:
> 
> View attachment 2485370
> 
> 
> View attachment 2485371
> 
> 
> Temps look fine in benchmarks, but the score is low from what i have seen others come up with (that said, everything is on stock with this setup, only xmp turned on)
> 
> However, what i dont understand is that when im gaming temps go up to around 80 c (highest i have seen is 84 c) , and thats not even demanding games (apex legends low settings for example).
> Havent tried cyberpunk yet but i can imagine it would get toasty as hell.
> 
> This cpu makes me scratch my head, its seems its not using its full potential but struggling in other scenarios (especially running games, dunno if thats down to it being single core workloads or not) : the main issue here is i dont know if its down to my limited cooling or settings in Bios, maybe a bit of both


Do you have basic PBO/curve optimizer setup? This score would be fine without any kind of overclocking.


----------



## GamBoTron

brasoveanul said:


> Do you have basic PBO/curve optimizer setup? This score would be fine without any kind of overclocking.


Nice. 

Pbo is set to "auto"

Havent tweaked any other settings


----------



## brasoveanul

GamBoTron said:


> Nice.
> 
> Pbo is set to "auto"
> 
> Havent tweaked any other settings


Auto is basically disabled, set it to Enabled and do another R23 test.


----------



## GamBoTron

brasoveanul said:


> Auto is basically disabled, set it to Enabled and do another R23 test.


PBO set to "enabled". Didnt do much, temps were a bit higher (like a couple of c) score was worse than last run. any other tips to get better scores or push this chip? curve optimizer i guess?


----------



## kairi_zeroblade

GamBoTron said:


> Just did this Cinebench run:
> 
> View attachment 2485370
> 
> 
> View attachment 2485371
> 
> 
> Temps look fine in benchmarks, but the score is low from what i have seen others come up with (that said, everything is on stock with this setup, only xmp turned on)
> 
> However, what i dont understand is that when im gaming temps go up to *around 80 c (highest i have seen is 84 c)* , and thats not even demanding games (apex legends low settings for example).
> Havent tried cyberpunk yet but i can imagine it would get toasty as hell.
> 
> This cpu makes me scratch my head, its seems its not using its full potential but struggling in other scenarios (especially running games, dunno if thats down to it being single core workloads or not) : whats confusing me here is i dont know if its down to my limited cooling or settings in Bios, maybe a bit of both


and they say the 5800X was a bad chip since it runs hot..

I have the same experience on a previous 5900x..scores on stock were somewhat OFF from what reviews say also from here..observed it for a month and what I noted was my CCD2 is somewhat gimped..its not running the same boost clocks like the CCD1..and as I mentioned previously I was banging my head with its CO tuning..hence sold it..painless..


----------



## brasoveanul

This is weird. Please make sure there are no conflicting settings in the BIOS: load optimized defaults, save, exit, then re-enter the BIOS, set PBO to Enabled again, save and exit. If there is no change, then the problem is more complicated than it should be.


----------



## GamBoTron

brasoveanul said:


> This is weird. Please make sure there are no conflicting settings in the BIOS: load optimized defaults, save, exit, then re-enter the BIOS, set PBO to Enabled again, save and exit. If there is no change, then the problem is more complicated than it should be.


ok, turns out PBO had to be enabled in two different places in the BIOS. First under the OC option from the main menu and also in the "AMD overclocking" section. When i enabled it in both these places it actually turned on and i got the following:

View attachment 2485385

Big jump in score: the temps and power consumption also went up quite a big notch: i was sitting steady around 86/87 C and the power consumption was around 200 W


----------



## brasoveanul

GamBoTron said:


> ok, turns out PBO had to be enabled two different places in the Bios. First under the OC option from the main menu and also on the "AMD overclocking" section. When i enabled them both these places it actually turned on and i got the following:
> 
> View attachment 2485385
> 
> 
> Big jump in score: also the temps and power consumption went up quite a big notch: i was sitting steady around 86/87 C and the power consumption was around 200 W


This is what it should look like with PBO enabled. I presume you have an Asus board.

Later edit: It is good that you managed to figure it out. I currently use an Asrock X570 Creator board, and it is enough to enable PBO in just one place. With the former Asus board, I had to enable it in two places, as you did, although you may not have an Asus board, judging from the BIOS interface look and feel, but the structure seems to be the same.


----------



## GamBoTron

brasoveanul said:


> This is what it should look like with PBO enabled. I presume you have an Asus board.


MSI. Well, at least i know now that PBO is actually working. Very happy for that, now i will start to read about curve optimizer and try it out


----------



## brasoveanul

You can start with an all-core -20 and see whether your system boots normally without crashing. If it does, run CoreCycler and raise the voltage step by step (i.e. make the offset less negative) on the cores that throw rounding errors, until no core throws a rounding error after at least 1-2 hours of CoreCycler testing.
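The procedure above can be sketched as a simple loop. This is only an illustration of the logic, not a real tool: `find_failing_cores` is a hypothetical stand-in for "run CoreCycler for 1-2 hours and note which cores threw errors", and the per-core limits in the demo are invented.

```python
from typing import Callable, Dict, Set


def tune_offsets(
    cores: int,
    find_failing_cores: Callable[[Dict[int, int]], Set[int]],
    start: int = -20,
    step: int = 1,
) -> Dict[int, int]:
    """Start every core at `start`, then back off only the cores that fail."""
    offsets = {core: start for core in range(cores)}
    while True:
        failing = find_failing_cores(offsets)
        if not failing:
            return offsets
        stuck = sorted(c for c in failing if offsets[c] >= 0)
        if stuck:
            # This sketch never goes to positive offsets; a core that fails
            # at 0 is failing at stock and tuning will not help it.
            raise RuntimeError(f"cores {stuck} fail even at offset 0")
        for core in failing:
            # A less negative offset means more voltage for that core.
            offsets[core] = min(0, offsets[core] + step)


# Demo with an invented chip: core 3 is only stable down to -12, core 7 to -17.
limits = {3: -12, 7: -17}


def fake_stress_test(offsets: Dict[int, int]) -> Set[int]:
    return {c for c, o in offsets.items() if o < limits.get(c, -30)}


result = tune_offsets(cores=16, find_failing_cores=fake_stress_test)
print(result)
```

In practice each pass of the loop is a manual BIOS change plus hours of testing, so the point is the per-core bookkeeping: only the failing cores get their offset relaxed, the rest stay at -20.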


----------



## machine038

GamBoTron said:


> score is low


24k is the stock score at 105W, I was going to say you can enable PBO for around 200W to get around 27k ~ 28k


----------



## ghiga_andrei

machine038 said:


> 24k is the stock score at 105W, I was going to say you can enable PBO for around 200W to get around 27k ~ 28k


142W is stock power consumption, not 105W... that 105W is just a number on paper...


----------



## machine038

ghiga_andrei said:


> 142W is stock power consumption, not 105W... that 105W is just a number on paper...


Sure, you mean PPT or "total socket power" , I mean the "CPU power" which is part of the PPT value.
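The two figures are related. A minimal sketch, assuming the commonly cited AM4 rule of thumb that the stock PPT limit is about 1.35 × TDP (that factor comes from general AM4 coverage, not from anyone in this thread):

```python
PPT_FACTOR = 1.35  # assumed rule of thumb for stock AM4 limits


def stock_ppt_watts(tdp_watts: float) -> int:
    """Approximate stock PPT (total socket power limit) from the rated TDP."""
    return round(tdp_watts * PPT_FACTOR)


print(stock_ppt_watts(105))  # the 5950X is rated at 105 W TDP
```

So the 105 W on the box and the 142 W socket limit can both be "stock" numbers; they just describe different things.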


----------



## danny9428

JohnnyFlash said:


> I made no tweaks to get it stable, just thoroughly tested at stock, then moved to an all-core overclock.
> 
> What are your temps like in benchmarks? There are a good number of overclocking how-to threads, but basically you can start by slowly adding a negative offset to all cores in curve optimizer and test at each step. The thing is, if you actually use your chip and cores aren't sitting idle most of the time, you're much better off with an all-core setting. For me, PBO uses 1.28v for 4.3GHz, all-core is 24 hour prime stable at 1.11v. Huge difference in heat output and probably chip lifespan.


Not judging, but a manual OC, especially at that clock speed, tends to be way more stable.
With PBO, however, you'd be asking Zen 3 chips to hit speeds of over 4.8 GHz consistently at variable vcore and with the C6 state enabled and active (a manual OC would disable that).


----------



## JohnnyFlash

danny9428 said:


> Not judging, but a manual OC, especially at that clock speed, tends to be way more stable.
> With PBO, however, you'd be asking Zen 3 chips to hit speeds of over 4.8 GHz consistently at variable vcore and with the C6 state enabled and active (a manual OC would disable that).


I did fully test it both stock and with default PBO, no errors.

The thing is, if you're fully using your chip you'll never see 4.8GHz as that only pops up when the majority of the other cores are idle. With an all-core you get consistent performance regardless of load at better efficiency. For my use case, PBO doesn't make sense. It may for yours, but there's a situation for either. If I can get to 4.5 once my custom loop is installed I'll be happy.


----------



## thigobr

RMA for my second bad 5950X was just approved... Now it's a couple of weeks' wait until I get the replacement, and I hope this one is stable.

I have been using a 5800X on the same machine for a week now and it's been stable as it should. Not a single WHEA, crash or reboot since I replaced the 5950X.


----------



## ghiga_andrei

JohnnyFlash said:


> I did fully test it both stock and with default PBO, no errors.
> 
> The thing is, if you're fully using your chip you'll never see 4.8GHz as that only pops up when the majority of the other cores are idle. With an all-core you get consistent performance regardless of load at better efficiency. For my use case, PBO doesn't make sense. It may for yours, but there's a situation for either. If I can get to 4.5 once my custom loop is installed I'll be happy.


We should make an unofficial market where people who want to all-core overclock their zen 3s can take the samples that do not work very well at high boosts and people who just want single-core performance with pbo could take the better samples, at a higher cost.

I am just joking of course. We don't need to do this while we can rma them.


----------



## Imraneo

Do you guys re-run your curve optimizer stress tests after flashing a new BIOS?
It seems that the newer BIOSes claim more stability, but the benchmarks are lower. I'm wondering if I should try more aggressive settings 🤔


----------



## 1devomer

Imraneo said:


> Do you guys re-run your curve optimizer stress tests after flashing a new BIOS?
> It seems that the newer BIOSes claim more stability, but the benchmarks are lower. I'm wondering if I should try more aggressive settings 🤔


Simply avoid updating your BIOS!!!!
The best-performing BIOSes generally come out near the CPU launch.
Then AMD tones down the boost algorithm because a lot of early adopters got bad CPUs and began to complain.

It is a textbook strategy, and AMD really mastered it after a couple of launches.
Simply compare your PBO boost clocks at stock settings on a newer and an older BIOS.


----------



## mongoled

1devomer said:


> Simply avoid updating your BIOS!!!!
> The best-performing BIOSes generally come out near the CPU launch.
> Then AMD tones down the boost algorithm because a lot of early adopters got bad CPUs and began to complain.
> 
> It is a textbook strategy, and AMD really mastered it after a couple of launches.
> Simply compare your PBO boost clocks at stock settings on a newer and an older BIOS.


Actually, it's not always as you describe when it comes to checking PBO boost clocks.

You should compare the scores in said benchmarks from BIOS to BIOS rather than checking the highest boost clock reached.

In many circumstances, a higher maximum boost frequency does not equate to higher performance across the board.

Let me make it simpler:

BIOS A has a maximum PBO boost frequency of, let's say, 4500 MHz and scores 2222 in said multi-core benchmark
BIOS B has a maximum PBO boost frequency of, let's say, 4450 MHz and scores 2222 in said multi-core benchmark

BIOS A has a maximum PBO boost frequency of, let's say, 4950 MHz and scores 670 in said single-core benchmark
BIOS B has a maximum PBO boost frequency of, let's say, 4800 MHz and scores 670 in said single-core benchmark

You need to do direct comparative tests across different benchmarks, and when you do that you will see that

maximum PBO frequency does not always equate to the maximum benchmark score achieved.

But this is off topic ...
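For anyone who wants to keep track of this across BIOS versions, here is a small sketch of the comparison described above. The class and function names are made up for illustration, and the numbers are the hypothetical ones from the example, not real measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiosResult:
    name: str
    max_boost_mhz: int  # recorded for reference only, never used to rank
    score: int          # benchmark score, higher is better


def better_bios(a: BiosResult, b: BiosResult) -> Optional[BiosResult]:
    """Return the result with the higher score, or None when they tie."""
    if a.score == b.score:
        return None
    return a if a.score > b.score else b


# Hypothetical multi-core numbers from the example above.
bios_a = BiosResult("BIOS A", max_boost_mhz=4500, score=2222)
bios_b = BiosResult("BIOS B", max_boost_mhz=4450, score=2222)

winner = better_bios(bios_a, bios_b)
print("tie" if winner is None else winner.name)
```

The comparison deliberately ignores `max_boost_mhz`: with the example numbers the two BIOSes tie, even though one of them boosts 50 MHz higher.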


----------



## OCmember

In my experience an earlier BIOS netted me 627 single-core in R20, and I haven't seen that since updating the BIOS. And I usually run CoreCycler several more times after it passes once. I've had a core pass a few iterations and then, a day later, the same core fail on the first iteration.


----------



## ghiga_andrei

OCmember said:


> In my experience an earlier BIOS netted me 627 single-core in R20, and I haven't seen that since updating the BIOS. And I usually run CoreCycler several more times after it passes once. I've had a core pass a few iterations and then, a day later, the same core fail on the first iteration.


You could flash that earlier BIOS and see if the score changes. I did this, and in my case it didn't. It's more day-to-day variation than BIOS variation. CPU, VRM and chipset temperature, or I don't know what else, cause most of the variation.


----------



## xeizo

Ambient temps do more for the scores than switching BIOS versions LoL


----------



## LuchoU

Yesterday I had a WHEA error 18 on APIC ID 12, which is tied to core 6 (the second-fastest core according to Ryzen Master), while running Red Dead Redemption 2. I ran CoreCycler on only that core all night, for 12 hours, and it did not throw any rounding errors. So possibly CoreCycler did not boost to max clocks (?), which I suppose is why it failed when gaming. This is an edge case.

Is there an easier/faster way to look for these edge cases?, other than gaming and waiting for the BSOD to occur/not to occur, ughh...
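For reference, the APIC ID to core mapping used above can be written down as a one-liner. This is a sketch that assumes SMT is enabled (two logical processors per core) and that APIC IDs are numbered contiguously, which matches the "APIC ID 12 = core 6" case in this post; other SKUs may lay their IDs out differently, so treat it as an assumption and not a rule.

```python
from typing import Tuple


def apic_to_core(apic_id: int, threads_per_core: int = 2) -> Tuple[int, int]:
    """Return (core index, SMT thread index) for a logical processor's APIC ID.

    Assumes contiguous APIC IDs with `threads_per_core` IDs per physical core.
    """
    return divmod(apic_id, threads_per_core)


core, thread = apic_to_core(12)
print(f"APIC ID 12 -> core {core}, thread {thread}")
```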


----------



## JohnnyFlash

LuchoU said:


> Yesterday I had a WHEA error 18 on APIC ID 12, which is tied to core 6 (the second-fastest core according to Ryzen Master), while running Red Dead Redemption 2. I ran CoreCycler on only that core all night, for 12 hours, and it did not throw any rounding errors. So possibly CoreCycler did not boost to max clocks (?), which I suppose is why it failed when gaming. This is an edge case.
> 
> Is there an easier/faster way to look for these edge cases?, other than gaming and waiting for the BSOD to occur/not to occur, ughh...


Standard wisdom back in the day was to test for 12-24 hours, then back off one step or increase the voltage one step.

If it passes 12 hours -20 across all cores, then set to -19 for your permanent setting. My chip passed 24 hour prime AVX2 at 4.35GHz, so it gets set to 4.3GHz for my 24/7.
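The "back off one step" rule is simple enough to write down. A minimal sketch with made-up function names; the 50 MHz clock step is inferred from the 4.35 GHz to 4.3 GHz example above, and the offset step of 1 from the -20 to -19 example.

```python
def daily_offset(last_passing_offset: int, step: int = 1) -> int:
    """Curve Optimizer: run one step less negative than what passed testing."""
    return min(0, last_passing_offset + step)


def daily_clock_mhz(last_passing_mhz: int, step_mhz: int = 50) -> int:
    """All-core overclock: run one step below the clock that passed testing."""
    return last_passing_mhz - step_mhz


print(daily_offset(-20))      # passed 12 hours at -20
print(daily_clock_mhz(4350))  # passed 24 hours of Prime95 AVX2 at 4.35 GHz
```

The margin is there because a long test pass only shows the setting survived that particular run, not every workload and temperature it will see later.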


----------



## Imraneo

Not sure if I would wanna use any tool to test my cores. So far the best way was for me to literally leave the PC idle overnight(s). Nowadays, I actually start mining too, since it stresses the GPU and leaves the CPU idling quite a bit. True enough, I did encounter one of my cores crash during mining.

As for BIOS updates, I've been running a pretty old BIOS for a while now and only recently I decided to update due to 2 things. USB fix and Resizable BAR.


----------



## JohnnyFlash

Imraneo said:


> Not sure if I would wanna use any tool to test my cores. So far the best way was for me to literally leave the PC idle overnight(s). Nowadays, I actually start mining too, since it stresses the GPU and leaves the CPU idling quite a bit. True enough, I did encounter one of my cores crash during mining.


If your chip was making rounding errors, but staying stable enough to not crash, you would never know.


----------



## Imraneo

JohnnyFlash said:


> If your chip was making rounding errors, but staying stable enough to not crash, you would never know.


Ignorance is bliss!.. kidding 
Will I be able to see these errors in Event Viewer?
Also, do rounding errors have retries and correction? If they do, then perhaps it's ok to have those occasionally as opposed to a full system crash/reboot.


----------



## JohnnyFlash

Imraneo said:


> Ignorance is bliss!.. kidding
> Will I be able to see these errors in Event Viewer?
> Also, do rounding errors have retries and correction? If they do, then perhaps it's ok to have those occasionally as opposed to a full system crash/reboot.


They would not be in the event viewer because *the system thinks the result is correct*. Programs like Prime95 or OCCT find errors by running complex calculations and then checking them against a known result; if they don't match, it fails. Crashes happen when the error occurs in a critical system process. The only way to know if your system is 100% stable is with testing.

Gamers tend not to test as much as they should, because a glitch here and there in a game goes unnoticed, but if you do anything meaningful like video encoding or rendering it will lead to glitches in the end result.
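The "check against a known result" idea can be shown with a toy example. This is only an illustration of the principle and not how Prime95 or OCCT actually implement their checks; a pure-Python loop like this is also far too light a load to serve as a real stress test.

```python
def sum_of_squares_loop(n: int) -> int:
    """Compute 1^2 + 2^2 + ... + n^2 the slow way, one addition at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k * k
    return total


def sum_of_squares_closed(n: int) -> int:
    """Known-good reference value from the closed-form formula."""
    return n * (n + 1) * (2 * n + 1) // 6


def self_check(n: int = 100_000, rounds: int = 5) -> bool:
    """Redo the computation several times; any mismatch is a silent error."""
    expected = sum_of_squares_closed(n)
    return all(sum_of_squares_loop(n) == expected for _ in range(rounds))


print("no mismatch found" if self_check() else "computation error detected")
```

On healthy hardware the two values always agree. A wrong answer here would never appear in Event Viewer, because nothing crashed; it only shows up when something compares the result against a reference.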


----------



## ghiga_andrei

JohnnyFlash said:


> They would not be in the event viewer because *the system thinks the result is correct*. Programs like Prime95 or OCCT find errors by running complex calculations and then checking them against a known result; if they don't match, it fails. Crashes happen when the error occurs in a critical system process. The only way to know if your system is 100% stable is with testing.
> 
> Gamers tend not to test as much as they should, because a glitch here and there in a game goes unnoticed, but if you do anything meaningful like video encoding or rendering it will lead to glitches in the end result.


Very true. And this is a big problem when people use their unstable system as part of a cloud computing setup, like one of the @home projects, and submit wrong data that the host thinks is valid. Very nasty situation.

In the past this was a problem only for those overclocking their cpus manually and putting too little voltage for the frequency they set. But now it seems zen3 brought these problems to everyday stock setup users.

I wonder how many zen3 owners have rounding errors and they do not know about them because they do not test and the system isn't that unstable to cause a reboot. And they may get corrupt data from time to time and random glitches and they would not even know what caused them.


----------



## 1devomer

JohnnyFlash said:


> ...The only way to know if your system is 100% stable is with testing.





JohnnyFlash said:


> Gamers tend not to test as much as they should, because a glitch here and there in a game goes unnoticed...



Why should I have to bother testing my setup when I'm running stock settings?
As far as I can recall, the CPU should be able to run without crashing on its own.


And please, I am once again asking for your support toward gamers.
I do not think gamers are at fault if games keep crashing due to issues related to hardware and/or software bugs.


----------



## JohnnyFlash

1devomer said:


> Why should I have to bother testing my setup when I'm running stock settings?
> As far as I can recall, the CPU should be able to run without crashing on its own.
> 
> 
> And please, I am once again asking for your support toward gamers.
> I do not think gamers are at fault if games keep crashing due to issues related to hardware and/or software bugs.


I agree with you 100%: There should be no need to test anything new at stock settings, ever. However, this situation is different and unfortunately people have to.

The same thing has been happening with mobile intel chips and undervolting. Users undervolt blindly, run a quick stability test at max clocks and call it stable without considering the in-between voltages could be unstable. It's not a "group" thing, it's a patience thing.


----------



## ghiga_andrei

1devomer said:


> Why should I have to bother testing my setup when I'm running stock settings?
> As far as I can recall, the CPU should be able to run without crashing on its own.
> 
> 
> And please, I am once again asking for your support toward gamers.
> I do not think gamers are at fault if games keep crashing due to issues related to hardware and/or software bugs.


Didn't say you should normally. 
You and we all really shouldn't, but I was just saying that in this unfortunate situation we should, and a lot of people who have no idea about this, should.


----------



## mongoled

1devomer said:


> Why should I bother to test my setup when I'm running stock settings?
> As far as I can recall, the CPU should be able to perform without crashing on its own.
> 
> 
> And please, I am once again asking for your support toward gamers.
> I do not think gamers are at fault if games keep crashing due to issues related to hardware and/or software bugs.





JohnnyFlash said:


> I agree with you 100%: There should be no need to test anything new at stock settings, ever. However, this situation is different and unfortunately people have to.
> 
> The same thing has been happening with mobile intel chips and undervolting. Users undervolt blindly, run a quick stability test at max clocks and call it stable without considering the in-between voltages could be unstable. It's not a "group" thing, it's a patience thing.


Really guys, are you serious ????

Forget the issues with these CPUs or any CPU for that matter.

Just to get this straight, you are building a custom PC and you have an expectation that every component that you plug into your build should be assumed to just work and not have any problems ?

You people understand how naive and ignorant your statements are ?

You do understand when you buy a pre-built PC it has gone through several different layers of validation before it gets to the hands of the end user.

And even in these situations issues arise.

I strongly suggest you should re-consider what you have both alluded to as IMHO your expectations are ludicrous!


----------



## Anthos

There's people here talking about 2 different things.

1) Components on their own merit working ok i.e not faulty CPU, no screwed up RAM that throws errors no matter what etc

2) Compatibility. A cpu and a stick of ram could both be perfectly fine but you put them in the same setup and they don't play well together.

Obviously, for number 1, each thing you buy should be expected to be functional. But when you select which ones to buy, it's up to the client to make sure that they are compatible with each other. The huge bulk of problems in this thread stem from number 1 (in some cases they could have stemmed from compatibility issues, but those are few and far between).


----------



## mongoled

Anthos said:


> There's people here talking about 2 different things.
> 
> 1) Components on their own merit working ok i.e not faulty CPU, no screwed up RAM that throws errors no matter what etc
> 
> 2) Compatibility. A cpu and a stick of ram could both be perfectly fine but you put them in the same setup and they don't play well together.
> 
> Obviously, for number 1, each thing you buy should be expected to be functional. But when you select which ones to buy, it's up to the client to make sure that they are compatible with each other. The huge bulk of problems in this thread stem from number 1 (in some cases they could have stemmed from compatibility issues, but those are few and far between).


My point was geared towards those peeps expectation that a custom PC should "just work".

Of course it should "just work", but thats not the reality, hence the reason for mentioning pre-built systems and the validation process they go through.

There is a reason for that, or maybe those OEMs selling pre-built systems are just stupid .....


----------



## Anthos

mongoled said:


> My point was geared towards those peeps expectation that a custom PC should "just work".
> 
> Of course it should "just work", but thats not the reality, hence the reason for mentioning pre-built systems and the validation process they go through.
> 
> There is a reason for that, or maybe those OEMs selling pre-built systems are just stupid .....


Yes, any custom built always has the potential that you fire it up and keeps crashing every 2 minutes while in windows for example. And you are unsure if you installed something wrong, or you have a wrong bios setting, if there's a compatibility issue or if it's just straight up half dead. Obviously in those cases you need to figure out what goes on especially if you bought the parts from different stores so if you have to return something you know which to whom. I am not disagreeing with what you are saying. Just trying to differentiate because some people seem more focused on lets say mechanically faulted issues while others with general issues which compatibility is one of them (which has existed as an issue in computers since ever).


----------



## mongoled

Anthos said:


> Yes, any custom built always has the potential that you fire it up and keeps crashing every 2 minutes while in windows for example. And you are unsure if you installed something wrong, or you have a wrong bios setting, if there's a compatibility issue or if it's just straight up half dead. Obviously in those cases you need to figure out what goes on especially if you bought the parts from different stores so if you have to return something you know which to whom. I am not disagreeing with what you are saying. Just trying to differentiate because some people seem more focused on lets say mechanically faulted issues while others with general issues which compatibility is one of them (which has existed as an issue in computers since ever).


Sorry, should have said that I agree with your points.

Just miffed with the expectation of "the masses" that everything should just work and be perfect without having to do anything.

As if all stuff happen by "magic".

This type of thinking is just getting worse as time goes on, I expect this from "the masses" but when I am on overclock.net my expectation is that the people who frequent here are "above that".

So obviously, my expectation does not match what I am seeing here....

Of course this could be a fault in my way of thinking, but looking at how our societies are "progressing" I dont believe this to be the case ...


----------



## ghiga_andrei

mongoled said:


> Sorry, should have said that I agree with your points.
> 
> Just miffed with the expectation of "the masses" that everything should just work and be perfect without having to do anything.
> 
> As if all stuff happen by "magic".
> 
> This type of thinking is just getting worse as time goes on, I expect this from "the masses" but when I am on overclock.net my expectation is that the people who frequent here are "above that".
> 
> So obviously, my expectation does not match what I am seeing here....
> 
> Of course this could be a fault in my way of thinking, but looking at how our societies are "progressing" I dont believe this to be the case ...


I don't fully agree with you.

Just to share my personal case, I had my system working fine for months with the 3700x and had no problems with it at stock settings and just XMP enabled. And in december I only replaced the cpu with the 5900x and since then I had troubles with reboots and prime95 fails. All the system is the same.

There is a difference between expecting a 0 fail rate, which is not possible, there will always be some defective products, and what is happening now with the zen3 chips. This is a systematic failure mechanism caused by bad binning to desperately try to meet demand by shipping chips that are at the limit of stability. This is not normal.

I am now on my 3rd 5900x and yes, with this 3rd one my system is stable again. With the first 2 it was not, with all settings the same. It is not acceptable to receive 2 out of 3 defective CPUs.


----------



## thigobr

I agree with ghiga_andrei here. This high failure rate is unacceptable! I too am on my way to getting a 3rd 5950X, and I hope this one works as intended.

But I see mongoled's points... The moment I started building my own PCs, 20 years ago, I knew I had to do an ample burn-in test before calling a setup stable. That's to prevent any issues arising from parts incompatibility, DOA parts or even parts that fail early in their life (memory used to be in that group; in Brazil it's pretty common to buy OEM memory modules, and they are often mishandled and suffer ESD damage, so they fail early in their lifecycle).

But I have never in those 20 years experienced 2 faulty Boxed CPUs in a row like with these current 5950X!


----------



## Redwoodz

ghiga_andrei said:


> Very true. And this is a big problem when people use their unstable system to be a part of a cloud computing setup, like [email protected] and they submit wrong data that the host thinks it's valid. Very nasty situation.
> 
> In the past this was a problem only for those overclocking their cpus manually and putting too little voltage for the frequency they set. But now it seems zen3 brought these problems to everyday stock setup users.
> 
> I wonder how many zen3 owners have rounding errors and they do not know about them because they do not test and the system isn't that unstable to cause a reboot. And they may get corrupt data from time to time and random glitches and they would not even know what caused them.


 This whole thread is about rounding errors


ghiga_andrei said:


> I don't fully agree with you.
> 
> Just to share my personal case, I had my system working fine for months with the 3700x and had no problems with it at stock settings and just XMP enabled. And in december I only replaced the cpu with the 5900x and since then I had troubles with reboots and prime95 fails. All the system is the same.
> 
> There is a difference between expecting a 0 fail rate, which is not possible, there will always be some defective products, and what is happening now with the zen3 chips. This is a systematic failure mechanism caused by bad binning to desperately try to meet demand by shipping chips that are at the limit of stability. This is not normal.
> 
> I am now on my 3rd 5900x and yes, with this 3rd one my system is stable again. With the first 2 it was not, with all settings the same. It is not acceptable to receive 2 out of 3 defective CPUs.


 Or you finally got a working bios. You will never know.


----------



## tdimarzio

Update for my 5950x. WHEA 18 Cache Hierarchy Error finally root-caused and resolved. I RMA'd my 5950x but that changed nothing. Actually, my old 5950x could take a more aggressive curve optimizer, but ... c'est la vie ... Bought a Ryzen 3600 but that did nothing. Still WHEA 18. Clean install of Win10. No change. Replace 1000w PSU with new 1000w PSU. No change. Replace RAM, no change. So, I had tried everything but replacing the motherboard and GPU. GPU was easier, so bought the cheapest GPU I could find. Instant fix. So, I knew it was my video card. RMA'd my Red Devil 6900 XT and the replacement is perfect. So, It's been 3 weeks without a single WHEA 18. Before swapping the defective 6900 XT, I could maybe go a few hours without a WHEA 18. I could work-around the issue by enabling a screen saver, ensuring the GPU never dropped to the lowest power states. System never crashed under load. Finally, this nightmare is behind me. Good luck to everyone else.


----------



## ghiga_andrei

Redwoodz said:


> Or you finally got a working bios. You will never know.


Same BIOS between 2nd and 3rd. Whole behavior of new CPU is different than first 2. Lower boost clocks.


----------



## Abula

tdimarzio said:


> Update for my 5950x. WHEA 18 Cache Hierarchy Error finally root-caused and resolved. I RMA'd my 5950x but that changed nothing. Actually, my old 5950x could take a more aggressive curve optimizer, but ... c'est la vie ... Bought a Ryzen 3600 but that did nothing. Still WHEA 18. Clean install of Win10. No change. Replace 1000w PSU with new 1000w PSU. No change. Replace RAM, no change. So, I had tried everything but replacing the motherboard and GPU. GPU was easier, so bought the cheapest GPU I could find. Instant fix. So, I knew it was my video card. RMA'd my Red Devil 6900 XT and the replacement is perfect. So, It's been 3 weeks without a single WHEA 18. Before swapping the defective 6900 XT, I could maybe go a few hours without a WHEA 18. I could work-around the issue by enabling a screen saver, ensuring the GPU never dropped to the lowest power states. System never crashed under load. Finally, this nightmare is behind me. Good luck to everyone else.


Very interesting case. I have also done lots of swapping with no success:


3x different sets of RAM, at stock and XMP: Corsair CMK16GX4M2Z3600C14, Gskill F4-3600C16D-16GVK, Gskill F4-3600C16D-32GTZNC
2x different PSUs: Seasonic SSR-1000TR and Corsair RM850x
2x different SSDs: Sabrent Rocket Plus 2TB and Samsung 980 Pro 2TB (some said there were issues with Sabrent SSDs)
2x Gigabyte X570 Aorus Xtreme (v1.0 and v1.1) with multiple BIOS versions: F30/F31/F32/F33c/F33h
2x different CPUs: 5950x and 3950x (in a recent test with the 3950x I still had a reset over the weekend, which is why I didn't RMA the CPU)

Now it's the turn of the PowerColor Red Devil 6900XT Limited Edition; I hope it's the GPU so I can move on. Was the PowerColor RMA easy?


----------



## ghiga_andrei

Just wanting to let people from Europe know how the RMA worked in my case:

opened the ticket on a monday
received mail with more info requests and answered immediately on wednesday
received the prepaid DHL shipping label on Friday morning of the same week and called DHL to send a courier, who picked up the package from my house the same day
received notification from DHL that package has been delivered to AMD on monday
received inspection and RMA approved mail on wednesday
received shipped new CPU mail on thursday - no AWB for replacement CPU, had no tracking for the new CPU
received new CPU on monday

In total, the process took exactly 2 weeks and I only had my system without a CPU for 10 days. I did not pay anything.
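
For anyone who wants to double-check the "2 weeks / 10 days" figures, here is a minimal sketch of the date arithmetic. The post only gives weekdays, so the start date below is a made-up placeholder (2021-03-01 simply happens to be a Monday); the day offsets are read off the weekdays listed above.

```python
from datetime import date, timedelta

# Hypothetical start date: the post only names weekdays, not calendar dates.
# 2021-03-01 happens to be a Monday and is used purely as a placeholder.
ticket_opened = date(2021, 3, 1)

# Day offsets from the ticket, derived from the weekdays listed in the post.
events = {
    "ticket opened (Mon)": 0,
    "info request answered (Wed)": 2,
    "label received, CPU picked up (Fri)": 4,
    "delivered to AMD (Mon)": 7,
    "RMA approved (Wed)": 9,
    "replacement shipped (Thu)": 10,
    "replacement received (Mon)": 14,
}

timeline = {name: ticket_opened + timedelta(days=offset) for name, offset in events.items()}

received = timeline["replacement received (Mon)"]
total_days = (received - timeline["ticket opened (Mon)"]).days
days_without_cpu = (received - timeline["label received, CPU picked up (Fri)"]).days

for name, day in timeline.items():
    print(f"{day:%a %d %b}  {name}")
print(f"total: {total_days} days, without CPU: {days_without_cpu} days")
```

With these offsets the totals come out to 14 days for the whole process and 10 days between pickup and the replacement arriving, which matches the numbers quoted above.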

Background on my problems:

this is my 3rd 5900x - the first 5900x was a complete mess, with a 15 °C temperature difference between CCDs and WHEA errors or reboots immediately after starting Windows at stock settings... I returned it to the store and got a second one, which was better: no temperature difference between CCDs and only 1 bad core, which was stable only with CO +10. The rest of the cores were OK.
new CPU is stable at stock and can even take CO -10 without any reboots or rounding errors with same BIOS between 2nd and new cpu so Redwoodz can shut up his nonsense
1st and 2nd CPUs were lot 2046SUS
new CPU is 2105SUS

What I can tell for sure is that the boost behavior on the new CPU is totally different. I need -10 CO on all cores just to reach the same clocks that the old CPU reached at stock. Benchmarks slightly lower but again, with adjusted CO I get the same results as old one and be stable at the same time.

*So I can say for sure that AMD has reduced the boost curve in the latest samples to fix these stability problems.* Unfortunately, this comes with a slight performance penalty, but I guess it's much better to have a stable CPU which is 1% slower than an unusable system at stock.

So yeah, I guess that's why the activity on this forum ramped down, AMD "fixed it" by lowering down the turbo. Unfortunately, I still think there are a lot of initial zen3 CPUs that are in some systems now that are unstable or at the limit of instability and the people running them do not know about them and will either have reboots from time to time not knowing why or even get corrupt data and not know why.

Since they "fixed it", it is clear now that AMD will never speak about this publicly.


----------



## 1devomer

ghiga_andrei said:


> Since they "fixed it", it is clear now that AMD will never speak about this publicly.



It doesn't mean that AMD's customers who invested in early Zen3 samples bought a decent CPU!!!

Once one starts to overclock the CPU, one will rapidly come to the conclusion that the CPU is dull and bad!
As time goes by, AMD may launch another "XT" or "Zen3+" line-up in the near future, leaving old customers stuck with horrible CPU bins.
This was my personal experience when I bought my Zen2 CPU, if you are wondering why I kept adding information to this thread.

In my case, I will not buy any AMD products in the near future.
Not until AMD stops being a shady company. The speculation on AMD stock is very, very strong, and our stonk-loving friends are pretty happy about it.
And I have the feeling that the shadiness that has enveloped AMD will not go away anytime soon.

Not a big deal in the end: nobody buys or even knows AMD products exist outside the tech enthusiast sphere.
I'm just a bit sad for the unlucky tech lovers who invested in a CPU and platform and got a dull piece of doped sand in return.


----------



## ghiga_andrei

1devomer said:


> It doesn't mean that AMD's customers who invested in early Zen3 samples bought a decent CPU!!!
> 
> Once one starts to overclock the CPU, one will rapidly come to the conclusion that the CPU is dull and bad!
> As time goes by, AMD may launch another "XT" or "Zen3+" line-up in the near future, leaving old customers stuck with horrible CPU bins.
> This was my personal experience when I bought my Zen2 CPU, if you are wondering why I kept adding information to this thread.
> 
> In my case, I will not buy any AMD products in the near future.
> Not until AMD stops being a shady company. The speculation on AMD stock is very, very strong, and our stonk-loving friends are pretty happy about it.
> And I have the feeling that the shadiness that has enveloped AMD will not go away anytime soon.
> 
> Not a big deal in the end: nobody buys or even knows AMD products exist outside the tech enthusiast sphere.
> I'm just a bit sad for the unlucky tech lovers who invested in a CPU and platform and got a dull piece of doped sand in return.


I wouldn't be so extreme as to call it a dull piece of doped sand. When it's stable it's still a great performer at a much lower power consumption than what 11th gen intel manage.

I don't know if I would buy AMD or not in the future, it depends. But I will never buy anything new until I waited and see what these forums complain about in the new gen.


----------



## thigobr

They just sent notification about shipping my 3rd (2nd from RMA) 5950X. I hope this one works!

I should have learnt to wait for longer before buying Zen products... I ordered a 1700 on launch week and got the dreaded segfault bug. As I work developing code and I also use Gentoo Linux a lot I was getting constant segfault when compiling code. RMA...

I skipped Zen+ and only ordered a 3700X way later in the cycle, mid 2020. This one was very good sample, good overclocker and ran IF at 1900MHz.

Now I tried to order a Zen3 close to launch and got two defective samples in a row!

Next time even if AMD have a competitive product I will wait before buying...


----------



## 1devomer

ghiga_andrei said:


> I wouldn't be so extreme as to call it a dull piece of doped sand. When it's stable it's still a great performer at a much lower power consumption than what 11th gen intel manage.


I agree, you are right.
Still, the performance loss and increased instability are a hindrance for some tasks.

In my case, I had a lot of trouble aligning DNA fragments with MAFFT under Linux, which uses AVX FFT computation.
I mean, the difference between a decent bin and a dull bin is pretty huge in this case, even if these CPUs consume less power overall.


----------



## woozywoo

I just wanted to note here that a recent BIOS update fixed this WHEA/reboot issue for me. If you're experiencing this issue and haven't tried one of the more recent BIOS updates for your motherboard, it might be worth trying that before you RMA your CPU.


----------



## MikeS3000

ghiga_andrei said:


> Just wanting to let people from Europe know how the RMA worked in my case:
> 
> opened the ticket on a monday
> received mail with more info requests and answered immediately on wednesday
> received the prepaid DHL shipping label on Friday morning of the same week and called DHL to send a courier, who picked up the package from my house the same day
> received notification from DHL that package has been delivered to AMD on monday
> received inspection and RMA approved mail on wednesday
> received shipped new CPU mail on thursday - no AWB for replacement CPU, had no tracking for the new CPU
> received new CPU on monday
> 
> In total, the process took exactly 2 weeks and I only had my system without a CPU for 10 days. I did not pay anything.
> 
> Background on my problems:
> 
> this is my 3rd 5900x - the first 5900x was a complete mess, with a 15 °C temperature difference between CCDs and WHEA errors or reboots immediately after starting Windows at stock settings... I returned it to the store and got a second one, which was better: no temperature difference between CCDs and only 1 bad core, which was stable only with CO +10. The rest of the cores were OK.
> new CPU is stable at stock and can even take CO -10 without any reboots or rounding errors with same BIOS between 2nd and new cpu so Redwoodz can shut up his nonsense
> 1st and 2nd CPUs were lot 2046SUS
> new CPU is 2105SUS
> 
> What I can tell for sure is that the boost behavior on the new CPU is totally different. I need -10 CO on all cores just to reach the same clocks that the old CPU reached at stock. Benchmarks slightly lower but again, with adjusted CO I get the same results as old one and be stable at the same time.
> 
> *So I can say for sure that AMD has reduced the boost curve in the latest samples to fix these stability problems.* Unfortunately, this comes with a slight performance penalty, but I guess it's much better to have a stable CPU which is 1% slower than an unusable system at stock.
> 
> So yeah, I guess that's why the activity on this forum ramped down, AMD "fixed it" by lowering down the turbo. Unfortunately, I still think there are a lot of initial zen3 CPUs that are in some systems now that are unstable or at the limit of instability and the people running them do not know about them and will either have reboots from time to time not knowing why or even get corrupt data and not know why.
> 
> Since they "fixed it", it is clear now that AMD will never speak about this publicly.


The reduced boosting behavior is exactly what I experienced as well on my RMA 5900x. My original was like a Nov. '20 production date and I think the new one is January '21. All of my cores can take -10 or more on CO but you are correct when you say that CO is needed to reach the same boost clocks as the original defective one. AMD got a little too aggressive with Zen 3 and sacrificed stability and unfortunately ruined my confidence in their CPUs a bit. This will make me think twice about buying an AMD processor again. I don't recall as many issues on Zen 2 except for some 3950x cpus that were pushed too hard.


----------



## LuchoU

woozywoo said:


> I just wanted to note here that a recent BIOS update fixed this WHEA/reboot issue for me. If you're experiencing this issue and haven't tried one of the more recent BIOS updates for your motherboard, it might be worth trying that before you RMA your CPU.


For my 5800x, the newer BIOSes are making the problem less frequent, but it's still there. In my case, my two best cores according to Ryzen Master were giving WHEA errors because they were boosting past 4.85GHz even without PBO. I had to use CO +5 on the best core and CO +2 on the second-best core to keep them stable.


----------



## tdimarzio

Abula said:


> Very interesting case. I have also done lots of swapping with no success:
> 
> 
> 3x different sets of RAM, at stock and XMP: Corsair CMK16GX4M2Z3600C14, Gskill F4-3600C16D-16GVK, Gskill F4-3600C16D-32GTZNC
> 2x different PSUs: Seasonic SSR-1000TR and Corsair RM850x
> 2x different SSDs: Sabrent Rocket Plus 2TB and Samsung 980 Pro 2TB (some said there were issues with Sabrent SSDs)
> 2x Gigabyte X570 Aorus Xtreme (v1.0 and v1.1) with multiple BIOS versions: F30/F31/F32/F33c/F33h
> 2x different CPUs: 5950x and 3950x (in a recent test with the 3950x I still had a reset over the weekend, which is why I didn't RMA the CPU)
> 
> Now it's the turn of the PowerColor Red Devil 6900XT Limited Edition; I hope it's the GPU so I can move on. Was the PowerColor RMA easy?


The PowerColor RMA authorization was very quick. That was the quickest part. Few questions if any (or maybe none ... can't remember). However, overall, the process is passable, but not great. They would not do an advance RMA / cross-shipment. I offered to pay them up front and they said they could not take / process any payment. So, then my concern was how long I would be without a capable GPU. Since they are in CA and I am East coast, US, I paid good money to ship it 2-day air. It got to CA PowerColor RMA quickly as a result. However, while they received my 6900 XT on a Monday, they did not ship the replacement until Friday. That's too long if you ask me. Also, they only shipped on Friday because I called their RMA dept. a few times and mildly harassed them. If I had not called, I know the replacement would not have shipped until the following Monday at the earliest. Next issue is they would not ship faster than ground. I even offered to pay to upgrade the return shipping, but again they said they could not accept / process payment. So, shipping FedEx ground from US west coast to east coast took a full f'n week. Total time without my GPU was two weeks. So, that sucks ... but then I remembered how lucky I was to have purchased one (or any current gen GPU for that matter) at MSRP. So, I won't complain. Good luck in your RMA. (Maybe you'll get lucky and they'll send you a new 6900 XT Ultimate by mistake?)


----------



## OCmember


An odd thing happened a little while ago. I changed the bus clock from a fixed 100.00MHz to Auto to make Spread Spectrum visible in the BIOS. With Spread Spectrum left on Auto, I had a boot issue; setting Spread Spectrum to Disabled while leaving the bus clock on Auto worked without issue. Background info: from day one I've always set the bus clock to 100.00MHz, and with my board/BIOS that makes the Spread Spectrum option disappear.

I just checked if I had any WHEA errors. My last WHEA error was when I was testing my IF clock and RAM, April 3rd and I had 6 WHEA errors from the OC testing. There were 164 WHEA errors in a matter of 1 minute as the log shows.

EDIT: one of the logs shows EVENT ID 18 with a red exclamation point "A fatal hardware error has occurred"

Explanation


https://social.technet.microsoft.com/wiki/contents/articles/3567.event-id-18-microsoft-windows-whea-logger.aspx



*Some of the main hardware problems which cause machine check exceptions include:*



System bus errors (error communicating between the processor and the motherboard)
Memory errors that may include parity and error correction code (ECC) problems. Error checking ensures that data is stored correctly in the RAM; if the information is corrupted, then random errors occur.
Cache errors in the processor; the cache stores important data and code. If this is corrupted, errors often occur.
Poor voltage regulation (i.e. power supply problem, voltage regulator malfunction, capacitor degradation)
Damage due to power spikes
Static damage to the motherboard
Incorrect processor voltage setting in the BIOS (too low or too high)
Overclocking
Permanent motherboard or power supply damage caused by prior overclocking
Excessive temperature caused by insufficient airflow (possibly caused by fan failure or blockage of air inlet/outlet)
Improper BIOS initialization (the BIOS configuring the motherboard or CPU incorrectly)
Installation of a processor that is too much for your motherboard to handle (excessive power requirement, incompatibility)
Defective hardware that may be drawing excessive power or otherwise disrupting proper voltage regulation


Great
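
For anyone who wants to count WHEA-Logger entries per minute like the log check described above, here is a rough sketch, not a polished tool. It assumes the text layout that `wevtutil qe ... /f:text` prints (indented `Date:` lines with an ISO-style timestamp); the sample log in the code is invented for illustration and is not from a real system.

```python
import subprocess
from collections import Counter

# Query for the newest WHEA-Logger events in the System log (Windows only).
WEVTUTIL_CMD = [
    "wevtutil", "qe", "System",
    "/q:*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]",
    "/f:text", "/rd:true", "/c:500",
]

def read_whea_log() -> str:
    """Run wevtutil and return its text output (only works on Windows)."""
    return subprocess.run(WEVTUTIL_CMD, capture_output=True, text=True, check=True).stdout

def events_per_minute(text: str) -> Counter:
    """Count events per minute based on the 'Date:' lines of the text output."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Date:"):
            stamp = line[len("Date:"):].strip()  # e.g. 2021-04-03T18:22:41.123
            counts[stamp[:16]] += 1              # keep YYYY-MM-DDTHH:MM only
    return counts

# Invented sample in the assumed layout, for illustration only.
SAMPLE = """\
Event[0]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-03T18:22:05.100
  Event ID: 19
Event[1]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-03T18:22:17.250
  Event ID: 19
Event[2]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-03T18:22:41.900
  Event ID: 18
Event[3]:
  Log Name: System
  Source: Microsoft-Windows-WHEA-Logger
  Date: 2021-04-03T18:23:02.000
  Event ID: 19
"""

for minute, count in sorted(events_per_minute(SAMPLE).items()):
    print(minute, count)
```

On a Windows machine, passing `read_whea_log()` instead of `SAMPLE` should give the per-minute counts from the real System log, provided the output layout matches the assumption above.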


----------



## thigobr

I got a new 5950X 2106PGS and I am glad to say this one is stable at stock! I ran one day of CoreCycler Prime95 and another day of CoreCycler y-cruncher and it worked without any errors! Again, same UEFI, same parts, load defaults and the same Windows install. I am seeing the same kind of boost as with the previous CPUs though. Some people reported lower boost frequencies, but this new one is getting up to 5050MHz on the best cores and between 4800~5000MHz on the other cores, all stock with PBO disabled.

Bonus: this one is stable with FCLK at 1900MHz and very low voltages (VSOC 1.06V, CLDO VDDP 0.90V, VDDG_CCD 0.94V and IOD 0.98V). No more WHEA!

TLDR: After two bad 5950s (2044 and 2104) I finally got a stable one!
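
To illustrate the idea behind per-core testing (load one core at a time and compare the result against a known-good value), here is a toy sketch. It is not how CoreCycler, Prime95 or y-cruncher work internally and it produces nowhere near their load; the `workload` and `check_cores` names are made up for this example, and core pinning via `os.sched_setaffinity` is only available on Linux (elsewhere the sketch just runs unpinned).

```python
import hashlib
import math
import os

def workload(iterations: int = 50_000) -> str:
    """Deterministic floating-point loop; returns a digest of the final value."""
    x = 0.5
    for i in range(1, iterations + 1):
        x = math.sin(x) * math.sqrt(i) + math.fmod(x * i, 1.0)
    return hashlib.sha256(repr(x).encode()).hexdigest()

def check_cores(passes: int = 2, iterations: int = 50_000) -> dict:
    """Run the workload pinned to each core in turn; return {core: mismatches}."""
    can_pin = hasattr(os, "sched_setaffinity")  # Linux only
    original = os.sched_getaffinity(0) if can_pin else None
    cores = sorted(original) if can_pin else [0]
    reference = workload(iterations)  # computed once, before pinning
    mismatches = {}
    try:
        for core in cores:
            if can_pin:
                os.sched_setaffinity(0, {core})  # restrict this process to one core
            mismatches[core] = sum(
                workload(iterations) != reference for _ in range(passes)
            )
    finally:
        if can_pin:
            os.sched_setaffinity(0, original)  # restore the original core set
    return mismatches

results = check_cores(passes=1, iterations=20_000)
for core, bad in results.items():
    print(f"core {core}: {'OK' if bad == 0 else f'{bad} mismatching pass(es)'}")
```

A mismatch on one core would be the same class of symptom as a Prime95 rounding error: the same calculation giving a different answer. A clean run of something this light says very little, so real stability testing still needs the proper tools run for many hours.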


----------



## OCmember

Nice, @thigobr !


----------



## mongoled

thigobr said:


> I got a new 5950X 2106PGS and I am glad to say this one is stable at stock! I ran one day of CoreCycler Prime95 and another day of CoreCycler y-cruncher and it worked without any errors! Again, same UEFI, same parts, load defaults and the same Windows install. I am seeing the same kind of boost as with the previous CPUs though. Some people reported lower boost frequencies, but this new one is getting up to 5050MHz on the best cores and between 4800~5000MHz on the other cores, all stock with PBO disabled.
> 
> Bonus: this one is stable with FCLK at 1900MHz and very low voltages (VSOC 1.06V, CLDO VDDP 0.90V, VDDG_CCD 0.94V and IOD 0.98V). No more WHEA!
> 
> TLDR: After two bad 5950s (2044 and 2104) I finally got a stable one!


Very happy for you


----------



## 1devomer

1devomer said:


> ...If the single core boost is too fast and the default max voltages are too low for the core silicon quality, the cpu will crash the OS with a WHEA error...





OCmember said:


> ....EDIT: one of the logs shows EVENT ID 18 with a red exclamation point "A fatal hardware error has occurred"
> 
> Explanation
> 
> 
> https://social.technet.microsoft.com/wiki/contents/articles/3567.event-id-18-microsoft-windows-whea-logger.aspx
> 
> 
> 
> *Some of the main hardware problems which cause machine check exceptions include:*
> 
> 
> 
> System bus errors (error communicating between the processor and the motherboard)
> Memory errors that may include parity and error correction code (ECC) problems. Error checking ensures that data is stored correctly in the RAM; if the information is corrupted, then random errors occur.
> Cache errors in the processor; the cache stores important data and code. If this is corrupted, errors often occur.
> Poor voltage regulation (i.e. power supply problem, voltage regulator malfunction, capacitor degradation)
> Damage due to power spikes
> Static damage to the motherboard
> Incorrect processor voltage setting in the BIOS (too low or too high)
> Overclocking
> Permanent motherboard or power supply damage caused by prior overclocking
> Excessive temperature caused by insufficient airflow (possibly caused by fan failure or blockage of air inlet/outlet)
> Improper BIOS initialization (the BIOS configuring the motherboard or CPU incorrectly)
> Installation of a processor that is too much for your motherboard to handle (excessive power requirement, incompatibility)
> Defective hardware that may be drawing excessive power or otherwise disrupting proper voltage regulation
> 
> 
> Great



I'm happily surprised that someone actually took the time and checked what a WHEA crash is, thank you.

I did not expand on the matter, but just remember that a CPU is composed of a lot of components built together.
Which means that the cores themselves may not always be the main culprit.
Sometimes other components of the CPU could be badly binned or faulty, like the cache, USB/PCIe, ∞ fabric, memory controller, etc.
The same is also true for the internal components that make up a core, the parts that actually do the computation.

In short, a WHEA error is a general way of telling you that the CPU is not stable, but only just enough to avoid hard-crashing the system.


----------



## salMarv

Deepcuts said:


> When I first swapped my 3950X with the new 5950X, I have setup BIOS the exact same way again and I recall it worked for about 2 hours without problems. Then it started misbehaving.
> At 1st, I didn't give much thought to it but seeing more people stating the same thing jolted my neuron.
> The only tweaks were with RAM. Manual 3600 @ 1.38V with tight timings. UCLK and FCLK at 1800. PBO disabled and the only manual voltage was VCORE -0.0100 mV. Rest auto, including LLC.
> VCORE was negative as that gave me the best performance and lower temps.
> Then again, maybe it was just my imagination or dumb luck. No way to know for sure.


I am getting exactly the same problem, but with a 5600X. I have to disable PBO and all overclocks on an X570-F motherboard. Could my board be faulty? It does it on the 3900X as well as the 5600X. Or is it the power supply? Really hard to figure this out.


----------



## ghiga_andrei

Three weeks without a new post... it seems the "fix" from AMD works, reducing the aggressive boost.


----------



## GRABibus

ghiga_andrei said:


> Three weeks without a new post... it seems the "fix" from AMD works, reducing the aggressive boost.


which fix ?


----------



## xeizo

GRABibus said:


> which fix ?


Early 5-series BIOSes boosted ridiculously high; not so anymore with newer BIOSes, which means no more idle reboots. For myself, I haven't had one in months. And I didn't change CPU, I'm still on my early-sample 5900X.


----------



## brasoveanul

I tested with the ClockBoost tester utility, and it reaches 5 GHz+ on some cores. It seems stable, and there were no significant issues since I installed it. Therefore, I cannot comment on this, but I would like to see more discussion, in order to better understand the dynamics of the phenomenon.


----------



## braincracking

I also have an early 5900X, and the problems I ran into were ones of quality control. There was an unidentified piece of plastic (I assume) between the pins, which I found under the digital microscope and which made the entire system unstable. I was forced to run my RAM at 2133 MHz, as anything else wouldn't make it past RAM training, and entire channels would randomly disappear. About 2 months ago I replaced my waterblock (Raystorm Neo AM4 to Heatkiller IV), inspected the CPU under the microscope and found the culprit of the nonsense. To be 100% fair, the Raystorm Neo AM4 has issues with its mounting mechanism that make it easy to overtighten, which causes instability as well, so I had multiple problems. Now all is good and well, and it didn't have anything to do with the BIOS (I had the same motherboard throughout this ordeal, an early-sample X570 Master that I had used with the 3900X I replaced).


----------



## brasoveanul

So, it turns out that you should have a fairly advanced testing lab with a digital microscope in order to use and debug these processors.


----------



## LuchoU

I think the silence in this thread is a mixture of people doing RMAs and AMD introducing better BIOSes. CPUs may look more stable at idle, but may still exhibit the issue if pushed through CPU-intensive games or stress tests. In my case the BIOS improved the boost behaviour of the slower cores, and they are not failing anymore, but the supposedly good cores (best and second best according to Ryzen Master) were still giving rounding errors when pushed through CoreCycler, so from my side it is not totally resolved yet. I was able to get stability by adding a positive CO offset on those two.


----------



## GRABibus

LuchoU said:


> I think the silence in this post is a mixture of people doing RMA's and AMD introducing better bioses. CPUs may look more stable in idle, but if pushed through CPU intensive games or stress tests may exhibit the issue. In my case bios improved slower cores boost behaviour, they are not failing anymore, but supposedly good cores (best one and second best one according to Ryzen Master) were still giving rounding errors if pushed through corecycler, so at least from my side is not totally resolved yet. I was able to get stability by adding CO+ on those two.


or maybe no more posts because they all went to Intel 😊


----------



## LuchoU

GRABibus said:


> or maybe no more posts because they all went to Intel 😊


Well, I don't blame them. This WHEA issue was a pain to resolve and I invested a lot of time doing it. My experience with Intel has been good in the past, you build and forget, but I didn't want to go through buying a new MB plus a new CPU, plus extra money, plus extra time to build, etc.


----------



## 1devomer

1devomer said:


> Time flowing away, AMD in the near future may launch another "XT", "Zen3+" line-up, leaving old customers laid with horrible cpu bins.











AMD Reportedly Preparing B2 Stepping of Ryzen 5000 Series "Vermeer" Processors, Boost Speeds to Reach 5.0 GHz


AMD is reportedly preparing to launch a B2 stepping of their Ryzen 5000 series of processors, codenamed Vermeer. Thanks to the findings of Patrick Schur, who was lucky to get ahold of AMD's processor codes, we have information that AMD is slowly preparing a B2 stepping of Vermeer processors, to...




www.techpowerup.com





It doesn't bode well, especially for the users that already RMAed their flaky cpu!
Wonder why a new stepping was needed??


----------



## ghiga_andrei

1devomer said:


> AMD Reportedly Preparing B2 Stepping of Ryzen 5000 Series "Vermeer" Processors, Boost Speeds to Reach 5.0 GHz
> 
> 
> AMD is reportedly preparing to launch a B2 stepping of their Ryzen 5000 series of processors, codenamed Vermeer. Thanks to the findings of Patrick Schur, who was lucky to get ahold of AMD's processor codes, we have information that AMD is slowly preparing a B2 stepping of Vermeer processors, to...
> 
> 
> 
> 
> www.techpowerup.com
> 
> 
> 
> 
> 
> It doesn't bode well, especially for the users that already RMAed their flaky cpu!
> Wonder why a new stepping was needed??


Users on other threads are speculating that they will be replacing the IO-die with the new tech node one that will be also present in the new X570S chipset. New Motherboards will also be released.

Could be just a power efficiency thing since the IO-die in my 5900x draws 20W all the time... Lower power to the IO-die means more power for the cores.


----------



## GRABibus

To save 20W and win 100MHz, new CPU and new MOBO…

not for me 😊


----------



## OCmember

I think these are OEM only, not available to the public for sale. Correct me if I am wrong


----------



## ghiga_andrei

OCmember said:


> I think these are OEM only, not available to the public for sale. Correct me if I am wrong


I think you are thinking of the non-X 5800 and 5900 models that were released months ago:








AMD Launches Ryzen 9 5900 & Ryzen 7 5800 OEM Processors


AMD has quietly launched two new Zen 3 processors for the OEM market with the Ryzen 9 5900 and Ryzen 7 5800. The Ryzen 9 5900 is a 12 core 24 thread processor with a base clock of 3.0 GHz and a max boost clock of 4.7 GHz along with a TDP of 65 W. The clock speeds were lowered due to the 65 W TDP...




www.techpowerup.com


----------



## uzi1

Hi All,

I just recently upgraded from a 6700K, which never gave me any issues in the 5-6 years I've had it, but I thought I was overdue an upgrade, so a few weeks ago I ordered the following:

5900x
MSI x570 Unify
980 Pro 512gb m.2

Running Cinebench all-core, temps max out around 81c, and this is with fans at 900rpm on a 360 rad. Temps don't seem too bad compared to what others are reporting.

The rest of the parts are from my old system and I never had issues with them, so it is unlikely any of them is at fault:
HX1000i
Kraken Z73 AIO
SN750 2TB m.2 for games/storage
3070 FE
CMW16GX4M2Z3600C18. I actually have 4x8gb sticks of these; even though they are the same model they have different chips, as I bought 1 pair later. I never had issues with all 4 installed at XMP, even after tightening the timings to CL16.

I heard that 4 sticks can cause issues, so I removed 2 of them, and at stock I still get the issue, so I don't think it's the RAM, which I also used on the other system with no issues.

I did start playing with PBO and Curve Optimizer right away, which isn't the best thing to do, and I thought it was down to that, but I set everything to Auto and still have the issue.

I also flashed the latest BIOS right away, which is a beta released on 2021-04-08.

The only other thing I can try is connecting the other 8-pin CPU connector to the board. I don't know what else to try.

My only other thought is that either the CPU or the motherboard is at fault, or it's the BIOS.

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
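
For anyone collecting these events to compare them: a minimal Python sketch, assuming the event text has been copied out of Event Viewer in the `Field: value` form shown above. The function name `parse_whea_event` is made up for illustration and is not part of any Windows tool.

```python
def parse_whea_event(text: str) -> dict[str, str]:
    """Turn 'Field: value' lines from a copied WHEA-Logger event into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (e.g. the headline sentence) are skipped.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 4
"""

event = parse_whea_event(sample)
print(event["Error Type"], int(event["Processor APIC ID"]))
```

Keeping the APIC ID from each event makes it easy to see whether the same logical processor shows up every time.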

Thanks


----------



## ghiga_andrei

uzi1 said:


> Hi All,
> 
> I just recently upgraded from 6700k which never gave me any issues in the 5-6 years I've had it but thought I was over due an upgrade so few weeks ordered the following
> 
> 5900x
> MSI x570 Unify
> 980 Pro 512gb m.2
> 
> running cinebench all cores temps max around 81c and this is with fans at 900rpm on 360 rad temps dont seem to bad when comparing what others are reporting
> 
> the rest of the parts are from my old system and never had issues with them so unlikely any of them have fault
> HX1000i
> Kraken Z73 AIO
> SN750 2TB m.2 for games/storage
> 3070 FE
> CMW16GX4M2Z3600C18 I actually have 4x8gb sticks of these even though they are the same model they have different chips I bought 1 pair later never had issues will all 4 installed at xmp even tighting the timings to CL16
> 
> I heard about 4 sticks can cause issues so removed 2 of them and at stock still get the issue so dont think its the ram which I also used on other system with no issues
> 
> I did first right away started playing with PBO and curve optimizer which isnt the best thing to do and thought it was down to that but I set everything to Auto and still have the issue
> 
> I did also flash the latest bios right away which is beta released on 2021-04-08
> 
> only other thing I can try is connect the other 8 pin CPU connector to the board I dont know what else to try
> 
> my only other thoughts are either the CPU or Motherboard are at fault or its the Bios
> 
> A fatal hardware error has occurred.
> 
> Reported by component: Processor Core
> Error Source: Machine Check Exception
> Error Type: Cache Hierarchy Error
> Processor APIC ID: 4
> 
> Thanks


First, do a clear CMOS by battery removal to be sure the Curve Optimizer has been deactivated. Just setting things back to Auto does not always work, the BIOS is very buggy.

Then, see if your APIC ID is always 4 (or 5). This would mean only your core 2 (3rd core) would be bad. If it is always APIC ID 4 or 5, then go back to Curve Optimizer and put Positive 15 to that core (core 2 in BIOS). See if that changes anything.
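
The APIC-ID-to-core step can be sketched in Python. This assumes SMT is enabled with two threads per core and that APIC IDs are numbered contiguously from 0, which matches the 4/5 to core 2 example but may not hold across CCDs on every chip. `core_from_apic_id` and `suspect_cores` are hypothetical helper names, and the list of logged IDs is made up.

```python
from collections import Counter


def core_from_apic_id(apic_id: int, threads_per_core: int = 2) -> int:
    """Map a logical-processor APIC ID to a 0-based core index.

    Assumption: contiguous APIC IDs, `threads_per_core` threads per core.
    """
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    return apic_id // threads_per_core


def suspect_cores(apic_ids):
    """Count how often each core shows up across logged WHEA events."""
    return Counter(core_from_apic_id(a) for a in apic_ids)


logged = [4, 5, 4, 4]  # made-up APIC IDs read from several WHEA events
print(suspect_cores(logged).most_common(1))  # [(2, 4)]
```

If one core dominates the count, that is the core to give the positive Curve Optimizer offset first.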

Report back with more info after you do this.


----------



## 1devomer

ghiga_andrei said:


> Users on other threads are speculating that they will be replacing the IO-die with the new tech node one that will be also present in the new X570S chipset. New Motherboards will also be released.
> 
> Could be just a power efficiency thing since the IO-die in my 5900x draws 20W all the time... Lower power to the IO-die means more power for the cores.





> AMD confirms that B2 stepping has no effect on performance or functionality and will require no BIOS upgrade from the user.
> 
> 
> 
> As part of our continued effort to expand our manufacturing and logistics capabilities, AMD will gradually move AMD Ryzen 5000 Series Desktop Processors to B2 Revision over the next 6 months. The revision does not bring improvements in terms of functionality or performance, furthermore, no BIOS update will be required.
> _— AMD spokesperson to Benchmark.pl_

One does not simply respin a design into a new stepping within the first 6 months of the product's life just for fun.
The mystery deepens, as is common practice when dealing with AMD.

Edit: I laughed pretty hard when I read the _"furthermore, no BIOS update will be required"_ part of the statement!!


----------



## uzi1

ghiga_andrei said:


> First, do a clear CMOS by battery removal to be sure the Curve Optimizer has been deactivated. Just setting things back to Auto does not always work, the BIOS is very buggy.
> 
> Then, see if your APIC ID is always 4 (or 5). This would mean only your core 2 (3rd core) would be bad. If it is always APIC ID 4 or 5, then go back to Curve Optimizer and put Positive 15 to that core (core 2 in BIOS). See if that changes anything.
> 
> Report back with more info after you do this.


Hi, since that message I haven't had another crash. Before you replied I had set everything to stock, with only PBO enabled and RAM at its XMP profile, which is 3600 CL18, with only 2 sticks installed. It has been running for 10 hours now. I've done Cinebench runs, which give me around 4.5 GHz all-core, and in single-core a few cores hit 4950 MHz. All-core max temp is 81c and single-core 61c, using the Kraken Z73 with fans running at around 900rpm. I won't get ahead of myself just yet.

I've also done a few game benchmarks: temps are around 55c-70c and a few cores hit 4950 MHz. At idle it's around 38c, and general web browsing is 42c-45c, with the 360 rad fans at 900rpm.

From searching and reading about what others experience with the 5900X, my temps seem okay and Cinebench seems to match up. I'm not bothered about trying to push for more; I'm fine with the current clocks and temps and just hoping it stays like this. If I don't see any issues for another day or so, I will add the other 2 sticks and see. I might also connect the other CPU 8-pin power connector. I just have 1 connected now; from what I've read one 8-pin is enough, but there's no harm in connecting it.

Thanks


----------



## ghiga_andrei

uzi1 said:


> Hi, since this message I havent had another crash before you replied I just set everything to stock only PBO enabled and RAM at its XPM which is 3600 CL18 only 2 sticks installed been running for 10 hours now done Cinebench runs which gives me around 4.5ghz all cores and at single few cores hit 4950mhz all core max temp here is 81c and single 61c using Kraken z72 with fans running at around 900rpm I wont get ahead of myself just yet
> 
> Also done few game benchmarks temps are around 55c - 70c and few cores hit 4950mhz at idle its around 38c and general web browsing 42c-45c 360 rad fans at 900rpm
> 
> searching and reading on what others experience with the 5900x my temps seem okay and Cinebench seems to match up I aint bothered about trying to push for more just hoping it stays like this now and fine with the current clocks and temp if I dont see any issues for another day or so I will add the other 2 sticks and see I might connect the other CPU 8 pin power connector just have 1 connected now but reading 1 8pin is enough but no harm connecting it
> 
> Thanks


The issue happens at light loads, so you will not catch it by running Cinebench. Just use the PC for Chrome browsing and then leave it idle or keep browsing. This will make the cores boost higher than CB runs.


----------



## uzi1

ghiga_andrei said:


> The issue happens at light loads, so you will not catch it by running Cinebench. Just use the PC for Chrome browsing and then leave it idle or keep browsing. This will make the cores boost higher than CB runs.


Yes, I also just browsed and watched YouTube, and left it idle for around 1 hr to see what idle temp it would settle on and whether I needed to adjust the fans for idle.


----------



## ghiga_andrei

1devomer said:


> One does not simply respin a design into a new steeping, within the first 6 months of the product life, just for fun.
> The mystery deepens, as it is common practice when dealing with AMD??
> 
> Edit: i laughed pretty hard, when i read the _"furthermore, no BIOS update will be required" part of the statment!!_


I feel the same way, that they identified the problem and will change something either in technology or adjust something in the design. But this is a very shady practice indeed: why fix something without acknowledging the problem?

Regarding the BIOS update, yes, that's straight-up comedy. I guess what they mean is that the new chips will be compatible with the older BIOS, without a need for boot kits like for the original Zen 3 chips on older MBs.


----------



## GRABibus

uzi1 said:


> Hi, since this message I havent had another crash before you replied I just set everything to stock only PBO enabled and RAM at its XPM which is 3600 CL18 only 2 sticks installed been running for 10 hours now done Cinebench runs which gives me around 4.5ghz all cores and at single few cores hit 4950mhz all core max temp here is 81c and single 61c using Kraken z72 with fans running at around 900rpm I wont get ahead of myself just yet
> 
> Also done few game benchmarks temps are around 55c - 70c and few cores hit 4950mhz at idle its around 38c and general web browsing 42c-45c 360 rad fans at 900rpm
> 
> searching and reading on what others experience with the 5900x my temps seem okay and Cinebench seems to match up I aint bothered about trying to push for more just hoping it stays like this now and fine with the current clocks and temp if I dont see any issues for another day or so I will add the other 2 sticks and see I might connect the other CPU 8 pin power connector just have 1 connected now but reading 1 8pin is enough but no harm connecting it
> 
> Thanks


Now you can try 10 instead of 15 as offset and retest idle/low load stability for several days, etc…..until you find the lowest stable positive offset.
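
That step-down procedure, sketched in Python. `is_stable` is a hypothetical stand-in for "run the system at idle/low load for several days with this offset and see whether it crashes"; nothing here talks to the BIOS, and the step size of 5 is just the 15 to 10 step mentioned above.

```python
from typing import Callable


def lowest_stable_offset(start: int, step: int,
                         is_stable: Callable[[int], bool]) -> int:
    """Walk a positive Curve Optimizer offset down while tests keep passing.

    Assumes `start` itself has already been tested stable. Returns the
    smallest offset that was observed stable.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    best = start
    candidate = start - step
    while candidate >= 0 and is_stable(candidate):
        best = candidate
        candidate -= step
    return best


# Made-up example: pretend this core needs at least +8 to be stable.
print(lowest_stable_offset(15, 5, lambda offset: offset >= 8))  # 10
```

In practice each `is_stable` call is days of normal use, so a coarse step first and a finer step afterwards keeps the number of test rounds down.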


----------



## uzi1

Figured out what was causing the freezing. I have 4x8gb of CMW16GX4M2Z3600C18 that were in my old system. Even though all are the exact same model, 2 sticks are Micron and the other 2 are Nanya (I bought the 2nd pair later on as an upgrade). Either that is causing the issue, or the system doesn't like 4 sticks, which I have read about, or there are some settings I need to change; I had them running at stock. All 4 worked fine in the old system with the 6700K, running 3600 CL16.

The system works fine with just 2 sticks installed, even with the timings tightened to CL16.

So I guess my only solution is to buy a 2x16gb kit and get rid of these.


----------



## chris719

LuchoU said:


> I think the silence in this post is a mixture of people doing RMA's and AMD introducing better bioses. CPUs may look more stable in idle, but if pushed through CPU intensive games or stress tests may exhibit the issue. In my case bios improved slower cores boost behaviour, they are not failing anymore, but supposedly good cores (best one and second best one according to Ryzen Master) were still giving rounding errors if pushed through corecycler, so at least from my side is not totally resolved yet. I was able to get stability by adding CO+ on those two.


Now that availability is better you might be seeing more soon. I just got my 5950X on Friday and it’s already restarted randomly once a day at stock settings other than DOCP 3600. Previously had 3950X with the same settings completely stable for a year. Crucial 3600C16 2x32GB, CH8 Wi-Fi, and Seasonic Prime Titanium 1000W. Really pissed.

I have to say, after my issues with the 5950X and the Ryzen consumer platform in total, the only CPUs I can recommend from AMD are Threadripper and Epyc. My Threadripper 3960X at work has never crashed, ever. My Epyc 7742 server has been amazing.


----------



## iraff1

Ended up getting a new 5950x chip, new chip is a golden sample according to ctr and performs really well, after a little curve optimizer and pbo enabled i am now at:

Cinebench 20
Multicore 11562
Single 663

I could probably tweak more, but for now I am happy with it. So much better than my old chip, which was stuck at
Multicore 10210
Single 621

Looking at this thread, I am surprised to see people are still receiving faulty units. My thought was that the factory process had been optimized a lot since launch and that the chips delivered now are much better than the early ones they sent out in November of 2020. Anyway, I am glad I can finally rest easy knowing my rig performs. I ****ing hate the silicon lottery.


----------



## tdimarzio

iraff1 said:


> Ended up getting a new 5950x chip, new chip is a golden sample according to ctr and performs really well, after a little curve optimizer and pbo enabled i am now at:
> 
> Cinebench 20
> Multicore 11562
> Single 663
> 
> I could probably tweak more but for now i am happy with it, so much better then my old chip that was stuck at
> Multicore 10210
> Single 621
> 
> Looking at this thread i am surprised to see people are still receiving faulty units, my thoughs where the factory process has been optimized much since launch and now the chip delivered are much better than the early ones they sent out in november of 2020. Anyway i am glad i can finally rest easy knowing my rigg performs, i ****ing hate the silicon lottery.


Congrats on winning the silicon lottery! While I'm happy you did so well, it reminds me and makes me sad that I sent in a perfectly good 5950X for RMA, believing that my WHEA idle reboots were due to a faulty chip, and got back a much lower achiever. My experience is detailed in an earlier comment in this thread, but, in short, my WHEA reboots were due to a faulty 6900 XT. Since receiving the replacement 6900 XT, I have not had a single WHEA or system instability / reboot. It's been about two months now. Prior to replacing the GPU, I had WHEA 18 reboots up to a couple of times a day.
The sad part is, my original 5950x from November of 2020, would boost higher, run cooler, and run with a much more aggressive CO. I was able to do -30 on most cores. The new 5950x from March, 2021, while perfectly stable without any CO, boosts lower, runs hotter, and will not take any appreciable CO.

[Original 5950x from November, which I believed to be faulty]
R20 single: 640
R20 multi: 11443

[New replacement 5950x from March, 2021]
R20 single: 619
R20 multi: 10930

So, it's sad, but I'm basically considering the loss of some performance due to silicon lottery to be the payment I made for additional experience. Even after building PCs for 25 years, I never would have expected that my idle WHEA 18 reboots were due to the GPU. The GPU was 100% stable in all gaming. Hundreds of hours of gaming. No graphical corruption. There just wasn't anything that made me even suspicious of the GPU. So, valuable experience gained at the expense of some CPU perf. A word of caution that a November 2020 5950x could perform much better than a much newer production 5950x. That's why they call it the silicon lottery. Don't assume that a RMA'd 5950x will be better silicon than earlier production runs.


----------



## Deepcuts

tdimarzio said:


> Don't assume that a RMA'd 5950x will be better silicon than earlier production runs.


There are a few users that received a 3rd faulty CPU (without any AMD GPU in their system)
For your specific case, I am sorry to hear you had bad luck, but you should have tested with a different GPU.
I know it is useless to state that after the fact, but at least I am sure you'll remember that for Ryzen 4 when you'll upgrade.


----------



## tdimarzio

Deepcuts said:


> There are a few users that received a 3rd faulty CPU (without any AMD GPU in their system)
> For your specific case, I am sorry to hear you had bad luck, but you should have tested with a different GPU.
> I know it is useless to state that after the fact, but at least I am sure you'll remember that for Ryzen 4 when you'll upgrade.


Indeed. And now I have a spare GPU in the closet specifically for that reason


----------



## iraff1

tdimarzio said:


> Congrats on winning the silicon lottery! While I'm happy you did so well, it reminds me and makes me sad that I sent in a perfectly good 5950x for RMA, believing that my WHEA idle reboots were due to a faulty chip, and got back a much lower-achiever. My experience is detailed in an earlier comment in this thread, but, in short, my WHEA reboots were due to a faulty 6900xt. Since receiving the replacement 6900xt, I have not have a single WHEA or system instability / reboot. It's been about two months now. Prior to replacing the GPU, WHEA 18 reboots up to a couple times a day.
> The sad part is, my original 5950x from November of 2020, would boost higher, run cooler, and run with a much more aggressive CO. I was able to do -30 on most cores. The new 5950x from March, 2021, while perfectly stable without any CO, boosts lower, runs hotter, and will not take any appreciable CO.
> 
> [Original 5950x from November, which I believed to be faulty]
> R20 single: 640
> R20 multi: 11443
> 
> [New replacement 5950x from March, 2021]
> R20 single: 619
> R20 multi: 10930
> 
> So, it's sad, but I'm basically considering the loss of some performance due to silicon lottery to be the payment I made for additional experience. Even after building PCs for 25 years, I never would have expected that my idle WHEA 18 reboots were due to the GPU. The GPU was 100% stable in all gaming. Hundreds of hours of gaming. No graphical corruption. There just wasn't anything that made me even suspicious of the GPU. So, valuable experience gained at the expense of some CPU perf. A word of caution that a November 2020 5950x could perform much better than a much newer production 5950x. That's why they call it the silicon lottery. Don't assume that a RMA'd 5950x will be better silicon than earlier production runs.


That blows, very sorry to hear this. Strange that the CPU would throw WHEA errors due to the GPU, but I'm far from enough of a computer expert to really know what these WHEA errors mean. Really happy I didn't end up getting another dud. I still believe the newer batches of 5950X are more likely to be good samples than the ones shipped first. This is always how it works in these factories: they get better at making them with time and trial and error, and eventually they have perfected the process and each chip performs pretty well.


----------



## 1devomer

iraff1 said:


> That blows, very sorry to hear this. Strange that the CPU would throw WHEA errors due to GPU but i'm far from a computer expert to really know what these WHEA errors mean. Really happy i didn't end up getting another dud, i still believe the newer batches of 5950x are more likely to be good samples then the ones shipped first, this is always how it works in these factories, they get better at making them with time and trial and error, eventually they have perfected the process and each chip performs pretty well.


As usual, i will appear like an old pedant guy, but having to RMA both gpu and cpu, to be able to get a working computer, is pretty concerning.

On top of that, the real wafer defect rate and the voltage binning curve are not released publicly, by the manufacturers.

So, there is no effective way to assess the current manufacturing process quality, because nobody has spent time assessing how good the EPYC chiplets are, which would give a glimpse of the whole binning curve, from the dies dedicated to a 5600X to the dies that power an EPYC 7763.

I still remember some extremely well binned 3950Xs that were running real 7nm voltages, requiring only between 1.10 V and 1.25 V at full load.
I suppose these were almost in the Threadripper/EPYC quality bin.


----------



## LuchoU

tdimarzio said:


> Congrats on winning the silicon lottery! While I'm happy you did so well, it reminds me and makes me sad that I sent in a perfectly good 5950x for RMA, believing that my WHEA idle reboots were due to a faulty chip, and got back a much lower-achiever. My experience is detailed in an earlier comment in this thread, but, in short, my WHEA reboots were due to a faulty 6900xt. Since receiving the replacement 6900xt, I have not have a single WHEA or system instability / reboot. It's been about two months now. Prior to replacing the GPU, WHEA 18 reboots up to a couple times a day.
> The sad part is, my original 5950x from November of 2020, would boost higher, run cooler, and run with a much more aggressive CO. I was able to do -30 on most cores. The new 5950x from March, 2021, while perfectly stable without any CO, boosts lower, runs hotter, and will not take any appreciable CO.
> 
> [Original 5950x from November, which I believed to be faulty]
> R20 single: 640
> R20 multi: 11443
> 
> [New replacement 5950x from March, 2021]
> R20 single: 619
> R20 multi: 10930
> 
> So, it's sad, but I'm basically considering the loss of some performance due to silicon lottery to be the payment I made for additional experience. Even after building PCs for 25 years, I never would have expected that my idle WHEA 18 reboots were due to the GPU. The GPU was 100% stable in all gaming. Hundreds of hours of gaming. No graphical corruption. There just wasn't anything that made me even suspicious of the GPU. So, valuable experience gained at the expense of some CPU perf. A word of caution that a November 2020 5950x could perform much better than a much newer production 5950x. That's why they call it the silicon lottery. Don't assume that a RMA'd 5950x will be better silicon than earlier production runs.


The lower CB number is probably because new CPUs are boosting more conservatively. Old batches from 2020 boosted too aggressively, and that was probably causing instabilities.

About your benchmark numbers: if we convert them to real-world numbers, how much are you losing in performance, 1 fps in games and 2 seconds in loading times for productivity tools? I'm just throwing numbers around, but I really think it is not that much. Maybe I'm old and I don't care too much for benchmarks and overclocking now; I'm more in the "build and forget" stage when it comes to PC hardware. My 5800X is from 2020 and I can say that I didn't win the silicon lottery, as I'm also not getting the same CB numbers I see in site reviews; mine are lower.
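
To put rough numbers on that: a small Python sketch computing the relative change between the two sets of R20 scores tdimarzio posted earlier in the thread (640 / 11443 for the original chip, 619 / 10930 for the replacement). It says nothing about game FPS or load times; it is only the benchmark delta.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# (original chip, replacement chip), Cinebench R20 scores quoted in the thread
scores = {
    "R20 single": (640, 619),
    "R20 multi": (11443, 10930),
}

for name, (old, new) in scores.items():
    print(f"{name}: {percent_change(old, new):+.1f}%")
# R20 single: -3.3%
# R20 multi: -4.5%
```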


----------



## tdimarzio

LuchoU said:


> The lower CB number is probably because new CPUs are boosting more conservatively. Old batches from 2020 were too aggresive to boost and that was probably causing instabilities.
> 
> About your benchmark numbers if we convert those numbers to real world numbers, how much are you loosing in performance, 1 fps in games and 2 seconds in loading times for productivy tools? I'm just throwing numbers, but I really think is not that much. Maybe I'm old and I don't care too much for benchmarks and overclocking now, I'm more in the "build and forget" stage when related to PC hardware. My 5800x is from 2020 and I can say that I didn't win the silicon lottery as I'm also not getting the same CB numbers I see in site reviews, mines are lower.


I agree ... the real-world impact on performance is probably very small, and obviously application / situation-dependent. As you say, I'm also getting older and care less now about overclocking and benchmarks than I used to, but I'd be lying to myself if I said overclocking is behind me. I can't resist the urge to try to squeeze more performance out of the CPU, RAM, GPU, etc. I'm 40 years old now, so maybe when I'm 60 I'll finally kick the overclocking obsession. In the meantime, I am content with my system, as it appears to be 100% stable, and that will always be more important than a few % in synthetic benchmarks.


----------



## WinterActual

tdimarzio said:


> Congrats on winning the silicon lottery! While I'm happy you did so well, it reminds me and makes me sad that I sent in a perfectly good 5950x for RMA, believing that my WHEA idle reboots were due to a faulty chip, and got back a much lower-achiever. My experience is detailed in an earlier comment in this thread, but, in short, my WHEA reboots were due to a faulty 6900xt. Since receiving the replacement 6900xt, I have not have a single WHEA or system instability / reboot. It's been about two months now. Prior to replacing the GPU, WHEA 18 reboots up to a couple times a day.
> The sad part is, my original 5950x from November of 2020, would boost higher, run cooler, and run with a much more aggressive CO. I was able to do -30 on most cores. The new 5950x from March, 2021, while perfectly stable without any CO, boosts lower, runs hotter, and will not take any appreciable CO.
> 
> [Original 5950x from November, which I believed to be faulty]
> R20 single: 640
> R20 multi: 11443
> 
> [New replacement 5950x from March, 2021]
> R20 single: 619
> R20 multi: 10930
> 
> So, it's sad, but I'm basically considering the loss of some performance due to silicon lottery to be the payment I made for additional experience. Even after building PCs for 25 years, I never would have expected that my idle WHEA 18 reboots were due to the GPU. The GPU was 100% stable in all gaming. Hundreds of hours of gaming. No graphical corruption. There just wasn't anything that made me even suspicious of the GPU. So, valuable experience gained at the expense of some CPU perf. A word of caution that a November 2020 5950x could perform much better than a much newer production 5950x. That's why they call it the silicon lottery. Don't assume that a RMA'd 5950x will be better silicon than earlier production runs.


I am not convinced it's the GPU, bro. You probably updated the BIOS in the meantime, or you were using the faulty HWiNFO version that was causing WHEAs; they then fixed it, so it probably updated and you didn't notice. But yes, there was a version of HWiNFO which was causing WHEA errors with the new 6000 GPUs. I also suffered from that and it took me a long time to realize what it was. Then I checked their forum, and it was even posted here on OC.net. I thought the new replacement CPU was also faulty... lol


----------



## 1devomer

WinterActual said:


> I am not convinced it's the GPU, bro. You probably updated the BIOS in the meantime, or you *were using the faulty HWiNFO version that* was causing WHEAs; they then fixed it, so it probably updated and you didn't notice. But yes, there was a version of HWiNFO which was causing WHEA errors with the new 6000 GPUs. I also suffered from that and it took me a long time to realize what it was. Then I checked their forum, and it was even posted here on OC.net. I thought the new replacement CPU was also faulty... lol


HWiNFO had no faulty version. AMD is to blame, as usual: it released new hardware and software without providing supporting information to developers.

You should blame the right person or entity. Because of posts like this, a lot of people like you threw mud at HWiNFO.

So be careful who you blame. If AMD weren't so bad, we would not even be here arguing in this thread about RMAing a CPU and GPU to get the computer working!


----------



## tdimarzio

WinterActual said:


> I am not convinced it's the GPU, bro. You probably updated the BIOS in the meantime, or you were using the faulty HWiNFO version that was causing WHEAs; they then fixed it, so it probably updated and you didn't notice. But yes, there was a version of HWiNFO which was causing WHEA errors with the new 6000 GPUs. I also suffered from that and it took me a long time to realize what it was. Then I checked their forum, and it was even posted here on OC.net. I thought the new replacement CPU was also faulty... lol


Thanks, but, no. That was me who posted in this thread about HWiNFO. I was one of those (AMD718) working closely with Martin (HWiNFO Author) on the WHEA errors on RDNA2 and testing HWiNFO beta versions until we were sure the issue was resolved. However, that did add an additional layer of complexity to the troubleshooting process for a period of time, which was fun (/s). CPU? Replaced twice. PSU? Replaced twice. RAM? Replaced. Windows? Reinstalled. BIOS? Tried about a dozen different versions, with hardware CMOS resets in between. Drivers? Tried every Adrenalin back to the first version supported by RDNA2. So, the only things left were motherboard and GPU. I really didn't want to replace the mobo unless absolutely necessary but I did not have a replacement GPU. So, I went and found the only GPU I could get at a local store. An absolutely awful p.o.s. of a GPU. NVidia GT 710. It gets 1 fps in Time Spy. But, it will do 15 fps in Quake live at 640x480. Anyway, popped in the GT710 and the system was 100% stable. Was even able to restore memory overclocking / timing optimizations and PBO+CO. So, yeah, it was 100% the GPU. Then I had to go through the RMA process with Gigabyte for the 6900xt. These are some bad memories. I'm sure it's over 100 hours invested start-to-finish. Replacement 6900xt has been perfect for two months. Not a single WHEA 18. So, 100% it was the GPU.


----------



## Imraneo

Hi guys, it's been a while now..
I just started with CoreCycler and it seems like I'll have to restart my testing all over again!
Honestly I do not know if I can trust CoreCycler in the first place. There are times where every single core will have rounding errors at one go and right after restarting the tool, I'll have errors on only some.

Based on my previous PBO curve settings, every core seems to error out, just not in a single iteration. It takes about 5 iterations for every single core to eventually fail.

I've reset my PBO settings to "enabled" (I believe this is the same as no voltage offsets). I get fewer errors, like the following (ran 2 min, no skipping of failed cores):
1st iteration core fails: 1, 4, 10
2nd iteration core fails: 2, 4
3rd iteration core fails: 1, 7, 8, 9
_still running_

Questions...
1) Does this mean that my cores are faulty? Should I restart all my tests again with PBO "disabled" / absolute stock?
2) Should I be aiming for zero rounding errors?
3) Do I have to use positive voltage offset to stabilize some cores?
4) Can I run other programs like web browser while running CoreCycler?
5) At what level should I be considering RMA again? (this is my 2nd chip)

As for normal day-to-day usage, it is mostly stable but nowhere near as stable as my previous i7-6700K. I get occasional BSODs out of the blue, and recently more and more often while rendering videos in Premiere.
No WHEA errors at all and no restarts during Cinebench (this is how I tested previously).

My reason to explore CoreCycler is that I do hope to achieve absolute stability (no BSODs at all). 
Thanks for reading and I apologize for any noob questions which might have been answered before.
Cheers.


----------



## Deepcuts

The question is: do you get rounding errors with PBO disabled?
For my CPU, rounding errors are the result of too drastic a negative curve offset. Most cores take -28, some -22, some -18, while 2 cores can only take -4.
It takes quite a long time to test the curve, so be patient.
My money is on too much negative offset for your cores. That is, if it does not throw errors at stock without PBO. If it does, RMA.


----------



## Imraneo

Alright, I've turned off PBO completely and so far here are my results:
1st iteration: 1
2nd iteration: <all pass>
3rd iteration: 2, 5
4th iteration: <all pass>

So, again, how can I trust these results? They are not repeatable over multiple iterations. I guess I should run them overnight, gather all the failed cores (so far, they're 1, 2, 5), use a positive offset on those and then test until I get everything perfect. After that I will play with the other 9 cores and see how low they can go on the curve.
Or perhaps I should RMA once again. Do you think AMD will accept these failures as genuine? After all, it's Prime95..
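
One way to do the overnight bookkeeping is to tally, per core, how many iterations it failed in. The following is only a minimal Python sketch using the four PBO-off iterations listed above as input; the function name `summarize_failures` is made up for illustration and nothing here is part of CoreCycler itself.

```python
from collections import Counter

# Failed cores per iteration, copied from the PBO-off run above.
# An empty list means every core passed in that iteration.
iterations = [[1], [], [2, 5], []]


def summarize_failures(iterations):
    """Return a Counter mapping core number -> number of iterations it failed in."""
    counts = Counter()
    for failed in iterations:
        # set() guards against the same core being listed twice in one iteration
        counts.update(set(failed))
    return counts


counts = summarize_failures(iterations)
suspects = sorted(counts)

print("Cores that failed at least once:", suspects)
for core in suspects:
    print(f"core {core}: failed in {counts[core]} of {len(iterations)} iterations")
```

A core that shows up again and again over a long run is a stronger candidate for a positive offset than one that failed a single time, but the tally alone cannot tell a weak core from unstable RAM.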


----------



## Deepcuts

Imraneo said:


> Alright, I've turned off PBO completely and so far here are my results:
> 1st iteration: 1
> 2nd iteration: <all pass>
> 3rd iteration: 2, 5
> 4th iteration: <all pass>
> 
> So, again, how can I trust these results? They are not repeatable over multiple iterations. I guess I should run them overnight, gather all the failed cores (so far, they're 1, 2, 5), use a positive offset on those and then test until I get everything perfect. After that I will play with the other 9 cores and see how low they can go on the curve.
> Or perhaps I should RMA once again. Do you think AMD will accept these failures as genuine? After all, it's Prime95..


Can you test with a different RAM KIT?
Or maybe test with your RAM set at stock, without XMP or any other tweaks?
Just clear CMOS, don't set anything in BIOS and test.


----------



## Imraneo

Deepcuts said:


> Can you test with a different RAM KIT?
> Or maybe test with your RAM set at stock, without XMP or any other tweaks?
> Just clear CMOS, don't set anything in BIOS and test.


I've got an update (yes, it's related to RAM).

*I read somewhere that the FFT size matters when testing. Default settings "Huge" focuses on RAM and "Small" focuses on CPU. <-- Is this true?*

I ran a few iterations of Small without errors (no PBO still).
I turned off XMP and I am re-running Huge right now. 3rd iteration now, all clean.

So in summary, my theory is that it might actually be my RAM which is unstable. This is also why I tend to have errors on random cores. I think I'm getting somewhere. I plan to over-volt my RAM and/or try different speeds with XMP. But first, I will disable my XMP and focus testing on my CPU and PBO setting.

Honestly, I'm a bit relieved that it could be a RAM issue as I can compromise on RAM speed (or get new sticks in future).


----------



## LuchoU

There are some good RAM-oriented tools such as Karhu (you can google it). I had to buy a license, but it is cheap and I wanted to make sure my RAM was stable before I started testing the CPU for errors. Also make sure to use the latest version of CoreCycler. An old version included a beta Prime executable that had a bug causing random rounding errors on stable CPUs (it was mentioned in the changelog of one of the CoreCycler versions).
For me CoreCycler helped to stabilize one of my bad cores, but there was another core that never failed with CoreCycler even when running 12+ hours, yet it was failing with CPU-demanding games such as RDR2, so I had to use that game to debug that core and had to add +CO to it.
In summary, from my perspective CoreCycler is a good tool to debug your CPU cores, but it may not detect some borderline cases such as the one I was experiencing. Those cases require running real-world applications and looking for WHEAs. In Event Viewer you can look for the APIC ID that failed, which indicates the logical CPU thread; you can then link that to the corresponding core and add +CO to that one.
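
The APIC ID to core step can be sketched in a couple of lines of Python. This is only a sketch under a loud assumption: SMT is on (2 threads per core) and the APIC IDs are numbered contiguously, so that IDs 0 and 1 belong to the first core, 2 and 3 to the second, and so on. That may not hold on every model (parts with fewer than 8 active cores per CCD can have gaps in the numbering), so cross-check against a monitoring tool before changing an offset. The function name is hypothetical.

```python
def apic_id_to_core(apic_id, threads_per_core=2):
    """Map an APIC ID to (zero-based core index, SMT thread index).

    Assumes contiguous APIC IDs with `threads_per_core` logical
    processors per physical core. This is an assumption, not a rule.
    """
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    return divmod(apic_id, threads_per_core)


# Example: look up a few APIC IDs as they might appear in a WHEA event.
for apic_id in (0, 1, 9, 14):
    core, thread = apic_id_to_core(apic_id)
    print(f"APIC ID {apic_id} -> core {core}, thread {thread}")
```

The core index here counts from 0; whether your BIOS or test tool counts cores from 0 or from 1 is something to check before applying the offset.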

Sent from my SM-G960U1 using Tapatalk


----------



## Imraneo

LuchoU said:


> There are some good RAM-oriented tools such as Karhu (you can google it). I had to buy a license, but it is cheap and I wanted to make sure my RAM was stable before I started testing the CPU for errors. Also make sure to use the latest version of CoreCycler. An old version included a beta Prime executable that had a bug causing random rounding errors on stable CPUs (it was mentioned in the changelog of one of the CoreCycler versions).
> For me CoreCycler helped to stabilize one of my bad cores, but there was another core that never failed with CoreCycler even when running 12+ hours, yet it was failing with CPU-demanding games such as RDR2, so I had to use that game to debug that core and had to add +CO to it.
> In summary, from my perspective CoreCycler is a good tool to debug your CPU cores, but it may not detect some borderline cases such as the one I was experiencing. Those cases require running real-world applications and looking for WHEAs. In Event Viewer you can look for the APIC ID that failed, which indicates the logical CPU thread; you can then link that to the corresponding core and add +CO to that one.
> 
> Sent from my SM-G960U1 using Tapatalk


Thanks for sharing.
It seems that all the time spent previously with Cinebench and idling continuously overnight (my test tools.. haha) didn't go to waste. My PBO settings are all fine. I've further confirmed it is indeed my RAM at fault. Running at 3533 reduced the frequency of errors and then bumping 0.05V further reduced it again.
XMP off gave me perfect results. I'm not so keen on over-volting, so I just decided to run XMP at 3333MHz. It's all good. I also tried the OCCT tool very briefly and it came up clean (as opposed to many continuous errors previously).
I hope to put all this to rest, and hopefully no more random BSODs.


----------



## NDS322

Can someone answer me about that AGESA 1.2.0.3 Patch A still get WHEA Error Code 19 with 4000MHz 1:1 of RAM ?


----------



## mongoled

Can someone answer me about that when the rain comes will I still feel cold ?


----------



## LuchoU

Is 4000MHz guaranteed? I believe it's still a lottery and not easy to achieve, I'm just going for 3600MHz 1:1


----------



## Daylight_Invader

NDS322 said:


> Can someone answer me about that AGESA 1.2.0.3 Patch A still get WHEA Error Code 19 with 4000MHz 1:1 of RAM ?


Agree with the other post here. I think 4000 is really expecting something special, and the silicon lottery is generally unkind at those speeds.

AMD really only supports 3200-3600 for the 5000 series. Anything over this is really down to the silicon gods. Many people can hit 3800, but over this threshold seems to be where the majority start to fall down. I also think this is the land of very diminishing returns.


----------



## JohnnyFlash

Daylight_Invader said:


> Agree with the other post here. I think 4000 is really expecting something special, and the silicon lottery is generally unkind at those speeds.
> 
> AMD really only supports 3200-3600 for the 5000 series. Anything over this is really down to the silicon gods. Many people can hit 3800, but over this threshold seems to be where the majority start to fall down. *I also think this is the land of very diminishing returns.*


The upgrade in cache design makes the IF speed less important than Zen2+. 

I got mine to boot 1:1 at 4000 using CL18 and benching that against 3600 1:1 CL18 was 1-2% better on average; both in gaming and rendering. It's not worth the extra heat, power and potential instability IMO.


----------



## rob-tech

JohnnyFlash said:


> The upgrade in cache design makes the IF speed less important than Zen2+.
> 
> I got mine to boot 1:1 at 4000 using CL18 and benching that against 3600 1:1 CL18 was 1-2% better on average; both in gaming and rendering. It's not worth the extra heat, power and potential instability IMO.


This is true, aiming for ultra high clocks with Zen 3 doesn't really make sense and the return on investment is quite poor. Something like 3200 CL14 will get about 97% of the performance for most scenarios people care about and will have the added benefit of being easier to run with full stability.

Right now the overall RAM latency matters more, and there are benchmarks that even show 3200 CL14 slightly outperforming or equaling 3600 CL16 in latency-sensitive workloads such as games, whereas with Zen 2 there were some noticeable gains going to 3600 CL16.
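
A quick way to see why 3200 CL14 and 3600 CL16 end up so close is to convert CAS latency from clock cycles to nanoseconds. The sketch below is plain arithmetic in Python: one memory clock lasts 2000 / data rate (in MT/s) nanoseconds, because DDR transfers twice per clock. It only covers CAS latency, not the full measured memory latency, and the function name is made up for illustration.

```python
def first_word_latency_ns(data_rate_mts, cas_latency):
    """CAS latency in nanoseconds for a DDR kit.

    data_rate_mts: advertised speed, e.g. 3600 for DDR4-3600
    cas_latency:   CL in memory clock cycles
    """
    # Memory clock (MHz) is half the data rate, so one cycle is 2000 / rate ns.
    return cas_latency * 2000 / data_rate_mts


# Kits mentioned in this discussion.
kits = [(3200, 14), (3600, 16), (3600, 18), (4000, 18)]
for rate, cl in kits:
    print(f"DDR4-{rate} CL{cl}: {first_word_latency_ns(rate, cl):.2f} ns")
```

On these numbers 3200 CL14 works out to 8.75 ns and 3600 CL16 to roughly 8.89 ns, so the slower kit with tighter timings is not behind on CAS latency at all.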


----------



## Catscratch

rob-tech said:


> This is true, aiming for ultra high clocks with Zen 3 doesn't really make sense and the return on investment is quite poor. Something like 3200 CL14 will get about 97% of the performance for most scenarios people care about and will have the added benefit of being easier to run with full stability.
> 
> Right now the overall RAM latency matters more, and there are benchmarks that even show 3200 CL14 slightly outperforming or equaling 3600 CL16 in latency-sensitive workloads such as games, whereas with Zen 2 there were some noticeable gains going to 3600 CL16.


Then 3600 cl14 is like the ideal ram to get. Something like this F4-3600C14D-16GTZNB-G.SKILL International Enterprise Co., Ltd.


----------



## JohnnyFlash

Catscratch said:


> Then 3600 cl14 is like the ideal ram to get. Something like this F4-3600C14D-16GTZNB-G.SKILL International Enterprise Co., Ltd.


My set can do CL14 at 1.45v and there was no noticeable difference, just more heat in the case. I would say 3600 CL16 is the sweet spot.


----------



## Imraneo

I have the 3600 CL16 8GBx4:

F4-3600C16D-16GTZN - G.SKILL International Enterprise Co., Ltd. (www.gskill.com)
Trident Z Neo DDR4-3600 CL16-16-16-36 1.35V 16GB (2x8GB)

I can't seem to run at 3600MHz. I have to go down to 3333MHz for stability. I have a few queries..

1) Do you think I should RMA it?
2) Do you think it's the IF's fault for not being able to run at 1800, and not the RAM?
3) Does it really matter in terms of speed if I carry on with 3333MHz?
4) I've yet to test with only 2 sticks. Does running 4 sticks make it harder to achieve 3600MHz?

Thanks for reading.
Cheers!


----------



## JohnnyFlash

Imraneo said:


> 1) Do you think I should RMA it? *Not yet.*
> 2) Do you think its the IF fault for not being able to run at 1800 and not the RAM? *Have you tested the ram at 3600 and the IF at stock?*
> 3) Does it really matter in terms of speed if I carry on with 3333Mhz? *Not much.*
> 4) I've yet to run 2 sticks at a go to test. Does running 4 sticks make it harder to achieve 3600Mhz? *If those sticks are dual ranked, yes. I couldn't find the answer on the product page.*
> 
> Thanks for reading.
> Cheers!


Leave the IF at stock and just test the ram, then also try only 2 sticks.


----------



## 1devomer

Imraneo said:


> 1) Do you think I should RMA it?


If you decide to RMA the chip, you should hold on until AMD rolls out the new CPU stepping.
Keep it in mind.


----------



## Imraneo

1devomer said:


> If you decide to RMA the chip, you should hold on until AMD roll out the new cpu stepping.
> Keep it in mind.


Sorry, I didn't understand. What do new CPU steppings have to do with my potentially faulty RAM?


----------



## 1devomer

Imraneo said:


> Sorry I didn't understand. What has new CPU steppings got to do with my potentially faulty RAMs?


If you need to send the chip back for RMA, you should wait until AMD releases the new stepping for the Ryzen 5000 series.

Some current BIOS updates already list the new CPU stepping revision in their release notes.
If I had to send back a CPU, I would want to get the new B2 stepping instead of the current B0 stepping.

You never know, maybe AMD actually fixed something and the new CPU stepping runs better!?


----------



## Imraneo

1devomer said:


> If you need to send back the chip to RMA.
> You should wait that AMD release the new stepping for the Ryzen 5000 series.
> 
> Some current bios update, already include the new cpu stepping revision, in the release notes.
> If i had to send back a cpu, i would like to get the new stepping B2, instead of the current stepping B0.
> 
> You never know, maybe AMD actually fixed something and the new cpu stepping runs better!?


I see.
I'm referring to my RAM issue, not CPU. Thus I got confused


----------



## 1devomer

Imraneo said:


> I see.
> I'm referring to my RAM issue, not CPU. Thus I got confused


My bad, apologies. Yeah, I was referring to the CPU. 😶
Still, it is good advice for anyone looking to RMA their CPU.

About your memory kit, according to this:

G.SKILL TridentZ NEO DDR4-3600 16GB Dual-Channel Memory Kit Review (www.tweaktown.com)
G.SKILL's TridentZ NEO DDR4-3600 16GB dual-channel RAM kit goes under the chopping block in our AMD and Intel rigs.

G.Skill DDR4 3600 CL14 TridentZ Neo 16GB Review

The kit should be made up of 4 single-sided, single-rank sticks with Samsung B-die memory chips.
It should not be too stressful for the IMC itself, but rather stressful from an electrical point of view, since all 4 slots need to be powered.
So, I would start with 2T command rate, set the kit to 16-16-16-36-50, tRFC 300/350.
ProcODT 36/42 ohms, geardown disabled, powerdown disabled, rest auto, RAM voltage 1.45v, IF 1:1 with RAM.
SoC voltage starting from 1.1v to 1.15v (1.175v max), rest auto.

Disable fast boot in the BIOS boot options.
Start from 3333MHz and go up one memory step at each reboot until reaching 3600MHz, to allow the motherboard to train the memory properly.
If it is really necessary, force the PCIe subsystem to Gen 3 in the BIOS.

The memory kit itself should be good. I would test one stick at a time to look for faulty RAM chips.
The built-in Windows memory test, done at boot, is pretty good at checking RAM sticks.
Just spend some time changing the default settings to something a bit more stressful.
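
To make the "one memory step per reboot" idea concrete, here is a small Python sketch that lists the steps from 3333 to 3600 and the Infinity Fabric clock that a 1:1 ratio implies at each one. The step list is an assumption (the roughly 66 MT/s increments that AM4 BIOS menus commonly offer); your board may show different values. The 1:1 relation itself is just FCLK = data rate / 2, e.g. 1800 for DDR4-3600.

```python
# Assumed BIOS memory steps between 3333 and 3600 MT/s (may differ per board).
MEMORY_STEPS_MTS = [3333, 3400, 3466, 3533, 3600]


def fclk_for_1to1(data_rate_mts):
    """Infinity Fabric clock (MHz) needed for a 1:1 ratio with the memory clock."""
    # The memory clock is half the DDR data rate; 1:1 means FCLK matches it.
    return data_rate_mts / 2


for step, rate in enumerate(MEMORY_STEPS_MTS, start=1):
    print(f"reboot {step}: DDR4-{rate} -> FCLK {fclk_for_1to1(rate):.1f} MHz at 1:1")
```

If one step refuses to train or starts throwing errors, the previous step is the last known-good point to test more thoroughly.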


----------



## Imraneo

1devomer said:


> My bad, apologies, yeah i were referring to the cpu. 😶
> Tho, it is a good advice still, to whom to seek to RMA its cpu.
> 
> About your memory kit, according to this:
> 
> G.SKILL TridentZ NEO DDR4-3600 16GB Dual-Channel Memory Kit Review (www.tweaktown.com)
> G.SKILL's TridentZ NEO DDR4-3600 16GB dual-channel RAM kit goes under the chopping block in our AMD and Intel rigs.
> 
> G.Skill DDR4 3600 CL14 TridentZ Neo 16GB Review
> 
> The kit should be composed by 4 sticks of Samsung B-die memory chip, single side, single rank memory kit.
> It should not be too stressful for the IMC itself, but rather stressful from an electrical point of view, since one need to power 4 slots.
> So, i would start with 2T command rate, set the kit to 16-16-16-36-50, Trfc 300/350.
> ProcODT 36/42 ohms, geardown disable, powerdown disable, rest auto, ram voltage 1.45v, IF1:1RAM.
> Soc voltage starting from 1.1v to 1.15v  (1.175v max), rest auto.
> 
> Disable the fastboot, in the bios boot options.
> Start from 3333Mhz, and jump one memory step at each reboot, until reaching 3600Mhz.
> To allow the motherboard, training the memory properly.
> If it is really necessary, force pci-e 3rd Gen subsystem in the bios.
> 
> The memory kit itself should be good, i would test one stick at the time, to look for faulty ram chips.
> The built-in windows memory test, done at boot, is pretty good at checking ram sticks.
> Just spend some time changing the default settings, with something a bit more stressful.


Thanks for the insight!
Honestly, I've not read much into RAM. I just expected to run the default XMP profile and that's it.
I've already RMA'd my CPU and PSU for this new build and will take this slow for now. 3333MHz works just fine if I decide to call it a day.


----------



## kairi_zeroblade

Imraneo said:


> Thanks for the insight!
> Honestly, I've not read much into RAM. Just expected run the default XMP profile and thats it.
> I've already RMA my CPU and PSU for this new build and will take this slow for now. 3333Mhz works just fine if I intend to call it a day.


In your case you need to add more voltage to your IMC (VSOC). The more sticks you put in, the more stress you put on the IMC and the more voltage is needed to keep those 4 sticks in sync. I tried the same kit, the 2x16GB one, for a client and didn't even have issues loading DOCP with Asus's default settings.


----------



## Imraneo

Thanks guys.
Firstly, my sticks are actually CL16-19-19-19-39. Sorry for the wrong info earlier.
I started with adding more dram voltage. Currently at 1.45V. Upon saving BIOS settings, my PC rebooted twice by itself before finally loading Windows. Not sure if it was part of a selftest mechanism.. will have to test more on that.
However, in Windows the 3600 speed is confirmed and seems stable. Just wondering if this is safe? Honestly, I don't really monitor RAM temps. Will the extra 266MHz make much of a difference?


----------



## JohnnyFlash

Imraneo said:


> Thanks guys.
> Firstly, my sticks are actually CL16-19-19-19-39. Sorry for the wrong info earlier.
> I started with adding more dram voltage. Currently at 1.45V. Upon saving BIOS settings, my PC rebooted twice by itself before finally loading Windows. Not sure if it was part of a selftest mechanism.. will have to test more on that.
> However in Windows, the 3600 speed is confirmed and seems stable. Just wondering if this is safe? Honestly I don't really monitor RAM temps. Will the 266Mhz make much of difference?


Run benchmarks for what you use and see for yourself. Run one set at 3600 and another at 3300, don't change anything else from the 3600 settings.


----------



## Huseyinbaykal

Got a chance to replace my 5950x today. I need to decide whether I want a new 5950x (2115SUS, 2017SUS or 2119SUS) or an Intel 11900K/KF or 10900KF. What should I do? Are the CPUs coded 2115SUS, 2017SUS or 2119SUS newer ones that don't have problems? Need your opinions, guys.


----------



## JohnnyFlash

Huseyinbaykal said:


> Got a chance to replace my 5950x today. I need to decide whether I want a new 5950x (2115SUS, 2017SUS or 2119SUS) or an Intel 11900K/KF or 10900KF. What should I do? Are the CPUs coded 2115SUS, 2017SUS or 2119SUS newer ones that don't have problems? Need your opinions, guys.


"Problems" aren't code specific. The only way to know, is to test.

That said, 95% of issues come from the stock curve being too aggressive for the chip, which can be corrected if you're willing to put the time in.


----------



## Huseyinbaykal

Got the 2119SUS, will test later. I have the time to put in but can't find any guide or any complete info. I've read the whole forum about my motherboard and Ryzen. I will go over it again and again.


----------



## JohnnyFlash

Huseyinbaykal said:


> Got the 2119sus will test later. I have the time to put in but cant find any guide or any complate info. Read all the forum about my mb and ryzen. I will go over and over again


I would start by leaving everything stock, including the ram, and running memtest overnight. Then the next night do the same and run core cycler overnight. *Make sure you trust your cooling before doing this.*

If both pass, then set the memory profile and repeat. If that passes, then you can start playing with the CPU settings and timings. If you want an optimized and fully stable system, it takes about a week before it's usable if done right. This is true of both Intel and AMD. Too many people go by "it doesn't crash", which means nothing in reality.

It was 10 days from when I got my last component to when I was using the system. I know what the max core speed is for each core at voltages between 1.00 and 1.30, so there's no guesswork involved as to what is stable. I just have to look at the boost and see if it lines up. But getting there takes time and effort.


----------



## Huseyinbaykal

Thanks for the great help, mate. I will do what you recommend. And maybe after that you can tell me the other steps. Thanks again 👍 @JohnnyFlash


----------



## 1devomer

Huseyinbaykal said:


> Got the 2119sus will test later. I have the time to put in but cant find any guide or any complate info. Read all the forum about my mb and ryzen. I will go over and over again





JohnnyFlash said:


> "Problems" aren't code specific. The only way to know, is to test.
> 
> That said, 95% of issues come from the stock curve being too aggressive for the chip, which can be corrected if you're willing to put the time in.


Well, avoiding early 5950x samples that were released at launch is *strongly* advised.

I would also remove the quotes and simply state _problems_.
A simple English Google search will return different results when looking for:
-10900k issues.
-5950x issues.

Looking at the Google search results, I think we can agree that the word _problems_ doesn't need the quotes.


----------



## JohnnyFlash

1devomer said:


> Well, avoiding early 5950x samples that were released at launch is *strongly* advised.
> 
> I would also remove the quotes and simply state _problems_.
> A simple English Google search will return different results when looking for:
> -10900k issues.
> -5950x issues.
> 
> Looking at the Google search results, I think we can agree that the word _problems_ doesn't need the quotes.


Yes, the early samples were the most affected, but it's very unlikely to get one now. As far as what the problems were, it was mainly that the voltage curve was too conservative, which could be fixed with a positive core offset. If you look at a lot of the threads that come up in that google search, my posts are in them on this and other forums.

Is there a difference between a chip needing a positive offset on a couple of cores to be stable and sending it back and getting one that boosts 1-2 bins lower? It's the same thing; they just adjusted the curves. True problems, where the chip was just not workable no matter what, fell in the normal defective range. Binning always improves with time, so the newest chips have the best chance of hitting the top clocks, but that's not guaranteed.


----------



## Huseyinbaykal

Getting 1900 FCLK / 3800MHz CL16 without any errors or restarts. Hitting 5050MHz with the stock settings of the Dark Hero. I couldn't be happier. Hope I can tune the RAM better.


----------



## 1devomer

JohnnyFlash said:


> Yes, the early samples were the most affected, but it's very unlikely to get one now. As far as what the problems were, it was mainly that the voltage curve was too conservative, which could be fixed with a positive core offset. If you look at a lot of the threads that come up in that google search, my posts are in them on this and other forums.
> 
> Is there a difference between a chip needing a positive offset on a couple cores to be stable or sending it back and getting one that boosts 1-2 bins lower? It's the same thing, they just adjusted the curves. True problems where the chip was just not workable no matter what fell in the normal defective range. Binning always improves with time, so the newest chips have the best chance of hitting the top clocks, but that's not guaranteed.


Deceptive consumer practices are nothing new in AMD's company history.
And sometimes I wonder why I keep remembering the facts instead of simply forgetting everything.

🤷‍♀️


----------



## Huseyinbaykal

Booting into Windows with 4000MHz / 2000 FCLK. Cinebench etc. works, but I'm getting WHEA errors. Tried increasing voltages but nothing changed. I will stick to 3800:1900. Also, 4000:2000 didn't improve latency.


----------



## tsiros

Bought a 5950x within a month of it being available.
I get cache hierarchy errors when basically idling, no matter the motherboard settings (even pbo explicitly disabled and jedec ram settings).
My first amd cpu since 2015, when i went from an fx8150 to an i3 6320
This is the first cpu i ever had with such bad behavior. I am thoroughly disappointed.
Contacted the shop from where I bought it. I need their response before I continue.


----------



## Huseyinbaykal

tsiros said:


> Bought a 5950x within a month of it being available.
> I get cache hierarchy errors when basically idling, no matter the motherboard settings (even pbo explicitly disabled and jedec ram settings).
> My first amd cpu since 2015, when i went from an fx8150 to an i3 6320
> This is the first cpu i ever had with such bad behavior. I am thoroughly disappointed.
> Contacted the shop from where I bought it. I need their response before I continue.


Just return it and get a new one. It's like night and day. I tried disabling everything with the old one but nothing changed. The new CPU is still hitting 5050MHz with 1900 FCLK and has zero problems. Nothing disabled.


----------



## rob-tech

Does anyone know how to identify the stepping by looking at the CPU box, without opening anything? I know the production date code can be seen.

The reason that I ask is because there is a new B2 stepping launched and I want to get the latest revision when I swap out my 3950x.


----------



## kairi_zeroblade

rob-tech said:


> Does anyone know how to identify the stepping by looking at the CPU box, without opening anything? I know the production date code can be seen.
> 
> The reason that I ask is because there is a new B2 stepping launched and I want to get the latest revision when I swap out my 3950x.


It's not out yet.

Regarding your first query, the batch is imprinted on the processor IHS.


----------



## GRABibus

It's incredible that people still have those weird issues....


----------



## Hueristic

GRABibus said:


> It's incredible that people still have those weird issues....


What people?


----------



## GRABibus

Hueristic said:


> What people?


People who are still returning their CPUs because of WHEAs and reboots, 8 months after launch.


----------



## Hueristic

GRABibus said:


> People who are still returning their CPUs because of WHEAs and reboots, 8 months after launch.


Point these people out because I don't see them.

This thread has morphed into an OC thread.


----------



## Sleepycat

rob-tech said:


> Does anyone know how to identify the stepping by looking at the CPU box, without opening anything? I know the production date code can be seen.
> 
> The reason that I ask is because there is a new B2 stepping launched and I want to get the latest revision when I swap out my 3950x.


Look at the actual CPU IHS, which is on prominent display through the transparent plastic window on the side of the box, as shown in the pic below:

View attachment 2518474

To interpret the batch code, take for example the one in the Techspot photo below:
2038SUS = 2020, 38th week, SuzhoU (location of assembly), Saratoga (location of manufacture). So if you want a later batch, you'd be looking for something starting with a code 21 and a late week number. Today's date is week 59 for example.
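
The same reading of the code can be sketched in a few lines of Python. This assumes the layout described above (two-digit year, two-digit week, then location letters); the class and function names are made up for illustration and this is not an official AMD format description.

```python
from dataclasses import dataclass


@dataclass
class BatchCode:
    year: int          # four-digit year, e.g. 2020
    week: int          # week of that year, 1..53
    site_letters: str  # trailing letters, e.g. "SUS" (locations, as described above)


def parse_batch_code(code: str) -> BatchCode:
    """Split a code such as '2038SUS' into year, week and site letters."""
    code = code.strip().upper()
    if len(code) < 5 or not code[:4].isdigit():
        raise ValueError(f"unexpected batch code format: {code!r}")
    year = 2000 + int(code[:2])  # assumes a 20xx production year
    week = int(code[2:4])
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {code!r}")
    return BatchCode(year, week, code[4:])


if __name__ == "__main__":
    print(parse_batch_code("2038SUS"))
```

Comparing the `(year, week)` pair of two boxes is then enough to pick the later batch.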


----------



## rob-tech

Looks like there might be one final AM4 CPU lineup coming out with AMD V-Cache technology as a stop-gap for Alder Lake, with a ~15% performance uplift in gaming, so this might not be the final/most powerful AM4 processor after all.


Sleepycat said:


> Look at the actual CPU IHS which is on prominent display on the box's side transparent plastic window as shown in the pic below:
> View attachment 2518474
> 
> 
> To interpret the batch code, it is for example in the Techspot photo below:
> 2038SUS = 2020, 38th week, SuzhoU (location of assembly), Saratoga (location of manufacture). So if you want a later batch, you'd be looking for something starting with a code 21 and a late week number. Today's date is week 59 for example.
> 
> 
> View attachment 2518476


Thanks, that's really helpful. 

I have decided to hold out a bit longer, with all the talk of 3D V-Cache and a 15% bump in IPC being added to high-end Zen 3 CPUs between the end of the year and the beginning of 2022. I don't believe that AM4 is finished yet, given the supposed delay of AM5 toward the end of 2022.


----------



## LuchoU

It's very sad to see this lack of empathy in some individuals. The WHEA issue in some early 5**0x CPU batches was, and is, real on 100% stock systems, because the chips could not sustain the aggressive clocks with the default voltage curve. A lot of people suffered from this issue on new systems, me included, and there were several threads around the Internet with many pages of people trying to find a solution. If people are entering the game late and getting new batches that don't show the issue, possibly because it has been solved in current batches, that does not erase the original issue with early batches or the people affected by them. AMD never acknowledged this and never issued a statement offering some sort of express RMA, cross RMA or other help (money back?) that could have alleviated the situation. I don't side with AMD, Intel, Nvidia or Apple; it doesn't matter to me. They are companies, and as companies they need your money to make a profit, but what I expect from these giant companies is good customer service and dedicated communication channels for this kind of issue.


----------



## tsiros

LuchoU said:


> because the chips could not sustain the aggressive clocks with the default voltage curve,


My 5950x NEVER so much as _hiccuped_ under load, PBO on or off. 

Every time it has happened, the cpu was _idling_, at most watching a youtube.

I've used it in many different combinations of PBO on/off, ram at JEDEC/XMP, I even tried +50 mV to DRAM and +60 mV to CPU. I have found no configuration that is stable.
I have an asus b550-m, 2x32GB of 3600 ram, predator somethingsomething and a seasonic 550W platinum.

I am not even sure _if_ it is the cpu, but I can not think what else it can be. 

As I said, it has never, not once, failed when stress testing it to hell and back.


----------



## ENTERPRISE

Let's keep it friendly chaps


----------



## LuchoU

tsiros said:


> My 5950x NEVER so much as _hiccuped_ under load, PBO on or off.
> 
> Every time it has happened, the cpu was _idling_, at most watching a youtube.
> 
> I've used it in many different combinations of PBO on/off, ram at JEDEC/XMP, I even tried +50 mV to DRAM and +60 mV to CPU. I have found no configuration that is stable.
> I have an asus b550-m, 2x32GB of 3600 ram, predator somethingsomething and a seasonic 550W platinum.
> 
> I am not even sure _if_ it is the cpu, but I can not think what else it can be.
> 
> As I said, it has never, not once, failed when stress testing it to hell and back.


Hi tsiros, yes, I forgot to mention that I also saw cases like yours, people with this issue at idle, and IMO those were some of the worst scenarios. Whether under load with all the cores jumping, for instance while playing a game, or like you said, just at the Windows desktop, the issue was totally random, so it was very difficult to resolve. In both cases these are bad batches. I got "lucky" by playing with the curves and adding + values, which means extra voltage, but for other guys there was no solution at all, even when adding extra voltage. I'm now stable, but with +5 on both of my best cores at stock. I'm waiting for AMD to release the new stepping, and then I will probably start the RMA process.

Good luck with yours!


----------



## Pictus

tsiros said:


> My 5950x NEVER so much as _hiccuped_ under load, PBO on or off.
> 
> Every time it has happened, the cpu was _idling_, at most watching a youtube.
> 
> I've used it in many different combinations of PBO on/off, ram at JEDEC/XMP, I even tried +50 mV to DRAM and +60 mV to CPU. I have found no configuration that is stable.
> I have an asus b550-m, 2x32GB of 3600 ram, predator somethingsomething and a seasonic 550W platinum.
> 
> I am not even sure _if_ it is the cpu, but I can not think what else it can be.
> 
> As I said, it has never, not once, failed when stress testing it to hell and back.


Make sure the BIOS is updated (version 2407 or version 2404), and the chipset driver as well:

AMD Ryzen Chipset Drivers (4.11.15.342) Download (www.techpowerup.com)

In the BIOS try:
*Global C-state Control = Disabled
Power Supply Idle Control = Typical Current Idle*

These should fix your idle problems, but you may also set the "Minimum processor state" in the Windows power plan to no less than 10%.

And in the BIOS I also like:

Download & Install ARMOURY CRATE app = Disabled
ErP Ready = Enable S4+S5
SB Clock Spread Spectrum = Disabled
VDDCR CPU Switching Frequency = 350
VDDCR CPU Power Phase Control = Extreme
VDDCR SOC Switching Frequency = 350
VDDCR SOC Power Phase Control = Extreme
PCIEX16_1 Mode = GEN3 or GEN4(depends on the GPU), but NOT AUTO
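
The "Minimum processor state" value mentioned above can also be set from a command prompt with `powercfg` instead of clicking through the power options. A minimal sketch that builds the commands; the function name and the `--apply` switch are made up for this example, and the commands are only executed on Windows when that switch is passed:

```python
import subprocess
import sys


def min_processor_state_commands(percent: int = 10) -> list[list[str]]:
    """Build powercfg commands that set 'Minimum processor state'.

    SCHEME_CURRENT, SUB_PROCESSOR and PROCTHROTTLEMIN are powercfg aliases
    (they can be listed with `powercfg /aliases`).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    value = str(percent)
    return [
        # Plugged-in (AC) value for the active power plan.
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", value],
        # On-battery (DC) value; harmless on a desktop.
        ["powercfg", "/setdcvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", value],
        # Re-apply the plan so the new values take effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


if __name__ == "__main__":
    for cmd in min_processor_state_commands(10):
        print(" ".join(cmd))
        # Only touch the system when explicitly asked to, and only on Windows.
        if sys.platform == "win32" and "--apply" in sys.argv:
            subprocess.run(cmd, check=True)
```

Run without arguments it only prints the three commands, so they can be reviewed or pasted by hand.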


----------



## Anthos

tsiros said:


> My 5950x NEVER so much as _hiccuped_ under load, PBO on or off.
> 
> Every time it has happened, the cpu was _idling_, at most watching a youtube.
> 
> I've used it in many different combinations of PBO on/off, ram at JEDEC/XMP, I even tried +50 mV to DRAM and +60 mV to CPU. I have found no configuration that is stable.
> I have an asus b550-m, 2x32GB of 3600 ram, predator somethingsomething and a seasonic 550W platinum.
> 
> I am not even sure _if_ it is the cpu, but I can not think what else it can be.
> 
> As I said, it has never, not once, failed when stress testing it to hell and back.


When you are stress testing and all the cores are working, they obviously produce a lot of heat and that thermally limits how high they can clock. When you are doing something like watching youtube, the heat is low as most cores are idling so the one or two cores that are active try to reach a much higher frequency.


----------



## Blair Maynard

Hueristic said:


> What people?


Me. I got my 5950x about six weeks ago and I am having these same issues. Black screen crashes in idle (though my motherboard seems to think it is a RAM issue). I got it on Amazon, so maybe it is an old returned one that was being resold. I will check the number on the box when I get home from work this evening.


----------



## Hueristic

Blair Maynard said:


> Me. I got my 5950x about six weeks ago and I am having these same issues. Black screen crashes in idle (though my motherboard seems to think it is a RAM issue). I got it on Amazon, so maybe it is an old returned one that was being resold. I will check the number on the box when I get home from work this evening.


"six weeks ago " and registered today, tell me more, as you said maybe old stock or a return they resold.

Keep us informed and good luck.

Fortunately you have this entire thread of info to help you.


----------



## LuchoU

Blair Maynard said:


> Me. I got my 5950x about six weeks ago and I am having these same issues. Black screen crashes in idle (though my motherboard seems to think it is a RAM issue). I got it on Amazon, so maybe it is an old returned one that was being resold. I will check the number on the box when I get home from work this evening.


The best option would be to check the batch info, but that is printed on the CPU itself, so you will need to remove the cooler. It is probably from 2020 or early 2021.


----------



## Luckbad

I bought a 5900x and Asus Dark Hero last weekend from Micro Center and have had a horrible time with it.

I upgraded from a 9900k. Full water cooling loop, great temps, B-Die ram, 3090 Kingpin, 1000W Seasonic PSU.

I've been getting occasional random restarts, especially at idle or doing something simple like watching a YouTube video.

Tried all of the common advice, loosened ram timings, adjusted voltage, disabled C-states, etc.

I'm about two random restarts from returning everything and going back to the 9900k even though I can use the additional cores.


----------



## dansi

Luckbad said:


> I bought a 5900x and Asus Dark Hero last weekend from Micro Center and have had a horrible time with it.
> 
> I upgraded from a 9900k. Full water cooling loop, great temps, B-Die ram, 3090 Kingpin, 1000W Seasonic PSU.
> 
> I've been getting occasional random restarts, especially at idle or doing something simple like watching a YouTube video.
> 
> Tried all of the common advice, loosened ram timings, adjusted voltage, disabled C-states, etc.
> 
> I'm about two random restarts from returning everything and going back to the 9900k even though I can use the additional cores.


Random restarts are probably down to PBO Curve Optimizer boost issues. Have you made sure it is set to default?


----------



## Luckbad

Yep, tried default, tried adding voltage instead of subtracting, tried Dynamic OC Switcher on and off... all sorts of things.

The random restarts are very hard to reproduce or predict, but I've also been getting instability with various settings in things like Prime95 or IntelBurnTest.

It's been a very frustrating slog. I loved AMD back in the day (early-mid 2000s) but jumped ship when they couldn't keep up with Intel. I was excited that AMD finally came back with this generation, but it's just been a heap of disappointment.

It's been more than a decade since I've had to do this much manual tweaking just to try to achieve stability. XMP settings for my ram seem to be impossible to achieve (G.Skill DDR4 3600 CL15), while it was incredibly easy with the Intel platform.

Ugh.

I'm frustrated, and unfortunately no advice seems to be worth applying at this point. I have to decide to either try to exchange for another 5900x or 5950x or return the entire thing. Micro Center does 15 days for CPUs and motherboards, so I'd have about a week to figure out if a replacement is fully stable should I go that route.

I'd heard horror stories of AMD drivers and stability, but it seemed that was mostly a thing of the past as well as primarily with the GPUs. I figured I'd have minimal trouble given that I've been building my own PCs since the 90s. No such luck.


----------



## Blair Maynard

Luckbad said:


> I bought a 5900x and Asus Dark Hero last weekend from Micro Center and have had a horrible time with it.
> 
> I upgraded from a 9900k. Full water cooling loop, great temps, B-Die ram, 3090 Kingpin, 1000W Seasonic PSU.
> 
> I've been getting occasional random restarts, especially at idle or doing something simple like watching a YouTube video.
> 
> Tried all of the common advice, loosened ram timings, adjusted voltage, disabled C-states, etc.
> 
> I'm about two random restarts from returning everything and going back to the 9900k even though I can use the additional cores.


I too have the Asus ROG Crosshair VIII Dark Hero motherboard for my 5950x, I have new Crucial 2x32 3600 RAM (I think it is Samsung E die) in slots 2 and 4. I don't get blue screens or anything like a normal Windows crash. The monitor screen just goes dark, some fans spin up and some fans stop, the amber light comes on indicating a RAM error, all the motherboard LED lights stay on. The only thing that works on the computer is the power supply shut off switch. This usually happens when I leave the computer on for a long period of time or when I am using it but not really doing anything. In the beginning, it was a few times a day. Then I loaded XMP (DOCP or something like that) and the crashes happened more frequently until I raised the RAM voltage from 1.35 to 1.37. I tried higher, but seemed to get more of these crashes. I called Crucial and they directed me to reset CMOS. I did that and set everything in the Asus firmware to default. I got one crash not long after, but none in the last day or two.

EDIT: My problem seems to have been solved by one of two things (or both) which I did following advice in this thread and others: I disabled power down mode in memory timings, and I put 10% in Windows 10 Power Options/Advanced Settings/Processor Power Management/Minimum Processor State. The computer was on for a day and no black screen crash. I then turned on DOCP/XMP and LEFT THE RAM VOLTAGE AT THE RATED 1.35v. The computer has been on in this state for about 24 hours and no crash. Fingers Crossed.


----------



## 1devomer

Luckbad said:


> I bought a 5900x and Asus Dark Hero last weekend from Micro Center and have had a horrible time with it.
> 
> I upgraded from a 9900k. Full water cooling loop, great temps, B-Die ram, 3090 Kingpin, 1000W Seasonic PSU.
> 
> I've been getting occasional random restarts, especially at idle or doing something simple like watching a YouTube video.
> 
> Tried all of the common advice, loosened ram timings, adjusted voltage, disabled C-states, etc.
> 
> I'm about two random restarts from returning everything and going back to the 9900k even though I can use the additional cores.





LuchoU said:


> The best option would be to check the batch info, but that is printed on the CPU itself, so you will need to remove the cooler. It is probably from 2020 or early 2021.


As @LuchoU said, check the manufacturing date printed on the CPU.

If it is an early sample, from late 2020 or early 2021, ask your reseller for a replacement.

Otherwise send it back to AMD. It is a bit annoying, but I wouldn't go back to the 9900K if you really need the core count.


----------



## dansi

Luckbad said:


> Yep, tried default, tried adding voltage instead of subtracting, tried Dynamic OC Switcher on and off... all sorts of things.
> 
> The random restarts are very hard to reproduce or predict, but I've also been getting instability with various settings in things like Prime95 or IntelBurnTest.
> 
> It's been a very frustrating slog. I loved AMD back in the day (early-mid 2000s) but jumped ship when they couldn't keep up with Intel. I was excited that AMD finally came back with this generation, but it's just been a heap of disappointment.
> 
> It's been more than a decade since I've had to do this much manual tweaking just to try to achieve stability. XMP settings for my ram seem to be impossible to achieve (G.Skill DDR4 3600 CL15), while it was incredibly easy with the Intel platform.
> 
> Ugh.
> 
> I'm frustrated, and unfortunately no advice seems to be worth applying at this point. I have to decide to either try to exchange for another 5900x or 5950x or return the entire thing. Micro Center does 15 days for CPUs and motherboards, so I'd have about a week to figure out if a replacement is fully stable should I go that route.
> 
> I'd heard horror stories of AMD drivers and stability, but it seemed that was mostly a thing of the past as well as primarily with the GPUs. I figured I'd have minimal trouble given that I've been building my own PCs since the 90s. No such luck.


You can check the warning and error logs in Event Viewer; hopefully they show which cores may be crashing.

But if you are crashing at BIOS defaults, better send it in for exchange or RMA.


----------



## zerodisbelief

I've been living with this issue since February and it's driving me insane.

Screens go black and I can still hear audio come through for a bit if I'm playing music. Then the fans spin up and then the system reboots itself.
I check the event logs and no event logs are recorded automatically to tell me what even went wrong. 
How can I fix this? ELI5, please.


----------



## Blair Maynard

zerodisbelief said:


> I've been living with this issue since February and it's driving me insane.
> 
> Screens go black and I can still hear audio come through for a bit if I'm playing music. Then the fans spin up and then the system reboots itself.
> I check the event logs and no event logs are recorded automatically to tell me what even went wrong.
> How can I fix this? ELI5, please.


Did you try disabling power down mode in memory timings?


----------



## LuchoU

zerodisbelief said:


> I've been living with this issue since February and it's driving me insane.
> 
> Screens go black and I can still hear audio come through for a bit if I'm playing music. Then the fans spin up and then the system reboots itself.
> I check the event logs and no event logs are recorded automatically to tell me what even went wrong.
> How can I fix this? ELI5, please.


First you need to be at full stock settings in the BIOS; just set your RAM to the XMP profile.

Search for CoreCycler on Google and test all your cores.

In Windows there is an option to create a full system dump when a critical error occurs. That should give Windows some time to write the event to the log, and you will be able to see the APIC ID to identify the failing core. If I remember correctly, the option is located in the system settings.

I believe the only possible workaround for this issue is adding extra voltage to the failing cores. You can try adding +10 or +5 in the BIOS (AMD PBO) to all cores and see if there is any gain in stability. Using CoreCycler will make it easier to identify the problematic cores.

The definitive solution is to RMA your CPU. If you can check the batch on your CPU (you will need to remove your cooling solution) and it's from 2020 or early 2021, then you most probably got a very badly binned CPU.

Sent from my SM-G960U1 using Tapatalk


----------



## zerodisbelief

LuchoU said:


> First you need to be at full stock settings in the BIOS; just set your RAM to the XMP profile.
> 
> Search for CoreCycler on Google and test all your cores.
> 
> In Windows there is an option to create a full system dump when a critical error occurs. That should give Windows some time to write the event to the log, and you will be able to see the APIC ID to identify the failing core. If I remember correctly, the option is located in the system settings.
> 
> I believe the only possible workaround for this issue is adding extra voltage to the failing cores. You can try adding +10 or +5 in the BIOS (AMD PBO) to all cores and see if there is any gain in stability. Using CoreCycler will make it easier to identify the problematic cores.
> 
> The definitive solution is to RMA your CPU. If you can check the batch on your CPU (you will need to remove your cooling solution) and it's from 2020 or early 2021, then you most probably got a very badly binned CPU.
> 
> Sent from my SM-G960U1 using Tapatalk


Thanks for the help. I'm going to RMA the CPU.
I was thinking that this was a GPU problem; thanks for pointing me in the right direction.


I checked the setting in system settings and it's already set to automatically create a dump file. When I opened the file in WinDbg I didn't see any APIC mentions; the bottom of the file talks about nvlddmkm.sys:


*****
* Bugcheck Analysis *
*****

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffffc401104be460, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff80367347b14, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.
BUGCHECK_CODE: 116
BUGCHECK_P1: ffffc401104be460
BUGCHECK_P2: fffff80367347b14
BUGCHECK_P3: ffffffffc000009a
BUGCHECK_P4: 4
VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffffc401104be460
Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.
PROCESS_OBJECT: 0000000000000004
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
PROCESS_NAME: System

STACK_TEXT:
ffffa18d`063ef9d8 fffff803`61c91cae : 00000000`00000116 ffffc401`104be460 fffff803`67347b14 ffffffff`c000009a : nt!KeBugCheckEx
ffffa18d`063ef9e0 fffff803`61c424d4 : fffff803`67347b14 ffffc40f`deff68a0 00000000`00002000 ffffc40f`deff6960 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
ffffa18d`063efa20 fffff803`61c3b00f : ffffc40f`df04e000 00000000`01000000 00000000`00000002 00000000`00000002 : dxgkrnl!ADAPTER_RENDER::Reset+0x174
ffffa18d`063efa50 fffff803`61c913d5 : 00000000`00000100 ffffc40f`df04ea58 00000000`00000000 00000000`00000000 : dxgkrnl!DXGADAPTER::Reset+0x4df
ffffa18d`063efad0 fffff803`61c91547 : fffff803`51f24440 00000000`00000000 00000000`00000000 00000000`00000000 : dxgkrnl!TdrResetFromTimeout+0x15
ffffa18d`063efb00 fffff803`514b8505 : ffffc40f`ebdc7040 fffff803`61c91520 ffffc40f`bb487750 ffffc40f`00000000 : dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
ffffa18d`063efb30 fffff803`51555845 : ffffc40f`ebdc7040 00000000`00000080 ffffc40f`bb4ac100 00000000`00000001 : nt!ExpWorkerThread+0x105
ffffa18d`063efbd0 fffff803`515fe828 : ffff9180`8d3a3180 ffffc40f`ebdc7040 fffff803`515557f0 00000000`00010000 : nt!PspSystemThreadStartup+0x55
ffffa18d`063efc20 00000000`00000000 : ffffa18d`063f0000 ffffa18d`063e9000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME: nvlddmkm+dc7b14
MODULE_NAME: nvlddmkm
IMAGE_NAME: nvlddmkm.sys
STACK_COMMAND: .thread ; .cxr ; kb
FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys
OS_VERSION: 10.0.19041.1
BUILDLAB_STR: vb_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10

Am I looking at this wrong?


----------



## zerodisbelief

Blair Maynard said:


> Did you try disabling power down mode in memory timings?


No, I'll try this too. Thanks!

[EDIT: I disabled power down mode]


----------



## 1devomer

zerodisbelief said:


> Thanks for the help. I'm going to RMA the CPU.
> I was thinking that this was a GPU problem; thanks for pointing me in the right direction.
> 
> 
> I checked the setting in system settings and it's already set to automatically create a dump file. When I opened the file in WinDbg I didn't see any APIC mentions; the bottom of the file talks about nvlddmkm.sys:
> 
> 
> *****
> * Bugcheck Analysis *
> *****
> 
> VIDEO_TDR_FAILURE (116)
> Attempt to reset the display driver and recover from timeout failed.
> Arguments:
> Arg1: ffffc401104be460, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
> Arg2: fffff80367347b14, The pointer into responsible device driver module (e.g. owner tag).
> Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
> Arg4: 0000000000000004, Optional internal context dependent data.
> BUGCHECK_CODE: 116
> BUGCHECK_P1: ffffc401104be460
> BUGCHECK_P2: fffff80367347b14
> BUGCHECK_P3: ffffffffc000009a
> BUGCHECK_P4: 4
> VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffffc401104be460
> Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.
> PROCESS_OBJECT: 0000000000000004
> BLACKBOXBSD: 1 (!blackboxbsd)
> BLACKBOXNTFS: 1 (!blackboxntfs)
> BLACKBOXPNP: 1 (!blackboxpnp)
> BLACKBOXWINLOGON: 1
> PROCESS_NAME: System
> 
> STACK_TEXT:
> ffffa18d`063ef9d8 fffff803`61c91cae : 00000000`00000116 ffffc401`104be460 fffff803`67347b14 ffffffff`c000009a : nt!KeBugCheckEx
> ffffa18d`063ef9e0 fffff803`61c424d4 : fffff803`67347b14 ffffc40f`deff68a0 00000000`00002000 ffffc40f`deff6960 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
> ffffa18d`063efa20 fffff803`61c3b00f : ffffc40f`df04e000 00000000`01000000 00000000`00000002 00000000`00000002 : dxgkrnl!ADAPTER_RENDER::Reset+0x174
> ffffa18d`063efa50 fffff803`61c913d5 : 00000000`00000100 ffffc40f`df04ea58 00000000`00000000 00000000`00000000 : dxgkrnl!DXGADAPTER::Reset+0x4df
> ffffa18d`063efad0 fffff803`61c91547 : fffff803`51f24440 00000000`00000000 00000000`00000000 00000000`00000000 : dxgkrnl!TdrResetFromTimeout+0x15
> ffffa18d`063efb00 fffff803`514b8505 : ffffc40f`ebdc7040 fffff803`61c91520 ffffc40f`bb487750 ffffc40f`00000000 : dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
> ffffa18d`063efb30 fffff803`51555845 : ffffc40f`ebdc7040 00000000`00000080 ffffc40f`bb4ac100 00000000`00000001 : nt!ExpWorkerThread+0x105
> ffffa18d`063efbd0 fffff803`515fe828 : ffff9180`8d3a3180 ffffc40f`ebdc7040 fffff803`515557f0 00000000`00010000 : nt!PspSystemThreadStartup+0x55
> ffffa18d`063efc20 00000000`00000000 : ffffa18d`063f0000 ffffa18d`063e9000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28
> 
> 
> SYMBOL_NAME: nvlddmkm+dc7b14
> MODULE_NAME: nvlddmkm
> IMAGE_NAME: nvlddmkm.sys
> STACK_COMMAND: .thread ; .cxr ; kb
> FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys
> OS_VERSION: 10.0.19041.1
> BUILDLAB_STR: vb_release
> OSPLATFORM_TYPE: x64
> OSNAME: Windows 10
> 
> Am I looking at this wrong?


From how you describe it, it does indeed seem to be related to the GPU rather than the CPU.

It seems that the Nvidia driver crashed with a TDR error, which means that the GPU was busy for too long and didn't report back to the OS in time.

I would advise a clean driver install: get the DDU tool and, in the tool's options, enable driver cleaning in safe mode.

Download the latest AMD chipset drivers and Nvidia GPU drivers.

Unplug the Ethernet cable and disable Wi-Fi, then launch the tool and clean both the AMD and GPU drivers, rebooting into safe mode.

Reboot once finished, then install the chipset drivers and the GPU drivers.
Plug the Ethernet cable back in and re-enable Wi-Fi.

Test again with default settings, as @LuchoU pointed out: clear your CMOS, boot into the BIOS, load optimized settings and set up the XMP/DOCP memory profile.
Check the GPU temperature. If everything seems fine on the GPU side, you can try dropping PCIe from Gen4 to Gen3 and check whether the TDR crash still occurs.

The point is to understand whether the issue comes from the GPU or from an unstable CPU. CoreCycler is the way to go for checking your CPU cores.
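
To make the CPU-versus-GPU call less of a guess, the summary fields of a dump like the one quoted above can be pulled out with a short script. This is only a sketch: the function name and the two hints are illustrative assumptions, not an official mapping, and the sample text is trimmed from the quoted dump with the field names written as WinDbg normally prints them.

```python
import re

# Summary fields that appear in the `!analyze -v` style output quoted above.
FIELDS = ("BUGCHECK_CODE", "MODULE_NAME", "IMAGE_NAME", "FAILURE_BUCKET_ID")

# Illustrative hints only, not an exhaustive or authoritative mapping.
HINTS = {
    "116": "VIDEO_TDR_FAILURE: display driver timed out, look at the GPU/driver first",
    "124": "WHEA_UNCORRECTABLE_ERROR: hardware error, look at CPU/RAM/voltages first",
}


def summarize_dump(text: str) -> dict:
    """Extract a few 'KEY: value' fields from WinDbg analysis text."""
    summary = {}
    for name in FIELDS:
        # Allow leading whitespace or forum quote markers ('>') before the key.
        match = re.search(rf"^[>\s]*{name}:\s*(\S+)", text, re.MULTILINE)
        if match:
            summary[name] = match.group(1)
    code = summary.get("BUGCHECK_CODE", "").lower()
    summary["hint"] = HINTS.get(code, "no hint for this bugcheck code")
    return summary


SAMPLE = """\
BUGCHECK_CODE: 116
MODULE_NAME: nvlddmkm
IMAGE_NAME: nvlddmkm.sys
FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys
"""

if __name__ == "__main__":
    for key, value in summarize_dump(SAMPLE).items():
        print(f"{key}: {value}")
```

A dump that names `nvlddmkm.sys` points at the display driver side first, which is why the clean driver install comes before any CPU RMA.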


----------



## mongoled

@zerodisbelief 

You could also try taking out the video card, cleaning the PCIe contacts on the video card with a pencil eraser and then re-seating the GPU ...


----------



## zerodisbelief

1devomer said:


> From how you describe it, it does indeed seem to be related to the GPU rather than the CPU.
> 
> It seems that the Nvidia driver crashed with a TDR error, which means that the GPU was busy for too long and didn't report back to the OS in time.
> 
> I would advise a clean driver install: get the DDU tool and, in the tool's options, enable driver cleaning in safe mode.
> 
> Download the latest AMD chipset drivers and Nvidia GPU drivers.
> 
> Unplug the Ethernet cable and disable Wi-Fi, then launch the tool and clean both the AMD and GPU drivers, rebooting into safe mode.
> 
> Reboot once finished, then install the chipset drivers and the GPU drivers.
> Plug the Ethernet cable back in and re-enable Wi-Fi.
> 
> Test again with default settings, as @LuchoU pointed out: clear your CMOS, boot into the BIOS, load optimized settings and set up the XMP/DOCP memory profile.
> Check the GPU temperature. If everything seems fine on the GPU side, you can try dropping PCIe from Gen4 to Gen3 and check whether the TDR crash still occurs.
> 
> The point is to understand whether the issue comes from the GPU or from an unstable CPU. CoreCycler is the way to go for checking your CPU cores.



I'll defo do all these tests, because I just looked into my Event Viewer and spotted this. I might be having both CPU and GPU issues *** T-T

How can a 5950X and a 3080 OC both do me so wrong -.- ....
I just did a BIOS flashback to the newest version of the BIOS too.

View attachment 2521521


----------



## 1devomer

zerodisbelief said:


> I'll defo do all these tests, because I just looked into my Event Viewer and spotted this. I might be having both CPU and GPU issues *** T-T
> 
> How can 5950x and 3080OC both do me so wrong -.- ....
> I just did BIOS flashback to the newest version of the BIOS too.
> 
> View attachment 2521521


This kind of ACPI error is annoying, indeed, but in theory it should not impact performance or stability.
Depending on the drivers, the installed software and the BIOS settings, the OS will complain that it could not access various ACPI hardware devices listed in the BIOS.
In this case, it seems that something is blocking the EC, or accessing it without the OS being aware, maybe some monitoring software, Ryzen Master or the motherboard software.

What you should look for in Event Viewer are the common WHEA errors that report a CPU or system crash, if any.
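
If scrolling through Event Viewer is tedious, the same WHEA entries can be listed from the command line with `wevtutil`. A minimal sketch that builds the query and, on Windows only, runs it; the function name and the default of 20 events are arbitrary choices for this example:

```python
import subprocess
import sys

# Event provider that logs WHEA hardware errors in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def whea_query_command(max_events: int = 20) -> list[str]:
    """Build a wevtutil command listing the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",  # query the System log
        f"/q:{xpath}",               # only events from the WHEA logger
        f"/c:{max_events}",          # limit the number of events returned
        "/rd:true",                  # newest first
        "/f:text",                   # human-readable output
    ]


if __name__ == "__main__":
    cmd = whea_query_command()
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or "No WHEA-Logger events found.")
    else:
        print("Run on Windows:", " ".join(cmd))
```

An empty result after a crash suggests the reboot happened before Windows could log anything, which matches the "no event logs are recorded" symptom described earlier in the thread.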


----------



## AnimeGirlfriend

I am also getting these symptoms with my MSI MB and 5900X. Probably will refund and just wait and see what comes next year since I don't want to get another one from amazon and get the same issue later...


----------



## R7_5800x

Hey All,

I've got a 5800X on an X570 Aorus Pro WiFi v1.2. I put the system together in December and struggled with system stability initially.

RAM: G.Skill Trident Z Neo 3600 CL16 kit, 4x16 GB for 64 GB of memory (tested good at 3600 MHz with 2 DIMMs installed and with 4 DIMMs installed)
GPU: MSI 3090 
PSU: NZXT C850 
AIO: NZXT X73 Kraken 360mm (push/pull) 
Case is a LianLi PC011D with all corsair fans for bottom and side intake along with push/pull on top mount 360mm AIO rad.

Ended up in a working state where F31o bios along with using manual PBO values for PPT (125), EDC (150), TDC (150) and the vcore set to 1.25000. Anything less on the EDC and system would blackscreen reboot or BSOD.

This was working up until last week when I started getting BSODs after working for the last 8 months without issue. 

I decided to install the latest AMD chipset drivers, which seemed to make things worse. Then moved on to updating the BIOS to latest available for the board being F34, since then anytime CPB even without PBO enabled/manual/auto will cause the system to blackscreen reboot when any game or workload that doesn't seem to take cpu to 100% usage immediately is launched. Made sure to clear prior bios settings and do a boot using defaults, then go back and change values. Even tried stepping back down to F30o which now won't work, and tried going to F33 then back to F34 (still there now). 

Things done so far: changed ram to the default 2133mhz speed, set the XMP values manually in ram timings and DRAM voltage, and tried running at 3200mhz with a matching FCLK of 1600. Ram passed memory tests with memtest+ and OCCT as long as CPB is turned off. As soon as CPB is turned on, the system now blackscreen reboots within 30s to 15 minutes of games/programs being launched.

Did try changing PBO curve optimization to apply a positive value of 1-5, which made no difference. C-states have been disabled. The Windows Balanced Performance Plan has the idle % set to 10% as well. Have checked VSOC and did incremental testing with voltages from 1.0 to 1.175, which seemed to make no difference.

Currently my only error free options are to run the 5800x with CPB and PBO off 100%, or to set VCORE to 1.35000 and set multiplier to 45.50 to get 4550mhz on all core. Haven't tried to set anything higher for the static multiplier or VCORE even though I know they go higher when in AUTO, etc. Ram runs no issue either default or XMP 3600 cl16 1.35vdc. 

I suppose I should RMA this 5800x at this point? 

Any other suggestions?

Thanks for taking the time to read and for any potential advice,

X


----------



## rdr09

What happens when you set everything to optimized default including RAM?

It could be your motherboard that is at fault.


----------



## R7_5800x

rdr09 said:


> What happens when you set everything to optimized default including RAM?
> 
> It could be your motherboard that is at fault.


If I load optimized default for all settings it still crashes since CPB is enabled and PBO is auto in the defaults.

X


----------



## 1devomer

R7_5800x said:


> If I load optimized default for all settings it still crashes since CPB is enabled and PBO is auto in the defaults.
> 
> X


Read the manufacturing date engraved on the cpu IHS.
If it is an early launch-date sample AND you have diligently done the usual AMD troubleshooting checks, RMA it back to AMD.

If you get reboots at idle on an early chip sample, with the latest stock bios and XMP RAM, and you are sure that the SOC and RAM voltages are in check, RMA it back to AMD.


----------



## bebius

Another one on the wagon here...
My 5600X fails p95 on the "best" core at stock settings.
When running PBO, I have to give it +10 +11 on the curve to clear the errors.
I have already filed an RMA request with AMD Europe.


----------



## 1devomer

bebius said:


> Another one on the wagon here...
> My 5600X fails p95 on the "best" core at stock settings.
> When running PBO, I have to give it +10 on the curve to clear the errors.
> Already have filled an RMA request for AMD Europe.


Can you, please, provide us the manufacturing date engraved on the cpu?


----------



## bebius

1devomer said:


> Can you, please, provide us the manufacturing date engraved on the cpu?


Sure, I'll do so when I extract it.


----------



## bebius

I am asking here, so I don't open a new worthless thread:
Can I use a ryzen 1200 while waiting for my 5600X replacement on my msi X470 Gaming Carbon pro? The bios page is this.
I will have to buy a used one, so I need to know beforehand.
Thanks in advance for your help.

Edit: I found it out, it should work.


----------



## 1devomer

bebius said:


> I am asking here, so I don't open a new worthless thread:
> Can I use a ryzen 1200 while waiting for my 5600X replacement on my msi X470 Gaming Carbon pro? The bios page is this.
> I will have to buy a used one, so I need to know beforehand.
> Thanks in advance for your help.
> 
> Edit: I found it out, it should work.


I also lean toward the 1200 still being supported on this model.
Looking at the bios updates page, there is nothing about 1st gen Ryzen support having been removed.


----------



## bebius

1devomer said:


> Can you, please, provide us the manufacturing date engraved on the cpu?


The number is 2119SUS. It's the 19th week of the year 2021, isn't it?


----------



## kairi_zeroblade

bebius said:


> The number is 2119SUS. It's the 19th week of the year 2021, isn't it?


yes, how weird, it's a recent batch..so batches won't really tell you the tale..where was it diffused??


----------



## bebius

kairi_zeroblade said:


> yes, how weird, its a latest batch..so batches won't really tell you the tale..where was it diffused??


Diffused in USA
Diffused in Taiwan
Made in China


----------



## 1devomer

kairi_zeroblade said:


> yes, how weird, its a latest batch..so batches won't really tell you the tale..where was it diffused??


Batches do tell you a tale, if you spend time checking how much pressure AMD is under from supply constraints on big contracts.
That was the case at launch, but it does not mean that AMD can't dynamically change its chiplet binning through the year, depending on its needs.
It is what actually makes chiplet manufacturing so flexible, interesting and lucrative.




bebius said:


> The number is 2119SUS. It's the 19th week of the year 2021, isn't it?


Yep, which translates to a May 2021 sample.
The only thing that could push AMD to lower its bin quality again is the preparation and chiplet stockpiling for the new ThreadRipper line-up at the end of the year.
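For reference, a small sketch of how such a code can be turned into a calendar date. It assumes the first four digits are a two-digit year followed by a week number (as read in this thread), and it further assumes ISO week numbering, which may not be exactly what AMD uses. The function name is made up for illustration.

```python
import datetime
import re


def decode_date_code(code):
    """Decode a YYWW-style date code such as '2119SUS' into
    (year, week, Monday of that week). ISO week numbering is an assumption."""
    match = re.match(r"(\d{2})(\d{2})", code)
    if not match:
        raise ValueError(f"no YYWW prefix found in {code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    # Monday (day 1) of the given ISO week; requires Python 3.8+.
    monday = datetime.date.fromisocalendar(year, week, 1)
    return year, week, monday


if __name__ == "__main__":
    for code in ("2119SUS", "2103"):
        year, week, monday = decode_date_code(code)
        print(f"{code}: week {week} of {year}, starting {monday.isoformat()}")
```

Under those assumptions, `2119SUS` lands in the week starting 10 May 2021 and `2103` in the week starting 18 January 2021.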


----------



## bebius

I guess it could always be a very small probability failure that's always present.
Btw they approved the RMA.


----------



## Skeetanator

Hey All,

Long time since I have been on here (had to create a new account). I built a new system a couple months ago and I have had issues with crashing/black screens since the initial build. After many adjustments and still receiving the occasional black screen, I have given up and just submitted an RMA request. I might be missing something and thought I would post my details here.

CPU: 5800X
Motherboard: Asus TUF B550-PLUS (no wifi) BIOS v2407
Ram: G.Skill Ripjaws V series 32GB (2 x 16GB) 3600 mhz
GPU: AMD R9 290 (one day I will be able to buy a new GPU for this system)
PSU: Seasonic FOCUS 650W 80+ Gold
Cooling: Custom water cooling loop
Case: Phanteks P500A

I experience random crashes/black screens on both default BIOS settings and custom BIOS settings with XMP enabled. I have tried all variations, and neither the defaults nor any custom settings appear to be stable.

The crashing/black screen only seems to happen while gaming. I can run P95, AIDA64 and OCCT with no issues or errors, so the system appears "stable" after running these tests, but it crashes/black screens while gaming. This can happen right after I start gaming, or it may take a bit before I get the black screen. The Event Viewer log shows a critical Kernel-Power error, event ID 41, task category 63.

I recently started testing with the CoreCycler tool created by sp00n82 and this has been successful in detecting errors with individual cores. After receiving errors using this tool, I tried adjusting all my BIOS settings again to re-test (default, XMP on/off, etc). All settings prompted an error using this tool.

I took a picture of my CPU before installing it and have the details available on the chip in case this helps with anything.
Diffused in USA
Diffused in Taiwan
Made in Malaysia
2103


----------



## bebius

Skeetanator said:


> Hey All,
> 
> Long time since I have been on here (had to create a new account). I built a new system a couple months ago and I have had issues with crashing/black screens since the initial build. After many adjustments and still receiving the occasional black screen, I have given up and just submitted an RMA request. I might be missing something and thought I would post my details here.
> 
> CPU: 5800X
> Motherboard: Asus TUF B550-PLUS (no wifi) BIOS v2407
> Ram: G.Skill Ripjaws V series 32GB (2 x 16GB) 3600 mhz
> GPU: AMD R9 290 (one day I will be able to buy a new GPU for this system)
> PSU: Seasonic FOCUS 650W 80+ Gold
> Cooling: Custom water cooling loop
> Case: Phanteks P500A
> 
> I experience random crashes/black screenshots on both default BIOS settings and custom BIOS settings with XMP enabled. I have tried all variations and no settings or default appear to be stable.
> 
> To get the crashing/black screen, this only seems to happen while gaming. I can run P95, AID64 and OCCT with no issues or errors. I only crash while gaming. I will appear "stable" after running these tests, but crash/black screen while gaming. This can happen right after i start gaming or it may take a bit to receive the black screen. The event viewer log shows a critical error Kernel-Power, event ID 41, task category 63.
> 
> I recently started testing with the CoreCycler tool created by sp00n82 and this has been successful in detecting errors with individual cores. After receiving errors using this tool, I tried adjusting all my BIOS settings again to re-test (default, XMP on/off, etc). All settings prompted an error using this tool.
> 
> I took a picture of my CPU before installing it and have the details available on the chip in case this helps with anything.
> Diffused in USA
> Diffused in Taiwan
> Made in Malaysia
> 2103


You could try to test your ram a little more with TM5, using the 1usmus and anta777 Extreme profiles, at stock settings.
If it passes 2-3 runs, ram can be ruled out and then you can open an RMA for the cpu. As proof, use a video where you reset the bios to defaults and run p95 after setting the affinity to the problematic core (the one that fails with CoreCycler), using the FFT size that failed.
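Setting the affinity by hand in Task Manager works, but if you want to launch the test already pinned, the mask can be computed. This is a rough sketch that assumes Windows, SMT enabled, and that physical core `c` (counted from 0) maps to logical processors `2c` and `2c+1`, which is the usual layout but worth verifying on your system. The function names and the `prime95.exe` file name are placeholders.

```python
def affinity_mask(core, smt=True):
    """Bitmask selecting the logical processors of one physical core.
    Assumes core c owns logical CPUs 2c and 2c+1 when SMT is on."""
    if core < 0:
        raise ValueError("core index must be non-negative")
    if smt:
        return 0b11 << (2 * core)  # both SMT threads of the core
    return 1 << core               # one logical CPU per core


def start_command(core, exe="prime95.exe", smt=True):
    """Build a cmd.exe 'start /affinity' line; the mask must be hexadecimal."""
    return f'start "" /affinity {affinity_mask(core, smt):X} {exe}'


if __name__ == "__main__":
    # Example: pin the test to physical core 3 (the fourth core).
    print(start_command(3))
```

Run the printed line from a Command Prompt in the Prime95 folder, then start the torture test with the FFT size that failed.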


----------



## LancerVI

I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP.

5900x
Asus Crosshair VIII Hero (wifi) UEFI: 3801
eVGA 1200 P2 PSU (5'ish years old, can't remember)
G.skill F4-3600C17D-16GTZR 3600 DDR4 32 (8gb x 4)
Sabrent Rocket NVMe 2TB pcie4 x2
Samsung 860 EVO 1TB x2 (RAID0)
OCZ Vertex4 256GB x2 (RAID0)
eVGA 3090 FTW3 Ultra Gaming on EKWB w/backplate (passive)
EK Velocity AM4 RGB block, 2x360mm rads, 1xDDC pump, 1xD5 pump, loop config: serial (<===for the 1st time ever, I think serial was a mistake)
10x Lian Li Uni Fan Sl120s
Lian Li PC-O11 Dynamic XL
PETG throughout

This machine was fairly rock solid until about a month ago. Tons of WHEA uncorrectables, disappearing M.2 Drives that test fine. I can run AIDA64 all day, all night. Same with 3dMark stability test, same with EVGA X-1 Precision, Memtest86 (4 passes), Furmark stability test.......everything passes.......



Load War Thunder, Civ6, Warhammer II, MSFS 2020.....within 15 minutes, I get blue BSODs WHEA_UNCORRECTABLE.......kernal-panic, VOL MGR 141 in event viewer...etc, etc.

This machine is just not stable at all. Default everything, lower ram to 2133, same results. Any kind of intense gaming load on this machine causes the BSOD.

I ran GPU-Zs GPU test, with AIDA64's stability test and a 4k 60FPS file playing in VLC player and left it........completely stable, still cooking after about 8 hours.......load War Thunder? Within 15 minutes or so, BSOD. Same with CIV6, though it triggers it faster.

Swapped out RAM for G.Skill F4-3300C16Q2-128GTZSW and get the exact same results. I'm at a loss.

Getting ready to swap in my old 2700x to see if it persists. If it does, it's the board, if not, proc. Let me just add that this is why I HATE hard line tubing. I'm a noob at hardline for sure, but it just makes maintenance on a rig way too much work. Going back to soft tube as soon as I'm able.


----------



## Vesimas

It's been a week since I finally completed my new rig (after almost 8 months of having all the parts at home). No problems so far, all stock except the ram profile. 5800x bought March 2021 from the AMD store


----------



## xeizo

LancerVI said:


> I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP.
> 
> 5900x
> Asus Crosshair VIII Hero (wifi) UEFI: 3801
> eVGA 1200 P2 PSU (5'ish years old, can't remember)
> G.skill F4-3600C17D-16GTZR 3600 DDR4 32 (8gb x 4)
> Sabrent Rocket NVMe 2TB pcie4 x2
> Samsung 860 EVO 1TB x2 (RAID0)
> OCZ Vertex4 256GB x2 (RAID0)
> eVGA 3090 FTW3 Ultra Gaming on EKWB w/backplace (passsive)
> EK Velocity AM4 RGB block, 2x360mm rads, 1xDDC pump, 1xD5 pump, loop config: serial (<===for the 1st time ever, I think serial was a mistake)
> 10x Lian Li Uni Fan Sl120s
> Lian Li PC-O11 Dynamic XL
> PETG throughout
> 
> This machine was fairly rock solid until about a month ago. Tons of WHEA uncorrectables, disappearing M.2 Drives that test fine. I can run AIDA64 all day, all night. Same with 3dMark stability test, same with EVGA X-1 Precision, Memtest86 (4 passes), Furmark stability test.......everything passes.......
> 
> 
> 
> Load War Thunder, Civ6, Warhammer II, MSFS 2020.....within 15 minutes, I get blue BSODs WHEA_UNCORRECTABLE.......kernal-panic, VOL MGR 141 in event viewer...etc, etc.
> 
> This machine is just not stable at all. Default everything, lower ram to 2133, same results. Any kind of intense gaming load on this machine causes the BSOD.
> 
> I ran GPU-Zs GPU test, with AIDA64's stability test and a 4k 60FPS file playing in VLC player and left it........completely stable, still cooking after about 8 hours.......load War Thunder? Within 15 minutes or so, BSOD. Same with CIV6, though it triggers it faster.
> 
> Swapped out RAM for G.Skill F4-3300C16Q2-128GTZSW and get the exact same results. I'm at a loss.
> 
> Getting ready to swap in my old 2700x to see if it persists. If it does, it's the board, if not, proc. Let me just add that this is why I HATE hard line tubing. I'm a noob at hardline for sure, but it just makes maintenance on a rig way too much work. Going back to soft tube as soon as I'm able.


The sudden reboots are usually because the cpu boosts too high and too quickly from too low a voltage. It's in fact very complicated, and AMD has not got it fully right. It gets more complicated because it's usually one or a few cores that are responsible for all the crashes; AMD skimped a little too much on QA for the consumer dies.

Why was it stable earlier? Don't know, but Microsoft has been fiddling a lot with Windows scheduler lately so maybe some Windows update triggered the dangerous peak load to one of your weak cores instead of one of your good cores?

Zen+ and Zen 2 never had these problems, that's because they boost way lower.

I don't think it's anything wrong with your mobo, possibly you could try to find your weak core/cores using core cycler and give it/them more juice using Curve Optimizer.

The problem is real, I know for a fact as Asus Crosshair 8 Extreme has a lot of nifty new features to mitigate exactly these problems. Sadly, I don't think those features can be implemented in the lesser boards as C8E has a more advanced VRM than most boards.

If you don't want to tweak your way out of it, RMA the processor and hopefully you get a better sample. Also, I hope that Zen3+ has these problems solved once and for all.

edit. word on the street is that a new stepping is already on its way to e-tailers. No word on changes, and new steppings are not uncommon. A guess is that these are dies that didn't become 3D-cache models and will be sold as normal models without the extra cache. But who knows?


----------



## Daylight_Invader

xeizo said:


> edit. word on the street is that a new stepping is already on it's way to e-tailers, no word on changes, new steppings are not uncommon. A guess is that it is dies that didn't become 3D-cache models and will be sold as normal models without the extra cache. But who knows?


I suspect that is exactly what the new stepping is all about. Processors made that won't be getting the extra cache, which ties in to what AMD said which was not to expect any performance differences.


----------



## Blameless

LancerVI said:


> I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP.
> 
> 5900x
> Asus Crosshair VIII Hero (wifi) UEFI: 3801
> eVGA 1200 P2 PSU (5'ish years old, can't remember)
> G.skill F4-3600C17D-16GTZR 3600 DDR4 32 (8gb x 4)
> Sabrent Rocket NVMe 2TB pcie4 x2
> Samsung 860 EVO 1TB x2 (RAID0)
> OCZ Vertex4 256GB x2 (RAID0)
> eVGA 3090 FTW3 Ultra Gaming on EKWB w/backplace (passsive)
> EK Velocity AM4 RGB block, 2x360mm rads, 1xDDC pump, 1xD5 pump, loop config: serial (<===for the 1st time ever, I think serial was a mistake)
> 10x Lian Li Uni Fan Sl120s
> Lian Li PC-O11 Dynamic XL
> PETG throughout
> 
> This machine was fairly rock solid until about a month ago. Tons of WHEA uncorrectables, disappearing M.2 Drives that test fine. I can run AIDA64 all day, all night. Same with 3dMark stability test, same with EVGA X-1 Precision, Memtest86 (4 passes), Furmark stability test.......everything passes.......


When did you update to firmware 3801?

What vSoC, CLDO VDDG CCD/IOD, and LLC settings are you running?

Have you touched the LCLK DPM settings?


----------



## 1devomer

LancerVI said:


> I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP.
> 
> 5900x
> Asus Crosshair VIII Hero (wifi) UEFI: 3801
> eVGA 1200 P2 PSU (5'ish years old, can't remember)
> G.skill F4-3600C17D-16GTZR 3600 DDR4 32 (8gb x 4)
> Sabrent Rocket NVMe 2TB pcie4 x2
> Samsung 860 EVO 1TB x2 (RAID0)
> OCZ Vertex4 256GB x2 (RAID0)
> eVGA 3090 FTW3 Ultra Gaming on EKWB w/backplace (passsive)
> EK Velocity AM4 RGB block, 2x360mm rads, 1xDDC pump, 1xD5 pump, loop config: serial (<===for the 1st time ever, I think serial was a mistake)
> 10x Lian Li Uni Fan Sl120s
> Lian Li PC-O11 Dynamic XL
> PETG throughout
> 
> This machine was fairly rock solid until about a month ago. Tons of WHEA uncorrectables, disappearing M.2 Drives that test fine. I can run AIDA64 all day, all night. Same with 3dMark stability test, same with EVGA X-1 Precision, Memtest86 (4 passes), Furmark stability test.......everything passes.......
> 
> 
> 
> Load War Thunder, Civ6, Warhammer II, MSFS 2020.....within 15 minutes, I get blue BSODs WHEA_UNCORRECTABLE.......kernal-panic, VOL MGR 141 in event viewer...etc, etc.
> 
> This machine is just not stable at all. Default everything, lower ram to 2133, same results. Any kind of intense gaming load on this machine causes the BSOD.
> 
> I ran GPU-Zs GPU test, with AIDA64's stability test and a 4k 60FPS file playing in VLC player and left it........completely stable, still cooking after about 8 hours.......load War Thunder? Within 15 minutes or so, BSOD. Same with CIV6, though it triggers it faster.
> 
> Swapped out RAM for G.Skill F4-3300C16Q2-128GTZSW and get the exact same results. I'm at a loss.
> 
> Getting ready to swap in my old 2700x to see if it persists. If it does, it's the board, if not, proc. Let me just add that this is why I HATE hard line tubing. I'm a noob at hardline for sure, but it just makes maintenance on a rig way too much work. Going back to soft tube as soon as I'm able.


You are not the first to have this kind of issue, nor the last; a lot of Zen3 cpus died or degraded less than a year after being bought.
Can you please check carefully which kind of WHEA errors you got in the Event Viewer, and share whether you had USB or pci-e disconnect issues?

You should do the usual AMD sanity checks:
-Verify the SOC, VDDG IO/CCD voltages.
-Disable the C-States, Core Performance Boost and PBO.
-Clear the CMOS, flash an older bios.
-Carefully reseat the cpu, without killing the pins.
-Check the cooling, temperatures and motherboard power connections.

If it is stable with these disabled but gets random reboots with PBO enabled, just RMA your cpu and get a better one from AMD.


----------



## LancerVI

I need to fall on my sword and admit to a mistake.

For me, it turns out I misinterpreted the flow directions on the EKWB distro plate I was using. NOT EK's fault.

When water cooling, I always run serial. Always, always, always. Pump>CPU>RAD>GPU>RAD>PUMP/Plate or something similar. When i tore it down over the weekend, i realized I did this instead: PUMP>GPU>DISTROPLATE>CPU>RAD>RAD>PUMP/Plate.

I never noticed a problem, as I haven't been gaming/loading this machine as much as I anticipated this year, so I just missed it. My CPU temps were 60s/70s at load most of the time (seemed high to me, but usable), but I wasn't doing anything significant. It was only when I got some time to game again and try Windows 11 that this all cropped up, leading to more confusion. (I thought I had a botched w11 install at first)

Fact of the matter is, I'm an idiot and screwed up. I was cooling my 5900x with water directly from my EVGA 3090 FTW3 Ultra. That card is hot as hell and did my 5900X no favors. Now, I understand there's a loop order "does it matter/not matter" debate, but I can say unequivocally: it matters. At least it did here. My coolant temp the entire time is roughly 32-36c and that's after hours of tests, benches, etc. The problem is that cooling a CPU directly after a 3090 caused all kinds of problems. When I corrected the routing this weekend, all of my problems disappeared.

That's twice now I've messed this particular rig up. 1st time, earlier this year, was with PSU cable extension from Cable Mods. (My CPU went up an entire grade merely by removing the cable extensions for MB and CPU socket) and became infinitely more stable.

Now that I have my loop going GPU>RAD>PUMP>CPU>RAD, my CPU temps are down across the board about 5-10C'ish for the CPU, DOCP is no issue at 3600 and my GPU is humming along nicely.

During this time, I did NOT suffer the USB disconnect issue others had. I use a metric-crap-ton of USB devices though and they're all on powered hubs (Flight/Driving sim enthusiast with TONS OF CONTROLLERs, pedals, etc) and from what I understand, powered hubs don't suffer as much from the disconnect issue. I also have a secondary USB 3.1 card installed in the last PCie slot too, so that seems to have helped me avoid that.

My Sabrent nVME however did disconnect all the time since this problem surfaced a month or so ago. It was installed in the top m.2 slot of the ChVIII Hero wifi. Again, I think this was heat related. Fact of the matter is I did a terrible job building this machine. The drive tests fine and no SMART errors or anything. I haven't reinstalled it, so I'm unsure yet.

In short, I apologize for the ruckus . My problems, it would appear, were completely and in all other ways self inflicted. I screwed up.


----------



## 1devomer

LancerVI said:


> I need to fall on my sword and admit to a mistake.
> 
> For me, turns out the EKWB distro plate I was using; I misinterpreted the flow directions. NOT EK's fault.
> 
> When water cooling, I always run serial. Always, always, always. Pump>CPU>RAD>GPU>RAD>PUMP/Plate or something similar. When i tore it down over the weekend, i realized I did this instead: PUMP>GPU>DISTROPLATE>CPU>RAD>RAD>PUMP/Plate.
> 
> I never noticed a problem, as I haven't been gaming/loading this machine as much as I anticipated this year, so I just missed it. My CPU temps were 60s/70s at load most of the time (seemed high to me, but usable), but I wasn't doing anything significant. It was only when I got some time to game again and try Windows 11 did this all crop up, leading to more confusion. (I though I had a botched w11 install at first)
> 
> Fact of the matter is, I'm an idiot and screwed up. I was cooling my 5900x with water directly from my EVGA 3090 FTW3 Ultra. That card is hot as hell and did my 5900X no favors. Now, I understand that the loop order "does it matter/not matter" debate, but I can say unequivocally; it matters. At least it did here. My coolant temp the entire time is roughly 32-36c and that's after hours of tests, benchs, etc. The problem is cooling a CPU directly after a 3090 caused all kinds of problems. When I corrected the routing this weekend, all of my problems disappeared.
> 
> That's twice now I've messed this particular rig up. 1st time, earlier this year, was with PSU cable extension from Cable Mods. (My CPU went up an entire grade merely by removing the cable extensions for MB and CPU socket) and became infinitely more stable.
> 
> Now that I have my loop going GPU>RAD>PUMP>CPU>RAD, my CPU temps are down across the board about 5-10C'ish for the CPU, DOCP is no issue at 3600 and my GPU is humming along nicely.
> 
> During this time, I did NOT suffer the USB disconnect issue other had. I use a metric-crap-ton of USB devices though and they're all on powered hubs (Flight/Driving sim enthusiast with TONS OF CONTROLLERs, pedals, etc) and from what I understand, powered hubs don't suffer as much from the disconnect issue. I also have a secondary USB 3.1 card installed in the last PCie slot too, so that seems to have helped me avoid that.
> 
> My Sabrent nVME however did disconnect all the time since this problem surfaced a month or so ago. It was installed in the top m.2 slot of the ChVIII Hero wifi. Again, I think this was heat related. Fact of the matter is I did a terrible job building this machine. The drive tests fine and no SMART errors or anything. I haven't reinstalled it, so I'm unsure yet.
> 
> In short, I apologize for the ruckus . My problems, it would appear, were completely and in all other ways self inflicted. I screwed up.


Glad you sorted it out.


----------



## Blameless

LancerVI said:


> Now, I understand that the loop order "does it matter/not matter" debate, but I can say unequivocally; it matters. At least it did here. My coolant temp the entire time is roughly 32-36c and that's after hours of tests, benchs, etc. The problem is cooling a CPU directly after a 3090 caused all kinds of problems. When I corrected the routing this weekend, all of my problems disappeared.


This is strongly suggestive of inadequate flow. Improperly connecting the distributor plate could do that, but I'd still recommend checking flow rate.

Even if your 3090 is heavily OCed, running full tilt, and dumping ~500w of heat into the water, this is only enough for a ~1.9C temperature rise at one GPM. For loop order to cause a 5-10C temp differential, you either have woefully inadequate flow, or your GPU is dumping thousands of watts of heat into the loop (which is impossible).
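The arithmetic behind that estimate is just ΔT = P / (ṁ · c_p). A quick sketch for checking it with other numbers, assuming plain water at roughly room temperature (density about 0.997 kg/L, specific heat about 4181 J/(kg·K)) and US gallons; the function names are made up for illustration.

```python
LITERS_PER_US_GALLON = 3.785411784
WATER_DENSITY_KG_PER_L = 0.997   # assumed: plain water near 25 C
WATER_CP_J_PER_KG_K = 4181.0     # assumed: specific heat of water near 25 C


def coolant_temp_rise(power_w, flow_gpm):
    """Temperature rise (C) of the coolant across a block dumping power_w watts."""
    mass_flow_kg_s = flow_gpm * LITERS_PER_US_GALLON * WATER_DENSITY_KG_PER_L / 60.0
    return power_w / (mass_flow_kg_s * WATER_CP_J_PER_KG_K)


def flow_for_temp_rise(power_w, delta_t_c):
    """Flow rate (US GPM) at which power_w watts would heat the coolant by delta_t_c."""
    mass_flow_kg_s = power_w / (delta_t_c * WATER_CP_J_PER_KG_K)
    return mass_flow_kg_s * 60.0 / (WATER_DENSITY_KG_PER_L * LITERS_PER_US_GALLON)


if __name__ == "__main__":
    print(f"500 W at 1 GPM: {coolant_temp_rise(500, 1.0):.2f} C rise")
    print(f"Flow for a 10 C rise at 500 W: {flow_for_temp_rise(500, 10):.2f} GPM")
```

With these assumptions, 500 W at 1 GPM comes out at about 1.9C, and a 10C rise from the same 500 W would need the flow to drop to roughly 0.19 GPM.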

Also, even a 10C temp elevation should not be enough to prompt the issues you were having, unless you were already highly borderline on temps. I would keep a close eye on things, in case there are further issues...problems disappearing out of nowhere is no guarantee they won't return.

Those Sabrent NVMe drives can get quite warm and this can cause issues. I usually recommend filling the gap between the board and the underside of the drive's PCB (on or opposite the controller IC) with a thermal pad or putty...this is lower profile than a heatsink, doesn't require meaningful airflow, and often cools just as well or better.


----------



## LancerVI

Blameless said:


> This is strongly suggestive of inadequate flow. Improperly connecting the distributor plate could do that, but I'd still recommend checking flow rate.
> 
> Even if your 3090 is heavily OCed, running full tilt, and dumping ~500w of heat into the water, this is only enough for a ~1.9C temperature rise at one GPM. For loop order to cause a 5-10C temp differential, you either have woefully inadequate flow, or your GPU is dumping thousands of watts of heat into the loop (which is impossible).
> 
> Also, even a 10C temp elevation should not be enough to prompt the issues you were having, unless you were already highly borderline on temps. I would keep a close eye on things, in case there are further issues...problems disappearing out of nowhere is no guarantee they won't return.
> 
> Those Sabrent NVMe drives can get quite warm and this can cause issues. I usually recommend filling the gap between the board and the underside of the drive's PCB (on or opposite the controller IC) with a thermal pad or putty...this is lower profile than a heatsink, doesn't require meaningful airflow, and often cools just as well or better.


Indeed. I would agree; that's why I've always been a "serial route / order doesn't matter" guy. I don't have a flow meter setup, but I'm running a D5 and DDC, which should be plenty. The D5 standalone with a top and the DDC via the distro plate. As an aside, that's why I put "'ish" in there. Totally ballpark. But this, coupled with the use of a distro plate for the 1st time, definitely made a difference. That GPU was pumping my 5900x upwards to 90+C and as I looked over everything, that's when it would crash. My APC UPS reports CIV 6 pulling 850W and War Thunder pulling about the same at 830w. Every time I got up to that kind of load, in the old order, CPU temps were in the 80s/90s. Changed the order, and now it barely hits mid 70s, AIDA64 full tilt with Furmark running a 1080p stability test, War Thunder or otherwise.

My CPU idles high 30s/low 40s (was high 40s, low 50s)

At load, it is now solidly high 60s low 70s. So it is definitely lower temps, but I admit, this could be a myriad of things: a better re-seat of the proc, and I pulled out all of the PETG hardline and replaced it with my tried and true Koolance 13/19mm soft tubes and fittings. Still no problems after a 4 hour WarThunder / DCS / MSFS session yesterday.

Anyway, thank you very much for the help guys. I do appreciate it. I think my WC days are over. I have fewer days ahead than I do behind and I think I may be finally outgrowing the custom loop thing, purely out of laziness. I see AIOs in my future.

Cheers,
LancerVI


----------



## N2Gaming

So have all of the issues with the 5800x been fixed by now? I'd like to buy a 5800x but have been holding out to see if they ever fix the issues.


----------



## alexartwww

@Deepcuts

I've read your story. I have the same issue with my 5950x. I can't reproduce the reboots 100% of the time, but with cinebench they are reproducible.

I wrote to AMD. They told me to delete cinebench. For now it looks stable, but it did reboot once. Very rare.

Does your new system run cinebench without any problems?

Should I replace CPU as you did? Your opinion?

Thank you


----------



## tcclaviger

The 5950 is more WHEA prone, but it also mandates newer bios files than the 3950. I am not sure of the exact implementation date, but I've read that at some point error checking was added to AGESA, so the most useful data would come from testing both chips on an AGESA version with error checking enabled. Before error checking was enabled, it's very possible for a 3950x to be spewing errors that are never noticed.

My experience was that in Win 11 the 5950 produced non-stop WHEA errors at any FCLK, with any RAM, at default CPU settings on a C7H. Moved back to Win 10, no more WHEA spewing. That leads me to conclude there is definitely a windows check or windows scheduler change that has impacted these errors either being caught or occurring, not sure which.

Swapped from a C7H to a Strix x570 II and no more WHEA spewing, can run over 1900 FCLK without them. All other components remain the same except the PSU (was drooping on the 12v rail down to 11.3). Now, was the cause the MB (chipset/bios) or the PSU? Not sure, but a fresh ROG Thor 1200 and new board fixed it. Chip is now a monster again.


----------



## Deepcuts

alexartwww said:


> @Deepcuts
> 
> I've read your story. I have the same issue with my 5950X. I can't reproduce the reboots 100% of the time, but with Cinebench they are reproducible.
> 
> I wrote to AMD. They told me to delete Cinebench. For now it looks stable, but it did reboot once. Very rare.
> 
> Does your new system run Cinebench without any problems?
> 
> Should I replace the CPU as you did? What is your opinion?
> 
> Thank you


If your system is not stable at stock (as described in the 1st post), my money is on a faulty CPU.
Then again, I am just guessing here, not knowing your setup or what you did to test it.
AMD is trying to be funny with the advice to delete Cinebench. I guess they ran out of excuses.


----------



## alexartwww

Deepcuts said:


> If your system is not stable at stock (as described in the 1st post), my money is on a faulty CPU.
> Then again, I am just guessing here, not knowing your setup or what you did to test it.
> AMD is trying to be funny with the advice to delete Cinebench. I guess they ran out of excuses.



AMD Ryzen 5950X (boxed version, SN#: 9JQ0106S10249, Part #: 100-100000059WOF)
Gigabyte Aorus Ultra X570 rev 1.2
4x HyperX Fury 32 GB DDR4 3600 MHz DIMM CL18 HX436C18FB3/32
MSI SUPRIM 3080 LHR 10G
Samsung Evo Pro 1 TB (for Linux) + Western Digital Blue 500 GB WDS500G2B0A SATA (for Windows)
Noctua NH-U12A
Thermaltake Toughpower GF1 850 W

I've tried a lot of options. The ones that helped Cinebench pass reliably are:

Vcore set to 1.1 V
Power Supply Idle Control set to Typical Current Idle

But I guess the system loses some performance in that case. Does your system pass Cinebench well?


----------



## Deepcuts

alexartwww said:


> Does your system pass cinebench well?


Since I've got the 2nd CPU, absolutely no problem with any program, benchmark or game, including Cinebench.
Just to be clear, you have problems at stock settings without anything set in BIOS? Or you set some options, maybe even overclocked a bit?


----------



## alexartwww

Deepcuts said:


> Since I've got the 2nd CPU, absolutely no problem with any program, benchmark or game, including Cinebench.
> Just to be clear, you have problems at stock settings without anything set in BIOS? Or you set some options, maybe even overclocked a bit?


*you have problems at stock settings without anything set in BIOS?*

Yes. I built a new PC and started testing it to be sure that it is stable and does not overheat. That is when I found these problems (sudden reboots).

*Or you set some options, maybe even overclocked a bit?*

I never overclock. I do not want to, because it can damage the CPU and needs water cooling, which can damage the whole system by leaking (I think sooner or later it will leak). Intel is not an option for me because of overheating and water coolers. I saved money for a long time to get this PC, and I thought that all the issues had been fixed in the year since release (I don't want to be a beta tester).

By the way, Linux is stable. I've compiled a kernel and it runs stable. But I guess Linux does not use the CPU's power-saving modes, or not as deeply, because I can hear the fans more often.

I think the problem is in the automatic control of low voltages on the CPU.


----------



## Deepcuts

If you still get reboots with the latest BIOS at stock settings = 99% busted CPU.
The 1% is held in reserve because you did not specify whether all the other components were tested/swapped as well.
I know I am biased because I went through this already. Someone who never had a faulty new CPU would not even think about this possibility.
RMA is a pain, I know, but I would not like to work on such a system knowing that at any point it can just reboot out of the blue.
Or if you bought it at a local shop, give them a call. Maybe they can test your system with another CPU.


----------



## alexartwww

Deepcuts said:


> If you still get reboots with the latest BIOS at stock settings = 99% busted CPU.
> The 1% is held in reserve because you did not specify whether all the other components were tested/swapped as well.
> I know I am biased because I went through this already. Someone who never had a faulty new CPU would not even think about this possibility.
> RMA is a pain, I know, but I would not like to work on such a system knowing that at any point it can just reboot out of the blue.
> Or if you bought it at a local shop, give them a call. Maybe they can test your system with another CPU.


I think the local shop does not want to investigate or replace faulty components. They earn money, nothing else. 99% they will say that I'm an idiot. And it will take a month (or more). Luckily I bought the boxed version, not OEM.

That's why I am talking directly to AMD. If they have a customer care program, I want to use it.

Tell me about the RMA. How long did it take to get the processor replaced? Did you pay for delivery? What did they do? Did they ask you to prove anything?


----------



## Deepcuts

alexartwww said:


> I think the local shop does not want to investigate or replace faulty components. They earn money, nothing else. 99% they will say that I'm an idiot. And it will take a month (or more). Luckily I bought the boxed version, not OEM.
> 
> That's why I am talking directly to AMD. If they have a customer care program, I want to use it.
> 
> Tell me about the RMA. How long did it take to get the processor replaced? Did you pay for delivery? What did they do? Did they ask you to prove anything?


I tried RMA with AMD and it was not a good experience.

I purchased my 2nd CPU with my own money.
Once I was sure the 2nd CPU was good, I returned the 1st CPU to the shop and told them that if they found it faulty, they should credit the money to my shop account, to be used at a later date for some other component.

Side story for reference: I had an issue with my last Dell monitor (~700 EUR).
I opened a ticket and in 4 days I had a new and improved model delivered at zero extra cost. The courier came back 7 days later to pick up the old faulty monitor.
Now, this is what I call customer service.

Good luck with whatever option you choose.


----------



## alexartwww

Deepcuts said:


> I tried RMA with AMD and it was not a good experience.
> 
> I purchased my 2nd CPU with my own money.
> Once I was sure the 2nd CPU was good, I returned the 1st CPU to the shop and told them that if they found it faulty, they should credit the money to my shop account, to be used at a later date for some other component.
> 
> Side story for reference: I had an issue with my last Dell monitor (~700 EUR).
> I opened a ticket and in 4 days I had a new and improved model delivered at zero extra cost. The courier came back 7 days later to pick up the old faulty monitor.
> Now, this is what I call customer service.
> 
> Good luck with whatever option you choose.


Ah, you did it the tricky way... not directly with AMD.


----------



## tcclaviger

The AMD warranty sucks, and it is incredibly easy for them to try to deny a claim. Get a single nick on top of the heat spreader (HS) and yep, bye bye; even a faint scratch can disqualify you. Use liquid metal (LM) TIM and yep, bye bye if they can tell (and they can tell if it's been in place for more than a day or two, because it leaves a slight discoloration).

I have seen Asus do the same thing to others over the smallest of surface marks. In one case it was a mark made by a piece of plastic sliding along an area of the MB; even the top paint wasn't damaged, it just had a small discoloration on it...

The best bet is always to test before the final build and return to the retailer if faulty. This is very impractical for most people, but necessary to avoid dealing with their bad service.

B&H has been amazing for returns I've had to make.


----------



## alexartwww

tcclaviger said:


> The AMD warranty sucks, and it is incredibly easy for them to try to deny a claim. Get a single nick on top of the heat spreader (HS) and yep, bye bye; even a faint scratch can disqualify you. Use liquid metal (LM) TIM and yep, bye bye if they can tell (and they can tell if it's been in place for more than a day or two, because it leaves a slight discoloration).
> 
> I have seen Asus do the same thing to others over the smallest of surface marks. In one case it was a mark made by a piece of plastic sliding along an area of the MB; even the top paint wasn't damaged, it just had a small discoloration on it...
> 
> The best bet is always to test before the final build and return to the retailer if faulty. This is very impractical for most people, but necessary to avoid dealing with their bad service.
> 
> B&H has been amazing for returns I've had to make.


I don't have scratches there. The Noctua is a well-polished cooler. And the color has not changed. It ran at 65 °C max, so it can't have.

Here on Reddit people report good experiences:

https://www.reddit.com/r/Amd/comments/it00rd

Do you have experience with RMA?


----------



## tcclaviger

I attempted one with AMD and the RMA was denied. Using LM apparently is not acceptable.

My comments about the grounds for denial are based on their written policy and on pictures I've seen of CPUs sent back with RMA denials, some having phenomenally light marks on the top. In one case it was a single mark about 1/2" long that was only a discoloration a camera could barely capture.

Asus, on the other hand, has gone above and beyond to assist me: overnight shipment from Taiwan with return Express labels, cross shipments, etc. I know not everyone has had that experience with them, and some get denied for the most idiotic reasons.

Generally with Asus I've had only top of the stack parts, and it seems most of the problem RMAs I've seen were bottom of stack.


----------



## GRABibus

Hueristic said:


> Point these people out because I don't see them.
> 
> This thread has morphed into a OC thread.


Now I can point out some people, since you asked me… 😊


----------



## alexartwww

@Deepcuts I did the RMA. Now it's stable. I had to pay for DHL (~$125) and wait 2 months, plus a lot of correspondence with AMD.


----------



## Deepcuts

Glad you sorted it out, although shame on AMD for the time and money you spent on their mess.


----------



## evgeny8652

It's a pity that everyone is so fixated on these processors. So the question is: what to do now? Can anyone say?


----------



## alexartwww

evgeny8652 said:


> It's a pity that everyone is so fixated on these processors. So the question is: what to do now? Can anyone say?


Replaced 3950X with 5950X = WHEA and reboots

I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP. 5900x Asus Crosshair VIII Hero (wifi) UEFI: 3801 eVGA...

www.overclock.net


----------



## Daylight_Invader

alexartwww said:


> Replaced 3950X with 5950X = WHEA and reboots
> 
> I'm sorry to report that I am now also having this issue on an Asus Crosshair VIII Hero (wifi) latest bios and several previous iterations. Purchased this 5900x in Jan, been working fine at stock settings, and my Gskill at 3600 using DOCP. 5900x Asus Crosshair VIII Hero (wifi) UEFI: 3801 eVGA...
> 
> www.overclock.net


I had memory issues myself, and I know from Asus that a bug exists from BIOS 3601 onwards with my 3600 CL16 RAM at DOCP. Try setting a positive SoC voltage offset of 0.00625 V. I was unable to boot without it. Apparently this issue will be fixed with AGESA 1.2.0.7. I'm not sure this is definitely the cause of your issues, but it certainly was for me.


----------



## nado4ilhas

Guys, I've read the thread so far and there were a lot of CPU problems at launch. Does anyone have a 5800X marked 2043 SUS that works without failures, or does everyone have a problem?


----------



## N2Gaming

If the CPU in question is fairly new you should be OK. I got a 5800X recently on Amazon and it's been flawless.


----------



## rob-tech

N2Gaming said:


> If the cpu in question is fairly new you should be ok. I got a 5800x recently on Amazon and it's been flawless.


It is still a lottery. Only my second 5950X was stable at stock; the first unit was garbage: it failed CoreCycler and caused USB stuttering unless the SoC was overvolted. Both CPUs were B2 stepping and from late production batches. The binning also varies noticeably: my second, fully stable 5950X returns Cinebench R23 nT scores about 2.5% lower than the first, unstable unit (this is, however, probably normal and within silicon production variation).

Either way, there is no consistent quality, so I would be prepared to exchange many times and would buy from a retailer that won't cause you problems.

I will definitely go Intel for a future build as they can't possibly be as bad as AMD with regards to binning and tolerances.


----------



## ghiga_andrei

nado4ilhas said:


> Guys, I've read the thread so far and there were a lot of CPU problems at launch. Does anyone have a 5800X marked 2043 SUS that works without failures, or does everyone have a problem?


2043 SUS is a very early sample, so the chance of it being unstable is very high. A lot of people here, including myself, exchanged CPUs multiple times until they got a stable one. My stable 5900X now starts with 21xx; I got it from AMD RMA. I don't remember the production week, but it is Jan-Feb 2021.

But to be fair about the statistics, people whose CPUs have no problems do not read this thread.
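
To compare production dates like the ones above, here is a rough Python sketch that decodes a marking such as "2043 SUS". It assumes the first four digits are a two-digit year followed by a week number, and that the week follows ISO numbering; both are assumptions, and the function names are hypothetical:

```python
import datetime
import re


def parse_batch_code(code):
    """Split a marking such as '2043 SUS' into (year, week, suffix).

    Assumes the four leading digits are YYWW (two-digit year, then week).
    """
    match = re.fullmatch(r"\s*(\d{2})(\d{2})\s*([A-Za-z]*)\s*", code)
    if match is None:
        raise ValueError(f"unrecognised batch code: {code!r}")
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in batch code: {code!r}")
    return year, week, match.group(3).upper()


def week_start(year, week):
    """Return the Monday of the given week, assuming ISO week numbering."""
    return datetime.date.fromisocalendar(year, week, 1)


if __name__ == "__main__":
    year, week, suffix = parse_batch_code("2043 SUS")
    print(year, week, suffix, week_start(year, week))
```

Under those assumptions, 2043 decodes to week 43 of 2020 (the week starting 19 October 2020), which matches the description of it as a very early sample.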


----------

