# WHEAService, WHEA errors suppressor - unleash Ryzen processor high FCLK without performance penalties



## ManniX-ITA

*Doesn't work with Windows 11.
Either it's a bug (I have reported it) or Microsoft decided for you that it's better to always get WHEA errors whether you like it or not (more likely this one).*

Due to the high number of WHEA errors clogging the system, running a Ryzen with high FCLK incurs a performance penalty.
WHEAService will disable the WHEA error reporting; please be careful and check system stability.
System should become stable and smooth once the error sources are disabled.
It's a non-destructive and low risk solution; you can always disable the service or uninstall it and go back to the previous state.

*Some settings advice for high FCLK:*

[Official] AMD Ryzen DDR4 24/7 Memory Stability Thread (www.overclock.net)
*Ver. 1.2.0.0 Release*

You can download it from GitHub:

Release v1.2.0.0 Release · mann1x/WHEAService

Use the MSI installer and reboot.

Please check the README on GitHub.

A brief explanation of what it does and doesn't do:

- It suppresses WHEA error reporting from the WMI sources
- It's not going to magically fix instabilities or improve performance
- It will improve performance when you get thousands of WHEA errors per minute under load and the Event Log consumes a lot of resources processing them
- You'll be able to see the other events logged in the System Log, not just WHEA errors
- It's not going to improve performance if the WHEA 19 correctable errors are themselves causing performance degradation
- It's not going to stop the system crashing with WHEA 18 unrecoverable errors

The best, and almost the only, way to check for performance regressions or improvements over a lower FCLK is the Monero xmr-stak-rx miner:

Releases · fireice-uk/xmr-stak (github.com)

You need to configure it properly, as if you really wanted to mine with it, so follow the instructions.

Run it with the command line:

`xmr-stak-rx.exe --noTest --noAMD --noNVIDIA`

Press 'h' to gather the throughput:

Configure it, run it for at least 5 minutes and read the throughput for ALL at (2), the last 60 seconds.
Even better, if you can withstand the temperatures, let it run for 15 minutes and take the result at (3), which covers the last 15 minutes.
The 15-minute result is better for comparison.
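Below is a minimal Python sketch of the comparison step. It assumes you copy the ALL total at (3) by hand from two runs: a baseline run (e.g. FCLK 1900) and a run at the higher FCLK. The function name and the figures are hypothetical, not real xmr-stak results.

```python
def percent_change(baseline_hs: float, candidate_hs: float) -> float:
    """Relative change of the candidate hashrate versus the baseline, in percent."""
    if baseline_hs <= 0:
        raise ValueError("baseline hashrate must be positive")
    return (candidate_hs - baseline_hs) / baseline_hs * 100.0


# Hypothetical 15-minute ALL totals in H/s, typed in from the 'h' report.
baseline_fclk1900 = 10000.0
candidate_fclk2000 = 9800.0

delta = percent_change(baseline_fclk1900, candidate_fclk2000)
# A clearly negative delta points to a performance regression at the higher FCLK.
print(f"Change vs baseline: {delta:+.1f}%")
```

Using the 15-minute figures for both runs keeps the comparison like-for-like and smooths out short-term fluctuations.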

These are the results you should get if there's no performance regression:

Enjoy!


----------



## kairi_zeroblade

So basically, it just disables reporting of WHEA events?? Just a cosmetic change to the event logs?? WHEA errors slow the system down and sometimes crash it, so if I understand it right, it's still likely to get crashes from time to time with this installed??


----------



## Veii

kairi_zeroblade said:


> So basically, it just disables reporting of WHEA events?? Just a cosmetic change to the event logs?? WHEA errors slow the system down and sometimes crash it, so if I understand it right, it's still likely to get crashes from time to time with this installed??


This exists to combat WHEA #19, which is IO-related and triggers 7+ events per second.
Logically, you can hide instability with it.
Its main goal is to fix the flood of DPC calls created by a kernel event that spams wrongly.

As for Ryzen, auto-correction happens and stability is kept up at any cost.
It can be used to fake stuff, but its goal is a better-intentioned one:
fixing an ongoing issue with IO and false WindowsHardwareError - Kernel reports,
so the system remains responsive without the continuous loop of meaningless errors.

It's not only a visual change; it fully disables error logging, so responsiveness is kept up.
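To put @Veii's figure in perspective, here is a small Python sketch that turns a list of event timestamps into a rate. It assumes the timestamps have already been exported from the System log (how is not shown). The function name and the sample data are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def event_rate(timestamps: list[datetime]) -> tuple[float, float]:
    """Return (events per second, events per minute) across the time span of the events."""
    ordered = sorted(timestamps)
    if len(ordered) < 2 or ordered[-1] == ordered[0]:
        raise ValueError("need at least two events at different times")
    span_s = (ordered[-1] - ordered[0]).total_seconds()
    per_second = (len(ordered) - 1) / span_s  # intervals between events / elapsed time
    return per_second, per_second * 60


# Hypothetical burst: one WHEA 19 event every 1/7 s for 10 s (71 events, 70 intervals).
start = datetime(2021, 6, 1, 12, 0, 0)
samples = [start + timedelta(seconds=i / 7) for i in range(71)]
per_s, per_min = event_rate(samples)
print(f"{per_s:.1f} events/s = {per_min:.0f} events/min")
```

At 7 events per second that is 420 events per minute, or 25,200 per hour, each one a new entry in the System log.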


----------



## ManniX-ITA

kairi_zeroblade said:


> So basically, it just disables reporting of WHEA events?? Just a cosmetic change to the event logs?? WHEA errors slow the system down and sometimes crash it, so if I understand it right, it's still likely to get crashes from time to time with this installed??


Indeed, as explained by @Veii, the system will be kept responsive; it's not a cosmetic change.

I can't run with WHEA errors flowing at high rate, after a while my system gets slow and unresponsive with explorer stalling etc.

This is LatencyMon now at FCLK 2033 MHz on my bloated Windows install, running Chrome with about 400 tabs and lots of other stuff:

Note the average interrupt and process latency:

On my testing and benching install I have an average of 6 microseconds process latency with the highest around 70.


----------



## Yuke

I don't understand it but all i can say is that system is stable (Y-Cruncher, Karhu) and gaming is smooth (BL3, Yakuza).

I'm running more sensible settings for now but will try my CL14 settings at some point for sure.


----------



## domdtxdissar

I'm sure it's me being stupid, but how do I actually run this?


----------



## ManniX-ITA

domdtxdissar said:


> I'm sure it's me being stupid, but how do I actually run this?


On GitHub the download for the MSI installer is in the Releases section.
Sorry, I considered it obvious but it's not of course...

You downloaded the source code; go here instead:

Releases · mann1x/WHEAService (github.com)

And download the MSI file.


----------



## ManniX-ITA

@Veii 

I don't want to speak too early but it's something I'd like others to report.

I know you don't have USB issues; neither do I, no big ones, but some small and frequent annoyances (worse with 1.2.0.2):

- USB Hub, 7-port powered, randomly resetting at Windows startup
- Fingerprint scanner switching off when reading and bringing down half of the aforementioned Hub
- Logitech G935 Headset not present at boot, distorted audio, muted mic; needing a re-plug

I was expecting a high risk that it could get worse running at high FCLK.
Instead, I haven't had a single issue since then, not even running at 2066 MHz.

Could it be that USB actually works better than at FCLK 1900?

I'd be surprised...
But if someone had similar small sporadic issues and is noticing an improvement, please let me know.


----------



## Fight Game

If I turn it up too high, I'm still getting WHEA errors after installing. I thought maybe I did it wrong, so I was going to read the README, but I don't see it. Edit: and it's not so high that it causes instability. I'm running a few stress tests and it's doing fine. But I'm still getting WHEA errors.


----------



## Yuke

ManniX-ITA said:


> @Veii
> 
> I don't want to speak too early but it's something I'd like others to report.
> 
> I know you don't have USB issues; neither do I, no big ones, but some small and frequent annoyances (worse with 1.2.0.2):
> 
> - USB Hub, 7-port powered, randomly resetting at Windows startup
> - Fingerprint scanner switching off when reading and bringing down half of the aforementioned Hub
> - Logitech G935 Headset not present at boot, distorted audio, muted mic; needing a re-plug
> 
> I was expecting a high risk that it could get worse running at high FCLK.
> Instead, I haven't had a single issue since then, not even running at 2066 MHz.
> 
> Could it be that USB actually works better than at FCLK 1900?
> 
> I'd be surprised...
> But if someone had similar small sporadic issues and is noticing an improvement, please let me know.


I will watch for USB soundcard disconnects over the next few days (at least I didn't have one this morning after boot-up)


----------



## ManniX-ITA

Fight Game said:


> If I turn it up too high, I'm still getting WHEA errors after installing. I thought maybe I did it wrong, so I was going to read the README, but I don't see it. Edit: and it's not so high that it causes instability. I'm running a few stress tests and it's doing fine. But I'm still getting WHEA errors.


You could have an issue with the IOD not being able to run at high FCLK.
Can you post the general and details tab contents of the event message?


----------



## ManniX-ITA

Findings so far:

- Running all Sandra tests: reboot in the middle at 2033 MHz; switched back to 2000 MHz to adjust settings
- *Voltages must be adjusted*; it's not easy with AGESA 1.2.x (it was much better on 1.1.x), with performance regressions
- *Had to increase CCD* from 1050 to 1080, regained performance
- Lowered SOC voltage to the bare minimum, for me 1.14V, testing how low I can go; lost a bit on MT, will try 1.15-1.16V.
- *Very interesting: couldn't pass CoreCycler or OCCT with aggressive CO counts at VSOC 1.175-1.2V; now it's flawless at 1.14V!* Probably have to re-tune...

- _Still zero USB issues_
- _Gaming is absolutely silky smooth_, like I can't even recall, WWZ headshots never missing the target
- For the first time in more than 6 months *I could play WWZ without the USB input lagging* at some point; have to check with a longer session, but it's promising


----------



## Asmodian

So instead of using settings that do not cause errors you simply suppress them?

Why would I want to run settings that generate WHEA errors even if I can ignore them so they don't cause slowdowns as they pile up? This fixes a side effect of the WHEA errors, but they are caused by real hardware errors that no software is going to fix.

Am I missing something?


----------



## ManniX-ITA

Asmodian said:


> Am I missing something?


Yes, speculation (so far) is that the issue could be the Realtek EFI firmware for the ethernet LAN.
I can't map all these errors to any real fault.
I'm rock solid at 2000 MHz; actually it's more stable than at 1900 MHz.
There are for sure other reasons why you get WHEA and/or can't run above or at 1900 MHz.
But the guess is that the majority has no real issues going up, as expected by AMD.
Just this unmanageable flood of meaningless messages.

It's a huge risk and whoever is not ready to pay the consequences should not engage in it.

My advice is to prepare a USB stick with a Windows To Go install and run your own benchmarks and stress tests to understand whether it's worth it.
Then decide whether it's feasible or not for your daily setup.

BTW I'm testing the CO counts again with CoreCycler and I can already set 2-3 counts less on some cores with 0.25V less on VSOC.
Seems this processor was born to run at FCLK 2000


----------



## Hibbing

Still seeing WHEA even after installing.


----------



## Asmodian

ManniX-ITA said:


> Yes, speculation (so far) is that the issue could be the Realtek EFI firmware for the ethernet LAN.
> I can't map all these errors to any real fault.


Have you tried disabling the Realtek LAN to see if the issues go away without the WHEA suppressor?


----------



## ManniX-ITA

Hibbing said:


> Still seeing WHEA even after installing.


Can you see these logs?

Can you also provide a screenshot of the WHEA error event?


----------



## ManniX-ITA

Asmodian said:


> Have you tried disabling the Realtek LAN to see if the issues go away without the WHEA suppressor?


Just disabling it doesn't work.
We are going to try with a modded BIOS without the EFI module.
Otherwise the LAN card is loaded and enabled at POST to support boot via network etc


----------



## ManniX-ITA

BTW, as you can see from the screenshot above, the system will still log some WHEA errors before the service starts and silences them.
That's normal behavior; if you get more WHEA after then it's not.
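The check described here can be sketched in Python. Given WHEA event records as (timestamp, event ID) pairs and the time the service started, it counts the ones logged afterwards. The record layout, the function name and the timestamps are hypothetical. Exporting the records from the System log is assumed and not shown. By default only event ID 19 is counted, since that is the flow this thread is about.

```python
from __future__ import annotations

from datetime import datetime


def whea_after_start(events: list[tuple[datetime, int]], service_start: datetime,
                     event_ids: frozenset[int] = frozenset({19})) -> int:
    """Count WHEA events with the given IDs logged after the service started."""
    return sum(
        1
        for logged_at, event_id in events
        if event_id in event_ids and logged_at > service_start
    )


# Hypothetical boot sequence: the service starts 30 s in.
service_start = datetime(2021, 6, 1, 8, 0, 30)
records = [
    (datetime(2021, 6, 1, 8, 0, 5), 19),   # before the service: expected
    (datetime(2021, 6, 1, 8, 0, 12), 19),  # before the service: expected
    (datetime(2021, 6, 1, 8, 0, 25), 19),  # before the service: expected
]
late = whea_after_start(records, service_start)
print("normal" if late == 0 else f"{late} WHEA event(s) after service start: not normal")
```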


----------



## Asmodian

ManniX-ITA said:


> Otherwise the LAN card is loaded and enabled at POST to support boot via network etc


Is it? You cannot boot via network with the LAN disabled in BIOS, so that seems weird.

What made you think it is something to do with the Realtek LAN?


----------



## ManniX-ITA

Asmodian said:


> What made you think it is something to do with the Realtek LAN?


Not my investigation, we'll see if it's the real culprit.
The BIOS contains the EFI UNDI module for the Realtek NIC and will initialize it whether it's disabled or not.
Disabling it only prevents it from being exposed to the OS.


----------



## weleh

Interesting topic.

I'll be trying this today, but can I ask how you tested for performance?


----------



## ManniX-ITA

weleh said:


> Interesting topic.
> 
> I'll be trying this today, but can I ask how you tested for performance?


Take reference benchmarks with everything you can to compare
Best for a quick check is GB5, then all the rest

On my 5950X it was extremely tough to reach both performance and stability at FCLK 2033
But it seems I did it
I have to re-tune the counts since it seems I can go lower now

But if you don't want to struggle use FCLK 2000
Easy peasy for me, stable without fuss at super low voltages with very few performance drops to fix


----------



## weleh

Tested 2000 FCLK at 4000c15. Had some performance issues, but it seems strong now; however, I've lost some MT performance. Single-core performance is there.

Geek5/3 showed improvements over my 3800c14 profile.

Had to increase VSOC to around 1.17V, which reads 1.145V in Windows; also had to increase VDDG IOD to 1.06V to eliminate sound crackles and performance degradation.

Increased CCD too, just in case.

Got to do some gaming to see if there are any hiccups.
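For readers comparing profiles such as 4000c15 and 3800c14, here is a short Python sketch of the usual DDR4 arithmetic. DDR transfers twice per clock, so DDR4-4000 runs MCLK at 2000 MHz, which is 1:1 with FCLK 2000. CAS latency in nanoseconds is CL × 2000 / data rate (MT/s). The function names are hypothetical. This covers CAS latency only, not the full memory latency AIDA reports.

```python
def mclk_mhz(data_rate_mts: int) -> float:
    """DDR memory clock in MHz: two transfers per clock cycle."""
    return data_rate_mts / 2


def is_one_to_one(fclk_mhz: int, data_rate_mts: int) -> bool:
    """True when FCLK equals MCLK (coupled 1:1 mode)."""
    return fclk_mhz == mclk_mhz(data_rate_mts)


def cas_ns(cl: int, data_rate_mts: int) -> float:
    """CAS latency in ns: CL cycles of a clock with a period of 2000 / data_rate ns."""
    return cl * 2000 / data_rate_mts


for fclk, rate, cl in [(1900, 3800, 14), (2000, 4000, 15)]:
    print(f"DDR4-{rate} CL{cl} @ FCLK {fclk}: "
          f"1:1={is_one_to_one(fclk, rate)}, CAS {cas_ns(cl, rate):.2f} ns")
```

With these inputs both profiles are 1:1. CAS latency comes out at about 7.37 ns for 3800c14 versus 7.50 ns for 4000c15.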


----------



## weleh

High VDIMM just to rule out RAM issues and make sure the rest is stable


----------



## weleh

Consistent results on AIDA too on my daily OS.


----------



## ManniX-ITA

Did you ever have small USB issues?
I had some and so far they've disappeared.

It's impressive: ZenTimings opens in a fraction of the time.
Gaming for me has never been so smooth; I don't know, maybe I'm imagining things...
I've run some benchmarks but made the mistake of not rerunning them at FCLK 1900 with the new CO counts.
Can't compare them 1:1

Found out an issue on the memory profile... drove me crazy.
tRDWR/tWRRD at 8/3 were wrong, was 11/1.
Had to rebuild the profile and couldn't POST anymore.


----------



## DeletedMember558271

So what happens if you get your modded BIOS to disable Realtek and it doesn't change anything about the WHEA errors?
Still ignored even though we know nothing about the source?
Right now everyone is disabling WHEA without actually knowing or having any confirmation of what's causing it, just a completely blind decision.
Even the boards that are "supposed" to be WHEA-free and don't have Realtek, aren't being WHEA-free for everyone.
Like Blameless with his ASRock B550 Phantom Gaming-ITX/ax, Intel I225-V 2.5G, a board that's "supposed" to be WHEA-free.
Obviously something else is or can be going on, and it's not just Realtek, unfortunately it seems.
Even if the BIOS does clear WHEA for you, I don't know if that makes anything much clearer, when there are other knowledgeable people without Realtek who get WHEA at 1933+.
Situation seems more complicated.


----------



## ManniX-ITA

Dreamic said:


> Obviously something else is or can be going on, and it's not just Realtek, unfortunately it seems.


As said already there are other possible causes and sources.
It's not the only EFI that could trigger errors, and there's always the IOD, which may not be able to handle it.
It seems many others could cause problems (AMD NVMe has been given as an example), and there are also some design issues that would require physical rework of the motherboard.
The theory, which so far is only speculation (at least for me, until I can prove it), is that the Realtek EFI is the most common and severe root cause.
It could also be that it's not; there are a few cases with non-RTL NICs that have similar WHEA issues, and also some with RTL NICs that don't have WHEA at all.
But these are all edge cases. The majority seems to have an extremely high flow of WHEA for apparently no reason.

I think at this point, since AMD is silent (although it seems they are trying to do something in the background), it's time for everyone to decide for themselves what to do.
Being a free man in a free country, I like to be given options, not to be dictated to.
There's a huge risk behind suppressing errors that could be useful, it's clearly stated and plainly obvious.
If you do something like this without understanding the consequences, I can't help 

I'm not just throwing the ball and watching the disaster.
I'm taking my own risks by switching my daily profile to silenced errors.
I'm not doing it out of the blue; I've been benching and stressing with WHEA silenced for months.
I'm pretty confident that in my case these errors are total bogus.
So far it has been a pleasant experience: smooth system, a nice bump in performance and no more USB issues.
Hope will be the same for everyone but of course there will be exceptions.


----------



## 1devomer

ManniX-ITA said:


> BTW I'm testing the CO counts again with CoreCycler and I can already set 2-3 counts less on some cores with 0.25V less on VSOC.
> Seems this processor was born to run at FCLK 2000





ManniX-ITA said:


> I'm pretty confident that in my case these errors are total bogus.
> So far it has been a pleasant experience: smooth system, a nice bump in performance and no more USB issues.
> Hope will be the same for everyone but of course there will be exceptions.



TLDR

Before even thinking of disabling the WHEA error logging system, stop for a second.
Think about whether it would be better to RMA the CPU first, as everybody else is currently doing.


----------



## ManniX-ITA

1devomer said:


> Before even thinking of disabling the WHEA error logging system, stop a second.
> Think about whether it would be better to RMA the CPU first, as everybody else is currently doing.


It's an option, of course, but it only worked for some.
Of the people I've seen sending back a 5900X/5950X, the best most of them got was one without failing cores that boots at FCLK 1900.
Those who got something booting above 1900 without WHEA were mostly 5600x/5800x and very few.
I'm thinking myself about doing it since I still have the 3800x.
But I'm more inclined to wait till the last moment hoping the manufacturing has improved over time.


----------



## mongoled

1devomer said:


> TLDR
> 
> Before even thinking of disabling the WHEA error logging system, stop a second.
> Think about whether it would be better to RMA the CPU first, as everybody else is currently doing.


Don't give up, do you?

😂 😂


----------



## umeng2002

After upping my VSOC to 1.1 on my 5800X, my WHEA event 19 warnings (not errors) are gone. Why hide the issue? This is with IF at 1867 MHz.


----------



## ManniX-ITA

umeng2002 said:


> After upping my VSOC to 1.1 on my 5800X, my WHEA event 19 warnings (not errors) are gone. Why hide the issue? This is with IF at 1867 MHz.


Because those were legitimate errors; this is meant to suppress the high-rate flow of event 19, which is triggered at FCLK 1900+.
In some cases this flow doesn't mean there's an issue.


----------



## weleh

ManniX-ITA said:


> Did you ever have small USB issues?
> I had some and so far they've disappeared.
> 
> It's impressive: ZenTimings opens in a fraction of the time.
> Gaming for me has never been so smooth; I don't know, maybe I'm imagining things...
> I've run some benchmarks but made the mistake of not rerunning them at FCLK 1900 with the new CO counts.
> Can't compare them 1:1
> 
> Found out an issue on the memory profile... drove me crazy.
> tRDWR/tWRRD at 8/3 were wrong, was 11/1.
> Had to rebuild the profile and couldn't POST anymore.


I've never had USB issues on any of my boards / cpu


----------



## weleh

Anyway, I'm not seeing regression at 2000 fCLK.
MT was a bit lower than usual on GB5 but other benches showed same/higher performance, CB23, GB3, CPU-Z.
Memory scaling benches increased performance as expected. 
I'm not sure if the loss of MT performance is tied to SOC voltage increase or something else.


----------



## weleh

Dropped VSOC a bit more and performance seems to be stable.
AIDA consistently at 50.6 to 50.7 ns
Bandwidth is very consistent too.


----------



## DeletedMember558271

ManniX-ITA said:


> It could also be that it's not; *there are a few cases with non-RTL NICs that have similar WHEA issues*, and also some with RTL NICs that don't have WHEA at all.
> But these are all edge cases. The majority seems to have an extremely high flow of WHEA for apparently no reason.


Only a few cases with non-RTL NICs? I'm not sure just how guaranteed people are not to have WHEA if they buy something like a ROG Strix B550; are most of those people all fine? If so, I should just buy something like an Asus B550-A. Makes me think about it, if this is true.


ManniX-ITA said:


> It's an option, of course, but it only worked for some.
> Of the people I've seen sending back a 5900X/5950X, the best most of them got was one without failing cores that boots at FCLK 1900.
> *Those who got something booting above 1900 without WHEA were mostly 5600x/5800x and very few.*
> I'm thinking myself about doing it since I still have the 3800x.
> But I'm more inclined to wait till the last moment hoping the manufacturing has improved over time.


So they had fine motherboards/NICs but CPUs so bad they couldn't take advantage of it? How would I know if I have/get a good motherboard/NIC if my CPU is also bad causing WHEA, could think I keep getting bad boards/NICs when it's CPU... AMD... why

This is too annoying


----------



## weleh

You shouldn't see FCLK as a given; it's an overclock past the stock specifications, and I have yet to find a CPU on new AGESAs that can't do stock FCLK.

This is just another tool to help those that want to get everything out of their system and from my own testing this morning and yesterday, it works. Even if WHEAs are still happening at a hardware level, there's 0 impact so far after disabling the logging system.


----------



## ManniX-ITA

Dreamic said:


> Only a few cases with non-RTL NICs? I'm not sure just how guaranteed people are not to have WHEA if they buy something like a ROG Strix B550; are most of those people all fine? If so, I should just buy something like an Asus B550-A. Makes me think about it, if this is true.


I can only speak from what I see here and around.
I'm not AMD and not a test center, luckily.
It's hard to really make an assessment even when you get in touch with the sources.
You never know, maybe it's all fine and then you discover they run FCLK in desync.
It's easier, since there are a lot more people with this problem and willing to try, to see if the workaround works.



Dreamic said:


> So they had fine motherboards/NICs but CPUs so bad they couldn't take advantage of it? How would I know if I have/get a good motherboard/NIC if my CPU is also bad causing WHEA, could think I keep getting bad boards/NICs when it's CPU... AMD... why
> 
> This is too annoying


I agree, too many variables and AMD as always is foggy, hyped up, then shut up, doesn't clarify what's up...


----------



## ManniX-ITA

@weleh

MSI CPU LLC is "bugged"; my Cinebench score sinks if I set it to anything other than Auto.
Same thing on an X570 Unify. I think MSI did some trick to enhance CB scores, and it has drawbacks.
I didn't have a similar issue with the Aorus Master.

CPU LLC Auto is problematic because it seems to have high vdroop; therefore it negatively impacts a lot of other stuff.
I had to find a fine balance between VSOC, PWM, SOC LLC, VDDG CCD & IOD.
FCLK 2000 is not as forgiving as 1900.

I'll post the screenshots from the BIOS later.
Maybe there's something useful.

This is what I could do for now, but the CO counts are unstable.
More or less same or better than FCLK 1900.
Still have to tune the memory, is at 4000 CL15.



Spoiler: Benchmarks FCLK 2000


----------



## weleh

My geek 5 is fine. 

1800+ single core and 12000 multi


----------



## Fight Game

ManniX-ITA said:


> You could have an issue with the IOD not being able to run at high FCLK.
> Can you post the general and details tab contents of the event message?


----------



## Fight Game




----------



## ManniX-ITA

Fight Game said:


> (Screenshot of the WHEA event)


This is really weird; it must be really unstable.
It's missing all the information; it seems like the WHEA itself got corrupted...


----------



## ManniX-ITA

weleh said:


> My geek 5 is fine.
> 
> 1800+ single core and 12000 multi


Did you compare it?
You need to check every benchmark one by one.
Some of them could have better scores but others worse.
You need to fine-tune until you are higher, or within 2%, in every test.
But maybe you don't need to; check if some are red in the comparison.

Like this, setting a good bench at FCLK 1900 as the baseline:
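Below is a minimal Python sketch of that per-test check, assuming higher scores are better. It compares every subtest against the FCLK 1900 baseline and flags the ones more than 2% below it. The subtest names are Geekbench 5 tests, but the scores and the function name are made up for illustration. The values would be copied by hand from the comparison page.

```python
from __future__ import annotations

TOLERANCE = 0.02  # "higher or within 2%" of the baseline


def flag_regressions(baseline: dict[str, float], candidate: dict[str, float],
                     tolerance: float = TOLERANCE) -> dict[str, float]:
    """Return {subtest: percent change} for subtests more than `tolerance` below baseline."""
    flagged: dict[str, float] = {}
    for test, base in baseline.items():
        if test not in candidate:
            continue  # subtest missing from the candidate run; nothing to compare
        change = (candidate[test] - base) / base
        if change < -tolerance:
            flagged[test] = round(change * 100, 1)
    return flagged


# Hypothetical scores: baseline at FCLK 1900, candidate at FCLK 2000.
fclk1900 = {"AES-XTS": 3000.0, "Text Compression": 1500.0, "Camera": 2000.0}
fclk2000 = {"AES-XTS": 3010.0, "Text Compression": 1480.0, "Camera": 1900.0}
print(flag_regressions(fclk1900, fclk2000))
```

In this made-up example, Text Compression is about 1.3% lower and stays within tolerance. Camera is 5% lower, so it is flagged as needing more tuning.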


----------



## weleh

Yes,

There's still some MT performance to be had, but it's pretty much the same. No throttling.



Micro-Star International Co., Ltd. MS-7D13 vs Micro-Star International Co., Ltd. MS-7D13 - Geekbench Browser


----------



## 1devomer

ManniX-ITA said:


> It's an option, of course, but it only worked for some.
> Of the people I've seen sending back a 5900X/5950X, the best most of them got was one without failing cores that boots at FCLK 1900.
> Those who got something booting above 1900 without WHEA were mostly 5600x/5800x and very few.
> I'm thinking myself about doing it since I still have the 3800x.
> But I'm more inclined to wait till the last moment hoping the manufacturing has improved over time.



I can only warmly and kindly invite you to RMA your cpu, if it is awfully binned.
You never know, you could actually get a better cpu to review.

Nevertheless, you are right about waiting for the right time.
Citing the last AMD earnings call transcript, both Zen2 and Zen3 EPYC orders are going strong.
Which means that the best IOD/CCD are still mostly allocated to these segments.

Though I'm not sure this trend will change anytime soon, I'm also not sure that waiting too long would really pay off.
AMD could soon decide to cut back on the number of RMA units delivered.

Better safe than sorry.


----------



## ManniX-ITA

weleh said:


> Yes,
> 
> There's still some MT performance to be had, but it's pretty much the same. No throttling.
> 
> 
> 
> Micro-Star International Co., Ltd. MS-7D13 vs Micro-Star International Co., Ltd. MS-7D13 - Geekbench Browser


Looks good!


----------



## ManniX-ITA

1devomer said:


> I can only warmly and kindly invite you to RMA your cpu, if it is awfully binned.


It's not awful, but not the best either
I was waiting for AMD to fix FCLK before deciding, but they didn't
It can POST at 2067 MHz, which is not bad at all for a 5950X

Now I'm running at 2033 and it rocks
I've just redone the CO tuning, and at higher FCLK I can go down much more
Got back quite a bit of performance
I have to see first how it goes with better cooling, then I'll decide


----------



## DeletedMember558271

ManniX-ITA said:


> I can only speak from what I see here and around.
> I'm not AMD and not a test center, luckily.
> It's hard to really make an assessment even when you get in touch with the sources.
> You never know, maybe it's all fine and then you discover they run FCLK in desync.
> It's easier, since there are a lot more people with this problem and willing to try, to see if the workaround works.
> 
> 
> 
> I agree, too many variables and AMD as always is foggy, hyped up, then shut up, doesn't clarify what's up...


Yea, I asked in the only thread I could find here for Asus B550 without Realtek, and these were the 2 replies I got:


blu3dragon said:


> I think there is a good element of cpu silicon lottery with this. My 5800x won't post at 1900 and then has WHEA errors at anything over that.
> I settled on 1800 for daily use which is the highest I can go without raising SoC voltage.





drotaru said:


> Same for my 5900x: 1866 max with no WHEA, 1900 no boot at all, 1933 onwards WHEA errors
> 
> And I know for a fact that the board can do it as I could do it with my older 3600x


So non-RTL NICs aren't looking much more promising to me personally idk. Who knows how many Asus B550 ROG Strix boards I'd have to buy before I get one that works, and like them I still wouldn't be able to post at 1900 FCLK specifically as that's just a CPU thing it seems, need to RMA for 1900...
All this is making me wish Intel hadn't lost the lead to AMD at the time I was going to upgrade
Oh well


----------



## blu3dragon

Dreamic said:


> Yea, I asked in the only thread I could find here for Asus B550 without Realtek, and these were the 2 replies I got:
> 
> 
> So non-RTL NICs aren't looking much more promising to me personally idk. Who knows how many Asus B550 ROG Strix boards I'd have to buy before I get one that works, and like them I still wouldn't be able to post at 1900 FCLK specifically as that's just a CPU thing it seems, need to RMA for 1900...
> All this is making me wish Intel hadn't lost the lead to AMD at the time I was going to upgrade
> Oh well


I haven't tried the WHEA suppressor in this thread. Not sure I want to go that far though for a daily system 🙃
I expect the lottery is with the CPU itself rather than the motherboard, so if it's just a case of getting away from the Realtek NIC, you should only need one mobo.

It's not such a big deal unless you are chasing record memory bench results though. It seems pretty much all CPUs will hit FCLK 1800 / DDR 3600 and from there you can still tune memory timings and have the best performance out of any currently available CPU.


----------



## FightCat

Um hello, my question will most probably annoy you.

My CPU is a R9 5900x on an X570 mobo.

All I did so far other than setting XMP enabled in BIOS has been running CTR 2.1 RC5 and activating the profiles it created.

Do I benefit from using this tool or is this reserved for special purposes?

Thanks in advance.


----------



## Asmodian

FightCat said:


> Do I benefit from using this tool or is this reserved for special purposes?


Are you getting WHEA errors that cause slowdowns?

If not, this isn't useful for you.


----------



## ManniX-ITA

Dreamic said:


> So non-RTL NICs aren't looking much more promising to me personally idk. Who knows how many Asus B550 ROG Strix boards I'd have to buy before I get one that works, and like them I still wouldn't be able to post at 1900 FCLK specifically as that's just a CPU thing it seems, need to RMA for 1900...


It could be that the Realtek NIC is only one of many problems... or that it's not the root cause but a victim of something else.
But it seems Intel fixed something while Realtek didn't.

For me the important thing is that I can run at FCLK 2000 and it works faster and better than 1900.
I still haven't had a single issue with my USB peripherals, something I last experienced with my old pal, the i7-4770K.

Tried FCLK 2033 and 2067, but I have random performance issues at 2033 and awful performance at 2067.
Will check again, but honestly I'm not sure it's worth the effort since, in theory, I'm planning a new Gen4 GPU.


----------



## DeletedMember558271

ManniX-ITA said:


> It could be that the Realtek NIC is only one of many problems... or that it's not the root cause but a victim of something else.
> But it seems Intel fixed something while Realtek didn't.
> 
> For me the important thing is that I can run at FCLK 2000 and it works faster and better than 1900.
> I still haven't had a single issue with my USB peripherals, something I last experienced with my old pal, the i7-4770K.
> 
> Tried FCLK 2033 and 2067, but I have random performance issues at 2033 and awful performance at 2067.
> Will check again, but honestly I'm not sure it's worth the effort since, in theory, I'm planning a new Gen4 GPU.


I don't know what Intel fixed with the I225-V other than the connection problems it was having when they released the fixed B3 stepping.
I think everyone can run 1933+ and have it work faster and better if they ignore/disable WHEA. I can boot 2000/4000 and run AIDA and other things with way better results too.
The reality is I don't think a single person knows why they get WHEAs, or don't. It's all just complete guessing at this point.

And you can probably still do things to reduce the number of WHEAs you get, e.g. checking whether the system is happier with more voltage or not. I don't know whether these benchmarks and stability tests would pick up on that and steer you toward it very well. Do you think that if you disable WHEA and rely completely on these tests (and on making them better), the WHEAs would be reduced as much as possible if you re-enabled reporting afterwards? Because I think that should be part of the goal.
And if the tests are actually good, I think they should be able to pick up on that and steer you toward that result.


----------



## ManniX-ITA

Published a new release:

Release v1.1.0.1 Release · mann1x/WHEAService

Please uninstall the Alpha version, reboot, install the new version and reboot. Better management of Custom Event Source installation and uninstallation. Disabled Autolog to Application log. Fixed some i...

Not mandatory, as nothing functional changes.


----------



## weleh

Manni, do you know why we need to tune VDDGs and SOC and stuff to gain performance back at 2000 FCLK?
Or even tweak PBO limits?

Is there some sort of throttling going on?


----------



## kairi_zeroblade

When I tried this (the first release), the WHEA Event 19s were gone at 1933 MHz, though the slowdowns were still there... Dunno what I could be doing wrong, so I uninstalled it and returned to my stable profile (at 1900/3800 MHz the slowdowns are gone, and so is the spam of WHEA Event 19s).


----------



## ManniX-ITA

weleh said:


> Manni, do you know why we need to tune VDDGs and SOC and stuff to gain performance back at 2000 FCLK?
> Or even tweak PBO limits?
> 
> Is there some sort of throttling going on?


Running at higher FCLK is more demanding.
You can run at very low FCLK with ridiculously low voltages.
But the more you scale up, the more voltage is needed.
Not because of throttling, but because more power is needed to sustain it.

If performance is dropping or the results aren't repeatable, there could be throttling.
But it could also be that the voltages aren't high enough or the IOD/CCDs are getting unstable.
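
One way to put "results not repeatable" into numbers is to run the same benchmark several times and look at the spread. The sketch below is only an illustration. The coefficient-of-variation metric and the 1 % cut-off are assumptions chosen for the example, not a rule from this thread, and the scores are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(scores: list[float]) -> float:
    """Sample standard deviation divided by the mean (0.01 == 1 %)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to judge repeatability")
    m = mean(scores)
    if m == 0:
        raise ValueError("mean score is zero")
    return stdev(scores) / m

def is_repeatable(scores: list[float], max_cv: float = 0.01) -> bool:
    # max_cv = 1 % is an arbitrary illustrative threshold, not a known standard.
    return coefficient_of_variation(scores) <= max_cv

# Hypothetical scores from repeated runs of the same benchmark.
stable = [10120, 10105, 10131, 10118]
erratic = [10120, 9420, 10090, 8800]
print(is_repeatable(stable), is_repeatable(erratic))  # True False
```

If the spread at the higher FCLK is clearly wider than with the old profile, that matches the throttling / marginal-voltage case described above rather than a real gain.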




kairi_zeroblade said:


> When I tried this (the first release), the WHEA Event 19s were gone at 1933 MHz, though the slowdowns were still there... Dunno what I could be doing wrong, so I uninstalled it and returned to my stable profile...


Maybe nothing; this only works if you don't have performance degradation or instabilities.
If you do, it can't help.

On a Ryzen 3000, going to 1933 doesn't really change much; I would rather aim for 1967 if possible.
You may have to try higher VDDG and VSOC.
But if it works, that's pretty rare.

On a Ryzen 5000 it's better to aim for 2000; it doesn't change the voltages much but gives a nice boost.


----------



## weleh

You need to retune voltages (VDDGs, SOC, and maybe even PBO limits) to get performance back.


----------



## ManniX-ITA

weleh said:


> You need to retune voltages (VDDGs, SOC, and maybe even PBO limits) to get performance back.


Indeed, some may also need PLL tuning.
Luckily I don't need it up to 2000.
PBO limits I'm not sure about; for me the old ones still work best.
But increased VSOC and VDDG IOD mean a lot more negative counts are possible, so you have to retune CO.


----------



## kairi_zeroblade

ManniX-ITA said:


> Maybe nothing, this works only if you don't have performance degradation or instabilities.
> In case you do, it can't help.


So it really just catches the Event 19s then... but the instability is still there.



ManniX-ITA said:


> You may have to try with higher VDDG and VSOC.


I already tried tuning these up, and also cpu_vdd; still no dice.



ManniX-ITA said:


> On a 5000 better to aim for 2000, doesn't change too much on the voltages but gives a nice boost.


I'll find time to tune again over the weekend and see if something comes up.


----------



## DeletedMember558271

kairi_zeroblade said:


> So it really just catches the Event 19s then... but the instability is still there.


Yep, and nobody knows what is causing the WHEAs or instability.
And nothing they're doing after disabling them is probably helping to improve the stability of whatever it is.
I think almost everyone should be able to hit 2000/4000 and get similar performance boosts if they disable WHEA; I don't think it's anything special.
And anyone who thinks they're fine, or knows what it is right now, is going on complete blind faith based on a gut feeling.

*Edit*: Here's a good post, I think, that was made after I posted this:


RonLazer said:


> Dark Hero! Although it's called "Promontory Presence" for some reason. I can never fathom why BIOS programmers refuse to give anything a useful name.
> 
> @Veii I don't think your theory about WHEAs being caused by the Realtek chip holds water (open to evidence to prove otherwise though!). I was just looking at the WHEA 19 Events and I noticed the Memory Hierarchy Level is "3", which according to the Windows hierarchy map is the L3 cache. WHEA 18s (caused by undervolting, usually curve-optimiser being too low) usually report level 0, which is the CPU registers, and sometimes level 1 (instruction cache), which is exactly what you'd expect if the error was occurring in the CPU cores. An L3 cache error implies it was caught by the GMI bus ECC mechanism before passing through the CPU cores or the L1/L2 cache. Now this could well be faulty data from a malfunctioning chip, but the CBS settings indicate that the MCA uses error thresholding and the default is set to 10 (I'm not sure how often it triggers), but you can increase it up to 4094, I think. A high memory intensity workload would be flooding the L3 cache with data, and if some of it was getting corrupted in the transfer (due to unstable link speeds) then that would exceed the MCA error threshold and trigger a PIE event which is reported to the OS. The AMD documentation suggests this is the error:
> 
> 
> So yeh, this might be a case of Occam's Razor. The simplest explanation is that the infinity fabric is clocked too high and data sent along the bus is getting corrupted. It's likely happening in normal operation, but rarely enough to stay below the MCA error threshold, so it doesn't get reported.
> 
> I maintain that the root of the problem is due to the link-equalisation mechanism, where AMD/Hypertransport developed a protocol for synchronising the IF-PHY bridge interface up to 1900MHz (which appears to have been its maximum design spec), slapped together something functional but imperfect for 1900-2000MHz range in time for Zen3, but ultimately couldn't even get it to reliably synchronize until AGESA 1.1.8, decided it wasn't worth sinking extra effort into trying to push the link-equalisation past 2000MHz and so any higher frequencies re-use the protocol for 2000MHz. Maybe some chips have a remarkably stable interconnect and can function outside the specified range, but there's no hope of finding the hidden settings that will unlock performance outside this range. Maybe they will update the AGESA at some point to improve the IF-PHY training algorithm, but I wouldn't hold my breath.
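
The thresholding RonLazer describes can be illustrated with a toy counter. Corrected errors accumulate, and only every `threshold`-th one raises an interrupt that reaches the OS. This is a simplified model written for illustration, not AMD's register-level MCA implementation. The default of 10 and the 4094 ceiling are simply the figures quoted in the post.

```python
def threshold_interrupts(errors_per_interval: list[int], threshold: int = 10) -> list[int]:
    """Return how many threshold interrupts fire in each interval.

    Simplified model (an assumption): every corrected error increments a
    counter; each time the counter reaches `threshold`, one interrupt is
    raised and the counter starts again from zero.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    counter = 0
    fired = []
    for n in errors_per_interval:
        total = counter + n
        fired.append(total // threshold)  # interrupts raised in this interval
        counter = total % threshold       # leftover errors carried forward
    return fired

print(threshold_interrupts([1, 0, 2, 1, 0]))             # [0, 0, 0, 0, 0]
print(threshold_interrupts([500, 500]))                  # [50, 50]
print(threshold_interrupts([500, 500], threshold=4094))  # [0, 0]
```

In this model, rare errors never surface, while a heavy error stream produces a steady trickle of reports. Raising the threshold hides even the heavy stream.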


As soon as Veii came out with this big reveal about Realtek, people decided it's time to disable WHEA, and strangely nobody has been interested in, or in much of a hurry to, actually prove any of it; or they tried, found out they were wrong, and are just kinda quiet and trying to move on... spending all this time screwing around with Micron Rev.E for fun instead of investigating this any further... doesn't make any sense priority-wise to me.


----------



## ManniX-ITA

Dreamic said:


> Yep, and nobody knows what is causing the WHEAs or instability.


If we could, we would have done it already.



Dreamic said:


> And nothing they're doing after disabling them is probably helping to improve the stability of whatever it is.


I don't have stability issues unless I don't raise my voltages.
But that's normal; it's the same going from FCLK 1800 to 1900, etc.



Dreamic said:


> I think almost everyone should be able to hit 2000/4000 and get similar performance boosts if they disable WHEA; I don't think it's anything special.


This is wrong; it is special.
Many IO dies or cores can't handle it even with high voltages.
Up to FCLK 1900 everything is easy.
Consider that when you approach FCLK 2000, PCIe Gen4 can start having reliability issues.
Don't think it's easy; just booting into Windows doesn't mean anything.
Getting good memory performance is also pretty easy.
Good and repeatable CPU performance is not.



Dreamic said:


> As soon as Veii came out with this big reveal about Realtek, people decided it's time to disable WHEA, and strangely nobody has been interested in, or in much of a hurry to, actually prove any of it; or they tried, found out they were wrong, and are just kinda quiet and trying to move on...


There's nothing strange or quiet.
As I said already, it takes time, and proving anything is really hard and could be inconclusive.
Don't judge someone else's work on this topic unless you've done your own.



Dreamic said:


> *Edit*: Here's a good post, I think, that was made after I posted this


I answered Ron; I don't think his analysis is correct.


----------



## mongoled

ManniX-ITA said:


> But increased VSOC and IOD means a lot more negative counts, so you have to retune CO


Can you explain this? I'm unsure if I have understood you correctly.

Are you saying that the more you increase vSOC and IOD, the less voltage each CO step applies?


----------



## ManniX-ITA

mongoled said:


> Are you saying that the more you increase vSOC and IOD, the less voltage each CO step applies?


No, a negative CO count will make a core run at lower voltage and higher speed (mostly).

Let's say you have a core that runs at 1.4 V and 5000 MHz at -10.
You set it to -15 and it will run at 1.38 V and 5050 MHz.
But that's too much and it will crash.

If you raise VSOC and VDDG IOD, it might not crash when set to -15.
I've gained between 2 and 5 counts on every core.
But you really need a lot of VDDG IOD; I'm at 1130 mV now.

Plus you need a high PWM switching frequency on both CPU and SOC, and strong LLC on SOC.
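
The example numbers work out to an average step per CO count. The sketch below only does that arithmetic. It assumes a straight-line average between the two quoted operating points, which is an illustration, not how Curve Optimizer is guaranteed to behave on every core.

```python
def per_count_step(v1_mv: int, f1_mhz: int, co1: int,
                   v2_mv: int, f2_mhz: int, co2: int) -> tuple[float, float]:
    """Average voltage (mV) and frequency (MHz) change per CO count
    between two observed operating points (linear average, an assumption)."""
    counts = abs(co2 - co1)
    if counts == 0:
        raise ValueError("operating points must use different CO counts")
    return abs(v2_mv - v1_mv) / counts, abs(f2_mhz - f1_mhz) / counts

# The example above: -10 -> 1400 mV @ 5000 MHz, -15 -> 1380 mV @ 5050 MHz
mv, mhz = per_count_step(1400, 5000, -10, 1380, 5050, -15)
print(mv, mhz)  # 4.0 10.0
```

Integer millivolts are used to avoid floating-point noise in the subtraction.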


----------



## ManniX-ITA

On my 5950X, anything above a +100 MHz PBO boost clock override starts crashing cores under CoreCycler.
It's fine on everything else, including OCCT.
It may be a thermal limitation.
Start with a +0 boost clock and check with CoreCycler.
Once the cores are stable at the lowest count, start stressing with CoreCycler while raising the boost clock.
A lower count is much more valuable than more boost clock.
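
The order of operations above can be sketched as two loops: find the lowest stable Curve Optimizer count per core first, then tune the boost override. `is_stable` is a hypothetical callback standing in for a CoreCycler run; nothing here drives CoreCycler itself. The -30 floor, the 25 MHz step and the 200 MHz limit are assumed example values.

```python
from typing import Callable, Optional

# (core, co_count, boost_mhz) -> did the stress test pass?  Hypothetical.
StabilityTest = Callable[[int, int, int], bool]

def lowest_stable_count(core: int, is_stable: StabilityTest,
                        boost_mhz: int = 0, floor: int = -30) -> Optional[int]:
    """Step the CO count more negative until the test fails; return the last pass."""
    best = None
    for count in range(0, floor - 1, -1):
        if not is_stable(core, count, boost_mhz):
            break
        best = count
    return best

def highest_stable_boost(counts: dict[int, int], is_stable: StabilityTest,
                         step_mhz: int = 25, limit_mhz: int = 200) -> int:
    """With per-core counts fixed, raise the boost override while every core passes."""
    best = 0
    for boost in range(step_mhz, limit_mhz + 1, step_mhz):
        if not all(is_stable(core, c, boost) for core, c in counts.items()):
            break
        best = boost
    return best

# Made-up model: each 25 MHz of boost costs one count of margin.
def fake_test(core: int, count: int, boost: int) -> bool:
    return count >= {0: -20, 1: -12}[core] + boost // 25

counts = {core: lowest_stable_count(core, fake_test) for core in (0, 1)}
print(counts, highest_stable_boost(counts, fake_test))
```

In the made-up model, keeping the lowest counts leaves no room for extra boost. That mirrors the point that a lower count is worth more than boost clock.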


----------



## weleh

Dreamic said:


> Yep, and nobody knows what is causing the WHEAs or instability.
> And nothing they're doing after disabling them is probably helping to improve the stability of whatever it is.
> I think almost everyone should be able to hit 2000/4000 and get similar performance boosts if they disable WHEA; I don't think it's anything special.
> And anyone who thinks they're fine, or knows what it is right now, is going on complete blind faith based on a gut feeling.
> 
> *Edit*: Here's a good post, I think, that was made after I posted this
> 
> As soon as Veii came out with this big reveal about Realtek, people decided it's time to disable WHEA, and strangely nobody has been interested in, or in much of a hurry to, actually prove any of it; or they tried, found out they were wrong, and are just kinda quiet and trying to move on... spending all this time screwing around with Micron Rev.E for fun instead of investigating this any further... doesn't make any sense priority-wise to me.


Wrong, but you're free to prove otherwise.
I have zero instability running 4067 MT/s 1:1:1.

You're free to boot 4000 MT/s 1:1:1 and test it yourself. Not tuning voltages causes a huge degradation in CPU performance. This degradation can be so bad that even your memory performance will be trash.

Disabling the WHEA reporting service plus tuning voltages helps, because you no longer have to deal with WHEA spam clogging resources, and it lets you run at least at the same level of CPU performance as your non-WHEA setup.

In my own case, raw CPU performance is exactly the same, and for the memory part there is a gigantic boost in synthetics.
I gained about 5 GB/s Read and 3 GB/s Copy, cut 2 ns of latency, and in a synthetic like GB3 I'm now at a 10.1K memory score, single and multi, up from 9.6K with my 3800c14 setup.


----------



## ManniX-ITA

Published a new release:

Release v1.2.0.0 Release · mann1x/WHEAService

Added some useful logs listing the WHEA sources (you can use them to understand where the errors are coming from).


----------



## DeletedMember558271

ManniX-ITA said:


> If we could, we would have done it already.


I still think we could find out more.



ManniX-ITA said:


> I don't have stability issues unless I don't raise my voltages.
> But that's normal; it's the same going from FCLK 1800 to 1900, etc.


You have stability issues with whatever it is that's unstable and causing WHEAs, at least.



ManniX-ITA said:


> This is wrong; it is special.
> Many IO dies or cores can't handle it even with high voltages.
> Up to FCLK 1900 everything is easy.
> Consider that when you approach FCLK 2000, PCIe Gen4 can start having reliability issues.
> Don't think it's easy; just booting into Windows doesn't mean anything.
> Getting good memory performance is also pretty easy.
> Good and repeatable CPU performance is not.


I might be special then. I didn't really spend any time checking and validating CPU performance, but 4000/2000 took literally zero effort to boot the last time I tried, and I had significant AIDA gains.
Also, I don't think I've seen a single person who can't boot above 1900; it seems like everyone can boot 1933, 1966 or 2000, just with WHEAs.





weleh said:


> Wrong, but you're free to prove otherwise.
> I have zero instability running 4067 MT/s 1:1:1.
> 
> You're free to boot 4000 MT/s 1:1:1 and test it yourself. Not tuning voltages causes a huge degradation in CPU performance. This degradation can be so bad that even your memory performance will be trash.
> 
> Disabling the WHEA reporting service plus tuning voltages helps, because you no longer have to deal with WHEA spam clogging resources, and it lets you run at least at the same level of CPU performance as your non-WHEA setup.
> 
> In my own case, raw CPU performance is exactly the same, and for the memory part there is a gigantic boost in synthetics.
> I gained about 5 GB/s Read and 3 GB/s Copy, cut 2 ns of latency, and in a synthetic like GB3 I'm now at a 10.1K memory score, single and multi, up from 9.6K with my 3800c14 setup.


Wrong about what? You have WHEAs. You can't tell me where they're coming from; you can only hope and guess that whatever is unstable isn't something important.

Again, I don't think the results here are that special; they only look that way because 99.99% of people haven't tried this tool/disabling WHEA yet.
If everyone starts using this tool and barely anyone can accomplish what you have at 4000/2000, yeah, I guess you're special then.
Just because barely anyone else is trying this right now doesn't mean barely anyone else can accomplish it...

Don't think there's anything more to talk about right now.


----------



## ManniX-ITA

Dreamic said:


> I still think we could find out more.


For sure; I'm always trying to prove myself wrong.
There's a lot of research ongoing; maybe something good will come out of it.



Dreamic said:


> You have stability issues with whatever it is that's unstable and causing WHEAs, at least.


Yes, I meant system stability; there's definitely something wrong that is causing the WHEAs.



Dreamic said:


> I might be special then. I didn't really spend any time checking and validating CPU performance, but 4000/2000 took literally zero effort to boot the last time I tried, and I had significant AIDA gains.
> Also, I don't think I've seen a single person who can't boot above 1900; it seems like everyone can boot 1933, 1966 or 2000, just with WHEAs.


There are some.
I meant real performance, not AIDA latency. That's easy.
Then check 3DMark, Geekbench, Cinebench, Blender, etc.
If you really want to get 100% or more of the performance of FCLK 1900 on everything, not just a single benchmark, you need to work a bit.
The Infinity Fabric is really stressed, and the differences between samples are bigger.
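
"On everything, not just a single benchmark" can be checked mechanically: compare every benchmark of the new profile against the FCLK 1900 baseline and require no regressions. The sketch assumes higher-is-better scores. For time-based results such as a Blender render time, use 1/time instead. All numbers are made up.

```python
def relative_scores(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Candidate/baseline ratio per benchmark (higher-is-better scores assumed)."""
    mismatched = baseline.keys() ^ candidate.keys()
    if mismatched:
        raise ValueError(f"benchmarks not present in both runs: {sorted(mismatched)}")
    return {name: candidate[name] / baseline[name] for name in baseline}

def beats_baseline_everywhere(baseline: dict[str, float], candidate: dict[str, float],
                              min_ratio: float = 1.0) -> bool:
    """True only if no single benchmark falls below min_ratio of the baseline."""
    return all(r >= min_ratio for r in relative_scores(baseline, candidate).values())

# Hypothetical results: FCLK 1900 baseline vs an FCLK 2000 profile.
fclk1900 = {"3DMark CPU": 12000, "Geekbench MT": 17000, "Cinebench MT": 28000}
fclk2000 = {"3DMark CPU": 12300, "Geekbench MT": 17400, "Cinebench MT": 27600}
print(relative_scores(fclk1900, fclk2000))
print(beats_baseline_everywhere(fclk1900, fclk2000))  # False: Cinebench regressed
```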



Dreamic said:


> Wrong about what? You have WHEAs. You can't tell me where they're coming from; you can only hope and guess that whatever is unstable isn't something important.


My take is that if it were something important, I would have noticed by now, right?
The goal is to have a stable and performant system and it's achieved.

Now let me explain why I think that, if you get a _*high data rate flow of WHEA 19s*_, it is worth silencing them, and why they may not be related to the Infinity Fabric.
If you don't get a high rate of WHEAs, it's better not to silence them and instead look for settings that minimize or eliminate them.

The "normal" WHEA 19 usually comes from the NMI, a CPU-specific way of raising a hardware error.
This flow doesn't come from there but from an unknown device driver. I'm still looking for a way to identify who's sending it.

The Infinity Fabric uses GMI links, which are high-speed serial interfaces (HSI).
The most common HSI in the world is HDMI.
Same principle: there are 3 TMDS channels inside to transfer the pixels, and each one is a *high-speed serial link.*

Getting errors at these amazing speeds is pretty normal; the same goes for PCIe above Gen 2.
With PCIe Gen 5 and 6 it will be normal to have a very high error rate; the link will just be very fast and efficient at correcting them.

What happens when you have a very high number of corrected errors?
The first thing that happens is reduced bandwidth. It's obvious with a serial link: if the previous packet needs to be corrected, the flow slows down.
Error correction has a fixed capacity. Usually, getting a WHEA 19 means you're already above a threshold that can have an impact.
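
The bandwidth loss can be put into a back-of-the-envelope formula. If each packet is independently corrupted with probability p and simply resent until it arrives intact, a packet needs 1/(1-p) sends on average, so useful bandwidth shrinks to raw × (1-p). This is a simplified model chosen for illustration. It ignores replay latency, and a real replay mechanism that resends several packets per error would lose more.

```python
def expected_transmissions(packet_error_rate: float) -> float:
    """Mean number of sends per packet if every corrupted packet is resent."""
    if not 0.0 <= packet_error_rate < 1.0:
        raise ValueError("packet_error_rate must be in [0, 1)")
    return 1.0 / (1.0 - packet_error_rate)

def effective_bandwidth(raw: float, packet_error_rate: float) -> float:
    """Useful bandwidth left after retransmissions, in the same unit as `raw`."""
    return raw / expected_transmissions(packet_error_rate)

for p in (0.0, 0.01, 0.1, 0.5):
    print(p, effective_bandwidth(100.0, p))
```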

On HDMI it means you can't negotiate 1080p successfully and need to scale down to 1080i, which uses less bandwidth.
With a GPU on a bad riser cable, you'd see the bandwidth sink in a PCIe transfer test.

When you are really at the limit of the error correction capability, at some point it usually isn't enough.
It's a high-speed serial connection, with buffers on both the sending and the receiving side.
If the slowdown is severe enough, at some point one side or the other will end up in a buffer overrun or underrun.
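
A toy simulation of the sending side shows how a shrinking drain rate turns into an overrun. Data arrives at a fixed rate, the link drains it at a rate reduced by retries, and the buffer has a fixed capacity. All quantities are in arbitrary units per tick. This is an illustrative model only, and it does not cover the receive-side underrun.

```python
from typing import Optional

def ticks_until_overrun(capacity: float, inflow: float, raw_drain: float,
                        packet_error_rate: float, max_ticks: int = 10_000) -> Optional[int]:
    """Return the tick on which the send buffer overflows, or None if it never does."""
    drain = raw_drain * (1.0 - packet_error_rate)  # useful drain left after retries
    level = 0.0
    for tick in range(1, max_ticks + 1):
        level = max(0.0, level + inflow - drain)   # buffer cannot go below empty
        if level > capacity:
            return tick
    return None

print(ticks_until_overrun(10, 10, 10, 0.0))  # None: the link keeps up
print(ticks_until_overrun(10, 10, 10, 0.5))  # 3: retries halve the drain rate
```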

On HDMI that means artifacts on screen; on a GPU, usually a BSOD.
With a Ryzen CPU it usually means a WHEA 18 and/or a sudden reboot.

I experimented for a long time with running my 3800X at high FCLK, and I know how it behaves when the Infinity Fabric is under stress.
With a few WHEA 19s you can go on for months without any problem.
But if you start getting a steady flow, not hundreds per second, just 1-2 per second while running y-cruncher, it will crash 99% of the time.
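
To tell "a few WHEA 19s" from "a steady flow" or a flood, you need the peak event rate rather than the total count. A minimal sketch, assuming you already have the event timestamps in seconds (for example, converted from each event's creation time). The sliding window and the cut-off values in `classify` are hypothetical choices for illustration.

```python
from collections import deque

def peak_rate_per_second(timestamps: list[float], window_s: float = 60.0) -> float:
    """Highest number of events seen in any window of `window_s` seconds, per second."""
    window = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        # Drop events that are window_s seconds or more older than the newest one.
        while t - window[0] >= window_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak / window_s

def classify(rate_per_s: float) -> str:
    # Hypothetical cut-offs for illustration only.
    if rate_per_s < 0.1:
        return "sporadic"
    if rate_per_s < 100:
        return "steady flow"
    return "flood"

one_per_second = [float(i) for i in range(120)]
print(classify(peak_rate_per_second(one_per_second)))  # steady flow
```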

My theory is that if my 5950X were really experiencing such an astronomical error correction rate in the L3, on the GMI links, or anywhere in the IF, it would not even boot into Windows.
At the very least, I would see such a big performance degradation that anything above FCLK 1900 would actually be slower than 1900.
And in any case, after running real workloads for days, I would run into stability issues.

There are certainly instabilities in the IF above 1900, and you need to fix them as best you can with very finely tuned voltages.
At FCLK 2067 I can clearly see small performance degradations here and there.
That's a point where I wouldn't feel safe running it daily.
But even at that FCLK, the benefits of a higher IF clock outweigh the performance drops, and stability is not undermined.

WHEAService is only meant to fix the latency penalty when you have this high data rate of WHEAs,
which seems to be a very common trait for almost everyone.

Personally, I'm looking for the best FCLK 2000 profile I can get for daily use.
There's a nice performance uplift in gaming and system responsiveness, with zero USB issues.
But it's a risky overclock; I'll stress again that it is not safe and there's a high risk it could brick your system or corrupt your OS or data.


----------



## 1devomer

ManniX-ITA said:


> ...Getting errors at these amazing speeds is pretty normal; the same goes for PCIe above Gen 2.
> With PCIe Gen 5 and 6 it will be normal to have a very high error rate; the link will just be very fast and efficient at correcting them....



Is it a specification coming from the PCI-SIG consortium?
Do you have a link to the documents describing the error correction methods?


----------



## ManniX-ITA

Only for members, sadly:

PCIe® 6.0 Specification, Version 0.5: Now Available to Members | PCI-SIG

A while ago I read a very interesting series of articles by a guy working in PCIe validation.
They covered all sorts of details, including how important the mainboard design is with regard to the length and routing of the traces, etc.
But I'm not sure I bookmarked them; if I find them, I'll let you know.


----------



## 1devomer

ManniX-ITA said:


> Only for members, sadly:
> 
> PCIe® 6.0 Specification, Version 0.5: Now Available to Members | PCI-SIG
> 
> A while ago I read a very interesting series of articles by a guy working in PCIe validation.
> They covered all sorts of details, including how important the mainboard design is with regard to the length and routing of the traces, etc.
> But I'm not sure I bookmarked them; if I find them, I'll let you know.


I found the article describing PCIe 6.0:


> _"PCI Express® 6.0 Specification at 64.0 GT/s with PAM-4 signaling: a low latency, high bandwidth, high reliability and cost-effective interconnect"_


Alongside adopting PAM4 signaling, it also implements a low-latency FEC (forward error correction) method, at the PHY layer I suppose.
It was required to mitigate the high bit error rate inherent to PAM4 signaling.

In fact, the paper seems to take error correction management and reliability at high speeds very seriously.
I'm not sure it should be taken as lightly as you seem to believe.

I'm also not sure any of these new error correction methods for high-speed interconnects are implemented in the AMD 5000-series ∞ Fabric.
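
Why PAM4 raises the bit error rate is easier to see with the encoding itself. Each symbol carries two bits on one of four levels, so the same symbol rate moves twice the data, but the four levels share the voltage swing that NRZ splits into two. The sketch assumes a Gray-coded mapping, a common choice for PAM4 in which neighbouring levels differ by one bit. It is an illustration, not the exact PCIe 6.0 PHY logic.

```python
# Assumed Gray-coded PAM4 mapping: bit pair -> signal level.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam4_encode(bits: list[int]) -> list[int]:
    """Turn a bit sequence into PAM4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

bits = [0, 0, 0, 1, 1, 1, 1, 0]
print(pam4_encode(bits))  # [-3, -1, 1, 3]: 8 bits become 4 symbols
```

With Gray coding, a noise-induced slip to an adjacent level flips only one bit, which is part of what keeps the FEC's job manageable.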


----------



## mongoled

@ManniX-ITA 

Thanks for the updated module



What's the significance of the following?



> Failed to disable WHEA error source type 0 ID=1
> Failed to disable WHEA error source type 3 ID=3
> Failed to disable WHEA error source type 7 ID=4


----------



## ManniX-ITA

mongoled said:


> What's the significance of the following?


Nothing. For some sources the WMI query fails and they are left in the running state. Despite that, they stop sending WHEAs...



1devomer said:


> I'm not sure it should be taken as lightly as you seem to believe.


Oh my, I must be really bad at explaining myself.
Why do you believe so?
The core of the whole argument was exactly how vital error correction is for an HSI...


----------



## 1devomer

ManniX-ITA said:


> Why do you believe so?


Dunno, must be because disabling error-logging components is usually not a good idea.
Or maybe it's because I'm not accustomed to the idea that having errors is a normal and common thing.


----------



## mongoled

So......

Thanks to ManniX-ITA's excellent work, I can diagnose the following on my CPU running at 4133/2067.

On most boots I get the following:

Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*

And on rare occasions:

Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check - *1 Warning*
Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*

Once the system has booted, I rarely get a Type 1/3 warning but get spammed with thousands of Type 0 warnings.

Reading your explanation, it's the Type 0 warnings that don't seem to cause any sort of degradation to the system's hardware/software.


----------



## ManniX-ITA

1devomer said:


> Dunno, must be because disabling error-logging components is usually not a good idea.
> Or maybe it's because I'm not accustomed to the idea that having errors is a normal and common thing.


Indeed, it's not a good idea, nor is it normal or common.
It's a risky workaround for a specific use case.
Like all overclocks, it comes with high risk and needs a big leap of faith.
Same as when playing with LN2 or 2 V VDIMM.
You take risks, and you'd better know what you are doing.


----------



## DeletedMember558271

mongoled said:


> So......
> 
> Thanks to ManniX-ITA's excellent work, I can diagnose the following on my CPU running at 4133/2067.
> 
> On most boots I get the following:
> 
> Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
> Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*
> 
> And on rare occasions:
> 
> Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check - *1 Warning*
> Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
> Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*
> 
> Once the system has booted, I rarely get a Type 1/3 warning but get spammed with thousands of Type 0 warnings.
> 
> Reading your explanation, it's the Type 0 warnings that don't seem to cause any sort of degradation to the system's hardware/software.


So you're getting some WHEA CPU instability errors you definitely shouldn't be ignoring, and they both appear together when exceeding 1900 FCLK, yes?
Almost like they might be related to the same instability or something...

@ManniX-ITA, you have absolutely zero Error 1s or 3s, ever? Only ever seen Error 0s?
I hope people here are making sure to check every WHEA they get; even if you're getting hundreds or thousands, gotta hold that arrow key to scroll through fast and see if the ErrorSource ever changes from 0 to something else...
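
Instead of holding the arrow key, the ErrorSource field can be tallied programmatically. This is a minimal sketch under two assumptions. First, the WHEA-Logger Event 19 records were exported with something like `wevtutil qe System /q:"*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger'] and (EventID=19)]]" /f:xml`. Second, each record carries an `EventData/Data` element named `ErrorSource`, the field referred to above. Check one exported event before trusting the counts, and adjust the file encoding if your export differs. The file name in the comment is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def count_error_sources(xml_text: str) -> Counter:
    """Count WHEA events per ErrorSource value in exported event XML."""
    # The export is a sequence of <Event> elements with no single root,
    # so wrap it in one before parsing (assumes no XML declarations inside).
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    counts = Counter()
    for event in root.findall("e:Event", NS):
        for data in event.findall("e:EventData/e:Data", NS):
            if data.get("Name") == "ErrorSource":
                counts[(data.text or "").strip()] += 1
    return counts

# Example (hypothetical file name):
# with open("whea19.xml", encoding="utf-8") as f:
#     print(count_error_sources(f.read()).most_common())
```

Any key other than the dominant one in the result is a WHEA from a different source that would otherwise be buried in the scroll.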


----------



## DeletedMember558271

Oh... yea I think you pretty much admitted you do?


ManniX-ITA said:


> If I disable the WHEA Source for the high data rate flow of Error 19, I still get some spurious errors from other sources.
> I can't map them to anything as I usually can when something is wrong.
> *My conclusion is that when this high flow data rate starts bogging the system the whole WHEA error reporting system starts failing.*


You're also choosing to ignore WHEA errors that point directly to CPU instability, because you decided they're fake and appear out of nowhere due to all the errors pointing to a driver source (so we're supposed to believe the ones pointing to a driver source are trustworthy, but not the ones pointing to the CPU being unstable).

That's a lot of hoops to jump through and a lot of stretches, with things being selectively ignored or listened to based on what you want to believe.
The errors that back up what I want to believe are real, the errors that don't are fake...

Wonder what's going to happen when Ron disables his chipset and probably still gets all these WHEA errors; what do we say then...


----------



## ManniX-ITA

mongoled said:


> So......
> 
> Thanks to ManniX-ITA's excellent work, I can diagnose the following on my CPU running at 4133/2067.
> 
> On most boots I get the following:
> 
> Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
> Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*
> 
> And on rare occasions:
> 
> Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check - *1 Warning*
> Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt - *1 Warning*
> Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception - *100 Warnings*
> 
> Once the system has booted, I rarely get a Type 1/3 warning but get spammed with thousands of Type 0 warnings.
> 
> Reading your explanation, it's the Type 0 warnings that don't seem to cause any sort of degradation to the system's hardware/software.


You have to check the Source ID, not the Type.
In my case, almost all warnings come from ErrorSourceID 0:

ID: 0 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 1 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 2 Type: 1 State: Stopped Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 3 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 4 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source

That is WheaErrSrcTypeDeviceDriver, Type 16 (0x10).
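
Because the ID and the Type are different numbers, it helps to turn the source listing into a lookup table keyed by ID. This is a hypothetical helper, not part of WHEAService. It assumes the log uses exactly the `ID: … Type: … State: … Description: …` line format shown above.

```python
import re

# Matches lines like:
# ID: 0 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, ...
SOURCE_LINE = re.compile(
    r"ID:\s*(?P<id>\d+)\s+Type:\s*(?P<type>\d+)\s+State:\s*(?P<state>\w+)\s+"
    r"Description:\s*(?P<desc>.+)"
)

def parse_sources(log_text: str) -> dict[int, dict[str, object]]:
    """Map each error source ID to its type number, state and enum name."""
    sources: dict[int, dict[str, object]] = {}
    for line in log_text.splitlines():
        match = SOURCE_LINE.search(line)
        if match:
            sources[int(match["id"])] = {
                "type": int(match["type"]),
                "state": match["state"],
                "name": match["desc"].split("=")[0].strip(),
            }
    return sources

log = """ID: 0 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 1 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception"""
print(parse_sources(log)[0]["name"])  # WheaErrSrcTypeDeviceDriver
```

Looking up the ErrorSource value from an event (here 0) in this table gives the source type. Reading the Type column directly would suggest MCE instead.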

Can you please double-check?

In the past I did the same exercise as @RonLazer, trying to decode the WHEA 19 error, and came to the same conclusions.
I'm still looking for a way to decode the data, but I've made only a few steps toward the goal.

But it seems that, thanks to @1devomer's questions about error correction in the Infinity Fabric GMI links, *I've found a clue that could be important*.
BTW, GMI links are a superset of HyperTransport 3.1, which does per-packet CRC-32 error correction.

Reading the PPR, I couldn't really understand what the heck "Master Abort" really means.

There is a description, but it's vague about the consequences:

_Accesses to unimplemented registers of implemented functions are ignored: writes dropped; reads return 0. Accesses to
unimplemented functions also ignored: writes are dropped; however, reads return all F's. The processor does not log any
master abort events for accesses to unimplemented registers or functions.
Accesses to device numbers of devices not implemented in the processor are routed based on the configuration map
registers. If such requests are master aborted, then the processor can log the event._

But in the HyperTransport specs it is very well defined:

_A Master Abort indicates that a directed request *failed to find a device on the chain that would accept it*. Devices receiving a Master Abort response set the Received Master Abort bit in their Status CSR or (for bridges receiving the response on their secondary bus) their Secondary Status CSR. *Master Abort responses propagate through HyperTransport bridges in the same manner as Master Aborts through PCI bridges* – they are either converted to *normal (non-error) responses* or to Target Abort responses, depending on the state of the Master Abort Mode bit of the Bridge Control CSR. All Fs are returned as data for read responses._
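
The bridge behaviour in that quote can be restated as a small function. This is a paraphrase for illustration, not code from the HyperTransport spec. It assumes the PCI-bridge convention that a set Master Abort Mode bit means "report as Target Abort" and a clear bit means "convert to a normal response". The 32-bit data width is an example value.

```python
from enum import Enum
from typing import Optional

class Response(Enum):
    NORMAL = "normal (non-error) response"
    TARGET_ABORT = "target abort"

def forward_master_abort(is_read: bool, master_abort_mode: bool,
                         width_bits: int = 32) -> tuple[Response, Optional[int]]:
    """How a bridge passes on a Master Abort it received (paraphrase of the quote)."""
    # Per the quote, reads come back as all Fs; writes carry no data back.
    data = (1 << width_bits) - 1 if is_read else None
    # Assumed convention: mode bit set -> report as Target Abort, whose data
    # must not be used; mode bit clear -> silently convert to a normal response.
    response = Response.TARGET_ABORT if master_abort_mode else Response.NORMAL
    return response, data

print(forward_master_abort(is_read=True, master_abort_mode=False))
```

With the mode bit clear, a request to a non-existent target looks like a successful read of all Fs, which fits a bus being spammed without hard failures.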

The "Target Abort" and "Data Error" error responses are actually errors, as the target failed to 

_A Target Abort indicates that the device receiving the request took an error. *If the transaction was a read, the returning data cannot be used. If the transaction was a write, the target location must be assumed to have gone to an undefined state.* Devices receiving a Target Abort response set the Received Target Abort bit in their Status CSR or (for bridges receiving the response on their secondary bus) their Secondary Status CSR. Devices driving a Target Abort response set the Signaled Target Abort bit in their Status CSR or Secondary Status CSR, as appropriate. Target Abort responses pass through bridges as Target Abort responses._

_A Data Error indicates that the device receiving the request detected an *error in the data, such as a parity or ECC mismatch on another bus or memory.* If the transaction was a read, the returning data cannot be used. If the transaction was a write, the target location must be assumed to have gone to an undefined state. Devices with the Data Error Response bit set that receive a Data Error response will set the Master Data Error bit in their Status CSR or (for bridges receiving the response on their secondary bus) their Secondary Status CSR. Devices driving a TgtDone with Data Error will set the Data Error Detected bit in their Status CSR or Secondary Status CSR, as appropriate._

*But not "Master Abort"; it's like something is spamming the bus with messages to an unknown recipient.

This makes me think even more that my theory, that it's not an error but something acting stupid, is valid.*
Unless these WHEAs are translated not to Master Abort but to the other two types.
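
To make the distinction concrete, here is a small Python sketch of the quoted HyperTransport rules. The names are made up for illustration, and the mapping of the Master Abort Mode bit (clear: convert to a normal response, set: convert to a Target Abort) is an assumption borrowed from the usual PCI-to-PCI bridge convention; the quoted text only says the outcome depends on that bit.

```python
from enum import Enum

ALL_FS = 0xFFFFFFFF


class Response(Enum):
    NORMAL = "normal"
    MASTER_ABORT = "master abort"
    TARGET_ABORT = "target abort"
    DATA_ERROR = "data error"


def forward_through_bridge(resp, master_abort_mode):
    """Response a bridge passes upstream (sketch of the quoted HT rules).

    master_abort_mode: Master Abort Mode bit of the Bridge Control CSR.
    Assumed convention: False -> normal response, True -> Target Abort.
    """
    if resp is Response.MASTER_ABORT:
        return Response.TARGET_ABORT if master_abort_mode else Response.NORMAL
    # Target Abort passes through as Target Abort; other responses are
    # left unchanged here (an assumption for Data Error).
    return resp


def read_result(resp, data):
    """Read data seen by the requester, or None if it must not be used."""
    if resp is Response.MASTER_ABORT:
        return ALL_FS  # "All Fs are returned as data for read responses"
    if resp in (Response.TARGET_ABORT, Response.DATA_ERROR):
        return None    # "the returning data cannot be used"
    return data


for mode in (False, True):
    print(mode, forward_through_bridge(Response.MASTER_ABORT, mode).name)
# prints "False NORMAL", then "True TARGET_ABORT"
```

Under this model a Master Abort is the only error response that a bridge may turn into an ordinary completion, which is the point being argued in the post.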


----------



## ManniX-ITA

Dreamic said:


> So you're getting some WHEA CPU instability errors you definitely shouldn't be ignoring, and they both appear together at the same time when exceeding 1900 FCLK yes?
> Almost like they might be related to the same instability or something...
> 
> @ManniX-ITA , you have absolutely 0 Error 1's or 3's ever? Only ever seen error 0's?
> I hope people here are making sure to check every WHEA they get, even if you're getting hundreds or thousands, gotta hold that arrow key to scroll through fast and see if the ErrorSource ever changes from 0 to something else...


Yes, I also get sporadic errors from other sources; this doesn't change anything.



Dreamic said:


> Oh... yea I think you pretty much admitted you do?
> 
> You're just also choosing to ignore WHEA errors that point directly to CPU instability, because you decided they're fake and manifesting out of nowhere because of all the errors pointing to a driver source (which we're supposed to believe those ones are trustworthy about pointing to a driver source, but not believe the ones pointing to the CPU being unstable are trustworthy)
> 
> This is a lot of hoops to jump through and stretches being made, things being selectively ignored and listened to based on what you want to believe.
> The errors that back up what I want to believe are real, the errors that don't are fake...
> 
> Wonder what's going to happen when Ron disables his chipset and still probably gets all these WHEA errors, what do we say then...


Yes, yes, yes, yes 
Indeed Ron could disable the chipset and still get WHEA.
And then? What does it change?

We suppose it's the Realtek LAN, another EFI, the chipset or the IOD.
Whatever it is doesn't change a single bit of what I'm saying.
I believe, though I may be wrong, that these errors arriving at a high rate can be ignored if the system is stable and performant.
What I'm doing is suppressing them to avoid latency penalties, then testing performance and stability.
If you don't want to, you are free not to.
I'm just giving anyone the option, without risking messing up Windows.


----------



## weleh

Dreamic said:


> I still think we could find out more.
> 
> 
> You have stability issues with whatever it is that's unstable and causing WHEAs, at least.
> 
> 
> I might be special then, didn't really spend any time checking and validating CPU performance, but 4000/2000 was literally 0 effort to boot last time I tried and had significant AIDA increases.
> Also I don't think I've seen a single person that can't boot over 1900, seems like everyone can boot 1933 or 1966 or 2000, just with WHEA.
> 
> 
> 
> 
> Wrong about what? You have WHEA. You can't tell me where they're coming from, you can only hope and guess whatever it is that's unstable isn't something important.
> 
> Again don't think the results in here are too special, just because 99.99% of people haven't tried using this tool/disabling WHEA yet.
> If everyone starts using this tool and barely anyone can accomplish what you have at 4000/2000, yea I guess you're special then.
> Just because barely anyone else is trying this right now doesn't mean barely anyone else can accomplish it...
> 
> Don't think there's anything more to talk about right now


Wrong as in: whatever is causing the WHEAs isn't causing any system instability, data corruption, crashes or anything similar on my system.
Every synthetic benchmark has shown, as expected, a big performance leap going from 1900 to 2033 FCLK, and gaming has been buttery smooth.

I don't think you really understand the goal of this application or the goal of people who are using it. I personally am not denying WHEAs, however given this tool and performance gains with apparently no drawbacks, I'm choosing to ignore them in favor of better performance. 

No idea why you're so mad and bitching about this. If it's not for you, you don't need to be spamming here with 0 help except throwing shade at people.


----------



## 1devomer

ManniX-ITA said:


> ...But it seems that thanks to @1devomer's questions about error correction in Infinity Fabric GMI links *I've found a clue that could be important*.
> BTW GMI links are a super-set of Hyper-Transport 3.1, it does per packet CRC-32 error correction.
> 
> But in the Hyper-Transport specs is very well defined:
> 
> _A Master Abort indicates that a directed request *failed to find a device on the chain that would accept it*. Devices receiving a Master Abort response set the Received Master Abort bit in their Status CSR or (for bridges receiving the response on their secondary bus) their Secondary Status CSR. *Master Abort responses propagate through HyperTransport bridges in the same manner as Master Aborts through PCI bridges* – they are either converted to *normal (non-error) responses* or to Target Abort responses, depending on the state of the Master Abort Mode bit of the Bridge Control CSR. All Fs are returned as data for read responses....._


I don't know what makes you guys think that terminating a transaction on the bus with a Master Abort is less indicative of an issue or an error than a Target Abort.

I was reading this manual:


> ®Tsi350™ PCI-to-PCI Bridge User Manual


It states the following in the section on Transaction Termination (page 42):


> Master Abort: A master abort occurs when no target response is detected.
> When the initiator does not detect a DEVSEL_b from the target within five clock cycles after asserting FRAME_b, the initiator terminates the transaction with a master abort.
> If FRAME_b is still asserted, the initiator de-asserts FRAME_b on the next cycle, and then de-asserts IRDY_b on the following cycle.
> IRDY_b must be asserted in the same cycle in which FRAME_b de-asserts.
> If FRAME_b is already de-asserted, IRDY_b can be de-asserted on the next clock cycle following detection of the master abort condition.
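
A rough Python model of that termination rule, with hypothetical function names: the transaction is master-aborted when no DEVSEL_b arrives within five clocks of FRAME_b, followed by the FRAME_b/IRDY_b de-assertion order the manual describes. Treating a DEVSEL_b on clock 5 as still "within" the window is an assumption of this sketch.

```python
DEVSEL_WINDOW = 5  # clocks after FRAME_b assertion, per the quoted manual


def termination(devsel_clock):
    """How the initiator ends the address phase.

    devsel_clock: clock (1-based, counted from FRAME_b assertion) on which the
    target asserted DEVSEL_b, or None if no target claimed the transaction.
    Assumption: a DEVSEL_b on clock 5 still counts as "within five clocks".
    """
    if devsel_clock is None or devsel_clock > DEVSEL_WINDOW:
        return "master abort"
    return "claimed"


def master_abort_signals(frame_still_asserted):
    """Per-clock signal changes after a master abort is detected."""
    if frame_still_asserted:
        # FRAME_b goes high first (IRDY_b still asserted), IRDY_b on the next clock
        return ["deassert FRAME_b", "deassert IRDY_b"]
    # FRAME_b already high: IRDY_b can go high on the next clock
    return ["deassert IRDY_b"]


print(termination(2), termination(None))  # claimed master abort
print(master_abort_signals(True))
```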


Also, it would be interesting to know where the Master Abort is initiated: whether it is initiated by the device or received by the device on the bus.
In both cases, according to the PCI bus specification, both Master and Target Abort differ from a normal termination of a transaction on the bus.

Also:


> If Tsi350 is delivering posted write data when it terminates the transaction because the master latency timer expires, it initiates another transaction to deliver the remaining write data. The address of the transaction is updated to reflect the address of the current Dword to be delivered. If Tsi350 is prefetching read data when it terminates the transaction because the master latency timer expires, it does not repeat the transaction to obtain more data.
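
The posted-write case boils down to simple address arithmetic: the follow-up transaction starts at the first Dword not yet delivered, 4 bytes per Dword. A hedged sketch with a hypothetical helper name; prefetched reads are deliberately not resumed, matching the last sentence of the quote.

```python
DWORD_BYTES = 4


def resume_after_latency_timer(kind, start_addr, dwords, delivered):
    """Follow-up transaction after the master latency timer expires.

    kind: "posted_write" or "prefetch_read"
    start_addr: byte address of the first Dword in the original transaction
    dwords: data of the original transaction; delivered: Dwords already sent
    Returns (address, remaining_dwords), or None if nothing is re-issued.
    """
    if kind == "prefetch_read":
        return None  # prefetching is not repeated to obtain more data
    remaining = dwords[delivered:]
    if not remaining:
        return None  # everything was delivered before the timer expired
    return start_addr + DWORD_BYTES * delivered, remaining


addr, rest = resume_after_latency_timer("posted_write", 0x1000, [1, 2, 3, 4], 2)
print(hex(addr), rest)  # 0x1008 [3, 4]
```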





weleh said:


> Any synthetic benchmark has shown, as expected, big performance leap going from 1900 to 2033 fCLK and gaming has been buttery smooth.


Isn't gaming already buttery smooth at 1800/1900 FCLK?!?!
Not sure about the real benefit of such a high FCLK in games, especially when playing at higher, GPU-bound resolutions.


----------



## ManniX-ITA

1devomer said:


> I don't know what make you guys think that, terminating a transaction on the bus with a Master Abort, is less indicative of an issue error than a Target Abort?


See the quotes above: since this message in theory comes from the IF, you need to consider the HT specs.
I'm not considering it a non-issue, otherwise there wouldn't be a WHEA error reported.
What I'm doing is assessing the severity according to the description of the error and the system's behavior.
There's no malfunction, no instability, nothing missing and no performance degradation.
That's why I'm assuming (again, I may be wrong) that there is an issue and it is being reported, but its impact ranges from minimal to zero.
Hence the decision to ignore it, which seems to be a winning one.



1devomer said:


> Also, it would be interesting to know where the Master Abort is initiated, if it is initiated by the device or received by the device on the bus.
> In both cases, from a pci bus specifications, both Master and Target Abort are different from a normal termination of a transaction on a bus.


They are both errors. It would be really interesting to know which devices are on both ends, but it's tricky to find out.



1devomer said:


> Isn't the gaming already buttery smooth at 1800/1900Fclk?!?!
> Not sure about the real benefit of such high Fclk in game, especially when playing at higher gpu bound resolutions.


Well I have a GTX1070 running at [email protected]
There's a noticeable difference, resulting in a better headshot average.
Games are not always GPU bound.
In SOTR the CPU Render gain is over 30 fps at max, 10 on average and 5 at 95%.
It doesn't seem like much, but it's better gameplay.


----------



## 1devomer

ManniX-ITA said:


> See the quotes above, considering this message is in theory from the IF you need to consider the HT specs.
> I'm not considering it a non issue, otherwise there wouldn't be a WHEA error reported.
> What I'm doing is assessing the severity according to the description of the error and the system behavior.
> There's no malfunction or instability, anything missing, no performance degradation.
> That's why I'm assuming, again I may be wrong, that there's an issue and is being reported but the impact is from minimal to zero.
> Hence the decision to ignore it which seems to be a winning one
> 
> 
> 
> They are both errors. It would be really interesting to know who are the devices on both ends but it's tricky to find out.
> 
> 
> 
> Well I have a GTX1070 running at [email protected]
> There's a noticeable difference resulting in a better headshots average
> Games are not always GPU bound.
> In SOTR the gain on CPU Renderer over 30 max fps, 10 in average and 5 in 95%.
> Doesn't seem much but it's better gameplay.


Well, I guess Microsoft has an issue with how WHEA errors are reported on AMD CPUs.
Usually bus spamming brings performance degradation, but it seems you guys didn't notice any drawback at all.
And you haven't pinpointed the devices that are causing these bus calls either.


----------



## mongoled

ManniX-ITA said:


> Can you please double check?


Sorry for the confusion

To confirm, the WHEAs numbering in the hundreds are

"Error Source: 0"
"ApicId: 0"

Which, if I have understood correctly, is related to

WheaErrSrcTypeDeviceDriver Type 16 (0x10)
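
For reference, the source *types* discussed in this thread come from the `WHEA_ERROR_SOURCE_TYPE` enumeration in the Windows driver headers. The values below are listed as I understand them from Microsoft's documentation, so double-check them against your SDK/WDK version. Note that the `ErrorSource` number in a WHEA 19 event is a source ID, not a type: the correlation of ID 0 with DeviceDriver (16) is what people here observed on their own machines, not something this table guarantees.

```python
from enum import IntEnum


class WheaErrSrcType(IntEnum):
    """WHEA_ERROR_SOURCE_TYPE values (the WheaErrSrcType prefix is dropped)."""
    MCE = 0
    CMC = 1
    CPE = 2
    NMI = 3
    PCIe = 4
    Generic = 5
    INIT = 6
    BOOT = 7
    SCIGeneric = 8
    IPFMCA = 9
    IPFCMC = 10
    IPFCPE = 11
    GenericV2 = 12
    SCIGenericV2 = 13
    BMC = 14
    PMEM = 15
    DeviceDriver = 16


def source_type_name(value):
    """Readable name for a source type number, tolerant of unknown values."""
    try:
        return WheaErrSrcType(value).name
    except ValueError:
        return f"unknown ({value})"


print([source_type_name(t) for t in (0, 1, 3, 7, 16)])
# ['MCE', 'CMC', 'NMI', 'BOOT', 'DeviceDriver']
```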


----------



## ManniX-ITA

mongoled said:


> Sorry for the confusion
> 
> To confirm, the WHEA that are in the 100s are
> 
> "Error Source: 0"
> "ApicId: 0"
> 
> Which if I have understood correctly is related to
> 
> WheaErrSrcTypeDeviceDriver Type 16 (0x10)


That's right, thanks; it seems consistent.


----------



## weleh

Where and how are you checking these things?


----------



## ManniX-ITA

weleh said:


> Where and how are you checking these things?


I'm comparing with my WHEA reports.
Is that what you mean?


----------



## TaunyTiger

*ManniX-ITA*

How do i Install it?
I've downloaded WHEAServiceSetup.msi and sourcecode.zip. What do I do with them?


----------



## ManniX-ITA

Just double-click the MSI file and the installer will start; you don't need the source code.


----------



## TaunyTiger

ManniX-ITA said:


> Just double click the MSI file and the installer will start, you don't need the source code


Thanks!
My bad: I clicked on the MSI file and nothing happened, so I rebooted and tested 1933 MHz, but got WHEA errors. After your reply I clicked again, and this time the install started. Now testing 2000 MHz CL16.


----------



## craxton

Quick question: what does it mean when it fails to stop
certain services running through WHEA?

I can attach the full log if needed, to understand whether the service simply isn't there or not.


----------



## ManniX-ITA

craxton said:


> quick question, whats it mean when it fails to stop
> certain services running thru WHEA?


Not a problem: the command sent via the WMI interface fails, but the source stops sending events anyway.


----------



## Mojundo

Hello,

I am interested in your WHEA suppressor. I used it on Win10 Pro without any issues, but after switching to Win10 LTSC it's not working anymore.

I made a custom version to check where the app is struggling, and it seems to be at disabling the event sources, specifically 0, 3 and 7 (there is no type 16 on my system).

Any hint why I can't disable these? I disabled id=1 but it still pops up in my Event Viewer (on this image you can check the context: https://cdn.discordapp.com/attachments/836257451883495434/847212932163960852/unknown.png); I don't get any WHEA except from this source.

Thank you!


----------



## ManniX-ITA

Mojundo said:


> I made a custom version to check where is the app is struggling and it seems is at the disabling event source, especially 0, 3 and 7 (no type 16 on my system)


It always fails to disable some sources; they are kind of "system reserved".
But just attempting to stop them usually silences them, even if they are still running.

Can you please try the latest version from Github?
I've added more detailed logging

What do you mean it's not working anymore?
Do you get WHEA Errors?


----------



## Mojundo

I already tried the latest one a few times.

And yes, I still get WHEA errors even with the app (it worked perfectly on Win10 Pro; LTSC seems to be buggy with it). Funniest thing: I only get one WHEA per minute, regardless of workload, always from Source=1 with ErrorType=10.


----------



## ManniX-ITA

Mojundo said:


> I already tried the last one fews times
> 
> And yes, I still get WHEA errors even with the app (it worked perfectly on Win10 pro, ltsc seems be buggy with that app), funniest thing : I only get one whea per minute, regardless of workload. Always from the Source=1 and ErrorType=10


Which is not consistent with the screenshot you sent...

If you are using the latest version, check the logs and post the content of this event message:


----------



## Mojundo

Here we go: https://cdn.discordapp.com/attachments/847509467250950154/847510353004200006/unknown.png

No type 16 found, failed type 0/3/7, OK type 1.

Always the same WHEA 19: https://cdn.discordapp.com/attachments/847509467250950154/847510785177157652/unknown.png


----------



## ManniX-ITA

Mojundo said:


> no type 16 found, failed type 0/3/7, ok type 1


Sorry, my mistake: I read ErrorType as the source type and not the error type.

If you get it, it's probably because your settings are unstable.
Can you post two screenshots of the event, one of the General tab and the other of the Details tab?


----------



## Mojundo

Yes: https://cdn.discordapp.com/attachments/847509467250950154/847512114968789022/unknown.png and https://cdn.discordapp.com/attachments/847509467250950154/847512217347031100/unknown.png

Settings are fine (12h+ Prime95, 800 FFTs); it was running fine on my old Win10 Pro without any WHEA... I guess it's my system settings (WMI rights? Already tried that).


----------



## ManniX-ITA

Mojundo said:


> settings are fine (12h+ prime95 800 FFTs), was just running fine on my old win10 pro without any whea...I guess it's my system settings (wmi rights? already tried it)


If it's not stopping the source, it's probably because the LTSC version has different permissions.
But I couldn't find how to override them.

It's the classic WHEA 19 that arrives at a high rate when the DeviceDriver source is present.
The difference is that it usually comes from source 3 (NMI) instead of 1 (CMC).

Unfortunately if the error source can't be silenced there's not much that can be done.
But if the system is stable and it's only one per minute it should not be a problem, just annoying.


----------



## ApolloX30

Hi, thanks for creating this thing. My question is how to uninstall it if I want to. I haven't seen anything about this, and there's no information in the readme file either, right?


----------



## ManniX-ITA

Yes, it's in the readme on Github:

_The batch files for installation are using InstallUtil from .NET Framework 4.0, therefore you need it installed to use them._

_Please use the installer if you don't know exactly what I mean._
Hope you used the installer.
In that case you can uninstall like any other Windows application, from Settings or Control Panel.


----------



## ApolloX30

Didn't install yet. I only install stuff if I know that I can remove it  

Danke!


----------



## ManniX-ITA

I will add an Uninstall section to the readme so it's clearer.


----------



## craxton

@ManniX-ITA

Question: is there a way to (add) the extra WHEA loggers that you have and I don't?
There is another person (with my exact board) who has one other logger that I do not.

In other words, can I copy-paste it from somewhere (not exactly that way, but...)?
Is it possible to make a device error logger?

If you recall, you have one in particular that I do not have,
which might be why I'm free from WHEA errors.

(EDIT)

Wouldn't the one I'm speaking of be the one this says it successfully turned off?
Even though it's not shown in the operations log, and I've never gotten ID 19.....
only 18 and 20.... (as stated where we first encountered this)


----------



## craxton

ApolloX30 said:


> Didn't install yet. I only install stuff if I know that I can remove it
> 
> Danke!


I was able to uninstall it.

The log it created is still there, but the service isn't installed anymore.
If I knew how to remove such things from Event Viewer, I'd remove it.
But I trust ManniX enough to leave it there, as it's not spying on me.


----------



## ManniX-ITA

craxton said:


> i was able to uninstall it?
> 
> the logger window it made is still there, but its not installed anymore.
> if i knew how to remove such things in event viewer id remove it.
> but, i trust manna to leave it there as its not spying on me.


No spying.
The source code is available as well; anyone who is paranoid can compile their own binary, just in case.



craxton said:


> question, is there a way one can (add) the extra WHEA loggers (you have that i dont)
> as there is another person (with my exact board) who has one other logger that i do not,


You have all 5 sources, so there's no need.
Adding them is not easy and I'm not sure it would work.
The sources are added by the system or via a device driver install.
I'm not sure "cloning" would work, as some of the data is machine-specific.



craxton said:


> if you recall you have one in particular i do not have.
> which might be why im free from whea errors.


Which one?
I recall you have all 5.
That one in the screenshot is the source for the high flow WHEA 19.

I wouldn't touch it.
Try to fix the WHEA 18 and 20 errors, as they are serious.


----------



## craxton

ManniX-ITA said:


> Try to fix the WHEA 18 and 20


(As for the rest of your response, thank you; I'll keep that in mind and not bother, lol.)

As for WHEA 18 and 20, it would seem (using CoreCycler, which I'm now finding out isn't so great / people turned against it (edit: maybe I was reading that wrong); but I'm for sure running the dog dodo outta my 5600X,
possibly going to buy a new one shortly to try my luck with better cores??)
that it's failing on 3 cores. At first only 1 needed adjusting; after adjusting that one core, 2 others failed with it....
causing a positive curve to be needed.... I honestly can't understand this... might just leave the curve off and see what it says....
Surely the CPU can manage it all on its own, lol.


----------



## iraff1

I am interested in this. So what this thread tells me is that you guys are running stable computers with WHEA errors? What kind of errors are you getting?

When I put my FCLK to 2000 I get WHEA errors, but I don't seem to crash. That's why I'm asking: maybe I could run FCLK 2000 and just ignore the errors? The errors I was getting are the following:

"A corrected hardware error has occurred."
Reported by component: Processor Core
Error Source: Unknown Error Source:
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

This doesn't seem to change; it's always the same. What's your take, guys?


----------



## ManniX-ITA

iraff1 said:


> I am interested in this, so what this thread tells me is that you guys are running stable computers with whea errors? What kind of errors are you getting?
> 
> When i put my FLCK to 2000 i get whea errors but i don't seem to crash, this is why i am asking, maybe i could do flck 2000 an just ignore the errors? The errors i was getting is the following:
> 
> "A correct hardware error occured"
> Reported by component: Processor Core
> Error Source: Unknown Error Source:
> Error Type: Bus/Interconnect Error
> Processor APIC ID: 0
> 
> This seems to not change, it always the same. What is your guys take?


You have to check the details.

I usually get this error at boot:

ErrorSource 3
ApicId 0
MCABank 27
MciStat 0xd02000000002080b
MciAddr 0x0
MciMisc 0x0
ErrorType 10

ErrorSource 3 is NMI.

Then a steady flow of:

ErrorSource 0
ApicId 0
MCABank 27
MciStat 0x982000000002080b
MciAddr 0x0
MciMisc 0xd01a0ffe00000000
ErrorType 10

ErrorSource 0 is DeviceDriver.

All of them EventID 19.

If you get something else too, something is probably wrong with the settings and it's unstable.
Only those, specifically the flow with MciStat 0x982000000002080b, are what I can keep getting by the thousands without stability issues.
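
The two MciStat values above can be pulled apart with a few bit masks. This Python sketch decodes only the architectural MCA status flags (bits 63 to 57), the MCA error code (bits 15:0) and AMD's extended error code (bits 21:16), and splits a bus/interconnect error code of the form `0000 1PPT RRRR IILL`. AMD-specific flags such as deferred or poison are left out, so treat it as a rough aid, not a full decoder.

```python
# Architectural MCA status flags (x86 machine-check architecture)
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MciMisc contents are valid
    58: "ADDRV",  # MciAddr contents are valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Return (set flags, MCA error code, AMD extended error code)."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF        # bits 15:0
    ext_code = (status >> 16) & 0x3F    # bits 21:16 (AMD extended error code)
    return flags, error_code, ext_code


def decode_bus_error(code):
    """Split a bus/interconnect error code, 0000 1PPT RRRR IILL."""
    if (code & 0xE800) != 0x0800:  # bits 15:13 clear and bit 11 set
        raise ValueError(f"{code:#06x} is not a bus/interconnect error code")
    return {
        "Participation": (code >> 9) & 0x3,   # PP
        "Timeout": (code >> 8) & 0x1,         # T
        "RequestType": (code >> 4) & 0xF,     # RRRR
        "MemorIO": (code >> 2) & 0x3,         # II
        "MemHierarchyLvl": code & 0x3,        # LL
    }


for stat in (0xD02000000002080B, 0x982000000002080B):
    flags, code, ext = decode_mci_status(stat)
    print(hex(stat), flags, hex(code), ext, decode_bus_error(code))
```

Among the decoded flags, the two values differ only in OVER (set on the boot-time error) and MISCV (set on the steady flow, which matches its non-zero MciMisc). Both carry the same error code, 0x080b, and its fields (Participation 0, Timeout 0, RequestType 0, MemorIO 2, MemHierarchyLvl 3) are the same numbers Event Viewer shows in the detailed event data.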


----------



## citizenasdf

@ManniX-ITA, thank you so much for this. Will you be updating it for Windows 11? It's not working on the leaked build.


----------



## ManniX-ITA

citizenasdf said:


> @ManniX-ITA . Thank you so much for this. Will you be updating it for Windows 11? It's not working on the leaked build


For sure, yes, but when 11 is out.


----------



## mongoled

ManniX-ITA said:


> For sure yes but when 11 is out.


It will be interesting to see if the WHEA situation changes.


----------



## jankaw

Hey, @craxton here.
Not sure if those of you with this WHEA 19 issue have "cutout issues with your display", whether using a DP cable or HDMI. But I used DDU to uninstall the driver ("ran it twice")
and have stressed the EVER LOVING SHIX outta this "new" single-CCD 5800X,
as running AIDA (before the uninstall) would QUICKLY SHOW a black screen like I'd crashed, but it would come back;
it would do this several times per AIDA run.

After DDU, however, I have NOT had any screen cutouts..... (to the few of you who changed your GPU)
do you think this could be an Nvidia driver issue?????? I changed NOTHING ELSE, and this happens MORE while running 1.2V SOC,
so of course that's what I'm running to test and be sure. I also had mouse dropout issues, but with
the config below I'm not getting the AUDIO ISSUES, nor the USB dropout issues, while running AIDA or OCCT.

(This happened even while I had the WHEA suppressor running.) Again, now without the Nvidia driver installed
(even though Windows reinstalled it no matter what I turned off),
it still hasn't happened in some time.

Gonna reboot after removing the suppressor and check again, since the drivers will be fully initialized,
unless they already are.
(I tried what Veii suggested, no go, it still reinstalls. I tried several other things, but nonetheless nothing keeps the drivers
from reinstalling. Even with Windows Update turned off, it somehow still manages to auto-install updates/drivers.)


----------



## citizenasdf

ManniX-ITA said:


> For sure yes but when 11 is out.


Dear @ManniX-ITA, Windows 11 is out! We're really looking forward to your update!


----------



## craxton

@ManniX-ITA
Quick question: I've checked, and what others are ignoring doesn't quite seem to be what I get with the 5800X. I get two WHEA 19s (1000s of events),
but two different ones: one is a once-or-twice thing at boot, and the other is, well... Attaching
shots, and what I'd assume you'd be asking for otherwise.

Now, I'm running anything from 3800 to 4066 (3800 and ALL below are WHEA-free),
unless I use a CO that's too hefty, which I've yet to attempt dialing in.
However:
- are these what you're ignoring?
- do you have issues with display dropout? (Random, and it hasn't happened since manually setting IOD after reinstalling the driver and setting TLD, a registry value, from 1 to 10
to allow the GPU to respond (found on the Nvidia forum). That has worked so far, but the errors are still present.)
- has your mouse/USB dropped out? (Mine did twice.)

(below is the machine check error)

Code:


<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
  <EventID>19</EventID>
  <Version>0</Version>
  <Level>3</Level>
  <Task>0</Task>
  <Opcode>0</Opcode>
  <Keywords>0x8000000000000000</Keywords>
  <TimeCreated SystemTime="2021-06-29T04:22:58.4419162Z" />
  <EventRecordID>118934</EventRecordID>
  <Correlation ActivityID="{69fa9173-96bf-4dd7-8db2-1a3b44ec4259}" />
  <Execution ProcessID="4248" ThreadID="4748" />
  <Channel>System</Channel>
  <Computer>BLACK-BOSS</Computer>
  <Security UserID="S-1-5-19" />
  </System>
<EventData>
  <Data Name="ErrorSource">1</Data>
  <Data Name="ApicId">0</Data>
  <Data Name="MCABank">27</Data>
  <Data Name="MciStat">0xd02000000002080b</Data>
  <Data Name="MciAddr">0x0</Data>
  <Data Name="MciMisc">0x0</Data>
  <Data Name="ErrorType">10</Data>
  <Data Name="TransactionType">256</Data>
  <Data Name="Participation">0</Data>
  <Data Name="RequestType">0</Data>
  <Data Name="MemorIO">2</Data>
  <Data Name="MemHierarchyLvl">3</Data>
  <Data Name="Timeout">0</Data>
  <Data Name="OperationType">256</Data>
  <Data Name="Channel">256</Data>
  <Data Name="Length">936</Data>
  <Data Name="RawData">435045521002FFFFFFFF03000200000002000000A8030000391604001D0615140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B18BCE2DD7BD0E45B9AD9CF4EBD4F890FD416B1C9E6CD70100000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000200000000000000000000000000000000000000000000007F010000000000000002040000030000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100FA200000810000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000B3F8F31CB1C5A249AA595EEF92FFA63C01000000000000009E07C0200400000000000000000000000000000000000000000000000000000000000000000000000200000002000000DE443C709E6CD70100000000000000000000000000000000000000001B0000000B080200000020D0000000000000000000000000000000000000000000000000000500002E0001000100025A000000007D000000070000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</Data>
  </EventData>
  </Event>


below is the CPU/BUS interconnect error

Code:


<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
  <EventID>19</EventID>
  <Version>0</Version>
  <Level>3</Level>
  <Task>0</Task>
  <Opcode>0</Opcode>
  <Keywords>0x8000000000000000</Keywords>
  <TimeCreated SystemTime="2021-06-29T04:20:58.0433445Z" />
  <EventRecordID>118891</EventRecordID>
  <Correlation ActivityID="{63df006e-c13e-408d-9940-1904af512687}" />
  <Execution ProcessID="4248" ThreadID="4768" />
  <Channel>System</Channel>
  <Computer>BLACK-BOSS</Computer>
  <Security UserID="S-1-5-19" />
  </System>
<EventData>
  <Data Name="ErrorSource">0</Data>
  <Data Name="ApicId">0</Data>
  <Data Name="MCABank">27</Data>
  <Data Name="MciStat">0x982000000002080b</Data>
  <Data Name="MciAddr">0x0</Data>
  <Data Name="MciMisc">0xd01a0ffe00000000</Data>
  <Data Name="ErrorType">10</Data>
  <Data Name="TransactionType">256</Data>
  <Data Name="Participation">0</Data>
  <Data Name="RequestType">0</Data>
  <Data Name="MemorIO">2</Data>
  <Data Name="MemHierarchyLvl">3</Data>
  <Data Name="Timeout">0</Data>
  <Data Name="OperationType">256</Data>
  <Data Name="Channel">256</Data>
  <Data Name="Length">936</Data>
  <Data Name="RawData">435045521002FFFFFFFF03000200000002000000A8030000391404001D0615140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B248949139377F4BA8F1E0062805C2A3FB416B1C9E6CD70100000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000200000000000000000000000000000000000000000000007F010000000000000002040000030000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100FA200000810000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000B3F8F31CB1C5A249AA595EEF92FFA63C01000000000000009E07C00004000000000000000000000000000000000000000000000000000000000000000000000002000000020000003ABDAB289E6CD70100000000000000000000000000000000000000001B0000000B08020000002098000000000000000000000000FE0F1AD00000000000000000000500002E0001000100025A000000007D000000270000000000000000000000000000000000000000000000000010000000000000001000000000000000100000000000000010003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</Data>
  </EventData>
  </Event>
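
If you want to compare events like these without scrolling through Event Viewer, the XML can be parsed with Python's standard `xml.etree.ElementTree`. A minimal sketch: any `- ` collapse markers that Event Viewer's friendly XML view shows in front of some tags have to be removed first, the two sample events are trimmed to a few fields from the dumps above, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copies of the two events above (RawData and most fields dropped)
MACHINE_CHECK = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>19</EventID></System>
  <EventData>
    <Data Name="ErrorSource">1</Data>
    <Data Name="MCABank">27</Data>
    <Data Name="MciStat">0xd02000000002080b</Data>
    <Data Name="MciMisc">0x0</Data>
    <Data Name="ErrorType">10</Data>
  </EventData>
</Event>"""

BUS_INTERCONNECT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>19</EventID></System>
  <EventData>
    <Data Name="ErrorSource">0</Data>
    <Data Name="MCABank">27</Data>
    <Data Name="MciStat">0x982000000002080b</Data>
    <Data Name="MciMisc">0xd01a0ffe00000000</Data>
    <Data Name="ErrorType">10</Data>
  </EventData>
</Event>"""


def event_data(xml_text):
    """Map each <Data Name="..."> under <EventData> to its text."""
    root = ET.fromstring(xml_text)
    return {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}


def differing_fields(a, b, skip=("RawData",)):
    """Fields whose values differ between two parsed events."""
    keys = (a.keys() | b.keys()) - set(skip)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


diff = differing_fields(event_data(MACHINE_CHECK), event_data(BUS_INTERCONNECT))
for name, pair in diff.items():
    print(name, pair)
```

On these two events only ErrorSource, MciMisc and MciStat differ; MCABank 27 and ErrorType 10 are shared, which fits the reply that they carry the same MCA data.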


----------



## ManniX-ITA

citizenasdf said:


> Dear @ManniX-ITA , Windows 11 is out  We're really looking forward to your update!


I will check it out and see what can be done.
I hope they didn't block the WMI calls... in theory it should work as usual.



craxton said:


> - Are these the ones you're ignoring?
> - Do you have issues with display dropout? (Random, and it hasn't happened since manually setting IOD/reinstalling the driver and setting TLD or a registry value from 1 to 10
> to allow the GPU to respond (found on the NVIDIA forum). That has worked so far, but the errors are still present.)
> - Has your mouse/USB dropped out? (Mine did twice.)


Yes, these are the ones I'm ignoring; same MCA data.

No, never had a display dropout. Seems a bit weird, especially since it's only the GPU...

No mouse or USB dropout either; that's usually a serious lack of voltage, or way too much.
Maybe the 5800X needs more.
Consider that I have VSOC at 1.23 V in BIOS, CCD at 1080 mV and IOD at 1140 mV.
A single CCD should need less, but maybe still more than the 5600X.


----------



## Veii

ManniX-ITA said:


> For sure yes but when 11 is out.
> 
> 
> citizenasdf said:
> 
> 
> 
> Dear @ManniX-ITA, Windows 11 is out! We're really looking forward to your update!

Windows build 22000.51 is mostly written off as Windows 10.
It has an interface change (explorer.exe) and a different Settings menu (which I honestly like more),
but it continues to have a broken thread scheduler for AMD.
This is under Chipset Driver 2.17.25.506 (2nd of June, 2021).
NVIDIA updated their drivers too, which are downloadable through Windows Update, and Intel did as well.

The WHEA situation could potentially change if every "brand" updates their drivers.
But considering that the thread scheduler is still a mess (L3 cache issue) and Windows 11 Pro for Workstations is still missing the AMD patches,
it might make sense to wait until the next big update in July.

I'm questioning whether I should downgrade again, considering this is written off as Windows 10, just with version 22000.51,
yet it misses all the Windows 10 AMD patches.


----------



## craxton

ManniX-ITA said:


> A single CCD should needs less but maybe still more than the 5600x


Usually more is where and when I experience these issues, but only under stress.



ManniX-ITA said:


> Consider I have VSOC at 1.23V in BIOS, CCD at 1080 and IOD at 1140


I upped (VSOC) to 1.22 V in BIOS, set IOD to 1075 mV and CCD to 990 mV,
and haven't had the issue arise, so I suppose that contradicts what I first stated.
However, I'm running a dual boot at the moment, Win 10
and a preview build of Win 11, and I ran the WHEA suppressor there, but the errors still come up
(Event Viewer shows it's running).

I'm going to give it another go before I rule out that it's indeed not working, even though
everything else from Win 10 works, since not much has changed other than some stuff being moved around
and it looking way different, almost like Apple... which I won't get into.





So far I dislike two things: Task Manager, and the fact that programs aren't in Control Panel any longer.
But it's quite nice (at the moment). I was testing to check whether the flood of WHEA 19s would magically disappear,
but just like in most other situations, it's not happening.



On a positive note, all cores can run -20, except the worst at -10, with +1 in offset
(silicon quality is 126.xxx).
If I can't fix the NVIDIA display driver crashing at 4066/2033 or 4000/2000, then I suppose it'll have to be 3800/1900.
Strange that this chip boots up to 4133 in 1:1:1 mode... but has such an issue.



Veii said:


> Yet misses all the Windows 10 AMD patches


Are you running a leaked build,
or a preview build through Windows Update?

I couldn't update to Win 11 until I had completed the second Windows 10 install and FULLY updated it.
I don't understand the logic behind that, but it didn't start the install until it was completely updated with the latest
patches on Microsoft's pages. Granted, I did install the Razer and Realtek drivers after seeing that WHEA 19 is still present,
with the AMD drivers being the only thing "behind".
I suppose now I understand what you mean by "scheduler":
Windows not knowing there are "big and way newer updates"?


----------



## mongoled

Increase vDDP if the graphics card is crashing and then recovering


----------



## craxton

mongoled said:


> Increase vDDP if the graphics card is crashing and then recovering


CLDO_VDDP or CPU_VDDP?
Had this strange stuff happen just now
(this is without HWiNFO running, but while Windows had just started).
I did increase the IOD/CCD voltages slightly in BIOS just a moment ago,
but dang, almost 100k copy and 7k writes... who needs read speed...
(EDIT) 70k write*
Thinking about it, you know I'm on MSI just like you, so it has to be CLDO_VDDP (PHY).



(EDIT AGAIN) Re-ran the L3 tests individually and, well.....


----------



## Veii

craxton said:


> (are you running a leaked build)?
> or a preview build thru windows update?


"Public" build 22000.51, but it's the same.
But I think the issue is something else, and just a coincidence.


craxton said:


> (EDIT AGAIN) re ran the L3 tests individually and well.....


We know Windows 11 messed up cache performance,
but read this post:

MSI MEG X570 Unify Overclocking & Discussion Thread (www.overclock.net)

I figured the new microcode is kind of buggy and reverted back.
I have two theories for now:

1. The new microcode changes the caches permanently as an "attempt" to fix the USB issues, which breaks the cache on 1203A and anything that isn't Patch B; a microcode downgrade does not remove the change.
2. It's just Windows 11's broken thread scheduler.

I haven't verified theory 1, as I'm still suffering from the microcode (update attempt) even after a downgrade.
Verifying it would require me to downgrade to Windows 10 21H1 and see if the cache gets fixed.
If not, I'd need to wipe NVRAM and flash it again somehow, in case anything from the "upgrade" was still left.

Something is generally fishy here with L3 cache and the microcode updates, which are enforced from 1203A onward.
EDIT: Nothing fishy, apart from the 2100 FCLK lock the new microcode introduced for me 😐


----------



## ManniX-ITA

Unfortunately this Windows 11 pre-release build is bugged.
WHEAService is doing its job fine and disabling the error sources.
But there's a ghost error source, ID 0 for me, which seems to be a copy of the DeviceDriver source, which is ID 1 for me.
It's not listed via WMI and thus it can't be disabled.
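A minimal sketch of the situation described above, assuming you already have two lists: the error-source IDs that a WMI enumeration returns, and the source IDs that actually show up in WHEA events. The function name and sample IDs are hypothetical illustrations, not WHEAService code; the point is only that a tool which can disable just what WMI lists has nothing to act on for a source that never appears in that list.

```python
# Hypothetical model, not WHEAService's implementation. The sample IDs mirror
# the post: DeviceDriver is ID 1, and an unlisted copy of it shows up as ID 0.


def plan_disable(wmi_listed_ids, ids_seen_in_events):
    """Split the error sources seen in WHEA events into two buckets.

    'disable_via_wmi': sources that WMI enumerates, so a WMI-based tool
                       has something to target.
    'ghost':           sources that log events but are absent from the
                       WMI enumeration, so there is nothing to disable.
    """
    listed = set(wmi_listed_ids)
    seen = set(ids_seen_in_events)
    return {
        "disable_via_wmi": sorted(seen & listed),
        "ghost": sorted(seen - listed),
    }


if __name__ == "__main__":
    wmi_ids = [1, 2, 3]   # hypothetical: what the WMI query returned
    event_ids = [0, 1]    # hypothetical: sources seen in the System log
    print(plan_disable(wmi_ids, event_ids))
```

With these sample inputs, ID 1 lands in the disable bucket and ID 0 in the ghost bucket, which matches the behavior reported on this build.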


----------



## craxton

Quick thought for those having WHEA 19 errors:
give OCCT a quick try (without WHEA reporting turned off),
run it on your BEST CORE ONLY,
and see if you get WHEA errors in HWiNFO.
I ran it for 20 min and didn't get a single one; the moment I stopped the test, they
resumed. When I tested the "worst" cores in the system, WHEA errors started within a few seconds.

Could it be something in AMD's code that's causing the "core links"
not to talk to each other correctly over the "cache"?
(No, I can't confirm this.) But again, in 20 minutes of it running no (new) errors came up,
while I could still use the system normally, etc.


----------



## ManniX-ITA

I get some when running CoreCycler on my best core.
How do you run OCCT? There are many options.


----------



## mongoled

craxton said:


> CLDO_VDDP
> or CPU_VDDP?
> had this strange stuff happen jus now?
> (this is without hwinfo running but while windows had jus started)
> did increase IOD/CCD voltages slightly in bios jus a moment ago.
> but dang almost 100k copy and7k writes...who needs read speed....
> (EDIT) 70k write*
> thinking that you know im on MSI just as you, has to be CLDO_VDDP (PHY)
> 
> 
> (EDIT AGAIN) re ran the L3 tests individually and well.....


Yes, CLDO_vDDP


----------



## craxton

ManniX-ITA said:


> How do you run OCCT? There are many options


Doesn't matter, must have been a fluke or something.



Because now, after starting the PC back up with the same settings still applied,
it's no longer working as it did (unless it's using a different core?).
The worst core has "fewer" WHEAs, but the "best" has MORE, way more.
But the test I was running anyhow is:


----------



## ManniX-ITA

craxton said:


> doesnt matter, musta been a flook or something.


If you set Threads -> Advanced, you can choose which core to run the test on.

BTW, the latest OCCT 9 beta introduced CPU and memory benchmarks.
Will check it out later.


----------



## craxton

ManniX-ITA said:


> Advanced then you can choose on which core run the test


Oh yes lol, believe me, I know. I have not tested "all", as from what I can "recall",
the test was using the CPPC preferred core last night, or rather this morning when I posted that.
But for whatever reason, something changed upon boot?



I did raise VDDP, so perhaps that's a factor? Those 20 mV might be the underlying reason I'm having trouble "replicating"
what I was seeing last night. Switching to 6 set cores (letting OCCT select them), it ran for 5-10 min
and then started throwing WHEAs. Again I stopped and re-ran on only 1 core (letting OCCT do the selection; unsure if it's random?)
and still no WHEA for an entire episode of whatever was on Netflix last night.


Sadly, I don't have 100s of screenshots on this install, so I can't look at what voltages were running or what the cores were set at.
The only thing I did do was use ZenDebug to turn on C6 (the way Veii showed it that day).

I have yet to give any beta options a try; perhaps that's something I can test out shortly, as it's still "cool" inside.


----------



## craxton

OK, I'VE done A LOT and CAN make this happen again.

So, without delay: I was having issues with ANYTHING above 3800 in 1:1:1 mode,
with 1000s of errors within an hour, and I got 3933 down to, idk, 1 every so often. But, BUT:

When I was playing around with AIDA64 and checking latency, I got 0 extra recorded errors; watching videos, 0;
hitting Copy, 0. But then I hit Write and 1 or 2 WHEA 19s came up, then I hit READ and got around 50 or so...
So

I then ran Read again (yes, only Read): sometimes 0, sometimes only 1, and the same for Write. But if I hit Write, then Read,
I'd get the same result, from 50 to ?? Well, I didn't quite add them up, but anyhow....

If anyone understands what I'm saying and can get their WHEA 19s down to a slow crawl, give this a try.

It seems that when the chip writes something and then you ask it to turn around and read, that's when it has big issues above 3800 MHz (some might hit this at a lower FCLK, IDK).
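One way to make the Copy/Write/Read comparison above less dependent on eyeballing HWiNFO is to save the System log as XML from Event Viewer after each AIDA64 phase and diff the WHEA counts. This is a minimal Python sketch, assuming the export uses the standard Windows event XML namespace and that WHEA entries come from the `Microsoft-Windows-WHEA-Logger` provider; the function names are hypothetical and the sample XML is a trimmed, made-up stand-in for a real export.

```python
import xml.etree.ElementTree as ET
from collections import Counter

EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"e": EVENT_NS}
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def count_whea_events(xml_text):
    """Count WHEA-Logger events by EventID in an exported event-log XML string."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() also visits the root itself, so this handles a single <Event>
    # as well as an <Events> wrapper holding many of them.
    for event in root.iter(f"{{{EVENT_NS}}}Event"):
        provider = event.find("e:System/e:Provider", NS)
        event_id = event.find("e:System/e:EventID", NS)
        if provider is None or event_id is None:
            continue
        if provider.get("Name") == WHEA_PROVIDER:
            counts[int(event_id.text)] += 1
    return counts


def new_events(before, after):
    """Events that appeared between two exports (Counter subtraction drops <= 0)."""
    return after - before


# Made-up samples: one export before the Read pass, one after it.
SAMPLE_BEFORE = f"""<Events>
<Event xmlns="{EVENT_NS}"><System><Provider Name="{WHEA_PROVIDER}"/><EventID>19</EventID></System></Event>
</Events>"""

SAMPLE_AFTER = f"""<Events>
<Event xmlns="{EVENT_NS}"><System><Provider Name="{WHEA_PROVIDER}"/><EventID>19</EventID></System></Event>
<Event xmlns="{EVENT_NS}"><System><Provider Name="{WHEA_PROVIDER}"/><EventID>19</EventID></System></Event>
<Event xmlns="{EVENT_NS}"><System><Provider Name="{WHEA_PROVIDER}"/><EventID>19</EventID></System></Event>
<Event xmlns="{EVENT_NS}"><System><Provider Name="Microsoft-Windows-Kernel-Power"/><EventID>41</EventID></System></Event>
</Events>"""

if __name__ == "__main__":
    before = count_whea_events(SAMPLE_BEFORE)
    after = count_whea_events(SAMPLE_AFTER)
    print(new_events(before, after))
```

Non-WHEA providers are ignored, so unrelated System log entries do not inflate the per-phase count.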

I did re-install my 5600X and re-ran all the other saved profiles I had, and re-did a 3800C14 set that 100% errored within a matter of minutes in TM5,
and it threw 0 WHEA 19s (I only have 2 chips and one board, so keep that in mind).
So with that being said, and the examples of 51-51-51 etc. I was reading an hour or so ago in the now
closed ZenTimings page, could there be a "magical" setup timing that could help the CPU
read/write correctly? (EDIT) To deter the WHEA 19s, of course.
This is far-fetched thinking, but it doesn't explain how yesterday I put in the EXACT timings I have now, with different voltages;
the only difference was setup timings of 3-3-15 instead of the current 4-4-18. (BTW, if you look inside the AMD Overclocking section, it reads 3-3-15 as 3-3-f?
And 4-4-18 as 4-4-12? I know there is a meaning for that,
but I don't know the meaning and am not asking for it, just curious about what's changing within the AMD Overclocking section.)
Also, "Auto" on the main page of this board sets 1140 mV IOD and CCD voltages in the AMD Overclocking section,
so be mindful of Auto on MSI boards with the latest AGESA.
(EDIT) I also noticed that I can load my 5600X profiles on the 5800X with the same BIOS? Is that new, or?

And now it's been 10 minutes or so; here are the errors I have now, just being on this page and letting several
YouTube pages run minimized.

(I also found that leaving the GPU at max boost clocks seems to help? Or am I just looking in hard places for a sprinkle of pixie dust??)


----------



## ManniX-ITA

craxton said:


> only setup timings were 3-3-15 instead of now being 4-4-18 (btw if you look inside amd overclocking section it reads 3-3-15 as 3-3-f ?


That's because it's in hexadecimal: 15 in decimal is F in hex (and 18 is 12).
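To see the mapping at a glance: a hex-based BIOS page shows the same decimal values in base 16, so 15 becomes f and 18 becomes 12. A tiny Python helper with hypothetical names that converts in both directions:

```python
def to_bios_hex(timings):
    """Render decimal setup timings the way a hex-based BIOS page shows them."""
    return "-".join(format(value, "x") for value in timings)


def from_bios_hex(text):
    """Convert a hex string such as '4-4-12' back to decimal timings."""
    return [int(part, 16) for part in text.split("-")]


if __name__ == "__main__":
    print(to_bios_hex([3, 3, 15]))   # 3-3-f
    print(to_bios_hex([4, 4, 18]))   # 4-4-12
    print(from_bios_hex("4-4-12"))   # [4, 4, 18]
```

Values below 10 look identical in both bases, which is why only the last timing appears to "change".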



craxton said:


> when i was playing around with AIDA and checking latency, i got 0 extra recorded errors, watching vids 0,
> while hitting copy 0, but then i hit write and 1/2 whea 19s came up, then i hit READ and around 50 or so...
> so


This looks like the de-sync issue that could be fixed by enabling CLKREQ# in the AMD PBS menu.
But you need to install the unlocked BIOS from Elmor to see that menu.



craxton said:


> (EDIT) i also noticed that i can load my 5600x profiles to the 5800x with the same bios? is that new or ?


Yes, the profiles are not CPU-dependent; you need the same motherboard with the same BIOS release for compatibility.


----------



## Veii

craxton said:


> only setup timings were 3-3-15 instead of now being 4-4-18 (btw if you look inside amd overclocking section it reads 3-3-15 as 3-3-f ?
> 
> 
> ManniX-ITA said:
> 
> 
> 
> That's because it's 15 in Hexadecimal = F

Is your AMD OVERCLOCKING in HEX?
From what I've seen, AMD OVERCLOCKING is usually in decimal, and only AMD CBS was in HEX.

I'd be rather worried that you mixed the two up?
ZenTimings should report a 15, not an F.
But maybe AMD OVERCLOCKING was always in HEX for you.
I didn't experience that on any of my boards (MSI, ASRock, ASUS).
Though maybe I just never set them there, as that sits in NVRAM and is wiped upon CMOS reset (but the board will refuse to reset itself that way).

In AMD OVERCLOCKING, only PBO and maybe VDDP voltage is what I change.
Rarely any voltages, as that will prevent the board from resetting itself.


----------



## ManniX-ITA

Veii said:


> Is your AMD OVERCLOCKING in HEX ?


Yes, on both MSI and GB (Gigabyte) the memory part is in hex.


----------



## craxton

Veii said:


> Is your AMD OVERCLOCKING in HEX ?
> Usually to what i've seen, AMD OVERCLOCKING is in decimals, and only AMD CBS was in HEX
> 
> I'd be rather worried, that you swapped both around ?
> ZenTimings should report a 15 not an F
> But maybe AMD OVERCLOCKING was always in HEX for you
> Didn't experience that with neither boards (MSI, ASRock , ASUS)
> Tho maybe. I think i've never set them there, as it sits in NVRAM and is wiped upon CMOS reset (but will refuse to reset itself that way)
> 
> AMD OVERCLOCKING, only PBO and maybe VDDP voltage, is what i change there
> Rarely any voltages ~ as that will prevent the board from resetting itself


Oh no, boss, that's not what I meant lol. I meant that when one sets setup timings inside the "main RAM overclocking"
section, 3-3-15 shows up as 3-3-f and 4-4-18 shows up as 4-4-12.
It's been this way on my board since the first time I used setup timings.

(If I set the main page voltages to "Auto", then inside the AMD section it shows the CCD and IOD voltages
PRESET TO 1150 (edit/corrected); will check and fix, but that's quite HIGH.)

As some of you will notice, I went pretty far back in the thread (you're getting a +1 from me),
and I tried LOTS of stuff that worked for others but wouldn't work for me, for example
56-56-56 and 51-0-0 for 3800C14 (DIMMs overheated with 3-3-15 at 1.55 V).
And it would seem that while others could easily use tRDWR 8 and tWRRD 3 or 1,
I could ONLY post with tRDWR 10 and tWRRD 4. (These are indeed SR sticks, but they act like DR.)

To that extent, while running the "old" 4 GHz memory OC from the 5600X on this 5800X,
I changed only 4000 to 3866 (1:1:1 ALWAYS) and got WAY fewer WHEA 19s than an
hour beforehand (I also added 3-3-15, tCKE 9); at 1.4 V with 4-4-18, tCKE 11, the WHEAs FLOODED.



ManniX-ITA said:


> This looks like the de-sync issue that could be fixed enabling CLKREQ# in AMD PBS menu.
> But you need to install the unlocked BIOS from Elmor to see that menu.


Sigh, that BIOS has everything unlocked,
but I get way more issues running FCLK 2000 for whatever reason.
Since I'm testing, I might as well (might even try the old BIOS I stayed with for a while).
I asked Eder to unlock the latest, but 🤷‍♂️, and I myself tried every "old" guide I saw...


----------



## Veii

ManniX-ITA said:


> This looks like the de-sync issue that could be fixed enabling CLKREQ# in AMD PBS menu.
> But you need to install the unlocked BIOS from Elmor to see that menu.


Not anymore.
I figured out the first AMI lock, and we have full write access to anything in AMD CBS & PBS,
plus hidden settings from the normal SETUP menu or AMITSE 😁

I can change CLKREQ# live now on my ITX,
but I have yet to experience what it does.
Changes can be done via the RU tool.

Asrock X370 Taichi Overclocking Thread (www.overclock.net)

MSI shouldn't have any enforced AMI variable lock,
so just go ahead and try to grab RU.

F5 to select UEFI boot variables
CTRL+W to save changes
CTRL+ALT+DEL to reboot

The USB drive has to be formatted as GPT (non-bootable) in Rufus;
it's a whole EFI folder.

EDIT:
Changes are made in the UEFI SETUP_Var.
Bad ones will require a BIOS reflash, as they are one-way.
Be sure to use F12 to take screenshots for yourself.


----------



## Veii

craxton said:


> oh no boss,


Please skip this wording;
people would get the wrong idea and make drama again 😅



craxton said:


> and i tried LOTS of stuff that worked before for others that wouldn work for me example
> 56-56-56 and 51-0-0 for 3800c14 (dimms overheated 3-3-15 at 1.55)


Give an even weaker RTT_PARK a try (maybe /4), or maybe try switching to RTT_WR /2.
6/3/3 was one of the early fixes for dual rank, but it worked better on 2x16 than on 4x8.

SETUP timings won't change heat behavior,
at least they shouldn't.
Past the half-break point they change Command Rate behavior, so they behave very similarly to GDM or 2T.
It makes me wonder if you aren't just replicating GDM behavior, which is 1T, 2T, 1T, through the half-used setup timings.

So are you saying your BIOS is open now, or isn't it?
Should I take a look into it with the new shenanigans and abstract methods?
I could decompile your BIOS and write out the hex you need/want to change,
so you can learn something at the same time & spread the knowledge.

EDIT:
I could be wrong, but EDER's BIOS also misses a couple of CBS & PBS options;
either he or maybe MSI filters them out.
I saw A LOT of them for my fully locked ITX that were not visible on the MSI mods.


----------



## craxton

Veii said:


> Please skip this wording
> People would get to wrong ideas and make drama again


Ah, pretty sure "they/he" deleted their profile sometime around 3 am this morning, as all
previous responses from them now show a name like "deleteduser" and a bunch of numbers.






Veii said:


> Give even weaker RTT_PARK a try (maybe /4), or maybe try to switch to RTT_WR /2
> 6/3/3 was one of the early resolves for dual rank , but it worked better on 2x16, than on 4x8


Will do, as I'm "running it atm", but on cycle 2 of TM5
it does show errors: error #12, which said to see error #2, to which,
well lol... makes me wonder why 40-24-20-20 works so well on these DIMMs at 2000 FCLK C16 with 40-6-6-3 at 1.48 V,
+2 on most of the calculator examples other than 32-48 being the same, and I had to +1 SCL as well.

(On the note of TM5: does changing the "testing order", for example swapping the starting test 6 with 3,
change how the errors work out/what they mean?)




Veii said:


> SETUP timings won't change heat behavior
> At least they shouldn't
> They will after the half-break point change to Command Rate behavior ~ soo behave very similar to GDM or 2T
> It makes me wonder if you aren't just replicating GDM behavior which is 1T, 2T, 1T ~ by the half used setup timings


I don't quite understand it myself.
Others who used "near"-identical setup timings to the ones I ran had no issues.
Possibly my ProcODT/RTT values aren't quite right.






Veii said:


> Soo you say, your bios is open now or isn't


Current BIOS, NO, but 1.60, YES.
Here is where you'll find 160, which has the "unlocked" options.
The "old unlocked BIOS" has horrible issues with USB dropouts, whereas the 17x version (beta) doesn't.
That's why I aimed to unlock it myself. (Something I changed, from around pages 140-145
where you shared some settings, made the boot super slow; even re-flashing didn't change it.)
I think it was something along the lines of "Data Eye", which, to be fair, the settings you showed
were already preset on my board by default. Having only DFE read training on, plus the other one, gives me 1 boot out of 3;
2 out of 3 are a no-boot, with the lights on the mouse and keyboard lighting up.






@ManniX-ITA I do apologize for bringing RAM into this thread.


----------



## ManniX-ITA

craxton said:


> @ManniX-ITA i do apologize for bringing ram into this thread.


No problem, I hope the memory thread will re-open soon...



craxton said:


> (on the note of TM5, does changing the "testing order" like swapping the "starting test example is 6 with 3
> change how the errors work out/what they mean?)


Yes, you need to keep the exact order and sequence;
it's like lines of instructions in a program.
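A toy model of why the order matters, assuming (for illustration only) a config made of write steps that fill a buffer with a pattern and verify steps that check the pattern the previous write left behind. This is not how TM5 is implemented internally; the names are hypothetical, and the sketch only shows that a verify step means something solely in the position the config author put it, so a reshuffled sequence reports "errors" even on a flawless buffer.

```python
def run_sequence(steps, size=8):
    """Execute ordered write/verify steps on a toy buffer; return failing step indexes.

    Each step is ("write", value) or ("verify", value). A verify step only
    makes sense if the matching write ran before it.
    """
    buffer = [None] * size
    failures = []
    for index, (op, value) in enumerate(steps):
        if op == "write":
            buffer = [value] * size
        elif op == "verify":
            if any(cell != value for cell in buffer):
                failures.append(index)
        else:
            raise ValueError(f"unknown op: {op}")
    return failures


# Same four steps, two orders. The buffer itself never corrupts anything.
ORDERED = [("write", 0xAA), ("verify", 0xAA), ("write", 0x55), ("verify", 0x55)]
SHUFFLED = [("verify", 0xAA), ("write", 0x55), ("write", 0xAA), ("verify", 0x55)]

if __name__ == "__main__":
    print(run_sequence(ORDERED))   # []
    print(run_sequence(SHUFFLED))  # [0, 3]
```

The shuffled run "fails" at steps 0 and 3 purely because of ordering, which is the kind of meaningless result a modified config can produce.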


----------



## Veii

ManniX-ITA said:


> No problem I hope the memory thread will re-open soon...


Nobody forbids us from just making a "Vermeer Fabric Overclocking Thread" 🤭


ManniX-ITA said:


> Yes you need to keep the exact order and sequence.
> it's like lines of instructions in a program.


Exactly.
There are specific copy, write, transfer & #15 Test/Verify steps.
Changing the config, or copying half of it, might trigger errors faster, but will nullify its effectiveness.

So unless the person has knowledge like Anta's and can understand the program's testing behavior,
he/she should not fiddle around with premade configs.

You can only extend the delay between tests like KedarWolf did, @craxton,
but then it will take 8-10h to finish instead of 2h.


craxton said:


> the "old unlocked bios" has horrible issues with USB dropouts, to where the 17x version (beta) doesnt.
> thus why i aimed to unlock it myself. (something i changed looking back where i think page 140-145


Data-Eye should not be enabled, or else POST takes 55-60 sec at minimum; I couldn't optimize it, it just takes too long.
But the memory training redesign changes only take 3-3.5 sec,
1 sec more.

I can take a look and extract the current BIOS.
Then you can try with RU whether you can change the specific HEX, if you have write access in the first place.


----------



## ManniX-ITA

Veii said:


> Nobody forbid us to just make a "Vermeer Fabric Overclocking Thread"


That would be really interesting


----------



## craxton


Veii said:


> Nobody forbid us to just make a "Vermeer Fabric Overclocking Thread" 🤭
> 
> Exactly,
> There are specific copy, write, transfer & #15 Test/Verify , steps
> Changing or copying half of the config, might trigger errors faster, but will nullify it's effectiveness
> 
> Soo unless the person has knowledge like Anta, and can understand the programm's testing behavior,
> He/She should not fiddle around with premade configs
> 
> You only can extend the delay between each tests like KedarWolf did @craxton
> But that will take instead of 2h, 8-10h to finish
> 
> Data-Eye should not be enabled, else the post time takes 55-60sec at minimum ~ i couldn't optimize it , it just takes too long
> but the memory training re'design changes, do take 3-3.5sec only
> 1sec more
> 
> I can take a look and extract the current bios
> Then you can try with RU, if you can change the specific HEX - if you have write access in the first place


I suppose they didn't, but it's almost like with anything relevant to pushing "any" processor to its limits: right when it's at its peak, the threads shut down.
The 5 GHz OC club, for example, has been shut down due to being too big.
But the RAM OC thread is nowhere near that size. Does that usually mean shut down for good, or does "cleaning" mean they got a report on something and need to check it? That could take a while....

No no, not the Data Eye settings, if I recall. It's in there with them. Or maybe it's not? Unsure; maybe the RAM training I was trying to do broke something in the BIOS?
Honestly unsure, but as long as I didn't reload the profile I was fine, other than USB drives not appearing. I could unplug and replug them and they'd be OK. It only happens at boot, and above 1900 FCLK they drop in and out with any voltages within safe ranges.

The second I see this 3800C14 set is fine, I'll attempt to use the instructions given to unlock hidden stuff inside the BIOS, as 17x is promising on this board: one step above 3800 does have WHEAs, but it's not flooded with WHEA 19s until stressing comes into play, and at 3800 it's 100% fine. If I share a good high-quality image, would you be able to tell what PCB these are? They do NOT act like single rank; they almost act like DR, but are single rank with a +2 attitude towards nearly everything...

Neither RTT_WR /2 nor RTT_PARK /4 worked, at either 40 or 36.9 ProcODT. TM5 says the DIMMs overheat at 1.55 V.

Here are the pics; yes, all the chips are on the same side. The side with the resistor has no chips. Hopefully these are good enough.


----------



## Veii

craxton said:


> If I share a good high quality image would you be able to tell what PCB these are?


I can, but dual rank has ICs on both sides.
A3 were acting funnily,
but that was someone else.
I forgot his name, it's not in this thread;
the one with the Chinese white DIMMs.


----------



## craxton


Veii said:


> I can, but dual rank has ICs on both sides
> A3 where acting funnily
> But that was someone else
> I forgot his name, it's not in this thread
> With the chinese white Dimms


If you wish to see the "numbers" on the chips under the heatsink, lmk; this camera can grab that info too.


----------



## Veii

craxton said:


> If you wish to see the "numbers" on the chip inside the heatsink, lmk this camera can grab that info too.


You can usually see everything if you take a close-up picture of the bottom of the traces (bottom of the DIMM).


----------



## craxton


Veii said:


> You can usually see everything , if you make a close picture on the bottom of the traces (bottom of the dimm)


I edited the response above with about 5 pictures. Are those not what you're looking for?


----------



## Veii

craxton said:


> I edited the response up above, with about 5 pictures. Are those not what your looking for?


A bit further up, maybe?
And from both sides, centered above.


----------



## craxton

Veii said:


> Bit further up maybe ?
> and from both sides , centered above


(Edit) After looking at the T-Force OCing page, it would seem they're almost "identical" to those sticks, which
would make them A2 or custom A2, but the way you put it, they'd be more like A2.


----------



## Veii

craxton said:


> (edit) after looking at the ocing tforce page it would seem there almost "identical" to those sticks. which
> would make them A2 or custom A2 but the way you put it, would be more like A2.


Ah, I meant "and from both sides" as in two centered pictures, just from above.
The traces are important.
They are A2 for sure, but could be A1 too.


----------



## craxton

Veii said:


> Ah i ment "and from both sides" ~ as two centered pictrures, just from above
> Traces are important
> They are A2 for sure, but can be A1 too


So like these, then? 10 min into TM5 so far at 3800C14, 1.53 V in BIOS (1.513 V reported in HWiNFO)... nope, errors: a bunch of strange-looking symbols and 5 and 4. The DIMMs don't feel hot. But reboot, change, test... It passed longer this time than anything else I've tried.


----------



## Veii

craxton said:


> So like these then, 10 min into tm5 so far 3800c14 1.53 bios, reported 1.513 hwinfo/ nope error bunch of strange looking symbols and 5 and 4, dimms don't feel hot. But reboot change, test... Passed longer this time than anything else I've tried.


They are too close, but here are the schematics again.
Looks like plain A2.


----------



## craxton

Veii said:


> They are too close , but here are the schematics again
> looks like plain A2


Yep, 100% A2...



PSST (this is something)
Should be able to get great core results.
Has any of you had issues with HWiNFO and Snapshot polling????
Enabling it on mine removes the core clocks? Already updated, so that's not it..


----------



## ManniX-ITA

craxton said:


> should be able to get great core results.


Very nice silicon quality!
No issues like that with HWiNFO;
I'm using the 7.05 beta.


----------



## Veii

craxton said:


> should be able to get great core results.


Very, very nice.
I only sit in the 116 range.
Never run an SMU-reading program like HWiNFO alongside an SMU-based tool.
Oh, Ryzen Hydra, the CTR 3.0 successor, was spotted yesterday/today.
Unsure when the free version will come out, but the Pro version uses CO for tuning. I guess support money was required, hence cutting an important feature away.
But it's not that bad; I'm curious to see how custom user CO + Hydra tool will work out.


----------



## craxton

Veii said:


> "public" build 22000.51 but it's the same
> But i think the issue is something else, and just a coincidence
> 
> We know windows 11 messed up cache performance
> But read this post
> 
> MSI MEG X570 Unify Overclocking & Discussion Thread (www.overclock.net)
> 
> I figured the new microcode is kind of buggy and reverted back
> Have two theories for now,
> 
> New microcode does permanent caches as an "attempt" to fix USB issues ~ breaks cache on 1203A and anything no Patch B / microcode downgrade does not remove the change
> It's just windows 11 broken thread scheduler
> 
> Theory 1 i haven't verified, as i'm still suffering from the microcode (update attempt) even with a downgrade
> It would require me to downgrade to 21H1 Windows 10 , and see if cache fixes
> If not, require me to wipe NVRAM and flash it again somehow ~ if anything from the "upgrade" was still left
> 
> Something generally is fishy here with L3 cache and the microcode updates, which are enforced after 1203A+
> EDIT: Nothing fishy, but fishy because of the introduced 2100 FCLK lock for me on the new microcode 😐


To "state the obvious", it is 100% Windows 11 that's the issue. I have a "DUAL" boot of Win 10 and Win 11, with Win 11 on a clean drive;
my Win 10 is my normal Windows install, which you see me screenshotting and such.

It's not the microcode causing this: I'm still on the newest beta BIOS for my board (1.2.0.3.b) and don't have such an issue in Windows 10 (on that normal drive).

(EDIT) Adding pics for "PROOF"; this is literally while Windows 10 is still loading (my normal Win 10 install),
hence all my auto-start programs still being loaded, etc.

And this was the other day ("yesterday" still had the same issue).
Win 11 version "latest?": it has no updates, so it's a Dev preview channel build.
(EDIT AGAIN) See below.

OK, here's a thing that happens EVERY TIME across 3 reboots: I'd test the
L3 cache by double-clicking before programs/services had started, etc., and this is roughly what I got
(all three of these are on Win 11).

I re-ran a few, like latency and all the L3 cache tests, several times (and never got the numbers it should have, or anything close, again
unless I restarted and immediately ran the L3 cache tests).


----------



## Veii

craxton said:


> I re-ran a few, like latency and all the L3 cache tests, several times, and never got the numbers it should be (or anything close) again,
> unless I restarted and immediately ran the L3 cache tests.


Yes, I can relate to this.
Only once did I ever get them at full speed, the way they should be.
It "looks like" it's throttled in POST.
Neither chipset drivers nor microcode changed that, so the fault lies with Microsoft.

Results were the same with and without virtualization support
(on the topic of Win XI being a fully virtualized sandbox).

It seems to vary with load.
But at least performance is where it should be (it's not much worse, but it could have been way better).
No all-core setting changed these results.


----------



## mongoled

Sorry for the off-topic, but as there are users here on boards other than MSI, I'd like to ask if any of you have options to disable PCIe/SATA slots in your BIOS?

I have never seen such options on my MSI boards and am wondering why AMD (or the board manufacturers) don't offer this function in the BIOS.

For example, when playing with different OSs, sometimes I want to make sure that certain M.2 slots are disabled so that nothing can be written to a particular SSD/HDD.

Currently I have to physically remove/disconnect the drives, be it NVMe or SATA. It's just plain stupid not having such an option in the BIOS of premium motherboards, especially if you are watercooling and things are not easily accessible...


----------



## ManniX-ITA

So far, setting the disks to Offline in Disk Management has always worked for me.
It prevents Windows from accessing them beyond basic driver initialization.
I once managed to seriously mess up my Windows2GO install while testing memory timings; all the offline disks were left untouched and safe.
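For anyone who prefers to script this, diskpart can set a disk Offline with `select disk N` followed by `offline disk` (and `online disk` to revert), and it runs command files via `diskpart /s`. The Python sketch below only builds that script text; the disk numbers are hypothetical placeholders, so check `list disk` first and never include the disk you boot from. As the replies below point out, Offline only keeps the running Windows away from those disks; it does not stop a USB installer from writing boot data to them.

```python
def build_diskpart_script(disk_numbers, offline=True):
    """Build diskpart script text that sets each listed disk Offline (or back Online).

    disk_numbers are the numbers diskpart shows in `list disk`; the values
    used below are hypothetical placeholders, not taken from this thread.
    """
    action = "offline disk" if offline else "online disk"
    lines = []
    for number in disk_numbers:
        # Reject anything that is not a plain non-negative integer.
        if isinstance(number, bool) or not isinstance(number, int) or number < 0:
            raise ValueError(f"invalid disk number: {number!r}")
        lines.append(f"select disk {number}")
        lines.append(action)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical: disks 1 and 2 are the ones to protect (NOT the boot disk).
    print(build_diskpart_script([1, 2]), end="")
    # Save the output to a file and run it from an elevated prompt:
    #   diskpart /s offline_disks.txt
    # To undo, build the script with offline=False and run it the same way.
```

Keeping the script generation separate from running diskpart means you can review exactly which disks will be touched before anything changes.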


----------



## mongoled

ManniX-ITA said:


> So far, setting the disks to Offline in Disk Management has always worked for me.
> It prevents Windows from accessing them beyond basic driver initialization.
> I once managed to seriously mess up my Windows2GO install while testing memory timings; all the offline disks were left untouched and safe.


This works when you do an install from within Windows, but does it work when you are installing "raw", i.e. directly from a USB stick?

For example, I have a single SSD that has both Windows and Ubuntu. If you have ever delved into dual booting Windows/Unix, you will know it is not a simple task to get both of these OSs onto one SSD, especially considering that Windows likes a GPT partition whereas Ubuntu is installed as MBR.


----------



## ManniX-ITA

mongoled said:


> This works when you do an install from within Windows, but does it work when you are installing "raw", i.e. directly from a USB stick?


It always works; just set the disks you want to keep safe to Offline in Disk Management.


----------



## mongoled

ManniX-ITA said:


> It always works; just set the disks you want to keep safe to Offline in Disk Management.


But we can't protect the OS we are actually using this way: if we set it to Offline (is that even possible?), we would never be able to boot into it again through the normal procedure.

🤣🤣 

Anyhow, this is off-topic; I was specifically wondering whether other manufacturers offer such a feature in their BIOS.

Thanks for the tips, very useful!


----------



## ManniX-ITA

mongoled said:


> But we can't protect the OS we are actually using this way: if we set it to Offline (is that even possible?), we would never be able to boot into it again through the normal procedure.


Yes, of course, you can't set the disk you are booting from to Offline.
But if you are testing dangerous settings, it's always better to use a "disposable" installation.
I strongly recommend a Windows2GO installation running from USB, on a pendrive or, better, an SSD.
I've managed to corrupt my install only once in 3-4 years.


----------



## mongoled

ManniX-ITA said:


> Yes, of course, you can't set the disk you are booting from to Offline.
> But if you are testing dangerous settings, it's always better to use a "disposable" installation.
> I strongly recommend a Windows2GO installation running from USB, on a pendrive or, better, an SSD.
> I've managed to corrupt my install only once in 3-4 years.


I miss the days of a "real" BIOS, when even the cheapest motherboards offered the option to disable different onboard peripherals, be it SATA or onboard sound etc.!


----------



## mongoled

@ManniX-ITA
No no no, sorry, I tried this method, but the installer from USB still writes information to the connected disk's boot loader even though it was set to "offline".

This is what I expected to happen; I was hoping I'd be wrong...

This means that the SATA SSD I am actually installing the OS onto does not get any bootloader info; instead, that info is added to one of the connected NVMe drives.


----------



## ManniX-ITA

Yes, at boot and driver initialization the disks are still accessed.
What exactly are you trying to do: a new install on an M.2 drive that is not your primary?


----------



## mongoled

ManniX-ITA said:


> Yes, at boot and driver initialization the disks are still accessed.
> What exactly are you trying to do: a new install on an M.2 drive that is not your primary?


Thanks for your interest. I do understand you want to help, but I have solutions for what I want to do, so I don't really need any guidance; I just cannot get my head around why manufacturers don't let us disable peripherals like we used to in pre-UEFI BIOSes.

I will answer your question, though, as it may be useful for someone else, but some details are needed!

nvme0 has one OS and is GPT
nvme1 has two OSs and is MBR (Windows followed by Ubuntu)

I want to play with Windows 11, so I create a USB drive; everything is fine and dandy.

I add a SATA drive (SATA0) for the Win11 install; everything looks as it should.

But the Windows installer has added its boot loader info to nvme0 but not to SATA0.

This means that whenever I boot nvme0, I am greeted by the Windows boot screen asking which install I would like to use; choosing an OS that was not previously chosen results in a second reboot to access it, which is lost time...

Now, this is not an issue for me as I use the Ubuntu bootloader for my OS, but it's still a pain in the arse.

And I know how to fix any issues after removing one of the disks that is in the Windows boot loader, but that's not the point. We should have easy access to disabling stuff via the motherboard, not have to jump through hoops just because we want to try a new OS on a separate disk...


----------



## craxton

mongoled said:


> Sorry for the off-topic, but as there are users here on boards other than MSI, I'd like to ask if any of you have options to disable PCIe/SATA slots in your BIOS

(EDIT) If your unlocked BIOS does not have it, then grab one from here (Eder's BIOS mods?),
which is what I was running (160, 1.2.0.2), but since then I'm running the "normal"
BIOS.

If you're running the "unlocked" BIOS, then you have options in there to turn off
SATA drives (mine had like 30; might not have been 30 per se, but it did have a HIGH number
of USB, SATA and other things like that).
Here is where it's at.
Unsure if that's what you mean, however, but the options to turn off SATA are there.


----------



## ManniX-ITA

mongoled said:


> Thanks for your interest I do understand you want to help and I have solutions for what I want to do so dont really need any guidance, just cannot get my head round why manufacturers dont allow us to be able to disable peripherals like we used to be able to do in pre UEFI BIOS.


Got it; yes, it's useful info here too, if someone wants to experiment with high FCLK without messing with their primary install.

Indeed, if you want to prevent the installer from writing boot information to them, the disks need to be unplugged or their channel disabled (which is not always available).
I guess the manufacturers decided it was less risky than having users accidentally disable their disks and then complain...

My suggestion was more for testing unstable configurations.
The Win2Go installation does not touch the other disks at all; it only shows up as an available Windows Boot Manager UEFI boot choice if the disk is plugged in before POST.


----------



## mongoled

craxton said:


> (EDIT) If your unlocked BIOS does not have it, then grab one from here (Eder's BIOS mods?),
> which is what I was running (160, 1.2.0.2), but since then I'm running the "normal"
> BIOS.
> 
> If you're running the "unlocked" BIOS, then you have options in there to turn off
> SATA drives (mine had like 30; might not have been 30 per se, but it did have a HIGH number
> of USB, SATA and other things like that).
> Here is where it's at.
> Unsure if that's what you mean, however, but the options to turn off SATA are there.


NVMe??


----------



## craxton

mongoled said:


> NVMe??


The only thing I've seen related to turning "off" an NVMe drive is disabling its drivers,
but then again Windows doesn't allow much of that anymore.

-Next time I'll be sure to check the entire question instead of just the first part.-
If the option to "turn off" NVMe drives was in there, I didn't record it,
as the images I shared here were captured to find the "differences" between the two BIOS revisions,
but without the second one being unlocked/unhidden, it's hardly something I can try.

It doesn't seem that manufacturers "anywhere" believe this needs to be there.
I wish it were, as I wouldn't have to remove the NVMe drive(s) every time I want to install Windows again.

So, a question for you both: since I'm booting into Win 11 and "trying" (and failing) to
get 3800 C14-14 stable, can it corrupt my other drives while it's "failing", since they aren't disabled in the BIOS????


----------



## mongoled

craxton said:


> The only thing I've seen related to turning "off" an NVMe drive is disabling its drivers,
> but then again Windows doesn't allow much of that anymore.
> -Next time I'll be sure to check the entire question instead of just the first part.-
> If the option to "turn off" NVMe drives was in there, I didn't record it,
> as the images I shared here were captured to find the "differences" between the two BIOS revisions,
> but without the second one being unlocked/unhidden, it's hardly something I can try.
> 
> It doesn't seem that manufacturers "anywhere" believe this needs to be there.
> I wish it were, as I wouldn't have to remove the NVMe drive(s) every time I want to install Windows again.
> 
> So, a question for you both: since I'm booting into Win 11 and "trying" (and failing) to
> get 3800 C14-14 stable, can it corrupt my other drives while it's "failing", since they aren't disabled in the BIOS????


Initial question below:



mongoled said:


> Sorry for the off-topic, but as there are users here on boards other than MSI, I'd like to ask if any of you have options to disable PCIe/SATA slots in your BIOS?
> 
> I have never seen such options on my MSI boards and am wondering why AMD (or the board manufacturers) don't offer this function in the BIOS.
> 
> For example, when playing with different OSs, sometimes I want to make sure that certain M.2 slots are disabled so that nothing can be written to a particular SSD/HDD.
> 
> Currently I have to physically remove/disconnect the drives, be it NVMe or SATA. It's just plain stupid not having such an option in the BIOS of premium motherboards, especially if you are watercooling and things are not easily accessible...


Thanks for confirming, but we are MSI users, so I was hoping users of ASRock, Gigabyte and Asus would chime in.

Regarding the "driver" you found: I think you found that in the CBS menu, and it has to do with tweaking the actual PCIe "hardwire".

Google "PCIe redriver j3600".

The first hit is a good read (Astera Labs).

I am guessing that is what you are referring to.


----------



## Veii

mongoled said:


> NVMe??


Which board & BIOS do you run?
You can try this method, or rather I can assist you ~ I just need to know which BIOS you have so I can decompile part of it.

Asrock X370 Taichi Overclocking Thread (www.overclock.net)

RU will allow you to change ~everything~ as long as you have write access.
Figuring this out alone is a good contribution.
But I think you should have it. Only ASRock did this AMI variable-lock nonsense.

Quote me on that, as my ASUS board isn't set up.
But you could disable them temporarily, the same way you can "just disable" PCIe headers.
That's probably the way to do it.
Only SATA goes through the PCH and has a separate "disable" flag.
The rest has to be done directly from CBS, via the IO die.

You mentioned you have two NVMe drives, right?
That is sadly common with Windows.
Windows 11's BCD is slightly different from 10's, so even on the same drive it will install two versions of it (it's a mess),
but I get what you're talking about: it does jump drives for no reason.
Though I am nearly sure there was a CMD command (Shift+F10) to enforce the EFI directory.

Mostly Hackintosh users with Clover were doing this, because of Windows' bootloader with its 100MB EFI partition versus UNIX's 200MB partition ~ both had different GPT partition table "alignments?"
Anyhow, Windows refused to install if macOS was on the drive, because the EFI partitions mismatched. So there was a command to enforce and change the bootloader location ~ otherwise it autodetects all plugged-in drives.

Quote me on that last part, but I think if you remove the drive letters in diskpart before clicking "Next" (before it initializes them),
the bootloader shouldn't jump drives.
Sadly I cannot verify that, as I have only 1 drive for now ~ but yes, I think this last part can work too.
Though if you want to give RU a try, let me know ~ so I can assist you in case something is fishy.
Changes made with it are permanent (well, semi-permanent): they persist until overwritten by a CMOS reset ~ and if not overwritten, they are fully permanent.
(It can brick BIOSes, so be aware & maybe let me look over your shoulder the first time.)


----------



## craxton

@Veii
What kind of "donation" would it take for you, or @KedarWolf,
to unlock/unhide the "next" and "current" BIOS for the B550 Gaming Edge WiFi board?

If you're interested, you can either respond here or DM me a figure
along with your PayPal.
I've seen and read A LOT of "how-tos", but I'm unfamiliar with DOS and with using flash drives for anything beyond Windows installs
and saving files.
All I'm after is a configured, unlocked BIOS, as "fully" unlocked as one can get it, for the beta BIOS that's on the page now
and for the next revision. I'm unsure who else I could tag; I already tried the person who first unlocked the last BIOS release, but
he was 'un-responsive'.



Veii said:


> Quote me on that last part, but I think if you remove the drive letters in diskpart before clicking "Next" (before it initializes them),
> the bootloader shouldn't jump drives.


As it stands: when I was "re-installing" a few months ago to "test" my 100% WHEA-19-free issue (it wasn't an issue; I actually was 100% free),
I had turned off and removed the drive letters entirely, but where Windows was "already" set up on one drive, it stayed there
and didn't set up a new "boot" partition. Even though that drive was not configured and couldn't be seen inside Windows, a new "partition"
for "boot/EFI" was NOT made, so I had to remove ALL drives to make Windows re-install the EFI system etc. on the new drive.


----------



## mongoled

Veii said:


> Which board & BIOS do you run?
> You can try this method, or rather I can assist you ~ I just need to know which BIOS you have so I can decompile part of it.
> [...]
> (It can brick BIOSes, so be aware & maybe let me look over your shoulder the first time.)


The info is very useful, but IMHO nobody should have to go through all these steps just to disable such things; it should be a given!

I managed to do what I wanted, but it took me a couple of hours instead of 2 minutes.

Re Mac/Windows: sounds like a similar issue to wanting to dual boot Windows/Ubuntu off the same disk.

I have set up my benching OS and an Ubuntu installation on a 256GB NVMe using MBR; that was a pain in the arse to get right, particularly since I wanted the Ubuntu boot loader to be only on that disk!

Anyhow, I didn't want to drag this thread so far off-topic. I know many of you wanted to offer your assistance, and you did; I just wanted to hear whether MSI is the only one lacking such a simple feature as disabling SATA/NVMe or any other peripheral attached to the motherboard...


----------



## Eder

craxton said:


> Oh no boss, that's not what I meant lol. I meant that when one sets 3-3-15 or 4-4-18 in the "main RAM overclocking"
> section, 3-3-15 then gets set as 3-3-F and 4-4-18 gets set to 4-4-12.
> It's been this way on my board since the first time I used setup timings.
> 
> (If I set the main page voltages to "auto", the AMD section shows CCD and IOD voltages
> PRESET TO 1150 (edit/corrected); will check and fix, but that's quite HIGH.)
> 
> As some of you will notice, I was pretty far back in the thread (you're getting a +1 from me),
> and I tried LOTS of stuff that worked for others but wouldn't work for me, for example
> 56-56-56 and 51-0-0 for 3800C14 (the DIMMs overheated with 3-3-15 at 1.55).
> And it would seem that while others could easily use tRDWR 8 and tWRRD 3 or 1,
> I could ONLY post with tRDWR 10 and tWRRD 4 (these are indeed SR sticks, but they act like DR).
> 
> To that extent, while running the "old" 4GHz memory OC from the 5600X on this 5800X,
> I changed only the 4000 to 3866 (1:1:1 ALWAYS) and got WAY fewer WHEA 19s than
> an hour beforehand (I also added 3-3-15, tCKE 9); with 1.4V and 4-4-18, tCKE 11, the WHEAs FLOODED.
> 
> 
> Sigh, that BIOS has everything unlocked,
> but I get way more issues running FCLK 2000 for whatever reason.
> But since I'm testing, I might as well (might even try the old BIOS I stayed with for a while).
> I asked Eder to unlock the latest, but 🤷‍♂️, and I myself tried every "old" guide I saw...


I can have a look next weekend; please remind me (PM). I have been very busy for a couple of months. Years of tweaking got me a really cool IT job 😁, so lots of new stuff at the moment.


----------



## mongoled

Eder said:


> I can have a look next weekend; please remind me (PM). I have been very busy for a couple of months. Years of tweaking got me a really cool IT job 😁, so lots of new stuff at the moment.


Congratulations dude 😊😊


----------



## craxton

@anyone of y'all who's been running this since you had WHEA 19 issues:
Have you noticed anything strange occurring?
Have you needed to increase voltages on IOD/CCD/cLDO_VDDP
to keep 4000/2000 stable (or whatever IF clock you're running)?
I've managed to DRASTICALLY lower my WHEA 19s at IF 2000 with some IOD/CCD voltage,
and to knock out the USB/DISPLAY/AUDIO dropouts completely with either that (unsure if it was the IOD/CCD voltage)
or an updated BIOS/it being unlocked.
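The "4000/2000" shorthand pairs a DDR4 data rate with the fabric clock. Because DDR transfers twice per clock, DDR4-4000 means a 2000 MHz memory clock (MCLK), and in coupled 1:1:1 mode the memory controller clock (UCLK) and FCLK run at that same clock. A minimal sketch of that arithmetic, using nominal values only:

```python
def coupled_clocks(ddr4_rate_mts):
    """Nominal clocks (MHz) for a DDR4 data rate (MT/s) in coupled 1:1:1 mode.

    DDR transfers twice per clock, so MCLK = rate / 2; in 1:1:1 mode the
    memory controller (UCLK) and Infinity Fabric (FCLK) run at that same clock.
    """
    mclk = ddr4_rate_mts / 2
    return {"MCLK": mclk, "UCLK": mclk, "FCLK": mclk}


if __name__ == "__main__":
    # Data rates discussed in this thread.
    for rate in (3800, 3866, 4000):
        print(rate, coupled_clocks(rate))
```

So DDR4-3800 corresponds to FCLK 1900, DDR4-3866 to FCLK 1933 and DDR4-4000 to FCLK 2000, which is why FCLK 2000 and "4000" are used interchangeably here.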


----------



## Comalive

At FCLK 2000, I get WHEA 19 spam. I pass all the stress tests, but I get a reboot every ~10 hours of uptime, independent of the load at the time.
Do you think this tool is for me?


----------



## ManniX-ITA

Sorry, summer break 



Comalive said:


> Do you think this tool is for me?


No, sorry, it won't help.
You need to fix the reboot issue first.
I would start by checking the CO (Curve Optimizer) counts.



craxton said:


> Have you needed to increase voltages on IOD/CCD/cLDO_VDDP
> to keep 4000/2000 stable (or whatever IF clock you're running)?


Yes, I need higher IOD/CCD voltages (1140/1080 mV) for FCLK 2000 to be stable and performant.
Those are too high for FCLK 1900, where the best values are 1100/1060 mV.
No changes to VDDP.


----------



## GRABibus

Hello,
I just discovered this very interesting thread.
A simple question: does installing this tool remove the WHEA 19 warnings in Event Viewer without creating performance penalties?

If yes, I just wonder why all Ryzen owners don't use it to increase FCLK > 1900MHz 😊


----------



## ManniX-ITA

GRABibus said:


> A simple question: does installing this tool remove the WHEA 19 warnings in Event Viewer without creating performance penalties?
> 
> If yes, I just wonder why all Ryzen owners don't use it to increase FCLK > 1900MHz


Yes, but only the performance penalties due to the massive number of WHEA events.
If the CPU is unstable at high IF and crashes, or is slower, it doesn't help.

My 5950X works best at CCD/IOD/SOC voltages of 1050/1080/1180 at FCLK 1900.
That's not enough for FCLK 2000, which needs 1080/1140/1250.

So with lower voltages, going up can be slower or unstable.
I also need a specific option in the AMD PBS menu (which is usually hidden), CLKREQ#, set to Enabled; otherwise I get performance regressions.

That's why not everyone is running at high FCLK: it can be slower or unstable, or you might feel uncomfortable with those high voltages.

But if it works, it works quite well and it's pretty darn fast!
It also fixed my unstable USB issues for good, which is a nice cherry on top 

I found that testing with a Monero miner (thanks to @Veii) is the best method to verify performance regressions:

Releases · fireice-uk/xmr-stak: Free Monero RandomX Miner and unified CryptoNight miner (github.com)

----------



## Veii

GRABibus said:


> If yes, I just wonder why all Ryzen owners don't use it to increase FCLK > 1900MHz 😊


That's what I wonder about AMD's side too 
When will they fix this ~ as dual CCD (6,8 cores) don't have WHEA #19 = "LCLK DPM towards IO" issues?


----------



## Arashi

Edited for duplicate post.


----------



## Zogge

Question: if you disable one CCD on the 5950X, would that decrease or eliminate the number of WHEA 19s at higher FCLK? A waste, yes, but still.


----------



## ManniX-ITA

Zogge said:


> Question: if you disable one CCD on the 5950X, would that decrease or eliminate the number of WHEA 19s at higher FCLK? A waste, yes, but still.


Not on mine; it didn't change a bit.


----------



## Arashi

Hi everyone,

I recently built a new rig in late August with a 5800X and an Asus Strix B550-F ATX mobo. I attained a stable OC at both 3800 and 4000MHz, but 2000 FCLK results in a constant bombardment of WHEA #19 errors (hundreds in a matter of minutes), so I started searching for answers and found this thread.

I tried to follow the discussion in the memory stability thread to understand the underlying issue, and it seems the problem lies with the Ethernet chip? It's kind of difficult to grasp what's really going on by digging through a 4-month-old discussion, and I apologize if I misunderstood @Veii's findings.

I am on the latest BIOS version and did not observe any peripheral problems throughout my testing. Is this workaround by ManniX the only thing I can do to "fix" the problem?

Upon further reading, it seems to me that my mobo with the I225-V is not supposed to have this problem, or am I reading it the wrong way? Some of you mentioned Intel supposedly fixed this with a B3 stepping. Or does the fix he was referring to relate to an entirely different issue? My mobo has revision B0, so did I just get unlucky and buy a mobo that doesn't have the fix?

P.S. Reposting since the original post got flagged for mod review, with no action taken, for apparently no reason.



ManniX-ITA said:


> It could be that the Realtek NIC is only one of many problems... or that it is not the root cause but a victim of something else.
> But it seems Intel fixed something while Realtek didn't.





DeletedMember558271 said:


> I don't know what Intel fixed with I225-V other than the connection problems it was having, when they released the fixed B3 stepping.


----------



## ManniX-ITA

Arashi said:


> Upon further reading, it seems to me that my mobo with the I225-V is not supposed to have this problem, or am I reading it the wrong way? Some of you mentioned Intel supposedly fixed this with a B3 stepping. Or does the fix he was referring to relate to an entirely different issue? My mobo has revision B0, so did I just get unlucky and buy a mobo that doesn't have the fix?


Unfortunately, we weren't able to confirm or rule out any hypothesis.
The massive flood of WHEA 19s may or may not be triggered by the Realtek or Intel NIC, or by something else.

What I can say is that, despite the massive number of errors, I'm 100% stable with no performance degradation.
BUT in my case I could achieve it only with the CLKREQ# option enabled in the AMD PBS menu.
It may not be needed in all cases, but if you have the PBS menu it's worth checking out.

So my software helps if you are 100% stable.
The next step is to check whether you have any performance degradation.

The basic step, for me, is to take a Geekbench 5 baseline at FCLK 1900.
Then switch to FCLK 2000 and compare.

You'll probably need higher SOC, CCD and IOD voltages.
With my 5950X I had to bump them from 1.17V/1020mV/1100mV to 1.22V/1080mV/1140mV.
A bump in LLC and PWM settings may be needed as well.
Testing with GB5 brought me to CCD 1050mV and IOD 1120mV.

Once you have the same or better scores, compare with the Monero miner.

Releases · fireice-uk/xmr-stak: Free Monero RandomX Miner and unified CryptoNight miner (github.com)

Take a baseline at FCLK 1900 here as well and compare the hashrate at FCLK 2000.
This led me to CCD 1080mV and IOD 1140mV.
Better scores than at FCLK 1900.
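The procedure above (a baseline at FCLK 1900, then a comparison at FCLK 2000, first with Geekbench 5 and then with the xmr-stak hashrate) boils down to a percent-change check on a higher-is-better metric. A minimal sketch, assuming you record a few runs per configuration yourself; the 1% noise tolerance is an arbitrary assumption, not a figure from this thread:

```python
from statistics import mean


def percent_change(baseline_runs, test_runs):
    """Percent change of the mean test result vs. the mean baseline result.

    Works for any higher-is-better metric, e.g. a Geekbench 5 score or an
    xmr-stak hashrate. Positive means the test config (e.g. FCLK 2000) is faster.
    """
    if not baseline_runs or not test_runs:
        raise ValueError("need at least one run per configuration")
    base = mean(baseline_runs)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return (mean(test_runs) - base) / base * 100.0


def is_regression(baseline_runs, test_runs, tolerance_pct=1.0):
    """True if the test config is slower than baseline by more than tolerance_pct.

    The 1% default is an arbitrary assumption meant to absorb run-to-run noise.
    """
    return percent_change(baseline_runs, test_runs) < -tolerance_pct
```

Averaging several runs per configuration is the point of taking lists rather than single numbers: with one run each, a difference of a percent or two can easily be noise rather than a real regression.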


----------



## Arashi

ManniX-ITA said:


> So my software helps if you are 100% stable.
> Next step is to check if you have any performance degradation.


Thanks a lot for your advice. I am still using my old 1070 in the new rig, so the system is severely GPU-bound and it probably doesn't make much sense to go through all the trouble of pushing 4000+ at this point. But I will definitely consider trying this out once I get my hands on a new GPU.


----------



## Veii

@Arashi
WHEA #19 has to do with something along the lines of the PCH (though not always only the PCH).

It is also connected to first-batch units, which lack the function of a specific sensor or have it "bugged out".
That's the minimum requirement ~ luck with the CPU.

The latter part is an AGESA-side & module FW issue. A design issue.
You can find a similar one with Intel's revision 2 2.5Gbit NICs, which show the same issue.

So far I would call it a design issue rather than a FW issue. A hardware design issue ~ as both parts, the CPU and the PCB NICs, are hardware-side.

PCIe dropouts at and beyond 2000 FCLK are known, but so were/are USB dropouts, audio crackling and general PCH connectivity issues.
So too, in some cases, M.2 dropouts and instant shutdowns ~ along with Spectre v5 issues and experimental patches.

The "issue" lies somewhere in the adaptive link speed management & inside some of the sensors. That is what is known so far on this forum.
Samples continue to be collected, but, as ManniX-ITA mentioned, it's not 100% clear, because the issue is a combination of issues. The core issue is surely hardware-based, but it's unclear what to point to ~ in the sea of side-issues reacting with each other.

USB dropouts and PCIe driver crashes were all (mostly) a side-product of the Spectre patches, which were in turn a buggy side-product of redoing power management. That also meant enabling the experimental dLDO_Injector sensor (too early) and being forced into redoing power management (forced, thanks to the branch prediction exploit issues).
All of these started to cause PCH connectivity issues ~ each of which can easily trigger WHEA #19 as an "unknown issue".

The problem now:
At first, pre-Patch D, there was GMI speed overdrive (likely to compensate for the patches, and again a bit more IPC from Patch D onward).
Patch C enforced a 1900 FCLK hard cap. Luckily the community was already able to push beyond this, and it was dropped. The same thing was/still is enforced on Matisse ~ for pretty much the same reason (PCIe and I/O dropouts).
~ This, too, is a factor in potential WHEA #19s.

PCIe 4.0 and NVMe 4.0 work as of today. We have 4 options for load-balancing them:

SRIS (AMD CBS)
DPM LCLK (AMD OVERCLOCKING, autodetection for the PCIe 4.0 gear)
Normal DPM and GMI link speeds, which adapt and adjust themselves
There is one setting for PCIe stability, but I forget its name right now

So far, after 1.2.0.0, CPUs do use DPM LCLK balancing, and have to use it ~ otherwise the thermal sensor and CCA/EDC sensor drop out.
(This does not bypass the PROCHOT throttle on 6-core units, but it does bug out if it "isn't" used.)

PCH.SPI and PSP ~ got encrypted (well, the PSP always was).
The PCH on its own also changed functionality after 1.2.0.0 ~ aside from its own power-management attempts, it got locked down to prevent PCI MUTEX injection attacks.
Logically, this also caused issues during the transition period from 1.1.8.X to 1.2.0.3/4, and bricked some boards ~ because the ROM flashing "methodology" changed.

The most common WHEA #19 causes are Ethernet & Single Virtualization Link (PCIe ~ UAD suffers from these, for example) issues, IF! you are free from SATA and PCIe 4.0 dropout issues.
Beyond that you can hardly tell, as 1.2.0.3C still has some of the side effects ~ some of the collective issues, & the CPU still has the overboost bug and sometimes either pushes 50GHz or requests +1.68V vCore.
So some sensors are still bugged ~ sadly I haven't found more information about the sensor values, so I should not quote voltage values ~ I shouldn't speak about them.

It will take time until "everyone" is WHEA-free.
They have so much to fix, and PCH issues technically are not related to CPU performance.
It's good to have this tool here for people who need it, but it's also worth considering that some old Realtek firmwares were buggy, and "some" I225-V 2.5 Gbit NICs are still not bug-free, as board partners mixed revision 2 (broken, on Intel's side) with revision 3 (not broken) across boards.

There are still far too many issues to nail it down to one, but it's improving.
So is potential peak FCLK stability, when WHEA #19 is not a factor (or the PC doesn't reboot, hopefully meaning it's not an M.2 or SATA issue).


----------



## ManniX-ITA

Arashi said:


> Thanks a lot for your advice. I am still using my old 1070 on the new rig so the system is severely GPU bound and probably doesn't make too much sense to go through all the trouble to push 4000+ at this point. But I will definitely consider trying this out once I got my hands on a new GPU.


Well, I'm not sure it would make sense even once you're no longer GPU bound.
I had a GTX 1070 and just switched to an RTX 3090. It's not as if at FCLK 2000 it runs any faster than anyone else's at FCLK 1900.
Maybe for some very heavy professional workloads, but I didn't really do an A/B test.
The thread title is kind of ironic 

The system is a bit more responsive, and where memory matters it's a bit faster.
We are talking about a few percentage points.
The only really substantial advantage for me is that the USB issues are gone 

It's a challenge and I love to run my CPU at the limit and over.
But that's the most of it!


----------



## mongoled

I also get occasions where no WHEA 19s appear at high FCLK; it happens when I've changed some BIOS setting or reloaded a config and it's the first POST/boot.

No idea why this happens from time to time ....


----------



## Arashi

Veii said:


> @Arashi
> WHEA #19, has to do with something along the lines of the PCH (not always PCH only)


Really appreciate the detailed explanation. I don't even know how to convey my gratitude for such an informative reply. Thank you!



Veii said:


> Beyond that you can hardly tell, as 1.2.0.3C still has some side effects of the collective issues, and the CPU still has the overboost bug: it sometimes either pushes 50ghz or requests +1.68vCore.
> So some sensors are still bugged. Sadly I haven't found more information about the sensor values, so I shouldn't quote voltage values or speak about them.


I am running on CO -25 all cores and TDC limit set to 80A to lower CPU temps.
When the TDC limit is not reached, my system is able to maintain 4.85GHz at ~1.3V under full load.
I haven't messed with single-thread boosting yet, and I am not sure whether the overboost bug you mentioned can override the PPT/TDC/EDC limits.
Even though I haven't observed Vcore go past 1.5V so far, I am not entirely sure whether software readings can pick up the spikes from the bug you mentioned, and that got me a little worried.



ManniX-ITA said:


> It's a challenge and I love to run my CPU at the limit and over.
> But that's the most of it!


I feel ya. TBH I am really tempted to do it just for science and see how far I can get. My system does boot into Windows at 2100 FCLK, but I know stabilising it is probably going to require voltages that are way out of my comfort zone.

On the other hand I know that 2000FCLK is very achievable barring the WHEA #19 issue. And that "free" performance is truly tempting.


----------



## Veii

Veii said:


> It will take time until "everyone" is WHEA-free.
> They have so much to fix, and PCH issues technically are not related to CPU performance.
> It's good to have this tool here for people who need it


PCH issues, i.e. the error spam, wouldn't matter if you know that the CPU is rock stable,
the storage is stable, onboard audio doesn't crackle and NVMe doesn't drop out on bandwidth tests.

Then you can disable the spam.
The PCH will always cause errors and will always keep working with them.
If the PCH crashes, the whole PC shuts down, but you'll first notice it through graphical glitches or corrupted storage/BSODs.

The only thing you should always keep an eye on is memory latency, as EDC throttle and package throttle will autocorrect (slow down) L1, L2, L3 and write bandwidth.
So miners are a good benchmark. AIDA64 shows it, the SiSoftware Sandra inter-core test shows it, and so does Geekbench. You'll have fun 


Arashi said:


> Even though I haven't observed Vcore go past 1.5V so far, I am not entirely sure whether software readings can pick up the spikes from the bug you mentioned, and that got me a little worried.


This is Hydra, but it's not Hydra's fault; it's rather overboost (a bug, not a feature) on suspended, deep-sleeping cores that wake up.
1.68V and 5.9GHz is kind of "toasty" as a little peak.
And both the HWiNFO and ACPI readouts are correct.
But apart from the voltage passing the FIT VID mark (1.45V or 1.55V) on PBO, this only peaks in frequency, not in voltage.
Nevertheless, AMD still has a lot to fix. So if you really want to push beyond your limits with a troublesome board, this tool does help.
Or find first-batch November dual-CCD 6 & 8 cores. They don't have such issues, but are plagued with a bad V/F curve that needs a small positive vCore offset to function.


----------



## Arashi

Veii said:


> PCH issues, i.e. the error spam, wouldn't matter if you know that the CPU is rock stable,
> the storage is stable, onboard audio doesn't crackle and NVMe doesn't drop out on bandwidth tests.


When you say bandwidth tests you mean stress-testing the entire PCIe 4.0 lane right? 



Veii said:


> 1.68V and 5.9GHz is kind of "toasty" as a little peak.


Oh man, what can I say.
YIKES!!!
Part of me kinda wishes I'd held on and waited for 12th gen, now that all of these are piling up on top of each other.
But playing around with AMD after so long is kinda fun!


----------



## Veii

Arashi said:


> Part of me kinda wishes I'd held on and waited for 12th gen, now that all of these are piling up on top of each other.
> But playing around with AMD after so long is kinda fun!


Alder Lake is scheduled for Q2 2022
The Zen3D schedule is unknown, but it's still this year

If Zen3D doesn't turn out great (I doubt it), then sure, but Alder Lake shouldn't be compared to Vermeer the way current media does it.
Nor should we rely on leaks. Although I've heard myself that it's going to be great, I've also heard the same about Zen3D.

Calling something buggy here is kind of complaining on another level.
CPUs all function as they should at stock; expecting to run 2100 FCLK is kind of over-expecting.
We can't expect Intel to run this in Gear 1 either, so nobody should complain here.


Arashi said:


> When you say bandwidth tests you mean stress-testing the entire PCIe 4.0 lane right?


AIDA64 cache bandwidth 
Full-range PCIe stability will show itself in 3DMark and OCCT memory error tests


----------



## Bal3Wolf

ManniX-ITA said:


> Unfortunately we weren't able to confirm or deny any hypothesis.
> The massive flow of WHEA 19 may or may not be triggered by the Realtek or Intel NIC, or by something else.
> 
> What I can say is that despite the massive amount of errors I'm 100% stable, with no performance degradation.
> BUT in my case I could achieve it only with the CLKREQ# option enabled in the AMD PBS menu.
> It may not be needed in all cases, but if you have the PBS menu it's worth checking out.
> 
> So my software helps if you are 100% stable.
> The next step is to check if you have any performance degradation.
> 
> The basic step for me is to take a Geekbench 5 baseline at FCLK 1900.
> Then switch to FCLK 2000 and compare it.
> 
> You'll probably need more SOC, CCD, IOD voltages.
> With my 5950x I had to bump from 1.17V/1020mV/1100mV to 1.22V/1080mV/1140mV.
> A bump in LLC and PWM settings could be needed as well.
> Testing with GB5 brought me to CCD 1050mV and IOD 1120mV.
> 
> Once you have the same or better scores, you have to compare with the Monero miner.
> 
> Releases · fireice-uk/xmr-stak (github.com): Free Monero RandomX Miner and unified CryptoNight miner - fireice-uk/xmr-stak
> 
> So take a baseline as well with FCLK 1900 and compare the hashrate at FCLK 2000.
> This led me to CCD 1080mV and IOD 1140mV.
> Better scores than FCLK 1900.



Nice work on the tool and the info you gave us. I installed the modded BIOS on my Strix-E and enabled CLKREQ#. I'm able to run 1933 now; I still can't get up to 2000, but that's a little more performance with no memory errors, and I don't need WHEAService at 1933 with 4 x 16GB DDR4-3600 chips.


----------



## Bal3Wolf

I've been sort of goofing around with a 4000/2000 setup. I seem to be able to get the memory stable with decent bandwidth and latency, but I noticed I need to push more vCore than I need at 3800/1900MHz. I've figured out that HandBrake doing 3 encodes is great at finding issues: it passed 40 minutes of memtest, then poof, it rebooted within 2 minutes once I loaded HandBrake. It seems like you need a pretty big jump in vCore with unicore at 2000, yet it looks like I can run PBO without it crashing.
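For readers new to the notation: "4000/2000" and "3800/1900" in this thread mean DDR4 data rate (MT/s) / FCLK (MHz). A minimal sketch of that arithmetic, assuming the usual 1:1 (coupled) mode in which FCLK equals MCLK, i.e. half the DDR data rate; the helper names are hypothetical:

```python
def mclk_mhz(ddr_rate_mts: int) -> float:
    """DDR transfers data twice per clock, so MCLK is half the data rate."""
    return ddr_rate_mts / 2


def is_coupled(ddr_rate_mts: int, fclk_mhz: int) -> bool:
    """True when FCLK runs 1:1 with MCLK (e.g. DDR4-4000 with FCLK 2000)."""
    return mclk_mhz(ddr_rate_mts) == fclk_mhz


if __name__ == "__main__":
    for ddr, fclk in [(3800, 1900), (4000, 2000), (4000, 1900)]:
        print(f"DDR4-{ddr} / FCLK {fclk}: MCLK {mclk_mhz(ddr):.0f} MHz, 1:1 = {is_coupled(ddr, fclk)}")
```

The last pair (DDR4-4000 with FCLK 1900) is the one that is not 1:1.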


----------



## ManniX-ITA

Bal3Wolf said:


> I've been sort of goofing around with a 4000/2000 setup. I seem to be able to get the memory stable with decent bandwidth and latency, but I noticed I need to push more vCore than I need at 3800/1900MHz. I've figured out that HandBrake doing 3 encodes is great at finding issues: it passed 40 minutes of memtest, then poof, it rebooted within 2 minutes once I loaded HandBrake. It seems like you need a pretty big jump in vCore with unicore at 2000, yet it looks like I can run PBO without it crashing.


Do you mean more vCore with a static OC?
It could also be that you need higher LLC/PWM on CPU/SOC to avoid dips.
I only need +0.0125/0.0250V depending on PBO settings.
But only to keep performance in line, not for stability.
What did you set for SOC/VDDG voltages?


----------



## Bal3Wolf

Yeah, more vCore with my static overclock, which is stable with memory at 3800. PBO was fine, it wasn't crashing. I'm using 1.050 on VDDG and IOD; I tried SOC up to 1.2 and it didn't seem to matter. My base SOC is usually right at 1.12.


----------



## ManniX-ITA

Bal3Wolf said:


> Yeah, more vCore with my static overclock, which is stable with memory at 3800. PBO was fine, it wasn't crashing. I'm using 1.050 on VDDG and IOD; I tried SOC up to 1.2 and it didn't seem to matter. My base SOC is usually right at 1.12.


Static OC is indeed more demanding than PBO.

You can check the voltages I need here:

WHEAService, WHEA errors suppressor - unleash Ryzen... (www.overclock.net)
"@anyone of yall, thats been running this since you had WHEA19 issues. have you noticed any strange things occur? have you needed to increase voltage ranges on IOD/CCD/cLDO_VDDP to keep 4000/2000 stable (or whatever IF clk your running?) ive managed to DRASTICLY lower my WHEA 19s at IF 2000..."

It's unlikely you can make it with those voltages.
Honestly I would avoid such high voltages with a static OC unless you can keep the CPU below 60c at all times.


----------



## Bal3Wolf

ManniX-ITA said:


> Static OC is indeed more demanding than PBO.
> 
> 
> WHEAService, WHEA errors suppressor - unleash Ryzen... (www.overclock.net)
> "@anyone of yall, thats been running this since you had WHEA19 issues. have you noticed any strange things occur? have you needed to increase voltage ranges on IOD/CCD/cLDO_VDDP to keep 4000/2000 stable (or whatever IF clk your running?) ive managed to DRASTICLY lower my WHEA 19s at IF 2000..."
> 
> It's unlikely you can make it with those voltages.
> Honestly I would avoid such high voltages with a static OC unless you can keep the CPU below 60c at all times.


Giving up for the day, it's driving me nuts. Last night I had a pretty stable memory config; today it BSODs almost instantly on any timings.


----------



## ManniX-ITA

Bal3Wolf said:


> Giving up for the day, it's driving me nuts. Last night I had a pretty stable memory config; today it BSODs almost instantly on any timings.


I agree it can be very frustrating 
I'd suggest small doses.
Maybe try again when a new BIOS is out.


----------



## Bal3Wolf

ManniX-ITA said:


> I agree it can be very frustrating
> I'd suggest small doses.
> Maybe try again when a new BIOS is out.



I think I figured out my screw-up: I started trying to run GDM off, and just figured out that it errors and BSODs no matter what timings I set. It was doing it even at 3800MHz, so my board just can't run 4 x 16GB chips with GDM set to off. More tweaking is needed, but it's a good start: no errors for 1.30 hrs.


----------



## mongoled

Bal3Wolf said:


> I think I figured out my screw-up: I started trying to run GDM off, and just figured out that it errors and BSODs no matter what timings I set. It was doing it even at 3800MHz, so my board just can't run 4 x 16GB chips with GDM set to off. More tweaking is needed, but it's a good start: no errors for 1.30 hrs.


If you reset your BIOS to defaults what's the highest memory frequency you can boot?


----------



## Bal3Wolf

mongoled said:


> If you reset your BIOS to defaults what's the highest memory frequency you can boot?


With no volts added, or what? It's not really clear what you're asking. BTW, here's what I've managed to get stable so far. This memory has proved to be pretty dang good; I'm still going to see how much tighter I can get it, but that's enough for now.


----------



## mongoled

@Bal3Wolf 
Sorry, should have said with GDM disabled, everything else AUTO.


----------



## MyUsername

A few days ago I found that by setting the SOC overclock VID to 38 in XFR Enhancement I was actually able to POST with 1.2V on the SOC. I didn't try it before because I didn't think it would work, and I was honestly gobsmacked: literally every other method of setting 1.2V SOC fails to POST, just giving me 00, 92 on the debug display, and then I have no choice but to clear CMOS as it gets stuck in a 00 92 loop. Playing with 2000/4000, stable in Windows 10, I get a few WHEAs, but with WHEAService it works fantastically. Windows 11 is having trouble, though. Any ideas why it's not able to work?
I can't get over the fact that this is stable, as I've got the 1900/3800 black hole (unstable af).


----------



## ManniX-ITA

MyUsername said:


> Playing with 2000/4000 stable in windows 10 I get a few wheas


Which kind of WHEA? Are they all Event 19?

You may need to set a higher VSOC, since there's only 25mV of headroom above IOD when LLC droops the SOC voltage.
You may also need more CCD; try stepping up through 1020/1050/1080mV.

I'm using BIOS version A21O because it has the PBS menu unlocked.
Without CLKREQ# enabled there it doesn't work properly for me: I get performance regressions.
You should consider it.
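A rough sketch of the headroom reasoning above, assuming VDDG IOD has to stay below the SOC voltage actually delivered under load (setpoint minus LLC droop). The droop value in the example is a made-up illustration, not a measurement; the function names and the margin you aim for are hypothetical, so read the real VSOC under load in HWiNFO:

```python
def soc_headroom_mv(vsoc_set_mv: int, llc_droop_mv: int, vddg_iod_mv: int) -> int:
    """mV between the drooped VSOC and VDDG IOD (negative means IOD is above VSOC)."""
    return (vsoc_set_mv - llc_droop_mv) - vddg_iod_mv


def min_vsoc_set_mv(vddg_iod_mv: int, llc_droop_mv: int, margin_mv: int) -> int:
    """Smallest VSOC setpoint that keeps `margin_mv` above VDDG IOD under load."""
    return vddg_iod_mv + llc_droop_mv + margin_mv


if __name__ == "__main__":
    # Hypothetical example: 1.22 V SOC setpoint, 55 mV droop, IOD at 1140 mV.
    print(soc_headroom_mv(1220, 55, 1140))  # 25
    print(min_vsoc_set_mv(1140, 55, 50))    # 1245, for a hypothetical 50 mV margin
```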


----------



## MyUsername

ManniX-ITA said:


> Which kind of WHEA? Are they all Event 19?
> 
> You may need to set a higher VSOC, since there's only 25mV of headroom above IOD when LLC droops the SOC voltage.
> You may also need more CCD; try stepping up through 1020/1050/1080mV.
> 
> I'm using BIOS version A21O because it has the PBS menu unlocked.
> Without CLKREQ# enabled there it doesn't work properly for me: I get performance regressions.
> You should consider it.


Yeah, your typical Event 19. I'm not getting performance regressions, and your WHEAService clears all the WHEAs in Windows 10. The performance regression at 1900/3800 is crazy for me: it plummets from 13500 down to 10000 and then crashes. 2000 FCLK is a lot more stable and sits at 13300 in CPU-Z. I have the PBS menu on A41, and enabling CLKREQ# neither improved nor degraded anything. If I could get WHEAService to work on Windows 11 it would be awesome, as it throws WHEAs but the performance is stable.


----------



## ManniX-ITA

MyUsername said:


> I have the PBS menu on A41, and enabling CLKREQ# neither improved nor degraded anything.


I'd say it's a very good start, but you have to check with the miner to be really sure you don't have a performance regression.
It's the only way to stress the IF enough to tell for sure.
If you are dropping from 13500 to 13300, there could be a regression; fine-tuning the 1P8 voltage could help.
In AMD PBS you can make 1mV adjustments.
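The A/B check suggested in this thread (baseline at FCLK 1900, same benchmark at FCLK 2000, compare) boils down to a percent change. A minimal sketch, assuming you log a few runs per setting and compare medians so that one noisy run doesn't decide; the function name is hypothetical, and it works the same for CPU-Z or Geekbench scores and miner hashrates:

```python
from statistics import median


def regression_pct(baseline: list[float], candidate: list[float]) -> float:
    """Percent change of the candidate median vs. the baseline median.

    Negative means the candidate setting (e.g. FCLK 2000) scores lower than
    the baseline (e.g. FCLK 1900).
    """
    base = median(baseline)
    return (median(candidate) - base) / base * 100


if __name__ == "__main__":
    # Single CPU-Z scores from this exchange: 13500 at best, 13300 at FCLK 2000.
    print(f"{regression_pct([13500], [13300]):+.2f}%")  # -1.48%
```

By this measure 13500 to 13300 is roughly a 1.5% drop; whether that is a real regression depends on how much your own runs vary, which is why the miner check still matters.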



MyUsername said:


> If I could get WHEAService to work on Windows 11 it would be awesome, as it throws WHEAs but the performance is stable.


Last time I tested it the WHEA sources would get the stop command but would not stop logging errors.
It could be that Microsoft decided to "fix" my workaround.
Are you running the same Win11 build that was shared a while ago, or a new one? Can you give me the exact version?
I can boot into Win11 again, check if there's an update and try again.


----------



## Bal3Wolf

You know, I must be lucky: it seems like I don't need a lot of SOC, IOD or CCD volts to be stable. It worked backwards: the more volts I sent, the more unstable it was. Has anyone tried the newer AIDA64? I got some weird results: my bandwidth is up on everything, but so is latency. I retested multiple times.


----------



## rossi594

@ManniX-ITA Buildzoid tested the suppressor and said on his stream that it doesn't do anything for performance. Did you do more testing that shows that it improves anything?


----------



## rossi594

Bal3Wolf said:


> You know, I must be lucky: it seems like I don't need a lot of SOC, IOD or CCD volts to be stable. It worked backwards: the more volts I sent, the more unstable it was. Has anyone tried the newer AIDA64? I got some weird results: my bandwidth is up on everything, but so is latency. I retested multiple times.


I've seen lots of people who run high FCLK without WHEAs using Hynix DJR or Samsung C-die. The current issues seem to be B-die related (some people say board size affects it as well, which would hint at PCIe).


----------



## ManniX-ITA

rossi594 said:


> @ManniX-ITA Buildzoid tested the suppressor and said on his stream that it doesn't do anything for performance. Did you do more testing that shows that it improves anything?


It's pretty obvious what it does... it helps with performance when you get thousands of WHEAs per minute and the Event Logger is processing them.
If you have this amount of WHEA under load the system becomes unresponsive.
That's why I made it; running y-cruncher my mouse was lagging 
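To put a number on "thousands of WHEAs per minute" versus the occasional one, you can count events in a sliding 60-second window. A minimal sketch, assuming you already have the Event 19 timestamps (for example exported from Event Viewer) as datetime objects; the function name and the sample data are illustrative only:

```python
from collections import deque
from datetime import datetime, timedelta


def peak_per_minute(timestamps: list[datetime]) -> int:
    """Largest number of events falling inside any 60-second window."""
    window: deque[datetime] = deque()
    peak = 0
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop events that are 60 s or more older than the newest one.
        while ts - window[0] >= timedelta(seconds=60):
            window.popleft()
        peak = max(peak, len(window))
    return peak


if __name__ == "__main__":
    start = datetime(2021, 10, 1, 12, 0, 0)
    # Illustrative data: one event per second for ten minutes.
    steady = [start + timedelta(seconds=s) for s in range(600)]
    print(peak_per_minute(steady))  # 60
```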



rossi594 said:


> I've seen lots of people who run high FCLK without WHEAs using Hynix DJR or Samsung C-die. The current issues seem to be B-die related (some people say board size affects it as well, which would hint at PCIe).


No, I can say for sure that with Hynix DJR it's exactly the same.


----------



## ManniX-ITA

Bal3Wolf said:


> You know, I must be lucky: it seems like I don't need a lot of SOC, IOD or CCD volts to be stable. It worked backwards: the more volts I sent, the more unstable it was. Has anyone tried the newer AIDA64? I got some weird results: my bandwidth is up on everything, but so is latency. I retested multiple times.


Your latency is very high; maybe you can run lower voltages since the IF isn't stressed much.
But it's definitely a very nice sample, because I need a ton more!


----------



## Bal3Wolf

ManniX-ITA said:


> Your latency is very high; maybe you can run lower voltages since the IF isn't stressed much.
> But it's definitely a very nice sample, because I need a ton more!


Yeah, it's weird with latency: I had it down to 53-56, but all of a sudden it started running higher and bandwidth went way up. Gonna need more testing. Right now I went back to 3800/1900 and my 4750/4650 static overclock at 1.23-1.26V. I redid all my rad fans to exhaust, and in CB23 I don't pass 77°C at current clocks with fans on full.


----------



## ManniX-ITA

Bal3Wolf said:


> Yeah, it's weird with latency: I had it down to 53-56, but all of a sudden it started running higher and bandwidth went way up. Gonna need more testing. Right now I went back to 3800/1900 and my 4750/4650 static overclock at 1.23-1.26V. I redid all my rad fans to exhaust, and in CB23 I don't pass 77°C at current clocks with fans on full.


If you love static OC, maybe check out Hydra. It's still a bit rusty, but it's very promising.


----------



## Bal3Wolf

ManniX-ITA said:


> If you love static OC, maybe check out Hydra. It's still a bit rusty, but it's very promising.


Yeah, I have it and have toyed with it some; I'm waiting for more work to get done on it for different types of loads. It's a nice app though: I used it to check the CO values I had, and it was pretty close to the ones I had been manually testing.


----------



## ManniX-ITA

rossi594 said:


> Buildzoid tested the suppressor and said on his stream that it doesn't do anything for performance. Did you do more testing that shows that it improves anything?


Do you remember which video?
I have updated the first post with some info.


----------



## ManniX-ITA

@Audioboxer 

Can you tell me the Windows 11 build that you're using?


----------



## Audioboxer

ManniX-ITA said:


> @Audioboxer
> 
> Can you tell me the Windows 11 build that you're using?


It's the release preview build which is supposedly going to be the same as the official launch build, 22000.194.


----------



## ManniX-ITA

Audioboxer said:


> It's the release preview build which is supposedly going to be the same as the official launch build, 22000.194.


Thanks, there's already build 22454 in the Dev channel.
I was curious whether the behavior was the same.


----------



## Audioboxer

ManniX-ITA said:


> Thanks, there's already build 22454 in the Dev channel.
> I was curious whether the behavior was the same.


Sorry, couldn't say. I never ran insider builds of Windows 10, and the only reason I jumped on this release preview was the commentary that it is the final build for release and MS was just offering it up early for those who wanted it. Possibly to help with server load.


----------



## ManniX-ITA

Audioboxer said:


> Sorry, couldn't say. I never ran insider builds of Windows 10, and the only reason I jumped on this release preview was the commentary that it is the final build for release and MS was just offering it up early for those who wanted it. Possibly to help with server load.


Thanks, no worries. I'm not up to date on the status 
I re-checked: the sources are properly Stopped, but at least ErrorSource 0 keeps logging errors as if it were still Started.
Not sure if it's a bugged feature or a featured bug to fix WHEAService.


----------



## vinz

Hello @ManniX-ITA

Thank you for your tool, which I used so much with W10.

Do you have any way to get it working on W11 too? I'm still locked to 1900 

EDIT: Sorry, I hadn't read "*Doesn't work with the Windows 11 pre-release builds which are circulating now*"

Would you be able to provide me one?

Best Regards,
vinZ


----------



## ManniX-ITA

vinz said:


> Thank you for your tool, which I used so much with W10.


You're welcome 
I have updated the disclaimer in the first post.
I've also reported the bug to MS, but I don't expect it'll be fixed.
More likely it's intentional, to force users to get WHEA errors whatever they wish.


----------



## vinz

Thanks for your reply, that's ****ing bad news!
I hope we'll have a workaround in order to have this working


----------



## Audioboxer

Was just checking in to see if it was going to be possible with Windows 11, boooo lol.

But I guess the whole of Windows 11 is basically a performance penalty at this point lol. Early access OS.


----------



## Audioboxer

Hey guys, looking for some advice on fixing USB issues at FCLK 2000 (the same behaviour happens at 1933, just to add)

Here is a quick look at voltages. I can run memory stability tests (TM5/Karhu) for hours, no issues. General desktop use is absolutely fine, no issues. When I run CPU tests such as OCCT or y-cruncher (test 17 is a prime candidate) I end up with USB disconnects. Commander pro and my mouse tend to get hit the hardest.

I was doing most of my testing on Windows 11, but I've just come back to Windows 10, primarily to be able to use WHEAService. The Event 19 spam is quite bad for me.

Something I noticed last night was that when running OCCT or y-cruncher my mouse basically becomes unusable/laggy before it eventually disconnects. If I run those tests at 1900/3800 there is no mouse lag at all. I've also noticed I seem to be very sensitive to IOD voltage, as in, putting it too high (even going above 1.05v) can begin to introduce issues such as audio crackling or mouse lag when idle/on the desktop. I seem to be finding that lower IOD voltages _help_ more? 1.05v is where I run it on my 3800/1900 profile.

As per @ManniX-ITA's advice, CLKREQ# is turned on in my MSI BIOS. No idea what it does, but it seems to help with stability. I have no FCLK holes and no problems booting right up to 2100 (highest I've tried); I'm just really struggling to fix the USB issues and be able to run CPU stability test apps. Funny thing is there are no CPU crashes: OCCT will continue to run/pass, as will y-cruncher, but USB goes crazy whilst they're running. I've not had any red WHEA errors either, it's just Event 19 spam.

Because it really messes with my commander pro (lose profiles and sometimes it even forces itself to do a FW update it doesn't need to) I'm reluctant to let the CPU stability test run long. Around 20 minutes in OCCT is my record so far before I noticed my mouse and commander pro disconnect. Usually though it happens within minutes. Can even be seconds depending on voltages and if CLKREQ# is enabled.

Onboard LAN is disabled (I use wireless) and GPU is set to PCIE 3.0. I've got a single NVMe drive connected. USB devices are mouse, keyboard, commander pro, Xbox wireless adapter and a hub. Commander pro is connected to the mobo USB connector which is USB 2.0.

I guess my plea is anyone who has managed to "fix" USB issues above 1900 FCLK, what was everything you had to do?


----------



## ManniX-ITA

Audioboxer said:


> , but USB going crazy whilst running.


That's usually because VDDG is too low.
Consider that I need CCD at 1080mV and IOD at 1140mV for perfect stability and no drop in performance at high load.


----------



## Audioboxer

ManniX-ITA said:


> That's usually because VDDG is too low.
> Consider that I need CCD at 1080mV and IOD at 1140mV for perfect stability and no drop in performance at high load.


Funnily enough it lessened with VDDG going lower. If I try running VDDG high, along with VSOC high, like your settings, I get worse results. 5950x. 20 minutes was my longest run of OCCT CPU test before a USB disconnect, with CCD at like 0.925v lol. IOD at 1.05v.

I gave up for now anyway, the only thing I can get running without issue is TM5 or Karhu, the second I hit the CPU hard with something like OCCT/y-cruncher/LinpackXtreme while it doesn't fail the testing the USB will go crazy at some point and my mouse pointer also sees performance issues with jerking/lagging.

I assume there is far too much WHEA correction going on whilst a CPU test is running and eventually USB gives out.


----------



## ApolloX30

I don't think I reported back here. 
About 2 months ago I tried the suppressor with E-die at IF 2000. First I wanted to do stability tests and, if positive, continue with performance tests. 
Simple stability tests didn't find anything, but then I started to run Time Spy and got several reboots, which I never had with Time Spy beforehand or afterwards. Based on this I ended the experiment and didn't do the performance tests.


----------



## Bohemian

I am running a fresh Windows install with the latest BIOS on my Asus Strix B450-E Gaming, with 2x8GB RAM at 4000MHz, FCLK 2000, CL16-16-16-16-32-48, tRFC 319 at 1.45V. Even though the event log is throwing WHEA errors, Windows, synthetic benchmarks and games have zero issues.
Before, I had 4x8GB 3200MHz CL14 and one more M.2 SSD, which dropped my PCIe 3.0 x16 to only x8. Even though my event log was OK, I had random reboots every 2-3 days.
It seems this behaviour is gone for now. As long as my daily PC use is OK, I don't care what's going on in the event log anymore.


----------



## ManniX-ITA

Bohemian said:


> It seems this behaviour is gone for now. As long as my daily PC use is OK, I don't care what's going on in the event log anymore.


I would recommend anyway to compare FCLK 1900 against 2000 with the monero miner.


----------



## Veii

The Time Spy CPU test also seems to show scaling between 1900 and 2000, since it tracks CPU throttling: around an 80-100 point difference.


----------



## maksimin11

Can someone advise how to disable geardown mode on Dual Rank B-die memory?
My 5600X with geardown mode at 2000 FCLK works fine (no WHEA errors in about 420 hours).
But I can't disable geardown mode.
When I try 2T with geardown mode off, the system can't POST.
Memory - F4-4000C16D-32GTRSA
VSOC : 1.125v LLC4
VDIMM : 1.42v
VDDP : 1.000v
VDDG CCD : 1.075v
VDDG IOD : 1.075v


----------



## ManniX-ITA

It could be that your VDIMM is too low.

This is my profile and it needs 1.55V:


----------



## ManniX-ITA

Thanks to David Kang (hahagu on GitHub) there's a way to safely stop at least the WHEA events logging.
It's not the same as stopping the source, but at least the system log doesn't get bloated.
It should work on Windows 11, but I haven't tested it yet.



https://github.com/mann1x/WHEAService/files/7574621/WHEA-Disable.zip



Info here:

Windows 11 - Block WHEA from Event Log (Information) · Issue #2 · mann1x/WHEAService (github.com)
"You can manually disable an error source from being auto logged to the event viewer, by editing [Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Autologger\EventLog-System{GUID}] a..."


In the zip there's a single registry file to enable/disable both Event 19 and Event 20.
It would be better not to disable Event 20, as those errors are probably going to result in random reboots 99% of the time.


----------



## maksimin11

ManniX-ITA said:


> Could be your VDIMM is too low.
> 
> This is my profile and it needs 1.55V:


Thanks for the info.
Success: TM5 50 cycles passed with GDM off, 2T
DIMM 1.52v

Subtimings are not set yet.


----------



## nada324

Hello,

I'm getting WHEA #19 with error source 0 and 1, but only about one per second, not 500 million of them. Maybe this would still help?


----------



## ManniX-ITA

nada324 said:


> Hello,
> 
> I'm getting WHEA #19 with error source 0 and 1, but only about one per second, not 500 million of them. Maybe this would still help?


Yes, it would probably help.
Usually the rate increases under huge stress, like y-cruncher.
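To see what rate you are actually getting at idle and during a stress run like y-cruncher, Windows' built-in wevtutil can list the latest Event 19 entries from the System log. A minimal sketch that only builds and runs that query, assuming the WHEA provider's full name is Microsoft-Windows-WHEA-Logger (as shown in the event's Details tab) and that the script runs on Windows; the helper name is hypothetical:

```python
import subprocess
import sys

# XPath filter: WHEA-Logger events with ID 19 (corrected hardware errors).
WHEA19_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger'] and (EventID=19)]]"


def wevtutil_args(count: int = 20) -> list[str]:
    """Command line listing the newest `count` Event 19 entries as text."""
    return ["wevtutil", "qe", "System", f"/q:{WHEA19_QUERY}",
            f"/c:{count}", "/rd:true", "/f:text"]


if __name__ == "__main__" and sys.platform == "win32":
    # wevtutil only exists on Windows; nothing runs elsewhere.
    result = subprocess.run(wevtutil_args(), capture_output=True, text=True)
    print(result.stdout)
```

`/rd:true` returns the newest events first, so comparing the timestamps before and during the stress run shows how much the rate climbs.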


----------



## Piers

Veii said:


> by being a bad PCB design


Can you please expand on this?

I've had one WHEA error since using this system (including 2000 IF) and manual overclocking, but the error was caused by Nvidia's craptastic drivers.


----------



## Veii

Piers said:


> Can you please expand on this?


Old news, it has been fixed.
It was a "side product issue" of the PCH link instability and of unstable Realtek NIC firmware.

I should have written "bad component selection" 
As was also the case with Intel's I225-V Rev.02, but those were not the main contributors.
It's been unstable DPM links all along, although we've only had "correctly tracking" tools for around 4-5 months.

Before that the issue couldn't be investigated that deeply, so "side product issues" were thought to be "main issues".


----------



## Piers

Veii said:


> was also with Intels I225-V Rev.02


The I225-V REV_0*3* still has massive issues, but Intel refuses to listen.


----------



## Bohemian

maksimin11 said:


> Thanks for the info.
> Success: TM5 50 cycles passed with GDM off, 2T
> DIMM 1.52v
> 
> Subtimings are not set yet.


You guys are overvolting your DIMMs too much if this is for daily operation. The absolute highest recommended voltage for long-term daily use is 1.5V. Going above that is possible for short periods of time, like benchmarking.
Also, I had amazing scores in AIDA64 at 4000MHz, FCLK 2000, CL16 at 1.48V, but in other tests like CPU-Z, Geekbench 5, 3DMark and built-in game benchmarks I had better results/higher FPS with 4x 3733MHz CL15.
If you insist on running such a high voltage daily, make sure you run it on an open test bench or in a high-airflow case with active RAM cooling, to delay the degradation that leads to your system throwing WHEA errors even at stock RAM settings.
There's a reason why there are very few kits on the market rated at 1.5V in their XMP profile, and no manufacturer goes above this point.


----------



## Luggage

Bohemian said:


> You guys are overvolting your DIMMs too much if this is for daily operation. The absolute highest recommended voltage for long-term daily use is 1.5V. Going above that is possible for short periods, like benchmarking.
> I also had amazing scores in AIDA64 at 4000MHz with FCLK 2000 CL16 at 1.48V, but in other tests like CPU-Z, Geekbench 5, 3DMark and built-in game benchmarks I got better results/higher FPS with 4x 3733MHz CL15.
> If you insist on running such a high voltage daily, make sure you run it on an open test bench or in a high-airflow case with active RAM cooling, to delay the degradation that would eventually leave your system throwing WHEA errors even at stock RAM settings.
> There's a reason why there are very few kits on the market rated at 1.5V in their XMP profile, and no manufacturer goes above this point.


There are quite a few 1.55V XMP kits, and even some 1.6V ones.


----------



## Piers

Bohemian said:


> The absolute highest recommended voltage for long-term daily use is 1.5V.


Source?


----------



## ManniX-ITA

Bohemian said:


> I also had amazing scores in AIDA64 at 4000MHz with FCLK 2000 CL16 at 1.48V, but in other tests like CPU-Z, Geekbench 5, 3DMark and built-in game benchmarks I got better results/higher FPS with 4x 3733MHz CL15.


That's because your IF couldn't keep up and became unstable.
Read the first post for some advice.
Depending on the AGESA version, stabilizing it may not be possible at all.



Bohemian said:


> There's a reason why there are very few kits on the market rated at 1.5V in their XMP profile, and no manufacturer goes above this point.


Definitely not. I have this kit, and it runs at 1.55V stock:

F4-4000C14D-16GTZR - G.SKILL International Enterprise Co., Ltd. (www.gskill.com)
Trident Z RGB DDR4-4000 CL14-15-15-35 1.55V 16GB (2x8GB)

This one runs at 1.6V:

F4-5333C22D-16GTES - G.SKILL International Enterprise Co., Ltd. (www.gskill.com)
Trident Z Royal Elite DDR4-5333 CL22-32-32-52 1.60V 16GB (2x8GB)

I'm running 1.55V daily on air, and with water cooling, 1.6-1.65V daily is very common.


----------



## Veii

Bohemian said:


> You guys are overvolting your DIMMs too much if this is for daily operation. The absolute highest recommended voltage for long-term daily use is 1.5V. Going above that is possible for short periods, like benchmarking.


I run 1.65V daily, and voltage alone doesn't mean anything if you know Ohm's law.
ICs get unstable near 1.72V due to an architectural limit.
DIMM PCBs start to crash near 1.6-1.65V, depending on PCB revision and manufacturer differences/tuning.



Bohemian said:


> If you insist on running such a high voltage daily, make sure you run it on an open test bench or in a high-airflow case with active RAM cooling, to delay the degradation that would eventually leave your system throwing WHEA errors even at stock RAM settings.


Memory voltage has close to zero connection to CPU voltage.
Memory controller voltage barely needs to go above 900mV to run anything up to 4600MT/s,
and around 1V cLDO_VDDP to run 5000MT/s on AMD.

1V is quite a bit lower than the "1.15V dangerous voltage", which also depends on the resistance and load strain (amperage).
Voltage by itself doesn't produce heat, as voltage by itself doesn't do anything.
It's amperage that generates heat (Ohm's law).
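For reference, the relationships being invoked here are Ohm's law and Joule heating (the heat itself is given by Joule's law, which combines with Ohm's law as below). This is textbook circuit physics, not a DIMM-specific model:

$$
I = \frac{V}{R}, \qquad P = VI = I^{2}R = \frac{V^{2}}{R}
$$

Read this way, dissipated power depends on the current, and for a given voltage the current is set by the resistance or impedance of the path, which is why termination (RTT) settings come up later in this post. The same relation also shows that, with the resistance held fixed, power grows with the square of the voltage.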

"Degradation" of memory is not possible
ICs, while being lithographic cells, are not silicon.
They are minuscule capacitors and resistors. These have a fixed lifespan dictated by heat.
Every cap, transistor or resistor is manufactured for conditions of 90-135°C, 24/7, with a rated uptime of usually double-digit years.

Voltage does "kill" DIMMs, but it does not degrade DIMMs.
Nor does it do any damage to memory over the time span you are trying to define.
Please do not spread fear; ask about things you're not sure of instead,
so this advice stops being given.

Also, ManniX-ITA is 100% correct.
The only reason manufacturers "fear" selling high-voltage XMP kits (although they don't really)
is that, first, XMP wasn't mature, and second, it's the termination voltages that dictate what is done with this VDIMM voltage.
Usually beyond 1.46-1.52V, default presets overvolt the PCB and the PCB crashes = instability.
It doesn't mean it's not runnable, and it doesn't mean it's dying. It simply means that the industry wasn't ready for higher-voltage DIMMs.

There is zero fear attached to this, and there shouldn't need to be.
As JEDEC changes month by month, so the industry develops further.
1.5V is not a problem if you know what to set up.
Again, I run 1.65-1.66V daily. Zero issues, and I haven't had a dead DIMM so far.
But if you do not touch your RTTs (AMD & Intel), it can very well result in too high amperage and the PCB crashing.
The ICs themselves are far more tolerant than the PCBs they are put on.


----------



## ManniX-ITA

Veii said:


> is that, first, XMP wasn't mature, and second, it's the termination voltages that dictate what is done with this VDIMM voltage


In my opinion XMP is just architecturally limited.
It was decently good for DDR3, but for DDR4 it's just not enough.
Once you raise VDIMM and tighten the timings, all the specifics of the setup (DIMM PCB, IC binning and motherboard design) kick in.
Almost everything is dynamically set during memory training at boot, and it's very likely to fail if the XMP profile is not extremely conservative.
Even with the same CPU model and an almost identical motherboard, different kits may need very different manual adjustments to run reliably.



Veii said:


> Memory controller voltage barely needs to go above 900mV to run anything up to 4600MT/s,
> and around 1V cLDO_VDDP to run 5000MT/s on AMD.


I just found out AGESA 1.2.0.5 needs VDDP at 1000mV instead of the usual 900mV to run 4000CL14 T2.
It runs 4000CL16 at 900mV with no problem.


----------



## Bohemian

Piers said:


> Source?


MemTestHelper/DDR4 OC Guide.md at oc-guide · integralfx/MemTestHelper (github.com)

https://help.corsair.com/hc/en-us/articles/360052448851-Tips-on-safely-overclocking-memory

What Is The Safe Voltage Range For DDR4 Memory Overclocking? - Legit Reviews (www.legitreviews.com)

There are numerous sources suggesting not going above 1.5V for daily 24/7 usage. If you don't care about the money you spent, you can freely use even 1.65V, but anything above 1.5V is potentially fatal for your memory sticks.


----------



## Veii

Bohemian said:


> There are numerous sources suggesting not going above 1.5V for daily 24/7 usage. If you don't care about the money you spent, you can freely use even 1.65V, but anything above 1.5V is potentially fatal for your memory sticks.


You posted sources from 2014, and a community-contributed guide.
Contributed = it develops further as information gets shared. No information shared, or living with fear, means no further development.

You also (probably) ignored my post and ManniX's,
because "maybe" they disagree with your information, and common human behavior is to do everything except admit being wrong. It's just human nature.

Both posts, including the "legit" kits on sale, exist, but you ignored them.
* I'm very sure the whole G.Skill team, who also use custom PCBs, is more than capable of knowing what is safe and what is dangerous under X circumstances.
My explanation that the industry develops, and that Ohm's law exists, so what really matters is the amperage [A] received, not the voltage [V] used,
also got ignored.

I mean, what should we do at this point?
You refuse to acknowledge that the industry changes, and likely refuse to even read when people try to teach you something.
The information you shared is outdated and not helpful. The industry improves.








And the information Corsair shares sadly isn't helpful either; it's even a bit embarrassing that this post was made at the end of 2020, when they should know better. Even Hynix MFR, often used in cheap RGB kits, has a higher voltage tolerance (1.62V peak) without any PCB or IC issues.

What does "not safe" even mean? Memory ICs are not silicon. Memory ICs cannot degrade, period.
Memory ICs can show issues if you run them for a very long time at 90-95°C, same as power supplies, but that does not match real-world use.

It's sad to see so much conflicting information, and brands focusing on their legal protection instead of actually supplying correct "IF, THEN" information.
Voltage by itself does not do anything.
1.4V is so low that every vendor can use it as a legal guarantee, so nobody can ever kill their memory. It simply won't happen, so such a bold statement can be made.

Overclocking aside,
the information shared is simply flawed, incomplete and unexplained.
It's like telling a bird it cannot fly, but not why it can't or under which rare circumstances this issue/voltage is bad.
In 98% of cases it's not even close to dangerous. And since there is no degradation, something like "safe" cannot exist. Logic.


----------



## Bohemian

Okay, my bad: there are actually two kits on the market that are XMP-rated above 1.5V, while 99.5% of kits are rated at 1.45V or below (yes, still, even in 2022).
The only person who wants to "teach me something" is you. I am not stupid. With all respect, you are nobody, my friend. I'd rather follow a serious tech source than "advice" from some cocky user who thinks he's smarter than the manufacturers themselves. No offence, buddy. If someone wants to torture their RAM sticks above their rated safe operation, so be it. I said that if you do so, at least have a fan blowing on your RAM sticks to prevent "frying" them. If someone is running their RAM at 1.65V under water 24/7, that's not standard conditions and should not be recommended as "normal and safe" behaviour. Those are extremely rare operating conditions that the ordinary PC user should not follow.


----------



## Veii

Bohemian said:


> The only person who wants to "teach me something" is you. I am not stupid. I'd rather follow a serious tech source than "advice" from some cocky user who thinks he's smarter than the manufacturers themselves. No offence, buddy


The "I am stupid" part is only in your head. I didn't say such things.

But I see what I see: you intentionally ignored both posts, which hopefully contained zero criticism.
The 2nd post was a direct quote aimed at you; it shouldn't read as "cocky" either, but since you ignored people's answers, that's the intention you brought to the thread. I had no malice or any other negative intention.

The only intention I had was to explain correctly that impedance, resistance and amperage are what matter for a PCB.
No "cocky" person would sit down, write you a paragraph, or even bother to care about you, explaining that maybe you don't see the picture "because voltage doesn't matter".
So tell me, where did I wrong you to deserve such titles?
I can't understand these assumptions and harsh words.

I've run Hynix MFR at 1.62V for over a year on a 1700X. Zero damage or changes. It can't even happen, because memory does not degrade.
At that time (2017) I didn't even know what RTTs were or how to work with them.
Today I halfway know how to work with them, and can clearly see that voltage, up to the point where the PCB crashes (not dies), does not matter and is unique to the PCB design.
* Yes, I do own most of the PCB schematics (custom ones included), but what "I do" doesn't matter here; the fact is that ICs cannot degrade, and the PCBs crash far earlier than the ICs themselves.


Bohemian said:


> With all respect, you are nobody, my friend.


That's not for you to judge.
I'm independent and don't need to quote "sources". I am the source.
Sadly, the only person who judges and compares others is you.

Well, this brings nobody forward and wastes thread space, which is off-topic on top of that.
I can only say that your judgement is wrong, as nobody had any malicious intentions against you, or they wouldn't spend time writing and explaining.
Nobody had negative intentions towards you, so nobody deserves this judgement either.
I only wrote that you ignored our posts when the answer was in front of you, and the picture is never black & white. Impedance matters for how VDIMM behaves. VDIMM by itself does not even create heat.
The voltage "advice" is based on memory predictions by the ODMs and on the old A0 PCB, which is very sensitive to voltage.
Yet even on that, I run 1.65V daily without the DIMMs ever getting warm (on A0 Viper Steel 4000s), because voltage is irrelevant. Resistance and impedance are what determine how much amperage ends up on the DIMM PCBs.

EDIT:
The recommended limits in the memOC guide above (which is, again, a community project) are there to guarantee that DIMM PCBs do not crash.
DIMM PCBs can get burn marks, but DIMM PCBs cannot degrade. And ICs have far higher architectural limits (depending on which IC you get).
The topic absolutely is not black & white, and the only intention was to explain it to you, because it reads like you throw everything into one basket. And that is indeed wrong, nothing to argue here.
It's very unfortunate that companies do not write down their "whens and whys", so people get taught wrong things and imagine they are reality.
* But black & white aside, the topic is indeed off-topic here.
You came and started to judge people. And that is reality too.


----------



## Audioboxer

Bohemian said:


> MemTestHelper/DDR4 OC Guide.md at oc-guide · integralfx/MemTestHelper (github.com)
> 
> https://help.corsair.com/hc/en-us/articles/360052448851-Tips-on-safely-overclocking-memory
> 
> What Is The Safe Voltage Range For DDR4 Memory Overclocking? - Legit Reviews (www.legitreviews.com)
> 
> There are numerous sources suggesting not going above 1.5V for daily 24/7 usage. If you don't care about the money you spent, you can freely use even 1.65V, but anything above 1.5V is potentially fatal for your memory sticks.


G.SKILL 1.55V-rated DR DDR4 kit (my kit!): F4-4000C14D-32GVK - G.SKILL International Enterprise Co., Ltd.

TEAMGROUP 1.6V-rated SR DDR4 kits: XTREEM ARGB DDR4 DESKTOP MEMORY

A statement as simple as 'going above 1.5V is potentially fatal' is simply not true.

No offence is intended; it's just important not to scare people into thinking they're going to break their RAM.


----------



## Luggage

Edit: that listing was bad.
Deleted.


----------



## OCmember

I've read back a few pages. Is there a fix for WHEA error 19? I'm stable in memory tests and a bit of y-cruncher, but as soon as I run OCCT I get WHEA errors. I'm currently stuck with the VDDG bug too, and I'm in the middle of updating the BIOS to F36a to see if it fixes it.


----------



## ManniX-ITA

OCmember said:


> I've read back a few pages. Is there a fix for WHEA error 19?


There's no fix.
You can decide to ignore it, but then it's up to you to verify it's not a reliability or performance issue.
Run benchmarks at a lower FCLK and compare the scores, to verify the higher FCLK gives the same or better results.
Run a long stress test session with y-cruncher, 12 cycles or more, to verify stability.
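A minimal sketch of that comparison, assuming you record each benchmark once at a lower, known-good FCLK and once at the higher FCLK being evaluated. The `Result` record, the `regressions` helper and the numbers in the usage lines are hypothetical, for illustration only; what matters is the direction, since times (y-cruncher) should go down and scores or hashrates should go up.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str              # e.g. "y-cruncher pi-25b (s)"
    low_fclk: float        # result at the lower, known-good FCLK
    high_fclk: float       # result at the higher FCLK being evaluated
    lower_is_better: bool  # True for times, False for scores/hashrates


def regressions(results, tolerance=0.0):
    """Return the names of benchmarks where the higher FCLK did worse.

    tolerance is a relative margin (0.01 = 1%) to absorb run-to-run noise.
    """
    worse = []
    for r in results:
        if r.lower_is_better:
            # A time at the higher FCLK should not exceed the baseline time.
            if r.high_fclk > r.low_fclk * (1 + tolerance):
                worse.append(r.name)
        elif r.high_fclk < r.low_fclk * (1 - tolerance):
            # A score at the higher FCLK should not fall below the baseline score.
            worse.append(r.name)
    return worse


# Hypothetical numbers, for illustration only.
results = [
    Result("y-cruncher pi-25b (s)", 1000.0, 990.0, True),
    Result("xmr-stak-rx (H/s)", 10000.0, 9800.0, False),
]
print(regressions(results, tolerance=0.01))  # ['xmr-stak-rx (H/s)']
```

An empty list only means no regression was detected within the tolerance; stability still has to come from the long y-cruncher run.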


----------



## OCmember

ManniX-ITA said:


> There's no fix.
> You can decide to ignore it, but then it's up to you to verify it's not a reliability or performance issue.
> Run benchmarks at a lower FCLK and compare the scores, to verify the higher FCLK gives the same or better results.
> Run a long stress test session with y-cruncher, 12 cycles or more, to verify stability.


Did we find out exactly what the problem was?

I think the IF on the 5800X in my gaming rig can do 1900+, so I might try the suppressor you made. Will stability tests still behave the same? E.g., if I'm unstable in CoreCycler, P95, OCCT or y-cruncher, will they still return errors exposing the instability, or will the WHEA error suppressor "tool" you made nullify that?



ManniX-ITA said:


> Best and almost only way to check for performance regressions and improvements over lower FCLK is the monero xmr-stak-rx miner:
> 
> Releases · fireice-uk/xmr-stak (github.com)
> Free Monero RandomX Miner and unified CryptoNight miner - fireice-uk/xmr-stak


Why wouldn't other benchmark apps work just as well? E.g., the AIDA64 memory benchmark?


----------



## OCmember

nm


----------



## ManniX-ITA

OCmember said:


> Did we find out exactly what the problem was?


Unfortunately we don't have enough tools at our disposal to dig deeper.
There are some theories, but none that can be proven.



OCmember said:


> Will stability tests still behave the same? E.g., if I'm unstable in CoreCycler, P95, OCCT or y-cruncher, will they still return errors exposing the instability, or will the WHEA error suppressor "tool" you made nullify that?


Yes, you still get errors when it's unstable.
The suppressor helps performance when you get loads of WHEA 19s, which can slow the system down just because Windows has to process them.

My suggestion is to run the stability tools without the suppressor and check that you don't get WHEA 18 or 20.
Once you are sure that you only get WHEA 19s, check performance with the suppressor enabled.
Use a wide range of benchmarks, but it's especially crucial that you get a lower time in the y-cruncher pi-25b calculation (you'll easily see whether there's a positive or negative delta).
Then move on to the xmr-stak-rx Monero miner; check the first post.
Don't run the miner in benchmark mode: you need to set it up for real mining and see whether you get better scores after 15 minutes.
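The order of operations above can be sketched as a simple gate, assuming you have already exported the WHEA-Logger event IDs from a stability run without the suppressor (for example by filtering the System log in Event Viewer). The export step and the `suppressor_gate` name are hypothetical. The sketch only encodes the rule stated in the post: any WHEA 18 or 20 means the setup isn't ready, while WHEA 19s alone mean it's time to compare performance with the suppressor enabled.

```python
from collections import Counter

# Event IDs that, per the advice above, rule out moving on to the suppressor.
BLOCKING_IDS = (18, 20)


def suppressor_gate(event_ids):
    """Return (ready, blocking_counts, whea19_count) for a list of WHEA event IDs."""
    counts = Counter(event_ids)  # missing IDs count as 0
    blocking = {eid: counts[eid] for eid in BLOCKING_IDS if counts[eid]}
    return not blocking, blocking, counts[19]


print(suppressor_gate([19] * 250))    # (True, {}, 250)
print(suppressor_gate([19, 19, 18]))  # (False, {18: 1}, 2)
```

Passing the gate is only the starting point; the performance comparison still decides whether the higher FCLK is worth keeping.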


----------



## OCmember

ManniX-ITA said:


> Unfortunately we don't have enough tools at our disposal to dig deeper.
> There are some theories, but none that can be proven.
> 
> 
> 
> Yes, you still get errors when it's unstable.
> The suppressor helps performance when you get loads of WHEA 19s, which can slow the system down just because Windows has to process them.
> 
> My suggestion is to run the stability tools without the suppressor and check that you don't get WHEA 18 or 20.
> Once you are sure that you only get WHEA 19s, check performance with the suppressor enabled.
> Use a wide range of benchmarks, but it's especially crucial that you get a lower time in the y-cruncher pi-25b calculation (you'll easily see whether there's a positive or negative delta).
> Then move on to the xmr-stak-rx Monero miner; check the first post.
> Don't run the miner in benchmark mode: you need to set it up for real mining and see whether you get better scores after 15 minutes.


Thanks. 

I'll pass on the mining app.


----------



## ManniX-ITA

OCmember said:


> I'll pass on the mining app.


I wouldn't, but it's up to you. You don't need to mine for real; I use a fake address.
With y-cruncher you can check whether there are latency issues on the IF,
but only the miner lets you check whether the IF falls over under high load.
It's the only "benchmark" that keeps a very steady 45 GBps rate on memory.
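A minimal sketch of how the miner's output could be summarized, assuming you log (seconds since start, hashrate) pairs yourself; the logging format and the `steady_hashrate` name are hypothetical, and xmr-stak's own report format is not modeled here. It drops the first 15 minutes, as suggested earlier, then reports the average and the coefficient of variation (standard deviation divided by mean), so a sagging or erratic rate shows up as a larger number.

```python
from statistics import mean, pstdev


def steady_hashrate(samples, warmup_s=15 * 60):
    """samples: iterable of (seconds_since_start, hashes_per_second) pairs.

    Ignores samples before warmup_s, then returns (mean, coefficient_of_variation).
    """
    kept = [h for t, h in samples if t >= warmup_s]
    if len(kept) < 2:
        raise ValueError("need at least two samples after the warm-up period")
    avg = mean(kept)
    return avg, pstdev(kept) / avg


# Hypothetical run: the first two samples fall inside the warm-up window.
run = [(0, 5000.0), (600, 9000.0), (900, 9900.0), (1200, 10100.0)]
avg, cv = steady_hashrate(run)
print(avg, round(cv, 4))  # 10000.0 0.01
```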


----------



## OCmember

I appreciate it, thank you


----------



## tdimarzio

So I've been running my 5950x at IF1900 for a couple months now. In those couple months I've averaged about 2 WHEA 19 per week, always with the same message:

A corrected hardware error has occurred.
Reported by component: Processor Core
Error Source: Unknown Error Source
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

Otherwise the system is completely stable. Hundreds of hours of CoreCycler, TestMem5, Prime95, gaming, Virtual Machines, all while running 24x7. But, I cannot help but feel a little insecure about those couple WHEA 19 per week. I could back FCLK down to 1866, but I'm wondering if there's anything else I can do to eliminate those couple WHEA 19 a month. Or, should I not even worry about ~4 WHEA 19 correctable per month?


----------



## konawolv

tdimarzio said:


> So I've been running my 5950x at IF1900 for a couple months now. In those couple months I've averaged about 2 WHEA 19 per week, always with the same message:
> 
> A corrected hardware error has occurred.
> Reported by component: Processor Core
> Error Source: Unknown Error Source
> Error Type: Bus/Interconnect Error
> Processor APIC ID: 0
> 
> Otherwise the system is completely stable. Hundreds of hours of CoreCycler, TestMem5, Prime95, gaming, Virtual Machines, all while running 24x7. But, I cannot help but feel a little insecure about those couple WHEA 19 per week. I could back FCLK down to 1866, but I'm wondering if there's anything else I can do to eliminate those couple WHEA 19 a month. Or, should I not even worry about ~4 WHEA 19 correctable per month?


Are you using curve optimizer?

That error is telling you that your core 0 (first core) is erroring. You may need to adjust your CO settings down a couple of ticks on that core.


----------



## tdimarzio

konawolv said:


> Are you using curve optimizer?
> 
> That error is telling you that your core 0 (first core) is erroring. You may need to adjust your CO settings down a couple of ticks on that core.


Yes, I do use CO but those values have been vetted over hundreds of hours of CoreCycler and general use. Also, I've tried disabling CO altogether and it had no effect on these WHEA19. In my experience, a bad CO will result in WHEA 18, not WHEA 19. I can eliminate the WHEA 19 by reducing fclk. What I'm wondering is if a couple correctable WHEA 19 a week is worth the additional performance of IF1900 over IF1866.


----------



## ManniX-ITA

tdimarzio said:


> Yes, I do use CO but those values have been vetted over hundreds of hours of CoreCycler and general use. Also, I've tried disabling CO altogether and it had no effect on these WHEA19. In my experience, a bad CO will result in WHEA 18, not WHEA 19. I can eliminate the WHEA 19 by reducing fclk. What I'm wondering is if a couple correctable WHEA 19 a week is worth the additional performance of IF1900 over IF1866.


WHEA 19 can also happen when the core is slightly unstable; 18 is usually catastrophic.

Probably Core 0 is stressing the fabric too much when boosting.
Try a slight bump of VDDG IOD and/or VSOC.
It could also be a non-optimal VDDG CCD; you need to test with slightly lower and higher voltages.
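The "slightly lower and higher" testing can be organized as a one-variable-at-a-time plan, so any change in the WHEA 19 rate can be attributed to a single rail. A minimal sketch; the rail names, the baseline millivolt values and the 20 mV offsets below are hypothetical placeholders, not recommended settings.

```python
def sweep_plan(baseline_mv, offsets_mv):
    """Build test configurations that each change exactly one rail.

    baseline_mv: dict of rail name -> current setting (mV).
    offsets_mv: dict of rail name -> offsets to try (mV), negative = lower.
    """
    plan = []
    for rail, offsets in offsets_mv.items():
        for delta in offsets:
            trial = dict(baseline_mv)  # copy so the baseline stays untouched
            trial[rail] = baseline_mv[rail] + delta
            plan.append(trial)
    return plan


# Hypothetical baseline and step sizes, for illustration only.
baseline = {"VSOC": 1100, "VDDG_IOD": 1000, "VDDG_CCD": 950}
offsets = {"VDDG_IOD": [20], "VSOC": [20], "VDDG_CCD": [-20, 20]}
for trial in sweep_plan(baseline, offsets):
    print(trial)
```

Each configuration would then get the same stress run and the same observation window before comparing WHEA 19 counts.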


----------



## tdimarzio

ManniX-ITA said:


> WHEA 19 can also happen when the core is slightly unstable; 18 is usually catastrophic.
> 
> Probably Core 0 is stressing the fabric too much when boosting.
> Try a slight bump of VDDG IOD and/or VSOC.
> It could also be a non-optimal VDDG CCD; you need to test with slightly lower and higher voltages.


Thanks. I will try those things and see if some combination can eliminate the couple WHEA19 a week. In the meantime, just to be 100% sure, I went down to fclk 1866 leaving CO enabled. So far it's been over a week with no WHEA 19. I will wait another week to be sure there are zero WHEA 19 when FCLK <= 1866. Then I can try bringing back to fclk 1900 and trying your suggestions.


----------



## ManniX-ITA

tdimarzio said:


> Thanks. I will try those things and see if some combination can eliminate the couple WHEA19 a week. In the meantime, just to be 100% sure, I went down to fclk 1866 leaving CO enabled. So far it's been over a week with no WHEA 19. I will wait another week to be sure there are zero WHEA 19 when FCLK <= 1866. Then I can try bringing back to fclk 1900 and trying your suggestions.


Yes, good strategy.
Usually at FCLK 1900, if they are that sporadic, they can be fixed by fine-tuning voltages.
Another option to test is adding a little to the PLL voltage (VDD18), from 10 to 40mV.


----------



## tcclaviger

So, just a bit of feedback for you, ManniX-ITA.

My 5950X ran for almost a year, from when this was released until 3 days ago, with the suppressor at 1966 or 2000 FCLK and C14 memory, in 4x8, 2x16 and 2x8+2x16 configurations at 1:1. 24/7, every imaginable workload.

The only crashes I ever experienced came from CO being too aggressive, from forgetting to turn on the chiller (the system is tuned to require sub-60°F loop temps), or from aggressively tuning memory on a C7H, then a Strix Gaming X570 II.

Both boards would stream 50+ WHEA 19s, and both showed no regression and correct scaling once the suppressor was enabled.

Now on X3D with a C6E and a C8E, running 2050 FCLK 1:1. Sub-90s y-cruncher 2.5b in BenchMate. No issues.

Pretty safe to say at this point that I stand on the "error spam, if all else is stable, is a ghost error issue" side of things.

Thanks for making and sharing this last year!


----------



## Taraquin

It seems that too-low SOC, VDDG IOD and VDD18 voltages can contribute to WHEA 19; I'm unsure about VDDG CCD and VDDP. If there are only a few, some simple voltage adjustments can make them go away; if there are very many, there is probably little hope.


----------



## tcclaviger

Taraquin said:


> It seems that too-low SOC, VDDG IOD and VDD18 voltages can contribute to WHEA 19; I'm unsure about VDDG CCD and VDDP. If there are only a few, some simple voltage adjustments can make them go away; if there are very many, there is probably little hope.


I'll second how critical it is to find the right (not necessarily high) voltages. An X3D can go from 1 error a minute or so at 2050 FCLK to 100+ a minute just by moving the SOC and PLL voltages away from the values I found for my chip through trial and error!


----------



## Taraquin

tcclaviger said:


> I'll second how critical it is to find the right (not necessarily high) voltages. An X3D can go from 1 error a minute or so at 2050 FCLK to 100+ a minute just by moving the SOC and PLL voltages away from the values I found for my chip through trial and error!


Yeah, too-high values can also cause trouble. I see a lot of motherboards running VDDG CCD/IOD and VDDP at 1.1V on auto, which might cause issues for some. I generally prefer the lowest stable value with no performance penalty, as it frees up more power budget for the cores.
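Finding "the lowest stable value with no performance penalty" is essentially a step-down search. A minimal sketch, assuming a hypothetical `passes(mv)` callback that stands in for whatever stability and performance checks you actually run at that voltage (in practice each call is a long manual test, not a function). It steps down linearly from a known-good value, mirroring how this is usually done by hand, and never goes below a floor you choose.

```python
def lowest_stable(start_mv, floor_mv, step_mv, passes):
    """Step down from start_mv and return the lowest voltage that passed.

    passes(mv) -> bool is a hypothetical stand-in for a real test run.
    Returns None if start_mv itself fails.
    """
    if not passes(start_mv):
        return None
    best = start_mv
    mv = start_mv - step_mv
    while mv >= floor_mv and passes(mv):
        best = mv  # this voltage passed, try one step lower
        mv -= step_mv
    return best


# Simulated chip that is stable at 930 mV and above (hypothetical).
print(lowest_stable(1000, 850, 10, lambda mv: mv >= 930))  # 930
```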


----------



## des2k...

So at 2000 IF (stress tested, passed) I get the WHEA 19 warnings, but no performance loss or DPC latency.

I've noticed I get lots of "contacting server" messages when playing Destiny 2 (an always-online game) compared with 1900 IF.

Using the Intel Gb NIC (X570 Aorus Master rev1) at 2000 IF, I can't get more than 700Mbps of bandwidth. At 1900 IF it's always 900Mbps+.

I also have the Realtek 2.5Gb, which I didn't test. But this points to something with the PCIe bus / NIC?


----------



## ManniX-ITA

des2k... said:


> Also have the realtek 2.5Gb that I didn't test. But this points to be something with the PCIE bus / NIC firmware after 1900IF.


I didn't test with the Master, but on the Unify-X my Realtek NIC could do 2.1 Gbps with iperf.
It may indeed be a NIC FW issue.


----------



## des2k...

*Holy crap!* I unplugged the Intel NIC and I'm on the 2.5Gb Realtek now; would you look at this empty window? For reference, booting with the Intel NIC gives a good 300 WHEA 19s every 2 minutes.
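Counts like "300 every 2 minutes" are easier to compare across configurations as a rate over a fixed window. A minimal sketch, assuming you already have the event timestamps (for example copied from Event Viewer after filtering the System log for source WHEA-Logger, event ID 19); the `rate_per_minute` helper is hypothetical.

```python
from datetime import datetime, timedelta


def rate_per_minute(timestamps, start, end):
    """Events per minute among the timestamps that fall in [start, end)."""
    minutes = (end - start).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in timestamps if start <= t < end)
    return count / minutes


# Synthetic example: 300 events spread over a 2-minute window.
t0 = datetime(2022, 1, 1, 12, 0, 0)
events = [t0 + timedelta(seconds=0.4 * i) for i in range(300)]
print(rate_per_minute(events, t0, t0 + timedelta(minutes=2)))  # 150.0
```

Using the same window length for every configuration (NIC plugged or unplugged, different voltages) keeps the numbers directly comparable.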


----------



## LazyGamer

des2k... said:


> *Holy crap!* I unplugged the Intel NIC and I'm on the 2.5Gb Realtek now; would you look at this empty window? For reference, booting with the Intel NIC gives a good 300 WHEA 19s every 2 minutes.


This is the '211' Gigabit NIC, right? Couldn't find it listed out elsewhere.


----------



## des2k...

LazyGamer said:


> This is the '211' Gigabit NIC, right? Couldn't find it listed out elsewhere.


yep it's Intel I211


----------



## Taraquin

des2k... said:


> *Holy crap!* I unplugged the Intel NIC and I'm on the 2.5Gb Realtek now; would you look at this empty window? For reference, booting with the Intel NIC gives a good 300 WHEA 19s every 2 minutes.


Try adjusting SOC, IOD, VDDP and VDD18 up or down; that may get rid of the remaining WHEA 19s, since you have quite few.


----------



## des2k...

Taraquin said:


> Try adjusting SOC, IOD, VDDP and VDD18 up or down; that may get rid of the remaining WHEA 19s, since you have quite few.


Not sure any of that is worth much to me, since the Intel NIC on that PCIe 2.0 x1 lane is the one causing 300+ WHEA 19s every 3 minutes. Just unplugging the cord gives 0 WHEA 19s.

SOC, IOD, VDDP and VDD18 on auto work up to 2000 IF, with 2100 IF being the limit on this 5900X.

Also, the 4x8GB limit seems to be around 4000MT/s. Around 2066 IF there's a performance loss (but that could be the async ratio being too big), and 2100 IF is a hard crash.

3800CL14 & 4000CL16 passed my testing; either one is good enough for me for 24/7 with my B-die kits.

I could sell them and get 2x16 B-die already binned for 4000CL14, but I don't need more than 63GB/s and sub-52ns latency😛 on this 5900X.
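On the async remark: DDR4 is rated in MT/s and transfers twice per clock, so the memory clock (MCLK) is half the rated data rate, and 1:1 means FCLK equals MCLK (DDR4-4000 pairs with FCLK 2000). A minimal helper to check the ratio for a given combination; the function name is hypothetical.

```python
def fclk_to_mclk(data_rate_mts, fclk_mhz):
    """Return (MCLK in MHz, FCLK/MCLK ratio); 1.0 means running 1:1."""
    mclk_mhz = data_rate_mts / 2  # DDR: two transfers per memory clock
    return mclk_mhz, fclk_mhz / mclk_mhz


print(fclk_to_mclk(4000, 2000))  # (2000.0, 1.0): 1:1
print(fclk_to_mclk(3800, 1900))  # (1900.0, 1.0): 1:1
mclk, ratio = fclk_to_mclk(4000, 2066)
print(mclk, round(ratio, 3))     # 2000.0 1.033: FCLK above MCLK, not 1:1
```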


----------



## xolhid

des2k... said:


> Not sure any of that is worth much to me, since the Intel NIC on that PCIe 2.0 x1 lane is the one causing 300+ WHEA 19s every 3 minutes. Just unplugging the cord gives 0 WHEA 19s.
> 
> SOC, IOD, VDDP and VDD18 on auto work up to 2000 IF, with 2100 IF being the limit on this 5900X.


Auto voltages are likely not optimal or stable. It's really easy to fix as well.

First find your lowest VDDP that doesn't lose performance or stability. 

Then CCD, but don't go below VDDP.

Then IOD, but no lower than 40mV above CCD.

Then SOC, but no lower than 40mV above IOD.

Ryzen 5000 seems to function optimally when VDDP, CCD & IOD use 40mV stepping between them. You can have more than 40mV as long as it is a multiple of 40mV.

SOC doesn't have to follow the stepping as long as it is at least 40mV above IOD. SOC does impact the cache performance of the CPU, so don't go too low. Too high, and it can hurt performance consistency, e.g. DPC latency. I use LatencyMon, the AIDA64 memory benchmark & the DRAM Calculator latency graph to determine the optimal setting.

VDD18 didn't help me at all, but it may be something to consider.

When optimizing VDDP, CCD & IOD you can reduce or completely eliminate WHEA 19 errors. When you change one, you have to change the other two to maintain the 40mV stepping.

I have done this with 4 Ryzen 5000 setups and it works every time. Trusting the default board settings just because it "boots into Windows" or "works fine" is up to you, but I don't recommend it.

My current setup: a 5950X on an Asus TUF Gaming X570-Plus Wi-Fi, stable at 3933MHz.
VDDP 860mV > CCD 980mV > IOD 1060mV
SOC 1140mV

I don't need WHEAService. Once I dialed in the correct voltage settings, there aren't any WHEA 19s at all. The system is butter smooth compared to the fubar default voltages set by the motherboard.
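The stepping rules above are one user's empirical observation on Ryzen 5000, not an AMD specification. A minimal checker, assuming "40mV stepping" means each gap (VDDP to CCD, CCD to IOD) must be a positive multiple of 40 mV and SOC must sit at least 40 mV above IOD; the `check_stepping` name is hypothetical. It only checks the relationships between the values, not whether any given voltage is stable.

```python
STEP_MV = 40  # stepping described in the post above


def check_stepping(vddp, ccd, iod, soc, step=STEP_MV):
    """Return a list of rule violations; an empty list means all rules hold (values in mV)."""
    problems = []
    for lower_name, lower, upper_name, upper in (
        ("VDDP", vddp, "CCD", ccd),
        ("CCD", ccd, "IOD", iod),
    ):
        gap = upper - lower
        if gap <= 0 or gap % step != 0:
            problems.append(
                f"{upper_name} - {lower_name} = {gap} mV, expected a positive multiple of {step} mV"
            )
    if soc - iod < step:
        problems.append(f"SOC - IOD = {soc - iod} mV, expected at least {step} mV")
    return problems


# The settings quoted above: VDDP 860 > CCD 980 > IOD 1060, SOC 1140.
print(check_stepping(860, 980, 1060, 1140))  # []
for p in check_stepping(900, 950, 1000, 1020):  # hypothetical values
    print(p)
```

The quoted settings pass: the gaps are 120 mV and 80 mV, both multiples of 40, and SOC is 80 mV above IOD.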


----------



## des2k...

xolhid said:


> Auto voltages are likely not optimal or stable. It's really easy to fix as well.
> 
> First find your lowest VDDP that doesn't lose performance or stability.
> 
> Then CCD, but don't go below VDDP.
> 
> Then IOD, but no lower than 40mv above CCD.
> 
> Then SOC, but no lower than 40mv above IOD.
> 
> Ryzen 5000 seems to function optimally when VDDP, CCD & IOD use 40mv stepping between them. You can have more than 40mv as long as it is a multiple of 40mv.
> 
> SOC doesn't have to follow the stepping as long as it is at least 40mV above IOD. SOC does impact the cache performance of the CPU, so don't go too low. Too high, and it can hurt performance consistency, e.g. DPC latency. I use LatencyMon, the AIDA64 memory benchmark & the DRAM Calculator latency graph to determine the optimal setting.
> 
> VDD18 didn't help me at all, but it may be something to consider.
> 
> When optimizing VDDP, CCD & IOD you can reduce or completely eliminate WHEA 19 errors. When you change one, you have to change the other two to maintain the 40mV stepping.
> 
> I have done this with 4 Ryzen5000 setups and it works every time. Trusting the default board settings just because it "boots into windows" or "works fine" is up to you, but I don't recommend it.
> 
> My current setup with a 5950x and Asus TUF Gaming X570-Plus Wi-Fi. 3933mhz Stable.
> VDDP 860mv > CCD 980mv > IOD 1060mv
> SOC 1140mv
> 
> I don't need the WHEAService. Once I dialed in the correct voltage settings there aren't any WHEA19 at all. The system is butter smooth compared to the fubar default voltages set by the motherboard.


Auto voltages on an X570 Aorus Master with this 5900X (B2):
SOC auto: 1.2 V set, 1.175 V actual
VDDP auto: 900.2 mV
VDDG auto: 997.6 mV

It passed Prime95 Large & Blend for weeks, since I had to do multiple 8h runs for CO validation at both 1900IF and 2000IF.
I moved VDDG to 1050 mV after the first test to be 100% sure of the 2000IF/4000CL16 config.

I'm 100% not looking to undervolt SOC/VDDG at these high frequencies. I know that at best it's ~10 W of power saving on the package, and it gets crazy unstable, so I can't validate it with any fixed-load testing.

I just don't have temperature or package wattage issues. The IO die sits at delta 0 to the water (24 °C) in games, and the cores at delta 2 at idle/low loads.

Under load, R23 at 210 W runs 65 °C+ and Prime95 Small FFTs at 190 W runs 70 °C+ on a custom loop.

The IF limit on this CPU is about 2066-2100, which my 4x8GB kits can't POST at on this daisy-chain board.

So I'm 99.9% sure 2000IF is good 24/7 for apps, games and heavy load, since I'm roughly 100 MHz below the IF limit on this CPU.
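For reference, in 1:1 (coupled) mode FCLK matches the memory clock, which is half the DDR data rate, so DDR4-4000 means FCLK 2000 MHz. A small sketch of that arithmetic and of the headroom claim above; the 2066-2100 MHz wall is des2k's estimate for this one CPU, and the helper names are hypothetical.

```python
def fclk_for_1to1(ddr_rate_mts):
    """FCLK (MHz) in 1:1 mode: equal to MCLK, i.e. half the DDR data rate (rounded down)."""
    return ddr_rate_mts // 2


def headroom(fclk_mhz, limit_low_mhz, limit_high_mhz):
    """(min, max) MHz between the running FCLK and an estimated FCLK wall."""
    return limit_low_mhz - fclk_mhz, limit_high_mhz - fclk_mhz


fclk = fclk_for_1to1(4000)               # DDR4-4000 -> 2000 MHz
print(fclk, headroom(fclk, 2066, 2100))  # 2000 (66, 100)
```

Headroom to a wall is only a rough proxy; the replies below argue that benchmark regressions are the better test.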


----------



## xolhid

des2k... said:


> Auto voltages on an X570 Aorus Master with this 5900X (B2):
> SOC auto: 1.2 V set, 1.175 V actual
> VDDP auto: 900.2 mV
> VDDG auto: 997.6 mV
> 
> It passed Prime95 Large & Blend for weeks, since I had to do multiple 8h runs for CO validation at both 1900IF and 2000IF.
> I moved VDDG to 1050 mV after the first test to be 100% sure of the 2000IF/4000CL16 config.
> 
> I'm 100% not looking to undervolt SOC/VDDG at these high frequencies. I know that at best it's ~10 W of power saving on the package, and it gets crazy unstable, so I can't validate it with any fixed-load testing.


Try y-cruncher and post your results. At least 1 hour, but 4 hours should be enough to declare error stable.

Prime95 isn't a reliable stress test for modern systems. Stability isn't just about not getting errors. Prime95 doesn't test for DPC latency spikes or CPU cache performance issues.

My instructions aren't about "undervolting". I specifically said find the lowest settings that "doesn't lose performance or stability." Undervolting usually implies sacrificing performance. Finding the optimal settings for the best performance is what I am promoting.

By lowering voltages without losing performance, you put less voltage stress and thermal load on the CPU and its memory controller. It can, and usually does, let you get more performance out of your overclock settings.


----------



## ManniX-ITA

des2k... said:


> So I'm 99.9% sure 2000IF is good 24/7 for apps,games & heavy load since I'm -100mhz from the IF limit on this CPU.


There are two tests to validate the IF and understand if there's a performance regression or not.
A system can look perfectly fine and still not be fine under the hood.
But then it's up to you to decide.
Being slightly unstable under heavy load doesn't necessarily mean it can't still be faster than IF 1900 with your usual workload.

You need to run benchmarks first at IF 1900 and then at the higher IF, and compare.

One is y-cruncher prime 2.5b, you can run it directly or via BenchMate.

It should take around 64-66 seconds at IF 1900.
If it takes longer at higher IF speed, there's performance regression.
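A minimal sketch of that comparison, assuming you record several y-cruncher pi 2.5b run times per FCLK setting. The median-of-runs approach, the 1% noise tolerance and the name `is_regression` are illustrative assumptions, not part of any tool; the run times are made-up placeholders, not results from this thread.

```python
from statistics import median


def is_regression(baseline_runs, candidate_runs, tolerance=0.01):
    """Return True if the candidate setting is slower than the baseline.

    y-cruncher reports elapsed time, so lower is better. Medians damp
    single-run noise; `tolerance` (a fraction) is an assumed noise margin.
    """
    base = median(baseline_runs)
    cand = median(candidate_runs)
    return cand > base * (1 + tolerance)


# Hypothetical pi 2.5b times in seconds (placeholders, not real measurements)
fclk_1900 = [65.1, 64.8, 65.3]
fclk_2000 = [66.9, 67.2, 66.5]
print(is_regression(fclk_1900, fclk_2000))  # True: the higher FCLK is slower here
```

Since the time depends on memory settings too, the two run sets should differ only in FCLK and whatever it forces.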

Then there's the XMR-Stak-RX.
You can use BenchMaestro in my signature, but the best way is to configure the miner for real mining.
The benchmark mode doesn't stress the system like the real workload does.
The real workload can push memory bandwidth usage as hard as anything else.

At IF 1900 you should get between 18800 and 19200 H/s when mining.
Again, if it's lower at the higher IF, that means it's unstable.
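The same idea applies to the miner, except that hashrate is higher-is-better. A hedged sketch, assuming you note the sustained H/s from a real mining session at each setting; `hashrate_check` is a hypothetical helper and the numbers are placeholders. The 18800-19200 H/s band is the figure given in this post for this particular 5900X at IF 1900, so treat it as setup-specific.

```python
# H/s band quoted for a 5900X at IF 1900; it depends on CPU and memory settings
EXPECTED_1900_HS = (18800, 19200)


def hashrate_check(rate_1900, rate_higher, band=EXPECTED_1900_HS):
    """Compare miner hashrates (higher is better) at IF 1900 and at a higher IF."""
    low, high = band
    return {
        "baseline_in_band": low <= rate_1900 <= high,
        # A lower hashrate at the higher IF is treated here as a sign of instability
        "higher_if_regressed": rate_higher < rate_1900,
    }


# Hypothetical hashrates, not measurements from this thread
print(hashrate_check(19050, 18700))
# {'baseline_in_band': True, 'higher_if_regressed': True}
```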

For fine-tuning VDDG CCD and IOD, I'd recommend GeekBench 5.
Compare the individual test results at different voltages and you'll find the sweet spot.
VDDG CCD can be tuned looking at the AES-XTS score.


----------



## des2k...

ManniX-ITA said:


> There are two tests to validate the IF and understand if there's a performance regression or not.
> System can look perfectly fine but maybe under the hood is not.
> But then it's up to you to decide.
> Even if slightly unstable under heavy load doesn't mean it's not good enough to be faster than IF 1900 with your usual workload.
> 
> You need to run benchmarks first at IF 1900 and then at higher IF and compare.
> 
> One is y-cruncher prime 2.5b, you can run it directly or via BenchMate.
> 
> It should take around 64-66 seconds at IF 1900.
> If it takes longer at higher IF speed, there's performance regression.
> 
> Then there's the XMR-Stak-RX.
> You can use BenchMaestro in my signature but the best way is to configure the miner for real mining.
> The benchmark mode is not stressing like the real workload.
> The real workload can peak the memory bandwidth usage like anything else.
> 
> You should get mining between 18800 and 19200 H/s at IF1900.
> Again if it's lower at higher IF means it's unstable.
> 
> For fine-tuning VDDG CCD and IOD, I'd recommend GeekBench 5.
> Compare one by one the test results with different voltages and you'll find out which is the best spot.
> VDDG CCD can be tuned looking at the AES-XTS score.


I will post the 2000IF then 1900IF mining & y-cruncher scores soon. Just waiting on 4h y-cruncher to pass.


----------



## des2k...

xolhid said:


> Try y-cruncher and post your results. At least 1 hour, but 4 hours should be enough to declare error stable.
> 
> Prime95 isn't a reliable stress test for modern systems. Stability isn't just if you don't get errors. Prime95 doesn't test for DPC latency spikes or CPU Cache performance issues.
> 
> My instructions aren't about "undervolting". I specifically said find the lowest settings that "doesn't lose performance or stability." Undervolting usually implies sacrificing performance. Finding the optimal settings for the best performance is what I am promoting.
> 
> By lowering voltages without losing performance you are putting less voltage stress and thermal load on the CPU and its memory controller. It could and usually does allow you to improve the performance from overclock settings.


2000IF with 4 cores CO -6 & 8 cores CO -12 is working so far for 1h+

These settings failed within 30 minutes, but I expected that, since the values were close to where Prime95 failed around the 6-hour mark (CO -16 & CO -18):
4 cores CO -6
4 cores CO -14
4 cores CO -16


----------



## des2k...

xolhid said:


> Try y-cruncher and post your results. At least 1 hour, but 4 hours should be enough to declare error stable.
> 
> Prime95 isn't a reliable stress test for modern systems. Stability isn't just if you don't get errors. Prime95 doesn't test for DPC latency spikes or CPU Cache performance issues.
> 
> My instructions aren't about "undervolting". I specifically said find the lowest settings that "doesn't lose performance or stability." Undervolting usually implies sacrificing performance. Finding the optimal settings for the best performance is what I am promoting.
> 
> By lowering voltages without losing performance you are putting less voltage stress and thermal load on the CPU and its memory controller. It could and usually does allow you to improve the performance from overclock settings.


It passed 4 hours at 2000IF with CO -6/-12. Here are the peak wattage and temperatures.

SOC, VDDP: auto
VDDG: 1050 mV

I will use this tool again for tweaking each core, see if I can push past -12 on some of them.


----------



## des2k...

ManniX-ITA said:


> There are two tests to validate the IF and understand if there's a performance regression or not.
> System can look perfectly fine but maybe under the hood is not.
> But then it's up to you to decide.
> Even if slightly unstable under heavy load doesn't mean it's not good enough to be faster than IF 1900 with your usual workload.
> 
> You need to run benchmarks first at IF 1900 and then at higher IF and compare.
> 
> One is y-cruncher prime 2.5b, you can run it directly or via BenchMate.
> 
> It should take around 64-66 seconds at IF 1900.
> If it takes longer at higher IF speed, there's performance regression.
> 
> Then there's the XMR-Stak-RX.
> You can use BenchMaestro in my signature but the best way is to configure the miner for real mining.
> The benchmark mode is not stressing like the real workload.
> The real workload can peak the memory bandwidth usage like anything else.
> 
> You should get mining between 18800 and 19200 H/s at IF1900.
> Again if it's lower at higher IF means it's unstable.
> 
> For fine-tuning VDDG CCD and IOD, I'd recommend GeekBench 5.
> Compare one by one the test results with different voltages and you'll find out which is the best spot.
> VDDG CCD can be tuned looking at the AES-XTS score.


64-66 seconds, really? I have a hard time believing that, since it's 93 seconds in this review with 3200 memory.

And here's mine at 2000IF, 4000 CL16:

Might as well boot 1900IF and try it....


----------



## ManniX-ITA

des2k... said:


> Might as well boot 1900IF and try it....


Yep it's a bit too much...

This is a good result at FCLK 2000:

And this is a good one at FCLK 1900:


----------



## des2k...

ManniX-ITA said:


> Yep it's a bit too much...
> 
> This is a good result at FCLK 2000:
> 
> 
> And this a good one at FCLK 1900:


I got it from "y-cruncher - A Multi-Threaded Pi Program", and my 3200 score matches the review screenshot I posted earlier.

It complains about a performance loss? Here's 1600IF with no CO:


----------



## ManniX-ITA

des2k... said:


> I got it here y-cruncher - A Multi-Threaded Pi Program and my 3200 score matches that review screenshot I posted earlier


Both y-cruncher and xmr-stak-rx scores are dependent on the memory settings.
You need to compare your IF 1900 against your IF 2000.



des2k... said:


> it complains about performance loss ?; here's 1600IF with no CO


You need to add the privilege to your user:

Enable the Lock Pages in Memory Option (Windows) - SQL Server (docs.microsoft.com)


----------



## xolhid

des2k... said:


> It's a pass for 2000IF & CO-6 CO-12 for 4hours. Here are peaks for wattage,temps.
> 
> SOC,VDDP auto
> VDDG 1050mv
> 
> I will use this tool again for tweaking each core, see if I can push past -12 on some of them.


Error stable is a good thing, but is only part of the battle. Performance stable is just as important. You can have a system that doesn't produce errors when stressed to the max, but can still have inconsistent and erratic performance results.

VDDP, VDDG (CCD & IOD) and SOC can, and usually will, affect performance consistency when they aren't set properly. I find that DPC latency is a good measure of performance stability.

Using the AIDA memory benchmark, the DRAM Calculator latency graph and LatencyMon can help dial in these voltages and other overclock settings for the most consistent performance.
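LatencyMon reports its own statistics; as a rough illustration of what "performance consistency" means numerically, here is a sketch that summarizes a list of DPC latency samples in microseconds. How the samples are collected or exported is left out, the 250 µs spike threshold is an arbitrary assumption, and the sample values are invented.

```python
from statistics import median, quantiles


def latency_summary(samples_us, spike_threshold_us=250.0):
    """Summarize DPC latency samples (µs): typical value, tail, worst case, spike count.

    statistics.quantiles needs at least two samples before Python 3.13.
    The spike threshold is an assumed cutoff, not a LatencyMon setting.
    """
    return {
        "median_us": median(samples_us),
        "p99_us": quantiles(samples_us, n=100)[98],  # 99th percentile cut point
        "max_us": max(samples_us),
        "spikes": sum(1 for s in samples_us if s > spike_threshold_us),
    }


# Invented samples: mostly 15-50 µs with one large peak
samples = [18, 22, 35, 41, 27, 19, 48, 263, 30, 25]
print(latency_summary(samples))
```

Comparing the median against the tail and spike count across voltage settings separates "fast on average" from "consistent".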


----------



## des2k...

ManniX-ITA said:


> Both y-cruncher and xmr-stak-rx scores are dependent on the memory settings.
> You need to compare your IF 1900 against your IF 2000.
> 
> You need to add the privilege to your user:
> 
> Enable the Lock Pages in Memory Option (Windows) - SQL Server (docs.microsoft.com)


That's what I get with mine. It probably needs a static OC, as I doubt it's a memory latency/bandwidth issue.

3800cl14

4000cl16


----------



## ManniX-ITA

des2k... said:


> that's what I get with mine, prob needs static OC as I have doubts it's a memory latency / bandwidth issue


For a 5900X it's a good result.
Static OC is indeed faster:


----------



## des2k...

xolhid said:


> Error stable is a good thing, but is only part of the battle. Performance stable is just as important. You can have a system that doesn't produce errors when stressed to the max, but can still have inconsistent and erratic performance results.
> 
> VDDP, VDDG (CCD & IOD), and SOC can and usually will impact performance consistency when they aren't set properly. I find that DPC latency is a good measure of performance stability.
> 
> Using AIDA memory benchmark, DRam Calculator Latency Graph & LatencyMon can help dial in these voltages and other overclock settings for the most consistent performance.


Yeah, I know about DPC latency. 90% of the time it's around 15-50 µs, with some peaks; it usually doesn't even register drivers outside the kernel for routine execution.

I know that the Lock Pages in Memory tweak from earlier (for y-cruncher 2.5b) was worse for kernel latency: it climbed to 260 µs+ and started registering driver execution times with a Firefox/YouTube tab open.

With CO off it's much cleaner for latency (lower peaks), but I don't think that's worth much.

Using Firefox, YouTube & Amazon Music.


----------



## tdimarzio

ManniX-ITA said:


> Yes good strategy.
> Usually at FCLK 1900 if they are so sporadic they can be fixed finetuning voltages.
> Another option to test is adding something to the PLL (VDD18), from 10 to 40mV.


Happy to report that I was able to stabilize FCLK 1900 using the suggestions. Left CO values as they were, as they had nothing to do with it. Verified that fclk 1866 was 100% stable (zero WHEA 19 over multiple weeks). Went back to 1900 and first tried various VSOC values between 1.1 and 1.2v. No apparent, consistent effect. Still sporadic WHEA 19. Then I adjusted VDDG IOD and CCD both to 1050mV and BOOM, all WHEA 19 are gone. Unfortunately, as I did both IOD and CCD at the same time, can't be sure which one it was. But I'm very happy that with both at 1050 mV it's completely stable. Thanks for the suggestions!


----------



## ManniX-ITA

tdimarzio said:


> Happy to report that I was able to stabilize FCLK 1900 using the suggestions. Left CO values as they were, as they had nothing to do with it. Verified that fclk 1866 was 100% stable (zero WHEA 19 over multiple weeks). Went back to 1900 and first tried various VSOC values between 1.1 and 1.2v. No apparent, consistent effect. Still sporadic WHEA 19. Then I adjusted VDDG IOD and CCD both to 1050mV and BOOM, all WHEA 19 are gone. Unfortunately, as I did both IOD and CCD at the same time, can't be sure which one it was. But I'm very happy that with both at 1050 mV it's completely stable. Thanks for the suggestions!


Well done!

I'd recommend a bit more fine tuning.
It's very sample-specific, but you can probably get better performance.
CCD is very likely too high at 1050 mV.
But IOD could also be too high or too low.

Take baseline benchmarks with GeekBench 5, CB23, y-cruncher pi 2.5b and also BenchMaestro in my signature.

Then start lowering the CCD voltage in 10 mV steps.
Check GB 5, focusing on the AES-XTS scores.
Considering the WHEA errors, the optimum could be around 1000-1020 mV.
Once you've found the sweet spot for CCD, move on to IOD.
The IOD sweet spot is often 1050 mV, but test 1060-1080 and a bit lower, 1020-1040.

Once you've found the best voltages for GB 5, double-check that you get the same or better scores in the other benchmarks.
If not, try moving both a bit higher or lower to compensate.
If the GB 5 optimum isn't optimal for the other benchmarks, the right value is usually very close, within about 10-20 mV.
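The procedure above is a one-variable-at-a-time sweep. A minimal sketch of the bookkeeping, assuming you run GB5 by hand at each voltage and type in the AES-XTS score; the helper names, the tie-break toward the lower voltage and all scores are illustrative assumptions. The voltage ranges are the ones suggested in the post.

```python
def sweep(start_mv, stop_mv, step_mv=10):
    """Voltages from start_mv to stop_mv inclusive, stepping up or down by step_mv."""
    step = step_mv if stop_mv >= start_mv else -step_mv
    return list(range(start_mv, stop_mv + step, step))


def best_voltage(scores_by_mv):
    """Highest score wins; on a tie, prefer the lower voltage."""
    return max(scores_by_mv, key=lambda mv: (scores_by_mv[mv], -mv))


# Step 1: walk CCD down from 1050 mV in 10 mV steps
ccd_candidates = sweep(1050, 1000)  # [1050, 1040, ..., 1000]

# Hypothetical GB5 AES-XTS scores per CCD voltage (placeholders, not real results)
aes_xts = {1050: 3100, 1040: 3120, 1030: 3150, 1020: 3160, 1010: 3160, 1000: 3140}
ccd = best_voltage(aes_xts)  # 1010: ties with 1020, lower voltage preferred

# Step 2: with CCD fixed, test IOD from 1020 to 1080 mV the same way
iod_candidates = sweep(1020, 1080)
print(ccd, iod_candidates)
```

Preferring the lower voltage on a tie follows the reasoning earlier in the thread that less voltage at equal performance means less stress; in practice, single-run GB5 noise can be larger than the differences, so repeated runs help.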


----------



## ChillyRide

Insane tool! Those timings were stable and WHEA-free at 1900. I can't say it's stable now, but there are no more long boots or performance degradation. Added tons of PLL voltage (2.1 V).


----------



## tdimarzio

ManniX-ITA said:


> Well done!
> 
> I'd recommend a bit more fine tuning.
> It's really very sample specific but you can probably get better performances.
> CCD is very likely too high at 1050 mV.
> But also IOD could be too high or too low.
> 
> Take baseline benchmarks with GeekBench 5, CB23, y-cruncher pi 2.5b and also BenchMaestro in my signature.
> 
> Then start testing lowering CCD voltage by 10mV steps.
> Check GB 5, focus on AES-XTS scores.
> Considering the WHEA errors it could be optimal around 1000-1020mV.
> Once you found the sweet spot for CCD move to IOD.
> The IOD sweet spot is often 1050mV but test with 1060-1080 and a bit lower 1020-1040.
> 
> Once you have found the best voltages for GB 5 double check you get same or improved scores with the other benchmarks.
> Otherwise start testing a bit higher/lower both to compensate.
> If it's not optimal for the other benchmarks, it's usually very near, around 10-20mV.


Thanks. Will look into tuning the voltages further. In the meantime I did some quick cb23 nT and GB5 tests and the results seem mostly unchanged. Hard to say for sure since I can't control for ambient and it's been getting warmer here. But, the last CB23 nT I recorded was a little over 30k (30031) and just recorded 29936. There's also an OS change between those too. Win10 prior / Win11 later. So, not apples-to-apples. GB5 seems about the same. Still, using the current values as the baseline I can try what you've suggested, especially around CCD. I can also try reducing vsoc now that I know it wasn't playing a role in the WHEA 19. Maybe between reducing CCD and vsoc I can squeeze a little more performance. By the time this thing is perfectly tuned it will be time to upgrade to Zen4


----------



## shnyaps

ChillyRide said:


> Insane tool! Those timings was stable and whea free at 1900, cant say it is stable but no more long boot and perf degradation, add tons of PLL 2.1v


Did you mean you have increased 1.8 V up to 2.1V?


----------



## ChillyRide

shnyaps said:


> Did you mean you have increased 1.8 V up to 2.1V?


Yes. 2.1 V helps in the AIDA benchmark; lower voltage shows performance degradation. But I'm back to my stable 3800:1900. Nothing helps to stabilise the RAM. I spent countless hours reading, testing and stabilising my rig at CL13, and I'm done spending more time on it.


----------



## tdimarzio

ManniX-ITA said:


> Well done!
> 
> I'd recommend a bit more fine tuning.
> It's really very sample specific but you can probably get better performances.
> CCD is very likely too high at 1050 mV.
> But also IOD could be too high or too low.
> 
> Take baseline benchmarks with GeekBench 5, CB23, y-cruncher pi 2.5b and also BenchMaestro in my signature.
> 
> Then start testing lowering CCD voltage by 10mV steps.
> Check GB 5, focus on AES-XTS scores.
> Considering the WHEA errors it could be optimal around 1000-1020mV.
> Once you found the sweet spot for CCD move to IOD.
> The IOD sweet spot is often 1050mV but test with 1060-1080 and a bit lower 1020-1040.
> 
> Once you have found the best voltages for GB 5 double check you get same or improved scores with the other benchmarks.
> Otherwise start testing a bit higher/lower both to compensate.
> If it's not optimal for the other benchmarks, it's usually very near, around 10-20mV.


Further refinement to the voltages.
CCD indeed was too high at 1050. Stock (1000) was fine for CCD.
Also lowered VSOC from 1200 mV (auto value on Aorus B550 when IF >=1900) to 1100 mV.
So, in the end, the critical value for eliminating the WHEA19 was VDDG IOD @ 1050 (vs. 1000, stock).
In other words, had I known from the beginning, it would have been as simple as adding +50 mV to IOD and everything else on auto.
In time, I will try +/- 10mV increments on IOD to see if there is a more optimal value. Thanks again!


----------



## ManniX-ITA

tdimarzio said:


> In time, I will try +/- 10mV increments on IOD to see if there is a more optimal value. Thanks again!


Nice! You're welcome.

I'd bump VSOC up a little, to around 1.125 V.
It's probably reading 1.087 V in ZenTimings.
Under load, depending on the LLC setting, it could go even lower.
If the gap between VDDG IOD and VSOC is less than 40-50 mV (I forget the exact number), the CPU will auto-correct it.
Which is bad, as are all auto voltages with AMD...
You can get weird issues: instability, poor performance, stuttering in games.
I had issues with a delta below 60 mV.
To be on the safe side, set VSOC (as read in ZenTimings) at least 60 mV higher than IOD.
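A small sketch of that rule of thumb, assuming the 60 mV margin recommended in the post and VSOC as read in ZenTimings; the helper names are hypothetical. The IOD value is tdimarzio's 1050 mV from above, and 1.087 V is only ManniX's guess at the reading, not a measurement.

```python
MIN_DELTA_MV = 60  # safety margin recommended in the post (VSOC as read in ZenTimings)


def min_vsoc_mv(iod_mv, delta_mv=MIN_DELTA_MV):
    """Lowest VSOC reading (mV) that keeps the recommended gap above VDDG IOD."""
    return iod_mv + delta_mv


def vsoc_margin_ok(vsoc_read_mv, iod_mv, delta_mv=MIN_DELTA_MV):
    """True if the measured VSOC sits at least delta_mv above IOD."""
    return vsoc_read_mv - iod_mv >= delta_mv


iod = 1050                        # VDDG IOD that removed the WHEA 19 errors above
print(min_vsoc_mv(iod))           # 1110
print(vsoc_margin_ok(1087, iod))  # False: a 1.087 V reading leaves only 37 mV
```

Since VSOC can droop under load depending on LLC, checking against the loaded reading is the more conservative choice.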


----------



## tdimarzio

ManniX-ITA said:


> Nice! You're welcome.
> 
> I'd bump up a little VSOC, around 1.125V.
> It's probably reading 1.087V in Zentimings.
> Under load, depending on LLC setting, could go even below.
> If the gap between VDDG IOD and VSOC is lower than 40-50mV (forgot the exact number) the CPU will auto-correct it.
> Which is bad as all auto voltages with AMD...
> You can get weird issues, instability, poor performances, stuttering in games.
> I had issues with a delta below 60mV.
> To be on the safe side set VSOC as read in Zentimings at least 60mV higher than IOD.


Ok I'll adjust so VSOC is at least 60mV greater than IOD. Thanks!


----------



## mtrai

@ManniX-ITA and @Veii, I just needed to thank you both very much for all your hard work. I now have a much greater understanding of the voltage settings in the BIOS.

I finally had the time to really dive deep since I stopped mining on my main rig last Friday.

Funny, I can POST at 2000, but it's unstable. 1900 is now rock stable; I was never able to get stable above 1833 before. Unfortunately, this motherboard (Gigabyte X570 Aorus Master v1.2) has other issues. A new MSI MEG X570S ACE MAX should arrive this Wednesday, so I get to start retuning. I'm sure I'll be able to get higher clocks stable, but I won't chance the voltages on this motherboard, as it has already taken out several components.

Thanks again, I know y'all do not hear that enough, God knows I did not when I was modding bios for Ryzen 2k boards.


----------



## Tatili

Hi to everyone who posts in this thread, and especially to @ManniX-ITA: I adjusted VSOC to be at least 60mV greater than IOD and boom... I can POST 2000 with CL14, and the WHEA 19s in the log are gone!!!
Stable, gaming like a charm, and happy with my 4000 MHz setup now. Sharing my good numbers with you all, cheers


----------



## Tatili

Dap


----------



## Dasa

I'm running Windows 11 with WHEA suppressed, but I set that up a long time ago and don't remember how it was done or how to reverse it. I need to undo it so I can work out which cores are making my system crash when overclocking the CPU.
Any ideas how to bring back the WHEA errors?


----------



## ManniX-ITA

Dasa said:


> Running windows 11 with WHEA suppressed but I did this a long time ago and don't remember how it was done or how to reverse it so that I can work out which cores are making my system crash when overclocking the CPU.
> Any ideas how to bring back the WHEA?


Maybe check whether you have something in Scheduled Tasks.


----------



## Dasa

I found the WHEA service and disabled it, but there are still no WHEA errors. From memory, it didn't work with Windows 11, so I found another method, but I can't find that method again, so I'm not sure how to reverse it.
It may have been a registry edit, I don't know, and Google isn't being helpful.


----------



## Cidious

Just got the X570S Unify-X MAX in to replace my faulty X570 Unify. Thought to give it another go for FCLK 2000 with 5800X3D and Rev E mem. 
Can get the performance to improve with Aida but still tons of WHEA 19.

What are the next steps for stabilizing and trying to get rid of the WHEA 19 without suppressing them?


----------



## ManniX-ITA

Cidious said:


> What are the next steps for stabilizing and trying to get rid of the WHEA 19 without suppressing them.


Never read about anyone with a 3D and FCLK 1900+ without WHEA, even by hearsay.

If you want to try, the usual approach is higher VDD18 and VSOC/VDDG voltages.
Very fine adjustments, down to 5 mV steps, may be necessary.
The problem I see is that the 3D usually likes lower voltages, not higher.

I suggest you first take a baseline with the y-cruncher pi 2.5b benchmark and the cpuminer-opt hash rate.
WHEA or not, if the IF doesn't work properly you get performance regressions.
So besides a better AIDA score, you need to be sure it really goes faster and not slower.

Honestly, considering how much less RAM matters on the 3D, I'd look for better timings at 1900 MHz.


----------



## LazyGamer

ManniX-ITA said:


> Honestly, considering how much less important is RAM on the 3D, I'd look for better timings at 1900 MHz.


This. I'd recommend just getting it stable and forgetting about it, particularly because you're gaining memory latency by having the 3D cache there in the first place. If you wanted to race memory benchmarks, the 5700G is your chip.


----------



## Cidious

ManniX-ITA said:


> Never read about anyone with a 3D and FCLK 1900+ without WHEA, even by hearsay.
> 
> If you want to try, usually what is attempted is higher VDD18 and VSOC/VDDG voltages.
> Very small fine-tuning to the 5mV can be necessary.
> Problem I see is that the 3D usually likes lower voltages, not higher.
> 
> I suggest you first take a baseline of y-cruncher benchmark pi2.5b and hash rate output of cpuminer-opt.
> WHEA or not if the IF doesn't work properly, you get performance regressions.
> So besides a better AIDA score you need to be sure it really goes faster and not slower.
> 
> Honestly, considering how much less important is RAM on the 3D, I'd look for better timings at 1900 MHz.


1900 is stable at the max setting my RAM reasonably likes with a reasonable daily voltage, and yes, due to the cache this is less relevant. It's more of a curiosity thing, because AMD was so confident saying 2000 is the new 1900, but it's still a bit of a unicorn. I've run 1900IF on all my chips from my 3600 on: 3800X, 3950X, 5800X, 5900X and now the 5800X3D. AMD saying 2000 is what 1900 was for Zen 2 was just a marketing gimmick. Still pisses me off to this day, haha.


----------



## ManniX-ITA

Cidious said:


> 1900 is stable at the max setting that my ram reasonably likes with reasonable daily voltage and yes due to the cache this is less relevant. it's more curiosity thing because AMD was so confident saying 2000 is the new 1900 but still it's a bit of a unicorn.. I ran 1900IF from my 3600 and on, on all my chips.. 3800X, 3950X, 5800X, 5900X and 5800X3D now.. AMD saying 2000 is what 1900 was for Zen 2 was just a marketing gimmick. pisses me off to the day of today haha.


Oh, if it's a curiosity thing, it could be an interesting challenge.
I'd see what happens when pushing VDD18 and VSOC high, including high LLC for CPU and SOC.
Then play with IOD and CCD, trying to keep them as low as possible.
You would probably need IOD very high as well, since that's what feeds the Infinity Fabric.


----------



## D3humaniz3d

Hi everyone,

I have a 5900X paired with an Aorus Xtreme with a kit of 4x8GB Hynix DJR (G.Skill RipjawsV). I'm kinda new to this whole OC schtick but am learning as I go, so please don't slam me too hard 😖

I have two questions:

1. My daily setup is a tightened 3800 MHz with [email protected] and I have no issues with it aside from sporadic WHEA 19 errors (Error Source 0 iirc; I cleared the log about 30 minutes ago without saving...) surfacing after waking the system from sleep. I've played around with all the standard voltages and settings (VSOC, VDDP, VDDG, CPU VTT18, ProcODT) with little success in getting rid of the WHEA 19s. The only things I have *not yet adjusted are the PM_CLDO12, PM_1VSOC and PM1V8 voltages*, but I haven't found any documentation on what these voltages are responsible for in the first place or how high I can push them within reasonable territory for a daily OC. The OC previously worked WHEA-free on a Crosshair VIII Dark Hero.

Ignore the VSOC in the screenshot; I run it daily at 1.15 V in ZenTimings and am currently testing with less voltage (set to 1075).

2. My second question regards higher FCLK speeds... I can boot at 2100 FCLK in 1:1 mode, but it throws ~100 WHEA 19s every two minutes and needs quite a high VSOC (around 1.29 V displayed in ZenTimings) to keep the mouse from dropping out every two seconds. The greatest success I had was at FCLK 2000 with 900 mV VDDP, 950 mV CCD, 1100 mV IOD and 1250 mV SOC (set in BIOS; ZenTimings reads a bit less), but the same number of WHEAs kept appearing.

The structure of the WHEA dumps is as follows: 

One or two WHEA 19's with the Error source 1 pop up;

30 seconds later, around a hundred WHEA 19's with the Error Source 0 follow the first Error Source 1;

Repeat every 2 minutes.
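To see a pattern like this (a few Error Source 1 entries followed by a burst of Error Source 0 every couple of minutes), the events can be bucketed by time window and source. A hedged sketch: it assumes the WHEA 19 entries have already been exported from Event Viewer into (seconds, error source) pairs, which is not shown here; the 120 s window, the helper name and the sample data are illustrative.

```python
from collections import Counter, defaultdict


def bursts_by_window(events, window_s=120):
    """Count WHEA events per error source in fixed time windows.

    `events` is an iterable of (timestamp_seconds, error_source) pairs.
    Returns {window_index: {error_source: count}} sorted by window.
    """
    windows = defaultdict(Counter)
    for t, source in events:
        windows[int(t // window_s)][source] += 1
    return {w: dict(c) for w, c in sorted(windows.items())}


# Illustrative data: one source-1 event, then ~100 source-0 events 30 s later, twice
events = [(5, 1)] + [(35, 0)] * 100 + [(125, 1)] + [(155, 0)] * 100
print(bursts_by_window(events))
```

A steady per-window count across many windows points to a repeating fault rather than isolated errors.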

Karhu throws an error at 120-140% coverage. I also noticed huge spikes in LatencyMon preceding the dump of ~100 WHEA 19s, which happened regardless of whether Manni's WHEA suppressor was running, despite the lower average latency.

The WHEAs themselves seemed to be affected by the CPU VTT18 voltage: increasing it from 1.8 V to 2.0 V seemed to reduce the Error Source 0 spam by 10 entries.

My takeaway from this is that my chip cannot handle the higher FCLK's and I should not bother with it? 

Kind regards,
D3


----------



## ManniX-ITA

D3humaniz3d said:


> surfacing after waking the system from sleep. I've played around with all the standard voltages and settings (Vsoc, VDDP, VDDG and CPU VTT18, procODT) with little success on getting rid of the WHEA 19's. The only thing I have not tried *yet to adjust are the PM_CLDO12, PM_1VSOC, PM1V8 voltages*


Never had any success touching them.
A reliable wake up from standby is one of the things I really miss from Intel.
Across 3 boards and different Ryzen processors I'm still unable to get the system working properly after wake up.
With the latest Win10 patches it improved a bit but it's still unreliable.
At some point I always have to reboot because something starts freaking out or the system gets slow.
You can try playing with the ErP and idle PSU power settings, but you can probably only make it work better, never as it should.



D3humaniz3d said:


> My takeaway from this is that my chip cannot handle the higher FCLK's and I should not bother with it?


If you get different error sources and spikes in latency yes, it's for sure unstable.
You can try to focus on 2000 MHz, it's already very high for a 5900X with dual CCD, but it could take months to stabilize.
Even if it's stable with no or only a few WHEA, you then need to check for performance regressions.
That means benchmarking with the Monero miner and y-cruncher pi 2.5b.
Only if you get better results than at FCLK 1900 is it working.

Depends on the processor but you probably need IOD up to 1120/1150, CCD to 1000/1050 and sometimes VDDP up to 1000/1050.


----------



## D3humaniz3d

ManniX-ITA said:


> Depends on the processor but you probably need IOD up to 1120/1150, CCD to 1000/1050 and sometimes VDDP up to 1000/1050.


Thank you for the writeup! Cleared up some things for me. I will try the voltages you suggested later in the evening. Fingers crossed it will stop doing stupid things with that IOD voltage - Have not yet tried to go that high.


----------



## ManniX-ITA

D3humaniz3d said:


> Thank you for the writeup! Cleared up some things for me. I will try the voltages you suggested later in the evening. Fingers crossed it will stop doing stupid things with that IOD voltage - Have not yet tried to go that high.


IOD feeds the Infinity Fabric so it's really the pivotal factor, together with VSOC which is feeding it.
CCD is crucial as well, but it needs to be much lower, going only slightly higher with each FCLK strap.
All of them perform worse when too low or too high, so you need to find the right values.


----------



## D3humaniz3d

ManniX-ITA said:


> IOD feeds the Infinity Fabric so it's really the pivotal factor, together with VSOC which is feeding it.
> CCD is crucial as well but needs to be much lower, only slightly higher for every FCLK strap.
> All they do worse when too low or too high so you need to find the right values.


One more question: can the PBO limits affect stability? I'm doing all this at stock settings: no CO offsets, no adjusted PBO limits, etc. My thought process behind this was not to introduce additional sources of instability.


----------



## ManniX-ITA

D3humaniz3d said:


> One more question: Can the PBO Limits affect the stability? Because I'm doing this all on stock operation, no CO offsets, no adjusted PBO limits, etc. My thought process behind this was not to introduce additional sources of instability.


IF stability also depends on the workload.
Even once you find stability, it can become unstable again when you set PBO, CO or tighter memory timings.
It's a good idea to start with stock settings; it's easier to deal with.
But that may indeed be only the first half of the work.


----------



## mongoled

@ManniX-ITA
Hope you're doing well.
Has anyone reported the tool not working on Windows 10 21H1?
I have it installed but it is not suppressing the WHEA warnings.

*Edit:*
Actually, I may be remembering incorrectly how the suppressor works.

I get the "bootup" WHEA warnings, but after the PC has fully booted the WHEA warnings don't appear.

I think this is correct.


----------



## ManniX-ITA

mongoled said:


> actually, I may be remembering incorrectly how the suppressor works.
> 
> I get the "bootup" whea warnings, but after the PC has fully booted the whea warning dont appear.
> 
> I think this is correct.


Doing well, thanks, apart from my first cold in 3 years: sneezing and coughing!

Yes, that's correct.
The first WHEAs will be logged until they're suppressed.

I'm working a lot on CPUDoc now that I'm in Italy.
Lots of good stuff incoming:

5-7% CPU performance boost (SMT-off-like performance without disabling SMT; gaming +3-6 fps on average, with peaks up to 30-50 fps)
WHEA suppressor through MSR, completely freeing Windows resources and hopefully more granular (I hope it will be possible to filter only the FCLK WHEAs; still have to test)
Completely dynamic power plan adapting to the CPU model: Ultimate plan performance with super-low temperature and power at idle


----------



## mongoled

ManniX-ITA said:


> Doing well thanks, except from the first cold after 3 years, sneezing and coughing!
> 
> Yes indeed that's correct.
> First WHEAs will be logged till suppressed.
> 
> Working a lot on CPUDoc now that I'm in Italy.
> Lots of good stuff incoming;
> 
> 5-7% CPU Boost in performances (SMT off like without disabling it, gaming +3-6 fps on average with peaks up to 30-50 fps)
> WHEA Suppressor through MSR, completely freeing Windows resources, hopefully more granular (hope will be possible to filter only FCLK WHEA, have to test yet)
> Completely dynamic power plan; adapting to CPU model, Ultimate plan performances with super low temperature and power idling


Ahh, get well soon, or hopefully you're over the cold already.

Oh, nice to see you're working on some "goodies"; I'm sure you're having fun with that, along with the trials and tribulations it involves.

I look forward to seeing the fruits of your dedication.


----------



## ManniX-ITA

Unfortunately, Microsoft and AMD have obviously dropped support for Ryzen 3000.
The hacksaw temperature graph is gone for the 5000 series, but it's back for the 3000.
And there's no "solution", even though the heterogeneous scheduling still works as before (the lower half of the cores by CPPC performance ranking are treated as Efficient).
I didn't test with the old drivers before upgrading to 22H2.

The good news is that CPUDoc is nearing its first Beta milestone and it's just great 😁 

On my 3800X, the idle temperature, previously the infamous hacksaw graph at 40-50 °C, is now a rock-steady 36 °C.
Idle power consumption, previously 33-50 W PPT, is down to a rock-steady 24 W.
The first stage, light standby, already sits at 25 W and 37 °C.

*That's a reduction of at least 9 W and up to 26 W; just awesome.*

All this while keeping the stellar Ultimate power plan performance when in use:


----------



## Artylol

Help me get rid of WHEA errors. My chip easily does 1900, and 1933 with adjusted voltages; anything higher gives WHEA errors. And that's even though my chip booted into 2000 IF completely stock, without adjustments. What's strange is that the WHEA errors arrive in bursts of 100-103 at a time, exactly once a minute, with nothing in between. I tried:

1) Changing voltages up and down (incl. VDD18)
2) Changing DRAM electrical termination settings, but I don't believe it can help, even in theory.
3) Changing the CPU core voltage
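To confirm a periodic pattern like "100-103 errors exactly once a minute" rather than eyeballing Event Viewer, the event timestamps can be grouped into bursts. A minimal sketch, assuming the WHEA event times have already been exported as seconds (how you export them is up to you); `find_bursts`, `burst_intervals` and the 5 s gap separating bursts are arbitrary choices made here for illustration.

```python
from typing import List, Optional, Tuple


def find_bursts(timestamps: List[float], max_gap: float = 5.0) -> List[Tuple[float, int]]:
    """Group event times (seconds) into bursts.

    An event closer than max_gap seconds to the previous one joins the same
    burst. Returns (burst_start, event_count) for each burst, in time order.
    """
    bursts: List[Tuple[float, int]] = []
    start: Optional[float] = None
    prev: Optional[float] = None
    count = 0
    for t in sorted(timestamps):
        if prev is not None and t - prev > max_gap:
            bursts.append((start, count))  # gap too long: close the current burst
            start, count = None, 0
        if start is None:
            start = t
        count += 1
        prev = t
    if start is not None:
        bursts.append((start, count))  # close the last burst
    return bursts


def burst_intervals(bursts: List[Tuple[float, int]]) -> List[float]:
    """Seconds between the starts of consecutive bursts."""
    return [b[0] - a[0] for a, b in zip(bursts, bursts[1:])]
```

If the burst sizes stay around 100 and the intervals sit at 60 s, the periodicity is real and not an artifact of how the log is displayed.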

I also found out that there is a sweet spot for VDDG_CCD: write bandwidth drops below 900mV and also drops above 900mV, then recovers again at 1000mV+.
Here are my settings:


----------



## zGunBLADEz

Hmm, interesting. I will try this with the infamous Gigabyte Z690 ITX Lite PCIe 4/5 dilemma lol.
I get half a million WHEAs on first boot within seconds if I switch from PCIe Gen3.


----------



## ManniX-ITA

Artylol said:


> Help me to get rid of WHEA errors. My chip easily does 1900, 1933 with adjusted voltages and anything higher gives whea errors. And that given my chip booted into 2000IF completely stock, without adjustments. What strange is whea errors drop in 100-103 errors at a time exactly every 1 minute, while being clean inbetween. I tried:
> 
> 1) Changing voltages up and down (incl VDD18)
> 2) Changing DRAM electrical termination settings, but I dont believe it can help even in theory.
> 3) Changing voltages to CPU core


It's not always possible to fix, which is why the suppressor exists.
Sometimes, even though everything is working fine, you keep getting WHEAs.

You probably need much higher VSOC, around 1.2-1.225V.
Maybe also a bit more IOD; I was running 1140mV.
Depending on the sample, you may also need more CCD; 900mV is low.
Usually at least 950mV; I was using 1050mV.

Use the Geekbench 5 AES-XTS test to find the best CCD value,
but only after you have settled VSOC and IOD.
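Once VSOC and IOD are fixed, the CCD sweep boils down to picking the voltage with the best score. Since the scores are not monotonic (see the dip around 900 reported earlier in the thread), taking the maximum is safer than assuming "more is better". A minimal sketch, assuming you run the Geekbench 5 AES-XTS subtest a few times at each VDDG CCD step and type the scores in by hand; `best_ccd_voltage` and the tie-break toward the lower voltage are choices made here, not something from the thread.

```python
from statistics import mean
from typing import Dict, List


def best_ccd_voltage(runs_by_mv: Dict[int, List[float]]) -> int:
    """Return the VDDG CCD voltage (mV) with the best average AES-XTS score.

    Several runs per voltage are averaged to smooth out run-to-run noise.
    Voltages are visited in ascending order and max() keeps the first
    maximum it sees, so a tie resolves to the lower voltage.
    """
    averages = {mv: mean(runs) for mv, runs in runs_by_mv.items() if runs}
    return max(sorted(averages), key=lambda mv: averages[mv])


# Made-up scores (GB/s) purely to show the call, not real results.
runs = {900: [3.10, 3.12], 950: [3.30, 3.28], 1000: [3.25, 3.27], 1050: [3.30, 3.28]}
print(best_ccd_voltage(runs))  # 950: ties with 1050, lower voltage wins
```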

To verify there's no performance loss, check at least with y-cruncher Pi 2.5b.
It must finish in less time than at FCLK 1900.



zGunBLADEz said:


> hmmm interesting i will try this in the infamous gigabyte z690 itx lite pci ex4/5 dilemma lol
> i get half a million of wheas in first boot in a matter of seconds if i switch from pciex3


That's Intel, but in theory it should work just as well.
Never tested it, though.


----------



## zGunBLADEz

Latest Windows 10, fully updated: no good, man.
I tried starting the service but no dice.


----------



## ManniX-ITA

zGunBLADEz said:


> latest windows 10 fully updated no good man.
> try starting the service but no dice.


The service starts and stops; it doesn't keep running.

Do you mean that you keep getting WHEAs?


----------



## zGunBLADEz

The service doesn't start at all. I tried to start it manually but no dice, so I haven't gotten to the WHEA part yet since the service isn't running.


----------



## ManniX-ITA

zGunBLADEz said:


> the service doesnt start at all i try manually to start it but no dice. so i havent get into the whea part yet as service isnt running


Uhm, probably it's not clear what it does.
It will never end up in a "Running" state.
Many Windows services are just meant to be started: they do their work and then stop (or keep Running if needed).

Check with Computer Management; that's the only way to know what it did.
You should have a new Application log:

There is a series of event log entries where you can check what it is trying to disable and what the status is at the end, before it exits.


----------



## zGunBLADEz

ManniX-ITA said:


> Uhm probably it's not clear what it does
> It will never end in a "Running" state.
> Many Windows service are just meant to be started, they do stuff and they stop (or keep Running if needed).
> 
> Check with Computer Management; that's the only way to know what it does.
> You should have a new Application log:
> 
> 
> 
> There are a series of event logs and then you can check what is trying to disable and what is the status at the end before it exits.


OK, this is what I got.
Don't get scared, that's a KNOWN issue on this motherboard lol. If you go above Gen3 it will FLOOD you with WHEA errors; on Gen3, zero problems.


----------



## zGunBLADEz




----------



## ManniX-ITA

zGunBLADEz said:


> Dont get scare thats a KNOWN issue on this motherboard lol if you go above gen3 it will FLOOD you with whea errors on gen3 0 problems


Wow, that's a lot 

Probably a different source.
Can you get a screenshot of the event log like the one I posted before, with the list?
Also a screenshot of the WHEA event itself.


----------



## ManniX-ITA

Like this, with the details:


----------



## zGunBLADEz

There's more: it failed to disable some of the WHEA sources,
for example:

Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Started Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source

Successfully disabled WHEA error source type 16 ID=3
Failed to disable WHEA error source type 0 ID=4
Successfully disabled WHEA error source type 1 ID=5
Failed to disable WHEA error source type 3 ID=6
Failed to disable WHEA error source type 7 ID=7


Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Stopped Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source


Giga did a number on this mobo lol. Can't complain for $150 bucks though; I can manage with Gen3.
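Since the log prints the full source table before and after, the interesting part is which IDs flipped from Started to Stopped. A minimal sketch for diffing two pasted snapshots, assuming the exact line layout shown in these posts (`ID: n Type: n State: ...`); that layout is not a documented WHEAService format and may differ between versions, and `parse_status`/`newly_stopped` are hypothetical helper names.

```python
import re
from typing import Dict, List, Tuple

# Matches status lines like "ID: 3 Type: 16 State: Stopped Description: ..."
# \s+ also tolerates extra spaces between fields.
STATUS_RE = re.compile(r"ID:\s*(\d+)\s+Type:\s*(\d+)\s+State:\s*(\w+)")


def parse_status(block: str) -> Dict[int, Tuple[int, str]]:
    """Map source ID -> (type code, state) for each status line in block."""
    sources: Dict[int, Tuple[int, str]] = {}
    for line in block.splitlines():
        m = STATUS_RE.search(line)
        if m:
            sources[int(m.group(1))] = (int(m.group(2)), m.group(3))
    return sources


def newly_stopped(before: Dict[int, Tuple[int, str]],
                  after: Dict[int, Tuple[int, str]]) -> List[int]:
    """IDs that are Stopped in `after` but were not Stopped in `before`."""
    return sorted(
        sid for sid, (_, state) in after.items()
        if state == "Stopped" and before.get(sid, (0, ""))[1] != "Stopped"
    )
```

Fed the two status blocks above, this should report IDs 3 and 5, matching the "Successfully disabled" lines.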


----------



## ManniX-ITA

zGunBLADEz said:


> theres more if failed for example to disable some of the wheas


That's normal: some sources will fail the stop command and keep running, but they'll stop sending errors.

I bet those errors are coming from ID 0 to 2 with Type 0x04.
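The reasoning here (whatever is still running and never targeted by the service is the likely culprit) can be expressed directly. Sketch only: the `{ID: (type, state)}` dict shape and the `untargeted_running` name are assumptions, with the values transcribed by hand from the status block posted after the first run and the targeted types (16, 0, 1, 3, 7) from the same log.

```python
from typing import Dict, List, Set, Tuple


def untargeted_running(sources: Dict[int, Tuple[int, str]],
                       targeted_types: Set[int]) -> List[int]:
    """IDs of sources still Started whose type the service never tries to stop.

    Sources that were targeted but failed to stop are left out on purpose:
    they keep running but, per the note above, stop sending errors.
    """
    return sorted(
        sid for sid, (typ, state) in sources.items()
        if state == "Started" and typ not in targeted_types
    )


# Status after the first run, transcribed from the log above.
status = {0: (4, "Started"), 1: (4, "Started"), 2: (4, "Started"),
          3: (16, "Stopped"), 4: (0, "Started"), 5: (1, "Stopped"),
          6: (3, "Started"), 7: (7, "Started"), 8: (16, "Started")}
print(untargeted_running(status, {16, 0, 1, 3, 7}))  # [0, 1, 2]: the Type 4 PCIe sources
```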

You need to find the WHEA events in the System log and post the content of the details.
Check what the ErrorSource is and post one event for each different source.

I can make a quick binary just for this and we can see if it works.


----------



## zGunBLADEz

It did manage to block some of them though... I'm looping Heaven and it was around 147k before, ufffff lol

Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Stopped Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source

These failed to disable:
Failed to disable WHEA error source type 7 ID=7
Failed to disable WHEA error source type 3 ID=6
Failed to disable WHEA error source type 0 ID=4

Successfully disabled WHEA error source type 1 ID=5
Successfully disabled WHEA error source type 16 ID=3

Error Sources targeted to be disabled:

Type: 16 Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
Type: 7 Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source

Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Started Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source


----------



## ManniX-ITA

zGunBLADEz said:


> it did manage to block some of them tho.... im looping heaven and is around 147k before ufffff lol


Try this version; replace what's in the installation folder:

WHEAService_with_PCIe.zip (drive.google.com)


----------



## zGunBLADEz

There are so many that I only see the same one, like a thousand times lol.
For example, this is the only one I see:


----------



## zGunBLADEz

Did that, still errors:

Error Sources targeted to be disabled:

Type: 16 Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
Type: 7 Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source


Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Started Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source

Successfully disabled WHEA error source type 16 ID=3
Failed to disable WHEA error source type 0 ID=4
Successfully disabled WHEA error source type 1 ID=5
Failed to disable WHEA error source type 3 ID=6
Failed to disable WHEA error source type 7 ID=7


----------



## ManniX-ITA

zGunBLADEz said:


> did that still error


Uhm, are you sure you replaced it correctly?
Type: 4 is missing from the list of sources targeted to be disabled; it seems it's not the version I sent you.


----------



## ManniX-ITA

zGunBLADEz said:


> Error Sources targeted to be disabled:
> 
> Type: 16 Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
> Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
> Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
> Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
> Type: 7 Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source


You should get this message instead:

Error Sources targeted to be disabled:

Type: 16 Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
Type: 7 Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
Type: 4 Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
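A quick way to tell which build is actually running is to look for Type 4 in the targeted list. Minimal sketch, assuming the log text has been copied out of Event Viewer as plain text in the layout shown above; `targeted_types` is a hypothetical helper, not part of WHEAService.

```python
import re
from typing import Set

# Targeted-source lines start with "Type:", while status lines start with "ID:",
# so anchoring at the start of each line keeps the two apart.
TARGET_RE = re.compile(r"^\s*Type:\s*(\d+)\s+Description:", re.MULTILINE)


def targeted_types(log: str) -> Set[int]:
    """Type codes listed under 'Error Sources targeted to be disabled'."""
    return {int(code) for code in TARGET_RE.findall(log)}


sample = "Type: 16 Description: a\nType: 4 Description: b\nID: 0 Type: 4 State: Started Description: c"
print(4 in targeted_types(sample))  # True: this build targets PCIe sources
```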


----------



## Artylol

Deleted


----------



## Artylol

ManniX-ITA said:


> Not always possible to fix it, that's why there is the suppressor.
> Sometimes despite all is working fine you keep getting WHEAs.
> 
> You probably need much higher VSOC, around 1.2-1.225V.
> Maybe also a little bit more IOD, I was running 1140mV.
> Depending on the sample you may also need more CCD, 900mV is low.
> Usually at least 950mV, I was using 1050m


Is it OK to run with loads of WHEA errors for daily use? I don't seem to get USB/audio instabilities. Also, isn't it unsafe to go above 1.2V on VSOC?


----------



## ManniX-ITA

Artylol said:


> Is that ok to run with whea errors due to IF OC?


If they are only WHEA 19 and caused only by the FCLK speed, yes.
But you need to verify that yourself; no one else can do it for you.
I ran benchmarks and stress tests for months on a secondary install before using it daily on my main install.
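The rule of thumb above can be summarized as a simple triage on event IDs. Sketch only, assuming you already have the WHEA-Logger event IDs from the System log as integers; `verdict` and its messages are made up here, and the ID meanings (18 fatal, 19 corrected) follow the WHEA 18/19 wording used at the top of this thread. It cannot tell *why* the errors happen; that still takes stress tests and benchmarks.

```python
from collections import Counter
from typing import Iterable

WHEA_FATAL = 18      # WHEA-Logger event 18: fatal (unrecoverable) hardware error
WHEA_CORRECTED = 19  # WHEA-Logger event 19: corrected hardware error


def verdict(event_ids: Iterable[int]) -> str:
    """Rough triage of a batch of WHEA-Logger event IDs."""
    counts = Counter(event_ids)
    if not counts:
        return "no WHEA events"
    if counts[WHEA_FATAL]:  # Counter returns 0 for missing keys
        return "unstable: fatal WHEA 18 present"
    if set(counts) <= {WHEA_CORRECTED}:
        return "only WHEA 19: verify with stress tests and benchmarks"
    return "other WHEA event IDs present: investigate the sources"
```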



Artylol said:


> I mean it does not seem to show any instabilities and USB/audio dropouts, but yet to be tested for no performance loss.


This is pretty important. 
You need to check with stress tests and benchmarks.

It's also true that a lot of people run it anyway despite a performance loss.
It depends on the use case.
The performance loss is (or should be, otherwise something is really wrong) only at around 90-100% CPU load.
If the main use case is gaming, running high FCLK with synced RAM gives a very nice boost.

I don't run FCLK above 1900 with this 5950X B2; it's unstable, and even if it weren't, the performance loss is pretty big.
For my use cases it's too risky and the reward isn't enough.



Artylol said:


> Also, is it really safe to run VSOC above 1.2V because there are a lot of information that it is not.


I've been running it for almost a year and a half without any issues.
Many others have as well.
But yes, there's not much info around and no definitive or official answer.
In my experience it's still a safe voltage, but borderline.
Considering the IOD manufacturing node and its average temperature, it should be far from dangerous.
In any case, at around 1.25V it eats so much of the CPU's power budget that it becomes pointless.


----------



## zGunBLADEz

As soon as I have a chance I'll recheck.


----------



## zGunBLADEz

So I uninstalled the service, since I had used the installer...
I downloaded the portable version, replaced its files with the ones you gave me, and got the following:

Service does not exist. Going to install now


'InstallUtil' is not recognized as an internal or external command,
operable program or batch file.
Installation Completed!

Press any key to continue . . .


It doesn't install WHEAService.


----------



## ManniX-ITA

zGunBLADEz said:


> so i uninstall the service as i used the installer....
> i dl the portable which i replace it with the files you gave me and i got the following
> 
> Service does not exist. Going to install now


Ah no, you need to use the installer.
Then copy the contents (actually just the .exe is fine) into the "\Program Files (x86)\WHEAService" folder and start the service again.


----------



## zGunBLADEz

*Oh wow, what sorcery* is this??? We're getting there.

This is the new log:

Error Sources targeted to be disabled:

Type: 16 Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
Type: 0 Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
Type: 1 Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
Type: 3 Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
Type: 7 Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
Type: 4 Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error


Error Sources count is 9, current status:

ID: 0 Type: 4 State: Started  Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Started Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source

Successfully disabled WHEA error source type 16 ID=3
Failed to disable WHEA error source type 0 ID=4
Successfully disabled WHEA error source type 1 ID=5
Failed to disable WHEA error source type 3 ID=6
Failed to disable WHEA error source type 7 ID=7
Successfully disabled WHEA error source type 4 ID=0

Error Sources count is 9, current status:

ID: 0 Type: 4 State: Stopped Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 1 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 2 Type: 4 State: Started Description: WheaErrSrcTypePCIe = 0x04, PCI Express Error
ID: 3 Type: 16 State: Stopped Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source
ID: 4 Type: 0 State: Started Description: WheaErrSrcTypeMCE = 0x00, Machine Check Exception
ID: 5 Type: 1 State: Stopped Description: WheaErrSrcTypeCMC = 0x01, Corrected Machine Check
ID: 6 Type: 3 State: Started Description: WheaErrSrcTypeNMI = 0x03, Non-Maskable Interrupt
ID: 7 Type: 7 State: Started Description: WheaErrSrcTypeBOOT = 0x07, BOOT Error Source
ID: 8 Type: 16 State: Started Description: WheaErrSrcTypeDeviceDriver = 0x10, Device Driver Error Source


----------



## zGunBLADEz

Closed HWiNFO, ran a 3D application: 0 errors.

Let me reboot.


----------



## ManniX-ITA

zGunBLADEz said:


> *oh wow what sorcery* is this??? we getting there


Oh yeah 

I see a bug there: sources ID 1, 2 and 8 are not disabled.
I probably assumed there's only one source per type, while there can be several.
I'll prepare a new binary.
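The suspected cause fits the log exactly: one stop attempt per targeted type reproduces "type 16 ID=3, type 0 ID=4, type 1 ID=5, type 3 ID=6, type 7 ID=7, type 4 ID=0" and leaves IDs 1, 2 and 8 running. Illustrative sketch only: WHEAService's actual code is not shown in this thread, so both functions are hypothetical, and `disable` is a stand-in callback for whatever stops a source and reports success.

```python
from typing import Callable, List, Sequence, Set, Tuple

Source = Tuple[int, int]  # (source ID, type code), as in the log's status table


def disable_first_per_type(sources: Sequence[Source], targeted: Sequence[int],
                           disable: Callable[[int], bool]) -> List[int]:
    """Buggy pattern: only the first source of each targeted type is tried."""
    disabled = []
    for typ in targeted:
        for sid, styp in sources:
            if styp == typ:
                if disable(sid):
                    disabled.append(sid)
                break  # stops looking after the first match for this type
    return disabled


def disable_all_per_type(sources: Sequence[Source], targeted: Set[int],
                         disable: Callable[[int], bool]) -> List[int]:
    """Fixed pattern: every source whose type is targeted gets a stop attempt."""
    return [sid for sid, styp in sources if styp in targeted and disable(sid)]


# Simulate the log: IDs 4, 6 and 7 refuse the stop command.
srcs = [(0, 4), (1, 4), (2, 4), (3, 16), (4, 0), (5, 1), (6, 3), (7, 7), (8, 16)]
order = [16, 0, 1, 3, 7, 4]
stop = lambda sid: sid not in {4, 6, 7}
print(disable_first_per_type(srcs, order, stop))      # [3, 5, 0]
print(disable_all_per_type(srcs, set(order), stop))  # [0, 1, 2, 3, 5, 8]
```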


----------



## zGunBLADEz

So I got a few on reboot, BUT NO MORE while running 3D applications. If I close HWiNFO and reopen it, still no errors in 3D applications.


----------



## ManniX-ITA

zGunBLADEz said:


> so i got a few on reboot BUT NO MORE on 3d applications running if i close hwinfo and reopen it no errors on 3d applications..


Very nice!
Yes, you will get WHEAs at least until Windows starts the service and it disables the sources.
I'll work on this new version anyway; you never know, maybe doing something else they'll start popping up again...
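A simple way to confirm suppression is doing its job is to count WHEA events on each side of the service start: some before it are expected, any after it means a source is still reporting. Minimal sketch, assuming you can read both the WHEA event times and the WHEAService start time off the event logs as seconds since boot; `split_at_service_start` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Iterable, Tuple


def split_at_service_start(event_times: Iterable[float],
                           service_start: float) -> Tuple[int, int]:
    """Return (events_before, events_after) relative to the service start.

    Events at or before service_start count as "before", since the sources
    can't have been disabled yet at that moment.
    """
    times = sorted(event_times)
    cut = bisect_right(times, service_start)  # index of first event strictly after start
    return cut, len(times) - cut


print(split_at_service_start([1.0, 2.0, 3.0, 10.0, 20.0], 5.0))  # (3, 2)
```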


----------



## zGunBLADEz

ManniX-ITA said:


> Very nice!
> Yes you will get WHEAs at least till Windows is starting the service and disable the sources.
> I'll work anyway on this new version, you never know maybe doing something else they start popping again...


Yeah, I will do a driver uninstall with DDU and install a fresh driver to see if they come back, etc., as I want to start messing around with your scheduler and do some benchmarks. I also have to swap between PCIe gens, as this board goes all the way up to Gen5, even though the 3090 is Gen4, just to verify. I will put a delay on HWiNFO's startup, then.


----------



## zGunBLADEz

Yeah, it survived DDU and a fresh driver install. 0 WHEA errors other than the ones that pop up before the service starts.


----------



## zGunBLADEz

So, I'm getting stuck on:

On first boot it's fine, then it drops down to that and gets stuck there.


----------



## ManniX-ITA

zGunBLADEz said:


> at first boot it be fine then it will drop down to that and get stuck there


Wow, they really did a number on this board...
I guess the only option is to hide Restart and only use Shutdown.


----------



## zGunBLADEz

It was worth a try though lolz. I knew what I was getting into when I got the board; still a good sale.


----------

