Fixing GPU/display crashes on Windows using MCP Server and AI (#ai #mcp #windows #troubleshoot #nvidia)

By creating an MCP Server that reads from the Windows Event Log, I was able to fix NVIDIA crashes on my machine.

Overview

Some time ago, the displays of my desktop started to turn black (as if they had been switched off). This would come and go, until one day the displays turned black and the video never came back, even though I could still hear the computer's audio. So I knew procrastination was not an option anymore.

Before we get into the weeds of the MCP Server, let's understand the problem and the context.

Troubleshooting in Windows

If you're a Windows user, you might know about Event Viewer, a tool that lets you view system, application, and security event logs. These logs contain information about system events, such as errors, warnings, and informational messages. Sounds nice and easy, right? The problem is that it is absolutely overwhelming to sift through all the logs to find the relevant information, especially if you don't know exactly what you're looking for, which was my situation. I knew the symptoms, but I didn't know what was causing them.

Another problem with Event Viewer is that some logs might contain sensitive information. This was not exactly pertinent to my problem, but if I wanted to enlist an AI to help me, I needed to take it into account.
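
To make the sensitive-data concern a bit more concrete, here is a minimal sketch of scrubbing a log message before it leaves the machine. It is only an illustration in Python, not how os-doctor handles this; the function name redact and both patterns are assumptions of mine, and real logs can contain many other kinds of sensitive data.

```python
import re

# Hypothetical patterns: Windows user-profile paths and e-mail addresses.
USER_PATH = re.compile(r"([a-z]:\\Users\\)[^\\\s]+", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(message: str) -> str:
    """Replace user names in profile paths and e-mail addresses with placeholders."""
    # A function is used as the replacement so backslashes are not
    # interpreted as escape sequences in the replacement text.
    message = USER_PATH.sub(lambda m: m.group(1) + "<user>", message)
    return EMAIL.sub("<email>", message)
```

A filter like this would run on each message before it is returned to the agent, so the model sees the shape of the event without the personal details.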

Would AI really help solve this problem?

In principle, yes: current AI models are good at quickly sifting through large amounts of data and (hopefully) identifying patterns and correlations. However, to let the AI do its job, I would need to allow it to run PowerShell commands to extract information, then run more scripts to parse the data into a format it can understand.

Besides the security implications, there was another problem: the sheer volume of data in those logs would get in the way of the AI, filling up context windows pretty fast and increasing token usage to an absurd amount.
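
One way to keep the volume under control is to filter and cap the entries before the model ever sees them. The sketch below shows the idea in Python; the LogEntry type, the select_relevant function, the severity ranking, and the default limits are all assumptions for illustration, not the actual os-doctor implementation.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative ranking of the severity levels shown in Event Viewer.
SEVERITY_RANK = {"Information": 0, "Warning": 1, "Error": 2, "Critical": 3}


@dataclass
class LogEntry:
    timestamp: datetime
    severity: str
    source: str
    message: str


def select_relevant(entries, since, min_severity="Error",
                    max_entries=50, max_message_chars=300):
    """Keep recent, severe entries only, newest first, with truncated messages."""
    threshold = SEVERITY_RANK[min_severity]
    kept = [
        e for e in entries
        if e.timestamp >= since
        and SEVERITY_RANK.get(e.severity, 0) >= threshold
    ]
    kept.sort(key=lambda e: e.timestamp, reverse=True)
    # Cap both the number of entries and the size of each message,
    # since both contribute to token usage.
    return [
        LogEntry(e.timestamp, e.severity, e.source, e.message[:max_message_chars])
        for e in kept[:max_entries]
    ]
```

With limits like these, the agent receives at most a few dozen short entries per query instead of the raw log, and it can always ask for a wider time window or a lower severity if the first pass finds nothing.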

It sounds like I'm exaggerating, but if you ever had to find something in Event Viewer, you know what I'm talking about. 😅

Calling the OS-Doctor!

With all that in mind, it looked like MCP Servers were made for exactly this type of use case. I thought about simply searching for an existing MCP Server that did this, because in this day and age there are probably tons of ready-to-use ones out there. However, I was also curious to learn more about AI tools, how they work, and what makes them tick, so I built my own: os-doctor.

How did it work?

Better than I expected. I integrated it with Claude Code, and it was quickly able to go through all the data, identify the possible causes of the crash, suggest fixes, and monitor the system while I tested it. Overall, I'm pretty happy with the result, especially because I was able to fix the issue before it became a bigger problem.

How do you use it?

First, you let your AI agent know it exists. In my case, I was using Claude Code and wanted the tool to be available globally, so I added the following to the file C:\Users\<my username>\.claude.json:

{
  "mcpServers": {
    "os-doctor": {
      "command": "gsudo",
      "args": [
        "C:/path/to/published/McpOsDoctor.exe"
      ]
    }
  }
}
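
If the config file already has other settings, editing it by hand risks clobbering them. As a rough sketch, the same entry could be merged in with a few lines of Python; add_mcp_server is a hypothetical helper of mine, not part of os-doctor or Claude Code, and it assumes the file is plain JSON with the mcpServers key shown above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or update one entry under "mcpServers", keeping all other settings."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the "mcpServers" section only if it is missing.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling add_mcp_server(Path.home() / ".claude.json", "os-doctor", "gsudo", ["C:/path/to/published/McpOsDoctor.exe"]) would produce the entry above. It is worth backing up the file first, since the script rewrites it in full.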

Then you can start the agent and ask something like "What diagnostic tools do you have available?", and it will list what the os-doctor MCP server offers.

From there, you can simply ask some troubleshooting questions, and it will make use of the MCP server. For instance:

  • "Why is my computer running slow?": checks processes, system info, and event logs
  • "Show me any recent system errors": queries the event log for Error/Critical entries
  • "Is the Windows Update service running?": checks service status
  • "Has my computer crashed recently?": inspects boot history for unexpected shutdowns
  • "What's using all my memory?": lists top processes by memory consumption
  • "Monitor my CPU and GPU temperatures": starts sensor monitoring and reports thermal data

Which tools does os-doctor have?

Tool                     Description
-----------------------  -----------
get_capabilities         Reports available tools, platform, elevation status, and parameter hints
query_system_log         Search Windows Event Log entries by time, severity, source, and keywords
list_log_sources         List available event log sources
get_service_status       Query Windows services by name, pattern, or status
list_top_processes       List top processes sorted by CPU or memory usage
get_system_info          Hardware and OS snapshot (hostname, CPU, memory, disks, uptime)
get_boot_history         Boot, shutdown, crash, and sleep/wake events with timestamps
get_gpu_info             NVIDIA GPU info: model, driver, VRAM usage, temperature, utilization, power draw via nvidia-smi
get_directx_info         DirectX version, display adapters (VRAM, drivers, feature levels), and sound devices via dxdiag
start_sensor_monitoring  Start background hardware sensor polling (temperature, fan, voltage, clock, load, power)
stop_sensor_monitoring   Stop background sensor monitoring; collected data remains available via get_sensor_data
get_sensor_data          Retrieve sensor monitoring results with min/max/average/current statistics per sensor
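
To give an idea of what "min/max/average/current statistics per sensor" means in practice, here is a small Python sketch of that aggregation. The summarize function and the (sensor name, value) input shape are assumptions for illustration; the real server is written in C# and may structure its data differently.

```python
from collections import defaultdict
from statistics import fmean


def summarize(readings):
    """Aggregate (sensor_name, value) pairs, given in chronological order."""
    by_sensor = defaultdict(list)
    for name, value in readings:
        by_sensor[name].append(value)
    return {
        name: {
            "min": min(values),
            "max": max(values),
            "average": fmean(values),
            "current": values[-1],  # the most recent reading
        }
        for name, values in by_sensor.items()
    }
```

Returning four numbers per sensor instead of every sample is what keeps a long monitoring session small enough to hand to the agent.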

Note: get_gpu_info requires nvidia-smi to be installed and accessible in the system's PATH, and get_directx_info requires dxdiag to be installed and accessible in the system's PATH.
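
If you are not sure whether those two executables can be found, a quick check is easy to script. The snippet below is a minimal Python sketch using the standard library's shutil.which; missing_tools is a hypothetical helper name, and os-doctor itself may detect this differently.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Running missing_tools(["nvidia-smi", "dxdiag"]) returns an empty list when both are reachable, and otherwise names the ones to install or add to PATH.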

How to install this MCP Server?

If you want it ready to use, you can download the pre-built binary from the releases page, place it in any directory, and configure your AI agent to use it.

If you prefer building from source, you can clone the repository and go from there. It is written in C# (.NET 10).

Any plans on supporting other operating systems?

So you think this is a nice project, but you are one of the lucky ones who don't use Windows? Well, I built this MCP Server with cross-platform compatibility in mind, so it should be pretty simple to add support for Linux and macOS. That being said, I don't have concrete plans to add it at the moment.

Conclusion

Overall, this was a fun project to work on, and it worked really well. It goes to show that AI can be a powerful and useful tool if you know the problem you're trying to solve.

If you want to check the code out, you can visit the GitHub repository.

Hope it helps! :)
