The Case of the Laggy Xbox Controller on Windows 10

Posted on April 18, 2016 by Rafael Rivera in Windows 10, Xbox One

Xbox One Special Edition Covert Forces Wireless Controller

Update 4/22 – Build 14328 flighted today and fixes this issue. Recommend you upgrade as soon as possible.

If you’re a Windows Insider and a gamer, you’re probably no stranger to Xbox controller woes on Windows 10. On build 14291, simply connecting a controller could bring down the operating system. Build 14295 quickly fixed that, but then introduced a problem that made the controller impossible to use for long periods of time. Fast forward to today, build 14316, and there’s still no fix in sight. What are Windows Insiders to do during Xbox beta season? Roll back? Nah.

Fix the problem themselves, of course!

Let’s do this.

So, we know the controller works, that is, we can play Xbox games via Xbox’s streaming feature. But after a short amount of time, everything starts to slow down and audio starts crackling. Let’s fire up the Windows Performance Recorder (WPR) and trace what’s going on when things start to suck.

WPR - Profile Configuration

We’ll enable the CPU usage (not pictured) and Desktop composition activity profiles, hit Start, fire up Fallout 4, and just play for a minute or so.

As expected, things are very sluggish now. Let’s stop, allow the system to recover a bit and save our trace.

Opening the trace in Windows Performance Analyzer (WPA), we see a very interesting linear progression of CPU activity in our Computation graph set thumbnail, but let’s focus on the UI sluggishness first.

Opening the Video graph set and dragging the Dwm Frame Details graph into our analysis area yields a beautiful visualization of the frame rate over time. This confirms that the Desktop Window Manager (DWM) was happily rendering my full-screen Xbox streaming window at about 60 frames per second until something happened, causing it to drop.

Let’s bring in the CPU Usage graph for some side-by-side analysis.

WPR - Correlated Frame Rate and CPU Usage

OK, bringing in the CPU Usage graph definitely confirms something is tying up the processor and tanking our frame rate.

WPR - CPU Usage Stack

Expanding the Stack reveals we’re doing a lot of work on the controller input handler, per report (e.g. button press). Wait, we’re getting the bounds of the display every single time we do something on the controller? And acquiring a fresh device context handle via GetDC (which redirects to ZwUserGetDC) every time to do so? Uh oh, we may have found our issue.

Let’s take a peek at what’s inside DesktopInputDisplay::GetBounds with a debugger. Now, because DWM is responsible for drawing all the UI on our machine, we’ll need to attach a debugger that’s controllable externally. Otherwise, we’ll just hang ourselves up and lose control of our machine.

cdb server spun up

So that’s running. Time to attach from the laptop.

cdb attached and ready to go

And we’re in. Let’s unassemble that function now.

cdb - Function unassemble

This function is pretty small. Reading a disassembly listing and understanding assembly instructions are beyond the scope of this post, so I’ll summarize what we’re looking at with some C pseudo-code:

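HRESULT GetBounds(int *width, int *height)
{
    if (!width || !height)
        return E_INVALIDARG;

    // Acquires a device context for the whole screen on every call.
    // Note there is no matching ReleaseDC anywhere in this function.
    HDC hdc = GetDC(HWND_DESKTOP);
    *width = GetDeviceCaps(hdc, HORZRES);   // screen width in pixels
    *height = GetDeviceCaps(hdc, VERTRES);  // screen height in pixels

    // If either query came back as 0, fall back to hard-coded bounds.
    if (!*width || !*height)
    {
        *width = 1024;
        *height = 768;
    }

    return S_OK;
}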

I did a bit of research and … it’s not clear why this code is here and not part of some initialization elsewhere.

The GetDC call is basically telling Windows to reach into kernel space, wake up the display driver, and tell it to get ready for drawing operations on the desktop. The driver then jumps out of bed, looks at the desktop, sets up some stuff internally and gives us a token to refer to this work later. But in a dirtbag move, we just ask for the measurements of the desktop and leave without cleaning up.

Rude!

Anyway, we have two problems here:

  1. GetDC is expensive to call in the context of an input handler that gets called at sub-millisecond intervals. I’m going to guess that we’re clogging up a queue somewhere with our repetitive calls.
  2. We’re not cleaning up the handle GetDC returns. I don’t believe this is directly related to our problem, but it does mean that if you tough it out through the lagginess, you’ll eventually crash DWM with an out-of-memory error.
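For contrast, here is a rough sketch of what a better-behaved version might look like: it pairs GetDC with ReleaseDC and caches the result so the per-report input path never has to ask for a device context. This is only a guess at one possible fix, not Microsoft's actual code. GetBoundsCached and the g_cached_* variables are hypothetical names, and the non-Windows branch supplies tiny stand-ins for the Win32 calls (with a made-up 1920x1080 screen) purely so the sketch compiles anywhere.

```c
#ifdef _WIN32
#include <windows.h>   /* real GetDC / ReleaseDC / GetDeviceCaps (link user32, gdi32) */
#else
#include <assert.h>
/* Stand-ins, NOT the real Win32 API: just enough to build off Windows. */
typedef long HRESULT;
typedef void *HDC;
typedef void *HWND;
#define S_OK          ((HRESULT)0L)
#define E_INVALIDARG  ((HRESULT)0x80070057L)
#define HWND_DESKTOP  ((HWND)0)
#define HORZRES 8
#define VERTRES 10
static int g_open_dcs = 0;   /* outstanding handles, to make leaks visible */
static int g_dc_token;       /* dummy object whose address acts as the handle */
static HDC GetDC(HWND w) { (void)w; g_open_dcs++; return &g_dc_token; }
static int ReleaseDC(HWND w, HDC dc)
{
    (void)w; (void)dc;
    assert(g_open_dcs > 0);  /* catch a release without a matching GetDC */
    g_open_dcs--;
    return 1;
}
static int GetDeviceCaps(HDC dc, int index)
{
    (void)dc;
    return index == HORZRES ? 1920 : 1080;  /* made-up screen size */
}
#endif

/* Same logic as the pseudo-code, but the DC is released before returning. */
static HRESULT GetBounds(int *width, int *height)
{
    if (!width || !height)
        return E_INVALIDARG;

    HDC hdc = GetDC(HWND_DESKTOP);
    *width  = hdc ? GetDeviceCaps(hdc, HORZRES) : 0;
    *height = hdc ? GetDeviceCaps(hdc, VERTRES) : 0;
    if (hdc)
        ReleaseDC(HWND_DESKTOP, hdc);   /* the cleanup the original skips */

    if (!*width || !*height) {          /* keep the original fallback */
        *width = 1024;
        *height = 768;
    }
    return S_OK;
}

/* Hypothetical cache: only the first call pays for GetDC. */
static int g_cached_w, g_cached_h, g_have_bounds;

static HRESULT GetBoundsCached(int *width, int *height)
{
    if (!width || !height)
        return E_INVALIDARG;
    if (!g_have_bounds) {
        HRESULT hr = GetBounds(&g_cached_w, &g_cached_h);
        if (hr != S_OK)
            return hr;
        g_have_bounds = 1;
    }
    *width  = g_cached_w;
    *height = g_cached_h;
    return S_OK;
}
```

An input handler would call GetBoundsCached on every report instead of GetBounds. A real implementation would presumably also need to invalidate the cache when the display resolution changes, which this sketch leaves out.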

Looking at the code again, it’s clear failures aren’t considered critical. If something bad happens with either GetDC or GetDeviceCaps, we fall through to returning hard-coded values. So let’s just hack out the GetDC call and let it do that all the time.

cdb - Before Patch

This is before we patch this out.

cdb - After Patch

And here’s the after. Because the original call instruction was 6 bytes long, I used a short jump instruction (2 bytes) to jump over the remaining 4 bytes. (We could have just overwritten the whole instruction with 6 no-operation instructions but that’s more keys to punch.)
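To see why two bytes are enough, here is a toy model of the patch in C, operating on a plain byte buffer rather than live code. The helper names are hypothetical. The opcode values are standard x86 encodings (EB for a jump with an 8-bit displacement, 90 for a no-op), but the post doesn't list the original six bytes of the call, so any buffer contents you feed it are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { JMP_REL8 = 0xEB, NOP = 0x90 };

/* Overwrite the two bytes at offset `at` with "jmp +skip".
   This mirrors what the debugger's `eb <address> eb 04` does,
   except it writes into a buffer instead of process memory. */
static void patch_jump(uint8_t *code, size_t at, int8_t skip)
{
    code[at]     = JMP_REL8;
    code[at + 1] = (uint8_t)skip;
}

/* The displacement is counted from the END of the 2-byte jump,
   so execution resumes at: at + 2 + displacement. */
static size_t jump_target(const uint8_t *code, size_t at)
{
    assert(code[at] == JMP_REL8);
    return at + 2 + (size_t)(ptrdiff_t)(int8_t)code[at + 1];
}

/* The alternative: blank out the whole instruction with no-ops. */
static void patch_nops(uint8_t *code, size_t at, size_t len)
{
    memset(code + at, NOP, len);
}
```

With the jump written at the start of a 6-byte instruction and a displacement of 4, jump_target returns 0 + 2 + 4 = 6, the first byte after the original call. The four leftover bytes stay in place but are never executed.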

Let’s give it a spin.

WPR - Results after patch

And we’re in the clear; I didn’t hit any sudden stuttering and verified things look much nicer in Windows Performance Analyzer.

This Band-aid should hold until Microsoft fixes the issue in an upcoming flight. If you want to apply this fix on your machine without all the manual steps above, do the following:

  1. Ensure you’re using a 64-bit copy of Windows 10 14316.rs1_release.160402-2217
  2. Install the Debugging Tools for Windows
  3. Open the folder the tools were installed to in an elevated command prompt
  4. Carefully issue the following command: cdb -pn dwm.exe -c ".symfix;eb ism32k!DesktopInputDisplay::GetBounds+29 eb 04;.detach;"

Happy fragging! (I’m WithinRafael on Xbox.)
