
Microsoft Technical Fellow Steven (“Stevie”) Bathiche followed up his epic Build 2023 appearance with a short talk about Copilot+ PCs and NPUs at Build 2024 this past month. If you care about AI and how it is disrupting personal computing in unprecedented ways, both of these discussions are worth your time. And each stands as a slice-in-time overview of what we can expect in the coming year.
I’ve discussed Stevie’s Build 2023 appearance many times, but it’s worth summarizing. He explained that Microsoft and third-party developers would add AI capabilities to their platforms, apps, and services using three “application structures,” and that these would coexist for a time until the ways in which we interact with personal computing resources were fundamentally altered.
The first application structure, “beside applications,” is common today, with Copilot being the obvious example: This is the most straightforward way to add AI capabilities to existing (legacy) apps, and it can happen external to the app (Copilot in Windows 11) or be added within an app, as with Copilot in Microsoft Word or Excel.
The second structure is “inside applications,” where AI is embedded inside an app, resulting in simplified user interfaces but powerful capabilities. He cited Clipchamp and Microsoft Designer as examples of this type of app. But the implication here is interesting: Where apps with hundreds or thousands of commands—like those in Office—have historically required busy user interfaces, making it difficult to find what you want, “inside applications” apps are an almost magical combination of simple and powerful. My experiences with Clipchamp certainly bear this out.
“Outside applications,” the third structure, is the most futuristic. It may also be the end-game, if you will, for that coming generation of personal computing interactions, one that breaks us out of the monolithic standalone app model that we’re so used to today. In this model, AI capabilities are exposed as agents (services) controlled by an underlying orchestrator. Instead of using explicit apps, users will ask the AI to perform some task and the orchestrator will determine the optimal combination of apps, services, plugins, and whatever else is needed to accomplish that task.
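To make that less abstract, here is a minimal sketch of what such an orchestrator might look like, written in Python. Every name in it is hypothetical: this is my illustration of the concept, not anything Microsoft has shipped or described.

```python
# Hypothetical sketch of an "outside applications" orchestrator.
# None of these names are real Microsoft APIs; they're illustrative only.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    skills: set[str]              # tasks this agent claims it can perform
    run: Callable[[str], str]     # executes one task and returns a result


class Orchestrator:
    def __init__(self) -> None:
        self.agents: list[Agent] = []

    def register(self, agent: Agent) -> None:
        self.agents.append(agent)

    def handle(self, request: str, plan: list[str]) -> list[str]:
        # A real orchestrator would derive the plan from the request with an
        # LLM; here the caller supplies it to keep the sketch self-contained.
        results = []
        for task in plan:
            agent = next(a for a in self.agents if task in a.skills)
            results.append(agent.run(task))
        return results


# The user asks for an outcome, not an app:
orch = Orchestrator()
orch.register(Agent("photos", {"find-vacation-photos"}, lambda t: "12 photos"))
orch.register(Agent("mail", {"draft-email"}, lambda t: "draft ready"))
print(orch.handle("Email Mom our vacation photos",
                  plan=["find-vacation-photos", "draft-email"]))
```

The point of the sketch is the inversion: the user never launches the photos app or the mail app. The orchestrator picks them.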
I want to focus on the orchestration bit here, as I feel this is perhaps the most important concept to understand. Orchestration also plays a role—or will, I think—in the way that software of the future, be it Windows, apps, services, whatever, will interoperate with the NPU, GPU, and CPU components in modern PCs. Which, interestingly enough, was the topic of Stevie’s Build 2024 talk. But he didn’t mention the term “orchestrate” even once.
We’ll get to that. For now, consider how Stevie described orchestration a year ago.
“The AI will orchestrate across the multiple apps, plugins, and services, functioning more as an agent,” he said. “If you take a step back, the Windows shell itself is an orchestrator. In fact, [it] may be one of the most powerful orchestrators across apps, across content, across the [Microsoft] Graph. Imagine with AI and natural language, you start to see glimpses of the opportunity with [Copilot in Windows]. And it is here when you get intelligence that is functioning not just at granular details, but at the higher levels where you get a mixing of both tactics and strategy, you get both vision and execution. It’s like a Copilot of Copilots, a very powerful application structure.”
When he said those words in May 2023, we knew that Microsoft was adding Copilot to Windows, Microsoft 365, and elsewhere. We knew that Windows 11 was getting other AI-based features, typically via specific features in specific apps, and that those features in no way took advantage of any underlying AI acceleration in the PC’s NPU (typically not present), GPU, or CPU. And we had a vague understanding that NPUs were coming to more PCs, and that this change would likely result in local (on-device) and hybrid (on-device mixed with cloud-based) AI capabilities in the platform. But Stevie’s 2023 talk didn’t touch on any of that. At the time, the goal was simply to explain how the transition to the AI era would unfold. Which, I have to say, he did admirably.
But it’s a year later now. And a lot has happened in this space since then.
Sticking just to Windows, Microsoft rushed Copilot into Windows 11 before it was ready and subsequently updated this software several times. Qualcomm announced its Snapdragon X family of Arm-based and NPU-centric PC microprocessor SoCs (systems on a chip) last October and promised a mid-2024 launch. Intel introduced its AI PC specification in December 2023 when it launched its first-generation Core Ultra (“Meteor Lake”) chipsets with integrated NPUs. Microsoft and Qualcomm launched the Copilot+ PC specification at Build 2024, with the first Snapdragon X-based PCs shipping later this month. And then AMD, NVIDIA, and Intel each announced how coming PCs based on their own silicon would dramatically expand the choices in the Copilot+ PC space.
Copilot+ PC as a specification is a bit of a conceptual hurdle, and it’s only natural that we try to make sense of it by comparing it to (somewhat) similar releases of the past. I think of it this way. Windows has always come in multiple SKUs, or product editions, each with its own set of unique features (Home, Pro). Some Windows features—like Windows Hello facial recognition—require specific hardware that may or may not be present in the PC you buy, regardless of the product edition. And Microsoft has, at times, offered scenario-specific Windows product editions that it only made available with new PCs (Tablet PC, Media Center) before later rolling that functionality into mainstream Windows product editions.
Copilot+ PC basically straddles those three concepts. It’s not a unique product edition: You will see Windows 11 Home and Pro in different Copilot+ PCs. But it does require a new PC, and its unique features require the specific hardware that makes a Copilot+ PC a Copilot+ PC: You won’t (yet) get Copilot+ PC capabilities in existing PCs as a software upgrade. That will change over time, first slowly. And then, I think, these features will just become part of Windows, just as the hardware requirements for Copilot+ PC will become commonplace and available in all PCs. For now, we’re in this transition period.
The Copilot+ PC hardware requirements seem straightforward: An approved AMD, Intel, or Qualcomm SoC with an NPU that provides 40 TOPS of accelerated AI performance or more, 16 GB of DDR5/LPDDR5 RAM or more, and 256 GB of SSD/UFS storage or more. But this specification is, in fact, more nuanced than that. As I noted recently in my latest Microsoft Recall editorial, Copilot+ PCs will all provide other very specific hardware and software features, like Windows Hello Enhanced Sign-In, an advanced biometric security feature that seems to negate the complaints about Recall. Copilot+ PCs will also arrive with over 50 language models preinstalled, each custom-tailored for specific use cases and designed specifically to run on that NPU.
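Expressed as code, the published minimums amount to a simple gate check. This is my illustration only; Microsoft’s actual qualification process also involves approved SoC lists and the other requirements noted above.

```python
# Illustrative only: the published Copilot+ PC hardware minimums as a gate
# check. The real qualification process also covers approved SoCs, security
# hardware, and the preinstalled models.

COPILOT_PLUS_MINIMUMS = {
    "npu_tops": 40,      # accelerated AI performance; NPU only, GPUs don't count
    "ram_gb": 16,        # DDR5/LPDDR5
    "storage_gb": 256,   # SSD or UFS
}

def meets_copilot_plus_minimums(npu_tops: float, ram_gb: int, storage_gb: int) -> bool:
    m = COPILOT_PLUS_MINIMUMS
    return (npu_tops >= m["npu_tops"]
            and ram_gb >= m["ram_gb"]
            and storage_gb >= m["storage_gb"])

# A Snapdragon X Elite laptop (45 TOPS NPU) qualifies; a gaming rig with a
# powerful GPU but no 40+ TOPS NPU does not.
print(meets_copilot_plus_minimums(npu_tops=45, ram_gb=16, storage_gb=512))  # True
print(meets_copilot_plus_minimums(npu_tops=0, ram_gb=64, storage_gb=2048))  # False
```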
On that note, Copilot+ PC delivers several AI-based features on top of Windows, features that utilize those on-device models and the PC’s NPU. There’s the controversial Microsoft Recall, of course, but also Cocreator, Live Captions with real-time subtitles and translation, Windows Studio Effects, automatic super resolution for videos and video games, and more. To be clear, these features all run locally and entirely on the NPU. That’s interesting, but it’s also controversial: As many disgruntled gaming PC owners have griped, their powerful GPUs offer more TOPS performance than the minimum required by Copilot+ PC. It’s just that this requirement is NPU-specific: GPUs don’t qualify.
But why?
That’s where Stevie’s Build 2024 talk finally enters the picture. It’s a year later, and Microsoft and its silicon, hardware, and software partners are now promoting and supporting new AI capabilities that each hopes are compelling enough to drive a new wave of upgrades. These capabilities aren’t coming from the cloud; they’re going to run on-device. The capabilities that Microsoft provides are limited to Copilot+ PCs, which we might see as a superset of Windows 11, and thus can only be had (legally/normally) by those who buy a new PC. The capabilities provided by third-party app developers might work differently. (For example, Adobe might allow certain AI-accelerated Photoshop or Premiere Pro features to work on non-Copilot+ PCs, just not as efficiently.)
In his short Build 2024 talk, Stevie explains that AI represents a “step function change” in compute ability similar to the change from vacuum tubes to microprocessors, a change so significant that it will enable new types of devices that one day will in no way resemble the PCs we use now. The driver of this change, he says, is the NPU.
For the AI haters and deniers, this may be a conceptual leap too far. But as I noted in Ask Paul last week, this is conceptually similar to how a GPU offloads certain tasks from the CPU, freeing the CPU for other tasks while also improving battery life (if the PC is a laptop): The GPU doesn’t just perform those tasks with better performance than the CPU, it’s much more efficient too. This is a magic combination, a rare win-win.
NPUs work the same way, conceptually. Where CPUs are optimal for scalars (single numbers) and GPUs are optimal for vectors (arrays of numbers), NPUs are optimal for tensors (arrays of arrays). Each of these data types is more complex than the last. And NPUs provide the same magical win-win as GPUs in that they are both more powerful and more efficient at this work than the other processor types. In many if not all cases, a GPU or CPU could do the work of an NPU; it’s just that it would be much slower and much less efficient.
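If that sounds abstract, a few lines of NumPy make the scalar/vector/tensor distinction concrete. This is my example, not Stevie’s:

```python
import numpy as np

scalar = np.float32(3.0)                   # a single number: CPU territory
vector = np.arange(4, dtype=np.float32)    # an array of numbers: GPU territory
tensor = np.ones((2, 3, 4), np.float32)    # an array of arrays (of arrays): NPU territory

print(scalar.shape, vector.shape, tensor.shape)   # () (4,) (2, 3, 4)

# The workhorse of neural-network inference is multiplying these
# higher-dimensional arrays together, e.g. a batched matrix multiply:
weights = np.ones((4, 8), np.float32)
activations = tensor @ weights             # result shape: (2, 3, 8)
print(activations.shape)
```

An NPU is, in essence, hardware built to do that last operation, billions of times over, at very low power.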
Stevie explains it better than I do, and it’s worth watching the portion of the video of his talk that deals with this topic. He demonstrates an image generation workload running against a Copilot+ PC-capable NPU, an NVIDIA RTX 3080 GPU, and an Intel Core i7 CPU. The NPU is 32 times as fast as the CPU at this task and, while he doesn’t say this, the images he shows suggest that it’s also about 16 times faster than the GPU. But it’s not just performance: The NPU is also dramatically more efficient.
Here, the numbers are rather incredible. To achieve the same performance as a sub-4-watt NPU in this workload, you would need 20 Core i7 processors, which together would consume about 440 watts of power. For the GPU, you’re looking at 320 watts. That’s an incredible amount of energy consumption—440 watts vs. 320 watts vs. less than 4 watts—just to achieve what the NPU does without blinking an eye or impacting your battery life.
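The arithmetic behind those figures is worth spelling out, using the approximate numbers quoted above:

```python
# Back-of-the-envelope math for the efficiency comparison above.
# All figures are approximate and taken from the demo as described.

npu_watts = 4      # "sub-4-watt" NPU
cpu_watts = 440    # roughly 20 Core i7 chips at about 22 W each
gpu_watts = 320    # RTX 3080-class discrete GPU

print(f"CPUs draw ~{cpu_watts / npu_watts:.0f}x the power of the NPU")  # ~110x
print(f"GPU draws ~{gpu_watts / npu_watts:.0f}x the power of the NPU")  # ~80x
```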
Stevie refers to the AI step function change as a “spark of intelligence at the edge.” It’s what enables us to move past the now-basic AI capabilities of the past—autocomplete, language translation, and the like—into “magical” capabilities that harness the power of ever-bigger language models. And the line between the old and the new, so to speak, is that 40 TOPS NPU requirement that Microsoft specifies. These NPUs have crossed the threshold, he says, into capabilities that are sparks of intelligence.
“Now our devices can run these types of models locally on the device to do amazing things,” he said. “That’s really cool, and that allows us to do a whole bunch of neat things with interaction technology. With AI, we’re going [to move past the mouse and point-and-click paradigm]. We’re going to go from being programmatic to being piloted. We’re going to go from being exact to being fuzzy. We’re going to go from being explicit to implicit. We’re going to go from commanding to asking. And that’s going to change how people use computers.”
There is a lot to unpack there. Some of it is obvious, some less so. Based on the earlier discussion of application structures, I have to think that piloted is what we see now in “beside applications” scenarios like Copilot. That being fuzzy and implicit is tied to new interaction models in which we use optimized AI applications (“inside applications” like Clipchamp) to perform complex tasks using simpler UIs. And that commanding (mouse/keyboard) gives way to asking, which is what we do now with chatbot prompts and will do in the future in a more sophisticated way with “outside applications,” AI agents, and orchestrators.
“This amount of compute allows us to do the low latency transform, [using] the neural networks built into the operating system in a performant manner so that we can do better privacy and work offline,” he continued.
He then went into a discussion of Recall and how it was built using multiple AI features and multiple on-device models to handle such things as text encoding, image encoding, optical character recognition, natural language parsing, screen region detection, and more, all while creating a vector database on the fly. Getting past the controversies, the key to this feature is multiple models, all running constantly in the background, without impacting the device’s battery life. It is, of course, the NPU that makes this possible. And not just possible but viable in the first place.
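Here is a heavily simplified sketch of that kind of capture-and-index pipeline. Every function in it is a stand-in for an on-device model; none of this is Recall’s actual implementation.

```python
# A toy Recall-style pipeline: capture text from the screen, embed it, store
# it in a vector index, and search it semantically. All stand-ins, not
# Microsoft's code.

import numpy as np

DIM = 128
index: list[tuple[str, np.ndarray]] = []   # the "vector database"

def embed(text: str) -> np.ndarray:
    # Stand-in for a real text-encoder model: hash words into a fixed vector.
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[hash(word) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def capture(screenshot_text: str) -> None:
    # In Recall, OCR and image encoders running on the NPU would extract this
    # text from raw pixels; here we take the text directly.
    index.append((screenshot_text, embed(screenshot_text)))

def search(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: -float(item[1] @ q))
    return [text for text, _ in ranked[:k]]

capture("Flight confirmation: Seattle to Boston, June 12")
capture("Slide deck: Q3 marketing plan draft")
print(search("when is my flight"))   # best match: the flight confirmation
```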
To drive this point home, Stevie then provided what he described as a demo Microsoft had never shown before: A Copilot+ PC running all 40+ of the models it provides simultaneously, performing automated image generation, Studio Effects, Phi Silica, and other interactions, absolutely hammering the device’s NPU while Recall captured and semantically indexed it all.
But “there’s no workload on the GPU,” he notes, pointing to the on-screen Task Manager display as proof. “And what does this mean? By the end of this demo, while I’m running this entire workload, I can run a game [too]. That’s the power that sits in these Copilot+ PCs. This [is] a laptop, for crying out loud.” And not just a laptop: A Snapdragon X Elite (X1E78100)-based laptop, judging by the Task Manager display. Not even the highest-end version of the chip.
Stevie then concludes his talk by explaining that Microsoft will expose these capabilities via the Windows Copilot Runtime, which is not a runtime but rather a set of developer APIs related to new AI capabilities. Some are available now, but many more are coming soon via a set of pre-release builds of the next version of the Windows App SDK, which Microsoft tells me should be finalized in late summer. (I will explore those developer capabilities as part of my work modernizing the WPF version of .NETpad this year.)
Oddly—or, perhaps, purposefully—Stevie never mentions any version of the term orchestration during this talk. I found that confusing and surprising the first time I watched the talk, and I initially figured I must have missed it. But in rewatching the talk and reading its transcript repeatedly, I can see that my initial assessment was correct.
And I cannot stop thinking about orchestration.
While the AI capabilities Microsoft will ship in Copilot+ PCs are, of course, impressive, there is also a curious low-tech element to them. Consider Recall, which stores much of the content in its vector database as raw screenshots, relying on OCR and other capabilities to infer what it is you later search for. Surely, this is a temporary measure, a capability that exists because so few apps natively support Recall. As more apps do, the need for screenshots will lessen and maybe even disappear in time.
The lack of orchestration—or, perhaps, the laser focus on a very specific NPU type for very specific tasks—speaks to this same low-tech weirdness. Surely, a sophisticated AI orchestrator in Windows or elsewhere would examine a user’s request and then intelligently determine not just which apps, services, plugins, and other resources would best accomplish that task, but also schedule those tasks against the appropriate processor on the fly. This is important because certain tasks will always run better (faster and/or more efficiently) on certain processors, and because every PC is different, with a different mix of CPU, GPU, and NPU.
But that’s not what happens on a Copilot+ PC. Instead, there are certain tasks, unique to Copilot+ PC, that only run on the NPU. And, as with any other Windows 11 PC, certain tasks run on the GPU when available, and on the CPU otherwise. This is a simple form of orchestration, I guess. But the orchestration I envision is much more sophisticated, a sort of intelligent design, if you will, pulled from religion into computer science. Which maybe makes sense: AI often feels like magic, after all.
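Reduced to a toy, the processor-aware orchestration I’m imagining might look like the routine below. To be clear, this is my speculation, not how Windows actually schedules AI workloads today.

```python
# A toy processor-aware scheduler: route each task to the best available
# engine on this particular PC. Speculative illustration only.

PREFERRED = {
    "tensor": ["npu", "gpu", "cpu"],   # model inference: NPU first
    "vector": ["gpu", "cpu"],          # parallel math: GPU first
    "scalar": ["cpu"],                 # everything else
}

def schedule(task_kind: str, available: set[str]) -> str:
    for engine in PREFERRED[task_kind]:
        if engine in available:
            return engine
    raise RuntimeError(f"no engine available for a {task_kind} task")

# The same task lands on different silicon depending on the PC:
print(schedule("tensor", {"cpu", "gpu", "npu"}))   # npu (Copilot+ PC)
print(schedule("tensor", {"cpu", "gpu"}))          # gpu (gaming PC, no NPU)
print(schedule("tensor", {"cpu"}))                 # cpu (older laptop)
```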
We are in a transition, so it’s likely that, in these early days, the sophisticated orchestration I expect isn’t necessary or even viable. But as time goes on, and as more and more legacy PCs are replaced by Copilot+ PCs and AI PCs of whatever level of power, more sophisticated orchestration will emerge at the operating system level. I feel like this has to happen, frankly. It’s the only way we move from the mouse interactions we still use today, interactions first introduced in Douglas Engelbart’s “Mother of All Demos” in 1968, to a future of AI agents doing our bidding using new interface paradigms on new form factors. It’s how the PC of the future will no longer resemble the PC as we’ve known it for so long.
Maybe Stevie’s Build 2025 talk will focus on this orchestration. Or maybe I’m getting ahead of myself, and despite the speed at which AI has consumed our industry, we’re still years away from this coming innovation. Maybe it requires something that is not Windows, something that does to Windows what Windows did to MS-DOS.
The future is difficult to see. But I can’t stop thinking about this, wondering how and when it will unfold.