
In the 1990s, web browser upstart Netscape bragged about operating on “Internet time,” a term that rankled Microsoft cofounder Bill Gates so badly that he once sputtered that the only way to win that battle was for his company to ship new products at double that speed.
Gates was intensely competitive, but that was a different era, one in which Microsoft often took several years to release new versions of Windows and Office. These days, “Internet time” seems hopelessly quaint, and the software giant has for years shipped software updates at a far more frenetic pace.
Well, things are speeding up yet again.
Thanks largely to an unexpectedly aggressive Microsoft—which announced what we now call Copilot less than 18 months ago—we’re now living in “AI time.” In this unpredictable new era, it’s reasonable to expect to wake up to a new AI-based innovation almost every single day.
Frankly, this pace is unsustainable, and you don’t have to be an AI denier to see that. AI is just software, and it’s still subject to the same constraints as any software, albeit on a far faster delivery schedule than ever before. And AI’s “put up or shut up” moment is upon us: Between Google I/O (last week), Microsoft Build (this coming week), and Apple’s WWDC (in June), we should head into late 2024 with a much clearer idea of what we can expect over the next several months.
This much is clear. We’re on the cusp of a new phase in this AI era, one in which the expensive, cloud-based capabilities we’re still struggling to grasp will be augmented and, in some cases, even replaced by new on-device AI capabilities that require major changes to the smartphones, PCs, and web browsers we use every day.
I am referring, of course, to hybrid AI. Moving AI workloads from the cloud to our devices is the holy grail for Big Tech, and while the benefits to users have been murky to date, Google I/O provided a template I expect to be repeated at Build and WWDC. That is, it’s informative to examine the hybrid AI capabilities that Google just announced, mostly for Android, and then consider how those types of changes might improve Windows PCs, Macs, and iPhones and iPads.
Today, all the mainstream AI services we access are delivered from cloud datacenters at great cost. But implementing hybrid AI isn’t a simple client-server scenario. Instead, it requires major investments by platform makers, and its success is dependent in part on the willingness of the user base to upgrade the devices they use. And that will require some marketing: The ability to remove or blur the background of an image slightly faster isn’t going to inspire anyone to spend $1000+ on a new AI PC.
New generations of phones, tablets, and PCs will of course ship with Neural Processing Units (NPUs), specialized processors that accelerate AI workloads on-device. But less obviously, these devices will also require CPU and GPU upgrades, and they will require far more RAM than is the case today. Looking specifically at the PC, modern SoCs (systems on a chip) from AMD, Intel, and Qualcomm all provide at least the basics across these processor types, but there will be rapid improvements across the board in subsequent-generation SoCs too. These AI PCs will also require at least 16 GB of RAM, a minimum recommendation I moved to last year with the Windows 11 Field Guide. And power users will want at least 32 GB of RAM.
Working in tandem, a device’s CPU, GPU, and NPU will divvy up local AI workloads according to logic that will likely evolve rapidly and vary by both the software and the underlying hardware capabilities: Most local AI workloads today are optimized for GPUs, not NPUs, because the latter were largely nonexistent until fairly recently. But that’s going to change. As will the separation of local (on-device) and cloud-based AI workloads. As these solutions improve, many will adopt truly hybrid processing models of various types. There will be purely cloud-driven and purely local AI workloads, on-device workloads that work even when the device is offline, and workloads that split processing between the two and offload to the cloud only when needed.
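To make that split concrete, here is a minimal sketch of what such routing logic might look like. Everything in it is an illustrative assumption, not any platform’s actual API: the names, the input-size cutoff, and the fallback rules are all invented.

```typescript
// Hypothetical hybrid AI routing sketch. Every name here (Engine,
// HybridSummarizer, the 8,000-character cutoff) is an illustrative
// assumption, not a real platform API.

interface Engine {
  summarize(input: string): Promise<string>;
}

class HybridSummarizer {
  constructor(
    // Present only when the device has a capable NPU/GPU and the model is installed.
    private local: Engine | null,
    private cloud: Engine,
  ) {}

  async summarize(input: string): Promise<{ text: string; ranLocally: boolean }> {
    // Prefer on-device: lower latency, works offline, keeps data local.
    // Assume only small inputs fit the local SLM's context window.
    if (this.local && input.length < 8_000) {
      try {
        return { text: await this.local.summarize(input), ranLocally: true };
      } catch {
        // Fall through to the cloud on local failure (e.g., out of memory).
      }
    }
    // Offload to the larger cloud model when needed; offline with no
    // usable local model means the feature simply isn't available.
    if (!navigator.onLine) {
      throw new Error("No local model available and the device is offline");
    }
    return { text: await this.cloud.summarize(input), ranLocally: false };
  }
}
```

The point of the sketch is the decision order: local first when the hardware and the input allow it, with the cloud reserved as the overflow path.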
To achieve this, platform makers like Apple, Google, and Microsoft have to support it in their operating systems and implement specific solutions in the apps and services they provide in-box. Third-party developers then need to adopt these capabilities as well, differentiating not just between platforms (Android/iOS or Windows/Mac) but also based on the underlying hardware, which will vary device-to-device. It’s going to be a mess, especially in the short term.
Looking to Google, the company announced its first on-device capabilities in late 2023 when it introduced the Pixel 8 Pro. This smartphone was the first to support an on-device small language model (SLM), a stripped-down version of its cloud-based Gemini models called Gemini Nano. The Pixel 8 Pro doesn’t currently support any hybrid AI workloads. Instead, it has very specific on-device capabilities—summaries of recordings made with the Recorder app and Magic Compose in Messages at launch—plus the full suite of AI-based capabilities for which Pixels are famous; those require an Internet connection and are all cloud-based.
Google partnered with Samsung to bring Gemini Nano to the full family of Galaxy S24 smartphones (as opposed to, say, just the Galaxy S24 Ultra), which explains why they all ship with at least 8 GB of RAM (the S24+ and Ultra both have 12 GB of RAM like the Pixel 8 Pro). And at Google I/O last week, Google had several Gemini Nano announcements of interest that directly impact this discussion.
Most obviously, Gemini Nano will come to even more smartphones in late 2024, almost certainly new models with the hardware power required to handle on-device AI. But more impressively, Gemini Nano is going multimodal, yet another example of the confusing language of AI. Today, many AI models are unimodal, meaning that they understand only a single input mode, typically text. Multimodal models are, of course, more versatile and powerful in that they support multiple input (and output) modes, typically some mix of text, images, audio, and video.
Google’s announcement here was a bit vague: It referred to something called Gemini Nano with Multimodality (capitalization Google’s), which I take to be a second discrete version of Gemini Nano, one that will certainly have higher hardware requirements than the original. This suggests that it may not run on all the devices that currently support Gemini Nano, or that it will be less efficient on those devices. My guess is that it will launch alongside (and with) the Pixel 9 Pro this fall.
Last year, Google previewed two Gemini Nano-based Pixel features, TalkBack and Scam call protection, but these features won’t arrive until Android 15 and may require new Pixel phones too. They do require Gemini Nano with Multimodality, of course: TalkBack uses vision capabilities to deliver spoken feedback to users with low vision, while Scam call protection works its magic by listening in on phone calls. At Google I/O last week, Google showed off both features in more detail and explicitly promised they would come to Pixel “later this year.”
Just as exciting, Google is bringing Gemini Nano to Google Chrome on the desktop (Windows, Mac, and Linux) starting with version 126. I was surprised it didn’t provide more detail about this change during the I/O keynote, but in a pre-I/O briefing, I was told that Google was working with all the major web browser makers—not just Chromium partners like Microsoft, but also Apple and Mozilla—to ensure that the web was AI-ready as a platform. This includes technologies like WebGPU and WebAssembly (Wasm) that push web apps ever closer to native app capabilities, but also ways for web apps to use hardware-accelerated capabilities provided by the device (CPU, GPU, and NPU) and on-device SLMs. It gave Adobe Photoshop for the web as an example of the types of powerful new solutions that will start using more and more of these features.
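For web developers, the first step in all of this will be feature detection. Here’s a speculative sketch: navigator.gpu (WebGPU) and WebAssembly are real, shipping browser APIs, but the window.ai probe below is an assumption modeled on Chrome’s early experiments, and the final API surface will almost certainly differ.

```typescript
// Probing a browser for the building blocks of on-device AI.
// navigator.gpu and WebAssembly ship today; the window.ai check is a
// hypothetical stand-in for a built-in SLM API that is still in flux.

interface AiCapabilities {
  hasWebGpu: boolean;
  hasWasm: boolean;
  hasLocalModel: boolean;
}

function detectAiCapabilities(): AiCapabilities {
  // WebGPU: hardware-accelerated compute (GPU, and eventually NPU) from the browser.
  const hasWebGpu = "gpu" in navigator;

  // WebAssembly: near-native execution for ported ML runtimes.
  const hasWasm = typeof WebAssembly === "object";

  // Hypothetical built-in model probe, guarded so the app degrades gracefully.
  const ai = (window as unknown as { ai?: { createTextSession?: unknown } }).ai;
  const hasLocalModel = typeof ai?.createTextSession === "function";

  return { hasWebGpu, hasWasm, hasLocalModel };
}

// An app like Photoshop on the web could pick a code path from these flags:
// built-in SLM if present, WebGPU/Wasm inference next, a cloud API last.
console.log(detectAiCapabilities());
```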
Google also provides a bit more information on its Chrome for Developers website, noting that on-device AI can be more private and secure because it works only with local data, and that providing Gemini Nano (or, more likely, Gemini Nano with Multimodality) directly in Android and the Chrome web browser on the desktop is more efficient for developers, as they won’t have to deploy models on their own. My suspicion is that we’ll see similar solutions from Apple and Microsoft, with Apple providing its own on-device models in Macs, iPhones, and iPads, and Microsoft bringing a Copilot- or Phi-branded SLM to Windows 11. (We could learn about that latter possibility as soon as tomorrow.)
Google’s plans for deploying Gemini may provide a model for Microsoft to follow, too. (I expect Apple to simply ship its SLM with new devices.) That is, instead of just including Gemini Nano with the default Chrome download, Google plans to roll it out to users as needed, starting with those who are heavy users of Chrome’s Help Me Write feature. In its initial configuration, Help Me Write works against Gemini models in the cloud, but as customers use it more, Google will download Gemini Nano to their devices and the feature will use that instead.
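In pseudocode terms, that rollout might look something like the sketch below. To be clear, this is my guess at the shape of the logic, not Google’s implementation; the threshold and function names are invented.

```typescript
// Speculative sketch of a cloud-first feature that migrates to an
// on-device model for heavy users. The threshold and names are invented.

const HEAVY_USER_THRESHOLD = 20; // hypothetical "heavy user" cutoff
let usageCount = 0;
let localModelReady = false;

// Stand-in for the browser/OS downloading an SLM such as Gemini Nano.
async function downloadLocalModel(): Promise<void> {
  localModelReady = true;
}

async function helpMeWrite(prompt: string): Promise<string> {
  usageCount++;

  // Heavy users trigger a background model download; nobody waits on it.
  if (!localModelReady && usageCount >= HEAVY_USER_THRESHOLD) {
    void downloadLocalModel();
  }

  // Same feature, two backends: on-device once the model lands, cloud until then.
  return localModelReady
    ? `[on-device] draft for: ${prompt}`
    : `[cloud] draft for: ${prompt}`;
}
```

The appeal of this approach is that users never see a hard cutover: the feature works on day one via the cloud and quietly becomes faster and more private once the local model is in place.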
Platform support for local (and hybrid) AI capabilities and the spread of compatible hardware will help drive upgrades to new AI PCs (and other AI-capable devices). But that’s only true if customers see the value, and to date, this has been the weak link in the AI PC story. With one uninteresting exception—Windows Studio Effects—all the AI capabilities in Windows 11 today, whether in apps or the system itself, are either cloud-based or do nothing to take advantage of local hardware acceleration. Likewise, the third-party apps that take advantage of these capabilities are both few and thin, and are mostly niche solutions like audio and video editors.
That needs to change. And that could start as soon as Microsoft’s special event tomorrow. Many have promoted this event as a Surface event, but it’s much more than that: The focus is on AI, and I expect Microsoft and its hardware partners to show off a wide range of new AI-capable PCs running on the new Snapdragon X platform. I also expect to hear about new on-device and hybrid AI capabilities that will arrive in Windows 11 version 24H2 in late 2024. And, ideally, about the third-party software that will make all of these advances truly compelling.
We’ll know soon enough. See you tomorrow!