
An Apple research document promises that the firm will not use its customers’ private personal data to train its AI models. But its Applebot web crawler was trained on “the open web” without the consent of publishers, with Apple only now allowing them to opt out after the fact.
“Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks, and can adapt on the fly for their current activity,” Apple explains. “The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.”
The document goes on to explain how two of Apple’s foundational models, one on-device and one in its Private Cloud Compute datacenters, were built. And it notes some other models Apple has created, including one for coding that will appear in Xcode and a diffusion model so “users can express themselves” in apps like Messages.
Like Microsoft and Google, Apple has established a set of responsible AI development principles. But unlike these companies, Apple explicitly will not use customer data to train its AI models.
“We do not use our users’ private personal data or user interactions when training our foundation models,” Apple says. But that raises an obvious question, since AI models have to be trained on something: how did Apple train these models?
According to the research document, Apple trained its models on “licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.”
And that’s true. It’s just that Applebot has already crawled the web and gathered the data it needs without telling anyone or giving publishers the chance to opt out. They can opt out now, as explained by an Apple Support document. But it may be too late, as that data was already taken and used to train AI. You can’t unwind that clock.
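For those who want to opt out now, Apple’s support documentation describes a robots.txt-based control: a separate Applebot-Extended user agent that governs AI-training use, while the standard Applebot continues to handle search features like Siri and Spotlight. As a sketch, a publisher blocking AI training while remaining in Apple’s search results might add rules like the following (exact behavior is per Apple’s documentation, not verified here):

```text
# robots.txt — block Apple's AI-training use, keep search crawling
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
```

Note that, as with any robots.txt rule, this only stops future crawls; it does nothing about content already collected.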
In the good news department, Apple says that it removes personal information like social security numbers, credit card numbers, and the like when it’s found on the public Internet. It also filters out profanity and other “low-quality content.”
I won’t pretend to understand the technical details of Apple’s AI model training, but the company claims its on-device foundational model is superior to Google’s Gemma-2B and -7B, Microsoft’s Phi-3-mini, and Mistral-7B. And it claims its cloud-based Private Cloud Compute foundational model is superior to OpenAI’s GPT-3.5 Turbo and Mistral’s Mixtral-8x22B, and roughly on par with OpenAI’s GPT-4 Turbo.
Apple says it will share more information soon about more members of its broader family of generative AI models, including its language, diffusion, and coding models.