
An Apple research document promises that the firm will not use its customers’ private personal data to train its AI models. But its Applebot web crawler was trained on “the open web” without the consent of publishers, with Apple only now allowing them to opt out after the fact.
“Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks, and can adapt on the fly for their current activity,” Apple explains. “The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.”
The document goes on to explain how two of Apple’s foundational models, one on-device and one in its Private Cloud Compute datacenters, were built. And it notes some other models Apple has created, including one for coding that will appear in Xcode and a diffusion model so “users can express themselves” in apps like Messages.
Like Microsoft and Google, Apple has established a set of responsible AI development principles. But unlike these companies, Apple explicitly will not use customer data to train its AI models.
“We do not use our users’ private personal data or user interactions when training our foundation models,” Apple says. But that raises an obvious question, since AI models have to be trained on something: how did Apple train these models?
According to the research document, Apple trained its models on “licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.”
And that’s true. It’s just that Applebot has already crawled the web and gathered the data it needs without telling anyone or giving publishers the chance to opt out. They can opt out now, as explained by an Apple Support document. But it may be too late, as that data was already taken and used to train AI. You can’t unwind that clock.
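For those who want to opt out now, Apple’s support documentation describes a robots.txt-based control: a separate Applebot-Extended user agent that governs AI-training use, while the standard Applebot continues to handle search features like Siri and Spotlight. As a sketch, a publisher blocking AI training while remaining in Apple’s search results might add rules like the following (exact behavior is per Apple’s documentation, not verified here):

```text
# robots.txt — block Apple's AI-training use, keep search crawling
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
```

Note that, as with any robots.txt rule, this only stops future crawls; it does nothing about content already collected.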
In the good news department, Apple says that it removes personal information like social security numbers, credit card numbers, and the like when it’s found on the public Internet. It also filters out profanity and other “low-quality content.”
I won’t pretend to understand the technical details of Apple’s AI model training, but the company claims its on-device foundational model is superior to Google’s Gemma-2B and -7B, Microsoft’s Phi-3-mini, and Mistral-7B. And it claims its cloud-based Private Cloud Compute foundational model is superior to OpenAI’s GPT-3.5 Turbo and Mistral’s Mixtral-8x22B, and roughly on par with OpenAI’s GPT-4 Turbo.
Apple says it will share more information soon about more members of its broader family of generative AI models, including its language, diffusion, and coding models.