Apple Details its New Privacy-Focused Approach to AI Training


Apple yesterday detailed how it plans to improve its AI training by using data from users’ devices in a way that respects their privacy. In a post on its Machine Learning Research blog, spotted by Bloomberg, the company explained that it will compare “synthetic data” against content from participating devices, while ensuring that sampled content never leaves the device and is never shared with Apple.

“One of our principles is that Apple does not use our users’ private personal data or user interactions when training our foundation models, and, for content publicly available on the internet, we apply filters to remove personally identifiable information like social security and credit card numbers,” the company emphasized. To improve Apple Intelligence features, however, Apple needed to create new methods to “discover usage trends and aggregated insights” without compromising the privacy of its users.

To improve Apple Intelligence’s text generation capabilities, Apple is creating synthetic emails that mimic user-generated ones, and the company will compare them to real emails from devices that have opted in to Device Analytics. The analysis, however, happens entirely on participating devices, so Apple never gains knowledge of any individual user’s emails. The process is quite technical, but the following two paragraphs from Apple’s post explain how it works:

“To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics (…) We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics.

Participating devices then select a small sample of recent user emails and compute their embeddings. Each device then decides which of the synthetic embeddings is closest to these samples. Using differential privacy, Apple can then learn the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device. These most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset.”
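To make the quoted process more concrete, here is a toy sketch of the idea: each device finds the synthetic embedding nearest to its own locally computed email embeddings, then reports that choice through a simple local-differential-privacy mechanism (randomized response) so no single report can be trusted. The embeddings, parameters, and the randomized-response mechanism itself are illustrative assumptions; Apple has not published its exact implementation.

```python
import math
import random

def nearest_index(embedding, candidates):
    """Return the index of the candidate vector closest to `embedding` (Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(candidates)), key=lambda i: dist(embedding, candidates[i]))

def randomized_response(true_index, k, epsilon):
    """Report the true choice with probability p, otherwise a uniformly
    random one of the k candidates -- a basic local-DP mechanism."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    return true_index if random.random() < p else random.randrange(k)

# Hypothetical synthetic-message embeddings sent to devices (2-D for illustration).
synthetic = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

# Each participating device embeds a small sample of its own recent emails locally.
device_samples = [[[0.8, 0.2]], [[0.2, 0.8]], [[0.85, 0.15]]]

counts = [0] * len(synthetic)
for emails in device_samples:
    for emb in emails:
        true_idx = nearest_index(emb, synthetic)  # computed on device; never sent
        reported = randomized_response(true_idx, len(synthetic), epsilon=4.0)
        counts[reported] += 1  # only the noisy report leaves the device

print(counts)  # the server sees only aggregate, noisy selection frequencies
```

The key property is that Apple would learn which synthetic embeddings are selected most often across the whole fleet, while any individual device’s report is plausibly deniable.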

Apple said yesterday that it will soon start using synthetic data with users who opt in to Device Analytics to improve the email summaries provided by Apple Intelligence. According to Bloomberg’s Mark Gurman, this will happen with upcoming betas of iOS 18.5, iPadOS 18.5, and macOS 15.5.

In addition to comparing synthetic data sets with real emails to improve text generation, Apple is also using differential privacy, an established privacy-preserving technique, to improve Genmoji, its AI emoji generator, on devices that support Apple Intelligence. In practice, this works by randomly polling participating devices to identify popular prompts and prompt patterns, while ensuring that these prompts cannot be linked to any individual user.
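The polling described above can be pictured as a randomized-response survey: each device answers whether it used a given candidate prompt, but flips its answer with some probability, so no single reply reveals anything about its user, while the aggregate still exposes how popular the prompt is. This is a minimal sketch of that idea; the flip probability, prompt counts, and estimator are invented for illustration and do not reflect Apple’s actual mechanism.

```python
import random

def private_yes_no(truth, flip_prob=0.25):
    """Flip the honest answer with probability flip_prob, giving each
    device plausible deniability about any single response."""
    return truth if random.random() > flip_prob else not truth

def estimate_true_rate(noisy_yes, total, flip_prob=0.25):
    """Invert the known flip rate to estimate the real fraction of
    devices that used the prompt."""
    observed = noisy_yes / total
    return (observed - flip_prob) / (1 - 2 * flip_prob)

random.seed(0)
# Suppose 60% of 1,000 polled devices actually used a candidate Genmoji prompt.
devices = [True] * 600 + [False] * 400
noisy = sum(private_yes_no(d) for d in devices)
print(round(estimate_true_rate(noisy, len(devices)), 2))  # close to 0.6
```

Because the flip probability is public, the aggregate estimate stays accurate even though every individual answer is noisy, which is exactly the trade-off differential privacy is designed to make.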

“In upcoming releases we will also use this approach, with the same privacy protections, for Image Playground, Image Wand, Memories Creation and Writing Tools in Apple Intelligence, as well as in Visual Intelligence,” the company said yesterday.


Thurrott