Google’s New Privacy Policy Confirms AI Data Scraping

Blurry Google logo
Image credit: Mitchell Luo

Google has quietly updated its privacy policy to explain how it will use public data to help train its AI products. The new language makes clear that Google will scrape data from any public-facing website to improve its AI.

The change is buried deep in the privacy policy.

“Google uses information to improve our services and to develop new products, features, and technologies that benefit our users and the public,” the first relevant passage notes. “For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.”

That doesn’t sound like a privacy invasion to me. But a later clause, in the section on “publicly accessible sources,” has also been modified to account for AI data scraping.

“We may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities,” it reads. “Or, if your business’s information appears on a website, we may index and display it on Google services.”

I don’t want to be too alarmist about this. And to give Google some credit, it does maintain a version of the privacy policy that calls out the changes made in the most recent revision, and most of the changes in the current version, dated July 1, are unrelated to AI. But it’s reasonable to view these changes in the context of Google’s business practices: this is a company that still makes almost 80 percent of its revenue by harvesting user data and monetizing it through advertising. Unattributed data scraping is inarguably core to that business.


Thurrott