
OpenAI today responded to The New York Times’ copyright infringement lawsuit, but I feel that it bungled a chance to set the record straight, in part by admitting to the crime at the center of the paper’s allegations.
“While we disagree with the claims in The New York Times lawsuit, we view it as an opportunity to clarify our business, our intent, and how we build our technology,” a new OpenAI blog post notes. “Our position can be summed up in four points: We collaborate with news organizations and are creating new opportunities; training is fair use, but we provide an opt-out because it’s the right thing to do; ‘regurgitation’ is a rare bug that we are working to drive to zero; [and] The New York Times is not telling the full story.”
Um.
By explaining that “regurgitation”—what The New York Times called the “verbatim” recitation of its content, as demonstrated by multiple examples—is real, OpenAI has explicitly admitted to the publication’s central allegation: ChatGPT literally can and does republish copyrighted content, just as the paper charged. And while this bug may or may not be rare—here, one might say that it is OpenAI “not telling the full story”—it is apparently enough of an issue that the company is “working to drive [it] to zero,” meaning that it is working to eliminate it. But the NYT complaint is about what OpenAI does now, not what it may or may not do in the future. And right now, OpenAI just admitted that the publication is correct.
Anyway. OpenAI says that it has met with “dozens” of news organizations to “[explore] opportunities, discuss their concerns, and provide solutions.” And while it doesn’t specifically mention paying to access copyrighted content, it does point to its partnerships with just four organizations (the Associated Press, Axel Springer, the American Journalism Project, and NYU) as “a glimpse into [its] approach.” So OpenAI has agreed to license content from those four sources, an act that presumably involves a one-time fee or ongoing payments. But none of the parties has ever revealed the financial terms of those deals. Yes, I looked.
OpenAI then claims that “training AI models using publicly available internet materials is fair use,” an emphatic shift from its earlier, pre-lawsuit position that “[OpenAI] believes that the training of AI models qualifies as a fair use.” But now, it has expanded on that claim, noting that AI training is somehow “fair to creators, necessary for innovators, and critical for US competitiveness” without ever trying to explain why. It doesn’t matter: The only legal issue in all that is the fair use bit, and as I noted earlier, that legal standard is in fact not clear-cut at all.
(Fair use is like pornography in that you always know it when you see it. For example, a movie critic playing clips from a movie while reviewing it is an example of fair use; playing the entire movie, discussing it afterward, and not paying its owners is not fair use. That’s obvious. But the tricky bit is finding the line where fair use turns into theft.)
OpenAI says that it offers a simple opt-out process for publishers, an option The New York Times availed itself of in August 2023. But the firm doesn’t explain when it introduced this option, or whether the Times was aware of it before or after OpenAI had scraped its entire library of copyrighted content. Does opting out somehow remove that content from ChatGPT’s “memory,” producing a “Flowers for Algernon” effect? (It doesn’t.)
OpenAI’s use of human terms like “regurgitation,” “memory,” and “learning process” is explicitly meant to humanize AI, despite the fact that it does not “learn” in a way that resembles human learning at all. And part of the problem with this lack of differentiation is that AI lacks the moral, ethical, and critical thinking skills of even a small child, which is exactly what leads to “hallucinations” (read: software bugs) and “regurgitation” (read: plagiarism and copyright infringement). Even unsophisticated people would never parrot back facts they had just heard to those who had taught them, hoping to appear intelligent. AI lacks even that basic skill.
OpenAI’s argument that ChatGPT regurgitating (sorry, repeating copyrighted content verbatim) is “not an appropriate use” is moot. “I didn’t mean to do that illegal thing”—ignorance of the law for us humans—is not an accepted legal defense. ChatGPT is doing this thing. You just admitted to it.
As for The New York Times “not telling the full story,” sure. The lawsuit is literally their side of the story and if this case goes to court, you can both argue your versions of the story in front of a jury, and we’ll see how that goes. But the OpenAI response is likewise “not telling the full story,” as noted throughout this post. It’s rare for anyone trying to make a point to tell the full story. You’re trying to convince others that you’re right, not tell a nuanced story that might make you look less than perfect.
All this said, there are some interesting revelations in the response too.
OpenAI says that it and the NYT negotiated “constructively” until their last communication on December 19, about 10 days before the lawsuit was filed. It claims that it told the publication that its content “didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training,” which I believe to be beside the point, like a thief claiming that they had only stolen small things from one accuser because there were bigger targets out there: Stealing is stealing. And the publication apparently never contacted OpenAI before launching the lawsuit, which is perfectly legal.
But in response to the Times’ very specific examples of theft, OpenAI claims that the paper had previously refused to share those examples with it, and that the regurgitations cited in the lawsuit “appear to be from years-old articles that have proliferated on multiple third-party websites.” That is interesting, but OpenAI’s notion of “intentionally manipulated prompts” is ludicrous, akin to claiming that someone specifically drove to a home where they had reason to believe they would find their stolen goods, and then did find them. Here’s an idea: Don’t let users “manipulate” prompts.
OpenAI also addresses what must be a key underlying concern of the NYT, that ChatGPT could be used by its paying customers to read the news, a technology shift that could hasten or lead to the death of traditional news publishers. “This misuse … is not a substitute for The New York Times,” it says. Here, I agree, but this is tangential to the legal allegation because it speaks to what can happen because of this technology: It’s true that ChatGPT today is no substitute for a newspaper, but it, or things that build on it, could be. After all, no one thought that the web would destroy traditional publishing back in 1995. The NYT sees ChatGPT as the earthquake that could trigger the tsunami that overwhelms it. This is at least understandable.
Put simply, OpenAI’s response does not meet the bar of its own central argument, that the Times lawsuit is without merit. But the firm does leave open the possibility for the outcome I’ve always foreseen in this case, that the two sides will partner and move forward, if not amicably then at least peacefully.
“We are hopeful for a constructive partnership with The New York Times and respect its long history, which includes reporting the first working neural network over 60 years ago and championing First Amendment freedoms,” it writes in a curiously graceful passage. “[And] we look forward to continued collaboration with news organizations, helping elevate their ability to produce quality journalism by realizing the transformative potential of AI.”
The trick now is to make that future a reality: AI today is viewed by content creators as a technological innovation that will harm their careers and their ways of life, not help them. If OpenAI believes what it says, then a future in which AI works with these creators is the type of future I and other creators can embrace.