DeepSeek took the AI world by storm this month, but the company now faces accusations of using data without permission. OpenAI claims to have evidence that DeepSeek leveraged OpenAI’s models to train a competing AI model. If true, this would violate OpenAI’s terms of service.
OpenAI told the Financial Times about evidence of DeepSeek using “distillation” to train AI models. In this context, that term refers to a company using a preexisting model’s outputs to train a newer model. Distillation reduces the cost of model creation by building on the work already done for the “teacher model.”
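To make the idea concrete, here is a minimal sketch of distillation in the general sense described above: a small "student" model is trained only on the outputs of an existing "teacher" model. Everything in it is a hypothetical stand-in chosen for illustration (the toy `teacher` function, the linear student, the `distill` helper, and the learning rate). It is not a description of how DeepSeek or OpenAI actually train their models.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher(x):
    """Hypothetical preexisting model: fixed scores over three classes."""
    return [2.0 * x, 1.0, -1.0 * x]


def distill(inputs, steps=1000, lr=0.5):
    """Fit a linear student using only the teacher's output probabilities."""
    w = [0.0, 0.0, 0.0]  # student weights, one per class
    b = [0.0, 0.0, 0.0]  # student biases, one per class
    for _ in range(steps):
        for x in inputs:
            target = softmax(teacher(x))  # the teacher's "soft labels"
            pred = softmax([w[k] * x + b[k] for k in range(3)])
            for k in range(3):
                # Gradient of cross-entropy with respect to the student's score.
                grad = pred[k] - target[k]
                w[k] -= lr * grad * x
                b[k] -= lr * grad
    return w, b


if __name__ == "__main__":
    w, b = distill([-1.0, -0.5, 0.0, 0.5, 1.0])
    x = 0.25  # an input the student never saw during training
    print("teacher:", softmax(teacher(x)))
    print("student:", softmax([w[k] * x + b[k] for k in range(3)]))
```

The student never sees how the teacher works internally. It only sees the teacher's answers, which is why distillation can be done through an API and why it is cheaper than training from scratch.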
DeepSeek’s R1 model shook the AI industry and sent stocks plummeting because the company claimed to have developed it at a fraction of the cost of competing models. If it’s proven that DeepSeek used distillation to build on OpenAI’s models, the claims of lower costs would hold less weight.
Distillation is not always a bad thing. In fact, it is used frequently in the AI industry. The issue here is that DeepSeek has been accused of using distillation in a way that violates OpenAI’s terms of service. The OpenAI API cannot be used to “copy” OpenAI’s service. Users are also prohibited from “[using] output to develop models that compete with OpenAI.”
The Financial Times spoke with a person close to OpenAI and reported context regarding OpenAI’s terms of use.
As first covered by Bloomberg, both OpenAI and Microsoft have investigated the situation and found accounts that are believed to belong to DeepSeek. Access for those accounts was blocked last year because they were suspected of being used for distillation.
OpenAI shared the following statement with our colleagues at TechRadar:
“We know PRC based companies – and others – are constantly trying to distill the models of leading US AI companies. As the leading builder of AI, we engage in countermeasures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe as we go forward that it is critically important that we are working closely with the U.S. government to best protect the most capable models from efforts by adversaries and competitors to take US technology.”
David Sacks, the White House AI and crypto czar, recently discussed DeepSeek and OpenAI. “There’s substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI models, and I don’t think OpenAI is very happy about this,” said Sacks.
OpenAI’s ironic accusation
The irony in all of this is that OpenAI has itself been accused several times of using data to train AI models without permission. The New York Times sued OpenAI in December 2023, arguing that using its copyrighted articles to train generative models does not fall under fair use. Lawsuits from The Intercept, Raw Story, and AlterNet followed in February 2024.
One of the biggest criticisms of OpenAI is that it allegedly trained its AI models using data without proper authorization.
OpenAI allegedly using data without permission to train AI models does not excuse DeepSeek’s behavior. OpenAI has clear terms that prohibit using its models to create technology that competes with OpenAI. That being said, the accusations from OpenAI are quite ironic.