OpenAI suspects DeepSeek of using its AI models—was China’s AI revolution built on theft?
A probe into DeepSeek’s AI training methods could undermine its status as a rising AI powerhouse.
Is the surprising success of China's DeepSeek—the company that claims to have trained an advanced artificial intelligence (AI) model at a fraction of the cost and resources required by Western firms—the result of intellectual property theft? That is the question now being raised by OpenAI and the White House. If their suspicions prove correct, it would cast doubt on the breakthrough that has positioned DeepSeek as a formidable player in the global AI race.
This week, DeepSeek sent shockwaves through the AI and semiconductor industries with the release of R1, an advanced generative AI (GenAI) model. R1 is said to rival OpenAI’s o1 model in capabilities, yet was reportedly trained at a fraction of the usual cost—just $6 million in computing power. In comparison, developing state-of-the-art models often requires investments of tens of millions, if not billions, of dollars.
However, OpenAI now suggests that DeepSeek’s success may have been built on the foundation of OpenAI’s own extensive research and development efforts. The company claims to have found evidence that DeepSeek used OpenAI’s models to train R1, a practice that could constitute intellectual property infringement.
Speaking to the Financial Times, OpenAI stated that DeepSeek employed a training method called "distillation," in which smaller AI models are improved using outputs from larger, more capable models. While this technique is common, companies typically use their own models for the process. OpenAI, however, suspects that DeepSeek “distilled” its own models from OpenAI’s proprietary ones, in violation of OpenAI’s terms of service.
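To make the technique concrete, the sketch below shows knowledge distillation in its textbook form: a small "student" network is trained to match the temperature-softened output distribution of a frozen, larger "teacher." The models, data, and hyperparameters are toy placeholders invented for this example; they do not reflect OpenAI's or DeepSeek's actual systems.

```python
# Minimal sketch of classic knowledge distillation: a small student model learns
# to match the softened output distribution of a larger, frozen teacher.
# Everything here is a toy placeholder, not any company's real setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB, SEQ_LEN, TEMP = 1000, 8, 2.0

# Stand-in models: the teacher simply has more parameters than the student.
teacher = nn.Sequential(nn.Embedding(VOCAB, 512), nn.Flatten(1), nn.Linear(512 * SEQ_LEN, VOCAB))
student = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Flatten(1), nn.Linear(64 * SEQ_LEN, VOCAB))
teacher.eval()  # the teacher is frozen; only its outputs are used

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    # Random token sequences stand in for real training prompts.
    batch = torch.randint(0, VOCAB, (32, SEQ_LEN))

    with torch.no_grad():
        teacher_logits = teacher(batch)  # "soft labels" from the larger model

    student_logits = student(batch)

    # Distillation loss: KL divergence between temperature-softened distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / TEMP, dim=-1),
        F.softmax(teacher_logits / TEMP, dim=-1),
        reduction="batchmean",
    ) * (TEMP * TEMP)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The appeal of the method is that the student learns from the teacher's full probability distribution rather than from hard labels, which packs far more signal into each training example and is part of why distillation can be so compute-efficient.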
“The problem is when someone takes our technology and uses it to build their own product,” a source close to OpenAI told the Financial Times on Wednesday. The company further stated, “We know that groups in [China] are actively working to use methods, including what’s known as distillation, to try to replicate advanced U.S. AI models.
“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more. We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”
According to Bloomberg, both Microsoft and OpenAI are investigating whether a group linked to DeepSeek obtained data from OpenAI’s models without authorization. Sources familiar with the probe say that as early as last fall, Microsoft researchers identified individuals believed to be associated with DeepSeek extracting large volumes of data through OpenAI’s developer tools. If confirmed, this activity could violate OpenAI’s terms of service and indicate an attempt to bypass usage limits designed to prevent excessive data extraction.
The suspicions have gained traction in Washington. "There's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI's models," said White House "AI and crypto czar" David Sacks. "I think one of the things you're going to see over the next few months is our leading AI companies taking steps to try and prevent distillation... That would definitely slow down some of these copycat models."
So far, neither Sacks, OpenAI, nor Microsoft has publicly disclosed concrete proof of these allegations. However, industry insiders acknowledge that AI labs in both China and the United States frequently use model outputs from companies like OpenAI to improve their own systems. While this is a common practice, the extent to which DeepSeek relied on OpenAI’s work remains unclear.
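In practice, that kind of reuse of another lab's outputs often looks less like the logit-matching sketch above and more like ordinary fine-tuning on text the stronger model has generated, sometimes called sequence-level distillation. The sketch below shows the general shape of such a workflow; the `teacher_generated_pairs` dataset and the tiny stand-in model are hypothetical, and nothing here describes any company's actual pipeline.

```python
# Sketch of "training on a stronger model's outputs": a small student model is
# fine-tuned with plain next-token prediction on (prompt, response) pairs that a
# stronger model produced. The dataset and model below are hypothetical toys.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical distillation dataset: prompts paired with a stronger model's answers.
teacher_generated_pairs = [
    ("What is 2 + 2?", "2 + 2 equals 4."),
    ("Name a prime number.", "7 is a prime number."),
]

def encode(text: str) -> torch.Tensor:
    # Toy byte-level "tokenizer" so the example stays self-contained.
    return torch.tensor(list(text.encode("utf-8")), dtype=torch.long)

class TinyLM(nn.Module):
    # A deliberately tiny causal language model standing in for the student.
    def __init__(self, vocab: int = 256, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(ids))
        return self.head(hidden)

student = TinyLM()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for epoch in range(5):
    for prompt, response in teacher_generated_pairs:
        ids = encode(prompt + " " + response).unsqueeze(0)  # shape (1, seq_len)
        logits = student(ids[:, :-1])                       # predict each next byte
        loss = F.cross_entropy(logits.reshape(-1, 256), ids[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because the student only ever sees the teacher's finished text, the cost of the expensive model is paid once, at data-generation time, which is precisely why the practice appeals to smaller labs and why leading companies may want to restrict it.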
If the accusations are substantiated, DeepSeek’s achievement could be significantly undermined. It is difficult to quantify how much DeepSeek benefited from leveraging OpenAI’s models, but if its cost savings came largely from that shortcut, then its so-called revolution in AI development would be far less groundbreaking than it appears. Moreover, if training next-generation AI models cheaply is only possible by building on pre-existing frontier models, then companies like OpenAI will still need to spend billions on computing resources to stay ahead.
This, in turn, presents a broader challenge for the AI industry: how to prevent competitors, especially Chinese firms with a history of disregarding Western intellectual property laws, from exploiting leading labs' technological advances to develop rival models at a fraction of the cost.