Why OpenAI and Microsoft’s IP Theft Claims Against DeepSeek Don’t Hold Up
Note: I am not an attorney and have no expertise in law or IP law.
OpenAI and Microsoft have gone on the offensive against DeepSeek, alleging potential intellectual property (IP) theft. David Sacks, who serves as an advisor on AI and cryptocurrency matters, suggested the possibility of IP theft through a technique known as distillation: "There's a technique in AI called distillation... when one model learns from another model [and] kind of sucks (emphasis mine) the knowledge out of the parent model... And there's substantial evidence that what DeepSeek did here is they distilled the knowledge out of OpenAI models, and I don't think OpenAI is very happy about this."1
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedd1337c-bf3a-4778-9225-d260f6ea768d_1648x2248.jpeg)
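To make that term concrete: below is a minimal sketch of knowledge distillation in PyTorch, using toy stand-in models (two small linear layers) rather than anything resembling OpenAI's or DeepSeek's actual systems. The point is the shape of the technique: a "student" model is trained to match a "teacher" model's outputs, not to copy its weights or code.

```python
# Minimal sketch of knowledge distillation with toy stand-in models.
# Illustrative only; this is not OpenAI's or DeepSeek's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 1000)   # stands in for the trained "parent" model (frozen)
student = nn.Linear(128, 1000)   # stands in for the new model being trained

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
temperature = 2.0  # softens the teacher's output distribution

def distillation_step(inputs):
    """One training step: the student learns to match the teacher's *outputs*.
    It never reads the teacher's weights or architecture directly."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(inputs) / temperature, dim=-1)

    student_log_probs = F.log_softmax(student(inputs) / temperature, dim=-1)

    # KL divergence between the two output distributions.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(f"distillation loss: {distillation_step(torch.randn(32, 128)):.4f}")
```

Note that this textbook form of distillation assumes access to the teacher's full output distribution over every token; a hosted API that returns generated text (and, at most, limited token probabilities) typically exposes far less than that.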
This claim warrants careful scrutiny and skepticism, particularly given Sacks' complex web of interests. While he advises on AI and crypto policy, his primary role is as a venture capitalist with substantial investments in Silicon Valley, including AI and cryptocurrency companies. This dual position—as both a policy advisor and major investor in the technologies he oversees—raises significant questions about potential conflicts of interest and the objectivity of these allegations against DeepSeek.
Ok. Let’s break down the claim of potential intellectual property theft. There are two ways to use OpenAI’s models. The first is direct use, which is what most of us are familiar with. The second is through an API, which stands for Application Programming Interface. Think of an API like an electrical outlet: it provides a standard way to access power, but you can't see the complex electrical grid or how electricity is generated behind the wall.
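To make the electrical-outlet picture concrete, here is a minimal sketch of what API access looks like in Python, using the requests library against OpenAI's public chat-completions endpoint. The API key and model name are placeholders.

```python
# Minimal sketch of API-only access to a hosted LLM. The request shape
# follows OpenAI's public chat-completions API; key and model are placeholders.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [{"role": "user", "content": "Explain photosynthesis briefly."}],
    },
    timeout=30,
)

data = response.json()
# Everything the caller ever sees: generated text plus light metadata
# (token counts, finish reason). No weights, no architecture, no training data.
print(data["choices"][0]["message"]["content"])
```

The request goes in, text comes back; nothing about how that text was produced crosses the wire.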
But does using an API to interact with an AI model—without accessing its internal workings—amount to stealing intellectual property? That’s the core question. OpenAI and Microsoft’s argument appears to rest on the idea that by repeatedly querying an API and analyzing its responses, DeepSeek has somehow extracted or “sucked out” OpenAI’s proprietary knowledge in an unlawful way.
To assess whether this claim holds water, let’s consider an analogy. Imagine you’re trying to understand how a car works. You can drive it, observe its speed, listen to the engine, and test how it responds to different road conditions. Based on those observations, you might be able to infer some aspects of how the car is designed. But does that mean you’ve stolen the car’s intellectual property? Of course not. You haven’t taken the engine, the blueprints, or the proprietary software running the vehicle—you’ve simply learned from using it.
Now, let’s break this down step by step to see whether DeepSeek’s alleged use of OpenAI APIs is any different.
1. APIs Don’t Provide Access to the LLM’s Internals
When you interact with an API, you only see outputs—you’re inferring how the model works, not accessing or copying its internal architecture or weights.
This is very different from stealing proprietary code or directly copying training data.
👉 Car Analogy: If you drive a car, observe how it handles, and then build your own based on what you learned, you haven’t stolen the car manufacturer’s IP.
👉 LLM Equivalent: If you interact with an LLM, infer how it structures knowledge, and then build a different model based on those insights, you haven’t copied the original LLM. (The sketch after this list contrasts what an API hands back with what holding actual model weights looks like.)
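For contrast, here is a hedged sketch of what actually having a model's internals looks like, using the Hugging Face transformers library with a small openly published model (GPT-2, chosen purely as an example of open weights). This is the kind of access an API never provides.

```python
# What having a model's *internals* looks like: loading openly published
# weights with Hugging Face transformers. GPT-2 is used only as an example
# of an open-weights model; OpenAI's frontier model weights are not
# downloadable at all, which is precisely the point.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small open model

# Every parameter tensor is directly inspectable and copyable.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters across {len(model.state_dict())} weight tensors")

# An API caller gets none of this; only generated text comes back.
```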
2. Learning from Use ≠ Stealing Intellectual Property
If interacting with a system and learning from its behavior constituted IP theft, then every engineer who reverse-engineered a product based on observation would be guilty of theft.
Courts usually distinguish between learning from interaction and copying internal components (e.g., training data, code, or model weights).
👉 Car Analogy: If you drive a Tesla, take notes on how it accelerates, and then build your own electric car with a different motor and software, Tesla can’t claim you stole their IP.
👉 LLM Equivalent: If you query an LLM extensively, analyze its responses, and then train a different LLM with a different dataset and architecture, that’s not the same as copying the original model. (A sketch of that workflow follows this list.)
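To ground what "query an LLM extensively and then train a different model" actually involves, here is a hedged sketch of that workflow: prompts go to an API, the returned text is saved as a supervised fine-tuning dataset, and a separate model with its own architecture and weights would later be trained on that file. The collect_response helper and the file name are illustrative placeholders, not any particular company's pipeline.

```python
# Hedged sketch of "learning from outputs": collect prompt/response pairs
# from an API and save them as a fine-tuning dataset for a *different* model.
# Function and file names are illustrative placeholders.
import json

def collect_response(prompt: str) -> str:
    """Stand-in for an API call like the one sketched earlier; a real
    version would return the service's generated text."""
    return f"[model response to: {prompt}]"

prompts = [
    "Summarize the theory of relativity in two sentences.",
    "Write a haiku about distributed systems.",
]

with open("finetune_data.jsonl", "w") as f:
    for prompt in prompts:
        completion = collect_response(prompt)
        # Each record is just text in, text out: the same material any
        # user of the service could have typed and read themselves.
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

# A separate model, with its own architecture and weights, would then be
# fine-tuned on finetune_data.jsonl without ever touching the original
# model's parameters.
```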
3. Using an API to Enhance a Product vs. Rebuilding the Original
If you use a car’s API (say, Tesla’s Autopilot API) to build an add-on for the car (e.g., a better dashboard UI), you are working within the manufacturer’s ecosystem.
If you then build your own self-driving car based on what you learned from using the API, Tesla might not like it, but it’s not clear you stole anything.
👉 Car Analogy: If you integrate Tesla’s API into a car accessory and later build an entirely different self-driving system based on the experience, Tesla may claim it’s derived, but it’s not a clear case of theft.
👉 LLM Equivalent: If you use OpenAI’s API to enhance an application and later build your own model based on that experience, OpenAI might assert that you copied them, but proving this would be difficult.
4. The LLM Provider’s Argument is a Stretch
The strongest argument the LLM provider has is that you violated its API Terms of Service (e.g., using it to train another model).
But IP law is different—a company can’t just claim, “You used our tool and got smarter, therefore you stole our IP.”
If learning from outputs constituted theft, then using Google Search to research and write a book would be IP theft against Google.
👉 Car Analogy: If Ford says, “You drove our car and learned too much about suspension systems, so you stole our IP,” that wouldn’t hold up in court.
👉 LLM Equivalent: If OpenAI says, “You used our API, learned from it, and now you’re training your own LLM,” it’s a weak legal argument.
In conclusion, using an LLM API to learn and build something new is more like driving a car and designing a better one—it’s not theft. However, the LLM provider might still try to shut it down on contractual or business grounds.
At its core, OpenAI and Microsoft’s claim against DeepSeek hinges on a flawed premise: that learning from a system’s outputs is the same as stealing its internals. But by that logic, every innovator who studies a product and builds something better would be guilty of intellectual property theft. That’s not how technological progress—or IP law—works. While OpenAI may have contractual levers to restrict how its API is used, it’s a huge stretch to argue that someone who learns from an API’s responses, without accessing its underlying code or training data, has “stolen” anything. If anything, OpenAI’s position seems less about protecting intellectual property and more about maintaining competitive control. The real question is not whether DeepSeek has committed IP theft, but whether OpenAI and Microsoft are trying to weaponize intellectual property claims to stifle competition.
1. Financial Times. “OpenAI says it has evidence China’s DeepSeek used its model to train competitor.” Jan 29, 2025. https://www.ft.com/content/a0dfedd1-5255-4fa9-8ccc-1fe01de87ea6