
Privacy Policy for an LLM App

Apps built on large language models route user input to a third party model provider, often store embeddings of personal content, and may use the data for product improvement. Your privacy policy must explain all of this.

Written by Anupam Kumar · 9 min read · Reviewed for compliance

An LLM app privacy policy must disclose every data flow tied to the model, including prompts, embeddings, vector storage, model provider processing, and any training or evaluation use of user content. Under GDPR, CCPA, and the EU AI Act provisions taking effect in 2026, LLM app operators must identify themselves as the data controller, name the model provider as a processor, describe retention periods specifically, and provide a workable path for users to delete their data.

What an LLM App Actually Does With User Data

An LLM app typically takes a user prompt, optionally enriches it with context from a vector database or knowledge base, sends the combined prompt to a model provider API, and returns the response to the user. Each of those steps touches data that may be personal.
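The flow above can be sketched in a few lines. This is an illustrative skeleton, not any real SDK's API: `retrieve` stands in for a vector database lookup and `call_provider` for the HTTPS call to the model provider, and the comments mark the two points where personal data leaves your control.

```python
# Minimal sketch of the request flow described above: user prompt,
# optional retrieved context, then a call to the model provider.
# All names here are illustrative, not any real SDK's API.

from typing import Callable, List

def build_prompt(user_prompt: str, context_chunks: List[str]) -> str:
    """Combine retrieved context with the user's prompt.
    Both parts may contain personal data, and both reach the provider."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nUser:\n{user_prompt}"

def handle_request(
    user_prompt: str,
    retrieve: Callable[[str], List[str]],   # e.g. a vector DB lookup
    call_provider: Callable[[str], str],    # e.g. an HTTPS call to the model API
) -> str:
    chunks = retrieve(user_prompt)          # personal data read from your store
    prompt = build_prompt(user_prompt, chunks)
    return call_provider(prompt)            # personal data leaves your infrastructure
```

Every step in this function corresponds to a disclosure your privacy policy must make: what is retrieved, what is sent, and to whom.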

If your app stores user uploaded documents or notes and creates vector embeddings for retrieval, those embeddings are derived from the original content. Under GDPR, embeddings of personal content are considered personal data even though they look like numbers.

If your app maintains conversation history, that history is a long term store of user prompts and model responses. It is fully personal data and must be handled with appropriate security and retention controls.

Who Is the Data Controller and Who Is the Processor

You, the operator of the LLM app, are the data controller. You decide what data is collected, why, and what is done with it. You are responsible to users and regulators.

The model provider (OpenAI, Anthropic, Google, or the cloud vendor hosting an open source model on a GPU for you) is a data processor. They process data on your instructions. Your contract with them sets the boundaries.

If you use a vector database hosted by a third party, that vendor is also a processor. The same is true for your hosting provider, your authentication provider, your payment processor, and any analytics tool you embed. The privacy policy must list each one.

Training and Improvement Disclosure

If you do not use user data to train any model, say so explicitly. This is often the strongest privacy claim a small LLM app can make and users value it.

If you do use user data, describe: what data is used, what kind of training or fine tuning, who has access to the raw data, how a user can opt out, and what happens to data already used in training when a user later requests deletion.
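The opt-out your policy describes has to be enforced in code before any training run. Below is a sketch of that gate under an assumed schema: `training_consent` and `training_opt_out` are hypothetical field names, not a real standard, and the point is only that records are filtered by the owner's current preference before they ever reach a training pipeline.

```python
# Illustrative opt-out gate: a record is eligible for training only if
# its owner recorded consent AND has not since opted out.
# Field names are assumptions, not a real schema.

def eligible_for_training(user: dict) -> bool:
    """True only when consent exists and no opt-out is on file."""
    return bool(user.get("training_consent")) and not user.get("training_opt_out", False)

def collect_training_records(records: list[dict], users: dict[str, dict]) -> list[dict]:
    """Filter stored records down to those whose owners permit training use.
    Unknown users default to excluded, never included."""
    return [r for r in records if eligible_for_training(users.get(r["user_id"], {}))]
```

Defaulting unknown or incomplete user records to excluded is the safer design: consent must be affirmatively present, not merely un-revoked.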

Be honest about the limits of deletion. Trained model weights cannot, in practice, be selectively unlearned. If a user's data has been used to fine tune a model, that influence is baked into the weights. You can stop using the user's data going forward, but the historical training cannot be reversed. Privacy regulators are increasingly focused on this gap, and your policy should address it directly.

Vector Stores and Personal Embeddings

If your app uses retrieval augmented generation, you almost certainly have a vector database holding embeddings of user content. List the vendor (Pinecone, Weaviate, Qdrant, pgvector, or similar) and where it is hosted.

Describe what gets embedded, how long the embeddings are retained, and what happens when a user deletes the source content. Best practice is to delete the embeddings at the same time as the source.
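That "delete embeddings with the source" practice is easiest to guarantee when one delete operation removes both. The sketch below uses in-memory dicts standing in for a real document store and vector index; with a hosted vector database you would issue the vendor's delete call in the same code path, ideally keyed by the source document's ID.

```python
# Sketch of the "delete embeddings with the source" practice: one delete
# call removes the document and every vector derived from it.
# In-memory dicts stand in for a real database and vector index.

class DocumentStore:
    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        self.embeddings: dict[str, list[list[float]]] = {}  # doc_id -> vectors

    def add(self, doc_id: str, text: str, vectors: list[list[float]]) -> None:
        self.documents[doc_id] = text
        self.embeddings[doc_id] = vectors

    def delete(self, doc_id: str) -> None:
        """Remove the source document and its embeddings in one step."""
        self.documents.pop(doc_id, None)
        self.embeddings.pop(doc_id, None)  # never leave orphaned derived vectors
```

Keying vectors by source document ID is what makes the cascade possible; embeddings stored without a link back to their source are very hard to delete on request.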

Embeddings are not anonymisation. Research published in 2023 and 2024 showed that embeddings of personal text can be inverted to recover meaningful portions of the original. Treat embeddings as personal data for GDPR purposes.

EU AI Act and Transparency Obligations

The EU AI Act introduces transparency requirements that overlap with privacy disclosure. Users interacting with an AI system must be informed of that fact, and certain high risk uses require additional documentation.

Most general purpose LLM apps fall into the limited or minimal risk category, which means transparency obligations apply but not the heavy compliance burden of high risk AI. Your privacy policy and a clear in product disclosure together usually satisfy the transparency requirement.

If your LLM app is used in employment screening, education, credit scoring, or law enforcement, you may be in a higher risk category and additional documentation is required outside the privacy policy.

Frequently Asked Questions

Do I need a separate privacy policy for my LLM app or can I reuse my main website policy?

If the LLM app is a separate product or has materially different data flows from your main website, write a separate policy or a clearly marked section. Combining everything into one policy is acceptable only if every flow is described accurately.

How do I describe a model provider that says they do not train on my data?

State the fact and link to the provider's documentation. For example: "Prompts and responses are sent to OpenAI under its enterprise API terms, which exclude this data from model training." This is a strong claim, and users appreciate a link they can use to verify it.

What if I cannot afford a lawyer to review my LLM app privacy policy?

Use a generator that produces a structurally sound policy and customise the AI specific sections yourself. For high risk uses (health, finance, legal, employment), invest in a lawyer review. For consumer or productivity tools, a well written generated policy plus careful customisation is enough to start.

Does a self hosted open source LLM still need a privacy policy?

Yes. The legal requirement comes from collecting personal data, not from using a third party API. If users send prompts that include personal information, you have a privacy policy obligation regardless of where the model runs.

Generate an LLM app privacy policy

Vector store, model provider, training opt out, and EU AI Act disclosure, all covered.
