Most LLM products were not designed with GDPR in mind, and most product teams treat compliance as a checkbox added at the end. Both habits produce the same result: a working product, an unhappy DPO, and an architecture rewrite when a customer or regulator probes the data flow. This post is a practical walkthrough of the GDPR articles that actually bite when you build with LLMs, and the patterns that hold up under scrutiny.
The five articles that matter most
Article 5 — principles
GDPR Article 5 sets the foundational principles. Three of them apply directly to LLM products:
- Article 5(1)(b): purpose limitation. Personal data collected for one purpose must not be further processed for an incompatible purpose (Article 6(4) sets out the compatibility test); an incompatible new purpose needs its own lawful basis. If you indexed a customer-support transcript to answer support questions, you cannot then feed it to a marketing-analytics LLM without establishing a lawful basis for that new purpose, which in practice usually means fresh consent.
- Article 5(1)(c): data minimisation. Process only what is necessary for the purpose. LLM products often violate this by sending entire user histories or full document corpora to the model when only a relevant slice is needed.
- Article 5(1)(d): accuracy. Personal data must be accurate and, where necessary, kept up to date. LLM outputs about real people can be hallucinated, and inaccurate output is a documented enforcement target: see the Italian and Polish supervisory authorities' actions concerning ChatGPT in 2023-2024.
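One way to make purpose limitation enforceable in code is to tag each record with the purpose it was collected for and filter before anything reaches a prompt. The sketch below assumes hypothetical purpose labels and a hard-coded compatibility table; in reality compatibility is a documented legal judgment under Article 6(4), and the table only records its outcome.

```python
from dataclasses import dataclass

# Hypothetical outcome of a documented compatibility assessment (Article 6(4)):
# collected-for purpose -> purposes judged compatible with it.
COMPATIBLE_PURPOSES = {
    "support": {"support", "support_quality_review"},
    "marketing_analytics": {"marketing_analytics"},
}

@dataclass(frozen=True)
class Record:
    subject_id: str
    text: str
    collected_for: str  # purpose stated when the data was collected

def records_for_purpose(records, purpose, extra_basis=frozenset()):
    """Return only the records that may be used for `purpose`.

    A record passes if `purpose` is compatible with the purpose it was
    collected for, or if its subject is in `extra_basis`, i.e. a separate
    lawful basis (such as fresh consent) has been recorded for the new purpose.
    """
    allowed = []
    for r in records:
        compatible = purpose in COMPATIBLE_PURPOSES.get(r.collected_for, set())
        if compatible or r.subject_id in extra_basis:
            allowed.append(r)
    return allowed
```

Filtering at this layer means an incompatible reuse fails quietly by returning nothing, rather than depending on every prompt author remembering the rule.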
Article 6 — lawful basis
Every processing operation needs a lawful basis. The six bases (Article 6(1)(a)-(f)) are: consent, contract, legal obligation, vital interests, public task, legitimate interests. For LLM-driven personal-data processing, the realistic options are usually consent, contract necessity, or legitimate interests. Each carries different evidence requirements:
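- Consent must be freely given, specific, informed, and unambiguous, and it must be as easy to withdraw as to give. A pre-ticked box does not qualify, and bundled consents (one tick covering several unrelated purposes) are usually invalid.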
- Contract necessity requires that the processing is genuinely necessary to perform the contract — not just convenient.
- Legitimate interests requires a documented Legitimate Interests Assessment (LIA) showing that the controller's interest is not overridden by the data subject's rights and reasonable expectations.
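A minimal sketch of checking that a claimed basis carries the evidence listed above. The class and field names are hypothetical, and an empty result only means no obvious gap was found, not that the basis is legally valid.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Basis(Enum):
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTERESTS = "6(1)(f)"

@dataclass
class BasisClaim:
    basis: Basis
    consent_record_id: Optional[str] = None  # consent: link to the stored consent
    consent_withdrawn: bool = False
    lia_id: Optional[str] = None             # legitimate interests: link to the LIA
    necessity_note: Optional[str] = None     # contract: why processing is necessary

def evidence_gaps(claim: BasisClaim) -> list[str]:
    """List missing evidence for the claimed basis (empty list = no obvious gap)."""
    gaps = []
    if claim.basis is Basis.CONSENT:
        if not claim.consent_record_id:
            gaps.append("no consent record")
        if claim.consent_withdrawn:
            gaps.append("consent withdrawn")
    elif claim.basis is Basis.LEGITIMATE_INTERESTS and not claim.lia_id:
        gaps.append("no documented LIA")
    elif claim.basis is Basis.CONTRACT and not claim.necessity_note:
        gaps.append("necessity for the contract not documented")
    return gaps
```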
Article 22 — automated decision-making
Article 22 grants data subjects the right not to be subject to a decision based solely on automated processing, including profiling, that produces legal or similarly significant effects. LLM-driven product decisions that meaningfully affect users (credit, employment screening, insurance pricing, content moderation that restricts access) can fall under Article 22. The standard mitigations are meaningful human review in the loop (a reviewer who can actually change the outcome, not a rubber stamp) and the rights to obtain human intervention, to express one's point of view, and to contest the decision.
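A human-review gate can be sketched as a status field that no significant decision can leave without a named reviewer. The decision types in `SIGNIFICANT` are assumptions standing in for your own Article 22 analysis, and the gate only helps if reviewers have the authority and information to overrule the model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical decision types treated as having legal or similarly significant effects.
SIGNIFICANT = {"credit", "employment_screening", "insurance_pricing", "access_restriction"}

@dataclass
class Decision:
    subject_id: str
    kind: str
    model_outcome: str
    status: str = "pending"
    reviewer: Optional[str] = None
    final_outcome: Optional[str] = None

def apply_model_decision(d: Decision) -> Decision:
    """Finalise low-impact decisions directly; queue significant ones for a human."""
    if d.kind in SIGNIFICANT:
        d.status = "awaiting_human_review"
    else:
        d.status = "final"
        d.final_outcome = d.model_outcome
    return d

def human_review(d: Decision, reviewer: str, outcome: str) -> Decision:
    """Record the reviewer's decision; the reviewer may overrule the model."""
    if d.status != "awaiting_human_review":
        raise ValueError("decision is not awaiting review")
    d.reviewer, d.final_outcome, d.status = reviewer, outcome, "final"
    return d
```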
Article 25 — data protection by design and by default
Privacy must be built into the system from the start. For LLM products this means: minimise the personal data sent to the model, prefer pseudonymised or aggregated inputs where possible, and document the design choices that protect data by default.
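Data protection by default (Article 25(2)) maps naturally onto an allowlist at the prompt-build layer: each task names the fields it needs, and everything else is dropped. The task and field names below are hypothetical.

```python
# Hypothetical per-task allowlists: fields not listed never reach the prompt.
PROMPT_FIELDS = {
    "summarise_ticket": {"ticket_body", "product", "created_at"},
    "draft_reply": {"ticket_body", "customer_first_name"},
}

def minimise(record: dict, task: str) -> dict:
    """Keep only the fields allowlisted for this task; unknown tasks get nothing."""
    allowed = PROMPT_FIELDS.get(task, set())
    return {k: v for k, v in record.items() if k in allowed}

def build_prompt(record: dict, task: str) -> str:
    """Build the prompt from the minimised record only, in a stable field order."""
    fields = minimise(record, task)
    lines = [f"{k}: {v}" for k, v in sorted(fields.items())]
    return f"Task: {task}\n" + "\n".join(lines)
```

An allowlist fails closed: a new field added to the record stays out of prompts until someone decides it is necessary, which a blocklist would not guarantee.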
Articles 28 and 32 — processor agreements and security
When you send personal data to an LLM API (OpenAI, Anthropic, Mistral, Cohere), the LLM provider typically acts as your data processor, and Article 28 applies. You need a Data Processing Agreement (DPA) containing the clauses Article 28(3) requires. The major LLM providers publish one; the question is whether you have actually signed it. Article 32 then requires "appropriate technical and organisational measures" for security: encryption in transit and at rest, access control, breach detection. None of this is unique to LLMs, but it applies to LLM data flows just as it does to any other.
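A simple safeguard is a processor registry consulted before any call that carries personal data. The registry keys, names, and dates below are placeholders, not real providers' terms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Processor:
    name: str
    dpa_signed_on: Optional[date]  # None = no Article 28 agreement on file
    region: str                    # where the endpoint processes data

# Hypothetical registry entries.
REGISTRY = {
    "provider_a_eu": Processor("Provider A (EU endpoint)", date(2024, 3, 1), "eu"),
    "provider_b": Processor("Provider B", None, "us"),
}

class NoProcessorAgreement(RuntimeError):
    pass

def require_dpa(processor_key: str) -> Processor:
    """Refuse to send personal data to a processor without a signed DPA on file."""
    p = REGISTRY.get(processor_key)
    if p is None or p.dpa_signed_on is None:
        raise NoProcessorAgreement(f"no signed DPA for {processor_key!r}")
    return p
```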
International transfers — Article 44 and Schrems II
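Sending personal data to an LLM provider hosted outside the EU/EEA is an international data transfer, governed by Articles 44-49. Since the Schrems II ruling (2020), transfers to the United States in particular cannot rest on Standard Contractual Clauses (SCCs) alone. A defensible US transfer typically relies on one of two routes:
- EU-US Data Privacy Framework (DPF) certification on the processor side. Since the European Commission's adequacy decision of July 2023, transfers to DPF-certified organisations can rely on that decision rather than on SCCs.
- SCCs backed by a Transfer Impact Assessment (TIA) documenting the legal regime of the recipient country and any supplementary measures, such as encryption where keys are held only by the controller (not the processor). Because the model must read the prompt in plaintext, that kind of encryption protects stored data but not the inference call itself.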
The CNIL (France's data protection authority) has been explicit that controllers using US-hosted AI services for personal data must document a defensible transfer mechanism. The simplest path is to use EU-hosted endpoints where available — Azure OpenAI Service in EU regions, AWS Bedrock eu-west-3 / eu-central-1 with EU-only routing, Mistral on EU infrastructure.
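The transfer logic can be expressed as a check that names the mechanism relied on for each endpoint, or refuses the call. This is a simplified sketch under stated assumptions: it models only EEA processing, DPF certification for US recipients, and SCCs backed by a TIA (with any supplementary measures assumed to be recorded in the TIA). Other Commission adequacy decisions are not modelled, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Endpoint:
    name: str
    in_eea: bool
    country: str                  # ISO code of the processing location
    dpf_certified: bool = False   # recipient is on the EU-US DPF list
    sccs_signed: bool = False
    tia_id: Optional[str] = None  # reference to a documented TIA

def transfer_mechanism(ep: Endpoint) -> str:
    """Name the documented mechanism for sending personal data to `ep`, or raise."""
    if ep.in_eea:
        return "no transfer (EEA processing)"
    if ep.country == "US" and ep.dpf_certified:
        return "adequacy (EU-US DPF)"
    if ep.sccs_signed and ep.tia_id:
        return f"SCCs + TIA {ep.tia_id}"
    raise PermissionError(f"no documented transfer mechanism for {ep.name}")
```

Returning the mechanism as a string makes it easy to write into the per-query audit record alongside the call.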
The right to erasure problem
Article 17 grants the right to erasure ("right to be forgotten"). This is straightforward for structured databases — delete the row — and operationally hard for LLMs. Two distinct concerns:
Erasure from prompts and retrieval indexes
Tractable. If a user requests erasure, you delete their data from your retrieval index (vector DB), conversation history, and any caches. This is engineering work but bounded.
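Erasure across prompts and indexes is tractable only if every stored item is tagged with the data subject's ID when it is written. The sketch assumes that tagging; `InMemoryStore` is a hypothetical stand-in for a vector index, conversation history, or cache, each of which would implement the same `delete_subject` method against its real backend.

```python
from typing import Protocol

class SubjectStore(Protocol):
    name: str
    def delete_subject(self, subject_id: str) -> int: ...

class InMemoryStore:
    """Stand-in for a vector index, conversation history, or cache."""
    def __init__(self, name: str):
        self.name = name
        self.items: list[tuple[str, str]] = []  # (subject_id, payload)

    def delete_subject(self, subject_id: str) -> int:
        """Remove all items for the subject; return how many were deleted."""
        before = len(self.items)
        self.items = [it for it in self.items if it[0] != subject_id]
        return before - len(self.items)

def erase_subject(subject_id: str, stores: list[SubjectStore]) -> dict[str, int]:
    """Delete a subject from every registered store and report per-store counts."""
    return {s.name: s.delete_subject(subject_id) for s in stores}
```

The per-store counts give you something to record as evidence that the request was executed.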
Erasure from model weights
Hard. If the user's personal data was part of model training, removing it from the weights is technically very difficult. The most common position taken by regulators is that model providers (the trainers) bear this obligation, not deployers (the API consumers). For controllers using foundation-model APIs without fine-tuning on personal data, the practical posture is to ensure the contract with the LLM provider commits them to honouring erasure requests against their training pipeline. For controllers fine-tuning their own models on personal data, the obligation is direct and the engineering is harder.
DPIA — when you need one
Article 35 requires a Data Protection Impact Assessment when processing is likely to result in high risk to data subjects. The CNIL's list of operations requiring a DPIA includes processing that uses "innovative technologies" — and the CNIL has explicitly named generative AI as falling within this. Practically, if your product uses LLMs to process personal data at meaningful scale, expect to need a DPIA. Topics to cover:
- Description of the processing and purposes
- Necessity and proportionality assessment
- Risks to data subjects and likelihood of those risks
- Mitigations applied (Article 25 by-design measures, encryption, access control)
- International-transfer safeguards if applicable
- Consultation with the DPO and (where required) with the supervisory authority
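For the screening step before a full DPIA, the Article 29 Working Party guidelines (WP248 rev.01, endorsed by the EDPB) list nine criteria and state that processing meeting two of them will in most cases require a DPIA. The sketch below encodes that rule of thumb with shorthand criterion labels of its own; it supports the written assessment and does not replace it.

```python
# Shorthand labels for the nine WP248 criteria (labels are this sketch's own).
WP248_CRITERIA = {
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_technology",
    "prevents_exercise_of_right_or_access_to_service",
}

def dpia_screen(flags: set[str]) -> str:
    """Apply the two-criteria rule of thumb to the criteria a processing meets."""
    unknown = flags - WP248_CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if len(flags) >= 2:
        return "DPIA likely required"
    if flags:
        return "document the screening; DPIA may still be required"
    return "document the screening"
```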
Patterns that hold up under scrutiny
Pattern 1: Pseudonymise before sending to the LLM
Many LLM use cases do not actually require the user's personal data to reach the model. A support-summarisation tool can replace names, emails, phone numbers, and IDs with tokens before the prompt is built, then map tokens back at output time. Pseudonymisation is not anonymisation under GDPR (the data is still personal data), but it materially reduces the surface area and is a strong Article 25 by-design measure.
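A minimal pseudonymisation round trip, assuming regular expressions are enough for the identifiers in scope. They are not enough for names or free-form addresses, which usually need an NER model; the two patterns here are illustrative only. The token map stays on the controller's side and never goes into the prompt.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}

def pseudonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with stable tokens; return the masked text and token map."""
    mapping: dict[str, str] = {}  # token -> original value (kept by the controller)
    reverse: dict[str, str] = {}  # original value -> token, so repeats reuse a token
    for label, pattern in PATTERNS.items():
        def repl(m: re.Match, label: str = label) -> str:
            value = m.group(0)
            if value not in reverse:
                token = f"<{label}_{len(mapping) + 1}>"
                mapping[token] = value
                reverse[value] = token
            return reverse[value]
        text = pattern.sub(repl, text)
    return text, mapping

def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Map tokens in the model output back to the original values."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

If the model rewrites or drops a token, `reidentify` leaves that text unchanged, so production code should check that every token in the output is one it issued.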
Pattern 2: EU-hosted endpoints by default
Use AWS Bedrock eu-west-3, Azure OpenAI Service in EU regions, or Mistral's EU infrastructure as the default. Route to non-EU endpoints only when the use case explicitly justifies it and a documented TIA exists. This collapses the international-transfer compliance work for most operations.
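The routing policy can be a small table: an EU default, plus named exceptions that each carry a justification and a TIA reference. Endpoint identifiers and the example exception are hypothetical placeholders for your provider's actual deployments.

```python
from dataclasses import dataclass

# Placeholder endpoint identifier, not a real deployment name.
EU_DEFAULT = "llm-eu-primary"

@dataclass(frozen=True)
class NonEuRoute:
    endpoint: str
    justification: str
    tia_id: str  # reference to the documented Transfer Impact Assessment

    def __post_init__(self):
        # A non-EU route cannot even be defined without its paperwork.
        if not self.justification or not self.tia_id:
            raise ValueError("non-EU route needs a justification and a TIA reference")

# Hypothetical exception: the only use case allowed off the EU default.
NON_EU_ROUTES = {
    "realtime_translation_apac": NonEuRoute(
        "llm-apac", "latency for users in Asia-Pacific", "TIA-0007"),
}

def route(use_case: str) -> str:
    """EU endpoint by default; non-EU only for use cases with a documented exception."""
    exception = NON_EU_ROUTES.get(use_case)
    return exception.endpoint if exception else EU_DEFAULT
```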
Pattern 3: Logged consent and lawful-basis tracking
For each personal-data processing operation in your system, store the lawful basis used and (where consent is the basis) the consent record. Without this, you cannot answer a regulator who asks, "On what basis did you process this user's data on this date?"
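A sketch of such a record, assuming an append-only in-memory list; a real system would use durable, tamper-evident storage. `basis_on` answers the regulator's question directly. Field names and basis strings are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ProcessingRecord:
    subject_id: str
    operation: str                   # e.g. "ticket_summary"
    basis: str                       # e.g. "6(1)(a) consent"
    on: date
    consent_id: Optional[str] = None

class BasisLedger:
    """Append-only record of which lawful basis covered each operation."""
    def __init__(self) -> None:
        self._rows: list[ProcessingRecord] = []

    def record(self, row: ProcessingRecord) -> None:
        # Consent-based processing must point at a stored consent record.
        if row.basis.startswith("6(1)(a)") and not row.consent_id:
            raise ValueError("consent-based processing needs a consent record")
        self._rows.append(row)

    def basis_on(self, subject_id: str, day: date) -> list[tuple[str, str]]:
        """On what basis did we process this subject's data on this day?"""
        return [(r.operation, r.basis) for r in self._rows
                if r.subject_id == subject_id and r.on == day]
```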
Pattern 4: Output filtering for personal data hallucinations
LLMs hallucinate. When the hallucination is a plausible-sounding made-up fact about a real person, that is a GDPR Article 5(1)(d) accuracy violation in the making. Production systems should run output through a filter that flags unsupported personal-data claims, and either suppress them or explicitly mark them as unverified.
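A deliberately narrow sketch of the filter: it flags email addresses and phone-like numbers in the output that do not occur verbatim in the retrieved context. This is an assumption-heavy simplification; claims about named people need entity and claim extraction, and verbatim matching misses reformatted numbers.

```python
import re

# Illustrative identifiers only: emails and phone-like digit runs.
IDENTIFIER = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\+?\d[\d ]{7,}\d")

def unsupported_identifiers(output: str, context: str) -> list[str]:
    """Identifiers in the model output that do not appear in the retrieved context."""
    found = IDENTIFIER.findall(output)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return [m for m in dict.fromkeys(found) if m not in context]

def filter_output(output: str, context: str, mode: str = "mark") -> str:
    """Suppress or mark identifiers the context does not support."""
    for ident in unsupported_identifiers(output, context):
        if mode == "suppress":
            output = output.replace(ident, "[removed: unverified]")
        else:
            output = output.replace(ident, f"{ident} [unverified]")
    return output
```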
Pattern 5: Audit log per query
For regulated deployments, keep a per-query audit log: timestamp, user, retrieved context, generated output, lawful basis, refusal events. Where the system is a high-risk AI system under the EU AI Act, the same log also supports the Article 12 record-keeping obligation. It is the single most useful artefact you can produce when a regulator or customer requests evidence. Treat the log itself as personal data, with access control and a defined retention period.
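One JSON line per query is a simple way to keep that log. The field names are assumptions, and the sketch records retrieved chunk IDs rather than chunk text to avoid copying personal data into a second store; that trades away some evidential value unless the IDs point to versioned content.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def audit_entry(user_id: str, query: str, context_ids: list[str], output: str,
                lawful_basis: str, refusal_reason: Optional[str] = None) -> str:
    """Serialise one query as a JSON line (hypothetical field names)."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "query": query,
        "retrieved_context": context_ids,  # chunk IDs, not chunk text
        "output": output,
        "lawful_basis": lawful_basis,
        "refused": refusal_reason is not None,
        "refusal_reason": refusal_reason,
    }, ensure_ascii=False)

def append_audit(path: str, line: str) -> None:
    """Append-only write; rotation and retention are assumed to be handled elsewhere."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```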
Patterns that look fine but are not
- "We do not store the data after the API call." Storage is not the issue; the processing happens at the moment of the API call. If personal data flows to a non-EU processor without a documented transfer mechanism, that is the violation, even if nothing is retained.
- "We anonymise before sending." Most "anonymisation" is actually pseudonymisation under the GDPR definition, because the data can still be reassociated with the individual through other information. Pseudonymisation is a useful safeguard but does not exit GDPR scope.
- "The user agreed to the Terms." A blanket terms-of-service acceptance is rarely valid consent under GDPR for specific personal-data operations like LLM inference. That processing needs its own lawful basis: specific, informed consent or one of the other Article 6 bases.
- "The LLM provider says they are GDPR-compliant." The provider being compliant covers their obligations as a processor; it does not cover yours as a controller. The controller has independent obligations regardless of who the processor is.
- "It is just for internal testing." Internal use of personal data still requires a lawful basis. "Testing" is not an Article 6 basis.
A practical compliance checklist
Before shipping an LLM-powered product to EU users, walk this checklist:
- Map every personal-data flow into and out of LLM calls
- Identify the lawful basis (Article 6) for each flow
- Sign the Article 28 DPA with each LLM provider
- Confirm the EU residency posture of each LLM endpoint and document the mechanism for any non-EU transfer (DPF certification, or SCCs plus a TIA and supplementary measures)
- Conduct a DPIA if processing is at meaningful scale or involves sensitive data
- Implement pseudonymisation or data minimisation at the prompt-build layer where feasible
- Implement an audit log per query covering lawful basis, retrieved context, output, and refusals
- Build erasure workflows that cover indexes, prompts, and any caches
- Train the team on the GDPR baseline — engineers need to recognise when they are processing personal data
- Have your DPO sign off on the architecture before launch
Bottom line
GDPR-compliant LLM products are not harder to build than non-compliant ones; they are designed differently from the start. The product teams that get this right treat the data flow as a first-class architectural concern, not a legal review at the end. Insightrix Sovereign AI specialises in structuring EU-compliant LLM deployments, and the CARAG research paper formalises an architecture for compliance-aware retrieval. For binding guidance on your specific facts, consult your DPO or a qualified data-protection lawyer.