Problem
An enterprise client needed retrieval-augmented generation over a 1.2M-document internal corpus where retrieval-time eligibility — who is allowed to see what, for which purpose — matters as much as relevance. Off-the-shelf RAG breaks here: the most relevant passage may be the most legally inadmissible.
Approach
Built a five-stage architecture that treats compliance as a first-class property of the index, retriever, generator, and audit log. Each chunk carries a 27-bit policy bitmask packed into a single 32-bit word. Bitwise admissibility checks run inside the HNSW inner loop, before the result heap is updated, so an inadmissible chunk never enters the result set. The generator receives admissible and inadmissible evidence in explicitly separated buckets and falls back to a refusal head when no admissible evidence exists. Every query commits a Merkle-anchored audit-log entry sufficient for the record-keeping requirements of Article 12 of the EU AI Act.
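The admissibility check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production HNSW patch: the bit semantics (each set bit in a chunk's policy word is a requirement the caller must hold), the names `is_admissible` and `top_k_admissible`, and the flat candidate list standing in for graph traversal are all hypothetical.

```python
import heapq

POLICY_BITS = 27                       # policy bits used, per the description above
POLICY_MASK = (1 << POLICY_BITS) - 1   # low 27 bits of the 32-bit word


def is_admissible(chunk_policy: int, caller_grants: int) -> bool:
    """Assumed semantics: every policy bit set on the chunk must be granted."""
    missing = chunk_policy & ~caller_grants & POLICY_MASK
    return missing == 0


def top_k_admissible(candidates, caller_grants: int, k: int):
    """Return ids of the k closest admissible chunks, nearest first.

    candidates: iterable of (chunk_id, distance, policy_word) tuples.
    A flat scan stands in for the HNSW neighbour expansion here.
    """
    heap = []  # max-heap on distance, implemented by negating distances
    for chunk_id, distance, policy in candidates:
        # The policy check comes first: a rejected chunk never touches the heap.
        if not is_admissible(policy, caller_grants):
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-distance, chunk_id))
        elif distance < -heap[0][0]:
            # Closer than the current worst result, so replace it.
            heapq.heapreplace(heap, (-distance, chunk_id))
    return [chunk_id for _, chunk_id in sorted(heap, reverse=True)]


candidates = [
    ("a", 0.10, 0b100),  # nearest, but requires a bit the caller lacks
    ("b", 0.20, 0b001),
    ("c", 0.30, 0b011),
    ("d", 0.40, 0b000),  # no requirements
]
result = top_k_admissible(candidates, caller_grants=0b011, k=2)
```

With these inputs the nearest chunk, `a`, is skipped even though it is the closest match, which is the point of filtering before the heap update rather than post-filtering a finished top-k list.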
Stack
Qdrant with custom HNSW patches · FastAPI · Claude / GPT-4 · Python audit-log substrate
Outcome
Sub-300 ms p95 retrieval latency on a 2.5M-node graph. Production-grade compliance posture, audit-defensible by design. The architecture was independently validated on a public 26,595-chunk benchmark built from real SEC EDGAR filings and published as a working draft. On that benchmark, the same architecture cuts constraint violations from 81.12% to 0.00% and output disclosures from 21.29% to 0.00%, at a cost of 4.8 F1 points.