2026 · 33 pages · Insightrix Working Draft · EN
Compliance-Aware Retrieval-Augmented Generation for Regulated Financial-Reporting Corpora
A Real-World Evaluation on SEC EDGAR Filings
CARAG is a five-stage RAG architecture that treats compliance as a first-class property of the index, the retriever, the generator, and the audit log. On a benchmark built from 6,000 real SEC EDGAR filings (26,595 chunks spanning seven recent quarters), it reduces the constraint-violation rate (CVR) from 81.12% to 0.00% and the output-disclosure rate (ODR) from 21.29% to 0.00%. The cost is 4.8 Token-F1 points, with no added 95th-percentile (p95) latency (0 ms overhead).
CVR 81.12% → 0.00% · ODR 21.29% → 0.00% · 4.8 F1 cost · 0 ms p95 latency overhead
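The headline figures are aggregate evaluation metrics. The sketch below shows one plausible way to compute them. It assumes the following: CVR and ODR are the percentage of evaluated outputs flagged as violating a constraint or as disclosing restricted content. Token-F1 follows the common SQuAD-style token-overlap definition. The p95 overhead is the difference between CARAG's and the baseline's nearest-rank 95th-percentile latencies. The function names (`token_f1`, `rate`, `p95`, `p95_overhead`) are hypothetical and do not reproduce the benchmark's actual scoring code.

```python
import math
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_tokens(text: str) -> list[str]:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation and articles, split."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (multiset overlap)."""
    pred, ref = normalize_tokens(prediction), normalize_tokens(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rate(flags: list[bool]) -> float:
    """Percentage of outputs whose flag is set (e.g. hypothetical per-answer violation flags)."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty latency sample."""
    if not latencies_ms:
        raise ValueError("empty latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def p95_overhead(system_ms: list[float], baseline_ms: list[float]) -> float:
    """Positive values mean the system is slower than the baseline at p95."""
    return p95(system_ms) - p95(baseline_ms)
```

Under these assumptions, a CVR of 0.00% means no evaluated output was flagged. The 4.8-point F1 cost would be the baseline's mean Token-F1 minus CARAG's, with both means expressed on a 0 to 100 scale.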