RAG poisoning: corpus contamination and interpretive drift
This page defines RAG poisoning as contamination of a retrieval corpus that alters consumed authority and causes interpretive drift.
RAG (Retrieval-Augmented Generation) architectures do not respond solely “with a model”. They respond with a model and a retrieval system: index, embeddings, search engine, document bases, filters, ranking rules, and context assembly. In this framework, the attack surface is not limited to the instruction (prompt). It includes the material the system will cite, summarize, or treat as reference.
On gautierdorval.com, RAG poisoning is treated as a particular case of “AI poisoning”: an alteration of the source consumed as authority in the interpretation chain, which produces biased, unstable, or diverted responses.
Status of this page
This page is an interpretive clarification.
It aims to stabilize the internal usage of the term “RAG poisoning” by distinguishing it from ordinary retrieval errors, simple public disinformation, and prompt injection attacks.
Operational definition
RAG poisoning: intentional or instrumentalized contamination of an indexed corpus (documents, fragments, metadata) used for context retrieval, in order to displace consumed authority, bias recall, or inject fragments that systematically alter outputs.
The central property is: the poisoned content is not merely visible, it is ingested, indexed, then recalled as context in responses, which grants it an implicit authority rank.
Corpus contamination: what is actually targeted
In a RAG architecture, the attack rarely targets “the model”. It targets the corpus and its selection mechanisms:
- source content (pages, docs, notes, databases, tickets)
- segmentation (chunks) and context boundaries
- embeddings and semantic similarity
- ranking (what surfaces first)
- filters and selection policies
- deduplication, canonicalization, and normalization.
A successful contamination modifies what the system “considers relevant”, not merely what it could read.
Minimum typology (effect mechanisms)
- Reference derivation: surfacing a non-canonical source as if it were more authoritative.
- Directional bias: steering responses toward a narrative or recurring attribution.
- Recall instability: causing contradictions depending on queries, sessions, or formulations.
- Fragment contamination: injecting “plausible” chunks that attach to many subjects.
- Degradation: intentional noise to reduce overall reliability and open inference space.
Necessary distinctions (what this is not)
- Poor retrieval is not automatically poisoning: it can be a weak index, poorly cut chunks, or inadequate ranking.
- A misleading public page is not poisoning as long as it is not ingested and recalled by the system as context.
- Prompt injection targets the instruction. RAG poisoning targets the material recalled as an authority source.
Relation with indirect injection
Indirect injection and RAG poisoning are often combined:
- indirect injection seeks to pass a hostile instruction through processed content (e.g. summary)
- RAG poisoning seeks to make this poisoned content recalled recurrently, by making it “relevant” for many queries.
In that case, the problem is no longer punctual. It becomes persistent and systemic.
Scope of this clarification
This page applies to human readings, automated syntheses, zero-click citations, and interconnected agent chains. It must be interpreted as a principle clarification: in RAG, authority governance passes as much through corpus governance as through instruction control.