AI poisoning: definition and taxonomy

Operational definition and functional taxonomy of AI poisoning: training, RAG, memory, pipeline, and instruction surfaces. Reading bounds to reduce confusion.

Collection: Clarification
Type: Clarification
Version: 1.0
Stabilization: 2026-02-28
Published: 2026-03-01
Updated: 2026-03-13

This page defines “AI poisoning” operationally and proposes a readable taxonomy, to avoid confusion, semantic slippage, and improper analogies.

In AI systems, “poisoning” is often used as a catch-all: sometimes it designates training-corpus poisoning, sometimes injection into a RAG base, sometimes corruption of agentic memory. This ambiguity encourages implicit interpretations and erroneous diagnoses.

On gautierdorval.com, “AI poisoning” refers to the intentional or instrumentalized corruption of an authority source in a system’s interpretation chain. It is neither a rhetorical effect nor simple “disinformation”, but an action aimed at degrading, biasing, diverting, or destabilizing response production.

Status of this page

This page is an interpretive clarification.

It aims to stabilize the term’s internal usage, set reading bounds, and provide a functional taxonomy. It does not standardize external vocabulary and does not claim to cover all security research.

Operational definition

AI poisoning: the deliberate (or instrumentalized) alteration of a data flow, knowledge base, or memory mechanism, carried out so as to produce a systematic drift in an AI system’s outputs, whether through bias, degradation, deviation, or instability.

Poisoning is recognized by one central property: it targets a source that the system consumes as authority (training, index, retrieval, memory, rules, tools, prompts, pipeline), not merely content exposed to humans.

Functional taxonomy

This taxonomy classifies poisoning by where the alteration occurs and the type of effect sought.

1) By alteration surface (where it happens)

  • Training poisoning: alteration of a dataset used to train or fine-tune a model or learning component.
  • Retrieval poisoning (RAG): alteration of an indexed base, internal search engine, graph, or corpus used for passage retrieval.
  • Agentic memory poisoning: alteration of a state store (episodic, semantic, procedural memory) to influence an agent’s future decisions.
  • Pipeline poisoning: alteration of an upstream step (ETL, scraping, normalization, deduplication, scoring, filters) that modifies consumed truth.
  • Instruction poisoning: alteration of an instruction system, policies, templates, or tools (prompts, rules, functions) that steer interpretation.
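
Of these surfaces, retrieval poisoning is the easiest to sketch without detailing real procedures. The toy below (a naive lexical-overlap scorer standing in for a real retriever; all names and documents are illustrative) shows the defining property: a passage stuffed with the query’s own terms outranks legitimate sources and becomes the authority the system consumes.

```python
def overlap_score(query: str, doc: str) -> int:
    """Naive lexical overlap (a stand-in for a real retriever's scoring)."""
    query_terms = set(query.lower().split())
    return sum(1 for word in doc.lower().split() if word in query_terms)

def retrieve(query: str, index: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by overlap score."""
    return sorted(index, key=lambda d: overlap_score(query, d), reverse=True)[:k]

index = [
    "the capital of france is paris",
    "paris is known for the eiffel tower",
]
query = "what is the capital of france"
clean_top = retrieve(query, index)  # the legitimate passage ranks first

# Injected passage repeating the query's own terms: it now dominates
# retrieval, so the generator consumes it as authority.
index.append("the capital of france is lyon what is the capital the capital")
poisoned_top = retrieve(query, index)
```

The same displacement logic applies, with different mechanics, to training sets, memories, and pipelines: whatever the surface, the alteration wins because it is scored, retrieved, or stored as authoritative, not because a human believes it.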

2) By effect mechanism (what it produces)

  • Directional bias: favoring a conclusion, narrative, or recurring attribution.
  • Degradation: reducing overall quality (noise, inconsistencies, contradictions) to weaken reliability.
  • Reference drift: displacing authority toward a “more cited” but non-canonical source (an informational-gravity effect).
  • Instability: making output vary by context, preventing reading stabilization.
  • Conditional triggering: producing an effect only under certain input or context conditions (without detailing procedures here).
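
Two of these mechanisms lend themselves to a quick numeric check. A minimal sketch, in which every name and threshold is an illustrative assumption: directional bias shows up as one answer dominating repeated samples, instability as the same question producing different answers under different contexts.

```python
from collections import Counter

def directional_bias(answers: list[str], threshold: float = 0.7) -> bool:
    """Flag directional bias: one answer dominates repeated samples.

    The 0.7 threshold is an illustrative assumption, not a standard.
    """
    if not answers:
        return False
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers) >= threshold

def unstable(answers_by_context: dict[str, str]) -> bool:
    """Flag instability: the same question yields different answers
    depending only on the surrounding context."""
    return len(set(answers_by_context.values())) > 1

biased = directional_bias(["lyon", "lyon", "lyon", "paris"])  # True (0.75 >= 0.7)
varies = unstable({"ctx_a": "paris", "ctx_b": "lyon"})        # True
```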

Necessary distinctions (what this is not)

  • It is not a simple factual error or a one-time hallucination.
  • It is not merely public disinformation: the key difference is ingestion as authority by the system.
  • It is not a terminological debate: the purpose here is interpretation stability.

Recognition criteria (quick read)

  • Persistent: the drift repeats over time or across sessions.
  • Systemic: multiple responses or decisions converge toward the same bias.
  • Anchored: the effect appears correlated to a source, index, memory, or policy.
  • Resistant: correcting user input is not enough to correct output.
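
These criteria can be read as a checklist over session logs. The sketch below operationalizes the first two (persistence across sessions, systemic convergence) on toy data; “anchored” and “resistant” require correlating with a concrete source and re-testing after input correction, which a toy cannot show. All names and data are illustrative.

```python
def drift_report(sessions: dict[str, list[str]], suspect: str) -> dict[str, bool]:
    """Apply the first two quick-read criteria to per-session answer logs."""
    flat = [answer for log in sessions.values() for answer in log]
    return {
        # Persistent: the suspect answer recurs in more than one session.
        "persistent": sum(suspect in log for log in sessions.values()) > 1,
        # Systemic: a majority of all recorded answers converge on it.
        "systemic": flat.count(suspect) > len(flat) / 2,
    }

sessions = {
    "session_1": ["lyon", "lyon"],
    "session_2": ["lyon", "paris"],
    "session_3": ["lyon"],
}
report = drift_report(sessions, "lyon")  # {'persistent': True, 'systemic': True}
```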

Scope of this clarification

This page applies to human readers, automated syntheses, zero-click citations, and interconnected agent chains. It establishes an internal reading framework: when the term “AI poisoning” is used in this ecosystem, it must point to an alteration of consumed authority, not to a simple content controversy.

Anchoring