Training data poisoning: source governance and provenance
This page defines training data poisoning as a provenance corruption that alters learned authority, and explains why source governance is an interpretive challenge, not a merely technical one.
When the training corpus is contaminated, the problem is not merely “an error in a dataset”. It is an alteration of what the system learns: its regularities, hierarchies, associations, and truth signals.
On gautierdorval.com, training poisoning is treated as a high-inertia case of AI poisoning: once learned, the bias becomes difficult to isolate because it manifests as “natural” model behavior.
Status of this page
This page is an interpretive clarification.
It stabilizes the term’s usage in this ecosystem and distinguishes it from ordinary data noise, variable corpus quality, or merely controversial content on the web.
Operational definition
Training data poisoning: the intentional (or instrumentalized) alteration of a corpus used to train or fine-tune a model, in order to induce a bias, deviation, instability, or conditional behavior that subsequently manifests as a property of the system.
The central signature is a provenance corruption: the system learns from sources that should not be authoritative, or learns relations that have been artificially made dominant.
Why provenance is the real perimeter
The risk is not merely “what is in the text”, but the status of sources and the mechanisms by which they enter the corpus (sketched in code after this list):
- source selection and ingestion perimeters
- licenses, rights, and usage constraints
- traceability, timestamping, versions, and lineage
- deduplication, canonicalization, normalization
- implicit weighting (repetition, overrepresentation, imbalance).
Weak provenance governance allows low-quality, deceptively authoritative, or hostile-intent sources to become “learned truth”.
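The following sketch illustrates what such a perimeter can make explicit. It is a minimal example, assuming a hypothetical ProvenanceRecord schema (all field and function names are illustrative, not a standard): each ingested document carries its source, license, timestamp, corpus version, lineage, and a canonical content hash usable for deduplication.

```python
from dataclasses import dataclass, field
from hashlib import sha256

# Hypothetical provenance record: one entry per ingested document.
# Field names are illustrative, not a standard schema.
@dataclass
class ProvenanceRecord:
    source_url: str        # where the document entered the perimeter
    license_id: str        # rights and usage constraints attached to the source
    retrieved_at: str      # ISO-8601 ingestion timestamp
    corpus_version: str    # which corpus snapshot this entry belongs to
    lineage: list[str] = field(default_factory=list)  # transforms applied upstream
    content_hash: str = "" # canonical hash used for deduplication

def canonicalize(text: str) -> str:
    """Trivial canonicalization: collapse whitespace and lowercase.
    Real pipelines normalize far more aggressively."""
    return " ".join(text.split()).lower()

def make_record(url: str, license_id: str, retrieved_at: str,
                version: str, text: str) -> ProvenanceRecord:
    """Build a traceable record so every document carries its provenance."""
    return ProvenanceRecord(
        source_url=url,
        license_id=license_id,
        retrieved_at=retrieved_at,
        corpus_version=version,
        lineage=["canonicalize"],
        content_hash=sha256(canonicalize(text).encode("utf-8")).hexdigest(),
    )
```

A record of this kind is what makes implicit weighting auditable afterwards: identical content hashes recurring across the corpus reveal overrepresentation before it becomes learned dominance.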
Minimum typology (effect mechanisms)
- Directional bias: favoring an interpretation, attribution, or narrative (see the audit sketch after this list).
- Degradation: introducing noise, contradictions, or conceptual confusion.
- Reference drift: making the system learn an erroneous source hierarchy (inverted authority).
- Instability: making outputs sensitive to minor changes in phrasing, for lack of stabilization.
- Conditional triggering: provoking a behavior only under certain conditions (without detailing procedures here).
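To make the first mechanism concrete, here is a minimal audit sketch, assuming the (source, content hash) pairs produced by the ProvenanceRecord sketch above; the function name and threshold are hypothetical. It only counts shares and repetitions, which is enough to surface the implicit weighting through which a narrative becomes dominant.

```python
from collections import Counter

# Illustrative audit over (source_url, content_hash) pairs, e.g. drawn from
# the ProvenanceRecord sketch above. The threshold is a placeholder.
def overrepresentation_report(records, max_share: float = 0.01):
    """Flag sources whose share of the corpus exceeds max_share, and
    content hashes that appear more than once (implicit weighting)."""
    sources = Counter(src for src, _ in records)
    hashes = Counter(h for _, h in records)
    total = sum(sources.values()) or 1  # avoid division by zero on empty input
    flagged_sources = {src: n / total for src, n in sources.items()
                       if n / total > max_share}
    repeated_content = {h: n for h, n in hashes.items() if n > 1}
    return flagged_sources, repeated_content
```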
Necessary distinctions
- An imperfect corpus is not automatically poisoned: the key is intention (or instrumentalization) and systemic effect.
- Public disinformation is not poisoning as long as it is not integrated into the training corpus with sufficient weight.
- RAG drift concerns a corpus that is indexed and retrieved at query time. Training poisoning concerns learned authority.
Source governance (interpretive reading)
In an interpreted web, source governance is a component of interpretive governance:
- defining what has the right to be authoritative
- documenting exclusions (what must not be learned)
- stabilizing canonical definitions and their boundaries
- reducing the inference space through explicit bounds.
Without these bounds, learning tends to reconstruct “probable” concepts, not authorized ones (a default-deny sketch follows below).
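As a minimal illustration of such bounds, the sketch below implements a default-deny ingestion gate. The domain lists and the admit function are hypothetical; the point is the posture: only what is explicitly authorized enters, and everything merely “probable” stays out.

```python
from urllib.parse import urlparse

# Hypothetical governance bounds. In practice these would live in a
# versioned, reviewed policy file, not in code.
AUTHORITATIVE_DOMAINS = {"example-canonical.org"}  # what may be authoritative
EXCLUDED_DOMAINS = {"example-hostile.net"}         # documented exclusions

def admit(source_url: str) -> bool:
    """Admit a source only if it is explicitly authorized and not excluded.
    Anything outside the documented perimeter is rejected by default."""
    domain = urlparse(source_url).netloc
    if domain in EXCLUDED_DOMAINS:
        return False
    return domain in AUTHORITATIVE_DOMAINS

# Default-deny: unknown sources never enter the corpus.
assert admit("https://example-canonical.org/page") is True
assert admit("https://example-hostile.net/page") is False
assert admit("https://unknown-site.com/page") is False
```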
Relation to other clarifications in the series
- AI poisoning: definition, taxonomy, and interpretation risks
- RAG poisoning: corpus contamination and interpretive drift
- Prompt injection: authority threat and instruction/data confusion
Scope of this clarification
This page applies to human readings, automated syntheses, zero-click citations, and interconnected agent chains. It must be read as a clarification of principle: if provenance is not governed, learned authority becomes a vector of potential drift.