Training data poisoning | Gautier Dorval

Training data poisoning: source governance and provenance

This page defines training data poisoning as a corruption of provenance that alters learned authority, and clarifies why source governance is an interpretive challenge, not merely a technical one.

When the training corpus is contaminated, the problem is not merely “an error in a dataset”. The problem is an alteration of what the system learns as regularities, hierarchies, associations, and truth signals.

On gautierdorval.com, training poisoning is treated as a case of AI poisoning with high inertia: once learned, the bias becomes difficult to isolate, because it manifests as “natural” model behavior.

Status of this page

This page is an interpretive clarification.

It stabilizes the term’s usage in this ecosystem and distinguishes it from ordinary data noise, variable corpus quality, or simple controversial content on the web.

Operational definition

Training data poisoning: the intentional (or instrumentalized) alteration of a corpus used to train or fine-tune a model, in order to provoke a bias, deviation, instability, or conditional behavior that subsequently manifests as a system property.

The central signature is a provenance corruption: the system learns from sources that should not be authoritative, or learns relations that have been artificially made dominant.

Why provenance is the real perimeter

The risk is not merely “what is in the text”, but the status of sources and the mechanisms by which they enter the corpus:

source selection and ingestion perimeters
licenses, rights, and usage constraints
traceability, timestamping, versions, and lineage
deduplication, canonicalization, normalization
implicit weighting (repetition, overrepresentation, imbalance).

Weak provenance governance allows low-quality, deceptively authoritative, or hostile-intent sources to become “learned truth”.
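The provenance controls listed above can be sketched in code. The following is a minimal illustration, not a reference implementation: the `Document` record, the `overrepresented` helper, and the 0.5 share threshold are all hypothetical names and values chosen for the example. It shows normalization, exact-match deduplication by content hash, and a simple check for implicit weighting through overrepresentation. Exact hashing only catches copies that become identical after normalization; near-duplicates would need a different technique.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical corpus record carrying minimal provenance fields."""
    source: str      # assumed source identifier, e.g. a domain name
    retrieved: str   # ingestion date as an ISO 8601 string
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """SHA-256 of the normalized text, used as a deduplication key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def deduplicate(docs):
    """Keep the first document seen for each fingerprint."""
    seen = set()
    kept = []
    for doc in docs:
        key = fingerprint(doc.text)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def source_shares(docs):
    """Fraction of documents contributed by each source."""
    counts = Counter(doc.source for doc in docs)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


def overrepresented(docs, max_share=0.5):
    """Sources whose share of the corpus exceeds an assumed threshold."""
    shares = source_shares(docs)
    return sorted(s for s, share in shares.items() if share > max_share)


corpus = [
    Document("a.example", "2024-01-01", "Hello   World"),
    Document("b.example", "2024-01-02", "hello world"),  # duplicate after normalization
    Document("a.example", "2024-01-03", "Second text"),
    Document("a.example", "2024-01-04", "Third text"),
    Document("c.example", "2024-01-05", "Unique text"),
]
kept = deduplicate(corpus)
flagged = overrepresented(kept, max_share=0.5)
```

In this toy corpus the duplicate from `b.example` is dropped, and `a.example` is flagged because it supplies three of the four remaining documents. A flag of this kind says nothing about intent; it only marks where repetition may be acting as an implicit weight.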

Minimum typology (effect mechanisms)

Directional bias: favoring an interpretation, attribution, or narrative.
Degradation: introducing noise, contradictions, or conceptual confusion.
Reference drift: making the system learn an erroneous source hierarchy (inverted authority).
Instability: making outputs sensitive to minor changes in formulation, because stabilization is lacking.
Conditional triggering: provoking behavior only under certain conditions (without detailing procedures here).

Necessary distinctions

An imperfect corpus is not automatically poisoned: the key is intention (or instrumentalization) and systemic effect.
Public disinformation is not poisoning as long as it is not integrated into the training corpus with sufficient weight.
RAG drift concerns an indexed and recalled corpus. Training poisoning concerns learned authority.

Source governance (interpretive reading)

In an interpreted web, source governance is a component of interpretive governance:

defining what has the right to be authoritative
documenting exclusions (what must not be learned)
stabilizing canonical definitions and their boundaries
reducing inference space through explicit bounds.

Without these bounds, learning tends to reconstruct “probable” concepts, not authorized concepts.
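The governance steps above can be read as an explicit, default-deny policy. The sketch below is one possible reading, under stated assumptions: `SourcePolicy`, `Decision`, and `admit` are hypothetical names, and the example domains are placeholders. It separates sources declared authoritative, sources excluded with a documented reason, and sources that remain undecided and are therefore not admitted.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTHORITATIVE = "authoritative"
    EXCLUDED = "excluded"
    UNDECIDED = "undecided"


@dataclass
class SourcePolicy:
    """Hypothetical policy: explicit authority list plus documented exclusions."""
    authoritative: set = field(default_factory=set)
    excluded: dict = field(default_factory=dict)  # source -> documented reason

    def decide(self, source: str) -> Decision:
        # Exclusions win over authority so a documented ban cannot be bypassed.
        if source in self.excluded:
            return Decision.EXCLUDED
        if source in self.authoritative:
            return Decision.AUTHORITATIVE
        return Decision.UNDECIDED


def admit(policy: SourcePolicy, sources):
    """Default-deny: only explicitly authoritative sources enter the corpus."""
    return [s for s in sources if policy.decide(s) is Decision.AUTHORITATIVE]


policy = SourcePolicy(
    authoritative={"canon.example"},
    excluded={"mirror.example": "unlicensed copy of the canonical source"},
)
admitted = admit(policy, ["canon.example", "mirror.example", "unknown.example"])
```

The design choice worth noting is the third state. Treating an unknown source as undecided, rather than as implicitly acceptable, is what keeps the inference space bounded: nothing becomes authoritative by default.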

Relation to other clarifications in the series

AI poisoning: definition, taxonomy, and interpretation risks
RAG poisoning: corpus contamination and interpretive drift
Prompt injection: authority threat and instruction/data confusion

Scope of this clarification

This page applies to human readings, automated syntheses, zero-click citations, and interconnected agent chains. It must be interpreted as a clarification of principle: if provenance is not governed, learned authority becomes a potential source of drift.

Anchoring

Practical clarification

This clarification should be used as a boundary-setting page for the term training data poisoning and for its link to source governance and provenance. Its purpose is not to add another abstract term to the corpus, but to prevent a common confusion from spreading across search engines, LLM outputs, knowledge panels, internal summaries, or agentic responses.

A clarification is useful when two notions are close enough to be conflated but different enough that the conflation creates risk. The reader should therefore ask what is being separated: a person from an organization, a definition from a service, visibility from legitimacy, retrieval from authority, observation from proof, or a tool surface from the canon. Once that separation is explicit, later pages can route toward the correct definition, framework, audit method or evidence layer.

What this prevents

The main risk is not that a reader misunderstands a word once. The risk is that an ambiguous relation becomes a reusable assumption. In AI-mediated environments, a weak relation can be repeated as if it were current, generalized as if it were structural, or elevated as if it were authorized. A clarification is effective only when it changes what later systems are allowed to assume. After reading this page, a person or model should know which association is permitted, which association is forbidden, and which association remains unproven.
