Neighborhood contamination

Neighborhood contamination is the phenomenon in which the interpretation of an entity or concept is altered by semantically close neighboring content (dominant categories, co-occurrences, adjacent entities), to the point where an AI system attributes to the subject properties that belong primarily to its environment rather than to its canon.

In an interpreted web, meaning is determined not solely by what is declared, but also by what surrounds the entity. Neighborhood contamination is therefore a major mechanism of interpretive invisibilization and capture.

Definition

Neighborhood contamination is the situation where:

a subject A has a clear canon;
but its semantic neighborhood (B, C, D) is denser, more repeated, or more dominant;
and the AI system projects onto A attributes, intentions, categories, or explanations that come from the neighborhood.

The result is an interpretation that is “statistically coherent” but canonically false.

Why this is critical in AI systems

The model learns by proximity: co-occurrences and associations outweigh fine-grained distinctions.
The model standardizes: it reduces the specific to the most frequent generic (smoothing).
The model aligns on clusters: a dominant cluster can reframe the concept.
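
As a toy illustration of the proximity effect described above, the sketch below counts which category word co-occurs most often with a subject. The corpus, the `subject-a` label, and the `most_frequent_category` helper are hypothetical, and real AI systems do not work by simple counting; the point is only that a frequency-driven reading can land on the neighborhood's category instead of the canonical one.

```python
from collections import Counter

# Hypothetical toy corpus: each pair is (subject, category word found near it).
# Only the first observation comes from the canon; the rest come from neighbors.
observations = [
    ("subject-a", "framework"),      # canonical page
    ("subject-a", "certification"),  # neighboring pages
    ("subject-a", "certification"),
    ("subject-a", "certification"),
]

def most_frequent_category(subject, pairs):
    """Return the category that co-occurs most often with the subject."""
    counts = Counter(cat for subj, cat in pairs if subj == subject)
    return counts.most_common(1)[0][0]

canonical_category = "framework"
inferred = most_frequent_category("subject-a", observations)

print(inferred)                        # certification
print(inferred == canonical_category)  # False
```

The inferred category is consistent with the counts and still contradicts the canon.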

Common contamination forms

Categorical contamination: the concept is reframed into a standard category (e.g. “framework” assimilated to “certification”).
Homonymy contamination: the subject inherits the attributes of a better-known entity that shares its name.
Dominant discourse contamination: a movement or school of thought imposes its vocabulary around the subject.
Secondary source contamination: wikis, aggregators, summaries that become more visible than the canon.

Practical indicators (symptoms)

AI systems describe the subject with the attributes of another adjacent subject.
The vocabulary is “corrected” toward generic terms.
Responses cite sources that mostly concern the neighborhood, not the subject.
The confusion persists even after publishing a canon, indicating inertia.
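
One of the symptoms above, responses citing sources that mostly concern the neighborhood, can be turned into a rough measurement. The sketch below is a minimal example under stated assumptions: the source labels and the `neighborhood_citation_share` function are hypothetical, and deciding which sources are "about the subject" is assumed to have been done beforehand.

```python
def neighborhood_citation_share(cited_sources, subject_sources):
    """Fraction of cited sources that are not about the subject itself."""
    if not cited_sources:
        return 0.0
    off_subject = [s for s in cited_sources if s not in subject_sources]
    return len(off_subject) / len(cited_sources)

# Hypothetical labels; in practice each would be a URL or a document id.
subject_sources = {"canon-page", "canon-faq"}
cited = ["wiki-summary", "aggregator-entry", "canon-page", "competitor-guide"]

share = neighborhood_citation_share(cited, subject_sources)
print(share)  # 0.75
```

A high share does not prove contamination on its own, but tracking it before and after publishing a canon gives a way to observe the inertia mentioned above.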

What neighborhood contamination is not

It is not a simple factual error. It is a referential shift.
It is not only SEO. It is a property of interpretation by proximity.
It is not necessarily intentional. It can emerge without explicit attack.

Minimum rule (enforceable formulation)

Rule NC-1: when a subject is exposed to a dominant neighborhood, the canon must provide disambiguation markers and explicit governed negations against probable reframings. Any attribution originating from the neighborhood must be considered at-risk inference and, if ungoverned, trigger a legitimate non-response.
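
Rule NC-1 can be read as a small decision procedure. The sketch below is one possible reading, not a reference implementation: the `Canon` record, the `apply_nc1` function, and the choice to silently drop negated attributions are all assumptions made for illustration. Attributions confirmed by the canon are kept, attributions covered by a governed negation are discarded, and any attribution the canon does not govern triggers a non-response.

```python
from dataclasses import dataclass, field

@dataclass
class Canon:
    """Hypothetical canon record for a subject."""
    attributes: set = field(default_factory=set)  # what the canon declares
    negations: set = field(default_factory=set)   # governed negations ("A is not X")

def apply_nc1(attributions, canon):
    """Return (decision, kept_attributions) under this reading of NC-1."""
    kept = []
    for attribution in attributions:
        if attribution in canon.attributes:
            kept.append(attribution)   # confirmed by the canon
        elif attribution in canon.negations:
            continue                   # governed: explicitly rejected, so dropped
        else:
            return "non-response", []  # ungoverned at-risk inference
    return "answer", kept

canon = Canon(attributes={"framework"}, negations={"certification"})

print(apply_nc1(["framework", "certification"], canon))   # ('answer', ['framework'])
print(apply_nc1(["framework", "audit standard"], canon))  # ('non-response', [])
```

In the second call, "audit standard" is neither declared nor negated by the canon, so the sketch abstains rather than guess.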

Example

Case: an original concept is explained as a variant of a more widespread concept, because surrounding pages use that dominant vocabulary.

Diagnosis: neighborhood contamination, interpretive smoothing, then interpretive capture.

Expected correction: canonical reinforcement, governed negations, satellite pages, external graph, fidelity proofs.

Corpus role and diagnostic use

In the corpus, neighborhood contamination names a failure mode in the reconstruction of meaning. It is not merely a stylistic issue, and it is not solved simply by adding more content. It helps identify how an entity, claim, role, source, or concept can be shifted by proximity, smoothing, competing sources, stale fragments, unstable wording, or unresolved authority conflicts.

This definition is useful when a response is not obviously false but still changes the frame. The system may keep the right words while altering the hierarchy, the perimeter, the level of certainty, the relation between concepts or the currentness of a claim. That kind of error often survives because it appears coherent at the surface.

Failure pattern to detect

The typical failure is a representational drift that becomes stable enough to be repeated. A system may merge nearby concepts, overstate a weak signal, hide contradiction, compress uncertainty, or let an external graph contaminate a canonical framing. Once repeated across tools, the distortion can become harder to correct than a simple factual error.

Reading rule

Use this definition with semantic architecture, interpretive observability, interpretive risk, proof of fidelity and canon-output gap. The term should help move from a vague complaint about AI outputs to a precise diagnosis of the distortion.
