Indirect injection: when "summarize this content" becomes an attack surface

Indirect injection: when “summarize this content” becomes an attack surface

This page defines indirect injection as an authority threat that transits through a legitimate task (“summarize”, “explain”, “extract”) and converts a hostile instruction into consumed context.

Prompt injection is often imagined as an adversary who “speaks to the model” directly. However, in a modern architecture (RAG, assisted browsing, agents), a large portion of context is not provided by the user but retrieved (pages, documents, extracts, emails, repositories, tools). Indirect injection exploits this reality: it places instructions in content that will subsequently be processed as data.

The critical point is structural: a work instruction (“summarize this content”) forces the system to ingest third-party text. If the system does not explicitly bound what can instruct, it risks letting a hostile instruction slip into the decisional hierarchy.

Status of this page

This page is an interpretive clarification.

It establishes an internal reading framework to distinguish indirect injection from a simple error, hallucination, or misleading content. It does not constitute an operational procedure or exploitation guide.

Operational definition

Indirect injection: insertion of instructions or constraints into third-party content (page, document, extract, tool output) such that, during a legitimate task (summary, extraction, classification, response), the system treats these instructions as authoritative context and modifies its output, priorities, or decisions.

The central mechanism is an instruction/data confusion transiting through a processing step perceived as neutral.

Why “summarize this content” is an attack surface

A summary request has a particular property: it implicitly gives the content a status of “raw material” to ingest, without prior validation of its role.

If the system does not impose strict separation between:

rules (what can instruct)
context (what can inform)
sources (what can be authoritative)

then content can contain a hostile instruction that will be processed as if it were compatible with the requested task, or even given priority.

Common surfaces (where injection hides)

Web pages: sections invisible to the eye (footer, comments, accordions), or non-editorialized “SEO” content.
Documents: PDFs, docs, notes, where the instruction is buried in a paragraph.
Tool outputs: API outputs, connectors, scrapers, logs, consumed as “raw data”.
RAG-indexed content: a poisoned fragment can be recalled out of context and gain an implicit authority rank.

Authority threat: the real problem

Indirect injection is an authority threat, not simply “malicious text”.

It seeks to displace what decides: elevate an instruction originating from third-party content above the policy, system, or legitimate user instruction. When this happens, the system is no longer “summarizing” — it is obeying an illegitimate rank.

Bounding and distinctions

Misleading content is not automatically indirect injection: it becomes so if it influences the instruction hierarchy.
A bad synthesis does not prove injection: the signature is a systematic deviation or abnormal priority.
Informational noise is not an instruction: injection implies an explicit or implicit attempt at constraint.

Relation to other clarifications in the series

Prompt injection: authority threat and instruction/data confusion
RAG poisoning: corpus contamination and interpretive drift
AI poisoning: definition, taxonomy, and interpretation risks
Q-Layer against injection attacks: response conditions bounding

Scope of this clarification

This page applies to human readings, automated syntheses, zero-click citations, and interconnected agent chains. It must be interpreted as a principle clarification: any processing task (summary, extraction, reformulation) can become an attack surface if the authority hierarchy is not explicitly bounded.

Indirect injection: when “summarize this content” becomes an attack surface