Indirect injection: when “summarize this content” becomes an attack surface
This page defines indirect injection as an authority threat that transits through a legitimate task (“summarize”, “explain”, “extract”) and converts a hostile instruction into consumed context.
Prompt injection is often imagined as an adversary who “speaks to the model” directly. However, in a modern architecture (RAG, assisted browsing, agents), a large portion of context is not provided by the user but retrieved (pages, documents, extracts, emails, repositories, tools). Indirect injection exploits this reality: it places instructions in content that will subsequently be processed as data.
The critical point is structural: a work instruction (“summarize this content”) forces the system to ingest third-party text. If the system does not explicitly bound what can instruct, it risks letting a hostile instruction slip into the decisional hierarchy.
Status of this page
This page is an interpretive clarification.
It establishes an internal reading framework to distinguish indirect injection from a simple error, hallucination, or misleading content. It does not constitute an operational procedure or exploitation guide.
Operational definition
Indirect injection: insertion of instructions or constraints into third-party content (page, document, extract, tool output) such that, during a legitimate task (summary, extraction, classification, response), the system treats these instructions as authoritative context and modifies its output, priorities, or decisions.
The central mechanism is an instruction/data confusion transiting through a processing step perceived as neutral.
Why “summarize this content” is an attack surface
A summary request has a particular property: it implicitly gives the content a status of “raw material” to ingest, without prior validation of its role.
If the system does not impose strict separation between:
- rules (what can instruct)
- context (what can inform)
- sources (what can be authoritative)
then content can contain a hostile instruction that will be processed as if it were compatible with the requested task, or even given priority.
Common surfaces (where injection hides)
- Web pages: sections invisible to the eye (footer, comments, accordions), or non-editorialized “SEO” content.
- Documents: PDFs, docs, notes, where the instruction is buried in a paragraph.
- Tool outputs: API outputs, connectors, scrapers, logs, consumed as “raw data”.
- RAG-indexed content: a poisoned fragment can be recalled out of context and gain an implicit authority rank.
Authority threat: the real problem
Indirect injection is an authority threat, not simply “malicious text”.
It seeks to displace what decides: elevate an instruction originating from third-party content above the policy, system, or legitimate user instruction. When this happens, the system is no longer “summarizing” — it is obeying an illegitimate rank.
Bounding and distinctions
- Misleading content is not automatically indirect injection: it becomes so if it influences the instruction hierarchy.
- A bad synthesis does not prove injection: the signature is a systematic deviation or abnormal priority.
- Informational noise is not an instruction: injection implies an explicit or implicit attempt at constraint.
Relation to other clarifications in the series
- Prompt injection: authority threat and instruction/data confusion
- RAG poisoning: corpus contamination and interpretive drift
- AI poisoning: definition, taxonomy, and interpretation risks
- Q-Layer against injection attacks: response conditions bounding
Scope of this clarification
This page applies to human readings, automated syntheses, zero-click citations, and interconnected agent chains. It must be interpreted as a principle clarification: any processing task (summary, extraction, reformulation) can become an attack surface if the authority hierarchy is not explicitly bounded.