Features 13.01.2026

Unpacking Prompt Injection: The AI Vulnerability You Can’t Patch Away

The NCSC has warned of a “dangerous misunderstanding” about these GenAI threats.

CISOs should assume failure, constrain impact and focus on recovery to mitigate AI risk, Ian Williams learns

When the UK’s National Cyber Security Centre (NCSC) recently warned that mischaracterising prompt injection could lead to large-scale breaches, it was not highlighting another software flaw waiting for a fix. Prompt injection is often treated as the AI equivalent of SQL injection or cross-site scripting, the agency said. But while that comparison is convenient and familiar, it is also wrong.

Unlike traditional injection attacks, prompt injection is not the result of improper input handling or missing sanitisation. It stems from how large language models (LLMs) operate at a fundamental level, with significant implications for security leaders managing AI risk.

Why prompt injection is different

Prompt injection refers to techniques that influence an AI system’s behaviour by embedding instructions within data the model is asked to process. Those instructions may be explicitly included in prompts or hidden in documents, web content, emails, logs, or outputs from other systems. If the model treats them as authoritative, it may override safeguards, disclose sensitive information, or take unintended actions.

The distinction that matters for CISOs is scope. SQL injection compromises a database. Prompt injection can compromise decision-making.

Pete Luban, field CISO at AttackIQ, tells Assured Intelligence that prompt injection is “not just a technical issue”, but one that affects automation, trust and business processes across an organisation. Because AI systems increasingly sit inside customer workflows, security operations, and automated decision paths, the impact of manipulation extends well beyond IT systems themselves.

That shift is what elevates prompt injection from an application security concern to a board-level risk.

Why familiar security models fail

Security teams have decades of experience dealing with untrusted input. Established practices are built on separating data from instructions and constraining execution paths. LLMs do not work that way.

“Prompt injection is not a transitional phase in AI security, nor a vulnerability waiting for a patch”

In a language model, instructions and data are processed through the same mechanism. There is no reliable, programmatic way for the system to determine whether a piece of text is benign context or an attempt to override previous instructions. Any text that enters the model’s context window can influence its behaviour.

This is why prompt injection resists traditional fixes. Guardrails, prompt hardening, and filtering can reduce risk, but they cannot provide the deterministic guarantees that defenders expect from mature application security controls.

As developer Simon Willison warned in a recent conversation with RedMonk analyst Kate Holterhoff, the industry has been “unusually slow in recognising that prompt injection isn’t a bug at all, but an inherent property of how these models work”. That hesitation is becoming harder to justify as attacks move from theory to practice.

From isolated attacks to AI supply chains

Early discussions of prompt injection often centred on consumer chatbots and contrived demonstrations. That framing no longer holds.

Recent techniques such as HashJack and PromptFix demonstrate how malicious prompts can be embedded upstream and propagated through AI supply chains. Rather than interacting directly with a target model, attackers can place instructions in data sources or third-party services that an AI system automatically consumes.

“Every integration point where an LLM processes untrusted content becomes a potential injection vector” Kate Holterhoff

The result is a broader attack surface that includes documentation, repositories, telemetry feeds, integrations, and vendor-provided services.

Holterhoff tells Assured Intelligence that the most serious risk is no longer limited to chatbots; it lies with LLM-powered agents with permission to act. “For the most part, chatbots could leak information, but an agent with calendar access, email integration, and API credentials can do significant damage,” she explains. As organisations deploy systems that retrieve documents, process emails, or make access decisions, “every integration point where an LLM processes untrusted content becomes a potential injection vector,” Holterhoff adds.

Chrisi Baetz, a vCISO associate at Cyberhash, tells Assured Intelligence that many organisations still struggle even to identify where they’re using AI. Browser-based agents, SaaS features, and shadow AI tools often bypass formal security reviews, expanding exposure without corresponding visibility or controls. In that context, prompt injection becomes as much a governance issue as a technical one, she argues.

Why it’s not patchable

The NCSC’s December warning rests on a simple point: prompt injection is a design limitation, not a defect. LLMs are probabilistic systems trained to predict language, not to reason about intent. They cannot reliably distinguish between trusted instructions and untrusted input. Framing prompt injection as a bug encourages organisations to look for a fix that does not exist, the agency argues.

AttackIQ’s Luban agrees with that assessment. Treating prompt injection like a traditional vulnerability creates a false sense of security, he says. Prevention cannot be guaranteed. A more realistic course of action would focus not on blocking prompt injection entirely, but on designing systems that assume it will occur and limit the resulting damage.

What this means for CISOs

For CISOs, this implies that prompt injection cannot be mitigated by a single tool or policy. It requires a shift in emphasis. Rather than focusing exclusively on inputs, security teams need to monitor behaviour. Unexpected tool usage, anomalous action sequences, or outputs that fall outside established norms may be more meaningful indicators than the prompts themselves.

“Claims that systems are ‘prompt-injection resistant’ or ‘secure by design’ should be treated cautiously”

Identity governance is equally critical. AI systems operate with inherited permissions, and overly broad access dramatically increases the blast radius of attacks. Cyberhash’s Baetz explains that organisations already struggling with identity management and segmentation are likely to find AI-specific risks unmanageable.

There is also a need for scepticism. Claims that systems are “prompt-injection resistant” or “secure by design” should be treated cautiously. At best, these claims are narrow and contextual. At worst, they obscure the technology’s fundamental limitations.

Reduce impact, don’t chase immunity

Accepting that prompt injection cannot be fully mitigated does not mean accepting uncontrolled risk. It means focusing on resilience.

One possible course of action would be to limit autonomy. High-impact actions should not be executed automatically without validation or oversight. Human-in-the-loop controls, often framed as a weakness of automation, function as a deliberate safety mechanism in AI-driven systems.

Architectural containment also matters. Separating read and write privileges, enforcing segmentation between AI systems and core platforms, and defining explicit handoff points all help constrain damage when manipulation occurs. Continuous adversarial testing is another useful approach. Traditional pen testing focuses on code and infrastructure. But AI systems need to be tested for behavioural failures, including how they respond to malicious or conflicting instructions.

AttackIQ’s Luban describes acceptable risk in practical terms: knowing what a manipulated model could affect, how quickly the organisation would notice, and how fast it could intervene or shut the system down.

Governance gaps and an escalating arms race

Prompt injection also exposes weaknesses in organisational governance. Shadow AI usage, undocumented integrations, and unclear ownership make risk and assessment management difficult.

“Accepting that prompt injection cannot be fully mitigated does not mean accepting uncontrolled risk”

RedMonk’s Holterhoff is increasingly concerned about how the threat itself is evolving. “We’re currently in an arms race where defenders are leveraging LLMs to detect sophisticated attacks, but bad actors are also using LLMs to craft attacks that are more nuanced and effective,” she says. As models become more capable, the resulting attacks become even harder to detect and mitigate.

Cyberhash’s Baetz adds that outright bans on AI tools often backfire, pushing usage into unmanaged personal accounts. Allowing controlled, visible use creates opportunities for oversight, education, and risk assessment, rather than driving the problem underground, she says.

A permanent condition, not a temporary flaw

Prompt injection is not a transitional phase in AI security, nor a vulnerability waiting for a patch. It is a permanent property of systems that reason over language rather than execute deterministic code.

The mistake organisations make is not deploying AI systems that can be manipulated. It is assuming they cannot be – or that someone else has already solved the problem.

For CISOs, maturity will be measured not by claims of immunity, but by how well organisations assume failure, constrain impact, and recover when AI systems behave unexpectedly.

Latest articles

Be an insider. Sign up now!