There is a well-established idea in software security called shift-left. Move the security work earlier in the development lifecycle. Find vulnerabilities before they are built in rather than after they are deployed. The logic is sound and the evidence supports it: defects caught at design are orders of magnitude cheaper to fix than defects caught in production.

For most software products, shift-left has become organisational common sense. For AI products in regulated environments, it is being applied with the right intention and the wrong model. The shift is happening, but it is landing at the wrong point.

Where the Gap Appears

[Figure: the development lifecycle (Design, Build, Test, Deploy, Production). The design-time contract, an architectural commitment, lands at design; the threat model, a risk assessment only, lands at build. The accountability gap sits between them.]
Figure 1. Shift-left applied correctly lands at design, not at build. The gap between threat assessment and architectural commitment is where governance accountability is lost.

When a product team introduces an AI agent into a regulated product, the security conversation that happens earliest is typically a threat assessment. What can go wrong? What data does the agent access? What external services does it call? The threat model is produced. The risks are rated. The high-severity items are flagged for remediation.

What does not happen, in most organisations, is a design-time contract: a documented set of commitments about what the agent is permitted to do, under what conditions, with what limits on scope and duration, and with what evidence trail built into the architecture from the first line of code.

The threat model describes what could go wrong. The design-time contract specifies what is explicitly permitted and what is not, and builds the enforcement of that specification into the system rather than into the documentation. These are not the same thing. The first is an assessment. The second is an architectural commitment. Organisations that have the first and not the second have produced a risk register, not a governance posture.

Why Shift-Left Fails Here

[Figure: three questions present at design time. The security team asks "What could this agent do that we do not want it to do?" (threat assessment). The product manager asks "What should this agent do to deliver the product outcome?" (product requirement). The required third question is "What is this agent explicitly permitted to do, and how is that enforced architecturally?" (the design-time contract). All three questions are necessary; only the third produces an architectural commitment. Without it, the other two produce a gap.]
Figure 2. The design-time contract is the missing third question. Security asks what could go wrong. Product asks what should happen. Neither asks what is explicitly permitted and how that is enforced.

The pattern is visible in how product security programmes report on their own health. Density and volume metrics (total open findings, scan coverage percentages, tools deployed) look credible on a slide but do not tell you whether the programme is keeping its commitments. SLA adherence (findings closed within their SLA window, divided by findings due) is the metric that reveals whether anything is actually getting fixed. The same structural problem appears one layer earlier, at design time. A threat model produced at the start of a product build satisfies a process gate, but if it does not produce specific architectural decisions, it has produced the appearance of security governance without the substance.
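The distinction between a density metric and a commitment metric can be made concrete. A minimal sketch, using a hypothetical `Finding` record and `sla_adherence` function (neither is an established tool; the names are illustrative):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Finding:
    due: date                      # remediation deadline under the SLA
    closed: Optional[date] = None  # None while the finding is still open

def sla_adherence(findings: list, as_of: date) -> float:
    """Findings closed within their SLA window, divided by findings due."""
    due = [f for f in findings if f.due <= as_of]
    if not due:
        return 1.0  # nothing was due, so no commitment was missed
    on_time = [f for f in due if f.closed is not None and f.closed <= f.due]
    return len(on_time) / len(due)

findings = [
    Finding(due=date(2024, 3, 1), closed=date(2024, 2, 20)),   # closed on time
    Finding(due=date(2024, 3, 1), closed=date(2024, 4, 5)),    # closed late
    Finding(due=date(2024, 3, 15)),                            # overdue, still open
    Finding(due=date(2024, 9, 1)),                             # open, not yet due
]

# Density metric: 2 open findings. SLA adherence: 1 of 3 due closed on time.
print(sla_adherence(findings, as_of=date(2024, 6, 1)))
```

The open-finding count (two) says nothing about whether commitments were kept; the adherence ratio (one in three) does.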

This is where shift-left fails for AI products specifically. The product manager and the security team are both present at design time, but they are solving different problems, and neither of their questions produces the design-time contract. The contract requires a third question: what is this agent explicitly permitted to do, and how is that permission enforced at the architectural level rather than the policy level?

What the Contract Covers

A design-time contract for an AI agent in a regulated product is not a lengthy governance document. It is a precise set of architectural commitments across four dimensions.

[Figure: reference card, "The Design-Time Contract". Dimension 1, Scope: what data sources, tools, and external services the agent is permitted to access, enforced at the architecture level. Dimension 2, Duration: under what conditions the agent's permissions are active, time-limited and logged. Dimension 3, Behavioural Limits: what the agent is explicitly not permitted to produce, encoded in the system, not policy. Dimension 4, Evidence Trail: what the agent logs at inference time, built at design, not retrofitted. 4iGov.cloud · Product Security in AI Agentic Development.]
Figure 3. The four dimensions of the design-time contract. Download as a reference card for internal use.

Scope. What data sources, tools, and external services the agent is permitted to access. Not a list of what it might access, but a scoped permission set that the architecture enforces. An agent that can call any tool available in the environment is not scoped. An agent that can call a defined set of tools, with defined input and output constraints, is.
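One way to make the allowlist the enforcement point rather than a documentation item is to route every tool call through a registry fixed at construction. A minimal sketch, with hypothetical names (`ScopedToolRegistry`, `get_balance`), not a prescribed implementation:

```python
class ScopeViolation(Exception):
    """Raised when the agent attempts a tool outside its permitted set."""

class ScopedToolRegistry:
    """The agent can only invoke tools registered at design time.

    A tool absent from the registry cannot be called at all,
    regardless of what else exists in the environment.
    """
    def __init__(self, permitted: dict):
        self._tools = dict(permitted)  # name -> callable, fixed at construction

    def invoke(self, name: str, **kwargs):
        if name not in self._tools:
            raise ScopeViolation(f"tool '{name}' is outside the agent's permitted scope")
        return self._tools[name](**kwargs)

# Hypothetical permitted tool: a read-only balance lookup (stubbed here).
def get_balance(account_id: str) -> float:
    return 0.0

registry = ScopedToolRegistry({"get_balance": get_balance})
registry.invoke("get_balance", account_id="ACC-1")   # permitted
# registry.invoke("transfer_funds", amount=100)      # would raise ScopeViolation
```

The design choice is that permission is structural: there is no configuration flag to widen scope at runtime, only a code change that is visible at review.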

Duration. Under what conditions the agent's permissions are active. Break-glass access, applied to AI agents, means permissions that are tightly scoped, time-limited, and logged for every invocation. Most deployed agents have no equivalent: they are on, always, with the full permission set granted at deployment.
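The break-glass pattern can be sketched as a grant that is off by default, activated for a bounded window with a recorded reason, and logged on every check. The class and field names here are illustrative assumptions, not a standard API:

```python
import time
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.permissions")

class TimeBoxedGrant:
    """A permission that is inactive by default, activated for a
    bounded window, and logged on every grant and every check."""

    def __init__(self, permission: str, ttl_seconds: float):
        self.permission = permission
        self.ttl = ttl_seconds
        self._expires_at = None  # inactive until explicitly granted

    def grant(self, reason: str) -> None:
        self._expires_at = time.monotonic() + self.ttl
        log.info("GRANT %s for %.0fs: %s", self.permission, self.ttl, reason)

    def is_active(self) -> bool:
        active = self._expires_at is not None and time.monotonic() < self._expires_at
        log.info("CHECK %s active=%s", self.permission, active)
        return active

grant = TimeBoxedGrant("write:customer_records", ttl_seconds=900)
assert not grant.is_active()  # default posture: no standing permission
grant.grant(reason="incident INC-123, approved by on-call lead")
assert grant.is_active()      # active only inside the 15-minute window
```

The inversion matters: the deployed-agent default described above is a standing grant, whereas here the default is denial and every activation leaves a log line with its justification.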

Behavioural limits. What the agent is explicitly not permitted to produce. For a regulated financial product this means documented constraints on output types: no hallucinated citations, no extrapolated financial advice beyond the product's scope, no data returned outside the residency boundary. These limits exist in policies. The design-time contract encodes them in the system.
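Encoding a limit "in the system" means the check sits in the response path and blocks a violating output, rather than flagging it after the fact. A minimal sketch under stated assumptions: the citation-marker syntax `[cite:...]`, the `KNOWN_CITATIONS` set, and the region names are all hypothetical:

```python
import re

class LimitViolation(Exception):
    """Raised when an output breaches the contract's behavioural limits."""

# Hypothetical constraints for a regulated financial product.
KNOWN_CITATIONS = {"FCA PS7/23", "EU AI Act"}  # citations the product may reference
PERMITTED_REGIONS = {"eu-west-1"}              # the data residency boundary

def enforce_limits(output: str, data_region: str) -> str:
    """Block, rather than merely log, outputs that breach the contract."""
    if data_region not in PERMITTED_REGIONS:
        raise LimitViolation(
            f"data from region '{data_region}' is outside the residency boundary")
    # Any citation marker must resolve to a source the product actually holds.
    for cite in re.findall(r"\[cite:([^\]]+)\]", output):
        if cite.strip() not in KNOWN_CITATIONS:
            raise LimitViolation(f"unverifiable citation '{cite.strip()}'")
    return output

enforce_limits("Rates are set per [cite:FCA PS7/23].", data_region="eu-west-1")  # passes
# enforce_limits("See [cite:Smith 2021].", data_region="eu-west-1")  # raises LimitViolation
```

A policy document depends on engineers remembering it; a gate like this fails closed when they do not.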

Evidence trail. What the agent logs at inference time, not only at deployment. Model version in production at a specific timestamp. Input data at the point of decision. Output produced. Override events. This is the infrastructure that makes the evidentiary chain from the AI Accountability series possible. It cannot be retrofitted after deployment without significant cost. It must be built at design time or it will not exist when it is needed.
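The per-inference record the text describes might look like the following. This is a sketch, not a prescribed schema; the function name, field names, and model identifier are all illustrative assumptions:

```python
import json
import hashlib
from datetime import datetime, timezone

def record_inference(model_version: str, input_payload: dict,
                     output_payload: dict, override: bool = False) -> str:
    """Emit one evidence record per inference: which model version ran,
    on what input, producing what output, and whether a human override
    occurred. Emitted at decision time, so the chain never needs to be
    reconstructed later."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # Hash the input so the record is linkable without duplicating
        # potentially sensitive payloads into the log stream.
        "input_hash": hashlib.sha256(
            json.dumps(input_payload, sort_keys=True).encode()).hexdigest(),
        "output": output_payload,
        "override_event": override,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)  # in production, an append-only store rather than stdout
    return line

record_inference("credit-scorer-2.4.1",
                 {"applicant_id": "A-77", "features": [0.2, 0.9]},
                 {"decision": "refer", "score": 0.41})
```

Because each record carries the model version and a content hash of the input, the connected record the regulators expect (parameters, monitoring data, decision logs) can be joined after the fact instead of reconstructed.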

The Forcing Function

The EU AI Act's technical documentation requirements for high-risk systems and the FCA's PS7/23 model risk management principles both describe something that functions as a design-time contract: documented parameters, scoped permissions, evidence of continuous monitoring against those parameters. Neither framework uses that term. Both frameworks require the substance.

What they also establish is a timing obligation. The documentation must exist before deployment, not be assembled in response to an examination. An organisation that waits until a regulatory examination to produce this documentation will find that the evidence standard requires a connected record (parameters linked to monitoring data linked to decision logs) that cannot be reconstructed from documentation written after the fact.