There is a well-established idea in software security called shift-left. Move the security work earlier in the development lifecycle. Find vulnerabilities before they are built in rather than after they are deployed. The logic is sound and the evidence supports it: defects caught at design are orders of magnitude cheaper to fix than defects caught in production.
For most software products, shift-left has become organisational common sense. For AI products in regulated environments, it is being applied with the right intention and the wrong model. The shift is happening. It is shifting toward the wrong point.
Where the Gap Appears
When a product team introduces an AI agent into a regulated product, the security conversation that happens earliest is typically a threat assessment. What can go wrong? What data does the agent access? What external services does it call? The threat model is produced. The risks are rated. The high-severity items are flagged for remediation.
What does not happen, in most organisations, is a design-time contract: a documented set of commitments about what the agent is permitted to do, under what conditions, with what limits on scope and duration, and with what evidence trail built into the architecture from the first line of code.
The threat model describes what could go wrong. The design-time contract specifies what is explicitly permitted and what is not, and builds the enforcement of that specification into the system rather than into the documentation. These are not the same thing. The first is an assessment. The second is an architectural commitment. Organisations that have the first and not the second have produced a risk register, not a governance posture.
Why Shift-Left Fails Here
The pattern is visible in how product security programmes report on their own health. Density and volume metrics (total open findings, scan coverage percentages, tools deployed) look credible in a slide but do not tell you whether the programme is keeping its commitments. SLA adherence (findings closed within their SLA window, divided by findings due) is the metric that reveals whether anything is actually getting fixed. The same structural problem appears one layer earlier, at design time. A threat model produced at the start of a product build satisfies a process gate. But if it does not produce specific architectural decisions, it has produced the appearance of security governance without the substance.
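The SLA adherence ratio described above can be sketched in a few lines. This is a minimal illustration, not a reporting standard: the `Finding` type, its field names, and the choice to treat "findings due" as every finding whose deadline has passed by the reporting date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    due: date                      # SLA deadline for remediation (hypothetical field)
    closed: Optional[date] = None  # None while the finding is still open


def sla_adherence(findings: list[Finding], as_of: date) -> Optional[float]:
    """Findings closed within window divided by findings due by `as_of`."""
    due = [f for f in findings if f.due <= as_of]
    if not due:
        return None  # nothing has fallen due yet, so the ratio is undefined
    on_time = sum(1 for f in due if f.closed is not None and f.closed <= f.due)
    return on_time / len(due)


findings = [
    Finding(due=date(2024, 3, 1), closed=date(2024, 2, 20)),  # closed in window
    Finding(due=date(2024, 3, 1), closed=date(2024, 3, 15)),  # closed late
    Finding(due=date(2024, 3, 10)),                           # overdue, still open
    Finding(due=date(2024, 6, 1)),                            # not yet due, excluded
]
adherence = sla_adherence(findings, as_of=date(2024, 4, 1))  # one of three due
```

Note that the late closure and the overdue open finding both count against the ratio, which is exactly what a volume metric such as "total open findings" would hide.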
This is where shift-left fails for AI products specifically. The PM and the security team are both present at design time. They are solving different problems, and each arrives with its own question. Neither question produces the design-time contract. The contract requires a third question: what is this agent explicitly permitted to do, and how is that permission enforced at the architectural level rather than the policy level?
What the Contract Covers
A design-time contract for an AI agent in a regulated product is not a lengthy governance document. It is a precise set of architectural commitments across four dimensions.
Scope. What data sources, tools, and external services the agent is permitted to access. Not a list of what it might access, but a scoped permission set that the architecture enforces. An agent that can call any tool available in the environment is not scoped. An agent that can call a defined set of tools, with defined input and output constraints, is.
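One way to picture scope enforced by the architecture rather than by a list is a wrapper that sits between the agent and its tools. The sketch below is illustrative only: `ScopedToolbox`, `ToolGrant`, `ScopeViolation`, and the stub `get_balance` tool are hypothetical names, and only input constraints (permitted parameter names) are shown. Output constraints would need a similar check on the return value.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ScopeViolation(Exception):
    """Raised when the agent requests something outside its contract."""


@dataclass(frozen=True)
class ToolGrant:
    handler: Callable[..., Any]
    allowed_params: frozenset  # input constraint: parameter names the agent may pass


class ScopedToolbox:
    """The only route from the agent to its tools; anything not granted is refused."""

    def __init__(self, grants: dict[str, ToolGrant]):
        self._grants = dict(grants)

    def call(self, tool: str, /, **params: Any) -> Any:
        grant = self._grants.get(tool)
        if grant is None:
            raise ScopeViolation(f"tool not in contract: {tool}")
        unexpected = set(params) - grant.allowed_params
        if unexpected:
            raise ScopeViolation(f"parameters not in contract: {sorted(unexpected)}")
        return grant.handler(**params)


def get_balance(account_id: str) -> dict:
    # Stub standing in for a real data source.
    return {"account_id": account_id, "balance": 100.0}


toolbox = ScopedToolbox(
    {"get_balance": ToolGrant(get_balance, frozenset({"account_id"}))}
)
result = toolbox.call("get_balance", account_id="A-1")
```

The design choice that matters is default-deny: a tool that exists in the environment but is absent from the grants raises `ScopeViolation` instead of being quietly available.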
Duration. Under what conditions the agent's permissions are active. Break-glass access applied to AI agents means access that is tightly scoped, time-limited, and logged on every invocation. Most deployed agents have no equivalent. They are on, always, with the full permission set granted at deployment.
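A time-limited, logged grant might look something like the following sketch. The names (`GrantLedger`, `TimedGrant`, the `ticket-123` reason string) are hypothetical, and the clock is injectable purely so the expiry behaviour can be demonstrated without waiting. A real system would also need durable storage for the audit log.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedGrant:
    scope: frozenset   # tool names covered by this grant
    expires_at: float  # deadline on the ledger's clock
    reason: str        # why the grant was activated


class GrantLedger:
    """Permissions are off by default and expire on their own."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._grant = None
        self.audit_log = []

    def activate(self, scope, ttl_seconds: float, reason: str) -> None:
        self._grant = TimedGrant(frozenset(scope), self._clock() + ttl_seconds, reason)
        self.audit_log.append(("activated", reason, sorted(scope)))

    def check(self, tool: str) -> bool:
        g = self._grant
        allowed = g is not None and self._clock() < g.expires_at and tool in g.scope
        # Every invocation is logged, whether it was allowed or refused.
        self.audit_log.append(("invocation", tool, allowed))
        return allowed


now = [0.0]  # fake clock so the example is deterministic
ledger = GrantLedger(clock=lambda: now[0])
ledger.activate({"read_ledger"}, ttl_seconds=300, reason="ticket-123 (hypothetical)")
inside_window = ledger.check("read_ledger")   # within the 300-second window
now[0] = 301.0
after_window = ledger.check("read_ledger")    # window has elapsed
```

Here `inside_window` is true and `after_window` is false, and both checks appear in `audit_log` alongside the activation. The contrast with an always-on agent is that permission lapses without anyone having to remember to revoke it.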
Behavioural limits. What the agent is explicitly not permitted to produce. For a regulated financial product this means documented constraints on output types: no hallucinated citations, no extrapolated financial advice beyond the product's scope, no data returned outside the residency boundary. These limits exist in policies. The design-time contract encodes them in the system.
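Encoding behavioural limits in the system can be as simple as a gate that every output must pass before it leaves the agent. The sketch below is an assumption-heavy illustration: `AgentOutput`, `ALLOWED_KINDS`, `RESIDENCY_BOUNDARY`, and the example identifiers are hypothetical, the output type is assumed to be labelled upstream, and checking citations against the retrieved sources catches only citations with no backing source, not every form of hallucination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentOutput:
    kind: str                # output type, assumed to be labelled upstream
    text: str
    citations: tuple = ()    # source identifiers the agent claims to rely on
    data_region: str = "eu"  # region of the data included in the response


# Hypothetical contract values for a single product.
ALLOWED_KINDS = frozenset({"account_summary", "product_faq"})
RESIDENCY_BOUNDARY = frozenset({"eu"})


def violations(output: AgentOutput, retrieved_sources: frozenset) -> list[str]:
    """Return every contract limit the output breaches; an empty list means pass."""
    found = []
    if output.kind not in ALLOWED_KINDS:
        found.append(f"output type not permitted: {output.kind}")
    unknown = [c for c in output.citations if c not in retrieved_sources]
    if unknown:
        found.append(f"citations not backed by a retrieved source: {unknown}")
    if output.data_region not in RESIDENCY_BOUNDARY:
        found.append(f"data outside residency boundary: {output.data_region}")
    return found


sources = frozenset({"doc-17"})
ok = AgentOutput("product_faq", "Fees are listed in the tariff.", ("doc-17",))
bad = AgentOutput("investment_advice", "Buy now.", ("doc-99",), "us")
ok_violations = violations(ok, sources)    # no breaches
bad_violations = violations(bad, sources)  # breaches all three limits
```

Returning the full list of breaches, rather than stopping at the first, gives the evidence trail something specific to record when an output is blocked.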
Evidence trail. What the agent logs, not at deployment but at inference time. Model version in production at a specific timestamp. Input data at the point of decision. Output produced. Override events. This is the infrastructure that makes the evidentiary chain from the AI Accountability series possible. It cannot be retrofitted after deployment without significant cost. It must be built at design time or it will not exist when it is needed.
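A per-inference record covering those four items could be sketched as follows. The function and field names are hypothetical, and inputs and outputs are assumed to be JSON-serialisable. Linking each record to the previous one by hash is an addition for illustration, one possible way to make after-the-fact edits detectable. It is not something the frameworks discussed here prescribe.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(payload: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def inference_record(model_version, inputs, output, prev_hash, override=None, now=None):
    """Build one log entry at inference time, linked to the previous entry."""
    body = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "model_version": model_version,  # version in production at this moment
        "inputs": inputs,                # input data at the point of decision
        "output": output,                # output produced
        "override": override,            # human override event, if any
        "prev_hash": prev_hash,          # link to the preceding record
    }
    return {**body, "hash": _digest(body)}


def chain_is_intact(records) -> bool:
    """Check that no record was altered, dropped, or reordered."""
    prev = None
    for r in records:
        body = {k: v for k, v in r.items() if k != "hash"}
        if r["prev_hash"] != prev or r["hash"] != _digest(body):
            return False
        prev = r["hash"]
    return True


first = inference_record("model-2024-05-01", {"query": "fees?"}, "See tariff.", prev_hash=None)
second = inference_record(
    "model-2024-05-01", {"query": "limits?"}, "Declined.",
    prev_hash=first["hash"],
    override={"by": "analyst-7", "action": "replaced output"},
)
log = [first, second]
```

Because each record is written when the decision is made, the log answers "which model, on what input, produced what output" directly, instead of requiring that answer to be reconstructed later.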
The Forcing Function
The EU AI Act's technical documentation requirements for high-risk systems and the PRA's SS1/23 model risk management principles both describe something that functions as a design-time contract: documented parameters, scoped permissions, evidence of continuous monitoring against those parameters. Neither framework uses that term. Both frameworks require the substance.
What they also establish is a timing obligation. The documentation must exist before deployment, not be assembled in response to an examination. An organisation that waits until a regulatory examination to produce this documentation will find that the evidence standard requires a connected record (parameters linked to monitoring data, linked in turn to decision logs) that cannot be reconstructed from documentation written after the fact.