This article starts with an admission: it does not come from a settled position of expertise. It comes from preparing a doctoral research proposal, working through the existing literature on AI governance, and arriving at a question that the literature did not cleanly answer.

That question, stated simply, is this: in a regulated organisation where an AI agent is making consequential decisions (approving or declining financial applications, classifying risk, generating compliance documentation), who is actually accountable for what that agent does? Not in the legal sense of ultimate liability. In the operational sense of day-to-day ownership. Who decides what the agent is allowed to do? Who notices when it drifts? Who answers for it when something goes wrong?

The more carefully this question is examined, the less obvious the answer becomes. And that, in itself, seems worth documenting.

How This Question Formed

The academic literature on AI governance is growing quickly and is, in places, genuinely rigorous. The University of Turku's Hourglass Model maps governance tasks across the full AI system lifecycle, from environmental and organisational constraints down to system-level controls. The NIST AI Risk Management Framework provides a structured vocabulary for thinking about AI risk. The EU AI Act, whatever its implementation challenges, represents the first serious attempt to translate AI governance principles into binding legal obligations.

What the literature is less clear on is internal organisational accountability. The frameworks describe what governance should achieve. They are, with some exceptions, quieter on who carries the responsibility for achieving it when an AI agent is embedded in a live product rather than managed as a standalone research project or documented IT asset.

That silence is not a failure of the research. It reflects a genuine open question, one that organisations are navigating in real time, often without a clear internal model for how to do it.

The observable pattern, across financial services in particular, is that product management and governance have historically operated as adjacent but separate functions. Product managers shape what gets built. Security, compliance, and risk teams govern the environment it gets built in. This division of responsibility is rational, and for most of the software era it has worked well enough.

The question this research is organised around is whether that arrangement is still adequate when the product includes AI agents: systems that do not merely execute predefined logic but infer, decide, and act within parameters that were only partially specified at design time. And the early indication, from both the literature and observable industry practice, is that the answer may be no.

The Boundary That Made Sense, Until Recently

To understand the gap, it helps to be precise about the professional arrangement it emerged from.

In a well-structured financial institution, the security perimeter around a product is established early, typically during MVP design, in conversations between solution architects and security teams that happen before a product manager writes a feature brief. Encryption standards, network protocols, authentication requirements, data residency constraints: these are decided upstream and assumed to carry forward into everything built on top of them.

This is not negligence. It is specialisation. A product manager building a lending journey or a payments flow does not need to specify TLS configuration. That decision belongs to people with the expertise and the mandate to make it correctly. The PM's job is to understand the regulatory constraints that shape the product and work within the governance structure that the organisation has built around it.
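
To make that division concrete, here is a minimal sketch of the arrangement in Python: perimeter decisions captured once, upstream, as a baseline that every feature inherits rather than renegotiates. The field names and values are illustrative assumptions, not any organisation's actual standard.

```python
# Illustrative security baseline, decided upstream by security architecture.
SECURITY_BASELINE = {
    "tls_min_version": "1.3",        # set by security teams, not the PM
    "encryption_at_rest": "AES-256",
    "auth_protocol": "oidc",
    "data_residency": "eu-central",
}

def new_feature(name: str, overrides: dict | None = None) -> dict:
    """Features inherit the perimeter; the PM never specifies it directly."""
    return {"feature": name, **SECURITY_BASELINE, **(overrides or {})}

print(new_feature("lending-journey"))
```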

The result is a professional division that serves everyone reasonably well: product teams focus on value delivery, governance teams focus on risk and compliance, and the assumption is that the two lanes run in parallel without needing to merge.

That assumption rests on a specific condition: that the system being governed behaves predictably. A deterministic feature (a form submission, a calculation, a document-generation workflow) does what it was built to do. The security perimeter drawn at MVP stage remains valid because the system's behaviour stays within the parameters that were assessed.
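
The contrast can be made concrete. In the sketch below, the repayment calculation behaves exactly as assessed at design time; the agent call, represented by a hypothetical `call_agent` stub, carries no such guarantee, which is why its output needs a runtime guard. The stub and its decision labels are assumptions for illustration only.

```python
import random

def monthly_repayment(principal: float, annual_rate: float, months: int) -> float:
    """Deterministic: same inputs, same output, behaviour fixed at design time."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

def call_agent(task: str, payload: dict) -> str:
    """Hypothetical stand-in for a deployed agent; real output varies with
    model version, prompt, and context, not just with the inputs below."""
    return random.choice(["approve", "decline", "refer", "needs more info"])

def assess_application(application: dict) -> str:
    """Agentic: the decision logic is inferred, not enumerated at design time."""
    decision = call_agent(task="credit_assessment", payload=application)
    if decision not in {"approve", "decline", "refer"}:
        return "refer"  # runtime guard: unexpected output falls back to a human
    return decision

print(monthly_repayment(25_000, 0.06, 48))                       # always the same figure
print(assess_application({"income": 52_000, "amount": 25_000}))  # may differ per run
```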

This is where the existing accountability structure begins to show the strain. The governance framework was designed for a system that stays where you put it. An agent does not: its behaviour can shift with a model update, a prompt revision, or a change in the data it encounters, without any code change that would trigger reassessment. And the question of who notices, who decides what to do about it, and who is accountable for the outcome does not have a clean answer in most organisations operating today.
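
As a sketch of what "noticing" could even mean operationally: a scheduled comparison of the agent's recent decision mix against the mix observed at sign-off, with an alert routed to a named owner. The threshold, window, and owner below are assumptions; the open question is precisely who that owner should be.

```python
from dataclasses import dataclass

@dataclass
class DriftCheck:
    baseline_approval_rate: float   # rate observed during governance assessment
    tolerance: float = 0.05         # assumed acceptable deviation, agreed at sign-off
    owner: str = "product-manager"  # the open question: who should receive this?

    def evaluate(self, recent_decisions: list[str]) -> str | None:
        rate = sum(d == "approve" for d in recent_decisions) / len(recent_decisions)
        if abs(rate - self.baseline_approval_rate) > self.tolerance:
            return (f"Drift alert for {self.owner}: approval rate {rate:.0%} "
                    f"vs assessed baseline {self.baseline_approval_rate:.0%}")
        return None

check = DriftCheck(baseline_approval_rate=0.62)
print(check.evaluate(["approve"] * 48 + ["decline"] * 12))  # 80% approvals -> alert
```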

What This Looks Like in Practice

The clearest illustration of this gap is not a dramatic system failure. It is a much quieter, more common problem: the accumulation of unresolved risk.

In most regulated organisations, risk registers and GRC tools contain a significant backlog of items that have been identified, assessed, rated, and then deprioritised because the product roadmap could not absorb them. This is not unusual, and it is not inherently unreasonable: not every risk can be addressed immediately. The problem is structural: the GRC tool and the product roadmap have historically operated in separate worlds, and there is rarely a forcing function that requires the two to be reconciled before a new product decision is made.
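
One shape such a forcing function could take is a gate that reconciles the two worlds mechanically: a roadmap item cannot proceed while open high-severity register entries are attached to the same product. A minimal sketch follows, assuming simplified field names that a real GRC export would not match exactly.

```python
BLOCKING_SEVERITIES = {"high", "critical"}  # assumed severity scale

def roadmap_gate(feature: dict, risk_register: list[dict]) -> bool:
    """Return True if the feature may proceed; print what blocks it otherwise."""
    blockers = [r for r in risk_register
                if r["product"] == feature["product"]
                and r["status"] == "open"
                and r["severity"] in BLOCKING_SEVERITIES]
    for r in blockers:
        print(f"Blocked by {r['id']} ({r['severity']}): {r['title']}")
    return not blockers

register = [{"id": "RISK-104", "product": "lending", "status": "open",
             "severity": "high", "title": "Agent decisions lack traceable logging"}]
print(roadmap_gate({"product": "lending", "name": "auto-decisioning-v2"}, register))
```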

The result is that governance becomes consistently downstream of product decisions. By the time a compliance finding surfaces (in a penetration test, an internal audit, or a regulatory inspection), the architecture it relates to has already been built around a different assumption. Remediation at that stage is expensive, disruptive, and sometimes architecturally difficult.

For AI agents specifically, this dynamic is more consequential. Industry analysis of enterprise AI deployments in financial services has documented a recurring pattern of governance-related failures, not because the underlying technology was inadequate, but because the accountability infrastructure around the deployment was not built to match the system's actual risk surface. The agent worked. The governance of the agent did not.

A wealth management platform that deploys an AI robo-advisor faces a version of this problem that is not resolved by existing compliance infrastructure. The agent can be instructed to align recommendations to a client's stated risk profile. What the compliance sign-off cannot specify is how the agent balances that stated profile against the client's observed behaviour over time, how it responds to a market event that falls outside its training distribution, or what it does when a client's circumstances change faster than the model's context window can accommodate. These are not exotic edge cases. They are the normal operating conditions of a live financial product.
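
Those operating conditions can at least be bounded at runtime. The sketch below compares a recommendation's risk band against the client's stated profile and against the profile inferred from observed behaviour, and escalates on divergence. The banding, and the escalation rule itself, are illustrative assumptions rather than anything a compliance sign-off currently specifies.

```python
RISK_BANDS = {"defensive": 1, "balanced": 2, "growth": 3, "aggressive": 4}  # assumed scale

def review_recommendation(stated_profile: str, observed_profile: str,
                          recommendation: str) -> str:
    """Escalate when the recommendation exceeds the stated tolerance, or when
    stated and observed behaviour have drifted apart by more than one band."""
    if RISK_BANDS[recommendation] > RISK_BANDS[stated_profile]:
        return "escalate"  # riskier than the profile assessed at sign-off
    if abs(RISK_BANDS[stated_profile] - RISK_BANDS[observed_profile]) > 1:
        return "escalate"  # stated and observed profiles no longer agree
    return "deliver"

print(review_recommendation("balanced", "aggressive", "growth"))  # -> "escalate"
```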

Under the EU AI Act, a system of this kind may be classified as high-risk, requiring conformity assessment, technical documentation, human oversight mechanisms, and logging that allows decisions to be traced and explained. Who builds those requirements into the product, not as a compliance afterthought but as architectural commitments made at design time? That is, again, the question without a settled answer.
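
What "logging that allows decisions to be traced" might mean as an architectural commitment: a structured record, written at decision time, that ties each outcome to an assessed model version and a named human oversight contact. The field set below is an assumption for illustration, not the Act's prescribed schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def decision_record(model_version: str, inputs: dict, output: str,
                    oversight_contact: str) -> str:
    """One traceable record per consequential decision."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,   # ties the decision to an assessed system
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "output": output,
        "oversight_contact": oversight_contact,  # the human oversight mechanism
    })

print(decision_record("credit-agent-2.3.1",
                      {"application_id": "A-1029", "amount": 25_000},
                      "refer", "lending-product-owner"))
```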

The Regulatory Environment Is Hardening

The governance gap described above is not new. What is new is the regulatory context around it.

BaFin, Germany's financial regulator, has signalled through its guidance that AI governance in banks is a management-level accountability, not an IT accountability. DORA, the EU's Digital Operational Resilience Act, treats AI systems as part of the ICT estate, bringing their governance within the scope of its operational resilience requirements for financial entities. The EU AI Act creates specific conformity obligations for high-risk AI systems that cannot be met through policy documentation alone: they require architectural choices, logging infrastructure, and human oversight mechanisms that must be present in the system's design.

What this regulatory hardening means for product teams is that the informal accountability arrangement, where governance is assumed to be someone else's lane, is becoming less viable. Regulators are asking specific questions about who owns AI system behaviour, and organisations that cannot answer those questions clearly are finding that the documentation they have prepared does not satisfy the evidence standard being applied.

Where the Research Is Focused

The doctoral research this series is connected to is attempting to develop a more precise account of what AI agent governance accountability looks like inside product organisations, not as a theoretical framework, but as a set of questions that can be tested against real organisational contexts.

The central question is whether the product manager is the natural owner of AI agent governance accountability, and if so, what that ownership requires in practice. The initial hypothesis, formed from the literature review and from observable industry patterns, is that the product function carries the accountability by default, because it carries the broadest responsibility for the product's behaviour. But carrying accountability by default is not the same as being equipped to exercise it.

A secondary question is whether the existing tooling and framework landscape, which the next article in this series examines in detail, provides the infrastructure that a governance-accountable product manager would need. The early indication is that it provides significant capability in adjacent areas, but leaves a gap precisely where the product function's accountability sits.

These are working hypotheses, not conclusions. They will be tested, revised, and in some cases probably abandoned as the research develops. This series documents that process, not because the journey is the point, but because the questions are genuinely open and the field benefits from honest accounts of what practitioners are finding as they work through them.

If you are working through related questions in your own context, whether in product, governance, risk, or engineering, we would welcome the conversation. Not to validate a predetermined position, but because this is a problem that is better understood from multiple vantage points than from any single one.