AI in the financial sector is no longer experimental. Institutions are already deploying AI across internal productivity tools, customer support, compliance, fraud detection, and risk management workflows. Yet despite years of investment and large-scale transformation programs, most organizations continue to face the same structural constraints: governance, auditability, third-party risk, and ultimately, trust.
These constraints surfaced clearly at a recent Mila-hosted gathering of Canadian financial leaders, regulators, insurers, and AI researchers focused on the next phase of AI adoption in finance.
What emerged was a growing imbalance: AI capabilities are advancing at extraordinary speed, from copilots to increasingly autonomous agentic systems, while governance frameworks, regulatory processes, and institutional readiness continue to evolve far more slowly.
The Paradox of Shadow AI
Some organizations are already exploring agentic AI workflows capable of orchestrating tasks, accessing tools, and automating complex operations. For example, TD recently announced an agentic AI system designed to automate parts of the mortgage and home equity lending process, reducing some workflows from hours to minutes while explicitly emphasizing oversight and responsible deployment.
At the same time, many institutions continue to impose strict limitations on internal use of tools such as ChatGPT, Claude, Gemini, or Copilot due to unresolved concerns around privacy, data sovereignty, and control over sensitive information. Despite these restrictions, “shadow AI” is becoming increasingly common, as employees independently adopt public tools to accelerate daily work, introducing new compliance risks.
Complicating matters further, nearly all institutions now depend in some capacity on third-party AI providers, infrastructure, or models. While this accelerates innovation, it also introduces compounded exposure to vendor opacity, auditability gaps, and limited visibility into how data and decisions flow across increasingly interconnected systems.
The figure below shows that banks now rely on an increasingly diverse array of third-party AI vendors, with the number of non-major providers doubling since last year. This growing dependence on a wider range of suppliers significantly heightens the complexity of interconnected risks across the financial sector.

The Human Pressure Behind AI Adoption
Beyond technology, organizations are also navigating an important human transition. Executives are under pressure to deliver AI-driven efficiency gains and demonstrate return on investment, while employees are simultaneously curious, cautious, and uncertain about how their roles will evolve. In many institutions, middle management has become the operational layer responsible for translating ambitious AI strategies into workable day-to-day processes while balancing compliance, productivity, and workforce concerns.
Many employees are now expected to supervise systems they did not design, do not fully understand, and are not formally trained to govern. As AI systems become more embedded in workflows, questions around expertise, human oversight, and decision authority grow harder to resolve in practice.
Why Governance Is Becoming Technical
Financial institutions are increasingly requiring real-time governance mechanisms embedded directly into AI systems: systems that can detect policy violations, constrain or block unsafe behavior, and escalate uncertain or high-risk cases to human review. In practice, this means “guardrails” are becoming core infrastructure: programmable constraints and monitoring layers that shape what models can and cannot do in production.
Governance for AI systems is shifting from a final-stage compliance exercise to an architectural necessity, built into the core design of AI systems rather than applied after the fact. The path to trustworthy AI depends less on static policy documents and more on dynamic technical infrastructure that enables continuous oversight of AI behavior.
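To make the idea of a programmable guardrail concrete, here is a minimal sketch of a check that sits between a model and the user and returns one of three actions: allow, block, or escalate to human review. The patterns, keywords, and names (`BLOCK_PATTERNS`, `ESCALATE_KEYWORDS`, `check_output`) are illustrative assumptions, not a description of any institution's actual controls.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class Decision:
    action: Action
    reason: str


# Hypothetical policy rules: text that looks like a 16-digit card number
# or a 9-digit identifier written in three groups of three.
BLOCK_PATTERNS = {
    "card_number": re.compile(r"\b(?:\d[ -]?){16}\b"),
    "id_number": re.compile(r"\b\d{3}[ -]\d{3}[ -]\d{3}\b"),
}

# Hypothetical topics that a human should review before release.
ESCALATE_KEYWORDS = ("loan approval", "credit limit", "account closure")


def check_output(text: str) -> Decision:
    """Apply block rules first, then escalation rules, else allow."""
    for name, pattern in BLOCK_PATTERNS.items():
        if pattern.search(text):
            return Decision(Action.BLOCK, f"policy violation: {name}")
    lowered = text.lower()
    for keyword in ESCALATE_KEYWORDS:
        if keyword in lowered:
            return Decision(Action.ESCALATE, f"high-risk topic: {keyword}")
    return Decision(Action.ALLOW, "no rule matched")


if __name__ == "__main__":
    for sample in (
        "Your card 4111 1111 1111 1111 is on file.",
        "We recommend a credit limit increase.",
        "Branch hours are 9 to 5.",
    ):
        decision = check_output(sample)
        print(decision.action.value, "-", decision.reason)
```

A production guardrail would likely combine rules like these with classifiers and audit logging. The point of the sketch is only that the policy becomes code that runs on every output, rather than a document reviewed after deployment.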
The Emerging Evaluation Gap
One challenge surfaced repeatedly during discussions: evaluation. Most organizations still lack reliable methods to evaluate modern AI systems in realistic financial environments. Traditional benchmarks are insufficient in settings where model behavior drifts over time, hallucination risks persist, and traceability is required across complex decision chains.
The challenge becomes significantly more acute with agentic systems. Evaluating a single model output is already non-trivial. Evaluating an autonomous system capable of browsing, calling tools, modifying information, or interacting across workflows is an entirely different class of problem.
Approaches such as LLM-as-a-judge offer partial solutions, but introduce trade-offs: increased latency, higher cost, and limited suitability for real-time decisioning. As a result, institutions are converging toward hybrid architectures that distribute evaluation across lightweight detectors, model-based adjudication, and human escalation depending on risk level.
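One way to picture such a hybrid architecture is as a router that sends each output to the cheapest layer able to handle its risk level. The sketch below is an assumption-laden illustration: the keyword-based `lightweight_detector`, the thresholds, and the `stub_judge` function (a stand-in for a real LLM-as-a-judge call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    stage: str   # layer that handled the output
    status: str  # "approved", "rejected", or "pending"


# Hypothetical terms that raise the risk score of an output.
RISKY_TERMS = ("guarantee", "transfer", "wire")


def lightweight_detector(output: str) -> float:
    """Cheap heuristic: fraction of risky terms present, in [0, 1]."""
    lowered = output.lower()
    hits = sum(term in lowered for term in RISKY_TERMS)
    return hits / len(RISKY_TERMS)


def evaluate(
    output: str,
    judge: Callable[[str], bool],
    low: float = 0.3,
    high: float = 0.6,
) -> Verdict:
    """Route by risk: detector only, then model judge, then human."""
    score = lightweight_detector(output)
    if score < low:
        # Low risk: the fast detector is sufficient.
        return Verdict("detector", "approved")
    if score < high:
        # Medium risk: pay the latency and cost of a model-based judge.
        approved = judge(output)
        return Verdict("model_judge", "approved" if approved else "rejected")
    # High risk: hold the output until a person decides.
    return Verdict("human_review", "pending")


def stub_judge(output: str) -> bool:
    """Stand-in for an LLM judge; rejects outputs that promise outcomes."""
    return "guarantee" not in output.lower()


if __name__ == "__main__":
    for sample in (
        "Your balance is available online.",
        "We can wire the funds tomorrow.",
        "We guarantee the wire transfer clears today.",
    ):
        print(evaluate(sample, stub_judge))
```

The design choice illustrated here is that the expensive judge runs only on the middle band of risk, which keeps latency low for routine outputs while reserving human attention for the cases most likely to need it.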
Shared Risks Require Shared Defenses
As AI capabilities advance, financial institutions are increasingly recognizing that certain risks, such as cybersecurity, model failure, and systemic AI behavior, are not competitive differentiators. They are shared exposures. Because institutions largely rely on the same third-party foundation models and operate within highly interconnected markets, a single vulnerability or algorithmic failure can quickly cascade across the entire sector.
Yet most organizations are still building parallel infrastructure to solve similar problems in isolation, resulting in duplicated effort and fragmented standards. This is driving growing interest in shared benchmarks, common evaluation methodologies, and collaborative red-teaming initiatives.
The goal is not only efficiency, but acceleration: enabling the sector to converge more quickly on robust and trustworthy AI deployment practices, and to build shared defenses against adversarial attacks that no single institution can effectively address alone.
Bridging the Gap Between Research and Industry
This creates an opportunity for research institutions like Mila, as the current AI challenges in finance require advances in:
- AI evaluation and benchmarking
- Agentic system safety
- Robust guardrails
- Privacy-preserving AI
- Synthetic data generation
- Human-AI oversight
- Interpretable risk management systems
Collaborations between research institutions and industry are thus becoming a defining factor not only for individual advancements, but also for public trust, consumer protection, and systemic stability in the financial systems that underpin the broader economy and society.
The “Mila x Finance: The Era of Agents, Risk, and Consumer Protection” report captures how Canadian financial institutions are navigating these tensions, and highlights a growing need for engagement among researchers, industry, and regulators as AI becomes embedded in critical financial infrastructure.