AI is no longer a novelty within financial services. Those early, heady days are over, almost as quickly as they began. Across the financial sector, AI has transitioned to a must-have within operations, adopted by major institutions eager to integrate it ever deeper into their systems. “AI-native” platforms and “AI-powered” solutions abound, all promising transformative efficiencies.
It seems that every major tech announcement, whether or not it’s related to finance, is about AI. At Money20/20 in June, Microsoft showcased a mind-boggling installation of turnkey AI infrastructure solutions for financial institutions. As Microsoft moves in this direction, it’s likely that customers will follow. The conversation is pivoting from isolated AI pilots to enterprise-scale AI deployments that underpin fraud detection, liquidity forecasting, and credit scoring, amongst many others.
AI innovation arms race
The buzz and enterprise-size appetites have set off an arms race. Visa and Mastercard are moving rapidly into agentic AI commerce, accelerating development of systems where bots, not humans, initiate and complete financial transactions. PayPal (unsurprisingly) is too. An era of machine-to-machine financial decision-making is taking shape under our feet. More adaptable and faster than traditional approaches, AI-powered payment operations have proven successful enough that recent surveys found nearly two-thirds of CFOs now consider AI essential to payment operations.
Amid all this innovation, one critical question remains worryingly unaddressed: How secure are these AI systems? As one neobank CEO I met put it: “AI isn’t really the core concern when it comes to security from a client’s perspective. Clients care that their money is safe, which comes down to strong cryptography and secure systems.”
This surprised me, given some of the other conversations I’ve recently had, which point towards this being a perspective that banks no longer have the luxury of maintaining.
Cryptography, to be sure, hasn’t stopped being essential. Strong encryption protects data at rest and in transit. It does not, however, account for the logic engines that make decisions about fund transfers, loan approvals, or compliance flags. It can’t automatically catch and counter fraudsters using generative AI, deepfakes, and voice cloning to evade detection.

US Tariffs are shifting - will you react or anticipate?
Don’t let policy changes catch you off guard. Stay proactive with real-time data and expert analysis.
By GlobalDataIn an agentic system, risk is not just about someone intercepting the data. Risk also comes from incorrect or manipulated decision-making, especially when systems interpret and act on ambiguous or adversarial inputs.
Security is now a question of behaviour
AI systems don’t operate like traditional software. In our research lab, we refer to this as the “security gene.” Just as biological systems can carry hidden traits, AI models develop behavioural patterns during training that do not appear in code reviews and only become visible in specific runtime scenarios.
Consider a chatbot embedded in a financial services platform, configured specifically to avoid discussing competitors. If the model was trained on internal documents that contain competitive analysis, an attacker could compromise this configuration and manipulate the prompt to surface that information anyway. This form of leakage does not require a code-level exploit, just clever linguistic framing.
We tested this in a recent exercise. By gradually building rapport with a sales assistant chatbot, the model was persuaded to recommend a rival product. This wasn’t a technical failing of the guardrails, just an instance in which the AI interpreted empathy as the higher priority. It acted as it was trained to—helpfully, not defensively.
AI systems as attack gateway
In more serious scenarios, the attack paths become clearer and more dangerous. In another test, we simulated a database (SQL) injection through a chatbot interface. In summary, after probing the chatbot, we inferred that there was a backend database. We inserted obfuscated SQL commands into messages, identified the database and then exploited poor input handling to trigger unauthorised queries.
These are not theoretical risks. If left untested, these technical vulnerabilities won’t remain isolated. They’ll manifest as broader organisational risks that span beyond the engineering domain. When viewed through the lens of operational impact, the consequences of insufficient AI security testing map directly to failures in safety, security, and business operations.
Safety risks occur when AI systems produce outputs that are inaccurate, harmful, or misaligned with intent. This can result in reputational damage, especially in customer-facing environments or regulatory contexts.
Security risks arise from direct exploitation. This includes prompt manipulation, unauthorised access via plugins, or data leakage through context persistence. Such vulnerabilities often exploit the model’s design logic, token handling, or interface assumptions.
Business risks emerge when AI systems do not meet operational or compliance standards. These failures can lead to regulatory penalties, service disruptions, or unanticipated costs. When AI systems are deployed without proper assurance processes, particularly within automated decision-making workflows, these risks increase significantly.
Why traditional security tools fall short
Many AI vulnerabilities stem from factors that static code analysis and conventional scanners aren’t designed to detect. They arise from model behaviour, integration errors, or live context manipulation. The AI stack has moved beyond what current AppSec practices were designed to manage. Approaching AI testing as if it were a code library inevitably produces blind spots.
Some vulnerabilities surface only during live interactions with user input. Others emerge through interconnected systems, such as plugins or document retrievers, where privilege escalation can occur. Data privacy issues often stem from memory features or context retention. These factors cannot be audited effectively unless systems are tested under realistic traffic and adversarial pressure.
These technical issues carry strategic implications. A chatbot revealing proprietary pricing models is tantamount to a governance failure. If an autonomous agent incorrectly approves a loan due to poisoned input, it opens the door to a regulatory breach. The risks migrate from engineering teams to the executive suite.
Regulators are beginning to respond. The Bank of England has flagged concerns about the use of autonomous AI in trading. If systems behave in synchronised or manipulated ways, volatility and systemic risk can follow. At the European level, new rules will soon require evidence of safety testing and compliance monitoring for high-risk AI.
Third-party AI doesn’t transfer accountability
It’s tempting to assume that using AI services or agents from a reputable foundational model provider comes with built-in security. But just as cloud service providers (CSPs) don’t guarantee the security of the applications you build on their platforms, foundational model providers don’t assume liability for how their AI is used—or misused—in your environment.
The responsibility for security, compliance, and reputation always rests with the organisation that deploys the AI.
We’ve already seen what happens when companies try to shift the blame. Not too long ago, Air Canada was forced to honour a refund policy “invented” by its AI-powered customer support chatbot, which promised a $1 fare adjustment that didn’t exist. The airline attempted to argue that the AI made the error—but the court rejected that defence, holding Air Canada responsible for the system it deployed.
The same applies in finance. If an AI agent incorrectly recommends a loan product or leaks sensitive pricing data, it’s not the model provider that regulators or customers will hold accountable. It’s the institution that integrated and deployed the system. Financial firms must treat third-party AI the same way they treat any external code dependency: with rigorous testing, continuous validation, and clear governance.
Guardrails alone do not provide assurance
Some security leaders rely on guardrails to contain model behaviour. These can serve as a first line of defence but are just as easily bypassed using adaptive or iterative prompts. This isn’t a failure of sophistication but a question of whether the model prioritises helpfulness over restriction.
Financial services have always depended on trust, and now this trust must extend to the AI systems managing core operations. Models must not only perform well during training. Their operational behaviour must be observable, testable, and explainable. Institutions need AI-specific security practices that match the complexity of the systems they now depend on.
Adversarial testing, integration-layer simulations, and runtime observability must be embedded into deployment pipelines. AI security practices must become part of core engineering and risk governance in the same way DevSecOps practices became essential to software resilience.
The significant benefits AI introduces hinge on control. Without a clear understanding of how AI behaves in dynamic environments, organisations increase their exposure to silent failures. These failures do not always announce themselves through alerts. They often begin virtually undetected, surfacing only when consequences can no longer be avoided.
Speed isn’t everything. The financial institutions that lead in the next phase won’t be those that adopt AI the fastest but those that implement it securely, with accountability and clarity at every step.
Steve Street is COO and co-founder of Mindgard