Machine Learning and the Coming Transformation of Finance
AI Can Crush Fraud and Transform Trading But You Better Give It the Right Guard Rails to Keep It from Going Off the Tracks
Artificial intelligence has a long history in finance and banking.
When it comes to fraud detection, big financial houses had data scientists before they were called data scientists. Long before anyone heard the words “deep learning,” rule-based expert systems often formed the backbone of fraud detection and trading systems.
Today, with AI and machine learning (ML) growing more and more advanced, they’re poised to surge into other areas of the financial industry: customer service, loan processing, risk analysis, portfolio allocation, robo-advisors and more.
But bringing ML into your business requires deep rethinking on multiple fronts. It’s not enough to port over the old MATLAB workflow your statisticians used. To really build and run machine learning models at financial-services scale, you’ll need a rock-solid foundation: a tech stack that grows with your team. Your existing DevOps framework won’t port to ML, because too many steps in machine learning, like training and experimentation, are different from traditional coding.
That’s the work of the AI Infrastructure Alliance, which has brought together over 45 companies to build a comprehensive, deeply interoperable stack that will power your machine learning today and tomorrow.
But even that’s not enough. You’ll need more than just software. You’ll need people too.
Despite all the promise of machine learning to deliver more flexible responses to changing customer needs, combat fraud, and adapt to rapidly changing markets, machine learning is still in its infancy. It can be superhuman and then make a mistake not even a baby would make. That’s why your systems need a human in the loop.
The heart of a comprehensive human-in-the-loop system is what we call an AI Red Team. They’re the rapid-response team when things go wrong, and they’re always looking for flies in the ointment of your algorithms so they can stop problems before they start.
Let’s take a look at where AI has been in financial companies and where it’s going, then dive into building a modern AI/ML stack and an AI Red Team to make sure your pipelines are fast, flexible and secure.
Crushing Fraud
Fraud continues to tear into the profits of banks and financial institutions everywhere. Where there’s money, there’s crime.
In 2019, financial services companies lost $28.65 billion to fraud, or 6.8 cents for every $100 of total volume. That’s up from $27 billion in losses only a year earlier, and losses are expected to jump to $35 billion by 2025.
Hand-coded logic rules still form the bedrock of anti-fraud systems: rules that fraud fighters codify into algorithmic wisdom based on their experience. They work, but over time they prove brittle. They don’t adapt to changing threats.
If you set a threshold of $500 for flagging an attack, attackers find ways to commit fraud below that level, like stealing thousands of credit cards and making smaller transactions across all of them. Move the threshold to $100 and you drown in false positives. Just as hand-crafted rules failed to stop the deluge of spam in the 1990s, savvy financial criminals find ways to game the rules and slip away into the night with a big payday.
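To see just how gameable a fixed threshold is, here’s a minimal sketch. The $500 cutoff comes from the example above; the transaction fields and amounts are purely illustrative.

```python
# A brittle hand-coded fraud rule: flag any single transaction at or
# above a fixed dollar threshold.

FRAUD_THRESHOLD = 500.00

def flag_transaction(txn: dict) -> bool:
    """Flag a transaction if its amount meets the hard-coded threshold."""
    return txn["amount"] >= FRAUD_THRESHOLD

# Attackers simply split the theft across many stolen cards, keeping
# each charge safely under the threshold:
small_txns = [{"card": f"card-{i}", "amount": 75.00} for i in range(40)]
flagged = [t for t in small_txns if flag_transaction(t)]

total_stolen = sum(t["amount"] for t in small_txns)
# 40 transactions totaling $3,000 in fraud, and not one is flagged.
```

Lowering the threshold just trades missed fraud for a flood of false positives on legitimate purchases, which is exactly the brittleness described above.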
As more and more of our transactions have moved online, it’s only gotten worse, with attackers getting increasingly sophisticated. The age of armed men in ski masks is coming to an end. Armed bank robbery continues to decline.
Today’s bank job is digital.
Skimmers put a card reader and a camera in an ATM, capture the card and clone it, and regular folks like you and me wake up to find our accounts drained without ever knowing why.
Why risk a shootout when you can get ATMs to spit out cash without ever touching them?
That’s what happened for years as the Carbanak gang hit ATMs all over the world. They hacked banks and got ATMs to spew cash; all they had to do to collect the money was send a few guys with bags and masks to pick it up. The leader of the Carbanak gang is in custody, but his malware lives on and $1.2 billion remains missing. Carbanak is still alive and well, morphing in the wild and for sale on the dark net to the highest bidder.
But it doesn’t take highly advanced attacks to slip through the traditional rule-based systems of financial institutions. Identity theft, credit cards for sale by the millions on the dark web, and petty theft dwarf the more sophisticated attacks like Carbanak that attracted the hounds at Interpol. It’s these crimes that largely go undetected and fly under the radar. A small-time criminal stealing 50 credit cards and buying gas and soda at the local 7-Eleven doesn’t trigger an international investigation, but little people get hurt and the financial institution usually just eats the loss.
Trading on the Edge
Algorithmic trading swept into the financial markets in the 1980s with the rise of personal computers. Before that, all trading was discretionary. According to the market research firm Mordor Intelligence, “Algorithmic trading accounted for around 60–73% of the overall United States equity trading” in 2020.
In just 40 years, algorithmic trading went from zero to the majority of the market.
While smaller traders and programmers have gotten in on the game, trading is still dominated by large hedge funds, investment banks and proprietary trading firms. They’ve got the size, the scale and the will to build their own proprietary trading software with dedicated quant teams, data centers and staff.
Early trading systems were mostly “expert systems,” aka human knowledge coded into rules. Experienced traders helped programmers translate their trader’s wisdom into heuristics. But just like hand-coded fraud detection, these systems can struggle over time. They don’t always adapt to black swan events, like COVID or changes in market dynamics. Even worse, a mechanical system may work for years and then fail for reasons that just aren’t clear until later.
That’s what happened to Knight Capital in 2012, a leading financial market maker:
“Knight originally received 212 small orders from retail customers and then mistakenly streamed thousands of orders per second into the NYSE market over a 45 minute period; it executed over 4 million trades in 154 stocks totaling more than 397 million shares and assumed a net long position in 80 stocks of approximately $3.5 billion as well as a net short position in 74 stocks of approximately $3.15 billion. Knight lost over $460 million from these unwanted positions, and by the next day, its own stock price had dropped by 75%.”
Other examples abound. A trading system failure hit Marsten Parker, one of the traders profiled in the latest Market Wizards book. One of his systems worked for seven years, then in 2013 he noticed his short trades were performing terribly. The system began failing catastrophically, was never profitable again, and he had to switch systems. In hindsight, he understood:
“It was the period when people introduced the term BTFD (buy the f*ing dip). What was happening was that the same signals used for my short trades became popular buy signals.”
If traditional fraud and trading systems have failed us, what will give financial houses the edge they need to crush fraud and adapt to the markets faster?
Machine learning.
Machine learning (ML) algorithms are different from hand-coded heuristics. In ML, algorithms learn their own rules, discovering deep patterns in the data.
Older approaches to ML like Support Vector Machines (SVMs), Decision Trees (DTs) and Logistic Regression (LR) are already used across credit card fraud detection systems. But they don’t handle large datasets well and they’re not very adaptable. Increasingly, they’re giving way to modern breakthroughs like Long Short-Term Memory networks (LSTMs), Convolutional Neural Nets (CNNs) and Transformers, which learn from much bigger datasets and adapt more readily to changing threats. ML is the natural evolution of fraud detection.
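The core idea that “algorithms learn their own rules” can be shown with a toy example that needs no ML library: instead of a human picking the fraud threshold, a decision stump learns the cutoff from labeled data. The amounts and labels here are synthetic and purely illustrative.

```python
# Learn a one-rule fraud classifier (a decision stump) from data,
# instead of hand-coding the threshold.

def fit_threshold(amounts, labels):
    """Pick the amount cutoff that best separates fraud (1) from legit (0)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(amounts)):
        preds = [1 if a >= t else 0 for a in amounts]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Synthetic labeled transactions: amounts and whether each was fraud.
amounts = [20, 35, 40, 60, 80, 120, 150, 300, 310, 400]
labels  = [0,  0,  0,  0,  0,  1,   1,   1,   1,   1]

threshold = fit_threshold(amounts, labels)  # → 120, learned from the data
```

Retrain on fresh data and the rule shifts with the fraud patterns; that retrainability, scaled up to deep networks over millions of examples, is the adaptability hand-coded rules lack.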
It’s also the natural evolution of algorithmic trading. Machine learning holds the potential to make trading systems more flexible and more adaptable to changing market dynamics and conditions, especially if the systems are continually learning.
As Ivy Schmerken wrote for FinExtra:
“Machine learning is…[the] next step of algorithmic trading because machine learning identifies patterns and behaviors in historical data and learns from it,” said Robert Hegarty, managing partner, Hegarty Group, a consultancy focusing on financial services, technology, data, and AI/machine learning. While traditional algorithms are created by programmers and quant strategists, these algorithms based on if/then rules do not learn on their own; they need to be updated. “With machine learning, you turn it over to the machine to learn the best trading patterns and update the algorithms automatically, with no human intervention,” said Hegarty. “That’s the big differentiator.”
AI is Already Here
Machine learning isn’t just coming to financial houses, it’s already here.
According to Forbes:
“70% of all financial services firms are using machine learning to predict cash flow events, fine-tune credit scores and detect fraud, according to a recent survey by Deloitte Insights.
54% of Financial Services organizations with 5,000+ employees have adopted AI, according to the latest Economist Intelligence Unit adoption study.”
As financial firms get more comfortable with machine learning in their most advanced departments, they’ll start to adopt it in other areas to deal with the vast treasure trove of structured and unstructured data pouring into their data lakes.
Whether that’s giving customers better answers when they call with questions, or quickly figuring out whether someone qualifies for a loan, machine learning will seep into every aspect of the financial enterprise. It will also revolutionize the areas where it’s already dominant: trading and fraud.
None of this comes without risks, though. Rule-based systems are at least easier to understand: people can inspect and interpret hand-coded rules, but machine learning systems are more opaque, and we don’t always know why a model made the decision it made.
Even worse, as governments take their first stabs at regulation, it’s clear from early drafts of bills in the EU that regulators don’t fully understand how machine learning models work. They’ve drafted vaguely worded bills that will be open to interpretation and create additional compliance complexity.
So does that mean that trading and fraud departments shouldn’t aggressively move into machine learning? Absolutely not, but it does mean that they should move into AI with a clear vision and an understanding of how to mitigate those risks.
Managing the Risks of Machine Learning Today and Tomorrow
Machine learning holds tremendous promise across the financial world. But as machine learning makes its way into more and more aspects of the financial services landscape, it brings with it increased risks and uncertainty. There are four ways to mitigate those risks:
Data governance — controlling who has access to what data
Data lineage/Data versioning — tracking changes and dependencies between datasets as they move through your systems
Scalable platforms — dynamic interoperability between best-in-class tools for ML with clean abstractions and APIs
AI Red Teams — the QA, test, and rapid response team for your production ML practice
The one advantage financial firms have over their attackers is their massive treasure trove of data. As pioneering AI expert Andrew Ng says, “Data is food for AI.” The key is using that data wisely.
Ensuring you have clean, well-curated data sources is the key to machine learning models that behave well. Knowing where that data came from, and who touched it and why, gives you a clear path to tracking down problems. Without that, a simple mistake in your data, such as a corrupted day or week of trading signals, could result in your model learning something totally wrong about the world. To track it down, you have to be able to go back in time and understand what happened and when.
In short, data governance is about policies and procedures. Design a good method for knowing where your data comes from and how it got there. As Tech Target writes:
“Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy.”
A Data Time Machine
Beyond data governance you need data lineage and data versioning. Data versioning is the snapshotting of data at points in time, and data lineage is the tracking of how that data changes over time.
Data versioning has been around for years in file systems like ZFS, but people were limited by their local physical storage: you could only take so many snapshots, and you didn’t have much context for what was in them. Now cloud providers like AWS, Azure and GCP have effectively given us infinite object storage. A machine learning platform like Pachyderm uses a copy-on-write file system on top of that object storage to keep constant snapshots of every change to your data, with essentially infinite snaps.
Copy-on-write is not optional.
You may have heard of metadata stores but they’re a disaster waiting to happen without immutability to back them up. Without a copy-on-write file system that metadata can point to a state in your data that no longer exists.
If you have 50,000 unstructured 10-K filings in a directory, encoded with Word2Vec, and you run 50 experiments on them, and then someone comes along and re-encodes them as GloVe and overwrites the original files, all 50 experiments are now worthless because you can’t recreate them.
Snapshots alone aren’t enough either. It’s like having a backup of your data without knowing what’s in it. You need context. That’s data lineage: Git-like commits that track every step of the data as it changes alongside the models and the code. Now you can roll backwards and forwards with your data lineage time machine to find exactly when data got corrupted or when things changed in a way that broke your model.
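The combination of copy-on-write immutability and Git-like commits can be sketched in a few lines. This is a toy, in-memory illustration of the idea, not how any real platform such as Pachyderm implements it; the class and field names are invented for the example.

```python
# A minimal sketch of content-addressed, immutable data snapshots with
# a commit log: the "data time machine" described above.

import hashlib
import json

class DataRepo:
    def __init__(self):
        self.objects = {}   # content hash -> immutable snapshot bytes
        self.commits = []   # ordered commit log: the lineage

    def commit(self, data, message):
        """Snapshot the data; copy-on-write means nothing is overwritten."""
        blob = json.dumps(data, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()
        self.objects[digest] = blob
        parent = self.commits[-1]["hash"] if self.commits else None
        self.commits.append({"hash": digest, "parent": parent,
                             "message": message})
        return digest

    def checkout(self, digest):
        """Roll back to any prior state of the data."""
        return json.loads(self.objects[digest])

repo = DataRepo()
v1 = repo.commit({"signals": [1.0, 2.0, 3.0]}, "raw trading signals")
v2 = repo.commit({"signals": [1.0, -999.0, 3.0]}, "corrupted ingest")

# The bad commit is visible in the log, and v1 is still fully recoverable:
assert repo.checkout(v1) == {"signals": [1.0, 2.0, 3.0]}
```

Because each commit records its parent, you can walk the chain backwards to find exactly which change corrupted the signals, then rebuild your model from the last good state.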
Coupled with your data governance strategy, you can do deep forensics to quickly get to the bottom of what went wrong. That gives you a leg up on auditing and compliance that’s coming fast to the world of AI. It gives you the ability to recreate any step in your pipeline and rebuild your model as needed. Lastly, it lets you correct mistakes like mislabeled data, and broken or missing data.
Scalability and the Canonical Stack
Beyond that, you need scalability and a tech stack that scales with your team. Too often organizations bring in data scientists and expect them to spin up their own tools. They end up working on their laptops with a Jupyter notebook, some Python and PyTorch, and expect that to scale to support an entire team of data scientists.
If those data scientists are all pulling down terabytes of data to their laptops and that data is getting out of sync, you’ve got a broken system that won’t scale to deliver the edge you need to cut costs and drive up revenue. If your stack doesn’t work for a team of dozens or hundreds of data scientists and engineers working together, it’s not going to last.
The AI Infrastructure Alliance is working to make sure a diverse set of highly specialized tools works together seamlessly. Right now we’re seeing a Cambrian explosion of new energy in the AI space, and in the next few years we’ll see the emergence of a LAMP stack for AI: a quintessential set of tools that makes it easy to build complex algorithms and take them from data to decisions.
Your AI/ML stack needs to connect to existing role-based access control and weave together a complex series of tools that each handle different parts of the machine learning workflow. It needs to carry algorithms from concept to production, and your data scientists shouldn’t have to be IT engineers too, writing their own ETL and building their own complex architectures. That lets them focus on the AI instead of the IT.
AI Red Team
Think of the AI Red Team as the machine learning version of the network security “red team.”
The idea of a red team goes back to the 11th century, when the Vatican would appoint a Devil’s Advocate whose job was to discredit candidates for sainthood. Today, companies use red teams for everything from simulating the thinking of rival companies to stress-testing strategies and defending their networks against security threats.
The job of the AI Red Team is to think of everything that can and will go wrong with AI models. They’re in charge of Murphy’s Law for machine learning. The team has three major jobs:
Triage short-term problems
Find solutions for long-term problems, such as drift and hidden bias
Build unit tests and design end-to-end machine learning pipelines that make sure every model passes those tests on the way to production
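The third job, gating every model behind tests on its way to production, can be sketched concretely. This is a hedged illustration of one such gate, not a standard from any particular framework; the function names, thresholds and tiny stand-in model are all invented for the example.

```python
# A pre-deployment gate an AI Red Team might enforce: a candidate model
# must beat a minimum accuracy and stay under a false-positive budget on
# held-out labeled data before it ships.

def evaluate(model, holdout):
    """Return (accuracy, false_positive_rate) on (input, label) pairs."""
    preds = [model(x) for x, _ in holdout]
    labels = [y for _, y in holdout]
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    negatives = [i for i, y in enumerate(labels) if y == 0]
    false_pos = sum(1 for i in negatives if preds[i] == 1)
    fpr = false_pos / len(negatives) if negatives else 0.0
    return acc, fpr

def passes_gate(model, holdout, min_acc=0.95, max_fpr=0.02):
    """The model ships only if it clears both thresholds."""
    acc, fpr = evaluate(model, holdout)
    return acc >= min_acc and fpr <= max_fpr

# A trivial stand-in model and a tiny labeled holdout set:
holdout = [(10, 0), (20, 0), (500, 1), (800, 1)]
model = lambda amount: 1 if amount >= 100 else 0

assert passes_gate(model, holdout)  # acc=1.0, fpr=0.0: this one ships
```

In a real pipeline the gate would run automatically on every retrained model, so a regression in accuracy or a spike in false positives blocks deployment instead of surfacing in production.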
Investing in an AI Red Team now, before you have to answer questions from regulators on why your loan processing algorithm is biased against a protected group, will mean you’re ready to answer those questions with a strong, clear voice.
Powering the Financial Centers of Tomorrow
Today, machine learning can increasingly design its own rules, rules far more flexible than traditional expert systems. Instead of following human-designed logic, machine learning systems write their own rules by studying a much wider range of data points, behavior patterns and account information to pick fraud out of the noise of everyday financial transactions.
Just as Bayesian filters helped crush spam in the 2000s, machine learning can spot more complex patterns of fraud as they happen. Trading systems will get more adaptable and flexible, able to deal with black swans like COVID, and pivot to protect people’s wealth. It will also help financial houses answer 70% of questions before a customer ever needs to talk to someone on the phone.
All that promise and potential is incredibly tantalizing and it’s easy to charge in head first, without understanding all the risks clearly. If financial teams don’t invest in the right tools and people from the start, machine learning may prove a dangerous mirage. They risk getting caught out by regulators or an angry mob on Twitter when their models go awry. Either that or they risk a system that melts down under pressure or does strange things they never expected.
But if financial houses invest in governance, data versioning/lineage, a robust tech stack and build a strong AI Red Team right out of the gate, they’ll have a rock solid foundation to tap the full power and potential of machine learning today and tomorrow.
###########################################
I’m an author, engineer, pro-blogger, podcaster, public speaker and Chief Technical Evangelist at Pachyderm, a cutting edge data lineage and AI pipeline platform. I’m also Managing Director of the rapidly growing AI Infrastructure Alliance, which is helping make the canonical stack for machine learning a reality.