AI is perhaps more important to the financial services sector than any other. As an industry built on data, where the right information at the right time can potentially make or lose millions, any extra analytical edge that a bank, insurance or investment company can gain is highly valuable.
As the most data-rich organizations in the world, financial institutions are well-positioned to build AI solutions that can improve how they manage risk, govern internal processes and serve their customers. However, transforming this data into a useful AI application takes some heavy lifting.
This article explains the challenges involved in managing and moving the large data volumes involved, and how financial services companies can design data architectures to support innovation in AI.
Why artificial intelligence is important to the financial sector
Financial services are still experimenting with AI, but early adopters stand to make great gains in three key areas.
First, they can improve the customer experience with client-facing improvements. One example is the use of AI to personalise financial offers after analysing customer behaviour in retail banking. An AI bot could tell a customer that it had waived a fee for an accidental overdraft, or could offer to match interest rates from a competing bank.
Second, they can drive new efficiencies into back-end processes that reduce costs and improve performance. For example, American Express is already using AI to spot fraudulent transactions.
Chatbots, both in contact centres and in retail branches, marry both of these wins by improving customer experiences and lowering the cost per interaction for banks.
Third, banks can take the processes that they excel at with AI and turn them into external services. We have already seen this with BlackRock’s AI-powered Aladdin risk management platform, which the investment fund management giant expects to make up 30 per cent of its revenue by 2022.
The importance of data
One thing underpins all AI activities: data. Machine learning and deep learning systems rely on extensive data to train and continually refine their models.
In the world of AI more than anywhere else, data really is the new oil. Crude and unusable in its raw state, it must be extracted and then gradually refined to the point where the AI engine can use it. Different data streams must be integrated, deduplicated, cleaned and formatted. This happens repeatedly as AI algorithms produce their own data, which is recombined and used again.
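The refinement steps described above can be sketched in a few lines. This is an illustrative example using pandas; the feed names, column names and values are hypothetical, not drawn from any real banking system.

```python
import pandas as pd

# Hypothetical raw feeds from two source systems; raw exports often arrive
# with numeric fields as strings and with duplicate or missing rows.
branch = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "amount": ["100.0", "250.5", "250.5"],
})
mobile = pd.DataFrame({
    "customer_id": [2, 3],
    "amount": ["75.25", None],
})

# Integrate the streams, deduplicate, drop unusable rows, then format types.
transactions = (
    pd.concat([branch, mobile], ignore_index=True)  # integrate
      .drop_duplicates()                            # deduplicate
      .dropna(subset=["amount"])                    # clean
      .astype({"amount": "float64"})                # format
)
print(len(transactions))  # 3 usable rows remain
```

In practice each of these steps is a pipeline stage in its own right, repeated as new data arrives.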
Managing these complex and high-volume processes is the biggest challenge for companies hoping to embrace AI. IDC’s January Cognitive, ML, and AI Workloads Infrastructure Market Survey found that 50 per cent of all companies viewed data volume and quality as a major challenge, while 47 per cent cited advanced data management as a big stumbling block.
Financial companies face particular challenges stemming from their complex environments. They must pull together distributed data from many different financial systems to fuel their AI initiatives.
Just like oil, AI data must be transported. Banks want data in the right place at the right time to fuel AI projects, which usually involves transferring it between financial systems. These systems may be located continents apart and operated by different third parties. Efficient data management and movement is therefore crucial in any financial services AI operation.
The need for governance
Financial services companies must consider another requirement when creating a data-friendly AI architecture: governance.
Investment and retail banks alike make daily decisions affecting customers’ everyday lives. A strong regulatory framework also subjects these financial institutions to heavy scrutiny. They must be accountable for their actions. This means ensuring that the data they use for AI training and inference is properly sourced and prepared. Its provenance must be verified at each stage so that it doesn’t pollute AI algorithms and introduce inaccuracies or bias.
AI data pipelines
A data pipeline can solve these problems for banks that need the right data in the right place at the right time. In an AI setting, a pipeline is a set of data processing elements connected in a series, where the output of one element is the input of the next.
The pipeline must support the smooth flow of data across the entire AI lifecycle, which spans several discrete stages:
Ingestion. This is where the data destined for the AI training engine is created, collated and stored, ready for the next stage in the process.
Data preparation. Data from the edge is transformed into a format that the AI training engine can accept.
Exploration. Just as with conventional students, AI algorithms don’t learn unless they are given the right information in the appropriate way. Data scientists experiment with different parts of the data set and with different AI frameworks to prepare their training data for the next stage.
Training. The training process involves feeding thousands of examples into the AI algorithm, which uses graphics processing units (GPUs) to learn from them. The output of this phase is a statistical model that the AI can use to interpret new data.
Inference. This is the stage where the AI uses the model to evaluate new data, such as recognising images or ‘reading’ text.
Archiving. The data used to train AI can lose value, becoming less relevant over time. Financial institutions must separate this data from the training set to make way for new data as they continually retrain their models. Compliance requirements in the heavily regulated financial sector mean companies must archive this data for set periods and store it securely.
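The archiving stage above amounts to splitting records by age: fresh data stays in the training set, stale data moves to secure retention. The sketch below illustrates this with a 365-day threshold, which is an assumption for the example, not a regulatory figure.

```python
from datetime import datetime, timedelta

# Assumed retention threshold: records older than a year leave the
# active training set but are kept for compliance.
NOW = datetime(2024, 1, 1)
CUTOFF = NOW - timedelta(days=365)

records = [
    {"id": 1, "ts": datetime(2022, 6, 1)},   # stale
    {"id": 2, "ts": datetime(2023, 9, 15)},  # fresh
]

training_set = [r for r in records if r["ts"] >= CUTOFF]
archive = [r for r in records if r["ts"] < CUTOFF]
print(len(training_set), len(archive))  # 1 1
```

In production, the archive side would be written to locked, encrypted storage rather than an in-memory list.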
Each of these stages can occur at one or more of three locations in the AI pipeline: the edge, the core or the cloud.
The edge is the place where data is created or ingested for AI processing. In some cases, this could be the edge of the network. For example, a user may generate data for consumption by a fraud detection app when they use a mobile banking app on their phone or interact at a branch. However, the edge of an AI data pipeline could also be a structured database containing customer transaction records on the company’s premises.
The core is the centre of the AI process, where the preparation, exploration, training and often the inference of data happens.
The cloud can house data at all stages of the AI workflow, from ingestion through to training and subsequent archiving. For example, many financial services companies will handle the compute-intensive training stage in the cloud to take advantage of its elastic compute capabilities.
Increasingly, data will flow between the edge, the cloud and the core at various stages of the AI process. Data from IoT devices or on-premises databases will flow to core AI systems hosted either on-premises or in the cloud, or both.
Banks may also migrate cloud-based AI processing to their own premises as their AI projects mature and gain more internal traction, while still storing edge data in the cloud or spreading it between different systems.
The type of data used can also affect its storage and transmission in the pipeline. For example, an AI system may need to process data-intensive images that can be stored in the cloud, but it may also train using sensitive customer data that is best kept in an on-premises environment. Rather than a single data lake, this can lead to the creation of federated ‘pools’ of data spread between multiple infrastructures, united by the pipeline.
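Routing data to federated pools can be as simple as classifying each item by sensitivity before deciding where it lives. The sketch below is a toy illustration of that idea; the pool names and the classification rule are hypothetical.

```python
# Two federated pools, united by the pipeline: bulky, non-sensitive data
# goes to the cloud; sensitive customer data stays on-premises.
POOLS = {"cloud": [], "on_prem": []}

def route(item):
    """Assign an item to a pool based on its sensitivity flag."""
    pool = "on_prem" if item.get("sensitive") else "cloud"
    POOLS[pool].append(item)

for item in [
    {"name": "cheque_image.png", "sensitive": False},
    {"name": "customer_record", "sensitive": True},
]:
    route(item)

print(len(POOLS["cloud"]), len(POOLS["on_prem"]))  # 1 1
```

A real classifier would use data-governance metadata rather than a single boolean, but the routing pattern is the same.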
Building the pipeline
NetApp offers a range of integrated storage and processing solutions at all stages of the AI pipeline, along with a range of services that help customers tie them together into a cohesive, unified workflow.
The company has partnered with NVIDIA, which leads the field in AI computation. The two companies have combined their computing and all-flash storage technology to store and deliver data at high volume to AI processes at high speed, helping to avoid bottlenecks in the AI pipeline.
NetApp is able to integrate not only its own platforms but other storage platforms too, creating data hubs atop financial customers’ existing infrastructure.
The company’s range of AI-ready components also provides some basic data controls, such as the ability to lock data and prevent further editing, replicating traditional write-once-read-many storage technology. Its software provides interfaces for third party products, allowing customers to build more data governance and control capabilities into their AI pipelines.
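The write-once-read-many (WORM) behaviour mentioned above can be illustrated with a small sketch: once a record is written, further edits are rejected. This demonstrates the data-locking concept only and is not based on any specific NetApp API.

```python
class WormStore:
    """Toy write-once-read-many store: keys lock on first write."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        if key in self._data:
            raise PermissionError(f"{key} is locked and cannot be edited")
        self._data[key] = value

    def read(self, key):
        return self._data[key]

store = WormStore()
store.write("training_set_v1", b"archived training data")
try:
    store.write("training_set_v1", b"edited")
except PermissionError:
    print("locked")  # the second write is rejected
```

Governance layers in a real pipeline add audit trails and retention clocks on top of this basic locking behaviour.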
Finally, NetApp works with partners to offer AI pipeline consulting and integration services for financial companies as they build out their machine learning workflows.
Across the world, financial institutions are preparing themselves for an AI-powered future. JP Morgan is saving 360,000 employee hours by using machine learning to read loan agreements, while hedge fund Man AHL is using it to route trades more effectively. The technology is transforming banking across applications ranging from the obvious to the niche. Powering your AI projects with a seamless, fast data pipeline will position you to take advantage of this revolutionary approach to data processing.