Banking Mid Atlantic 3 2020

TECH TRENDS

Machine Learning Technology Stack for Banks

Willingness To Appreciate Nuances Will Serve Customers Best

BY ANKUR GARG, SPECIAL TO BANKING MID ATLANTIC

Implementation of machine learning (ML) is often misunderstood. Yet knowledge of an ML stack, the collection of technological tools and processes that facilitate the generation of data-derived insights, is vital. There are four key components of the ML stack: 1) Sourcing data; 2) Establishing a trusted zone or “single source of truth” (SSOT); 3) Establishing modeling environments; and 4) Provisioning model outputs or insights to downstream applications.

“By understanding ML technology stack implementation, banks can leverage the benefits of data and generate programming that could transform their businesses, with early adopters more likely to see sustained success,” according to Raymond Chase, vice president for data analytics with Connecticut-based People’s United Bank. With more than 30 years of experience in the industry, Chase says he has seen many well-intentioned projects fail when the ML stack is not addressed.

SOURCING DATA

Data sourcing includes surveying the accessible data types that are fed as inputs to the algorithm, as well as the processes and technologies needed to tap into those sources. Examples of sources include core transactions, customer-provided information, the Internet, external databases, market research data, social media, and website traffic. Once sourced, data must be curated in an SSOT, which structures the data in one consistent place, and data lineage must be established to ensure quality and to trace downstream impact. Curated data from an SSOT can then be sourced by a modeling environment that is created to implement ML algorithms.
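To make the idea of data lineage concrete, here is a minimal sketch in Python of how a curated record might carry a trail of its source and the curation steps applied to it. The names used (`CuratedRecord`, `ingest`, `apply_step`, `normalize_amount`, and the `core_transactions` source label) are hypothetical and chosen only for illustration; a production SSOT would typically rely on a data catalog or warehouse tooling rather than hand-written code like this.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CuratedRecord:
    """A data element plus the trail of where it came from and what touched it."""
    payload: Dict[str, Any]
    lineage: List[str] = field(default_factory=list)


def ingest(source: str, payload: Dict[str, Any]) -> CuratedRecord:
    """Wrap a raw record and note its originating system (hypothetical label)."""
    return CuratedRecord(payload=dict(payload), lineage=[f"source:{source}"])


def apply_step(
    record: CuratedRecord,
    step_name: str,
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> CuratedRecord:
    """Apply a curation step and append its name to the lineage trail."""
    return CuratedRecord(
        payload=transform(record.payload),
        lineage=record.lineage + [f"step:{step_name}"],
    )


def normalize_amount(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical curation step: store the amount as integer cents."""
    cleaned = dict(payload)  # copy so the raw record is left untouched
    cleaned["amount_cents"] = int(round(float(cleaned.pop("amount")) * 100))
    return cleaned


raw = ingest("core_transactions", {"account": "A-100", "amount": "25.50"})
curated = apply_step(raw, "normalize_amount", normalize_amount)

print(curated.payload)
print(curated.lineage)
```

Because every step returns a new record with an extended trail, anyone reviewing a model input downstream can see which system supplied it and which transformations it passed through.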


THE TRUSTED ZONE

It is important to prove data validity and quality throughout the chain of handling. Data must be aggregated, reconciled, and validated before being consumed for ML purposes. Key attributes of a trusted zone include:

· A central repository of data, aggregated from multiple channels.
· Clearly defined and documented data elements and data lineage.
· Documentation of assumptions. For example, if a teller’s cash transaction is reported by both the core system and the teller transaction system, documentation must show which entry prevails and why.
· A protocol for addressing unintended exceptions. For example, if a localized glitch at the branch level prevents an ATM from reporting transactions on a certain date, there should be a way to account for the missing transactions and to capture them when they become available.
· Daily reporting that matches and reconciles counts across systems.
· Architecture that expands vertically and horizontally.
· A data store for the trusted zone that is highly available and resilient to failure.

Lately, more data warehouses are hosted in the cloud. Cloud benefits include high availability, cost-effectiveness, and horizontal and vertical scaling. Another trend is the increasing adoption of NoSQL databases such as MongoDB. These provide greater flexibility and better performance for storing unstructured data, compared with relational databases.

As with all things digital, regulation and security of data are demanding. Data is more intimate today, and privacy and security regulations are more complicated. The data governance team should be part of any ML implementation. Data lineage that tracks data sourcing is thus an effective way to support compliance.

Data collected and held must be protected. Security and risk management teams must be involved to initiate and monitor best practices, and to develop a response to security breaches. Investment in outsourced assistance is worthwhile for smaller institutions. If cloud vendors are used, they must contractually agree that data security is their responsibility. Transmission of data from the premises to the cloud and from the cloud to the premises must be part of the scope and should be carefully designed to address security risk. Encrypting data before transmittal to the cloud is valuable, even when transmission occurs over a secured virtual private network.

ML MODELING ENVIRONMENT

The objective is to facilitate the creation of models that generate meaningful insights and to deliver those insights in a way that passes model validation and audit requirements. There are three components: modeling infrastructure, development tools, and DevOps. Options for ML modeling environments include:

· Ready-to-use services, such as Amazon’s Polly and IBM’s Watson.
· Automated ML, such as DataRobot.
· ML Workbench, such as Amazon’s SageMaker.
· Custom-/in-house-built ML
