CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM


Executive Summary

Financial institutions have implemented, and continue to implement, many disparate applications for risk management and regulatory compliance in response to increasing regulatory requirements and the need to reduce operational risk. Recently, however, many organizations have begun unifying these separate applications on a single, comprehensive, enterprise-wide platform. Doing so allows them to gain a more accurate and complete view of their enterprise data; identify operational and compliance risks and suspicious activities faster and more thoroughly; comply more easily with new and changing regulations; and simplify overall management and maintenance.

This white paper describes the business drivers for consolidating risk management and regulatory compliance applications and data. It then discusses the strengths and limitations of various technologies, including traditional data warehouses and newer big data technologies. Finally, it presents a new technology, InterSystems IRIS Data Platform, which complements organizations' existing data management environments. InterSystems IRIS provides a unified, panoramic view of all enterprise data and an ultra-high-performance, scale-out processing layer for complex processing tasks on distributed data, feeding applications and delivering fast, accurate answers to unplanned questions from regulators and business users.

Introduction

Over the past two decades, financial institutions have implemented many different applications to support risk management and regulatory compliance initiatives across multiple lines of business, geographic regions, and legal entities. Especially since the events of 2001 and 2008, new regulations have spawned an explosion of point solutions. To meet these enhanced reporting requirements, financial services firms have created a patchwork of disconnected applications, each with its own set of rules and with data stored in various formats and representations.
As a result, many organizations find it difficult to obtain the complete and accurate enterprise-level view of their data that they need to manage risk, comply with regulations, and perform surveillance, especially as data volumes grow, new regulations are added, and existing regulations change. This has exposed firms to financial and reputational risk, as well as to fines and penalties from regulators. Many firms are struggling to keep pace with the volume, variety, and veracity of their data as they attempt to analyze years or even decades' worth of enterprise data to comply with the onslaught of new and changing regulations. Compounding the challenge, organizations must also handle the velocity of data to support real-time use cases, and they must comply with intraday liquidity reporting and other requirements that demand a shorter delay between a transaction and the firm's awareness of, and action on, that transaction.

Big Data and Data Warehousing Approaches

Some firms have begun evaluating and implementing Apache Hadoop, Apache Spark, and other big data technologies as they work to consolidate their risk management and compliance applications onto a single, consistent enterprise-level platform. These technologies can provide some benefits for consolidation initiatives, such as cost-effective data storage and support for simple searches and certain kinds of queries. For example, HDFS (Hadoop Distributed File System) and other data repositories can provide scalable, cost-effective storage for large volumes of source data from across the organization; MapReduce and tools such as Impala and Hive can answer certain kinds of questions from the data; and Spark, with its in-memory architecture, can deliver higher performance for certain kinds of analytic queries.

For many scenarios, however, these big data frameworks and tools are not sufficient. They often produce more of a data swamp than a data lake: large amounts of dissimilar source data are stored but not easily leveraged, and accurate answers to complex or unplanned questions remain difficult or impossible to obtain. Hadoop and Spark, for example, are efficient at distributing a search or a simple processing task over multiple nodes. However, they often take hours, or fail to complete at all, on more complex tasks that require joining data from different tables or different nodes. This limitation matters most when firms ask new or unplanned questions of distributed data. For example, an attempt to correlate the multiple activities and products associated with a customer's account, when they are spread across multiple tables on different nodes, often times out under MapReduce or Spark without ever returning an answer. Hadoop presents additional limitations for consolidated risk and compliance initiatives, for example in security, data governance, and lineage.
Financial organizations are discovering that these big data frameworks and technologies, while offering advantages for storing large data sets and for certain types of processing tasks, are not sufficient on their own. Some firms are instead attempting to use traditional data warehousing technologies for these consolidation initiatives. But data warehouses require data to be pre-processed and structured first, which can remove valuable information. They also require expensive storage and processing resources, rely on scale-up (rather than scale-out) architectures, and limit the number of concurrent clients. And it is difficult or impossible to structure the data in advance to support every ad hoc query that regulators might ask downstream.

InterSystems IRIS Data Platform for Consolidated Risk Management and Regulatory Compliance

InterSystems IRIS Data Platform provides critical capabilities to help firms successfully consolidate their risk management and regulatory compliance applications onto a single platform. It seamlessly complements an organization's existing data management infrastructure, including legacy applications, data warehouses, big data technologies, and data lakes. Through a real-time distributed caching layer, it gives organizations a unified, panoramic view of data from multiple sources across the organization, with accurate, secure access to the distributed source data.

It also provides an ultra-high-performance, scale-out processing layer for complex batch and real-time processing tasks on large, distributed data sets, including complex multi-table joins on sharded data without requiring co-sharding [1], without replicating data, and without network broadcasts. The result is fast, reliable answers to questions that would otherwise take hours or time out without completing, at lower operational cost.

InterSystems IRIS Data Platform enables organizations to gain a consistent view across all of their enterprise data, obtain more accurate and more timely intelligence about their businesses, ensure better compliance with regulations, and respond more quickly to unplanned questions from financial regulators and compliance analysts, while providing the required role-based access, security, encryption, governance, and lineage capabilities. It is a horizontally scalable technology that supports scale-out, sharded architectures for managing and analyzing very large data sets on low-cost distributed processing and storage nodes.

[Figure 1: InterSystems IRIS Data Platform for Consolidated Risk and Compliance: Reference Architecture. Layers shown: Data Sources; Integration and Enrichment Layer; Read-Across Distributed Cache (Enterprise Cache Protocol, Data-Aware Intelligence); Curated Data Tier; Panoramic View; Security and Access Layer; Business Applications, Reporting, and Analytics.]

[1] Co-sharded data refers to data that is partitioned on a common key.
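To make the co-sharding definition concrete, here is a minimal Python sketch, assuming simple hash partitioning with `zlib.crc32` and an in-memory hash join. The helpers `partition`, `local_join`, and `co_sharded_join` and the sample tables are hypothetical, for illustration only; this is not how InterSystems IRIS shards or joins data. It shows why co-sharded tables are easy to join: every matching pair of rows lands on the same shard, so each shard can join its own rows without touching the network.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_of(key: str, num_shards: int = NUM_SHARDS) -> int:
    # crc32 is deterministic across runs, unlike Python's salted str hash().
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(rows, key_field, num_shards=NUM_SHARDS):
    """Hash-partition rows into shards by the value of key_field."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[key_field], num_shards)].append(row)
    return shards


def local_join(left, right, key):
    """In-memory hash join of two row lists on a shared key."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index[lrow[key]]]


def co_sharded_join(left_shards, right_shards, key):
    """Join two tables that were partitioned on the same key.

    Every matching pair lives on the same shard, so each shard joins
    only its own rows; nothing is broadcast or reshuffled.
    """
    result = []
    for shard_id in set(left_shards) | set(right_shards):
        result.extend(local_join(left_shards.get(shard_id, []),
                                 right_shards.get(shard_id, []), key))
    return result


# Usage: accounts and trades are both sharded on account_id (co-sharded).
accounts = [
    {"account_id": "A1", "customer": "Alice"},
    {"account_id": "A2", "customer": "Bob"},
    {"account_id": "A3", "customer": "Carol"},
]
trades = [
    {"trade_id": 1, "account_id": "A1", "symbol": "XYZ"},
    {"trade_id": 2, "account_id": "A3", "symbol": "ABC"},
    {"trade_id": 3, "account_id": "A1", "symbol": "DEF"},
]
trade_shards = partition(trades, "account_id")
account_shards = partition(accounts, "account_id")
joined = sorted(co_sharded_join(trade_shards, account_shards, "account_id"),
                key=lambda row: row["trade_id"])
print([(row["trade_id"], row["customer"]) for row in joined])
# [(1, 'Alice'), (2, 'Carol'), (3, 'Alice')]
```

If the trades were instead partitioned on `symbol`, a trade and its account could land on different shards and the shard-local loop would silently miss matches. Engines that require co-sharding typically fall back to broadcasting or reshuffling one table at that point, which is the network cost described above.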

An HTAP Multi-Model Database

At the core of InterSystems IRIS Data Platform is a proven, enterprise-grade, distributed, hybrid transactional-analytic (HTAP), multi-model database designed to work with large sets of heterogeneous data. It ingests, stores, and indexes large volumes of transactional data at very high rates to support real-time analytical use cases.

It provides the flexibility to store dissimilar source data in the most appropriate format. The data is stored once and can be described in multiple representations, such as SQL, object, multidimensional arrays, key-value pairs, and documents. This eliminates the need to duplicate data or to map between representations (for example, object-to-relational mapping), improving performance and efficiency. It natively supports sharded, scale-out, distributed architectures, providing a cost-effective platform for working with large data sets on commodity resources. It also provides strong enterprise-level security, including integration with Kerberos and LDAP, role-based access control, and encryption of data both in transit and at rest.

Enterprise Cache Protocol (ECP)

InterSystems provides powerful capabilities for reliable, high-performance, distributed, multi-workload processing at very high scale. These capabilities result in large part from a unique technology, Enterprise Cache Protocol (ECP), which has been implemented, optimized, and hardened in thousands of mission-critical production environments. ECP is an integral capability of InterSystems IRIS. It provides fast and reliable answers to queries on distributed data sets regardless of how the data is organized, and it natively supports distributed, sharded architectures. Complex joins are processed locally rather than broadcast across the network, eliminating the latencies and timeouts typically associated with broadcast joins, increasing performance, and reducing operational costs. ECP makes it possible for regulators and compliance analysts to get fast, accurate answers to unplanned queries without expensive, time-consuming pre-processing or replication of data.

Using ECP is transparent and requires no application changes or specialized techniques: applications and processing tasks simply treat the entire data set as if it were local. The performance and scalability benefits are dramatic, enabling organizations to obtain answers, correlate information, and identify patterns in distributed, non-co-sharded data sets with high performance and reliability, and at significantly lower cost.

[Figure 2: InterSystems IRIS hybrid transactional-analytic multi-model database. Labels shown: C++ / Java / Python / ANSI SQL / Spark access; relational, object, key-value, and free-text models; Multi-Model Panoramic View; Enterprise Cache Protocol; Data-Aware Intelligence.]
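The idea that data is stored once and exposed through several representations can be illustrated with a minimal Python sketch. It assumes a single dictionary keyed by `(table, row_id, field)` subscripts standing in for a multidimensional array. The helpers `put`, `rows`, `Account`, and `as_document` are hypothetical and do not reflect InterSystems IRIS's actual storage format or APIs. The point is that the relational, object, and document views all read the same entries, so an update made through one view is immediately visible through the others, with no second copy to keep in sync.

```python
import json
from dataclasses import dataclass

# Single underlying store: a sparse multidimensional array addressed by
# (table, row_id, field) subscripts. Hypothetical layout for illustration.
store: dict[tuple, object] = {}


def put(table, row_id, **fields):
    """Write fields for one record directly into the shared store."""
    for name, value in fields.items():
        store[(table, row_id, name)] = value


def rows(table, columns):
    """Relational view: one tuple per row id, like SELECT id, col1, col2 ..."""
    ids = sorted({key[1] for key in store if key[0] == table})
    return [(row_id, *(store.get((table, row_id, c)) for c in columns))
            for row_id in ids]


@dataclass
class Account:
    """Object view: attribute access reads and writes the shared store."""
    row_id: int

    @property
    def owner(self):
        return store[("Account", self.row_id, "owner")]

    @owner.setter
    def owner(self, value):
        store[("Account", self.row_id, "owner")] = value


def as_document(table, row_id):
    """Document view: a JSON string built on demand from the same entries."""
    doc = {key[2]: value for key, value in store.items()
           if key[0] == table and key[1] == row_id}
    return json.dumps({"id": row_id, **doc}, sort_keys=True)


put("Account", 1, owner="Alice", balance=100)
put("Account", 2, owner="Bob", balance=250)

Account(1).owner = "Alice Smith"  # update through the object view

print(rows("Account", ["owner", "balance"]))
# [(1, 'Alice Smith', 100), (2, 'Bob', 250)]
print(as_document("Account", 1))
# {"balance": 100, "id": 1, "owner": "Alice Smith"}
```

Because every view is computed from the same four store entries, there is no object-to-relational mapping layer and no duplicated data to reconcile.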

Integration with Apache Spark

Many of the business and regulatory drivers for risk management and compliance now require intraday and near-real-time reporting and visibility, pushing organizations toward ever-higher-performance computing paradigms that reduce latency. As a result, some organizations are evaluating and adopting Apache Spark for its in-memory architecture. InterSystems IRIS provides fast and efficient parallel connectivity with Apache Spark, offering a high-performance, seamless complement to the big data architectures already in use at many financial institutions. The shard-aware integration and enrichment layer supports both batch and individual inserts, so very large data sets can be ingested into InterSystems data shards quickly. InterSystems technology integrates directly with Spark via a shard-aware native Spark connector, which presents InterSystems data shards as native Spark partitions for the highest performance. This deep integration lets organizations use InterSystems IRIS seamlessly from Spark and optimize Spark queries for better performance.

Concurrent Transactional and Analytic Processing Using Real-Time Data

While some risk and compliance use cases require analysis of batch data, others require real-time data to be ingested and analyzed together with historical or reference data. Examples include transaction monitoring and filtering, intraday or pre-trade liquidity calculations, and other real-time, low-latency applications. InterSystems IRIS is optimized to process very high transactional workloads and a high volume of analytic queries, on both transactional and historical or other batch data, simultaneously, without compromising performance for either workload type. This makes it ideal for handling both real-time and batch requirements.

Interoperability

InterSystems IRIS provides connectivity to a wide range of applications and data sources, including databases, flat files, and more.
It also includes a built-in adapter library that provides connectivity and data transformations for traditional industry standards, protocols, and technologies such as REST, SOAP, HTTP/S, FIX, Kafka, and JMS. The InterSystems IRIS SQL Gateway can access data and metadata in common databases and data warehouses, including Oracle, Sybase, and DB2, as well as SQL-on-Hadoop engines such as Hive and Impala, and present them to InterSystems client applications as native tables. In addition, the data platform provides capabilities for applying transformations, a workflow engine, encryption, entity resolution, and a natural language processing engine for working with unstructured text.

InterSystems IRIS provides comprehensive, unified access for the third-party reporting, analytics, and visualization tools already in use, and for existing risk and compliance (and other) applications. It provides ANSI SQL support, with time-tested SQL optimizations on fully sharded, scale-out data architectures, as well as the flexibility to support object, OLAP, and JSON/REST access to the data.

Integrated Analysis of Unstructured Text

Unstructured data, including free text in emails, documents, text messages, master agreements, and Suspicious Activity Reports (SARs), as well as external data from blogs and tweets, can provide valuable insight to help banks reduce risk and identify suspicious or fraudulent behavior. Natural language processing is an integral component of InterSystems IRIS. Its native capabilities take a unique bottom-up approach that analyzes text based on what the text itself contains. It can work with customer-defined dictionaries and ontologies, and it provides native, embedded semantic analysis for finding patterns and correlations in unstructured data.
InterSystems IRIS also includes capabilities for data exploration, signal detection and trend analysis, content-based profiling and clustering, and information extraction, categorization, and mapping on unstructured data. These capabilities can help summarize and contextualize large amounts of free text for compliance and surveillance initiatives.
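The contrast between dictionary-driven and bottom-up text analysis can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the InterSystems IRIS NLP engine: the dictionary, the stopword list, and the helpers `tag` and `frequent_terms` are hypothetical, and "bottom-up" here is reduced to counting two-word phrases that recur across messages, whereas a real engine extracts concepts and relationships far more richly.

```python
import re

from collections import Counter

# Hypothetical customer-defined dictionary: phrase -> compliance category.
DICTIONARY = {
    "wire transfer": "payments",
    "cash deposit": "cash activity",
    "offshore account": "jurisdiction risk",
}
# Small illustrative stopword list used only for bottom-up term discovery.
STOPWORDS = {"a", "an", "the", "to", "by", "this", "of", "and"}


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def tag(text, dictionary=DICTIONARY):
    """Top-down: categories whose dictionary phrases occur in the text."""
    padded = " " + " ".join(tokens(text)) + " "
    return sorted({cat for phrase, cat in dictionary.items()
                   if f" {phrase} " in padded})


def frequent_terms(docs, top=3):
    """Bottom-up: recurring two-word phrases found in the texts themselves."""
    counts = Counter()
    for doc in docs:
        words = tokens(doc)
        pairs = {f"{a} {b}" for a, b in zip(words, words[1:])
                 if a not in STOPWORDS and b not in STOPWORDS}
        counts.update(pairs)  # count each phrase at most once per document
    # Sort by descending count, then alphabetically, for a stable result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


messages = [
    "Client requested a wire transfer to an offshore account today.",
    "Second wire transfer this week, again to the offshore account.",
    "Large cash deposit followed by a wire transfer.",
]
for m in messages:
    print(tag(m))
print(frequent_terms(messages, top=2))
# [('wire transfer', 3), ('offshore account', 2)]
```

The dictionary finds only what an analyst already listed, while the frequency pass surfaces phrases from the messages themselves; combining the two is one simple way to read the "customer-defined dictionaries plus bottom-up analysis" idea described above.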

Data Lineage and Data Governance

Effective data lineage (the ability to describe the source of the data and how it changes as it moves through the data pipeline) and data governance are critical for risk and compliance initiatives. For example, regulations such as the Consolidated Audit Trail (SEC Rule 613) require organizations to collect and accurately identify every order, cancellation, modification, and trade execution for all exchange-listed equities and options across all U.S. markets. Applications that perform different functions may store different representations of the data or may modify it, for example by breaking a large initial order into smaller child orders for execution. Compliance analysts and regulators must have confidence in the original data sources and in the processes and transformations applied to them. InterSystems IRIS provides strong support for multiple data models, including both object and SQL schemas, and its flexible metadata capabilities support proper data lineage and provenance.

Conclusion

InterSystems IRIS Data Platform offers a powerful, seamless complement to financial institutions' existing infrastructure, delivering a secure, panoramic view of all their enterprise-wide data assets. It provides an ultra-high-performance, distributed, scale-out processing layer for the range of complex batch and real-time tasks required to consolidate risk management and regulatory compliance point solutions into a comprehensive, unified platform.
By incorporating InterSystems IRIS into their consolidation initiatives, financial institutions can ask more questions of their enterprise data, across applications, data lakes, warehouses, and other data sources, more quickly and cost-effectively. They can gain more accurate and more timely intelligence about their business, ensure better compliance with industry regulations, and respond more quickly to questions from financial regulators and compliance analysts, reducing operational and regulatory risk.

InterSystems is the engine behind the world's most important applications. In healthcare, finance, government, and other sectors where lives and livelihoods are at stake, InterSystems is the power behind what matters. Founded in 1978, InterSystems is a privately held company headquartered in Cambridge, Massachusetts (USA), with offices worldwide, and its software products are used daily by millions of people in more than 80 countries. For more information, visit Financial.InterSystems.com