Big Data for Compliance and Industry Trends
Nadeem Asghar
Field CTO Financial Services - Hortonworks
RegOps COE Project Overview

Project Drivers
i. Data requirements for controls implementation of regulatory reports
ii. Centralized repository for all order-related data
iii. Foundation for CAT regulation
iv. Single source for all regulatory reporting, compliance reporting and inquiry requests
v. Potential for business use cases such as TCA, benchmarking, etc.

High Level Scope

Data Sourcing & Storage
The data sourcing and storage components form the backbone of RegOps COE and consist of a data warehouse that ingests, formats and stores order, trade, transaction and reference data.

Regulatory Reporting
This component is the regulatory reporting engine, which will use the data from the core TDP to generate and submit multiple regulatory reports, including but not limited to OATS, BlueSheet, TRACE, MSRB, INSITE, LOPR and Large Trader.

Internal & External Reporting
This component will generate ad hoc customized reports as well as canned reports for internal analysis and external regulatory inquiries such as the TMMS Exam.

Control Framework
This component is a structural framework that will enable comprehensive data quality, timeliness and completeness checks to help identify inherent issues in the firm's trading data.

Functional Components

Data Sourcing
Ingest different types of data from multiple sources. This data falls into three main categories:
i. Transaction Data (e.g.: order, execution, fills, allocations, etc.)
ii. Position Data (e.g.: Client/Firm SOD, EOD positions, etc.)
iii. Reference Data (e.g.: products, accounts, counterparties, etc.)

Data Retention
The submitted data must be retained for analysis, investigation, regulatory and legal purposes. At a minimum, the solution will follow the standard data retention requirement of Rule 613, Consolidated Audit Trail (CAT): at least 6 years. Data must be retained in such a way that the raw data from each source can be logically separated from the data of all other sources.
Data Storage
All the data from different sources will be stored in a central repository. The repository is expected to have enough capacity to store extremely large volumes of data. Any regulatory reported data should be stored in a WORM-compliant format.

Processing
Data obtained from all the sources needs to be processed according to pre-defined rules. The key processing functionalities include:
i. Normalization/Linkages
ii. Replays/Reprocessing
iii. Symbology/Cross-Currency Support
iv. Regulatory report generation and submission
v. (Foundation for) Surveillance routine coverage and alert generation

Control Framework
The control framework is responsible for maintaining the logical integrity and accuracy of the data submitted by the subscribers. Its main functionalities include:
i. Data quality and completeness controls
ii. Reconciliations
iii. Error handling/exception management
iv. Audit Trails
v. Automatic data quarantine capability

Security and Data Access
Data from a given source should be accessible only to personnel from that source and/or personnel with the proper level of data access privileges. A hierarchy of data access that grants these privileges should be implemented and maintained.

2 Regulatory Operations Technology 5/2/2017
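The control framework functions listed on this slide (data quality and completeness controls, automatic quarantine, reconciliations) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and does not come from the slides: the field names in REQUIRED_FIELDS, the rule that quantity must be positive, and the helper names check_record, apply_controls and reconcile. A real implementation would run on the platform's own processing engine rather than on in-memory Python lists.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the slides do not name the actual schema.
REQUIRED_FIELDS = ("order_id", "symbol", "quantity", "source")


@dataclass
class ControlResult:
    accepted: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)  # (record, reasons) pairs


def check_record(record):
    """Return the data quality failures for one record (empty list if clean)."""
    reasons = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if record.get(name) in (None, "")
    ]
    qty = record.get("quantity")
    # Illustrative field-level rule: quantity must be a positive number.
    if qty not in (None, "") and (not isinstance(qty, (int, float)) or qty <= 0):
        reasons.append("quantity must be a positive number")
    return reasons


def apply_controls(records):
    """Split records into accepted and quarantined, keeping the failure reasons."""
    result = ControlResult()
    for record in records:
        reasons = check_record(record)
        if reasons:
            result.quarantined.append((record, reasons))
        else:
            result.accepted.append(record)
    return result


def reconcile(source_records, target_records, key="order_id"):
    """Source-to-target reconciliation: report keys present on only one side."""
    source_keys = {r[key] for r in source_records}
    target_keys = {r[key] for r in target_records}
    return {
        "missing_in_target": sorted(source_keys - target_keys),
        "unexpected_in_target": sorted(target_keys - source_keys),
    }
```

Keeping the failure reasons next to each quarantined record gives exception management and the audit trail something to work from, and the reconciliation output shows which source records never reached the target.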
RegOps COE Technical Overview

Technical Components

Data Sourcing
i. Structured data from various systems
ii. Support for multiple types of structured data
iii. High volumes, approx. 200-500 million events/day
iv. Real-time, intraday & EOD batch processing

Processing
i. Micro-batches or EOD (real-time not a must)
ii. Technical key creation
iii. Temporal milestoning/versioning
iv. Trade linkage
v. Symbology/Cross-Currency Support
vi. Replays/Reprocessing
vii. Data consistency
viii. Least/minimal replication

Data Quality & Control Framework
i. Data quality and completeness controls
   i. Field, across fields and across rows within a group
ii. Reconciliations
   i. Source to target, target to different target
iii. Error handling/exception management
iv. Audit Trails
v. Automatic data quarantine capability

Data Storage & Data Retention
i. Support WORM storage [no deletion]
ii. Data usage
   i. Primary usage for up to 5 days of data
   ii. Secondary usage for up to 3 months of data
   iii. Readily available for up to 2 years
   iv. Otherwise up to 6 years
iii. Books & Records for regulatory purposes
iv. Approx. 15TB of uncompressed data per year

Reporting
i. Support for all order-based reporting (including global regulatory reports)
ii. Support for micro-batch or EOD reporting (real-time not a must)
iii. Various types of daily, weekly, monthly and quarterly reports
iv. Report-level adjustments, e.g.: regulator rejects & mismatches

Data Analysis and UI Tool
i. UI for data query, pivot and mining
ii. Usage of graph (nice to have)
iii. Canned and uncanned reports [ad hoc reports]
iv. Role-based authentication

Workflow Tool
i. Process management
ii. Data correction

Security and Data Access
i. Data security/confidentiality/privacy

Other considerations
i. SDLC compliant
ii. Logging; metrics
iii. Cost
iv. Archiving
v. 2.5X peak volume certified
vi. Scalability, Stability and Resiliency
vii. BCP/quick recovery
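The retention tiers and sizing figures on this slide lend themselves to a short worked sketch. The tier boundaries (5 days, 3 months, 2 years, 6 years), the 15TB/year figure and the 2.5X factor come from the slide. The rest is assumed for illustration: approximating 3 months as 90 days and a year as 365 days, the tier labels, the function names, and treating 500 million events/day (the upper figure in the stated daily volume) as the peak, which the slide does not say.

```python
from datetime import date

# (maximum age in days, tier label). Day counts for months and years are
# approximations chosen for this sketch, not values from the slides.
TIERS = [
    (5, "primary"),
    (90, "secondary"),
    (2 * 365, "readily available"),
    (6 * 365, "retained"),
]


def usage_tier(record_date, today):
    """Map a record's age to the data usage tier it falls into."""
    age_days = (today - record_date).days
    if age_days < 0:
        raise ValueError("record_date is in the future")
    for max_age_days, label in TIERS:
        if age_days <= max_age_days:
            return label
    return "beyond 6 years"


def storage_estimate_tb(tb_per_year=15, years=6):
    """Uncompressed storage over the retention window, ignoring growth and replication."""
    return tb_per_year * years


def certified_events_per_day(peak_events=500_000_000, factor=2.5):
    """Daily volume the platform would need to be certified for.

    Using 500 million events/day as the peak is an assumption of this sketch.
    """
    return int(peak_events * factor)
```

Under these assumptions, six years at 15TB/year comes to about 90TB uncompressed, and certification at 2.5X would mean handling roughly 1.25 billion events/day.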
Hortonworks Inc. 2011-2016. All Rights Reserved