Big Data Appliance in Risk Management


Big Data Appliance in Risk Management
Erste Group Bank
Jozef Zubricky, Group Credit Risk Models and Methods

Digital data have predictive power...

Web scenarios with the highest predictive power:
Currency conversion information (1.3% defaults)
Loan consolidation information (4.6% defaults)

Simplest method is Naïve Bayes

Text classification: SPAM filter
Email messages -> Term-frequency matrix -> Class probabilities

Email messages:
Message 1: "Your email has won 2.5 million."
Message 2
Message 3

Term-frequency matrix:
           Msg1  Msg2  Msg3
your          1     2     1
email         1     0     1
has           1     2     1
won           1     0     0
million       1     0     0

Class probabilities:
           Msg1  Msg2  Msg3
SPAM        90%    5%   10%
HAM         10%   95%   90%

The messages with a high SPAM probability are classified as SPAM.

Text classification: Digital scoring
Lists of strings -> Term-frequency matrix -> Class probabilities

Lists of strings:
Client 1: sit, Mozilla, Consolida, Macintosh, 193.1.2.0
Client 2
Client 3

Term-frequency matrix:
           Cli1  Cli2  Cli3
sit           1     1     0
Mozilla       1     0     3
Consolida     2     0     0
Macintosh     1     2     0
193.1.2.0     1     2     0

Class probabilities:
           Cli1  Cli2  Cli3
High risk   10%   40%   90%
Low risk    90%   60%   10%

The probability of high risk is used as an additional variable in the scorecard.
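The Naïve Bayes step on this slide can be illustrated with a minimal sketch. It is not the bank's implementation: the training messages, the labels, the Laplace smoothing parameter `alpha` and the function names `train_nb` and `predict_proba` are all assumptions made for illustration. The same routine would apply to digital scoring by replacing email words with the strings collected per client and the labels with "high risk" / "low risk".

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model; each doc is a list of tokens."""
    vocab = sorted({t for d in docs for t in d})
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Term frequencies for this class, smoothed with alpha (Laplace).
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_like


def predict_proba(model, tokens):
    """Return class probabilities for one tokenised message."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in log_prior
    }
    # Normalise in a numerically stable way.
    top = max(scores.values())
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


# Hypothetical toy training set (not from the presentation).
train_docs = [
    ["your", "email", "has", "won", "million"],
    ["won", "million", "prize"],
    ["your", "meeting", "has", "moved"],
    ["email", "the", "report", "has", "arrived"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_nb(train_docs, train_labels)
probs = predict_proba(model, ["your", "email", "has", "won", "million"])
print(probs)
```

In a scorecard setting, the probability of the "high risk" class would be passed on as one more explanatory variable rather than used as a final decision.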

We were not able to implement this in our traditional system, due to computational speed and the ever-changing underlying websites.

Big Data appliance is built for such tasks
Input:
  Column 1: IP address
  Column 2: Timestamp of click
  Column 3: URL of page visited
  Column 4: Webpage text
Map():
  Key: IP address
  Value: timestamp
  Value: URL of the web page visited
  Value: probability of the web page being good or bad, based on the webpage text and Naïve Bayes
Shuffle and sort
Reduce():
  Key: IP address
  Value: list of probabilities of all the websites visited by the IP address, per user session defined by timestamps
Output:
  Least risky pages, most risky pages
  Least risky IP addresses, most risky IP addresses
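The map, shuffle and reduce stages listed on this slide can be imitated in plain Python to show the data flow. This is only a sketch under stated assumptions: `score_page` is a hypothetical stand-in for the Naïve Bayes page score, the 30-minute `SESSION_GAP` used to split sessions is an assumption (the slide only says sessions are defined by timestamps), and the click records are invented. A real job would run on the Hadoop appliance rather than in a single process.

```python
from collections import defaultdict

SESSION_GAP = 1800  # seconds between clicks that starts a new session (assumed)


def score_page(text):
    """Hypothetical stand-in for the Naive Bayes probability of a risky page."""
    risky = {"consolidation", "conversion"}
    words = text.lower().split()
    return sum(w in risky for w in words) / len(words) if words else 0.0


def map_click(record):
    """Map: emit the IP address as key, (timestamp, URL, page score) as value."""
    ip, ts, url, text = record
    return ip, (ts, url, score_page(text))


def shuffle(pairs):
    """Shuffle and sort: group values by key and order them by timestamp."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sorted(values) for key, values in groups.items()}


def reduce_sessions(ip, values):
    """Reduce: list of page scores per user session for one IP address."""
    sessions, last_ts = [], None
    for ts, _url, prob in values:
        if last_ts is None or ts - last_ts > SESSION_GAP:
            sessions.append([])
        sessions[-1].append(prob)
        last_ts = ts
    return ip, sessions


def run(clicks):
    grouped = shuffle(map_click(r) for r in clicks)
    return dict(reduce_sessions(ip, vals) for ip, vals in grouped.items())


# Invented click log: (IP address, timestamp in seconds, URL, webpage text).
clicks = [
    ("10.0.0.1", 0, "/loans/consolidation", "loan consolidation offer"),
    ("10.0.0.1", 600, "/fx", "currency conversion rates"),
    ("10.0.0.1", 10000, "/home", "welcome to the bank"),
    ("10.0.0.2", 50, "/home", "welcome to the bank"),
]
result = run(clicks)
print(result)
```

Ranking the IP addresses or pages by these per-session scores would then give the "least risky" and "most risky" lists named on the slide.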

No business case... Nobody wants to finance this just for one problem like this

Problem: come up with a data-driven model. Natural experiment data.

Why not champion-challenger?
It is costly
Reputation risk

We found 2 natural experiments
In 2007, for almost one month, we granted loans without considering current instalments
Until 2010 we were granting foreign exchange loans in some countries
Well, we thought we had found natural experiments...

We had deleted data, and scripts were not working
For the first experiment, data for 2007 were no longer available
For the second experiment, scripts were not working across a corridor

We managed to... retrieve data from old backups. The modelling itself was quite a success. We found our relationships.

So we set up a project called CRANE
Central place for model development, monitoring and validation
Unlimited data history, to utilise past crisis data for model development
Automated data load and post-rollout check to reduce operational problems

We needed cheap storage
[Chart: cost of a data warehouse appliance versus cost of a Hadoop appliance, for storage sizes from 10TB to 50TB]

Big Data Technology without Big Data

But this is how it got in: a business case based on cheap storage

But... How to connect it to production legacy systems? What is the regulatory environment?

Our environment is diverse
We expanded by buying banks
Different legacy IT systems
Central modelling team distributed across locations
Not enough storage in the enterprise data warehouse

EBA and ESMA published a draft regulation on Big Data
To find it, type "EBA ESMA big data" into Google
They seek comments until 17.3.2017
The people who wrote it have good insight into the industry

Main Takeaways of Risks
Transparent: Referring to other directives aimed at client ownership of the data and transparency of use
Security: Cybersecurity and data protection of Big Data solutions. Risk of outsourcing services
Reputation: Wrong decisions, difficult to control, exclusion of groups of clients from decision making
Conformist: Conformist behaviour of people when they know how their data influence etc.

Thank you for your attention
What is your experience with Data-mart projects?
What is your experience with Big Data usage?
Jozef Zubricky, jozef.zubricky@erstegroup.com, +43 664 818 2976