MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ)

Similar documents
Security analytics: From data to action Visual and analytical approaches to detecting modern adversaries

SentryWire Next generation packet capture and network security.

SentryWire Next generation packet capture and network security.

Transforming Security from Defense in Depth to Comprehensive Security Assurance

Security Gap Analysis: Aggregrated Results

Incident Response Services

Detect Cyber Threats with Securonix Proxy Traffic Analyzer

Deriving Network Traffic Signatures via Large Graphs

RSA INCIDENT RESPONSE SERVICES

Cisco Stealthwatch Improves Threat Defense with Network Visibility and Security Analytics

With turing you can: Identify, locate and mitigate the effects of botnets or other malware abusing your infrastructure

RSA INCIDENT RESPONSE SERVICES

White Paper. Why IDS Can t Adequately Protect Your IoT Devices

MACHINE LEARNING & INTRUSION DETECTION: HYPE OR REALITY?

Machine-Powered Learning for People-Centered Security

UNCLASSIFIED R-1 ITEM NOMENCLATURE FY 2013 OCO

The Mimecast Security Risk Assessment Quarterly Report May 2017

Comparison of Firewall, Intrusion Prevention and Antivirus Technologies

in PCI Regulated Environments

CYBERBIT P r o t e c t i n g a n e w D i m e n s i o n

VULNERABILITIES IN 2017 CODE ANALYSIS WEB APPLICATION AUTOMATED

RiskSense Attack Surface Validation for IoT Systems

Perimeter Defenses T R U E N E T W O R K S E C U R I T Y DEPENDS ON MORE THAN

Cisco ASA 5500 Series IPS Solution

UNCLASSIFIED R-1 ITEM NOMENCLATURE FY 2013 OCO

WHY SIEMS WITH ADVANCED NETWORK- TRAFFIC ANALYTICS IS A POWERFUL COMBINATION. A Novetta Cyber Analytics Brief

Privileged Account Security: A Balanced Approach to Securing Unix Environments

BUILDING A NEXT-GENERATION FIREWALL

Behavioral Analytics A Closer Look

Popular SIEM vs aisiem

SIEMLESS THREAT MANAGEMENT

RSA NetWitness Suite Respond in Minutes, Not Months

AUTOMATED SECURITY ASSESSMENT AND MANAGEMENT OF THE ELECTRIC POWER GRID

ISO COMPLIANCE GUIDE. How Rapid7 Can Help You Achieve Compliance with ISO 27002

STAY ONE STEP AHEAD OF THE CRIMINAL MIND. F-Secure Rapid Detection & Response

SIEM Solutions from McAfee

IPS with isensor sees, identifies and blocks more malicious traffic than other IPS solutions

AND FINANCIAL CYBER FRAUD INSTITUTIONS FROM. Solution Brief PROTECTING BANKING

SOLUTION BRIEF RSA NETWITNESS SUITE 3X THE IMPACT WITH YOUR EXISTING SECURITY TEAM

CYBER ANALYTICS. Architecture Overview. Technical Brief. May 2016 novetta.com 2016, Novetta

Ο ρόλος της τεχνολογίας στο ταξίδι της συμμόρφωσης με τον Γενικό Κανονισμό. Αντιγόνη Παπανικολάου & Νίκος Αναστόπουλος

COMPUTER FORENSICS (CFRS)

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Activating Intrusion Prevention Service

The Top 6 WAF Essentials to Achieve Application Security Efficacy

Database Infrastructure to Support Knowledge Management in Physicochemical Data - Application in NIST/TRC SOURCE Data System

Intrusion Detection by Combining and Clustering Diverse Monitor Data

Unlocking the Power of the Cloud

McAfee Embedded Control

IBM Next Generation Intrusion Prevention System

Introduction to Data Mining and Data Analytics

FFIEC Cybersecurity Assessment Tool

Digital Forensics Readiness PREPARE BEFORE AN INCIDENT HAPPENS

Intrusion Detection System using AI and Machine Learning Algorithm

Un SOC avanzato per una efficace risposta al cybercrime

Advanced Threat Intelligence to Detect Advanced Malware Jim Deerman

Analytics Driven, Simple, Accurate and Actionable Cyber Security Solution CYBER ANALYTICS

MITIGATE CYBER ATTACK RISK

Delay Injection for. Service Dependency Detection

Protecting Against Modern Attacks. Protection Against Modern Attack Vectors

EXABEAM HELPS PROTECT INFORMATION SYSTEMS

Automated Firewall Change Management Securing change management workflow to ensure continuous compliance and reduce risk

ENTERPRISE ENDPOINT PROTECTION BUYER S GUIDE

Management of IT Infrastructure Security by Establishing Separate Functional Area with Spiral Security Model

MOBILE DEFEND. Powering Robust Mobile Security Solutions

ARC VIEW. Critical Industries Need Continuous ICS Security Monitoring. Keywords. Summary. By Sid Snitkin

Financial Statement Generator Script

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

HUAWEI TECHNOLOGIES CO., LTD. Huawei FireHunter6000 series

THE EVOLUTION OF SIEM

8 Must Have. Features for Risk-Based Vulnerability Management and More

Machine Learning and Next-Generation Intrusion Prevention System (NGIPS)

Build a system health check for Db2 using IBM Machine Learning for z/os

1. Inroduction to Data Mininig

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

WHITEHAT SECURITY. T.C. NIEDZIALKOWSKI Technical Evangelist. DECEMBER 2012

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

ABB Ability Cyber Security Services Protection against cyber threats takes ability

Mechanisms for Database Intrusion Detection and Response. Michael Sintim - Koree SE 521 March 6, 2013.

Artificial Intelligence Drives the next Generation of Internet Security

Gain Control Over Your Cloud Use with Cisco Cloud Consumption Professional Services

From Correlation to Causation: Active Delay Injection for Service Dependency Detection

Example 1 Root Cause Analysis Report Focal Point: Loss of Productivity

Encrypted Traffic Security (ETS) White Paper

Problem Code: #ISR13. College Code :

OUTSMART ADVANCED CYBER ATTACKS WITH AN INTELLIGENCE-DRIVEN SECURITY OPERATIONS CENTER

Basic Concepts in Intrusion Detection

Compare Security Analytics Solutions

Table of Content Security Trend

Automating the Top 20 CIS Critical Security Controls

Analytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value Association

RSA Advanced Security Operations Richard Nichols, Director EMEA. Copyright 2015 EMC Corporation. All rights reserved. 1

Copyright 2013 EMC Corporation. All rights reserved. BIG DATA AND SECURITY JOINING FORCES

Machine Learning in Data Quality Monitoring

EU GENERAL DATA PROTECTION: TIME TO ACT. Laurent Vanderschrick Channel Manager Belgium & Luxembourg Stefaan Van Hoornick Technical Manager BeNeLux

rat Comodo Valkyrie Software Version 1.1 Administrator Guide Guide Version Comodo Security Solutions 1255 Broad Street Clifton, NJ 07013

Protecting Against Online Fraud. F5 EMEA Webinar August 2014

The Invisible Threat of Modern Malware Lee Gitzes, CISSP Comm Solutions Company

Use Cases. E-Commerce. Enterprise

RSA Enterprise Compromise Assessment Tool (ECAT) Date: January 2014 Authors: Jon Oltsik, Senior Principal Analyst and Tony Palmer, Senior Lab Analyst

Transcription:

MULTIVARIATE ANALYSIS OF STEALTH QUANTITATES (MASQ) Application of Machine Learning to Testing in Finance, Cyber, and Software Innovation center, Washington, D.C. THE SCIENCE OF TEST WORKSHOP 2017

AGENDA INTRODUCTION TO MASQ MULTIVARIATE ANALYSIS OF STEALTH QUANTITIES (MASQ) ALGORITHM OVERVIEW MULTIVARIATE ANALYSIS OF STEALTH QUANTITIES APPLIED TO CYBER SECURITY (MASQ C) MULTIVARIATE ANALYSIS OF STEALTH QUANTITIES APPLIED TO AUDIT TESTING (MASQ F) MULTIVARIATE ANALYSIS OF STEALTH QUANTITIES APPLIED TO SOFTWARE TESTING (MASQ S) 1

MASQ IS USED TO FIND RARE OBSERVATIONS IN LARGE COMPLEX DATA STEALTH QUANTITIES ARE DANGEROUS THREATS Stealth Quantities are anomalies that can easily hide among multiple variables, they are anomalies that are too rare or easily modified to easily train on Only by applying advanced analytics to all data under investigation simultaneously will Stealth Quantities be found AKA NEEDLE IN A HAYSTACK The most apt analogy for stealth quantities is the needle in a haystack. Stealth Quantities typically exist in randomly related data and defy conven onal efforts of discovery. Needle in the Haystack Stealth Quantities are hiding amongst normal observations exploiting the dimensions that exist in data to mask themselves as normal.

WHAT IS MASQ? Machine Intelligence MASQ is a proprietary algorithm that mines data for known and unknown signatures. Signatures are characteristic or distinctive patterns searchable within data. Signatures fall into two categories: known and unknown. Known signatures are generally well documented and have predefined properties or rules than make Supervised Learning by a machine straightforward. Unknown signatures exist within data but do not have predefined properties or rules making Supervised Learning impractical. Predicting the quantity of unknown signatures is possible, but not trivial to perform, and requires Unsupervised Learning techniques. MASQ makes use of both styles to find any anomaly in any data set. Discipline Agnostic Cloud Analytics MASQ works on any volume, variety, velocity, or veracity of data to map all data into distinct subpopulations from which errors or malicious behavior can be flagged. MASQ was created to address the kind of multivariate systems presented by an organization s cyber, financial, or simulation enterprises of today. Multivariate acknowledges the fact that today s enterprise systems deal with hundreds to thousands of variables that characterize financial transactions, experimental data, network traffic, etc. Predictive Anomaly Detection Stealth Quantities are unknown risks lurking within the large, complex datasets that defy many of today s methods of detection. Known risks (e.g., known viruses, past audit findings, or programming bugs) are generally well documented and have predefined properties or rules than make Supervised Learning by a machine straightforward. Unknown risks (e.g., new malware, new forms of fraud, etc.) exist within data streams but do not have predefined properties or rules making Supervised Learning, and traditional techniques, impractical or late to need. MASQ finds both known, current threats and new threats yet undiscovered.

HOW MASQ WORKS MASQ is a process that leverages machine learning techniques by combining both supervised and unsupervised machine learning techniques (hybrid machine learning) Unsupervised Learning means that you do not need to tell the program what to look for prior to analyzing the data; every time you examine data in MASQ it can find new types of rare events Supervised Learning captures SME feedback regarding the goodness or badness of rare event groups. MASQ then learns what rare events are acceptable and those that are not which streamlines repeated application of the software to additional data MASQ finds rare events through highly advanced clustering techniques Algorithms split Data into Groups using Hierarchical Clustering, Applies Weights, Splits again, until the Data is Separated into Distinct Populations

THE MASQ PROCESS Millions of Data Points Pass Through MASQ UNSUPERVISED LEARNING MASQ Filters and Splits Data Points into Representative Groups Centroids Classify Groups of Data Points SMEs Assess Impact of Rare Event Groups Threats are Simultaneously Corrected and Trained On SUPERVISED LEARNING Learn / Train Correct / Prevent

Use Case: Detecting Cyber Anomalies TITLE Detecting Cyber Attacks within Millions of Netflow Transmissions THE CHALLENGE Naval commands are often targets of sophisticated new forms of cyber attacks that traditional cyber security efforts are much more likely to misidentify as acceptable behavior. Six known attacks were injected into six portions of the client s network data to test MASQ. SQL Injection and Pass the Hash were considered difficult to detect by SMEs. OUR SOLUTION + Mine ALL of the Data: 13.8 million total transmissions were analyzed in minutes. MASQ operates by conducting multidimensional analysis across hundreds of variables simultaneously. + Improve Understanding: MASQ summarized all packets into 21,600 groups containing between 500 and 1,500 transmissions each + SQL Injections found in clusters 22, 113, and 7 + Pass the Hash found in clusters 45, 147, and 76 + Improve Efficiency: MASQ breaks data into clusters. This means that MASQ uses a small subset of data to identify thousands of problem points. Attacks were identified after analyzing less than 0.05% of the 13.8 million transmissions Transmissions Analyzed 6000 5000 4000 3000 2000 1000 0 151 Cumulative Transmissions Analyzed Slope due to larger volume of total transmissions 936 1852 4782 4844 5596 1 2 3 4 5 6 Attacks Detected Timestamp Attack Number of Transmissions Transmissions Analyzed 23 Jul 1000 SQL Injection 1,616,130 151 23 Jul 1100 Pass the Hash 2,912,504 785 24 Jul 1300 SQL Injection 3,699,647 916 25 Jul 1200 Pass the Hash 3,359,869 2930 24 Jul 0100 SQL Injection 1,112,536 62 24 Jul 0100 Pass the Hash 1,119,467 752 Total 13,820,153 5,596 6

Use Case: Find Accounting Anomalies for Audit Readiness TITLE Identification of Accounting Anomalies Within Financial Transactions THE CHALLENGE The Navy processes millions of transactions quarterly. The sheer volume of data makes it impossible for a human to determine anomalies leading to audit failure. The client was interested to determine if MASQ could find accounting behavior not known to SMEs and improve audit readiness. OUR SOLUTION + Mine ALL of the Data: The MASQ team ran analysis on the STARS-FL segment MILPAY and MILSTRIP independently of each other. The team also processed ERP general ledger data as well for FY15. + Improve Understanding: We built profiles of anomalous and normal transactions across all business lines so that SMEs are able to identify ALL existing anomalies and predict new anomalies. Key findings included: + A large number of credits and debits were discovered to have both positive negative and values. + Missing obligation document for extended period of time (over 30 days) + Document type mismatches. Quantity and Dollar mismatches, etc. + Improve Efficiency: MASQ breaks data into clusters. This means that MASQ uses a small subset of data to identify thousands of problem points. Visualization of Data clusters including credit/debit mismatches (dark red), missing debits (purple), document type mismatches (yellow), normal data (green) MILPAY MILSTRIP Time Period FYXX FYYY Q1 FYXX Q1 Transactions 778,807 7,536,054 Documents 37,684 54,210 1,288,224 Anomalies Found 87,824 3,984 336,712 Erorr Percentage 11.90% 7.30% 4.50% 7

Use Case: Find Simulation Anomalies for Software Validation THE CHALLENGE Batch simulations are highly complex computer software that create an operational environment to systematically test NextGen concepts. The volume, veracity, and variety of data produced by batch simulations makes the Verification & Validation (V&V) process very challenging. NASA needs a V&V process for batch simulation that will ensure success in rapid fashion. OUR SOLUTION NASA provided the MASQ team with 1.2 TB of PTM Batch Simulation. The data contained three errors known to the customer. The MASQ team isolated the three known errors within the data and found an additional six risks and ~40 high-priority anomalous patterns in the data. Reconciling these ~50 anomalies will give NASA assurance that experiment analysis and results publication can proceed. Further, because the anomalies are sub settable from the data, analysis teams are able to predict the power of test results and proceed with limited hypothesis testing. Constant Optimal Altitude Unclassified Risk Classifying a single yellow cluster will classify numerous other yellow clusters. Key Takeaway: Human inspection is only needed for a fraction of yellow clusters. Normal Flights Normal flights represent most of the data. High Time Flown Non Matching Aircraft Separation Or Failure to Climb Or No Response to Altitude Change Request Entangled flights without PTM Or PTM flight blocked continuously (ONOPT = 0)