Big Data & Security Analytics
David J. White
February 2016
© AlixPartners, LLP 2013
Introduction
David J. White
Director, Information Management Services, AlixPartners, LLP
+1.213.437.7147 | dwhite@alixpartners.com | www.alixpartners.com
David is a Director in the Data Privacy, Security, and Information Governance practice group at AlixPartners. His practice focuses on Information Lifecycle Governance, with particular emphasis on global data privacy and security. He has more than two decades of experience helping corporations implement compliant privacy and security programs. Before joining AlixPartners, he was a Partner in the Commercial Litigation practice group at a top AmLaw 100 law firm. David is a certified Six Sigma Green Belt, a Certified Information Privacy Professional (CIPP/E/US), and a registered US Patent Attorney. He also holds a U.S. Juris Doctor (J.D.) degree and an LL.M. from London University.
2015: 79,000 Security Incidents & 2,122 Confirmed Breaches
Increasingly Complex Environments Compound the Problem
- IoT
- BYOD
- Social Media
- Partner Integration
- Migration to Cloud and Hosted Environments
- Virtualization
- Encryption
Traditional Security Isn't Enough
- Virus, Malware, and Spyware Scanning
- Endpoint Security
- Personal Firewalls
- Spam Filtering
- URL Filtering
- Application Controls
- File Integrity Monitoring
- Intrusion Prevention Systems
- Secure Mail and Web Gateways
- Network Behavior Analysis
- Data Loss Prevention Systems
- Security Information and Event Management (SIEM)
Guarding the Perimeter Is No Longer Practical
A Simple Truth
No matter how hard you try or how much money your organization spends, your network will be compromised at some point, and it probably already has been.
Big Data: Big Solution?
What Is Big Data?
A New Ginormous Black-Box Data System That Is Scary. VERY Scary!
What Is Big Data?
Just Another Name for the Same Data Analytics We Have Always Done, Just with Bigger Data Sets. Why Worry!
What Is Big Data?
Four V's: Volume, Velocity, Variety, & Veracity
What Is Big Data?
A Computational Ecosystem Composed of Specialized File Systems, Programs, and Algorithms Used to Extract Actionable Intelligence from Disparate Data Sources Without the Need to Make the Source Data Conform to a Predefined Tabular Format of Columns and Rows
Big Data - A Short History
- Massive Growth in Data in the Past Decade
- Big Data Built to Address the Need to Process Massive Amounts of Data Very Quickly
- Google Wanted to Index the Internet
- Input Coming from a Large Diversity of Sources and Formats
- Traditional Data Systems Not Equipped to Handle the Volume or Processing Speed
What's Wrong with Traditional RDBMS?
- Hardware Is Slow and Expensive
- Data Must Be Uniform and Made to Conform to Predefined Structures Before Loading
- Requires Predefined Relationships Between Data Objects/Elements
- Query Language Not Very Flexible or Powerful
- Output Is Slow
Big Data Is Born: 2005
- Hadoop Designed to Address These Exact Issues
- Not Just a Redundant Array of Independent Disks (RAID)
- A New File System Built to Access Massive Arrays of Raw-Format Data Files
- No Need to Cram the Internet into Massive Structured Data Tables
- More Akin to a Book Index Broken Across Large Numbers of Small Files
- Google Is Now More Than 1 Million Petabytes in Size and Processes More Than 24 Petabytes of Data a Day
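The "book index broken across many small files" idea can be illustrated with a toy, single-process version of the map/shuffle/reduce pattern that Hadoop popularized. This is only a sketch under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the sample log snippets are hypothetical, and a real Hadoop job would run each phase in parallel across a cluster rather than inside one Python process.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for every alphanumeric token in one document."""
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group every emitted value under its key (the term)."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(grouped):
    """Reduce: turn each term's values into a sorted, de-duplicated posting list."""
    return {term: sorted(set(doc_ids)) for term, doc_ids in grouped.items()}


def build_inverted_index(documents):
    """Run map -> shuffle -> reduce over a {doc_id: text} mapping."""
    pairs = (pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical log snippets standing in for the many small files on a cluster.
    docs = {
        "log-001": "failed login from 203.0.113.7",
        "log-002": "successful login from 198.51.100.4",
        "log-003": "failed password reset",
    }
    index = build_inverted_index(docs)
    print(index["failed"])  # ['log-001', 'log-003']
    print(index["login"])   # ['log-001', 'log-002']
```

Because each map call touches only one document and each reduce call touches only one term, both phases can be split across machines; only the shuffle needs to move data between them.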
Key Technical Differences: File-Level Differences
Structured Relational Database vs. a Large Array of Unstructured Data
Key Technical Differences: Application Layer Output
What Does This Mean? ETL vs. ELT
ETL (Extract, Transform, Load): Traditional RDBMS
- Predefined requirements
- Limited data sets based on needs
- Preplanned updates
- Transformation driven by business rules & requirements
- Significant cleansing & validation
- Data normalized to fit a predefined structure
- Clean, normalized data loaded to the system
ELT (Extract, Load, Transform): Big Data System
- Extract theoretically ALL source data, including data with no yet known purpose
- Load all data, as bulk data sets or live streams, whenever available & ready
- No cleansing or normalization needed before loading
- Transform only as needed during analysis
Schema on Write vs. Schema on Read
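The schema-on-write vs. schema-on-read contrast can be sketched in a few lines of Python. This is a minimal illustration, not how any particular RDBMS or Big Data platform is implemented: the `SCHEMA` columns, the sample records, and the helper names `schema_on_write` and `schema_on_read` are all hypothetical. The ETL path validates and coerces every record before storing it and rejects anything that does not fit; the ELT path stores raw JSON lines untouched and imposes structure only when a query asks for specific fields.

```python
import json

# Hypothetical predefined schema for the ETL / schema-on-write path.
SCHEMA = {"src_ip": str, "dst_ip": str, "bytes": int}


def schema_on_write(raw_records, table):
    """ETL-style load: validate and coerce each record to SCHEMA *before* storing.
    Records that do not fit the predefined columns are rejected, not stored."""
    rejected = []
    for rec in raw_records:
        try:
            row = {col: typ(rec[col]) for col, typ in SCHEMA.items()}
        except (KeyError, ValueError, TypeError):
            rejected.append(rec)
            continue
        table.append(row)
    return rejected


def schema_on_read(raw_lines, fields):
    """ELT-style query: raw JSON lines were loaded untouched; structure is applied
    only now, by projecting whatever fields this analysis happens to need."""
    for line in raw_lines:
        rec = json.loads(line)
        yield {f: rec.get(f) for f in fields}


if __name__ == "__main__":
    records = [
        {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "bytes": "5120"},
        {"src_ip": "10.0.0.6", "user_agent": "curl/8.0"},  # missing dst_ip and bytes
    ]

    table = []
    rejected = schema_on_write(records, table)
    print(table)     # [{'src_ip': '10.0.0.5', 'dst_ip': '203.0.113.9', 'bytes': 5120}]
    print(rejected)  # [{'src_ip': '10.0.0.6', 'user_agent': 'curl/8.0'}]

    raw_store = [json.dumps(r) for r in records]  # ELT: keep everything as-is
    for row in schema_on_read(raw_store, ["src_ip", "user_agent"]):
        print(row)
    # {'src_ip': '10.0.0.5', 'user_agent': None}
    # {'src_ip': '10.0.0.6', 'user_agent': 'curl/8.0'}
```

Note that the `user_agent` field, which the write-time schema would have discarded, is still available to the read-time query; that is the practical payoff of loading data "with no yet known purpose."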
Impact on Security
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
How It Works for Security Analytics
- The Computer Network Equivalent of a Closed-Circuit Security Camera System
- Always On, 24/7
- Captures and Analyzes Data (Including Packet Headers and Payloads, OSI Layers 2 Through 7) at Wire Speed
- Provides a Complete, Forensically Sound Record of All Network Activity
- Real-Time & Back-in-Time Analysis of Files, Applications, Flows, and Packets
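To make "captures and analyzes packet headers and payloads" concrete, the sketch below decodes the fixed 20-byte IPv4 header (layer 3) from raw bytes using Python's standard `struct` module. It is an illustration only: the sample packet is hand-built rather than captured, the helper name `parse_ipv4_header` is hypothetical, options and checksum validation are ignored, and a production capture system would read packets from a capture library or appliance at wire speed rather than parse them one at a time in Python.

```python
import socket
import struct


def parse_ipv4_header(packet):
    """Decode the fixed 20-byte IPv4 header (RFC 791) and slice out the payload.
    The header checksum is read but NOT verified in this sketch."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    version = ver_ihl >> 4
    header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if version != 4:
        raise ValueError(f"not an IPv4 packet (version={version})")
    return {
        "header_len": header_len,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,  # e.g. 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "payload": packet[header_len:total_len],
    }


if __name__ == "__main__":
    # Hand-built sample packet: 20-byte header followed by a 5-byte payload.
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 25, 1, 0, 64, 6, 0,
        socket.inet_aton("10.0.0.5"), socket.inet_aton("203.0.113.9"),
    )
    info = parse_ipv4_header(header + b"hello")
    print(info["src"], "->", info["dst"], info["protocol"], info["payload"])
    # 10.0.0.5 -> 203.0.113.9 6 b'hello'
```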
Big Data and Security - Typical Internal Sources
- All IP traffic flowing across your network, including web traffic, email, file transfers, and IoT traffic
- Network flow records (such as NetFlow, cflow, J-Flow, and sFlow) from network routers and switches
- VM-to-VM (virtual machine to virtual machine) IP traffic on VMware, Xen, and other virtualization platforms
- User account directories, such as Microsoft Active Directory and LDAP
- Detonation and behavioral analysis result feeds from malware analysis appliances
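Flow records are one of the cheapest internal sources to analyze, because they summarize conversations instead of storing every packet. The sketch below assumes a simplified, hypothetical `FlowRecord` type (its field names are illustrative, loosely modeled on what NetFlow-style exporters report, and are not any exporter's actual format) and computes two common summaries: top talkers by bytes sent, and per-source fan-out (how many distinct hosts each source contacted).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Simplified flow record. Field names are hypothetical, loosely modeled on
    what NetFlow-style exporters report (addresses, port, protocol, counters)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    byte_count: int
    packet_count: int


def top_talkers(flows, n=3):
    """Sources ranked by total bytes sent across all of their flows."""
    totals = Counter()
    for f in flows:
        totals[f.src_ip] += f.byte_count
    return totals.most_common(n)


def fan_out(flows):
    """Number of distinct destination hosts each source contacted."""
    destinations = defaultdict(set)
    for f in flows:
        destinations[f.src_ip].add(f.dst_ip)
    return {src: len(dsts) for src, dsts in destinations.items()}


if __name__ == "__main__":
    flows = [
        FlowRecord("10.0.0.5", "203.0.113.9", 443, "tcp", 120_000, 90),
        FlowRecord("10.0.0.5", "198.51.100.2", 443, "tcp", 30_000, 25),
        FlowRecord("10.0.0.7", "203.0.113.9", 53, "udp", 400, 4),
    ]
    print(top_talkers(flows, n=1))  # [('10.0.0.5', 150000)]
    print(fan_out(flows))           # {'10.0.0.5': 2, '10.0.0.7': 1}
```

A sudden jump in either number for a single host is the kind of signal that later feeds the anomaly checks used during an investigation.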
Big Data and Security - Typical External Sources
- Cyberthreat and reputation feeds, such as Emerging Threats, Google Safe Browsing, Malware Domain List, SANS Internet Storm Center, SORBS (Spam and Open Relay Blocking System), VirusTotal, and other spam or IP address blacklists
- IP geolocation services, such as Digital Envoy, Geobytes, MaxMind, and Quova
- Website intelligence services, such as DomainTools, Robtex, and the global domain registry database
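Reputation feeds are typically consumed by matching observed hostnames (for example, from DNS or proxy logs) against a published list. The sketch below assumes a simple plain-text feed format (one domain per line, `#` for comments), which is a hypothetical format and not the documented format of any feed named above. It matches parent domains too, so a listed `malware.example` also flags `cdn.malware.example`. The sample domains use the `.example` and `.test` TLDs, which are reserved for documentation and testing (RFC 2606).

```python
def load_blocklist(lines):
    """Parse a plain-text domain blocklist: one domain per line, '#' starts a comment."""
    domains = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower().rstrip(".")
        if entry:
            domains.add(entry)
    return domains


def is_blocked(hostname, blocklist):
    """True if the hostname or any parent domain is listed, so an entry for
    'malware.example' also matches 'cdn.malware.example'."""
    labels = hostname.strip().lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    # Hypothetical feed excerpt using reserved documentation TLDs.
    feed = [
        "# hypothetical reputation feed",
        "malware.example",
        "phish.test   # inline comment",
    ]
    blocklist = load_blocklist(feed)
    for host in ["cdn.malware.example", "malware.example.org", "PHISH.TEST."]:
        print(host, is_blocked(host, blocklist))
    # cdn.malware.example True
    # malware.example.org False
    # PHISH.TEST. True
```

Matching suffixes label by label, rather than with a raw string `endswith`, avoids false positives such as `notmalware.example` matching an entry for `malware.example`.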
External Source Use - Geolocation Analytics
Geolocation is the practice of assessing the real-world location of an Internet-connected computer or device. Geolocation integration enables users to view the origin, destination, and flow of network traffic.
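IP geolocation data is usually distributed as a table of network prefixes mapped to locations, and lookups use longest-prefix matching so that a more specific range overrides a broader one. The sketch below uses Python's standard `ipaddress` module with a small, entirely hypothetical `GEO_TABLE` built on documentation address ranges and placeholder location names; it does not reflect the data or API of any geolocation provider named above. The helper names `geolocate` and `annotate_flow` are also hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-location table built on documentation address ranges.
# A real deployment would load data from a geolocation provider instead.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "Location A",
    ipaddress.ip_network("203.0.113.128/25"): "Location B",
    ipaddress.ip_network("198.51.100.0/24"): "Location C",
}


def geolocate(ip, table=GEO_TABLE) -> Optional[str]:
    """Longest-prefix match: the most specific network containing ip wins."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def annotate_flow(src, dst, table=GEO_TABLE):
    """Attach estimated origin and destination locations to a single flow."""
    return {
        "src": src, "src_loc": geolocate(src, table) or "unknown",
        "dst": dst, "dst_loc": geolocate(dst, table) or "unknown",
    }


if __name__ == "__main__":
    print(geolocate("203.0.113.200"))  # Location B (the /25 beats the /24)
    print(geolocate("203.0.113.5"))    # Location A
    print(annotate_flow("10.0.0.5", "198.51.100.7"))
    # {'src': '10.0.0.5', 'src_loc': 'unknown', 'dst': '198.51.100.7', 'dst_loc': 'Location C'}
```

The linear scan keeps the example short; a large production table would use a sorted range index or a prefix trie instead.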
Security Analytics Key Uses
- Incident Response and Forensics
- Situational Awareness
- Cyber Threat Detection
- Data Loss Monitoring and Analysis
- Verification of an Organization's Policy Compliance
- Security Assurance (Always-On Verification of the Effectiveness of Your Other Security Tools)
- Endpoint Behavior Monitoring
Big Data Security Example: CSIRT Investigation
- Do we know who did this to us?
- How did they do it?
- What systems were compromised?
- Can we be sure that the attack is over?
- How can we be sure that it won't happen again?
CSIRT Investigations
Before an Attack
- Gain situational awareness. The system helps set a baseline and familiarizes you with the types of traffic on your network so that you can recognize out-of-the-ordinary communications.
- Reduce your network's attack surface. The system identifies applications, communications, and operating systems that pose a security risk and/or aren't approved for use in your organization.
Distinguish Normal from Abnormal
During an Attack
- Detect the threat. Identify anomalous communications, such as an internal host connecting to an outside host for unusually long periods, an internal host transmitting an abnormally large amount of data, or an end-user host (desktop or laptop) communicating with other end-user hosts rather than with servers.
- Identify rogue hosts. Rogue hosts (computers planted inside the organization for nefarious reasons) stand out because their operating systems and/or applications fall clearly outside the parameters set by your IT department.
- Quarantine the threat. Identify other hosts that may have been compromised so that you can quarantine them for remediation.
After an Attack
- Verify attack termination. Confirm that the attack has ended and determine whether any lingering threats need to be remediated.
- Confirm exfiltrated data. Determine the scope and extent of the data breach.
- Identify the root cause. Understand exactly how the breach happened so that you can ensure it doesn't happen again.
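The three "detect the threat" examples above can each be expressed as a simple rule over connection summaries. The sketch below is one possible approach, not the method of any specific product: it uses a plain z-score against a host's own history to flag abnormally large outbound transfers, a duration cutoff for long internal-to-external sessions, and a set of known workstation addresses to flag end-user-to-end-user traffic. The `Connection` type, the internal address range `INTERNAL_NETS`, and both thresholds are hypothetical values chosen for illustration and would need tuning to a real network's baseline.

```python
import ipaddress
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical internal address space and thresholds; tune to your own network.
INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]
MAX_EXTERNAL_SESSION_S = 4 * 3600
Z_THRESHOLD = 3.0


@dataclass
class Connection:
    """Hypothetical per-connection summary; field names are illustrative."""
    src: str
    dst: str
    duration_s: float
    bytes_out: int


def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def is_large_transfer(history, today, z_threshold=Z_THRESHOLD):
    """Flag today's outbound byte count if it sits far above this host's baseline."""
    if len(history) < 2:
        return False  # not enough history to form a baseline yet
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > z_threshold


def long_external_sessions(conns, max_s=MAX_EXTERNAL_SESSION_S):
    """Internal host talking to an outside host for an unusually long time."""
    return [c for c in conns
            if is_internal(c.src) and not is_internal(c.dst) and c.duration_s > max_s]


def workstation_to_workstation(conns, workstations):
    """End-user hosts talking to other end-user hosts instead of to servers."""
    return [c for c in conns if c.src in workstations and c.dst in workstations]


if __name__ == "__main__":
    baseline = [50_000, 52_000, 48_000, 51_000, 49_000]  # daily bytes out, one host
    print(is_large_transfer(baseline, 900_000))  # True
    print(is_large_transfer(baseline, 51_500))   # False

    workstations = {"10.0.1.10", "10.0.1.11"}
    conns = [
        Connection("10.0.1.10", "203.0.113.50", 6 * 3600, 2_000_000),
        Connection("10.0.1.10", "10.0.1.11", 30, 4_096),
        Connection("10.0.1.11", "10.0.2.5", 120, 10_000),
    ]
    print([c.dst for c in long_external_sessions(conns)])  # ['203.0.113.50']
    print([c.dst for c in workstation_to_workstation(conns, workstations)])  # ['10.0.1.11']
```

Each rule only works as well as the baseline behind it, which is why the "before an attack" steps (learning what normal traffic looks like) come first.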
Tips for Getting Started
AlixPartners is ready to field a team of relevant experts whenever and wherever they are needed. Our professionals speak more than 50 languages and have experience in every corner of the world. Call us. We'll be there when it really matters.
© AlixPartners, LLP, 2012