Hadoop Security. Building a fence around your Hadoop cluster. Lars Francke June 12, Berlin Buzzwords 2017

Similar documents
HDP Security Overview

HDP Security Overview

Important Notice Cloudera, Inc. All rights reserved.

Enabling Secure Hadoop Environments

Important Notice Cloudera, Inc. All rights reserved.

ISILON ONEFS WITH HADOOP KERBEROS AND IDENTITY MANAGEMENT APPROACHES. Technical Solution Guide

How to Run the Big Data Management Utility Update for 10.1

Configuring Ports for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2

Administration 1. DLM Administration. Date of Publish:

Installation 1. DLM Installation. Date of Publish:

Securing the Oracle BDA - 1

Installing SmartSense on HDP

Administration 1. DLM Administration. Date of Publish:

Getting Started 1. Getting Started. Date of Publish:

Hortonworks University. Education Catalog 2018 Q1

Configuring Sqoop Connectivity for Big Data Management

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

IMPLEMENTING HTTPFS & KNOX WITH ISILON ONEFS TO ENHANCE HDFS ACCESS SECURITY

Upgrading Big Data Management to Version Update 2 for Hortonworks HDP

Hortonworks SmartSense

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

IBM BigInsights Security Implementation: Part 1 Introduction to Security Architecture

Hortonworks Data Platform

Release Notes 1. DLM Release Notes. Date of Publish:

Document Type: Best Practice

Certified Big Data and Hadoop Course Curriculum

Innovatus Technologies

Apache Ranger User Guide

Hortonworks and The Internet of Things

Hortonworks Data Platform

Security 3. NiFi Authentication. Date of Publish:

Big Data Hadoop Stack

Oracle BDA: Working With Mammoth - 1

Hadoop. Introduction / Overview

Informatica Big Data Management HotFix 1. Big Data Management Security Guide

Introduction to Cloudbreak

Pre-Installation Tasks Before you apply the update, shut down the Informatica domain and perform the pre-installation tasks.

Knox Implementation with AD/LDAP

Cloudera Installation

Important Notice Cloudera, Inc. All rights reserved.

Oracle Big Data Fundamentals Ed 1

Configuring and Deploying Hadoop Cluster Deployment Templates

A Reference Architecture for Securing Big Data Infrastructures. July

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Informatica Cloud Spring Complex File Connector Guide

MapR Enterprise Hadoop

Informatica Cloud Spring Hadoop Connector Guide

Cloudera Installation

Hortonworks Data Platform

New Features and Enhancements in Big Data Management 10.2

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Hortonworks SmartSense

KNIME Extension for Apache Spark Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on )

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi

Hortonworks Data Platform

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

How to Install and Configure EBF16193 for Hortonworks HDP 2.3 and HotFix 3 Update 2

KNIME Extension for Apache Spark Installation Guide

Configuring Apache Knox SSO

Oracle Public Cloud Machine

Hortonworks Data Platform

1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

SOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera

HDInsight > Hadoop. October 12, 2017

Oracle Cloud Using Oracle Big Data Cloud Service. Release

Big Data Architect.

Certified Big Data Hadoop and Spark Scala Course Curriculum

Informatica Big Data Management Big Data Management Administrator Guide

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Hortonworks DataPlane Service (DPS)

CSE 444: Database Internals. Lecture 23 Spark

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Services: Monitoring and Logging. 9/16/2018 IST346: Info Tech Management & Administration 1

Datameer Big Data Governance. Bringing open-architected and forward-compatible governance controls to Hadoop analytics

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Xcalar Installation Guide

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

CIS 612 Advanced Topics in Database Big Data Project Lawrence Ni, Priya Patil, James Tench

Acronis and Acronis Secure Zone are registered trademarks of Acronis International GmbH.

Data encryption & security. An overview

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1

YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa

Best Practices and Performance Tuning on Amazon Elastic MapReduce

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

Chase Wu New Jersey Institute of Technology

docs.hortonworks.com

Security and Performance advances with Oracle Big Data SQL

Oracle 1Z Oracle Big Data 2017 Implementation Essentials.

Configuring Intelligent Streaming 10.2 For Kafka on MapR

Connect the Appliance to a Cisco Cloud Web Security Proxy

Configuring Hadoop Security with Cloudera Manager

Cloudera Manager Quick Start Guide

Exam Questions

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

IBM Secure Proxy. Advanced edge security for your multienterprise. Secure your network at the edge. Highlights

SmartSense Configuration Guidelines

Hortonworks Data Platform

Transcription:

Hadoop Security Building a fence around your Hadoop cluster Lars Francke June 12, 2017 Berlin Buzzwords 2017

Introduction

About me - Lars Francke Partner & Co-Founder at OpenCore Before that: EMEA Hadoop Consultant Hadoop since 2008/2009 Apache Committer: Hive, ORC Contact: lars.francke@opencore.com @lars_francke 2017 OpenCore GmbH & Co. KG 1/50

Overview

Overview - Be Warned? 2017 OpenCore GmbH & Co. KG 2/50

The Book Source: oreilly.com 2017 OpenCore GmbH & Co. KG 3/50

Other Books Source: oreilly.com 2017 OpenCore GmbH & Co. KG 4/50

Other Sources Kerberos: The Definitive Guide 1 Kerberos (German) 2 Active Directory, 5th Edition 3 Hadoop and Kerberos: The Madness beyond the Gate 4 HBase: The Definitive Guide, 2nd Edition 5 Bulletproof SSL and TLS 6 1 http://shop.oreilly.com/product/9780596004033.do 2 http://www.kerberos-buch.de/ 3 http://shop.oreilly.com/product/0636920023913.do 4 https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details 5 http://shop.oreilly.com/product/0636920033943.do 6 https://www.feistyduck.com/books/bulletproof-ssl-and-tls/ 2017 OpenCore GmbH & Co. KG 5/50

Project Management How is a typical project structured and where does Hadoop security come into play? 2017 OpenCore GmbH & Co. KG 6/50

Project Management Idea/Initiation 2017 OpenCore GmbH & Co. KG 7/50

Project Management Idea/Initiation Planning/Design 2017 OpenCore GmbH & Co. KG 7/50

Project Management Idea/Initiation Planning/Design Execution/Implementation 2017 OpenCore GmbH & Co. KG 7/50

Project Management Idea/Initiation Planning/Design Execution/Implementation Production 2017 OpenCore GmbH & Co. KG 7/50

When To Start You can not begin thinking about Security too early! 2017 OpenCore GmbH & Co. KG 8/50

Next steps Let s dig into the details of each step in a project 2017 OpenCore GmbH & Co. KG 9/50

Project Planning

What To Think About? What am I going to use the cluster for? 2017 OpenCore GmbH & Co. KG 10/50

What To Think About? What am I going to use the cluster for? Which tools am I going to need to achieve the use-cases? 2017 OpenCore GmbH & Co. KG 10/50

What To Think About? What am I going to use the cluster for? Which tools am I going to need to achieve the use-cases? What kind of data am I going to ingest, store or process? 2017 OpenCore GmbH & Co. KG 10/50

What To Think About? What am I going to use the cluster for? Which tools am I going to need to achieve the use-cases? What kind of data am I going to ingest, store or process? What kind of corporate guidelines exist that must be followed? 2017 OpenCore GmbH & Co. KG 10/50

Do Not Assume Anything Do not assume companies/people know what a product is capable of just because they seem confident. 2017 OpenCore GmbH & Co. KG 11/50

What does Security even mean? What is part of Hadoop Security? 2017 OpenCore GmbH & Co. KG 12/50

Parts Of A Security Concept Authentication of users 2017 OpenCore GmbH & Co. KG 13/50

Parts Of A Security Concept Authentication of users Authorization of users 2017 OpenCore GmbH & Co. KG 13/50

Parts Of A Security Concept Authentication of users Authorization of users Auditing 2017 OpenCore GmbH & Co. KG 13/50

Parts Of A Security Concept Authentication of users Authorization of users Auditing Data Protection: Encryption on-the-wire 2017 OpenCore GmbH & Co. KG 13/50

Parts Of A Security Concept Authentication of users Authorization of users Auditing Data Protection: Encryption on-the-wire Data Protection: Encryption at-rest 2017 OpenCore GmbH & Co. KG 13/50

Project Execution

Kerberos == Hadoop Security? So, does Hadoop Security mean I need to enable Kerberos and I m done? 2017 OpenCore GmbH & Co. KG 14/50

Kerberos == Hadoop Security? So, does Hadoop Security mean I need to enable Kerberos and I m done? No! Far from it! 2017 OpenCore GmbH & Co. KG 14/50

Overview Before you re able to secure anything 2017 OpenCore GmbH & Co. KG 15/50

Overview Before you re able to secure anything something must be running that can be secured. 2017 OpenCore GmbH & Co. KG 15/50

Overview - contd. 2017 OpenCore GmbH & Co. KG 16/50

Network 2017 OpenCore GmbH & Co. KG 17/50

Host/Operating System SELinux NTP/Chrony Firewalls Antivirus Proxies 2017 OpenCore GmbH & Co. KG 18/50

Hadoop Installation You usually install a cluster manager first 2017 OpenCore GmbH & Co. KG 19/50

Hadoop Installation You usually install a cluster manager first Then you install Agents 2017 OpenCore GmbH & Co. KG 19/50

Hadoop Installation You usually install a cluster manager first Then you install Agents Followed by a distribution with lots of components 2017 OpenCore GmbH & Co. KG 19/50

Authentication Core Hadoop requires Kerberos for strong authentication Multiple choices on how to implement that: Cluster-local standalone KDC Cluster-local standalone KDC with one-way cross-realm trust to a central KDC Direct integration into a central KDC 2017 OpenCore GmbH & Co. KG 20/50

Authentication: Cloudera Manager/Ambari Wizards Source: unsplash.com by Sirotorn Sumpunkulpak 2017 OpenCore GmbH & Co. KG 21/50

Authentication - contd. What you need: Names/IPs for your KDC Supported Encryption Types (see my blog post 7 for more details) Firewall must allow all cluster machines to access the KDC Potentially a bunch of information to configure krb5.conf properly 7 http://www.opencore.com/blog/2017/3/kerberos-encryption-types 2017 OpenCore GmbH & Co. KG 22/50

Authentication - contd. Ideally you want the automatic option, but... You need an account in your AD/KDC that is allowed to create other accounts! (AD only) You need to talk via LDAPS and make sure that you have all the truststores set up properly (AD only) You cannot have multiple SPNs per User For SPNEGO you need a HTTP/<host> principal, sometimes doesn t match corporate guidelines 2017 OpenCore GmbH & Co. KG 23/50

Authentication - contd. There s a manual option 2017 OpenCore GmbH & Co. KG 24/50

Authentication - contd. Kerberos is only part of the authentication story. 2017 OpenCore GmbH & Co. KG 25/50

Authentication - contd. Knox, Cloudera Manager, Ambari, Ranger, Sentry, Hive, Impala etc. all have their own authentication layers which also support LDAP(S) or other mechanisms. 2017 OpenCore GmbH & Co. KG 26/50

Identity Management Your users need to exist on the workers so that YARN can start jobs using setuid(). 2017 OpenCore GmbH & Co. KG 27/50

Identity Management - contd. This is usually done using a tool like SSSD (free) or Centrify (commercial) 2017 OpenCore GmbH & Co. KG 28/50

Identity Management - contd. This is usually done using a tool like SSSD (free) or Centrify (commercial) When using Centrify, configure it to not create the default HTTP principal for each server 2017 OpenCore GmbH & Co. KG 28/50

Identity Management - contd. This is usually done using a tool like SSSD (free) or Centrify (commercial) When using Centrify, configure it to not create the default HTTP principal for each server This does not mean that your users need to be able to SSH into the workers, quite the opposite 2017 OpenCore GmbH & Co. KG 28/50

Identity Management - contd. You need to have the details needed to fetch user & group information from LDAP(S) Some solutions require a domain join 2017 OpenCore GmbH & Co. KG 29/50

Impersonation/Proxy Users 2017 OpenCore GmbH & Co. KG 30/50

Authorization All tools have some form of authorization built-in Core Hadoop components have a first line defense: Service Level Authorizations Ranger & Sentry promise cross-cutting RBAC functionality 2017 OpenCore GmbH & Co. KG 31/50

Data Protection: Encryption At-Rest HDFS Transparent Encryption All I need is a KMS, right? 2017 OpenCore GmbH & Co. KG 32/50

Data Protection: Encryption At-Rest - contd. KMS should be close to your clients and NameNode KMS should be separately administered AFAIK only Hadoop 3 will allow the / (root) directory to be its own zone and have sub-zones Secure access to the backing store & host where KMS runs You can use a HSM What about Impersonation and things like Knox or HttpFS? 2017 OpenCore GmbH & Co. KG 33/50

Data Protection: Encryption At Rest - contd. Now that my data in HDFS is secure I m good, right? Metadata in databases Loq & Audit files YARN localized stuff Temporary files (Spark) spill files & cached data Only Cloudera has a (paid) solution for this: Cloudera NavEncrypt 2017 OpenCore GmbH & Co. KG 34/50

Data Protection/Authorization: Data Masking Static vs. dynamic masking Ranger & BigSQL support masking, Sentry does not RecordService might be a potential integration point 3rd party tools 2017 OpenCore GmbH & Co. KG 35/50

Data Protection: Encryption On-The-Wire Web Interfaces REST/Thrift Interfaces MapReduce Shuffle Spark Shuffle (only in 2.1+) Server to Agent communication Ingest/Egress tools (Flume, Informatica, Kafka, ) RPC HDFS Data traffic Traffic from your YARN apps 2017 OpenCore GmbH & Co. KG 36/50

Data Protection: Encryption On-The-Wire 2-way TLS possible? Which cipher suites are supported? Some tools automatically disable HTTP others don t! Missing documentation all around 2017 OpenCore GmbH & Co. KG 37/50

Auditing & Logging No documentation (on auditing events and lots of other things) Ranger aggregates audit logs, async by default Cloudera has Navigator No integrity protection Need to protect against admins 2017 OpenCore GmbH & Co. KG 38/50

3rd Party Tools Not a single product has good documentation How do they access the cluster? Do they themselves authenticate and authorize users? How? Auditing/Logging? 2017 OpenCore GmbH & Co. KG 39/50

Production

What now? So you ve finished the project and handed it over to the operations team what now? 2017 OpenCore GmbH & Co. KG 40/50

Keep it running Plan regular updates to your OS Use a Configuration Management tool (e.g. Ansible) Plan regular updates to your distribution! 2017 OpenCore GmbH & Co. KG 41/50

Monitoring All our logging & auditing is irrelevant if you re not monitoring & alerting on those logs. 2017 OpenCore GmbH & Co. KG 42/50

Environments You need a place where you can test upgrades. You might also need a place for backups. Devs would like a cluster as well 2017 OpenCore GmbH & Co. KG 43/50

Environments 2017 OpenCore GmbH & Co. KG 44/50

Environments 2017 OpenCore GmbH & Co. KG 45/50

User Lifecycle What happens when a user leaves your organization? Caches Ranger Usersync Existing sessions Shared accounts 2017 OpenCore GmbH & Co. KG 46/50

Misc

Cloud? What about the cloud providers? 2017 OpenCore GmbH & Co. KG 47/50

Distributions Cloudera 2017 OpenCore GmbH & Co. KG 48/50

Distributions Cloudera Hortonworks 2017 OpenCore GmbH & Co. KG 48/50

Distributions Cloudera Hortonworks IBM 2017 OpenCore GmbH & Co. KG 48/50

Distributions Cloudera Hortonworks IBM Microsoft HDInsight 2017 OpenCore GmbH & Co. KG 48/50

Thank You! Thank you for listening! 2017 OpenCore GmbH & Co. KG 49/50

Questions & Contact Questions? Contact me at: lars.francke@opencore.com @lars_francke Visit us at opencore.com And if this stuff interests you: We re looking to expand our team! 2017 OpenCore GmbH & Co. KG 50/50