Enabling Universal Authorization Models using Sentry


Enabling Universal Authorization Models using Sentry. Hao Hao - hao.hao@cloudera.com, Anne Yu - anneyu@cloudera.com. Vancouver, BC, Canada, May 9-12, 2016

About us: Software engineers at Cloudera; Apache Sentry PMC members and committers. Hao previously worked on the search backend at eBay Inc. Anne previously worked on the search backend at A9, an Amazon subsidiary.

Presentation Agenda: Sentry overview (introduction, architecture). Sentry generic authorization model. Motivation: easy integration with Apache data engines, and even third-party data applications; successfully integrated with Apache Solr, Kafka and Sqoop2. Integration examples. Other Sentry features and future work.

Sentry Overview: Authorization Service. Sentry provides the ability to enforce role-based access control (RBAC) over data and/or metadata for authenticated users in a fine-grained manner. Enterprise-grade big data security. Provides unified policy management. Pluggable and highly modular. Works out of the box with Apache Hive, the Hive Metastore/HCatalog, Apache Solr, Apache Kafka, Apache Sqoop and Apache Impala.

Sentry Architecture: Hive, Solr and Sqoop each embed a hook with a Sentry client plugin that talks to the Sentry Server (the policy metadata store) over Thrift APIs. Server-client model: the Thrift client APIs get privileges, grant/revoke roles, grant/revoke privileges, and list roles.

Sentry Architecture (layered view): requests from Hive, Solr and Sqoop enter the access binding layer (DB, Solr and Sqoop policy engines), which sits on the authorization provider and its provider backend; the policy metadata store is a local file, HDFS, or the Sentry database.

Binding layer: takes authorization requests in the native format of the requestor and converts them into an authorization request, based on the authorization data model, that can be handled by the Sentry authorization provider.

Authorization provider: an abstraction for making the authorization decision on the request coming from the binding layer. It currently supplies an RBAC authorization model implementation.

Policy engine: gets the requested privileges from the binding layer and the required privileges from the provider layer. It compares the requested and required privileges and decides whether the action should be allowed.
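The decision logic of the policy engine can be sketched as a simple privilege-implication check. This is a minimal Python illustration, not Sentry's actual (Java) implementation; the privilege string format follows the server=...->db=...->action=... notation shown later for the Sentry shell, and the helper names are hypothetical:

```python
# Minimal sketch of a policy-engine decision: an action is allowed when
# every required privilege is implied by some granted privilege. A grant
# on a broader scope (fewer resource parts) implies grants on narrower
# scopes, and the action "all" covers every action.

def parse(privilege):
    """Split 'server=server1->db=db2->action=select' into resource parts
    and an action (defaulting to 'all' when no action is given)."""
    parts = dict(p.split("=", 1) for p in privilege.split("->"))
    action = parts.pop("action", "all")
    return parts, action.lower()

def implies(granted, required):
    g_res, g_act = parse(granted)
    r_res, r_act = parse(required)
    # Every resource part of the grant must match the request.
    if any(r_res.get(k) != v for k, v in g_res.items()):
        return False
    return g_act == "all" or g_act == r_act

def allowed(granted_privileges, required_privileges):
    return all(any(implies(g, r) for g in granted_privileges)
               for r in required_privileges)

granted = ["server=server1->db=db2->action=select"]
print(allowed(granted, ["server=server1->db=db2->table=tab1->action=select"]))  # True
print(allowed(granted, ["server=server1->db=db2->table=tab1->action=insert"]))  # False
```

A database-level select grant thus covers selects on any table within that database, while a different action on the same resource is denied.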

Policy backend: makes the authorization metadata available to the policy engine. It allows the metadata to be pulled out of the underlying repository independently of how that metadata is stored.

Sentry policy store and Sentry service: persist the role-to-privilege and group-to-role mappings in an RDBMS and provide programmatic APIs to create, query, update and delete them. This enables the various Sentry clients to retrieve and modify privileges concurrently and securely.
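The persisted mappings resolve as group -> roles -> privileges; a user's effective privileges are the union over the roles of the user's groups. A minimal sketch (the mapping values are illustrative, not real Sentry metadata):

```python
# Sketch of the metadata Sentry persists: groups map to roles and roles
# map to privileges. Resolving a user's groups through both mappings
# yields the user's effective privilege set.

group_to_roles = {"analysts": {"analyst"}}
role_to_privileges = {
    "analyst": {"server=server1->db=db2->table=tab1->action=select"},
}

def effective_privileges(groups):
    privs = set()
    for g in groups:
        for role in group_to_roles.get(g, set()):
            privs |= role_to_privileges.get(role, set())
    return privs

print(effective_privileges(["analysts"]))
# {'server=server1->db=db2->table=tab1->action=select'}
```

Because the mappings live in a central RDBMS behind the Sentry service, every client sees the same resolution.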

Authorization Model for SQL Engines: Provides out-of-the-box integration with Apache SQL engines: Apache Hive and Impala (incubating). Grants permissions that allow DDL and DML on their objects. A fixed metadata model whose policies and objects suit SQL data: databases, tables, views and columns. A specific client plugin for SQL engines: the DB policy engine. Supporting other data applications would require extensive development.

Generic Authorization Model Motivation: Supports various data applications out of the box. Easy integration with any new component, requiring very little implementation effort. A more flexible design: a generic authorization data model for defining sensitive resources, and a generic policy engine whose abstractions of access actions and privileges can be interpreted by various engines.

Generic Authorization Model: Hive, Solr and Kafka all go through the same access binding layer, authorization provider and provider backend; the policy metadata store is a local file, HDFS, or the Sentry database.

Server Client Model: Thrift client APIs get privileges, grant/revoke roles, grant/revoke privileges, and list roles. A SentryCLI (sentryshell) is provided to manage policies and metadata, e.g.: sentryshell --grant_role_privilege --role analyst --privilege server=server1->db=db2->table=tab1->action=select --conf sentry-site.xml. RESTful client APIs (ongoing): use HTTP requests to manage roles and privileges.

Generic Authorization Model (with management interfaces): alongside the access binding layer, authorization provider and provider backend, the Sentry shell and the REST API manage the policy metadata store (a local file, HDFS, or the Sentry database).

Authorization Data Model: The authorization data model defines the objects subject to authorization rules, the granularity of the actions to be allowed, and the privilege rules that grant access to objects. In the generic model, the objects of different data applications are represented as resources, and actions and privileges can be transformed into read, write or all operations according to each data application's definition. For example, in the SQL authorization data model, the objects are databases, tables, views, URIs or functions; the privileges are create database, insert into table, create function using func.jar, and so on.
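As an illustration of how such generic resources can be decomposed, here is a small Python sketch. The URI format follows the examples on these slides; the function name is hypothetical:

```python
# Decompose a generic resource string such as
#   hive:///database=db1/table=tbl1/column=c1
# into its component (the data engine) and an ordered hierarchy of
# authorizables, which a generic policy engine can match level by level.

def parse_resource(uri):
    scheme, _, path = uri.partition(":///")
    authorizables = [tuple(part.split("=", 1))
                     for part in path.split("/") if part]
    return scheme.lower(), authorizables

component, hierarchy = parse_resource("hive:///database=db1/table=tbl1/column=c1")
print(component)  # hive
print(hierarchy)  # [('database', 'db1'), ('table', 'tbl1'), ('column', 'c1')]
```

Because the hierarchy is just an ordered list of name/value pairs, the same parsing works for Solr collections, Kafka topics or Sqoop jobs without any engine-specific code.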

Authorization Data Model (cont.): Solr authorization data model: the objects are indexes, collections or documents; the actions and privileges include searching through collections or documents, creating an index, listing the status of cores, and so on. Sqoop authorization data model: the objects are resources, servers, connectors, links or jobs; the actions and privileges can be interpreted as connecting to a server or executing a job. Kafka authorization data model: the objects are clusters, topics or consumer groups; the actions and privileges include write on a topic for a producer, create on a cluster for automatic topic creation, and so on.

Integrating with the Universal Authorization Model: Define the authorization data: define authorizables as resources, for example Solr:///collection=c1/field=f1 or Hive:///database=db/table=tbl/column=c1. Define actions and privilege rules: for example, in Solr, read on a collection means the user can search through the collection; in Hive, read on a database means the user can show databases. Extend the generic plugin: query Sentry to retrieve privileges, do the transformation, and return rules that can be used to authorize access actions. Enforce the policy on the data-application side: implement a hook and binding that enforce authorization using the plugin.
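An enforcement hook on the data-application side might look roughly like the following. This is a schematic Python sketch under assumed names; SentryPluginStub, list_privileges and enforce are hypothetical stand-ins, not Sentry's actual Java plugin API:

```python
# Schematic enforcement hook: before executing a request, the application
# asks the plugin for the privilege rules of the caller's groups and
# checks whether any rule covers the action on the requested resource.

class AuthorizationException(Exception):
    pass

class SentryPluginStub:
    """Stand-in for the generic Sentry plugin: returns privilege rules
    as (resource, action) pairs for the groups a user belongs to."""
    def __init__(self, rules_by_group):
        self.rules_by_group = rules_by_group

    def list_privileges(self, groups):
        rules = []
        for g in groups:
            rules.extend(self.rules_by_group.get(g, []))
        return rules

def enforce(plugin, user_groups, resource, action):
    for granted_resource, granted_action in plugin.list_privileges(user_groups):
        # A grant on a resource prefix covers everything nested under it.
        if resource.startswith(granted_resource) and granted_action in (action, "all"):
            return  # allowed
    raise AuthorizationException(f"{action} on {resource} denied")

plugin = SentryPluginStub({
    "analysts": [("solr:///collection=c1", "read")],
})
enforce(plugin, ["analysts"], "solr:///collection=c1/field=f1", "read")  # passes
# enforce(plugin, ["analysts"], "solr:///collection=c1", "write")  # would raise
```

The real integration does the same three things the slide lists: retrieve privileges from Sentry, transform them into rules, and check the rules inside the application's request path.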

Integration Use Cases: Authorizing Hive; Authorizing Solr; Authorizing Sqoop2.

Authorizing Hive: Define the authorization data model: authorization objects are server, database, table, view, column and so on; access actions are create database, insert into a table, select a column from a table; privileges are database create, table insert, column select. Extend the binding and hook to retrieve privileges from Sentry: HS2 integrates the hook into the build stage; HMS integrates the hook into a pre-metadata-change event. Extend the metadata: a privilege (including resources and actions); a resource can be defined as a generic string in URI format: hive:///database=db1/table=tbl1/column=c1. Extend the HS2 or HMS clients to manage Sentry metadata such as roles and privileges.

Authorizing Solr: Extend the authorization data model: authorization objects are collections, documents, indexes or admin operations. Extend the binding and hook to retrieve privileges from Sentry: implement hooks that override the request handler implementation; the hooks enforce collection- or document-level authorization. Extend the metadata: a privilege (including resources and actions); a resource can be defined as a generic string in URI format: solr:///collection=c1/field=f1. A SolrCli wraps around the SentryCLI to manage metadata.

Authorizing Sqoop2: Extend the authorization data model: authorization objects are server, connector, link and job; access actions are show connector, update link, show job; privileges are connector read, link write, job read. Extend the binding and hook to retrieve privileges from Sentry: Client -> JobRequestHandler -> authorize through Sentry -> JDBC repository. Extend the metadata: privileges (including resources and actions); a resource can be defined as a generic string in URI format: sqoop:///server=server1/connector=conn. The Sqoop CLI is extended to manage Sentry metadata such as roles and privileges.

Sentry Releases: Sentry successfully graduated from the Incubator in March 2016 and is now a top-level Apache project. Sentry 1.6.0, released on Feb 24, 2016: adds the capability to export/import, to dump or load Sentry metadata; integrates Sqoop with Sentry using the generic authorization model. About to release 1.7.0: integrates Hive v2 into Sentry; integrates Kafka with Sentry using the generic authorization model; integrates Solr with Sentry using the generic authorization model.

Future Work: A new design for Sentry HA, compatible with HMS HA and HDFS ACL sync-up. RESTful client APIs to define the authorization data model. Continued work on the generic authorization model. Attribute-based access control.

Reference: Apache Sentry: https://cwiki.apache.org/confluence/display/sentry/home. Integrating with Sentry New Universal Authorization Model: https://cwiki.apache.org/confluence/display/sentry/integrating+with+sentry+new+universal+authorization+model

Questions?