MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR TECHNOLOGIES, INC. WHITE PAPER, JANUARY 2018

MAPR DATA GOVERNANCE

TABLE OF CONTENTS
EXECUTIVE SUMMARY
BACKGROUND
MAPR DATA GOVERNANCE
CONCLUSION

EXECUTIVE SUMMARY

The MapR DataOps Governance Framework is designed to provide a complete, enterprisewide solution for governing data. It supports data lineage, a metadata catalog, a data dictionary, and data lifecycle management.

Critical business decisions are made on the basis of data, which puts tremendous pressure on organizations to create and maintain trust in data quality and regulatory compliance. To achieve a high level of confidence in data quality, the MapR solution looks beyond a single environment such as Hadoop, because most data originates, and is processed, outside any single platform. An enterprise solution must consider the entire enterprise rather than focus on a single point solution.

The MapR DataOps Governance Framework is a blend of technology options that assist the data governance process. These technologies can be tailored to your organization's data transformation and data lineage requirements. Our enterprise-centric management capabilities include platform-based security, data lineage, metadata management at scale, self-service data discovery, and data lifecycle management.

Platform-Based Security. As the only data platform with built-in security, MapR is designed to apply security semantics automatically as data is stored on and retrieved from the platform. MapR addresses all four pillars of security (authentication, authorization, auditing, and data protection) with platform-level capabilities that don't require external security tools or plugins. Because enforcement happens in the platform itself, it is complete: unlike an external security tool, it cannot be bypassed by components that were never modified to work with that tool.

Data Lineage. MapR provides a robust, scalable mechanism for capturing how data evolves across the enterprise, tracking the complete data transformation both inside and outside the big data platform.

Metadata Management at Scale. MapR offers a single, complete metadata catalog that stores and queries metadata such as data source, transformations, and stewardship in a highly scalable and efficient manner.

Secure, Self-Service Data Discovery. Using interactive SQL powered by Apache Drill, MapR allows users to discover data without first having to create a schema. Security remains granular during discovery because data owners and administrators decide which portions of the data, including obfuscated portions, to expose.

Data Lifecycle Management. MapR assigns policies that place data in restricted zones based on criteria such as the data's age, temperature, or tenancy requirements. Cold data can be archived or deleted at once.
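The lifecycle policy idea can be made concrete with a minimal Python sketch. It assumes that "temperature" means time since last access and that each tenant gets its own zone per temperature. The zone naming, the 30-day and 365-day thresholds, and the seven-year retention period are hypothetical values chosen for illustration. This is not MapR's policy engine or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    path: str
    tenant: str              # owning tenant; each tenant gets its own zones
    created: datetime        # used for age-based retention
    last_accessed: datetime  # used to estimate "temperature"


def temperature(ds: Dataset, now: datetime,
                warm_after: timedelta = timedelta(days=30),
                cold_after: timedelta = timedelta(days=365)) -> str:
    """Classify data as hot, warm, or cold by time since last access."""
    idle = now - ds.last_accessed
    if idle >= cold_after:
        return "cold"
    if idle >= warm_after:
        return "warm"
    return "hot"


def plan(datasets: list[Dataset], now: datetime,
         retention: timedelta = timedelta(days=7 * 365)) -> list[tuple[str, str, str]]:
    """Return (path, target_zone, action) for each dataset."""
    result = []
    for ds in datasets:
        temp = temperature(ds, now)
        # Hypothetical zone layout: /<tenant>/<temperature>
        zone = f"/{ds.tenant}/{temp}"
        if temp == "cold":
            # Cold data past the retention period is deleted; other cold data is archived.
            action = "delete" if now - ds.created >= retention else "archive"
        else:
            action = "keep"
        result.append((ds.path, zone, action))
    return result
```

A planner like this only decides where data belongs. A real deployment would still need to move, archive, or delete the data and record each action for auditing.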

BACKGROUND

Data governance is less about technology and more about a set of processes for tracking and managing the origin of data and all of its subsequent transformations. The goal of the MapR DataOps Governance Framework is to achieve a high level of data quality and integrity, both to gain a competitive advantage and to meet mandated compliance requirements.

It is critical to understand your existing processes and objectives before choosing a technology. Technology can be leveraged to support data governance processes, but the challenge is selecting the right technologies to track the full, holistic transformation of your data. Choosing the right technology requires a solid understanding of your organization's business needs:

- How do you define the owner of the data?
- What is your data management strategy?
- What is the data-cleaning process, and what are the criteria for data validation, correctness, and completeness?
- What data transformations are applied to your data today?
- Are there any industry or regulatory requirements?
- What are your organization's data access policies?
- What data controls and change recording are required?

Today, no single technology or vendor offers a one-size-fits-all solution, and any vendor claiming otherwise is misleading you. Every industry and organization has unique processes and requirements that demand great care when selecting technologies to assist in the data governance process. Before choosing a technology, you must understand the full transformation process of the data so that you can select technologies that track and manage data with an enterprisewide view.

An enterprisewide view of data is critical to achieving a core goal of data governance: addressing data quality. A data governance solution is only truly helpful if it addresses all enterprise data management processes and flows, not just those within a single domain or big data platform. After all, data quality problems can be introduced anywhere in the chain, even before the data reaches the big data platform.

Other big data vendors claim to offer complete data governance. Their solutions mostly focus on governance within the walls of the big data world and have significant gaps when managing data governance from an enterprisewide view. They are point solutions to an enterprise problem. It is crucial to leverage the right technology for the organization. The MapR Converged Data Platform is specifically designed to be open and pluggable, which allows teams to leverage the right data governance technology in tandem with existing MapR data governance capabilities.
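An open, pluggable design can be sketched as a small fan-out hub. This minimal Python illustration assumes a hypothetical `GovernanceBackend` interface with a single `record` method. It is not a MapR API. Lineage events come from a system outside the big data platform (a mainframe ETL job) and from one inside it (a Spark job), and they reach every registered governance tool. That is what gives each tool an enterprisewide view rather than a single-platform one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class LineageEvent:
    """One transformation step, wherever in the enterprise it happened."""
    source: str
    target: str
    operation: str
    system: str  # the system that performed the step, e.g. a mainframe ETL job


class GovernanceBackend(Protocol):
    """Hypothetical plug-in contract: any governance tool with a record() method fits."""
    def record(self, event: LineageEvent) -> None: ...


class InMemoryCatalog:
    """Stand-in for a partner tool; it simply keeps every event it receives."""
    def __init__(self) -> None:
        self.events: list[LineageEvent] = []

    def record(self, event: LineageEvent) -> None:
        self.events.append(event)


class GovernanceHub:
    """Fans each event out to all registered backends, so tools can be added or swapped."""
    def __init__(self) -> None:
        self._backends: list[GovernanceBackend] = []

    def register(self, backend: GovernanceBackend) -> None:
        self._backends.append(backend)

    def publish(self, event: LineageEvent) -> None:
        for backend in self._backends:
            backend.record(event)


hub = GovernanceHub()
catalog = InMemoryCatalog()
hub.register(catalog)
hub.publish(LineageEvent("db2://orders", "/lake/raw/orders", "extract", "mainframe-etl"))
hub.publish(LineageEvent("/lake/raw/orders", "/lake/clean/orders", "dedupe", "spark"))
print([e.system for e in catalog.events])  # ['mainframe-etl', 'spark']
```

Keeping the hub ignorant of any particular tool is the design point. A second catalog or a compliance scanner can be registered without changing the code that emits events.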

MAPR DATA GOVERNANCE

The MapR data governance solution consists of two main components: the MapR Converged Data Platform and the MapR DataOps Governance Framework.

[Figure: MapR Open Approach to Governance for All Data. Sources such as relational, SaaS, mainframe, document, social media, and log data feed the MapR DataOps Governance Framework, which runs on the MapR Converged Data Platform.]

The MapR Converged Data Platform offers a robust and unmatched protection scheme for data within the MapR platform. MapR security is built directly into the platform, so protection is applied as data comes into and goes out of the platform, without an external security manager server or a separate security plugin for each ecosystem component. By design, MapR applies its security semantics automatically, out of the box, to data stored or retrieved by any ecosystem component, application, or user.

The MapR DataOps Governance Framework is built on an open architecture, allowing customers to extend it and use the right technology to support processes that match their use cases. With MapR, businesses can track and manage the data transformation process to achieve a complete data governance and data lineage monitoring solution. MapR offers a rich set of APIs that data governance technologies can use to track and manage data across the enterprise. The MapR DataOps Governance Framework architecture leverages the right partner technology to provide the best data governance approach.

Big-data-only solutions offered by others do not provide full end-to-end data governance. Their patchwork of disparate security models and ad hoc security services adds complexity without actually solving the problem. Our open architecture allows for a best-of-breed solution from industry data governance leaders, giving you a broad range of technology options tailored to specific use cases and requirements.

Every organization has unique data quality procedures in place. Great care is required in selecting technologies to assist in the data governance process so that they successfully keep track of the metadata and the transformation process. For this reason, the MapR DataOps Governance Framework is explicitly designed around an open architecture, which lets customers plug in the right technology to extend MapR in support of their data governance processes and procedures.

The MapR open architecture is supported by leading data governance solutions such as Cask, Waterline, Informatica, Collibra, Podium, Dataguise, Talend, and Alation. In addition, MapR data governance partners provide even tighter, certified integration, so that MapR customers have one metadata catalog and a clear path of data lineage, as illustrated by the graphics below. MapR is currently pursuing such arrangements with Cask and Waterline.
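A toy storage layer can illustrate the difference between platform-level enforcement and per-component plugins. In this minimal Python sketch, the authorization check and the audit record live inside `read` and `write`, so every client passes through them. Authentication and data protection (encryption) are deliberately left out. The class, the ACL format, and the audit tuple are hypothetical and are not MapR's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a user lacks permission for an action on a path."""


@dataclass
class SecureStore:
    """Toy storage layer that authorizes and audits every read and write.

    Because the checks live inside read() and write(), every client (a SQL
    engine, a batch job, or an ad hoc script) is subject to them; no client
    needs its own security plugin.
    """
    acl: dict[str, set[tuple[str, str]]]  # path -> {(user, "read" | "write")}
    _data: dict[str, bytes] = field(default_factory=dict)
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def _authorize(self, user: str, action: str, path: str) -> None:
        allowed = (user, action) in self.acl.get(path, set())
        # Record both granted and denied attempts for auditing.
        self.audit_log.append((user, action, path, allowed))
        if not allowed:
            raise AccessDenied(f"{user} may not {action} {path}")

    def write(self, user: str, path: str, payload: bytes) -> None:
        self._authorize(user, "write", path)
        self._data[path] = payload

    def read(self, user: str, path: str) -> bytes:
        self._authorize(user, "read", path)
        return self._data[path]
```

A client cannot reach stored data without calling `read` or `write`, so no client can skip the check. That is the property the text contrasts with plugin-based security, where an unmodified component may never consult the external policy server.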

Cask provides a unified integration platform for big data. The open source Cask Data Application Platform (CDAP) lets architects, data scientists, and business analysts focus on applications and insights rather than infrastructure and integration. Through self-service data lineage tools and APIs, CDAP gives users visibility into how data flows into, through, and out of data lakes. This allows them to perform impact and root cause analysis, and it provides an audit trail for compliance.

CDAP provides the capabilities and standardization to collect the technical, operational, and business metadata from data ingestion and transformation that is needed to build rich metadata for governance. Programmatic APIs allow existing Spark or MapReduce-based applications to publish metadata, which improves tracking and visibility for preexisting solutions. CDAP also aggregates and indexes data at the level of the entities users interact with. It supports searching by tags, properties, or schema fields and types, which is critical for discovering datasets in an operational cluster. A data dictionary and preferred tags provide a way to standardize the tags and fields applied to datasets.

[Figure: MapR DataOps Governance Framework with Cask vs the Competition. Capability areas from EDW optimization and managed data lakes to security, governance, and app development, comparing an ecosystem built on Navigator and NiFi/HDF with the MapR Converged Data Platform.]
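Impact analysis and root cause analysis both reduce to graph traversal over recorded lineage. This minimal Python sketch assumes lineage is captured as (source, target) dataset pairs. `LineageGraph` and the dataset names are hypothetical and do not reflect the CDAP lineage API. Walking downstream answers "what is affected if this dataset is wrong?" Walking upstream answers "where could this bad value have come from?"

```python
from collections import defaultdict, deque


class LineageGraph:
    """Dataset-level lineage: one directed edge per recorded transformation."""

    def __init__(self):
        self._down = defaultdict(set)  # dataset -> datasets derived from it
        self._up = defaultdict(set)    # dataset -> datasets it was derived from

    def add(self, source, target):
        """Record that `target` was produced from `source`."""
        self._down[source].add(target)
        self._up[target].add(source)

    @staticmethod
    def _reachable(start, edges):
        """Breadth-first search: every node reachable from `start` via `edges`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, dataset):
        """Downstream datasets affected if `dataset` is wrong."""
        return self._reachable(dataset, self._down)

    def root_causes(self, dataset):
        """Upstream datasets a problem in `dataset` could have come from."""
        return self._reachable(dataset, self._up)
```

Storing edges in both directions costs extra memory but makes each query a single traversal. This matters when lineage spans systems inside and outside the data lake.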

Waterline Data provides a business-centric data catalog for the enterprise. Companies often have trouble finding, organizing, and effectively using their data. Most organizations track their data through tribal knowledge held in the heads of their data analysts, scientists, and stewards. Waterline's Smart Data Catalog replaces this tribal knowledge with software that catalogs data consistently so that users can quickly search for and find it. The software automatically profiles and tags data using machine learning, together with a system of ratings and reviews (think of it as "Google meets Yelp for data").

Waterline provides solutions for self-service analytics, data governance, and compliance that automate the discovery, curation, and resolution of critical data. This allows users to spend more time using data and less time searching for it, to better comply with regulatory requirements, and to reduce the costs associated with data redundancy and data hoarding.

[Figure: MapR DataOps Governance Framework with Waterline Data vs the Competition. MapR and Waterline Data extend basic governance services with discovery services (tag suggestions, statistical demographics, sensitive data discovery, inferred lineage, curation), security for dark data (tag-based access control), and catalog support for data sources beyond Hadoop (relational, Azure Blobs, S3 and Redshift).]
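Pattern rules applied to column values give a very rough approximation of automatic tagging. Waterline's catalog uses machine learning. This minimal Python sketch uses simple regular expressions and a match-ratio threshold instead, so it shows only the general approach. The rule set, the 0.8 threshold, and the function names are hypothetical.

```python
import re

# Hypothetical tag rules; a real catalog would learn or curate these.
RULES = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def suggest_tags(column_values, threshold=0.8):
    """Suggest a tag when at least `threshold` of the non-empty values match its pattern."""
    values = [v.strip() for v in column_values if v and v.strip()]
    if not values:
        return []
    tags = []
    for tag, pattern in RULES.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            tags.append(tag)
    return tags


def profile(table):
    """Map each column name to its suggested tags."""
    return {col: suggest_tags(vals) for col, vals in table.items()}
```

The threshold requires most values to match, not just one. This keeps a free-text column that happens to mention an email address from being tagged as an email column.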

MapR Data Governance Without Compromise provides a way to feed relevant MapR data governance data into a customized solution. MapR Professional Services can develop a custom data governance solution that integrates with an existing or new solution. During a six-week engagement, MapR Professional Services builds the foundation for a custom solution. It uses core features of the MapR Converged Data Platform to create an enterprisewide platform for cataloging metadata, collating data evolution events for lineage, and organizing data and assigning policies to facilitate data lifecycle management.

CONCLUSION

Data governance is not just about technology. Rather, it is a set of processes that track and manage the origin and transformation of all data to achieve a high level of data quality and integrity. The end result is a competitive advantage for your business. Data governance ensures that business data is efficiently managed throughout the enterprise data lifecycle, yielding data that benefits the business through its high quality, integrity, and trustworthiness. This enterprisewide process is established by the people responsible for data quality, and the role of technology is to support the process and the people managing it. Choosing the right technology to align with your organization's goals is essential to establishing a holistic data management program.

For data to be useful, you must manage it. Because decisions are made on the basis of this data, creating and maintaining trust in its quality is essential to data governance success. The MapR DataOps Governance Framework is built on an open architecture, which provides the flexibility to plug in and extend the right technology for your organizational processes. Data scientists need an enterprisewide view of the data to ensure that the data maintained is of high quality, and this cannot be achieved with technologies that focus only on big data.
More information on the professional-services-based governance engagement can be found here: https://mapr.com/solutions/quickstart/data-governance/

For more information, visit mapr.com.

MapR and the MapR logo are registered trademarks of MapR and its subsidiaries in the United States and other countries. Other marks and brands may be claimed as the property of others. The product plans, specifications, and descriptions herein are provided for information only and subject to change without notice, and are provided without warranty of any kind, express or implied. Copyright 2018 MapR Technologies, Inc.