MapR Enterprise Hadoop

Similar documents
Ulf Andreasson. 7 th Oct 2014

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Configuring and Deploying Hadoop Cluster Deployment Templates

April Copyright 2013 Cloudera Inc. All rights reserved.

Hadoop An Overview. - Socrates CCDH

Hadoop. Introduction / Overview

Innovatus Technologies

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

Big Data Analytics using Apache Hadoop and Spark with Scala

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Hortonworks and The Internet of Things

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Big Data Architect.

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Oracle Big Data Fundamentals Ed 2

docs.hortonworks.com

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

Big Data Hadoop Stack

BIG DATA COURSE CONTENT

DATA SCIENCE USING SPARK: AN INTRODUCTION

Copyright 2015 EMC Corporation. All rights reserved. A long time ago

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Automation of Rolling Upgrade for Hadoop Cluster without Data Loss and Job Failures. Hiroshi Yamaguchi & Hiroyuki Adachi

HDInsight > Hadoop. October 12, 2017

The Reality of Qlik and Big Data. Chris Larsen Q3 2016

Microsoft Big Data and Hadoop

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Cmprssd Intrduction To

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Hadoop Development Introduction

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Big Data Hadoop Course Content

Product Compatibility Matrix

Lenovo Big Data Validated Design for Cloudera Enterprise and VMware

Oracle Big Data Fundamentals Ed 1

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Chase Wu New Jersey Institute of Technology

Hadoop Overview. Lars George Director EMEA Services

Hortonworks Data Platform

Cloudera Introduction

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

EsgynDB Enterprise 2.0 Platform Reference Architecture

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Hortonworks University. Education Catalog 2018 Q1

Oracle Big Data Connectors

Cloudera Introduction

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

microsoft

Best Practices for Deploying Hadoop Workloads on HCI Powered by vsan

Cloudera Introduction

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

Cloudera Introduction

Introduction to Database Services

Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt

Certified Big Data Hadoop and Spark Scala Course Curriculum

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Building an Integrated Big Data & Analytics Infrastructure September 25, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle

Spark Over RDMA: Accelerate Big Data SC Asia 2018 Ido Shamay Mellanox Technologies

Interactive SQL-on-Hadoop from Impala to Hive/Tez to Spark SQL to JethroData

Big Data with Hadoop Ecosystem

Lecture 11 Hadoop & Spark

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?

Data Architectures in Azure for Analytics & Big Data

Cloudera Introduction

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Modern Data Warehouse The New Approach to Azure BI

HADOOP COURSE CONTENT (HADOOP-1.X, 2.X & 3.X) (Development, Administration & REAL TIME Projects Implementation)

Big Data Infrastructures & Technologies

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Introduction to BigData, Hadoop:-

SQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism

Cisco and Cloudera Deliver WorldClass Solutions for Powering the Enterprise Data Hub alerts, etc. Organizations need the right technology and infrastr

Distributed Systems 16. Distributed File Systems II

Cloudera Introduction

Backtesting with Spark

Getting Started with Hadoop and BigInsights

Hadoop, Yarn and Beyond

SOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera

Certified Big Data and Hadoop Course Curriculum

Hadoop course content

Warehouse- Scale Computing and the BDAS Stack

Practical Big Data Processing An Overview of Apache Flink

WITH INTEL TECHNOLOGIES

Unifying Big Data Workloads in Apache Spark

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

Welcome to. uweseiler

Stages of Data Processing

Cloud Computing & Visualization

Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Servers

Approaching the Petabyte Analytic Database: What I learned

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

MySQL Multi-Site/Multi-Master MySQL High Availability and Disaster Recovery ~~~ Heterogeneous Real-Time Data Replication Oracle Replication

Introduction into Big Data analytics Lecture 3 Hadoop ecosystem. Janusz Szwabiński

Cloud Analytics and Business Intelligence on AWS

Apache Spark 2.0. Matei

Lambda Architecture with Apache Spark

Transcription:

2014 MapR Technologies 2014 MapR Technologies 1

MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2

Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS & BUSINESS INTELLIGENCE DATA WAREHOUSE & INTEGRATION INFRASTRUCTURE & CLOUD CONSULTANTS & INTEGRATORS 2014 MapR Technologies 3

MapR Distribution for Hadoop 2014 2014 MapR MapR Technologies Technologies 4

MANAGEMENT MANAGEMENT Hadoop Distributions Distribution A Distribution C Open Source Open Source Open Source ARCHITECTURAL INNOVATIONS 2014 MapR Technologies 5

Architecture Matters for Success FOUNDATION 2014 MapR Technologies 6

Architecture Matters for Success NEW APPLICATIONS SLAs TRUSTED INFORMATION LOWER TCO Data protection & security Multi-tenancy Open standards for integration High performance Workload management FOUNDATION 2014 MapR Technologies 7

CLI REST API GUI MapR Control System (Management and Monitoring) MapR Distribution for Apache Hadoop APACHE HADOOP AND OSS ECOSYSTEM Batch Tez* ML, Graph SQL NoSQL & Search Streaming Data Integration & Access Security Workflow & Data Governance Provisioning & Coordination Spark Drill* Cascading GraphX Shark Accumulo* Hue Savannah* Pig MLLib Impala Solr Storm* HttpFS Juju MapReduc e v1 & v2 Mahout Hive HBase Spark Streaming Flume Knox* Falcon* Whirr YARN Sqoop Sentry* Oozie ZooKeeper EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS NFS HDFS API HBase API JSON API MapR-FS (POSIX) MapR Data Platform (Random Read/Write) MapR-DB (High-Performance NoSQL) Enterprise Grade Data Hub Operational 2014 MapR Technologies 8

Business Continuity 2014 2014 MapR MapR Technologies Technologies 9

Business Continuity High Availability 1 2 3 Data Protection Disaster Recovery What are your requirements? What do you have for your enterprise storage, databases and data warehouses? 2014 MapR Technologies 10

AVAILABILITY 1 High Availability Everywhere No NameNode architecture MapReduce/YARN HA NFS HA Instant recovery Rolling upgrades HA is built in Distributed metadata can self-heal No practical limit on # of files Jobs are not impacted by failures Meet your data processing SLAs High throughput and resilience for NFS-based data ingestion, import/export and multi-client access Files and tables are accessible within seconds of a node failure or cluster restart Upgrade the software with no downtime No special configuration to enable HA All MapR customers operate with HA 2014 MapR Technologies 11

DATA PROTECTION 2 Data Protection: Replication and Snapshots Replication C1 C2 C1 C2 C5 C6 Protect from hardware failures File chunks, table regions and metadata are automatically replicated (3x by default) At least one replica on a different rack C3 C7 C4 C7 C1 C4 C4 C2 C6 C5 C7 C3 C5 C3 C6 Snapshots Protect from user and application errors Point-in-time recovery No data duplication No performance or scale impact Read files and tables directly from snapshot ₁ 2014 MapR Technologies 12

DISASTER RECOVERY 3 Disaster Recovery : MapR Mirroring Production Research WAN Datacenter 1 Datacenter 2 Production WAN EC2 Flexible Choose the volumes/directories to mirror You don t need to mirror the entire cluster Active/active Fast No performance impact Block-level (8KB) deltas Automatic compression Safe Point-in-time consistency End-to-end checksums Easy Graceful handling of network issues No third-party software Takes less than two minutes to configure! 2014 MapR Technologies 13

SQL Interactive SQL-on-Hadoop: MapR Customers Have Options! Drill 1.0 Hive 0.13 with Tez Impala 1.x Presto 0.56 Shark 0.8 Vertica Latency Low Medium Low Low Medium Low Files Yes (all Hive file formats) Yes (all Hive file formats) Yes (Parquet, Sequence, ) Yes (RC, Sequence, Text) Yes (all Hive file formats) HBase/M7 Yes Yes Various issues No Yes No Schema Hive or schemaless Yes (all Hive file formats) Hive Hive Hive Hive Proprietary or Hive SQL support ANSI SQL HiveQL HiveQL (subset) ANSI SQL HiveQL ANSI SQL + advanced analytics Client support ODBC/JDBC ODBC/JDBC ODBC/JDBC ODBC/JDBC ODBC/JDBC ODBC/JDBC, ADO.NET, Large joins Yes Yes No No No Yes Nested data Yes Limited No Limited Limited Limited Hive UDFs Yes Yes Limited No Yes No Transactions No No No No No Yes Optimizer Limited Limited Limited Limited Limited Yes Concurrency Limited Limited Limited Limited Limited Yes 2014 MapR Technologies 14

SQL Benchmark Berkeley AmpLab https://amplab.cs.berkeley.edu/benchmark/ 2014 MapR Technologies 15

Apache Open Source Community Projects Q1 Q2 Q3 Q4 Resource management YARN 2.4 Batch MapReduce 2.4 Hive 0.12 Pig 0.11 Cascading 2.5 Spark 0.9 Interactive SQL Drill 1.0 Shark 0.9 Impala 1.2.3 Hive on Tez 0.13 Data integration Flume 1.4.0 Sqoop 1.4.4 HttpFS 1.0 Machine learning Mahout 0.8 MLLib 0.9 GraphX 0.9 Coordination Oozie 3.3.2 ZooKeeper Streaming Storm 0.9.0 Spark Streaming 0.9 Data management Falcon 0.3 Knox 0.3 Sentry 1.2.0 GUI and provisioning Hue 3.5 Savannah 0.4 NoSQL and search HBase 0.94 Solr (LWS 2.6.1) Complete distribution with aggressive roadmap 20 projects in the distribution 10+ new projects in 2014 Monthly update to projects Run multiple versions simultaneously Adopt new project versions w/o upgrading cluster Upgrade cluster without migrating all applications to new project versions Update New 2014 MapR Technologies 16

Performance 2014 2014 MapR MapR Technologies Technologies 17

Time in Seconds Time in Seconds Flux7: Comparative Study of Hadoop Distributions Web Search and Data Analytics Benchmarks Lower is Better IDH 2.4.1 925 CDH 4.3 HDP 1.3 563 532 MapR M5 2.1.3 425 333 367 331 163 Page Rank Hive JOIN Query Source: Flux7 Labs Study, October 2013 Hardware Specs: EC2 on AWS 1 Master: m1.xlarge; 64-bit; 4 vcpu, 8 ECU; 15 GiB RAM; 4x420 GB Storage; 4x Intel Xeon CPU E5-2650 0 @ 2.00 GHz 4 Slaves: m1.large; 64-bit; 2 vcpu, 4 ECU; 7.5 GiB RAM; 2x420 GB Storage; 2x Intel Xeon CPU E5430 @ 2.66 GHz 2014 MapR Technologies 18

MB per Second MB per Second Flux7: Comparative Study of Hadoop Distributions Read and Write Throughput Benchmarks Higher is Better 212 262 276 475 465 IDH 2.4.1 CDH 4.3 HDP 1.3 MapR M5 2.1.3 59 69 64 DFSIO Read Throughput DFSIO Write Throughput Source: Flux7 Labs Study, October 2013 Hardware Specs: EC2 on AWS 1 Master: m1.xlarge; 64-bit; 4 vcpu, 8 ECU; 15 GiB RAM; 4x420 GB Storage; 4x Intel Xeon CPU E5-2650 0 @ 2.00 GHz 4 Slaves: m1.large; 64-bit; 2 vcpu, 4 ECU; 7.5 GiB RAM; 2x420 GB Storage; 2x Intel Xeon CPU E5430 @ 2.66 GHz 2014 MapR Technologies 19

M7: In-Hadoop Database Operational Applications and Analytics Combined 2014 2014 MapR MapR Technologies Technologies 20

MapR M7: The Best In-Hadoop Database HBase JVM HDFS NoSQL Columnar Store Apache HBase API Integrated with Hadoop JVM ext3/ext4 Disks Tables/Files Disks Other Distros MapR M7 The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics 2014 MapR Technologies 21

HBase Apps: High Performance with Consistent Low Latency M7 (ops/sec) Other HBase 0.94.9 (ops/sec) Other HBase 0.94.9 latency (usec) M7 Latency (usec) 2014 MapR Technologies 22

MapR M7: The Best In-Hadoop Database HBase JVM HDFS NoSQL Columnar Store Apache HBase API Integrated with Hadoop JVM ext3/ext4 Disks Tables/Files Disks Other Distros MapR M7 The most scalable, enterprise-grade, NoSQL database that supports online applications and analytics 2014 MapR Technologies 23

MapR Editions Control System NFS Access Performance Unlimited Nodes Free Control System NFS Access Performance High Availability Snapshots & Mirroring 24 X 7 Support Annual Subscription All the Features of M5 Simplified Administration for HBase Increased Performance Consistent Low Latency Unified Snapshots, Mirroring Fastest On-Ramp: MapR Sandbox for Hadoop 2014 MapR Technologies 24

Opportunity to Revolutionize Enterprise Data Architecture From Redundant Processing Silos and Data Science Experiments 2014 MapR Technologies 25

The Production Enterprise Data Hub to Consolidated Operational and Analytical Workloads 2014 MapR Technologies 26

MapR Sandbox for Hadoop The Fastest On-Ramp to Hadoop Complete MapR distribution for Hadoop Free download Most advanced distribution Tutorials and advanced user interfaces MapR Control System (MCS) for administrators Hadoop User Experience (HUE) for developers Point-and-click tutorials Fully configured in virtual machines Supports VMware and VirtualBox Drag-and-drop data movement 2014 MapR Technologies 27

Q & A Engage with us! @mapr / @agoujet maprtech mapr-technologies MapR agoujet@mapr.com maprtech 2014 MapR Technologies 28