Gain Insights From Unstructured Data Using Pivotal HD
Traditional Enterprise Analytics Process
The Fundamental Paradigm Shift: Internet age and exploding data growth; enterprises leverage new data sources to identify emerging trends and opportunities; traditional database tools not able to cope.
Hadoop: Platform for Big Data. Flexible; scalable; inexpensive; fault-tolerant; gain insights from unstructured data; rapidly adopted.
The Analytics Process with Hadoop
Economics Have Changed the Game. [Chart: Big Data Platform Price/TB, 2008 to 2013, vertical axis up to $80,000; series: Big Data DB and Hadoop.] Big Data RDBMS pricing will ultimately converge with Hadoop pricing.
Our Big Bets With Hadoop: 1. HDFS becomes the data substrate for the next generation of data infrastructures. 2. A set of integrated, enterprise-scale services will evolve on top of HDFS. 3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure.
Pivotal and Hadoop
Pivotal Data Fabric. [Diagram labels: Stream Ingestion; Data Staging Platform; Analytical Query; Operational Intelligence; Run-Time Applications; Streaming Services; Data Mgmt. Services; In-Memory DB; In-Memory Objects; HDFS.] Enterprise Data Warehouse (RDBMS): continues to serve as system of record; traditional BI/reporting; data visualization; compliance and financial reporting.
Flexible Deployment Model: deploy to Private Cloud, On Premise, or Public Cloud.
PIVOTAL HD: The World's Most Powerful Hadoop Distribution
What Is Pivotal HD? World's first true SQL processing for enterprise-ready Hadoop; 100% Apache Hadoop-based platform; virtualization and cloud ready with VMware and Isilon; available as a software-only or appliance-based solution.
Pivotal Hadoop Distributions. Current release: Apache Hadoop 1.x. Upcoming release: Apache Hadoop 2.x. 100% open source compatible.
Pivotal HD Architecture: Apache. [Diagram of Apache components: HDFS; MapReduce; Pig, Hive, Mahout; HBase; Resource Management & Workflow (YARN, ZooKeeper); Sqoop; Flume.]
Pivotal HD Architecture: Enterprise. [Diagram: the Apache components (HDFS; MapReduce; Pig, Hive, Mahout; HBase; Resource Management & Workflow: YARN, ZooKeeper; Sqoop; Flume) together with the Pivotal HD Enterprise components: Hadoop Virtualization (HVE); Command Center; Data Loader.]
Data Loader Architecture. [Diagram labels: Streams; Files; Pull; Push; Connectors; HDFS; NFS; HTTP; FTP; Flume; Web GUI and CLI; REST APIs; Data Source Registration; Copy Strategy Optimization; Job Management; Data Processing; Data Destination Registration; Data Copy; HDFS; Local.]
Cluster Management With Command Center: Deploy; Configure; Analyze; Monitor; Manage.
Pivotal HD Architecture: HAWQ. [Diagram: HAWQ Advanced Database Services (Xtension Framework; ANSI SQL + Analytics; Catalog Services; Dynamic Pipelining; Query Optimizer) shown with Pivotal HD Enterprise (Hadoop Virtualization (HVE); Command Center; Data Loader) and the Apache components (HDFS; MapReduce; Pig, Hive, Mahout; HBase; Resource Management & Workflow: YARN, ZooKeeper; Sqoop; Flume).]
HAWQ: A True SQL Engine for Hadoop. Scale and performance; fault tolerance; transaction support; data management and analysis.
Leveraging Greenplum DB On Top of Hadoop. [Diagram labels: HAWQ; Query Engine; Catalog Service; Planner; Optimizer; Executor; Transaction Manager; Resource Management; GPXF; HDFS.]
GPXF: Xtension Framework. Enables custom connector development for other data sources. [Diagram labels: HDFS; HBase; Hive.]
How HAWQ Works: Submit Query. Query: SELECT beer, price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco'. [Diagram: Clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode; multiple hosts, each with a Query Executor.]
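The query on this slide can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine; it does not connect to HAWQ. Only the table and column names (Bars, Sells, name, city, bar, beer, price) come from the slide. The sample rows are hypothetical, and the ORDER BY clause is an addition so that the output order is stable.

```python
import sqlite3

# In-memory database as a stand-in; this is NOT HAWQ, only a local illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample rows; only table/column names come from the slide.
conn.executescript("""
    CREATE TABLE Bars  (name TEXT, city TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Bars VALUES ('Bar A', 'San Francisco'), ('Bar B', 'New York');
    INSERT INTO Sells VALUES
        ('Bar A', 'Lager', 5.0),
        ('Bar A', 'Stout', 7.0),
        ('Bar B', 'Lager', 9.0);
""")

# The slide's query, with the city literal quoted; ORDER BY added for stable output.
QUERY = """
    SELECT beer, price
    FROM Bars b, Sells s
    WHERE b.name = s.bar AND b.city = 'San Francisco'
    ORDER BY beer
"""

rows = conn.execute(QUERY).fetchall()
for beer, price in rows:
    print(beer, price)
```

With this sample data, only the beers sold by the San Francisco bar are returned.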
How HAWQ Works: Optimizer. [Diagram: Clients (JDBC/ODBC, SQL Console); HAWQ Master Host, where the Query Parser produces a Parse Tree for the Query Optimizer; optimizer inputs labeled Metadata, Cost Model, Resources; HDFS Namenode; multiple hosts, each with a Query Executor.]
HAWQ Query Plan. Plan operators, top to bottom: Motion Gather; Project s.beer, s.price; HashJoin b.name = s.bar, with inputs (s) Scan Sells and Motion Redist(b.name) over Filter b.city = 'San Francisco' over (b) Scan Bars. [Diagram: Clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode; multiple hosts, each with a Query Executor.]
Query Plan Sent To s. [Diagram: the plan from the HAWQ Master Host (Motion Gather; Project s.beer, s.price; HashJoin b.name = s.bar; Motion Redist(b.name); (s) Scan Sells; Filter b.city = 'San Francisco'; (b) Scan Bars) is shown repeated on each host's Query Executor. Also shown: Clients (JDBC/ODBC, SQL Console); Query Parser; Query Optimizer; HDFS Namenode.]
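To make the operators in this plan concrete, here is a minimal single-process sketch in Python of how the same plan could run on several executors. It is an illustration under stated assumptions, not HAWQ's implementation: the segment count, the hash function, the sample rows, and the initial data placement (Sells already distributed by s.bar, Bars placed round-robin) are all hypothetical. That placement is chosen because the plan applies Motion Redist only to the Bars side.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 3  # hypothetical number of executor hosts


def segment_for(key: str) -> int:
    """Deterministic hash that maps a join key to a segment (illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SEGMENTS


# Hypothetical sample rows; bar names are unique in this sample.
BARS = [("Bar A", "San Francisco"), ("Bar B", "New York"), ("Bar C", "San Francisco")]
SELLS = [("Bar A", "Lager", 5.0), ("Bar A", "Stout", 7.0),
         ("Bar B", "Lager", 9.0), ("Bar C", "Porter", 6.0)]

# Assumption: Sells is stored hash-distributed on s.bar, so it needs no Motion.
sells_by_segment = defaultdict(list)
for row in SELLS:
    sells_by_segment[segment_for(row[0])].append(row)

# Assumption: Bars is stored round-robin, so matching rows may sit on any segment.
bars_by_segment = defaultdict(list)
for i, row in enumerate(BARS):
    bars_by_segment[i % NUM_SEGMENTS].append(row)


def run_plan():
    # Scan Bars + Filter b.city on every segment, then Motion Redist(b.name):
    # each surviving row is sent to the segment that owns its join key.
    redistributed = defaultdict(list)
    for seg in range(NUM_SEGMENTS):
        for name, city in bars_by_segment[seg]:
            if city == "San Francisco":
                redistributed[segment_for(name)].append(name)

    gathered = []
    for seg in range(NUM_SEGMENTS):
        # HashJoin b.name = s.bar: build on redistributed bar names,
        # probe with the locally scanned Sells rows.
        build = set(redistributed[seg])
        for bar, beer, price in sells_by_segment[seg]:
            if bar in build:
                gathered.append((beer, price))  # Project s.beer, s.price
    # Motion Gather: results from all segments are collected in one place.
    return sorted(gathered)  # sorted only to make the output order stable


print(run_plan())
```

Because both sides of the join are placed with the same hash on the join key, each segment can join its local rows without seeing the others' data; only the final gather brings results together.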
HAWQ Leverages Dynamic Pipelining. [Diagram: Clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode; multiple hosts, each with a Query Executor, connected by Dynamic Pipelining.]
Aggregate Data: Sent To The Master & Client. [Diagram: Clients (JDBC/ODBC, SQL Console); HAWQ Master Host (Query Parser, Query Optimizer); HDFS Namenode; multiple hosts, each with a Query Executor.]
HAWQ Deployment Model. Master Servers & Name Nodes: query planning & dispatch. Segment Servers & Data Nodes: query processing & data storage. External Sources: loading, streaming, etc. [Other diagram labels: Dynamic Pipelining; ODBC/JDBC Driver; HDFS.]
HAWQ Benchmarks (column headers and units are not preserved in the source). User intelligence: 4.2 vs. 198 (47X). Sales analysis: 8.7 vs. 161 (19X). Click analysis: 2.0 vs. 415 (208X). Data exploration: 2.7 vs. 1,285 (476X). BI drill down: 2.8 vs. 1,815 (648X).
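The multipliers on this slide appear to be the second figure of each workload divided by the first, rounded to the nearest whole number. The short Python check below reproduces them under that assumption; the source does not say what the two columns measure, so the names `first` and `second` are placeholders rather than the slide's labels.

```python
# Figures copied from the benchmark slide: (first value, second value, reported multiplier).
# What the two columns measure is not stated in the source; the names are placeholders.
BENCHMARKS = {
    "User intelligence": (4.2, 198, 47),
    "Sales analysis": (8.7, 161, 19),
    "Click analysis": (2.0, 415, 208),
    "Data exploration": (2.7, 1285, 476),
    "BI drill down": (2.8, 1815, 648),
}


def ratio(first: float, second: float) -> int:
    """Second value divided by the first, rounded half up to a whole number."""
    return int(second / first + 0.5)


computed = {name: ratio(first, second)
            for name, (first, second, _reported) in BENCHMARKS.items()}

for name, (_first, _second, reported) in BENCHMARKS.items():
    print(f"{name}: computed {computed[name]}X, reported {reported}X")
```

All five computed ratios agree with the multipliers on the slide, which supports pairing them with the workloads in the order shown.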
HAWQ: The Foundation of Big Data. [Diagram: Pivotal Data Fabric, with labels Stream Ingestion; Data Staging Platform; Analytical Query; Operational Intelligence; Run-Time Applications; Streaming Services; Data Mgmt. Services; In-Memory DB; In-Memory Objects; HDFS.]