Vertica Now. Quarterly Customer Webinar. January 19, 2017

Similar documents
Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Benchmarks Prove the Value of an Analytical Database for Big Data

Additional License Authorizations. For HPE HAVEn and Vertica Analytics Platform software products

April Copyright 2013 Cloudera Inc. All rights reserved.

Security and Performance advances with Oracle Big Data SQL

Additional License Authorizations. For Vertica software products

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

microsoft

Databricks, an Introduction

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Approaching the Petabyte Analytic Database: What I learned

Why All Column Stores Are Not the Same Twelve Low-Level Features That Offer High Value to Analysts

WHITEPAPER. MemSQL Enterprise Feature List

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Modernizing Business Intelligence and Analytics

Exam Questions

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Cloud Analytics and Business Intelligence on AWS

Integrating Advanced Analytics with Big Data

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

CloudExpo November 2017 Tomer Levi

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Oracle Big Data Connectors

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Evolving To The Big Data Warehouse

Service-Level Agreement (SLA) based Reliability, Availability, and Scalability (RAS) for analytics The solution has no single point of failure. The Ve

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

Technical Sheet NITRODB Time-Series Database

QLIK INTEGRATION WITH AMAZON REDSHIFT

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

Big Data Infrastructures & Technologies

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Integrate MATLAB Analytics into Enterprise Applications

In-Memory Computing EXASOL Evaluation

Part 1: Indexes for Big Data

Big Data Architect.

ITOM CUSTOMER FORUM 2018

Microsoft. Perform Data Engineering on Microsoft Azure HDInsight Version: Demo. Web: [ Total Questions: 10]

Safe Harbor Statement

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x

Impala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam

Overview. : Cloudera Data Analyst Training. Course Outline :: Cloudera Data Analyst Training::

Integrate MATLAB Analytics into Enterprise Applications

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Microsoft Big Data and Hadoop

Big Data and Object Storage

VOLTDB + HP VERTICA. page

Building a Data Strategy for a Digital World

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Netezza The Analytics Appliance

What's New in MATLAB for Engineering Data Analytics?

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

DATA SCIENCE USING SPARK: AN INTRODUCTION

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

REGULATORY REPORTING FOR FINANCIAL SERVICES

Shine a Light on Dark Data with Vertica Flex Tables

Apache HAWQ (incubating)

Spatial Analytics Built for Big Data Platforms

BigInsights and Cognos Stefan Hubertus, Principal Solution Specialist Cognos Wilfried Hoge, IT Architect Big Data IBM Corporation

iway Big Data Integrator New Features Bulletin and Release Notes

MapR Enterprise Hadoop

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

From Hindsight to Insight to Foresight Extend Analytical Capabilities with Vertica

Big Data with Hadoop Ecosystem

Agenda. Spark Platform Spark Core Spark Extensions Using Apache Spark

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

Microsoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo

Integrate MATLAB Analytics into Enterprise Applications

Oracle NoSQL Database Enterprise Edition, Version 18.1

Transform to Your Cloud

Innovatus Technologies

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB

TPCX-BB (BigBench) Big Data Analytics Benchmark

Cloud Computing & Visualization

Please give me your feedback

Apache Spark 2.0. Matei

Oracle Spatial Summit 2015 Best Practices for Developing Geospatial Apps for the Cloud

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

Oracle Big Data Fundamentals Ed 1

Quobyte The Data Center File System QUOBYTE INC.

ArcGIS Enterprise: An Introduction. Philip Heede

Microsoft Power BI for O365

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Microsoft Analytics Platform System (APS)

BI, Big Data, Mission Critical. Eduardo Rivadeneira Specialist Sales Manager

Stages of Data Processing

Applied Spark. From Concepts to Bitcoin Analytics. Andrew F.

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

An Introduction to Apache Spark

DATACENTER SERVICES DATACENTER

Big Data Hadoop Stack

Databases 2 (VU) ( / )

Transcription:

Vertica Now Quarterly Customer Webinar January 19, 2017

Vertica Now is our brand new, customer-only quarterly webinar designed to keep you informed of new releases and updates, helpful technical resources, future roadmap capabilities, and more! 2

Today s Agenda Vertica Updates: Micro Focus Spin-merger (Colin Mahony, SVP/GM Big Data Platform) What s New: Vertica 8.0SP1 Release (Misha Davidson, Director of Engineering) o o o In-database analytics additions and improvements Management Console for Azure and AWS Open Source and format integrations o Scale and concurrency performance enhancements Did You Know: Geospatial Analytics (Casey Starnes, Information Developer) o o Introduction How Vertica approaches Geospatial Analytics (use case example) o Q&A Summary and resources 3

4 Micro Focus Spin-Merger Colin Mahony

Welcome to an exciting 2017! Colin Mahony SVP & GM, HPE Big Data

Continuing the journey toward a pure play software business Q: What areas are you most excited to focus on and grow within the HPE software portfolio? The first is big data analytics. Big data analytics is in my core mission statement because it is required in every piece of software now security, of software now security, IT operations, application development, archiving. Customers want to be able to quickly search, find, and make search, find, and make decisions using big-data analytics. Vertica is and will continue to be one of the most prevalent big data analytics OEM prevalent big data analytics OEM platforms in the industry. If you look at companies that have big data as their models Uber, Facebook, and a models Uber, Facebook, and a number of others Vertica is at the core of that. Chris Hsu, EVP & GM of HPE Software (Named CEO of newly formed company) http://bit.ly/2jblpjg 6

The New Company: Micro Focus + HPE Software One of the world s largest pure-play software companies Annual Revenue $4.5 billion* FTSE 100 company Customers 50,000 Employees ~20,000 With one of the broadest IT SW portfolios - designed to address customers challenges across: IT Operations Security Information Governance Big Data Analytics Cloud Linux & Open Source DevOps Supported by strong leadership and operations World-class management team comprising leaders from HPE Software and Micro Focus Set up for success HPE shareholders will own 50.1% of the new company Senior HPE executive will serve on the board HPE will nominate 50% of independent directors Stronger go-to-market Deeper R&D resources Improved operational efficiencies and scale and we re just beginning the new company will have a strong platform for M&A. * Trailing twelve months as of April 30, 2016 for HPE s Software segment and Micro Focus including Serena Software for the full year

8 What s New: 8.0SP1 Release Misha Davidson

Vertica s strategic focus Performance at scale Integration: interoperation with open source / formats Multi cloud: freedom to deploy anywhere In-database analytics & machine learning

User Defined Loads ODBC, JDBC, OLEDB Vertica Big Data Analytics integrated with OSS / CSPs through an ecosystem-friendly architecture Data Transformation User Defined Functions R Java Python C++ SQL BI & Visualization Messaging Geospatial Event Series Time series High scale / concurrency Pattern Matching Machine Learning Text Analytics ETL Storage Management Security integrations LDAP, Kerberos, also FIPS External tables to analyze in place 10

In-database Analytics & Machine Learning 11

Why put Machine Learning in Vertica? Limitations of standalone solutions for ML in Big Data Added Cost Requires Down Sampling Slower Time to Development Slower Time to Deployment Additional hardware required for building predictive models Cannot process large data sets due to memory and computational limitations, resulting in inaccurate predictions Higher turnaround times for model building/scoring and need for moving large volumes of data between systems Inability to quickly deploy predictive models into production 12

Building Machine Learning into the Core Vertica Server New capabilities deliver predictive analytics at speed and scale Node 1 Node 2. Node n - Run in parallel across hundreds of nodes in a Vertica cluster - Eliminating all data duplication typically required of alternative vendor offerings - No need to down-sampling which can lead to less accurate predictions - A single system for SQL analytics and Machine Learning 13

Machine Learning Pack 8.0SP1 is much faster Now installed with Vertica by default starting from 8.0.0 Algorithm Model Training Prediction Evaluation Linear Regression Logistic Regression K-means Naïve Bayes New in 8.1 Data Preparation Normalization Imbalanced data processing Sampling Outlier Detection Model Management Summarize models Rename models Delete models 14

Runtime (second) Runtime (second) Runtime (second) Runtime (second) Vertica s ML libraries run faster / scale better than Spark Benchmark between Vertica 8.01 and Spark 2.01 KMeans (K=5; Col=100) Linear Regression (Col=100) 10000 10000 1000 1000 100 100 10 10 1 1 10 100 1000 1 0.1 1 10 100 1000 Rows (million) Rows (million) Naive Bayes (Col=100) Logistic Regression (Col=100) 1000 10000 100 1000 10 100 1 1 10 100 1000 10 0.1 Rows (million) 1 1 10 100 1000 Rows (million) 15

Multi cloud: freedom to deploy anywhere AWS / Azure 16

Vertica multi cloud New in 8.0SP1 Management Console (MC) support Management Console is now available in Azure Can be instantiated alongside a Vertica cluster running in Azure as well 3 nodes + MC Cloud Formation Template now works in AWS In about 15 minutes, the new 3 node Vertica cluster with MC is ready. All access information can be viewed in the create stack s outputs tab.

Integration: interoperation with open source / formats Hadoop / Kafka 18

Vertica SQL on Hadoop integration Vertica is fast and Hadoop is big; our integration brings together fast and big Vertica offers only full-featured query engine for Hadoop Analyze data in place from HDFS, both in Parquet and ORC Integrated with HCatalog, is Hadoop distribution agnostic HPE Vertica Rich SQL support same core engine for both deployments High performance C++ libhdfs connectivity, parallel readers Integrated with Kerberos security Support for running on Hadoop machines, or dedicated cluster TPC-DS Benchmark results 1.6X faster than Impala 5.3.3 on average* HCatalog Ran all 99 queries with no modification vs. Impala s 67 avro parquet txt orc ROS ROS New in 8.0SP1 HDFS HDFS HDFS Support concurrent access to multiple HDFS clusters Added support for HA Name Nodes Expose syntax for Hive-style partition pruning for ORC and Parquet readers *Results are reflective of internal testing by HPE in a controlled environment. Actual performance may vary depending on application, installation, environment, datasets, and other factors

Vertica - Kafka integration Scalable, Fault Tolerant load with exactly-once semantics Vertica schedules loads to continuously consume from Kafka JSON, Avro, and custom parsers Handles bursty data, dynamically prioritizes topics according to their throughput In-database monitoring with admin GUI CLI Load Scheduler Terabytes per hour loaded, no tuning 3 Node Vertica Cluster (24 cores, 252 GB RAM each), 1 Kafka Broker New in 8.0SP1 Easier to use, automated configuration / validation of Kafka topics and brokers Kafka Kafka Kafka Load Export Vertica Vertica Vertica Kafka Plugin Kafka Plugin Kafka UDx Library

Performance at scale Operations / Data Management 21

More efficient and scalable rebalance in 8.0SP1 Improved performance and scale E.g., rebalance from 4 to 14 nodes improvements in 8.0SP1: 10-100 Tables, 20-200 Projections Improved monitoring tables / queries Both execution time and memory footprint of monitoring operations are dramatically reduced rebalance unsegmented performance 500 rebalance unsegmented performance rebalance_table_status resources rebalance_table_status performance 450 60 8000 450 Time (s) 400 50 350 300 40 250 30 200 150 20 100 10 50 0 8.0 100T x 20P SP1 100T x 20P 8.0 100T x 20P SP1 100T x 20P 8.0 10T x 200P SP1 10T x 200P Max Memory (MB) 7000 6000 5000 4000 3000 2000 1000 0 8.0 SP1 Time (s) 400 350 300 250 200 150 100 50 0 8.0 SP1 22

23 Did You Know: Geospatial Analytics Casey Starnes

HPE Vertica Geospatial Analytics Quick Overview What do I use Geospatial Analytics for? Use Geospatial Analytics to store and query vector data (points, lines, polygons) in Vertica. 24

HPE Vertica Geospatial Analytics Quick Overview What data types are supported? Vertica supports two spatial data types that can be used to store geographical objects such as points, lines, and polygons. GEOMETRY: Used to store planar data. It is generally used to store X,Y coordinates in a two-dimensional space. GEOGRAPHY: Used to store spherical (round-earth) data, or an object in the WGS84 coordinate system. It is used to store longitude and latitude coordinates that represent points, lines, and polygons on the Earth's surface. 25

HPE Vertica Geospatial Analytics Quick Overview What functions are supported? Vertica provides 65+ built-in functions for spatial analysis. The functions allow for the creation, comparison, analysis, and retrieval of spatial data. ST_<function_name> Compliant with OGC standards. (40+ functions) ST_Area ST_Distance ST_Intersects ST_IsValid STV_<function_name> Vertica specific; not OGC compliant. (25+ functions) STV_Create_Index STV_NN STV_Intersect STV_Export2Shapefile 26

HPE Vertica Geospatial Analytics Quick Overview How do spatial indexes in Vertica work? Spatial indexes are only used when performing spatial joins using STV_Intersect Query Result Join [block_id, count(ph.id)/count(rd.id)] Group By [block_id, count(ph.id)] Group By [block_id, count(rd.id)] x Spatial Join [ph.id, block_id] Spatial Join [rd.id, block_id] x x potholes ph blocks_idx roads rd x pothole road 27

Boston Potholes Use Case 28

Boston Potholes - Use Case Potholes are a fact of life in New England 29

Boston Potholes - Use Case Potholes need to be fixed 30

Boston Potholes - Use Case Where in Boston are the most potholes fixed? 31

Pothole spatial analysis Block Pothole Density: # potholes # roads Query Result Join [block_id, count(ph.id)/count(rd.id)] x Group By [block_id, count(ph.id)] Group By [block_id, count(rd.id)] x Spatial Join [ph.id, block_id] Spatial Join [rd.id, block_id] x x pothole road potholes ph blocks_idx roads rd Block Pothole Density = 3/2 32

Boston Potholes - Use Case D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161. URL http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf 33

Geospatial Analytics - Summary Spatial data types follow the OGC standard 65+ geospatial functions Perform fast spatial joins using STV_Intersect Spatial indexes can only be used with STV_Intersect Leverages the Vertica MPP architecture for scalability In-database geospatial analytics Spatial data types and SQL functions Inclusion of geographical dimension in data analysis Additional Resources and Examples: https://github.com/vertica/vertica-geospatial 34

Q&A Do you have additional feedback for our team? BigDataPlatformCustomerCareTeam@HPE.com