VOLTDB + HP VERTICA

ARCHITECTURE FOR FAST AND BIG

DATA ARCHITECTURE FOR FAST + BIG DATA
[Diagram. Fast data: a Fast Operational Database handles ingest/interactive access, decisioning and streaming analytics, exports to the big data tier, and provides fast serving of analytics. Big data: columnar analytics (OLAP) and BI reporting, a data lake (HDFS) for non-relational processing, and ETL from enterprise apps (CRM, ERP, etc.).]

REQUIREMENTS FOR FAST DATA
[Diagram: the fast data tier (Fast Operational Database: ingest/interactive, decisioning, streaming analytics, export, fast serve analytics) alongside the big data tier (BI reporting, SQL on Hadoop, exploratory analytics, data lake (HDFS), MapReduce, ETL from CRM, ERP and other enterprise apps).]
1) Ingest & interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics

REQUIREMENTS FOR FAST DATA: STREAM PROCESSING
[Diagram: the stream-processing alternative. A stream processor ingests events and runs continuous computation for real-time analytics (RTA) next to a separate SQL database; the big data tier holds BI reporting, SQL on Hadoop, exploratory analytics, the data lake (HDFS), MapReduce, and ETL from CRM, ERP and other enterprise apps.]
Why the streaming alternative is wrong:
- Decisions are made only on aggregated or predefined data
- Real-time analytics require hand-coded computations
- It cannot do fast serving of analytics from the warehouse
- The system of record (OLTP) requires a different system
Requirements:
1) Ingest & interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics
6) System of record (OLTP)

VOLTDB'S ROLE

VOLTDB ASSUMPTIONS (2008)
- High availability is fundamental
- Shared-nothing commodity clusters
- Win for cloud and non-cloud users alike
- Operational data sets fit in RAM
- External transaction control is slow
- 10s to 100s of cores per machine
- Specialized systems win
- Nobody cares about 5x faster; 10x is a floor
Mike Stonebraker

TRADITIONAL RDBMS: BAD AT CONCURRENCY, DURABILITY
Heavy overhead:
- 1000s of concurrent versions
- Contention for locked records
- Contention for latching on the lock table
- Index bottlenecks
- Disk I/O bottlenecks
- Architecture limits scaling
[Chart: where the work goes]
- Buffer management: 29%
- Logging: 20%
- Locking: 18%
- Useful work: 12%
- Index management: 11%
- Latching: 10%
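The six slices in the chart sum to 100%, and only 12% is useful work. A quick calculation makes the implication concrete: if all the overhead were removed while the useful work stayed the same, the same transactions would take 12% of the time, an idealized single-node ceiling of roughly 8.3x. This is a back-of-the-envelope sketch; it assumes the overhead could be removed entirely and ignores any costs a replacement design introduces.

```python
# Share of the work per component, as read from the chart (percent).
breakdown = {
    "Buffer management": 29,
    "Logging": 20,
    "Locking": 18,
    "Useful work": 12,
    "Index management": 11,
    "Latching": 10,
}

total = sum(breakdown.values())              # the six slices cover 100%
overhead = total - breakdown["Useful work"]  # everything that is not useful work
# Idealized ceiling: if all overhead vanished and useful work stayed constant,
# the same transactions would finish in 12% of the original time.
ceiling = total / breakdown["Useful work"]

print(total, overhead, round(ceiling, 1))  # 100 88 8.3
```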

THE VOLTDB TECHNOLOGY OVERVIEW
High-velocity, in-memory database
- Data ingestion, decisioning and real-time analytics
- Thousands to millions of transactions a second
- Data fully protected with disk durability
Relational, ACID-compliant SQL
- Keep complex data management where it belongs
- Visibility into business via real-time analytics
- SQL lowers development costs
Scale out on commodity hardware
- Clustered system with single operational view
- Built-in failover and replication
- Flexible deployment in cloud or dedicated servers
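Scaling out a shared-nothing cluster on commodity hardware rests on partitioning: each row is routed by a partitioning key to exactly one partition, which owns that data outright. The sketch below illustrates the idea in plain Python. It is not VoltDB's API or its actual hashing scheme; `partition_for` and `SharedNothingCluster` are hypothetical names. It uses a stable digest rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a partitioning-key value to a partition id in [0, num_partitions).

    A stable digest (not Python's per-process randomized hash) means every
    node computes the same placement for the same key.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class SharedNothingCluster:
    """Each partition owns a disjoint slice of the rows; nothing is shared."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def upsert(self, key, row):
        # Only the owning partition is touched, so work on different keys
        # can proceed on different partitions independently.
        self.partitions[partition_for(key, len(self.partitions))][key] = row

    def get(self, key):
        return self.partitions[partition_for(key, len(self.partitions))].get(key)
```

Plain modulo placement remaps most keys whenever the partition count changes; production systems typically use schemes such as consistent hashing to limit that data movement.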

VOLTDB EXPORT
[Diagram: VoltDB Server -> Data Queue (overflow to disk) -> Connector -> batch insert, commit -> Target Database]
- Automatic and continuous
- Transactional data transfer
- Resilient against impedance mismatches (e.g. a target database slower than the incoming stream)
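The export flow in the diagram (a data queue that overflows to disk, then a connector that batch-inserts and commits into the target) can be sketched as follows. This is a minimal illustration under stated assumptions, not VoltDB's export implementation: the JSON-lines spill file, the use of `sqlite3` as a stand-in target database, the `events(id, payload)` table and the names `OverflowQueue` and `export_all` are all hypothetical. Rows are acknowledged (removed from the queue) only after the target commit succeeds, so a failed insert leaves them queued.

```python
import json
import os
import sqlite3
import tempfile
from collections import deque
from itertools import islice


class OverflowQueue:
    """FIFO export queue: up to `mem_limit` rows stay in memory, later rows
    spill to an append-only file until the backlog drains (hypothetical)."""

    def __init__(self, mem_limit, spill_path):
        self.mem_limit = mem_limit
        self.spill_path = spill_path
        self.mem = deque()
        self.spilled = 0      # rows waiting on disk
        self.read_offset = 0  # file position of the oldest unacknowledged spilled row

    def put(self, row):
        # While anything is on disk, new rows also go to disk so FIFO order holds.
        if self.spilled == 0 and len(self.mem) < self.mem_limit:
            self.mem.append(tuple(row))
        else:
            with open(self.spill_path, "a") as f:
                f.write(json.dumps(list(row)) + "\n")
            self.spilled += 1

    def _read_spilled(self, count):
        """Read `count` spilled rows from read_offset; return (rows, new offset)."""
        rows = []
        with open(self.spill_path) as f:
            f.seek(self.read_offset)
            for _ in range(count):
                rows.append(tuple(json.loads(f.readline())))
            return rows, f.tell()

    def peek(self, n):
        """Return up to n oldest rows without removing them."""
        batch = list(islice(self.mem, n))
        if len(batch) < n and self.spilled:
            rows, _ = self._read_spilled(min(n - len(batch), self.spilled))
            batch.extend(rows)
        return batch

    def ack(self, k):
        """Remove the k oldest rows once they are committed downstream."""
        from_mem = min(k, len(self.mem))
        for _ in range(from_mem):
            self.mem.popleft()
        if k > from_mem:
            _, self.read_offset = self._read_spilled(k - from_mem)
            self.spilled -= k - from_mem

    def __len__(self):
        return len(self.mem) + self.spilled


def export_all(queue, conn, batch_size):
    """Connector loop: batch insert, commit, then acknowledge the batch."""
    exported = 0
    while len(queue):
        batch = queue.peek(batch_size)
        with conn:  # sqlite3 commits on success and rolls back on an exception
            conn.executemany("INSERT INTO events(id, payload) VALUES (?, ?)", batch)
        queue.ack(len(batch))  # drop rows only after the commit succeeded
        exported += len(batch)
    return exported
```

Because a crash between the commit and the acknowledgement would resend that batch, this design gives at-least-once delivery; a real target would want idempotent inserts (for example an upsert keyed on the event id).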

VOLTDB YCSB
[Charts: YCSB Workload-A, Workload-B and Workload-E scaling, SoftLayer vs. AWS; throughput (ops/sec) at 3, 6, 9 and 12 nodes]

VOLTDB APPLICATIONS
Data pipelines: apps against streams, using export connectors to downstream OLAP/HDFS
- Stream processing, event correlation, real-time ETL: streaming-scale workloads (100k+ write transactions/second). Pair new events to previous events: session start, update, end; max sensor reading in a 200 ms window; CDR update; ACID upsert.
- Efficient continuous trickle load to an archive destination (HDFS, OLAP)
Real-time analytics: in-memory MPP SQL on materialized views and moving windows
- Real-time analytics: running aggregates, groups, summary data; streaming counters, time-series grouping
- Moving window cache: persist the tip of the stream for ad hoc queries, real-time analysis and operational monitoring
Fast decisions: scalable request/response applications requiring ACID transactions and high throughput
- Per-event decisions: synchronous per-event (ms latency) authorization, personalization, recommendation
- Real-time analytics: running aggregates, groups, summary data; cross-event, cross-row, DB-global summaries
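One of the stream-processing examples above, "max sensor reading in 200 ms window", can be maintained per event with a monotonic deque. This is a generic sketch of that technique, not how VoltDB evaluates such a query; the `WindowMax` class is hypothetical, and it assumes events arrive with non-decreasing millisecond timestamps and that the window is the half-open interval (now - 200 ms, now].

```python
from collections import deque


class WindowMax:
    """Maximum reading over a trailing time window, updated per event.

    The deque holds (timestamp_ms, value) pairs whose values strictly decrease
    from front to back, so the front is always the current maximum. Each
    reading is appended once and removed once: amortized O(1) per event.
    """

    def __init__(self, window_ms):
        self.window_ms = window_ms
        self.q = deque()

    def add(self, ts_ms, value):
        # Earlier readings <= the new one can never be the maximum again.
        while self.q and self.q[-1][1] <= value:
            self.q.pop()
        self.q.append((ts_ms, value))
        # Evict readings outside the window (ts_ms - window_ms, ts_ms].
        while self.q[0][0] <= ts_ms - self.window_ms:
            self.q.popleft()
        return self.q[0][1]
```

For readings `(0, 5), (50, 3), (100, 7), (250, 4), (320, 6), (400, 2)` with a 200 ms window, the per-event maxima are `5, 5, 7, 7, 6, 6`: the 7 recorded at 100 ms drops out once the clock passes 300 ms.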

VOLTDB + HP VERTICA

HP VERTICA + VOLTDB JOINT CUSTOMERS

SAMPLE OF VOLTDB / OLAP JOINT APPLICATIONS
Event logging/profiling (Edgar Online)
  VoltDB: ingest events; filter to ~10%; export to Vertica
  OLAP: analytic reports
Online game optimization (Machine Zone)
  VoltDB: ingest game events; real-time dashboards; moving-window A/B in-game testing
  OLAP: analytics to Tableau
Mortgage loan app (large bank)
  VoltDB: operational DB; ingest, update; scoring dashboard; 5,000+ concurrent users; export to Vertica
  OLAP: high-volume analytics; near real-time/batch; historical store
Marketing solutions (FICO)
  VoltDB: OLTP client; ingest events (15-20k tps); update new information; in-transaction analytics; export to Vertica
  OLAP: DB of record; analytic requests; 3 Vertica clusters; multi-TB

EXAMPLE
VoltDB for fast, Vertica for big.
Bi-directional connections:
- VoltDB export (VoltDB -> Vertica)
- Vertica UDX (VoltDB <- Vertica)
Per-event personalization using real-time data and historical scoring

REAL-TIME SCORING EXAMPLE
Personalization opportunities on a free-to-play (F2P) gaming platform:
- User segmentation model calculated in Vertica and stored in VoltDB
- Responses scored by segment
- Game play events and scoring decisions exported to Vertica
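The scoring loop described above has two directions: segment assignments computed in the warehouse are loaded into the fast tier, and each game-play event is answered from that in-memory table, with the event and the decision queued for export back to the warehouse. The sketch below shows only that shape, in plain Python. The segment names, the `OFFERS` table, the `"casual"` default for unknown users and the `Scorer` class are all hypothetical; no Vertica or VoltDB API is used.

```python
from dataclasses import dataclass, field

# Hypothetical segment -> response table. In the slide's setup the segment
# assignments come from a model computed in the warehouse (Vertica).
OFFERS = {"whale": "vip_bundle", "churn_risk": "free_coins", "casual": "starter_pack"}


@dataclass
class Scorer:
    segments: dict                                    # user_id -> segment, loaded from the warehouse
    export_queue: list = field(default_factory=list)  # rows bound for the warehouse

    def refresh_segments(self, new_segments):
        """Warehouse -> fast tier: replace assignments after a model rerun."""
        self.segments = dict(new_segments)

    def on_event(self, user_id, event):
        """Answer one game-play event synchronously from in-memory state."""
        segment = self.segments.get(user_id, "casual")  # hypothetical default
        offer = OFFERS[segment]
        # Fast tier -> warehouse: keep the event and the decision for analysis.
        self.export_queue.append((user_id, event, segment, offer))
        return offer
```

Keeping the lookup table in memory is what allows a per-event answer within milliseconds, while the expensive model training stays in the warehouse.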

FAST AND BIG IN COMBINATION
VoltDB profile:
- In memory: user segmentation, GB to TB (300M+ rows)
- 10k to 1M+ requests/sec
- 99th-percentile latency under 5 ms (five nines, i.e. 99.999th percentile, under 50 ms)
- VoltDB export to Vertica
Vertica profile:
- TB to PB of historical data
- Columnar analytics for fast reporting
- Real-time ingest of historical data (possibly via VoltDB)
- Vertica UDX to VoltDB
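Latency targets such as "99th percentile under 5 ms, five nines under 50 ms" are checked against a set of measured request latencies. A minimal sketch using the nearest-rank percentile definition follows; the function names are hypothetical, latencies are assumed to be in milliseconds, and the percentile is passed as a string so that values like `"99.999"` are handled with exact `Fraction` arithmetic instead of binary floating point.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct% of
    all samples are <= it. `pct` is parsed exactly (e.g. "99" or "99.999")."""
    if not samples:
        raise ValueError("no samples")
    p = Fraction(pct)
    if not 0 < p <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank, exact arithmetic
    return ordered[rank - 1]


def meets_slo(latencies_ms):
    """Check both targets from the slide: p99 < 5 ms and p99.999 < 50 ms."""
    return (nearest_rank_percentile(latencies_ms, "99") < 5
            and nearest_rank_percentile(latencies_ms, "99.999") < 50)
```

A five-nines figure is only meaningful with enough samples: with fewer than 100,000 measurements, the 99.999th percentile under this definition is simply the slowest request observed.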

THANK YOU!