VOLTDB + HP VERTICA
ARCHITECTURE FOR FAST AND BIG
DATA ARCHITECTURE FOR FAST + BIG DATA
[Diagram: fast + big data architecture. Labels recovered from the figure:]
FAST DATA: Fast Operational Database; Ingest / Interactive; Streaming Analytics; Decisioning; Export; Fast Serve Analytics
BIG DATA: Columnar Analytics; OLAP; BI Reporting; Data Lake (HDFS); Non-Relational Processing; ETL
Sources: CRM, ERP, etc. (Enterprise Apps)
REQUIREMENTS FOR FAST DATA
1) Ingest and interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics
[Diagram: the numbers 1 to 5 mark these requirements on the fast + big data architecture. Fast Data labels: Fast Operational Database; Ingest / Interactive; Streaming Analytics; Decisioning; Export; Fast Serve Analytics. Big Data labels: BI Reporting; SQL on Hadoop; Exploratory Analytics; Data Lake (HDFS); MapReduce; ETL. Sources: CRM, ERP, etc. (Enterprise Apps).]
REQUIREMENTS FOR FAST DATA: STREAM PROCESSING
Streaming Alternative is Wrong
1) Ingest and interact on streams of inbound data
2) Make per-event, data-driven decisions
3) Real-time analytics on fast-moving data
4) Integrated export to data warehouse
5) High-speed serving of warehouse-derived analytics
6) System of record OLTP (requires different system)
[Diagram: the numbers mark these requirements on a stream processing architecture. Callouts recovered from the figure: Decisions only on aggregated or predefined; Unable to do fast serving of analytics from warehouse; Continuous computation for RTA; Hand-coded computations. Other labels: Ingest; Decisioning; Stream Processing; SQL database; BIG DATA; BI Reporting; ETL; SQL on Hadoop; Exploratory Analytics; Data Lake (HDFS); MapReduce; CRM, ERP, etc. (Enterprise Apps).]
VOLTDB'S ROLE
VOLTDB ASSUMPTIONS (2008)
High availability is fundamental
Shared-nothing commodity clusters. Win for cloud and non-cloud users alike.
Operational data sets fit in RAM
External transaction control is slow
10s to 100s of cores per machine
Specialized systems win
Nobody cares about 5x faster; 10x is a floor
Mike Stonebraker
TRADITIONAL RDBMS: BAD AT CONCURRENCY, DURABILITY
Heavy Overhead
1000s of concurrent versions
Contention for locked records
Contention for latching on lock table
Index bottlenecks
Disk I/O bottlenecks
Architecture limits scaling
[Chart: breakdown of work]
Buffer Management   29%
Logging             20%
Locking             18%
Useful Work         12%
Index Management    11%
Latching            10%
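As a back-of-the-envelope reading of this slide's percentages, the sketch below computes an idealized speedup ceiling if some overhead categories cost nothing. The function name and the assumption that a category can be removed entirely are illustrative only; the slide gives the percentages, not this calculation.

```python
# Work breakdown from the slide, in percent of total.
BREAKDOWN = {
    "buffer_management": 29,
    "logging": 20,
    "locking": 18,
    "useful_work": 12,
    "index_management": 11,
    "latching": 10,
}


def speedup_ceiling(breakdown, removed):
    """Idealized speedup if every category in `removed` cost nothing.

    Assumption (not from the slide): categories are independent and can
    be eliminated completely, so this is an upper bound, not a forecast.
    """
    total = sum(breakdown.values())
    remaining = total - sum(breakdown[name] for name in removed)
    return total / remaining


# Everything except useful work removed: 100 / 12.
all_overhead = [name for name in BREAKDOWN if name != "useful_work"]
ceiling = speedup_ceiling(BREAKDOWN, all_overhead)
```

With only 12% of the work being useful, the ceiling is 100 / 12, a little over 8x, which sits close to the "10x is a floor" ambition on the assumptions slide.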
THE VOLTDB TECHNOLOGY OVERVIEW
High-Velocity, In-Memory Database
  Data ingestion, decisioning and real-time analytics
  Thousands to millions of transactions a second
  Data fully protected with disk durability
Relational, ACID-compliant SQL
  Keep complex data management where it belongs
  Visibility into business via real-time analytics
  SQL lowers development costs
Scale out on commodity hardware
  Clustered system with single operational view
  Built-in failover and replication
  Flexible deployment in cloud or dedicated servers
VOLTDB EXPORT
[Diagram labels: VoltDB Server; Data Queue; Overflow to disk; Connector; Batch Insert; Commit; Target Database]
Automatic and continuous
Transactional data transfer
Resilient against impedance mismatches
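The export flow on this slide (queue, overflow to disk, batch insert, commit) can be illustrated with a toy model. This is a minimal sketch, not VoltDB's implementation: the class `ExportQueue`, its parameters, and the `sink` callable standing in for the target database are all hypothetical names chosen for the example.

```python
import json
import os
import tempfile
from collections import deque


class ExportQueue:
    """Toy export queue: bounded memory buffer, overflow file, batched delivery."""

    def __init__(self, sink, batch_size=3, max_in_memory=4):
        self.sink = sink                    # hypothetical: callable taking a list of rows
        self.batch_size = batch_size
        self.max_in_memory = max_in_memory
        self.memory = deque()
        fd, self.overflow_path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)
        self.overflow_count = 0

    def append(self, row):
        # Once anything is on disk, keep appending there so order is preserved.
        if self.overflow_count or len(self.memory) >= self.max_in_memory:
            with open(self.overflow_path, "a") as f:
                f.write(json.dumps(row) + "\n")
            self.overflow_count += 1
        else:
            self.memory.append(row)

    def _refill_from_disk(self):
        if not self.overflow_count:
            return
        with open(self.overflow_path) as f:
            rows = [json.loads(line) for line in f]
        room = self.max_in_memory - len(self.memory)
        self.memory.extend(rows[:room])
        rest = rows[room:]
        with open(self.overflow_path, "w") as f:
            for row in rest:
                f.write(json.dumps(row) + "\n")
        self.overflow_count = len(rest)

    def flush_once(self):
        """Deliver one batch; rows leave the queue only after the sink returns."""
        batch = list(self.memory)[: self.batch_size]
        if not batch:
            return 0
        self.sink(batch)                    # an exception here leaves the rows queued
        for _ in batch:
            self.memory.popleft()
        self._refill_from_disk()
        return len(batch)

    def close(self):
        os.remove(self.overflow_path)
```

The design point the sketch tries to capture is that a slow or unavailable target only makes the queue grow (first in memory, then on disk); rows are dropped from the queue only after a batch has been accepted.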
VOLTDB YCSB
[Figures: YCSB Workload-A, Workload-B and Workload-E scaling, Softlayer vs AWS. Y axis: throughput (ops/sec). X axis: cluster size (3, 6, 9, 12 nodes).]
VOLTDB APPLICATIONS
Data Pipelines: apps against streams using export connectors to downstream OLAP/HDFS
  Stream processing: streaming scale (100k+ write transactions / second) workloads
  Event correlation: pair new events to previous events. Session start, update, end. Max sensor reading in 200ms window. CDR update. ACID upsert.
  Real time ETL: efficient continuous trickle load to archive destination (HDFS, OLAP)
Real time Analytics: in-memory MPP SQL on materialized views and moving windows
  Real time Analytics: running aggregates, groups, summary data. Streaming counters, time-series grouping.
  Moving window cache: persist tip of stream for ad hoc query and real time analysis, operational monitoring
Fast Decisions: scalable request/response applications requiring ACID transactions and high throughput
  Per-event decisions: synchronous per-event (ms latency) authorization, personalization, recommendation
  Real time Analytics: running aggregates, groups, summary data. Cross-event, cross-row, DB global summaries.
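The "max sensor reading in 200ms window" item can be made concrete with a small moving-window sketch. This is plain Python for illustration, not VoltDB SQL or a stored procedure; the class name `WindowMax` is hypothetical, and the code assumes readings arrive in non-decreasing timestamp order.

```python
from collections import deque


class WindowMax:
    """Running maximum over a trailing time window (monotonic deque)."""

    def __init__(self, window_ms=200):
        self.window_ms = window_ms
        # (timestamp_ms, value) pairs, oldest first, values strictly decreasing.
        self.candidates = deque()

    def add(self, ts_ms, value):
        """Record a reading and return the max over (ts_ms - window_ms, ts_ms]."""
        # Older readings that are not larger can never be the max again.
        while self.candidates and self.candidates[-1][1] <= value:
            self.candidates.pop()
        self.candidates.append((ts_ms, value))
        # Expire readings that have fallen out of the window.
        while self.candidates[0][0] <= ts_ms - self.window_ms:
            self.candidates.popleft()
        return self.candidates[0][1]


window = WindowMax(window_ms=200)
readings = [(0, 5), (100, 3), (150, 4), (200, 1), (360, 2)]
maxima = [window.add(ts, value) for ts, value in readings]
```

Each reading is pushed and popped at most once, so the per-event cost stays small even at streaming rates, which is the property a per-event window computation needs.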
VOLTDB + HP VERTICA
DATA ARCHITECTURE FOR FAST + BIG DATA
[Diagram: fast + big data architecture. Labels recovered from the figure:]
FAST DATA: Fast Operational Database; Ingest / Interactive; Streaming Analytics; Decisioning; Export; Fast Serve Analytics
BIG DATA: Columnar Analytics; OLAP; BI Reporting; Data Lake (HDFS); Non-Relational Processing; ETL
Sources: CRM, ERP, etc. (Enterprise Apps)
HP VERTICA / VOLTDB JOINT CUSTOMERS
SAMPLE OF VOLTDB / OLAP JOINT APPLICATIONS
Event logging/profiling (Edgar Online)
  VoltDB: Ingest events; filter to ~10%; export to Vertica
  OLAP: Analytic reports
Online Game Optimization (Machine Zone)
  VoltDB: Ingest game events; real-time dashboards; moving window; A/B in-game testing
  OLAP: Analytics to Tableau
Mortgage Loan App (Large Bank)
  VoltDB: Operational DB; ingest, update; scoring dashboard; 5,000+ concurrent users; export to Vertica
  OLAP: High volume analytics; near real-time/batch; historical store
Marketing Solutions (FICO)
  VoltDB: OLTP client; ingest events (15-20k tps); update new information; in-transaction analytics; export to Vertica
  OLAP: DB of record; analytic request; 3 Vertica clusters; multi-TB
EXAMPLE
VoltDB for Fast. Vertica for Big.
Bi-directional connections
  VoltDB Export (VoltDB -> Vertica)
  Vertica UDX (VoltDB <- Vertica)
Per-event personalization using real time data and historical scoring
REAL TIME SCORING EXAMPLE
F2P gaming platform
Personalization opportunities
User segmentation model calculated in Vertica and stored in VoltDB
Segment scored responses
Game play events and scoring decisions exported to Vertica
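The scoring loop on this slide can be sketched as a lookup against a precomputed segment table, followed by recording the decision for export. Everything named here is hypothetical: the segment names, the offers, and the `score_event` function are illustrations, and plain Python dictionaries and lists stand in for the VoltDB table and the export stream.

```python
# Hypothetical segment -> response mapping; a real deployment would define its own.
SEGMENT_OFFERS = {
    "high_spender": "premium_bundle",
    "casual": "starter_pack",
}
DEFAULT_OFFER = "no_offer"


def score_event(event, user_segments, export_log):
    """Pick a response for one game event and record the decision.

    user_segments: user_id -> segment, standing in for the segmentation
                   model computed offline and loaded into the fast store.
    export_log:    list standing in for the stream exported to the warehouse.
    """
    segment = user_segments.get(event["user_id"], "unknown")
    offer = SEGMENT_OFFERS.get(segment, DEFAULT_OFFER)
    decision = {
        "user_id": event["user_id"],
        "event_type": event["type"],
        "segment": segment,
        "offer": offer,
    }
    export_log.append(decision)  # every decision is kept for later analysis
    return decision


segments = {"u1": "high_spender", "u2": "casual"}
log = []
first = score_event({"user_id": "u1", "type": "level_complete"}, segments, log)
second = score_event({"user_id": "u9", "type": "login"}, segments, log)
```

Keeping the decision alongside the event in the exported record is what lets the historical side later evaluate how each segment responded and recompute the model.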
FAST AND BIG IN COMBINATION
VoltDB Profile
  In memory: user segmentation, GB to TB (300M+ rows)
  10k to 1M+ requests/sec
  99th percentile latency under 5 ms (five 9s under 50 ms)
  VoltDB export to Vertica
Vertica Profile
  TB to PB of historical data
  Columnar analytics for fast reporting
  Real time ingest of historical data (possibly via VoltDB)
  Vertica UDX to VoltDB
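A latency profile like the one on this slide can be checked against measured samples with a short percentile calculation. This is a sketch under assumptions: it reads the slide's second figure as the 99.999th percentile, uses the nearest-rank percentile definition, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_profile(latencies_ms, p99_limit_ms=5.0, p99999_limit_ms=50.0):
    """True if both tail-latency targets hold for the given samples."""
    return (
        percentile(latencies_ms, 99) < p99_limit_ms
        and percentile(latencies_ms, 99.999) < p99999_limit_ms
    )


# 990 fast requests and 10 slower ones: the slow tail starts above the 99th percentile.
good_run = [1.0] * 990 + [10.0] * 10
# 2% of requests at 6 ms pushes the 99th percentile over the 5 ms limit.
bad_run = [1.0] * 98 + [6.0] * 2
```

Note that a meaningful 99.999th percentile needs at least on the order of 100,000 samples; with fewer, the nearest-rank value is simply the slowest request observed.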
THANK YOU!