HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS
Mark Brooks - Principal System Engineer @ Kinetica
May 09, 2017
The Challenge
How to maintain analytic performance while dealing with:
- Larger data volumes
- Streaming data with minimal end-to-end latency
- Ad-hoc drill down (you can't pre-aggregate everything)
Architectural and Design Approaches
1. One database to rule them all
2. SQL on Hadoop (or directly on the Data Lake)
3. Data Lake + NoSQL + Spark + Search + Cache +
4. Lambda Architecture
5. Kappa Architecture
6. Next generation hardware acceleration
One Database To Rule Them All
SQL on a Data Lake
Credit: https://www.slideshare.net/bigdatapump/sql-on-hadoop-49494494
Hadoop + NoSQL + Search + Memory Cache +
Credit: Matt Turck - https://www.slideshare.net/mjft01/big-data-landscape-matt-turck-may-2014
Lambda Architecture
Credit: Nathan Marz - http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
Credit: James Kinley - http://jameskinley.tumblr.com/tagged/lambda
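The core idea of the Lambda Architecture can be sketched in a few lines: a batch layer recomputes a view from the full dataset, a speed layer folds in events that arrived since the last batch run, and queries merge the two. The event data and function names below are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical events: (timestamp, page) pairs.
batch_events = [(1, "home"), (2, "cart"), (3, "home")]  # already in the batch layer
recent_events = [(4, "home"), (5, "checkout")]          # arrived after the last batch run


def build_batch_view(events):
    """Batch layer: recompute the full view from the master dataset."""
    return Counter(page for _, page in events)


def update_speed_view(view, event):
    """Speed layer: incrementally fold one new event into the real-time view."""
    _, page = event
    view[page] += 1
    return view


def query(batch_view, speed_view, page):
    """Serving layer: merge the batch view and the real-time view at query time."""
    return batch_view[page] + speed_view[page]


batch_view = build_batch_view(batch_events)
speed_view = Counter()
for ev in recent_events:
    update_speed_view(speed_view, ev)

print(query(batch_view, speed_view, "home"))  # 2 from batch + 1 from speed = 3
```

The cost of this design is that the same logic lives in two code paths (batch and speed) that must be kept in agreement.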
Kappa Architecture
Credit: Jay Kreps https://www.oreilly.com/ideas/questioning-the-lambda-architecture
Kappa Architecture
"Stream processing systems already have a notion of parallelism; why not just handle reprocessing by increasing the parallelism and replaying history very, very fast?"
Credit: Jay Kreps https://www.oreilly.com/ideas/questioning-the-lambda-architecture
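A minimal sketch of that reprocessing idea, assuming a retained, partitioned log: when the job logic changes, the new version replays the whole log with more workers into a fresh output, and readers are then switched over. The log contents, the two job versions, and the `replay` helper are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical retained log, split into partitions.
log_partitions = [
    [3, 5, 8],
    [1, 4],
    [7, 2, 6],
]


def job_v1(value):
    """Original stream job."""
    return value * 2


def job_v2(value):
    """Revised stream job that needs the full history reprocessed."""
    return value * 2 + 1


def replay(partitions, job, parallelism):
    """Reprocess every partition from its first offset using `parallelism` workers."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map preserves the order of the input partitions.
        return list(pool.map(lambda part: [job(v) for v in part], partitions))


output_v1 = replay(log_partitions, job_v1, parallelism=1)
# Reprocess with the new code and one worker per partition, then switch readers.
output_v2 = replay(log_partitions, job_v2, parallelism=len(log_partitions))
serving_table = output_v2
```

Only one code path exists here; the same job handles both live data and reprocessing.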
Next Generation Hardware Acceleration
Consider a system with these characteristics:
- Horizontally scalable
- Low end-to-end latency
- Powerful enough to not require pre-aggregation
This is now possible.
GPU Accelerated Compute
- 1990-2000s, DATA WAREHOUSE: RDBMS & Data Warehouse technologies enable organizations to store and analyze growing volumes of data on high performance machines, but at high cost.
- 2005, DISTRIBUTED STORAGE: Hadoop and MapReduce enable distributed storage and processing across multiple machines. Storing massive volumes of data becomes more affordable, but performance is slow.
- 2010, AFFORDABLE MEMORY: Affordable memory allows for faster data read and write. HANA, MemSQL, & Exadata provide faster analytics.
- 2017, GPU ACCELERATED COMPUTE: GPU cores bulk process tasks in parallel - far more efficient for many data-intensive tasks than CPUs, which process those tasks linearly.
AT SCALE, PROCESSING BECOMES THE BOTTLENECK
Kinetica: Core
ANALYTICS DATABASE ACCELERATED BY GPUs
- Columnar in-memory database
- Data available much like a traditional RDBMS: rows, columns
- Data held in-memory; persisted to disk
- Interact with Kinetica through its native REST API, Java, Python, JavaScript, Node.js, C++, SQL, etc., as well as with various connectors
- Native GIS & IP address object support
- VERY FAST: ideal for OLAP workloads
- Typical hardware setup: 256GB - 1TB memory with 2-4 GPUs per node
[Diagram: an HTTP head node in front of a GPU-accelerated columnar in-memory database (columns A, B, C; rows 1-4), persisted to disk, on commodity hardware with GPUs]
Multi-Head Ingest and Scale-Out Architecture
[Diagram: three nodes, each with an HTTP head node, a columnar in-memory store, and disk, on commodity hardware with GPUs. Labels: MULTI-HEAD INGEST; ON-DEMAND SCALE OUT (+)]
Real-Time Data Handlers for Structured & Unstructured Data
- APIs: Java API, C++ API, JavaScript API, Node.js API, Python API, REST API
- VISUALIZATION via ODBC/JDBC
- GEOSPATIAL CAPABILITIES: Geometric Objects, Tracks, WKT, WMS, Geospatial Endpoints
- OPEN SOURCE INTEGRATION: Apache NiFi, Apache Kafka, Apache Spark, Apache Storm
- OTHER INTEGRATION: Message Queues, ETL Tools, Streaming Tools
[Diagram: KINETICA CLUSTER of four nodes, each with an HTTP head node, a columnar in-memory store, and disk, on commodity hardware with GPUs. Label: On-Demand Scale]
Parallel Ingest Provides High Performance Streaming
PARALLEL INGEST
[Diagram: three nodes, each 1TB / 2 GPU]
Every node in the system can share the work of data ingest, which provides higher throughput. Ingest can be made faster simply by adding more nodes. No compute is used on ingest!
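One way to picture multi-head ingest is a client that groups records by destination node and sends each group straight to that node, so no single head node sits in the data path. The sketch below assumes a simple hash-modulo routing on a `device_id` field; the node names, the field, and the routing rule are illustrative assumptions, not Kinetica's actual sharding scheme.

```python
import zlib
from collections import defaultdict

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def pick_node(shard_key, nodes):
    """Route a record to a node using a deterministic hash of its shard key."""
    return nodes[zlib.crc32(shard_key.encode("utf-8")) % len(nodes)]


def route_batch(records, nodes):
    """Group a batch by destination node so each group can be sent directly
    to that node rather than through one head node."""
    per_node = defaultdict(list)
    for record in records:
        per_node[pick_node(record["device_id"], nodes)].append(record)
    return per_node


records = [{"device_id": f"dev-{i}", "value": i} for i in range(12)]
batches = route_batch(records, NODES)
for node in NODES:
    print(node, len(batches[node]))
```

With this kind of routing, adding a node adds another independent ingest path, which is the property the slide describes.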
Speed Layer for the Data Lake
- Parallel ingestion of events
- Kinetica is the speed layer, with real-time analytic capabilities
- HDFS for archival store
- Much looser coupling than traditional lambda architecture
- Batch-mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from the data lake
[Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; STREAM PROCESSING; Kinetica Connectors; Parallel Ingestion; Put, get, scan; Execute complex analytics on the fly; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; HDFS / AWS S3 / GCS / Azure Data Lake]
Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle
- Parallel ingestion of events
- Lambda-type architecture for Teradata or Oracle
- Kinetica is the speed layer, with near-real-time analytic capabilities
- Converge machine learning, streaming and location analytics, and fast query and analytics with Kinetica and RDBMS
[Diagram labels: DATA IN MOTION AND REST; Amazon Kinesis; Kinetica Connectors; STREAM / ETL PROCESSING; Fast GPU-accelerated, in-memory database; Converge ML, AI, Streaming; MOBILE USERS; ANALYSTS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; DATA WAREHOUSE / TRANSACTIONAL]
Advanced In-Database Analytics
ORCHESTRATION LAYER WITH USER-DEFINED FUNCTIONS (UDFs)
1. User-defined functions (UDFs) can receive table data, do arbitrary computations, and save output to a separate table in a distributed manner.
2. UDFs have direct access to CUDA APIs, which enables compute-to-grid analytics for logic deployed within Kinetica.
3. Works with custom code or packaged code. Opens the way for machine learning/artificial intelligence libraries such as TensorFlow, BIDMach, Caffe and Torch to work on data directly within Kinetica.
4. Available now with C++ & Java bindings.
[Diagram labels: PHYSICAL / VIRTUAL SERVER; Table A, Table B, Table C, Table n; Proc Server; UDF_A, UDF_B, UDF_n; CUDA Libraries; GPU; n number of Kinetica servers; Data returned to output table for further analysis; UDFs exposed from RESTful endpoint (/exec/proc/udf_a/)]
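The execution model in point 1 can be sketched without any database at all: each server runs the same function against its local shard of the input table and writes results to its local shard of a separate output table. Everything below (the `cluster` structure, `udf_a`, `exec_proc`) is a hypothetical stand-in written in plain Python; it does not use or represent Kinetica's UDF API.

```python
# Hypothetical in-memory "cluster": each server holds one shard of table_a.
cluster = [
    {"table_a": [{"id": 1, "price": 10.0, "qty": 3},
                 {"id": 2, "price": 4.5, "qty": 2}]},
    {"table_a": [{"id": 3, "price": 7.0, "qty": 1}]},
]


def udf_a(rows):
    """Example UDF body: an arbitrary per-row computation (here, a line total)."""
    return [{"id": r["id"], "total": r["price"] * r["qty"]} for r in rows]


def exec_proc(servers, udf, input_table, output_table):
    """Run the UDF on every server against its local shard and store the
    result in a local shard of a separate output table."""
    for server in servers:
        server[output_table] = udf(server[input_table])


exec_proc(cluster, udf_a, "table_a", "table_b")

# The output table stays distributed; collect it here only for display.
all_rows = [row for server in cluster for row in server["table_b"]]
print(all_rows)
```

Because the function runs where the data lives, no rows cross servers during the computation.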
Kinetica Architecture
[Diagram labels: STREAMING DATA; ERP / CRM / TRANSACTIONAL DATA; ETL / STREAM PROCESSING; PARALLEL INGEST; ON DEMAND SCALE OUT (+); 1TB MEM / 2 GPU CARDS; In-Database Processing; UDFs; CUSTOM LOGIC; BIDMach ML Libs; Native APIs; SQL; Geospatial WMS; Custom Connectors; BI / GIS / APPS; KINETICA REVEAL; BI DASHBOARDS; CUSTOM APPS & GEOSPATIAL]
AI & BI on One GPU-Accelerated Database
[Diagram labels: BUSINESS USERS; BUSINESS INTELLIGENCE; CUSTOM APPLICATIONS; HIGH FIDELITY GEOSPATIAL PIPELINE; SQL; ODBC / JDBC; Native REST API; WMS; HIGH PERFORMANCE ANALYTICS DATABASE; UDF; BIDMach; DATA SCIENTISTS / DEVELOPERS; MACHINE LEARNING & DEEP LEARNING; GPU-ACCELERATED DATA SCIENCE; PREDICTIVE MODELS, e.g. Risk Management, Sales Volume, Fraud]
50-100x Faster on Queries with Large Datasets
WHEN COMPARED TO LEADING IN-MEMORY ALTERNATIVES
- Large retailer tested complex SQL queries on 3 years of retail data (150bn rows)
- 10 node Kinetica cluster against 30TB+ cluster from next best alternative
- GPU is able to perform many instructions in parallel. Huge performance gains on aggregations, group bys, joins, etc.
- Kinetica sustained ingest of 1.3bn objects/minute with 70 attributes per row
[Chart: Kinetica vs. Leading In-Memory DB on SELECT (Q10), GROUP BY (Q5), and SUM (Q1); axis from 0 to 50]
Distributed Geospatial Pipeline
NATIVE VISUALIZATION IS DESIGNED FOR FAST MOVING, LOCATION-BASED DATA
Native Geospatial Object Types
- Points, Shapes, Tracks, Labels
Native Geospatial Functions
- Filters (by area, by series, by geometry, etc.)
- Aggregation (histograms)
- Geofencing - triggers
- Video generation (based on dates/times)
Generate Map Overlay Imagery (via WMS)
- Rasterize points
- Style based on attributes (class-break)
- Heat maps
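To make "filter by geometry" and "geofencing triggers" concrete, here is a small generic sketch: a ray-casting point-in-polygon test, plus a helper that reports when a track enters or leaves a fence. The function names, the fence, and the track are hypothetical; this is a textbook technique, not a description of how Kinetica implements these functions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: toggle `inside` each time a horizontal ray from
    (x, y) toward +x crosses a polygon edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geofence_events(track, polygon):
    """Return (index, 'enter' or 'exit') whenever consecutive track points
    change sides of the fence. The track is assumed to start outside."""
    events = []
    was_inside = False
    for i, (x, y) in enumerate(track):
        now_inside = point_in_polygon(x, y, polygon)
        if now_inside != was_inside:
            events.append((i, "enter" if now_inside else "exit"))
        was_inside = now_inside
    return events


fence = [(0, 0), (10, 0), (10, 10), (0, 10)]     # square geofence
track = [(-5, 5), (2, 5), (8, 5), (15, 5)]       # object moving left to right
print(geofence_events(track, fence))  # [(1, 'enter'), (3, 'exit')]
```

A filter by area is the same test applied to every point; a trigger fires on the transitions.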
Full-Text Search
Kinetica includes powerful text search functionality, including:
- Exact Phrases
- Boolean AND / OR
- Wildcards
- Grouping
- Fuzzy Search (Damerau-Levenshtein optimal string alignment algorithm)
- N-Gram Term Proximity Search
- Term Boosting Relevance Prioritization
Example query fragments: Rain Tire ~5   "Union Tranquility"~10   [100 TO 200]
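The optimal string alignment (OSA) distance named above is the restricted form of Damerau-Levenshtein: it counts insertions, deletions, substitutions, and transpositions of adjacent characters, with no substring edited more than once. A minimal reference sketch follows; the `fuzzy_match` helper and its `max_distance` parameter are hypothetical and only illustrate how such a distance could drive fuzzy term matching.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_match(term, vocabulary, max_distance=1):
    """Return vocabulary words within `max_distance` edits of `term`."""
    return [w for w in vocabulary if osa_distance(term, w) <= max_distance]


print(osa_distance("tire", "tier"))                     # 1 (one transposition)
print(fuzzy_match("tire", ["tier", "tired", "train"]))  # ['tier', 'tired']
```

The restriction matters in edge cases: "ca" to "abc" costs 3 under OSA, while unrestricted Damerau-Levenshtein gives 2.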
CASE STUDY: LOCATION BASED ANALYTICS
INTELLIGENCE: US Army - INSCOM
US Army's in-memory computational engine for any data with a geospatial or temporal attribute, for a major joint cloud initiative within the Intelligence Community (IC ITE).
U.S. Army INSCOM shift from Oracle to GPUdb
- Intel analysts are able to conduct near real-time analytics, fuse SIGINT, ISR, and GEOINT streaming big data feeds, and visualize them in a web browser.
- First time in history military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment.
- Major executive military and congressional visibility.
1 GPUdb server vs 42 servers with Oracle 10gR2 (2011)
- Oracle Spatial: 92 minutes; GPUdb: 20 ms
- 42x lower space
- 28x lower cost
- 38x lower power cost
CASE STUDY: LOCATION BASED ANALYTICS
LOGISTICS: Workforce optimization
USPS is the single largest logistic entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year.
DISTRIBUTED ANALYSIS
The USPS parallel cluster is able to serve up to 15,000 simultaneous sessions, providing the service's managers and analysts with the capability to instantly analyze their areas of responsibility via dashboards.
AT SCALE
With 200,000 USPS devices emitting location once every minute, more than a quarter billion events are captured and analyzed daily, tracked on 10 nodes.
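The "quarter billion events" figure follows directly from the device count and reporting interval. The short calculation below reproduces it; the per-node number assumes events are spread evenly across the 10 nodes, which the slide does not state.

```python
devices = 200_000
events_per_device_per_minute = 1
minutes_per_day = 24 * 60

events_per_day = devices * events_per_device_per_minute * minutes_per_day
events_per_second = events_per_day / (24 * 60 * 60)

# Assumption: even distribution across nodes.
nodes = 10
events_per_node_per_day = events_per_day // nodes

print(f"{events_per_day:,} events/day")               # 288,000,000 events/day
print(f"{events_per_second:,.0f} events/second")      # 3,333 events/second
print(f"{events_per_node_per_day:,} events/node/day") # 28,800,000 events/node/day
```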
CASE STUDY: LOCATION BASED ANALYTICS
LOGISTICS & FLEET MANAGEMENT: LARGE RETAILER
Kinetica enables agile tracking of shipments, helping store managers track inventory and arrival times.
- Visibility and tracking of deliveries & trucks for store managers
- ETA & Notifications: provide estimated time of delivery, notifications, and custom location-based alerting
- Route Optimization: based on truck size, and on whether cargo is perishable or contains hazardous materials
CASE STUDY: ADVANCED IN-DATABASE ANALYTICS
RISK MANAGEMENT: MULTINATIONAL BANK
Large financial institution moves counterparty risk analysis from overnight to real-time.
- Data collected by XVA library, which computes risk metrics for each trade
- Risk computations are becoming more complex and computationally heavy
- XVA analysis needs to project years into the future
- Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management
Scale Out on Industry Standard Hardware
- Kinetica typically results in 1/10 the hardware costs of standard in-memory databases.
- Runs on industry standard servers, 512GB memory with GPUs (ex. NVIDIA K80)
[Vendor logos under: IN THE CLOUD WITH; COMING SOON; CERTIFIED ON PREMISE WITH]
Stop by Booth #431 and Get Your Free T-shirt www.kinetica.com