Leveraging Customer Behavioral Data to Drive Revenue the GPU Way (S7456)

Transcription:

Leveraging Customer Behavioral Data to Drive Revenue the GPU way

Hi!
Arnon Shimoni, Senior Solutions Architect
- I like hardware and parallel / concurrent stuff
- In my 4th year at SQream Technologies
- Send gifs to @arnon86 or arnon@sqream.com

tl;dr
- GPUs are good number crunchers, which makes them good for data processing
- SQream DB with GPUs is fast
- Rethink current solutions; the GPU can help
- Simple hardware is good enough; let's avoid throwing lots of hardware at issues
- Don't need to shovel money at the problem!

SQream DB: an SQL database powered by GPUs
Powered by GPUs
- Massively parallel engine
- Relies on GPUs for power, not RAM
Fast
- Columnar storage
- Always-on compression
- 2 TB / hour / GPU ingest speed
Scalable
- 10 TB to 1 PB with ease
SQL database
- Familiar ANSI SQL
- Standard connectors (ODBC, JDBC)
Extensible for AI
- Python, Jupyter, etc.
- Data science
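To put the quoted ingest rate in perspective, here is a rough sketch of the arithmetic. It assumes the 2 TB/hour/GPU figure scales linearly with the number of GPUs, which the slide does not state; the function name and the example volumes are illustrative only.

```python
def ingest_hours(data_tb: float, gpus: int = 1,
                 tb_per_hour_per_gpu: float = 2.0) -> float:
    """Estimate bulk-load time in hours.

    Assumption (not from the slides): ingest throughput scales
    linearly with GPU count at the quoted 2 TB/hour/GPU.
    """
    if data_tb < 0 or gpus < 1 or tb_per_hour_per_gpu <= 0:
        raise ValueError("data_tb must be >= 0, gpus >= 1, rate > 0")
    return data_tb / (gpus * tb_per_hour_per_gpu)


if __name__ == "__main__":
    # Example volumes spanning the 10 TB to 1 PB range quoted on the slide.
    for volume_tb in (10, 40, 1000):
        hours = ingest_hours(volume_tb)
        print(f"{volume_tb} TB on one GPU: about {hours:.0f} hours")
```

Under that assumption, an initial load of 40 TB on a single GPU would take on the order of 20 hours; real loads depend on storage, file formats, and compression.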

This story starts at MWC last year
(That's my ear!)

SQream knows telecoms
We've helped operators with:
- Better analysis of network events
- Speeding up CDR preparations
- More history with security management (SIEM)
- And now, customer behaviour

There is a lot of data about customers in telecoms
- Where and when they wake up, and where they spend their days (daily grinders)
- When and where they were Instagramming (when and where data was used)
- How frustrated they got (what the network experience was in each location)
- What modes of transport they use
- How close they are to competitor locations
But are they actually using this data?
- Are they getting anything actionable?
- Are they looking at the entire customer base, and not just a single customer?

"You know, Telefonica has this multi-million dollar product based on Hadoop for selling this customer behaviour data to third-party companies. Have you thought about maybe getting the same solution for your company, but much simpler?"

"Oh, and we'll do it for you with a single machine."

Why their current setup wasn't good enough for this
- Data scientists and BI professionals have only short windows of time to run queries, because of overloaded systems
- Windows are cut even shorter by long overnight loading
- Queries take hours, and iterations become painful
Long queries lead to:
- Coffee breaks
- Bathroom breaks
- Unhappy managers
- Unhappy everyone

Databases that displease data scientists
When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to break and not deliver what's expected. They're just not designed for ad-hoc querying.
- Legacy databases require indexing and a lot of manual tuning
- Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
- Distributed databases don't perform well when JOIN operations are necessary
- In-memory databases are very painful on the wallet if you need more than a couple of terabytes

Picking the wrong databases will cause pain!
Just some of what we saw:
- Cloudera for the BI team
- Teradata for the marketing team
- Oracle Exadata (transactional) for CDR collection and customer records
- Vertica and Netezza for financial
- Lots of Greenplum to collect from many sources, for marketing and BI

Chanel says racks are fashionable. Our customers think otherwise.

SQream DB software in a standard 2U server
- Configured with 96 GB RAM and a single Tesla K80, for a $4,000 total investment
- Designed to handle ~40 TB of telecom data

Sample dashboards generated
- Dashboard showing 3G/4G data throughput throughout the day (Morning, Lunch, Evening, Night)
- Larger circles represent more data throughput
- Colour becomes darker as the day progresses
- Dark-outline circles mean more night-time traffic
- Dashboard aggregates directly off SQream DB, with no intermediate steps
- Represents a 3-table join (3.3B rows, 40M rows, 300K rows)
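The kind of query behind such a dashboard can be sketched as a three-table join with a time-of-day bucket. This is only an illustration: the slides do not show the schema or the SQL, so the table names, column names, and hour boundaries below are all hypothetical, and SQLite stands in for SQream DB so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: the slides do not name the tables or columns.
SCHEMA = """
CREATE TABLE usage_events (subscriber_id INTEGER, cell_id INTEGER,
                           hour INTEGER, bytes INTEGER);
CREATE TABLE subscribers  (subscriber_id INTEGER PRIMARY KEY, network TEXT);
CREATE TABLE cells        (cell_id INTEGER PRIMARY KEY, location TEXT);
"""

# Three-table join; the hour ranges for each period are assumptions.
QUERY = """
SELECT c.location,
       s.network,
       CASE
           WHEN e.hour BETWEEN 6 AND 10  THEN 'Morning'
           WHEN e.hour BETWEEN 11 AND 14 THEN 'Lunch'
           WHEN e.hour BETWEEN 15 AND 21 THEN 'Evening'
           ELSE 'Night'
       END AS period,
       SUM(e.bytes) AS total_bytes
FROM usage_events AS e
JOIN subscribers  AS s ON s.subscriber_id = e.subscriber_id
JOIN cells        AS c ON c.cell_id = e.cell_id
GROUP BY c.location, s.network, period
ORDER BY c.location, s.network, period
"""


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory data set with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subscribers VALUES (?, ?)",
                     [(1, "4G"), (2, "3G")])
    conn.executemany("INSERT INTO cells VALUES (?, ?)",
                     [(10, "Downtown"), (11, "Suburb")])
    conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?, ?)",
                     [(1, 10, 8, 500), (1, 10, 9, 300), (1, 10, 23, 200),
                      (2, 11, 12, 100), (2, 11, 13, 50)])
    return conn


def throughput_by_period(conn: sqlite3.Connection) -> list:
    """Return (location, network, period, total_bytes) rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in throughput_by_period(demo_connection()):
        print(row)
```

A BI tool would map `total_bytes` to circle size and `period` to colour; the point of the slide is that the aggregation runs directly against the database with no pre-aggregated intermediate tables.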


Saving hours on reporting with SQream DB
Augmenting legacy MPP with a faster, easier-to-use GPU-powered analytics database
- Data sources: CDR 4G, CDR 3G, non-CDR
- Before: 80-node ETL process and aggregations, 5 hours
- With SQream DB: direct loading at a 2 TB/h ingest rate, 20 minutes (15x faster)
- Output: dozens of reports

The cost of performance
Existing setup: 80 nodes, 5 full racks, 960 CPU cores, 5.12 TB RAM
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage
- ETL time: 300 min vs 20 min (15x faster)
- Reporting time: 120 min vs 10 min (12x faster)
- TCO with license: $10,000,000 vs $200,000 (50x more cost effective)
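The ratios on this slide follow directly from the quoted figures. A minimal sketch of the arithmetic, using only the numbers above (the class and function names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Figures quoted on the slide for one system."""
    etl_minutes: float
    reporting_minutes: float
    tco_usd: float


def improvement(before: float, after: float) -> float:
    """How many times smaller 'after' is than 'before'."""
    return before / after


existing = Deployment(etl_minutes=300, reporting_minutes=120, tco_usd=10_000_000)
sqream = Deployment(etl_minutes=20, reporting_minutes=10, tco_usd=200_000)

if __name__ == "__main__":
    print("ETL speedup:", improvement(existing.etl_minutes, sqream.etl_minutes))
    print("Reporting speedup:",
          improvement(existing.reporting_minutes, sqream.reporting_minutes))
    print("TCO ratio:", improvement(existing.tco_usd, sqream.tco_usd))
```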

That wasn't an anomaly
We've done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)
- Average query time (seconds): 33.70 vs 31.70
- Processing units (S-Blades / GPUs): 56 vs 4
- Compression ratio: 4.0 vs 4.7
- Cost of ownership: $12,000,000 vs $500,000
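Read side by side, these figures say that query time is roughly on par while hardware and cost differ by an order of magnitude. A short sketch of that comparison, using only the numbers quoted above (the dictionary keys and function name are illustrative):

```python
# Figures as quoted on the slide: (Netezza, SQream DB v1.9.7).
FIGURES = {
    "avg_query_seconds": (33.70, 31.70),
    "processing_units": (56, 4),
    "cost_of_ownership_usd": (12_000_000, 500_000),
}


def ratios(figures: dict) -> dict:
    """Netezza value divided by SQream DB value, per metric."""
    return {name: netezza / sqream for name, (netezza, sqream) in figures.items()}


if __name__ == "__main__":
    for name, ratio in ratios(FIGURES).items():
        print(f"{name}: {ratio:.2f}x")
```

The query-time ratio comes out at about 1.06, so the headline difference is in the processing-unit count (14x) and the cost of ownership (24x), not in raw query latency.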

Find out more about SQream's high-performance GPU-driven database software at www.sqream.com or arnon@sqream.com