Leveraging Customer Behavioral Data to Drive Revenue the GPU Way
Hi! I'm Arnon Shimoni, Senior Solutions Architect
- I like hardware and parallel/concurrent stuff
- In my 4th year at SQream Technologies
- Send gifs to @arnon86 or arnon@sqream.com
tl;dr
- GPUs are good number crunchers, which makes them good for data processing
- SQream DB with GPUs is fast
- Rethink current solutions; the GPU can help
- Simple hardware is good enough, so let's avoid throwing lots of hardware at issues
- You don't need to shovel money at the problem!
SQream DB: an SQL database powered by GPUs
- Powered by GPUs: massively parallel engine that relies on GPUs for power, not RAM
- Fast: columnar storage, always-on compression, 2 TB / hour / GPU ingest speed
- Scalable: 10 TB to 1 PB with ease
- SQL database: familiar ANSI SQL, standard connectors (ODBC, JDBC)
- Extensible for AI: Python, Jupyter, etc. for data science
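As a back-of-envelope check on the 2 TB / hour / GPU ingest figure, this sketch estimates how long a given data volume takes to load. It assumes ingest scales linearly with the number of GPUs, which the slide does not state; the function name and parameters are hypothetical.

```python
def ingest_hours(data_tb: float, gpus: int, tb_per_hour_per_gpu: float = 2.0) -> float:
    """Estimated load time in hours.

    Assumption (not a SQream specification): throughput scales linearly
    with GPU count at the quoted per-GPU rate.
    """
    if data_tb < 0 or gpus <= 0 or tb_per_hour_per_gpu <= 0:
        raise ValueError("data_tb must be >= 0; gpus and rate must be positive")
    return data_tb / (gpus * tb_per_hour_per_gpu)


# Example: a ~40 TB telecom dataset.
print(ingest_hours(40, 1))  # 20.0
print(ingest_hours(40, 4))  # 5.0
```

Real load times also depend on storage bandwidth and source-file formats, so treat this as an upper-bound-style estimate, not a benchmark.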
This story starts at MWC last year. (That's my ear!)
SQream knows telecoms. We've helped operators with:
- Better analysis of network events
- Speeding up CDR preparation
- More history for security management (SIEM)
- And now, customer behaviour
There is a lot of data about customers in telecoms:
- Where and when they wake up, and where they spend their days (daily grinders)
- When and where they were Instagramming (when and where data was used)
- How frustrated they got (what the network experience was in each location)
- What modes of transport they use
- How close they are to competitor locations
But are operators actually using this data? Are they getting anything actionable? Are they looking at the entire customer base, and not just a single customer?
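The "daily grinders" item can be derived from raw network events. A minimal sketch using a common heuristic that is an assumption here, not the method described in this deck: the cell a subscriber is seen on most at night is treated as home, and the cell seen most during weekday office hours as work. The `CellEvent` record, the hour windows, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CellEvent:
    """Hypothetical record: one network event observed on one cell."""
    subscriber: str
    cell_id: str
    ts: datetime


def is_night(hour: int) -> bool:
    # Assumed night window: 22:00-05:59.
    return hour >= 22 or hour < 6


def is_office_hours(ts: datetime) -> bool:
    # Assumed office window: Monday-Friday, 09:00-16:59.
    return ts.weekday() < 5 and 9 <= ts.hour < 17


def home_and_work(events):
    """Map subscriber -> (home_cell, work_cell), using None when there is no evidence.

    Assumed heuristic: most frequent night cell = home,
    most frequent weekday office-hours cell = work.
    """
    night = defaultdict(Counter)
    office = defaultdict(Counter)
    for e in events:
        if is_night(e.ts.hour):
            night[e.subscriber][e.cell_id] += 1
        elif is_office_hours(e.ts):
            office[e.subscriber][e.cell_id] += 1
    result = {}
    for sub in night.keys() | office.keys():
        home = night[sub].most_common(1)[0][0] if sub in night else None
        work = office[sub].most_common(1)[0][0] if sub in office else None
        result[sub] = (home, work)
    return result


events = [
    CellEvent("alice", "C1", datetime(2024, 1, 15, 23, 0)),   # Monday night
    CellEvent("alice", "C1", datetime(2024, 1, 16, 2, 0)),    # Tuesday night
    CellEvent("alice", "C7", datetime(2024, 1, 16, 10, 0)),   # Tuesday office hours
    CellEvent("alice", "C7", datetime(2024, 1, 16, 14, 30)),  # Tuesday office hours
    CellEvent("alice", "C3", datetime(2024, 1, 16, 18, 0)),   # evening: ignored
    CellEvent("bob", "C2", datetime(2024, 1, 20, 11, 0)),     # Saturday: ignored
]
print(home_and_work(events))  # {'alice': ('C1', 'C7')}
```

Across a whole customer base this would typically run as a grouped aggregation inside the database rather than in application code; the Python only shows the logic.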
You know, Telefonica has this multi-million-dollar, Hadoop-based product for selling this customer behaviour data to third-party companies. Have you thought about getting the same solution for your company, but much simpler?
Oh, and we'll do it for you with a single machine.
Why their current setup wasn't good enough for this
- Data scientists and BI professionals have only short windows of time to run queries, because the systems are overloaded
- Those windows are cut even shorter by long overnight loading
- Queries take hours, and iterations become painful
Long queries mean coffee breaks, bathroom breaks, unhappy managers, unhappy everyone.
Databases that displease data scientists
When data scientists or BI professionals want to ask questions that no one has asked before, these systems tend to break and not deliver what's expected. They're just not designed for ad-hoc querying.
- Legacy databases require indexing and a lot of manual tuning
- Newer databases like Vertica also require creating projections, which is time-consuming and inflexible
- Distributed databases don't perform well when JOIN operations are necessary
- In-memory databases are very painful on the wallet if you need more than a couple of terabytes
Picking the wrong databases will cause pain! Just some of what we saw:
- Cloudera for the BI team
- Teradata for the marketing team
- Oracle Exadata (transactional) for CDR collection and customer records
- Vertica and Netezza for finance
- Lots of Greenplum to collect from many sources, for marketing and BI
Chanel says racks are fashionable. Our customers think otherwise.
SQream DB software in a standard 2U server, configured with 96 GB RAM and a single Tesla K80, for a $4,000 total investment. Designed to handle ~40 TB of telecom data.
Sample dashboards generated
Dashboard showing 3G/4G data throughput throughout the day (morning, lunch, evening, night):
- Larger circles represent more data throughput
- Colour becomes darker as the day progresses
- Dark-outlined circles mean more night-time traffic
The dashboard aggregates directly off SQream DB, with no intermediate steps. It represents a three-table join (3.3B rows, 40M rows and 300K rows).
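To make "aggregates directly off the database" concrete, here is a sketch of the kind of query such a dashboard could issue: one large event table joined to two smaller dimension tables, bucketed by time of day. The table names, columns, segment filter, and hour boundaries are all hypothetical, and this is generic SQL run on Python's built-in `sqlite3` for illustration, not SQream DB's dialect or the actual dashboard query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a large fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE cells (cell_id TEXT PRIMARY KEY, area TEXT, technology TEXT);
CREATE TABLE subscribers (subscriber_id TEXT PRIMARY KEY, segment TEXT);
CREATE TABLE usage_events (subscriber_id TEXT, cell_id TEXT,
                           event_hour INTEGER, bytes INTEGER);
""")
conn.executemany("INSERT INTO cells VALUES (?, ?, ?)", [
    ("C1", "Downtown", "4G"), ("C2", "Suburb", "3G"), ("C3", "Airport", "2G"),
])
conn.executemany("INSERT INTO subscribers VALUES (?, ?)", [
    ("S1", "consumer"), ("S2", "consumer"), ("S3", "enterprise"),
])
conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?, ?)", [
    ("S1", "C1", 8, 500), ("S2", "C1", 9, 300), ("S1", "C1", 12, 200),
    ("S2", "C2", 23, 1000), ("S3", "C1", 8, 9999), ("S1", "C3", 8, 777),
])

# Three-table join, then total bytes per area and (assumed) time-of-day bucket.
QUERY = """
SELECT c.area,
       CASE
         WHEN u.event_hour BETWEEN 6 AND 11 THEN 'Morning'
         WHEN u.event_hour BETWEEN 12 AND 13 THEN 'Lunch'
         WHEN u.event_hour BETWEEN 18 AND 21 THEN 'Evening'
         WHEN u.event_hour >= 22 OR u.event_hour < 6 THEN 'Night'
         ELSE 'Other'
       END AS daypart,
       SUM(u.bytes) AS total_bytes
FROM usage_events AS u
JOIN subscribers AS s ON s.subscriber_id = u.subscriber_id
JOIN cells AS c ON c.cell_id = u.cell_id
WHERE c.technology IN ('3G', '4G') AND s.segment = 'consumer'
GROUP BY c.area, daypart
ORDER BY c.area, daypart
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('Downtown', 'Lunch', 200)
# ('Downtown', 'Morning', 800)
# ('Suburb', 'Night', 1000)
```

The 2G cell and the enterprise subscriber drop out in the join filters; each output row would drive one circle on the dashboard.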
Saving hours on reporting with SQream DB
Augmenting legacy MPP with a faster, easier-to-use GPU-powered analytics database
- Data sources: CDR 4G, CDR 3G, non-CDR
- Before: an 80-node ETL process (aggregations) took 5 hours to produce dozens of reports
- With SQream DB: direct loading at a 2 TB/h ingest rate, 20 minutes (15x faster)
The cost of performance
Legacy MPP: 80 nodes in 5 full racks (960 CPU cores, 5.12 TB RAM)
SQream DB v1.9.6: HP DL380g9 with NVIDIA Tesla K80, 96 GB RAM + 6 TB storage
- ETL time: 300 min vs. 20 min (15x faster)
- Reporting time: 120 min vs. 10 min (12x faster)
- TCO with license: $10,000,000 vs. $200,000 (50x more cost-effective)
That wasn't an anomaly
We've done it against Netezza, Teradata, Oracle, Vertica, and even Hadoop-based systems.
Netezza: 8 full 42U racks, 56 S-Blades, 7 TB RAM
SQream DB v1.9.7: Dell C4130 with 4x NVIDIA Tesla K80, 512 GB RAM + iSCSI JBOD (20 TB)
- Average query time: 33.70 s vs. 31.70 s
- Processing units (S-Blades / GPUs): 56 vs. 4
- Compression ratio: 4.0 vs. 4.7
- Cost of ownership: $12,000,000 vs. $500,000
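The multipliers on the two comparison slides are plain ratios of the printed figures. This sketch recomputes them, including the Netezza comparison, which the slide does not reduce to ratios. The dictionary layout and the `times_better` name are hypothetical; compression ratio is left out because higher is better there.

```python
# Figures as printed on the two comparison slides: (legacy value, SQream DB value).
COMPARISONS = {
    "80-node MPP": {
        "ETL time (min)": (300, 20),
        "Reporting time (min)": (120, 10),
        "TCO with license (USD)": (10_000_000, 200_000),
    },
    "Netezza": {
        "Average query time (s)": (33.70, 31.70),
        "Processing units": (56, 4),
        "Cost of ownership (USD)": (12_000_000, 500_000),
    },
}


def times_better(legacy: float, sqream: float) -> float:
    """Ratio for lower-is-better metrics (time, hardware units, cost)."""
    if sqream <= 0:
        raise ValueError("SQream DB figure must be positive")
    return legacy / sqream


results = {
    (system, metric): times_better(legacy, sqream)
    for system, metrics in COMPARISONS.items()
    for metric, (legacy, sqream) in metrics.items()
}

for (system, metric), ratio in results.items():
    print(f"{system} | {metric}: {ratio:.2f}x")
```

On these figures the Netezza comparison works out to an average query about 6% faster, on 14x fewer processing units, at 24x lower cost of ownership.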
Find out more about SQream's high-performance, GPU-driven database software at www.sqream.com or arnon@sqream.com