IBM Db2 Event Store: Simplifying and Accelerating Storage and Analysis of Fast Data

Clinton Todd
5 years ago

2 Disclaimer
The information contained in this presentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided as is, without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other documentation. Nothing contained in this presentation is intended to, or shall have the effect of: creating any warranty or representation from IBM (or its affiliates or its or their suppliers and/or licensors); or altering the terms and conditions of the applicable license agreement governing the use of IBM software. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

3 The Challenges of Fast Data
Data is arriving faster than ever before
- Billions of events processed every day
- Evident across industries and driven by IoT
- Must land data quickly, or throw it away
Total data is large, and growing rapidly
- Storing all events implies large data sets
- Storage costs are significant, and must be managed
Data is useless without fast insights
- Data value decays rapidly over time
- Insights must be derived quickly, and use advanced analytics (ML)
Data availability without duplication
- Data must be available to the entire organization without requiring replication or duplication
- Maintain data in an open format for future-proofing

7 Existing Solution for Fast Data: Lambda Architecture
A data stream feeds both a Speed Layer (store data quickly) and a Batch Layer (query-friendly storage); queries are served from both.
- Complex: two separate architectures to deploy/manage
- Costly: two copies of the data
- Complex: queries need to consult two data stores
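To make the "queries need to consult two data stores" point concrete, here is a toy Python sketch of a Lambda-style read path. The store names and counts are made up for illustration; a real deployment would use two separate systems rather than two in-memory counters.

```python
from collections import Counter

# Hypothetical data for illustration only (no product API involved):
# the batch layer holds per-device event counts computed by the last batch
# run; the speed layer holds counts for events that arrived since then.
batch_view = Counter({"sensor-1": 120, "sensor-2": 75})
speed_view = Counter({"sensor-1": 3, "sensor-3": 1})


def event_count(device_id: str) -> int:
    """Answer one query by consulting BOTH stores and merging the results.

    A Counter returns 0 for missing keys, so a device seen by only one
    layer is still counted correctly.
    """
    return batch_view[device_id] + speed_view[device_id]


print(event_count("sensor-1"))  # 123
print(event_count("sensor-3"))  # 1
```

Every query path has to know about both stores and how to merge them, which is where the extra complexity comes from.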

11 Alternative Approach: Kappa Architecture
Data Stream -> Speed Layer (store data quickly) -> Queries
- Better: only writing data once
- Better: less infrastructure to manage
- Worse: queries are much less efficient

13 Alternative Approach: Modified Lambda
Data Stream -> Ingest Layer (store data quickly) -> Extract, Transform, Load (ETL) -> Query Layer (query-friendly storage) -> Queries
"There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations." (Mark Twain)

16 Alternative Approach: Modified Lambda
Data Stream -> Ingest Layer (store data quickly) -> Query Layer (query-friendly storage) -> Queries
- Complex: still maintaining two separate stores
- Costly: still two copies of the data
- Incomplete: queries do not consider all data
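The "incomplete" drawback follows from the ETL step sitting between the two stores. The toy sketch below uses hypothetical names, with in-memory lists standing in for both layers. It shows queries against the query layer missing events that were ingested but not yet copied by the next ETL run, while every copied event now exists twice.

```python
# Hypothetical two-store pipeline: events land in the ingest layer and are
# copied to the query layer only when a periodic ETL job runs.
ingest_layer: list[dict] = []
query_layer: list[dict] = []


def ingest(event: dict) -> None:
    ingest_layer.append(event)


def run_etl() -> None:
    """Copy (not move) events ingested since the last run, so both layers
    end up holding the same data (assumes only ETL appends to query_layer)."""
    query_layer.extend(ingest_layer[len(query_layer):])


def query_count() -> int:
    # Queries only read the query-friendly store.
    return len(query_layer)


for i in range(5):
    ingest({"id": i})
run_etl()
for i in range(5, 8):
    ingest({"id": i})

print(len(ingest_layer), query_count())  # 8 5
```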

17 Surely there must be a better way

18 What is IBM Db2 Event Store?
A unified offering for Fast Data, which delivers:
1. Lightning Fast Ingest
   - 1 million inserts per second per node
   - Ingest scales linearly with added nodes
   - Data ingested quickly, then refined and enriched
2. Real-time Analytics
   - Real-time analytics over ALL ingested data
   - Super-fast lookups and intelligent scans
   - Integrated machine learning capabilities
3. Integrated and Highly Available
   - Packaged and integrated with IBM Data Science Experience; available as an IBM Streams sink
   - Remains available on node failure
   - Architected to scale to very large clusters
4. Built for Data Sharing and Efficiency
   - Writes to shared storage in Parquet format
   - Able to leverage low-cost object storage
   - Single copy of the data
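The per-node figure combines with the linear-scaling claim through simple multiplication. The sketch below computes that idealized upper bound. The 1,000,000 inserts per second per node comes from the slide; perfectly linear scaling is an assumption that, as the disclaimer notes, real workloads will not necessarily achieve.

```python
PER_NODE_INSERTS_PER_SEC = 1_000_000  # figure quoted on the slide


def projected_ingest_rate(nodes: int, per_node: int = PER_NODE_INSERTS_PER_SEC) -> int:
    """Cluster ingest rate under the idealized assumption of perfectly
    linear scaling; actual throughput depends on workload and hardware."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes * per_node


def events_per_day(rate_per_sec: int) -> int:
    return rate_per_sec * 24 * 60 * 60


rate = projected_ingest_rate(3)
print(rate)                  # 3000000
print(events_per_day(rate))  # 259200000000
```

At that idealized rate, a 3-node cluster would land on the order of hundreds of billions of events per day, which is the scale the "billions of events processed every day" challenge refers to.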

19 Understanding the Engine and Components
IBM Db2 Event Store
- Open access: IBM Streams, high-speed ingest, real-time insights, machine learning, IBM Big SQL, Parquet-compatible tools
- IBM Event Store cluster: highly available distributed storage, open data format, in-memory data grid

20 IBM Db2 Event Store: A Combination of IBM Assets + Open Source

21 How Event Store Manages Data
IBM Db2 Event Store maintains two data tiers:
- The most recent data is stored locally and replicated for HA
- Data and indexes are cached locally for fast access
- The rest of the data is stored in a shared storage layer
Data is sharded (by hash), and each shard is logically owned ("mastered") by a given node.
The shared storage layer:
- Allows for fast recovery from failures through logical remastering
- Reduces storage costs when using object storage
- Provides high availability
- Separates compute and storage
Diagram: nodes A, B, and C each run an IBM Event Store Engine with a local log (on SSD) and cache; all nodes share compressed Parquet data in shared storage.
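The sharding and remastering idea can be illustrated with a small Python sketch. It assumes a fixed number of hash shards (12, chosen arbitrarily), a round-robin initial assignment, and a SHA-256-based hash. None of these are documented details of Event Store's internals. The point it shows is that recovering from a node failure only changes which node owns a shard, because the groomed data already sits in shared storage.

```python
import hashlib

NUM_SHARDS = 12  # hypothetical; fixed independently of the node count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard with a stable hash (SHA-256 here)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_shards(nodes: list[str], num_shards: int = NUM_SHARDS) -> dict[int, str]:
    """Initial mastership: every shard is logically owned by exactly one node."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def remaster(owners: dict[int, str], failed: str) -> dict[int, str]:
    """Hand the failed node's shards to the survivors, round-robin.

    Only ownership metadata changes; the groomed data stays where it is in
    shared storage, which is why recovery can be fast.
    """
    survivors = sorted(set(owners.values()) - {failed})
    if not survivors:
        raise RuntimeError("no surviving nodes")
    new_owners = dict(owners)
    orphaned = [shard for shard, node in sorted(owners.items()) if node == failed]
    for i, shard in enumerate(orphaned):
        new_owners[shard] = survivors[i % len(survivors)]
    return new_owners


owners = assign_shards(["A", "B", "C"])
after = remaster(owners, "C")
print(sorted(set(after.values())))  # ['A', 'B']
```

Python's built-in `hash()` is salted per process for strings, so a cryptographic digest is used here to keep shard placement stable across runs.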

22 IBM Event Store Ingest
1. Ingest occurs using the Python, Scala, or Java API, or the Streams sink.
2. Batches of rows are formed and sent asynchronously from the client to the appropriate Event Store nodes.
3. Rows are placed in the queryable log, replicated to replica nodes, and a reply is sent to the client.
4. After some time, data in the logs is formed into Parquet blocks and written to the storage layer.
Diagram: a streaming application uses the Event Store client (1) to send batches (2) to nodes A, B, and C, each running an IBM Event Store Engine with a log (on SSD); log data is copied with configurable replication (3) and later shared as compressed Parquet data in shared storage (4).
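The four ingest steps can be simulated in a few lines of Python. This is a hypothetical, single-process model, not the Event Store client API. The `Node` class, `insert_rows`, and `groom` are illustrative names. Rows are routed by a CRC32 hash of their first field, replicas are placed on the next nodes in a ring, and the asynchronous network transport of step 2 is omitted.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical stand-in for one Event Store node and its SSD log."""

    def __init__(self, name: str):
        self.name = name
        # Each entry is (owner index, row) so grooming can tell owned rows
        # apart from replica copies.
        self.log: list[tuple[int, tuple]] = []


def owner_of(key: str, num_nodes: int) -> int:
    # crc32 is stable across processes, unlike Python's salted str hash().
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def insert_rows(nodes: list[Node], rows: list[tuple], replication: int = 2) -> int:
    """Steps 2-3: batch rows by owning node, append each batch to the owner's
    log and to the next (replication - 1) nodes, then acknowledge."""
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication factor not achievable")
    batches: dict[int, list[tuple]] = defaultdict(list)
    for row in rows:
        batches[owner_of(row[0], len(nodes))].append(row)
    for owner, batch in batches.items():
        for r in range(replication):
            target = nodes[(owner + r) % len(nodes)]
            target.log.extend((owner, row) for row in batch)
    return len(rows)  # stands in for the reply sent to the client


def groom(nodes: list[Node], shared_storage: list[list[tuple]]) -> None:
    """Step 4: each node writes the rows it owns as one block to shared
    storage (a Parquet file in the real system) and truncates its log.
    Replica copies are dropped because the owner's copy is now persisted."""
    for idx, node in enumerate(nodes):
        block = [row for owner, row in node.log if owner == idx]
        if block:
            shared_storage.append(block)
        node.log.clear()


nodes = [Node(name) for name in "ABC"]
storage: list[list[tuple]] = []
acked = insert_rows(nodes, [(f"dev-{i}", i * 1.5) for i in range(10)])
print(acked, sum(len(n.log) for n in nodes))  # 10 20
groom(nodes, storage)
print(sum(len(block) for block in storage))   # 10
```

With a replication factor of 2, each row briefly exists in two logs. After grooming it exists once, in shared storage, which is the "single copy of the data" property.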

23 IBM Event Store Analytics
1. Analytics queries are issued through a Spark-like Scala API, or through Python/REST.
2. Queries are sent either to Event Store nodes or to vanilla Spark nodes, depending on performance requirements and on whether the most recent (ungroomed) data is required:
   a) The query is sent to Event Store nodes, which retrieve the most recent data and combine it with groomed data in the cache or in the storage layer.
   b) The query is sent to Spark node(s), which read all but the most recent data from storage.
Diagram: an analytical application uses the Event Store client; on nodes A, B, and C, Spark executors run alongside IBM BLUSpark and Event Store engines with a cache and a log (on SSD) (2a), and Spark executors can also read compressed Parquet data directly from storage (2b).
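The routing decision in step 2 comes down to whether ungroomed log data must be included. The sketch below models that choice with plain lists. `run_query`, the `(timestamp, reading)` row layout, and the threshold predicate are hypothetical and only stand in for the Spark-based query paths.

```python
from typing import Callable

Row = tuple[int, float]  # hypothetical schema: (timestamp, reading)


def run_query(groomed: list[Row], ungroomed: list[Row],
              predicate: Callable[[Row], bool], need_latest: bool) -> list[Row]:
    """Route a query along one of the two paths on the slide.

    need_latest=True  -> path 2a: Event Store nodes merge the most recent
                         (log) rows with groomed data.
    need_latest=False -> path 2b: Spark reads groomed data from storage only,
                         so rows still in the logs are not returned.
    """
    result = [r for r in groomed if predicate(r)]
    if need_latest:
        result += [r for r in ungroomed if predicate(r)]
    return result


def is_hot(row: Row) -> bool:
    return row[1] > 40.0


groomed = [(t, t * 0.5) for t in range(0, 100, 10)]      # already in storage
ungroomed = [(t, t * 0.5) for t in range(100, 130, 10)]  # still in the logs

print(len(run_query(groomed, ungroomed, is_hot, need_latest=True)))   # 4
print(len(run_query(groomed, ungroomed, is_hot, need_latest=False)))  # 1
```

Path 2b trades freshness for offloading work from the Event Store nodes. Whether that trade is acceptable depends on how recent the answer needs to be.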

24 IBM Event Store Availability: Ingest
Provides HA on all data:
- Shared data leverages replicated storage
- Log data is replicated by IBM Event Store
On insert: in the presence of node failures, inserts continue to be processed, so long as the configured replication factor is achievable.
Diagram: a streaming application uses the Event Store client to write to nodes A, B, and C, each running an IBM Event Store Engine with a log (on SSD); logs use configurable replication, and compressed Parquet data is shared in storage.
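The rule that inserts continue "so long as the configured replication factor is achievable" can be expressed as a placement function. This sketch assumes a ring-style policy that starts at the shard's owning node and skips failed nodes. The actual placement strategy used by Event Store is not described on the slide, and `replica_targets` is a hypothetical name.

```python
def replica_targets(owner: int, nodes: list[str], alive: set[str],
                    replication: int) -> list[str]:
    """Pick `replication` live nodes for a shard, starting at its owner and
    walking the ring past failed nodes (a hypothetical placement policy)."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    targets: list[str] = []
    for step in range(len(nodes)):
        candidate = nodes[(owner + step) % len(nodes)]
        if candidate in alive:
            targets.append(candidate)
            if len(targets) == replication:
                return targets
    # Fewer live nodes than the replication factor: reject the insert.
    raise RuntimeError(
        f"replication factor {replication} not achievable: "
        f"only {len(targets)} live node(s)"
    )


nodes = ["A", "B", "C"]
print(replica_targets(2, nodes, {"A", "B", "C"}, 2))  # ['C', 'A']
print(replica_targets(2, nodes, {"A", "B"}, 2))       # ['A', 'B']
```

With one of three nodes down, a replication factor of 2 is still achievable and inserts proceed. With two nodes down it is not, and the sketch raises instead of silently under-replicating.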

25 IBM Event Store Availability: Query
Provides HA on all data:
- Shared data leverages replicated storage
- Log data is replicated by IBM Event Store
On query: in the presence of node failures, queries continue to be processed, so long as the configured number of query replicas is reachable. Data in the storage layer is directly queryable regardless of the state of the IBM Event Store nodes.
Diagram: a transactional application uses the Event Store client; nodes A, B, and C run Spark executors alongside BLUSpark and IBM Event Store engines, each with a log (on SSD), over compressed Parquet data in storage.
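Similarly, the query-side rule can be sketched as a planning step that picks one reachable replica per shard for the log data. The shard-to-replica map, `plan_query`, and the `min_replicas` parameter are hypothetical. Groomed data is deliberately left out of the check because, per the slide, it remains directly queryable from the storage layer.

```python
def plan_query(shard_replicas: dict[int, list[str]], reachable: set[str],
               min_replicas: int = 1) -> dict[int, str]:
    """Choose one reachable node per shard to serve its log (ungroomed) data.

    The query fails only if some shard has fewer than `min_replicas`
    reachable copies. Groomed data is not checked here: it lives in shared
    storage and can be read even when Event Store nodes are down.
    """
    plan: dict[int, str] = {}
    for shard, replicas in shard_replicas.items():
        live = [node for node in replicas if node in reachable]
        if len(live) < min_replicas or not live:
            raise RuntimeError(f"shard {shard}: only {len(live)} reachable replica(s)")
        plan[shard] = live[0]
    return plan


shard_replicas = {0: ["A", "B"], 1: ["B", "C"], 2: ["C", "A"]}
print(plan_query(shard_replicas, {"A", "C"}))  # {0: 'A', 1: 'C', 2: 'C'}
```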

26 Demo

29 Offerings

30 IBM Event Store Offerings
IBM Db2 Event Store Developer Edition
- Free "download and go" edition
- Great for getting started and writing your first application
- Packaged with the desktop version of DSX
- Runs on macOS, Linux, and Windows
- Download from
Enterprise Edition
- Production-level offering
- Includes high availability, monitoring UI, and REST API
- Packaged with DSX Local
- Comes in 1-node and 3-node installers

31 Recent Enhancements

32 IBM Db2 Event Store Enterprise Version
Time To Live (TTL)
- What: continuous data ingest without ever-increasing storage requirements
- Why: data is plentiful, but storage is costly and data value degrades with time
- How: retention time is specified in hours at table creation time; data is automatically deleted when the retention time is exceeded
JDBC Connectivity
- What: query a database using the standard JDBC interface
- Why: leverage existing development skills and established tooling
- How: included with Event Store Enterprise v1.1.2 (
As well as performance and stability improvements
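The TTL behavior (retention specified in hours, data deleted once that time is exceeded) comes down to a timestamp comparison. The sketch below assumes a hypothetical row layout of `(ingest_time, payload)`. It treats a row as expired only once its age strictly exceeds the retention period. How and when Event Store actually performs the deletion is not covered here.

```python
from datetime import datetime, timedelta, timezone


def expired(ingest_time: datetime, retention_hours: int, now: datetime) -> bool:
    """True once a row's age strictly exceeds the table's retention period."""
    return now - ingest_time > timedelta(hours=retention_hours)


def purge(rows: list[tuple[datetime, str]], retention_hours: int,
          now: datetime) -> list[tuple[datetime, str]]:
    """Keep only rows that are still within the retention window."""
    return [row for row in rows if not expired(row[0], retention_hours, now)]


now = datetime(2018, 6, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (now - timedelta(hours=1), "recent"),
    (now - timedelta(hours=30), "older"),
    (now - timedelta(hours=49), "past retention"),
]
kept = purge(rows, retention_hours=48, now=now)
print([payload for _, payload in kept])  # ['recent', 'older']
```

With a fixed retention window and a steady ingest rate, the amount of retained data levels off instead of growing without bound, which is the "without ever-increasing storage requirements" goal.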

33 Thank you
For more information, visit: ibm.biz/eventstore
Contact: eventstore@ca.ibm.com
IBM Hybrid Cloud / 2018 / © 2018 IBM Corporation

34 Db2 Community: Share. Solve. Do More.
IBM Db2 Event Store Community
JOIN the Db2 Community and keep an eye out for upcoming webinars, forum posts, blogs, and more.
Have questions about this content? Have suggestions for upcoming topics? Post them to the Forum or send an email to a Community Manager:
