Achieving Horizontal Scalability. Alain Houf, Sales Engineer

Scale Matters
InterSystems IRIS Data Platform lets you:
Scale up and scale out.
Scale users and scale data.
Mix and match a variety of approaches to scalability to suit your application and business needs.
InterSystems Corporation. All rights reserved.

Scaling Up: Vertical Scalability

Vertical Scalability
Expand the capacity of an individual server by adding CPU, memory, I/O and networking components to address workload requirements.
Advantages: architectural simplicity; fine-grained balancing possible.
Challenges: software complexity; hardware limitations; non-linear price/performance; requires careful upfront sizing.

InterSystems SQL Parallel Query Execution
Leverages multiple CPU cores to serve SQL query results.
Spawns one process per core, which is a form of vertical scalability.
Most beneficial for aggregation queries on large datasets.
Currently considered by the optimizer based on the %PARALLEL hint; fully transparent automation is under development.
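The idea behind parallel query execution can be sketched in a few lines of Python. This is a generic illustration of splitting an aggregation across workers, not InterSystems code: the function names are hypothetical, and threads stand in for the per-core processes so that the sketch stays portable.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def partial_sum(chunk):
    """Aggregate one slice of the data: the work a single worker would do."""
    return sum(chunk)


def parallel_sum(values, workers=None):
    """Split values into one chunk per worker, sum the chunks concurrently, then combine."""
    workers = workers or os.cpu_count() or 1
    size = max(1, -(-len(values) // workers))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


# Made-up data: the integers 1..1000 stand in for a large column.
quantities = list(range(1, 1001))
total = parallel_sum(quantities, workers=4)
```

Aggregations such as SUM or COUNT suit this pattern because partial results combine cheaply, which matches the slide's remark that aggregation queries on large datasets benefit most.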

Scaling Out: Horizontal Scalability

Horizontal Scalability
Expand the capacity of a cluster by adding servers to address workload requirements.
Advantages: near-linear price/performance; leverages commodity, virtual and cloud-based systems; allows elastic scaling.
Challenges: software complexity; emphasis on networking.

Horizontally Scaling Users

InterSystems ECP Application Servers
The InterSystems Enterprise Cache Protocol (ECP) is a powerful mechanism to distribute data and application logic across database instances. It decouples the execution of application code from persisting the data it handles:
An ECP application server services user requests from a local database cache.
An ECP data server persists updates to disk.
Horizontally scaling cache: the caches of multiple instances can each hold an independent working set in memory, kept in sync with persisted data.
Fully transparent to application code.
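The split between application servers and a data server can be pictured with a toy model. The sketch below is an assumption-laden illustration, not the ECP protocol itself: the class names are hypothetical, a dict stands in for disk, and a simple version check stands in for whatever mechanism ECP actually uses to keep caches in sync.

```python
class DataServer:
    """Owns the persisted data; a dict stands in for disk."""

    def __init__(self):
        self.store = {}
        self.version = {}
        self.reads = 0  # counts full reads served to application servers

    def read(self, key):
        self.reads += 1
        return self.store.get(key), self.version.get(key, 0)

    def write(self, key, value):
        self.store[key] = value
        self.version[key] = self.version.get(key, 0) + 1

    def current_version(self, key):
        return self.version.get(key, 0)


class AppServer:
    """Runs application code against a local cache; writes go to the data server."""

    def __init__(self, data_server):
        self.data_server = data_server
        self.cache = {}  # key -> (value, version)

    def get(self, key):
        cached = self.cache.get(key)
        # Toy coherency check (assumption): compare versions before trusting the cache.
        if cached and cached[1] == self.data_server.current_version(key):
            return cached[0]
        value, version = self.data_server.read(key)
        self.cache[key] = (value, version)
        return value

    def set(self, key, value):
        self.data_server.write(key, value)
        self.cache.pop(key, None)  # drop the local copy; it is reloaded on next read


ds = DataServer()
app_a, app_b = AppServer(ds), AppServer(ds)
app_a.set("order:1", "NEW")
first = app_a.get("order:1")    # loads from the data server
second = app_a.get("order:1")   # served from app_a's local cache
app_b.set("order:1", "FILLED")  # another application server updates the row
third = app_a.get("order:1")    # stale copy detected, reloaded
```

Each application server keeps its own working set, so adding application servers adds cache capacity for users without changing where the data is persisted.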

Horizontally Scaling Data

InterSystems SQL Sharding
SQL sharding allows table data to be partitioned over multiple instances.
It takes parallel SQL processing one step further by distributing the work over multiple servers rather than multiple processes on the same server.
The distributed data layout can be exploited further through parallel loading and third-party frameworks such as Apache Spark.
Horizontally scaling cache: the caches of multiple instances add up to keep a larger overall working set in memory.
Fully transparent to application code.

Independently Scaling Users and Data

InterSystems SQL Sharding

Sharded Architecture: Basics
Two main instance roles participate in a sharded cluster.
One shard master (DM): the entry point to the sharded namespace; stores table definitions, code, and data for nonsharded tables.
Any number of shard servers (DS): provide scalable storage and cache capacity for sharded tables; transparent to user code and not accessed directly by users.
Sharded tables are partitioned across the shard servers.
Nonsharded tables are mapped to the shard servers via ECP.
The routine database is shared between all shard servers.
(Diagram labels: shard master; shard server; shard server.)
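A minimal sketch of the two roles follows. It assumes hash-based placement on a shard key purely for illustration; the slides do not say how rows are assigned to shard servers, and the class and method names are hypothetical.

```python
import zlib


class ShardedCluster:
    """Toy model: one shard master plus N shard servers."""

    def __init__(self, shard_count):
        self.master_tables = {}  # nonsharded tables live on the shard master
        self.shards = [dict() for _ in range(shard_count)]  # per shard: table -> rows

    def shard_for(self, key):
        # Assumption: a stable hash of the shard key picks the shard server.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def insert_sharded(self, table, shard_key, row):
        index = self.shard_for(row[shard_key])
        self.shards[index].setdefault(table, []).append(row)
        return index

    def insert_nonsharded(self, table, row):
        self.master_tables.setdefault(table, []).append(row)


cluster = ShardedCluster(shard_count=3)
for trade_id in range(30):  # made-up rows
    cluster.insert_sharded("Trades", "trade_id", {"trade_id": trade_id})
cluster.insert_nonsharded("Currencies", {"code": "USD"})

rows_per_shard = [len(shard.get("Trades", [])) for shard in cluster.shards]
```

Each sharded row lives on exactly one shard server, while the small nonsharded table stays on the master, where the shard servers can reach it.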

Sharded Architecture: Query Processing
1. The application issues a query to the shard master.
2. The shard master analyzes the query for partitioning opportunities and sends shard-local queries to the shard servers.
3. Shard-local queries are resolved by the shard servers, and the results are sent back to the master via ECP.
4. The shard master aggregates the shard-local query results and sends the main query results back to the application.
(Diagram labels: application; shard master; shard servers.)
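The four steps amount to a scatter-gather pattern, sketched below for an average. The functions and sample rows are hypothetical; the point is only that the master must merge partial sums and counts, because averaging the per-shard averages would give the wrong answer.

```python
def shard_local_query(rows, symbol):
    """Step 3: a shard server aggregates only its own partition."""
    matching = [row["qty"] for row in rows if row["symbol"] == symbol]
    return {"sum": sum(matching), "count": len(matching)}


def master_query(shards, symbol):
    """Steps 2 and 4: fan out shard-local queries, then merge the partial results."""
    partials = [shard_local_query(rows, symbol) for rows in shards]
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {"count": count, "avg_qty": total / count if count else None}


# Made-up partitions held by three shard servers.
shards = [
    [{"symbol": "ABC", "qty": 100}, {"symbol": "XYZ", "qty": 50}],
    [{"symbol": "ABC", "qty": 300}],
    [{"symbol": "XYZ", "qty": 70}, {"symbol": "ABC", "qty": 200}],
]

# Step 1: the application asks the master for the average quantity of ABC.
result = master_query(shards, "ABC")
```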

Sharded Architecture: Shard Master App Servers
Shard master application servers (AM) scale the user application workload, while shard servers scale query processing. The application servers:
Use ECP to read nonsharded tables from the shard master data server (DM).
Connect directly to the shard servers for sharded table data.
(Diagram labels: AM, AM, AM; DM; DS, DS, DS, DS.)

Sharded Architecture: Query Shards
For demanding use cases, application servers can also be added at the shard level to spread the shard-local query workload:
Data shards (DS) persist a partition's data.
Query shards (QS) query the data of the corresponding data shard via ECP.
For example, large ingestion workloads can be sent straight to the data shards while query shards reserve their cache for a concurrent analytical query workload.
(Diagram labels: analytic and ingest workloads; DM; QS, QS, QS; DS, DS, DS.)
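The separation of ingest and analytic traffic can be sketched as a small routing rule. Everything here is hypothetical, including the class name, the workload labels and the round-robin choice of query shard; the slides only state which kind of shard serves which workload.

```python
class ShardGroup:
    """One data shard plus the query shards that read its data."""

    def __init__(self, data_shard, query_shards):
        self.data_shard = data_shard
        self.query_shards = list(query_shards)
        self._next = 0

    def route(self, workload):
        if workload not in ("ingest", "analytic"):
            raise ValueError(f"unknown workload: {workload}")
        # Ingest always goes to the data shard; so does everything if no query shard exists.
        if workload == "ingest" or not self.query_shards:
            return self.data_shard
        # Assumption: analytic queries rotate over the query shards.
        target = self.query_shards[self._next % len(self.query_shards)]
        self._next += 1
        return target


group = ShardGroup("DS1", ["QS1a", "QS1b"])
targets = [group.route(w) for w in ("ingest", "analytic", "analytic", "analytic")]
```

Keeping ingest off the query shards is what lets their caches stay warm for the analytical working set.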

Joining Sharded Tables
Cosharded joins: equijoins on the user-defined shard keys of two or more tables can be executed locally on each shard. This is extremely efficient and scales well with the number of tables and the number of shards.
Any set of sharded tables can be joined: each shard server can access data from other shards via ECP, and an efficient shard tuple algorithm assigns shard sets to each shard server.
Sharded tables can be joined with nonsharded tables: shard servers access data from nonsharded tables via ECP.
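Why a cosharded equijoin needs no cross-shard traffic can be shown with a short sketch. It assumes both tables are placed by the same hash of the shared key, which is an illustrative choice and not a description of the product's internals; all names and rows are made up.

```python
import zlib


def shard_of(key, shard_count):
    """Assumed placement rule: a stable hash of the shard key."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def partition(rows, key, shard_count):
    parts = [[] for _ in range(shard_count)]
    for row in rows:
        parts[shard_of(row[key], shard_count)].append(row)
    return parts


def local_join(left, right, key):
    """Plain hash join on rows that sit on the same server."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


def cosharded_join(left, right, key, shard_count):
    """Join each shard's slice of both tables locally, then concatenate."""
    joined = []
    pairs = zip(partition(left, key, shard_count), partition(right, key, shard_count))
    for left_part, right_part in pairs:
        joined.extend(local_join(left_part, right_part, key))
    return joined


orders = [{"order_id": i, "symbol": "ABC"} for i in range(1, 9)]
fills = [{"order_id": i, "price": 10.0 + i} for i in (2, 2, 5, 8)]
sharded = cosharded_join(orders, fills, "order_id", shard_count=3)
single_server = local_join(orders, fills, "order_id")
```

Because matching keys always land on the same shard, the concatenated local results contain the same rows as a join run on one server.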

Leveraging Other Features of InterSystems IRIS Data Platform
Mirroring: sharding leverages mirroring to provide high availability. It is fully supported for all data-storing components of sharded clusters (DM and DS), with automatic completion of sharded queries upon node failover.
InterSystems Connector for Apache Spark: leverages sharded topologies. Spark workers connect directly to the shards to execute local queries and do the aggregating work in Spark itself.
JDBC: transparently makes direct parallel connections to the shards for high-speed data ingestion.

Use Cases

Use Cases
Multi-Asset Global Trading System
One of the top global investment banks, which processes 13% of global equities trading volume, runs its global trading system on top of the InterSystems data platform.
More than 2 billion transactions and more than 6 TB of data are generated every day.
The bank has evaluated InterSystems IRIS for real-time data access and for short-term and long-term storage, replacing ECP app servers, Sybase ASE, Sybase IQ and Rainstor.
InterSystems IRIS improves query performance by 300% and reduces cost by 70%.
Benchmark Service
Another top global investment bank is evaluating InterSystems IRIS as a replacement for the Sybase IQ installation behind its benchmark service.
It has found that InterSystems IRIS is up to 2x faster than another in-memory database and 3x to 10x faster than Sybase IQ.

Use Case 1: Multi-Asset Global Trading System. Real-Time Access and Data Storage on Private Cloud

Use Case 1: Current InterSystems SQL Environment
The trading system persists intraday transactions to hundreds of InterSystems SQL instances:
They are divided into data servers (DS) and app servers (AS), interconnected by InterSystems ECP.
All of them run on physical servers.
An AS needs about three times as much RAM as a DS (128 GB vs 40 GB).
To avoid additional load on the app servers from non-trading-related queries, the customer has also set up Sybase ASE instances and replicates data from the trading system's InterSystems SQL environment to these ASE instances to serve those queries.
(Diagram labels: TSS/Hermes; AS, AS, AS; Data Server, Data Server, Data Server; TIS/Persistor.)

Use Case 1: Current Storage Infrastructure
Retention by store: Caché, intraday; Sybase ASE, 7 days; Sybase IQ, 6 months; Rainstor, forever.
In near real time, the customer replicates data from the trading system's Caché instances to Sybase ASE instances. Typically, one ASE instance holds data from more than one Caché instance for 7 days.
At end of day (EOD), the customer dumps data from the Caché instances to Sybase IQ, where it is kept for up to 6 months, and to Rainstor, where it is kept forever.

Use Case 1: InterSystems IRIS for Real-Time Access
Proposed InterSystems IRIS architecture:
The trading system components (TIS and the persistors) will continue to store data in the existing data servers.
There will be no more expensive app servers.
A cloud-based InterSystems IRIS query cluster will have one or more InterSystems IRIS shard masters.
For each DS, there will be one or more IRIS query shards.
Each node requires only 40 GB of RAM and no expensive storage.
This cloud-based InterSystems IRIS configuration will provide a real-time, horizontally scalable query facility that can replace the current app servers and Sybase ASE for intraday queries. It will improve query performance by 300% and cut hardware cost by 70%.
(Diagram labels: TSS/Hermes & client apps; DM; QS, QS, QS; DS, DS, DS; TIS/Persistor.)

Use Case 1: InterSystems IRIS for Data Storage
InterSystems IRIS native data replication will move data from the InterSystems Caché data servers to InterSystems IRIS data shards in near real time.
The cloud-based InterSystems IRIS data storage facility can hold 7 days, 30 days or 6 months of trading data.
(Diagram labels: TSS/Hermes; ASE/IQ/Rainstor clients; DM, DM; six QS; six DS; TIS/Persistor.)

Use Case 2: Benchmark Service. Succeed Where Hadoop and Traditional Data Warehouses Fail to Deliver

Use Case 2: Background
The investment bank has 18,000 benchmarks: 14,000 come from external sources and 4,000 are created internally. The total data volume is 8 TB.
Its asset managers use the benchmark service to compare the portfolios they manage for their clients against one or more benchmarks, typically at the end of the day.
Its real-time strategy trading platform also uses the benchmark service to make trading decisions during trading hours.

Use Case 2: Challenges
The bank has a petabyte-scale data lake on Hadoop.
Complex SQL joins: the bank has created many curated SQL stores to serve enterprise applications and customers. Currently, Sybase ASE cannot keep up with application and customer demand.
Low-latency requirements: in-memory SQL solutions are expensive and/or unstable.

Use Case 2: InterSystems IRIS Succeeds Where Others Fail
The bank deployed InterSystems IRIS on VMs provisioned from its private cloud. Each VM has 4 cores, 32 GB of RAM and a 200 GB internal disk.
Different sharding strategies were tested with different shard keys (indexid, businessdate).
InterSystems IRIS is up to 2x faster than another distributed in-memory database and 3x to 10x faster than Sybase IQ in many test cases, and it is consistently fast across the board.
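The effect of choosing one shard key over the other can be illustrated with a small experiment. The slides name the two keys but give no details, so the hash placement, the helper names and the sample rows below are all assumptions made for illustration.

```python
import zlib


def shard_of(value, shard_count):
    """Assumed placement rule: a stable hash of the shard key value."""
    return zlib.crc32(str(value).encode("utf-8")) % shard_count


def shards_touched(rows, shard_key, shard_count, predicate):
    """Return the set of shards that hold at least one row matching the predicate."""
    return {shard_of(row[shard_key], shard_count) for row in rows if predicate(row)}


# Made-up benchmark rows: 50 index ids over 10 business dates.
rows = [
    {"indexid": index_id, "businessdate": f"2019-01-{day:02d}"}
    for index_id in range(1, 51)
    for day in range(1, 11)
]


def one_day(row):
    return row["businessdate"] == "2019-01-03"


by_date = shards_touched(rows, "businessdate", 4, one_day)
by_index = shards_touched(rows, "indexid", 4, one_day)
```

With businessdate as the key, a single-day query lands on one shard, which limits how much of the cluster can work on it in parallel; with indexid as the key, the same query typically fans out across several shards. Which behavior is preferable depends on the query mix, which is presumably why the bank tested more than one strategy.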

Q&A

Thank you.