Achieving Horizontal Scalability
Alain Houf, Sales Engineer
Scale Matters
InterSystems IRIS Data Platform lets you:
- Scale up and scale out
- Scale users and scale data
- Mix and match a variety of approaches to scalability to suit your application and business needs
InterSystems Corporation. All rights reserved.
Scaling Up: Vertical Scalability
Vertical Scalability
Expand the capacity of an individual server by adding CPU, memory, I/O and networking components to address workload requirements.
Advantages:
- Architectural simplicity
- Fine-grained balancing possible
Challenges:
- Software complexity
- Hardware limitations
- Non-linear price/performance
- Requires careful upfront sizing
InterSystems SQL Parallel Query Execution
- Leverages multiple CPU cores to serve SQL query results
- Spawns one process per core, which is a form of vertical scalability
- Most beneficial for aggregation queries on large datasets
- Currently, the optimizer considers parallel execution when a query carries the %PARALLEL hint; fully transparent, automatic parallelization is under development
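The idea behind parallel query execution can be illustrated with a small Python sketch: split the rows into one chunk per CPU core, let a separate process aggregate each chunk, then combine the partial results. This is a hypothetical illustration of the technique, not how the InterSystems SQL engine is implemented; the function names (split, partial_sum_count, combine_avg, parallel_avg) are made up for the example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split(rows, n):
    """Divide rows into n contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def partial_sum_count(chunk):
    # Runs in a worker process: aggregate only this process's slice of the data.
    return sum(chunk), len(chunk)


def combine_avg(partials):
    # Merge the partial aggregates: AVG = total SUM / total COUNT.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def parallel_avg(rows, workers=None):
    # Default to one worker process per CPU core.
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, split(rows, workers)))
    return combine_avg(partials)


if __name__ == "__main__":
    print(parallel_avg(list(range(1, 1_000_001))))  # 500000.5
```

The split/aggregate/combine structure is what makes aggregation queries the best fit: each process touches only its own chunk, and the final merge is cheap.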
Scaling Out: Horizontal Scalability
Horizontal Scalability
Expand the capacity of a cluster by adding servers to address workload requirements.
Advantages:
- Near-linear price/performance
- Leverage commodity, virtual and cloud-based systems
- Allows elastic scaling
Challenges:
- Software complexity
- Emphasis on networking
Horizontally Scaling Users
InterSystems ECP Application Servers
The InterSystems Enterprise Cache Protocol (ECP) is a powerful mechanism for distributing data and application logic across database instances. It decouples the execution of application code from the persistence of the data that code handles:
- ECP application servers service user requests from a local database cache
- The ECP data server persists updates to disk
Horizontally scaling cache: the caches of multiple instances can each hold an independent working set in memory, kept in sync with the persisted data
Fully transparent to application code
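A toy Python model of the ECP split between application servers and a data server: each application server answers reads from its own local cache, all updates are persisted by the data server, and the data server invalidates cached copies so the caches stay in sync. The classes are hypothetical and the invalidation scheme is a simplifying assumption; real ECP works at the level of database blocks and is considerably more involved.

```python
class DataServer:
    """Hypothetical stand-in for the ECP data server."""

    def __init__(self):
        self.disk = {}         # persisted data (stands in for the database on disk)
        self.subscribers = []  # application servers that may hold cached copies

    def read(self, key):
        return self.disk.get(key)

    def write(self, key, value):
        self.disk[key] = value      # persist the update
        for app in self.subscribers:
            app.invalidate(key)     # drop stale copies so every cache stays in sync


class AppServer:
    """Hypothetical stand-in for an ECP application server with a local cache."""

    def __init__(self, data_server):
        self.ds = data_server
        self.cache = {}
        self.remote_reads = 0       # how often we had to go to the data server
        data_server.subscribers.append(self)

    def get(self, key):
        if key not in self.cache:   # cache miss: fetch from the data server
            self.cache[key] = self.ds.read(key)
            self.remote_reads += 1
        return self.cache[key]      # cache hit: served locally

    def set(self, key, value):
        self.ds.write(key, value)   # updates are persisted by the data server

    def invalidate(self, key):
        self.cache.pop(key, None)
```

In this model, repeated reads on an application server cost no round trips until another server updates the same key, which is what lets each cache hold its own working set.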
Horizontally Scaling Data
InterSystems SQL Sharding
- SQL sharding allows table data to be partitioned across multiple instances
- Takes parallel SQL processing one step further by distributing the work over multiple servers rather than over multiple processes on the same server
- The distributed data layout can further be exploited through parallel loading and third-party frameworks such as Apache Spark
Horizontally scaling cache: the caches of multiple instances add up, so a larger overall working set can be kept in memory
Fully transparent to application code
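Because the caches of the shard servers add up, a rough sizing estimate is simple arithmetic: the number of shards needed to keep a working set in memory is the working set size divided by the cache available per shard, rounded up. The figures below are hypothetical, and the model ignores overhead such as nonsharded data and index structures.

```python
import math


def shards_needed(working_set_gb: float, cache_per_shard_gb: float) -> int:
    """Smallest shard count whose combined cache holds the working set (simplified model)."""
    if cache_per_shard_gb <= 0:
        raise ValueError("cache per shard must be positive")
    return math.ceil(working_set_gb / cache_per_shard_gb)


def combined_cache_gb(num_shards: int, cache_per_shard_gb: float) -> float:
    # In this model the aggregate cache grows linearly with the number of shards.
    return num_shards * cache_per_shard_gb


if __name__ == "__main__":
    n = shards_needed(working_set_gb=1000, cache_per_shard_gb=96)
    print(n, combined_cache_gb(n, 96))  # 11 1056
```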
Independently Scaling Users and Data
InterSystems SQL Sharding
Sharded Architecture Basics
Two main instance roles participate in a sharded cluster:
One Shard Master (DM)
- Entry point to the sharded namespace
- Stores table definitions, code, and data for nonsharded tables
Any number of Shard Servers (DS)
- Provide scalable storage and cache capacity for sharded tables
- Sharded tables are partitioned across shard servers
- Nonsharded tables are mapped to shard servers via ECP
- The routine database is shared among all shard servers
- Transparent to user code; not accessed directly by users
[Diagram: one shard master connected to several shard servers]
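A minimal Python model of the two roles: the shard master keeps the table definitions and the rows of nonsharded tables, while rows of sharded tables are spread across shard servers according to a hash of the shard key. The class, the table names and the use of CRC32 as the partitioning hash are assumptions for illustration; InterSystems IRIS does not necessarily distribute rows this way.

```python
import zlib
from collections import defaultdict


class ShardMaster:
    """Hypothetical model of a shard master (DM) and its shard servers (DS)."""

    def __init__(self, num_shards):
        self.definitions = {}                # table name -> shard key (None = nonsharded)
        self.local_rows = defaultdict(list)  # nonsharded table data, stored on the master
        # One mapping of table name -> rows per shard server.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def create_table(self, name, shard_key=None):
        # Table definitions always live on the shard master.
        self.definitions[name] = shard_key

    def shard_for(self, key_value):
        # Deterministic hash so the same shard key value always lands on the same shard.
        return zlib.crc32(str(key_value).encode()) % len(self.shards)

    def insert(self, table, row):
        shard_key = self.definitions[table]
        if shard_key is None:
            self.local_rows[table].append(row)  # nonsharded: keep on the master
        else:
            self.shards[self.shard_for(row[shard_key])][table].append(row)
```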
Sharded Architecture: Query Processing
1. The application issues a query to the shard master.
2. The shard master analyzes the query for partitioning opportunities and sends shard-local queries to the shard servers.
3. The shard servers resolve the shard-local queries and send the results back to the master via ECP.
4. The shard master aggregates the shard-local results and returns the final query result to the application.
[Diagram: application, shard master, shard servers]
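The four steps can be sketched in Python for a GROUP BY query computing an average. The key point for step 2 is that the master can rewrite AVG into a SUM and a COUNT that each shard computes locally; in step 4 it combines those partials, because an average of per-shard averages would be wrong. The shard-local calls run sequentially here for clarity, and the function and column names are hypothetical.

```python
from collections import defaultdict


def shard_local_query(rows):
    # Step 3: a shard computes partial SUM(qty) and COUNT(*) per symbol over its own rows.
    partial = defaultdict(lambda: [0.0, 0])
    for row in rows:
        acc = partial[row["symbol"]]
        acc[0] += row["qty"]
        acc[1] += 1
    return dict(partial)


def master_query(shards):
    # Step 2: the master sends the shard-local query to every shard
    # (called sequentially here; a cluster would run them concurrently).
    partials = [shard_local_query(rows) for rows in shards]
    # Step 4: merge the partials and finish AVG(qty) = total SUM / total COUNT.
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for symbol, (qty_sum, count) in partial.items():
            totals[symbol][0] += qty_sum
            totals[symbol][1] += count
    return {symbol: s / c for symbol, (s, c) in totals.items()}


if __name__ == "__main__":
    shards = [
        [{"symbol": "ABC", "qty": 100}, {"symbol": "ABC", "qty": 200}, {"symbol": "XYZ", "qty": 10}],
        [{"symbol": "ABC", "qty": 600}],
    ]
    print(master_query(shards))  # {'ABC': 300.0, 'XYZ': 10.0}
```

With these sample rows the result for ABC is 300.0 (900 / 3), whereas averaging the per-shard averages (150 and 600) would give 375.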
Sharded Architecture: Shard Master App Servers
Shard Master Application Servers (AM) scale the user application workload, while Shard Servers scale query processing. Application servers:
- Use ECP to read nonsharded tables from the Shard Master Data Server (DM)
- Connect directly to the Shard Servers for sharded table data
[Diagram: three AMs in front of one DM and four DSs]
Sharded Architecture: Query Shards
For demanding use cases, application servers can also be added at the shard level to spread the shard-local query workload:
- Data Shards (DS) persist a partition's data
- Query Shards (QS) query the data of their corresponding Data Shard via ECP
For example, large ingestion workloads can be sent straight to the data shards while the query shards reserve their cache for a concurrent analytical query workload.
[Diagram: analytic and ingest workloads, a DM, three QSs, three DSs]
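A sketch of how work might be routed within one shard when query shards are present: ingestion goes to the data shard that persists the partition, and analytical queries rotate over the query shards attached to it. The round-robin policy, the class and the node names are hypothetical; the slide does not prescribe a routing policy.

```python
class ShardGroup:
    """Hypothetical grouping of one data shard (DS) and its query shards (QS)."""

    def __init__(self, name: str, num_query_shards: int):
        self.data_shard = f"{name}-DS"
        self.query_shards = [f"{name}-QS{i}" for i in range(1, num_query_shards + 1)]
        self._next = 0

    def route(self, operation: str) -> str:
        # Ingestion always targets the data shard, which persists the partition.
        # With no query shards configured, queries fall back to the data shard too.
        if operation == "ingest" or not self.query_shards:
            return self.data_shard
        # Analytical reads rotate over the query shards, which read the DS via ECP.
        target = self.query_shards[self._next % len(self.query_shards)]
        self._next += 1
        return target
```

Keeping ingestion off the query shards is what lets their caches stay dedicated to the analytical working set.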
Joining Sharded Tables
Cosharded joins
- Equijoins on the user-defined shard keys of two or more tables can be executed locally on each shard
- Extremely efficient; scales well with the number of tables and the number of shards
Any set of sharded tables can be joined
- Each shard server can access data on other shards via ECP
- An efficient shard tuple algorithm assigns shard sets to each shard server
Sharded tables can be joined with nonsharded tables
- Shard servers access data in nonsharded tables via ECP
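Why cosharded joins can run locally: if two tables are sharded on the same key with the same partitioning function, every pair of matching rows ends up on the same shard, so joining shard by shard and concatenating the results gives the same answer as a single global join. The Python sketch below assumes CRC32 hash partitioning and uses hypothetical names; it does not model the shard tuple algorithm used for other joins.

```python
import zlib
from collections import defaultdict


def shard_of(key, num_shards):
    # Same deterministic hash for every table, so equal keys map to the same shard.
    return zlib.crc32(str(key).encode()) % num_shards


def distribute(rows, key, num_shards):
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[key], num_shards)].append(row)
    return shards


def local_equijoin(left, right, key):
    # Hash join on one node: index the right side, probe with each left row.
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def cosharded_join(left_shards, right_shards, key):
    # Matching rows share a shard, so each shard joins independently and the
    # master only concatenates the shard-local results.
    result = []
    for left, right in zip(left_shards, right_shards):
        result.extend(local_equijoin(left, right, key))
    return result
```

No rows move between shards in this join; that absence of data exchange is why cosharded joins scale with the number of shards.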
Leveraging Other Features of InterSystems IRIS Data Platform
Mirroring
- Sharding leverages mirroring to provide high availability
- Fully supported for all data-storing components of sharded clusters (DM and DS)
- Sharded queries complete automatically upon node failover
InterSystems Connector for Apache Spark
- Leverages sharded topologies: Spark workers connect directly to the shards to execute local queries, and aggregation work is done in Spark itself
JDBC
- Transparently makes direct, parallel connections to the shards for high-speed data ingestion
Use Cases
Use Cases
Multi-Asset Global Trading System
- One of the top global investment banks, which processes 13% of global equities trading volume, runs its global trading system on the InterSystems data platform.
- More than 2 billion transactions and more than 6 TB of data are generated every day.
- The bank has evaluated InterSystems IRIS for real-time data access and for short-term and long-term storage, replacing ECP app servers as well as Sybase ASE, Sybase IQ and Rainstor.
- InterSystems IRIS improves query performance by 300% and reduces cost by 70%.
Benchmark Service
- Another top global investment bank is evaluating InterSystems IRIS to replace the Sybase IQ deployment behind its benchmark service.
- It has found InterSystems IRIS to be up to 2x faster than another in-memory database, and 3x to 10x faster than Sybase IQ.
Use Case 1: Multi-Asset Global Trading System
Real-Time Access and Data Storage on Private Cloud
Use Case 1: Current InterSystems SQL Environment
The trading system persists intraday transactions to hundreds of InterSystems SQL instances:
- Divided into Data Servers (DS) and App Servers (AS)
- Interconnected by InterSystems ECP
- All running on physical servers
- Each AS needs about 3x the RAM of a DS (128 GB vs. 40 GB)
To keep non-trading queries from adding load to the ASs, the customer has also set up Sybase ASE instances and replicates data from the trading system's InterSystems SQL environment to them to serve those queries.
[Diagram: TSS/Hermes, app servers, data servers, TIS/Persistor]
Use Case 1: Current Storage Infrastructure
Caché: intraday | Sybase ASE: 7 days | Sybase IQ: 6 months | Rainstor: forever
In near real time, the customer replicates data from the trading system's Caché instances to Sybase ASE instances; one ASE instance typically holds 7 days of data from more than one trading system Caché instance.
At end of day (EOD), the customer dumps data from its trading system Caché instances to Sybase IQ, which keeps it for up to 6 months, and to Rainstor, which keeps it forever.
Use Case 1: InterSystems IRIS for Real-Time Access
Proposed InterSystems IRIS architecture:
- The trading system components TIS and persistors will continue to store data in the existing DSs
- There will be no more expensive ASs
- Cloud-based InterSystems IRIS query cluster:
  - One or more InterSystems IRIS shard masters
  - For each DS, one or more IRIS query shards
  - Each node requires only 40 GB of RAM, and no expensive storage either
This cloud-based InterSystems IRIS configuration will provide a real-time, horizontally scalable query facility that can replace the current ASs and Sybase ASE for intraday queries. It will improve query performance by 300% and cut hardware cost by 70%.
[Diagram: TSS/Hermes and client apps, DM, three QSs, three DSs, TIS/Persistor]
Use Case 1: InterSystems IRIS for Data Storage
InterSystems IRIS native data replication will move data from the InterSystems Caché data servers to InterSystems IRIS data shards in near real time.
The cloud-based InterSystems IRIS data storage facility can hold 7 days, 30 days or 6 months of trading data.
[Diagram: TSS/Hermes and ASE/IQ/Rainstor clients, two DMs, six QSs, six DSs, TIS/Persistor]
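The retention options translate into raw capacity with simple arithmetic: daily volume times days retained, times the number of copies kept (for example 2 if every data shard is mirrored). The sketch uses the figure of 6 TB per day quoted earlier as a lower bound; treating 6 months as 182 calendar days is an assumption (fewer days apply if only trading days generate data), and compression, indexes and other overhead are ignored.

```python
def raw_capacity_tb(daily_tb: float, days_retained: int, copies: int = 1) -> float:
    """Raw storage needed to keep `days_retained` days of data (no compression or overhead)."""
    return daily_tb * days_retained * copies


if __name__ == "__main__":
    DAILY_TB = 6  # "more than 6TB" per day, so these results are lower bounds
    for label, days in [("7 days", 7), ("30 days", 30), ("6 months (~182 days)", 182)]:
        # Print single-copy and mirrored (2-copy) estimates side by side.
        print(label, raw_capacity_tb(DAILY_TB, days), raw_capacity_tb(DAILY_TB, days, copies=2))
```

Under these assumptions, 7 days needs at least 42 TB, 30 days at least 180 TB, and 6 months at least 1,092 TB per copy, which is why the retention window drives the number of data shards.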
Use Case 2: Benchmark Service
Succeeding Where Hadoop and Traditional Data Warehouses Fail to Deliver
Use Case 2: Background
The investment bank has 18,000 benchmarks (14,000 from external sources and 4,000 created internally), with a total data volume of 8 TB.
- Its asset managers use the benchmark service, typically at the end of the day, to compare the portfolios they manage for their clients against one or more benchmarks.
- Its real-time strategy trading platform also uses the benchmark service to make trading decisions during trading hours.
Use Case 2: Challenges
The bank has a petabyte-scale data lake on Hadoop.
Complex SQL joins
- The bank has created many curated SQL stores to serve enterprise applications and customers.
- Currently, Sybase ASE cannot keep up with application and customer demand.
Low-latency requirements
- In-memory SQL solutions are expensive and/or unstable.
Use Case 2: InterSystems IRIS Succeeds Where Others Fail
- The bank deployed InterSystems IRIS on VMs provisioned from its private cloud; each VM has 4 cores, 32 GB of RAM and a 200 GB internal disk.
- Different sharding strategies, using different shard keys (indexid, businessdate).
- InterSystems IRIS is up to 2x faster than another distributed in-memory database and 3x to 10x faster than Sybase IQ in many test cases, and it is consistently fast across the board.
Q&A
Thank you.