EsgynDB Enterprise 2.0 Platform Reference Architecture

This document describes a platform reference architecture for EsgynDB Enterprise 2.0, which is built on the Apache Trafodion (Incubating) implementation with licensed support and extensions. It outlines server configurations for various vendors and describes the considerations involved in sizing an EsgynDB deployment.

Contents
1 Introduction
2 Architecture
3 Capacity Planning
3.1 Processing Usage
3.2 Memory Usage
3.3 Disk Usage
3.4 Network Usage
4 Reference Architecture Guidance for Production Bare Metal Cluster
4.1 Medium/Large Deployment
4.2 Small Deployment
5 Cloud Deployment
6 Conclusion

1 Introduction

The Apache Trafodion (Incubating) project provides a full transactional SQL database integrated into the Apache Hadoop ecosystem to support operational workloads. EsgynDB Enterprise 2.0, built on Apache Trafodion, offers a fully supported, enterprise-ready version with extensions for additional features, including cross-datacenter support in EsgynDB Enterprise Advanced 2.0.

This reference architecture describes a purpose-built EsgynDB Enterprise installation: the architecture and provisioning for a cluster whose purpose is running one or more EsgynDB application workloads. It does not describe a configuration where EsgynDB Enterprise is part of a wider Hadoop cluster running other ecosystem applications such as MapReduce. Clusters running mixed workloads can start from the sizing and provisioning information given here, but the final sizing and provisioning must also incorporate the requirements of the other workloads in the cluster; that analysis is beyond the scope of this document.

2 Architecture

Apache Trafodion provides an enterprise-class, web-scale database engine in the Hadoop ecosystem. In addition, Trafodion enables the SQL query language and transactional semantics for native Apache HBase and Apache Hive tables. Trafodion provides transactional support for data stored in HBase: it supports fully distributed ACID transactions across multiple statements, tables, and rows, which enables EsgynDB Enterprise to support operational workloads that are generally beyond the reach of most Hadoop ecosystem components.

EsgynDB Enterprise Release 2.0 extends Apache Trafodion with additional features such as cross-datacenter support, using the architecture depicted below.

The architecture involves one or more clients concurrently using SQL queries to access the data managed by EsgynDB via a driver (ODBC/JDBC/ADO.NET). The driver library provides the connection and session between the application (which might or might not execute on the same cluster) and the SQL engine layer. In the SQL engine layer, a master query execution server process (mxosrvr) prepares and executes each query. Depending on the specifics of the workload, execution might involve a distributed transaction manager, or one or more groups of executor server processes (ESPs) that execute portions of the query plan in parallel. These groups of ESPs (for a given query there might be zero or more groups) reflect the degree of parallelism for the query. The query can reference native HBase or Hive tables as well. Ultimately, EsgynDB uses HDFS as the storage layer foundation, with an appropriate replication factor (usually 3, but in some cloud configurations 2) to provide availability if a node fails.

Significant processes used for query processing include:

DCS Master
  Description: Initial connection point for locating a session-hosting mxosrvr.
  Distribution: On one single node.
  Count: One active per cluster, often configured with a floating IP for high availability.

DCS Server
  Description: Manages status and connection usage for the mxosrvr processes.
  Distribution: On each node.
  Count: One for each node where mxosrvrs run.

Master executor (mxosrvr)
  Description: Master executor process that hosts the SQL session and performs query compilation and execution of the root operator.
  Distribution: Multiple on all data nodes in the instance.
  Count: The count defines the maximum number of concurrent sessions.

Executor Server Process (ESP)
  Description: Executes parallel fragments of SQL plans.
  Distribution: Multiple run on all data nodes in the cluster, in variable-size groups.
  Count: Workload dependent; determined by concurrent parallel queries, query plans, and degree of parallelism.

DTM
  Description: Maintains transactional state and log outcome information for transactions.
  Distribution: Runs on all data nodes in the instance.
  Count: One per data node.
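As a concrete illustration of the client connection path described above, the sketch below opens a JDBC session through the DCS Master, which routes the connection to an available mxosrvr. This is a minimal sketch assuming the Apache Trafodion T4 JDBC driver shipped with EsgynDB clients; the host name, port, user, and table names are placeholders, and the DCS port configured on your cluster may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EsgynDbConnectExample {
    public static void main(String[] args) throws Exception {
        // Load the Trafodion T4 JDBC driver (assumed driver class from the Trafodion client package).
        Class.forName("org.trafodion.jdbc.t4.T4Driver");

        // The URL points at the DCS Master, which hands the session to a session-hosting mxosrvr.
        // Host and port (23400 is a commonly used DCS default) are placeholders for this sketch.
        String url = "jdbc:t4jdbc://dcs-master.example.com:23400/:";

        try (Connection conn = DriverManager.getConnection(url, "db_user", "db_password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM trafodion.sales.orders")) {
            while (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}
```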

In EsgynDB Enterprise 2.0, cross-datacenter support is implemented via the DTM, which communicates with the Transaction Manager processes on the peer datacenter cluster to replicate transactions on both clusters.

The EsgynDB Manager was simplified in the architecture picture above to show its relationship to the query processing engine. The EsgynDB Manager subsystem expands to multiple processes, as depicted in the following picture.

EsgynDB Manager processes include:

DB Manager
  Description: Web application server that a browser connects to.
  Distribution: On one single node.
  Count: One per cluster, on the first data node.

OpenTSDB
  Description: Lightweight service processes for collecting time-series metrics.
  Distribution: On each node.
  Count: One per node.

TCollectors
  Description: Collection scripts that gather time-based metrics at an interval.
  Distribution: Multiple on all data nodes in the instance; the number of processes per node varies.
  Count: System and HBase metrics are collected on each node; EsgynDB metrics are collected cluster-wide by a process on the first data node.

REST Server
  Description: Handles REST requests from on- and off-cluster clients.
  Distribution: One per cluster.
  Count: One per cluster, on the first data node.
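Because the manageability stack stores its metrics in OpenTSDB, those time series can also be read programmatically over OpenTSDB's HTTP API. The sketch below is illustrative only: the host, port (4242 is OpenTSDB's usual default), and the metric name are assumptions, since the exact metric names published by the EsgynDB TCollectors are not listed in this document.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class OpenTsdbQueryExample {
    public static void main(String[] args) throws Exception {
        // Query one hour of an aggregated metric from OpenTSDB's HTTP API (/api/query).
        // Host, port, and metric name ("sys.cpu.user") are placeholders for this sketch.
        URL url = new URL("http://opentsdb-host.example.com:4242/api/query"
                + "?start=1h-ago&m=sum:sys.cpu.user");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of data points
            }
        } finally {
            conn.disconnect();
        }
    }
}
```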

In addition to the processes listed for query processing and manageability, other processes in the EsgynDB stack support its runtime execution environment. These processes generally use fewer resources and have little material impact on platform sizing and provisioning.

EsgynDB Enterprise is integrated into the Hadoop ecosystem as depicted in the following picture.

The EsgynDB database engine uses HBase for storage services. As such, it relies on HBase configuration and tuning to achieve optimal performance, and EsgynDB cluster provisioning must incorporate HBase configuration considerations. HBase processes can be divided into two classes: control processes and data processes. Control processes are singletons involved in managing the HBase system and its metadata. Data processes serve the data itself, including reading, updating, and writing (HBase scan, get, and put operations).

HBase control processes include:

- HMaster: Handles metadata and table creation/deletion.
- ZooKeeper: Not an HBase process, but used for information management and coordination across nodes.

HBase data processes include:

- RegionServer: Controls data serving, including servicing gets and puts, and the separation of data into individual regions.
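Since every EsgynDB read and write ultimately flows through the HBase RegionServers described above, the standard HBase client API is a useful way to picture that data path. The sketch below issues a direct HBase put and get against a hypothetical table; it illustrates only the RegionServer get/put path, not how EsgynDB itself encodes SQL rows, and the table, column family, and column names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseDataPathExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; the ZooKeeper quorum comes from there.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("demo_table"))) {

            // Put: the client asks the RegionServer owning this row key to store the cell.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);

            // Get: the same RegionServer serves the read back to the client.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}
```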

HBase in turn uses HDFS services for scalability, availability, and recovery (replication) within the cluster. As such, EsgynDB cluster provisioning must also incorporate HDFS configuration considerations, including replication. HDFS control processes are singleton processes that manage the HDFS file system and control the placement of individual data blocks; HDFS data processes are involved in reading and accessing that data.

HDFS control processes include:

- NameNode: Manages the metadata files used to map blocks to individual files and selects locations for replication (illustrated by the sketch below).
- Secondary NameNode: Takes a checkpoint of all metadata from the NameNode once per interval (one hour is the default). This data can be used to recreate the block-to-file mappings if the NameNode is lost; however, it is not simply a hot backup for the NameNode.

HDFS data processes include:

- DataNode: Serves reads and writes for individual files, and sends periodic "I'm alive" heartbeat messages to the NameNode, including the files/blocks it is managing.

In addition to the HBase and HDFS control processes listed above, other control node processes include:

- Management Server: The Ambari, Cloudera Manager, or equivalent management web server node. Some management servers also perform detailed database and analytic functions.

In smaller clusters, control processes and data processes might reside on the same nodes. In larger clusters, management processes have significantly different provisioning requirements and so are often isolated on separate nodes. This reference architecture assumes separate control and data nodes.
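To make the NameNode's block-to-file mapping concrete, the sketch below asks HDFS where the blocks of a file are physically stored; the NameNode answers with the DataNodes hosting each replica. This is a minimal sketch using the standard Hadoop FileSystem API; the file path is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockMapExample {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml/hdfs-site.xml from the classpath to locate the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; substitute a real HDFS file on your cluster.
        Path path = new Path("/user/trafodion/example/datafile");
        FileStatus status = fs.getFileStatus(path);

        // The NameNode returns, for each block, the DataNodes that hold a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
    }
}
```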

3 Capacity Planning

This section discusses the issues and recommendations to take into consideration when sizing an EsgynDB Enterprise cluster.

3.1 Processing Usage

When sizing the processing power for an EsgynDB Enterprise cluster, consider the following:

- In a typical high-performance configuration, management nodes are configured separately from data nodes. The two types are typically provisioned differently for storage (size, configuration) as well as network and memory.
- In a very small or test configuration, the distinction between data nodes and control nodes is blurred, and most management processes are collocated with data processes for both Hadoop/HBase and EsgynDB. So long as it meets performance and availability objectives, this configuration is valid, especially for basic development and test clusters.

Consider the following factors when assessing the required number of nodes:

- More nodes with fewer cores is preferable to an equivalent number of cores spread over fewer nodes, so long as the number of cores per node is reasonably modern (e.g., 8 or more) for typical production workloads. Scaling out (increasing the number of nodes to reach the desired core count) is preferable to scaling up (increasing the cores per node) because:
  - More nodes with fewer cores are typically cheaper than fewer nodes with more cores.
  - The domain of failure is smaller when a node or disk is lost on a cluster with more nodes.
  - The available I/O bandwidth and parallelism are higher with more nodes.
- Clusters smaller than 3 nodes are not advised, given HDFS replication requirements for availability and recoverability.
- The number of simultaneous users (concurrency) drives the number of nodes connected to the external corporate network, as does the ingest rate for data arrival/refresh. This number determines the total number of mxosrvr processes. The actual connections are distributed around the cluster based on the mxosrvr process distribution; multiple mxosrvr processes can run on the same node.
- The types of workloads are the other key consideration for the number of nodes. The number of nodes and cores reflects the amount of parallelism available for concurrent users of the applications running on the cluster. If typical workloads are high-concurrency short queries, then thinner nodes might be acceptable. If typical workloads involve large scans, then more processing power is needed. Understand the types, frequency, plans, and typical concurrency of the application, ideally by prototyping the workloads and queries whenever possible.

3.2 Memory Usage

When sizing an EsgynDB Enterprise cluster for memory usage, keep in mind the following considerations:

- Many Hadoop ecosystem processes are Java processes. Due to memory-efficiency optimizations in the JVM, there is a significant restriction just below 32GB of heap: crossing this threshold actually results in less usable memory, because the internal representation of pointers changes in a way that consumes significantly more space.
- Large memory consumers on data nodes include the HDFS DataNode processes and the HBase RegionServers. Among control processes, the large memory consumer is the HDFS NameNode process. Plan for these processes to use a heap size of 16-32GB each for optimal performance on a large cluster. Reducing the memory for these components affects performance significantly, so do careful tuning and analysis before choosing a smaller value.
- The primary users of memory in the EsgynDB database engine are the mxosrvrs. Plan for 512MB (0.5GB) per concurrent connection per node. A worked sizing sketch follows this list.
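The following sketch turns the memory guidance above into arithmetic: per-node memory is roughly a base allowance for the Hadoop ecosystem and query processing plus 0.5GB per mxosrvr, where the mxosrvr count per node is the expected maximum number of concurrent connections divided by the number of nodes (the same formula used in the hardware recommendations in Section 4). The 64GB base and the example inputs are assumptions chosen only for illustration.

```java
public class MemorySizingSketch {
    public static void main(String[] args) {
        int numNodes = 8;                   // planned data nodes (example value)
        int maxConcurrentConnections = 400; // peak concurrent sessions (example value)

        // mxosrvrs per node = max concurrent connections / number of nodes (rounded up)
        int mxosrvrsPerNode = (maxConcurrentConnections + numNodes - 1) / numNodes;

        // 64GB base for the Hadoop ecosystem and query processing, plus 0.5GB per mxosrvr
        double baseGb = 64.0;
        double perNodeMemoryGb = baseGb + 0.5 * mxosrvrsPerNode;

        System.out.println("mxosrvrs per node: " + mxosrvrsPerNode);
        System.out.println("Suggested memory per data node (GB): " + perNodeMemoryGb);
        // With these inputs: 50 mxosrvrs per node -> 64 + 25 = 89GB, which lands in the
        // recommended 64-128GB range (96GB is the most common configuration).
    }
}
```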

3.3 Disk Usage

When sizing an EsgynDB Enterprise cluster for disk usage, keep in mind the following considerations:

- For data nodes, SSD is only beneficial for high-concurrency writes; in general, HDD is sufficient. For control nodes, SSD is similarly not cost-effective, since the goal is to have most control information cached in memory.
- For data nodes, configure HDD data disks as direct-attached storage in a JBOD (Just a Bunch Of Disks) configuration. RAID striping slows down HDFS and actually reduces concurrency and recoverability. For control nodes, data disks can be configured as either JBOD, RAID1, or RAID10.
- As with processing power, disks are a unit of parallelism. For a given total disk capacity per node, if workloads include many large scans, it is often more effective to have more, smaller disks than fewer, larger disks per data node. This reference architecture assumes that most workloads include large scans.
- HBase SNAPPY or GZ compression is strongly suggested. SNAPPY has less CPU overhead, but GZ compresses better. The degree of compression varies widely depending on the data and workload patterns, but generally accepted calculations suggest around a 30%-40% reduction. Compression adds to the path length for reading and writing, which can affect data growth and ingest. Compression happens at the HBase file block level, limiting the amount of decompression required at read time.
- When calculating overall disk space and data disk space per node, be sure to account for working space and anticipated ingest/outflow per node. Also remember that the blocks of an HDFS file are stored with a replication factor (typically 3, meaning 3 copies of the data), so each 10GB file actually occupies 30GB on disk. Esgyn recommends leaving approximately 33% of disk space free for overhead workspace. A worked example of this calculation follows this list.
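The sketch below combines the factors above: compression reduces the raw data size, HDFS replication multiplies it, and roughly a third of the raw capacity is reserved as free workspace. The input values (raw data size, compression ratio, node count) are assumptions chosen only to illustrate the arithmetic.

```java
public class DiskSizingSketch {
    public static void main(String[] args) {
        double rawDataTb = 20.0;            // uncompressed user data (example value)
        double compressionReduction = 0.35; // SNAPPY/GZ: ~30%-40% reduction (example value)
        int replicationFactor = 3;          // HDFS replication (typical)
        double freeSpaceFraction = 0.33;    // keep ~33% free for overhead workspace
        int numDataNodes = 8;               // planned data nodes (example value)

        // Space the data actually occupies on disk after compression and replication.
        double onDiskTb = rawDataTb * (1.0 - compressionReduction) * replicationFactor;

        // Raw capacity needed so that the data fits while ~33% stays free.
        double requiredRawCapacityTb = onDiskTb / (1.0 - freeSpaceFraction);
        double capacityPerNodeTb = requiredRawCapacityTb / numDataNodes;

        System.out.printf("On-disk footprint: %.1f TB%n", onDiskTb);
        System.out.printf("Total raw capacity needed: %.1f TB%n", requiredRawCapacityTb);
        System.out.printf("Capacity per data node: %.1f TB%n", capacityPerNodeTb);
        // With these inputs: 20 * 0.65 * 3 = 39 TB on disk; 39 / 0.67 ~ 58.2 TB total;
        // ~7.3 TB per node across 8 nodes.
    }
}
```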

3.4 Network Usage

When sizing an EsgynDB Enterprise cluster for network usage, keep in mind the following considerations:

- In general, 10GigE is the standard for data traffic within an EsgynDB cluster. Using a slower network for data flow can significantly impact performance. Two bonded 10GigE networks provide more throughput for I/O-intensive applications.
- In some cases, a second, slower network is configured for cluster (not Hadoop/HBase) maintenance in order to keep that traffic separate from the operational data flow.
- Consider failure scenarios when connecting nodes from different racks. The HDFS block placement algorithm is biased toward selecting nodes on at least 2 different racks for a block's locations when the replication factor is 3 or greater.
- If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, there must be a high-speed connection between the two data centers.
- If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, both clusters must be configured so that the application can actively connect to either peer cluster via the EsgynDB drivers when both are running and accessible. This ensures that the application can access either cluster exclusively if communication with one of the two is lost.

4 Reference Architecture Guidance for Production Bare Metal Cluster

This section contains recommendations for hardware configurations and software provisioning for a bare metal EsgynDB cluster. The recommendations are hardware-independent; check with your hardware vendor for specific part numbers and availability. The configuration described here is for a medium or large EsgynDB installation, with separate control and data nodes. Smaller configurations with all processes on the same nodes are covered in a separate section below.

For data nodes, the basic hardware recommendation for each node is:

- CPU: Intel Xeon or AMD 64-bit processors, 8 to 16 cores per node.
- Memory: 64GB for the overall Hadoop ecosystem and query processing, plus the usual overhead, plus 0.5GB for each mxosrvr on the node (to calculate the number of mxosrvr processes, divide the maximum number of concurrent connections by the number of nodes). Total memory should be between 64GB and 128GB; the most common value is 96GB.
- Network: 10GigE, 1GigE, or 2x10GigE bonded.
- Storage: SATA, SAS, or SSD; typically 12-24 1TB disks configured as JBOD.

For control nodes, the basic hardware recommendation for each node is:

- CPU: Intel Xeon or AMD 64-bit processors, 8 to 16 cores per node.
- Memory: 64GB for the overall Hadoop ecosystem and query processing, plus overhead for swapping and process maintenance as possible/desired. Total memory should be between 64GB and 128GB; the most common value is 96GB.
- Network: 10GigE, 1GigE, or 2x10GigE bonded, plus appropriate switches for off-platform to on-platform traffic.
- Storage: SATA, SAS, or SSD; typically 6-12 1TB disks configured as RAID1 or RAID10.

4.1 Medium/Large Deployment

A medium or large deployment uses the specifications above, with separate control and data nodes. Processes are placed on these nodes as depicted in the following figure:

In the figure above, the control nodes flank the data nodes and are used only for the DCS Master process. There is no specific constraint on node naming conventions, and no assumption that nodes are consecutively numbered. The vertical bars represent individual nodes, and the ovals represent processes within a node.

4.2 Small Deployment

For a small deployment (2-3 nodes, typically less than one rack), the control nodes are collapsed into the regular node infrastructure: the control nodes are removed, and the control processes run on the same nodes as the functional processes.

5 Cloud Deployment

When deploying EsgynDB in a cloud environment such as Amazon AWS, use the guidelines above to provision resources. For configuration, use an HDFS replication factor of 3 if you choose instance-local storage for the file system; use an HDFS replication factor of 2 if you use EBS volumes.

6 Conclusion

This EsgynDB Platform Reference Architecture document serves as a starting point for defining the platform on which to build a cluster whose primary purpose is running EsgynDB. It is also intended to assist application developers and users in planning the deployment strategy for EsgynDB applications. Esgyn recommends consulting with an Esgyn technical resource for additional information, training, and guidance.