Twelve Cluster Technologies Available in SAS 9.4

Size: px
Start display at page:

Download "Twelve Cluster Technologies Available in SAS 9.4"

Transcription

1 ABSTRACT Paper SAS Twelve Cluster Technologies Available in SAS 9.4 Rob Collum, SAS Institute Inc. We are always looking for ways to improve the performance, efficiency, and availability of our investment in SAS solutions. To address those needs, SAS offers the ability to cluster many of its constituent software components. This session identifies twelve components of SAS software components that can be clustered and describes how they are designed to boost the capabilities of SAS to function in the enterprise. INTRODUCTION SAS software is frequently used to support critical analytics tasks that drive day-to-day business decisions. With successful implementation of those tasks, they take on heightened importance and popularity. Over time, the enterprise becomes increasingly reliant on SAS software to support running more tasks for a growing community of users. This growth amplifies the need for SAS software to deliver those results in an efficient and dependable manner. As the software use grows and expands, the SAS solution needs to scale to meet the demand. And as the business becomes ever more reliant on the solution, then the SAS software needs a robust deployment to ensure its continued availability even in the face of outages, planned or not. Clustering of software components provides the ability to support these needs, offering improved scalability and/or availability of the solution. This paper will briefly introduce the primary concepts to describe the different cluster technologies used by SAS software. It will also provide key architectural requirements as well as provide a rating of the difficulty of deployment based on this scale: CHALLENGING MEDIUM TRANSPARENT HARD Figure 1. Deployment Difficult Rating Scale EASY Deployment of the cluster technologies used by SAS range from TRANSPARENT where zero effort is required to enable the cluster all the way to CHALLENGING where significant effort to deploy and maintain the cluster is necessary. CLUSTERING 101 A cluster is a set of systems that work together with the goal of providing a single service. Planning the deployment of clustered software systems requires the system architects to understand the specific capabilities and requirements of each type of clustered component. SCALABILITY Scalability is an assessment of a system s potential to change in size. From a computing perspective this means that a system is scalable when there is a well-defined understanding of how we change the size of our computing environment to either increase or decrease its capacity to do work. As most computing work tends to grow in size over time, we are most often concerned with increasing our systems ability to process work. 1

2 When designed for scalability beyond a single host machine, the components of a software cluster are deployed across multiple hosts. Incoming tasks are distributed (that is, load balanced) across the software s cluster nodes on those host machines. AVAILABILITY Hardware failures in a computing environment are inevitable. Hard drives fail; power supplies overload; motherboards burn out; accidents happen. Software clusters designed for improved availability are resilient in the face of these challenges. The first goal of any available system is to remove potential single points of failure (SPOF). A single point of failure is a problem because if that particular item ceases to function properly, then other inter-dependent components won t be able to function either. To achieve improved availability, a software cluster is deployed across multiple host machines to remove single points of failure. This concept can be extended beyond just the server hosts, but also to storage solutions, racks, data centers, and even geography. CLUSTER TECHNOLOGIES AVAILABLE IN SAS SAS solutions consist of many discrete software elements. Each of these elements is responsible for a set of tasks to enable the fast, efficient delivery of analytics and results to users. The users do not need to know about the cluster technologies used at their site they just want their SAS jobs to run. SAS architects and administrators, on the other hand, need to be familiar with these cluster technologies to properly plan, deploy, and support SAS solutions. CLUSTER DEPLOYMENT CONCEPTS The various components of SAS solutions offer different cluster implementations. Some are optimized for scalability only, or for availability only. Some offer both improved scalability and availability. Furthermore, to fully realize the software cluster s potential, third-party software and/or hardware might be required. There are two computing architectures referred to in this paper: SMP symmetric multi-processing: Most software we work with regularly operates using the SMP model. It executes on a single host machine alongside other software simultaneously. It can take advantage of multi-threaded processing on one or more CPU cores. In a clustered environment, a new task is assigned by a load-balancing algorithm to one host machine for processing. MPP massively parallel processing: Some software is designed to operate across multiple host machines and work together to provide a single logical service. This approach allows for massive scalability since individual host machines can be improved, but more importantly, new hosts can be added to the cluster as needed. Typically, a task provided to this software is broken up into smaller chunks and distributed across the hosts of the cluster. Each host processes its chunk of the problem in parallel with the others to radically reduce the time needed to complete the overall task. For each cluster discussed here, this paper will: Identify the cluster s objective: scalability, availability, or both. Explain the cluster s functions in basic terms. Provide a simple rating to gauge the difficulty of the cluster s deployment. A rating of easy assumes expertise in planned SAS software deployments. A rating of hard implies that experience with thirdparty technologies, system administrative privileges, and/or complex manual configuration is required. Describe required deployment considerations. 2

3 01 SAS METADATA SERVER The SAS Metadata Server fulfills numerous roles in a SAS solution environment, including authentication, authorization, service definition, content management, and much more. It is one of the most critical components of the SAS Intelligence Platform architecture. Clustering Functions The SAS Metadata Server can be clustered to run as multiple instances on separate host machines to provide much higher availability since access to metadata is so crucial to the operations of all SAS solutions. The cluster technology used by SAS Metadata Server requires at least 3 configured instances and offers improved availability through active-active operation of all of the clustered nodes. CLUSTER OF SAS METADATA SERVERS host1.site.com host2.site.com host3.site.com MANAGER Metadata Node Metadata Node Metadata Node #3 Live Metadata Live Metadata Live Metadata Local disk Local disk Local disk Metadata Backups Network disk Figure 2. SAS Metadata Server Clustered to Operate across 3 Hosts Workload and data integrity are managed by one SAS Metadata Server Node, which operates as the Manager of the cluster. The remaining instances act as Worker Nodes. If one Worker fails, then the service will continue to operate uninterrupted with only 2 (or down to half of the configured number of instances, depending on certain conditions). If the Manager Node goes down, then the Workers will elect a new Manager and operations will resume. There are three kinds of clients that rely on the SAS Metadata Server. And how they discover where the active cluster nodes are running varies a bit: 1. Managed desktop clients: SAS Enterprise Guide, SAS Office Analytics, and so on Connection profile maintained by the client application itself Successful connection to any node of a clustered SAS Metadata Server automatically updates 3

4 the connection profile (in case of added or removed nodes) 2. Server processes: SAS Object Spawner, SAS Application Servers (Workspace, OLAP Server, others), and so on A connection profile for each service is maintained in the SAS configuration directory which is automatically created at install time. Use the sas-update-metadata-profile utility to make changes to the profile if the metadata cluster configuration changes. 3. Programmatic interfaces: SAS Display Manager, User-authored SAS programs, and so on Users can create their own metadata connection profile to describe individual or clustered metadata servers. Useful for dev/test/prod environments. When deployed in a clustered configuration, the SAS Metadata Server cannot support normal user operations with less than half of the configured hosts online. The reasons for this are based in the concept of quorum and preventing the potential computing problem known as split-brain where separate data sets get out of sync. In the illustration above, notice that each metadata node maintains its own local copy of metadata files. The Manager Node coordinates the Update operations of the cluster. A separate network file share is also required for the backup and recovery process. Clustering Deployment Considerations: Deploying a clustered SAS Metadata Server is EASY since it s a standard option of a planned SAS software deployment. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improves availability Active-Active failover Manager coordinates update actions Manager delegates all user operations to the Workers SMP Minimum 3 server hosts, 4 is the maximum recommended No geographic separation Preferred: Use SAS Metadata Server Connection Profile Also: Direct metadata connection to any host (deprecated) Easy Standard option of a planned SAS deployment 02 SAS OBJECT SPAWNER The SAS Object Spawner provides a mechanism to launch and manage SAS Workspace Servers, SAS Stored Process Servers, and SAS Pooled Workspace Servers. If running multiple instances of the object spawner, then they can be configured to work together. This effectively creates a cluster of object spawners which then provides load balancing and higher availability of these services. There are several choices of load-balancing algorithms to choose from to meet your customer s workload requirements. 4

5 Clustering Functions When an object spawner first starts up, it communicates with the SAS Metadata Server to get its configuration properties, including references to any other object spawners in the environment. The object spawners will elect one to act as the parent-peer which acts as a load balancer for the cluster. The parent-peer node will also accept job assignments to launch SAS server processes on its host machine. CLUSTER OF SAS OBJECT SPAWNERS host1.site.com host2.site.com PARENT-PEER MANAGER Object Spawner PEER Object Spawner Workspace Server Workspace Server Stored Process Server Pooled Workspace Server Stored Process Server Pooled Workspace Server Data Source Network disk Figure 3. SAS Object Spawners Clustered to Launch SAS Server Processes across Multiple Hosts In the illustration above, notice the reference to shared network disk. The object spawner cluster does not require this (although it is a convenient way to deploy a shared configuration). However, users of the SAS server processes will not know which host is running the associated SAS Workspace Server (or Stored Process Server or Pooled Workspace Server). The proven practice is to place SAS data locations on a common shared filesystem that is mounted at the same directory location on each host. SAS clients that communicate with the object spawner include SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, and many more. These clients discover the connection information to communicate with the object spawner cluster from the SAS Metadata Server. 5

6 Clustering Deployment Considerations: Deploying SAS Object Spawners in a clustered configuration is EASY since the components are installed as part of a planned SAS software deployment. Some manual modification of metadata as well as the associated configuration files on disk is required. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improve scalability and availability of SAS server processes Active-Active failover Parent-peer load balances requests, all nodes participate in job execution SMP Minimum 2 server hosts Excellent integration with SAS Grid Manager Object spawner definitions reside in SAS Metadata Server Easy Standard option of a planned SAS deployment with a few manual configuration items 03 SAS OLAP SERVER The SAS OLAP Server has much of the same technology as the object spawner baked right into it. So while a single OLAP instance can run alone, you have the option of deploying several and then configuring them to work together. Again, this creates a cluster of OLAP services that are able to load balance incoming requests and provide higher availability. Clustering Functions Like the object spawner the OLAP server requests its cluster configuration information at start-up from the SAS Metadata Server. One node acts as the parent-peer and handles load balancing for the cluster. All nodes participate in responding to tasks. 6

7 CLUSTER OF SAS OLAP SERVERS host1.site.com host2.site.com PARENT-PEER MANAGER OLAP Server PEER OLAP Server Data Source Network disk Figure 4. A Cluster of SAS OLAP Servers to Distribute Tasks across Multiple Host Machines Similar to the SAS server processes associated with the object spawner, the illustration above shows a shared file system at a common mount point on both hosts where the OLAP servers can find data. Clustering Deployment Considerations: Deploying SAS OLAP Server in a clustered configuration is EASY since the components are installed as part of a planned SAS software deployment. Some manual modification of metadata as well as the associated configuration files on disk is required. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improve scalability and availability of SAS OLAP Server Active-Active failover Parent-peer load balances requests, all nodes participate in job execution SMP Minimum 2 server hosts Excellent integration with SAS Grid Manager OLAP Server definitions reside in SAS Metadata Server Easy Standard option of a planned SAS deployment with a few manual configuration items 7

8 04 SAS WEB APPLICATION SERVER The SAS Web Application Server is a specialized implementation of Pivotal tc Server which is itself a robustly configured implementation of Apache Tomcat suitable for enterprise use. The web app server is responsible for hosting the SAS web applications that are server-side Java programs that provide a web interface for access to SAS analytics and administration capabilities. Clustering Functions The SAS Web Application Server can be configured to provide vertical clustering with multiple Java virtual machine (JVM) instances on a single host machine as well as horizontal clustering with JVM instances deployed across multiple hosts. The SAS Deployment Wizard asks some simple questions about the host machines in the environment and then handles all of the tricky clustering details for you at installation time. 8

9 CLUSTER OF SAS WEB APPLICATION SERVERS host1.site.com host2.site.com SASServer1_1 (Pivotal tc Server) SASServer1_2 (Pivotal tc Server) SAS web apps: - SAS Themes - SAS Backup & Recovery - Etc. SAS web apps: - SAS Themes - SAS Backup & Recovery - Etc. SASServer2_1 (Pivotal tc Server) SASServer2_2 (Pivotal tc Server) SAS web apps: - SAS Studio - SAS Environment Mgr - Etc. SAS web apps: - SAS Studio - SAS Environment Mgr - Etc. SASServer11_1 (Pivotal tc Server) SASServer11_2 (Pivotal tc Server) SAS web apps: - SAS Enterprise Miner - SAS Model Manager - Etc. SAS web apps: - SAS Enterprise Miner - SAS Model Manager - Etc. SAS Web Server Automatic Load balancing Session aware Figure 5. The SAS Web Application Server with JVMs Clustered Both Vertically and Horizontally Depending on the SAS solutions licensed and deployed at your site, there could be as many as fifteen different JVMs used, named SASServer1 to SASServer15. The various web applications associated with those solutions are automatically placed in specific JVMs based on function. The SAS Web Server is automatically configured at installation time to act as a reverse proxy for a clustered SAS Web Application Server. Simply accessing a solution s portion of the URL via the SAS Web Server s address will automatically route to the appropriate SAS web application in a specific JVM ensuring proper load balancing. 9

10 User requests are associated with sessions that are proxied to always return to the same SAS Web Application Server instance where the session was initiated there is no way to replicate a user s session across multiple SAS Web Application Server instances. Clustering Deployment Considerations: Deploying SAS Web Application Server in a clustered configuration is EASY since the components are installed and configured as part of a planned SAS software deployment. Not all SAS web applications, however, are actually clusterable. Some can only run on a single JVM in the deployment. The SAS Deployment Wizard is aware of these web apps and configures services appropriately. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improve scalability and availability of SAS web apps Active-Active failover All JVMs participate in user activity as needed. Load balancing is automatically handled by the SAS Web Server SMP Minimum 2 server hosts SAS Web App Server cluster configuration is stored in the SAS Metadata Server Easy Standard option of a planned SAS deployment 05 SAS WEB SERVER The SAS Web Server is a specialized implementation of Pivotal Web Server which is itself a robustly configured implementation of Apache HTTP Server suitable for enterprise use. The web server provides a single point of contact for all your SAS solution content available at your site. It acts as a reverse proxy and directs requests for content or web apps to the appropriate destination. Clustering Functions The SAS Web Server has no built-in functionality to support clustering. However, the HTTP protocol is designed to operate statelessly and if we take care to handle user sessions, then we can deploy a second SAS Web Server to provide improved availability. 10

11 CLUSTER OF SAS WEB SERVERS ws1.site.com ws2.site.com Web Server (Pivotal Web Server) Web Server (Pivotal Web Server) JARs and config Local disk JARs and config Local disk 3 rd Party Load Balancer Hardware or Software Session aware Figure 6. SAS Web Servers Clustered behind a Third-party Load Balancer In this type of deployment which is typical for Apache HTTP Servers the web servers do not communicate nor coordinate with each other. They operate independently. Coordination is handled by a third-party load balancer. You and your IT team can select the load balancing product that works best at your site. The key requirement is to ensure that the load balancer implements the common feature to direct a user s web session requests to the same web server every time sessions are load balanced, not just individual HTTP requests. Remember that the SAS Web Server will then forward the request on to the SAS Web Application Server where the session information resides to ensure that the user can work within the scope of her session. If one of the web servers fails, then any associated user sessions will be abandoned those affected users must log on to create a new session. All user-generated HTTP traffic should be directed to the load balancer, not directly to the clustered web servers. This will ensure that if one web server goes down, then the load balancer will direct all traffic to the remaining SAS Web Servers. Of course, to properly ensure high availability, the IT organization must configure the load balancer for improved availability as well. Clustering Deployment Considerations: Deploying the SAS Web Server in a clustered configuration is rated as MEDIUM difficulty. The SAS Deployment Wizard will only deploy a single SAS Web Server. To create a web server cluster, a second SAS Web Server must be created on a new host as a copy of the software and parameter files from the first with some manual configuration changes. Also, the site s IT team must deploy their choice of 11

12 a third-party load balancer that is configured to distribute requests and sessions across the SAS Web Servers. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improve availability Active-Active failover Load balancing is provided by a third-party load balancer SMP Minimum 2 server hosts HTTP clients connect to the third-party load balancer Medium Manual deployment of additional SAS Web Servers Requires third-party hardware and/or software support as well as environmental and system administration privileges 06 SAS LASR ANALYTIC SERVER The SAS LASR Analytic Server is an in-memory analytic engine, which can be operated in two modes: distributed (also known as massively parallel processing or MPP) and non-distributed (also known as symmetric multi-processing or SMP). Distributed SAS LASR Analytic Server is of primary interest here and operates as a cluster of multiple instances, each of which runs on a separate host machine. This MPP approach gives SAS LASR Analytic Server the ability to scale up and handle extremely large volumes. SAS LASR Analytic Server, however, even as MPP, does not offer high availability or predictive load balancing. Scalability to tackle big data at blazing fast speeds is its primary objective. Clustering Functions The Distributed SAS LASR Analytic Server operates using the MPP paradigm. Analytic tasks in SAS LASR Analytic Server are processed by distributing the data for analysis evenly across multiple machines. Each worker instance of the distributed SAS LASR Analytic Server is given a subset of data to work with. After the workers have completed their assigned tasks on their respective sets of data, their results are collected by the SAS LASR Analytic Server Root Node and assembled into the final answer. 12

13 CLUSTER OF SAS LASR ANALYTIC SERVERS host1.site.com host2.site.com host3.site.com host4.site.com MANAGER ROOT LASR Node LASR Node LASR Node #3 LASR Node #4 SMP LASR SMP LASR SMP LASR Figure 7. Distributed SAS LASR Analytic Server for Large Data Tasks alongside SMP LASR Instances for Small Data Jobs A SAS LASR Analytic Server cluster is designed for ultimate scalability and is not highly available. Even co-locating your SAS LASR Analytic Server cluster alongside a Hadoop cluster which has high availability features enabled will not improve the availability of SAS LASR Analytic Server itself. If the LASR Root or any LASR Worker goes offline, then the SAS LASR Analytic Server cluster must be restarted and any required tables reloaded. If a LASR host machine must remain offline, then SAS LASR Analytic Server can be started with the remaining nodes (possibly requiring manual configuration). If SAS LASR Analytic Server works with SASHDAT files from a co-located Hadoop provider, then symmetry between the LASR nodes and the HDFS nodes must be maintained. A recent improvement in load balancing came with the release of SAS 9.4 M3. A distributed deployment of SAS LASR Analytic Server can also work with a number of individual SMP LASR instances in the same clustered environment to host small tables that could not be efficiently stored in a distributed manner. Clustering Deployment Considerations: Deploying the Distributed SAS LASR Analytic Server is rated as MEDIUM difficulty. The SAS Deployment Wizard does not deploy LASR. Instead, installation is performed at the Linux command-line and requires root-level privileges to complete. Furthermore, if SASHDAT is desired for staging data for rapidly (re-)loading into LASR, then a supported Hadoop distribution must be deployed alongside LASR symmetrically where the LASR Root Node is hosted on the same machine as the HDFS NameNode and the LASR Workers are placed alongside each of the HDFS DataNodes. Summary: Objective Operation Cluster Architecture Massive scalability All nodes critical LASR Root manages requests and directs the LASR Workers MPP (plus some SMP if needed) Minimum 4 server hosts 13

14 Client discovery Deployment difficulty LASR Root Node is defined in the SAS Metadata Server Medium Requires system administration privileges in Linux (and possibly Hadoop) Also possibly third-party hardware and/or software support 07 SAS HIGH-PERFORMANCE ANALYTICS SERVER Before there was SAS LASR Analytic Server, we had SAS High-Performance Analytics Server. And today, it still offers unique and valuable analytics capabilities not found in SAS LASR Analytic Server. Clustering Functions The SAS High-Performance Analytics Server also functions as an in-memory analytics engine in much the same way as SAS LASR Analytic Server. However, the main difference is that where SAS LASR Analytic Server persists that is, remains continuously available with resident data for incoming requests the SAS High-Performance Analytics Server instead will instantiate across all machines, load data and run the one job requested, and then shutdown. Each subsequent request repeats that process: start up a new server, load data, run analysis, return results, and quit. CLUSTER OF SAS HIGH-PERFORMANCE ANALYTIC SERVERS host1.site.com host2.site.com host3.site.com host4.site.com MANAGER ROOT HPAS Node HPAS Node HPAS Node #3 HPAS Node #4 Figure 8. Distributed SAS High-Performance Analytics Server for Large Data Tasks Scalability is also the primary objective here. The availability situation might be considered slightly better than SAS LASR Analytic Server, however. That s because each new request instantiates a new inmemory service. So it s possible that the loss of a host machine could go unnoticed if it occurs between requests. However, just like SAS LASR Analytic Server, if an SAS High-Performance Analytics Server host machine must remain offline, then manual configuration might be required so that a new instance of SAS High-Performance Analytics Server can run on the remaining hosts. Clustering Deployment Considerations: Deploying the Distributed SAS High-Performance Analytic Server software is rated as MEDIUM difficulty. The SAS Deployment Wizard does not deploy SAS High-Performance Analytics Server. Instead, installation is performed at the Linux command-line and requires root-level privileges to complete. SAS LASR Analytic Server and SAS High-Performance Analytics Server rely on the same underlying software to operate in MPP mode - TKGrid. So a single deployment of TKGrid with proper licensing can host both SAS LASR Analytic Server and SAS High-Performance Analytics Server running at the same time. Be 14

15 aware that SAS LASR Analytic Server and SAS High-Performance Analytics Server use memory independently of each other. And while you can process a LASR table with SAS High-Performance Analytics Server, realize that execution of the HP PROC will make its own in-memory copy of that data before processing. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Massive scalability All nodes critical SAS High-Performance Analytics Server Root Node manages requests and directs the Worker Nodes MPP Minimum 4 server hosts SAS High-Performance Analytics Server Root Node host, port, and other connection information are specified in your SAS program code Medium Requires system administration privileges in Linux (and possibly Hadoop) Also possibly third-party hardware and/or software support 08 SAS IN-DATABASE EMBEDDED PROCESS (for Hadoop) The SAS Embedded Process is part of the SAS In-Database technology offerings. The idea behind SAS In-Database is to move analytic capabilities to where the data resides already (versus copying data from a central store over to analytic services hosted elsewhere). The embedded process achieves this goal beautifully. Clustering Functions SAS offers the embedded process for several third-party data providers. In this paper, we will look at the SAS Embedded Process for Hadoop. The embedded process software is deployed to an existing Hadoop cluster. An instance of the embedded process runs on each and every Hadoop node responsible for executing MapReduce jobs. In this way, the embedded process is automatically in a position to leverage the existing Hadoop abilities supporting availability and scalability while participating in the Hadoop load balancing scheme via YARN. 15

16 MPP CLUSTER OF SAS EMBEDDED PROCESS YARN hdfs-nn1.site.com or MR1 hdfs-dn1.site.com hdfs-dn2.site.com hdfs-dn3.site.com YARN or MR1 YARN or MR1 YARN or MR1 EP Node EP Node EP Node #3 EP Node #4 Hive Hive Hive Hive HDFS HDFS HDFS HDFS Beyond performing analytics within the Hadoop cluster, the embedded process can also perform data transfer from each embedded process node directly to each node of a SAS LASR Analytic Server or SAS High-Performance Analytics Server cluster. This is known as parallel data transfer and provides a fast and efficient means to copy data over to your in-memory analytics solution. The SAS Embedded Process also supports deployment to other database technology vendors and in those situations, relies on specific requirements to deploy and run properly into each of those environments. Clustering Deployment Considerations: Deploying the SAS Embedded Process software is rated as MEDIUM difficulty. The SAS Deployment Wizard does not deploy the embedded process. Instead, installation is performed one of three ways: 1. At the Linux command-line on the Hadoop cluster and requires root-level privileges to complete. 2. Use SAS Deployment Manager to create a Cloudera parcel for the embedded process and then Cloudera Manager can deploy to the Cloudera Distribution Hadoop cluster. 3. Use SAS Deployment Manager to create a stack for Hortonworks for the embedded process and then Ambari can deploy to the Hortonworks Data Platform cluster. In order to install the embedded process in Hadoop, you must have administration-level privileges to manage HDFS. Database permissions to access data in Hive (or another supported DBMS based on Hadoop) are needed as well. Summary: Objective Operation Cluster Architecture Massive scalability while relying on the Hadoop load balancing and availability model All nodes are used The EP instantiates on each node as a MapReduce job and works with the data locally available MPP The EP deploys to all MapReduce hosts 16

17 Client discovery Deployment difficulty Host, port, and other connection information for the SAS Embedded Process are specified in your SAS program code Medium Requires system administration privileges in Linux and Hadoop Also possibly third-party hardware and/or software support 09 SAS CACHE LOCATOR The SAS Cache Locator is a specialized implementation of Pivotal GemFire. It provides service discovery functionality and informs new, connecting members (like the SAS Web Application Server) where other running members are located. Clustering Functions The members of a SAS Cache Locator cluster act as peers and provide failover support for each other. If one goes down, the other can continue doing all of the work. CLUSTER OF SAS CACHE LOCATOR SAS-server.site.com SAS-midtier.site.com PEER Cache Locator (Pivotal GemFire) PEER Cache Locator (Pivotal GemFire) Figure 9. Automatically Deployed Cluster of SAS Cache Locator Clustering Deployment Considerations: Deploying the SAS Cache Locator as a cluster is rated as TRANSPARENT difficulty. For multi-machine deployments, the SAS Deployment Wizard will automatically deploy two SAS Cache Locators in a clustered arrangement. One is placed on the primary middle-tier host machine and the other on the primary server-tier host. Summary: Objective Operation Cluster Architecture Improved scalability and availability of the Cache Locator service Active-Active failover Both nodes participate in task execution SMP Minimum 2 server hosts 17

18 Client discovery Deployment difficulty Host and port of cluster nodes automatically configured in participating members Transparent Cluster deployment is fully automatic in multi-tier deployments 10 SAS JMS BROKER The SAS JMS Broker is a SAS deployment of Apache ActiveMQ software. It provides Java message services for SAS solutions. Clustering Functions Clustering is achieved through the use of a master-slave paradigm where all instances rely on a shared file system to access common files. To ensure high availability of the JMS Broker, then the shared filesystem for common files must also be hardened against failure. CLUSTER OF SAS JMS BROKER host1.site.com host2.site.com JMS Broker (Apache ActiveMQ) JMS Broker (Apache ActiveMQ) WARM STANDBY JMS Broker Common files Network disk Figure 10. SAS JMS Broker after Manual Deployment with a Failover Node Clustering Deployment Considerations: Deploying the SAS JMS Broker as a cluster is rated as HARD difficulty. The SAS Deployment Wizard will only deploy a single JMS Broker instance. The broker can be clustered through additional manual steps to eliminate the single point of failure and improve availability. 18

19 Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improved availability of the JMS Broker service Active-Passive failover with a Warm Standby node Only one node is active to execute tasks SMP Minimum 2 server hosts Host and port of cluster nodes automatically configured in participating members Hard Requires environment and system administration privileges as well as using third-party documentation. 11 SAS ENVIRONMENT MANAGER SERVER The SAS Environment Manager is a web application that provides an administrative and monitoring interface for your SAS solution through your web browser. The SAS Environment Manager interface relies on an infrastructure built using VMware vfabric Hyperic software, which consists of a SAS Environment Manager Server and SAS Environment Manager Agents. An agent is automatically deployed by the SAS Deployment Wizard to each host machine in your SAS solution and collects operational metrics, resource utilization, and so on. The agents then transmit their data to the SAS Environment Manager Server, which stores the data in a central database the SAS Web Infrastructure Data Server for monitoring and reporting purposes. In its way then, the SAS Environment Manager functionality requires its own complex, multi-host architecture. Improving the availability of SAS Environment Manager means enabling clustering for those other pieces as well. However, for this part of our discussion about clustering, we focus only on the SAS Environment Manager Server. Clustering Functions Clustering the SAS Environment Manager Server improves availability of the service by mitigating the single node of failure. It requires manual effort to deploy a second node. A third-party load-balancer must also be used to automatically redirect traffic to the warm standby node if the primary node goes offline. 19

20 CLUSTER OF SAS ENVIRONMENT MANAGER SERVER host1.site.com host2.site.com Environment Manager Server (VMWare vfabric Hyperic Server) Environment Manager Server (VMWare vfabric Hyperic Server) WARM STANDBY 3 rd Party Load Balancer Hardware or Software Figure 11. SAS Environment Manager Server Clustered for Improved Availability Clustering Deployment Considerations: Deploying the SAS Environment Manager Server as a cluster is rated as HARD difficulty. The SAS Deployment Wizard will only deploy a single SAS Environment Manager Server. The Server component can be clustered through additional manual steps to eliminate the single point of failure and improve availability. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improved availability of the SAS Environment Manager Server Active-Passive failover with a Warm Standby node Only one node is active to execute tasks SMP Minimum 2 server hosts Clients connect to the third-party load balancer Hard Requires environment and system administration privileges as well as using third-party documentation. 12 SAS WEB INFRASTRUCTURE PLATFORM DATA SERVER 20

21 The SAS Web Infrastructure Platform Data Server is built using a PostgreSQL RDBMS and Pgpool-II middleware. It provides transactional data storage for use by SAS middle-tier and SAS solutions software. Clustering Functions The SAS Web Infrastructure Platform Data Server can be clustered to mitigate a single point of failure to improve availability. The Postgres software provides a standard approach to this type of cluster where a second instance of the PostgreSQL server runs as a warm standby node with its own copy of the database in Read-Only mode. Streaming replication is used so that the contents of the live database are copied in near real time to the standby database. CLUSTER OF SAS WEB INFRASTRUCTURE PLATFORM DATA SERVER host1.site.com host2.site.com WIP Data Server (postgresql 9.1.9) Streaming Replication WIP Data Server (postgresql 9.1.9) WARM STANDBY Live database Local disk Read-only database Local disk pgpool-ii Server (pgpool-ii 3.3.4) SAS Web Application Server Figure 12. SAS Web Infrastructure Platform Data Server Clustered for Improved Availability Clustering Deployment Considerations: Deploying the SAS Web Infrastructure Platform Data Server as a cluster is rated as CHALLENGING difficulty. The SAS Deployment Wizard will only deploy a single SAS Web Infrastructure Platform Data Server service. The SAS Web Infrastructure Platform Data Server can be clustered through additional manual steps to eliminate the single point of failure and improve availability. Clustering of the SAS Web Infrastructure Platform Data Server must be enabled at initial deployment of the SAS solution. A single instance cannot be retro-fitted to operate as a cluster. By default, the SAS Deployment Wizard configures a different PostgreSQL Server instance for each SAS database. So enabling high availability must be considered for the SharedServices database, the SAS Environment Manager database, any SAS solution-specific databases, and so on. In this explanation, we did not eliminate the one instance of Pgpool-II Server as a single point of failure. SAS Technical Support is working with SAS R&D to evaluate additional steps for that as well. Never configure more than one primary SAS Web Infrastructure Platform Data Server for use simultaneously. Doing so can result in data loss or corruption that can be very difficult to recover from 21

22 without full backups of the database(s). Make regular backups of the database(s) as a best practice. The high-availability configuration is not a substitute for making regular backups. Summary: Objective Operation Cluster Architecture Client discovery Deployment difficulty Improved availability of the SAS databases in SAS Web Infrastructure Platform Data Server Active-Passive failover with a Warm Standby node Only one node is active to execute tasks SMP Minimum 2 server hosts SAS High-Availability JDBC shim, which redirects queries and connections to the database on the running host Challenging Limited applicability; involves manifold expert configurations; requires environment and system administration privileges as well as using third-party documentation. 22

23 CONCLUSION There are twelve different cluster technologies used by the SAS 9.4 platform. CLUSTER TECHNOLOGIES IN SAS 9.4 SAS IN-DATABASE ANALYTICS SAS METADATA TIER SAS MIDDLE TIER SAS EMBEDDED PROCESS in HADOOP SAS METADATA SERVER SAS WEB SERVER YARN hdfs-nn1.site.com EP Node or MR1 hdfs-dn1.site.com EP Node hdfs-dn2.site.com EP Node #3 hdfs-dn3.site.com YARN or MR1 YARN or MR1 YARN or MR1 EP Node #4 MANAGER Metadata Node Metadata Node Metadata Node #3 Web Server (Pivotal Web Server) 3 rd Party Load Balancer Web Server (Pivotal Web Server) Hive Hive Hive Hive Live Live Live Metadata Metadata Metadata Local disk Local disk Local disk JARs and config Local disk JARs and config Local disk HDFS HDFS HDFS HDFS SAS IN-MEMORY ANALYTICS Metadata Backups Network disk SAS COMPUTE TIER SAS ENVIRONMENT MANAGER SERVER Environment Manager Server (VMWare vfabric Hyperic Server) 3 rd Party Load Balancer Environment Manager Server (VMWare vfabric Hyperic Server) WARM STANDBY SAS LASR ANALYTIC SERVER SAS OBJECT SPAWNER and IOM SERVERS SAS WEB APPLICATION SERVER MANAGER ROOT LASR Node LASR Node LASR Node #3 LASR Node #4 PARENT-PEER MANAGER Object Spawner PEER Object Spawner SASServer1_1 (Pivotal tc Server) SAS web apps: - SAS Themes - SAS Backup & Recovery - Etc. SASServer1_2 (Pivotal tc Server) SAS web apps: - SAS Themes - SAS Backup & Recovery - Etc. SMP LASR SMP LASR SMP LASR Workspace Server Stored Process Server Pooled Workspace Server Workspace Server Stored Process Server Pooled Workspace Server SASServer2_1 (Pivotal tc Server) SAS web apps: - SAS Studio - SAS Environment Mgr - Etc. SASServer2_2 (Pivotal tc Server) SAS web apps: - SAS Studio - SAS Environment Mgr - Etc. MANAGER ROOT HPAS Node SAS HIGH-PERFORMANCE ANALYTIC SERVER HPAS Node HPAS Node #3 HPAS Node #4 Data Source Network disk SASServer11_1 (Pivotal tc Server) SAS web apps: - SAS Enterprise Miner - SAS Model Manager - Etc. SASServer11_2 (Pivotal tc Server) SAS web apps: - SAS Enterprise Miner - SAS Model Manager - Etc. PEER SAS CACHE LOCATOR PEER Cache Locator (Pivotal GemFire) Cache Locator (Pivotal GemFire) SAS OLAP SERVER SAS WEB INFRASTRUCTURE PLATFORM DATA SERVER PARENT-PEER MANAGER PEER WIP Data Server (postgresql 9.1.9) WIP Data Server (postgresql 9.1.9) WARM STANDBY OLAP Server OLAP Server Live database Local disk pgpool-ii Server (pgpool-ii 3.3.4) Read-only database Local disk SAS JMS BROKER Data Source Network disk JMS Broker (Apache ActiveMQ) JMS Broker (Apache ActiveMQ) WARM STANDBY JMS Broker Common files Network disk Figure 13. Twelve Cluster Technologies Available in SAS 9.4 Clustering the components of the SAS software solution offerings improves the services ability to scale up for greater load, eliminates any single points of failure to improve availability of the services, and in some cases, can do both. 23

24 The techniques for clustering vary from one software component to another as does the considerations around deployment and usage. In many cases, the installation utility for each component of the SAS software can automatically configure individual services to operate as a cluster. However, a few SAS products will require manual deployment and configuration to provide a cluster. ACKNOWLEDGMENTS I would like to thank the following people for taking the time to review and contribute to this paper: Scott McCauley Edoardo Riva Mark Thomas Simon Williams RECOMMENDED READING For more information about the different clustering technologies used by SAS 9.4 software components, read the following. SAS METADATA SERVER SAS 9.4 Intelligence Platform: System Administration Guide, Fourth Edition SAS OBJECT SPAWNER SAS 9.4 Intelligence Platform: Application Server Administration Guide SAS OLAP SERVER SAS 9.4 Intelligence Platform: Application Server Administration Guide SAS WEB APPLICATION SERVER SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition SAS 9.4 Guide to Software Updates Exceptions to Middle-Tier Clustering Support SAS WEB SERVER SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition SAS LASR ANALYTIC SERVER SAS High-Performance Analytics Infrastructure 3.5 Installation and Configuration Guide SAS LASR Analytic Server 2.8: Reference Guide SAS HIGH-PERFORMANCE ANALYTIC SERVER SAS High-Performance Analytics Infrastructure 3.5 Installation and Configuration Guide SAS IN-DATABASE EMBEDDED PROCESS SAS In-Database Products: Administrator s Guide, Eighth Edition SAS CACHE LOCATOR SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition 24

25 SAS JMS BROKER SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition Original ApacheMQ documentation Features > Clustering > MasterSlave > Shared File System Master Slave SAS ENVIRONMENT MANAGER SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition Original VMware vfabric Hyperic documentation vfabric Hyperic Configuration > Configure and Run the Hyperic Server > Clustering Hyperic Servers for Failover ering_hyperic_servers_for_failover.html SAS WEB INFRASTRUCTURE PLATFORM DATA SERVER SAS 9.4 Intelligence Platform: Middle-Tier Administration Guide, Fourth Edition Technical papers on support.sas.com Configuring the SAS Web Infrastructure Platform Data Server for High Availability Managing SAS Web Infrastructure Platform Data Server High-Availability Clusters CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Rob Collum SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 25

SAS Visual Analytics 7.3: Installation and Configuration Guide (Distributed SAS LASR )

SAS Visual Analytics 7.3: Installation and Configuration Guide (Distributed SAS LASR ) SAS Visual Analytics 7.3: Installation and Configuration Guide (Distributed SAS LASR ) SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS Visual

More information

EsgynDB Enterprise 2.0 Platform Reference Architecture

EsgynDB Enterprise 2.0 Platform Reference Architecture EsgynDB Enterprise 2.0 Platform Reference Architecture This document outlines a Platform Reference Architecture for EsgynDB Enterprise, built on Apache Trafodion (Incubating) implementation with licensed

More information

SAS Viya : Architect for High Availability Now and Users Will Thank You Later

SAS Viya : Architect for High Availability Now and Users Will Thank You Later Paper SAS1835-2018 SAS Viya : Architect for High Availability Now and Users Will Thank You Later Jerry Read, SAS Institute Inc. ABSTRACT You have SAS Viya installed and running for your business. Everyone

More information

High-availability services in enterprise environment with SAS Grid Manager

High-availability services in enterprise environment with SAS Grid Manager ABSTRACT Paper 1726-2018 High-availability services in enterprise environment with SAS Grid Manager Andrey Turlov, Allianz Technology SE; Nikolaus Hartung, SAS Many organizations, nowadays, rely on services

More information

SAS Viya 3.2 Administration: SAS Infrastructure Data Server

SAS Viya 3.2 Administration: SAS Infrastructure Data Server SAS Viya 3.2 Administration: SAS Infrastructure Data Server SAS Infrastructure Data Server: Overview SAS Infrastructure Data Server is based on PostgreSQL version 9 and is configured specifically to support

More information

Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ]

Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ] s@lm@n Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ] Question No : 1 Which two updates occur when a client application opens a stream

More information

Using SAS Enterprise Guide with the WIK

Using SAS Enterprise Guide with the WIK Using SAS Enterprise Guide with the WIK Philip Mason, Wood Street Consultants Ltd, United Kingdom ABSTRACT Enterprise Guide provides an easy to use interface to SAS software for users to create reports

More information

Two-Machine Deployment of SAS Office Analytics 7.4

Two-Machine Deployment of SAS Office Analytics 7.4 Two-Machine Deployment of SAS Office Analytics 7.4 SAS Documentation January 8, 2018 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. Two-Machine Deployment of

More information

SAS Viya 3.3 Administration: Licensing

SAS Viya 3.3 Administration: Licensing Viya 3.3 Administration: Licensing Licensing: Overview Viya uses a single licensing file. Both Cloud Analytic Services (CAS) and Foundation use the same. During installation, a is applied to both the CAS

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1

TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1 TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1 ABSTRACT This introductory white paper provides a technical overview of the new and improved enterprise grade features introduced

More information

SAS Viya 3.4 Administration: Licensing

SAS Viya 3.4 Administration: Licensing Viya 3.4 Administration: Licensing Licensing: Overview........................................................................... 1 Where the License File Resides..........................................................

More information

SAS 9.4 Intelligence Platform: Overview, Second Edition

SAS 9.4 Intelligence Platform: Overview, Second Edition SAS 9.4 Intelligence Platform: Overview, Second Edition SAS Documentation September 19, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS 9.4 Intelligence

More information

SAS Viya 3.2 Administration: Monitoring

SAS Viya 3.2 Administration: Monitoring SAS Viya 3.2 Administration: Monitoring Monitoring: Overview SAS Viya provides monitoring functions through several facilities. Use the monitoring system that matches your needs and your environment: SAS

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

SAS Environment Manager 2.1

SAS Environment Manager 2.1 SAS Environment Manager 2.1 User s Guide Second Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2013. SAS Environment Manager 2.1: User's

More information

Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam. Copyright 2014 SEP

Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam.  Copyright 2014 SEP Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam info@sepusa.com www.sepusa.com Table of Contents INTRODUCTION AND OVERVIEW... 3 SOLUTION COMPONENTS... 4-5 SAP HANA... 6 SEP

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

VMware Cloud Application Platform

VMware Cloud Application Platform VMware Cloud Application Platform Jerry Chen Vice President of Cloud and Application Services Director, Cloud and Application Services VMware s Three Strategic Focus Areas Re-think End-User Computing Modernize

More information

Hybrid Backup & Disaster Recovery. Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam

Hybrid Backup & Disaster Recovery. Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam Hybrid Backup & Disaster Recovery Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam 1 Table of Contents 1. Introduction and Overview... 3 2. Solution Components... 3 3. SAP HANA: Data Protection...

More information

powered by Cloudian and Veritas

powered by Cloudian and Veritas Lenovo Storage DX8200C powered by Cloudian and Veritas On-site data protection for Amazon S3-compliant cloud storage. assistance from Lenovo s world-class support organization, which is rated #1 for overall

More information

Maximum Availability Architecture: Overview. An Oracle White Paper July 2002

Maximum Availability Architecture: Overview. An Oracle White Paper July 2002 Maximum Availability Architecture: Overview An Oracle White Paper July 2002 Maximum Availability Architecture: Overview Abstract...3 Introduction...3 Architecture Overview...4 Application Tier...5 Network

More information

Namenode HA. Sanjay Radia - Hortonworks

Namenode HA. Sanjay Radia - Hortonworks Namenode HA Sanjay Radia - Hortonworks Sanjay Radia - Background Working on Hadoop for the last 4 years Part of the original team at Yahoo Primarily worked on HDFS, MR Capacity scheduler wire protocols,

More information

Enhancing VMware Horizon View with F5 Solutions

Enhancing VMware Horizon View with F5 Solutions Enhancing VMware Horizon View with F5 Solutions VMware Horizon View is the leading virtualization solution for delivering desktops as a managed service to a wide range of devices. F5 BIG-IP devices optimize

More information

<Insert Picture Here> Oracle Coherence & Extreme Transaction Processing (XTP)

<Insert Picture Here> Oracle Coherence & Extreme Transaction Processing (XTP) Oracle Coherence & Extreme Transaction Processing (XTP) Gary Hawks Oracle Coherence Solution Specialist Extreme Transaction Processing What is XTP? Introduction to Oracle Coherence

More information

System Specification

System Specification NetBrain Integrated Edition 7.0 System Specification Version 7.0b1 Last Updated 2017-11-07 Copyright 2004-2017 NetBrain Technologies, Inc. All rights reserved. Introduction NetBrain Integrated Edition

More information

Qlik Sense Enterprise architecture and scalability

Qlik Sense Enterprise architecture and scalability White Paper Qlik Sense Enterprise architecture and scalability June, 2017 qlik.com Platform Qlik Sense is an analytics platform powered by an associative, in-memory analytics engine. Based on users selections,

More information

Accelerate Big Data Insights

Accelerate Big Data Insights Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not

More information

SAS Factory Miner 14.2: Administration and Configuration

SAS Factory Miner 14.2: Administration and Configuration SAS Factory Miner 14.2: Administration and Configuration SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS Factory Miner 14.2: Administration

More information

Introduction to Hadoop and MapReduce

Introduction to Hadoop and MapReduce Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

ELASTIC DATA PLATFORM

ELASTIC DATA PLATFORM SERVICE OVERVIEW ELASTIC DATA PLATFORM A scalable and efficient approach to provisioning analytics sandboxes with a data lake ESSENTIALS Powerful: provide read-only data to anyone in the enterprise while

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

More information

VMware View Upgrade Guide

VMware View Upgrade Guide View 4.0 View Manager 4.0 View Composer 2.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for

More information

50 Must Read Hadoop Interview Questions & Answers

50 Must Read Hadoop Interview Questions & Answers 50 Must Read Hadoop Interview Questions & Answers Whizlabs Dec 29th, 2017 Big Data Are you planning to land a job with big data and data analytics? Are you worried about cracking the Hadoop job interview?

More information

Every SAS Cloud has a Silver Lining. Letting your data reign in the cloud

Every SAS Cloud has a Silver Lining. Letting your data reign in the cloud Every SAS Cloud has a Silver Lining Letting your data reign in the cloud DSS SAS SYSTEM Current Single Virtual Server unit with 16 cores upgraded to 32 cores 256 Gb RAM 150 registered users Data collector

More information

Installing and Configuring VMware Identity Manager Connector (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3.

Installing and Configuring VMware Identity Manager Connector (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3. Installing and Configuring VMware Identity Manager Connector 2018.8.1.0 (Windows) OCT 2018 VMware Identity Manager VMware Identity Manager 3.3 You can find the most up-to-date technical documentation on

More information

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security Bringing OpenStack to the Enterprise An enterprise-class solution ensures you get the required performance, reliability, and security INTRODUCTION Organizations today frequently need to quickly get systems

More information

FUJITSU Storage ETERNUS AF series and ETERNUS DX S4/S3 series Non-Stop Storage Reference Architecture Configuration Guide

FUJITSU Storage ETERNUS AF series and ETERNUS DX S4/S3 series Non-Stop Storage Reference Architecture Configuration Guide FUJITSU Storage ETERNUS AF series and ETERNUS DX S4/S3 series Non-Stop Storage Reference Architecture Configuration Guide Non-stop storage is a high-availability solution that combines ETERNUS SF products

More information

Ultra Messaging Queing Edition (Version ) Guide to Queuing

Ultra Messaging Queing Edition (Version ) Guide to Queuing Ultra Messaging Queing Edition (Version 6.10.1) Guide to Queuing 2005-2017 Contents 1 Introduction 5 1.1 UMQ Overview.............................................. 5 1.2 Architecture...............................................

More information

Surveillance Dell EMC Isilon Storage with Video Management Systems

Surveillance Dell EMC Isilon Storage with Video Management Systems Surveillance Dell EMC Isilon Storage with Video Management Systems Configuration Best Practices Guide H14823 REV 2.0 Copyright 2016-2018 Dell Inc. or its subsidiaries. All rights reserved. Published April

More information

Hadoop and HDFS Overview. Madhu Ankam

Hadoop and HDFS Overview. Madhu Ankam Hadoop and HDFS Overview Madhu Ankam Why Hadoop We are gathering more data than ever Examples of data : Server logs Web logs Financial transactions Analytics Emails and text messages Social media like

More information

Maximizing Availability With Hyper-Converged Infrastructure

Maximizing Availability With Hyper-Converged Infrastructure Maximizing Availability With Hyper-Converged Infrastructure With the right solution, organizations of any size can use hyper-convergence to help achieve their most demanding availability objectives. Here

More information

SAS Viya 3.4 Administration: Monitoring

SAS Viya 3.4 Administration: Monitoring SAS Viya 3.4 Administration: Monitoring Monitoring: Overview.......................................................................... 1 Monitoring: Concepts..........................................................................

More information

Configuring Ports for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2

Configuring Ports for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2 Configuring s for Big Data Management, Data Integration Hub, Enterprise Information Catalog, and Intelligent Data Lake 10.2 Copyright Informatica LLC 2016, 2017. Informatica, the Informatica logo, Big

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Server Installation Guide

Server Installation Guide Server Installation Guide Server Installation Guide Legal notice Copyright 2018 LAVASTORM ANALYTICS, INC. ALL RIGHTS RESERVED. THIS DOCUMENT OR PARTS HEREOF MAY NOT BE REPRODUCED OR DISTRIBUTED IN ANY

More information

Managing Complex SAS Metadata Security Using Nested Groups to Organize Logical Roles

Managing Complex SAS Metadata Security Using Nested Groups to Organize Logical Roles Paper 1789-2018 Managing Complex SAS Metadata Security Using Nested Groups to Organize Logical Roles ABSTRACT Stephen Overton, Overton Technologies SAS Metadata security can be complicated to setup and

More information

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework

International Journal of Advance Engineering and Research Development. A Study: Hadoop Framework Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja

More information

A BigData Tour HDFS, Ceph and MapReduce

A BigData Tour HDFS, Ceph and MapReduce A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

WEBSPHERE APPLICATION SERVER

WEBSPHERE APPLICATION SERVER WEBSPHERE APPLICATION SERVER Introduction What is websphere, application server, webserver? WebSphere vs. Weblogic vs. JBOSS vs. tomcat? WebSphere product family overview Java basics [heap memory, GC,

More information

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera, How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS

More information

Introduction to the Active Everywhere Database

Introduction to the Active Everywhere Database Introduction to the Active Everywhere Database INTRODUCTION For almost half a century, the relational database management system (RDBMS) has been the dominant model for database management. This more than

More information

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017 Hadoop File System 1 S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y Moving Computation is Cheaper than Moving Data Motivation: Big Data! What is BigData? - Google

More information

VMware vsphere Big Data Extensions Administrator's and User's Guide

VMware vsphere Big Data Extensions Administrator's and User's Guide VMware vsphere Big Data Extensions Administrator's and User's Guide vsphere Big Data Extensions 1.1 This document supports the version of each product listed and supports all subsequent versions until

More information

Database Assessment for PDMS

Database Assessment for PDMS Database Assessment for PDMS Abhishek Gaurav, Nayden Markatchev, Philip Rizk and Rob Simmonds Grid Research Centre, University of Calgary. http://grid.ucalgary.ca 1 Introduction This document describes

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability

Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability Hyper-Converged Infrastructure: Providing New Opportunities for Improved Availability IT teams in companies of all sizes face constant pressure to meet the Availability requirements of today s Always-On

More information

Intelligence Platform

Intelligence Platform SAS Publishing SAS Overview Second Edition Intelligence Platform The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Intelligence Platform: Overview, Second Edition.

More information

Is Your Data Viable? Preparing Your Data for SAS Visual Analytics 8.2

Is Your Data Viable? Preparing Your Data for SAS Visual Analytics 8.2 Paper SAS1826-2018 Is Your Data Viable? Preparing Your Data for SAS Visual Analytics 8.2 Gregor Herrmann, SAS Institute Inc. ABSTRACT We all know that data preparation is crucial before you can derive

More information

EMC VPLEX Geo with Quantum StorNext

EMC VPLEX Geo with Quantum StorNext White Paper Application Enabled Collaboration Abstract The EMC VPLEX Geo storage federation solution, together with Quantum StorNext file system, enables a global clustered File System solution where remote

More information

@joerg_schad Nightmares of a Container Orchestration System

@joerg_schad Nightmares of a Container Orchestration System @joerg_schad Nightmares of a Container Orchestration System 2017 Mesosphere, Inc. All Rights Reserved. 1 Jörg Schad Distributed Systems Engineer @joerg_schad Jan Repnak Support Engineer/ Solution Architect

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)?

What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)? What is Cloud Computing? What are the Private and Public Clouds? What are IaaS, PaaS, and SaaS? What is the Amazon Web Services (AWS)? What is Amazon Machine Image (AMI)? Amazon Elastic Compute Cloud (EC2)?

More information

1 BRIEF / Oracle Solaris Cluster Features and Benefits

1 BRIEF / Oracle Solaris Cluster Features and Benefits Oracle Solaris Cluster is a comprehensive high availability (HA) and disaster recovery (DR) solution for Oracle SPARC and x86 environments that is based on Oracle Solaris. It combines extreme service availability

More information

What about when it s down? An Application for the Enhancement of the SAS Middle Tier User Experience

What about when it s down? An Application for the Enhancement of the SAS Middle Tier User Experience Paper 11421-2016 What about when it s down? An Application for the Enhancement of the SAS Middle Tier User Experience Christopher Blake, Royal Bank of Scotland ABSTRACT The SAS Web Application Server goes

More information

ScaleArc for SQL Server

ScaleArc for SQL Server Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations

More information

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo Document Sub Title Yotpo Technical Overview 07/18/2016 2015 Yotpo Contents Introduction... 3 Yotpo Architecture... 4 Yotpo Back Office (or B2B)... 4 Yotpo On-Site Presence... 4 Technologies... 5 Real-Time

More information

DriveScale-DellEMC Reference Architecture

DriveScale-DellEMC Reference Architecture DriveScale-DellEMC Reference Architecture DellEMC/DRIVESCALE Introduction DriveScale has pioneered the concept of Software Composable Infrastructure that is designed to radically change the way data center

More information

SAS 9.3 Intelligence Platform

SAS 9.3 Intelligence Platform SAS 9.3 Intelligence Platform Middle-Tier Administration Guide Second Edition SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2012. SAS 9.3 Intelligence

More information

Appendix A - Glossary(of OO software term s)

Appendix A - Glossary(of OO software term s) Appendix A - Glossary(of OO software term s) Abstract Class A class that does not supply an implementation for its entire interface, and so consequently, cannot be instantiated. ActiveX Microsoft s component

More information

The Ins and Outs of Internal and External Host Names with SAS Grid Manager

The Ins and Outs of Internal and External Host Names with SAS Grid Manager Paper SAS1775-2018 The Ins and Outs of Internal and External Host Names with SAS Grid Manager Paula Kavanagh, SAS Institute Inc., Cary, NC ABSTRACT It is common to have network topologies introduce an

More information

Service Manager. Installation and Deployment Guide

Service Manager. Installation and Deployment Guide Service Manager powered by HEAT Installation and Deployment Guide 2017.2 Copyright Notice This document contains the confidential information and/or proprietary property of Ivanti, Inc. and its affiliates

More information

SAS Inventory Optimization 5.1

SAS Inventory Optimization 5.1 SAS Inventory Optimization 5.1 System Administration Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Insitute Inc. 2011. SAS Inventory Optimization 5.1: System

More information

Scaling DreamFactory

Scaling DreamFactory Scaling DreamFactory This white paper is designed to provide information to enterprise customers about how to scale a DreamFactory Instance. The sections below talk about horizontal, vertical, and cloud

More information

SAS Viya 3.3 Administration: Identity Management

SAS Viya 3.3 Administration: Identity Management SAS Viya 3.3 Administration: Identity Management Identity Management Overview................................................................. 2 Getting Started with Identity Management......................................................

More information

Grid Computing in SAS 9.4, Fifth Edition

Grid Computing in SAS 9.4, Fifth Edition Grid Computing in SAS 9.4, Fifth Edition SAS Documentation September 12, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. Grid Computing in SAS 9.4, Fifth

More information

Grid Computing in SAS 9.4

Grid Computing in SAS 9.4 Grid Computing in SAS 9.4 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2013. Grid Computing in SAS 9.4. Cary, NC: SAS Institute Inc. Grid Computing

More information

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures

WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures ASHNIK PTE LTD. White Paper WHITE PAPER NGINX An Open Source Platform of Choice for Enterprise Website Architectures Date: 10/12/2014 Company Name: Ashnik Pte Ltd. Singapore By: Sandeep Khuperkar, Director

More information

SAS Visual Analytics Environment Stood Up? Check! Data Automatically Loaded and Refreshed? Not Quite

SAS Visual Analytics Environment Stood Up? Check! Data Automatically Loaded and Refreshed? Not Quite Paper SAS1952-2015 SAS Visual Analytics Environment Stood Up? Check! Data Automatically Loaded and Refreshed? Not Quite Jason Shoffner, SAS Institute Inc., Cary, NC ABSTRACT Once you have a SAS Visual

More information

SAS Data Explorer 2.1: User s Guide

SAS Data Explorer 2.1: User s Guide SAS Data Explorer 2.1: User s Guide Working with SAS Data Explorer Understanding SAS Data Explorer SAS Data Explorer and the Choose Data Window SAS Data Explorer enables you to copy data to memory on SAS

More information

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014 Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014 Karthik Krishnan Page 1 of 20 Table of Contents Table of Contents... 2 Abstract... 3 What

More information

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30 Paper 50-30 The New World of SAS : Programming with SAS Enterprise Guide Chris Hemedinger, SAS Institute Inc., Cary, NC Stephen McDaniel, SAS Institute Inc., Cary, NC ABSTRACT SAS Enterprise Guide (with

More information

SAS Contextual Analysis 13.2: Administrator s Guide

SAS Contextual Analysis 13.2: Administrator s Guide SAS Contextual Analysis 13.2: Administrator s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2014. SAS Contextual Analysis 13.2: Administrator's

More information

Optimizing Pulse Secure Access Suite with Pulse Secure Virtual Application Delivery Controller solution

Optimizing Pulse Secure Access Suite with Pulse Secure Virtual Application Delivery Controller solution DATASHEET Optimizing Pulse Secure Access Suite with Pulse Secure Virtual Application Delivery Controller solution Features & Benefits Best-in-class VPN and vadc solutions A single point of access for all

More information

Microsoft SQL Server

Microsoft SQL Server Microsoft SQL Server Abstract This white paper outlines the best practices for Microsoft SQL Server Failover Cluster Instance data protection with Cohesity DataPlatform. December 2017 Table of Contents

More information

Coherence & WebLogic Server integration with Coherence (Active Cache)

Coherence & WebLogic Server integration with Coherence (Active Cache) WebLogic Innovation Seminar Coherence & WebLogic Server integration with Coherence (Active Cache) Duško Vukmanović FMW Principal Sales Consultant Agenda Coherence Overview WebLogic

More information

Dept. Of Computer Science, Colorado State University

Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [HADOOP/HDFS] Trying to have your cake and eat it too Each phase pines for tasks with locality and their numbers on a tether Alas within a phase, you get one,

More information

SAS Viya : What It Means for SAS Administration

SAS Viya : What It Means for SAS Administration ABSTRACT Paper SAS0644-2017 SAS Viya : What It Means for SAS Administration Mark Schneider, SAS Institute Inc. Not only does the new SAS Viya platform bring exciting advancements in high-performance analytics,

More information

High-Performance Analytics Infrastructure 2.5

High-Performance Analytics Infrastructure 2.5 SAS High-Performance Analytics Infrastructure 2.5 Installation and Configuration Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2014. SAS High-Performance

More information

Oracle WebLogic Server 11g: Administration Essentials

Oracle WebLogic Server 11g: Administration Essentials Oracle University Contact Us: +33 (0) 1 57 60 20 81 Oracle WebLogic Server 11g: Administration Essentials Duration: 5 Days What you will learn This Oracle WebLogic Server 11g: Administration Essentials

More information

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment By Michael Noel Contents Data Management with SharePoint and Its Challenges...2 Addressing Infrastructure Sprawl

More information

Hadoop. Introduction / Overview

Hadoop. Introduction / Overview Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures

More information

SAS. Installation Guide Fifth Edition Intelligence Platform

SAS. Installation Guide Fifth Edition Intelligence Platform SAS Installation Guide Fifth Edition 9.1.3 Intelligence Platform The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS 9.1.3 Intelligence Platform: Installation

More information

Installing SmartSense on HDP

Installing SmartSense on HDP 1 Installing SmartSense on HDP Date of Publish: 2018-07-12 http://docs.hortonworks.com Contents SmartSense installation... 3 SmartSense system requirements... 3 Operating system, JDK, and browser requirements...3

More information

Identifying Workloads for the Cloud

Identifying Workloads for the Cloud Identifying Workloads for the Cloud 1 This brief is based on a webinar in RightScale s I m in the Cloud Now What? series. Browse our entire library for webinars on cloud computing management. Meet our

More information

Datacenter replication solution with quasardb

Datacenter replication solution with quasardb Datacenter replication solution with quasardb Technical positioning paper April 2017 Release v1.3 www.quasardb.net Contact: sales@quasardb.net Quasardb A datacenter survival guide quasardb INTRODUCTION

More information

Frequently Asked Questions about SAS Environment Manager on SAS 9.4

Frequently Asked Questions about SAS Environment Manager on SAS 9.4 ABSTRACT Paper SAS0575-2017 Frequently Asked Questions about SAS Environment Manager on SAS 9.4 Zhiyong Li, SAS Institute Inc. SAS Environment Manager is the predominant tool for managing your SAS environment.

More information