DriveScale-CLOUDERA Reference Architecture

Table of Contents

1. Executive Summary
2. Audience and Scope
3. DriveScale Cloudera Enterprise Solution Overview
4. DriveScale Components Overview
   4.1 Hardware: DriveScale Adapter Chassis with DriveScale Adapters
   4.2 Software
5. Reference Architecture Details
   5.1 Physical Cluster Component List
   5.2 Logical Cluster Topology
   5.3 Physical Cluster Topology
   5.4 Cluster Management
   5.5 Enabling Hadoop Virtualization Extensions
   5.6 Disk and Filesystem Layout
   5.7 OS Supportability/Compatibility Matrix
   5.8 JBOD Supportability/Compatibility Matrix
6. Rack Scalability
7. References
8. Bill of Materials
9. Conclusion
10. Appendix A: Glossary of Terms
11. Appendix B: DriveScale Cluster Install
   11.1 Configure Your Domains with DriveScale Central (DSC)
   11.2 Set up DMS nodes
   11.3 Set up DriveScale Adapter (DSA)
   11.4 Start the DMS and set up login to the DMS
   11.5 Set up Servers/DataNodes/MasterNodes
   11.6 Tagging JBOD and drives
   11.7 Creating Server Nodes and Clusters from templates
12. Appendix C: Cloudera Manager Install
   12.1 Cloudera Manager Installation Procedure for Reference Architecture

1. Executive Summary

This document is a high-level reference architecture guide for implementing Cloudera Enterprise on a DriveScale solution with industry-standard servers and JBODs. It introduces the high-level hardware and software components in the stack and then describes each component individually. This reference architecture does not describe the Cloudera data components or their applications.

DriveScale Technology Overview

DriveScale is leading the charge in bringing hyperscale computing capabilities to mainstream enterprises. Its composable data center architecture transforms rigid data centers into flexible and responsive scale-out deployments. Using DriveScale, data center administrators can deploy independent pools of commodity compute and storage resources, automatically discover available assets, and combine and recombine these resources as needed. The solution is delivered through a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, Hadoop architects can more easily support Hadoop deployments of any size, as well as other modern application workloads.

DriveScale provides hardware and software technology that allows compute and storage to be deployed separately. Commodity servers carry only a minimal number of drives for the operating system. Data drives live in JBODs (Just a Bunch of Disks) and can be bound to compute resources in any ratio an application requires. As needs change, these bindings can be dissolved and reconfigured on demand, all under software control. DriveScale technology acquires a deep understanding of the physical infrastructure and dynamics of a data center. It uses that understanding to provide an integrated set of intelligence and automation tools that greatly simplify and optimize the operation of scale-out data center infrastructure.

2. Audience and Scope

This reference architecture guide is for Hadoop and IT architects who are responsible for the design and deployment of on-premises Cloudera Enterprise solutions. It is also for Apache Hadoop administrators and architects, and for data center architects and engineers who collaborate with specialists in that space.

3. DriveScale Cloudera Enterprise Solution Overview

The DriveScale Cloudera Enterprise solution is designed to address customers' ever-changing requirements for a more flexible and dynamic hardware infrastructure that provides significant cost and operational benefits. It is designed with composability as the primary goal: saving money, improving utilization, and greatly simplifying the deployment of Hadoop clusters.

Hadoop is an Apache project developed in the Java programming language by a global community of contributors. Yahoo! has been the largest contributor to this project and uses Apache Hadoop extensively across its businesses. Core committers on the Hadoop project include employees from Cloudera, eBay, Facebook, Getopt, Hortonworks, Huawei, IBM, InMobi, INRIA, LinkedIn, MapR, Microsoft, Pivotal, Twitter, UC Berkeley, VMware, WANdisco, and Yahoo!, with contributions from many more individuals and organizations.

Although Hadoop is popular and widely used, installing, configuring, and running a production Hadoop cluster involves many concerns, including:
- Choosing the appropriate Hadoop software distribution and extensions
- Installing monitoring and management software
- Allocating Hadoop services to physical nodes
- Selecting appropriate server hardware
- Right-sizing the storage configuration
- Implementing data locality
- Designing the network fabric
- Sizing and system scalability
- Overall performance

Several factors complicate these concerns:
- The need to understand the workloads that will run on the cluster
- The fast pace of the core Hadoop project
- The challenges of managing a system designed to scale to thousands of nodes in a single cluster

DriveScale designed the DriveScale Cloudera Solution in collaboration with Cloudera. It includes the hardware, software, resources, and services needed to run Hadoop in a production environment. Because the approach is end to end, you can put Hadoop into production in less time than is typically possible with homegrown solutions. The solution is based on the following components:
- Cloudera Enterprise Data Hub 5.x, including CDH (Cloudera's Distribution including Apache Hadoop)
- DriveScale hardware and software
- Industry-standard servers, network switches, and JBODs

This solution includes components that span the entire stack:
- Reference architecture and best practices
- Optimized storage configurations
- Optimized network infrastructure
- Cloudera Enterprise Data Hub, including CDH

This solution is designed to address the majority of Apache Hadoop use cases, including, but not limited to:
- Big data analytics
- ETL offload
- Data warehouse optimization
- Batch processing of unstructured data
- Big data visualization
- Search and predictive analysis

4. DriveScale Components Overview

The DriveScale system is composed of one hardware component and four software components, described below.

4.1 Hardware: DriveScale Adapter Chassis with DriveScale Adapters

This is a 1U appliance with adapters that connect to servers via 10Gb Ethernet interfaces and to JBODs via SAS interfaces.

4.2 Software

There are four principal components of the DriveScale software:

a) DriveScale Management Server (DMS)
The server running the DMS software bundle is called the DMS node. A typical deployment consists of three DMSs in a clustered configuration for high availability (HA). The software manages and configures resources. It also contains the inventory/configuration repository and database:
- Inventory: DMSs, DriveScale Adapters, switches, JBOD chassis, disks, server nodes
- Configuration: node templates, cluster templates, configured clusters
- DMS database: used as a message bus to communicate with the endpoints

b) DriveScale Server Agent
The DriveScale Server Agent's discovery action provides the inventory of hardware and servers. It also creates mappings between server nodes and the disks they consume.

c) DriveScale Central (DSC)
A cloud-based software management portal that acts as the:
- Software distribution repository for subscribers
- DriveScale keys repository

- Centralized log file repository
- User documentation repository
- License manager

d) DriveScale Adapter Firmware
The DriveScale Adapter firmware enables JBOD drives to be mapped to servers over the network and used as local drives.

5. Reference Architecture Details

5.1 Physical Cluster Component List

The following table lists the physical components for the cluster.

Component | Configuration | Description | Quantity
DriveScale Adapter Chassis | DHCP, jumbo frames enabled | 1U appliance with adapters that connect to servers via Ethernet and to JBODs via SAS | 2
DriveScale Adapter | DHCP, jumbo frames enabled | Provides the data network | 4 per chassis
DriveScale Management Server (DMS) | DMS running as a VM | Manages and configures the nodes and cluster; also stores the inventory/configuration repository for all hardware in the cluster | Minimum 1; for HA, 3 DMSs configured as master and slaves
Servers | 2-socket CPU; memory according to the individual Hadoop cluster requirements | Commodity x86 servers that host the NodeManager, compute instances, and DriveScale agents | Minimum 3 master nodes + 5 data nodes
HDD for servers | 2 drives configured in RAID 1 | Internal drives used for the OS install | 2 per server
NICs | Dual-port 10 Gbps Ethernet SFP+ NICs | Provides the data network | 1 per server
JBOD chassis | Default configuration | Houses the drives, with dual I/O controllers | Minimum 2; Cloudera recommends 3 for production environments

HDD for JBOD | Default configuration | Drives that hold the data for the cluster | Depends on the cluster requirements
ToR 10G switch | LLDP, MLAG, 9K jumbo frames configured | Provides data network connectivity | 2 per rack
ToR 1G switch | Default configuration | Provides management network connectivity | 1 per rack

5.2 Logical Cluster Topology

The minimum requirements to build out the cluster are:
- 3 master nodes
- 5 data nodes
- 1 DriveScale Adapter Chassis
- 1 DriveScale Management Server
- 2 10G switches
- 1 1G switch
- 2 JBOD chassis with drives

This reference architecture is built on 3 master nodes and 5 data nodes, with 2 JBOD chassis and 126 HDDs of 1, 2, or 3 TB.

Some clusters need the maximum read bandwidth from every attached drive at the same time. For those clusters, we recommend configuring each node with no more than 8 drives, assuming 2 x 10 Gbps Ethernet bandwidth per node. However, this is an extreme case. The right number of drives per node depends on the application. As a general rule of thumb, it is safe to allocate up to 16 drives per node, again assuming 2 x 10 Gbps Ethernet bandwidth per node. Quad-port 10 Gbps Ethernet adapters give each node significantly more I/O bandwidth, so such nodes can be allocated more drives.

The following table lists the server configurations and the number of drives used.

Component | Configuration | Description | Quantity
Master nodes | 2 sockets, 8-core CPUs, 64 GB RAM, 10GbE Intel NIC, 2 internal HDDs for the OS, and 4 high-capacity HDDs mounted from the JBOD | Master nodes host the Cloudera master services and DriveScale agents | 3

Worker nodes | 2 sockets, 8-core CPUs, 64 GB RAM (minimum 128 GB for Impala nodes), 10GbE Intel NIC, 2 internal HDDs for the OS, and 16 high-capacity HDDs mounted from the JBOD | Data nodes host the HDFS DataNodes, YARN NodeManagers, any additional required services, and DriveScale agents | 5

Notes:
- Customers with higher (or lower) compute needs can acquire bigger (or smaller) data nodes, configured with the CPU and memory that fit the specific requirements of their applications.
- Similarly, depending on the data requirements, customers can add or remove disk drives to match the specific needs of their applications.

The following list identifies the service roles for each node type.
- ZooKeeper: ZooKeeper on all three master nodes
- YARN: ResourceManager on master nodes 1 and 2; History Server on master node 3; NodeManager on the worker nodes
- Hive: Hive Metastore, WebHCat, and HiveServer2 on a master node
- Management (misc), Navigator, HUE: Cloudera Agent on every node; Oozie, Cloudera Manager, Management Services, Navigator, Key Management Services, and HUE on master node 3
- HBase: HMaster on all three master nodes; RegionServer on the worker nodes

- Impala: StateStore and Catalog on a master node; Impala Daemon on the worker nodes
- Search: Solr
- Kafka: Broker
- Spark: History Server on a master node; Spark applications run on YARN on the worker nodes
- HDFS: NameNode and QJN on master nodes 1 and 2; QJN on master node 3; DataNode on the worker nodes

5.3 Physical Cluster Topology

Diagram 1: DriveScale lab architecture with 2 x DSA chassis (8 adapters in use), 2 x JBOD, 3 master nodes, and 5 data nodes

Diagram 2: DriveScale lab architecture with 3 x DSA chassis (12 adapters in use), 2 x JBOD, 3 master nodes, and 5 data nodes

Notes:
- The 10GbE switches, DSA chassis, and JBODs also have 1GbE management connections. These connections are omitted from the diagram for readability.
- The 1GbE connection is used only for server management through the BMC (iDRAC on Dell servers, iLO on HPE servers). It is not part of the Hadoop network. Note that Cloudera does not support multi-homed clusters.
- The SAS connections from DSA chassis 2 to JBOD 2 replicate those from DSA chassis 1 to JBOD 1. They are omitted from the diagram to reduce clutter.
- The drives for the master nodes and data nodes were distributed across the two JBOD chassis.
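The drive-count guidance in section 5.2 follows from simple bandwidth arithmetic. Two 10 Gbps links carry roughly 2,500 MB/s, so 8 drives reading at the same time get about 312 MB/s each, and 16 drives get about 156 MB/s each. The bash sketch below makes that estimate reusable. It is a back-of-envelope calculation under stated assumptions: decimal units, and no Ethernet or protocol overhead. The 300 MB/s per-drive rate in the usage note is a hypothetical figure, not a DriveScale or Cloudera specification.

```bash
#!/usr/bin/env bash
# Back-of-envelope bandwidth estimate for a node that reads all of its
# JBOD drives concurrently. Assumptions: decimal units
# (1 Gbps = 1000 Mbps, 8 bits per byte) and no protocol overhead.

# per_drive_mb_s LINK_GBPS DRIVES
# Prints the MB/s available to each drive (integer, rounded down).
per_drive_mb_s() {
  local link_gbps=$1 drives=$2
  echo $(( link_gbps * 1000 / 8 / drives ))
}

# max_drives LINK_GBPS DRIVE_MB_S
# Prints how many drives can stream at DRIVE_MB_S before the links saturate.
max_drives() {
  local link_gbps=$1 drive_mb_s=$2
  echo $(( link_gbps * 1000 / 8 / drive_mb_s ))
}

per_drive_mb_s 20 8    # 2 x 10 Gbps shared by 8 drives: prints 312
per_drive_mb_s 20 16   # 2 x 10 Gbps shared by 16 drives: prints 156
```

Usage examples:
- `max_drives 20 300` prints 8.
- `max_drives 40 300` prints 16.

Both results are consistent with the observation that quad-port adapters allow more drives per node.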

5.4 Cluster Management

This section details the steps for setting up a DriveScale-enabled Hadoop cluster using Cloudera Manager.

Setting up the DriveScale cluster

Complete the following tasks to set up the DriveScale solution. Do this before installing Cloudera Manager, or before using an existing Cloudera Manager installation:
1. Rack and install the DriveScale Adapter chassis and controllers (DSAs) using the documentation provided by DriveScale.
2. Rack and install the JBODs using the documentation provided by the vendor.
3. Rack and install the servers using the documentation provided by the vendor.
4. Create a RAID 1 volume from the internal HDDs on each server, and install the OS on all the servers.
5. Install and configure the DriveScale Management Server (DMS), either as a VM or on a standalone server.
6. Set up the DSA configuration from the DMS.
7. Install and configure DriveScale agents on the master and data nodes.
8. Create master/data node templates and a cluster template with the required drives using the DMS.
9. Create the cluster from the template using the DMS.
10. Ensure that the DriveScale cluster is up and running before proceeding.

Setting up the Cloudera cluster
1. After completing the steps above, install Cloudera Manager using the Cloudera CDH installation guide.
2. Ensure that the Cloudera HDFS cluster is set up in high availability mode.
3. The following services were set up for this reference architecture:
   - HDFS
   - HBase
   - Hive
   - Hue
   - Impala
   - Kafka
   - Oozie
   - Solr
   - Spark
   - YARN
   - ZooKeeper

4. Ensure that the master and data nodes are up and running, with the correct roles and storage assigned.

5.5 Enabling Hadoop Virtualization Extensions

The DriveScale solution lets you configure a highly available Hadoop cluster that includes rack awareness. Hadoop Virtualization Extensions (HVE) provide additional failure-mitigation and rack-awareness capabilities. They enable the cluster to survive the worst-case scenario of a total power or hardware failure of any component, including a JBOD, for an extended period.

HVE can be enabled in Cloudera Manager by following the Cloudera documentation on HVE. The steps we followed for this reference architecture are listed below.

The master and data nodes in this reference architecture are named as follows:

Node type | Server names
Master nodes | u32.data1.r3.hq.drivescale.com, u33.data1.r3.hq.drivescale.com, u34.data1.r3.hq.drivescale.com
Data nodes | u27.data1.r3.hq.drivescale.com, u28.data1.r3.hq.drivescale.com, u29.data1.r3.hq.drivescale.com, u30.data1.r3.hq.drivescale.com, u31.data1.r3.hq.drivescale.com

1. Go to Cloudera Manager.

a) Configure the following safety valves for your environment:

- HDFS, core-site.xml safety valve:

    <property>
      <name>net.topology.impl</name>
      <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
    </property>
    <property>
      <name>net.topology.nodegroup.aware</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
    </property>

- YARN service, MapReduce Advanced Configuration Snippet (Safety Valve). Add the following properties and values:

    <property>
      <name>mapred.jobtracker.nodegroup.aware</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.task.cache.levels</name>
      <value>3</value>
    </property>

b) Based on the number of JBODs and data nodes, create at least 3 zones, with one or more data nodes in each zone.

Notes:
- If your environment requires a replication factor of 3, you need at least 3 zones when setting up HVE.
- This is because HVE stores at most one replica of each block in any one zone.

c) For this reference architecture, we created 4 zones, each with one or two data nodes. The notes below explain the reasoning.

Notes:
- In this reference architecture, each JBOD has two drawers of drives, and each of the four drawers across the 2 JBODs has two expanders. All the drives in each drawer were tagged using the DMS UI.
- All the nodes in a given zone were composed from drives in the same JBOD drawer.
- The table below lists each node with its JBOD/drawer ID and zone ID.

Node type | Server name | JBOD/Drawer ID | Zone ID
Master node | u32.data1.r3.hq.drivescale.com | J1D1 | 1
Master node | u33.data1.r3.hq.drivescale.com | J1D2 | 2
Master node | u34.data1.r3.hq.drivescale.com | J2D1 | 3
Data node | u27.data1.r3.hq.drivescale.com | J1D1 | 1
Data node | u28.data1.r3.hq.drivescale.com | J1D2 | 2
Data node | u29.data1.r3.hq.drivescale.com | J2D1 | 3
Data node | u30.data1.r3.hq.drivescale.com | J2D2 | 4
Data node | u31.data1.r3.hq.drivescale.com | J2D2 | 4

d) Select Hosts > All Hosts.
e) Select hosts u27 and u32.
f) Click Action: Assign Rack.

g) Assign the rack name /default/zone1.
h) Select hosts u28 and u33.
i) Click Action: Assign Rack.
j) Assign the rack name /default/zone2.
k) Select hosts u29 and u34.
l) Click Action: Assign Rack.
m) Assign the rack name /default/zone3.
n) Select hosts u30 and u31.

o) Click Action: Assign Rack.
p) Assign the rack name /default/zone4.

2. Go back to Cloudera Manager > Hosts to see the new HVE configuration.

5.6 Disk and Filesystem Layout

Node/Role | Filesystem | Description
Management/Master nodes | Ext4 | 1/2/3 TB drives mounted from the JBODs
YARN NodeManager nodes | Ext4 | 1/2/3 TB drives mounted from the JBODs

5.7 OS Supportability/Compatibility Matrix

Operating system | DMS | Server nodes
CentOS/RHEL 6.x | X | X
CentOS/RHEL 7.x | X | X
Ubuntu | X | X
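Before assigning racks in Cloudera Manager as described in section 5.5, it can help to sanity-check the planned HVE zone layout. The bash sketch below runs two checks:
- Every rack path must have the two-level /<rack>/<node group> form that the /default/zoneN names follow and that node-group-aware topology expects.
- The data nodes must span at least as many zones as the HDFS replication factor, because at most one replica of a block is placed in each zone.

Input is "<host> <rack path>" lines. The function names are hypothetical helpers, not part of Cloudera Manager or DriveScale.

```bash
#!/usr/bin/env bash
# Sanity checks for a planned HVE zone layout. Input lines have the form
# "<host> <rack path>", for example:
#   u27.data1.r3.hq.drivescale.com /default/zone1

# Succeeds if the path has exactly two levels: /<rack>/<node group>.
valid_nodegroup_path() {
  [[ $1 =~ ^/[^/]+/[^/]+$ ]]
}

# Reads "<host> <rack path>" lines on stdin and prints the number of
# distinct rack paths (zones). Blank lines are ignored.
count_zones() {
  awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' '
}

# check_layout REPLICATION
# Reads the data nodes' "<host> <rack path>" lines on stdin. Fails if a
# path is malformed or if there are fewer zones than replicas, since the
# node-group placement policy keeps at most one replica per zone.
check_layout() {
  local replication=$1 input host path zones
  input=$(cat)
  while read -r host path; do
    [ -z "$host" ] && continue
    if ! valid_nodegroup_path "$path"; then
      echo "malformed rack path for $host: $path" >&2
      return 1
    fi
  done <<< "$input"
  zones=$(count_zones <<< "$input")
  if (( zones < replication )); then
    echo "only $zones zones for replication factor $replication" >&2
    return 1
  fi
  echo "ok: $zones zones"
}
```

Piping the five data-node assignments from the zone table (u27 through u31 on /default/zone1 through /default/zone4) into `check_layout 3` prints `ok: 4 zones`.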

5.8 JBOD Supportability/Compatibility Matrix

With the DriveScale solution, we recommend that customers use high-capacity JBODs with dual hot-pluggable I/O controllers (expanders) and sufficient upstream bandwidth. The JBODs should also have dual hot-pluggable redundant power supplies. DriveScale has evaluated and tested several vendor offerings for redundancy, management functionality, and performance. The following table lists the JBOD vendors and model numbers certified by DriveScale.

JBOD vendor | Model number and specifications
Dell | PowerVault MD3060e and 2.5", 60 bays, 4U, redundant expanders, 2 x 3 x mini-SAS 6G
Hewlett Packard Enterprise | D , 70 bays, 5U, quad expanders, 4 x 2 x mini-SAS 12G
Hewlett Packard Enterprise | D , 70 bays, 5U, quad expanders, 4 x 2 x mini-SAS 6G
RAID Inc./Newisys | NDS-4600/ , 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 6G
RAID Inc./Newisys | NDS , 24 bays, 2U, redundant expanders, 2 x 3 x mini-SAS 6G
RAID Inc./Newisys | NDS , 90/96 bays, 4U, redundant expanders, 2 x 6 x mini-SAS HD 12G
RAID Inc./Newisys | NDS , 84 bays, 4U, redundant expanders, 2 x 5 x mini-SAS HD 12G
Quanta (QCT) | M6400H - 3.5", 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 6G
Quanta (QCT) | JB , 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 12G
Promise Inc. | J5300s - 3.5", 12 bays, 2U, redundant expanders, 2 x 2 x mini-SAS HD 12G
Promise Inc. | J5320s - 2.5", 24 bays, 2U, redundant expanders, 2 x 2 x mini-SAS HD 12G
Promise Inc. | J , 16 bays, 3U, redundant expanders, 2 x 2 x mini-SAS HD 12G
Promise Inc. | J , 24 bays, 4U, redundant expanders, 2 x 2 x mini-SAS HD 12G

6. Rack Scalability

Customers can scale beyond one rack in a straightforward manner to expand their compute and storage resources as application needs grow. They can change or maintain the compute-to-storage ratio for new racks or for an existing rack. For every JBOD added, a DriveScale Adapter chassis with four DriveScale Adapters (controllers) must be added as well. Because drives are assigned from within a rack to the servers in that rack, scaling is achieved simply by adding racks with servers, DriveScale Adapters, switches, and JBODs.

Diagram 3: DriveScale Rack Scalability

7. References

1. Cloudera Manager Installation Guide
2. Cloudera High Availability documentation
3. High Availability for Other CDH Components
4. Cloudera multihoming support documentation

8. Bill of Materials

Server components | Quantity
Intel Xeon processor-based servers with dual- or quad-port 10GbE SFP+ NICs. The exact CPU models, number of sockets, and memory are based on customer application needs. | Depends on customer application needs

JBOD components | Quantity
DriveScale-certified JBODs | Depends on customer application needs
NL-SAS HDDs | Depends on customer application needs

Switch | Quantity
DriveScale-certified 10GbE SFP+ switches | An even number of switches for a redundant switch fabric
1GBASE-T switch | Based on the number of servers and JBODs in the configuration

DriveScale components | Quantity
DriveScale Adapter Chassis | One for each JBOD
DriveScale Adapter | Four for each DSA chassis

Software | Version
CentOS | 6.7 (see also the OS matrix in section 5.7)
DriveScale Adapter | 1.3
CDH | 5.x
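The BOM ratios above give a quick way to estimate DriveScale component counts as a deployment grows:
- One DriveScale Adapter chassis per JBOD
- Four adapters per chassis
- Two 10G ToR switches per rack (from the component list in section 5.1)

The bash sketch below implements only that arithmetic. It does not size servers, drives, or 1G switches, which depend on application needs. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough DriveScale component counts from the ratios in the BOM:
#   - one DriveScale Adapter chassis per JBOD
#   - four DriveScale Adapters per chassis
#   - two 10G ToR switches per rack (component list, section 5.1)

# bom_counts JBODS RACKS -> prints "chassis=<n> adapters=<n> tor10g=<n>"
bom_counts() {
  local jbods=$1 racks=$2
  local chassis=$jbods
  local adapters=$(( chassis * 4 ))
  local tor10g=$(( racks * 2 ))
  echo "chassis=$chassis adapters=$adapters tor10g=$tor10g"
}

bom_counts 2 1   # prints: chassis=2 adapters=8 tor10g=2
```

The example output for 2 JBODs in one rack matches the lab configuration in Diagram 1, which has two DSA chassis with eight adapters in use.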

9. Conclusion

The DriveScale-Cloudera reference architecture guide provides an overview of the combined solution and the key components it employs. It also details how to install and set up clusters using these technologies.

10. Appendix A: Glossary of Terms

Term | Description
Data Node | Worker nodes of the cluster to which the HDFS data is written.
DSA | DriveScale Adapter. A 1RU Ethernet-to-SAS adapter. It bridges the 10 Gbps Ethernet network that connects compute resources to JBODs full of commodity disks.
DSC | DriveScale Central. A web-based user interface to the DriveScale cloud that performs DriveScale account management. In DSC you download the keys that enable installation of the DriveScale software, and then set up your DriveScale Management Domain(s) (DMDs). There you create your domain, select and configure the DMS nodes for the domain, and select a chassis (with its associated DriveScale Adapters, or DSAs) for the domain.
DMS | DriveScale Management Server. The server that runs the bundle of software (service) that manages a set of physical resources to enable the DriveScale services. DriveScale Manager is the web-based user interface to the DMS.
HBA | Host bus adapter. An I/O controller used to interface a host with storage devices.
HDD | Hard disk drive.
HDFS | Hadoop Distributed File System.
High Availability | Configuration that addresses availability issues in a cluster. In a standard configuration, the Name Node is a single point of failure (SPOF). Each cluster has a single Name Node; if that machine or process becomes unavailable, the cluster is unavailable until the Name Node is restarted or brought up on a new host. The secondary Name Node does not provide failover capability. High availability enables running two Name Nodes in the same cluster: the active Name Node and the standby Name Node. The standby Name Node allows a fast failover to a new Name Node in case of a machine crash or planned maintenance.
JBOD | Just a bunch of disks. A JBOD chassis hosts many HDDs and two redundant SAS switches (also called controllers). The SAS switches provide dual-path access to each of the HDDs in the chassis through multiple mini-SAS HD interface connectors.

Term | Description
Job History Server | Process that archives job metrics and metadata. One per cluster.
MLAG | Multi-chassis Link Aggregation. The ability of two or more switches to act like a single switch when forming link bundles.
Name Node | The metadata master of HDFS, essential for the integrity and proper functioning of the distributed filesystem.
NIC | Network interface card.
Node Manager | The process that starts application processes and manages resources on the Data Nodes.
NUMA | Non-uniform memory access. Addresses variable memory access latency in multi-socket servers. This is typical of SMP (symmetric multiprocessing) systems, and there are several strategies to optimize applications and operating systems for it. vSphere ESXi can be optimized for NUMA. It can also present the NUMA architecture to the virtualized guest OS, which can then leverage it to optimize memory access; this is called vNUMA.
PDU | Power distribution unit.
QJM | Quorum Journal Manager. Provides a fencing mechanism for high availability in a Hadoop cluster. This service distributes HDFS edit logs from the active Name Node to multiple hosts (at least three are required). The standby Name Node reads the edits from the Journal Nodes and constantly applies them to its own namespace. In case of a failover, the standby Name Node applies all the edits from the Journal Nodes before promoting itself to the active state.
QJN | Quorum Journal Nodes. Nodes on which the journal services are installed.
RM | Resource Manager. The resource management component of YARN. It initiates application startup and controls scheduling on the Data Nodes of the cluster (one active instance per cluster).
ToR | Top of rack.
ZK | ZooKeeper. A centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services.

11. Appendix B: DriveScale Cluster Install

Notes:
- You must complete the racking and cabling of the servers, DSAs, JBODs, and switches as described in the installation guide.
- You must obtain the DSC credentials, which are shipped with the hardware.
- You must decide whether to set up your DMS as a single standalone server or as a high-availability cluster of three servers. With three servers, the Management Domain can survive the failure of any one DMS machine.
- The DMS servers should be configured with at least GB of memory.
- You also need:
  - A DHCP server on the 10G/1G network
  - Access to the DHCP administrator/server to obtain the IP address(es) of the DSA(s) from their MAC address(es)
  - The network address(es) of your DMS server(s)
  - The network address(es) of your DSA(s)
  - The network addresses of your compute servers

11.1 Configure Your Domains with DriveScale Central (DSC)

1. Log in to DSC with the credentials obtained from DriveScale.
a) Go to the DSC portal.
b) Log in using the credentials provided to you by DriveScale.
c) A checklist of the tasks to be completed appears on the main DSC page.

2. Go to the Domains link in the left navigation panel and click Create Domain.
3. Enter the name, FQDN, and any notes for the domain, then click Create.
4. Go to the Downloads link in the left navigation panel and download the config.training, ds-dmskeys-xxx.rpm, and ds-repo-xxx.rpm files to your local machine.

11.2 Set up DMS nodes

1. On each DMS machine, copy and then install the repo and keys RPM packages. Copy the downloaded files with WinSCP, or with scp if you are using a Linux machine:

    scp ds-* root@x.x.x.x:/tmp       # on your local machine
    rpm -ivh /tmp/ds-repo-*          # on the DMS machine
    rpm -ivh /tmp/ds-dms*            # on the DMS machine

2. On each DMS machine, copy the downloaded config.training file to /etc/drivescale/conf, again using WinSCP or scp:

    scp config.training root@x.x.x.x:/tmp            # on your local machine
    cp /tmp/config.training /etc/drivescale/conf     # on the DMS machine

3. Install the DMS server on all the DMS machines with yum, which installs it from the DriveScale repository:

    yum -y install dms-ds

11.3 Set up DriveScale Adapter (DSA)

1. Log in to one of the DMS machines using ssh.
2. Set up the DriveScale Management Domain configuration for each DSA, using the same config file that was used on the DMSs (config.training in this example).
3. Use the /opt/drivescale/bin/dsa command, which is installed on each DMS, for this setup. The default DSA username and password are admin/admin.
4. Check the current configuration and settings of the DSA:

    /opt/drivescale/bin/dsa --username admin --password admin --adapter <management IP of the DSA> service showconf

5. Verify the IP address of the bonded 10 Gbps interface:

    /opt/drivescale/bin/dsa --username admin --password admin --adapter <management IP of the DSA> net show --interface 10gBond

6. If you have only the FQDN of the DSA, verify its management IP, gateway, and related settings:

    /opt/drivescale/bin/dsa --username admin --password admin --adapter <DSA management FQDN> net show --interface mgmt

7. Push the configuration from the DMS to the DSA:

    /opt/drivescale/bin/dsa --username admin --password admin --adapter <management IP of the DSA> service config --file /tmp/config.training

8. Restart the DSA:

    /opt/drivescale/bin/dsa --username admin --password admin --adapter <management IP of the DSA> service restart
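When a domain has several DSAs, the dsa invocations in the steps above differ only in the adapter address and the subcommand. The bash sketch below wraps them so the binary path and credentials are set once. DSA_BIN, DSA_USER, and DSA_PASS are hypothetical variable names introduced here. Their defaults are the path and admin/admin credentials given above. Only the dsa subcommands shown in this section are used.

```bash
#!/usr/bin/env bash
# Wrapper around the dsa CLI from section 11.3. DSA_BIN, DSA_USER and
# DSA_PASS are hypothetical variable names; the defaults are the path and
# default credentials given in the text.

# dsa_cmd ADAPTER ARGS...  runs the dsa CLI against one adapter.
dsa_cmd() {
  local adapter=$1
  shift
  "${DSA_BIN:-/opt/drivescale/bin/dsa}" \
    --username "${DSA_USER:-admin}" \
    --password "${DSA_PASS:-admin}" \
    --adapter "$adapter" "$@"
}

# configure_dsas CONFIG_FILE ADAPTER...
# Pushes CONFIG_FILE to each adapter and then restarts it; stops at the
# first command that fails.
configure_dsas() {
  local conf=$1 adapter
  shift
  for adapter in "$@"; do
    dsa_cmd "$adapter" service config --file "$conf" || return 1
    dsa_cmd "$adapter" service restart || return 1
  done
}
```

For example, `configure_dsas /tmp/config.training <DSA 1 management IP> <DSA 2 management IP>` pushes the configuration to both adapters and restarts each one. Prefixing the call with `DSA_BIN=echo` prints the commands instead of running them, which gives a convenient dry run.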

11.4 Start the DMS and set up login to the DMS

1. Log in to one of the DMS machines using ssh.
2. Start the DriveScale service:

    service drivescale start

3. Put the DMS in SET_UP_MODE:

    /opt/drivescale/bin/setup-mode

4. Log in to the DMS UI using the FQDN or IP address of the DMS.
5. Create a username and password for the first user.
6. To use internal authentication, provide the username for the initial admin (superuser), then enter and confirm the password. The first name and last name are optional.
7. Click Configure DMS to create the admin user on the DMS and to configure the authentication method.

11.5 Set up Servers/DataNodes/MasterNodes

Notes: The DSA's two 10GbE ports are bonded into a single interface, so you need to bond the two 10GbE ports on the server nodes as well. Create the bond configuration on each server. The example below uses the Debian/Ubuntu /etc/network/interfaces syntax; on CentOS/RHEL, the equivalent settings go in ifcfg files under /etc/sysconfig/network-scripts. bond0 itself carries no address: it is configured as manual and attached to the vmbr1 bridge, which holds the IP address.

    auto bond0
    iface bond0 inet manual
        slaves eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2
        pre-up ifconfig eth2 mtu 9000 && ifconfig eth3 mtu 9000
        mtu 9000

    auto vmbr1
    iface vmbr1 inet static
        address <IP for the 20G bond interface>
        netmask <netmask>
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
        pre-up ifconfig eth2 mtu 9000
        pre-up ifconfig eth3 mtu 9000
        pre-up ifconfig bond0 mtu 9000
        mtu 9000
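On CentOS/RHEL servers, the same bond is normally defined in ifcfg files under /etc/sysconfig/network-scripts. The bash sketch below prints equivalent files for bond0 and its two member ports, using these settings from the example:
- 802.3ad mode
- 100 ms MII monitoring
- layer2 hash policy
- 9000-byte MTU

Unlike the example, it assigns the address directly to bond0 rather than reproducing the vmbr1 bridge. The interface names eth2 and eth3 come from the example; the address and prefix are values you supply.

```bash
#!/usr/bin/env bash
# Print CentOS/RHEL ifcfg files for an 802.3ad bond of eth2 and eth3 with
# jumbo frames. The IP address is assigned to bond0 directly (no bridge).

# ifcfg_bond IPADDR PREFIX -> contents of ifcfg-bond0
ifcfg_bond() {
  local ipaddr=$1 prefix=$2
  cat <<EOF
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer2"
BOOTPROTO=none
ONBOOT=yes
IPADDR=$ipaddr
PREFIX=$prefix
MTU=9000
EOF
}

# ifcfg_slave DEVICE -> contents of ifcfg-DEVICE for a bond0 member port
ifcfg_slave() {
  local dev=$1
  cat <<EOF
DEVICE=$dev
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
MTU=9000
EOF
}
```

To use it, write each function's output to the matching file:
- `ifcfg_bond 192.0.2.10 24 > /etc/sysconfig/network-scripts/ifcfg-bond0`
- `ifcfg_slave eth2 > /etc/sysconfig/network-scripts/ifcfg-eth2`
- `ifcfg_slave eth3 > /etc/sysconfig/network-scripts/ifcfg-eth3`

The address 192.0.2.10/24 is a documentation placeholder.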

1. On each server machine, copy and then install the repo RPM package. Copy the downloaded file with WinSCP, or with scp if you are using a Linux machine:

    scp ds-* root@x.x.x.x:/tmp       # on your local machine
    rpm -ivh /tmp/ds-repo-*          # on the server

2. Install the server software on all the server machines with yum, which installs the DriveScale server agent from the repository:

    yum -y install ds-server

3. Start the DriveScale service:

    service drivescale start

4. Remove the temporary ds-repo*, ds-dms*, and /tmp/config.training files from each of the server and DMS machines.

11.6 Tagging JBOD and drives

1. Use the API documentation to build the JBOD tagging script.
2. Create a script that tags the drives from each JBOD with a different tag.
3. Below is an example of the script, which you can run from the DMS server:

    #!/bin/bash
    # One entry per drive to tag; each entry is used as the curl request URL.
    list="5000c500560df c ef7 5000c ff 5000c50040b201db 5000c dc2f 5000c50040b c50034ee c500035d7eb3 5000c500033aa c500033aaf9b 5000c500350a41bb 5000c50055a592db 5000c500033ae8f3 5000c50040e9fcbf 5000c50040ab7d c a 5000c50040b230fb 5000c50040a6cf3b 5000c50040aa27f3 5000c50040b2aa77"

    for api in $list
    do
        # PATCH the drive's tags; -k skips TLS certificate verification.
        curl -k -u '<username>:<password>' -X PATCH \
            --header 'Content-Type: application/vnd.drivescale.v2+json' \
            --header 'Accept: application/json' \
            -d '{"tags": ["6000_1"]}' \
            "$api"
        sleep 1
    done

11.7 Creating Server Nodes and Clusters from templates

Creating Node and Cluster Template
1. Connect to the DMS UI. In the left-hand panel, navigate to the Composer section and click Node template.
2. In the top right corner, select Create template, fill in the template name and other details, and click Save.
3. Create 3 templates for master nodes and 5 for data nodes.

Notes: Create a DataNode or MasterNode template for each node with the following minimum requirements:

Template | Drives | Use drives with all these tags | Exclude drives with any of these tags
d1 | 16 | 6000_1 | 6000_2, 6020_1, 6020_2
d2 | 16 | 6000_2 | 6000_1, 6020_1, 6020_2
d3 | 16 | 6020_1 | 6000_1, 6000_2, 6020_2

31 Data Nodes d4/d5 m1 m2 m3 Minimum requirements Drives: 16 Use drives with all these tags: 6020_2 Exclude drives with any of these tags: 6000_2 6000_1 6020_1 Drives: 16 Use drives with all these tags: 6000_1 Exclude drives with any of these tags: 6000_2 6020_1 6020_2 Drives: 16 Use drives with all these tags: 6000_2 Drives: 16 Use drives with all these tags: 6020_1 Customers can change the disk, RPM, CPU or RAM according to their cluster requirements. 31 of 55
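Each data node template includes exactly one JBOD tag and excludes the other three, which pins that node's drives to a single JBOD. The sketch below checks that property for one template's include and exclude tags. check_template is a hypothetical helper and is not part of the DriveScale tooling. The m2 and m3 templates list no exclusions, so the check only applies to the templates that do. For example, an exclusion list that repeats 6000_1 and omits 6000_2 fails the check.

```shell
#!/bin/bash
# Sketch: verify that a node template's exclusion list is exactly "every JBOD tag
# except the one it includes". check_template is a hypothetical helper.

ALL_TAGS="6000_1 6000_2 6020_1 6020_2"

# check_template NAME INCLUDE_TAG "EXCLUDE_TAGS"
# Returns 0 when EXCLUDE_TAGS equals ALL_TAGS minus INCLUDE_TAG (order ignored).
check_template() {
  local name="$1" include="$2" exclude="$3" expected actual
  # Expected exclusions: all tags minus the included one, one per line, sorted.
  expected=$(printf '%s\n' $ALL_TAGS | grep -vxF -- "$include" | sort)
  # Actual exclusions from the template, one per line, sorted (duplicates kept).
  actual=$(printf '%s\n' $exclude | sort)
  if [[ "$expected" != "$actual" ]]; then
    echo "$name: expected exclusions [$(echo $expected)], got [$(echo $actual)]" >&2
    return 1
  fi
}

# Example, using the d1 row of the table:
# check_template d1 6000_1 "6000_2 6020_1 6020_2"
```

Running the check for every template before building the cluster template can catch a mistyped tag early. The drive count and the tags themselves still need to be checked in the DMS UI.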


4. From the left-hand panel, navigate to the Composer section and click on Cluster template.
5. In the top right corner, select Create template and enter CDH_CERT_TEMPLATE as the template name. Click on Add new node type(s) and select all the data node and master node templates.
6. For the newly created cluster template, click on Edit and set Min/Max instances to 1 for each of the data node and master node templates created earlier.

7. Select Save after editing the Min/Max instances for all the node templates.

Creating Cluster Template

1. From the left-hand panel, navigate to the Composer section and click on Cluster template.
2. In the top right corner, select Create template and fill in the details: use CDH_CERT_TEMPLATE as the template name, with Data Nodes (mounted with 16 disks) and 3 Master Nodes (mounted with 4 disks), based on Cluster Template CD.
3. Select Create.

DriveScale Cluster Verification

1. From the left-hand panel, navigate to the Explorer section and click on Logical. Verify the cluster status.
2. Click on the Details tab in the top right corner.
3. Select the Cluster_CD cluster and check its details.
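Before you start the Cloudera Manager install that follows, you can spot-check some of its host prerequisites from the shell. The sketch below is illustrative only. The function names are hypothetical, and it covers only the FQDN, /etc/hosts and SELinux items. NTP and iptables state are easier to check with the OS tools, for example `ntpstat` and `service iptables status` on RHEL/CentOS 6.

```shell
#!/bin/bash
# Sketch: pre-flight checks for a Cloudera Manager host. Function names are hypothetical.

# Succeeds if the name looks fully qualified (contains a dot, no trailing dot).
is_fqdn() {
  local name="$1"
  [[ "$name" == *.* && "$name" != *. ]]
}

# Succeeds if a hosts file maps some address to the given name.
# Lines that start with '#' are ignored.
hosts_has_entry() {
  local file="$1" name="$2"
  awk -v n="$name" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit !found }
  ' "$file"
}

# Succeeds if an SELinux config file sets SELINUX=disabled.
selinux_disabled_in() {
  local file="$1"
  grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$file"
}
```

On the Cloudera Manager host, you might run `is_fqdn "$(hostname -f)" && hosts_has_entry /etc/hosts "$(hostname -f)" && selinux_disabled_in /etc/selinux/config && echo OK`. Keeping each check as a small function makes it easy to run the same checks on every node over ssh.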

12. Appendix B: Cloudera Manager Install

Notes:
- Cloudera Manager was installed on Master Node 3 (u34.data1.r3.hq.drivescale.com).
- Make sure NTP is installed and configured, and that time is synchronized across all nodes.
- Make sure iptables is stopped and SELinux is disabled.
- Make sure the OS has Internet access so that it can download the Cloudera packages.
- Make sure every node has its correct FQDN set as HOSTNAME in /etc/sysconfig/network.
- Make sure /etc/hosts on the Cloudera Manager host has an entry as follows: u34.data1.r3.hq.drivescale.com

12.1 Cloudera Manager Installation Procedure for Reference Architecture

1. SSH to Master Node 3 and start the Cloudera Manager installation:

    # ssh to u34:
    [root@u34 ~]# yum install wget
    [root@u34 ~]# wget
    [root@u34 ~]# chmod u+x cloudera-manager-installer.bin
    [root@u34 ~]# sudo ./cloudera-manager-installer.bin

   Accept the installation and licensing prompts.
2. Open a web browser and log in with the default credentials admin/admin.

3. Accept the license agreement.
4. Select Cloudera Enterprise (60-day trial) or upload the Cloudera Enterprise license.

5. Click on Continue.
6. Discover all the nodes by entering their FQDNs or IP addresses and clicking on Search.

7. Select all the hosts for the cluster installation and click on Continue.
8. Select the preferred repository installation method, the CDH version and any additional parcels, and click on Continue.

9. Select the Install Oracle Java SE Development Kit (JDK) option and click on Continue.
10. Single User Mode was not enabled for this cluster. Click on Continue.

11. Enter the root credentials used to connect to the nodes and click on Continue.
12. Wait for the cluster installation to complete.

Notes:
If the package installation fails, remove the following package entries from the RPM database on the nodes:

    rpm -e --nodeps --justdb glibc-common el6.x86_64 --allmatches
    rpm -e --nodeps --justdb glibc el6.x86_64 --allmatches
    rpm -e --nodeps --justdb gdbm el6.x86_64 --allmatches

If the nodes run an older Java version, move Java from version 1.6 to 1.7:

    [root@dn ~]# java -version
    [root@dn ~]# cd /usr/java ; rm -f latest ; ln -s /usr/java/jdk1.7.0_67-cloudera latest

Transparent Huge Page compaction can cause significant performance problems on all nodes. To disable it, run the following commands. Add the same commands to an init script such as /etc/rc.local so that the setting persists across reboots:

    [root@dn ~]# echo never > /sys/kernel/mm/transparent_hugepage/defrag ; echo never > /sys/kernel/mm/transparent_hugepage/enabled

13. Wait for the parcel installation to complete.
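The Transparent Huge Page note above asks for the same commands to be added to an init script. Below is a minimal sketch of doing that idempotently. The function names are hypothetical. THP_DIR defaults to the path used in this document; some RHEL/CentOS 6 kernels expose /sys/kernel/mm/redhat_transparent_hugepage instead, so override it there.

```shell
#!/bin/bash
# Sketch: disable Transparent Huge Pages now and persist the setting in an
# init script such as /etc/rc.local. Function names are hypothetical.

# Path used in this document; override THP_DIR for kernels that use another directory.
THP_DIR="${THP_DIR:-/sys/kernel/mm/transparent_hugepage}"

# Apply the setting immediately (requires root).
disable_thp_now() {
  local knob
  for knob in defrag enabled; do
    echo never > "$THP_DIR/$knob"
  done
}

# Append the equivalent commands to an init script, skipping any line that is
# already present, so running this more than once adds no duplicates.
persist_thp_disable() {
  local rcfile="$1" knob line
  for knob in defrag enabled; do
    line="echo never > $THP_DIR/$knob"
    grep -qxF -- "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
  done
}
```

As root on each node, you might run `disable_thp_now && persist_thp_disable /etc/rc.local`. The exact-line match (`grep -x -F`) is what makes repeated runs safe.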

14. Run the host inspector to verify the packages and host configuration. Click on Continue after it completes successfully.

15. Select the services to install on the nodes. For this setup, a custom set of services was selected.

16. Assign the hosts that will run the HBase, HDFS, Hue, Hive and other service roles.

17. This reference architecture uses the embedded database. Copy the usernames and passwords for all the databases. Click on Test Connection and make sure it succeeds, then click on Continue. Note that Cloudera recommends using an external database for production environments.

18. Review all the setup changes and make sure all the details are correct. Click on Continue.



19. After the installation completes, go to the main page and enable the services as described below. All the actions are performed in Cloudera Manager (CM) > Cluster1.
a) Enable ZooKeeper HA (select the 3 master nodes) and start the roles.
b) Enable HDFS HA and select 2 master nodes for the NameNode and JournalNode roles.

Notes: Open HDFS permissions for HBase on all master nodes:

    ~]# sudo -u hdfs /opt/cloudera/parcels/cdh cdh p0.41/bin/hadoop fs -chmod 777 /

Initialize the PostgreSQL database:

    [root@u34 ~]# service postgresql initdb
    [root@u34 ~]# /etc/init.d/postgresql start

PostgreSQL policy:

    [root@u34 ~]# cat /var/lib/pgsql/data/postgresql.conf | grep -e listen -e
    [root@u34 ~]# vi /var/lib/pgsql/data/postgresql.conf
    listen_addresses = '*'
    standard_conforming_strings = off
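You can also apply the two postgresql.conf edits above without an interactive editor. In the sketch below, set_pg_param is a hypothetical helper. It rewrites an existing setting (commented out or not), or appends the setting if it is absent. It assumes GNU sed for in-place editing. PostgreSQL still has to be restarted afterwards, as in the restart command in the JDBC step.

```shell
#!/bin/bash
# Sketch: set a postgresql.conf parameter non-interactively (assumes GNU sed).

# set_pg_param FILE NAME VALUE
# Replaces an existing "NAME = ..." line (commented out or not) with "NAME = VALUE",
# or appends that line if the parameter does not appear in the file.
set_pg_param() {
  local file="$1" name="$2" value="$3"
  local pattern="^[[:space:]]*#?[[:space:]]*${name}[[:space:]]*="
  if grep -Eq "$pattern" "$file"; then
    sed -i -E "s|${pattern}.*|${name} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$name" "$value" >> "$file"
  fi
}

# Example: the settings used in this reference architecture.
# set_pg_param /var/lib/pgsql/data/postgresql.conf listen_addresses "'*'"
# set_pg_param /var/lib/pgsql/data/postgresql.conf standard_conforming_strings off
```

The value is inserted verbatim, so string settings such as listen_addresses keep their single quotes.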

    [root@u34 ~]# vi /var/lib/pgsql/data/pg_hba.conf
    host all all md5

Install the JDBC driver:

    [root@u34 ~]# yum install postgresql-jdbc
    [root@u34 ~]# mkdir /usr/lib/hive/lib/
    [root@u34 ~]# ln -s /usr/share/java/postgresql-jdbc.jar /usr/lib/hive/lib/postgresql-jdbc.jar
    [root@u34 ~]# /etc/init.d/postgresql restart

c) HIVE
- Create a PostgreSQL user for Hive (user: hive, pass: hive). Connect to the PostgreSQL database:

    [root@u34 ~]# sudo -u postgres psql
    postgres=# CREATE USER hive WITH PASSWORD 'hive';
    postgres=# CREATE DATABASE hive;
    postgres=# GRANT ALL PRIVILEGES ON DATABASE hive TO hive;

  Test the connection:

    [root@u34 ~]# psql -h u34.data1.r3.hq.drivescale.com -U hive -d hive

- Go to CM -> Cluster1 -> HIVE -> Configuration -> Metastore DB and change the default port from 7432 to 5432. DB user: hive, DB pass: hive.

- Go to CM -> Cluster1 -> Hive
  Action: Create Metastore Database
  Action: Start all Hive services

d) OOZIE
- Create a PostgreSQL user for Oozie (user: oozie, pass: oozie). Connect to the PostgreSQL database:

    [root@u34 ~]# sudo -u postgres psql
    postgres=# CREATE USER oozie WITH PASSWORD 'oozie';
    postgres=# CREATE DATABASE oozie;
    postgres=# GRANT ALL PRIVILEGES ON DATABASE oozie TO oozie;

  Test the connection:

    [root@u34 ~]# psql -h u34.data1.r3.hq.drivescale.com -U oozie -d oozie

- Go to CM -> Cluster1 -> OOZIE -> Configuration -> OOZIE DB and change the default port from 7432 to 5432. DB user: oozie, DB pass: oozie.

- Go to CM -> Cluster1 -> OOZIE
  Action: Create OOZIE Database
  Action: Start all OOZIE services

e) HUE
- Create a PostgreSQL user for Hue (user: hue, pass: hue). Connect to the PostgreSQL database:

    [root@u34 ~]# sudo -u postgres psql
    postgres=# CREATE USER hue WITH PASSWORD 'hue';
    postgres=# CREATE DATABASE hue;
    postgres=# GRANT ALL PRIVILEGES ON DATABASE hue TO hue;

  Test the connection:

    [root@u34 ~]# psql -h u34.data1.r3.hq.drivescale.com -U hue -d hue

- Go to CM -> Cluster1 -> HUE -> Configuration -> HUE DB and change the default port from 7432 to 5432. DB user: hue, DB pass: hue.
- Go to CM -> Cluster1 -> HUE
  Action: Sync Database
  Action: Start all HUE services
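The Hive, Oozie and Hue steps repeat the same three SQL statements with a different name each time. The sketch below generates them with a hypothetical helper, service_db_sql, so they can be piped into psql as the postgres user. As in this reference setup, the password defaults to the service name unless you pass a second argument.

```shell
#!/bin/bash
# Sketch: generate the per-service PostgreSQL statements used for Hive, Oozie and Hue.

# service_db_sql SERVICE [PASSWORD]
# Prints CREATE USER / CREATE DATABASE / GRANT statements for one service.
# The password defaults to the service name, matching this reference setup.
# A password containing a single quote would need that quote doubled.
service_db_sql() {
  local svc="$1" pass="${2:-$1}"
  printf "CREATE USER %s WITH PASSWORD '%s';\n" "$svc" "$pass"
  printf 'CREATE DATABASE %s;\n' "$svc"
  printf 'GRANT ALL PRIVILEGES ON DATABASE %s TO %s;\n' "$svc" "$svc"
}

# Usage (as root on the PostgreSQL host):
# for svc in hive oozie hue; do service_db_sql "$svc" | sudo -u postgres psql; done
```

The connection tests and the CM configuration changes are still done per service, as described in the steps above.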

f) SOLR
- Go to CM -> Cluster1 -> SOLR
  Action: Initialize Solr
  Action: Start all Solr services

g) SPARK
- Go to CM -> Cluster1 -> SPARK
  Action: Install Spark JAR
  Action: Create Spark History Log Directory
  Action: Start all Spark services

h) Make sure all the services are up and running for the cluster.
- Go to CM -> Cluster1 -> Services and Hosts Status

DriveScale, Inc
1230 Midas Way, Suite 210
Sunnyvale, CA
Main: +1(408)
www.drivescale.com


More information

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Version 4.11 Last Updated: 1/10/2018 Please note: This appliance is for testing and educational purposes only;

More information

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme VIRT1445BU Extreme Performance: Fast Virtualized Hadoop and Spark on All-Flash Disks VMworld 2017 Dave Jaffe, Performance Engineering, VMware Justin Murray, Technical Marketing, VMware Content: Not for

More information

HDP Security Overview

HDP Security Overview 3 HDP Security Overview Date of Publish: 2018-07-15 http://docs.hortonworks.com Contents HDP Security Overview...3 Understanding Data Lake Security... 3 What's New in This Release: Knox... 5 What's New

More information

HDP Security Overview

HDP Security Overview 3 HDP Security Overview Date of Publish: 2018-07-15 http://docs.hortonworks.com Contents HDP Security Overview...3 Understanding Data Lake Security... 3 What's New in This Release: Knox... 5 What's New

More information

Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server

Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server White Paper Achieve Optimal Network Throughput on the Cisco UCS S3260 Storage Server Executive Summary This document describes the network I/O performance characteristics of the Cisco UCS S3260 Storage

More information

Dell Technologies IoT Solution Surveillance with Genetec Security Center

Dell Technologies IoT Solution Surveillance with Genetec Security Center Dell Technologies IoT Solution Surveillance with Genetec Security Center Surveillance December 2018 H17436 Sizing Guide Abstract The purpose of this guide is to help you understand the benefits of using

More information

VMware vsphere Big Data Extensions Command-Line Interface Guide

VMware vsphere Big Data Extensions Command-Line Interface Guide VMware vsphere Big Data Extensions Command-Line Interface Guide vsphere Big Data Extensions 1.1 This document supports the version of each product listed and supports all subsequent versions until the

More information

CCA-410. Cloudera. Cloudera Certified Administrator for Apache Hadoop (CCAH)

CCA-410. Cloudera. Cloudera Certified Administrator for Apache Hadoop (CCAH) Cloudera CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Download Full Version : http://killexams.com/pass4sure/exam-detail/cca-410 Reference: CONFIGURATION PARAMETERS DFS.BLOCK.SIZE

More information

Twelve Cluster Technologies Available in SAS 9.4

Twelve Cluster Technologies Available in SAS 9.4 ABSTRACT Paper SAS415-2017 Twelve Cluster Technologies Available in SAS 9.4 Rob Collum, SAS Institute Inc. We are always looking for ways to improve the performance, efficiency, and availability of our

More information

Storage Manager 2018 R1. Installation Guide

Storage Manager 2018 R1. Installation Guide Storage Manager 2018 R1 Installation Guide Notes, Cautions, and Warnings NOTE: A NOTE indicates important information that helps you make better use of your product. CAUTION: A CAUTION indicates either

More information

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com Hortonworks Data Platform : Security Administration Tools Guide Copyright 2012-2014 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop,

More information

Dell EMC Ready System for VDI on XC Series

Dell EMC Ready System for VDI on XC Series Dell EMC Ready System for VDI on XC Series Citrix XenDesktop for Dell EMC XC Series Hyperconverged Appliance March 2018 H16969 Deployment Guide Abstract This deployment guide provides instructions for

More information

McAfee Boot Attestation Service 3.5.0

McAfee Boot Attestation Service 3.5.0 Product Guide McAfee Boot Attestation Service 3.5.0 For use with epolicy Orchestrator 4.6.7, 4.6.8, 5.1.0 Software COPYRIGHT Copyright 2014 McAfee, Inc. Do not copy without permission. TRADEMARK ATTRIBUTIONS

More information

How to Install and Configure Big Data Edition for Hortonworks

How to Install and Configure Big Data Edition for Hortonworks How to Install and Configure Big Data Edition for Hortonworks 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Dell EMC Ready Architecture for Red Hat OpenStack Platform

Dell EMC Ready Architecture for Red Hat OpenStack Platform Dell EMC Ready Architecture for Red Hat OpenStack Platform Cumulus Switch Configuration Guide Version 13 Dell EMC Service Provider Solutions ii Contents Contents List of Tables...iii Trademarks... iv Notes,

More information

Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance O R A C L E W H I T E P A P E R O C T O B E R

Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance O R A C L E W H I T E P A P E R O C T O B E R Migrating VMs from VMware vsphere to Oracle Private Cloud Appliance 2.3.1 O R A C L E W H I T E P A P E R O C T O B E R 2 0 1 7 Table of Contents Introduction 2 Environment 3 Install Coriolis VM on Oracle

More information

Release Notes for Cisco Application Policy Infrastructure Controller Enterprise Module, Release x

Release Notes for Cisco Application Policy Infrastructure Controller Enterprise Module, Release x Release s for Cisco Application Policy Infrastructure Controller Enterprise Module, Release 1.3.3.x First Published: 2017-02-10 Release s for Cisco Application Policy Infrastructure Controller Enterprise

More information

Polarion 18.2 Enterprise Setup

Polarion 18.2 Enterprise Setup SIEMENS Polarion 18.2 Enterprise Setup POL005 18.2 Contents Overview........................................................... 1-1 Terminology..........................................................

More information

Cisco Unified Provisioning Manager 2.2

Cisco Unified Provisioning Manager 2.2 Cisco Unified Provisioning Manager 2.2 General Q. What is Cisco Unified Provisioning Manager (UPM)? A. Cisco Unified Provisioning Manager is part of the Cisco Unified Communications Management Suite. Cisco

More information

Implementing Multi-Chassis Link Aggregation Groups (MC-LAG)

Implementing Multi-Chassis Link Aggregation Groups (MC-LAG) Implementing Multi-Chassis Link Aggregation Groups (MC-LAG) HPE Synergy Virtual Connect SE 40Gb F8 Module and Arista 7050 Series Switches Technical white paper Technical white paper Contents Introduction...

More information

Installation and Cluster Deployment Guide for KVM

Installation and Cluster Deployment Guide for KVM ONTAP Select 9 Installation and Cluster Deployment Guide for KVM Using ONTAP Select Deploy 2.7 March 2018 215-12976_B0 doccomments@netapp.com Updated for ONTAP Select 9.3 Table of Contents 3 Contents

More information

PRODUCT DOCUMENTATION. Backup & Replication v5.0. User Guide.

PRODUCT DOCUMENTATION. Backup & Replication v5.0. User Guide. PRODUCT DOCUMENTATION User Guide Backup & Replication v5.0 www.nakivo.com Table of Contents Solution Architecture... 4 Deployment...11 System Requirements... 12 Deployment Scenarios... 15 Installing NAKIVO

More information

Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Servers

Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Servers Lenovo Big Data Reference Architecture for Hortonworks Data Platform Using System x Last update: 12 December 2017 Version 1.1 Configuration Reference Number: BDAHWKSXX62 Describes the RA for Hortonworks

More information