Deployment Guide: IBM BigInsights with IBM Spectrum Scale and Ambari


March 28, 2016
Version 1.2

Written for:
Apache Ambari V2.1
IBM BigInsights V4.1
IBM Open Platform with Apache Hadoop V4.1
IBM Spectrum Scale V4.1.1 and up

Contents

Figures and Tables
1. Overview
   1.1 Installation options
   1.2 Package download
       IBM Spectrum Scale Hadoop Connector
       IBM Spectrum Scale Ambari integration module
   1.3 A new cluster setup
2. Known limitations
3. Preparing the environment
   3.1 Validating the network
   3.2 Set up password-less SSH for root
   3.3 Preparing the environment for IOP
4. Dependencies
   4.1 Software packages
   4.2 Kernel RPMs
5. Set up the Yum repositories
   5.1 Ambari and IOP mirror repositories
   5.2 The IBM Spectrum Scale Yum repository
   5.3 OS Repository
6. Ambari installation
   6.1 Install the Ambari-Server RPM
   6.2 Install the IBM Spectrum Scale Ambari integration module
   6.3 Setting up the Ambari server
   6.4 Starting the Ambari server
7. Install IOP with IBM Spectrum Scale using Ambari
   7.1 Before you begin
   7.2 Ambari Wizard
   7.3 Create a cluster
       Welcome Screen / Cluster Name / Select Stack / Install Options / Confirm Hosts /
       Choose Services / Assign Masters / Assign Slaves and Clients / Customize Services
   7.4 Starting deployment
   7.5 IBM Spectrum Scale deployment modes
       Deploy IOP over existing IBM Spectrum Scale file system (FPO)
       Deploy IOP over existing IBM Spectrum Scale file system (ESS)
       Additional steps for deploying IOP over existing IBM Spectrum Scale file system (FPO or ESS)
       Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
   7.6 Verify and Test Installation
Appendix
   A. Preparing a stanza File
   B. IBM Spectrum Scale-FPO Deployment
      Disk-partitioning algorithm / Failure Group selection rules / Rack Mapping File /
      Partitioning Function Matrix in Automatic Deployment
   C. Dual-network deployment
      Two network adapters, configured with different sub-network addresses
      Two network adapters, configured with the same sub-network addresses
   D. BigInsights Value Add Services on IBM Spectrum Scale
      Troubleshooting Value Add Services
   E. Node management
      Add Node / Remove Node
   F. Upgrade IBM Spectrum Scale to Latest PTF
   G. Upgrade the IBM Spectrum Scale Ambari integration module
   H. IBM Spectrum Scale UI
   I. Collecting the snap data
   J. HTTPS/REST API
   K. Resources
FAQ
Notices

Figures and Tables

Figure 1 Ambari login
Figure 2 Ambari welcome page
Figure 3 Ambari cluster name
Figure 4 Ambari select stack
Figure 5 Ambari install options host list
Figure 6 Ambari confirm hosts
Figure 7 Ambari choose services
Figure 8 Ambari assign masters
Figure 9 Ambari assign slaves and clients
Figure 10 Ambari customize service IOP tabs
Figure 11 Ambari IBM Spectrum Scale customize services standard and advanced settings
Figure 12 Ambari IBM Spectrum Scale custom services advanced list
Figure 13 Ambari IBM Spectrum Scale data and metadata replicas
Figure 14 Ambari deployment review
Figure 15 Ambari IBM Spectrum Scale Hadoop local cache file stanza
Figure 16 Ambari NSD stanza
Figure 17 Ambari rack mapping
Figure 18 Ambari hosts panel
Figure 19 Ambari hosts GPFS node components
Figure 20 Ambari hosts actions
Figure 21 Ambari hosts actions delete host
Figure 22 Ambari upgrade IBM Spectrum Scale
Figure 23 Ambari dashboard actions stop all
Figure 24 Ambari dashboard add services
Figure 25 Ambari upgrade choose services
Figure 26 Ambari add service wizard
Figure 27 Ambari assign nodes - Hadoop Connector + IBM Spectrum Scale node
Figure 28 Ambari customize services verification
Figure 29 Ambari review panel
Figure 30 Ambari after upgrade dashboard
Figure 31 Ambari collect snap data

Table 1 Hadoop Connector and Ambari integration module
Table 2 Ambari and IOP repository packages
Table 3 IBM Spectrum Scale editions
Table 4 IBM Spectrum Scale checklist parameters
Table 5 IBM Spectrum Scale partitioning function matrix

1. Overview

This document describes the installation and configuration of the IBM BigInsights Open Platform with Apache Hadoop stack on the IBM Spectrum Scale file system by using the Apache Ambari framework.

The IBM Open Platform with Apache Hadoop (IOP) supports the Hadoop Distributed File System (HDFS). IBM Spectrum Scale, formerly known as IBM General Parallel File System (IBM GPFS), is also supported for customers who require advanced capabilities such as a POSIX-compliant file system, information lifecycle management, incremental backups, high-performance replication, and FIPS-140 / NIST-compliant encryption.

With Ambari, the system administrator can provision, manage, and monitor a Hadoop cluster. Ambari can also start and stop IBM Spectrum Scale services on all the nodes in the cluster and report basic status information through the Ambari web user interface (UI).

1.1 Installation options

You can install IBM Spectrum Scale in one of the following ways:

- Install IBM Spectrum Scale as part of the Ambari IOP installation. With this method, a new IBM Spectrum Scale File Placement Optimizer (FPO) file system is deployed and configured. IOP is then installed and configured to use IBM Spectrum Scale instead of HDFS. This procedure is the basic installation, based on best-practice policies for a big data cluster on IBM Spectrum Scale.
- Install IBM Spectrum Scale manually before installing Ambari IOP. During installation of IOP, the pre-created IBM Spectrum Scale file system is detected and only the Hadoop integration components for IBM Spectrum Scale are deployed. The installer installs and configures the Hadoop workload on top of the existing IBM Spectrum Scale without any validation checking of the preexisting IBM Spectrum Scale configuration. The existing installation can be FPO or shared storage, such as an Elastic Storage Server (ESS) installation. This installation procedure is intended for advanced users.
- Add a node to an existing Ambari IOP cluster. Ambari adds nodes and installs the IBM Spectrum Scale software onto the existing IBM Spectrum Scale cluster, such as an ESS configuration, but does not create any Network Shared Disks (NSDs) or add NSDs into the existing file system.

You can view the current best practices for installation on the IBM Spectrum Scale wiki, on the General Parallel File System (GPFS) References page.

In all cases, a local repository for IBM Spectrum Scale is required. Ambari reads from the repository to deploy IBM Spectrum Scale, if it is not already created, and the following Hadoop integration components:

Module: IBM Spectrum Scale Hadoop Connector
Description: Provides an implementation of the Hadoop FileSystem API, thereby enabling Hadoop applications to use IBM Spectrum Scale as the distributed file system through either the IBM Spectrum Scale (gpfs:///) or HDFS (hdfs:///) URI scheme.

Module: IBM Spectrum Scale Ambari integration module
Description: Enables basic administration of IBM Spectrum Scale within the Ambari console. When installed, an IBM Spectrum Scale service appears in the Ambari interface instead of the standard HDFS service. For a list of limitations, see Known limitations.

TABLE 1 HADOOP CONNECTOR AND AMBARI INTEGRATION MODULE

1.2 Package download

IBM Spectrum Scale Hadoop Connector

The IBM Spectrum Scale Hadoop connector is installed independently of IBM Spectrum Scale and is provided as an RPM file. The Hadoop connector supports both IBM Spectrum Scale ESS and IBM Spectrum Scale FPO. Download the IBM Spectrum Scale Hadoop Connector from the IBM Spectrum Scale wiki (References: Hadoop Connector Download). The module name is hadoop-gpfs-connector-(version).

WARNING: Saving this package in /root/ can cause installation problems.

Note: There are two types of connectors: 1st Generation and 2nd Generation. Only the 1st Generation connector is supported; the 2nd Generation connector does not have any Ambari support. The 2nd Generation connector will be supported by the end of 2Q. Support for the 2nd Generation connector will be announced on the wiki.

The Ambari-based installer attempts to detect a pre-existing Hadoop connector on each node. If one is found, the installer does not overwrite or re-deploy the connector. If the IBM Spectrum Scale Hadoop Connector is not detected on any node, the installer deploys the IBM Spectrum Scale Hadoop Connector RPM file on all nodes from the IBM Spectrum Scale repository.

IBM Spectrum Scale Ambari integration module

The IBM Spectrum Scale Ambari integration module is independent of IBM Spectrum Scale and is provided as a separate RPM file. The module supports both IBM Spectrum Scale ESS and IBM Spectrum Scale FPO deployments.
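As a quick illustration of the two URI schemes from Table 1, the same Hadoop shell commands address the IBM Spectrum Scale file system under either scheme once the connector is deployed. A minimal sketch; the paths are illustrative and assume the user directory already exists:

   # Write through the explicit IBM Spectrum Scale scheme
   hadoop fs -put /tmp/sample.txt gpfs:///user/hadoop/sample.txt
   # Read the same data back through the HDFS-compatible scheme
   hadoop fs -cat hdfs:///user/hadoop/sample.txt
   # Scheme-less paths resolve through the default file system in core-site.xml
   hadoop fs -ls /user/hadoop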

For traditional Hadoop clusters that use HDFS, an HDFS service appears in the Ambari console to provide a graphical management interface for the HDFS configuration (hdfs-site.xml) and the Hadoop cluster itself (core-site.xml). Through the Ambari HDFS service, you can start and stop the HDFS service, make configuration changes, and propagate those changes across the cluster. IBM Spectrum Scale replaces HDFS, and the Ambari HDFS service is no longer used. The IBM Spectrum Scale Ambari integration module, provided as an RPM, creates an Ambari IBM Spectrum Scale service to start and stop IBM Spectrum Scale and make configuration changes.

Download the IBM Spectrum Scale Ambari integration module from the IBM Spectrum Scale wiki (References: BigInsight Enterprise Manager). The module name is gpfs.ambari-iop_4.1-(version).

WARNING: Saving this package in /root/ can cause installation problems.

1.3 A new cluster setup

The installation process attempts to detect an existing IBM Spectrum Scale file system. For IBM Spectrum Scale FPO, which is a multi-node, just-a-bunch-of-disks (JBOD) configuration, the installer can set up a new clustered file system if the hostnames of all the nodes and the disk devices available at each node are provided through a stanza file. The installer designates manager roles and quorum nodes and creates the NSDs and the file system. The best practices for the Hadoop configuration are implemented automatically.

2. Known limitations

The following are the known limitations and workarounds.

Note: This is an iterative document. For the latest version of this document, see the IBM Spectrum Scale wiki (References: BigInsight Enterprise Manager).

1. Upgrading Ambari is not supported.
2. The WebHDFS protocol is not supported. As a workaround, the HTTPFS protocol can be used; the HTTPFS setup is required if you want to use Big R. Ambari File View does not work because it requires WebHDFS. The alternatives include:
   - SMB export for Windows Explorer access
   - NFS export for NFS clients

   - A tool such as WinSCP, which has file browser capability and can be used to upload files
   All of these alternatives exploit the POSIX compliance of IBM Spectrum Scale.
3. While deploying IOP over an existing IBM Spectrum Scale cluster, the IBM Spectrum Scale cluster must be started and the file system must be mounted on all the nodes before starting the Ambari deployment.
4. If you cannot connect to the Ambari server through the web browser, check whether the following message is displayed in the Ambari server log, which is located in /var/log/ambari-server:
   WARN [main] AbstractConnector:335 - insufficient threads configured for SelectChannelConnector@0.0.0.0:8080
   The size of the thread pool can be increased to match the number of CPUs on the node where the Ambari server is running. For example, if you have 160 CPUs, add the following properties to /etc/ambari-server/conf/ambari.properties:
   server.execution.scheduler.maxthreads=160
   agent.threadpool.size.max=160
   client.threadpool.size.max=160
5. Only IBM Spectrum Scale Version 4.1.1 and later, up to and including IBM Spectrum Scale Version 4.2, is supported. Ambari supports the automatic installation of IBM Spectrum Scale Version 4.1.1 and later. After the IBM Spectrum Scale base packages are installed by Ambari, the IBM Spectrum Scale PTF packages must be upgraded manually. See Upgrade IBM Spectrum Scale to Latest PTF.
6. After adding and removing nodes from Ambari, some aspects of the IBM Spectrum Scale configuration, such as the pagepool value shown by the mmlsconfig command, are not refreshed until the next restart of the IBM Spectrum Scale Ambari service. However, this does not impact functionality.
7. For CentOS, create the /etc/redhat-release file to simulate a Red Hat environment. Otherwise, the Ambari deployment will fail. For example:
   # cat /etc/redhat-release
   Red Hat Enterprise Linux Server release 6.6 (Santiago)
8. The Big SQL uninstallation fails when IBM Spectrum Scale is used. Workaround: Create the following symbolic link on all nodes:
   # ln -s /var/lib/ambari-agent/cache/stacks/biginsights/4.1.spectrumscale/services/bigsql \
        /var/lib/ambari-agent/cache/stacks/biginsights/4.1/services/bigsql

9. If Symphony is installed, the Symphony fix pack sym-7.1-build391507 is required.
10. While adding a new node to a cluster on IBM Spectrum Scale with Ambari, you must add all the services. Attempts to add individual services, such as Sqoop or a Big SQL worker, after adding the node fail because of the dependency on the HDFS client.

3. Preparing the environment

3.1 Validating the network

When using a private network for Hadoop data nodes, ensure that all nodes, including the management nodes, have hostnames bound to the faster internal network. On all nodes, hostname -f must return the FQDN on the faster internal network. This network can be a bonded network. If a node does not return this FQDN, modify /etc/sysconfig/network and use the hostname command to change the FQDN of the node. If the nodes in your cluster have two network adapters, see Dual-network deployment.

3.2 Set up password-less SSH for root

The IBM Spectrum Scale Master is a special role designated to the node from which Ambari issues IBM Spectrum Scale commands. Password-less SSH for root from the IBM Spectrum Scale Master node to all other IBM Spectrum Scale nodes must be configured. In most cases, the IBM Spectrum Scale Master node is the Ambari server node. If the IBM Spectrum Scale Master node is the Ambari server node, the password-less SSH requirement is met when you perform the IOP pre-install tasks. If the IBM Spectrum Scale Master node is not the Ambari server node, set up password-less SSH from the IBM Spectrum Scale Master node to all other nodes. A quick verification is sketched below.
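Before continuing, you can verify the requirement with a non-interactive SSH probe from the IBM Spectrum Scale Master node. A minimal sketch, assuming a hypothetical host list file /tmp/gpfs_nodes.txt with one FQDN per line:

   for h in $(cat /tmp/gpfs_nodes.txt); do
       # BatchMode=yes makes ssh fail instead of prompting for a password
       ssh -o BatchMode=yes -o ConnectTimeout=5 root@$h hostname \
           || echo "password-less SSH to $h is NOT configured"
   done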

3.3 Preparing the environment for IOP

Before installing IBM Open Platform (IOP), pre-installation tasks must be performed on each node. These pre-installation tasks are the same whether you install Hadoop with HDFS or with IBM Spectrum Scale.

1. Perform the steps described in Preparing Your Environment in the BigInsights documentation.
2. Pre-create the Hadoop service IDs and groups according to the BigInsights documentation. If you are using LDAP, create the IDs and groups on the LDAP server and ensure that all nodes can authenticate the users. If you are using local IDs, the IDs must be pre-created on all nodes. While Ambari can create service IDs and groups during installation, pre-creating them is not optional for IBM Spectrum Scale deployments because Ambari does not guarantee consistent UIDs and GIDs (see the JIRA AMBARI issue). This is not critical for HDFS, but it is for IBM Spectrum Scale because IBM Spectrum Scale is a kernel-level file system. For example:

   groupadd --gid 1000 hadoop
   groupadd --gid 1016 rddcached   # optionally align rddcached GID with UID
   useradd -g hadoop -u 1001 ams
   useradd -g hadoop -u 1002 hive
   useradd -g hadoop -u 1003 oozie
   useradd -g hadoop -u 1004 ambari-qa
   useradd -g hadoop -u 1005 flume
   useradd -g hadoop -u 1006 hdfs
   useradd -g hadoop -u 1007 solr
   useradd -g hadoop -u 1008 knox
   useradd -g hadoop -u 1009 spark
   useradd -g hadoop -u 1010 mapred
   useradd -g hadoop -u 1011 hbase
   useradd -g hadoop -u 1012 zookeeper
   useradd -g hadoop -u 1013 sqoop
   useradd -g hadoop -u 1014 yarn
   useradd -g hadoop -u 1015 hcat
   useradd -g rddcached -u 1016 rddcached   # optionally align rddcached GID with UID
   useradd -g hadoop -u 1017 kafka
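Because identical UIDs and GIDs are required on every node, one convenient approach is to save the commands above in a script and push it to all nodes over the password-less SSH configured earlier. A minimal sketch, assuming a hypothetical script /tmp/create_ids.sh containing exactly the commands above and the same /tmp/gpfs_nodes.txt host list used before:

   for h in $(cat /tmp/gpfs_nodes.txt); do
       scp /tmp/create_ids.sh root@$h:/tmp/
       ssh root@$h "bash /tmp/create_ids.sh"
       # Spot-check one ID; the uid/gid line must be identical on every node
       ssh root@$h "id hdfs"
   done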

4. Dependencies

4.1 Software packages

This section lists only the dependencies for Ambari and IBM Spectrum Scale. The dependencies for Hadoop are not listed in this section. The installer must be able to access a RHEL repository from every node of the Hadoop cluster. A failure when executing the yum install <RPM> command causes the overall installation process to fail.

The following packages must be installed on the Ambari server node:
   postgresql
   postgresql-server
   postgresql-libs

The following packages must be installed on all IBM Spectrum Scale nodes:
   ksh
   libstdc++
   libstdc++-devel
   compat-libstdc++ (x86_64 only; not needed for ppc64/ppc64le)
   kernel
   kernel-devel
   gcc-c++
   imake (x86_64 only; not needed for ppc64/ppc64le)
   make

The following recommended packages can be installed on all nodes:
   acl, libacl (to enable Hadoop ACL support)
   libattr (to enable Hadoop extended attributes)

Some of these packages are installed by default during the operating system installation.

4.2 Kernel RPMs

Check the kernel RPMs that are installed. Unlike HDFS, IBM Spectrum Scale is a kernel-level file system and, as such, integrates tightly with the operating system. This is a critical dependency. Ensure that the environment has matching kernel, kernel-devel, and kernel-headers packages.

1. On all nodes, check the kernel RPMs that are installed:
   rpm -qa | grep kernel
2. On all nodes, confirm that the output includes the following:
   kernel-headers
   kernel-devel

   kernel

   If any of the kernel RPMs are missing, install them with the yum install command:
   yum -y install kernel-headers kernel-devel

3. Validate that all of the kernel RPM versions match.

WARNING: Kernels can be updated after the original operating system installation. Ensure that the active kernel version matches the installed version of both kernel-devel and kernel-headers.

   [root@mn01-dat ~]# uname -r
   <kernel version>.el7.ppc64le    <== find kernel-devel and kernel-headers to match this
   [root@mn01-dat ~]# rpm -qa | grep kernel
   kernel-<version>.el7.ppc64le
   kernel-tools-libs-<version>.el7.ppc64le
   kernel-devel-<version>.el7.ppc64le    <== kernel-devel matches
   kernel-bootwrapper-<version>.el7.ppc64le
   kernel-headers-<version>.el7.ppc64le    <== kernel-headers matches
   kernel-tools-<version>.el7.ppc64le

If multiple kernels are installed, check to ensure that only one instance each of kernel, kernel-headers, and kernel-devel is installed. If older kernel packages are installed, remove them.

5. Set up the Yum repositories

IBM Open Platform (IOP) and BigInsights support installation by reading from the IBM-hosted Yum repositories or from local mirror repositories. Reading from local mirror repositories is faster for multi-node clusters because each node performs its own download of repository code. IBM Spectrum Scale supports installation only through a local repository.

5.1 Ambari and IOP mirror repositories

1. Obtain the appropriate tarballs for the IBM Open Platform repository and Ambari packages, based on the operating system of the cluster. Only the operating systems and the hardware listed in the repository are supported. Use either wget or curl -O to get the tarballs.

Linux x86-64 (RHEL 6)
   Ambari: ibm-open-platform.ibm.com/repos/ambari/rhel/6/x86_64/2.1.x/ga/2.1/ambari-<version>.el6.x86_64.tar.gz
   IOP: iop-<version>.el6.x86_64.tar.gz
   IOP-Utils: iop-utils-<version>.el6.x86_64.tar.gz

Linux x86-64 (RHEL 7)
   Ambari: ibm-open-platform.ibm.com/repos/ambari/rhel/7/x86_64/2.1.x/ga/2.1/ambari-<version>.el7.x86_64.tar.gz
   IOP: iop-<version>.el7.x86_64.tar.gz
   IOP-Utils: iop-utils-<version>.el7.x86_64.tar.gz

Power Linux LE (RHEL 7)
   Ambari: ibm-open-platform.ibm.com/repos/ambari/rhel/7/ppc64le/2.1.x/ga/2.1/ambari-<version>.el7.ppc64le.tar.gz
   IOP: iop-<version>.el7.ppc64le.tar.gz
   IOP-Utils: iop-utils-<version>.el7.ppc64le.tar.gz

TABLE 2 AMBARI AND IOP REPOSITORY PACKAGES

Note: If you are using a Windows system to download the files, you can also open the URLs in a web browser and proceed to download the files. You can then transfer the files to the system that will host the mirror repository files.

2. Select a server to act as the mirror repository server. This server requires the installation of the Apache HTTP server or a similar HTTP server. Every node in the Hadoop cluster must be able to access this repository server. The mirror server can be defined in the DNS, or you can add an entry for the mirror server in /etc/hosts on each node of the cluster.

a) Create an HTTP server on the mirror repository server, such as Apache httpd. If Apache httpd is not already installed, install it with the yum install httpd command. You can start Apache httpd with either of the following commands:
   apachectl start
   or
   service httpd start
   Optional: Ensure that the HTTP server starts automatically on reboot:
   chkconfig httpd on
b) Ensure that any firewall settings allow inbound HTTP access from your cluster nodes to your mirror web server.
c) On the mirror repository server, create a directory for your repositories, such as <document root>/repos. For Apache httpd with document root /var/www/html, type the following command:
   mkdir -p /var/www/html/repos
d) Test your local repository by browsing to the web directory. This example uses RHEL 7.1:

   # rpm -qa | grep httpd
   httpd-tools-<version>.el7_1.1.x86_64
   httpd-<version>.el7_1.1.x86_64
   # service httpd start
   # service httpd status
   Redirecting to /bin/systemctl status httpd.service
   httpd.service - The Apache HTTP Server
      Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
      Active: active (running) since Tue <date> 14:37:06 EDT; 2 weeks 2 days ago
     Process: 6270 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
    Main PID: 3984 (httpd)
      Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
      CGroup: /system.slice/httpd.service
              ├─ /usr/sbin/httpd -DFOREGROUND
              ├─ /usr/sbin/httpd -DFOREGROUND
              ├─ /usr/sbin/httpd -DFOREGROUND

              ├─ /usr/sbin/httpd -DFOREGROUND
              └─ /usr/sbin/httpd -DFOREGROUND

   Oct 20 14:37:06 c902mnp08 httpd[3984]: [Tue Oct 20 14:37:06] [so:warn] [pid 3984] AH01574: module rewrite_module is already loaded, skipping
   Oct 20 14:37:06 c902mnp08 httpd[3984]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name...
   Oct 20 14:37:06 c902mnp08 systemd[1]: Started The Apache HTTP Server.
   Oct 25 03:44:01 c902mnp08 systemd[1]: Reloading The Apache HTTP Server.
   Oct 25 03:44:02 c902mnp08 httpd[50951]: [Sun Oct 25 03:44:02] [so:warn] [pid 50951] AH01574: module rewrite_module is already loaded, skipping
   Oct 25 03:44:02 c902mnp08 systemd[1]: Reloaded The Apache HTTP Server.
   Nov 02 03:06:01 c902mnp08 systemd[1]: Reloading The Apache HTTP Server.
   Nov 02 03:06:01 c902mnp08 httpd[6270]: [Mon Nov 02 03:06:01] [so:warn] [pid 6270] AH01574: module rewrite_module is already loaded, skipping
   Nov 02 03:06:02 c902mnp08 systemd[1]: Reloaded The Apache HTTP Server.
   Hint: Some lines were ellipsized, use -l to show in full.

   # systemctl enable httpd

3. Log in to the mirror repository server as root and extract the Ambari and IOP repository tarballs into the repos directory under <document root> (for example, /var/www/html/repos). For each of the tarballs downloaded in the previous step, run the following commands:
   cd /var/www/html/repos
   tar xzvf <path to downloaded tarball>

The result should be three subdirectories under /var/www/html/repos, one for each extracted tarball. This example uses RHEL 7.1.

IOP
   # cd /var/www/html/repos
   # tar xzvf iop-<version>.el7.x86_64.tar.gz
   IOP/rhel/7/x86_64/4.1.x/GA/<version>/
   IOP/rhel/7/x86_64/4.1.x/GA/<version>/kafka/
   IOP/rhel/7/x86_64/4.1.x/GA/<version>/kafka/noarch/
   ...

IOP-UTILS
   # cd /var/www/html/repos
   # tar xzvf iop-utils-<version>.el7.x86_64.tar.gz
   IOP-UTILS/
   IOP-UTILS/rhel/
   IOP-UTILS/rhel/7/
   IOP-UTILS/rhel/7/x86_64/
   IOP-UTILS/rhel/7/x86_64/1.1/
   ...

Ambari
   # cd /var/www/html/repos

   # tar xzvf ambari-<version>.el7.x86_64.tar.gz
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-server-2.1.0_IBM-4.x86_64.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-collector-2.1.0_IBM-4.x86_64.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-monitor-2.1.0_IBM-4.x86_64.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-metrics-hadoop-sink-2.1.0_IBM-4.x86_64.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-agent-2.1.0_IBM-4.x86_64.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari-log4j-2.1.0_IBM_8.noarch.rpm
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/repodata/...
   Ambari/rhel/7/x86_64/2.1x/GA/2.1/ambari.repo

URLs for each Yum repository:
   IOP: http://<repository server>/repos/IOP/rhel/7/x86_64/4.1.x/GA/<version>
   IOP-UTILS: http://<repository server>/repos/IOP-UTILS/rhel/7/x86_64/1.1
   Ambari: http://<repository server>/repos/Ambari/rhel/7/x86_64/2.1.x/GA/2.1

5.2 The IBM Spectrum Scale Yum repository

Note: If you have already set up an IBM Spectrum Scale file system, you can skip this section.

The following instructions are written for customers deploying IBM Open Platform (IOP) with IBM Spectrum Scale Advanced Edition, the version that is included with BigInsights Enterprise Management. If you are using Ambari to install IBM Spectrum Scale, use the Standard or Advanced Edition of IBM Spectrum Scale. IBM Spectrum Scale Express Edition can be used only if it is installed and configured manually before installing Ambari and IOP. The following list of RPM packages for IBM Spectrum Scale v4.1.1 can help you verify the edition of IBM Spectrum Scale.

IBM Spectrum Scale Edition: Express Edition
RPM package list: gpfs.base, gpfs.gpl, gpfs.docs, gpfs.gskit, gpfs.msg.en_US, gpfs.platform

IBM Spectrum Scale Edition: Standard Edition
RPM package list: <Express Edition rpm list> + gpfs.ext

IBM Spectrum Scale Edition: Advanced Edition
RPM package list: <Standard Edition rpm list> + gpfs.crypto

For the IBM Spectrum Scale 4.2 release, add gpfs.adv to the list above.

TABLE 3 IBM SPECTRUM SCALE EDITIONS

When you purchase the IBM Spectrum Scale license, you get an account for the Passport Advantage website to download the IBM Spectrum Scale packages. For internal IBM users, follow the instructions at Software Sellers Workplace. For customer POC or trial licenses, send an email to scale@us.ibm.com.

This example uses IBM Spectrum Scale version 4.1.1.

1. On the repository web server, create a directory for your IBM Spectrum Scale repos, such as <document root>/repos/gpfs. For Apache httpd with document root /var/www/html, type the following command:
   mkdir -p /var/www/html/repos/gpfs

2. Obtain the IBM Spectrum Scale software. If you have already installed IBM Spectrum Scale manually, skip this step. In Passport Advantage, locate and download the software package titled IBM Spectrum Scale Advanced. Depending on your license entitlement, the package might be available as part of IBM BigInsights Enterprise Management Version 4.1 or IBM BigInsights for Apache Hadoop Version 4.1 rather than as a stand-alone IBM Spectrum Scale package. Download the file IBM_SPECTRUM_SCALE_ADVLx86_4.1.1.tar.gz (x86_64) or IBM_SPECTRUM_SCALE_ADV_LxP8_4.1.1.tar.gz (ppc64le), then extract the installer. As root or a user with sudo privileges:

   tar zxvf IBM_SPECTRUM_SCALE_ADVLx86_4.1.1.tar.gz
   chmod +x Spectrum_Scale_install-<version>_x86_64_advanced
   ./Spectrum_Scale_install-<version>_x86_64_advanced --dir /var/www/html/repos/gpfs --silent

Note: The --silent option is used to accept the Software License Agreement, and the --dir option places the IBM Spectrum Scale RPMs into the directory /var/www/html/repos/gpfs. Without the --dir option, the default location is /usr/lpp/mmfs/4.1.1.

3. If the RPMs were extracted into the IBM Spectrum Scale default directory, /usr/lpp/mmfs/4.1.1, copy all the IBM Spectrum Scale RPM files into the IBM Spectrum Scale repository path:
   cd /usr/lpp/mmfs/4.1.1
   cp gpfs*.rpm /var/www/html/repos/gpfs

4. Ensure that the directory does not contain any optional IBM Spectrum Scale RPM packages. See the IBM Spectrum Scale Installation Guide for the specific version for more information on base and optional packages. Ambari and IOP require only the following packages:
   gpfs.base
   gpfs.gpl
   gpfs.docs
   gpfs.gskit
   gpfs.msg.en_US
   gpfs.ext
   gpfs.crypto (if Advanced Edition is used)
   gpfs.adv (if IBM Spectrum Scale 4.2 Advanced Edition is used)

5. Delete the gpfs.hadoop-connector RPM from /var/www/html/repos/gpfs:
   cd /var/www/html/repos/gpfs
   rm gpfs.hadoop-connector*.rpm
   Use the newest connector package downloaded from the IBM Spectrum Scale Hadoop Connector page instead.

6. Copy the IBM Spectrum Scale Hadoop Connector RPM to the IBM Spectrum Scale repo path.

   cp gpfs.hadoop-connector-(version).rpm /var/www/html/repos/gpfs

WARNING: If you want to upgrade IBM Spectrum Scale from the GA-level code, do not put the PTF update packages into the IBM Spectrum Scale repo package path. The PTF update packages must be installed separately after the cluster has been installed and configured.

7. Check for IBM Spectrum Scale RPMs in the /root/ directory. If any RPMs exist there, relocate them to a subdirectory. There are known issues with IBM Spectrum Scale RPMs in /root that cause the Ambari installation to fail.

8. Create the Yum repository:
   createrepo /var/www/html/repos/gpfs/

   # cd /var/www/html/repos/gpfs/
   # createrepo .
   Spawning worker 0 with 8 pkgs
   Workers Finished
   Gathering worker results
   Saving Primary metadata
   Saving file lists metadata
   Saving other metadata
   Generating sqlite DBs
   Sqlite DBs complete

9. Access the repository at http://<repository server>/repos/gpfs.

5.3 OS Repository

Because some of the IBM Spectrum Scale RPMs have dependencies that must be resolvable on all nodes, you must also create an operating system repository.

1. Create the repository path:
   mkdir /var/www/html/repos/<rhel_oslevel>

2. Synchronize the local directory with the current Yum repository:
   cd /var/www/html/repos/<rhel_oslevel>
   reposync --gpgcheck -l --repoid=rhel-7-server-rpms --download_path=/var/www/html/repos/<rhel_oslevel>

3. Create the repository:
   createrepo -v /var/www/html/repos/<rhel_oslevel>

4. Ensure that all firewalls are disabled or that the httpd service port is open, because Yum uses HTTP to get the packages from the repository.

5. On all nodes in the cluster that require the repositories, create a file in /etc/yum.repos.d called local_<rhel_oslevel>.repo.

6. Copy this file to all nodes. The contents of the file must look like the following:
   [local_rhel7.1]
   name=local_rhel7.1
   enabled=yes
   baseurl=http://<web server IP that all nodes can reach>/repos/<rhel_oslevel>
   gpgcheck=no

7. Run yum repolist and yum install RPMs without external connections.

6. Ambari installation

6.1 Install the Ambari-Server RPM

1. Log on to the Ambari server and create the Ambari Yum repo file, ambari.repo. In this example, the Ambari server is compute000. Replace this hostname with your Ambari server hostname and use the appropriate baseurl value for the local repository configured previously. Perform this step on the Ambari server only.

WARNING: Verify whether http or https is functioning for your repository.

   [root@smn GPFS]# ssh compute000
   [root@compute000 ~]# cat /etc/yum.repos.d/ambari.repo
   [BI_AMBARI-2.1.0]
   name=ambari
   baseurl=http://<repository server>/repos/Ambari/rhel/7/x86_64/2.1.x/GA/2.1
   enabled=1
   gpgcheck=0
   [root@compute000 ~]# yum clean all
   [root@compute000 ~]# yum makecache

2. Use yum to install the ambari-server RPM:
   yum -y install ambari-server

   [root@compute000 ~]# yum -y install ambari-server
   Loaded plugins: product-id, subscription-manager
   This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
   BI_AMBARI-2.1.0 | ... 00:00:00
   xcat-rhels7.1-path0 | 4.1 kB 00:00:00
   xcat-otherpkgs0 | 2.9 kB 00:00:00
   Resolving Dependencies
   --> Running transaction check
   ---> Package ambari-server.x86_64 0:2.1.0_IBM-3 will be installed
   --> Processing Dependency: postgresql-server >= 8.1 for package: ambari-server-2.1.0_IBM-3.x86_64
   --> Running transaction check
   ---> Package postgresql-server.x86_64 0:<version>.ael7b will be installed
   --> Processing Dependency: postgresql(ppc-64) = <version>.ael7b for package: postgresql-server-<version>.ael7b.x86_64
   --> Processing Dependency: postgresql-libs(ppc-64) = <version>.ael7b for package: postgresql-server-<version>.ael7b.x86_64
   --> Processing Dependency: libpq.so.5()(64bit) for package: postgresql-server-<version>.ael7b.x86_64
   --> Running transaction check
   ---> Package postgresql.x86_64 0:<version>.ael7b will be installed
   ---> Package postgresql-libs.x86_64 0:<version>.ael7b will be installed
   --> Finished Dependency Resolution

   Dependencies Resolved
   ================================================================================
    Package            Arch      Version           Repository             Size
   ================================================================================
   Installing:
    ambari-server      x86_64    2.1.0_IBM-3       BI_AMBARI-2.1.0       341 M
   Installing for dependencies:
    postgresql         x86_64    <version>.ael7b   xcat-rhels7.1-path0   3.0 M
    postgresql-libs    x86_64    <version>.ael7b   xcat-rhels7.1-path0   241 k
    postgresql-server  x86_64    <version>.ael7b   xcat-rhels7.1-path0   4.1 M

   Transaction Summary
   ================================================================================
   Install  1 Package (+3 Dependent packages)

   Total download size: 348 M
   Installed size: 404 M
   Downloading packages:
   (1/4): postgresql-libs-<version>.ael7b.x86_64.rpm | 241 kB 00:00:00
   (2/4): postgresql-<version>.ael7b.x86_64.rpm | 3.0 MB 00:00:00
   (3/4): postgresql-server-<version>.ael7b.x86_64.rpm | 4.1 MB 00:00:00
   (4/4): ambari-server-2.1.0_IBM-3.x86_64.rpm | 341 MB 00:00:05
   --------------------------------------------------------------------------------
   Total 59 MB/s | 348 MB 00:00:05
   Running transaction check
   Running transaction test
   Transaction test succeeded
   Running transaction
     Installing : postgresql-libs-<version>.ael7b.x86_64    1/4
     Installing : postgresql-<version>.ael7b.x86_64         2/4
     Installing : postgresql-server-<version>.ael7b.x86_64  3/4
     Installing : ambari-server-2.1.0_IBM-3.x86_64          4/4

     Verifying  : postgresql-libs-<version>.ael7b.x86_64    1/4
     Verifying  : postgresql-<version>.ael7b.x86_64         2/4
     Verifying  : postgresql-server-<version>.ael7b.x86_64  3/4
     Verifying  : ambari-server-2.1.0_IBM-3.x86_64          4/4

   Installed:
     ambari-server.x86_64 0:2.1.0_IBM-3
   Dependency Installed:
     postgresql.x86_64 0:<version>.ael7b  postgresql-libs.x86_64 0:<version>.ael7b  postgresql-server.x86_64 0:<version>.ael7b
   Complete!

6.2 Install the IBM Spectrum Scale Ambari integration module

1. Install the gpfs.ambari integration module onto the Ambari server node:
   chmod 755 gpfs.ambari-iop-<version>.noarch.bin
   ./gpfs.ambari-iop-<version>.noarch.bin

Note: To avoid potential problems, do not put the gpfs.ambari integration package in /root/.

   [root@compute000 ~]# chmod 755 gpfs.ambari-iop_<version>.noarch.bin
   [root@compute000 ~]# ./gpfs.ambari-iop_<version>.noarch.bin

   International License Agreement for Non-Warranted Programs

   Part 1 - General Terms

   BY DOWNLOADING, INSTALLING, COPYING, ACCESSING, CLICKING ON AN "ACCEPT" BUTTON, OR OTHERWISE USING THE PROGRAM, LICENSEE AGREES TO THE TERMS OF THIS AGREEMENT. IF YOU ARE ACCEPTING THESE TERMS ON BEHALF OF LICENSEE, YOU REPRESENT AND WARRANT THAT YOU HAVE FULL AUTHORITY TO BIND LICENSEE TO THESE TERMS. IF YOU DO NOT AGREE TO THESE TERMS,

   * DO NOT DOWNLOAD, INSTALL, COPY, ACCESS, CLICK ON AN "ACCEPT" BUTTON, OR USE THE PROGRAM; AND

   * PROMPTLY RETURN THE UNUSED MEDIA AND DOCUMENTATION TO THE PARTY FROM WHOM IT WAS OBTAINED FOR A REFUND OF THE AMOUNT PAID. IF THE PROGRAM WAS DOWNLOADED, DESTROY ALL COPIES OF THE PROGRAM.
   ...
   Z (07/2011)

   Do you agree to the above license terms? [yes or no]
   yes
   Unpacking... Done
   Installing...
   Preparing...                       ################################# [100%]
   Updating / installing...
      1:gpfs.ambari-iop_<version>      ################################# [100%]

2. Update the Ambari configuration file to use the local repository. If a cloned local Yum repository is used, the Ambari configuration file must be updated before setting up the Ambari server. Update the values of openjdk1.8.url and openjdk1.7.url in /etc/ambari-server/conf/ambari.properties. Instead of ibm-open-platform.ibm.com, specify the hostname of your local repository server. Also check the protocol type (http vs. https).

   [root@compute000 scripts]# cat /etc/ambari-server/conf/ambari.properties | grep ibm-open
   openjdk1.8.url=http://<repository server>/repos/IOP-UTILS/rhel7/x86/1.1/openjdk/jdk-<version>.tar.gz
   openjdk1.7.url=http://<repository server>/repos/IOP-UTILS/rhel7/x86/1.1/openjdk/jdk-<version>.tar.gz

3. Update the number of threads used by the Ambari server and Ambari agents to match the number of CPUs on the nodes. In this example, the value is updated to 160:

   [root@compute001 ~]# nproc
   160
   [root@compute001 ~]# grep -i thread /etc/ambari-server/conf/ambari.properties
   agent.threadpool.size.max=160
   server.execution.scheduler.maxthreads=160
   client.threadpool.size.max=160

4. Update the Spark params.py file in the Ambari server stack definition so that the Spark history services can start. The default value of the spark_eventlog_dir_mode parameter causes permission issues when starting the Spark History Service; the workaround is to change this value in params.py. This change can be made at any time before or after the initial deployment.

Note: If you have already set up Ambari, restart the Ambari server after making this change.

   vim /var/lib/ambari-server/resources/stacks/biginsights/4.1/services/spark/package/scripts/params.py
   70 spark_hdfs_user_dir = format("/user/{spark_user}")
   71 spark_hdfs_user_mode =
   72 spark_eventlog_dir_mode =
   73 spark_jar_hdfs_dir = "/iop/apps/<version>/spark/jars"
   74 spark_jar_hdfs_dir_mode =
   75 spark_jar_file_mode =
   76 spark_jar_src_dir = "/usr/iop/current/spark-client/lib"
   77 spark_jar_src_file = "spark-assembly.jar"

5. Update the hive.py file in the Ambari server stack definition to change the permission of Hive's data warehouse directory. The data warehouse directory is specified by hive.metastore.warehouse.dir; the default directory is /apps/hive/warehouse. When the Hive service starts, the permission of this directory is reset to 770 (rwx rwx ---) and the directory is owned by hive:hadoop. Therefore, users from other groups cannot access the directory and thus cannot create any Hive database or table under the warehouse. Make the following change to allow other users to create Hive databases and tables:

   vim /var/lib/ambari-server/resources/stacks/biginsights/4.0/services/hive/package/scripts/hive.py
   171 params.HdfsResource(params.hive_apps_whs_dir,
   172     type="directory",
   173     action="create_on_execute",
   174     owner=params.hive_user,
   175     group=params.user_group,
   176     mode= )

This change can be made after the initial deployment; the Ambari server must be restarted for it to take effect.

6. Update the Yum repository file:
   /var/lib/ambari-server/resources/stacks/biginsights/4.1.spectrumscale/repos/repoinfo.xml

You can change the repository information in the Ambari GUI later, but those changes will not be saved into the repoinfo.xml file. At the next restart of the Ambari server, Ambari reads the updated repoinfo.xml and loads it into the database, so the previous GUI changes are lost.

   [root@compute000 ~]# cat /var/lib/ambari-server/resources/stacks/biginsights/4.1.spectrumscale/repos/repoinfo.xml
   <?xml version="1.0"?>
   <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
        license agreements. ... Licensed under the Apache License, Version 2.0. -->
   <reposinfo>
     <mainrepoid>iop-4.1</mainrepoid>
     <os family="redhat6">
       <repo>
         <baseurl>http://<server>/repos/iop/rhel6/x86_64/4.1</baseurl>
         <repoid>iop-4.1</repoid>
         <reponame>iop</reponame>

       </repo>
       <repo>
         <baseurl>http://<server>/repos/iop-utils/rhel6/ppc64le/1.1</baseurl>
         <repoid>iop-utils-1.0</repoid>
         <reponame>iop-utils</reponame>
       </repo>
     </os>
   </reposinfo>

6.3 Setting up the Ambari server

Run the setup command to configure your Ambari server, database, JDK, LDAP, and other options:
   ambari-server setup

   [root@compute000 ~]# ambari-server setup
   Using python /usr/bin/python2.7
   Setup ambari-server
   Checking SELinux...
   SELinux status is 'disabled'
   Customize user account for ambari-server daemon [y/n] (n)? n
   Adjusting ambari-server permissions and ownership...
   Checking firewall status...
   Redirecting to /bin/systemctl status iptables.service
   Checking JDK...
   [1] OpenJDK 1.8
   [2] OpenJDK 1.7 (deprecated)
   [3] Custom JDK
   ==============================================================================
   Enter choice (1): 1   <==== Note: JDK 1.8 or greater is required to run Spark applications with Platform Symphony on the ppc64le platform
   Downloading JDK from http://<repository server>/repos/IOP-UTILS/.../openjdk/jdk-<version>.tar.gz to /var/lib/ambari-server/resources/jdk-<version>.tar.gz
   jdk-<version>.tar.gz 100% (48.3 MB of 48.3 MB)
   Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-<version>.tar.gz
   Installing JDK to /usr/jdk64/
   Successfully installed JDK to /usr/jdk64/
   Completing setup...
   Configuring database...
   Enter advanced database configuration [y/n] (n)? n
   Configuring database...
   Default properties detected. Using built-in database.
   Configuring ambari database...
   Checking PostgreSQL...
   Running initdb: This may take up to a minute.
   Initializing database... OK

   About to start PostgreSQL
   Configuring local database...
   Connecting to local database...done.
   Configuring PostgreSQL...
   Restarting PostgreSQL
   Extracting system views...
   ...ambari-admin-2.1.0_ibm.jar...
   Adjusting ambari-server permissions and ownership...
   Ambari Server 'setup' completed successfully.

6.4 Starting the Ambari server

The Ambari server uses port 8080 by default. If any other service is using this port, another port can be assigned to Ambari. To change the default port of Ambari, change or add the following line in /etc/ambari-server/conf/ambari.properties:
   client.api.port=<port_number>

The port number can also be changed later: add the port you want to /etc/ambari-server/conf/ambari.properties and run ambari-server restart.

PostgreSQL is used by the Ambari server to store the cluster configuration information. Ensure that it restarts after reboot:
   chkconfig postgresql on

Then, start the Ambari server:
   ambari-server start

   [root@compute000 ~]# ambari-server start
   Using python /usr/bin/python2.7
   Starting ambari-server
   Ambari Server running with administrator privileges.
   Organizing resource files at /var/lib/ambari-server/resources...
   Server PID at: /var/run/ambari-server/ambari-server.pid
   Server out at: /var/log/ambari-server/ambari-server.out
   Server log at: /var/log/ambari-server/ambari-server.log
   Waiting for server start...

   Ambari Server 'start' completed successfully.

7. Install IOP with IBM Spectrum Scale using Ambari

7.1 Before you begin

Set up password-less SSH access for root. Before the installation, configure root password-less access from the IBM Spectrum Scale Master node to all other IBM Spectrum Scale nodes. This is required for IBM Spectrum Scale. The following steps set up password-less access for root:

a. Define node1 as the IBM Spectrum Scale Master.
b. Log on to node1 as the root user:
   # cd /root/.ssh
c. Generate a pair of public authentication keys. Do not type a passphrase:
   # ssh-keygen -t rsa
   Generating the public-private rsa key pair.
   Type the name of the file in which you want to save the key (/root/.ssh/id_rsa):
   Type the passphrase.
   Type the same passphrase again.
   The identification has been saved in /root/.ssh/id_rsa.
   The public key has been saved in /root/.ssh/id_rsa.pub.
   The key fingerprint is: ...
   Note: During ssh-keygen -t rsa, accept the default for all prompts.
d. Set the public key into the authorized_keys file:
   # cd /root/.ssh/; cat id_rsa.pub > authorized_keys
e. Copy the generated public key file to nodeX:
   # scp /root/.ssh/* root@nodeX:/root/.ssh
f. Make sure the public key file permissions are correct:
   # ssh root@nodeX "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
g. Check password-less access:
   # ssh node2
   [root@node1 ~]# ssh node2
   The authenticity of host 'gpfstest9 (<ip address>)' can't be established.
   RSA key fingerprint is 03:bc:35:34:8c:7f:bc:ed:90:33:1f:32:21:48:06:db.
   Are you sure you want to continue connecting (yes/no)? yes

Note: You also need to run ssh node1 on node1 itself to add the key into /root/.ssh/known_hosts for password-less access.

If you have pre-installed a Spectrum Scale file system:

a. Ensure that IBM Spectrum Scale is set to automount on reboot:
   mmchfs <device> -A yes
b. Ensure that the IBM Spectrum Scale cluster is started on all nodes:
   /usr/lpp/mmfs/bin/mmstartup -a
c. Ensure that the IBM Spectrum Scale file system is mounted on all nodes:
   mmmount <fs-name> -a
d. Ensure that no IBM Spectrum Scale NSD stanza file called gpfs_nsd (the default expected name) exists under /var/lib/ambari-server/resources/ on the Ambari server node.
e. If ESS is used as the shared storage in an Ambari cluster, create a shared node information file for the ESS cluster called /var/lib/ambari-server/resources/shared_gpfs_node.cfg on the Ambari server. This file must contain only one hostname, which is the hostname of a node in the ESS cluster. Ambari uses this one node to join the ESS cluster. Password-less SSH must be configured from the Ambari server to this node.

   [root@compute000 scripts]# cat /var/lib/ambari-server/resources/shared_gpfs_node.cfg
   compute002

If you are using Ambari to deploy a new IBM Spectrum Scale FPO file system, prepare a GPFS NSD stanza file as described in Preparing a stanza File and add it to /var/lib/ambari-server/resources/ on the Ambari server node. The default file name is gpfs_nsd.

Because there are likely repository changes in the environment, clear all the Yum packages and headers from the cache on all nodes:
   yum clean all; yum makecache
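If you are deploying over a pre-installed file system, its state can be confirmed from any IBM Spectrum Scale node before launching the wizard. A minimal sketch, assuming a hypothetical file system device named bigpfs:

   # Every node should report GPFS state "active"
   /usr/lpp/mmfs/bin/mmgetstate -a
   # The file system should be mounted everywhere; -L lists the mounting nodes
   /usr/lpp/mmfs/bin/mmlsmount bigpfs -L
   # The automatic mount attribute (-A) should be set to yes
   /usr/lpp/mmfs/bin/mmlsfs bigpfs -A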

7.2 Ambari Wizard

Open a browser (Firefox or Internet Explorer) and log on to the Ambari administrator console at http://<ambari-server-host>:8080.

Platform Cluster Manager (PCM) also uses port 8080. Therefore, IBM customers using both PCM and Ambari on the same host require a port change. The ISH and IDEA solutions have this configuration. If you do not see the Ambari console, the default client port (8080) might have been changed by setting client.api.port in /etc/ambari-server/conf/ambari.properties.

The default Ambari account is admin:admin.

FIGURE 1 AMBARI LOGIN

7.3 Create a cluster

Welcome Screen

Click Create a Cluster > Launch Install Wizard.

FIGURE 2 AMBARI WELCOME PAGE

Cluster Name

Type a name for the cluster.

FIGURE 3 AMBARI CLUSTER NAME

Select Stack

Click Next. The BigInsights stacks appear on the Select Stack page.

Stack Name: BigInsights 4.1 IBM Spectrum Scale
Description: Installs BigInsights and IBM Spectrum Scale at the same time

Stack Name: BigInsights 4.1
Description: Installs BigInsights with HDFS

Note: Expand the Advanced Repository Options section to review the repository settings. Ensure that the configured local mirror repository is correct. If you are installing on RHEL6, you can uncheck the repository information for RHEL7; if you are installing on RHEL7, you can uncheck the repository information for RHEL6. Validate the base URLs for all the local repositories: IBM Spectrum Scale, IOP, and IOP-UTILS.

Note: If you want to use the public BigInsights IOP 4.1 repository, such as

ibm-open-platform.ibm.com/repos/iop/rhel/6/x86_64/4.1.x/ga/..., and the public IOP-UTILS 1.1 repository, ensure that all the nodes in the cluster can access the internet. In this mode, installation might take more time because all the RPM packages must be downloaded during installation.

FIGURE 4 AMBARI SELECT STACK

Install Options

On the Install Options screen, in the Target Hosts section, type the list of fully qualified domain names (FQDNs) of the nodes in the cluster. For SSH Private Key, upload or copy and paste the key from /root/.ssh/id_rsa on the Ambari server.

FIGURE 5 AMBARI INSTALL OPTIONS HOST LIST

Confirm Hosts

On the Confirm Hosts screen, click the Register and Confirm button. Ambari installs agents and the selected software onto all the specified nodes and performs some basic verification checks. Because you pre-created the entire set of IOP user IDs, the system displays a warning that the IDs already exist; this warning is expected. However, fix all errors to ensure that the prerequisites have been met.

FIGURE 6 AMBARI CONFIRM HOSTS

Choose Services

On the Choose Services screen, select the services to be installed.

Note: It is possible to select only IBM Spectrum Scale and leave all other services unchecked, in order to use Ambari to deploy a general-purpose IBM Spectrum Scale cluster. However, the configuration settings for IBM Spectrum Scale will be Hadoop-centric and might not be appropriate for other types of workloads. Any services not selected now can be added after the installation.

FIGURE 7 AMBARI CHOOSE SERVICES

Assign Masters

On the Assign Masters screen, the services that belong on a master / management / edge node are presented. Select the location where you want to deploy the master node services. Note the IBM Spectrum Scale Master and consider its placement.

FIGURE 8 AMBARI ASSIGN MASTERS

The IBM Spectrum Scale Master node designates the node from which Ambari issues commands affecting the entire cluster. For example, when IBM Spectrum Scale is first installed and an FPO cluster is first created, the commands are all executed on the IBM Spectrum Scale Master node. This is where you'll want to ensure password-less SSH is set up to every node. As another example, if configuration changes are made after the cluster has been deployed, the IBM Spectrum Scale Master node executes the commands to reconfigure the cluster and, if necessary, restarts IBM Spectrum Scale on all nodes.

The term Master follows the convention used by the other Hadoop services. The IBM Spectrum Scale Master node has no special role in the IBM Spectrum Scale cluster, other than being one of the quorum nodes.

Important: If the IBM Spectrum Scale cluster has already been created, a quorum node must be selected as the IBM Spectrum Scale Master node.

There are recommendations for assigning node roles in the BigInsights Knowledge Center.

Assign Slaves and Clients

On the Assign Slaves and Clients screen, select the client and slave components that are to be deployed across the data nodes.

Important: If you want to install Big SQL, install all clients on all nodes, including the management nodes.

For the IBM Spectrum Scale node client, select all the nodes, including all management nodes. They are selected on each node by default. This runs both the IBM Spectrum Scale node and the IBM Spectrum Scale Hadoop Connector on every node.

FIGURE 9 AMBARI ASSIGN SLAVES AND CLIENTS

Customize Services

On the Customize Services screen, IBM Spectrum Scale has its own tab that must be reviewed carefully. There are two sub-tabs: Standard and Advanced configuration. If a new IBM Spectrum Scale cluster is being created, the configuration fields on both tabs are populated with values taken from the IBM Spectrum Scale for Hadoop Best Practices White Paper (see References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale on the wiki).

In the Standard tab, users can adjust parameters via slider bars or drop-down menus. The Advanced tab contains parameters that do not need to be changed frequently.

IMPORTANT: Read and follow the IBM Spectrum Scale deployment modes section FIRST, to understand how the mode of IBM Spectrum Scale you are deploying IOP onto affects the system and the Standard and Advanced tabs in the Ambari wizard.

Here are the important IBM Spectrum Scale parameter checklists:

Standard tab:
- Cluster Name
- FileSystem Name
- FileSystem Mount Point
- NSD stanza file: see the guide in Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
- Policy file: see the guide in Deploy IOP over new IBM Spectrum Scale file system (FPO support only)
- Hadoop local cache disk stanza file: see the guide in Deploy IOP over existing IBM Spectrum Scale file system (ESS)
- Default Metadata Replicas: must be <= Max Metadata Replicas
- Default Data Replicas: must be <= Max Data Replicas
- Max Metadata Replicas
- Max Data Replicas

Advanced tab:
- Advanced core-site, fs.defaultFS: make sure hdfs://localhost:8020 is used
- Advanced gpfs-advance, gpfs.quorum.nodes: the number of quorum nodes must be odd

TABLE 4 IBM SPECTRUM SCALE CHECKLIST PARAMETERS
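As a quick cross-check of the quorum rule in Table 4 on an existing cluster, the quorum designations can be counted from the command line. A minimal sketch; the grep filter is illustrative, since mmlscluster marks quorum nodes in its node listing:

   /usr/lpp/mmfs/bin/mmlscluster | grep -c quorum

If the count is even, adjust gpfs.quorum.nodes before deploying.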

FIGURE 10 AMBARI CUSTOMIZE SERVICE IOP TABS

Note: Check the IBM Spectrum Scale values and all the services marked with a red circle. The red circle denotes mandatory entries that must be completed before the service can be deployed.

FIGURE 11 AMBARI IBM SPECTRUM SCALE CUSTOMIZE SERVICES STANDARD AND ADVANCED SETTINGS

FIGURE 12 AMBARI IBM SPECTRUM SCALE CUSTOM SERVICES ADVANCED LIST

Verify the configuration for the IBM Spectrum Scale service. If you have already created the IBM Spectrum Scale cluster and are using Ambari to deploy IOP and the Hadoop integration modules (the IBM Spectrum Scale Hadoop Connector and the IBM Spectrum Scale Ambari integration module), the fields are populated with values detected from the existing cluster. The parameters with a lock icon must not be changed after deployment. These include parameters such as the cluster name, remote shell, file system name, and max data replicas. Therefore, double-check all the parameters with the lock icon before proceeding to the next step. Further, while every attempt is made to detect the correct values from the cluster, verify that the parameters were imported properly and make corrections as needed.
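One way to cross-check the imported values is to query the existing cluster directly and compare the output against the wizard fields. A minimal sketch, assuming a hypothetical file system device named bigpfs:

   # Cluster name and remote shell command
   /usr/lpp/mmfs/bin/mmlscluster
   # Default (-m, -r) and maximum (-M, -R) metadata/data replica settings
   /usr/lpp/mmfs/bin/mmlsfs bigpfs -m -M -r -R
   # Default mount point of the file system
   /usr/lpp/mmfs/bin/mmlsfs bigpfs -T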

FIGURE 13 AMBARI IBM SPECTRUM SCALE DATA AND METADATA REPLICAS

Then review the parameters for Max Data Replicas and Max Metadata Replicas, as these values cannot be changed after the file system is created. If you decrease the values from the default of 3, ensure that this is really what you want. Also, setting Max Data Replicas, Max Metadata Replicas, Default Data Replicas, and Default Metadata Replicas to three requires at least three failure groups in your cluster (at least three nodes with disks); otherwise, the file system creation will fail.

Ambari can mistakenly select shared mounted paths where local paths must be used. When a shared file system path is selected by Ambari, some services will not function properly. The following directories can be affected by this shared file system issue. Therefore, verify that the configuration directories are on a local file system. They all use the same file system, so if one directory is good, they should all be good. Check the following parameters:

HBase > Advanced (tab) > hbase-site: Ensure that the HBase local directory is not on a shared mount point. It should point to a node-local directory such as /hadoop/hbase.

YARN > Advanced (tab) > Node Manager: Ensure that yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs are not on a shared mount point. They must point to node-local directories such as /hadoop/yarn/local.

   Use /hadoop/yarn/log for yarn.nodemanager.log-dirs and /hadoop/yarn/local for yarn.nodemanager.local-dirs.

YARN > Advanced (tab) > Application Timeline Server: Ensure that yarn.timeline-service.leveldb-timeline-store.path does not use a shared mount point. Use /hadoop/yarn/timeline.

Oozie > Oozie Server: Ensure that Oozie.Data.Dir does not use a shared mount point. Use /hadoop/oozie/data.

ZooKeeper > ZooKeeper Server: Ensure that the ZooKeeper directory does not use a shared mount point. Use /hadoop/zookeeper.

Kafka > Kafka Broker: Ensure that log.dirs is a local path. Use /hadoop/kafka-logs.
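To confirm from the shell that a configured path is node-local rather than on the IBM Spectrum Scale mount, check which file system it resolves to. A sketch using /hadoop/yarn/local as the example path:

   # The Type column should show a local file system such as ext4 or xfs, not gpfs
   df -PT /hadoop/yarn/local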

7.4 Starting deployment

Review

After verifying all the services on the Review page, click Deploy to begin the installation.

FIGURE 14 AMBARI DEPLOYMENT REVIEW

7.5 IBM Spectrum Scale deployment modes

The IBM Spectrum Scale state has three different modes. Follow the steps that pertain to your file system setup requirements.

Modes:
- IOP over existing IBM Spectrum Scale file system (FPO)
- IOP over existing IBM Spectrum Scale file system (ESS)
- IOP over new IBM Spectrum Scale cluster (FPO support only)

Deploy IOP over existing IBM Spectrum Scale file system (FPO)

If you use this mode, you need to start the IBM Spectrum Scale cluster by using
   mmstartup -a

in the console of any one node in the IBM Spectrum Scale cluster, and mount the file system on all nodes by using
   mmmount <fs-name> -a

Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

If you haven't started the IBM Spectrum Scale cluster yet but are already at the Ambari Assign Slaves and Clients panel (Figure 9), click the Previous button to go back to the Assign Masters panel in Ambari. Start the IBM Spectrum Scale cluster and mount the file system on all the nodes, then go back to the Ambari GUI and continue on to the Assign Slaves and Clients panel. Ambari detects the mounted file system and reflects it in the Customize Services page for IBM Spectrum Scale.

Deploy IOP over existing IBM Spectrum Scale file system (ESS)

If you use this mode, start ESS and set up password-less SSH login from the Ambari server to one of the IBM Spectrum Scale nodes in the ESS IBM Spectrum Scale cluster. One configuration file must be created, named /var/lib/ambari-server/resources/shared_gpfs_node.cfg, on the Ambari server. This file must contain the hostname of a node which is part of the ESS cluster.

Ensure that the IBM Spectrum Scale NSD stanza file, gpfs_nsd, does not exist under /var/lib/ambari-server/resources/ on the Ambari server node.

If you haven't started the IBM Spectrum Scale cluster yet but are already at the Ambari Assign Slaves and Clients panel (Figure 9), click the Previous button to go back to the Assign Masters panel in Ambari. Start the IBM Spectrum Scale cluster and mount the file system on all the nodes, then go back to the Ambari GUI and continue on to the Assign Slaves and Clients panel. Ambari automatically detects the mounted file system and reflects it in the Customize Services page for IBM Spectrum Scale.

In this mode, Ambari can create local cache disks for Hadoop usage. Create the following file:

   [root@compute000 GPFS]# cat /var/lib/ambari-server/resources/hadoop_disk
   DISK compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
   DISK compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
   DISK compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
   DISK compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
   DISK compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev

In this mode, Ambari can create local cache disks for Hadoop usage. Create the following file:

[root@compute000 GPFS]# cat /var/lib/ambari-server/resources/hadoop_disk
DISK compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp
DISK compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp

Add the file name on the Customize Services page, in the Hadoop local cache disk stanza file field.

FIGURE 15 AMBARI IBM SPECTRUM SCALE HADOOP LOCAL CACHE FILE STANZA

Note: If you are not using shared storage, you do not need this configuration and can leave this parameter unchanged in the Ambari GUI.

Additional steps for deploying IOP over existing IBM Spectrum Scale file system (FPO or ESS)

For a pre-created IBM Spectrum Scale cluster, review this section carefully. An IBM Spectrum Scale NSD stanza file is not required, because the file system already exists. Because Ambari does not allow a blank value, leave the default value of the IBM Spectrum Scale NSD stanza file field unchanged.

Deploy IOP over new IBM Spectrum Scale file system (FPO support only)

To deploy onto a new IBM Spectrum Scale FPO cluster, work through the following setup points:

Prepare an IBM Spectrum Scale NSD stanza file. Two types of NSD files are supported for file system auto-creation. One is the preferred simple format; the other is the standard IBM Spectrum Scale NSD file format for IBM Spectrum Scale experts.

If a simple NSD file is used, Ambari selects the proper metadata and data ratio for you. If possible, Ambari creates partitions on some disks for Hadoop intermediate data, which improves Hadoop performance. If the standard IBM Spectrum Scale NSD file is used, administrators are responsible for the storage space arrangement, and a policy file is also required.

Partition algorithm: the algorithm for the system pool and disk usage.

Failure group selection rule: failure groups are created depending on the rack location of the node.

Rack Mapping File: nodes can be defined to belong to racks.

Partitioning function matrix: a disk is divided into two partitions because one partition is used by ext3/ext4 to store the map/reduce intermediate data, while the other partition is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned; metadata disks cannot.

For more information on each of these setup points, refer to the appendixes Preparing a stanza File and IBM Spectrum Scale-FPO Deployment.

7.6 Verify and Test Installation

For an initial installation through Ambari, the UID/GID of the service users is consistent over all nodes. However, if you deploy for a second time, or if some of the nodes were already created with those UIDs/GIDs, the UID/GID of these users might not be consistent across all nodes, per a known issue reported in the Ambari community. After deployment, as part of verifying the system, check by using mmdsh -N all id <user-name> whether the UID is consistent across all nodes.
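For example, to spot-check a couple of the service users (the user names here are common IOP defaults; substitute the accounts in your cluster):

/usr/lpp/mmfs/bin/mmdsh -N all id hdfs
/usr/lpp/mmfs/bin/mmdsh -N all id yarn
# the uid= and gid= values should be identical on every node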

After the Ambari deployment, check the IBM Spectrum Scale installed packages on all nodes by using rpm -qa | grep gpfs to verify that all base IBM Spectrum Scale packages have been installed.

Run wordcount to test the installation. For example, as user fvt255, create the Hadoop file system directory of the user if it does not exist:

~]# hadoop fs -mkdir /user/fvt255
~]# hadoop fs -chown -R fvt255:users /user/fvt255

Copy the mywordcountfile file to be used as input to the wordcount program, then run the wordcount program:

~]$ yarn jar /usr/iop/<version>/hadoop-mapreduce/hadoop-mapreduce-examples-<version>-ibm-8.jar wordcount /user/fvt255/wc_input/mywordcountfile /user/fvt255/wc_output
~]$ hadoop fs -ls wc_output
Found 2 items
-rw-r--r--   3 fvt255 users ... wc_output/_SUCCESS
-rw-r--r--   3 fvt255 users ... wc_output/part-r-00000

Check the Hadoop GUI.
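A sketch of staging the input and inspecting the result (paths from the example above; the put must run before the yarn command):

~]$ hadoop fs -mkdir -p /user/fvt255/wc_input
~]$ hadoop fs -put mywordcountfile /user/fvt255/wc_input/
~]$ hadoop fs -cat /user/fvt255/wc_output/part-r-00000 | head    # after the job: the first few word counts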

Appendix

A. Preparing a stanza File

The Ambari install process can install and configure a new IBM Spectrum Scale cluster file system and configure it for Hadoop workloads according to best practices. To support this task, the installer must know about the disks available in the cluster and how you want to use them. If you do not indicate preferences, intelligent defaults are used.

Note: Using Ambari to deploy a new IBM Spectrum Scale cluster is only supported for FPO.

Two types of NSD files are supported for file system auto-creation. One is the preferred simple format; the other is the standard IBM Spectrum Scale NSD format intended for experienced IBM Spectrum Scale administrators.

Preferred Simple Format
o Ambari selects the proper metadata and data ratios.
o If possible, Ambari creates partitions on some disks for Hadoop intermediate data to enhance performance.
o One system pool and one data pool are created.
o The NSD file must be located at /var/lib/ambari-server/resources/ on the Ambari server.
o Only /dev/sdx and /dev/dx-x devices are supported.

Standard Format
o The GPFS administrator is responsible for the storage arrangement and configuration.
o A policy file is also required.
o Storage pools and block sizes can be defined as needed.

Example of a Preferred Simple IBM Spectrum Scale NSD File

This tells the IBM Spectrum Scale set-up process that there are six nodes, each with six disk drives to be defined as NSDs. All information must be continuous, with no extra spaces.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK compute001.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK compute002.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK compute003.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK compute005.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
DISK compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg
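Before handing this file to Ambari, it can be worth confirming that every listed device exists and is empty; a sketch using the host names from the example above:

for h in compute001 compute002 compute003 compute005 compute006 compute007; do
  echo "== $h =="
  ssh $h "lsblk -dno NAME,SIZE,TYPE /dev/sd[b-g]"   # each device should exist and carry no partitions or file system
done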

If you want to select specific disks for metadata, such as SSD drives, add the label -meta to those disks, as shown in the following example. If -meta is used, the partition algorithm is ignored.

# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc,/dev/sdd
DISK compute006.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd
DISK compute007.private.dns.zone:/dev/sdb,/dev/sdc,/dev/sdd

In the simple NSD file above, /dev/sdb from compute001, compute002, compute003, and compute005 is specified as a metadata disk in the IBM Spectrum Scale file system.

The partition algorithm is also ignored if the nodes listed in the simple NSD file do not match the set of nodes that will be used for the NodeManager service. If some nodes that are not NodeManagers are in the NSD file, or some nodes that will be NodeManagers are not in the NSD file, no partitioning is done.

Example of a Standard IBM Spectrum Scale NSD File

%pool: pool=system blockSize=256K layoutMap=cluster allowWriteAffinity=no
%pool: pool=datapool blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=256
# gpfstest9
%nsd: nsd=node9_meta_sdb device=/dev/sdb servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_meta_sdc device=/dev/sdc servers=gpfstest9 usage=metadataOnly failureGroup=101 pool=system
%nsd: nsd=node9_data_sde2 device=/dev/sde2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
%nsd: nsd=node9_data_sdf2 device=/dev/sdf2 servers=gpfstest9 usage=dataOnly failureGroup=1,0,1 pool=datapool
# gpfstest10
%nsd: nsd=node10_meta_sdb device=/dev/sdb servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_meta_sdc device=/dev/sdc servers=gpfstest10 usage=metadataOnly failureGroup=201 pool=system
%nsd: nsd=node10_data_sde2 device=/dev/sde2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
%nsd: nsd=node10_data_sdf2 device=/dev/sdf2 servers=gpfstest10 usage=dataOnly failureGroup=2,0,1 pool=datapool
# gpfstest11
%nsd: nsd=node11_meta_sdb device=/dev/sdb servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_meta_sdc device=/dev/sdc servers=gpfstest11 usage=metadataOnly failureGroup=301 pool=system
%nsd: nsd=node11_data_sde2 device=/dev/sde2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool
%nsd: nsd=node11_data_sdf2 device=/dev/sdf2 servers=gpfstest11 usage=dataOnly failureGroup=3,0,1 pool=datapool

Type the /var/lib/ambari-server/resources/gpfs_nsd filename into the NSD stanza field. If you are using a standard NSD stanza file, a policy file is also required, for example bigpfs.pol:

RULE 'default' SET POOL 'datapool'
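Ambari applies this policy for you during deployment. For reference, outside of Ambari a placement policy like this would be installed and verified with the standard commands (the file system name bigpfs comes from the examples; adjust to yours):

/usr/lpp/mmfs/bin/mmchpolicy bigpfs /var/lib/ambari-server/resources/bigpfs.pol -I yes
/usr/lpp/mmfs/bin/mmlspolicy bigpfs    # confirm the active placement policy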

Because of the limitations of the Ambari framework, the NSD file must be copied to the Ambari server under the /var/lib/ambari-server/resources/ directory. Make sure the correct file name is specified on the IBM Spectrum Scale Customize Services page.

FIGURE 16 AMBARI NSD STANZA

B. IBM Spectrum Scale-FPO Deployment

Disk-partitioning algorithm

If a simple NSD file without the -meta label is used, Ambari assigns metadata and data disks and partitions the disks following these rules:

1. If the number of nodes is less than four:
   a. If each node has fewer than three disks, put all disks into the system pool with usage metadataAndData. No partitioning is done.
   b. If each node has more than four disks, assign metadataOnly and dataOnly disks at a ratio of 1:3 on each node, with a maximum of four metadata disks per node. Partitioning is done provided that all NodeManager nodes are also NSD nodes and have the same number of NSD disks.
2. If the number of nodes is greater than five:
   a. If each node has fewer than three disks, put all disks into the system pool with usage metadataAndData. Partitioning is not done.
   b. Set four nodes as metadata nodes, where the metadata disks are located. The others are data nodes.
   c. Failure groups are created based on the failure group selection rules.
   d. Assign metadata and data disks on the metadata nodes, and only data disks on the data nodes. The metadata-to-data ratio follows best practice, between 1:3 and 1:10.
   e. If all NodeManager nodes have the same number of NSD disks, create a local partition on the data disks for Hadoop intermediate data.

Failure Group selection rules

Failure groups are created based on the rack allocation of the nodes. One rack mapping file is supported (see Rack Mapping File). Ambari reads this file and assigns one failure group per rack. The number of racks must be three or greater. If a rack mapping file is not provided, virtual racks are created for data fault tolerance:

1. If the number of nodes is less than four, each node is placed on its own virtual rack.
2. If the number of nodes is greater than five and less than eleven, every two nodes are put in one virtual rack.
3. If the number of nodes is greater than ten and less than 21, every three nodes are put in one virtual rack.
4. If the number of nodes is greater than 21, every ten nodes are put in one virtual rack.

Rack Mapping File

Nodes can be defined to belong to racks. For three or more racks, the failure group of each NSD corresponds to the rack the node is in. A sample file is available on the Ambari server at /var/lib/ambari-server/resources/.

# cat /var/lib/ambari-server/resources/racks.sample
#Host/Rack map configuration file
#Format:
#[hostname]:/[rackname]
#Example:
#mn01:/rack1
#NOTE:
#The first character in rack name must be "/"
mn03:/rack1
mn04:/rack2
dn02:/rack3
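After deployment, you can confirm that the failure groups follow the rack layout (bigpfs is the file system name used elsewhere in this guide):

/usr/lpp/mmfs/bin/mmlsdisk bigpfs -L
# the failure group column should show one group per rack, for example 1,0,1 / 2,0,1 / 3,0,1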

FIGURE 17 AMBARI RACK MAPPING

Partitioning Function Matrix in Automatic Deployment

Each data disk is divided into two partitions: one partition is used by an ext4 file system to store the map/reduce intermediate data, while the other partition is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned; metadata disks cannot.

On the other hand, if a node is not selected as a NodeManager for YARN, no map or reduce tasks run on that node. In this case, partitioning the disks of that node makes no sense, because the local partition would never be used.

The following table describes the partitioning function matrix:

TABLE 5 IBM SPECTRUM SCALE PARTITIONING FUNCTION MATRIX

Case #1: <node manager host list> == <IBM Spectrum Scale NSD server nodes>. The NodeManager host list is equal to the IBM Spectrum Scale NSD server nodes.
o Standard NSD file: No partitioning. The NSDs are created directly with the NSD file.
o Simple NSD file without the -meta label: Partition and select metadata disks for the customer according to the disk-partitioning algorithm and failure group selection rules.
o Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used as metadata NSDs; all others are marked as data NSDs.

Case #2: <node manager host list> > <IBM Spectrum Scale NSD server nodes>. Some NodeManager hosts are not IBM Spectrum Scale NSD server nodes, but all IBM Spectrum Scale NSD server nodes are in the NodeManager host list.
o Standard NSD file: No partitioning. The NSDs are created directly with the specified NSD file.
o Simple NSD file without the -meta label: No partitioning, but metadata disks are selected for the customer according to the disk-partitioning algorithm and failure group selection rules.
o Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used as metadata NSDs; all others are marked as data NSDs.

Case #3: <node manager host list> < <IBM Spectrum Scale NSD server nodes>. Some IBM Spectrum Scale NSD server nodes are not in the NodeManager host list, but all NodeManager hosts are IBM Spectrum Scale NSD server nodes.
o Standard NSD file: No partitioning. The NSDs are created directly with the specified NSD file.
o Simple NSD file without the -meta label: No partitioning, but metadata disks are selected for the customer according to the disk-partitioning algorithm and failure group selection rules.
o Simple NSD file with the -meta label: No partitioning. All disks marked with the -meta label are used as metadata NSDs; all others are marked as data NSDs.

In all cases, for standard NSD files or simple NSD files with the -meta label, the IBM Spectrum Scale NSDs and file system are created directly.

To specify which disks are used for metadata and have the data disks partitioned, use the script partition_disks_general.sh, found in the Attachments at the bottom of the References: BigInsight Enterprise Manager wiki, to partition the disks first, and then specify the partition to be used for the GPFS NSD in a simple NSD file. For example:

[root@compute000 GPFS]# cat /var/lib/ambari-server/resources/gpfs_nsd
DISK compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK compute006.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2
DISK compute007.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2

After deployment is done in this mode, manually update yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs to contain the directory list from the disk partitions that are used for map/reduce intermediate data.
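For illustration only: assuming the ext4 partitions created by the script are mounted at /disk2 and /disk3 on every NodeManager node (your mount points will differ), the two YARN properties would be updated in Ambari along these lines:

yarn.nodemanager.local-dirs = /disk2/yarn/local,/disk3/yarn/local
yarn.nodemanager.log-dirs = /disk2/yarn/log,/disk3/yarn/log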

C. Dual-network deployment

If each node in your cluster has two network adapters, the cluster is called a dual-network cluster. You can assign one network to Ambari services, such as YARN and HBase, and the other network to IBM Spectrum Scale data transfer. Configuring the networks this way can improve network bandwidth for both IBM Spectrum Scale and the Hadoop services, because both consume a lot of network bandwidth when data I/O operations are running on the Hadoop cluster.

If the two network adapters are a 1Gb network and a 10Gb network, route all services over the 10Gb network, because YARN-like workloads need a lot of network bandwidth. If these services are routed over the 1Gb network, map and reduce job performance is impacted.

Ambari does not support dual networks where one network adapter is used for IBM Spectrum Scale and the second network adapter is used for Hadoop services, because Ambari uses the same host list for both. Manual deployment is required to create this dual-network cluster environment. For more information, see References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale.

Two network adapters, configured with different sub-network addresses

If each node in the cluster has two network adapters, where one is configured with one sub-network address, <subnet1>.x/24, and the other is configured with another sub-network address, <subnet2>.x/24, the following decisions must be made:

o The sub-network address to be used for all Ambari services (assume that it is <subnet1>.x).
o The sub-network address to be used for IBM Spectrum Scale node-to-node data transfer (assume that it is <subnet2>.x).

When deploying IOP + IBM Spectrum Scale through Ambari, specify the IP addresses or corresponding host name list for <subnet1>.x. After deployment has been completed through Ambari, perform the following steps:

1. Stop all services in the Ambari GUI by selecting Actions from the left panel.
2. SSH to any one of the IBM Spectrum Scale nodes and run mmchconfig subnets=<subnet2 network address> -N all.
3. Start all services in Ambari.

After completing the preceding steps, run a system monitor tool such as nmon to ensure that there is obvious network traffic over the <subnet2>.x network adapter when you write data into IBM Spectrum Scale.

Two network adapters, configured with the same sub-network addresses

If each node in the cluster has two network adapters, where both network adapters are configured with the same sub-network addresses, the following decisions must be made:

o The network address to be used for all the Ambari services (assume that it is <address1>).
o The network address to be used for IBM Spectrum Scale node-to-node data transfer (assume that it is <address2>).

While deploying IOP + IBM Spectrum Scale through Ambari, specify the IP addresses or corresponding host name list for <address1>. After deployment has been completed through Ambari, perform the following steps:

1. Stop all services in the Ambari GUI (from the left panel, at the bottom, select Actions).
2. SSH to any one of the IBM Spectrum Scale nodes and run the following commands:

mmchcluster --ccr-disable
mmchnode --daemon-interface=hostnameY -N hostnameX    <== do this for all nodes
mmchcluster --ccr-enable

Note: hostnameX must be changed to the real host name that represents <address1> of the node that is to be changed. hostnameY must be changed to the real host name that represents <address2> of the node that is to be changed. For example, you could get hostnameX from the output of /usr/lpp/mmfs/bin/mmlscluster (the value displayed in the Admin node name column):

Node  Daemon node name       IP address  Admin node name        Designation
   1  gpfstest10.cn.ibm.com  ...         gpfstest10.cn.ibm.com  quorum
   2  gpfstest11.cn.ibm.com  ...         gpfstest11.cn.ibm.com
   3  gpfstest12.cn.ibm.com  ...         gpfstest12.cn.ibm.com

Run mmchnode --daemon-interface=gpfstest10g.cn.ibm.com -N gpfstest10.cn.ibm.com to change the daemon interface of gpfstest10.cn.ibm.com to gpfstest10g.cn.ibm.com. Here, gpfstest10g.cn.ibm.com replaces hostnameY. Do this for all nodes accordingly.

3. Start all services in Ambari.

After completing the preceding steps, run a system monitor tool such as nmon to confirm that there is obvious network traffic over the IBM Spectrum Scale network adapter when you write data into IBM Spectrum Scale.
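nmon is one option; any interface-counter tool can confirm the traffic split. A sketch using sar from the sysstat package, assuming the file system is mounted at /bigpfs (adjust the path and sizes):

dd if=/dev/zero of=/bigpfs/tmp/nettest bs=1M count=1024 &    # generate write traffic into IBM Spectrum Scale
sar -n DEV 5 5    # per-interface throughput; the IBM Spectrum Scale adapter should carry the bulk of it
rm -f /bigpfs/tmp/nettest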

D. BigInsights Value Add Services on IBM Spectrum Scale

For IBM Spectrum Scale, some minor adjustments are required to the standard BigInsights Value Add installation instructions.

1. Perform the preparation steps for the BigInsights Value Adds.
2. After installing the BigInsights Analyst (BI-Analyst-IOP-x-x-x.rpm) and Data Scientist (BI-DS-IOP-x-x-x.rpm) RPMs, the ambari-server resource stacks are updated in /var/lib/ambari-server/resources/stacks/BigInsights/4.1/repos/repoinfo.xml. Modify the file to reference your local repo for the BigInsights value adds. For example:

<repo>
  <baseurl><your-local-repo-url></baseurl>
  <repoid>biginsights-valuepack</repoid>
  <reponame>biginsights-valuepack</reponame>
</repo>

3. Look for the same file in the IBM Spectrum Scale stack subdirectory, /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml, and add the same repo definition as in the prior step. Insert the same local repository information in the correct OS version section.
4. Reset the YUM cached packages and headers, and restart the Ambari server:

yum clean all; yum makecache
ambari-server restart
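After the restart, a quick check that the new repository is visible to YUM (the grep pattern assumes the repo id used in the example above):

yum repolist enabled | grep -i valuepack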

Troubleshooting Value Add Services

Why can't BigSheets see files in IBM Spectrum Scale? Perform the following changes and restart BigSheets.

Step 1: Ensure that the GPFS connector (hadoop-gpfs) is at the required version or later. To determine the currently installed version, use:

rpm -qa | grep connector

If the installed connector does not meet this requirement, see the additional resources below to acquire the latest available connector.

Step 2: Post install, the BigSheets application must link in the IBM Spectrum Scale Hadoop jar file. Perform the following on the BigSheets master node:

cd /usr/ibmpacks/bigsheets/<version>/jetty/lib/ext/
ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar

Step 3: If you are using the Knox Demo LDAP, the user ID used to authenticate with Knox must also be created at the operating system level on all nodes. The Knox demo LDAP is for demonstration purposes only and does not integrate with the operating system of the Hadoop nodes; therefore, users created in the Knox demo LDAP do not exist outside of Knox. If both Knox and the operating systems of all Hadoop nodes are configured for the same LDAP server, this step is not required.
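After creating the link, it is worth confirming that it resolves (a sketch; <version> as above):

ls -lL /usr/ibmpacks/bigsheets/<version>/jetty/lib/ext/hadoop-gpfs.jar    # should list the real jar, not a dangling link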

Issues with Text Analytics

Problem: Text Analytics cannot see files in IBM Spectrum Scale.

Solution: The Text Analytics application needs to load hadoop-gpfs.jar. Perform the following change and restart Text Analytics. In the file /usr/ibmpacks/text-analytics-web-tooling/<version>/jetty-distribution-<version>/contexts/textanalytics-web-tooling.xml, add the hadoop-gpfs.jar entry (shown here within the existing extraClasspath value):

<Set name="extraClasspath">/usr/iop/current/hadoop-client/conf,/usr/iop/current/hadoop-client/hadoop-gpfs.jar</Set>

Problem: Text Analytics Run on Cluster failed with error: "The specified extractors could not be executed due to an unexpected error. Verify that you have read and write access to the directories you have selected, and that the file names you have entered are valid."

Solution: Check the logs located under /usr/ibmpacks/text-analytics-web-tooling/3.8/jetty-distribution-<version>/logs. If you see exceptions like:

java.io.FileNotFoundException: GPFSC00007E: File does not exist: /user/tauser/lib

you must manually create the directory by taking the following steps:

Step 1: Log in to the Text Analytics node.
Step 2: su tauser
Step 3: hadoop dfs -mkdir /user/tauser/lib
Step 4: Copy all jar packages under /usr/ibmpacks/text-analytics-runtime/4.10/lib and /usr/ibmpacks/current/text-analytics-runtime/action-api/lib/ and their subdirectories into /<gpfs-mount-point>/user/tauser/lib/.

Problem: Text Analytics Run on Cluster fails with error: "The specified extractors are not executed because a valid cluster configuration could not be found. Contact the system administrator to verify that the cluster is running correctly and that the cluster configuration is available on your server. If you indicated that the execution artifacts must be generated, those artifacts must still be available at the specified location, even though the extractors were not executed."

Solution: This is a known issue for BigInsights V4.1 and is specific to IBM Spectrum Scale support. Later versions might not have this problem. If you encounter this error, please contact IBM support and reference this defect.
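One way to script Steps 2 through 4 using the paths shown above (a sketch, not the only way to do the copy):

su - tauser
hadoop dfs -mkdir /user/tauser/lib
find /usr/ibmpacks/text-analytics-runtime/4.10/lib \
     /usr/ibmpacks/current/text-analytics-runtime/action-api/lib \
     -name '*.jar' -exec cp {} /<gpfs-mount-point>/user/tauser/lib/ \;    # copies jars from the directories and their subdirectories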

Issues with Big R

1. Big R for GPFS requires HTTPFS to be configured as a workaround for WebHDFS. Verify the HTTPFS configuration according to Appendix A.
2. On all nodes, make hadoop-gpfs.jar available to Big R: edit /usr/ibmpacks/bigr/<version>/bigr-jaql/<version>/conf/jaql.xml and add hadoop-gpfs.jar to the list of jars in the jaql.job.jars name/value section. For example:

<name>jaql.job.jars</name>
<value>jaql.jar,commons-lang-2.5.jar,icu4j.jar,...,hadoop-gpfs.jar</value>

3. On all nodes, create a symbolic link to hadoop-gpfs.jar in two places:

cd /usr/ibmpacks/bigr/<version>/bigr-jaql/<version>/lib
ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar
cd /usr/ibmpacks/bigr/<version>/bigr-bigsql1/<version>/lib/ext
ln -s /usr/iop/current/hadoop-client/hadoop-gpfs.jar

4. Validate the permissions on <gpfs_mount>/tmp/bigr. Ensure that <gpfs_mount>/tmp/bigr is writable by all users:

chmod 777 <gpfs_mount>/tmp/bigr

5. Restart Big R and run the Big R service check.
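Steps 2 and 3 must be repeated on every node; mmdsh can fan the link creation out from one node (a sketch; <version> as above):

/usr/lpp/mmfs/bin/mmdsh -N all "ln -sf /usr/iop/current/hadoop-client/hadoop-gpfs.jar /usr/ibmpacks/bigr/<version>/bigr-jaql/<version>/lib/"
/usr/lpp/mmfs/bin/mmdsh -N all "ln -sf /usr/iop/current/hadoop-client/hadoop-gpfs.jar /usr/ibmpacks/bigr/<version>/bigr-bigsql1/<version>/lib/ext/"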

E. Node management

Add Node

New nodes can be added via the Ambari web GUI.

1. Access the Hosts tab in the Ambari GUI (illustration 5.a), and click Add New Hosts from the Actions button.
2. Specify the new node information, and then click Register and Confirm. Note: In illustration 5.b, the SSH Private Key is the key of the user on the Ambari server.
3. Select the services that you need to install on this new node.

4. If several configuration groups were created, select one of them for the new node.
5. Start the deployment by clicking Deploy.

6. Restart the IBM Spectrum Scale Master service on the IBM Spectrum Scale master node. This action updates the pagepool configuration on all the new nodes. If several nodes are added, restart once, after all nodes are deployed.
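The restart can also be driven through the Ambari REST API instead of the GUI; a hedged sketch in which the host, cluster name, and GPFS service name are placeholders to adjust for your installation:

# stop, then start, the GPFS service
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop GPFS"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://<ambari-server>:8080/api/v1/clusters/<cluster_name>/services/GPFS
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start GPFS"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://<ambari-server>:8080/api/v1/clusters/<cluster_name>/services/GPFS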

7. Ambari does not create NSDs on the new nodes. To create IBM Spectrum Scale NSDs and add them to the file system, follow the instructions at References: Deploy BigInsights IOP 4.1 over IBM Spectrum Scale.

Remove Node

Note: Before removing a node, check that the following conditions are met:

1. The remaining IBM Spectrum Scale file system free space is enough for all data.
2. The number of quorum nodes is enough to keep the IBM Spectrum Scale cluster up.
3. The number of failure groups is greater than the number of data replicas.

Automatically removing an IBM Spectrum Scale node is not supported in this release; some manual steps are required. This example shows how to remove node compute002.
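A quick pre-check of those three conditions, assuming the file system is named bigpfs as in the examples:

/usr/lpp/mmfs/bin/mmdf bigpfs          # free space per pool
/usr/lpp/mmfs/bin/mmlscluster          # count the quorum nodes
/usr/lpp/mmfs/bin/mmlsfs bigpfs -r -R  # default and maximum data replicas, to compare against the failure groups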

1. Click the node name on the Hosts tab (Figure 18 ambari hosts panel).

FIGURE 18 AMBARI HOSTS PANEL

2. Stop all services on this node except the GPFS Node component (Figure 19 ambari hosts gpfs node components).

FIGURE 19 AMBARI HOSTS GPFS NODE COMPONENTS

3. Log on to the node terminal through SSH. Then remove the NSDs from the IBM Spectrum Scale file system. If this node is not an NSD node, skip this step.

[root@compute002 ~]# cat nsd.stanza
%pool: pool=system layoutMap=cluster blockSize=256K
%pool: pool=datapool layoutMap=cluster blockSize=2048K allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128
%nsd: nsd=gpfs33nsd device=/dev/sdd servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
%nsd: nsd=gpfs34nsd device=/dev/sde servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
%nsd: nsd=gpfs35nsd device=/dev/sdf servers=compute002 usage=dataOnly failureGroup=2,0,3 pool=datapool
[root@compute002 ~]# /usr/lpp/mmfs/bin/mmdeldisk bigpfs -F nsd.stanza
Deleting disks ...
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scanning file system metadata for datapool storage pool
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
 ... % complete on Wed Aug 19 04:00 ( ... inodes with total ... MB data processed)
 ... % complete on Wed Aug 19 04:00 ( ... inodes with total ... MB data processed)
Scan completed successfully.
Checking Allocation Map for storage pool system
Checking Allocation Map for storage pool datapool
tsdeldisk completed.
mmdeldisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
[root@compute002 ~]# /usr/lpp/mmfs/bin/mmdelnsd -F nsd.stanza
mmdelnsd: Processing disk gpfs33nsd
mmdelnsd: Processing disk gpfs34nsd
mmdelnsd: Processing disk gpfs35nsd
mmdelnsd: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.

4. Stop the IBM Spectrum Scale Node service from the Ambari GUI (Figure 20 ambari hosts actions).

FIGURE 20 AMBARI HOSTS ACTIONS

5. Log on to the IBM Spectrum Scale master node. Then remove the node from the IBM Spectrum Scale cluster.

~]# /usr/lpp/mmfs/bin/mmhadoopctl connector stop
Hadoop connector 'gpfs-connector-daemon' stopped.
~]# ssh compute001
Last login: Thu Aug 20 05:23 from compute000
~]# /usr/lpp/mmfs/bin/mmhadoopctl connector detach --distribution BigInsights -N compute002
DISTRIBUTION=biginsights
VERSION=4.1
ARCH=Linux-amd64-64
CONNECTOR_DIR=/usr/lpp/mmfs/hadoop
SRC_JAR=/usr/lpp/mmfs/hadoop/hadoop-gpfs-<version>.jar
JAR_FILE=hadoop-gpfs.jar
JAR_DIR=/usr/iop/current/hadoop-client
OOZIE_SERVER_DIR=/usr/iop/current/oozie-server
SOLR_SERVER_DIR=/usr/iop/current/solr-server/server
SLIDER_JAR_DIR=/usr/iop/current/slider-client/lib/
From compute002: Remove connector
rm -f /usr/iop/current/hadoop-client/hadoop-gpfs.jar succeeded.
rm -f /usr/iop/current/oozie-server/libext/hadoop-gpfs.jar succeeded.
rm -f /usr/iop/current/slider-client/lib//hadoop-gpfs.jar succeeded.

~]# /usr/lpp/mmfs/bin/mmdelnode compute002
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
~]# ssh compute002
Last login: Wed Aug 19 04:09 from compute001
~]# yum erase gpfs.*
Loaded plugins: product-id, security, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Setting up Remove Process
Resolving Dependencies
--> Running transaction check
---> Package gpfs.base.x86_64 will be erased
---> Package gpfs.crypto.x86_64 will be erased
---> Package gpfs.docs.noarch will be erased
---> Package gpfs.ext.x86_64 will be erased
---> Package gpfs.gpl.noarch will be erased
---> Package gpfs.gskit.x86_64 will be erased
---> Package gpfs.hadoop-connector.x86_64 will be erased
---> Package gpfs.msg.en_us.noarch will be erased
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                 Arch      Version   Repository   Size
================================================================================
Removing:
 gpfs.base               x86_64    ...       ...          42 M
 gpfs.crypto             x86_64    ...       ...          ... k
 gpfs.docs               noarch    ...       ...          1.5 M
 gpfs.ext                x86_64    ...       ...          ... M
 gpfs.gpl                noarch    ...       ...          2.5 M
 gpfs.gskit              x86_64    ...       ...          28 M
 gpfs.hadoop-connector   x86_64    ...       ...          ... k
 gpfs.msg.en_us          noarch    ...       ...          601 k

Transaction Summary
================================================================================
Remove 8 Package(s)

Installed size: 83 M
Is this ok [y/n]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Erasing : gpfs.crypto.x86_64    1/8
  Erasing : gpfs.gpl.noarch       2/8
make[1]: Entering directory `/usr/lpp/mmfs/src'
rm -f -rf usr
make[2]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux'
make[1]: Leaving directory `/usr/lpp/mmfs/src'

  Erasing : gpfs.ext.x86_64              3/8
  Erasing : gpfs.hadoop-connector.x86_64 4/8
  Erasing : gpfs.base.x86_64             5/8
  Erasing : gpfs.msg.en_us.noarch        6/8
  Erasing : gpfs.docs.noarch             7/8
  Erasing : gpfs.gskit.x86_64            8/8
  Verifying : gpfs.gpl.noarch              1/8
  Verifying : gpfs.gskit.x86_64            2/8
  Verifying : gpfs.crypto.x86_64           3/8
  Verifying : gpfs.base.x86_64             4/8
  Verifying : gpfs.ext.x86_64              5/8
  Verifying : gpfs.hadoop-connector.x86_64 6/8
  Verifying : gpfs.docs.noarch             7/8
  Verifying : gpfs.msg.en_us.noarch        8/8

Removed:
  gpfs.base.x86_64  gpfs.crypto.x86_64  gpfs.docs.noarch  gpfs.ext.x86_64
  gpfs.gpl.noarch  gpfs.gskit.x86_64  gpfs.hadoop-connector.x86_64  gpfs.msg.en_us.noarch

Complete!

6. Log on to the node and stop the ambari-agent.

[root@compute002 ~]# ambari-agent stop
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Found ambari-agent PID: ...
Stopping ambari-agent
Removing PID file at /var/run/ambari-agent/ambari-agent.pid
ambari-agent successfully stopped

7. Delete the host using the Ambari GUI (Figure 21 ambari hosts actions delete host).

Notes: By removing this host, Ambari ignores future communications from it. Software packages are not removed from the host, and the components on the host must not be restarted. If you wish to add this host back to the cluster later, clean it first.

FIGURE 21 AMBARI HOSTS ACTIONS DELETE HOST

8. Clean the software packages. This is an optional step.

~]# python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users -f /etc/ambari-agent/conf/hostcleanup.ini,/etc/ambari-agent/conf/hostcleanup_custom_actions.ini
WARNING:HostCleanup:No alternatives found for: flume-conf
WARNING:HostCleanup:No alternatives found for: hadoop-conf
WARNING:HostCleanup:No alternatives found for: hadoop-httpfs-conf
WARNING:HostCleanup:No alternatives found for: hadoop-httpfs-tomcat-conf
WARNING:HostCleanup:No alternatives found for: hbase-conf
WARNING:HostCleanup:No alternatives found for: hive-webhcat-conf
WARNING:HostCleanup:No alternatives found for: hive-conf
WARNING:HostCleanup:No alternatives found for: hive-hcatalog-conf
WARNING:HostCleanup:No alternatives found for: kafka-conf
WARNING:HostCleanup:No alternatives found for: knox-conf
WARNING:HostCleanup:No alternatives found for: oozie-conf
WARNING:HostCleanup:No alternatives found for: oozie-tomcat-conf
WARNING:HostCleanup:No alternatives found for: pig-conf
WARNING:HostCleanup:No alternatives found for: spark-conf
WARNING:HostCleanup:No alternatives found for: sqoop-conf
WARNING:HostCleanup:No alternatives found for: zookeeper-conf
INFO:HostCleanup:Clean-up completed. The output is at /var/lib/ambari-agent/data/hostcleanup.result

F. Upgrade IBM Spectrum Scale to Latest PTF

You can update the IBM Spectrum Scale and IBM Spectrum Scale Hadoop Connector PTF packages through the Ambari server. Cross-release upgrade is not supported. The IBM Spectrum Scale PTF and the IBM Spectrum Scale Hadoop connector can be upgraded separately.

1. Put all PTF RPM packages, or the IBM Spectrum Scale Hadoop connector PTF RPM, in the IBM Spectrum Scale YUM repository that was created in chapter 2.3. In most cases, the IBM Spectrum Scale PTF package includes the following packages:

gpfs.base-*.ppc64le.update.rpm
gpfs.crypto-4.1.x-1.ppc64le.update.rpm
gpfs.docs-*.noarch.rpm
gpfs.ext-*.ppc64le.update.rpm
gpfs.gpl-*.noarch.rpm
gpfs.gskit-*.ppc64le.rpm
gpfs.msg.en_us-*.noarch.rpm

2. Go to the IBM Spectrum Scale YUM directory and rebuild the YUM database by using the createrepo command:

[root@compute000 GPFS]# createrepo .
Spawning worker 0 with 8 pkgs
Workers Finished
Gathering worker results
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
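Before moving on, a quick check that YUM now sees the PTF packages from this repository:

yum clean all
yum list available 'gpfs*' --showduplicates    # the PTF versions should appear in the list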

3. Log on to the Ambari web GUI and stop all services, including IBM Spectrum Scale.
4. Click IBM Spectrum Scale, and then click Upgrade_SpectrumScale in the Service Actions drop-down list. If you need to upgrade the IBM Spectrum Scale Hadoop connector, click Upgrade_Connector instead.

FIGURE 22 AMBARI UPGRADE IBM SPECTRUM SCALE

G. Upgrade the IBM Spectrum Scale Ambari integration module

If the Ambari cluster has been installed with the gpfs.ambari-iop_4.1-1 package, you can upgrade to the gpfs.ambari-iop_4.1-2 release to get more powerful features such as IBM Spectrum Scale upgrade, file system monitoring, and file system debug info collection. To upgrade from gpfs.ambari-iop_4.1-1, perform the following steps:

1. Stop all services through the Ambari GUI.

FIGURE 23 AMBARI DASHBOARD ACTIONS STOP ALL

2. Log on to the Ambari server node and delete the IBM Spectrum Scale service via the Ambari REST API:

~]# curl -u admin:admin -X DELETE -H "X-Requested-By: ambari" http://<ambari-server-host>:8080/api/v1/clusters/[cluster_name]/services/gpfs

3. Copy the gpfs.ambari-iop_4.1-2 package to the Ambari server node.
4. Upgrade the gpfs.ambari package to 4.1-2. If /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml was updated, it will be kept; ignore the warning.

~]# ./gpfs.ambari-iop_4.1-2.noarch.bin -u q
Unpacking... Done
Upgrading...
Preparing...                ################################# [100%]
Updating / installing...
   1:gpfs.ambari-iop_4.1-2  ################################# [ 50%]
warning: /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml created as /var/lib/ambari-server/resources/stacks/BigInsights/4.1.SpectrumScale/repos/repoinfo.xml.rpmnew
Cleaning up / removing...
   2:gpfs.ambari-iop_4.1-1  ################################# [100%]

5. Restart the Ambari server.
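Optionally, confirm that the service definition is gone before re-adding it (same credentials and placeholder as above; an HTTP 404 "not found" response is the expected result):

~]# curl -u admin:admin -H "X-Requested-By: ambari" http://<ambari-server-host>:8080/api/v1/clusters/[cluster_name]/services/gpfs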

~]# ambari-server restart
Using python /usr/bin/python2.7
Restarting ambari-server
Using python /usr/bin/python2.7
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python2.7
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start...
Ambari Server 'start' completed successfully.

6. Start the IBM Spectrum Scale cluster and verify that the file system is mounted.

~]# /usr/lpp/mmfs/bin/mmstartup -a
Wed Nov 25 21:05:09 EST 2015: mmstartup: Starting GPFS ...
~]# /usr/lpp/mmfs/bin/mmlsmount all -L
File system bigpfs is mounted on 4 nodes:
  ...  compute...
  ...  compute...
  ...  compute...
  ...  compute...

7. Add the IBM Spectrum Scale service from the Ambari GUI.

a. Click '+ Add Service' from the 'Actions' drop-down list on the Dashboard tab.

FIGURE 24 AMBARI DASHBOARD ADD SERVICES

b. Select IBM Spectrum Scale on the Choose Services page.

FIGURE 25 AMBARI UPGRADE CHOOSE SERVICES

c. Select the same IBM Spectrum Scale master node that was used before the upgrade.

FIGURE 26 AMBARI ADD SERVICE WIZARD

d. Assign all nodes to the IBM Spectrum Scale Hadoop connector and the IBM Spectrum Scale node.

FIGURE 27 AMBARI ASSIGN NODES - HADOOP CONNECTOR + IBM SPECTRUM SCALE NODE

e. Verify that all the current IBM Spectrum Scale parameters are correct on the Customize Services page. If they are not, check whether the IBM Spectrum Scale cluster is functioning.

FIGURE 28 AMBARI CUSTOMIZE SERVICES VERIFICATION

f. Review the summary and start the deployment. IBM Spectrum Scale is not actually deployed; this step only adds IBM Spectrum Scale back to the Ambari server.

FIGURE 29 AMBARI REVIEW PANEL

g. Start the other services. After deployment, the IBM Spectrum Scale service is started automatically. Click Start All from the Actions drop-down list on the Dashboard tab.

FIGURE 30 AMBARI AFTER UPGRADE DASHBOARD

H. IBM Spectrum Scale UI

The IBM Spectrum Scale summary page in Ambari contains a Quick Links menu with an item that opens the IBM Spectrum Scale UI in a new tab. The IBM Spectrum Scale GUI is not installed or configured by Ambari; the link is merely provided in case the administrator wants to set the GUI up.

If you are running IBM Spectrum Scale 4.2 or later, the rpms required to install the GUI are included in the Standard and Advanced Editions for Linux on x86 and Power (Big Endian or Little Endian). The GUI requires RHEL 7. Installation instructions are available on IBM Knowledge Center: http://www-01.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_manualinstallofgui.htm

If you are running IBM Spectrum Scale 4.1, an Open Beta of the GUI is available.

Ambari assumes that the GUI is running on the same node as the IBM Spectrum Scale Master. If you are using ESS, Ambari assumes that the ESS GUI has been installed on the node specified in /var/lib/ambari-server/resources/shared_gpfs_node.cfg. The host and the port that Ambari links to can be configured by setting gpfs.webui.address in the gpfs-advance configuration. If this value is changed after the initial cluster deployment, refresh the browser window where the Ambari GUI is running so that the change takes effect.
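For illustration, if the GUI were running on a separate host (gui01.example.com is a hypothetical name), the Quick Link could be repointed with a setting along these lines in the gpfs-advance configuration:

gpfs.webui.address=https://gui01.example.com:443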

I. Collecting the snap data

IBM Spectrum Scale snap data can be collected from the Ambari GUI. The command is run by the IBM Spectrum Scale Master, and the snap data is saved to /var/log/ambari.gpfs.snap.<timestamp> on the IBM Spectrum Scale Master node.

It is also possible to override the default behavior of this snap by providing the arguments to be given to the gpfs.snap command in the file /var/lib/ambari-server/resources/gpfs.snap.args. By default, the IBM Spectrum Scale Master runs the following command:

/usr/lpp/mmfs/bin/gpfs.snap -d /var/log/ambari.gpfs.snap.<timestamp> -N <all nodes> --check-space --timeout 600

where <all nodes> is the list of nodes that are both in the IBM Spectrum Scale cluster and in the Ambari cluster. External nodes in a shared cluster, such as ESS servers, are not included.

To override these default arguments, specify the arguments to be passed to gpfs.snap in /var/lib/ambari-server/resources/gpfs.snap.args. For example, to write the snap data to a different location, collect snap data from all nodes in the cluster, and increase the timeout, provide a gpfs.snap.args file similar to the following:

[root@mn01]# cat /var/lib/ambari-server/resources/gpfs.snap.args
-d /root/gpfs.snap.out -a --timeout 1200

You can see the output from the snap command, and learn which directory the snap data was written to, by looking at the output file from Ambari.

FIGURE 31 AMBARI COLLECT SNAP DATA


More information

IBM Spectrum Scale. HDFS Transparency Guide. IBM Spectrum Scale. March 9, 2017

IBM Spectrum Scale. HDFS Transparency Guide. IBM Spectrum Scale. March 9, 2017 IBM Spectrum Scale HDFS Transparency Guide IBM Spectrum Scale March 9, 2017 1 Contents Contents... 2 1. Overview... 4 2. Supported IBM Spectrum Scale Storage modes... 5 2.1. Local storage mode... 5 2.2.

More information

To configure the patching repository so that it can copy patches to alternate locations, use SFTP, SCP, FTP, NFS, or a premounted file system.

To configure the patching repository so that it can copy patches to alternate locations, use SFTP, SCP, FTP, NFS, or a premounted file system. Configuring Protocols to Stage and 1 Deploy Linux and UNIX Patches VCM supports patching of managed machines in distributed environments, either geographically or separated by firewalls. VCM uses a single

More information

Upgrade Instructions. NetBrain Integrated Edition 7.0

Upgrade Instructions. NetBrain Integrated Edition 7.0 NetBrain Integrated Edition 7.0 Upgrade Instructions Version 7.0b1 Last Updated 2017-11-14 Copyright 2004-2017 NetBrain Technologies, Inc. All rights reserved. Contents 1. System Overview... 3 2. System

More information

VMware vsphere Big Data Extensions Administrator's and User's Guide

VMware vsphere Big Data Extensions Administrator's and User's Guide VMware vsphere Big Data Extensions Administrator's and User's Guide vsphere Big Data Extensions 1.1 This document supports the version of each product listed and supports all subsequent versions until

More information

vrealize Suite Lifecycle Manager 1.0 Installation and Management vrealize Suite 2017

vrealize Suite Lifecycle Manager 1.0 Installation and Management vrealize Suite 2017 vrealize Suite Lifecycle Manager 1.0 Installation and Management vrealize Suite 2017 vrealize Suite Lifecycle Manager 1.0 Installation and Management You can find the most up-to-date technical documentation

More information

VMware AirWatch Content Gateway for Linux. VMware Workspace ONE UEM 1811 Unified Access Gateway

VMware AirWatch Content Gateway for Linux. VMware Workspace ONE UEM 1811 Unified Access Gateway VMware AirWatch Content Gateway for Linux VMware Workspace ONE UEM 1811 Unified Access Gateway You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/

More information

Polarion Enterprise Setup 17.2

Polarion Enterprise Setup 17.2 SIEMENS Polarion Enterprise Setup 17.2 POL005 17.2 Contents Terminology......................................................... 1-1 Overview...........................................................

More information

Hortonworks Technical Preview for Apache Falcon

Hortonworks Technical Preview for Apache Falcon Architecting the Future of Big Data Hortonworks Technical Preview for Apache Falcon Released: 11/20/2013 Architecting the Future of Big Data 2013 Hortonworks Inc. All Rights Reserved. Welcome to Hortonworks

More information

Red Hat Gluster Storage 3

Red Hat Gluster Storage 3 Red Hat Gluster Storage 3 Console Installation Guide Installing Red Hat Storage Console Last Updated: 2017-10-18 Red Hat Gluster Storage 3 Console Installation Guide Installing Red Hat Storage Console

More information

VMware AirWatch Content Gateway Guide for Linux For Linux

VMware AirWatch Content Gateway Guide for Linux For Linux VMware AirWatch Content Gateway Guide for Linux For Linux Workspace ONE UEM v9.7 Have documentation feedback? Submit a Documentation Feedback support ticket using the Support Wizard on support.air-watch.com.

More information

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Deployment Guide

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Deployment Guide Hitachi Hyper Scale-Out Platform (HSP) MK-95HSP017-03 11 October 2016 2016 Hitachi, Ltd. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic

More information

Migrating vrealize Automation 6.2 to 7.2

Migrating vrealize Automation 6.2 to 7.2 Migrating vrealize Automation 6.2 to 7.2 vrealize Automation 7.2 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.

More information

Upgrade Tool Guide. July

Upgrade Tool Guide. July Upgrade Tool Guide July 2015 http://www.liveaction.com 4.X to 5.0 The Upgrade Guide from 4.X to 5.0 consists of three parts: Upgrading the LiveAction Server Upgrading the LiveAction Node Upgrading the

More information

RG-MACC_2.0 Installation Manual

RG-MACC_2.0 Installation Manual RG-MACC_2.0 Installation Manual Ruijie Networks Co., Ltd all rights reserved 1 Copyright Clarify Copyright ownership belongs to Ruijie, shall not be reproduced, copied, or used in other ways without permission.

More information

Installing Cisco CMX in a VMware Virtual Machine

Installing Cisco CMX in a VMware Virtual Machine Installing Cisco CMX in a VMware Virtual Machine This chapter describes how to install and deploy a Cisco Mobility Services Engine (CMX) virtual appliance. Cisco CMX is a prebuilt software solution that

More information

SECURE Gateway with Microsoft Azure Installation Guide. Version Document Revision 1.0

SECURE  Gateway with Microsoft Azure Installation Guide. Version Document Revision 1.0 SECURE Email Gateway with Microsoft Azure Installation Guide Version 4.7.0 Document Revision 1.0 Copyright Revision 1.0, November, 2017 Published by Clearswift Ltd. 1995 2017 Clearswift Ltd. All rights

More information

StreamSets Control Hub Installation Guide

StreamSets Control Hub Installation Guide StreamSets Control Hub Installation Guide Version 3.2.1 2018, StreamSets, Inc. All rights reserved. Table of Contents 2 Table of Contents Chapter 1: What's New...1 What's New in 3.2.1... 2 What's New in

More information

Eucalyptus User Console Guide

Eucalyptus User Console Guide Eucalyptus 3.4.1 User Console Guide 2013-12-11 Eucalyptus Systems Eucalyptus Contents 2 Contents User Console Overview...5 Install the Eucalyptus User Console...6 Install on Centos / RHEL 6.3...6 Configure

More information

Quick Setup Guide. NetBrain Integrated Edition 7.0. Distributed Deployment

Quick Setup Guide. NetBrain Integrated Edition 7.0. Distributed Deployment NetBrain Integrated Edition 7.0 Quick Setup Guide Distributed Deployment Version 7.0b1 Last Updated 2017-11-08 Copyright 2004-2017 NetBrain Technologies, Inc. All rights reserved. Contents 1. System Overview...

More information

Installing and Configuring vcloud Connector

Installing and Configuring vcloud Connector Installing and Configuring vcloud Connector vcloud Connector 2.6.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new

More information

IBM Operations Analytics - Log Analysis Version 1.3. Installation and Administration Guide

IBM Operations Analytics - Log Analysis Version 1.3. Installation and Administration Guide IBM Operations Analytics - Log Analysis Version 1.3 Installation and Administration Guide IBM Operations Analytics - Log Analysis Version 1.3 Installation and Administration Guide Note Before using this

More information

Polarion 18 Enterprise Setup

Polarion 18 Enterprise Setup SIEMENS Polarion 18 Enterprise Setup POL005 18 Contents Terminology......................................................... 1-1 Overview........................................................... 2-1

More information

WatchGuard Dimension v2.1.1 Update 3 Release Notes

WatchGuard Dimension v2.1.1 Update 3 Release Notes WatchGuard Dimension v2.1.1 Update 3 Release Notes Build Number 567758 Release Date 8 August 2018 Release Notes Revision Date 8 August 2018 On 8 August 2018, WatchGuard released the Dimension v2.1.1 Update

More information

Red Hat Development Suite 2.2

Red Hat Development Suite 2.2 Red Hat Development Suite 2.2 Installation Guide Installing Red Hat Development Suite Last Updated: 2018-03-23 Red Hat Development Suite 2.2 Installation Guide Installing Red Hat Development Suite Petra

More information

KNIME Extension for Apache Spark Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on )

KNIME Extension for Apache Spark Installation Guide. KNIME AG, Zurich, Switzerland Version 3.7 (last updated on ) KNIME Extension for Apache Spark Installation Guide KNIME AG, Zurich, Switzerland Version 3.7 (last updated on 2018-12-10) Table of Contents Introduction.....................................................................

More information

Using The Hortonworks Virtual Sandbox Powered By Apache Hadoop

Using The Hortonworks Virtual Sandbox Powered By Apache Hadoop Using The Hortonworks Virtual Sandbox Powered By Apache Hadoop This work by Hortonworks, Inc. is licensed under a Creative Commons Attribution ShareAlike3.0 Unported License. Legal Notice Copyright 2012

More information

High-Performance Analytics Infrastructure 2.5

High-Performance Analytics Infrastructure 2.5 SAS High-Performance Analytics Infrastructure 2.5 Installation and Configuration Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2014. SAS High-Performance

More information

Hortonworks Cybersecurity Platform

Hortonworks Cybersecurity Platform Upgrade Guide () docs.hortonworks.com Hortonworks Cybersecurity : Upgrade Guide Copyright 2012-2018 Hortonworks, Inc. Some rights reserved. Hortonworks Cybersecurity (HCP) is a modern data application

More information

EMC ISILON HADOOP STARTER KIT

EMC ISILON HADOOP STARTER KIT EMC ISILON HADOOP STARTER KIT Deploying IBM BigInsights v 4.0 with EMC ISILON Boni Bruno, CISSP, CISM, CGEIT Chief Solutions Architect October, 2015 #RememberRuddy To learn more about how EMC products,

More information

Red Hat Development Suite 2.1

Red Hat Development Suite 2.1 Red Hat Development Suite 2.1 Installation Guide Installing Red Hat Development Suite Last Updated: 2017-12-06 Red Hat Development Suite 2.1 Installation Guide Installing Red Hat Development Suite Petra

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform IOP to HDP Migration (December 15, 2017) docs.hortonworks.com Hortonworks Data Platform: IOP to HDP Migration Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

More information

3.6. How to Use the Reports and Data Warehouse Capabilities of Red Hat Enterprise Virtualization. Last Updated:

3.6. How to Use the Reports and Data Warehouse Capabilities of Red Hat Enterprise Virtualization. Last Updated: Red Hat Enterprise Virtualization 3.6 Reports and Data Warehouse Guide How to Use the Reports and Data Warehouse Capabilities of Red Hat Enterprise Virtualization Last Updated: 2017-09-27 Red Hat Enterprise

More information

KNIME Extension for Apache Spark Installation Guide

KNIME Extension for Apache Spark Installation Guide Installation Guide KNIME GmbH Version 2.3.0, July 11th, 2018 Table of Contents Introduction............................................................................... 1 Supported Hadoop distributions...........................................................

More information

SAML-Based SSO Configuration

SAML-Based SSO Configuration Prerequisites, page 1 SAML SSO Configuration Task Flow, page 5 Reconfigure OpenAM SSO to SAML SSO Following an Upgrade, page 9 SAML SSO Deployment Interactions and Restrictions, page 9 Prerequisites NTP

More information

VIRTUAL GPU LICENSE SERVER VERSION AND 5.1.0

VIRTUAL GPU LICENSE SERVER VERSION AND 5.1.0 VIRTUAL GPU LICENSE SERVER VERSION 2018.06 AND 5.1.0 DU-07754-001 _v6.0 through 6.2 July 2018 User Guide TABLE OF CONTENTS Chapter 1. Introduction to the NVIDIA vgpu Software License Server... 1 1.1. Overview

More information

DUCC Installation and Verification Excerpt From Complete DUCC Documentation

DUCC Installation and Verification Excerpt From Complete DUCC Documentation DUCC Installation and Verification Excerpt From Complete DUCC Documentation Written and maintained by the Apache UIMA TM Development Community Copyright c 2012 The Apache Software Foundation Copyright

More information

CONFIGURING IBM STORWIZE. for Metadata Framework 6.3

CONFIGURING IBM STORWIZE. for Metadata Framework 6.3 CONFIGURING IBM STORWIZE for Metadata Framework 6.3 Publishing Information Software version 6.3.160 Document version 4 Publication date May 22, 2017 Copyright 2005-2017 Varonis Systems Inc. All rights

More information

IBM Single Sign On for Bluemix Version December Identity Bridge Configuration topics

IBM Single Sign On for Bluemix Version December Identity Bridge Configuration topics IBM Single Sign On for Bluemix Version 2.0 28 December 2014 Identity Bridge Configuration topics IBM Single Sign On for Bluemix Version 2.0 28 December 2014 Identity Bridge Configuration topics ii IBM

More information

NetXplorer. Installation Guide. Centralized NetEnforcer Management Software P/N D R3

NetXplorer. Installation Guide. Centralized NetEnforcer Management Software P/N D R3 NetXplorer Centralized NetEnforcer Management Software Installation Guide P/N D357006 R3 Important Notice Important Notice Allot Communications Ltd. ("Allot") is not a party to the purchase agreement

More information

VIRTUAL GPU LICENSE SERVER VERSION

VIRTUAL GPU LICENSE SERVER VERSION VIRTUAL GPU LICENSE SERVER VERSION 5.0.0.22575570 DU-07754-001 _v5.0 through 5.2 January 2018 User Guide TABLE OF CONTENTS Chapter 1. Introduction to the NVIDIA Virtual GPU Software License Server...1

More information

Hortonworks Data Platform

Hortonworks Data Platform Apache Ambari Views () docs.hortonworks.com : Apache Ambari Views Copyright 2012-2017 Hortonworks, Inc. All rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source

More information

Genesys Administrator Extension Deployment Guide. Prerequisites for Genesys Administrator Extension Modules

Genesys Administrator Extension Deployment Guide. Prerequisites for Genesys Administrator Extension Modules Genesys Administrator Extension Deployment Guide Prerequisites for Genesys Administrator Extension Modules 7/1/2018 Prerequisites for Genesys Administrator Extension Modules Contents 1 Prerequisites for

More information

Downloading and installing Db2 Developer Community Edition on Ubuntu Linux Roger E. Sanders Yujing Ke Published on October 24, 2018

Downloading and installing Db2 Developer Community Edition on Ubuntu Linux Roger E. Sanders Yujing Ke Published on October 24, 2018 Downloading and installing Db2 Developer Community Edition on Ubuntu Linux Roger E. Sanders Yujing Ke Published on October 24, 2018 This guide will help you download and install IBM Db2 software, Data

More information

Deploying VMware Identity Manager in the DMZ. SEPT 2018 VMware Identity Manager 3.3

Deploying VMware Identity Manager in the DMZ. SEPT 2018 VMware Identity Manager 3.3 Deploying VMware Identity Manager in the DMZ SEPT 2018 VMware Identity Manager 3.3 You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/ If you have

More information