Part II (c) Desktop Installation
Desktop Installation

Supported Platforms
Required Software
Releases & Mirror Sites
Configure
Format
Start/Stop
Verify
Supported Platforms

GNU/Linux: supported for both development and production; demonstrated on a 2,000-node cluster.
Win32: development only; not supported as a production platform.
Required Software

The following must be installed first:
Java 1.6.x or higher
ssh (Ubuntu: ssh and rsync; Windows: openssh)
Releases and Mirror Sites

Releases: http://hadoop.apache.org/releases.html
Latest stable release: 2.7.1 (July 2015)
Previous stable release: 2.6.0 (November 2014)
Earlier notable releases:
2.4.0 (April 2014)
2.2.0 (first GA release, October 2013)
1.0.0 (December 2011)
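Before unpacking a downloaded release, it is worth checking its integrity against the checksum published on the Apache release page. A minimal sketch, assuming the 2.6.0 tarball used later in this guide sits in the current directory (the guard skips the check if it does not):

```shell
# Compute the SHA-256 of the downloaded tarball and compare it by eye
# with the value published on the Apache release page.
# (Assumption: the tarball is in the current directory.)
if [ -f hadoop-2.6.0.tar.gz ]; then
  sha256sum hadoop-2.6.0.tar.gz
else
  echo "hadoop-2.6.0.tar.gz not found - download it first"
fi
```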
Mirror Sites

Downloads are available at several mirror sites. Apache suggests one at:
http://www.apache.org/dyn/closer.cgi/hadoop/common
The same page lists the other mirror sites.
Overview: Hadoop 2.6.0 on Ubuntu 14.04
(Ubuntu is the most popular Linux distribution.)

Step 1 - Install Java
Step 2 - Create a dedicated hadoop user
Step 3 - Install ssh
Step 4 - Create ssh certificates
Step 5 - Install Hadoop
Step 6 - Set up configuration files
Step 7 - Format
Step 8 - Start/Stop
Step 9 - Verify
Step 1 - Install Java

Log in as an admin user:
$ cd ~
# Update the source list
$ sudo apt-get update
$ sudo apt-get install default-jdk
# Verify the Java version is 1.6.0 or higher
$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Step 2 - Create a dedicated Hadoop user

# Create a hadoop group
$ sudo addgroup hadoop
Adding group `hadoop' (GID 1009) ...
Done.
# Create the hadoop user
$ sudo adduser --ingroup hadoop huser
Adding user `huser' ...
Adding new user `huser' (1001) with group `hadoop' ...
Creating home directory `/home/huser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for huser
Step 2 - Create a dedicated Hadoop user (continued)

Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
# Add the new user to sudoers
$ sudo adduser huser sudo
[sudo] password for admin:
Adding user `huser' to group `sudo' ...
Adding user huser to group sudo
Done.
Step 3 - Install SSH

$ sudo apt-get install ssh
# Verify SSH is installed
$ which ssh
/usr/bin/ssh
$ which sshd
/usr/sbin/sshd
Step 4 - Create SSH Certificates

$ sudo su huser
# Generate a key pair
$ ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/huser/.ssh/id_rsa):
Created directory '/home/huser/.ssh'.
Your identification has been saved in /home/huser/.ssh/id_rsa.
Your public key has been saved in /home/huser/.ssh/id_rsa.pub.
The key fingerprint is:
20:6c:f3:ff:0f:33:bf:30:72:c3:22:70:24:cc:2d:d3 huser@laptop
The key's randomart image is:
+--[ RSA 2048]----+ (randomart truncated)
Step 4 - Create SSH Certificates (continued)

# Add the public key to the list of authorized keys so that ssh
# to localhost does not prompt for a password
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
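sshd is strict about file permissions: if the .ssh directory or authorized_keys is group- or world-readable, passwordless login silently fails. A hedged sketch of tightening the permissions and testing the login (paths assume the huser account created above):

```shell
# sshd refuses keys that are group/world readable, so tighten
# permissions before testing the login
if [ -d "$HOME/.ssh" ]; then
  chmod 700 "$HOME/.ssh"
  if [ -f "$HOME/.ssh/authorized_keys" ]; then
    chmod 600 "$HOME/.ssh/authorized_keys"
  fi
fi
# A passwordless login should now succeed without prompting
if command -v ssh >/dev/null; then
  ssh -o StrictHostKeyChecking=no -o BatchMode=yes localhost 'echo ssh OK' \
    || echo "passwordless login not working yet"
fi
```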
Step 5 - Install Hadoop

# Download the distribution from a mirror site
$ sudo wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
# Extract the files
$ tar xvzf hadoop-2.6.0.tar.gz
$ cd hadoop-2.6.0
# Move the files to /usr/local/hadoop
$ sudo mkdir /usr/local/hadoop
$ sudo mv * /usr/local/hadoop
# Change ownership to the hadoop user
$ sudo chown -R huser:hadoop /usr/local/hadoop
Step 6 - Configure

# Check which Java the alternatives system points to
$ update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java):
/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
Nothing to configure.
# Note down the JAVA_HOME value
$ which javac
/usr/bin/javac
$ readlink -f /usr/bin/javac
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac
(Note: JAVA_HOME is everything before /bin/javac)
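The note above can be automated: stripping the trailing /bin/javac from the resolved compiler path yields JAVA_HOME. A sketch using shell parameter expansion (the fallback path is the example path from this slide, used only if javac is not installed):

```shell
# Resolve the real javac path; fall back to the slide's example
# path if javac is not installed on this machine
javac_path=$(readlink -f "$(command -v javac)" 2>/dev/null || true)
if [ -z "$javac_path" ]; then
  javac_path=/usr/lib/jvm/java-7-openjdk-amd64/bin/javac
fi
# JAVA_HOME is everything before /bin/javac
echo "JAVA_HOME=${javac_path%/bin/javac}"
```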
Step 6 - Configure

# Add the following variables to the end of .bashrc
$ vi ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
Step 6 - Configure (continued)

export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
# Execute the commands in .bashrc
$ source ~/.bashrc
Step 6 - Configure

# Configure hadoop-env.sh: change the JAVA_HOME variable
$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Step 6 - Configure

# Configure core-site.xml
# First create a tmp folder for hadoop
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown huser:hadoop /app/hadoop/tmp
# Modify core-site.xml
$ vi /usr/local/hadoop/etc/hadoop/core-site.xml
Step 6 - Configure

Modify as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The URI's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The URI's authority is used to determine the
    host, port, etc. for a filesystem.</description>
  </property>
</configuration>
Step 6 - Configure

# Configure mapred-site.xml
# First copy the file from the template provided
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
# Modify mapred-site.xml
$ vi /usr/local/hadoop/etc/hadoop/mapred-site.xml
Step 6 - Configure

Modify as follows:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker
    runs at.</description>
  </property>
</configuration>
Step 6 - Configure

# Configure hdfs-site.xml
# First create the directories for the name node and data node
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R huser:hadoop /usr/local/hadoop_store
# Modify hdfs-site.xml
$ vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Step 6 - Configure

Modify as follows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The default block replication. A different value may be
    specified when a file is created; this default applies only if none
    is given at file creation.</description>
  </property>
(continued on next slide)
Step 6 - Configure (continued)

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
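With several site files edited by hand, a typo in a &lt;name&gt; element makes Hadoop silently fall back to its defaults. A small hedged helper to confirm a property made it into a file (has_property is a name invented here, not a Hadoop command):

```shell
# Hypothetical helper: check that a property name appears in a
# Hadoop *-site.xml file ($1 = file, $2 = property name)
has_property() {
  grep -q "<name>$2</name>" "$1"
}

# Example usage against the file edited above (skipped if the
# install path from this guide does not exist on this machine)
f=/usr/local/hadoop/etc/hadoop/hdfs-site.xml
if [ -f "$f" ]; then
  if has_property "$f" dfs.replication; then
    echo "dfs.replication is set"
  else
    echo "dfs.replication is MISSING"
  fi
fi
```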
Step 7 - Format

$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/04/18 14:43:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = laptop/192.168.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop...
STARTUP_MSG: java = 1.7.0_65
************************************************************/
15/04/18 14:43:03 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/04/18 14:43:03 INFO namenode.NameNode: createNameNode [-format]
15/04/18 14:43:07 WARN util.NativeCode...
Step 8 - Start/Stop

To view the available commands:
$ ls /usr/local/hadoop/sbin
To start Hadoop (log in as the hadoop user), either:
$ start-all.sh (deprecated; starts all daemons)
or:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
Step 8 - Start/Stop (continued)

To stop Hadoop (log in as the hadoop user), either:
$ stop-all.sh
or:
$ stop-dfs.sh
$ stop-yarn.sh
$ mr-jobhistory-daemon.sh stop historyserver
Step 9 - Verify

To verify, list the running daemons:
$ jps
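In pseudo-distributed mode, jps should show the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManager), plus JobHistoryServer if it was started. A hedged sketch that scans the jps output for each (check_daemons is a name invented here):

```shell
# Hypothetical helper: report which expected daemons appear in
# the output of jps ($1 = that output, one "pid Name" per line)
check_daemons() {
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$1" | grep -qw "$d"; then
      echo "$d running"
    else
      echo "$d MISSING"
    fi
  done
}

# Only run against real jps output if the JDK is installed
if command -v jps >/dev/null; then
  check_daemons "$(jps)"
fi
```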
Step 9 - Verify (continued)

Web UI: uses port 50070,
e.g., http://ec2-54-86-169-214.compute-1.amazonaws.com:50070
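The web UI can also be probed from the command line. A sketch assuming the NameNode runs on this machine (localhost is an assumption; substitute your node's hostname, as in the EC2 example above):

```shell
# Quick liveness check of the NameNode web UI (port 50070 in
# Hadoop 2.x); prints the HTTP status code on success
if command -v curl >/dev/null; then
  curl -s -o /dev/null -w "HTTP %{http_code}\n" http://localhost:50070/ \
    || echo "web UI not reachable"
fi
```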
Quiz

1 - Which two of the following must be installed for the Hadoop installation to succeed?
a - ssh
b - java
c - ftp
d - ruby
2 - Which of the following Linux commands may be used to install software such as ssh or java?
a - get
b - put
c - apt-get
d - startall.sh
3 - Name the commands you would use to start the Hadoop daemons and to stop them.
Quiz (continued)

4 - Which of the following is NOT a Hadoop configuration file?
a - hadoop-env.sh
b - core-site.xml
c - mapred-site.xml
d - hadoop-configuration.xml
5 - True or false:
a - The user owning the Hadoop files must belong to a group called hadoop.
b - A Hadoop cluster running on a single node is said to be in pseudo-distributed mode.
c - A fully distributed Hadoop configuration should have a minimum of three nodes.