Hands-on Exercise Hadoop


Department of Economics and Business Administration
Chair of Business Information Systems I
Prof. Dr. Barbara Dinter
Big Data Management

Hands-on Exercise Hadoop: Building and Testing a Hadoop Cluster by Means of Apache Ambari

1. Configuration of the Cluster Nodes

First of all, decide which computer will be the master node and which computers will be the slave nodes of your future Hadoop cluster. Then configure the computers by installing an operating system (OS) on them. Pay attention to the following hints.

Hints:

For the purpose of this exercise we use CentOS (Community ENTerprise Operating System) in version 6.7 (64 bit), a ready-to-use enterprise Linux distribution. (1) Install the OS on the master and the first slave node in parallel. Once you are done with this task, install CentOS on the second slave node. It is recommended that during this process one student carefully writes down by hand (not electronically) all relevant information for each node, e.g. the IP address, the unique host name (Fully Qualified Domain Name; FQDN), the defined usernames and passwords, and the installed services.

Media test: Skip the media test. In our case it is not necessary and only costs time.

Assigning the host name: Configure the fully qualified domain name (FQDN) of the node as the host name (in the form <computer name>.<domain>.<top level domain>).
ATTENTION: On the host name screen you can scroll down a bit and select the option "Configure Network" on the left side. Do so and set up the configuration as follows: select eth0 or eth1 (the network interface controller), click "Edit", check the option "Connect automatically", and apply. Otherwise you would have to perform this step after the installation of CentOS and would therefore have no Internet connection immediately after the installation.

Assigning the root password: Choose a non-trivial password. However, choose a password that you can share with your team mates and/or your supervisor during the exercise. Do not store it anywhere electronically! This advice applies to all passwords you define during this exercise.

Installation type: Install CentOS in the Desktop version.

(1) CentOS can be downloaded for free at

Hands-on Exercise Hadoop, Page 1 of 9

The installation of the OS takes a considerable amount of time (about 18 minutes). Use this time to read through the following steps.

2. Configuration of CentOS

Perform the following configuration of CentOS simultaneously for the master and the first slave node. Pay attention to the following hints.

Hints:

License policy: Accept the license policy.

Choosing the username: Choose a username for the Linux system. It is recommended to choose intuitive names for the respective nodes, such as master, slave01, and slave02. Subsequently assign a password for this account (comply with the aforementioned password policy!).

Time settings: Activate the option "Synchronize the date and time via network" (Network Time Protocol; NTP). This is necessary because all nodes must be exactly synchronized.

Kdump: Deactivate Kdump. It is not needed in our use case and only wastes valuable computational resources.

HINT: Make your work a bit easier by deactivating the screen saver on each node.

Repeat this configuration process in parallel (in the background) for the second slave node.

3. Configuration of the Hadoop Cluster: Preparation of CentOS

Please download the installation guide for installing a Hortonworks Data Platform 2.3 (HDP 2.3) Hadoop cluster by means of Apache Ambari. You can find the installation guide at: works.com/hdpdocuments/ambari/bk_Installing_HDP_AMB/bk_Installing_HDP_AMB.pdf. Note that not all steps mentioned in the installation guide must be performed; many of them, for instance, deal with problems under operating systems other than CentOS. Think before you type!

Before you start installing your Hadoop cluster, take a short look at the Linux terminal commands which you may need during the cluster installation.

Linux terminal commands that you should have in mind:

You can find the terminal (also known as console or shell) under: Applications > System Tools > Terminal. (It is advisable to create a shortcut for the terminal on the system panel by right-clicking on the terminal entry and selecting "Add to Panel".) You can also set a key combination (e.g. F3 or Ctrl + T) to access the terminal directly via: System > Settings > Hot Keys > Desktop > Start a terminal.

Table 1 gives an overview of all Linux terminal commands which you may need during the exercise. You can find a more general and complete overview of these commands and their applications at:

Table 1: Overview of required Linux terminal commands

su (Super User): Changes the active user to the super user (root); this is necessary because some commands can only be run as root; the user is asked for the root password afterwards.
exit: Closes the terminal or the current session.
Tab key: Completes the input based on the files available on the working path.
clear: Clears the terminal window.
Ctrl + C: Cancels the terminal input or a running process (keyboard interrupt).
pwd (Print Working Directory): Shows the path of the current working directory.
ls (LiSt): Shows all files in the current working directory.
cd (Change Directory): Changes the working directory to another given location: cd <path> switches to the given path; cd / switches to the root directory; cd .. switches to the parent directory; cd - switches to the previous directory.
mkdir (MaKe DIRectory): Creates a new directory.
ifconfig: Shows the IP and the MAC address of the computer.
hostname: Shows the assigned host name of the node.
ssh (Secure SHell): Opens a remote connection to another node; the destination node is specified as <user>@<FQDN of destination node>.
scp (Secure CoPy): (Securely) copies data from a source path to a destination path; it is especially suitable for exchanging data among nodes.

Now start the installation by correctly configuring CentOS. Please follow the steps mentioned in the installation guide carefully and pay attention to the following hints. Pages 1-4 of the installation guide contain only introductory information; the section on page 5 is where you have to get active for the first time.

Hints:

If you do not know a command, you can open its manual via the terminal with the following command: man <command>

Preparation of OpenSSL (important!):
o First, update the OpenSSL version on all nodes.
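Before working with the installation guide, you can warm up with a few of the commands from Table 1 in a scratch directory. This is an illustrative sketch only; the path /tmp/hadoop-warmup is an invented example and not part of the exercise:

```shell
# Warm-up with commands from Table 1 (safe to run anywhere):
mkdir -p /tmp/hadoop-warmup/demo   # mkdir: create a directory (-p: no error if it exists)
cd /tmp/hadoop-warmup              # cd: change the working directory
pwd                                # pwd: print the current working directory
ls                                 # ls: list the files in it
hostname                           # hostname: show this node's host name
cd ..                              # cd ..: go back to the parent directory
```

Afterwards you can remove the scratch directory again with rm -r /tmp/hadoop-warmup.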
In order to do this, execute the following as root: yum update openssl

Regarding the section "Check the Maximum Open File Descriptors" of the HDP installation guide:
o The number of open file descriptors (i.e. the maximum number of files that can be open simultaneously) can be checked as root with the commands ulimit -Hn and ulimit -Sn.

Regarding the section "Set Up Password-less SSH" of the HDP installation guide:
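A quick way to inspect both file-descriptor limits at once (a minimal sketch; the value 10000 mentioned in the comment is only an example, not a value prescribed by the installation guide):

```shell
# Show the hard (-Hn) and soft (-Sn) limits on open file descriptors.
# Appending a number (e.g. "ulimit -n 10000") would raise the soft limit
# for the current shell session.
hard=$(ulimit -Hn)
soft=$(ulimit -Sn)
echo "hard limit: $hard"
echo "soft limit: $soft"
```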

o Please use your master node as the Ambari server.
o Generate the public SSH key as the root user (!).
o Afterwards, the key is available at: /root/.ssh/
o After setting the read and write permissions (step 4), additionally run the following command on all slave nodes: restorecon -R ~/.ssh
(NOTE: Password-less SSH should be set up not only for both slave nodes but also for the master node itself. With three cluster nodes, the corresponding section of the HDP installation guide must therefore be executed three times.)

Regarding the section "Enable NTP on the Cluster and on the Browser Host" of the HDP installation guide:
o Enable NTP: You should actually have done this step already during the installation of CentOS.

Regarding the section "Check DNS and NSCD" of the HDP installation guide:
o If you use the DNS of your institution: it is important that you assigned the FQDN correctly during the installation phase. Nevertheless, check the mentioned configuration files with respect to your FQDNs.
o Otherwise: the mapping between the chosen FQDNs and their IP addresses has to be maintained manually on each node. To do so, adapt the hosts file (found under /etc/hosts) on each node (cf. the section "Edit the Host File" of the HDP installation guide).

Regarding the section "Configuring iptables" of the HDP installation guide:
o You can only deactivate iptables when you are logged in as the root user. Run these commands on all nodes!

4. Configuration of the Hadoop Cluster: Installing Apache Ambari

The installation of the Apache Ambari server is covered from page 21 of the HDP installation guide onwards. Pay attention to the following hint.

Hint:

Regarding Section 2.2 ("Set Up the Ambari Server") of the HDP installation guide:
o Step 6, "Enter advanced Database configuration": enter n (default database).

5. Installation and Start of the Hadoop Cluster

The installation of the Hadoop cluster is explained from page 32 of the HDP installation guide onwards. Pay attention to the following hints:

Hints:

Regarding Section 3.5 ("Install Options") of the HDP installation guide:
o Target Hosts: Provide the FQDNs of all nodes, including the master node (!).
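If you maintain the FQDN-to-IP mapping manually, each node's /etc/hosts file ends up with one line per cluster node. The following sketch builds the expected format in a scratch file; the host names and addresses are invented examples, and on the real nodes you would append such lines to /etc/hosts as root:

```shell
# Build the example mapping in a scratch file first; as root you would
# append these lines to /etc/hosts on every node of the cluster instead.
cat > /tmp/hosts.example <<'EOF'
192.168.0.10  master.bigdata.local   master
192.168.0.11  slave01.bigdata.local  slave01
192.168.0.12  slave02.bigdata.local  slave02
EOF
# Verify that a node's FQDN appears in the file:
grep -q 'slave01.bigdata.local' /tmp/hosts.example && echo "mapping present"
```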

o SSH Private Key: To be able to enter the SSH private key, copy (as root) the file /root/.ssh/id_rsa to the desktop (e.g. /home/master/Desktop) and transfer the file ownership to the master user (e.g. chown -c master id_rsa). Subsequently, you can select and open the key via the web interface.
OR:
o Open the id_rsa file as root in the terminal using the cat command, mark the text, copy it, and paste it into the Ambari web interface.

Regarding Section 3.6 ("Confirm Hosts") of the HDP installation guide:
o Confirm Hosts: Pay attention to potential errors and/or warnings after the clients have registered automatically. Do not move on to the next step before you have handled and solved all errors and warnings!

Regarding Section 3.7 ("Choose Services") of the HDP installation guide:
o Choose Services: To keep the installation process short, select only the Hadoop components and applications that are necessary for this exercise, namely HDFS, YARN + MapReduce2, and Pig. (2) Confirm all dependencies with OK.

Regarding Section 3.9 ("Assign Slaves and Clients") of the HDP installation guide:
o Assign Clients: To keep the installation process short, install the necessary HDFS and Pig clients only on your master node. As a result, you can execute HDFS commands and start Pig scripts only from your master node.

Regarding the section "Customize Services" of the HDP installation guide:
o Customize Services: Do not change any settings of the services. However, if you are interested, you can scroll through to see which settings could be selected. If you had selected the Hadoop application Apache Oozie for installation, you would have to provide a database user and a password in this step.

Regarding the section "Review" of the HDP installation guide:
o Deploy: Depending on how many Hadoop applications you have selected, the deploy step may take up to 10 minutes to complete. Hence, it is time for another short break.

6. Using and Testing the Hadoop Cluster

Use and test your freshly installed Hadoop cluster. First, read the following hints regarding the usage of the Hadoop Distributed File System (HDFS). Subsequently, solve the exercises.

Hints for using HDFS:

When the Hadoop cluster is installed automatically with Apache Ambari, a user account called hdfs is created by Ambari. The hdfs user has read and write permissions for the virtual Hadoop Distributed File System. With the terminal command passwd hdfs you can choose a new password for this account. Do so on your master node. After you have successfully changed the password, switch the CentOS user and log in as hdfs (do not log out, so that the Ambari server can still run in the background!). Solve the exercises using this account. Execute all HDFS commands as the hdfs user (not as root!) and use the HDFS directory /tmp/ for the following exercises.

(2) The applications Hive + HCatalog and HBase consume too many resources to be launched and are therefore not recommended for our test purposes.
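The copy-and-chown route for the private key can be scripted in one go. The sketch below uses scratch files in place of the real /root/.ssh/id_rsa and desktop paths, so it can be tried without root; on the actual master node you would substitute the real paths and run the chown shown in the comment:

```shell
# Simulate the private key with a scratch file; on the real system you
# would use /root/.ssh/id_rsa and /home/master/Desktop instead of /tmp.
mkdir -p /tmp/fake-root-ssh /tmp/fake-desktop
echo "-----BEGIN RSA PRIVATE KEY----- (placeholder)" > /tmp/fake-root-ssh/id_rsa
cp /tmp/fake-root-ssh/id_rsa /tmp/fake-desktop/id_rsa   # copy the key to the desktop
chmod 600 /tmp/fake-desktop/id_rsa                      # keep the key readable by its owner only
# On the real system, as root, you would now transfer ownership:
#   chown -c master /home/master/Desktop/id_rsa
ls -l /tmp/fake-desktop/id_rsa
```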

With the following terminal commands you can put data on HDFS:
hdfs dfs -copyFromLocal foo.txt /tmp/
get data from HDFS and put it on your physical local file system:
hdfs dfs -copyToLocal /tmp/wordcountoutput/part-r-00000 result.txt
or show your data in the terminal window:
hdfs dfs -cat /tmp/wordcountoutput/part-r-00000
The following command lists the content of an HDFS directory:
hdfs dfs -ls /tmp/
An extensive overview of all HDFS terminal commands can be found on the HDFS cheat sheet of the book (Dirk deRoos, 2014: Hadoop for Dummies) at to/content/hadoop-for-dummies-cheat-sheet.html.

Exercise 1: Airline On-time Performance

Implement the HDFS example "Airline on-time performance" from the book (Dirk deRoos, 2014: Hadoop for Dummies, Chapter 13) on your own Hadoop cluster. The following steps are of interest:
o Downloading the sample dataset
o Copying the sample dataset into HDFS
o Your first Hadoop program: Hello Hadoop!
For your inputs and outputs, use the HDFS directory /tmp/ rather than the one mentioned in the book.
NOTE: The Pig script in the book contains "\" characters to indicate line breaks. These backslashes are not part of the actual script and should therefore be ignored. In addition, the script contains a small bug: the path of the input data in the LOAD command (first line) must be enclosed in single quotation marks (cf. the structure of the following Pig script). If they are missing, an error is shown in the terminal window.

Exercise 2: The Word Count Example

Apply the knowledge you gained in the first exercise to implement another Pig script in which you count the frequency of words in a text ("Word Count Example"). Use the freely available RFC 7230 ("Hypertext Transfer Protocol (HTTP/1.1)") as the input text file. Your Pig script should have the following structure (3):

a = load '/foo.txt';
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
c = group b by word;
d = foreach c generate COUNT(b), group;
store d into '/output';

Subsequently, answer the following questions:
2.1 Which essential text mining step have you just applied?
2.2 Where are the map and reduce phases located in the script?
2.3 Have a careful look at the physical representation of the HDFS blocks in your file system. Navigate as root to the following directory (the name of the fifth subdirectory varies, depending on the time and date of the file generation as well as the IP address of the node): /hadoop/hdfs/data/current/bp-<nb>-<ip>-<datetime>/current/finalized. With the terminal command ls -lh you can see all files located in the current working directory, including their sizes in bytes. Compare the directories on your master and slave nodes with each other! (4)

Exercise 3: Performance Tests

3.1 How long does your Hadoop cluster take to compute the word count for the aforementioned RFC 7230?

(3) Sample code obtained from: tutorial/word-counting-with-apache-pig/.
(4) Reminder: The default replication factor of HDFS is 3.
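To get a feel for what the Pig script above computes, the same word count can be sketched with standard shell tools on the local file system. This is illustrative only; it runs locally and involves neither HDFS nor MapReduce, and the sample sentence is an invented example:

```shell
# Tokenize, group, and count words, roughly mirroring the Pig pipeline:
# tr splits the text into one word per line (the TOKENIZE/flatten step),
# sort brings equal words together (the GROUP BY step),
# uniq -c counts each group (the COUNT step).
printf 'to be or not to be\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

Running this prints each distinct word with its frequency, most frequent first, just as the part-r-00000 output file of the Pig job lists count/word pairs.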

3.2 How long does your Hadoop cluster take to compute the word count of a significantly shorter text, e.g. a text of roughly one sentence in length (determination of the administrative overhead)?
3.3 How long does your Hadoop cluster take to compute the word count for a significantly longer text, on a scale of 100 to 1,000 times the size of RFC 7230?

Exercise 4: Extension of an Existing Hadoop Cluster

4.1 Consider the following question: Which steps are needed to add another slave node to your existing Hadoop cluster?
4.2 Which scaling category does this method correspond to?

Additional Exercises: Explorative Learning

The following exercises are supplementary. Start solving Exercise 5 once your group has completed all four previous exercises. If several groups are done with their exercises and there is still some time left (~20 minutes), move on to Exercise 6 and work on it collectively.

Exercise 5: The Hadoop Ecosystem

Try to run a Hadoop application other than Pig on your Hadoop cluster. For this purpose, search for a short tutorial on the Internet yourself. (5)
NOTE: You can add further services to your Hadoop cluster via the Ambari web dashboard: Actions > + Add Services.

Exercise 6: Think Big!

Connect all available nodes in your laboratory into one single Hadoop cluster. Finally, repeat your performance tests from Exercise 2. Can you observe a performance boost?
NOTE: With the following command you can reset all settings on your nodes that were made by the Ambari installation wizard:
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent

(5) A good starting point for free Hadoop tutorials: sandbox/.


Troubleshooting Cisco APIC-EM Multi-Host

Troubleshooting Cisco APIC-EM Multi-Host The following procedures may be used to troubleshoot a Cisco APIC-EM multi-host configuration: Changing the Settings in a Multi-Host Cluster, page 1 Removing a Single Host from a Multi-Host Cluster, page

More information

202 Lab Introduction Connecting to the Lab Environment

202 Lab Introduction Connecting to the Lab Environment 202 Lab Introduction Connecting to the Lab Environment Objectives During this v7.1 Deployment lab, each student (from the Blue group or Green group) must verify access (and permissions) to their assigned

More information

Lab Working with Linux Command Line

Lab Working with Linux Command Line Introduction In this lab, you will use the Linux command line to manage files and folders and perform some basic administrative tasks. Recommended Equipment A computer with a Linux OS, either installed

More information

A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science

A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science Introduction The Hadoop cluster in Computing Science at Stirling allows users with a valid user account to submit and

More information

Setting up a Chaincoin Masternode

Setting up a Chaincoin Masternode Setting up a Chaincoin Masternode Introduction So you want to set up your own Chaincoin Masternode? You ve come to the right place! These instructions are correct as of April, 2017, and relate to version

More information

Installation 1. DLM Installation. Date of Publish:

Installation 1. DLM Installation. Date of Publish: 1 DLM Installation Date of Publish: 2018-05-18 http://docs.hortonworks.com Contents Installation overview...3 Setting Up the Local Repository for Your DLM Installation... 3 Set up a local repository for

More information

Part I. Introduction to Linux

Part I. Introduction to Linux Part I Introduction to Linux 7 Chapter 1 Linux operating system Goal-of-the-Day Familiarisation with basic Linux commands and creation of data plots. 1.1 What is Linux? All astronomical data processing

More information

Bitnami JRuby for Huawei Enterprise Cloud

Bitnami JRuby for Huawei Enterprise Cloud Bitnami JRuby for Huawei Enterprise Cloud Description JRuby is a 100% Java implementation of the Ruby programming language. It is Ruby for the JVM. JRuby provides a complete set of core built-in classes

More information

Hortonworks Technical Preview for Apache Falcon

Hortonworks Technical Preview for Apache Falcon Architecting the Future of Big Data Hortonworks Technical Preview for Apache Falcon Released: 11/20/2013 Architecting the Future of Big Data 2013 Hortonworks Inc. All Rights Reserved. Welcome to Hortonworks

More information

Managing and Monitoring a Cluster

Managing and Monitoring a Cluster 2 Managing and Monitoring a Cluster Date of Publish: 2018-04-30 http://docs.hortonworks.com Contents ii Contents Introducing Ambari operations... 5 Understanding Ambari architecture... 5 Access Ambari...

More information

In this exercise you will practice working with HDFS, the Hadoop. You will use the HDFS command line tool and the Hue File Browser

In this exercise you will practice working with HDFS, the Hadoop. You will use the HDFS command line tool and the Hue File Browser Access HDFS with Command Line and Hue Data Files (local): ~/labs/data/kb/* ~/labs/data/base_stations.tsv In this exercise you will practice working with HDFS, the Hadoop Distributed File System. You will

More information

Horizon DaaS Platform 6.1 Service Provider Installation - vcloud

Horizon DaaS Platform 6.1 Service Provider Installation - vcloud Horizon DaaS Platform 6.1 Service Provider Installation - vcloud This guide provides information on how to install and configure the DaaS platform Service Provider appliances using vcloud discovery of

More information

Linux Command Line Primer. By: Scott Marshall

Linux Command Line Primer. By: Scott Marshall Linux Command Line Primer By: Scott Marshall Draft: 10/21/2007 Table of Contents Topic Page(s) Preface 1 General Filesystem Background Information 2 General Filesystem Commands 2 Working with Files and

More information

Accessing Hadoop Data Using Hive

Accessing Hadoop Data Using Hive An IBM Proof of Technology Accessing Hadoop Data Using Hive Unit 3: Hive DML in action An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2015 US Government Users Restricted Rights -

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Beta. VMware vsphere Big Data Extensions Administrator's and User's Guide. vsphere Big Data Extensions 1.0 EN

Beta. VMware vsphere Big Data Extensions Administrator's and User's Guide. vsphere Big Data Extensions 1.0 EN VMware vsphere Big Data Extensions Administrator's and User's Guide vsphere Big Data Extensions 1.0 This document supports the version of each product listed and supports all subsequent versions until

More information

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Quick Reference Guide

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Quick Reference Guide Hitachi Hyper Scale-Out Platform (HSP) MK-95HSP013-03 14 October 2016 2016 Hitachi, Ltd. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic

More information

Introduction to Linux Workshop 2. The George Washington University SEAS Computing Facility

Introduction to Linux Workshop 2. The George Washington University SEAS Computing Facility Introduction to Linux Workshop 2 The George Washington University SEAS Computing Facility Course Goals SSH and communicating with other machines Public/Private key generation,.ssh directory, and the config

More information

Innovatus Technologies

Innovatus Technologies HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String

More information

Commands are in black

Commands are in black Starting From the Shell Prompt (Terminal) Commands are in black / +--------+---------+-------+---------+---------+------ +------ +------ +------ +------ +------ +-- Bin boot dev etc home media sbin bin

More information

Azure Marketplace. Getting Started Tutorial. Community Edition

Azure Marketplace. Getting Started Tutorial. Community Edition Azure Marketplace Getting Started Tutorial Community Edition Introduction NooBaa software provides a distributed storage solution for unstructured data such as analytics data, multi-media, backup, and

More information

Me CloudTM: Getti g Started Guide

Me CloudTM: Getti g Started Guide Me CloudTM: Getti g Started Guide November 2016 Version 1.16 Kodiak Data, Inc. 2570 W El Camino Real Suite 500 Mountain View, CA 94040 Phone: (650) 383-8374 support@kodiakdata.com www.kodiakdata.com Copyright

More information

Linux Kung Fu. Stephen James UBNetDef, Spring 2017

Linux Kung Fu. Stephen James UBNetDef, Spring 2017 Linux Kung Fu Stephen James UBNetDef, Spring 2017 Introduction What is Linux? What is the difference between a client and a server? What is Linux? Linux generally refers to a group of Unix-like free and

More information

Bitnami ez Publish for Huawei Enterprise Cloud

Bitnami ez Publish for Huawei Enterprise Cloud Bitnami ez Publish for Huawei Enterprise Cloud Description ez Publish is an Enterprise Content Management platform with an easy to use Web Content Management System. It includes role-based multi-user access,

More information

CST8207: GNU/Linux Operating Systems I Lab Six Linux File System Permissions. Linux File System Permissions (modes) - Part 1

CST8207: GNU/Linux Operating Systems I Lab Six Linux File System Permissions. Linux File System Permissions (modes) - Part 1 Student Name: Lab Section: Linux File System Permissions (modes) - Part 1 Due Date - Upload to Blackboard by 8:30am Monday March 12, 2012 Submit the completed lab to Blackboard following the Rules for

More information

OPS235: Week 1. Installing Linux ( Lab1: Investigations 1-4)

OPS235: Week 1. Installing Linux ( Lab1: Investigations 1-4) OPS235: Week 1 Installing Linux ( Lab1: Investigations 1-4) 1 Agenda: Lab 1 Thinking Ahead (Tips / Warnings): Required Materials / Coming Prepared to Labs Importance of Mastering the CLI (Command Line

More information

Getting Started 1. Getting Started. Date of Publish:

Getting Started 1. Getting Started. Date of Publish: 1 Date of Publish: 2018-07-03 http://docs.hortonworks.com Contents... 3 Data Lifecycle Manager terminology... 3 Communication with HDP clusters...4 How pairing works in Data Lifecycle Manager... 5 How

More information

Unix Tutorial Haverford Astronomy 2014/2015

Unix Tutorial Haverford Astronomy 2014/2015 Unix Tutorial Haverford Astronomy 2014/2015 Overview of Haverford astronomy computing resources This tutorial is intended for use on computers running the Linux operating system, including those in the

More information

Tutorial 1: Unix Basics

Tutorial 1: Unix Basics Tutorial 1: Unix Basics To log in to your ece account, enter your ece username and password in the space provided in the login screen. Note that when you type your password, nothing will show up in the

More information

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights

More information

SPLUNK ENTERPRISE AND ECS TECHNICAL SOLUTION GUIDE

SPLUNK ENTERPRISE AND ECS TECHNICAL SOLUTION GUIDE SPLUNK ENTERPRISE AND ECS TECHNICAL SOLUTION GUIDE Splunk Frozen and Archive Buckets on ECS ABSTRACT This technical solution guide describes a solution for archiving Splunk frozen buckets to ECS. It also

More information

Bitnami Re:dash for Huawei Enterprise Cloud

Bitnami Re:dash for Huawei Enterprise Cloud Bitnami Re:dash for Huawei Enterprise Cloud Description Re:dash is an open source data visualization and collaboration tool. It was designed to allow fast and easy access to billions of records in all

More information

eftp Application User Guide

eftp Application User Guide Team A eftp User Guide 1/30 eftp Application User Guide Table of Contents Page 1. Acknowledgement 2 2. Introduction a. Welcome eftp Audience 3 b. What s in this manual 3 c. Manual Conventions 3 d. Getting

More information

CSE 101 Introduction to Computers Development / Tutorial / Lab Environment Setup

CSE 101 Introduction to Computers Development / Tutorial / Lab Environment Setup CSE 101 Introduction to Computers Development / Tutorial / Lab Environment Setup Purpose: The purpose of this lab is to setup software that you will be using throughout the term for learning about Python

More information

Bitnami Piwik for Huawei Enterprise Cloud

Bitnami Piwik for Huawei Enterprise Cloud Bitnami Piwik for Huawei Enterprise Cloud Description Piwik is a real time web analytics software program. It provides detailed reports on website visitors: the search engines and keywords they used, the

More information

Linux Systems Administration Getting Started with Linux

Linux Systems Administration Getting Started with Linux Linux Systems Administration Getting Started with Linux Network Startup Resource Center www.nsrc.org These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International

More information

Hortonworks DataFlow

Hortonworks DataFlow Hortonworks DataFlow Installing HDF Services on a New HDP Cluster for IBM (December 22, 2017) docs.hortonworks.com Hortonworks DataFlow: Installing HDF Services on a New HDP Cluster for IBM Power Systems

More information

We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : About Quality Thought We are

More information

Plexxi HCN Plexxi Connect Installation, Upgrade and Administration Guide Release 3.0.0

Plexxi HCN Plexxi Connect Installation, Upgrade and Administration Guide Release 3.0.0 Plexxi HCN Plexxi Connect Installation, Upgrade and Administration Guide Release 3.0.0 May 3, 2018 100 Innovative Way - Suite 3322 Nashua, NH 03062 Tel. +1.888.630.PLEX (7539) www.plexxi.com Legal Notices

More information

System Manager Unit (SMU) Hardware Reference

System Manager Unit (SMU) Hardware Reference System Manager Unit (SMU) Hardware Reference MK-92HNAS065-02 Notices and Disclaimer Copyright 2015 Hitachi Data Systems Corporation. All rights reserved. The performance data contained herein was obtained

More information

TDDE31/732A54 - Big Data Analytics Lab compendium

TDDE31/732A54 - Big Data Analytics Lab compendium TDDE31/732A54 - Big Data Analytics Lab compendium For relational databases lab, please refer to http://www.ida.liu.se/~732a54/lab/rdb/index.en.shtml. Description and Aim In the lab exercises you will work

More information

CS/CIS 249 SP18 - Intro to Information Security

CS/CIS 249 SP18 - Intro to Information Security Lab assignment CS/CIS 249 SP18 - Intro to Information Security Lab #2 - UNIX/Linux Access Controls, version 1.2 A typed document is required for this assignment. You must type the questions and your responses

More information

CS 1110, LAB 1: EXPRESSIONS AND ASSIGNMENTS First Name: Last Name: NetID:

CS 1110, LAB 1: EXPRESSIONS AND ASSIGNMENTS   First Name: Last Name: NetID: CS 1110, LAB 1: EXPRESSIONS AND ASSIGNMENTS http://www.cs.cornell.edu/courses/cs1110/2018sp/labs/lab01/lab01.pdf First Name: Last Name: NetID: Learning goals: (1) get hands-on experience using Python in

More information

Exercise #1: ANALYZING SOCIAL MEDIA AND CUSTOMER SENTIMENT WITH APACHE NIFI AND HDP SEARCH INTRODUCTION CONFIGURE AND START SOLR

Exercise #1: ANALYZING SOCIAL MEDIA AND CUSTOMER SENTIMENT WITH APACHE NIFI AND HDP SEARCH INTRODUCTION CONFIGURE AND START SOLR Exercise #1: ANALYZING SOCIAL MEDIA AND CUSTOMER SENTIMENT WITH APACHE NIFI AND HDP SEARCH INTRODUCTION We will use Solr and the LucidWorks HDP Search to view our streamed data in real time to gather insights

More information

Test Lab Introduction to the Test Lab Linux Cluster Environment

Test Lab Introduction to the Test Lab Linux Cluster Environment Test Lab 1.0 - Introduction to the Test Lab Linux Cluster Environment Test lab is a set of three disposable cluster environments that can be used for systems research. All three environments are accessible

More information

Hortonworks Data Platform

Hortonworks Data Platform Data Governance () docs.hortonworks.com : Data Governance Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com : Getting Started Guide Copyright 2012, 2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,

More information

Bitnami Ruby for Huawei Enterprise Cloud

Bitnami Ruby for Huawei Enterprise Cloud Bitnami Ruby for Huawei Enterprise Cloud Description Bitnami Ruby Stack provides a complete development environment for Ruby on Rails that can be deployed in one click. It includes most popular components

More information

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.

Big Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ. Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop

More information

Bitnami HHVM for Huawei Enterprise Cloud

Bitnami HHVM for Huawei Enterprise Cloud Bitnami HHVM for Huawei Enterprise Cloud Description HHVM is an open source virtual machine designed for executing programs written in Hack and PHP. HHVM uses a just-in-time (JIT) compilation approach

More information

Introduction to UNIX command-line

Introduction to UNIX command-line Introduction to UNIX command-line Boyce Thompson Institute March 17, 2015 Lukas Mueller & Noe Fernandez Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions

More information

INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)

INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?) PER STRICKER, THOMAS KALB 07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN 08.02.2017, DB2 FORUM USER GROUP, DALLAS INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?) Copyright

More information