Hands-on Exercise Hadoop
Department of Economics and Business Administration
Chair of Business Information Systems I
Prof. Dr. Barbara Dinter

Big Data Management
Hands-on Exercise Hadoop: Building and Testing a Hadoop Cluster by Means of Apache Ambari

1. Configuration of the Cluster Nodes

First of all, define which computer is the master node and which computers are the slave nodes of your future Hadoop cluster. Then configure the computers by installing an operating system (OS) on them. Pay attention to the following hints.

Hints:

- For the purpose of this exercise, we use CentOS (Community Enterprise Operating System) in version 6.7 (64 bit), a ready-to-use enterprise Linux distribution [1]. Install the OS on the master and the first slave node in parallel. Once you are done with this task, move on and install CentOS on the second slave node. It is recommended that during this process one student manually (not electronically) and carefully writes down all relevant information for each node, e.g. the IP address, the unique host name (Fully Qualified Domain Name; FQDN), defined usernames and passwords, and installed services.
- Media test: Skip the media test. In our case it is not necessary and only consumes a lot of time, which can be saved by skipping it.
- Assigning the host name: Configure the fully qualified domain name (FQDN) of the node as the host name (in the form <computer name>.<domain>.<top level domain>). ATTENTION: When assigning the host name, scroll the page down a bit and select the option "Configure Network" on the left side. Set up the configuration as follows: eth0 or eth1 (network interface controller) > Edit > check the option "Connect automatically" > Apply. Otherwise, you would have to perform this step after the installation of CentOS and would therefore have no internet connection right after the installation.
- Assigning the root password: Please choose a non-trivial password.
  However, choose a password which you can share with your team mates and/or your supervisor during the exercise. Do not save it anywhere electronically! This advice applies to all passwords you define during this exercise.
- Installation type: Install CentOS in the Desktop version.

[1] CentOS can be downloaded for free at

Hands-on Exercise Hadoop, Page 1 of 9
The installation of the OS takes a considerable amount of time (about 18 minutes). Use the waiting time to read through the following steps.

2. Configuration of CentOS

Perform the following configuration of CentOS simultaneously for the master and the first slave node. Pay attention to the following hints.

Hints:

- License policy: Accept the license policy.
- Choosing the username: Choose a username for the Linux system. It is recommended to choose intuitive names for the related nodes, like master, slave01, and slave02. Subsequently assign a password for this account (comply with the aforementioned password policy!).
- Time settings: Activate the option "Synchronize the date and time via network" (Network Time Protocol; NTP). This is necessary because we want all nodes to be exactly synchronized.
- Kdump: Deactivate Kdump. It is not necessary in our use case and only wastes valuable computational resources.
- HINT: Make your work a bit easier by deactivating the screen saver on each node.

Repeat this configuration process in parallel (in the background) for the second slave node.

3. Configuration of the Hadoop Cluster: Preparation of CentOS

Please download the installation guide for installing a Hortonworks Data Platform 2.3 (HDP 2.3) Hadoop cluster by using Apache Ambari. You can find the installation guide at: works.com/hdpdocuments/ambari /bk_Installing_HDP_AMB/bk_Installing_HDP_AMB pdf. Note that not all of the steps mentioned in the installation guide must be performed. Among them are, for instance, many steps dealing with problems under operating systems other than CentOS. Think before you type!
Before you start installing your Hadoop cluster, take a short look at the Linux terminal commands which you may need during the cluster installation.

Linux terminal commands that you should have in mind:

You can find the terminal (also known as console or shell) under: Applications > System Tools > Terminal. (It is advised to create a shortcut for the terminal on the system panel by right-clicking on the terminal entry and selecting "Add to Panel".) You can also set a key combination (e.g. F3 or Ctrl+T) to access the terminal directly via: System > Settings > Hot Keys > Desktop > Start a terminal.

Table 1 gives an overview of all Linux terminal commands which you may need during the exercise. You can find a more general and complete overview of these commands and their applications at:
Table 1: Overview of required Linux terminal commands

- su (Super User): Changes the active user to the super user (root); this is necessary because in some cases the user must be root to run a command. The user is asked for the root password afterwards.
- exit: Closes the terminal or the current session.
- Tabulator key: Completes the input based on the available files located on the working path.
- clear: Clears the terminal window.
- Ctrl+C: Cancels the terminal input or a running process (keyboard interrupt).
- pwd (Print Working Directory): Shows the path of the current working directory.
- ls (LiSt): Shows all available files in the current working directory.
- cd (Change Directory): Changes the working directory to another given location: cd <path> switches to the given path; cd / switches to the root directory; cd .. switches to the parent directory; cd - switches to the previous directory.
- mkdir (MaKe DIRectory): Creates a new directory.
- ifconfig: Shows the IP and the MAC address of the computer.
- hostname: Shows the assigned host name of the node.
- ssh (Secure SHell): Opens a remote connection to another node, whereby the destination node is addressed as <user>@<FQDN of destination node>.
- scp (Secure CoPy): (Securely) copies data from a source path to a destination path. It is especially suitable for exchanging data among nodes.

Now, start the installation by correctly configuring CentOS. Please follow the steps mentioned in the installation guide carefully and pay attention to the following hints. Pages 1-4 of the installation guide only contain some introductory information. In Section (page 5) you have to become active for the first time.

Hints:

- If you don't know a command, you can open its manual via the terminal with the following command: man <command>
- Preparation of OpenSSL (important!):
  o First update the OpenSSL version on all nodes.
    In order to do this, execute as root: yum update openssl
- Regarding Section "Check the Maximum Open File Descriptors" of the HDP installation guide:
  o The maximum number of open file descriptors (= the number of files which can be held open simultaneously) can be checked as root with the commands ulimit -Hn (hard limit) and ulimit -Sn (soft limit), and raised if necessary with ulimit -n <value>.
- Regarding Section "Set Up Password-less SSH" of the HDP installation guide:
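As a compact illustration of this section and of the DNS hints that follow, here is a sketch of generating the key pair and of a manual FQDN-to-IP mapping. All host names, IP addresses, and paths are placeholders; on the real nodes the key is generated as root (so it lands in /root/.ssh/) and the mapping is appended to /etc/hosts.

```shell
# Scratch directory so nothing outside it is touched.
scratch=$(mktemp -d)

# 1) Generate an RSA key pair non-interactively
#    (-N "" = empty passphrase, required for password-less login).
ssh-keygen -t rsa -N "" -f "$scratch/id_rsa" -q

# On the real nodes the public key is then distributed to every node,
# the master included, e.g. with: ssh-copy-id root@slave01.bigdata.local
# followed on the slave nodes by:  restorecon -R ~/.ssh

# 2) Manual FQDN -> IP mapping (placeholder names and addresses);
#    written to a sample file here, appended to /etc/hosts on real nodes.
cat > "$scratch/hosts.sample" <<'EOF'
192.168.1.10  master.bigdata.local   master
192.168.1.11  slave01.bigdata.local  slave01
192.168.1.12  slave02.bigdata.local  slave02
EOF
```

Every node needs the same hosts entries, and each node must be able to reach every other node by its FQDN before Ambari registration will succeed.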
  o Please use your master node as the Ambari server.
  o Generate the public SSH key as the root user (!).
  o The key is available afterwards at: /root/.ssh/
  o After setting the read and write rights (step 4), additionally run the following command on all slave nodes: restorecon -R ~/.ssh
    (NOTE: Password-less SSH should not only be set up for both slave nodes, but also for the master node itself. The respective section of the HDP installation guide must therefore be executed three times in the case of three cluster nodes.)
- Regarding Section "Enable NTP on the Cluster and on the Browser Host" of the HDP installation guide:
  o Enable NTP: Actually, you should have already done this step during the installation of CentOS.
- Regarding Section "Check DNS and NSCD" of the HDP installation guide:
  o In case you use the DNS of your institution: It is important that you have assigned the FQDNs correctly during the installation phase. Nevertheless, check the mentioned configuration files with respect to your FQDNs.
  o Otherwise: The mapping between the chosen FQDNs and their related IP addresses has to be performed manually on each node. Therefore, the related hosts file (found under /etc/hosts) has to be adapted on each node (cf. Section "Edit the Host File" of the HDP installation guide).
- Regarding Section "Configuring iptables" of the HDP installation guide:
  o You can only deactivate iptables when you are logged in as the root user. Run these commands on all nodes!

4. Configuration of the Hadoop Cluster: Installing Apache Ambari

The installation of the Apache Ambari server is addressed from page 21 of the HDP installation guide onwards. Pay attention to the following hint.

Hint:

- Regarding Section 2.2 (Set Up the Ambari Server) of the HDP installation guide:
  o Step 6, "Enter advanced database configuration": Enter n (default database).

5. Installation and Start of the Hadoop Cluster

The installation of the Hadoop cluster is explained from page 32 of the HDP installation guide onwards.
Pay attention to the following hints.

Hints:

- Regarding Section 3.5 (Install Options) of the HDP installation guide:
  o Target Hosts: Provide the FQDNs of all nodes, including the master node (!).
  o SSH Private Key: In order to enter the SSH private key, copy the file /root/.ssh/id_rsa as the root user to the desktop (e.g. /home/master/Desktop) and transfer the file ownership to the master user (e.g. chown -c master id_rsa). Subsequently, you can select and open the key via the web interface.
    OR:
  o Open the id_rsa file as root in the terminal using the cat command. Mark the text, copy it, and paste it into the Ambari web interface.
- Regarding Section 3.6 (Confirm Hosts) of the HDP installation guide:
  o Confirm Hosts: Pay attention to potential errors and/or warnings after the clients have been registered automatically. Do not move on to the next step before you have handled and solved all errors and/or warnings!
- Regarding Section 3.7 (Choose Services) of the HDP installation guide:
  o Choose Services: In order to keep the installation process short, select only the Hadoop components and applications which are necessary for this exercise, namely HDFS, YARN + MapReduce2, and Pig [2]. Confirm all dependencies with OK.
- Regarding Section 3.9 (Assign Slaves and Clients) of the HDP installation guide:
  o Assign Clients: In order to keep the installation process short, install the necessary HDFS and Pig clients only on your master node. As a result, you can execute HDFS commands and start Pig scripts only from your master node.
- Regarding Section (Customize Services) of the HDP installation guide:
  o Customize Services: Do not change any settings of the services. However, you can scroll through to see what kind of settings you could select, if you are interested. If you had selected the Hadoop application Apache Oozie for installation, you would have to provide a database user and a password in this step.
- Regarding Section (Review) of the HDP installation guide:
  o Deploy: Depending on how many Hadoop applications you have selected, the deploy step may take up to 10 minutes to complete. Hence, it is time for another short break.

6. Using and Testing the Hadoop Cluster

Use and test your freshly installed Hadoop cluster. First, read the following hints regarding the usage of the Hadoop Distributed File System (HDFS). Subsequently, solve the exercises.

Hints for using HDFS:

- When the Hadoop cluster is installed automatically by means of Apache Ambari, a user account called hdfs is created by Ambari. The hdfs user has read and write permissions for the virtual Hadoop Distributed File System.
- By using the terminal command passwd hdfs you can choose a new password for this account. Do so on your master node.
- After you have successfully changed the password, switch the CentOS user and log in as hdfs (do not log out, so that the Ambari server can keep running in the background!). Solve the exercises using this account.
- Execute all HDFS commands as the hdfs user (not as root!) and use the HDFS directory /tmp/ for the following exercises.

[2] The applications Hive + HCatalog and HBase consume too many resources to be launched and are therefore not recommended for our test purposes.
- By using the following terminal commands, you can put data on HDFS:
    hdfs dfs -copyFromLocal foo.txt /tmp/
  get data from HDFS and put it on your physical local file system:
    hdfs dfs -copyToLocal /tmp/wordcountoutput/part-r result.txt
  or show your data in the terminal window:
    hdfs dfs -cat /tmp/wordcountoutput/part-r
- The following command lists the content of an HDFS directory:
    hdfs dfs -ls /tmp/
- An extensive overview of all HDFS terminal commands can be found on the HDFS cheat sheet of the book (Dirk deRoos, 2014: Hadoop for Dummies) at to/content/hadoop for dummies cheat sheet.html.
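The commands above can be combined into one small round trip: local file to HDFS, listing, display, and back. This is only a sketch. It assumes a node with a working hdfs client and write access to /tmp/ on HDFS (e.g. your master node, logged in as the hdfs user); on any other machine the block merely prints a note. The file name foo.txt is a placeholder.

```shell
# Round trip: local file -> HDFS -> terminal -> local file.
if command -v hdfs >/dev/null 2>&1; then
    echo "hello hadoop" > foo.txt
    hdfs dfs -copyFromLocal foo.txt /tmp/           # put data on HDFS
    hdfs dfs -ls /tmp/                              # list the HDFS directory
    hdfs dfs -cat /tmp/foo.txt                      # show the file content
    hdfs dfs -copyToLocal /tmp/foo.txt result.txt   # fetch it back
    status=ran
else
    echo "no hdfs client found - run this on the cluster's master node"
    status=skipped
fi
```

Note that the HDFS subcommand flags are case-sensitive: -copyFromLocal and -copyToLocal must be typed with capital F, T, and L.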
Exercise 1: Airline On-Time Performance

Implement the HDFS example "Airline on-time performance" from the book (Dirk deRoos, 2014: Hadoop for Dummies, Chapter 13) on your own Hadoop cluster. Thereby, the following steps are of interest:

- Downloading the sample dataset
- Copying the sample dataset into HDFS
- Your first Hadoop program: Hello Hadoop!

For your inputs and outputs, use the HDFS directory /tmp/ rather than the one mentioned in the book.

NOTE: The Pig script in the book contains "\" characters to indicate line breaks. These backslashes are not part of the actual script and should therefore be ignored. In addition, the script contains a small bug: the path of the input data in the LOAD command (first line) must be put between two single quotation marks (cf. the structure of the following Pig script). If they are missing, an error will be shown in the terminal window.

Exercise 2: The Word Count Example

Apply the knowledge you gained in the first exercise to implement another Pig script in which you count the frequency of words in a text ("Word Count Example"). Use as an input text file the free RFC 7230 "Hypertext Transfer Protocol (HTTP/1.1)". Your Pig script should have the following structure [3]:

a = load '/foo.txt';
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
c = group b by word;
d = foreach c generate COUNT(b), group;
store d into '/output';

Subsequently answer the following questions:

2.1 Which essential text mining step have you just applied?
2.2 Where are the map and reduce phases located in the script?
2.3 Have a careful look at the physical representation of the HDFS blocks in your file system. Navigate as root to the following directory (the name of the 5th subdirectory varies, depending on the time and date of the file generation as well as the IP address of the node): /hadoop/hdfs/data/current/BP-<nb>-<ip>-<datetime>/current/finalized
By using the terminal command ls -lh you can see all files which are located in the current working directory, including their sizes in human-readable form. Compare the directories on your master and slave nodes with each other! [4]

Exercise 3: Performance Tests

3.1 How long does your Hadoop cluster take to compute the word count for the aforementioned RFC 7230?

[3] Sample code obtained from tutorial/word counting with apache pig/.
[4] Reminder: The default replication factor of HDFS is 3.
3.2 How long does your Hadoop cluster take to compute the word count of a significantly shorter text, e.g. a text only about one sentence in length (determination of the administration overhead)?

3.3 How long does your Hadoop cluster take to compute the word count for a significantly longer text, on a scale of 100 to 1000 times larger than RFC 7230?

Exercise 4: Extension of an Existing Hadoop Cluster

4.1 Consider the following question: Which steps would you need in order to add another slave node to your existing Hadoop cluster?

4.2 To which scaling category does this method correspond?
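The tokenize, group-by-word, count pipeline of the Pig script in Exercise 2 can be sanity-checked locally with standard shell tools, and wrapping it in a simple timestamp measurement mirrors what Exercise 3 asks for on the cluster. This is only a local analogy, not a substitute for the Pig run; the input text and paths are placeholders.

```shell
# A tiny stand-in input file (on the cluster this would live in HDFS).
tmp=$(mktemp -d)
printf 'to be or not to be\n' > "$tmp/foo.txt"

start=$(date +%s)
# tokenize -> group by word -> count, mirroring steps a/b/c/d of the Pig script
tr -s ' ' '\n' < "$tmp/foo.txt" | sort | uniq -c | sort -rn > "$tmp/wordcount.out"
end=$(date +%s)

cat "$tmp/wordcount.out"            # counts per word, highest counts first
echo "elapsed: $((end - start))s"
```

On a one-line input the local pipeline finishes almost instantly, whereas even a trivial Pig job on the cluster pays a fixed job-submission cost; that difference is exactly the administration overhead probed in Exercise 3.2.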
Additional Exercises: Explorative Learning

The following exercises are supplementary. Start solving Exercise 5 once your group has completed all four previous exercises. If several groups are done with their exercises and there is still some time left (~20 minutes), move on to Exercise 6 and work on it collectively.

Exercise 5: The Hadoop Ecosystem

Try to run a Hadoop application other than Pig on your Hadoop cluster. For this purpose, search for a short tutorial on the Internet by yourself. [5]

NOTE: You can add further services to your Hadoop cluster via the Ambari web dashboard: Actions > + Add Services.

Exercise 6: Think Big!

Connect all available nodes in your laboratory into one single Hadoop cluster. Finally, repeat your performance tests from Exercise 3. Can you observe a performance boost?

NOTE: By using the following command, you can reset all the settings on your nodes which have been made by the Ambari installation wizard:

python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent

[5] A good starting point for free Hadoop tutorials: sandbox/.
More informationLinux Command Line Primer. By: Scott Marshall
Linux Command Line Primer By: Scott Marshall Draft: 10/21/2007 Table of Contents Topic Page(s) Preface 1 General Filesystem Background Information 2 General Filesystem Commands 2 Working with Files and
More informationAccessing Hadoop Data Using Hive
An IBM Proof of Technology Accessing Hadoop Data Using Hive Unit 3: Hive DML in action An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2015 US Government Users Restricted Rights -
More informationHadoop An Overview. - Socrates CCDH
Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected
More informationBeta. VMware vsphere Big Data Extensions Administrator's and User's Guide. vsphere Big Data Extensions 1.0 EN
VMware vsphere Big Data Extensions Administrator's and User's Guide vsphere Big Data Extensions 1.0 This document supports the version of each product listed and supports all subsequent versions until
More informationHitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Quick Reference Guide
Hitachi Hyper Scale-Out Platform (HSP) MK-95HSP013-03 14 October 2016 2016 Hitachi, Ltd. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic
More informationIntroduction to Linux Workshop 2. The George Washington University SEAS Computing Facility
Introduction to Linux Workshop 2 The George Washington University SEAS Computing Facility Course Goals SSH and communicating with other machines Public/Private key generation,.ssh directory, and the config
More informationInnovatus Technologies
HADOOP 2.X BIGDATA ANALYTICS 1. Java Overview of Java Classes and Objects Garbage Collection and Modifiers Inheritance, Aggregation, Polymorphism Command line argument Abstract class and Interfaces String
More informationCommands are in black
Starting From the Shell Prompt (Terminal) Commands are in black / +--------+---------+-------+---------+---------+------ +------ +------ +------ +------ +------ +-- Bin boot dev etc home media sbin bin
More informationAzure Marketplace. Getting Started Tutorial. Community Edition
Azure Marketplace Getting Started Tutorial Community Edition Introduction NooBaa software provides a distributed storage solution for unstructured data such as analytics data, multi-media, backup, and
More informationMe CloudTM: Getti g Started Guide
Me CloudTM: Getti g Started Guide November 2016 Version 1.16 Kodiak Data, Inc. 2570 W El Camino Real Suite 500 Mountain View, CA 94040 Phone: (650) 383-8374 support@kodiakdata.com www.kodiakdata.com Copyright
More informationLinux Kung Fu. Stephen James UBNetDef, Spring 2017
Linux Kung Fu Stephen James UBNetDef, Spring 2017 Introduction What is Linux? What is the difference between a client and a server? What is Linux? Linux generally refers to a group of Unix-like free and
More informationBitnami ez Publish for Huawei Enterprise Cloud
Bitnami ez Publish for Huawei Enterprise Cloud Description ez Publish is an Enterprise Content Management platform with an easy to use Web Content Management System. It includes role-based multi-user access,
More informationCST8207: GNU/Linux Operating Systems I Lab Six Linux File System Permissions. Linux File System Permissions (modes) - Part 1
Student Name: Lab Section: Linux File System Permissions (modes) - Part 1 Due Date - Upload to Blackboard by 8:30am Monday March 12, 2012 Submit the completed lab to Blackboard following the Rules for
More informationOPS235: Week 1. Installing Linux ( Lab1: Investigations 1-4)
OPS235: Week 1 Installing Linux ( Lab1: Investigations 1-4) 1 Agenda: Lab 1 Thinking Ahead (Tips / Warnings): Required Materials / Coming Prepared to Labs Importance of Mastering the CLI (Command Line
More informationGetting Started 1. Getting Started. Date of Publish:
1 Date of Publish: 2018-07-03 http://docs.hortonworks.com Contents... 3 Data Lifecycle Manager terminology... 3 Communication with HDP clusters...4 How pairing works in Data Lifecycle Manager... 5 How
More informationUnix Tutorial Haverford Astronomy 2014/2015
Unix Tutorial Haverford Astronomy 2014/2015 Overview of Haverford astronomy computing resources This tutorial is intended for use on computers running the Linux operating system, including those in the
More informationTutorial 1: Unix Basics
Tutorial 1: Unix Basics To log in to your ece account, enter your ece username and password in the space provided in the login screen. Note that when you type your password, nothing will show up in the
More informationSAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS
SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights
More informationSPLUNK ENTERPRISE AND ECS TECHNICAL SOLUTION GUIDE
SPLUNK ENTERPRISE AND ECS TECHNICAL SOLUTION GUIDE Splunk Frozen and Archive Buckets on ECS ABSTRACT This technical solution guide describes a solution for archiving Splunk frozen buckets to ECS. It also
More informationBitnami Re:dash for Huawei Enterprise Cloud
Bitnami Re:dash for Huawei Enterprise Cloud Description Re:dash is an open source data visualization and collaboration tool. It was designed to allow fast and easy access to billions of records in all
More informationeftp Application User Guide
Team A eftp User Guide 1/30 eftp Application User Guide Table of Contents Page 1. Acknowledgement 2 2. Introduction a. Welcome eftp Audience 3 b. What s in this manual 3 c. Manual Conventions 3 d. Getting
More informationCSE 101 Introduction to Computers Development / Tutorial / Lab Environment Setup
CSE 101 Introduction to Computers Development / Tutorial / Lab Environment Setup Purpose: The purpose of this lab is to setup software that you will be using throughout the term for learning about Python
More informationBitnami Piwik for Huawei Enterprise Cloud
Bitnami Piwik for Huawei Enterprise Cloud Description Piwik is a real time web analytics software program. It provides detailed reports on website visitors: the search engines and keywords they used, the
More informationLinux Systems Administration Getting Started with Linux
Linux Systems Administration Getting Started with Linux Network Startup Resource Center www.nsrc.org These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International
More informationHortonworks DataFlow
Hortonworks DataFlow Installing HDF Services on a New HDP Cluster for IBM (December 22, 2017) docs.hortonworks.com Hortonworks DataFlow: Installing HDF Services on a New HDP Cluster for IBM Power Systems
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn.?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : About Quality Thought We are
More informationPlexxi HCN Plexxi Connect Installation, Upgrade and Administration Guide Release 3.0.0
Plexxi HCN Plexxi Connect Installation, Upgrade and Administration Guide Release 3.0.0 May 3, 2018 100 Innovative Way - Suite 3322 Nashua, NH 03062 Tel. +1.888.630.PLEX (7539) www.plexxi.com Legal Notices
More informationSystem Manager Unit (SMU) Hardware Reference
System Manager Unit (SMU) Hardware Reference MK-92HNAS065-02 Notices and Disclaimer Copyright 2015 Hitachi Data Systems Corporation. All rights reserved. The performance data contained herein was obtained
More informationTDDE31/732A54 - Big Data Analytics Lab compendium
TDDE31/732A54 - Big Data Analytics Lab compendium For relational databases lab, please refer to http://www.ida.liu.se/~732a54/lab/rdb/index.en.shtml. Description and Aim In the lab exercises you will work
More informationCS/CIS 249 SP18 - Intro to Information Security
Lab assignment CS/CIS 249 SP18 - Intro to Information Security Lab #2 - UNIX/Linux Access Controls, version 1.2 A typed document is required for this assignment. You must type the questions and your responses
More informationCS 1110, LAB 1: EXPRESSIONS AND ASSIGNMENTS First Name: Last Name: NetID:
CS 1110, LAB 1: EXPRESSIONS AND ASSIGNMENTS http://www.cs.cornell.edu/courses/cs1110/2018sp/labs/lab01/lab01.pdf First Name: Last Name: NetID: Learning goals: (1) get hands-on experience using Python in
More informationExercise #1: ANALYZING SOCIAL MEDIA AND CUSTOMER SENTIMENT WITH APACHE NIFI AND HDP SEARCH INTRODUCTION CONFIGURE AND START SOLR
Exercise #1: ANALYZING SOCIAL MEDIA AND CUSTOMER SENTIMENT WITH APACHE NIFI AND HDP SEARCH INTRODUCTION We will use Solr and the LucidWorks HDP Search to view our streamed data in real time to gather insights
More informationTest Lab Introduction to the Test Lab Linux Cluster Environment
Test Lab 1.0 - Introduction to the Test Lab Linux Cluster Environment Test lab is a set of three disposable cluster environments that can be used for systems research. All three environments are accessible
More informationHortonworks Data Platform
Data Governance () docs.hortonworks.com : Data Governance Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform
More informationdocs.hortonworks.com
docs.hortonworks.com : Getting Started Guide Copyright 2012, 2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,
More informationBitnami Ruby for Huawei Enterprise Cloud
Bitnami Ruby for Huawei Enterprise Cloud Description Bitnami Ruby Stack provides a complete development environment for Ruby on Rails that can be deployed in one click. It includes most popular components
More informationBig Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.
Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop
More informationBitnami HHVM for Huawei Enterprise Cloud
Bitnami HHVM for Huawei Enterprise Cloud Description HHVM is an open source virtual machine designed for executing programs written in Hack and PHP. HHVM uses a just-in-time (JIT) compilation approach
More informationIntroduction to UNIX command-line
Introduction to UNIX command-line Boyce Thompson Institute March 17, 2015 Lukas Mueller & Noe Fernandez Class Content Terminal file system navigation Wildcards, shortcuts and special characters File permissions
More informationINITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)
PER STRICKER, THOMAS KALB 07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN 08.02.2017, DB2 FORUM USER GROUP, DALLAS INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?) Copyright
More information