A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science

Size: px
Start display at page:

Download "A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science"

Transcription

1 A Guide to Running Map Reduce Jobs in Java University of Stirling, Computing Science Introduction The Hadoop cluster in Computing Science at Stirling allows users with a valid user account to submit and run MapReduce jobs written in Java. This guide explains how to take your java program and run it on the cluster. User Account You will need a user account for the Hadoop cluster. If you are taking the MSc. in Big Data, you will already have one. The username and password are the same as those you use to log into the computers in the department. If you don t have an account, ask Kevin Swingler to arrange one for you. Connect to the Hadoop Server Use SSH to connect to the Hadoop server. The details are: Host name: hadoop0.cs.stir.ac.uk Port: 541 In the labs, you can use the Windows program called PuTTY to connect. You should get a command window like this: In the picture above, I ve typed pwd to check the current directory, and found I am in /home/kms. You will be in /home/username. This directory is the same as the one you see when you connect to your H: drive from Windows. You should make a directory (do it either here or in Windows) in which your Hadoop work will go. Call it Hadoop. You can then move files to it, edit them, etc. in Windows. cd Hadoop will change you to that directory and ls will list the contents. You will find the history command useful. It lists your previous commands with a number. The rerun one, type!num where num is the number from the history list of the command you want to rerun. For example!4 runs the 4 th command in your history. Look up unix commands online for more help with the shell. Other helpful things include using the tab key to complete partial filenames and right clicking the mouse to paste.

2 Getting Files into HDFS When you run a MapReduce job, you will process files in the HDFS store, not the directory we have just been looking at. First, create a directory on HDFS into which your data will go: hdfs dfs -mkdir /home/username/targetdir To copy files from this directory into HDFS, use the HDFS command copyfromlocal: hdfs dfs -copyfromlocal datafile /home/username/targetdir will copy the file datafile from the current directory on hadoop0 to the directory called targetdir in your user space on HDFS. You can check it is there with hdfs dfs ls /home/username/targetdir You ll see this is quite slow the first time you do it, and never very fast after that. You only need to copy your data files to HDFS, not the java programs you wish to run. That is done differently, as is explained below. Compiling Your Java Write your MapReduce java program using your favourite development environment. We will be using pretty small programs, so any text editor will do, or you could use eclipse, for example. In what follows, replace username with your username and myprog with the name you gave to your java program file. hadoop com.sun.tools.javac.main myprog.java This will produce the java.class files for the project. Type ls to see them. Then make a jar file by typing jar cf myprog.jar myprog*.class Now type ls to verify that you have a file called myprog.jar. Note: This is not a guide to writing MapReduce programs in Java. See the other practical sheets and lecture notes for that.

3 Submit the Job to Hadoop If all went well, you are ready to submit the program to hadoop. Do so by typing hadoop jar myprog.jar myclass /home/username/data /home/username/res Where (in this example) the files you wish to process are in /home/username/data and you would like the results to go to /home/username/res. The /res folder must NOT already exist. Hadoop will create it for you, but if it is already there, it will fail. If all is well, you should see status updates for your job in the command window: Monitor the Job Online You can also see the jobs running in Hadoop by pointing a web browser at See the Results The results can be found in the directory you specified when you submitted the job. In our example above, that was /home/username/res. List the contents with hdfs dfs ls /home/username/res and you will see a file called part-r View its contents with hdfs dfs cat /home/username/res/part-r You should see the list of keys and values output from your reduce function. If you don t want to keep the results, remove the directory they are in with Hdfs dfs rm -R /home/username/res

4 Killing a Job If your job is taking a VERY long time, perhaps it is looping forever. You can kill it as follows: mapred job list That should show you a list of jobs running. Find yours (look for the username) and identify the JobID something like job_ _0178 for example. Then type mapred job kill jobid Where jobid is the ID you have just identified. Ask for demonstrator help if you need it. Testing Your Code Without Hadoop It can be rather tedious compiling, making the jar file, and running the Hadoop job just to find a bug, so you can do some testing outside the Hadoop environment. You will need these two libraries: hadoop-common alpha.jar hadoop-mapreduce-client-core alpha.jar They are already in the labs. On your own machine, put them somewhere sensible, for example C:\Program Files\Java\jre7\lib\ext. Add them to your java project in Properties->Java Build Path->Libraries by clicking Add External Jars. What I am describing here is a way to check your code for typos and simple bugs, not a way to test whether Map Reduce actually solves the problem you are attempting. It won t collate the output from the mapper and send it to the reducer. You won t be able to write to context, so replace context.write with System.out.printf, or make your map and reduce functions return values instead of writing to the context. Here is an example. Say we have a map method as follows: public static class TMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasmoretokens()) { word.set(itr.nexttoken()); // context.write(word, one); System.out.printf("%s : %s\n",word.tostring(), one.tostring()); And we want to test that the output is correct, we can call the map function using: public static void main(string[] args) throws IOException, InterruptedException {

5 TMapper tm=new TMapper(); Text word = new Text(); word.set("hello World"); tm.map(0, word, null); The correct output would be Hello : 1 World : 1 Note the commented out // context.write(word, one); line. You would uncomment that before running the job on Hadoop (and comment out the printf).

COMP4442. Service and Cloud Computing. Lab 12: MapReduce. Prof. George Baciu PQ838.

COMP4442. Service and Cloud Computing. Lab 12: MapReduce. Prof. George Baciu PQ838. COMP4442 Service and Cloud Computing Lab 12: MapReduce www.comp.polyu.edu.hk/~csgeorge/comp4442 Prof. George Baciu csgeorge@comp.polyu.edu.hk PQ838 1 Contents Introduction to MapReduce A WordCount example

More information

Java in MapReduce. Scope

Java in MapReduce. Scope Java in MapReduce Kevin Swingler Scope A specific look at the Java code you might use for performing MapReduce in Hadoop Java program recap The map method The reduce method The whole program Running on

More information

Compile and Run WordCount via Command Line

Compile and Run WordCount via Command Line Aims This exercise aims to get you to: Compile, run, and debug MapReduce tasks via Command Line Compile, run, and debug MapReduce tasks via Eclipse One Tip on Hadoop File System Shell Following are the

More information

Attacking & Protecting Big Data Environments

Attacking & Protecting Big Data Environments Attacking & Protecting Big Data Environments Birk Kauer & Matthias Luft {bkauer, mluft}@ernw.de #WhoAreWe Birk Kauer - Security Researcher @ERNW - Mainly Exploit Developer Matthias Luft - Security Researcher

More information

Parallel Data Processing with Hadoop/MapReduce. CS140 Tao Yang, 2014

Parallel Data Processing with Hadoop/MapReduce. CS140 Tao Yang, 2014 Parallel Data Processing with Hadoop/MapReduce CS140 Tao Yang, 2014 Overview What is MapReduce? Example with word counting Parallel data processing with MapReduce Hadoop file system More application example

More information

Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc.

Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduc. D. Praveen Kumar Junior Research Fellow Department of Computer Science & Engineering Indian Institute of Technology (Indian School of Mines) Dhanbad, Jharkhand, India Head of IT & ITES, Skill Subsist Impels

More information

You should see something like this, called the prompt :

You should see something like this, called the prompt : CSE 1030 Lab 1 Basic Use of the Command Line PLEASE NOTE this lab will not be graded and does not count towards your final grade. However, all of these techniques are considered testable in a labtest.

More information

STA 303 / 1002 Using SAS on CQUEST

STA 303 / 1002 Using SAS on CQUEST STA 303 / 1002 Using SAS on CQUEST A review of the nuts and bolts A.L. Gibbs January 2012 Some Basics of CQUEST If you don t already have a CQUEST account, go to www.cquest.utoronto.ca and request one.

More information

Oregon State University School of Electrical Engineering and Computer Science. CS 261 Recitation 1. Spring 2011

Oregon State University School of Electrical Engineering and Computer Science. CS 261 Recitation 1. Spring 2011 Oregon State University School of Electrical Engineering and Computer Science CS 261 Recitation 1 Spring 2011 Outline Using Secure Shell Clients GCC Some Examples Intro to C * * Windows File transfer client:

More information

Experiences with a new Hadoop cluster: deployment, teaching and research. Andre Barczak February 2018

Experiences with a new Hadoop cluster: deployment, teaching and research. Andre Barczak February 2018 Experiences with a new Hadoop cluster: deployment, teaching and research Andre Barczak February 2018 abstract In 2017 the Machine Learning research group got funding for a new Hadoop cluster. However,

More information

MapReduce & YARN Hands-on Lab Exercise 1 Simple MapReduce program in Java

MapReduce & YARN Hands-on Lab Exercise 1 Simple MapReduce program in Java MapReduce & YARN Hands-on Lab Exercise 1 Simple MapReduce program in Java Contents Page 1 Copyright IBM Corporation, 2015 US Government Users Restricted Rights - Use, duplication or disclosure restricted

More information

Processing Big Data with Hadoop in Azure HDInsight

Processing Big Data with Hadoop in Azure HDInsight Processing Big Data with Hadoop in Azure HDInsight Lab 1 - Getting Started with HDInsight Overview In this lab, you will provision an HDInsight cluster. You will then run a sample MapReduce job on the

More information

CSCI 161: Introduction to Programming I Lab 1b: Hello, World (Eclipse, Java)

CSCI 161: Introduction to Programming I Lab 1b: Hello, World (Eclipse, Java) Goals - to learn how to compile and execute a Java program - to modify a program to enhance it Overview This activity will introduce you to the Java programming language. You will type in the Java program

More information

Introduction to Map/Reduce. Kostas Solomos Computer Science Department University of Crete, Greece

Introduction to Map/Reduce. Kostas Solomos Computer Science Department University of Crete, Greece Introduction to Map/Reduce Kostas Solomos Computer Science Department University of Crete, Greece What we will cover What is MapReduce? How does it work? A simple word count example (the Hello World! of

More information

Lab 1 Introduction to UNIX and C

Lab 1 Introduction to UNIX and C Name: Lab 1 Introduction to UNIX and C This first lab is meant to be an introduction to computer environments we will be using this term. You must have a Pitt username to complete this lab. NOTE: Text

More information

Parallel Programming Pre-Assignment. Setting up the Software Environment

Parallel Programming Pre-Assignment. Setting up the Software Environment Parallel Programming Pre-Assignment Setting up the Software Environment Authors: B. Wilkinson and C. Ferner. Modification date: Aug 21, 2014 (Minor correction Aug 27, 2014.) Software The purpose of this

More information

Problem Set 0. General Instructions

Problem Set 0. General Instructions CS246: Mining Massive Datasets Winter 2014 Problem Set 0 Due 9:30am January 14, 2014 General Instructions This homework is to be completed individually (no collaboration is allowed). Also, you are not

More information

Chapter 3. Distributed Algorithms based on MapReduce

Chapter 3. Distributed Algorithms based on MapReduce Chapter 3 Distributed Algorithms based on MapReduce 1 Acknowledgements Hadoop: The Definitive Guide. Tome White. O Reilly. Hadoop in Action. Chuck Lam, Manning Publications. MapReduce: Simplified Data

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 16. Big Data Management VI (MapReduce Programming)

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 16. Big Data Management VI (MapReduce Programming) Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 16 Big Data Management VI (MapReduce Programming) Credits: Pietro Michiardi (Eurecom): Scalable Algorithm

More information

MapReduce Simplified Data Processing on Large Clusters

MapReduce Simplified Data Processing on Large Clusters MapReduce Simplified Data Processing on Large Clusters Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) MapReduce 1393/8/5 1 /

More information

Logging on to the Hadoop Cluster Nodes. To login to the Hadoop cluster in ROGER, a user needs to login to ROGER first, for example:

Logging on to the Hadoop Cluster Nodes. To login to the Hadoop cluster in ROGER, a user needs to login to ROGER first, for example: Hadoop User Guide Logging on to the Hadoop Cluster Nodes To login to the Hadoop cluster in ROGER, a user needs to login to ROGER first, for example: ssh username@roger-login.ncsa. illinois.edu after entering

More information

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing ECE5610/CSC6220 Introduction to Parallel and Distribution Computing Lecture 6: MapReduce in Parallel Computing 1 MapReduce: Simplified Data Processing Motivation Large-Scale Data Processing on Large Clusters

More information

EECS Software Tools. Lab 2 Tutorial: Introduction to UNIX/Linux. Tilemachos Pechlivanoglou

EECS Software Tools. Lab 2 Tutorial: Introduction to UNIX/Linux. Tilemachos Pechlivanoglou EECS 2031 - Software Tools Lab 2 Tutorial: Introduction to UNIX/Linux Tilemachos Pechlivanoglou (tipech@eecs.yorku.ca) Sep 22 & 25, 2017 Material marked with will be in your exams Sep 22 & 25, 2017 Introduction

More information

ITNPBD7 Cluster Computing Spring Using Condor

ITNPBD7 Cluster Computing Spring Using Condor The aim of this practical is to work through the Condor examples demonstrated in the lectures and adapt them to alternative tasks. Before we start, you will need to map a network drive to \\wsv.cs.stir.ac.uk\datasets

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 2: Hadoop Nuts and Bolts Jimmy Lin University of Maryland Thursday, January 31, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

The Command Shell. Fundamentals of Computer Science

The Command Shell. Fundamentals of Computer Science The Command Shell Fundamentals of Computer Science Outline Starting the Command Shell Locally Remote Host Directory Structure Moving around the directories Displaying File Contents Compiling and Running

More information

Java & Inheritance. Inheritance - Scenario

Java & Inheritance. Inheritance - Scenario Java & Inheritance ITNPBD7 Cluster Computing David Cairns Inheritance - Scenario Inheritance is a core feature of Object Oriented languages. A class hierarchy can be defined where the class at the top

More information

Hadoop Tutorial. General Instructions

Hadoop Tutorial. General Instructions CS246H: Mining Massive Datasets Hadoop Lab Winter 2018 Hadoop Tutorial General Instructions The purpose of this tutorial is to get you started with Hadoop. Completing the tutorial is optional. Here you

More information

CSCI 161: Introduction to Programming I Lab 1a: Programming Environment: Linux and Eclipse

CSCI 161: Introduction to Programming I Lab 1a: Programming Environment: Linux and Eclipse CSCI 161: Introduction to Programming I Lab 1a: Programming Environment: Linux and Eclipse Goals - to become acquainted with the Linux/Gnome environment Overview For this lab, you will login to a workstation

More information

Helsinki 19 Jan Practical course in genome bioinformatics DAY 0

Helsinki 19 Jan Practical course in genome bioinformatics DAY 0 Helsinki 19 Jan 2017 529028 Practical course in genome bioinformatics DAY 0 This document can be downloaded at: http://ekhidna.biocenter.helsinki.fi/downloads/teaching/spring2017/exercises_day0.pdf The

More information

Hadoop streaming is an alternative way to program Hadoop than the traditional approach of writing and compiling Java code.

Hadoop streaming is an alternative way to program Hadoop than the traditional approach of writing and compiling Java code. title: "Data Analytics with HPC: Hadoop Walkthrough" In this walkthrough you will learn to execute simple Hadoop Map/Reduce jobs on a Hadoop cluster. We will use Hadoop to count the occurrences of words

More information

CSC116: Introduction to Computing - Java

CSC116: Introduction to Computing - Java CSC116: Introduction to Computing - Java Course Information Introductions Website Syllabus Schedule Computing Environment AFS (Andrew File System) Linux/Unix Commands Helpful Tricks Computers First Java

More information

Hands-on Exercise Hadoop

Hands-on Exercise Hadoop Department of Economics and Business Administration Chair of Business Information Systems I Prof. Dr. Barbara Dinter Big Data Management Hands-on Exercise Hadoop Building and Testing a Hadoop Cluster by

More information

Getting Started with Hadoop

Getting Started with Hadoop Getting Started with Hadoop May 28, 2018 Michael Völske, Shahbaz Syed Web Technology & Information Systems Bauhaus-Universität Weimar 1 webis 2018 What is Hadoop Started in 2004 by Yahoo Open-Source implementation

More information

Lab 1 Introduction to UNIX and C

Lab 1 Introduction to UNIX and C Name: Lab 1 Introduction to UNIX and C This first lab is meant to be an introduction to computer environments we will be using this term. You must have a Pitt username to complete this lab. The doc is

More information

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 2: SEP. 8TH INSTRUCTOR: JIAYIN WANG

CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 2: SEP. 8TH INSTRUCTOR: JIAYIN WANG CS/IT 114 Introduction to Java, Part 1 FALL 2016 CLASS 2: SEP. 8TH INSTRUCTOR: JIAYIN WANG 1 Notice Class Website http://www.cs.umb.edu/~jane/cs114/ Reading Assignment Chapter 1: Introduction to Java Programming

More information

Guided Tour (Version 3.3) By Steven Castellucci as Modified by Brandon Haworth

Guided Tour (Version 3.3) By Steven Castellucci as Modified by Brandon Haworth Guided Tour (Version 3.3) By Steven Castellucci as Modified by Brandon Haworth This document was inspired by the Guided Tour written by Professor H. Roumani. His version of the tour can be accessed at

More information

Clustering Documents. Document Retrieval. Case Study 2: Document Retrieval

Clustering Documents. Document Retrieval. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April, 2017 Sham Kakade 2017 1 Document Retrieval n Goal: Retrieve

More information

CpSc 1111 Lab 1 Introduction to Unix Systems, Editors, and C

CpSc 1111 Lab 1 Introduction to Unix Systems, Editors, and C CpSc 1111 Lab 1 Introduction to Unix Systems, Editors, and C Welcome! Welcome to your CpSc 111 lab! For each lab this semester, you will be provided a document like this to guide you. This material, as

More information

Pivotal Capgemini Just Do It Training HDFS-NFS Gateway Labs

Pivotal Capgemini Just Do It Training HDFS-NFS Gateway Labs Pivotal Capgemini Just Do It Training HDFS-NFS Gateway Labs In this lab exercise you will have an opportunity to explore HDFS as well as become familiar with using the HDFS- NFS Bridge. First we will go

More information

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin High Performance Computing (HPC) Club Training Session Xinsheng (Shawn) Qin Outline HPC Club The Hyak Supercomputer Logging in to Hyak Basic Linux Commands Transferring Files Between Your PC and Hyak Submitting

More information

Extreme Computing. Introduction to MapReduce. Cluster Outline Map Reduce

Extreme Computing. Introduction to MapReduce. Cluster Outline Map Reduce Extreme Computing Introduction to MapReduce 1 Cluster We have 12 servers: scutter01, scutter02,... scutter12 If working outside Informatics, first: ssh student.ssh.inf.ed.ac.uk Then log into a random server:

More information

Carnegie Mellon. Linux Boot Camp. Jack, Matthew, Nishad, Stanley 6 Sep 2016

Carnegie Mellon. Linux Boot Camp. Jack, Matthew, Nishad, Stanley 6 Sep 2016 Linux Boot Camp Jack, Matthew, Nishad, Stanley 6 Sep 2016 1 Connecting SSH Windows users: MobaXterm, PuTTY, SSH Tectia Mac & Linux users: Terminal (Just type ssh) andrewid@shark.ics.cs.cmu.edu 2 Let s

More information

Portions adapted from A Visual Guide to Version Control. Introduction to CVS

Portions adapted from A Visual Guide to Version Control. Introduction to CVS Portions adapted from A Visual Guide to Version Control Introduction to CVS Outline Introduction to Source Code Management & CVS CVS Terminology & Setup Basic commands Checkout, Add, Commit, Diff, Update,

More information

Cloud Programming on Java EE Platforms. mgr inż. Piotr Nowak

Cloud Programming on Java EE Platforms. mgr inż. Piotr Nowak Cloud Programming on Java EE Platforms mgr inż. Piotr Nowak dsh distributed shell commands execution -c concurrent --show-machine-names -M --group cluster -g cluster /etc/dsh/groups/cluster needs passwordless

More information

LAB #5 Intro to Linux and Python on ENGR

LAB #5 Intro to Linux and Python on ENGR LAB #5 Intro to Linux and Python on ENGR 1. Pre-Lab: In this lab, we are going to download some useful tools needed throughout your CS career. First, you need to download a secure shell (ssh) client for

More information

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011) UoW HPC Quick Start Information Technology Services University of Wollongong ( Last updated on October 10, 2011) 1 Contents 1 Logging into the HPC Cluster 3 1.1 From within the UoW campus.......................

More information

Intermediate Programming, Spring Misha Kazhdan

Intermediate Programming, Spring Misha Kazhdan 600.120 Intermediate Programming, Spring 2017 Misha Kazhdan Outline Unix/Linux command line Basics of the Emacs editor Compiling and running a simple C program Cloning a repository Connecting to ugrad

More information

An Introduction to Cluster Computing Using Newton

An Introduction to Cluster Computing Using Newton An Introduction to Cluster Computing Using Newton Jason Harris and Dylan Storey March 25th, 2014 Jason Harris and Dylan Storey Introduction to Cluster Computing March 25th, 2014 1 / 26 Workshop design.

More information

CS158 - Assignment 9 Faster Naive Bayes? Say it ain t so...

CS158 - Assignment 9 Faster Naive Bayes? Say it ain t so... CS158 - Assignment 9 Faster Naive Bayes? Say it ain t so... Part 1 due: Sunday, Nov. 13 by 11:59pm Part 2 due: Sunday, Nov. 20 by 11:59pm http://www.hadoopwizard.com/what-is-hadoop-a-light-hearted-view/

More information

CS CS Tutorial 2 2 Winter 2018

CS CS Tutorial 2 2 Winter 2018 CS CS 230 - Tutorial 2 2 Winter 2018 Sections 1. Unix Basics and connecting to CS environment 2. MIPS Introduction & CS230 Interface 3. Connecting Remotely If you haven t set up a CS environment password,

More information

CS451 - Assignment 8 Faster Naive Bayes? Say it ain t so...

CS451 - Assignment 8 Faster Naive Bayes? Say it ain t so... CS451 - Assignment 8 Faster Naive Bayes? Say it ain t so... Part 1 due: Friday, Nov. 8 before class Part 2 due: Monday, Nov. 11 before class Part 3 due: Sunday, Nov. 17 by 11:50pm http://www.hadoopwizard.com/what-is-hadoop-a-light-hearted-view/

More information

Clustering Documents. Case Study 2: Document Retrieval

Clustering Documents. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 21 th, 2015 Sham Kakade 2016 1 Document Retrieval Goal: Retrieve

More information

Tutorial 1: Unix Basics

Tutorial 1: Unix Basics Tutorial 1: Unix Basics To log in to your ece account, enter your ece username and password in the space provided in the login screen. Note that when you type your password, nothing will show up in the

More information

CS 2400 Laboratory Assignment #1: Exercises in Compilation and the UNIX Programming Environment (100 pts.)

CS 2400 Laboratory Assignment #1: Exercises in Compilation and the UNIX Programming Environment (100 pts.) 1 Introduction 1 CS 2400 Laboratory Assignment #1: Exercises in Compilation and the UNIX Programming Environment (100 pts.) This laboratory is intended to give you some brief experience using the editing/compiling/file

More information

Creating an Inverted Index using Hadoop

Creating an Inverted Index using Hadoop Creating an Inverted Index using Hadoop Redeeming Google Cloud Credits 1. Go to https://goo.gl/gcpedu/zvmhm6 to redeem the $150 Google Cloud Platform Credit. Make sure you use your.edu email. 2. Follow

More information

New User Tutorial. OSU High Performance Computing Center

New User Tutorial. OSU High Performance Computing Center New User Tutorial OSU High Performance Computing Center TABLE OF CONTENTS Logging In... 3-5 Windows... 3-4 Linux... 4 Mac... 4-5 Changing Password... 5 Using Linux Commands... 6 File Systems... 7 File

More information

Introduction to the Linux Command Line

Introduction to the Linux Command Line Introduction to the Linux Command Line May, 2015 How to Connect (securely) ssh sftp scp Basic Unix or Linux Commands Files & directories Environment variables Not necessarily in this order.? Getting Connected

More information

CS 261 Recitation 1 Compiling C on UNIX

CS 261 Recitation 1 Compiling C on UNIX Oregon State University School of Electrical Engineering and Computer Science CS 261 Recitation 1 Compiling C on UNIX Winter 2017 Outline Secure Shell Basic UNIX commands Editing text The GNU Compiler

More information

AMS 200: Working on Linux/Unix Machines

AMS 200: Working on Linux/Unix Machines AMS 200, Oct 20, 2014 AMS 200: Working on Linux/Unix Machines Profs. Nic Brummell (brummell@soe.ucsc.edu) & Dongwook Lee (dlee79@ucsc.edu) Department of Applied Mathematics and Statistics University of

More information

Guidelines For Hadoop and Spark Cluster Usage

Guidelines For Hadoop and Spark Cluster Usage Guidelines For Hadoop and Spark Cluster Usage Procedure to create an account in CSX. If you are taking a CS prefix course, you already have an account; to get an initial password created: 1. Login to https://cs.okstate.edu/pwreset

More information

User Guide Version 2.0

User Guide Version 2.0 User Guide Version 2.0 Page 2 of 8 Summary Contents 1 INTRODUCTION... 3 2 SECURESHELL (SSH)... 4 2.1 ENABLING SSH... 4 2.2 DISABLING SSH... 4 2.2.1 Change Password... 4 2.2.2 Secure Shell Connection Information...

More information

Introduction to Linux Environment. Yun-Wen Chen

Introduction to Linux Environment. Yun-Wen Chen Introduction to Linux Environment Yun-Wen Chen 1 The Text (Command) Mode in Linux Environment 2 The Main Operating Systems We May Meet 1. Windows 2. Mac 3. Linux (Unix) 3 Windows Command Mode and DOS Type

More information

CMU MSP Intro to Hadoop

CMU MSP Intro to Hadoop CMU MSP 36602 Intro to Hadoop H. Seltman, April 3 and 5 2017 1) Carl had created an MSP virtual machine that you can download as an appliance for VirtualBox (also used for SAS University Edition). See

More information

Introduction: What is Unix?

Introduction: What is Unix? Introduction Introduction: What is Unix? An operating system Developed at AT&T Bell Labs in the 1960 s Command Line Interpreter GUIs (Window systems) are now available Introduction: Unix vs. Linux Unix

More information

CMSC 201 Spring 2018 Lab 01 Hello World

CMSC 201 Spring 2018 Lab 01 Hello World CMSC 201 Spring 2018 Lab 01 Hello World Assignment: Lab 01 Hello World Due Date: Sunday, February 4th by 8:59:59 PM Value: 10 points At UMBC, the GL system is designed to grant students the privileges

More information

CMSC 201 Spring 2017 Lab 01 Hello World

CMSC 201 Spring 2017 Lab 01 Hello World CMSC 201 Spring 2017 Lab 01 Hello World Assignment: Lab 01 Hello World Due Date: Sunday, February 5th by 8:59:59 PM Value: 10 points At UMBC, our General Lab (GL) system is designed to grant students the

More information

CPS109 Lab 1. i. To become familiar with the Ryerson Computer Science laboratory environment.

CPS109 Lab 1. i. To become familiar with the Ryerson Computer Science laboratory environment. CPS109 Lab 1 Source: Partly from Big Java lab1, by Cay Horstmann. Objective: i. To become familiar with the Ryerson Computer Science laboratory environment. ii. To obtain your login id and to set your

More information

Session 1: Accessing MUGrid and Command Line Basics

Session 1: Accessing MUGrid and Command Line Basics Session 1: Accessing MUGrid and Command Line Basics Craig A. Struble, Ph.D. July 14, 2010 1 Introduction The Marquette University Grid (MUGrid) is a collection of dedicated and opportunistic resources

More information

Ultimate Hadoop Developer Training

Ultimate Hadoop Developer Training First Edition Ultimate Hadoop Developer Training Lab Exercises edubrake.com Hadoop Architecture 2 Following are the exercises that the student need to finish, as required for the module Hadoop Architecture

More information

A Brief Introduction to the Command Line. Hautahi Kingi

A Brief Introduction to the Command Line. Hautahi Kingi A Brief Introduction to the Command Line Hautahi Kingi Introduction A shell is a computer program like any other. But its primary purpose is to read commands and run other programs, rather than to perform

More information

Check the entries in the home directory again with an ls command and then change to the java directory:

Check the entries in the home directory again with an ls command and then change to the java directory: MODULE 1p - A Directory for Java Files FIRST TASK Log in to PWF Linux and check the files in your home directory with an ls command. Create a directory for your Java files: c207@pccl504:~> mkdir java Move

More information

Refresher workshop in programming for polytechnic graduates General Java Program Compilation Guide

Refresher workshop in programming for polytechnic graduates General Java Program Compilation Guide Refresher workshop in programming for polytechnic graduates General Java Program Compilation Guide Overview Welcome to this refresher workshop! This document will serve as a self-guided explanation to

More information

Steps: First install hadoop (if not installed yet) by, https://sl6it.wordpress.com/2015/12/04/1-study-and-configure-hadoop-for-big-data/

Steps: First install hadoop (if not installed yet) by, https://sl6it.wordpress.com/2015/12/04/1-study-and-configure-hadoop-for-big-data/ SL-V BE IT EXP 7 Aim: Design and develop a distributed application to find the coolest/hottest year from the available weather data. Use weather data from the Internet and process it using MapReduce. Steps:

More information

TDDE31/732A54 - Big Data Analytics Lab compendium

TDDE31/732A54 - Big Data Analytics Lab compendium TDDE31/732A54 - Big Data Analytics Lab compendium For relational databases lab, please refer to http://www.ida.liu.se/~732a54/lab/rdb/index.en.shtml. Description and Aim In the lab exercises you will work

More information

When you first log in, you will be placed in your home directory. To see what this directory is named, type:

When you first log in, you will be placed in your home directory. To see what this directory is named, type: Chem 7520 Unix Crash Course Throughout this page, the command prompt will be signified by > at the beginning of a line (you do not type this symbol, just everything after it). Navigation When you first

More information

Big Data Exercises. Fall 2017 Week 5 ETH Zurich. MapReduce

Big Data Exercises. Fall 2017 Week 5 ETH Zurich. MapReduce Big Data Exercises Fall 2017 Week 5 ETH Zurich MapReduce Reading: White, T. (2015). Hadoop: The Definitive Guide (4th ed.). O Reilly Media, Inc. [ETH library] (Chapters 2, 6, 7, 8: mandatory, Chapter 9:

More information

Basic Unix Commands. CGS 3460, Lecture 6 Jan 23, 2006 Zhen Yang

Basic Unix Commands. CGS 3460, Lecture 6 Jan 23, 2006 Zhen Yang Basic Unix Commands CGS 3460, Lecture 6 Jan 23, 2006 Zhen Yang For this class you need to work from your grove account to finish your homework Knowing basic UNIX commands is essential to finish your homework

More information

Section 2: Developer tools and you. Alex Mariakakis (staff-wide)

Section 2: Developer tools and you. Alex Mariakakis (staff-wide) Section 2: Developer tools and you Alex Mariakakis cse331-staff@cs.washington.edu (staff-wide) What is an SSH client? Uses the secure shell protocol (SSH) to connect to a remote computer o Enables you

More information

Using LINUX a BCMB/CHEM 8190 Tutorial Updated (1/17/12)

Using LINUX a BCMB/CHEM 8190 Tutorial Updated (1/17/12) Using LINUX a BCMB/CHEM 8190 Tutorial Updated (1/17/12) Objective: Learn some basic aspects of the UNIX operating system and how to use it. What is UNIX? UNIX is the operating system used by most computers

More information

Hadoop 2.X on a cluster environment

Hadoop 2.X on a cluster environment Hadoop 2.X on a cluster environment Big Data - 05/04/2017 Hadoop 2 on AMAZON Hadoop 2 on AMAZON Hadoop 2 on AMAZON Regions Hadoop 2 on AMAZON S3 and buckets Hadoop 2 on AMAZON S3 and buckets Hadoop 2 on

More information

Hadoop Lab 3 Creating your first Map-Reduce Process

Hadoop Lab 3 Creating your first Map-Reduce Process Programming for Big Data Hadoop Lab 3 Creating your first Map-Reduce Process Lab work Take the map-reduce code from these notes and get it running on your Hadoop VM Driver Code Mapper Code Reducer Code

More information

Big Data Analytics: Insights and Innovations

Big Data Analytics: Insights and Innovations International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 6, Issue 10 (April 2013), PP. 60-65 Big Data Analytics: Insights and Innovations

More information

Running Sentaurus on the DOE Network

Running Sentaurus on the DOE Network Running Sentaurus on the DOE Network Department of Electronics Carleton University April 2016 Warning: This primer is intended to cover the DOE.CARLETON.CA domain. Other domains are not covered in this

More information

C02: Overview of Software Development and Java

C02: Overview of Software Development and Java CISC 3120 C02: Overview of Software Development and Java Hui Chen Department of Computer & Information Science CUNY Brooklyn College 08/31/2017 CUNY Brooklyn College 1 Outline Recap and issues Brief introduction

More information

This document is intended to help you connect to the CVS server on a Windows system.

This document is intended to help you connect to the CVS server on a Windows system. Sourceforge CVS Access Sourceforge CVS Access... 1 Introduction... 1 Tools... 1 Generate Public / Private Keys... 1 Configuring Sourceforge Account... 4 Loading Private Keys for Authentication... 7 Testing

More information

CHE3935. Lecture 1. Introduction to Linux

CHE3935. Lecture 1. Introduction to Linux CHE3935 Lecture 1 Introduction to Linux 1 Logging In PuTTY is a free telnet/ssh client that can be run without installing it within Windows. It will only give you a terminal interface, but used with a

More information

HPC Introductory Course - Exercises

HPC Introductory Course - Exercises HPC Introductory Course - Exercises The exercises in the following sections will guide you understand and become more familiar with how to use the Balena HPC service. Lines which start with $ are commands

More information

Shell and Utility Commands

Shell and Utility Commands Table of contents 1 Shell Commands... 2 2 Utility Commands...3 1. Shell Commands 1.1. fs Invokes any FsShell command from within a Pig script or the Grunt shell. 1.1.1. Syntax fs subcommand subcommand_parameters

More information

CENG 334 Computer Networks. Laboratory I Linux Tutorial

CENG 334 Computer Networks. Laboratory I Linux Tutorial CENG 334 Computer Networks Laboratory I Linux Tutorial Contents 1. Logging In and Starting Session 2. Using Commands 1. Basic Commands 2. Working With Files and Directories 3. Permission Bits 3. Introduction

More information

Unix. 1 tblgrant projects 0 Jun 17 15:40 doc1.txt. documents]$ touch doc2.txt documents]$ ls -l total 0

Unix. 1 tblgrant projects 0 Jun 17 15:40 doc1.txt. documents]$ touch doc2.txt documents]$ ls -l total 0 Unix Each one of you will get an account on silo.cs.indiana.edu where you will do most of your work. You will need to become familiar with the Unix environment (file system and a number of Unix commands)

More information

ENCM 339 Fall 2017: Editing and Running Programs in the Lab

ENCM 339 Fall 2017: Editing and Running Programs in the Lab page 1 of 8 ENCM 339 Fall 2017: Editing and Running Programs in the Lab Steve Norman Department of Electrical & Computer Engineering University of Calgary September 2017 Introduction This document is a

More information

Laboratory 1 Semester 1 11/12

Laboratory 1 Semester 1 11/12 CS2106 National University of Singapore School of Computing Laboratory 1 Semester 1 11/12 MATRICULATION NUMBER: In this lab exercise, you will get familiarize with some basic UNIX commands, editing and

More information

Using IDLE for

Using IDLE for Using IDLE for 15-110 Step 1: Installing Python Download and install Python using the Resources page of the 15-110 website. Be sure to install version 3.3.2 and the correct version depending on whether

More information

Linux File System and Basic Commands

Linux File System and Basic Commands Linux File System and Basic Commands 0.1 Files, directories, and pwd The GNU/Linux operating system is much different from your typical Microsoft Windows PC, and probably looks different from Apple OS

More information

CS Fundamentals of Programming II Fall Very Basic UNIX

CS Fundamentals of Programming II Fall Very Basic UNIX CS 215 - Fundamentals of Programming II Fall 2012 - Very Basic UNIX This handout very briefly describes how to use Unix and how to use the Linux server and client machines in the CS (Project) Lab (KC-265)

More information

Lab Compiling using an IDE (Eclipse)

Lab Compiling using an IDE (Eclipse) Lab 1. This introductory lab is composed of three tasks. Your final objective is to run your first Hadoop application. For this goal, you must learn how to compile the source code and produce a jar, connect

More information

Parallel Programming Pre-Assignment. Setting up the Software Environment

Parallel Programming Pre-Assignment. Setting up the Software Environment Parallel Programming Pre-Assignment Setting up the Software Environment Author: B. Wilkinson Modification date: January 3, 2016 Software The purpose of this pre-assignment is to set up the software environment

More information

Guided Tour (Version 3.4) By Steven Castellucci

Guided Tour (Version 3.4) By Steven Castellucci Guided Tour (Version 3.4) By Steven Castellucci This document was inspired by the Guided Tour written by Professor H. Roumani. His version of the tour can be accessed at the following URL: http://www.cse.yorku.ca/~roumani/jbayork/guidedtour.pdf.

More information

PROJECT INFRASTRUCTURE AND BASH INTRODUCTION MARKUS PILMAN<

PROJECT INFRASTRUCTURE AND BASH INTRODUCTION MARKUS PILMAN< PROJECT INFRASTRUCTURE AND BASH INTRODUCTION MARKUS PILMAN< MPILMAN@INF.ETHZ.CH> ORGANIZATION Tutorials on Tuesdays - Sometimes, will be announced In General: no exercise sessions (unless you get an email

More information