MATRIX and Distributed Job Launch Hands-on Tutorial

Requirements and Dependencies


- Linux
- GCC
- Google Protocol Buffers version 2.4.1: https://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
- Google Protocol Buffers C bindings version 0.15: https://code.google.com/p/protobufc/downloads/detail?name=protobuf-c-0.15.tar.gz

1. Install Google Protocol Buffers (refer to the ZHT tutorial).
2. Install the Google Protocol Buffers C bindings (refer to the ZHT tutorial).
3. Download the MATRIX code: https://github.com/anupammr89/matrix
4. Unzip the MATRIX code.
5. cd into the matrix-master folder.
6. Read the README file first for details.
7. Run make to compile.

Running the server:

./server portno neighbor zht.cfg TCP username logging max_tasks_per_package num_tasks prefix shared

Argument descriptions:
- portno: the MATRIX server port number (e.g. 50000)
- neighbor: the membership list file, one "ip port" pair per line (e.g. 126.47.142.97 50000)
- zht.cfg: configures the other ZHT parameters (use the default)
- TCP: use the TCP protocol; otherwise UDP is used
- username: auxiliary parameter used by ZHT (e.g. kwang)
- logging: flag to turn logging ON/OFF (1/0)
- max_tasks_per_package: number of tasks bundled into a single package before submission to the MATRIX server (e.g. 8)
- num_tasks: total number of tasks submitted by the client (e.g. 8)
- prefix: path to the logging directory used by each MATRIX server instance; private to each instance (e.g. /tmp)
- shared: path to the logging directory shared by all MATRIX servers (e.g. ~/Downloads/)
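As a minimal sketch (not part of the tutorial), the argument list above can be assembled into the server command line like this; the username and directory values are the illustrative ones from the descriptions:

```shell
PORT=50000                    # MATRIX server port number
NEIGHBOR=neighbor             # membership list file
ZHT_CFG=zht.cfg               # ZHT configuration file
PROTO=TCP                     # TCP or UDP
ZHT_USER=kwang                # auxiliary ZHT username
LOGGING=1                     # 1 = logging on, 0 = off
MAX_TASKS_PER_PACKAGE=8       # tasks bundled per package
NUM_TASKS=8                   # total tasks submitted by the client
PREFIX=/tmp                   # per-instance logging directory
SHARED="$HOME/Downloads/"     # logging directory shared by all servers

# Assemble the full command line in the argument order shown above.
CMD="./server $PORT $NEIGHBOR $ZHT_CFG $PROTO $ZHT_USER $LOGGING $MAX_TASKS_PER_PACKAGE $NUM_TASKS $PREFIX $SHARED"
echo "$CMD"                   # inspect first, then run with: eval "$CMD"
```

Echoing the command before running it makes it easy to spot an argument out of order, since the server takes ten positional arguments.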

Running the client:

./client per_client_task neighbor zht.cfg TCP sleep_time logging max_tasks_per_package client_index num_tasks prefix shared DAG_choice

Argument descriptions:
- per_client_task: the number of tasks submitted by this client; useful when multiple clients submit tasks to MATRIX
- sleep_time: the duration of each submitted sleep task (e.g. 0)
- client_index: the index of this client; used when there are multiple clients (e.g. 0)
- DAG_choice: the DAG type to use; 0 = Bag of Tasks, 1 = Random DAG, 2 = Pipeline DAG, 3 = Fan-In DAG, 4 = Fan-Out DAG
- The remaining arguments are the same as for the server.
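A matching sketch (not part of the tutorial) for the client side: one client submitting a bag of 8 zero-second sleep tasks, with the same illustrative logging directories as the server example:

```shell
PER_CLIENT_TASK=8    # tasks submitted by this client
SLEEP_TIME=0         # duration of each sleep task
LOGGING=1            # 1 = logging on, 0 = off
MAX_TASKS_PER_PACKAGE=8
CLIENT_INDEX=0       # index of this client (only one client here)
NUM_TASKS=8          # total tasks across all clients
DAG_CHOICE=0         # 0=Bag of Tasks 1=Random 2=Pipeline 3=Fan-In 4=Fan-Out

# Assemble the command line in the twelve-argument order shown above.
CMD="./client $PER_CLIENT_TASK neighbor zht.cfg TCP $SLEEP_TIME $LOGGING $MAX_TASKS_PER_PACKAGE $CLIENT_INDEX $NUM_TASKS /tmp $HOME/Downloads/ $DAG_CHOICE"
echo "$CMD"
```

With several clients, each instance would get its own CLIENT_INDEX while NUM_TASKS stays the total across all of them.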

Deploying on multiple nodes:

- Copy the binaries (server and client) to every node.
- Copy the configuration files (neighbor and zht.cfg) to every node.
- Edit the neighbor file to contain the IP and port of all of these nodes; DO NOT use localhost.
- On every node, run the commands as described in the Benchmark section.
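The shared neighbor file can be generated from a node list; this sketch (not part of the tutorial) uses two illustrative addresses — in a real deployment it must list the actual IPs of all nodes, never localhost:

```shell
PORT=50000                            # MATRIX server port on every node
NODES="126.47.142.97 126.47.142.98"   # illustrative node IPs

: > neighbor                          # create/truncate the membership file
for ip in $NODES; do
  printf '%s %s\n' "$ip" "$PORT" >> neighbor   # one "ip port" pair per line
done
cat neighbor
```

The resulting file is then copied to every node along with zht.cfg and the binaries.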

Requirements and Dependencies (Distributed Job Launch)

- Linux
- GCC
- ZHT
- SLURM version 2.5.3

1. Download the DJL code from https://github.com/kwangiit/dist_job_launch, unzip it, and use distjoblaunch_4.zip as the code base.
2. Unzip distjoblaunch_4.zip.
3. cd into the slurm-2.5.3 folder.
4. Install SLURM (see http://www.schedmd.com/slurmdocs/quickstart_admin.html):
   ./configure
   make
   sudo make install
5. Copy the ./etc/slurm.conf.example file into the /etc directory where SLURM is installed, and name it slurm.conf.
6. Modify slurm.conf to match your configuration; see the sample slurm.conf file in the slurm-2.5.3 folder.
7. cd into the slurm-2.5.3/src/zht/src folder to compile ZHT.
8. Compile ZHT (refer to the ZHT tutorial).
9. cd into the controller folder: slurm-2.5.3/src/controller.
10. Compile the controller: make.
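The build sequence above can be sketched as a dry-run script (not part of the tutorial): every step is only echoed, and setting DRY_RUN to empty would execute them for real. The make -C invocations for ZHT and the controller are an assumption about the Makefile layout; step 8 of the tutorial defers to the ZHT tutorial for the exact ZHT build:

```shell
DRY_RUN=1
run() {
  echo "+ $*"                  # show the step
  [ -n "$DRY_RUN" ] || "$@"    # execute only when DRY_RUN is empty
}

run unzip distjoblaunch_4.zip
run cd slurm-2.5.3
run ./configure
run make
run sudo make install
run cp ./etc/slurm.conf.example /etc/slurm.conf    # then edit for your site
run make -C src/zht/src                            # compile ZHT (see ZHT tutorial)
run make -C src/controller                         # compile the controller
```

Reviewing the echoed steps before flipping DRY_RUN off is a cheap way to confirm the order (SLURM first, then ZHT, then the controller).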

1. Run the ZHT server:
   cd slurm-2.5.3/src/zht/src
   ./zhtserver -z zht.conf -n neighbor.conf
2. Run the controller:
   cd slurm-2.5.3/src/controller
   ./controller numcontroller memlist partitionsize workload zht_config_file zht_mem_list
   - numcontroller: number of controllers (e.g. 1)
   - memlist: controller membership list (e.g. localhost)
   - partitionsize: number of SLURM daemons (e.g. 1)
   - workload: the workload file for this controller (e.g. containing "srun -N1 /bin/sleep 0")
   - zht_config_file: the ZHT configuration file (e.g. ../../zht/src/zht.conf)
   - zht_mem_list: the ZHT membership list (e.g. ../../zht/src/neighbor.conf)
3. Run the SLURM daemon:
   slurmd -D -vvvvv
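A one-job workload file for the controller can be written as follows; this is a sketch (not part of the tutorial), the file name is illustrative, and the srun line mirrors the example above:

```shell
# Write a minimal workload file: one job per line.
cat > workload <<'EOF'
srun -N1 /bin/sleep 0
EOF
cat workload
```

The resulting file is what the workload argument of ./controller points at; a larger benchmark would simply list more jobs, one per line.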

Controller workflow:

- Initialize as a ZHT client (c_zht_init).
- Read the controller membership list.
- Read the workload.
- Create a thread to receive messages.
- Create an individual thread to launch each job:
  - allocate nodes
  - create the job
  - create the job step
  - launch the job step
- Wait until all jobs are finished.

More information: http://datasys.cs.iit.edu/~kewang/
Contact: kwang22@hawk.iit.edu
Questions?