COSC 6385 Computer Architecture - Homework

COSC 6385 Computer Architecture - Homework, Fall 2008

1st Assignment - Rules
- Each team should deliver:
  - Source code (.c, .h and Makefile files). Please: no .o files and no executables!
  - Documentation (.pdf, .doc, .tex or .txt file)
- Deliver electronically to gabriel@cs.uh.edu
- Expected by Monday, October 13, 11:59pm
- In case of questions: ask the TAs first; if they don't know the answer, they will ask me. Ask early, not the day before the submission is due.

About the Project
- You are given the source code for a sequential image segmentation code (file cosc6385-hw.tar.gz).
- You can open the archive with: tar xzvf cosc6385-hw.tar.gz
- Among others, the archive contains the following files:
  - Makefile          /* To compile everything on Linux/Unix */
  - main.c            /* The only file that you have to modify! */
  - OutputStack.conf  /* A configuration file to be used for the hw */
  - OutputStack.raw   /* the raw image */

About the Project
- A sequential code performing image segmentation of a multi-spectral image
- Code provided by Shishir Shah
- Input files: a flat image (no compression) and a configuration file
- Start the application by
  - Compiling: just type make
  - Running: allocate a node (see later in the lecture) and type ./multiscalegabor OutputStack.conf

Configuration file
  OutputStack.raw   // name of the image
  1040              // image height
  1392              // image width
  3                 // no. of segments to be created
  1                 // smoothing flag, 0: no smoothing, 1: smoothing
  0                 // write texture information to file, 0: no, 1: yes
  0                 // fftw flag, 0: FFTW_ESTIMATE, 1: FFTW_MEASURE
  4                 // no. of channels, e.g. bw: 1, color: 3

The source code
The source code consists of the following parts:
- Domain 1: Perform I/O operations (e.g. read image, write texture information, write result of segmentation) - Steps 4, 6c and 11
- Domain 2: Create a filter bank of Gabor filters - Step 5
- Domain 3: Perform a convolution operation of each filter on the image - Steps 2, 6a, 6b and 6d
- Domain 4: Determine texture statistics and clustering - Steps 8 and 9
- Domain 5: Perform spatial smoothing on the labels - Step 10

Part 1
- Instrument the main file main.c to use hardware performance counters to determine the behavior of each Domain described on the previous page separately (a minimal sketch follows below).
- The hardware performance counters should be based on the PAPI library, and you could monitor the following values:
  - Level 2 cache hits and Level 2 cache misses
  - Number of floating point operations and integer instructions
  - Floating point performance
- Whether you can access these values will depend on the processor you are actually using!
- Please note that counter values might overflow. PAPI can handle that, but you have to include special function calls for it.

Part 2
- Run the modified code on the shark cluster.
- Generate graphs for at least 3 PAPI hardware counters, showing the values for each Domain separately.
- Please document (you can use PAPI to figure many of these things out!):
  - Processor type, frequency
  - Operating system (as precisely as possible)
  - Cache sizes
- Each team has a single account.
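As a starting point for Part 1, here is a minimal sketch, not taken from the provided main.c, of how one domain could be wrapped with the PAPI 3.x high-level API. The event selection and the function domain2_create_filter_bank() are placeholders, the chosen presets have to be checked for availability on shark first (see avail.c below), and overflow handling, which needs the low-level API (e.g. PAPI_overflow), is omitted.

    /* Minimal PAPI sketch - placeholder code, not the provided main.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    /* hypothetical stand-in for the code of one domain (here: Domain 2) */
    static void domain2_create_filter_bank(void)
    {
        volatile double x = 0.0;
        int i;
        for (i = 0; i < 1000000; i++)
            x += i * 0.5;
    }

    int main(void)
    {
        int       events[3] = { PAPI_L2_TCH, PAPI_L2_TCM, PAPI_FP_OPS };
        long long values[3];

        /* start counting; the high-level API initializes PAPI implicitly */
        if (PAPI_start_counters(events, 3) != PAPI_OK) {
            fprintf(stderr, "PAPI_start_counters failed\n");
            exit(EXIT_FAILURE);
        }

        domain2_create_filter_bank();      /* the code of one domain */

        /* stop counting and read the accumulated counter values */
        if (PAPI_stop_counters(values, 3) != PAPI_OK) {
            fprintf(stderr, "PAPI_stop_counters failed\n");
            exit(EXIT_FAILURE);
        }
        printf("Domain 2: L2 hits=%lld  L2 misses=%lld  FP ops=%lld\n",
               values[0], values[1], values[2]);
        return 0;
    }

Depending on how PAPI is installed on shark, building such a test may need explicit paths, e.g. something like gcc -I/opt/papi-3.6.0/include -L/opt/papi-3.6.0/lib papitest.c -lpapi (the include/lib paths are an assumption; check the actual installation).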

Part 3 (only for the two-person teams!)
- Generate an estimate of the cache usage of the original code (without PAPI calls in it) using the valgrind toolkit with cachegrind, e.g.
  valgrind --tool=cachegrind ./multiscalegabor OutputStack.conf
- If possible, compare the data produced by valgrind to the data obtained with PAPI.
- Note: the execution of the application using valgrind/cachegrind will be significantly slower than without it!

Notes
- The PAPI version installed on shark is 3.6.0.
- On the front-end node you can find tons of examples in C and Fortran on how to use PAPI in /opt/papi-3.6.0/share/examples/, e.g.
  - src/ctests/avail.c      -> how to check whether a counter is available on a processor
  - src/ctests/high_level.c -> how to use the high-level API of PAPI
  - src/ctests/memory.c     -> how to extract information about the memory subsystem, e.g. cache sizes (see the sketch below)
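The memory.c example mentioned above is also a convenient way to collect the processor and cache information required for the Part 2 documentation. The sketch below only illustrates that approach: the PAPI_hw_info_t fields used (vendor_string, model_string, mhz, totalcpus, mem_hierarchy) are assumptions based on the PAPI 3.x headers and should be verified against the installation on shark.

    /* Sketch: report basic hardware information through PAPI 3.x.
     * Field names are assumptions from the 3.x PAPI_hw_info_t; cache sizes
     * can be extracted by walking hw->mem_hierarchy as shown in memory.c. */
    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        const PAPI_hw_info_t *hw;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI_library_init failed\n");
            return 1;
        }
        hw = PAPI_get_hardware_info();
        if (hw == NULL) {
            fprintf(stderr, "PAPI_get_hardware_info failed\n");
            return 1;
        }
        printf("Vendor / model : %s / %s\n", hw->vendor_string, hw->model_string);
        printf("Clock          : %.0f MHz\n", (double) hw->mhz);
        printf("CPUs (total)   : %d\n", hw->totalcpus);
        return 0;
    }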

1st Assignment
The documentation should contain:
- (Brief) problem description
- Solution strategy
- Results section
  - Description of resources used
  - Description of measurements performed
  - Results (graphs + findings)

1st Assignment
The documentation should not contain:
- A replication of the entire source code - that's why you have to deliver the sources
- Screen shots of every single measurement you made - actually, no screen shots at all
- The slurm output files

How to use a cluster
- A cluster usually consists of a front-end node and compute nodes.
- You can log in to the front-end node using ssh (from Windows or Linux machines) with the login name and the password assigned to you.
- The front-end node is there for editing and compiling - not for running jobs! If 40 teams ran their jobs on the same processor, everything would stall!
- To allocate a node for interactive development:
  teamxy@shark:~> salloc -N 1 bash
  teamxy@shark:~> squeue
    JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
      489      calc      smith R 0:02     1 shark08
  teamxy@shark:~> ssh shark08

How to use a cluster (II)
- Once your code is correct and you would like to do the measurements, you have to submit a batch job.
- The command you need is sbatch, e.g. sbatch -N 1 ./ImageAnalysis.sh
- Your job goes into a queue and will be executed as soon as a node is available.
- You can check the status of your job with squeue.

How to use a cluster (III)
- The output of squeue gives you a job-id for your job.
- Once your job finishes, you will have a file called slurm-<jobid>.out in your home directory, which contains all the output of your printf statements etc.
- Note: the batch script used for the job submission (e.g. ImageAnalysis.sh) has to be executable. This means that after you have downloaded it from the webpage and copied it to shark, you have to type: chmod +x ImageAnalysis.sh
- Please do not edit the ImageAnalysis.sh file on MS Windows. Windows does not use the UNIX end-of-line markers, and this confuses slurm when reading the file.

Notes
- PAPI project webpage: http://icl.cs.utk.edu/papi
- PAPI Programmer's guide: http://icl.cs.utk.edu/projects/papi/files/documentation/papi_prog_ref.pdf
- PAPI User's guide: http://icl.cs.utk.edu/projects/papi/files/documentation/papi_user_guide_306.pdf
- If you need hints on how to use a UNIX/Linux machine through ssh: http://www.cs.uh.edu/~gabriel/cosc4397_s06/parco_08_introductionunix.pdf
- How to use a cluster such as shark: http://pstl.cs.uh.edu/resources.html