ECE/CSC 506: Architecture of Parallel Computers
Program 2: Simulating Ocean Currents (Serial, OpenMP, and CUDA versions)
Due: Friday, September 29, 2017
Preliminary version; differences between the preliminary version and this version

1. Overall Problem Description

In this project, you will add new features to a trace-driven Ocean Current Simulator. You can fetch the simulation code to login.hpc.ncsu.edu with the command

    wget https://www.csc2.ncsu.edu/faculty/efg/506/f17/www/homework/p2/program2.tgz

You are provided with a superclass grid.cpp and a derived class solver_serial.cpp. There are incomplete functions in solver_serial that need to be completed. You will work on three versions of this simulation: a serial version, an OpenMP version, and a CUDA version. Your project should build on a Linux machine. The most challenging part of this machine problem is to understand the loop dependences and the decomposition of the tasks. In this project, you will implement red-black ordering as discussed in the lecture. The purpose is to understand the importance of parallelizing a program.

In the OpenMP implementation, you will need to build a new derived class solver_omp.cpp in line with solver_serial. In the CUDA version, you need to build a new derived class solver_cuda.cu. Your project should build on login.hpc.ncsu.edu. The objective of this part of the project is to understand how to parallelize a serial application.

2. Simulator

The specifications of the project are as follows. A grid (array) of regular dimension N x N is created and initialized with the values from the input trace file. The heart of this simulator is an equation-solver function, which solves a simple partial differential equation on the grid. The border rows and columns do not participate in the computation, as the boundary values do not change; only the interior (N-2) x (N-2) points are updated by the equation solver.

The computation proceeds over a number of sweeps. In each sweep, it operates on all the elements of the grid, replacing the value of each element with a weighted average of itself and its four nearest neighbor elements. The updates are done in place in the grid, so a point sees the new values of the points above and to the left of it, and the old values of the points below it and to its right. During each sweep, the equation solver also computes the average difference of an updated element from its previous value. If this average difference over all elements is smaller than a predefined tolerance parameter, the solution is said to have converged and the solver exits at the end of the sweep. Otherwise, it performs another sweep and tests for convergence again.

This project exploits parallelism using red-black ordering. The idea is to separate the grid points into alternating red points and black points as on a checkerboard, as shown in the figure, so that no red point is adjacent to another red point, and no black point is adjacent to another black point. Since each point reads only its four nearest neighbors, in order to compute a red point we do not need the updated value of any other red point, but only the updated values of the above and left black points (in a standard sweep), and vice versa. We can therefore divide a grid sweep into two phases: first compute all red points, and then compute all black points. Within each phase there are no dependences among grid points, so we can compute all red points in parallel, then synchronize globally, and then compute all black points in parallel.

[Figure: checkerboard pattern of alternating red and black grid points]

3. Building the simulator

You are provided with the superclass grid.cpp and the derived class solver_serial.cpp, along with main.cpp. The grid class is used to create a grid array and defines the functions needed to traverse the grid, which are as follows:

    void initialize_grid(file* p_file);
    void set_tol_value(float tol) {tolerance = tol;}
    void print_grid();
    virtual void simulate_eqn_solver() = 0;

The initialize_grid method initializes the grid with the values from the input file. The set_tol_value method updates the tolerance value associated with the grid. The print_grid method displays the final contents of the grid. The simulate_eqn_solver function performs the finite-difference operation over the grid and invokes the function for red-black ordering. This function is declared as a pure virtual function, since the actual definition is given in solver_serial.cpp.
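The OpenMP and CUDA solvers are further derived classes that override the same pure virtual function. As a rough illustration only (the constructor signature and member names here are assumptions; follow the actual interfaces in the provided grid.h and solver_serial.h), such a header might look like this:

    // solver_omp.h -- illustrative sketch, not the provided code
    #ifndef SOLVER_OMP_H
    #define SOLVER_OMP_H

    #include "grid.h"

    class solver_omp : public grid {
    public:
        // Override the pure virtual function declared in the grid class.
        virtual void simulate_eqn_solver();

    private:
        // Helper that performs the two-phase red-black sweep in parallel.
        void red_black_ordering();
    };

    #endif

The point of the pattern is that main.cpp can drive any of the three solvers through the common grid interface, while each derived class supplies its own simulate_eqn_solver and red_black_ordering.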

The solver_serial.cpp class contains the functions simulate_eqn_solver and red_black_ordering. You need to implement them as shown in the following pseudocode:

    while (!done) do                 /* outermost loop over sweeps */
        diff = 0;                    /* initialize accumulated difference to 0 */
        for i <- 1 to n do           /* sweep over non-border points of grid */
            for j <- 1 to n do
                temp = A[i,j];       /* save old value of element */
                A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
                                A[i,j+1] + A[i+1,j]);   /* compute average */
                diff += abs(A[i,j] - temp);
            end for
        end for
        if (diff/(n*n) < TOL) then done = 1;
    end while

Once you're ready to build your program, you can compile as follows:

    make serial
    g++ -O0 -Wall -Werror -D SERIAL -c main.cpp -o SERIAL/main.o
    g++ -O0 -Wall -Werror -D SERIAL -c grid.cpp -o SERIAL/grid.o
    g++ -O0 -Wall -Werror -D SERIAL -c solver_serial.cpp -o SERIAL/solver_serial.o
    g++ -O0 -Wall -Werror -D SERIAL -o ocean_sim_serial SERIAL/main.o SERIAL/grid.o SERIAL/solver_serial.o -lm
    FA OCEAN SIMULATOR SERIAL VERSION
    Compilation Done ---> nothing else to make :)

An executable called ocean_sim_serial will be created. In order to run your simulator, you need to execute the following command:

    ./ocean_sim_serial dimension tolerance trace_file
    ./ocean_sim_serial dimension tolerance trace_file num_of_threads

where
- ocean_sim_serial is the executable of the Ocean simulator generated by make
- dimension is the grid dimension
- tolerance is the point of convergence for the equation solver
- trace_file is the input file that has the dummy ocean current trace
- num_of_threads is the number of threads or ...
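The pseudocode above is the standard in-place sweep. In the red-black version, each sweep is split into the two phases described in Section 2, and within a phase the point updates are independent, so the OpenMP solver can parallelize the loop over rows. The following is a minimal sketch, assuming the grid is a 2-D array of floats named A with full dimension n; the actual member names and storage layout in the provided classes may differ:

    #include <cmath>
    #include <omp.h>

    // One red-black sweep over the interior points of an n x n grid.
    // Phase 0 updates "red" points ((i + j) even); phase 1 updates "black" points.
    // Returns the accumulated absolute difference for the convergence test.
    float red_black_sweep(float **A, int n)
    {
        float diff = 0.0f;
        for (int phase = 0; phase < 2; phase++) {
            // Within a phase there are no dependences, so rows run in parallel.
            #pragma omp parallel for reduction(+:diff)
            for (int i = 1; i < n - 1; i++) {
                for (int j = 1; j < n - 1; j++) {
                    if ((i + j) % 2 != phase)
                        continue;                 // skip points of the other color
                    float temp = A[i][j];
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j]
                                              + A[i][j+1] + A[i+1][j]);
                    diff += std::fabs(A[i][j] - temp);
                }
            }
            // The implicit barrier at the end of the parallel for separates the phases.
        }
        // Caller averages diff over the interior points and compares it against the
        // tolerance, as in the pseudocode above.
        return diff;
    }

The serial red_black_ordering is identical except that the #pragma is omitted, which is one reason the OpenMP derived class can be written "in line with" solver_serial.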

Your output should match the given validation runs in terms of results and format. You will need to check the results using the diff command:

    diff -iw <given output file> <your output file>

You can dump the output from your simulator to stdout and redirect it to a file using the > operator. You will be provided with outputs of 5 validation runs.

You may build the OpenMP version and the CUDA version similarly by running

    make omp
    make cuda

ocean_sim_omp and ocean_sim_cuda will be generated for each implementation respectively. You may also run make to generate all three executables in one go, once you have tested all three implementations. Please ensure that your serial and OpenMP programs run on hpc.ncsu.edu, and that your CUDA program runs on the ARC Cluster. TAs will use these environments to verify your code.

Editing, Compiling and Running

Please refer to the Guide to ARC document for how to connect to the ARC Cluster. Once you have a prompt on a compute node, you should still be able to see the files you uploaded. To edit your program, you can make edits on a Linux host and just push changes up to ARC using sftp. However, it will probably be easier to just edit directly on the ARC machine. If you're accustomed to vim or emacs, they are both there for you. If you're not used to editing on a Linux machine, nano is available and fairly easy to use.

To run the test vecadd code, run the make command, then ./vec_add 10 input.txt. You should see the following output:

    [unityid@c23 vecadd]$ make
    /usr/local/cuda/bin/nvcc -arch=sm_30 -g -G -O0 -o vectoradd.o -c vectoradd.cu
    /usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -lcuda -o vec_add vectoradd.o
    FA VECTOR ADDITION CUDA SAMPLE PROGRAM
    Compilation Done ---> nothing else to make :)
    [unityid@c23 vecadd]$ ./vec_add 10 input.txt
    ===== CSC506 Vector Add CUDA Sample Code =====
    NUM OF ELEMENTS: 10
    TRACE FILE: input.txt
    [Vector addition of 10 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 1 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    =====Completed Vector Addition=====
    ========== Done ==========
    [unityid@c23 vecadd]$

Once you're ready to build your program, you can compile as follows:

    [unityid@c23 506_ocean_sim_cuda]$ make clean; make cuda
    rm -rf SERIAL OMP CUDA
    rm -f *.o ocean_sim_*
    g++ -O0 -Wall -Werror -D CUDA -c main.cpp -o CUDA/main.o
    g++ -O0 -Wall -Werror -D CUDA -c grid.cpp -o CUDA/grid.o
    /usr/local/cuda-7.0/bin/nvcc -arch=sm_21 -g -G -O0 -o CUDA/solver_cuda.o -c solver_cuda.cu
    /usr/local/cuda-7.0/bin/nvcc -arch=sm_21 -g -G -O0 -o ocean_sim_cuda CUDA/main.o CUDA/grid.o CUDA/solver_cuda.o
    FA OCEAN SIMULATOR CUDA VERSION
    [unityid@c23 506_ocean_sim_cuda]$

When you're ready to run, be sure you're on a compute node. Your prompt should say something like [unityid@c23 506_ocean_sim_cuda]$. You can run your program just like you would an ordinary program:

    ./ocean_sim_cuda input_16x16.txt > val_16x16.txt

Your output should match the given validation runs in terms of results and format. You will need to check the results using the diff command:

    diff -iw <given output file> <your output file>

You can dump the output from your simulator to stdout and redirect it to a file using the > operator. You will be provided with outputs of 5 validation runs.
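The Report section below asks you to compare the sequence of commands needed to launch a CUDA kernel against launching serial code. As a rough, illustrative sketch only (not the provided code, and assuming the grid has been flattened into a 1-D device array d_A of dimension n x n), one red-black phase might be expressed as a kernel like this:

    // Illustrative sketch of one red-black phase as a CUDA kernel.
    // d_A is a flattened n x n grid in device memory; d_diff accumulates differences.
    __global__ void red_black_phase(float *d_A, float *d_diff, int n, int phase)
    {
        int i = blockIdx.y * blockDim.y + threadIdx.y;   // row
        int j = blockIdx.x * blockDim.x + threadIdx.x;   // column

        if (i < 1 || i >= n - 1 || j < 1 || j >= n - 1)  // skip border points
            return;
        if ((i + j) % 2 != phase)                        // skip the other color
            return;

        float temp = d_A[i * n + j];
        d_A[i * n + j] = 0.2f * (d_A[i * n + j] + d_A[i * n + (j - 1)] +
                                 d_A[(i - 1) * n + j] + d_A[i * n + (j + 1)] +
                                 d_A[(i + 1) * n + j]);
        atomicAdd(d_diff, fabsf(d_A[i * n + j] - temp));
    }

    // Host-side launch sequence for one sweep (two phases, one launch each):
    //   dim3 threads(16, 16);
    //   dim3 blocks((n + threads.x - 1) / threads.x, (n + threads.y - 1) / threads.y);
    //   cudaMemset(d_diff, 0, sizeof(float));
    //   red_black_phase<<<blocks, threads>>>(d_A, d_diff, n, 0);   // red phase
    //   red_black_phase<<<blocks, threads>>>(d_A, d_diff, n, 1);   // black phase
    //   cudaMemcpy(&diff, d_diff, sizeof(float), cudaMemcpyDeviceToHost);

Launching the kernel twice per sweep, once per color, on the default stream preserves the red-then-black ordering, since kernels on the same stream execute in order. The host/device copies and launch configuration are exactly the extra steps you will be contrasting with the serial version in your report.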

4. Report

For the OpenMP version, investigate the effect of varying the grid size and the number of threads. Graph the results and explain what you see. For the CUDA version, compare the sequence of commands to launch a CUDA kernel against launching serial code. If you obtain a speedup, explain why. If you don't, explain the overhead that prevents it.

5. Grading

- 20%: Your code compiles successfully.
- 40%: Your output matches exactly for runs on all five files (points will be equally distributed).
- 40%: Report. Credit will be given on the statistics shown and discussion presented.

6. Submission Format

In order to grade all submissions promptly, we have to ask you to follow the submission format. Your final submission should be in a zip file named unityid1_unityid2_program2.zip. If you are not working as a team, just name the file unityid_program2.zip. Your Unity ID is the one generated from your name, with letters and sometimes digits at the end. Do NOT use your campus card ID or your alias. Your zip file should only contain the following files:

    grid.cpp
    grid.h
    main.cpp
    main.h
    Makefile
    solver_cuda.cu
    solver_cuda.h
    solver_omp.cpp
    solver_omp.h
    solver_serial.cpp
    solver_serial.h

Do not include the parent folder in your zip file. Also, there is no need to include the input and output files. This command should help you generate the zip file:

    zip -r unityid1_unityid2_program2.zip *.cpp *.h *.cu Makefile

Not following the submission format properly will result in a maximum of 5 points penalty.

7. Suggestions

- Read the main program and the superclasses carefully, and understand how the program works.
- Most of the code given to you is well encapsulated, so you do not have to modify most of the existing functions. You just need to complete the definitions of the incomplete functions.
- Understand how the different CUDA APIs handle memory-allocation errors and out-of-bounds references.
- Make sure there are no memory leaks in your program by de-allocating memory after you are done with it (see the sketch after this list).
- There will be occasional downtime with ARC. Please start working on the assignment as early as possible and do not wait until the last minute.
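For the CUDA-related suggestions above, a minimal, illustrative error-checking and cleanup pattern looks like the following; the names are placeholders, not those used in the provided code. Every CUDA runtime call returns a cudaError_t, and checking it is the simplest way to catch failed allocations early.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Check the status returned by a CUDA runtime call and abort on failure.
    #define CUDA_CHECK(call)                                                \
        do {                                                                \
            cudaError_t err = (call);                                       \
            if (err != cudaSuccess) {                                       \
                fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
                        cudaGetErrorString(err), __FILE__, __LINE__);       \
                exit(EXIT_FAILURE);                                         \
            }                                                               \
        } while (0)

    void example(int n)
    {
        float *d_A = NULL;
        // Fails cleanly (instead of silently) if the device is out of memory.
        CUDA_CHECK(cudaMalloc((void **)&d_A, n * n * sizeof(float)));

        // ... copy data in, launch kernels, copy results back ...

        // Release device memory when done with it, so there are no leaks.
        CUDA_CHECK(cudaFree(d_A));
    }

The same idea applies to host-side allocations: pair every new/malloc with a delete/free once the grid is no longer needed.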
