ECE 462 Object-Oriented Programming using C++ and Java. Parallelism


ECE 462 Object-Oriented Programming using C++ and Java. Parallelism. Yung-Hsiang Lu, yunglu@purdue.edu. YHL Parallelism 1

Parallelism. parallel: several activities occurring simultaneously. Parallelism is a natural phenomenon in our daily lives: Many students take notes in a class. Multiple cars wait for a traffic light. Several lines of people order fast food (and eat). A student uses several washers in a laundry room. Several players try to catch a basketball. A hundred customers order books on-line. YHL Parallelism 2


Sequential Programming Model. One statement executes first, then the next statement. Control structures (if, for, while, function call...) may change the sequence of execution. Within a small segment (inside if, for, while, a function...), statements are executed sequentially, one by one. The sequential programming model was appropriate when a computer had only one processor. YHL Parallelism 5

Parallel Computer on Your Desktop YHL Parallelism 6

Parallel Programming Models. Multi-threading is one of the most popular, but not the only, available programming models. Other models include client-server, producer-consumer, and pipeline. YHL Parallelism 7

Client-Server. web browser (client), web server. The client sends a request; the server responds. The client and the server may (in fact, very likely) each be multithreaded. [diagram: client and server exchanging request and response messages over time] YHL Parallelism 8

Pipeline. stage 1, stage 2, stage 3. Examples: instruction-level parallelism, a factory assembly line, a buffet or BBQ line. YHL Parallelism 9
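The stage structure above can be sketched with threads connected by queues. This is a minimal illustration, not the lecture's code: the stage operations (double, then add one) and all names are invented for the example, and `java.util.concurrent.BlockingQueue` is used to pass items between stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineDemo {
    // Runs a 3-stage pipeline: stage 1 doubles each input, stage 2 adds one,
    // stage 3 (the calling thread) collects the results.
    public static List<Integer> run(int[] inputs) throws InterruptedException {
        BlockingQueue<Integer> q1 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> q2 = new ArrayBlockingQueue<>(4);
        List<Integer> results = new ArrayList<>();
        final int POISON = Integer.MIN_VALUE;       // end-of-stream marker

        Thread stage1 = new Thread(() -> {
            try {
                for (int x : inputs) q1.put(x * 2); // stage 1: double
                q1.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread stage2 = new Thread(() -> {
            try {
                int x;
                while ((x = q1.take()) != POISON) q2.put(x + 1); // stage 2: add one
                q2.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        stage1.start();
        stage2.start();
        int x;
        while ((x = q2.take()) != POISON) results.add(x); // stage 3: collect
        stage1.join();
        stage2.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(new int[]{1, 2, 3})); // [3, 5, 7]
    }
}
```

As in an assembly line, each stage works on a different item at the same time; the queues are the conveyor belts between stages.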

Latency and Throughput. [diagram: three cars, each passing through stages S1, S2, S3; dividing the work into stages lets the cars overlap in a pipeline over time] YHL Parallelism 10

Requirements for Parallelism. multiple processing units; multiple data; independence among the units and data (usually); some form of agreement among the units and the data. YHL Parallelism 11

Multi-Thread YHL Parallelism 12

Why is Multi-Threading Important? It is one of the most popular parallel programming models. Most machines (through libraries and languages) support multiple threads. Multiple threads allow a program to respond to different requests quickly. You have been using multi-threaded programs for years: web browsers and most GUI programs. It demonstrates many important concepts. YHL Parallelism 13

What is a thread? A thread is (almost) an independent program, except that different threads created in the same program can access shared variables / objects, and threads have smaller overhead than processes in a context switch. Once a thread starts running, it runs independently until completion or synchronization. (context switch = the processor changes which thread to execute) OOP is a natural approach for parallelism: objects are independent and active. YHL Parallelism 14
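The point that threads in the same program share variables can be shown in a few lines. A minimal sketch (class and method names are invented for the example), using `AtomicInteger` so the concurrent updates are safe:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedDemo {
    // One object, visible to every thread in the program.
    static final AtomicInteger shared = new AtomicInteger();

    public static int runTwoThreads() throws InterruptedException {
        shared.set(0);
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) shared.incrementAndGet();
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start();  // both threads run concurrently...
        t2.start();
        t1.join();   // ...until completion
        t2.join();
        return shared.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoThreads()); // 2000: both threads updated the same object
    }
}
```

Two processes could not share `shared` this way without extra machinery (pipes, shared memory segments); for threads it is the default.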

ECE 462 Object-Oriented Programming using C++ and Java. Threads. Yung-Hsiang Lu, yunglu@purdue.edu. YHL Threads 1

Java Thread. class YourClass extends Thread { public void run() { /* what does this thread do? */ } } ... public static void main(String[] args) { YourClass t1 = new YourClass("..."); t1.start(); } main calls start. YHL Threads 2
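The skeleton above can be filled in to make a complete runnable program. This sketch follows the slide's extends-Thread pattern; the class name, the name field, and the message are invented for the example (Java equally accepts a `Runnable` passed to a `Thread` constructor):

```java
public class HelloThread extends Thread {
    private final String name;
    volatile String message;                 // written by the worker thread

    public HelloThread(String name) { this.name = name; }

    @Override
    public void run() {                      // what this thread does
        message = "hello from " + name;
    }

    public static String launch(String name) throws InterruptedException {
        HelloThread t1 = new HelloThread(name);
        t1.start();                          // start() spawns the thread, which calls run()
        t1.join();                           // wait for the thread to finish
        return t1.message;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(launch("t1"));    // hello from t1
    }
}
```

Note that the program calls `start()`, never `run()` directly; calling `run()` would execute the body in the current thread with no parallelism.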

Qt Thread #include <QtCore> class YourClass extends QThread { public void run () { // what does this thread do? } };... int main(int argc, char * argv[]) { YourClass t1 ("..."); t1.start(); } YHL Threads 3

Thread Execution. [diagram: threads t1, t2, t3 interleaved on the processor in time slices over time] YHL Threads 4

Many Threads, Few Processors. Advantages of many threads, even on a single processor: impression of continuous progress in all threads; improved utilization of different components; handling user inputs more quickly. Disadvantages of many threads: they add slight work to the program structure; incur overhead in thread creation; cause complex interleaving of execution and possibly wrong results (if you do not think "in parallel"). YHL Threads 5

Java Thread Examples YHL Threads 6


Matrix Addition. [diagram: 4x4 matrices, C = A + B computed element by element] YHL Threads 9

Divide Matrix Addition. [diagram: the rows of the matrices are split between thread 1 and thread 2; each thread computes its share of C = A + B] YHL Threads 10

Matrix Multiplication. [diagram: 4x4 matrices, C = A x B] YHL Threads 11

Divide Matrix Multiplication. [diagram: the rows of C = A x B are split between thread 1 and thread 2] YHL Threads 12

main thread: th[0] = new AdderThread...; th[1] = new AdderThread...; th[2] = new AdderThread...; th[0].start(); th[1].start(); th[2].start(); ... th[0].join(); th[1].join(); th[2].join(); The main thread creates and starts the workers, then joins each one to wait for its completion. YHL Threads 13
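The create/start/join pattern above, applied to the divided matrix addition, can be sketched as follows. The slides do not show AdderThread's internals, so this body is an assumption: each thread is given a range of rows of C = A + B.

```java
public class MatrixAddDemo {
    // Assumed shape of AdderThread: adds its assigned range of rows of C = A + B.
    static class AdderThread extends Thread {
        final int[][] a, b, c;
        final int fromRow, toRow;
        AdderThread(int[][] a, int[][] b, int[][] c, int fromRow, int toRow) {
            this.a = a; this.b = b; this.c = c;
            this.fromRow = fromRow; this.toRow = toRow;
        }
        @Override public void run() {
            for (int i = fromRow; i < toRow; i++)
                for (int j = 0; j < a[i].length; j++)
                    c[i][j] = a[i][j] + b[i][j];
        }
    }

    public static int[][] add(int[][] a, int[][] b, int numThreads) throws InterruptedException {
        int n = a.length;
        int[][] c = new int[n][a[0].length];
        AdderThread[] th = new AdderThread[numThreads];
        for (int k = 0; k < numThreads; k++) {     // data partition: a block of rows per thread
            int from = k * n / numThreads, to = (k + 1) * n / numThreads;
            th[k] = new AdderThread(a, b, c, from, to);
            th[k].start();
        }
        for (AdderThread t : th) t.join();         // wait for every worker before using C
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] a = {{1, 2}, {3, 4}}, b = {{10, 20}, {30, 40}};
        int[][] c = add(a, b, 2);
        System.out.println(c[0][0] + " " + c[1][1]); // 11 44
    }
}
```

The joins matter: without them the main thread could read C before the workers have finished writing it.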


eight processors in total (numbered 0-7) YHL Threads 25

8GB memory YHL Threads 26

When Java starts, it uses 1024MB of memory. The maximum memory is 2048MB. YHL Threads 27
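Those two numbers correspond to the standard JVM heap options -Xms (initial heap) and -Xmx (maximum heap), e.g. `java -Xms1024m -Xmx2048m YourClass`. A small sketch querying the same values from inside a program through the standard `Runtime` API:

```java
public class MemoryInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With "java -Xms1024m -Xmx2048m MemoryInfo", totalMemory() starts
        // near 1024 MB and maxMemory() reports roughly 2048 MB.
        System.out.println("max heap (MB): "   + rt.maxMemory()   / (1024 * 1024));
        System.out.println("total heap (MB): " + rt.totalMemory() / (1024 * 1024));
        System.out.println("processors: "      + rt.availableProcessors());
    }
}
```

`availableProcessors()` is also how a program can pick a sensible thread count for the machine it runs on, such as the 8-processor machine in the screenshot.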


Qt Thread YHL Threads 29


ECE 462 Object-Oriented Programming using C++ and Java. Parallel Programs. Yung-Hsiang Lu, yunglu@purdue.edu. YHL Parallel Programs 1

Classification. S: single, M: multiple, I: instruction, D: data. SISD (single instruction, single data): a sequential program on a single-processor computer. SIMD (single instruction, multiple data): multiple data elements are operated on in the same way, e.g. matrix addition. MISD (multiple instruction, single data): relatively rare, e.g. finding the eigenvalues and the transpose of the same matrix. MIMD (multiple instruction, multiple data): the most general form of parallel computation. YHL Parallel Programs 2

Parallelization. Data parallelism: break a big piece of data into smaller units, e.g. different rows of matrices, different bank accounts, different books. Instruction parallelism: break a long function into smaller functions, e.g. different bank transactions, different purchase options. Granularity: the size of the units in data and instruction parallelism. YHL Parallel Programs 3
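Data parallelism, applying the same operation to independent elements, is concise to express with Java streams. A minimal sketch (the method name and the squaring operation are invented for the example):

```java
import java.util.stream.IntStream;

public class DataParallelDemo {
    // Data parallelism: the same operation (square) is applied to independent
    // elements; the runtime may split the range across worker threads.
    public static int sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()          // request a parallel evaluation
                        .map(x -> x * x)
                        .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(4)); // 1 + 4 + 9 + 16 = 30
    }
}
```

This works precisely because the elements are independent, which is the "independence among the units and data" requirement from the earlier slide.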

Data Sharing. Read only: no problem. Write: a BIG problem if you are not careful. Hazards: read (d0) before write (d1); write (d0) before read (d1); write (d0) before write (d1). These will be explained later in the topics of synchronization, mutual exclusion, and critical sections. YHL Parallel Programs 4
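Previewing the synchronization topic mentioned above: without protection, two threads doing `count++` interleave the read and the write of `count` and can lose updates. A minimal sketch of the fix (names invented for the example), using `synchronized` to make each read-modify-write atomic:

```java
public class CounterDemo {
    private int count = 0;

    // synchronized serializes the read-modify-write, preventing the
    // read-before-write and write-before-write hazards listed above.
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static int runTwoWriters() throws InterruptedException {
        CounterDemo counter = new CounterDemo();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter.get();   // always 200000 with synchronization
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoWriters());
    }
}
```

With a plain unsynchronized `count++`, the same program can print any value up to 200000 depending on how the threads interleave.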

Data Partition. How do we add a group of numbers? Divide them into pairs and add the pairs recursively. Example: 7 9 6 5 3 4 1 8 2 6. Threads t1-t5 add the pairs, giving 16 11 7 9 8. Threads t1-t2 add again, giving 27 16 (8 is carried forward). Then t1 gives 43, and finally 51. The depth of the tree is O(log n). YHL Parallel Programs 5
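The recursive pairwise addition can be sketched with Java's fork/join framework, which is a natural fit for divide-and-conquer. This is one possible realization, not the lecture's code; `RecursiveTask` and `ForkJoinPool` are standard `java.util.concurrent` classes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class PairwiseSum extends RecursiveTask<Integer> {
    private final int[] data;
    private final int lo, hi;   // half-open range [lo, hi)

    PairwiseSum(int[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override protected Integer compute() {
        if (hi - lo == 1) return data[lo];          // a single operand: nothing to add
        int mid = (lo + hi) / 2;                    // divide into two halves
        PairwiseSum left  = new PairwiseSum(data, lo, mid);
        PairwiseSum right = new PairwiseSum(data, mid, hi);
        left.fork();                                // run the left half in another worker
        return right.compute() + left.join();       // combine the partial sums
    }

    public static int sum(int[] data) {
        return new ForkJoinPool().invoke(new PairwiseSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = {7, 9, 6, 5, 3, 4, 1, 8, 2, 6};  // the numbers from the slide
        System.out.println(sum(data));                 // 51
    }
}
```

The recursion tree has depth O(log n), matching the slide: with enough processors, each level of additions runs in parallel.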

Assumptions. The time and space (overhead) for thread creation, scheduling, and destruction can be ignored. The number of threads = half the number of operands (initially). The number of processors = half the number of operands (initially). All processors can read the operands within the same amount of time. The intermediate results are accessible by the other processors. YHL Parallel Programs 6

Structure of Parallel Programs. partition the data; create threads; start thread execution (the performance improvement comes from this parallel processing); collect the results. YHL Parallel Programs 7