Chapter 6 Parallel Loops


Part I. Preliminaries
Part II. Tightly Coupled Multicore
  Chapter 6. Parallel Loops
  Chapter 7. Parallel Loop Schedules
  Chapter 8. Parallel Reduction
  Chapter 9. Reduction Variables
  Chapter 10. Load Balancing
  Chapter 11. Overlapping
  Chapter 12. Sequential Dependencies
  Chapter 13. Strong Scaling
  Chapter 14. Weak Scaling
  Chapter 15. Exhaustive Search
  Chapter 16. Heuristic Search
  Chapter 17. Parallel Work Queues
Part III. Loosely Coupled Cluster
Part IV. GPU Acceleration
Part V. Map-Reduce
Appendices

BIG CPU, BIG DATA

We begin our study of tightly coupled multicore parallel programming with a simple Parallel Java 2 program to test numbers for primality. Recall that a number is prime if it is divisible only by itself and 1. PrimeSeq (Listing 6.1) is a sequential (non-parallel) program to test the numbers specified on the command line. The program illustrates several features that I'll include in all the parallel programs we study:

- The program is implemented as a subclass of class Task (in package edu.rit.pj2), and the program's code is in the body of the task's main() method.
- Unlike a typical Java program, the main() method is an instance method, not a static method. The Parallel Java 2 middleware expects this.
- The main() method is declared to throw any exception. If an exception is thrown anywhere in the program, this lets the exception propagate out of the main() method, which will terminate the program and print an error message with an exception stack trace. I do this because I'm lazy and I don't want to write a handler for every exception.
- At the start of main(), the program makes sure the proper command line arguments are present; if not, the program prints an error message and exits. If I run the program with no arguments, this reminds me what the arguments should be.
- In the usage() method, the program exits by calling the terminate() method. The argument is an exit code that is returned to the operating system; a nonzero value says that an error occurred. The program must not call System.exit(); doing so interferes with the Parallel Java 2 middleware.
- I overrode the static coresRequired() method to return the value 1, indicating that this program will use one core. This is standard for a non-parallel program. If the coresRequired() method is not overridden, the Parallel Java 2 middleware will assume that the program will use all the cores on the node.

The for loop in the main() method is the heart of the program. The loop iterates over the command line argument array, converts each number from a String to a long, calls the isPrime() method, and prints the number if isPrime() says it's prime.

The isPrime() method uses trial division to test whether the number x is prime. The method tries to divide x by 2 and by every odd number p up through the square root of x. If any remainder is 0, then p is a factor of x, so x is not prime. If none of the remainders is 0, then x is prime. (There's no point in trying factors greater than the square root of x, because if there were such a factor, x would have another factor less than the square root of x, and we would have found that other factor already.) Trial division is not a very efficient algorithm for primality testing, but it suffices for this example.

    package edu.rit.pj2.example;

    import edu.rit.pj2.Task;

    public class PrimeSeq extends Task {

        // Main program.
        public void main(String[] args) throws Exception {
            // Validate command line arguments.
            if (args.length < 1) usage();

            // Test numbers for primality.
            for (int i = 0; i < args.length; ++i) {
                if (isPrime(Long.parseLong(args[i])))
                    System.out.printf("%s%n", args[i]);
            }
        }

        // Test the given number for primality.
        private static boolean isPrime(long x) {
            if (x % 2 == 0) return false;
            long p = 3;
            long psqr = p*p;
            while (psqr <= x) {
                if (x % p == 0) return false;
                p += 2;
                psqr = p*p;
            }
            return true;
        }

        // Print a usage message and exit.
        private static void usage() {
            System.err.println("Usage: java pj2 " +
                "edu.rit.pj2.example.PrimeSeq <number> ...");
            terminate(1);
        }

        // Specify that this task requires one core.
        protected static int coresRequired() {
            return 1;
        }
    }

Listing 6.1. PrimeSeq.java
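As a standalone illustration of the trial division described above, here is a minimal sketch that does not depend on Parallel Java 2. The class name TrialDivision is hypothetical. It departs from Listing 6.1 in two deliberate ways: it handles small inputs explicitly (as written, the listing reports 2 as composite and 1 as prime, which does not matter for the large odd numbers tested in this chapter), and it compares p against x / p instead of computing p*p, which sidesteps overflow for inputs near Long.MAX_VALUE.

```java
public class TrialDivision {

    // Returns true if x is prime, using trial division by 2 and by every
    // odd number up through the square root of x.
    public static boolean isPrime(long x) {
        if (x < 2) return false;        // 0, 1, and negatives are not prime
        if (x % 2 == 0) return x == 2;  // 2 is the only even prime
        // For positive longs, p <= x / p holds exactly when p*p <= x,
        // but it cannot overflow.
        for (long p = 3; p <= x / p; p += 2) {
            if (x % p == 0) return false; // p is a factor, so x is composite
        }
        return true;
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 91, 97, 2147483647L};
        for (long x : samples) {
            System.out.printf("%d %s%n", x, isPrime(x) ? "prime" : "not prime");
        }
    }
}
```

The running time grows with the square root of x, which is why a handful of very large primes is enough to keep a core busy for seconds.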

I ran the PrimeSeq program on tardis, a cluster parallel computer with ten nodes, each node having 12 cores. For now I'll confine myself to running multicore parallel programs on just one node of the cluster. (Later we will develop cluster parallel programs and run them on all the cluster nodes.) Because PrimeSeq is not a parallel program, though, it ran on only one core. I gave it 12 very large numbers to test, all of which happened to be prime. Here is the command and the program's output on one of the tardis nodes:

    $ java pj2 debug=makespan edu.rit.pj2.example.PrimeSeq \
        <number> ...
    Job 1 makespan ... msec

Note that I ran the program by typing the command java pj2. pj2 is the actual Java main program; it is a launcher for Parallel Java 2 programs. pj2 creates an instance of the specified class (class edu.rit.pj2.example.PrimeSeq in this case), which must be a subclass of class Task. pj2 then calls the task's main() method, passing in an array of the command line argument strings.

I also included the option debug=makespan before the task class name. This sets the pj2 program's debug parameter to include the makespan debug printout. Makespan is the elapsed wall clock time from when the task starts running to when the task finishes running. With that option, the pj2 program measures the makespan and prints it as the final line of output. (You can turn on other debug printouts and set other pj2 parameters as well. Refer to the Javadoc documentation for the pj2 program.)

The loop iterations executed sequentially, one after another, on a single core. The running time measurement says that the whole program took 32.6 seconds. From this I infer that each loop iteration, that is, each execution of the isPrime() method, took about 2.7 seconds (32.6 seconds divided by 12 iterations).
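To make the launcher's role concrete, here is a minimal sketch in plain Java of a launcher that creates an instance of a named class, calls its instance main() method through reflection, and measures the makespan. It is not taken from the Parallel Java 2 source, and the real pj2 program does considerably more. The names MiniLauncher, HelloTask, and runAndTime are hypothetical.

```java
import java.lang.reflect.Method;

public class MiniLauncher {

    // Hypothetical stand-in for a task class. Note that main() is an
    // instance method, not a static method.
    public static class HelloTask {
        public void main(String[] args) throws Exception {
            System.out.println("Hello from a task with " + args.length + " argument(s)");
        }
    }

    // Creates an instance of the named class, calls its instance
    // main(String[]) method, and returns the elapsed wall clock time
    // in milliseconds.
    public static long runAndTime(String className, String[] taskArgs) {
        try {
            Class<?> cls = Class.forName(className);
            Object task = cls.getDeclaredConstructor().newInstance();
            Method mainMethod = cls.getMethod("main", String[].class);
            long start = System.nanoTime();
            // The cast passes the array as one argument, not as varargs.
            mainMethod.invoke(task, (Object) taskArgs);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot run " + className, e);
        }
    }

    public static void main(String[] args) {
        long msec = runAndTime(HelloTask.class.getName(), new String[] {"17"});
        System.out.println("Job 1 makespan " + msec + " msec");
    }
}
```

Because the launcher, not the task, owns the process, it can time the task and print the makespan after the task's main() method returns.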

However, for this program there's no need to do the loop iterations in sequence. Because no iteration depends on the results of any prior iteration, we say that the loop does not have any sequential dependencies. (Later we will study loops that do have sequential dependencies.) Therefore, we can execute all the loop iterations in parallel, each on a separate core. Doing so, we hope the program will finish in less time.

PrimeSmp (Listing 6.2) is a parallel version of the primality testing program. It starts out the same as program PrimeSeq. But I replaced the normal, sequential for loop in the original program with a work sharing parallel for loop in the new program. The pattern for writing a parallel for loop is

    parallelFor(lb, ub).exec(new Loop() {
        public void run(int i) {
            // Loop body code for iteration i
        }
    });

The parallel for loop begins with parallelFor instead of just for. (parallelFor() is actually a method of class Task.) Then come the lower and upper bounds of the loop index. The loop goes from lb to ub inclusive, so in PrimeSmp I specified bounds of 0 through args.length-1 to loop over all the command line arguments. The statement so far creates a parallel loop object. Then I called the parallel loop's exec() method, passing in another object, namely the loop body. The loop body is an instance of a subclass of class Loop (in package edu.rit.pj2), which I created using Java's anonymous inner class syntax. I put the code for one loop iteration in the Loop class's run() method, whose argument is the loop index i; this is the same code as the loop body in the sequential version. The rest of the program is the same as the sequential version.

I did not override the coresRequired() method in the PrimeSmp program. Thus, by default, the Parallel Java 2 middleware will assume that the program will use all the cores on the node.

    package edu.rit.pj2.example;

    import edu.rit.pj2.Loop;
    import edu.rit.pj2.Task;

    public class PrimeSmp extends Task {

        // Main program.
        public void main(final String[] args) throws Exception {
            // Validate command line arguments.
            if (args.length < 1) usage();

            // Test numbers for primality.
            parallelFor(0, args.length - 1).exec(new Loop() {
                public void run(int i) {
                    if (isPrime(Long.parseLong(args[i])))
                        System.out.printf("%s%n", args[i]);
                }
            });
        }

        // Test the given number for primality.
        private static boolean isPrime(long x) {
            if (x % 2 == 0) return false;
            long p = 3;
            long psqr = p*p;
            while (psqr <= x) {
                if (x % p == 0) return false;
                p += 2;
                psqr = p*p;
            }
            return true;
        }

        // Print a usage message and exit.
        private static void usage() {
            System.err.println("Usage: java pj2 " +
                "edu.rit.pj2.example.PrimeSmp <number> ...");
            terminate(1);
        }
    }

Listing 6.2. PrimeSmp.java

When the PrimeSmp program runs, the parallel for loop object that is created contains a hidden parallel thread team. There is one thread in the team for each core of the machine where the program is executing. The loop iterations are partitioned among the team threads, and each team thread calls the loop body's run() method repeatedly for a different subset of the loop indexes, concurrently with the other team threads. In other words, the work of the parallel for loop is shared among all the threads, rather than being executed by a single thread as in the sequential version. When the program runs on a multicore parallel computer, the Java Virtual Machine and the operating system schedule each team thread to run on a separate core, resulting in parallel execution of the loop iterations. At the end of the parallel for loop, each team thread waits until all the team threads have finished executing their subsets of the loop iterations, then the program proceeds to execute whatever comes after the parallel loop. This implicit thread synchronization is called a barrier.

I ran the PrimeSmp program on tardis, with the same arguments as the previous example. A tardis node has 12 cores, so the parallel for loop has 12 team threads. The loop's iterations are divided among the team threads; thus, each team thread does one iteration. Here is the result:

    $ java pj2 debug=makespan edu.rit.pj2.example.PrimeSmp \
        <number> ...
    Job 1 makespan 2857 msec

Keep in mind that the number of threads in the parallel thread team is equal to the number of cores in the node, not the number of iterations in the loop. In this example, it's just a coincidence that the number of loop iterations is the same as the number of cores. Shortly we will see examples of parallel loops where the number of loop iterations is not the same as the number of cores.

Notice three things about the parallel program. First, while the sequential program finished in 32.6 seconds, the parallel program finished in only 2.9 seconds. From this I infer that the 12 loop iterations were indeed performed simultaneously on 12 cores instead of one after another on a single core. That is, the parallel program yielded a speedup over the sequential program. The speedup factor was approximately 32.6 s ÷ 2.857 s ≈ 11.4. (Later we will study more about parallel program performance and the metrics with which we measure performance.)

Second, the parallel program's running time (2.9 seconds) was a bit larger than the running time of one loop iteration (2.7 seconds). Why? Because the parallel program has extra overhead that the sequential program does not have: creating the 12 threads in the parallel thread team, partitioning the loop iterations among the team threads, and synchronizing the threads at the end-of-loop barrier. This extra overhead increases the parallel program's running time slightly.

Third, while the parallel program did determine correctly that every number on the command line was prime, the parallel program reported the prime numbers in a different order from the sequential program. This happened because each number was printed by a separate team thread, and there was nothing in the program to force the team threads to print the numbers in any particular order. (In this example program, there's no need to synchronize the threads so as to make them do their printouts in a certain order.)
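For readers who want to see the work sharing idea without the Parallel Java 2 library, the following sketch imitates a parallel for loop using only java.lang.Thread. It is an analogy under stated assumptions, not how parallelFor() is implemented: the class WorkSharingLoop, its parallelFor method, the testAll helper, and the fixed contiguous partitioning of indexes are all hypothetical choices made for illustration.

```java
import java.util.function.IntConsumer;

public class WorkSharingLoop {

    // Calls body.accept(i) for every i in lb..ub inclusive, sharing the
    // indexes among one thread per available core. Returns only after all
    // threads have finished; the join() calls play the role of the
    // end-of-loop barrier.
    public static void parallelFor(int lb, int ub, IntConsumer body) {
        int teamSize = Runtime.getRuntime().availableProcessors();
        int count = Math.max(0, ub - lb + 1);
        Thread[] team = new Thread[teamSize];
        for (int rank = 0; rank < teamSize; rank++) {
            // Contiguous chunk of indexes for this rank (possibly empty).
            final int first = lb + (int) ((long) count * rank / teamSize);
            final int last = lb + (int) ((long) count * (rank + 1) / teamSize) - 1;
            team[rank] = new Thread(() -> {
                for (int i = first; i <= last; i++) body.accept(i);
            });
            team[rank].start();
        }
        for (Thread t : team) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted at barrier", e);
            }
        }
    }

    // Same trial division as the listings, with small inputs handled.
    static boolean isPrime(long x) {
        if (x < 2) return false;
        if (x % 2 == 0) return x == 2;
        for (long p = 3; p <= x / p; p += 2) {
            if (x % p == 0) return false;
        }
        return true;
    }

    // Tests every number in parallel; each index is written by one thread only.
    public static boolean[] testAll(long[] numbers) {
        boolean[] prime = new boolean[numbers.length];
        parallelFor(0, numbers.length - 1, i -> prime[i] = isPrime(numbers[i]));
        return prime;
    }

    public static void main(String[] args) {
        long[] numbers = {97, 91, 2147483647L, 100};
        boolean[] prime = testAll(numbers);
        for (int i = 0; i < numbers.length; i++) {
            if (prime[i]) System.out.println(numbers[i]);
        }
    }
}
```

This sketch stores each result in an array slot and prints after the loop, so its output order is fixed. The PrimeSmp program prints from inside the loop body, which is why its output order can differ from run to run.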

Under the Hood

Figure 6.1 shows in more detail what happens when the PrimeSmp program executes the parallel for loop code,

    parallelFor(0, args.length - 1).exec(new Loop() {
        public void run(int i) {
            if (isPrime(Long.parseLong(args[i])))
                System.out.printf("%s%n", args[i]);
        }
    });

on a parallel computer with 12 cores. Keep in mind that all this happens automatically. I only had to write the code above. Still, it's helpful to understand what's going on under the hood.

Figure 6.1. Parallel for loop execution flow

In the figure, the various objects are arranged from top to bottom, and time flows from left to right. Methods called on each object are depicted as gray boxes. The threads calling the methods are shown as thick lines. A solid line means the thread is executing; a dashed line means the thread is blocked waiting for something to happen.

Execution begins with the main program thread calling the main() method on the Task object, an instance of the PrimeSmp subclass. The main thread calls the task's parallelFor() method with a lower bound of 0 and an upper bound of 11, which are the index bounds of the args array with 12 command line arguments. The parallelFor() method creates a parallel for loop object, an instance of class IntParallelForLoop. Hidden inside the parallel for loop is a team of threads, one thread for each core. Each team thread has a different rank in the range 0 through 11. Initially, the team threads are blocked until the program is ready to execute the parallel for loop.

The main thread creates a loop object, which is an instance of an anonymous inner subclass of class Loop. The main thread calls the parallel for loop's exec() method, passing in the loop object. The exec() method creates 11 additional copies of the loop object by calling the loop object's clone() method. The main thread unblocks the team threads, then blocks itself inside the exec() method until the team threads finish.

At this point the parallel for loop begins executing. Each team thread calls the run() method on a different one of the loop objects. Because each team thread is executing on a different core, the run() method calls proceed in parallel. Each team thread passes a different index as the run() method's argument. However, across all the team threads, every loop index from 0 to 11 gets passed to some loop object's run() method.

Each team thread, executing the code in its own run() method, tests one of the numbers on the command line for primality and prints the number if it's prime. The Parallel Java 2 middleware automatically collects the characters each thread prints on System.out (or System.err) in separate per-thread internal buffers. When the program terminates, the buffers' contents are written to the program's standard output stream (or standard error stream), one buffer at a time. Thus, characters printed by different threads are not commingled. To emit a printout before the end of the program, after printing the characters, call System.out.flush() (or System.err.flush()).

After returning from the loop object's run() method, each team thread waits at a barrier. When all the team threads have arrived at the barrier, the team threads block themselves, and the main thread is unblocked. The main thread resumes executing and returns from the parallel for loop's exec() method. The main thread continues executing the code in the task's main() method after the parallel for loop.
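The per-thread output buffering described above can be imitated in plain Java. This is a sketch of the idea only, not the middleware's actual mechanism: the class BufferedTeamOutput and its run method are hypothetical, and each thread appends to its own StringBuilder so that no two threads ever write to the same buffer.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedTeamOutput {

    // Each of teamSize threads "prints" linesPerThread lines into its own
    // private buffer. After all threads finish, the buffers are written
    // out one buffer at a time, so lines from different threads are
    // never interleaved.
    public static String run(int teamSize, int linesPerThread) {
        List<StringBuilder> buffers = new ArrayList<>();
        Thread[] team = new Thread[teamSize];
        for (int rank = 0; rank < teamSize; rank++) {
            final StringBuilder buf = new StringBuilder();
            final int myRank = rank;
            buffers.add(buf);
            team[rank] = new Thread(() -> {
                for (int k = 0; k < linesPerThread; k++) {
                    buf.append("thread ").append(myRank)
                       .append(" line ").append(k).append('\n');
                }
            });
            team[rank].start();
        }
        for (Thread t : team) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
        // Emit the buffers one at a time, in rank order.
        StringBuilder out = new StringBuilder();
        for (StringBuilder buf : buffers) out.append(buf);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(3, 2));
    }
}
```

The trade-off is latency: nothing appears until the buffers are written, which is why the text recommends calling flush() when a printout must appear before the program ends.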

Thus, the parallel program follows this pattern of execution:

    Sequential section (single main program thread)
    Parallel section (multiple team threads)
    Sequential section (single main program thread)

This pattern, of one or more parallel sections interspersed within a sequential section, is found in almost every multicore parallel program. Only a portion of the program is executed in parallel; the rest of the program is executed sequentially. We will return to this observation when we study parallel program performance.

What if we run the PrimeSmp program with 12 command line arguments on a parallel computer with more than 12 cores? The parallel team will have more threads than needed to handle all the loop indexes. In that case, team threads rank 0 through 11 call the loop objects' run() methods as described above, and team threads rank 12 and higher merely proceed directly to the barrier without calling the run() method.

What if we do a parallel for loop with more loop indexes than there are cores? In that case, each team thread will call its loop object's run() method more than once. We'll study how that works in the next chapter.

Points to Remember

- Write a Parallel Java 2 program as a subclass of class Task.
- Put the program code in the task's main() method.
- The task's main() method must be an instance method, not a static method.
- You can parallelize a loop if there are no sequential dependencies among the loop iterations.
- Use the parallelFor() pattern to parallelize a for loop.
- The parallel for loop index goes from the lower bound to the upper bound inclusive.
- The parallel for loop contains a hidden team of threads. The number of threads in the team is the number of cores on the node (not the number of loop iterations).
- Put the loop body code in the run() method of the inner Loop subclass.
- To emit a printout before the end of the program, call System.out.flush() or System.err.flush().
- To terminate the program other than by returning from the task's main() method, call terminate(). Don't call System.exit().
- In a non-parallel program, override the static coresRequired() method to return the number of cores the program will use, namely 1.
- Use the java pj2 command to run your Parallel Java 2 program.
- Use the debug=makespan option to measure the program's running time.
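
The execution flow described in "Under the Hood" (one copy of the loop object per team thread, team threads that start out blocked, surplus threads that go straight to the barrier, and a main thread that waits until every thread arrives) can be sketched with standard java.util.concurrent classes. Everything here is hypothetical scaffolding for illustration: TeamSketch, LoopBody, exec, and demo are invented names, not the Parallel Java 2 implementation. The sketch handles only the case with at least as many threads as loop indexes, and a CountDownLatch stands in for the barrier, so the team threads simply finish instead of blocking for reuse.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class TeamSketch {

    // Hypothetical stand-in for a loop body object that can be copied
    // so that each team thread gets its own instance.
    public abstract static class LoopBody implements Cloneable {
        public abstract void run(int i);

        @Override
        public LoopBody clone() {
            try {
                return (LoopBody) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we are Cloneable
            }
        }
    }

    // Executes indexes lb..ub inclusive, one index per team thread.
    public static void exec(int teamSize, int lb, int ub, LoopBody body) {
        if (teamSize < ub - lb + 1) {
            throw new IllegalArgumentException("Sketch needs teamSize >= number of indexes");
        }
        CountDownLatch startGate = new CountDownLatch(1);       // team threads block here
        CountDownLatch barrier = new CountDownLatch(teamSize);  // main thread waits here
        for (int rank = 0; rank < teamSize; rank++) {
            final int index = lb + rank;
            // Rank 0 uses the original loop object; the others get clones.
            final LoopBody copy = (rank == 0) ? body : body.clone();
            new Thread(() -> {
                try {
                    startGate.await();
                    if (index <= ub) copy.run(index); // surplus ranks skip the body
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    barrier.countDown(); // arrive at the barrier
                }
            }).start();
        }
        startGate.countDown(); // unblock the team
        try {
            barrier.await();   // main thread blocks until all threads arrive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted at barrier", e);
        }
    }

    // Counts how many times each index 0..n-1 is passed to run().
    public static int[] demo(int teamSize, int n) {
        AtomicIntegerArray calls = new AtomicIntegerArray(n);
        exec(teamSize, 0, n - 1, new LoopBody() {
            @Override
            public void run(int i) {
                calls.incrementAndGet(i);
            }
        });
        int[] result = new int[n];
        for (int i = 0; i < n; i++) result[i] = calls.get(i);
        return result;
    }

    public static void main(String[] args) {
        int[] counts = demo(16, 12); // 16 threads, 12 indexes: ranks 12..15 do no work
        for (int i = 0; i < counts.length; i++) {
            System.out.println("index " + i + " ran " + counts[i] + " time(s)");
        }
    }
}
```

Giving each thread its own copy of the loop object means any fields declared in the loop body are private to that thread, while objects the copies merely refer to, such as the shared counter array here, remain shared.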

12 56 BIG CPU, BIG DATA

Chapter 21 Cluster Parallel Loops

Chapter 21 Cluster Parallel Loops Chapter 21 Cluster Parallel Loops Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple

More information

Chapter 16 Heuristic Search

Chapter 16 Heuristic Search Chapter 16 Heuristic Search Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

Chapter 11 Overlapping

Chapter 11 Overlapping Chapter 11 Overlapping Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

Chapter 31 Multi-GPU Programming

Chapter 31 Multi-GPU Programming Chapter 31 Multi-GPU Programming Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Part IV. GPU Acceleration Chapter 29. GPU Massively Parallel Chapter 30. GPU

More information

Chapter 19 Hybrid Parallel

Chapter 19 Hybrid Parallel Chapter 19 Hybrid Parallel Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple Space

More information

Chapter 26 Cluster Heuristic Search

Chapter 26 Cluster Heuristic Search Chapter 26 Cluster Heuristic Search Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple

More information

Chapter 27 Cluster Work Queues

Chapter 27 Cluster Work Queues Chapter 27 Cluster Work Queues Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple Space

More information

Chapter 9 Reduction Variables

Chapter 9 Reduction Variables Chapter 9 Reduction Variables Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

Chapter 24 File Output on a Cluster

Chapter 24 File Output on a Cluster Chapter 24 File Output on a Cluster Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple

More information

Chapter 13 Strong Scaling

Chapter 13 Strong Scaling Chapter 13 Strong Scaling Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction Variables

More information

Chapter 17 Parallel Work Queues

Chapter 17 Parallel Work Queues Chapter 17 Parallel Work Queues Part I. Preliminaries Part II. Tightly Coupled Multicore Chapter 6. Parallel Loops Chapter 7. Parallel Loop Schedules Chapter 8. Parallel Reduction Chapter 9. Reduction

More information

A First Parallel Program

A First Parallel Program A First Parallel Program Chapter 4 Primality Testing A simple computation that will take a long time. Whether a number x is prime: Decide whether a number x is prime using the trial division algorithm.

More information

Chapter 20 Tuple Space

Chapter 20 Tuple Space Chapter 20 Tuple Space Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple Space Chapter

More information

Chapter 36 Cluster Map-Reduce

Chapter 36 Cluster Map-Reduce Chapter 36 Cluster Map-Reduce Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Part IV. GPU Acceleration Part V. Big Data Chapter 35. Basic Map-Reduce Chapter

More information

Chapter 25 Interacting Tasks

Chapter 25 Interacting Tasks Chapter 25 Interacting Tasks Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Chapter 18. Massively Parallel Chapter 19. Hybrid Parallel Chapter 20. Tuple Space

More information

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Second Edition. Alan Kaminsky

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Second Edition. Alan Kaminsky Solving the World s Toughest Computational Problems with Parallel Computing Second Edition Alan Kaminsky Department of Computer Science B. Thomas Golisano College of Computing and Information Sciences

More information

AP COMPUTER SCIENCE JAVA CONCEPTS IV: RESERVED WORDS

AP COMPUTER SCIENCE JAVA CONCEPTS IV: RESERVED WORDS AP COMPUTER SCIENCE JAVA CONCEPTS IV: RESERVED WORDS PAUL L. BAILEY Abstract. This documents amalgamates various descriptions found on the internet, mostly from Oracle or Wikipedia. Very little of this

More information

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing Second Edition. Alan Kaminsky

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing Second Edition. Alan Kaminsky Solving the World s Toughest Computational Problems with Parallel Computing Second Edition Alan Kaminsky Solving the World s Toughest Computational Problems with Parallel Computing Second Edition Alan

More information

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky

BIG CPU, BIG DATA. Solving the World s Toughest Computational Problems with Parallel Computing. Alan Kaminsky BIG CPU, BIG DATA Solving the World s Toughest Computational Problems with Parallel Computing Alan Kaminsky Department of Computer Science B. Thomas Golisano College of Computing and Information Sciences

More information

Appendix A Clash of the Titans: C vs. Java

Appendix A Clash of the Titans: C vs. Java Appendix A Clash of the Titans: C vs. Java Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Part IV. GPU Acceleration Part V. Map-Reduce Appendices Appendix A.

More information

Clojure Concurrency Constructs. CSCI 5828: Foundations of Software Engineering Lecture 12 10/02/2014

Clojure Concurrency Constructs. CSCI 5828: Foundations of Software Engineering Lecture 12 10/02/2014 Clojure Concurrency Constructs CSCI 5828: Foundations of Software Engineering Lecture 12 10/02/2014 1 Goals Cover the material presented in Chapters 3 & 4 of our concurrency textbook! Books examples from

More information

Nested Loops ***** ***** ***** ***** ***** We know we can print out one line of this square as follows: System.out.

Nested Loops ***** ***** ***** ***** ***** We know we can print out one line of this square as follows: System.out. Nested Loops To investigate nested loops, we'll look at printing out some different star patterns. Let s consider that we want to print out a square as follows: We know we can print out one line of this

More information

Chapter 3 Parallel Software

Chapter 3 Parallel Software Chapter 3 Parallel Software Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

Thread Safety. Review. Today o Confinement o Threadsafe datatypes Required reading. Concurrency Wrapper Collections

Thread Safety. Review. Today o Confinement o Threadsafe datatypes Required reading. Concurrency Wrapper Collections Thread Safety Today o Confinement o Threadsafe datatypes Required reading Concurrency Wrapper Collections Optional reading The material in this lecture and the next lecture is inspired by an excellent

More information

Chapter 38 Map-Reduce Meets GIS

Chapter 38 Map-Reduce Meets GIS Chapter 38 Map-Reduce Meets GIS Part I. Preliminaries Part II. Tightly Coupled Multicore Part III. Loosely Coupled Cluster Part IV. GPU Acceleration Part V. Big Data Chapter 35. Basic Map-Reduce Chapter

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

Exception Handling. Sometimes when the computer tries to execute a statement something goes wrong:

Exception Handling. Sometimes when the computer tries to execute a statement something goes wrong: Exception Handling Run-time errors The exception concept Throwing exceptions Handling exceptions Declaring exceptions Creating your own exception Ariel Shamir 1 Run-time Errors Sometimes when the computer

More information

Example: Monte Carlo Simulation 1

Example: Monte Carlo Simulation 1 Example: Monte Carlo Simulation 1 Write a program which conducts a Monte Carlo simulation to estimate π. 1 See https://en.wikipedia.org/wiki/monte_carlo_method. Zheng-Liang Lu Java Programming 133 / 149

More information

Exception Handling. Run-time Errors. Methods Failure. Sometimes when the computer tries to execute a statement something goes wrong:

Exception Handling. Run-time Errors. Methods Failure. Sometimes when the computer tries to execute a statement something goes wrong: Exception Handling Run-time errors The exception concept Throwing exceptions Handling exceptions Declaring exceptions Creating your own exception 22 November 2007 Ariel Shamir 1 Run-time Errors Sometimes

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms http://sudalab.is.s.u-tokyo.ac.jp/~reiji/pna16/ [ 8 ] OpenMP Parallel Numerical Algorithms / IST / UTokyo 1 PNA16 Lecture Plan General Topics 1. Architecture and Performance

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

OOADP/OOSE Re- exam. 23 August Mapping marks onto grades. Answers

OOADP/OOSE Re- exam. 23 August Mapping marks onto grades. Answers OOADP/OOSE Re- exam 23 August 213 Mapping marks onto grades Answers 1. [4 marks] The amount of communication required between team members increases (in the worst case) with the square of the number of

More information

Array Basics: Outline. Creating and Accessing Arrays. Creating and Accessing Arrays. Arrays (Savitch, Chapter 7)

Array Basics: Outline. Creating and Accessing Arrays. Creating and Accessing Arrays. Arrays (Savitch, Chapter 7) Array Basics: Outline Arrays (Savitch, Chapter 7) TOPICS Array Basics Arrays in Classes and Methods Programming with Arrays Searching and Sorting Arrays Multi-Dimensional Arrays Static Variables and Constants

More information

CS61B, Spring 2003 Discussion #17 Amir Kamil UC Berkeley 5/12/03

CS61B, Spring 2003 Discussion #17 Amir Kamil UC Berkeley 5/12/03 CS61B, Spring 2003 Discussion #17 Amir Kamil UC Berkeley 5/12/03 Topics: Threading, Synchronization 1 Threading Suppose we want to create an automated program that hacks into a server. Many encryption

More information

Exception Handling Introduction. Error-Prevention Tip 13.1 OBJECTIVES

Exception Handling Introduction. Error-Prevention Tip 13.1 OBJECTIVES 1 2 13 Exception Handling It is common sense to take a method and try it. If it fails, admit it frankly and try another. But above all, try something. Franklin Delano Roosevelt O throw away the worser

More information
