EE/CSCI 451 Midterm 1


EE/CSCI 451 Midterm 1
Spring 2018
Instructor: Xuehai Qian
Friday, 02/26/2018

Problem #  Topic                            Points  Score
1          Definitions                      20
2          Memory System Performance        10
3          Cache Performance                10
4          Shared Memory Programming Model  15
5          Interconnection Networks         10
6          Interconnection Networks         10
7          Analytical Modeling              12
8          Program and Data Mapping         13
Total                                       100

Student Name:
Student USC-ID:

Problem 1 (10 × 2 = 20 Points) Define/Explain the following terms

a. Work-optimal parallel algorithm
The cost of solving a problem on a single processing element is the execution time of the fastest known sequential algorithm. A parallel algorithm is work-optimal if the cost of solving a problem on a parallel computer has the same asymptotic growth, as a function of the input size, as the fastest known sequential algorithm on a single processing element. A work-optimal parallel algorithm has an efficiency of Θ(1).

b. Store-and-forward routing
In store-and-forward routing, when a message traverses a path with multiple links, each intermediate node on the path forwards the message to the next node only after it has received and stored the entire message.

c. Spatial locality
Spatial locality implies that if a location i is referenced at time t, then locations near i are likely to be referenced in a small window of time following t.

d. Data dependency
A data dependency is a situation in which a program statement (i.e., instruction) refers to the data output by a previous statement.

e. Instruction-level parallelism
Instruction-level parallelism (ILP) is a measure of the number of operations in a computer program that can be performed simultaneously. ILP is exploited by executing multiple operations from a program in a single cycle.

f. Bisection width of a network
The bisection width of a network is defined as the minimum number of communication links that must be removed to partition the network into two equal halves.

g. Non-blocking network
In a non-blocking network, any connection request from an input to an output can be routed without rearranging the existing set of connections.

h. Asynchronous execution
Asynchronous execution has no global clock to coordinate execution among the processors. The order in which instructions execute depends on the input data, the scheduling algorithm, the speed of the processors, and the speed of the communication network.

i. Shuffle-exchange network
A shuffle-exchange network performs shuffle and exchange operations to route from a source x = x_{n-1} ... x_0 to a destination y = y_{n-1} ... y_0. The shuffle operation circularly shifts the bits of x left by one position:
x -> x_{n-2} ... x_0 x_{n-1}
The exchange operation complements the least significant bit of x:
x -> x_{n-1} ... x_1 (complement of x_0)

j. Cache pollution
Cache pollution describes the scenario in which a program loads unnecessary data into the cache, causing the eviction of useful data to lower levels of the memory hierarchy (e.g., main memory).

Problem 2 (10 Points) Memory System Performance

Consider a memory system with 100-cycle-latency DRAM connected to a processor that operates at 1 GHz. The processor-memory bus can support one word per cycle (streaming bandwidth = 1 word/cycle). Assume the cache has been disabled. The processor has two floating-point multiply-add units; each multiply-add unit is capable of executing one multiplication and one addition per processor cycle. Thus, the processor can execute four floating-point operations (two multiplications and two additions) in each processor cycle.

a. What is the peak floating-point performance of the processor? State any assumption(s) you may make. (2 points)

Peak performance = 1 GHz × 4 FLOPs/cycle = 4 GFLOPS

Consider the following program:

result = 0; // The result is stored in a local register
for (i = 0; i < ...; i++)
    result = result * C[i] + A[i] * B[i];

b. Assume each element of A, B, and C is one word stored in DRAM and the memory system supports streaming. What is the sustained performance in the best case (in FLOPS)? State any assumption(s) you may make. (4 points)

In the best case, the three operands (A[i], B[i], C[i]) are streamed from memory every three cycles and used to perform 3 FLOPs (2 multiplications and 1 addition). The computation can be completely overlapped with the memory accesses.
Sustained performance (best case) = 3 FLOPs over 3 processor cycles = 1 GFLOPS

c. Assume the memory system does not support streaming. What is the sustained performance in the worst case (in FLOPS)? State any assumption(s) you may make. (4 points)

Without streaming, each access pays the full 100-cycle DRAM latency, so the three operands (A[i], B[i], C[i]) are read from memory every 300 cycles and used to perform 3 FLOPs. The computation can be completely overlapped with the memory accesses.
Sustained performance (worst case) = 3 FLOPs over 300 cycles = 0.01 GFLOPS
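As a sanity check, the rates above can be recomputed with a short Python sketch. The constants come from the problem statement; the variable names are my own:

```python
CLOCK_HZ = 1e9        # 1 GHz processor clock
DRAM_LATENCY = 100    # cycles per non-streamed memory access
FLOPS_PER_CYCLE = 4   # two multiply-add units, each doing 1 mul + 1 add

# a. Peak performance: every cycle performs 4 FLOPs.
peak = FLOPS_PER_CYCLE * CLOCK_HZ             # 4 GFLOPS

# b. Best case with streaming: 3 operands arrive in 3 cycles, feeding 3 FLOPs.
best = (3 / 3) * CLOCK_HZ                     # 1 GFLOPS

# c. Worst case without streaming: 3 loads x 100 cycles = 300 cycles per 3 FLOPs.
worst = (3 / (3 * DRAM_LATENCY)) * CLOCK_HZ   # 0.01 GFLOPS

print(peak / 1e9, best / 1e9, worst / 1e9)
```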

Problem 3 (10 Points) Cache Performance

Suppose we want to double the value of each element of a matrix A on a uni-processor. The size of each element is 1 word. The matrix is stored in row-major order. The processor has a direct-mapped cache with 8 cache lines; the size of each cache line is 4 words.

a. Compute the cache hit ratio for read operations when executing the following code (2 points). Explain (2 points).

for (i = 0; i < 128; i++)
    for (j = 0; j < 128; j++)
        A[i][j] = 2 * A[i][j];

The cache hit ratio for read operations is 75%. When a cache miss occurs on A[i][j], the elements A[i][j+1], A[i][j+2], A[i][j+3] are brought into the cache in the same cache line, so the next three accesses hit. In general, every 4 elements accessed result in 1 cache miss and 3 cache hits.

b. Compute the cache hit ratio for read operations when executing the following code (2 points). Explain (2 points).

for (j = 0; j < 128; j++)
    for (i = 0; i < 128; i++)
        A[i][j] = 2 * A[i][j];

The cache hit ratio for read operations is 0%. A miss on A[i][j] still brings A[i][j+1], A[i][j+2], A[i][j+3] into the same cache line, but the program accesses A[i+1][j] next because it traverses the matrix in column-major order, so a miss occurs on every access: A[i+1][j], A[i+2][j], A[i+3][j], and so on. (In fact, since each 128-word row spans 32 cache-line-sized blocks and 32 is a multiple of 8, every element of a given column maps to the same cache line; each access evicts the previous block, so none of the fetched neighbors survive until the next column.)

c. Repeat parts a and b if the size of each cache line is 8 words. (2 points)

Part a: 7/8 = 87.5%; Part b: 0%
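The three hit ratios can be verified with a small direct-mapped cache simulator. This is a Python sketch written for this problem; `hit_ratio` and its parameters are not part of the exam:

```python
def hit_ratio(addresses, num_lines=8, line_size=4):
    """Simulate a direct-mapped cache and return the read hit ratio."""
    cache = [None] * num_lines        # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_size     # which memory block this word belongs to
        line = block % num_lines      # direct-mapped placement
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block       # fill the line on a miss
    return hits / len(addresses)

N = 128  # word address of A[i][j] in row-major order is i*N + j
row_major = [i * N + j for i in range(N) for j in range(N)]
col_major = [i * N + j for j in range(N) for i in range(N)]

print(hit_ratio(row_major))                # 0.75
print(hit_ratio(col_major))                # 0.0
print(hit_ratio(row_major, line_size=8))   # 0.875 (part c)
```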

Problem 4 (15 Points) Shared Memory Programming Model

Given an undirected graph G(V, E), V = {0, ..., n-1}, a connected component is defined as a sub-graph such that any two vertices of the component are connected by a path in G. The root of a connected component is the smallest vertex in the component, and the label of a connected component is its root vertex. The algorithm to find the connected component (the label of the root vertex) to which each vertex belongs is illustrated in Figure 1. For each vertex i (0 ≤ i < n), we use c(i) to keep track of the label of the connected component that the vertex belongs to. The algorithm is iterative; in each iteration, all the edges are traversed to update c(0), ..., c(n-1). The algorithm terminates when no c(i) is updated in an iteration. At that point, c(i) is the label of the connected component that vertex i belongs to.

Figure 1: Finding connected components (example input graph and its labels omitted)

Suppose we want to parallelize the algorithm using p (p > 1) threads, with each thread executing the computation for |E|/p edges (|E| = total # of edges in G) in each iteration. We define an iteration for a thread as the work done to traverse its own edges once. For this problem, we assume that all threads take a similar amount of time to traverse their edges in each iteration.

a. What are the shared variables for each thread? (3 points)

The shared variables are the flag At_least_one_vertex_has_update and the labels c(0), ..., c(n-1); a thread processing edge (i, j) reads and writes c(i) and c(j).

b. Write the pseudo code of the function executed by Thread w (0 ≤ w < p). Note that your code must ensure that at the end of the k-th iteration, all vertices in a connected component at a distance less than or equal to k from the root have the correct label of that component. (5 points)

/* Pseudo code executed by thread with index w */
Let edge[] denote the array that stores the |E| edges
while (At_least_one_vertex_has_update == true)
    Lock(At_least_one_vertex_has_update);
    At_least_one_vertex_has_update = false;
    Unlock(At_least_one_vertex_has_update);
    for (g = w*|E|/p; g < (w+1)*|E|/p; g++)
        Lock(c(edge[g].i), c(edge[g].j));
        m = min(c(edge[g].i), c(edge[g].j));
        if c(edge[g].i) > m then
            c(edge[g].i) = m;
            At_least_one_vertex_has_update = true;
        end if
        if c(edge[g].j) > m then
            c(edge[g].j) = m;
            At_least_one_vertex_has_update = true;
        end if
        Unlock(c(edge[g].i), c(edge[g].j));
    end for
    barrier;
end while

c. If your code in part b does not use any locks, will the execution ever terminate? If yes, will it be able to produce the correct output? Explain. If no, explain why the execution will never terminate. (3 points)

Even if the locks are not used, the execution will still terminate and produce the correct output. This follows from the fact that if, in the correct program, c(i) has the value l after some iteration k, then in the program without locks c(i) will have the value l by iteration k+1: due to race conditions, vertex i might miss an update from some neighbor j, but that update will be propagated in the next iteration.

d. Given the input graph shown below, if your code in part b does not use any lock, for p = 3, what is the total number of iterations that the algorithm executes in the best case? What is the total number of iterations in the worst case? Explain. (4 points)

(Input graph figure omitted)

Best case: 2 iterations
Initial setup: c(0) = 0, c(1) = 1, c(2) = 2

Iteration 1: c(0) = 0, c(1) = 0, c(2) = 0
Iteration 2: c(0) = 0, c(1) = 0, c(2) = 0; no update, so the algorithm terminates

Worst case: 3 iterations
Initial setup: c(0) = 0, c(1) = 1, c(2) = 2
Iteration 1: c(0) = 0, c(1) = 0, c(2) = 1 (c(2) is incorrectly updated due to a race condition)
Iteration 2: c(0) = 0, c(1) = 0, c(2) = 0
Iteration 3: c(0) = 0, c(1) = 0, c(2) = 0; no update, so the algorithm terminates
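A single-threaded Python sketch of the same label-propagation loop (race-free, so it matches the "correct program" of part c) shows the terminating behavior. The triangle graph used here is only an illustrative input, not necessarily the exam's figure:

```python
def connected_components(n, edges):
    """Iteratively propagate the minimum label along every edge until stable."""
    c = list(range(n))           # c[i] starts as vertex i's own index
    iterations = 0
    changed = True
    while changed:               # one pass over all edges per iteration
        changed = False
        iterations += 1
        for i, j in edges:
            m = min(c[i], c[j])
            if c[i] > m:
                c[i], changed = m, True
            if c[j] > m:
                c[j], changed = m, True
    return c, iterations

# Triangle on vertices {0, 1, 2}: every label collapses to root 0 in one pass,
# plus one more pass to detect that nothing changed.
labels, iters = connected_components(3, [(0, 1), (1, 2), (0, 2)])
print(labels, iters)   # [0, 0, 0] 2
```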

Problem 5 (10 Points) Interconnection Networks

Definition 5.1: A p-input, p-output CLOS network can be defined as a 3-stage network in which Stage 0 and Stage 2 each consist of two (p/2) × (p/2) switches and Stage 1 consists of p/2 switches of size 2 × 2.

a. Draw such a network for p = 8. (2 points)

Figure 2: CLOS network for p = 8 (Stage 0 and Stage 2 each contain two 4 × 4 switches, Stage 1 contains four 2 × 2 switches; drawing omitted)

b. Apply Definition 5.1 recursively to decompose all the switches in Stage 0 and Stage 2 until the network consists only of 2 × 2 switches. Draw such a network for p = 8. (3 points)

Figure 3: Recursively decomposed CLOS network for p = 8 (drawing omitted)

c. In general, derive an expression for the total number of switches and the total delay from an input to an output for an n-input, n-output CLOS network obtained by recursively applying Definition 5.1 to the switches in Stage 0 and Stage 2, as in part (b). Use order notation. Note that the final network consists only of 2 × 2 switches. (Assume the delay of each 2 × 2 switch is 1 unit.) (5 points)

The recurrence relation for the total number of switches S(n) in a network of size n is

S(n) = 4 S(n/2) + n/2,    S(2) = 1
     = 4 [4 S(n/4) + n/4] + n/2 = 4^2 S(n/4) + n + n/2
     = 4^2 [4 S(n/8) + n/8] + n + n/2 = 4^3 S(n/8) + 2n + n + n/2
     = ...
     = 4^k S(2) + 2^(k-2) n + ... + n + n/2
     = 4^k S(2) + (n/2)(2^k - 1),    k = log2(n) - 1
     = n^2/4 + (n/2)(n/2 - 1)

so S(n) = n^2/2 - n/2 = Θ(n^2).

The recurrence relation for the total delay D(n) in a network of size n is

D(n) = 2 D(n/2) + 1,    D(2) = 1
     = 2 [2 D(n/4) + 1] + 1 = 2^2 D(n/4) + 2 + 1
     = ...
     = 2^k D(2) + 2^(k-1) + ... + 2 + 1 = 2^k + (2^k - 1) = 2^(k+1) - 1,    k = log2(n) - 1

so D(n) = n - 1 = Θ(n).
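The two closed forms can be cross-checked by evaluating the recurrences directly. This quick Python sketch is not part of the exam solution:

```python
def S(n):
    """Switch-count recurrence: S(n) = 4 S(n/2) + n/2, S(2) = 1."""
    return 1 if n == 2 else 4 * S(n // 2) + n // 2

def D(n):
    """Delay recurrence: D(n) = 2 D(n/2) + 1, D(2) = 1."""
    return 1 if n == 2 else 2 * D(n // 2) + 1

for n in (2, 4, 8, 16, 64, 1024):
    assert S(n) == n * n // 2 - n // 2   # closed form S(n) = n^2/2 - n/2
    assert D(n) == n - 1                 # closed form D(n) = n - 1
print(S(8), D(8))   # 28 7
```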

Problem 6 (10 Points) Interconnection Networks

In HW #2, we designed a 2^k-node Shuffle-Exchange (SE) network along 2 dimensions. In this problem, we generalize the design to support l dimensions (l ≤ k). We evenly divide the k address bits into l chunks; each chunk of k/l bits is used to perform SE routing in one of the l dimensions. We assume k is divisible by l.

a. Draw the network for k = 4 and l = 2. (2 points)

(Figure omitted)

b. Show all the intermediate nodes while routing from source s = 1110 to destination d = 1001 in the network. (2 points)

Intermediate nodes:

c. Assume k and l are even numbers. Suppose we route from s = 00...0 (all 0s) to d = 1010...10 (the pattern 10 repeated); what is the total path length (each shuffle or exchange operation takes 1 unit) in terms of k and l? (4 points)

Along each dimension, we need to perform k/l shuffle operations and k/(2l) exchange operations (one exchange for each 1-bit in the destination chunk). Thus, over the l dimensions the total path length is l(k/l + k/(2l)) = 3k/2.

d. Suppose k = l; comment on the resulting network. (2 points)

When k = l, each chunk is a single bit and the network becomes a k-dimensional hypercube network.
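The count in part c can be simulated per dimension with the shuffle and exchange operations defined in Problem 1. This is a Python sketch; `se_ops` is my own formulation of standard SE routing (shuffle every step, exchange when the destination bit differs), not code from the exam:

```python
def shuffle(x, m):
    """Circular left shift of the m-bit address x."""
    return ((x << 1) | (x >> (m - 1))) & ((1 << m) - 1)

def exchange(x):
    """Complement the least significant bit."""
    return x ^ 1

def se_ops(src, dst, m):
    """Route src -> dst on an m-bit SE network, counting unit operations."""
    x, ops = src, 0
    for t in range(m):
        x = shuffle(x, m)                 # always shuffle: m shuffles total
        ops += 1
        if (x & 1) != ((dst >> (m - 1 - t)) & 1):
            x = exchange(x)               # fix one destination bit
            ops += 1
    assert x == dst
    return ops

# k = 4, l = 2: each 2-bit chunk independently routes 00 -> 10.
k, l = 4, 2
total = l * se_ops(0b00, 0b10, k // l)
print(total)   # 6, which matches 3k/2
```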

Problem 7 (12 Points) Analytical Modeling

In this problem, we analytically model the algorithm to compute π that you implemented in PHW 1. Consider the following algorithm:

#define N 1000

CalculatePI
1    num_of_points_in_circle = 0;
2    num_of_points_not_in_circle = 0;
     for (i = 0; i < N; i++) {
3.1      Generate a random point in the unit square
3.2      Check whether the point is inside the unit circle or not
         if yes
3.3          num_of_points_in_circle++;
         else
3.4          num_of_points_not_in_circle++;
     }
4    PI = 4*num_of_points_in_circle/N;

Assume each statement with a line number on its left takes 1 cycle to execute. Note that only the for loop is parallelizable. Assume that the code runs on a perfect parallel architecture with no overheads such as communication, coordination, or thread-creation overheads. Also ignore any concurrency issues; i.e., you may assume that even if multiple threads write to the same location, any race conditions are taken care of by the architecture without additional overhead and the intended result is produced.

a. Calculate S and P. (3 points)
S: time taken by the portion of the code which cannot be parallelized.
P: time taken, in a serial program, by the portion of the code which can be parallelized.

S = 3 (statements 1, 2, and 4)
P = 3N = 3000 (each iteration executes 3 numbered statements: 3.1, 3.2, and one of 3.3/3.4; the iterations run 1000 times in a serial program)

b. Now, if we use p ≤ N threads to parallelize the for loop, derive an expression for the overall speedup achieved in terms of p. What is the maximum speedup that can be achieved? What is the value of p that achieves the maximum speedup? (3 points)

Speedup = (S + P) / (S + P/p) = 3003 / (3 + 3000/p)
Maximum speedup = 3003/6 = 500.5, achieved when p = 1000.

c. Derive an expression for Efficiency in terms of p. (3 points)

Efficiency = Speedup / p = 3003 / (3p + 3000)
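Parts a through c reduce to a couple of lines of arithmetic, sketched here in Python (the function and variable names are mine):

```python
S, P = 3, 3000           # serial cycles and parallelizable cycles (N = 1000)

def speedup(p):
    """Speedup with p threads sharing the parallelizable portion."""
    return (S + P) / (S + P / p)

def efficiency(p):
    return speedup(p) / p

print(speedup(1000))     # 500.5: the maximum, reached at p = N = 1000
print(efficiency(1000))  # 0.5005
```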

d. Now consider the following algorithm:

CalculatePI(N)
1    num_of_points_in_circle = 0;
2    num_of_points_not_in_circle = 0;
     for (i = 0; i < N; i++) {
3.1      Generate a random point in the unit square
3.2      Check whether the point is inside the unit circle or not
         if yes
3.3          num_of_points_in_circle++;
         else
3.4          num_of_points_not_in_circle++;
     }
4    PI = 4*num_of_points_in_circle/N;

Note that in this case, N is a parameter to the function and is not fixed as in parts a, b, and c. In this part, we use p = N threads to parallelize the for loop. Derive an expression for the scaled speedup achieved in terms of p. (3 points)

Scaled speedup = (3 + 3p) / (3 + 3) = (3 + 3p)/6 ≈ 0.5p
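The scaled-speedup expression can likewise be checked numerically (a Python sketch; names are mine):

```python
def scaled_speedup(p):
    """Scaled speedup when the problem grows with the thread count (N = p)."""
    serial_time = 3 + 3 * p    # 3 serial statements + 3 cycles per iteration
    parallel_time = 3 + 3      # serial part + one iteration per thread
    return serial_time / parallel_time

print(scaled_speedup(1000))   # 500.5, i.e. roughly 0.5 * p for large p
```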

Problem 8 (13 Points) Program and Data Mapping

Assume a k-node fully connected network G is embedded in a k-node ring G' as follows: (a) node i in the fully connected network is mapped to node i in the ring (0 ≤ i < k); (b) let e_ij be the edge from node i to node j in G; then the edge mapping function maps e_ij to the path in G' starting from i and moving in a clockwise manner to reach j. E.g., e_03 is mapped to {(0, 1), (1, 2), (2, 3)}. Note that e_ij and e_ji may be mapped to different paths in G'. However, assume that the k-node ring is undirected.

Figure 4: Embedding a fully connected network into a ring for k = 6 (drawing omitted)

a. Figure 4 shows the mapping for k = 6. Derive the values of dilation and congestion for k = 6. (3 points)

Mapping function f: i → i
Dilation = 5: e.g., edge (1, 0) maps to the path {(1,2), (2,3), (3,4), (4,5), (5,0)} of length 5.
Congestion = 15: consider any ring edge, e.g., (1, 2). There are 30 directed edges in G; out of each edge pair e_ij and e_ji, exactly one maps to a path crossing any given ring edge. Hence 15 paths cross each edge of G'.

b. Derive the exact expressions for dilation and congestion in terms of k (assume k is even). (5 points)

Dilation = k - 1: an edge between adjacent nodes, routed the "long way" clockwise, travels through the entire ring.
Congestion = (k-1) + (k-2) + ... + 1 = k(k-1)/2: out of each pair e_ij, e_ji, exactly one crosses any given ring edge, so the congestion equals the number of undirected edges in G.

c. Now assume that the k-node fully connected network is mapped to a k-node linear array (without wraparound). In this problem, an edge e_ij in G is mapped to the shortest path in G'. Derive the exact expressions for dilation and congestion in terms of k. (Assume k is even. Also assume e_ij and e_ji are distinct edges in G.) (5 points)

Dilation = k - 1 (the edge between nodes 0 and k-1).
Congestion = 2 · (k/2) · (k/2) = k^2/2, occurring at the middle edge connecting vertices k/2 - 1 and k/2: each of the (k/2)(k/2) node pairs with one endpoint on each side contributes both e_ij and e_ji to that edge.
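The dilation and congestion values of the ring embedding in parts a and b can be verified by brute force. This Python sketch (my own, not exam code) walks every clockwise path and counts the load on each undirected ring edge:

```python
def ring_embedding_stats(k):
    """Dilation and congestion of embedding the complete graph K_k into a
    k-node ring, mapping each directed edge e_ij to the clockwise path i -> j."""
    load = {}        # undirected ring edge -> number of mapped paths using it
    dilation = 0
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            node, length = i, 0
            while node != j:                    # walk clockwise from i to j
                nxt = (node + 1) % k
                e = frozenset((node, nxt))
                load[e] = load.get(e, 0) + 1
                node, length = nxt, length + 1
            dilation = max(dilation, length)
    return dilation, max(load.values())

print(ring_embedding_stats(6))   # (5, 15): matches part a
```

By symmetry every ring edge carries the same load, so the maximum equals k(k-1)/2, matching the closed form in part b.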


More information

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: Memory hierarchy, locality, caches Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh Organization Temporal and spatial locality Memory

More information

Lecture 3: Sorting 1

Lecture 3: Sorting 1 Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:

More information

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC

Lecture 9: Group Communication Operations. Shantanu Dutt ECE Dept. UIC Lecture 9: Group Communication Operations Shantanu Dutt ECE Dept. UIC Acknowledgement Adapted from Chapter 4 slides of the text, by A. Grama w/ a few changes, augmentations and corrections Topic Overview

More information

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1 Pipelining and Vector Processing Parallel Processing: The term parallel processing indicates that the system is able to perform several operations in a single time. Now we will elaborate the scenario,

More information

Lecture 12: Instruction Execution and Pipelining. William Gropp

Lecture 12: Instruction Execution and Pipelining. William Gropp Lecture 12: Instruction Execution and Pipelining William Gropp www.cs.illinois.edu/~wgropp Yet More To Consider in Understanding Performance We have implicitly assumed that an operation takes one clock

More information

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003 Topic Overview One-to-All Broadcast

More information

Exam Sample Questions CS3212: Algorithms and Data Structures

Exam Sample Questions CS3212: Algorithms and Data Structures Exam Sample Questions CS31: Algorithms and Data Structures NOTE: the actual exam will contain around 0-5 questions. 1. Consider a LinkedList that has a get(int i) method to return the i-th element in the

More information

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms

Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental

More information

a. Assuming a perfect balance of FMUL and FADD instructions and no pipeline stalls, what would be the FLOPS rate of the FPU?

a. Assuming a perfect balance of FMUL and FADD instructions and no pipeline stalls, what would be the FLOPS rate of the FPU? CPS 540 Fall 204 Shirley Moore, Instructor Test November 9, 204 Answers Please show all your work.. Draw a sketch of the extended von Neumann architecture for a 4-core multicore processor with three levels

More information

Graph Theory. Part of Texas Counties.

Graph Theory. Part of Texas Counties. Graph Theory Part of Texas Counties. We would like to visit each of the above counties, crossing each county only once, starting from Harris county. Is this possible? This problem can be modeled as a graph.

More information

Data Communication and Parallel Computing on Twisted Hypercubes

Data Communication and Parallel Computing on Twisted Hypercubes Data Communication and Parallel Computing on Twisted Hypercubes E. Abuelrub, Department of Computer Science, Zarqa Private University, Jordan Abstract- Massively parallel distributed-memory architectures

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

The complexity of Sorting and sorting in linear-time. Median and Order Statistics. Chapter 8 and Chapter 9

The complexity of Sorting and sorting in linear-time. Median and Order Statistics. Chapter 8 and Chapter 9 Subject 6 Spring 2017 The complexity of Sorting and sorting in linear-time Median and Order Statistics Chapter 8 and Chapter 9 Disclaimer: These abbreviated notes DO NOT substitute the textbook for this

More information

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: Cost analysis and performance Instructor: Markus Püschel TA: Gagandeep Singh, Daniele Spampinato & Alen Stojanov Technicalities Research project: Let us know (fastcode@lists.inf.ethz.ch)

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Memory Systems and Performance Engineering

Memory Systems and Performance Engineering SPEED LIMIT PER ORDER OF 6.172 Memory Systems and Performance Engineering Fall 2010 Basic Caching Idea A. Smaller memory faster to access B. Use smaller memory to cache contents of larger memory C. Provide

More information

DIVIDE & CONQUER. Problem of size n. Solution to sub problem 1

DIVIDE & CONQUER. Problem of size n. Solution to sub problem 1 DIVIDE & CONQUER Definition: Divide & conquer is a general algorithm design strategy with a general plan as follows: 1. DIVIDE: A problem s instance is divided into several smaller instances of the same

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline

More information

Data Structure and Algorithm, Spring 2013 Midterm Examination 120 points Time: 2:20pm-5:20pm (180 minutes), Tuesday, April 16, 2013

Data Structure and Algorithm, Spring 2013 Midterm Examination 120 points Time: 2:20pm-5:20pm (180 minutes), Tuesday, April 16, 2013 Data Structure and Algorithm, Spring 2013 Midterm Examination 120 points Time: 2:20pm-5:20pm (180 minutes), Tuesday, April 16, 2013 Problem 1. In each of the following question, please specify if the statement

More information

Parallel Architecture. Sathish Vadhiyar

Parallel Architecture. Sathish Vadhiyar Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate

More information

Parallel Numerics, WT 2013/ Introduction

Parallel Numerics, WT 2013/ Introduction Parallel Numerics, WT 2013/2014 1 Introduction page 1 of 122 Scope Revise standard numerical methods considering parallel computations! Required knowledge Numerics Parallel Programming Graphs Literature

More information

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Memory Hierarchy Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Time (ns) The CPU-Memory Gap The gap widens between DRAM, disk, and CPU speeds

More information

Lecture 2: Getting Started

Lecture 2: Getting Started Lecture 2: Getting Started Insertion Sort Our first algorithm is Insertion Sort Solves the sorting problem Input: A sequence of n numbers a 1, a 2,..., a n. Output: A permutation (reordering) a 1, a 2,...,

More information

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms

Analysis of Algorithms. Unit 4 - Analysis of well known Algorithms Analysis of Algorithms Unit 4 - Analysis of well known Algorithms 1 Analysis of well known Algorithms Brute Force Algorithms Greedy Algorithms Divide and Conquer Algorithms Decrease and Conquer Algorithms

More information

Brute Force: Selection Sort

Brute Force: Selection Sort Brute Force: Intro Brute force means straightforward approach Usually based directly on problem s specs Force refers to computational power Usually not as efficient as elegant solutions Advantages: Applicable

More information

cs/ee 143 Fall

cs/ee 143 Fall cs/ee 143 Fall 2018 5 2 Ethernet 2.1 W&P, P3.2 3 Points. Consider the Slotted ALOHA MAC protocol. There are N nodes sharing a medium, and time is divided into slots. Each packet takes up a single slot.

More information

CS 426 Parallel Computing. Parallel Computing Platforms

CS 426 Parallel Computing. Parallel Computing Platforms CS 426 Parallel Computing Parallel Computing Platforms Ozcan Ozturk http://www.cs.bilkent.edu.tr/~ozturk/cs426/ Slides are adapted from ``Introduction to Parallel Computing'' Topic Overview Implicit Parallelism:

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing George Karypis Sorting Outline Background Sorting Networks Quicksort Bucket-Sort & Sample-Sort Background Input Specification Each processor has n/p elements A ordering

More information

Advanced optimizations of cache performance ( 2.2)

Advanced optimizations of cache performance ( 2.2) Advanced optimizations of cache performance ( 2.2) 30 1. Small and Simple Caches to reduce hit time Critical timing path: address tag memory, then compare tags, then select set Lower associativity Direct-mapped

More information

Computer Science 385 Design and Analysis of Algorithms Siena College Spring Topic Notes: Brute-Force Algorithms

Computer Science 385 Design and Analysis of Algorithms Siena College Spring Topic Notes: Brute-Force Algorithms Computer Science 385 Design and Analysis of Algorithms Siena College Spring 2019 Topic Notes: Brute-Force Algorithms Our first category of algorithms are called brute-force algorithms. Levitin defines

More information

Writing Parallel Programs; Cost Model.

Writing Parallel Programs; Cost Model. CSE341T 08/30/2017 Lecture 2 Writing Parallel Programs; Cost Model. Due to physical and economical constraints, a typical machine we can buy now has 4 to 8 computing cores, and soon this number will be

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Parallel Exact Inference on the Cell Broadband Engine Processor

Parallel Exact Inference on the Cell Broadband Engine Processor Parallel Exact Inference on the Cell Broadband Engine Processor Yinglong Xia and Viktor K. Prasanna {yinglonx, prasanna}@usc.edu University of Southern California http://ceng.usc.edu/~prasanna/ SC 08 Overview

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

CS575 Parallel Processing

CS575 Parallel Processing CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.

More information

Review: Creating a Parallel Program. Programming for Performance

Review: Creating a Parallel Program. Programming for Performance Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 21 Outline 1 Course

More information

Parallel Graph Algorithms

Parallel Graph Algorithms Parallel Graph Algorithms Design and Analysis of Parallel Algorithms 5DV050 Spring 202 Part I Introduction Overview Graphsdenitions, properties, representation Minimal spanning tree Prim's algorithm Shortest

More information

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU. The PRAM (Parallel Random Access Memory) model All processors operate synchronously under the control of a common CPU. The PRAM (Parallel Random Access Memory) model All processors operate synchronously

More information

COMP Analysis of Algorithms & Data Structures

COMP Analysis of Algorithms & Data Structures COMP 3170 - Analysis of Algorithms & Data Structures Shahin Kamali Lecture 6 - Jan. 15, 2018 CLRS 7.1, 7-4, 9.1, 9.3 University of Manitoba COMP 3170 - Analysis of Algorithms & Data Structures 1 / 12 Quick-sort

More information

Parallel Processing IMP Questions

Parallel Processing IMP Questions Winter 14 Summer 14 Winter 13 Summer 13 180702 Parallel Processing IMP Questions Sr Chapter Questions Total 1 3 2 9 3 10 4 9 5 7 What is Data Decomposition? Explain Data Decomposition with proper example.

More information

CS 6143 COMPUTER ARCHITECTURE II SPRING 2014

CS 6143 COMPUTER ARCHITECTURE II SPRING 2014 CS 6143 COMPUTER ARCHITECTURE II SPRING 2014 DUE : April 9, 2014 HOMEWORK IV READ : - Related portions of Chapter 5 and Appendces F and I of the Hennessy book - Related portions of Chapter 1, 4 and 6 of

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity.

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity. 1. T F: Consider a directed graph G = (V, E) and a vertex s V. Suppose that for all v V, there exists a directed path in G from s to v. Suppose that a DFS is run on G, starting from s. Then, true or false:

More information

Practice Problems for the Final

Practice Problems for the Final ECE-250 Algorithms and Data Structures (Winter 2012) Practice Problems for the Final Disclaimer: Please do keep in mind that this problem set does not reflect the exact topics or the fractions of each

More information

Parallel Computing. Hwansoo Han (SKKU)

Parallel Computing. Hwansoo Han (SKKU) Parallel Computing Hwansoo Han (SKKU) Unicore Limitations Performance scaling stopped due to Power consumption Wire delay DRAM latency Limitation in ILP 10000 SPEC CINT2000 2 cores/chip Xeon 3.0GHz Core2duo

More information

Problem Score Maximum MC 34 (25/17) = 50 Total 100

Problem Score Maximum MC 34 (25/17) = 50 Total 100 Stony Brook University Midterm 2 CSE 373 Analysis of Algorithms November 22, 2016 Midterm Exam Name: ID #: Signature: Circle one: GRAD / UNDERGRAD INSTRUCTIONS: This is a closed book, closed mouth exam.

More information

Week 7: Assignment Solutions

Week 7: Assignment Solutions Week 7: Assignment Solutions 1. In 6-bit 2 s complement representation, when we subtract the decimal number +6 from +3, the result (in binary) will be: a. 111101 b. 000011 c. 100011 d. 111110 Correct answer

More information

7 Distributed Data Management II Caching

7 Distributed Data Management II Caching 7 Distributed Data Management II Caching In this section we will study the approach of using caching for the management of data in distributed systems. Caching always tries to keep data at the place where

More information

Dynamic Programming II

Dynamic Programming II Lecture 11 Dynamic Programming II 11.1 Overview In this lecture we continue our discussion of dynamic programming, focusing on using it for a variety of path-finding problems in graphs. Topics in this

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK UNIT-III. SUB NAME: DESIGN AND ANALYSIS OF ALGORITHMS SEM/YEAR: III/ II PART A (2 Marks)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK UNIT-III. SUB NAME: DESIGN AND ANALYSIS OF ALGORITHMS SEM/YEAR: III/ II PART A (2 Marks) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK UNIT-III SUB CODE: CS2251 DEPT: CSE SUB NAME: DESIGN AND ANALYSIS OF ALGORITHMS SEM/YEAR: III/ II PART A (2 Marks) 1. Write any four examples

More information

Total Score /15 /20 /30 /10 /5 /20 Grader

Total Score /15 /20 /30 /10 /5 /20 Grader NAME: NETID: CS2110 Fall 2009 Prelim 2 November 17, 2009 Write your name and Cornell netid. There are 6 questions on 8 numbered pages. Check now that you have all the pages. Write your answers in the boxes

More information

9 Distributed Data Management II Caching

9 Distributed Data Management II Caching 9 Distributed Data Management II Caching In this section we will study the approach of using caching for the management of data in distributed systems. Caching always tries to keep data at the place where

More information