Lecture 11 Loop Transformations for Parallelism and Locality
|
|
- Madeleine Neal
- 5 years ago
- Views:
Transcription
1 Lecture 11 Loop Transformations for Parallelism and Locality 1. Examples 2. Affine Partitioning: Do-all 3. Affine Partitioning: Pipelining Readings: Chapter , , Shared Memory Machines Performance on Shared Address Space Multiprocessors: Parallelism & Locality P P P P M M M M nterconnect 2 1
2 Parallelism and Locality Parallelism DOES NOT imply speed up! Parallel performance: mprove locality with loop transformations Minimize communication Operations using the same data are executed on the same processor Sequential performance: mprove locality with loop transformations Minimize cache misses Operations using the same data are executed close in time. 3 Loop Permutation (Loop nterchange) Z[,] = Z[-1,] for = 1 to 3 for = 1 to 4 Z[, ] = Z[ -1, ] " j' % " ' = 0 1 % " i % ' ' # i' & # 1 0& # j& 4 2
3 Loop Fusion T[]= A[]+B[] (s1) for = 1 to 4 C[ ]= T[ ] x T[ ] (s2) for = 1 to 4 T[]= A[]+B[] (s1) C[]= T[] x T[] (s2) s1: [ j] = 1 [ ] [i] [ j] = [ 1] [i'] s2: 5 Loop Transformations Unimodular transforms on loop nests nterchange Skewing Reversal Cross statement transforms Loop fusion Loop fission Re-indexing How to combine them to get parallelism and locality? 6 3
4 Affine Partitioning: An Contrived but llustrative Example FOR j = 1 TO n FOR i = 1 TO n A[i,j] = A[i,j]+B[i-1,j]; (S 1 ) B[i,j] = A[i,j-1]*B[i,j]; (S 2 ) j S 1 S 2 i 7 Best Parallelization Scheme Algorithm finds affine partition mappings for each instruction: S1: Execute iteration (i, j) on processor i-j. S2: Execute iteration (i, j) on processor i-j+1. SPMD code: Let p be the processor s D number if (1-n <= p <= n) then if [1 <= p) then B[p,1] = A[p,0] * B[p,1]; (S 2 ) for i 1 = max[1,1+p) to min[n,n-1+p) do A[i 1,i 1 -p] = A[i 1,i 1 -p] + B[i 1-1,i 1 -p]; (S 1 ) B[i 1,i 1 -p+1] = A[i 1,i 1 -p] * B[i 1,i 1 -p+1]; (S 2 ) if (p <= 0) then A[n+p,n] = A[n+p,N] + B[n+p-1,n]; (S 1 ) 8 4
5 2. teration Space FOR i = 0 to 5 FOR j = i to 7 n-deep loop nests: n-dimensional polytope terations: coordinates in the iteration space Assume: iteration index is incremented in the loop Sequential execution order: lexicographic order [0,0], [0,1],, [0,6], [0,7], [1,1],, [1,6], [1,7], 9 Maximum Parallelism & No Communication Loops F 1 i 1 +f 1 F 2 i 2 +f 2 Array C 2 i 2 +c 2 C 1 i 1 +c 1 Processor D For every pair of data dependent accesses F 1 i 1 +f 1 and F 2 i 2 +f 2 Find C 1, c 1, C 2, c 2 : i 1, i 2 F 1 i 1 + f 1 = F 2 i 2 +f 2 C 1 i 1 +c 1 = C 2 i 2 +c 2 with the objective of maximizing the rank of C 1, C
6 Rank of Partitioning = Degree of Parallelism Affine Mapping i j [ 0 0] [ 1] i 1 0 i 0 j 0 1 j Rank Mapped to same processor 11 Example 1: Loop Transform Z[,] = Z[-1,] Find affine partitioning: c 1, c 2, c 0 such that " p = [ c 1 c 2 ] i % ' + c 0 # j& Suppose itera4on i,j & i, j refer to same loca4on i = i - 1 j = j No communica4on means: c 1 i + c 2 j + c 0 = c 1 i + c 2 j + c 0 c 1 (i - 1) +c 2 j + c 0 = c 1 i + c 2 j + c 0 c 1 = 0 p = c 2 j + c 0 Pick simplest c 2, c 0 : c 2 = 1, c 0 = 0 p = j 12 6
7 Code Generation Naive Each processor visits all the iterations Executes only if it owns that iteration Optimization Removes unnecessary looping and condition evaluation 13 Code Generation Z[,] = Z[-1,] p = j for P = 1 to 3 if (j == P) Z[,] = Z[-1,] for P = 1 to 3 Z[,P] = Z[-1,P] SPMD (single program multiple data) code: Z[,P] = Z[-1,P] 14 7
8 Loop Permutation (Loop nterchange) Z[,] = Z[-1,] for P = 1 to 3 Z[,P] = Z[-1,P] " p' % " ' = 0 1 % " # i' & # 1 0& ' i % # j ' & P 15 Example 2: Loop Fusion T[]= A[]+B[] (s1) for = 1 to 4 C[ ]= T[ ] x T[ ] (s2) Find affine partitioning: c 1,1, c 1,0, c 2.1, c 1,0, such that s1: s2: [ ] [i] + c 1,0 [ p] = [ c 2,1 ] [i'] + c 2,0 [ p] = c 1,1 Suppose itera4on i & i refer to the same loca4on i = i No communica4on means: c 1,1 i + c 1,0 = c 2,1 i + c 2,0 c 1,1 = c 2,1 c 1,0 = c 2,0 Pick simplest values: c 1,1 = c 2,1 = 1, c 1,0 = c 2,0 = 0 p = i; p=i 16 8
9 Loop Fusion T[]= A[]+B[] (s1) for = 1 to 4 C[ ]= T[ ] x T[ ] (s2) for P = 1 to 4 if ( == P) T[]= A[]+B[] (s1) for = 1 to 4 if ( == P) C[ ]= T[ ] x T[ ] (s2) s1: s2: [ p] = 1 [ ] [i] [ p] = 1 [ ] [i'] for P = 1 to 4 T[P]= A[P]+B[P] (s1) C[P]= T[P] x T[P] (s2) 17 Example 3: 2 Nested, Parallel Loops Z[,] = Z[,]+1 Find affine partitioning: c 1, c 2, c 0 such that " p = [ c 1 c 2 ] i % ' + c 0 # j& Suppose itera4on i,j & i, j refer to same loca4on i = i j = j No communica4on means: c 1 i + c 2 j + c 0 = c 1 i + c 2 j + c 0 c 1 i +c 2 j + c 0 = c 1 i + c 2 j + c 0 No constraints Two basis vectors: [c 1 c 2 ]=[1 0], or [c 1 c 2 ] = [0 1] Two answers for p: two degrees of parallelism " # p 1 p 2 % " ' = 1 0 %" & # 0 1& ' i % # j ' & 18 9
10 Example 3: 2 Nested, Parallel Loops Z[,] = Z[,]+1 " # p 1 p 2 % " ' = 1 0 %" & # 0 1& ' i % # j ' & for p1 = 1 to 4 for p2 = 1 to 3 if (==p1 & == p2) Z[,] = Z[,]+1 for p1 = 1 to 4 for p2 = 1 to 3 Z[p1,p2] = Z[p1,p2]+1 19 Optimizing Arbitrary Loop Nesting Using Affine Partitions (chotst, NAS) DO 1 = 0, N 0 = MAX ( -M, - ) DO 2 = 0, -1 DO 3 = 0 -, -1 DO 3 L = 0, NMAT 3 A(L,,) = A(L,,) - A(L,,+) * A(L,+,) DO 2 L = 0, NMAT 2 A(L,,) = A(L,,) * A(L,0,+) DO 4 L = 0, NMAT 4 EPSS(L) = EPS * A(L,0,) DO 5 = 0, -1 DO 5 L = 0, NMAT 5 A(L,0,) = A(L,0,) - A(L,,) ** 2 DO 1 L = 0, NMAT 1 A(L,0,) = 1. / SQRT ( ABS (EPSS(L) + A(L,0,)) ) DO 6 = 0, NRHS DO 7 K = 0, N DO 8 L = 0, NMAT 8 B(,L,K) = B(,L,K) * A(L,0,K) DO 7 = 1, MN (M, N-K) DO 7 L = 0, NMAT 7 B(,L,K+) = B(,L,K+) - A(L,-,K+) * B(,L,K) DO 6 K = N, 0, -1 DO 9 L = 0, NMAT 9 B(,L,K) = B(,L,K) * A(L,0,K) DO 6 = 1, MN (M, K) DO 6 L = 0, NMAT 6 B(,L,K-) = B(,L,K-) - A(L,-,K) * B(,L,K) A B EPSS 20 L L L 10
11 Chotst: Results with Affine Partitioning + Blocking (Unimodular: a subset of affine partitioning for perfect loop nests) Speedup Unimodular + Blocking Affine Partitioning + Blocking Number of Processors 21 Summary of Affine Partitioning Communication-Free Loops F 1 i 1 +f 1 F 2 i 2 +f 2 Array C 2 i 2 +c 2 C 1 i 1 +c 1 Processor D 22 11
12 Advanced topic: Pipelining SOR (Successive Over-Relaxation): An Example for i = 1 TO m for j = 1 to n A[i,j] = c * (A[i-1,j] + A[i,j-1]) i j 23 Finding the Maximum Degree of Pipelining i 1 i 2 Loops F 2 i 2 +f 2 F 1 i 1 +f 1 C C 1 i 1 +c 2 i 2 +c 2 1 Time Stage Array For every pair of data dependent accesses F 1 i 1 +f 1 and F 2 i 2 +f 2 Let B 1 i 1 +b 1 0, B 2 i 2 +b 2 0 be the corresponding loop bound constraints, Find C 1, c 1, C 2, c 2 : i 1, i 2 B 1 i 1 + b 1 0, B 2 i 2 + b 2 0 (i 1 i 2 ) (F 1 i 1 + f 1 = F 2 i 2 +f 2 ) C 1 i 1 +c 1 C 2 i 2 +c 2 with the objective of maximizing the rank of C 1, C
13 Key nsight Choice in time mapping => (pipelined) parallelism Rank(C) 1 degree of parallelism with 1 degree of synchronization Can create blocks with Rank(C) dimensions Find time partitions is not as straightforward as space partitions Need to deal with linear inequalities Solved using Farkas Lemma no simple intuitive proof 25 Communication-Free Summary of Affine Partitioning F 1 i 1 +f 1 Loops Array F 2 i 2 +f 2 C 1 i 1 +c 1 C 2 i 2 +c 2 Processor D Pipelining i 1 i 2 Loops F 2 i 2 +f 2 F 1 i 1 +f 1 Array C 1 i 1 +c 1 C 2 i 2 +c 2 Time Stage 26 13
Loop Transformations, Dependences, and Parallelization
Loop Transformations, Dependences, and Parallelization Announcements HW3 is due Wednesday February 15th Today HW3 intro Unimodular framework rehash with edits Skewing Smith-Waterman (the fix is in!), composing
More informationThe Polyhedral Model (Transformations)
The Polyhedral Model (Transformations) Announcements HW4 is due Wednesday February 22 th Project proposal is due NEXT Friday, extension so that example with tool is possible (see resources website for
More informationCS 293S Parallelism and Dependence Theory
CS 293S Parallelism and Dependence Theory Yufei Ding Reference Book: Optimizing Compilers for Modern Architecture by Allen & Kennedy Slides adapted from Louis-Noël Pouche, Mary Hall End of Moore's Law
More informationTiling: A Data Locality Optimizing Algorithm
Tiling: A Data Locality Optimizing Algorithm Announcements Monday November 28th, Dr. Sanjay Rajopadhye is talking at BMAC Friday December 2nd, Dr. Sanjay Rajopadhye will be leading CS553 Last Monday Kelly
More informationPrinciple of Polyhedral model for loop optimization. cschen 陳鍾樞
Principle of Polyhedral model for loop optimization cschen 陳鍾樞 Outline Abstract model Affine expression, Polygon space Polyhedron space, Affine Accesses Data reuse Data locality Tiling Space partition
More informationLecture 9 Basic Parallelization
Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning
More informationLecture 9 Basic Parallelization
Lecture 9 Basic Parallelization I. Introduction II. Data Dependence Analysis III. Loop Nests + Locality IV. Interprocedural Parallelization Chapter 11.1-11.1.4 CS243: Parallelization 1 Machine Learning
More informationModule 16: Data Flow Analysis in Presence of Procedure Calls Lecture 32: Iteration. The Lecture Contains: Iteration Space.
The Lecture Contains: Iteration Space Iteration Vector Normalized Iteration Vector Dependence Distance Direction Vector Loop Carried Dependence Relations Dependence Level Iteration Vector - Triangular
More information6.189 IAP Lecture 11. Parallelizing Compilers. Prof. Saman Amarasinghe, MIT IAP 2007 MIT
6.189 IAP 2007 Lecture 11 Parallelizing Compilers 1 6.189 IAP 2007 MIT Outline Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Generation of Parallel
More informationCompiling for Advanced Architectures
Compiling for Advanced Architectures In this lecture, we will concentrate on compilation issues for compiling scientific codes Typically, scientific codes Use arrays as their main data structures Have
More informationParallelization. Saman Amarasinghe. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
Spring 2 Parallelization Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Outline Why Parallelism Parallel Execution Parallelizing Compilers
More informationA Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality
A Crash Course in Compilers for Parallel Computing Mary Hall Fall, 2008 1 Overview of Crash Course L1: Data Dependence Analysis and Parallelization (Oct. 30) L2 & L3: Loop Reordering Transformations, Reuse
More informationOutline. Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities
Parallelization Outline Why Parallelism Parallel Execution Parallelizing Compilers Dependence Analysis Increasing Parallelization Opportunities Moore s Law From Hennessy and Patterson, Computer Architecture:
More informationParallelisation. Michael O Boyle. March 2014
Parallelisation Michael O Boyle March 2014 1 Lecture Overview Parallelisation for fork/join Mapping parallelism to shared memory multi-processors Loop distribution and fusion Data Partitioning and SPMD
More informationReview. Loop Fusion Example
Review Distance vectors Concisely represent dependences in loops (i.e., in iteration spaces) Dictate what transformations are legal e.g., Permutation and parallelization Legality A dependence vector is
More informationLoop Transformations! Part II!
Lecture 9! Loop Transformations! Part II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Loop Unswitching Hoist invariant control-flow
More informationAffine and Unimodular Transformations for Non-Uniform Nested Loops
th WSEAS International Conference on COMPUTERS, Heraklion, Greece, July 3-, 008 Affine and Unimodular Transformations for Non-Uniform Nested Loops FAWZY A. TORKEY, AFAF A. SALAH, NAHED M. EL DESOUKY and
More informationLinear Loop Transformations for Locality Enhancement
Linear Loop Transformations for Locality Enhancement 1 Story so far Cache performance can be improved by tiling and permutation Permutation of perfectly nested loop can be modeled as a linear transformation
More informationCoarse-Grained Parallelism
Coarse-Grained Parallelism Variable Privatization, Loop Alignment, Loop Fusion, Loop interchange and skewing, Loop Strip-mining cs6363 1 Introduction Our previous loop transformations target vector and
More informationNull space basis: mxz. zxz I
Loop Transformations Linear Locality Enhancement for ache performance can be improved by tiling and permutation Permutation of perfectly nested loop can be modeled as a matrix of the loop nest. dependence
More information6. Lecture notes on matroid intersection
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm
More informationLoop Nest Optimizer of GCC. Sebastian Pop. Avgust, 2006
Loop Nest Optimizer of GCC CRI / Ecole des mines de Paris Avgust, 26 Architecture of GCC and Loop Nest Optimizer C C++ Java F95 Ada GENERIC GIMPLE Analyses aliasing data dependences number of iterations
More informationEssential constraints: Data Dependences. S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2
Essential constraints: Data Dependences S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2 Essential constraints: Data Dependences S1: a = b + c S2: d = a * 2 S3: a = c + 2 S4: e = d + c + 2 S2
More informationData Dependence Analysis
CSc 553 Principles of Compilation 33 : Loop Dependence Data Dependence Analysis Department of Computer Science University of Arizona collberg@gmail.com Copyright c 2011 Christian Collberg Data Dependence
More informationEE/CSCI 451 Midterm 1
EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming
More informationMaximizing parallelism and minimizing synchronization with affine partitions 1
Ž. Parallel Computing 24 998 445 475 Maximizing parallelism and minimizing synchronization with affine partitions Amy W. Lim ), Monica S. Lam 2 Computer Systems Laboratory, Stanford UniÕersity, Stanford,
More informationEfficient Polynomial-Time Nested Loop Fusion with Full Parallelism
Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism Edwin H.-M. Sha Timothy W. O Neil Nelson L. Passos Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Erik
More informationComputing and Informatics, Vol. 36, 2017, , doi: /cai
Computing and Informatics, Vol. 36, 2017, 566 596, doi: 10.4149/cai 2017 3 566 NESTED-LOOPS TILING FOR PARALLELIZATION AND LOCALITY OPTIMIZATION Saeed Parsa, Mohammad Hamzei Department of Computer Engineering
More informationParallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop
Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j
More informationTransformations Techniques for extracting Parallelism in Non-Uniform Nested Loops
Transformations Techniques for extracting Parallelism in Non-Uniform Nested Loops FAWZY A. TORKEY, AFAF A. SALAH, NAHED M. EL DESOUKY and SAHAR A. GOMAA ) Kaferelsheikh University, Kaferelsheikh, EGYPT
More informationHigh Performance Computing Lecture 41. Matthew Jacob Indian Institute of Science
High Performance Computing Lecture 41 Matthew Jacob Indian Institute of Science Example: MPI Pi Calculating Program /Each process initializes, determines the communicator size and its own rank MPI_Init
More informationPolyhedral Compilation Foundations
Polyhedral Compilation Foundations Louis-Noël Pouchet pouchet@cse.ohio-state.edu Dept. of Computer Science and Engineering, the Ohio State University Feb 22, 2010 888.11, Class #5 Introduction: Polyhedral
More informationControl flow graphs and loop optimizations. Thursday, October 24, 13
Control flow graphs and loop optimizations Agenda Building control flow graphs Low level loop optimizations Code motion Strength reduction Unrolling High level loop optimizations Loop fusion Loop interchange
More informationClass Information INFORMATION and REMINDERS Homework 8 has been posted. Due Wednesday, December 13 at 11:59pm. Third programming has been posted. Due Friday, December 15, 11:59pm. Midterm sample solutions
More informationLegal and impossible dependences
Transformations and Dependences 1 operations, column Fourier-Motzkin elimination us use these tools to determine (i) legality of permutation and Let generation of transformed code. (ii) Recall: Polyhedral
More informationTiling: A Data Locality Optimizing Algorithm
Tiling: A Data Locality Optimizing Algorithm Previously Unroll and Jam Homework PA3 is due Monday November 2nd Today Unroll and Jam is tiling Code generation for fixed-sized tiles Paper writing and critique
More informationLooPo: Automatic Loop Parallelization
LooPo: Automatic Loop Parallelization Michael Claßen Fakultät für Informatik und Mathematik Düsseldorf, November 27 th 2008 Model-Based Loop Transformations model-based approach: map source code to an
More informationPerformance Issues in Parallelization Saman Amarasinghe Fall 2009
Performance Issues in Parallelization Saman Amarasinghe Fall 2009 Today s Lecture Performance Issues of Parallelism Cilk provides a robust environment for parallelization It hides many issues and tries
More informationIterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time Louis-Noël Pouchet, Cédric Bastoul, Albert Cohen and Nicolas Vasilache ALCHEMY, INRIA Futurs / University of Paris-Sud XI March
More informationAdvanced optimizations of cache performance ( 2.2)
Advanced optimizations of cache performance ( 2.2) 30 1. Small and Simple Caches to reduce hit time Critical timing path: address tag memory, then compare tags, then select set Lower associativity Direct-mapped
More informationAutotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT
Autotuning John Cavazos University of Delaware What is Autotuning? Searching for the best code parameters, code transformations, system configuration settings, etc. Search can be Quasi-intelligent: genetic
More informationCS671 Parallel Programming in the Many-Core Era
1 CS671 Parallel Programming in the Many-Core Era Polyhedral Framework for Compilation: Polyhedral Model Representation, Data Dependence Analysis, Scheduling and Data Locality Optimizations December 3,
More information11.1 Facility Location
CS787: Advanced Algorithms Scribe: Amanda Burton, Leah Kluegel Lecturer: Shuchi Chawla Topic: Facility Location ctd., Linear Programming Date: October 8, 2007 Today we conclude the discussion of local
More informationGraphs and Network Flows IE411. Lecture 13. Dr. Ted Ralphs
Graphs and Network Flows IE411 Lecture 13 Dr. Ted Ralphs IE411 Lecture 13 1 References for Today s Lecture IE411 Lecture 13 2 References for Today s Lecture Required reading Sections 21.1 21.2 References
More informationPerformance Issues in Parallelization. Saman Amarasinghe Fall 2010
Performance Issues in Parallelization Saman Amarasinghe Fall 2010 Today s Lecture Performance Issues of Parallelism Cilk provides a robust environment for parallelization It hides many issues and tries
More informationCS 473: Algorithms. Ruta Mehta. Spring University of Illinois, Urbana-Champaign. Ruta (UIUC) CS473 1 Spring / 29
CS 473: Algorithms Ruta Mehta University of Illinois, Urbana-Champaign Spring 2018 Ruta (UIUC) CS473 1 Spring 2018 1 / 29 CS 473: Algorithms, Spring 2018 Simplex and LP Duality Lecture 19 March 29, 2018
More informationModule 18: Loop Optimizations Lecture 36: Cycle Shrinking. The Lecture Contains: Cycle Shrinking. Cycle Shrinking in Distance Varying Loops
The Lecture Contains: Cycle Shrinking Cycle Shrinking in Distance Varying Loops Loop Peeling Index Set Splitting Loop Fusion Loop Fission Loop Reversal Loop Skewing Iteration Space of The Loop Example
More informationData Dependences and Parallelization
Data Dependences and Parallelization 1 Agenda Introduction Single Loop Nested Loops Data Dependence Analysis 2 Motivation DOALL loops: loops whose iterations can execute in parallel for i = 11, 20 a[i]
More informationA Generic Separation Algorithm and Its Application to the Vehicle Routing Problem
A Generic Separation Algorithm and Its Application to the Vehicle Routing Problem Presented by: Ted Ralphs Joint work with: Leo Kopman Les Trotter Bill Pulleyblank 1 Outline of Talk Introduction Description
More informationLinear Programming in Small Dimensions
Linear Programming in Small Dimensions Lekcija 7 sergio.cabello@fmf.uni-lj.si FMF Univerza v Ljubljani Edited from slides by Antoine Vigneron Outline linear programming, motivation and definition one dimensional
More informationLecture 2. 1 Introduction. 2 The Set Cover Problem. COMPSCI 632: Approximation Algorithms August 30, 2017
COMPSCI 632: Approximation Algorithms August 30, 2017 Lecturer: Debmalya Panigrahi Lecture 2 Scribe: Nat Kell 1 Introduction In this lecture, we examine a variety of problems for which we give greedy approximation
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation Parallel Compilation Two approaches to compilation Parallelize a program manually Sequential code converted to parallel code Develop
More informationLoop Tiling for Parallelism
Loop Tiling for Parallelism THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE LOOP TILING FOR PARALLELISM JINGLING XUE School of Computer Science and Engineering The University of New
More informationACO Comprehensive Exam October 12 and 13, Computability, Complexity and Algorithms
1. Computability, Complexity and Algorithms Given a simple directed graph G = (V, E), a cycle cover is a set of vertex-disjoint directed cycles that cover all vertices of the graph. 1. Show that there
More information4 Greedy Approximation for Set Cover
COMPSCI 532: Design and Analysis of Algorithms October 14, 2015 Lecturer: Debmalya Panigrahi Lecture 12 Scribe: Allen Xiao 1 Overview In this lecture, we introduce approximation algorithms and their analysis
More informationProgram Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors. Naraig Manjikian
Program Transformations for Cache Locality Enhancement on Shared-memory Multiprocessors by Naraig Manjikian A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
More informationLecture: Pipelining Basics
Lecture: Pipelining Basics Topics: Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4: Loads/Stores and RISC/CISC Video
More informationExploring Parallelism At Different Levels
Exploring Parallelism At Different Levels Balanced composition and customization of optimizations 7/9/2014 DragonStar 2014 - Qing Yi 1 Exploring Parallelism Focus on Parallelism at different granularities
More informationLoops and Locality. with an introduc-on to the memory hierarchy. COMP 506 Rice University Spring target code. source code OpJmizer
COMP 506 Rice University Spring 2017 Loops and Locality with an introduc-on to the memory hierarchy source code Front End IR OpJmizer IR Back End target code Copyright 2017, Keith D. Cooper & Linda Torczon,
More informationCOT 6936: Topics in Algorithms! Giri Narasimhan. ECS 254A / EC 2443; Phone: x3748
COT 6936: Topics in Algorithms! Giri Narasimhan ECS 254A / EC 2443; Phone: x3748 giri@cs.fiu.edu http://www.cs.fiu.edu/~giri/teach/cot6936_s12.html https://moodle.cis.fiu.edu/v2.1/course/view.php?id=174
More informationMathematical and Algorithmic Foundations Linear Programming and Matchings
Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis
More informationSteiner Trees and Forests
Massachusetts Institute of Technology Lecturer: Adriana Lopez 18.434: Seminar in Theoretical Computer Science March 7, 2006 Steiner Trees and Forests 1 Steiner Tree Problem Given an undirected graph G
More information5. Lecture notes on matroid intersection
Massachusetts Institute of Technology Handout 14 18.433: Combinatorial Optimization April 1st, 2009 Michel X. Goemans 5. Lecture notes on matroid intersection One nice feature about matroids is that a
More informationORIE 6300 Mathematical Programming I September 2, Lecture 3
ORIE 6300 Mathematical Programming I September 2, 2014 Lecturer: David P. Williamson Lecture 3 Scribe: Divya Singhvi Last time we discussed how to take dual of an LP in two different ways. Today we will
More informationLinear programming and the efficiency of the simplex algorithm for transportation polytopes
Linear programming and the efficiency of the simplex algorithm for transportation polytopes Edward D. Kim University of Wisconsin-La Crosse February 20, 2015 Loras College Department of Mathematics Colloquium
More informationOutline. L5: Writing Correct Programs, cont. Is this CUDA code correct? Administrative 2/11/09. Next assignment (a homework) given out on Monday
Outline L5: Writing Correct Programs, cont. 1 How to tell if your parallelization is correct? Race conditions and data dependences Tools that detect race conditions Abstractions for writing correct parallel
More informationParallel Auction Algorithm for Linear Assignment Problem
Parallel Auction Algorithm for Linear Assignment Problem Xin Jin 1 Introduction The (linear) assignment problem is one of classic combinatorial optimization problems, first appearing in the studies on
More informationCS6841: Advanced Algorithms IIT Madras, Spring 2016 Lecture #1: Properties of LP Optimal Solutions + Machine Scheduling February 24, 2016
CS6841: Advanced Algorithms IIT Madras, Spring 2016 Lecture #1: Properties of LP Optimal Solutions + Machine Scheduling February 24, 2016 Lecturer: Ravishankar Krishnaswamy Scribe: Anand Kumar(CS15M010)
More informationof Nebraska - Lincoln. Computer Science and Engineering, Department of
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Technical reports Computer Science and Engineering, Department of 2005 DCDP: A Novel Data-Centric and Design-Pattern
More informationPolyhedral Operations. Algorithms needed for automation. Logistics
Polyhedral Operations Logistics Intermediate reports late deadline is Friday March 30 at midnight HW6 (posted) and HW7 (posted) due April 5 th Tuesday April 4 th, help session during class with Manaf,
More informationDiscrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity
Discrete Optimization 2010 Lecture 5 Min-Cost Flows & Total Unimodularity Marc Uetz University of Twente m.uetz@utwente.nl Lecture 5: sheet 1 / 26 Marc Uetz Discrete Optimization Outline 1 Min-Cost Flows
More informationParallel Computing: Parallel Algorithm Design Examples Jin, Hai
Parallel Computing: Parallel Algorithm Design Examples Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! Given associative operator!! a 0! a 1! a 2!! a
More informationAlgorithms for Bioinformatics
Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and
More informationLecture 19. Broadcast routing
Lecture 9 Broadcast routing Slide Broadcast Routing Route a packet from a source to all nodes in the network Possible solutions: Flooding: Each node sends packet on all outgoing links Discard packets received
More informationTiling: A Data Locality Optimizing Algorithm
Tiling: A Data Locality Optimizing Algorithm Previously Performance analysis of existing codes Data dependence analysis for detecting parallelism Specifying transformations using frameworks Today Usefulness
More informationCS4230 Parallel Programming. Lecture 7: Loop Scheduling cont., and Data Dependences 9/6/12. Administrative. Mary Hall.
CS4230 Parallel Programming Lecture 7: Loop Scheduling cont., and Data Dependences Mary Hall September 7, 2012 Administrative I will be on travel on Tuesday, September 11 There will be no lecture Instead,
More informationCS4961 Parallel Programming. Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms 9/5/12. Administrative. Mary Hall September 4, 2012
CS4961 Parallel Programming Lecture 5: More OpenMP, Introduction to Data Parallel Algorithms Administrative Mailing list set up, everyone should be on it - You should have received a test mail last night
More informationNetwork Design and Optimization course
Effective maximum flow algorithms Modeling with flows Network Design and Optimization course Lecture 5 Alberto Ceselli alberto.ceselli@unimi.it Dipartimento di Tecnologie dell Informazione Università degli
More informationAN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1
AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 Virgil Andronache Richard P. Simpson Nelson L. Passos Department of Computer Science Midwestern State University
More informationGraphs and Network Flows IE411. Lecture 21. Dr. Ted Ralphs
Graphs and Network Flows IE411 Lecture 21 Dr. Ted Ralphs IE411 Lecture 21 1 Combinatorial Optimization and Network Flows In general, most combinatorial optimization and integer programming problems are
More informationx ji = s i, i N, (1.1)
Dual Ascent Methods. DUAL ASCENT In this chapter we focus on the minimum cost flow problem minimize subject to (i,j) A {j (i,j) A} a ij x ij x ij {j (j,i) A} (MCF) x ji = s i, i N, (.) b ij x ij c ij,
More informationDynamic Programming Algorithms
Based on the notes for the U of Toronto course CSC 364 Dynamic Programming Algorithms The setting is as follows. We wish to find a solution to a given problem which optimizes some quantity Q of interest;
More informationUniversity of Malaga. Image Template Matching on Distributed Memory and Vector Multiprocessors
Image Template Matching on Distributed Memory and Vector Multiprocessors V. Blanco M. Martin D.B. Heras O. Plata F.F. Rivera September 995 Technical Report No: UMA-DAC-95/20 Published in: 5th Int l. Conf.
More informationLecture 8: The EM algorithm
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 8: The EM algorithm Lecturer: Manuela M. Veloso, Eric P. Xing Scribes: Huiting Liu, Yifan Yang 1 Introduction Previous lecture discusses
More informationAutomatic Tiling of Iterative Stencil Loops
Automatic Tiling of Iterative Stencil Loops Zhiyuan Li and Yonghong Song Purdue University Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation
More informationAssignment 5: Solutions
Algorithm Design Techniques Assignment 5: Solutions () Port Authority. [This problem is more commonly called the Bin Packing Problem.] (a) Suppose K = 3 and (w, w, w 3, w 4 ) = (,,, ). The optimal solution
More informationHigh Dimensional Indexing by Clustering
Yufei Tao ITEE University of Queensland Recall that, our discussion so far has assumed that the dimensionality d is moderately high, such that it can be regarded as a constant. This means that d should
More informationStatic Data Race Detection for SPMD Programs via an Extended Polyhedral Representation
via an Extended Polyhedral Representation Habanero Extreme Scale Software Research Group Department of Computer Science Rice University 6th International Workshop on Polyhedral Compilation Techniques (IMPACT
More informationClustering: Centroid-Based Partitioning
Clustering: Centroid-Based Partitioning Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong 1 / 29 Y Tao Clustering: Centroid-Based Partitioning In this lecture, we
More information1. Lecture notes on bipartite matching February 4th,
1. Lecture notes on bipartite matching February 4th, 2015 6 1.1.1 Hall s Theorem Hall s theorem gives a necessary and sufficient condition for a bipartite graph to have a matching which saturates (or matches)
More information6.842 Randomness and Computation September 25-27, Lecture 6 & 7. Definition 1 Interactive Proof Systems (IPS) [Goldwasser, Micali, Rackoff]
6.84 Randomness and Computation September 5-7, 017 Lecture 6 & 7 Lecturer: Ronitt Rubinfeld Scribe: Leo de Castro & Kritkorn Karntikoon 1 Interactive Proof Systems An interactive proof system is a protocol
More informationPolar Duality and Farkas Lemma
Lecture 3 Polar Duality and Farkas Lemma October 8th, 2004 Lecturer: Kamal Jain Notes: Daniel Lowd 3.1 Polytope = bounded polyhedron Last lecture, we were attempting to prove the Minkowsky-Weyl Theorem:
More informationEnhancing Parallelism
CSC 255/455 Software Analysis and Improvement Enhancing Parallelism Instructor: Chen Ding Chapter 5,, Allen and Kennedy www.cs.rice.edu/~ken/comp515/lectures/ Where Does Vectorization Fail? procedure vectorize
More informationIncreasing Parallelism of Loops with the Loop Distribution Technique
Increasing Parallelism of Loops with the Loop Distribution Technique Ku-Nien Chang and Chang-Biau Yang Department of pplied Mathematics National Sun Yat-sen University Kaohsiung, Taiwan 804, ROC cbyang@math.nsysu.edu.tw
More informationPage 1. Parallelization techniques. Dependence graph. Dependence Distance and Distance Vector
Parallelization techniques The parallelization techniques for loops normally follow the following three steeps: 1. Perform a Data Dependence Test to detect potential parallelism. These tests may be performed
More informationParallel Computer Architecture and Programming Written Assignment 3
Parallel Computer Architecture and Programming Written Assignment 3 50 points total. Due Monday, July 17 at the start of class. Problem 1: Message Passing (6 pts) A. (3 pts) You and your friend liked the
More informationLecture 5: Properties of convex sets
Lecture 5: Properties of convex sets Rajat Mittal IIT Kanpur This week we will see properties of convex sets. These properties make convex sets special and are the reason why convex optimization problems
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More information4 Integer Linear Programming (ILP)
TDA6/DIT37 DISCRETE OPTIMIZATION 17 PERIOD 3 WEEK III 4 Integer Linear Programg (ILP) 14 An integer linear program, ILP for short, has the same form as a linear program (LP). The only difference is that
More informationLecture 2: Parallel Programs. Topics: consistency, parallel applications, parallelization process
Lecture 2: Parallel Programs Topics: consistency, parallel applications, parallelization process 1 Sequential Consistency A multiprocessor is sequentially consistent if the result of the execution is achievable
More information