Interprocedural Dependence Analysis and Parallelization

RETROSPECTIVE: Interprocedural Dependence Analysis and Parallelization

Michael G Burke
IBM T.J. Watson Research Labs
P.O. Box 704
Yorktown Heights, NY USA
mgburke@us.ibm.com

Ron K. Cytron
Department of Computer Science and Engineering
Washington University
Campus Box 1045
St Louis, MO USA
cytron@acm.org

ABSTRACT

The area of dependence analysis has served as grounds for fruitful research as well as practical implementation. Compilers and tools that utilize dependence information can generate code that takes advantage of parallel resources and storage hierarchies on modern architectures. Here, we offer some historical background on the context and thinking that fostered our 1986 paper. We also attempt to summarize the direction research in this area has taken since the paper's appearance.

Background

In 1985, when this paper was submitted to PLDI, the authors of this paper were members of the PTRAN (Parallel TRANslation) group at IBM T. J. Watson Research Labs in Yorktown Heights. Fran Allen, now Research Staff Member Emerita, directed the group, whose research included program optimizations and transformations for parallel architectures. Fran asked us to think about a compile-time test for array overlap that would be appropriate for Fortran, where arrays are statically declared but can overlap in nonobvious ways across different compilation units. The problem thus posed was interprocedural in nature, but it was complicated by Fortran COMMON blocks and other such structures by which a given location in memory could be known by different names.

We surveyed the literature on dependence analysis and concluded that a subscript test, of the kind formulated by Banerjee-Wolfe, would be appropriate. That test, however, proceeds subscript-by-subscript, and holds only if array indices do not violate their declared bounds.
Fortran offered no mechanism to determine the size of an array dimension at runtime, nor were runtime violations of declared bounds cause for terminating a Fortran program. Thus, Fortran programmers violated declared array bounds with abandon. It occurred to us that the lower-level view of an array subscript is simply a linear index in memory, and all Fortran compilers eventually generate code to treat arrays of higher dimension as a one-dimensional vector. By applying Banerjee-Wolfe to the linearized subscript form, Fran's problem could be solved conservatively: the compile-time test is reliable concerning array independence, but the test may flag some arrays as overlapping when in fact they are independent. Fortunately, this approach is appropriate for a compile-time test. Further, it turned out that for some higher-dimension arrays, tests on the linearized form could prove independence where subscript-by-subscript testing could not.

Our paper is perhaps better known for its hierarchical reformulation of Wolfe's direction vectors. A direction vector shows the direction of a data dependence in terms of an iteration space. In a single-loop environment, Wolfe's "<" dependence (called a true dependence by Kennedy and Allen) moves forward through the iteration space, while Wolfe's ">" dependence moves backward. We showed that dependence testing could proceed first by testing for dependences over all direction vectors (which we called "*"). If the test is positive, then further refinement of "*" into Wolfe's direction vectors can provide more information about the nature of the dependence. This reformulation was useful, especially for the problem posed by Fran, because many array expressions that might appear to overlap have absolutely no overlap once the linearized references are obtained. The "*" test quickly obviates the need for further dependence testing between the arrays.

Subsequent Developments

At the time of our paper's publication in 1986, we had implemented the dependence-testing aspects of the paper and verified the efficiency on the Perfect benchmarks, a popular suite of Fortran benchmarks at that time. While our paper suggested that interprocedural subscript analysis might uncover more parallelism, we did not implement the work to that extent, and so that hypothesis remained open. Michael Hind added the full, interprocedural subscript analysis described in our paper. His experiments showed that the extra analysis did not in fact expose much more parallelism than did the intraprocedural version we had implemented [2].

Subsequently, Mary Hall, continuing work she had begun at Rice but now at Stanford working with Monica Lam and others, showed that exposing significant parallelism on the Perfect benchmarks required powerful transformations like array privatization. The Stanford experiments [1] were performed against the PTRAN measurements on the Perfect benchmarks that Hind, et al. had described in our paper. In a subsequent paper [3], the Stanford group cited our approach as the standard one for computing direction vectors. In his book [4], Michael Wolfe acknowledged and adopted our framework for computing dependence relations hierarchically.

While dependence testing of the form described in our paper does not see much use these days for uncovering parallelism in "dusty deck" Fortran programs, sophisticated analysis of this form is present in tools and in compilers that restructure programs for advanced architectures, including those that feature elements of parallelism as well as deep storage hierarchies. At IBM, our dependence test found its way into the IBM XL Fortran product, which first shipped in 1996, ten years after the publication of our paper.

Dependence analysis is a specialized area of computer science, but it has served as a fertile ground for theoretical and practical research. We are pleased to have been part of its noble history and we thank the selection committee for this honor.

1. ACKNOWLEDGEMENTS

This work builds on the work of two groups who pioneered the area of dependence analysis: from Illinois, David Kuck, Utpal Banerjee, and Michael Wolfe; from Rice, Ken Kennedy and Randy Allen. The authors thank Fran Allen for inspiring and supporting this work and Vivek Sarkar for advocating our work for this recognition.

REFERENCES

[1] M. W. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Detecting coarse-grain parallelism using an interprocedural parallelizing compiler. Proceedings of Supercomputing 95.
[2] Michael Hind, Michael Burke, Paul Carini, and Sam Midkiff. An Empirical Study of Precise Interprocedural Array Analysis. Scientific Programming, 3(3).
[3] Maydan, Hennessy, and Lam. Efficient and exact data dependence analysis. PLDI.
[4] Michael J. Wolfe. Optimizing Supercompilers for Supercomputers. Pitman, London and The MIT Press, Cambridge, Massachusetts. In the series, Research Monographs in Parallel and Distributed Computing. This monograph is a revised version of the author's Ph.D. dissertation published as Technical Report UIUCDCS-R , U. Illinois at Urbana-Champaign.

20 Years of the ACM/SIGPLAN Conference on Programming Language Design and Implementation ( ): A Selection, Copyright 2003 ACM $5.00.
ACM SIGPLAN 139 Best of PLDI
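As an illustration of the Background section's remark that a test on the linearized form can prove independence where subscript-by-subscript testing cannot, here is a minimal Python sketch. It is not the test from the 1986 paper: it substitutes the simpler GCD test for the Banerjee-Wolfe inequalities, and the function names, the dict encoding of affine subscripts, and the 10 x 10 array are all assumptions made for this example.

```python
from functools import reduce
from math import gcd

# Hypothetical encoding: an affine subscript is a dict mapping a loop-variable
# name to its integer coefficient; the key "" holds the constant term.
# For example, j + 1 is written {"j": 1, "": 1}.


def linearize(subscripts, dims):
    """Fold 1-based subscripts into one column-major (Fortran-order) offset."""
    linear = {}
    stride = 1
    for sub, extent in zip(subscripts, dims):
        for var, coeff in sub.items():
            linear[var] = linear.get(var, 0) + stride * coeff
        linear[""] = linear.get("", 0) - stride  # subscripts start at 1
        stride *= extent
    return linear


def gcd_may_depend(ref_a, ref_b):
    """GCD test on ref_a == ref_b; False means independence is proven.

    Variables of the two references are treated as distinct unknowns,
    because the two accesses may occur in different iterations.
    """
    coeffs = [c for v, c in ref_a.items() if v != ""]
    coeffs += [-c for v, c in ref_b.items() if v != ""]
    const = ref_b.get("", 0) - ref_a.get("", 0)
    g = reduce(gcd, (abs(c) for c in coeffs), 0)
    if g == 0:  # no variables at all: compare the constants
        return const == 0
    return const % g == 0


def subscriptwise_may_depend(subs_a, subs_b):
    """Test each dimension separately; any proven-independent dimension suffices."""
    return all(gcd_may_depend(a, b) for a, b in zip(subs_a, subs_b))


# Write to A(i, i) and read from A(j, j + 1) in a 10 x 10 array.
write_ref = [{"i": 1}, {"i": 1}]
read_ref = [{"j": 1}, {"j": 1, "": 1}]
dims = (10, 10)

print(subscriptwise_may_depend(write_ref, read_ref))  # True: cannot rule out overlap
print(gcd_may_depend(linearize(write_ref, dims),
                     linearize(read_ref, dims)))       # False: independent
```

Each dimension taken alone has integer solutions (i = j, and i = j + 1), so the per-subscript test must answer "maybe". The linearized equation is 11i - 11 = 11j - 1, that is 11(i - j) = 10, which has no integer solution, so the two references can never touch the same memory location, even if the declared bounds are violated.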


Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1

Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1 Identifying Parallelism in Construction Operations of Cyclic Pointer-Linked Data Structures 1 Yuan-Shin Hwang Department of Computer Science National Taiwan Ocean University Keelung 20224 Taiwan shin@cs.ntou.edu.tw

More information

Chapter 1: Interprocedural Parallelization Analysis: A Case Study. Abstract

Chapter 1: Interprocedural Parallelization Analysis: A Case Study. Abstract Chapter 1: Interprocedural Parallelization Analysis: A Case Study Mary W. Hall Brian R. Murphy Saman P. Amarasinghe Abstract We present an overview of our interprocedural analysis system, which applies

More information

Parallelization System. Abstract. We present an overview of our interprocedural analysis system,

Parallelization System. Abstract. We present an overview of our interprocedural analysis system, Overview of an Interprocedural Automatic Parallelization System Mary W. Hall Brian R. Murphy y Saman P. Amarasinghe y Shih-Wei Liao y Monica S. Lam y Abstract We present an overview of our interprocedural

More information

UMIACS-TR December, CS-TR-3192 Revised April, William Pugh. Dept. of Computer Science. Univ. of Maryland, College Park, MD 20742

UMIACS-TR December, CS-TR-3192 Revised April, William Pugh. Dept. of Computer Science. Univ. of Maryland, College Park, MD 20742 UMIACS-TR-93-133 December, 1992 CS-TR-3192 Revised April, 1993 Denitions of Dependence Distance William Pugh Institute for Advanced Computer Studies Dept. of Computer Science Univ. of Maryland, College

More information

Analysis of Pointers and Structures

Analysis of Pointers and Structures RETROSPECTIVE: Analysis of Pointers and Structures David Chase, Mark Wegman, and Ken Zadeck chase@naturalbridge.com, zadeck@naturalbridge.com, wegman@us.ibm.com Historically our paper was important because

More information

p q r int (*funcptr)(); SUB2() {... SUB3() {... } /* End SUB3 */ SUB1() {... c1: SUB3();... c3 c1 c2: SUB3();... } /* End SUB2 */ ...

p q r int (*funcptr)(); SUB2() {... SUB3() {... } /* End SUB3 */ SUB1() {... c1: SUB3();... c3 c1 c2: SUB3();... } /* End SUB2 */ ... Lecture Notes in Computer Science, 892, Springer-Verlag, 1995 Proceedings from the 7th International Workshop on Languages and Compilers for Parallel Computing Flow-Insensitive Interprocedural Alias Analysis

More information

Precise Executable Interprocedural Slices

Precise Executable Interprocedural Slices Precise Executable Interprocedural Slices DAVID BINKLEY Loyola College in Maryland The notion of a program slice, originally introduced by Mark Weiser, is useful in program debugging, automatic parallelization,

More information

Space Efficient Conservative Garbage Collection

Space Efficient Conservative Garbage Collection RETROSPECTIVE: Space Efficient Conservative Garbage Collection Hans-J. Boehm HP Laboratories 1501 Page Mill Rd. MS 1138 Palo Alto, CA, 94304, USA Hans.Boehm@hp.com ABSTRACT Both type-accurate and conservative

More information

Integer Programming for Array Subscript Analysis

Integer Programming for Array Subscript Analysis Appears in the IEEE Transactions on Parallel and Distributed Systems, June 95 Integer Programming for Array Subscript Analysis Jaspal Subhlok School of Computer Science, Carnegie Mellon University, Pittsburgh

More information

Symbolic Evaluation of Sums for Parallelising Compilers

Symbolic Evaluation of Sums for Parallelising Compilers Symbolic Evaluation of Sums for Parallelising Compilers Rizos Sakellariou Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL United Kingdom e-mail: rizos@csmanacuk Keywords:

More information

Advanced Compiler Construction

Advanced Compiler Construction Advanced Compiler Construction Qing Yi class web site: www.cs.utsa.edu/~qingyi/cs6363 cs6363 1 A little about myself Qing Yi Ph.D. Rice University, USA. Assistant Professor, Department of Computer Science

More information

Efficient Computation of LALR(1) Look-Ahead Sets

Efficient Computation of LALR(1) Look-Ahead Sets RETROSPECTIVE: Efficient Computation of LALR(1) Look-Ahead Sets Thomas J. Pennello ARC International Santa Cruz, CA 95060 tom.pennello@arc.com Frank DeRemer 8 South Circle Santa Cruz, CA 95060 fderemer@alum.mit.edu

More information

University of Ghent. St.-Pietersnieuwstraat 41. Abstract. Sucient and precise semantic information is essential to interactive

University of Ghent. St.-Pietersnieuwstraat 41. Abstract. Sucient and precise semantic information is essential to interactive Visualizing the Iteration Space in PEFPT? Qi Wang, Yu Yijun and Erik D'Hollander University of Ghent Dept. of Electrical Engineering St.-Pietersnieuwstraat 41 B-9000 Ghent wang@elis.rug.ac.be Tel: +32-9-264.33.75

More information

Control Flow Analysis with SAT Solvers

Control Flow Analysis with SAT Solvers Control Flow Analysis with SAT Solvers Steven Lyde, Matthew Might University of Utah, Salt Lake City, Utah, USA Abstract. Control flow analyses statically determine the control flow of programs. This is

More information

Department of. Computer Science. Uniqueness Analysis of Array. Omega Test. October 21, Colorado State University

Department of. Computer Science. Uniqueness Analysis of Array. Omega Test. October 21, Colorado State University Department of Computer Science Uniqueness Analysis of Array Comprehensions Using the Omega Test David Garza and Wim Bohm Technical Report CS-93-127 October 21, 1993 Colorado State University Uniqueness

More information

Title: ====== Open Research Compiler (ORC): Proliferation of Technologies and Tools

Title: ====== Open Research Compiler (ORC): Proliferation of Technologies and Tools Tutorial Proposal to Micro-36 Title: ====== Open Research Compiler (ORC): Proliferation of Technologies and Tools Abstract: ========= Open Research Compiler (ORC) has been well adopted by the research

More information

Unrolling Loops Containing Task Parallelism

Unrolling Loops Containing Task Parallelism Unrolling Loops Containing Task Parallelism Roger Ferrer 1, Alejandro Duran 1, Xavier Martorell 1,2, and Eduard Ayguadé 1,2 1 Barcelona Supercomputing Center Nexus II, Jordi Girona, 29, Barcelona, Spain

More information

Increasing Parallelism of Loops with the Loop Distribution Technique

Increasing Parallelism of Loops with the Loop Distribution Technique Increasing Parallelism of Loops with the Loop Distribution Technique Ku-Nien Chang and Chang-Biau Yang Department of pplied Mathematics National Sun Yat-sen University Kaohsiung, Taiwan 804, ROC cbyang@math.nsysu.edu.tw

More information

Generalized Iteration Space and the. Parallelization of Symbolic Programs. (Extended Abstract) Luddy Harrison. October 15, 1991.

Generalized Iteration Space and the. Parallelization of Symbolic Programs. (Extended Abstract) Luddy Harrison. October 15, 1991. Generalized Iteration Space and the Parallelization of Symbolic Programs (Extended Abstract) Luddy Harrison October 15, 1991 Abstract A large body of literature has developed concerning the automatic parallelization

More information

Case Studies on Cache Performance and Optimization of Programs with Unit Strides

Case Studies on Cache Performance and Optimization of Programs with Unit Strides SOFTWARE PRACTICE AND EXPERIENCE, VOL. 27(2), 167 172 (FEBRUARY 1997) Case Studies on Cache Performance and Optimization of Programs with Unit Strides pei-chi wu and kuo-chan huang Department of Computer

More information

Profiling Dependence Vectors for Loop Parallelization

Profiling Dependence Vectors for Loop Parallelization Profiling Dependence Vectors for Loop Parallelization Shaw-Yen Tseng Chung-Ta King Chuan-Yi Tang Department of Computer Science National Tsing Hua University Hsinchu, Taiwan, R.O.C. fdr788301,king,cytangg@cs.nthu.edu.tw

More information

Department of. Computer Science. Uniqueness and Completeness. Analysis of Array. Comprehensions. December 15, Colorado State University

Department of. Computer Science. Uniqueness and Completeness. Analysis of Array. Comprehensions. December 15, Colorado State University Department of Computer Science Uniqueness and Completeness Analysis of Array Comprehensions David Garza and Wim Bohm Technical Report CS-93-132 December 15, 1993 Colorado State University Uniqueness and

More information

Advanced Compiler Construction Theory And Practice

Advanced Compiler Construction Theory And Practice Advanced Compiler Construction Theory And Practice Introduction to loop dependence and Optimizations 7/7/2014 DragonStar 2014 - Qing Yi 1 A little about myself Qing Yi Ph.D. Rice University, USA. Associate

More information

CS 526 Advanced Topics in Compiler Construction. 1 of 12

CS 526 Advanced Topics in Compiler Construction. 1 of 12 CS 526 Advanced Topics in Compiler Construction 1 of 12 Course Organization Instructor: David Padua 3-4223 padua@uiuc.edu Office hours: By appointment Course material: Website Textbook: Randy Allen and

More information

Visual Amortization Analysis of Recompilation Strategies

Visual Amortization Analysis of Recompilation Strategies 2010 14th International Information Conference Visualisation Information Visualisation Visual Amortization Analysis of Recompilation Strategies Stephan Zimmer and Stephan Diehl (Authors) Computer Science

More information

i=1 i=2 i=3 i=4 i=5 x(4) x(6) x(8)

i=1 i=2 i=3 i=4 i=5 x(4) x(6) x(8) Vectorization Using Reversible Data Dependences Peiyi Tang and Nianshu Gao Technical Report ANU-TR-CS-94-08 October 21, 1994 Vectorization Using Reversible Data Dependences Peiyi Tang Department of Computer

More information

Briki: a Flexible Java Compiler

Briki: a Flexible Java Compiler Briki: a Flexible Java Compiler Michał Cierniak Wei Li Department of Computer Science University of Rochester Rochester, NY 14627 fcierniak,weig@cs.rochester.edu May 1996 Abstract We present a Java compiler

More information

Compiling for Advanced Architectures

Compiling for Advanced Architectures Compiling for Advanced Architectures In this lecture, we will concentrate on compilation issues for compiling scientific codes Typically, scientific codes Use arrays as their main data structures Have

More information

Center for Supercomputing Research and Development. recognizing more general forms of these patterns, notably

Center for Supercomputing Research and Development. recognizing more general forms of these patterns, notably Idiom Recognition in the Polaris Parallelizing Compiler Bill Pottenger and Rudolf Eigenmann potteng@csrd.uiuc.edu, eigenman@csrd.uiuc.edu Center for Supercomputing Research and Development University of

More information

Advanced Compiler Construction

Advanced Compiler Construction CS 526 Advanced Compiler Construction http://misailo.cs.illinois.edu/courses/cs526 INTERPROCEDURAL ANALYSIS The slides adapted from Vikram Adve So Far Control Flow Analysis Data Flow Analysis Dependence

More information

Lecture 5. Data Flow Analysis

Lecture 5. Data Flow Analysis Lecture 5. Data Flow Analysis Wei Le 2014.10 Abstraction-based Analysis dataflow analysis: combines model checking s fix point engine with abstract interpretation of data values abstract interpretation:

More information

Feedback Guided Scheduling of Nested Loops

Feedback Guided Scheduling of Nested Loops Feedback Guided Scheduling of Nested Loops T. L. Freeman 1, D. J. Hancock 1, J. M. Bull 2, and R. W. Ford 1 1 Centre for Novel Computing, University of Manchester, Manchester, M13 9PL, U.K. 2 Edinburgh

More information

The Essence of Compiling with Continuations

The Essence of Compiling with Continuations RETROSPECTIVE: The Essence of Compiling with Continuations Cormac Flanagan Amr Sabry Bruce F. Duba Matthias Felleisen Systems Research Center Compaq cormac.flanagan@compaq.com Dept. of Computer Science

More information

On Privatization of Variables for Data-Parallel Execution

On Privatization of Variables for Data-Parallel Execution On Privatization of Variables for Data-Parallel Execution Manish Gupta IBM T. J. Watson Research Center P. O. Box 218 Yorktown Heights, NY 10598 mgupta@watson.ibm.com Abstract Privatization of data is

More information

A PRACTICAL ALGORITHM

A PRACTICAL ALGORITHM William Pugh A PRACTICAL ALGORITHM for Exact Array Dependence Analysis ndamental analis step in an ad- ',nced optimizing compiler (as well as many other software tools) is data dependence analysis for

More information

INTERPROCEDURAL PARALLELIZATION USING MEMORY CLASSIFICATION ANALYSIS BY JAY PHILIP HOEFLINGER B.S., University of Illinois, 1974 M.S., University of I

INTERPROCEDURAL PARALLELIZATION USING MEMORY CLASSIFICATION ANALYSIS BY JAY PHILIP HOEFLINGER B.S., University of Illinois, 1974 M.S., University of I cflcopyright by Jay Philip Hoeflinger 2000 INTERPROCEDURAL PARALLELIZATION USING MEMORY CLASSIFICATION ANALYSIS BY JAY PHILIP HOEFLINGER B.S., University of Illinois, 1974 M.S., University of Illinois,

More information

Typed Fusion with Applications to Parallel and Sequential Code Generation

Typed Fusion with Applications to Parallel and Sequential Code Generation Typed Fusion with Applications to Parallel and Sequential Code Generation Ken Kennedy Kathryn S. McKinley Department of Computer Science Department of Computer Science Rice University, CITI University

More information

Design-Driven Compilation

Design-Driven Compilation Design-Driven Compilation Radu Rugina and Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 {rugina, rinard@lcs.mit.edu Abstract. This paper introduces

More information

Parallelizing SPECjbb2000 with Transactional Memory

Parallelizing SPECjbb2000 with Transactional Memory Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh, Brian D. Carlstrom, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, caominh, bdc, kozyraki}@stanford.edu

More information

A Data Dependence Graph in Polaris. July 17, Center for Supercomputing Research and Development. Abstract

A Data Dependence Graph in Polaris. July 17, Center for Supercomputing Research and Development. Abstract A Data Dependence Graph in Polaris Yunheung Paek Paul Petersen July 17, 1996 Center for Supercomputing Research and Development University of Illinois at Urbana-Champaign Urbana, Illinois 61801 Abstract

More information

Originally appeared at Supercomputing 91 This expanded version appeared in Comm. of the ACM, August 1992

Originally appeared at Supercomputing 91 This expanded version appeared in Comm. of the ACM, August 1992 Originally appeared at Supercomputing 91 This expanded version appeared in Comm. of the ACM, August 1992 The Omega Test: a fast and practical integer programming algorithm for dependence analysis William

More information

Research Statement. 1 My Approach to Research. John Whaley January 2005

Research Statement. 1 My Approach to Research. John Whaley January 2005 Research Statement John Whaley January 2005 1 My Approach to Research As a child, I was always interested in building things. When I was six years old, I taught myself programming by typing in programs

More information

Keywords AST, Pattern Matching, Automatic Parallelization, Loop Parallelization, Python

Keywords AST, Pattern Matching, Automatic Parallelization, Loop Parallelization, Python Volume 7, Issue 3, March 217 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Automatic Parallelizing

More information

Using Cache Models and Empirical Search in Automatic Tuning of Applications. Apan Qasem Ken Kennedy John Mellor-Crummey Rice University Houston, TX

Using Cache Models and Empirical Search in Automatic Tuning of Applications. Apan Qasem Ken Kennedy John Mellor-Crummey Rice University Houston, TX Using Cache Models and Empirical Search in Automatic Tuning of Applications Apan Qasem Ken Kennedy John Mellor-Crummey Rice University Houston, TX Outline Overview of Framework Fine grain control of transformations

More information

Lecture Notes on Liveness Analysis

Lecture Notes on Liveness Analysis Lecture Notes on Liveness Analysis 15-411: Compiler Design Frank Pfenning André Platzer Lecture 4 1 Introduction We will see different kinds of program analyses in the course, most of them for the purpose

More information

Report on article The Travelling Salesman Problem: A Linear Programming Formulation

Report on article The Travelling Salesman Problem: A Linear Programming Formulation Report on article The Travelling Salesman Problem: A Linear Programming Formulation Radosław Hofman, Poznań 2008 Abstract This article describes counter example prepared in order to prove that linear formulation

More information

A Compiler-Directed Cache Coherence Scheme Using Data Prefetching

A Compiler-Directed Cache Coherence Scheme Using Data Prefetching A Compiler-Directed Cache Coherence Scheme Using Data Prefetching Hock-Beng Lim Center for Supercomputing R & D University of Illinois Urbana, IL 61801 hblim@csrd.uiuc.edu Pen-Chung Yew Dept. of Computer

More information

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations

Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations Theory and Algorithms for the Generation and Validation of Speculative Loop Optimizations Ying Hu Clark Barrett Benjamin Goldberg Department of Computer Science New York University yinghubarrettgoldberg

More information

Interprocedural Symbolic Range Propagation for Optimizing Compilers

Interprocedural Symbolic Range Propagation for Optimizing Compilers Interprocedural Symbolic Range Propagation for Optimizing Compilers Hansang Bae and Rudolf Eigenmann School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 47907 {baeh,eigenman}@purdue.edu

More information

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers.

Objective. We will study software systems that permit applications programs to exploit the power of modern high-performance computers. CS 612 Software Design for High-performance Architectures 1 computers. CS 412 is desirable but not high-performance essential. Course Organization Lecturer:Paul Stodghill, stodghil@cs.cornell.edu, Rhodes

More information

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26

Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer Science The Australian National University Canberra ACT 26 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Technical Report ANU-TR-CS-92- November 7, 992 Exact Side Eects for Interprocedural Dependence Analysis Peiyi Tang Department of Computer

More information

The Relationships between Domain Specific and General- Purpose Languages

The Relationships between Domain Specific and General- Purpose Languages The Relationships between Domain Specific and General- Purpose Languages Oded Kramer and Arnon Sturm Department of Information Systems Engineering, Ben-Gurion University of the Negev Beer-Sheva, Israel

More information

Lazy Code Motion. Jens Knoop FernUniversität Hagen. Oliver Rüthing University of Dortmund. Bernhard Steffen University of Dortmund

Lazy Code Motion. Jens Knoop FernUniversität Hagen. Oliver Rüthing University of Dortmund. Bernhard Steffen University of Dortmund RETROSPECTIVE: Lazy Code Motion Jens Knoop FernUniversität Hagen Jens.Knoop@fernuni-hagen.de Oliver Rüthing University of Dortmund Oliver.Ruething@udo.edu Bernhard Steffen University of Dortmund Bernhard.Steffen@udo.edu

More information

Reducing Parallelizing Compilation Time by Removing Redundant Analysis

Reducing Parallelizing Compilation Time by Removing Redundant Analysis Reducing Parallelizing Compilation Time by Removing Redundant Analysis Jixin Han Rina Fujino Ryota Tamura Mamoru Shimaoka Hiroki Mikami Waseda University, Japan {kalfazed,rfujino,r tamura,shimaoka,hiroki}

More information

Data Dependency. Extended Contorol Dependency. Data Dependency. Conditional Branch. AND OR Original Control Flow. Control Flow. Conditional Branch

Data Dependency. Extended Contorol Dependency. Data Dependency. Conditional Branch. AND OR Original Control Flow. Control Flow. Conditional Branch Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara fishizaka,obata,kasaharag@oscar.elec.waseda.ac.jp Dept.EECE,

More information

Compiler techniques for leveraging ILP

Compiler techniques for leveraging ILP Compiler techniques for leveraging ILP Purshottam and Sajith October 12, 2011 Purshottam and Sajith (IU) Compiler techniques for leveraging ILP October 12, 2011 1 / 56 Parallelism in your pocket LINPACK

More information

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING

JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING JULIA ENABLED COMPUTATION OF MOLECULAR LIBRARY COMPLEXITY IN DNA SEQUENCING Larson Hogstrom, Mukarram Tahir, Andres Hasfura Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 18.337/6.338

More information

DIGITAL SIGNAL PROCESSING AND ITS USAGE

DIGITAL SIGNAL PROCESSING AND ITS USAGE DIGITAL SIGNAL PROCESSING AND ITS USAGE BANOTHU MOHAN RESEARCH SCHOLAR OF OPJS UNIVERSITY ABSTRACT High Performance Computing is not the exclusive domain of computational science. Instead, high computational

More information

Lecture Notes on Dataflow Analysis

Lecture Notes on Dataflow Analysis Lecture Notes on Dataflow Analysis 15-411: Compiler Design Frank Pfenning Lecture 5 September 9, 2008 1 Introduction In this lecture we first extend liveness analysis to handle memory references and then

More information

Bandwidth-Based Performance Tuning and Prediction

Bandwidth-Based Performance Tuning and Prediction !#"$ % &(' ) *+,-. %#/ 01 24357698;:06=6@BA5C6DA5C6615@E F GHIFJ & GLKNMOQPRQCS GHT 0 U9T Q"DVWXZYQK KNK [#\0]_^`\0aXbdc aex\0f`\)ghà ikjlcm].nghakghopop\0oq[#c r sutu^kgh^`vpcm] evpi0\qw]xvzym\0à f{vp^}

More information

arxiv: v3 [cs.dc] 1 Mar 2013

arxiv: v3 [cs.dc] 1 Mar 2013 OPTIMIZING SYNCHRONIZATION ALGORITHM FOR AUTO-PARALLELIZING COMPILER Gang Liao, Si-hui Qin, Long-fei Ma, Qi Sun Department of Computer Science and Engineering, Sichuan University Jinjiang College, China,

More information

Extended Linear Scan: an Alternate Foundation for Global Register Allocation

Extended Linear Scan: an Alternate Foundation for Global Register Allocation Extended Linear Scan: an Alternate Foundation for Global Register Allocation Vivek Sarkar 1 and Rajkishore Barik 2 1 IBM T.J. Watson Research Center, Email: vsarkar@us.ibm.com 2 IBM India Research Laboratory,

More information

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA B.E., Nagpur University, 1998 M.S., University of Illinois at Urbana-Champaign, 2001 DISSERTATION Submitted in partial fulfillment of

More information

Thread-Sensitive Points-to Analysis for Multithreaded Java Programs

Thread-Sensitive Points-to Analysis for Multithreaded Java Programs Thread-Sensitive Points-to Analysis for Multithreaded Java Programs Byeong-Mo Chang 1 and Jong-Deok Choi 2 1 Dept. of Computer Science, Sookmyung Women s University, Seoul 140-742, Korea chang@sookmyung.ac.kr

More information

Tiling: A Data Locality Optimizing Algorithm

Tiling: A Data Locality Optimizing Algorithm Tiling: A Data Locality Optimizing Algorithm Previously Unroll and Jam Homework PA3 is due Monday November 2nd Today Unroll and Jam is tiling Code generation for fixed-sized tiles Paper writing and critique

More information

Affine and Unimodular Transformations for Non-Uniform Nested Loops

th WSEAS International Conference on COMPUTERS, Heraklion, Greece, July 3-, 008 Affine and Unimodular Transformations for Non-Uniform Nested Loops FAWZY A. TORKEY, AFAF A. SALAH, NAHED M. EL DESOUKY and

Lecture Notes on Register Allocation

15-411: Compiler Design Frank Pfenning Lecture 3 September 1, 2009 1 Introduction In this lecture we discuss register allocation, which is one of the last steps in

Analyzing programs with explicit parallelism

Oregon Health & Science University OHSU Digital Commons CSETech June 1991 Analyzing programs with explicit parallelism Harini Srinivasan Michael Wolfe Follow this and additional works at: http://digitalcommons.ohsu.edu/csetech

A Geometric Approach for Partitioning N-Dimensional Non-Rectangular Iteration Spaces

Arun Kejariwal, Paolo D'Alberto, Alexandru Nicolau Constantine D. Polychronopoulos Center for Embedded Computer Systems

Software pipelining of nested loops 2 1 Introduction Exploiting parallelism in loops in scientific programs is an important factor in realizing the pote

Software pipelining of nested loops J. Ramanujam Dept. of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 70803 E-mail: jxr@ee.lsu.edu May 1994 Abstract This paper presents

A Framework for the Performance Evaluation of Operating System Emulators. Joshua H. Shaffer. A Proposal Submitted to the Honors Council

A Framework for the Performance Evaluation of Operating System Emulators by Joshua H. Shaffer A Proposal Submitted to the Honors Council For Honors in Computer Science 15 October 2003 Approved By: Luiz

Parallel Programming. Michael Gerndt Technische Universität München

Parallel Programming Michael Gerndt Technische Universität München gerndt@in.tum.de Contents 1. Introduction 2. Parallel architectures 3. Parallel applications 4. Parallelization approach 5. OpenMP 6.

The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

160 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 2, FEBRUARY 1999 The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization Lawrence

Automatic Discovery of Coarse-Grained Parallelism in Media Applications

Shane Ryoo, Sain-Zee Ueng, Christopher I. Rodrigues, Robert E. Kidd, Matthew I. Frank, and Wen-mei W. Hwu Center for Reliable and

Combining Interprocedural Pointer Analysis

RC 21532 (96749) 3/17/99 Computer Science IBM Research Report Combining Interprocedural Pointer Analysis and Conditional Constant Propagation Anthony Pioli Forman Interactive 134 5th Ave New York, NY 10011

Advanced Program Analyses and Verifications

Thi Viet Nga NGUYEN François IRIGOIN Centre de Recherche en Informatique - Ecole des Mines de Paris 35 rue Saint Honoré, 77305 Fontainebleau Cedex, France email:

A SIMDizing C Compiler for the Mitsubishi Electric Neuro4 Processor Array

Mitsubishi Electric Research Laboratories MERL/SV 95TR031 December 18, 1995 A SIMDizing C Compiler for the Mitsubishi Electric Neuro4 Processor Array Venkat Konda, Hugh Lauer, Katsunobu Muroi, Kenichi

Modeling Dependencies for Cascading Selective Undo

Aaron G. Cass and Chris S. T. Fernandes Union College, Schenectady, NY 12308, USA, {cassa fernandc}@union.edu Abstract. Linear and selective undo mechanisms

RICE UNIVERSITY. Transforming Complex Loop Nests For Locality by Qing Yi

RICE UNIVERSITY Transforming Complex Loop Nests For Locality by Qing Yi A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY Approved, Thesis Committee: Ken

Data structures for optimizing programs with explicit parallelism

Oregon Health & Science University OHSU Digital Commons CSETech March 1991 Data structures for optimizing programs with explicit parallelism Michael Wolfe Harini Srinivasan Follow this and additional works

Extending Blaise Capabilities in Complex Data Collections

Paul Segel and Kathleen O'Reagan, Westat International Blaise Users Conference, April 2012, London, UK Summary: Westat Visual Survey (WVS) was developed

Optimizing Inter-Nest Data Locality Using Loop Splitting and Reordering

Sofiane Naci The Computer Laboratory, University of Cambridge JJ Thompson Avenue Cambridge CB3 FD United Kingdom Sofiane.Naci@cl.cam.ac.uk

been implemented as part of the PTRAN (Parallel Translation) project at IBM Research [ABC + 87]. The PTRAN system contains a program database which ca

Determining Average Program Execution Times and their Variance Vivek Sarkar IBM Research T. J. Watson Research Center P. O. Box 704, Yorktown Heights, NY 10598 Abstract This paper presents a general framework

Discrete Optimization. Lecture Notes 2

Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

Compiling Java For High Performance on Servers

Ken Kennedy Center for Research on Parallel Computation Rice University Goal: Achieve high performance without sacrificing language compatibility and portability.

HOW AND WHEN TO FLATTEN JAVA CLASSES?

Jehad Al Dallal Department of Information Science, P.O. Box 5969, Safat 13060, Kuwait ABSTRACT Improving modularity and reusability are two key objectives in object-oriented

cies. IEEE Trans. Comput., 38, 5 (May),

cies. IEEE Trans. Comput., 38, 5 (May), 663-678. [33] M. D. Smith. Support for speculative execution in high-performance processors. PhD thesis, Stanford University, November 1992. [34] D. W. Wall. Limits

Regression-Based Multi-Model Prediction of Data Reuse Signature

Xipeng Shen Yutao Zhong Chen Ding Computer Science Department, University of Rochester {xshen,ytzhong,cding}@cs.rochester.edu Abstract As

An Object Oriented Runtime Complexity Metric based on Iterative Decision Points

Amr F. Desouky 1, Letha H. Etzkorn 2 1 Computer Science Department, University of Alabama in Huntsville, Huntsville, AL, USA 2 Computer Science

A General Greedy Approximation Algorithm with Applications

Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, NY 10598 tzhang@watson.ibm.com Abstract Greedy approximation algorithms have been

Towards Automatic Parallelisation for Multi-Processor DSPs

Björn Franke Michael O'Boyle Institute for Computing Systems Architecture (ICSA) Division of Informatics, University of Edinburgh Abstract This

Speculative Synchronization

José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

Compile-time Inter-query Dependence Analysis

Srinivasan Parthasarathy, Wei Li, Michał Cierniak, Mohammed Javeed Zaki Department of Computer Science, University of Rochester, Rochester, NY 14627-0226 {srini,wei,cierniak,zaki}@cs.rochester.edu

Hierarchical Pointer Analysis for Distributed Programs

Amir Kamil Computer Science Division, University of California, Berkeley kamil@cs.berkeley.edu April 14, 2006 1 Introduction Many distributed, parallel

Lecture 2: Control Flow Analysis

COM S/CPRE 513 x: Foundations and Applications of Program Analysis Spring 2018 Instructor: Wei Le Lecture 2: Control Flow Analysis 2.1 What is Control Flow Analysis Given program source code, control flow

THREAD-LEVEL AUTOMATIC PARALLELIZATION IN THE ELBRUS OPTIMIZING COMPILER

L. Mukhanov email: mukhanov@mcst.ru P. Ilyin email: ilpv@mcst.ru S. Shlykov email: shlykov@mcst.ru A. Ermolitsky email: era@mcst.ru

CS426 Compiler Construction Fall 2006

CS426 Compiler Construction David Padua Department of Computer Science University of Illinois at Urbana-Champaign 0. Course organization 2 of 23 Instructor: David A. Padua 4227 SC, 333-4223 Office Hours:

Static and Dynamic Evaluation of Data Dependence Analysis*

Paul M. Petersen David A. Padua Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 465 CSRL, 1308

Lightweight Barrier-Based Parallelization Support for Non-Cache-Coherent MPSoC Platforms

Andrea Marongiu DEIS University of Bologna Viale Risorgimento 2 40133 Bologna amarongiu@deis.unibo.it Luca Benini