Interprocedural Dependence Analysis and Parallelization
RETROSPECTIVE: Interprocedural Dependence Analysis and Parallelization

Michael G Burke
IBM T.J. Watson Research Labs, P.O. Box 704, Yorktown Heights, NY USA
mgburke@us.ibm.com

Ron K. Cytron
Department of Computer Science and Engineering, Washington University, Campus Box 1045, St Louis, MO USA
cytron@acm.org

ABSTRACT

The area of dependence analysis has served as grounds for fruitful research as well as practical implementation. Compilers and tools that utilize dependence information can generate code that takes advantage of parallel resources and storage hierarchies on modern architectures. Here, we offer some historical background on the context and thinking that fostered our 1986 paper. We also attempt to summarize the direction research in this area has taken since the paper's appearance.

Background

In 1985, when this paper was submitted to PLDI, the authors of this paper were members of the PTRAN (Parallel TRANslation) group at IBM T. J. Watson Research Labs in Yorktown Heights. Fran Allen, now Research Staff Member Emerita, directed the group, whose research included program optimizations and transformations for parallel architectures. Fran asked us to think about a compile-time test for array overlap that would be appropriate for Fortran, where arrays are statically declared but can overlap in nonobvious ways across different compilation units. The problem thus posed was interprocedural in nature, but it was complicated by Fortran COMMON blocks and other such structures by which a given location in memory could be known by different names.

We surveyed literature on dependence analysis and concluded that a subscript test, of the kind formulated by Banerjee-Wolfe, would be appropriate. That test, however, proceeds subscript-by-subscript, and holds only if array indices do not violate their declared bounds.
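To make the bounds caveat concrete, the following minimal Python sketch maps 1-based subscripts to a memory offset using Fortran's column-major layout, with no bounds checking. The function name `linear_index` and the 10-by-10 array `A` are hypothetical, chosen only for illustration; nothing here is taken from the PTRAN implementation.

```python
def linear_index(subscripts, dims):
    """Column-major (Fortran-style) offset of a 1-based subscript tuple.

    Deliberately performs no bounds checking, so an out-of-range
    subscript simply spills into the next column.
    """
    offset, stride = 0, 1
    for s, d in zip(subscripts, dims):
        offset += (s - 1) * stride
        stride *= d
    return offset


# Hypothetical declaration: REAL A(10,10)
dims = (10, 10)

in_bounds = linear_index((1, 2), dims)       # A(1,2)
out_of_bounds = linear_index((11, 1), dims)  # A(11,1) exceeds the first bound

# Both references name the same memory location.
print(in_bounds, out_of_bounds, in_bounds == out_of_bounds)
```

A test that compares subscripts one dimension at a time sees that 11 differs from 1 and that 1 differs from 2, and so would report the two references as distinct, yet both resolve to the same offset.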
Fortran offered no mechanism to determine the size of an array dimension at runtime, nor were runtime violations of declared bounds cause for terminating a Fortran program. Thus, Fortran programmers violated declared array bounds with abandon. It occurred to us that the lower-level view of an array subscript is simply a linear index in memory, and all Fortran compilers eventually generate code to treat arrays of higher dimension as a one-dimensional vector. By applying Banerjee-Wolfe to the linearized subscript form, Fran's problem could be solved conservatively: the compile-time test is reliable concerning array independence, but the test may flag some arrays as overlapping when in fact they are independent. Fortunately, this approach is appropriate for a compile-time test. Further, it turned out that for some higher-dimension arrays, tests on the linearized form could prove independence where subscript-by-subscript testing could not.

20 Years of the ACM/SIGPLAN Conference on Programming Language Design and Implementation ( ): A Selection, Copyright 2003 ACM $5.00.
ACM SIGPLAN 139 Best of PLDI

Our paper is perhaps better known for its hierarchical reformulation of Wolfe's direction vectors. A direction vector shows the direction of a data dependence in terms of an iteration space. In a single-loop environment, Wolfe's "<" dependence (called a true dependence by Kennedy and Allen) moves forward through the iteration space, while Wolfe's ">" dependence moves backward. We showed that dependence testing could proceed first by testing for dependences over all direction vectors (which we called "?"). If the test is positive, then further refinement of "?" into Wolfe's direction vectors can provide more information about the nature of the dependence. This reformulation was useful, especially for the problem posed by Fran, because many array expressions that might appear to overlap have absolutely no overlap once the linearized references are obtained. The "?"
test quickly obviates the need for further dependence testing between the arrays.

Subsequent Developments

At the time of our paper's publication in 1986, we had implemented the dependence-testing aspects of the paper and verified the efficiency on the Perfect benchmarks, a popular suite of Fortran benchmarks at that time. While our paper suggested that interprocedural subscript analysis might uncover more parallelism, we did not implement the work to that extent, and so that hypothesis remained open. Michael Hind added the full interprocedural subscript analysis described in our paper. His experiments showed that the extra analysis did not in fact expose much more parallelism than did the intraprocedural version we had implemented [2].

Subsequently, Mary Hall, continuing work she had begun at Rice but now at Stanford working with Monica Lam and others, showed that exposing significant parallelism on the Perfect benchmarks required powerful transformations like array privatization. The Stanford experiments [1] were performed against the PTRAN measurements on the Perfect benchmarks that Hind, et al. had described in our paper. In a subsequent paper [3], the Stanford group cited our approach as the standard one for computing direction vectors. In his book [4], Michael Wolfe acknowledged and adopted our framework for computing dependence relations hierarchically.

While dependence testing of the form described in our paper does not see much use these days for uncovering parallelism in "dusty deck" Fortran programs, sophisticated analysis of this form is present in tools and in compilers that restructure programs for advanced architectures, including those that feature elements of parallelism as well as deep storage hierarchies. At IBM, our dependence test found its way into the IBM XL Fortran product, which first shipped in 1996, ten years after the publication of our paper.

Dependence analysis is a specialized area of computer science, but it has served as a fertile ground for theoretical and practical research. We are pleased to have been part of its noble history and we thank the selection committee for this honor.

1. ACKNOWLEDGEMENTS

This work builds on the work of two groups who pioneered the area of dependence analysis: from Illinois, David Kuck, Utpal Banerjee, and Michael Wolfe; from Rice, Ken Kennedy and Randy Allen. The authors thank Fran Allen for inspiring and supporting this work and Vivek Sarkar for advocating our work for this recognition.

REFERENCES

[1] M. W. Hall, S. Amarasinghe, B. Murphy, S. Liao, and M. Lam. Detecting coarse-grain parallelism using an interprocedural parallelizing compiler. Proceedings of Supercomputing '95.
[2] Michael Hind, Michael Burke, Paul Carini, and Sam Midkiff. An Empirical Study of Precise Interprocedural Array Analysis. Scientific Programming, 3(3).
[3] Maydan, Hennessy, and Lam. Efficient and exact data dependence analysis. PLDI.
[4] Michael J. Wolfe. Optimizing Supercompilers for Supercomputers. Pitman, London and The MIT Press, Cambridge, Massachusetts. In the series Research Monographs in Parallel and Distributed Computing. This monograph is a revised version of the author's Ph.D. dissertation published as Technical Report UIUCDCS-R , U. Illinois at Urbana-Champaign.
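As a supplementary illustration of the hierarchical direction-vector testing described in the Background section, here is a minimal Python sketch. It tests the all-directions vector first and refines one component at a time only when a dependence is still possible. Important assumptions: the dependence check is a brute-force enumeration over a small iteration space, used as a stand-in for the Banerjee-Wolfe inequalities rather than an implementation of them; the names `depends`, `refine`, and the example subscript functions are hypothetical; and the wildcard is written "?" only to match the text above.

```python
from itertools import product

# Each direction relates a source iteration index x to a sink index y.
DIRS = {
    "<": lambda x, y: x < y,
    "=": lambda x, y: x == y,
    ">": lambda x, y: x > y,
    "?": lambda x, y: True,  # any direction
}


def depends(f, g, bounds, dv):
    """True if f(i) == g(j) for some iteration pair (i, j) allowed by dv.

    Brute force over the iteration space: a stand-in for a real
    dependence test, usable only for tiny loop bounds.
    """
    space = list(product(*[range(lo, hi + 1) for lo, hi in bounds]))
    return any(
        f(*i) == g(*j) and all(DIRS[d](x, y) for d, x, y in zip(dv, i, j))
        for i in space
        for j in space
    )


def refine(f, g, bounds, dv=None, level=0):
    """Refine '?' entries left to right, pruning vectors with no dependence."""
    if dv is None:
        dv = ("?",) * len(bounds)
    if not depends(f, g, bounds, dv):
        return []  # one failed test rules out every refinement below dv
    if level == len(dv):
        return [dv]
    found = []
    for d in "<=>":
        found += refine(f, g, bounds, dv[:level] + (d,) + dv[level + 1:], level + 1)
    return found


# Linearized references for a hypothetical array with leading dimension 10.
def write_ref(i, j):      # A(i, j)
    return i + 10 * j

def read_ref(i, j):       # A(i-1, j)
    return (i - 1) + 10 * j

def even_ref(i, j):       # always an even offset
    return 2 * i + 10 * j

def odd_ref(i, j):        # always an odd offset
    return 2 * i + 1 + 10 * j

bounds = [(1, 4), (1, 3)]  # i in 1..4, j in 1..3
print(refine(write_ref, read_ref, bounds))
print(refine(even_ref, odd_ref, bounds))
```

In the second call the single all-directions test already fails, so no refined direction vector is ever examined, which is the pruning effect the text attributes to the "?" test.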
More informationTowards Automatic Parallelisation for Multi-Processor DSPs
Towards Automatic Parallelisation for Multi-Processor DSPs Björn Franke Michael O Boyle Institute for Computing Systems Architecture (ICSA) Division of Informatics, University of Edinburgh Abstract This
More informationSpeculative Synchronization
Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization
More informationCompile-time Inter-query Dependence Analysis
Compile-time Inter-query Dependence Analysis Srinivasan Parthasarathy, Wei Li, Michał Cierniak, Mohammed Javeed Zaki Department of Computer Science, University of Rochester, Rochester, NY 14627-0226 fsrini,wei,cierniak,zakig@cs.rochester.edu
More informationHierarchical Pointer Analysis for Distributed Programs
Hierarchical Pointer Analysis for Distributed Programs Amir Kamil Computer Science Division, University of California, Berkeley kamil@cs.berkeley.edu April 14, 2006 1 Introduction Many distributed, parallel
More informationLecture 2: Control Flow Analysis
COM S/CPRE 513 x: Foundations and Applications of Program Analysis Spring 2018 Instructor: Wei Le Lecture 2: Control Flow Analysis 2.1 What is Control Flow Analysis Given program source code, control flow
More informationTHREAD-LEVEL AUTOMATIC PARALLELIZATION IN THE ELBRUS OPTIMIZING COMPILER
THREAD-LEVEL AUTOMATIC PARALLELIZATION IN THE ELBRUS OPTIMIZING COMPILER L. Mukhanov email: mukhanov@mcst.ru P. Ilyin email: ilpv@mcst.ru S. Shlykov email: shlykov@mcst.ru A. Ermolitsky email: era@mcst.ru
More informationCS426 Compiler Construction Fall 2006
CS426 Compiler Construction David Padua Department of Computer Science University of Illinois at Urbana-Champaign 0. Course organization 2 of 23 Instructor: David A. Padua 4227 SC, 333-4223 Office Hours:
More informationStatic and Dynamic Evaluation of Data Dependence Analysis*
Static and Dynamic Evaluation of Data Dependence Analysis* Paul M. Petersen David A. Padua Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 465 CSRL, 1308
More informationModeling Dependencies for Cascading Selective Undo
Modeling Dependencies for Cascading Selective Undo Aaron G. Cass and Chris S. T. Fernandes Union College, Schenectady, NY 12308, USA, {cassa fernandc}@union.edu Abstract. Linear and selective undo mechanisms
More informationLightweight Barrier-Based Parallelization Support for Non-Cache-Coherent MPSoC Platforms
Lightweight Barrier-Based Parallelization Support for Non-Cache-Coherent MPSoC Platforms Andrea Marongiu DEIS University of Bologna Viale Risorgimento 2 40133 Bologna amarongiu@deis.unibo.it Luca Benini
More information