Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr

Size: px
Start display at page:

Download "Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr"

Transcription

1 Source Code Optimization Techniques for Data Flow Dominated Embedded Software Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften der Universität Dortmund am Fachbereich Informatik von Heiko Falk Dortmund 2004

2 Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr. Peter Marwedel, Prof. Dr. Peter Padawitz

3

4

5 Contents v Contents List of Figures List of Tables ix xiii 1 Introduction Why Source Code Optimization? Abstraction Levels of Code Optimization Survey of the traditional Code Optimization Process Scopes for Code Optimization Target Application Domain Goals and Contributions Outline of the Thesis Author's Contribution to the Thesis Existing Code Optimization Techniques Specification Optimization Algorithm Selection Memory Hierarchy Exploitation Processor-independent Source Code Optimizations Processor-specific Source Code Optimizations Compiler Optimizations Loop Optimizations for High-Performance Computing Code Generation for embedded Processors Fundamental Concepts for Optimization and Evaluation Polyhedral Modeling Optimization using Genetic Algorithms Benchmarking Methodology Profiling of Pipeline and Cache Performance Compilation for Runtime and Code Size Measurement Estimation of Energy Dissipation

6 vi Contents 3.4 Summary Intermediate Representations Low-level Intermediate Representations GNU RTL Trimaran ELCOR IR Medium-level Intermediate Representations Sun IR IR-C / LANCE High-level Intermediate Representations SUIF IMPACT Selection of an IR for Source Code Optimization Summary Loop Nest Splitting Introduction Control Flow Overhead in Data Flow dominated Software Control Flow Overhead caused by Data Partitioning Splitting of Loop Nests for Control Flow Optimization Related Work Analysis and Optimization Techniques for Loop Nest Splitting Preliminaries Condition Satisfiability Condition Optimization Chromosomal Representation Fitness Function Polytope Generation Global Search Space Construction Global Search Space Exploration Chromosomal Representation Fitness Function Source Code Transformation Generation of the splitting If-Statement Loop Nest Duplication Extensions for Loops with non-constant Bounds Experimental Results Stand-alone Loop Nest Splitting Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption

7 Contents vii Combined Data Partitioning and Loop Nest Splitting for energy-efficient Scratchpad Utilization Execution Times and Code Sizes Energy Consumption Summary Advanced Code Hoisting A motivating Example Related Work Analysis Techniques for Advanced Code Hoisting Common Subexpression Identification Collection of equivalent Expressions Computation of Live Ranges of Expressions Determination of the outermost Loop for a CSE Computation of Execution Frequencies using Polytope Models Experimental Results Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption Summary Ring Buffer Replacement Motivation Optimization Steps Ring Buffer Scalarization Loop Unrolling for Ring Buffers Experimental Results Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption Summary Summary and Conclusions Summary and Contribution to Research Future Work A Experimental Comparison of SUIF and IR-C / LANCE 199

8 viii Contents B Benchmarking Data for Loop Nest Splitting 201 B.1 Values of performance-monitoring Counters B.1.1 Intel Pentium III B.1.2 Sun UltraSPARC III B.1.3 MIPS R B.2 Execution Times and Code Sizes B.3 Energy Consumption of an ARM7TDMI Core B.4 Combined Data Partitioning and Loop Nest Splitting B.4.1 Execution Times and Code Sizes B.4.2 Energy Consumption C Benchmarking Data for Advanced Code Hoisting 213 C.1 Values of performance-monitoring Counters C.1.1 Intel Pentium III C.1.2 Sun UltraSPARC III C.1.3 MIPS R C.2 Execution Times and Code Sizes C.3 Energy Consumption of an ARM7TDMI Core D Benchmarking Data for Ring Buffer Replacement 219 D.1 Values of performance-monitoring Counters D.1.1 Intel Pentium III D.1.2 Sun UltraSPARC III D.1.3 MIPS R D.2 Execution Times and Code Sizes D.3 Energy Consumption of an ARM7TDMI Core References 223 Index 235

9 List of Figures ix List of Figures 1.1 Examples of typical Embedded Systems Abstraction Levels and achievable Improvements of Code Optimizations Structure of Optimizing Compilers Implementation of a saturating Addition Abstraction Levels of Code Optimizations Overview of the DTSE Methodology Replacement of complex Code by atic6x Intrinsic A convex Polytope A non-convex Set of Points Schematic Overview of Genetic Algorithms Genetic Algorithm Encoding Crossover Mutation Overview of the Workflow employed in the encc Compiler Experimental Comparison of Runtimes based on SUIF and IR-C / LANCE Atypical Loop Nest Atypical Code Fragment before and after Data Partitioning Loop Nest after Splitting Conventional Loop Splitting Conventional Loop Unswitching Design Flow of Loop Nest Splitting Conditions as Polytopes Structure of the Fitness Function for Global Search Space Exploration Atypical Loop Nest with variable Loop Bounds Structure of the Fitness Function for Condition Optimization. 103

10 x List of Figures 5.11 A Loop Nest with variable Loop Bounds after Splitting Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Loop Nest Splitting Relative CodeSizes after Loop Nest Splitting Relative Energy Consumption after Loop Nest Splitting Relative Runtimes after Data Partitioning and Loop Nest Splitting Relative Code Sizes after Data Partitioning and Loop Nest Splitting Relative Energy Consumption after Data Partitioning Relative Energy Consumption after Data Partitioning and Loop Nest Splitting ACodeSequence of GSM Codebook Search The GSM Codebook Search after Code Hoisting without Control Flow Analysis The GSM Codebook Search after Advanced Code Hoisting Local Common Subexpression Elimination Global Common Subexpression Elimination Loop-invariant CodeMotion Parts of DTSE causing Control Flow and Addressing Overhead AFragment of the CAVITY Benchmark before and after DTSE Algorithm for Advanced Code Hoisting Algorithm to identify Common Subexpressions The SUIF Abstract Syntax Tree Bottom-up Syntax Tree Traversal Algorithm to determine the Parent of expr i;c and expr i;c Basic Algorithm for Syntax Tree Traversal Algorithm to compute Live Ranges Algorithm to determine the outermost Loop for a CSE General Structure of Control Flow surrounding an Expression expr Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Advanced Code Hoisting Runtime Comparison of Advanced Code Hoisting and Common Subexpression Elimination Relative CodeSizes after Advanced Code Hoisting

11 List of Figures xi 6.24 Relative Energy Consumption after Advanced Code Hoisting Structure of a Ring Buffer Realization of a Ring Buffer using an Array Original Structure of the CAVITY Application Structure of the CAVITY Application after Data Reuse Transformation Access Analysis of Line Buffers during In-Place Mapping Scalar Replacement of Array Elements Principal operating Mode of Ring Buffer Replacement Ring Buffer Scalarization Loop Unrolling for Ring Buffers Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Ring Buffer Replacement Relative CodeSizes after Ring Buffer Replacement Relative Energy Consumption after Ring Buffer Replacement Relative Runtimes after entire Optimization Sequence Possible Speed / Size Trade-Offs for Loop Nest Splitting B.1 Structure of the Intel Pentium III Architecture B.2 Structure of the Sun UltraSPARC III Architecture

12 xii

13 List of Tables xiii List of Tables 5.1 Characteristic Properties of Benchmarks A.1 Runtimes of original Benchmark Versions A.2 Runtimes of SUIF Benchmark Versions A.3 Runtimes of IR-C Benchmark Versions B.1 PMC Values for original Benchmark Versions B.2 PMC Values for Benchmarks after Loop Nest Splitting B.3 PIC Values for original Benchmark Versions B.4 PIC Values for Benchmarks after Loop Nest Splitting B.5 PC Values for original Benchmark Versions B.6 PC Values for Benchmarks after Loop Nest Splitting B.7 Compilers and Optimization Levels for Runtime and Code Size Measurements B.8 Runtimes of original Benchmark Versions B.9 Runtimes of Benchmarks after Loop Nest Splitting B.10 Code Sizes of original Benchmark Versions B.11 Code Sizes of Benchmarks after Loop Nest Splitting B.12 Energy Consumption of original Benchmark Versions B.13 Energy Consumption of Benchmarks after Loop Nest Splitting 209 B.14 Runtimes of Benchmarks after Data Partitioning and Loop Nest Splitting B.15 Code Sizes of Benchmarks after Data Partitioning and Loop Nest Splitting B.16 Energy Consumption of original Benchmark Versions B.17 Energy Consumption of Benchmarks after Data Partitioning. 211 B.18 Energy Consumption of Benchmarks after Loop Nest Splitting 211 C.1 PMC Values for original Benchmark Versions C.2 PMC Values for Benchmarks after Advanced Code Hoisting. 214 C.3 PIC Values for original Benchmark Versions

14 xiv List of Tables C.4 PIC Values for Benchmarks after Advanced Code Hoisting C.5 PC Values for original Benchmark Versions C.6 PC Values for Benchmarks after Advanced Code Hoisting C.7 Runtimes of original Benchmark Versions C.8 Runtimes of Benchmarks after Advanced Code Hoisting C.9 Code Sizes of original Benchmark Versions C.10 Code Sizes of Benchmarks after Advanced Code Hoisting C.11 Energy Consumption of original Benchmark Versions C.12 Energy Consumption of Benchmarks after Advanced Code Hoisting D.1 PMC Values of Ring Buffer Replacement for CAVITY D.2 PIC Values of Ring Buffer Replacement for CAVITY D.3 PC Values of Ring Buffer Replacement for CAVITY D.4 Runtimes of Ring Buffer Replacement for CAVITY D.5 Code Sizes of Ring Buffer Replacement for CAVITY D.6 Energy Consumption of Ring Buffer Replacement for CAVITY 222

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel

More information

fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25

fakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25 12 Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25 Application Knowledge Structure of this course New clustering 3: Embedded System HW 2: Specifications 5: Application mapping:

More information

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT

Autotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT Autotuning John Cavazos University of Delaware What is Autotuning? Searching for the best code parameters, code transformations, system configuration settings, etc. Search can be Quasi-intelligent: genetic

More information

MPI: A Message-Passing Interface Standard

MPI: A Message-Passing Interface Standard MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0

More information

Loop Nest Splitting for WCET-Optimization and Predictability Improvement

Loop Nest Splitting for WCET-Optimization and Predictability Improvement Loop Nest Splitting for WCET-Optimization and Predictability Improvement Heiko Falk Martin Schwarzer University of Dortmund, Computer Science 12, D - 44221 Dortmund, Germany Heiko.Falk Martin.Schwarzer@udo.edu

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

Computer Architecture is...??? CSE 240A -- Principles of Computer Architecture. Computer Architecture is...??? Computer Architecture is...???

Computer Architecture is...??? CSE 240A -- Principles of Computer Architecture. Computer Architecture is...??? Computer Architecture is...??? Computer is...??? -- Principles of Computer Computer Architect (building architect) high-level design organization functionality performance Hardware Designer (builder, construction engineer) materials

More information

Optimizations - Compilation for Embedded Processors -

Optimizations - Compilation for Embedded Processors - 12 Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2011 年 01 月 09 日 These slides use Microsoft clip

More information

Optimizations - Compilation for Embedded Processors -

Optimizations - Compilation for Embedded Processors - 12 Optimizations - Compilation for Embedded Processors - Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund Informatik 12 Germany 2015 年 01 月 13 日 These slides use Microsoft clip arts. Microsoft

More information

Memory Hierarchy Basics

Memory Hierarchy Basics Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases

More information

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

Efficient Hardware Acceleration on SoC- FPGA using OpenCL Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA

More information

Control Hazards. Branch Prediction

Control Hazards. Branch Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Contents. Acknowledgments... xi. Foreword 1... xiii Pierre FICHEUX. Foreword 2... xv Maryline CHETTO. Part 1. Introduction... 1

Contents. Acknowledgments... xi. Foreword 1... xiii Pierre FICHEUX. Foreword 2... xv Maryline CHETTO. Part 1. Introduction... 1 Contents Acknowledgments... xi Foreword 1... xiii Pierre FICHEUX Foreword 2... xv Maryline CHETTO Part 1. Introduction... 1 Chapter 1. General Introduction... 3 1.1. The outburst of digital data... 3 1.2.

More information

A Coarse-granular Approach to Software Development allowing Non-Programmers to Build and Deploy Reliable, Web-based Applications

A Coarse-granular Approach to Software Development allowing Non-Programmers to Build and Deploy Reliable, Web-based Applications A Coarse-granular Approach to Software Development allowing Non-Programmers to Build and Deploy Reliable, Web-based Applications Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften

More information

WCET-Aware C Compiler: WCC

WCET-Aware C Compiler: WCC 12 WCET-Aware C Compiler: WCC Jian-Jia Chen (slides are based on Prof. Heiko Falk) TU Dortmund, Informatik 12 2015 年 05 月 05 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

More information

Languages and Compiler Design II IR Code Optimization

Languages and Compiler Design II IR Code Optimization Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle   holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/32963 holds various files of this Leiden University dissertation Author: Zhai, Jiali Teddy Title: Adaptive streaming applications : analysis and implementation

More information

Combined Data Partitioning and Loop Nest Splitting for Energy Consumption Minimization

Combined Data Partitioning and Loop Nest Splitting for Energy Consumption Minimization Combined Data Partitioning and Loop Nest Splitting for Energy Consumption Minimization Heiko Falk and Manish Verma University of Dortmund, Computer Science 12, D-44221 Dortmund, Germany Abstract. For mobile

More information

Control Flow driven Splitting of Loop Nests at the Source Code Level

Control Flow driven Splitting of Loop Nests at the Source Code Level Control Flow drien Splitting of Loop Nests at the Source Code Leel Heiko Falk, Peter Marwedel Uniersity of Dortmund, Computer Science 12, D - 44221 Dortmund, Germany Heiko.Falk Peter.Marwedel@udo.edu Abstract

More information

The Automatic Design of Batch Processing Systems

The Automatic Design of Batch Processing Systems The Automatic Design of Batch Processing Systems by Barry Dwyer, M.A., D.A.E., Grad.Dip. A thesis submitted for the degree of Doctor of Philosophy in the Department of Computer Science University of Adelaide

More information

SPARK: A Parallelizing High-Level Synthesis Framework

SPARK: A Parallelizing High-Level Synthesis Framework SPARK: A Parallelizing High-Level Synthesis Framework Sumit Gupta Rajesh Gupta, Nikil Dutt, Alex Nicolau Center for Embedded Computer Systems University of California, Irvine and San Diego http://www.cecs.uci.edu/~spark

More information

Computer Architecture A Quantitative Approach

Computer Architecture A Quantitative Approach Computer Architecture A Quantitative Approach Third Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by David Goldberg Xerox Palo

More information

Fast Algorithms for Octagon Abstract Domain

Fast Algorithms for Octagon Abstract Domain Research Collection Master Thesis Fast Algorithms for Octagon Abstract Domain Author(s): Singh, Gagandeep Publication Date: 2014 Permanent Link: https://doi.org/10.3929/ethz-a-010154448 Rights / License:

More information

Computer Architecture

Computer Architecture Computer Architecture Pipelined and Parallel Processor Design Michael J. Flynn Stanford University Technische Universrtat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK lnventar-nr.: Sachgebiete: Standort:

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/ Generation of Intermediate Code Outline of Lecture 15

More information

Excel Programming with VBA (Macro Programming) 24 hours Getting Started

Excel Programming with VBA (Macro Programming) 24 hours Getting Started Excel Programming with VBA (Macro Programming) 24 hours Getting Started Introducing Visual Basic for Applications Displaying the Developer Tab in the Ribbon Recording a Macro Saving a Macro-Enabled Workbook

More information

Control Hazards. Prediction

Control Hazards. Prediction Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional

More information

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Tony Maciejewski, Kyle Tarplee, Ryan Friese, and Howard Jay Siegel Department of Electrical and Computer Engineering Colorado

More information

Exploiting Idle Floating-Point Resources for Integer Execution

Exploiting Idle Floating-Point Resources for Integer Execution Exploiting Idle Floating-Point Resources for Integer Execution, Subbarao Palacharla, James E. Smith University of Wisconsin, Madison Motivation Partitioned integer and floating-point resources on current

More information

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss

More information

Compiler Construction

Compiler Construction Compiler Construction Lecture 15: Code Generation I (Intermediate Code) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/

More information

"Charting the Course to Your Success!" MOC D Querying Microsoft SQL Server Course Summary

Charting the Course to Your Success! MOC D Querying Microsoft SQL Server Course Summary Course Summary Description This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL queries for Microsoft SQL Server 2014. This course is the foundation

More information

Compiler Construction

Compiler Construction Compiler Construction Lecture 15: Code Generation I (Intermediate Code) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/

More information

Static Single Assignment Form in the COINS Compiler Infrastructure

Static Single Assignment Form in the COINS Compiler Infrastructure Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology) Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler

More information

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano

Building a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code

More information

Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation

Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation D. Atienza, C. Baloukas, L. Papadopoulos, C. Poucet, S. Mamagkakis, J. I. Hidalgo, F. Catthoor, D.

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

Tiling: A Data Locality Optimizing Algorithm

Tiling: A Data Locality Optimizing Algorithm Tiling: A Data Locality Optimizing Algorithm Announcements Monday November 28th, Dr. Sanjay Rajopadhye is talking at BMAC Friday December 2nd, Dr. Sanjay Rajopadhye will be leading CS553 Last Monday Kelly

More information

COMPUTATIONAL CHALLENGES IN HIGH-RESOLUTION CRYO-ELECTRON MICROSCOPY. Thesis by. Peter Anthony Leong. In Partial Fulfillment of the Requirements

COMPUTATIONAL CHALLENGES IN HIGH-RESOLUTION CRYO-ELECTRON MICROSCOPY. Thesis by. Peter Anthony Leong. In Partial Fulfillment of the Requirements COMPUTATIONAL CHALLENGES IN HIGH-RESOLUTION CRYO-ELECTRON MICROSCOPY Thesis by Peter Anthony Leong In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy California Institute

More information

A Feasibility Study for Methods of Effective Memoization Optimization

A Feasibility Study for Methods of Effective Memoization Optimization A Feasibility Study for Methods of Effective Memoization Optimization Daniel Mock October 2018 Abstract Traditionally, memoization is a compiler optimization that is applied to regions of code with few

More information

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design

Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 1, JANUARY 2003 1 Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of

More information

ECE 5730 Memory Systems

ECE 5730 Memory Systems ECE 5730 Memory Systems Spring 2009 Off-line Cache Content Management Lecture 7: 1 Quiz 4 on Tuesday Announcements Only covers today s lecture No office hours today Lecture 7: 2 Where We re Headed Off-line

More information

Computer Organization

Computer Organization A Text Book of Computer Organization and Architecture Prof. JATINDER SINGH Director, GGI, Dhaliwal Er. AMARDEEP SINGH M.Tech (IT) AP&HOD, Deptt.of CSE, SVIET, Banur Er. GURJEET SINGH M.Tech (CSE) Head,

More information

CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II

CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II Robin Cockett Department of Computer Science University of Calgary Alberta, Canada robin@cpsc.ucalgary.ca Sept. 2007 Introduction to the course

More information

Program design and analysis

Program design and analysis Program design and analysis Optimizing for execution time. Optimizing for energy/power. Optimizing for program size. Motivation Embedded systems must often meet deadlines. Faster may not be fast enough.

More information

Performance Issues and Query Optimization in Monet

Performance Issues and Query Optimization in Monet Performance Issues and Query Optimization in Monet Stefan Manegold Stefan.Manegold@cwi.nl 1 Contents Modern Computer Architecture: CPU & Memory system Consequences for DBMS - Data structures: vertical

More information

Intermediate representation

Intermediate representation Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.

More information

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems

Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Florin Balasa American University in Cairo Noha Abuaesh American University in Cairo Ilie I. Luican Microsoft Inc., USA Cristian

More information

APPENDIX D: DIGITIZED MAPS

APPENDIX D: DIGITIZED MAPS APPENDIX D: DIGITIZED MAPS 289 290 E-13 E-16 E-15 E-14 E-8 E-11 E-7 E-6 E-12 E-10 E-17 E-5 E-9 E-18 E-2 E-19 E-3 E-4 Figure D-1. The 18 quadrangles of digitized maps (Scale = 1:8,500). The Painted Temple

More information

An Ontological Framework for Contextualising Information in Hypermedia Systems.

An Ontological Framework for Contextualising Information in Hypermedia Systems. An Ontological Framework for Contextualising Information in Hypermedia Systems. by Andrew James Bucknell Thesis submitted for the degree of Doctor of Philosophy University of Technology, Sydney 2008 CERTIFICATE

More information

THE AUSTRALIAN NATIONAL UNIVERSITY Mid Semester Examination April 2010 COMP3320/6464. High Performance Scientific Computing

THE AUSTRALIAN NATIONAL UNIVERSITY Mid Semester Examination April 2010 COMP3320/6464. High Performance Scientific Computing THE AUSTRALIAN NATIONAL UNIVERSITY Mid Semester Examination April 2010 COMP3320/6464 High Performance Scientific Computing Study Period: 15 minutes Time Allowed: 90 minutes Permitted Materials: Non-Programmable

More information

INTEGRATING THE CAD MODEL WITH DYNAMIC SIMULATION: SIMULATION DATA EXCHANGE. Shreekanth Moorthy

INTEGRATING THE CAD MODEL WITH DYNAMIC SIMULATION: SIMULATION DATA EXCHANGE. Shreekanth Moorthy Proceedings of the 1999 Winter Simulation Conference P. A. Farrington, H. B. Nembhard, D. T. Sturrock, and G. W. Evans, eds. INTEGRATING THE CAD MODEL WITH DYNAMIC SIMULATION: SIMULATION DATA EXCHANGE

More information

Supercomputing in Plain English Part IV: Henry Neeman, Director

Supercomputing in Plain English Part IV: Henry Neeman, Director Supercomputing in Plain English Part IV: Henry Neeman, Director OU Supercomputing Center for Education & Research University of Oklahoma Wednesday September 19 2007 Outline! Dependency Analysis! What is

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Summary of Contents LIST OF FIGURES LIST OF TABLES

Summary of Contents LIST OF FIGURES LIST OF TABLES Summary of Contents LIST OF FIGURES LIST OF TABLES PREFACE xvii xix xxi PART 1 BACKGROUND Chapter 1. Introduction 3 Chapter 2. Standards-Makers 21 Chapter 3. Principles of the S2ESC Collection 45 Chapter

More information

Dedication. To the departed souls of my parents & father-in-law.

Dedication. To the departed souls of my parents & father-in-law. Abstract In this thesis work, a contribution to the field of Formal Verification is presented innovating a semantic-based approach for the verification of concurrent and distributed programs by applying

More information

A Parallelizing Compiler for Multicore Systems

A Parallelizing Compiler for Multicore Systems A Parallelizing Compiler for Multicore Systems José M. Andión, Manuel Arenaz, Gabriel Rodríguez and Juan Touriño 17th International Workshop on Software and Compilers for Embedded Systems (SCOPES 2014)

More information

Introduction to Compilers

Introduction to Compilers Introduction to Compilers Compilers are language translators input: program in one language output: equivalent program in another language Introduction to Compilers Two types Compilers offline Data Program

More information

DRIM:ALowPowerDynamically Reconfigurable Instruction Memory Hierarchy for Embedded Systems

DRIM:ALowPowerDynamically Reconfigurable Instruction Memory Hierarchy for Embedded Systems DRIM:ALowPowerDynamically Reconfigurable Instruction Memory Hierarchy for Embedded Systems Zhiguo Ge, Weng-Fai Wong Department of Computer Science, National University of Singapore {gezhiguo,wongwf}@comp.nus.edu.sg

More information

Memory Hierarchy Management for Iterative Graph Structures

Memory Hierarchy Management for Iterative Graph Structures Memory Hierarchy Management for Iterative Graph Structures Ibraheem Al-Furaih y Syracuse University Sanjay Ranka University of Florida Abstract The increasing gap in processor and memory speeds has forced

More information

Towards Optimal Custom Instruction Processors

Towards Optimal Custom Instruction Processors Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors

More information

Rechnerarchitektur (RA)

Rechnerarchitektur (RA) 12 Rechnerarchitektur (RA) Sommersemester 2017 Architecture-Aware Optimizations -Software Optimizations- Jian-Jia Chen Informatik 12 Jian-jia.chen@tu-.. http://ls12-www.cs.tu.de/daes/ Tel.: 0231 755 6078

More information

Structure of Computer Systems

Structure of Computer Systems Structure of Computer Systems Structure of Computer Systems Baruch Zoltan Francisc Technical University of Cluj-Napoca Computer Science Department U. T. PRES Cluj-Napoca, 2002 CONTENTS PREFACE... xiii

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Generation of Intermediate Code Outline of Lecture 15 Generation

More information

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

USC 227 Office hours: 3-4 Monday and Wednesday  CS553 Lecture 1 Introduction 4 CS553 Compiler Construction Instructor: URL: Michelle Strout mstrout@cs.colostate.edu USC 227 Office hours: 3-4 Monday and Wednesday http://www.cs.colostate.edu/~cs553 CS553 Lecture 1 Introduction 3 Plan

More information

250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011

250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011 250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011 Energy-Efficient Hardware Data Prefetching Yao Guo, Member, IEEE, Pritish Narayanan, Student Member,

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

Visualization of OpenCL Application Execution on CPU-GPU Systems

Visualization of OpenCL Application Execution on CPU-GPU Systems Visualization of OpenCL Application Execution on CPU-GPU Systems A. Ziabari*, R. Ubal*, D. Schaa**, D. Kaeli* *NUCAR Group, Northeastern Universiy **AMD Northeastern University Computer Architecture Research

More information

(Cover Page) TITLE OF THE THESIS TITLE OF THE THESIS TITLE OF THE THESIS (Not more than 3 lines. CAPS Times New Roman Font size 20)

(Cover Page) TITLE OF THE THESIS TITLE OF THE THESIS TITLE OF THE THESIS (Not more than 3 lines. CAPS Times New Roman Font size 20) (Cover Page) (Not more than 3 lines. CAPS Times New Roman Font size 20) A Thesis submitted by Dr./ Mr. / Ms [Full Name with Initials (if any)] (US No..) to NITTE (DEEMED TO BE UNIVERSITY) (Estd under Section

More information

Loop Transformations! Part II!

Loop Transformations! Part II! Lecture 9! Loop Transformations! Part II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Loop Unswitching Hoist invariant control-flow

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

Topic & Scope. Content: The course gives

Topic & Scope. Content: The course gives Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

Loops. Announcements. Loop fusion. Loop unrolling. Code motion. Array. Good targets for optimization. Basic loop optimizations:

Loops. Announcements. Loop fusion. Loop unrolling. Code motion. Array. Good targets for optimization. Basic loop optimizations: Announcements HW1 is available online Next Class Liang will give a tutorial on TinyOS/motes Very useful! Classroom: EADS Hall 116 This Wed ONLY Proposal is due on 5pm, Wed Email me your proposal Loops

More information

Data Reuse and Parallelism in Hardware Compilation

Data Reuse and Parallelism in Hardware Compilation Imperial College London Department of Electrical and Electronic Engineering Data Reuse and Parallelism in Hardware Compilation Qiang Liu Submitted in part fulfilment of the requirements for the degree

More information

Masterpraktikum Scientific Computing

Masterpraktikum Scientific Computing Masterpraktikum Scientific Computing High-Performance Computing Michael Bader Alexander Heinecke Technische Universität München, Germany Outline Logins Levels of Parallelism Single Processor Systems Von-Neumann-Principle

More information

Roll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/ LANGUAGE PROCESSOR. Time Allotted : 3 Hours Full Marks : 70

Roll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/ LANGUAGE PROCESSOR. Time Allotted : 3 Hours Full Marks : 70 Name : Roll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/2011-12 2011 LANGUAGE PROCESSOR Time Allotted : 3 Hours Full Marks : 70 The figures in the margin indicate full marks. Candidates

More information

Principles of Compiler Design

Principles of Compiler Design Principles of Compiler Design Intermediate Representation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Unambiguous Program representation

More information

technische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10

technische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10 12 Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Optimizations Structure of this course Application Knowledge 2: Specification Design repository

More information

Advanced Instruction-Level Parallelism

Advanced Instruction-Level Parallelism Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu

More information

Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code

Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code BILL BLUNDEN Apress About the Author Acknowledgments Introduction xi xiii xv Chapter 1 Preventative Medicine 1 1.1 Core Problems 2

More information

Compiler Optimization

Compiler Optimization Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules

More information

Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives

Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives José M. Andión, Manuel Arenaz, François Bodin, Gabriel Rodríguez and Juan Touriño 7th International Symposium on High-Level Parallel

More information

Programming Language Processor Theory

Programming Language Processor Theory Programming Language Processor Theory Munehiro Takimoto Course Descriptions Method of Evaluation: made through your technical reports Purposes: understanding various theories and implementations of modern

More information

1 CUSTOM TAG FUNDAMENTALS PREFACE... xiii. ACKNOWLEDGMENTS... xix. Using Custom Tags The JSP File 5. Defining Custom Tags The TLD 6

1 CUSTOM TAG FUNDAMENTALS PREFACE... xiii. ACKNOWLEDGMENTS... xix. Using Custom Tags The JSP File 5. Defining Custom Tags The TLD 6 PREFACE........................... xiii ACKNOWLEDGMENTS................... xix 1 CUSTOM TAG FUNDAMENTALS.............. 2 Using Custom Tags The JSP File 5 Defining Custom Tags The TLD 6 Implementing Custom

More information

Computer Systems A Programmer s Perspective 1 (Beta Draft)

Computer Systems A Programmer s Perspective 1 (Beta Draft) Computer Systems A Programmer s Perspective 1 (Beta Draft) Randal E. Bryant David R. O Hallaron August 1, 2001 1 Copyright c 2001, R. E. Bryant, D. R. O Hallaron. All rights reserved. 2 Contents Preface

More information

"Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary

Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary Course Summary Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge

More information

Supercomputing and Science An Introduction to High Performance Computing

Supercomputing and Science An Introduction to High Performance Computing Supercomputing and Science An Introduction to High Performance Computing Part IV: Dependency Analysis and Stupid Compiler Tricks Henry Neeman, Director OU Supercomputing Center for Education & Research

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Seminar Analysis and Verification of Pointer Programs (WS

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2

More information

Apple LLVM GPU Compiler: Embedded Dragons. Charu Chandrasekaran, Apple Marcello Maggioni, Apple

Apple LLVM GPU Compiler: Embedded Dragons. Charu Chandrasekaran, Apple Marcello Maggioni, Apple Apple LLVM GPU Compiler: Embedded Dragons Charu Chandrasekaran, Apple Marcello Maggioni, Apple 1 Agenda How Apple uses LLVM to build a GPU Compiler Factors that affect GPU performance The Apple GPU compiler

More information

SEMINAR REPORT SEMINAR TITLE. Name of the student

SEMINAR REPORT SEMINAR TITLE. Name of the student SEMINAR REPORT ON SEMINAR TITLE By Name of the student DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING M.B.E.SOCIETY S COLLEGE OF ENGINEERING, AMBAJOGAI 20XX-20XX SEMINAR REPORT ON SEMINAR TITLE By Name

More information

Computer Architecture

Computer Architecture Computer Architecture Lecture 2: Fundamental Concepts and ISA Dr. Ahmed Sallam Based on original slides by Prof. Onur Mutlu What Do I Expect From You? Chance favors the prepared mind. (Louis Pasteur) كل

More information

Facilities Manager Local Device Tracking

Facilities Manager Local Device Tracking Facilities Manager Local Device Tracking The Information Collection Engine (ICE) can track print volumes on all types of devices, whether they are networked or not. Devices that do not support SNMP or

More information

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)

Memory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed) Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2

More information

A Tutorial Introduction 1

A Tutorial Introduction 1 Preface From the Old to the New Acknowledgments xv xvii xxi 1 Verilog A Tutorial Introduction 1 Getting Started A Structural Description Simulating the binarytoeseg Driver Creating Ports For the Module

More information

"Charting the Course... MOC A Developing Data Access Solutions with Microsoft Visual Studio Course Summary

Charting the Course... MOC A Developing Data Access Solutions with Microsoft Visual Studio Course Summary Description Course Summary In this course, experienced developers who know the basics of data access (CRUD) in Windows client and Web application environments will learn to optimize their designs and develop

More information

AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES

AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES K. P. M. L. P. Weerasinghe 149235H Faculty of Information Technology University of Moratuwa June 2017 AUTOMATED STUDENT S

More information

Microsoft Power Tools for Data Analysis #10 Power BI M Code: Helper Table to Calculate MAT By Month & Product. Notes from Video:

Microsoft Power Tools for Data Analysis #10 Power BI M Code: Helper Table to Calculate MAT By Month & Product. Notes from Video: Microsoft Power Tools for Data Analysis #10 Power BI M Code: Helper Table to Calculate MAT By Month & Product Table of Contents: Notes from Video: 1. Intermediate Fact Table / Helper Table:... 1 2. Goal

More information

"Charting the Course to Your Success!" MOC Microsoft SharePoint 2010 Site Collection and Site Administration Course Summary

Charting the Course to Your Success! MOC Microsoft SharePoint 2010 Site Collection and Site Administration Course Summary MOC 50547 Microsoft SharePoint Site Collection and Site Course Summary Description This five-day instructor-led Site Collection and Site Administrator course gives students who have SharePoint Owner permissions

More information