Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr
|
|
- Coleen McDaniel
- 5 years ago
- Views:
Transcription
1 Source Code Optimization Techniques for Data Flow Dominated Embedded Software Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften der Universität Dortmund am Fachbereich Informatik von Heiko Falk Dortmund 2004
2 Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr. Peter Marwedel, Prof. Dr. Peter Padawitz
3
4
5 Contents v Contents List of Figures List of Tables ix xiii 1 Introduction Why Source Code Optimization? Abstraction Levels of Code Optimization Survey of the traditional Code Optimization Process Scopes for Code Optimization Target Application Domain Goals and Contributions Outline of the Thesis Author's Contribution to the Thesis Existing Code Optimization Techniques Specification Optimization Algorithm Selection Memory Hierarchy Exploitation Processor-independent Source Code Optimizations Processor-specific Source Code Optimizations Compiler Optimizations Loop Optimizations for High-Performance Computing Code Generation for embedded Processors Fundamental Concepts for Optimization and Evaluation Polyhedral Modeling Optimization using Genetic Algorithms Benchmarking Methodology Profiling of Pipeline and Cache Performance Compilation for Runtime and Code Size Measurement Estimation of Energy Dissipation
6 vi Contents 3.4 Summary Intermediate Representations Low-level Intermediate Representations GNU RTL Trimaran ELCOR IR Medium-level Intermediate Representations Sun IR IR-C / LANCE High-level Intermediate Representations SUIF IMPACT Selection of an IR for Source Code Optimization Summary Loop Nest Splitting Introduction Control Flow Overhead in Data Flow dominated Software Control Flow Overhead caused by Data Partitioning Splitting of Loop Nests for Control Flow Optimization Related Work Analysis and Optimization Techniques for Loop Nest Splitting Preliminaries Condition Satisfiability Condition Optimization Chromosomal Representation Fitness Function Polytope Generation Global Search Space Construction Global Search Space Exploration Chromosomal Representation Fitness Function Source Code Transformation Generation of the splitting If-Statement Loop Nest Duplication Extensions for Loops with non-constant Bounds Experimental Results Stand-alone Loop Nest Splitting Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption
7 Contents vii Combined Data Partitioning and Loop Nest Splitting for energy-efficient Scratchpad Utilization Execution Times and Code Sizes Energy Consumption Summary Advanced Code Hoisting A motivating Example Related Work Analysis Techniques for Advanced Code Hoisting Common Subexpression Identification Collection of equivalent Expressions Computation of Live Ranges of Expressions Determination of the outermost Loop for a CSE Computation of Execution Frequencies using Polytope Models Experimental Results Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption Summary Ring Buffer Replacement Motivation Optimization Steps Ring Buffer Scalarization Loop Unrolling for Ring Buffers Experimental Results Pipeline and Cache Performance Execution Times and Code Sizes Energy Consumption Summary Summary and Conclusions Summary and Contribution to Research Future Work A Experimental Comparison of SUIF and IR-C / LANCE 199
8 viii Contents B Benchmarking Data for Loop Nest Splitting 201 B.1 Values of performance-monitoring Counters B.1.1 Intel Pentium III B.1.2 Sun UltraSPARC III B.1.3 MIPS R B.2 Execution Times and Code Sizes B.3 Energy Consumption of an ARM7TDMI Core B.4 Combined Data Partitioning and Loop Nest Splitting B.4.1 Execution Times and Code Sizes B.4.2 Energy Consumption C Benchmarking Data for Advanced Code Hoisting 213 C.1 Values of performance-monitoring Counters C.1.1 Intel Pentium III C.1.2 Sun UltraSPARC III C.1.3 MIPS R C.2 Execution Times and Code Sizes C.3 Energy Consumption of an ARM7TDMI Core D Benchmarking Data for Ring Buffer Replacement 219 D.1 Values of performance-monitoring Counters D.1.1 Intel Pentium III D.1.2 Sun UltraSPARC III D.1.3 MIPS R D.2 Execution Times and Code Sizes D.3 Energy Consumption of an ARM7TDMI Core References 223 Index 235
9 List of Figures ix List of Figures 1.1 Examples of typical Embedded Systems Abstraction Levels and achievable Improvements of Code Optimizations Structure of Optimizing Compilers Implementation of a saturating Addition Abstraction Levels of Code Optimizations Overview of the DTSE Methodology Replacement of complex Code by atic6x Intrinsic A convex Polytope A non-convex Set of Points Schematic Overview of Genetic Algorithms Genetic Algorithm Encoding Crossover Mutation Overview of the Workflow employed in the encc Compiler Experimental Comparison of Runtimes based on SUIF and IR-C / LANCE Atypical Loop Nest Atypical Code Fragment before and after Data Partitioning Loop Nest after Splitting Conventional Loop Splitting Conventional Loop Unswitching Design Flow of Loop Nest Splitting Conditions as Polytopes Structure of the Fitness Function for Global Search Space Exploration Atypical Loop Nest with variable Loop Bounds Structure of the Fitness Function for Condition Optimization. 103
10 x List of Figures 5.11 A Loop Nest with variable Loop Bounds after Splitting Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Loop Nest Splitting Relative CodeSizes after Loop Nest Splitting Relative Energy Consumption after Loop Nest Splitting Relative Runtimes after Data Partitioning and Loop Nest Splitting Relative Code Sizes after Data Partitioning and Loop Nest Splitting Relative Energy Consumption after Data Partitioning Relative Energy Consumption after Data Partitioning and Loop Nest Splitting ACodeSequence of GSM Codebook Search The GSM Codebook Search after Code Hoisting without Control Flow Analysis The GSM Codebook Search after Advanced Code Hoisting Local Common Subexpression Elimination Global Common Subexpression Elimination Loop-invariant CodeMotion Parts of DTSE causing Control Flow and Addressing Overhead AFragment of the CAVITY Benchmark before and after DTSE Algorithm for Advanced Code Hoisting Algorithm to identify Common Subexpressions The SUIF Abstract Syntax Tree Bottom-up Syntax Tree Traversal Algorithm to determine the Parent of expr i;c and expr i;c Basic Algorithm for Syntax Tree Traversal Algorithm to compute Live Ranges Algorithm to determine the outermost Loop for a CSE General Structure of Control Flow surrounding an Expression expr Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Advanced Code Hoisting Runtime Comparison of Advanced Code Hoisting and Common Subexpression Elimination Relative CodeSizes after Advanced Code Hoisting
11 List of Figures xi 6.24 Relative Energy Consumption after Advanced Code Hoisting Structure of a Ring Buffer Realization of a Ring Buffer using an Array Original Structure of the CAVITY Application Structure of the CAVITY Application after Data Reuse Transformation Access Analysis of Line Buffers during In-Place Mapping Scalar Replacement of Array Elements Principal operating Mode of Ring Buffer Replacement Ring Buffer Scalarization Loop Unrolling for Ring Buffers Relative Pipeline and Cache Behavior for Intel Pentium III Relative Pipeline and Cache Behavior for Sun UltraSPARC III Relative Pipeline and Cache Behavior for MIPS R Relative Runtimes after Ring Buffer Replacement Relative CodeSizes after Ring Buffer Replacement Relative Energy Consumption after Ring Buffer Replacement Relative Runtimes after entire Optimization Sequence Possible Speed / Size Trade-Offs for Loop Nest Splitting B.1 Structure of the Intel Pentium III Architecture B.2 Structure of the Sun UltraSPARC III Architecture
12 xii
13 List of Tables xiii List of Tables 5.1 Characteristic Properties of Benchmarks A.1 Runtimes of original Benchmark Versions A.2 Runtimes of SUIF Benchmark Versions A.3 Runtimes of IR-C Benchmark Versions B.1 PMC Values for original Benchmark Versions B.2 PMC Values for Benchmarks after Loop Nest Splitting B.3 PIC Values for original Benchmark Versions B.4 PIC Values for Benchmarks after Loop Nest Splitting B.5 PC Values for original Benchmark Versions B.6 PC Values for Benchmarks after Loop Nest Splitting B.7 Compilers and Optimization Levels for Runtime and Code Size Measurements B.8 Runtimes of original Benchmark Versions B.9 Runtimes of Benchmarks after Loop Nest Splitting B.10 Code Sizes of original Benchmark Versions B.11 Code Sizes of Benchmarks after Loop Nest Splitting B.12 Energy Consumption of original Benchmark Versions B.13 Energy Consumption of Benchmarks after Loop Nest Splitting 209 B.14 Runtimes of Benchmarks after Data Partitioning and Loop Nest Splitting B.15 Code Sizes of Benchmarks after Data Partitioning and Loop Nest Splitting B.16 Energy Consumption of original Benchmark Versions B.17 Energy Consumption of Benchmarks after Data Partitioning. 211 B.18 Energy Consumption of Benchmarks after Loop Nest Splitting 211 C.1 PMC Values for original Benchmark Versions C.2 PMC Values for Benchmarks after Advanced Code Hoisting. 214 C.3 PIC Values for original Benchmark Versions
14 xiv List of Tables C.4 PIC Values for Benchmarks after Advanced Code Hoisting C.5 PC Values for original Benchmark Versions C.6 PC Values for Benchmarks after Advanced Code Hoisting C.7 Runtimes of original Benchmark Versions C.8 Runtimes of Benchmarks after Advanced Code Hoisting C.9 Code Sizes of original Benchmark Versions C.10 Code Sizes of Benchmarks after Advanced Code Hoisting C.11 Energy Consumption of original Benchmark Versions C.12 Energy Consumption of Benchmarks after Advanced Code Hoisting D.1 PMC Values of Ring Buffer Replacement for CAVITY D.2 PIC Values of Ring Buffer Replacement for CAVITY D.3 PC Values of Ring Buffer Replacement for CAVITY D.4 Runtimes of Ring Buffer Replacement for CAVITY D.5 Code Sizes of Ring Buffer Replacement for CAVITY D.6 Energy Consumption of Ring Buffer Replacement for CAVITY 222
Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications
University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel
More informationfakultät für informatik informatik 12 technische universität dortmund Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25
12 Optimizations Peter Marwedel TU Dortmund, Informatik 12 Germany 2008/06/25 Application Knowledge Structure of this course New clustering 3: Embedded System HW 2: Specifications 5: Application mapping:
More informationAutotuning. John Cavazos. University of Delaware UNIVERSITY OF DELAWARE COMPUTER & INFORMATION SCIENCES DEPARTMENT
Autotuning John Cavazos University of Delaware What is Autotuning? Searching for the best code parameters, code transformations, system configuration settings, etc. Search can be Quasi-intelligent: genetic
More informationMPI: A Message-Passing Interface Standard
MPI: A Message-Passing Interface Standard Version 2.1 Message Passing Interface Forum June 23, 2008 Contents Acknowledgments xvl1 1 Introduction to MPI 1 1.1 Overview and Goals 1 1.2 Background of MPI-1.0
More informationLoop Nest Splitting for WCET-Optimization and Predictability Improvement
Loop Nest Splitting for WCET-Optimization and Predictability Improvement Heiko Falk Martin Schwarzer University of Dortmund, Computer Science 12, D - 44221 Dortmund, Germany Heiko.Falk Martin.Schwarzer@udo.edu
More informationAbout the Authors... iii Introduction... xvii. Chapter 1: System Software... 1
Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...
More informationComputer Architecture is...??? CSE 240A -- Principles of Computer Architecture. Computer Architecture is...??? Computer Architecture is...???
Computer is...??? -- Principles of Computer Computer Architect (building architect) high-level design organization functionality performance Hardware Designer (builder, construction engineer) materials
More informationOptimizations - Compilation for Embedded Processors -
12 Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2011 年 01 月 09 日 These slides use Microsoft clip
More informationOptimizations - Compilation for Embedded Processors -
12 Optimizations - Compilation for Embedded Processors - Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund Informatik 12 Germany 2015 年 01 月 13 日 These slides use Microsoft clip arts. Microsoft
More informationMemory Hierarchy Basics
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases
More informationEfficient Hardware Acceleration on SoC- FPGA using OpenCL
Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA
More informationControl Hazards. Branch Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More informationContents. Acknowledgments... xi. Foreword 1... xiii Pierre FICHEUX. Foreword 2... xv Maryline CHETTO. Part 1. Introduction... 1
Contents Acknowledgments... xi Foreword 1... xiii Pierre FICHEUX Foreword 2... xv Maryline CHETTO Part 1. Introduction... 1 Chapter 1. General Introduction... 3 1.1. The outburst of digital data... 3 1.2.
More informationA Coarse-granular Approach to Software Development allowing Non-Programmers to Build and Deploy Reliable, Web-based Applications
A Coarse-granular Approach to Software Development allowing Non-Programmers to Build and Deploy Reliable, Web-based Applications Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften
More informationWCET-Aware C Compiler: WCC
12 WCET-Aware C Compiler: WCC Jian-Jia Chen (slides are based on Prof. Heiko Falk) TU Dortmund, Informatik 12 2015 年 05 月 05 日 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.
More informationLanguages and Compiler Design II IR Code Optimization
Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring 2010 rev.: 4/16/2010 PSU CS322 HM 1 Agenda IR Optimization
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/32963 holds various files of this Leiden University dissertation Author: Zhai, Jiali Teddy Title: Adaptive streaming applications : analysis and implementation
More informationCombined Data Partitioning and Loop Nest Splitting for Energy Consumption Minimization
Combined Data Partitioning and Loop Nest Splitting for Energy Consumption Minimization Heiko Falk and Manish Verma University of Dortmund, Computer Science 12, D-44221 Dortmund, Germany Abstract. For mobile
More informationControl Flow driven Splitting of Loop Nests at the Source Code Level
Control Flow drien Splitting of Loop Nests at the Source Code Leel Heiko Falk, Peter Marwedel Uniersity of Dortmund, Computer Science 12, D - 44221 Dortmund, Germany Heiko.Falk Peter.Marwedel@udo.edu Abstract
More informationThe Automatic Design of Batch Processing Systems
The Automatic Design of Batch Processing Systems by Barry Dwyer, M.A., D.A.E., Grad.Dip. A thesis submitted for the degree of Doctor of Philosophy in the Department of Computer Science University of Adelaide
More informationSPARK: A Parallelizing High-Level Synthesis Framework
SPARK: A Parallelizing High-Level Synthesis Framework Sumit Gupta Rajesh Gupta, Nikil Dutt, Alex Nicolau Center for Embedded Computer Systems University of California, Irvine and San Diego http://www.cecs.uci.edu/~spark
More informationComputer Architecture A Quantitative Approach
Computer Architecture A Quantitative Approach Third Edition John L. Hennessy Stanford University David A. Patterson University of California at Berkeley With Contributions by David Goldberg Xerox Palo
More informationFast Algorithms for Octagon Abstract Domain
Research Collection Master Thesis Fast Algorithms for Octagon Abstract Domain Author(s): Singh, Gagandeep Publication Date: 2014 Permanent Link: https://doi.org/10.3929/ethz-a-010154448 Rights / License:
More informationComputer Architecture
Computer Architecture Pipelined and Parallel Processor Design Michael J. Flynn Stanford University Technische Universrtat Darmstadt FACHBEREICH INFORMATIK BIBLIOTHEK lnventar-nr.: Sachgebiete: Standort:
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/ Generation of Intermediate Code Outline of Lecture 15
More informationExcel Programming with VBA (Macro Programming) 24 hours Getting Started
Excel Programming with VBA (Macro Programming) 24 hours Getting Started Introducing Visual Basic for Applications Displaying the Developer Tab in the Ribbon Recording a Macro Saving a Macro-Enabled Workbook
More informationControl Hazards. Prediction
Control Hazards The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction is a conditional branch, when does the processor know whether the conditional
More informationBi-Objective Optimization for Scheduling in Heterogeneous Computing Systems
Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Tony Maciejewski, Kyle Tarplee, Ryan Friese, and Howard Jay Siegel Department of Electrical and Computer Engineering Colorado
More informationExploiting Idle Floating-Point Resources for Integer Execution
Exploiting Idle Floating-Point Resources for Integer Execution, Subbarao Palacharla, James E. Smith University of Wisconsin, Madison Motivation Partitioned integer and floating-point resources on current
More informationMemory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple
Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss
More informationCompiler Construction
Compiler Construction Lecture 15: Code Generation I (Intermediate Code) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/
More information"Charting the Course to Your Success!" MOC D Querying Microsoft SQL Server Course Summary
Course Summary Description This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL queries for Microsoft SQL Server 2014. This course is the foundation
More informationCompiler Construction
Compiler Construction Lecture 15: Code Generation I (Intermediate Code) Thomas Noll Lehrstuhl für Informatik 2 (Software Modeling and Verification) noll@cs.rwth-aachen.de http://moves.rwth-aachen.de/teaching/ss-14/cc14/
More informationStatic Single Assignment Form in the COINS Compiler Infrastructure
Static Single Assignment Form in the COINS Compiler Infrastructure Masataka Sassa (Tokyo Institute of Technology) Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler
More informationBuilding a Runnable Program and Code Improvement. Dario Marasco, Greg Klepic, Tess DiStefano
Building a Runnable Program and Code Improvement Dario Marasco, Greg Klepic, Tess DiStefano Building a Runnable Program Review Front end code Source code analysis Syntax tree Back end code Target code
More informationOptimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation
Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation D. Atienza, C. Baloukas, L. Papadopoulos, C. Poucet, S. Mamagkakis, J. I. Hidalgo, F. Catthoor, D.
More informationPACE: Power-Aware Computing Engines
PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious
More informationTiling: A Data Locality Optimizing Algorithm
Tiling: A Data Locality Optimizing Algorithm Announcements Monday November 28th, Dr. Sanjay Rajopadhye is talking at BMAC Friday December 2nd, Dr. Sanjay Rajopadhye will be leading CS553 Last Monday Kelly
More informationCOMPUTATIONAL CHALLENGES IN HIGH-RESOLUTION CRYO-ELECTRON MICROSCOPY. Thesis by. Peter Anthony Leong. In Partial Fulfillment of the Requirements
COMPUTATIONAL CHALLENGES IN HIGH-RESOLUTION CRYO-ELECTRON MICROSCOPY Thesis by Peter Anthony Leong In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy California Institute
More informationA Feasibility Study for Methods of Effective Memoization Optimization
A Feasibility Study for Methods of Effective Memoization Optimization Daniel Mock October 2018 Abstract Traditionally, memoization is a compiler optimization that is applied to regions of code with few
More informationTimed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of SOC Design
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 1, JANUARY 2003 1 Timed Compiled-Code Functional Simulation of Embedded Software for Performance Analysis of
More informationECE 5730 Memory Systems
ECE 5730 Memory Systems Spring 2009 Off-line Cache Content Management Lecture 7: 1 Quiz 4 on Tuesday Announcements Only covers today s lecture No office hours today Lecture 7: 2 Where We re Headed Off-line
More informationComputer Organization
A Text Book of Computer Organization and Architecture Prof. JATINDER SINGH Director, GGI, Dhaliwal Er. AMARDEEP SINGH M.Tech (IT) AP&HOD, Deptt.of CSE, SVIET, Banur Er. GURJEET SINGH M.Tech (CSE) Head,
More informationCPSC 510 (Fall 2007, Winter 2008) Compiler Construction II
CPSC 510 (Fall 2007, Winter 2008) Compiler Construction II Robin Cockett Department of Computer Science University of Calgary Alberta, Canada robin@cpsc.ucalgary.ca Sept. 2007 Introduction to the course
More informationProgram design and analysis
Program design and analysis Optimizing for execution time. Optimizing for energy/power. Optimizing for program size. Motivation Embedded systems must often meet deadlines. Faster may not be fast enough.
More informationPerformance Issues and Query Optimization in Monet
Performance Issues and Query Optimization in Monet Stefan Manegold Stefan.Manegold@cwi.nl 1 Contents Modern Computer Architecture: CPU & Memory system Consequences for DBMS - Data structures: vertical
More informationIntermediate representation
Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing HIR semantic analysis HIR intermediate code gen.
More informationCompiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems
Compiler-Directed Memory Hierarchy Design for Low-Energy Embedded Systems Florin Balasa American University in Cairo Noha Abuaesh American University in Cairo Ilie I. Luican Microsoft Inc., USA Cristian
More informationAPPENDIX D: DIGITIZED MAPS
APPENDIX D: DIGITIZED MAPS 289 290 E-13 E-16 E-15 E-14 E-8 E-11 E-7 E-6 E-12 E-10 E-17 E-5 E-9 E-18 E-2 E-19 E-3 E-4 Figure D-1. The 18 quadrangles of digitized maps (Scale = 1:8,500). The Painted Temple
More informationAn Ontological Framework for Contextualising Information in Hypermedia Systems.
An Ontological Framework for Contextualising Information in Hypermedia Systems. by Andrew James Bucknell Thesis submitted for the degree of Doctor of Philosophy University of Technology, Sydney 2008 CERTIFICATE
More informationTHE AUSTRALIAN NATIONAL UNIVERSITY Mid Semester Examination April 2010 COMP3320/6464. High Performance Scientific Computing
THE AUSTRALIAN NATIONAL UNIVERSITY Mid Semester Examination April 2010 COMP3320/6464 High Performance Scientific Computing Study Period: 15 minutes Time Allowed: 90 minutes Permitted Materials: Non-Programmable
More informationINTEGRATING THE CAD MODEL WITH DYNAMIC SIMULATION: SIMULATION DATA EXCHANGE. Shreekanth Moorthy
Proceedings of the 1999 Winter Simulation Conference P. A. Farrington, H. B. Nembhard, D. T. Sturrock, and G. W. Evans, eds. INTEGRATING THE CAD MODEL WITH DYNAMIC SIMULATION: SIMULATION DATA EXCHANGE
More informationSupercomputing in Plain English Part IV: Henry Neeman, Director
Supercomputing in Plain English Part IV: Henry Neeman, Director OU Supercomputing Center for Education & Research University of Oklahoma Wednesday September 19 2007 Outline! Dependency Analysis! What is
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationSummary of Contents LIST OF FIGURES LIST OF TABLES
Summary of Contents LIST OF FIGURES LIST OF TABLES PREFACE xvii xix xxi PART 1 BACKGROUND Chapter 1. Introduction 3 Chapter 2. Standards-Makers 21 Chapter 3. Principles of the S2ESC Collection 45 Chapter
More informationDedication. To the departed souls of my parents & father-in-law.
Abstract In this thesis work, a contribution to the field of Formal Verification is presented innovating a semantic-based approach for the verification of concurrent and distributed programs by applying
More informationA Parallelizing Compiler for Multicore Systems
A Parallelizing Compiler for Multicore Systems José M. Andión, Manuel Arenaz, Gabriel Rodríguez and Juan Touriño 17th International Workshop on Software and Compilers for Embedded Systems (SCOPES 2014)
More informationIntroduction to Compilers
Introduction to Compilers Compilers are language translators input: program in one language output: equivalent program in another language Introduction to Compilers Two types Compilers offline Data Program
More informationDRIM:ALowPowerDynamically Reconfigurable Instruction Memory Hierarchy for Embedded Systems
DRIM:ALowPowerDynamically Reconfigurable Instruction Memory Hierarchy for Embedded Systems Zhiguo Ge, Weng-Fai Wong Department of Computer Science, National University of Singapore {gezhiguo,wongwf}@comp.nus.edu.sg
More informationMemory Hierarchy Management for Iterative Graph Structures
Memory Hierarchy Management for Iterative Graph Structures Ibraheem Al-Furaih y Syracuse University Sanjay Ranka University of Florida Abstract The increasing gap in processor and memory speeds has forced
More informationTowards Optimal Custom Instruction Processors
Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors
More informationRechnerarchitektur (RA)
12 Rechnerarchitektur (RA) Sommersemester 2017 Architecture-Aware Optimizations -Software Optimizations- Jian-Jia Chen Informatik 12 Jian-jia.chen@tu-.. http://ls12-www.cs.tu.de/daes/ Tel.: 0231 755 6078
More informationStructure of Computer Systems
Structure of Computer Systems Structure of Computer Systems Baruch Zoltan Francisc Technical University of Cluj-Napoca Computer Science Department U. T. PRES Cluj-Napoca, 2002 CONTENTS PREFACE... xiii
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-17/cc/ Generation of Intermediate Code Outline of Lecture 15 Generation
More informationUSC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4
CS553 Compiler Construction Instructor: URL: Michelle Strout mstrout@cs.colostate.edu USC 227 Office hours: 3-4 Monday and Wednesday http://www.cs.colostate.edu/~cs553 CS553 Lecture 1 Introduction 3 Plan
More information250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011
250 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2011 Energy-Efficient Hardware Data Prefetching Yao Guo, Member, IEEE, Pritish Narayanan, Student Member,
More informationLRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.
LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E
More informationVisualization of OpenCL Application Execution on CPU-GPU Systems
Visualization of OpenCL Application Execution on CPU-GPU Systems A. Ziabari*, R. Ubal*, D. Schaa**, D. Kaeli* *NUCAR Group, Northeastern Universiy **AMD Northeastern University Computer Architecture Research
More information(Cover Page) TITLE OF THE THESIS TITLE OF THE THESIS TITLE OF THE THESIS (Not more than 3 lines. CAPS Times New Roman Font size 20)
(Cover Page) (Not more than 3 lines. CAPS Times New Roman Font size 20) A Thesis submitted by Dr./ Mr. / Ms [Full Name with Initials (if any)] (US No..) to NITTE (DEEMED TO BE UNIVERSITY) (Estd under Section
More informationLoop Transformations! Part II!
Lecture 9! Loop Transformations! Part II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Loop Unswitching Hoist invariant control-flow
More informationA Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors
A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal
More informationTopic & Scope. Content: The course gives
Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationLoops. Announcements. Loop fusion. Loop unrolling. Code motion. Array. Good targets for optimization. Basic loop optimizations:
Announcements HW1 is available online Next Class Liang will give a tutorial on TinyOS/motes Very useful! Classroom: EADS Hall 116 This Wed ONLY Proposal is due on 5pm, Wed Email me your proposal Loops
More informationData Reuse and Parallelism in Hardware Compilation
Imperial College London Department of Electrical and Electronic Engineering Data Reuse and Parallelism in Hardware Compilation Qiang Liu Submitted in part fulfilment of the requirements for the degree
More informationMasterpraktikum Scientific Computing
Masterpraktikum Scientific Computing High-Performance Computing Michael Bader Alexander Heinecke Technische Universität München, Germany Outline Logins Levels of Parallelism Single Processor Systems Von-Neumann-Principle
More informationRoll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/ LANGUAGE PROCESSOR. Time Allotted : 3 Hours Full Marks : 70
Name : Roll No. :... Invigilator's Signature :. CS/B.Tech(CSE)/SEM-7/CS-701/2011-12 2011 LANGUAGE PROCESSOR Time Allotted : 3 Hours Full Marks : 70 The figures in the margin indicate full marks. Candidates
More informationPrinciples of Compiler Design
Principles of Compiler Design Intermediate Representation Compiler Lexical Analysis Syntax Analysis Semantic Analysis Source Program Token stream Abstract Syntax tree Unambiguous Program representation
More informationtechnische universität dortmund fakultät für informatik informatik 12 Optimizations Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10
12 Peter Marwedel TU Dortmund Informatik 12 Germany 2010/01/10 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Optimizations Structure of this course Application Knowledge 2: Specification Design repository
More informationAdvanced Instruction-Level Parallelism
Advanced Instruction-Level Parallelism Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3050: Theory on Computer Architectures, Spring 2017, Jinkyu
More informationSoftware Exorcism: A Handbook for Debugging and Optimizing Legacy Code
Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code BILL BLUNDEN Apress About the Author Acknowledgments Introduction xi xiii xv Chapter 1 Preventative Medicine 1 1.1 Core Problems 2
More informationCompiler Optimization
Compiler Optimization The compiler translates programs written in a high-level language to assembly language code Assembly language code is translated to object code by an assembler Object code modules
More informationLocality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives
Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives José M. Andión, Manuel Arenaz, François Bodin, Gabriel Rodríguez and Juan Touriño 7th International Symposium on High-Level Parallel
More informationProgramming Language Processor Theory
Programming Language Processor Theory Munehiro Takimoto Course Descriptions Method of Evaluation: made through your technical reports Purposes: understanding various theories and implementations of modern
More information1 CUSTOM TAG FUNDAMENTALS PREFACE... xiii. ACKNOWLEDGMENTS... xix. Using Custom Tags The JSP File 5. Defining Custom Tags The TLD 6
PREFACE........................... xiii ACKNOWLEDGMENTS................... xix 1 CUSTOM TAG FUNDAMENTALS.............. 2 Using Custom Tags The JSP File 5 Defining Custom Tags The TLD 6 Implementing Custom
More informationComputer Systems A Programmer s Perspective 1 (Beta Draft)
Computer Systems A Programmer s Perspective 1 (Beta Draft) Randal E. Bryant David R. O Hallaron August 1, 2001 1 Copyright c 2001, R. E. Bryant, D. R. O Hallaron. All rights reserved. 2 Contents Preface
More information"Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary
Course Summary Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge
More informationSupercomputing and Science An Introduction to High Performance Computing
Supercomputing and Science An Introduction to High Performance Computing Part IV: Dependency Analysis and Stupid Compiler Tricks Henry Neeman, Director OU Supercomputing Center for Education & Research
More informationCompiler Construction
Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ss-16/cc/ Seminar Analysis and Verification of Pointer Programs (WS
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationApple LLVM GPU Compiler: Embedded Dragons. Charu Chandrasekaran, Apple Marcello Maggioni, Apple
Apple LLVM GPU Compiler: Embedded Dragons Charu Chandrasekaran, Apple Marcello Maggioni, Apple 1 Agenda How Apple uses LLVM to build a GPU Compiler Factors that affect GPU performance The Apple GPU compiler
More informationSEMINAR REPORT SEMINAR TITLE. Name of the student
SEMINAR REPORT ON SEMINAR TITLE By Name of the student DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING M.B.E.SOCIETY S COLLEGE OF ENGINEERING, AMBAJOGAI 20XX-20XX SEMINAR REPORT ON SEMINAR TITLE By Name
More informationComputer Architecture
Computer Architecture Lecture 2: Fundamental Concepts and ISA Dr. Ahmed Sallam Based on original slides by Prof. Onur Mutlu What Do I Expect From You? Chance favors the prepared mind. (Louis Pasteur) كل
More informationFacilities Manager Local Device Tracking
Facilities Manager Local Device Tracking The Information Collection Engine (ICE) can track print volumes on all types of devices, whether they are networked or not. Devices that do not support SNMP or
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationA Tutorial Introduction 1
Preface From the Old to the New Acknowledgments xv xvii xxi 1 Verilog A Tutorial Introduction 1 Getting Started A Structural Description Simulating the binarytoeseg Driver Creating Ports For the Module
More information"Charting the Course... MOC A Developing Data Access Solutions with Microsoft Visual Studio Course Summary
Description Course Summary In this course, experienced developers who know the basics of data access (CRUD) in Windows client and Web application environments will learn to optimize their designs and develop
More informationAUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES
AUTOMATED STUDENT S ATTENDANCE ENTERING SYSTEM BY ELIMINATING FORGE SIGNATURES K. P. M. L. P. Weerasinghe 149235H Faculty of Information Technology University of Moratuwa June 2017 AUTOMATED STUDENT S
More informationMicrosoft Power Tools for Data Analysis #10 Power BI M Code: Helper Table to Calculate MAT By Month & Product. Notes from Video:
Microsoft Power Tools for Data Analysis #10 Power BI M Code: Helper Table to Calculate MAT By Month & Product Table of Contents: Notes from Video: 1. Intermediate Fact Table / Helper Table:... 1 2. Goal
More information"Charting the Course to Your Success!" MOC Microsoft SharePoint 2010 Site Collection and Site Administration Course Summary
MOC 50547 Microsoft SharePoint Site Collection and Site Course Summary Description This five-day instructor-led Site Collection and Site Administrator course gives students who have SharePoint Owner permissions
More information