Declarative Machine Learning for Energy Efficient Compiler Optimisations
1 Declarative Machine Learning for Energy Efficient Compiler Optimisations
1 Year PhD Review
Craig Blackmore
Supervised by Dr. Oliver Ray and Dr. Kerstin Eder
28th October 2015

5 Introduction
Motivation: reduce execution time and energy consumption of imperative programs
Current focus: execution time of C programs for embedded architectures
Methodology: use machine learning to tune compiler parameters for a given program
Research hypothesis: a relational representation of source code allows Inductive Logic Programming (ILP) to learn more accurate models

12 Compiling C Programs with GCC
$ gcc program.c
$ time ./a.out
1.70s
$ gcc program.c -O3
$ time ./a.out
0.44s
$ gcc program.c -O3 -fno-guess-branch-probability
$ time ./a.out
0.27s
Configuration = set of compiler flags
How do we know which optimisations to use?

14 Random Iterative Compilation
Platform: ARM Cortex-M3
14% average possible improvement vs O3 (up to 58% improvement)
[Figure annotation: Better than O3]

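Random iterative compilation can be sketched as a loop that draws random flag subsets and keeps the fastest one. The sketch below is illustrative only: the two `-fhypothetical-*` flag names and the `measure` function are made up, and `measure` returns invented runtimes instead of compiling and timing a real binary.

```python
import random

# Two flag names come from the slides; the "-fhypothetical-*" ones are invented.
FLAGS = [
    "-fguess-branch-probability",
    "-fschedule-insns2",
    "-fhypothetical-a",
    "-fhypothetical-b",
]


def measure(config):
    """Stand-in for 'compile with config, then time the binary' (made-up numbers)."""
    t = 1.0
    if "-fguess-branch-probability" in config:
        t += 0.17
    if "-fschedule-insns2" in config:
        t += 0.05
    if "-fhypothetical-a" in config:
        t -= 0.30
    if "-fhypothetical-b" in config:
        t -= 0.10
    return t


def random_search(flags, baseline, trials, seed=0):
    """Try `trials` random flag subsets; return the fastest configuration seen."""
    rng = random.Random(seed)
    best_config, best_time = frozenset(baseline), measure(baseline)
    for _ in range(trials):
        # Each flag is switched on with probability 0.5.
        config = frozenset(f for f in flags if rng.random() < 0.5)
        t = measure(config)
        if t < best_time:
            best_config, best_time = config, t
    return best_config, best_time


best_config, best_time = random_search(FLAGS, baseline=FLAGS, trials=200)
print(sorted(best_config), best_time)
```

In a real setup `measure` would invoke the compiler and run the benchmark on the target board, which is what makes every trial expensive.
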
15 Challenges
Large search space
  Over 100 optimisation flags
  Over [...] configurations
  Over [...] years to exhaustive search
Complex interactions between flags
Good solutions program/platform dependent

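A rough back-of-envelope estimate shows why exhaustive search is out of reach. Both inputs below are assumptions for illustration (exactly 100 independent on/off flags, one second per compile-and-run), not figures taken from the slides.

```python
# Back-of-envelope size of the flag search space (assumed inputs, not slide data).
NUM_FLAGS = 100            # assumption: 100 independent on/off flags
SECONDS_PER_CONFIG = 1.0   # assumption: one compile-and-run takes one second

SECONDS_PER_YEAR = 365.25 * 24 * 3600

configurations = 2 ** NUM_FLAGS
years = configurations * SECONDS_PER_CONFIG / SECONDS_PER_YEAR

print(f"{configurations:.3e} configurations")
print(f"{years:.3e} years to try them all")
```

Under these assumptions the space has on the order of 10^30 configurations and would take on the order of 10^22 years to enumerate.
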
16 Energy-time Correlation
Single-threaded benchmarks
Simple pipeline

18 Compiler Tuning Methods
[Figure: tuning methods placed by BEST EXECUTION TIME against SEARCH TIME. Standard optimisation levels: O1, O2, O3. Predictive methods: Milepost, ILP, Hybrid. Iterative compilation: Random Search. Exhaustive Search (infeasible).]

23 Program Representation
C source:
int fac(int x) {
  int y = 1;
  while (x > 1) {
    y = y*x;
    x = x-1;
  }
  return y;
}
Control flow graph (basic blocks bb1-bb4, edges ed1-ed4):
bb1: int y = 1;
bb2: if (x > 1)
bb3: y = y*x; x = x-1;
bb4: return y;
Prolog-encoded IR:
bb(bb1).
bb(bb2).
edge(ed1).
edge_src(ed1,bb1).
edge_dest(ed1,bb2).
...
Feature vector:
# basic blocks = 4
# edges = 4
# conditional branches = 1
...

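The two representations can be contrasted with a small sketch that derives both from the same control flow graph. Only edge ed1 (bb1 to bb2) is spelled out in the slides; the endpoints of ed2, ed3 and ed4 below are an assumption based on the loop structure of fac, and counting blocks with two successors as conditional branches is a simplification.

```python
# Control flow graph of fac(). Only ed1 (bb1 -> bb2) is given in the slides;
# the endpoints of ed2-ed4 are assumed from the shape of the while loop.
BLOCKS = {
    "bb1": ["int y = 1;"],
    "bb2": ["if (x > 1)"],
    "bb3": ["y = y*x;", "x = x-1;"],
    "bb4": ["return y;"],
}
EDGES = {
    "ed1": ("bb1", "bb2"),
    "ed2": ("bb2", "bb3"),  # assumed: loop condition holds
    "ed3": ("bb2", "bb4"),  # assumed: loop exit
    "ed4": ("bb3", "bb2"),  # assumed: back edge
}


def to_prolog(blocks, edges):
    """Relational view: one fact per block and per edge endpoint."""
    facts = [f"bb({b})." for b in blocks]
    for e, (src, dest) in edges.items():
        facts += [f"edge({e}).", f"edge_src({e},{src}).", f"edge_dest({e},{dest})."]
    return facts


def to_feature_vector(blocks, edges):
    """Flattened view: counts only, the graph structure is lost."""
    out_degree = {b: 0 for b in blocks}
    for src, _dest in edges.values():
        out_degree[src] += 1
    return {
        "basic_blocks": len(blocks),
        "edges": len(edges),
        # Simplification: a block with two successors ends in a conditional branch.
        "conditional_branches": sum(1 for d in out_degree.values() if d == 2),
    }


facts = to_prolog(BLOCKS, EDGES)
features = to_feature_vector(BLOCKS, EDGES)
print("\n".join(facts))
print(features)
```

The facts keep which block connects to which, while the feature vector keeps only three totals; that loss is what the talk means by a lossy representation.
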
26 Milepost
22 cbench programs -> Intermediate Representation -> Feature Vector (56 features)
22 cbench programs -> Random Iterative Compilation -> Good configurations
Feature vectors + good configurations -> Machine Learning -> Models (1NN, decision tree)
Flatten program into lossy feature vector, e.g. {ft1=3, ft2=7, ..., ft56=6}
Identify good configurations (within x% of the best config)
Output models: feature vector -> predicted configuration

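The 1NN model named on the slide can be illustrated with a minimal nearest-neighbour lookup: a new program receives the good configuration of the training program whose feature vector is closest. The program names, the three-element feature vectors (Milepost uses 56 features) and the configurations below are all made up for illustration.

```python
import math

# Hypothetical training data: program -> (feature vector, known good configuration).
TRAINING = {
    "prog_a": ((3, 7, 6), ("-O3", "-fno-guess-branch-probability")),
    "prog_b": ((10, 2, 1), ("-O3",)),
    "prog_c": ((4, 20, 9), ("-O2",)),
}


def predict_1nn(features, training):
    """Return the configuration of the training program nearest in Euclidean distance."""
    _nearest_features, config = min(
        training.values(), key=lambda item: math.dist(features, item[0])
    )
    return config


# A new program whose features are closest to prog_a.
print(predict_1nn((3, 8, 6), TRAINING))
```

Prediction needs no compilation or timing of the new program, only feature extraction and a distance computation.
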
34 Our Approach
60 BEEBS programs -> Intermediate Representation -> Background knowledge B
60 BEEBS programs -> Random Iterative Compilation -> Good/bad flags -> Examples E
B + E -> ILP -> Hypotheses H (rule-based model)
Bristol/Embecosm Embedded Benchmark Suite (BEEBS): more diverse than cbench, ~3x as many programs
Background knowledge B:
edge_src(prog1,func1,bb1,ed1).
edge_dest(prog1,func1,bb2,ed1).
bb_stmt_f(prog1,func1,bb2,st4).
stmt_code(prog1,func1,st4,gimple_cond).
Examples E:
badflag(prog1,'-flag-x').
:- badflag(prog2,'-flag-x').
Hypotheses H:
badflag(Prog,'-flag-x') :- stmt_code(Prog,Func,Stmt,gimple_cond).

35 Results
36 Results
38 Learned Hypotheses: Facts
19 flags always off (bad)
  5 disabled at O3
33 flags always on (good or indifferent)
  29 enabled at O3

41 Learned Hypotheses: Rules
badflag(P,'-fguess-branch-probability') :- stmt_code(P,F,S,gimple_cond), expr_ge2(P,F,real_type).
Disable -fguess-branch-probability if function F of program P has a conditional statement S and at least two statements with real number expressions
Compiler heuristic was wrong for functions with real number expressions

44 Learned Hypotheses: Rules
badflag(P,'-fschedule-insns2') :- expr_ge2(P,F,array_ref).
Disable -fschedule-insns2 if function F of program P contains at least two statements that reference arrays
New feature: no mention of arrays in feature vector

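As a rough illustration of how such rules turn into a compiler command line, the sketch below evaluates both learned rules over a handful of facts and disables the flagged options on top of O3. The programs and facts are invented, and the `STMT_EXPR` relation and the reading of `expr_ge2` (at least two statements in a function with an expression of the given kind) are assumptions based on the slide wording.

```python
# Hypothetical facts for three made-up programs (not taken from BEEBS).
# STMT_CODE rows: (program, function, statement, GIMPLE statement code)
STMT_CODE = [
    ("prog_a", "f1", "st1", "gimple_cond"),
    ("prog_a", "f1", "st2", "gimple_assign"),
    ("prog_a", "f1", "st3", "gimple_assign"),
    ("prog_b", "f1", "st1", "gimple_assign"),
    ("prog_b", "f1", "st2", "gimple_assign"),
    ("prog_c", "f1", "st1", "gimple_cond"),
    ("prog_c", "f1", "st2", "gimple_assign"),
]
# STMT_EXPR rows: (program, function, statement, expression kind); assumed relation.
STMT_EXPR = [
    ("prog_a", "f1", "st2", "real_type"),
    ("prog_a", "f1", "st3", "real_type"),
    ("prog_b", "f1", "st1", "array_ref"),
    ("prog_b", "f1", "st2", "array_ref"),
    ("prog_c", "f1", "st2", "real_type"),
]


def expr_ge2(prog, func, kind):
    """True if at least two statements of func contain an expression of this kind."""
    stmts = {s for (p, f, s, k) in STMT_EXPR if (p, f, k) == (prog, func, kind)}
    return len(stmts) >= 2


def bad_flags(prog):
    """Apply the two learned rules to every function of the program."""
    flags = set()
    for func in {f for (p, f, _s, _c) in STMT_CODE if p == prog}:
        has_cond = any(
            (p, f, c) == (prog, func, "gimple_cond") for (p, f, _s, c) in STMT_CODE
        )
        if has_cond and expr_ge2(prog, func, "real_type"):
            flags.add("-fguess-branch-probability")
        if expr_ge2(prog, func, "array_ref"):
            flags.add("-fschedule-insns2")
    return flags


def gcc_args(prog):
    """O3 plus the '-fno-' form of every flag predicted to be bad."""
    return ["-O3"] + sorted("-fno-" + flag[2:] for flag in bad_flags(prog))


for prog in ("prog_a", "prog_b", "prog_c"):
    print(prog, gcc_args(prog))
```
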
46 Hybrid Approach
8% average improvement vs O3 (up to 50% improvement)
Predictions virtually instantaneous thanks to predictive model

47 Ideas for Improving ILP Approach
Improve flag examples
Predict whole configurations (retain dependencies)
Improve background knowledge

49 Compiler Tuning Methods
[Figure: tuning methods placed by BEST EXECUTION TIME against SEARCH TIME. Standard optimisation levels: O1, O2, O3. Predictive methods: Milepost, ILP, Hybrid. Iterative compilation: Combined Elimination, Random Search. Exhaustive Search (infeasible).]

51 Combined Elimination (CE)
Introduced by Pan et al.
Hill-climbing & iterative compilation
Starts with all flags enabled
Prioritises removal of current worst flag
My evaluation
  3 times as many flags and benchmarks
  Compares to random iterative compilation
  Tested 3 initial configurations:
    CE = All flags on
    CE-O3 = O3
    CE-random = Best random configuration

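The elimination loop can be sketched as follows. This is a simplified reading of Combined Elimination and may differ in detail from the procedure published by Pan et al.; the `-fhypo-*` flag names and the runtime model in `measure` are invented stand-ins for real flags and real timing runs.

```python
# Hypothetical flags and a made-up runtime model with one flag interaction.
FLAGS = ["-fhypo-a", "-fhypo-b", "-fhypo-c", "-fhypo-d"]


def measure(enabled):
    """Stand-in for 'compile with these flags and time the binary'."""
    t = 1.00
    if "-fhypo-a" in enabled:
        t -= 0.30  # helps
    if "-fhypo-b" in enabled:
        t += 0.20  # hurts
    if "-fhypo-c" in enabled:
        t += 0.10  # hurts
    if "-fhypo-b" in enabled and "-fhypo-c" in enabled:
        t += 0.05  # hurts more in combination
    if "-fhypo-d" in enabled:
        t -= 0.05  # helps a little
    return t


def combined_elimination(flags, measure):
    """Start with everything on; repeatedly switch off flags whose removal helps."""
    enabled = set(flags)
    base_time = measure(enabled)
    while True:
        # Change in runtime from switching each flag off, relative to the baseline.
        deltas = {f: measure(enabled - {f}) - base_time for f in enabled}
        harmful = sorted((f for f in deltas if deltas[f] < 0), key=deltas.get)
        if not harmful:
            return enabled, base_time
        # Worst flag first; re-check the others against the updated baseline.
        for flag in harmful:
            t = measure(enabled - {flag})
            if t < base_time:
                enabled.discard(flag)
                base_time = t


kept, final_time = combined_elimination(FLAGS, measure)
print(sorted(kept), round(final_time, 2))
```

Starting from a different initial set, such as the flags enabled at O3 or the best random configuration, only changes the `flags` argument.
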
52 Combined Elimination (CE) 17% average possible improvement vs O3
53 Combined Elimination (CE) CE outperforms random after 108 configs
55 Downsampling Configuration Sets
Iterative compilation (e.g. [...] configs) -> Reduced set (e.g. 15 configs)
Small set of configs that cover a range of programs

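The slides do not say how the reduced set is chosen. One plausible approach, shown here purely as an assumption, is greedy set cover: a configuration covers a program if its runtime is within a tolerance of that program's best, and configurations are picked until every program is covered. The runtimes below are invented.

```python
# Hypothetical runtimes in seconds: TIMES[program][config].
TIMES = {
    "p1": {"c1": 1.00, "c2": 1.20, "c3": 1.02},
    "p2": {"c1": 2.00, "c2": 1.50, "c3": 1.55},
    "p3": {"c1": 0.80, "c2": 0.90, "c3": 0.81},
}


def reduce_configs(times, tolerance=0.05, max_configs=15):
    """Greedily pick configs until every program has one within tolerance of its best."""
    covers = {}
    for prog, by_config in times.items():
        best = min(by_config.values())
        for config, t in by_config.items():
            if t <= best * (1 + tolerance):
                covers.setdefault(config, set()).add(prog)

    chosen, uncovered = [], set(times)
    while uncovered and len(chosen) < max_configs:
        # Pick the config covering the most uncovered programs (ties: name order).
        config = max(sorted(covers), key=lambda c: len(covers[c] & uncovered))
        chosen.append(config)
        uncovered -= covers[config]
    return chosen


print(reduce_configs(TIMES))                 # 5% tolerance
print(reduce_configs(TIMES, tolerance=0.0))  # only exact best configs count
```

A looser tolerance lets one compromise configuration stand in for several programs, which is how a large iterative-compilation result can shrink to a handful of configurations.
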
56 Downsampling Configuration Sets
57 Ideas for Improving Background Knowledge
Continue developing extra rules to help generalise IR
Milepost IR very large (60 lines of code -> 3000 facts)
Other IRs to investigate
  LLVM IR
  Constrained Horn Clauses (e.g. SeaHorn)

58 Publications
Blackmore, C., Ray, O., Eder, K.: A logic programming approach to predict effective compiler settings for embedded software. Theory and Practice of Logic Programming 15(4-5) (2015).
Blackmore, C., Ray, O., Kull, M., et al.: Reframing of Classification and Regression Tasks for Predicting the Effects of Compiler Settings on Multiple Embedded Systems. 2nd International Workshop on Learning over Multiple Contexts (2015).

59 Long-Term Plan
Compiler tuning for larger scale software
Function-level optimisation
Interesting energy cases
Improve ILP approach
Create tool for software engineers
Iterative and predictive approaches

60 Conclusions
ILP allows learning directly from IR
Relational IR more expressive than feature vector
Predictive approaches likely to be most beneficial at function-level
Approach applicable to any metric that can be measured AND is influenced by the compiler

61 Any questions?