Declarative Machine Learning for Energy Efficient Compiler Optimisations

1 Declarative Machine Learning for Energy Efficient Compiler Optimisations
1 Year PhD Review
Craig Blackmore
Supervised by Dr. Oliver Ray and Dr. Kerstin Eder
28th October 2015

5 Introduction
Motivation: reduce execution time and energy consumption of imperative programs
Current focus: execution time of C programs for embedded architectures
Methodology: use machine learning to tune compiler parameters for a given program
Research hypothesis: relational representation of source code allows learning of more accurate models by Inductive Logic Programming (ILP)

12 Compiling C Programs with GCC
    $ gcc program.c
    $ time ./a.out
    1.70s
    $ gcc program.c -O3
    $ time ./a.out
    0.44s
    $ gcc program.c -O3 -fno-guess-branch-probability
    $ time ./a.out
    0.27s
Configuration = set of compiler flags
How do we know which optimisations to use?
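A configuration can be handled as plain data: an optimisation level plus a set of flags. The sketch below is only an illustration, not the tooling used in the talk. The names gcc_command and time_configuration are hypothetical, and it assumes gcc is on the PATH and that the wall-clock time of a single run is an adequate measurement.

```python
import subprocess
import time


def gcc_command(source, level="-O3", flags=(), output="a.out"):
    """Build the gcc argument list for one configuration."""
    # The optimisation level goes first; flags are sorted so that the same
    # configuration always produces the same command line.
    return ["gcc", level, *sorted(flags), source, "-o", output]


def time_configuration(source, level="-O3", flags=(), output="a.out"):
    """Compile with one configuration and return the run time in seconds."""
    subprocess.run(gcc_command(source, level, flags, output), check=True)
    start = time.perf_counter()
    subprocess.run(["./" + output], check=True)
    return time.perf_counter() - start
```

Calling gcc_command("program.c", "-O3", {"-fno-guess-branch-probability"}) yields the third command from the slide, with an explicit -o a.out added.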

14 Random Iterative Compilation
14% average possible improvement vs O3 (up to 58% improvement)
[Figure labels: Better than O3; ARM Cortex-M3]
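Random iterative compilation can be sketched as repeated sampling of configurations, keeping the best one seen. This is a minimal sketch under assumptions: each flag is switched on with probability 0.5 (the sampling scheme is not stated on the slide), and measure is a hypothetical callback that compiles and times the program for a given configuration.

```python
import random


def random_configuration(flags, rng):
    """Switch each flag on or off with equal probability (an assumption)."""
    return frozenset(f for f in flags if rng.random() < 0.5)


def random_iterative_compilation(flags, measure, trials, seed=0):
    """Return the best (configuration, cost) found in `trials` random samples."""
    rng = random.Random(seed)
    best_config, best_cost = None, float("inf")
    for _ in range(trials):
        config = random_configuration(flags, rng)
        cost = measure(config)  # e.g. execution time in seconds
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost
```

In practice measure would wrap a compile-and-run step. For experimentation any function from a set of flags to a number will do.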

15 Challenges
Large search space
  Over 100 optimisation flags
  Over configurations
  Over years to exhaustive search
Complex interactions between flags
Good solutions program/platform dependent
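The scale of the search space can be illustrated with a little arithmetic. The sketch below does not reproduce the slide's own figures. It assumes every flag is an independent on/off switch and, purely for illustration, that one configuration can be compiled and timed per second.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def configurations(n_flags):
    """Number of on/off combinations of n independent binary flags."""
    return 2 ** n_flags


def exhaustive_search_years(n_flags, seconds_per_config=1.0):
    """Years needed to try every configuration at the assumed rate."""
    return configurations(n_flags) * seconds_per_config / SECONDS_PER_YEAR
```

Under these assumptions 100 flags already give more than 10^30 configurations, which is why the talk treats exhaustive search as infeasible.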

16 Energy-time Correlation
Single-threaded benchmarks
Simple pipeline

18 Compiler Tuning Methods
[Figure: tuning methods plotted by BEST EXECUTION TIME against SEARCH TIME]
Standard optimisation levels: O1, O2, O3
Predictive methods: Milepost, ILP, Hybrid
Iterative compilation: Random Search
Exhaustive Search (infeasible)

20 Program Representation
    int fac(int x) {
      int y = 1;
      while (x > 1) {
        y = y*x;
        x = x-1;
      }
      return y;
    }
[Figure: control-flow graph with basic blocks bb1 (int y = 1;), bb2 (if (x > 1)), bb3 (y = y*x; x = x-1;) and bb4 (return y;), connected by edges ed1 to ed4]

21 Program Representation
Prolog-encoded IR:
    bb(bb1).
    bb(bb2).
    edge(ed1).
    edge_src(ed1,bb1).
    edge_dest(ed1,bb2).
    ...
[Figure: control-flow graph with basic blocks bb1 (int y = 1;), bb2 (if (x > 1)), bb3 (y = y*x; x = x-1;) and bb4 (return y;), connected by edges ed1 to ed4]

22 Program Representation
Prolog-encoded IR:
    bb(bb1).
    edge_src(ed1,bb1).
    edge_dest(ed1,bb2).
    ...
Feature vector:
    # basic blocks = 4
    # edges = 4
    # conditional branches = 1
    ...
[Figure: control-flow graph with basic blocks bb1 to bb4 and edges ed1 to ed4]
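The two representations can be derived from the same control-flow graph, which makes the loss of information in the feature vector concrete. The sketch below is illustrative only. The slides give just ed1 (bb1 to bb2), so the endpoints of ed2, ed3 and ed4 are assumed, and counting a block with two outgoing edges as a conditional branch is also an assumption rather than Milepost's definition.

```python
# Control-flow graph of fac(). Only ed1 (bb1 -> bb2) appears in the slides;
# the endpoints of ed2, ed3 and ed4 are assumed for illustration.
BLOCKS = ["bb1", "bb2", "bb3", "bb4"]
EDGES = {
    "ed1": ("bb1", "bb2"),
    "ed2": ("bb2", "bb3"),
    "ed3": ("bb2", "bb4"),
    "ed4": ("bb3", "bb2"),
}


def prolog_facts(blocks, edges):
    """Relational view: one Prolog fact per block, edge and edge endpoint."""
    facts = [f"bb({b})." for b in blocks]
    for e, (src, dest) in edges.items():
        facts += [f"edge({e}).", f"edge_src({e},{src}).", f"edge_dest({e},{dest})."]
    return facts


def feature_vector(blocks, edges):
    """Flattened view: a few counts, with the graph structure discarded."""
    out_degree = {b: 0 for b in blocks}
    for src, _dest in edges.values():
        out_degree[src] += 1
    return {
        "basic_blocks": len(blocks),
        "edges": len(edges),
        # Assumption: a block with two successors ends in a conditional branch.
        "conditional_branches": sum(1 for d in out_degree.values() if d == 2),
    }
```

The facts keep which block connects to which, whereas the counts cannot distinguish two graphs with the same totals.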

24 Milepost
[Diagram: 22 cbench programs; Intermediate Representation; Feature Vector (56 features); Random Iterative Compilation; Good configurations; Machine Learning; Models (1NN, decision tree)]
Flatten program into lossy feature vector {ft1=3, ft2=7, ..., ft56=6}

25 Milepost
Identify good configurations (within x% of the best config)

26 Milepost
Output models: feature vector → predicted configuration
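The two Milepost steps named above, selecting good configurations and predicting with a 1NN model, can be sketched as follows. This is a hedged illustration rather than Milepost's implementation. The function names are hypothetical, and Euclidean distance over raw feature values is an assumption.

```python
import math


def good_configurations(times, x_percent):
    """Keep configurations whose time is within x% of the best one."""
    best = min(times.values())
    limit = best * (1 + x_percent / 100)
    return {config for config, t in times.items() if t <= limit}


def predict_1nn(training, features):
    """Return the configuration of the training program nearest to `features`.

    `training` is a list of (feature_vector, configuration) pairs.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    _, config = min(training, key=lambda pair: distance(pair[0], features))
    return config
```

A new program is flattened to its feature vector and simply inherits the configuration of the most similar training program.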

27 Our Approach
[Diagram: 60 BEEBS programs; Intermediate Representation (B); Feature Vector (56 features); Random Iterative Compilation; Good/bad flags (E); ILP; Rule-based model (H)]

28 Our Approach
Bristol/Embecosm Embedded Benchmark Suite (BEEBS):
  more diverse than cbench
  ~3x as many programs

31 Our Approach
Background knowledge B:
    edge_src(prog1,func1,bb1,ed1).
    edge_dest(prog1,func1,bb2,ed1).
    bb_stmt_f(prog1,func1,bb2,st4).
    stmt_code(prog1,func1,st4,gimple_cond).

33 Our Approach
Examples E:
    badflag(prog1,'-flag-x').
    :- badflag(prog2,'-flag-x').

34 Our Approach
Hypotheses H:
    badflag(Prog,'-flag-x') :- stmt_code(Prog,Func,Stmt,gimple_cond).
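What the ILP system looks for is a hypothesis H that, together with the background knowledge B, covers the positive examples in E and none of the negative ones. The sketch below only checks that property for the single rule shown. It does not perform the search that an ILP system carries out. The fact for prog2 and the function names are hypothetical.

```python
# Background knowledge B: stmt_code(Prog, Func, Stmt, Code) facts.
# The prog1 fact comes from the slides; the prog2 fact is hypothetical.
STMT_CODE = {
    ("prog1", "func1", "st4", "gimple_cond"),
    ("prog2", "func1", "st1", "gimple_assign"),
}
POSITIVE = {("prog1", "-flag-x")}  # badflag(prog1,'-flag-x').
NEGATIVE = {("prog2", "-flag-x")}  # :- badflag(prog2,'-flag-x').


def rule_badflag(prog, flag, stmt_code):
    """badflag(Prog,'-flag-x') :- stmt_code(Prog,Func,Stmt,gimple_cond)."""
    return flag == "-flag-x" and any(
        p == prog and code == "gimple_cond" for p, _func, _stmt, code in stmt_code
    )


def is_consistent(rule, positive, negative, stmt_code):
    """True if the rule covers every positive and no negative example."""
    covers_all_positive = all(rule(p, f, stmt_code) for p, f in positive)
    covers_no_negative = not any(rule(p, f, stmt_code) for p, f in negative)
    return covers_all_positive and covers_no_negative
```

Here the rule is consistent because prog1 contains a gimple_cond statement and prog2, in this made-up data, does not.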

35 Results

38 Learned Hypotheses: Facts
19 flags always off (bad)
  5 disabled at O3
33 flags always on (good or indifferent)
  29 enabled at O3

41 Learned Hypotheses: Rules
    badflag(P,'-fguess-branch-probability') :-
        stmt_code(P,F,S,gimple_cond),
        expr_ge2(P,F,real_type).
Disable -fguess-branch-probability if function F of program P has a conditional statement S and at least two statements with real number expressions
Compiler heuristic was wrong for functions with real number expressions

44 Learned Hypotheses: Rules
    badflag(P,'-fschedule-insns2') :- expr_ge2(P,F,array_ref).
Disable -fschedule-insns2 if function F of program P contains at least two statements that reference arrays
New feature - no mention of arrays in feature vector
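The expr_ge2 condition reads as "at least two statements of function F contain an expression of the given type". A minimal sketch of applying the learned rule to a program is given below. The fact layout (program, function, statement, expression type) and the function names are hypothetical, chosen only to mirror the rule's wording.

```python
def expr_ge2(facts, prog, func, expr_type):
    """True if at least two statements of `func` contain `expr_type`."""
    stmts = {s for p, f, s, t in facts if (p, f, t) == (prog, func, expr_type)}
    return len(stmts) >= 2


def bad_schedule_insns2(facts, prog):
    """badflag(P,'-fschedule-insns2') :- expr_ge2(P,F,array_ref)."""
    funcs = {f for p, f, _stmt, _type in facts if p == prog}
    return any(expr_ge2(facts, prog, f, "array_ref") for f in funcs)
```

A program for which bad_schedule_insns2 holds would be compiled with -fno-schedule-insns2 under this rule.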

46 Hybrid Approach
Predictions virtually instantaneous thanks to predictive model
8% average improvement vs O3 (up to 50% improvement)

47 Ideas for Improving ILP Approach
Improve flag examples
Predict whole configurations (retain dependencies)
Improve background knowledge

49 Compiler Tuning Methods
[Figure: tuning methods plotted by BEST EXECUTION TIME against SEARCH TIME]
Standard optimisation levels: O1, O2, O3
Predictive methods: Milepost, ILP, Hybrid
Iterative compilation: Combined Elimination, Random Search
Exhaustive Search (infeasible)

51 Combined Elimination (CE)
Introduced by Pan et al
Hill-climbing & iterative compilation
Starts with all flags enabled
Prioritises removal of current worst flag
My evaluation
  3 times as many flags and benchmarks
  Compares to random iterative compilation
  Tested 3 initial configurations:
    CE = All flags on
    CE-O3 = O3
    CE-random = Best random configuration
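The hill-climbing loop described above can be sketched as follows. This is a simplified reading of Combined Elimination, not the published algorithm in full detail. The measure callback is hypothetical and is assumed to return the execution time for a set of enabled flags, and the start parameter stands in for the three initial configurations tested.

```python
def combined_elimination(flags, measure, start=None):
    """Greedily disable harmful flags, worst first.

    Returns (enabled flags, measured time).
    """
    enabled = set(flags if start is None else start)
    base_time = measure(frozenset(enabled))
    while True:
        # Change in time when each flag is disabled on its own;
        # a negative value means the flag currently hurts.
        deltas = {f: measure(frozenset(enabled - {f})) - base_time for f in enabled}
        harmful = sorted((f for f in deltas if deltas[f] < 0), key=deltas.get)
        removed = False
        for flag in harmful:  # worst flag first
            candidate = frozenset(enabled - {flag})
            t = measure(candidate)
            # Re-check against the updated baseline before removing.
            if t < base_time:
                enabled, base_time = set(candidate), t
                removed = True
        if not removed:
            return frozenset(enabled), base_time
```

Each pass costs roughly one measurement per enabled flag, so the number of configurations tried grows with the number of flags rather than exponentially.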

52 Combined Elimination (CE)
17% average possible improvement vs O3

53 Combined Elimination (CE)
CE outperforms random after 108 configs

55 Downsampling Configuration Sets
Iterative compilation (e.g. configs)
Reduced Set (e.g. 15 configs)
Small set of configs that cover a range of programs
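One way to obtain a small set of configurations that covers a range of programs is a greedy cover. The slide does not say which selection method was used, so the sketch below is an assumption. good_for is a hypothetical mapping from each configuration to the set of programs for which it counts as good.

```python
def downsample(good_for, k):
    """Pick up to k configurations that together suit as many programs as possible.

    Returns (chosen configurations in order, programs covered).
    """
    chosen, covered = [], set()
    if not good_for:
        return chosen, covered
    while len(chosen) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(good_for), key=lambda c: len(good_for[c] - covered))
        if not good_for[best] - covered:
            break  # no configuration adds a new program
        chosen.append(best)
        covered |= good_for[best]
    return chosen, covered
```

The reduced set can then be tried exhaustively on a new program at a small fraction of the cost of full iterative compilation.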

57 Ideas for Improving Background Knowledge
Continue developing extra rules to help generalise IR
Milepost IR very large (60 lines of code → 3000 facts)
Other IRs to investigate
  LLVM IR
  Constrained Horn Clauses (e.g. SeaHorn)

58 Publications
Blackmore, C., Ray, O., Eder, K.: A logic programming approach to predict effective compiler settings for embedded software. Theory and Practice of Logic Programming 15(4-5), (2015).
Blackmore, C., Ray, O., Kull, M., et al.: Reframing of Classification and Regression Tasks for Predicting the Effects of Compiler Settings on Multiple Embedded Systems. 2nd International Workshop on Learning over Multiple Contexts (2015).

59 Long-Term Plan
Compiler tuning for larger scale software
Function-level optimisation
Interesting energy cases
Improve ILP approach
Create tool for software engineers
Iterative and predictive approaches

60 Conclusions
ILP allows learning directly from IR
Relational IR more expressive than feature vector
Predictive approaches likely to be most beneficial at function-level
Approach applicable to any metric that:
  can be measured AND
  is influenced by the compiler

61 Any questions?


More information

A Study for Branch Predictors to Alleviate the Aliasing Problem

A Study for Branch Predictors to Alleviate the Aliasing Problem A Study for Branch Predictors to Alleviate the Aliasing Problem Tieling Xie, Robert Evans, and Yul Chu Electrical and Computer Engineering Department Mississippi State University chu@ece.msstate.edu Abstract

More information

Static Branch Prediction

Static Branch Prediction Static Branch Prediction Branch prediction schemes can be classified into static and dynamic schemes. Static methods are usually carried out by the compiler. They are static because the prediction is already

More information

Processor Performance and Parallelism Y. K. Malaiya

Processor Performance and Parallelism Y. K. Malaiya Processor Performance and Parallelism Y. K. Malaiya Processor Execution time The time taken by a program to execute is the product of n Number of machine instructions executed n Number of clock cycles

More information

COSC 6385 Computer Architecture - Review for the 2 nd Quiz

COSC 6385 Computer Architecture - Review for the 2 nd Quiz COSC 6385 Computer Architecture - Review for the 2 nd Quiz Fall 2006 Covered topic area End of section 3 Multiple issue Speculative execution Limitations of hardware ILP Section 4 Vector Processors (Appendix

More information

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple

Memory Hierarchy Basics. Ten Advanced Optimizations. Small and Simple Memory Hierarchy Basics Six basic cache optimizations: Larger block size Reduces compulsory misses Increases capacity and conflict misses, increases miss penalty Larger total cache capacity to reduce miss

More information

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems Ben Taylor, Vicent Sanz Marco, Zheng Wang School of Computing and Communications, Lancaster University, UK {b.d.taylor, v.sanzmarco,

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 7

ECE 571 Advanced Microprocessor-Based Design Lecture 7 ECE 571 Advanced Microprocessor-Based Design Lecture 7 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 9 February 2016 HW2 Grades Ready Announcements HW3 Posted be careful when

More information

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are

TECH. 9. Code Scheduling for ILP-Processors. Levels of static scheduling. -Eligible Instructions are 9. Code Scheduling for ILP-Processors Typical layout of compiler: traditional, optimizing, pre-pass parallel, post-pass parallel {Software! compilers optimizing code for ILP-processors, including VLIW}

More information

Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero

Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero Efficient Runahead Threads Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Onur Mutlu Mateo Valero The Nineteenth International Conference on Parallel Architectures and Compilation Techniques (PACT) 11-15

More information

Intermediate Programming, Spring 2017*

Intermediate Programming, Spring 2017* 600.120 Intermediate Programming, Spring 2017* Misha Kazhdan *Much of the code in these examples is not commented because it would otherwise not fit on the slides. This is bad coding practice in general

More information

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects

UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Announcements UG4 Honours project selection: Talk to Vijay or Boris if interested in computer architecture projects Inf3 Computer Architecture - 2017-2018 1 Last time: Tomasulo s Algorithm Inf3 Computer

More information

CS4617 Computer Architecture

CS4617 Computer Architecture 1/27 CS4617 Computer Architecture Lecture 7: Instruction Set Architectures Dr J Vaughan October 1, 2014 2/27 ISA Classification Stack architecture: operands on top of stack Accumulator architecture: 1

More information

HW/SW Codesign. WCET Analysis

HW/SW Codesign. WCET Analysis HW/SW Codesign WCET Analysis 29 November 2017 Andres Gomez gomeza@tik.ee.ethz.ch 1 Outline Today s exercise is one long question with several parts: Basic blocks of a program Static value analysis WCET

More information

Lecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3)

Lecture 7: Static ILP and branch prediction. Topics: static speculation and branch prediction (Appendix G, Section 2.3) Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3) 1 Support for Speculation In general, when we re-order instructions, register renaming

More information

Introducing the Latest SiFive RISC-V Core IP Series

Introducing the Latest SiFive RISC-V Core IP Series Introducing the Latest SiFive RISC-V Core IP Series Drew Barbier DAC, June 2018 1 SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance

More information

Instruction Level Parallelism

Instruction Level Parallelism Instruction Level Parallelism Software View of Computer Architecture COMP2 Godfrey van der Linden 200-0-0 Introduction Definition of Instruction Level Parallelism(ILP) Pipelining Hazards & Solutions Dynamic

More information

COMPILER AUTOTUNING USING MACHINE LEARNING TECHNIQUES

COMPILER AUTOTUNING USING MACHINE LEARNING TECHNIQUES POLITECNICO DI MILANO DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (DEIB) DOCTORAL PROGRAM IN 2016 COMPILER AUTOTUNING USING MACHINE LEARNING TECHNIQUES Doctoral Dissertation of: Amir Hossein Ashouri

More information

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems

Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems Ben Taylor, Vicent Sanz Marco, Zheng Wang School of Computing and Communications, Lancaster University, UK {b.d.taylor, v.sanzmarco,

More information

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr.

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr. JPEG decoding using end of block markers to concurrently partition channels on a GPU Patrick Chieppe (u5333226) Supervisor: Dr. Eric McCreath JPEG Lossy compression Widespread image format Introduction

More information

1993. (BP-2) (BP-5, BP-10) (BP-6, BP-10) (BP-7, BP-10) YAGS (BP-10) EECC722

1993. (BP-2) (BP-5, BP-10) (BP-6, BP-10) (BP-7, BP-10) YAGS (BP-10) EECC722 Dynamic Branch Prediction Dynamic branch prediction schemes run-time behavior of branches to make predictions. Usually information about outcomes of previous occurrences of branches are used to predict

More information

Induction-variable Optimizations in GCC

Induction-variable Optimizations in GCC Induction-variable Optimizations in GCC 程斌 bin.cheng@arm.com 2013.11 Outline Background Implementation of GCC Learned points Shortcomings Improvements Question & Answer References Background Induction

More information

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Milepost GCC: Machine Learning Enabled Self-tuning Compiler Int J Parallel Prog (2011) 39:296 327 DOI 10.1007/s10766-010-0161-2 Milepost GCC: Machine Learning Enabled Self-tuning Compiler Grigori Fursin Yuriy Kashnikov Abdul Wahid Memon Zbigniew Chamski Olivier

More information

RESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury

RESTORE: REUSING RESULTS OF MAPREDUCE JOBS. Presented by: Ahmed Elbagoury RESTORE: REUSING RESULTS OF MAPREDUCE JOBS Presented by: Ahmed Elbagoury Outline Background & Motivation What is Restore? Types of Result Reuse System Architecture Experiments Conclusion Discussion Background

More information

Lecture 6: Inductive Logic Programming

Lecture 6: Inductive Logic Programming Lecture 6: Inductive Logic Programming Cognitive Systems - Machine Learning Part II: Special Aspects of Concept Learning FOIL, Inverted Resolution, Sequential Covering last change November 15, 2010 Ute

More information

Application Specific Signal Processors S

Application Specific Signal Processors S 1 Application Specific Signal Processors 521281S Dept. of Computer Science and Engineering Mehdi Safarpour 23.9.2018 Course contents Lecture contents 1. Introduction and number formats 2. Signal processor

More information

The Mercury project. Zoltan Somogyi

The Mercury project. Zoltan Somogyi The Mercury project Zoltan Somogyi The University of Melbourne Linux Users Victoria 7 June 2011 Zoltan Somogyi (Linux Users Victoria) The Mercury project June 15, 2011 1 / 23 Introduction Mercury: objectives

More information

Stochastic propositionalization of relational data using aggregates

Stochastic propositionalization of relational data using aggregates Stochastic propositionalization of relational data using aggregates Valentin Gjorgjioski and Sašo Dzeroski Jožef Stefan Institute Abstract. The fact that data is already stored in relational databases

More information

Intelligent Compilation

Intelligent Compilation Intelligent Compilation John Cavazos Department of Computer and Information Sciences University of Delaware Autotuning and Compilers Proposition: Autotuning is a component of an Intelligent Compiler. Code

More information

FDO: Magic Make My Program Faster compilation option?

FDO: Magic Make My Program Faster compilation option? FDO: Magic Make My Program Faster compilation option? Paweł Moll Embedded Linux Conference Europe, Berlin, October 2016 Agenda FDO Basics Instrumentation based FDO Sample based ( Auto ) FDO Deployments

More information

Lab 1: Using the LegUp High-level Synthesis Framework

Lab 1: Using the LegUp High-level Synthesis Framework Lab 1: Using the LegUp High-level Synthesis Framework 1 Introduction and Motivation This lab will give you an overview of how to use the LegUp high-level synthesis framework. In LegUp, you can compile

More information

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties

Instruction-Level Parallelism Dynamic Branch Prediction. Reducing Branch Penalties Instruction-Level Parallelism Dynamic Branch Prediction CS448 1 Reducing Branch Penalties Last chapter static schemes Move branch calculation earlier in pipeline Static branch prediction Always taken,

More information

LLVM performance optimization for z Systems

LLVM performance optimization for z Systems LLVM performance optimization for z Systems Dr. Ulrich Weigand Senior Technical Staff Member GNU/Linux Compilers & Toolchain Date: Mar 27, 2017 2017 IBM Corporation Agenda LLVM on z Systems Performance

More information

L25: Modern Compiler Design Exercises

L25: Modern Compiler Design Exercises L25: Modern Compiler Design Exercises David Chisnall Deadlines: October 26 th, November 23 th, November 9 th These simple exercises account for 20% of the course marks. They are intended to provide practice

More information

AutoTune Workshop. Michael Gerndt Technische Universität München

AutoTune Workshop. Michael Gerndt Technische Universität München AutoTune Workshop Michael Gerndt Technische Universität München AutoTune Project Automatic Online Tuning of HPC Applications High PERFORMANCE Computing HPC application developers Compute centers: Energy

More information

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique www.ijcsi.org 29 Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique Anotai Siltepavet 1, Sukree Sinthupinyo 2 and Prabhas Chongstitvatana 3 1 Computer Engineering, Chulalongkorn

More information