Module II: Optimizing Serial Programs Part 1 - Compilation and Linking


1 Performance Programming: Theory, Practice and Case Studies
Module II: Optimizing Serial Programs Part 1 - Compilation and Linking

2 Outline
- Compilation overview
- Compiler optimizations: general optimizations, specifying target architecture, function inlining, data alignment optimizations, data prefetching, aliasing optimizations, compiler macro options, compiler pragmas and directives, software pipelining
- Linking overview: static and dynamic linking, using optimized mathematical libraries

3 Compilation Overview
- Compiler: translates a source-language program into a target-language program
- Analysis stage: converts the source program into an intermediate representation; includes three phases: lexical, syntax and semantic analysis
- Synthesis stage: converts the intermediate representation into the target-language program; divided into two steps: code optimization and code generation
- Other activities: symbol table management, error handling, OS interface

4 Compiler Organization
Example: Sun ONE Studio compilers
- Different frontends (C, C++, Fortran 77, Fortran 95) generate a common intermediate representation (SunIR)
- The backend generates code for the target architectures (SPARC, x86)

5 Using Compilers
- The optimizing compiler is the most important performance-enhancing tool
- Compiler optimizations from a usage perspective: applicability, utility, cost
- Optimizations can be complementary: for example, combining double-word alignment with architecture-specific optimizations gives better performance than either optimization used in isolation
- Use the latest release of the compiler for best performance

6 Selecting Compiler Options
- Determining the optimal compiler options is an iterative process
- The order of flags can make a difference: later options may take precedence over earlier ones
- Cross-compilation is possible: the compilation and target systems may differ
- The default settings of options should be considered
- Macro options can be used as a starting point
- There is a trade-off between optimization level and compilation time and resource utilization

7 Setting Optimization Level
- The most basic and most actively used optimization option; generally set with -O
- Different compilers have different numbers of levels and different meanings for each of them; for example, GNU: -O1 to -O3, Sun: -O1 to -O5; Compaq (DEC) -O2 and HP +O2 do not necessarily mean the same thing
- By default (no option) usually no optimization is performed; the default level implied by a bare -O can mean different things on different compilers and is not recommended
- Higher optimization levels lead to longer compilation times and larger binaries

8 Different Optimization Levels
Example: optimization levels for the Sun ONE Studio compilers
- -O1: basic local optimization, assembly postpass
- -O2: -O1 plus basic global optimization, including algebraic simplification, local and global common subexpression elimination, register allocation, dead-code elimination, constant propagation and tail-call elimination
- -O3: -O2 plus loop unrolling, fusion and software pipelining
- -O4: -O3 plus function inlining within the module and aggressive global optimization
- -O5: the highest optimization level; most likely to improve performance when used in combination with profile feedback

9 Example: Effect of Optimization
[Chart: run time (sec.) for a basic matrix-matrix multiplication compiled with the HP C compiler at increasing optimization levels (none, -O1, -O2, -O3, ...); the measured values are not recoverable from the transcription]

10 Increased Compilation Time
[Chart: compilation time (sec.) for the dblat3.f source file from Netlib with the Compaq Fortran compiler X5.4A, at -g and increasing optimization levels (-O1, -O2, -O3, -O4, ...); compilation time grows with the optimization level]

11 Setting Target Architecture
- Selecting a specific target architecture allows the compiler backend to optimally use the available resources: instructions understood by the CPU; registers (number, types); CPU properties (latencies, pipeline depth, etc.)
- Example options — IBM: -qarch=pwr4; Sun: -xarch=v8plusa; SGI: -mips4; GNU: -march=pentiumpro; DEC: -arch ev67

12 Benefit of Architecture Setting
Example: repeated coordinate transformation; Sun Forte 6 Fortran 77 compiler; runs on a Sun Ultra 80 (400 MHz UltraSPARC-II)
- This CPU is characterized by the -xarch=v8plusa option; -xarch=v8 (for SuperSPARC) is suboptimal
[Chart: runtime in seconds for v8 vs. v8plusa]

13 Function Call Inlining
- Inlining: replicating the body of a called function at the call site in the caller
- Pros: eliminates function-call overhead; improves optimization because code is transparent to the compiler across function calls
- Cons: larger binaries and increased compilation time; possibility of increased instruction-cache misses
- Inlining can be controlled by compiler options (the -Ox optimization levels and specialized options)
- Some environments allow inlining assembly templates

14 Example: Inlining
[Figure: basic inlining example — the function body is substituted at the inlining point in the caller]
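
As an illustration (not the slide's original figure), a minimal C sketch of the transformation: the call to a small function is replaced by its body, leaving the optimizer one straight-line region to work with.

    /* Hypothetical inlining example. */
    static double sq(double x) { return x * x; }  /* small callee: a good candidate */

    double norm2(const double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += sq(a[i]);  /* inlining point: becomes s += a[i] * a[i];      */
        return s;           /* no call overhead; the loop can now be unrolled */
    }                       /* and software pipelined                         */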

15 Example: Inlining (continued)
Building the executable with and without inlining; Sun ONE Studio 7 compilers; Ultra 60 (360 MHz); cross-file inlining
[Chart: runtime in seconds, no inline vs. inline]

16 Inlining Options
- Options that control inlining for various compilers: -inline, -xinline, +Oinline, -Q, -Minline, -xcrossfile
- High -Ox optimization levels can imply inlining
- Functions can be inlined selectively, e.g. -xinline=[f1..fn]
- Selective inlining may reduce the number of inlined functions (relative to those otherwise inlined by -Ox)

17 Vectorization of Standard Calls
- Standard math calls in large loops (exp, log, trig functions, etc.) can be vectorized
- The compiler can replace the calls with vectorized versions provided in libraries
- Can be a dedicated option — HP: +Ovectorize; Sun: -xvector — or can be included at some -Ox optimization levels
- Can be controlled by directives or pragmas in the code
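
A hedged sketch of the kind of loop this targets: with an option such as Sun's -xvector, the compiler may replace the n scalar exp() calls below with a single call to a library routine that evaluates exp over the whole array (this example is an assumption, not code from the slides).

    #include <math.h>

    /* Each iteration makes an independent call to exp(), so a
     * vectorizing compiler can substitute one vectorized library
     * call for the whole loop. */
    void damp(double *y, const double *x, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = exp(-x[i]);
    }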

18 Profile/Feedback Optimization
- Feedback-directed optimizations are based on runtime execution-frequency data: optimizations are concentrated on the more frequently executed portions (register allocation, basic-block ordering, code motion and rearrangement, inlining)
- Corresponding compiler options — HP: +Oprofile=collect, +Oprofile=use; Sun: -xprofile=collect, -xprofile=use; Compaq: -feedback (in combination with Pixie); GNU: -fprofile-arcs, -fbranch-probabilities
- Using profile-feedback optimization: compile the program with the collect option; run it on a training data set (a profile file or directory is generated); recompile with the use option to optimize based on the collected data
- All other options should be used consistently across both compiles

19 Data Prefetching
- Prefetching allows overlapping execution with fetching data from memory
- Benefits memory-latency-bound applications (particularly on high-latency machines)
- Efficient for programs with repeatable memory-access patterns
- Best used in combination with options that specify microarchitecture features
- Sample compiler options — HP: +Odataprefetch; Sun: -xprefetch, -xprefetch_level; SGI: pf<n>, prefetch, prefetch_ahead, prefetch_manual
- Prefetching can also be manually controlled with pragmas or directives, as sketched below
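
As one concrete (assumed, GCC-specific) illustration of manual control, GCC exposes a __builtin_prefetch() intrinsic that requests a cache line a fixed distance ahead of the current iteration; the distance would be tuned to the machine's memory latency.

    /* Sketch of manual prefetching, assuming GCC's __builtin_prefetch().
     * Prefetch instructions do not fault, so touching a few elements
     * past the end of the arrays is harmless on most machines. */
    void daxpy(double *y, const double *x, double a, int n)
    {
        for (int i = 0; i < n; i++) {
            __builtin_prefetch(&x[i + 8], 0, 1);  /* read stream  */
            __builtin_prefetch(&y[i + 8], 1, 1);  /* write stream */
            y[i] += a * x[i];
        }
    }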

20 Benefits of Prefetching
Example: a daxpy-type loop calculation
- Moderate effect on a Sun Ultra 60
- Big impact on the high-latency Sun Enterprise
- Even higher effect on the UltraSPARC-III based Sun Fire 480R (extra prefetch cache; the higher clock speed makes relative memory latencies higher)
[Chart: scaled runtimes on the Ultra 60, Sun E1000 and Sun 480R, without and with -xprefetch]

21 Floating Point Optimizations
- The IEEE 754 floating-point standard ensures similar numerical behavior on different platforms: binary representation of FP numbers; operation precedence; rounding; underflow, overflow and trap handling
- Some compiler optimizations relax IEEE 754 requirements (slight numeric differences can occur): algebraic simplifications; underflow control; rounding control; trap handling
- Example options that affect FP behavior — Sun: -fsimple, -fround, -fns, -ftrap; HP: +FPstring, +FPVZO

22 Example: Algebraic Simplifications
The Sun -fsimple option allows non-IEEE arithmetic and specifies FP simplifying assumptions:
- -fsimple=0: no simplifying assumptions; IEEE conformant
- -fsimple=1: conservative simplification; does not strictly conform to IEEE 754, but numeric results are typically unchanged. The IEEE 754 default rounding and trapping modes are assumed unchanged, and the optimizer may assume that the program does not depend on the propagation of infinities or NaNs (that is, x*0 can be replaced by 0) or on the sign of zero.
- -fsimple=2: aggressive optimizations that may lead to different numeric results. Example: with -fsimple=2 the computation of x/y can be replaced with x*z, where z=1/y is computed once. Other optimizations: cycle shrinking, height reduction of the directed acyclic graph, cross-iteration common subexpression elimination, and scalar replacement.
- -fsimple=2 is best used in conjunction with other high optimization flags
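
A minimal sketch (hypothetical code, not from the slide) of the x/y to x*(1/y) rewrite that -fsimple=2 permits; the reciprocal is rounded once instead of each division being rounded individually, which is why results may differ slightly.

    /* What -fsimple=2 effectively allows the compiler to do. */
    void scale_div(double *a, const double *x, double y, int n)
    {
        /* Written as one rounded division per iteration ... */
        for (int i = 0; i < n; i++)
            a[i] = x[i] / y;
        /* ... the compiler may rewrite it as one reciprocal plus multiplies:
         *     double z = 1.0 / y;
         *     for (int i = 0; i < n; i++) a[i] = x[i] * z;
         * Each multiply rounds differently from the original divide, so
         * numeric results can change slightly. */
    }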

23 Algebraic Simplifications (cont.)
Example 1: elimination of redundant FP operations causing exceptions:

    dum1 = 0.d0
    dum2 = 20.d0/dum1

With -fsimple=0 the program aborts; with -fsimple=1 it runs (the redundant division is eliminated).

Example 2: -fsimple=1 vs. -fsimple=2 for a dot product (Forte 6 Update 1 Fortran 90; Ultra 60):

    sum = 0.d0
    do i = 1, nele
        sum = sum + a(i)*b(i)
    enddo

[Chart: runtime in seconds with -fsimple=1 vs. -fsimple=2]

24 Data Alignment
- Most microprocessors have a preferred data alignment: data on natural (preferred) byte boundaries is accessed faster, e.g. data on 8-byte boundaries can be accessed with one doubleword load/store instruction
- Misaligned data may force more restrictive load/store instructions
- Programming-language standards specify language-specific alignment rules, so the compiler makes conservative assumptions about data alignment
- Compiler options (+align, -dalign) cause generation of doubleword loads/stores for double-precision data; padding may be inserted in Fortran COMMON blocks
- Such options must be used consistently: if one module is compiled with the option, compile all modules with it
- With known alignment the optimizer can better carry out other optimizations (e.g. software pipelining)
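
A hedged C illustration of the natural-alignment principle (the slide's context is Fortran COMMON blocks and -dalign; this struct example is an assumption of mine): member order determines how much padding the compiler inserts to keep a double on its 8-byte boundary.

    #include <stddef.h>
    #include <stdio.h>

    /* In 'bad' the double follows a lone char, so the compiler typically
     * inserts 7 bytes of padding before it; in 'good' the wider member
     * comes first and no leading padding is needed. */
    struct bad  { char tag; double val; };
    struct good { double val; char tag; };

    int main(void)
    {
        printf("offset of val in bad:  %zu\n", offsetof(struct bad, val));
        printf("offset of val in good: %zu\n", offsetof(struct good, val));
        return 0;
    }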

25 Example: Data Alignment
Example: array copy on a Sun Fire 480; Forte Developer 7 compiler
[Chart: runtime in seconds without and with -dalign, for data in memory and in the L2 cache]

26 Pointer Alias Analysis Options
- C programs: pointers can point to overlapping regions of memory (aliasing), causing memory ambiguity
- Memory alias disambiguation in general programs is very complex for the compiler to perform
- The compiler is therefore conservative: optimizations such as load hoisting, unrolling and software pipelining are suppressed
- Options can be used to inform the compiler about the aliasing properties of the code — IBM: -qalias; SGI: alias; Sun: -xrestrict, -xalias_level; HP: -alias; GNU: -fstrict-aliasing, -fargument-alias
- The Fortran standard makes it the programmer's responsibility to ensure the absence of aliasing

27 Pointer Alias Analysis
Alias relationships in a program: does alias, does not alias, may alias.

Does alias:

    int *i, j = 1;
    i = &j;

Does not alias:

    double func(double b)
    {
        double *c;
        c = (double *) malloc(sizeof(double));
        *c = 10.0;
        b = b + *c;
        free(c);
        return(b);
    }

May alias:

    int sum, *vloc;
    function foo(double *a)
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++) {
            a[vloc[i]+j] = a[vloc[i]+j] + sum;
        }
    }

- Alias relationships can have a big performance impact in memory-bound programs
- The compiler generates conservative code in the does-alias situation
- In large programs with many pointers, may alias is the most common relationship; it is considered equivalent to does alias (conservative code is generated)
- The more does-not-alias relationships there are, the more flexibility the compiler has in generating optimized code
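
The slides mention Sun's -xrestrict; a related, hedged illustration in standard C99 is the restrict qualifier, which lets the programmer assert a does-not-alias relationship directly in the source (this example is an assumption, not slide code).

    /* 'restrict' promises the compiler that x and y do not overlap,
     * so it may hoist loads, unroll and software-pipeline this loop
     * without conservative reload of y[i] after each store. */
    void axpy(double * restrict y, const double * restrict x,
              double a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = y[i] + a * x[i];
    }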

28 Effect of Pointer Alias Analysis
- Computational program on a Sun Blade 1000; data structures similar to those used in graph partitioning
- Type-based alias disambiguation with the -xalias_level=<setting> option; 7 settings: any, basic, weak, layout, strict, std, strong
- Without a setting, the default level, layout, is used; programs conforming to the ISO 1999 C standard can use std
[Chart: runtime in seconds for each setting, with the number of generated load instructions in parentheses: -xalias_level=strong (14 ld instructions), std (21), strict (21), layout (31), weak (31), basic (36), any (66); stronger levels allow fewer loads]

29 Compiler Macro Options
- Some compilers provide meta-options that combine the most effective optimizations; examples — Sun: -fast; SGI: -Ofast
- Can be a good starting point for selecting options
- Caveats: can interfere with other options; macro options can change between releases; can have components that must be used consistently on all parts of the code (e.g. data alignment); some component options should also be used for linking; macro options can include architecture settings
- Sun ONE Studio C compiler: -fast = -fns -fsimple=2 -fsingle -ftrap=%none -xalias_level=basic -native -xbuiltin=%all -xdepend -xlibmil -xmemalign=8s -xO5 -xprefetch=auto,explicit

30 Compiler Directives and Pragmas
- Directives and pragmas are annotations inserted in the source that provide the compiler with specific information about parts of the program (usually the statements following the annotation)
- Directives/pragmas specific to one compiler are usually ignored by other compilers
- They are the first step in source-code modification
- Directives are used for: parallelization (e.g. OpenMP); pipelining control; prefetching control; data alignment; alias analysis; storage allocation / array padding

31 Pipelining
- Runtime of a program: T = N_instr x cycle time x CPI; higher instruction-level parallelism (ILP) decreases CPI
- Software pipelining: a technique to extract ILP from loops
- It breaks loop iterations into multiple parts; parts from disjoint iterations are overlapped so that they map to different functional units of the processor; the parallelism across loop iterations is thus mapped onto the ILP of the processor
- Software pipelining and loop unrolling: independent but related approaches, often used together
- Loop unrolling decreases loop overhead but does not schedule instructions; software pipelining schedules instructions to decrease processor pipeline stalls by hiding instruction latencies (see the hand-written analogy below)
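
A hedged, hand-written analogy of the idea (hypothetical code; real compilers schedule machine instructions via modulo scheduling, not C statements): the load for iteration i+1 is issued while the compute stage of iteration i completes, overlapping the two stages across iterations.

    /* Manual two-stage software pipeline of y[i] += a*x[i]. */
    void axpy_pipelined(double *y, const double *x, double a, int n)
    {
        if (n <= 0) return;
        double xi = x[0];                 /* prologue: first load            */
        for (int i = 0; i < n - 1; i++) {
            double xnext = x[i + 1];      /* stage 1 of iteration i+1: load  */
            y[i] += a * xi;               /* stage 2 of iteration i: compute */
            xi = xnext;
        }
        y[n - 1] += a * xi;               /* epilogue: last compute          */
    }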

32 Pipelining (cont.)
Software pipelining via modulo scheduling
- In many cases it is not safe or profitable for the compiler to pipeline a loop: loops with indirection; loops with branches or conditionals; fat (computationally dense) loops; loops where the trip-count calculation cannot be safely performed

33 Pipelining Control
Example: extracted from a reservoir simulation application (the data set fits in the 4 MB level-2 cache); Sun Forte 6 update 1 compilers

    do ivr = isvs(1,ksld), isvs(2,ksld)
        iv1 = isvr(1,ivr)
        iv2 = isvr(2,ivr)
        iv  = isvr(3,ivr)
    !$pragma sun pipeloop = 0
        do ic = iv1, iv2
            in1 = ic + iv
            in2 = ic - icf
            wrk(in1) = wrk(in1) - a(in2)*wrk(ic)
        enddo
    enddo

[Chart: runtime in seconds with the pipeloop pragma vs. without]

34 Linking Overview
- Linking: two stages in generating and running executables — link-editing and runtime linking
- Link-editor: concatenates relocatable object files (produced by the compiler or assembler) to generate libraries or executables; performs symbol resolution to bind external symbols to implementations, using the symbol tables in the object files and libraries
- Runtime (dynamic) linker: loads executable files and shared libraries and generates a runnable process; maps the files produced by the link-editor into memory and performs relocations
- The linker stages can be hidden: link-editing can be done by the compiler, and the runtime linker is invoked by running the executable

35 Static and Dynamic Linking
- Static linking: object files from archives (*.a files) go into the executable
  - The executable contains all the code it needs to run; applications are easier to deploy and test
  - Only the code for the required functions is used
  - Limited portability/flexibility
- Dynamic linking: the executable may have dependencies on runtime libraries
  - Executables tend to be smaller (they do not replicate library code)
  - Libraries can be shared between several processes running on the system (improves memory utilization and reduces paging)
  - Greater flexibility for building and testing an application

36 Types of Libraries
- System libraries (usually in /usr/lib): typically shared libraries (static versions can exist as well, but use of shared libraries is recommended for portability); some environments provide both 64-bit and 32-bit versions
- Compiler libraries: static and dynamic versions; in some environments these libraries might not be available on the user's system, so developers can either link them statically or link the dynamic versions and distribute the libraries with the application; dynamically linked compiler libraries can later be replaced by a new version or patch
- User or application libraries: an application can use dynamic libraries, static libraries, or a mixture of the two; dynamic linking gives greater flexibility

37 Linker Mapfiles
- Mapfiles can be used to specify the layout of functions in libraries (and in memory)
- They can improve instruction-cache utilization and reduce paging activity if callers and callees are placed near each other
- Mapfiles can be generated by profiling tools
- They can also be used for other purposes (e.g. to reduce the scope of symbols)

38 Optimized Mathematical Libraries
- Using optimized libraries takes advantage of highly efficient implementations of standard mathematical functions
- Examples of optimized libraries: optimized versions of standard UNIX math (libm) calls; vectorized versions of standard calls; optimized inline templates; optimized and/or parallelized BLAS, FFT, etc.; libraries/calls using SIMD instructions; libraries for distributed computing

39 Vectorized Math Libraries
- Some libraries offer vector versions of elementary mathematical functions: the function is evaluated for an entire vector of values at once
- They can be invoked (replacing the standard calls) using compiler options
- Examples — HP: HP-VML (Itanium); Sun: Vector Math Library (libmvec); IBM: Vector Mathematical Acceleration SubSystem (MASS)

40 Optimized BLAS, FFT, etc.
- Optimized/parallelized libraries for BLAS, sparse BLAS, LAPACK, ScaLAPACK, FFT
- Versions for 32- and 64-bit, different CPU architectures, etc.
- Examples — Sun: Performance Library (libsunperf), Scientific Subroutine Library (libs3l); HP: MLIB (VECLIB, LAPACK, ScaLAPACK and SuperLU_DIST); IBM: Engineering and Scientific Subroutine Library (ESSL); SGI: Scientific Computing Software Library (SCSL); Compaq: Compaq Extended Math Library (CXML)
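
As a hedged illustration of what calling such a library looks like, here is a matrix multiply expressed through the standard CBLAS interface (the vendor libraries above each export BLAS under their own headers and link flags; availability of cblas.h is an assumption here).

    #include <cblas.h>   /* assumed: a CBLAS-providing library */

    /* C (m x n) = 1.0 * A (m x k) * B (k x n) + 0.0 * C, row-major.
     * One library call replaces the hand-written triple loop and runs
     * the vendor's architecture-tuned (possibly parallel) kernel. */
    void matmul(int m, int n, int k,
                const double *A, const double *B, double *C)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0, A, k,   /* lda = k for row-major A */
                         B, n,   /* ldb = n                 */
                    0.0, C, n);  /* ldc = n                 */
    }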

41 Single Instruction Multiple Data
- Single Instruction Multiple Data (SIMD): one instruction operates on several pieces of data
- Used in media processing, bioinformatics, etc.
- Can be accessed through wrappers provided in libraries, as sketched below
- SIMD implementations — Sun: VIS; Intel: MMX, SSE, SSE2; AMD: 3DNow!; Motorola: AltiVec
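
A hedged sketch of SIMD use via compiler intrinsics, one common wrapper style (this assumes an x86 compiler providing <xmmintrin.h> for SSE; not an example from the slides).

    #include <xmmintrin.h>   /* SSE intrinsics (x86) */

    /* Adds four pairs of single-precision floats per instruction.
     * n is assumed to be a multiple of 4 to keep the sketch short. */
    void vadd(float *c, const float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);           /* load 4 floats  */
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));  /* 4 adds at once */
        }
    }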

42 Summary
- The compiler is the most effective tool for optimizing applications; compiler options should be carefully selected
- Higher optimization levels lead to larger binaries and longer compilation times
- There is large performance potential in options related to setting the target architecture, data alignment and alias disambiguation
- Prefetching options can be used to hide memory latency
- Directives can provide the compiler with additional information about regions of the code
- Optimized mathematical libraries efficiently implement common APIs
