Punctual Coalescing. Fernando Magno Quintão Pereira
|
|
- Chad Gilmore
- 5 years ago
- Views:
Transcription
1 Punctual Coalescing Fernando Magno Quintão Pereira
2 Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on to the same register. int a = 0; int b = a; print(b); a R1 b R2 R1 = 0; R2 = R1; print(r2); a R1 b R1 R1 = 0; R1 = R1; print(r1);
3 The Importance of Register Coalescing Size reduc7on Execu7on speedup Energy saving
4 Punctual Coalescing Register assignment technique based on puzzle solving. It deals with aliased registers, e.g x86. No worries: I will define these concepts It is O(K*I), where K = number of registers, and I is number of instruc7ons. Strength: very fast! Weakness: it is a local solu7on.
5 Register Aliasing Two registers alias when an assignment to one may change the value of the other. Ex. X86 s GP regs: AX BX CX DX AH AL BH BL CH CL DH DL Ex. PowerPC s doubles: D0 D15 F10 F1 F30 F31 And also ARM, UltraSparc, ST240, etc
6 Register Aliasing Register aliasing allows the register allocator to keep more variables in registers. E.g.: two floats in the same register. But finding a register assignment that minimizes the number of registers taken is NPcomplete. Lee et al: Aliased register alloca7on for straightline programs is NP complete, TCS, 2008
7 Register alloca3on: fit the bars on the boxes. Aliased register alloca3on: Bars may have different widths. Program Live Ranges Registers a = a X L Y L Z L X H Y H Z H B = B c = c d = B d E = c E = a, d, E
8 Complexity result: op7mal aliased register alloca7on is NP complete. Program Live Ranges Registers X H = a X L Y L Z L X H Y H Z H Y = B X L = c Y H = Y d Z = X L E = X H,Y H,Z
9 Elementary form: split variables between each pair of consecu7ve instruc7ons. Elementary Program a 1 = a Live Ranges Registers X L Y L Z L X H Y H Z H (a 2 )=(a 1 ) B 2 = B (a 3,B 3 )=(a 2,B 2 ) c 3 = c (a 4,B 4,c 4 )=(a 3,B 3,c 3 ) d 4 = B 4 d (a 5,c 5,d 5 )=(a 4,c 4,d 4 ) E 5 = c 5 E (a 6,c 6,E 6 )=(a 5,c 5,E 5 ) = a 6,d 6,E 6
10 Complexity result: aliased register alloca7on has polynomial 7me solu7on for elementary form programs. Elementary Program X H = a Live Ranges Registers X L Y L Z L X H Y H Z H B Y = X L = c Y H = Y Copy d Y L = X H E X = X L = Y L,Y H,X
11 Elementary Form = Slow RA? If a program has O(V) variables and O(I) instruc7ons, then the elementary form program has O(V*I) variables. If we assume that each instruc7on defines a variable, we have O(V*I) = O(V 2 ). Many graph coloring based register allocators are O(V 2 ). O(V 4 ) Is just too slow. We have a O(K*I) approach, where K is the number of registers!
12 Growth in number of variables Number of variables in elementary form trace 0 Data extracted from 4054 program traces taken from SPEC CPU Number of variables in the original trace
13 Our solu3on: traverse the program from top to boeom using the solu7on of previous puzzles to guide the solu7on of the next one to be solved. a 1 = a X L Y L Z L X H Y H Z H (a 2 )=(a 1 ) B 2 = B (a 3,B 3 )=(a 2,B 2 ) c 3 = c (a 4,B 4,c 4 )=(a 3,B 3,c 3 ) d 4 = B 4 (a 5,c 5,d 5 )=(a 4,c 4,d 4 ) d E What is the best alloca3on at this point of the program? E 5 = c 5 (a 6,c 6,E 6 )=(a 5,c 5,E 5 ) = a 6,d 6,E 6
14 Punctual Coalescing Instance: two consecu7ve puzzles, p 1 and p 2, such that p 1 is already solved. Problem: find a solu7on for p 2 that minimizes the number of copies inserted between p 1 and p 1. Puzzle p 1 is called the guider, and puzzle p 2 is called the follower.
15 Example Guider This solu7on has only three matches. Follower This solu7on is beeer: it has four matches.
16 Polynomial 7me Punctual Coalescing Punctual coalescing has a polynomial 7me solu7on, as long as the follower starts with an empty register board. These are 89% of all the puzzles found in SPEC CPU 2000, when compiled to x86 using LLVM. Our solu7on is O(K), where K is the number of registers. For the whole program, O(I*K). For the solu7on, see our paper: Punctual Coalescing.
17 Punctual Global 1 a b c d E Sequence of Op7mal Punctual coalescings Op7mal Global Coalescing 2 a, b, c, d := := b, d a b c d a b c d a a b b c c d d a a c c b b d d 3 4 E := := E, a, c a c a E a c E a c c E E a a c c E E Global Coalescing: find the register assignment that Minimizes the number of copies in the whole program. This problem is NP complete.
18 Experiments We have compared four different coalescers: A coalescing oblivious register allocator based on the coloring of chordal graphs. The punctual coalescing heuris7cs used in register alloca7on by puzzle solving. The op7mal punctual coalescer. An op7mal solu7on for global coalescing, based on integer linear programming. Data: program traces from SPEC CPU 2000.
19 Number of copies inserted Gzip Vpr Gcc Mcf Craly Parser Gap Vortex Bzip2 Twolf Chordal (x 1000) 13,4 47,6 329,8 4,75 46,2 27,0 174,6 199,3 10,5 101,8 Heuris7cs Punctual ILP
20 Conclusion Punctual coalescing is a viable alterna7ve for JIT compilers. Has been implemented in LLVM with very compe77ve compila7on 7mes. Results are very good for program traces. We need a beeer algorithm for whole func7on op7miza7on! Slower compila7on 7me, but beeer code quality.
Register Alloca.on Deconstructed. David Ryan Koes Seth Copen Goldstein
Register Alloca.on Deconstructed David Ryan Koes Seth Copen Goldstein 12th Interna+onal Workshop on So3ware and Compilers for Embedded Systems April 24, 12009 Register Alloca:on Problem unbounded number
More informationDesign of Experiments - Terminology
Design of Experiments - Terminology Response variable Measured output value E.g. total execution time Factors Input variables that can be changed E.g. cache size, clock rate, bytes transmitted Levels Specific
More informationThe Designing and Implementation of A SSA-based register allocator
The Designing and Implementation of A SSA-based register allocator Fernando Magno Quintao Pereira UCLA - University of California, Los Angeles Abstract. The register allocation problem has an exact polynomial
More informationAPPENDIX Summary of Benchmarks
158 APPENDIX Summary of Benchmarks The experimental results presented throughout this thesis use programs from four benchmark suites: Cyclone benchmarks (available from [Cyc]): programs used to evaluate
More informationBaggy bounds checking. Periklis Akri5dis, Manuel Costa, Miguel Castro, Steven Hand
Baggy bounds checking Periklis Akri5dis, Manuel Costa, Miguel Castro, Steven Hand C/C++ programs are vulnerable Lots of exis5ng code in C and C++ More being wrieen every day C/C++ programs are prone to
More informationLecture 3: Instruction Set Architecture
Lecture 3: Instruction Set Architecture Interface Software/compiler instruction set hardware Design Space of ISA Five Primary Dimensions Number of explicit operands ( 0, 1, 2, 3 ) Operand Storage Where
More informationData Structures Unit 02
Data Structures Unit 02 Bucharest University of Economic Studies Memory classes, Bit structures and operators, User data types Memory classes Define specific types of variables in order to differentiate
More informationMODE (mod) FIELD CODES. mod MEMORY MODE: 8-BIT DISPLACEMENT MEMORY MODE: 16- OR 32- BIT DISPLACEMENT REGISTER MODE
EXERCISE 9. Determine the mod bits from Figure 7-24 and write them in Table 7-7. MODE (mod) FIELD CODES mod 00 01 10 DESCRIPTION MEMORY MODE: NO DISPLACEMENT FOLLOWS MEMORY MODE: 8-BIT DISPLACEMENT MEMORY
More informationCompiler Optimization Intermediate Representation
Compiler Optimization Intermediate Representation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology
More informationCredits and Disclaimers
Credits and Disclaimers 1 The examples and discussion in the following slides have been adapted from a variety of sources, including: Chapter 3 of Computer Systems 2 nd Edition by Bryant and O'Hallaron
More informationImprovements to Linear Scan register allocation
Improvements to Linear Scan register allocation Alkis Evlogimenos (alkis) April 1, 2004 1 Abstract Linear scan register allocation is a fast global register allocation first presented in [PS99] as an alternative
More informationMicroprocessors & Assembly Language Lab 1 (Introduction to 8086 Programming)
Microprocessors & Assembly Language Lab 1 (Introduction to 8086 Programming) Learning any imperative programming language involves mastering a number of common concepts: Variables: declaration/definition
More informationmith College Computer Science CSC231 - Assembly Week #4 Dominique Thiébaut
mith College Computer Science CSC231 - Assembly Week #4 Dominique Thiébaut dthiebaut@smith.edu Homework Solutions Outline Review Hexdump Pentium Data Registers 32-bit, 16-bit and 8-bit quantities (registers
More informationCode segment Stack segment
Registers Most of the registers contain data/instruction offsets within 64 KB memory segment. There are four different 64 KB segments for instructions, stack, data and extra data. To specify where in 1
More information: Advanced Compiler Design. 8.0 Instruc?on scheduling
6-80: Advanced Compiler Design 8.0 Instruc?on scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Overview 8. Instruc?on scheduling basics 8. Scheduling for ILP processors 8.
More informationHigh-Level Synthesis Creating Custom Circuits from High-Level Code
High-Level Synthesis Creating Custom Circuits from High-Level Code Hao Zheng Comp Sci & Eng University of South Florida Exis%ng Design Flow Register-transfer (RT) synthesis - Specify RT structure (muxes,
More informationShengyue Wang, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota
Loop Selection for Thread-Level Speculation, Xiaoru Dai, Kiran S. Yellajyosula, Antonia Zhai, Pen-Chung Yew Department of Computer Science & Engineering University of Minnesota Chip Multiprocessors (CMPs)
More informationProgram controlled semiconductor device (IC) which fetches (from memory), decodes and executes instructions.
8086 Microprocessor Microprocessor Program controlled semiconductor device (IC) which fetches (from memory), decodes and executes instructions. It is used as CPU (Central Processing Unit) in computers.
More informationKoji Inoue Department of Informatics, Kyushu University Japan Science and Technology Agency
Lock and Unlock: A Data Management Algorithm for A Security-Aware Cache Department of Informatics, Japan Science and Technology Agency ICECS'06 1 Background (1/2) Trusted Program Malicious Program Branch
More informationSimple and Efficient Construction of Static Single Assignment Form
Simple and Efficient Construction of Static Single Assignment Form saarland university Matthias Braun, Sebastian Buchwald, Sebastian Hack, Roland Leißa, Christoph Mallon and Andreas Zwinkau computer science
More informationThe x86 Architecture
The x86 Architecture Lecture 24 Intel Manual, Vol. 1, Chapter 3 Robb T. Koether Hampden-Sydney College Fri, Mar 20, 2015 Robb T. Koether (Hampden-Sydney College) The x86 Architecture Fri, Mar 20, 2015
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationMicroarchitecture Overview. Performance
Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 15, 2007 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make
More informationCredits and Disclaimers
Credits and Disclaimers 1 The examples and discussion in the following slides have been adapted from a variety of sources, including: Chapter 3 of Computer Systems 3 nd Edition by Bryant and O'Hallaron
More informationLecture 15 Intel Manual, Vol. 1, Chapter 3. Fri, Mar 6, Hampden-Sydney College. The x86 Architecture. Robb T. Koether. Overview of the x86
Lecture 15 Intel Manual, Vol. 1, Chapter 3 Hampden-Sydney College Fri, Mar 6, 2009 Outline 1 2 Overview See the reference IA-32 Intel Software Developer s Manual Volume 1: Basic, Chapter 3. Instructions
More informationECpE 185 Laboratory Hand Assembly Fall 2006
ECpE 185 Laboratory Hand Assembly Fall 2006 Hand-Assembly, Using DEBUG Introduction: In this Hand-Assembly Lab, you will develop an 8-bit version of the program from Debug Introduction Lab, using byte-size
More informationDetecting Global Stride Locality in Value Streams
Detecting Global Stride Locality in Value Streams Huiyang Zhou, Jill Flanagan, Thomas M. Conte TINKER Research Group Department of Electrical & Computer Engineering North Carolina State University 1 Introduction
More informationThe von Neumann Machine
The von Neumann Machine 1 1945: John von Neumann Wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC also Alan Turing Konrad Zuse Eckert & Mauchly The basic structure
More informationEffective Memory Protection Using Dynamic Tainting
Effective Memory Protection Using Dynamic Tainting James Clause Alessandro Orso (software) and Ioanis Doudalis Milos Prvulovic (hardware) College of Computing Georgia Institute of Technology Supported
More informationStatic single assignment
Static single assignment Control-flow graph Loop-nesting forest Static single assignment SSA with dominance property Unique definition for each variable. Each definition dominates its uses. Static single
More informationADDRESSING MODES. Operands specify the data to be used by an instruction
ADDRESSING MODES Operands specify the data to be used by an instruction An addressing mode refers to the way in which the data is specified by an operand 1. An operand is said to be direct when it specifies
More informationInlining Java Native Calls at Runtime
Inlining Java Native Calls at Runtime (CASCON 2005 4 th Workshop on Compiler Driven Performance) Levon Stepanian, Angela Demke Brown Computer Systems Group Department of Computer Science, University of
More informationExecution-based Prediction Using Speculative Slices
Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers
More informationAssembly Language Programming Introduction
Assembly Language Programming Introduction October 10, 2017 Motto: R7 is used by the processor as its program counter (PC). It is recommended that R7 not be used as a stack pointer. Source: PDP-11 04/34/45/55
More informationCS241 Computer Organization Spring 2015 IA
CS241 Computer Organization Spring 2015 IA-32 2-10 2015 Outline! Review HW#3 and Quiz#1! More on Assembly (IA32) move instruction (mov) memory address computation arithmetic & logic instructions (add,
More informationRegisters. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth
Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving
More informationLecture 1 Introduc-on
Lecture 1 Introduc-on What would you get out of this course? Structure of a Compiler Op9miza9on Example 15-745: Introduc9on 1 What Do Compilers Do? 1. Translate one language into another e.g., convert
More informationRegister Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure
Register Packing Exploiting Narrow-Width Operands for Reducing Register File Pressure Oguz Ergin*, Deniz Balkan, Kanad Ghose, Dmitry Ponomarev Department of Computer Science State University of New York
More informationMoodle WILLINGDON COLLEGE SANGLI (B. SC.-II) Digital Electronics
Moodle 4 WILLINGDON COLLEGE SANGLI (B. SC.-II) Digital Electronics Advanced Microprocessors and Introduction to Microcontroller Moodle developed By Dr. S. R. Kumbhar Department of Electronics Willingdon
More informationSPRING TERM BM 310E MICROPROCESSORS LABORATORY PRELIMINARY STUDY
BACKGROUND Interrupts The INT instruction is the instruction which does the most work in any assembler program. What it does is it calls a DOS interrupt (like a function) to perform a special task. When
More informationChip-Multithreading Systems Need A New Operating Systems Scheduler
Chip-Multithreading Systems Need A New Operating Systems Scheduler Alexandra Fedorova Christopher Small Daniel Nussbaum Margo Seltzer Harvard University, Sun Microsystems Sun Microsystems Sun Microsystems
More informationSPRING TERM BM 310E MICROPROCESSORS LABORATORY PRELIMINARY STUDY
BACKGROUND 8086 CPU has 8 general purpose registers listed below: AX - the accumulator register (divided into AH / AL): 1. Generates shortest machine code 2. Arithmetic, logic and data transfer 3. One
More informationInstruction Set Architecture (ISA) Data Types
Instruction Set Architecture (ISA) Data Types Computer Systems: Section 4.1 Suppose you built a computer What Building Blocks would you use? Arithmetic Logic Unit (ALU) OP1 OP2 OPERATION ALU RES Full Adder
More informationComputer Processors. Part 2. Components of a Processor. Execution Unit The ALU. Execution Unit. The Brains of the Box. Processors. Execution Unit (EU)
Part 2 Computer Processors Processors The Brains of the Box Computer Processors Components of a Processor The Central Processing Unit (CPU) is the most complex part of a computer In fact, it is the computer
More informationJosé F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2
CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More
More informationRethinking Code Generation in Compilers
Rethinking Code Generation in Compilers Christian Schulte SCALE KTH Royal Institute of Technology & SICS (Swedish Institute of Computer Science) joint work with: Mats Carlsson SICS Roberto Castañeda Lozano
More informationArchitecture Cloning For PowerPC Processors. Edwin Chan, Raul Silvera, Roch Archambault IBM Toronto Lab Oct 17 th, 2005
Architecture Cloning For PowerPC Processors Edwin Chan, Raul Silvera, Roch Archambault edwinc@ca.ibm.com IBM Toronto Lab Oct 17 th, 2005 Outline Motivation Implementation Details Results Scenario Previously,
More informationCS 31: Intro to Systems ISAs and Assembly. Martin Gagné Swarthmore College February 7, 2017
CS 31: Intro to Systems ISAs and Assembly Martin Gagné Swarthmore College February 7, 2017 ANNOUNCEMENT All labs will meet in SCI 252 (the robot lab) tomorrow. Overview How to directly interact with hardware
More informationSystem calls and assembler
System calls and assembler Michal Sojka sojkam1@fel.cvut.cz ČVUT, FEL License: CC-BY-SA 4.0 System calls (repetition from lectures) A way for normal applications to invoke operating system (OS) kernel's
More informationModeling and Solving Code Generation for Real
Modeling and Solving Code Generation for Real Christian Schulte KTH Royal Institute of Technology & SICS (Swedish Institute of Computer Science) joint work with: Mats Carlsson SICS Roberto Castañeda Lozano
More informationEx : Write an ALP to evaluate x(y + z) where x = 10H, y = 20H and z = 30H and store the result in a memory location 54000H.
Ex : Write an ALP to evaluate x(y + z) where x = 10H, y = 20H and z = 30H and store the result in a memory location 54000H. MOV AX, 5000H MOV DS, AX MOV AL, 20H MOV CL, 30H ADD AL, CL MOV CL, 10H MUL CL
More informationInstruction Set Architectures
Instruction Set Architectures ISAs Brief history of processors and architectures C, assembly, machine code Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface contain?
More informationIntroduction to IA-32. Jo, Heeseung
Introduction to IA-32 Jo, Heeseung IA-32 Processors Evolutionary design Starting in 1978 with 8086 Added more features as time goes on Still support old features, although obsolete Totally dominate computer
More informationLow-Level Programming
Programming for Electrical and Computer Engineers Low-Level Programming Dr. D. J. Jackson Lecture 15-1 Introduction Previous chapters have described C s high-level, machine-independent features. However,
More informationLow-Complexity Reorder Buffer Architecture*
Low-Complexity Reorder Buffer Architecture* Gurhan Kucuk, Dmitry Ponomarev, Kanad Ghose Department of Computer Science State University of New York Binghamton, NY 13902-6000 http://www.cs.binghamton.edu/~lowpower
More informationINTRODUCTION TO IA-32. Jo, Heeseung
INTRODUCTION TO IA-32 Jo, Heeseung IA-32 PROCESSORS Evolutionary design Starting in 1978 with 8086 Added more features as time goes on Still support old features, although obsolete Totally dominate computer
More informationUNIT 2 PROCESSORS ORGANIZATION CONT.
UNIT 2 PROCESSORS ORGANIZATION CONT. Types of Operand Addresses Numbers Integer/floating point Characters ASCII etc. Logical Data Bits or flags x86 Data Types Operands in 8 bit -Byte 16 bit- word 32 bit-
More informationAutomatic Selection of Compiler Options Using Non-parametric Inferential Statistics
Automatic Selection of Compiler Options Using Non-parametric Inferential Statistics Masayo Haneda Peter M.W. Knijnenburg Harry A.G. Wijshoff LIACS, Leiden University Motivation An optimal compiler optimization
More informationFaculty of Engineering Computer Engineering Department Islamic University of Gaza Assembly Language Lab # 2 Assembly Language Fundamentals
Faculty of Engineering Computer Engineering Department Islamic University of Gaza 2011 Assembly Language Lab # 2 Assembly Language Fundamentals Assembly Language Lab # 2 Assembly Language Fundamentals
More informationAddressing Modes on the x86
Addressing Modes on the x86 register addressing mode mov ax, ax, mov ax, bx mov ax, cx mov ax, dx constant addressing mode mov ax, 25 mov bx, 195 mov cx, 2056 mov dx, 1000 accessing data in memory There
More informationIntroduction to Machine/Assembler Language
COMP 40: Machine Structure and Assembly Language Programming Fall 2017 Introduction to Machine/Assembler Language Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah
More informationDynamically Controlled Resource Allocation in SMT Processors
Dynamically Controlled Resource Allocation in SMT Processors Francisco J. Cazorla, Alex Ramirez, Mateo Valero Departament d Arquitectura de Computadors Universitat Politècnica de Catalunya Jordi Girona
More informationSHEET-2 ANSWERS. [1] Rewrite Program 2-3 to transfer one word at a time instead of one byte.
SHEET-2 ANSWERS [1] Rewrite Program 2-3 to transfer one word at a time instead of one byte. TITLE PROG2-3 PURPOSE: TRANSFER 6 WORDS OF DATA PAGE 60,132.MODEL SMALL.STACK 64.DATA ORG 10H DATA_IN DW 234DH,
More informationUMBC. A register, an immediate or a memory address holding the values on. Stores a symbolic name for the memory location that it represents.
Intel Assembly Format of an assembly instruction: LABEL OPCODE OPERANDS COMMENT DATA1 db 00001000b ;Define DATA1 as decimal 8 START: mov eax, ebx ;Copy ebx to eax LABEL: Stores a symbolic name for the
More informationCS241 Computer Organization Spring Introduction to Assembly
CS241 Computer Organization Spring 2015 Introduction to Assembly 2-05 2015 Outline! Rounding floats: round-to-even! Introduction to Assembly (IA32) move instruction (mov) memory address computation arithmetic
More informationThe von Neumann Machine
The von Neumann Machine 1 1945: John von Neumann Wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC also Alan Turing Konrad Zuse Eckert & Mauchly The basic structure
More informationDnmaloc: a more secure memory allocator
Dnmaloc: a more secure memory allocator 28 September 2005 Yves Younan, Wouter Joosen, Frank Piessens and Hans Van den Eynden DistriNet, Department of Computer Science Katholieke Universiteit Leuven Belgium
More informationAssembly level Programming. 198:211 Computer Architecture. (recall) Von Neumann Architecture. Simplified hardware view. Lecture 10 Fall 2012
19:211 Computer Architecture Lecture 10 Fall 20 Topics:Chapter 3 Assembly Language 3.2 Register Transfer 3. ALU 3.5 Assembly level Programming We are now familiar with high level programming languages
More informationChapter 10. Improving the Runtime Type Checker Type-Flow Analysis
122 Chapter 10 Improving the Runtime Type Checker The runtime overhead of the unoptimized RTC is quite high, because it instruments every use of a memory location in the program and tags every user-defined
More informationA Cross-Architectural Interface for Code Cache Manipulation. Kim Hazelwood and Robert Cohn
A Cross-Architectural Interface for Code Cache Manipulation Kim Hazelwood and Robert Cohn Software-Managed Code Caches Software-managed code caches store transformed code at run time to amortize overhead
More informationMachine/Assembler Language Putting It All Together
COMP 40: Machine Structure and Assembly Language Programming Fall 2015 Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah
More informationLecture 5: Computer Organization Instruction Execution. Computer Organization Block Diagram. Components. General Purpose Registers.
Lecture 5: Computer Organization Instruction Execution Computer Organization Addressing Buses Fetch-Execute Cycle Computer Organization CPU Control Unit U Input Output Memory Components Control Unit fetches
More informationAdministration CS 412/413. Instruction ordering issues. Simplified architecture model. Examples. Impact of instruction ordering
dministration CS 1/13 Introduction to Compilers and Translators ndrew Myers Cornell University P due in 1 week Optional reading: Muchnick 17 Lecture 30: Instruction scheduling 1 pril 00 1 Impact of instruction
More informationOutline. Speculative Register Promotion Using Advanced Load Address Table (ALAT) Motivation. Motivation Example. Motivation
Speculative Register Promotion Using Advanced Load Address Table (ALAT Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew http://www.cs.umn.edu/agassiz Motivation Outline Scheme of speculative register promotion
More informationPOSH: A TLS Compiler that Exploits Program Structure
POSH: A TLS Compiler that Exploits Program Structure Wei Liu, James Tuck, Luis Ceze, Wonsun Ahn, Karin Strauss, Jose Renau and Josep Torrellas Department of Computer Science University of Illinois at Urbana-Champaign
More informationEEM336 Microprocessors I. Data Movement Instructions
EEM336 Microprocessors I Data Movement Instructions Introduction This chapter concentrates on common data movement instructions. 2 Chapter Objectives Upon completion of this chapter, you will be able to:
More informationComputer Systems CSE 410 Autumn Memory Organiza:on and Caches
Computer Systems CSE 410 Autumn 2013 10 Memory Organiza:on and Caches 06 April 2012 Memory Organiza?on 1 Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c);
More informationExploiting Incorrectly Speculated Memory Operations in a Concurrent Multithreaded Architecture (Plus a Few Thoughts on Simulation Methodology)
Exploiting Incorrectly Speculated Memory Operations in a Concurrent Multithreaded Architecture (Plus a Few Thoughts on Simulation Methodology) David J Lilja lilja@eceumnedu Acknowledgements! Graduate students
More informationInstruction Set Architectures
Instruction Set Architectures! ISAs! Brief history of processors and architectures! C, assembly, machine code! Assembly basics: registers, operands, move instructions 1 What should the HW/SW interface
More informationicroprocessor istory of Microprocessor ntel 8086:
Microprocessor A microprocessor is an electronic device which computes on the given input similar to CPU of a computer. It is made by fabricating millions (or billions) of transistors on a single chip.
More informationWhich is the best? Measuring & Improving Performance (if planes were computers...) An architecture example
1 Which is the best? 2 Lecture 05 Performance Metrics and Benchmarking 3 Measuring & Improving Performance (if planes were computers...) Plane People Range (miles) Speed (mph) Avg. Cost (millions) Passenger*Miles
More informationOutline. Register Allocation. Issues. Storing values between defs and uses. Issues. Issues P3 / 2006
P3 / 2006 Register Allocation What is register allocation Spilling More Variations and Optimizations Kostis Sagonas 2 Spring 2006 Storing values between defs and uses Program computes with values value
More informationAccess. Young W. Lim Fri. Young W. Lim Access Fri 1 / 18
Access Young W. Lim 2017-01-27 Fri Young W. Lim Access 2017-01-27 Fri 1 / 18 Outline 1 Introduction References IA32 Operand Forms Data Movement Instructions Young W. Lim Access 2017-01-27 Fri 2 / 18 Based
More informationBloom Filtering Cache Misses for Accurate Data Speculation and Prefetching
Bloom Filtering Cache Misses for Accurate Data Speculation and Prefetching Jih-Kwon Peir, Shih-Chang Lai, Shih-Lien Lu, Jared Stark, Konrad Lai peir@cise.ufl.edu Computer & Information Science and Engineering
More informationMany Cores, One Thread: Dean Tullsen University of California, San Diego
Many Cores, One Thread: The Search for Nontraditional Parallelism University of California, San Diego There are some domains that feature nearly unlimited parallelism. Others, not so much Moore s Law and
More informationModule 3 Instruction Set Architecture (ISA)
Module 3 Instruction Set Architecture (ISA) I S A L E V E L E L E M E N T S O F I N S T R U C T I O N S I N S T R U C T I O N S T Y P E S N U M B E R O F A D D R E S S E S R E G I S T E R S T Y P E S O
More informationHardware and Software Architecture. Chapter 2
Hardware and Software Architecture Chapter 2 1 Basic Components The x86 processor communicates with main memory and I/O devices via buses Data bus for transferring data Address bus for the address of a
More informationSummer 2003 Lecture 26 07/24/03
Summer 2003 Lecture 26 07/24/03 Organization of Data on the Disk The logical organization of the FAT file system on a disk is made up of the following elements. BOOT Sector Root Directory Structure File
More informationInterfacing Compiler and Hardware. Computer Systems Architecture. Processor Types And Instruction Sets. What Instructions Should A Processor Offer?
Interfacing Compiler and Hardware Computer Systems Architecture FORTRAN 90 program C++ program Processor Types And Sets FORTRAN 90 Compiler C++ Compiler set level Hardware 1 2 What s Should A Processor
More informationCS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College February 9, 2016
CS 31: Intro to Systems ISAs and Assembly Kevin Webb Swarthmore College February 9, 2016 Reading Quiz Overview How to directly interact with hardware Instruction set architecture (ISA) Interface between
More informationA Cost Effective Spatial Redundancy with Data-Path Partitioning. Shigeharu Matsusaka and Koji Inoue Fukuoka University Kyushu University/PREST
A Cost Effective Spatial Redundancy with Data-Path Partitioning Shigeharu Matsusaka and Koji Inoue Fukuoka University Kyushu University/PREST 1 Outline Introduction Data-path Partitioning for a dependable
More informationTurning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p
Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p text C program (p1.c p2.c) Compiler (gcc -S) text Asm
More informationChapter 3: Instruc0on Level Parallelism and Its Exploita0on
Chapter 3: Instruc0on Level Parallelism and Its Exploita0on - Abdullah Muzahid Hardware- Based Specula0on (Sec0on 3.6) In mul0ple issue processors, stalls due to branches would be frequent: You may need
More informationIntegrated CPU and Cache Power Management in Multiple Clock Domain Processors
Integrated CPU and Cache Power Management in Multiple Clock Domain Processors Nevine AbouGhazaleh, Bruce Childers, Daniel Mossé & Rami Melhem Department of Computer Science University of Pittsburgh HiPEAC
More informationCS 31: Intro to Systems ISAs and Assembly. Kevin Webb Swarthmore College September 25, 2018
CS 31: Intro to Systems ISAs and Assembly Kevin Webb Swarthmore College September 25, 2018 Overview How to directly interact with hardware Instruction set architecture (ISA) Interface between programmer
More informationReverse Engineering II: The Basics
Reverse Engineering II: The Basics Gergely Erdélyi Senior Manager, Anti-malware Research Protecting the irreplaceable f-secure.com Binary Numbers 1 0 1 1 - Nibble B 1 0 1 1 1 1 0 1 - Byte B D 1 0 1 1 1
More informationProgram Op*miza*on and Analysis. Chenyang Lu CSE 467S
Program Op*miza*on and Analysis Chenyang Lu CSE 467S 1 Program Transforma*on op#mize Analyze HLL compile assembly assemble Physical Address Rela5ve Address assembly object load executable link Absolute
More informationAccess. Young W. Lim Sat. Young W. Lim Access Sat 1 / 19
Access Young W. Lim 2017-06-10 Sat Young W. Lim Access 2017-06-10 Sat 1 / 19 Outline 1 Introduction References IA32 Operand Forms Data Movement Instructions Data Movement Examples Young W. Lim Access 2017-06-10
More information/INFOMOV/ Optimization & Vectorization. J. Bikker - Sep-Nov Lecture 2: Low Level. Welcome!
/INFOMOV/ Optimization & Vectorization J. Bikker - Sep-Nov 2017 - Lecture 2: Low Level Welcome! INFOMOV Lecture 2 Low Level 2 Previously in INFOMOV Consistent Approach (0.) Determine optimization requirements
More informationA Global Progressive Register Allocator
A Global Progressive Register Allocator David Ryan Koes Seth Copen Goldstein School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213-3891 {dkoes,seth}@cs.cmu.edu Abstract This paper
More information