Register Alloca.on Deconstructed. David Ryan Koes Seth Copen Goldstein

Size: px
Start display at page:

Download "Register Alloca.on Deconstructed. David Ryan Koes Seth Copen Goldstein"

Transcription

1 Register Alloca.on Deconstructed David Ryan Koes Seth Copen Goldstein 12th Interna+onal Workshop on So3ware and Compilers for Embedded Systems April 24, 12009

2 Register Alloca:on Problem unbounded number of program variables v = 1 w = v + 3 x = w + v u = v t = u + x print(x); print(w); print(t); print(u); register allocator limited number of processor registers + slow memory eax ebx ecx edx esi edi esp ebp 2

3 Register Alloca:on Graph coloring Linear scan Op:mal frameworks Move elimina:on allocators 3

4 Register Alloca:on Graph coloring Linear scan Op:mal frameworks Move elimina:on allocators 3

5 Ques:ons What is the penalty of decomposing register alloca:on into individual components? What is the individual impact of each component on code quality? How far from op:mal are exis:ng heuris:cs? Our goal is to answer these ques.ons. An op:mal register alloca:on framework is used to empirically evaluate the importance of the components of register alloca:on, the impact of component integra:on, and the effec:veness of exis:ng heuris:cs. 4

6 Outline Mo:va:on Register Alloca:on Components Move Inser:on Coalescing Spilling Assignment Methodology Results Conclusion 5

7 Move Inser:on Addi:onal move instruc:ons can simplify assignment problem Can eliminate need to spill Does not directly improve code quality L1: write b read a write c read b write a read c branch L1 Live a a b c RA a a b c 6 move r0 -> r1

8 Move Inser:on Evalua:on full: move instruc:ons may be inserted at any program point limited: move instruc:ons may be inserted only at the entry and exit of basic blocks none: no register to register move instruc:ons are generated by the allocator 7

9 Coalescing Eliminate move instruc:ons by assigning each operand to the same loca:on Can be performed as separate pass lose ability to coalesce with physical registers lose ability to coalesce uncoalescables Pre alloc Coalescing Physical Reg Uncoalescable t1 t1 t2 t1 t2 t3 t3 t3 t3 t3 t3 r0 t1 t1 t2 t1 t2 t1 8

10 Coalescing Evalua:on integrated op.mal: move coalescing is solved op:mally as part of the complete register alloca:on problem integrated op.mal ignoring uncoalescable: the register allocator fully op:mizes only those move instruc:ons iden:fied as coalescable prior to register alloca:on separate op.mal: move coalescing is solved op:mally as a separate problem prior to alloca:on separate aggressive: a greedy heuris:c aggressively eliminates coalescable moves prior to register alloca:on none: no coalescing is performed 9

11 Spilling Can be performed as a separate pass spill variables to memory to meet register needs at each program point if move and swap inser:ons are allowed, assignment is now possible reg MAXLIVE #REG = 2 b c d 10 mem

12 Spilling Evalua:on integrated op.mal: spill code genera:on is solved op:mally as part of the complete register alloca:on problem separate op.mal: the spill code genera:on problem (reducing max liveness to meet register availability) is solved op:mally as a standalone problem separate heuris.c: the spill code genera:on problem is solved as a standalone problem using a heuris:c algorithm 11

13 Assignment assign physical register(s) to each variable at every program point may change assignment of variable by inser:ng move instruc:on if spilling and coalescing are performed separately, leaves assignment op:mizes for register preferences 12

14 Assignment Evalua:on integrated op.mal: assignment is solved op:mally as part of the complete register alloca:on problem graph heuris.c: a graph coloring based heuris:c is used to assign registers to the results of spill code genera:on; move instruc:ons may be inserted to improve colorability linear scan heuris.c: a linear scan based heuris:c is used to assign register to the results of spill code genera:on; move instruc:ons may be inserted to improve colorability 13

15 Methodology Implement op:mal register alloca:on framework in LLVM 2.4 Consider four target architectures and two code quality metrics CISC Fewer registers More registers RISC 14

16 Limita:ons Self selec:ng bias in results limited to those func:ons where an op:mal solu:on can be found in reasonable :meframe however, qualita:ve results do not appear to change as more :me is allowed for op:mal allocator Implement swap using memory loca:on Performance metric necessarily inexact (weighted sum of memory opera:ons) Evaluate performance only on desktop processors 15

17 Results: Code Size Evaluate subset of Mibench Consider all func:ons where op:mal solu:ons can be found in <10 minutes more than 70% coverage of func:ons Report code size increase rela:ve to fully op:mal (1.0 best possible result) 16

18 Results: Code Performance Evaluate subset of SPEC2006 Op:mize only cri:cal(>85% of running :me) func:ons Intel Core 2 Quad 2.4GHz Report geometric mean rela:ve to fully op:mal model Possible to do beier than op:mal due to limita:ons of metric 17

19 Move Inser:on: Code Size 18

20 Move Inser:on: Code Performance 19

21 Coalescing: Code Size 20

22 Coalescing: Code Performance 21

23 Spilling: Code Size 22

24 Spilling: Code Performance 23

25 Assignment: Code Size 24

26 Assignment: Code Performance 25

27 Heuris:cs: Code Size 26

28 Heuris:cs: Code Performance 27

29 Conclusions When targe:ng processor performance, new register allocator designs should focus on solving spill code op.miza.on as the coalescing, move inser:on, and register assignment problems are adequately solved using exis:ng heuris:cs. When targe:ng code size, new register allocator designs should focus on solving both the spill code op.miza.on and register assignment problems, possibly in an integrated framework. 28

: Advanced Compiler Design. 8.0 Instruc?on scheduling

: Advanced Compiler Design. 8.0 Instruc?on scheduling 6-80: Advanced Compiler Design 8.0 Instruc?on scheduling Thomas R. Gross Computer Science Department ETH Zurich, Switzerland Overview 8. Instruc?on scheduling basics 8. Scheduling for ILP processors 8.

More information

Register Allocation, iii. Bringing in functions & using spilling & coalescing

Register Allocation, iii. Bringing in functions & using spilling & coalescing Register Allocation, iii Bringing in functions & using spilling & coalescing 1 Function Calls ;; f(x) = let y = g(x) ;; in h(y+x) + y*5 (:f (x

More information

Punctual Coalescing. Fernando Magno Quintão Pereira

Punctual Coalescing. Fernando Magno Quintão Pereira Punctual Coalescing Fernando Magno Quintão Pereira Register Coalescing Register coalescing is an op7miza7on on top of register alloca7on. The objec7ve is to map both variables used in a copy instruc7on

More information

Data Representa/ons: IA32 + x86-64

Data Representa/ons: IA32 + x86-64 X86-64 Instruc/on Set Architecture Instructor: Sanjeev Se(a 1 Data Representa/ons: IA32 + x86-64 Sizes of C Objects (in Bytes) C Data Type Typical 32- bit Intel IA32 x86-64 unsigned 4 4 4 int 4 4 4 long

More information

UNIT V: CENTRAL PROCESSING UNIT

UNIT V: CENTRAL PROCESSING UNIT UNIT V: CENTRAL PROCESSING UNIT Agenda Basic Instruc1on Cycle & Sets Addressing Instruc1on Format Processor Organiza1on Register Organiza1on Pipeline Processors Instruc1on Pipelining Co-Processors RISC

More information

Lecture 1 Introduc-on

Lecture 1 Introduc-on Lecture 1 Introduc-on What would you get out of this course? Structure of a Compiler Op9miza9on Example 15-745: Introduc9on 1 What Do Compilers Do? 1. Translate one language into another e.g., convert

More information

Reading assignment. Chapter 3.1, 3.2 Chapter 4.1, 4.3

Reading assignment. Chapter 3.1, 3.2 Chapter 4.1, 4.3 Reading assignment Chapter 3.1, 3.2 Chapter 4.1, 4.3 1 Outline Introduc5on to assembly programing Introduc5on to Y86 Y86 instruc5ons, encoding and execu5on 2 Assembly The CPU uses machine language to perform

More information

MODE (mod) FIELD CODES. mod MEMORY MODE: 8-BIT DISPLACEMENT MEMORY MODE: 16- OR 32- BIT DISPLACEMENT REGISTER MODE

MODE (mod) FIELD CODES. mod MEMORY MODE: 8-BIT DISPLACEMENT MEMORY MODE: 16- OR 32- BIT DISPLACEMENT REGISTER MODE EXERCISE 9. Determine the mod bits from Figure 7-24 and write them in Table 7-7. MODE (mod) FIELD CODES mod 00 01 10 DESCRIPTION MEMORY MODE: NO DISPLACEMENT FOLLOWS MEMORY MODE: 8-BIT DISPLACEMENT MEMORY

More information

Lecture 3: Instruction Set Architecture

Lecture 3: Instruction Set Architecture Lecture 3: Instruction Set Architecture Interface Software/compiler instruction set hardware Design Space of ISA Five Primary Dimensions Number of explicit operands ( 0, 1, 2, 3 ) Operand Storage Where

More information

Compiler Optimization Intermediate Representation

Compiler Optimization Intermediate Representation Compiler Optimization Intermediate Representation Virendra Singh Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Machine Code and Assemblers November 6

Machine Code and Assemblers November 6 Machine Code and Assemblers November 6 CSC201 Section 002 Fall, 2000 Definitions Assembly time vs. link time vs. load time vs. run time.c file.asm file.obj file.exe file compiler assembler linker Running

More information

Chapter 2. lw $s1,100($s2) $s1 = Memory[$s2+100] sw $s1,100($s2) Memory[$s2+100] = $s1

Chapter 2. lw $s1,100($s2) $s1 = Memory[$s2+100] sw $s1,100($s2) Memory[$s2+100] = $s1 Chapter 2 1 MIPS Instructions Instruction Meaning add $s1,$s2,$s3 $s1 = $s2 + $s3 sub $s1,$s2,$s3 $s1 = $s2 $s3 addi $s1,$s2,4 $s1 = $s2 + 4 ori $s1,$s2,4 $s2 = $s2 4 lw $s1,100($s2) $s1 = Memory[$s2+100]

More information

Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation

Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation 2nd International Workshop on Overlay Architectures for FPGAs (OLAF) 2016 Kevin Andryc, Tedy Thomas and Russell Tessier University of Massachusetts

More information

Review Questions. 1 The DRAM problem [5 points] Suggest a solution. 2 Big versus Little Endian Addressing [5 points]

Review Questions. 1 The DRAM problem [5 points] Suggest a solution. 2 Big versus Little Endian Addressing [5 points] Review Questions 1 The DRAM problem [5 points] Suggest a solution 2 Big versus Little Endian Addressing [5 points] Consider the 32-bit hexadecimal number 0x21d3ea7d. 1. What is the binary representation

More information

PhD in Computer And Control Engineering XXVII cycle. Torino February 27th, 2015.

PhD in Computer And Control Engineering XXVII cycle. Torino February 27th, 2015. PhD in Computer And Control Engineering XXVII cycle Torino February 27th, 2015. Parallel and reconfigurable systems are more and more used in a wide number of applica7ons and environments, ranging from

More information

Machine- level Programming

Machine- level Programming Machine- level Programming Topics Assembly Programmer s Execu:on Model Accessing Informa:on Registers Memory Arithme:c opera:ons 1! Evolu:onary Design Star:ng in 1978 with 8086 IA32 Processors Added more

More information

Chapter 3: Instruc0on Level Parallelism and Its Exploita0on

Chapter 3: Instruc0on Level Parallelism and Its Exploita0on Chapter 3: Instruc0on Level Parallelism and Its Exploita0on - Abdullah Muzahid Hardware- Based Specula0on (Sec0on 3.6) In mul0ple issue processors, stalls due to branches would be frequent: You may need

More information

High-Level Synthesis Creating Custom Circuits from High-Level Code

High-Level Synthesis Creating Custom Circuits from High-Level Code High-Level Synthesis Creating Custom Circuits from High-Level Code Hao Zheng Comp Sci & Eng University of South Florida Exis%ng Design Flow Register-transfer (RT) synthesis - Specify RT structure (muxes,

More information

ADDRESSING MODES. Operands specify the data to be used by an instruction

ADDRESSING MODES. Operands specify the data to be used by an instruction ADDRESSING MODES Operands specify the data to be used by an instruction An addressing mode refers to the way in which the data is specified by an operand 1. An operand is said to be direct when it specifies

More information

UMBC. A register, an immediate or a memory address holding the values on. Stores a symbolic name for the memory location that it represents.

UMBC. A register, an immediate or a memory address holding the values on. Stores a symbolic name for the memory location that it represents. Intel Assembly Format of an assembly instruction: LABEL OPCODE OPERANDS COMMENT DATA1 db 00001000b ;Define DATA1 as decimal 8 START: mov eax, ebx ;Copy ebx to eax LABEL: Stores a symbolic name for the

More information

Register Allocation Deconstructed

Register Allocation Deconstructed Register Allocation Deconstructed David Ryan Koes Carnegie Mellon University Pittsburgh, PA dkoes@cs.cmu.edu Seth Copen Goldstein Carnegie Mellon University Pittsburgh, PA seth@cs.cmu.edu Abstract Register

More information

Extending Heuris.c Search

Extending Heuris.c Search Extending Heuris.c Search Talk at Hebrew University, Cri.cal MAS group Roni Stern Department of Informa.on System Engineering, Ben Gurion University, Israel 1 Heuris.c search 2 Outline Combining lookahead

More information

2012 David Black- Schaffer 1

2012 David Black- Schaffer 1 1 Instruc*on Set Architectures 2 Introduc

More information

Machine Programming I: Basics

Machine Programming I: Basics Machine- level Representa1on of Programs (Chapter 3): Machine- Programming Basics Instructor: Sanjeev Se1a 1 Machine Programming I: Basics History of Intel processors and architectures C, assembly, machine

More information

The x86 Architecture

The x86 Architecture The x86 Architecture Lecture 24 Intel Manual, Vol. 1, Chapter 3 Robb T. Koether Hampden-Sydney College Fri, Mar 20, 2015 Robb T. Koether (Hampden-Sydney College) The x86 Architecture Fri, Mar 20, 2015

More information

The von Neumann Machine

The von Neumann Machine The von Neumann Machine 1 1945: John von Neumann Wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC also Alan Turing Konrad Zuse Eckert & Mauchly The basic structure

More information

x86 assembly CS449 Fall 2017

x86 assembly CS449 Fall 2017 x86 assembly CS449 Fall 2017 x86 is a CISC CISC (Complex Instruction Set Computer) e.g. x86 Hundreds of (complex) instructions Only a handful of registers RISC (Reduced Instruction Set Computer) e.g. MIPS

More information

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions 1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a

More information

History of the Intel 80x86

History of the Intel 80x86 Intel s IA-32 Architecture Cptr280 Dr Curtis Nelson History of the Intel 80x86 1971 - Intel invents the microprocessor, the 4004 1975-8080 introduced 8-bit microprocessor 1978-8086 introduced 16 bit microprocessor

More information

Lecture 15 Intel Manual, Vol. 1, Chapter 3. Fri, Mar 6, Hampden-Sydney College. The x86 Architecture. Robb T. Koether. Overview of the x86

Lecture 15 Intel Manual, Vol. 1, Chapter 3. Fri, Mar 6, Hampden-Sydney College. The x86 Architecture. Robb T. Koether. Overview of the x86 Lecture 15 Intel Manual, Vol. 1, Chapter 3 Hampden-Sydney College Fri, Mar 6, 2009 Outline 1 2 Overview See the reference IA-32 Intel Software Developer s Manual Volume 1: Basic, Chapter 3. Instructions

More information

Digital Forensics Lecture 3 - Reverse Engineering

Digital Forensics Lecture 3 - Reverse Engineering Digital Forensics Lecture 3 - Reverse Engineering Low-Level Software Akbar S. Namin Texas Tech University Spring 2017 Reverse Engineering High-Level Software Low-level aspects of software are often the

More information

x86 assembly CS449 Spring 2016

x86 assembly CS449 Spring 2016 x86 assembly CS449 Spring 2016 CISC vs. RISC CISC [Complex instruction set Computing] - larger, more feature-rich instruction set (more operations, addressing modes, etc.). slower clock speeds. fewer general

More information

How Software Executes

How Software Executes How Software Executes CS-576 Systems Security Instructor: Georgios Portokalidis Overview Introduction Anatomy of a program Basic assembly Anatomy of function calls (and returns) Memory Safety Programming

More information

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1

CSE P 501 Compilers. Instruc7on Selec7on Hal Perkins Winter UW CSE P 501 Winter 2016 N-1 CSE P 501 Compilers Instruc7on Selec7on Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 N-1 Agenda Compiler back- end organiza7on Instruc7on selec7on tree padern matching Credits: Slides by Keith Cooper

More information

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08 CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 21: Generating Pentium Code 10 March 08 CS 412/413 Spring 2008 Introduction to Compilers 1 Simple Code Generation Three-address code makes it

More information

Compiler Construction D7011E

Compiler Construction D7011E Compiler Construction D7011E Lecture 8: Introduction to code generation Viktor Leijon Slides largely by Johan Nordlander with material generously provided by Mark P. Jones. 1 What is a Compiler? Compilers

More information

Research opportuni/es with me

Research opportuni/es with me Research opportuni/es with me Independent study for credit - Build PL tools (parsers, editors) e.g., JDial - Build educa/on tools (e.g., Automata Tutor) - Automata theory problems e.g., AutomatArk - Research

More information

Compiler: Control Flow Optimization

Compiler: Control Flow Optimization Compiler: Control Flow Optimization Virendra Singh Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay http://www.ee.iitb.ac.in/~viren/

More information

Complex Instruction Set Computer (CISC)

Complex Instruction Set Computer (CISC) Introduction ti to IA-32 IA-32 Processors Evolutionary design Starting in 1978 with 886 Added more features as time goes on Still support old features, although obsolete Totally dominate computer market

More information

CSSE232 Computer Architecture I. Datapath

CSSE232 Computer Architecture I. Datapath CSSE232 Computer Architecture I Datapath Class Status Reading Sec;ons 4.1-3 Project Project group milestone assigned Indicate who you want to work with Indicate who you don t want to work with Due next

More information

Baggy bounds checking. Periklis Akri5dis, Manuel Costa, Miguel Castro, Steven Hand

Baggy bounds checking. Periklis Akri5dis, Manuel Costa, Miguel Castro, Steven Hand Baggy bounds checking Periklis Akri5dis, Manuel Costa, Miguel Castro, Steven Hand C/C++ programs are vulnerable Lots of exis5ng code in C and C++ More being wrieen every day C/C++ programs are prone to

More information

Outline. What Makes a Good ISA? Programmability. Implementability

Outline. What Makes a Good ISA? Programmability. Implementability Outline Instruction Sets in General MIPS Assembly Programming Other Instruction Sets Goals of ISA Design RISC vs. CISC Intel x86 (IA-32) What Makes a Good ISA? Programmability Easy to express programs

More information

X86 Stack Calling Function POV

X86 Stack Calling Function POV X86 Stack Calling Function POV Computer Systems Section 3.7 Stack Frame Reg Value ebp xffff FFF0 esp xffff FFE0 eax x0000 000E Memory Address Value xffff FFF8 xffff FFF4 x0000 0004 xffff FFF4 x0000 0003

More information

Algorithms for Auto- tuning OpenACC Accelerated Kernels

Algorithms for Auto- tuning OpenACC Accelerated Kernels Outline Algorithms for Auto- tuning OpenACC Accelerated Kernels Fatemah Al- Zayer 1, Ameerah Al- Mu2ry 1, Mona Al- Shahrani 1, Saber Feki 2, and David Keyes 1 1 Extreme Compu,ng Research Center, 2 KAUST

More information

Sungkyunkwan University

Sungkyunkwan University u Complete addressing mode, address computa3on (leal) u Arithme3c opera3ons u x86-64 u Control: Condi3on codes u Condi3onal branches u While loops int absdiff(int x, int y) { int result; if (x > y) { result

More information

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S

Program Op*miza*on and Analysis. Chenyang Lu CSE 467S Program Op*miza*on and Analysis Chenyang Lu CSE 467S 1 Program Transforma*on op#mize Analyze HLL compile assembly assemble Physical Address Rela5ve Address assembly object load executable link Absolute

More information

Chapter 11. Addressing Modes

Chapter 11. Addressing Modes Chapter 11 Addressing Modes 1 2 Chapter 11 11 1 Register addressing mode is the most efficient addressing mode because the operands are in the processor itself (there is no need to access memory). Chapter

More information

Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach

Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach Reusability of So/ware- Defined Networking Applica=ons: A Run=me, Mul=- Controller Approach Roberto Doriguzzi Corin (CREATE- NET), Pedro A. Aranda Gu=érrez (Telefonica), Elisa Rojas (Telcaria), Holger

More information

Using Graph- Based Characteriza4on for Predic4ve Modeling of Vectorizable Loop Nests

Using Graph- Based Characteriza4on for Predic4ve Modeling of Vectorizable Loop Nests Using Graph- Based Characteriza4on for Predic4ve Modeling of Vectorizable Loop Nests William Killian PhD Prelimary Exam Presenta4on Department of Computer and Informa4on Science CommiIee John Cavazos and

More information

Lecture 4 CIS 341: COMPILERS

Lecture 4 CIS 341: COMPILERS Lecture 4 CIS 341: COMPILERS CIS 341 Announcements HW2: X86lite Available on the course web pages. Due: Weds. Feb. 7 th at midnight Pair-programming project Zdancewic CIS 341: Compilers 2 X86 Schematic

More information

MICROPROCESSOR TECHNOLOGY

MICROPROCESSOR TECHNOLOGY MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 5 Ch.2 A Top-Level View of Computer Function (Cont.) 24-Feb-15 1 CPU (CISC & RISC) Intel CISC, Motorola RISC CISC (Complex Instruction

More information

Dynamic Translation for EPIC Architectures

Dynamic Translation for EPIC Architectures Dynamic Translation for EPIC Architectures David R. Ditzel Chief Architect for Hybrid Computing, VP IAG Intel Corporation Presentation for 8 th Workshop on EPIC Architectures April 24, 2010 1 Dynamic Translation

More information

Dealing with Register Hierarchies

Dealing with Register Hierarchies Dealing with Register Hierarchies Matthias Braun (MatzeB) / LLVM Developers' Meeting 2016 r1;r2;r3;r4 r0;r1;r2;r3 r0,r1,r2,r3 S0 D0 S1 Q0 S2 D1 S3 r3;r4;r5 r2;r3;r4 r1;r2;r3 r2;r3 r1,r2,r3,r4 r3,r4,r5,r6

More information

hnp://

hnp:// The bots face off in a tournament against one another and about an equal number of humans, with each player trying to score points by elimina&ng its opponents. Each player also has a "judging gun" in addi&on

More information

Procedure Calls. Young W. Lim Sat. Young W. Lim Procedure Calls Sat 1 / 27

Procedure Calls. Young W. Lim Sat. Young W. Lim Procedure Calls Sat 1 / 27 Procedure Calls Young W. Lim 2016-11-05 Sat Young W. Lim Procedure Calls 2016-11-05 Sat 1 / 27 Outline 1 Introduction References Stack Background Transferring Control Register Usage Conventions Procedure

More information

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code

A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code A Script- Based Autotuning Compiler System to Generate High- Performance CUDA code Malik Khan, Protonu Basu, Gabe Rudy, Mary Hall, Chun Chen, Jacqueline Chame Mo:va:on Challenges to programming the GPU

More information

Region of memory managed with stack discipline Grows toward lower addresses. Register %esp contains lowest stack address = address of top element

Region of memory managed with stack discipline Grows toward lower addresses. Register %esp contains lowest stack address = address of top element Machine Representa/on of Programs: Procedures Instructors: Sanjeev Se(a 1 IA32 Stack Region of memory managed with stack discipline Grows toward lower addresses Stack BoGom Increasing Addresses Register

More information

Summary. Alexandre David

Summary. Alexandre David 3.8-3.12 Summary Alexandre David 3.8.4 & 3.8.5 n Array accesses 12-04-2011 Aalborg University, CART 2 Carnegie Mellon N X N Matrix Code Fixed dimensions Know value of N at compile 2me #define N 16 typedef

More information

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University

Assembly I: Basic Operations. Computer Systems Laboratory Sungkyunkwan University Assembly I: Basic Operations Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Moving Data (1) Moving data: movl source, dest Move 4-byte ( long )

More information

Outline. What Makes a Good ISA? Programmability. Implementability. Programmability Easy to express programs efficiently?

Outline. What Makes a Good ISA? Programmability. Implementability. Programmability Easy to express programs efficiently? Outline Instruction Sets in General MIPS Assembly Programming Other Instruction Sets Goals of ISA Design RISC vs. CISC Intel x86 (IA-32) What Makes a Good ISA? Programmability Easy to express programs

More information

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation Compiler Construction SMD163 Lecture 8: Introduction to code generation Viktor Leijon & Peter Jonsson with slides by Johan Nordlander Contains material generously provided by Mark P. Jones What is a Compiler?

More information

Instruction Selection. Problems. DAG Tiling. Pentium ISA. Example Tiling CS412/CS413. Introduction to Compilers Tim Teitelbaum

Instruction Selection. Problems. DAG Tiling. Pentium ISA. Example Tiling CS412/CS413. Introduction to Compilers Tim Teitelbaum Instruction Selection CS42/CS43 Introduction to Compilers Tim Teitelbaum Lecture 32: More Instruction Selection 20 Apr 05. Translate low-level IR code into DAG representation 2. Then find a good tiling

More information

Search Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson

Search Engines. Informa1on Retrieval in Prac1ce. Annota1ons by Michael L. Nelson Search Engines Informa1on Retrieval in Prac1ce Annota1ons by Michael L. Nelson All slides Addison Wesley, 2008 Evalua1on Evalua1on is key to building effec$ve and efficient search engines measurement usually

More information

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth

Registers. Ray Seyfarth. September 8, Bit Intel Assembly Language c 2011 Ray Seyfarth Registers Ray Seyfarth September 8, 2011 Outline 1 Register basics 2 Moving a constant into a register 3 Moving a value from memory into a register 4 Moving values from a register into memory 5 Moving

More information

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006)

Data Flow Analysis. Suman Jana. Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data Flow Analysis Suman Jana Adopted From U Penn CIS 570: Modern Programming Language Implementa=on (Autumn 2006) Data flow analysis Derives informa=on about the dynamic behavior of a program by only

More information

BLAS. Basic Linear Algebra Subprograms

BLAS. Basic Linear Algebra Subprograms BLAS Basic opera+ons with vectors and matrices dominates scien+fic compu+ng programs To achieve high efficiency and clean computer programs an effort has been made in the last few decades to standardize

More information

Computer Systems CSE 410 Autumn Memory Organiza:on and Caches

Computer Systems CSE 410 Autumn Memory Organiza:on and Caches Computer Systems CSE 410 Autumn 2013 10 Memory Organiza:on and Caches 06 April 2012 Memory Organiza?on 1 Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c);

More information

COMP 210 Example Question Exam 2 (Solutions at the bottom)

COMP 210 Example Question Exam 2 (Solutions at the bottom) _ Problem 1. COMP 210 Example Question Exam 2 (Solutions at the bottom) This question will test your ability to reconstruct C code from the assembled output. On the opposing page, there is asm code for

More information

mith College Computer Science CSC231 - Assembly Week #3 Dominique Thiébaut

mith College Computer Science CSC231 - Assembly Week #3 Dominique Thiébaut mith College Computer Science CSC231 - Assembly Week #3 Dominique Thiébaut dthiebaut@smith.edu memory mov add registers hexdump listings number systems Let's Review Last Week's Material section.data

More information

Objec0ves. Gain understanding of what IDA Pro is and what it can do. Expose students to the tool GUI

Objec0ves. Gain understanding of what IDA Pro is and what it can do. Expose students to the tool GUI Intro to IDA Pro 31/15 Objec0ves Gain understanding of what IDA Pro is and what it can do Expose students to the tool GUI Discuss some of the important func

More information

CS101: Fundamentals of Computer Programming. Dr. Tejada www-bcf.usc.edu/~stejada Week 1 Basic Elements of C++

CS101: Fundamentals of Computer Programming. Dr. Tejada www-bcf.usc.edu/~stejada Week 1 Basic Elements of C++ CS101: Fundamentals of Computer Programming Dr. Tejada stejada@usc.edu www-bcf.usc.edu/~stejada Week 1 Basic Elements of C++ 10 Stacks of Coins You have 10 stacks with 10 coins each that look and feel

More information

Equa%onal Reasoning of x86 Assembly Code. Kevin Coogan and Saumya Debray University of Arizona, Tucson, AZ

Equa%onal Reasoning of x86 Assembly Code. Kevin Coogan and Saumya Debray University of Arizona, Tucson, AZ Equa%onal Reasoning of x86 Assembly Code Kevin Coogan and Saumya Debray University of Arizona, Tucson, AZ Assembly Code is Source Code Commercial libraries oeen do not come with source code, but there

More information

Compiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine.

Compiler construction. x86 architecture. This lecture. Lecture 6: Code generation for x86. x86: assembly for a real machine. This lecture Compiler construction Lecture 6: Code generation for x86 Magnus Myreen Spring 2018 Chalmers University of Technology Gothenburg University x86 architecture s Some x86 instructions From LLVM

More information

The von Neumann Machine

The von Neumann Machine The von Neumann Machine 1 1945: John von Neumann Wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC also Alan Turing Konrad Zuse Eckert & Mauchly The basic structure

More information

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying

More information

Code Genera*on for Control Flow Constructs

Code Genera*on for Control Flow Constructs Code Genera*on for Control Flow Constructs 1 Roadmap Last *me: Got the basics of MIPS CodeGen for some AST node types This *me: Do the rest of the AST nodes Introduce control flow graphs Scanner Parser

More information

Module 3 Instruction Set Architecture (ISA)

Module 3 Instruction Set Architecture (ISA) Module 3 Instruction Set Architecture (ISA) I S A L E V E L E L E M E N T S O F I N S T R U C T I O N S I N S T R U C T I O N S T Y P E S N U M B E R O F A D D R E S S E S R E G I S T E R S T Y P E S O

More information

CS241 Computer Organization Spring 2015 IA

CS241 Computer Organization Spring 2015 IA CS241 Computer Organization Spring 2015 IA-32 2-10 2015 Outline! Review HW#3 and Quiz#1! More on Assembly (IA32) move instruction (mov) memory address computation arithmetic & logic instructions (add,

More information

Today: Machine Programming I: Basics

Today: Machine Programming I: Basics Today: Machine Programming I: Basics History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Intro to x86-64 1 Intel x86 Processors Totally dominate

More information

CMSC Lecture 03. UMBC, CMSC313, Richard Chang

CMSC Lecture 03. UMBC, CMSC313, Richard Chang CMSC Lecture 03 Moore s Law Evolution of the Pentium Chip IA-32 Basic Execution Environment IA-32 General Purpose Registers Hello World in Linux Assembly Language Addressing Modes UMBC, CMSC313, Richard

More information

Assembly I: Basic Operations. Jo, Heeseung

Assembly I: Basic Operations. Jo, Heeseung Assembly I: Basic Operations Jo, Heeseung Moving Data (1) Moving data: movl source, dest Move 4-byte ("long") word Lots of these in typical code Operand types Immediate: constant integer data - Like C

More information

Introduction to IA-32. Jo, Heeseung

Introduction to IA-32. Jo, Heeseung Introduction to IA-32 Jo, Heeseung IA-32 Processors Evolutionary design Starting in 1978 with 8086 Added more features as time goes on Still support old features, although obsolete Totally dominate computer

More information

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung

ASSEMBLY I: BASIC OPERATIONS. Jo, Heeseung ASSEMBLY I: BASIC OPERATIONS Jo, Heeseung MOVING DATA (1) Moving data: movl source, dest Move 4-byte ("long") word Lots of these in typical code Operand types Immediate: constant integer data - Like C

More information

W4118: PC Hardware and x86. Junfeng Yang

W4118: PC Hardware and x86. Junfeng Yang W4118: PC Hardware and x86 Junfeng Yang A PC How to make it do something useful? 2 Outline PC organization x86 instruction set gcc calling conventions PC emulation 3 PC board 4 PC organization One or more

More information

Communicating with People (2.8)

Communicating with People (2.8) Communicating with People (2.8) For communication Use characters and strings Characters 8-bit (one byte) data for ASCII lb $t0, 0($sp) ; load byte Load a byte from memory, placing it in the rightmost 8-bits

More information

INTRODUCTION TO IA-32. Jo, Heeseung

INTRODUCTION TO IA-32. Jo, Heeseung INTRODUCTION TO IA-32 Jo, Heeseung IA-32 PROCESSORS Evolutionary design Starting in 1978 with 8086 Added more features as time goes on Still support old features, although obsolete Totally dominate computer

More information

Systems Architecture I

Systems Architecture I Systems Architecture I Topics Assemblers, Linkers, and Loaders * Alternative Instruction Sets ** *This lecture was derived from material in the text (sec. 3.8-3.9). **This lecture was derived from material

More information

Where we are. Instruction selection. Abstract Assembly. CS 4120 Introduction to Compilers

Where we are. Instruction selection. Abstract Assembly. CS 4120 Introduction to Compilers Where we are CS 420 Introduction to Compilers Andrew Myers Cornell University Lecture 8: Instruction Selection 5 Oct 20 Intermediate code Canonical intermediate code Abstract assembly code Assembly code

More information

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit

Assembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Slides prepared by Kip R. Irvine Revision date: 09/25/2002

More information

Instruction Set Architecture

Instruction Set Architecture CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant adapted by Jason Fritts http://csapp.cs.cmu.edu CS:APP2e Hardware Architecture - using Y86 ISA For learning aspects

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 32: Pipeline Parallelism 3 Instructor: Dan Garcia inst.eecs.berkeley.edu/~cs61c! Compu@ng in the News At a laboratory in São Paulo,

More information

Procedure Calls. Young W. Lim Mon. Young W. Lim Procedure Calls Mon 1 / 29

Procedure Calls. Young W. Lim Mon. Young W. Lim Procedure Calls Mon 1 / 29 Procedure Calls Young W. Lim 2017-08-21 Mon Young W. Lim Procedure Calls 2017-08-21 Mon 1 / 29 Outline 1 Introduction Based on Stack Background Transferring Control Register Usage Conventions Procedure

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Department of Statistics and Computer Science University of Sri Jayewardenepura Instruction Set Architecture (ISA) Level 2 Introduction 3 Instruction Set Architecture

More information

Assembly Language. Lecture 2 x86 Processor Architecture

Assembly Language. Lecture 2 x86 Processor Architecture Assembly Language Lecture 2 x86 Processor Architecture Ahmed Sallam Slides based on original lecture slides by Dr. Mahmoud Elgayyar Introduction to the course Outcomes of Lecture 1 Always check the course

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core.

CS 61C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associa0vity. Smart Phone. Today s Lecture. Core. CS 6C: Great Ideas in Computer Architecture (Machine Structures) More Cache: Set Associavity Instructors: Randy H Katz David A PaGerson Guest Lecture: Krste Asanovic hgp://insteecsberkeleyedu/~cs6c/fa

More information

Credits and Disclaimers

Credits and Disclaimers Credits and Disclaimers 1 The examples and discussion in the following slides have been adapted from a variety of sources, including: Chapter 3 of Computer Systems 2 nd Edition by Bryant and O'Hallaron

More information

MPI Performance Analysis Trace Analyzer and Collector

MPI Performance Analysis Trace Analyzer and Collector MPI Performance Analysis Trace Analyzer and Collector Berk ONAT İTÜ Bilişim Enstitüsü 19 Haziran 2012 Outline MPI Performance Analyzing Defini6ons: Profiling Defini6ons: Tracing Intel Trace Analyzer Lab:

More information

Op#mizing Websites for Results

Op#mizing Websites for Results MARKETING EDUCATION SERIES Op#mizing Websites for Results 10:00 AM 1:00 PM 3:00 PM David Warren VP Digital Media PennWell Objec5ve and Scope Of this Talk Objec5ve: To help you (marketers) make the most

More information

Prac%cal Control Flow Integrity & Randomiza%on for Binary Executables

Prac%cal Control Flow Integrity & Randomiza%on for Binary Executables Prac%cal Control Flow Integrity & Randomiza%on for Binary Executables Chao Zhang Tao Wei Zhaofeng Chen Lei Duan Peking University Peking University UC Berkeley Peking University Peking University László

More information