CS377P Programming for Performance Single Thread Performance In-order Pipelines

1 CS377P Programming for Performance Single Thread Performance In-order Pipelines Sreepathi Pai UTCS September 9, 2015

2 Outline
1. Introduction
2. Pipeline Preliminaries
3. The Inorder Pipeline and Performance Issues
4. Performance Tuning

4 Compute-Bound programs
CPU-bound programs (i.e. not I/O-bound) are either:
- Compute-bound
- Memory-bound
Compute-bound programs:
- have a high ratio of ALU to memory operations (arithmetic intensity)
- do not wait for memory (data usually fits in cache)
- are limited by the CPU pipeline
- benefit the most from changing the CPU

5 Examples of Compute-bound Programs
- Numerical Software
- Cryptography
- Signal-processing
- Molecular Dynamics
- Parsers (?)

6 Performance of Compute-Bound Programs
Goal: Achieve maximum IPC
- instructions per cycle
- property of the pipeline/CPU

8 Example
    int count(const char *s) {
        int c = 0;
        while (*s != '\0') {
            if (*s == 'A')
                c++;
            s++;
        }
        return c;
    }
What determines the performance of the above code?

9 Same length, different number of 'A's
[Figure: time (ns) vs. number of 'A's]

10 Same length, different number of 'A's (with CI)
[Figure: time (ns) vs. number of 'A's]

11 Same length and number of 'A's, but different distribution
[Figure: time (ns) for start10k.dat, end10k.dat, and random10k.dat]
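The three data files are not reproduced in the slides. A minimal sketch of how comparable inputs might be built, assuming a hypothetical helper `fill_input` and hypothetical layout names (`LAYOUT_START`, `LAYOUT_END`, `LAYOUT_RANDOM`); only `count` comes from slide 8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* count() as on slide 8: number of 'A' characters in s. */
int count(const char *s) {
    int c = 0;
    while (*s != '\0') {
        if (*s == 'A')
            c++;
        s++;
    }
    return c;
}

/* Hypothetical layouts; the slides only name the data files. */
enum layout { LAYOUT_START, LAYOUT_END, LAYOUT_RANDOM };

/* Fill buf[0..len-1] with 'B', place num_a 'A's according to the layout,
 * and NUL-terminate. buf must hold len + 1 bytes. */
void fill_input(char *buf, size_t len, size_t num_a, enum layout how) {
    assert(num_a <= len);
    memset(buf, 'B', len);
    buf[len] = '\0';
    if (num_a == 0)
        return;
    switch (how) {
    case LAYOUT_START:
        memset(buf, 'A', num_a);
        break;
    case LAYOUT_END:
        memset(buf + (len - num_a), 'A', num_a);
        break;
    case LAYOUT_RANDOM: {
        /* rand() % len only reaches every index when len <= RAND_MAX. */
        assert(len <= (size_t)RAND_MAX);
        srand(1); /* fixed seed so the input is repeatable */
        size_t placed = 0;
        while (placed < num_a) {
            size_t pos = (size_t)rand() % len;
            if (buf[pos] != 'A') { /* skip positions already used */
                buf[pos] = 'A';
                placed++;
            }
        }
        break;
    }
    }
}
```

All three layouts give `count` the same answer and the same number of loop iterations; only the pattern of taken and not-taken branches differs, which is the variable the timing plot isolates.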

12 objdump -d
000000000040052d <count>:
  40052d: 55                     push   %rbp
  40052e: 48 89 e5               mov    %rsp,%rbp
  400531: 48 89 7d e8            mov    %rdi,-0x18(%rbp)
  400535: c7 45 fc 00 00 00 00   movl   $0x0,-0x4(%rbp)
  40053c: eb 14                  jmp    400552 <count+0x25>
  40053e: 48 8b 45 e8            mov    -0x18(%rbp),%rax
  400542: 0f b6 00               movzbl (%rax),%eax
  400545: 3c 41                  cmp    $0x41,%al
  400547: 75 04                  jne    40054d <count+0x20>
  400549: 83 45 fc 01            addl   $0x1,-0x4(%rbp)
  40054d: 48 83 45 e8 01         addq   $0x1,-0x18(%rbp)
  400552: 48 8b 45 e8            mov    -0x18(%rbp),%rax
  400556: 0f b6 00               movzbl (%rax),%eax
  400559: 84 c0                  test   %al,%al
  40055b: 75 e1                  jne    40053e <count+0x11>
  40055d: 8b 45 fc               mov    -0x4(%rbp),%eax
  400560: 5d                     pop    %rbp
  400561: c3                     retq

13 gas listing (partial)
4:br.c **** int count(const char *s) {
                     .loc
                     .cfi_startproc
  0000 55            pushq %rbp
                     .cfi_def_cfa_offset
                     .cfi_offset 6,
  0001 4889E5        movq %rsp, %rbp
                     .cfi_def_cfa_register
  0004 48897DE8      movq %rdi, -24(%rbp)
5:br.c **** int c = 0;
                     .loc
  0008 C745FC00      movl $0, -4(%rbp)
       000000
:br.c **** while(*s != '\0') {
                     .loc
  000f EB14          jmp .L2
                     .L4:
8:br.c **** if(*s == 'A')
                     .loc
  0011 488B45E8      movq -24(%rbp), %rax
  0015 0FB600        movzbl (%rax), %eax
  0018 3C41          cmpb $65, %al
  001a 7504          jne .L3
9:br.c **** c++;
                     .loc
  001c 8345FC01      addl $1, -4(%rbp)

14 How to Read x86 Assembly in 2 minutes
LABEL: instruction operands...
where operand may be:
- immediate: $65
- register: %rax
- implicit: push %rbp
- memory address: $.LC1
- memory reference 1: (%rax) (equivalent to *rax)
- memory reference 2: -18(%rax) (eq. to *(rax - 18))
- many more memory reference formats (see manual)

16 The MIPS 5-stage Pipeline
From Wikipedia
[Figure: datapath of the five stages, Instruction Fetch (IF), Instruction Decode / Register Fetch (ID), Execute / Address Calc. (EX), Memory Access (MEM), and Write Back (WB), separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

17 Stages
Fetch
- Fetches instructions at (%pc)
- Increments (%pc) to point to next instruction
Decode
- Identifies instruction
- Identifies operands (memory/registers/immediate)
Execute
- Performs ALU operations
Memory
- Performs Loads/Stores
Writeback
- Retires instruction
- Results are visible in register file
- Stores are forwarded to memory
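To connect the stages to the IPC goal from the introduction, here is a rough back-of-the-envelope model. It is an idealized sketch, not something taken from the slides: it assumes every stage takes exactly one cycle, one instruction can enter the pipeline per cycle, and each stall costs one extra cycle. The function names are hypothetical.

```c
#include <assert.h>

/* Cycles to run n instructions through an idealized in-order pipeline:
 * the first instruction needs `stages` cycles, each later one completes
 * one cycle after its predecessor, and every stall (bubble) adds a cycle. */
long pipeline_cycles(long n, long stages, long stall_cycles) {
    assert(n >= 0 && stages >= 1 && stall_cycles >= 0);
    if (n == 0)
        return 0;
    return stages + (n - 1) + stall_cycles;
}

/* Instructions per cycle under the same idealized model. */
double pipeline_ipc(long n, long stages, long stall_cycles) {
    long cycles = pipeline_cycles(n, stages, stall_cycles);
    return cycles == 0 ? 0.0 : (double)n / (double)cycles;
}
```

Under this model, 100 instructions on a 5-stage pipeline take 104 cycles with no stalls, so IPC approaches but never reaches 1; every stall cycle pushes it further down.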

18 Fetch Performance
  400545: 3c 41            cmp  $0x41,%al
  400547: 75 04            jne  40054d <count+0x20>
  400549: 83 45 fc 01      addl $0x1,-0x4(%rbp)
  40054d: 48 83 45 e8 01   addq $0x1,-0x18(%rbp)
After fetching 0x400547, where should the next instruction be fetched from?

19 Decode/Execute Performance
    fdivp
    fdivp
Above code executes: (A/B)/C
Decode must handle hazards:
- Structural: not enough divide units
- Data: results (from previous instructions) not ready yet

20 Memory Performance
    mov (%rax), %rbx
    addq $1, %rbx
What if data is not found in cache?
Topic to be covered later under Caches

21 Writeback Performance Is this a significant bottleneck for in-order processors?

23 What can you, as a programmer, do?
For simple, in-order pipelines: Very little
A compiler can nearly always outperform a programmer

24 Eliminating Branches
Loop unrolling
    for(i = 0; i < n; i++) { body; }
After unrolling the loop UNROLL times:
    for(i = 0; i < n - UNROLL; i += UNROLL) {
        body;   /* iteration i */
        body;   /* iteration i + 1 */
        ...
        body;   /* iteration i + UNROLL - 1 */
    }
    for(; i < n; i++) { body; }
Compiler can do this: see gcc -funroll-loops
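A concrete instance of the unrolling pattern, as a sketch only: the array-sum body, the function names, and the choice of UNROLL = 4 are illustrative assumptions. Because the index here is an unsigned `size_t`, the loop test is written as `i + UNROLL <= n` rather than `i < n - UNROLL`, which would wrap around for small `n`.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL 4

/* Reference version: one loop branch per element. */
long sum_simple(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by UNROLL: one loop branch per UNROLL elements, followed by a
 * remainder loop that handles the last n % UNROLL elements. */
long sum_unrolled(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    for (; i + UNROLL <= n; i += UNROLL) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same result for any `n`; the unrolled one simply executes roughly a quarter as many loop branches. Comparing the compiler's output for `sum_simple` with and without `-funroll-loops` shows whether the hand-written version is needed at all.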

25 Eliminating Branches
Inline Code
Goal is to produce straight-line code:
    f() { body_f; }
    while(cond) { f(); }
After inlining:
    while(cond) { body_f; }
Compiler can do this as well: gcc -finline*.

26 Other Fetch Optimizations
If-convert branches
- replace branches with conditional/predicated instructions (cmove)
(gcc -O0)
  400545: 3c 41      cmp $0x41,%al
  400547: 75 04      jne 40054d <count+0x20>
(gcc -O3)
  80 fa 41   cmp $0x41,%dl
  0f 44 c1   cmove %ecx,%eax
Layout hot code nearby
- use compiler profile-guided optimization
All of these optimizations are best done by the compiler. But check that the compiler is doing it!
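If the compiler does not if-convert the loop on its own, the same idea can be expressed at the source level. The following is a sketch under that assumption; `count_branch` mirrors the function from slide 8, while `count_branchless` is a hypothetical rewrite, and neither is guaranteed to compile to any particular instruction sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Branching form: the increment is guarded by a data-dependent branch. */
int count_branch(const char *s) {
    assert(s != NULL);
    int c = 0;
    while (*s != '\0') {
        if (*s == 'A')
            c++;
        s++;
    }
    return c;
}

/* Branch-free body: the comparison evaluates to 0 or 1 and is added
 * unconditionally. Whether the compiler emits cmove, setcc, or still a
 * branch is its decision, so inspect the generated code. */
int count_branchless(const char *s) {
    assert(s != NULL);
    int c = 0;
    for (; *s != '\0'; s++)
        c += (*s == 'A');
    return c;
}
```

The loop-exit branch remains in both versions; only the data-dependent branch on each character is removed, which is the one whose outcome depends on how the 'A's are distributed.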

27 Structural Hazards and Data Dependencies
Data Dependencies
- change order of instructions to minimize waiting for data
- tweak algorithm if necessary
Instruction Scheduling
- change order of instructions to minimize stalls
- best done by compiler (see gcc -march -mcpu -mtune)
- but examine anyway
- hand-assemble otherwise
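One way to "tweak the algorithm" is to split a single dependency chain into independent ones, so that an operation does not have to wait for the result of the one immediately before it. A minimal sketch, assuming an array sum as the workload; the function names are hypothetical, and whether this helps depends on the latency of the operation and on the pipeline:

```c
#include <assert.h>
#include <stddef.h>

/* One chain: every addition depends on the previous addition's result. */
long sum_one_chain(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Two chains: the additions into `even` and `odd` are independent of each
 * other, so one need not wait for the other to finish. */
long sum_two_chains(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long even = 0, odd = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        even += a[i];
        odd += a[i + 1];
    }
    if (i < n) /* leftover element when n is odd */
        even += a[i];
    return even + odd;
}
```

For integers the two versions always agree. For floating-point data the reordering can change rounding, which is one reason a compiler may refuse to make this transformation without being told it is allowed.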

28 Conclusion
Compute-bound code requires:
- Steady supply of instructions to pipeline
- Minimum waiting/stalling for operations
Compilers can do a very good job
- but always examine generated code!
Handwriting assembly code is an alternative
- but time-consuming
- may as well report a bug to the compiler writers


Machine Programming 3: Procedures Machine Programming 3: Procedures CS61, Lecture 5 Prof. Stephen Chong September 15, 2011 Announcements Assignment 2 (Binary bomb) due next week If you haven t yet please create a VM to make sure the infrastructure

More information

THE UNIVERSITY OF BRITISH COLUMBIA CPSC 261: MIDTERM 1 February 14, 2017

THE UNIVERSITY OF BRITISH COLUMBIA CPSC 261: MIDTERM 1 February 14, 2017 THE UNIVERSITY OF BRITISH COLUMBIA CPSC 261: MIDTERM 1 February 14, 2017 Last Name: First Name: Signature: UBC Student #: Important notes about this examination 1. You have 70 minutes to write the 6 questions

More information

Machine-Level Programming III: Procedures

Machine-Level Programming III: Procedures Machine-Level Programming III: Procedures CSE 238/2038/2138: Systems Programming Instructor: Fatma CORUT ERGİN Slides adapted from Bryant & O Hallaron s slides Mechanisms in Procedures Passing control

More information

How Software Executes

How Software Executes How Software Executes CS-576 Systems Security Instructor: Georgios Portokalidis Overview Introduction Anatomy of a program Basic assembly Anatomy of function calls (and returns) Memory Safety Programming

More information

GCC and Assembly language. GCC and Assembly language. Consider an example (dangeous) foo.s

GCC and Assembly language. GCC and Assembly language. Consider an example (dangeous) foo.s GCC and Assembly language slide 1 GCC and Assembly language slide 2 during the construction of an operating system kernel, microkernel, or embedded system it is vital to be able to access some of the microprocessor

More information

Princeton University Computer Science 217: Introduction to Programming Systems. Machine Language

Princeton University Computer Science 217: Introduction to Programming Systems. Machine Language Princeton University Computer Science 217: Introduction to Programming Systems Machine Language 1 A paradox grader.c enum {BUFSIZE = 48}; char grade = 'D'; char name[bufsize]; /* Read a string into s */

More information

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: February 28, 2018 at 06:32 CS429 Slideset 9: 1 Mechanisms in Procedures

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Today: Machine Programming I: Basics

Today: Machine Programming I: Basics Today: Machine Programming I: Basics History of Intel processors and architectures C, assembly, machine code Assembly Basics: Registers, operands, move Intro to x86-64 1 Intel x86 Processors Totally dominate

More information

ECEC 355: Pipelining

ECEC 355: Pipelining ECEC 355: Pipelining November 8, 2007 What is Pipelining Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. A pipeline is similar in concept to an assembly

More information

Function Calls and Stack

Function Calls and Stack Function Calls and Stack Philipp Koehn 16 April 2018 1 functions Another Example 2 C code with an undefined function int main(void) { int a = 2; int b = do_something(a); urn b; } This can be successfully

More information

Princeton University Computer Science 217: Introduction to Programming Systems. A paradox. Machine Language. Machine language.

Princeton University Computer Science 217: Introduction to Programming Systems. A paradox. Machine Language. Machine language. Princeton University Computer Science 217: Introduction to Programming Systems Machine Language 1 A paradox grader.c enum {BUFSIZE = 48; char grade = 'D'; char name[bufsize]; /* Read a string into s */

More information

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth

Functions. Ray Seyfarth. August 4, Bit Intel Assembly Language c 2011 Ray Seyfarth Functions Ray Seyfarth August 4, 2011 Functions We will write C compatible function C++ can also call C functions using extern "C" {...} It is generally not sensible to write complete assembly programs

More information

CSE 351 Midterm - Winter 2015

CSE 351 Midterm - Winter 2015 CSE 351 Midterm - Winter 2015 February 09, 2015 Please read through the entire examination first! We designed this exam so that it can be completed in 50 minutes and, hopefully, this estimate will prove

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4: Loads/Stores and RISC/CISC Video

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Credits and Disclaimers

Credits and Disclaimers Credits and Disclaimers 1 The examples and discussion in the following slides have been adapted from a variety of sources, including: Chapter 3 of Computer Systems 3 nd Edition by Bryant and O'Hallaron

More information