CMSC Computer Architecture Lecture 2: ISA. Prof. Yanjing Li Department of Computer Science University of Chicago


CMSC 22200 Computer Architecture Lecture 2: ISA. Prof. Yanjing Li, Department of Computer Science, University of Chicago

Administrative Stuff
  Lab1 out tonight
    Due Thursday (10/18)
  Lab1 review session
    Tomorrow, 10/05, 4:30-5:30pm
    Location: JCL 346

Lecture Outline
  Introduction to ISA
  Case Study: ARMv8 / LEGv8

Review: Basic Concepts
  What is a computer?
  What is the von Neumann model?
  What is ISA? What is uarch?
  Design point

ISA
  Instructions
    Opcodes, Addressing Modes, Data Types
    Instruction Types and Formats
    Registers, Condition Codes
  Memory organization
    Address space, Addressability, Alignment
    Virtual memory management
  Call, Interrupt/Exception Handling
  Access Control, Priority/Privilege
  I/O: memory-mapped vs. instr.
  Task/thread Management
  Power and Thermal Management
  Multi-threading support, Multiprocessor support

ISA Element: Instruction
  Or machine code, consists of
    opcode: what the instruction does (add, sub, ...)
    operands: who it is to do it to (register, memory, immediate)
  Example (figure not included in the transcript)

Instruction Classes
  Operate instructions
    Process data: arithmetic and logical operations
    Fetch operands, compute result, store result
    Implicit sequential control flow (e.g., PC <= PC + 4)
  Data movement instructions
    Move data between memory, registers, I/O devices
    Implicit sequential control flow
  Control flow instructions
    Change the sequence of instructions that are executed

Load/Store vs. Memory/Memory Architectures
  Load/store architecture: operate instructions operate only on registers
    E.g., MIPS, ARM and many RISC ISAs
  Memory/memory architecture: operate instructions can operate on memory locations
    E.g., x86

Data Types
  Representation of information for which there are instructions that operate on the representation
  ARMv8
    Integer (byte, half word, word, doubleword, quad word)
    Floating point (half-, single-, double-precision)
    Fixed point
    Vector formats
  Others
    E.g., strings in x86

Instruction Process Style
  Specifies the number of operands an instruction operates on and how it does so
  0, 1, 2, 3 address machines
    0-address: stack machine (op, push, pop)
    1-address: accumulator machine (e.g., add mem)
    2-address: 2-operand machine (op D, S; one is both source and dest)
    3-address: 3-operand machine (op D, S1, S2; source and dest separate)
  E.g., ARMv8 represents a 3-address machine

Instruction Addressing Modes
  Specifies how to obtain an operand of an instruction
    Register
    Immediate
    Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, ...)

Instruction Addressing Modes for Memory
  Specify how to obtain memory operands
    Absolute: LW Rt, 10000
      use immediate value as address
    Register indirect: LW Rt, (r_base)
      use GPR[r_base] as address
    Displaced or based: LW Rt, offset(r_base)
      use offset + GPR[r_base] as address
    Indexed: LW Rt, (r_base, r_index)
      use GPR[r_base] + GPR[r_index] as address
    Memory indirect: LW Rt, ((r_base))
      use value at M[GPR[r_base]] as address
    Auto inc/decrement: LW Rt, (r_base)
      use GPR[r_base] as address, but inc. or dec. GPR[r_base] each time

Instruction Length
  Fixed length: length of all instructions the same
    + Easier to decode single instruction in hardware
    + Easier to decode multiple instructions concurrently (superscalar)
    -- Wasted bits in instructions (Why is this bad?)
    -- Harder-to-extend ISA (how to add new instructions?)
  Variable length: length of instructions different
    + Compact encoding (Why is this good?)
    + Extensibility
    -- More logic to decode a single instruction
    -- Harder to decode multiple instructions concurrently
  Tradeoffs
    Code size (memory space, bandwidth, latency) vs. hardware complexity
    ISA extensibility and expressiveness vs. hardware complexity
    Performance/energy efficiency? Smaller code vs. ease of decode

Uniform/Non-uniform Decode of Inst
  Uniform decode: same bits in each instruction correspond to the same meaning
    Opcode is always in the same location
    Ditto operand specifiers, immediate values, ...
    + Easier decode, simpler hardware
    + Enables parallelism: generate target address before knowing the instruction is a branch
    -- Restricts instruction format (fewer instructions?) or wastes space
  Non-uniform decode
    E.g., opcode can be the 1st-7th byte in x86
    + More compact and powerful instruction format
    -- More complex decode logic
  Uniform decode usually means fixed length as well

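x86 vs. MIPS Instruction Formats
  x86: (figure not included in the transcript)
  MIPS:
    R-type: opcode (6 bits, value 0) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
    I-type: opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
    J-type: opcode (6 bits) | immediate (26 bits)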

ISA Element: Registers
  Fast storage
    How many?
    Size of each register?
    General purpose vs. special purpose?
  Why is having registers a good idea?
    Because programs exhibit a characteristic called data locality
    A recently produced/accessed value is likely to be used more than once (temporal locality)
    Storing that value in a register eliminates the need to go to memory each time that value is needed
  Compiler: register optimization is important!

ISA Element: Memory Organization
  Address space: how many uniquely identifiable locations in memory
  Addressability: how much data does each uniquely identifiable location store
    Byte addressable: most ISAs
  Aligned/unaligned access
    MSB  byte-3  byte-2  byte-1  byte-0  LSB
         byte-7  byte-6  byte-5  byte-4

ISA Element: I/O
  How to interface with I/O devices
  Memory mapped I/O
    A region of memory is mapped to I/O devices
    I/O operations are loads and stores to those locations
  Special I/O instructions
    IN and OUT instructions in x86 deal with ports of the chip
  Tradeoffs?
    Which one is more general purpose?

Other ISA Elements
  Privilege modes
    User vs. supervisor
    Who can execute what instructions?
    Who can access which registers?
  Exception and interrupt handling
    What procedure is followed when something goes wrong with an instruction?
    What procedure is followed when an external device requests the processor?
  Virtual memory
    Each program has the illusion of the entire memory space, which is greater than physical memory
  Access protection

Many Different ISAs
  x86
  ARM
  MIPS
  SPARC
  IBM 360
  What/why are the fundamental differences?

CISC vs. RISC
  CISC, Complex instruction set computer -> complex instructions
    Initially motivated by not good enough code generation
    Memory size/bandwidth considerations
  RISC, Reduced instruction set computer -> simple instructions
    Goal: enable better compiler control and optimization
    Motivated by
      Simplifying the hardware -> lower cost, higher frequency
      Enabling the compiler to optimize the code better
  Simple compiler, complex hardware vs. complex compiler, simple hardware

CISC vs. RISC
  Usually,
    RISC
      Simple instructions
      Fixed length
      Uniform decode
      Few addressing modes
    CISC
      Complex instructions
      Variable length
      Non-uniform decode
      Many addressing modes

ARMv8/LEGv8 Case Study

The ARMv8 ISA
  Commercialized by ARM Holdings (www.arm.com)
  Large share of embedded core market
    Applications in mobile, consumer electronics, network/storage equipment, cameras, printers, ...
  Typical of many modern ISAs
  Reference (5740 pages)
    https://developer.arm.com/docs/ddi0487/a/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
  **Based on original figure from [P&H CO&D, COPYRIGHT 2016 Elsevier. ALL RIGHTS RESERVED.]

ARMv8 Overview
  RISC, load/store architecture, both 32- and 64-bit
  3-address machine
  32-bit instructions
  Simple datatypes: int, floating point, fixed point, vector
  Addressing modes: reg, imm, simple mem addressing
    mem addr. calculated from reg and instruction contents only
  32 GPRs, PC, SP, ELR, 32 SIMD/FP registers
  Byte addressable
  You will implement ARMv8 in C (Lab1)

LEGv8
  A subset of ARMv8
    With some differences
  Reference
    Green card from textbook
    Also available online: http://booksite.elsevier.com/9780128017333/arm_ref.php

Instruction Formats (figure not included in the transcript)

R-format Instructions
  opcode (11 bits) | Rm (5 bits) | shamt (6 bits) | Rn (5 bits) | Rd (5 bits)
  Instruction fields
    opcode: operation code
    Rm: the second register source operand
    shamt: shift amount (only used for shift operations)
    Rn: the first register source operand
    Rd: the register destination

R-format Example
  opcode (11 bits) | Rm (5 bits) | shamt (6 bits) | Rn (5 bits) | Rd (5 bits)
  ADD X9,X20,X21   // add the values in X20 and X21, and put the result in X9,
                   // or GPR[x9] = GPR[x20] + GPR[x21]
  10001011000 | 10101 | 000000 | 10100 | 01001   (binary fields)
  1000 1011 0001 0101 0000 0010 1000 1001 (binary) = 8B150289 (hex)

shamt in R-format Instructions
  opcode (11 bits) | Rm (5 bits) | shamt (6 bits) | Rn (5 bits) | Rd (5 bits)
  shamt: how many positions to shift
  Shift left logical (LSL)
    R[Rd] <- R[Rn] << shamt   // shift left and fill with 0 bits
    LSL by i bits: multiplies by 2^i
  Shift right logical (LSR)
    R[Rd] <- R[Rn] >> shamt   // shift right and fill with 0 bits
    LSR by i bits: divides by 2^i (unsigned only)
  Note: the shifted-register versions of R instructions in ARMv8 support shift operations on the second operand before applying the operation specified in the opcode

C to Assembly 101
  C code:
    f = (g + h) - (i + j);
    f, ..., j in X19, X20, ..., X23
  Compiled into assembly:
    ADD X9, X20, X21
    ADD X10, X22, X23
    SUB X19, X9, X10

I-format Instructions
  opcode (10 bits) | immediate (12 bits) | Rn (5 bits) | Rd (5 bits)
  Immediate instructions
    Rn: source register
    Rd: destination register
  Immediate field: constant data; zero-extended
  Example: ADDI X22, X22, #4
    What does the machine code look like for ADDI?

D-format Instructions
  Load/store instructions
  opcode (11 bits) | addoffset (9 bits) | op2 (2 bits) | Rn (5 bits) | Rt (5 bits)
    Rn: base register
    addoffset: constant offset from contents of base register (+/- 32 doublewords)
    op2: expands the opcode field
    Rt: destination (load) or source (store) register number
  Example: LDUR X9,[X22,#64]
    LDUR opcode: 11111000010 (binary); op2: 0
    X9 (Rt field)
    X22 (Rn field)

C to Assembly 201
  C code:
    A[12] = h + A[8];
    h in X21, base address of A in X22, elements in A are 8 bytes
    X9 is used as a temp register
  Compiled code:
    LDUR X9,[X22,#64]
    ADD  X9,X21,X9
    STUR X9,[X22,#96]
  Note: index 8 requires offset of 64 (byte-addressed memory)

B Format Instructions
  opcode (6 bits) | BR_address (26 bits)
  Example: B L1
    Branch unconditionally to instruction labeled L1
    B opcode: 000101 (binary) (ARMv8)
  Effect: PC = PC + BranchAddr

CB Format Instructions
  Branch to a labeled instruction if a condition is true
    Otherwise, continue sequentially
  opcode (8 bits) | COND_BR_address (19 bits) | Rt (5 bits)
  Examples
    CBZ register, L1
      if (register == 0) branch to instruction labeled L1;
    CBNZ register, L1
      if (register != 0) branch to instruction labeled L1;
  Effect: if taken, PC = PC + CondBranchAddr; else PC = PC + 4