CSCI 402: Computer Architectures. Instructions: Language of the Computer (4) Fengguang Song Department of Computer & Information Science IUPUI


Jump Addressing

- J-type format (e.g., j target, jal target):
      op (6 bits) | address (26 bits)
- Jump (j and jal) targets can be anywhere in the text segment (see next slide).
- The instruction encodes an absolute address.
- op = 2 is j; op = 3 is jal.
- New address = PC[31:28] : (address * 4), i.e., the upper 4 bits of the PC followed by the 26-bit address field shifted left by 2.
- Q: Is the instruction jr a J-format instruction?
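The jump-address formula above can be sketched in C. This is a minimal illustration, not part of the original slides: the function names encode_jump and jump_target are hypothetical, and pc stands for whatever PC value the formula on the slide uses.

```c
#include <assert.h>
#include <stdint.h>

/* Build a J-format word: 6-bit opcode, then the target's word address (26 bits).
   Hypothetical helper for illustration only. */
uint32_t encode_jump(uint32_t op, uint32_t target)
{
    assert(op < 64 && (target & 3u) == 0);   /* targets are word aligned */
    return (op << 26) | ((target >> 2) & 0x03FFFFFFu);
}

/* Recover the target: upper 4 bits of the PC, then the 26-bit field times 4. */
uint32_t jump_target(uint32_t pc, uint32_t instr)
{
    uint32_t field = instr & 0x03FFFFFFu;    /* low 26 bits */
    return (pc & 0xF0000000u) | (field << 2);
}
```

For example, a j (op = 2) to byte address 80000 stores 20000 in the address field, because 20000 * 4 = 80000.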

Memory Layout (4 segments)

- Text: instructions (i.e., code)
- Static data: global variables, static variables, constant arrays and strings
  - $gp is initialized to the starting address
- Dynamic data: heap (malloc in C, new in C++)
- Stack: automatic storage
- The reserved region is not counted!

Branch Addressing

    PC:     beq $s1, $s2, Label
    PC+4:   add $t0, $s0, $s1

- Most branch targets are near the branch.
  - In SPEC, 50% of branches are within a distance of 16 instructions!
  - A branch can go either forward or backward.
- Format:
      op (6 bits) | rs (5 bits) | rt (5 bits) | constant or address (16 bits)
- PC-relative addressing
- Target address = PC + offset * 4
- Note: the PC has already been incremented by 4 automatically.
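The PC-relative rule above can be written as a short C sketch. The function name branch_target is hypothetical; branch_addr is assumed to be the address of the branch instruction itself, so the "already incremented" PC is branch_addr + 4.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a beq/bne located at branch_addr, given its 32-bit encoding.
   Hypothetical helper for illustration only. */
uint32_t branch_target(uint32_t branch_addr, uint32_t instr)
{
    assert((branch_addr & 3u) == 0);             /* instructions are word aligned */
    int32_t offset = (int32_t)(instr & 0xFFFFu); /* low 16 bits */
    if (offset & 0x8000)
        offset -= 0x10000;                       /* sign-extend: backward branch */
    /* PC was already incremented by 4; the offset counts words, not bytes. */
    return branch_addr + 4u + (uint32_t)(offset * 4);
}
```

A negative offset gives a backward branch: an offset of -1 targets the branch instruction itself.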

Target Addressing Example

- Loop code; assume Loop is at location 80000 (decimal).

    Address  Instruction                 Fields
    80000    Loop: sll  $t1, $s3, 2      0   0   19  9   2   0
    80004          add  $t1, $t1, $s6    0   9   22  9   0   32
    80008          lw   $t0, 0($t1)      35  9   8   0
    80012          bne  $t0, $s5, Exit   5   8   21  2
    80016          addi $s3, $s3, 1      8   19  19  1
    80020          j    Loop             2   20000
    80024    Exit: ...

- The bne offset is 2 because Exit = 80016 + 2 * 4 = 80024.
- The j address field is 20000 because 20000 * 4 = 80000.

What If the Branch Target Is Too Far Away?

- If the branch target is too far to encode with 16 bits, the assembler rewrites the code.
- E.g.,
          beq $s0, $s1, L1
  becomes
          bne $s0, $s1, L2
          j   L1              # j has a 26-bit address field
      L2: ...
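Whether the assembler needs the bne/j rewrite depends on whether the word offset fits in the signed 16-bit field. A minimal C sketch of that check follows; the function name branch_offset_field is hypothetical and not something the slides define.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and stores the 16-bit offset field if target is reachable from a
   branch at branch_addr; returns 0 if the branch must be rewritten.
   Hypothetical helper for illustration only. */
int branch_offset_field(uint32_t branch_addr, uint32_t target, int16_t *field)
{
    assert(((branch_addr | target) & 3u) == 0);  /* both word aligned */
    /* Offset is measured in words from the instruction after the branch. */
    int64_t words = ((int64_t)target - ((int64_t)branch_addr + 4)) / 4;
    if (words < INT16_MIN || words > INT16_MAX)
        return 0;                                /* too far for 16 bits */
    *field = (int16_t)words;
    return 1;
}
```

With the loop above, the bne at 80012 targeting Exit at 80024 yields a field of 2, matching the table.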

All 5 Addressing Modes on MIPS

- Immediate addressing: addi
- Register addressing: add
- Base addressing: lw
- PC-relative addressing: bne, beq
- Pseudodirect addressing: j, jal (PC[31:28] followed by the address field)

Assembler Pseudoinstructions

- Most assembler instructions have a one-to-one relationship to machine instructions: add, sub, lw, sw, bne
- Pseudoinstructions are not real instructions; they exist only in the assembler's imagination, e.g.,
      move $t0, $t1      ->  add $t0, $zero, $t1
      blt  $t0, $t1, L   ->  slt $at, $t0, $t1
                             bne $at, $zero, L
- Others: bgt, ble, bge, ...
- $at (i.e., register 1): assembler temporary
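The two pseudoinstruction expansions above can be mimicked with a small text-rewriting function. This is only a sketch of the idea, not how any real assembler is implemented; expand_pseudo and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the real instructions for a pseudoinstruction into out.
   Returns how many real instructions were emitted, or 0 if op is not handled.
   Hypothetical helper covering only the two cases shown on the slide. */
int expand_pseudo(const char *op, const char *a, const char *b,
                  const char *c, char *out, size_t n)
{
    assert(op && a && b && out && n > 0);
    if (strcmp(op, "move") == 0) {            /* move a, b */
        snprintf(out, n, "add %s, $zero, %s\n", a, b);
        return 1;
    }
    if (strcmp(op, "blt") == 0 && c) {        /* blt a, b, label c */
        snprintf(out, n, "slt $at, %s, %s\nbne $at, $zero, %s\n", a, b, c);
        return 2;
    }
    out[0] = '\0';                            /* unknown: emit nothing */
    return 0;
}
```

Note that the blt expansion overwrites $at, which is why that register is reserved for the assembler.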

Let's Look at a Real Example

- To show the performance of bubble sort:

      void sort(int v[], int n)
      {
        int i, j;
        for (i = 0; i < n; i += 1) {
          for (j = i - 1;                      // from right to left
               j >= 0 && v[j+1] < v[j];
               j--) {
            swap(v, j);                        // swap v[j], v[j+1]
          }
        }
      }

- The algorithm repeatedly moves the smaller element toward the lowest index position of the array.

Effect of Compiler Optimization

- Compiled with gcc for Pentium 4 under Linux; used to sort 100,000 words.
- [Figure: four bar charts comparing the 4 versions (none, O1, O2, O3): relative performance, clock cycles, instruction count, and CPI.]
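The sort routine above calls swap, which the slide does not show. A self-contained version might look like the following; the body of swap is an assumption based on the slide's comment that it exchanges v[j] and v[j+1].

```c
#include <assert.h>

/* Assumed helper: exchanges v[k] and v[k+1], as the slide's comment describes. */
void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Same loop structure as the slide's sort. */
void sort(int v[], int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i += 1) {
        /* Walk v[i] leftward while it is smaller than its left neighbor. */
        for (int j = i - 1; j >= 0 && v[j + 1] < v[j]; j -= 1) {
            swap(v, j);
        }
    }
}
```

Each outer iteration leaves v[0..i] in ascending order, so the whole array is sorted when the loop ends.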

Lesson Learnt

- Instruction count and CPI are not good performance indicators in isolation.

Effect of Programming Languages

- [Figure: two bar charts of relative performance (higher is better) for C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT. The bubble sort chart is annotated 8.3X; the quicksort chart is annotated 20X, 3X, and 6.6X.]
- Lesson: Compiler optimizations are sensitive to the algorithm.
- Lesson: Java/JIT compiled code is significantly faster than JVM-interpreted code, and even comparable to optimized C in some cases.
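The earlier lesson about instruction count and CPI follows from the fact that clock cycles are their product, so neither factor decides performance on its own. A minimal C sketch, with hypothetical function names and no data taken from the charts:

```c
#include <assert.h>

/* Clock cycles = instruction count * average cycles per instruction (CPI). */
double clock_cycles(double instruction_count, double cpi)
{
    assert(instruction_count >= 0.0 && cpi > 0.0);
    return instruction_count * cpi;
}

/* Returns 1 if build A needs fewer cycles than build B, else 0. */
int faster(double ic_a, double cpi_a, double ic_b, double cpi_b)
{
    return clock_cycles(ic_a, cpi_a) < clock_cycles(ic_b, cpi_b);
}
```

As a purely hypothetical illustration, a build that executes 150 instructions at CPI 1.2 needs about 180 cycles and beats a build that executes only 100 instructions at CPI 2.0 (200 cycles), even though its instruction count is higher.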

Effect of Different Algorithms

- [Figure: bar chart of quicksort vs. bubble sort speedup for C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT; annotated values include 2468X and 338X.]

Lesson Learnt

- However, nothing can fix a dumb algorithm!

Section 2.14 Arrays vs. Pointers

- Both arrays and pointers can be used to access C arrays, e.g., array[10], *ptr
- Array indexing (e.g., array[index]):
  (1) Multiply the index by the element size
  (2) Then add it to the array base address
- Pointers correspond to memory addresses
  - No indexing calculation is needed
  - But pointers are difficult to use
- We will look at assembly code to get an insight.

Example: Clearing an Array (i.e., set every element to zero)

Array version (array in $a0, size in $a1):

      void clear_v1(int array[], int size)
      {
        int i;                                  // in $t0
        for (i = 0; i < size; i += 1)
          array[i] = 0;
      }

             move $t0,$zero        # i = 0
      loop1: sll  $t1,$t0,2        # $t1 = i * 4
             add  $t2,$a0,$t1      # $t2 = &array[i]
             sw   $zero,0($t2)     # array[i] = 0
             addi $t0,$t0,1        # i += 1
             slt  $t3,$t0,$a1      # $t3 = (i < size)
             bne  $t3,$zero,loop1  # if (i < size) goto loop1

Pointer version (array in $a0, size in $a1):

      void clear_v2(int *array, int size)
      {
        int *p;                                 // in $t0
        for (p = &array[0]; p < &array[size]; p = p + 1)
          *p = 0;
      }

             move $t0,$a0          # p = &array[0]
             sll  $t1,$a1,2        # $t1 = size * 4
             add  $t2,$a0,$t1      # $t2 = &array[size]
      loop2: sw   $zero,0($t0)     # Memory[p] = 0
             addi $t0,$t0,4        # p = p + 4
             slt  $t3,$t0,$t2      # $t3 = (p < &array[size])
             bne  $t3,$zero,loop2  # if (p < &array[size]) goto loop2
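For reference, the two C versions above can be compiled as written once a return type is added. The sketch below assumes void as the return type, which the slide omits, and adds a precondition check that is not in the original.

```c
#include <assert.h>

/* Array version: the index i is scaled and added to the base on each iteration. */
void clear_v1(int array[], int size)
{
    assert(size >= 0);
    for (int i = 0; i < size; i += 1)
        array[i] = 0;
}

/* Pointer version: p walks from &array[0] up to, but not including, &array[size]. */
void clear_v2(int *array, int size)
{
    assert(size >= 0);
    for (int *p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}
```

Both functions have the same effect; they differ only in how the address of each element is formed.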

Array Version vs. Pointer Version

- The array version requires a shift inside the loop.
- Also, the offset needs to be added to the base address on every iteration.
- In general, the compiler can achieve the same effect as our manual control of pointers.
- Using arrays is therefore better, because it makes the program clearer and safer.

x86 Registers (80386 and IA-32)

- 8 32-bit GPRs
  - Register index: 3 bits (MIPS has 32 GPRs)
- 6 16-bit segment registers

What about 64-bit Mode Intel Architectures?

Basic program execution registers:
- General-purpose registers: sixteen 64-bit registers
- Segment registers: six 16-bit registers
- RFLAGS register: 64 bits
- RIP (instruction pointer register): 64 bits
- Address space: 0 to 2^64-1

FPU registers:
- Floating-point data registers: eight 80-bit registers
- Control register: 16 bits
- Status register: 16 bits
- Tag register: 16 bits
- Opcode register: 11 bits
- FPU instruction pointer register: 64 bits
- FPU data (operand) pointer register: 64 bits

Bounds registers

Source: Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture (Vol. 1 3-3)
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

x86's Addressing Modes

- 2 operands per instruction:

      Source & dest operand    2nd source operand
      Register                 Register
      Register                 Immediate
      Register                 Memory
      Memory                   Register
      Memory                   Immediate
      Memory                   Memory (not allowed)

- Differences (vs. MIPS):
  - One operand must be both source and destination.
  - One of the two operands can be in memory.
  - In MIPS, all arithmetic operands are in registers.
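The operand-combination table above can be summarized as two rules: the destination cannot be an immediate, and at most one operand can be in memory. A small C sketch of that reading follows; the enum and function names are hypothetical, and it models only the table on the slide, not the full x86 instruction set.

```c
#include <assert.h>

enum operand_kind { OPERAND_REGISTER, OPERAND_IMMEDIATE, OPERAND_MEMORY };

/* Returns 1 if (dest, src) is one of the operand pairs listed in the table. */
int x86_pair_allowed(enum operand_kind dest, enum operand_kind src)
{
    assert(dest <= OPERAND_MEMORY && src <= OPERAND_MEMORY);
    if (dest == OPERAND_IMMEDIATE)
        return 0;                  /* a constant cannot receive a result */
    if (dest == OPERAND_MEMORY && src == OPERAND_MEMORY)
        return 0;                  /* at most one operand may be in memory */
    return 1;
}
```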

x86's Instruction Encoding

- [Figure: example instruction formats of 2 bytes, 1 byte, 5 bytes, 3 bytes, 5 bytes, and 6 bytes.]
- Variable-length encoding
  - Ranges from 1 byte to 15 bytes
  - Operands are 8, 16, or 32 bits
- Postfix bytes specify the addressing mode
  - Complicated; see Fig. 2.42 in the book
- Prefix bytes also modify the operation
  - Operand length, repetition, locking, ...

Purpose and Caveats (quote from a professor at UCSD)

- ISA reference: http://www.intel.com/assets/en_us/pdf/manual/253667.pdf
- This guide should give you enough background to read and understand (most of) the 64-bit x86 assembly that gcc is likely to produce.
- x86 is a poorly designed ISA. It's a mess, but it is the most widely used ISA in the world today.
  - It breaks almost every rule of good ISA design.
  - Just because it is popular does not mean it's good.
  - Intel and AMD have managed to engineer (at considerable cost) their CPUs so that this ugliness has relatively little impact on their processors' design (more on this later).
- There's a nice example here: http://en.wikibooks.org/wiki/x86_assembly/gas_syntax

Implementation of x86 ISA

- Complex instructions make implementation difficult.
  - Hardware has to translate complex instructions into simpler micro-operations.
  - Simple instructions: 1 -> 1
  - Complex instructions: 1 -> many!
  - So Intel hardware has a micro-engine, which is similar to MIPS.
- Nevertheless, its large market share makes this method economically viable.
- x86 processors have performance comparable to RISC,
  - because compilers manage to avoid complex instructions.

RISC - Reduced Instruction Set Computer

- As opposed to CISC (Complex Instruction Set Computer, e.g., x86)
- RISC philosophy:
  - fixed instruction lengths (always 32 bits)
  - load-store instruction sets
  - small number of addressing modes
  - limited operations
  - simpler, cheaper
- CISC philosophy (Intel/AMD x86):
  - Hardware is faster than software!
- RISC examples: MIPS, ARM, Sun SPARC, HP PA-RISC, IBM PowerPC, DEC/Compaq/HP Alpha, ...
- Design goals of an ISA: speed, cost, size, power consumption, reliability, memory space

Fallacies

- Powerful instructions => higher performance
  - Yes, fewer instructions will be required,
  - but complex instructions are hard to implement.
  - They may slow down all instructions, including the simple, common ones (e.g., blt is not provided on MIPS).
  - Also, compilers are particularly good at making faster code from simple instructions.
- Use assembly code for higher performance
  - Modern compilers are better at dealing with modern processors.
  - Even if assembly code is faster, it takes longer to write and is not portable.
  - More lines of code => more errors and less productivity.

Fallacies

- Importance of binary compatibility => instruction set does not change
  - But vendors do frequently add more instructions.
  - x86 instruction set: one instruction added per month.
  - BTW, this increases the difficulty for other companies trying to build compatible processors!

Pitfalls

- Assuming that sequential words are located at sequential byte addresses (e.g., ptr++)
  - Increment the address by 4, not by 1!
- Accessing a pointer to an automatic variable outside of its defining procedure
  - E.g., passing the pointer back through the return value
  - Why? The pointer becomes invalid when the stack is popped.

Concluding Remarks

- Learned 4 design principles in this chapter:
  1. Simplicity favors regularity (e.g., fixed 32 bits, three operands, three instruction formats)
  2. Smaller is faster (e.g., just 32 registers)
  3. Make the common case fast (e.g., support immediate operands, use slt + bne for blt)
  4. Good design demands good compromises (e.g., fixed vs. variable instruction size)
     - x86 uses different sizes for different instructions.
     - MIPS makes all instructions the same length, thereby requiring different instruction formats.
- Layers of software/hardware:
  - High-level language -> compiler -> assembler -> hardware
- MIPS: a classical example of RISC ISAs, vs. x86 (CISC)
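The first pitfall above (increment by 4, not by 1) can be checked directly in C, where ptr++ on a pointer to 32-bit words already advances by 4 bytes. This is a minimal sketch with hypothetical function names; it uses int32_t so that the word size is exactly 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte address of word number index in an array of 32-bit words at base. */
uint32_t word_address(uint32_t base, uint32_t index)
{
    assert((base & 3u) == 0);      /* words are 4-byte aligned */
    return base + 4u * index;      /* consecutive words are 4 bytes apart */
}

/* Byte distance covered by one ptr++ on a pointer to 32-bit words. */
ptrdiff_t word_pointer_stride(void)
{
    int32_t words[2] = {0, 0};
    int32_t *ptr = &words[0];
    int32_t *next = ptr + 1;       /* what ptr++ would produce */
    return (char *)next - (char *)ptr;
}
```

In assembly the scaling is not automatic, which is why the MIPS code must add 4 to the address register explicitly.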

Concluding Remarks

- MIPS instruction categories: arithmetic, logical, data transfer, conditional branches, unconditional jumps
- But they are not equally popular, based on measurements of MIPS instruction executions in benchmark programs.
  [Table not preserved.]

The MIPS Instructions We Have Mentioned in Class

- [Table not preserved.]
- You should feel comfortable with them, e.g., know what each one means and which operand is the 1st source, the 2nd source, and the destination.

ARM v8 Instructions

- ARM v8 is for 64-bit ARM
  - Revealed in 2013
- ARM v8 resembles MIPS
  - If you know MIPS, it is very easy to pick up ARM v8.
- Changes from ARM v7 include:
  - The immediate field is a 12-bit constant (like MIPS now)
  - The GPR set is expanded to 32 (like MIPS)
  - Addressing modes work for all word sizes
  - A divide instruction
  - Branch-if-equal and branch-if-not-equal instructions (like MIPS)
- It is now much closer to MIPS than ARM v7.