Computer Architecture and Organization

6-1 Chapter 6: Languages and the Machine. From Computer Architecture and Organization by Miles Murdocca and Vincent Heuring.

6-2 Chapter Contents
6.1 The Compilation Process
6.2 The Assembly Process
6.3 Linking and Loading
6.4 Macros
6.5 Quantitative Analyses of Program Execution
6.6 From CISC to RISC
6.7 Pipelining the Datapath
6.8 Overlapping Register Windows
6.9 Low Power Coding

6-3 The Compilation Process
Compilation translates a program written in a high-level language into a functionally equivalent program in assembly language. Consider a simple high-level language assignment statement:
A = B + 4;
Steps involved in compiling this statement into assembly code:
Lexical analysis: reducing the program text to the basic symbols of the language, for example identifiers such as A and B, denotations such as the constant value 4, and program delimiters such as = and +.
Syntactic analysis (parsing): recognizing the underlying program structure. For the statement above, the parser must recognize the form Identifier = Expression, where Expression is further parsed into the form Identifier + Constant.

6-4 The Compilation Process (continued)
Name analysis: associating the names A and B with particular program variables, and further associating them with the particular memory locations where the variables are located at run time.
Type analysis: determining the types of all data items. In the example above, variables A and B and the constant 4 would be recognized as being of type int in some languages. Name and type analysis are sometimes referred to together as semantic analysis: determining the underlying meaning of program components.
Action mapping and code generation: associating program statements with their appropriate assembly language sequence. For the statement above, the assembly language sequence might be as follows:
        ld    [B], %r0, %r1    ! Get variable B into a register.
        add   %r1, 4, %r2      ! Compute the value of the expression.
        st    %r2, %r0, [A]    ! Make the assignment.

6-5 The Assembly Process
The process of translating an assembly language program into a machine language program is referred to as the assembly process. Production assemblers generally provide this support:
- Allow the programmer to specify locations of data and code.
- Provide assembly-language mnemonics for all machine instructions and addressing modes, and translate valid assembly language statements into the equivalent machine language.
- Permit symbolic labels to represent addresses and constants.
- Provide a means for the programmer to specify the starting address of the program, if there is one, and provide a degree of assemble-time arithmetic.
- Include a mechanism that allows variables to be defined in one assembly language program and used in another, separately assembled program.
- Support macro expansion.

6-6 Assembly Example
We explore how the assembly process proceeds by hand assembling a simple ARC assembly language program.
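The program is of the following form; the source layout is a reconstruction consistent with the assembled code on slide 6-8 (five instructions assembled starting at address 2048, followed by the data words 15, 9, and 0):

        .begin
        .org   2048             ! Assemble starting at address 2048
        ld     [x], %r1         ! Load x into %r1
        ld     [y], %r2         ! Load y into %r2
        addcc  %r1, %r2, %r3    ! %r3 = x + y, setting the condition codes
        st     %r3, [z]         ! Store the sum into z
        jmpl   %r15 + 4, %r0    ! Return to the caller
x:      15
y:      9
z:      0
        .end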

6-7 Instruction Formats and PSR Format for the ARC

6-8 Assembled Code
ld [x], %r1           1100 0010 0000 0000 0010 1000 0001 0100
ld [y], %r2           1100 0100 0000 0000 0010 1000 0001 1000
addcc %r1, %r2, %r3   1000 0110 1000 0000 0100 0000 0000 0010
st %r3, [z]           1100 0110 0010 0000 0010 1000 0001 1100
jmpl %r15+4, %r0      1000 0001 1100 0011 1110 0000 0000 0100
15                    0000 0000 0000 0000 0000 0000 0000 1111
9                     0000 0000 0000 0000 0000 0000 0000 1001
0                     0000 0000 0000 0000 0000 0000 0000 0000

6-9 Forward Referencing
An example of forward referencing:

6-11 Assembled Program

6-12 Linking: Using .global and .extern
The .global pseudo-op is used in the module where a symbol is defined; .extern is used in every other module that refers to it.
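A minimal sketch of the convention (the module names, the symbol x, and its value are illustrative):

! prog1.asm: refers to a symbol that is defined elsewhere.
        .begin
        .extern x
        ld    [x], %r1          ! The address of x is patched in at link time
        jmpl  %r15 + 4, %r0
        .end

! prog2.asm: separately assembled; defines x and exports it.
        .begin
        .global x
x:      105
        .end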

6-13 Linking and Loading: Symbol Tables
Symbol tables for the previous example:

6-14 Example ARC Program

6-15 Macro Definition
A macro definition for push:
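A sketch of such a definition, assuming %r14 serves as the stack pointer (the SPARC convention) and that the stack grows toward lower addresses:

        .macro push arg1        ! Begin macro definition
        addcc %r14, -4, %r14    ! Make room: move the stack pointer down one word
        st    arg1, %r14        ! Store arg1 at the new top of stack
        .endmacro               ! End macro definition

At assembly time a call such as push %r5 is then replaced by the two-instruction body with %r5 substituted for arg1.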

6-16 Recursive Macro Expansion

6-17 Instruction Frequency
Frequency of occurrence of instruction types for a variety of languages. The percentages do not sum to 100 due to roundoff. (Adapted from Knuth, D. E., An Empirical Study of FORTRAN Programs, Software: Practice and Experience, 1, 105-133, 1971.)

6-18 Complexity of Assignments
Percentages showing complexity of assignments and procedure calls. (Adapted from Tanenbaum, A., Structured Computer Organization, 4/e, Prentice Hall, Upper Saddle River, New Jersey, 1999.)

6-19 Speedup and Efficiency
Speedup S is the relative improvement in the time needed to execute a program without an enhancement compared to the time required with the enhancement, expressed as a percentage. Time T is computed as the instruction count IC times the number of cycles per instruction CPI times the cycle time. Substituting T into the speedup calculation yields the expression below.
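In symbols, with the subscripts wo and w denoting execution without and with the enhancement, and $\tau$ the cycle time:

$$ S = \frac{T_{wo} - T_{w}}{T_{w}} \times 100\% \qquad T = IC \times CPI \times \tau $$

$$ S = \frac{IC_{wo} \cdot CPI_{wo} \cdot \tau_{wo} - IC_{w} \cdot CPI_{w} \cdot \tau_{w}}{IC_{w} \cdot CPI_{w} \cdot \tau_{w}} \times 100\% $$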

6-20 Example
Estimate the speedup obtained by replacing a CPU having an average CPI of 5 with another CPU having an average CPI of 3.5, with the clock period increased from 100 ns to 120 ns. The previous equation becomes:
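Assuming the instruction count IC is unchanged by the replacement, it cancels from the numerator and denominator:

$$ S = \frac{IC \cdot 5 \cdot 100\,\mathrm{ns} - IC \cdot 3.5 \cdot 120\,\mathrm{ns}}{IC \cdot 3.5 \cdot 120\,\mathrm{ns}} \times 100\% = \frac{500 - 420}{420} \times 100\% \approx 19\% $$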

6-21 Four/Five-Stage Instruction Pipeline
We used a five-step fetch-execute cycle earlier: (1) instruction fetch, (2) decode, (3) operand fetch, (4) ALU operation, (5) result writeback. These five steps can also be viewed as four phases in which the fourth phase, execute, has two subphases: ALU operation and writeback. Because a result writeback is not always needed, it can be bypassed, so the five phases reduce to four some of the time. For this discussion we take a simple approach and force every instruction to pass through every phase whether it needs it or not, so ALU operation and writeback remain separate and the pipeline is implemented in five stages here.
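Illustratively, with IF = instruction fetch, D = decode, OF = operand fetch, ALU = ALU operation, and WB = writeback, consecutive instructions overlap in the pipeline like this:

Cycle:       1    2    3    4    5    6    7
instr i:     IF   D    OF   ALU  WB
instr i+1:        IF   D    OF   ALU  WB
instr i+2:             IF   D    OF   ALU  WB

Once the pipeline is full, one instruction completes every cycle, even though each individual instruction still takes five cycles end to end.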

6-22 Pipeline Behavior
Pipeline behavior during a memory reference and during a branch.

6-23 Filling the Load Delay Slot
SPARC code (a) with a nop inserted, and (b) with srl migrated to the nop position.
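An illustrative fragment of the pattern (the registers and operands here are hypothetical): the instruction after the ld needs the loaded value, which is not yet available, so the slot after the load must be filled.

! (a) A nop fills the load delay slot.
        ld    [x], %r1
        nop                     ! Delay slot: %r1 is not yet available
        addcc %r1, 10, %r4      ! Uses the loaded value
        srl   %r5, 2, %r6       ! Independent of %r1

! (b) The independent srl is migrated into the delay slot, doing useful
!     work during the load latency.
        ld    [x], %r1
        srl   %r5, 2, %r6       ! Does not depend on %r1
        addcc %r1, 10, %r4      ! %r1 is ready by now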

6-24 Call-Return Behavior
Call-return behavior as a function of nesting depth and time. (Adapted from Stallings, W., Computer Organization and Architecture: Designing for Performance, 4/e, Prentice Hall, Upper Saddle River, 1996.)

6-25 SPARC Registers
User view of RISC I registers.

6-26 Overlapping Register Windows
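A sketch of the idea in SPARC form (the label, argument value, and frame size are illustrative): the caller's out registers become the callee's in registers when save slides the register window, so arguments pass between procedures without being copied.

        mov   5, %o0            ! Caller places an argument in an out register
        call  subr              ! %o7 receives the return address
        nop                     ! Branch delay slot

subr:   save  %sp, -96, %sp     ! Slide the window: the caller's %o0 is now %i0
        add   %i0, 1, %i0       ! Work on the argument in place
        ret                     ! Return to the caller
        restore                 ! In the delay slot: slide the window back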

6-27 Example: Compiled C Program
Source code for a C program to be compiled with gcc.

6-28 gcc Generated SPARC Code

6-29 gcc Generated SPARC Code (continued)

6-30 Effect of Compiler Optimization
SPARC code generated with the -O optimization flag:

6-31 Low Power Coding
Consider the ARC sequence shown below:
        ld [2096], %r16
        ld [2100], %r17
        ld [2104], %r18
        ld [2108], %r19
The corresponding machine code is examined on the next slide.

6-32 Low Power Coding (continued)
The total number of bit transitions for the code on the previous slide is eight. However, if we reorder the last two instructions, the total number of transitions is reduced to only six. Several other optimizations can be made to code sequences by choosing instructions and parameters from functionally equivalent possibilities in such a way that the number of transitions is reduced.
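Counting only the fields that change (the four ld words share identical opcode bits, so the destination register field and the 13-bit address field account for every transition between consecutive words):

                   rd field   address field
ld [2096], %r16:   10000      0100000110000
ld [2100], %r17:   10001      0100000110100   ! 1 + 1 = 2 transitions
ld [2104], %r18:   10010      0100000111000   ! 2 + 2 = 4 transitions
ld [2108], %r19:   10011      0100000111100   ! 1 + 1 = 2 transitions; total 8

With the last two instructions reordered:

ld [2096], %r16:   10000      0100000110000
ld [2100], %r17:   10001      0100000110100   ! 1 + 1 = 2 transitions
ld [2108], %r19:   10011      0100000111100   ! 1 + 1 = 2 transitions
ld [2104], %r18:   10010      0100000111000   ! 1 + 1 = 2 transitions; total 6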

6-33 Low Power Coding (continued)
The use of a Gray code reduces the number of transitions in sequences of instructions and in sequences of addresses:
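For example, counting from 0 to 4, each Gray code step toggles exactly one bit (the particular three-bit sequence below is illustrative):

Binary:  000  001  010  011  100   ! 1 + 2 + 1 + 3 = 7 bit transitions
Gray:    000  001  011  010  110   ! 1 + 1 + 1 + 1 = 4 bit transitions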