CPE300: Digital System Architecture and Design


CPE300: Digital System Architecture and Design
Fall 2011, MW 17:30-18:45, CBC C316
Number Representation
09/21/2011
http://www.egr.unlv.edu/~b1morris/cpe300/

2 Outline
Recap
Logic Circuits for Register Transfer
Machine Number Representation
Performance Measurement
CISC vs. RISC

3 Logic Circuits in ISA
Circuit components support data transmission and storage as well
  Flip-flops for registers (machine state)
  Logic gates for control

4 Multi-Bit Register Transfer
Implementing A<m..1> ← B<m..1>
Strobe signal to store (latch) value in register

5 m-bit Multiplexer
Multiplexer gate signals G_i may be produced by a binary to one-out-of-n decoder
How many gates with how many inputs?
What is the relationship between k and n?

6 Multiplexed Transfers using Gates and Strobes
Selected gate and strobe determine which register is transferred to where.
A ← C and B ← C can occur together, but not A ← C and B ← D.

7 Wired-OR Bus
Bus is a shared datapath
Open collector gates driving bus
OR distributed over the entire connection
Single pull-up resistor for whole bus

8 Tri-State Gate
Controlled gating
Only one gate active at a time
Undefined output when not active

9 Tri-State Bus
Can make any register transfer R[i] ← R[j]
Only a single gate may be active at a time
  G_i = G_j = 1 is not allowed for i ≠ j

10 Heuring's Rules of Buses
Only one thing on bus during a clock cycle
Gate-strobe paradigm
Bus contents disappear at end of clock cycle
  Bus items are not stored unless strobed into a register
Clock period must be long enough to ensure valid signals everywhere along bus
What are contents of tri-state bus when enable signal is low?
  Hi-Z: a disconnected, floating state

11 Example: Registers + ALU with Single Bus
Example
  Abstract RTN
    R[3] ← R[1] + R[2];
  Concrete RTN
    Y ← R[2]; Z ← R[1] + Y; R[3] ← Z;
    ALU-type units are combinational logic and have no memory
  Control Sequence
    R[2]out, Yin; R[1]out, Zin; Zout, R[3]in;
Note: 3 concrete steps to describe a single abstract RTN step

12 Signal Timing
Distinction between gating and strobing signal
How is minimum clock period determined?

13 Example notes
R[i] or Y can get the contents of anything but Y
Result cannot be on bus containing operand
  Arithmetic units have result registers
Only one of two operands can be on the bus at a time
  Adder has register for one operand

14 RTN and Implementation
Abstract RTN
  Describes what machine does
  R[3] ← R[1] + R[2];
Concrete RTN
  Describes how it is accomplished given particular hardware implementation
  Y ← R[2]; Z ← R[1] + Y; R[3] ← Z;
Control Sequence
  Control signal assertion sequence to produce result
  R[2]out, Yin; R[1]out, Zin; Zout, R[3]in
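The three-step control sequence on this slide can be traced in software. The following is a minimal sketch, assuming a single shared bus, an operand latch Y, and a result register Z; the function name `run_add` and the dictionary layout are illustrative and not part of the course material.

```python
# Hypothetical model of a single-bus datapath: register file R,
# ALU operand latch Y, and ALU result register Z.

def run_add(regs):
    """Execute the abstract step R[3] <- R[1] + R[2] as three bus cycles."""
    state = {"R": list(regs), "Y": 0, "Z": 0}

    # Cycle 1: R[2]out, Yin. The bus carries R[2]; Y is strobed.
    bus = state["R"][2]
    state["Y"] = bus

    # Cycle 2: R[1]out, Zin. The bus carries R[1]; the combinational
    # adder output (bus + Y) is strobed into Z.
    bus = state["R"][1]
    state["Z"] = bus + state["Y"]

    # Cycle 3: Zout, R[3]in. The bus carries Z; R[3] is strobed.
    bus = state["Z"]
    state["R"][3] = bus
    return state

final = run_add([0, 5, 7, 0])
print(final["R"][3])  # 5 + 7
```

Only one value is placed on `bus` per cycle, which is why the single abstract step expands into three concrete steps.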

15 Chapter 2 Summary
Classes of computer ISAs
Memory addressing modes
SRC: a complete example ISA
RTN as a description method for ISAs
RTN description of addressing modes
Implementation of RTN operations with digital logic circuits
Gates, strobes, and multiplexers

16 Machine Representation
Computers manipulate bits
Bits must represent things
  Instructions, numbers, characters, etc.
  Must tell machine what the bits mean
Given N bits, $2^N$ different things can be represented

17 Positional Notation for Numbers
Base (radix) B number: B symbols per digit
  Base 10 (decimal): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  Base 2 (binary): 0, 1
Number representation: $d_{31} d_{30} \ldots d_2 d_1 d_0$ is a 32-digit number
  $\text{Value} = d_{31} \times B^{31} + d_{30} \times B^{30} + \cdots + d_1 \times B^1 + d_0 \times B^0$
Examples
  (Decimal): $90 = 9 \times 10^1 + 0 \times 10^0$
  (Binary): $1011010_2 = 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 64 + 16 + 8 + 2 = 90$
  7 binary digits needed for a 2-digit decimal number
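The positional-value sum can be checked with a few lines of Python. This is a sketch only: the helper name `positional_value` is made up for illustration, and it evaluates the sum with Horner's rule (multiply by the base, then add the next digit), which gives the same result as summing each digit times its power of B.

```python
def positional_value(digits, base):
    """Value of a digit sequence (most significant digit first) in `base`."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Horner's rule: shift the running value one position, add the digit.
        value = value * base + d
    return value

print(positional_value([9, 0], 10))               # decimal example from the slide
print(positional_value([1, 0, 1, 1, 0, 1, 0], 2))  # binary example from the slide
```

Both calls evaluate to 90, matching the two examples on the slide.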

18 Hexadecimal Number: Base 16
More human readable than binary
Base with easy conversion to binary
  Any power-of-2 base could work (e.g. octal)
Hexadecimal digits
  Decimal  Binary  Hex
  0        0000    0
  1        0001    1
  2        0010    2
  3        0011    3
  4        0100    4
  5        0101    5
  6        0110    6
  7        0111    7
  8        1000    8
  9        1001    9
  10       1010    A
  11       1011    B
  12       1100    C
  13       1101    D
  14       1110    E
  15       1111    F
1 hex digit represents 16 decimal values or 4 binary digits
Will use 0x to indicate a hex number

19 Hex/Binary Conversion
Examples
  1010 1100 0101 (binary) = 0xAC5
  10111 (binary) = 0001 0111 (binary) = 0x17
  0x3F9 = 0011 1111 1001 (binary) = 11 1111 1001 (binary)
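The conversion works by grouping bits in fours from the right, padding the leftmost group with zeros. A minimal sketch of both directions follows; the function names `binary_to_hex` and `hex_to_binary` are illustrative.

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a binary string (spaces allowed) to a 0x-prefixed hex string."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "0x" + "".join(HEX_DIGITS[int(g, 2)] for g in groups)

def hex_to_binary(hex_str):
    """Convert a hex string (optional 0x prefix) to 4-bit groups."""
    body = hex_str[2:] if hex_str.lower().startswith("0x") else hex_str
    return " ".join(format(int(ch, 16), "04b") for ch in body)

print(binary_to_hex("1010 1100 0101"))
print(binary_to_hex("10111"))
print(hex_to_binary("0x3F9"))
```

The three calls reproduce the slide's examples: 0xAC5, 0x17, and 0011 1111 1001.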

20 Signed Numbers
N bits represent $2^N$ values
  Unsigned integers (32-bit): range $[0, 2^{32} - 1]$
How can negative values be indicated?
  Use a sign bit
  Boolean indicator bit (flag)

21 Sign and Magnitude
16-bit numbers
  +1 (decimal) = 0000 0000 0000 0001 = 0x0001
  -1 (decimal) = 1000 0000 0000 0001 = 0x8001
Problems
  Two zeros: 0x0000 and 0x8000
  Complicated arithmetic
    Special steps needed to handle when signs are same or different (must check sign bit)

22 Ones' Complement
Complement the bits of a number
  +1 (decimal) = 0000 0000 0000 0001 = 0x0001
  -1 (decimal) = 1111 1111 1111 1110 = 0xFFFE
Positive numbers have leading zeros
Negative numbers have leading ones
Arithmetic not too difficult
Still have two zeros

23 Two's Complement
Subtract a large number from a smaller one
  Borrow from leading zeros
  Result has leading ones
    Binary   Decimal
      0011      3
    - 0100    - 4
      1111     -1
Unbalanced representation
  Leading zeros for positive: $2^{N-1}$ non-negatives
  Leading ones for negative: $2^{N-1}$ negative numbers
  One zero representation
First bit is sign bit (must indicate width)
  $\text{Value} = d_{31} \times (-2^{31}) + d_{30} \times 2^{30} + \cdots + d_1 \times 2^1 + d_0 \times 2^0$
  Negative value for sign bit
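The value formula, with its negative weight on the sign bit, can be evaluated directly. The sketch below generalizes the 32-bit formula to any width N by taking the width from the length of the bit string; the name `twos_complement_value` is illustrative.

```python
def twos_complement_value(bits):
    """Signed value of a two's complement bit string (MSB first)."""
    bits = bits.replace(" ", "")
    n = len(bits)
    # The sign bit carries weight -2^(N-1); all other bits are positive.
    value = -int(bits[0]) * 2 ** (n - 1)
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)
    return value

print(twos_complement_value("0011"))  # 3
print(twos_complement_value("1111"))  # 3 - 4 from the slide
print(twos_complement_value("1000"))  # most negative 4-bit value
```

The last call returns -8, which has no positive counterpart in 4 bits; this is the unbalanced range noted on the slide.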

24 Two's Complement Negation
Shortcut = invert bits and add 1
  Number + complement = 0xF..F = -1
  $x + \bar{x} = -1 \Rightarrow \bar{x} + 1 = -x$
Example
  $x$            1111 1110
  $\bar{x}$      0000 0001
  $\bar{x} + 1$  0000 0010
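The invert-and-add-1 shortcut maps onto bitwise operations. In this sketch the function name `negate` and the default width of 8 bits are assumptions chosen to match the slide's example; a mask keeps the result within the fixed width.

```python
def negate(x, width=8):
    """Two's complement negation of a `width`-bit pattern: invert, then add 1."""
    mask = (1 << width) - 1
    inverted = x ^ mask           # flip every bit within the width
    return (inverted + 1) & mask  # add 1 and discard any carry out

x = 0b1111_1110                    # -2 as an 8-bit pattern
print(format(x ^ 0xFF, "08b"))     # inverted bits
print(format(negate(x), "08b"))    # inverted bits plus 1
```

Applying `negate` twice returns the original pattern, as expected for negation.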

25 Two's Complement Sign Extension
Machines have fixed width (e.g. 32 bits)
Numbers conceptually have infinite width (invisible extension)
  Positive has infinite 0's
  Negative has infinite 1's
Replicate sign bit (msb) of smaller container to fill new bits in larger container
Example
  1111 1111 1111 1110 → 1111 1111 1111 1111 1111 1111 1111 1110
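Sign extension can be sketched as copying the most significant bit of the narrow value into every new upper bit. The function name `sign_extend` and its parameters are illustrative.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a `from_bits` two's complement pattern to `to_bits` bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        # Fill the new upper bits with copies of the sign bit (all ones).
        upper = ((1 << (to_bits - from_bits)) - 1) << from_bits
        value |= upper
    return value

print(hex(sign_extend(0xFFFE, 16, 32)))  # slide example: -2 widened to 32 bits
print(hex(sign_extend(0x0001, 16, 32)))  # positive values are padded with zeros
```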

26 Overflow
Fixed bit width limits number representation
Occurs if result of arithmetic operation cannot be represented by hardware bits
Example 8-bit: 127 + 127
    Binary       Decimal
    0111 1111      127
  + 0111 1111    + 127
    1111 1110       -2 (254)
Sometimes called the V flag in condition code
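One common way to detect signed overflow, shown here as a sketch rather than as the method used in the course, is to check whether both operands share a sign that differs from the sign of the result. The name `add_with_overflow` and the 8-bit default are assumptions for illustration.

```python
def add_with_overflow(a, b, width=8):
    """Add two `width`-bit patterns; return (result pattern, signed overflow flag)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a + b) & mask
    # Overflow: the result's sign bit differs from the sign bit of BOTH operands.
    overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, overflow

result, v = add_with_overflow(127, 127)
print(format(result, "08b"), v)  # slide example: pattern 1111 1110, V flag set
```

Adding values of opposite sign, such as 0xFF (-1) and 1, never sets the flag.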

27 Chapter 3
3.1 Machine characteristics and performance
3.2 RISC vs. CISC
3.3 A CISC microprocessor: The Motorola MC68000
3.4 A RISC architecture: The SPARC

28 Machine Performance
What is machine performance?
How can performance be measured?
  Response time
    How long to complete a task
  Throughput
    Total work completed per unit time
    E.g. tasks per hour

29 Some Performance Metrics
MIPS: Millions of Instructions Per Second
  $\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^6}$
  Pitfalls
    Differences in ISA between machines (different instruction counts on different machines)
    Differences in complexity between instructions
    Different values for a single computer (two different programs)
MFLOPS: Million Floating Point OPs Per Second
  Other instructions counted as overhead for the floating point
  Used by supercomputing community
Whetstones: Synthetic benchmark
  A program made up to test specific performance features
Dhrystones: Synthetic competitor for Whetstone
  Made up to correct Whetstone's emphasis on floating point
System Performance Evaluation Cooperative (SPEC)
  Selection of real programs for benchmark
  Taken from the C/Unix world

30 Relative Performance
$\text{Performance}_x = \frac{1}{\text{Execution time}_x}$
$\text{Speedup} = n = \frac{\text{Performance}_x}{\text{Performance}_y} = \frac{\text{Execution time}_y}{\text{Execution time}_x}$
Example
  Compare driving speeds: 34 mph on old route and 46 mph on new
    $n = \frac{\text{speed}_{new}}{\text{speed}_{old}} = \frac{46}{34} = 1.35$
  Compare based on driving time: 96 minutes on old route and 71 minutes on new
    $n = \frac{\text{time}_{old}}{\text{time}_{new}} = \frac{96}{71} = 1.35$
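The two forms of the speedup ratio (by rate and by time) can be compared numerically. This is a minimal sketch with illustrative function names.

```python
def speedup_from_times(time_old, time_new):
    """Speedup n as a ratio of execution times (old over new)."""
    return time_old / time_new

def speedup_from_rates(rate_new, rate_old):
    """Speedup n as a ratio of performance, e.g. speeds (new over old)."""
    return rate_new / rate_old

print(round(speedup_from_rates(46, 34), 2))  # driving speeds in mph
print(round(speedup_from_times(96, 71), 2))  # driving times in minutes
```

Both ratios round to 1.35; note that time appears inverted relative to rate because performance is the reciprocal of execution time.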

31 Measuring Performance
Program execution time is the best measure of performance
Wall clock time/response time/elapsed time
  Total time to complete a task (including disk access, memory access, I/O, etc.)
CPU (execution) time
  Time spent just on CPU computation
  User CPU time: time spent on program
  System CPU time: time spent in OS

32 CPU Clocking
Operation of digital hardware governed by a constant-rate clock
  Clock cycles/ticks/periods
Clock period: duration of a clock cycle
  e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
  e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz

33 CPU Time
For a given program
  $\text{CPU Time} = \text{CPU clock cycles} \times \text{clock cycle time}$
  $\text{CPU Time} = \frac{\text{CPU clock cycles}}{\text{Clock Rate}}$
Performance improvements
  Reduce number of clock cycles
  Increase clock rate
  Hardware designer must trade off between clock cycle count and clock rate

34 CPU Clock Cycles
Cycles related to number of instructions
  $\text{CPU clock cycles} = \#\text{Instructions} \times \text{CPI}$
  $\text{CPU Time} = \#\text{Instructions} \times \text{CPI} \times \text{clock cycle time}$
  $\text{CPU Time} = \frac{\#\text{Instructions} \times \text{CPI}}{\text{Clock Rate}}$
Clock cycles per instruction (CPI)
  Average number of clock cycles per instruction (given a program or program fragment)
  Determined by CPU hardware
  Different CPI for different instructions
    Average CPI affected by instruction mix

35 CPI Example
Computer A: Cycle Time = 250 ps, CPI = 2.0
Computer B: Cycle Time = 500 ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
  Computer A: $\text{CPU time} = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$ (A is faster)
  Computer B: $\text{CPU time} = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$
  $\text{Speedup } n = \frac{\text{time}_{slow}}{\text{time}_{fast}} = \frac{I \times 600}{I \times 500} = 1.2$
  Computer A is 1.2 times faster, a 20% speedup
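The comparison can be reproduced with the CPU time equation. In the sketch below the instruction count of one million is an arbitrary assumption (it cancels in the ratio because both machines share an ISA), and the function name `cpu_time` is illustrative.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * cycle_time_s

I = 1_000_000                       # assumed instruction count; cancels in the ratio
time_a = cpu_time(I, 2.0, 250e-12)  # Computer A
time_b = cpu_time(I, 1.2, 500e-12)  # Computer B

speedup = time_b / time_a           # slower time over faster time
print(round(speedup, 2))
```

The ratio comes out to about 1.2 regardless of the value chosen for `I`.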

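36 CPI Details
Different instruction classes may take different numbers of clock cycles
  $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$
Weighted average CPI
  $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$
  The fraction $\frac{\text{Instruction Count}_i}{\text{Instruction Count}}$ is the relative frequency of instruction class $i$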
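The weighted average can be illustrated with a small instruction mix. The class names, per-class CPI values, and counts below are invented for the example and do not come from the slides.

```python
def weighted_cpi(mix):
    """Average CPI for a mix given as {class: (cpi_i, instruction_count_i)}."""
    total_cycles = sum(cpi * count for cpi, count in mix.values())
    total_instructions = sum(count for _, count in mix.values())
    return total_cycles / total_instructions

# Hypothetical instruction mix: (CPI for the class, instruction count).
mix = {
    "alu": (1, 50),
    "load_store": (2, 30),
    "branch": (3, 20),
}
print(weighted_cpi(mix))  # (50 + 60 + 60) cycles over 100 instructions
```

Shifting the mix toward the 3-cycle class raises the average even though no per-class CPI changes, which is why average CPI depends on the program.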

37 Performance Summary
$\text{CPU Time} = \#\text{Instructions} \times \text{CPI} \times \text{clock cycle time}$
Execution time
  $T = IC \times CPI \times \tau$
  $T$ := CPU time
  $IC$ := instruction count
  $CPI$ := clock cycles/instruction
  $\tau$ := duration of clock period

38 Example 3.1
System clock of computer increased in frequency from 700 MHz to 1.2 GHz. What is the speedup? (Assume no other factors.)
  $\text{Speedup } n = \frac{\text{time}_{old}}{\text{time}_{new}} = \frac{(IC \times CPI \times \tau)_{old}}{(IC \times CPI \times \tau)_{new}} = \frac{1/700}{1/1200} = 1.71$
  IC and CPI do not change because only the clock was adjusted

39 RISC vs. CISC Designs
CISC: Complex Instruction Set Computer
  Many complex instructions and addressing modes
  Some instructions take many steps to execute
  Not always easy to find best instruction for a task
RISC: Reduced Instruction Set Computer
  Few, simple instructions and addressing modes
  Usually one word per instruction
  May take several instructions to accomplish what CISC can do in one
  Complex address calculations may take several instructions
  Usually has load-store, general register ISA

40 Memory Bottleneck
Memory no longer expensive
Design for speed
  Parameter                 1981 (8086)   2004 (Pentium P4)   Improvement factor
  Clock Frequency           4.7 MHz       4 GHz               ~1000
  Clock Period              212 ns        250 ps              ~1000
  Memory Cycle Time         100 ns        70 ns               1.4
  Clocks per Memory Cycle   0.47          280                 ~ -500
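The "clocks per memory cycle" row follows from dividing the memory cycle time by the clock period, which is the same as multiplying by the clock frequency. The sketch below recomputes both columns from the frequencies and memory cycle times in the table; the function name is illustrative. A 4 GHz clock corresponds to a period of 1/(4 GHz) = 250 ps.

```python
def clocks_per_memory_cycle(memory_cycle_s, clock_hz):
    """Number of CPU clock periods that elapse during one memory cycle."""
    return memory_cycle_s * clock_hz

old = clocks_per_memory_cycle(100e-9, 4.7e6)  # 1981 (8086)
new = clocks_per_memory_cycle(70e-9, 4e9)     # 2004 (Pentium P4)

print(round(old, 2), round(new))
print(round(new / old))  # how many times more clocks a memory access costs
```

The values come out near 0.47 and 280, so a memory access went from costing under one clock to costing hundreds of clocks, a change of roughly the magnitude shown in the table's last row.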

41 Dealing with Memory Bottleneck
Employ one or more levels of cache memory.
Prefetch instructions and data into I-cache and D-cache.
Out of order execution.
Speculative execution.
One word per instruction (RISC)
Simple addressing modes (RISC)
Load-Store architecture (RISC)
Lots of general purpose registers (RISC)

42 RISC Design Characteristics
Simple instructions can be done in few clocks
  Simplicity may even allow a shorter clock period
A pipelined design can allow an instruction to complete in every clock period
Fixed length instructions simplify fetch & decode
The rules may allow starting next instruction without necessary results of the previous
  Unconditionally executing the instruction after a branch
  Starting next instruction before register load is complete

43 More on RISC
Prefetch instructions
  Get instruction/data/location before needed in pipeline
Pipelining
  Beginning execution of an instruction before the previous instruction(s) have completed. (Chapter 5.)
Superscalar operation
  Issuing more than one instruction simultaneously. Instruction-level parallelism (Chapter 5.)
Out-of-order execution
Delayed loads, stores, and branches
  Operands may not be available when an instruction attempts to access them.
Register windows
  Ability to switch to a different set of CPU registers with a single command. Alleviates procedure call/return overhead. Discussed with SPARC (Chapter 3)