The Stored Program Computer

ΗΜΥ 312 -- Computer Architecture. Lectures 12-13: CPU Datapath Design, Intro to ALU
Instructor: Maria K. Michael, Associate Professor, Department of Electrical and Computer Engineering (mmichael@ucy.ac.cy)
[Adapted from Computer Architecture, Computer Organization and Design, Patterson & Hennessy, 2005, and Superscalar Microprocessor Design, Johnson, 1992]

The Stored Program Computer
- 1943: ENIAC, by Presper Eckert and John Mauchly: the first general-purpose electronic computer (or was it John V. Atanasoff in 1939?). Its program was hard-wired as settings of dials and switches.
- 1944: Beginnings of EDVAC; among other improvements, it stores the program in memory.
- 1945: John von Neumann wrote a report on the stored-program concept, known as the First Draft of a Report on the EDVAC.

The basic structure proposed in the draft became known as the von Neumann machine (or model):
- a memory, containing instructions and data;
- a processing unit, for performing arithmetic and logical operations;
- a control unit, for interpreting instructions.

Von Neumann Model
[Figure: MEMORY (MAR, MDR); INPUT (keyboard, mouse, scanner, disk, etc.); PROCESSING UNIT (ALU, register file); OUTPUT (monitor, printer, LED, disk, etc.); CONTROL UNIT (PC, IR).]

Data Path Components
- Global bus
  - A set of wires that carry n-bit signals to many components.
  - Inputs to the bus are controlled by tri-state devices (drawn as triangles):
    » a device places its signal on the bus only when enabled;
    » only one (n-bit) signal should be enabled at a time;
    » the control unit decides which signal drives the bus.
  - Any number of components can read the bus:
    » a register captures bus data only if it is write-enabled by the control unit.
- Memory and I/O
  - Control signals and data registers for memory and I/O devices.
  - Memory: LW, SW
  - Input (keyboard): interrupt, DMA
  - Output (text display): interrupt, DMA
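The bus rules above (one enabled driver at a time, any number of readers, a register latches only when write-enabled) can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions: the names `resolve_bus` and `BusRegister`, the 16-bit width, and the PC/ALU values are hypothetical illustrations, not part of the slides, and real tri-state hardware is electrical rather than a function call.

```python
from typing import Optional, Sequence, Tuple


def resolve_bus(drivers: Sequence[Tuple[bool, int]], width: int = 16) -> Optional[int]:
    """Value seen on a shared bus driven through tri-state buffers.

    Each driver is an (enable, value) pair. A disabled driver is in the
    high-impedance state and contributes nothing to the bus.
    Returns None when no driver is enabled (the bus is floating).
    """
    active = [value for enable, value in drivers if enable]
    if len(active) > 1:
        # Two enabled drivers would fight over the same wires: a control bug.
        raise ValueError("bus contention: more than one driver enabled")
    if not active:
        return None
    return active[0] & ((1 << width) - 1)


class BusRegister:
    """Register that latches the bus only when its write-enable is asserted."""

    def __init__(self) -> None:
        self.value = 0

    def clock(self, bus: Optional[int], write_enable: bool) -> None:
        if write_enable and bus is not None:
            self.value = bus


# The control unit enables only the PC's driver; every register sees the bus,
# but only the write-enabled MAR captures it.
bus = resolve_bus([(True, 0x3000), (False, 0x0042)])  # (PC driver, ALU driver)
mar, ir = BusRegister(), BusRegister()
mar.clock(bus, write_enable=True)
ir.clock(bus, write_enable=False)
print(hex(mar.value), hex(ir.value))  # 0x3000 0x0
```

Raising an error on contention mirrors the rule that only one signal may drive the bus; in hardware the result would be undefined rather than an exception.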

Data Path Components (cont.)
- ALU/FPU
  - Input: register file, or sign-extended bits from the IR (immediate field).
  - Output: bus; used by
    » condition code registers
    » register file
    » memory and I/O registers
- Register File
  - Two read addresses, one write address.
  - Input: n bits from the bus
    » the result of an ALU operation or of a memory (or I/O) read.
  - Outputs: two n-bit values
    » used by the ALU, the PC, and memory address computation;
    » data for store instructions passes through the ALU.

Instructions (ISA)
- The fundamental unit of work.
- Constituents:
  - Opcode: the operation to be performed.
  - Operands: the data/locations to be used for the operation.
- Encoded as a sequence of bits (just like data!).
- Sometimes have a fixed length (e.g., 16 or 32 bits).
- Atomic: an operation is either executed completely or not at all.
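A register file with two read addresses and one write address can be modeled as below. This is a sketch only: the class name `RegisterFile` and the defaults (8 registers of 16 bits, as in LC-3) are assumptions chosen for illustration, and reads are treated as combinational while the write is gated by a write-enable.

```python
from typing import List, Tuple


class RegisterFile:
    """Register file with two read ports and one write port.

    Reads are combinational (they return the current contents right away);
    the single write port only updates a register when write_enable is set.
    """

    def __init__(self, num_regs: int = 8, width: int = 16) -> None:
        self.mask = (1 << width) - 1
        self.regs: List[int] = [0] * num_regs

    def read(self, addr_a: int, addr_b: int) -> Tuple[int, int]:
        # Two read addresses -> two n-bit outputs (e.g. both ALU operands).
        return self.regs[addr_a], self.regs[addr_b]

    def write(self, addr: int, data: int, write_enable: bool) -> None:
        # One write address; data normally comes from the bus (ALU or memory).
        if write_enable:
            self.regs[addr] = data & self.mask


rf = RegisterFile()
rf.write(1, 5, write_enable=True)
rf.write(2, 7, write_enable=True)
a, b = rf.read(1, 2)                 # both operands in the same cycle
rf.write(3, a + b, write_enable=True)
rf.write(4, 99, write_enable=False)  # ignored: write not enabled
print(rf.read(3, 4))                 # (12, 0)
```

Two read ports are what allow an instruction such as ADD to obtain both source operands in a single register-file access.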

Instruction Processing
The six phases, in order: FETCH instruction from memory → DECODE instruction → EVALUATE ADDRESS → FETCH OPERANDS → EXECUTE operation → STORE result.

Instruction Processing: FETCH
- Idea: put the next instruction in the IR and increment the PC.
- Steps:
  1. Load the contents of the PC into the MAR.
  2. Increment the PC.
  3. Send a read signal to memory.
  4. Read the contents of the MDR and store them in the IR.
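The four FETCH steps map one-to-one onto register transfers. The sketch below assumes a 16-bit, word-addressed memory (as in LC-3); the `ControlState` dataclass and the `fetch` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControlState:
    PC: int = 0   # program counter
    MAR: int = 0  # memory address register
    MDR: int = 0  # memory data register
    IR: int = 0   # instruction register


def fetch(cpu: ControlState, memory: List[int]) -> None:
    """FETCH phase: the four steps listed on the slide, in order."""
    cpu.MAR = cpu.PC                  # 1. load contents of PC into MAR
    cpu.PC = (cpu.PC + 1) & 0xFFFF    # 2. increment PC (16-bit, word-addressed)
    cpu.MDR = memory[cpu.MAR]         # 3. memory read: MDR <- M[MAR]
    cpu.IR = cpu.MDR                  # 4. copy MDR into IR


memory = [0] * 16
memory[3] = 0x14C4                    # an arbitrary instruction word
cpu = ControlState(PC=3)
fetch(cpu, memory)
print(hex(cpu.IR), cpu.PC)            # 0x14c4 4
```

Note that the PC is already incremented by the time the instruction reaches the IR, which is why PC-relative addresses are computed from the incremented PC.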

Instruction Processing: DECODE
- Identify the opcode.
  - In LC-3, it is always the first four bits of the instruction.
  - A 4-to-16 decoder asserts the control line corresponding to the desired opcode.
- Identify the operands from the remaining bits.
  - This depends on the opcode: e.g., for LDR, the last six bits give the offset; for ADD (register mode), the last three bits name source operand #2.

Instruction Processing: EVALUATE ADDRESS
- Compute the address for loads and stores, and for control-flow instructions.
- Examples:
  - Add an offset to a base register (as in LDR).
  - Add an offset to the PC (as in LD and BR).
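DECODE and EVALUATE ADDRESS can be sketched for two LC-3 instructions, assuming the standard LC-3 encodings: ADD is `0001 DR SR1` with bit 5 selecting immediate mode, LDR is `0110 DR BaseR offset6`, and LD/BR add a sign-extended 9-bit offset to the incremented PC. The helper names (`sext`, `decode`, `ldr_address`, `pc_relative_address`) are hypothetical.

```python
from typing import Dict, List


def sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(ir: int) -> Dict[str, int]:
    """Split a 16-bit LC-3 instruction into opcode and operand fields."""
    fields = {"opcode": (ir >> 12) & 0xF}      # always bits [15:12]
    if fields["opcode"] == 0b0001:             # ADD
        fields["DR"] = (ir >> 9) & 0x7
        fields["SR1"] = (ir >> 6) & 0x7
        if (ir >> 5) & 1:                      # immediate mode
            fields["imm5"] = sext(ir & 0x1F, 5)
        else:                                  # register mode: last 3 bits = SR2
            fields["SR2"] = ir & 0x7
    elif fields["opcode"] == 0b0110:           # LDR
        fields["DR"] = (ir >> 9) & 0x7
        fields["BaseR"] = (ir >> 6) & 0x7
        fields["offset6"] = sext(ir & 0x3F, 6)  # last six bits
    return fields


def ldr_address(regs: List[int], fields: Dict[str, int]) -> int:
    """EVALUATE ADDRESS for LDR: base register + sign-extended offset."""
    return (regs[fields["BaseR"]] + fields["offset6"]) & 0xFFFF


def pc_relative_address(incremented_pc: int, ir: int) -> int:
    """EVALUATE ADDRESS for LD/BR: incremented PC + sign-extended PCoffset9."""
    return (incremented_pc + sext(ir & 0x1FF, 9)) & 0xFFFF


print(decode(0x14C4))  # ADD R2, R3, R4 -> {'opcode': 1, 'DR': 2, 'SR1': 3, 'SR2': 4}
regs = [0] * 8
regs[2] = 0x4000
print(hex(ldr_address(regs, decode(0x62BE))))  # LDR R1, R2, #-2 -> 0x3ffe
```

Only ADD and LDR are decoded here; a full decoder would add one branch per opcode, which is the software analogue of the 4-to-16 decoder.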

Instruction Processing: FETCH OPERANDS
- Get the source operands for the operation.
- Examples:
  - Read data from the register file (ADD).
  - Load data from memory (LDR).

Instruction Processing: EXECUTE
- Actually perform the operation.
- Examples:
  - Send the operands to the ALU and assert the ADD signal.
  - Do nothing (e.g., for loads and stores).

Instruction Processing: STORE
- Write the results to the destination (register or memory).
- Examples:
  - The result of ADD is placed in the destination register.
  - The result of a load instruction is placed in the destination register.
  - For a store instruction, place the data in memory:
    » set the MDR;
    » assert the WRITE signal to memory.

Datapath and Control Unit
[Figure: datapath and control unit.]
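Putting the six phases together, the sketch below steps a tiny program through FETCH, DECODE, EVALUATE ADDRESS, FETCH OPERANDS, EXECUTE and STORE, and shows that not every instruction uses every phase. It is a minimal sketch assuming LC-3 encodings for three instructions only (ADD in register mode, LDR, STR); the `CPU` dataclass, `step` function and the sample program are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


@dataclass
class CPU:
    PC: int = 0
    MAR: int = 0
    MDR: int = 0
    IR: int = 0
    R: List[int] = field(default_factory=lambda: [0] * 8)


def step(cpu: CPU, mem: List[int]) -> None:
    """Run one instruction (ADD register mode, LDR or STR) through the phases."""
    # FETCH
    cpu.MAR = cpu.PC
    cpu.PC = (cpu.PC + 1) & 0xFFFF
    cpu.MDR = mem[cpu.MAR]
    cpu.IR = cpu.MDR
    # DECODE
    opcode = (cpu.IR >> 12) & 0xF
    r_hi = (cpu.IR >> 9) & 0x7   # DR for ADD/LDR, SR for STR
    r_mid = (cpu.IR >> 6) & 0x7  # SR1 for ADD, BaseR for LDR/STR
    if opcode == 0b0001:         # ADD (bit 5 assumed 0: register mode)
        # EVALUATE ADDRESS: not needed. FETCH OPERANDS: register file.
        a, b = cpu.R[r_mid], cpu.R[cpu.IR & 0x7]
        # EXECUTE: ALU add. STORE: destination register.
        cpu.R[r_hi] = (a + b) & 0xFFFF
    elif opcode in (0b0110, 0b0111):
        # EVALUATE ADDRESS: base register + sign-extended offset6
        cpu.MAR = (cpu.R[r_mid] + sext(cpu.IR & 0x3F, 6)) & 0xFFFF
        if opcode == 0b0110:     # LDR
            cpu.MDR = mem[cpu.MAR]   # FETCH OPERANDS: load from memory
            cpu.R[r_hi] = cpu.MDR    # EXECUTE: nothing; STORE: into DR
        else:                    # STR
            cpu.MDR = cpu.R[r_hi]    # FETCH OPERANDS: read SR; STORE: set MDR ...
            mem[cpu.MAR] = cpu.MDR   # ... then assert WRITE to memory
    else:
        raise NotImplementedError(f"opcode {opcode:04b} not modelled here")


mem = [0] * 32
# LDR R1,R0,#0; LDR R2,R0,#1; ADD R3,R1,R2; STR R3,R0,#2
mem[0:4] = [0x6200, 0x6401, 0x1642, 0x7602]
mem[0x10], mem[0x11] = 5, 7
cpu = CPU()
cpu.R[0] = 0x10
for _ in range(4):
    step(cpu, mem)
print(mem[0x12], cpu.R[3], cpu.PC)  # 12 12 4
```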

Tracking Control Signals - Cycle 1
[Figure: control signals for LW.]

Tracking Control Signals - Cycle 2
[Figure: instructions in flight: SW, LW.]

Tracking Control Signals - Cycle 3
[Figure: instructions in flight: ADD, SW, LW.]

Tracking Control Signals - Cycle 4
[Figure: instructions in flight: SUB, ADD, SW, LW.]

Tracking Control Signals - Cycle 5
[Figure: instructions in flight: SUB, ADD, SW, LW.]

Changing the Sequence of Instructions
- In the FETCH phase, we incremented the Program Counter by 1.
- What if we don't always want to execute the instruction that follows this one?
  - Examples: loop, if-then, function call.
- We need special instructions that change the contents of the PC. These are called jumps and branches.
  - Jumps are unconditional: they always change the PC.
  - Branches are conditional: they change the PC only if some condition is true (e.g., the contents of a register are zero).
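The difference between jumps and branches comes down to how the next PC is chosen. The sketch below assumes LC-3-style condition codes (n, z, p set from the last result) and uses hypothetical helpers `condition_code` and `next_pc`; a countdown loop shows a branch being taken until its condition fails.

```python
def condition_code(result: int) -> str:
    """LC-3-style condition code of a 16-bit result: 'n', 'z' or 'p'."""
    result &= 0xFFFF
    if result == 0:
        return "z"
    return "n" if result & 0x8000 else "p"


def next_pc(pc: int, kind: str, target: int = 0, nzp: str = "", cc: str = "z") -> int:
    """PC update after an instruction.

    pc is the already-incremented PC from FETCH.
    kind: "other" (falls through), "jump" (always taken) or
    "branch" (taken only if the current condition code is in nzp).
    """
    if kind == "jump":
        return target
    if kind == "branch" and cc in nzp:
        return target
    return pc


# Countdown loop: body at 0x3000, "BRp 0x3000" at 0x3001 (so PC+1 = 0x3002).
count, iterations = 3, 0
while True:
    count -= 1                       # loop body, e.g. ADD R1, R1, #-1
    iterations += 1
    cc = condition_code(count)       # the ADD sets the condition code
    pc = next_pc(0x3002, "branch", target=0x3000, nzp="p", cc=cc)
    if pc != 0x3000:                 # branch not taken: fall out of the loop
        break
print(iterations, hex(pc))           # 3 0x3002
```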

Instruction Processing Summary
- Instructions look just like data: it's all interpretation.
- Three basic kinds of instructions:
  - computational instructions (ADD, AND, ...)
  - data movement instructions (LD, ST, ...)
  - control instructions (JMP, BRnz, ...)
- Six basic phases of instruction processing: F → D → EA → OP → EX → S
  - not all phases are needed by every instruction;
  - phases may take a variable number of machine cycles.

Driving Force: The Clock
- The clock is a signal that keeps the control unit moving.
  - At each clock tick, the control unit moves to the next machine cycle, which may be the next instruction or the next phase of the current instruction.
- Clock generator circuit:
  - Based on a crystal oscillator.
  - Generates a regular sequence of 0 and 1 logic levels.
  - Clock cycle (or machine cycle): from rising edge to rising edge.
[Figure: clock waveform alternating between 1 and 0 over time; one machine cycle spans from one rising edge to the next.]

Instructions vs. Clock Cycles
- MIPS vs. MHz
  - MIPS = millions of instructions per second
  - MHz = millions of clock cycles per second
- These are not the same. Why?

The Control Unit
- The program is stored in memory as machine-language instructions, in binary.
- The task of the control unit is to execute programs by repeatedly:
  - Fetch from memory the next instruction to be executed.
  - Decode it, that is, determine what is to be done.
  - Execute it by issuing the appropriate signals to the ALU, memory, and I/O subsystems.
- This continues until the HALT instruction.
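One way to see why MIPS and MHz differ: an instruction usually takes more than one clock cycle, so MIPS = MHz / CPI, where CPI is the average number of cycles per instruction. The instruction mix and the 1000 MHz clock below are hypothetical numbers, and `average_cpi` / `mips_rating` are illustrative names.

```python
from typing import List, Tuple


def average_cpi(mix: List[Tuple[float, float]]) -> float:
    """Weighted CPI from (fraction_of_instructions, cycles_per_instruction) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def mips_rating(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second = millions of cycles per second / CPI."""
    return clock_mhz / cpi


mix = [(0.5, 1), (0.3, 2), (0.2, 3)]    # hypothetical instruction mix
cpi = average_cpi(mix)                   # approximately 1.7
print(round(mips_rating(1000, cpi), 1))  # 1000 MHz -> about 588.2 MIPS
```

Only when every instruction completes in exactly one cycle (CPI = 1) do the two numbers coincide.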

von Neumann Architecture (30/10/16)

The Von Neumann Architecture ((c) Yngvi Bjornsson)
[Figure: Processor (CPU), containing the Control Unit and the ALU, connected by a bus to Memory and Input-Output.]
- Memory: stores data and program.
- Control Unit: executes the program.
- ALU: does the arithmetic/logic operations requested by the program.
- Input-Output: communicates with the "outside world", e.g., screen, keyboard, storage devices, ...

The ALU Subsystem
- The ALU (Arithmetic/Logic Unit) performs
  - mathematical operations (+, -, ×, /, ...)
  - logic operations (=, <, >, and, or, not, ...)
- In today's computers it is integrated into the CPU.
- It consists of:
  - circuits to do the arithmetic/logic operations;
  - registers (fast storage units) to store intermediate computational results;
  - a bus that connects the two.

Structure of the ALU
- Registers: very fast local memory cells that store the operands of operations and intermediate results.
  - CCR (condition code register): a special-purpose register that stores the result of <, =, > operations.
- ALU circuitry: contains an array of circuits to do mathematical/logic operations.
- Bus: the data path interconnecting the registers and the ALU circuitry.
[Figure: registers R0, R1, R2, ..., Rn connected to the ALU circuitry, whose comparison result sets the GT, EQ and LT bits of the CCR.]
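The CCR with its GT, EQ and LT bits can be illustrated by a comparison that sets exactly one of the three flags. This is a sketch: the function name `compare` and the dictionary representation of the CCR are assumptions, not how the register is physically laid out.

```python
from typing import Dict


def compare(a: int, b: int) -> Dict[str, int]:
    """Set the GT/EQ/LT bits of a condition code register from a comparison.

    Exactly one of the three bits is 1 for any pair of operands.
    """
    return {"GT": int(a > b), "EQ": int(a == b), "LT": int(a < b)}


ccr = compare(7, 9)
print(ccr)  # {'GT': 0, 'EQ': 0, 'LT': 1}
```

A later conditional instruction can then test one of these bits instead of re-comparing the operands.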

ALU and its importance
- It is a functional box designed to perform basic arithmetic, logic, and shift operations on data.
- Implementing basic operations such as logic, program control, and data transfer is easier than implementing arithmetic and I/O operations. Therefore, in this section we concentrate on arithmetic operations.

Why do we need to improve the ALU?
- To improve performance, this section focuses on the Arithmetic Logic Unit.
- Referring to our earlier discussion of CPU time (T), we are looking at techniques to reduce p:

$$T = I_c \times CPI \times \tau = I_c \times (p + m \times k) \times \tau$$

where $I_c$ is the instruction count, $\tau$ the clock cycle time, $p$ the processor cycles per instruction, $m$ the memory references per instruction, and $k$ the memory latency in cycles (including cache effects).
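A worked evaluation of $T = I_c \times (p + m \times k) \times \tau$ shows how reducing $p$ (the processor/ALU part of CPI) shortens CPU time while the other factors stay fixed. All workload numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cpu_time(ic: float, p: float, m: float, k: float, tau: float) -> float:
    """T = Ic * CPI * tau, with CPI = p + m * k."""
    cpi = p + m * k
    return ic * cpi * tau


# Hypothetical workload: 2 million instructions, 1 ns clock cycle,
# 1.5 memory references per instruction, each costing 2 cycles.
base = cpu_time(ic=2e6, p=4, m=1.5, k=2, tau=1e-9)
faster_alu = cpu_time(ic=2e6, p=3, m=1.5, k=2, tau=1e-9)  # ALU work cut by one cycle
print(f"{base * 1e3:.1f} ms -> {faster_alu * 1e3:.1f} ms")  # 14.0 ms -> 12.0 ms
```

Here CPI falls from 4 + 1.5 × 2 = 7 to 6, so the same program runs in 6/7 of the original time.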

Review: MIPS Arithmetic Instructions
- Formats:
  - R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  - I-type: op [31:26] | rs [25:21] | rt [20:16] | immed [15:0] (16 bits)
- Immediates are expanded to 32 bits before reaching the ALU.
- 10 operations, so they can be encoded in 4 bits.
- Operations (all with op = 00; funct in binary):
  - ADD 100000
  - ADDU 100001
  - SUB 100010
  - SUBU 100011
  - AND 100100
  - OR 100101
  - XOR 100110
  - NOR 100111
  - SLT 101010
  - SLTU 101011
  - (funct codes 101000, 101001 and 101100 are left blank)

[Figure: 32-bit ALU with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result, and 1-bit zero and ovf outputs.]
Operation codes m (hex): 0 add, 1 addu, 2 sub, 3 subu, 4 and, 5 or, 6 xor, 7 nor, a slt, b sltu.

Logic Operations (operation: symbol, MIPS instruction)
- shift left: <<, sll $10, $16, 8
- shift right: >>, srl $10, $16, 8
- AND: &, and $3, $7, $8
- OR: |, or $3, $7, $8

R-type (add, sub) instruction format:
op (6 bits, opcode) | rs (5 bits, 1st source) | rt (5 bits, 2nd source) | rd (5 bits, destination) | shamt (5 bits, shift amount) | funct (6 bits, function) = 32 bits
For shift instructions (shift left and shift right), the 1st source register (rs) is unused.
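The ALU box with inputs A, B, the 4-bit operation code m, and outputs result, zero and ovf can be modeled behaviorally. This is a sketch, not a gate-level design: the function names are hypothetical, and reporting ovf only for the signed operations (add, sub) while addu/subu never flag it is an assumption that matches how MIPS treats the unsigned variants.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a: int, b: int, m: int) -> Tuple[int, bool, bool]:
    """32-bit ALU: returns (result, zero, ovf) for 4-bit operation code m."""
    a, b = a & MASK32, b & MASK32
    ovf = False
    if m in (0x0, 0x1):                        # add / addu
        result = (a + b) & MASK32
        ovf = m == 0x0 and to_signed(a) + to_signed(b) != to_signed(result)
    elif m in (0x2, 0x3):                      # sub / subu
        result = (a - b) & MASK32
        ovf = m == 0x2 and to_signed(a) - to_signed(b) != to_signed(result)
    elif m == 0x4:
        result = a & b                         # and
    elif m == 0x5:
        result = a | b                         # or
    elif m == 0x6:
        result = a ^ b                         # xor
    elif m == 0x7:
        result = ~(a | b) & MASK32             # nor
    elif m == 0xA:
        result = int(to_signed(a) < to_signed(b))  # slt (signed compare)
    elif m == 0xB:
        result = int(a < b)                    # sltu (unsigned compare)
    else:
        raise ValueError(f"unused operation code {m:#x}")
    return result, result == 0, ovf


print(alu(0x7FFFFFFF, 1, 0x0))  # (2147483648, False, True): signed overflow
print(alu(0xFFFFFFFF, 1, 0xA))  # (1, False, False): -1 < 1 as signed values
```

The same bit patterns give different answers for slt and sltu (0xFFFFFFFF is -1 signed but the largest unsigned value), which is why both comparisons need their own operation code.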

ALU
- Some ALU operations: arithmetic, logic, comparison.
- Big picture: what's in there? How do we build it?

Principles
- For a simple machine, the ALU should at least be able to perform operations such as:
  - add
  - increment
  - subtract
  - decrement
- A simple ALU is basically an adder and some control circuits, augmented by special circuits to carry out the logic and shift operations.

Serial/Parallel/Modular ALUs
- An ALU can be one of three types:
  - Serial
  - Parallel
  - Functional (Modular)
- Similar to the definitions of serial and parallel adders, one can define serial and parallel ALUs.
- In a serial ALU, one bit of the operand(s) participates in the operation during each clock pulse.
- In a parallel ALU, the operation on all the bits of the operand(s) is initiated simultaneously.
  - In simple terms, a parallel ALU can be viewed as a cascade of identical units forming a one-dimensional array of cells.

Parallel ALU
[Figure: a row of n identical cells sharing the ALU control signals; cell i takes $A_i$, $B_i$ and carry $C_i$ and produces $F_i$ and $C_{i+1}$; $C_1$ enters the first cell and $C_{n+1}$ leaves the last.]
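The serial case can be sketched as a single full adder plus a carry flip-flop reused once per clock pulse, least significant bit first, so an n-bit addition needs n pulses. The function name `serial_add` and the pulse counter are illustrative assumptions.

```python
from typing import Tuple


def serial_add(a: int, b: int, n: int) -> Tuple[int, int, int]:
    """Bit-serial addition: one full adder reused for n clock pulses.

    Returns (n-bit sum, final carry, clock pulses used).
    """
    carry = 0              # carry flip-flop, cleared before the first pulse
    total = 0
    pulses = 0
    for i in range(n):     # one bit pair per clock pulse, LSB first
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        pulses += 1
    return total, carry, pulses


print(serial_add(11, 6, 4))  # (1, 1, 4): 11 + 6 = 17 = carry 1, sum 0001
```

A parallel ALU replicates this cell n times and applies all bit pairs at once, trading hardware for speed.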

Parallel ALU
- The ALU operation is determined by the control signals.
- In a very simple form, the bit pattern of the control signal is determined by the operation code.
- In a parallel ALU, one needs to design a single unit and then replicate it.

Simple ALU Design Example: A Simple Arithmetic Logic Unit
[Figure: one stage of a simple ALU, with inputs $A_i$, $B_i$, $C_i$ and control signals $S_2$, $S_1$, $S_0$, $M$; internal signals $X_i$, $Y_i$, $Z_i$; outputs $F_i$ and $C_{i+1}$.]
- $S_2$, $S_1$, $S_0$, and $M$ are the control signals.
- $A_i$, $B_i$, and $C_i$ are the operand bits and the carry-in, respectively.
- $F_i$ and $C_{i+1}$ are the result bit and the carry-out, respectively.

Simple ALU Design Example: A Simple Arithmetic Logic Unit (cont.)
[Figure: the same stage, with $X_i$, $Y_i$ and $Z_i$ feeding a full adder that produces $F_i$ and $C_{i+1}$.]
- The function of each stage can be defined as:

$$F_i = X_i \oplus Y_i \oplus Z_i$$
$$C_{i+1} = X_i Y_i + (X_i \oplus Y_i) Z_i = X_i Y_i + X_i Z_i + Y_i Z_i$$

- By appropriate setting of the control signals, one can initiate a variety of operations.
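The stage equations can be checked by chaining stages. Because the figure that maps $A_i$, $B_i$, $C_i$ and the control signals onto $X_i$, $Y_i$, $Z_i$ is not available, this sketch assumes $Z_i = C_i$ (arithmetic use) and feeds $X$ and $Y$ directly; choosing $Y$ = all ones to obtain $F = A - 1$ is an assumed routing consistent with the slide's $F = A - 1$ example, not the documented selection logic. The names `alu_stage` and `cascade` are hypothetical.

```python
from typing import Tuple


def alu_stage(x: int, y: int, z: int) -> Tuple[int, int]:
    """One stage: F_i = X_i xor Y_i xor Z_i, C_{i+1} = X_iY_i + X_iZ_i + Y_iZ_i."""
    f = x ^ y ^ z
    c_out = (x & y) | (x & z) | (y & z)
    return f, c_out


def cascade(x_word: int, y_word: int, c1: int, n: int = 4) -> Tuple[int, int]:
    """Chain n stages, assuming Z_i is the incoming carry C_i (arithmetic use)."""
    carry, f_word = c1, 0
    for i in range(n):
        f, carry = alu_stage((x_word >> i) & 1, (y_word >> i) & 1, carry)
        f_word |= f << i
    return f_word, carry


A, B = 6, 5
print(cascade(A, B, 0))       # X=A, Y=B: F = A + B -> (11, 0)
print(cascade(A, 0b1111, 0))  # X=A, Y=all ones: F = A - 1 -> (5, 1)
```

Adding the all-ones pattern equals adding $-1$ in two's complement, which is why a decrement needs no separate subtractor.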

Simple ALU Design Example: A Simple Arithmetic Logic Unit, Examples
- Example: $M = 0$, $S_2 = 1$, $S_1 = 0$, $S_0 = 1$, $C_1 = x$ (don't care): $F = A$ [?] $B$.
- Example: $M = 1$, $S_2 = 1$, $S_1 = 1$, $S_0 = 1$, $C_1 = 0$: $F = A - 1$.

Improvements?
- As discussed before, a parallel ALU offers higher speed than a serial ALU.
- How can one improve the performance (speed) of the ALU further?
- Is it possible to build (design) an ALU faster than a parallel ALU?

Functional (Modular) ALU
- The ALU is a collection of independent units, each tailored for a specific operation. As a result, independent operations can be overlapped.
- This approach allows an additional degree of concurrency relative to a parallel ALU, since several operations can be performed on data simultaneously.
- This speed improvement comes at the expense of the extra overhead needed to detect data-independent operations.

Functional (Modular) ALU
[Figure: a functional ALU built from independent units: Adder 1, Adder 2, Subtractor, Multiplier.]

How do we design an ALU?
- Is it possible to improve the performance of an ALU further?
- Naturally, we can improve the performance (physical speed) by taking advantage of advances in technology.
- How can we improve the logical speed of the ALU further?
- In a functional ALU, is it possible to devise algorithms that improve the performance of the basic operations?
- If this is a valid direction, then the question of how to design a fast ALU becomes how to design a fast adder, a fast multiplier, ...?
- As a computer architect, how do you design an ALU?

Review: A 32-bit Adder/Subtractor
- Built out of 32 full adders (FAs).
[Figure: a 1-bit FA with inputs A, B, carry_in and outputs S, carry_out; 32 such FAs chained from bit 0 ($c_0$ = carry_in) to bit 31 ($c_{32}$ = carry_out), with an add/subt control on the B inputs.]
- $S = A \oplus B \oplus carry\_in$
- $carry\_out = A B + A \cdot carry\_in + B \cdot carry\_in$ (majority function)
- Small but slow!

Let's Start with Addition: Fast Adder
- How can we design an adder faster than a parallel adder?
- What is the major bottleneck in a parallel adder?
- Is carry generation and propagation the major bottleneck?
- Is it possible to eliminate, moderate, or reduce the delay of carry generation and propagation?
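The 32-bit adder/subtractor can be sketched bit by bit: each B bit passes through an XOR with the add/subt control, and the same control drives $c_0$, so subtraction computes $A + \overline{B} + 1 = A - B$ in two's complement. The function names are hypothetical; the loop order mirrors the carry rippling from bit 0 to bit 31.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    s = a ^ b ^ cin                            # S = A xor B xor carry_in
    cout = (a & b) | (a & cin) | (b & cin)     # majority function
    return s, cout


def add_sub(a: int, b: int, subtract: bool, n: int = 32) -> Tuple[int, int]:
    """Ripple-carry adder/subtractor built from n 1-bit full adders.

    For subtraction each B bit is XORed with the add/subt control and
    c0 = 1, so the chain computes A + ~B + 1 = A - B (two's complement).
    """
    sub = int(subtract)
    carry = sub                                # c0 = carry_in
    result = 0
    for i in range(n):                         # carry ripples from bit 0 to n-1
        s, carry = full_adder((a >> i) & 1, ((b >> i) & 1) ^ sub, carry)
        result |= s << i
    return result, carry                       # carry is c_n = carry_out


print(add_sub(7, 5, subtract=True))            # (2, 1)
print(hex(add_sub(5, 7, subtract=True)[0]))    # 0xfffffffe, i.e. -2
```

The sequential loop is also why the design is "small but slow": stage i cannot finish until the carry from stage i-1 arrives.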

How do we speed up adders? Fast Adder
- Carry Lookahead: generate and propagate carries ahead of time relative to a parallel adder.
  - Scheme 1
  - Scheme 2
- Carry Select

Recall from ECE 210/211/212/213: Basic Building Block, a 4-Bit Ripple-Carry Adder
[Figure: four full adders (FA) in a chain; stage i takes $A_i$, $B_i$ and $C_i$ and produces $F_i$ and $C_{i+1}$; $C_1$ is the carry-in and $C_5$ the carry-out.]
Timing: let $\Delta t$ be the propagation delay of one gate.
- $F_1 = 2\Delta t$, $C_2 = 2\Delta t$
- $F_2 = 4\Delta t$, $C_3 = 4\Delta t$
- $F_3 = 6\Delta t$, $C_4 = 6\Delta t$
- $F_4 = 8\Delta t$, $C_5 = 8\Delta t$
In general, n bits require 2n logic levels (gate delays).

Carry Lookahead Adder (CLA): Fast Adder, Carry Lookahead (Scheme 1)

$$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i = A_i B_i + (A_i + B_i) C_i$$

Here $A_i B_i$ is the carry generate term ($G_i$) and $(A_i + B_i)$ is the carry propagate term ($P_i$).

CLA Design
- From a full adder, we distinguish between carry generation (a new carry is generated, $C_{out} = 1$) and carry propagation (an existing $C_{in}$ is propagated to $C_{out}$).
- Generation: $G_i = A_i B_i$: if 1, then $C_{i+1} = 1$.
- Propagation: $P_i = A_i \oplus B_i$: if 1, then $C_{i+1} = C_i$.
[Figure: a Full Adder (FA) with inputs $A_i$, $B_i$, $C_i$ and outputs $S_i$, $C_{i+1}$, next to a Partial Full Adder (PFA) with inputs $A_i$, $B_i$, $C_i$ and outputs $S_i$, $G_i$, $P_i$.]

CLA Block
Implementation:

$$C_1 = G_0 + P_0 C_0$$
$$C_2 = G_1 + P_1 C_1 = G_1 + P_1 (G_0 + P_0 C_0) = G_1 + P_1 G_0 + P_1 P_0 C_0$$
$$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$$
$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0 = G_{0-3} + P_{0-3} C_0$$

where $G_{0-3} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ is the group carry generate and $P_{0-3} = P_3 P_2 P_1 P_0$ is the group carry propagate.

Generate/Propagate Logic for a 4-bit CLA
- All two-level logic → $C_{out}$ is computed quickly.
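The CLA equations can be transcribed directly and checked exhaustively against ordinary addition. This sketch uses $P_i = A_i \oplus B_i$ (the PFA definition) so that the sum bit is simply $S_i = P_i \oplus C_i$; with $P_i = A_i + B_i$ the carries would be the same but the sum would need a separate XOR. The name `cla4` is hypothetical.

```python
from typing import Tuple


def cla4(a: int, b: int, c0: int) -> Tuple[int, int, int, int]:
    """4-bit carry-lookahead adder. Returns (sum, C4, group G, group P)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # G_i = A_i B_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # P_i = A_i xor B_i
    # Every carry is a two-level sum of products of G, P and C0: no rippling.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    g_grp = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    p_grp = p[3] & p[2] & p[1] & p[0]
    c4 = g_grp | (p_grp & c0)                                  # C4 = G_{0-3} + P_{0-3} C0
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))       # S_i = P_i xor C_i
    return s, c4, g_grp, p_grp


# Exhaustive check against ordinary integer addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            s, c4, _, _ = cla4(a, b, c0)
            assert s + (c4 << 4) == a + b + c0
print(cla4(0b1010, 0b0101, 1))  # (0, 1, 0, 1): every bit propagates C0
```

The group outputs $G_{0-3}$ and $P_{0-3}$ are what a second-level lookahead unit would consume when 4-bit blocks are combined into wider adders.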

Carry Lookahead: Fast Adder, Carry Lookahead (Scheme 1): Extended (CLA) 4-Bit Full Adder
[Figure: four full adders (F.A.) for bits 1-4, with inputs $A_1$-$A_4$, $B_1$-$B_4$ and outputs $F_1$-$F_4$; their $p_i$ and $g_i$ signals feed lookahead logic that produces the carries $C_2$-$C_4$ from $C_1$.]

Carry Lookahead: Extended (CLA) 4-Bit Full Adder Timing
- p's and g's are generated in $1\Delta t$.
- C's are generated after another $2\Delta t$.
- F's are generated after another $2\Delta t$.
- → $5\Delta t$ (logic levels) for 4 bits → for n bits??

Carry Lookahead (Scheme 1): Extended (CLA) 4-Bit Full Adders in Ripple
- What is the speedup with respect to a non-CLA (ripple) adder for n bits?
[Figure: extended 4-bit F.A. blocks for bits 1-4, 5-8, ..., (n-3)-n connected in ripple; $C_1$ enters the lowest block, each block's carry-out feeds the next, and the last block produces the carry-out; outputs $F_1$-$F_4$, $F_5$-$F_8$, ..., $F_{n-3}$-$F_n$.]

Carry Lookahead (Scheme 2)
[Figure: 4-bit F.A. blocks for bits 1-4, 5-8, ..., 29-32, each paired with a second-level CLA 2 unit that produces the carry into the next block; $C_1$ enters the lowest block; outputs $F_1$-$F_4$, $F_5$-$F_8$, ..., $F_{29}$-$F_{32}$ and the carry-out.]
Timing:
- CLA = $5\Delta t$
- Cascaded CLAs overlap by $1\Delta t$ of operation.

Carry Select: Fast Adder
- The carry-in to a 4-bit full adder is either 0 or 1.
- Duplicate each stage (e.g., a 4-bit full adder).
- Start the two units in a stage with carry-ins of 0 and 1, respectively.
- Use a multiplexer to select the correct answer.

Carry Select: 8-bit Example
[Figure: bits 1-4 ($A_1$-$A_4$, $B_1$-$B_4$) enter a 4-bit full adder with the carry-in and produce $F_1$-$F_4$; bits 5-8 ($A_5$-$A_8$, $B_5$-$B_8$) enter two 4-bit full adders with carry-ins 1 and 0, producing $F_5''$-$F_8''$ and $F_5'$-$F_8'$; a MUX selects $F_5$-$F_8$. Delays annotated in the figure: $12\Delta t$ and $10\Delta t$.]
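An 8-bit carry-select adder can be sketched with two precomputed versions of the upper half and a multiplexer driven by the lower half's carry-out. Here `add4` is a behavioral stand-in for a 4-bit full adder (it uses integer addition rather than gates), and both function names are hypothetical.

```python
from typing import Tuple


def add4(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Behavioral stand-in for a 4-bit full adder: (4-bit sum, carry-out)."""
    total = (a & 0xF) + (b & 0xF) + cin
    return total & 0xF, total >> 4


def carry_select_add8(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """8-bit carry-select adder: (8-bit sum, carry-out)."""
    low, c5 = add4(a, b, cin)                 # bits 1-4 with the real carry-in
    hi0, cout0 = add4(a >> 4, b >> 4, 0)      # bits 5-8 assuming carry-in 0
    hi1, cout1 = add4(a >> 4, b >> 4, 1)      # bits 5-8 assuming carry-in 1
    # Multiplexer: the carry out of the low half picks the precomputed result.
    high, cout = (hi1, cout1) if c5 else (hi0, cout0)
    return (high << 4) | low, cout


print(carry_select_add8(0x9C, 0x78))  # (20, 1): 0x9C + 0x78 = 0x114
```

All three 4-bit additions can run at the same time, so the upper half costs only one multiplexer delay after the lower half's carry is known, at the price of duplicating the upper adder.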

ALU Questions: Practice
- Calculate the execution time of a 16-bit adder using carry lookahead Scheme 1.
- Formulate the execution time of an n-bit adder using carry lookahead Scheme 1 (n is a multiple of 4).
- Calculate the execution time of a 16-bit adder using carry lookahead Scheme 2.
- Formulate the execution time of an n-bit adder using carry lookahead Scheme 2 (n is a multiple of 4).
- Calculate the execution time of a 16-bit adder using the carry select scheme.
- Formulate the execution time of an n-bit adder using the carry select scheme.
- Is it possible to combine the carry lookahead and carry select concepts to design a faster adder?

Next Lecture and Home Study
- Next unit: Computer Arithmetic and ALU Improvements
- Home study:
  - Chapters 5-6 of Patterson & Hennessy (the ΗΜΥ 212 textbook)
  - Appendices B and I of your textbook