Computer Architecture

Similar documents
Software Development for SAP R/3

Lecture Notes in Computer Science 2001 Edited by G. Goos, J. Hartmanis and J. van Leeuwen

By, Ajinkya Karande Adarsh Yoga

The Information Retrieval Series. Series Editor W. Bruce Croft

Computer Arithmetic andveriloghdl Fundamentals

Interfacing with C++

Advanced Data Mining Techniques

Jinkun Liu Xinhua Wang. Advanced Sliding Mode Control for Mechanical Systems. Design, Analysis and MATLAB Simulation

Lecture Notes in Mathematics Editors: J.--M. Morel, Cachan F. Takens, Groningen B. Teissier, Paris

Julien Masanès. Web Archiving. With 28 Figures and 6 Tables ABC

Graphics Programming in c++

Gengsheng Lawrence Zeng. Medical Image Reconstruction. A Conceptual Tutorial

CPE300: Digital System Architecture and Design

c-xsc R. Klatte U. Kulisch A. Wiethoff C. Lawo M. Rauch A C++ Class Library for Extended Scientific Computing Springer-Verlag Berlin Heidelberg GmbH

Topics. 6.1 Number Systems and Radix Conversion 6.2 Fixed-Point Arithmetic 6.3 Seminumeric Aspects of ALU Design 6.4 Floating-Point Arithmetic

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8

Computer Organisation CS303

Computer Science Workbench. Editor: Tosiyasu L. Kunii

INTELLIGENCE PLUS CHARACTER - THAT IS THE GOAL OF TRUE EDUCATION UNIT-I

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 3. Arithmetic for Computers Implementation

High Availability and Disaster Recovery

Contributions to Economics

Geometric Modeling and Algebraic Geometry

The Architectural Logic of Database Systems

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Parallel Programming

Divide: Paper & Pencil

CO212 Lecture 10: Arithmetic & Logical Unit

Honorary Professor Supercomputer Education and Research Centre Indian Institute of Science, Bangalore

Real-Time Graphics Rendering Engine

Research on Industrial Security Theory

A. Portela A. Charafi Finite Elements Using Maple

Philip Andrew Simpson. FPGA Design. Best Practices for Team-based Reuse. Second Edition

Course Description: This course includes concepts of instruction set architecture,

Stefan Waldmann. Topology. An Introduction

Signed Multiplication Multiply the positives Negate result if signs of operand are different

PART A (22 Marks) 2. a) Briefly write about r's complement and (r-1)'s complement. [8] b) Explain any two ways of adding decimal numbers.

George Grätzer. Practical L A TEX

Springer-Verlag Berlin Heidelberg GmbH

Chapter 5 : Computer Arithmetic

Digital System Design with SystemVerilog

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT-I

CHETTINAD COLLEGE OF ENGINEERING AND TECHNOLOGY COMPUTER ARCHITECURE- III YEAR EEE-6 TH SEMESTER 16 MARKS QUESTION BANK UNIT-1

VERILOG QUICKSTART. James M. Lee Cadence Design Systems, Inc. SPRINGER SCIENCE+BUSINESS MEDIA, LLC

SYLLABUS. osmania university CHAPTER - 1 : REGISTER TRANSFER LANGUAGE AND MICRO OPERATION CHAPTER - 2 : BASIC COMPUTER

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Similarity and Compatibility in Fuzzy Set Theory

Computer Architecture and Organization

Organisasi Sistem Komputer

Computational Geometry - Algorithms and Applications

PostScript ej Acrobat/PDF

ECE 30 Introduction to Computer Engineering

Chapter 10 - Computer Arithmetic

Enabling Technologies for Wireless E-Business

Robust SRAM Designs and Analysis

Lecture Notes in Computer Science. Edited by G. Goos, J. Hartmanis and J. van Leeuwen

Floating-point representations

Floating-point representations

Computer Organization

Digital Design. Verilo. and. Fundamentals. fit HDL. Joseph Cavanagh. CRC Press Taylor & Francis Group Boca Raton London New York

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

COMPUTER ORGANIZATION AND ARCHITECTURE

Springer-Verlag Berlin Heidelberg GmbH

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering

CS Computer Architecture. 1. Explain Carry Look Ahead adders in detail

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

MASTERING COBOL PROGRAMMING

Inf2C - Computer Systems Lecture 2 Data Representation

Arithmetic Logic Unit

ECE260: Fundamentals of Computer Engineering

ITIL 2011 At a Glance. John O. Long

FLOATING POINT NUMBERS

THE VERILOG? HARDWARE DESCRIPTION LANGUAGE

Principles of Computer Architecture. Chapter 3: Arithmetic

UNIT-II. Part-2: CENTRAL PROCESSING UNIT

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

Week 7: Assignment Solutions

COMPUTER ORGANIZATION AND ARCHITECTURE

CS6303-COMPUTER ARCHITECTURE UNIT I OVERVIEW AND INSTRUCTIONS PART A

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

Embedded Systems Ch 15 ARM Organization and Implementation

VHDL for Synthesis. Course Description. Course Duration. Goals

DC57 COMPUTER ORGANIZATION JUNE 2013

ECE 341 Midterm Exam

Guide to OSI and TCP/IP Models

Part III The Arithmetic/Logic Unit. Oct Computer Architecture, The Arithmetic/Logic Unit Slide 1

ECE 341 Midterm Exam

Chapter 3. Arithmetic Text: P&H rev

Digital Logic & Computer Design CS Professor Dan Moldovan Spring 2010

SpringerBriefs in Computer Science

An Introduction to Programming with IDL

Chapter 3: Arithmetic for Computers

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Computer Architecture Set Four. Arithmetic

EC2303-COMPUTER ARCHITECTURE AND ORGANIZATION

Advanced Computer Architecture-CS501

COMP2611: Computer Organization. Data Representation

Transcription:

Computer Architecture

Springer-Verlag Berlin Heidelberg GmbH

Silvia M. Mueller Wolfgang J. Paul Computer Architecture Complexity and Correctness With 214 Figures and 185 Tables Springer

Silvia Melitta Mueller IBM Lab Boblingen - Dept. 3173 SchOnaicherstr. 220 7lO32 Boblingen, Germany E-mail: SMM@de.ibm.com Wolfgang J. Paul Fachbereich Informatik Universitat des Saarlandes 1m Stadtwald, Gebaude 45 66123 Saarbriicken, Germany E-mail: wjp@cs.uni-sb.de Cover picture by Jantje JanSen, Karlsruhe Library of Congress Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Muller, Silvia Melitta: Computer architecture: complexity and correctness; with 185 tablesl Silvia M. Muller; Wolfgang J. Paul. - Berlin; Heidelberg; New York; Barcelona; Hong Kong; London; Milan; Paris; Singapore; Tokyo: Springer, 2000 ACM Subject Classification (1998): B, C ISBN 978-3-642-08691-5 ISBN 978-3-662-04267-0 (ebook) DOI 10.1007/978-3-662-04267-0 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag is a company in the BertelsmannSpringer publishing group. Springer-Verlag Berlin Heidelberg 2000 Originally published by Springer-Verlag Berlin Heidelberg New York in 2000. Softcover reprint of the hardcover 1st edition 2000 The use of general descriptive names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by the authors Design: design + production GmbH, Heidelberg Printed on acid-free paper SPIN 10769135 06/3l42SR - 5432 10

Preface I N THIS BOOK we develop at the gate level the complete design of a pipelined RISe processor with delayed branch, forwarding, hardware interlock, precise maskable nested interrupts, caches, and a fully IEEEcompliant floating point unit. The design is completely modular. This permits us to give rigorous correctness proofs for almost every part of the design. Also, because we can compute gate counts and gate delays, we can formally analyze the cost effectiveness of all parts of the design. Acknowledgments This book owes much to the work of the following students and postdocs: P. Dell, G. Even, N. Gerteis, C. Jacobi, D. Knuth, D. Kroening, H. Leister, P.-M. Seidel. March 2000 Silvia M. Mueller Wolfgang 1. Paul

Contents 1 Introduction 1 2 Basics 7 2.1 Hardware Model 7 2.1.1 Components 7 2.1.2 Cycle Times 9 2.1.3 Hierarchical Designs 10 2.1.4 Notations for Delay Formulae 10 2.2 Number Representations and Basic Circuits 12 2.2.1 Natural Numbers 12 2.2.2 Integers... 14 2.3 Basic Circuits........ 17 2.3.1 Trivial Constructions 17 2.3.2 Testing for Zero or Equality 19 2.3.3 Decoders... 19 2.3.4 Leading Zero Counter 21 2.4 Arithmetic Circuits... 22 2.4.1 Carry Chain Adders 22 2.4.2 Conditional Sum Adders 24 2.4.3 Parallel Prefix Computation 27 2.4.4 Carry Lookahead Adders. 28 2.4.5 Arithmetic Units 30 2.4.6 Shifter... 31

Table of contents viii 2.5 Multipliers... 2.5.1 School Method.. 2.5.2 Carry Save Adders 2.5.3 Multiplication Arrays. 2.5.4 4/2-Trees....... 2.6 2.7 2.8 2.5.5 Multipliers with Booth Recoding 2.5.6 Cost and Delay of the Booth Multiplier Control Automata........ 2.6.1 Finite State Transducers 2.6.2 Coding the State.... 2.6.3 Generating the Outputs. 2.6.4 Computing the Next State 2.6.5 Moore Automata..... 2.6.6 Precomputing the Control Signals 2.6.7 Mealy Automata.... 2.6.8 Interaction with the Data Paths.. Selected References and Further Reading Exercises.... 3 A Sequential DLX Design 3.1 Instruction Set Architecture. 3.1.1 Instruction Formats. 3.1.2 Instruction Set Coding 3.1.3 Memory Organization 3.2 High Level Data Paths.... 3.3 Environments.... 3.3.1 General Purpose Register File 3.3.2 Instruction Register Environment 3.3.3 PC Environment... 3.3.4 ALU Environment..... 3.3.5 Memory Environment... 3.3.6 Shifter Environment SHenv 3.3.7 Shifter Environment SH4Lenv 3.4 Sequential Control.... 3.4.1 Sequential Control without Stalling 3.4.2 Parameters of the Control Automaton 3.4.3 A Simple Stall Engine 3.5 Hardware Cost and Cycle Time. 3.5.1 Hardware Cost..... 3.5.2 Cycle Time..... 3.6 Selected References and Further Reading 34 34 35 36 37 42 47 50 50 51 51 52 54 55 56 58 61 61 63 63 64 64 68 69 71 71 73 74 75 78 81 85 88 88 95 97 99 99 100 104

4 Basic Pipelining 4.1 Delayed Branch and Delayed PC 4.2 Prepared Sequential Machines... 4.2.1 Prepared DLX Data Paths 4.3 4.4 4.5 4.6 4.7 4.8 4.2.2 FSD for the Prepared Data Paths. 4.2.3 Precomputed Control. 4.2.4 A Basic Observation Pipelining as a Transformation 4.3.1 Correctness... 4.3.2 Hardware Cost and Cycle Time Result Forwarding...... 4.4.1 Valid Flags..... 4.4.2 3-Stage Forwarding. 4.4.3 Correctness. Hardware Interlock..... 4.5.1 Stall Engine... 4.5.2 Scheduling Function 4.5.3 Simulation Theorem Cost Performance Analysis. 4.6.1 Hardware Cost and Cycle Time 4.6.2 Performance Model....... 4.6.3 Delay Slots of Branch/Jump Instructions 4.6.4 CPI Ratio of the DLX Designs.. 4.6.5 Design Evaluation....... Selected References and Further Reading Exercises... 105 107 111 114 120 122 128 130 131 139 143 144 145 148 151 151 154 157 159 159 160 162 163 166 168 169 Table of contents 5 Interrupt Handling 5.1 Attempting a Rigorous Treatment of Interrupts 5.2 Extended Instruction Set Architecture..... 171 171 174 5.3 Interrupt Service Routines For Nested Interrupts. 177 5.4 Admissible Interrupt Service Routines 180 5.4.1 Set of Constraints............. 180 5.4.2 Bracket Structures... 181 5.4.3 Properties of Admissible Interrupt Service Routines 182 5.5 Interrupt Hardware..... 190 5.5.1 Environment PCenv... 191 5.5.2 Circuit Daddr........... 193 5.5.3 Register File Environment RFenv 194 5.5.4 Modified Data Paths.... 198 5.5.5 Cause Environment CAenv. 5.5.6 Control Unit..... 202 204 ix

Table of contents 5.6 Pipelined Interrupt Hardware.... 214 5.6.1 PC Environment.,.... 214 5.6.2 Forwarding and Interlocking 216 5.6.3 Stall Engine......... 220 5.6.4 Cost and Delay of the DLXn Hardware 225 5.7 Correctness of the Interrupt Hardware.. 227 5.8 Selected References and Further Reading 235 5.9 Exercises... 236 6 Memory System Design 239 6.1 A Monolithic Memory Design. '. 239 6.1.1 The Limits of On-chip RAM. 240 6.1.2 A Synchronous Bus Protocol. 241 6.1.3 Sequential DLX with Off-Chip Main Memory 245 6.2 The Memory Hierarchy...... 253 6.2.1 The Principle of Locality...... 254 6.2.2 The Principles of Caches...... 255 6.2.3 Execution of Memory Transactions 263 6.3 A Cache Design............... 265 6.3.1 Design of a Direct Mapped Cache. 266 6.3.2 Design of a Set Associative Cache. 268 6.3.3 Design of a Cache Interface 276 6.4 Sequential DLX with Cache Memory 280 6.4.1 Changes in the DLX Design.. 280 6.4.2 Variations of the Cache Design. 290 6.5 Pipelined DLX with Cache Memory.. 299 6.5.1 Changes in the DLX Data Paths 300 6.5.2 Memory Control........ 304 6.5.3 Design Evaluation....... 309 6.6 Selected References and Further Reading 314 6.7 Exercises... 314 x 7 IEEE Floating Point Standard and Theory of Rounding 317 7.1 Number Formats... 317../. 7.1.1 Bmary Fractions....... 317 7.1.2 Two's Complement Fractions 318 7.1.3 Biased Integer Format.... 318 7.1.4 IEEE Floating Point Numbers 320 7.1.5 Geometry of Representable Numbers 321 7.1.6 Convention on Notation 322 7.2 Rounding........ 323 7.2.1 Rounding Modes.... 323

7.2.2 Two Central Concepts... 325 7.2.3 Factorings and Normalization Shifts. 325 7.2.4 Algebra of Rounding and Sticky Bits 326 7.2.5 Rounding with Unlimited Exponent Range 330 7.2.6 Decomposition Theorem for Rounding 330 7.2.7 Rounding Algorithms 335 7.3 Exceptions..... 335 7.3.1 Overflow... 336 7.3.2 Underflow... 336 7.3.3 Wrapped Exponents 338 7.3.4 Inexact Result.... 341 7.4 Arithmetic on Special Operands 341 7.4.1 Operations with NaNs 342 7.4.2 Addition and Subtraction. 343 7.4.3 Multiplication. 344 7.4.4 Division... 344 7.4.5 Comparison..... 345 7.4.6 Format Conversions 347 7.5 Selected References and Further Reading 349 7.6 Exercises... 349 8 Floating Point Algorithms and Data Paths 351 8.1 Unpacking... 354 8.2 Addition and Subtraction.. 359 8.2.1 Addition Algorithm. 359 8.2.2 Adder Circuitry... 360 8.3 Multiplication and Division. 372 8.3.1 Newton-Raphson Iteration 373 8.3.2 Initial Approximation.. 375 8.3.3 Newton-Raphson Iteration with Finite Precision. 377 8.3.4 Table Size versus Number of Iterations... 379 8.3.5 Computing the Representative of the Quotient. 380 8.3.6 Multiplier and Divider Circuits. 381 8.4 Floating Point Rounder....... 390 8.4.1 Specification and Overview.. 391 8.4.2 Normalization Shift....... 394 8.4.3 Selection of the Representative. 405 8.4.4 Significand Rounding 406 8.4.5 Post Normalization.. 407 8.4.6 Exponent Adjustment 408 8.4.7 Exponent Rounding 409 8.4.8 Circuit SPEcFPRND 410 Table of contents xi

Table of contents xii 8.5 Circuit FCon.... 8.5.1 Floating Point Condition Test 8.5.2 Absolute Value and Negation. 8.5.3 IEEE Floating Point Exceptions 8.6 Format Conversion........... 8.6.1 Specification of the Conversions 8.6.2 Implementation of the Conversions 8.7 Evaluation of the FPU Design.... 8.8 Selected References and Further Reading 8.9 Exercises.... 9 Pipelined DLX Machine with Floating Point Core 9.1 Extended Instruction Set Architecture 9.1.1 FPU Register Set.. 9.1.2 Interrupt Causes... 9.1.3 FPU Instruction Set.. 9.2 Data Paths without Forwarding 9.2.1 Instruction Decode 9.2.2 Memory Stage.. 9.2.3 Write Back Stage. 9.2.4 Execute Stage... 9.3 Control of the Prepared Sequential Design 9.3.1 Precomputed Control without Division 9.3.2 Supporting Divisions... 9.4 Pipelined DLX Design with FPU.. 9.4.1 PC Environment...... 9.4.2 Forwarding and Interlocking 9.4.3 Stall Engine.... 9.4.4 Cost and Delay of the Control 9.4.5 Simulation Theorem..... 9.5 Evaluation... 9.5.1 Hardware Cost and Cycle Time 9.5.2 Variation of the Cache Size. 9.6 Exercises....... A DLX Instruction Set Architecture Al DLX Fixed-Point Core: FXU. AI.I Instruction Formats.. Al.2 Instruction Set Coding A2 Floating-Point Extension.. A2.1 FPU Register Set.. A2.2 FPU Instruction Set. 412 414 417 418 418 419 423 432 435 436 439 441 441 443 444 445 448 451 455 461 470 474 479 485 485 486 498 503 507 508 508 511 516 519 519 520 521 521 521 522

B Specification of the FDLX Design 527 B.I RTL Instructions of the FDLX 527 Bol.1 Stage IF 0 527 Bol.2 Stage ID 527 B.l.3 Stage EX 529 Bol.4 StageM 0 532 B.l.5 StageWB 0 534 B.2 Control Automata of the FDLX Design 534 Bo201 Automaton Controlling Stage ID 0 535 B.202 Precomputed Control 0 0 0 0 0 0 0 536 Table of contents Bibliography 543 Index 549 xiii