
CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010

Computer Systems Organization The CPU (Central Processing Unit) is the brain of the computer. It fetches instructions from main memory, examines them, and then executes them one after another. The components are connected by a bus, a collection of parallel wires for transmitting address, data, and control signals. Buses can be external to the CPU, connecting it to memory and I/O devices, but also internal to the CPU.

Processors The CPU is composed of several distinct parts. The control unit fetches instructions from main memory and determines their type. The arithmetic logic unit (ALU) performs operations such as addition and boolean AND needed to carry out the instructions. A small, high-speed memory is made up of registers, each of which has a certain size and function. The most important register is the Program Counter (PC), which points to the next instruction to be fetched. The Instruction Register (IR) holds the instruction currently being executed.

CPU Organization An important part of the organization of a computer is called the data path. It consists of the registers, the ALU, and several buses connecting the pieces. The ALU performs simple operations on its inputs, yielding a result in the output register. Later, the contents of the output register can be stored into memory, if desired. Most instructions can be divided into two categories. Register-memory instructions allow memory words to be fetched into registers, where they can, for example, be used as inputs in subsequent instructions.

CPU Organization Register-register instructions fetch two operands from the registers, bring them into the ALU input registers, perform an operation, and store the result back in a register. The process of running two operands through the ALU and storing the result is called the data path cycle. The faster the data path cycle, the faster the machine.

A von Neumann Machine The data path of a typical von Neumann machine.

Instruction Execution The CPU executes each instruction as a series of small steps: 1. Fetch the next instruction from memory into the IR. 2. Change the PC to point to the following instruction. 3. Determine the type of instruction fetched. 4. If the instruction uses a word in memory, determine where it is. 5. Fetch the word into a CPU register. 6. Execute the instruction. 7. Go to step 1 to execute the next instruction.

Instruction Execution A program that fetches, examines, and executes the instructions of another program is called an interpreter. Interpretation of instructions (as opposed to direct hardware implementation) has several benefits: incorrectly implemented instructions can be fixed in the field; new instructions can be added at minimal cost; and the structured design permits efficient development, testing, and documenting of complex instructions.

Interpreter (1) An interpreter for a simple computer (in Java).

Interpreter (2) An interpreter for a simple computer (in Java).
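The Java listings in those figures are not reproduced in this transcription. As a rough stand-in, here is a minimal fetch-decode-execute loop in the same spirit, for an invented one-accumulator machine; the opcodes, instruction format, and sample program are made up for illustration and are not the book's example machine.

// TinyInterpreter.java: a minimal sketch of an instruction interpreter.
// The machine is hypothetical: one accumulator, 256-word memory, and
// instructions packed as (opcode << 8) | address.
public class TinyInterpreter {
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    public static void main(String[] args) {
        int[] memory = new int[256];
        // Sample program: mem[102] = mem[100] + mem[101], then halt.
        memory[0] = LOAD  << 8 | 100;
        memory[1] = ADD   << 8 | 101;
        memory[2] = STORE << 8 | 102;
        memory[3] = HALT  << 8;
        memory[100] = 7;
        memory[101] = 35;

        int pc = 0;   // Program Counter: next instruction to fetch
        int ac = 0;   // accumulator: the machine's single data register
        boolean running = true;
        while (running) {
            int ir = memory[pc];   // step 1: fetch the instruction into the IR
            pc++;                  // step 2: advance the PC
            int opcode = ir >> 8;  // step 3: determine the instruction type
            int addr = ir & 0xFF;  // step 4: locate the memory word it uses
            switch (opcode) {      // steps 5 and 6: fetch the operand, execute
                case LOAD:  ac = memory[addr];  break;
                case ADD:   ac += memory[addr]; break;
                case STORE: memory[addr] = ac;  break;
                case HALT:  running = false;    break;
                default: throw new IllegalStateException("bad opcode: " + opcode);
            }
        }                          // step 7: loop back for the next instruction
        System.out.println("mem[102] = " + memory[102]); // prints 42
    }
}

The loop body follows the seven execution steps listed two slides back, which is exactly why such interpreters are straightforward to write, test, and fix in the field.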

Instruction Execution By the late 1970s, the use of simple processors running interpreters was widespread. The interpreters were held in fast read-only memories called control stores. In 1980, a group at Berkeley began designing VLSI CPU chips that did not use interpretation. They used the term RISC for this concept: Reduced Instruction Set Computer, as contrasted with CISC (Complex Instruction Set Computer).

The RISC Design Principles Certain RISC design principles have now been generally accepted as good practice: All instructions are executed directly by hardware. Maximize the rate at which instructions are issued. Use parallelism to execute multiple slow instructions in a short time period. Instructions should be easy to decode. Only loads and stores should reference memory, since memory access time is unpredictable and makes parallelism difficult. Provide plenty of registers, since accessing memory is slow.

Instruction-Level Parallelism Parallelism comes in two varieties. Instruction-level parallelism exploits parallelism within individual instructions to get more instructions executed per second. Processor-level parallelism allows multiple CPUs to work together on a problem. Fetching instructions from memory is a bottleneck, so instructions can be fetched in advance and stored in a prefetch buffer.

Pipelining Prefetching breaks instruction execution into two parts: fetch and execute. Pipelining goes further, breaking an instruction up into many parts, each one handled by a dedicated hardware unit, with the units running in parallel. Each unit is called a stage. After the pipeline is filled, an instruction completes at each time interval, where the interval is the length of the longest stage. This time interval is the clock cycle of the CPU. The time to fill the pipeline is called the latency.

Pipelining (a) A five-stage pipeline. (b) The state of each stage as a function of time. Nine clock cycles are illustrated.
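A worked example with assumed numbers (the slides give none): if each of the five stages takes 2 ns, an instruction needs 5 × 2 = 10 ns to traverse the whole pipeline, so the latency is 10 ns. But once the pipeline is full, one instruction completes every 2 ns, for a throughput of 500 million instructions per second, five times what the same hardware would achieve executing instructions one at a time.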

Superscalar Architectures We can also imagine having multiple pipelines. One possibility is to have multiple equivalent pipelines with a common instruction fetch unit; the Pentium adopted this approach with two pipelines. Complex rules must be used to determine that the two instructions don't conflict, and Pentium-specific compilers produced compatible pairs of instructions. Another approach is to have a single pipeline with multiple functional units. This approach is called superscalar architecture and is used on high-end CPUs (including the Pentium II).

Superscalar Architecture Dual five-stage pipelines with a common instruction fetch unit.

Superscalar Architecture A superscalar processor with five functional units.

Processor-Level Parallelism Instruction-level parallelism speeds up execution by a factor of five or ten. To get speed-ups of 50, 100, or more, we need to use multiple CPUs. Array processors consist of a large number of identical processors that perform the same sequence of instructions on different sets of data. The first array processor was the ILLIAC IV (1972), with an 8x8 array of processors.

Processor-Level Parallelism A vector processor is similar to an array processor, but where the array processor has as many adders as data elements, the vector processor performs the additions in a single, highly pipelined adder. Vector processors use vector registers: sets of conventional registers that can be loaded from memory in a single instruction. Two vectors of elements are then added together in the pipelined adder.

Array Processors An array processor of the ILLIAC IV type.

Multiprocessors The processing elements in an array processor are not independent, since they share a common control unit. A multiprocessor is a system with multiple CPUs sharing a common memory. Multiprocessors can have a single global memory, or a global memory plus local memory for each CPU. Systems with no common memory are called multicomputers; they communicate via a fast network, which may be connected in various topologies. Multicomputers are easier to build but more difficult to program.

Multiprocessors a) A single-bus multiprocessor. b) A multicomputer with local memories.

Reading Read Chapter 2 up to page 69. Next time: memory; read ahead in the rest of Chapter 2.

CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010

Primary Memory The memory is the part of the computer where programs and data are stored. The basic unit of memory is the binary digit, called a bit. A bit may contain a 0 or a 1. Computers use binary arithmetic because it is easy to distinguish between two values of a continuous physical quantity such as voltage or current. Memories consist of a number of cells. Each cell has an address (a number) used to refer to it.

Primary Memory Computers express memory addresses as binary numbers. If an address has m bits, the maximum number of addressable cells is 2^m. A cell is the smallest addressable unit. Nowadays, nearly all manufacturers use an 8-bit cell called a byte. Bytes are grouped into words. A computer with a 32-bit word has 4 bytes/word and 32-bit registers and instructions.
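For example (simple arithmetic, not on the slides): with 16-bit addresses a machine can address at most 2^16 = 65,536 cells, while 32-bit addresses reach 2^32 cells; with 8-bit cells those limits are 64 KB and 4 GB of addressable memory, respectively.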

Memory Organization Three ways of organizing a 96-bit memory.

Memory Organization Number of bits per cell for some historically interesting commercial computers

Byte Ordering The bytes in a word can be ordered from left to right or right to left. The first is called big endian ordering; the second, little endian ordering. The representation of integers is the same in the two schemes, but strings are represented differently. Care must be taken when transferring data among machines with different byte ordering.

Memory Organization (a) Big endian memory (b) Little endian memory

Memory Organization (a) A personal record for a big endian machine. (b) The same record for a little endian machine. (c) The result of transferring from big endian to little endian. (d) The result of byte-swapping (c).
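As a small supplement (not part of the original slides), Java's standard java.nio classes make both layouts easy to inspect: the sketch below writes the same 32-bit integer under each byte order and prints the resulting byte sequences.

// Endian.java: lay out the same integer in big and little endian order.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Endian {
    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] big = ByteBuffer.allocate(4)
                .order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        byte[] little = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        // Big endian stores the high-order byte at the lowest address.
        System.out.printf("big endian:    %02x %02x %02x %02x%n",
                big[0], big[1], big[2], big[3]);             // 01 02 03 04
        // Little endian stores the low-order byte at the lowest address.
        System.out.printf("little endian: %02x %02x %02x %02x%n",
                little[0], little[1], little[2], little[3]); // 04 03 02 01
    }
}

This is also why the record-transfer problem in the figure arises: copying bytes verbatim between machines of opposite endianness preserves strings but scrambles integers, while byte-swapping fixes the integers but scrambles the strings.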

Error-Correcting Codes Occasional errors may occur in computer memories due to voltage spikes or other causes. Errors can be handled by adding extra check bits to words of memory. Suppose a word of memory has m data bits and r check bits, and let the total length be n = m + r. This n-bit unit is often referred to as a codeword. The number of bit positions in which two codewords differ is called the Hamming distance.

Error-Correcting Codes To detect d single-bit errors requires a distance d + 1 code. To correct d single-bit errors requires a distance 2d + 1 code. Consider adding a single parity bit to the data. The bit is chosen so that the number of 1 bits in the codeword is even (or odd). Now a single error results in an invalid codeword. It takes two errors to go from one valid codeword to another.

Error-Correcting Codes Imagine we want to design a code with m data bits and r check bits that will allow all single-bit errors to be corrected. Each of the 2^m legal memory words has n illegal codewords at a distance of 1 from it, formed by inverting each of the n bits in the n-bit codeword. So each of the 2^m legal memory words requires n + 1 bit patterns dedicated to it. This gives (n + 1) * 2^m <= 2^n, and since n = m + r, it follows that (m + r + 1) <= 2^r.
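Plugging in numbers: for m = 16 data bits, r = 5 is the smallest value satisfying m + r + 1 <= 2^r, since 16 + 5 + 1 = 22 <= 32, while r = 4 gives 21 > 16. A 16-bit word therefore needs 5 check bits, which is exactly what the Hamming code later in this lecture uses.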

Error-Correcting Codes Number of check bits for a code that can correct a single error

Error-Correcting Codes The following figure illustrates an error-correcting code for 4-bit words. The three circles form seven regions. Encode the 4-bit word 1100 in four of those regions, then add a parity bit to each of the three empty regions so that the sum of the bits in each circle is an even number. Now suppose that the bit in the AC region goes bad, changing from a 0 to a 1. Circles A and C then have the wrong parity. The only single-bit change that corrects both is to restore AC back to 0, thus correcting the error.

Error-Correcting Codes (a) Encoding of 1100 (b) Even parity added (c) Error in AC

Hamming's Algorithm Hamming's algorithm can be used to construct single error-correcting codes for any memory word size. In a Hamming code, r parity bits are added to an m-bit word, forming a new word of length m + r bits. The bits are numbered starting at 1, not 0, with bit 1 the leftmost (high-order) bit. All bits whose bit number is a power of 2 are parity bits; the rest are used for data. In a 16-bit word, 5 parity bits are added: bits 1, 2, 4, 8, and 16 are parity bits, and the word has 21 bits in total.

Hamming's Algorithm Each parity bit checks specific bit positions; the parity bit is set so that the total number of 1s in the checked positions is even. The positions checked are: Bit 1 checks bits 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21. Bit 2 checks bits 2, 3, 6, 7, 10, 11, 14, 15, 18, 19. Bit 4 checks bits 4, 5, 6, 7, 12, 13, 14, 15, 20, 21. Bit 8 checks bits 8, 9, 10, 11, 12, 13, 14, 15. Bit 16 checks bits 16, 17, 18, 19, 20, 21. In general, each bit b is checked by those parity bits b_1, b_2, ..., b_j such that b_1 + b_2 + ... + b_j = b.

Error-Correcting Codes Construction of the Hamming code for the memory word 1111000010101110 by adding 5 check bits to the 16 data bits.

Hamming's Algorithm Consider what would happen if bit 5 in the word on the previous slide were inverted by a surge on the power line. Bit 5 would then be a 0. The 5 parity bits would be checked, with the following results: Parity bit 1 incorrect (the positions checked contain five 1s). Parity bit 2 correct (the positions checked contain six 1s). Parity bit 4 incorrect (the positions checked contain five 1s). Parity bit 8 correct (the positions checked contain two 1s). Parity bit 16 correct (the positions checked contain four 1s).

Hamming's Algorithm The incorrect bit must be one of the bits checked by both parity bit 1 and parity bit 4: bits 5, 7, 13, 15, or 21. However, parity bit 2 is correct, eliminating 7 and 15. Similarly, parity bit 8 is correct, eliminating 13. Finally, parity bit 16 is correct, eliminating 21. The only bit left is 5, which is the one in error. If all parity bits are correct, there was no error (or more than one). Otherwise, add up all the incorrect parity bits; the sum gives the position of the incorrect bit.
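This procedure mechanizes naturally. The Java sketch below was written for this transcription (it is not the book's code): it places the 16 data bits in the non-power-of-2 positions of a 21-bit codeword, computes the 5 parity bits, and then locates a flipped bit by summing the parity bits that check out wrong.

// Hamming.java: single-error-correcting Hamming code for 16 data bits.
public class Hamming {
    static final int N = 21; // 16 data bits + 5 parity bits

    // Build codeword[1..N]; positions 1, 2, 4, 8, 16 are parity bits.
    static int[] encode(int[] data) {           // data: 16 values, each 0 or 1
        int[] code = new int[N + 1];
        int d = 0;
        for (int pos = 1; pos <= N; pos++)
            if (Integer.bitCount(pos) != 1)     // not a power of 2: data bit
                code[pos] = data[d++];
        for (int p = 1; p <= N; p <<= 1) {      // parity positions 1,2,4,8,16
            int parity = 0;
            for (int pos = 1; pos <= N; pos++)
                if ((pos & p) != 0) parity ^= code[pos];
            code[p] = parity;                   // make each group's parity even
        }
        return code;
    }

    // Return 0 if every parity group is even, else the bad bit's position.
    static int syndrome(int[] code) {
        int bad = 0;
        for (int p = 1; p <= N; p <<= 1) {
            int parity = 0;
            for (int pos = 1; pos <= N; pos++)
                if ((pos & p) != 0) parity ^= code[pos];
            if (parity != 0) bad += p;          // sum the incorrect parity bits
        }
        return bad;
    }

    public static void main(String[] args) {
        int[] data = {1,1,1,1,0,0,0,0,1,0,1,0,1,1,1,0}; // 1111000010101110
        int[] code = encode(data);
        code[5] ^= 1;                           // flip bit 5, as on the slide
        int bad = syndrome(code);
        System.out.println("bad bit: " + bad);  // prints 5
        if (bad != 0) code[bad] ^= 1;           // restore the flipped bit
    }
}

Note that parity bit p is a member of its own group (p & p != 0), so each group's parity covers the parity bit itself, matching the position lists two slides back.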

Cache Memory The cache is logically between the CPU and main memory. Physically, there are several possible places it could be located.

Hit and Miss Ratio The hit ratio, call it h, is the fraction of all references that can be satisfied out of the cache. The miss ratio is then 1 - h. Mean access time = c + (1 - h)*m, where c is the cache access time and m is the main memory access time.
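A worked example with illustrative numbers (not from the slides): if c = 1 ns, m = 40 ns, and h = 0.95, then the mean access time is 1 + 0.05 × 40 = 3 ns. Even though main memory is 40 times slower than the cache, a 95% hit ratio keeps the average within a factor of three of cache speed.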

Principle of Locality Memory references made in any short time interval tend to use only a small fraction of the total memory. When a word is referenced, it and some of its neighbors are brought from the large, slow memory into the cache, so that the next time it is used it can be accessed quickly.

Memory Packaging and Types A single inline memory module (SIMM) holding 256 MB. Two of the chips control the SIMM.

Memory Hierarchies A five-level memory hierarchy.