Computer Organization & Assembly Language Programming



1 Computer Organization & Assembly Language Programming
CSE 2312 (Fall 2011)
Lecture 5: Memory
Junzhou Huang, Ph.D.
Department of Computer Science and Engineering
Fall 2011 CSE 2312 Computer Organization & Assembly Language Programming 1

2 Reviewing (1): CPU
Figure: the organization of a simple computer with one CPU and two I/O devices.

3 Reviewing (2): Instruction Execution Steps
The fetch-decode-execute cycle is central to the operation of all computers:
  1. Fetch the next instruction from memory into the instruction register.
  2. Change the program counter to point to the following instruction.
  3. Determine the type of the instruction just fetched.
  4. If the instruction uses a word in memory, determine where it is.
  5. Fetch the word, if needed, into a CPU register.
  6. Execute the instruction.
  7. Go to step 1 to begin executing the following instruction.
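The fetch-decode-execute steps above can be sketched as a small interpreter loop. This is only an illustration under stated assumptions: the accumulator machine, its four opcodes (LOAD, ADD, STORE, HALT), and the tuple encoding of instructions are hypothetical and are not taken from the lecture.

```python
def run(memory, pc=0):
    """Run a toy accumulator machine (hypothetical instruction set)."""
    acc = 0
    while True:
        instr = memory[pc]          # step 1: fetch the next instruction
        pc += 1                     # step 2: advance the program counter
        opcode, addr = instr        # step 3: determine the instruction type
        operand = None
        if opcode in ("LOAD", "ADD"):
            operand = memory[addr]  # steps 4-5: locate and fetch the memory word
        # step 6: execute the instruction
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # step 7: loop back to fetch the following instruction


# Instructions live in cells 0-3, data in cells 4-6.
program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
print(run(program), program[6])
```

Instructions and data share one memory list here, which mirrors the slide's point that both the instruction and its operand word are fetched from memory.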

4 Reviewing (3): Interpreting Instructions
Interpreter
  A program that fetches, examines, and executes the instructions of another program.
  A program can be written to imitate the function of a CPU.
  Main advantage: a simple processor can be designed, with the complexity largely confined to the memory holding the interpreter.
Benefits (simple computer with interpreted instructions)
  The ability to fix incorrectly implemented instructions or make up for design deficiencies in the basic hardware.
  The opportunity to add new instructions at minimal cost, even after delivery of the machine.
  Structured design that permits efficient development, testing, and documentation of complex instructions.

5 Reviewing (4): Design Principles
Instructions are directly executed by hardware
  Eliminating a level of interpretation provides high speed for most instructions.
  Interpretation is acceptable only for less frequently occurring instructions.
Maximize the rate at which instructions are issued
  Parallelism can play a major role in improving performance.
Instructions should be easy to decode
  A critical limit on the rate of issue is decoding individual instructions to determine what resources they need.
  The fewer different instruction formats, the better.
Only loads and stores should reference memory
  Accessing memory can take a long time.
  All other instructions should operate only on registers.
Provide plenty of registers
  Running out of registers forces values to be flushed back to memory.
  Memory accesses are slow.

6 Reviewing (5): Instruction-Level Parallelism
Figure: a five-stage pipeline, and the state of each stage as a function of time. Nine clock cycles are illustrated.

7 Reviewing (6): Pipelining
A five-stage pipeline
  Suppose the cycle time is 2 ns. An instruction takes 10 ns to progress all the way through the five-stage pipeline.
  So does the machine run at 100 MIPS? No. One instruction completes every 2 ns cycle, so the actual rate is 500 MIPS.
Pipelining allows a tradeoff between latency and processor bandwidth
  Latency: how long it takes to execute an instruction.
  Processor bandwidth: how many MIPS the CPU delivers.
Example
  Suppose a complex instruction takes 10 ns. Under perfect conditions, how many pipeline stages are needed to reach 500 MIPS?
  Time per stage: 1 / (500 MIPS) = 2 ns.
  Number of stages: 10 ns / 2 ns = 5.
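The pipeline arithmetic above can be checked with a few lines of Python. This is a sketch under the slide's own idealization (one instruction completes per cycle, no stalls); the function names are made up for illustration.

```python
import math


def throughput_mips(cycle_ns: float) -> float:
    """Instructions per second, in millions, if one instruction completes per cycle."""
    # 10^9 ns per second / cycle_ns cycles, divided by 10^6 for "millions".
    return 1000.0 / cycle_ns


def stages_needed(latency_ns: float, target_mips: float) -> int:
    """Pipeline stages required so that each stage fits in one target cycle."""
    cycle_ns = 1000.0 / target_mips
    return math.ceil(latency_ns / cycle_ns)


print(throughput_mips(10.0))       # rate if each instruction took a full 10 ns slot
print(throughput_mips(2.0))        # rate with a 2 ns cycle
print(stages_needed(10.0, 500.0))  # stages for a 10 ns instruction at 500 MIPS
```

The latency of a single instruction stays at 10 ns in both cases; only the completion rate changes.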


8 Memory
What is memory
  The part of the computer used to store data and programs.
Basic unit of memory: the bit
  A bit contains a 0 or a 1.
  It is the simplest possible unit.
Cell (memories consist of a number of cells)
  Cell: the smallest addressable unit.
  Each cell can store a piece of information.
  Each cell has a number called its address.
  A cell with k bits can hold one of 2^k different bit combinations.
  A memory with n cells has addresses 0 to n-1.
  Adjacent cells have consecutive addresses (by definition).
  If an address has m bits, the maximum number of cells addressable is 2^m.
  The number of bits in the address determines the maximum number of directly addressable cells in the memory and is independent of the number of bits per cell.

9 Memory Addresses (1)
Figure: three ways of organizing a 96-bit memory.
Question: How many bits are sufficient for an address to reference the memory of Fig. (a), (b), (c)?
Answer: 4, 3, 3
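The answer above follows from the rule that m address bits can reference at most 2^m cells. The sketch below assumes the three organizations in the figure are 12 cells of 8 bits, 8 cells of 12 bits, and 6 cells of 16 bits; the figure is not reproduced in the text, so these sizes are an assumption, chosen to be consistent with the stated answers.

```python
def address_bits(num_cells: int) -> int:
    """Smallest m such that 2**m >= num_cells."""
    if num_cells < 1:
        raise ValueError("a memory needs at least one cell")
    return (num_cells - 1).bit_length()


# Assumed organizations of a 96-bit memory: (cells, bits per cell).
for cells, bits_per_cell in [(12, 8), (8, 12), (6, 16)]:
    assert cells * bits_per_cell == 96
    print(cells, "cells of", bits_per_cell, "bits ->", address_bits(cells), "address bits")
```

The cell width never enters the calculation, which is the independence property stated on the previous slide.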

10 Memory Addresses (2)
Table: number of bits per cell for some historically interesting commercial computers.
Recent practice: all computer manufacturers have standardized on an 8-bit cell
  Byte: 8 bits
Word
  A group of bytes.
  Most instructions operate on entire words.
  32-bit word: 4 bytes per word
  64-bit word: 8 bytes per word
  32-bit machines need 32-bit registers.
  64-bit machines need 64-bit registers.

11 Byte Ordering (1)
(a) Big endian memory: SPARC, IBM mainframes
(b) Little endian memory: Intel family
Byte order
  Bytes within a word are numbered left-to-right (big endian) or right-to-left (little endian).
  In both cases, a 32-bit integer is represented right-justified, with zeros filling the leftmost bits.
  For example, 6 is represented as 110 in the rightmost 3 bits; the other 29 bits to the left are 0.
Problem: data that mixes integers, strings, and other types
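As a small illustration of the two byte orders, Python's standard `struct` module can lay out the 32-bit integer 6 both ways. The value 6 comes from the slide; the variable names are arbitrary.

```python
import struct

value = 6

# The 32-bit pattern is the same on both kinds of machine: 110 preceded by 29 zeros.
print(format(value, "032b"))

big = struct.pack(">I", value)     # big endian: most significant byte at the lowest address
little = struct.pack("<I", value)  # little endian: least significant byte at the lowest address

print(list(big))     # bytes in increasing address order
print(list(little))
```

Only the order of the bytes in memory differs; the integer value and its bit pattern within the word are identical.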

12 Byte Ordering (2)
Figure:
  (a) A personal record for a big endian machine
  (b) The same record for a little endian machine
  (c) The result of transferring from big endian to little endian
  (d) The result of byte-swapping
There is no simple solution.
Possible solution: include a header in front of each data item indicating its data type and how long it is.
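The transfer problem described above can be reproduced in a few lines. This is a sketch with a made-up record (a 4-character name field followed by a 32-bit integer); the field contents are illustrative and are not claimed to match the figure.

```python
import struct

# Record as laid out by a big endian machine: 4 characters, then a 32-bit integer.
record = b"JIM " + struct.pack(">I", 21)

# Case (c): bytes are copied one at a time, so a little endian reader
# sees the string intact but misreads the integer.
name_c = record[:4]
age_c = struct.unpack("<I", record[4:8])[0]

# Case (d): swapping the bytes of every 4-byte word repairs the integer
# but reverses the characters of the string.
swapped = b"".join(record[i:i + 4][::-1] for i in range(0, len(record), 4))
name_d = swapped[:4]
age_d = struct.unpack("<I", swapped[4:8])[0]

print(name_c, age_c)
print(name_d, age_d)
```

Neither blanket policy works for a record that mixes strings and integers, which is why the slide suggests a per-item header describing type and length.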

13 Error Correcting Codes
Why error-correcting codes
  Computer memories can make errors due to voltage spikes or other causes.
  Error-correcting codes are used to guard against such errors.
Codeword
  An n-bit unit containing m data bits and r check bits, where n = m + r.
Hamming distance between two codewords
  Defined as the number of bit positions in which the two codewords differ.
  If the Hamming distance between two codewords is d, it takes d single-bit errors to convert one into the other.
  Example: What is the Hamming distance between ... and ...?
Error-correcting properties
  To detect d single-bit errors, a code needs a minimum distance of d+1.
  Why? With that distance, no d single-bit errors can change a valid codeword into another valid codeword.
  To correct d single-bit errors, a code needs a minimum distance of 2d+1.
  With that distance, a codeword hit by d errors is still closer to the original than to any other valid codeword.
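The definitions above translate directly into code. The sketch below is illustrative: the sample codewords and the four-word code are chosen for the example and are not the ones from the slide, whose example codewords are not preserved in the text.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code) -> int:
    """Smallest Hamming distance between any two distinct codewords of a code."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


print(hamming_distance("10001001", "10110001"))  # illustrative pair

# Illustrative code with four valid codewords.
code = ["0000000000", "0000011111", "1111100000", "1111111111"]
d_min = minimum_distance(code)
print("detects up to", d_min - 1, "errors; corrects up to", (d_min - 1) // 2)
```

The last line applies the two rules from the slide in reverse: a minimum distance of d+1 detects d errors, and a minimum distance of 2d+1 corrects d errors.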

14 Number of Check Bits (Single-bit Error)
Table: number of check bits for a code that can correct a single error.
How to compute the number of check bits
  Each of the 2^m valid codewords needs n + 1 bit patterns dedicated to it: itself plus the n codewords at distance 1.
  With n = m + r, this gives the requirement (m + r + 1) <= 2^r.
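The inequality above can be solved for the smallest r by simple search. This is a minimal sketch; the function name and the list of word sizes are chosen for illustration.

```python
def check_bits(m: int) -> int:
    """Smallest r satisfying m + r + 1 <= 2**r for an m-bit data word."""
    r = 0
    while m + r + 1 > 2 ** r:
        r += 1
    return r


for m in (8, 16, 32, 64):
    r = check_bits(m)
    print(f"word size {m:3d}: {r} check bits, total {m + r}")
```

Doubling the word size adds only one check bit in this range, so the relative overhead shrinks as words get wider.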

15 Example for Checking Bits
Figure: (a) Encoding of 1100 (b) Even parity added (c) Error in AC
Example
  Encode the 4-bit memory word 1100 in the regions AB, ABC, AC, and BC, one bit per region.
  Add a parity bit to each of the three empty regions to produce even parity.
  Suppose the bit in AC is in error. The computer will see that A and C have the wrong parity.
  The only single-bit change that corrects both is to restore AC to 0.

16 Parity Bits
Hamming algorithm
  Constructs error-correcting codes for memory words of any size.
  In a Hamming code, r parity bits are added to an m-bit word, forming a new word with m+r bits.
  The bits are numbered starting at 1, with bit 1 the leftmost bit.
  All bits whose bit number is a power of 2 are parity bits; all the rest are data bits.
  Each parity bit checks specific bit positions.
  A parity bit is set so that the total number of 1s in the positions it checks is even (or odd).
  Bit b is checked by those parity bits b_1, b_2, ..., b_j such that b_1 + b_2 + ... + b_j = b.
Exercise
  Which bits will be parity bits? Answer: 1, 2, 4, 8, 16, 32, 64, ...
  Which bits check bit 5? Answer: 1 and 4
  Which bits check bit 6? Answer: 2 and 4
  Which bits check bit 7? Answer: 1, 2, and 4
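The rule that bit b is checked by the parity bits summing to b is just the binary expansion of b. A short sketch (the function name is made up for illustration) reproduces the exercise answers:

```python
def parity_bits_checking(b: int) -> list:
    """Parity-bit positions (powers of 2) whose sum equals bit position b."""
    return [1 << i for i in range(b.bit_length()) if b & (1 << i)]


for b in (5, 6, 7):
    print("bit", b, "is checked by", parity_bits_checking(b))
```

A parity position such as 4 returns only itself, which matches the convention that a parity bit is included in the group it checks.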

17 Example
Figure: construction of the Hamming code for the memory word by adding 5 check bits to the 16 data bits.
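The construction can be sketched in Python following the rules stated earlier: positions numbered from 1 at the left, parity bits at the powers of 2, even parity. The 16-bit sample word below is an illustrative choice, since the memory word from the figure is not preserved in the text; the function names are hypothetical.

```python
def hamming_encode(data_bits: str) -> str:
    """Insert even-parity check bits at positions 1, 2, 4, 8, ... (1-based, leftmost first)."""
    m = len(data_bits)
    r = 0
    while m + r + 1 > 2 ** r:  # smallest r with m + r + 1 <= 2**r
        r += 1
    n = m + r
    code = [0] * (n + 1)       # index 0 unused so list indices match bit numbers
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):    # not a power of 2, so this is a data position
            code[pos] = int(next(data))
    for i in range(r):
        p = 1 << i
        # Parity bit p covers every position whose binary expansion contains p.
        # code[p] is still 0 here, so this counts the covered data bits only.
        ones = sum(code[pos] for pos in range(1, n + 1) if pos & p)
        code[p] = ones % 2
    return "".join(str(bit) for bit in code[1:])


def syndrome(codeword: str) -> int:
    """Sum of the parity positions whose check fails; 0 means no single-bit error."""
    n = len(codeword)
    total, p = 0, 1
    while p <= n:
        ones = sum(int(codeword[pos - 1]) for pos in range(1, n + 1) if pos & p)
        if ones % 2:
            total += p
        p <<= 1
    return total


word = "1111000010101110"      # illustrative 16-bit word
encoded = hamming_encode(word)
print(encoded, len(encoded))

# Flip bit 5 and locate the error from the failing parity checks.
damaged = encoded[:4] + ("1" if encoded[4] == "0" else "0") + encoded[5:]
print(syndrome(encoded), syndrome(damaged))
```

The syndrome of a damaged word equals the position of the flipped bit, because exactly the parity bits that sum to that position see the wrong parity.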

18 Imbalance Between CPU and Memory
CPUs have historically been faster than memory
  As it became possible to put more circuits on a chip, CPU designers used them to make CPUs faster, while memory designers used them to increase capacity.
  So after the CPU issues a memory request, it does not get the requested word for many CPU cycles.
  Two solutions:
    1) Hardware: continue executing, and stall the CPU if an instruction needs the word before it has arrived.
    2) Software: the compiler is forced to insert NOP (no operation) instructions.
  The performance degradation is the same in both cases.
Technology problem? No, an economics problem
  Engineers can build memories as fast as CPUs.
  But such memory has to be located on the CPU chip, which makes the CPU chip bigger and more expensive.
Practical solution
  Combine a small amount of fast memory (a cache) with a large amount of slow memory.
  The most heavily used memory words are kept in the cache.

19 Cache Memory
Figure: the cache is logically between the CPU and main memory. Physically, there are several possible places it could be located.
Locality principle: when a word is referenced, it and some of its neighbors are brought from the large slow memory into the cache.
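The effect of bringing in neighbors can be seen in a toy simulation. Everything here is an assumption made for illustration: a direct-mapped cache, 16-byte lines, and only 4 lines; the lecture does not specify a cache organization at this point.

```python
def count_hits(addresses, line_size=16, num_lines=4):
    """Count cache hits for a sequence of byte addresses in a toy direct-mapped cache."""
    lines = [None] * num_lines       # which memory block each cache line currently holds
    hits = 0
    for addr in addresses:
        block = addr // line_size    # the referenced byte and its neighbors share a block
        slot = block % num_lines     # direct mapping: each block has exactly one possible slot
        if lines[slot] == block:
            hits += 1
        else:
            lines[slot] = block      # miss: load the whole block from slow memory
    return hits


sequential = range(64)               # 64 references to consecutive bytes
strided = range(0, 1024, 16)         # 64 references, each in a different block

print(count_hits(sequential), "hits out of", len(sequential))
print(count_hits(strided), "hits out of", len(strided))
```

Sequential access misses once per block and then hits on the neighbors that were loaded with it, while the strided pattern never reuses a loaded block.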

20 Cache Design
Cache design is important for high-performance CPUs. Several issues arise:
Cache size
  The bigger the cache, the better it performs, but the more it costs.
Size of the cache line
  A 16-KB cache can be divided into 1024 lines of 16 bytes, 2048 lines of 8 bytes, or other combinations.
Cache organization
  How does the cache keep track of which memory words are currently being held?
Unified or split cache
  Are instructions and data kept in the same cache or in different ones?
  A unified cache (instructions and data use the same cache) is a simpler design and automatically balances instruction fetches against data fetches.
  A split cache (instructions in one cache, data in the other) allows parallel accesses; a unified one does not.
  Also, because instructions are not modified during execution, the contents of the instruction cache never have to be written back to memory.
Number of caches
  Chips may have a primary cache on chip, a secondary cache off chip but in the same package as the CPU chip, and a third cache still further away.

21 Memory Packaging
SIMM: Single Inline Memory Module
DIMM: Dual Inline Memory Module
Figure: a single inline memory module (SIMM) holding 256 MB. Two of the chips control the SIMM.
