Summary of Computer Architecture
Summary CHAP 1: INTRODUCTION
Structure - Top Level [Figure: the computer consists of the Central Processing Unit, Main Memory, Input/Output, and Systems Interconnection; it connects to peripherals and communication lines.]
Structure - CPU [Figure: the computer consists of CPU, Memory, I/O, and System Bus; the CPU consists of Registers, Arithmetic and Logic Unit, Control Unit, and Internal CPU Interconnection.]
CPU: The CPU controls the operation of the computer. Components of the CPU: the Control Unit controls the operation of the CPU; the Arithmetic and Logic Unit (ALU) performs data processing functions, e.g. calculation; the Internal CPU Interconnection provides communication between the control unit, registers, and ALU.
Structure - Control Unit [Figure: the CPU consists of ALU, Registers, Internal Bus, and Control Unit; the Control Unit consists of Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.]
Summary CHAP 2: BUS
Bus system: expansion slots (PCI, PCIe, ...)
Function of Control Unit: For each operation a unique code is provided, e.g. ADD, MOVE. A hardware segment accepts the code and issues the control signals. We have a computer!
Fakulti Sains Komputer dan Technology Maklumat (FSKTM), UTHM 9 BIT20303-Computer Architecture
Components: The Control Unit and the Arithmetic and Logic Unit (ALU) constitute the Central Processing Unit (CPU). Data and instructions need to get into the system and results out: input/output. Temporary storage of code and results is needed: main memory.
Computer Components: Top Level View
How is an Instruction Executed? What is an instruction? An instruction specifies the action that the processor is supposed to take. The processing required for a single instruction is called an instruction cycle. An instruction cycle is made up of two steps: Fetch (the processor reads the instruction from memory; also referred to as the fetch cycle) and Execute (also referred to as the execute cycle).
Fetch Cycle: The Program Counter (PC) holds the address of the next instruction to fetch. The processor fetches the instruction from the memory location pointed to by the PC, then increments the PC (unless told otherwise). The instruction is loaded into the Instruction Register (IR). The processor interprets the instruction and performs the required actions.
Execute Cycle: An instruction's execution (execute cycle) may involve one or a combination of these actions. Processor-memory: data transfer between CPU and main memory. Processor-I/O: data transfer between CPU and I/O module. Data processing: some arithmetic or logical operation on data. Control: alteration of the sequence of operations.
Instruction Format: Assume both instructions and data are 16 bits (2 bytes) long. The instruction format provides 4 bits for the opcode, so there can be as many as $2^{4} = 16$ different opcodes, and the remaining 12 bits allow up to $2^{12} = 4096$ words of memory to be directly addressed. [Figure: instruction format and integer format.]
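The split of a 16-bit instruction into a 4-bit opcode and a 12-bit address can be sketched in a few lines of Python. This is only an illustration: the function name `decode` is made up for this example, and the sample value 0x1940 is the first instruction of the program-execution example in these notes.

```python
def decode(instruction: int) -> tuple[int, int]:
    """Split a 16-bit instruction into (opcode, address)."""
    if not 0 <= instruction <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    opcode = instruction >> 12        # top 4 bits: one of 16 opcodes
    address = instruction & 0x0FFF    # low 12 bits: one of 4096 words
    return opcode, address


opcode, address = decode(0x1940)
print(hex(opcode), hex(address))      # 0x1 0x940
```

Because one hexadecimal digit is exactly 4 bits, the opcode is simply the first hex digit and the address is the remaining three.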
What is Word, Half-Word and Double Word? A "word," in computing, is a standard memory size used for data storage. The most popular word sizes for modern computers are 16, 32, or 64 bits. Some systems or programming languages do not declare specific sizes for variables and instead use "word," "half-word" and "double word" to describe how much storage space is being allocated. This means that on a system with a 32-bit word size, declaring a double-word integer declares a 64-bit integer.
Example of Program Execution: Internal CPU registers: PC (Program Counter), AC (Accumulator, a data register), IR (Instruction Register). Program to be executed: add the content of the memory word at address 940 to the content of the memory word at address 941 and store the result in the latter location. (Assume a word = 16 bits = 2 bytes.)
(cont.) Example of Program Execution: Requires 3 fetch and 3 execute cycles. NOTE: The numbers used in this example are in hexadecimal, e.g. 0x1940.
1. {1st fetch cycle} The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the instruction register (IR) and the PC is incremented. Note that this process involves the use of a memory address register (MAR) and a memory buffer register (MBR); for simplicity, these intermediate registers are ignored.
2. {1st execute cycle} The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded. The remaining 12 bits (3 hexadecimal digits) specify the address (940) from which data are to be loaded.
3. {2nd fetch cycle} The next instruction (5941) is fetched from location 301 and the PC is incremented.
4. {2nd execute cycle} The old content of the AC and the content of location 941 are added and the result is stored in the AC.
5. {3rd fetch cycle} The next instruction (2941) is fetched from location 302 and the PC is incremented.
6. {3rd execute cycle} The content of the AC is stored in location 941.
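The six steps can be replayed with a small simulation. This is a minimal sketch, not a model of any real processor: the opcode meanings (1 = load AC, 5 = add to AC, 2 = store AC) are read off the walkthrough, while the function name `run` and the data values placed at 940 and 941 are assumptions made for illustration. MAR and MBR are ignored, as in the walkthrough.

```python
def run(memory: dict[int, int], pc: int, steps: int) -> tuple[int, int]:
    """Run `steps` instruction cycles and return the final (PC, AC)."""
    ac = 0
    for _ in range(steps):
        # Fetch cycle: load IR from the address held in PC, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: first hex digit is the opcode, the rest is the address.
        opcode, address = ir >> 12, ir & 0x0FFF
        if opcode == 0x1:        # load AC from memory
            ac = memory[address]
        elif opcode == 0x5:      # add memory word to AC (16-bit wraparound)
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == 0x2:      # store AC to memory
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac


memory = {
    0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,   # the program
    0x940: 0x0003, 0x941: 0x0002,                  # assumed data values
}
pc, ac = run(memory, pc=0x300, steps=3)
print(hex(pc), hex(ac), hex(memory[0x941]))        # 0x303 0x5 0x5
```

With the assumed data, location 941 ends up holding 3 + 2 = 5 and the PC points at 303, the address after the last instruction.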
Summary CHAP 3: MEMORY
Location: Inside the CPU (e.g. registers). Internal (inside the computer, e.g. RAM, Level 1 or L1 cache, L2 cache, L3 cache). External (outside the computer, e.g. hard disks, SSDs, removable drives).
A Modern Memory Hierarchy (memory abstraction): Register file: 32 words, sub-nsec (manual/compiler register spilling). L1 cache: ~32 KB, ~nsec; L2 cache: 512 KB ~ 1 MB, many nsec; L3 cache, ... (automatic HW cache management). Main memory (DRAM): GB, ~100 nsec. Swap disk: 100 GB, ~10 msec (automatic demand paging).
How to access a memory location? Random (e.g. RAM): individual addresses identify locations exactly. Direct (e.g. hard disk): each block has a unique address; access is by jumping to a specific block plus a sequential search within it. Associative (e.g. cache): data is retrieved based on a portion of its contents rather than its address. Sequential (e.g. tape): access starts from the beginning of the tape; access time depends on the location of the data and the previous location.
RAM: Two types: Static RAM (SRAM) and Dynamic RAM (DRAM).
Memory Technology: DRAM (dynamic random access memory). The capacitor's charge state indicates the stored value: whether the capacitor is charged or discharged indicates storage of 1 or 0. Each cell has 1 capacitor and 1 access transistor. The capacitor leaks through the RC path, so a DRAM cell loses charge over time and needs to be refreshed. [Figure: DRAM cell with row enable and _bitline.]
Memory Technology: SRAM (static random access memory). Two cross-coupled inverters store a single bit; the feedback path enables the stored value to persist in the cell. Each cell has 4 transistors for storage and 2 transistors for access. [Figure: SRAM cell with row select, bitline, and _bitline.]
Memory Hierarchy: Fundamental tradeoff: fast memory is small; large memory is slow. Idea: a memory hierarchy (CPU register file (RF), cache, main memory (DRAM), hard disk) that trades off latency, cost, size, and bandwidth.
Caching Basics: Exploit Temporal Locality. Idea: store recently accessed data in automatically managed fast memory (called a cache). Anticipation: the data will be accessed again soon. Temporal locality principle: recently accessed data will be accessed again in the near future. This is what Maurice Wilkes had in mind: Wilkes, "Slave Memories and Dynamic Storage Allocation," IEEE Trans. on Electronic Computers, 1965: "The use is discussed of a fast core memory of, say, 32000 words as a slave to a slower core memory of, say, one million words in such a way that in practical cases the effective access time is nearer that of the fast memory than that of the slow memory."
Caching Basics: Exploit Spatial Locality. Idea: store the data at addresses adjacent to the recently accessed one in automatically managed fast memory. Logically divide memory into equal-size blocks and fetch the accessed block into the cache in its entirety. Anticipation: nearby data will be accessed soon. Spatial locality principle: data near a recently accessed location will be accessed in the near future, e.g. sequential instruction access, array traversal. This is what the IBM 360/85 implemented: a 16 Kbyte cache with 64-byte blocks. Liptay, "Structural aspects of the System/360 Model 85 II: the cache," IBM Systems Journal, 1968.
The Bookshelf Analogy: Book in your hand; desk; bookshelf; boxes at home; boxes in storage. Recently used books tend to stay on the desk (Comp Arch books, books for classes you are currently taking) until the desk gets full. Adjacent books on the shelf are needed around the same time, if I have organized/categorized my books well on the shelf.
Cache: Cache hits vs. cache misses. Cache types: direct-mapped cache, set-associative cache.
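Hits and misses in a direct-mapped cache can be counted with a short simulation. This is a sketch under stated assumptions: the function name `simulate`, the 4-line cache, and the 4-address block size are illustrative choices, not values from the slides. Each memory block can live in exactly one cache line (block number modulo the number of lines), and a miss fetches the whole block.

```python
def simulate(addresses, num_lines=4, block_size=4):
    """Return (hits, misses) for a direct-mapped cache (sizes are illustrative)."""
    lines = [None] * num_lines        # block number held by each line, or None
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size    # memory block containing this address
        index = block % num_lines     # the only line this block may occupy
        if lines[index] == block:
            hits += 1
        else:
            misses += 1
            lines[index] = block      # fetch the whole block, evicting the old one
    return hits, misses


# Sequential traversal: one miss per block, then hits (spatial locality).
print(simulate(range(16)))            # (12, 4)
# Two blocks that map to the same line keep evicting each other.
print(simulate([0, 16, 0, 16]))       # (0, 4)
```

The second access pattern shows the weakness of direct mapping: a set-associative cache would let both blocks stay resident because each set holds more than one line.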
Summary CHAP 4: INPUT OUTPUT
Input/Output Problems: There is a wide variety of peripherals, delivering different amounts of data, at different speeds, in different formats. All are slower than the CPU and RAM, so I/O modules are needed.
Input/Output Module: Interface to the CPU and memory. Interface to one or more peripherals.
Generic Model of I/O Module
External Devices: Human readable: screen, printer, keyboard. Machine readable: monitoring and control. Communication: modem, Network Interface Card (NIC).
External Device Block Diagram: The control signal determines the function that the device will perform, such as sending data to the I/O module (INPUT or READ) or accepting data from the I/O module (OUTPUT or WRITE). The status signal indicates the state of the device, e.g. busy or idle. Data are transferred according to the control signal, either for READ or WRITE. The buffer temporarily holds the data being transferred between the I/O module and the external environment.
I/O Module Functions: Control and timing; CPU communication; device communication; data buffering; error detection.
Three Techniques for Input of a Block of Data (programmed I/O, interrupt-driven I/O, and direct memory access): What are the differences between these techniques?
Programmed I/O
Programmed I/O: The CPU has direct control over I/O: sensing status, issuing read/write commands, and transferring data. The CPU waits for the I/O module to complete the operation, which wastes CPU time.
Programmed I/O - detail: The CPU requests an I/O operation. The I/O module performs the operation and sets status bits. The CPU checks the status bits periodically. The I/O module does not inform the CPU directly and does not interrupt the CPU. The CPU may wait or come back later.
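The status-checking loop can be sketched as follows. Everything here is hypothetical and for illustration only: the `IOModule` class, its method names, and the idea that the module becomes ready after a fixed number of status checks are assumptions, not a real device interface.

```python
class IOModule:
    """Hypothetical I/O module that becomes ready after a fixed number of status checks."""

    def __init__(self, data: int, busy_checks: int):
        self._data = data
        self._remaining = busy_checks

    def status_ready(self) -> bool:
        # The module only sets its status bit; it never notifies the CPU.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self) -> int:
        return self._data


def programmed_read(module: IOModule) -> tuple[int, int]:
    """Busy-wait on the status bit and return (data, wasted_checks)."""
    wasted = 0
    while not module.status_ready():   # the CPU does no useful work here
        wasted += 1
    return module.read_data(), wasted


data, wasted = programmed_read(IOModule(data=0x41, busy_checks=5))
print(data, wasted)                    # 65 5
```

The `wasted` counter makes the cost visible: every failed status check is CPU time spent on nothing but waiting.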
Interrupt-Driven I/O
Interrupt-Driven I/O Basic Operation: The CPU issues a read command. The I/O module gets data from the peripheral whilst the CPU does other work. The I/O module interrupts the CPU. The CPU requests the data. The I/O module transfers the data.
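The sequence above can be mimicked with a toy simulation in which the CPU keeps doing useful work until the module raises an interrupt. This is a sketch only: the function name, the `device_latency` parameter, and modelling the interrupt as a plain function call are all assumptions made for illustration.

```python
def interrupt_driven_read(device_latency: int, data: int) -> tuple[int, int]:
    """Return (received_data, useful_work_units) for one interrupt-driven read."""
    received = []

    def interrupt_handler(value: int) -> None:
        # Runs when the I/O module interrupts: the CPU requests the data
        # and the module transfers it.
        received.append(value)

    work_done = 0
    remaining = device_latency         # the CPU has issued the read command
    while not received:
        if remaining == 0:
            interrupt_handler(data)    # the I/O module interrupts the CPU
        else:
            remaining -= 1             # the module is fetching from the peripheral...
            work_done += 1             # ...while the CPU does other work
    return received[0], work_done


print(interrupt_driven_read(device_latency=5, data=0x41))   # (65, 5)
```

The time the device needs is the same as it would be with polling, but here it is counted as useful work rather than as waiting.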
Simple Interrupt Processing
Direct Memory Access (DMA)
DMA: Interrupt-driven and programmed I/O require active CPU intervention: the transfer rate is limited and the CPU is tied up. DMA is the answer.
DMA Operation: The CPU tells the DMA controller: read or write; the device address; the starting address of the memory block for data; the amount of data to be transferred. The CPU carries on with other work. The DMA controller deals with the transfer and sends an interrupt when finished.
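The four pieces of information handed to the DMA controller map naturally onto a small record. The sketch below is illustrative only: `DMARequest`, `dma_transfer`, the callback used as the completion interrupt, and the device address 0x3F8 are hypothetical names and values, not part of any real controller's interface.

```python
from dataclasses import dataclass


@dataclass
class DMARequest:
    write_to_memory: bool   # direction: device -> memory if True, else memory -> device
    device_address: int     # which device to talk to (hypothetical value below)
    memory_start: int       # starting address of the memory block
    count: int              # number of words to transfer


def dma_transfer(req, memory, device_words, on_done):
    """Move `req.count` words without CPU involvement, then signal completion once."""
    for i in range(req.count):
        if req.write_to_memory:
            memory[req.memory_start + i] = device_words[i]
        else:
            device_words[i] = memory[req.memory_start + i]
    on_done(req)            # single interrupt when the whole block is finished


memory = [0] * 16
events = []                 # records completion interrupts
dma_transfer(
    DMARequest(True, device_address=0x3F8, memory_start=4, count=3),
    memory, [7, 8, 9], events.append,
)
print(memory[4:7], len(events))   # [7, 8, 9] 1
```

The CPU is interrupted once per block rather than once per word, which is the main saving over interrupt-driven I/O.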
DMA Transfer - Cycle Stealing: The DMA controller takes over the bus for a cycle to transfer one word of data. This is not an interrupt: the CPU does not switch context. The CPU is suspended just before it accesses the bus, i.e. before an operand or data fetch or a data write. This slows down the CPU, but not as much as the CPU doing the transfer itself.
Summary CHAP 5: COMPUTER ARITHMETIC
Unsigned Integer (addition): 0101 + 0010 = 0111, i.e. (4+1) + 2 = 7. 0101 1010 + 0001 0001 = 0110 1011.
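Unsigned Integer (multiplication): 0101 x 0110. The partial products, one per multiplier bit from right to left, are 0000, 0101 shifted left by 1, 0101 shifted left by 2, and 0000 shifted left by 3. Their sum is 01010 + 10100 = 011110, i.e. 5 x 6 = 30.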
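The unsigned addition and multiplication examples can be checked with Python's binary literals. The helper name `shift_add_multiply` is made up for this sketch; it adds one shifted copy of the multiplicand for every 1 bit in the multiplier, which is the pencil-and-paper method.

```python
def shift_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply two unsigned integers by summing shifted partial products."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                     # this multiplier bit is 1
            product += multiplicand << shift   # add the shifted partial product
        multiplier >>= 1
        shift += 1
    return product


print(format(0b0101 + 0b0010, "04b"))                      # 0111
print(format(0b01011010 + 0b00010001, "08b"))              # 01101011
print(format(shift_add_multiply(0b0101, 0b0110), "06b"))   # 011110
```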
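Signed Integers (2's Complement): To negate a number, reverse every bit (REVERSE BIT), then add 1 (PLUS 1). For a 7-bit number: minimum value = 1000000 = -64; maximum value = 0111111 = 63.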
Signed Integers (2's Complement) OVERFLOW RULE: If two numbers are added, and they are both positive or both negative, then overflow occurs if and only if the result has the opposite sign.
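The negation recipe and the overflow rule can be tried out on fixed-width bit patterns. This is a minimal sketch: the helper names are invented for illustration, and the 7-bit width matches the minimum and maximum values quoted in these notes.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def negate(value: int, bits: int) -> int:
    """Reverse every bit, then add 1 (result kept to `bits` bits)."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


def add_with_overflow(a: int, b: int, bits: int) -> tuple[int, bool]:
    """Add two bit patterns; overflow iff equal input signs and a different result sign."""
    mask = (1 << bits) - 1
    result = (a + b) & mask
    sign_a, sign_b, sign_r = (x >> (bits - 1) & 1 for x in (a, b, result))
    return result, sign_a == sign_b and sign_r != sign_a


print(to_signed(0b1000000, 7), to_signed(0b0111111, 7))   # -64 63
print(format(negate(0b0000101, 7), "07b"))                # 1111011
print(add_with_overflow(0b0111111, 0b0000001, 7))         # (64, True)
```

In the last line, 63 + 1 produces the pattern 1000000, which reads as -64 in 7-bit 2's complement: two positive inputs gave a negative result, so the rule flags overflow.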
Fixed Point: 0010.1010 = $2^{1} + 2^{-1} + 2^{-3}$ = 2 + (1/2) + (1/8) = 2 + 0.5 + 0.125 = 2.625
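The positional weights can be summed mechanically. In this sketch the function name `fixed_point_value` is made up for illustration; it reads an unsigned binary string with a binary point, giving bits to the right of the point the weights 1/2, 1/4, 1/8, and so on.

```python
def fixed_point_value(bits: str) -> float:
    """Value of an unsigned binary fixed-point string such as '0010.1010'."""
    integer_part, _, fraction_part = bits.partition(".")
    value = float(int(integer_part, 2)) if integer_part else 0.0
    for position, bit in enumerate(fraction_part, start=1):
        if bit == "1":
            value += 2.0 ** -position      # weights 1/2, 1/4, 1/8, ...
    return value


print(fixed_point_value("0010.1010"))      # 2.625
```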
Single-Precision Floating Point FORMULA: sign (1 bit), exponent (3 bits), significand (4 bits). ANSWER: 1.125 x 0.5 = 0.5625. Note: bias = 3, thus exponent = -1 (where 010 is 2; thus 2 - 3 = -1), and 1.001 = 1 + (1/8) = 1 + 0.125.
Single Precision Floating Point
0 010 0010 (8 bit): Sign = 0; Exponent = 010 - 3 = 2 - 3 = -1; Significand = 0010 = $2^{-3}$ = (1/8) = 0.125
$(-1)^{\text{sign}} \times 1.\text{significand} \times 2^{\text{exponent} - \text{bias}}$ = $(-1)^{0} \times 1.0010_2 \times 2^{-1}$ = 1 x (1 + 0.125) x (1/2) = 1.125 x 0.5 = 0.5625
1 01111110 00100000000 000000000000 (32 bit): $(-1)^{\text{sign}} \times 1.\text{significand} \times 2^{\text{exponent} - \text{bias}}$ = $(-1)^{1} \times 1.0010_2 \times 2^{126-127}$ = -1 x (1 + 0.125) x $2^{-1}$ = -1.125 x 0.5 = -0.5625
NOTE: For 8 bit, bias = 3 (exponent range -3 to 4); for 32 bit, bias = 127 (exponent range -127 to 128)
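The formula (-1)^sign x 1.significand x 2^(exponent - bias) can be evaluated for both bit patterns with one small function. This is a sketch with stated simplifications: the name `decode_float` is invented here, the bias is taken as 2^(k-1) - 1 for a k-bit exponent (3 for 3 bits, 127 for 8 bits), and special cases such as zero, subnormals, infinity, and NaN are ignored. The 32-bit result is cross-checked against Python's standard `struct` module.

```python
import struct


def decode_float(bits: str, exp_bits: int, frac_bits: int) -> float:
    """Evaluate (-1)^sign x 1.significand x 2^(exponent - bias) for a bit string."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("bit string does not match the given field widths")
    sign = int(bits[0], 2)
    exponent = int(bits[1:1 + exp_bits], 2)
    fraction = int(bits[1 + exp_bits:], 2) / (1 << frac_bits)
    bias = (1 << (exp_bits - 1)) - 1          # 3 for 3 bits, 127 for 8 bits
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - bias)


print(decode_float("0 010 0010", 3, 4))       # 0.5625

word = "1 01111110 00100000000 000000000000"
print(decode_float(word, 8, 23))              # -0.5625

# Cross-check the 32-bit pattern against the machine's own single-precision decoding.
raw = int(word.replace(" ", ""), 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])            # -0.5625
```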
3-bit bias (reading the first bit as a sign bit): 111 = -3, 011 = 3. 8-bit bias: 1111 1111 = -127, 0111 1111 = 127.
CPU Structure: The CPU must: fetch instructions, interpret instructions, fetch data, process data, write data.
Summary CHAP 7: CPU
CPU With Systems Bus
CPU Internal Structure
Registers: Small storage available in the CPU; faster than main memory.
Types of Registers: General purpose; data; address (hold addresses that are used by instructions to access main memory (RAM)); control and status.
How to increase the performance of the CPU? Improve organization, e.g. locate the cache nearer to the CPU, increase bus bandwidth. Increase clock frequency, e.g. from 1 GHz to 5 GHz. Increase parallelism, e.g. pipelining, superscalar execution, Simultaneous Multithreading (SMT).
Thank You