ECE331: Hardware Organization and Design
Lecture 28: System Dependability, Error Correction Codes and Virtual Machines
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
- Dependability
- Hamming Correction Codes
- Virtual Machines
ECE331: Introduction to Virtual Memory 2
Dependability
- Service accomplishment: service delivered as specified
- Service interruption: deviation from specified service
- Failure: transition from service accomplishment to service interruption
- Restoration: transition from service interruption back to service accomplishment
- Fault: failure of a component
  - May or may not lead to system failure
Dependability Measures
- Reliability: mean time to failure (MTTF)
- Service interruption: mean time to repair (MTTR)
- Mean time between failures: MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
- Improving availability
  - Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  - Reduce MTTR: improved tools and processes for diagnosis and repair
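As a quick numeric check of the formulas above, here is a minimal sketch. The MTTF and MTTR figures are made up for illustration and do not come from the lecture.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is delivered: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures, chosen only to illustrate the arithmetic.
MTTF = 1000.0  # hours
MTTR = 10.0    # hours

print(mtbf(MTTF, MTTR))                        # 1010.0
print(round(availability(MTTF, MTTR), 4))      # 0.9901
# Halving the repair time improves availability without touching MTTF.
print(round(availability(MTTF, MTTR / 2), 4))  # 0.995
```

Under these assumed numbers, cutting MTTR from 10 to 5 hours raises availability from about 99.0% to about 99.5%.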
Detection of Memory Errors
- Parity checking
  - In its simplest form, one extra bit (0 or 1, usually appended as the LSB) is stored with each memory word so that the total number of 1s in the word, parity bit included, is always even (even parity) or always odd (odd parity).
  - Example: 0011 0010 (8 bits) → 0011 0010 1 (9 bits) for even parity.
  - Parity only tells us that an error occurred; it does not identify the faulty bit, so it cannot correct the error.
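The even-parity example can be reproduced with a short sketch. The function names are illustrative, and bit strings stand in for memory words.

```python
def add_even_parity(bits: str) -> str:
    """Append one parity bit so that the total number of 1s is even."""
    data = bits.replace(" ", "")
    parity = data.count("1") % 2  # 1 if the data holds an odd number of 1s
    return data + str(parity)


def parity_ok(word: str) -> bool:
    """Check a stored word (data plus parity bit) for even parity."""
    return word.replace(" ", "").count("1") % 2 == 0


stored = add_even_parity("0011 0010")
print(stored)             # 001100101
print(parity_ok(stored))  # True

# Flip the first bit: the check fails but cannot say which bit is wrong.
corrupted = "1" + stored[1:]
print(parity_ok(corrupted))  # False
```

Note that flipping two bits restores even parity, so a single parity bit misses double errors.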
The Hamming SEC Code
- Hamming distance: the number of bit positions in which two bit patterns differ
  - 0011 0010 (binary string #1)
  - 0000 0010 (binary string #2)
  - These differ in two positions, so the Hamming distance is 2
- Minimum distance = 2 provides single-bit error detection
  - e.g. parity code
- Minimum distance = 3 provides single error correction, or 2-bit error detection
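A minimal sketch of the distance calculation, assuming bit patterns are given as strings. The second part builds a tiny even-parity code over 2 data bits (an illustrative choice, not from the slides) to show that a parity code has minimum distance 2.

```python
from itertools import combinations, product


def hamming_distance(a: str, b: str) -> int:
    """Count the positions where two equal-length bit strings differ."""
    a, b = a.replace(" ", ""), b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("bit patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords: list[str]) -> int:
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


print(hamming_distance("0011 0010", "0000 0010"))  # 2

# Even-parity code over 2 data bits: 000, 011, 101, 110.
parity_code = []
for data in product("01", repeat=2):
    word = "".join(data)
    parity_code.append(word + str(word.count("1") % 2))
print(minimum_distance(parity_code))  # 2
```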
Encoding SEC (Single Error Correcting)
To calculate the Hamming code:
- Number the bits from 1, starting on the left
- All bit positions that are a power of 2 (1, 2, 4, 8, ...) are parity bits
- Each parity bit checks certain data bits: the parity bit at position p covers every position whose binary position number has the bit for p set
- Note: In a break from convention, we start numbering bits from 1 on the left (MSB), as opposed to the rightmost (LSB) bit being zero
Decoding SEC
- The value of the parity bits indicates which bit is in error
- Use the numbering from the encoding procedure
- Example: we have an 8-bit value: 1001 1010
- Add 4 bits for parity checking at positions 1, 2, 4 and 8 (Bit 1, Bit 2, Bit 4, Bit 8), shown here as blanks: _ _ 1 _ 0 0 1 _ 1 0 1 0
- Bit 1 (0001) covers all bit locations that have a 1 in the LSB (e.g. 0001, 0011, etc., but not 0010, 0110, etc.)
- Bit 2 (0010) covers all bit locations that have a 1 in the 2nd bit (e.g. 0010, 0011, etc., but not 0001, 0101, etc.)
- Bit 4 (0100) covers all bit locations that have a 1 in the 3rd bit (e.g. 0100, 0101, etc., but not 0001, 1011, etc.)
- Bit 8 (1000) covers all bit locations that have a 1 in the 4th bit (e.g. 1000, 1001, etc., but not 0001, 0111, etc.)
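The coverage rule can be sketched as a bitwise test on the position number. The helper name `covered_positions` is illustrative; positions are numbered from 1 as in the slides.

```python
def covered_positions(parity_pos: int, word_length: int) -> list[int]:
    """Positions (numbered from 1) checked by the parity bit at parity_pos."""
    return [pos for pos in range(1, word_length + 1) if pos & parity_pos]


# Coverage for the 12-bit word used in the example (8 data + 4 parity bits).
for p in (1, 2, 4, 8):
    print(f"Bit {p}: {covered_positions(p, 12)}")
# Bit 1: [1, 3, 5, 7, 9, 11]
# Bit 2: [2, 3, 6, 7, 10, 11]
# Bit 4: [4, 5, 6, 7, 12]
# Bit 8: [8, 9, 10, 11, 12]
```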
Example continued
- Fill in the parity bits using the rules of parity, in this case even parity
- _ _ 1 _ 0 0 1 _ 1 0 1 0 (the blanks are Bit 1, Bit 2, Bit 4 and Bit 8)
  - Bit 1: (add bits 1, 3, 5, 7, 9, and 11) → Bit 1 = 0
  - Bit 2: (add bits 2, 3, 6, 7, 10, and 11) → Bit 2 = 1
  - Bit 4: (add bits 4, 5, 6, 7, and 12) → Bit 4 = 1
  - Bit 8: (add bits 8, 9, 10, 11, and 12) → Bit 8 = 0
- The resulting 12-bit value, which includes the Hamming correction code, is 0111 0010 1010
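The encoding procedure can be sketched as follows, assuming even parity and the left-to-right numbering from 1 used in this lecture. The function name `hamming_encode` is illustrative.

```python
def hamming_encode(data_bits: str) -> str:
    """Insert even-parity Hamming bits at positions 1, 2, 4, 8, ..."""
    data = [int(b) for b in data_bits.replace(" ", "")]
    m = len(data)
    # Smallest number of parity bits r with 2**r >= m + r + 1.
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    n = m + r
    word = [0] * (n + 1)  # index 0 unused so indices match bit positions
    remaining = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so a data position
            word[pos] = next(remaining)
    for i in range(r):
        p = 1 << i
        # word[p] is still 0 here, so this is the parity of the covered data bits.
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    return "".join(str(b) for b in word[1:])


print(hamming_encode("1001 1010"))  # 011100101010
```

The printed value matches the 12-bit result 0111 0010 1010 worked out by hand.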
Example continued
- Suppose a single-bit error corrupts the value stored in memory, so that we read 0111 0010 1110
- We can check the parity bits
  - Bit 1: add bits 1, 3, 5, 7, 9, and 11 = 4, even, okay!
  - Bit 2: add bits 2, 3, 6, 7, 10, and 11 = 5, odd, error!
  - Bit 4: add bits 4, 5, 6, 7, and 12 = 2, even, okay!
  - Bit 8: add bits 8, 9, 10, 11, and 12 = 3, odd, error!
- Add the positions of the failing parity bits to find where the error is: 2 + 8 = 10, so the error is in Bit 10
- Flip bit 10, 1 → 0
- This fixes the problem! As long as there is only one error, this works for all bits, including the parity bits.
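The check-and-correct steps can be sketched as below, again assuming even parity and numbering from 1 on the left. The names `syndrome` and `correct_single_error` are illustrative.

```python
def syndrome(word_bits: str) -> int:
    """Sum of the positions of the parity bits whose check fails."""
    bits = [int(b) for b in word_bits.replace(" ", "")]
    n = len(bits)
    total = 0
    parity_pos = 1
    while parity_pos <= n:
        ones = sum(bits[pos - 1] for pos in range(1, n + 1) if pos & parity_pos)
        if ones % 2 == 1:  # odd count: this parity check fails
            total += parity_pos
        parity_pos *= 2
    return total


def correct_single_error(word_bits: str) -> str:
    """Flip the bit named by the syndrome, assuming at most one error."""
    bits = list(word_bits.replace(" ", ""))
    s = syndrome(word_bits)
    if s > len(bits):
        raise ValueError("syndrome is outside the word: more than one bit is wrong")
    if s:
        bits[s - 1] = "0" if bits[s - 1] == "1" else "1"
    return "".join(bits)


received = "0111 0010 1110"
print(syndrome(received))              # 10
print(correct_single_error(received))  # 011100101010
```

The same routine also repairs a flipped parity bit: corrupting Bit 4 of the valid word gives a syndrome of 4.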
SEC/DED Code (Single Error Correcting, Double Error Detecting)
- Add an additional parity bit for the whole word (p_n)
- This makes the Hamming distance = 4
- Decoding: let H = SEC parity bits
  - H even, p_n even: no error
  - H odd, p_n odd: correctable single-bit error
  - H even, p_n odd: error in the p_n bit
  - H odd, p_n even: double error occurred
- Note: ECC DRAM uses SEC/DED with 8 bits protecting every 64 bits
  - With 72 bits, we get single-bit error correction and double-bit error detection
  - Many DIMMs are 72 bits wide
- Can be done across memory systems (e.g. hard drives for RAID)
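The four decoding cases can be sketched on the 12-bit example word, extended with an overall even-parity bit as a 13th bit. Two assumptions in this sketch: "H odd" is read as "at least one SEC parity check fails", and p_n is placed last in the word. The function names are illustrative.

```python
def syndrome(hamming_bits: list[int]) -> int:
    """Sum of the positions of the failing SEC parity checks (0 if none fail)."""
    n = len(hamming_bits)
    total, p = 0, 1
    while p <= n:
        if sum(hamming_bits[pos - 1] for pos in range(1, n + 1) if pos & p) % 2:
            total += p
        p *= 2
    return total


def classify(word: str) -> str:
    """Apply the SEC/DED decoding table; the last bit of word is p_n."""
    bits = [int(b) for b in word.replace(" ", "")]
    s = syndrome(bits[:-1])
    h_odd = s != 0              # some SEC parity check fails
    p_odd = sum(bits) % 2 == 1  # overall parity check fails
    if not h_odd and not p_odd:
        return "no error"
    if h_odd and p_odd:
        return f"correctable single-bit error at position {s}"
    if not h_odd and p_odd:
        return "error in p_n"
    return "double error detected"


def flip(word: str, position: int) -> str:
    """Return word with the bit at position (numbered from 1) inverted."""
    bits = list(word)
    bits[position - 1] = "1" if bits[position - 1] == "0" else "0"
    return "".join(bits)


good = "011100101010" + "0"  # six 1s in the Hamming word, so p_n = 0
print(classify(good))                     # no error
print(classify(flip(good, 10)))           # correctable single-bit error at position 10
print(classify(flip(good, 13)))           # error in p_n
print(classify(flip(flip(good, 3), 10)))  # double error detected
```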
Virtual Machines
- Host computer emulates guest operating system and machine resources
  - Improved isolation of multiple guests
  - Avoids security and reliability problems
  - Aids sharing of resources
- Virtualization has some performance impact
  - Feasible with modern high-performance computers
- Examples
  - IBM VM/370 (1970s technology!)
  - VMware
  - Microsoft Virtual PC
Virtual Machine Monitor (VMM)
- Maps virtual resources to physical resources
  - Memory, I/O devices, CPUs
- Guest code runs on native machine in user mode
  - Traps to VMM on privileged instructions and access to protected resources
- Guest OS may be different from host OS
- VMM handles real I/O devices
  - Emulates generic virtual I/O devices for guest
[Figure: layered diagram with the labels Guest OSs (VM1, VM2, VM3), Host OS, VMM, I/O, and Processor running ISA]
Example: Timer Virtualization
- In native machine, on timer interrupt
  - OS suspends current process, handles interrupt, selects and resumes next process
- With Virtual Machine Monitor
  - VMM suspends current VM, handles interrupt, selects and resumes next VM
- If a VM requires timer interrupts
  - VMM emulates a virtual timer
  - Emulates interrupt for VM when physical timer interrupt occurs
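The flow above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names `VM` and `ToyVMM`, the round-robin scheduling policy, and counting one virtual interrupt per physical tick are all illustrative choices, not how any particular VMM works.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    wants_timer: bool = False          # does the guest rely on timer interrupts?
    virtual_timer_interrupts: int = 0  # interrupts emulated for this guest


class ToyVMM:
    def __init__(self, vms: list[VM]) -> None:
        self.vms = vms
        self.current = 0  # index of the VM that is currently running
        self.log: list[tuple[str, str]] = []

    def on_physical_timer_interrupt(self) -> VM:
        """Handle one physical timer interrupt and return the VM resumed next."""
        suspended = self.vms[self.current]  # VMM suspends the current VM
        # Emulate a virtual timer interrupt for every VM that needs one.
        for vm in self.vms:
            if vm.wants_timer:
                vm.virtual_timer_interrupts += 1
        # Select and resume the next VM (round robin is an assumption).
        self.current = (self.current + 1) % len(self.vms)
        resumed = self.vms[self.current]
        self.log.append((suspended.name, resumed.name))
        return resumed


vms = [VM("VM1", wants_timer=True), VM("VM2"), VM("VM3", wants_timer=True)]
vmm = ToyVMM(vms)
for _ in range(4):
    running = vmm.on_physical_timer_interrupt()

print(running.name)                                # VM2
print([vm.virtual_timer_interrupts for vm in vms])  # [4, 0, 4]
```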
Instruction Set Support
- User and System modes
- Privileged instructions only available in system mode
  - Trap to system if executed in user mode
- All physical resources only accessible using privileged instructions
  - Including page tables, interrupt controls, I/O registers
- Renaissance of virtualization support
  - Current ISAs (e.g., x86) adapting
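The mode check and trap can be mimicked in a small simulation. This is a sketch only: the instruction names, the `execute` function and the `ToyVMM` class are hypothetical, and a Python exception stands in for the hardware trap.

```python
class PrivilegedInstructionTrap(Exception):
    """Raised when user-mode code attempts a privileged instruction."""


# Hypothetical instruction names, for illustration only.
PRIVILEGED = {"set_page_table", "disable_interrupts", "write_io_register"}


def execute(instruction: str, mode: str) -> str:
    """Model of the hardware check: privileged instructions need system mode."""
    if instruction in PRIVILEGED and mode != "system":
        raise PrivilegedInstructionTrap(instruction)
    return f"{instruction} executed in {mode} mode"


class ToyVMM:
    def __init__(self) -> None:
        # Emulated privileged operations recorded per guest.
        self.virtual_state: dict[str, list[str]] = {}

    def run_guest(self, guest: str, instruction: str) -> str:
        try:
            # Guest code always runs in user mode.
            return execute(instruction, mode="user")
        except PrivilegedInstructionTrap:
            # The trap hands control to the VMM, which applies the effect
            # to the guest's virtual resources instead of the real ones.
            self.virtual_state.setdefault(guest, []).append(instruction)
            return f"{instruction} trapped and emulated for {guest}"


vmm = ToyVMM()
print(vmm.run_guest("VM1", "add"))             # add executed in user mode
print(vmm.run_guest("VM1", "set_page_table"))  # set_page_table trapped and emulated for VM1
print(vmm.virtual_state)                       # {'VM1': ['set_page_table']}
```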
Summary
- Finished discussion of power and energy in processors
- Reliability, Dependability, Availability and MTBF
- Hamming correction codes
- Virtual machines