Fault Tolerance & Reliability, CDA 5140, Chapter 2: Additional Interesting Codes

m-out-of-n codes
- each binary codeword has exactly m ones in a non-systematic codeword of length n
- used for unidirectional errors only
- if the errors are from 0 to 1, the weight will be greater than m, and if they are from 1 to 0, the weight will be less than m
- in both cases, the unidirectional errors can be detected
- these codes are useful for encoding control information, since each bit typically has a separate interpretation
- in particular, 1-out-of-n codes are used to control an operation without decoding other bits, i.e. there are n valid codewords

Berger codes
- binary systematic unidirectional error-detecting codes
- for information bits given as $(a_m, a_{m-1}, \ldots, a_1)$, the number of ones is determined, and this count is then appended, in binary, as the check symbol
- the number of check bits is $\lceil \log_2(m + 1) \rceil$
- for example, the information bits 1100011 contain four ones, so the check bits are 100 and the total word is 1100011 100

Error Correcting/Detecting Unidirectional Code
- the following is a single-error-correcting, multiple-unidirectional-error-detecting code
- assign weights to the m information bits; the weights are the m least positive integers which are not powers of 2, i.e. 3, 5, 6, 7, ..., beginning from the lowest-order position

- the type 1 check is the sum of the weights corresponding to positions with an information bit of zero, expressed in binary
- the type 2 check is the number of 0s in the information bits and type 1 check bits together, expressed in binary
- for example, for 8 information bits the weights are: 12, 11, 10, 9, 7, 6, 5, 3
- consider the information bits (1, 0, 0, 1, 1, 1, 0, 1), which give the sum 26 = 5 + 10 + 11, or 011010; the number of 0s is then six, or 0110, and the codeword is: 10011101 011010 0110
- for the received word $(a_m, a_{m-1}, \ldots, a_1, b_r, b_{r-1}, \ldots, b_0, d_s, d_{s-1}, \ldots, d_0)$, let $B = \sum_{i=0}^{r} b_i 2^i$ and $D = \sum_{i=0}^{s} d_i 2^i$, and let $B'$ be the corresponding weighted sum (using the weights 3, 5, 6, 7, ...) recomputed from the received information bits; then
  o for no error, $B' - B = 0$
  o for a single error, if $|B' - B| = w_i$, complement the bit with weight $w_i$ in the information part (a difference that is a power of 2 points to the type 1 check bit of that weight), then count the number of 0s to obtain $D'$; if $D' - D = 0$, the single error has been corrected
  o for multiple unidirectional errors, either $w_i$ is not in the weight set or $D' - D \neq 0$
- if the codeword above is received unchanged, we have $B = 26$ and $D = 6$, and the recomputed values are $B' = 26$ and $D' = 6$, indicating no errors in either the information or the check parts
- if the received word were 10111101 011010 0110, then $B' = 5 + 11 = 16$ and $|B' - B| = |16 - 26| = 10$; complementing the bit with weight 10 gives 10011101 011010 0110, for which $D' = 6$, so $D' - D = 0$ and a single error is corrected
- if the received word were 10011101 011000 0110, then $B' = 26$ and $B = 24$, so $|B' - B| = 2$; complementing the position with weight 2 (in the type 1 check) gives 10011101 011010 0110, for which $D' = 6$ and $D = 6$, so $D' - D = 0$ and a single error has been corrected
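The encoding rule and the decoding checks above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the bit-string representation, and the fixed check widths `r=6` and `s=4` (taken from the 8-bit example) are assumptions of the sketch. An error that the procedure above cannot attribute to a single bit is simply reported as "error detected".

```python
def weights(m):
    """First m positive integers that are not powers of 2, highest weight first."""
    out, k = [], 3
    while len(out) < m:
        if k & (k - 1):              # non-zero exactly when k is not a power of 2
            out.append(k)
        k += 1
    return out[::-1]                 # order matches (a_m, ..., a_1)


def zero_weight_sum(info):
    """Sum of the weights of the positions whose information bit is 0."""
    return sum(w for bit, w in zip(info, weights(len(info))) if bit == "0")


def flip(bits, i):
    """Complement the bit at index i of a bit string."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]


def encode(info, r=6, s=4):
    """Return (info, type 1 check, type 2 check); r and s are assumed widths."""
    type1 = format(zero_weight_sum(info), f"0{r}b")
    type2 = format((info + type1).count("0"), f"0{s}b")
    return info, type1, type2


def decode(info, type1, type2):
    """Return (status, info, type1) after applying the checks described above."""
    diff = abs(zero_weight_sum(info) - int(type1, 2))            # |B' - B|
    fixed_info, fixed_type1 = info, type1
    ws = weights(len(info))
    if diff in ws:                                               # information bit
        fixed_info = flip(info, ws.index(diff))
    elif diff and (diff & (diff - 1)) == 0 and diff < 2 ** len(type1):
        fixed_type1 = flip(type1, len(type1) - diff.bit_length())  # type 1 check bit
    elif diff:
        return "error detected", info, type1
    if (fixed_info + fixed_type1).count("0") != int(type2, 2):   # D' != D
        return "error detected", info, type1
    return ("corrected" if diff else "no error"), fixed_info, fixed_type1


print(encode("10011101"))                    # ('10011101', '011010', '0110')
print(decode("10111101", "011010", "0110"))  # ('corrected', '10011101', '011010')
```

The two printed cases reproduce the 8-bit codeword and the first single-error example from the notes.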

- if the received word were 10011111 011011 0111, then $B' = 21$ and $B = 27$, so $|B' - B| = 6$; complementing the corresponding information bit gives 10011011 011011 0111, for which $B' = 27$ and $B' - B = 0$; however, $D' = 5$ and $D = 7$, so $|D' - D| = 2$, and we know that multiple unidirectional errors have occurred

Codes for Mass Storage
- the goal in such storage is high bit density, but as density increases, so does the frequency of random errors
- error-correcting codes allow for increased bit density while still providing reliability
- some loss of bit density due to redundancy is acceptable, since overall there is a significant gain in bit density from using sophisticated error-correction techniques
- errors come from media noise, writing noise, and reading noise
- sophisticated error-correcting and error-detecting codes tolerate significant numbers of persistent or intermittent errors
- re-reading is used in some instances when there is reading noise or intermittent media-caused noise
- to avoid too many re-readings, a majority-rule decision is made on each bit before error correction or error detection is applied
- these approaches are applicable to disk, magnetic tape, and most mass storage

Codes for Magnetic Tape
- for multi-track tapes, errors often occur in clusters, especially in a single track
- the codes used for these are typically cyclic redundancy check codes, a type of the generating polynomial codes studied earlier, in which shift registers are used to correct the byte errors
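The per-bit majority-rule decision over repeated reads, mentioned under Codes for Mass Storage, can be sketched as below. The function name `majority_vote`, the bit-string representation of a read, and the requirement of an odd number of reads are assumptions made for illustration; the notes do not specify how many re-readings are combined.

```python
def majority_vote(reads):
    """Bitwise majority over an odd number of equal-length reads of one block."""
    if len(reads) % 2 == 0:
        raise ValueError("use an odd number of reads so every bit has a majority")
    if len({len(r) for r in reads}) != 1:
        raise ValueError("all reads must have the same length")
    half = len(reads) // 2
    # zip(*reads) walks the reads column by column, i.e. bit position by bit position
    return "".join("1" if column.count("1") > half else "0"
                   for column in zip(*reads))


# Three reads of the same block; the second and third each contain one flipped bit.
print(majority_vote(["10110100", "10100100", "10110110"]))  # 10110100
```

The voted word would then be passed to the error-detecting or error-correcting decoder, so that intermittent read noise is filtered out before the code's redundancy is spent on it.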

Arithmetic Codes
- coding for arithmetic operations is different from what is used for correcting and detecting errors in memory, but has a similar basis
- such codes are defined for arithmetic operations but can also be applied to some instances of data transmission and memory protection
- any integer N in base r can be expressed as $N = a_{n-1} r^{n-1} + a_{n-2} r^{n-2} + \cdots + a_0$ for $0 \le a_i < r$, $i = 0, 1, \ldots, n-1$
- the arithmetic weight W(N) is defined as the minimum number of non-zero terms when N is expressed in the form $N = b_{n-1} r^{n-1} + b_{n-2} r^{n-2} + \cdots + b_0$, where in the binary case the $b_i$ are 0 or $\pm 1$
- for example, W(14) = 2, not 3, since 14 can be expressed as $2^4 - 2^1$ instead of $2^3 + 2^2 + 2^1$
- the arithmetic distance between two numbers N and M is then defined as $W(N - M)$
- for example, when r = 2 the arithmetic distance between 27 and 19 is 1, since $27 - 19 = 8 = 2^3$ and $W(2^3) = 1$
- codes are defined based on this definition of distance

Communication Codes
- the communication medium is often subject to a greater frequency of errors and more variance of conditions than the general computing environment; however, with the increased use of optical fiber the situation is improving
- if time delay is acceptable, error detection with a request for retransmission is used, with virtually guaranteed reliability (called ARQ, or automatic repeat request)
- sophisticated error-correcting codes are used together with detection and retransmission

- a variety of approaches are based on sending data in blocks with sequence numbers, which must be acknowledged either in groups or individually
- when a block is not received, the receiver can selectively request that the block, or a series of blocks, be resent
- a cyclic code with a generator polynomial is used to check the header, and if it is incorrect (non-zero remainder polynomial) the frame is discarded

Burst Error Correction
- all the above codes assume a single error, with the probability of multiple errors being very small
- for burst errors, assume that errors most commonly come in bursts
- this pattern of errors is typical for rotational magnetic and optical storage devices such as CDs, CD-ROMs, hard drives, and floppy disk drives
- typical burst error patterns can be seen in the following messages, where x represents an error:
  m1 = bbbxxbxbbbbb
  m2 = bxbxxbbbbbbb
  m3 = bbbbxbxbbbbb
  m4 = bxxbbbbbbbbb
- in the first 2 messages the burst is of length 4, in message 3 it is of length 3, and in the last message it is of length 2
- in general, consider bursts of length t
- for error detection, an example code of length 12 is used, where a codeword is given as $V = (x_1, x_2, \ldots, x_{12})$
- for this example let t = 4, so that bits separated by 4 positions contain at most 1 error of any one burst
- thus the check bits can be chosen based on:

  $x_1 \oplus x_5 \oplus x_9 = 0$
  $x_2 \oplus x_6 \oplus x_{10} = 0$
  $x_3 \oplus x_7 \oplus x_{11} = 0$
  $x_4 \oplus x_8 \oplus x_{12} = 0$
- note that each bit occurs in only one equation; for a burst of length 4 or less, the errors of the burst are spread over different equations, so no two errors of the burst can cancel each other out
- define four of the bits in V to be check bits, and for simplicity take those to be the first 4 bits, so that $V = (c_1\, c_2\, c_3\, c_4\, b_1\, b_2\, b_3\, b_4\, b_5\, b_6\, b_7\, b_8)$ and
  $c_1 \oplus b_1 \oplus b_5 = 0$
  $c_2 \oplus b_2 \oplus b_6 = 0$
  $c_3 \oplus b_3 \oplus b_7 = 0$
  $c_4 \oplus b_4 \oplus b_8 = 0$
- these can then be used to generate the syndrome vector $e_1 e_2 e_3 e_4$:
  $x_1 \oplus x_5 \oplus x_9 = e_1$
  $x_2 \oplus x_6 \oplus x_{10} = e_2$
  $x_3 \oplus x_7 \oplus x_{11} = e_3$
  $x_4 \oplus x_8 \oplus x_{12} = e_4$
- the all-zero syndrome again indicates no errors, assuming a burst of length t or less
- the following properties hold for burst codes:
  o for a burst length of t, t check bits are needed for error detection
  o for m message bits and burst length t, the code length is m + t
  o there are t check-bit equations as above
  o generation and checking for a burst error code is realized by a linear feedback shift register
- error correction is more complex but is based on the same concepts
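The check-bit and syndrome equations above can be sketched in Python for the length-12, t = 4 example. The function names `burst_encode`, `syndrome`, and `apply_pattern`, and the list-of-bits representation, are assumptions of this sketch. It computes the parities directly rather than with the linear feedback shift register mentioned in the notes.

```python
def group_parity(bits, j, t):
    """XOR of bits j, j+t, j+2t, ... (one check equation)."""
    parity = 0
    for bit in bits[j::t]:
        parity ^= bit
    return parity


def burst_encode(message, t=4):
    """Prepend t check bits c_1..c_t so that every check equation sums to 0."""
    checks = [group_parity(message, j, t) for j in range(t)]
    return checks + list(message)


def syndrome(word, t=4):
    """Syndrome vector (e_1, ..., e_t); all zeros means no error was detected."""
    return [group_parity(word, j, t) for j in range(t)]


def apply_pattern(word, pattern):
    """Flip the bits of word wherever the pattern string has an 'x'."""
    return [bit ^ (mark == "x") for bit, mark in zip(word, pattern)]


codeword = burst_encode([1, 0, 1, 1, 0, 1, 1, 0])
print(codeword)                                           # [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
print(syndrome(codeword))                                 # [0, 0, 0, 0]
print(syndrome(apply_pattern(codeword, "bbbxxbxbbbbb")))  # [1, 0, 1, 1]
```

The last line applies burst pattern m1 from the notes: its three errors fall into three different equations, so the syndrome is non-zero and the burst is detected.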