Structuring Redundancy for Fault Tolerance
CSE 598D: Fault Tolerant Software
What do we want to achieve?
Error detection, damage assessment, state restoration, and continued service.
[Figure: inputs feed Version 1, Version 2, ..., Version N; a voter combines the versions' results into the outputs.]
Robust Software
The robust software approach does not use redundancy.
Robustness: the extent to which software can continue to operate correctly despite the introduction of invalid inputs, as defined in the program specification.
Handles out-of-range inputs, inputs of the wrong type, and inputs in the wrong format.
What happens when invalid input is detected?
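Robust Software
[Figure: flowchart. Inputs enter a "Valid input?" check. If true, software operation continues and produces the result. If false, the software requests new input, OR uses the last acceptable value, OR uses a predefined default value, and raises an exception flag; software operation then continues to the result, and exceptions are handled.]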
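The flowchart's fallback choices can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `RobustInput`, its fields, and the policy of preferring the last acceptable value over the default are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobustInput:
    """Hypothetical input guard: validates a numeric input against a range."""
    low: float
    high: float
    default: float                      # predefined default value
    last_good: Optional[float] = None   # last acceptable value, if any
    exception_flag: bool = False        # raised when an invalid input is seen

    def accept(self, raw):
        try:
            value = float(raw)          # wrong type or wrong format fails here
            if not (self.low <= value <= self.high):
                raise ValueError("input out of range")
        except (TypeError, ValueError):
            # Invalid input: raise the flag and fall back instead of stopping.
            self.exception_flag = True
            if self.last_good is not None:
                return self.last_good
            return self.default
        self.last_good = value
        return value
```

For example, with `guard = RobustInput(low=0, high=100, default=50)`, the call `guard.accept("abc")` falls back to the default, while a later `guard.accept(500)` falls back to whatever valid value was accepted most recently. Requesting new input, the third option in the flowchart, is left out because it depends on the input source.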
Self-Checking Software
Testing the input data by, for example, error-detecting codes and data type checks.
Testing the control sequences by, for example, setting bounds on loop iterations.
Testing the function of the process by, for example, performing a reasonableness check on the output.
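A small sketch of the three kinds of self-check, using a Newton-iteration square root as a stand-in for the process. The function name `checked_sqrt`, the iteration bound, and the tolerances are assumptions chosen for illustration.

```python
def checked_sqrt(x, max_iter=50, tol=1e-9):
    """Square root by Newton's method with three self-checks (illustrative)."""
    # 1. Input check: data type and range.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError("x must be a real number")
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0

    # 2. Control-sequence check: the loop has a hard iteration bound.
    guess = max(float(x), 1.0)
    for _ in range(max_iter):
        nxt = 0.5 * (guess + x / guess)
        converged = abs(nxt - guess) <= tol * nxt
        guess = nxt
        if converged:
            break
    else:
        raise RuntimeError("iteration bound exceeded")

    # 3. Function check: reasonableness test on the output.
    if abs(guess * guess - x) > 1e-6 * max(x, 1.0):
        raise RuntimeError("reasonableness check failed")
    return guess
```

The `for ... else` form means the bound check fires only when the loop runs out of iterations without converging.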
Assertions
An assertion is a statement that enables you to test your assumptions about your program at specific points. For example, if you write a function that calculates the speed of a particle, you might assert that the calculated speed is less than the speed of light.
Each assertion contains a Boolean expression that you believe will be true when the assertion executes. If it is not true, the system will throw an error. By verifying that the Boolean expression is indeed true, the assertion confirms your assumptions about the behavior of your program, increasing your confidence that the program is free of errors.
Q: Why do we need assertions if we have an exception mechanism?
A: Exceptions are primarily used to handle unusual conditions arising during program execution. Assertions are used to specify conditions that a programmer assumes are true. If a programmer can swear that the value being passed into a particular method is positive no matter what a calling client passes, that assumption can be documented with an assertion. Exceptions handle abnormal conditions arising in the course of the program; however, they do not guarantee smooth or correct execution of the program. Assertions state the conditions under which the program is running correctly, so they can be efficient tools for ensuring correct execution and they improve confidence in the program.
Not very new: see Bob Floyd's original paper "Assigning Meanings to Programs" (1967).
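The particle-speed example might look like the sketch below. The function name and parameters are hypothetical; the point is only the split between an exception for an unusual run-time condition and an assertion for an assumption about the computed value.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def particle_speed(distance_m, elapsed_s):
    """Average speed of a particle (hypothetical helper for illustration)."""
    # Exception: an unusual condition that the caller may cause at run time.
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")

    speed = distance_m / elapsed_s

    # Assertion: an assumption the programmer believes always holds here.
    assert abs(speed) < SPEED_OF_LIGHT, "computed speed exceeds the speed of light"
    return speed
```

Note that Python removes `assert` statements when run with the `-O` option, so assertions document and test assumptions rather than replace input validation.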
Robust Software
+ Detects errors in the development and test process
- Cannot detect and tolerate less specific errors
Design Diversity
Redundant, exact copies of software components alone cannot increase reliability.
Diversity: provision of identical services through separate designs and implementations, called modules, versions, variants, or alternatives.
Goal: make the variants as diverse and independent as possible, with the ultimate objective being the minimization of identical error causes.
When the variants fail, we want them to fail on disjoint subsets of the input space.
We want the reliability of the variants to be as high as possible, so that at least one variant is operational at all times.
Design Diversity
Begins with an initial requirements specification.
Specifications may also employ diversity (as long as functional equivalency is maintained).
Each developer or development organization implements a variant to the specification and provides the outputs required by the specification.
Design Diversity
[Figure: Variant 1, Variant 2, ..., Variant n feed a decider, which classifies the result as correct or incorrect.]
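One possible decider is a majority voter. The sketch below assumes three toy variants of an absolute-value service, one with a seeded fault; the names `variant_1` to `variant_3`, `decider`, and `run_diverse` are illustrative, and majority voting is only one of several adjudication rules.

```python
from collections import Counter


def variant_1(x):
    return abs(x)


def variant_2(x):
    return x if x >= 0 else -x


def variant_3(x):
    if x == -3:
        return x              # seeded fault: wrong answer for one input
    return max(x, -x)


def decider(results, n_variants):
    """Return the majority result, or report that no majority exists."""
    if not results:
        raise RuntimeError("no variant produced a result")
    value, votes = Counter(results).most_common(1)[0]
    if 2 * votes <= n_variants:
        raise RuntimeError("no majority among the variants")
    return value


def run_diverse(variants, x):
    results = []
    for variant in variants:
        try:
            results.append(variant(x))
        except Exception:
            continue          # a crashed variant simply casts no vote
    return decider(results, len(variants))
```

Because the seeded fault affects only `variant_3`, the other two variants outvote it on the input `-3`. If the variants shared the same fault, the voter would agree on the wrong answer, which is why independent failure behavior matters.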
Variants, Adjudicator, and Cost
When significant independence in the variants' failure profiles can be achieved, a simple and efficient adjudicator can be used, and design diversity provides effective error recovery from design faults.
It is likely, however, that completely independent development cannot be achieved in practice.
Is design diversity costly?
Case Study
Bishop presents a useful review of the research in this area. Summarized findings:
A significant proportion of the faults found in the experiments were similar.
The major cause of the common faults was the specification (any solution?).
The major deficiencies in the specifications were incompleteness and ambiguity. This caused or forced the programmers to make some incorrect and potentially common design choices.
Diverse design specifications can potentially reduce specification-related common faults.
In general, fewer faults seem to occur in strongly typed, tightly structured languages such as Modula-2 and Ada, while low-level assembler has the worst performance in terms of fault tolerance.
A significant improvement in the reduction of identical and very similar faults was found by using the N-version design paradigm.
Levels of Diversity
Two aspects of the level of fault tolerance to consider:
Determining at what level of detail to decompose the system into modules that will be diversified.
Determining which layers of the system to diversify (hardware, application software, system software, operators, and the interfaces between these components).
Multilayer diversity? Problems: cost and speed.
Systematic Diversity
One way to add diversity at a potentially lower cost is systematic diversity, although it is typically used as a software technique for tolerating hardware faults.
Utilization of different processor registers in the variants
Transformation of mathematical expressions
Different implementations of programming structures
Different memory usage
Using complementary branching conditions in the variants by transforming the branch statements
Different compilers, libraries, and linkers
Different optimization and code-generation options
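Two of the listed transformations, rewriting a mathematical expression and using a complementary branch condition, can be illustrated at source level. This is a sketch only: the function names and tolerances are assumptions, and in an interpreted language the two variants do not exercise different hardware resources the way compiled variants would.

```python
import math


def variant_a(x, y):
    # Original branch condition, factored expression.
    if x >= y:
        return (x - y) * (x + y)
    return 0.0


def variant_b(x, y):
    # Complementary branch condition, expanded expression.
    if x < y:
        return 0.0
    return x * x - y * y


def run_systematic(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Run both variants and compare; disagreement signals a fault."""
    a, b = variant_a(x, y), variant_b(x, y)
    if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
        raise RuntimeError("variants disagree")
    return a
```

The comparison uses a tolerance because algebraically equal expressions round differently in floating point. For nearly equal `x` and `y` the expanded form loses precision, so the tolerance has to be chosen with the application's input range in mind.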
Data Diversity
Limitations of some design diverse techniques led to the development of data diverse software fault tolerance techniques.
Data diverse techniques are meant to complement, rather than replace, design diverse techniques.
Steps:
Obtain a related set of points in the program data space and execute the same software on those points.
Use a decision algorithm to determine the resulting output.
Failure Domain and Failure Region
Failure domain: the set of input points that cause program failure.
Failure region: the geometry of the failure domain.
The input space of most programs is a hyperspace of many dimensions. E.g., if a program reads and processes a set of 25 floating-point numbers, its input space has 25 dimensions.
The valid program space is defined by the specifications and by tested values and ranges.
Basic Data Re-expression
[Figure: the input x is executed by P to give P(x); x is also re-expressed as y = R(x), and P is executed on y to give P(y).]
The program, P, and the re-expression, R, determine the relationship between P(x) and P(y).
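A minimal sketch of basic re-expression, assuming a toy program `p` (a sum with a seeded fault) and an exact re-expression `r` (reversing the list, which leaves an integer sum unchanged). All names are hypothetical, and trying the re-expressed input only after a failure is one possible way to use the two executions.

```python
def p(values):
    """Program under execution: a sum with a seeded fault for illustration."""
    if values and values[0] == 0:
        # Seeded fault: inputs starting with 0 lie in the failure region.
        raise ZeroDivisionError("seeded fault")
    return sum(values)


def r(values):
    """Exact re-expression: the sum of integers does not depend on order."""
    return list(reversed(values))


def run_with_reexpression(values):
    try:
        return p(values)          # P(x)
    except Exception:
        return p(r(values))       # P(y) with y = R(x)
```

Here `run_with_reexpression([0, 1, 2])` succeeds because the reversed list no longer starts with 0. An input such as `[0, 1, 0]` stays inside the failure region after reversal, which shows that a re-expression helps only when it moves the data out of that region.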
Re-expression with Postexecution Adjustment
[Figure: the input x is executed by P to give P(x); x is also re-expressed as y = R(x), P is executed on y to give P(y), and the result is adjusted for the re-expression to give A(P(y)).]
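When the re-expression changes the expected output, an adjustment A undoes that change. The sketch below assumes a toy maximum-finding program `p` with a seeded initialization fault, a re-expression that shifts every value by an offset, and an adjustment that subtracts the offset again. The names `reexpress`, `adjust`, and `run_adjusted` are hypothetical.

```python
def p(values):
    """Maximum of a list, with a seeded fault: wrong for all-negative input."""
    best = 0                      # seeded fault: should start from values[0]
    for v in values:
        if v > best:
            best = v
    return best


def reexpress(values):
    """R: shift all values so the smallest becomes 1; return y and the offset."""
    offset = 1 - min(values)
    return [v + offset for v in values], offset


def adjust(result, offset):
    """A: undo the shift, since max(x + k) = max(x) + k."""
    return result - offset


def run_adjusted(values):
    y, offset = reexpress(values)
    return adjust(p(y), offset)   # A(P(y))
```

For the input `[-5, -2, -9]`, `p` alone returns 0, while `run_adjusted` returns -2 because the shifted data no longer falls in the failure region.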
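Re-expression via Decomposition and Recombination
[Figure: the input x is executed by P to give P(x); x is also decomposed into x_1, x_2, ..., x_N, P is executed on each part to give P(x_1), P(x_2), ..., P(x_N), and the partial results P(x_i) are recombined to give F(P(x_i)).]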
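A sketch of decomposition and recombination, assuming a program whose result over a whole input can be rebuilt from results over its parts (here a sum of squares). The helper names and the strided split are illustrative choices.

```python
def p(values):
    """Program P: sum of squares of the input values."""
    return sum(v * v for v in values)


def decompose(values, n):
    """Split x into n parts x_1..x_N by taking every n-th element."""
    return [values[i::n] for i in range(n)]


def recombine(partials):
    """F: for a sum of squares, the partial results simply add up."""
    return sum(partials)


def run_decomposed(values, n=3):
    return recombine(p(part) for part in decompose(values, n))
```

With integer inputs `run_decomposed(x)` equals `p(x)` exactly. With floating-point inputs the two may differ in the last digits, so a decision algorithm would need a tolerance.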
Sets in the Output Space
Valid output set: $\{\, y \mid \mathrm{Valid}(x, P_c(y)) \,\}$
Failure set: $\{\, y \mid \lnot \mathrm{Valid}(y, P_c(y)) \,\}$
Identical output set: $\{\, y \mid \mathrm{Correct}(x, P_c(y)) \,\}$
These sets are important in the development of data re-expression algorithms.
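Data Re-expression
[Figure: the re-expression R(x) = y maps the input x to a point y in the input space. The diagram shows the input space and the output space, with the failure region (failure set), the identical output set, and the valid output set marked.]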
Examples of Data Re-expression
Intersection of line segments (exact)
Sort function (exact)
Sensor data (approximate)
What about re-expression via decomposition and recombination?
$\sin(a+b) = \sin(a)\cos(b) + \cos(a)\sin(b)$
$\cos(a) = \sin(\pi/2 - a)$
$\sin(a+b) = \sin(a)\sin(\pi/2 - b) + \sin(\pi/2 - a)\sin(b)$
Data re-expression can be used on numeric data, character strings, differential equations, and other representations. For example, combining tree transformations, data storage re-ordering, and code storage re-ordering provides considerable diversity in the data processed by large fractions of a conventional compiler.
Caution: exact re-expression algorithms may have the defect of preserving precisely those aspects of the data that cause program failure.
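The sine identity above can be turned into a re-expression that calls only the sine routine on four different arguments. In this sketch the function name, the `split` parameter, and the use of `math.sin` as the program under execution are assumptions for illustration.

```python
import math


def sin_via_decomposition(x, p=math.sin, split=0.5):
    """Compute sin(x) as sin(a + b) using only the sine routine p.

    x is decomposed into a + b, and the results are recombined with
    sin(a + b) = sin(a) sin(pi/2 - b) + sin(pi/2 - a) sin(b).
    """
    a = x * split
    b = x - a
    return p(a) * p(math.pi / 2 - b) + p(math.pi / 2 - a) * p(b)
```

Because the routine is evaluated at `a`, `b`, `pi/2 - a`, and `pi/2 - b` rather than at `x`, a fault confined to a small region around `x` may be avoided. The result agrees with a direct evaluation only up to rounding, so this re-expression is approximate in floating point.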
Temporal Diversity
Temporal diversity involves the performance or occurrence of an event at different times. E.g., beginning software execution at different times is effective for transient faults.
Temporal diversity by using data produced at different times can also provide inputs to a data diverse technique: temporal skewing of data.
[Figure: inputs are received at times t_i, t_{i+1}, and t_{i+2}; the software executes on each input and the result is adjudicated; an accepted result is used, and a rejected result is discarded.]
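Temporal skewing can be sketched as a loop over inputs received at successive times, where each result is adjudicated and rejected results are discarded. The function `first_accepted` and its parameters are hypothetical, and the acceptance test stands in for whatever adjudicator the application uses.

```python
def first_accepted(samples, program, accept):
    """Run `program` on inputs taken at successive times.

    samples: iterable of (time, value) pairs, e.g. readings at t_i, t_i+1, ...
    program: the software executed on each value
    accept:  adjudicator returning True if a result is acceptable
    """
    for t, value in samples:
        try:
            result = program(value)
        except Exception:
            continue              # execution failed: discard this sample
        if accept(result):
            return t, result      # accept
        # reject: discard and wait for the next input
    raise RuntimeError("no sample produced an acceptable result")
```

For example, if a transient disturbance corrupts the reading at the first time step, the loop moves on and returns the result computed from the next reading.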
Architectural Structure for Diverse Software
To aid in the avoidance of faults in the first place and the tolerance of the remaining faults, the system complexity must be controlled.
Structuring the hardware and software components that comprise these systems is a key factor in controlling the complexity.
Laprie and colleagues describe two structuring mechanisms:
Layering: we want each layer to have the fault tolerance mechanisms to handle the errors produced in that layer.
Error confinement areas: described in terms of the system hardware and software architecture elements.
Xu and Randell Framework
Targets the development of fault-tolerant applications.
Two abstract classes: Adjudicator and Variant.
[Figure: class hierarchy. User-defined adjudicators (Voter-1, Voter-2, Complex Voter-1) derive from Adjudicator; a user-defined variant hierarchy (Variant-1, Variant-2, Complex Variant-1) derives from Variant.]
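The two abstract classes might be rendered as follows. This is a loose sketch in the spirit of the framework, not its actual interface: the method names `execute` and `adjudicate`, the concrete subclasses, and the median voting rule are all assumptions.

```python
from abc import ABC, abstractmethod
from statistics import median_low


class Variant(ABC):
    """Abstract base for user-defined variants (method name is assumed)."""

    @abstractmethod
    def execute(self, x):
        ...


class Adjudicator(ABC):
    """Abstract base for user-defined adjudicators (method name is assumed)."""

    @abstractmethod
    def adjudicate(self, results):
        ...


class SquareByMultiply(Variant):
    def execute(self, x):
        return x * x


class SquareByPower(Variant):
    def execute(self, x):
        return x ** 2


class FaultySquare(Variant):
    def execute(self, x):
        return x + x              # seeded fault for illustration


class MedianVoter(Adjudicator):
    def adjudicate(self, results):
        return median_low(results)  # middle value masks a single outlier


def run(variants, adjudicator, x):
    return adjudicator.adjudicate([v.execute(x) for v in variants])
```

Keeping variants and adjudicators behind separate abstract classes lets an application swap in a different voter or add a variant without changing the code that runs them.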