Design and Evaluation of a Window-Consistent Replication Service

IEEE TRANSACTIONS ON COMPUTERS, VOL. 46, NO. 9, SEPTEMBER 1997

Ashish Mehra, Student Member, IEEE, Jennifer Rexford, Member, IEEE, and Farnam Jahanian, Member, IEEE

Abstract: Real-time applications typically operate under strict timing and dependability constraints. Although traditional data replication protocols provide fault tolerance, real-time guarantees require bounded overhead for managing this redundancy. This paper presents the design and evaluation of a window-consistent primary-backup replication service that provides timely availability of the repository by relaxing the consistency of the replicated data. The service guarantees controlled inconsistency by scheduling update transmissions from the primary to the backup(s); this ensures that client applications interact with a window-consistent repository when a backup must supplant a failed primary. Experiments on our prototype implementation, on a network of Intel-based PCs running RT-Mach, show that the service handles a range of client loads while maintaining bounds on temporal inconsistency.

Index Terms: Real-time systems, fault tolerance, replication protocols, temporal consistency, scheduling.

1 INTRODUCTION

MANY embedded real-time applications, such as automated manufacturing and process control, require timely access to a fault-tolerant data repository. Fault-tolerant systems typically employ some form of redundancy to insulate applications from failures. Time redundancy protects applications by repeating computation or communication operations, while space redundancy masks failures by replicating physical resources. The time-space trade-offs employed in most systems may prove inappropriate for achieving fault tolerance in a real-time environment. In particular, when time is scarce and the overhead for managing redundancy is too high, alternative approaches must balance the trade-off between timing predictability and fault tolerance. For example, consider the process-control system shown in Fig. 1a.
A digital controller supports monitoring, control, and actuation of the plant (external world). The controller software executes a tight loop, sampling sensors, calculating new values, and sending signals to external devices under its control. It also maintains an in-memory data repository, which is updated frequently during each iteration of the control loop. The data repository must be replicated on a backup controller to meet the strict timing constraint on system recovery when the primary controller fails, as shown in Fig. 1b. In the event of a primary failure, the system must switch to the backup node within a few hundred milliseconds. Since there can be hundreds of updates to the data repository during each iteration of the control loop, it is impractical (and perhaps impossible) to update the backup synchronously each time the primary repository changes. An alternative solution exploits the data semantics in a process-control system by allowing the backup to maintain a less current copy of the data that resides on the primary. The application may have distinct tolerances for the staleness of different data objects. With sufficiently recent data, the backup can safely supplant a failed primary; the backup can then reconstruct a consistent system state by extrapolating from previous values and new sensor readings. However, the system must ensure that the distance between the primary and the backup data is bounded within a predefined time window.

A. Mehra is with the Multimedia Networking Department, IBM T.J. Watson Research Center, Hawthorne, NY. E-mail: ashish@eecs.umich.edu. J. Rexford is with Network Mathematics Research, Networking and Distributed Systems, AT&T Labs Research, Florham Park, NJ. E-mail: jrex@research.att.com. F. Jahanian is with the Real-Time Computing Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI. E-mail: farnam@eecs.umich.edu. Manuscript received 21 Jan For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number
Data objects may have distinct tolerances in how far the backup can lag behind before the object state becomes stale. The challenge is to bound the distance between the primary and the backup such that consistency is not compromised, while minimizing the overhead in exchanging messages between the primary and its backup.

This paper presents the design and implementation of a data replication service that combines fault-tolerant protocols, real-time scheduling, and temporal consistency semantics to accommodate such system requirements [24], [29]. A client application registers a data object with the service by declaring the consistency requirements for the data in terms of a time window. The primary selectively transmits to the backup, as opposed to sending an update every time an object changes, to bound both resource utilization and data inconsistency. The primary ensures that each backup site maintains a version of the object that was valid on the primary within the preceding time window by scheduling these update messages.

The next section discusses related work on fault-tolerant protocols and relaxed consistency semantics, with an emphasis on supporting real-time applications. Section 3 describes the proposed window-consistent primary-backup architecture and replication protocols for maintaining controlled inconsistency within the service. This replication model introduces a number of interesting issues in scheduling, fault detection, and system recovery. Section 4 considers real-time scheduling algorithms for creating and maintaining a window-consistent backup, while Section 5 presents techniques for fault detection and recovery for primary, backup, and communication failures. In Section 6, we present and evaluate an implementation of the window-consistent replication service on a network of Intel-based PCs running RT-Mach [32]. Section 7 concludes the paper by highlighting the limitations of this work and discussing future research directions.

Fig. 1. Computer control system. (a) Digital controller interacting with a plant. (b) Primary-backup control system.

2 RELATED WORK

2.1 Replication Models

A common approach to building fault-tolerant distributed systems is to replicate servers that fail independently. In active (state-machine) replication schemes [6], [30], a collection of identical servers maintain copies of the system state. Client write operations are applied atomically to all of the replicas so that after detecting a server failure, the remaining servers can continue the service. Passive (primary-backup) replication [2], [9], on the other hand, distinguishes one replica as the primary server, which handles all client requests. A write operation at the primary invokes the transmission of an update message to the backup servers. If the primary fails, a failover occurs and one of the backups becomes the new primary. In recent years, several fault-tolerant distributed systems have employed state-machine [7], [11], [26] or primary-backup [4], [5], [9] replication. In general, passive replication schemes have longer recovery delays since a backup must invoke an explicit recovery algorithm to replace a failed primary.
On the other hand, active replication typically incurs more overhead in responding to client requests since the service must execute an agreement protocol to ensure atomic ordered delivery of messages to all replicas. In both replication models, each client write operation generates communication within the service to maintain agreement amongst the replicas. This artificially ties the rate of write operations to the communication capacity in the service, limiting system throughput while ensuring consistent data. Past work on server replication has focused, in most cases, on improving throughput and latency for client requests. For example, Fig. 2a shows the basic primary-backup model, where a client write operation at the primary P triggers a synchronous update to the backup B [4]. The service can improve response time by allowing the backup B to acknowledge the client C [2], as shown in Fig. 2b. Finally, the primary can further reduce write latency by replying to C immediately after sending an update message to B, without waiting for an acknowledgment [8], as shown in Fig. 2c. Similar performance optimizations apply to the state-machine replication model. Although these techniques significantly improve average performance, they do not guarantee bounded worst-case delay, since they do not limit communication within the service.

Fig. 2. Primary-backup models. (a) Blocking. (b) Efficient blocking. (c) Nonblocking.

Synchronization of redundant servers poses additional challenges in real-time environments, where applications operate under strict timing and dependability constraints; server replication for hard real-time systems is under investigation in several recent experimental projects [15], [16], [33]. Synchronization overheads, communication delay, and interaction with the external environment complicate the design of replication protocols for real-time applications. These overheads must be quantified precisely for the system to satisfy real-time constraints.

2.2 Consistency Semantics

A replication service can bound these overheads by relaxing the data consistency requirements in the repository. For a large class of real-time applications, the system can recover from a server failure, even though the servers may not have maintained identical copies of the replicated state. This facilitates alternative approaches that trade atomic or causal consistency among the replicas for less expensive replication protocols. Enforcing a weaker correctness criterion has been studied extensively for different purposes and application areas. In particular, a number of researchers have observed that serializability is too strict a correctness criterion for real-time databases. Relaxed correctness criteria facilitate higher concurrency by permitting a limited amount of inconsistency in how a transaction views the database state [12], [17], [18], [20], [28]. Similarly, imprecise computation guarantees timely completion of an application by relaxing the accuracy requirements of the computation [22]. This is particularly useful in applications that use discrete samples of continuous-time variables, since these values can be approximated when there is not sufficient time to compute an exact value. Weak consistency can also improve performance in non-real-time applications. For instance, the quasi-copy model permits some inconsistency between the central data and its cached copies at remote sites [1]. This gives the scheduler more flexibility in propagating updates to the cached copies. In the same spirit, window-consistent replication allows computations that may otherwise be disallowed by existing active or passive protocols that require atomic updates to a collection of replicas.
3 WINDOW-CONSISTENT REPLICATION

The window-consistent replication service consists of a primary and one or more backups, with the data on the primary shadowed at each backup site. These servers store objects which change over time, in response to client interaction with the primary. In the absence of failures, the primary satisfies all client requests and supplies a data-consistent repository. However, if the primary crashes, a window-consistent backup performs a failover to become the new primary. Hence, service availability hinges on the existence of a window-consistent backup to supplant a failed primary.

3.1 System Model

Unlike the primary-backup protocols in Fig. 2, the window-consistent replication model decouples client read and write operations from communication within the service. As shown in Fig. 3, the primary object manager (OM) handles client data requests, while sending messages to the backups at the behest of the update scheduler (US). Since read and write operations do not trigger transmissions to the backup sites, client response time depends only on local operations at the primary. This allows the primary to handle a high rate of client requests while independently sending update messages to the backup sites. Although these update transmissions must accommodate the temporal consistency requirements of the objects, the primary cannot compromise the client application's processing demands. Hence, the primary must match the update rate with the available processing and network bandwidth by selectively transmitting messages to the backups. The primary executes an admission control algorithm as part of object creation to ensure that the US can schedule sufficient update transmissions for any new objects. Unlike client reads and writes, object creation and deletion require complete agreement between the primary and all the backups in the replication service.

Fig. 3. Window-consistent primary-backup architecture.

3.2 Consistency Semantics

The primary US schedules transmissions to the backups to ensure that each replica has a sufficiently recent version of each object.
Timestamps $t_P(t)$ and $t_B(t)$ identify successive versions of object $O$ at the primary and backup sites, respectively. At time $t$, the primary P has a copy of $O$ written by the client application at time $t_P(t) \le t$, while a backup B stores a, possibly older, version originally written on P at time $t_B(t) \le t$. While B may have an older version of $O$ than P, the copy on B must be recent enough. If $O$ has window $\delta$, a window-consistent backup must believe in data that was valid on P within the last $\delta$ time units.

DEFINITION 1. At time $t$, a backup copy of object $O$ has window-inconsistency $t - t'$, where $t'$ is the maximum time $t' \le t$ such that $t_P(t') = t_B(t)$. Object $O$ is window-consistent if and only if $t - t' \le \delta$; a backup B is window-consistent if and only if all of its objects are window-consistent.

In other words, B has a window-consistent copy of object $O$ at time $t$ if and only if

$$t_P(t - \delta) \le t_B(t) \le t_P(t).$$

For example, in Fig. 4, P performs several write operations on $O$, on behalf of client requests, but selectively transmits update messages to B. At time $t$, the primary has the most recent version of the object, written by the client at time $d$. The backup has a copy first recorded on the primary at time $b$; the primary stopped believing this version at time $c$.
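Definition 1 and its equivalent inequality can be checked mechanically. The following is a minimal Python sketch, assuming the primary's full write history is available as a sorted list of times (the omniscient-observer view the definition uses); the function names and the example values (a = 1, b = 4, c = 7, d = 9, t = 10, δ = 8, loosely mirroring the labels of Fig. 4) are our own, not part of the service.

```python
from bisect import bisect_right


def t_primary(writes, t):
    """t_P(t): time of the latest client write at or before t.

    `writes` is a sorted list of write times on the primary, i.e. the
    omniscient observer's view assumed in Definition 1.
    """
    k = bisect_right(writes, t)
    if k == 0:
        raise ValueError(f"no version exists at time {t}")
    return writes[k - 1]


def is_window_consistent(writes, t_b, t, delta):
    """Equivalent form of Definition 1: t_P(t - delta) <= t_B(t) <= t_P(t)."""
    return t_primary(writes, t - delta) <= t_b <= t_primary(writes, t)


def window_inconsistency(writes, t_b, t):
    """t - t': how long ago, as of time t, the primary stopped believing t_b.

    Returns 0 if the primary still holds version t_b at time t.
    """
    k = writes.index(t_b)              # t_b must be a version the primary wrote
    if k + 1 < len(writes) and writes[k + 1] <= t:
        return t - writes[k + 1]       # replaced by the next client write
    return 0
```

With δ = 8, the backup version b = 4 is window-consistent at t = 10, with window-inconsistency t - c = 3; shrinking the window to δ = 2 makes the same copy inconsistent, and the two tests agree.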

Fig. 4. Window-consistency semantics.

Thus, $t_P(t) = d$, $t_B(t) = b$, and $t_P(t - \delta) = a$. Since $a \le b \le d$, B has a window-consistent version of $O$ at time $t$. The backup object has inconsistency $t - c$, which is less than its window-consistency requirement $\delta$. A small value of $t - c$ allows the client to operate with a more recent copy of the object if the backup must supplant a failed primary. The metric $t - t'$ represents an object's temporal inconsistency within the replication service, as seen by an omniscient observer. Since the backup site does not always have up-to-date knowledge of client operations, the backup has a more conservative view of temporal consistency, as discussed in Section 5.2.

The client may also require bounds on the staleness of the backup's object, relative to the primary's copy, to construct a valid system state when a failover occurs. In particular, if the client reads $O$ at time $t$ on P, it receives the version that it wrote $t - t_P(t)$ time units ago. On the other hand, if B supplants a failed primary, the client would read the version that it wrote $t - t_B(t)$ time units ago. This version is $t_P(t) - t_B(t)$ time units older than that on the primary; in Fig. 4, this client view has inconsistency $d - b$.

DEFINITION 2. At time $t$, object $O$ has recovery inconsistency $t_P(t) - t_B(t)$.

Two components contribute to this recovery inconsistency: client write patterns and the temporal inconsistency within the service. Window-consistent replication bounds the latter, allowing the client to bound recovery inconsistency based on its access patterns. For example, suppose consecutive client writes occur at most $w$ time units apart; typically, $w$ is smaller than $\delta$, since the primary sends only selective updates to the backup sites. The window-consistency bound $t - t' \le \delta$ then ensures that the backup's copy of the object was written on the primary no earlier than time $t - (\delta + w)$. Since $t_P(t) \le t$, window consistency guarantees that $t_P(t) - t_B(t) \le \delta + w$.
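The bound on recovery inconsistency follows in three steps. This expansion is ours and assumes, as above, that consecutive client writes are at most $w$ apart, so the primary replaces any version within $w$ time units of writing it:

$$
\begin{aligned}
t' &\ge t - \delta && \text{(window consistency, Definition 1)}\\
t_B(t) &\ge t' - w && \text{(P still held version } t_B(t) \text{ at } t' \text{, and no version lasts longer than } w\text{)}\\
t_P(t) - t_B(t) &\le t - (t' - w) \le t - (t - \delta - w) = \delta + w && \text{(using } t_P(t) \le t\text{)}
\end{aligned}
$$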
4 REAL-TIME UPDATE SCHEDULING

This section describes how the primary can use existing real-time task scheduling algorithms to coordinate update transmissions to the backups. In the absence of link (performance or crash) failures [10], we assume a bound $\ell$ on the end-to-end communication latency within the service. For example, a real-time channel [14], [23] with the desired bound could be established between the primary and the backups. Several other approaches to providing bounds on communication latency are discussed in [3]. If a client operation modifies $O_i$, the primary must send an update for the object within the next $\delta_i - \ell$ time units; otherwise, the backups may not receive a sufficiently recent version of $O_i$ before the time window $\delta_i$ elapses. In order to bound the temporal inconsistency within the service, it suffices that the primary send $O_i$ to the backups at least once every $\delta_i - \ell$ time units. While bounding the temporal inconsistency, the primary may send additional updates to the backups if sufficient processing and network capacity are available; these extra transmissions increase the service's resilience to lost update messages and the average goodness of the replicated data.

In addition to sending update transmissions to the backups, the primary must allow efficient integration of new backups into the replication service. Limited processing and network capacity necessitate a trade-off between timely integration of a new backup and keeping existing backups window-consistent. The primary should minimize the time to integrate a new replica, especially when there are no other window-consistent backups, since a subsequent primary crash would result in a service failure. The primary constructs a schedule that sends each object to the backup exactly once, and allows the primary to smoothly transition to the update transmission schedule. While several task models can accommodate the requirements of window-consistent scheduling and backup integration, we initially consider the periodic task model [19], [21].
4.1 Periodic Scheduling of Updates

The transmissions of updates can be cast as tasks that run periodically with deadlines derived from the objects' window-consistency requirements. The primary coordinates transmissions to the backups by scheduling an update task with period $p_i$ and service time $e_i$ for each object $O_i$;¹ for window consistency, this permits a maximum period $p_i = (\delta_i - \ell)/2$. (Two successive transmissions of $O_i$ can fall at the start of one period and the end of the next, almost $2p_i$ apart, so $2p_i \le \delta_i - \ell$ guarantees a transmission at least once every $\delta_i - \ell$ time units.) The end of a period serves as both the deadline for one invocation of the task and the arrival time for the subsequent invocation. The scheduler always runs the ready task with the highest priority, preempting execution if a higher-priority task arrives. For example, rate-monotonic scheduling statically assigns higher priority to tasks with shorter periods [19], [21], while earliest-due-date scheduling favors tasks with earlier deadlines [21]. The scheduling algorithm, coupled with the object parameters $e_i$ and $\delta_i$, determines a schedulability criterion based on the total processor and network utilization. The schedulability criterion governs object admission into the replication service. The primary rejects an object registration request (specifying $e_i$ and $\delta_i$) if it cannot schedule sufficient updates for the new object without jeopardizing the window consistency of existing objects, i.e., it does not have sufficient processing and network resources to accommodate the object's window-consistency requirements. The scheduling algorithm maintains window consistency for all objects as long as the collection of tasks does not exceed a certain bound on resource utilization (e.g., 0.69 for rate-monotonic and 1 for earliest-due-date) [21].

¹ The size of $O_i$ determines the time $e_i$ required for each update transmission. In order to accommodate preemptive scheduling and objects of various sizes, the primary can send an update message as one or more fixed-length packets.

4.2 Compressing the Periodic Schedule

While the periodic model can guarantee sufficient updates for each object, the schedule updates $O_i$ only once per period $p_i$, even if computation and network resources permit more frequent transmissions. This restriction arises because the periodic model assumes that a task becomes ready to run only at period boundaries. However, the primary can transmit the current version of an object at any time. The scheduler can capitalize on this readiness of tasks to improve both resource utilization and the window consistency on the backups by compressing the periodic schedule.

Consider two objects $O_1$ (with $p_1 = 5$ and $e_1 = 2$) and $O_2$ ($p_2 = 3$ and $e_2 = 1$), as shown in Fig. 5; the unshaded boxes denote transmission of $O_1$, while the shaded boxes signify transmission of $O_2$. The scheduler must send an update for $O_2$, requiring one unit of processing time, once every three time units, and an update for $O_1$, requiring two units of processing time, once every five time units. The schedule repeats after each major cycle of length 15. Each time unit corresponds to a tick, which is the granularity of resource allocation for processing and transmission of a packet. For this example, both the rate-monotonic and earliest-due-date algorithms generate the schedule shown in Fig. 5a. While each update is sent as required in the major cycle of length 15, the schedule has four units of slack time. The replication service can capitalize on this slack time to improve the average temporal consistency of the backup objects. In particular, the periodic schedule in Fig. 5a can provide the order of task executions without restricting the time the tasks become active.
If no tasks are ready to run, the scheduler can advance to the earliest pending task and activate that task by advancing the logical time to the start of the next period for that object. With the compressed schedule, the primary still transmits an update for each $O_i$ at least once per period $p_i$ but can send more frequent update messages when time allows. As shown in Fig. 5b, compressing the slack time allows the schedule to start over at time 11. In the worst case, the compressed schedule degrades to the periodic schedule with the associated guarantees.

Fig. 5. Compression ($p_1 = 5$, $e_1 = 2$, $p_2 = 3$, $e_2 = 1$). (a) Periodic schedule. (b) Compressed periodic schedule.

4.3 Integrating a New Backup

To minimize the time the service operates without a window-consistent backup, the primary P needs an efficient mechanism to integrate a new or invalid backup B. P must send the new backup B a copy of each object and then transition to the normal periodic schedule, as shown in Fig. 6. Although B may not have window-consistent objects during the execution of the integration schedule, each object must become consistent and remain consistent until its first update in the normal periodic schedule. As a result, B must receive a copy of $O_i$ within the period $p_i$ before the periodic schedule begins; this ensures that B can afford to wait until the next $p_i$ interval to start receiving periodic update messages for $O_i$. In order to integrate the new backup, then, the primary must execute an integration schedule that would allow it to transition to the periodic schedule while maintaining window consistency. Referring to Fig. 6, a window-consistent transition requires

$$D_j^{\mathrm{prior}} + D_j^{\mathrm{post}} \le \delta_j - \ell,$$

where $D_j^{\mathrm{prior}}$ is the time elapsed from the last transmission of $O_j$ to the end of the integration schedule, while $D_j^{\mathrm{post}}$ is the time from the start of the periodic schedule until the first transmission of $O_j$. This ensures window consistency for each object, even across the schedule transition. Since the periodic task model provides $D_j^{\mathrm{post}} \le p_j$, it suffices to ensure that $D_j^{\mathrm{prior}} \le p_j = (\delta_j - \ell)/2$.
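To make the Fig. 5 example concrete, here is a small Python sketch of preemptive rate-monotonic tick scheduling with optional compression, plus the utilization-based admission test of Section 4.1. It is an illustration under stated assumptions, not the prototype's code: periods and service times are integers in ticks, each tick carries one packet, the rate-monotonic bound $n(2^{1/n}-1)$ (about 0.69 for large $n$) is used as a sufficient test, and the names `Task`, `max_period`, `admit`, and `rm_schedule` are hypothetical. Compression is modeled as described above: when no task is ready, logical time jumps to the next period boundary without consuming a real tick. `math.lcm` requires Python 3.9 or later.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    period: int   # p_i, in ticks
    service: int  # e_i, in ticks


def max_period(delta, ell):
    """Largest period p_i = (delta_i - ell) / 2 that preserves window consistency."""
    return (delta - ell) / 2


def admit(tasks, policy="rm"):
    """Utilization test: n(2^(1/n) - 1) for rate-monotonic, 1 for earliest-due-date."""
    n = len(tasks)
    u = sum(t.service / t.period for t in tasks)
    bound = n * (2 ** (1 / n) - 1) if policy == "rm" else 1.0
    return u <= bound


def rm_schedule(tasks, compress=False):
    """Simulate one major cycle of preemptive rate-monotonic tick scheduling.

    Returns (trace, real_length): trace[k] is the task sent in real tick k,
    or None for an idle tick. With compress=True, idle ticks are skipped by
    advancing logical time to the next period boundary.
    """
    hyper = math.lcm(*(t.period for t in tasks))
    by_priority = sorted(tasks, key=lambda t: t.period)  # shorter period first
    remaining = {t.name: 0 for t in tasks}
    trace, logical = [], 0
    while logical < hyper:
        for t in tasks:                       # new invocation at each period start
            if logical % t.period == 0:
                remaining[t.name] = t.service
        ready = [t for t in by_priority if remaining[t.name] > 0]
        if ready:
            remaining[ready[0].name] -= 1     # send one packet of the top task
            trace.append(ready[0].name)
            logical += 1
        elif compress:
            # jump to the earliest pending arrival; no real tick is consumed
            logical = min((logical // t.period + 1) * t.period for t in tasks)
        else:
            trace.append(None)                # idle (slack) tick
            logical += 1
    return trace, len(trace)
```

For $O_1$ ($p_1 = 5$, $e_1 = 2$) and $O_2$ ($p_2 = 3$, $e_2 = 1$), the uncompressed trace spans the 15-tick major cycle with 4 idle ticks, while the compressed trace sends the same sequence of updates in 11 ticks, in line with Figs. 5a and 5b.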
A simple schedule for integration is to send objects to the new backup using the normal periodic schedule already being used for update transmissions to the existing replicas. This incurs a worst-case delay of $2\max_i\{p_i\}$ to integrate the new backup into the service. However, if the service has no window-consistent backup sites, the primary should minimize the time required to integrate a new replica. In particular, an efficient integration schedule should transmit each object exactly once before transitioning to the normal periodic schedule.

The primary may adapt the normal periodic schedule into an efficient integration schedule by removing duplicate object transmissions. In particular, the primary can transmit the objects in order of their last update transmissions before the end of a major cycle in the normal schedule. For example, for the schedule shown in Fig. 5a, the integration schedule is $[O_1, O_2]$ because the last transmission for $O_1$ ($O_2$) before time 15 is at time 10 (12). A transition from the integration schedule to the normal schedule sustains window consistency on the newly integrated backup since the normal schedule guarantees window consistency across major cycles. Since the integration schedule is derived from the periodic schedule, it follows that $D_j^{\mathrm{prior}} \le p_j$. The normal schedule order can be determined when objects are created or during the first major cycle of the normal schedule. Since the schedule transmits each object only once, the integration delay is $\sum_{i=1}^{N} e_i$, where $N$ is the number of registered objects.

Fig. 6. Integrating a new backup repository.

Although this approach is efficient for static object sets, dynamic creation and deletion of objects introduces more complexity. Since the transmission order in the normal schedule depends on the object set, the primary must recompute the integration schedule whenever a new object enters the service. The cost of constructing an integration schedule, especially for dynamic object sets, can be reduced by sending the objects to B in reverse period order, such that the objects with larger periods are sent before those with smaller periods. For object $O_j$, this ensures that only objects with smaller or equivalent periods can follow $O_j$ in the integration schedule; these same objects can precede $O_j$ in the periodic schedule. This guarantees that the integration schedule transmits $O_j$ no more than $p_j$ time units before the start of the periodic schedule, ensuring a window-consistent transition. For example, in Fig. 6, $p_i \le p_j \le p_k$. Objects $O_i$ with $p_i \le p_j$ are transmitted at least once within time $D_j^{\mathrm{post}}$ in the periodic schedule, but exactly once within time $D_j^{\mathrm{prior}}$ in the integration schedule; it follows that $D_j^{\mathrm{prior}} \le D_j^{\mathrm{post}} \le p_j$. After object creations or deletions, the primary can construct the new integration schedule by sorting the new set of periods.

Fig. 7. Update protocols.
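Both integration orders described above are easy to compute. The Python sketch below hard-codes our own tick-by-tick reading of the Fig. 5a rate-monotonic schedule (`FIG5A`, with `None` for idle ticks) and represents objects as a hypothetical `name -> (p_i, e_i)` map. It also checks the sufficient transition condition $D_j^{\mathrm{prior}} \le p_j$, taking $D_j^{\mathrm{prior}}$ as the service time of the objects sent after $O_j$; all names are illustrative.

```python
# Our reading of the rate-monotonic schedule for the Fig. 5 example
# (O1: p=5, e=2; O2: p=3, e=1); None marks an idle tick.
FIG5A = ["O2", "O1", "O1", "O2", None, "O1", "O2", "O1", None,
         "O2", "O1", "O1", "O2", None, None]

# name -> (period p_i, service time e_i)
TASKS = {"O1": (5, 2), "O2": (3, 1)}


def integration_from_trace(trace):
    """Objects ordered by their last transmission in one major cycle."""
    last = {}
    for tick, name in enumerate(trace):
        if name is not None:
            last[name] = tick          # later ticks overwrite earlier ones
    return sorted(last, key=last.get)


def integration_reverse_period(tasks):
    """Larger periods first; ties keep insertion order (sorted is stable)."""
    return sorted(tasks, key=lambda n: tasks[n][0], reverse=True)


def integration_delay(tasks):
    """Each object is sent exactly once: the sum of e_i over all N objects."""
    return sum(e for _, e in tasks.values())


def transition_ok(order, tasks):
    """Check D_j^prior <= p_j for every object in the integration order."""
    for k, name in enumerate(order):
        d_prior = sum(tasks[m][1] for m in order[k + 1:])
        if d_prior > tasks[name][0]:
            return False
    return True
```

Adding a hypothetical third object $O_3$ ($p_3 = 15$, $e_3 = 1$) only requires re-sorting the periods in reverse order, giving $[O_3, O_1, O_2]$ with an integration delay of 4 ticks, whereas the trace-based order would require recomputing the periodic schedule.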
The primary minimizes the time it operates without a window-consistent backup by transmitting each object exactly once before transitioning to the normal periodic schedule.

5 FAULT DETECTION AND RECOVERY

Although real-time scheduling of update messages can maintain window-consistent replicas, processor and communication failures potentially disrupt system operation. We assume that servers may suffer crash failures and the communication subsystem may suffer omission or performance failures; when a site fails, the remaining replicas must recover in a timely manner to continue the data-repository service. The primary attempts to minimize the time it operates without a window-consistent backup, since a subsequent primary crash would cause a service failure. Similarly, the backup tries to detect a primary crash and initiate failover before any backup objects become window-inconsistent. Although the primary and backup cannot have complete knowledge of the global system state, the message exchange between servers provides a measure of recent service activity.

5.1 Update Protocols

Fig. 7 shows how the primary and backup sites exchange object data and estimate global system state. We assume that the servers communicate only by exchanging messages. Since these messages include temporal information, P and B cannot effectively reason about each other unless server clocks are synchronized within a known maximum bound. A clock synchronization algorithm can use the transmit times for the update and acknowledgment messages to bound clock skew in the service. Using the update protocols, P and B each approximate global state by maintaining the most recent information received from the other site. Before transmitting an update message at time $t$, the primary records the version timestamp $t_B^{\mathrm{xmt}}$ for the selected object $O$. Since $t_B^{\mathrm{xmt}} \ge t_B(t)$, this information gives P an optimistic view of the backup's window consistency. The primary's message to the backup contains the object data, along with the version timestamp and the transmission time.
B uses the transmission time to detect out-of-order message arrivals by maintaining $t^{\mathrm{xmt}}$, the time of the most recent transmission of $O$ that has been successfully received; the sites store monotonically nondecreasing version timestamps, without requiring reliable or in-order message delivery in the service. Upon receiving a newer transmission of $O$, the backup updates the object's data, the version timestamp $t_B$, and $t^{\mathrm{xmt}}$; as discussed in Section 5.2, the backup uses $t^{\mathrm{xmt}}$ to reason about its own window consistency. To diagnose a crashed primary, B also maintains $t^{\mathrm{last}}$, the transmission time of the last message received from P regarding any object; that is, $t^{\mathrm{last}} = \max_i\{t_i^{\mathrm{xmt}}\}$. Similarly, P tracks the transmission times of B's messages to diagnose possible crash failures. Hence, the backup's acknowledgment message to P includes the transmission time $t$, as well as $t_B$, the most recent version timestamp for $O$ on B. Using this information, the primary determines $t_B^{\mathrm{ack}}$, the most recent version of $O$ that B has successfully acknowledged. Since $t_B^{\mathrm{ack}} \le t_B(t)$, this variable gives P a pessimistic measure of the backup's window consistency; as discussed in Section 5.3, the primary uses $t_B^{\mathrm{ack}}$ and $t_B^{\mathrm{xmt}}$ to select policies for scheduling update transmissions to the backup.

5.2 Backup Recovery From Primary Failures

A backup site must estimate its own window consistency and the status of the primary to successfully supplant a crashed primary. While B may be unaware of recent client interaction with P for each object, B does know the time $t^{\mathrm{xmt}}$ when P transmitted version $t_B$ of object $O$. Although P may continue to believe version $t_B$ even after transmitting the update message, B conservatively estimates that the client wrote a new version of $O$ just after P transmitted the object at time $t^{\mathrm{xmt}}$. In particular,

DEFINITION 3. At time $t$, the backup copy of object $O$ has estimated inconsistency $t - t^{\mathrm{xmt}}$; the backup knows that $O$ is window-consistent if $t - t^{\mathrm{xmt}} \le \delta$.

Fig. 4 shows an example of this backup view of window consistency. Using this consistency metric, the backup must balance the possibility of becoming window-inconsistent with the likelihood of falsely diagnosing a primary crash. If B believes that all of its objects are still window-consistent, B need not trigger a failover until further delay would endanger the consistency of a backup object; in particular, the backup conservatively estimates that its copy of $O$ could become window-inconsistent by time $t^{\mathrm{xmt}} + \delta$, in the absence of further update messages from P. However, to reduce the likelihood of false failure detection, failover should only occur if B has not received any messages from P for some minimum time $\beta$. In this adaptive failure detection mechanism, B diagnoses a primary crash at time

$$t^{\mathrm{crash}} = \min_i\{t_i^{\mathrm{xmt}} + \delta_i\}$$

if and only if $t^{\mathrm{crash}} \ge t^{\mathrm{last}} + \beta$.
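The backup's side of the update protocol (Section 5.1) and the adaptive failure detector above fit in a few lines of state. This Python sketch assumes synchronized clocks, one entry per registered object, and hypothetical class and method names. Where the text leaves the case $t^{\mathrm{crash}} < t^{\mathrm{last}} + \beta$ open, the sketch assumes B simply waits until $t^{\mathrm{last}} + \beta$, so a crash is diagnosed at $\max(t^{\mathrm{crash}}, t^{\mathrm{last}} + \beta)$.

```python
from dataclasses import dataclass


@dataclass
class BackupObject:
    delta: float                   # window delta_i
    data: object = None
    t_b: float = float("-inf")     # version timestamp t_B
    t_xmt: float = float("-inf")   # send time of the newest update received


class Backup:
    def __init__(self, beta):
        self.beta = beta               # minimum silence from P before failover
        self.objects = {}
        self.t_last = float("-inf")    # max over objects of t_xmt

    def register(self, name, delta):
        self.objects[name] = BackupObject(delta)

    def on_update(self, name, data, version, t_xmt):
        """Apply an update; return t_B for the ack, or None if it is stale."""
        obj = self.objects[name]
        if t_xmt <= obj.t_xmt:
            return None                # duplicate or out-of-order: keep newer copy
        obj.data, obj.t_b, obj.t_xmt = data, version, t_xmt
        self.t_last = max(self.t_last, t_xmt)
        return obj.t_b

    def known_consistent(self, name, now):
        """Definition 3: O is known window-consistent if now - t_xmt <= delta."""
        obj = self.objects[name]
        return now - obj.t_xmt <= obj.delta

    def t_crash(self):
        return min(o.t_xmt + o.delta for o in self.objects.values())

    def primary_crashed(self, now):
        """Fail over once some object may become inconsistent AND P has been
        silent for at least beta (assumed reading of the rule above)."""
        return now >= max(self.t_crash(), self.t_last + self.beta)
```

For instance, if objects x ($\delta = 10$) and y ($\delta = 6$) were last transmitted at times 1 and 2, then $t^{\mathrm{crash}} = \min(11, 8) = 8$, and with $\beta = 2$ the backup fails over at time 8 unless another update arrives first; a late duplicate carrying an older transmission time is ignored.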
After failover, the new primary site invokes the client application and begins interacting with the external environment. For a period of time, the new P operates with some partially inconsistent data but gradually constructs a consistent system state from these old values and new sensor readings. The new P later integrates a fresh backup to enhance future service availability. Since B diagnoses a primary crash through missed update messages, lost or delayed messages could still trigger false failure detection, resulting in multiple active primary sites. When the system has multiple backups, the replicas can vote to select a single, valid primary. However, when the service has only two sites, communication failures can cause each site to assume the other has failed. In this situation, a third-party witness [27] can select the primary site. This witness does not act as a primary or backup server, but casts the deciding vote in failure diagnosis. In a real-time control system, the actuator devices could implicitly serve as this witness; if a new server starts issuing commands to the actuators, the devices could ignore subsequent instructions from the previous primary site.

5.3 Primary Recovery From Backup Failures

Service availability also depends on timely recovery from backup failures. Since the data-repository service continues whenever a valid primary exists, the primary can temporarily tolerate backup crashes or communication failures without endangering the client application. Ultimately, though, P should minimize the portion of time it operates without a window-consistent backup, since a subsequent primary crash would cause a service failure. The primary should diagnose possible backup crashes and efficiently integrate new backup sites. If P believes that an operational backup has become window-inconsistent, due to lost update messages or transient overload conditions, the primary should quickly refresh the inconsistent objects. As in Section 5.2, timeout mechanisms can detect possible server failures.
The primary assumes that the backup has crashed if P has not received any acknowledgment messages in the last $\alpha$ time units (i.e., $t - t_{last} \ge \alpha$). After detecting a backup crash, P can integrate a fresh backup site into the system while continuing to satisfy client read and write requests. If P mistakenly diagnoses a backup crash, the system must operate with one less replica while the primary integrates a new backup site; this new backup does not become window-consistent until the integration schedule completes, as described in Section 4.3. However, if the backup has actually failed, a large timeout value increases the failure-diagnosis latency, which also increases the time the system operates without sufficient backup sites. Hence, P must carefully select $\alpha$ to maximize the backup's chance of recovering from a subsequent primary failure.

Even if the backup site does not crash, delayed or lost update messages can compromise the window consistency of backup objects, making B ineligible to replace a crashed primary. Using $t_{ack}$ and $t_{xmt}$, P can estimate the consistency of backup objects and select the appropriate policy for scheduling update transmissions. The primary may choose to reintegrate an inconsistent backup, even when $t - t_{last} < \alpha$, rather than wait for a later update message to restore the objects' window consistency. Suppose the primary thinks that B's copy of O is window-inconsistent. Under periodic update scheduling, P may not send another update message for this object until some time $2p \approx \delta$ later. If this object has a large window $\delta$, the primary can reestablish the backup's window consistency more quickly by executing the integration schedule, which requires time $\sum_{O} e_O$, where $e_O$ is the service time for object O, as described in Section 4.1. Still, the primary cannot accurately determine whether the backup object O is inconsistent, since lost or delayed acknowledgment messages can result in an overly pessimistic value for $t_{ack}$. The primary should not be overly aggressive in diagnosing inconsistent backup objects, since reintegration temporarily prohibits the backup from replacing a failed primary. Instead, P should ideally retransmit the offending object without violating the window consistency of the other objects in the service. For example, P can schedule a special retransmission window for transmitting objects that have not received acknowledgment messages for past updates; when this retransmission object is selected for service, P transmits an update message for one of the existing objects, based on the values of $t_{ack}$ and $t_{xmt}$. This improves the likelihood of having window-consistent backup sites, even in the presence of communication failures.

6 IMPLEMENTATION AND EVALUATION

6.1 Prototype Implementation

We have developed a prototype implementation of the window-consistent replication service to demonstrate and evaluate the proposed service model. The implementation consists of a primary and a backup server, with the client application running on the primary node as shown in Fig. 3. The primary implements rate-monotonic scheduling of update transmissions, with an option to enable schedule compression. Tick scheduling allocates the processor among different activities, such as handling client requests, sending update messages, and processing acknowledgments from the backup. At the start of each tick, the primary transmits an update message to the backup for one of the objects, as determined by the scheduling algorithm. Any client read/write requests and update acknowledgments are processed next, with priority given to client requests. Each server is currently an Intel-based PC running the Real-Time Mach [25], [32] operating system.² The sites communicate over an Ethernet through UDP datagrams using the Socket++ library [31], with extensions to the UNIX select call for priority-based access to the active sockets.
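The tick structure just described can be sketched as a small Python simulation. It is an illustration under stated assumptions, not the prototype's RT-Mach code: every update is assumed to take exactly one tick, ties between equal periods are broken by object index, the per-tick `capacity` for requests is a hypothetical stand-in for the leftover time in a tick, and the rule used to fill idle ticks under compression (send the object whose last transmission is oldest) is a placeholder of our own, not necessarily the paper's schedule-compression scheme.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def rm_tick_schedule(periods: List[int], horizon: int) -> List[Optional[int]]:
    """Tick-by-tick rate-monotonic schedule of one-tick update transmissions.
    Returns the object index sent in each tick, or None for an idle tick."""
    # Rate-monotonic priority: shorter period first, ties broken by index.
    order = sorted(range(len(periods)), key=lambda i: (periods[i], i))
    pending = [0] * len(periods)          # released but not yet sent
    schedule: List[Optional[int]] = []
    for tick in range(horizon):
        for i, p in enumerate(periods):
            if tick % p == 0:             # start of a new period
                pending[i] += 1
        chosen = next((i for i in order if pending[i] > 0), None)
        if chosen is not None:
            pending[chosen] -= 1
        schedule.append(chosen)
    return schedule


def compress(schedule: List[Optional[int]], n_objects: int) -> List[int]:
    """Fill idle ticks with extra updates (placeholder rule: the object
    whose most recent transmission is oldest)."""
    last_sent = [-1] * n_objects
    compressed = []
    for tick, slot in enumerate(schedule):
        if slot is None:
            slot = min(range(n_objects), key=lambda i: last_sent[i])
        last_sent[slot] = tick
        compressed.append(slot)
    return compressed


def run_tick(slot: Optional[int], client_requests: Deque[str],
             acks: Deque[str], capacity: int) -> List[Tuple[str, object]]:
    """One tick: send the scheduled update first, then use the remaining
    capacity for client requests, serving acknowledgments only when no
    client request is waiting."""
    handled: List[Tuple[str, object]] = []
    if slot is not None:
        handled.append(("update", slot))
    while capacity > 0 and (client_requests or acks):
        if client_requests:
            handled.append(("client", client_requests.popleft()))
        else:
            handled.append(("ack", acks.popleft()))
        capacity -= 1
    return handled
```

For two objects with period 3, `rm_tick_schedule([3, 3], 6)` leaves every third tick idle, and `compress` turns those idle ticks into extra updates, so every tick carries a transmission.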
At initialization, sockets are registered at the appropriate priority, so that the socket for receiving client requests has a higher priority than the socket for receiving update acknowledgments from the backup. A tick period of 100 ms was chosen to minimize the intrusion from other runnable system processes.³ To further minimize interference, experiments were conducted with lightly loaded machines on the same Ethernet segment; we did not observe any significant fluctuations in network or processor load during the experiments. The primary and backup sites maintain in-memory logs of events at run time to efficiently collect performance data with minimal intrusion. Estimates of the clock skew between the primary and the backup, derived from actual measurements of round-trip latency, are used to adjust the occurrence times of events to calculate the distance between objects on the primary and backup sites.

2. Earlier experiments on Sun workstations running Solaris 1.1 show similar results [24].
3. The 100 ms tick period has the same granularity as the process scheduling quantum, to limit the interference from other jobs running on the machine. However, smaller tick periods are desirable in order to allow objects to specify tighter windows (the window size is expressed in number of ticks) and to respond to client requests in a timely manner.

The prototype evaluation considers three main consistency metrics, representing window consistency and the backup and client views. These performability metrics are influenced by several parameters, including client write rate, communication failures, and schedule compression. The experiments vary the client write rate by changing the time $w$ between successive client writes to an object. We inject communication failures by randomly dropping update messages; this captures the effect of transient network load as well as lost update acknowledgments.
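To see how randomly dropped updates interact with window consistency, consider a toy model of our own (not the prototype): one object whose update is sent exactly every $p$ ticks with zero latency, with $p = \delta/2$ assumed, and the backup's conservative estimate $t - t_{xmt}$ from Definition 3. The function names and parameters are hypothetical.

```python
import random
from typing import Set


def count_inconsistent_ticks(period: int, delta: int, dropped: Set[int],
                             horizon: int) -> int:
    """Count ticks at which the backup's estimate t - t_xmt exceeds delta.
    Updates are sent at every multiple of `period`; those in `dropped` are
    lost, the rest arrive instantly (a simplifying assumption)."""
    last_delivered = 0                    # backup starts consistent at t = 0
    bad = 0
    for t in range(1, horizon + 1):
        if t % period == 0 and t not in dropped:
            last_delivered = t
        if t - last_delivered > delta:
            bad += 1
    return bad


def simulate_random_loss(period: int, delta: int, loss_prob: float,
                         horizon: int, seed: int = 0) -> float:
    """Drop each update independently with probability loss_prob and return
    the fraction of ticks at which the backup is (by its estimate)
    window-inconsistent."""
    rng = random.Random(seed)
    dropped = {t for t in range(period, horizon + 1, period)
               if rng.random() < loss_prob}
    return count_inconsistent_ticks(period, delta, dropped, horizon) / horizon
```

With $p = 15$ and $\delta = 30$, a single dropped update never pushes the estimate past $\delta$, but two consecutive drops do (14 inconsistent ticks when the updates at ticks 15 and 30 are lost), which is one way to see how sending updates more often than strictly necessary masks occasional losses.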
The invariants in our evaluation are the tick period (100 ms), the objects' window size ($\delta = 30$ ticks), and the number of objects ($N = 10$); given the tick period and $\delta$, $N$ is determined by the schedulability criterion of the rate-monotonic scheduling algorithm. All objects have the same update transmission time of one tick, with the object size chosen such that the time to process and transmit the object is reasonably small compared to the tick size; the extra time within each tick period is used to process client requests and update acknowledgments. Experiments ran for 45 minutes for each data point.

6.2 Omniscient View (Window Consistency)

The window-consistency metric captures the actual temporal inconsistency between the primary and the backup sites, and serves as a reference point for the performance of the replication service. Fig. 8a shows the average maximum distance between the primary and the backup as a function of the probability of message loss for three different client write periods, with and without schedule compression. This measures the inconsistency of each backup object just before it receives an update, averaged over all versions and all objects, reflecting the goodness of the replicated data. Fig. 8b shows the probability of an inconsistent backup as a function of message loss; this fault-tolerance metric measures the likelihood that the backup has one or more inconsistent objects. In these experiments, the client writes each object once every tick ($w = 100$ ms), once every three ticks ($w = 300$ ms), and once every seven ticks ($w = 700$ ms). The probability of message loss varies from zero to 10 percent; experiments with higher message-loss rates reveal similar trends. Message loss increases the distance between the primary and the backup, as well as the likelihood of an inconsistent backup. However, the influence of message loss is not very pronounced, due to conservative object admission in the current implementation.
This occurs because, on average, the periodic model schedules updates twice as often as necessary, in order to guarantee the required worst-case spacing between update transmissions. Message loss should have more influence in other scheduling models that permit higher resource utilization, as discussed in Section 7. Higher client write rates also tend to increase the backup's inconsistency; as the client writes more frequently, the primary's copy of the object changes soon after the primary sends an update message, resulting in staler data at the backup site. Schedule compression is very effective in improving both performance variables. The average maximum distance between the primary and backup under no message loss (the y-intercept) is reduced by about 30 percent for high client rates in Fig. 8a; similar reductions are seen for all message-loss probabilities. This occurs because schedule compression successfully utilizes idle ticks in the schedule generated by the rate-monotonic scheduling algorithm; the utilization thus increases to 100 percent, and the primary sends approximately 30 percent more object updates to the backup. Compression plays a relatively more important role in reducing the likelihood of an inconsistent backup, as can be seen from Fig. 8b. Compression also reduces the impact of communication failures, since the extra update transmissions effectively mask lost messages.

[Figure 8 panels: (a) Average maximum distance; (b) Probability (backup inconsistent)]
Fig. 8. Window consistency: The graphs show the performance of the service as a function of the client write rate, message loss, and schedule compression. Although object inconsistency increases with message loss, compressing the periodic schedule reduces the effects of communication failures. Inconsistency increases as the client writes more frequently, since the primary changes its object soon after transmitting an update message to the backup.

6.3 Backup View (Estimated Consistency)

Although Fig. 8 provides a system-wide view of window consistency, the backup site has limited knowledge of the primary state. The backup's view ($t - t_{xmt}$) is a good, albeit pessimistic, estimate of the actual window consistency, as shown in Fig. 9.

[Figure 9 panels: (a) Average maximum distance; (b) Probability (backup inconsistent)]
Fig. 9. Backup view $t - t_{xmt}$: The plots show system performance from the backup's conservative viewpoint, as a function of the client write rate, message loss, and schedule compression. As in Fig. 8, temporal consistency improves under schedule compression but worsens under increasing message loss. The backup's view is impervious to the client write rate.
The backup site uses this metric to evaluate its own window consistency, to detect a crashed primary, and to effect a failover. As in Fig. 8, message loss increases the average maximum distance (Fig. 9a) and the likelihood of an inconsistent backup (Fig. 9b). Schedule compression has similar benefits for the backup's estimate of window consistency. However, unlike Fig. 8, the client write rate does not influence the backup's view of its window consistency. The backup (pessimistically) assumes that the client writes an object on the primary immediately after the primary transmits an update message for that object to the backup. For this reason, the backup's estimate of the average maximum distance between the primary and the backup is always worse than that derived from the omniscient view. It follows that this estimate is more accurate for high client write rates, as can be seen by comparing Figs. 8a and 9a; for high client rates relative to the window, $t - t_{xmt}$ and the omniscient window-consistency metric are virtually identical. The window-consistent replication model is designed to operate with high client write rates, relative to communication within the service, so the backup typically has an accurate view of its temporal consistency.

6.4 Client View (Recovery Consistency)

The client view ($t_P(t) - t_B(t)$) measures the inconsistency between the primary and backup versions on object reads; better recovery consistency provides a more accurate system state after failover. Since the client can read at an arbitrary time, Fig. 10 shows the time average of recovery inconsistency, averaged across all objects, with and without compression. We attribute the minor fluctuations in the graphs to noise in the measurements. The distance metric is not sensitive to the client write rate, since frequent client writes increase both $t_P(t)$ and $t_B(t)$: when the client writes more often, the primary copy changes frequently (i.e., $t_P(t)$ is close to $t$), but the backup also receives more recent versions of the data (i.e., $t_B(t)$ is close to $t_{xmt}$). Moderate message loss does not have a significant influence on read inconsistency, especially under schedule compression. As expected, schedule compression improves the read inconsistency seen by the client significantly (< 30%). It is, therefore, an effective technique for improving the goodness of the replicated data.

Fig. 10. Client view $t_P(t) - t_B(t)$: This graph presents the time average of recovery inconsistency, as a function of the client write rate, message loss, and schedule compression. Compressing the update schedule improves consistency by generating more frequent update transmissions, while message loss worsens read consistency. The metric is largely independent of the client write rate.
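The three views can be computed for a single object from a log of client writes and update transmissions. In this sketch a version is named by the time it was written, as with $t_P$ and $t_B$; updates are assumed (hypothetically) to reach the backup instantly and never be lost, and the omniscient view is taken as the time since the primary last held the backup's version. The function names are illustrative, not the prototype's logging code.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def latest_at_or_before(times: List[float], t: float) -> Optional[float]:
    """Largest value in the sorted list `times` that is <= t, or None."""
    i = bisect_right(times, t)
    return times[i - 1] if i else None


def consistency_views(writes: List[float], xmits: List[float],
                      t: float) -> Optional[Tuple[float, float, float]]:
    """Return (omniscient, backup view, client view) for one object at time t.

    writes: sorted client write times on the primary (version = write time);
            assumed to start no later than the first transmission.
    xmits:  sorted update transmission times; assumed to arrive instantly
            and never be lost.
    """
    t_xmt = latest_at_or_before(xmits, t)
    if t_xmt is None:
        return None                               # backup holds no version yet
    t_b = latest_at_or_before(writes, t_xmt)      # version held by the backup
    t_p = latest_at_or_before(writes, t)          # version held by the primary
    # Omniscient view: time since the primary stopped holding version t_b
    # (zero while the primary still holds it).
    overwrites = [w for w in writes if t_b < w <= t]
    stopped = overwrites[0] if overwrites else t
    omniscient = t - stopped
    backup_view = t - t_xmt                       # Definition 3 estimate
    client_view = t_p - t_b                       # t_P(t) - t_B(t)
    return omniscient, backup_view, client_view
```

For writes at ticks 0, 3, and 10 and transmissions at ticks 2 and 8, the views at tick 12 are 2 (omniscient), 4 (backup), and 7 (client), so the backup's estimate is pessimistic relative to the omniscient view, as in Figs. 8 and 9.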
7 CONCLUSION AND FUTURE WORK

Window consistency offers a framework for designing replication protocols with predictable timing behavior. By decoupling communication within the service from the handling of client requests, a replication protocol can handle a higher rate of read and write operations and provide more timely responses to clients. Scheduling the selective communication within the service provides bounds on the degree of inconsistency between servers. While our prototype implementation has successfully demonstrated the utility of the window-consistent replication model, more extensive evaluation is needed to validate the ideas identified in this paper. We have recently added support for fault detection, failover, and integration of new backups. Further experiments on the current platform will ascertain the usefulness of processor capacity reserves [25] and other RT-Mach features in implementing the window-consistent replication service. The present work extends into several fruitful areas of research:

Object admission/scheduling: We are studying techniques to maximize the number of admitted objects and improve objects' window consistency by optimizing object admission and update scheduling. For the window-consistent replication service, the periodic task model is overly conservative in accepting object registration requests; that is, it may either limit the number of objects that are accepted or accept only those objects with relatively large windows. This occurs because, on average, the periodic model schedules updates twice as often as necessary in order to guarantee the required worst-case spacing between update transmissions. We are evaluating other scheduling algorithms, such as the distance-constrained task model [13], which assigns task priorities based on separation constraints, in terms of their implementation complexity and ability to accommodate dynamic creation/deletion of objects. We are also considering techniques to maximize the goodness of the replicated data. As one possible approach, we are exploring ways to incorporate the client write rate in object admission and scheduling.
An alternate approach is to optimize the object window size itself by proportionally shrinking object windows such that the system remains schedulable; this should improve each object's worst-case temporal inconsistency. The selection of object window sizes can be cast as an instance of the linear programming optimization problem. Schedule compression can still be used to improve the utilization of the remaining available resources.

Interobject window consistency: We are extending our window-consistent replication model to incorporate temporal consistency constraints between objects. Our goal is to bound consistency in a replicated set of related objects; new algorithms may be necessary for real-time update scheduling of such object sets. This is related to the problem of ensuring temporally consistent objects in a real-time database system, except that here the related objects are replicated.

Alternative replication models: Although the current prototype implements a primary-backup architecture with a single backup site, we are studying the additional issues involved in supporting multiple backups. In addition, we are exploring window consistency in the state-machine replication approach. This would enable us to investigate the applicability of window consistency to alternative replication models.
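The object-admission and window-shrinking ideas above can be made concrete with a sketch. Two assumptions are ours, not the paper's: each object's update task has period $\delta/2$, in line with the remark that the periodic model sends updates about twice as often as necessary, and schedulability is checked with the Liu and Layland utilization bound $n(2^{1/n} - 1)$ [21] rather than an exact test such as [19]. Under these assumptions, $\delta = 30$ ticks and one-tick transmissions admit at most 10 objects, which agrees with the $N = 10$ used in the evaluation.

```python
from typing import Sequence


def ll_bound(n: int) -> float:
    """Liu-Layland utilization bound for n periodic tasks under RM."""
    return n * (2 ** (1 / n) - 1)


def utilization(windows: Sequence[float], service_times: Sequence[float]) -> float:
    """Total utilization, assuming each object's update period is delta / 2."""
    return sum(e / (d / 2) for d, e in zip(windows, service_times))


def admissible(windows: Sequence[float], service_times: Sequence[float]) -> bool:
    """Sufficient (not exact) rate-monotonic schedulability test."""
    return utilization(windows, service_times) <= ll_bound(len(windows))


def max_objects(delta: float, e: float) -> int:
    """Largest number of identical objects (window delta, service time e > 0)
    that pass the admission test."""
    n = 0
    while admissible([delta] * (n + 1), [e] * (n + 1)):
        n += 1
    return n


def min_window_scale(windows: Sequence[float],
                     service_times: Sequence[float]) -> float:
    """Smallest factor s such that scaling every window by s keeps the set
    admissible. Utilization grows as 1/s, so s = U / bound; s > 1 means the
    windows would have to grow instead."""
    return utilization(windows, service_times) / ll_bound(len(windows))
```

For ten objects with 30-tick windows, `min_window_scale` gives about 0.93, i.e., windows of roughly 27.9 ticks; since windows are expressed in whole ticks, rounding up to 28 ticks keeps the set admissible, while 27 ticks does not. This closed form covers only uniform scaling of all windows.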

ACKNOWLEDGMENTS

The authors wish to thank Sreekanth Brahmamdam and Hock-Song Ang for their help in running experiments and post-processing the collected data, and the reviewers for their helpful comments. The work reported in this paper was supported in part by the U.S. National Science Foundation under Grant MIP-9203895. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the view of the NSF.

REFERENCES

[1] R. Alonso, D. Barbara, and H. Garcia-Molina, "Data Caching Issues in an Information Retrieval System," ACM Trans. Database Systems, vol. 15, no. 3, pp. 359-384, Sept. 1990.
[2] P. Alsberg and J. Day, "A Principle for Resilient Sharing of Distributed Resources," Proc. IEEE Int'l Conf. Software Eng., Los Angeles.
[3] C.M. Aras, J.F. Kurose, D.S. Reeves, and H. Schulzrinne, "Real-Time Communication in Packet-Switched Networks," Proc. IEEE, vol. 82, no. 1, Jan.
[4] J.F. Bartlett, "A NonStop Kernel," Proc. ACM Symp. Operating Systems Principles.
[5] A. Bhide, E.N. Elnozahy, and S.P. Morgan, "A Highly Available Network File Server," Proc. Winter USENIX Conf., Jan.
[6] K.P. Birman and T.A. Joseph, "Reliable Communication in the Presence of Failures," ACM Trans. Computer Systems, vol. 5, no. 1.
[7] K.P. Birman, "The Process Group Approach to Reliable Distributed Computing," Comm. ACM, vol. 36, no. 12, Dec.
[8] N. Budhiraja and K. Marzullo, "Tradeoffs in Implementing Primary-Backup Protocols," Dept. of Computer Science TR, Cornell Univ.
[9] N. Budhiraja and K. Marzullo, "Tradeoffs in Implementing Primary-Backup Protocols," Proc. IEEE Symp. Parallel and Distributed Processing, Oct.
[10] F. Cristian, "Understanding Fault-Tolerant Distributed Systems," Comm. ACM, vol. 34, no. 2, Feb.
[11] F. Cristian, B. Dancy, and J. Dehn, "Fault-Tolerance in the Advanced Automation System," Proc. Int'l Symp. Fault-Tolerant Computing, pp. 6-17, June.
[12] S.B. Davidson and A. Watters, "Partial Computation in Real-Time Database Systems," Proc. Workshop Real-Time Operating Systems and Software, May.
[13] C.-C. Han and K.-J. Lin, "Scheduling Distance-Constrained Real-Time Tasks," Proc. Real-Time Systems Symp.
[14] D.D. Kandlur, K.G. Shin, and D. Ferrari, "Real-Time Communication in Multi-Hop Networks," IEEE Trans. Parallel and Distributed Systems, vol. 5, no. 10, pp. 1,044-1,056, Oct.
[15] H. Kopetz, A. Damm, C. Koza, M. Mulazzani, W. Schwabl, C. Senft, and R. Zainlinger, "Distributed Fault-Tolerant Real-Time Systems: The MARS Approach," IEEE Micro, Feb.
[16] H. Kopetz and G. Grunsteidl, "TTP: A Protocol for Fault-Tolerant Real-Time Systems," Computer, vol. 27, no. 1, Jan.
[17] H.F. Korth, N. Soparkar, and A. Silberschatz, "Triggered Real Time Databases with Consistency Constraints," Proc. Int'l Conf. Very Large Data Bases, Aug.
[18] T.-W. Kuo and A.K. Mok, "SSP: A Semantics-Based Protocol for Real-Time Data Access," Proc. Real-Time Systems Symp., Dec.
[19] J. Lehoczky, L. Sha, and Y. Ding, "The Rate Monotonic Scheduling Algorithm: Exact Characterization and Average Case Behavior," Proc. Real-Time Systems Symp., Dec.
[20] K.-J. Lin, F. Jahanian, A. Jhingran, and C.D. Locke, "A Model of Hard Real-Time Transaction Systems," Technical Report RC 17515, IBM T.J. Watson Research Center, Jan.
[21] C.L. Liu and J.W. Layland, "Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment," J. ACM, vol. 20, no. 1, Jan.
[22] J.W.S. Liu, W.-K. Shih, and K.-J. Lin, "Imprecise Computations," Proc. IEEE, vol. 82, no. 1, Jan.
[23] A. Mehra, A. Indiresan, and K.G. Shin, "Structuring Communication Software for Quality-of-Service Guarantees," Proc. 17th Real-Time Systems Symp., Dec.
[24] A. Mehra, J. Rexford, H.-S. Ang, and F. Jahanian, "Design and Evaluation of a Window-Consistent Replication Service," Proc. IEEE Real-Time Technology and Applications Symp., May.
[25] C.W. Mercer, S. Savage, and H. Tokuda, "Processor Capacity Reserves: Operating System Support for Multimedia Applications," Proc. IEEE Int'l Conf. Multimedia Computing and Systems, May.
[26] S. Mishra, L.L. Peterson, and R.D. Schlichting, "Consul: A Communication Substrate for Fault-Tolerant Distributed Programs," Technical Report 91-32, Univ. of Arizona, Nov.
[27] J.-F. Paris, "Using Volatile Witnesses to Extend the Applicability of Available Copy Protocols," Proc. Workshop Management of Replicated Data, Nov.
[28] C. Pu and A. Leff, "Replica Control in Distributed Systems: An Asynchronous Approach," Proc. ACM SIGMOD, May.
[29] J. Rexford, A. Mehra, J. Dolter, and F. Jahanian, "Window-Consistent Replication for Real-Time Applications," Proc. Workshop Real-Time Operating Systems and Software, May.
[30] F.B. Schneider, "Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial," ACM Computing Surveys, vol. 22, no. 4, Dec.
[31] G. Swaminathan, C++ Socket Classes. Univ. of Virginia, June.
[32] H. Tokuda, T. Nakajima, and P. Rao, "Real-Time Mach: Toward a Predictable Real-Time System," Proc. USENIX Mach Workshop, Oct.
[33] P. Verissimo, P. Barrett, P. Bond, A. Hilborne, L. Rodrigues, and D. Seaton, "The Extra Performance Architecture (XPA)," Delta-4: A Generic Architecture for Dependable Distributed Computing, D. Powell, ed.

Ashish Mehra received a BTech degree in electrical engineering from the Indian Institute of Technology at Kanpur, India, in 1989, and the MSE and PhD degrees in computer science and engineering from the University of Michigan in Ann Arbor in 1992 and 1997, respectively. He is now a research staff member at the IBM T.J. Watson Research Center. His primary research interests are in operating system and networking support for application quality-of-service requirements, Internet-based network computing, various aspects of code mobility and security, high-speed networking, and performance evaluation.

Jennifer Rexford received a BSE degree in electrical engineering from Princeton University in 1991, and MS and PhD degrees in computer science and engineering from the University of Michigan in Ann Arbor, in 1993 and 1996, respectively. Since 1996, she has been in the Network Mathematics Research Department at AT&T Labs Research in New Jersey. Her research interests include packet scheduling, routing/signaling protocols, and performance evaluation, with an emphasis on efficient support for quality-of-service guarantees.

Farnam Jahanian received the MS and PhD degrees in computer science from the University of Texas at Austin in 1987 and 1989, respectively. He is currently a faculty member in the Department of Electrical Engineering and Computer Science at the University of Michigan. Prior to joining the faculty at the University of Michigan in 1993, he was a research staff member at the IBM T.J. Watson Research Center, where he led several experimental projects in distributed and fault-tolerant systems. His current research interests include real-time software systems, fault-tolerant distributed computing, and protocols for wide-area collaborative environments.
