Space-Optimal, Wait-Free Real-Time Synchronization

Hyeonjoong Cho, Binoy Ravindran
ECE Dept., Virginia Tech, Blacksburg, VA 24061, USA
{hjcho,binoy}@vt.edu

E. Douglas Jensen
The MITRE Corporation, Bedford, MA 01730, USA
jensen@mitre.org

Abstract

We consider wait-free synchronization for the single-writer/multiple-reader problem in small-memory embedded real-time systems. We present an analytical solution to the problem of determining the minimum, optimal space cost required for this problem, considering a-priori knowledge of interferences; this is the first such result. We also show that the space costs required by previous algorithms can be obtained by our analytical solution, which subsumes them as special cases. We also present a wait-free protocol that utilizes the minimum space cost determined by our analytical solution. Our evaluation studies and implementation measurements using the SHaRK RTOS kernel validate our analytical results.

I. INTRODUCTION

Most embedded real-time systems involve mutually exclusive, concurrent access to shared data objects, resulting in contention for those objects. Resolution of the contention directly affects the system's timeliness, and thus the system's behavior. Mechanisms that resolve such contention can be broadly classified into: (1) lock-based, e.g., Priority Inheritance and Ceiling protocols [1], Stack Resource Policy [2], DASA [3]; (2) wait-free, e.g., Kopetz and Reisinger's protocol [4], Chen's protocol [5], [6], [7]; and (3) lock-free, e.g., [8].

Lock-based protocols have several disadvantages such as serialized access to shared objects, resulting in reduced concurrency and thus reduced resource utilization [8]. Further, many lock-based protocols typically incur additional run-time overhead due to increased context switching between activities blocked on shared objects (i.e., "blockers") and activities that hold locks of those objects (i.e., "lock holders"). The increased context switching occurs when lock-based protocols preempt the currently executing blocker, execute the lock holder until the holder releases the lock, and then resume the blocker's execution. Another disadvantage of using locks is the possibility of deadlocks that can occur when lock holders crash, causing indefinite starvation of blockers. Further, many (real-time) lock-based protocols require a-priori knowledge of the ceilings of the locks [1], [2], which may be difficult to obtain in some application contexts. Furthermore, OS data structures (e.g., semaphore control blocks) must be a-priori updated with that knowledge, resulting in reduced flexibility (e.g., recompilation to accommodate new activities) [8].

These drawbacks have motivated research on wait-free and lock-free object sharing in real-time systems. Wait-free protocols use multiple internal buffers [footnote 1] (e.g., a circular buffer) for writers and readers [4]. For the single-writer/multiple-reader (or SWMR) problem, wait-free protocols typically use multiple buffers for the shared object, where the number of buffers used is proportional to the maximum number of times the readers can be interfered with by the writer while the readers are reading. The maximum number of interferences of a reader bounds the number of times the writer can update the object while the reader is reading. Thus, by using as many buffers as the worst-case number of interferences of readers, the readers and the writer can continuously read and write in different buffers, respectively, and avoid interference.

Footnote 1: We use the term "internal" here to explicitly indicate that a single wait-free buffer internally uses multiple buffers for its atomic operations. In this paper, "the buffers" and "the space cost" implicitly mean the internal buffers and the cost of the internal buffers, respectively, unless otherwise noted.

Lock-free protocols allow readers to concurrently read while the writer is writing (without acquiring locks), but the readers check whether their reading was interfered with by the writer. If so, they read again. Thus, a reader continuously reads, checks, and retries until its read becomes successful. Since a reader's worst-case number of retries depends upon the worst-case number of times the reader is interfered with by the writer, the additional execution-time overhead incurred for the retries is bounded by the number of interferences.

Both wait-free and lock-free protocols incur additional costs with respect to their lock-based counterparts. Wait-free protocols generally incur additional space costs due to their multiple buffer usage, which is infeasible in many small-memory, embedded real-time systems. Lock-free protocols generally incur additional time costs due to their retries, which is antagonistic to timeliness optimization.

Prior research has shown how to mitigate these space and time costs, so that they are feasible for embedded real-time systems. An excellent survey of this prior research can be found in [7]. To provide context for our work, we summarize some important efforts here: In [4], Kopetz and Reisinger present one of the earliest wait-free protocols, where buffer sizes proportional to worst-case interferences are used. In [8], Anderson et al. show how to bound the retry loops of lock-free protocols through judicious scheduling. In [5], Chen and Burns present one of the most space-efficient wait-free protocols, where the worst-case preemptions need not be a-priori known. In [9], Sundell and Tsigas describe a wait-free protocol for the multiple-writer/multiple-reader problem. In [7], Huang et al. improve the time and space costs of Chen's protocol.

In this paper, we focus on wait-free synchronization for the SWMR problem in small-memory, embedded real-time systems.
We focus on wait-free, as opposed to lock-free, as the majority of lock-free protocols have high computational costs [7]. We consider the SWMR problem, as it

occurs in most embedded real-time systems [7], and focus on minimizing its space costs. We present an analytical solution to the problem of determining the minimum number of buffers that is required to ensure the safety and orderliness of wait-free synchronization in SWMR. We call this problem the Wait-Free Buffer size decision Problem (or WFBP). Note that the optimality in space that we provide is on the required number of internal buffers, and does not include the control variables needed for the wait-free protocol's operation. This is because the space cost of internal buffers dominates that of the control variables, especially when the data size becomes larger.

We prove that our solution to WFBP subsumes the number of buffers required by previous wait-free protocols, including Chen's [5] and Kopetz and Reisinger's [4] protocols, as special cases. We analytically identify the conditions under which our protocol needs fewer buffers than (or the same number of buffers as) other protocols. Further, we present a wait-free protocol that utilizes the minimum buffer requirement determined by our solution.

To determine the buffer requirements under a broad range of reader/writer scenarios, we conduct numerical evaluations. We also implement our protocol in the SHaRK RTOS [10]. Our evaluations and implementation measurements confirm our solution to WFBP and validate our analytical results.

Thus, the paper's contributions include the analytical solution that we present for WFBP and the wait-free protocol that uses the concomitant minimum number of buffers. Among the class of wait-free protocols that consider a-priori knowledge of interferences, our optimal space lower bound is the first such bound that is analytically established.

The rest of the paper is organized as follows: We present our analytical solution to WFBP and our wait-free protocol in Section II. In Section III, we formally compare our protocol with Chen's and Kopetz and Reisinger's protocols. We numerically evaluate our protocol in Section IV, and report our implementation experience in Section V. We conclude the paper in Section VI.

II. A SPACE-OPTIMAL WAIT-FREE PROTOCOL

A wait-free protocol solves the asynchronous single-writer/multiple-reader problem by ensuring that each reader accesses the shared object without any interference from the writer. To realize the wait-free mechanism, the protocol must hold two properties: safety and orderliness [5]. The safety property ensures that the shared object does not become corrupted during reading and writing. The orderliness property ensures that all readers always read the latest data that is completely written by the writer.

The basic idea to achieve the two properties is rooted in the three-slot fully asynchronous mechanism for the single-reader/single-writer problem [11]. For this problem, Chen et al. show that three buffers are required to keep the latest completely updated buffer for the next reading, while the writer and the reader each occupy a buffer. This mechanism allows a reader to always obtain data from the buffer slot that was last completely updated, while the writer is writing the new version of the shared data [5].

The buffers needed for the single-writer/multiple-reader problem consist of three types: buffers for readers, a buffer for the latest written data, and a buffer for the next write operation. The buffers for readers must satisfy safety, i.e., sufficient buffers must be available to avoid interference between reading and writing. However, this does not imply that we need as many buffers as there are readers. The two buffers for writing are required to realize orderliness, i.e., the latest written data must be saved so that a newly activated reader can access it at any time. In addition, the latest written data must be kept until the writer completely writes the next data into another buffer.

We now discuss how to determine the minimum number of buffers that are needed for the single-writer/multiple-reader problem in the following subsections:

A. Protocol Structure and Task Model

Figure 1 shows a wait-free protocol's common implementation. W.2 and R.2 show the code sections of the writer and a reader that write and read data, respectively. W.1 is the code section where the writer decides on the buffer for writing, and updates a control variable that indicates the selected buffer. W.3 is the code section where the writer indicates completion of writing and the buffer that has the latest data. In R.1, the reader checks for the latest data to read.

[Fig. 1. Typical Wait-Free Implementation: (a) Write, (b) Read]

The buffer size required for Kopetz and Reisinger's protocol [4] and the improved protocols in [7] is determined based on the temporal properties of tasks. These prior works consider the periodic task model, where tasks concurrently share data objects. Aperiodic tasks are handled by a periodic server, so the periodic model is not a limiting assumption. Assuming that all deadlines are met (i.e., during under-load situations and precluding overloads), the maximum number of preemptions of the reader by the writer task in the worst case can be obtained. We consider the same task model.

B. Number of Buffers in Use

We introduce some notations for convenience, most of which are similar to those in [5]. We denote the total number of readers as $M$ and the $i$-th reader as $R_i$. The reader $R_i$'s $j$-th instance

of reading is denoted as $R_i^{[j]}$. The writer's $k$-th writing instance is denoted as $W^{[k]}$. $R_i^{[j]}(op)$ stands for a specific operation of $R_i^{[j]}$. For example, $R_i^{[j]}(READING[i] = 0)$ implies the execution of one statement in Chen's algorithm [5]. $W^{[k]}(op)$ also stands for the operation in $W^{[k]}$. If $R_i^{[j]}$ reads what $W^{[k]}$ writes, we denote it as $w(R_i^{[j]}) = W^{[k]}$. As previously mentioned, safety and orderliness can be achieved with multiple buffers for readers, one buffer for the latest written data, and another buffer for the next writing.

[Fig. 2. Number of Buffers in Use]

Suppose we have 4 readers and 1 writer, as shown in Figure 2. At time $t_1$, $w(R_1) = W^{[2]}$, $w(R_2) = W^{[2]}$, and $w(R_3) = W^{[1]}$. This implies that two buffers are being used by the readers. In addition, one buffer is required to store the latest completely written data by $W^{[4]}$, and another is needed for the next writing operation by $W^{[5]}$. Thus, four buffers are being used in total at time $t_1$. At time $t_2$, $w(R_1) = W^{[6]}$, $w(R_2) = W^{[5]}$, $w(R_3) = W^{[4]}$, and $w(R_4) = W^{[6]}$. The latest written data is by $W^{[6]}$, and $W^{[7]}$ is the next operation. Thus, the total number of buffers used at time $t_2$ is four, which is the minimum number required at $t_2$ for ensuring the safety and orderliness properties.

The basic intuition for determining the minimum number of buffers is to construct a worst case where the required number of buffers is as large as possible, when the maximum possible number of interferences of all readers with the writer occurs. We map this problem to a problem

called the Diverse Selection Problem (or DSP) and then solve it.

C. Diverse Selection Problem

The DSP, denoted as $D(\mathcal{R}, R(\vec{x}))$, is defined with the problem range $\mathcal{R}$ and the range vector $R(\vec{x})$ of all elements in the vector $\vec{x}$. $\mathcal{R}$ has the lower and upper bounds defined as $[l, u]$. Each element $x_i$ in the vector $\vec{x}$ has the range $r_i = [l_i, u_i]$. The solution to the problem $D$ is represented as a vector $\vec{x} = \langle x_1, \ldots, x_M \rangle$, where the vector size $n(\vec{x})$ is $M$. Every $x_i$ must satisfy its range constraint $r_i$ and the problem range constraint $\mathcal{R}$. We define $\{\vec{x}\}$ as a set including all elements of $\vec{x}$, but without duplicates. Thus, the size of $\{\vec{x}\}$, $n(\{\vec{x}\})$, is less than or equal to $n(\vec{x})$. The objective of DSP is to determine the maximum $n(\{\vec{x}\})$ by selecting $\vec{x}$, satisfying all range constraints, as diversely as possible.

Given a vector $\vec{v} = \langle v_1, \ldots, v_i, \ldots \rangle$, we denote the number of $v_i$'s having the value $k$ as $H(\vec{v}, k)$, and the maximum value among all elements of $\vec{v}$ as $Top(\vec{v})$. Given $D(\mathcal{R}, R(\vec{x}))$, the optimal solution of $D$ when $\mathcal{R} = [t_1, t_2]$ is denoted as $n^{\max}_{[t_1,t_2]}(\{\vec{x}\})$. For example, if $\vec{v} = \langle 1, 2, 2, 2, 6 \rangle$, then $H(\vec{v}, 2) = 3$ and $Top(\vec{v}) = 6$.

An easy approach to solve DSP is by considering all possible cases. The number of all possible cases is $n(r_1) \times \cdots \times n(r_M)$, where the range vector of the problem is given as $\langle r_1, \ldots, r_M \rangle$. By considering all cases, we can select a vector $\vec{x}$ that maximizes $n(\{\vec{x}\})$. However, such an approach would be computationally expensive.

A more efficient approach to solve DSP can be found by an inductive strategy. Consider a DSP $D(\mathcal{R}, R(\vec{x}))$, where $\mathcal{R} = [1, \infty]$ and $R(\vec{x}) = \langle [1, u_1], \ldots, [1, u_M] \rangle$. If all lower bounds of the elements of $\vec{x}$ are 1, we can define the upper bound vector $\vec{u} = \langle u_1, \ldots, u_M \rangle$ instead of $R(\vec{x})$, for convenience. In the rest of the paper, we call the problem defined by $D(\mathcal{R}, \vec{u})$ the DSP. This is simply because this assumption is well-mapped to the problem of deciding the minimum buffer

size for the wait-free protocol.

[Fig. 3. An Inductive Approach to DSP]

The solution to the problem $D(\mathcal{R}, \vec{u})$ can be represented as $n^{\max}_{[1,Top(\vec{u})]}(\{\vec{x}\})$. The idea to decompose the problem is shown in Figure 3. If the solution to the problem $D([6, 12], \vec{u})$ can be derived from the solution to the problem $D([7, 12], \vec{u})$, we can inductively determine the final solution to the problem $D([1, 12], \vec{u})$.

Theorem 2.1 (DSP for the Wait-Free Protocol): In the DSP $D(\mathcal{R}, \vec{u})$ with $\mathcal{R} = [1, N]$,

$$
n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}) =
\begin{cases}
n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}) + 1, & \text{if } \sum_{k=0}^{t+1} H(\vec{u}, N-k) > n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}) \\
n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}), & \text{otherwise}
\end{cases}
$$

where $N = Top(\vec{u})$, $\mathcal{R}[t] = [N-t, N]$, and $0 \le t < N$. When $t = 0$, $n^{\max}_{\mathcal{R}[0]}(\{\vec{x}\}) = 1$.

Proof: Assume that we have the solution to the problem $D([N-t, N], \vec{u})$. When this problem is extended to $D([N-(t+1), N], \vec{u})$, the ranges of several variables $x_i$ overlap with the problem range $[N-(t+1), N]$. The number of newly added variables that we need to consider is $H(\vec{u}, N-(t+1))$. When the problem range is extended by 1, the maximum possible increment of $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ is 1. The increment happens only if the number of all $x_i$ which have

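their $r_i$ overlapped with $[N-(t+1), N]$ is greater than $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$. In other words, this happens when new elements appear in the extended problem scope, or there is an element duplicated within $[N-t, N]$ at the previous step. Otherwise, $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ has no change from before. The increment means that the value of one element is determined as diversely as possible. The proof is by induction on $t$.

Basis. We show that the theorem holds when $t = 0$. When the problem is $D([Top(\vec{u}), Top(\vec{u})], \vec{u})$, there must be at least one element $x_i$ with the range $[1, Top(\vec{u})]$, and the maximum possible value of $n^{\max}_{\mathcal{R}[0]}(\{\vec{x}\})$ is 1. Hence, the basis for the induction holds.

Induction step. Assume that the theorem holds true when $\mathcal{R} = [N-t, N]$. We arrive at the optimal solution of $D([N-(t+1), N], \vec{u})$ with the optimal solution of $D([N-t, N], \vec{u})$ as in the base step. Suppose that the derived solution $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ is not optimal. Then, there must exist another optimal solution $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}')$. Clearly, $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}')$ is greater than $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$. Now, there are two possible cases:

Case 1. If $\sum_{k=0}^{t+1} H(\vec{u}, N-k) > n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$, then $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ is $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}) + 1$, which is less than $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}')$. Therefore, $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}) < n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}') - 1$. This means that there exists another $\{\vec{x}\}'$ that has more than $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ elements. This contradicts the assumption that $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$ is optimal.

Case 2. If $\sum_{k=0}^{t+1} H(\vec{u}, N-k) = n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$, then $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\})$ is $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$, which is less than $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}')$. Since no element's range becomes newly overlapped and no element has its duplicate, $n^{\max}_{\mathcal{R}[t+1]}(\{\vec{x}\}') = n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}')$. This means that there exists another $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}')$, which is greater than $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$. This contradicts the assumption that $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$ is optimal.

Theorem 2.2 (Solution Vector for the DSP): In the DSP $D(\mathcal{R}, \vec{u})$ with $\mathcal{R} = [1, N]$,

$$
\{\vec{x}\}_{\mathcal{R}[t+1]} =
\begin{cases}
\{\vec{x}\}_{\mathcal{R}[t]} \cup \{N-(t+1)\}, & \text{if } \sum_{k=0}^{t+1} H(\vec{u}, N-k) > n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\}) \\
\{\vec{x}\}_{\mathcal{R}[t]}, & \text{otherwise}
\end{cases}
$$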

where $N = Top(\vec{u})$, $\mathcal{R}[t] = [N-t, N]$, and $0 \le t < N$. When $t = 0$, $\{\vec{x}\}_{\mathcal{R}[0]} = \{N\}$.

Proof: By Theorem 2.1, $\{\vec{x}\}$ can be constructed by adding $\{N-(t+1)\}$ whenever $n^{\max}_{\mathcal{R}[t]}(\{\vec{x}\})$ increases by 1. Note that this $\{\vec{x}\}$ is one of the solution vectors.

D. Similarity to WFBP

The DSP is similar to the Wait-Free Buffer size decision Problem (or WFBP). In this problem, we are given $M$ readers and their maximum interferences as $\langle N_1^{\max}, \ldots, N_M^{\max} \rangle$. The objective of WFBP is to determine the worst-case maximum number of buffers.

[Fig. 4. A Worst-Case of the WFBP]

Figure 4 illustrates, with an example, how to construct the worst case where the required number of buffers is as large as possible. For convenience, the index of the writer is reversed compared with Figure 2. In this example, $R_1$'s maximum interference is 5, which is illustrated in a line. It means that $w(R_1)$ may belong to the set $\{W^{[1]}, \ldots, W^{[6]}\}$. We assume that the worst case happens at time $t$ between $W^{[2]}$ and $W^{[1]}$, where $W^{[2]}$ writes the latest completely written data, and $W^{[1]}$ is the next writing operation for which another buffer is needed.

For this reason, we restate WFBP as determining $\vec{x} = \langle w(R_1), \ldots, w(R_M) \rangle$ that will maximize $n(\{\vec{x}\} \cup \{W^{[1]}, W^{[2]}\})$, where $w(R_i) \in \{W^{[1]}, \ldots, W^{[N_i^{\max}+1]}\}$. If we abbreviate $W^{[j]}$ as $j$, the problem is redefined as determining $\vec{x} = \langle x_1, \ldots, x_M \rangle$ that will maximize $n(\{\vec{x}\} \cup \{1, 2\})$,

where $x_i \in \{1, \ldots, N_i^{\max}+1\}$. This is equivalent to DSP except that $n(\{\vec{x}\} \cup \{1, 2\})$ is used as the objective to maximize, instead of $n(\{\vec{x}\})$. Therefore, the final solution $\{\vec{x}\}$ of a given WFBP is obtained as the union of the solution from the mapped DSP and the set $\{1, 2\}$. We claim that this is correct, because the algorithm for DSP that we propose is designed to find a $\{\vec{x}\}$ which does not have 1 and 2 as its elements, if possible. We can guarantee that in this way, even if the solution from DSP is combined with $\{1\}$ or $\{2\}$, it is still for the worst case.

Corollary 2.3 (Space Optimality): If a solution to the WFBP can be obtained, then it must be the minimum and space-optimal buffer size that satisfies the two properties, safety and orderliness.

Proof: The solution is the number of buffers needed in the worst case of the given problem. Even with one less buffer than the obtained solution, we cannot realize all reading and writing, and still satisfy safety and orderliness. Hence, the solution to the WFBP is the minimum and space-optimal.

E. Algorithm for WFBP

We now present an algorithm, Algorithm 1, to solve the WFBP based on the previous sections. The algorithm inputs include the number of readers $M$ and the maximum interference $N^{\max}[i]$ of each reader. The variable sum and the function doesexist(t) correspond to $\sum H(\vec{u}, \cdot)$ and $H(\vec{u}, t)$ in Theorem 2.1. To reduce the time complexity of doesexist(t), we sort all $N^{\max}[i]$ before the main loop. doesexist(t) uses a static variable, and does not search the entire array $N^{\max}[\,]$ each time. The flag $on_i$ indicates whether or not the DSP solution includes $i$. If it does not include 1 or 2, the required buffer size for the WFBP solution, $n$, is incremented.

The time complexity of this algorithm is $O(M \log M + N^{\max})$. We believe that this cost is

reasonable, as the algorithm is run off-line for determining the buffer needs.

Algorithm 1: Algorithm for WFBP
    input : number of readers M; max interference N^max[1,...,M]
    output: required buffer size n
    sum = n = 0;
    on_1 = on_2 = false;
    for i = 1 to M do N^max[i]++;
    sort(N^max[1,...,M]);            (descending, so N^max[1] is the largest)
    for t = N^max[1] to 1 do
        sum += doesexist(t, N^max[1,...,M]);
        if sum > n then
            n++;
            if t = 2 then on_2 = true;
            if t = 1 then on_1 = true;
    if on_2 = false then n++;
    if on_1 = false then n++;

F. A Wait-Free Implementation

Kopetz and Reisinger's protocol [4] uses a circular buffer to realize wait-free synchronization. The idea behind the circular buffer is that while a writer circularly accesses the buffers, the readers follow the writer. However, we cannot use the circular type of buffer because a writer in our protocol needs to determine a safe buffer, which can be any of the buffers. The same situation arises with Chen's protocol, where the writer can access anywhere. Thus, to implement our protocol, we slightly modify Chen's protocol. Our implementation scheme is shown in Algorithms 2 and 3.

In Algorithms 2 and 3, the GetBuf() function searches for an empty buffer to write to among the buffers assigned by Algorithm 1. Compared with the implementation in [7], our approach does not need to implement separate protocols for fast readers and slow readers. Additionally, we achieve a speed improvement by reducing the required buffer size, which reduces the number of iterations in GetBuf()'s loop, compared with the original Chen's protocol [5].
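The buffer search performed by GetBuf() can be illustrated with a small Python sketch. This is a single-threaded model under stated assumptions, not the authors' implementation: the name `get_buf`, the use of plain lists for the control state, and the `RuntimeError` raised for an undersized buffer pool are choices made here for illustration. Buffers are numbered from 1, and a READING entry of 0 is taken to mean that the reader has not claimed a buffer.

```python
from typing import List


def get_buf(latest: int, reading: List[int], nb: int) -> int:
    """Return the lowest-numbered buffer (1-based) that is safe to overwrite.

    A buffer is unsafe if it holds the latest completely written data
    (LATEST) or if some reader has claimed it through its READING entry.
    """
    in_use = [False] * (nb + 1)      # index 0 is unused padding
    in_use[latest] = True            # never overwrite the latest data
    for claimed in reading:
        if claimed != 0:             # 0 means "no buffer claimed yet"
            in_use[claimed] = True
    for index in range(1, nb + 1):
        if not in_use[index]:
            return index
    # Hypothetical guard: with a correctly sized pool this is never reached.
    raise RuntimeError("no free buffer: NB is too small")
```

For instance, with four buffers, LATEST = 1, two readers holding buffers 1 and 2, and a third reader idle, `get_buf(1, [1, 2, 0], 4)` selects buffer 3. The loop visits at most NB entries, which is why a smaller buffer count shortens the writer's search.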

Algorithm 2: Modified Chen's Protocol for Writer
    Data: BUFFER[1,...,NB] (NB: # of buffers); READING[1,...,n] (n: # of readers); LATEST
    GetBuf()
    begin
        bool InUse[1,...,NB];
        for i = 1 to NB do InUse[i] = false;
        InUse[LATEST] = true;
        for i = 1 to n do
            j = READING[i];
            if j ≠ 0 then InUse[j] = true;
        i = 1;
        while InUse[i] do ++i;
        return i;
    end
    Writer()
    begin
        integer widx, i;
        widx = GetBuf();
        Write data into BUFFER[widx];
        LATEST = widx;
        for i = 1 to n do
            Compare-and-Swap(READING[i], 0, widx);
    end

Algorithm 3: Modified Chen's Protocol for Reader
    Data: BUFFER[1,...,NB] (NB: # of buffers); READING[1,...,n] (n: # of readers); LATEST
    Reader()
    begin
        integer ridx;
        READING[id] = 0;
        ridx = LATEST;
        Compare-and-Swap(READING[id], 0, ridx);
        ridx = READING[id];
        Read data from BUFFER[ridx];
    end

III. FORMAL COMPARISON WITH CHEN'S AND KOPETZ AND REISINGER'S PROTOCOLS

A. Special Case Behavior

The buffer size that Kopetz and Reisinger's protocol [4] requires depends on the maximum number of interferences that a reader can suffer from the writer. It does not depend on the number of readers, because simultaneous reading by the readers accesses the same buffer, irrespective of the number of readers. On the other hand, the buffer size that Chen's protocol [5] requires is

directly proportional to the number of readers, and is independent of the number of interferences. We now show that our protocol subsumes both Chen's protocol and Kopetz and Reisinger's protocol as special cases.

Lemma 3.1: The buffer size for Chen's protocol [5] is a special case of the WFBP solution given in Algorithm 1.

Proof: Assume that we are given $M$ readers and no information about interferences. We can map this problem to DSP by setting $\mathcal{R}$ as $[1, \infty]$ and the upper bounds of $\vec{x}$ as $\langle \infty, \ldots, \infty \rangle$. According to Theorem 2.2, $n(\{\vec{x}\})$ cannot exceed $n(\vec{x})$. Thus, the worst-case buffer size is obtained as $(M + 2)$, that is, $n(\vec{x}) + n(\{1, 2\})$. This is exactly the same value as that obtained by Chen's protocol.

Lemma 3.2: The buffer size for Kopetz and Reisinger's protocol [4] is a special case of the WFBP solution given in Algorithm 1.

Proof: Assume that we are given an infinite number of readers with a knowledge of $Top(\vec{u}) = N^{\max}$. This problem can be modeled as the problem with $\mathcal{R} = [1, N^{\max}+1]$ and $u_i = N^{\max}+1$ for all $i$, for the worst case. By Theorem 2.1, $H(\vec{u}, N) = \infty$, and whenever $t$ increases, $n(\{\vec{x}\})$ increases by 1 until $t$ and $n(\{\vec{x}\})$ reach $N^{\max}$ and $N^{\max}+1$, respectively. Thus, the worst-case buffer size is obtained as $N^{\max}+1$, i.e., $n(\{1, \ldots, N^{\max}+1\} \cup \{1, 2\})$. This is exactly the same value as that obtained by Kopetz and Reisinger's protocol.

Theorem 3.3 (Upper Bound of the WFBP solution): In the WFBP, $n^{\max}(\{\vec{x}\}) \le \min(M + 2, N^{\max} + 1)$, where $M$ is the number of readers and $N^{\max}$ is the maximum number of interferences that a reader can suffer.

Proof: The proof follows directly from Lemmas 3.1 and 3.2.

Chen's protocol is attractive because the number of interferences need not be known a-priori. On the other hand, Kopetz and Reisinger's protocol has the advantage that the required number of buffers can be further

reduced if the number of interferences is much smaller than the number of readers. Additionally, we note that the number of buffers needed by our algorithm is less than or equal to that of Chen's or Kopetz and Reisinger's protocol.

B. Buffer Size Conditions

According to Theorem 3.3, our wait-free protocol always finds a number of required buffers which is less than or equal to that of Chen's protocol or Kopetz and Reisinger's protocol. We now identify the precise conditions under which the required buffer size of our protocol is equal to that of Chen's or Kopetz and Reisinger's. To derive the conditions, we observe two properties in the WFBP. In the following theorem, we introduce a notation $\{\{\vec{x}\}\}$, which denotes the set including all possible solutions $\{\vec{x}\}$ for the given DSP.

Theorem 3.4 (Chen's Tester): When the number of readers in the wait-free buffer size decision problem is $M$ and $N^{\max} > M$, $\{3, \ldots, M+2\} \in \{\{\vec{x}\}\}$ if and only if $n^{\max}(\{\vec{x}\}) \ge M + 2$.

Proof: We prove both necessary and sufficient conditions.

Case 1. Assume that when $\{3, \ldots, M+2\} \in \{\{\vec{x}\}\}$, $n^{\max}(\{\vec{x}\}) < M + 2$. Since the size of the optimal solution is less than $M + 2$, the size of $\{\vec{x}\}$ cannot exceed $M + 2$. This contradicts our assumption that $\{1, 2\} \cup \{3, \ldots, M+2\}$ is a solution.

Case 2. Assume that the set $\{\vec{x}\}$ is $\{x_3, \ldots, x_{M+2}\}$, in which the $x_i$'s are different from each other and arranged in increasing order. Now, no $x_i$ may be 1 or 2, otherwise $n^{\max}(\{\vec{x}\})$ is less than $M + 2$. Therefore, $x_3$ should be greater than or equal to 3, and $x_4$ is greater than $x_3$. Inductively, $x_{i+1} \ge x_i + 1$, where $3 \le i < M + 2$. In other words, since $x_i \ge x_{i-1} + 1 \ge x_{i-2} + 2 \ge \cdots$, the inequality $u_i \ge x_i \ge i$ holds. For this reason, $\{3, \ldots, M+2\}$ satisfies the range constraints of all elements.
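The feasibility check behind Chen's tester can be sketched in Python. The sketch rests on an assumption that is not spelled out in the text: assigning the distinct values 3, ..., M+2 to M readers with upper bounds $u_i = N_i^{\max} + 1$ is possible exactly when, after sorting the bounds in increasing order, the k-th smallest bound is at least the k-th smallest tester value. The function name `chen_tester_feasible` is hypothetical.

```python
def chen_tester_feasible(n_max):
    """Check whether the tester {3, ..., M+2} fits the readers' ranges.

    n_max holds each reader's maximum number of interferences; reader i
    may select any value in [1, n_max[i] + 1].
    """
    upper = sorted(n + 1 for n in n_max)   # u_i = N_i^max + 1, ascending
    # The k-th smallest bound (k = 0, 1, ...) must admit the value k + 3.
    return all(bound >= k + 3 for k, bound in enumerate(upper))
```

For the seven-reader task set of Table I, with bounds 2, 2, 2, 3, 3, 14, and 49, the sorted upper bounds start with 3, 3, so the second tester value 4 does not fit and the check fails. When every reader can suffer many interferences, for example three readers with a bound of 10 each, the tester is feasible.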

By Theorem 3.4, $n^{\max}(\{\vec{x}\}) < M + 2$ if $\{3, \ldots, M+2\} \notin \{\{\vec{x}\}\}$. This means that by checking whether $\{3, \ldots, M+2\}$ is feasible for the problem, we can determine whether or not it requires the $M + 2$ buffers that Chen's protocol needs.

Theorem 3.5 (Kopetz and Reisinger's Tester): When the number of readers in the wait-free buffer size decision problem is $M$ and $N^{\max} \le M$, $\{2, \ldots, N^{\max}+1\} \in \{\{\vec{x}\}\}$ if and only if $n^{\max}(\{\vec{x}\}) \ge N^{\max} + 1$.

Proof: We prove both necessary and sufficient conditions.

Case 1. Assume that when $\{2, \ldots, N^{\max}+1\} \in \{\{\vec{x}\}\}$, $n^{\max}(\{\vec{x}\}) < N^{\max} + 1$. Since the size of the optimal solution is less than $N^{\max} + 1$, the size of $\{\vec{x}\}$ cannot exceed $N^{\max} + 1$. This contradicts our assumption that $\{1, 2\} \cup \{2, \ldots, N^{\max}+1\}$ is a solution.

Case 2. Assume that the set $\{\vec{x}\}$ is $\{x_2, \ldots, x_{N^{\max}+1}\}$, in which the $x_i$'s are different from each other and arranged in increasing order. Now, no $x_i$ may be 1, otherwise $n^{\max}(\{\vec{x}\})$ is less than $N^{\max} + 1$. Therefore, $x_2$ should be greater than or equal to 2, and $x_3$ is greater than $x_2$. Inductively, $x_{i+1} \ge x_i + 1$, where $2 \le i < N^{\max} + 1$. In other words, since $x_i \ge x_{i-1} + 1 \ge x_{i-2} + 2 \ge \cdots$, the inequality $u_i \ge x_i \ge i$ holds. For this reason, $\{2, \ldots, N^{\max}+1\}$ satisfies the range constraints of all elements.

We can also investigate whether a given WFBP needs $N^{\max} + 1$ buffers or fewer by checking feasibility with $\{2, \ldots, N^{\max}+1\}$. We call $\{3, \ldots, M+2\}$ and $\{2, \ldots, N^{\max}+1\}$ Chen's tester and Kopetz and Reisinger's tester, respectively.

From Theorems 3.4 and 3.5, we derive a decision procedure that determines the wait-free protocol with the lowest buffer size. Figure 5 shows this procedure. To illustrate it, we use the WFBP example in [7], which is also shown in Table I. By our decision procedure, since $N^{\max} > M$, Chen's protocol requires a smaller number of buffers than Kopetz and Reisinger's. The next step is determining whether Chen's tester, which is $\langle 3, 4, \ldots, 9 \rangle$

in this problem, is feasible. It turns out that it is not feasible, as the second element 4 in the tester is out of the range [1, 3] of reader 1. Hence, we expect to find a smaller number of required buffers than that of Chen's protocol.

[Fig. 5. Decision Procedure]

TABLE I
TASK SET

    Task        N^max
    Reader 0    2
    Reader 1    2
    Reader 2    2
    Reader 3    3
    Reader 4    3
    Reader 5    14
    Reader 6    49

Algorithm 1 determines that we need 6 buffers for this problem. We determine a vector $\{\vec{x}\} = \{1, 2, 3, 4, 15, 50\}$ as a worst-case candidate for the WFBP from Theorem 2.2. As mentioned earlier, the solution means that one of the worst cases occurs when we need buffers for writers $\{W^{[1]}, W^{[2]}, W^{[3]}, W^{[4]}, W^{[15]}, W^{[50]}\}$.

C. Comparison with Improved Chen's Protocol

In [7], Huang et al. suggest a transformation mechanism to reduce the buffer size needs of a given wait-free protocol. The transformation is applied to many wait-free protocols including Chen's protocol. The transformed Chen's protocol is called Improved Chen's protocol in [7].
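Returning to the Table I example, the computation of Algorithm 1 and the solution set of Theorem 2.2 can be reproduced with a short Python sketch. It is an illustrative reading of the pseudocode, not the authors' code: the function name `wfbp` is hypothetical, the running count `covered` plays the role of the variable sum, and the final set union with {1, 2} replaces the two flags $on_1$ and $on_2$.

```python
def wfbp(n_max):
    """Return (buffer count, worst-case value set) for the WFBP.

    n_max[i] is reader i's maximum number of interferences, so reader i
    may select any value in [1, n_max[i] + 1].
    """
    if not n_max:
        raise ValueError("at least one reader is required")
    upper = sorted((n + 1 for n in n_max), reverse=True)  # descending u_i
    chosen = set()    # distinct values selected so far (the DSP solution)
    covered = 0       # readers whose range reaches down to t (the "sum")
    for t in range(upper[0], 0, -1):
        while covered < len(upper) and upper[covered] >= t:
            covered += 1
        if covered > len(chosen):
            chosen.add(t)          # a spare reader can take the value t
    # Values 1 (next write) and 2 (latest data) are always needed.
    worst_case = chosen | {1, 2}
    return len(worst_case), worst_case
```

Calling `wfbp([2, 2, 2, 3, 3, 14, 49])` returns 6 buffers and the set {1, 2, 3, 4, 15, 50}, matching the values reported for Table I. A single reader with at most 5 interferences needs 3 buffers, the three-slot case.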

We cannot formally compare our protocol with Improved Chen's protocol in terms of space cost, because no analytical foundation is given for the transformation mechanism in [7]. Consequently, only an experimental comparison is possible, where the two protocols are compared for as many cases as possible. We do this in Section IV.

Our experiments in Section IV reveal that the buffer size needs of our protocol and Improved Chen's are the same, for all the cases that we consider. Of course, this does not imply that Improved Chen's and ours always need the same number of buffers, because our evaluation studies in Section IV cannot cover all cases. Nevertheless, note that with Corollary 2.3, we guarantee that the buffer size needed for wait-free synchronization cannot be reduced any further. An additional advantage of our protocol is that it does not need to divide readers into fast and slow groups and apply two separate reading operations, as Improved Chen's does.

D. Comparison of Time Complexity

Implementations of Kopetz and Reisinger's and Chen's protocols require the Compare-And-Swap (CAS) instruction. The CAS instruction is used to atomically modify control variables of the wait-free protocol by combining comparison and swap operations into a single instruction. The instruction is available in many modern processors and takes constant time.

Kopetz and Reisinger's protocol has no loop in either the write or the read operation. However, Chen's protocol has 3 loops within the write operation and no loop within the read operation. With $n$ buffers, the time complexity of Chen's writing operation is $O(n)$. Improved Chen's protocol and our protocol are variations of Chen's protocol, and hence have time complexities similar to those of Chen's writing and reading. According to Theorem 3.3, the loop iteration in our protocol's write operation cannot exceed $M + 2$. Thus, the time complexity

of our protocol is $O(n)$, which is the same as that of Chen's. Since the asymptotic speeds are therefore similar, a speed improvement can be obtained (for Chen's, Improved Chen's, and ours) by reducing the buffer size. Table II summarizes the asymptotic time complexities of the protocols.

TABLE II
ASYMPTOTICAL TIME COMPLEXITIES

    Wait-Free Protocol          Read    Write
    Kopetz and Reisinger's      O(1)    O(1)
    Chen's                      O(1)    O(n)
    Improved Chen's             O(1)    O(n)
    Ours                        O(1)    O(n)

IV. NUMERICAL EVALUATION STUDIES

We conduct numerical evaluations to evaluate the buffer size needs of our protocol under a broad range of reader/writer conditions, including increasing maximum interferences and readers. We also consider Kopetz and Reisinger's, Chen's, and Improved Chen's protocols for comparative study. We consider Improved Chen's protocol among all protocols in [7], because it is the most space-efficient protocol in [7]. We exclude the Double Buffer protocol [7] from our study as it needs nearly twice the buffer space of Chen's protocol. (The Double Buffer protocol trades off space for time.) Thus, our protocol will clearly outperform the Double Buffer protocol in terms of buffer needs.

A. Increasing Interferences

We consider a task set with 1 writer and multiple readers whose maximum number of interferences $N^{\max}$ is randomly generated with a normal distribution (with a fixed standard deviation of 5), and by varying the average. The protocols are evaluated by their buffer size needs; the actual amount of needed memory is the number of buffers times the message size in bytes. Each experiment is repeated 100 times to determine the average buffer sizes.
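The experiment setup described above can be approximated with the following Python sketch. Several details are assumptions made here rather than facts from the paper: samples are rounded to integers and clipped to at least 1, the seed is fixed for repeatability, and the names `wfbp_buffers` and `average_buffers` are hypothetical. The buffer counts of the compared protocols follow Lemmas 3.1 and 3.2 (M + 2 and $N^{\max}$ + 1).

```python
import random


def wfbp_buffers(n_max):
    """Buffer count of Algorithm 1 for per-reader interference bounds."""
    upper = sorted((n + 1 for n in n_max), reverse=True)  # u_i = N_i^max + 1
    picked, covered = 0, 0
    for t in range(upper[0], 2, -1):       # candidate values max(u) .. 3
        while covered < len(upper) and upper[covered] >= t:
            covered += 1                   # readers whose range includes t
        if covered > picked:
            picked += 1                    # one more distinct value fits
    # Values 1 and 2 (next write, latest data) are counted in every case.
    return picked + 2


def average_buffers(m, mean, sd=5.0, runs=100, seed=0):
    """Average buffer needs over `runs` random task sets with m readers."""
    rng = random.Random(seed)
    totals = {"ours": 0, "chen": 0, "kr": 0}
    for _ in range(runs):
        # Assumption: round each sample and clip it to at least 1.
        n_max = [max(1, round(rng.gauss(mean, sd))) for _ in range(m)]
        totals["ours"] += wfbp_buffers(n_max)
        totals["chen"] += m + 2            # Chen's protocol [5]
        totals["kr"] += max(n_max) + 1     # protocol of [4]
    return {name: total / runs for name, total in totals.items()}
```

A call such as `average_buffers(20, 25)` returns the three averages for 20 readers and a mean of 25 interferences. The exact numbers depend on the sampling assumptions, so they are not expected to reproduce the paper's plots digit for digit.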

[Fig. 6. Buffer Sizes Under Increasing Interferences With Normal Distribution for $N^{\max}$: (a) M = 20, (b) M = 30, (c) M = 40. X-axis: average of maximum numbers of interferences; Y-axis: number of required buffers.]

Figure 6 shows the buffer size needs of each protocol as the average $N^{\max}$ is increased from 5 to 45, for 20, 30, and 40 readers. From the figure, we observe that as $N^{\max}$ increases, the buffer size needs of Kopetz and Reisinger's protocol increase, whereas those of Chen's protocol remain the same (for a given number of readers), since its buffer needs are proportional only to the number of readers. As the number of readers increases from 20 to 40, Chen's protocol needs an increasing number of buffers. Meanwhile, the number of buffers that our protocol requires never exceeds that of Chen's and Kopetz and Reisinger's, as Theorem 3.3 holds.

Interestingly, the number of buffers that Improved Chen's protocol requires is exactly the same as that of ours. Note that no analysis on the buffer size needs of Improved Chen's is presented in [7], whereas Theorem 3.3 gives the upper bound on the buffer size needs of our protocol.

We observed exactly the same trends for other fixed standard deviations of $N^{\max}$'s distribution, and for other distributions of $N^{\max}$. Figure 7(a) shows the buffer size needs of each protocol when $N^{\max}$ is generated with a normal distribution, with a fixed standard deviation of 10 (instead of 5), and by varying the average $N^{\max}$ from 5 to 45, for 40 readers. Figure 7(b) shows the protocols' buffer needs under the exact same conditions as those in Figure 7(a), except that $N^{\max}$ is now generated with a uniform distribution.

Fig. 7. Buffer Sizes Under Different N^max Distributions with 40 Readers: (a) Normal Distribution, (b) Uniform Distribution. (Axes: number of required buffers vs. average of maximum numbers of interferences.)

From the figures, we observe that our protocol's buffer needs never exceed those of Chen's and NBW, and are the same as those of Improved Chen's.

B. Heterogeneous Readers in Multiple Groups

From Figure 6, we also observe that when most readers have small N^max, the number of buffers needed by our protocol approaches that of NBW. Moreover, when most readers have larger N^max, the number of buffers needed by our protocol approaches that of Chen's protocol. This motivates us to study the buffer size needs of our protocol under two groups of readers, one with small N^max values and the other with large N^max values. (A similar evaluation is conducted in [7], where readers are classified as "fast" and "slow.")

We divide the tasks into two groups whose (normal) N^max distributions have averages fixed at 5 and 45, respectively. We then vary the ratio of the two groups. For example, 3:1 on the X-axis in Figure 8(a) means that there are three times as many readers with smaller N^max as readers with larger N^max.

Figure 8 shows the buffer sizes of each protocol as the ratio is varied from 3:1 to 1:3, for 20, 30, and 40 readers. We observe that the buffers needed by NBW, Improved Chen's, and our protocol increase as the number of readers with larger N^max increases. This result is consistent with that in [7], where Improved Chen's is shown to require fewer buffers as the number of fast readers with smaller N^max increases. The results confirm that ours and Improved Chen's require the minimum buffer size when considering two heterogeneous reader groups.

Fig. 8. Buffer Sizes With 2 Reader Groups Under Varying Reader Ratio, for 20, 30, and 40 Readers: (a) M = 20, (b) M = 30, (c) M = 40. (Axes: number of required buffers vs. ratio of two reader groups, from 3:1 to 1:3.)

We now consider a more complex scenario with three reader groups, called fast, medium, and slow, which is not considered in [7]. The averages of the (normal) N^max distributions for the three groups are fixed at 5, 25, and 45, respectively, and the ratio of the three groups is varied from 6:3:1 to 1:3:6. Figure 9 shows the results.

Fig. 9. Buffer Sizes With 3 Reader Groups Under Varying Reader Ratio, for 20, 30, and 40 Readers: (a) M = 20, (b) M = 30, (c) M = 40. (Axes: number of required buffers vs. ratio of three reader groups, from 6:3:1 to 1:3:6.)

From the figure, we observe that as the number of fast readers increases, the number of buffers needed decreases. Further, we observe that the buffer size required by Improved Chen's is the same as that of ours even when we include the medium reader group in our evaluation.
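As an illustration of how a group ratio such as 3:1 or 6:3:1 might be turned into a concrete task set, the sketch below splits M readers proportionally and samples N^max per group. The paper does not say how ratios that do not divide M evenly are rounded; the largest-remainder rule used here, the clamping of samples to at least 1, and all function names are assumptions of this sketch.

```python
import random


def split_readers(total, ratio):
    """Split `total` readers into groups proportional to `ratio`.

    Leftover readers after flooring go to the groups with the largest
    fractional remainders (an assumed rounding policy).
    """
    weight = sum(ratio)
    exact = [total * r / weight for r in ratio]
    sizes = [int(x) for x in exact]
    leftover = total - sum(sizes)
    by_remainder = sorted(
        range(len(ratio)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def sample_groups(sizes, means, std=5.0, seed=0):
    """Sample N^max for each group from a normal distribution per group mean."""
    rng = random.Random(seed)
    return [
        [max(1, round(rng.gauss(mean, std))) for _ in range(size)]
        for size, mean in zip(sizes, means)
    ]


# Two groups (fast, slow) with means 5 and 45, at a 3:1 ratio for M = 20.
two_groups = sample_groups(split_readers(20, (3, 1)), (5, 45))

# Three groups (fast, medium, slow) with means 5, 25, 45, at 6:3:1 for M = 20.
three_groups = sample_groups(split_readers(20, (6, 3, 1)), (5, 25, 45))
```

With M = 20, a 4:1 ratio yields 16 fast and 4 slow readers, which is the case used in the implementation study.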

V. IMPLEMENTATION EXPERIENCE

A wait-free protocol's practical effectiveness is determined by its space and time costs. In developing a wait-free protocol, we focus on optimizing space costs, and we establish the space optimality of our protocol. Although reducing the protocol's time costs is not our goal, we now determine the time costs to establish our protocol's effectiveness.

Our wait-free protocol (Algorithms 2 and 3) is a modification of Chen's protocol, augmented with the buffer size computed by Algorithm 1. Thus, we expect our protocol to incur at most as much time overhead as Chen's. Moreover, the higher space efficiency of our protocol can lead to higher time efficiency, because it reduces the search space for determining the protocol's safe buffer, e.g., the loop in GetBuf() in Algorithm 2.

To evaluate the actual time costs of our protocol, we implement it in the SHaRK (Soft Hard Real-Time Kernel) OS [10], running on a 500 MHz Pentium-III processor. As in Section IV, we also implement Chen's, Improved Chen's, and NBW protocols for a comparative study. We also consider lock-based sharing in this study. Note that all protocols in our study can be adopted for both uniprocessor and multiprocessor systems, although we consider only uniprocessor performance in this section.

We consider a task set with 20 readers and a writer, and use a message size of 8 bytes for an inter-process communication (or IPC). We measure the average-case execution time (or ACET) and the worst-case execution time (or WCET) for performing an IPC. The execution time for an IPC is the time needed to execute the code segment that accesses the shared object. With traditional lock-based sharing, this code segment is the critical section. Note that a wait-free protocol's IPC execution time includes the time for controlling the protocol's variables, accessing the shared object, and potential interference from other tasks. Thus, WCET tends to be much larger than ACET.

In Section IV-B, we varied the ratio of two reader groups whose (normal) N^max distributions have averages fixed at 5 and 45, respectively. We now select two cases in which the ratios of readers having smaller and larger N^max are 4:1 and 1:4, respectively. These two cases can be represented as 16 fast and 4 slow readers, and 4 fast and 16 slow readers, respectively, for the purpose of Improved Chen's [7], since that protocol needs the readers to be classified as slow or fast. We fix the writer's period at 0.2 msec and invoke the writer 6,000,000 times during our experiment interval for computing the ACETs. The periods of the 20 readers range from 400 usec to approximately 10 msec.

Fig. 10. ACET of Read/Write in SHaRK RTOS: (a) 16 Fast and 4 Slow Readers, (b) 4 Fast and 16 Slow Readers. (Y-axis: average execution time in usec, scaled from 0.0 to 1.0; an off-scale value next to the lock-based entry is labeled 13.8 in (a) and 9.9 in (b).)

Figure 10 shows the measurements from our implementation. We observe that NBW has the smallest ACET, lock-based sharing has the largest ACET, and Chen's, Improved Chen's, and our protocol have almost the same ACET in our implementation. NBW has the smallest ACET because its implementation has no loop (and thus a lower computational cost) in either the reader or the writer operation. Lock-based sharing has the largest ACET due to its blocking times. Further, acquiring and releasing locks in SHaRK is done through system calls, which take longer than the operations of the wait-free protocols (which are implemented without system calls).
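To make the two metrics concrete, the following sketch computes ACET as the mean and WCET as the maximum of a list of measured IPC execution times, and derives the length of the experiment interval from the writer's period and invocation count. The sample values are hypothetical, and the helper names are illustrative; the paper does not describe its measurement harness.

```python
def summarize(samples_usec):
    """Return ACET (mean) and WCET (maximum) of measured execution times."""
    if not samples_usec:
        raise ValueError("at least one measurement is required")
    return {
        "acet": sum(samples_usec) / len(samples_usec),
        "wcet": max(samples_usec),
    }


def experiment_interval_s(period_usec, invocations):
    """Length of the experiment interval in seconds for a periodic task."""
    return period_usec * invocations / 1_000_000


# Hypothetical measurements (usec); a single interfered run dominates WCET.
stats = summarize([0.5, 0.7, 0.6, 2.4])

# Writer period of 0.2 msec (200 usec), invoked 6,000,000 times.
interval = experiment_interval_s(200, 6_000_000)  # 1200 seconds
```

The interval works out to 1200 seconds, i.e., 20 minutes of writer activity per experiment.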

Fig. 11. WCET of Read/Write in SHaRK RTOS: (a) 16 Fast and 4 Slow Readers, (b) 4 Fast and 16 Slow Readers. (Y-axis: worst-case execution time in usec, scaled from 0 to 6; an off-scale value next to the lock-based entry is labeled 31.2 in (a) and 21.6 in (b).)

In [7], as the number of fast readers increases, the ACET of Improved Chen's tends to be shorter because the needed buffer size decreases and NBW, a part of Improved Chen's, performs faster. This trend does not appear in our experiments, because the expected speed improvement is only (approximately) 0.1 usec. This difference is small enough to be affected by the OS type, code optimizations, and measurement methodology, among other factors. We observed similar results for WCET in Figure 11. Although reducing the protocol's time costs is not our goal, we observe that the variations of Chen's protocol, including Chen's, Improved Chen's, and ours, have much the same ACET and WCET, at least in our implementation; thus, we believe that our protocol's time costs are comparable to those of previous protocols.

In Section III-B, we suggested a decision procedure that determines the wait-free protocol having the lowest buffer size. Before applying our protocol, we can determine which protocol, among Chen's, NBW, and ours, requires the least buffer size by using the decision procedure described in Figure 5. We now apply this decision procedure to the 16 fast/4 slow-reader example considered previously. Table III shows the N^max values of the 16 fast and 4 slow readers. At the first step in the decision procedure, we can easily find that the largest N^max = 47 > M = 20. This implies that Chen's protocol needs a lower buffer size than NBW. The next step is to check whether Chen's tester is feasible. Chen's tester is evaluated as {3,...,22} by Theorem 3.4.

TABLE III
DECISION PROCEDURE ON 16 FAST AND 4 SLOW READERS

Task        N^max + 1   Chen's Tester   Feasibility
Reader 0    48          22              O
Reader 1    47          21              O
Reader 2    47          20              O
Reader 3    47          19              O
Reader 4    10          18              X
Reader 5    9           17              X
Reader 6    9           16              X
Reader 7    9           15              X
Reader 8    8           14              X
Reader 9    7           13              X
Reader 10   7           12              X
Reader 11   6           11              X
Reader 12   6           10              X
Reader 13   4           9               X
Reader 14   3           8               X
Reader 15   3           7               X
Reader 16   3           6               X
Reader 17   3           5               X
Reader 18   3           4               X
Reader 19   3           3               O

Fig. 12. Buffer Sizes: (a) 16 Fast and 4 Slow Readers, (b) 4 Fast and 16 Slow Readers. (Y-axis: number of required buffers.)

Table III indicates that Chen's tester is not feasible because, for example, 18 in the Chen's tester column is not between 1 and 10. Therefore, at the final step, we can conclude that our protocol requires fewer buffers than Chen's, which Figure 12 confirms.
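The two steps applied to Table III can be sketched as follows. The sketch encodes only what this passage states: Chen's is preferred over NBW [4] when the largest N^max exceeds M, the tester values run from M + 2 down to 3, and a row is feasible when its tester value lies between 1 and N^max + 1. Pairing the tester values with readers sorted by decreasing N^max + 1 follows the layout of Table III and is an assumption, as are the function names; the full procedure of Figure 5 and Theorem 3.4 is not reproduced here.

```python
# "N^max + 1" column of Table III (16 fast and 4 slow readers, M = 20).
TABLE_III = [48, 47, 47, 47, 10, 9, 9, 9, 8, 7, 7, 6, 6, 4, 3, 3, 3, 3, 3, 3]


def chen_tester(readers):
    """Tester values M + 2, M + 1, ..., 3 (i.e., {3,...,22} for M = 20)."""
    return list(range(readers + 2, 2, -1))


def feasibility(n_max_plus_1):
    """Per-row feasibility: tester value must lie in [1, N^max + 1].

    Rows are taken in decreasing order of N^max + 1, as in Table III.
    """
    ordered = sorted(n_max_plus_1, reverse=True)
    return [1 <= t <= n for n, t in zip(ordered, chen_tester(len(ordered)))]


def decide(n_max_plus_1):
    """Apply the two steps described in the text to one task set."""
    readers = len(n_max_plus_1)
    largest_n_max = max(n_max_plus_1) - 1
    tester_feasible = all(feasibility(n_max_plus_1))
    return {
        # Step 1: largest N^max > M implies Chen's needs fewer buffers than NBW.
        "chen_fewer_than_nbw": largest_n_max > readers,
        # Step 2: is every row of Chen's tester feasible?
        "tester_feasible": tester_feasible,
        # Final step: an infeasible tester means ours needs fewer buffers than
        # Chen's. The feasible case is not discussed in this passage, so False
        # only means that this argument does not apply.
        "ours_fewer_than_chen": not tester_feasible,
    }


result = decide(TABLE_III)
```

For the Table III data, the per-row result reproduces the Feasibility column (four feasible rows, fifteen infeasible rows, and a feasible last row), so the tester as a whole is infeasible.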

Figure 12 shows the number of buffers required by each protocol.

VI. CONCLUSIONS

In this paper, we consider the single-writer/multiple-reader problem that occurs in embedded real-time systems. We present an analytical solution to the problem of determining the absolute minimum buffer requirement of wait-free protocols for this problem. This is the first such bound established for wait-free protocols that consider a-priori knowledge of interferences. We also show that the space costs required by previous algorithms, including Chen's and NBW, can be obtained by our solution, which subsumes them as special cases. We also present a wait-free protocol that uses the minimum buffer size determined by our analytical solution. Our evaluation studies and implementation measurements validate our analytical results.

Some aspects of the work are directions for further research. Examples include extending the protocol to the multiple-writer/multiple-reader problem, and to complex concurrent objects such as (non-blocking) stacks and queues.

VII. ACKNOWLEDGEMENTS

This work was sponsored by the US Office of Naval Research under Grant N00014-00-1-0549 and The MITRE Corporation under Grant 52917. A preliminary version of this paper appeared in [12].

REFERENCES

[1] L. Sha, R. Rajkumar, and J. P. Lehoczky, "Priority inheritance protocols: An approach to real-time synchronization," IEEE Transactions on Computers, vol. 39, no. 9, pp. 1175-1185, 1990.
[2] T. P. Baker, "Stack-based scheduling of real-time processes," Real-Time Systems, vol. 3, no. 1, pp. 67-99, Mar. 1991.
[3] R. K. Clark, "Scheduling dependent real-time activities," Ph.D. dissertation, Carnegie Mellon University, 1990.
[4] H. Kopetz and J. Reisinger, "The non-blocking write protocol NBW: A solution to a real-time synchronisation problem," in IEEE Real-Time Systems Symposium, 1993, pp. 131-137.

[5] J. Chen and A. Burns, "A fully asynchronous reader/writer mechanism for multiprocessor real-time systems," University of York, Tech. Rep. YCS-288, May 1997.
[6] J. H. Anderson, R. Jain, and S. Ramamurthy, "Wait-free object-sharing schemes for real-time uniprocessors and multiprocessors," in IEEE Real-Time Systems Symposium, Dec. 1997, pp. 111-122.
[7] H. Huang, P. Pillai, and K. G. Shin, "Improving wait-free algorithms for interprocess communication in embedded real-time systems," in USENIX Annual Technical Conference, 2002, pp. 303-316.
[8] J. H. Anderson, S. Ramamurthy, and K. Jeffay, "Real-time computing with lock-free shared objects," ACM Transactions on Computer Systems, vol. 15, no. 2, pp. 134-165, 1997.
[9] H. Sundell and P. Tsigas, "Space efficient wait-free buffer sharing in multiprocessor real-time systems based on timing information," in IEEE Real-Time Computing Systems and Applications, 2000, pp. 433-440.
[10] P. Gai, L. Abeni, M. Giorgi, and G. Buttazzo, "A new kernel approach for modular real-time systems development," in Euromicro Conference on Real-Time Systems, 2001, pp. 199-206.
[11] J. Chen and A. Burns, "A three-slot asynchronous reader/writer mechanism for multiprocessor real-time systems," University of York, Tech. Rep. YCS-186, 1997.
[12] H. Cho, B. Ravindran, and E. D. Jensen, "A space-optimal, wait-free real-time synchronization protocol," in IEEE Euromicro Conference on Real-Time Systems, July 2005, pp. 79-88.