The Data Locality of Work Stealing
- Stewart Wright
- 6 years ago
Umut A. Acar, School of Computer Science, Carnegie Mellon University
Guy E. Blelloch, School of Computer Science, Carnegie Mellon University
Robert D. Blumofe, Department of Computer Sciences, University of Texas at Austin, rdb@cs.utexas.edu

Abstract

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines. We present lower and upper bounds on the number of cache misses using work stealing, and introduce a locality-guided work-stealing algorithm along with experimental validation.

As a lower bound, we show that there is a family of multithreaded computations $G_n$, each member of which requires $\Theta(n)$ total instructions (work), for which, when using work stealing, the total number of cache misses on one processor is constant, while even on two processors the total number of cache misses is $\Omega(n)$. For nested-parallel computations, however, we show that on $P$ processors the expected additional number of cache misses beyond those on a single processor is bounded by $O(C \lceil m/s \rceil P T_\infty)$, where $m$ is the execution time of an instruction incurring a cache miss, $s$ is the steal time, $C$ is the size of the cache, and $T_\infty$ is the number of nodes on the longest chain of dependences. Based on this, we give strong bounds on the total running time of nested-parallel computations using work stealing.

For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads, but improves the performance up to 50% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing improves the performance of work stealing up to 80%.

1 Introduction

Many of today's parallel applications use sophisticated, adaptive algorithms, which are best realized with parallel programming systems that support dynamic, lightweight threads, such as Cilk [8], Nesl [5], Hood [10], and many others [ ].
The core of these systems is a thread scheduler that balances load among the processes. In addition to a good load balance, however, good data locality is essential in obtaining high performance from modern parallel systems.

Several researchers have studied techniques to improve the data locality of multithreaded programs. One class of such techniques is based on software-controlled distribution of data among the local memories of a distributed shared-memory system [ ]. Another class of techniques is based on hints supplied by the programmer so that similar tasks might be executed on the same processor [ ]. Both of these classes of techniques rely on the programmer or compiler to determine the data access patterns in the program, which may be very difficult when the program has complicated data access patterns. Perhaps the earliest class of techniques was to attempt to execute threads that are close in the computation graph on the same processor [ ]. The work-stealing algorithm is the most studied of these techniques [ ]. Blumofe et al. showed that fully strict computations achieve provably good data locality [7] when executed with the work-stealing algorithm on a dag-consistent distributed shared-memory system. In recent work, Narlikar showed that work stealing improves the performance of space-efficient multithreaded applications by increasing the data locality [29]. None of this previous work, however, has studied upper or lower bounds on the data locality of multithreaded computations executed on existing hardware-controlled shared-memory systems.

In this paper, we present theoretical and experimental results on the data locality of work stealing on hardware-controlled shared-memory systems (HSMSs). Our first set of results are upper and lower bounds on the number of cache misses in multithreaded computations executed by the work-stealing algorithm. Let $M_1(C)$ denote the number of cache misses in the uniprocessor execution and $M_P(C)$ denote the number of cache misses in a $P$-processor execution of a multithreaded computation by the work-stealing algorithm on an HSMS with cache size $C$.
Then, for a multithreaded computation with $T_1$ work (total number of instructions) and $T_\infty$ critical path (longest sequence of dependences), we show the following results for the work-stealing algorithm running on an HSMS.

+ Lower bounds on the number of cache misses for general computations: We show that there is a family of computations $G_n$ with $T_1 = \Theta(n)$ such that $M_1(C) = 3C$, while even on two processors the number of misses is $M_2(C) = \Omega(n)$.
+ Upper bounds on the number of cache misses for nested-parallel computations: For a nested-parallel computation, we show that $M_P(C) \le M_1(C) + 2C\tau$, where $\tau$ is the number of steals in the $P$-processor execution. We then show that the expected number of steals is $O(\lceil m/s \rceil P T_\infty)$, where $m$ is the time for a cache miss and $s$ is the time for a steal.
+ Upper bound on the execution time of nested-parallel computations: We show that the expected execution time of a nested-parallel computation on $P$ processors is $O\left(\frac{T_1(C)}{P} + m \lceil m/s \rceil C T_\infty + (m + s) T_\infty\right)$, where $T_1(C)$ is the uniprocessor execution time of the computation including cache misses.

[Figure 1: The speedup obtained by three different over-relaxation algorithms (work stealing, locality-guided work stealing, static partitioning), plotted against linear speedup. x-axis: number of processes; y-axis: speedup.]

As in previous work [6, 9], we represent a multithreaded computation as a directed acyclic graph (dag) of instructions. Each node in the dag represents a single instruction, and the edges represent ordering constraints. A nested-parallel computation [5, 6] is a race-free computation that can be represented with a series-parallel dag [33]. Nested-parallel computations include computations consisting of parallel loops and forks and joins, and any nesting of them. This class includes most computations that can be expressed in Cilk [8] and all computations that can be expressed in Nesl [5]. Our results show that nested-parallel computations have much better locality characteristics under work stealing than do general computations. We also briefly consider another class of computations, computations with futures [ ], and show that they can be as bad as general computations.

The second part of our results is on further improving the data locality of multithreaded computations with work stealing. In work stealing, a processor steals a thread from a randomly (with uniform distribution) chosen processor when it runs out of work. In certain applications, such as iterative data-parallel applications, random steals may cause poor data locality. Locality-guided work stealing is a heuristic modification to work stealing that allows a thread to have an affinity for a process. In locality-guided work stealing, when a process obtains work, it gives priority to a thread that has affinity for the process. Locality-guided work stealing can be used to implement a number of techniques that researchers suggest to improve data locality.
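The priority rule just described can be illustrated with a short sketch. It is a hypothetical, sequential simplification, not the paper's implementation: the `Thread` record, its `affinity` field, and `obtain_work` are names introduced here. Each process's pool is assumed to be a plain Python list whose last element is the bottom, and a process simply prefers the bottom-most thread whose affinity matches its own id, falling back to the ordinary bottom of the pool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    name: str
    affinity: Optional[int]  # id of the process this thread prefers (hypothetical field)


def obtain_work(pool: List[Thread], me: int) -> Optional[Thread]:
    """Hypothetical locality-guided rule for process `me`.

    pool[-1] is the bottom of the pool. Prefer the bottom-most thread that has
    affinity for `me`; otherwise behave like plain work stealing and take the
    bottom thread. Returns None when the pool is empty.
    """
    for i in range(len(pool) - 1, -1, -1):
        if pool[i].affinity == me:
            return pool.pop(i)
    return pool.pop() if pool else None


pool = [Thread("t1", 0), Thread("t2", 1), Thread("t3", 0)]
print([obtain_work(pool, p).name for p in (1, 0, 1)])  # ['t2', 't3', 't1']
```

Scanning from the bottom keeps the usual depth-first preference among threads that share an affinity; a real scheduler would also need to handle thieves and concurrent access, which this sketch ignores.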
With locality-guided work stealing, for example, the programmer can achieve an initial distribution of work among the processes, or schedule threads based on hints, by appropriately assigning affinities to threads in the computation.

Our preliminary experiments with locality-guided work stealing give encouraging results, showing that for certain applications the performance is very close to that of static partitioning in dedicated mode (i.e., when the user can lock down a fixed number of processors), but does not suffer a performance-cliff problem [10] in multiprogrammed mode (i.e., when processors might be taken by other users or the OS). Figure 1 shows a graph comparing work stealing, locality-guided work stealing, and static partitioning for a simple over-relaxation algorithm on a 14-processor Sun Ultra Enterprise. The over-relaxation algorithm iterates over a 1-dimensional array, performing a 3-point stencil computation on each step. The superlinear speedup for static partitioning and locality-guided work stealing is due to the fact that the data for each run does not fit into the L2 cache of one processor but fits into the collective L2 cache of several processors. For this benchmark, the following can be seen from the graph.

1. Locality-guided work stealing does significantly better than standard work stealing, since on each step the cache is prewarmed with the data it needs.
2. Locality-guided work stealing does approximately as well as static partitioning for up to 14 processes.
3. When trying to schedule more than 14 processes on 14 processors, static partitioning has a serious performance drop. The initial drop is due to load imbalance caused by the coarse-grained partitioning. The performance then approaches that of work stealing as the partitioning gets more fine-grained.

We are interested in the performance of work-stealing computations on hardware-controlled shared-memory systems (HSMSs). We model an HSMS as a group of identical processors, each of which has its own cache, and a single shared memory. Each cache contains $C$ blocks and is managed by the memory subsystem automatically.
We allow for a variety of cache organizations and replacement policies, including both direct-mapped and associative caches. We assign a server process to each processor and associate the cache of a processor with the process assigned to that processor. One limitation of our work is that we assume that there is no false sharing.

2 Related Work

As mentioned in Section 1, there are three main classes of techniques that researchers have suggested to improve the data locality of multithreaded programs. In the first class, the program data is distributed among the nodes of a distributed shared-memory system by the programmer, and a thread in the computation is scheduled on the node that holds the data that the thread accesses [ ]. In the second class, data-locality hints supplied by the programmer are used in thread scheduling [ ]. Techniques from both classes are employed in distributed shared-memory systems such as COOL and Illinois Concert [15, 22] and are also used to improve the data locality of sequential programs [31]. However, the first class of techniques does not apply directly to HSMSs, because HSMSs do not allow software-controlled distribution of data among the caches. Furthermore, both classes of techniques rely on the programmer to determine the data access patterns in the application and thus may not be appropriate for applications with complex data-access patterns.

The third class of techniques, which is based on executing threads that are close in the computation graph on the same process, is applied in many scheduling algorithms, including work stealing [ ]. Blumofe et al. showed bounds on the number of cache misses in a fully strict computation executed by the work-stealing algorithm under the dag-consistent distributed shared memory of Cilk [7]. Dag consistency is a relaxed memory-consistency model that is employed in the distributed shared-memory implementation of the Cilk language. In a distributed Cilk application, processes maintain dag consistency by means of the BACKER algorithm. In [7], Blumofe et al. bound the number of shared-memory cache misses in a distributed Cilk application for caches that are maintained with the LRU replacement policy. They assumed that accesses to the shared memory are distributed uniformly and independently, which is not generally true, because threads may concurrently access the same pages by algorithm design. Furthermore, they assumed that processes do not generate steal attempts frequently, by making processes do additional page transfers before they attempt to steal from another process.

3 The Model

In this section, we present a graph-theoretic model for multithreaded computations, describe the work-stealing algorithm, define series-parallel and nested-parallel computations, and introduce our model of an HSMS (hardware-controlled shared-memory system).

As with previous work [6, 9], we represent a multithreaded computation as a directed acyclic graph, a dag, of instructions (see Figure 2). Each node in the dag represents an instruction, and the edges represent ordering constraints. There are three types of edges: continuation, spawn, and dependency edges. A thread is a sequential ordering of instructions, and the nodes that correspond to the instructions are linked in a chain by continuation edges. A spawn edge represents the creation of a new thread and goes from the node representing the instruction that spawns the new thread to the node representing the first instruction of the new thread. A dependency edge from instruction $w$ of a thread to instruction $x$ of some other thread represents a synchronization between two instructions such that instruction $x$ must be executed after $w$. Throughout this paper, we draw spawn edges with thick straight arrows, dependency edges with curly arrows, and continuation edges with thick straight arrows. We also show paths with wavy lines.

[Figure 2: A dag (directed acyclic graph) for a multithreaded computation. Threads are shown as gray rectangles.]

For a computation with an associated dag $G$, we define the computational work $T_1$ as the number of nodes in $G$ and the critical path $T_\infty$ as the number of nodes on the longest path of $G$. Let $y$ and $z$ be any two nodes in a dag. Then we call $y$ an ancestor of $z$, and $z$ a descendant of $y$, if there is a path from $y$ to $z$.
Any node is its own descendant and ancestor. We say that two nodes are relatives if there is a path from one to the other; otherwise we say that the nodes are independent. The children of a node are independent, because otherwise the edge from the node to one child is redundant. We call a common descendant $w$ of $y$ and $z$ a merger of $y$ and $z$ if the paths from $y$ to $w$ and from $z$ to $w$ have only $w$ in common. We define the depth of a node $y$ as the number of edges on the shortest path from the root node to $y$. We define the least common ancestor of $y$ and $z$ as the ancestor of both $y$ and $z$ with maximum depth. Similarly, we define the greatest common descendant of $y$ and $z$ as the descendant of both $y$ and $z$ with minimum depth. An edge $(y, z)$ is redundant if there is a path between $y$ and $z$ that does not contain the edge $(y, z)$. The transitive reduction of a dag is the dag with all the redundant edges removed. In this paper, we are only concerned with the transitive reduction of the computational dags. We also require that the dags have a single node with in-degree 0, the root, and a single node with out-degree 0, the final node.

In a multiprocess execution of a multithreaded computation, independent nodes can execute at the same time. If two independent nodes read or modify the same data, we say that they are RR or WW sharing, respectively. If one node is reading and the other is modifying the data, we say they are RW sharing. RW or WW sharing can cause data races, and the output of a computation with such races usually depends on the scheduling of nodes. Such races are typically indicative of a bug [18]. We refer to computations that do not have any RW or WW sharing as race-free computations. In this paper, we consider only race-free computations.

The work-stealing algorithm is a thread-scheduling algorithm for multithreaded computations. The idea of work stealing dates back to the research of Burton and Sleep [11] and has been studied extensively since then [ ]. In the work-stealing algorithm, each process maintains a pool of ready threads and obtains work from its pool. When a process spawns a new thread, the process adds the thread into its pool.
When a process runs out of work and finds its pool empty, it chooses a random process as its victim and tries to steal work from the victim's pool.

In our analysis, we imagine the work-stealing algorithm operating on individual nodes in the computation dag rather than on the threads. Consider a multithreaded computation and its execution by the work-stealing algorithm. We divide the execution into discrete time steps such that at each step, each process is either working on a node, which we call the assigned node, or is trying to steal work. The execution of a node takes 1 time step if the node does not incur a cache miss, and $m$ steps otherwise. We say that a node is executed at the time step that a process completes executing the node. The execution time of a computation is the number of time steps that elapse between the time step that a process starts executing the root node and the time step that the final node is executed. The execution schedule specifies the activity of each process at each time step.

During the execution, each process maintains a deque (double-ended queue) of ready nodes; we call the ends of a deque the top and the bottom. When a node $y$ is executed, it enables some other node $z$ if $y$ is the last parent of $z$ that is executed. We call the edge $(y, z)$ an enabling edge and $y$ the designated parent of $z$. When a process executes a node that enables other nodes, one of the enabled nodes becomes the assigned node, and the process pushes the rest onto the bottom of its deque. If no node is enabled, then the process obtains work from its deque by removing a node from the bottom of the deque. If a process finds its deque empty, it becomes a thief and steals from a randomly chosen process, the victim. This is a steal attempt, and it takes at least $s$ and at most $ks$ time steps, for some constant $k \ge 1$, to complete. A thief process might make multiple steal attempts before succeeding, or might never succeed. When a steal succeeds, the thief process starts working on the stolen node at the step following the completion of the steal. We say that a steal attempt occurs at the step it completes.

The work-stealing algorithm can be implemented in various ways.
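The per-process deque discipline described above can be sketched sequentially. This is an illustration under stated assumptions, not a concurrent implementation: there is no synchronization between the owner and thieves, the names `ReadyDeque` and `next_assigned` are introduced here, and making the first enabled node the assigned node is just one possible deterministic choice.

```python
from collections import deque
from typing import Hashable, Optional, Sequence


class ReadyDeque:
    """Deque of ready nodes for one process (sequential sketch, no locking).

    The left end is the top (where thieves steal) and the right end is the
    bottom (where the owner pushes and pops).
    """

    def __init__(self) -> None:
        self._items: deque = deque()

    def push_bottom(self, node: Hashable) -> None:
        self._items.append(node)

    def pop_bottom(self) -> Optional[Hashable]:
        return self._items.pop() if self._items else None

    def steal_top(self) -> Optional[Hashable]:
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


def next_assigned(dq: ReadyDeque, enabled: Sequence[Hashable]) -> Optional[Hashable]:
    """Pick the owner's next assigned node after it executes a node.

    If the executed node enabled other nodes, the first one becomes the
    assigned node and the rest are pushed onto the bottom in order.
    Otherwise the owner pops from the bottom; None means the deque is
    empty and the process becomes a thief.
    """
    if enabled:
        for node in enabled[1:]:
            dq.push_bottom(node)
        return enabled[0]
    return dq.pop_bottom()
```

For example, after a node enables `a`, `b`, and `c`, the owner continues with `a`, a thief calling `steal_top` takes `b` (the oldest entry), and the owner's next `pop_bottom` returns `c`.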
We say that an implementation of work stealing is deterministic if, whenever a process enables other nodes, the implementation always chooses the same node as the assigned node for the next step on that process, and the remaining nodes are always placed in the deque in the same order. This must be true for both multiprocess and uniprocess executions. We refer to a deterministic implementation of the work-stealing algorithm, together with the HSMS that runs the implementation, as a work stealer. For brevity, we refer to an execution of a multithreaded computation with a work stealer as an execution. We define the total work as the number of steps taken by a uniprocess execution, including the cache misses, and denote it by $T_1(C)$, where $C$ is the cache size. We denote the number of cache misses in a $P$-process execution with $C$-block caches as $M_P(C)$. We define the cache overhead of a $P$-process execution as $M_P(C) - M_1(C)$, where $M_1(C)$ is the number of misses in the uniprocess execution on the same work stealer.

We refer to a multithreaded computation for which the transitive reduction of the corresponding dag is series-parallel [33] as a series-parallel computation. A series-parallel dag $G = (V, E)$ is a dag with two distinguished vertices, a source $s \in V$ and a sink $t \in V$, and can be defined recursively as follows (see Figure 3).

+ Base: $G$ consists of a single edge connecting $s$ to $t$.
+ Series Composition: $G$ consists of two series-parallel dags $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with disjoint edge sets such that $s$ is the source of $G_1$, $u$ is the sink of $G_1$ and the source of $G_2$, and $t$ is the sink of $G_2$. Moreover, $V_1 \cap V_2 = \{u\}$.
+ Parallel Composition: $G$ consists of two series-parallel dags $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with disjoint edge sets such that $s$ and $t$ are the source and the sink of both $G_1$ and $G_2$. Moreover, $V_1 \cap V_2 = \{s, t\}$.

[Figure 3: The recursive definition of series-parallel dags: (a) the base case, (b) series composition, (c) parallel composition.]

A nested-parallel computation is a race-free series-parallel computation [6].

We also consider multithreaded computations that use futures [ ]. The dag structures of computations with futures are defined elsewhere [4]. This is a superclass of nested-parallel computations, but still much more restrictive than general computations. The work-stealing algorithm for futures is a restricted form of the work-stealing algorithm in which a process starts executing a newly created thread immediately, putting its assigned thread onto its deque.

In our analysis, we consider several cache organizations and replacement policies for an HSMS. We model a cache as a set of (cache) lines, each of which can hold the data belonging to a memory block (a consecutive, typically small, region of memory). One instruction can operate on at most one memory block. We say that an instruction accesses a block, or the line that contains the block, when the instruction reads or modifies the block.
We say that an instruction overwrites a line that contains the block $b$ when the instruction accesses some other block that replaces $b$ in the cache. We say that a cache replacement policy is simple if it satisfies two conditions. First, the policy is deterministic. Second, whenever the policy decides to overwrite a cache line $l$, it makes the decision to overwrite $l$ by only using information pertaining to the accesses that are made after the last access to $l$. We refer to a cache managed with a simple cache-replacement policy as a simple cache. Simple caches and replacement policies are common in practice. For example, the least-recently-used (LRU) replacement policy, direct-mapped caches, and set-associative caches where each set is maintained by a simple cache-replacement policy are all simple.

[Figure 4: The structure for the dag of a computation with a large cache overhead.]

In regard to the definition of RW or WW sharing, we assume that reads and writes pertain to the whole block. This means we do not allow for false sharing, in which two processes accessing different portions of a block invalidate the block in each other's caches. In practice, false sharing is an issue, but it can often be avoided with knowledge of the underlying memory system and by appropriately padding the shared data to prevent two processes from accessing different portions of the same block.

4 General Computations

In this section, we show that the cache overhead of a multiprocess execution of a general computation, and of a computation with futures, can be large even though the uniprocess execution incurs a small number of misses.

Theorem 1 There is a family of computations $\{G_n : n = kC, \text{ for } k \in \mathbb{Z}^+\}$ with $O(n)$ computational work, whose uniprocess execution incurs $3C$ misses while any 2-process execution of the computation incurs $\Omega(n)$ misses on a work stealer with a cache size of $C$, assuming that $S = O(C)$, where $S$ is the maximum steal time.

Proof: Figure 4 shows the structure of the dag $G_n$ for $n = 4C$. Each node except the root node represents a sequence of $C$ instructions accessing a set of $C$ distinct memory blocks. The root node represents $C + S$ instructions that access $C$ distinct memory blocks.
The graph has two symmetric components $L_n$ and $R_n$, which correspond to the left and the right subtree of the root, excluding the leaves. We partition the nodes in $G_n$ into three classes, such that all nodes in a class access the same memory blocks, while nodes from different classes access mutually disjoint sets of memory blocks. The first class contains the root node only, the second class contains all the nodes in $L_n$, and the third class contains the rest of the nodes, which are the nodes in $R_n$ and the leaves of $G_n$. For general $n = kC$, $G_n$ can be partitioned similarly into $L_n$, $R_n$, the $k$ leaves of $G_n$, and the root. Each of $L_n$ and $R_n$ has the structure of a complete binary tree with $k$ additional leaves at the lowest level. There is a dependency edge from the leaves of both $L_n$ and $R_n$ to the leaves of $G_n$.

Consider a work stealer that executes the nodes of $G_n$ in the order that they are numbered in a uniprocess execution. In the uniprocess execution, no node in $L_n$ except its root incurs a cache miss, since all nodes in $L_n$ access the same memory blocks as the root of $L_n$. The same argument holds for $R_n$ and the $k$ leaves of $G_n$. Hence the execution of the nodes in $L_n$, $R_n$, and the leaves causes $2C$ misses. Since the root node causes $C$ misses, the total number of misses in the uniprocess execution is $3C$.

[Figure 5: The structure for the dag of a computation with futures that can incur a large cache overhead.]

Now consider a 2-process execution with the same work stealer, and call the processes process 0 and process 1. At time step 1, process 0 starts executing the root node, which enables the root of $R_n$ no later than time step $m$. Since process 1 starts stealing immediately and there are no other processes to steal from, process 1 steals and starts working on the root of $R_n$ no later than time step $m + S$. Hence the root of $R_n$ executes before the root of $L_n$, and thus all the nodes in $R_n$ execute before the corresponding symmetric nodes in $L_n$. Therefore, for any leaf of $G_n$, the parent that is in $R_n$ executes before the parent in $L_n$. Therefore a leaf node of $G_n$ is executed immediately after its parent in $L_n$ and thus causes $C$ cache misses. Thus the total number of cache misses is $\Omega(kC) = \Omega(n)$.

There exist computations similar to the computation in Figure 4 that generalize Theorem 1 to an arbitrary number of processes, by making sure that all the processes but 2 steal throughout any multiprocess execution. Even in the general case, however, where the average parallelism is higher than the number of processes, Theorem 1 can be generalized with the same bound on the expected number of cache misses by exploiting the symmetry in $G_n$ and by assuming a symmetrically distributed steal time. With a symmetrically distributed steal time, for any $\epsilon$, a steal that takes $\epsilon$ steps more than the mean steal time is equally likely to happen as a steal that takes $\epsilon$ steps less than the mean.

Theorem 1 holds for computations with futures as well. Multithreaded computing with futures is a fairly restricted form of multithreaded computing compared to computing with events such as synchronization variables. The graph in Figure 5 shows the structure of a dag whose 2-process execution causes a large number of cache misses. In a 2-process execution of this dag, the enabling parents of the leaf nodes in the right subtree of the root are in the left subtree, and therefore the execution of each such leaf node causes $C$ misses.
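The next section works with the series-parallel dags defined recursively in Section 3. The three rules can be sketched as constructors that check the side conditions: disjoint edge sets, $V_1 \cap V_2 = \{u\}$ for series composition, and $V_1 \cap V_2 = \{s, t\}$ for parallel composition. The `SPDag` type and the function names below are introduced here for illustration only.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Edge = Tuple[str, str]


class SPDag(NamedTuple):
    source: str
    sink: str
    edges: FrozenSet[Edge]

    def vertices(self) -> Set[str]:
        return {v for edge in self.edges for v in edge}


def base(s: str, t: str) -> SPDag:
    """Base case: a single edge from s to t."""
    return SPDag(s, t, frozenset({(s, t)}))


def series(g1: SPDag, g2: SPDag) -> SPDag:
    """Series composition: the sink u of g1 is the source of g2, and V1 ∩ V2 = {u}."""
    if g1.sink != g2.source:
        raise ValueError("the sink of G1 must be the source of G2")
    if (g1.edges & g2.edges) or (g1.vertices() & g2.vertices()) != {g1.sink}:
        raise ValueError("G1 and G2 may share only the node u")
    return SPDag(g1.source, g2.sink, g1.edges | g2.edges)


def parallel(g1: SPDag, g2: SPDag) -> SPDag:
    """Parallel composition: same source s and sink t, and V1 ∩ V2 = {s, t}."""
    if (g1.source, g1.sink) != (g2.source, g2.sink):
        raise ValueError("G1 and G2 must have the same source and sink")
    if (g1.edges & g2.edges) or (g1.vertices() & g2.vertices()) != {g1.source, g1.sink}:
        raise ValueError("G1 and G2 may share only s and t")
    return SPDag(g1.source, g1.sink, g1.edges | g2.edges)


# A fork/join "diamond": s forks into a and b, which join at t.
diamond = parallel(series(base("s", "a"), base("a", "t")),
                   series(base("s", "b"), base("b", "t")))
print(sorted(diamond.edges))  # [('a', 't'), ('b', 't'), ('s', 'a'), ('s', 'b')]
```

Note that `parallel(base("s", "t"), base("s", "t"))` is rejected, because the two edge sets are not disjoint.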
5 Nested-Parallel Computations

In this section, we show that the cache overhead of an execution of a nested-parallel computation with a work stealer is at most twice the product of the number of steals and the cache size. Our proof has two steps. First, we show that the cache overhead is bounded by the product of the cache size and the number of nodes that are executed out of order with respect to the uniprocess execution order. Second, we prove that the number of such out-of-order executions is at most twice the number of steals.

Consider a computation $G$, its $P$-process execution $X_P$ with a work stealer, and the uniprocess execution $X_1$ with the same work stealer. Let $z$ be a node in $G$, and let node $y$ be the node that executes immediately before $z$ in $X_1$. Then we say that $z$ is drifted in $X_P$ if node $y$ is not executed immediately before $z$ by the process that executes $z$ in $X_P$.

Lemma 2 establishes a key property of an execution with simple caches.

Lemma 2 Consider a process with a simple cache of $C$ blocks. Let $X_1$ denote the execution of a sequence of instructions on the process starting with cache state $S_1$, and let $X_2$ denote the execution of the same sequence of instructions starting with cache state $S_2$. Then $X_1$ incurs at most $C$ more misses than $X_2$.

Proof: We construct a one-to-one mapping between the cache lines in $X_1$ and $X_2$ such that an instruction that accesses a line $l_1$ in $X_1$ accesses the entry $l_2$ in $X_2$ if and only if $l_1$ is mapped to $l_2$. Consider $X_1$ and let $l_1$ be a cache line. Let $w$ be the first instruction that accesses or overwrites $l_1$. Let $l_2$ be the cache line that the same instruction accesses or overwrites in $X_2$, and map $l_1$ to $l_2$. Since the caches are simple, an instruction that overwrites $l_1$ in $X_1$ overwrites $l_2$ in $X_2$. Therefore, after instruction $w$, the number of misses that overwrite $l_1$ in $X_1$ is equal to the number of misses that overwrite $l_2$ in $X_2$. Since $w$ itself can cause 1 miss, the number of misses that overwrite $l_1$ in $X_1$ is at most 1 more than the number of misses that overwrite $l_2$ in $X_2$. We construct the mapping for each cache line in $X_1$ in the same way. Now let us show that the mapping is one-to-one.
For the sake of contradiction, assume that two distinct cache lines $l$ and $l'$ in $X_1$ map to the same line in $X_2$. Let $w$ and $w'$ be the first instructions accessing these cache lines in $X_1$, such that $w$ is executed before $w'$. Since $w$ and $w'$ map to the same line in $X_2$ and the caches are simple, $w'$ accesses the line that $w$ accesses in $X_1$, but then $l = l'$, a contradiction. Hence the total number of cache misses in $X_1$ is at most $C$ more than the misses in $X_2$.

Theorem 3 Let $D$ denote the total number of drifted nodes in an execution of a nested-parallel computation with a work stealer on $P$ processes, each of which has a simple cache with $C$ blocks. Then the cache overhead of the execution is at most $CD$.

Proof: Let $X_P$ denote the $P$-process execution and let $X_1$ be the uniprocess execution of the same computation with the same work stealer. We divide the multiprocess computation into $D$ pieces, each of which can incur at most $C$ more misses than in the uniprocess execution. Let $y$ be a drifted node, let $q$ be the process that executes $y$, and let $z$ be the next drifted node executed on $q$ (or the final node of the computation). Let the ordered set $O$ represent the execution order of all the nodes that are executed on $q$ in $X_P$ after $y$ ($y$ is included) and before $z$ ($z$ is excluded if it is drifted, and included otherwise). Then the nodes in $O$ are executed on the same process and in the same order in both $X_1$ and $X_P$.

Now consider the number of cache misses during the execution of the nodes in $O$ in $X_1$ and $X_P$. Since the computation is nested parallel, and therefore race free, a process that executes in parallel with $q$ does not cause $q$ to incur cache misses due to sharing. Therefore, by Lemma 2, during the execution of the nodes in $O$, the number of cache misses in $X_P$ is at most $C$ more than the number of misses in $X_1$. This bound holds for each of the sequences of such instructions corresponding to drifted nodes. Since the sequence starting at the root node and ending at the first drifted node incurs the same number of misses in $X_1$ and $X_P$, $X_P$ incurs at most $CD$ more misses than $X_1$, and the cache overhead is at most $CD$.

Lemma 2 (and thus Theorem 3) does not hold for caches that are not simple.
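Lemma 2 can be checked empirically for one simple policy, LRU. The sketch below is an illustration, not part of the paper: it simulates an LRU cache of $C$ blocks from two arbitrary initial states on the same access sequence and confirms on random inputs that the miss counts differ by at most $C$. It also includes a hypothetical LFU variant (evict the block with the smallest use count, ties broken by block name, newly loaded blocks start at count 1). That variant is deterministic but not simple, because its eviction decisions depend on counts accumulated before the last access to a line; starting it from a state whose cached blocks are never accessed but carry high counts makes every access miss.

```python
import random
from collections import OrderedDict
from typing import Dict, Hashable, Iterable, Sequence


def lru_misses(initial: Iterable[Hashable], accesses: Sequence[Hashable], capacity: int) -> int:
    """Count misses of an LRU cache; `initial` lists cached blocks from least to most recent."""
    cache = OrderedDict((block, None) for block in initial)
    assert len(cache) <= capacity
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)          # hit: now the most recently used block
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)     # evict the least recently used block
            cache[block] = None
    return misses


def lfu_misses(initial_counts: Dict[str, int], accesses: Sequence[str], capacity: int) -> int:
    """Count misses of a hypothetical LFU cache (ties broken by block name)."""
    counts = dict(initial_counts)
    misses = 0
    for block in accesses:
        if block in counts:
            counts[block] += 1
        else:
            misses += 1
            if len(counts) == capacity:
                victim = min(counts, key=lambda b: (counts[b], b))
                del counts[victim]
            counts[block] = 1
    return misses


def lemma2_holds_for_lru(trials: int = 300, seed: int = 0) -> bool:
    """Random check: misses from two different initial states differ by at most C."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.randint(1, 4)
        state1 = rng.sample(range(8), rng.randint(0, c))
        state2 = rng.sample(range(8), rng.randint(0, c))
        accesses = [rng.randrange(8) for _ in range(40)]
        if abs(lru_misses(state1, accesses, c) - lru_misses(state2, accesses, c)) > c:
            return False
    return True


print(lemma2_holds_for_lru())                                # True
print(lfu_misses({"A": 5, "B": 5}, ["A", "B"] * 10, 2))      # 0
print(lfu_misses({"X": 100, "Y": 100}, ["A", "B"] * 10, 2))  # 20
```

With $C = 2$, the LFU run from the "bad" state misses on all 20 accesses, ten times the cache size, while the run from the "good" state misses on none.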
As a counterexample, consider the execution of a sequence of instructions on a cache with the least-frequently-used (LFU) replacement policy, starting at two cache states. In the first cache state, the blocks that are frequently accessed by the instructions are in the cache with high frequencies, whereas in the second cache state, the blocks that are in the cache are not accessed by the instructions and have high frequencies. The execution with the second cache state therefore incurs many more misses than the size of the cache, compared to the execution with the first cache state.

[Figure 6: Children of a fork node and their merger.]
[Figure 7: The join embedding of $y$ and $z$.]

Now we show that the number of drifted nodes in an execution of a series-parallel computation with a work stealer is at most twice the number of steals. The proof is based on the representation of series-parallel computations as sp-dags. We call a node with out-degree of at least 2 a fork node, and we partition the nodes of an sp-dag, except the root, into three categories: join nodes, stable nodes, and nomadic nodes. We call a node that has an in-degree of at least 2 a join node, and we partition all the nodes that have in-degree 1 into two classes: a nomadic node has a parent that is a fork node, and a stable node has a parent that has out-degree 1. The root node has in-degree 0, and it does not belong to any of these categories. Lemma 4 lists two fundamental properties of sp-dags; one can prove both properties by induction on the number of edges in an sp-dag.

Lemma 4 Let $G$ be an sp-dag. Then $G$ has the following properties.
1. The least common ancestor of any two nodes in $G$ is unique.
2. The greatest common descendant of any two nodes in $G$ is unique and is equal to their unique merger.

Lemma 5 Let $v$ be a fork node. Then no child of $v$ is a join node.

Proof: Let $y$ and $z$ denote two children of $v$, and suppose $y$ is a join node, as in Figure 6. Let $x$ denote some other parent of $y$, and let $w$ denote the unique merger of $y$ and $z$. Then both $w$ and $y$ are mergers for $v$ and $x$, which contradicts Lemma 4. Hence $y$ is not a join node.

Corollary 6 Only nomadic nodes can be stolen in an execution of a series-parallel computation by the work-stealing algorithm.

Proof: Let $y$ be a stolen node in an execution. Then $y$ was pushed on a deque, and thus the enabling parent of $y$ is a fork node. By Lemma 5, $y$ is not a join node, so it has in-degree 1. Therefore $y$ is nomadic.

Consider a series-parallel computation and let $G$ be its sp-dag.
Let $y$ and $z$ be two independent nodes in $G$, and let $s$ and $t$ denote their least common ancestor and greatest common descendant, respectively, as shown in Figure 7. Let $G_1$ denote the graph that is induced by the relatives of $y$ that are descendants of $s$ and also ancestors of $t$. Similarly, let $G_2$ denote the graph that is induced by the relatives of $z$ that are descendants of $s$ and ancestors of $t$. Then we call $G_1$ the embedding of $y$ with respect to $z$, and $G_2$ the embedding of $z$ with respect to $y$. We call the graph that is the union of $G_1$ and $G_2$ the join embedding of $y$ and $z$, with source $s$ and sink $t$. Now consider an execution of $G$, and let $u$ and $v$ be the children of $s$ such that $u$ is executed before $v$. Then we call $u$ the leader and $v$ the guard of the join embedding.

[Figure 8: The join node $t$ with parents $u$ and $x$; $s$ is the least common ancestor of $u$ and $x$, and nodes $y$ and $z$ are the children of $s$.]

Lemma 7 Let $G = (V, E)$ be an sp-dag, and let $u$ and $x$ be two parents of a join node $t$ in $G$. Let $G_1$ denote the embedding of $u$ with respect to $x$, and let $G_2$ denote the embedding of $x$ with respect to $u$. Let $s$ denote the source and $t$ denote the sink of the join embedding. Then the parents of any node in $G_1$ except for $s$ and $t$ are in $G_1$, and the parents of any node in $G_2$ except for $s$ and $t$ are in $G_2$.

Proof: Since $u$ and $x$ are independent, both $s$ and $t$ are different from $u$ and $x$ (see Figure 8). First, we show that there is no edge that starts at a node in $G_1$ other than $s$ and ends at a node in $G_2$ other than $t$, and vice versa. For the sake of contradiction, assume there is an edge $(a, b)$ such that $a \ne s$ is in $G_1$ and $b \ne t$ is in $G_2$. Then $a$ is the least common ancestor of $u$ and $x$, contradicting the choice of $s$; hence no such $(a, b)$ exists. A similar argument holds when $a$ is in $G_2$ and $b$ is in $G_1$.

Second, we show that there does not exist an edge that originates from a node outside of $G_1$ and $G_2$ and ends at a node in $G_1$ or $G_2$. For the sake of contradiction, let $(c, d)$ be an edge such that $d$ is in $G_1$ and $c$ is in neither $G_1$ nor $G_2$. Then $d$ is the unique merger for the two children of the least common ancestor of $c$ and $s$, which we denote by $r$. But then $t$ is also a merger for the children of $r$. The children of $r$ are independent and have a unique merger; hence there is no such edge $(c, d)$. A similar argument holds when $d$ is in $G_2$.
Therefore, we conclude that the parents of any node in $G_1$ except s and t are in $G_1$, and the parents of any node in $G_2$ except s and t are in $G_2$.

Lemma 8 Let G be an sp-dag and let y and z be two parents of a join node t in G. Consider the join embedding of y and z, and let u be the guard node of the embedding. Then y and z are executed in the same respective order in a multiprocess execution as they are executed in the uniprocess execution if the guard node u is not stolen.

Proof: Let s be the source, t the sink, and v the leader of the join embedding. Since u is not stolen, v is not stolen. Hence, by Lemma 7, before it starts working on u, the process that executes s executed v and all its descendants in the embedding except for t. Hence z is executed before u and y is executed after u, as in the uniprocess execution. Therefore, y and z are executed in the same respective order as they execute in the uniprocess execution.
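To make the definitions of least common ancestor, greatest common descendant, and embedding concrete, here is a small Python sketch over an explicit dag. It is illustrative only. The dict-of-children representation and all helper names (`parents_of`, `ancestors`, `descendants`, `least_common_ancestor`, `greatest_common_descendant`, `embedding`, `closed_under_parents`) are hypothetical. Ancestors and descendants are assumed to include the node itself. The brute-force set computations favor clarity over efficiency.

```python
def parents_of(children):
    """Invert a {node: [children]} map into a {node: [parents]} map."""
    parents = {x: [] for x in children}
    for x, kids in children.items():
        for c in kids:
            parents[c].append(x)
    return parents


def reachable(edges, x):
    """All nodes reachable from x along `edges`, including x itself."""
    seen, stack = {x}, [x]
    while stack:
        for y in edges[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def ancestors(parents, x):
    return reachable(parents, x)


def descendants(children, x):
    return reachable(children, x)


def least_common_ancestor(parents, a, b):
    # Assumes a common ancestor exists (true in an sp-dag with a single root).
    common = ancestors(parents, a) & ancestors(parents, b)
    # The least one has every other common ancestor among its ancestors.
    return next(c for c in common if common <= ancestors(parents, c))


def greatest_common_descendant(children, a, b):
    # Assumes a common descendant exists (true in an sp-dag with a single sink).
    common = descendants(children, a) & descendants(children, b)
    # The greatest one has every other common descendant among its descendants.
    return next(c for c in common if common <= descendants(children, c))


def embedding(parents, children, u, s, t):
    """Relatives of u that are descendants of s and ancestors of t."""
    relatives = ancestors(parents, u) | descendants(children, u)
    return relatives & descendants(children, s) & ancestors(parents, t)


def closed_under_parents(parents, g, s, t):
    """Lemma 7's property: every node of g other than s, t has all parents in g."""
    return all(set(parents[x]) <= g for x in g - {s, t})


# s forks into u and v; u -> w -> t and v -> t, so w and v are t's parents.
children = {"s": ["u", "v"], "u": ["w"], "w": ["t"], "v": ["t"], "t": []}
parents = parents_of(children)
s = least_common_ancestor(parents, "w", "v")
t = greatest_common_descendant(children, "w", "v")
g1 = embedding(parents, children, "w", s, t)
g2 = embedding(parents, children, "v", s, t)
```

In this five-node example, w and v are the two parents of the join node t. Their embeddings are {s, u, w, t} and {s, v, t}, and both satisfy the closure property stated in Lemma 7.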
Figure 9: Nodes $t_1$ and $t_2$ are two join nodes with the common guard u.

Lemma 9 A nomadic node is drifted in an execution only if it is stolen.

Proof: Let u be a nomadic and drifted node. Then, by Lemma 5, u has a single parent s that enables u. If u is the first child of s to execute in the uniprocess execution, then u is not drifted in the multiprocess execution. Hence u is not the first child to execute. Let v be the last child of s that is executed before u in the uniprocess execution. Now consider the multiprocess execution, and let q be the process that executes v. For the sake of contradiction, assume that u is not stolen. Consider the join embedding of u and v, as shown in Figure 8. Since all parents of the nodes in $G_2$ except for s and t are in $G_2$ by Lemma 7, q executes all the nodes in $G_2$ other than t before it executes u, and thus the last of these nodes, z, precedes u on q. But then u is not drifted, because z is the node that is executed immediately before u in the uniprocess computation. Hence u is stolen.

Let us define the cover of a join node t in an execution as the set of all the guard nodes of the join embeddings of all possible pairs of parents of t in the execution. The following lemma shows that a join node is drifted only if a node in its cover is stolen.

Lemma 10 A join node is drifted in an execution only if a node in its cover is stolen in the execution.

Proof: Consider the execution and let t be a join node that is drifted. Assume, for the sake of contradiction, that no node in the cover of t is stolen. Let y and z be any two parents of t, as in Figure 8. Then, by Lemma 8, y and z are executed in the same order as in the uniprocess execution. But then all parents of t execute in the same order as in the uniprocess execution. Hence the enabling parent of t in the execution is the same as in the uniprocess execution. Furthermore, the enabling parent of t has outdegree 1, because otherwise t is not a join node by Lemma 5, and thus the process that enables t executes t. Therefore t is not drifted, a contradiction; hence a node in the cover of t is stolen.
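The node categories used in Lemmas 5 and 9 through 11 can be checked mechanically on small examples. The Python sketch below does three things:

- builds sp-dags by series and parallel composition;
- labels each node root, join, nomadic, or stable, following the definitions given earlier in this section;
- checks Lemma 5 on the result.

The builder and its (source, sink) representation are hypothetical. So is the choice to compose in series by adding an edge from one component's sink to the next component's source. The paper's own sp-dag construction may differ in detail.

```python
import itertools


class SPDagBuilder:
    """Builds an sp-dag by series and parallel composition.

    A component is represented by its (source, sink) pair of node ids.
    """

    def __init__(self):
        self.children = {}
        self._ids = itertools.count()

    def _new_node(self):
        n = next(self._ids)
        self.children[n] = []
        return n

    def node(self):
        n = self._new_node()
        return (n, n)

    def series(self, g1, g2):
        # Edge from the sink of g1 to the source of g2.
        self.children[g1[1]].append(g2[0])
        return (g1[0], g2[1])

    def parallel(self, g1, g2):
        # A new fork node feeds both components; a new join node collects them.
        fork, join = self._new_node(), self._new_node()
        self.children[fork] += [g1[0], g2[0]]
        self.children[g1[1]].append(join)
        self.children[g2[1]].append(join)
        return (fork, join)


def classify(children, root):
    """Label each node root, join, nomadic, or stable."""
    indegree = {x: 0 for x in children}
    parent = {}
    for x, kids in children.items():
        for c in kids:
            indegree[c] += 1
            parent[c] = x  # the unique parent whenever indegree is 1
    kinds = {}
    for x in children:
        if x == root:
            kinds[x] = "root"
        elif indegree[x] >= 2:
            kinds[x] = "join"
        elif len(children[parent[x]]) >= 2:
            kinds[x] = "nomadic"
        else:
            kinds[x] = "stable"
    return kinds


def lemma5_holds(children, kinds):
    """No child of a fork node (outdegree >= 2) is a join node."""
    return all(
        kinds[c] != "join"
        for kids in children.values() if len(kids) >= 2
        for c in kids
    )


b = SPDagBuilder()
left = b.series(b.node(), b.node())     # nodes 0 -> 1
right = b.parallel(b.node(), b.node())  # fork 4 over 2, 3; join 5
whole = b.parallel(left, right)         # fork 6, join 7
kinds = classify(b.children, root=whole[0])
```

With this construction, every component's source receives exactly one incoming edge. This is why no child of a fork node can be a join node in the example.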
Lemma 11 The number of drifted nodes in an execution of a series-parallel computation is at most twice the number of steals in the execution.

Proof: We associate each drifted node in the execution with a steal such that no steal has more than 2 drifted nodes associated with it. Consider a drifted node u. Then u is not the root node of the computation, and it is not stable either. Hence u is either a nomadic or a join node. If u is nomadic, then u is stolen by Lemma 9, and we associate u with the steal that steals u. Otherwise, u is a join node and, by Lemma 10, there is a node in its cover that is stolen. We associate u with the steal that steals a node in its cover. Now, assume there are more than 2 nodes associated with a steal that steals node u. Then there are at least two join nodes $t_1$ and $t_2$ that are associated with u. Therefore, node u is in the join embedding of two parents of $t_1$ and also of $t_2$. Let $x_1, y_1$ be these parents of $t_1$ and $x_2, y_2$ be the parents of $t_2$, as shown in Figure 9. But then u has a parent that is a fork node and is a join node, which contradicts Lemma 5. Hence no such u exists.

Theorem 12 The cache overhead of an execution of a nested-parallel computation with simple caches is at most twice the product of the number of steals in the execution and the cache size.

Proof: Follows from Theorem 3 and Lemma 11.

6 An Analysis of Non-blocking Work Stealing

The non-blocking implementation of the work-stealing algorithm delivers provably good performance under traditional and multiprogrammed workloads. A description of the implementation and its analysis is presented in [2]; an experimental evaluation is given in [10]. In this section, we extend the analysis of the non-blocking work-stealing algorithm for classical workloads and bound the execution time of a nested-parallel computation with a work stealer to include the number of cache misses, the cache-miss penalty, and the steal time. First, we bound the number of steal attempts in an execution of a general computation by the work-stealing algorithm. Then we bound the execution time of a nested-parallel computation with a work stealer using results from Section 5.
The analysis that we present here is similar to the analysis given in [2] and uses the same potential-function technique. We associate a non-negative potential with nodes in a computation's dag and show that the potential decreases as the execution proceeds. We assume that a node in a computation dag has outdegree at most 2. This is consistent with the assumption that each node represents one instruction.

Consider an execution of a computation with its dag $G = (V, E)$ with the work-stealing algorithm. The execution grows a tree, the enabling tree, that contains each node in the computation and its enabling edge. We define the distance of a node $u \in V$ as $w(u) = T_\infty - \mathrm{depth}(u)$, where $\mathrm{depth}(u)$ is the depth of u in the enabling tree of the computation. Intuitively, the distance of a node indicates how far the node is from the end of the computation. We define the potential function in terms of distances. At any given step i, we assign a positive potential to each ready node; all other nodes have potential 0. A node is ready if it is enabled and not yet executed to completion. Let u denote a ready node at time step i. Then we define $\phi_i(u)$, the potential of u at time step i, as

$$\phi_i(u) = \begin{cases} 3^{2w(u)-1} & \text{if } u \text{ is assigned;} \\ 3^{2w(u)} & \text{otherwise.} \end{cases}$$

The potential at step i, $\Phi_i$, is the sum of the potential of each ready node at step i. When an execution begins, the only ready node is the root node, which has distance $T_\infty$ and is assigned to some process, so we start with $\Phi_0 = 3^{2T_\infty - 1}$. As the execution proceeds, nodes that are deeper in the dag become ready and the potential decreases. There are no ready nodes at the end of an execution, and the potential is 0.

Let us give a few more definitions that enable us to associate a potential with each process. Let $R_i(q)$ denote the set of ready nodes that are in the deque of process q, along with q's assigned node, if any, at the beginning of step i. We say that each node u in $R_i(q)$ belongs to process q. Then we define the potential of q's deque as

$$\Phi_i(q) = \sum_{u \in R_i(q)} \phi_i(u).$$
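The potential function is easy to evaluate directly. The following Python sketch computes $\phi$ for a ready node from its distance. For a range of distances, it then checks the $(2/3)$ and $(5/9)$ decreases that Lemma 13 states below.

This is a hypothetical illustration: `phi`, `deque_potential`, `drop_when_assigned`, and `drop_when_executed` are names chosen here. Following the proof sketch of Lemma 13, an executed node is assumed to enable at most two children at distance $w - 1$. One child becomes assigned and the other is pushed onto the deque.

```python
def phi(w, assigned):
    """Potential of a ready node at distance w (w = T_inf - depth)."""
    return 3 ** (2 * w - 1) if assigned else 3 ** (2 * w)


def deque_potential(nodes):
    """Phi_i(q): sum of phi over (distance, assigned) pairs in R_i(q)."""
    return sum(phi(w, a) for w, a in nodes)


def drop_when_assigned(w):
    """Decrease when a deque node at distance w becomes assigned."""
    return phi(w, assigned=False) - phi(w, assigned=True)


def drop_when_executed(w, n_children):
    """Decrease when an assigned node at distance w is executed.

    It enables n_children (0, 1 or 2) children at distance w - 1: the first
    becomes assigned and the second, if any, goes onto the deque.
    """
    after = 0
    if n_children >= 1:
        after += phi(w - 1, assigned=True)
    if n_children >= 2:
        after += phi(w - 1, assigned=False)
    return phi(w, assigned=True) - after


T_inf = 5
root_potential = phi(T_inf, assigned=True)  # Phi_0 = 3^(2*T_inf - 1)
```

All quantities are integers, so the ratios can be checked exactly by cross-multiplying. For example, `9 * drop_when_executed(w, 2) == 5 * phi(w, True)` holds for every w ≥ 2.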
Let $A_i$ denote the set of processes whose deque is empty at the beginning of step i, and let $D_i$ denote the set of all other processes. We partition the potential $\Phi_i$ into two parts,

$$\Phi_i = \Phi_i(A_i) + \Phi_i(D_i), \quad \text{where} \quad \Phi_i(A_i) = \sum_{q \in A_i} \Phi_i(q) \quad \text{and} \quad \Phi_i(D_i) = \sum_{q \in D_i} \Phi_i(q),$$

and we analyze the two parts separately.

Lemma 13 lists four basic properties of the potential that we use frequently. The proofs for these properties are given in [2], and the listed properties are correct independent of the time that the execution of a node or a steal takes. Therefore, we give only a short proof sketch.

Lemma 13 The potential function satisfies the following properties.
1. Suppose node u is assigned to a process at step i. Then the potential decreases by at least $(2/3)\phi_i(u)$.
2. Suppose a node u is executed at step i. Then the potential decreases by at least $(5/9)\phi_i(u)$ at step i.
3. Consider any step i and any process q in $D_i$. The topmost node u in q's deque contributes at least 3/4 of the potential associated with q. That is, we have $\phi_i(u) \ge (3/4)\Phi_i(q)$.
4. Suppose a process p chooses process q in $D_i$ as its victim at time step i (a steal attempt of p targeting q occurs at step i). Then the potential decreases by at least $(1/2)\Phi_i(q)$ due to the assignment or execution of a node belonging to q at the end of step i.

Property 1 follows directly from the definition of the potential function. Property 2 holds because a node enables at most two children with smaller potential, one of which becomes assigned. Specifically, the potential after the execution of node u decreases by at least

$$\phi_i(u)\left(1 - \frac{1}{3} - \frac{1}{9}\right) = \frac{5}{9}\phi_i(u).$$

Property 3 follows from a structural property of the nodes in a deque. The distances of the nodes in a process's deque decrease monotonically from the top of the deque to the bottom. Therefore, the potential in the deque is the sum of geometrically decreasing terms and is dominated by the potential of the top node. The last property holds because, when a process chooses process q in $D_i$ as its victim, the node u at the top of q's deque is assigned at the next step. Therefore, the potential decreases by $(2/3)\phi_i(u)$ by property 1.
Moreover, $\phi_i(u) \ge (3/4)\Phi_i(q)$ by property 3, and the result follows.

Lemma 16 shows that the potential decreases as a computation proceeds. The proof for Lemma 16 uses the balls-and-bins bound from Lemma 14.

Lemma 14 (Balls and Weighted Bins) Suppose that at least P balls are thrown independently and uniformly at random into P bins, where bin i has a weight $W_i$, for $i = 1, \ldots, P$. The total weight is $W = \sum_{i=1}^{P} W_i$. For each bin i, define the random variable $X_i$ as

$$X_i = \begin{cases} W_i & \text{if some ball lands in bin } i; \\ 0 & \text{otherwise.} \end{cases}$$

If $X = \sum_{i=1}^{P} X_i$, then for any $\beta$ in the range $0 < \beta < 1$, we have $\Pr\{X \ge \beta W\} > 1 - 1/((1-\beta)e)$.

This lemma can be proven with an application of Markov's inequality. The proof of a weaker version of this lemma for the case of exactly P throws is similar and is given in [2]. Lemma 14 also follows from the weaker lemma, because X does not decrease with more throws.

We now show that whenever P or more steal attempts occur, the potential decreases by a constant fraction of $\Phi_i(D_i)$ with constant probability.

Lemma 15 Consider any step i and any later step j such that at least P steal attempts occur at steps from i (inclusive) to j (exclusive). Then we have

$$\Pr\left\{\Phi_i - \Phi_j \ge \frac{1}{4}\Phi_i(D_i)\right\} > \frac{1}{4}.$$

Moreover, the potential decrease is because of the execution or assignment of nodes belonging to a process in $D_i$.

Proof: Consider all P processes and P steal attempts that occur at or after step i. For each process q in $D_i$, if one or more of the P attempts target q as the victim, then the potential decreases by $(1/2)\Phi_i(q)$ due to the execution or assignment of nodes that belong to q, by property 4 in Lemma 13. If we think of each attempt as a ball toss, then we have an instance of the Balls and Weighted Bins Lemma (Lemma 14). For each process q in $D_i$, we assign a weight $W_q = (1/2)\Phi_i(q)$, and for each other process q in $A_i$, we assign a weight $W_q = 0$. The weights sum to $W = (1/2)\Phi_i(D_i)$. Using $\beta = 1/2$ in Lemma 14, we conclude that the potential decreases by at least $\beta W = (1/4)\Phi_i(D_i)$ with probability greater than $1 - 1/((1-\beta)e) > 1/4$, due to the execution or assignment of nodes that belong to a process in $D_i$.
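Lemma 14 is easy to sanity-check numerically. The following Python sketch is a Monte Carlo experiment, not a proof. It throws P balls into P weighted bins and estimates $\Pr\{X \ge \beta W\}$. For $\beta = 1/2$, the value used in the proof of Lemma 15, the lemma's lower bound is $1 - 2/e \approx 0.264$.

The weight vectors are arbitrary examples; zero weights play the role of processes in $A_i$. The function names are hypothetical.

```python
import math
import random


def weighted_hits(weights, n_balls, rng):
    """Throw n_balls uniformly into the bins; return X = total weight of hit bins."""
    hit = {rng.randrange(len(weights)) for _ in range(n_balls)}
    return sum(weights[i] for i in hit)


def estimate_probability(weights, beta, trials=20000, seed=1):
    """Monte Carlo estimate of Pr{X >= beta * W} with P = len(weights) balls."""
    rng = random.Random(seed)
    total = sum(weights)
    p = len(weights)
    successes = sum(
        weighted_hits(weights, p, rng) >= beta * total for _ in range(trials)
    )
    return successes / trials


def lemma_bound(beta):
    """The lower bound 1 - 1/((1 - beta) e) from Lemma 14."""
    return 1 - 1 / ((1 - beta) * math.e)


if __name__ == "__main__":
    for weights in ([8, 4, 2, 1, 0, 0, 0, 0], [1] * 8, [1] + [0] * 15):
        print(weights, estimate_probability(weights, 0.5), ">", lemma_bound(0.5))
```

The bound is deliberately loose. In the first example, a single hit on the weight-8 bin already gives $X \ge W/2$, which happens with probability $1 - (7/8)^8 \approx 0.66$.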
We now bound the number of steal attempts in a work-stealing computation.

Lemma 16 Consider a P-process execution of a multithreaded computation with the work-stealing algorithm. Let $T_1$ and $T_\infty$ denote the computational work and the critical path of the computation. Then the expected number of steal attempts in the execution is $O(\lceil m/s \rceil P T_\infty)$. Moreover, for any $\varepsilon > 0$, the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \lg(1/\varepsilon)))$ with probability at least $1 - \varepsilon$.

Proof: We analyze the number of steal attempts by breaking the execution into phases of $\lceil m/s \rceil P$ steal attempts. We show that, with constant probability, a phase causes the potential to drop by a constant factor. The first phase begins at step $t_1 = 1$ and ends at the first step $t_1'$ such that at least $\lceil m/s \rceil P$ steal attempts occur during the interval of steps $[t_1, t_1']$. The second phase begins at step $t_2 = t_1' + 1$, and so on.

Let us first show that there are at least m steps in a phase. A process has at most 1 outstanding steal attempt at any time, and a steal attempt takes at least s steps to complete. Therefore, at most P steal attempts occur in a period of s time steps. Hence a phase of steal attempts takes at least $(\lceil m/s \rceil P / P)\, s \ge m$ time units.

Consider a phase beginning at step i, and let j be the step at which the next phase begins. Then $i + m \le j$. We will show that $\Pr\{\Phi_j \le (3/4)\Phi_i\} > 1/4$. Recall that the potential can be partitioned as $\Phi_i = \Phi_i(A_i) + \Phi_i(D_i)$. Since the phase contains $\lceil m/s \rceil P$ steal attempts, Lemma 15 gives $\Pr\{\Phi_i - \Phi_j \ge (1/4)\Phi_i(D_i)\} > 1/4$, due to execution or assignment of nodes that belong to a process in $D_i$.

Now we show that the potential also drops by a constant fraction of $\Phi_i(A_i)$ due to the execution of nodes that are assigned to the processes in $A_i$. Consider a process, say q, in $A_i$. If q does not have an assigned node, then $\Phi_i(q) = 0$. If q has an assigned node u, then $\Phi_i(q) = \phi_i(u)$. In this case, process q completes executing node u at step $i + m - 1 < j$ at the latest, and the potential drops by at least $(5/9)\phi_i(u)$ by property 2 of Lemma 13. Summing over each process in $A_i$, we have $\Phi_i - \Phi_j \ge (5/9)\Phi_i(A_i)$. Thus we have shown that the potential decreases by at least a quarter of $\Phi_i(A_i)$ and of $\Phi_i(D_i)$. Therefore, no matter how the total potential is distributed over $A_i$ and $D_i$, the total potential decreases by a quarter with probability more than 1/4; that is, $\Pr\{\Phi_i - \Phi_j \ge (1/4)\Phi_i\} > 1/4$.

We say that a phase is successful if it causes the potential to drop by at least a 1/4 fraction. A phase is successful with probability at least 1/4. Since the potential starts at $\Phi_0 = 3^{2T_\infty - 1}$ and ends at 0 (and is always an integer), the number of successful phases is at most $(2T_\infty - 1)\log_{4/3} 3 < 8T_\infty$. The expected number of phases needed to obtain $8T_\infty$ successful phases is at most $32T_\infty$. Thus the expected number of phases is $O(T_\infty)$, and because each phase contains $\lceil m/s \rceil P$ steal attempts, the expected number of steal attempts is $O(\lceil m/s \rceil P T_\infty)$. The high-probability bound follows by an application of the Chernoff bound.

Theorem 17 Let $M_P$ be the number of cache misses in a P-process execution of a nested-parallel computation with a work stealer that has simple caches of C blocks each. Let $M_1$ be the number of cache misses in the uniprocess execution. Then

$$M_P = M_1 + O\left(\lceil m/s \rceil C P \left(T_\infty + \lg(1/\varepsilon)\right)\right)$$

with probability at least $1 - \varepsilon$. The expected number of cache misses is $M_1 + O(\lceil m/s \rceil C P T_\infty)$.

Proof: Theorem 12 shows that the cache overhead of a nested-parallel computation is at most twice the product of the number of steals and the cache size. Lemma 16 shows that the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \lg(1/\varepsilon)))$ with probability at least $1 - \varepsilon$, and that the expected number of steal attempts is $O(\lceil m/s \rceil P T_\infty)$. The number of steals is not greater than the number of steal attempts. Therefore, the bounds follow.

Theorem 18 Consider a P-process, nested-parallel, work-stealing computation with simple caches of C blocks. Then, for any $\varepsilon > 0$, the execution time is

$$O\left(\frac{T_1 + (m-1)M_1}{P} + m\lceil m/s \rceil C \left(T_\infty + \lg(1/\varepsilon)\right) + (m + s)\left(T_\infty + \lg(1/\varepsilon)\right)\right)$$

with probability at least $1 - \varepsilon$. Moreover, the expected running time is

$$O\left(\frac{T_1 + (m-1)M_1}{P} + m\lceil m/s \rceil C\, T_\infty + (m + s)\, T_\infty\right).$$

Proof: We use an accounting argument to bound the running time.
At each step in the computation, each process puts a dollar into one of two buckets that matches its activity at that step. We name the two buckets the work bucket and the steal bucket. A process puts a dollar into the work bucket at a step if it is working on a node in that step. The execution of a node in the dag adds either 1 or m dollars to the work bucket. Similarly, a process puts a dollar into the steal bucket for each step that it spends stealing. Each steal attempt takes O(s) steps. Therefore, each steal attempt adds O(s) dollars to the steal bucket. The number of dollars in the work bucket at the end of the execution is at most $T_1 + (m-1)M_P$, which by Theorem 17 is

$$T_1 + (m-1)M_1 + O\left((m-1)\lceil m/s \rceil C P \left(T_\infty + \lg(1/\varepsilon)\right)\right)$$

with probability at least $1 - \varepsilon$. The total number of dollars in the steal bucket is the total number of steal attempts multiplied by the number of dollars added to the steal bucket for each steal attempt, which is O(s). Therefore, the total number of dollars in the steal bucket is

$$O\left(s\lceil m/s \rceil P \left(T_\infty + \lg(1/\varepsilon)\right)\right)$$

with probability at least $1 - \varepsilon$. Each process adds exactly one dollar to a bucket at each step, so we divide the total number of dollars by P to get the high-probability bound in the theorem, using $s\lceil m/s \rceil \le m + s$. A similar argument holds for the expected time bound.

7 Locality-Guided Work Stealing

The work-stealing algorithm achieves good data locality by executing nodes that are close in the computation graph on the same process. For certain applications, however, regions of the program that access the same data are not close in the computational graph. As an example, consider an application that takes a sequence of steps, each of which operates in parallel over a set or array of values. We will call such an application an iterative data-parallel application. Such an application can be implemented using work stealing by forking a tree of threads on each step, in which each leaf of the tree updates a region of the data (typically disjoint). Figure 10 shows an example of the trees of threads created in two steps. Each node represents a thread and is labeled with the process that executes it. The gray nodes are the leaves.

Figure 10: The tree of threads created in a data-parallel work-stealing application.
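The iterative data-parallel pattern can be sketched sequentially in Python. The sketch rests on several assumptions:

- it uses a 1-D grid rather than Heat's 2-dimensional grid;
- it applies a plain neighbor-averaging stencil with fixed boundary values;
- all function names are hypothetical;
- it only computes the leaf ranges of a binary fork tree and runs the leaves one after another.

What it illustrates is the structural point made above. Every step forks an identical tree, so the k-th leaf updates the same region on every step. These leaf pairs are the corresponding gray nodes that plain work stealing is unlikely to place on the same process.

```python
def fork_tree_leaves(lo, hi, grain):
    """Leaf ranges of a binary fork tree over [lo, hi), in fork order."""
    if hi - lo <= grain:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return fork_tree_leaves(lo, mid, grain) + fork_tree_leaves(mid, hi, grain)


def heat_step(src, dst, leaves):
    """One Jacobi-style step: interior cells become the mean of their neighbors."""
    n = len(src)
    for lo, hi in leaves:              # each leaf would be one thread
        for i in range(max(lo, 1), min(hi, n - 1)):
            dst[i] = 0.5 * (src[i - 1] + src[i + 1])


def run(grid, steps, grain):
    src, dst = list(grid), list(grid)  # both arrays hold the fixed boundary
    leaves = fork_tree_leaves(0, len(grid), grain)
    for _ in range(steps):
        heat_step(src, dst, leaves)    # same leaves, same regions, every step
        src, dst = dst, src            # the array just written is read next step
    return src
```

For example, `run([1, 0, 0, 0, 0, 0, 0, 1], 1, 2)` uses the leaves `(0, 2)`, `(2, 4)`, `(4, 6)`, `(6, 8)` and returns `[1, 0.5, 0, 0, 0, 0, 0.5, 1]`. Giving each leaf an affinity for the process that ran it on the previous step is what the locality-guided variant adds.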
The threads synchronize in the same order as they fork. The first and second steps are structurally identical, and each pair of corresponding gray nodes updates the same region, often using much of the same input data. The dashed rectangle in Figure 10, for example, shows a pair of such gray nodes. To get good locality for this application, threads that update the same data on different steps ideally should run on the same processor, even though they are not close in the dag. In work stealing, however, this is highly unlikely to happen due to the random steals. Figure 10, for example, shows an execution where all pairs of corresponding gray nodes run on different processes.

In this section, we describe and evaluate locality-guided work stealing, a heuristic modification to work stealing that is designed to allow locality between nodes that are distant in the computational graph. In locality-guided work stealing, each thread can be given an affinity for a process, and when a process obtains work it gives priority to threads with affinity for it. To enable this, in addition to a deque, each process maintains a mailbox: a first-in first-out
(FIFO) queue of pointers to threads that have affinity for the process. There are then two differences between the locality-guided work-stealing and work-stealing algorithms. First, when creating a thread, a process will push the thread onto both the deque, as in normal work stealing, and also onto the tail of the mailbox of the process that the thread has affinity for. Second, a process will first try to obtain work from its mailbox before attempting a steal. Because threads can appear twice, once in a mailbox and once on a deque, there needs to be some form of synchronization between the two copies to make sure the thread is not executed twice.

A number of techniques that have been suggested to improve the data locality of multithreaded programs can be realized by the locality-guided work-stealing algorithm together with an appropriate policy to determine the affinities of threads. For example, an initial distribution of work among processes can be enforced by setting the affinity of a thread to the process that it will be assigned to at the beginning of the computation. We call this locality-guided work stealing with initial placements. Likewise, techniques that rely on hints from the programmer can be realized by setting the affinity of threads based on the hints. In the next section, we describe an implementation of locality-guided work stealing for iterative data-parallel applications. The implementation described can easily be modified to implement the other techniques mentioned.

7.1 Implementation

We built locality-guided work stealing into Hood. Hood is a multithreaded programming library with a non-blocking implementation of work stealing that delivers provably good performance under both traditional and multiprogrammed workloads [ ]. In Hood, the programmer defines a thread as a C++ class, which we refer to as the thread definition. A thread definition has a method named run that defines the code that the thread executes. The run method is a C++ function that can call Hood library functions to create and synchronize with other threads. A rope is an object that is an instance of a thread definition class.
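The two differences from plain work stealing, together with the lazy marking described below, can be sketched as a sequential Python simulation. This is not Hood's implementation:

- `Task`, `Process`, `spawn`, and `get_work` are hypothetical names;
- Python deques stand in for the non-blocking deques and mailbox queues;
- a non-blocking `threading.Lock.acquire(blocking=False)` stands in for the atomic test-and-set that marks a thread;
- the order in which a process consults its own deque before its mailbox is an assumption, since the text only states that the mailbox is tried before a steal.

```python
import random
import threading
from collections import deque


class Task:
    """A thread in the paper's sense; it may sit in a deque and a mailbox."""

    def __init__(self, name, affinity):
        self.name = name
        self.affinity = affinity        # index of the preferred process
        self._mark = threading.Lock()   # acquired once = "marked"

    def try_claim(self):
        # A non-blocking acquire acts as an atomic test-and-set: exactly one
        # caller succeeds, so only one of the two copies is ever executed.
        return self._mark.acquire(blocking=False)


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.deque = deque()            # right end = bottom, left end = top
        self.mailbox = deque()          # FIFO: append at tail, pop at head


def spawn(procs, creator, task):
    """Push the task on the creator's deque and on its affinity's mailbox."""
    procs[creator].deque.append(task)
    procs[task.affinity].mailbox.append(task)


def get_work(procs, pid, rng):
    """Own deque, then mailbox, then steal; marked copies are discarded."""
    me = procs[pid]
    while me.deque:
        task = me.deque.pop()           # bottom of own deque
        if task.try_claim():
            return task
    while me.mailbox:
        task = me.mailbox.popleft()     # head of the FIFO mailbox
        if task.try_claim():
            return task
    victims = [p for p in procs if p is not me and p.deque]
    while victims:
        victim = rng.choice(victims)
        task = victim.deque.popleft()   # top of the victim's deque
        if not victim.deque:
            victims.remove(victim)
        if task.try_claim():
            return task
    return None


procs = [Process(0), Process(1)]
rng = random.Random(0)
for k in range(4):
    spawn(procs, 0, Task(f"t{k}", affinity=k % 2))
done, idle, pid = [], 0, 1
while idle < 2:                         # alternate until both processes idle
    task = get_work(procs, pid, rng)
    if task is None:
        idle += 1
    else:
        idle = 0
        done.append((pid, task.name))
    pid = 1 - pid
```

In this run, process 1 first takes t1 from its mailbox even though process 0 created it. Later, the stale mailbox copy of t3, which process 0 has already executed, is discarded when its claim fails. Each task runs exactly once.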
Each time the run method of a rope is executed, it creates a new thread. A rope can have an affinity for a process, and when the Hood run-time system executes such a rope, the system passes this affinity to the thread. If the thread does not run on the process for which it has affinity, the affinity of the rope is updated to the new process. Iterative data-parallel applications can effectively use ropes by making sure all corresponding threads (threads that update the same region across different steps) are generated from the same rope. A thread will therefore always have an affinity for the process on which its corresponding thread ran on the previous step. The dashed rectangle in Figure 10, for example, represents two threads that are generated in two executions of one rope. To initialize the ropes, the programmer needs to create a tree of ropes before the first step. This tree is then used on each step when forking the threads.

To implement locality-guided work stealing in Hood, we use a non-blocking queue for each mailbox. Since a thread is put into a mailbox and into a deque, one issue is making sure that the thread is not executed twice, once from the mailbox and once from the deque. One solution is to remove the other copy of a thread when a process starts executing it. In practice, this is not efficient because it has a large synchronization overhead. In our implementation, we do this lazily: when a process starts executing a thread, it sets a flag using an atomic update operation such as test-and-set or compare-and-swap to mark the thread. When executing a thread, a process identifies a marked thread with the atomic update and discards the thread. The second issue comes up when one wants to reuse the thread data structures, typically those from the previous step. When a thread's structure is reused in a step, the copies from the previous step, which can be in a mailbox or a deque, need to be marked invalid. One can implement this by invalidating all the multiple copies of threads at the end of a step and synchronizing all processes before the next step starts. In multiprogrammed workloads, however, the kernel can swap a process out, preventing it from participating in the current step. Such a swapped-out process prevents all the other processes from proceeding to the next step. In our implementation, to avoid the synchronization at the end of each step, we timestamp thread data structures such that each process closely follows the time of the computation and ignores a thread that is out-of-date.

Table 1: Measured benchmark characteristics. We compiled all applications with the Sun CC compiler using the -xarch=v8plus -O5 -dalign flags. All times are given in seconds. $T_s$ denotes the execution time of the sequential algorithm for the application and is J9Kñå (K for Heat and for Relax.

Benchmark | Work ($T_1$) | Overhead ($T_1/T_s$) | Critical Path Length ($T_\infty$) | Average Par. ($T_1/T_\infty$)
staticheat | J(2åî"
heat | J9Lñå;= J$å J(; uå$k2 0=LuJ=åüJ=J
lgheat | J9Lñå076 J$å J(; uå$k$k 076$;2å"
ipheat | J9Lñå076 J$å J(; uå$k$k 076$;2å"
staticrelax | K$KñåüJ J$å$#
relax | K$0ñåî=0 J$å$# uå$0$î J$J; LñåKñJ
lgrelax | K$Kñå;=; J$å$# uå$0$î J$J 0$0ñå#=K
iprelax | K$Kñå;=; J$å$# uå$0$î J$J 0$0ñå#=K

7.2 Experimental Results

In this section, we present the results of our preliminary experiments with locality-guided work stealing on two small applications. The experiments were run on a 14-processor Sun Ultra Enterprise with 400 MHz processors and 4 Mbyte L2 cache each, running Solaris 2.7. We used the processor_bind system call of Solaris 2.7 to bind processes to processors, to prevent the Solaris kernel from migrating a process among processors and causing the process to lose its cache state. When the number of processes is less than the number of processors, we bind one process to each processor; otherwise, we bind processes to processors such that processes are distributed among processors as evenly as possible.

We use the applications Heat and Relax in our evaluation. Heat is a Jacobi over-relaxation that simulates heat propagation on a 2-dimensional grid for a number of steps. This benchmark was derived from similar Cilk [27] and SPLASH [35] benchmarks. The main data structures are two equal-sized arrays.
The algorithm runs in steps, each of which updates the entries in one array using the data in the other array, which was updated in the previous step. Relax is a Gauss-Seidel over-relaxation algorithm that iterates over one 1-dimensional array, updating each element by a weighted average of its value and that of its two neighbors. We implemented each application with four strategies: static partitioning, work stealing, locality-guided work stealing, and locality-guided work stealing with initial placements. The static partitioning benchmarks divide the total work equally among the number of processes and make sure that each process accesses the same data elements in all the steps. Static partitioning is implemented directly with Solaris threads. The three work-stealing strategies are all implemented in Hood. The plain work-stealing version uses threads directly, and the two locality-guided versions use ropes by building a tree of ropes at the beginning of the computation. The initial placement strategy assigns initial affinities to the ropes near the top of the tree to achieve a good initial load balance. We use the following prefixes in the names of the benchmarks: static (static partitioning), none (work steal
Journal version: Theory of Computing Systems 35, 321-347 (2002), DOI: 10.1007/s00224-002-1057-3. © 2002 Springer-Verlag New York Inc.
Author contacts: Umut A. Acar (umut@cs.cmu.edu) and Guy E. Blelloch (blelloch@cs.cmu.edu), School of Computer Science, Carnegie Mellon University.