The Data Locality of Work Stealing

Umut A. Acar, School of Computer Science, Carnegie Mellon University
Guy E. Blelloch, School of Computer Science, Carnegie Mellon University
Robert D. Blumofe, Department of Computer Sciences, University of Texas at Austin, rdb@cs.utexas.edu

Abstract

This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines. We present lower and upper bounds on the number of cache misses using work stealing, and introduce a locality-guided work-stealing algorithm along with experimental validation.

As a lower bound, we show that there is a family of multithreaded computations, each member of which requires $\Theta(n)$ total operations (work), for which, when using work stealing, the total number of cache misses on one processor is constant, while even on two processors the total number of cache misses is $\Omega(n)$. For nested-parallel computations, however, we show that on $P$ processors the expected additional number of cache misses beyond those on a single processor is bounded by $O(C \lceil m/s \rceil P T_\infty)$, where $m$ is the execution time of an instruction incurring a cache miss, $s$ is the steal time, $C$ is the size of cache, and $T_\infty$ is the number of nodes on the longest chain of dependences. Based on this, we give strong bounds on the total running time of nested-parallel computations using work stealing.

For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads but improves the performance up to …% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing improves the performance of work stealing up to …%.

1 Introduction

Many of today's parallel applications use sophisticated, adaptive algorithms, which are best realized with parallel programming systems that support dynamic, lightweight threads, such as Cilk [8], Nesl [5], Hood [10], and many others […].
The core of these systems is a thread scheduler that balances load among the processes. In addition to a good load balance, however, good data locality is essential in obtaining high performance from modern parallel systems.

Several researchers have studied techniques to improve the data locality of multithreaded programs. One class of such techniques is based on software-controlled distribution of data among the local memories of a distributed shared memory system […]. Another class of techniques is based on hints supplied by the programmer so that similar tasks might be executed on the same processor […]. Both these classes of techniques rely on the programmer or compiler to determine the data access patterns in the program, which may be very difficult when the program has complicated data access patterns. Perhaps the earliest class of techniques was to attempt to execute threads that are close in the computation graph on the same processor […]. The work-stealing algorithm is the most studied of these techniques […]. Blumofe et al. showed that fully-strict computations achieve provably good data locality [7] when executed with the work-stealing algorithm on dag-consistent distributed shared memory systems. In recent work, Narlikar showed that work stealing improves the performance of space-efficient multithreaded applications by increasing the data locality [29]. None of this previous work, however, has studied upper or lower bounds on the data locality of multithreaded computations executed on existing hardware-controlled shared memory systems.

In this paper, we present theoretical and experimental results on the data locality of work stealing on hardware-controlled shared memory systems (HSMSs). Our first set of results are upper and lower bounds on the number of cache misses in multithreaded computations executed by the work-stealing algorithm. Let $Q_1(C)$ denote the number of cache misses in the uniprocessor execution and $Q_P(C)$ denote the number of cache misses in a $P$-processor execution of a multithreaded computation by the work-stealing algorithm on an HSMS with cache size $C$.
Then, for a multithreaded computation with $T_1$ work (total number of instructions) and $T_\infty$ critical path (longest sequence of dependences), we show the following results for the work-stealing algorithm running on an HSMS.

• Lower bounds on the number of cache misses for general computations: We show that there is a family of computations with $T_1 = \Theta(n)$ such that $Q_1(C) = 3C$, while even on two processors the number of misses $Q_2(C) = \Omega(n)$.
• Upper bounds on the number of cache misses for nested-parallel computations: For a nested-parallel computation, we show that $Q_P(C) \le Q_1(C) + 2C\tau$, where $\tau$ is the number of steals in the $P$-processor execution. We then show that the expected number of steals is $O(\lceil m/s \rceil P T_\infty)$, where $m$ is the time for a cache miss and $s$ is the time for a steal.
• Upper bound on the execution time of nested-parallel computations: We show that the expected execution time of a nested-parallel computation on $P$ processors is $O(T_1(C)/P + m \lceil m/s \rceil C T_\infty + (m + s) T_\infty)$, where $T_1(C)$ is the uniprocessor execution time of the computation including cache misses.

[Figure 1: The speedup obtained by three different over-relaxation algorithms (speedup versus number of processes; curves: linear, work stealing, locality-guided work stealing, static partitioning).]

As in previous work [6, 9], we represent a multithreaded computation as a directed acyclic graph (dag) of instructions. Each node in the dag represents a single instruction and the edges represent ordering constraints. A nested-parallel computation [5, 6] is a race-free computation that can be represented with a series-parallel dag [33]. Nested-parallel computations include computations consisting of parallel loops and forks and joins, and any nesting of them. This class includes most computations that can be expressed in Cilk [8] and all computations that can be expressed in Nesl [5]. Our results show that nested-parallel computations have much better locality characteristics under work stealing than do general computations. We also briefly consider another class of computations, computations with futures […], and show that they can be as bad as general computations.

The second part of our results are on further improving the data locality of multithreaded computations with work stealing. In work stealing, a processor steals a thread from a randomly (with uniform distribution) chosen processor when it runs out of work. In certain applications, such as iterative data-parallel applications, random steals may cause poor data locality. Locality-guided work stealing is a heuristic modification to work stealing that allows a thread to have an affinity for a process. In locality-guided work stealing, when a process obtains work, it gives priority to a thread that has affinity for the process. Locality-guided work stealing can be used to implement a number of techniques that researchers suggest to improve data locality.
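As a rough illustration of the priority rule just described, the Python sketch below shows one way a process might pick its next thread. It is a hypothetical policy, not the implementation evaluated in this paper: the `Thread.affinity` field, the function name `obtain_work`, and the three-level priority order (own deque, then other deques, then an ordinary random steal from the top of a victim's deque) are all assumptions made for the example.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    affinity: Optional[int] = None  # hypothetical: the process this thread prefers


def obtain_work(pid, deques, rng):
    """Return the next thread for process `pid`, or None if a steal attempt fails.

    Hypothetical locality-guided policy; the right end of each deque is its bottom.
    """
    own = deques[pid]
    # 1. Prefer a thread in our own deque that has affinity for us (search bottom-up).
    for i in range(len(own) - 1, -1, -1):
        if own[i].affinity == pid:
            thread = own[i]
            del own[i]
            return thread
    # 2. Next, claim a thread with affinity for us from another process's deque.
    for q, d in enumerate(deques):
        if q != pid:
            for i, thread in enumerate(d):
                if thread.affinity == pid:
                    del d[i]  # safe: we return before iterating further
                    return thread
    # 3. Otherwise behave like plain work stealing.
    if own:
        return own.pop()  # bottom of our own deque
    others = [q for q in range(len(deques)) if q != pid]
    if not others:
        return None
    victim = deques[rng.choice(others)]  # uniformly random victim
    return victim.popleft() if victim else None  # top of the victim's deque
```

The linear scans keep the sketch short; a practical scheduler would more likely keep an additional per-process queue of threads with affinity for that process so that step 2 does not search every deque.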
For example, the programmer can achieve an initial distribution of work among the processes, or schedule threads based on hints, by appropriately assigning affinities to threads in the computation.

Our preliminary experiments with locality-guided work stealing give encouraging results, showing that for certain applications the performance is very close to that of static partitioning in dedicated mode (i.e., when the user can lock down a fixed number of processors), but does not suffer a performance cliff problem [10] in multiprogrammed mode (i.e., when processors might be taken by other users or the OS). Figure 1 shows a graph comparing work stealing, locality-guided work stealing, and static partitioning for a simple over-relaxation algorithm on a 14-processor Sun Ultra Enterprise. The over-relaxation algorithm iterates over a 1-dimensional array performing a 3-point stencil computation on each step. The superlinear speedup for static partitioning and locality-guided work stealing is due to the fact that the data for each run does not fit into the L2 cache of one processor but fits into the collective L2 cache of … or more processors. For this benchmark, the following can be seen from the graph.
1. Locality-guided work stealing does significantly better than standard work stealing, since on each step the cache is pre-warmed with the data it needs.
2. Locality-guided work stealing does approximately as well as static partitioning for up to 14 processes.
3. When trying to schedule more than 14 processes on 14 processors, static partitioning has a serious performance drop. The initial drop is due to load imbalance caused by the coarse-grained partitioning. The performance then approaches that of work stealing as the partitioning gets more fine-grained.

We are interested in the performance of work-stealing computations on hardware-controlled shared memory systems (HSMSs). We model an HSMS as a group of identical processors, each of which has its own cache, and a single shared memory. Each cache contains $C$ blocks and is managed by the memory subsystem automatically.
We allow for a variety of cache organizations and replacement policies, including both direct-mapped and associative caches. We assign a server process to each processor and associate the cache of a processor with the process to which that processor is assigned. One limitation of our work is that we assume that there is no false sharing.

2 Related Work

As mentioned in Section 1, there are three main classes of techniques that researchers have suggested to improve the data locality of multithreaded programs. In the first class, the program data is distributed among the nodes of a distributed shared-memory system by the programmer, and a thread in the computation is scheduled on the node that holds the data that the thread accesses […]. In the second class, data-locality hints supplied by the programmer are used in thread scheduling […]. Techniques from both classes are employed in distributed shared memory systems such as COOL and Illinois Concert [15, 22] and are also used to improve the data locality of sequential programs [31]. However, the first class of techniques does not apply directly to HSMSs, because HSMSs do not allow software-controlled distribution of data among the caches. Furthermore, both classes of techniques rely on the programmer to determine the data access patterns in the application and thus may not be appropriate for applications with complex data-access patterns.

The third class of techniques, which is based on execution of threads that are close in the computation graph on the same process, is applied in many scheduling algorithms, including work stealing […]. Blumofe et al. showed bounds on the number of cache misses in a fully-strict computation executed by the work-stealing algorithm under the dag-consistent distributed shared memory of Cilk [7]. Dag consistency is a relaxed memory-consistency model that is employed in the distributed shared-memory implementation of the Cilk language. In a distributed Cilk application, processes maintain dag consistency by means of the BACKER algorithm. In [7], Blumofe et al. bound the number of shared-memory cache misses in a distributed Cilk application for caches that are maintained with the LRU replacement policy. They assumed that accesses to the shared memory are distributed uniformly and independently, which is not generally true, because threads may concurrently access the same pages by algorithm design. Furthermore, they assumed that processes do not generate steal attempts frequently, by making processes do additional page transfers before they attempt to steal from another process.

[Figure 2: A dag (directed acyclic graph) for a multithreaded computation. Threads are shown as gray rectangles.]

3 The Model

In this section, we present a graph-theoretic model for multithreaded computations, describe the work-stealing algorithm, define series-parallel and nested-parallel computations, and introduce our model of an HSMS (Hardware-controlled Shared-Memory System).

As with previous work [6, 9], we represent a multithreaded computation as a directed acyclic graph, a dag, of instructions (see Figure 2). Each node in the dag represents an instruction and the edges represent ordering constraints. There are three types of edges: continuation, spawn, and dependency edges. A thread is a sequential ordering of instructions, and the nodes that correspond to the instructions are linked in a chain by continuation edges. A spawn edge represents the creation of a new thread and goes from the node representing the instruction that spawns the new thread to the node representing the first instruction of the new thread. A dependency edge from instruction $i$ of a thread to instruction $j$ of some other thread represents a synchronization between two instructions such that instruction $j$ must be executed after $i$. Throughout this paper, we draw spawn edges with thick straight arrows, dependency edges with curly arrows, and continuation edges with thin straight arrows. We also show paths with wavy lines.

For a computation with an associated dag $G$, we define the computational work, $T_1$, as the number of nodes in $G$ and the critical path, $T_\infty$, as the number of nodes on the longest path of $G$.

Let $u$ and $v$ be any two nodes in a dag. Then we call $u$ an ancestor of $v$, and $v$ a descendant of $u$, if there is a path from $u$ to $v$.
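The definitions of $T_1$, $T_\infty$, and the ancestor relation translate directly into code. The following Python sketch assumes a dag stored as a dictionary from each node to the list of its children; the function names are illustrative and not taken from any scheduler.

```python
from graphlib import TopologicalSorter


def all_nodes(dag):
    """Every node that appears as a key or as a child in the adjacency dict."""
    nodes = set(dag)
    for children in dag.values():
        nodes.update(children)
    return nodes


def work(dag):
    """T_1: the number of nodes in the dag."""
    return len(all_nodes(dag))


def critical_path(dag):
    """T_inf: the number of nodes on the longest path of the dag."""
    preds = {v: set() for v in all_nodes(dag)}
    for u, children in dag.items():
        for v in children:
            preds[v].add(u)
    longest = {}  # longest[v] = number of nodes on the longest path ending at v
    for v in TopologicalSorter(preds).static_order():  # parents before children
        longest[v] = 1 + max((longest[p] for p in preds[v]), default=0)
    return max(longest.values())


def is_ancestor(dag, u, v):
    """True if there is a path from u to v; every node is its own ancestor."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(dag.get(x, ()))
    return False


def independent(dag, u, v):
    """Two nodes are independent if neither is an ancestor of the other."""
    return not (is_ancestor(dag, u, v) or is_ancestor(dag, v, u))
```

For a fork-join "diamond" `{"r": ["a", "b"], "a": ["f"], "b": ["f"]}`, the work is 4, the critical path is 3, and the two children `a` and `b` are independent.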
Any node is its own descendant and ancestor. We say that two nodes are relatives if there is a path from one to the other; otherwise, we say that the nodes are independent. The children of a node are independent, because otherwise the edge from the node to one child is redundant. We call a common descendant $w$ of $u$ and $v$ a merger of $u$ and $v$ if the paths from $u$ to $w$ and from $v$ to $w$ have only $w$ in common. We define the depth of a node $u$ as the number of edges on the shortest path from the root node to $u$. We define the least common ancestor of $u$ and $v$ as the ancestor of both $u$ and $v$ with maximum depth. Similarly, we define the greatest common descendant of $u$ and $v$ as the descendant of both $u$ and $v$ with minimum depth. An edge $(u, v)$ is redundant if there is a path between $u$ and $v$ that does not contain the edge $(u, v)$. The transitive reduction of a dag is the dag with all the redundant edges removed. In this paper, we are only concerned with the transitive reduction of the computational dags. We also require that the dags have a single node with in-degree 0, the root, and a single node with out-degree 0, the final node.

In a multiprocess execution of a multithreaded computation, independent nodes can execute at the same time. If two independent nodes read or modify the same data, we say that they are RR or WW sharing, respectively. If one node is reading and the other is modifying the data, we say they are RW sharing. RW or WW sharing can cause data races, and the output of a computation with such races usually depends on the scheduling of nodes. Such races are typically indicative of a bug [18]. We refer to computations that do not have any RW or WW sharing as race-free computations. In this paper, we consider only race-free computations.

The work-stealing algorithm is a thread scheduling algorithm for multithreaded computations. The idea of work stealing dates back to the research of Burton and Sleep [11] and has been studied extensively since then […]. In the work-stealing algorithm, each process maintains a pool of ready threads and obtains work from its pool. When a process spawns a new thread, the process adds the thread into its pool.
When a process runs out of work and finds its pool empty, it chooses a random process as its victim and tries to steal work from the victim's pool.

In our analysis, we imagine the work-stealing algorithm operating on individual nodes in the computation dag rather than on the threads. Consider a multithreaded computation and its execution by the work-stealing algorithm. We divide the execution into discrete time steps such that at each step, each process is either working on a node, which we call the assigned node, or is trying to steal work. The execution of a node takes 1 time step if the node does not incur a cache miss and $m$ steps otherwise. We say that a node is executed at the time step that a process completes executing the node. The execution time of a computation is the number of time steps that elapse between the time step that a process starts executing the root node and the time step that the final node is executed. The execution schedule specifies the activity of each process at each time step.

During the execution, each process maintains a deque (doubly ended queue) of ready nodes; we call the ends of a deque the top and the bottom. When a node $u$ is executed, it enables some other node $v$ if $u$ is the last parent of $v$ that is executed. We call the edge $(u, v)$ an enabling edge and $u$ the designated parent of $v$. When a process executes a node that enables other nodes, one of the enabled nodes becomes the assigned node and the process pushes the rest onto the bottom of its deque. If no node is enabled, then the process obtains work from its deque by removing a node from the bottom of the deque. If a process finds its deque empty, it becomes a thief and steals from a randomly chosen process, the victim. This is a steal attempt and takes at least $s$ and at most $ks$ time steps, for some constant $k \ge 1$, to complete. A thief process might make multiple steal attempts before succeeding, or might never succeed. When a steal succeeds, the thief process starts working on the stolen node at the step following the completion of the steal. We say that a steal attempt occurs at the step it completes.

The work-stealing algorithm can be implemented in various ways.
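For concreteness, here is a minimal Python simulation of the deque discipline described above. It assumes that every node and every steal attempt takes one time step (no cache misses), that the first enabled child becomes the assigned node, and that thieves remove from the top of the victim's deque; the last point is the usual convention but is an assumption of this sketch. `work_steal` is a hypothetical helper, not an API of Cilk or Hood.

```python
import random
from collections import deque


def work_steal(dag, root, num_procs, seed=0):
    """Simulate work stealing on a dag given as {node: [children, ...]}.

    Assumptions of this sketch: each node and each steal attempt takes one
    time step, the first enabled child becomes the assigned node, and the
    right end of a deque is its bottom.  Returns (per-process order, steals).
    """
    rng = random.Random(seed)
    remaining = {}  # number of parents of each node not yet executed
    for u, children in dag.items():
        remaining.setdefault(u, 0)
        for v in children:
            remaining[v] = remaining.get(v, 0) + 1
    assigned = [None] * num_procs
    assigned[0] = root
    deques = [deque() for _ in range(num_procs)]
    order = [[] for _ in range(num_procs)]
    steals, done = 0, 0
    while done < len(remaining):
        for p in range(num_procs):
            u = assigned[p]
            if u is None:
                if num_procs > 1:  # thief: one steal attempt per step
                    victim = rng.randrange(num_procs - 1)
                    if victim >= p:
                        victim += 1  # uniform over the other processes
                    if deques[victim]:
                        assigned[p] = deques[victim].popleft()  # top of the deque
                        steals += 1
                continue
            order[p].append(u)
            done += 1
            enabled = []
            for v in dag.get(u, ()):
                remaining[v] -= 1
                if remaining[v] == 0:  # u was the last parent of v to execute
                    enabled.append(v)
            if enabled:
                assigned[p] = enabled[0]
                deques[p].extend(enabled[1:])  # the rest go onto the bottom
            elif deques[p]:
                assigned[p] = deques[p].pop()  # take work from our own bottom
            else:
                assigned[p] = None
    return order, steals
```

On the diamond dag `{"r": ["a", "b"], "a": ["f"], "b": ["f"]}`, one process executes `r, a, b, f` with no steals, while two processes split the work as `[r, a]` and `[b, f]` after a single steal of `b`.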
We say that an implementation of work stealing is deterministic if, whenever a process enables other nodes, the implementation always chooses the same node as the assigned node for the next step on that process, and the remaining nodes are always placed in the deque in the same order. This must be true for both multiprocess and uniprocess executions. We refer to a deterministic implementation of the work-stealing algorithm, together with the HSMS that runs the implementation, as a work stealer. For brevity, we refer to an execution of a multithreaded computation with a work stealer as an execution. We define the total work as the number of steps taken by a uniprocess execution, including the cache misses, and denote it by $T_1(C)$, where $C$ is the cache size. We denote the number of cache misses in a $P$-process execution with $C$-block caches as $Q_P(C)$. We define the cache overhead of a $P$-process execution as $Q_P(C) - Q_1(C)$, where $Q_1(C)$ is the number of misses in the uniprocess execution on the same work stealer.

[Figure 3: Illustrates the recursive definition for series-parallel dags. Figure (a) is the base case, figure (b) depicts the serial composition, and figure (c) depicts the parallel composition.]

We refer to a multithreaded computation for which the transitive reduction of the corresponding dag is series-parallel [33] as a series-parallel computation. A series-parallel dag $G(V, E)$ is a dag with two distinguished vertices, a source $s \in V$ and a sink $t \in V$, and can be defined recursively as follows (see Figure 3).
• Base: $G$ consists of a single edge connecting $s$ to $t$.
• Series Composition: $G$ consists of two series-parallel dags $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ with disjoint edge sets such that $s$ is the source of $G_1$, $u$ is the sink of $G_1$ and the source of $G_2$, and $t$ is the sink of $G_2$. Moreover, $V_1 \cap V_2 = \{u\}$.
• Parallel Composition: The graph consists of two series-parallel dags $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ with disjoint edge sets such that $s$ and $t$ are the source and the sink of both $G_1$ and $G_2$. Moreover, $V_1 \cap V_2 = \{s, t\}$.

A nested-parallel computation is a race-free series-parallel computation [6].

We also consider multithreaded computations that use futures […]. The dag structures of computations with futures are defined elsewhere [4]. This is a superclass of nested-parallel computations, but still much more restrictive than general computations. The work-stealing algorithm for futures is a restricted form of the work-stealing algorithm, where a process starts executing a newly created thread immediately, putting its assigned thread onto its deque.

In our analysis, we consider several cache organizations and replacement policies for an HSMS. We model a cache as a set of (cache) lines, each of which can hold the data belonging to a memory block (a consecutive, typically small, region of memory). One instruction can operate on at most one memory block. We say that an instruction accesses a block, or the line that contains the block, when the instruction reads or modifies the block.
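The recursive definition of series-parallel dags given earlier in this section can be expressed as three constructors. In this Python sketch, which assumes integer node names drawn from a global counter, `series` identifies the sink of $G_1$ with the source of $G_2$, and `parallel` identifies both sources and both sinks; the names `SPDag`, `base`, `series`, and `parallel` are illustrative.

```python
from dataclasses import dataclass
from itertools import count

_fresh = count()  # global supply of unused node names (an assumption of this sketch)


@dataclass(frozen=True)
class SPDag:
    source: int
    sink: int
    edges: frozenset  # set of (u, v) pairs


def nodes(g):
    return {x for edge in g.edges for x in edge}


def base():
    """A single edge from a fresh source to a fresh sink."""
    s, t = next(_fresh), next(_fresh)
    return SPDag(s, t, frozenset({(s, t)}))


def series(g1, g2):
    """Series composition: the sink of g1 becomes the source of g2."""
    if nodes(g1) & nodes(g2):
        raise ValueError("compose vertex-disjoint dags")
    rename = {g2.source: g1.sink}
    edges2 = frozenset((rename.get(u, u), rename.get(v, v)) for u, v in g2.edges)
    return SPDag(g1.source, g2.sink, g1.edges | edges2)


def parallel(g1, g2):
    """Parallel composition: g1 and g2 share their source and their sink."""
    if nodes(g1) & nodes(g2):
        raise ValueError("compose vertex-disjoint dags")
    rename = {g2.source: g1.source, g2.sink: g1.sink}
    edges2 = frozenset((rename.get(u, u), rename.get(v, v)) for u, v in g2.edges)
    if g1.edges & edges2:
        raise ValueError("edge sets must stay disjoint")
    return SPDag(g1.source, g1.sink, g1.edges | edges2)
```

For example, `parallel(series(base(), base()), series(base(), base()))` builds a four-node fork-join, while `parallel(base(), base())` is rejected because the two edges would coincide, violating the disjoint-edge-set requirement.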
We say that an instruction overwrites a line that contains the block $b$ when the instruction accesses some other block that replaces $b$ in the cache. We say that a cache replacement policy is simple if it satisfies two conditions. First, the policy is deterministic. Second, whenever the policy decides to overwrite a cache line $l$, it makes the decision to overwrite $l$ by only using information pertaining to the accesses that are made after the last access to $l$. We refer to a cache managed with a simple cache-replacement policy as a simple cache.

Simple caches and replacement policies are common in practice. For example, the least-recently-used (LRU) replacement policy, direct-mapped caches, and set-associative caches where each set is maintained by a simple cache replacement policy are simple.

[Figure 4: The structure of the dag of a computation with a large cache overhead.]

In regard to the definition of RW or WW sharing, we assume that reads and writes pertain to the whole block. This means we do not allow for false sharing, in which two processes accessing different portions of a block invalidate the block in each other's caches. In practice, false sharing is an issue, but it can often be avoided with knowledge of the underlying memory system and by appropriately padding the shared data to prevent two processes from accessing different portions of the same block.

4 General Computations

In this section, we show that the cache overhead of a multiprocess execution of a general computation and of a computation with futures can be large even though the uniprocess execution incurs a small number of misses.

Theorem 1 There is a family of computations $\{G_n : n = kC, \text{ for } k \in \mathbb{Z}^+\}$ with $\Theta(n)$ computational work, whose uniprocess execution incurs $3C$ misses while any 2-process execution of the computation incurs $\Omega(n)$ misses on a work stealer with a cache size of $C$, assuming that $S = O(C)$, where $S$ is the maximum steal time.

Proof: Figure 4 shows the structure of a dag $G_{4C}$ for $n = 4C$. Each node except the root node represents a sequence of $C$ instructions accessing a set of $C$ distinct memory blocks. The root node represents $C + S$ instructions that access $C$ distinct memory blocks.
The graph has two symmetric components, $L_n$ and $R_n$, which correspond to the left and the right subtree of the root, excluding the leaves. We partition the nodes in $G_n$ into three classes, such that all nodes in a class access the same memory blocks while nodes from different classes access mutually disjoint sets of memory blocks. The first class contains the root node only, the second class contains all the nodes in $L_n$, and the third class contains the rest of the nodes, which are the nodes in $R_n$ and the leaves of $G_n$. For general $n = kC$, $G_n$ can similarly be partitioned into $L_n$, $R_n$, the $k$ leaves of $G_n$, and the root. Each of $L_n$ and $R_n$ contains $2\lceil k/2 \rceil - 1$ nodes and has the structure of a complete binary tree with additional $k$ leaves at the lowest level. There is a dependency edge from the leaves of both $L_n$ and $R_n$ to the leaves of $G_n$.

Consider a work stealer that executes the nodes of $G_n$ in the order that they are numbered in a uniprocess execution. In the uniprocess execution, no node in $L_n$ incurs a cache miss except the root node of $L_n$, since all nodes in $L_n$ access the same memory blocks as the root of $L_n$. The same argument holds for $R_n$ and the $k$ leaves of $G_n$. Hence the execution of the nodes in $L_n$, $R_n$, and the leaves causes $2C$ misses. Since the root node causes $C$ misses, the total number of misses in the uniprocess execution is $3C$.

Now consider a 2-process execution with the same work stealer, and call the processes process 0 and process 1. At time step 1, process 0 starts executing the root node, which enables the root of $R_n$ no later than time step $m$. Since process 1 starts stealing immediately and there are no other processes to steal from, process 1 steals and starts working on the root of $R_n$ no later than time step $m + S$. Hence the root of $R_n$ executes before the root of $L_n$, and thus all the nodes in $R_n$ execute before the corresponding symmetric nodes in $L_n$. Therefore, for any leaf of $G_n$, the parent that is in $R_n$ executes before the parent in $L_n$. Therefore, a leaf node of $G_n$ is executed immediately after its parent in $L_n$ and thus causes $C$ cache misses. Thus, the total number of cache misses is $\Omega(kC) = \Omega(n)$.

There exist computations similar to the computation in Figure 4 that generalize Theorem 1 to an arbitrary number of processes by making sure that all the processes but 2 steal throughout any multiprocess execution. Even in the general case, however, where the average parallelism is higher than the number of processes, Theorem 1 can be generalized with the same bound on the expected number of cache misses by exploiting the symmetry in $G_n$ and by assuming a symmetrically distributed steal time. With a symmetrically distributed steal time, for any $\epsilon$, a steal that takes $\epsilon$ steps more than the mean steal time is equally likely to happen as a steal that takes $\epsilon$ fewer steps than the mean.

Theorem 1 holds for computations with futures as well. Multithreaded computing with futures is a fairly restricted form of multithreaded computing compared to computing with events such as synchronization variables. The graph $F$ in Figure 5 shows the structure of a dag whose 2-process execution causes a large number of cache misses. In a 2-process execution of $F$, the enabling parents of the leaf nodes in the right subtree of the root are in the left subtree, and therefore the execution of each such leaf node causes $C$ misses.

[Figure 5: The structure of the dag of a computation with futures that can incur a large cache overhead.]
5 Nested-Parallel Computations

In this section, we show that the cache overhead of an execution of a nested-parallel computation with a work stealer is at most twice the product of the number of steals and the cache size. Our proof has two steps. First, we show that the cache overhead is bounded by the product of the cache size and the number of nodes that are executed out of order with respect to the uniprocess execution order. Second, we prove that the number of such out-of-order executions is at most twice the number of steals.

Consider a computation $G$ and its $P$-process execution, $X_P$, with a work stealer, and the uniprocess execution, $X_1$, with the same work stealer. Let $v$ be a node in $G$ and let $u$ be the node that executes immediately before $v$ in $X_1$. Then we say that $v$ is drifted in $X_P$ if node $u$ is not executed immediately before $v$ by the process that executes $v$ in $X_P$.

Lemma 2 establishes a key property of an execution with simple caches.

Lemma 2 Consider a process with a simple cache of $C$ blocks. Let $E_1$ denote the execution of a sequence of instructions on the process starting with cache state $\sigma_1$, and let $E_2$ denote the execution of the same sequence of instructions starting with cache state $\sigma_2$. Then $E_1$ incurs at most $C$ more misses than $E_2$.

Proof: We construct a one-to-one mapping between the cache lines in $E_1$ and $E_2$ such that an instruction that accesses a line $l_1$ in $E_1$ accesses the entry $l_2$ in $E_2$ if and only if $l_1$ is mapped to $l_2$. Consider $E_1$ and let $l_1$ be a cache line. Let $i$ be the first instruction that accesses or overwrites $l_1$. Let $l_2$ be the cache line that the same instruction accesses or overwrites in $E_2$, and map $l_1$ to $l_2$. Since the caches are simple, an instruction that overwrites $l_1$ in $E_1$ overwrites $l_2$ in $E_2$. Therefore, the number of misses that overwrite $l_1$ in $E_1$ is equal to the number of misses that overwrite $l_2$ in $E_2$ after instruction $i$. Since $i$ itself can cause 1 miss, the number of misses that overwrite $l_1$ in $E_1$ is at most 1 more than the number of misses that overwrite $l_2$ in $E_2$. We construct the mapping for each cache line in $E_1$ in the same way. Now let us show that the mapping is one-to-one.
For the sake of contradiction, assume that two cache lines $l_1$ and $l_1'$ in $E_1$ map to the same line in $E_2$. Let $i$ and $i'$ be the first instructions accessing these cache lines in $E_1$, such that $i$ is executed before $i'$. Since $i$ and $i'$ map to the same line in $E_2$ and the caches are simple, $i'$ accesses the line that $i$ accesses in $E_1$; but then $l_1 = l_1'$, a contradiction. Hence, the total number of cache misses in $E_1$ is at most $C$ more than the misses in $E_2$.

Theorem 3 Let $D$ denote the total number of drifted nodes in an execution of a nested-parallel computation with a work stealer on $P$ processes, each of which has a simple cache with $C$ blocks. Then the cache overhead of the execution is at most $CD$.

Proof: Let $X_P$ denote the $P$-process execution and let $X_1$ be the uniprocess execution of the same computation with the same work stealer. We divide the multiprocess computation into $D$ pieces, each of which can incur at most $C$ more misses than in the uniprocess execution. Let $u$ be a drifted node and let $q$ be the process that executes $u$. Let $v$ be the next drifted node executed on $q$ (or the final node of the computation). Let the ordered set $O$ represent the execution order of all the nodes that are executed after $u$ ($u$ is included) and before $v$ ($v$ is excluded if it is drifted, included otherwise) on $q$ in $X_P$. Then the nodes in $O$ are executed on the same process and in the same order in both $X_1$ and $X_P$.

Now consider the number of cache misses during the execution of the nodes in $O$ in $X_1$ and $X_P$. Since the computation is nested-parallel and therefore race-free, a process that executes in parallel with $q$ does not cause $q$ to incur cache misses due to sharing. Therefore, by Lemma 2, during the execution of the nodes in $O$, the number of cache misses in $X_P$ is at most $C$ more than the number of misses in $X_1$. This bound holds for each of the sequences of such instructions corresponding to drifted nodes. Since the sequence starting at the root node and ending at the first drifted node incurs the same number of misses in $X_1$ and $X_P$, $X_P$ takes at most $CD$ more misses than $X_1$, and the cache overhead is at most $CD$.

Lemma 2 (and thus Theorem 3) does not hold for caches that are not simple.
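Lemma 2 is easy to probe empirically for a concrete simple policy. The Python sketch below models a fully associative LRU cache (an assumption; the lemma covers any simple policy) and replays random traces from two random starting states, recording the largest observed excess of misses of the first run over the second. Such a test can only fail to find counterexamples; it does not prove the lemma. The helpers `lru_misses` and `worst_gap` are hypothetical names.

```python
import random
from collections import OrderedDict


def lru_misses(trace, capacity, initial=()):
    """Misses of a fully associative LRU cache of `capacity` blocks.

    `initial` is the starting cache state, listed from least to most recently used.
    """
    cache = OrderedDict.fromkeys(initial)
    if len(cache) > capacity:
        raise ValueError("initial state larger than the cache")
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: now most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return misses


def worst_gap(capacity, num_blocks, length, trials, seed=0):
    """Largest observed (misses from state 1) - (misses from state 2) on random traces."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        trace = [rng.randrange(num_blocks) for _ in range(length)]
        state1 = rng.sample(range(num_blocks), capacity)
        state2 = rng.sample(range(num_blocks), capacity)
        gap = lru_misses(trace, capacity, state1) - lru_misses(trace, capacity, state2)
        worst = max(worst, gap)
    return worst
```

For the trace `[1, 2, 1, 3, 1]` with $C = 2$, a cold start incurs 3 misses while a start with blocks 1 and 2 resident incurs 1, a gap of exactly $C$. The bound relies on the policy being simple: a least-frequently-used policy bases its eviction decisions on frequency counts accumulated before the last access to a line, so the same check need not pass for it.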
For example, consider the execution of a sequence of instructions on a cache with the least-frequently-used replacement policy, starting at two cache states. In the first cache state, the blocks that are frequently accessed by the instructions are in the cache with high frequencies, whereas in the second cache state, the blocks that are in the cache are not accessed by the instructions and have low frequencies. The execution with the second cache state therefore incurs many more misses than the size of the cache, compared to the execution with the first cache state.

[Figure 6: The children of $x$ and their merger.]

[Figure 7: The join embedding of $u$ and $v$.]

Now we show that the number of drifted nodes in an execution of a series-parallel computation with a work stealer is at most twice the number of steals. The proof is based on the representation of series-parallel computations as sp-dags. We call a node with out-degree of at least 2 a fork node, and partition the nodes of an sp-dag except the root into three categories: join nodes, stable nodes, and nomadic nodes. We call a node that has an in-degree of at least 2 a join node, and partition all the nodes that have in-degree 1 into two classes: a nomadic node has a parent that is a fork node, and a stable node has a parent that has out-degree 1. The root node has in-degree 0 and does not belong to any of these categories. Lemma 4 lists two fundamental properties of sp-dags; one can prove both properties by induction on the number of edges in an sp-dag.

Lemma 4 Let $G$ be an sp-dag. Then $G$ has the following properties.
1. The least common ancestor of any two nodes in $G$ is unique.
2. The greatest common descendant of any two nodes in $G$ is unique and is equal to their unique merger.

Lemma 5 Let $x$ be a fork node. Then no child of $x$ is a join node.

Proof: Let $u$ and $v$ denote two children of $x$ and suppose $u$ is a join node, as in Figure 6. Let $y$ denote some other parent of $u$ and let $z$ denote the unique merger of $u$ and $v$. Then both $z$ and $u$ are mergers for $x$ and $y$, which contradicts Lemma 4. Hence $u$ is not a join node.

Corollary 6 Only nomadic nodes can be stolen in an execution of a series-parallel computation by the work-stealing algorithm.

Proof: Let $u$ be a stolen node in an execution. Then $u$ is pushed on a deque, and thus the enabling parent of $u$ is a fork node. By Lemma 5, $u$ is not a join node and has an in-degree of 1. Therefore, $u$ is nomadic.

Consider a series-parallel computation and let $G$ be its sp-dag.
Let $u$ and $v$ be two independent nodes in $G$, and let $s$ and $t$ denote their least common ancestor and greatest common descendant, respectively, as shown in Figure 7. Let $G_1$ denote the graph that is induced by the relatives of $u$ that are descendants of $s$ and also ancestors of $t$. Similarly, let $G_2$ denote the graph that is induced by the relatives of $v$ that are descendants of $s$ and ancestors of $t$. Then we call $G_1$ the embedding of $u$ with respect to $v$, and $G_2$ the embedding of $v$ with respect to $u$. We call the graph that is the union of $G_1$ and $G_2$ the join embedding of $u$ and $v$, with source $s$ and sink $t$. Now consider an execution of $G$, and let $c$ and $c'$ be the children of $s$ such that $c$ is executed before $c'$. Then we call $c$ the leader and $c'$ the guard of the join embedding.

[Figure 8: $s$ is the least common ancestor of $y_1$ and $y_2$; nodes $u$ and $v$ are the children of $s$.]

Lemma 7 Let $G(V, E)$ be an sp-dag and let $y_1$ and $y_2$ be two parents of a join node in $G$. Let $G_1$ denote the embedding of $y_1$ with respect to $y_2$, and $G_2$ denote the embedding of $y_2$ with respect to $y_1$. Let $s$ denote the source and $t$ denote the sink of the join embedding. Then the parents of any node in $G_1$ except for $s$ and $t$ are in $G_1$, and the parents of any node in $G_2$ except for $s$ and $t$ are in $G_2$.

Proof: Since $y_1$ and $y_2$ are independent, both $s$ and $t$ are different from $y_1$ and $y_2$ (see Figure 8). First, we show that there is no edge that starts at a node in $G_1$ other than $s$ and ends at a node in $G_2$ other than $t$, and vice versa. For the sake of contradiction, assume there is an edge $(a, b)$ such that $a \ne s$ is in $G_1$ and $b \ne t$ is in $G_2$. Then $a$, rather than $s$, would be the least common ancestor of $y_1$ and $y_2$; hence no such edge $(a, b)$ exists. A similar argument holds when $a$ is in $G_2$ and $b$ is in $G_1$.

Second, we show that there does not exist an edge that originates from a node outside of $G_1$ and $G_2$ and ends at a node in $G_1$ or $G_2$. For the sake of contradiction, let $(a, b)$ be an edge such that $b$ is in $G_1$ and $a$ is not in $G_1$ or $G_2$. Then $b$ is the unique merger for the two children of the least common ancestor of $a$ and $s$, which we denote by $r$. But then $t$ is also a merger for the children of $r$. The children of $r$ are independent and have a unique merger; hence there is no such edge $(a, b)$. A similar argument holds when $b$ is in $G_2$.
Therefore we conclude that the parents of any node in $G_1$ except $s$ and $t$ are in $G_1$, and the parents of any node in $G_2$ except $s$ and $t$ are in $G_2$.

Lemma 8 Let $G$ be an sp-dag, and let $x$ and $y$ be two parents of a join node in $G$. Consider the join embedding of $x$ and $y$, and let $u$ be the guard node of the embedding. Then, if the guard node $u$ is not stolen, $x$ and $y$ are executed in the same respective order in a multiprocess execution as they are in the uniprocess execution.

Proof: Let $s$ be the source, $t$ the sink, and $v$ the leader of the join embedding. Since $u$ is not stolen, $v$ is not stolen. Hence, by Lemma 7, before it starts working on $u$, the process that executes $s$ has executed $v$ and all its descendants in the embedding except for $t$. Hence $x$ is executed before $u$ and $y$ is executed after $u$, as in the uniprocess execution. Therefore $x$ and $y$ are executed in the same respective order as they are in the uniprocess execution.
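The node categories defined above (fork, join, stable, nomadic) are easy to compute mechanically. The following sketch is our own illustration, not code from the paper: the edge-list representation and the helper names `classify_nodes` and `fork_children_are_not_joins` are assumptions. It classifies the non-root nodes of a dag given as (parent, child) edges and checks the conclusion of Lemma 5 on a small sp-dag.

```python
from collections import defaultdict

# Hypothetical helpers (not from the paper). Assumes a single root and
# that every non-root node has at least one parent.
def classify_nodes(edges, root):
    """Classify each non-root node of a dag given as (parent, child) edges.

    join:    indegree >= 2
    nomadic: indegree 1 and the parent is a fork node (outdegree >= 2)
    stable:  indegree 1 and the parent has outdegree 1
    The root (indegree 0) belongs to no category.
    """
    children, parents = defaultdict(list), defaultdict(list)
    nodes = {root}
    for parent, child in edges:
        children[parent].append(child)
        parents[child].append(parent)
        nodes.update((parent, child))
    is_fork = {n: len(children[n]) >= 2 for n in nodes}
    kinds = {}
    for n in nodes - {root}:
        if len(parents[n]) >= 2:
            kinds[n] = "join"
        elif is_fork[parents[n][0]]:
            kinds[n] = "nomadic"
        else:
            kinds[n] = "stable"
    return kinds, is_fork

def fork_children_are_not_joins(edges, root):
    """True if no child of a fork node is a join node (Lemma 5's conclusion)."""
    kinds, is_fork = classify_nodes(edges, root)
    return all(kinds[child] != "join" for parent, child in edges if is_fork[parent])

# A small sp-dag: fork a, two chains b-d and c-e, join f, then g.
sp_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"),
            ("d", "f"), ("e", "f"), ("f", "g")]
print(sorted(classify_nodes(sp_edges, "a")[0].items()))
print(fork_children_are_not_joins(sp_edges, "a"))  # True
```

A dag such as a→b, a→c, b→c, which is not an sp-dag in the sense used here, makes the check fail, because c is both a child of the fork a and a join node.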

[Figure 9: Nodes $t_1$ and $t_2$ are two join nodes with the common guard $u$.]

Lemma 9 A nomadic node is drifted in an execution only if it is stolen.

Proof: Let $u$ be a nomadic and drifted node. Then, by Lemma 5, $u$ has a single parent $s$ that enables $u$. If $u$ is the first child of $s$ to execute in the uniprocess execution, then $u$ is not drifted in the multiprocess execution. Hence $u$ is not the first child to execute. Let $v$ be the last child of $s$ that is executed before $u$ in the uniprocess execution. Now consider the multiprocess execution, and let $q$ be the process that executes $v$. For the sake of contradiction, assume that $u$ is not stolen. Consider the join embedding of $u$ and $v$, as shown in Figure 8. Since, by Lemma 7, all parents of the nodes in $G_2$ except for $s$ and $t$ are in $G_2$, process $q$ executes all the nodes in $G_2$ before it executes $u$, and thus the node of $G_2$ that $q$ executes last precedes $u$ on $q$. But then $u$ is not drifted, because that node is the node that is executed immediately before $u$ in the uniprocess computation. Hence $u$ is stolen.

Let us define the cover of a join node $t$ in an execution as the set of all the guard nodes of the join embeddings of all possible pairs of parents of $t$ in the execution. The following lemma shows that a join node is drifted only if a node in its cover is stolen.

Lemma 10 A join node is drifted in an execution only if a node in its cover is stolen in the execution.

Proof: Consider the execution, and let $t$ be a join node that is drifted. Assume, for the sake of contradiction, that no node in the cover of $t$ is stolen. Let $x$ and $y$ be any two parents of $t$, as in Figure 8. Then, by Lemma 8, $x$ and $y$ are executed in the same order as in the uniprocess execution. But then all parents of $t$ execute in the same order as in the uniprocess execution. Hence the enabling parent of $t$ in the execution is the same as in the uniprocess execution. Furthermore, the enabling parent of $t$ has outdegree 1, because otherwise $t$ is not a join node by Lemma 5, and thus the process that enables $t$ executes $t$. Therefore $t$ is not drifted. This is a contradiction; hence a node in the cover of $t$ is stolen.
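The notion of a drifted node can be checked on a concrete schedule. The sketch below assumes the following reading of the definition: a node is drifted if the node executed immediately before it on its own process in the multiprocess execution is not the node executed immediately before it in the uniprocess execution. Schedules are plain lists of node names; the function name `drifted_nodes` and this representation are hypothetical. On the sp-dag a→b, a→c, b→d, c→e, d→f, e→f, f→g with a single steal (of c), the result never contains more than two nodes, in line with Lemma 11.

```python
def drifted_nodes(uniprocess_order, process_orders):
    """Return the set of drifted nodes (hypothetical helper).

    Assumed definition: a node is drifted if its predecessor on its own
    process differs from its predecessor in the uniprocess execution
    (None means the node has no predecessor).
    """
    uni_prev, prev = {}, None
    for node in uniprocess_order:
        uni_prev[node] = prev
        prev = node
    drifted = set()
    for order in process_orders:   # one list per process, in execution order
        prev = None
        for node in order:
            if uni_prev[node] != prev:
                drifted.add(node)
            prev = node
    return drifted

# Depth-first uniprocess order for a->b, a->c, b->d, c->e, d->f, e->f, f->g.
uni = ["a", "b", "d", "c", "e", "f", "g"]
# One steal (of c). Which process runs the join f decides whether f drifts.
print(sorted(drifted_nodes(uni, [["a", "b", "d"], ["c", "e", "f", "g"]])))  # ['c']
print(sorted(drifted_nodes(uni, [["a", "b", "d", "f", "g"], ["c", "e"]])))  # ['c', 'f']
```

In the second schedule both the stolen nomadic node c and the join node f drift; f's cover contains the guard c, so both are charged to the same steal.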
Lemma 11 The number of drifted nodes in an execution of a series-parallel computation is at most twice the number of steals in the execution.

Proof: We associate each drifted node in the execution with a steal such that no steal has more than 2 drifted nodes associated with it. Consider a drifted node $u$. Then $u$ is not the root node of the computation, and it is not stable either. Hence $u$ is either a nomadic or a join node. If $u$ is nomadic, then $u$ is stolen by Lemma 9, and we associate $u$ with the steal that steals $u$. Otherwise, $u$ is a join node and, by Lemma 10, there is a node in its cover that is stolen. We associate $u$ with the steal that steals a node in its cover. Now assume there are more than 2 nodes associated with a steal that steals node $u$. Then there are at least two join nodes $t_1$ and $t_2$ that are associated with $u$. Therefore node $u$ is in the join embedding of two parents of $t_1$ and also of $t_2$. Let $x_1, y_1$ be these parents of $t_1$ and $x_2, y_2$ be the parents of $t_2$, as shown in Figure 9. But then $u$ has a parent that is a fork node and is a join node, which contradicts Lemma 5. Hence no such $u$ exists.

Theorem 12 The cache overhead of an execution of a nested-parallel computation with simple caches is at most twice the product of the number of steals in the execution and the cache size.

Proof: Follows from Theorem 3 and Lemma 11.

6 An Analysis of Non-blocking Work Stealing

The non-blocking implementation of the work-stealing algorithm delivers provably good performance under traditional and multiprogrammed workloads. A description of the implementation and its analysis is presented in [2]; an experimental evaluation is given in [10]. In this section, we extend the analysis of the non-blocking work-stealing algorithm for classical workloads and bound the execution time of a nested-parallel computation with a work stealer to include the number of cache misses, the cache-miss penalty, and the steal time. First, we bound the number of steal attempts in an execution of a general computation by the work-stealing algorithm. Then we bound the execution time of a nested-parallel computation with a work stealer using results from Section 5.
The analysis that we present here is similar to the analysis given in [2] and uses the same potential-function technique. We associate a nonnegative potential with nodes in a computation's dag and show that the potential decreases as the execution proceeds. We assume that a node in a computation dag has outdegree at most 2. This is consistent with the assumption that each node represents one instruction.

Consider an execution of a computation with its dag $G = (V, E)$ with the work-stealing algorithm. The execution grows a tree, the enabling tree, that contains each node in the computation and its enabling edge. We define the distance of a node $u \in V$ as $d(u) = T_\infty - \mathrm{depth}(u)$, where $\mathrm{depth}(u)$ is the depth of $u$ in the enabling tree of the computation. Intuitively, the distance of a node indicates how far the node is from the end of the computation. We define the potential function in terms of distances. At any given step $i$, we assign a positive potential to each ready node; all other nodes have potential 0. A node is ready if it is enabled and not yet executed to completion. Let $u$ denote a ready node at time step $i$. Then we define $\phi_i(u)$, the potential of $u$ at time step $i$, as

$$\phi_i(u) = \begin{cases} 3^{2d(u)-1} & \text{if } u \text{ is assigned;} \\ 3^{2d(u)} & \text{otherwise.} \end{cases}$$

The potential at step $i$, $\Phi_i$, is the sum of the potentials of all ready nodes at step $i$. When an execution begins, the only ready node is the root node, which has distance $T_\infty$ and is assigned to some process, so we start with $\Phi_0 = 3^{2T_\infty - 1}$. As the execution proceeds, nodes that are deeper in the dag become ready and the potential decreases. There are no ready nodes at the end of an execution, and the potential is 0.

Let us give a few more definitions that enable us to associate a potential with each process. Let $R_i(q)$ denote the set of ready nodes that are in the deque of process $q$, along with $q$'s assigned node, if any, at the beginning of step $i$. We say that each node $u$ in $R_i(q)$ belongs to process $q$. Then we define the potential of $q$'s deque as

$$\Phi_i(q) = \sum_{u \in R_i(q)} \phi_i(u).$$

In addition, let $A_i$ denote the set of processes whose deque is empty at the beginning of step $i$, and let $D_i$ denote the set of all other processes. We partition the potential $\Phi_i$ into two parts,

$$\Phi_i = \Phi_i(A_i) + \Phi_i(D_i), \qquad \Phi_i(A_i) = \sum_{q \in A_i} \Phi_i(q), \qquad \Phi_i(D_i) = \sum_{q \in D_i} \Phi_i(q),$$

and we analyze the two parts separately.

Lemma 13 lists four basic properties of the potential that we use frequently. The proofs for these properties are given in [2], and the listed properties hold independently of the time that the execution of a node or a steal takes. Therefore we give only a short proof sketch.

Lemma 13 The potential function satisfies the following properties.
1. Suppose node $u$ is assigned to a process at step $i$. Then the potential decreases by at least $\frac{2}{3}\phi_i(u)$.
2. Suppose a node $u$ is executed at step $i$. Then the potential decreases by at least $\frac{5}{9}\phi_i(u)$ at step $i$.
3. Consider any step $i$ and any process $q$ in $D_i$. The topmost node $u$ in $q$'s deque contributes at least $\frac{3}{4}$ of the potential associated with $q$. That is, we have $\phi_i(u) \geq \frac{3}{4}\Phi_i(q)$.
4. Suppose a process $p$ chooses process $q$ in $D_i$ as its victim at time step $i$ (a steal attempt of $p$ targeting $q$ occurs at step $i$). Then the potential decreases by at least $\frac{1}{2}\Phi_i(q)$ due to the assignment or execution of a node belonging to $q$ at the end of step $i$.

Property 1 follows directly from the definition of the potential function. Property 2 holds because a node enables at most two children with smaller potential, one of which becomes assigned. Specifically, the potential after the execution of node $u$ decreases by at least $\phi(u)\left(1 - \frac{1}{3} - \frac{1}{9}\right) = \frac{5}{9}\phi(u)$. Property 3 follows from a structural property of the nodes in a deque: the distances of the nodes in a process's deque decrease monotonically from the top of the deque to the bottom. Therefore the potential in the deque is a sum of geometrically decreasing terms and is dominated by the potential of the top node. The last property holds because, when a process chooses process $q$ in $D_i$ as its victim, the node $u$ at the top of $q$'s deque is assigned at the next step. Therefore the potential decreases by $\frac{2}{3}\phi_i(u)$ by Property 1.
Moreover, $\phi_i(u) \geq \frac{3}{4}\Phi_i(q)$ by Property 3, and the result follows, since $\frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2}$.

Lemma 16 shows that the potential decreases as a computation proceeds. The proof of Lemma 16 uses the balls-and-bins bound from Lemma 14.

Lemma 14 (Balls and Weighted Bins) Suppose that at least $P$ balls are thrown independently and uniformly at random into $P$ bins, where bin $i$ has a weight $W_i$, for $i = 1, \ldots, P$. The total weight is $W = \sum_{i=1}^{P} W_i$. For each bin $i$, define the random variable $X_i$ as

$$X_i = \begin{cases} W_i & \text{if some ball lands in bin } i; \\ 0 & \text{otherwise.} \end{cases}$$

If $X = \sum_{i=1}^{P} X_i$, then for any $\beta$ in the range $0 < \beta < 1$, we have $\Pr\{X \geq \beta W\} > 1 - \frac{1}{(1-\beta)e}$.

This lemma can be proven with an application of Markov's inequality. The proof of a weaker version of this lemma for the case of exactly $P$ throws is similar and is given in [2]. Lemma 14 also follows from the weaker lemma, because $X$ does not decrease with more throws.

We now show that whenever $P$ or more steal attempts occur, the potential decreases by a constant fraction of $\Phi_i(D_i)$ with constant probability.

Lemma 15 Consider any step $i$ and any later step $j$ such that at least $P$ steal attempts occur at steps from $i$ (inclusive) to $j$ (exclusive). Then we have

$$\Pr\left\{\Phi_i - \Phi_j \geq \frac{1}{4}\Phi_i(D_i)\right\} > \frac{1}{4}.$$

Moreover, the potential decrease is due to the execution or assignment of nodes belonging to a process in $D_i$.

Proof: Consider all $P$ processes and $P$ steal attempts that occur at or after step $i$. For each process $q$ in $D_i$, if one or more of the $P$ attempts target $q$ as the victim, then the potential decreases by $\frac{1}{2}\Phi_i(q)$ due to the execution or assignment of nodes that belong to $q$, by Property 4 in Lemma 13. If we think of each attempt as a ball toss, then we have an instance of the Balls and Weighted Bins Lemma (Lemma 14). For each process $q$ in $D_i$ we assign a weight $W_q = \frac{1}{2}\Phi_i(q)$, and for each other process $q$ in $A_i$ we assign a weight $W_q = 0$. The weights sum to $W = \frac{1}{2}\Phi_i(D_i)$. Using $\beta = \frac{1}{2}$ in Lemma 14, we conclude that the potential decreases by at least $\beta W = \frac{1}{4}\Phi_i(D_i)$ with probability greater than $1 - \frac{1}{(1-\beta)e} = 1 - \frac{2}{e} > \frac{1}{4}$, due to the execution or assignment of nodes that belong to a process in $D_i$.
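A small numeric check of the potential-function facts used above may help. The sketch below is an illustration under our own naming (`phi`, `execute_drop`, `top_share`, `lemma14_bound`, and `simulate_weighted_bins` are hypothetical helpers). It recomputes Property 2 of Lemma 13 exactly with rational arithmetic, shows one deque configuration that attains the $\frac{3}{4}$ of Property 3 (a deque holding a single node at distance $d$ plus an assigned node at the same distance, as right after a node enables two children), and estimates the Lemma 14 probability by Monte Carlo for $\beta = \frac{1}{2}$.

```python
import math
import random
from fractions import Fraction

# Hypothetical helpers illustrating Lemmas 13 and 14 (not code from the paper).
def phi(distance, assigned):
    """Potential of a ready node: 3^(2d-1) if assigned, 3^(2d) otherwise."""
    exponent = 2 * distance - 1 if assigned else 2 * distance
    return Fraction(3) ** exponent

def execute_drop(distance):
    """Potential drop when an assigned node at distance d executes and enables
    two children at distance d - 1, one assigned and one pushed (worst case)."""
    return phi(distance, True) - phi(distance - 1, True) - phi(distance - 1, False)

def top_share(deque_distances, assigned_distance=None):
    """Fraction of a process's potential carried by the top node of its deque.
    deque_distances lists distances from the top of the deque to the bottom."""
    total = sum(phi(d, False) for d in deque_distances)
    if assigned_distance is not None:
        total += phi(assigned_distance, True)
    return phi(deque_distances[0], False) / total

def lemma14_bound(beta):
    """Lower bound 1 - 1/((1 - beta) e) on Pr{X >= beta W}."""
    return 1 - 1 / ((1 - beta) * math.e)

def simulate_weighted_bins(weights, beta, trials, seed=0):
    """Monte Carlo estimate of Pr{X >= beta W} with P balls and P weighted bins."""
    rng = random.Random(seed)
    p, total = len(weights), sum(weights)
    successes = 0
    for _ in range(trials):
        hit = {rng.randrange(p) for _ in range(p)}  # bins that received a ball
        if sum(weights[i] for i in hit) >= beta * total:
            successes += 1
    return successes / trials

print(execute_drop(4) / phi(4, True))        # 5/9
print(top_share([5], assigned_distance=5))   # 3/4
print(round(lemma14_bound(0.5), 3))          # 0.264
```

The bound of Lemma 14 is loose for typical weight vectors: even when one bin carries almost all the weight, that bin is hit with probability $1 - (1 - 1/P)^P$, which is about 0.66 for $P = 8$.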
We now bound the number of steal attempts in a work-stealing computation.

Lemma 16 Consider a $P$-process execution of a multithreaded computation with the work-stealing algorithm. Let $T_1$ and $T_\infty$ denote the computational work and the critical path of the computation. Then the expected number of steal attempts in the execution is $O(\lceil m/s \rceil P T_\infty)$. Moreover, for any $\epsilon > 0$, the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \ln(1/\epsilon)))$ with probability at least $1 - \epsilon$.

Proof: We analyze the number of steal attempts by breaking the execution into phases of $\lceil m/s \rceil P$ steal attempts. We show that, with constant probability, a phase causes the potential to drop by a constant factor. The first phase begins at step $i_1 = 1$ and ends at the first step $j_1$ such that at least $\lceil m/s \rceil P$ steal attempts occur during the interval of steps $[i_1, j_1]$. The second phase begins at step $i_2 = j_1 + 1$, and so on.

Let us first show that there are at least $m$ steps in a phase. A process has at most 1 outstanding steal attempt at any time, and a steal attempt takes at least $s$ steps to complete. Therefore at most $P$ steal attempts occur in a period of $s$ time steps. Hence a phase of steal attempts takes at least $\lceil (\lceil m/s \rceil P)/P \rceil s \geq m$ time units.

Consider a phase beginning at step $i$, and let $j$ be the step at which the next phase begins. Then $i + m \leq j$. We will show that $\Pr\{\Phi_j \leq \frac{3}{4}\Phi_i\} > \frac{1}{4}$. Recall that the potential can be partitioned as $\Phi_i = \Phi_i(A_i) + \Phi_i(D_i)$. Since the phase contains $\lceil m/s \rceil P \geq P$ steal attempts, Lemma 15 gives $\Pr\{\Phi_i - \Phi_j \geq \frac{1}{4}\Phi_i(D_i)\} > \frac{1}{4}$, where the decrease is due to the execution or assignment of nodes that belong to a process in $D_i$.

Now we show that the potential also drops by a constant fraction of $\Phi_i(A_i)$ due to the execution of the nodes that are assigned to the processes in $A_i$. Consider a process, say $q$, in $A_i$. If $q$ does not have an assigned node, then $\Phi_i(q) = 0$. If $q$ has an assigned node $u$, then $\Phi_i(q) = \phi_i(u)$. In this case, process $q$ completes executing node $u$ at step $i + m - 1 < j$ at the latest, and the potential drops by at least $\frac{5}{9}\phi_i(u)$ by Property 2 of Lemma 13. Summing over each process in $A_i$, we have $\Phi_i - \Phi_j \geq \frac{5}{9}\Phi_i(A_i)$. Thus we have shown that the potential decreases by at least a quarter of $\Phi_i(A_i)$ and, with probability more than $\frac{1}{4}$, by at least a quarter of $\Phi_i(D_i)$. Therefore, no matter how the total potential is distributed over $A_i$ and $D_i$, the total potential decreases by a quarter with probability more than $\frac{1}{4}$; that is, $\Pr\{\Phi_i - \Phi_j \geq \frac{1}{4}\Phi_i\} > \frac{1}{4}$.

We say that a phase is successful if it causes the potential to drop by at least a $\frac{1}{4}$ fraction. A phase is successful with probability at least $\frac{1}{4}$. Since the potential starts at $\Phi_0 = 3^{2T_\infty - 1}$ and ends at 0 (and is always an integer), the number of successful phases is at most $(2T_\infty - 1)\log_{4/3} 3 < 8T_\infty$. The expected number of phases needed to obtain $8T_\infty$ successful phases is at most $32T_\infty$. Thus the expected number of phases is $O(T_\infty)$, and because each phase contains $O(\lceil m/s \rceil P)$ steal attempts, the expected number of steal attempts is $O(\lceil m/s \rceil P T_\infty)$. The high-probability bound follows by an application of the Chernoff bound.

Theorem 17 Let $M_P(C)$ be the number of cache misses in a $P$-process execution of a nested-parallel computation with a work stealer that has simple caches of $C$ blocks each. Let $M_1(C)$ be the number of cache misses in the uniprocess execution. Then

$$M_P(C) = M_1(C) + O\left(\left\lceil \frac{m}{s} \right\rceil C P T_\infty + \left\lceil \frac{m}{s} \right\rceil C P \ln(1/\epsilon)\right)$$

with probability at least $1 - \epsilon$. The expected number of cache misses is $M_1(C) + O(\lceil m/s \rceil C P T_\infty)$.

Proof: Theorem 12 shows that the cache overhead of a nested-parallel computation is at most twice the product of the number of steals and the cache size. Lemma 16 shows that the number of steal attempts is $O(\lceil m/s \rceil P (T_\infty + \ln(1/\epsilon)))$ with probability at least $1 - \epsilon$, and that the expected number of steal attempts is $O(\lceil m/s \rceil P T_\infty)$. The number of steals is not greater than the number of steal attempts. Therefore the bounds follow.

Theorem 18 Consider a $P$-process, nested-parallel, work-stealing computation with simple caches of $C$ blocks. Then, for any $\epsilon > 0$, the execution time is

$$O\left(\frac{T_1(C)}{P} + m\left\lceil \frac{m}{s} \right\rceil C \left(T_\infty + \ln(1/\epsilon)\right) + (m + s)\left(T_\infty + \ln(1/\epsilon)\right)\right)$$

with probability at least $1 - \epsilon$. Moreover, the expected running time is

$$O\left(\frac{T_1(C)}{P} + m\left\lceil \frac{m}{s} \right\rceil C T_\infty + (m + s)T_\infty\right).$$

Proof: We use an accounting argument to bound the running time.
At each step in the computation, each process puts a dollar into one of two buckets that matches its activity at that step. We name the two buckets the work bucket and the steal bucket. A process puts a dollar into the work bucket at a step if it is working on a node in that step. The execution of a node in the dag adds either 1 or $m$ dollars to the work bucket. Similarly, a process puts a dollar into the steal bucket for each step that it spends stealing. Each steal attempt takes $O(s)$ steps; therefore each steal attempt adds $O(s)$ dollars to the steal bucket. The number of dollars in the work bucket at the end of the execution is at most $T_1 + (m-1)M_P(C)$, which, since $T_1(C) = T_1 + (m-1)M_1(C)$, is

$$T_1(C) + (m-1)\,O\left(\left\lceil \frac{m}{s} \right\rceil C P T_\infty + \left\lceil \frac{m}{s} \right\rceil C P \ln(1/\epsilon)\right)$$

with probability at least $1 - \epsilon$. The total number of dollars in the steal bucket is the total number of steal attempts multiplied by the number of dollars added to the steal bucket for each steal attempt, which is $O(s)$. Therefore the total number of dollars in the steal bucket is

$$O\left(s\left\lceil \frac{m}{s} \right\rceil P \left(T_\infty + \ln(1/\epsilon)\right)\right)$$

with probability at least $1 - \epsilon$. Each process adds exactly one dollar to a bucket at each step, so we divide the total number of dollars by $P$, and use $s\lceil m/s \rceil \leq m + s$, to get the high-probability bound in the theorem. A similar argument holds for the expected time bound.

[Figure 10: The tree of threads created in a data-parallel work-stealing application.]

7 Locality-Guided Work Stealing

The work-stealing algorithm achieves good data locality by executing nodes that are close in the computation graph on the same process. For certain applications, however, regions of the program that access the same data are not close in the computational graph. As an example, consider an application that takes a sequence of steps, each of which operates in parallel over a set or array of values. We will call such an application an iterative data-parallel application. Such an application can be implemented using work stealing by forking a tree of threads on each step, in which each leaf of the tree updates a region of the data (typically disjoint). Figure 10 shows an example of the trees of threads created in two steps. Each node represents a thread and is labeled with the process that executes it. The gray nodes are the leaves.
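To make the locality problem concrete, here is a toy model; it is not a simulation of the work-stealing algorithm itself. It builds the leaf regions of the binary fork tree that each step of an iterative data-parallel application forks, then compares two placements across steps: (a) each leaf lands on a uniformly random process every step, a crude stand-in we assume for the effect of random steals, and (b) each leaf stays on the process it used in the previous step, which is what an affinity aims for. The names `leaf_regions` and `same_process_fraction` and the uniform-random placement are our assumptions.

```python
import random

# Toy model (our assumption): uniform random placement stands in for random
# steals; it is NOT the work-stealing algorithm.
def leaf_regions(lo, hi, grain):
    """Index ranges updated by the leaves of a binary fork tree over [lo, hi)."""
    if hi - lo <= grain:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return leaf_regions(lo, mid, grain) + leaf_regions(mid, hi, grain)

def same_process_fraction(n, grain, procs, steps, use_affinity, seed=0):
    """Fraction of leaf threads that run on the same process as the
    corresponding leaf thread of the previous step."""
    rng = random.Random(seed)
    leaves = leaf_regions(0, n, grain)   # the same tree is forked on every step
    placement = {leaf: rng.randrange(procs) for leaf in leaves}  # first step
    matches = 0
    for _ in range(steps - 1):
        for leaf in leaves:
            if use_affinity:
                new_proc = placement[leaf]        # run where it ran last step
            else:
                new_proc = rng.randrange(procs)   # stand-in for random steals
            matches += (new_proc == placement[leaf])
            placement[leaf] = new_proc
    return matches / (len(leaves) * (steps - 1))

print(leaf_regions(0, 8, 2))  # [(0, 2), (2, 4), (4, 6), (6, 8)]
print(same_process_fraction(1024, 16, 8, 20, use_affinity=True))   # 1.0
print(same_process_fraction(1024, 16, 8, 20, use_affinity=False))  # roughly 1/8
```

Under random placement, a leaf returns to its previous process only about $1/P$ of the time, so almost every step starts with a cold cache for the region it updates.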
The threads synchronize in the same order as they fork. The first and second steps are structurally identical, and each pair of corresponding gray nodes updates the same region, often using much of the same input data. The dashed rectangle in Figure 10, for example, shows a pair of such gray nodes. To get good locality for this application, threads that update the same data on different steps should ideally run on the same processor, even though they are not close in the dag. In work stealing, however, this is highly unlikely to happen, due to the random steals. Figure 10, for example, shows an execution where all pairs of corresponding gray nodes run on different processes.

In this section, we describe and evaluate locality-guided work stealing, a heuristic modification to work stealing that is designed to allow locality between nodes that are distant in the computational graph. In locality-guided work stealing, each thread can be given an affinity for a process, and when a process obtains work it gives priority to threads with affinity for it. To enable this, in addition to a deque, each process maintains a mailbox: a first-in-first-out

(FIFO) queue of pointers to threads that have affinity for the process. There are then two differences between the locality-guided work-stealing and work-stealing algorithms. First, when creating a thread, a process pushes the thread both onto its deque, as in normal work stealing, and onto the tail of the mailbox of the process that the thread has affinity for. Second, a process first tries to obtain work from its mailbox before attempting a steal. Because threads can appear twice, once in a mailbox and once on a deque, some form of synchronization is needed between the two copies to make sure the thread is not executed twice.

A number of techniques that have been suggested to improve the data locality of multithreaded programs can be realized by the locality-guided work-stealing algorithm together with an appropriate policy to determine the affinities of threads. For example, an initial distribution of work among processes can be enforced by setting the affinity of each thread to the process that it will be assigned to at the beginning of the computation. We call this locality-guided work stealing with initial placements. Likewise, techniques that rely on hints from the programmer can be realized by setting the affinity of threads based on the hints. In the next section, we describe an implementation of locality-guided work stealing for iterative data-parallel applications. The implementation described can easily be modified to implement the other techniques mentioned.

7.1 Implementation

We built locality-guided work stealing into Hood. Hood is a multithreaded programming library with a non-blocking implementation of work stealing that delivers provably good performance under both traditional and multiprogrammed workloads [ ]. In Hood, the programmer defines a thread as a C++ class, which we refer to as the thread definition. A thread definition has a method named run that defines the code that the thread executes. The run method is a C++ function that can call Hood library functions to create and synchronize with other threads. A rope is an object that is an instance of a thread definition class.
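The two scheduler changes that locality-guided work stealing makes (pushing a new thread both onto the creator's deque and onto the tail of its affinity process's mailbox, and consulting the mailbox before attempting a steal) can be sketched in a few lines. This is a single-threaded Python model, not Hood's C++ implementation: the class and method names are hypothetical, a `threading.Lock` stands in for the atomic test-and-set or compare-and-swap on a per-thread flag, and we assume a process drains its own deque before looking at its mailbox. Stale copies are discarded lazily when a claim fails, in the spirit of the lazy marking scheme described for Hood.

```python
import random
import threading
from collections import deque

class Task:
    """A thread with an affinity; the claim flag prevents double execution."""
    def __init__(self, name, affinity):
        self.name = name
        self.affinity = affinity        # process this thread prefers
        self._claimed = False
        self._lock = threading.Lock()   # stand-in for test-and-set / CAS

    def try_claim(self):
        """Atomically mark the task; only the first caller succeeds."""
        with self._lock:
            if self._claimed:
                return False
            self._claimed = True
            return True

class LocalityGuidedScheduler:
    """Hypothetical model: one deque and one FIFO mailbox per process."""
    def __init__(self, procs, seed=0):
        self.deques = [deque() for _ in range(procs)]
        self.mailboxes = [deque() for _ in range(procs)]
        self.rng = random.Random(seed)

    def spawn(self, proc, task):
        self.deques[proc].append(task)              # bottom of creator's deque
        self.mailboxes[task.affinity].append(task)  # tail of affinity mailbox

    def next_task(self, proc):
        """Return a newly claimed task for proc, or None if none was found."""
        while self.deques[proc]:                    # own deque, from the bottom
            task = self.deques[proc].pop()
            if task.try_claim():
                return task                         # failed claims: stale copy
        while self.mailboxes[proc]:                 # mailbox before stealing
            task = self.mailboxes[proc].popleft()
            if task.try_claim():
                return task
        victim = self.rng.randrange(len(self.deques))
        while self.deques[victim]:                  # steal from the top
            task = self.deques[victim].popleft()
            if task.try_claim():
                return task
        return None

s = LocalityGuidedScheduler(2)
a, b = Task("A", affinity=1), Task("B", affinity=0)
s.spawn(0, a)
s.spawn(0, b)
print(s.next_task(1).name)  # A (taken from process 1's mailbox, no steal)
print(s.next_task(0).name)  # B (bottom of process 0's deque)
print(s.next_task(0))       # None (remaining copies are already claimed)
```

Process 1 obtains A through its mailbox without stealing, and the duplicate copies of A and B left behind in the deque and mailbox are simply discarded when their claims fail.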
Each time the run method of a rope is executed, it creates a new thread. A rope can have an affinity for a process, and when the Hood run-time system executes such a rope, the system passes this affinity to the thread. If the thread does not run on the process for which it has affinity, the affinity of the rope is updated to the new process. Iterative data-parallel applications can use ropes effectively by making sure that all corresponding threads (threads that update the same region across different steps) are generated from the same rope. A thread will therefore always have an affinity for the process on which its corresponding thread ran on the previous step. The dashed rectangle in Figure 10, for example, represents two threads that are generated in two executions of one rope. To initialize the ropes, the programmer needs to create a tree of ropes before the first step. This tree is then used on each step when forking the threads.

To implement locality-guided work stealing in Hood, we use a non-blocking queue for each mailbox. Since a thread is put into a mailbox and onto a deque, one issue is making sure that the thread is not executed twice, once from the mailbox and once from the deque. One solution is to remove the other copy of a thread when a process starts executing it. In practice, this is not efficient because it has a large synchronization overhead. In our implementation, we do this lazily: when a process starts executing a thread, it sets a flag using an atomic update operation, such as test-and-set or compare-and-swap, to mark the thread. When about to execute a thread, a process identifies an already-marked thread through the atomic update and discards it. The second issue comes up when one wants to reuse the thread data structures, typically those from the previous step. When a thread's structure is reused in a step, the copies from the previous step, which can be in a mailbox or a deque, need to be marked invalid. One can implement this by invalidating all the multiple copies of threads at the end of a step and synchronizing all processes before the next step starts. In multiprogrammed workloads, however, the kernel can swap a process out, preventing it from participating in the current step. Such a swapped-out process prevents all the other processes from proceeding to the next step. In our implementation, to avoid the synchronization at the end of each step, we timestamp thread data structures such that each process closely follows the time of the computation and ignores a thread that is out of date.

Table 1: Measured benchmark characteristics. [Table body illegible in the extracted text. Columns: Benchmark; Work ($T_1$); Overhead ($T_1/T_s$); Critical Path Length ($T_\infty$); Average Parallelism ($T_1/T_\infty$). Rows: staticheat, heat, lgheat, ipheat, staticrelax, relax, lgrelax, iprelax.] We compiled all applications with the Sun CC compiler using the -xarch=v8plus -O5 -dalign flags. All times are given in seconds. $T_s$ denotes the execution time of the sequential algorithm for the application and is … for Heat and … for Relax.

7.2 Experimental Results

In this section, we present the results of our preliminary experiments with locality-guided work stealing on two small applications. The experiments were run on a 14-processor Sun Ultra Enterprise with … MHz processors and a 4 Mbyte L2 cache each, running Solaris 2.7. We used the processor_bind system call of Solaris 2.7 to bind processes to processors, to prevent the Solaris kernel from migrating a process among processors and causing the process to lose its cache state. When the number of processes is less than the number of processors, we bind one process to each processor; otherwise, we bind processes to processors such that processes are distributed among processors as evenly as possible.

We use the applications Heat and Relax in our evaluation. Heat is a Jacobi over-relaxation that simulates heat propagation on a 2-dimensional grid for a number of steps. This benchmark was derived from similar Cilk [27] and SPLASH [35] benchmarks. The main data structures are two equal-sized arrays.
The algorithm runs in steps, each of which updates the entries in one array using the data in the other array, which was updated in the previous step. Relax is a Gauss-Seidel over-relaxation algorithm that iterates over a 1-dimensional array, updating each element by a weighted average of its value and those of its two neighbors. We implemented each application with four strategies: static partitioning, work stealing, locality-guided work stealing, and locality-guided work stealing with initial placements. The static-partitioning benchmarks divide the total work equally among the processes and make sure that each process accesses the same data elements in all the steps. They are implemented directly with Solaris threads. The three work-stealing strategies are all implemented in Hood. The plain work-stealing version uses threads directly, and the two locality-guided versions use ropes, building a tree of ropes at the beginning of the computation. The initial-placement strategy assigns initial affinities to the ropes near the top of the tree to achieve a good initial load balance. We use the following prefixes in the names of the benchmarks: static (static partitioning), none (work steal


More information

Quick Verification of Concurrent Programs by Iteratively Relaxed Scheduling

Quick Verification of Concurrent Programs by Iteratively Relaxed Scheduling Quick Verificaion of Concurren Programs by Ieraively Relaxed Scheduling Parick Mezler, Habib Saissi, Péer Bokor, Neeraj Suri Technische Univerisä Darmsad, Germany {mezler, saissi, pbokor, suri}@deeds.informaik.u-darmsad.de

More information

Less Pessimistic Worst-Case Delay Analysis for Packet-Switched Networks

Less Pessimistic Worst-Case Delay Analysis for Packet-Switched Networks Less Pessimisic Wors-Case Delay Analysis for Packe-Swiched Neworks Maias Wecksén Cenre for Research on Embedded Sysems P O Box 823 SE-31 18 Halmsad maias.wecksen@hh.se Magnus Jonsson Cenre for Research

More information

Gauss-Jordan Algorithm

Gauss-Jordan Algorithm Gauss-Jordan Algorihm The Gauss-Jordan algorihm is a sep by sep procedure for solving a sysem of linear equaions which may conain any number of variables and any number of equaions. The algorihm is carried

More information

Outline. EECS Components and Design Techniques for Digital Systems. Lec 06 Using FSMs Review: Typical Controller: state

Outline. EECS Components and Design Techniques for Digital Systems. Lec 06 Using FSMs Review: Typical Controller: state Ouline EECS 5 - Componens and Design Techniques for Digial Sysems Lec 6 Using FSMs 9-3-7 Review FSMs Mapping o FPGAs Typical uses of FSMs Synchronous Seq. Circuis safe composiion Timing FSMs in verilog

More information

Theory of Computing Systems 35, 321–347 (2002), © 2002 Springer-Verlag New York Inc. DOI: 10.1007/s00224-002-1057-3. The Data Locality of Work Stealing. Umut A. Acar, Guy E. Blelloch,


The Data Locality of Work Stealing. Umut A. Acar (umut@cs.cmu.edu), School of Computer Science, Carnegie Mellon University. Guy E. Blelloch (blelloch@cs.cmu.edu), School of Computer Science, Carnegie Mellon University.
