Uncorrected Proof. Thread-Level Speculation


Josep Torrellas
University of Illinois at Urbana-Champaign, 4231 Siebel Center, M/C-258, Urbana, IL, USA

Synonyms
Speculative multithreading (SM); Speculative parallelization; Speculative run-time parallelization; Speculative threading; Speculative thread-level parallelization; Thread-level data speculation (TLDS); TLS

Definition
Thread-Level Speculation (TLS) refers to an environment where execution threads operate speculatively, performing potentially unsafe operations, and temporarily buffering the state they generate in a buffer or cache. At a certain point, the operations of a thread are declared to be correct or incorrect. If they are correct, the thread commits, merging the state it generated with the correct state of the program; if they are incorrect, the thread is squashed and typically restarted from its beginning. The term TLS is most often associated with a scenario where the purpose is to execute a sequential application in parallel. In this case, the compiler or the hardware breaks down the application into speculative threads that execute in parallel. However, strictly speaking, TLS can be applied to any environment where threads are executed speculatively and can be squashed and restarted.

Discussion

Basic Concepts in Thread-Level Speculation
In its most common use, Thread-Level Speculation (TLS) consists of extracting units of work (i.e., tasks) from a sequential application and executing them on different threads in parallel, hoping not to violate sequential semantics. The control flow in the sequential code imposes a relative ordering between the tasks, which is expressed in terms of predecessor and successor tasks. The sequential code also induces a data dependence relation on the memory accesses issued by the different tasks that parallel execution cannot violate.

A task is Speculative when it may perform or may have performed operations that violate data or control dependences with its predecessor tasks. Otherwise, the task is nonspeculative.
The memory accesses issued by speculative tasks are called speculative memory accesses.

When a nonspeculative task finishes execution, it is ready to Commit. The role of commit is to inform the rest of the system that the data generated by the task is now part of the safe, nonspeculative program state. Among other operations, committing always involves passing the Commit Token to the immediate successor task. This is because maintaining correct sequential semantics in the parallel execution requires that tasks commit in order from predecessor to successor. If a task reaches its end and is still speculative, it cannot commit until it acquires nonspeculative status and all its predecessors have committed.

Figure 1 shows an example of several tasks running on four processors. In this example, when task T3 executing on processor 4 finishes the execution, it cannot commit until its predecessor tasks T0, T1, and T2 also finish and commit. In the meantime, depending on the hardware support, processor 4 may have to stall or may be able to start executing speculative task T7. The example also shows how the nonspeculative task status changes as tasks finish and commit, and the passing of the commit token.

Thread-Level Speculation. Fig. 1 A set of tasks executing on four processors. The figure shows the nonspeculative task timeline and the transfer of the commit token

David Padua (ed.), Encyclopedia of Parallel Computing, DOI / , Springer Science+Business Media LLC 2011

Memory accesses issued by a speculative task must be handled carefully. Stores generate Speculative Versions of data that cannot simply be merged with the nonspeculative state of the program. The reason is that they may be incorrect. Consequently, these versions are stored in a Speculative Buffer local to the processor running the task, e.g., the first-level cache. Only when the task becomes nonspeculative are its versions safe.

Loads issued by a speculative task try to find the requested datum in the local speculative buffer. If they miss, they fetch the correct version from the memory subsystem, i.e., the closest predecessor version from the speculative buffers of other tasks. If no such version exists, they fetch the datum from memory.

As tasks execute in parallel, the system must identify any violations of cross-task data dependences. Typically, this is done with special hardware or software support that tracks, for each individual task, the data that the task wrote and the data that the task read without first writing it. A data-dependence violation is flagged when a task modifies a datum that has been read earlier by a successor task. At this point, the consumer task is squashed and all the data versions that it has produced are discarded. Then, the task is re-executed.

Figure 2 shows an example of a data-dependence violation. In the example, each iteration of a loop is a task. Each iteration issues two accesses to an array, through an un-analyzable subscripted subscript. At run-time, iteration J writes A[5] after its successor iteration J+2 reads A[5]. This is a Read After Write (RAW) dependence that gets violated due to the parallel execution. Consequently, iteration J+2 is squashed and restarted. Ordinarily, all the successor tasks of iteration J+2 are also squashed at this time because they may have consumed versions generated by the squashed task.
While it is possible to selectively squash only tasks that used incorrect data, it would involve extra complexity. Finally, as iteration J+2 re-executes, it will re-read A[5]. However, at this time, the value read will be the version generated by iteration J.

Note that WAR and WAW dependence violations do not need to induce task squashes. The successor task has prematurely written the datum, but the datum remains buffered in its speculative buffer. A subsequent read from a predecessor task (in a WAR violation) will get a correct version, while a subsequent write from a predecessor task (in a WAW violation) will generate a version that will be merged with main memory before the one from the successor task.

However, many proposed TLS schemes, to reduce hardware complexity, induce squashes in a variety of situations. For instance, if the system has no support to keep different versions of the same datum in different speculative buffers in the machine, cross-task WAR and WAW dependence violations induce squashes. Moreover, if the system only tracks accesses on a per-line basis, it cannot disambiguate accesses to different words in the same memory line. In this case, false sharing of a cache line by two different processors can appear as a data-dependence violation and also trigger a squash.
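The load and commit rules described in this section can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the mechanism of any particular TLS proposal: the class name VersionedMemory, the use of integer task IDs that follow sequential order, and the dictionary standing in for main memory are all hypothetical. It shows a load picking the closest predecessor version, and why WAR and WAW orderings need not cause squashes when each task keeps its own buffered version.

```python
class VersionedMemory:
    """Toy model of per-task speculative buffers; task IDs follow sequential order."""

    def __init__(self, memory):
        self.memory = dict(memory)  # safe, nonspeculative state
        self.buffers = {}           # task_id -> {address: speculative version}
        self.next_to_commit = 0     # task currently holding the commit token

    def store(self, task_id, addr, value):
        # A store only touches the task's own buffer, so a task keeps
        # at most one version of each address.
        self.buffers.setdefault(task_id, {})[addr] = value

    def load(self, task_id, addr):
        # Look in the task's own buffer first, then in the buffers of
        # uncommitted predecessors, closest predecessor first.
        for tid in range(task_id, self.next_to_commit - 1, -1):
            version = self.buffers.get(tid, {})
            if addr in version:
                return version[addr]
        return self.memory[addr]    # no buffered version: use main memory

    def commit(self, task_id):
        # Tasks must commit in order, from predecessor to successor.
        if task_id != self.next_to_commit:
            raise RuntimeError("task does not hold the commit token")
        self.memory.update(self.buffers.pop(task_id, {}))
        self.next_to_commit += 1


mem = VersionedMemory({"A[5]": 0})
mem.store(2, "A[5]", 20)             # successor task 2 writes early
war_value = mem.load(1, "A[5]")      # WAR: predecessor task 1 still reads 0
mem.store(1, "A[5]", 10)             # WAW: predecessor task 1 writes later
seen_by_task3 = mem.load(3, "A[5]")  # closest predecessor version is task 2's
for tid in range(4):
    mem.commit(tid)
print(war_value, seen_by_task3, mem.memory["A[5]"])  # prints: 0 20 20
```

Because commits are applied in task order, the version written by task 1 reaches memory before the one written by task 2, so the final value matches sequential execution.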

    for (i=0; i<N; i++) {
        ... = A[L[i]] + ...
        A[K[i]] = ...
    }

    Iteration J      Iteration J+1    Iteration J+2
    ... = A[4]       ... = A[2]       ... = A[5]
    A[5] = ...       A[2] = ...       A[6] = ...

    RAW violation

Thread-Level Speculation. Fig. 2 Example of a data-dependence violation

Finally, while TLS can be applied to various code structures, it is most often applied to loops. In this case, tasks are typically formed by a set of consecutive iterations.

The rest of this article is organized as follows: First, the article briefly classifies TLS schemes. Then, it describes the two major problems that any TLS scheme has to solve, namely, buffering and managing speculative state, and detecting and handling dependence violations. Next, it describes the initial efforts in TLS, other uses of TLS, and machines that use TLS.

Classification of Thread-Level Speculation Schemes
There have been many proposals of TLS schemes. They can be broadly classified depending on the emphasis on hardware versus software, and the type of target machine.

The majority of the proposed schemes use hardware support to detect cross-task dependence violations that result in task squashes (e.g., [1, 4, 6, 8, 11, 12, 14, 16, 18, 20, 23, 27, 28, 31, 32, 36]). Typically, this is attained by using the hardware cache coherence protocol, which sends coherence messages between the caches when multiple processors access the same memory line. Among all these hardware-based schemes, the majority rely on a compiler or a software layer to identify and prepare the tasks that should be executed in parallel. Consequently, there have been several proposals for TLS compilers (e.g., [9, 19, 33, 34]). Very few schemes rely on the hardware to identify the tasks (e.g., [1]).

Several schemes, especially in the early stages of TLS research, proposed software-only approaches to TLS (e.g., [7, 13, 25, 26]). In this case, the compiler typically generates code that causes each task to keep shadow locations and, after the parallel execution, checks if multiple tasks have updated a common location.
If they have, the original state is restored.

Most proposed TLS schemes target small shared-memory machines of about two to eight processors (e.g., [14, 18, 27, 29]). It is in this range of parallelism that TLS is most cost effective. Some TLS proposals have focused on smaller machines and have extended a superscalar core with some hardware units that execute threads speculatively [1, 20]. Finally, some TLS proposals have targeted scalable multiprocessors [4, 23, 28]. This is a more challenging environment, given the longer communication latencies involved. It requires applications that have significant parallelism that cannot be analyzed statically by the compiler.

Buffering and Managing Speculative State
The state produced by speculative tasks is unsafe, since such tasks may be squashed. Therefore, any TLS scheme must be able to identify such state and, when necessary, separate it from the rest of the memory state. For this, TLS systems use structures, such as caches [4, 6, 12, 18, 28], and special buffers [8, 14, 23, 32], or undo logs [7, 11, 36]. This section outlines the challenges in buffering and managing speculative state. A more detailed analysis and a taxonomy is presented by Garzaran et al. [10].

Multiple Versions of the Same Variable in the System
Every time that a task writes for the first time to a variable, a new version of the variable appears in the system. Thus, two speculative tasks running on different processors may create two different versions of the same variable [4, 12]. These versions need to be buffered separately, and special actions may need to be taken so that a reader task can find the correct version out of the several coexisting in the system. Such a version will be the version created by the producer task that is the closest predecessor of the reader task.

A task has at most a single version of any given variable, even if it writes to the variable multiple times.

The reason is that, on a dependence violation, the whole task is undone. Therefore, there is no need to keep intermediate values of the variable.

Multiple Speculative Tasks per Processor
When a processor finishes executing a task, the task may still be speculative. If the TLS buffering support is such that the processor can only hold state from a single speculative task, the processor stalls until the task commits. However, to better tolerate task load imbalance, the local buffer may have been designed to buffer state from several speculative tasks, enabling the processor to execute another speculative task. In this case, the state of each task must be tagged with the ID of the task.

Multiple Versions of the Same Variable in a Single Processor
When a processor buffers state from multiple speculative tasks, it is possible that two such tasks create two versions of the same variable. This occurs in load-imbalanced applications that exhibit private data patterns (i.e., WAW dependences between tasks). In this case, the buffer will have to hold multiple versions of the same variable. Each version will be tagged with a different task ID. This support introduces complication to the buffer or cache. Indeed, on an external request, extra comparisons will need to be done if the cache has two versions of the same variable.

Merging of Task State
The state produced by speculative tasks is typically merged with main memory at task commit time; however, it can instead be merged as it is being generated. The first approach is called Architectural Main Memory (AMM) or Lazy Version Management; the second one is called Future Main Memory (FMM) or Eager Version Management. These schemes differ on whether the main memory contains only safe data (AMM) or it can also contain speculative data (FMM).

In AMM systems, all speculative versions remain in caches or buffers that are kept separate from the coherent memory state. Only when a task becomes nonspeculative can its buffered state be merged with main memory.
In a straightforward implementation, when a task commits, all the buffered dirty cache lines are merged with main memory, either by writing back the lines to memory [4] or by requesting ownership for them to obtain coherence with main memory [28].

In FMM systems, versions from speculative tasks are merged with the coherent memory when they are generated. However, to enable recovery from task squashes, when a task generates a speculative version of a variable, the previous version of the variable is saved in a log. Note that, in both approaches, the coherent memory state can temporarily reside in caches, which function in their traditional role of extensions of main memory.

Detecting and Handling Dependence Violations

Basic Concepts
The second aspect of TLS involves detecting and handling dependence violations. Most TLS proposals focus on data dependences, rather than control dependences. To detect (cross-task) data-dependence violations, most TLS schemes use the same approach. Specifically, when a speculative task writes a datum, the hardware sets a Speculative Write bit associated with the datum in the cache; when a speculative task reads a datum before it writes to it (an event called Exposed Read), the hardware sets an Exposed Read bit. Depending on the TLS scheme supported, these accesses also cause a tag associated with the datum to be set to the ID of the task.

In addition, when a task writes a datum, the cache coherence protocol transaction that sends invalidations to other caches checks these bits. If a successor task has its Exposed Read bit set for the datum, the successor task has prematurely read the datum (i.e., this is a RAW dependence violation), and is squashed [18].

If the Speculative Write and Exposed Read bits are kept on a per-word basis, only dependences on the same word can cause squashes.
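The bit-based detection scheme just described can be mimicked in software to make the rule concrete. The following is a minimal sketch, assuming per-word tracking and integer task IDs in sequential order; the class AccessBits and its methods are hypothetical names, and real proposals implement this in cache hardware and the coherence protocol rather than in Python sets. The access sequence replays the iterations of Fig. 2, plus one extra later task (not in the figure) to show that successors of the consumer are squashed as well.

```python
from collections import defaultdict


class AccessBits:
    """Software stand-in for per-word Speculative Write / Exposed Read bits."""

    def __init__(self):
        self.spec_write = defaultdict(set)    # task_id -> words written
        self.exposed_read = defaultdict(set)  # task_id -> words read before written

    def read(self, task_id, word):
        # A read is "exposed" only if this task has not written the word yet.
        if word not in self.spec_write[task_id]:
            self.exposed_read[task_id].add(word)

    def write(self, task_id, word):
        """Record a write and return the IDs of the tasks that get squashed."""
        self.spec_write[task_id].add(word)
        violators = [t for t, words in self.exposed_read.items()
                     if t > task_id and word in words]
        if not violators:
            return []
        first = min(violators)
        # Squash the earliest violating consumer and every later task.
        active = set(self.spec_write) | set(self.exposed_read)
        squashed = sorted(t for t in active if t >= first)
        for t in squashed:
            self.spec_write.pop(t, None)
            self.exposed_read.pop(t, None)
        return squashed


bits = AccessBits()
J = 0
bits.read(J, 4)          # iteration J:   ... = A[4]
bits.read(J + 1, 2)      # iteration J+1: ... = A[2]
bits.read(J + 2, 5)      # iteration J+2: ... = A[5]  (premature read)
bits.read(J + 3, 7)      # a later task (hypothetical, not in Fig. 2)
bits.write(J + 1, 2)     # iteration J+1: A[2] = ...
bits.write(J + 2, 6)     # iteration J+2: A[6] = ...
squashed = bits.write(J, 5)   # iteration J: A[5] = ...  -> RAW violation
print(squashed)          # prints: [2, 3]
```

In this model a write by a predecessor to a word that a successor has only written (not exposed-read) returns an empty list, which corresponds to a WAW ordering that does not need a squash.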
However, keeping and maintaining such bits on a per-word basis in caches, network messages, and perhaps directory modules is costly in hardware. Moreover, it does not come naturally to the coherence protocol of multiprocessors, which operate at the granularity of memory lines.

Keeping these bits on a per-line basis is cheaper and compatible with mainstream cache coherence protocols. However, the hardware cannot then disambiguate accesses at word level. Furthermore, it cannot combine different versions of a line that have been updated in different words. Consequently, cross-task RAW and WAW violations, on both the same word and different words of a line (i.e., false sharing), cause squashes.

Task squash is a very costly operation. The cost is threefold: overhead of the squash operation itself, loss of whatever correct work has already been performed by the offending task and its successors, and cache misses in the offending task and its successors needed to reload state when restarting. The latter overhead appears because, as part of the squash operation, the speculative state in the cache is invalidated. Figure 3a shows an example of a RAW violation across tasks i and i+j+1. The consumer task and its successors are squashed.

Techniques to Avoid Squashes
Since squashes are so expensive, there are techniques to avoid them. If the compiler can conclude that a certain pair of accesses will frequently cause a data-dependence violation, it can statically insert a synchronization operation that forces the correct task ordering at runtime.

Alternatively, the machine can have hardware support that records, at runtime, where dependence violations occur. Such hardware may record the program counter of the read or writes involved, or the address of the memory location being accessed. Based on this information, when these program counters are reached or the memory location is accessed, the hardware can try one of several techniques to avoid the violation. This section outlines some of the techniques that can be used. A more complete description of the choices is presented by Cintra and Torrellas [5]. Without loss of generality, a RAW violation is assumed.

Based on past history, the predictor may predict that the pair of conflicting accesses are engaged in false sharing. In this case, it can simply allow the read to proceed and then the subsequent write to execute silently, without sending invalidations.
Later, before the consumer task is allowed to commit, it is necessary to check whether the sections of the line read by the consumer overlap with the sections of the line written by the producer. This can be easily done if the caches have per-word access bits. If there is no overlap, it was false sharing and the squash is avoided. Figure 3b shows the resulting time line.

When there is a true data dependence between tasks, a squash can be avoided with effective use of value prediction. Specifically, the predictor can predict the value that the producer will produce, speculatively provide it to the consumer's read, and let the consumer proceed.

Thread-Level Speculation. Fig. 3 RAW data-dependence violation that results in a squash (a) or that does not cause a squash due to false sharing or value prediction (b), or consumer stall (c and d). Legend: useful work; wasted correct work; possibly incorrect work; squash overhead; checking overhead; stall overhead
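The commit-time check for predicted false sharing, described earlier in this section, reduces to an overlap test on per-word access bits. The sketch below assumes a line of eight words and represents each task's accesses to the line as an integer bit mask; the constant WORDS_PER_LINE and the function names are hypothetical and chosen only for illustration.

```python
WORDS_PER_LINE = 8  # assumed line size, in words


def word_mask(words):
    """Build a per-line bit mask with one bit per accessed word."""
    mask = 0
    for w in words:
        if not 0 <= w < WORDS_PER_LINE:
            raise ValueError("word index outside the line")
        mask |= 1 << w
    return mask


def is_false_sharing(consumer_read_mask, producer_write_mask):
    # No common word means the two tasks only shared the line, not data.
    return (consumer_read_mask & producer_write_mask) == 0


consumer_reads = word_mask([0, 1])
producer_writes = word_mask([5])
print(is_false_sharing(consumer_reads, producer_writes))  # prints: True
print(is_false_sharing(word_mask([5]), producer_writes))  # prints: False
```

When the function returns False, the read and the write touched a common word, so the prediction of false sharing was wrong and the consumer must be squashed.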

Again, before the consumer is allowed to commit, it is necessary to check that the value provided was correct. The timeline is also shown in Fig. 3b.

In cases where the predictor is unable to predict the value, it can avoid the squash by stalling the consumer task at the time of the read. This case can use two possible approaches. An aggressive approach is to release the consumer task and let it read the current value as soon as the predicted producer task commits. The time line is shown in Fig. 3c. In this case, if an intervening task between the first producer and the consumer later writes the line, the consumer will be squashed. A more conservative approach is not to release the consumer task until it becomes nonspeculative. In this case, the presence of multiple predecessor writers will not squash the consumer. The time line is shown in Fig. 3d.

Initial Efforts in Thread-Level Speculation
An early proposal for hardware support for a form of speculative parallelization was made by Knight [16] in the context of functional languages. Later, the Multiscalar processor [27] was the first proposal to use a form of TLS within a single-chip multithreaded architecture. A software-only form of TLS was proposed in the LRPD test [25]. Early proposals of hardware-based TLS include the work of several authors [14, 17, 21, 29, 35].

Other Uses of Thread-Level Speculation
TLS concepts have been used in environments that have goals other than trying to parallelize sequential programs. For example, they have been used to speed up explicitly parallel programs through Speculative Synchronization [22], or for parallel program debugging [24] or program monitoring [37]. Similar concepts to TLS have been used in systems supporting hardware transactional memory [15] and continuous atomic-block operation [30].

Machines that Use Thread-Level Speculation
Several machines built by computer manufacturers have hardware support for some form of TLS, although the specific implementation details are typically not disclosed.
Such machines include systems designed for Java applications such as Sun Microsystems' MAJC chip [31] and Azul Systems' Vega processor [2]. The most high-profile system with hardware support for speculative threads is Sun Microsystems' ROCK processor [3]. Other manufacturers are rumored to be developing prototypes with similar hardware.

Related Entries
Instruction-Level Speculation
Speculative Synchronization
Transactional Memory

Bibliography
1. Akkary H, Driscoll M (1998) A dynamic multithreading processor. In: International symposium on microarchitecture, Dallas, November
2. Azul Systems. Vega 3 Processor. products/vega/processor
3. Chaudhry S, Cypher R, Ekman M, Karlsson M, Landin A, Yip S, Zeffer H, Tremblay M (2009) Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's ROCK Processor. In: International symposium on computer architecture, Austin, June
4. Cintra M, Martínez JF, Torrellas J (2000) Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In: International symposium on computer architecture, Vancouver, June 2000, pp
5. Cintra M, Torrellas J (2002) Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors. In: Proceedings of the 8th High-Performance computer architecture conference, Boston, Feb
6. Figueiredo R, Fortes J (2001) Hardware support for extracting coarse-grain speculative parallelism in distributed shared-memory multiprocessors. In: Proceedings of the international conference on parallel processing, Valencia, Spain, September
7. Frank M, Lee W, Amarasinghe S (2001) A software framework for supporting general purpose applications on raw computation fabrics. Technical report, MIT/LCS Technical Memo MIT-LCS-TM-619, July
8. Franklin M, Sohi G (1996) ARB: a hardware mechanism for dynamic reordering of memory references. IEEE Trans Comput (5):
9. Garcia C, Madriles C, Sanchez J, Marcuello P, Gonzalez A, Tullsen D (2005) Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices. In: Conference on programming language design and implementation, Chicago, Illinois, June
10. Garzarán M, Prvulovic M, Llabería J, Viñals V, Rauchwerger L, Torrellas J (2005) Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors. ACM Trans Archit Code Optim
11. Garzaran MJ, Prvulovic M, Llabería JM, Viñals V, Rauchwerger L, Torrellas J (2003) Using software logging to support multi-version buffering in thread-level speculation. In: International conference on parallel architectures and compilation techniques, New Orleans, Sept
12. Gopal S, Vijaykumar T, Smith J, Sohi G (1998) Speculative versioning cache. In: International symposium on high-performance computer architecture, Las Vegas, Feb
13. Gupta M, Nim R (1998) Techniques for speculative run-time parallelization of loops. In: Proceedings of supercomputing 1998, ACM Press, Melbourne, Australia, Nov
14. Hammond L, Willey M, Olukotun K (1998) Data speculation support for a chip multiprocessor. In: International conference on architectural support for programming languages and operating systems, San Jose, California, Oct 1998, pp
15. Herlihy M, Moss E (1993) Transactional memory: architectural support for lock-free data structures. In: International symposium on computer architecture, IEEE Computer Society Press, San Diego, May
16. Knight T (1986) An architecture for mostly functional languages. In: ACM lisp and functional programming conference, ACM Press, New York, Aug 1986, pp
17. Krishnan V, Torrellas J (1998) Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor. In: International conference on supercomputing, Melbourne, Australia, July
18. Krishnan V, Torrellas J (1999) A chip-multiprocessor architecture with speculative multithreading. IEEE Trans Comput 48(9):
19. Liu W, Tuck J, Ceze L, Ahn W, Strauss K, Renau J, Torrellas J (2006) POSH: A TLS compiler that exploits program structure. In: International symposium on principles and practice of parallel programming, San Diego, Mar
20. Marcuello P, Gonzalez A (1999) Clustered speculative multithreaded processors. In: International conference on supercomputing, Rhodes, Island, June 1999, pp
21. Marcuello P, Gonzalez A, Tubella J (1998) Speculative multithreaded processors. In: International conference on supercomputing, ACM, Melbourne, Australia, July
22. Martinez J, Torrellas J (2002) Speculative synchronization: applying thread-level speculation to explicitly parallel applications. In: International conference on architectural support for programming languages and operating systems, San Jose, Oct
23. Prvulovic M, Garzaran MJ, Rauchwerger L, Torrellas J (2001) Removing architectural bottlenecks to the scalability of speculative parallelization. In: Proceedings of the 28th international symposium on computer architecture (ISCA 01), New York, June 2001, pp
24. Prvulovic M, Torrellas J (2003) ReEnact: using thread-level speculation to debug data races in multithreaded codes. In: International symposium on computer architecture, San Diego, June
25. Rauchwerger L, Padua D (1995) The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization. In: Conference on programming language design and implementation, La Jolla, California, June
26. Rundberg P, Stenstrom P (2000) Low-cost thread-level data dependence speculation on multiprocessors. In: Fourth workshop on multithreaded execution, architecture and compilation, Monterrey, Dec
27. Sohi G, Breach S, Vijaykumar T (1995) Multiscalar processors. In: International Symposium on computer architecture, ACM Press, New York, June
28. Steffan G, Colohan C, Zhai A, Mowry T (2000) A scalable approach to thread-level speculation. In: Proceedings of the 27th Annual International symposium on computer architecture, Vancouver, June 2000, pp
29. Steffan G, Mowry TC (1998) The potential for using thread-level data speculation to facilitate automatic parallelization. In: International symposium on high-performance computer architecture, Las Vegas, Feb
30. Torrellas J, Ceze L, Tuck J, Cascaval C, Montesinos P, Ahn W, Prvulovic M (2009) The bulk multicore architecture for improved programmability. Communications of the ACM, New York
31. Tremblay M (1999) MAJC: microprocessor architecture for java computing. Hot Chips, Palo Alto, Aug
32. Tsai J, Huang J, Amlo C, Lilja D, Yew P (1999) The superthreaded processor architecture. IEEE Trans Comput 48(9):
33. Vijaykumar T, Sohi G (1998) Task selection for a multiscalar processor. In: International symposium on microarchitecture, Dallas, Nov 1998, pp
34. Zhai A, Colohan C, Steffan G, Mowry T (2002) Compiler optimization of scalar value communication between speculative threads. In: International conference on architectural support for programming languages and operating systems, San Jose, Oct
35. Zhang Y, Rauchwerger L, Torrellas J (1998) Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors. In: Proceedings of the 4th International symposium on high-performance computer architecture (HPCA), Phoenix, Feb 1998, pp
36. Zhang Y, Rauchwerger L, Torrellas J (1999) Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors. In: Proceedings of the 5th international symposium on high-performance computer architecture, Orlando, Jan 1999, pp
37. Zhou P, Qin F, Liu W, Zhou Y, Torrellas J (2004) iWatcher: efficient architectural support for software debugging. In: International symposium on computer architecture, IEEE Computer society, München, June


More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

A Binarization Algorithm specialized on Document Images and Photos

A Binarization Algorithm specialized on Document Images and Photos A Bnarzaton Algorthm specalzed on Document mages and Photos Ergna Kavalleratou Dept. of nformaton and Communcaton Systems Engneerng Unversty of the Aegean kavalleratou@aegean.gr Abstract n ths paper, a

More information

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Verification by testing

Verification by testing Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over

More information



