Optimization and Parallelization of Sequential Programs

Size: px
Start display at page:

Download "Optimization and Parallelization of Sequential Programs"

Transcription

1 DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton Optmzaton and Parallelzaton of Sequental Programs Lecture 7 Chrstoph Kessler IDA / PELAB Lnköpng Unversty Sweden Outlne Towards (sem-)automatc parallelzaton of sequental programs Data dependence analyss for loops Some loop transformatons Loop nvarant code hostng, loop unrollng, loop fuson, loop nterchange, loop blockng and tlng Statc loop parallelzaton Run-tme loop parallelzaton Doacross parallelzaton, Inspector-executor method Speculatve parallelzaton (as tme permts) Auto-tunng (later, f tme) Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Foundatons: Control and Data Dependence Foundatons: Control and Data Dependence Consder statements S, T n a sequental program (S=T possble) Scope of analyss s typcally a functon,.e. ntra-procedural analyss Assume that a control flow path S T s possble Can be done at arbtrary granularty (nstructons, operatons, statements, compound statements, program regons) Relevant are only the read and wrte effects on memory (.e. on program varables) by each operaton, and the effect on control flow Control dependence S T, f the fact whether T s executed may depend on S (e.g. condton) Imples that relatve executon order S T must be preserved when restructurng the program S: f () { T: Mostly obvous from nestng structure n well-structured programs, but more trcky n arbtrary branchng code (e.g. assembler code) Data dependence S T, f statement S may execute (dynamcally) before T and both may access the same memory locaton and at least one of these accesses s a wrte Means that executon order S before T must be preserved when restructurng the program In general, only a conservatve over-estmaton can be determned statcally flow dependence: (RAW, read-after-wrte) S may wrte a locaton z that T may read ant dependence: (WAR, wrte-after-read) S may read a locaton x that T may overwrtes output dependence: (WAW, wrte-after-wrte) both S and T may wrte the same locaton S: z = ; T: =..z.. ; (flow dependence) 3 4 Dependence Graph (Data, Control, Program) Dependence Graph: Drected graph, consstng of all statements as vertces and all (data, control, any) dependences as edges. Why Loop Optmzaton and Parallelzaton? Loops are a promsng obect for program optmzatons, ncludng automatc parallelzaton: Hgh executon frequency Most computaton done n (nner) loops Even small optmzatons can have large mpact (cf. Amdahl s Law) Regular, repettve behavor compact descrpton relatvely smple to analyze statcally Well researched 5 6

2 Loop Optmzatons General Issues Move loop nvarant computatons out of loops Modfy the order of teratons or parts thereof Goals: Improve data access localty Faster executon Reduce loop control overhead Enhance possbltes for loop parallelzaton or vectorzaton DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton Data Dependence Analyss for Loops A more formal ntroducton Only transformatons that preserve the program semantcs (ts nput/output behavor) are admssble Conservatve (statc) crterum: preserve data dependences Need data dependence analyss for loops 7 Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Data Dependence Analyss Overvew Precedence relaton between statements Important for loop optmzatons, vectorzaton and parallelzaton, nstructon schedulng, data cache optmzatons Conservatve approxmatons to dsontness of pars of memory accesses weaker than data-flow analyss but generalzes ncely to the level of ndvdual array element Loops, loop nests Iteraton space Array subscrpts n loops Index space Dependence testng methods Data dependence graph Data + control dependence graph Program dependence graph 9 Data Dependence Graph Loop Iteraton Space Data dependence graph for straght-lne code ( basc block, no branchng) s always acyclc, because relatve executon order of statements s forward only. Data dependence graph for a loop: Dependence edge S T f a dependence may exst for some par of nstances (teratons) of S, T Cycles possble Loop-ndependent versus loop-carred dependences (assumng we know statcally that arrays a and b do not ntersect)

3 Example Loop Normalzaton (assumng that we statcally know that arrays A, X, Y, Z do not ntersect, otherwse there mght be further dependences) (Iteratons unrolled) Data dependence graph: S S 3 Dependence Dstance and Drecton 5 Lnear Dophantne Equatons 7 4 Dependence Equaton System 6 Dependence Testng, : GCD-Test 8 3

4 For multdmensonal arrays? 9 Survey of Dependence Tests DF Advanced Compler Constructon Loop Optmzatons General Issues TDDC86 Compler optmzatons and code generaton Move loop nvarant computatons out of loops Modfy the order of teratons or parts thereof Loop Transformatons and Parallelzaton Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Some mportant loop transformatons Loop normalzaton Goals: Improve data access localty Faster executon Reduce loop control overhead Enhance possbltes for loop parallelzaton or vectorzaton Only transformatons that preserve the program semantcs (ts nput/output behavor) are admssble Conservatve (statc) crterum: preserve data dependences Need data dependence analyss for loops Loop Invarant Code Hostng Move loop nvarant code out of the loop Loop parallelzaton Loop nvarant code hostng Loop nterchange Loop fuson vs. Loop dstrbuton / fsson Strp-mnng / loop tlng / blockng vs. Loop lnearzaton Loop unrollng, unroll-and-am Complers can do ths automatcally f they can statcally fnd out what code s loop nvarant for (=; <; ++) tmp = c / d; a[] = b[] + c / d; for (=; <; ++) a[] = b[] + tmp; Loop peelng Index set splttng, Loop unswtchng Scalar replacement, Scalar expanson Later: Software ppelnng More: Cycle shrnkng, Loop skewng,

5 Loop Unrollng Loop Interchange () Loop unrollng For properly nested loops Can be enforced wth compler optons e.g. funroll= (statements n nnermost loop body only) for (=; <5; ++) { a[] = b[]; Example : for (=; <M; ++) for ( =; <5; +=) { Unroll by : a[][] Reduces loop overhead (total # comparsons, branches, ncrements) a[][] row-wse storage of D-arrays n C, Java a[][m-] new teraton order Longer loop body may enable further local optmzatons (e.g. common subexpresson elmnaton, regster allocaton, nstructon schedulng, usng SIMD nstructons) old teraton order a[n-][] a[n-][] longer code Exercse: Formulate the unrollng rule for lmt C. Kessler, IDA, Lnköpngs unverstet. 5 statcally unknown TDDD56 upper Multcoreloop and GPU Programmng Foundatons: Loop-Carred Data Dependences Can mprove data access localty n memory herarchy (fewer cache msses / page faults) 6 Loop Interchange () Recall: Data dependence S T, S: z = ; f operaton S may execute (dynamcally) before operaton T and both may access the same memory locaton T: =..z.. ; and at least one of these accesses s a wrte In general, only a conservatve over-estmaton can be determned statcally. Be careful wth loop carred data dependences! = =3 T T T3 S S S3 for (=; <N; ++) for (=; <N; ++) for (=; <M; ++) a[][] =a[+][-]...; Iteraton space: =N- a[][] =a[+][-]; f the data dependence S T may exst for nstances of S and T n dfferent teratons of L. Iteraton space: = Example : for (=; <M; ++) Data dependence S T s called loop carred by a loop L TN- SN- Iteraton (,) reads locaton a[+][-] that was wrtten n an earler teraton, (-,+) Iteraton (,) reads locaton a[+][-], that wll be overwrtten n a later teraton (+,-) new teraton order old teraton order partal order between the operaton nstances resp. teratons 7 a[ ][ ] =. ; for (=; <M; ++) a[ ][ ] =. ; a[+] = b[+]; L: for (=; <N; ++) { T: = x[ - ]; S: x[ ] = ; for (=; <N; ++) for (=; <N; ++) a[] = b[]; Interchangng the loop headers would volate the partal teraton order gven by the data dependences 8 Loop Interchange (3) Loop Fuson Be careful wth loop-carred data dependences! Merge subsequent loops wth same header Example 3: for (=; <M; ++) for (=; <N; ++) for (=; <N; ++) OK for (=; <M; ++) a[][] =a[-][-]...; a[][] =a[-][-]; Iteraton space: new teraton order old teraton order Iteraton (,) reads locaton a[-][-] that was wrtten n earler teraton (-,-) Generally: Interchangng loop headers s only admssble f loop-carred dependences have the same drecton for all loops n the loop nest (all drected along or all aganst the teraton order) 9 Safe f nether loop carres a (backward) dependence for (=; <N; ++) for (= ; <N; ++) { a[ ] = ; a[ ] = ; for (=; <N; ++) Iteraton (,) reads locaton a[-][-] that was wrtten n earler teraton (-,-) = a[ ] ; = a[ ] ; OK Read of a[] stll after wrte of a[], for all For N suffcently large, a[] wll no longer be n the cache at ths tme Can mprove data access localty and reduces number of branches 3 5

6 Loop Iteraton Reorderng Loop Parallelzaton -loop carres a dependence, ts teraton order must be preserved -loop carres a dependence, ts teraton order must be preserved Loop parallelzaton 3 Remark on Loop Parallelzaton 3 Strp Mnng / Loop Blockng / -Tlng Introducng temporary copes of arrays can remove some antdependences to enable automatc loop parallelzaton for (=; <n; ++) a[] = a[] + a[+]; The loop-carred dependence can be elmnated: for (=; <n; ++) aold[+] = a[+]; for (=; <n; ++) a[] = a[] + aold[+]; Parallelzable loop Parallelzable loop Tled Matrx-Matrx Multplcaton () Tled Matrx-Matrx Multplcaton () Matrx-Matrx multplcaton C = A x B Block each loop by block sze S here for square (n x n) matrces C, A, B, wth n large (~3): C = S k=..n A k B k for all, =...n A for (=; <n; +=S) for (kk=; kk<n; kk+=s) k for (=; <n; ++) k kk kk for (=; <n; +=S) (here wthout the ntalzaton of C-entres to ): k B Good spatal localty for A, B and C for (=; < +S; ++) for (=; < +S; ++) for (k=; k<n; k++) C[][] += A[][k] * B[k][]; Code after tlng: Standard algorthm for Matrx-Matrx multplcaton for (=; <n; ++) (choose S so that a block of A, B, C ft n cache together), k then nterchange loops k 35 Good spatal localty on A, C for (k=kk; k < kk+s; k++) Bad spatal localty on B (many capacty msses) C[][] += A[][k] * B[k][]; 36 6

7 Remark on Localty Transformatons Loop Dstrbuton (a.k.a. Loop Fsson) An alternatve can be to change the data layout rather than the control structure of the program Store matrx B n transposed form, or, f necessary, consder transposng t, whch may pay off over several subsequent computatons Fndng the best layout for all multdmensonal arrays s a NP-complete optmzaton problem [Mace, 988] Recursve array layouts that preserve localty Morton-order Herarchcally layout tled arrays In the best case, can make computatons cache-oblvous Performance largely ndependent of TDDD56 cache sze 37 Multcore and GPU Programmng Loop Fuson 38 Loop Nest Flattenng / Lnearzaton 39 Loop Unrollng 4 Loop Unrollng wth Unknown Upper Bound 4 4 7

8 Loop Unroll-And-Jam 43 Loop Peelng Index Set Splttng Loop Unswtchng 45 Scalar Replacement Scalar Expanson / Array Prvatzaton

9 Idom recognton and algorthm replacement DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton C. Kessler: Pattern-drven automatc parallelzaton. Scentfc Programmng, 996. Concludng Remarks A. Shafee-Sarvestan, E. Hansson, C. Kessler: Extensble recognton of algorthmc patterns n DSP programs for automatc parallelzaton. Int. J. on Parallel Programmng, Lmts of Statc Analyzablty Outlook: Runtme Analyss and Parallelzaton Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Remark on statc analyzablty () Remark on statc analyzablty () Statc dependence nformaton s always a (safe) Statc dependence nformaton s always a (safe) overapproxmaton of the real (run-tme) dependences overapproxmaton of the real (run-tme) dependences Fndng out the real ones exactly s statcally undecdable! Fndng out the latter exactly s statcally undecdable! If n doubt, a dependence must be assumed may prevent some optmzatons or parallelzaton If n doubt, a dependence must be assumed may prevent some optmzatons or parallelzaton One man reason for mprecson s alasng,.e. the program may have several ways to refer to the same memory locaton Ponter alasng vod mergesort ( nt* a, nt n ) { mergesort ( a, n/ ); mergesort ( a + n/, n-n/ ); 5 Another reason for mprecson are statcally unknown values that mply whether a dependence exsts or not Unknown dependence dstance // value of K statcally unknown Loop-carred dependence for ( =; <N; ++ ) f K < N. Otherwse, the loop s { parallelzable. S: a[] = a[] + a[k]; 5 How could a statc analyss tool (e.g., compler) know that the two recursve calls read and wrte dsont subarrays of a? Outlook: Runtme Parallelzaton DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton Run-Tme Parallelzaton 53 Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. 9

10 Goal of run-tme parallelzaton Overvew Typcal target: rregular loops Run-tme parallelzaton of rregular loops for ( =; <n; ++) a[] = f ( a[ g() ], a[ h() ],... ); DOACROSS parallelzaton Inspector-Executor Technque (shared memory) Array ndex expressons g, h... depend on run-tme data Inspector-Executor Technque (message passng) * Prvatzng DOALL Test * Iteratons cannot be statcally proved ndependent (and not ether dependent wth dstance +) Speculatve run-tme parallelzaton of rregular loops * Prncple: At runtme, nspect g, h... to fnd out the real dependences and compute a schedule for partally parallel executon Can also be combned wth speculatve parallelzaton LRPD Test * General Thread-Level Speculaton 55 Hardware support * * = not covered n ths course. See the references. 56 DOACROSS Parallelzaton Inspector-Executor Technque () Useful f loop-carred dependence dstances are unknown, but often > Compler generates peces of customzed code for such loops: Allow ndependent subsequent loop teratons to overlap Blateral synchronzaton between really-dependent teratons Inspector for ( =; <n; ++) a[] = f ( a[ g() ],... ); calculates values of ndex expresson by smulatng whole loop executon typcally, based on sequental verson of the source loop (some computatons could be left out) sh float aold[n]; sh flag done[n]; // flag (semaphore) array forall n..n- { // spawn n threads, one per teraton done[n] = ; aold[] = a[]; // create a copy forall n..n- { // spawn n threads, one per teraton f (g() < ) wat untl done[ g() ] ); a[] = f ( a[ g() ],... ); set( done[] ); else a[] = f ( aold[ g() ],... ); set done[]; 57 Inspector-Executor Technque () computes mplctly the real teraton dependence graph computes a parallel schedule as (greedy) wavefront traversal of the teraton dependence graph n topologcal order all teratons n same wavefront are ndependent schedule depth = #wavefronts = crtcal path length Executor follows ths schedule to execute the loop for ( =; <n; ++) a[] = f ( a[ g() ], a[ h() ],... ); for (=; <n; ++) a[] =... a[ g() ]...; Inspector: nt wf[n]; // wavefront ndces nt depth = ; for (=; <n; ++) wf[] = ; // nt. for (=; <n; ++) { wf[] = max ( wf[ g() ], wf[ h() ],... ) + ; depth = max ( depth, wf[] ); Inspector consders only flow dependences (RAW), ant- and output dependences to be preserved by executor 59 Inspector-Executor Technque (3) Source loop: 58 Executor: g() wf[] g()<? no yes no yes yes yes float aold[n]; // buffer array aold[:n] = a[:n]; 3 4 for (w=; w<depth; w++) forall (,, n, #) f (wf[] == w) { 5 a = (g() < )? a[g()] : aold[g()];... // smlarly, a for h etc. a[] = f ( a, a,... ); teraton (flow) dependence graph 6

11 Inspector-Executor Technque (4) DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton Problem: Inspector remans sequental no speedup Soluton approaches: Re-use schedule over subsequent teratons of an outer loop f access pattern does not change Thread-Level Speculaton amortzes nspector overhead across repeated executons Parallelze the nspector usng doacross parallelzaton [Saltz,Mrchandaney 9] Parallelze the nspector usng sectonng [Leung/Zahoran 9] compute processor-local wavefronts n parallel, concatenate trade-off schedule qualty (depth) vs. nspector speed Parallelze the nspector usng bootstrappng [Leung/Z. 9] Start wth suboptmal schedule by sectonng, use ths to execute the nspector refned schedule 6 Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Speculatvely parallel executon TLS Example For automatc parallelzaton of sequental code where dependences are hard to analyze statcally Works on a task graph constructed mplctly and dynamcally Speculate on: control flow, data ndependence, synchronzaton, values We focus on thread-level speculaton (TLS) for CMP/MT processors. Speculatve nstructon-level parallelsm s not consdered here. Task: statcally: Connected, sngle-entry subgraph of the controlflow graph Basc blocks, loop bodes, loops, or entre functons dynamcally: Contguous fragment of dynamc nstructon stream wthn statc task regon, entered at statc task entry 63 Data dependence problem n TLS Explotng module-level speculatve parallelsm (across functon calls) Source: F. Warg: Technques for Reducng Thread-Level Speculaton Overhead n Chp Multprocessors. PhD thess, Chalmers TH, Gothenburg, June Speculatvely parallel executon of tasks Speculaton on nter-task control flow After havng assgned a task, predct ts successor task and start t speculatvely Speculaton on data ndependence For nter-task memory data (flow) dependences conservatvely: speculatvely: awat wrte (memory synchronzaton, message) hope for ndependence and contnue (execute the load) Roll-back of speculatve results on ms-speculaton (expensve) When startng speculaton, state must be buffered Squash an offendng task and all ts successors, restart Commt speculatve results when speculaton resolved to correct Source: F. Warg: Technques for Reducng Thread-Level Speculaton Overhead n Chp Multprocessors. PhD thess, Chalmers TH, Gothenburg, June Task s retred 66

12 Selectng Tasks for Speculaton TLS Implementatons Small tasks: Software-only speculaton too much overhead (task startup, task retrement) low parallelsm degree Large tasks: hgher msspeculaton probablty hgher rollback cost many speculatons ongong n parallel may saturate the resources Load balancng ssues avod large varaton n task szes Traversal of the program s control flow graph (CFG) for loops [Rauchwerger, Padua 94, 95]... Hardware-based speculaton Typcally, ntegrated n cache coherence protocols Used wth multthreaded processors / chp multprocessors for automatc parallelzaton of sequental legacy code If source code avalable, compler may help e.g. wth dentfyng sutable threads Heurstcs for task sze, control and data dep. speculaton Some references on Dependence Analyss, Loop optmzatons and Transformatons DF Advanced Compler Constructon TDDC86 Compler optmzatons and code generaton H. Zma, B. Chapman: Supercomplers for Parallel and Vector Computers. Addson-Wesley / ACM press, 99. M. Wolfe: Hgh-Performance Complers for Parallel Computng. Addson-Wesley, 996. Questons? R. Allen, K. Kennedy: Optmzng Complers for Modern Archtectures. Morgan Kaufmann,. Chrstoph Kessler, IDA, Lnköpngs unverstet, 4. Some references on run-tme parallelzaton Idom recognton and algorthm replacement: C. Kessler: Pattern-drven automatc parallelzaton. Scentfc Programmng 5:5-74, 996. A. Shafee-Sarvestan, E. Hansson, C. Kessler: Extensble recognton of algorthmc patterns n DSP programs for automatc paral-lelzaton. Int. J. on Parallel Programmng, 3. C. Kessler, IDA, Lnköpngs unverstet. 7 Some references on speculatve executon / parallelzaton R. Cytron: Doacross: Beyond vectorzaton for multprocessors. Proc. ICPP-986 D. Chen, J. Torrellas, P. Yew: An Effcent Algorthm for the Run-tme Parallelzaton of DOACROSS Loops, Proc. IEEE Supercomputng Conf., Nov. 4, IEEE CS Press, pp J. Martnez, J. Torrellas: Speculatve Locks for Concurrent Executon of Crtcal R. Mrchandaney, J. Saltz, R. M. Smth, D. M. Ncol, K. Crowley: Prncples of run-tme support for parallel processors, Proc. ACM Int. Conf. on Supercomputng, July 988, pp F. Warg and P. Stenström: Lmts on speculatve module-level parallelsm n J. Saltz and K. Crowley and R. Mrchandaney and H. Berryman: Runtme Schedulng and Executon of Loops on Message Passng Machnes, Journal on Parallel and Dstr. Computng 8 (99): J. Saltz, R. Mrchandaney: The preprocessed doacross loop. Proc. ICPP-99 Int. Conf. on Parallel Processng. S. Leung, J. Zahoran: Improvng the performance of run-tme parallelzaton. Proc. ACM PPoPP-993, pp Lawrence Rauchwerger, Davd Padua: The Prvatzng DOALL Test: A Run-Tme Technque for DOALL Loop Identfcaton and Array Prvatzaton. Proc. ACM Int. Conf. on Supercomputng, July 994, pp Lawrence Rauchwerger, Davd Padua: The LRPD Test: Speculatve Run-Tme Parallelzaton of Loops wth Prvatzaton and Reducton Parallelzaton. Proc. ACM SIGPLAN PLDI-95, 995, pp T. Vaykumar, G. Soh: Task Selecton for a Multscalar Processor. Proc. MICRO-3, Dec Sectons n Shared-Memory Multprocessors. Proc. WMPI at ISCA,. mperatve and obect-orented programs on CMP platforms. Pr. IEEE PACT. P. Marcuello and A. Gonzalez: Thread-spawnng schemes for speculatve multthreadng. Proc. HPCA-8,. J. Steffan et al.: Improvng value communcaton for thread-level speculaton. HPCA-8,. M. Cntra, J. Torrellas: Elmnatng squashes through learnng cross-thread volatons n speculatve parallelzaton for multprocessors. HPCA-8,. Fredrk Warg and Per Stenström: Improvng speculatve thread-level parallelsm through module run-length predcton. Proc. IPDPS 3. F. Warg: Technques for Reducng Thread-Level Speculaton Overhead n Chp Multprocessors. PhD thess, Chalmers TH, Gothenburg, June 6. T. Ohsawa et al.: Pnot: Speculatve mult-threadng processor archtecture explotng parallelsm over a wde range of granulartes. Proc. MICRO-38, 5. 7

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont) Loop Transformatons for Parallelsm & Localty Prevously Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Loop nterchange Loop transformatons and transformaton frameworks

More information

Vectorization in the Polyhedral Model

Vectorization in the Polyhedral Model Vectorzaton n the Polyhedral Model Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty October 200 888. Introducton: Overvew Vectorzaton: Detecton

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III.

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III. Lecture 15: Memory Herarchy Optmzatons I. Caches: A Quck Revew II. Iteraton Space & Loop Transformatons III. Types of Reuse ALSU 7.4.2-7.4.3, 11.2-11.5.1 15-745: Memory Herarchy Optmzatons Phllp B. Gbbons

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

Polyhedral Compilation Foundations

Polyhedral Compilation Foundations Polyhedral Complaton Foundatons Lous-Noël Pouchet pouchet@cse.oho-state.edu Dept. of Computer Scence and Engneerng, the Oho State Unversty Feb 8, 200 888., Class # Introducton: Polyhedral Complaton Foundatons

More information

Loop Transformations, Dependences, and Parallelization

Loop Transformations, Dependences, and Parallelization Loop Transformatons, Dependences, and Parallelzaton Announcements Mdterm s Frday from 3-4:15 n ths room Today Semester long project Data dependence recap Parallelsm and storage tradeoff Scalar expanson

More information

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions

Sorting Review. Sorting. Comparison Sorting. CSE 680 Prof. Roger Crawfis. Assumptions Sortng Revew Introducton to Algorthms Qucksort CSE 680 Prof. Roger Crawfs Inserton Sort T(n) = Θ(n 2 ) In-place Merge Sort T(n) = Θ(n lg(n)) Not n-place Selecton Sort (from homework) T(n) = Θ(n 2 ) In-place

More information

Today Using Fourier-Motzkin elimination for code generation Using Fourier-Motzkin elimination for determining schedule constraints

Today Using Fourier-Motzkin elimination for code generation Using Fourier-Motzkin elimination for determining schedule constraints Fourer Motzkn Elmnaton Logstcs HW10 due Frday Aprl 27 th Today Usng Fourer-Motzkn elmnaton for code generaton Usng Fourer-Motzkn elmnaton for determnng schedule constrants Unversty Fourer-Motzkn Elmnaton

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Uncorrected Proof. Thread-Level Speculation

Uncorrected Proof. Thread-Level Speculation Encyclopeda of Parallel Computng 00170 2011/3/8 12:30 Page 1 #2 T 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 Thread-Level Josep Torrellas Unversty of Illnos

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Parallel matrix-vector multiplication

Parallel matrix-vector multiplication Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Cache Memories. Lecture 14 Cache Memories. Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory

Cache Memories. Lecture 14 Cache Memories. Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory Topcs Lecture 4 Cache Memores Generc cache memory organzaton Drect mapped caches Set assocate caches Impact of caches on performance Cache Memores Cache memores are small, fast SRAM-based memores managed

More information

2.1. The Program Model

2.1. The Program Model Hyperplane Parttonng : n pproach to Global ata Parttonng for strbuted Memory Machnes S. R. Prakash and Y.. Srkant epartment of S, Indan Insttute of Scence angalore, Inda, 6 bstract utomatc Global ata Parttonng

More information

LLVM passes and Intro to Loop Transformation Frameworks

LLVM passes and Intro to Loop Transformation Frameworks LLVM passes and Intro to Loop Transformaton Frameworks Announcements Ths class s recorded and wll be n D2L panapto. No quz Monday after sprng break. Wll be dong md-semester class feedback. Today LLVM passes

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe

CSCI 104 Sorting Algorithms. Mark Redekopp David Kempe CSCI 104 Sortng Algorthms Mark Redekopp Davd Kempe Algorthm Effcency SORTING 2 Sortng If we have an unordered lst, sequental search becomes our only choce If we wll perform a lot of searches t may be benefcal

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Module Management Tool in Software Development Organizations

Module Management Tool in Software Development Organizations Journal of Computer Scence (5): 8-, 7 ISSN 59-66 7 Scence Publcatons Management Tool n Software Development Organzatons Ahmad A. Al-Rababah and Mohammad A. Al-Rababah Faculty of IT, Al-Ahlyyah Amman Unversty,

More information

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss.

Today s Outline. Sorting: The Big Picture. Why Sort? Selection Sort: Idea. Insertion Sort: Idea. Sorting Chapter 7 in Weiss. Today s Outlne Sortng Chapter 7 n Wess CSE 26 Data Structures Ruth Anderson Announcements Wrtten Homework #6 due Frday 2/26 at the begnnng of lecture Proect Code due Mon March 1 by 11pm Today s Topcs:

More information

Parallel and Distributed Association Rule Mining - Dr. Giuseppe Di Fatta. San Vigilio,

Parallel and Distributed Association Rule Mining - Dr. Giuseppe Di Fatta. San Vigilio, Parallel and Dstrbuted Assocaton Rule Mnng - Dr. Guseppe D Fatta fatta@nf.un-konstanz.de San Vglo, 18-09-2004 1 Overvew Assocaton Rule Mnng (ARM) Apror algorthm Hgh Performance Parallel and Dstrbuted Computng

More information

Concurrent Apriori Data Mining Algorithms

Concurrent Apriori Data Mining Algorithms Concurrent Apror Data Mnng Algorthms Vassl Halatchev Department of Electrcal Engneerng and Computer Scence York Unversty, Toronto October 8, 2015 Outlne Why t s mportant Introducton to Assocaton Rule Mnng

More information

[KV99] M. Kaul and R, Vemuri. Integrated Block-Processing and Design-Space Exploration in Temporal Partitioning for RTR Architectures.

[KV99] M. Kaul and R, Vemuri. Integrated Block-Processing and Design-Space Exploration in Temporal Partitioning for RTR Architectures. [KV99] M. Kaul and R, Vemur. Integrated Block-Processng and Desgn-Space Exploraton n Temporal Parttonng for RTR Archtectures. Reconfgurable Archtectures Workshop Proceedngs, Puerto Rco, Aprl 1999. [OCS98]

More information

Wavefront Reconstructor

Wavefront Reconstructor A Dstrbuted Smplex B-Splne Based Wavefront Reconstructor Coen de Vsser and Mchel Verhaegen 14-12-201212 2012 Delft Unversty of Technology Contents Introducton Wavefront reconstructon usng Smplex B-Splnes

More information

An Entropy-Based Approach to Integrated Information Needs Assessment

An Entropy-Based Approach to Integrated Information Needs Assessment Dstrbuton Statement A: Approved for publc release; dstrbuton s unlmted. An Entropy-Based Approach to ntegrated nformaton Needs Assessment June 8, 2004 Wllam J. Farrell Lockheed Martn Advanced Technology

More information

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

Giving credit where credit is due

Giving credit where credit is due CSCE 23J Computer Organzaton Cache Memores Dr. Stee Goddard goddard@cse.unl.edu Gng credt where credt s due Most of sldes for ths lecture are based on sldes created by Drs. Bryant and O Hallaron, Carnege

More information

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman)

CS221: Algorithms and Data Structures. Priority Queues and Heaps. Alan J. Hu (Borrowing slides from Steve Wolfman) CS: Algorthms and Data Structures Prorty Queues and Heaps Alan J. Hu (Borrowng sldes from Steve Wolfman) Learnng Goals After ths unt, you should be able to: Provde examples of approprate applcatons for

More information

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique

The Greedy Method. Outline and Reading. Change Money Problem. Greedy Algorithms. Applications of the Greedy Strategy. The Greedy Method Technique //00 :0 AM Outlne and Readng The Greedy Method The Greedy Method Technque (secton.) Fractonal Knapsack Problem (secton..) Task Schedulng (secton..) Mnmum Spannng Trees (secton.) Change Money Problem Greedy

More information

AADL : about scheduling analysis

AADL : about scheduling analysis AADL : about schedulng analyss Schedulng analyss, what s t? Embedded real-tme crtcal systems have temporal constrants to meet (e.g. deadlne). Many systems are bult wth operatng systems provdng multtaskng

More information

Dijkstra s Single Source Algorithm. All-Pairs Shortest Paths. Dynamic Programming Solution. Performance. Decision Sequence.

Dijkstra s Single Source Algorithm. All-Pairs Shortest Paths. Dynamic Programming Solution. Performance. Decision Sequence. All-Pars Shortest Paths Gven an n-vertex drected weghted graph, fnd a shortest path from vertex to vertex for each of the n vertex pars (,). Dstra s Sngle Source Algorthm Use Dstra s algorthm n tmes, once

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Estimation of Parallel Complexity with Rewriting Techniques

Estimation of Parallel Complexity with Rewriting Techniques Estmaton of Parallel Complexty wth Rewrtng Technques Chrstophe Alas, Carsten Fuhs, Laure Gonnord INRIA & LIP (UMR CNRS/ENS Lyon/UCB Lyon1/INRIA), Lyon, France, chrstophe.alas@ens-lyon.fr Brkbeck, Unversty

More information

Support Vector Machines

Support Vector Machines /9/207 MIST.6060 Busness Intellgence and Data Mnng What are Support Vector Machnes? Support Vector Machnes Support Vector Machnes (SVMs) are supervsed learnng technques that analyze data and recognze patterns.

More information

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1

Outline. Discriminative classifiers for image recognition. Where in the World? A nearest neighbor recognition example 4/14/2011. CS 376 Lecture 22 1 4/14/011 Outlne Dscrmnatve classfers for mage recognton Wednesday, Aprl 13 Krsten Grauman UT-Austn Last tme: wndow-based generc obect detecton basc ppelne face detecton wth boostng as case study Today:

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Verification by testing

Verification by testing Real-Tme Systems Specfcaton Implementaton System models Executon-tme analyss Verfcaton Verfcaton by testng Dad? How do they know how much weght a brdge can handle? They drve bgger and bgger trucks over

More information

Midterms Save the Dates!

Midterms Save the Dates! Unversty of Brtsh Columba CPSC, Intro to Computaton Alan J. Hu Readngs Ths Week: Ch 6 (Ch 7 n old 2 nd ed). (Remnder: Readngs are absolutely vtal for learnng ths stuff!) Thnkng About Loops Lecture 9 Some

More information

Conditional Speculative Decimal Addition*

Conditional Speculative Decimal Addition* Condtonal Speculatve Decmal Addton Alvaro Vazquez and Elsardo Antelo Dep. of Electronc and Computer Engneerng Unv. of Santago de Compostela, Span Ths work was supported n part by Xunta de Galca under grant

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Introducton to Algorthms October 4, 2002 Massachusetts Insttute of Technology 6046J/18410J Professors Erk Demane and Shaf Goldwasser Handout 14 Problem Set 3 Solutons (Exercses were not to be turned n,

More information

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers 1 2 Introducton to Programmng Bertrand Meyer Lecture 13: Contaner data structures Last revsed 1 December 2003 Topcs for ths lecture 3 Contaner data structures 4 Contaners and genercty Contan other objects

More information

Petri Net Based Software Dependability Engineering

Petri Net Based Software Dependability Engineering Proc. RELECTRONIC 95, Budapest, pp. 181-186; October 1995 Petr Net Based Software Dependablty Engneerng Monka Hener Brandenburg Unversty of Technology Cottbus Computer Scence Insttute Postbox 101344 D-03013

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY

ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY Parallel Processng Letters c World Scentfc Publshng Company ADJUSTING A PROGRAM TRANSFORMATION FOR LEGALITY CÉDRIC BASTOUL Laboratore PRSM, Unversté de Versalles Sant Quentn 45 avenue des États-Uns, 785

More information

Parallel Solutions of Indexed Recurrence Equations

Parallel Solutions of Indexed Recurrence Equations Parallel Solutons of Indexed Recurrence Equatons Yos Ben-Asher Dep of Math and CS Hafa Unversty 905 Hafa, Israel yos@mathcshafaacl Gad Haber IBM Scence and Technology 905 Hafa, Israel haber@hafascvnetbmcom

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

Dijkstra s Single Source Algorithm. All-Pairs Shortest Paths. Dynamic Programming Solution. Performance

Dijkstra s Single Source Algorithm. All-Pairs Shortest Paths. Dynamic Programming Solution. Performance All-Pars Shortest Paths Gven an n-vertex drected weghted graph, fnd a shortest path from vertex to vertex for each of the n vertex pars (,). Dkstra s Sngle Source Algorthm Use Dkstra s algorthm n tmes,

More information

Dynamic Programming. Example - multi-stage graph. sink. source. Data Structures &Algorithms II

Dynamic Programming. Example - multi-stage graph. sink. source. Data Structures &Algorithms II Dynamc Programmng Example - mult-stage graph 1 source 9 7 3 2 2 3 4 5 7 11 4 11 8 2 2 1 6 7 8 4 6 3 5 6 5 9 10 11 2 4 5 12 snk Data Structures &Algorthms II A labeled, drected graph Vertces can be parttoned

More information

Efficient Distributed File System (EDFS)

Efficient Distributed File System (EDFS) Effcent Dstrbuted Fle System (EDFS) (Sem-Centralzed) Debessay(Debsh) Fesehaye, Rahul Malk & Klara Naherstedt Unversty of Illnos-Urbana Champagn Contents Problem Statement, Related Work, EDFS Desgn Rate

More information

Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations

Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations Improvng Hgh Level Synthess Optmzaton Opportunty Through Polyhedral Transformatons We Zuo 2,5, Yun Lang 1, Peng L 1, Kyle Rupnow 3, Demng Chen 2,3 and Jason Cong 1,4 1 Center for Energy-Effcent Computng

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introducton 1.1 Parallel Processng There s a contnual demand for greater computatonal speed from a computer system than s currently possble (.e. sequental systems). Areas need great computatonal

More information

CS 534: Computer Vision Model Fitting

CS 534: Computer Vision Model Fitting CS 534: Computer Vson Model Fttng Sprng 004 Ahmed Elgammal Dept of Computer Scence CS 534 Model Fttng - 1 Outlnes Model fttng s mportant Least-squares fttng Maxmum lkelhood estmaton MAP estmaton Robust

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Machine Learning. Topic 6: Clustering

Machine Learning. Topic 6: Clustering Machne Learnng Topc 6: lusterng lusterng Groupng data nto (hopefully useful) sets. Thngs on the left Thngs on the rght Applcatons of lusterng Hypothess Generaton lusters mght suggest natural groups. Hypothess

More information

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing

Real-Time Systems. Real-Time Systems. Verification by testing. Verification by testing EDA222/DIT161 Real-Tme Systems, Chalmers/GU, 2014/2015 Lecture #8 Real-Tme Systems Real-Tme Systems Lecture #8 Specfcaton Professor Jan Jonsson Implementaton System models Executon-tme analyss Department

More information

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points;

Subspace clustering. Clustering. Fundamental to all clustering techniques is the choice of distance measure between data points; Subspace clusterng Clusterng Fundamental to all clusterng technques s the choce of dstance measure between data ponts; D q ( ) ( ) 2 x x = x x, j k = 1 k jk Squared Eucldean dstance Assumpton: All features

More information

Load Balancing for Hex-Cell Interconnection Network

Load Balancing for Hex-Cell Interconnection Network Int. J. Communcatons, Network and System Scences,,, - Publshed Onlne Aprl n ScRes. http://www.scrp.org/journal/jcns http://dx.do.org/./jcns.. Load Balancng for Hex-Cell Interconnecton Network Saher Manaseer,

More information

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms

Course Introduction. Algorithm 8/31/2017. COSC 320 Advanced Data Structures and Algorithms. COSC 320 Advanced Data Structures and Algorithms Course Introducton Course Topcs Exams, abs, Proects A quc loo at a few algorthms 1 Advanced Data Structures and Algorthms Descrpton: We are gong to dscuss algorthm complexty analyss, algorthm desgn technques

More information

Real-time Scheduling

Real-time Scheduling Real-tme Schedulng COE718: Embedded System Desgn http://www.ee.ryerson.ca/~courses/coe718/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrcal and Computer Engneerng Ryerson Unversty Overvew RTX

More information

CE 221 Data Structures and Algorithms

CE 221 Data Structures and Algorithms CE 1 ata Structures and Algorthms Chapter 4: Trees BST Text: Read Wess, 4.3 Izmr Unversty of Economcs 1 The Search Tree AT Bnary Search Trees An mportant applcaton of bnary trees s n searchng. Let us assume

More information

Smoothing Spline ANOVA for variable screening

Smoothing Spline ANOVA for variable screening Smoothng Splne ANOVA for varable screenng a useful tool for metamodels tranng and mult-objectve optmzaton L. Rcco, E. Rgon, A. Turco Outlne RSM Introducton Possble couplng Test case MOO MOO wth Game Theory

More information

System-on-Chip Design Analysis of Control Data Flow. Hao Zheng Comp Sci & Eng U of South Florida

System-on-Chip Design Analysis of Control Data Flow. Hao Zheng Comp Sci & Eng U of South Florida System-on-Chp Desgn Analyss of Control Data Flow Hao Zheng Comp Sc & Eng U of South Florda Overvew DF models descrbe concurrent computa=on at a very hgh level Each actor descrbes non-trval computa=on.

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Classifier Selection Based on Data Complexity Measures *

Classifier Selection Based on Data Complexity Measures * Classfer Selecton Based on Data Complexty Measures * Edth Hernández-Reyes, J.A. Carrasco-Ochoa, and J.Fco. Martínez-Trndad Natonal Insttute for Astrophyscs, Optcs and Electroncs, Lus Enrque Erro No.1 Sta.

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications 14/05/1 Machne Learnng: Algorthms and Applcatons Florano Zn Free Unversty of Bozen-Bolzano Faculty of Computer Scence Academc Year 011-01 Lecture 10: 14 May 01 Unsupervsed Learnng cont Sldes courtesy of

More information

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS

A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Proceedngs of the Wnter Smulaton Conference M E Kuhl, N M Steger, F B Armstrong, and J A Jones, eds A MOVING MESH APPROACH FOR SIMULATION BUDGET ALLOCATION ON CONTINUOUS DOMAINS Mark W Brantley Chun-Hung

More information

Hierarchical clustering for gene expression data analysis

Hierarchical clustering for gene expression data analysis Herarchcal clusterng for gene expresson data analyss Gorgo Valentn e-mal: valentn@ds.unm.t Clusterng of Mcroarray Data. Clusterng of gene expresson profles (rows) => dscovery of co-regulated and functonally

More information

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011

Outline. Digital Systems. C.2: Gates, Truth Tables and Logic Equations. Truth Tables. Logic Gates 9/8/2011 9/8/2 2 Outlne Appendx C: The Bascs of Logc Desgn TDT4255 Computer Desgn Case Study: TDT4255 Communcaton Module Lecture 2 Magnus Jahre 3 4 Dgtal Systems C.2: Gates, Truth Tables and Logc Equatons All sgnals

More information

Brave New World Pseudocode Reference

Brave New World Pseudocode Reference Brave New World Pseudocode Reference Pseudocode s a way to descrbe how to accomplsh tasks usng basc steps lke those a computer mght perform. In ths week s lab, you'll see how a form of pseudocode can be

More information

Programming FPGAs in C/C++ with High Level Synthesis PACAP - HLS 1

Programming FPGAs in C/C++ with High Level Synthesis PACAP - HLS 1 Programmng FPGAs n C/C wth Hgh Level Synthess PACAP - HLS 1 Outlne Why Hgh Level Synthess? Challenges when syntheszng hardware from C/C Hgh Level Synthess from C n a nutshell Explorng performance/area

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Estimating Costs of Path Expression Evaluation in Distributed Object Databases

Estimating Costs of Path Expression Evaluation in Distributed Object Databases Estmatng Costs of Path Expresson Evaluaton n Dstrbuted Obect Databases Gabrela Ruberg, Fernanda Baão, and Marta Mattoso Department of Computer Scence COPPE/UFRJ P.O.Box 685, Ro de Janero, RJ, 2945-970

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

11. APPROXIMATION ALGORITHMS

11. APPROXIMATION ALGORITHMS Copng wth NP-completeness 11. APPROXIMATION ALGORITHMS load balancng center selecton prcng method: vertex cover LP roundng: vertex cover generalzed load balancng knapsack problem Q. Suppose I need to solve

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL

7/12/2016. GROUP ANALYSIS Martin M. Monti UCLA Psychology AGGREGATING MULTIPLE SUBJECTS VARIANCE AT THE GROUP LEVEL GROUP ANALYSIS Martn M. Mont UCLA Psychology NITP AGGREGATING MULTIPLE SUBJECTS When we conduct mult-subject analyss we are tryng to understand whether an effect s sgnfcant across a group of people. Whether

More information

Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances

Algorithmic Transformation Techniques for Efficient Exploration of Alternative Application Instances In: Proc. 0th Int. Symposum on Hardware/Software Codesgn (CODES 02), Estes Park, Colorado, USA, May 6 8, 2002 Algorthmc Transformaton Technques for Effcent Exploraton of Alternatve Applcaton Instances

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 15 CS434a/541a: Pattern Recognton Prof. Olga Veksler Lecture 15 Today New Topc: Unsupervsed Learnng Supervsed vs. unsupervsed learnng Unsupervsed learnng Net Tme: parametrc unsupervsed learnng Today: nonparametrc

More information

Meta-heuristics for Multidimensional Knapsack Problems

Meta-heuristics for Multidimensional Knapsack Problems 2012 4th Internatonal Conference on Computer Research and Development IPCSIT vol.39 (2012) (2012) IACSIT Press, Sngapore Meta-heurstcs for Multdmensonal Knapsack Problems Zhbao Man + Computer Scence Department,

More information

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden Optmzng for Speed Er Hagersten Uppsala Unversty, Sweden eh@t.uu.se What s the potental gan? Latency dfference L$ and mem: ~5x Bandwdth dfference L$ and mem: ~x Repeated TLB msses adds a factor ~-3x Execute

More information

Cluster Analysis of Electrical Behavior

Cluster Analysis of Electrical Behavior Journal of Computer and Communcatons, 205, 3, 88-93 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/cc http://dx.do.org/0.4236/cc.205.350 Cluster Analyss of Electrcal Behavor Ln Lu Ln Lu, School

More information

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers

Content Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth

More information

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce

Performance Study of Parallel Programming on Cloud Computing Environments Using MapReduce Performance Study of Parallel Programmng on Cloud Computng Envronments Usng MapReduce Wen-Chung Shh, Shan-Shyong Tseng Department of Informaton Scence and Applcatons Asa Unversty Tachung, 41354, Tawan

More information

International Conference on Parallel Processing, St. Charles, IL, August COMMUNICATION OPTIMIZATIONS USED IN THE PARADIGM

International Conference on Parallel Processing, St. Charles, IL, August COMMUNICATION OPTIMIZATIONS USED IN THE PARADIGM Internatonal Conference on Parallel Processng, St. Charles, IL, August 199 1 COMMCATION OPTIMIZATIONS USED IN THE PARADIGM COMPILER FOR DISTRIBUTED-MEMORY MULTICOMPUTERS Danel J. Palermo, Ernesto Su, John

More information

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices

An Application of the Dulmage-Mendelsohn Decomposition to Sparse Null Space Bases of Full Row Rank Matrices Internatonal Mathematcal Forum, Vol 7, 2012, no 52, 2549-2554 An Applcaton of the Dulmage-Mendelsohn Decomposton to Sparse Null Space Bases of Full Row Rank Matrces Mostafa Khorramzadeh Department of Mathematcal

More information

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning

Outline. Type of Machine Learning. Examples of Application. Unsupervised Learning Outlne Artfcal Intellgence and ts applcatons Lecture 8 Unsupervsed Learnng Professor Danel Yeung danyeung@eee.org Dr. Patrck Chan patrckchan@eee.org South Chna Unversty of Technology, Chna Introducton

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

Lecture 5: Multilayer Perceptrons

Lecture 5: Multilayer Perceptrons Lecture 5: Multlayer Perceptrons Roger Grosse 1 Introducton So far, we ve only talked about lnear models: lnear regresson and lnear bnary classfers. We noted that there are functons that can t be represented

More information

Concurrent models of computation for embedded software

Concurrent models of computation for embedded software Concurrent models of computaton for embedded software and hardware! Researcher overvew what t looks lke semantcs what t means and how t relates desgnng an actor language actor propertes and how to represent

More information