Simulating Ocean Currents. Simulating Galaxy Evolution

Simulating Ocean Currents

[Figure: (a) cross sections; (b) spatial discretization of a cross section]

Model as two-dimensional grids
Discretize in space and time: finer spatial and temporal resolution => greater accuracy
Many different computations per time step: set up and solve equations
Concurrency across and within grid computations

Simulating Galaxy Evolution

Simulate the interactions of many stars evolving over time
Computing forces is expensive: O(n^2) brute-force approach
Hierarchical methods take advantage of the force law G*m1*m2 / r^2

[Figure: star on which forces are being computed; a large group far enough away to approximate by its center of mass; a star too close to approximate; a small group far enough away to approximate]

Many time-steps, plenty of concurrency across stars within one
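To make the O(n^2) cost concrete, here is a minimal brute-force force computation in C. This is an illustrative sketch, not code from the slides; the Body struct and the softening constant EPS are assumptions:

    #include <math.h>

    typedef struct { double x, y, m, fx, fy; } Body;  /* position, mass, accumulated force */

    #define G   6.674e-11
    #define EPS 1e-9   /* softening term to avoid division by zero */

    /* All-pairs force computation: O(n^2) interactions. Hierarchical methods
       (e.g. Barnes-Hut) replace distant groups by their center of mass. */
    void compute_forces(Body *b, int n) {
        for (int i = 0; i < n; i++) { b[i].fx = 0.0; b[i].fy = 0.0; }
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double dx = b[j].x - b[i].x, dy = b[j].y - b[i].y;
                double r2 = dx*dx + dy*dy + EPS;
                double f  = G * b[i].m * b[j].m / r2;   /* force law: G m1 m2 / r^2 */
                double r  = sqrt(r2);
                b[i].fx += f * dx / r;                  /* resolve along x and y */
                b[i].fy += f * dy / r;
            }
    }

The concurrency the slide mentions is across iterations of the outer loop: the force on each star can be computed independently within one time-step.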

Rendering Scenes by Ray Tracing

Shoot rays into scene through pixels in image plane
Follow their paths: they bounce around as they strike objects; they generate new rays: a ray tree per input ray
Result is color and opacity for that pixel
Parallelism across rays

All case studies have abundant concurrency

Creating a Parallel Program

Assumption: sequential algorithm is given
  Sometimes need a very different algorithm, but that is beyond our scope
Pieces of the job:
  Identify work that can be done in parallel
  Partition work and perhaps data among processes
  Manage data access, communication and synchronization
  Note: work includes computation, data access and I/O
Main goal: speedup (plus low programming effort and resource needs)
For a fixed problem:
  Speedup(p) = Performance(p) / Performance(1) = Time(1) / Time(p)

Steps in Creating a Parallel Program

[Figure: sequential computation -> (decomposition) -> tasks -> (assignment) -> processes -> (orchestration) -> parallel program -> (mapping) -> processors P0..P3; "partitioning" covers the first two steps]

4 steps: decomposition, assignment, orchestration, mapping
Done by programmer or system software (compiler, runtime, ...)
Issues are the same, so assume the programmer does it all explicitly

Some Important Concepts

Task: arbitrary piece of undecomposed work in a parallel computation
  Executed sequentially; concurrency is only across tasks
  E.g. a particle/cell in Barnes-Hut, a ray or ray group in Raytrace
  Fine-grained versus coarse-grained tasks
Process (thread): abstract entity that performs the tasks assigned to it
  Processes communicate and synchronize to perform their tasks
Processor: physical engine on which a process executes
  Processes virtualize the machine to the programmer: first write the program in terms of processes, then map to processors

Decomposition

Break up computation into tasks to be divided among processes
  Tasks may become available dynamically
  No. of available tasks may vary with time
i.e. identify concurrency and decide level at which to exploit it
Goal: enough tasks to keep processes busy, but not too many
  No. of tasks available at a time is an upper bound on achievable speedup

Limited Concurrency: Amdahl's Law

Most fundamental limitation on parallel speedup
If fraction s of sequential execution is inherently serial, speedup <= 1/s
Example: 2-phase calculation
  sweep over n-by-n grid and do some independent computation
  sweep again and add each value to a global sum
Time for first phase = n^2/p
Second phase serialized at the global variable, so time = n^2
  Speedup <= 2n^2 / (n^2 + n^2/p), or at most 2
Trick: divide second phase into two
  accumulate into a private sum during the sweep
  add each per-process private sum into the global sum
Parallel time is n^2/p + n^2/p + p, and speedup is at best
  2n^2 / (2n^2/p + p) = 2n^2 p / (2n^2 + p^2)
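A minimal pthreads sketch of the private-sum trick; the grid size N, process count P, and initialization are illustrative assumptions:

    #include <pthread.h>
    #include <stdio.h>

    #define N 512
    #define P 4

    static double grid[N][N];
    static double global_sum = 0.0;
    static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Phase 2 with the trick: accumulate privately, then add once under a lock,
       so serialization is O(p) instead of O(n^2). */
    static void *phase2(void *arg) {
        long id = (long)arg;
        double mysum = 0.0;
        for (int i = id * (N / P); i < (id + 1) * (N / P); i++)
            for (int j = 0; j < N; j++)
                mysum += grid[i][j];        /* private accumulation: fully parallel */
        pthread_mutex_lock(&sum_lock);
        global_sum += mysum;                /* one serialized add per process */
        pthread_mutex_unlock(&sum_lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[P];
        for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) grid[i][j] = 1.0;
        for (long id = 0; id < P; id++) pthread_create(&t[id], NULL, phase2, (void *)id);
        for (int id = 0; id < P; id++) pthread_join(t[id], NULL);
        printf("sum = %f\n", global_sum);   /* expect N*N */
        return 0;
    }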

Pictorial Depiction

[Figure: work done concurrently over time for the three versions — (a) both phases serial: n^2 + n^2; (b) first phase parallel: n^2/p + n^2; (c) both phases parallel with private sums: n^2/p + n^2/p + p]

Concurrency Profiles

Cannot usually divide a program into just a serial and a parallel part

[Figure: concurrency profile — concurrency (number of simultaneously active tasks) plotted against clock cycle number]

Area under the curve is total work done, or time with 1 processor
Horizontal extent is a lower bound on time (infinite processors)
Speedup is the ratio:
  Speedup(p) = (sum over k of f_k * k) / (sum over k of f_k * ceil(k/p))
where f_k is the time spent at concurrency level k; the base case with serial fraction s gives 1 / (s + (1-s)/p)
Amdahl's law applies to any overhead, not just limited concurrency
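The ratio is easy to evaluate numerically; a small sketch (the profile array f is a hypothetical example):

    #include <stdio.h>

    /* Speedup from a concurrency profile: f[k-1] is the (normalized) time spent
       at concurrency level k in the infinite-processor execution. */
    double profile_speedup(const double *f, int kmax, int p) {
        double work = 0.0, time = 0.0;
        for (int k = 1; k <= kmax; k++) {
            work += f[k - 1] * k;                 /* area under the curve */
            time += f[k - 1] * ((k + p - 1) / p); /* ceil(k/p) steps on p processors */
        }
        return work / time;
    }

    int main(void) {
        double f[4] = {0.1, 0.2, 0.3, 0.4};  /* hypothetical profile, concurrency 1..4 */
        printf("speedup on 2 processors: %f\n", profile_speedup(f, 4, 2));
        return 0;
    }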

Assignment

Specifying the mechanism to divide work up among processes
  E.g. which process computes forces on which stars, or which rays
  Together with decomposition, also called partitioning
  Balance workload, reduce communication and management cost
Structured approaches usually work well
  Code inspection (parallel loops) or understanding of application
  Well-known heuristics
  Static versus dynamic assignment
As programmers, we worry about partitioning first
  Usually independent of architecture or programming model
  But cost and complexity of using primitives may affect decisions
As architects, we assume the program does a reasonable job of it

Orchestration

Naming data
Structuring communication
Synchronization
Organizing data structures and scheduling tasks temporally
Goals:
  Reduce cost of communication and synch. as seen by processors
  Preserve locality of data reference (incl. data structure organization)
  Schedule tasks to satisfy dependences early
  Reduce overhead of parallelism management
Closest to architecture (and programming model & language)
  Choices depend a lot on comm. abstraction, efficiency of primitives
  Architects should provide appropriate primitives efficiently

Mapping

After orchestration, we already have a parallel program
Two aspects of mapping:
  Which processes will run on the same processor, if necessary
  Which process runs on which particular processor
    mapping to a network topology
One extreme: space-sharing
  Machine divided into subsets, only one app at a time in a subset
  Processes can be pinned to processors, or left to the OS (see the sketch below)
Another extreme: complete resource management control to the OS
  OS uses the performance techniques we will discuss later
Real world is between the two
  User specifies desires in some aspects, system may ignore
Usually adopt the view: process <-> processor

Parallelizing Computation vs. Data

The view above is centered around computation
  Computation is decomposed and assigned (partitioned)
Partitioning data is often a natural view too
  Computation follows data: owner computes
  Grid example; data mining; High Performance Fortran (HPF)
But not general enough
  Distinction between computation and data stronger in many applications
    Barnes-Hut, Raytrace (later)
  Retain computation-centric view
  Data access and communication is part of orchestration
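On the pinning point under Mapping: Linux exposes explicit affinity control. A minimal sketch using the GNU-specific pthread_setaffinity_np (hence _GNU_SOURCE); the helper name pin_to_cpu is ours:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to a given CPU; returns 0 on success.
       This realizes the process <-> processor view explicitly. */
    int pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }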

High-level Goals

High performance (speedup over sequential program)

Table 2.1 Steps in the Parallelization Process and Their Goals

Step           | Architecture-Dependent? | Major Performance Goals
Decomposition  | Mostly no               | Expose enough concurrency, but not too much
Assignment     | Mostly no               | Balance workload; reduce communication volume
Orchestration  | Yes                     | Reduce noninherent communication via data locality; reduce communication and synchronization cost as seen by the processor; reduce serialization at shared resources; schedule tasks to satisfy dependences early
Mapping        | Yes                     | Put related processes on the same processor if necessary; exploit locality in network topology

But also low resource usage and development effort
Implications for algorithm designers and architects:
  Algorithm designers: high performance, low resource needs
  Architects: high performance, low cost, reduced programming effort
    e.g. gradually improving performance with programming effort may be preferable to a sudden threshold after large programming effort

Parallelization of an Example Program

Motivating problems all lead to large, complex programs
Examine a simplified version of a piece of the Ocean simulation
  Iterative equation solver
Illustrate a parallel program in a low-level parallel language
  C-like pseudocode with simple extensions for parallelism
  Expose basic comm. and synch. primitives that must be supported
  State of most real parallel programming today

Grid Solver Example

Expression for updating each interior point:
  A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] + A[i,j+1] + A[i+1,j])

Simplified version of the solver in the Ocean simulation
  Gauss-Seidel (near-neighbor) sweeps to convergence
  interior n-by-n points of an (n+2)-by-(n+2) grid updated in each sweep
  updates done in place in the grid, and difference from previous value computed
  accumulate partial diffs into a global diff at the end of every sweep
  check if the error has converged (to within a tolerance parameter)
  if so, exit solver; if not, do another sweep

1.  int n;                     /*size of matrix: (n + 2)-by-(n + 2) elements*/
2.  float **A, diff = 0;
3.  main()
4.  begin
5.    read(n);                 /*read input parameter: matrix size*/
6.    A ← malloc(a 2-d array of size n + 2 by n + 2 doubles);
7.    initialize(A);           /*initialize the matrix A somehow*/
8.    Solve(A);                /*call the routine to solve equation*/
9.  end main

10. procedure Solve(A)         /*solve the equation system*/
11.   float **A;               /*A is an (n + 2)-by-(n + 2) array*/
12. begin
13.   int i, j, done = 0;
14.   float diff = 0, temp;
15.   while (!done) do         /*outermost loop over sweeps*/
16.     diff = 0;              /*initialize maximum difference to 0*/
17.     for i ← 1 to n do      /*sweep over nonborder points of grid*/
18.       for j ← 1 to n do
19.         temp = A[i,j];     /*save old value of element*/
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                  A[i,j+1] + A[i+1,j]);   /*compute average*/
22.         diff += abs(A[i,j] - temp);
23.       end for
24.     end for
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while
27. end procedure
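A compilable C rendering of this pseudocode; TOL, the grid size, and the boundary initialization are assumptions chosen for illustration:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define TOL 1e-3

    /* Gauss-Seidel sweeps over the n-by-n interior of an (n+2)-by-(n+2) grid,
       updating in place and accumulating the per-sweep difference. */
    void solve(float **A, int n) {
        int done = 0;
        while (!done) {
            float diff = 0.0f;
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++) {
                    float temp = A[i][j];
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                      A[i][j+1] + A[i+1][j]);
                    diff += fabsf(A[i][j] - temp);
                }
            if (diff / (n * n) < TOL) done = 1;
        }
    }

    int main(void) {
        int n = 64;
        float **A = malloc((n + 2) * sizeof(float *));
        for (int i = 0; i < n + 2; i++) A[i] = calloc(n + 2, sizeof(float));
        for (int i = 0; i < n + 2; i++) A[i][0] = 1.0f;  /* arbitrary boundary condition */
        solve(A, n);
        printf("A[1][1] = %f\n", A[1][1]);
        return 0;
    }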

Decomposition

A simple way to identify concurrency is to look at loop iterations
  dependence analysis; if not enough concurrency, then look further
Not much concurrency here at this level (all loops sequential)
Examine fundamental dependences, ignoring loop structure
  Concurrency O(n) along anti-diagonals, serialization O(n) along the diagonal
  Retain loop structure, use point-to-point synch; problem: too many synch operations
  Restructure loops, use global synch; imbalance and too much synch

Exploit Application Knowledge

Reorder grid traversal: red-black ordering

[Figure: checkerboard grid of red and black points]

Different ordering of updates: may converge quicker or slower
Red sweep and black sweep are each fully parallel:
  Global synch between them (conservative but convenient)
Ocean uses red-black; we use a simpler, asynchronous one to illustrate
  no red-black, simply ignore dependences within a sweep
  sequential order same as original, parallel program nondeterministic
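A sketch of the red-black traversal: the parity of i+j picks the color, and each colored sweep is fully parallel:

    /* One red sweep followed by one black sweep over the interior points.
       Points of one color depend only on points of the other color, so each
       sweep can be parallelized freely; a global synch sits between them. */
    void redblack_sweep(float **A, int n) {
        for (int color = 0; color <= 1; color++)   /* 0 = red, 1 = black */
            for (int i = 1; i <= n; i++)
                for (int j = 1; j <= n; j++)
                    if ((i + j) % 2 == color)
                        A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                          A[i][j+1] + A[i+1][j]);
    }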

Decomposition Only

15.   while (!done) do         /*a sequential loop*/
16.     diff = 0;
17.     for_all i ← 1 to n do  /*a parallel loop nest*/
18.       for_all j ← 1 to n do
19.         temp = A[i,j];
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                  A[i,j+1] + A[i+1,j]);
22.         diff += abs(A[i,j] - temp);
23.       end for_all
24.     end for_all
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while

Decomposition into elements: degree of concurrency n^2
To decompose into rows, make the loop at line 18 sequential; degree of concurrency n
for_all leaves assignment to the system
  but implicit global synch. at the end of a for_all loop

Assignment

Static assignments (given decomposition into rows)
  block assignment of rows: row i is assigned to process floor(i / (n/p))
  cyclic assignment of rows: process i is assigned rows i, i+p, and so on

[Figure: block assignment of contiguous row bands to processes P0, P1, P2, P3]

Dynamic assignment
  get a row index, work on the row, get a new row, and so on
Static assignment into rows reduces concurrency (from n to p)
  block assignment reduces communication by keeping adjacent rows together
Let's dig into orchestration under three programming models
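The two static assignments reduce to simple index arithmetic; a sketch using the pid/nprocs naming of the later pseudocode, assuming n is divisible by nprocs:

    /* Block assignment: process pid gets rows [mymin, mymax]. */
    void block_rows(int pid, int nprocs, int n, int *mymin, int *mymax) {
        *mymin = 1 + pid * (n / nprocs);
        *mymax = *mymin + n / nprocs - 1;   /* adjacent rows stay together */
    }

    /* Cyclic assignment: process pid gets rows pid+1, pid+1+nprocs, ... */
    int cyclic_row(int pid, int nprocs, int k) {
        return 1 + pid + k * nprocs;        /* k-th row owned by pid, k = 0,1,2,... */
    }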

Data Parallel Solver

1.  int n, nprocs;             /*grid size (n + 2)-by-(n + 2) and number of processes*/
2.  float **A, diff = 0;
3.  main()
4.  begin
5.    read(n); read(nprocs);   /*read input grid size and number of processes*/
6.    A ← G_MALLOC(a 2-d array of size n+2 by n+2 doubles);
7.    initialize(A);           /*initialize the matrix A somehow*/
8.    Solve(A);                /*call the routine to solve equation*/
9.  end main

10. procedure Solve(A)         /*solve the equation system*/
11.   float **A;               /*A is an (n + 2)-by-(n + 2) array*/
12. begin
13.   int i, j, done = 0;
14.   float mydiff = 0, temp;
14a.  DECOMP A[BLOCK,*,nprocs];
15.   while (!done) do         /*outermost loop over sweeps*/
16.     mydiff = 0;            /*initialize maximum difference to 0*/
17.     for_all i ← 1 to n do  /*sweep over non-border points of grid*/
18.       for_all j ← 1 to n do
19.         temp = A[i,j];     /*save old value of element*/
20.         A[i,j] ← 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                  A[i,j+1] + A[i+1,j]);   /*compute average*/
22.         mydiff += abs(A[i,j] - temp);
23.       end for_all
24.     end for_all
24a.    REDUCE(mydiff, diff, ADD);
25.     if (diff/(n*n) < TOL) then done = 1;
26.   end while
27. end procedure

Shared Address Space Solver

Single Program Multiple Data (SPMD)

[Figure: processes each call Solve and sweep their portion of the grid, then test for convergence]

Assignment controlled by values of variables used as loop bounds
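Before the explicit SAS code, note that the for_all + REDUCE pattern maps directly onto OpenMP on a shared address space; a minimal sketch (not the slides' data-parallel language) of one sweep of the asynchronous version:

    #include <math.h>

    /* One sweep with an implicit barrier at the end of the parallel loop and a
       built-in reduction, mirroring for_all + REDUCE(mydiff, diff, ADD).
       Dependences within the sweep are ignored, as in the slides' version. */
    float sweep_omp(float **A, int n) {
        float diff = 0.0f;
        #pragma omp parallel for reduction(+:diff) schedule(static)
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++) {
                float temp = A[i][j];
                A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                  A[i][j+1] + A[i+1][j]);
                diff += fabsf(A[i][j] - temp);
            }
        return diff;
    }

schedule(static) gives each thread a contiguous block of rows, matching the BLOCK decomposition above.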

1.  int n, nprocs;             /*matrix dimension and number of processors to be used*/
2a. float **A, diff;           /*A is global (shared) array representing the grid*/
                               /*diff is global (shared) maximum difference in current sweep*/
2b. LOCKDEC(diff_lock);        /*declaration of lock to enforce mutual exclusion*/
2c. BARDEC(bar1);              /*barrier declaration for global synchronization between sweeps*/
3.  main()
4.  begin
5.    read(n); read(nprocs);   /*read input matrix size and number of processes*/
6.    A ← G_MALLOC(a two-dimensional array of size n+2 by n+2 doubles);
7.    initialize(A);           /*initialize A in an unspecified way*/
8a.   CREATE(nprocs-1, Solve, A);
8.    Solve(A);                /*main process becomes a worker too*/
8b.   WAIT_FOR_END(nprocs-1);  /*wait for all child processes created to terminate*/
9.  end main

10. procedure Solve(A)
11.   float **A;               /*A is entire n+2-by-n+2 shared array, as in the sequential program*/
12. begin
13.   int i, j, pid, done = 0;
14.   float temp, mydiff = 0;  /*private variables*/
14a.  int mymin = 1 + (pid * n/nprocs);   /*assume that n is exactly divisible by*/
14b.  int mymax = mymin + n/nprocs - 1;   /*nprocs for simplicity here*/
15.   while (!done) do         /*outermost loop over sweeps*/
16.     mydiff = diff = 0;     /*set global diff to 0 (okay for all to do it)*/
16a.    BARRIER(bar1, nprocs); /*ensure all reach here before anyone modifies diff*/
17.     for i ← mymin to mymax do   /*for each of my rows*/
18.       for j ← 1 to n do         /*for all nonborder elements in that row*/
19.         temp = A[i,j];
20.         A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] +
21.                  A[i,j+1] + A[i+1,j]);
22.         mydiff += abs(A[i,j] - temp);
23.       endfor
24.     endfor
25a.    LOCK(diff_lock);       /*update global diff if necessary*/
25b.    diff += mydiff;
25c.    UNLOCK(diff_lock);
25d.    BARRIER(bar1, nprocs); /*ensure all reach here before checking if done*/
25e.    if (diff/(n*n) < TOL) then done = 1;   /*check convergence; all get same answer*/
25f.    BARRIER(bar1, nprocs);
26.   endwhile
27. end procedure

Notes on SAS Program

SPMD: not lockstep or even necessarily the same instructions
Assignment controlled by values of variables used as loop bounds
  unique pid per process, used to control assignment
Done condition evaluated redundantly by all
Code that does the update is identical to the sequential program
  each process has a private mydiff variable
Most interesting special operations are for synchronization
  accumulations into shared diff have to be mutually exclusive
  why the need for all the barriers?
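The LOCK/UNLOCK and BARRIER primitives map directly onto pthreads. A condensed sketch of the worker under the same divisibility assumption (barrier initialization and thread creation happen in a main() not shown; done is a benign redundant write here, as in the pseudocode, but a production program would synchronize it properly):

    #include <pthread.h>
    #include <math.h>

    #define NPROCS 4
    static float **A;               /* shared grid, allocated before thread creation */
    static int n, done = 0;
    static float diff = 0.0f;
    static pthread_mutex_t diff_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_barrier_t bar1;  /* pthread_barrier_init(&bar1, NULL, NPROCS) in main */

    static void *solve(void *arg) {
        int pid = (int)(long)arg;
        int mymin = 1 + pid * (n / NPROCS), mymax = mymin + n / NPROCS - 1;
        while (!done) {
            float mydiff = 0.0f;
            diff = 0.0f;                     /* okay for all to write 0 */
            pthread_barrier_wait(&bar1);     /* all see diff == 0 before updates */
            for (int i = mymin; i <= mymax; i++)
                for (int j = 1; j <= n; j++) {
                    float temp = A[i][j];
                    A[i][j] = 0.2f * (A[i][j] + A[i][j-1] + A[i-1][j] +
                                      A[i][j+1] + A[i+1][j]);
                    mydiff += fabsf(A[i][j] - temp);
                }
            pthread_mutex_lock(&diff_lock);
            diff += mydiff;                  /* mutually exclusive accumulation */
            pthread_mutex_unlock(&diff_lock);
            pthread_barrier_wait(&bar1);     /* all contributions in before the test */
            if (diff / (float)(n * n) < 1e-3f) done = 1;  /* same answer everywhere */
            pthread_barrier_wait(&bar1);     /* don't reset diff while others test */
        }
        return NULL;
    }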

Message Passing Grid Solver

Cannot declare A to be a shared array any more
Need to compose it logically from per-process private arrays
  usually allocated in accordance with the assignment of work
  process assigned a set of rows allocates them locally
Transfers of entire rows between traversals
Structurally similar to SAS (e.g. SPMD), but orchestration different
  data structures and data access/naming
  communication
  synchronization

1.  int pid, n, nprocs;        /*process id, matrix dimension and number of processors to be used*/
2.  float **myA;
3.  main()
4.  begin
5.    read(n); read(nprocs);   /*read input matrix size and number of processes*/
8a.   CREATE(nprocs-1, Solve);
8b.   Solve();                 /*main process becomes a worker too*/
8c.   WAIT_FOR_END(nprocs-1);  /*wait for all child processes created to terminate*/
9.  end main

10. procedure Solve()
11. begin
13.   int i, j, pid, n' = n/nprocs, done = 0;
14.   float temp, tempdiff, mydiff = 0;   /*private variables*/
6.    myA ← malloc(a 2-d array of size [n/nprocs + 2] by n+2);   /*my assigned rows of A*/
7.    initialize(myA);         /*initialize my rows of A, in an unspecified way*/
15.   while (!done) do
16.     mydiff = 0;            /*set local diff to 0*/
16a.    if (pid != 0) then SEND(&myA[1,0], n*sizeof(float), pid-1, ROW);
16b.    if (pid != nprocs-1) then SEND(&myA[n',0], n*sizeof(float), pid+1, ROW);
16c.    if (pid != 0) then RECEIVE(&myA[0,0], n*sizeof(float), pid-1, ROW);
16d.    if (pid != nprocs-1) then RECEIVE(&myA[n'+1,0], n*sizeof(float), pid+1, ROW);
        /*border rows of neighbors have now been copied into myA[0,*] and myA[n'+1,*]*/
17.     for i ← 1 to n' do     /*for each of my (nonghost) rows*/
18.       for j ← 1 to n do    /*for all nonborder elements in that row*/
19.         temp = myA[i,j];
20.         myA[i,j] = 0.2 * (myA[i,j] + myA[i,j-1] + myA[i-1,j] +
21.                  myA[i,j+1] + myA[i+1,j]);
22.         mydiff += abs(myA[i,j] - temp);
23.       endfor
24.     endfor
        /*communicate local diff values and determine if done; can be replaced by reduction and broadcast*/
25a.    if (pid != 0) then     /*process 0 holds global total diff*/
25b.      SEND(mydiff, sizeof(float), 0, DIFF);
25c.      RECEIVE(done, sizeof(int), 0, DONE);
25d.    else                   /*pid 0 does this*/
25e.      for i ← 1 to nprocs-1 do   /*for each other process*/
25f.        RECEIVE(tempdiff, sizeof(float), *, DIFF);
25g.        mydiff += tempdiff;      /*accumulate into total*/
25h.      endfor
25i.      if (mydiff/(n*n) < TOL) then done = 1;
25j.      for i ← 1 to nprocs-1 do   /*for each other process*/
25k.        SEND(done, sizeof(int), i, DONE);
25l.      endfor
25m.    endif
26.   endwhile
27. end procedure
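With MPI, the ghost-row exchange and the diff accumulation look like this; a sketch assuming each process owns np interior rows, where whole rows of n+2 floats (including border columns) are exchanged for simplicity:

    #include <mpi.h>
    #include <math.h>

    /* One iteration: exchange border rows with neighbors, sweep local rows,
       then combine local diffs. MPI_Allreduce replaces the explicit
       SEND/RECEIVE funnel through process 0. Note: standard-mode MPI_Send may
       buffer small messages; the Sendrecv variant below avoids relying on that. */
    int iterate(int np, int n, float myA[np + 2][n + 2], int pid, int nprocs) {
        MPI_Status st;
        if (pid != 0)                /* send my first row up */
            MPI_Send(myA[1], n + 2, MPI_FLOAT, pid - 1, 0, MPI_COMM_WORLD);
        if (pid != nprocs - 1)       /* send my last row down */
            MPI_Send(myA[np], n + 2, MPI_FLOAT, pid + 1, 0, MPI_COMM_WORLD);
        if (pid != 0)                /* receive upper ghost row */
            MPI_Recv(myA[0], n + 2, MPI_FLOAT, pid - 1, 0, MPI_COMM_WORLD, &st);
        if (pid != nprocs - 1)       /* receive lower ghost row */
            MPI_Recv(myA[np + 1], n + 2, MPI_FLOAT, pid + 1, 0, MPI_COMM_WORLD, &st);

        float mydiff = 0.0f, diff;
        for (int i = 1; i <= np; i++)
            for (int j = 1; j <= n; j++) {
                float temp = myA[i][j];
                myA[i][j] = 0.2f * (myA[i][j] + myA[i][j-1] + myA[i-1][j] +
                                    myA[i][j+1] + myA[i+1][j]);
                mydiff += fabsf(myA[i][j] - temp);
            }
        MPI_Allreduce(&mydiff, &diff, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        return diff / (float)(n * n) < 1e-3f;   /* done flag, same at every process */
    }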

Notes on Message Passing Program

Use of ghost rows
Receive does not transfer data, send does
  unlike SAS, which is usually receiver-initiated (a load fetches data)
Communication done at the beginning of an iteration, so no asynchrony
Communication in whole rows, not one element at a time
Core is similar, but indices/bounds are in local rather than global space
Synchronization through sends and receives
  Update of global diff and event synch for done condition
  Could implement locks and barriers with messages
Can use REDUCE and BROADCAST library calls to simplify code:

        /*communicate local diff values and determine if done, using reduction and broadcast*/
25b.    REDUCE(0, mydiff, sizeof(float), ADD);
25c.    if (pid == 0) then
25i.      if (mydiff/(n*n) < TOL) then done = 1;
25k.    endif
25m.    BROADCAST(0, done, sizeof(int), DONE);

Send and Receive Alternatives

Can extend functionality: stride, scatter-gather, groups
Semantic flavors: based on when control is returned
  Affect when data structures or buffers can be reused at either end

  Send/Receive
    Synchronous
    Asynchronous
      Blocking asynchronous
      Nonblocking asynchronous

Affect event synch (mutual exclusion by fiat: only one process touches the data)
Affect ease of programming and performance
Synchronous messages provide built-in synch. through the match
  Separate event synchronization needed with asynch. messages
With synch. messages, our code is deadlocked. Fix?
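One standard fix pairs each send with the matching receive; MPI_Sendrecv does exactly that, and MPI_PROC_NULL turns the boundary cases into no-ops. A sketch of the border exchange rewritten this way (an alternative fix has odd and even processes alternate their send/receive order):

    #include <mpi.h>

    /* Deadlock-free border exchange: each Sendrecv pairs a send with the
       matching receive, so no process blocks waiting for a partner that is
       itself blocked sending. */
    void exchange_borders(int np, int n, float myA[np + 2][n + 2],
                          int pid, int nprocs) {
        int up = (pid != 0) ? pid - 1 : MPI_PROC_NULL;
        int dn = (pid != nprocs - 1) ? pid + 1 : MPI_PROC_NULL;
        MPI_Sendrecv(myA[1],      n + 2, MPI_FLOAT, up, 0,   /* first row goes up */
                     myA[np + 1], n + 2, MPI_FLOAT, dn, 0,   /* lower ghost comes in */
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(myA[np],     n + 2, MPI_FLOAT, dn, 0,   /* last row goes down */
                     myA[0],      n + 2, MPI_FLOAT, up, 0,   /* upper ghost comes in */
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }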

Orchestration: Summary

Shared address space
  Shared and private data explicitly separate
  Communication implicit in access patterns
  No correctness need for data distribution
  Synchronization via atomic operations on shared data
  Synchronization explicit and distinct from data communication

Message passing
  Data distribution among local address spaces needed
  No explicit shared structures (implicit in comm. patterns)
  Communication is explicit
  Synchronization implicit in communication (at least in the synchronous case)
    mutual exclusion by fiat

Correctness in Grid Solver Program

Decomposition and assignment are similar in SAS and message passing
Orchestration is different
  Data structures, data access/naming, communication, synchronization

                                          SAS        Msg-Passing
Explicit global data structure?           Yes        No
Assignment independent of data layout?    Yes        No
Communication                             Implicit   Explicit
Synchronization                           Explicit   Implicit
Explicit replication of border rows?      No         Yes

Requirements for performance are another story...
