Architecture and Programming Model for High Performance Interactive Computation

1 Architecture and Programming Model for High Performance Interactive Computation
Based on CAPSL TM-127 of the same title, by Jack B. Dennis, Arvind, Guang R. Gao, Xiaoming Li and Lian-Ping Wang, April 3rd, 2014
Haitao Wei, CAPSL at UDEL
CPEG-852: Advanced Topics in Computing Systems, 4/22/14

2 Outline
- Introduction to DDDAS/Interactive Computation
  - An Example and Problems
- Fresh Breeze Execution Model and Architecture
  - Execution Model
  - Memory Model
  - Task Model
- Streaming and Transactions
  - Stream Type and Operations
  - Concurrency Operations of Transaction Style
- Conclusion

4 An Example of DDDAS/Interactive Computation: Radio Astronomy
[Figure: processing pipeline]
- Antenna arrays: receive signals
- Local processors: filter signals and control
- Data analysis: analyze signals
- Observer: make decisions and change parameters

5 Dynamic Data Driven Application System (DDDAS) Challenges
- Real-time interaction with parts of the physical environment
- Management of processing and memory resources according to dynamic needs generated by local events
- Input and output devices process streams of data items
- Make decisions about the work using transaction processing

6 Our Solutions: Programming Model and Architecture Support
- Fresh Breeze Execution Model and Architecture
  - Based on the codelet execution model
  - Supports fine-grained execution and memory management
- Streaming
  - Supports streaming data expression and operations
- Transactions
  - Supports concurrency operations of transaction style

8 Coarse-Grain vs. Fine-Grain Multithreading
[Figure: two CPU and memory diagrams, each with a thread unit and an executor locus; one runs a single thread, the other a thread pool]
- Coarse-grain thread: the family home model
- Fine-grain non-preemptive thread: the hotel model

9 Case Studies of Fine-Grain Execution Models
- Dataflow Model (1970s - )
- EARTH Model ( )
- HTVM Model ( )
- Fresh Breeze Model (2000 - )
- Runnemede Model (2010 - )

10 Fresh Breeze Execution Model
The execution model (the abstract machine) comprises:
- Task Model: a set of rules for creating, destroying and managing threads
- Memory Model: dictates the ordering of memory operations
- Synchronization Model: provides a set of mechanisms to protect against data races

11 Fresh Breeze Memory Model: Main Features and Vision
- Global shared name space with a one-level store
- A single-update storage model to eliminate the cache-coherence problem
- Concept of sealed memory chunks/sections with the single-assignment property
- Trees of fixed-size chunks
- Fine-grain memory management support: memory allocation and data transfer are performed entirely by architecture/hardware mechanisms
- Security

12 Fresh Breeze Memory Model
[Figure: a DAG heap of chunks; a master chunk holds chunk handles that point to data chunks, which hold data items]
- Arrays as trees of chunks
- Write once, then read only
- Fixed chunk size: 128 bytes (16 doubles or 32 integers)
- Chunk handle: 64-bit unique identifier
- Arrays: three levels yield 4096 elements (longs)
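
The three-level figure follows from the chunk size: with 16 eight-byte slots per chunk, a tree of depth 3 addresses 16^3 = 4096 elements. The sketch below illustrates that indexing in Python, assuming tuples stand in for sealed (write-once) chunks. The names CHUNK_SLOTS, capacity, build_tree and lookup are illustrative only and are not part of Fresh Breeze.

```python
CHUNK_SLOTS = 16  # 128-byte chunk / 8-byte element (assumed from the slide)


def capacity(levels):
    """Number of elements addressable by a tree with the given depth."""
    return CHUNK_SLOTS ** levels


def build_tree(values, levels):
    """Pack a flat list into a tree of immutable 16-slot chunks."""
    if levels == 1:
        return tuple(values)  # leaf chunk holds the data items
    span = CHUNK_SLOTS ** (levels - 1)  # elements covered by one child
    return tuple(build_tree(values[i:i + span], levels - 1)
                 for i in range(0, len(values), span))


def lookup(root, index, levels):
    """Follow one slot per level, most significant base-16 digit first."""
    node = root
    for level in range(levels - 1, -1, -1):
        slot = (index // CHUNK_SLOTS ** level) % CHUNK_SLOTS
        node = node[slot]
    return node
```

Reading element i costs one chunk access per level, so a three-level array needs three accesses regardless of i.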

13 Task/Concurrency Model
[Figure: Master Task, spawn(n), Spawned Tasks (Worker 0 to Worker n-1), join_update, join_fetch, Join, Continuation Task]
- Asynchronous tasking
- The continuation task receives the children's results
- Non-blocking continuation
- Light-weight tasks

14 Example: Dot Product
    sum = 0;
    for (i = 0; i < 16*16*16; i++)
        sum += a[i] * b[i];
Step 1: Build Vector
[Figure: Build Vector tasks spawn Build Chunk tasks (A0~A15 through A240~A255), each filling a 16-element data chunk; Build Done signals reach a sync chunk through join-update and start the continue codelet]

15 Example: Dot Product (cont'd)
    sum = 0;
    for (i = 0; i < 16*16*16; i++)
        sum += a[i] * b[i];
Step 2: Compute
[Figure: Traverse Vector tasks descend the vector trees; Compute tasks combine leaf chunks such as A0~A15 and B0~B15 into partial sums; Reduce tasks combine the partial sums into sum]
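
The two steps can be sketched in Python as follows. This is only an approximation under stated assumptions: a thread pool stands in for spawn, and `task.result()` blocks the caller, whereas the slides describe a non-blocking continuation. The function names and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SLOTS = 16  # elements per leaf chunk, as on the slides


def build_tree(values, levels):
    """Step 1: pack a flat list into a tree of immutable 16-slot chunks."""
    if levels == 1:
        return tuple(values)
    span = CHUNK_SLOTS ** (levels - 1)
    return tuple(build_tree(values[i:i + span], levels - 1)
                 for i in range(0, len(values), span))


def traverse(node_a, node_b, levels):
    """Step 2: walk both trees in lockstep and return a partial sum."""
    if levels == 1:
        # Compute: multiply-accumulate over one pair of leaf chunks.
        return sum(x * y for x, y in zip(node_a, node_b))
    # Reduce: combine the partial sums of the child subtrees.
    return sum(traverse(ca, cb, levels - 1)
               for ca, cb in zip(node_a, node_b))


def dot_product(a, b, levels=3, workers=4):
    """Dot product of two vectors of 16**levels elements (levels >= 2)."""
    root_a, root_b = build_tree(a, levels), build_tree(b, levels)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # spawn: one task per top-level subtree.
        tasks = [pool.submit(traverse, ca, cb, levels - 1)
                 for ca, cb in zip(root_a, root_b)]
        # join: the continuation reduces the children's partial sums.
        return sum(task.result() for task in tasks)
```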

17 What We Will Learn
- Streaming
  - Stream data type and operations
  - The future structure used to implement streaming
  - Streaming on the dataflow model: static dataflow and synchronous dataflow
- Transactions
  - Determinate and repeatable
  - Using Future and Guard to implement concurrency operations of transaction style

18 Stream Type and Operations
Stream: a sequence of values of a given type, possibly infinite
- Define a stream (DataItem can be any data type):
    Stream<DataItem> instream = new Stream<DataItem>();
- Concatenate two streams:
    Stream<DataItem> strm1 = strm0 + new Stream<DataItem>{i0, i1, ...};
- Get the first element in a stream:
    strm.first();

19 Stream Type and Operations (cont'd)
- Remove the first element in a stream:
    Stream<DataItem> strm1 = strm0.rest();
  A stream is its first element followed by its rest:
    Stream<DataItem> strm = strm.first() + strm.rest();
- Append a data item to a stream:
    strm.append(item);
- Check whether the stream has more data, that is, the end of the stream has not been reached:
    if (strm.moredata()) { statement }

20 A Stream Example: Average Pairs
[Figure: Source Module -> Average Pairs -> Sink Module]
Source module:
    Stream<int> sourcestream = new Stream<int>();
    while (true) {
        int itm = genedataitem();
        sourcestream.append(itm);
    }
    result sourcestream;
Average Pairs:
    while (instream.moredata()) {
        int itm0 = instream.first();
        instream = instream.rest();
        int itm1 = instream.first();
        instream = instream.rest();
        outstream.append((itm0 + itm1) / 2);
    }
    result outstream;
Sink module:
    while (instream.moredata()) {
        achive.put(instream.first());
        instream = instream.rest();
    }
    result achive.getachive();
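
For experimentation, the Average Pairs module can be approximated in Python. This is a sketch under assumptions, not the Fresh Breeze stream type: the `Stream` class below is finite and list-backed, `drain` plays the role of the sink, and an unpaired trailing item is dropped, which the slide does not specify.

```python
class Stream:
    """Finite FIFO stream; rest() shares storage with its parent stream."""

    def __init__(self, items=None, start=0):
        self._items = [] if items is None else items
        self._start = start

    def moredata(self):
        return self._start < len(self._items)

    def first(self):
        if not self.moredata():
            raise IndexError("first() on an exhausted stream")
        return self._items[self._start]

    def rest(self):
        if not self.moredata():
            raise IndexError("rest() on an exhausted stream")
        return Stream(self._items, self._start + 1)

    def append(self, item):
        self._items.append(item)


def average_pairs(instream):
    """Consume items two at a time and emit their integer average."""
    outstream = Stream()
    while instream.moredata():
        itm0 = instream.first()
        instream = instream.rest()
        if not instream.moredata():
            break  # assumption: drop an unpaired trailing item
        itm1 = instream.first()
        instream = instream.rest()
        outstream.append((itm0 + itm1) // 2)
    return outstream


def drain(instream):
    """Sink: collect every remaining item into a list."""
    archive = []
    while instream.moredata():
        archive.append(instream.first())
        instream = instream.rest()
    return archive
```

Because `rest()` returns a new view instead of mutating its receiver, the source stream stays readable from its head after `average_pairs` has consumed it.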

21 Stream Implementation in Fresh Breeze
- Stream representation: a linear chain of chunks; each chunk holds data items and a reference to the next chunk
- Stream operations: FIFO queue operations on the chain of chunks; read from the head of the chain and write to the tail
- Synchronization between producer and consumer: a special object, the Future

22 Future
- A future is a memory cell with a state, waiting to receive a data value
- Status: undefined, defined, waiting
- Future read and future write are atomic
Read after write:
1. Create future: the cell is undefined
2. T1 writes the future: the cell becomes defined and holds Data
3. T2 reads the future: T2 gets Data

23 Future (cont'd)
Write after read:
1. Create future: the cell is undefined
2. T1 reads the future: the cell becomes waiting and records T1's read
3. T2 reads the future: the cell stays waiting and records T2's read
4. T3 writes the future: the cell becomes defined, and T1 and T2 both receive Data
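
The three states and both orderings can be imitated with a condition variable. The class below is a software sketch only: it blocks the reading thread, whereas the slides do not say how Fresh Breeze suspends a reader, and rejecting a second write is an assumption based on the write-once memory model.

```python
import threading


class Future:
    """Write-once cell with states 'undefined', 'waiting' and 'defined'."""

    def __init__(self):
        self._cond = threading.Condition()
        self._state = "undefined"
        self._value = None

    @property
    def state(self):
        with self._cond:
            return self._state

    def write(self, value):
        """Define the cell and wake every reader that is waiting on it."""
        with self._cond:
            if self._state == "defined":
                raise RuntimeError("future already written")  # assumption
            self._value = value
            self._state = "defined"
            self._cond.notify_all()

    def read(self):
        """Return the value, waiting first if it has not been written."""
        with self._cond:
            if self._state == "undefined":
                self._state = "waiting"  # first read before any write
            while self._state != "defined":
                self._cond.wait()
            return self._value
```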

24 Stream Operations Based on Future
The Fresh Breeze instruction set supports four stream operations: New, Append, First and Rest.
[Figure, animated over slides 24 to 27: a chain of cells ending in an undefined future]
1. New stream: the stream is a single undefined future
2. Append: the future becomes defined with Data1 and links to a new undefined future
3. First: returns Data1
4. Rest: returns the stream that starts after Data1, here the undefined future
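
A rough software analogue of the four operations is sketched below, assuming a stream is a future whose value is a cell holding one item and the future for the next cell. The helper names and the event-based `Future` are illustrative and say nothing about the actual instruction encoding; end-of-stream is not modeled.

```python
import threading


class Future:
    """Minimal write-once cell: read() waits until write() has happened."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def write(self, value):
        self._value = value
        self._ready.set()

    def read(self):
        self._ready.wait()
        return self._value


class Cell:
    """One stream element plus an undefined future for its successor."""

    def __init__(self, item):
        self.item = item
        self.next = Future()


def new_stream():
    """New: an empty stream is a single undefined future."""
    return Future()


def append(tail, item):
    """Append: define the tail future and return the new undefined tail."""
    cell = Cell(item)
    tail.write(cell)
    return cell.next


def first(stream):
    """First: wait for the head cell if necessary and return its item."""
    return stream.read().item


def rest(stream):
    """Rest: the stream that starts at the next cell's future."""
    return stream.read().next
```

A consumer that calls `first` on an empty stream simply waits until the producer appends, which is the producer/consumer synchronization that the future provides.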

28 Related Work on Streaming Based on the Dataflow Model
- Streaming based on the static dataflow model
- Streaming based on the synchronous dataflow model

29 Translate Average Pairs to Dataflow Graph: Tail Recursive
[Figure: tail-recursive AveragePairs dataflow graph; labels: D, reset, AveragePairs, Extract ([1], [2]), Compute (+, /2), concat, R]

30 Translate Average Pairs to Dataflow Graph: General
[Figure, animated over slides 30 to 40: general AveragePairs dataflow graph with input D, a control stream of T/F tokens, nodes labeled T and I, Compute (+, /2) and output R; the frames step the input tokens 1, 2, 3, 4, 5 through the graph]

41 Can Any Streaming Program Work?
- Can any streaming program be translated into a dataflow graph, both recursive and general?
- The answer is YES; see Jack's '94 paper [2]

43 Synchronous Dataflow Model
Synchronous Dataflow Graph (SDF):
- A node (actor) represents computation
- An edge is a FIFO queue representing data communication
- Weights on an edge give the producing rate and the consuming rate
[Figure: A --p--> FIFO --q--> B]
Each time actor A fires it produces p data items, and each time actor B fires it consumes q data items.
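
For the single edge above, a periodic schedule must leave the FIFO as it found it, which gives the standard SDF balance equation r_A * p = r_B * q; this equation is not on the slide itself and is taken from the SDF literature [8]. The sketch below computes the smallest firing counts and checks them by simulation. Function names are illustrative.

```python
from math import gcd


def repetitions(p, q):
    """Smallest (r_a, r_b) with r_a * p == r_b * q."""
    g = gcd(p, q)
    return q // g, p // g


def run_one_iteration(p, q):
    """Fire A r_a times and B as early as possible.

    Returns (tokens left on the edge, peak queue occupancy).
    """
    fires_a, fires_b = repetitions(p, q)
    tokens = peak = 0
    remaining_b = fires_b
    for _ in range(fires_a):
        tokens += p  # A fires and pushes p items
        peak = max(peak, tokens)
        while remaining_b > 0 and tokens >= q:
            tokens -= q  # B fires and pops q items
            remaining_b -= 1
    return tokens, peak
```

With p = 2 and q = 3, A fires three times and B twice per iteration, the edge ends empty, and the FIFO never holds more than four items, so a bounded buffer can be allocated statically.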

44 Average Pairs to Synchronous Dataflow Graph
[Figure: Source (push=1) -> Average Pairs (peek=2, pop=1, push=1) -> Sink (pop=1)]
- The SDF model is extended to support peek semantics

45 Average Pairs to Synchronous Dataflow Graph (cont'd)
[Figure, animated over slides 45 to 49: tokens flowing through Source -> Average Pairs -> Sink]
- For details about the SDF model, see Lee's '87 paper [8]

50 Some Projects Based on the SDF Model
- Early Ptolemy project at UC Berkeley: software synthesis for embedded systems
- StreamIt at MIT: streaming programming language and compiler
- InfoSphere Streams and SPL: IBM stream computing product
- Our work, COStream: hierarchical dataflow programming language and compiler
- OpenStream: language and compiler support for streaming in OpenMP

51 Resources
- Early Ptolemy project at UC Berkeley
- StreamIt at MIT
- InfoSphere Streams and SPL
- COStream
- OpenStream

53 Concurrent Transactions
Scenario: a simple shared hash table
- Shared by two concurrent users
- Either user may search for the value corresponding to a key, and either user may add or delete entries
- Implemented using a concurrent shared queue
[Figure: User A and User B -> Enter Requests -> Request Queue -> Process Queue]

54 Determinate and Repeatable
- A system is determinate if the same ultimate output is produced for every run of the system for a presented input, and this is true for all choices of input and for all interpretations of operators.
- A parallel program schema is said to be repeatable if, for a specific interpretation of the operators of the schema, the output for all runs will be strictly a function of the input.

55 Determinate
[Figure: input x feeds f() and g(), whose results enter a Merge; for all interpretations the output may be f(x), g(x) or g(x), f(x), so the schema is non-determinate]
- Black box principle: observe only the input and output
- Determinacy requires that a program schema produce the same results for a given input regardless of the specific functions assigned to its operators

56 Repeatable
[Figure: the same schema under the specific interpretation f() = g(); the output is f(x), f(x) in every run, so the program is repeatable]
- White box principle
- A program, with specified operations, is repeatable if any run of the program for a given input produces the same result
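
The merge example can be made concrete by enumerating every order in which the merge might deliver its inputs. This is an illustrative sketch: a real run would pick one order nondeterministically, and the helper names are hypothetical.

```python
from itertools import permutations


def merge_outputs(x, operators):
    """All output sequences a non-determinate merge could produce for x."""
    results = [op(x) for op in operators]
    return set(permutations(results))


def is_repeatable_for(x, operators):
    """True when every possible run yields the same output for input x."""
    return len(merge_outputs(x, operators)) == 1
```

With f(x) = x + 1 and g(x) = 2x, input 3 can yield (4, 6) or (6, 4), so the schema is not determinate. Under the interpretation f = g, every run yields the same pair and the program is repeatable.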

57 Determinate and Repeatable
[Figure: relationship between the Determinate and Repeatable classes]

58 Revisit the Concurrent Request Example
[Figure: Request A and Request B -> Enter Requests -> Request Queue -> Process Queue]
- Non-determinate: the order of the results of request A and request B cannot be fixed
- Non-repeatable: Enter Requests is interpreted as a non-determinate merge

59 Support Transaction Using Guard in Fresh Breeze
- Guard object: a special data object that can only be accessed by the GuardSwap instruction
- GuardSwap: an atomic instruction that puts a new data object into the guard and returns the old data object held in the guard
- For the concurrent request example:
  - A guard is used to lock the tail of the queue
  - Each request needs to get the guard before being added to the tail of the queue

60 Concurrent Writes to Queue
[Figure, animated over slides 60 to 64: a queue of entries linked by futures, starting at head, with a guard holding the undefined tail future]
1. Two requests arrive: Request A and Request B each bring a new entry (RA, RB) whose next field is an undefined future
2. Both requests contend for the guard with guardswap (atomic)
3. Request A gets the guard and the old tail
4. Request A substitutes the old tail with the new request using WriteFuture (atomic)
5. Request B gets the guard and is added to the tail
Homework: how can concurrent reads be supported?

65 Concurrent Request Example in Code
[Figure: Request A and Request B -> Enter Requests -> Request Queue -> Process Queue]
    // Init queuehead, queuetail
    TransactionManager() {
        queuehead = queuetail = new Future<QueueEntry>();
        Guard managerguard = new Guard(queuetail);
        processqueue(queuehead);
    }

    // Enter Requests: swap the new tail into the guard, then link the entry at the old tail
    enterrequest(RQ request) {
        QueueEntry newentry = new QueueEntry(request);
        Future<QueueEntry> oldtail = managerguard.guardswap(newentry.next);
        FutureWrite(oldtail, newentry);
    }

    // Process Queue: read the head entry, process it, then continue with the next entry
    processqueue(Future<QueueEntry> reqqueue) {
        QueueEntry entry = reqqueue.futureread();
        RQ request = entry.request;
        request.processrequest();
        Future<QueueEntry> newhead = entry.next;
        processqueue(newhead);
    }
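
The same structure can be run in Python under a few assumptions: a lock-protected swap stands in for the atomic GuardSwap instruction, an event-based cell stands in for the future, and the tail-recursive processqueue becomes a loop over a known number of requests so that it terminates. All class and parameter names are illustrative.

```python
import threading


class Future:
    """Minimal write-once cell: read() waits until write() has happened."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def write(self, value):
        self._value = value
        self._ready.set()

    def read(self):
        self._ready.wait()
        return self._value


class Guard:
    """Holds one object that can only be exchanged atomically."""

    def __init__(self, obj):
        self._lock = threading.Lock()
        self._obj = obj

    def guardswap(self, new_obj):
        with self._lock:
            old, self._obj = self._obj, new_obj
            return old


class QueueEntry:
    def __init__(self, request):
        self.request = request
        self.next = Future()  # undefined until the next request arrives


class TransactionManager:
    def __init__(self):
        self.queuehead = Future()
        self._guard = Guard(self.queuehead)  # guard holds the current tail

    def enterrequest(self, request):
        entry = QueueEntry(request)
        # Claim the old tail and install this entry's next as the new tail.
        oldtail = self._guard.guardswap(entry.next)
        oldtail.write(entry)  # link the entry into the queue

    def processqueue(self, count, handler):
        cursor = self.queuehead
        for _ in range(count):
            entry = cursor.read()  # waits until the entry is linked
            handler(entry.request)
            cursor = entry.next
```

Each caller of `guardswap` receives a distinct old tail, so every future is written exactly once and the requests of one user keep their relative order, while the interleaving between users remains non-determinate.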

66 Conclusion
- The codelet-based fine-grain execution model is promising for DDDAS
- Streaming and transactions are two important features that support interactive computing
- Future and Guard are two special mechanisms that support synchronization and concurrency

67 References
1. Dennis, J., Arvind, Gao, G., Li, X., Wang, L.P.: CH1: Architecture and Programming Model for High Performance Interactive Computation (2014)
2. Dennis, J.: Stream data types for signal processing. In: Advanced Topics in Dataflow Computing and Multithreading. pp. IEEE Computer Society Press (1995)
3. Dennis, J.: The Fresh Breeze Model of Thread Execution. In: Workshop on Programming Models for Ubiquitous Parallelism (2006)
4. Dennis, J.: A Fresh Breeze Processor Supporting Instruction Level Parallelism. In: Proceedings of Data-Flow Execution Models for Extreme Scale Computing (DFM) (2013)
5. Dennis, J., Gao, G.: Multiprocessor implementation of nondeterminate computations in a functional programming framework. Technical Report TM-375, Computation Structures Group, MIT, Cambridge, MA (1995), pubs/memos/memo-375/memo-375.pdf
6. Dennis, J., Gao, G., Sarkar, V.: Determinacy and repeatability of parallel program schemata. In: Proceedings of Data-Flow Execution Models for Extreme Scale Computing (DFM). pp. (September 2012)
7. Dennis, J., Gao, G., Meng, X.: Experiments with the Fresh Breeze tree-based memory model. International Symposium on Supercomputing, Hamburg (June 2011)
8. Lee, E.A., Messerschmitt, D.G.: Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Computers 36(1): (1987)


More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

CSE 544 Principles of Database Management Systems

CSE 544 Principles of Database Management Systems CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due

More information

Problem Set 2. CS347: Operating Systems

Problem Set 2. CS347: Operating Systems CS347: Operating Systems Problem Set 2 1. Consider a clinic with one doctor and a very large waiting room (of infinite capacity). Any patient entering the clinic will wait in the waiting room until the

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT

Lecture 14. Synthesizing Parallel Programs IAP 2007 MIT 6.189 IAP 2007 Lecture 14 Synthesizing Parallel Programs Prof. Arvind, MIT. 6.189 IAP 2007 MIT Synthesizing parallel programs (or borrowing some ideas from hardware design) Arvind Computer Science & Artificial

More information

Introduction to Computer Systems /18-243, fall th Lecture, Dec 1

Introduction to Computer Systems /18-243, fall th Lecture, Dec 1 Introduction to Computer Systems 15-213/18-243, fall 2009 24 th Lecture, Dec 1 Instructors: Roger B. Dannenberg and Greg Ganger Today Multi-core Thread Level Parallelism (TLP) Simultaneous Multi -Threading

More information

Intel Thread Building Blocks, Part IV

Intel Thread Building Blocks, Part IV Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for

More information

Dynamic Expressivity with Static Optimization for Streaming Languages

Dynamic Expressivity with Static Optimization for Streaming Languages Dynamic Expressivity with Static Optimization for Streaming Languages Robert Soulé Michael I. Gordon Saman marasinghe Robert Grimm Martin Hirzel ornell MIT MIT NYU IM DES 2013 1 Stream (FIFO queue) Operator

More information

Future Directions. Edward A. Lee. Berkeley, CA May 12, A New Computational Platform: Ubiquitous Networked Embedded Systems. actuate.

Future Directions. Edward A. Lee. Berkeley, CA May 12, A New Computational Platform: Ubiquitous Networked Embedded Systems. actuate. Future Directions Edward A. Lee 6th Biennial Ptolemy Miniconference Berkeley, CA May 12, 2005 A New Computational Platform: Ubiquitous Networked Embedded Systems sense actuate control Ptolemy II support

More information

Deterministic Concurrency

Deterministic Concurrency Candidacy Exam p. 1/35 Deterministic Concurrency Candidacy Exam Nalini Vasudevan Columbia University Motivation Candidacy Exam p. 2/35 Candidacy Exam p. 3/35 Why Parallelism? Past Vs. Future Power wall:

More information

SWARM Tutorial. Chen Chen 4/12/2012

SWARM Tutorial. Chen Chen 4/12/2012 SWARM Tutorial Chen Chen 4/12/2012 1 Outline Introduction to SWARM Programming in SWARM Atomic Operations in SWARM Parallel For Loop in SWARM 2 Outline Introduction to SWARM Programming in SWARM Atomic

More information

EE213A - EE298-2 Lecture 8

EE213A - EE298-2 Lecture 8 EE3A - EE98- Lecture 8 Synchronous ata Flow Ingrid Verbauwhede epartment of Electrical Engineering University of California Los Angeles ingrid@ee.ucla.edu EE3A, Spring 000, Ingrid Verbauwhede, UCLA - Lecture

More information

CSE Traditional Operating Systems deal with typical system software designed to be:

CSE Traditional Operating Systems deal with typical system software designed to be: CSE 6431 Traditional Operating Systems deal with typical system software designed to be: general purpose running on single processor machines Advanced Operating Systems are designed for either a special

More information

Implementation of Process Networks in Java

Implementation of Process Networks in Java Implementation of Process Networks in Java Richard S, Stevens 1, Marlene Wan, Peggy Laramie, Thomas M. Parks, Edward A. Lee DRAFT: 10 July 1997 Abstract A process network, as described by G. Kahn, is a

More information

Concurrent Component Patterns, Models of Computation, and Types

Concurrent Component Patterns, Models of Computation, and Types Concurrent Component Patterns, Models of Computation, and Types Edward A. Lee Yuhong Xiong Department of Electrical Engineering and Computer Sciences University of California at Berkeley Presented at Fourth

More information

Applying Models of Computation to OpenCL Pipes for FPGA Computing. Nachiket Kapre + Hiren Patel

Applying Models of Computation to OpenCL Pipes for FPGA Computing. Nachiket Kapre + Hiren Patel Applying Models of Computation to OpenCL Pipes for FPGA Computing Nachiket Kapre + Hiren Patel nachiket@uwaterloo.ca Outline Models of Computation and Parallelism OpenCL code samples Synchronous Dataflow

More information

StreamIt on Fleet. Amir Kamil Computer Science Division, University of California, Berkeley UCB-AK06.

StreamIt on Fleet. Amir Kamil Computer Science Division, University of California, Berkeley UCB-AK06. StreamIt on Fleet Amir Kamil Computer Science Division, University of California, Berkeley kamil@cs.berkeley.edu UCB-AK06 July 16, 2008 1 Introduction StreamIt [1] is a high-level programming language

More information

AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1

AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 Virgil Andronache Richard P. Simpson Nelson L. Passos Department of Computer Science Midwestern State University

More information

Threads. Computer Systems. 5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1

Threads. Computer Systems.   5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1 Threads CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 5/12/2009 cse410-20-threads 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and References Reading» Read

More information

The Problem with Treads

The Problem with Treads The Problem with Treads Edward A. Lee Programming Technology Lecture 2 11/09/08 Background on Edward A. Lee Bachelors degree (Yale University) (1979) Master degree (MIT) (1981) Ph.D. (U. C. Berkeley) (1986)

More information

Pointers II. Class 31

Pointers II. Class 31 Pointers II Class 31 Compile Time all of the variables we have seen so far have been declared at compile time they are written into the program code you can see by looking at the program how many variables

More information

Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed

Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed Marco Aldinucci Computer Science Dept. - University of Torino - Italy Marco Danelutto, Massimiliano Meneghin,

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2009 Lecture 4: Threads Geoffrey M. Voelker Announcements Homework #1 due now Project 0 due tonight Project 1 out April 9, 2009 CSE 120 Lecture 4 Threads

More information

Assessment of New Languages of Multicore Processors

Assessment of New Languages of Multicore Processors Assessment of New Languages of Multicore Processors Mahmoud Parmouzeh, Department of Computer, Shabestar Branch, Islamic Azad University, Shabestar, Iran. parmouze@gmail.com AbdolNaser Dorgalaleh, Department

More information

Concurrent Models of Computation for Embedded Software

Concurrent Models of Computation for Embedded Software Concurrent Models of Computation for Embedded Software Edward A. Lee Professor, UC Berkeley EECS 219D Concurrent Models of Computation Fall 2011 Copyright 2009-2011, Edward A. Lee, All rights reserved

More information

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa

Multithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is

More information

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Fall 2015 Lecture 4: Threads Geoffrey M. Voelker Announcements Project 0 due Project 1 out October 6, 2015 CSE 120 Lecture 4 Threads 2 Processes Recall that a process

More information

Scheduling FFT Computation on SMP and Multicore Systems Ayaz Ali, Lennart Johnsson & Jaspal Subhlok

Scheduling FFT Computation on SMP and Multicore Systems Ayaz Ali, Lennart Johnsson & Jaspal Subhlok Scheduling FFT Computation on SMP and Multicore Systems Ayaz Ali, Lennart Johnsson & Jaspal Subhlok Texas Learning and Computation Center Department of Computer Science University of Houston Outline Motivation

More information

CSE 544: Principles of Database Systems

CSE 544: Principles of Database Systems CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;

More information

CSC2/458 Parallel and Distributed Systems Scalable Synchronization

CSC2/458 Parallel and Distributed Systems Scalable Synchronization CSC2/458 Parallel and Distributed Systems Scalable Synchronization Sreepathi Pai February 20, 2018 URCS Outline Scalable Locking Barriers Outline Scalable Locking Barriers An alternative lock ticket lock

More information

Understandable Concurrency

Understandable Concurrency Edward A. Lee Professor, Chair of EE, and Associate Chair of EECS Director, CHESS: Center for Hybrid and Embedded Software Systems Director, Ptolemy Project UC Berkeley Chess Review November 21, 2005 Berkeley,

More information

Interface Automata and Actif Actors

Interface Automata and Actif Actors Interface Automata and Actif Actors H. John Reekie Dept. of Electrical Engineering and Computer Science University of California at Berkeley johnr@eecs.berkeley.edu Abstract This technical note uses the

More information

Hierarchical FSMs with Multiple CMs

Hierarchical FSMs with Multiple CMs Hierarchical FSMs with Multiple CMs Manaloor Govindarajan Balasubramanian Manikantan Bharathwaj Muthuswamy (aka Bharath) Reference: Hierarchical FSMs with Multiple Concurrency Models. Alain Girault, Bilung

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency

CS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency CS 152 Computer Architecture and Engineering Lecture 19: Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste

More information

Disciplined Concurrent Models of Computation for Parallel Software

Disciplined Concurrent Models of Computation for Parallel Software Disciplined Concurrent Models of Computation for Parallel Software Edward A. Lee Robert S. Pepper Distinguished Professor and UC Berkeley Invited Keynote Talk 2008 Summer Institute The Concurrency Challenge:

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Region-Based Memory Management for Task Dataflow Models

Region-Based Memory Management for Task Dataflow Models Region-Based Memory Management for Task Dataflow Models Nikolopoulos, D. (2012). Region-Based Memory Management for Task Dataflow Models: First Joint ENCORE & PEPPHER Workshop on Programmability and Portability

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

POSIX Threads: a first step toward parallel programming. George Bosilca

POSIX Threads: a first step toward parallel programming. George Bosilca POSIX Threads: a first step toward parallel programming George Bosilca bosilca@icl.utk.edu Process vs. Thread A process is a collection of virtual memory space, code, data, and system resources. A thread

More information

Threads Questions Important Questions

Threads Questions Important Questions Threads Questions Important Questions https://dzone.com/articles/threads-top-80-interview https://www.journaldev.com/1162/java-multithreading-concurrency-interviewquestions-answers https://www.javatpoint.com/java-multithreading-interview-questions

More information

IBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave

IBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave 6.827 Multithreaded Parallelism: Languages and Compilers Fall 2006 Lecturer: TA: Assistant: Arvind Nirav Dave Sally Lee L01-1 IBM Power 5 130nm SOI CMOS with Cu 389mm 2 2GHz 276 million transistors Dual

More information

Concurrent ML. John Reppy January 21, University of Chicago

Concurrent ML. John Reppy January 21, University of Chicago Concurrent ML John Reppy jhr@cs.uchicago.edu University of Chicago January 21, 2016 Introduction Outline I Concurrent programming models I Concurrent ML I Multithreading via continuations (if there is

More information

Lecture 15: I/O Devices & Drivers

Lecture 15: I/O Devices & Drivers CS 422/522 Design & Implementation of Operating Systems Lecture 15: I/O Devices & Drivers Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions

More information

Marwan Burelle. Parallel and Concurrent Programming

Marwan Burelle.   Parallel and Concurrent Programming marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 Solutions Overview 1 Sharing Data First, always try to apply the following mantra: Don t share data! When non-scalar data are

More information

Lecture 4: Threads; weaving control flow

Lecture 4: Threads; weaving control flow Lecture 4: Threads; weaving control flow CSE 120: Principles of Operating Systems Alex C. Snoeren HW 1 Due NOW Announcements Homework #1 due now Project 0 due tonight Project groups Please send project

More information

Embedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines

Embedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines Embedded Systems 8 - - Dataflow modeling Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines processes (activities that transform data from one

More information

Parallel Program Execution and Architecture Models With Dataflow Origin -- The EARTH Experience

Parallel Program Execution and Architecture Models With Dataflow Origin -- The EARTH Experience Parallel Program Execution and Architecture Models With Dataflow Origin -- The EARTH Experience Guang R. Gao ACM Fellow and IEEE Fellow Endowed Distinguished Professor Electrical & Computer Engineering

More information

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science.

SYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science. ALEWIFE SYSTEMS MEMO #12 A Synchronization Library for ASIM Beng-Hong Lim (bhlim@masala.lcs.mit.edu) Laboratory for Computer Science Room NE43-633 January 9, 1992 Abstract This memo describes the functions

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

ADAPTIVE TASK SCHEDULING USING LOW-LEVEL RUNTIME APIs AND MACHINE LEARNING

ADAPTIVE TASK SCHEDULING USING LOW-LEVEL RUNTIME APIs AND MACHINE LEARNING ADAPTIVE TASK SCHEDULING USING LOW-LEVEL RUNTIME APIs AND MACHINE LEARNING Keynote, ADVCOMP 2017 November, 2017, Barcelona, Spain Prepared by: Ahmad Qawasmeh Assistant Professor The Hashemite University,

More information