Architecture and Programming Model for High Performance Interactive Computation
|
|
- Virgil Mills
- 5 years ago
- Views:
Transcription
1 Architecture and Programming Model for High Performance Interactive Computation Based on Title of TM127, CAPSL TM-127 By Jack B. Dennis, Arvind, Guang R. Gao, Xiaoming Li and Lian-Ping Wang April 3rd, 2014 Haitao Wei CAPSL at UDEL CPEG-852:Advanced Topics in Computing Systems 4/22/14
2 Outline Introduction to DDDAS/Interaction Computation An Example and Problems Fresh Breeze Execution Model and Architecture Execution Model Memory Model Task Model Streaming and Transactions Stream Type and Operations Concurrency Operations of Transaction Style Conclusion 1
3 Outline Introduction to DDDAS/Interaction Computation An Example and Problems Fresh Breeze Execution Model and Architecture Execution Model Memory Model Task Model Streaming and Transactions Stream Type and Operations Concurrency Operations of Transaction Style Conclusion 2
4 An Example of DDDAS/Interaction Computation Radio Astronomy Receipt signals Antenna Array Antenna Array Filter signals and control Local Processor Local Processor Analyze signals Data Analysis Observer Make decision and change parameters 3
5 Dynamic Data Driven Application System (DDDAS) Challenges real time interaction with parts of the physical environment. management of processing and memory resources according to dynamic needs generated by local events input and output devices process streams of data items make decisions about the work using transaction processing 4
6 Our Solutions: Programming Model and Architecture Support Fresh Breeze Execution Model and Architecture based on codelet execution model support fine-grained execution and memory management Streaming support streaming data expression and operations Transaction support concurrency operations of transaction style 5
7 Outline Introduction to DDDAS/Interaction Computation An Example and Problems Fresh Breeze Execution Model and Architecture Execution Model Memory Model Task Model Streaming and Transactions Stream Type and Operations Concurrency Operations of Transaction Style Conclusion 6
8 CPU CPU Memory Memory Thread Unit Executor Locus A Single Thread Thread Unit Executor Locus A Thread Pool Coarse-Grain thread- The family home model Fine-Grain non-preemptive thread- The hotel model Coarse-Grain vs. Fine-Grain Multithreading 7
9 Case Studies of Fine-Gran Execution Models Dataflow Model (1970s - ) EARTH Model ( ) HTVM Model ( ) Fresh Breeze Model (2000 -) Runnemede Model (2010- ) 8
10 Fresh Breeze Execution Model Task Model A set of rules for creating, destroying and managing threads Execution Model Memory Model Dictate the ordering of memory operations Synchronization Model Provide a set of mechanisms to protect from data races The Abstract Machine 9
11 Fresh Breeze Memory Model -- Main Features and Vision Global shared name space with one-level store A single-update storage model to eliminate the cache-coherence problem Concept of sealed memory chunks/sections with single assigned property Trees of fixed-sized chuncks Fine-Grain memory management support memory allocation and data transfer is performed entirely by architecture/hardware mechanisms[nd 10 security
12 Fresh Breeze Memory Model Master Chunk Data Chunk data item DAG heap chunk handle Arrays as Trees of Chunks Write Once then Read only Fix chunk size: 128 Bytes: 16 doubles, 32 integers, Chunk handle: 64 bits unique identifier Arrays: Three levels yields 4096 elements(longs) 11
13 Task/Concurrency Model Master Task Spawned Tasks spawn(n) Worker 0 Worker n-1 join_update join_update join_fetch Join ContinuationTask - Asynchronous tasking - Continuation Task receives children s results - Non-blocking continuation - Light-Weight Tasks 12
14 Example Dot Product sum=0; for(i=0;i<16*16*16;i++) sum+=a[i]*b[i]; Sync Chunk Build Vector Build Vector Build Vector Step1: Build Vector Join-update Build Chunk A0~A15 Data Chunk 16 elements Build Build Chunk Chunk A240~A255 Build Chunk Build Done Build Done Continue Codelet 13
15 Example Dot Product sum=0; for(i=0;i<16*16*16;i++) sum+=a[i]*b[i]; partial sum sum Traverse Vector partial sum Traverse Vector Traverse Vector Step2: Compute Comput e Comput Comput e e Comput e A0~A15 sum B0~B15 Reduce Reduce Reduce 14
16 Outline Introduction to DDDAS/Interaction Computation An Example and Problems Fresh Breeze Execution Model and Architecture Execution Model Memory Model Task Model Streaming and Transactions Stream Type and Operations Concurrency Operations of Transaction Style Conclusion 15
17 What We Will Learn Streaming stream data type and operations Future structure to implement streaming streaming on dataflow model: static dataflow and synchronous dataflow Transactions Determinate and Repeatable Using Future and Guard to implement concurrency operations of transaction style 16
18 Stream Type and Operations Stream: A sequence of values of type, maybe infinite Define a stream Stream <DataItem> instream = new Stream <DataItem>(); DateItem can be any data type Concatenate two streams Stream <DataItem> strm1 = strm0 + new Stream <DataItem>{i0, i1,... } Get first element in stream strm.first(); 17
19 Stream Type and Operations (cont d) Remove the first element in stream Stream <DataItem> strm1 = strm0.rest () Stream <DataItem> strm = strm.first () + strm.rest () Append an data item to stream strm.append(item) ; It is the end of data stream if ( strm.moredata ()) { statement } 18
20 An Stream Example Average Pairs Source Module Average Pairs Sink Module Stream <int> sourcestream = new Stream <int> ( ) ; while ( true ) { int itm = genedataitem (); sourcestream.append (itm); } result sourcestream ; while(instream.moredata()) { int itm0 = instream.first (); instream = instream.rest (); int itm1 = instream.first (); instream = instream.rest (); outstream.append (( itm0 + itm1 ) / 2); } result outstream ; while (instream.moredata ()) { achive.put (instream.first()); instream = instream.rest (); } result achive.getachive(); 19
21 Stream Implementation in FreshBreeze Stream representation a linear chain of chunks, each chunk holds data items and a reference to the next chunk Stream operations FIFO queue operations on chain of chunks read from the head of the chain of chunks, write to the tail of the chain of chunks Synchronization between Producer and Consumer Special Object: Future 20
22 Future A future is a memory cell with a state waiting to receive a data value:status: undefined, defined, waiting Future Read and Future Write are Atomic 1. create future 2. T1 write future 3. T2 read future undef Data defined Data defined Read After Write T2 gets Data 21
23 Future (Cont d) A future is a memory cell with a state waiting to receive a data value:status: undefined, defined, waiting Future Read and Future Write are Atomic 1. create future 2. T1 read future 3. T2 read future 4. T3 write future undef waiting waiting Data defined T1 Read T1 Read T1 Data T2 Read T2 Data Write After Read 22
24 Stream Operation Based on Future Fresh Breeze Instruction Set Support 4 stream operations New, Append, First and Rest 1. new stream undef 23
25 Stream Operation Based on Future Fresh Breeze Instruction Set Support 4 stream operations New, Append, First and Rest 1. new stream 2. append Data1 defined undef 24
26 Stream Operation Based on Future Fresh Breeze Instruction Set Support 4 stream operations New, Append, First and Rest 1. new stream 2. append 3. first Data1 defined undef Data1 25
27 Stream Operation Based on Future Fresh Breeze Instruction Set Support 4 stream operations New, Append, First and Rest 1. new stream 2. append 3. first 4. rest Data1 undef 26
28 Related Works on Streaming Based on Dataflow Model Streaming based on Static Dataflow Model Streaming based Synchronized Dataflow Model 27
29 Translate Average Pairs to Dataflow Graph Tail Recursive AveragePairs D reset AveragePairs Extract Compute [1] [2] + /2 concat R 28
30 Translate Average Pairs to Dataflow Graph General AveragePairs Compute D *TF T I + /2 R 29
31 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 4321 TTF T I + /2 R 30
32 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 432 TT F 1 T 1 I + /2 R 31
33 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 432 TT T I 1 + /2 R 32
34 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 43 T T T 2 2 I 1 + /2 R 33
35 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 43 T T 2 I /2 R 34
36 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 43 T T T 2 I 3 + /2 R 35
37 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 4 T T T 3 3 I 2 + /2 1 R 36
38 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 4 T T 3 2 I 3 + /2 R 1 37
39 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 4 T T 3 I + /2 R
40 Translate Average Pairs to Dataflow Graph General AveragePairs Compute 5 T T T 4 4 I 3 + /2 R
41 Can Any Streaming Program Work? Translate any streaming program into dataflow graph---recursive and General Answer is YES, see Jack 94 s Paper [2] 40
42 Related Works on Streaming Based on Dataflow Model Streaming based on Static Dataflow Model Streaming based Synchronized Dataflow Model 41
43 Synchronous Dataflow Model Synchronous Dataflow Graph (SDF) Node (actor) represents computation Edge is FIFO queue representing data communication Weights on Edge presents producing rate and consuming rate A p FIFO q B Each time firing actor A produces p data items and actor B consumes q data items. 42
44 Average Pairs to Synchronous Dataflow Graph Source push=1 peek=2 pop=1 Average Pairs push=1 pop=1 Sink Extend the SDF to support peek sematic 43
45 Average Pairs to Synchronous Dataflow Graph Source Average Pairs Sink 44
46 Average Pairs to Synchronous Dataflow Graph Source Average Pairs Sink 45
47 Average Pairs to Synchronous Dataflow Graph Source Average Pairs Sink 46
48 Average Pairs to Synchronous Dataflow Graph Source Average Pairs Sink 47
49 Average Pairs to Synchronous Dataflow Graph Source Average Pairs Sink Details about SDF model, see Lee 87 s Paper [8] 48
50 Some Projects Based on SDF model Early Ptolemy Project at UC Berkeley Software Synthesis for Embedded system StreamIt at MIT streaming program language and compiler InforStream and SPL IBM streaming computing product Our work COStream hierarchical date flow programming language and compiler OpenStream language and compiler support for streaming in OpenMP 49
51 Resources Early Ptolemy Project at UC Berkeley StreamIt at MIT InforStream and SPL COStream OpenStream 50
52 Outline Introduction to DDDAS/Interaction Computation An Example and Problems Fresh Breeze Execution Model and Architecture Execution Model Memory Model Task Model Streaming and Transactions Stream Type and Operations Transactions Conclusion 51
53 Concurrent Transactions Scenario: A Simple Shared Hash Table Shared by two concurrent users. Either user may search the value corresponding to a key, and either user may add or delete entries Using concurrent shared queue User A User B Enter Requests Request Queue Process Queue 52
54 Determinate and Repeatable A system is determinate if the same ultimate output is produced for every run of the system for a presented input, and this is true for all choices of input and for all interpretations of operators. A parallel program schema is said to be repeatable if for a specific interpretation of the operators of the schema, the output for all runs will be strictly a function of the input. 53
55 Determinate f() g() Input: x for all interpretation Merge Output: f(x), g(x) g(x), f(x) Non-Determinate Observe the input and output Black Box Principle Determinant requires that a program schema produce the same results for given input regardless of the specific functions assigned to its operators 54
56 Repeatable f() Merge g() Input: x for specific interpretation f()=g() Output: f(x), f(x) Repeatable White Box Principle A program, with specified operations, is repeatable if any run of the program for a given input produces the same result 55
57 Determinate and Repeatable Determinate Repeatable 56
58 Revisit the Concurrent Request Example Request A Request B Enter Requests Request Queue Process Queue Non-determinate The result order of request A and request B can not be fixed Non-Repeatable Enter requests is interpreted as a non-determinate merge 57
59 Support Transaction Using Guard In FreshBreeze Guard object special data object which can only be accessed by GuardSwap instruction GuardSwap atomic instruction put the new data object into guard, and return the old data object in guard For the Concurrent Request Example using a guard to lock the tail of the queue each request needs to get the guard before be added to the tail of the queue 58
60 Concurrent Writes to Queue Two requests arrive Request A head RA defined undef RA defined undef guard RB defined undef Request B 59
61 Concurrent Writes to Queue Contend the guard guardswap (atomic) Request A head RA defined undef RA defined undef guard RB defined undef guardswap (atomic) Request B 60
62 Concurrent Writes to Queue Request A gets the guard and old tail guardswap (atomic) Request A head RA defined undef RA defined undef guard RB defined undef Request B 61
63 Concurrent Writes to Queue Request A substitute the old tail with the new request WriteFuture (atomic) Request A head RA defined undef RA defined guard RB defined undef Request B 62
64 Concurrent Writes to Queue head Request B gets guard and add to the tail RA Request A defined RA defined RB defined undef Homework: Support concurrent reads? guard Request B 63
65 Request B Enter Requests Request Queue Request A // Init queuehead, queuetail; TransactionManager () { queuehead =queuetail new Future <QueueEntry> (); Guard managerguard = new Guard (queuetail); processqueue ( queuehead ) ; } enterrequest ( RQ request ) { QueueEntry newentry = new QueueEntry ( request ); Future <QueueEntry> oldtail = managerguard.guardswap (newentry.next); FutureWrite ( oldtail, newentry ); } Process Queue processqueue( Future <QueueEntry> reqqueue ) { QueueEntry entry = reqqueue.futureread (); <RQ> request = entry. request ; request. processrequest (); Future <QueueEntry> newhead = entry.next ; processqueue ( newhead ) ;} 64
66 Conclusion Codelet based fine-grain execution model is promising for DDDAS Streaming and Transaction are two important features that support Interactive computing Future and Guard are two special mechanisms to support synchronization and concurrency. 65
67 Reference 1. Dennis, J., Arvind, Gao, G., Li, X., Wang, L.P.: CH1: Architecture and Programming Model for High Performance Interactive Computation (2014) 2. Dennis, J.: Stream data types for signal processing. In: Advanced Topics in Dataflow Computing and Multithreading. pp IEEE Computer Society Press (1995) 3. Dennis, J.: The Fresh Breeze Model of Thread Execution. In Workshop on Programming Models for Ubiquitous Parallelism (2006) 4. Dennis, J.: A Fresh Breeze Processor Supporting Instruction Level Parallelism. Proceedings of Data-Flow Execution Models for Extreme Scale Computing (DFM) (2013) 5. Dennis, J., Gao, G.: Multiprocessor implementation of nondeterminate computations in a functional programming framework, Technical Report TM-375. Computation Structures Group, MIT, Cambridge, MA (1995), pubs/memos/memo-375/memo-375.pdf 6. Dennis, J., Gao, G., Sarkar, V.: Determinacy and repeatability of parallel program schemata. Proceedings of Data- Flow Execution Models for Extreme Scale Computing (DFM) pp (September 2012) 7. Dennis, J., Gao, G., Meng, X.: Experiments with the Fresh Breeze tree-based memory model. International Symposium on Supercomputing, Hamburg (June 2011) 8. Edward A. Lee, David G. Messerschmitt: Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Computers 36(1): (1987)
68 Concurrent Writes to Queue guardswap (atomic) Request A head RA defined undef RA defined undef guard guard RB defined undef guardswap (atomic) Request B 67
Toward A Codelet Based Execution Model and Its Memory Semantics
Toward A Codelet Based Execution Model and Its Memory Semantics -- For Future Extreme-Scale Computing Systems HPC Worshop Centraro, Italy June 20, 2012 Guang R. Gao ACM Fellow and IEEE Fellow Distinguished
More informationArchitecture and Programming Models for High Performance Intensive Computation
AFRL-AFOSR-VA-TR-2016-0230 Architecture and Programming Models for High Performance Intensive Computation XiaoMing Li UNIVERSITY OF DELAWARE 06/29/2016 Final Report Air Force Research Laboratory AF Office
More informationThe Fresh Breeze Model of Thread Execution
The Fresh Breeze Model of Thread Execution ABSTRACT We present the program execution model developed for the Fresh Breeze Project, which has the goal of developing a multi-core chip architecture that supports
More informationCPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation
CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Dynamic Dataflow Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering
More informationFundamental Algorithms for System Modeling, Analysis, and Optimization
Fundamental Algorithms for System Modeling, Analysis, and Optimization Stavros Tripakis, Edward A. Lee UC Berkeley EECS 144/244 Fall 2014 Copyright 2014, E. A. Lee, J. Roydhowdhury, S. A. Seshia, S. Tripakis
More informationTopic C Memory Models
Memory Memory Non- Topic C Memory CPEG852 Spring 2014 Guang R. Gao CPEG 852 Memory Advance 1 / 29 Memory 1 Memory Memory Non- 2 Non- CPEG 852 Memory Advance 2 / 29 Memory Memory Memory Non- Introduction:
More informationSoftware Synthesis Trade-offs in Dataflow Representations of DSP Applications
in Dataflow Representations of DSP Applications Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park
More informationCPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation
CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University
More informationStreamIt: A Language for Streaming Applications
StreamIt: A Language for Streaming Applications William Thies, Michal Karczmarek, Michael Gordon, David Maze, Jasper Lin, Ali Meli, Andrew Lamb, Chris Leger and Saman Amarasinghe MIT Laboratory for Computer
More informationCache Aware Optimization of Stream Programs
Cache Aware Optimization of Stream Programs Janis Sermulins, William Thies, Rodric Rabbah and Saman Amarasinghe LCTES Chicago, June 2005 Streaming Computing Is Everywhere! Prevalent computing domain with
More informationUniversity of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Massively Multi-Core Systems and Virtual Memory Guang R. Gao and Jack B. Dennis
More informationConcurrent Models of Computation
Concurrent Models of Computation Edward A. Lee Robert S. Pepper Distinguished Professor, UC Berkeley EECS 219D: Concurrent Models of Computation Fall 2011 Copyright 2011, Edward A. Lee, All rights reserved
More informationLecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory
Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section
More informationMultigrain Parallelism: Bridging Coarse- Grain Parallel Languages and Fine-Grain Event-Driven Multithreading
Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL Multigrain Parallelism: Bridging Coarse- Grain Parallel Languages and Fine-Grain Event-Driven
More informationOverview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions
CMSC 330: Organization of Programming Languages Multithreaded Programming Patterns in Java CMSC 330 2 Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to
More informationEECS 583 Class 20 Research Topic 2: Stream Compilation, GPU Compilation
EECS 583 Class 20 Research Topic 2: Stream Compilation, GPU Compilation University of Michigan December 3, 2012 Guest Speakers Today: Daya Khudia and Mehrzad Samadi nnouncements & Reading Material Exams
More informationEECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization
EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization Dataflow Lecture: SDF, Kahn Process Networks Stavros Tripakis University of California, Berkeley Stavros Tripakis: EECS
More informationDennis [1] developed the model of dataflowschemas, building on work by Karp and Miller [2]. These dataflow graphs, as they were later called, evolved
Predictive Block Dataflow Model for Parallel Computation EE382C: Embedded Software Systems Literature Survey Vishal Mishra and Kursad Oney Dept. of Electrical and Computer Engineering The University of
More informationParallel Programming: Background Information
1 Parallel Programming: Background Information Mike Bailey mjb@cs.oregonstate.edu parallel.background.pptx Three Reasons to Study Parallel Programming 2 1. Increase performance: do more work in the same
More informationParallel Programming: Background Information
1 Parallel Programming: Background Information Mike Bailey mjb@cs.oregonstate.edu parallel.background.pptx Three Reasons to Study Parallel Programming 2 1. Increase performance: do more work in the same
More informationCOMPILER AND MEMORY LAYOUT IN FRESH BREEZE SYSTEM. Chao Yang
COMPILER AND MEMORY LAYOUT IN FRESH BREEZE SYSTEM by Chao Yang A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science
More informationOverview of Dataflow Languages. Waheed Ahmad
Overview of Dataflow Languages Waheed Ahmad w.ahmad@utwente.nl The purpose of models is not to fit the data but to sharpen the questions. Samuel Karlins 11 th R.A Fisher Memorial Lecture Royal Society
More informationThreads. Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011
Threads Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011 Threads Effectiveness of parallel computing depends on the performance of the primitives used to express
More informationLecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain
Lecture 4: Synchronous ata Flow Graphs - I. Verbauwhede, 05-06 K.U.Leuven HJ94 goal: Skiing down a mountain SPW, Matlab, C pipelining, unrolling Specification Algorithm Transformations loop merging, compaction
More informationStatic Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams
Presented at 28th Asilomar Conference on Signals, Systems, and Computers November, 994 Static Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams Joseph T.
More informationTopic B (Cont d) Dataflow Model of Computation
opic B (Cont d) Dataflow Model of Computation Guang R. Gao ACM ellow and IEEE ellow Endowed Distinguished Professor Electrical & Computer Engineering University of Delaware ggao@capsl.udel.edu 09/07/20
More informationMarwan Burelle. Parallel and Concurrent Programming
marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 3 OpenMP Tell Me More (Go, OpenCL,... ) Overview 1 Sharing Data First, always try to apply the following mantra: Don t share
More informationAdvanced Topic: Efficient Synchronization
Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is
More informationConcept of a process
Concept of a process In the context of this course a process is a program whose execution is in progress States of a process: running, ready, blocked Submit Ready Running Completion Blocked Concurrent
More informationLabVIEW Based Embedded Design [First Report]
LabVIEW Based Embedded Design [First Report] Sadia Malik Ram Rajagopal Department of Electrical and Computer Engineering University of Texas at Austin Austin, TX 78712 malik@ece.utexas.edu ram.rajagopal@ni.com
More informationHW/SW Codesign. Exercise 2: Kahn Process Networks and Synchronous Data Flows
HW/SW Codesign Exercise 2: Kahn Process Networks and Synchronous Data Flows 4. October 2017 Stefan Draskovic stefan.draskovic@tik.ee.ethz.ch slides by: Mirela Botezatu 1 Kahn Process Network (KPN) Specification
More informationDynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks
Dynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks ENSC 833 Project Final Report Zhenhua Xiao (Max) zxiao@sfu.ca April 22, 2001 Department of Engineering Science, Simon Fraser
More informationParallel Program Execution and Architecture Models with Dataflow Origin The EARTH Experience
Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory - CAPSL Parallel Program Execution and Architecture Models with Dataflow Origin The EARTH Experience
More informationThreads Implementation. Jo, Heeseung
Threads Implementation Jo, Heeseung Today's Topics How to implement threads? User-level threads Kernel-level threads Threading models 2 Kernel/User-level Threads Who is responsible for creating/managing
More informationWhat Came First? The Ordering of Events in
What Came First? The Ordering of Events in Systems @kavya719 kavya the design of concurrent systems Slack architecture on AWS systems with multiple independent actors. threads in a multithreaded program.
More informationLecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations
Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,
More informationHardware Software Codesign
Hardware Software Codesign 2. Specification and Models of Computation Lothar Thiele 2-1 System Design Specification System Synthesis Estimation SW-Compilation Intellectual Prop. Code Instruction Set HW-Synthesis
More informationDataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University
Dataflow Languages Languages for Embedded Systems Prof. Stephen A. Edwards Columbia University March 2009 Philosophy of Dataflow Languages Drastically different way of looking at computation Von Neumann
More informationEE382N.23: Embedded System Design and Modeling
EE38N.3: Embedded System Design and Modeling Lecture 5 Process-Based MoCs Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu Lecture 5: Outline Process-based
More informationTopic 4c: A hybrid Dataflow-Von Neumann PXM: The EARTH Experience
Topic 4c: A hybrid Dataflow-Von Neumann PXM: The EARTH Experience CPEG421/621: Compiler Design Material mostly taken from Professor Guang R. Gao s previous courses, with additional material from J.Suetterlein.
More informationUniversity of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Experiments with the Fresh Breeze Tree-Based Memory Model Jack B. Dennis,
More informationCS301 - Data Structures Glossary By
CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm
More informationEfficient Work Stealing for Fine-Grained Parallelism
Efficient Work Stealing for Fine-Grained Parallelism Karl-Filip Faxén Swedish Institute of Computer Science November 26, 2009 Task parallel fib in Wool TASK 1( int, fib, int, n ) { if( n
More informationToday s Topics. u Thread implementation. l Non-preemptive versus preemptive threads. l Kernel vs. user threads
Today s Topics COS 318: Operating Systems Implementing Threads u Thread implementation l Non-preemptive versus preemptive threads l Kernel vs. user threads Jaswinder Pal Singh and a Fabulous Course Staff
More informationMidterm I October 12 th, 2005 CS162: Operating Systems and Systems Programming
University of California, Berkeley College of Engineering Computer Science Division EECS Fall 2005 John Kubiatowicz Midterm I October 12 th, 2005 CS162: Operating Systems and Systems Programming Your Name:
More informationOpenMP Optimization and its Translation to OpenGL
OpenMP Optimization and its Translation to OpenGL Santosh Kumar SITRC-Nashik, India Dr. V.M.Wadhai MAE-Pune, India Prasad S.Halgaonkar MITCOE-Pune, India Kiran P.Gaikwad GHRIEC-Pune, India ABSTRACT For
More informationLecture 10: Multi-Object Synchronization
CS 422/522 Design & Implementation of Operating Systems Lecture 10: Multi-Object Synchronization Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous
More informationLecture notes Lectures 1 through 5 (up through lecture 5 slide 63) Book Chapters 1-4
EE445M Midterm Study Guide (Spring 2017) (updated February 25, 2017): Instructions: Open book and open notes. No calculators or any electronic devices (turn cell phones off). Please be sure that your answers
More informationfakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08
12 Data flow models Peter Marwedel TU Dortmund, Informatik 12 2009/10/08 Graphics: Alexandra Nolte, Gesine Marwedel, 2003 Models of computation considered in this course Communication/ local computations
More informationLecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationProblem Set 2. CS347: Operating Systems
CS347: Operating Systems Problem Set 2 1. Consider a clinic with one doctor and a very large waiting room (of infinite capacity). Any patient entering the clinic will wait in the waiting room until the
More informationCPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.
CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.
More informationLecture 14. Synthesizing Parallel Programs IAP 2007 MIT
6.189 IAP 2007 Lecture 14 Synthesizing Parallel Programs Prof. Arvind, MIT. 6.189 IAP 2007 MIT Synthesizing parallel programs (or borrowing some ideas from hardware design) Arvind Computer Science & Artificial
More informationIntroduction to Computer Systems /18-243, fall th Lecture, Dec 1
Introduction to Computer Systems 15-213/18-243, fall 2009 24 th Lecture, Dec 1 Instructors: Roger B. Dannenberg and Greg Ganger Today Multi-core Thread Level Parallelism (TLP) Simultaneous Multi -Threading
More informationIntel Thread Building Blocks, Part IV
Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for
More informationDynamic Expressivity with Static Optimization for Streaming Languages
Dynamic Expressivity with Static Optimization for Streaming Languages Robert Soulé Michael I. Gordon Saman marasinghe Robert Grimm Martin Hirzel ornell MIT MIT NYU IM DES 2013 1 Stream (FIFO queue) Operator
More informationFuture Directions. Edward A. Lee. Berkeley, CA May 12, A New Computational Platform: Ubiquitous Networked Embedded Systems. actuate.
Future Directions Edward A. Lee 6th Biennial Ptolemy Miniconference Berkeley, CA May 12, 2005 A New Computational Platform: Ubiquitous Networked Embedded Systems sense actuate control Ptolemy II support
More informationDeterministic Concurrency
Candidacy Exam p. 1/35 Deterministic Concurrency Candidacy Exam Nalini Vasudevan Columbia University Motivation Candidacy Exam p. 2/35 Candidacy Exam p. 3/35 Why Parallelism? Past Vs. Future Power wall:
More informationSWARM Tutorial. Chen Chen 4/12/2012
SWARM Tutorial Chen Chen 4/12/2012 1 Outline Introduction to SWARM Programming in SWARM Atomic Operations in SWARM Parallel For Loop in SWARM 2 Outline Introduction to SWARM Programming in SWARM Atomic
More informationEE213A - EE298-2 Lecture 8
EE3A - EE98- Lecture 8 Synchronous ata Flow Ingrid Verbauwhede epartment of Electrical Engineering University of California Los Angeles ingrid@ee.ucla.edu EE3A, Spring 000, Ingrid Verbauwhede, UCLA - Lecture
More informationCSE Traditional Operating Systems deal with typical system software designed to be:
CSE 6431 Traditional Operating Systems deal with typical system software designed to be: general purpose running on single processor machines Advanced Operating Systems are designed for either a special
More informationImplementation of Process Networks in Java
Implementation of Process Networks in Java Richard S, Stevens 1, Marlene Wan, Peggy Laramie, Thomas M. Parks, Edward A. Lee DRAFT: 10 July 1997 Abstract A process network, as described by G. Kahn, is a
More informationConcurrent Component Patterns, Models of Computation, and Types
Concurrent Component Patterns, Models of Computation, and Types Edward A. Lee Yuhong Xiong Department of Electrical Engineering and Computer Sciences University of California at Berkeley Presented at Fourth
More informationApplying Models of Computation to OpenCL Pipes for FPGA Computing. Nachiket Kapre + Hiren Patel
Applying Models of Computation to OpenCL Pipes for FPGA Computing Nachiket Kapre + Hiren Patel nachiket@uwaterloo.ca Outline Models of Computation and Parallelism OpenCL code samples Synchronous Dataflow
More informationStreamIt on Fleet. Amir Kamil Computer Science Division, University of California, Berkeley UCB-AK06.
StreamIt on Fleet Amir Kamil Computer Science Division, University of California, Berkeley kamil@cs.berkeley.edu UCB-AK06 July 16, 2008 1 Introduction StreamIt [1] is a high-level programming language
More informationAN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1
AN EFFICIENT IMPLEMENTATION OF NESTED LOOP CONTROL INSTRUCTIONS FOR FINE GRAIN PARALLELISM 1 Virgil Andronache Richard P. Simpson Nelson L. Passos Department of Computer Science Midwestern State University
More informationThreads. Computer Systems. 5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1
Threads CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 5/12/2009 cse410-20-threads 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and References Reading» Read
More informationThe Problem with Treads
The Problem with Treads Edward A. Lee Programming Technology Lecture 2 11/09/08 Background on Edward A. Lee Bachelors degree (Yale University) (1979) Master degree (MIT) (1981) Ph.D. (U. C. Berkeley) (1986)
More informationPointers II. Class 31
Pointers II Class 31 Compile Time all of the variables we have seen so far have been declared at compile time they are written into the program code you can see by looking at the program how many variables
More informationEfficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed
Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed Marco Aldinucci Computer Science Dept. - University of Torino - Italy Marco Danelutto, Massimiliano Meneghin,
More informationCSE 120 Principles of Operating Systems
CSE 120 Principles of Operating Systems Spring 2009 Lecture 4: Threads Geoffrey M. Voelker Announcements Homework #1 due now Project 0 due tonight Project 1 out April 9, 2009 CSE 120 Lecture 4 Threads
More informationAssessment of New Languages of Multicore Processors
Assessment of New Languages of Multicore Processors Mahmoud Parmouzeh, Department of Computer, Shabestar Branch, Islamic Azad University, Shabestar, Iran. parmouze@gmail.com AbdolNaser Dorgalaleh, Department
More informationConcurrent Models of Computation for Embedded Software
Concurrent Models of Computation for Embedded Software Edward A. Lee Professor, UC Berkeley EECS 219D Concurrent Models of Computation Fall 2011 Copyright 2009-2011, Edward A. Lee, All rights reserved
More informationMultithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa
CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is
More informationQuestions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process
Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation Why are threads useful? How does one use POSIX pthreads? Michael Swift 1 2 What s in a process? Organizing a Process A process
More informationCSE 120 Principles of Operating Systems
CSE 120 Principles of Operating Systems Fall 2015 Lecture 4: Threads Geoffrey M. Voelker Announcements Project 0 due Project 1 out October 6, 2015 CSE 120 Lecture 4 Threads 2 Processes Recall that a process
More informationScheduling FFT Computation on SMP and Multicore Systems Ayaz Ali, Lennart Johnsson & Jaspal Subhlok
Scheduling FFT Computation on SMP and Multicore Systems Ayaz Ali, Lennart Johnsson & Jaspal Subhlok Texas Learning and Computation Center Department of Computer Science University of Houston Outline Motivation
More informationCSE 544: Principles of Database Systems
CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;
More informationCSC2/458 Parallel and Distributed Systems Scalable Synchronization
CSC2/458 Parallel and Distributed Systems Scalable Synchronization Sreepathi Pai February 20, 2018 URCS Outline Scalable Locking Barriers Outline Scalable Locking Barriers An alternative lock ticket lock
More informationUnderstandable Concurrency
Edward A. Lee Professor, Chair of EE, and Associate Chair of EECS Director, CHESS: Center for Hybrid and Embedded Software Systems Director, Ptolemy Project UC Berkeley Chess Review November 21, 2005 Berkeley,
More informationInterface Automata and Actif Actors
Interface Automata and Actif Actors H. John Reekie Dept. of Electrical Engineering and Computer Science University of California at Berkeley johnr@eecs.berkeley.edu Abstract This technical note uses the
More informationHierarchical FSMs with Multiple CMs
Hierarchical FSMs with Multiple CMs Manaloor Govindarajan Balasubramanian Manikantan Bharathwaj Muthuswamy (aka Bharath) Reference: Hierarchical FSMs with Multiple Concurrency Models. Alain Girault, Bilung
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Synchronization and Sequential Consistency
CS 152 Computer Architecture and Engineering Lecture 19: Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More informationDisciplined Concurrent Models of Computation for Parallel Software
Disciplined Concurrent Models of Computation for Parallel Software Edward A. Lee Robert S. Pepper Distinguished Professor and UC Berkeley Invited Keynote Talk 2008 Summer Institute The Concurrency Challenge:
More informationLecture: Consistency Models, TM
Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency
More informationRegion-Based Memory Management for Task Dataflow Models
Region-Based Memory Management for Task Dataflow Models Nikolopoulos, D. (2012). Region-Based Memory Management for Task Dataflow Models: First Joint ENCORE & PEPPHER Workshop on Programmability and Portability
More informationProcessor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP
Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor
More informationPOSIX Threads: a first step toward parallel programming. George Bosilca
POSIX Threads: a first step toward parallel programming George Bosilca bosilca@icl.utk.edu Process vs. Thread A process is a collection of virtual memory space, code, data, and system resources. A thread
More informationThreads Questions Important Questions
Threads Questions Important Questions https://dzone.com/articles/threads-top-80-interview https://www.journaldev.com/1162/java-multithreading-concurrency-interviewquestions-answers https://www.javatpoint.com/java-multithreading-interview-questions
More informationIBM Power Multithreaded Parallelism: Languages and Compilers. Fall Nirav Dave
6.827 Multithreaded Parallelism: Languages and Compilers Fall 2006 Lecturer: TA: Assistant: Arvind Nirav Dave Sally Lee L01-1 IBM Power 5 130nm SOI CMOS with Cu 389mm 2 2GHz 276 million transistors Dual
More informationConcurrent ML. John Reppy January 21, University of Chicago
Concurrent ML John Reppy jhr@cs.uchicago.edu University of Chicago January 21, 2016 Introduction Outline I Concurrent programming models I Concurrent ML I Multithreading via continuations (if there is
More informationLecture 15: I/O Devices & Drivers
CS 422/522 Design & Implementation of Operating Systems Lecture 15: I/O Devices & Drivers Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken from previous versions
More informationMarwan Burelle. Parallel and Concurrent Programming
marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 Solutions Overview 1 Sharing Data First, always try to apply the following mantra: Don t share data! When non-scalar data are
More informationLecture 4: Threads; weaving control flow
Lecture 4: Threads; weaving control flow CSE 120: Principles of Operating Systems Alex C. Snoeren HW 1 Due NOW Announcements Homework #1 due now Project 0 due tonight Project groups Please send project
More informationEmbedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines
Embedded Systems 8 - - Dataflow modeling Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines processes (activities that transform data from one
More informationParallel Program Execution and Architecture Models With Dataflow Origin -- The EARTH Experience
Parallel Program Execution and Architecture Models With Dataflow Origin -- The EARTH Experience Guang R. Gao ACM Fellow and IEEE Fellow Endowed Distinguished Professor Electrical & Computer Engineering
More informationSYSTEMS MEMO #12. A Synchronization Library for ASIM. Beng-Hong Lim Laboratory for Computer Science.
ALEWIFE SYSTEMS MEMO #12 A Synchronization Library for ASIM Beng-Hong Lim (bhlim@masala.lcs.mit.edu) Laboratory for Computer Science Room NE43-633 January 9, 1992 Abstract This memo describes the functions
More informationIntroduction to Parallel Computing
Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen
More informationADAPTIVE TASK SCHEDULING USING LOW-LEVEL RUNTIME APIs AND MACHINE LEARNING
ADAPTIVE TASK SCHEDULING USING LOW-LEVEL RUNTIME APIs AND MACHINE LEARNING Keynote, ADVCOMP 2017 November, 2017, Barcelona, Spain Prepared by: Ahmad Qawasmeh Assistant Professor The Hashemite University,
More information