Engineering Parallel Software with

Size: px
Start display at page:

Download "Engineering Parallel Software with"

Transcription

1 Engineering Parallel Software with Our Pattern Language Profeor Kurt Keutzer and Tim Matton and (Jike Chong), Ekaterina Gonina, Bor-Yiing Su and Michael Anderon, Bryan Catanzaro, Chao-Yue Lai, Mark Murphy, David Sheffield, Naryanan Sundaram, 1/77 Main Point of Previou Lecture Many approache to parallelizing oftware are not working Profile and improve Swap in a new parallel programming language Rely on a uper parallelizing compiler My own experience ha hown that a ound oftware architecture i the greatet ingle indicator of a oftware project ucce. Software mut be architected to achieve productivity, efficiency, and correctne SW architecture >> programming environment >> programming language >> compiler and debugger (>>hardware architecture) If we had undertood how to architect equential oftware, then parallelizing oftware would not have been uch a challenge Key to architecture (oftware or otherwie) i deign pattern and a pattern language At the highet level our pattern language ha: Eight tructural pattern Thirteen computational pattern Ye, we really believe arbitrarily complex parallel oftware can built jut from thee! 2/77 Kurt Keutzer 1

2 Outline Our Pattern Language for parallel programming Detailed example uing Our Pattern Language 3/77 Alexander Pattern Language Chritopher Alexander approach to (civil) architecture: "Each pattern decribe a problem which occur over and over again in our environment, and then decribe the core of the olution to that problem, in uch a way that you can ue thi olution a million time over, without ever doing it the ame way twice. Page x, A Pattern Language, Chritopher Alexander Alexander 253 (civil) architectural pattern range from the creation of citie (2. ditribution of town) to particular building problem (232. roof cap) A pattern language i an organized way of tackling an architectural problem uing pattern Main limitation: It about civil not oftware architecture!!! 4 4/77 Kurt Keutzer 2

3 A Pattern Language Pattern embody generalizable olution to recurrent problem Collection of individual pattern (e.g. Deign Pattern, Gamma, Helm, Johnon, Vliide) are great, but we need more help in the oftware development enterprie We would like a comprehenive pattern language which cover the entire proce of parallel oftware development and implementation Keutzer, Matton, the PALLAS group, and other have developed jut uch a pattern language Pattern language i overviewed in: Today we don t have time to go through all the pattern, but we will briefly decribe the tructure of OPL and how how it can be ued 5/77 Our Pattern Language: At a High Level Application Structural Pattern Computational Pattern Implementation Pattern Execution Pattern 6/77 Kurt Keutzer 3

4 Architecting Parallel Software Application Identify the Software Structure Pipe-and-Filter Agent-and-Repoitory Event-baed Bulk Synchronou MapReduce Layered Sytem Model-view controller Arbitrary Tak Graph Puppeteer Model-view-controller Identify the Key Computation Graph Algorithm Dynamic programming Dene/Spare Linear Algebra Un/Structured Grid Graphical Model Finite State Machine Backtrack Branch-and-Bound N-Body Method Circuit Spectral Method Monte-Carlo 7/77 Our Pattern Language 2010: Detail Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction 8/77 Kurt Keutzer 4

5 Our Pattern Language: At a High Level Application Structural Pattern Computational Pattern Implementation Pattern Execution Pattern 9/77 The Algorithm Strategy Deign Space Start Organize By Flow of Data Organize By Tak Organize By Data Regular? Irregular? Linear? Recurive? Linear? Recurive? Dicrete Event Tak Parallelim Divide and Conquer Geometric Decompoition??? Tak-Parallelim Data-Parallelim Divide and Conquer Dicrete-Event Geometric-Decompoition 10/77 Kurt Keutzer 5

6 Our Pattern Language: At a High Level Application Structural Pattern Computational Pattern Implementation Pattern Execution Pattern Implementation Strategy Pattern SPMD Fork/Join Data-Par/index-pace Actor Program tructure Loop-Par. Tak-Queue Shared-Queue Shared-map Partitioned Graph Ditributed-Array Shared-Data Data tructure 11/77 Our Pattern Language: At a High Level Application Structural Pattern Computational Pattern Implementation Pattern Execution Pattern Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction 12/77 Kurt Keutzer 6

7 Outline Our Pattern Language for parallel programming Detailed example uing Our Pattern Language 13/77 Speech Recognition Application Software Architecture uing Pattern Identify Structural Pattern Identify Computational Pattern Parallelization: (for each module) Algorithm trategy pattern Implementation trategy pattern Execution pattern Concluion Outline 14/77 Kurt Keutzer 7

8 Automatic Speech Recognition Key technology for enabling rich human-computer interaction Increaingly important for intelligent device without keyboard Interaction require low latency repone Only one of many component in exciting new real-time application Main contribution: A detailed analyi of the concurrency in large vocabulary continuou peech recognition (LVCSR) ummarized in the pattern language An implementation on GPU with 11x peedup over optimized C++ verion Achieving 3x better than real time performance with 50,000 word vocabulary 15/77 Continuou Speech Recognition Challenge: Recognizing word from a large vocabulary arranged in exponentially many poible permutation Inferring word boundarie from the context of neighboring word Hidden Markov Model (HMM) i the mot ucceful approach 16/77 Kurt Keutzer 8

9 CSR Sytem Structure Inference engine baed ytem Ued in Sphinx (CMU, USA), HTK (Cambridge, UK), and Juliu (CSRC, Japan) Modular and flexible etup Shown to be effective for Arabic, Englih, Japanee, and Mandarin 17/77 Recognition Proce Conider a equence of feature one at a time, for each feature: Start with mot-likely-equence ending with previou feature Correponding to a et of active tate Conider the likelihood of current feature in it context Compute mot-likely-equence ending with current feature Correponding to the next et of active tate Recognition i a proce of graph traveral Recognition Network 18/77 Kurt Keutzer 9

10 Recognition Network Feature from one frame Gauian Mixture Model for One Phone State Mixture Component Computing ditance to each mixture component Computing weighted um of all component HMM Acoutic Phone Model aa hh n Pronunciation Model... HOP hh aa p... ON aa n... POP p aa p... Bigram Language Model T P CAT HAT HO IN ON PO P THE Compiled HMM Recognition Network... CAT... HAT HOP IN... ON POP... THE... 19/77 Graph Traveral Characteritic Compiled HMM Recognition Network Linear Lexical Network WFST Recognition Network Operation on a graph i driven by the graph tructure Irregular graph tructure challenging to tatically load balancing Irregular tate acce pattern challenging to dynamically load balancing Multiple tate may hare the ame next tate write contention in arc evaluation Traveral produce memory accee that pan the entire graph poor patial locality Inner loop have high data acce to computation ratio limited by memory bandwidth 20/77 Kurt Keutzer 10

11 Parallel Platform Characteritic Multicore/manycore deign philoophy Multicore: Devote ignificant tranitor reource to ingle thread performance Core Manycore: Maximizing computation throughput at the expene of ingle thread performance Architecture Trend: Increaing vector unit width Increaing number of core per die Core Core Cach Cach e e Cac ch Cach e e Core Core Application Implication: Mut increae data acce regularity Mut optimize ynchronization cot Core Cach e Cach e Core 21/77 Outline Speech Recognition Application Software Architecture uing Pattern Identify Structural Pattern Identify Computational Pattern Parallelization: (for each module) Algorithm trategy pattern Implementation trategy pattern Execution pattern Concluion 22/77 Kurt Keutzer 11

12 OPL 2010 Application Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Concurrency Foundation contruct (not expreed a pattern) Thread creation/detruction Proce creation/detruction Meage-Paing Collective-Comm. Tranactional memory Point-To-Point-Sync. (mutual excluion) collective ync. (barrier) Memory ync/fence 23 23/77 Architecting Parallel Software Decompoe Tak Group tak Order Tak Decompoe Data Data haring Data acce Identify the Software Identify the Key Structure Computation 24/77 Kurt Keutzer 12

13 Inference Engine Architecture Intro Recognition i a proce of graph traveral Each time-tep we need to identify the likely tate in the recognition network given the obervation acoutic ignal From a the of active tate we want to compute the next et of active tate uing probabilitie of acoutic ymbol and tate tranition What Structural pattern i thi? 25/77 Key computation: HMM Inference Algorithm An intance of: Graphical Model Implemented with: Dynamic Programming Find the mot-likely equence of tate that produced the obervation Viterbi Algorithm t State 1 State 2 State 3 State 4 Ob 1 Ob 2 Ob 3 Ob 4 x x x x Legend: A State x An Obervation P( x t t ) m [t-1][ t-1 ] P( t t-1 ) m [t][ t ] Markov Condition: J. Chong, Y. Yi, A. Faria, N.R. Satih and K. Keutzer, Data-Parallel Large Vocabulary Continuou Speech Recognition on Graphic Proceor, Emerging Application and Manycore Arch. 2008, pp , June /77 Kurt Keutzer 13

14 Iterative Refinement Structural Pattern One iteration per time tep Identify the et of probable tate in the network given acoutic ignal given current active tate et Prune unlikely tate Repeat 27/77 Digging Deeper Active Set Computation Architecture In each iteration we need to: Compute obervation probabilitie of tranition from current tate Travere the likely non-epilon arc to reach the et of next active tate t Travere the likely epilon arc to reach the et of next active tate What Structural pattern i thi? 28/77 Kurt Keutzer 14

15 Digging Deeper Active Set Computation Architecture Phae 1 Obervation probability computation Graphtraveral Phae 2 In each iteration we need to: Compute obervation probabilitie of tranition from current tate Travere the likely non-epilon arc to reach the et of next active tate Travere the likely epilon arc to reach the et of next active tate What Structural pattern i thi? 29/77 Phae 1: Obervation Probability computation Architecture Obervation probabilitie are computed from Gauian Mixture Model Each Gauian probability in each mixture i independent Probability for one phone tate i the um of all Gauian time the mixture probability for that tate What Structural pattern i thi? Dan Klein CS288, Lecture 9 30/77 Kurt Keutzer 15

16 Map-Reduce Structural Pattern Map each mixture probability computation Reduce the reult accumulate the total probability for that tate Gauian Mixture Model for One Phone State Mixture Component 31/77 Phae 2: Graph Traveral Now that we know the tranition probabilitie from current et of tate, we need to compute the next et of active tate (follow the likely tranition) Each tranition i independent Multiple tranition might end in the ame tate The end reult need to be a et of mot probable tate from all tranition What Structural pattern i thi? 32/77 Kurt Keutzer 16

17 Map-Reduce Structural Pattern- again Map each mixture probability computation Reduce the reult accumulate the total probability for that tate 33/77 High Level Structure of Engine Inference Engine Active Set Computation Inference Engine Active Set Computation Phae 1: Obervation Probability Computation Phae 2: Graph Traveral Phae 1 Phae 2 34/77 Kurt Keutzer 17

18 Outline Speech Recognition Application Software Architecture uing Pattern Identify Structural Pattern Identify Computational Pattern Parallelization: (for each module) Algorithm trategy pattern Implementation trategy pattern Execution pattern Concluion 35/77 What about Computation? Active Set Computation: Phae 1: Compute obervation probability of tranition given current et of tate Phae 2: Travere arc to determine next et of mot likely active tate What Computational Pattern are thee? Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo 36/77 Kurt Keutzer 18

19 Outline Speech Recognition Application Software Architecture uing Pattern Identify Structural Pattern Identify Computational Pattern Parallelization: (for each module) Algorithm trategy pattern Implementation trategy pattern Execution pattern Concluion 37/77 Now to Parallelim Inference Engine Structural Pattern: Iterative Refinement Computational Pattern: Dynamic Programming Inference engine and Active Set computation i equential Let look at Phae 1 and Phae 2 What Parallel Algorithm Strategy can we ue? Tak-Parallelim Data-Parallelim Divide and Conquer Dicrete-Event Geometric-Decompoition 38/77 Kurt Keutzer 19

20 Now to Parallelim Inference Engine Phae 1: Obervation Probability Computation Structural: MapReduce Computational: Graphical Model Compute cluter Gauian Mixture Probabilitie for each tranition label Hint: Look at data dependencie (or lack there-of) Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition 39/77 Now to Parallelim Inference Engine Phae 1: Obervation Probability Computation Structural: MapReduce Computational: Graphical Model Compute cluter Gauian Mixture Probabilitie for each tranition label Map reduce necearily implie dataparallelim Tak-Parallelim Data-Parallelim Divide and Conquer Dicrete-Event Geometric-Decompoition 40/77 Kurt Keutzer 20

21 Obervation Probability Pattern Computation Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Implementation trategy: Data tructure? 41/77 Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Obervation Probability Pattern Decompoition Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Gauian Mixture Model i hared among all computation read only 42/77 Kurt Keutzer 21

22 Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Obervation Probability Pattern Decompoition Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Program tructure? 43/77 Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Obervation Probability Pattern Decompoition Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Data parallel a ingle intruction tream i applied to multiple data element 44/77 Kurt Keutzer 22

23 Obervation Probability Pattern Decompoition Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction Execution? a ingle intruction tream i applied to multiple data element 45/77 Obervation Probability Pattern Decompoition Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction SIMD 46/77 Kurt Keutzer 23

24 Now to Parallelim Pt2 Inference Engine Phae 2: Graph Traveral Structural: MapReduce Computational: Graph algorithm/graph traveral The recognition network i a finite tate tranducer, repreented a a weighted and labeled graph Decoding on thi graph i Breadth- Firt Traveral What Parallel Algorithmic Strategy can we ue? Hint: What are the operand? What are the dependence? Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition 47/77 Now to Parallelim Inference Engine Phae 2: Graph Traveral Structural: MapReduce Computational: Graph Traveral The recognition network i a finite tate tranducer, repreented a a weighted and labeled graph Decoding on thi graph i Breadth- Firt Traveral Data Parallelim! Each next tate computation can be computed independently. Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition 48/77 Kurt Keutzer 24

25 Active Set Computation Structural Pattern Pipe-and-Filter Agent-and-Repoitory Proce-Control Event-Baed/Implicit- Invocation Puppeteer Model-View-Controller Iterative-Refinement Map-Reduce Layered-Sytem Arbitrary-Static-Tak-Graph Computational Pattern Graph-Algorithm Dynamic-Programming Dene-Linear-Algebra Spare-Linear-Algebra Untructured-Grid Structured-Grid Graphical-Model Finite-State-Machine Backtrack-Branch-and- Bound N-Body-Method Circuit Spectral-Method Monte-Carlo Tak-Parallelim Divide and Conquer Data-Parallelim Dicrete-Event Geometric-Decompoition Implementation Strategy Pattern Shared-Queue SPMD Fork/Join Loop-Par. Ditributed-Array Shared-map Data-Par/index-pace Actor Tak-Queue Shared-Data Partitioned Graph Program tructure Data tructure Parallel Execution Pattern MIMD SIMD Thread-Pool Tak-Graph Tranaction SIMD 49/77 Shared Queue Implementation In each iteration: Arc tranition from the current et of tate are travered to contruct a et of next active tate Each thread enqueue one or more next active tate to a global queue of next active tate Thi approach caue ignificant contention a the queue head pointer In our application - thouand of thread are acceing one memory location Caue erialization of thread memory accee Limit calability 50/77 Kurt Keutzer 25

26 Shared Queue Implementation Solution to the queue head pointer contention -> ditributed queue 1. Each thread in a thread block write to a local queue of next active tate 2. Local queue are merged into a global queue Contention on global queue pointer i reduced from #thread to #block Significantly improve calability 51/77 Speech Reference Implementation Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer, A Fully Data Parallel WFST-baed Large Vocabulary Continuou Speech Recognition on a Graphic Proceing Unit, Proceeding of the 10th Annual Conference of the International Speech Communication Aociation (InterSpeech), page , September, W R W R W R Manycore GPU Data W R R W R Control Phae 0 Iteration Control Prepare ActiveSet Phae 1 Compute Obervation Probability Phae 2 For each active arc: Compute arc tranition probability Copy reult back to CPU Control Read File Initialize data tructure Collect Backtrack Info Backtrack Output Reult CPU Data W R W R Kiun You, Jike Chong, Youngmin Yi, Ekaterina Gonina, Chritopher Hughe, Yen-Kuang Chen, Wonyong Sung, Kurt Keutzer, Parallel Scalability in Speech Recognition: Inference engine in large vocabulary continuou peech recognition, IEEE Signal Proceing Magazine, vol. 26, no. 6, pp , November Jike Chong, Ekaterina Gonina, Kiun You, Kurt Keutzer, Scalable Parallelization of Automatic Speech Recognition, Invited book chapter in Scaling Up Machine Learning, an upcoming 2010 Cambridge Univerity Pre book. 52/77 Kurt Keutzer 26

27 Summary A good architect need to undertand: Structural pattern Computational pattern Refinement through Our Pattern Language Graph algorithm and graphical model are critical to many application Graph algorithm are epecially difficult to parallelize and library upport i inadequate There will be at leat a decade of hand-crafted olution We achieved good reult on parallelizing large-vocabulary automatic peech recognition 53/77 Extra 54/77 Kurt Keutzer 27

28 Example: Breadth Firt Search Breadth firt earch Ditributed Graph Ditributed Queue Ditributed Viitor Ditributed Property Map Algorithm Viitor 55/77 Example: Ditributed BFS Ditributed Graph (a) Ditributed Graph (b) Ditributed adjacency lit repreentation 56/77 Kurt Keutzer 28

29 Example: Ditributed BFS Ditributed Queue pop() from local queue puh() end meage to the vertex owner empty() exhaut local queue and ynchronize with other proceor to determine termination condition Meage are only received after all proceor have completed operation at one level (b) Ditributed adjacency lit repreentation 57 57/77 Example: Ditributed BFS Ditributed Viitor Owner-compute cheme, o ditributed verion i the ame a equential viitor Ditributed Property Map Local Property Map Store local propertie for local vertice and edge Ghot Cell Store propertie for ghot cell Proce Group Communication medium Data Race Reolver Decide among variou put() meage ent to the Ditributed Property Map (a) Ditributed Graph (b) Ditributed adjacency lit repreentation 58 58/77 Kurt Keutzer 29

30 Performance achieved 128 node, 2GHz AMD Opteron, 4GB RAM each, connected over Infiniband, one core active per node BFS: 100k vertice and ~15M edge (C) Crauer et al (E) Eager heuritic BFS: 1M vertice and ~15M edge Dijktra Algorithm: 100k vertice and ~15M edge 59 59/77 Dicuion Graph traveral algorithm characteritic: 1. Input data-driven computation 2. Untructured problem 3. Poor data locality 4. High data acce to computation ratio High demand for low memory latency and poor data locality make it challenging for PE without fine-grain multiproceing Pointer chaing mainly involve integer operation Niagara type platform eem uitable for thi application domain What about GPGPU? What how well would our cheduling algorithm map to Parallel BGL? 60 60/77 Kurt Keutzer 30

Third party names are the property of their owners.

Third party names are the property of their owners. Disclaimer READ THIS its very important The views expressed in this talk are those of the speaker and not his employer. This is an academic style talk and does not address details of any particular Intel

More information

Lecture 14: Minimum Spanning Tree I

Lecture 14: Minimum Spanning Tree I COMPSCI 0: Deign and Analyi of Algorithm October 4, 07 Lecture 4: Minimum Spanning Tree I Lecturer: Rong Ge Scribe: Fred Zhang Overview Thi lecture we finih our dicuion of the hortet path problem and introduce

More information

GPU-Accelerated Large Vocabulary Continuous Speech Recognition for Scalable Distributed Speech Recognition. Jungsuk Kim Ian Lane

GPU-Accelerated Large Vocabulary Continuous Speech Recognition for Scalable Distributed Speech Recognition. Jungsuk Kim Ian Lane GPU-Accelerated Large Vocabulary Continuou Speech Recognition for Scalable Ditributed Speech Recognition Junguk Kim Ian Lane Electrical and Computer Engineering Carnegie Mellon Univerity March 20, 2015

More information

The Association of System Performance Professionals

The Association of System Performance Professionals The Aociation of Sytem Performance Profeional The Computer Meaurement Group, commonly called CMG, i a not for profit, worldwide organization of data proceing profeional committed to the meaurement and

More information

Contents. shortest paths. Notation. Shortest path problem. Applications. Algorithms and Networks 2010/2011. In the entire course:

Contents. shortest paths. Notation. Shortest path problem. Applications. Algorithms and Networks 2010/2011. In the entire course: Content Shortet path Algorithm and Network 21/211 The hortet path problem: Statement Verion Application Algorithm (for ingle ource p problem) Reminder: relaxation, Dijktra, Variant of Dijktra, Bellman-Ford,

More information

Generic Traverse. CS 362, Lecture 19. DFS and BFS. Today s Outline

Generic Traverse. CS 362, Lecture 19. DFS and BFS. Today s Outline Generic Travere CS 62, Lecture 9 Jared Saia Univerity of New Mexico Travere(){ put (nil,) in bag; while (the bag i not empty){ take ome edge (p,v) from the bag if (v i unmarked) mark v; parent(v) = p;

More information

3D SMAP Algorithm. April 11, 2012

3D SMAP Algorithm. April 11, 2012 3D SMAP Algorithm April 11, 2012 Baed on the original SMAP paper [1]. Thi report extend the tructure of MSRF into 3D. The prior ditribution i modified to atify the MRF property. In addition, an iterative

More information

Building a Compact On-line MRF Recognizer for Large Character Set using Structured Dictionary Representation and Vector Quantization Technique

Building a Compact On-line MRF Recognizer for Large Character Set using Structured Dictionary Representation and Vector Quantization Technique 202 International Conference on Frontier in Handwriting Recognition Building a Compact On-line MRF Recognizer for Large Character Set uing Structured Dictionary Repreentation and Vector Quantization Technique

More information

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is

Today s Outline. CS 561, Lecture 23. Negative Weights. Shortest Paths Problem. The presence of a negative cycle might mean that there is Today Outline CS 56, Lecture Jared Saia Univerity of New Mexico The path that can be trodden i not the enduring and unchanging Path. The name that can be named i not the enduring and unchanging Name. -

More information

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights

Shortest Paths Problem. CS 362, Lecture 20. Today s Outline. Negative Weights Shortet Path Problem CS 6, Lecture Jared Saia Univerity of New Mexico Another intereting problem for graph i that of finding hortet path Aume we are given a weighted directed graph G = (V, E) with two

More information

1 The secretary problem

1 The secretary problem Thi i new material: if you ee error, pleae email jtyu at tanford dot edu 1 The ecretary problem We will tart by analyzing the expected runtime of an algorithm, a you will be expected to do on your homework.

More information

The Data Locality of Work Stealing

The Data Locality of Work Stealing The Data Locality of Work Stealing Umut A. Acar School of Computer Science Carnegie Mellon Univerity umut@c.cmu.edu Guy E. Blelloch School of Computer Science Carnegie Mellon Univerity guyb@c.cmu.edu Robert

More information

Advanced Encryption Standard and Modes of Operation

Advanced Encryption Standard and Modes of Operation Advanced Encryption Standard and Mode of Operation G. Bertoni L. Breveglieri Foundation of Cryptography - AES pp. 1 / 50 AES Advanced Encryption Standard (AES) i a ymmetric cryptographic algorithm AES

More information

Stochastic Search and Graph Techniques for MCM Path Planning Christine D. Piatko, Christopher P. Diehl, Paul McNamee, Cheryl Resch and I-Jeng Wang

Stochastic Search and Graph Techniques for MCM Path Planning Christine D. Piatko, Christopher P. Diehl, Paul McNamee, Cheryl Resch and I-Jeng Wang Stochatic Search and Graph Technique for MCM Path Planning Chritine D. Piatko, Chritopher P. Diehl, Paul McNamee, Cheryl Rech and I-Jeng Wang The John Hopkin Univerity Applied Phyic Laboratory, Laurel,

More information

Keywords Cloud Computing, Service Level Agreements (SLA), CloudSim, Monitoring & Controlling SLA Agent, JADE

Keywords Cloud Computing, Service Level Agreements (SLA), CloudSim, Monitoring & Controlling SLA Agent, JADE Volume 5, Iue 8, Augut 2015 ISSN: 2277 128X International Journal of Advanced Reearch in Computer Science and Software Engineering Reearch Paper Available online at: www.ijarce.com Verification of Agent

More information

Operational Semantics Class notes for a lecture given by Mooly Sagiv Tel Aviv University 24/5/2007 By Roy Ganor and Uri Juhasz

Operational Semantics Class notes for a lecture given by Mooly Sagiv Tel Aviv University 24/5/2007 By Roy Ganor and Uri Juhasz Operational emantic Page Operational emantic Cla note for a lecture given by Mooly agiv Tel Aviv Univerity 4/5/7 By Roy Ganor and Uri Juhaz Reference emantic with Application, H. Nielon and F. Nielon,

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier a a The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each b c circuit will be decribed in Verilog

More information

AUTOMATIC TEST CASE GENERATION USING UML MODELS

AUTOMATIC TEST CASE GENERATION USING UML MODELS Volume-2, Iue-6, June-2014 AUTOMATIC TEST CASE GENERATION USING UML MODELS 1 SAGARKUMAR P. JAIN, 2 KHUSHBOO S. LALWANI, 3 NIKITA K. MAHAJAN, 4 BHAGYASHREE J. GADEKAR 1,2,3,4 Department of Computer Engineering,

More information

How to. write a paper. The basics writing a solid paper Different communities/different standards Common errors

How to. write a paper. The basics writing a solid paper Different communities/different standards Common errors How to write a paper The baic writing a olid paper Different communitie/different tandard Common error Reource Raibert eay My grammar point Article on a v. the Bug in writing Clarity Goal Conciene Calling

More information

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment

Increasing Throughput and Reducing Delay in Wireless Sensor Networks Using Interference Alignment Int. J. Communication, Network and Sytem Science, 0, 5, 90-97 http://dx.doi.org/0.436/ijcn.0.50 Publihed Online February 0 (http://www.scirp.org/journal/ijcn) Increaing Throughput and Reducing Delay in

More information

Lecture Outline. Global flow analysis. Global Optimization. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization

Lecture Outline. Global flow analysis. Global Optimization. Global constant propagation. Liveness analysis. Local Optimization. Global Optimization Lecture Outline Global flow analyi Global Optimization Global contant propagation Livene analyi Adapted from Lecture by Prof. Alex Aiken and George Necula (UCB) CS781(Praad) L27OP 1 CS781(Praad) L27OP

More information

Factor Graphs and Inference

Factor Graphs and Inference Factor Graph and Inerence Sargur Srihari rihari@cedar.bualo.edu 1 Topic 1. Factor Graph 1. Factor in probability ditribution. Deriving them rom graphical model. Eact Inerence Algorithm or Tree graph 1.

More information

Lecture 8: More Pipelining

Lecture 8: More Pipelining Overview Lecture 8: More Pipelining David Black-Schaffer davidbb@tanford.edu EE8 Spring 00 Getting Started with Lab Jut get a ingle pixel calculating at one time Then look into filling your pipeline Multiplier

More information

Distributed Packet Processing Architecture with Reconfigurable Hardware Accelerators for 100Gbps Forwarding Performance on Virtualized Edge Router

Distributed Packet Processing Architecture with Reconfigurable Hardware Accelerators for 100Gbps Forwarding Performance on Virtualized Edge Router Ditributed Packet Proceing Architecture with Reconfigurable Hardware Accelerator for 100Gbp Forwarding Performance on Virtualized Edge Router Satohi Nihiyama, Hitohi Kaneko, and Ichiro Kudo Abtract To

More information

Markov Random Fields in Image Segmentation

Markov Random Fields in Image Segmentation Preented at SSIP 2011, Szeged, Hungary Markov Random Field in Image Segmentation Zoltan Kato Image Proceing & Computer Graphic Dept. Univerity of Szeged Hungary Zoltan Kato: Markov Random Field in Image

More information

Performance of a Robust Filter-based Approach for Contour Detection in Wireless Sensor Networks

Performance of a Robust Filter-based Approach for Contour Detection in Wireless Sensor Networks Performance of a Robut Filter-baed Approach for Contour Detection in Wirele Senor Network Hadi Alati, William A. Armtrong, Jr., and Ai Naipuri Department of Electrical and Computer Engineering The Univerity

More information

Analyzing Hydra Historical Statistics Part 2

Analyzing Hydra Historical Statistics Part 2 Analyzing Hydra Hitorical Statitic Part Fabio Maimo Ottaviani EPV Technologie White paper 5 hnode HSM Hitorical Record The hnode i the hierarchical data torage management node and ha to perform all the

More information

Universität Augsburg. Institut für Informatik. Approximating Optimal Visual Sensor Placement. E. Hörster, R. Lienhart.

Universität Augsburg. Institut für Informatik. Approximating Optimal Visual Sensor Placement. E. Hörster, R. Lienhart. Univerität Augburg à ÊÇÅÍÆ ËÀǼ Approximating Optimal Viual Senor Placement E. Hörter, R. Lienhart Report 2006-01 Januar 2006 Intitut für Informatik D-86135 Augburg Copyright c E. Hörter, R. Lienhart Intitut

More information

Exercise 4: Markov Processes, Cellular Automata and Fuzzy Logic

Exercise 4: Markov Processes, Cellular Automata and Fuzzy Logic Exercie 4: Marko rocee, Cellular Automata and Fuzzy Logic Formal Method II, Fall Semeter 203 Solution Sheet Marko rocee Theoretical Exercie. (a) ( point) 0.2 0.7 0.3 tanding 0.25 lying 0.5 0.4 0.2 0.05

More information

CS201: Data Structures and Algorithms. Assignment 2. Version 1d

CS201: Data Structures and Algorithms. Assignment 2. Version 1d CS201: Data Structure and Algorithm Aignment 2 Introduction Verion 1d You will compare the performance of green binary earch tree veru red-black tree by reading in a corpu of text, toring the word and

More information

LinkGuide: Towards a Better Collection of Hyperlinks in a Website Homepage

LinkGuide: Towards a Better Collection of Hyperlinks in a Website Homepage Proceeding of the World Congre on Engineering 2007 Vol I LinkGuide: Toward a Better Collection of Hyperlink in a Webite Homepage A. Ammari and V. Zharkova chool of Informatic, Univerity of Bradford anammari@bradford.ac.uk,

More information

Routing Definition 4.1

Routing Definition 4.1 4 Routing So far, we have only looked at network without dealing with the iue of how to end information in them from one node to another The problem of ending information in a network i known a routing

More information

Digifort Standard. Architecture

Digifort Standard. Architecture Digifort Standard Intermediate olution for intalling up to 32 camera The Standard verion provide the ideal reource for local and remote monitoring of up to 32 camera per erver and a the intermediate verion

More information

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK

ES205 Analysis and Design of Engineering Systems: Lab 1: An Introductory Tutorial: Getting Started with SIMULINK ES05 Analyi and Deign of Engineering Sytem: Lab : An Introductory Tutorial: Getting Started with SIMULINK What i SIMULINK? SIMULINK i a oftware package for modeling, imulating, and analyzing dynamic ytem.

More information

Course Project: Adders, Subtractors, and Multipliers a

Course Project: Adders, Subtractors, and Multipliers a In the name Allah Department of Computer Engineering 215 Spring emeter Computer Architecture Coure Intructor: Dr. Mahdi Abbai Coure Project: Adder, Subtractor, and Multiplier a a The purpoe of thi p roject

More information

A New Approach to Pipeline FFT Processor

A New Approach to Pipeline FFT Processor A ew Approach to Pipeline FFT Proceor Shouheng He and Mat Torkelon Department of Applied Electronic, Lund Univerity S- Lund, SWEDE email: he@tde.lth.e; torkel@tde.lth.e Abtract A new VLSI architecture

More information

Key Terms - MinMin, MaxMin, Sufferage, Task Scheduling, Standard Deviation, Load Balancing.

Key Terms - MinMin, MaxMin, Sufferage, Task Scheduling, Standard Deviation, Load Balancing. Volume 3, Iue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Reearch in Computer Science and Software Engineering Reearch Paper Available online at: www.ijarce.com Tak Aignment in

More information

A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED

A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED A PROBABILISTIC NOTION OF CAMERA GEOMETRY: CALIBRATED VS. UNCALIBRATED Jutin Domke and Yianni Aloimono Computational Viion Laboratory, Center for Automation Reearch Univerity of Maryland College Park,

More information

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications

DAROS: Distributed User-Server Assignment And Replication For Online Social Networking Applications DAROS: Ditributed Uer-Server Aignment And Replication For Online Social Networking Application Thuan Duong-Ba School of EECS Oregon State Univerity Corvalli, OR 97330, USA Email: duongba@eec.oregontate.edu

More information

Set-based Approach for Lossless Graph Summarization using Locality Sensitive Hashing

Set-based Approach for Lossless Graph Summarization using Locality Sensitive Hashing Set-baed Approach for Lole Graph Summarization uing Locality Senitive Hahing Kifayat Ullah Khan Supervior: Young-Koo Lee Expected Graduation Date: Fall 0 Deptartment of Computer Engineering Kyung Hee Univerity

More information

999 Computer System Network. (12) Patent Application Publication (10) Pub. No.: US 2006/ A1. (19) United States

999 Computer System Network. (12) Patent Application Publication (10) Pub. No.: US 2006/ A1. (19) United States (19) United State US 2006O1296.60A1 (12) Patent Application Publication (10) Pub. No.: Mueller et al. (43) Pub. Date: Jun. 15, 2006 (54) METHOD AND COMPUTER SYSTEM FOR QUEUE PROCESSING (76) Inventor: Wolfgang

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each type of circuit will be implemented in two

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in Verilog and implemented

More information

Refining SIRAP with a Dedicated Resource Ceiling for Self-Blocking

Refining SIRAP with a Dedicated Resource Ceiling for Self-Blocking Refining SIRAP with a Dedicated Reource Ceiling for Self-Blocking Mori Behnam, Thoma Nolte Mälardalen Real-Time Reearch Centre P.O. Box 883, SE-721 23 Väterå, Sweden {mori.behnam,thoma.nolte}@mdh.e ABSTRACT

More information

A Map Reduce Framework for Programming Graphics Processors

A Map Reduce Framework for Programming Graphics Processors A Map Reduce Framework for Programming Graphic Proceor Bryan Catanzaro, Narayanan Sundaram and Kurt Keutzer Univerity of California, Berkeley 545Q Cory Hall Berkeley, California 94720 {catanzar, narayan,

More information

YOUNGMIN YI. B.S. in Computer Engineering, 2000 Seoul National University (SNU), Seoul, Korea

YOUNGMIN YI. B.S. in Computer Engineering, 2000 Seoul National University (SNU), Seoul, Korea YOUNGMIN YI Parallel Computing Lab Phone: +1 (925) 348-1095 573 Soda Hall Email: ymyi@eecs.berkeley.edu Electrical Engineering and Computer Science Web: http://eecs.berkeley.edu/~ymyi University of California,

More information

A CLUSTERING-BASED HYBRID REPLICA CONTROL PROTOCOL FOR HIGH AVAILABILITY IN GRID ENVIRONMENT

A CLUSTERING-BASED HYBRID REPLICA CONTROL PROTOCOL FOR HIGH AVAILABILITY IN GRID ENVIRONMENT Journal of Computer Science 10 (12): 2442-2449, 2014 ISSN: 1549-3636 2014 R. Latip et al., Thi open acce article i ditributed under a Creative Common Attribution (CC-BY) 3.0 licene doi:10.3844/jcp.2014.2442.2449

More information

[N309] Feedforward Active Noise Control Systems with Online Secondary Path Modeling. Muhammad Tahir Akhtar, Masahide Abe, and Masayuki Kawamata

[N309] Feedforward Active Noise Control Systems with Online Secondary Path Modeling. Muhammad Tahir Akhtar, Masahide Abe, and Masayuki Kawamata he 32nd International Congre and Expoition on Noie Control Engineering Jeju International Convention Center, Seogwipo, Korea, Augut 25-28, 2003 [N309] Feedforward Active Noie Control Sytem with Online

More information

A Sparse Shared-Memory Multifrontal Solver in SCAD Software

A Sparse Shared-Memory Multifrontal Solver in SCAD Software Proceeding of the International Multiconference on ISBN 978-83-6080--9 Computer Science and Information echnology, pp. 77 83 ISSN 896-709 A Spare Shared-Memory Multifrontal Solver in SCAD Software Sergiy

More information

Bottom Up parsing. Bottom-up parsing. Steps in a shift-reduce parse. 1. s. 2. np. john. john. john. walks. walks.

Bottom Up parsing. Bottom-up parsing. Steps in a shift-reduce parse. 1. s. 2. np. john. john. john. walks. walks. Paring Technologie Outline Paring Technologie Outline Bottom Up paring Paring Technologie Paring Technologie Bottom-up paring Step in a hift-reduce pare top-down: try to grow a tree down from a category

More information

Architecture and grid application of cluster computing system

Architecture and grid application of cluster computing system Architecture and grid application of cluter computing ytem Yi Lv*, Shuiqin Yu, Youju Mao Intitute of Optical Fiber Telecom, Chongqing Univ. of Pot & Telecom, Chongqing, 400065, P.R.China ABSTRACT Recently,

More information

Parallel Approaches for Intervals Analysis of Variable Statistics in Large and Sparse Linear Equations with RHS Ranges

Parallel Approaches for Intervals Analysis of Variable Statistics in Large and Sparse Linear Equations with RHS Ranges American Journal of Applied Science 4 (5): 300-306, 2007 ISSN 1546-9239 2007 Science Publication Correponding Author: Parallel Approache for Interval Analyi of Variable Statitic in Large and Spare Linear

More information

Topics. Lecture 37: Global Optimization. Issues. A Simple Example: Copy Propagation X := 3 B > 0 Y := 0 X := 4 Y := Z + W A := 2 * 3X

Topics. Lecture 37: Global Optimization. Issues. A Simple Example: Copy Propagation X := 3 B > 0 Y := 0 X := 4 Y := Z + W A := 2 * 3X Lecture 37: Global Optimization [Adapted from note by R. Bodik and G. Necula] Topic Global optimization refer to program optimization that encompa multiple baic block in a function. (I have ued the term

More information

Delaunay Triangulation: Incremental Construction

Delaunay Triangulation: Incremental Construction Chapter 6 Delaunay Triangulation: Incremental Contruction In the lat lecture, we have learned about the Lawon ip algorithm that compute a Delaunay triangulation of a given n-point et P R 2 with O(n 2 )

More information

xy-monotone path existence queries in a rectilinear environment

xy-monotone path existence queries in a rectilinear environment CCCG 2012, Charlottetown, P.E.I., Augut 8 10, 2012 xy-monotone path exitence querie in a rectilinear environment Gregory Bint Anil Mahehwari Michiel Smid Abtract Given a planar environment coniting of

More information

Compiler Construction

Compiler Construction Compiler Contruction Lecture 6 - An Introduction to Bottom- Up Paring 3 Robert M. Siegfried All right reerved Bottom-up Paring Bottom-up parer pare a program from the leave of a pare tree, collecting the

More information

A Basic Prototype for Enterprise Level Project Management

A Basic Prototype for Enterprise Level Project Management A Baic Prototype for Enterprie Level Project Management Saurabh Malgaonkar, Abhay Kolhe Computer Engineering Department, Mukeh Patel School of Technology Management & Engineering, NMIMS Univerity, Mumbai,

More information

IMPROVED JPEG DECOMPRESSION OF DOCUMENT IMAGES BASED ON IMAGE SEGMENTATION. Tak-Shing Wong, Charles A. Bouman, and Ilya Pollak

IMPROVED JPEG DECOMPRESSION OF DOCUMENT IMAGES BASED ON IMAGE SEGMENTATION. Tak-Shing Wong, Charles A. Bouman, and Ilya Pollak IMPROVED DECOMPRESSION OF DOCUMENT IMAGES BASED ON IMAGE SEGMENTATION Tak-Shing Wong, Charle A. Bouman, and Ilya Pollak School of Electrical and Computer Engineering Purdue Univerity ABSTRACT We propoe

More information

Programmazione di sistemi multicore

Programmazione di sistemi multicore Programmazione di itemi multicore A.A. 2015-2016 LECTURE 9 IRENE FINOCCHI http://wwwuer.di.uniroma1.it/~inocchi/ More complex parallel pattern PARALLEL PREFIX PARALLEL SORTING PARALLEL MERGESORT PARALLEL

More information

Trainable Context Model for Multiscale Segmentation

Trainable Context Model for Multiscale Segmentation Trainable Context Model for Multicale Segmentation Hui Cheng and Charle A. Bouman School of Electrical and Computer Engineering Purdue Univerity Wet Lafayette, IN 47907-1285 {hui, bouman}@ ecn.purdue.edu

More information

VLSI Design 9. Datapath Design

VLSI Design 9. Datapath Design VLSI Deign 9. Datapath Deign 9. Datapath Deign Lat module: Adder circuit Simple adder Fat addition Thi module omparator Shifter Multi-input Adder Multiplier omparator detector: A = 1 detector: A = 11 111

More information

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial

SIMIT 7. Component Type Editor (CTE) User manual. Siemens Industrial SIMIT 7 Component Type Editor (CTE) Uer manual Siemen Indutrial Edition January 2013 Siemen offer imulation oftware to plan, imulate and optimize plant and machine. The imulation- and optimizationreult

More information

Aspects of Formal and Graphical Design of a Bus System

Aspects of Formal and Graphical Design of a Bus System Apect of Formal and Graphical Deign of a Bu Sytem Tiberiu Seceleanu Univerity of Turku, Dpt. of Information Technology Turku, Finland tiberiu.eceleanu@utu.fi Tomi Weterlund Turku Centre for Computer Science

More information

Audio-Visual Voice Command Recognition in Noisy Conditions

Audio-Visual Voice Command Recognition in Noisy Conditions Audio-Viual Voice Command Recognition in Noiy Condition Joef Chaloupka, Jan Nouza, Jindrich Zdanky Laboratory of Computer Speech Proceing, Intitute of Information Technology and Electronic, Technical Univerity

More information

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search

Uninformed Search Complexity. Informed Search. Search Revisited. Day 2/3 of Search Informed Search ay 2/3 of Search hap. 4, Ruel & Norvig FS IFS US PFS MEM FS IS Uninformed Search omplexity N = Total number of tate = verage number of ucceor (branching factor) L = Length for tart to goal

More information

Multicast with Network Coding in Application-Layer Overlay Networks

Multicast with Network Coding in Application-Layer Overlay Networks IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 22, NO. 1, JANUARY 2004 1 Multicat with Network Coding in Application-Layer Overlay Network Ying Zhu, Baochun Li, Member, IEEE, and Jiang Guo Abtract

More information

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing On the Efficacy of a Fued CPU+GPU Proceor (or APU) for Parallel Computing Mayank Daga, Ahwin M. Aji, and Wu-chun Feng Dept. of Computer Science Sampling of field that ue GPU Mac OS X Comology Molecular

More information

mapping reult. Our experiment have revealed that for many popular tream application, uch a networking and multimedia application, the number of VC nee

mapping reult. Our experiment have revealed that for many popular tream application, uch a networking and multimedia application, the number of VC nee Reolving Deadlock for Pipelined Stream Application on Network-on-Chip Xiaohang Wang 1,2, Peng Liu 1 1 Department of Information Science and Electronic Engineering, Zheiang Univerity Hangzhou, Zheiang,

More information

Laboratory Exercise 6

Laboratory Exercise 6 Laboratory Exercie 6 Adder, Subtractor, and Multiplier The purpoe of thi exercie i to examine arithmetic circuit that add, ubtract, and multiply number. Each circuit will be decribed in VHL and implemented

More information

Chapter S:II (continued)

Chapter S:II (continued) Chapter S:II (continued) II. Baic Search Algorithm Sytematic Search Graph Theory Baic State Space Search Depth-Firt Search Backtracking Breadth-Firt Search Uniform-Cot Search AND-OR Graph Baic Depth-Firt

More information

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1

(12) Patent Application Publication (10) Pub. No.: US 2011/ A1 (19) United State US 2011 0316690A1 (12) Patent Application Publication (10) Pub. No.: US 2011/0316690 A1 Siegman (43) Pub. Date: Dec. 29, 2011 (54) SYSTEMAND METHOD FOR IDENTIFYING ELECTRICAL EQUIPMENT

More information

On successive packing approach to multidimensional (M-D) interleaving

On successive packing approach to multidimensional (M-D) interleaving On ucceive packing approach to multidimenional (M-D) interleaving Xi Min Zhang Yun Q. hi ankar Bau Abtract We propoe an interleaving cheme for multidimenional (M-D) interleaving. To achieved by uing a

More information

Cutting Stock by Iterated Matching. Andreas Fritsch, Oliver Vornberger. University of Osnabruck. D Osnabruck.

Cutting Stock by Iterated Matching. Andreas Fritsch, Oliver Vornberger. University of Osnabruck. D Osnabruck. Cutting Stock by Iterated Matching Andrea Fritch, Oliver Vornberger Univerity of Onabruck Dept of Math/Computer Science D-4909 Onabruck andy@informatikuni-onabrueckde Abtract The combinatorial optimization

More information

Source Code (C) Phantom Support Sytem Generic Front-End Compiler BB CFG Code Generation ANSI C Single-threaded Application Phantom Call Identifier AEB

Source Code (C) Phantom Support Sytem Generic Front-End Compiler BB CFG Code Generation ANSI C Single-threaded Application Phantom Call Identifier AEB Code Partitioning for Synthei of Embedded Application with Phantom André C. Nácul, Tony Givargi Department of Computer Science Univerity of California, Irvine {nacul, givargi@ic.uci.edu ABSTRACT In a large

More information

Minimum congestion spanning trees in bipartite and random graphs

Minimum congestion spanning trees in bipartite and random graphs Minimum congetion panning tree in bipartite and random graph M.I. Otrovkii Department of Mathematic and Computer Science St. John Univerity 8000 Utopia Parkway Queen, NY 11439, USA e-mail: otrovm@tjohn.edu

More information

Maneuverable Relays to Improve Energy Efficiency in Sensor Networks

Maneuverable Relays to Improve Energy Efficiency in Sensor Networks Maneuverable Relay to Improve Energy Efficiency in Senor Network Stephan Eidenbenz, Luka Kroc, Jame P. Smith CCS-5, MS M997; Lo Alamo National Laboratory; Lo Alamo, NM 87545. Email: {eidenben, kroc, jpmith}@lanl.gov

More information

AN ALGORITHM FOR MINIMALLY LATENT GLOBAL VIRTUAL TIME. equal to the timestamp of the last event executed.

AN ALGORITHM FOR MINIMALLY LATENT GLOBAL VIRTUAL TIME. equal to the timestamp of the last event executed. AN ALGORITHM FOR MINIMALLY LATENT GLOBAL VIRTUAL TIME Alexander I. Tomlinon Vijay K. Garg Department of Electrical and Computer Engineering The Univerity of Texa at Autin, Autin, Texa 78712 Thi paper appear

More information

New Structural Decomposition Techniques for Constraint Satisfaction Problems

New Structural Decomposition Techniques for Constraint Satisfaction Problems New Structural Decompoition Technique for Contraint Satifaction Problem Yaling Zheng and Berthe Y. Choueiry Contraint Sytem Laboratory Univerity of Nebraka-Lincoln Email: yzheng choueiry@ce.unl.edu Abtract.

More information

Domain-Specific Modeling for Rapid System-Wide Energy Estimation of Reconfigurable Architectures

Domain-Specific Modeling for Rapid System-Wide Energy Estimation of Reconfigurable Architectures Domain-Specific Modeling for Rapid Sytem-Wide Energy Etimation of Reconfigurable Architecture Seonil Choi 1,Ju-wookJang 2, Sumit Mohanty 1, Viktor K. Praanna 1 1 Dept. of Electrical Engg. 2 Dept. of Electronic

More information

A Practical Model for Minimizing Waiting Time in a Transit Network

A Practical Model for Minimizing Waiting Time in a Transit Network A Practical Model for Minimizing Waiting Time in a Tranit Network Leila Dianat, MASc, Department of Civil Engineering, Sharif Univerity of Technology, Tehran, Iran Youef Shafahi, Ph.D. Aociate Profeor,

More information

Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems

Semi-Distributed Load Balancing for Massively Parallel Multicomputer Systems Syracue Univerity SUFAC lectrical ngineering and Computer Science echnical eport College of ngineering and Computer Science 8-1991 Semi-Ditributed Load Balancing for aively Parallel ulticomputer Sytem

More information

SLA Adaptation for Service Overlay Networks

SLA Adaptation for Service Overlay Networks SLA Adaptation for Service Overlay Network Con Tran 1, Zbigniew Dziong 1, and Michal Pióro 2 1 Department of Electrical Engineering, École de Technologie Supérieure, Univerity of Quebec, Montréal, Canada

More information

The underigned hereby recommend to the Faculty of Graduate Studie and Reearch aceeptance of the thei, Two Topic in Applied Algorithmic ubmitted by Pat

The underigned hereby recommend to the Faculty of Graduate Studie and Reearch aceeptance of the thei, Two Topic in Applied Algorithmic ubmitted by Pat Two Topic in Applied Algorithmic By Patrick R. Morin A thei ubmitted to the Faculty of Graduate Studie and Reearch in partial fullment of the requirement for the degree of Mater of Computer Science Ottawa-Carleton

More information

SIMIT 7. Profinet IO Gateway. User Manual

SIMIT 7. Profinet IO Gateway. User Manual SIMIT 7 Profinet IO Gateway Uer Manual Edition January 2013 Siemen offer imulation oftware to plan, imulate and optimize plant and machine. The imulation- and optimizationreult are only non-binding uggetion

More information

Multi-Target Tracking In Clutter

Multi-Target Tracking In Clutter Multi-Target Tracking In Clutter John N. Sander-Reed, Mary Jo Duncan, W.B. Boucher, W. Michael Dimmler, Shawn O Keefe ABSTRACT A high frame rate (0 Hz), multi-target, video tracker ha been developed and

More information

Localized Minimum Spanning Tree Based Multicast Routing with Energy-Efficient Guaranteed Delivery in Ad Hoc and Sensor Networks

Localized Minimum Spanning Tree Based Multicast Routing with Energy-Efficient Guaranteed Delivery in Ad Hoc and Sensor Networks Localized Minimum Spanning Tree Baed Multicat Routing with Energy-Efficient Guaranteed Delivery in Ad Hoc and Senor Network Hanne Frey Univerity of Paderborn D-3398 Paderborn hanne.frey@uni-paderborn.de

More information

Diverse: Application-Layer Service Differentiation in Peer-to-Peer Communications

Diverse: Application-Layer Service Differentiation in Peer-to-Peer Communications Divere: Application-Layer Service Differentiation in Peer-to-Peer Communication Chuan Wu, Student Member, IEEE, Baochun Li, Senior Member, IEEE Department of Electrical and Computer Engineering Univerity

More information

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder

Computer Arithmetic Homework Solutions. 1 An adder for graphics. 2 Partitioned adder. 3 HDL implementation of a partitioned adder Computer Arithmetic Homework 3 2016 2017 Solution 1 An adder for graphic In a normal ripple carry addition of two poitive number, the carry i the ignal for a reult exceeding the maximum. We ue thi ignal

More information

Edits in Xylia Validity Preserving Editing of XML Documents

Edits in Xylia Validity Preserving Editing of XML Documents dit in Xylia Validity Preerving diting of XML Document Pouria Shaker, Theodore S. Norvell, and Denni K. Peter Faculty of ngineering and Applied Science, Memorial Univerity of Newfoundland, St. John, NFLD,

More information

A Multi-objective Genetic Algorithm for Reliability Optimization Problem

A Multi-objective Genetic Algorithm for Reliability Optimization Problem International Journal of Performability Engineering, Vol. 5, No. 3, April 2009, pp. 227-234. RAMS Conultant Printed in India A Multi-objective Genetic Algorithm for Reliability Optimization Problem AMAR

More information

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values

The norm Package. November 15, Title Analysis of multivariate normal datasets with missing values The norm Package November 15, 2003 Verion 1.0-9 Date 2002/05/06 Title Analyi of multivariate normal dataet with miing value Author Ported to R by Alvaro A. Novo . Original by Joeph

More information

Problem Set 2 (Due: Friday, October 19, 2018)

Problem Set 2 (Due: Friday, October 19, 2018) Electrical and Computer Engineering Memorial Univerity of Newfoundland ENGI 9876 - Advanced Data Network Fall 2018 Problem Set 2 (Due: Friday, October 19, 2018) Quetion 1 Conider an M/M/1 model of a queue

More information

Synthesis of Signal Processing Structured Datapaths for FPGAs Supporting RAMs and Busses

Synthesis of Signal Processing Structured Datapaths for FPGAs Supporting RAMs and Busses Synthei of Signal Proceing Structured Datapath for FPGA Supporting AM and Bue Baher Haroun and Behzad Sajjadi Department of lectrical and Computer ngineering, Concordia Univerity 455 de Maionneuve Blvd.

More information

A TOPSIS based Method for Gene Selection for Cancer Classification

A TOPSIS based Method for Gene Selection for Cancer Classification Volume 67 No17, April 2013 A TOPSIS baed Method for Gene Selection for Cancer Claification IMAbd-El Fattah,WIKhedr, KMSallam, 1 Department of Statitic, 3 Department of Deciion upport, 2 Department of information

More information

KS3 Maths Assessment Objectives

KS3 Maths Assessment Objectives KS3 Math Aement Objective Tranition Stage 9 Ratio & Proportion Probabilit y & Statitic Appreciate the infinite nature of the et of integer, real and rational number Can interpret fraction and percentage

More information

( ) subject to m. e (2) L are 2L+1. = s SEG SEG Las Vegas 2012 Annual Meeting Page 1

( ) subject to m. e (2) L are 2L+1. = s SEG SEG Las Vegas 2012 Annual Meeting Page 1 A new imultaneou ource eparation algorithm uing frequency-divere filtering Ying Ji*, Ed Kragh, and Phil Chritie, Schlumberger Cambridge Reearch Summary We decribe a new imultaneou ource eparation algorithm

More information

else end while End References

else end while End References 621-630. [RM89] [SK76] Roenfeld, A. and Melter, R. A., Digital geometry, The Mathematical Intelligencer, vol. 11, No. 3, 1989, pp. 69-72. Sklanky, J. and Kibler, D. F., A theory of nonuniformly digitized

More information

A Linear Interpolation-Based Algorithm for Path Planning and Replanning on Girds *

A Linear Interpolation-Based Algorithm for Path Planning and Replanning on Girds * Advance in Linear Algebra & Matrix Theory, 2012, 2, 20-24 http://dx.doi.org/10.4236/alamt.2012.22003 Publihed Online June 2012 (http://www.scirp.org/journal/alamt) A Linear Interpolation-Baed Algorithm

More information

A System Dynamics Model for Transient Availability Modeling of Repairable Redundant Systems

A System Dynamics Model for Transient Availability Modeling of Repairable Redundant Systems International Journal of Performability Engineering Vol., No. 3, May 05, pp. 03-. RAMS Conultant Printed in India A Sytem Dynamic Model for Tranient Availability Modeling of Repairable Redundant Sytem

More information

Development of an atmospheric climate model with self-adapting grid and physics

Development of an atmospheric climate model with self-adapting grid and physics Intitute of Phyic Publihing Journal of Phyic: Conference Serie 16 (2005) 353 357 doi:10.1088/1742-6596/16/1/049 SciDAC 2005 Development of an atmopheric climate model with elf-adapting grid and phyic Joyce

More information