Software Synthesis Trade-offs in Dataflow Representations of DSP Applications

Similar documents
MODELING OF BLOCK-BASED DSP SYSTEMS

Software Synthesis and Code Generation for Signal Processing Systems

Software Synthesis and Code Generation for Signal Processing Systems

Renesting Single Appearance Schedules to Minimize Buffer Memory ABSTRACT

Static Scheduling and Code Generation from Dynamic Dataflow Graphs With Integer- Valued Control Streams

The Dataflow Interchange Format: Towards Co-design of DSP-oriented Dataflow Models and Transformations

STATIC SCHEDULING FOR CYCLO STATIC DATA FLOW GRAPHS

Shared Memory Implementations of Synchronous Dataflow Specifications

A Lightweight Dataflow Approach for Design and Implementation of SDR Systems

Hardware/Software Co-synthesis of DSP Systems

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

Memory-constrained Block Processing Optimization for Synthesis of DSP Software

Parameterized Modeling and Scheduling for Dataflow Graphs 1

An Optimized Message Passing Framework for Parallel Implementation of Signal Processing Applications

Fundamental Algorithms for System Modeling, Analysis, and Optimization

Dataflow Languages. Languages for Embedded Systems. Prof. Stephen A. Edwards. March Columbia University

Node Prefetch Prediction in Dataflow Graphs

Software Synthesis from Dataflow Models for G and LabVIEW

HETEROGENEOUS MULTIPROCESSOR MAPPING FOR REAL-TIME STREAMING SYSTEMS

[6] E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing, IEEE Trans.

ACKNOWLEDGEMENTS. and patience, and for his kindness and his great enthusiasm for research, which

Code Generation for TMS320C6x in Ptolemy

Parameterized Dataflow Modeling for DSP Systems

RESYNCHRONIZATION FOR MULTIPROCESSOR DSP IMPLEMENTATION PART 1: MAXIMUM THROUGHPUT RESYNCHRONIZATION 1

Design of a Dynamic Data-Driven System for Multispectral Video Processing

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain

The Ptolemy Project. Modeling and Design of Reactive Systems. Presenter: Praveen Murthy, PhD Postdoc and Presenter

Invited submission to the Journal of VLSI Signal Processing SYNTHESIS OF EMBEDDED SOFTWARE FROM SYNCHRONOUS DATAFLOW SPECIFICATIONS

In Journal of VLSI Signal Processing Systems, Vol 21, No. 2, pages , June Kluwer Academic Publishers.

Synthesis of Embedded Software from Synchronous Dataflow Specifications

Algorithm. topsort ABCD 9,(3(2AB)C)(2D) buffer cost + looped schedule GDPPO. individuals actual population. Scheduling framework

Rapid Prototyping for Digital Signal Processing Systems using Parameterized Synchronous Dataflow Graphs

Cycle-Breaking Techniques for Scheduling Synchronous Dataflow Graphs

Cache Aware Optimization of Stream Programs

Disciplined Concurrent Models of Computation for Parallel Software

An Overview of the Ptolemy Project. Organizational

Partial Expansion Graphs: Exposing Parallelism and Dynamic Scheduling Opportunities for DSP Applications

Multicore DSP Software Synthesis using Partial Expansion of Dataflow Graphs

Multithreaded Simulation for Synchronous Dataflow Graphs

Minimising Buffer Requirements of Synchronous Dataflow Graphs with Model Checking

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION

Dynamic Scheduling Implementation to Synchronous Data Flow Graph in DSP Networks

Published in: Proceedings of the 45th Annual Asilomar Conference on Signals, Systems, and Computers

Optimal Architectures for Massively Parallel Implementation of Hard. Real-time Beamformers

The Ptolemy Kernel Supporting Heterogeneous Design

System-level Synthesis of Dataflow Applications for FPGAbased Distributed Platforms

Efficient Simulation of Critical Synchronous Dataflow Graphs

EE213A - EE298-2 Lecture 8

Concurrent Models of Computation

Computational Models for Concurrent Streaming Applications

Rapid Prototyping of flexible Embedded Systems on multi-dsp Architectures

Dynamic Response Time Optimization for SDF Graphs

Abstract. 1. Introduction. 2 AnalysisofAffine Nested Loop Programs

Heterogeneous Design in Functional DIF

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

Modeling Stream-Based Applications using the SBF model of computation

Unit 2: High-Level Synthesis

Embedded Systems CS - ES

A Period Graph Throughput Estimator for Multiprocessor Systems 1

Heterogeneous Design in Functional DIF

System Canvas: A New Design Environment for Embedded DSP and Telecommunication Systems

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

OpenDF A Dataflow Toolset for Reconfigurable Hardware and Multicore Systems

Design Space Exploration

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture

Buffer Allocation in Regular Dataflow Networks: An Approach Based on Coloring Circular- Arc Graphs

Embedded Systems 8. Identifying, modeling and documenting how data moves around an information system. Dataflow modeling examines

Overview of Dataflow Languages. Waheed Ahmad

Embedded system design with multiple languages

JOINT MINIMIZATION OF CODE AND DATA FOR SYN- CHRONOUS DATAFLOW PROGRAMS

Compositionality in Synchronous Data Flow: Modular Code Generation from Hierarchical SDF Graphs

The Future of the Ptolemy Project

EE382N.23: Embedded System Design and Modeling

Hierarchical FSMs with Multiple CMs

Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs

Shared Memory Implementations of Synchronous Dataflow Specifications Using Lifetime Analysis Techniques

EECS 144/244: Fundamental Algorithms for System Modeling, Analysis, and Optimization

Adaptive Stream Mining: A Novel Dynamic Computing Paradigm for Knowledge Extraction

MULTIDIMENSIONAL DATAFLOW GRAPH MODELING AND MAPPING FOR EFFICIENT GPU IMPLEMENTATION

Communication Systems Design in Practice

DUE to tight resource constraints, strict real-time performance

Goal-Driven Reconfiguration of Polymorphous Architectures

Algorithm GDPPO. individuals actual population The SDF-scheduling framework

Efficient Simulation of Critical Synchronous Dataflow Graphs

In Proceedings of the Conference on Design and Architectures for Signal and Image Processing, Edinburgh, Scotland, October 2010.

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction

Modelling, Analysis and Scheduling with Dataflow Models

Memory Optimal Single Appearance Schedule with Dynamic Loop Count for Synchronous Dataflow Graphs

Efficient Code Synthesis from Extended Dataflow Graphs for Multimedia Applications

Compositionality in system design: interfaces everywhere! UC Berkeley

Communication Systems Design in Practice

Hierarchical Reconfiguration of Dataflow Models

ABSTRACT PERFORMANCE ANALYSIS OF POLYMORPHOUS COMPUTING ARCHITECTURES. Degree and year: Master of Science, 2001

ADSL Transmitter Modeling and Simulation. Department of Electrical and Computer Engineering University of Texas at Austin. Kripa Venkatachalam.

Architecture and Programming Model for High Performance Interactive Computation

EE382V: System-on-a-Chip (SoC) Design

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture

fakultät für informatik informatik 12 technische universität dortmund Data flow models Peter Marwedel TU Dortmund, Informatik /10/08

LabVIEW Based Embedded Design [First Report]

Transcription:

in Dataflow Representations of DSP Applications Shuvra S. Bhattacharyya Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies University of Maryland, College Park 20742 ssb@eng.umd.edu, http://www.ece.umd.edu/~ssb S. S. Bhattacharyya, University of Maryland, College Park. Page 1 of 26.

Computational characteristics Infinitely iterated DSP system implementation Possibly significant multirate components Largely deterministic control flow Deadline oriented Implementation objectives Speed (latency and throughput) Predictable satisfaction of real-time constraints Memory requirements (code and data) Memory partitioning (code/data, on-chip/off-chip) Power consumption (Peak power, energy/function, ) Cost Time to market S. S. Bhattacharyya, University of Maryland, College Park. Page 2 of 26.

Block diagram environments for DSP application designed by members of the Ptolemy project, UC Berkeley S. S. Bhattacharyya, University of Maryland, College Park. Page 3 of 26.

Speech coding example application designed by members of the Ptolemy project, UC Berkeley S. S. Bhattacharyya, University of Maryland, College Park. Page 4 of 26.

Block specification S. S. Bhattacharyya, University of Maryland, College Park. Page 5 of 26.

Advantages of block-diagram-based DSP design Exposes coarse-grain parallelism Exposes additional high-level structure that is helpful for analysis, verification, and optimization Encourages desirable software engineering practices Modularity Code re-use Intuitive to DSP algorithm designers Block diagrams correspond to signal flow graphs Easy to learn Naturally supports a visual programming interface Avoids overspecification of the system S. S. Bhattacharyya, University of Maryland, College Park. Page 6 of 26.

DSP actors Fine-grain dataflow: nodes are simple atomic functions, < 5 processor instructions Coarse-grain dataflow: nodes are complex functions of 10-100 instructions or more For application modeling, we are interested in mixed-grain dataflow node functions can have arbitrary complexity examples: adders, FFT units, FIR filters, adaptive filters,... Internal functionality of a dataflow node Can be specified in an arbitrary programming language (usually C or assembly lang. for DSP software), or Can be specified as a nested dataflow graph (hierarchical dataflow) S. S. Bhattacharyya, University of Maryland, College Park. Page 7 of 26.

Evolution of dataflow models for DSP Synchronous dataflow, Lee/Messerschmitt, 1987 [16] SPW (Cadence), ADS (Hewlett Packard), Well behaved stream flow graphs, Gao/Govindarajan/Panangaden, 1992 [17] The token flow model, Buck 1993 [18] Multidimensional synchronous dataflow, Lee, 1992 [15] Scalable synchronous dataflow, Ritz/Pankert/Meyr, 1993 [7] COSSAP (Synopsys) Cyclo-static and cyclo-dynamic dataflow, Bilsen et al. [13] Virtuoso Synchro (Eonic Systems) Bounded dynamic dataflow, Pankert/Mauss/Ritz/Meyr, 1994 [14] The processing graph method, Kaplan/Stevens, 1995 [19] U. S. Naval Research Lab Parameterized dataflow, Bhattacharya/Bhattacharyya, 2001 [22] S. S. Bhattacharyya, University of Maryland, College Park. Page 8 of 26.

Dataflow modeling issues A consistent dataflow specification avoids (for all possible input sets): U: Unbounded buffer accumulation D: Deadlock (buffer underflow) A partially-consistent specification avoids U and D for some inputs, and suffers from U or D for other inputs. A decidable dataflow MoC is one in which consistency can always be verified in finite time. A binary-consistency dataflow model is one that contains no partially-consistent specifications. A statically-schedulable model is one for which static, periodic schedules can be constructed for all consistent specifications. S. S. Bhattacharyya, University of Maryland, College Park. Page 9 of 26.

Comparison of dataflow models Decidable Binary consistency Staticallyschedulable synchronous dataflow YES YES YES cyclo-static dataflow YES YES YES well-behaved dataflow YES YES NO token-flow model NO NO NO bdd.dynamic dataflow NO NO NO multidimensional SDF YES YES YES scalable SDF YES YES YES S. S. Bhattacharyya, University of Maryland, College Park. Page 10 of 26.

Synchronous dataflow The number of tokens produced and consumed by each actor is fixed Edges may have delays, implemented as initial tokens Periodic schedules Unique repetitions vector q 2 1 1 1 X Y Z D static, periodic schedules q X = 1, q Y = 2, q Z = 2 YXZYZ, XYZYZ, X(2 YZ) S. S. Bhattacharyya, University of Maryland, College Park. Page 11 of 26.

Synthesis flow for DSP Software Hierarchical Block Diagram Spec. Graphical Compiler Dataflow MoC Specification Scheduler Threading Compiler Machine Dependent Optimization Retargetable Code Generation Source- Source Optimization Procedural Language Specification Object Code S. S. Bhattacharyya, University of Maryland, College Park. Page 12 of 26.

Related Work Loop transformations Assume loop nests are given a-priori Focus on individual nests in isolation Complementary to SDF loop rolling Memory management and register allocation Assume given loop nest / instruction schedule, or Assume homogeneous data sizes, or Assume arbitrary control flow Block processing of dataflow graphs Ritz et al.: multirate block processing Lalgudi et al., Zivojnovic et al.: retiming techniques Hong et al.: minimizing context switching for single-rate graphs S. S. Bhattacharyya, University of Maryland, College Park. Page 13 of 26.

Scheduling trade-offs 1 10 1 1 1 X Y Z 5D Schedule 1: YZYZYZYZYZXYZYZYZYZYZ Threading Mode Code Size Buff. Cost Loop Overhead Subroutine Overhead Actor Activations Inlined κ( X) + 10κ( Y) 11 0 0 21 + 10κ( Z) Subr. κ( X) + κ( Y) + κ( Z) 11 0 21 +21σ c 21σ t S. S. Bhattacharyya, University of Maryland, College Park. Page 14 of 26.

Scheduling trade-offs 2 10 1 1 1 X Y Z 5D Schedule 2: (5YZ)X(5YZ) Threading Mode Code Size Buff. Cost Loop Overhead Subroutine Overhead Actor Activations Inlined κ( X) + 2κ( Y) 11 + 0 21 + 2κ( Z) + 2L c κ( X) + κ( Y) + κ( Z) + 2L c + 5σ c Subr. 11 21 2L s 10L i 2L s + 10L i 21σ t S. S. Bhattacharyya, University of Maryland, College Park. Page 15 of 26.

Scheduling trade-offs 3 10 1 1 1 X Y Z 5D Schedule 3: X(10Y)(10Z) Schedule 4: X(10YZ) Threading Mode Code Size Buff. Cost Loop Overhead Subroutine Overhead Actor Activations Inlined κ( X) + κ( Y) 25 + 0 3 + κ( Z) + 2L c κ( X) + κ( Y) + κ( Z) + L c Inlined 16 + 0 21 2L s L s 20L i 10L i S. S. Bhattacharyya, University of Maryland, College Park. Page 16 of 26. vs. 11 for previous schedules

Scheduling example summary Thr. Schedule Mode YZYZYZYZYZX YZYZYZYZYZ YZYZYZYZYZX YZYZYZYZYZ ( 5YZ)X 5YZ Code Size κ( X) + 10κ( Y) + 10κ( Z) Buff. Cost Loop Ovhd. Subr. Ovhd Actor Activs. I 11 0 0 21 κ( X) + κ( Y) + κ( Z) 21σ +21σ t c ( ) κ( X) + 2κ( Y) + 2κ( Z) 2L s + 10L i +2L c ( 5YZ)X( 5YZ) κ( X) + κ( Y) + κ( Z) 2L s + 10L i 21σ + 2L c + 5σ t c X( 10Y) ( 10Z) κ( X) + κ( Y) + κ( Z) 2L s + 20L i +2L c X( 10YZ) κ( X) + κ( Y) + κ( Z) + L c L s + 10L i S 11 0 21 I 11 0 21 S 11 21 I 25 0 3 I 16 0 21 S. S. Bhattacharyya, University of Maryland, College Park. Page 17 of 26.

Code/data design space example S. S. Bhattacharyya, University of Maryland, College Park. Page 18 of 26. Code Data Minimum buffer schedule, no looping 13735 32 Minimum buffer schedule, with looping 9400 32 Worst minimum code size schedule 170 1021 Best minimum code size schedule 170 264

Buffer minimization Construction of minimum buffer, static, periodic schedules is intractable [12] Cycles Delays Production/consumption mismatches on edges (iteration) Graph-theoretic upper bound analysis [8] Exact for some useful subclasses of graphs Optimal scheduling technique for restricted graphs [9] No directed cycles No interacting undirected cycles Demand-driven scheduling approach [11] Good general heuristic Buffer minimization for maximum-throughput (rate-optimal) schedules [10] S. S. Bhattacharyya, University of Maryland, College Park. Page 19 of 26.

CD to DAT example minimum buffer schedule: A B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F E B C A F F F F F B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F E B C A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F E B A F F F F F B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C A B C D E A F F F F F B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C A B A B C D E A F F F F F B C A B A B C A B C D E F F F F F E F F F F F S. S. Bhattacharyya, University of Maryland, College Park. Page 20 of 26.

Optimization Techniques Buffer minimization Lifetime analysis Buffer merging Joint code and data minimization Data partitioning Synchronization optimization Redundant synchronization elimination Resynchronization Transaction ordering S. S. Bhattacharyya, University of Maryland, College Park. Page 21 of 26.

Summary Decidable dataflow models and synchronous dataflow Periodic schedules Implicit iteration Complex design space Optimization techniques S. S. Bhattacharyya, University of Maryland, College Park. Page 22 of 26.

Bibliography 1 [1] V. Zivojnovic, S. Ritz, and H. Meyr, Retiming of DSP programs for optimum vectorization, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1994. [2] K. N. Lalgundi, M. C. Papaefthymiou, and M. Potkonjak, Optimizing Systems for Effective Block-Processing: The k-delay Problem, in Proceedings of the Design Automation Conference, 1996, pp. 714--719. [3] S. Ritz, M. Willems, and H. Meyr, Scheduling for Optimum Data Memory Compaction in Block Diagram Oriented Software Synthesis, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1995. [4] P. K. Murthy and S. S. Bhattacharyya, Shared Memory Implementations of Synchronous Dataflow Specifications using Lifetime Analysis Techniques, Institute for Advanced Computer Studies, University of Maryland at College Park UMIACS-TR-99-32, June 1999. [5] S. S. Bhattacharyya and P. K. Murthy, The CBP parameter a useful annotation to aid SDF compilers, Institute for Advanced Computer Studies, University of Maryland at College Park UMIACS-TR-99-56, September 1999. S. S. Bhattacharyya, University of Maryland, College Park. Page 23 of 26.

Bibliography 2 [6] S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, Optimal Parenthesization of Lexical Orderings for DSP Block Diagrams, in Proceedings of the International Workshop on VLSI Signal Processing: IEEE press, 1995. [7] S. Ritz, M. Pankert, and H. Meyr, Optimum Vectorization of Scalable Synchronous Dataflow Graphs, in Proceedings of the International Conference on Application Specific Array Processors, 1993. [8] M. Ade, R. Lauwereins, and J. A. Peperstraete, Data Memory Minimisation for Synchronous Data Flow Graphs Emulated on DSP-FPGA Targets, in Proceedings of the Design Automation Conference, 1994, pp. 64--69. [9] M. Cubric and P. Panangaden, Minimal Memory Schedules for Dataflow Networks, in CONCUR '93, 1993. [10] R. Govindarajan, G. R. Gao, and P. Desai, Minimizing Memory Requirements in Rate-Optimal Schedules, in Proceedings of the International Conference on Application Specific Array Processors, 1994. [11] E. A. Lee, W. H. Ho, E. Goei, J. Bier, and S. S. Bhattacharyya, Gabriel: A Design Environment for DSP, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, 1989. S. S. Bhattacharyya, University of Maryland, College Park. Page 24 of 26.

Bibliography 3 [12] S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee, Software Synthesis from Dataflow Graphs: Kluwer Academic Publishers, 1996. [13] G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete, Cyclo-Static Dataflow, IEEE Transactions on Signal Processing, vol. 44, pp. 397--408, 1996. [14] M. Pankert, O. Mauss, S. Ritz, and H. Meyr, Dynamic Data Flow and Control Flow in High Level DSP Code Synthesis, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1994. [15] E. A. Lee, Representing and Exploiting Data Parallelism Using Multidimensional Dataflow Diagrams, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1993, pp. 453--456. [16] E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing, IEEE Transactions on Computers, 1987. [17] G. R. Gao, R. Govindarajan, and P. Panangaden, Well-Behaved Programs for DSP Computation, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1992. S. S. Bhattacharyya, University of Maryland, College Park. Page 25 of 26.

Bibliography 4 [18] J. T. Buck and E. A. Lee, The Token Flow Model, in Advanced Topics in Dataflow Computing and Multithreading, L. Bic, G. Gao, and J.-L. Gaudiot, Eds.: IEEE Computer Society Press, 1993. [19] R. S. Stevens, The Processing Graph Method Tool (PGMT), in Proceedings of the International Conference on Application Specific Systems, Architectures, and Processors, 1997, pp. 263--271. [20] J. Teich, E. Zitzler, and S. S. Bhattacharyya, Optimized Software Synthesis for Digital Signal Processing Algorithms an Evolutionary Approach, in Proceedings of the IEEE Workshop on Signal Processing Systems, October, 1998. [21] S. Sriram and S. S. Bhattacharyya. Embedded Multiprocessors: Scheduling and Synchronization. Marcel Dekker, Inc., 2000. [22] B. Bhattacharya and S. S. Bhattacharyya. Parameterized dataflow modeling for DSP systems. IEEE Transactions on Signal Processing, 49(10):2408-2421, October 2001. S. S. Bhattacharyya, University of Maryland, College Park. Page 26 of 26.