System-on-Chip Design Analysis of Control Data Flow. Hao Zheng Comp Sci & Eng U of South Florida

Overview
Dataflow (DF) models describe concurrent computation at a very high level.
- Each actor describes a non-trivial computation.
- Each actor is often described in C.
- An actor can be mapped to either HW or SW.
This lecture looks at the issues in mapping C to HW.

Data & Control Edges of C Programs
C is used both as a modeling language and as an implementation language. Mapping C programs to HW is hard: HW is parallel, while C is sequential. We therefore need to understand the structure of C programs, that is, the relations between the operations in a program:
- Data edge: data is moved from one operation to another.
- Control edge: no data is transferred; the edge only records the order of execution.

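Control Flow Graph

    int max(int a, int b) {  // 1
      int r;
      if (a > b)             // 2
        r = a;               // 3
      else
        r = b;               // 4
      return r;              // 5
    }

[Figure: the CFG of max. Nodes 1 to 5 are the numbered statements; control edges run 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5 and 4 -> 5.]
Control edges are often labeled with conditions whose satisfaction dictates whether the edge can be taken; here 2 -> 3 is taken when a > b, and 2 -> 4 when it is not.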
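A CFG is simply a graph whose edges may carry guard conditions, so it can be written down as data. The minimal C sketch below, with hypothetical names (`ControlEdge`, `holds`, `trace_max`), encodes the five control edges of max and walks them for given argument values; the sequence of visited nodes is one control path through the function.

```c
#include <assert.h>
#include <stddef.h>

/* Condition guarding a control edge of max(). */
typedef enum { ALWAYS, A_GT_B, NOT_A_GT_B } Cond;

typedef struct { int from, to; Cond cond; } ControlEdge;

/* Control edges of max(); node numbers are the statement numbers 1..5. */
static const ControlEdge max_cfg[] = {
    {1, 2, ALWAYS}, {2, 3, A_GT_B}, {2, 4, NOT_A_GT_B},
    {3, 5, ALWAYS}, {4, 5, ALWAYS},
};
#define MAX_CFG_EDGES (sizeof max_cfg / sizeof max_cfg[0])

/* Does the edge condition hold for these argument values? */
static int holds(Cond c, int a, int b) {
    switch (c) {
    case A_GT_B:     return a > b;
    case NOT_A_GT_B: return !(a > b);
    default:         return 1;              /* ALWAYS */
    }
}

/* Walk the CFG from node 1, following an edge whose condition holds,
   and store the visited nodes in path[]. Node 5 (return) has no
   outgoing edge, so the walk stops there. Returns the path length. */
static size_t trace_max(int a, int b, int path[], size_t cap) {
    size_t len = 0;
    int node = 1;
    assert(path != NULL);
    while (node != 0 && len < cap) {
        path[len++] = node;
        int next = 0;                       /* 0 = no successor */
        for (size_t e = 0; e < MAX_CFG_EDGES; e++) {
            if (max_cfg[e].from == node && holds(max_cfg[e].cond, a, b)) {
                next = max_cfg[e].to;
                break;
            }
        }
        node = next;
    }
    return len;
}
```

For max(5, 3) the walk visits 1, 2, 3, 5; for max(2, 9) it visits 1, 2, 4, 5. Only the edge conditions differ between the two runs, which is exactly the information a controller needs.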

Data Flow Graph
For the max function above (statements numbered 1 to 5):
[Figure: the DFG of max. Node 2 computes (a > b); data edges run 1 -> 2 (a, b), 1 -> 3 (a), 1 -> 4 (b), 3 -> 5 (r) and 4 -> 5 (r).]
Data edges are labeled with the variables through which one operation depends on another.

Implementing Control/Data Edges
- A data edge represents a flow of information, so it must be implemented.
- A control edge is a consequence of the semantics of the programming language, so it may be ignored or changed as long as the behavior remains the same.

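Implementing Control/Data Edges

    int sum(int a, int b, int c) {  // op 1
      int v1, v2;
      v1 = a + b;   // op 2
      v2 = v1 + c;  // op 3
      return v2;    // op 4
    }

[Figure: left, the control edges 1 -> 2 -> 3 -> 4; middle, the data edges 1 -> 2 (a, b), 2 -> 3 (v1), 1 -> 3 (c) and 3 -> 4 (v2); right, the hardware implementation, two chained adders computing v1 = a + b and v2 = v1 + c.]
Control edges are meaningless in the hardware implementation, because HW is parallel: only the data edges survive, as wires between the adders.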

Control/Data Edges Example

    int sum(int a, int b, int c, int d) {  // op 1
      int v1, v2;
      v1 = a + b;        // op 2
      v2 = c + d;        // op 3
      return v1 + v2;    // op 4
    }
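In this example ops 2 and 3 have no data edge between them; only the control edge imposed by C's sequential order separates them. One standard way to expose that, sketched here as an illustration rather than as the course's method, is ASAP levelization: each operation is placed one level after the latest operation it reads from. The table `dep` and the function `asap_levels` are hypothetical names, and the sketch assumes ops are numbered in program order so that producers always come first.

```c
#include <assert.h>

#define NOPS 5 /* ops 1..4; index 0 unused */

/* dep[i][j] != 0 means op j reads a value produced by op i (data edge i -> j). */
static const int dep[NOPS][NOPS] = {
    [1][2] = 1, /* a, b */
    [1][3] = 1, /* c, d */
    [2][4] = 1, /* v1   */
    [3][4] = 1, /* v2   */
};

/* ASAP level of each op: 0 for op 1 (which supplies the inputs), otherwise
   1 + the largest level among the ops it reads from. */
static void asap_levels(int level[NOPS]) {
    for (int j = 1; j < NOPS; j++) {
        /* assumption: no data edge goes backwards in program order */
        for (int i = j; i < NOPS; i++)
            assert(!dep[i][j]);
        level[j] = 0;
        for (int i = 1; i < j; i++)
            if (dep[i][j] && level[i] + 1 > level[j])
                level[j] = level[i] + 1;
    }
}
```

Ops 2 and 3 both land at level 1 and op 4 at level 2, so two adders could compute v1 and v2 side by side, with a third adder combining them.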

Basic Elements of CFG

    for (i = 0; i < 20; i++) {  // 1: i = 0;  2: i < 20;  3: i++
      // body of the loop
    }

[Figure: entry -> 1 -> 2; 2 -> body -> 3 -> 2; 2 -> exit.]
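To see why the for loop has this CFG shape, it can be rewritten with explicit jumps, one label or statement per CFG node. This is a minimal sketch; the loop body is a hypothetical placeholder that only counts iterations, and the function name is illustrative.

```c
#include <assert.h>

/* for (i = 0; i < 20; i++) { body } rewritten with explicit jumps.
   The body here is hypothetical: it only counts how often it runs. */
int count_iterations(void) {
    int i, body_runs = 0;
    i = 0;                          /* node 1: initialization       */
test:
    if (!(i < 20)) goto done;       /* node 2: false edge -> exit   */
    body_runs++;                    /* body (true edge from node 2) */
    i++;                            /* node 3: increment            */
    goto test;                      /* back edge 3 -> 2             */
done:
    return body_runs;               /* exit                         */
}
```

Each goto is one control edge of the graph, and the label `test` is the only node with two incoming edges (from 1 and from 3).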

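Construction of CFG

    if (a < b) {
      // true branch
    } else {
      // false branch
    }

[Figure: entry -> (a < b); the true edge leads to the true branch and the false edge to the false branch; both branches lead to exit.]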

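Construction of CFG

    while (a < b) {
      // loop body
    }

[Figure: entry -> (a < b); the true edge leads to the body, which returns to (a < b); the false edge leads to exit.]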

Construction of CFG

    do {
      // loop body
    } while (a < b);

[Figure: entry -> body -> (a < b); the true edge returns to the body; the false edge leads to exit.]
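The while and do-while graphs differ only in where the test sits relative to the body. Writing both with explicit jumps makes the difference concrete: the while loop tests first and may skip the body entirely, whereas the do-while loop always executes the body at least once. The function names and the body (which just increments a) are hypothetical.

```c
#include <assert.h>

/* while (a < b) { body }: test first, so the body may run zero times. */
int while_body_runs(int a, int b) {
    int runs = 0;
top:
    if (!(a < b)) goto done;  /* entry -> test; false edge -> exit */
    runs++; a++;              /* body: hypothetical, just advances a */
    goto top;                 /* back edge body -> test */
done:
    return runs;
}

/* do { body } while (a < b): body first, so it always runs at least once. */
int do_while_body_runs(int a, int b) {
    int runs = 0;
body:
    runs++; a++;              /* entry -> body */
    if (a < b) goto body;     /* true edge back to body; false edge -> exit */
    return runs;
}
```

With a = 5 and b = 3 the while version runs the body 0 times and the do-while version once; with a = 0 and b = 3 both run it 3 times.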

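Construction of CFG: GCD

    int gcd(int a, int b) {     // 1
      while (a != b) {          // 2
        if (a > b)              // 3
          a = a - b;            // 4
        else
          b = b - a;            // 5
      }
      return a;                 // 6
    }

[Figure: the CFG of gcd, with control edges 1 -> 2, 2 -> 3, 2 -> 6, 3 -> 4, 3 -> 5, 4 -> 2 and 5 -> 2.]
A control path in the CFG corresponds to a sequence of statement executions.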
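The correspondence between control paths and statement sequences can be checked by instrumenting gcd. In this sketch, the `visit` helper, the `MAX_PATH` bound and the name `gcd_traced` are hypothetical; each statement appends its number (1 to 6, as in the listing) to a path array when it executes.

```c
#include <assert.h>

#define MAX_PATH 64

/* Record statement s in the path if there is room. */
static void visit(int path[], int *len, int s) {
    if (*len < MAX_PATH) path[(*len)++] = s;
}

/* gcd() with its control path recorded; requires a > 0 and b > 0,
   otherwise the loop never terminates. */
int gcd_traced(int a, int b, int path[], int *len) {
    assert(a > 0 && b > 0);
    *len = 0;
    visit(path, len, 1);
    for (;;) {
        visit(path, len, 2);             /* while (a != b) */
        if (!(a != b)) break;
        visit(path, len, 3);             /* if (a > b) */
        if (a > b) { visit(path, len, 4); a = a - b; }
        else       { visit(path, len, 5); b = b - a; }
    }
    visit(path, len, 6);                 /* return a */
    return a;
}
```

gcd(12, 8) returns 4 along the control path 1, 2, 3, 4, 2, 3, 5, 2, 6: one pass through the true branch, one through the false branch, then the exit edge 2 -> 6.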

Construction of DFG: GCD
For the gcd function above (statements numbered 1 to 6):
[Figure: the CFG of gcd beside a partial DFG over nodes 1 to 6, where node 2 computes (a != b) and node 3 computes (a > b).]

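Construction of DFG: GCD
For the gcd function above (statements numbered 1 to 6):
[Figure: the complete DFG of gcd. Data edges carry a and/or b from statement 1, from statement 4 (which redefines a) and from statement 5 (which redefines b) to every statement that reads them: 2, 3, 4 and 5 read both a and b, and 6 reads a.]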
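The slides draw the DFG by hand. One mechanical way to obtain it, sketched here as an assumption rather than as the course's method, is a reaching-definitions analysis over the CFG: a data edge d -> s labeled v exists when a definition of v in statement d can reach statement s and s reads v. The sketch treats statement 1 (function entry) as defining the parameters a and b; all names (`Def`, `reaching_definitions`, `has_data_edge`, `print_data_edges`) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

enum { VAR_A, VAR_B };                 /* variables of gcd() */
#define NSTMTS 7                       /* statements 1..6; index 0 unused */

/* Every definition (assignment) of a variable: statement and variable. */
typedef struct { int stmt, var; } Def;
static const Def defs[] = { {1, VAR_A}, {1, VAR_B}, {4, VAR_A}, {5, VAR_B} };
#define NDEFS ((int)(sizeof defs / sizeof defs[0]))

/* Control edges of gcd(), taken from its CFG. */
static const int cfg[][2] = { {1, 2}, {2, 3}, {2, 6}, {3, 4}, {3, 5}, {4, 2}, {5, 2} };
#define NEDGES ((int)(sizeof cfg / sizeof cfg[0]))

/* Bit v of uses[s] is set when statement s reads variable v. */
static const unsigned uses[NSTMTS] = { 0, 0, 3, 3, 3, 3, 1 };

static unsigned in[NSTMTS], out[NSTMTS];  /* reaching-definition bitsets */

/* Iterate IN[s] = union of OUT[p] over predecessors p and
   OUT[s] = GEN[s] | (IN[s] & ~KILL[s]) until nothing changes. */
static void reaching_definitions(void) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int s = 1; s < NSTMTS; s++) {
            unsigned new_in = 0, gen = 0, kill = 0;
            for (int e = 0; e < NEDGES; e++)
                if (cfg[e][1] == s) new_in |= out[cfg[e][0]];
            for (int d = 0; d < NDEFS; d++)
                if (defs[d].stmt == s) gen |= 1u << d;
            /* s kills every definition of a variable it assigns */
            for (int d = 0; d < NDEFS; d++)
                for (int g = 0; g < NDEFS; g++)
                    if ((gen >> g & 1u) && defs[d].var == defs[g].var) kill |= 1u << d;
            unsigned new_out = gen | (new_in & ~kill);
            if (new_in != in[s] || new_out != out[s]) {
                in[s] = new_in;
                out[s] = new_out;
                changed = 1;
            }
        }
    }
}

/* Is there a data edge def_stmt -> use_stmt labeled var? */
static int has_data_edge(int def_stmt, int use_stmt, int var) {
    assert(use_stmt >= 1 && use_stmt < NSTMTS);
    reaching_definitions();
    for (int d = 0; d < NDEFS; d++)
        if (defs[d].stmt == def_stmt && defs[d].var == var &&
            (in[use_stmt] >> d & 1u) && (uses[use_stmt] >> var & 1u))
            return 1;
    return 0;
}

/* Print all data edges of the DFG and return how many there are. */
static int print_data_edges(void) {
    static const char *names[] = { "a", "b" };
    int count = 0;
    reaching_definitions();
    for (int s = 1; s < NSTMTS; s++)
        for (int d = 0; d < NDEFS; d++)
            if ((in[s] >> d & 1u) && (uses[s] >> defs[d].var & 1u)) {
                printf("%d -> %d (%s)\n", defs[d].stmt, s, names[defs[d].var]);
                count++;
            }
    return count;
}
```

The analysis finds 18 data edges, including loop-carried ones such as 4 -> 2 (a) and 5 -> 4 (b), which arise because a value written in one iteration is read in the next.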

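Construction of CFG/DFG

    int L[3] = {10, 20, 30};        // 1
    for (int i = 1; i < 3; i++)     // 2a: i = 1;  2b: i < 3;  2c: i++
      L[i] = L[i] + L[i-1];         // 3

[Figure: the CFG, with control edges 1 -> 2a -> 2b, 2b -> 3 -> 2c -> 2b, and 2b -> exit.]
How should indexed variables be treated in DFG construction?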

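Construction of CFG/DFG
Two options for the data edges of the array L, shown as two DFGs over the nodes 1, 2a, 2b, 2c and 3:
(a) Treat L as a single monolithic variable: every data edge is labeled L, so any write to L is assumed to feed every later read of L.
(b) Treat the locations of L individually: edges are labeled with specific elements, e.g., L[0], L[1] and L[2] from 1 to 3, and L[1] from 3 back to 3, because the write to L[1] in iteration i = 1 is read in iteration i = 2.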

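DFG Analysis: Loop Unrolling

    int L[3] = {10, 20, 30};
    L[1] = L[1] + L[0];
    L[2] = L[2] + L[1];

After unrolling, every array access has a constant index, so each location of L can be given its own data edges.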
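A quick C check that unrolling preserves behavior. The function names are hypothetical, and the sample array {10, 20, 30} is just test data. In the unrolled version the second statement reads the L[1] written by the first, which is exactly the single loop-carried edge L[1] that the per-location DFG exposes.

```c
#include <assert.h>

/* The loop from the slide: L[i] depends on L[i-1]. */
void accumulate_loop(int L[3]) {
    for (int i = 1; i < 3; i++)
        L[i] = L[i] + L[i - 1];
}

/* The same computation fully unrolled: every index is now a constant,
   so each read names one specific array location. */
void accumulate_unrolled(int L[3]) {
    L[1] = L[1] + L[0];
    L[2] = L[2] + L[1];   /* reads the L[1] written one line above */
}
```

Starting from {10, 20, 30}, both versions produce {10, 30, 60}.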

Translating C to HW
Assumptions:
- Scalar C programs: no pointers and no arrays.
- Each statement is implemented in one clock cycle.
Basic idea:
- Construct the CFG and DFG.
- CFG => controller (control edges -> control signals).
- DFG => datapath (data edges -> component connections).
The result is not very efficient; many optimization opportunities exist.

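HW RTL Architecture
[Figure: a controller and a datapath. The controller takes control inputs and produces control outputs; the datapath takes data inputs and produces data outputs. The controller drives the datapath through control signals, and the datapath returns status signals to the controller.]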

Translating C to HW: Building Datapath
- Each variable => a register. A MUX is used if a variable is updated in multiple statements.
- Each expression => combinational logic.
- Conditional expressions => flags to the controller.
- Datapath circuits and registers are connected according to the data edges in the DFG.

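Translating C to HW: Building Datapath
For the gcd function above:
[Figure: the GCD datapath. Register a is loaded through a MUX (select upd_a) choosing in_a or a - b, and register b through a MUX (select upd_b) choosing in_b or b - a. A != comparator on a and b produces flag_while, a > comparator produces flag_if, and out_a is read from register a.]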

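Translating C to HW: Building Controller
[Figure: FSM with states s1 to s6 and edges labeled condition / action, where "_" means no condition or no action: s1 -> s2 on _ / run1; s2 -> s3 on flag_while / _; s2 -> s6 on !flag_while / _; s3 -> s4 on flag_if / _; s3 -> s5 on !flag_if / _; s4 -> s2 on _ / run4; s5 -> s2 on _ / run5.]
Label the CFG edges with the flags from the datapath and the actions that the datapath should perform, then implement the CFG as an FSM.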

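Translating C to HW: Building Controller
[Figure: the next-state logic takes the current state and the datapath flags flag_while and flag_if, and issues a command from {_, run1, run4, run5}. A lookup table turns the command into the datapath instruction (upd_a, upd_b), which selects the values loaded into registers a and b.]

    command | upd_a | upd_b
    --------+-------+------
    _       | a     | b
    run1    | in_a  | in_b
    run4    | a - b | b
    run5    | a     | b - a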
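Putting the controller and the datapath together, the C sketch below simulates the GCD FSMD one clock cycle per loop iteration. It is a behavioral model under stated assumptions, not generated hardware: the state and command names follow the FSM and the lookup table, while `Datapath`, `next_state`, `datapath_clock` and `gcd_fsmd` are hypothetical. The flags are computed combinationally from the current register values, and the command issued on a transition updates the registers on the same clock edge.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { S1, S2, S3, S4, S5, S6 } State;
typedef enum { CMD_NOP, CMD_RUN1, CMD_RUN4, CMD_RUN5 } Command; /* "_", run1, run4, run5 */

typedef struct { int a, b; } Datapath;   /* the two registers */

/* Controller: next-state logic plus the command issued on each
   transition, following the FSM labels "condition / action". */
static State next_state(State s, bool flag_while, bool flag_if, Command *cmd) {
    *cmd = CMD_NOP;
    switch (s) {
    case S1: *cmd = CMD_RUN1; return S2;
    case S2: return flag_while ? S3 : S6;
    case S3: return flag_if ? S4 : S5;
    case S4: *cmd = CMD_RUN4; return S2;
    case S5: *cmd = CMD_RUN5; return S2;
    default: return S6;                    /* S6: done, hold */
    }
}

/* Datapath: the lookup table maps a command to the MUX selections
   feeding registers a and b; both registers update on the same edge. */
static void datapath_clock(Datapath *dp, Command cmd, int in_a, int in_b) {
    int upd_a = dp->a, upd_b = dp->b;      /* "_": hold current values */
    switch (cmd) {
    case CMD_RUN1: upd_a = in_a;          upd_b = in_b;          break;
    case CMD_RUN4: upd_a = dp->a - dp->b;                        break;
    case CMD_RUN5:                        upd_b = dp->b - dp->a; break;
    default: break;
    }
    dp->a = upd_a;
    dp->b = upd_b;
}

/* Run the FSMD until it reaches S6; return out_a and count the cycles. */
int gcd_fsmd(int in_a, int in_b, int *cycles) {
    Datapath dp = {0, 0};
    State s = S1;
    assert(in_a > 0 && in_b > 0);
    *cycles = 0;
    while (s != S6) {
        bool flag_while = dp.a != dp.b;    /* status signals */
        bool flag_if = dp.a > dp.b;
        Command cmd;
        State next = next_state(s, flag_while, flag_if, &cmd);
        datapath_clock(&dp, cmd, in_a, in_b);   /* clock edge */
        s = next;
        (*cycles)++;
    }
    return dp.a;                           /* out_a */
}
```

For inputs 12 and 8 the model stops after 8 cycles with out_a = 4. Each loop iteration costs three cycles (s2, s3, then s4 or s5), which makes the one-statement-per-cycle cost of this translation directly visible.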

Limitations
- Each variable is mapped to a register.
- A functional unit is allocated to every operator.
- Performance bottleneck: only a single statement is executed per clock cycle, which is what a processor already does.
Can multiple statements be executed in one cycle?

Translating C to HW: Single-Assignment Form
Each variable is assigned exactly once, to improve the efficiency of the HW implementation.

    a = a + 1;       a2 = a1 + 1;
    a = a * 3;       a3 = a2 * 3;
    a = a - 2;       a4 = a3 - 2;
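A small C check that the renaming preserves behavior; the function names are hypothetical. In the single-assignment version each intermediate value has its own name, so the three operations form an explicit chain of data edges that could, in principle, be mapped to one block of combinational logic (an adder, a multiplier and a subtractor in series) instead of three successive updates of one register.

```c
#include <assert.h>

/* Original: one variable a, reassigned three times. */
int chain_reassign(int a) {
    a = a + 1;
    a = a * 3;
    a = a - 2;
    return a;
}

/* Single-assignment form: every value gets its own name. */
int chain_ssa(int a1) {
    const int a2 = a1 + 1;
    const int a3 = a2 * 3;
    const int a4 = a3 - 2;
    return a4;
}
```

Both functions map 5 to (5 + 1) * 3 - 2 = 16.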

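Translating C to HW: Single-Assignment Form
Original:

    int gcd(int a, int b) {
      while (a != b) {
        if (a > b)
          a = a - b;
        else
          b = b - a;
      }
      return a;
    }

Single-assignment form:

    int gcd(int a1, int b1) {
      while (merge(a1, a2) != merge(b1, b2)) {
        a3 = merge(a1, a2);
        b3 = merge(b1, b2);
        if (a3 > b3)
          a2 = a3 - b3;
        else
          b2 = b3 - a3;
      }
      return merge(a1, a2);  // the value of a on loop exit
    }

merge(x1, x2) selects whichever operand currently holds the value of the variable, i.e., the one defined along the control path actually taken; it plays the role of the φ-function in SSA form.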
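merge() is not C, so running the single-assignment gcd requires a model of it. The sketch below assumes that merge(x1, x2) behaves like a 2:1 MUX whose select line records whether the loop-body definition x2 has been written yet; the flag variables, the three-argument `merge` helper and the name `gcd_ssa` are hypothetical additions, not part of the slide. Each name still has exactly one assignment in the program text, even though a2 and b2 may be written in several iterations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of merge(): a 2:1 MUX selecting the loop-body
   definition x2 once it has been written, otherwise the entry value x1. */
static int merge(int x1, int x2, bool x2_written) {
    return x2_written ? x2 : x1;
}

/* Single-assignment gcd; requires a1 > 0 and b1 > 0. */
int gcd_ssa(int a1, int b1) {
    int a2 = 0, b2 = 0, a3, b3;
    bool a2_written = false, b2_written = false;   /* MUX select lines */
    assert(a1 > 0 && b1 > 0);
    while (merge(a1, a2, a2_written) != merge(b1, b2, b2_written)) {
        a3 = merge(a1, a2, a2_written);
        b3 = merge(b1, b2, b2_written);
        if (a3 > b3) { a2 = a3 - b3; a2_written = true; }
        else         { b2 = b3 - a3; b2_written = true; }
    }
    return merge(a1, a2, a2_written);              /* value of a on exit */
}
```

gcd_ssa(21, 6) returns 3, and gcd_ssa(7, 7) returns 7 without entering the loop, which is why the return value must also go through merge rather than read a2 directly.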

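Translating C to HW: Single-Assignment Form
Datapath for the single-assignment gcd above:
[Figure: two MUXes implement merge(a1, a2) -> a3 and merge(b1, b2) -> b3. A != comparator on the merged values produces flag_while, a > comparator on a3 and b3 produces flag_if, and two subtractors compute a2 = a3 - b3 and b2 = b3 - a3, which feed back into the MUXes.]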

Reading Guide
Chapter 4, the CoDesign book.