Chapter 8 Folding. VLSI DSP 2008 Y.T. Hwang 8-1. Introduction (1)
Folding is a DSP architecture technique in which multiple operations are time-multiplexed onto a single functional unit.
It trades area for time: the number of functional units is reduced by a factor of N at the expense of increasing the computation time by a factor of N, where N is the folding factor.
This chapter presents a systematic way to derive the folded DSP architecture.
Introduction (2): Folding example
$y(n) = a(n) + b(n) + c(n)$ is time-multiplexed on a single pipelined adder. Each input sample must remain valid for 2 clock cycles.

Introduction (3): More on folding
Folding may lead to an architecture that uses a large number of registers, so the design should minimize the number of registers.
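The folding example can be sketched at cycle level. This is a minimal model, assuming a single adder with one pipeline stage and folding factor N = 2: in cycle 2l the adder forms a(l) + b(l), and in cycle 2l + 1 it adds c(l) to the partial sum that has just left the pipeline. The function name and this exact schedule are illustrative assumptions, not taken from the slides. Because c(l) is consumed one cycle after a(l) and b(l), the inputs of iteration l must be held across both cycles.

```python
def folded_three_input_add(a, b, c):
    """y(l) = a(l) + b(l) + c(l) on ONE adder with a 1-stage pipeline, folding factor N = 2."""
    N = 2
    pipe = None                                 # (tag, value) held in the adder's pipeline register
    y = []
    for t in range(N * len(a) + 1):             # one extra cycle drains the pipeline
        l, phase = divmod(t, N)
        done, pipe = pipe, None                 # sum issued last cycle leaves the adder now
        if done is not None and done[0] == "final":
            y.append(done[1])                   # y(l - 1) is complete
        if l < len(a):
            if phase == 0:                      # cycle 2l: a(l) + b(l)
                pipe = ("partial", a[l] + b[l])
            else:                               # cycle 2l + 1: partial sum + c(l)
                pipe = ("final", done[1] + c[l])
    return y

print(folded_three_input_add([1, 2, 3], [10, 20, 30], [100, 200, 300]))  # [111, 222, 333]
```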
Folding transformation (1): Preliminaries
Consider a DFG edge e from node U to node V with w(e) delays.
The l-th iterations of U and V are executed at time units Nl + u and Nl + v, where u and v are the folding orders, $0 \le u, v \le N-1$, and N is the folding factor (the number of operations folded onto a single functional unit).
$H_U$ and $H_V$ are the functional units that execute nodes U and V; $H_U$ is pipelined by $P_U$ stages.

Folding transformation (2): Folding an edge
The result of the l-th iteration of U is available at time $Nl + u + P_U$. Because e has w(e) delays, it is used by the (l + w(e))-th iteration of V, executed at $N(l + w(e)) + v$. The result must therefore be stored for
$D_F(U \to V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$
delays, which does not depend on the iteration index l.
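The folding equation can be checked against its own derivation. The sketch below computes $D_F(U \to V)$ both from the closed form and as "time consumed minus time produced" for several iterations l; the function names and the numbers in the example (N = 4, w(e) = 2, $P_U$ = 1, u = 3, v = 1) are arbitrary illustrative choices.

```python
def folded_delay(N, w, P_U, u, v):
    """Closed form: D_F(U->V) = N*w(e) - P_U + v - u."""
    if not (0 <= u < N and 0 <= v < N):
        raise ValueError("folding orders must lie in [0, N-1]")
    return N * w - P_U + v - u

def folded_delay_from_times(N, w, P_U, u, v, l=0):
    """Same quantity from the definition, for iteration l."""
    produced = N * l + u + P_U        # result of U's l-th iteration is ready
    consumed = N * (l + w) + v        # used by V's (l + w)-th iteration
    return consumed - produced

print(folded_delay(4, 2, 1, 3, 1))                                  # 5
print([folded_delay_from_times(4, 2, 1, 3, 1, l) for l in range(4)])  # [5, 5, 5, 5]
```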
Folding transformation (3): Folding set
A folding set is an ordered set of operations executed by the same hardware. Example: for $S_1 = \{A_1, \emptyset, A_2\}$, $A_1$ is executed at position $(S_1|0)$ and $A_2$ at $(S_1|2)$, where $\emptyset$ denotes a null operation.
Biquad filter example:
Addition takes 1 u.t. with 1-stage pipelining, $P_A = 1$.
Multiplication takes 2 u.t. with 2-stage pipelining, $P_M = 2$.
The folding factor is N = 4, with folding sets $S_1 = \{4, 2, 3, 1\}$ (adder) and $S_2 = \{5, 8, 6, 7\}$ (multiplier).

Folding transformation (4): Biquad filter example (cont.)
Node 3 is at position 2 of $S_1$, so it is executed on the adder at time instances 4l + 2.
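Folding sets map directly to folding orders and execution times. A small sketch, assuming each folding set is given as an ordered list per functional unit (with `None` standing in for the null operation $\emptyset$); the helper names are hypothetical.

```python
def folding_orders(folding_sets):
    """Map node -> (functional unit, folding order) from ordered folding sets."""
    orders = {}
    for unit, ops in folding_sets.items():
        for position, node in enumerate(ops):
            if node is not None:              # None plays the role of the null operation
                orders[node] = (unit, position)
    return orders

def exec_time(node, l, N, orders):
    """Time unit at which iteration l of `node` runs: N*l + folding order."""
    return N * l + orders[node][1]

N = 4
sets = {"adder": [4, 2, 3, 1], "multiplier": [5, 8, 6, 7]}   # S1 and S2 of the biquad example
orders = folding_orders(sets)
print(orders[3], [exec_time(3, l, N, orders) for l in range(3)])  # ('adder', 2) [2, 6, 10]
```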
Folding transformation (5): Biquad filter example (cont.)
$D_F(1 \to 8) = 5$: the folded DFG contains an edge from the adder to the multiplier with 5 delays. Because node 8 is at position $(S_2|1)$, this folded edge is switched at the multiplier input at time instances 4l + 1.

Folding transformation (6): Valid folding
$D_F(U \to V) \ge 0$ must hold for all edges of the DFG. This can be achieved by retiming. Recall that after retiming, edge e has $w_r(e) = w(e) + r(V) - r(U)$ delays. Let $D'_F(U \to V)$ denote the number of folded delays obtained by folding the retimed DFG:
$D'_F(U \to V) = N w_r(e) - P_U + v - u = N[w(e) + r(V) - r(U)] - P_U + v - u = D_F(U \to V) + N[r(V) - r(U)] \ge 0$,
so $r(U) - r(V) \le \lfloor D_F(U \to V) / N \rfloor$ (the floor applies because retiming values are integers).
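The folding equations of the biquad example can be tabulated and tested for validity in a few lines. The transcript does not include the DFG figure, so the retimed edge list and delays below are an assumed reconstruction, chosen to be consistent with the values quoted in these slides ($D_F(1 \to 8) = 5$, $D_F(1 \to 7) = 3$, $D_F(1 \to 2) = 1$); nodes 1 to 4 are adders and 5 to 8 multipliers.

```python
N = 4
P = {n: (1 if n <= 4 else 2) for n in range(1, 9)}          # P_A = 1 (nodes 1-4), P_M = 2 (nodes 5-8)
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}      # positions in S1 and S2
# ASSUMED retimed biquad DFG: (U, V) -> w_r(e)
retimed = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2,
           (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

def folded_delay(U, V, w):
    """D_F(U->V) = N*w - P_U + v - u."""
    return N * w - P[U] + order[V] - order[U]

D = {e: folded_delay(e[0], e[1], w) for e, w in retimed.items()}
assert all(d >= 0 for d in D.values()), "negative D_F: folding is not valid"
print(D[(1, 8)], D[(1, 7)], D[(1, 2)])   # 5 3 1
```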
Folding transformation (7): Retiming for valid folding
Solve the resulting system of inequalities: first construct a constraint graph, then use the Floyd-Warshall algorithm to solve the shortest-path problem.

Folding transformation (8): Retiming for valid folding (cont.)
From the constraint graph, a solution is $r(1) = -1$, $r(2) = 0$, $r(3) = -1$, $r(4) = 0$, $r(5) = -1$, $r(6) = -1$, $r(7) = -2$, $r(8) = -1$. It leads to the DFG in Fig. 6.3. The same result can be obtained equivalently by cutset retiming using cutsets C1 and C2.
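The retiming step can be sketched as a shortest-path computation. Each inequality $r(U) - r(V) \le \lfloor D_F(U \to V)/N \rfloor$ becomes an arc V → U of that weight; an extra source node with zero-weight arcs to every node is added, and Floyd-Warshall gives r(n) as the distance from the source. The original (un-retimed) biquad edge list below is an assumed reconstruction, since the DFG figure is not in the transcript; with it, the sketch reproduces the retiming values listed on the slide.

```python
import math

N = 4
P = {n: (1 if n <= 4 else 2) for n in range(1, 9)}           # adders 1-4, multipliers 5-8
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}       # from S1 and S2
# ASSUMED original biquad DFG: (U, V) -> w(e)
edges = {(1, 2): 0, (1, 5): 1, (1, 6): 1, (1, 7): 2, (1, 8): 2,
         (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 0, (8, 4): 0}

def retiming_for_folding(edges, nodes):
    src = 0                                    # extra source node
    allv = [src] + list(nodes)
    dist = {(i, j): (0 if i == j else math.inf) for i in allv for j in allv}
    for n in nodes:
        dist[(src, n)] = 0
    for (U, V), w in edges.items():
        d_f = N * w - P[U] + order[V] - order[U]
        # r(U) - r(V) <= floor(D_F / N)  ->  arc V -> U; Python's // floors negatives too
        dist[(V, U)] = min(dist[(V, U)], d_f // N)
    for k in allv:                             # Floyd-Warshall
        for i in allv:
            for j in allv:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    if any(dist[(v, v)] < 0 for v in allv):
        raise ValueError("negative cycle: no retiming gives a valid folding")
    return {n: dist[(src, n)] for n in nodes}

r = retiming_for_folding(edges, range(1, 9))
print([r[n] for n in range(1, 9)])   # [-1, 0, -1, 0, -1, -1, -2, -1]
```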
Folding transformation: More on folding
The original DFG and the N-unfolded version of the folded DFG (synthesized with folding factor N) are retimed and/or pipelined versions of each other. An arbitrary DFG can be unfolded by a factor N and then folded again to generate a family of architectures.

Register minimization in folding: Lifetime analysis
Lifetime analysis computes the minimum number of registers required to implement a DSP algorithm in hardware. A data sample (variable) is live from the time it is produced (excluded) through the time it is consumed (included); after its lifetime the variable is dead. The maximum number of live variables over all time units is the minimum number of registers required to implement the DSP program.
Register minimization in folding: Example
Assume 3 variables a, b and c with lifetimes a: {1, 2, 3, 4}, b: {2, 3, 4, 5, 6, 7} and c: {5, 6, 7}. The numbers of live variables at times 1 to 7 are {1, 2, 2, 2, 2, 2, 2}, so 2 registers are needed to implement the DSP program.

Linear lifetime chart
When the iteration period is shorter than the span of the schedule, successive iterations overlap. The number of live variables at time instance n is then the sum of the numbers of live variables at cycles n - kN, for all k ≥ 0.
(Figure: non-overlapped schedule vs. overlapped schedule with schedule period 6.)
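Lifetime analysis reduces to counting live variables per time unit. A minimal sketch, assuming each variable is given as a (time produced, time consumed) pair so that it is live on produced < t ≤ consumed; when an iteration period N is supplied, counts are folded modulo N, which implements the "sum over n - kN" rule for overlapping schedules. The helper name is hypothetical.

```python
from collections import Counter

def live_counts(lifetimes, N=None):
    """Live-variable count per time unit (folded modulo N if N is given)."""
    counts = Counter()
    for t_prod, t_cons in lifetimes.values():
        for t in range(t_prod + 1, t_cons + 1):     # produced (excluded) .. consumed (included)
            counts[t % N if N is not None else t] += 1
    return counts

# Slide example: a live at {1..4}, b at {2..7}, c at {5..7}
example = {"a": (0, 4), "b": (1, 7), "c": (4, 7)}
counts = live_counts(example)
print([counts[t] for t in range(1, 8)], max(counts.values()))  # [1, 2, 2, 2, 2, 2, 2] 2
```

The minimum register count is the maximum of the returned counts; with N set, a long-lived variable can overlap its own next iteration and contribute twice to the same cycle.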
Linear lifetime chart (2): Matrix transpose example
Assume row-wise access of a 3×3 matrix with rows (a, b, c), (d, e, f), (g, h, i): samples arrive in the order a, b, c, d, e, f, g, h, i and must leave column-wise in the order a, d, g, b, e, h, c, f, i.
$T_{input}$: input time; $T_{zlout}$: zero-latency output time; $T_{diff} = T_{zlout} - T_{input}$.
The required latency $T_{lat}$ is the magnitude of the most negative value of $T_{diff}$, and $T_{output} = T_{zlout} + T_{lat}$.

Linear lifetime chart (3): Matrix transpose example (cont.)
Assume the iteration period of the DSP program is N = 9.
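The latency and register count of the matrix transpose follow from the definitions above. This sketch assumes one sample per cycle with input times 0 to 8; it computes $T_{diff}$, $T_{lat}$ and $T_{output}$, then folds the lifetimes modulo N = 9 to obtain the register count.

```python
from collections import Counter

inputs = "abcdefghi"          # row-wise order: rows (a b c), (d e f), (g h i)
outputs = "adgbehcfi"         # column-wise order
N = 9                         # iteration period

t_input = {x: t for t, x in enumerate(inputs)}            # assumed: inputs at t = 0..8
t_zlout = {x: t for t, x in enumerate(outputs)}
t_diff = {x: t_zlout[x] - t_input[x] for x in inputs}
t_lat = -min(min(t_diff.values()), 0)                      # magnitude of most negative T_diff
t_output = {x: t_zlout[x] + t_lat for x in inputs}

live = Counter()
for x in inputs:
    for t in range(t_input[x] + 1, t_output[x] + 1):      # produced (excl.) .. consumed (incl.)
        live[t % N] += 1

print(t_lat, max(live.values()))   # 4 4
```

Under these assumptions the transpose needs a latency of 4 cycles (set by g, whose $T_{diff}$ is -4) and 4 registers.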
Circular lifetime chart
In a circular lifetime chart, point i represents time partition i, i.e. all time instances {Nl + i}.
(Figure: linear vs. circular lifetime chart.)

Data allocation (1): Forward-backward register allocation
Goal: achieve the minimum number of registers by determining how variables are assigned to registers in the allocation table.
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime. If multiple variables are input in a given cycle, allocate them to multiple registers in descending order of lifetime.
Data allocation (2): Forward allocation
If register $R_i$ holds a variable in the current cycle, then register $R_{i+1}$ holds the same variable in the next cycle. If $R_{i+1}$ is not available, the variable is allocated to the first available forward register.
Step 3: Allocate each variable in a forward manner until it is dead or reaches the last register.
Step 4: In periodic scheduling, the allocation of the current iteration repeats itself in subsequent iterations: if $R_j$ is occupied by a variable in cycle l, hash the position of $R_j$ at time unit l + N.

Data allocation (3)
Step 5: For a variable that reaches the last register and is not yet dead, allocate it in a backward manner. If multiple registers are available, choose the one with the least but sufficient number of forward registers to complete the allocation. After a variable has been allocated backward, allocate it forward again until it is dead or again reaches the last register.
Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
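Steps 2 to 5 can be sketched as a small allocator. This is a simplified reading of the procedure, not a definitive implementation: the tie-breaking details (first free register on input, lowest-index backward register when none is sufficient) are assumptions, and the sketch simply raises an error if it gets stuck. Occupancy is keyed by time modulo N, which implements the hashing of Step 4. The usage example feeds it the 3×3 matrix transpose lifetimes, assuming input times 0 to 8 and $T_{lat}$ = 4.

```python
def forward_backward_allocate(lifetimes, N, R):
    """lifetimes: {var: (t_in, t_out)}; returns {var: [register index per live cycle]}."""
    busy = {}                                        # (register, t mod N) -> var  (Step 4 hashing)
    alloc = {}

    def is_free(j, t):
        return (j, t % N) not in busy

    # Step 2: place variables by first live cycle, longer lifetimes first on ties.
    order = sorted(lifetimes, key=lambda v: (lifetimes[v][0], lifetimes[v][0] - lifetimes[v][1]))
    for v in order:
        t_in, t_out = lifetimes[v]
        times = range(t_in + 1, t_out + 1)           # live: produced (excl.) .. consumed (incl.)
        regs = []
        for k, t in enumerate(times):
            remaining = len(times) - k               # live cycles still to place, including t
            if not regs:                             # input: first free register
                cands = [j for j in range(R) if is_free(j, t)]
            else:
                i = regs[-1]
                cands = [j for j in range(i + 1, R) if is_free(j, t)]    # Step 3: forward
                if not cands:                                             # Step 5: backward
                    back = [j for j in range(i) if is_free(j, t)]
                    enough = [j for j in back if R - j >= remaining]
                    cands = [max(enough)] if enough else back             # least but sufficient
            if not cands:
                raise ValueError(f"sketch could not place {v} at t={t} with {R} registers")
            regs.append(cands[0])
            busy[(cands[0], t % N)] = v
        alloc[v] = regs
    return alloc

# 3x3 matrix transpose lifetimes (t_in, t_out), assuming inputs at t = 0..8 and T_lat = 4
transpose = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}
alloc = forward_backward_allocate(transpose, N=9, R=4)
print(alloc["c"])   # [0, 1, 2, 3, 0, 1, 2, 3]  (indices 0..3 stand for R1..R4)
```

Variable c reaches the last register at t = 6, is still live for four more cycles, and is therefore sent back to R1 (the only register with enough forward registers left) before moving forward again.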
Data allocation (4): 3×3 matrix transpose example with N = 9
(Figures: hashing; allocation table after completing Steps 1 to 4.)

Data allocation (5): Another example
(Figures: linear lifetime chart; allocation table after completing Steps 1 to 4.)
Data allocation (6): Architecture design after register allocation
(Figure: resulting architecture.)

Data allocation (7): Architecture design after register allocation
(Figure: resulting architecture.)
Register minimization in folding: Goal
Synthesize the control circuits of folded architectures with the minimum number of registers.
Procedure:
1. Perform retiming for folding.
2. Write the folding equations.
3. Use the folding equations to construct a lifetime table.
4. Draw the lifetime chart and determine the required number of registers.
5. Perform forward-backward register allocation.
6. Draw the folded architecture that uses the minimum number of registers.

Biquad filter example (1)
(Figures: original biquad filter design; design after retiming.)
Biquad filter example (2): Design without register minimization
A total of 6 external registers and 3 internal pipelining registers.
(Figures: folding equations; folded architecture.)

Biquad filter example (3): Constructing the lifetime table
Each node U with lifetime $T_{input} \to T_{output}$ corresponds to an entry in the lifetime table, where
$T_{input} = u + P_U$ (u: folding order of U; $P_U$: number of pipelining stages of its functional unit),
$T_{output} = u + P_U + \max_V \{D_F(U \to V)\}$.
For node 1, the folding order is 3 and the adder's $P_A$ is 1, so $T_{input} = 3 + 1 = 4$ and $T_{output} = 3 + 1 + \max\{1, 0, 2, 3, 5\} = 9$.
Biquad filter example (4): Lifetime table and lifetime chart
Assuming an iteration period N = 4, the minimum number of registers required is 2.

Biquad filter example (5): Allocation table
Only the variables n1, n7 and n8, which have non-zero lifetimes, are shown. Variable n1 is output in cycles 4, 5, 6, 7 and 9 (that is, $T_{input} = 4$ plus each of its folded delays 0, 1, 2, 3 and 5); only the latest cycle, 9, is shown in the table.
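The lifetime table and register count of the biquad example can be reproduced from the folding equations. The folded delays below are those of the retimed biquad; the delays on edges other than those of node 1 (quoted on the slides as {1, 0, 2, 3, 5}) come from an assumed reconstruction of the DFG, which the transcript does not show. Node 2 drives only the filter output here, so it gets no table entry.

```python
from collections import Counter

N = 4
P = {n: (1 if n <= 4 else 2) for n in range(1, 9)}            # adders 1-4, multipliers 5-8
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}        # from S1, S2
# Folded delays D_F(U->V) of the retimed biquad (ASSUMED edge list)
D_F = {(1, 2): 1, (1, 5): 0, (1, 6): 2, (1, 7): 3, (1, 8): 5,
       (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 1, (8, 4): 1}

table = {}
for U in range(1, 9):
    outs = [d for (src, _), d in D_F.items() if src == U]
    if outs:
        t_in = order[U] + P[U]                  # T_input = u + P_U
        table[U] = (t_in, t_in + max(outs))     # T_output = u + P_U + max D_F(U->V)

live = Counter()
for t_in, t_out in table.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1                        # fold the linear chart with period N

print(table[1], table[7], table[8], max(live.values()))   # (4, 9) (5, 6) (3, 4) 2
```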
Biquad filter example (6): Folded design with registers
Edge 1→2 has $D_F(1 \to 2) = 1$ delay; after 1 delay, variable n1 is located in R1. An edge from R1 to the adder is therefore switched at 4l + 1, because node 2 has folding order 1.

Biquad filter example (7): Folded design with registers (cont.)
Edge 1→7 has $D_F(1 \to 7) = 3$ delays; after 3 delays, variable n1 is located in R1. An edge from R1 to the multiplier is therefore switched at 4l + 3, because node 7 has folding order 3.
IIR filter example (1)
IIR filter before retiming: $y(n) = a\,y(n-3) + b\,y(n-5) + x(n)$.
Folding factor N = 2. Folding sets: the adder set $S_A$ contains nodes 1 and 2, and the multiplier set is $S_M = \{4, 3\}$.
Retiming solution: r(1) = 0 and r(2) = 0, while the multiplier nodes 3 and 4 receive negative retiming values.

IIR filter example (2)
IIR filter after retiming: the folding equations of the retimed DFG give folded delays of 0, 5, 3, 4 and 0.
(Figure: lifetime table.)
IIR filter example (3): Lifetime chart
A total of 3 registers is needed.

IIR filter example (4): Allocation table and folded design
3 registers with register minimization vs. 6 registers without.
Folding of multirate systems (1)
Decimators and expanders lead to a multirate system.
Decimation by M (decimator): throw away M - 1 out of every M samples, $y_D(n) = x(Mn)$.
Expansion by M (expander): insert M - 1 zeros between consecutive samples, $y_E(n) = x(n/M)$ if n is a multiple of M, and 0 otherwise.

Folding of multirate systems (2): Folding of a decimator
(Figures: arc with a decimator; folded arc.)
The l-th iteration of node U is executed at time $N_U l + u$, and the l-th iteration of node V at time $N_V l + v$, with folding orders $u \in [0, N_U - 1]$ and $v \in [0, N_V - 1]$.
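The two multirate building blocks are easy to state on finite sequences. A minimal sketch using Python list slicing; the function names are illustrative, and the expander returns len(x)·M samples (including trailing zeros).

```python
def decimate(x, M):
    """y_D(n) = x(Mn): keep every M-th sample, discard the other M - 1."""
    return x[::M]

def expand(x, M):
    """y_E(n) = x(n/M) if M divides n, else 0: insert M - 1 zeros between samples."""
    y = [0] * (len(x) * M)
    y[::M] = x                     # extended-slice assignment places x at multiples of M
    return y

print(decimate([0, 1, 2, 3, 4, 5, 6], 3))   # [0, 3, 6]
print(expand([1, 2, 3], 2))                 # [1, 0, 2, 0, 3, 0]
```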
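Folding of multirate systems (3): Folding of a decimator (cont.)
Let x be the output of U, $s(l) = x(Ml)$ the decimator output, and $y(l) = s(l - w(e)) = x(Ml - Mw(e))$ the input of V after the w(e) delays. Sample y(l), consumed during the l-th iteration of V, is therefore produced during the (Ml - Mw(e))-th iteration of U.
y(l) is consumed by $H_V$ in time unit $N_V l + v$ and generated by $H_U$ in time unit $N_U(Ml - Mw(e)) + u + P_U$, so it must be stored for
$D_F(U \to V) = [N_V l + v] - [N_U(Ml - Mw(e)) + u + P_U] = (N_V - M N_U)\,l + M N_U w(e) - P_U + v - u$.

Folding of multirate systems (4): Folding of a decimator (cont.)
In a decimator, $N_V = M N_U$: node U executes M times for each execution of node V. The l-dependent term vanishes and the folding equation becomes $D_F(U \to V) = N_V w(e) - P_U + v - u$.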
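The role of the condition $N_V = M N_U$ can be checked numerically. This sketch evaluates the storage time straight from the production and consumption times derived above; the parameter values (N_U = 2, M = 3, w(e) = 1, P_U = 1, u = 0, v = 5) are arbitrary illustrative choices.

```python
def decimator_folded_delay(l, N_U, N_V, M, w, P_U, u, v):
    """Storage time of y(l) on an arc U -> (decimate by M) -> w delays -> V."""
    consumed = N_V * l + v                         # V's l-th iteration
    produced = N_U * (M * l - M * w) + u + P_U     # U's (Ml - Mw)-th iteration
    return consumed - produced

# With N_V = M * N_U = 6 the result is N_V*w - P_U + v - u = 6 - 1 + 5 - 0 = 10 for every l
print([decimator_folded_delay(l, 2, 6, 3, 1, 1, 0, 5) for l in range(3)])   # [10, 10, 10]
# With N_V = 4 (not equal to M * N_U) the storage time drifts with l
print([decimator_folded_delay(l, 2, 4, 3, 1, 1, 0, 5) for l in range(3)])   # [10, 8, 6]
```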
Folding of multirate systems (5): Decimator folding example
The folding factors (with $N_V = M N_U$ across each decimator), the folding orders u, v and the pipelining stages P of the nodes are specified, and one folding equation is written for each edge.
(Figure: decimator example and its folding equations.)

Folding of multirate systems (6): Decimator folding example (cont.)
The number of registers required can be reduced using lifetime analysis. $D_F \ge 0$ must hold, given a feasible schedule.
Noble identities: delay redistribution in a multirate system (M delays before a decimator by M are equivalent to one delay after it).
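The noble identity used for delay redistribution can be verified on a short sequence. This sketch assumes zero initial state for the delays; it shows that M delays followed by decimation by M equal decimation followed by one delay, while a single delay placed before the decimator selects a different phase of the input and has no equivalent after it.

```python
def delay(x, k):
    """k-sample delay with zero initial state (output truncated to len(x))."""
    return ([0] * k + x)[:len(x)]

def decimate(x, M):
    """Keep every M-th sample."""
    return x[::M]

x = list(range(1, 13))
M = 3
a = decimate(delay(x, M), M)     # M delays, then decimate by M
b = delay(decimate(x, M), 1)     # decimate by M, then 1 delay
c = decimate(delay(x, 1), M)     # 1 delay before the decimator: a different phase
print(a, b, c)   # [0, 1, 4, 7] [0, 1, 4, 7] [0, 3, 6, 9]
```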
Folding of multirate systems (7): Retiming of a multirate DFG
Let $w'_1$ and $w'_2$ be the numbers of delays on the arc before and after the decimator after retiming; $r_U$ and $r_V$ are the retiming values of nodes U and V, and $r_{UV}$ is the number of times one delay is removed from the decimator's output and M delays are added to its input. Then
$w'_1 = w_1 + M r_{UV} - r_U$, $w'_2 = w_2 + r_V - r_{UV}$,
and, with $N_V = M N_U$, the retimed folding equation must satisfy
$D'_F(U \to V) = N_U w'_1 + N_V w'_2 - P_U + v - u \ge 0$.

Folding of multirate systems (8): Retiming of a multirate DFG (cont.)
Note that retiming may not yield an equivalent result, because of the periodically time-varying nature of multirate systems. Example: assume $r_A = -1$ and $r_{MPY} = 0$; then $z(n) = a\,x(n) + y(n)$ becomes $z'(n) = a\,x(n-1) + y(n-1)$.
- Fall 2002 Lecture 20 Synthesis Sequential Logic Announcements Midterm 2 next Thursday, 6-7:30pm, 277 Cory Review session on Tuesday, 6-7:30pm, 277 Cory Homework 8 due next Tuesday Labs: project» Teams
More informationAdvanced Design System 1.5. DSP Synthesis
Advanced Design System 1.5 DSP Synthesis December 2000 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard
More informationResearch Article Design of Synthesizable, Retimed Digital Filters Using FPGA Based Path Solvers with MCM Approach: Comparison and CAD Tool
VLSI Design Volume 204, Article ID 28070, 8 pages http://dx.doi.org/0.55/204/28070 Research Article Design of Synthesizable, Retimed Digital Filters Using FPGA Based Path Solvers with MCM Approach: Comparison
More informationLearning Outcomes. Spiral 2-2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2-2 Arithmetic Components and Their Efficient Implementations I understand the control inputs to counters I can design logic to control the inputs of counters to create
More informationSymbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs
Symbolic Buffer Sizing for Throughput-Optimal Scheduling of Dataflow Graphs Anan Bouakaz Pascal Fradet Alain Girault Real-Time and Embedded Technology and Applications Symposium, Vienna April 14th, 2016
More informationReadings: Storage unit. Can hold an n-bit value Composed of a group of n flip-flops. Each flip-flop stores 1 bit of information.
Registers Readings: 5.8-5.9.3 Storage unit. Can hold an n-bit value Composed of a group of n flip-flops Each flip-flop stores 1 bit of information ff ff ff ff 178 Controlled Register Reset Load Action
More informationMapping Algorithms to Hardware By Prawat Nagvajara
Electrical and Computer Engineering Mapping Algorithms to Hardware By Prawat Nagvajara Synopsis This note covers theory, design and implementation of the bit-vector multiplication algorithm. It presents
More informationTHE LOGIC OF COMPOUND STATEMENTS
CHAPTER 2 THE LOGIC OF COMPOUND STATEMENTS Copyright Cengage Learning. All rights reserved. SECTION 2.5 Application: Number Systems and Circuits for Addition Copyright Cengage Learning. All rights reserved.
More informationMRPF: An Architectural Transformation for Synthesis of High-Performance and Low-Power Digital Filters
MRPF: An Architectural Transformation for Synthesis of High-Performance and Low-Power Digital Filters Hunsoo Choo, Khurram Muhammad, Kaushik Roy Electrical & Computer Engineering Department Texas Instruments
More informationWorkload Characterization Techniques
Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/
More informationRetiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.
Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum
More informationHigh Level Synthesis
High Level Synthesis Design Representation Intermediate representation essential for efficient processing. Input HDL behavioral descriptions translated into some canonical intermediate representation.
More information6. Algorithm Design Techniques
6. Algorithm Design Techniques 6. Algorithm Design Techniques 6.1 Greedy algorithms 6.2 Divide and conquer 6.3 Dynamic Programming 6.4 Randomized Algorithms 6.5 Backtracking Algorithms Malek Mouhoub, CS340
More informationLearning Outcomes. Spiral 2 2. Digital System Design DATAPATH COMPONENTS
2-2. 2-2.2 Learning Outcomes piral 2 2 Arithmetic Components and Their Efficient Implementations I know how to combine overflow and subtraction results to determine comparison results of both signed and
More informationUsing the DSP Blocks in Stratix & Stratix GX Devices
Using the SP Blocks in Stratix & Stratix GX evices November 2002, ver. 3.0 Application Note 214 Introduction Traditionally, designers had to make a trade-off between the flexibility of off-the-shelf digital
More informationHIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE
HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE Anni Benitta.M #1 and Felcy Jeba Malar.M *2 1# Centre for excellence in VLSI Design, ECE, KCG College of Technology, Chennai, Tamilnadu
More informationAcyclic orientations do not lead to optimal deadlock-free packet routing algorithms
Acyclic orientations do not lead to optimal deadloc-ree pacet routing algorithms Daniel Šteanovič 1 Department o Computer Science, Comenius University, Bratislava, Slovaia Abstract In this paper e consider
More informationLecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain
Lecture 4: Synchronous ata Flow Graphs - I. Verbauwhede, 05-06 K.U.Leuven HJ94 goal: Skiing down a mountain SPW, Matlab, C pipelining, unrolling Specification Algorithm Transformations loop merging, compaction
More informationMark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs
EE 352 Unit 8 HW Constructs Logic Circuits Combinational logic Perform a specific function (mapping of 2 n input combinations to desired output combinations) No internal state or feedback Given a set of
More informationLECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION. 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach
LECTURE 13: SOLUTION METHODS FOR CONSTRAINED OPTIMIZATION 1. Primal approach 2. Penalty and barrier methods 3. Dual approach 4. Primal-dual approach Basic approaches I. Primal Approach - Feasible Direction
More informationInstruction Fetch Energy Reduction Using Loop Caches For Embedded Applications with Small Tight Loops. Lea Hwang Lee, William Moyer, John Arends
Instruction Fetch Energy Reduction Using Loop Caches For Embedded Applications ith Small Tight Loops Lea Hang Lee, William Moyer, John Arends Instruction Fetch Energy Reduction Using Loop Caches For Loop
More informationIntel HLS Compiler: Fast Design, Coding, and Hardware
white paper Intel HLS Compiler Intel HLS Compiler: Fast Design, Coding, and Hardware The Modern FPGA Workflow Authors Melissa Sussmann HLS Product Manager Intel Corporation Tom Hill OpenCL Product Manager
More informationImplementing FIR Filters
Implementing FIR Filters in FLEX Devices February 199, ver. 1.01 Application Note 73 FIR Filter Architecture This section describes a conventional FIR filter design and how the design can be optimized
More informationCS435 Introduction to Big Data FALL 2018 Colorado State University. 9/24/2018 Week 6-A Sangmi Lee Pallickara. Topics. This material is built based on,
FLL 218 olorado State University 9/24/218 Week 6-9/24/218 S435 Introduction to ig ata - FLL 218 W6... PRT 1. LRGE SLE T NLYTIS WE-SLE LINK N SOIL NETWORK NLYSIS omputer Science, olorado State University
More information