Textbook: VLSI ARRAY PROCESSORS S.Y. Kung
|
|
- Eugenia Robinson
- 5 years ago
- Views:
Transcription
1 : 1/34 Textbook: VLSI ARRAY PROCESSORS S.Y. Kung Prentice-Hall, Inc. : INSTRUCTOR : CHING-LONG SU kevinsu@twins.ee.nctu.edu.tw
2 Chapter 4 2/34 Chapter 4 Systolic Array Processors
3 Outline of Chapter 4 3/ Introduction 4.2 Systolic Array Processors 4.4 Performance Analysis and esign Optimization 4.5 Systolic Arrays for the Transitive Closure and ynamic Programming Problems 4.6 Systolic esign for Artificial Neural Network 4.7 Conclusion Remarks 4.8 Problems
4 4.1 Introduction 4/ Introduction 4.2 Systolic Array Processors 4.4 Performance Analysis and esign Optimization 4.5 Systolic Arrays for the Transitive Closure and ynamic Programming Problems 4.6 Systolic esign for Artificial Neural Network 4.7 Conclusion Remarks 4.8 Problems
5 4.1 Introduction 5/34 In Chapter 4 1. Review the algorithm mapping onto SFG methodology 2. iscuss the cut-set systolization (retiming) method for systolic array design
6 4.2 Systolic Array Processors 6/ Introduction 4.2 Systolic Array Processors 4.4 Performance Analysis and esign Optimization 4.5 Systolic Arrays for the Transitive Closure and ynamic Programming Problems 4.6 Systolic esign for Artificial Neural Network 4.7 Conclusion Remarks 4.8 Problems
7 4.2 Systolic Array Processors 7/34 Systolic Array Processor 1. A systolic system is a network of processors which rhythmically compute and pass data through the system. 2. Systolic Array Processor avoids the classic memory access bottleneck problem. 3. Systolic Array Processor can solve the computebound and I/O-bound computations.
8 4.2 Systolic Array Processors 8/34 Basic Configuration of Systolic Array Memory PE PE PE PE PE PE PE PE
9 4.2 Systolic Array Processors 9/34 1. Synchrony: The data are rhythmically computed (timed by a global clock) and passed through the network. 2. Modularity and Regularity 3. Spatial Locality and Temporal Locality 4. Pipelinability efinition of Systolic Arrays
10 4.2 Systolic Array Processors 10/34 Example 1: Systolic Array for Convolution - u 1 - u 0 - y 0 - y 1 W 0 W 0 W 0 W 0 0 a in a out a out= a in b out W i b in b out= b in+ a in* W i
11 4.2 Systolic Array Processors 11/34 C out a 23 b 32 a 12 b 21 Example 2: Hexagonal Systolic Array for Band Matrix Multiplication T=4 T=3 a 32 a 31 T=2 a 22 a 21 T=1 a 11 T=1 c 11 b 11 T=1 b 22 b 12 T=2 b 23 b 13 T=3 T=4 T=2 c 21 c 12 T=3 c 31 c 22 c 13 T=4 c 41 c 32 c 23 c 14
12 4.2 Systolic Array Processors 12/34 Properties of Systolic Array 1. Simple and Regular esign 2. Concurrency and Communication 3. Balancing Computation with I/O
13 4.2 Systolic Array Processors 13/34 Clock istribution Scheme for Synchronization of the Systolic Array System: H-tree Layout for the Balance of the Clock Circuit elay Linear Array Square Array Hexagonal Array
14 4.2 Systolic Array Processors 14/34 Systolic vs. SIM vs. SFG Arrays Control Unit (Central Control) Global Communication Control Bus Processing Uuit Processing Uuit Processing Uuit ata Bus Interconnection Network (Local) SIM Array
15 4.2 Systolic Array Processors 15/34 Systolic vs. SIM vs. SFG Arrays Control Unit Control Unit Control Unit Processing Uuit Processing Uuit Processing Uuit Interconnection Network (Local) Systolic Array
16 4.2 Systolic Array Processors 16/34 Systolic vs. SIM vs. SFG Arrays Control Unit Control Unit Control Unit Processing Uuit Processing Uuit Processing Uuit ata Bus Global Communication Interconnection Network (Local) SFG Array
17 4.2 Systolic Array Processors 17/ Introduction 4.2 Systolic Array Processors 4.4 Performance Analysis and esign Optimization 4.5 Systolic Arrays for the Transitive Closure and ynamic Programming Problems 4.6 Systolic esign for Artificial Neural Network 4.7 Conclusion Remarks 4.8 Problems
18 18/34 Three Stage of Canonical Mapping Algorithm for Systolic Array esign 1. erive a (local) G from the Algorithm 2. Map the G to an SFG Array 3. Transform the SFG to a Systolic Array (i.e. retiming)
19 19/34 The maor systolic array design gap is that most SFGs are not given in temporally localized form. Systolic Array = SFG Array + Pipeline Retiming
20 20/34 Cut-Set Retiming Procedure 1. Timing Scale: All delays may be scaled, i.e. (I/O own Sample) 2. elay Transfer: Given a cut-set of the SFG, which partitions the graph into two components, we can group the edges of the cut-set into two inbound edges and outbound edges.
21 21/34 ata Transfer Rule Inbound Cut +k +k - k Outbound
22 22/34 Systolization Procedure 1. Selection of Basic Operation Modules 2. Applying Retiming Rules 3. Combination of elay and Operation Modules
23 23/34 Systolization Procedure: Example of Lattice Filters X 2 X 1 X 0 + * + * + * Critical Path = 6 MAC Y 0 Y 1 Y * + * + * + k i SFG for AR Lattice Filter - X 2 - X 1 - X 0 + * + * + * Y 0 - Y 1 - Y 2-2 * + 2 * + 2 * + Step 1. Time-Rescaled SFG for AR Lattice Filter
24 24/34 Systolization Procedure: Example of Lattice Filters - X 2 - X 1 - X 0 Y 0 - Y 1 - Y * * * * + 2- * + Step 2. Retiming SFG for AR Lattice Filter + * Critical Path = 2 MAC - X 2 - X 1 - X 0 Y 0 - Y 1 - Y 2 - Step 3. Systolic Array for AR Lattice Filter
25 25/34 Systolization Procedure: Example of Matrix Multiplication b 41 b 42 b 32 b 22 b 12 b 43 b 31 b 44 b 33 b 21 b 34 b 23 b 11 b 24 b 13 b 14 a 14 a 13 a 12 a 11 a 24 a 23 a 22 a 21 a 34 a 33 a 32 a 31 a 44 a 43 a 42 a 41
26 26/34 All systolic arrays obtained from linear proections of the G can be derived by the following two steps. 1. Mapping the G onto SFGs by the SFG Proection Procedure 2. Mapping the SFG onto a Systolic Array by the Cut-Set Retiming
27 27/34 Retiming in the Sorting Systolic Arrays: Insertion Sorter x 1 4 x 1 3 i INPUT x 1 2 x 1 1 x 11 x 2 1 x 31 x 41 m 1 1 m 21 m 31 m 4 1 m 51 m i m x 12 x 22 x 32 x 42 m 2 2 m 32 m 42 m 52 m i m x 2 3 x 3 3 x 43 m 33 m 43 m 53 i 3 m 3 m 3 d = s =[1,0] x 34 x 44 m 44 m 54 m i m x 45
28 28/34 Retiming in the Sorting Systolic Arrays: Selection Sorter i INPUT x 11 x 2 1 x 31 x 41 m 11 m 2 1 m 3 1 m 4 1 m 5 1 x 12 x 22 x 3 2 x 4 2 m 22 m 32 m 4 2 m 5 2 x 23 x 3 3 x 4 3 m 33 m 43 m 53 x 3 4 x 4 4 m 44 m 5 4 d = s = [0,1] x 4 5 x 1 x 2 x 3 x 4 m 1 1 x 1 1 m 2 2 x 2 1 m 3 3 x 3 1 m 4 4 x 4 1
29 29/34 Retiming in the Sorting Systolic Arrays: Bubble-Sorter i INPUT x 1 1 x 21 x 3 1 x 4 1 m 11 m 21 m 31 m 41 m 51 x 12 x 2 2 x 32 x 42 m 2 2 m 3 2 m 4 2 m 52 x 23 x 33 x 4 3 m 3 3 m 43 m 5 3 d = s = [1,1] x 3 4 x 44 m 44 m 54 x 4 5 x 2 2 x 3 3 x 4 4 m 1 3 m 1 1 x 1 1 m 1 5 m 1 7
30 30/34 Rotation of Schedule Vector for Insertion Sorter x 1 4 x 1 3 i INPUT esired Schedule x 1 2 x 1 1 x 11 x 2 1 x 31 x 41 m 1 1 m 21 m 31 m 4 1 m 51 m i m x 12 x 22 x 32 x 42 m 2 2 m 32 m 42 m 52 m i m x 2 3 x 3 3 x 43 m 33 m 43 m 53 i 3 m 3 m 3 d = s =[1,0] efault Schedule x 34 x 44 m 44 m 54 m i m x 45
31 31/34 Bit Level Systolic Arrays: Example of Inner Product of Two Vectors The inner product vector c of two vectors a and b is computed as: c n = 1a k b k k = Assume that elements of a and b are m-bit integer.
32 32/34 c a 1 Bit Level Systolic Arrays: Example of Inner Product of Two Vectors n k = 1 = a b = = 2 b 1 k + m 1 = 0 a a 2 1, k b a m 1 = 0 n b b 1, n m 1 = 0 a n, 2 m 1 = 0 b n, a 1, m-1 a 1,1 a 1,0 b 1, m-1 b 1,1 b 1,0 Top Layer: b 1, 0 a 1,m-1 b 1,0 a 1, 1 b 1,0 a 1, 0 b 1, 1 a 1,m-1 b 1,1 a 1, 1 b 1,1 a 1, 0 b 1,m-1 a 1, m-1 b 1, m-1 a 1,1 b 1, m-1 a 1,0
33 33/34 c a 1 n k = 1 = a b = = 2 Bit Level Systolic Arrays: Example of Inner Product of Two Vectors b 1 k + m 1 = 0 a a 2 1, k b a m 1 = 0 n b b 1, n m 1 = 0 a n, 2 m 1 = 0 b n, a n, m-1 a n, 1 a n, 0 b n, m-1 b n, 1 b n, 0 Bottom Layer: b n, 0 a n,m-1 b n, 0 a n,1 b n, 0 a n,0 b n,1 a n,m-1 b n,1 a n,1 b n, 1 a n,0 b n,m - 1 a n,m-1 b n,m - 1 a n,1 b n, m-1 a n,0
34 34/34 a 0, 0 b 0,0 b 0,1 a 0, 1 a 0, 2 b 0, 2 Top Layer a 1, 0 a 1, 1 b 1,0 b 1,1 Bit Level Systolic Arrays: Example of Inner Product of Two Vectors Sum Carry a 1, 2 a 2, 1 a 2, 0 b 2,0 b 2,1 b 1, 2 a 2, 2 b 2, 2 HA Bottom Layer FA c 4 c 0 c 3 c 1 c 2
SHARED MEMORY VS DISTRIBUTED MEMORY
OVERVIEW Important Processor Organizations 3 SHARED MEMORY VS DISTRIBUTED MEMORY Classical parallel algorithms were discussed using the shared memory paradigm. In shared memory parallel platform processors
More informationLinear Arrays. Chapter 7
Linear Arrays Chapter 7 1. Basics for the linear array computational model. a. A diagram for this model is P 1 P 2 P 3... P k b. It is the simplest of all models that allow some form of communication between
More informationAcademic Course Description
Academic Course Description SRM University Faculty of Engineering and Technology Department of Electronics and Communication Engineering VL2003 DSP Structures for VLSI Systems First Semester, 2014-15 (ODD
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:
More informationAcademic Course Description. VL2003 Digital Processing Structures for VLSI First Semester, (Odd semester)
Academic Course Description SRM University Faculty of Engineering and Technology Department of Electronics and Communication Engineering VL2003 Digital Processing Structures for VLSI First Semester, 2015-16
More informationNepal Telecom Nepal Doorsanchar Company Ltd.
Nepal Telecom Nepal Doorsanchar Company Ltd. Syllabus lg=g+= 124 ;+u ;DalGwt cg';'lr - 3_ Part II: (Specialized subject for Computer Engineer Level 7 Tech. - Free and Internal competition) Time: 2 hours
More informationComputer Engineering Syllabus 2017
INTRODUCTION The Canadian Engineering Qualifications Board of Engineers Canada issues the Examination Syllabus that includes a continually increasing number of engineering disciplines. Each discipline
More informationRetiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.
Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum
More informationHRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing
HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing Mingyu Gao and Christos Kozyrakis Stanford University http://mast.stanford.edu HPCA March 14, 2016 PIM is Coming Back End of Dennard
More informationSystolic Parallel Processing
Algorithms and architectures of multimedia systems Systolic Parallel Processing Lecture Notes Matej Zajc University of Ljubljana 2011/12 Matej Zajc 2011 2 1 Introduction to systolic parallel processing
More informationDESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS
International Journal of Computing Academic Research (IJCAR) ISSN 2305-9184 Volume 2, Number 4 (August 2013), pp. 140-146 MEACSE Publications http://www.meacse.org/ijcar DESIGN AND IMPLEMENTATION OF VLSI
More informationA Comprehensive Methodology for Algorithm Characterization, Regularization and Mapping Into Optimal VLSI Arrays.
Louisiana State University LSU Digital Commons LSU Historical Dissertations and Theses Graduate School 1989 A Comprehensive Methodology for Algorithm Characterization, Regularization and Mapping Into Optimal
More informationDesign of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures
Design of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures 1 Suresh Sharma, 2 T S B Sudarshan 1 Student, Computer Science & Engineering, IIT, Khragpur 2 Assistant
More informationChhattisgarh Swami Vivekanand Technical University, Bhilai
Sr. No. Chhattisgarh Swami Vivekanand Technical University, Bhilai SCHEME OF MASTER OF TECHNOLOGY Electronics & Telecommunication Engineering (VLSI & Embedded System Design) Board Of Studies Code M. Tech.
More informationImplementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications
46 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.3, March 2008 Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications
More informationSystolic Arrays. Presentation at UCF by Jason HandUber February 12, 2003
Systolic Arrays Presentation at UCF by Jason HandUber February 12, 2003 Presentation Overview Introduction Abstract Intro to Systolic Arrays Importance of Systolic Arrays Necessary Review VLSI, definitions,
More informationROTATION SCHEDULING ON SYNCHRONOUS DATA FLOW GRAPHS. A Thesis Presented to The Graduate Faculty of The University of Akron
ROTATION SCHEDULING ON SYNCHRONOUS DATA FLOW GRAPHS A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for the Degree Master of Science Rama
More informationDesign and Implementation of VLSI 8 Bit Systolic Array Multiplier
Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,
More informationChapter 8 Folding. VLSI DSP 2008 Y.T. Hwang 8-1. Introduction (1)
Chapter 8 olding LSI SP 008 Y.T. Hang 8- folding Introduction SP architecture here multiple operations are multiplexed to a single function unit Trading area for time in a SP architecture Reduce the number
More informationAnalysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope
Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope G. Mohana Durga 1, D.V.R. Mohan 2 1 M.Tech Student, 2 Professor, Department of ECE, SRKR Engineering College, Bhimavaram, Andhra
More informationDESIGN OF AN FFT PROCESSOR
1 DESIGN OF AN FFT PROCESSOR Erik Nordhamn, Björn Sikström and Lars Wanhammar Department of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract In this paper we present a structured
More informationIncreasing interconnection network connectivity for reducing operator complexity in asynchronous vision systems
Increasing interconnection network connectivity for reducing operator complexity in asynchronous vision systems Valentin Gies and Thierry M. Bernard ENSTA, 32 Bd Victor 75015, Paris, FRANCE, contact@vgies.com,
More informationChap. 3. Input/Output
Chap. 3. Input/Output 17 janvier 07 3.1 Principles of I/O hardware 3.2 Principles of I/O software 3.3 Deadlocks [3.4 Overview of IO in Minix 3] [3.5 Block Devices in Minix 3] [3.6 RAM Disks] 3.7 Disks
More informationEECS 312 Digital Integrated Circuits. Instructor s Name: Prof. Pinaki Mazumder. T,Th 3:00 4:30 pm. Overview
EECS 312 igital Integrated Circuits Instructor s Name: Prof. Pinaki Mazumder mazum@eecs.umich.edu T,Th 3:00 4:30 pm 1 Overview Logistics Go over syllabus & Course Overview igital ICs are omnipresent: Applications
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationHIGH-LEVEL SYNTHESIS
HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms
More informationVirtex-II Architecture
Virtex-II Architecture Block SelectRAM resource I/O Blocks (IOBs) edicated multipliers Programmable interconnect Configurable Logic Blocks (CLBs) Virtex -II architecture s core voltage operates at 1.5V
More informationESE535: Electronic Design Automation. Today. Question. Question. Intuition. Gate Array Evaluation Model
ESE535: Electronic Design Automation Work Preclass Day 2: January 21, 2015 Heterogeneous Multicontext Computing Array Penn ESE535 Spring2015 -- DeHon 1 Today Motivation in programmable architectures Gate
More informationCSC630/COS781: Parallel & Distributed Computing
CSC630/COS781: Parallel & Distributed Computing Algorithm Design Chapter 3 (3.1-3.3) 1 Contents Preliminaries of parallel algorithm design Decomposition Task dependency Task dependency graph Granularity
More informationStatement of Research
On Exploring Algorithm Performance Between Von-Neumann and VLSI Custom-Logic Computing Architectures Tiffany M. Mintz James P. Davis, Ph.D. South Carolina Alliance for Minority Participation University
More informationHardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University
Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationExercises in DSP Design 2016 & Exam from Exam from
Exercises in SP esign 2016 & Exam from 2005-12-12 Exam from 2004-12-13 ept. of Electrical and Information Technology Some helpful equations Retiming: Folding: ω r (e) = ω(e)+r(v) r(u) F (U V) = Nw(e) P
More informationA Novel Methodology for Designing Radix-2 n Serial-Serial Multipliers
Journal of Computer Science 6 (4): 461-469, 21 ISSN 1549-3636 21 Science Publications A Novel Methodology for Designing Radix-2 n Serial-Serial Multipliers Abdurazzag Sulaiman Almiladi Department of Computer
More informationThis material exempt per Department of Commerce license exception TSU. Improving Performance
This material exempt per Department of Commerce license exception TSU Performance Outline Adding Directives Latency Manipulating Loops Throughput Performance Bottleneck Summary Performance 13-2 Performance
More informationUnit 2: High-Level Synthesis
Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationCONTENTS Equivalence Classes Partition Intersection of Equivalence Relations Example Example Isomorphis
Contents Chapter 1. Relations 8 1. Relations and Their Properties 8 1.1. Definition of a Relation 8 1.2. Directed Graphs 9 1.3. Representing Relations with Matrices 10 1.4. Example 1.4.1 10 1.5. Inverse
More informationRetiming. Lan-Da Van ( 范倫達 ), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall,
Retiming ( 范倫達 ), Ph.. epartment of Computer Science National Chiao Tung University Taiwan, R.O.C. Fall, 2 ldvan@cs.nctu.edu.tw http://www.cs.nctu.tw/~ldvan/ Outlines Introduction efinitions and Properties
More informationStructure of Computer Systems
288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram
More informationCommunication Complexity and Parallel Computing
Juraj Hromkovic Communication Complexity and Parallel Computing With 40 Figures Springer Table of Contents 1 Introduction 1 1.1 Motivation and Aims 1 1.2 Concept and Organization 4 1.3 How to Read the
More informationvs. GPU Performance Without the Answer University of Virginia Computer Engineering g Labs
Where is the Data? Why you Cannot Debate CPU vs. GPU Performance Without the Answer Chris Gregg and Kim Hazelwood University of Virginia Computer Engineering g Labs 1 GPUs and Data Transfer GPU computing
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline
More informationA Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality
A Crash Course in Compilers for Parallel Computing Mary Hall Fall, 2008 1 Overview of Crash Course L1: Data Dependence Analysis and Parallelization (Oct. 30) L2 & L3: Loop Reordering Transformations, Reuse
More informationThe Memory Hierarchy Part I
Chapter 6 The Memory Hierarchy Part I The slides of Part I are taken in large part from V. Heuring & H. Jordan, Computer Systems esign and Architecture 1997. 1 Outline: Memory components: RAM memory cells
More informationUNIT-III REGISTER TRANSFER LANGUAGE AND DESIGN OF CONTROL UNIT
UNIT-III 1 KNREDDY UNIT-III REGISTER TRANSFER LANGUAGE AND DESIGN OF CONTROL UNIT Register Transfer: Register Transfer Language Register Transfer Bus and Memory Transfers Arithmetic Micro operations Logic
More informationMatrix Multiplication
Systolic Arrays Matrix Multiplication Is used in many areas of signal processing, graph theory, graphics, machine learning, physics etc Implementation considerations are based on precision, storage and
More informationSystolic Arrays for Reconfigurable DSP Systems
Systolic Arrays for Reconfigurable DSP Systems Rajashree Talatule Department of Electronics and Telecommunication G.H.Raisoni Institute of Engineering & Technology Nagpur, India Contact no.-7709731725
More informationComputer Architectures for Deep Learning. Ethan Dell and Daniyal Iqbal
Computer Architectures for Deep Learning Ethan Dell and Daniyal Iqbal Agenda Introduction to Deep Learning Challenges Architectural Solutions Hardware Architectures CPUs GPUs Accelerators FPGAs SOCs ASICs
More informationRe-configurable VLIW processor for streaming data
International Workshop NGNT 97 Re-configurable VLIW processor for streaming data V. Iossifov Studiengang Technische Informatik, FB Ingenieurwissenschaften 1, FHTW Berlin. G. Megson School of Computer Science,
More informationImplementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm
157 Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm Manpreet Singh 1, Sandeep Singh Gill 2 1 University College of Engineering, Punjabi University, Patiala-India
More informationEE178 Spring 2018 Lecture Module 4. Eric Crabill
EE178 Spring 2018 Lecture Module 4 Eric Crabill Goals Implementation tradeoffs Design variables: throughput, latency, area Pipelining for throughput Retiming for throughput and latency Interleaving for
More informationAll MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes
MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in
More informationPrinciple of Polyhedral model for loop optimization. cschen 陳鍾樞
Principle of Polyhedral model for loop optimization cschen 陳鍾樞 Outline Abstract model Affine expression, Polygon space Polyhedron space, Affine Accesses Data reuse Data locality Tiling Space partition
More informationLecture 7: Parallel Processing
Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction
More informationI) The Question paper contains 40 multiple choice questions with four choices and student will have
Time: 3 Hrs. Model Paper I Examination-2016 BCA III Advanced Computer Architecture MM:50 I) The Question paper contains 40 multiple choice questions with four choices and student will have to pick the
More informationHigh Performance Hardware Architectures for A Hexagon-Based Motion Estimation Algorithm
High Performance Hardware Architectures for A Hexagon-Based Motion Estimation Algorithm Ozgur Tasdizen 1,2,a, Abdulkadir Akin 1,2,b, Halil Kukner 1,2,c, Ilker Hamzaoglu 1,d, H. Fatih Ugurdag 3,e 1 Electronics
More informationRUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch
RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,
More informationINTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 10 /Issue 1 / JUN 2018
A HIGH-PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS S.Susmitha 1 T.Tulasi Ram 2 susmitha449@gmail.com 1 ramttr0031@gmail.com 2 1M. Tech Student, Dept of ECE, Vizag Institute
More informationQERL INTRODUCTION. Overview of Capturing Video and Audio from Start to Finish. Page 3. Use EyeTV to capture and condition audio and video data
N C S U Q U A L I T A T I V E E U C A T I O N E S E A C H L A B QEL INTOUCTION Overview of Capturing Video and Audio from Start to Finish Version.5. - Summer 009 Lab Overview An introduction to the lab
More informationAdministrivia. CSE 370 Spring 2006 Introduction to Digital Design Lecture 9: Multilevel Logic
SE 370 Spring 2006 Introduction to igital esign Lecture 9: Multilevel Logic Last Lecture Introduction to Verilog Today Multilevel Logic Hazards dministrivia Hand in Homework #3 Homework #3 posted this
More informationHigh-Level Synthesis (HLS)
Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis
More informationP-1/26. Samir Palnitkar. Prentice-Hall, Inc. INSTRUCTOR : CHING-LUNG SU.
: P-1/26 Textbook: Verilog HDL 2 nd. Edition Samir Palnitkar Prentice-Hall, Inc. : INSTRUCTOR : CHING-LUNG SU E-mail: kevinsu@yuntech.edu.tw Chapter 4 P-2/26 Chapter 4 Modules and Outline of Chapter 4
More informationMapping Algorithms to Hardware By Prawat Nagvajara
Electrical and Computer Engineering Mapping Algorithms to Hardware By Prawat Nagvajara Synopsis This note covers theory, design and implementation of the bit-vector multiplication algorithm. It presents
More informationCOMP Parallel Computing. SMM (1) Memory Hierarchies and Shared Memory
COMP 633 - Parallel Computing Lecture 6 September 6, 2018 SMM (1) Memory Hierarchies and Shared Memory 1 Topics Memory systems organization caches and the memory hierarchy influence of the memory hierarchy
More informationCOURSE DESCRIPTION. CS 232 Course Title Computer Organization. Course Coordinators
COURSE DESCRIPTION Dept., Number Semester hours CS 232 Course Title Computer Organization 4 Course Coordinators Badii, Joseph, Nemes 2004-2006 Catalog Description Comparative study of the organization
More informationWhat is a parallel computer?
7.5 credit points Power 2 CPU L 2 $ IBM SP-2 node Instructor: Sally A. McKee General interconnection network formed from 8-port switches Memory bus Memory 4-way interleaved controller DRAM MicroChannel
More informationAdditional Slides to De Micheli Book
Additional Slides to De Micheli Book Sungho Kang Yonsei University Design Style - Decomposition 08 3$9 0 Behavioral Synthesis Resource allocation; Pipelining; Control flow parallelization; Communicating
More informationAdvanced Topics UNIT 2 PERFORMANCE EVALUATIONS
Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors
More informationGraph Theory S 1 I 2 I 1 S 2 I 1 I 2
Graph Theory S I I S S I I S Graphs Definition A graph G is a pair consisting of a vertex set V (G), and an edge set E(G) ( ) V (G). x and y are the endpoints of edge e = {x, y}. They are called adjacent
More informationDESIGN OF A HIGH SPEED SYNCHRONOUS MEMORY BUS INTERFACE
ESIGN OF HIGH SPEE SYNCHRONOUS MEMORY BUS INTERFCE Mayank Gupta mayank@ee.ucla.edu UI:92995846 Jen-Wei Ko jenwei@ee.ucla.edu UI:85777 University of California, Los ngeles M26 Term Project Report Professor
More informationRedundancy in fault tolerant computing. D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992
Redundancy in fault tolerant computing D. P. Siewiorek R.S. Swarz, Reliable Computer Systems, Prentice Hall, 1992 1 Redundancy Fault tolerance computing is based on redundancy HARDWARE REDUNDANCY Physical
More informationParallel Algorithm Design. Parallel Algorithm Design p. 1
Parallel Algorithm Design Parallel Algorithm Design p. 1 Overview Chapter 3 from Michael J. Quinn, Parallel Programming in C with MPI and OpenMP Another resource: http://www.mcs.anl.gov/ itf/dbpp/text/node14.html
More informationA VLSI DSP DESIGN AND IMPLEMENTATION O F COMB FILTER USING UN-FOLDING METHODOLOGY
VLSI SP ESIGN N IMPLEMENTTION O F COM FILTER USING UN-FOLING METHOOLOGY 1 PURU GUPT & 2 TRUN KUMR RWT 1&2 ept. of Electronics and Communication Engineering, Netaji Subhas Institute of Technology -warka
More informationMassive Parallel QCD Computing on FPGA Accelerator with Data-Flow Programming
Massive Parallel QCD Computing on FPGA Accelerator with Data-Flow Programming Thomas Janson and Udo Kebschull Infrastructure and Computer Systems in Data Processing (IRI) Goethe University Frankfurt Germany
More informationParallel Architectures
Parallel Architectures Instructor: Tsung-Che Chiang tcchiang@ieee.org Department of Science and Information Engineering National Taiwan Normal University Introduction In the roughly three decades between
More informationQuantitative Finance COURSE NUMBER: 22:839:615 COURSE TITLE: Special Topics Oriented Programming 2
Quantitative Finance COURSE NUMBER: 22:839:615 COURSE TITLE: Special Topics Oriented Programming 2 COURSE DESCRIPTION This course assumes a student has prior programming language experience with C++. It
More informationVLSI Designs for Digital Signal Processing
VLSI esigns for igital Signal Processing Instructor: Yin-Tsung Hwang epartment of Electrical Engineering National Chung Hsing University VLSI SP 2010 Y.T.HWANG 1-1 Chapter 1 Introduction to VLSI SP VLSI
More informationMain Points of the Computer Organization and System Software Module
Main Points of the Computer Organization and System Software Module You can find below the topics we have covered during the COSS module. Reading the relevant parts of the textbooks is essential for a
More informationResearch Incubator: Combinatorial Optimization. Dr. Lixin Tao December 9, 2003
Research Incubator: Combinatorial Optimization Dr. Lixin Tao December 9, 23 Content General Nature of Research on Combinatorial Optimization Problem Identification and Abstraction Problem Properties and
More informationPAPER Design of Optimal Array Processors for Two-Step Division-Free Gaussian Elimination
1503 PAPER Design of Optimal Array Processors for Two-Step Division-Free Gaussian Elimination Shietung PENG and Stanislav G. SEDUKHIN Nonmembers SUMMARY The design of array processors for solving linear
More informationAlgorithms and Applications
Algorithms and Applications 1 Areas done in textbook: Sorting Algorithms Numerical Algorithms Image Processing Searching and Optimization 2 Chapter 10 Sorting Algorithms - rearranging a list of numbers
More informationSpeculation and Future-Generation Computer Architecture
Speculation and Future-Generation Computer Architecture University of Wisconsin Madison URL: http://www.cs.wisc.edu/~sohi Outline Computer architecture and speculation control, dependence, value speculation
More informationIntroduction. Sungho Kang. Yonsei University
Introduction Sungho Kang Yonsei University Outline VLSI Design Styles Overview of Optimal Logic Synthesis Model Graph Algorithm and Complexity Asymptotic Complexity Brief Summary of MOS Device Behavior
More informationLinking Layout to Logic Synthesis: A Unification-Based Approach
Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and
More informationPYRAMID: A Hierarchical Approach to E-beam Proximity Effect Correction
PYRAMID: A Hierarchical Approach to E-beam Proximity Effect Correction Soo-Young Lee Auburn University leesooy@eng.auburn.edu Presentation Proximity Effect PYRAMID Approach Exposure Estimation Correction
More informationConcurrent Programming Introduction
Concurrent Programming Introduction Frédéric Haziza Department of Computer Systems Uppsala University Ericsson - Fall 2007 Outline 1 Good to know 2 Scenario 3 Definitions 4 Hardware 5 Classical
More informationCOPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code
COPY RIGHT 2018IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material
More informationEECS Components and Design Techniques for Digital Systems. Lec 20 RTL Design Optimization 11/6/2007
EECS 5 - Components and Design Techniques for Digital Systems Lec 2 RTL Design Optimization /6/27 Shauki Elassaad Electrical Engineering and Computer Sciences University of California, Berkeley Slides
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationComputing the Discrete Fourier Transform on FPGA Based Systolic Arrays
Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays Chris Dick School of Electronic Engineering La Trobe University Melbourne 3083, Australia Abstract Reconfigurable logic arrays allow
More informationParallel Programming Multicore systems
FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have
More informationA HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS
A HIGH PERFORMANCE FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS Saba Gouhar 1 G. Aruna 2 gouhar.saba@gmail.com 1 arunastefen@gmail.com 2 1 PG Scholar, Department of ECE, Shadan Women
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #8 2/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationCPE300: Digital System Architecture and Design
CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Layered View of the Computer http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Recap Assembly/Machine Programmer View
More informationESE534: Computer Organization. Tabula. Previously. Today. How often is reuse of the same operation applicable?
ESE534: Computer Organization Day 22: April 9, 2012 Time Multiplexing Tabula March 1, 2010 Announced new architecture We would say w=1, c=8 arch. 1 [src: www.tabula.com] 2 Previously Today Saw how to pipeline
More informationELE 455/555 Computer System Engineering. Section 4 Parallel Processing Class 1 Challenges
ELE 455/555 Computer System Engineering Section 4 Class 1 Challenges Introduction Motivation Desire to provide more performance (processing) Scaling a single processor is limited Clock speeds Power concerns
More informationVolume 5, Issue 5 OCT 2016
DESIGN AND IMPLEMENTATION OF REDUNDANT BASIS HIGH SPEED FINITE FIELD MULTIPLIERS Vakkalakula Bharathsreenivasulu 1 G.Divya Praneetha 2 1 PG Scholar, Dept of VLSI & ES, G.Pullareddy Eng College,kurnool
More informationParallel Processors. Session 1 Introduction
Parallel Processors Session 1 Introduction Applications of Parallel Processors Structural Analysis Weather Forecasting Petroleum Exploration Fusion Energy Research Medical Diagnosis Aerodynamics Simulations
More informationDigital Design Methodology (Revisited) Design Methodology: Big Picture
Digital Design Methodology (Revisited) Design Methodology Design Specification Verification Synthesis Technology Options Full Custom VLSI Standard Cell ASIC FPGA CS 150 Fall 2005 - Lec #25 Design Methodology
More informationTransaction Level Modeling with SystemC. Thorsten Grötker Engineering Manager Synopsys, Inc.
Transaction Level Modeling with SystemC Thorsten Grötker Engineering Manager Synopsys, Inc. Outline Abstraction Levels SystemC Communication Mechanism Transaction Level Modeling of the AMBA AHB/APB Protocol
More information